Problem Statement
Explain how to combine multiple text processing commands using pipes. Provide examples of complex data extraction and analysis pipelines.
Explanation
Pipes (|) connect the output of one command to the input of the next, enabling powerful data-processing pipelines. Each command in the pipeline processes its input and passes the result on to the next stage. Example: cat access.log | grep '200' | awk '{print $1}' | sort | uniq -c | sort -rn extracts the IPs of successful requests, counts occurrences of each IP, and sorts by frequency.
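The same pipeline spread over one stage per line, with each step annotated (a sketch; access.log and its layout, with the client IP in the first field, follow the example above):

cat access.log |      # read the log (grep could also read the file directly)
  grep '200' |        # keep lines containing "200"
  awk '{print $1}' |  # print the first field: the client IP
  sort |              # group identical IPs so uniq sees adjacent duplicates
  uniq -c |           # collapse duplicates, prefixing each IP with its count
  sort -rn            # numeric sort on the count, highest first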
Breaking down the pipeline: cat reads the file, grep keeps lines containing 200 (it matches the string anywhere on the line, not only in the status field), awk extracts the IP address (the first field), the first sort groups identical IPs together so uniq sees adjacent duplicates, uniq -c counts them, and the final sort -rn orders by count descending. This pattern surfaces the IPs making the most successful requests, which is useful for analyzing web traffic or spotting potential abuse.
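Because grep '200' matches the string anywhere on the line, a stricter variant can test the status field itself; this sketch assumes the common/combined log format, where the status code is field 9:

awk '$9 == 200 {print $1}' access.log |  # IPs of requests whose status field is exactly 200
  sort | uniq -c | sort -rn |            # count per IP, largest counts first
  head -10                               # show only the top 10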
Log analysis pipeline: grep 'ERROR' /var/log/syslog | awk '{print $5}' | sort | uniq -c counts errors grouped by the fifth field (typically the logging process in traditional syslog format). Note that sort must read its entire input before printing anything, so a never-ending stream like tail -f cannot feed a counting pipeline; for real-time monitoring, use tail -f /var/log/syslog | grep 'ERROR' on its own. CSV processing: cut -d',' -f2,5 data.csv | grep 'active' | sort -t',' -k2 -n extracts columns 2 and 5, keeps rows containing 'active', and sorts numerically on the second extracted field (original column 5).
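Put together (a sketch: the syslog field positions and the data.csv layout, with a status in column 2 and a numeric value in column 5, are assumptions for illustration):

# Live view of matching lines (no counting, so an endless stream is fine):
tail -f /var/log/syslog | grep 'ERROR'

# Counting needs bounded input, since sort must see everything first:
grep 'ERROR' /var/log/syslog | awk '{print $5}' | sort | uniq -c | sort -rn

# CSV: keep columns 2 and 5, filter for "active", sort numerically on the value column:
cut -d',' -f2,5 data.csv | grep 'active' | sort -t',' -k2 -n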
Complex example: ps aux | grep -v grep | awk '$3 > 10 {print $2, $3, $11}' | sort -k2 -rn | head -10 lists processes using more than 10% CPU (PID, %CPU, and command), sorted by CPU usage descending and limited to 10 rows; grep -v grep simply keeps the pipeline's own grep out of the listing. Disk usage: du -h /var | sort -h | tail -20 shows the 20 largest directory totals under /var (sort -h understands du's human-readable sizes). Understanding pipelines enables sophisticated data analysis with standard Unix tools, without writing custom scripts.
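Annotated versions of both commands (a sketch; the 10% threshold, /var, and the row limits are just the example's values, and sort -h relies on GNU coreutils):

ps aux |
  grep -v grep |                        # drop the pipeline's own grep from the listing
  awk '$3 > 10 {print $2, $3, $11}' |   # PID, %CPU, and command for processes above 10% CPU
  sort -k2 -rn |                        # numeric sort on the %CPU column, highest first
  head -10                              # show at most 10 rows

du -h /var | sort -h | tail -20         # 20 largest directory totals under /var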
