Problem Statement
Explain how to analyze system logs for troubleshooting. What are common log files and what do they contain?
Explanation
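Key log files (names vary by distribution; where two are given, the Debian/Ubuntu name comes first and the RHEL/Fedora name second): /var/log/syslog or /var/log/messages (general system messages from the kernel, system daemons, and applications), /var/log/auth.log or /var/log/secure (authentication events such as sudo usage, SSH logins, and failed login attempts), /var/log/kern.log (kernel messages such as hardware errors and driver problems; RHEL-family systems write these to /var/log/messages instead), /var/log/dmesg (kernel messages from boot, including hardware detection; where this file is absent, run dmesg or journalctl -k), /var/log/boot.log (service startup output during boot).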
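Because the same log lives under different names on Debian-family and RHEL-family systems, scripts often probe a list of candidates and use the first one that exists. The Python sketch below assumes the path pairs listed above; the role names, the LOG_CANDIDATES table, and the find_log helper are hypothetical, not part of any standard tool.

```python
import os

# Hypothetical mapping of log "roles" to the paths different distro families use.
# Debian/Ubuntu names come first, RHEL/Fedora names second; adjust for your system.
LOG_CANDIDATES = {
    "system": ["/var/log/syslog", "/var/log/messages"],
    "auth": ["/var/log/auth.log", "/var/log/secure"],
    "kernel": ["/var/log/kern.log", "/var/log/messages"],
    "boot": ["/var/log/boot.log"],
}


def find_log(role, candidates=LOG_CANDIDATES):
    """Return the first existing file for a log role, or None if none exists."""
    for path in candidates.get(role, []):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    for role in LOG_CANDIDATES:
        # None often means the system logs only to the systemd journal.
        print(f"{role:7} -> {find_log(role) or 'not found (try journalctl)'}")
```

On a journald-only system every role may come back empty, which is itself a useful hint to switch to journalctl.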
Application logs: /var/log/apache2/ or /var/log/httpd/ (Apache access and error logs, Debian and RHEL naming respectively), /var/log/nginx/ (nginx access and error logs), /var/log/mysql/ (database logs), /var/log/mail.log or /var/log/maillog (mail server), plus application-specific directories. Check each application's documentation or configuration for its log location.
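Web server access logs are usually written in the "combined" format (client, identity, user, time, request line, status, size, referer, user agent), so a quick status-code tally shows whether errors are client-side (4xx) or server-side (5xx) before you open the error log. The sketch assumes that format; the regex, the status_counts helper, and the sample lines (using documentation IP ranges) are illustrative, not taken from Apache or nginx themselves.

```python
import re
from collections import Counter

# Regex for the "combined" access-log format:
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def status_counts(lines):
    """Count HTTP status codes in access-log lines; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


sample = [
    '203.0.113.7 - - [10/Jan/2024:12:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
    '203.0.113.7 - - [10/Jan/2024:12:00:02 +0000] "GET /admin HTTP/1.1" 403 128 "-" "curl/8.0"',
    '198.51.100.2 - - [10/Jan/2024:12:00:03 +0000] "POST /api HTTP/1.1" 502 0 "-" "Mozilla/5.0"',
]
print(status_counts(sample))
```

A burst of 502 or 504 responses typically sends you next to the upstream application's own logs rather than the web server's.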
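Systemd journal: on systemd-based systems, journald stores logs in a structured binary format that you query with journalctl rather than read directly. Common commands: journalctl -xe (jump to the most recent entries, with explanatory catalog text where available), journalctl -u <unit> (entries for one service, e.g. journalctl -u nginx), journalctl --since "1 hour ago" (time window), journalctl -f (follow new entries), journalctl -b (current boot only; -b -1 shows the previous boot if the journal is persistent), journalctl -k (kernel messages), journalctl -p err (priority err and anything more severe: crit, alert, emerg).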
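journalctl can also print entries as JSON (-o json), one object per line, which makes scripted filtering easy. The sketch below assumes that output mode; parse_json_lines, read_journal, and at_or_above are hypothetical helpers, and at_or_above reimplements -p only to show how the numeric priorities map to names (in practice, pass -p err to journalctl directly).

```python
import json
import subprocess

# journald uses syslog priorities: 0 is most severe, 7 least.
PRIORITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_json_lines(lines):
    """Parse `journalctl -o json` output: one JSON object per line."""
    return [json.loads(line) for line in lines if line.strip()]


def read_journal(*args):
    """Run journalctl with extra filter arguments and return parsed entries."""
    result = subprocess.run(
        ["journalctl", "-o", "json", "--no-pager", *args],
        capture_output=True, text=True, check=True,
    )
    return parse_json_lines(result.stdout.splitlines())


def at_or_above(entries, level="err"):
    """Keep entries at `level` or more severe, mirroring `journalctl -p level`."""
    cutoff = PRIORITY_NAMES.index(level)
    # Field values arrive as JSON strings, e.g. "PRIORITY": "3".
    return [e for e in entries if "PRIORITY" in e and int(e["PRIORITY"]) <= cutoff]


# Usage on a systemd host (not run here):
#   for e in at_or_above(read_journal("-b", "-u", "nginx.service")):
#       print(e.get("_SYSTEMD_UNIT"), e.get("MESSAGE"))
```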
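Log analysis techniques: tail -f /var/log/syslog monitors a file in real time; grep 'ERROR' /var/log/syslog finds matching lines (add -i to ignore case); awk and sed extract and reshape fields in structured logs; sort | uniq -c counts how often each value occurs. Example: grep 'Failed password' /var/log/auth.log | grep -oE 'from [0-9a-fA-F.:]+' | awk '{print $2}' | sort | uniq -c | sort -rn lists failed SSH password attempts per source IP, most frequent first. Extracting the address that follows "from" is more robust than a fixed column such as awk '{print $11}', which points at the username instead of the IP on "Failed password for invalid user ..." lines and shifts when the timestamp format has a different number of fields.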
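The same frequency analysis can be written in Python when you want to reuse it in a script. This is a minimal sketch assuming the usual OpenSSH message wording ("Failed password for [invalid user] NAME from IP port N"); the regex, the failed_logins_by_ip helper, and the sample lines are illustrative.

```python
import re
from collections import Counter

# Handles "Failed password for root from IP port N" as well as
# "Failed password for invalid user admin from IP port N".
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port")


def failed_logins_by_ip(lines):
    """Return (ip, count) pairs, most frequent first, like `sort | uniq -c | sort -rn`."""
    hits = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common()


sample = [
    "Jan 10 12:00:01 web1 sshd[811]: Failed password for root from 203.0.113.9 port 52114 ssh2",
    "Jan 10 12:00:04 web1 sshd[811]: Failed password for invalid user admin from 203.0.113.9 port 52120 ssh2",
    "Jan 10 12:01:10 web1 sshd[902]: Failed password for alice from 198.51.100.4 port 40022 ssh2",
    "Jan 10 12:02:00 web1 sshd[903]: Accepted password for alice from 198.51.100.4 port 40030 ssh2",
]
print(failed_logins_by_ip(sample))

# On a real host (needs read access):
#   with open("/var/log/auth.log", errors="replace") as f:
#       print(failed_logins_by_ip(f))
```

Matching the whole phrase, rather than splitting on whitespace, is what keeps the "invalid user" case from shifting the result.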
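Common patterns: "Out of memory" indicates memory exhaustion (in kernel logs, it means the OOM killer terminated a process); "Connection refused" means nothing is listening on the target port or a firewall is actively rejecting the connection; "Permission denied" points to file permissions or ownership, or to an SELinux/AppArmor denial; "No space left on device" means the filesystem is full or has run out of inodes (check df -h and df -i); "segmentation fault" (logged by the kernel as "segfault at ...") indicates an application crashed on an invalid memory access. Correlate timestamps across logs to reconstruct the sequence of events.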
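A first triage pass can simply count how many lines match each of these signatures. The sketch below encodes the patterns listed above as case-insensitive regexes; the SIGNATURES table, the triage helper, and the made-up sample lines are illustrative assumptions, and "segfault" is included because that is the spelling kernel logs use.

```python
import re
from collections import Counter

# Failure signatures from the list above, matched case-insensitively.
SIGNATURES = [
    (re.compile(r"out of memory", re.I), "memory exhaustion"),
    (re.compile(r"connection refused", re.I), "service not listening or firewall rejecting"),
    (re.compile(r"permission denied", re.I), "permissions or SELinux/AppArmor denial"),
    (re.compile(r"no space left on device", re.I), "filesystem full (space or inodes)"),
    (re.compile(r"segmentation fault|segfault", re.I), "application crash"),
]


def triage(lines):
    """Count lines matching each known failure signature."""
    counts = Counter()
    for line in lines:
        for pattern, cause in SIGNATURES:
            if pattern.search(line):
                counts[cause] += 1
    return counts


sample = [
    "kernel: Out of memory: Killed process 4242 (java)",
    "kernel: java[4300]: segfault at 0 ip 00007f3a2b1c4d5e sp 00007ffd1e2f3a40 error 4",
    "nginx: connect() failed (111: Connection refused) while connecting to upstream",
    "rsyslogd: file '/var/log/app.log': No space left on device",
]
for cause, n in triage(sample).most_common():
    print(f"{n:3}  {cause}")
```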
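Log rotation: logrotate keeps logs from filling the disk by periodically renaming, compressing, and deleting old files. Configuration lives in /etc/logrotate.conf (global defaults) and /etc/logrotate.d/ (per-package files). Typical settings: rotation frequency (daily, weekly), number of old logs to keep (rotate N), compression (compress), and commands to run after rotation (postrotate ... endscript), for example to make a daemon reopen its log file. The last rotation time of each log is recorded in /var/lib/logrotate/status (/var/lib/logrotate/logrotate.status on many RHEL-family systems); logrotate -d /etc/logrotate.conf performs a dry run that shows what would happen without changing anything.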
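When a log keeps growing, checking the state file shows whether logrotate is actually rotating it. This sketch assumes the state-file layout of recent logrotate versions (a version header, then one quoted path and a non-zero-padded timestamp per line); compare it against your own status file before relying on it. The helper names and the 8-day threshold are hypothetical.

```python
import shlex
from datetime import datetime, timedelta


def parse_logrotate_status(text):
    """Map each log path to its last rotation time from a logrotate state file."""
    rotated = {}
    for line in text.splitlines():
        if not line.startswith('"'):
            continue  # skips the "logrotate state -- version N" header
        parts = shlex.split(line)  # handles quoted paths containing spaces
        if len(parts) != 2:
            continue
        path, stamp = parts
        # Assumed: date and time are recorded; the date-only form is a fallback.
        for fmt in ("%Y-%m-%d-%H:%M:%S", "%Y-%m-%d"):
            try:
                rotated[path] = datetime.strptime(stamp, fmt)
                break
            except ValueError:
                continue
    return rotated


def not_rotated_since(rotated, now, max_age=timedelta(days=8)):
    """Paths whose last rotation is older than max_age (threshold is an assumption)."""
    return sorted(path for path, when in rotated.items() if now - when > max_age)


sample = '''logrotate state -- version 2
"/var/log/syslog" 2024-1-10-6:25:1
"/var/log/nginx/access.log" 2023-12-1-6:0:0
'''
print(not_rotated_since(parse_logrotate_status(sample), datetime(2024, 1, 12)))
```

A log that shows up here despite a weekly setting is a good candidate for a logrotate -d dry run.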
Centralized logging: with multiple servers, ship logs to a central system (for example the ELK stack of Elasticsearch, Logstash, and Kibana; Splunk; or Graylog) where they can be searched, correlated across hosts, and used for alerting. Log analysis skills are fundamental to troubleshooting production issues.
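In practice, forwarding is usually done by an agent such as rsyslog or a log shipper like Filebeat, but the underlying idea is simple: send each record over the network to a collector. This minimal sketch uses Python's standard logging.handlers.SysLogHandler over UDP, with a local socket on an ephemeral port standing in for the central server; the make_remote_logger helper and the message format are illustrative assumptions.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host, port=514, name="app"):
    """Return a logger whose records are sent to a syslog collector over UDP."""
    handler = logging.handlers.SysLogHandler(address=(host, port))  # UDP by default
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # A local UDP socket on an ephemeral port stands in for the central collector.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))
    collector.settimeout(2)
    log = make_remote_logger("127.0.0.1", collector.getsockname()[1])
    log.warning("disk usage high on /var")
    data, _ = collector.recvfrom(4096)
    print(data)  # syslog priority prefix, then the formatted message
    collector.close()
```

Plain UDP syslog can silently drop messages, which is one reason production setups often prefer TCP or a dedicated shipper.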