Boot failures: symptoms - system won't start, kernel panic, drops to emergency shell. Troubleshooting: boot from a live USB, check logs in /var/log, run fsck on the affected filesystems (unmounted only; running fsck on a mounted filesystem can corrupt it), check /etc/fstab for errors (typos, wrong UUIDs; compare against blkid output), reinstall the bootloader (grub-install), boot an older kernel from the GRUB menu, check hardware (bad RAM, failing disk with SMART data: smartctl -a /dev/sda).
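The /etc/fstab check can be partly automated. The sketch below is one possible approach, not a standard tool: the function names fstab_bad_lines and fstab_unknown_uuids are hypothetical, and the /mnt mount point in the usage note is an assumption about where the live system mounted the broken root.

```shell
# Print fstab entries with an unexpected number of fields.
# Reads fstab-formatted text on stdin; comments and blank lines are skipped.
# fstab(5) defines six fields, of which the last two are optional.
fstab_bad_lines() {
    awk '
        /^[[:space:]]*#/ || /^[[:space:]]*$/ { next }
        NF < 4 || NF > 6 { printf "line %d: expected 4-6 fields, got %d\n", NR, NF }
    '
}

# Print UUID= values from fstab (stdin) that are missing from the
# space-separated list of known UUIDs passed as the first argument.
fstab_unknown_uuids() {
    known=" $1 "
    awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' |
    while read -r uuid; do
        case "$known" in
            *" $uuid "*) ;;          # UUID exists on this machine
            *) echo "$uuid" ;;       # UUID in fstab but not reported by blkid
        esac
    done
}
```

Possible usage from a live system, assuming the broken root is mounted at /mnt: fstab_bad_lines < /mnt/etc/fstab, then fstab_unknown_uuids "$(blkid -s UUID -o value | tr '\n' ' ')" < /mnt/etc/fstab. Any UUID printed by the second command is referenced in fstab but not present on any attached device.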
Network problems: symptoms - no connectivity, slow network, DNS issues. Troubleshooting: work outward one layer at a time: ping 127.0.0.1 (test loopback), ping gateway (test local network), ping 8.8.8.8 (test internet without DNS), nslookup google.com (test DNS). Then check interface status: ip link, check IP configuration: ip addr, check routes: ip route, check DNS: cat /etc/resolv.conf, restart networking (the unit name varies by distribution: systemctl restart networking on Debian with ifupdown, NetworkManager or systemd-networkd elsewhere), check firewall rules, test with a different DNS server: nslookup google.com 8.8.8.8.
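The ping sequence works because each test depends on the one before it, so the first failure points at the broken layer. A minimal sketch of that logic follows; the function names (first_failing_layer, probe, run_network_checks) are made up for illustration, the layer descriptions are a simplification, and the gateway lookup assumes a default route of the form "default via ADDRESS ...".

```shell
# Map layered test results to the first failing layer.
# Arguments are "ok" or "fail", in the order the checks are run:
# loopback, gateway, internet by IP, DNS lookup.
first_failing_layer() {
    loopback=$1 gateway=$2 internet=$3 dns=$4
    if   [ "$loopback" != ok ]; then echo "local TCP/IP stack"
    elif [ "$gateway"  != ok ]; then echo "local network"
    elif [ "$internet" != ok ]; then echo "upstream routing or firewall"
    elif [ "$dns"      != ok ]; then echo "DNS resolution"
    else echo "no failure detected"
    fi
}

# Run a command quietly and report "ok" or "fail" from its exit status.
probe() { if "$@" >/dev/null 2>&1; then echo ok; else echo fail; fi; }

# Run the four checks against the live system (Linux iputils ping options).
run_network_checks() {
    # Assumption: the default route looks like "default via 192.0.2.1 dev eth0".
    gw=$(ip route show default | awk '/^default/ { print $3; exit }')
    first_failing_layer \
        "$(probe ping -c 1 -W 2 127.0.0.1)" \
        "$(probe ping -c 1 -W 2 "$gw")" \
        "$(probe ping -c 1 -W 2 8.8.8.8)" \
        "$(probe nslookup google.com)"
}
```

Calling run_network_checks prints a single line naming the first layer that failed. Some gateways and networks drop ICMP, so a failed ping is a hint rather than proof.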
Performance degradation: symptoms - slow response, high load average (relative to the number of CPUs), OOM killer activating. Troubleshooting: use top/htop to identify resource hogs, check CPU (>80% sustained), memory (high swap usage), disk I/O (high iowait in top), check for memory leaks (RSS growing over time), review logs for errors, analyze historical data with sar, investigate recent changes (new software, config changes, increased load), use iostat for disk bottlenecks, use ss (or the older netstat) for network connections.
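Whether a load average counts as high depends on the CPU count: a load of 4 is full utilization on a 4-CPU machine but heavy queuing on a single CPU. The sketch below normalizes load by CPU count. The function name load_status and the 0.7 and 1.0 thresholds are assumptions chosen for illustration, not fixed limits.

```shell
# Classify a load average relative to the number of CPUs.
# $1 = load average (e.g. 1-minute value), $2 = CPU count.
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        ratio = load / cpus
        if (ratio > 1.0)      status = "overloaded"   # more runnable tasks than CPUs
        else if (ratio > 0.7) status = "busy"         # assumed warning threshold
        else                  status = "ok"
        printf "%.2f %s\n", ratio, status
    }'
}
```

On Linux it could be fed live values with load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". Linux load averages also count tasks in uninterruptible I/O wait, so a high ratio with idle CPUs usually points at disk rather than CPU.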
Service failures: symptoms - service won't start, keeps restarting, crashes. Troubleshooting (replace "service" with the unit name): systemctl status service shows state and recent logs, journalctl -xu service shows detailed logs, check config files for syntax errors, verify file permissions, check dependencies: systemctl list-dependencies service, test manually: run the command from the unit's ExecStart line (see systemctl cat service) in the foreground, with a verbose flag if the program has one, review recent changes, check available resources (disk space, memory), use strace for system call tracing.
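For a quick one-line view of a failing unit, the key=value output of systemctl show can be condensed. This is a rough sketch: summarize_unit is a hypothetical helper, and the NRestarts property is assumed to be available, which requires a reasonably recent systemd.

```shell
# Condense "systemctl show" key=value lines (stdin) into one summary line.
# Properties may arrive in any order, so they are collected into a map first.
summarize_unit() {
    awk -F= '
        { v[$1] = $2 }
        END {
            printf "state=%s/%s result=%s restarts=%s exit=%s\n",
                v["ActiveState"], v["SubState"], v["Result"],
                v["NRestarts"], v["ExecMainStatus"]
        }'
}
```

Possible usage: systemctl show -p ActiveState,SubState,Result,NRestarts,ExecMainStatus service | summarize_unit. A climbing restart count with result=exit-code suggests the process starts and then exits with an error, which is the point to read the journal for that unit.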
Disk full issues: symptoms - "No space left" errors, applications crashing, can't write logs. Troubleshooting: df -h shows usage by filesystem, find large directories: du -sh /* | sort -h, check inodes: df -i (a filesystem can have free space yet be out of inodes), clean /tmp and /var/tmp, rotate logs, clean package caches, find deleted but still open files: lsof | grep deleted (restart the owning service to release the space), expand the filesystem or add storage.
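The df check is easy to script for monitoring or for a quick scan of many mounts. A minimal sketch, assuming POSIX-format output (df -P) and mount points without spaces; the name full_filesystems and the default 90% threshold are illustrative choices.

```shell
# Print mount points whose usage is at or above a threshold percentage.
# Reads "df -P" output on stdin; $1 is the threshold (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit + 0) print $6, use "%"
        }'
}
```

Possible usage: df -P | full_filesystems 90. With GNU df, piping df -Pi through the same function should flag inode exhaustion instead, since the percentage stays in the fifth column.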
High CPU usage: identify the process with top, decide whether it is legitimate load or a bug (runaway loop), use nice/renice to lower its priority, kill it if necessary (SIGTERM first, SIGKILL only as a last resort), investigate application logs, check for infinite loops or inefficient queries, profile with perf or summarize system calls with strace -c.
Memory issues: check with free -h (read the "available" column, not "free"), identify memory hogs with top (RES column, the resident set size that ps calls RSS), check for leaks (memory growing over time), adjust swappiness (the vm.swappiness sysctl) if swapping too aggressively, add more RAM if consistently over capacity, check OOM killer logs: dmesg | grep -i kill.
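The figure that matters for memory pressure is available memory, which free derives from the MemAvailable field in /proc/meminfo. A short sketch that turns it into a percentage; the function name mem_available_pct is hypothetical, and the field is assumed to be present (it is on kernels 3.14 and later).

```shell
# Print available memory as a whole-number percentage of total memory.
# Reads /proc/meminfo-formatted text on stdin.
mem_available_pct() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}
```

Possible usage: mem_available_pct < /proc/meminfo. Sampling the value periodically shows whether it is steadily shrinking, which is one sign of a leak.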
Systematic approach: define problem clearly, collect information (logs, monitoring data, recent changes), form hypothesis, test hypothesis, implement fix, verify resolution, document solution. Understanding common issues and systematic troubleshooting reduces mean time to resolution.