Problem Statement
Why is observability important in modern operating systems and what OS-level metrics would you track in production?
Explanation
Observability in modern operating systems and platforms means having runtime visibility into performance, reliability and resource usage, so that operators can detect anomalies, diagnose faults and optimise behaviour. OS-level metrics worth tracking in production include CPU utilisation, ready-queue (run-queue) and I/O queue lengths, memory usage, page-fault rate, context-switch rate, interrupt rate, I/O latency and scheduler load. On cloud platforms, also track VM/container migration events, network packet drops and NUMA node imbalances. In an interview, referencing case studies of production operating systems helps demonstrate applied knowledge.
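As an illustration, several of these metrics can be derived from cumulative kernel counters by sampling twice and dividing the difference by the interval. The sketch below assumes a Linux host, where /proc/stat exposes an aggregate "cpu" line plus "ctxt" (context switches) and "intr" (interrupts) totals. The function names parse_proc_stat, rates and read_snapshot are hypothetical helpers chosen for this example, and counting iowait as idle time is one common convention rather than the only valid one.

```python
def parse_proc_stat(text):
    """Extract cumulative counters from /proc/stat-formatted text (Linux assumed)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":  # aggregate line; per-core lines are "cpu0", "cpu1", ...
            values = [int(v) for v in fields[1:]]
            # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
            # guest time is already included in user/nice, so sum only the first eight.
            counters["cpu_total"] = sum(values[:8])
            # Assumption: treat iowait as idle time.
            counters["cpu_idle"] = values[3] + (values[4] if len(values) > 4 else 0)
        elif fields[0] in ("ctxt", "intr"):
            # The first number on each of these lines is the total since boot.
            counters[fields[0]] = int(fields[1])
    return counters


def rates(before, after, interval_s):
    """Turn two counter snapshots taken interval_s seconds apart into metrics."""
    total = after["cpu_total"] - before["cpu_total"]
    idle = after["cpu_idle"] - before["cpu_idle"]
    cpu_util = 0.0 if total <= 0 else 100.0 * (total - idle) / total
    return {
        "cpu_util_pct": cpu_util,
        "ctxt_per_s": (after["ctxt"] - before["ctxt"]) / interval_s,
        "intr_per_s": (after["intr"] - before["intr"]) / interval_s,
    }


def read_snapshot(path="/proc/stat"):
    """Read and parse the live counters; only works where this file exists."""
    with open(path) as f:
        return parse_proc_stat(f.read())
```

On a Linux machine, calling read_snapshot(), waiting a known interval (for example with time.sleep), calling it again and passing both results to rates() gives CPU utilisation plus context-switch and interrupt rates for that window. A production setup would normally export such values to a monitoring system rather than compute them ad hoc.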
