Problem Statement
What factors should you consider when choosing a shard key? Explain the consequences of a poor shard key choice.
Explanation
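Choosing a shard key is one of the most critical decisions in sharding because it is difficult and expensive to change afterward. Older MongoDB versions required rebuilding the collection to change it; MongoDB 4.4 added the ability to refine a key by appending fields, and MongoDB 5.0 added live resharding, but both carry real operational cost. A good shard key must satisfy three main criteria: high cardinality, even distribution, and query pattern alignment.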
High cardinality means the shard key has many distinct values. This allows MongoDB to distribute data across many chunks and shards. A low-cardinality key, such as a status field with only three values, can produce at most three chunks, because all documents that share a key value must live in the same chunk. With more than three shards, the extra shards hold no data for that collection.
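The cardinality limit can be illustrated with a toy model. The sketch below is not MongoDB's balancer: it assumes each distinct key value forms one indivisible chunk and that chunks are placed round-robin. The function names and sample data are hypothetical.

```python
from collections import Counter

def max_useful_chunks(values):
    """Upper bound on chunk count: one chunk per distinct shard key value."""
    return len(set(values))

def shards_used(values, num_shards):
    """Toy placement: one indivisible chunk per distinct value, assigned round-robin.

    Returns the number of documents that land on each shard.
    """
    distinct = sorted(set(values))
    chunk_to_shard = {v: i % num_shards for i, v in enumerate(distinct)}
    load = Counter(chunk_to_shard[v] for v in values)
    return [load.get(s, 0) for s in range(num_shards)]

# Low cardinality: a status field with three values.
statuses = ["active", "inactive", "pending"] * 1000
# High cardinality: a unique user ID per document.
user_ids = list(range(3000))

low = shards_used(statuses, 6)   # only three of the six shards receive data
high = shards_used(user_ids, 6)  # all six shards receive data
print(low)
print(high)
```

In this model, adding shards beyond the number of distinct key values changes nothing for the low-cardinality key.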
Even distribution means queries and inserts are spread across all shards, avoiding hotspots. Under ranged sharding, monotonically increasing keys like timestamps or auto-incrementing IDs are poor choices because every new write lands in the chunk covering the highest key range, which lives on a single shard. This creates a write hotspot that defeats the purpose of sharding.
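The hotspot effect can be seen in a small simulation. This is a sketch under stated assumptions: the split points are hypothetical, and the hash is MD5 modulo the shard count as a stand-in. MongoDB uses its own hash function and range-partitions the hashed values, so only the overall shape of the result carries over.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4

def range_shard(key, split_points):
    """Ranged placement: the shard index is the key range the value falls in."""
    return bisect_right(split_points, key)

def hashed_shard(key, num_shards=NUM_SHARDS):
    """Stand-in for hashed placement (illustration only, not MongoDB's hash)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Assume existing keys 0..3999 were split evenly at these hypothetical points.
split_points = [1000, 2000, 3000]

# New inserts arrive with monotonically increasing keys.
new_keys = range(4000, 5000)

ranged_load = Counter(range_shard(k, split_points) for k in new_keys)
hashed_load = Counter(hashed_shard(k) for k in new_keys)

print(dict(ranged_load))  # every insert lands on the last shard
print(dict(hashed_load))  # inserts are spread across all shards
```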
Query pattern alignment means your most common queries should include the shard key. When queries include the shard key, mongos can route them directly to specific shards. Without the shard key in queries, mongos must broadcast to all shards, which is inefficient and slow.
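The routing difference can be sketched with a toy router. The chunk map, shard names, and `target_shards` function are hypothetical, and the model handles only equality matches on a single-field shard key; a real mongos also targets range predicates and compound-key prefixes.

```python
from bisect import bisect_right

# Hypothetical chunk map on "user_id": (-inf, 1000), [1000, 2000), [2000, +inf)
SPLIT_POINTS = [1000, 2000]
SHARDS = ["shardA", "shardB", "shardC"]

def target_shards(query, shard_key="user_id"):
    """Return the shards a router would contact for an equality query."""
    if shard_key in query:
        # Shard key present: route to the single shard owning that key range.
        return [SHARDS[bisect_right(SPLIT_POINTS, query[shard_key])]]
    # Shard key absent: the router cannot narrow the target, so it asks every shard.
    return list(SHARDS)

targeted = target_shards({"user_id": 1500, "status": "active"})
broadcast = target_shards({"status": "active"})

print(targeted)   # one shard
print(broadcast)  # all shards
```

The cost of the broadcast case grows with the number of shards, which is why a query pattern that omits the shard key scales poorly.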
Poor shard key choices lead to several problems. Uneven distribution causes some shards to fill up while others remain empty, wasting resources and limiting scalability. Hotspots concentrate all activity on one shard, creating bottlenecks. Scatter-gather queries that hit all shards are slow and resource-intensive.
A common strategy is using a compound shard key that combines a field with good distribution, like user ID, with a time-based field for efficient time-range queries. Another approach is using a hashed shard key for monotonically increasing values to ensure even distribution, at the cost of turning range queries on that field into scatter-gather operations.
Code Solution
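The two strategies described above, a compound key and a hashed key, could be expressed as `shardCollection` admin commands. The sketch below only builds the command documents and does not connect to a cluster. The namespaces `app.events` and `app.logs`, the field names, and the helper function are assumptions made for illustration.

```python
def shard_collection_command(namespace, key):
    """Build the document for MongoDB's shardCollection admin command.

    The command name must be the first key, which Python dicts preserve.
    """
    return {"shardCollection": namespace, "key": key}

# Compound key: user_id spreads writes, created_at supports time-range queries per user.
compound_cmd = shard_collection_command(
    "app.events", {"user_id": 1, "created_at": 1}
)

# Hashed key: spreads a monotonically increasing _id across shards.
hashed_cmd = shard_collection_command("app.logs", {"_id": "hashed"})

print(compound_cmd)
print(hashed_cmd)

# To apply against a real cluster (not run here), something like the following
# should work with PyMongo; the URI is a placeholder and must point at a mongos.
# Depending on the server version, sharding may first need to be enabled on the
# database.
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client.admin.command(compound_cmd)
```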