Problem Statement
Explain the concept of sharding in MongoDB and when you should implement it.
Explanation
Sharding is MongoDB's method for horizontal scaling by distributing data across multiple servers called shards. Each shard is a separate database that holds a portion of the data, and together all shards form the complete dataset. This allows MongoDB to handle datasets and workloads that exceed what a single server can support.
In a sharded cluster, data is partitioned based on a shard key, which is a field or fields that exist in every document. MongoDB divides the shard key values into chunks, and each chunk is assigned to a specific shard. As data grows, MongoDB automatically splits chunks and migrates them across shards to maintain balance.
You should implement sharding when your dataset exceeds the storage capacity of a single server, typically when approaching several hundred gigabytes or terabytes. Sharding is also appropriate when your read or write throughput exceeds what a single server or replica set can handle, even with adequate hardware.
However, sharding adds complexity to your deployment. You need config servers to store cluster metadata, mongos routers to direct queries, and multiple shard replica sets. Only implement sharding when you have exhausted other optimization options like indexing, vertical scaling, and replica set read distribution.
Before sharding, ensure your application is ready. Choose your shard key carefully because it cannot be changed after sharding without recreating the collection. A good shard key provides high cardinality, even distribution, and supports your most common query patterns.
Code Solution
SolutionRead Only
// When to shard:
// 1. Data size > single server storage (500GB-1TB+)
// 2. Working set > available RAM
// 3. Write throughput > single server capacity
// 4. Need to distribute data geographically
// Sharded cluster components:
// Config Servers: store metadata (replica set)
// Mongos Routers: query routing (multiple instances)
// Shards: store data (each is replica set)
// Basic sharding setup
sh.enableSharding("myDatabase")
sh.shardCollection("myDatabase.users", { userId: 1 })