Sharding provides powerful horizontal scaling but comes with significant limitations and challenges that you must consider before implementation.
First, operational complexity increases dramatically. You must deploy and manage config servers, mongos routers, and multiple shard replica sets. This is much more complex than managing a single replica set. Monitoring, backup, and maintenance procedures become more complicated.
Second, the shard key is immutable after sharding. Once you shard a collection with a specific shard key, you cannot change it without recreating the collection and migrating all data. Choosing the wrong shard key can cripple performance, and fixing it requires significant downtime and effort. This makes shard key selection a critical decision that must be made carefully.
Third, some operations are limited or inefficient in sharded clusters. Unique indexes can only be created on the shard key or fields that include the shard key. This restricts your ability to enforce uniqueness on other fields. Transactions across shards are possible but have performance implications. Aggregation pipelines may require merging results from multiple shards.
Fourth, scatter-gather queries that hit all shards are significantly slower than targeted queries. If your query patterns do not include the shard key, performance may actually be worse than an unsharded deployment. This makes query pattern analysis critical before sharding.
Fifth, balancing operations consume resources. Chunk migrations use network bandwidth, disk I/O, and CPU. During heavy migration periods, cluster performance can degrade. You can schedule balancing windows, but this adds management overhead.
Finally, sharding requires more hardware and infrastructure, increasing costs. You need at minimum three config servers, at least two mongos routers, and multiple shard replica sets. For small datasets, these costs outweigh the benefits.
Example code
// Shard key limitations
// Cannot change shard key after sharding
sh.shardCollection("db.users", { email: 1 })
// Later realize email is poor choice
// Must drop collection and reshhard - significant downtime
// Unique indexes require shard key
// Shard key: { userId: 1 }
db.users.createIndex({ email: 1 }, { unique: true })
// ERROR: cannot create unique index on field without shard key
// Must include shard key
db.users.createIndex({ userId: 1, email: 1 }, { unique: true })
// OK: includes shard key
// Scatter-gather queries are slow
db.users.find({ age: { $gt: 25 } }) // No shard key
// Queries all shards, slow with many shards
// Before sharding: consider alternatives
// - Better indexing
// - Vertical scaling
// - Read replicas for read distribution
// - Application-level caching