MongoDB/NoSQL Interview Questions & Answers | Preplance

Mobile App Now Available

Master Interviews
Anywhere, Anytime

Get the Preplance app for a seamless learning experience. Practice offline, get daily streaks, and stay ahead with real-time interview updates.

Get it on

Google Play

4.9/5 Rating on Store

MongoDB/NoSQL Interview Questions & Answers | Preplance | Preplance

41. What are the best practices for optimizing aggregation pipeline performance?

Difficulty: HardType: SubjectiveTopic: Aggregation

First, place dollar match stages as early as possible to filter documents before expensive operations. This reduces the data volume flowing through the pipeline. Use indexes for dollar match and dollar sort stages by ensuring indexed fields are used in these operations. Second, limit the data processed by using dollar project or dollar addFields to remove unnecessary fields early in the pipeline. Smaller documents flow faster through stages. Avoid using dollar lookup when possible, as joins are expensive. Consider embedding data instead of referencing. Third, use dollar limit immediately after dollar sort to prevent sorting the entire result set. MongoDB can optimize this combination to use a top-K sort algorithm, which is much faster than sorting everything then limiting. Fourth, use allowDiskUse option for large aggregations that exceed the memory limit of 100 MB. This allows MongoDB to write temporary files to disk, though it is slower than in-memory processing. Fifth, analyze your pipeline with explain method to see which stages are slow and whether indexes are being used. Sixth, for very large aggregations, consider using dollar merge or dollar out to write results to a collection, then query that collection. This is useful for reports that do not need real-time data.

Example Code

// Optimized pipeline
db.orders.aggregate([
  // 1. Filter early with indexed field
  { $match: { status: "completed", date: { $gte: ISODate("2024-01-01") } } },
  
  // 2. Project only needed fields
  { $project: { customerId: 1, amount: 1, _id: 0 } },
  
  // 3. Group and calculate
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  
  // 4. Sort with limit (uses top-K sort)
  { $sort: { total: -1 } },
  { $limit: 100 }
], {
  allowDiskUse: true  // For large datasets
})

// Create index to support pipeline
db.orders.createIndex({ status: 1, date: 1 })

50. Describe the replication architecture in MongoDB. How do secondary nodes stay synchronized with the primary?

Difficulty: HardType: SubjectiveTopic: Replication

MongoDB's replication architecture centers around the primary node, which is the only node accepting write operations. When a write occurs, the primary records it in the oplog, a special capped collection in the local database. The oplog stores operations in the order they are executed. Secondary nodes continuously query the primary's oplog for new entries. When new operations appear, secondaries copy these oplog entries and apply them to their own data files in the same order. This process is called replication. The operations in the oplog are idempotent, meaning they can be applied multiple times safely without changing the result. Secondaries maintain a cursor on the primary's oplog, tracking the timestamp of the last operation they applied. They fetch operations with timestamps greater than their last applied timestamp. This allows secondaries to catch up if they fall behind due to network issues or heavy load. Sometimes a secondary can replicate from another secondary instead of the primary, a process called chained replication. This reduces load on the primary and is enabled by default. However, data always originates from the primary through the oplog. If a secondary falls too far behind and the operations it needs have been overwritten in the oplog, it must perform an initial sync, copying all data from another member. This is why oplog sizing is important for production deployments.

Example Code

// Replication flow:
// 1. Write operation on primary
db.users.insertOne({ name: "Alice" })  // On primary

// 2. Primary writes to oplog
// local.oplog.rs: { ts: Timestamp(...), op: "i", ns: "db.users", o: {...} }

// 3. Secondaries query oplog
// Secondary runs: db.oplog.rs.find({ ts: { $gt: lastApplied } })

// 4. Secondaries apply operations
// Secondary executes same insert

// 5. Secondaries update their sync state
// Track timestamp of last applied operation

51. Explain how replica set elections work in MongoDB. What factors determine which secondary becomes the new primary?

Difficulty: MediumType: SubjectiveTopic: Replication

Replica set elections occur when the current primary becomes unavailable or steps down. The remaining replica set members automatically start an election to choose a new primary. During the election, members cannot accept writes, but the process typically completes within seconds. Each eligible voting member casts a vote for the member it believes should become primary. Members vote based on several factors in priority order. First, a member must have the most recent data. Members compare their oplog timestamps, and those with older data will not vote for themselves. Second, member priority affects elections. You can configure each member with a priority value from 0 to 1000, with default being 1. Members with higher priority are more likely to be elected. A member with priority 0 can never become primary and is useful for special-purpose nodes like analytics servers. Third, members must be able to reach a majority of voting members. In a three-member replica set, at least two members must be available to elect a new primary. This majority requirement prevents split-brain scenarios where multiple primaries could exist. Once a member receives votes from the majority of voting members, it becomes the new primary. Applications using replica set connections automatically discover the new primary and route write operations to it.

Example Code

// Configure member priorities
var cfg = rs.conf()

// Member 0: high priority, preferred primary
cfg.members[0].priority = 2

// Member 1: normal priority
cfg.members[1].priority = 1

// Member 2: priority 0, never becomes primary
cfg.members[2].priority = 0

rs.reconfig(cfg)

// During election:
// 1. Primary becomes unavailable
// 2. Secondaries detect loss of primary (heartbeat timeout)
// 3. Eligible members call for election
// 4. Members vote based on priority and data recency
// 5. Member with majority votes becomes new primary

52. What is the oplog and why is its size important? How do you determine the appropriate oplog size?

Difficulty: HardType: SubjectiveTopic: Replication

The oplog, or operations log, is a special capped collection in the local database that records all write operations on the primary node. It stores operations in chronological order with timestamps. Secondary nodes replicate by reading the oplog and applying these operations to their data. Oplog size is critical because it is a capped collection with a fixed maximum size. Once full, the oldest entries are overwritten to make room for new operations. If a secondary falls too far behind and the operations it needs have been overwritten, it cannot catch up through normal replication and must perform a full initial sync. Determining the right oplog size depends on your write volume and how long secondaries might be offline. The oplog should be large enough to hold at least 24 to 48 hours of operations, giving you time to address issues before a secondary falls too far behind. For deployments with high write rates or where maintenance might require nodes to be offline for extended periods, you need a larger oplog. You can calculate oplog size based on your write operations per hour. Monitor your oplog window, which is the time span of operations stored in the oplog. If the window is too short relative to your maintenance windows or replication lag, increase the oplog size. MongoDB allows resizing the oplog without downtime in replica sets.

Example Code

// Check current oplog size and window
use local
db.oplog.rs.stats(1024*1024)  // Size in MB

// Check oplog time window
var first = db.oplog.rs.find().sort({$natural:1}).limit(1).next()
var last = db.oplog.rs.find().sort({$natural:-1}).limit(1).next()
var window = (last.ts.getTime() - first.ts.getTime()) / 1000 / 3600
print("Oplog window: " + window + " hours")

// Resize oplog (requires replSetResizeOplog command)
db.adminCommand({
  replSetResizeOplog: 1,
  size: 16000  // Size in MB
})

53. Explain write concerns in MongoDB. What is the difference between w:1, w:'majority', and w:'all'?

Difficulty: MediumType: SubjectiveTopic: Read Write

Write concern in MongoDB defines the level of acknowledgment requested from MongoDB for write operations. It determines how many replica set members must acknowledge a write before the operation returns success. This balances durability against performance. Write concern w 1 is the default, requiring acknowledgment only from the primary node. The write is considered successful once the primary has written the data, even before it replicates to secondaries. This is fast but provides the least durability. If the primary fails immediately after acknowledging but before replication, the write could be lost. Write concern w majority requires acknowledgment from the majority of replica set members, including the primary. For a three-member replica set, this means at least two nodes must acknowledge the write. This provides strong durability because the data exists on multiple nodes before success is returned. If the primary fails, the data is safe because it has been replicated to at least one secondary. Write concern w all requires acknowledgment from all replica set members. This provides maximum durability but is rarely used because it is slow and makes the system sensitive to any member being unavailable. If any member is down, writes with w all will fail or timeout. You can also specify wtimeout to limit how long to wait for the requested write concern. If the timeout is reached before enough members acknowledge, the write may still succeed on the primary, but the operation returns an error.

Example Code

// w:1 - primary only (default, fastest)
db.orders.insertOne(
  { item: "book", qty: 1 },
  { writeConcern: { w: 1 } }
)

// w:'majority' - majority of members (durable)
db.orders.insertOne(
  { item: "book", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// w:'all' - all members (slowest, most durable)
db.orders.insertOne(
  { item: "book", qty: 1 },
  { writeConcern: { w: "all", wtimeout: 5000 } }
)

54. What are read concerns in MongoDB and how do they ensure data consistency?

Difficulty: MediumType: SubjectiveTopic: Read Write

Read concern controls the consistency and isolation properties of data read from MongoDB. It determines what data is visible to a read operation, especially in replica sets where data may not be immediately consistent across all members. Read concern local returns the most recent data available on the queried node, without guarantees that the data has been replicated. This is the default and provides the fastest reads but may return data that could be rolled back if the primary fails before replication. Read concern majority returns data that has been acknowledged by the majority of replica set members. This ensures you never read data that could be rolled back during failover. It provides strong consistency guarantees but may return slightly older data if the most recent writes have not yet replicated to the majority. Read concern linearizable provides read-your-writes consistency for single-document reads. It guarantees that read operations return data that reflects all successful writes that completed before the read started. This is useful for scenarios requiring strict consistency but has performance implications. Read concern snapshot provides point-in-time consistency for multi-document reads within transactions. All reads in the transaction see a consistent snapshot of data as it existed at the transaction start time. Choose read concern based on your consistency requirements. Use majority for critical reads where data accuracy is essential. Use local for better performance when eventual consistency is acceptable.

Example Code

// Read concern 'local' - fastest, may return rollback data
db.inventory.find({ qty: { $gt: 0 } })
  .readConcern("local")

// Read concern 'majority' - durable reads, slower
db.inventory.find({ qty: { $gt: 0 } })
  .readConcern("majority")

// Read concern 'linearizable' - strongest consistency
db.inventory.findOne({ _id: 1 })
  .readConcern("linearizable")

// In transactions with snapshot isolation
const session = client.startSession()
session.startTransaction({
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" }
})

55. Walk through what happens during a replica set failover when the primary node fails.

Difficulty: HardType: SubjectiveTopic: Replication

When the primary node fails, a coordinated failover process begins automatically. First, the secondary nodes detect that the primary is unreachable through the absence of heartbeat messages. Replica set members send heartbeat pings every two seconds, and if a member does not respond within 10 seconds, it is marked as unreachable. Second, once secondaries confirm the primary is down, they initiate an election. Any secondary can call for an election if it detects the primary is unavailable. The election process involves members voting for which secondary should become the new primary. Third, during the election, members evaluate candidates based on priority settings, data recency, and availability. Members with outdated data or lower priority will not receive votes. The election must reach a majority consensus, so in a three-member replica set, at least two members must vote for the same candidate. Fourth, once a candidate receives votes from the majority of members, it transitions to the primary role and begins accepting write operations. The entire election process typically completes in a few seconds, though it can take up to 30 seconds in some scenarios. Fifth, applications using MongoDB replica set connection strings automatically discover the new primary through the connection driver. The driver maintains a list of replica set members and can redirect operations to the new primary without manual intervention. There may be a brief period where write operations fail during the election, so applications should implement retry logic. Finally, when the failed original primary recovers and rejoins the replica set, it becomes a secondary and syncs data from the new primary.

Example Code

// Timeline of failover:
// Time 0: Primary node crashes
// Time 2-10s: Secondaries detect primary unreachable (heartbeat timeout)
// Time 10s: Secondary initiates election
// Time 10-15s: Election process, members vote
// Time 15s: New primary elected, begins accepting writes
// Time 16s: Applications reconnect to new primary

// Application connection with automatic failover
const client = new MongoClient(
  'mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=myRS',
  {
    retryWrites: true,  // Automatic retry on failover
    w: 'majority'        // Wait for replication
  }
)

// After failover completes
// Old primary (when recovered): joins as secondary
// New primary: handles all writes
// Other secondary: continues replicating

63. Explain the concept of sharding in MongoDB and when you should implement it.

Difficulty: MediumType: SubjectiveTopic: Sharding

Sharding is MongoDB's method for horizontal scaling by distributing data across multiple servers called shards. Each shard is a separate database that holds a portion of the data, and together all shards form the complete dataset. This allows MongoDB to handle datasets and workloads that exceed what a single server can support. In a sharded cluster, data is partitioned based on a shard key, which is a field or fields that exist in every document. MongoDB divides the shard key values into chunks, and each chunk is assigned to a specific shard. As data grows, MongoDB automatically splits chunks and migrates them across shards to maintain balance. You should implement sharding when your dataset exceeds the storage capacity of a single server, typically when approaching several hundred gigabytes or terabytes. Sharding is also appropriate when your read or write throughput exceeds what a single server or replica set can handle, even with adequate hardware. However, sharding adds complexity to your deployment. You need config servers to store cluster metadata, mongos routers to direct queries, and multiple shard replica sets. Only implement sharding when you have exhausted other optimization options like indexing, vertical scaling, and replica set read distribution. Before sharding, ensure your application is ready. Choose your shard key carefully because it cannot be changed after sharding without recreating the collection. A good shard key provides high cardinality, even distribution, and supports your most common query patterns.

Example Code

// When to shard:
// 1. Data size > single server storage (500GB-1TB+)
// 2. Working set > available RAM
// 3. Write throughput > single server capacity
// 4. Need to distribute data geographically

// Sharded cluster components:
// Config Servers: store metadata (replica set)
// Mongos Routers: query routing (multiple instances)
// Shards: store data (each is replica set)

// Basic sharding setup
sh.enableSharding("myDatabase")
sh.shardCollection("myDatabase.users", { userId: 1 })

64. What factors should you consider when choosing a shard key? Explain the consequences of a poor shard key choice.

Difficulty: HardType: SubjectiveTopic: Shard Key

Choosing a shard key is one of the most critical decisions in sharding because it cannot be changed after sharding without rebuilding the collection. A good shard key must satisfy three main criteria: high cardinality, even distribution, and query pattern alignment. High cardinality means the shard key has many distinct values. This allows MongoDB to distribute data across many chunks and shards. Low cardinality keys, like a status field with only three values, can only create three chunks maximum, preventing effective distribution across many shards. Even distribution means queries and inserts are spread across all shards, avoiding hotspots. Monotonically increasing keys like timestamps or auto-incrementing IDs are poor choices because all new writes go to the highest chunk on one shard. This creates a write hotspot that defeats the purpose of sharding. Query pattern alignment means your most common queries should include the shard key. When queries include the shard key, mongos can route them directly to specific shards. Without the shard key in queries, mongos must broadcast to all shards, which is inefficient and slow. Poor shard key choices lead to several problems. Uneven distribution causes some shards to fill up while others remain empty, wasting resources and limiting scalability. Hotspots concentrate all activity on one shard, creating bottlenecks. Scatter-gather queries that hit all shards are slow and resource-intensive. A common strategy is using a compound shard key that combines a field with good distribution, like user ID, with a time-based field for efficient time-range queries. Another approach is using hashed shard keys for monotonically increasing values to ensure even distribution.

Example Code

// Good shard key: compound with high cardinality
sh.shardCollection("db.orders", { userId: 1, orderDate: 1 })
// Pros: high cardinality, supports user queries, time-based queries
// Query: db.orders.find({ userId: 123, orderDate: {...} })
// Routes to specific shard

// Poor shard key: low cardinality
sh.shardCollection("db.orders", { status: 1 })
// Cons: only 3-4 values (pending, completed, cancelled)
// Cannot distribute beyond 3-4 chunks

// Poor shard key: monotonically increasing
sh.shardCollection("db.orders", { _id: 1 })
// Cons: all new writes to one shard (hotspot)
// Better: { _id: "hashed" }

// Queries without shard key (slow)
db.orders.find({ productId: 456 })
// Mongos broadcasts to all shards

65. Describe the components of a MongoDB sharded cluster and how they work together.

Difficulty: HardType: SubjectiveTopic: Sharding

A MongoDB sharded cluster consists of three main components: shards, config servers, and mongos routers. Each component plays a specific role in distributing data and routing queries. Shards are the actual data stores in a sharded cluster. Each shard is typically deployed as a replica set to provide high availability and data redundancy. Shards hold subsets of the data based on the shard key ranges assigned to them. For example, one shard might hold documents with user IDs from 1 to 10000, while another holds 10001 to 20000. Config servers store metadata about the cluster configuration. This includes which shards exist, what chunks of data each shard contains, and the ranges of shard key values in each chunk. Config servers must be deployed as a replica set for high availability because they are critical to cluster operations. If config servers are unavailable, the cluster cannot route queries or perform administrative operations, though existing connections continue to work. Mongos routers are the query routing layer that applications connect to. Applications never connect directly to shards. Instead, they connect to mongos instances, which appear as normal MongoDB servers. When mongos receives a query, it consults the config servers to determine which shards contain relevant data. For targeted queries that include the shard key, mongos routes to specific shards. For scatter-gather queries without the shard key, mongos broadcasts to all shards and merges results. The balancer is a background process that monitors chunk distribution across shards. When it detects imbalance, it migrates chunks from heavily loaded shards to less loaded ones. This ensures even distribution as data grows. Together, these components enable transparent horizontal scaling. Applications interact with mongos routers as if querying a single database, while data is distributed across multiple shards for scalability and performance.

Example Code

// Sharded cluster architecture:
//
// Application
//     ↓
// Mongos (Query Router) ← queries config servers for metadata
//     ↓
// Config Servers (Metadata) - replica set
//     ↓
// Shards (Data Storage) - each is replica set
//   Shard 1: userId 1-10000
//   Shard 2: userId 10001-20000
//   Shard 3: userId 20001-30000

// Query flow:
// 1. App sends: db.users.find({ userId: 15000 })
// 2. Mongos checks config: "userId 15000 is on Shard 2"
// 3. Mongos routes query to Shard 2 only
// 4. Shard 2 returns results
// 5. Mongos returns to application

66. Explain the difference between horizontal and vertical scaling. Why does MongoDB prefer horizontal scaling?

Difficulty: MediumType: SubjectiveTopic: Scalability

Vertical scaling means adding more resources to a single server, such as upgrading CPU, adding more RAM, or using faster storage. It is the simpler approach because your architecture remains unchanged; you just use more powerful hardware. However, vertical scaling has hard limits. There is a maximum amount of RAM, CPU, and storage you can add to a single machine, and costs increase exponentially at the high end. Horizontal scaling means adding more servers to distribute the load across multiple machines. Instead of one powerful server, you use many commodity servers working together. MongoDB implements horizontal scaling through sharding, where data is partitioned across multiple shards. MongoDB prefers horizontal scaling for several reasons. First, it has virtually unlimited capacity. You can add shards indefinitely as your data and workload grow. There is no practical upper limit like there is with vertical scaling. Second, it is more cost-effective. Adding commodity servers is cheaper than buying enterprise-grade high-end hardware. Third, horizontal scaling provides better fault tolerance. With data distributed across multiple servers, the failure of one shard does not take down the entire system. When combined with replica sets on each shard, you have both high availability and horizontal scalability. Fourth, it enables geographic distribution. You can place shards in different data centers or regions to reduce latency for users in different locations. However, horizontal scaling adds complexity. You need to manage multiple servers, choose good shard keys, and handle distributed queries. Vertical scaling remains viable for smaller deployments or when your data fits on a powerful single server. Many deployments use both strategies: vertical scaling for individual servers and horizontal scaling across multiple shards.

Example Code

// Vertical scaling (single server)
// Year 1: 16GB RAM, 4 cores, 500GB storage
// Year 2: 64GB RAM, 16 cores, 2TB storage
// Year 3: 256GB RAM, 32 cores, 8TB storage
// Eventually: Cannot scale further, very expensive

// Horizontal scaling (sharding)
// Year 1: 3 shards, 16GB RAM each
// Year 2: 6 shards, 16GB RAM each
// Year 3: 12 shards, 16GB RAM each
// Can continue adding shards indefinitely

// Sharding setup for horizontal scaling
sh.addShard("shard1/host1:27017")
sh.addShard("shard2/host2:27017")
sh.addShard("shard3/host3:27017")
// Add more shards as needed

67. What are chunks in MongoDB sharding and how does chunk splitting and migration work?

Difficulty: HardType: SubjectiveTopic: Sharding

Chunks are logical groupings of documents based on shard key ranges. MongoDB divides the shard key space into chunks, and each chunk contains documents whose shard key values fall within a specific range. For example, one chunk might contain all documents with user IDs from 1 to 1000, while another contains 1001 to 2000. By default, each chunk has a maximum size of 64 MB. When a chunk grows beyond this size due to inserts or updates, MongoDB automatically splits it into two smaller chunks. Chunk splitting is a metadata operation that updates the config servers; no data is moved during a split. The split point is chosen to divide the chunk into roughly equal sizes. After chunks are split, the balancer monitors the distribution of chunks across shards. If one shard has significantly more chunks than another, the balancer migrates chunks from the heavily loaded shard to less loaded shards. Chunk migration involves copying documents from the source shard to the destination shard, then updating metadata on config servers to reflect the new location. During migration, the chunk being migrated remains on the source shard and continues serving queries until migration completes. Once all documents are copied and verified, the metadata is updated atomically. This ensures that chunk migrations are transparent to applications, though they do consume resources like network bandwidth and disk I/O. The balancer can be configured to run only during specific time windows to avoid impacting production workloads. You can also manually split chunks or move them if needed for special circumstances. However, automatic chunk management works well for most deployments. Chunk size affects migration frequency and efficiency. Larger chunks mean fewer but longer migrations. Smaller chunks mean more frequent but faster migrations. The default 64 MB is a good balance for most workloads.

Example Code

// View chunks for a collection
use config
db.chunks.find({ ns: "mydb.users" }).pretty()

// Example chunk
{
  _id: "mydb.users-userId_1000",
  ns: "mydb.users",
  min: { userId: 1000 },
  max: { userId: 2000 },
  shard: "shard0001"
}

// Chunk lifecycle:
// 1. Chunk grows beyond 64MB
// 2. MongoDB splits chunk at midpoint
//    Chunk A: userId 1000-1500
//    Chunk B: userId 1501-2000
// 3. Balancer detects imbalance
// 4. Balancer migrates Chunk B to another shard

// Manual chunk split (rarely needed)
sh.splitAt("mydb.users", { userId: 5000 })

// Change chunk size
use config
db.settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 128 } },  // 128MB chunks
  { upsert: true }
)

68. Explain the difference between targeted queries and broadcast queries in a sharded cluster. How does this affect performance?

Difficulty: MediumType: SubjectiveTopic: Sharding

In a sharded cluster, queries fall into two categories based on whether they include the shard key: targeted queries and broadcast queries. The difference significantly impacts performance. Targeted queries include the shard key in the query filter. When mongos receives a targeted query, it can determine exactly which shard or shards contain the relevant data by checking the config servers. Mongos then routes the query only to those specific shards. For example, if you query for a specific user ID and user ID is your shard key, mongos routes to only the shard containing that user ID range. This is very efficient because only one shard needs to process the query. Broadcast queries do not include the shard key in the filter. Without the shard key, mongos cannot determine which shards contain relevant data, so it must broadcast the query to all shards. Each shard processes the query independently and returns results to mongos, which then merges the results before returning them to the application. This is much slower because all shards must be queried, network traffic multiplies, and mongos must merge potentially large result sets. The performance impact is substantial. Targeted queries scale linearly; adding more shards does not slow them down because each query still hits only specific shards. Broadcast queries get slower as you add shards because more shards must be queried. In a ten-shard cluster, a broadcast query does ten times the work of querying a single shard. To optimize performance, design your shard key and queries so that common operations are targeted. Include the shard key in query filters whenever possible. For queries that cannot include the shard key, consider using covered indexes on each shard to make the broadcast queries more efficient. Monitor your query patterns using profiling and explain to identify broadcast queries that can be optimized.

Example Code

// Shard key: { userId: 1 }

// Targeted query - includes shard key
db.orders.find({ userId: 12345, status: "completed" })
// Mongos knows userId 12345 is on Shard 2
// Routes to Shard 2 only - FAST

// Broadcast query - no shard key
db.orders.find({ status: "completed" })
// Mongos doesn't know which shards have completed orders
// Queries all shards, merges results - SLOW

// Explain shows broadcast
db.orders.find({ status: "completed" }).explain()
// Shows: SHARD_MERGE stage (broadcast to all shards)

// Best practice: include shard key
db.orders.find({
  userId: { $in: [123, 456, 789] },  // Shard key
  status: "completed"
})
// Targeted to specific shards - FAST

69. What is zone sharding in MongoDB and when would you use it?

Difficulty: HardType: SubjectiveTopic: Sharding

Zone sharding, also called tag-aware sharding, allows you to control which shards store specific ranges of data based on shard key values. You create zones, associate shards with zones, and define shard key ranges for each zone. The balancer then ensures chunks falling within zone ranges are migrated to shards associated with those zones. Zone sharding is useful for several scenarios. First, geographic data distribution where you want to keep data close to users. For example, you can create US zone and EU zone, assign shards in US data centers to US zone and shards in EU data centers to EU zone, then define ranges like user IDs starting with 1 for US and 2 for EU. Second, data tiering based on access patterns. You might have hot data that is frequently accessed and cold data that is rarely accessed. Create hot zone on fast SSD-backed shards and cold zone on cheaper HDD-backed shards, then assign recent data to hot zone and old data to cold zone based on date ranges. Third, multi-tenancy where you want to isolate different customers on different hardware. Create zones for different customer tiers, like premium zone on powerful hardware and standard zone on regular hardware, then assign customers to appropriate zones. To implement zone sharding, first add shards to zones using sh.addShardToZone. Then define the ranges of shard key values for each zone using sh.updateZoneKeyRange. The balancer automatically migrates chunks to respect zone boundaries. This happens gradually and transparently. Zone sharding adds complexity because you must manage zone definitions and shard assignments. Use it only when you have clear requirements for data placement. For most applications, automatic chunk distribution without zones is sufficient.

Example Code

// Geographic zone sharding

// 1. Add shards to zones
sh.addShardToZone("shard-us-west", "US")
sh.addShardToZone("shard-us-east", "US")
sh.addShardToZone("shard-eu-west", "EU")

// 2. Define shard key ranges for zones
sh.updateZoneKeyRange(
  "mydb.users",
  { userId: 1000000, country: "US" },
  { userId: 1999999, country: "US" },
  "US"
)

sh.updateZoneKeyRange(
  "mydb.users",
  { userId: 2000000, country: "EU" },
  { userId: 2999999, country: "EU" },
  "EU"
)

// 3. Balancer migrates chunks to appropriate shards
// US users' data stays on US shards
// EU users' data stays on EU shards

// Result: reduced latency for users
// US user queries hit US shards (low latency)
// EU user queries hit EU shards (low latency)

70. What are the limitations and challenges of sharding in MongoDB?

Difficulty: MediumType: SubjectiveTopic: Sharding

Sharding provides powerful horizontal scaling but comes with significant limitations and challenges that you must consider before implementation. First, operational complexity increases dramatically. You must deploy and manage config servers, mongos routers, and multiple shard replica sets. This is much more complex than managing a single replica set. Monitoring, backup, and maintenance procedures become more complicated. Second, the shard key is immutable after sharding. Once you shard a collection with a specific shard key, you cannot change it without recreating the collection and migrating all data. Choosing the wrong shard key can cripple performance, and fixing it requires significant downtime and effort. This makes shard key selection a critical decision that must be made carefully. Third, some operations are limited or inefficient in sharded clusters. Unique indexes can only be created on the shard key or fields that include the shard key. This restricts your ability to enforce uniqueness on other fields. Transactions across shards are possible but have performance implications. Aggregation pipelines may require merging results from multiple shards. Fourth, scatter-gather queries that hit all shards are significantly slower than targeted queries. If your query patterns do not include the shard key, performance may actually be worse than an unsharded deployment. This makes query pattern analysis critical before sharding. Fifth, balancing operations consume resources. Chunk migrations use network bandwidth, disk I/O, and CPU. During heavy migration periods, cluster performance can degrade. You can schedule balancing windows, but this adds management overhead. Finally, sharding requires more hardware and infrastructure, increasing costs. You need at minimum three config servers, at least two mongos routers, and multiple shard replica sets. For small datasets, these costs outweigh the benefits.

Example Code

// Shard key limitations

// Cannot change shard key after sharding
sh.shardCollection("db.users", { email: 1 })
// Later realize email is poor choice
// Must drop collection and reshhard - significant downtime

// Unique indexes require shard key
// Shard key: { userId: 1 }
db.users.createIndex({ email: 1 }, { unique: true })
// ERROR: cannot create unique index on field without shard key

// Must include shard key
db.users.createIndex({ userId: 1, email: 1 }, { unique: true })
// OK: includes shard key

// Scatter-gather queries are slow
db.users.find({ age: { $gt: 25 } })  // No shard key
// Queries all shards, slow with many shards

// Before sharding: consider alternatives
// - Better indexing
// - Vertical scaling
// - Read replicas for read distribution
// - Application-level caching

77. Explain multi-document transactions in MongoDB. How do they ensure ACID properties and what are their limitations?

Difficulty: HardType: SubjectiveTopic: Transactions

Multi-document transactions in MongoDB allow you to execute multiple operations across multiple documents and collections atomically, ensuring ACID properties: Atomicity, Consistency, Isolation, and Durability. Either all operations in a transaction succeed and are committed, or all fail and are rolled back. Atomicity means all operations in the transaction complete successfully or none do. If any operation fails, all previous operations are rolled back. Consistency ensures the database moves from one valid state to another. Isolation means concurrent transactions do not interfere with each other. Durability guarantees committed transactions survive system failures. To use transactions, you start a session, begin a transaction, perform operations passing the session to each operation, and then commit or abort the transaction. MongoDB uses write-ahead logging and the oplog to implement transaction guarantees. During a transaction, changes are not visible to other operations until commit. Transactions have important limitations. First, they are only available on replica sets and sharded clusters, not standalone instances. Second, they have a 60-second default timeout and are subject to the 16 MB oplog entry size limit. Third, transactions that affect multiple shards have additional overhead. Fourth, transactions should be kept short because they hold locks on documents. Long-running transactions can block other operations and affect performance. Fifth, DDL operations like creating indexes cannot be performed inside transactions. Finally, transactions add latency compared to individual operations, so use them only when atomicity across multiple documents is truly required.

Example Code

// Bank transfer example - ACID transaction
const session = client.startSession()

try {
  session.startTransaction({
    readConcern: { level: 'snapshot' },
    writeConcern: { w: 'majority' }
  })
  
  // Deduct from account 1
  const result1 = await db.accounts.updateOne(
    { _id: 'account1', balance: { $gte: 100 } },
    { $inc: { balance: -100 } },
    { session }
  )
  
  if (result1.modifiedCount === 0) {
    throw new Error('Insufficient funds')
  }
  
  // Add to account 2
  await db.accounts.updateOne(
    { _id: 'account2' },
    { $inc: { balance: 100 } },
    { session }
  )
  
  // Record transaction
  await db.transfers.insertOne(
    { from: 'account1', to: 'account2', amount: 100 },
    { session }
  )
  
  // Commit - all or nothing
  await session.commitTransaction()
  
} catch (error) {
  // Rollback all changes
  await session.abortTransaction()
  throw error
} finally {
  session.endSession()
}

78. What are the key differences between WiredTiger and MMAPv1 storage engines? Why was WiredTiger introduced?

Difficulty: MediumType: SubjectiveTopic: Storage

WiredTiger and MMAPv1 are two storage engines in MongoDB with significant differences in architecture and performance characteristics. MMAPv1 was the original storage engine but is now deprecated, while WiredTiger is the default since MongoDB 3.2. The primary difference is concurrency control. MMAPv1 uses collection-level locking, meaning only one write operation can occur per collection at a time. This creates bottlenecks in write-heavy workloads. WiredTiger uses document-level locking, allowing multiple write operations to different documents in the same collection simultaneously. This dramatically improves write throughput. Second, WiredTiger supports compression of both data and indexes using snappy or zlib algorithms. This reduces storage requirements by 50-80 percent and improves I/O performance because less data needs to be read from disk. MMAPv1 does not support compression. Third, WiredTiger uses checkpoints and write-ahead logging for crash recovery and durability. Checkpoints are snapshots of data written to disk every 60 seconds, while the journal records operations between checkpoints. This provides better recovery guarantees than MMAPv1's journaling. Fourth, WiredTiger uses a cache to hold frequently accessed data in memory. The cache size is configurable and defaults to 50 percent of RAM minus 1 GB. MMAPv1 relied on the operating system's file system cache, giving less control over memory usage. WiredTiger was introduced to address MMAPv1's limitations in write performance, storage efficiency, and concurrency. It provides better performance for most workloads, especially write-heavy applications, and is now the only supported storage engine in modern MongoDB versions.

Example Code

// WiredTiger configuration
storage:
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4  // Internal cache size
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy  // Data compression
    indexConfig:
      prefixCompression: true  // Index compression

// WiredTiger features:
// - Document-level locking (high write concurrency)
// - Compression (60% space savings typical)
// - Checkpoints every 60 seconds
// - Write-ahead logging (journal)
// - Configurable cache

// MMAPv1 (deprecated):
// - Collection-level locking (write bottleneck)
// - No compression
// - Power-of-2 sized allocations (wasted space)
// - Relies on OS file system cache

79. Explain geospatial indexes in MongoDB. What are the differences between 2d and 2dsphere indexes?

Difficulty: MediumType: SubjectiveTopic: Indexing

Geospatial indexes in MongoDB enable efficient queries on location data, allowing you to find documents based on geographic coordinates. MongoDB supports two types of geospatial indexes: 2d for flat geometries and 2dsphere for spherical geometries. 2d indexes are designed for flat, two-dimensional coordinate systems like maps. They work with legacy coordinate pairs represented as arrays of two numbers. 2d indexes support queries within rectangles, circles, and polygons on a flat plane. Use 2d indexes when your data represents points on a flat map, such as game positions or floor plans. 2dsphere indexes work with spherical geometries that model the Earth's surface. They use GeoJSON format to represent points, lines, and polygons considering the Earth as an oblate spheroid. This provides accurate distance calculations for geographic coordinates like GPS locations. 2dsphere indexes support queries like finding locations within a certain distance, within a polygon, or along a route. GeoJSON format specifies geometry types and coordinates. A point is defined with type Point and coordinates as longitude then latitude. A polygon is defined with type Polygon and an array of coordinate arrays forming closed rings. MongoDB provides geospatial query operators like dollar near for proximity queries, dollar geoWithin for area queries, and dollar geoIntersects for intersection queries. Geospatial queries are essential for location-based applications like finding nearby restaurants, delivery routing, ride-sharing services, or real estate searches. The 2dsphere index is recommended for most geographic applications because it accounts for the Earth's curvature and provides more accurate results.

Example Code

// Create 2dsphere index for GPS coordinates
db.places.createIndex({ location: "2dsphere" })

// Insert location using GeoJSON
db.places.insertOne({
  name: "Coffee Shop",
  location: {
    type: "Point",
    coordinates: [-73.97, 40.77]  // [longitude, latitude]
  }
})

// Find places near a point (within 1000 meters)
db.places.find({
  location: {
    $near: {
      $geometry: {
        type: "Point",
        coordinates: [-73.98, 40.76]
      },
      $maxDistance: 1000  // meters
    }
  }
})

// Find places within a polygon
db.places.find({
  location: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [[[-74, 40], [-73, 40], [-73, 41], [-74, 41], [-74, 40]]]
      }
    }
  }
})

80. What are the different backup and recovery strategies for MongoDB? Compare their advantages and disadvantages.

Difficulty: HardType: SubjectiveTopic: Backup

MongoDB offers several backup and recovery strategies, each with different trade-offs between consistency, performance impact, and ease of use. Choosing the right strategy depends on your recovery time objectives, data size, and operational requirements. First, mongodump and mongorestore are utility tools that create logical backups by reading documents and writing them to BSON files. Advantages include simplicity, platform independence, and ability to backup specific databases or collections. Disadvantages are that they are slow for large datasets, do not capture a point-in-time snapshot by default, and require significant I/O resources. Second, file system snapshots create binary copies of data files at the storage layer. Cloud providers and storage systems offer snapshot features that create consistent copies almost instantly. Advantages include speed, minimal performance impact, and point-in-time consistency. Disadvantages are dependency on specific storage systems and typically requiring the entire data directory. Third, MongoDB Cloud Manager and Ops Manager provide continuous backup with point-in-time recovery. They take periodic snapshots and capture oplog entries between snapshots, allowing you to restore to any point in time. Advantages include automated scheduling, point-in-time recovery, and minimal manual intervention. Disadvantages are cost for cloud services and complexity for on-premises Ops Manager. Fourth, delayed replica set members can serve as a live backup. Configure a secondary with a replication delay of several hours. If data corruption or accidental deletion occurs, you can recover from the delayed member before it replicates the problem. Advantages include continuous availability and simple recovery. Disadvantages are resource overhead and limited recovery window. Best practices include testing your backup and recovery procedures regularly, storing backups in a different location or region, encrypting sensitive backups, and documenting recovery procedures. Combine multiple strategies for defense in depth.

Example Code

// Method 1: mongodump and mongorestore
// Backup entire database
mongodump --host=localhost --port=27017 --db=mydb --out=/backup/

// Restore from backup
mongorestore --host=localhost --port=27017 --db=mydb /backup/mydb/

// Method 2: File system snapshot (example with LVM)
lvcreate --size 10G --snapshot --name mongodb-snap /dev/vg0/mongodb
tar czf /backup/mongodb-snap.tar.gz /mnt/mongodb-snap
lvremove /dev/vg0/mongodb-snap

// Method 3: Delayed replica set member
// Configure secondary with 4-hour delay
var cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].hidden = true
cfg.members[2].slaveDelay = 14400  // 4 hours in seconds
rs.reconfig(cfg)

// Recovery from delayed member:
// 1. Stop delayed secondary
// 2. Copy its data files
// 3. Restore primary from these files

// Method 4: Continuous backup (conceptual)
// MongoDB Atlas automated backups:
// - Snapshots every 6-24 hours
// - Oplog between snapshots
// - Point-in-time restore to any second

81. What security measures should you implement for a production MongoDB deployment?

Difficulty: HardType: SubjectiveTopic: Security

Securing a production MongoDB deployment requires multiple layers of protection covering authentication, authorization, network security, encryption, and auditing. Following security best practices is essential to protect sensitive data and prevent unauthorized access. First, enable authentication and require all clients to authenticate before accessing the database. MongoDB supports various authentication mechanisms including SCRAM-SHA-256, LDAP, Kerberos, and x.509 certificates. Never run MongoDB without authentication in production. Create users with strong passwords and rotate credentials regularly. Second, implement role-based access control with the principle of least privilege. Create custom roles that grant only the minimum permissions needed for each user or application. MongoDB provides built-in roles like read, readWrite, dbAdmin, and userAdmin, but custom roles offer fine-grained control. Regularly audit user permissions and remove unnecessary access. Third, secure network access using firewalls, VPNs, or private networks. Bind MongoDB to specific network interfaces rather than all interfaces, and use IP whitelisting to restrict which hosts can connect. Never expose MongoDB directly to the internet without proper network security. Fourth, enable encryption both at rest and in transit. Use TLS/SSL for client connections and inter-node communication in replica sets and sharded clusters. Enable encryption at rest to protect data files from unauthorized access if physical media is compromised. MongoDB Enterprise edition provides native encryption at rest. Fifth, enable auditing to track database access and changes. Audit logs help detect suspicious activity, comply with regulations, and investigate security incidents. MongoDB Enterprise edition provides comprehensive auditing features. Sixth, keep MongoDB and drivers updated with the latest security patches. Follow MongoDB security advisories and apply patches promptly. Seventh, implement backup encryption and secure backup storage. Finally, use security scanning tools to detect vulnerabilities and misconfigurations.

Example Code

// 1. Enable authentication
// Start MongoDB with --auth flag
mongod --auth --port 27017 --dbpath /data/db

// Create admin user
use admin
db.createUser({
  user: "admin",
  pwd: "SecurePassword123!",
  roles: [{ role: "userAdminAnyDatabase", db: "admin" }]
})

// 2. Create application user with limited permissions
use myapp
db.createUser({
  user: "appUser",
  pwd: "AppPassword456!",
  roles: [
    { role: "readWrite", db: "myapp" }
  ]
})

// 3. Custom role with specific permissions
use admin
db.createRole({
  role: "reportsReader",
  privileges: [
    {
      resource: { db: "analytics", collection: "reports" },
      actions: ["find"]
    }
  ],
  roles: []
})

// 4. Network security in config file
net:
  bindIp: 127.0.0.1,10.0.0.5  // Specific IPs only
  port: 27017
  tls:
    mode: requireTLS
    certificateKeyFile: /path/to/cert.pem
    CAFile: /path/to/ca.pem

// 5. Enable encryption at rest (Enterprise)
security:
  enableEncryption: true
  encryptionKeyFile: /path/to/keyfile

// 6. Enable auditing (Enterprise)
auditLog:
  destination: file
  format: JSON
  path: /var/log/mongodb/audit.json

82. How do you monitor and troubleshoot MongoDB performance issues in production?

Difficulty: HardType: SubjectiveTopic: Monitoring

Monitoring and troubleshooting MongoDB performance requires a systematic approach using built-in tools, metrics analysis, and query optimization techniques. Proactive monitoring helps identify issues before they impact users. First, use MongoDB's built-in monitoring tools. The db.serverStatus command provides comprehensive metrics including connections, memory usage, operations, network traffic, and storage statistics. The db.currentOp command shows currently running operations, helping identify long-running queries. The mongostat utility displays real-time statistics on operations, connections, and resource usage. Second, enable database profiling to capture slow queries. The profiler records queries exceeding a specified threshold to the system.profile collection. Set profiling level 1 to log slow queries or level 2 to log all queries. Analyze profiled queries to identify optimization opportunities. Use explain method on slow queries to understand execution plans and index usage. Third, monitor key performance metrics. Watch for high query execution times, low cache hit ratios indicating insufficient memory, high page faults showing working set exceeds RAM, connection saturation, replication lag in replica sets, and uneven shard distribution in sharded clusters. Set up alerts for abnormal values. Fourth, use monitoring platforms like MongoDB Cloud Manager, Ops Manager, or third-party tools like Prometheus and Grafana. These provide historical trending, alerting, and dashboards. They help identify patterns and correlate metrics across multiple servers. Fifth, when troubleshooting specific issues, follow a structured approach. For slow queries, use explain to check if indexes are used, verify index selectivity, and ensure query patterns match shard keys in sharded clusters. For memory issues, check working set size versus available RAM and consider adding memory or sharding. For write performance, verify write concern settings, check for lock contention, and ensure indexes are not over-indexed causing write overhead. Regularly review and optimize indexes, remove unused indexes, monitor index size and fragmentation, and rebuild indexes if needed. Test changes in staging environments before applying to production.

Example Code

// 1. Check current operations
db.currentOp({
  "active": true,
  "secs_running": { "$gt": 5 }  // Running > 5 seconds
})

// Kill long-running operation
db.killOp(12345)

// 2. Enable profiling for slow queries (> 100ms)
db.setProfilingLevel(1, { slowms: 100 })

// Analyze slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)

// 3. Get server statistics
db.serverStatus()

// Key metrics to monitor:
// - connections.current (connection count)
// - opcounters (operations per second)
// - mem.resident (RAM usage)
// - wiredTiger.cache (cache statistics)
// - network.bytesIn/bytesOut
// - repl.lag (replication lag)

// 4. Check database statistics
db.stats()

// 5. Analyze query performance
db.orders.find({ customerId: 123 }).explain("executionStats")

// Look for:
// - executionTimeMillis (total time)
// - totalDocsExamined vs totalDocsReturned
// - stage: "IXSCAN" (using index) vs "COLLSCAN" (full scan)

// 6. Check index usage statistics
db.orders.aggregate([{ $indexStats: {} }])

// 7. Monitor with mongostat (external command)
mongstat --host localhost:27017 5
// Shows operations, memory, connections every 5 seconds

83. What are Change Streams in MongoDB and what are practical use cases for them?

Difficulty: MediumType: SubjectiveTopic: Change Streams

Change Streams allow applications to access real-time data changes without the complexity of tailing the oplog. They provide a high-level API for subscribing to data changes across collections, databases, or entire clusters. Change Streams are built on top of the oplog and are available on replica sets and sharded clusters starting from MongoDB 3.6. Change Streams capture insert, update, replace, and delete operations as they occur. When you open a change stream, you receive a cursor that emits change events as documents are modified. Each event includes the operation type, the affected document or document key, and the timestamp. Practical use cases are numerous. First, cache invalidation where you update application caches when data changes in MongoDB. When a product price changes, you can immediately invalidate the cached price and fetch the new value. This keeps caches synchronized without polling. Second, real-time notifications where you push updates to users. For example, in a social media application, notify users when someone likes their post or sends a message. Change Streams detect these inserts and trigger notifications immediately. Third, data synchronization across systems where you replicate changes to other databases, search engines, or data warehouses. When an order is placed in MongoDB, you can push the order to an analytics database or update a search index in Elasticsearch. Fourth, audit logging where you record all changes to sensitive collections for compliance. Change Streams capture who changed what and when, creating an audit trail. Fifth, microservices communication where services react to data changes in other services, enabling event-driven architectures. Change Streams support filtering using aggregation pipeline syntax, allowing you to subscribe only to relevant changes. You can resume from a specific point using resume tokens, ensuring no events are missed during application restarts.

Example Code

// Watch entire collection
const changeStream = db.collection('products').watch()

changeStream.on('change', (change) => {
  console.log('Change event:', change)
  
  switch(change.operationType) {
    case 'insert':
      handleNewProduct(change.fullDocument)
      break
    case 'update':
      invalidateCache(change.documentKey._id)
      break
    case 'delete':
      removeFromSearch(change.documentKey._id)
      break
  }
})

// Watch with filter - only completed orders
const pipeline = [
  {
    $match: {
      'operationType': 'insert',
      'fullDocument.status': 'completed'
    }
  }
]

const orderStream = db.orders.watch(pipeline)
orderStream.on('change', (change) => {
  // Trigger fulfillment process
  processOrder(change.fullDocument)
})

// Resume from token (after restart)
const resumeToken = getLastProcessedToken()
const resumableStream = db.orders.watch([], {
  resumeAfter: resumeToken
})

// Use case: Real-time dashboard
db.sales.watch().on('change', (change) => {
  if (change.operationType === 'insert') {
    updateDashboard(change.fullDocument)
    websocket.broadcast('newSale', change.fullDocument)
  }
})

84. What are the essential best practices for deploying MongoDB in production environments?

Difficulty: HardType: SubjectiveTopic: Production Ops

Deploying MongoDB in production requires careful planning and adherence to best practices to ensure reliability, performance, and security. These practices span architecture, configuration, operations, and monitoring. First, always use replica sets even for single-server deployments. Replica sets provide high availability through automatic failover and data redundancy. Use at least three data-bearing members across different availability zones or data centers for maximum resilience. Configure appropriate write and read concerns based on your durability requirements. Second, properly size your hardware and infrastructure. Ensure your working set fits in RAM for optimal performance. Working set is the frequently accessed data and indexes. MongoDB performance degrades significantly when the working set exceeds available memory. Use SSD storage for data directories to improve I/O performance. Allocate sufficient CPU cores for concurrent operations. Third, implement comprehensive monitoring and alerting. Track key metrics including query performance, replication lag, connection counts, memory usage, disk I/O, and error rates. Set up alerts for abnormal conditions like high replication lag, low disk space, or connection saturation. Use MongoDB Cloud Manager, Ops Manager, or integrate with tools like Prometheus and Grafana. Fourth, establish backup and disaster recovery procedures. Implement automated backups with point-in-time recovery capability. Test restore procedures regularly to verify backups work. Store backups in a different geographic location. Document recovery time objectives and recovery point objectives, and ensure procedures meet these targets. Fifth, implement security hardening. Enable authentication and authorization, use TLS for all connections, implement network security through firewalls and VPNs, enable encryption at rest for sensitive data, configure auditing for compliance, and keep MongoDB updated with security patches. Sixth, optimize for your workload. Create appropriate indexes based on query patterns, but avoid over-indexing which slows writes. Use covered queries where possible. In sharded clusters, choose shard keys carefully to ensure even distribution and support common queries. Seventh, manage connection pooling properly. Configure appropriate connection pool sizes in drivers to balance concurrency with resource usage. Monitor connection counts and adjust pool sizes if needed. Finally, document your deployment architecture, configurations, and operational procedures for your team.

Example Code

// Production configuration example (mongod.conf)

// Network settings
net:
  port: 27017
  bindIp: 10.0.0.5,127.0.0.1  // Private IP + localhost
  maxIncomingConnections: 1000
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb.pem

// Security
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyfile  // Replica set auth

// Storage
storage:
  dbPath: /data/mongodb
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8  // 50% of RAM
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy

// Replication
replication:
  replSetName: "production-rs"
  oplogSizeMB: 10240  // 10GB oplog

// System resource limits
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid

// Operational logging
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
  logRotate: reopen

// Profiling (disable in production, enable when troubleshooting)
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100

// Best practices checklist:
// ✓ Replica set with 3+ members
// ✓ Authentication and authorization enabled
// ✓ TLS encryption
// ✓ Monitoring and alerting configured
// ✓ Automated backups
// ✓ Proper hardware sizing (working set in RAM)
// ✓ Connection limits configured
// ✓ Regular security updates
// ✓ Documented procedures

91. Explain the trade-offs between embedding documents versus referencing in MongoDB. When should you use each approach?

Difficulty: HardType: SubjectiveTopic: Data Modeling

Embedding and referencing are two fundamental approaches to modeling relationships in MongoDB, each with distinct advantages and trade-offs. The choice between them significantly impacts performance, data integrity, and application complexity. Embedding stores related data within the same document. For example, storing a user's address directly inside the user document. Advantages include better read performance because all data is retrieved in a single query, atomicity where updates to embedded data are atomic, and locality where related data is stored together on disk. Disadvantages of embedding include document size limits because documents cannot exceed 16 MB, data duplication when the same embedded data needs to appear in multiple parent documents, and unbounded growth problems when arrays can grow indefinitely. Embedding also makes it harder to query embedded data independently. Referencing stores related data in separate documents and uses references like foreign keys to link them. For example, storing user ID in order documents to reference the user collection. Advantages include no duplication because shared data exists once, ability to query referenced data independently, and no size limits on the relationship. Disadvantages of referencing include requiring multiple queries or dollar lookup to retrieve related data, no atomic updates across documents outside transactions, and increased complexity in application code to manage relationships. Use embedding when you have one-to-one relationships, one-to-few relationships with a small bounded number of embedded documents, data that is always accessed together, data that rarely changes, or when atomic updates to related data are critical. For example, embed user addresses, order line items, or blog post comments. Use referencing when you have one-to-many relationships with unbounded growth, many-to-many relationships, data that is frequently accessed independently, data that changes frequently and needs to be shared, or when document size approaches limits. For example, reference users from orders, products from categories, or students from courses. In practice, hybrid models are common. You might embed frequently accessed fields for performance while referencing complete related documents. For example, embed author name and ID in blog posts for display, but reference the full author document for profile pages.

Example Code

// EMBEDDING APPROACH
// Good for: one-to-one, one-to-few, always accessed together
{
  _id: 1,
  name: "John Doe",
  email: "john@example.com",
  address: {  // Embedded document
    street: "123 Main St",
    city: "New York",
    zip: "10001"
  },
  orders: [  // Embedded array (bounded)
    { orderId: 1, item: "Laptop", amount: 999 },
    { orderId: 2, item: "Mouse", amount: 29 }
  ]
}
// Single query gets everything
db.users.findOne({ _id: 1 })

// REFERENCING APPROACH
// Good for: one-to-many (unbounded), many-to-many, independent access

// Users collection
{
  _id: 1,
  name: "John Doe",
  email: "john@example.com"
}

// Orders collection (references user)
{
  _id: 101,
  userId: 1,  // Reference to user
  items: [...],
  total: 1500
}

// Requires multiple queries or $lookup
const user = db.users.findOne({ _id: 1 })
const orders = db.orders.find({ userId: 1 })

// HYBRID APPROACH
// Embed frequently used data, reference full document
{
  _id: 101,
  author: {  // Embed key info
    id: 1,
    name: "John Doe"
  },
  title: "MongoDB Guide",
  content: "..."
}
// Get author details when needed
db.authors.findOne({ _id: 1 })

92. How does schema validation work in MongoDB and what are its practical applications?

Difficulty: MediumType: SubjectiveTopic: Data Modeling

Schema validation in MongoDB allows you to enforce document structure and data types despite MongoDB being schema-less by default. This provides a balance between flexibility and data integrity, ensuring critical fields meet specific requirements while allowing schema evolution. You define validation rules using JSON Schema syntax when creating or modifying collections. The validator specifies required fields, data types using bsonType, string patterns using regex, numeric ranges with minimum and maximum, and nested document structures. MongoDB validates documents against these rules during insert and update operations. Validation level determines when rules are applied. Strict validation applies rules to all inserts and updates, rejecting any documents that do not match. Moderate validation applies rules only to new documents and updates to existing valid documents, but allows updates to existing invalid documents. This is useful when adding validation to collections with existing data. Validation action determines what happens when validation fails. Error action rejects the operation and returns an error to the application. Warn action logs a warning in MongoDB logs but allows the operation to proceed. Warn is useful during development or when gradually introducing validation. Practical applications include ensuring email fields contain valid email patterns, enforcing required fields like user IDs or timestamps, validating numeric ranges like age between 0 and 150, ensuring enum-like fields contain only allowed values, and validating nested document structures. Schema validation is particularly valuable in applications where data quality is critical, when multiple applications access the same database and need consistent data formats, during migrations when transitioning from loose to strict schemas, or for compliance requirements that mandate certain data fields. Validation complements application-level validation. Use MongoDB validation for critical structural requirements and data integrity, while application-level validation handles business logic, user feedback, and complex cross-field validations.

93. Explain bulk operations in MongoDB. What is the difference between ordered and unordered bulk operations?

Difficulty: MediumType: SubjectiveTopic: CRUD

Bulk operations in MongoDB allow you to perform multiple write operations in a single request, significantly improving performance by reducing network overhead. Instead of making separate round trips for each operation, you batch them together and send to the database once. MongoDB provides the bulkWrite method that accepts an array of operations including insertOne, updateOne, updateMany, deleteOne, deleteMany, and replaceOne. You can mix different operation types in a single bulk request. This is more efficient than executing operations individually, especially when dealing with large datasets or remote database connections. The key distinction between ordered and unordered bulk operations lies in execution behavior and error handling. Ordered bulk operations execute in sequence, stopping at the first error. If an operation fails, all subsequent operations are cancelled. This maintains the order you specified and provides predictable behavior but means one failure stops the entire batch. Unordered bulk operations execute in parallel with no guaranteed order, and continue executing all operations even if some fail. MongoDB may reorder operations for optimization. If multiple operations fail, all errors are collected and returned together at the end. This provides better performance because operations can be parallelized and one failure does not prevent other operations from succeeding. Choose ordered bulk operations when operation order matters, such as when later operations depend on earlier ones, or when you need all-or-nothing semantics within the batch. For example, inserting a parent document before child documents that reference it. Choose unordered bulk operations for maximum performance when operations are independent and order does not matter. This is common for batch inserts, mass updates, or data migrations where each operation stands alone. Unordered operations can be significantly faster for large batches. Bulk operations return detailed results including how many documents were inserted, matched, modified, deleted, and any errors that occurred. Handle errors appropriately based on your requirements.

Example Code

// ORDERED bulk operations (default)
try {
  const result = await db.users.bulkWrite([
    { insertOne: { document: { name: "Alice", age: 25 } } },
    { insertOne: { document: { name: "Bob", age: 30 } } },
    { insertOne: { document: { name: "Alice" } } },  // Error: duplicate
    { insertOne: { document: { name: "David", age: 35 } } }  // Not executed
  ], { ordered: true })
} catch (error) {
  // First 2 succeed, 3rd fails, 4th never executed
  console.log(error.writeErrors)
}

// UNORDERED bulk operations
try {
  const result = await db.users.bulkWrite([
    { insertOne: { document: { name: "Alice", age: 25 } } },
    { insertOne: { document: { name: "Bob", age: 30 } } },
    { insertOne: { document: { name: "Alice" } } },  // Error: duplicate
    { insertOne: { document: { name: "David", age: 35 } } }  // Still executed
  ], { ordered: false })
} catch (error) {
  // 3 operations succeed, 1 fails, all errors reported
  console.log(error.writeErrors)
}

// Mixed operations bulk write
const operations = [
  { insertOne: { document: { name: "Eve", age: 28 } } },
  { updateOne: {
      filter: { name: "Alice" },
      update: { $set: { age: 26 } }
  }},
  { deleteOne: { filter: { name: "Bob" } } },
  { replaceOne: {
      filter: { name: "Charlie" },
      replacement: { name: "Charlie", age: 40, city: "NYC" }
  }}
]

const result = await db.users.bulkWrite(operations, { ordered: false })

console.log(result)
// {
//   insertedCount: 1,
//   matchedCount: 2,
//   modifiedCount: 2,
//   deletedCount: 1,
//   upsertedCount: 0
// }

94. What are the key features and advantages of MongoDB Atlas compared to self-hosted MongoDB?

Difficulty: MediumType: SubjectiveTopic: Atlas

MongoDB Atlas is a fully managed cloud database service that eliminates the operational overhead of running MongoDB infrastructure. While self-hosted MongoDB provides full control, Atlas offers significant advantages in ease of use, reliability, and advanced features. First, automated deployment and scaling. Atlas handles cluster provisioning, configuration, and scaling across AWS, Azure, and Google Cloud. You can scale vertically by changing instance sizes or horizontally by adding shards with a few clicks. Auto-scaling can adjust resources based on workload patterns. Self-hosted MongoDB requires manual server provisioning, configuration management, and scaling procedures. Second, continuous backups with point-in-time recovery. Atlas automatically takes snapshots and captures oplog entries, allowing restore to any point in time. Cloud backups are stored in different regions for disaster recovery. Self-hosted deployments require implementing and maintaining backup solutions manually. Third, built-in monitoring and alerting. Atlas provides real-time performance metrics, query profiling, and customizable alerts for issues like high CPU, replication lag, or connection spikes. The performance advisor suggests index improvements. Self-hosted monitoring requires deploying and configuring separate monitoring tools. Fourth, global clusters for multi-region deployments. Atlas supports geo-distributed clusters with data locality controls, allowing you to store data close to users in different regions while maintaining a single database view. This reduces latency and supports compliance requirements. Implementing multi-region setups with self-hosted MongoDB is complex. Fifth, integrated security features including encryption at rest and in transit by default, network isolation through VPC peering, IP whitelisting, database authentication with various mechanisms, and audit logging. Self-hosted deployments must configure each security layer manually. Sixth, automated patching and updates. Atlas handles MongoDB version upgrades, security patches, and driver updates with minimal downtime. Self-hosted deployments require planning and executing maintenance windows. Seventh, serverless instances for development and variable workloads. Atlas Serverless automatically scales compute and storage based on usage and charges only for resources consumed. This is ideal for applications with unpredictable traffic or development environments. The main disadvantages of Atlas are cost, which can be higher than self-hosted for stable workloads, less control over infrastructure and configuration details, and potential vendor lock-in. Choose Atlas when you want to focus on application development rather than database operations, need enterprise features without complexity, require multi-region deployment, or want predictable operational costs.

Example Code

// Atlas Connection String
const uri = "mongodb+srv://user:pass@cluster0.mongodb.net/mydb?retryWrites=true&w=majority"

// Atlas Exclusive Features:

// 1. Global Clusters (multi-region)
// - Data in US, Europe, Asia
// - Zone sharding by region
// - Low-latency worldwide access

// 2. Atlas Search (full-text search)
db.products.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "laptop gaming",
        path: ["title", "description"]
      }
    }
  }
])

// 3. Atlas Data Lake (query S3 data)
// - Analyze data in cloud storage
// - Federated queries across MongoDB and S3

// 4. Automated Backup Schedule
// - Snapshots: every 6-24 hours
// - Oplog: continuous
// - Retention: 7 days to indefinite
// - Point-in-time restore to any second

// 5. Performance Advisor
// - Suggests missing indexes
// - Identifies slow queries
// - Recommends schema improvements

// 6. Atlas Triggers (serverless functions)
// - React to database changes
// - Scheduled functions
// - Event-driven architecture

// 7. Charts (data visualization)
// - Build dashboards
// - Embedded analytics
// - No coding required

95. Compare mongodump/mongorestore with mongoexport/mongoimport. When should you use each?

Difficulty: MediumType: SubjectiveTopic: Import Export

MongoDB provides two pairs of tools for data backup and migration: mongodump with mongorestore, and mongoexport with mongoimport. Each serves different purposes and has distinct characteristics. Mongodump and mongorestore work with BSON format, the native binary format MongoDB uses internally. Mongodump creates a binary backup of databases or collections, preserving all data types, indexes, and collection options. Mongorestore recreates collections from these binary backups, maintaining fidelity to the original data. Advantages of mongodump and mongorestore include preserving all BSON data types including binary data, dates, and ObjectId that would lose fidelity in JSON, backing up and restoring indexes and collection settings, faster performance for large datasets because binary format is more compact and faster to process, and full database backups including system collections. Disadvantages include binary format is not human-readable, output is not easily editable, and less portable across different MongoDB versions or external systems. Mongoexport and mongoimport work with human-readable formats like JSON and CSV. Mongoexport exports collection data to JSON or CSV files. Mongoimport imports data from JSON, CSV, or TSV files into collections. Advantages of mongoexport and mongoimport include human-readable output that can be viewed and edited in text editors, CSV format integrates easily with spreadsheets and other tools, flexibility to import from external sources not originally from MongoDB, and ability to export selected fields and query results. Disadvantages include some data types lose precision or cannot be represented in JSON or CSV, indexes and collection settings are not exported, slower performance for large datasets, and potential data type conversion issues. Use mongodump and mongorestore for full database backups, disaster recovery, migrating entire databases between MongoDB instances, preserving exact data including all types and indexes, and when performance matters for large datasets. Use mongoexport and mongoimport for data migration to non-MongoDB systems, sharing data with external tools or teams, exporting reports or subsets of data, loading test data or initial datasets, migrating specific collections or query results, and when you need human-readable format for inspection or editing.

Example Code

// MONGODUMP / MONGORESTORE (Binary BSON)

// Backup entire database
mongodump --host=localhost --port=27017 \
  --db=mydb --out=/backup/
// Creates: /backup/mydb/ with .bson and .metadata.json files

// Backup specific collection
mongodump --db=mydb --collection=users --out=/backup/

// Backup with query filter
mongodump --db=mydb --collection=orders \
  --query='{"status": "completed"}' --out=/backup/

// Restore entire database
mongorestore --host=localhost --port=27017 \
  --db=mydb /backup/mydb/

// Restore with different name
mongorestore --db=mydb_copy /backup/mydb/

// MONGOEXPORT / MONGOIMPORT (Human-readable)

// Export to JSON
mongoexport --db=mydb --collection=users \
  --out=users.json
// Creates readable JSON file

// Export to CSV with specific fields
mongoexport --db=mydb --collection=users \
  --type=csv --fields=name,email,age --out=users.csv

// Export with query
mongoexport --db=mydb --collection=orders \
  --query='{"total": {"$gt": 100}}' --out=large_orders.json

// Import JSON
mongoimport --db=mydb --collection=users \
  --file=users.json

// Import CSV with header
mongoimport --db=mydb --collection=users \
  --type=csv --headerline --file=users.csv

// Import with upsert (update or insert)
mongoimport --db=mydb --collection=products \
  --file=products.json --mode=upsert \
  --upsertFields=productId

// WHEN TO USE EACH:
// mongodump/restore: Production backups, full migrations, preserve everything
// mongoexport/import: Data sharing, CSV integration, subset exports, human-readable

96. Given an employees collection with nested salary history and skills array, write queries to find employees with specific criteria and calculate aggregations.

Difficulty: HardType: SubjectiveTopic: Querying

Working with complex document structures in MongoDB requires understanding nested field access, array queries, and aggregation pipelines. Let's explore practical query scenarios using an employees collection with embedded documents and arrays. For querying nested fields, use dot notation to access fields within embedded documents. For example, to find employees in a specific city, query address.city. MongoDB treats embedded documents as regular fields once you use dot notation. For array queries, MongoDB provides specialized operators. To find employees with a specific skill, use the array field directly and MongoDB matches if any element equals the value. For complex array conditions, use dollar elemMatch to ensure all conditions apply to the same array element. For aggregation scenarios, use the aggregation pipeline to perform calculations across documents. The dollar group stage groups documents by fields and calculates aggregates using accumulator operators like dollar sum, dollar avg, dollar max, and dollar min. For sorting by nested or computed fields, use dollar addFields to create temporary fields, then dollar sort on those fields. This is useful when sorting by array length or computed values. For filtering arrays and projections, use dollar filter in aggregation pipelines to include only specific array elements in results. This is more powerful than projection alone because you can apply complex conditions. For combining multiple conditions, use logical operators like dollar and, dollar or, and dollar not to build complex queries. Array operators like dollar all ensure arrays contain all specified values, while dollar in matches if the field equals any value in an array. These techniques combine to enable sophisticated queries on complex document structures while maintaining query performance through proper indexing on query fields.

Example Code

// Sample employees collection structure
{
  _id: 1,
  name: "Alice Johnson",
  age: 32,
  department: "Engineering",
  salary: 95000,
  address: {
    street: "123 Main St",
    city: "San Francisco",
    zip: "94102"
  },
  skills: ["JavaScript", "Python", "MongoDB"],
  projects: [
    { name: "Project A", role: "Lead", duration: 6 },
    { name: "Project B", role: "Developer", duration: 3 }
  ]
}

// Query 1: Find employees in San Francisco with salary > 80000
db.employees.find({
  "address.city": "San Francisco",
  salary: { $gt: 80000 }
})

// Query 2: Find employees with both JavaScript AND Python skills
db.employees.find({
  skills: { $all: ["JavaScript", "Python"] }
})

// Query 3: Find employees who were Lead in any project for > 5 months
db.employees.find({
  projects: {
    $elemMatch: {
      role: "Lead",
      duration: { $gt: 5 }
    }
  }
})

// Query 4: Count employees by department
db.employees.aggregate([
  {
    $group: {
      _id: "$department",
      count: { $sum: 1 },
      avgSalary: { $avg: "$salary" }
    }
  },
  { $sort: { avgSalary: -1 } }
])

// Query 5: Find top 3 highest paid employees by department
db.employees.aggregate([
  { $sort: { department: 1, salary: -1 } },
  {
    $group: {
      _id: "$department",
      topEmployees: {
        $push: {
          name: "$name",
          salary: "$salary"
        }
      }
    }
  },
  {
    $project: {
      department: "$_id",
      topEmployees: { $slice: ["$topEmployees", 3] },
      _id: 0
    }
  }
])

// Query 6: Employees with more than 3 skills, sorted by skill count
db.employees.aggregate([
  {
    $addFields: {
      skillCount: { $size: "$skills" }
    }
  },
  { $match: { skillCount: { $gt: 3 } } },
  { $sort: { skillCount: -1 } },
  {
    $project: {
      name: 1,
      skillCount: 1,
      skills: 1
    }
  }
])

// Query 7: Employees in SF or NYC with specific skills
db.employees.find({
  $or: [
    { "address.city": "San Francisco" },
    { "address.city": "New York" }
  ],
  skills: { $in: ["MongoDB", "PostgreSQL"] }
})

97. What steps and considerations are involved in migrating a large production database from a single MongoDB instance to a sharded cluster with minimal downtime?

Difficulty: HardType: SubjectiveTopic: Production Ops

Migrating a production database to a sharded cluster is a complex operation requiring careful planning, testing, and execution to minimize downtime and ensure data integrity. The process involves multiple phases and careful coordination. Phase one is planning and preparation. First, analyze your current workload, data size, growth rate, and query patterns. Identify the optimal shard key based on query patterns, data distribution, and cardinality. A poor shard key choice cannot be changed without rebuilding, so this is critical. Second, design your target sharded cluster architecture including the number of shards, shard replica sets, config servers, and mongos routers. Third, prepare the infrastructure by provisioning servers for config servers, shards, and mongos routers. Configure networking, security, and monitoring. Fourth, test the entire migration process in a staging environment with a copy of production data. Verify application compatibility with mongos, test failover scenarios, and measure performance. Phase two is deploying the sharded infrastructure. Deploy config servers as a replica set, deploy each shard as a replica set, deploy mongos routers, and configure authentication and TLS. At this point, the sharded cluster is empty and ready to receive data. Phase three is the actual migration with minimal downtime. The recommended approach uses MongoDB's live migration capabilities. First, add your existing replica set as the first shard in the cluster. Second, enable sharding on the database using sh.enableSharding. Third, shard the collections using sh.shardCollection with your chosen shard key. At this point, all data is on one shard. Fourth, add additional empty shards to the cluster. Fifth, the balancer automatically begins migrating chunks to new shards, distributing data. Sixth, update application connection strings to use mongos instead of direct replica set connections. Use connection pooling and retry logic for resilience. During migration, monitor chunk migration progress, watch for any application errors or timeouts, verify data distribution across shards is balanced, and monitor performance metrics on all components. Phase four is post-migration validation. Verify all data migrated successfully by comparing document counts, test application functionality thoroughly including read and write operations, monitor performance and optimize indexes if needed, and establish backup and disaster recovery procedures for the sharded cluster. Key considerations include choosing an optimal shard key, extensive testing in staging before production, having a rollback plan, communicating with stakeholders about migration timeline, and planning for increased operational complexity of managing a sharded cluster.

Example Code

// MIGRATION PROCESS

// Current: Single replica set
// Target: 3-shard cluster

// Step 1: Deploy infrastructure
// - Config servers (3-member replica set)
// - Shard 1 (existing replica set)
// - Shard 2 (new 3-member replica set)
// - Shard 3 (new 3-member replica set)
// - Mongos routers (2+ instances)

// Step 2: Initialize config server replica set
rs.initiate({
  _id: "configReplSet",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg1:27019" },
    { _id: 1, host: "cfg2:27019" },
    { _id: 2, host: "cfg3:27019" }
  ]
})

// Step 3: Start mongos
mongos --configdb configReplSet/cfg1:27019,cfg2:27019,cfg3:27019

// Step 4: Add existing replica set as first shard
sh.addShard("rs0/host1:27017,host2:27017,host3:27017")

// Step 5: Enable sharding on database
sh.enableSharding("mydb")

// Step 6: Shard the collection
sh.shardCollection("mydb.users", { userId: 1 })
// Or hashed: sh.shardCollection("mydb.users", { userId: "hashed" })

// Step 7: Add additional shards
sh.addShard("rs1/shard2-host1:27017,shard2-host2:27017")
sh.addShard("rs2/shard3-host1:27017,shard3-host2:27017")

// Step 8: Monitor chunk migration
sh.status()
db.printShardingStatus()

// Check balancer status
sh.isBalancerRunning()

// View chunk distribution
use config
db.chunks.find({ ns: "mydb.users" }).count()
db.chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } }
])

// Step 9: Update application connection string
// Old: mongodb://host1:27017,host2:27017/?replicaSet=rs0
// New: mongodb://mongos1:27017,mongos2:27017/

// Step 10: Verify data
db.users.countDocuments()  // Should match original count

// ROLLBACK PLAN (if needed)
// 1. Stop application writes
// 2. Remove shards (except original)
// 3. Disable sharding on database
// 4. Reconnect app to original replica set
// 5. Resume operations

// MONITORING DURING MIGRATION
db.currentOp()  // Watch for chunk migrations
db.serverStatus().sharding  // Shard statistics
mongotop 5  // Monitor collection activity
mongostat 5  // Monitor operations

Master Interviews Anywhere, Anytime

All MongoDB/NoSQL Interview Questions

Overview

1. What format does MongoDB use to store data internally?

2. What field serves as the primary key in MongoDB documents?

Continue your preparation

3. Which of the following is NOT a valid MongoDB data type?

4. What is the main advantage of MongoDB's schema-less design?

5. Which method would you use to insert multiple documents at once in MongoDB?

6. Which operator is used to update or add a field in a MongoDB document?

7. How do you create a new database in MongoDB?

8. What is returned when you successfully delete a document using deleteOne()?

9. Explain the difference between a Document and a Collection in MongoDB.

10. What are the key differences between MongoDB and relational databases?

11. Explain the four basic CRUD operations in MongoDB with examples.

12. When should you choose MongoDB over a relational database?

13. What is the difference between find() and findOne() methods in MongoDB?

14. Explain horizontal scalability in MongoDB and how it differs from vertical scalability.

15. What are embedded documents in MongoDB and when should you use them?

16. Which operator is used to find documents where a field value is greater than a specified value?

17. What is the default index automatically created by MongoDB for every collection?

18. Which operator matches documents where an array field contains all specified elements?

19. Which method is used to create an index in MongoDB?

20. In MongoDB projections, what does { name: 1, age: 1, _id: 0 } do?

21. What type of index is required to perform text search in MongoDB?

22. For a compound index { age: 1, city: 1 }, which query can use this index efficiently?

23. Explain the difference between $in and $or operators in MongoDB queries.

24. How do indexes improve query performance in MongoDB and what are the trade-offs?

25. What is a compound index and when should you use it? Explain the index prefix rule.

26. What strategies can you use to optimize slow MongoDB queries?

27. How do you query array fields in MongoDB? Explain $elemMatch with an example.

28. What is the explain() method in MongoDB and what key metrics should you look for when analyzing query performance?

29. Which aggregation stage is used to filter documents in the pipeline?

30. In the aggregation pipeline, which accumulator operator calculates the average value?

31. What is the recommended order for aggregation stages to optimize performance?

32. What does the $lookup stage do in MongoDB aggregation?

33. What is the purpose of the $project stage in aggregation?

34. What does the $unwind stage do in an aggregation pipeline?

35. Explain what the MongoDB Aggregation Framework is and how it differs from regular find queries.

36. Describe the most commonly used aggregation pipeline stages and their purposes.

37. How would you use aggregation to find the top 5 customers by total purchase amount? Explain each stage.

38. What are accumulator operators in MongoDB aggregation and when would you use $push vs $addToSet?

39. Explain how to use $lookup to join collections in MongoDB aggregation. When should you avoid using $lookup?

40. What is Map-Reduce in MongoDB and how does it compare to the Aggregation Framework?

41. What are the best practices for optimizing aggregation pipeline performance?

42. Explain the difference between $out and $merge stages in aggregation pipelines.

43. What is a replica set in MongoDB?

44. Which statement is TRUE about the primary node in a replica set?

45. What is the oplog in MongoDB replication?

46. When does an election occur in a MongoDB replica set?

47. What does write concern { w: 'majority' } ensure?

48. Which read preference directs read operations to secondary nodes only?

49. Explain the concept of replica sets in MongoDB and how they ensure high availability.

50. Describe the replication architecture in MongoDB. How do secondary nodes stay synchronized with the primary?

51. Explain how replica set elections work in MongoDB. What factors determine which secondary becomes the new primary?

52. What is the oplog and why is its size important? How do you determine the appropriate oplog size?

53. Explain write concerns in MongoDB. What is the difference between w:1, w:'majority', and w:'all'?

54. What are read concerns in MongoDB and how do they ensure data consistency?

55. Walk through what happens during a replica set failover when the primary node fails.

56. What are arbiter nodes in MongoDB replica sets and when should you use them?

57. What is sharding in MongoDB?

58. Which characteristic is MOST important when choosing a shard key?

59. What is the role of mongos in a sharded cluster?

60. What do config servers store in a MongoDB sharded cluster?

61. What triggers chunk migration in a sharded cluster?

62. What is the main advantage of hashed sharding over range-based sharding?

63. Explain the concept of sharding in MongoDB and when you should implement it.

64. What factors should you consider when choosing a shard key? Explain the consequences of a poor shard key choice.

65. Describe the components of a MongoDB sharded cluster and how they work together.

66. Explain the difference between horizontal and vertical scaling. Why does MongoDB prefer horizontal scaling?

67. What are chunks in MongoDB sharding and how does chunk splitting and migration work?

68. Explain the difference between targeted queries and broadcast queries in a sharded cluster. How does this affect performance?

69. What is zone sharding in MongoDB and when would you use it?

70. What are the limitations and challenges of sharding in MongoDB?

71. What is required to use multi-document transactions in MongoDB?

72. Which storage engine is the default in MongoDB 3.2 and later?

73. When should you use GridFS in MongoDB?

74. What happens to documents when a TTL (Time To Live) index expires them?

75. What is a key characteristic of capped collections in MongoDB?

76. What do Change Streams in MongoDB allow applications to do?

Master Interviews
Anywhere, Anytime