1. Which SQL statement retrieves all columns and rows from a table named employees?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- READ * FROM employees;
- SELECT ALL FROM employees;
- SELECT * FROM employees;
- GET * FROM employees;
SELECT retrieves columns and rows from one or more tables. The asterisk (*) is shorthand for all columns.
In production, prefer listing columns to control payloads, avoid hidden schema breaks, and help indexes get used effectively.
Correct Answer: SELECT * FROM employees;
Example Code
SELECT * FROM employees;
2. Which clause filters rows before aggregation and affects which rows are grouped?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- HAVING
- WHERE
- ORDER BY
- GROUP BY
WHERE applies to raw rows, before grouping. It reduces input early and improves performance when selective.
Use HAVING after GROUP BY when you need to filter on aggregate results, like COUNT or SUM.
Correct Answer: WHERE
Example Code
SELECT department_id, COUNT(*)
FROM employees
WHERE active = 1
GROUP BY department_id;
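To show the contrast with HAVING described above, here is a sketch that uses both clauses together (the count threshold of 10 is an arbitrary illustration):

```sql
-- WHERE trims rows before grouping; HAVING filters the groups after.
SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE active = 1                -- row-level filter (before grouping)
GROUP BY department_id
HAVING COUNT(*) > 10;           -- group-level filter (after aggregation)
```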
3. What is the default sort order when ORDER BY is used without ASC or DESC?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- Random order
- Descending
- Ascending
- Engine-dependent and undefined
ORDER BY sorts ascending by default. Use DESC for reverse order.
Without ORDER BY, row order is not guaranteed, even if it looks stable in small tests.
Correct Answer: Ascending
Example Code
SELECT name FROM customers ORDER BY name;
4. Which construct is commonly used to paginate results in SQL engines like Postgres and MySQL?
Difficulty: Medium | Type: MCQ | Topic: SQL Basics & DML
- TOP n ONLY
- FETCH RANDOM n
- LIMIT n OFFSET m
- GROUP BY n, m
LIMIT n OFFSET m returns a window of rows. It is simple and widely supported.
For large pages, prefer keyset pagination using a stable sort key to avoid deep offsets and performance drops.
Correct Answer: LIMIT n OFFSET m
Example Code
SELECT id, name
FROM products
ORDER BY id
LIMIT 20 OFFSET 40;
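As a sketch of the keyset pagination mentioned above, assuming id is the stable sort key and 40 is the last id seen on the previous page:

```sql
-- Keyset pagination: seek past the last key instead of skipping rows.
SELECT id, name
FROM products
WHERE id > 40                   -- last id from the previous page
ORDER BY id
LIMIT 20;
```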
5. Which predicate correctly checks for missing values in standard SQL?
Difficulty: Medium | Type: MCQ | Topic: SQL Basics & DML
- WHERE col = NULL
- WHERE col IS NULL
- WHERE col == NULL
- WHERE NULL IN (col)
NULL represents unknown. Comparisons with equals return unknown, not true. Use IS NULL or IS NOT NULL.
When aggregating, functions like COUNT(col) skip NULLs, while COUNT(*) counts all rows.
Correct Answer: WHERE col IS NULL
Example Code
SELECT COUNT(*) total, COUNT(email) with_email
FROM users
WHERE deleted_at IS NULL;
6. What is the main purpose of using column or table aliases in a SELECT statement?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- To create a physical copy of the table
- To rename objects permanently
- To give temporary readable names that improve clarity
- To enforce unique constraints
Aliases are temporary names for columns or tables in a single query. They improve readability and help when the same table is referenced twice.
They do not change schema or metadata; they exist only during the statement.
Correct Answer: To give temporary readable names that improve clarity
Example Code
SELECT e.id AS emp_id, d.name AS dept
FROM employees e JOIN departments d ON d.id = e.dept_id;
7. Which query returns rows where city is either 'Delhi' or 'Mumbai'?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- SELECT * FROM stores WHERE city BETWEEN 'Delhi' AND 'Mumbai';
- SELECT * FROM stores WHERE city IN ('Delhi','Mumbai');
- SELECT * FROM stores WHERE city LIKE 'Delhi|Mumbai';
- SELECT * FROM stores WHERE city HAS ANY ('Delhi','Mumbai');
IN tests membership in a list. It is clear for short, explicit sets.
BETWEEN is for ranges, and LIKE is for pattern matching. Use the right tool for the comparison you need.
Correct Answer: SELECT * FROM stores WHERE city IN ('Delhi','Mumbai');
Example Code
SELECT * FROM stores WHERE city IN ('Delhi','Mumbai');
8. Which statement inserts a single row into a table?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- UPDATE table SET ...;
- INSERT INTO table (cols) VALUES (...);
- CREATE INTO table VALUES (...);
- PUT INTO table VALUES (...);
INSERT adds new rows. List columns in a stable order to avoid surprises when schema evolves.
Bulk loads and multi-row inserts are faster than many single-row inserts for large data.
Correct Answer: INSERT INTO table (cols) VALUES (...);
Example Code
INSERT INTO employees (id, name, email)
VALUES (101, 'Asha', 'asha@example.com');
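A sketch of the multi-row form mentioned above; the ids and names are illustrative:

```sql
-- One statement, one round trip: faster than repeated single-row inserts.
INSERT INTO employees (id, name, email)
VALUES (102, 'Ravi', 'ravi@example.com'),
       (103, 'Meera', 'meera@example.com');
```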
9. What is the safest pattern to update or delete rows in production?
Difficulty: Medium | Type: MCQ | Topic: SQL Basics & DML
- Run without WHERE to catch all cases
- Always use LIMIT with DELETE
- Target with a precise WHERE predicate and verify with SELECT first
- Disable foreign keys temporarily for speed
Preview the effect with a SELECT using the same predicate. Then run the UPDATE or DELETE.
Consider transactions and backups when touching many rows. Cascades may propagate changes across tables.
Correct Answer: Target with a precise WHERE predicate and verify with SELECT first
Example Code
-- preview
SELECT * FROM orders WHERE status='CANCELLED';
-- then
DELETE FROM orders WHERE status='CANCELLED';
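A sketch of the transactional variant mentioned above (BEGIN/COMMIT syntax varies slightly by engine):

```sql
BEGIN;
DELETE FROM orders WHERE status='CANCELLED';
-- Compare the reported row count with the SELECT preview;
-- run ROLLBACK instead of COMMIT if it looks wrong.
COMMIT;
```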
10. Why do many teams avoid SELECT * in production queries? Give practical reasons.
Difficulty: Easy | Type: Subjective | Topic: SQL Basics & DML
SELECT * pulls every column, even those you do not need. That increases I/O, network transfer, and CPU load, and can hide poor indexing because wider rows change access paths.
Listing columns makes intent clear, stabilizes APIs, and reduces breakage when new columns are added. It also helps the optimizer when covering indexes can satisfy the query without touching the base table.
Example Code
SELECT id, name, status
FROM tickets
WHERE status='OPEN';
11. Explain how NULL affects comparisons, sorting, and aggregates. How do you handle it safely?
Difficulty: Medium | Type: Subjective | Topic: SQL Basics & DML
NULL means unknown. Comparisons like col = NULL evaluate to unknown, so those conditions never pass. Use IS NULL and IS NOT NULL. Sort order places NULLs first or last depending on the engine, unless you specify NULLS FIRST or NULLS LAST explicitly.
Aggregates skip NULLs except COUNT(*). Use COALESCE to substitute defaults, and design constraints to avoid surprise NULLs where business rules require values.
Example Code
SELECT COALESCE(phone,'N/A') AS phone
FROM users
ORDER BY phone NULLS LAST;
12. Describe how LIKE works with % and _ wildcards. When should you avoid it?
Difficulty: Easy | Type: Subjective | Topic: SQL Basics & DML
Percent matches any sequence of characters, including the empty string. Underscore matches exactly one character. A pattern anchored at the start, like 'abc%', can use indexes; a leading wildcard like '%abc' usually forces a scan.
Use LIKE for simple patterns. For heavy text search or complex rules, prefer full text indexes or regex features to keep queries fast and precise.
Example Code
SELECT * FROM students WHERE first_name LIKE 'K%';
SELECT * FROM students WHERE first_name LIKE '__K%';
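For the regex alternative mentioned above, a sketch using PostgreSQL's ~ operator (engine-specific; other databases use REGEXP or similar):

```sql
-- Case-sensitive regex match: names starting with K.
SELECT * FROM students WHERE first_name ~ '^K';
```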
13. How do you choose appropriate data types for columns, and why does it matter?
Difficulty: Medium | Type: Subjective | Topic: SQL Basics & DML
Pick types that reflect domain rules and size. Use integers for ids, proper date or timestamp types for time, and numeric with scale for money. Narrow types reduce storage, improve cache usage, and speed scans and joins.
Consistent types across join keys prevent implicit casts that block index use. Document units and time zones to avoid logic bugs later.
Example Code
CREATE TABLE payments(
id BIGINT PRIMARY KEY,
amount NUMERIC(12,2) NOT NULL,
paid_at TIMESTAMP NOT NULL
);
14. Outline a safe pattern to perform a multi-step change: insert a row, update a counter, and read back the result.
Difficulty: Medium | Type: Subjective | Topic: SQL Basics & DML
Wrap the steps in a transaction so all succeed or all roll back. Insert the detail row, update the aggregate with a precise predicate, then select the final state to confirm. Keep the transaction short to reduce lock time.
Check for errors at each step and commit only when all statements succeed. This preserves consistency under concurrent load.
Example Code
BEGIN;
INSERT INTO orders(id, customer_id, total) VALUES(?, ?, ?);
UPDATE customers SET order_count = order_count + 1 WHERE id = ?;
SELECT order_count FROM customers WHERE id = ?;
COMMIT;
15. You need customers who placed more than 5 orders in 2025. Where should the date filter go and where should the count filter go?
Difficulty: Easy | Type: MCQ | Topic: Aggregation & Grouping
- Date in HAVING, count in WHERE
- Both filters in WHERE
- Date in WHERE, count in HAVING
- Both filters in HAVING
WHERE filters individual rows before grouping. Put row-level criteria like a date range there so the aggregation only sees relevant rows.
HAVING filters groups after GROUP BY. Use it for predicates on aggregates like COUNT, SUM, or AVG.
Correct Answer: Date in WHERE, count in HAVING
Example Code
SELECT customer_id, COUNT(*) AS orders_2025
FROM orders
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
GROUP BY customer_id
HAVING COUNT(*) > 5;
16. Which statement about COUNT is correct?
Difficulty: Easy | Type: MCQ | Topic: Aggregation & Grouping
- COUNT(col) includes NULL values
- COUNT(*) skips rows with NULL columns
- COUNT(col) ignores NULLs; COUNT(*) counts all rows
- COUNT(DISTINCT col) equals COUNT(col)
COUNT(column) counts only non-NULL entries in that column. COUNT(*) includes every row regardless of NULLs.
When you need unique non-NULLs, use COUNT(DISTINCT col).
Correct Answer: COUNT(col) ignores NULLs; COUNT(*) counts all rows
Example Code
SELECT COUNT(*) all_rows, COUNT(email) emails
FROM users;
17. In standard SQL, which SELECT list is valid with GROUP BY?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- Non-aggregated columns must all appear in GROUP BY
- Any column can appear; the engine picks a value
- Only aggregated columns are allowed; no GROUP BY columns
- GROUP BY columns must be numeric
Every non-aggregated column in the SELECT list must be functionally dependent on the GROUP BY columns, and practically must be listed there.
If a column is not grouped, aggregate it or remove it from the projection.
Correct Answer: Non-aggregated columns must all appear in GROUP BY
Example Code
SELECT dept_id, COUNT(*)
FROM employees
GROUP BY dept_id;
18. How do you count unique paying customers per month?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- COUNT(customer_id)
- COUNT(*)
- COUNT(DISTINCT customer_id)
- SUM(DISTINCT customer_id)
COUNT(DISTINCT col) deduplicates values before counting. It is ideal when multiple rows per customer exist.
If you also group by month, you get unique customers per month.
Correct Answer: COUNT(DISTINCT customer_id)
Example Code
SELECT date_trunc('month', paid_at) AS mon,
COUNT(DISTINCT customer_id) AS unique_buyers
FROM payments
GROUP BY date_trunc('month', paid_at);
19. What is a common pattern to compute conditional totals in a single pass?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- Multiple correlated subqueries
- SUM(CASE WHEN condition THEN value ELSE 0 END)
- JOIN the table to itself per condition
- Use DISTINCT inside SUM directly
Conditional aggregation uses CASE to route rows into different buckets, then aggregates once. It is efficient and expressive.
On PostgreSQL, FILTER(...) is an alternative, but CASE works across engines.
Correct Answer: SUM(CASE WHEN condition THEN value ELSE 0 END)
Example Code
SELECT
SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END) AS paid_amt,
SUM(CASE WHEN status='REFUND' THEN amount ELSE 0 END) AS refund_amt
FROM invoices;
20. Which is the best practice when filtering rows on a simple column value for a grouped report?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- Put the filter in HAVING to be safe
- Put the filter in WHERE for better performance
- Put the filter in ORDER BY for stability
- Duplicate the filter in WHERE and HAVING
Row-level predicates belong in WHERE. That reduces data before grouping and often yields simpler plans.
Reserve HAVING for predicates that depend on aggregates.
Correct Answer: Put the filter in WHERE for better performance
Example Code
SELECT store_id, SUM(total) FROM sales
WHERE region='APAC'
GROUP BY store_id;
21. You need average rating per product, but some rows have NULL rating. What does AVG(rating) return?
Difficulty: Easy | Type: MCQ | Topic: Aggregation & Grouping
- Zero if any NULL exists
- Error due to NULLs
- Mean of non-NULL ratings only
- Mean including NULL as zero
AVG ignores NULL values. Only present ratings contribute to the average.
Use COALESCE when you want to substitute a default before aggregation.
Correct Answer: Mean of non-NULL ratings only
Example Code
SELECT product_id, AVG(rating) AS avg_rating
FROM reviews
GROUP BY product_id;
22. What’s a portable way to group daily sales by date if your engine lacks DATE_TRUNC?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- GROUP BY HOUR(sale_ts)
- GROUP BY CAST(sale_ts AS DATE)
- GROUP BY TO_CHAR(sale_ts, 'YYYYMMDD') for all engines
- Group without conversion; the engine will infer days
Casting a timestamp to DATE groups all times on the same calendar day. It is widely supported across engines.
String formatting works but can introduce locale issues and sort quirks.
Correct Answer: GROUP BY CAST(sale_ts AS DATE)
Example Code
SELECT CAST(sale_ts AS DATE) AS d, SUM(amount)
FROM sales
GROUP BY CAST(sale_ts AS DATE);
23. In PostgreSQL, which syntax cleanly aggregates only rows that match a condition without CASE?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- SUM(*) WHERE condition
- SUM(value) FILTER (WHERE condition)
- SUM(DISTINCT value WHERE condition)
- SUM(value) ONLY IF condition
The FILTER clause applies a WHERE-like predicate to a single aggregate. It keeps the query readable when many conditional metrics are needed.
On engines without FILTER, use SUM(CASE WHEN ... THEN value ELSE 0 END).
Correct Answer: SUM(value) FILTER (WHERE condition)
Example Code
SELECT
COUNT(*) AS orders,
SUM(amount) FILTER (WHERE status='PAID') AS paid_amt
FROM orders;
24. After aggregating, which is the simplest way to sort by the computed total?
Difficulty: Easy | Type: MCQ | Topic: SQL Basics & DML
- Repeat the SUM expression in ORDER BY
- Use the alias of the aggregate in ORDER BY
- ORDER BY must use column numbers only
- ORDER BY is not allowed after GROUP BY
Most engines allow ORDER BY on a SELECT alias. It keeps the query clean and avoids repetition.
If your engine forbids it, repeat the expression or use the ordinal position carefully.
Correct Answer: Use the alias of the aggregate in ORDER BY
Example Code
SELECT customer_id, SUM(total) AS spend
FROM sales
GROUP BY customer_id
ORDER BY spend DESC;
25. Explain how CASE and COALESCE help build robust grouped reports when data is messy.
Difficulty: Medium | Type: Subjective | Topic: SQL Basics & DML
CASE lets you route rows into buckets based on business rules, then aggregate each bucket in one scan. This avoids extra joins and keeps logic in the query. You can also map null or out-of-range values into a safe category before you group.
COALESCE substitutes defaults for NULL so aggregates and comparisons behave predictably. For example, you can turn missing amounts into zero for totals, or fill missing categories with “Unknown” so the group appears in the report instead of disappearing.
Example Code
SELECT COALESCE(category,'Unknown') AS cat,
SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END) AS paid
FROM invoices
GROUP BY COALESCE(category,'Unknown');
26. How would you return each customer’s latest order total without losing other customers? Compare two approaches.
Difficulty: Hard | Type: Subjective | Topic: Window Functions
A window function solution ranks orders per customer by date and then filters to rank equals one. This scans once, computes row-numbers in a partition, and is easy to extend for ties or top-k.
A join solution aggregates to the max order date per customer, then joins back to orders to fetch the row. It is portable but can be trickier with ties or if dates are not unique. Window functions are preferred when available for clarity and performance.
Example Code
SELECT customer_id, total
FROM (
SELECT customer_id, total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
) t
WHERE rn = 1;
27. You need products with zero sales in 2025. Show two correct patterns and explain their trade-offs.
Difficulty: Medium | Type: Subjective | Topic: Joins & Set Operations
A LEFT JOIN with WHERE right.id IS NULL is a classic anti-join. It is readable and can use indexes on the join keys. Be sure the join predicate covers the date range so matched rows are only from 2025.
NOT EXISTS with a correlated subquery is also correct. It stops on first match and handles NULLs cleanly. Engines often produce similar plans; prefer the pattern your team uses consistently.
Example Code
SELECT p.product_id
FROM products p
LEFT JOIN sales s
ON s.product_id = p.product_id
AND s.sale_date >= '2025-01-01' AND s.sale_date < '2026-01-01'
WHERE s.product_id IS NULL;
-- or
SELECT p.product_id
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM sales s
WHERE s.product_id=p.product_id
AND s.sale_date >= '2025-01-01' AND s.sale_date < '2026-01-01'
);
28. How do DISTINCT and GROUP BY differ when removing duplicates, and when would you prefer one?
Difficulty: Medium | Type: Subjective | Topic: SQL Basics & DML
DISTINCT removes duplicate rows from the projection exactly as listed. It is concise when you just need unique combinations of columns without any aggregates.
GROUP BY collapses rows by keys and invites aggregates in the same step. Prefer GROUP BY when you also need counts or sums per unique combination. Prefer DISTINCT when you only need uniqueness and no metrics.
Example Code
SELECT DISTINCT user_id FROM logins;
-- vs
SELECT user_id, COUNT(*) FROM logins GROUP BY user_id;
29. Describe a query to return the top 3 products by revenue in each category and explain tie handling.
Difficulty: Hard | Type: Subjective | Topic: Window Functions
Use a window function to rank rows inside each category by revenue. ROW_NUMBER gives exactly three rows even with ties; RANK or DENSE_RANK include all ties but may return more than three rows per category.
Pick the ranking function that matches the business rule. Then filter on the rank in an outer query for a clean result.
Example Code
SELECT category_id, product_id, revenue
FROM (
SELECT category_id, product_id, SUM(amount) AS revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY SUM(amount) DESC) AS rn
FROM sales
GROUP BY category_id, product_id
) t
WHERE rn <= 3;
30. What does an INNER JOIN return?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- All rows from the left table regardless of match
- Only rows that satisfy the join predicate in both tables
- All rows from both tables regardless of match
- Only unmatched rows from both tables
An INNER JOIN keeps only the intersection: rows that match on the join keys in both inputs. If a row does not find a partner, it is excluded.
Use INNER JOIN when you need strictly related data and do not want NULLs from missing matches.
Correct Answer: Only rows that satisfy the join predicate in both tables
Example Code
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id;
31. Which statement about LEFT JOIN is correct?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- It returns only matching rows from left and right
- It returns all left rows and matching right rows; non-matches have NULLs on the right
- It returns all right rows and matching left rows; non-matches have NULLs on the left
- It returns only unmatched rows
LEFT JOIN preserves every row from the left input and fills right-side columns with NULL when a match is missing.
Use it to find presence plus absence cases, such as customers without orders.
Correct Answer: It returns all left rows and matching right rows; non-matches have NULLs on the right
Example Code
SELECT c.id, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;
32. How does a RIGHT JOIN relate to a LEFT JOIN?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- They are identical and interchangeable without changes
- RIGHT JOIN preserves the left table; LEFT JOIN preserves the right table
- RIGHT JOIN is the mirror image of LEFT JOIN after swapping table positions
- RIGHT JOIN always performs faster than LEFT JOIN
RIGHT JOIN keeps all rows from the right input and matches from the left; LEFT JOIN does the opposite. Swapping table order turns one into the other.
Most teams standardize on LEFT JOIN for readability and consistency.
Correct Answer: RIGHT JOIN is the mirror image of LEFT JOIN after swapping table positions
Example Code
SELECT *
FROM A LEFT JOIN B ON B.k = A.k;
-- same rows as
SELECT *
FROM B RIGHT JOIN A ON B.k = A.k;
33. What does a FULL OUTER JOIN return?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- Only rows present in both tables
- All left rows only
- All right rows only
- All rows that appear in either table, matching when possible
FULL OUTER JOIN returns the union of matches and non-matches from both sides, padding missing columns with NULLs.
It is useful for reconciliation and diff-like reports where you must see orphans from both directions.
Correct Answer: All rows that appear in either table, matching when possible
Example Code
SELECT a.id, b.id
FROM A a
FULL OUTER JOIN B b ON b.id = a.id;
34. What is a CROSS JOIN?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- A join that requires equality match on keys
- A join that returns only unmatched rows
- A join that returns the cartesian product of the two inputs
- A join that removes duplicates automatically
CROSS JOIN pairs every row from the first input with every row from the second. The result has rows equal to rows(A) times rows(B).
Use it to generate combinations such as dates by products, but be careful with size explosion.
Correct Answer: A join that returns the cartesian product of the two inputs
Example Code
SELECT d.dt, p.product_id
FROM dates d
CROSS JOIN products p;
35. When do you use a SELF JOIN?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- To join a table with a different database
- To join a table with itself to model hierarchies or peer relations
- To join two tables with the same name in different schemas
- To avoid using subqueries
A SELF JOIN uses aliases to treat the same table as two roles. It helps express parent-child links or supervisor-employee relations.
Aliases are mandatory so the columns from each role can be referenced unambiguously.
Correct Answer: To join a table with itself to model hierarchies or peer relations
Example Code
SELECT e.emp_id, e.name, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.emp_id = e.manager_id;
36. Which pattern correctly returns rows in A that have no match in B?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- A JOIN B ON ...
- A LEFT JOIN B ON ... WHERE B.id IS NULL
- A RIGHT JOIN B ON ... WHERE A.id IS NULL
- A FULL JOIN B WHERE A.id IS NOT NULL AND B.id IS NOT NULL
A LEFT JOIN followed by WHERE B.id IS NULL forms an anti-join. It keeps left rows that did not match any right row.
NOT EXISTS is an equivalent and often clearer way to write the same logic.
Correct Answer: A LEFT JOIN B ON ... WHERE B.id IS NULL
Example Code
SELECT a.id
FROM A a
LEFT JOIN B b ON b.id = a.id
WHERE b.id IS NULL;
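The equivalent NOT EXISTS form mentioned above, as a sketch:

```sql
-- Anti-join written as NOT EXISTS: NULL-safe and stops at the first match.
SELECT a.id
FROM A a
WHERE NOT EXISTS (
  SELECT 1 FROM B b WHERE b.id = a.id
);
```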
37. How do you select rows from A that have at least one match in B without duplicating A due to multiple matches?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- A JOIN B and SELECT DISTINCT A columns
- A LEFT JOIN B WHERE B.id IS NULL
- Use EXISTS with a correlated subquery on B
- FULL JOIN and filter non-null sides
EXISTS returns true on the first found match and stops. It avoids row multiplication and usually produces an efficient semi-join plan.
Distinct can work but may add extra sort or hash steps and is less direct.
Correct Answer: Use EXISTS with a correlated subquery on B
Example Code
SELECT a.id
FROM A a
WHERE EXISTS (
SELECT 1 FROM B b WHERE b.a_id = a.id
);
38. In a LEFT JOIN, where should you place a filter on the right table's column to avoid unintentionally turning it into an INNER JOIN?
Difficulty: Medium | Type: MCQ | Topic: SQL Basics & DML
- In the WHERE clause
- In the ON clause
- After GROUP BY
- In ORDER BY
Filters on the right table that go in WHERE will eliminate NULL-extended rows and effectively collapse the join to INNER. Put such filters inside ON so unmatched rows are still preserved.
Use WHERE only for predicates that should apply after the join, such as filtering by a left table column.
Correct Answer: In the ON clause
Example Code
SELECT c.id, o.id AS order_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.id
AND o.status = 'PAID'
WHERE c.active = 1;
39. Which set operator keeps duplicates?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- UNION
- INTERSECT
- EXCEPT
- UNION ALL
UNION performs a distinct across rows, removing duplicates. UNION ALL appends results and preserves duplicates.
Use UNION ALL when you know inputs are disjoint or you want to keep frequency.
Correct Answer: UNION ALL
Example Code
SELECT name FROM customers
UNION ALL
SELECT name FROM employees;
40. After joining orders to order_items, your totals are doubled. Why did this happen, and how do you fix it?
Difficulty: Medium | Type: Subjective | Topic: Joins & Set Operations
The join expanded each order into one row per item. If you later sum an order level column, it repeats per item and inflates the total. This is a cardinality trap common with one-to-many joins.
Fix by aggregating at the right level before joining, or by summing item level amounts instead of order level amounts. Another option is to SELECT DISTINCT at the order level if your goal is only to deduplicate, but pre-aggregation is more precise for sums.
Example Code
SELECT SUM(oi.line_total)
FROM orders o
JOIN order_items oi ON oi.order_id = o.id;
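A sketch of the pre-aggregation fix described above, assuming order_items carries a line_total column:

```sql
-- Aggregate items to order level first, so the join stays one-to-one.
SELECT SUM(t.order_total)
FROM orders o
JOIN (
  SELECT order_id, SUM(line_total) AS order_total
  FROM order_items
  GROUP BY order_id
) t ON t.order_id = o.id;
```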
41. Compare NOT EXISTS with NOT IN for anti-joins. When can NOT IN produce surprising results?
Difficulty: Medium | Type: Subjective | Topic: Joins & Set Operations
NOT EXISTS uses a correlated subquery and returns true when no matching row exists. It is NULL safe and stops at the first match. Plans are often efficient on indexed keys.
NOT IN compares a value against a list. If the subquery can return NULL, the comparison becomes unknown and may return no rows at all. Prefer NOT EXISTS for robust results unless you guarantee the subquery column is NOT NULL.
Example Code
SELECT a.id FROM A a
WHERE NOT EXISTS (
SELECT 1 FROM B b WHERE b.a_id = a.id
);
42. Your engine lacks FULL OUTER JOIN. How can you emulate it, and what are the trade-offs?
Difficulty: Hard | Type: Subjective | Topic: Joins & Set Operations
Union a LEFT JOIN with a RIGHT JOIN, and remove the overlap by restricting the RIGHT JOIN branch to rows with no left-side match (WHERE a.k IS NULL). This yields all matches and non-matches from both inputs.
The trade-off is extra scan work and potential duplication risk if the filters are not precise. Indexes on join keys help both branches. Test the plan size on large tables to ensure it remains acceptable.
Example Code
SELECT a.*, b.*
FROM A a LEFT JOIN B b ON b.k = a.k
UNION ALL
SELECT a.*, b.*
FROM A a RIGHT JOIN B b ON b.k = a.k
WHERE a.k IS NULL;
43. How does join order and indexing affect performance in multi-table joins?
Difficulty: Medium | Type: Subjective | Topic: Joins & Set Operations
Optimizers try different join orders to minimize intermediate row counts. Starting with the most selective filters reduces data early and speeds the whole plan. Good statistics and appropriate indexes help the optimizer choose correctly.
Indexes on join keys enable efficient lookups and hash or merge joins. Composite indexes that match the join and filter columns in order can be especially powerful in star schemas.
Example Code
SELECT f.sale_id, d.date_id
FROM fact_sales f
JOIN dim_date d ON d.date_id = f.date_id
WHERE d.yyyy_mm = 202509;
44. Explain when to put predicates in ON versus WHERE for outer joins. Give a rule of thumb.
Difficulty: Medium | Type: Subjective | Topic: Joins & Set Operations
Use ON to define matching logic and any right-side filters you want applied before deciding whether a row matches. This keeps NULL-extended rows when the right side fails either the join or its filter.
Use WHERE for predicates that should apply after the join, typically on the preserved table. A simple rule: put right-side filters in ON when using a LEFT JOIN; put left-side filters in ON when using a RIGHT JOIN.
Example Code
SELECT c.id, o.id AS order_id
FROM customers c
LEFT JOIN orders o
ON o.customer_id = c.id AND o.status='PAID'
WHERE c.region='APAC';
45. Which option best describes a correlated subquery?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- A subquery that runs once and returns a constant
- A subquery that references columns from the outer query and runs per outer row
- A subquery that always returns one row and one column
- A subquery used only in the SELECT list
A correlated subquery depends on the current row of the outer query. The database must re-evaluate it for each outer row.
This design is expressive, but can be slow if the inner expression is not selective or not indexed. Many engines can transform it into a join when the logic allows.
Correct Answer: A subquery that references columns from the outer query and runs per outer row
Example Code
SELECT e.emp_id
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees x
WHERE x.dept_id = e.dept_id
);
46. You must return customers who have at least one order. Which pattern is most robust and avoids duplicate inflation?
Difficulty: Medium | Type: MCQ | Topic: Subqueries & CTEs
- JOIN orders and SELECT DISTINCT customers
- WHERE customer_id IN (SELECT customer_id FROM orders)
- WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
- FULL JOIN orders and filter NULLs
EXISTS is a semi-join. It returns true on the first match and stops. It does not duplicate the left row, so you avoid DISTINCT.
IN can work if the subquery is deduplicated and non-null. EXISTS is safer and often produces efficient plans with good indexes.
Correct Answer: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
Example Code
SELECT c.customer_id
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
47. When can NOT IN return zero rows unexpectedly?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- When the left side contains duplicates
- When the right side has at least one NULL value
- When the subquery uses ORDER BY
- When both sides are numeric
NOT IN compares each value to the list. If the list contains NULL, comparisons become unknown and no row qualifies.
Use NOT EXISTS for anti-joins, or ensure the subquery column is declared NOT NULL or filtered to exclude NULL.
Correct Answer: When the right side has at least one NULL value
Example Code
SELECT a.id
FROM A a
WHERE a.id NOT IN (
SELECT b.a_id FROM B b WHERE b.a_id IS NOT NULL
);
48. What is a main benefit of using a Common Table Expression (CTE) with WITH?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- It permanently stores results in the database
- It is always faster than a subquery
- It improves readability by naming intermediate results in steps
- It forces parallel execution
A CTE lets you break logic into steps and reuse names in a single statement. It makes complex queries easier to read and maintain.
Performance is engine dependent. A CTE may inline like a subquery, or materialize. Treat it as a clarity tool first.
Correct Answer: It improves readability by naming intermediate results in steps
Example Code
WITH recent AS (
SELECT * FROM orders WHERE order_date >= '2025-01-01'
)
SELECT customer_id, COUNT(*) FROM recent GROUP BY customer_id;
49. Which statement about using multiple CTEs is correct?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- Only one CTE is allowed per query
- CTEs cannot reference other CTEs
- You can chain CTEs and later ones can read earlier ones
- CTEs cannot be used with aggregates
You can define many CTEs in a query. Later CTEs can read from earlier ones, which helps you build complex logic step by step.
This structure also keeps each step testable and easier to reason about during reviews.
Correct Answer: You can chain CTEs and later ones can read earlier ones
Example Code
WITH s AS (
SELECT customer_id, SUM(total) AS spend FROM orders GROUP BY customer_id
), r AS (
SELECT * FROM s WHERE spend > 1000
)
SELECT * FROM r;
50. In a recursive CTE, what are the two essential parts?
Difficulty: Medium | Type: MCQ | Topic: Subqueries & CTEs
- A window and a partition
- An anchor member and a recursive member
- A view and a trigger
- A function and a procedure
The anchor member seeds the first set of rows. The recursive member references the CTE and expands the set until no new rows appear.
Always add a termination guard such as a depth limit to avoid infinite recursion.
Correct Answer: An anchor member and a recursive member
Example Code
WITH RECURSIVE org AS (
SELECT id, manager_id, 1 AS lvl FROM emp WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.manager_id, o.lvl+1 FROM emp e JOIN org o ON e.manager_id=o.id
)
SELECT * FROM org;
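The anchor/recursive split can be run end to end with Python's sqlite3 module (the four-row emp table below is invented for illustration; SQLite has supported WITH RECURSIVE since 3.8.3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emp(id INTEGER, manager_id INTEGER);
INSERT INTO emp VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

rows = con.execute("""
WITH RECURSIVE org AS (
  -- anchor member: seed with the root (no manager)
  SELECT id, manager_id, 1 AS lvl FROM emp WHERE manager_id IS NULL
  UNION ALL
  -- recursive member: attach direct reports, one level deeper
  SELECT e.id, e.manager_id, o.lvl + 1
  FROM emp e JOIN org o ON e.manager_id = o.id
)
SELECT id, lvl FROM org ORDER BY lvl, id
""").fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 2), (4, 3)]
```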
51. Which pattern safely updates a table based on a value computed by a subquery?
Difficulty: Medium | Type: MCQ | Topic: Subqueries & CTEs
- UPDATE with a correlated subquery in SET
- DELETE then INSERT the new values
- Use TRUNCATE and reload the table
- You cannot reference the same table in a subquery
You can compute a value in a subquery and set the column in one statement. This keeps the change atomic and clear.
Make sure the subquery is selective and indexed to avoid row by row scans in large tables.
Correct Answer: UPDATE with a correlated subquery in SET
Example Code
UPDATE orders o
SET discount = (
SELECT CASE WHEN vip THEN 0.1 ELSE 0 END
FROM customers c WHERE c.id = o.customer_id
);
52. Which is often faster for existence checks on large, indexed tables?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- JOIN plus DISTINCT on the left keys
- EXISTS with a selective correlated subquery
- FULL OUTER JOIN with filters
- UNION ALL between tables
EXISTS can stop on the first match and leverage an index on the right table. That avoids duplicate inflation and extra sorts.
JOIN plus DISTINCT may add more work because the left row can multiply before being deduplicated.
Correct Answer: EXISTS with a selective correlated subquery
Example Code
SELECT c.id FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id=c.id AND o.total>0
);
53. Which statement compares a CTE and a temporary table accurately?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- A CTE stores data on disk across sessions
- A temp table is a named object you can index and reuse within the session
- CTEs are always materialized to disk
- Temp tables cannot be used in joins
Temp tables live for the session or transaction and can be indexed and referenced many times. They are good for reuse and complex pipelines.
CTEs are single statement helpers. Many engines inline them; some may materialize. Pick based on reuse and performance needs.
Correct Answer: A temp table is a named object you can index and reuse within the session
Example Code
CREATE TEMP TABLE t AS SELECT * FROM big WHERE flag=1;
CREATE INDEX ON t(id);
SELECT * FROM t JOIN dim USING(id);
54. Explain how you would use a CTE to deduplicate users by email while keeping the most recent row.
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
First, rank rows per email by a stable order such as created time or an auto id. Use ROW_NUMBER in a CTE so the row you want to keep gets number one. This is easy to reason about and test.
Then filter to the rows with row number equal to one. You get exactly one row per email without a self join. This pattern is clear and works across engines with window support.
Example Code
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) rn
FROM users
)
SELECT * FROM ranked WHERE rn=1;
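A runnable version of the pattern, using Python's sqlite3 (the sample emails and dates are invented; window functions require SQLite 3.25+):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users(email TEXT, created_at TEXT);
INSERT INTO users VALUES
  ('a@x.com', '2025-01-01'),
  ('a@x.com', '2025-03-01'),
  ('b@x.com', '2025-02-01');
""")

rows = con.execute("""
WITH ranked AS (
  SELECT email, created_at,
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) AS rn
  FROM users
)
SELECT email, created_at FROM ranked WHERE rn = 1 ORDER BY email
""").fetchall()
# exactly one row per email, keeping the most recent
print(rows)  # [('a@x.com', '2025-03-01'), ('b@x.com', '2025-02-01')]
```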
55. How do you traverse a hierarchy with a recursive CTE to get full paths and depths? How do you avoid cycles?
Difficulty: Hard | Type: Subjective | Topic: Subqueries & CTEs
Seed the root rows in the anchor member with level one and a path column that starts at the root. In the recursive member, join children to parents, increase the level, and append to the path. This produces a row for each reachable node with its full chain.
To avoid cycles, track visited ids in the path and stop when the next id already appears. You can also add a max depth guard to prevent runaway recursion if the data is corrupt.
Example Code
WITH RECURSIVE tree AS (
SELECT id, parent_id, 1 AS lvl, CAST(id AS TEXT) AS path
FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, t.lvl+1, t.path||'>'||n.id
FROM nodes n JOIN tree t ON n.parent_id=t.id
WHERE POSITION('>'||CAST(n.id AS TEXT)||'>' IN '>'||t.path||'>')=0 AND t.lvl<20
)
SELECT * FROM tree;
56. Show how you would express an anti-join using NOT EXISTS and explain why it handles NULLs correctly.
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
Write a correlated subquery in NOT EXISTS that checks for the absence of a matching row. The database evaluates it per outer row and returns true only when the inner query returns no rows.
This approach does not compare values directly, so NULL does not break the logic. It avoids the three valued logic pitfall that can affect NOT IN when the right side has NULLs.
Example Code
SELECT p.product_id
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM sales s
WHERE s.product_id = p.product_id
);
57. Compare a correlated subquery and a window function to get each department’s highest paid employee.
Difficulty: Medium | Type: Subjective | Topic: Window Functions
A correlated subquery approach joins employees to a per department maximum and filters on equality. It is portable and simple to read, but can struggle with ties without extra logic.
A window function ranks employees inside each department and then you filter to rank one. This is clear, flexible for ties, and usually efficient because it scans once and avoids self joins.
Example Code
SELECT dept_id, emp_id, salary
FROM (
SELECT dept_id, emp_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) rn
FROM employees
) x
WHERE rn=1;
58. Do CTEs always improve performance? Explain how engines may inline or materialize them and how you decide.
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
A CTE is a query organization tool. Some engines inline it like a view, so it behaves like a subquery. Others may materialize it into a temporary result. The choice can change the plan and memory use.
Decide by measuring with an execution plan. If reuse or expensive re-computation is the issue, a temp table with an index can win. If clarity is the goal and the step is cheap, a CTE is ideal.
Example Code
WITH t AS (SELECT * FROM big WHERE flag=1)
SELECT COUNT(*) FROM t;
59. Which statement correctly contrasts ROW_NUMBER(), RANK(), and DENSE_RANK()?
Difficulty: Easy | Type: MCQ | Topic: Window Functions
- All assign the same value to ties and do not skip numbers
- ROW_NUMBER() breaks ties arbitrarily; RANK() leaves gaps; DENSE_RANK() has no gaps
- ROW_NUMBER() leaves gaps; RANK() does not; DENSE_RANK() breaks ties arbitrarily
- Only DENSE_RANK() requires ORDER BY
ROW_NUMBER() gives a unique sequence per partition. If two rows tie on the sort key, it still assigns different numbers, so the order among tied rows is arbitrary unless you add a deterministic tiebreaker to ORDER BY.
RANK() assigns the same rank to ties but leaves gaps after a tie. DENSE_RANK() also assigns the same rank to ties, but the next rank is the immediate next number, so there are no gaps.
Correct Answer: ROW_NUMBER() breaks ties arbitrarily; RANK() leaves gaps; DENSE_RANK() has no gaps
Example Code
SELECT emp_id,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS drnk
FROM employees;
60. In a window definition, what do PARTITION BY and ORDER BY control?
Difficulty: Easy | Type: MCQ | Topic: Window Functions
- PARTITION BY sorts rows; ORDER BY filters rows
- PARTITION BY chooses aggregate; ORDER BY chooses frame
- PARTITION BY forms independent groups; ORDER BY defines row order within each group
- Both PARTITION BY and ORDER BY are optional and equivalent
PARTITION BY splits the result set into logical groups. The function runs separately within each partition. ORDER BY sets the sequence inside each partition so the function can compute running or ranked results.
Leaving PARTITION BY out means the entire result is a single partition. Leaving ORDER BY out makes ranking undefined and turns many analytics into simple per-partition aggregates.
Correct Answer: PARTITION BY forms independent groups; ORDER BY defines row order within each group
Example Code
SUM(amount) OVER (PARTITION BY customer_id ORDER BY sale_ts)
61. Which frame returns a classic running total that includes the current row?
Difficulty: Medium | Type: MCQ | Topic: Window Functions
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
- RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
A running total accumulates from the start of the partition up to the current row. The ROWS frame is precise and counts physical rows, which avoids surprises with duplicate sort values.
RANGE frames can expand to peers that share the same ORDER BY value, which can change results. For financial and inventory totals, prefer ROWS for predictable behavior.
Correct Answer: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Example Code
SELECT order_id, sale_date, amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY sale_date, order_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales;
62. How do you compute a 7-day moving average per product using window functions?
Difficulty: Medium | Type: MCQ | Topic: Window Functions
- AVG(amount) GROUP BY 7
- AVG(amount) OVER (PARTITION BY product ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
- AVG(amount) OVER (ORDER BY product RANGE 7 PRECEDING)
- AVG(amount) OVER (PARTITION BY day ORDER BY product)
A moving window spans a fixed number of rows relative to the current row. For seven days, include six preceding rows plus the current row.
Partition by product to keep time series independent. Use ROWS frames for steady behavior when multiple rows share the same day stamp.
Correct Answer: AVG(amount) OVER (PARTITION BY product ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
Example Code
SELECT product_id, day,
AVG(amount) OVER (
PARTITION BY product_id
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS avg7
FROM daily_sales;
63. What does LAG(value, 1, 0) return when there is no previous row in the partition?
Difficulty: Easy | Type: MCQ | Topic: Window Functions
- NULL always
- The value of the current row
- Zero, because of the default argument
- An error due to missing row
LAG reads a value from a previous row in the same ordered partition. When that row does not exist, the default is used.
Supplying a safe default keeps arithmetic stable and avoids NULL propagation in first-row calculations.
Correct Answer: Zero, because of the default argument
Example Code
SELECT day, amount,
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY day) AS prev_amt
FROM daily_sales;
64. FIRST_VALUE and LAST_VALUE often need an explicit frame. Why?
Difficulty: Medium | Type: MCQ | Topic: Window Functions
- They always ignore ORDER BY
- Default frame can include following rows, making LAST_VALUE equal to a future row
- They only work with RANGE frames
- They do not support PARTITION BY
The default frame for ordered windows is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in many engines. With peers, that frame can include more rows than expected, and LAST_VALUE may not reflect the true last row in the partition.
Set ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to get the absolute first or last within the partition, or use CURRENT ROW for stepwise behavior.
Correct Answer: Default frame can include following rows, making LAST_VALUE equal to a future row
Example Code
LAST_VALUE(amount) OVER (
PARTITION BY product
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amt
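The default-frame surprise is easy to reproduce with Python's sqlite3 (the daily table is invented for illustration): without an explicit frame, LAST_VALUE just echoes the current row, while the full ROWS frame returns the true last value of the partition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE daily(product TEXT, day TEXT, amount INTEGER);
INSERT INTO daily VALUES ('p', 'd1', 10), ('p', 'd2', 20), ('p', 'd3', 30);
""")

rows = con.execute("""
SELECT day,
       -- default frame stops at CURRENT ROW, so this is just the current amount
       LAST_VALUE(amount) OVER (PARTITION BY product ORDER BY day) AS dflt,
       -- explicit full frame returns the true last value of the partition
       LAST_VALUE(amount) OVER (
         PARTITION BY product ORDER BY day
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS full_frame
FROM daily ORDER BY day
""").fetchall()
print(rows)  # [('d1', 10, 30), ('d2', 20, 30), ('d3', 30, 30)]
```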
65. Which pair of window distribution functions is useful for gauging relative standing?
Difficulty: Medium | Type: MCQ | Topic: Window Functions
- NTILE and CUME_DIST
- SUM and AVG
- MIN and MAX
- COUNT and DISTINCT
CUME_DIST returns a value greater than 0 and at most 1 that gives a row's position in the cumulative distribution. NTILE splits ordered rows into a fixed number of buckets like quartiles or deciles.
They are handy for percentile-style reporting, leaderboards, and customer segmentation without collapsing rows.
Correct Answer: NTILE and CUME_DIST
Example Code
SELECT user_id, spend,
CUME_DIST() OVER (ORDER BY spend) AS cd,
NTILE(4) OVER (ORDER BY spend) AS quartile
FROM users_spend;
66. How do GROUP BY aggregations differ from window aggregations?
Difficulty: Easy | Type: MCQ | Topic: Window Functions
- They are identical; both preserve all rows
- GROUP BY collapses rows per group; window functions add metrics while keeping each row
- Window functions drop duplicates; GROUP BY does not
- Only window functions can use SUM
GROUP BY returns one row per group and removes detail. Window aggregates compute per-row metrics across related rows, so the original detail is preserved.
This allows you to display both row-level attributes and group-level statistics in one result.
Correct Answer: GROUP BY collapses rows per group; window functions add metrics while keeping each row
Example Code
SELECT order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id) AS cust_total
FROM orders;
67. You want a running total that resets each month per customer. What is the correct setup?
Difficulty: Medium | Type: MCQ | Topic: Window Functions
- PARTITION BY customer_id only
- PARTITION BY month only; ORDER BY customer_id
- PARTITION BY customer_id, month; ORDER BY date
- No PARTITION BY; ORDER BY month, customer_id
To reset totals at month boundaries and per customer, include both customer and month in the partition key. The ORDER BY controls the accumulation order within each partition.
A clean month key can be a date truncated to month or a yyyymm integer.
Correct Answer: PARTITION BY customer_id, month; ORDER BY date
Example Code
SELECT customer_id, DATE_TRUNC('month', sale_ts) AS mon, sale_ts, amount,
SUM(amount) OVER (
PARTITION BY customer_id, DATE_TRUNC('month', sale_ts)
ORDER BY sale_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS run_month
FROM sales;
68. How would you compute a running total that only counts 'PAID' invoices while still showing all invoice rows?
Difficulty: Medium | Type: Subjective | Topic: Window Functions
Use a conditional expression inside the window aggregate. Wrap the amount in CASE so only qualifying rows contribute to the running sum. The window still spans all rows in order, so non-paid rows appear with the same running total carry-over.
This pattern is efficient because the table is scanned once and the condition is applied row by row. It avoids extra joins and keeps the business rule close to the metric.
Example Code
SELECT invoice_id, status, amount,
SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END)
OVER (ORDER BY issued_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS paid_running
FROM invoices
ORDER BY issued_at;
69. Explain how to compute day-over-day change and growth rate for a metric using window functions.
Difficulty: Easy | Type: Subjective | Topic: Window Functions
Use LAG to fetch the previous day’s value in the same series. Subtract to get the absolute change, and divide by the lag value to get the percentage growth. Provide a safe default in LAG or guard against division by zero to keep results stable.
Partition the series by entity, like product or region, and order by the date. This produces clean, independent time series with clear comparisons across consecutive points.
Example Code
SELECT day, val,
LAG(val,1) OVER (PARTITION BY series_id ORDER BY day) AS prev,
val - LAG(val,1,0) OVER (PARTITION BY series_id ORDER BY day) AS delta,
CASE WHEN LAG(val,1) OVER (PARTITION BY series_id ORDER BY day) > 0
THEN (val - LAG(val,1) OVER (PARTITION BY series_id ORDER BY day))
/ LAG(val,1) OVER (PARTITION BY series_id ORDER BY day)
END AS growth
FROM series;
70. Describe two ways to return the top 3 products by revenue in each category and discuss tie handling.
Difficulty: Medium | Type: Subjective | Topic: Window Functions
A window approach computes revenue per product, ranks within the category using ROW_NUMBER, and filters rows with rank less than or equal to three. ROW_NUMBER returns exactly three rows even if there are ties at the boundary.
If ties must be included, use RANK or DENSE_RANK and filter on rank less than or equal to three, which may return more than three rows per category. Choose the ranking function that matches the business rule.
Example Code
SELECT category_id, product_id, revenue
FROM (
SELECT category_id, product_id, SUM(amount) AS revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY SUM(amount) DESC) AS rn
FROM sales
GROUP BY category_id, product_id
) t
WHERE rn <= 3;
71. What pitfalls arise from using RANGE frames with duplicate ORDER BY values, and how do you avoid them?
Difficulty: Hard | Type: Subjective | Topic: Window Functions
RANGE frames operate on value ranges, not physical row counts. When many rows share the same ORDER BY value, the frame can expand to include all peers. Running totals may jump by batches, and averages might include more rows than you expect.
To avoid surprises, use ROWS frames for precise row counts, or include a tiebreaker in ORDER BY like an id column to create a strict ordering. For first and last value, specify an explicit ROWS frame that covers exactly the rows you want.
Example Code
SUM(amount) OVER (
ORDER BY day, id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS safe_running
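A minimal sqlite3 reproduction of the peer-expansion effect (sample data invented): with two rows sharing the same day, the RANGE running total jumps in a batch, while ROWS with an id tiebreaker advances one row at a time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE s(day TEXT, id INTEGER, amount INTEGER);
INSERT INTO s VALUES ('d1', 1, 10), ('d1', 2, 10), ('d2', 3, 10);
""")

rows = con.execute("""
SELECT id,
       -- RANGE treats rows with the same day as peers: the total jumps by batches
       SUM(amount) OVER (ORDER BY day
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_range,
       -- ROWS plus a tiebreaker counts physical rows one at a time
       SUM(amount) OVER (ORDER BY day, id
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_rows
FROM s ORDER BY day, id
""").fetchall()
print(rows)  # [(1, 20, 10), (2, 20, 20), (3, 30, 30)]
```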
72. When would you prefer window functions over self-joins for comparative analytics, and why?
Difficulty: Medium | Type: Subjective | Topic: Window Functions
Window functions read neighboring rows without multiplying the dataset. They are ideal for running totals, period-over-period comparisons, and rank-based filters. Plans are usually simpler, and the intent is clearer in code reviews.
Self-joins can express the same logic but tend to inflate rows and require DISTINCT or GROUP BY to recover granularity. That increases memory and sort work. Window functions preserve detail and perform well with proper indexing and ordering.
Example Code
SELECT order_id, amount,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amt
FROM orders;
73. Some warehouses support QUALIFY to filter on window results. Without QUALIFY, how do you structure the query cleanly?
Difficulty: Medium | Type: Subjective | Topic: Window Functions
Compute the window metric in a subquery or CTE, then filter in the outer query. This keeps the window logic readable and the filter straightforward, and it works across engines that lack QUALIFY.
Name the computed column clearly, such as rn or score, so the outer WHERE clause is self-explanatory. The optimizer can often inline the step without extra cost.
Example Code
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS rn
FROM events
)
SELECT * FROM ranked WHERE rn = 1;
74. What is the defining property of a clustered index?
Difficulty: Easy | Type: MCQ | Topic: Indexing
- It stores a separate structure pointing to table rows
- It orders the table’s data pages by the index key
- It compresses data but does not affect order
- It can be created multiple times per table without limits
A clustered index defines the physical order of rows on disk. The table and the index are the same structure, so range scans on the key are very fast.
Because the table can be ordered only one way, you can have only one clustered index per table.
Correct Answer: It orders the table’s data pages by the index key
Example Code
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY, -- often clustered by default
order_date DATE,
customer_id BIGINT
);
75. When does a non-clustered index help most?
Difficulty: Easy | Type: MCQ | Topic: Indexing
- On low-selectivity columns like boolean flags only
- On columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY
- Only on columns that are part of the primary key
- Only when the table also has no clustered index
A non-clustered index is a separate structure (key → row locator). It accelerates point lookups, filters, joins, and sorts.
Choose columns that are selective and appear often in predicates or ordering to see the most benefit.
Correct Answer: On columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY
Example Code
CREATE INDEX idx_users_email ON users(email);
76. Which column is the best candidate for an index from a selectivity perspective?
Difficulty: Medium | Type: MCQ | Topic: Indexing
- status (values: 'A' or 'B' only)
- country_code (200 possible values, unevenly distributed)
- email (unique per user)
- is_active (true/false)
High-selectivity columns filter out many rows per lookup. Unique keys are the most selective and typically index well.
Low-cardinality columns like boolean flags rarely help alone, though they can help as the leading column in a composite index for special cases.
Correct Answer: email (unique per user)
Example Code
CREATE UNIQUE INDEX idx_users_email ON users(email);
77. What is a covering index?
Difficulty: Medium | Type: MCQ | Topic: Indexing
- An index that includes every column in the table
- An index whose key columns match the primary key
- An index that includes all columns a query needs so no table lookup is required
- A clustered index with included columns
If the index contains the filter, join, and select columns, the engine can answer the query from the index alone. That avoids extra lookups and reduces I/O.
In some engines you add non-key INCLUDE columns to achieve coverage without bloating the key.
Correct Answer: An index that includes all columns a query needs so no table lookup is required
Example Code
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date)
INCLUDE (total_amount);
78. For a composite index on (country_code, email), which predicate can fully seek the index?
Difficulty: Medium | Type: MCQ | Topic: Indexing
- WHERE email = 'a@b.com'
- WHERE country_code = 'IN' AND email = 'a@b.com'
- WHERE country_code LIKE '%N'
- WHERE email LIKE 'a%'
Composite indexes are left-prefix. To fully seek, you must filter on the leading column(s). Filtering only on the second column often needs a scan.
Plan index column order by your most common predicates and how they combine.
Correct Answer: WHERE country_code = 'IN' AND email = 'a@b.com'
Example Code
CREATE INDEX idx_users_cc_email ON users(country_code, email);
79. Which WHERE clause is SARGable and most likely to use an index on created_at?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- WHERE DATE(created_at) = '2025-10-01'
- WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
- WHERE EXTRACT(DAY FROM created_at) = 1
- WHERE CAST(created_at AS DATE) BETWEEN '2025-10-01' AND '2025-10-01'
SARGable predicates let the engine search the index without transforming the column. Range predicates on the raw column preserve index use.
Wrapping the column in a function forces a scan in many engines because the index on the base value cannot be used directly.
Correct Answer: WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
Example Code
SELECT * FROM events
WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02';
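The difference shows up directly in the plan. This sqlite3 sketch (events table invented for illustration) compares EXPLAIN QUERY PLAN output for the function-wrapped predicate and the raw range; only the range can seek the index. Exact plan wording is engine-specific.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events(id INTEGER, created_at TEXT);
CREATE INDEX idx_events_created ON events(created_at);
""")

def plan(sql):
    # Concatenate the human-readable "detail" column of the plan rows
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

# Function on the column: the index on created_at cannot be used (full scan)
p1 = plan("SELECT * FROM events WHERE date(created_at) = '2025-10-01'")

# Range on the raw column: the engine can seek the index
p2 = plan("SELECT * FROM events "
          "WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'")

print(p1)  # a scan of events, no index
print(p2)  # a search of events using idx_events_created
```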
80. Which LIKE predicate can use a standard B-tree index in most engines?
Difficulty: Easy | Type: MCQ | Topic: Query Performance & Optimizer
- WHERE name LIKE '%son'
- WHERE name LIKE '%son%'
- WHERE name LIKE 'son%'
- WHERE LOWER(name) LIKE 'son%'
A leading wildcard breaks the ability to seek from the left, so the index is not used. A prefix search like 'son%' can seek to the prefix range.
If you need case-insensitive search, consider functional indexes or citext/GIN features where available.
Correct Answer: WHERE name LIKE 'son%'
Example Code
SELECT * FROM people WHERE name LIKE 'son%';
81. Why should you index foreign key columns in large OLTP tables?
Difficulty: Medium | Type: MCQ | Topic: Indexing
- To allow multiple NULLs in child rows
- To speed up parent deletes/updates and common joins from child to parent
- Because engines forbid unindexed foreign keys
- To automatically enable partitioning
When a parent row changes or is deleted, the engine must find matching children. An index on the child foreign key avoids full scans and reduces lock time.
It also makes child-to-parent joins fast in day-to-day queries.
Correct Answer: To speed up parent deletes/updates and common joins from child to parent
Example Code
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
82. Outdated table statistics usually cause which symptom?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- Queries always choose index seeks even for huge ranges
- The optimizer may pick poor join orders or scan methods due to bad cardinality estimates
- Deadlocks increase immediately
- ACID properties are disabled
Statistics inform the optimizer about value distributions. If they are stale, row estimates drift and the plan can degrade.
Refreshing stats helps the optimizer choose the right indexes and join strategies for current data.
Correct Answer: The optimizer may pick poor join orders or scan methods due to bad cardinality estimates
Example Code
-- engine specific
-- PostgreSQL autovacuum/analyze or: ANALYZE orders;
-- SQL Server: UPDATE STATISTICS orders;
83. How do you use EXPLAIN (or EXPLAIN ANALYZE) to tune a slow query, and what are the first things you look for?
Difficulty: Medium | Type: Subjective | Topic: Query Performance & Optimizer
Start by capturing the plan to see join order, access paths, and estimated versus actual row counts. Look for scans on very large tables where a selective index could be used, and for big mismatches between estimated and actual rows that signal bad statistics or non-SARGable predicates.
Then check operators with the highest cost or time. Add or adjust indexes to support the most selective predicates and common joins. Fix non-SARGable expressions, reduce columns (avoid select star), and ensure filters happen as early as possible. Re-run EXPLAIN ANALYZE to verify the change improved row counts and timing.
Example Code
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days';
84. Why is keyset pagination often better than OFFSET/LIMIT on large result sets? Show the pattern.
Difficulty: Medium | Type: Subjective | Topic: Query Performance & Optimizer
OFFSET forces the engine to count and skip many rows, which grows linearly with page number. This wastes work and can lead to unstable pages as new rows arrive.
Keyset pagination uses a stable, indexed cursor like an id or timestamp. You fetch the next page with a predicate on that key, which lets the engine seek directly to the next range. It is faster and consistent under concurrent inserts.
Example Code
-- page 1
SELECT * FROM posts ORDER BY id ASC LIMIT 50;
-- page 2: seek past the last id returned by page 1 (here assumed to be 50)
SELECT * FROM posts WHERE id > 50 ORDER BY id ASC LIMIT 50;
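A small sqlite3 sketch of the keyset pattern (posts table and page size are invented): page 2 seeks past the last id actually seen on page 1 instead of counting and skipping rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts(id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO posts VALUES (?)", [(i,) for i in range(1, 8)])

PAGE = 3
page1 = con.execute(
    "SELECT id FROM posts ORDER BY id LIMIT ?", (PAGE,)).fetchall()

# remember the last key from the previous page and seek past it
last_id = page1[-1][0]
page2 = con.execute(
    "SELECT id FROM posts WHERE id > ? ORDER BY id LIMIT ?",
    (last_id, PAGE)).fetchall()
print(page1, page2)  # [(1,), (2,), (3,)] [(4,), (5,), (6,)]
```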
85. When would you create a partial or filtered index, and what is the trade-off?
Difficulty: Medium | Type: Subjective | Topic: Indexing
Create a filtered index when queries target a well-defined subset, like active=true, recent dates, or non-NULL values. The smaller index is cheaper to maintain and can be highly selective for those queries.
The trade-off is coverage. Queries that fall outside the predicate cannot use the index. Be sure the predicate matches how the application filters in practice so the benefit is realized.
Example Code
-- PostgreSQL
CREATE INDEX idx_invoices_paid_recent
ON invoices(paid_at)
WHERE paid = true AND paid_at >= CURRENT_DATE - INTERVAL '90 days';
86. Indexes speed reads, but they slow writes. Explain how and give two mitigation strategies.
Difficulty: Easy | Type: Subjective | Topic: Indexing
On every INSERT, UPDATE, or DELETE, each affected index must be updated. This adds CPU, I/O, and potential page splits. Over-indexed tables show slower bulk loads and heavier lock contention under write-heavy workloads.
Mitigate by indexing only what you query often, and by consolidating composite indexes to cover multiple access patterns. For large one-time loads, disable or drop non-essential indexes and recreate them after the load. Batch writes and keep transactions short to reduce blocking.
Example Code
/* Strategy example: create minimal indexes first, add others after profiling */
87. Your search uses LOWER(email) = 'x@y.com'. How do you keep it SARGable without changing callers?
Difficulty: Medium | Type: Subjective | Topic: Indexing
Create a functional index on the expression the query uses. Then queries that apply the same function can seek that index directly. This avoids scanning the whole table due to a function on the column.
If your engine supports case-insensitive types or collations, consider them as an alternative so you can index the plain column and keep simpler predicates.
Example Code
-- PostgreSQL
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- query
SELECT * FROM users WHERE LOWER(email) = 'x@y.com';
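SQLite also supports indexes on expressions (3.9+), so the same idea can be sketched with Python's sqlite3 (the user data is invented); the plan should show the expression index being used for the LOWER(email) predicate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users(email TEXT)")
con.execute("INSERT INTO users VALUES ('X@Y.com')")

# Functional (expression) index matching the predicate the app already uses
con.execute("CREATE INDEX idx_users_email_lower ON users(LOWER(email))")

plan = " ".join(r[-1] for r in con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE LOWER(email) = 'x@y.com'").fetchall())
rows = con.execute(
    "SELECT email FROM users WHERE LOWER(email) = 'x@y.com'").fetchall()

print(rows)  # [('X@Y.com',)]
print(plan)  # should mention idx_users_email_lower
```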
88. A report joins a large fact table to three dimensions and runs slowly. Outline a tuning checklist focused on indexes and join order.
Difficulty: Hard | Type: Subjective | Topic: Query Performance & Optimizer
First, ensure foreign keys from the fact to each dimension are indexed. Add composite indexes that start with the join key and include common filters for selective access. Next, verify that the dimensions have primary or unique indexes on their keys for quick lookups.
Run EXPLAIN to see join order and row estimates. Push selective filters into the earliest steps, and consider pre-aggregating the fact in a CTE or temp table if the report only needs rolled-up data. Validate statistics, and test alternative join orders by adding selective predicates or hints only as a last resort.
Example Code
CREATE INDEX idx_fact_date ON fact_sales(date_id);
CREATE INDEX idx_fact_customer ON fact_sales(customer_id);
-- Verify unique indexes on dim_date(date_id), dim_customer(customer_id)
89. What is the primary reason to run EXPLAIN (or EXPLAIN ANALYZE) on a query?
Difficulty: Easy | Type: MCQ | Topic: Query Performance & Optimizer
- To automatically rewrite the query for you
- To view the planned or actual execution steps and their costs
- To rebuild indexes used by the query
- To change the isolation level for the session
EXPLAIN shows how the optimizer intends to execute your SQL: join order, access paths, estimated rows, and operator costs. With ANALYZE, you also see actual rows and timing.
You use this to spot scans where seeks are expected, poor join choices, and misestimates that hint at stale statistics or non-SARGable predicates.
Correct Answer: To view the planned or actual execution steps and their costs
Example Code
EXPLAIN ANALYZE
SELECT o.id, c.name
FROM orders o JOIN customers c ON c.id=o.customer_id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';
90. Which join algorithm is typically best when both inputs are already sorted on the join key?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- Nested Loop Join
- Hash Join
- Merge Join
- Cross Join
Merge join streams through two sorted inputs and matches rows in order. When both sides are sorted (or cheap to sort) it can be very fast and memory-friendly.
Hash join excels with large, unsorted sets and equality predicates. Nested loops shine when the outer is small and the inner has an efficient seek.
Correct Answer: Merge Join
Example Code
/* Engine chooses merge when indexes support order:
JOIN ON t1.k = t2.k with both indexed by k */
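A minimal setup under which many engines will consider a merge join, since both inputs can be read in key order with no extra sort; table and index names are illustrative:

```sql
CREATE INDEX idx_t1_k ON t1(k);
CREATE INDEX idx_t2_k ON t2(k);

SELECT t1.*, t2.*
FROM t1
JOIN t2 ON t2.k = t1.k;   -- equality join on the indexed, sorted key
```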
91. Why do incorrect cardinality estimates hurt performance?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- They change ACID guarantees
- They lead the optimizer to pick poor join orders and operators
- They prevent use of any index
- They force serial execution
The cost model relies on estimated row counts. If estimates are wrong, the optimizer may choose scans, wrong join algorithms, or expensive rehashes.
Refreshing statistics, using SARGable predicates, and avoiding functions on columns help the engine estimate more accurately.
Correct Answer: They lead the optimizer to pick poor join orders and operators
Example Code
-- PostgreSQL
ANALYZE sales;
-- SQL Server
UPDATE STATISTICS sales;
92. Which predicate keeps a datetime column SARGable for a single day filter?
Difficulty: Easy | Type: MCQ | Topic: Query Performance & Optimizer
- WHERE DATE(created_at) = '2025-10-01'
- WHERE created_at BETWEEN '2025-10-01' AND '2025-10-01 23:59:59'
- WHERE EXTRACT(DAY FROM created_at) = 1
- WHERE CAST(created_at AS DATE) = '2025-10-01'
Filtering the raw column with a range allows an index seek. Wrapping the column in a function blocks index use in many engines and forces scans.
A half-open range (greater-than-or-equal to start and less than next day) is even safer across time types and precision.
Correct Answer: WHERE created_at BETWEEN '2025-10-01' AND '2025-10-01 23:59:59'
Example Code
WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
93. When can rewriting a wide OR predicate as UNION ALL improve performance?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- When the OR compares unrelated tables
- When each disjunct can use a different index and results are later combined
- When the table has no indexes
- When the query has a GROUP BY
Wide ORs often disable index seeks. Splitting into separate queries lets each branch use the best index, then UNION ALL combines results.
Add DISTINCT only if needed, since it adds sort or hash work. Ensure predicates are mutually exclusive or dedupe explicitly.
Correct Answer: When each disjunct can use a different index and results are later combined
Example Code
(SELECT * FROM users WHERE email = $1)
UNION ALL
(SELECT * FROM users WHERE phone = $2)
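If the branches can overlap (here, a user whose email and phone both match), one way to keep UNION ALL duplicate-free is to exclude earlier matches in later branches; this is a sketch, and the predicate shape depends on your columns' nullability:

```sql
SELECT * FROM users WHERE email = $1
UNION ALL
SELECT * FROM users WHERE phone = $2
  AND (email IS NULL OR email <> $1);  -- skip rows the first branch already returned
```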
94. What is a common symptom of parameter sniffing issues?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- Plans change on every execution randomly
- A cached plan optimized for a rare value performs poorly for common values
- Transactions fail to commit
- Indexes disappear from the plan output
On first execution, the engine compiles a plan using the sniffed parameter. If that value is atypical, the cached plan can be bad for other values.
Mitigations include recompiling, using option hints, parameterizing carefully, or separating hot and cold paths with targeted queries.
Correct Answer: A cached plan optimized for a rare value performs poorly for common values
Example Code
-- SQL Server example
EXEC sp_recompile N'dbo.SearchOrders';
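The hint-based mitigations mentioned above can be sketched in SQL Server syntax; the table and parameter names are illustrative:

```sql
-- Recompile this statement every execution, so no one plan is cached:
SELECT * FROM Orders WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);

-- Or compile for a representative value instead of the sniffed one:
SELECT * FROM Orders WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR (@CustomerId = 42));
```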
95. When can a stream (ordered) aggregate beat a hash aggregate?
Difficulty: Medium | Type: MCQ | Topic: Aggregation & Grouping
- When input is already ordered on GROUP BY keys
- When there is no index on any column
- When grouping uses non-equality conditions
- Never; hash is always faster
With input sorted by group keys, a streaming aggregate walks once and emits groups as they change. It avoids building a hash table and reduces memory.
Hash aggregate is great for large, unsorted sets but may spill if memory is tight.
Correct Answer: When input is already ordered on GROUP BY keys
Example Code
SELECT customer_id, SUM(total)
FROM orders
GROUP BY customer_id;
-- index on (customer_id) enables stream aggregate
96. How can you help the optimizer avoid an explicit SORT operator for ORDER BY?
Difficulty: Easy | Type: MCQ | Topic: Query Performance & Optimizer
- Always add DISTINCT
- Create or use an index that matches the ORDER BY columns and direction
- Rewrite as GROUP BY
- Disable the query cache
If an index provides the required order, the engine can scan it and return rows already sorted. This removes the need for a costly sort step.
Column order and ASC or DESC matter. Consider covering includes to avoid extra lookups.
Correct Answer: Create or use an index that matches the ORDER BY columns and direction
Example Code
CREATE INDEX idx_orders_date_id ON orders(order_date DESC, id DESC);
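The covering include mentioned above looks like this in SQL Server syntax (PostgreSQL 11+ has a similar INCLUDE clause); the included columns are illustrative:

```sql
-- Key columns provide the sort order; INCLUDE columns avoid base-table lookups.
CREATE INDEX idx_orders_date_cover
ON orders(order_date DESC, id DESC)
INCLUDE (customer_id, total);
```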
97. What is a key difference between a view and a materialized view?
Difficulty: Medium | Type: MCQ | Topic: Query Performance & Optimizer
- Views store data; materialized views do not
- Materialized views store precomputed results and need refresh; views re-run the query each time
- Views can use indexes; materialized views cannot
- Materialized views cannot be joined
A materialized view trades storage and refresh work for faster reads, which helps heavy, repeated analytics. A plain view is only a saved query definition.
Choose refresh timing based on freshness needs. Some engines support fast refresh with logs; others only full refresh.
Correct Answer: Materialized views store precomputed results and need refresh; views re-run the query each time
Example Code
-- Postgres
CREATE MATERIALIZED VIEW mv_sales AS
SELECT date_trunc('day', created_at) d, SUM(total) s
FROM orders GROUP BY 1;
98. Explain filter pushdown and why reducing rows early helps most query plans.
Difficulty: Medium | Type: Subjective | Topic: Query Performance & Optimizer
Filter pushdown means applying selective predicates as close to the base tables as possible. This prevents unnecessary row explosion in joins and reduces the amount of data flowing to sorts, aggregates, and window operations.
Smaller inputs improve cache locality, cut memory use, and allow different join algorithms. The effect is multiplicative across multi-join plans, so early reduction often yields the biggest gains.
Example Code
SELECT ...
FROM orders o
JOIN items i ON i.order_id=o.id
WHERE o.status='PAID' AND o.created_at>=CURRENT_DATE-INTERVAL '30 days';
99. When would you prefer a temporary table over a CTE for performance?
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
Use a temp table when you need to reuse an expensive intermediate result multiple times, add indexes to it, or break a huge plan into steps to help the optimizer. It can reduce re-computation and gives the engine cardinality facts after indexes are built.
A CTE is great for readability in a single statement, but many engines inline it. If the logic is heavy and reused, a temp table with the right index can be faster and more predictable.
Example Code
CREATE TEMP TABLE t AS SELECT * FROM big WHERE flag=1;
CREATE INDEX ON t(user_id);
SELECT ... FROM t JOIN users u ON u.id=t.user_id;
100. Should you rely on optimizer hints to tune queries? When are they appropriate?
Difficulty: Medium | Type: Subjective | Topic: Query Performance & Optimizer
Hints can force join orders or algorithms, but they lock the plan to today’s data shape. Over time, they may hide underlying issues like missing indexes or bad statistics and can age poorly.
Use hints as a last resort for stable, critical queries after fixing schema and statistics. Document why the hint exists and review it when data distributions change.
Example Code
-- Example (syntax varies)
SELECT /*+ USE_HASH(o c) LEADING(o c) */ ...
101. Your running total is slow on a very large table. How do you optimize a window function heavy query?
Difficulty: Medium | Type: Subjective | Topic: Query Performance & Optimizer
First make the ORDER BY for the window align with an index to enable ordered scans and reduce sorting. Partition by selective keys to keep frames small. Prefer ROWS frames for predictable work and add a tiebreaker to the order to avoid large peer groups.
If you only need aggregates at coarser grains, pre-aggregate in a CTE or temp table before the window step. For time series, consider partitioning or pruning to restrict the scanned time range.
Example Code
CREATE INDEX idx_sales_pid_day ON sales(product_id, day);
SELECT product_id, day,
SUM(amount) OVER (PARTITION BY product_id ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM sales WHERE day >= CURRENT_DATE - INTERVAL '180 days';
102. Describe a strategy to stabilize performance for a parameterized search endpoint suffering from plan cache instability.
Difficulty: Hard | Type: Subjective | Topic: Query Performance & Optimizer
Segment queries so highly skewed predicates use a dedicated path. For example, branch on a flag to call a version optimized for rare values. Use parameterization with sensible defaults, and consider OPTIMIZE FOR or recompile hints cautiously when estimates are volatile.
Add supporting indexes that match the common filter patterns, and ensure statistics contain histograms on skewed columns. Monitor with plan hash and track regressions when distributions shift.
Example Code
-- Example branching
IF @IsRare=1
SELECT ... WHERE country='VA' OPTION (RECOMPILE);
ELSE
SELECT ... WHERE country=@country;
103. Which option correctly expands the ACID properties for relational transactions?
Difficulty: Easy | Type: MCQ | Topic: Transactions & Concurrency
- Atomicity, Consistency, Isolation, Durability
- Accuracy, Concurrency, Integrity, Durability
- Atomicity, Concurrency, Idempotency, Durability
- Aggregation, Consistency, Isolation, Dependability
Atomicity means all operations succeed as one unit, or none do. Consistency means every committed transaction preserves declared rules like constraints and data types.
Isolation keeps concurrent transactions from seeing each other’s intermediate states. Durability guarantees committed changes survive crashes through logs or replication.
Correct Answer: Atomicity, Consistency, Isolation, Durability
Example Code
BEGIN;
-- multiple writes
COMMIT;
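A slightly fuller sketch of atomicity: either both writes below commit together, or a ROLLBACK undoes them as one unit. The accounts table is illustrative:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- On any failure above, issue ROLLBACK instead and neither change persists.
COMMIT;
```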
104. Match the phenomenon to its description: dirty read, non-repeatable read, phantom read.
Difficulty: Easy | Type: MCQ | Topic: Transactions & Concurrency
- Dirty: sees uncommitted data; Non-repeatable: same row changes; Phantom: new rows appear matching a predicate
- Dirty: same row changes; Non-repeatable: uncommitted data; Phantom: network delay
- Dirty: lost update; Non-repeatable: deadlock; Phantom: lock timeout
- Dirty: missing index; Non-repeatable: stale stats; Phantom: plan cache issue
A dirty read returns data that may later roll back. A non-repeatable read happens when a row read twice returns different values after another commit.
A phantom read adds or removes qualifying rows between two scans of the same predicate, usually on ranges.
Correct Answer: Dirty: sees uncommitted data; Non-repeatable: same row changes; Phantom: new rows appear matching a predicate
Example Code
SELECT * FROM orders WHERE status='NEW'; -- second read returns different result set
105. Which mapping is correct for standard SQL isolation levels and phenomena prevented?
Difficulty: Medium | Type: MCQ | Topic: Transactions & Concurrency
- Read Uncommitted: prevents dirty reads; Read Committed: prevents phantoms; Repeatable Read: prevents all; Serializable: none
- Read Uncommitted: none; Read Committed: prevents dirty reads; Repeatable Read: prevents dirty and non-repeatable reads; Serializable: prevents dirty, non-repeatable, and phantoms
- Read Uncommitted: prevents phantoms only; Read Committed: prevents all; Repeatable Read: prevents dirty reads; Serializable: prevents none
- All four levels prevent dirty reads only
As isolation increases, more anomalies are blocked. Read Committed avoids dirty reads. Repeatable Read also stabilizes previously read rows. Serializable enforces the same outcome as some serial order of transactions and eliminates phantoms.
Engines vary: some provide Snapshot Isolation variants that behave between Repeatable Read and Serializable for common workloads.
Correct Answer: Read Uncommitted: none; Read Committed: prevents dirty reads; Repeatable Read: prevents dirty and non-repeatable reads; Serializable: prevents dirty, non-repeatable, and phantoms
Example Code
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
106. Which pairing of lock type and operation is most accurate for row-level locking?
Difficulty: Medium | Type: MCQ | Topic: Transactions & Concurrency
- Shared lock for writes, Exclusive lock for reads
- Exclusive lock for writes, Shared lock for consistent reads in locking engines
- Intent lock blocks all table access always
- Update lock is only for DDL
Writers take exclusive locks to update protected rows. Locking engines may take shared locks for consistent reads at some levels to coordinate with writers.
Intent locks communicate a transaction’s intent at higher granularity so engines can detect conflicts quickly without scanning fine-grained locks.
Correct Answer: Exclusive lock for writes, Shared lock for consistent reads in locking engines
Example Code
-- conceptual
UPDATE accounts SET balance = balance - 100 WHERE id=1;
107. What is the core idea of MVCC (Multi-Version Concurrency Control)?
Difficulty: Medium | Type: MCQ | Topic: Transactions & Concurrency
- Readers block writers by holding table locks
- Readers see a snapshot while writers create new versions, so reads avoid blocking writes
- Every update overwrites rows in place to avoid bloat
- It disables transactions for read queries
MVCC keeps past versions long enough to serve a snapshot consistent with each transaction’s view. Readers do not wait for writers and vice versa in common cases.
Engines must vacuum or purge dead versions to reclaim space and control index bloat.
Correct Answer: Readers see a snapshot while writers create new versions, so reads avoid blocking writes
Example Code
START TRANSACTION; -- in most MVCC engines the snapshot is established at the first read
SELECT * FROM orders WHERE id=10;
108. Which statement about deadlocks is correct?
Difficulty: Easy | Type: MCQ | Topic: Transactions & Concurrency
- They are timeouts caused by slow queries only
- They happen when two transactions wait on each other’s locks in a cycle
- They cannot occur under Serializable isolation
- They are solved by adding more CPU
A deadlock is a cycle of waiting. Engines detect the cycle and abort one victim to break it. Your code should handle the error and retry safely.
Prevent deadlocks by acquiring resources in a consistent order, keeping transactions short, and using selective indexes to lock fewer rows.
Correct Answer: They happen when two transactions wait on each other’s locks in a cycle
Example Code
BEGIN; -- T1 locks row A then tries B
-- T2 locks row B then tries A -> cycle
109. What risk comes with using a WITH (NOLOCK) or READ UNCOMMITTED hint in a reporting query?
Difficulty: Medium | Type: MCQ | Topic: Transactions & Concurrency
- Only slower execution, results stay correct
- You may read uncommitted or torn data and see duplicates or missing rows
- It disables logging for the whole database
- It escalates all locks to table locks
At the lowest isolation, reads can observe rows that roll back or scan through changing pages. You can get inconsistent counts, duplicates, or missed rows.
Prefer Snapshot or Read Committed Snapshot where available for non-blocking reads with consistent results.
Correct Answer: You may read uncommitted or torn data and see duplicates or missing rows
Example Code
SELECT * FROM Orders WITH (NOLOCK);
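The Read Committed Snapshot alternative mentioned above is enabled per database in SQL Server. A sketch, with an illustrative database name; verify the setting's impact in your environment before flipping it:

```sql
-- Readers then see the last committed row version instead of blocking or dirty-reading.
ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;
```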
110. What does a SAVEPOINT allow you to do inside a transaction?
Difficulty: Easy | Type: MCQ | Topic: Transactions & Concurrency
- Commit part of a transaction permanently
- Pause replication
- Roll back to a named point without aborting the whole transaction
- Change isolation level mid-statement
Savepoints let you protect a block of work inside a larger unit. If the block fails, you roll back to the savepoint and continue with other steps.
They are useful for best-effort operations like optional inserts or per-row validations during batch processing.
Correct Answer: Roll back to a named point without aborting the whole transaction
Example Code
BEGIN;
SAVEPOINT s1;
-- try some write
ROLLBACK TO SAVEPOINT s1;
COMMIT;
111. What problem does two-phase commit (2PC) address in distributed transactions?
Difficulty: Hard | Type: MCQ | Topic: Transactions & Concurrency
- Joins across shards are faster
- Guarantees all participants commit or all roll back together
- Eliminates network latency
- Avoids the need for application retries
2PC coordinates multiple resource managers to vote on success, then either commits all or aborts all. It preserves atomicity across systems.
It adds latency and failure modes like in-doubt transactions. Many modern designs prefer idempotent events with eventual consistency unless strict atomicity is required.
Correct Answer: Guarantees all participants commit or all roll back together
Example Code
-- conceptual coordinator messages: prepare -> commit
112. Explain the lost update problem and two strategies to prevent it without serializing the whole workload.
Difficulty: Medium | Type: Subjective | Topic: Transactions & Concurrency
Lost updates occur when two transactions read the same row, compute new values, and write back, with the second write overwriting the first. Each transaction appeared correct alone, but the combined effect drops one change.
Prevent it with optimistic concurrency: store a version or timestamp and add it to the WHERE of the UPDATE. If zero rows update, retry with fresh data. Or use SELECT FOR UPDATE to lock the row before computing, which is pessimistic but simple for hot records.
Example Code
UPDATE accounts
SET balance = balance + :delta, version = version + 1
WHERE id = :id AND version = :version;
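The pessimistic alternative mentioned above locks the row before computing; a sketch using the same illustrative accounts table:

```sql
BEGIN;
-- Block other writers on this row until we commit:
SELECT balance FROM accounts WHERE id = :id FOR UPDATE;
-- compute the new balance in the application, then write it back
UPDATE accounts SET balance = :new_balance WHERE id = :id;
COMMIT;
```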
113. Contrast Snapshot Isolation with true Serializable isolation. When does Snapshot still allow anomalies?
Difficulty: Medium | Type: Subjective | Topic: Transactions & Concurrency
Snapshot Isolation lets readers see a stable, committed snapshot and resolves write conflicts when two transactions update the same row. It removes most read-write blocking and avoids many anomalies seen at Read Committed.
However, Snapshot can allow write skew: two transactions read overlapping sets and make disjoint writes that break a cross-row invariant. Serializable adds predicate-level protection or conflict detection to prevent these phantoms and write skew at higher cost.
Example Code
-- Example invariant: at least one doctor on call
-- T1 sets A off call, T2 sets B off call; both read 'someone is on call' -> write skew
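The on-call example can be made concrete. This conceptual sketch assumes a hypothetical doctors(id, on_call) table:

```sql
-- Invariant: at least one doctor stays on call.
-- T1 and T2 both run under Snapshot Isolation and each sees two doctors on call:
--   T1: SELECT count(*) FROM doctors WHERE on_call;           -- sees 2
--   T2: SELECT count(*) FROM doctors WHERE on_call;           -- sees 2
--   T1: UPDATE doctors SET on_call = false WHERE id = 'A';
--   T2: UPDATE doctors SET on_call = false WHERE id = 'B';
-- Both commit: the writes touch different rows, so no write conflict is detected,
-- yet the invariant is broken. Serializable would abort one of the two.
```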
114. Your app occasionally hits deadlock victims. Describe a robust handling strategy at the application layer.
Difficulty: Easy | Type: Subjective | Topic: Transactions & Concurrency
Treat deadlocks as expected under contended workloads. Make the unit of work idempotent, catch the specific error code, and retry the whole transaction a small number of times with backoff. Keep transactions short and ordered to lower the chance of a second collision.
Log the query shapes involved so you can re-order updates or add selective indexes to shrink lock footprints and reduce contention hot spots.
Example Code
-- pseudocode
for attempt in 1..3:
begin tx
try { do_work(); commit; break; }
catch deadlock { rollback; sleep(jitter); continue; }
115. Why are long-running transactions harmful in MVCC systems, and how do you mitigate the damage?
Difficulty: Medium | Type: Subjective | Topic: Transactions & Concurrency
Long transactions hold snapshots open. The engine must keep old row versions and cannot reclaim space or advance visibility easily. This increases table and index bloat and slows vacuum or purge cycles. It can also block relation truncation and hold back the horizon that version cleanup depends on.
Mitigate by breaking work into smaller commits, moving pure reads to a replica, and avoiding idle-in-transaction connections. Schedule vacuum or maintenance windows and monitor age metrics to intervene before wraparound risks.
Example Code
SET idle_in_transaction_session_timeout = '5min';
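One way to spot the offenders in PostgreSQL, using the pg_stat_activity system view:

```sql
-- Sessions that have been sitting idle inside an open transaction, oldest first:
SELECT pid, state, now() - xact_start AS tx_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```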
116. You need exactly-once effects for payment writes over flaky networks. Outline a database pattern that achieves safe retries.
Difficulty: Hard | Type: Subjective | Topic: SQL Basics & DML
Use idempotency keys stored with a unique constraint. The client sends a stable key per logical operation. Wrap the insert or update in a transaction that first checks or inserts the key row. If the same key arrives again, the unique constraint forces an upsert path that returns the original result without applying the effect twice.
Pair this with atomic upserts on balances or ledger entries and commit once. Keep the key TTL long enough to cover realistic retry windows.
Example Code
INSERT INTO idempotency_keys(key, result_hash)
VALUES(:k, :h)
ON CONFLICT (key) DO NOTHING;
-- proceed only once; fetch prior result when conflict occurs
117. Which statement best distinguishes a correlated subquery from a non-correlated subquery?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- A correlated subquery is executed once per outer row because it references outer columns
- A correlated subquery returns multiple result sets
- Only non-correlated subqueries can appear in WHERE clauses
- Correlated subqueries must use GROUP BY
A correlated subquery depends on values from the current row of the outer query, so the database re-evaluates it for each outer row. That makes it expressive but potentially slower on large inputs.
A non-correlated subquery is independent and runs once, with its result reused. It is easier for optimizers to transform into joins or set operations.
Correct Answer: A correlated subquery is executed once per outer row because it references outer columns
Example Code
SELECT e.*
FROM employees e
WHERE e.salary > (
SELECT AVG(salary) FROM employees x WHERE x.dept_id = e.dept_id
);
118. When comparing EXISTS and IN, which choice is generally safer with NULLs and scalable for large subqueries?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- IN is safer and always faster
- EXISTS is safer with NULLs and often scales better on indexed correlated checks
- IN ignores indexes by design
- EXISTS requires DISTINCT in the subquery
EXISTS stops at the first matching row and is not affected by NULLs in the subquery. It works well with an index on the correlated predicate.
IN compares a value to a set. If the subquery can yield NULLs, NOT IN can return no rows due to three-valued logic. Deduplicate the subquery or prefer NOT EXISTS.
Correct Answer: EXISTS is safer with NULLs and often scales better on indexed correlated checks
Example Code
SELECT * FROM orders o
WHERE EXISTS (
SELECT 1 FROM order_items i WHERE i.order_id = o.id
);
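The NULL trap with NOT IN, and the NOT EXISTS rewrite suggested above, side by side on the same illustrative tables:

```sql
-- If any order_items.order_id is NULL, this returns no rows at all
-- because x NOT IN (..., NULL) evaluates to UNKNOWN:
SELECT * FROM orders o
WHERE o.id NOT IN (SELECT i.order_id FROM order_items i);

-- NOT EXISTS is unaffected by NULLs and behaves as intended:
SELECT * FROM orders o
WHERE NOT EXISTS (
  SELECT 1 FROM order_items i WHERE i.order_id = o.id
);
```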
119. Which SQL pattern accurately returns rows in A that have no match in B?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- A INNER JOIN B ON A.k = B.k
- A LEFT JOIN B ON A.k = B.k WHERE B.k IS NULL
- A RIGHT JOIN B ON A.k = B.k WHERE A.k IS NOT NULL
- A CROSS JOIN B WHERE A.k <> B.k
The anti-join pattern uses a left join and then filters for rows where the right side is missing. This expresses “in A but not in B”.
The same logic can be written with NOT EXISTS. Choose the form that is clearest and performs well in your engine.
Correct Answer: A LEFT JOIN B ON A.k = B.k WHERE B.k IS NULL
Example Code
SELECT A.*
FROM A
LEFT JOIN B ON B.k = A.k
WHERE B.k IS NULL;
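The equivalent NOT EXISTS form mentioned above:

```sql
SELECT A.*
FROM A
WHERE NOT EXISTS (
  SELECT 1 FROM B WHERE B.k = A.k
);
```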
120. Which predicate most closely mirrors a semi-join?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- IN or EXISTS used to test membership without duplicating left rows
- UNION ALL of two tables
- FULL OUTER JOIN with NULL filters
- CROSS JOIN with WHERE 1=1
A semi-join answers whether a matching row exists on the right, but it does not return right-hand columns and does not multiply left rows.
Implement with EXISTS or IN when you only need to filter the left side by presence on the right.
Correct Answer: IN or EXISTS used to test membership without duplicating left rows
Example Code
SELECT c.* FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
121. What does a Common Table Expression (CTE) primarily provide?
Difficulty: Easy | Type: MCQ | Topic: Subqueries & CTEs
- A permanent materialized result that persists across sessions
- A temporary named result visible only to the next statement
- Automatic indexing of intermediate results
- Guaranteed performance improvement over subqueries
A CTE created with WITH gives a readable name to an intermediate query and exists only for that statement. Many engines inline it like a subquery.
CTEs are about clarity and stepwise logic. Performance depends on the optimizer and data shape, not the syntax alone.
Correct Answer: A temporary named result visible only to the next statement
Example Code
WITH recent AS (
SELECT * FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT * FROM recent WHERE status = 'PAID';
122. What are the two required parts of a recursive CTE?
Difficulty: Medium | Type: MCQ | Topic: Subqueries & CTEs
- Anchor member and termination hint
- Anchor member and recursive member joined by UNION ALL
- Recursive member and materialized view
- Base table and trigger
The anchor produces the seed rows. The recursive member references the CTE and expands the result, usually via UNION ALL. A safety condition limits depth.
This pattern models hierarchies like org charts and folder trees without procedural code.
Correct Answer: Anchor member and recursive member joined by UNION ALL
Example Code
WITH RECURSIVE org AS (
SELECT id, manager_id, 1 AS lvl FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.manager_id, o.lvl+1
FROM employees e JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org;
123. Why can UNION ALL be faster than UNION?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- UNION ALL automatically creates indexes
- UNION ALL skips duplicate removal and avoids sort/hash work
- UNION ALL compiles to fewer joins
- UNION ALL forces parallelism
UNION performs a distinct operation across inputs, which costs CPU and memory. UNION ALL concatenates results and is usually faster.
Use UNION when you truly need deduplication. Otherwise prefer UNION ALL and dedupe later if needed.
Correct Answer: UNION ALL skips duplicate removal and avoids sort/hash work
Example Code
SELECT id FROM a
UNION ALL
SELECT id FROM b;
124. Which approach often improves performance when an outer table is large and a correlated subquery is slow?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- Rewrite the subquery as a JOIN with proper indexes
- Wrap the subquery in SELECT DISTINCT
- Add ORDER BY inside the subquery
- Use CROSS JOIN instead
Correlated subqueries can execute once per outer row. A join lets the optimizer find an efficient join order and use indexes on the join keys.
Ensure the join does not multiply rows unintentionally. Use grouping or semi-join style predicates if you only need existence.
Correct Answer: Rewrite the subquery as a JOIN with proper indexes
Example Code
SELECT o.*
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.vip = true;
125. What is a practical use of a SELF JOIN?
Difficulty: Easy | Type: MCQ | Topic: Joins & Set Operations
- To union two tables without duplicates
- To compare rows within the same table, such as employees to their managers
- To enforce a foreign key
- To materialize a view
Self joins let you relate a row to another row in the same table via a relationship like manager_id or previous_id.
Alias the table twice so each role is clear. Add indexes on the linking columns for speed.
Correct Answer: To compare rows within the same table, such as employees to their managers
Example Code
SELECT e.name AS emp, m.name AS mgr
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
126. When is a CROSS JOIN the correct choice?
Difficulty: Medium | Type: MCQ | Topic: Joins & Set Operations
- When you need the cartesian product of two sets, like generating all size–color combinations
- When you want to filter unmatched rows
- When you want to deduplicate inputs
- When you need outer join semantics
A cross join returns every combination of rows from both inputs. It is useful for generating grids or filling a date series crossed with categories.
Use with care. Without a subsequent filter, the row count is rows(A) × rows(B), which can explode quickly.
Correct Answer: When you need the cartesian product of two sets, like generating all size–color combinations
Example Code
SELECT p.product_id, d.dt
FROM products p
CROSS JOIN dates d;
127. Explain when to prefer a window function over a GROUP BY for analytical queries.
Difficulty: Easy | Type: Subjective | Topic: Window Functions
Choose window functions when you need to keep row detail while adding analytics like running totals, ranks, or group totals. Windows compute across related rows but do not collapse the result, so you can show both the raw row and the metric together.
GROUP BY is right when you want one row per group. It summarizes and discards the original granularity. Use it for final aggregates, and windows for per-row insights.
Example Code
SELECT order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id) AS cust_total
FROM orders;
128. How do you prevent infinite loops and duplicates in a recursive CTE that walks a graph?
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
Add a termination condition that stops expansion when there are no new edges to follow or a maximum depth is reached. In hierarchical data, stop when the parent is NULL or repeats.
Track visited nodes in the recursive member. Accumulate the path and exclude nodes already seen. This prevents cycles from re-entering the same vertex and emitting duplicates.
Example Code
WITH RECURSIVE walk AS (
SELECT id, parent_id, ARRAY[id] AS path FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, path || n.id
FROM nodes n
JOIN walk w ON w.id = n.parent_id
WHERE n.id <> ALL(w.path)
)
SELECT * FROM walk;
129. Derived table subqueries and CTEs can both modularize logic. How do you choose between them for readability and performance?
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
Use a CTE when naming the step improves clarity, you reuse the step, or the engine supports readable recursion. A derived table is fine for small, local transformations and keeps the scope tight.
Performance often depends on the optimizer. Many engines inline both forms. When reuse, indexing, or materialization control is needed, consider a temporary table with indexes instead.
Example Code
SELECT *
FROM (
SELECT customer_id, SUM(total) AS s FROM orders GROUP BY 1
) t
JOIN customers c ON c.id = t.customer_id;
130. You must flag customers whose latest order value is above their category’s average order. Outline a clean SQL approach.
Difficulty: Medium | Type: Subjective | Topic: Subqueries & CTEs
First, identify each customer’s latest order using ROW_NUMBER over orders partitioned by customer and ordered by date. Filter to rn equals one to keep the latest row. Next, compute the average per category in a separate step.
Join the two steps and compare the latest order amount to the category average. This avoids correlated re-computations and keeps the intent simple to tune.
Example Code
WITH latest AS (
SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC, id DESC) rn
FROM orders o
), avg_cat AS (
SELECT category_id, AVG(total) AS avg_total FROM orders GROUP BY category_id
)
SELECT l.customer_id, l.total > a.avg_total AS above_avg
FROM latest l
JOIN avg_cat a ON a.category_id = l.category_id
WHERE l.rn = 1;
131. Which window function assigns a unique sequence number to each row within its partition even when values tie?
Difficulty: Easy | Type: MCQ | Topic: Window Functions
- ROW_NUMBER()
- RANK()
- DENSE_RANK()
- NTILE()
ROW_NUMBER always produces 1, 2, 3, and so on with no duplicates and no gaps, even when the ORDER BY values are the same.
RANK assigns the same rank to ties and leaves gaps. DENSE_RANK assigns the same rank to ties but does not leave gaps.
Correct Answer: ROW_NUMBER()
Example Code
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC, order_id DESC) AS rn
FROM orders;
132. What do LAG and LEAD let you do without a self join?
Difficulty: EasyType: MCQTopic: Window Functions
- Calculate percentiles across the table
- Look at previous or next row values in the same partition order
- Materialize a view for later use
- Lock rows for update safely
LAG peeks backward and LEAD peeks forward along the ordered rows in the same partition. This is perfect for day-over-day changes or trend detection.
Because they run inside the same result set, you avoid extra joins and keep the query simpler and faster.
Correct Answer: Look at previous or next row values in the same partition order
Example Code
SELECT day, amount,
amount - LAG(amount) OVER (ORDER BY day) AS delta
FROM sales_by_day;
133. In a window frame, what is the key difference between ROWS and RANGE?
Difficulty: MediumType: MCQTopic: Window Functions
- ROWS counts physical rows; RANGE groups rows with peer values by the ORDER BY
- ROWS is slower than RANGE
- RANGE can only be used with dates
- They are identical in all engines
ROWS defines frames by row count relative to the current row. RANGE uses logical offsets on the ORDER BY value and often includes peer rows with the same sort key.
For precise moving windows like last 3 rows, pick ROWS. For value ranges like past 7 days by timestamp, RANGE can be appropriate when supported.
Correct Answer: ROWS counts physical rows; RANGE groups rows with peer values by the ORDER BY
Example Code
SUM(amount) OVER (
ORDER BY day
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_sum
134. Which frame reliably produces a cumulative running total from the first row to the current row?
Difficulty: EasyType: MCQTopic: Window Functions
- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
- ROWS BETWEEN UNBOUNDED FOLLOWING AND CURRENT ROW
UNBOUNDED PRECEDING to CURRENT ROW sums all rows from the start of the partition through the current one. It is the standard running total frame.
Be explicit about ROWS to avoid surprises with peer groups under RANGE semantics.
Correct Answer: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Example Code
SUM(amount) OVER (
PARTITION BY account_id
ORDER BY txn_time, id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_balance
135. Which function returns a continuous percentile value as a window calculation in engines that support it?
Difficulty: MediumType: MCQTopic: Window Functions
- PERCENT_RANK()
- CUME_DIST()
- PERCENTILE_CONT()
- NTILE()
PERCENTILE_CONT computes a value at the requested percentile using interpolation across ordered data. It can run as an ordered-set aggregate or as a window function.
PERCENT_RANK and CUME_DIST return relative ranks between zero and one. NTILE splits rows into buckets.
Correct Answer: PERCENTILE_CONT()
Example Code
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount)
OVER (PARTITION BY region) AS p90_amount
136. What does NTILE(4) OVER (...) compute?
Difficulty: EasyType: MCQTopic: Window Functions
- A rolling average over 4 rows
- Four equal-width value ranges
- Quartile bucket numbers across ordered rows
- A dense rank from 1 to 4 with no ties
NTILE splits the ordered rows into the requested number of buckets and labels each row with its bucket number. With 4, you get quartiles by row count.
It is handy for top-x percent tagging and A/B testing splits.
Correct Answer: Quartile bucket numbers across ordered rows
Example Code
NTILE(4) OVER (PARTITION BY country ORDER BY revenue DESC) AS quartile
137. Portable way to count only paid orders in a window when FILTER clause is unavailable?
Difficulty: MediumType: MCQTopic: Window Functions
- COUNT(paid)
- COUNT(*) FILTER (WHERE paid)
- SUM(CASE WHEN paid THEN 1 ELSE 0 END)
- COUNT(DISTINCT CASE WHEN paid THEN 1 END)
CASE inside SUM is widely supported and yields a conditional count. FILTER is elegant where supported, but not universal across engines.
This pattern also works for conditional sums and averages by returning the numeric to aggregate or zero.
Correct Answer: SUM(CASE WHEN paid THEN 1 ELSE 0 END)
Example Code
SUM(CASE WHEN status='PAID' THEN 1 ELSE 0 END)
OVER (PARTITION BY customer_id) AS paid_cnt
138. Why should you include a tiebreaker column in the ORDER BY of window functions?
Difficulty: MediumType: MCQTopic: Window Functions
- To reduce memory usage
- To avoid nondeterministic row ordering when sort keys tie
- To force parallel execution
- To remove duplicates automatically
If two rows share the same sort key, the engine may return them in any order. Adding a stable tiebreaker like id keeps rankings and running totals deterministic.
This prevents odd jumps in running metrics and inconsistent test results.
Correct Answer: To avoid nondeterministic row ordering when sort keys tie
Example Code
ROW_NUMBER() OVER (ORDER BY created_at DESC, id DESC) AS rn
139. Which practice most often speeds large running-window computations?
Difficulty: HardType: MCQTopic: Window Functions
- Use RANGE with an expression on the ORDER BY
- Align ORDER BY with a supporting index and use ROWS frames
- Add DISTINCT to the outer query
- Wrap the window in a CTE to force materialization always
When the scan order matches an index, less sorting is needed and the window can stream. ROWS frames keep work bounded to exact row counts instead of peers.
Materialization and DISTINCT can add unnecessary sorts and memory usage unless required by logic.
Correct Answer: Align ORDER BY with a supporting index and use ROWS frames
Example Code
CREATE INDEX idx_sales_pid_day ON sales(product_id, day);
-- then order by (product_id, day)
140. Explain how to compute a 7-day moving average per product. Mention the window frame choice and why.
Difficulty: EasyType: SubjectiveTopic: Window Functions
Partition by product so each series is independent. Order by date so rows stream in time order. Use a ROWS frame of six preceding to current row to capture exactly seven rows, which is stable even with multiple rows per day.
RANGE frames can include peers with the same date and change the number of rows. ROWS keeps the window size fixed and matches the business definition precisely.
Example Code
AVG(amount) OVER (
PARTITION BY product_id
ORDER BY sale_date, id
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS avg_7d
141. Give two idioms to select the latest row per customer and discuss trade-offs.
Difficulty: MediumType: SubjectiveTopic: Window Functions
First, use ROW_NUMBER over (PARTITION BY customer ORDER BY created_at DESC, id DESC) then filter rn equals one. It is readable and fast with a matching index.
Second, use a join to a subquery that selects MAX(created_at) per customer. It can be concise but may miss ties unless you add a tiebreaker. The window approach is safer and easier to extend.
Example Code
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY created_at DESC, id DESC) rn
FROM orders
)
SELECT * FROM ranked WHERE rn=1;
142. How do you compute an account running balance that resets when a new month starts?
Difficulty: MediumType: SubjectiveTopic: Window Functions
Partition the window by account and the month key so the cumulative sum restarts for each month. Order by transaction time and add a stable id as a tie breaker.
This produces a per-month running total without extra joins or procedural loops.
Example Code
SUM(amount) OVER (
PARTITION BY account_id, date_trunc('month', txn_time)
ORDER BY txn_time, id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS month_running_balance
143. When would you use PERCENT_RANK or CUME_DIST instead of DENSE_RANK for analytics?
Difficulty: EasyType: SubjectiveTopic: Window Functions
Use PERCENT_RANK or CUME_DIST when you need a normalized score between zero and one to compare positions across groups with different sizes. These functions express relative standing, which is ideal for dashboards and thresholds like top five percent.
DENSE_RANK gives integer ranks and is best when you need discrete positions or to filter top N by rank.
Example Code
PERCENT_RANK() OVER (PARTITION BY category ORDER BY score) AS pr
-- 0 for first, 1 for last
144. You need a running distinct count of active users per app over time. Outline approaches and their costs.
Difficulty: HardType: SubjectiveTopic: Aggregation & Grouping
A pure SQL window cannot maintain a running DISTINCT across changing frames efficiently. One approach is to pre-aggregate to daily unique users per app, then apply a running SUM over those daily counts. This reduces cardinality and keeps the window light.
For exact moving distinct counts at row granularity, you generally need approximate sketches such as HyperLogLog where the engine offers them, or maintained rollups in staging tables. Exact per-row distinct windows tend to be heavy and may require materialization steps.
Example Code
-- daily uniques first, then window
WITH d AS (
SELECT app_id, day, COUNT(DISTINCT user_id) AS du
FROM events GROUP BY app_id, day
)
SELECT app_id, day,
SUM(du) OVER (PARTITION BY app_id ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS du_7d
FROM d;
145. Which statement correctly contrasts a PRIMARY KEY with a UNIQUE constraint?
Difficulty: EasyType: MCQTopic: Data Modeling & Constraints
- Both allow NULLs and both enforce uniqueness
- PRIMARY KEY enforces uniqueness and NOT NULL; UNIQUE enforces uniqueness but can allow a single NULL (engine-dependent)
- UNIQUE is faster than PRIMARY KEY for joins
- You can have many PRIMARY KEYs per table but only one UNIQUE
A primary key uniquely identifies each row and forbids NULL by definition. It is also the default target for foreign keys.
UNIQUE enforces distinct values but often allows one or more NULLs depending on the DBMS. You can define multiple UNIQUE constraints per table.
Correct Answer: PRIMARY KEY enforces uniqueness and NOT NULL; UNIQUE enforces uniqueness but can allow a single NULL (engine-dependent)
Example Code
CREATE TABLE users (
id BIGINT PRIMARY KEY,
email TEXT UNIQUE
);
146. What is the main purpose of a FOREIGN KEY constraint?
Difficulty: EasyType: MCQTopic: Data Modeling & Constraints
- To improve SELECT performance automatically
- To ensure child rows reference existing parent rows
- To partition a table by a key
- To encrypt sensitive columns
A foreign key enforces referential integrity. Values in the child column must exist in the referenced parent key.
This prevents orphan records and maintains valid relationships as data changes.
Correct Answer: To ensure child rows reference existing parent rows
Example Code
CREATE TABLE orders (
id BIGINT PRIMARY KEY,
customer_id BIGINT REFERENCES customers(id)
);
147. Which ON DELETE action keeps child rows but nulls their foreign key when the parent is removed?
Difficulty: MediumType: MCQTopic: Data Modeling & Constraints
- ON DELETE RESTRICT
- ON DELETE CASCADE
- ON DELETE SET NULL
- ON DELETE NO ACTION always
SET NULL preserves the child record but clears the foreign key, which is valid only if the column allows NULL.
CASCADE deletes the children, RESTRICT/NO ACTION prevents the parent delete when children exist.
Correct Answer: ON DELETE SET NULL
Example Code
customer_id BIGINT REFERENCES customers(id) ON DELETE SET NULL
148. Which constraint ensures column values satisfy a Boolean expression?
Difficulty: EasyType: MCQTopic: Data Modeling & Constraints
- NOT NULL
- UNIQUE
- CHECK
- DEFAULT
CHECK validates each row against a condition. It blocks inserts and updates that violate business rules.
Use it for domains like positive amounts or enumerated statuses.
Correct Answer: CHECK
Example Code
amount NUMERIC(12,2) CHECK (amount >= 0)
149. What does a DEFAULT constraint do during INSERT?
Difficulty: EasyType: MCQTopic: Data Modeling & Constraints
- Rejects the row if the column is NULL
- Automatically fills the column when a value is not provided
- Creates an index automatically
- Refreshes materialized views
DEFAULT supplies a value when the insert omits that column. It does not run if you pass an explicit value, including NULL.
Combine NOT NULL with DEFAULT for robust domain rules.
Correct Answer: Automatically fills the column when a value is not provided
Example Code
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
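A short sketch of the omit-versus-explicit distinction, using a hypothetical events table:

```sql
CREATE TABLE events (
  id BIGINT PRIMARY KEY,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO events (id) VALUES (1);                    -- created_at filled by DEFAULT
INSERT INTO events (id, created_at) VALUES (2, NULL);  -- fails: explicit NULL bypasses DEFAULT, then NOT NULL rejects it
```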
150. Which is a valid reason to use a composite PRIMARY KEY?
Difficulty: MediumType: MCQTopic: Data Modeling & Constraints
- To avoid creating any indexes
- When the natural identity is a combination of columns that must be unique together
- To speed up UPDATE statements automatically
- Because composite keys allow NULLs
Composite keys model identities like (order_id, line_no) or (country_code, phone). They encode business meaning and enforce pair uniqueness.
They do add key width to child tables. Consider a surrogate key if many foreign keys must reference the row.
Correct Answer: When the natural identity is a combination of columns that must be unique together
Example Code
PRIMARY KEY (order_id, line_no)
151. In engines that support DEFERRABLE constraints, what does INITIALLY DEFERRED enable?
Difficulty: HardType: MCQTopic: Data Modeling & Constraints
- Constraint checks only during CREATE TABLE
- Constraint checks at statement end but not at COMMIT
- Constraint checks postponed until COMMIT within the transaction
- Automatic disable of all constraints
Deferrable constraints allow complex multi-step changes that temporarily violate rules within a transaction. The final state must pass checks at commit.
Use this for bulk swaps or cycle updates in parent–child graphs.
Correct Answer: Constraint checks postponed until COMMIT within the transaction
Example Code
ALTER TABLE t
ADD CONSTRAINT fk DEFERRABLE INITIALLY DEFERRED;
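A sketch of a multi-step change that is only consistent at commit (PostgreSQL-style; the table and constraint names are illustrative):

```sql
BEGIN;
SET CONSTRAINTS fk DEFERRED;               -- redundant if already INITIALLY DEFERRED
UPDATE t SET parent_id = 2 WHERE id = 10;  -- may reference a not-yet-valid parent
UPDATE parents SET id = 2 WHERE id = 1;    -- final state satisfies the FK
COMMIT;                                    -- constraint checks run here
```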
152. Which trade-off best describes surrogate keys (e.g., identity/UUID) compared to natural keys?
Difficulty: MediumType: MCQTopic: Data Modeling & Constraints
- Surrogates reduce joins but increase storage
- Surrogates simplify relationships and change rarely, but add an extra column without business meaning
- Surrogates must be globally unique and slow inserts
- Natural keys are always stable and smaller
A surrogate key decouples identity from volatile business fields and keeps foreign keys narrow and stable.
The cost is another column and an extra uniqueness rule if you still need to enforce natural key uniqueness.
Correct Answer: Surrogates simplify relationships and change rarely, but add an extra column without business meaning
Example Code
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
sku TEXT UNIQUE
);
153. What is the most portable way to restrict a status column to a few allowed values?
Difficulty: MediumType: MCQTopic: Data Modeling & Constraints
- Create a VIEW
- Use a CHECK constraint with an IN list
- Rely on the application only
- Create a bitmap index
A CHECK with an IN list validates values at the database boundary and works across vendors.
Vendor-specific enums can be ergonomic, but CHECK is simple and portable.
Correct Answer: Use a CHECK constraint with an IN list
Example Code
status TEXT NOT NULL CHECK (status IN ('NEW','PAID','CANCELLED'))
154. Why should foreign key columns be indexed in write-heavy OLTP tables, and what problems appear if they are not?
Difficulty: MediumType: SubjectiveTopic: Indexing
When a parent row changes or is deleted, the engine must verify or locate matching child rows. Without an index, it scans the whole child table, which is slow and holds locks longer. That increases contention and deadlock risk.
An index on the foreign key lets the engine seek directly to the children. It speeds joins, shortens lock durations, and keeps hot paths predictable under load.
Example Code
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
155. Compare using constraints versus triggers to enforce business rules. When is each appropriate?
Difficulty: MediumType: SubjectiveTopic: Data Modeling & Constraints
Use declarative constraints first. They are simple, automatic, and enforced by the engine. Examples include NOT NULL, UNIQUE, CHECK, and FOREIGN KEY. They are easy to reason about, and the optimizer can use them to simplify plans.
Triggers are for rules that require procedural logic across rows or tables, audit trails, or derived writes. They add complexity and can surprise callers if not documented. Prefer constraints for static invariants; use triggers sparingly for cross-row logic you cannot express declaratively.
Example Code
ALTER TABLE invoices ADD CONSTRAINT chk_amt CHECK (total >= 0);
-- Trigger only if you need side-effects like audit rows
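A minimal audit-trigger sketch, assuming PostgreSQL (PG 11+ for EXECUTE FUNCTION); all table, function, and trigger names are illustrative:

```sql
-- Audit table capturing the previous total on each update
CREATE TABLE invoice_audit (invoice_id BIGINT, old_total NUMERIC, changed_at TIMESTAMP);

CREATE FUNCTION log_invoice_change() RETURNS trigger AS $$
BEGIN
  INSERT INTO invoice_audit VALUES (OLD.id, OLD.total, CURRENT_TIMESTAMP);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_invoice_audit
AFTER UPDATE ON invoices
FOR EACH ROW EXECUTE FUNCTION log_invoice_change();
```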
156. Contrast sequences and identity columns for generating primary keys. Mention flexibility and portability.
Difficulty: MediumType: SubjectiveTopic: Data Modeling & Constraints
Sequences are independent objects you can reuse across tables, set start and increment, and alter without touching table DDL. They work well when you need gaps handled explicitly or cross-table ordering.
Identity columns are convenient table-local generators. They reduce boilerplate but are less flexible to share or reset. Choose sequences for cross-table coordination or when your DBMS lacks uniform identity semantics; use identity for simple, per-table keys.
Example Code
CREATE SEQUENCE seq_id START WITH 1;
CREATE TABLE t (id BIGINT PRIMARY KEY DEFAULT nextval('seq_id'));
157. You need a column that must be unique when present, but may be NULL for many rows. How do you enforce this across common engines?
Difficulty: HardType: SubjectiveTopic: Data Modeling & Constraints
A plain UNIQUE often allows multiple NULLs, which is the desired behavior. If your engine treats NULLs as equal, emulate with a filtered or partial unique index on non-NULL rows. That enforces uniqueness only when a value exists.
This pattern keeps flexibility for missing values while preventing duplicates when the field is provided.
Example Code
-- PostgreSQL
CREATE UNIQUE INDEX uniq_users_email
ON users(email) WHERE email IS NOT NULL;
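SQL Server supports the same pattern through filtered indexes, and the syntax happens to match the PostgreSQL form above:

```sql
-- SQL Server filtered unique index: uniqueness enforced only for non-NULL emails
CREATE UNIQUE INDEX uniq_users_email
ON users(email) WHERE email IS NOT NULL;
```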
158. Explain how normalization up to 3NF helps prevent insert, update, and delete anomalies. Give a short example.
Difficulty: MediumType: SubjectiveTopic: Data Modeling & Constraints
In 1NF you store atomic values, so lists do not cause duplication or parsing. In 2NF you remove partial dependency on part of a composite key, which stops unrelated attributes from repeating across rows. In 3NF you remove transitive dependencies so non-keys depend only on the key.
For example, storing customer_city in the orders table causes update anomalies if the city changes. Moving customer attributes to the customers table and referencing it by a foreign key keeps one source of truth and avoids inconsistent edits.
Example Code
CREATE TABLE customers (id BIGINT PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id BIGINT PRIMARY KEY, customer_id BIGINT REFERENCES customers(id));
159. Which precaution helps prevent unexpected large deletions when using ON DELETE CASCADE?
Difficulty: MediumType: MCQTopic: Data Modeling & Constraints
- Disable all indexes on child tables
- Run deletes outside transactions
- Guard with WHERE filters and verify affected counts, and use FK plus restrictive app paths
- Use WITH (NOLOCK) to avoid blocking
CASCADE can remove deep trees. Wrap deletes in a transaction, check impact with a SELECT first, and delete in small batches if needed.
Keep foreign keys explicit and ensure only intended users can perform destructive actions.
Correct Answer: Guard with WHERE filters and verify affected counts, and use FK plus restrictive app paths
Example Code
BEGIN;
-- preview
SELECT COUNT(*) FROM orders WHERE customer_id = :id;
-- then delete parent
DELETE FROM customers WHERE id = :id;
COMMIT;
160. For general-purpose predicates (=, <, <=, BETWEEN, ORDER BY), which index type is usually the default and most versatile?
Difficulty: EasyType: MCQTopic: Indexing
- Bitmap index
- Full-text index
- Hash index
- B-tree index
B-tree supports equality, range predicates, and ordered scans, which makes it the standard choice in most engines. Hash is limited to equality. Bitmap is great for low-cardinality analytics in certain systems. Full-text targets language search, not ordinary filters.
Correct Answer: B-tree index
Example Code
CREATE INDEX idx_users_created ON users(created_at);
161. Given a composite index (country, city, created_at), which WHERE clause is most likely to use the index efficiently?
Difficulty: MediumType: MCQTopic: Indexing
- WHERE city = 'Delhi' AND created_at >= CURRENT_DATE - INTERVAL '7 days'
- WHERE country = 'IN' AND city = 'Delhi'
- WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
- WHERE city LIKE 'D%'
Composite B-trees are left-prefix. Predicates on leading columns enable seeks. Skipping the first column usually prevents efficient use. Adding a selective range on created_at after equality filters still works well.
Correct Answer: WHERE country = 'IN' AND city = 'Delhi'
Example Code
CREATE INDEX idx_loc_time ON places(country, city, created_at);
162. What is the main benefit of a covering (included-columns) index for a SELECT?
Difficulty: MediumType: MCQTopic: Indexing
- It reduces CPU usage by disabling joins
- The engine can satisfy the query directly from the index without reading the base table
- It forces parallelism
- It compresses the data automatically
When all needed columns are in the index (keys + included), the plan can avoid extra lookups. This cuts I/O and latency for hot paths.
Use includes for non-key columns used in SELECT or WHERE to create index-only scans where supported.
Correct Answer: The engine can satisfy the query directly from the index without reading the base table
Example Code
-- SQL Server style
CREATE INDEX ix ON orders(customer_id) INCLUDE (status,total);
-- Postgres: just place columns in index and rely on index-only scan when visible
163. Which statement best describes a clustered index in systems that support it (e.g., SQL Server)?
Difficulty: EasyType: MCQTopic: Indexing
- It is a separate structure from the table data
- It defines the physical order of rows, so the table is stored by the key
- It disables nonclustered indexes
- It always improves INSERT performance
A clustered index makes the table pages follow the index order, so range scans on that key are very efficient. There can be only one clustered index. It may slow random inserts into the middle of the key space.
Correct Answer: It defines the physical order of rows, so the table is stored by the key
Example Code
-- SQL Server
CREATE CLUSTERED INDEX cix_orders ON orders(order_date, id);
164. Why might an engine fall back to heap lookups even if all projected columns are in the index?
Difficulty: MediumType: MCQTopic: Indexing
- Because WHERE always ignores indexes
- To validate tuple visibility or MVCC metadata that the index alone cannot confirm
- Because ORDER BY requires a table scan
- Because DISTINCT disables index use
Some engines need to confirm row visibility (e.g., via visibility maps) before serving from the index alone. If visibility info is not set, they probe the base table. Regular vacuuming and recent access can improve index-only scans.
Correct Answer: To validate tuple visibility or MVCC metadata that the index alone cannot confirm
Example Code
-- Postgres: refresh the visibility map so index-only scans can skip heap checks
VACUUM ANALYZE users;
165. On which type of column does an index typically give the biggest benefit?
Difficulty: EasyType: MCQTopic: Indexing
- Columns with very few distinct values and most rows match
- Columns with high selectivity where filters return a small fraction of rows
- Columns never used in WHERE, JOIN, or ORDER BY
- Columns used only in INSERT statements
Indexes shine when they can skip most rows. Low-selectivity predicates often end up scanning large portions of the table, reducing index value except in bitmaps or special cases.
Correct Answer: Columns with high selectivity where filters return a small fraction of rows
Example Code
CREATE INDEX idx_email ON users(email);
166. When is a partial (filtered) index a strong optimization?
Difficulty: MediumType: MCQTopic: Indexing
- When most queries filter on a rare condition, like status='ACTIVE'
- When every query scans the whole table
- When the table has no writes
- When you need to enforce a foreign key
A partial index stores only rows that satisfy a predicate. It stays small and hot for common filters, reducing I/O and maintenance cost.
Correct Answer: When most queries filter on a rare condition, like status='ACTIVE'
Example Code
-- PostgreSQL
CREATE INDEX idx_active ON users(email) WHERE active = true;
167. Bitmap indexes are most effective for which data pattern?
Difficulty: MediumType: MCQTopic: Indexing
- High-cardinality columns with unique values
- Low-cardinality columns combined across many predicates in analytics
- Sequential surrogate keys
- JSON document search
Bitmap indexes encode presence bits and combine quickly with AND/OR, ideal for star-schema filters on low-cardinality attributes. They are less suited to heavy OLTP writes due to maintenance overhead.
Correct Answer: Low-cardinality columns combined across many predicates in analytics
Example Code
-- Engine-specific; common in Oracle and some warehouses
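Where supported, the DDL is brief; an Oracle-style sketch on an assumed sales table:

```sql
-- Oracle: bitmap index on a low-cardinality dimension attribute
CREATE BITMAP INDEX idx_sales_region ON sales(region);
```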
168. Why can too many indexes slow down an OLTP workload?
Difficulty: EasyType: MCQTopic: Indexing
- Indexes consume CPU only at query time
- Each INSERT/UPDATE/DELETE must also update affected indexes
- Queries stop using the plan cache
- Locks are disabled
Every data change must keep indexes in sync. More indexes mean more writes, page splits, and log records. Index only where you gain more on reads than you lose on writes.
Correct Answer: Each INSERT/UPDATE/DELETE must also update affected indexes
Example Code
/* Review index usage and drop unused ones */
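One way to spot removal candidates, assuming PostgreSQL and its pg_stat_user_indexes statistics view:

```sql
-- PostgreSQL: indexes never scanned since the last stats reset
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
```

Verify against workload cycles (e.g., month-end reports) before dropping anything.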
169. Users search names with LIKE '%son'. Plain B-tree does not help this suffix pattern. Outline options to speed it up.
Difficulty: MediumType: SubjectiveTopic: Query Performance & Optimizer
A leading wildcard prevents a seek. Use specialized text search: trigram or n-gram indexes (e.g., Postgres pg_trgm with GIN/GIST) or full-text search for token queries. For pure suffix matches, maintain a reverse-indexed column (store reversed strings) and seek on 'nos%'.
If search can be constrained to prefix, change UX to anchor the left side (LIKE 'son%') which is index-friendly with B-tree.
Example Code
-- Postgres trigram
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_trgm_name ON people USING GIN(name gin_trgm_ops);
-- or store reversed_name and query WHERE reversed_name LIKE 'nos%';
170. Explain SARGability and give two examples of non-SARGable predicates with index-friendly rewrites.
Difficulty: EasyType: SubjectiveTopic: Query Performance & Optimizer
SARGable predicates let the engine Search ARGuments directly via an index. Non-SARGable expressions wrap the column or start with a wildcard so the engine cannot seek.
Rewrite DATE(created_at) = d as created_at >= d AND created_at < d + 1. For case-insensitive matching, keep LOWER(email) = x and back it with a functional index on LOWER(email), or store a normalized column. Avoid a leading '%' in LIKE; prefer anchored prefixes or specialized text indexes.
Example Code
/* Good date filter */
created_at >= '2025-10-01' AND created_at < '2025-10-02'
/* Functional index option */
CREATE INDEX ix_email_lower ON users((lower(email)));
171. A feed endpoint returns the last 50 posts for a user ordered by created_at DESC, id DESC. Design an index to avoid sorting and discuss why the second key matters.
Difficulty: MediumType: SubjectiveTopic: Indexing
Create a composite index keyed by (user_id, created_at DESC, id DESC). The engine can seek on user_id and stream rows already in the required order. Adding id as a tiebreaker stabilizes order when timestamps collide and preserves index order across peers.
Returning the first 50 rows becomes a cheap ordered index scan with an early stop.
Example Code
CREATE INDEX idx_posts_feed ON posts(user_id, created_at DESC, id DESC);
172. A selective query uses an index sometimes but scans at other times. How do statistics influence this and how would you fix it?
Difficulty: MediumType: SubjectiveTopic: Query Performance & Optimizer
The optimizer estimates row counts from column and histogram statistics. If stats are stale or too generic, it may think many rows match and pick a scan. Update statistics, increase histogram buckets for skewed columns, or create filtered stats if supported.
Also check parameter sniffing and ensure literals or sniffed values reflect common cases. Verify that predicates are SARGable.
Example Code
-- Postgres
ANALYZE users;
-- SQL Server
UPDATE STATISTICS users WITH FULLSCAN;
173. Explain index maintenance: fragmentation, page splits, fillfactor, and when to rebuild versus reorganize.
Difficulty: HardType: SubjectiveTopic: Indexing
Frequent random inserts or deletes fragment leaf pages, causing extra I/O. Page splits occur when an insertion lands in a full page and the engine splits it, which adds log volume and can cascade. Lowering fillfactor leaves free space to absorb new rows in hot ranges and reduces splits.
Reorganize (or REINDEX concurrently) when fragmentation is moderate and uptime matters. Rebuild when fragmentation is severe, statistics are off, or you must change index definition or fillfactor. Schedule maintenance off-peak and monitor write pressure so benefit outweighs the cost.
Example Code
-- SQL Server examples
ALTER INDEX ix ON t REORGANIZE;
ALTER INDEX ix ON t REBUILD WITH (FILLFACTOR = 90);
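Rough PostgreSQL counterparts, assuming PG 12+ for concurrent reindexing; the index name is illustrative:

```sql
-- PostgreSQL
REINDEX INDEX CONCURRENTLY ix;         -- online rebuild
ALTER INDEX ix SET (fillfactor = 90);  -- stored setting; applies on the next rebuild
```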
174. What does EXPLAIN (or EXPLAIN ANALYZE) primarily show you?
Difficulty: EasyType: MCQTopic: Query Performance & Optimizer
- The exact SQL the optimizer rewrites
- The chosen operators, join order, and estimated or actual row counts and costs
- Only indexes used, not joins
- Only the final result set schema
A plan reveals how the optimizer will execute your query: scans, seeks, joins, sort or hash steps, and the order they occur. It also shows cardinality estimates and relative costs.
With ANALYZE (or actual execution plans), you also see runtime counters like rows read and timing. Comparing estimates versus actuals is key to spotting poor statistics or non-SARGable predicates.
Correct Answer: The chosen operators, join order, and estimated or actual row counts and costs
Example Code
EXPLAIN ANALYZE
SELECT * FROM orders o JOIN customers c ON c.id=o.customer_id
WHERE o.status='PAID';
175. Which mapping of join algorithm to best use case is most accurate?
Difficulty: MediumType: MCQTopic: Query Performance & Optimizer
- Nested loops: large-large; Hash join: small-large; Merge join: random
- Nested loops: small outer with indexed inner; Hash join: big inputs, equality; Merge join: both inputs pre-sorted on join keys
- Nested loops: only for CROSS JOIN; Hash join: only for range predicates; Merge join: avoids sorting
- All algorithms behave the same
Nested loops shine when the outer side is small and the inner side can be probed by an index fast. Hash joins work well for large inputs and equality joins without indexes. Merge joins are ideal when both sides are ordered on the join keys, avoiding extra sorts.
Knowing these trade-offs helps you design indexes and predicates that steer the optimizer toward efficient operators.
Correct Answer: Nested loops: small outer with indexed inner; Hash join: big inputs, equality; Merge join: both inputs pre-sorted on join keys
Example Code
-- add index to favor nested loops
CREATE INDEX idx_items_order ON order_items(order_id);
176. Why is keyset pagination usually faster and more stable than OFFSET ... LIMIT for deep pages?
Difficulty: EasyType: MCQTopic: Query Performance & Optimizer
- OFFSET always uses parallelism
- OFFSET forces the engine to scan and discard many rows before returning the page
- Keyset requires a full table scan each time
- Keyset cannot preserve ordering
OFFSET N means the engine must find and skip N rows in order, which grows linearly with page depth. That adds sorts and reads, and results drift when new rows arrive.
Keyset uses a stable cursor, like the last seen id and timestamp, so the query can seek to a point and fetch the next chunk. It is faster and avoids misses or duplicates under concurrency.
Correct Answer: OFFSET forces the engine to scan and discard many rows before returning the page
Example Code
SELECT * FROM posts
WHERE user_id=:u AND (created_at,id) < (:t,:id)
ORDER BY created_at DESC, id DESC
LIMIT 50;
177. What is parameter sniffing and how can it hurt performance?
Difficulty: MediumType: MCQTopic: Query Performance & Optimizer
- It disables indexes for parameterized queries
- The optimizer caches a plan optimized for one parameter value that may be bad for others
- It forces recompilation on every execution
- It removes statistics to save memory
With parameter sniffing, the first execution influences the cached plan. If that value is atypical, later executions may reuse a poor plan and run slowly.
Mitigations include OPTION (RECOMPILE) or other query hints, separating skewed cases into different statements, or using plan guides and better statistics on skewed columns.
Correct Answer: The optimizer caches a plan optimized for one parameter value that may be bad for others
Example Code
-- SQL Server example
EXEC sp_executesql N'SELECT * FROM t WHERE status=@s', N'@s nvarchar(10)', @s='HOT';
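The recompile mitigation mentioned above can be sketched as follows; this is SQL Server syntax, and the table and parameter names are illustrative:

```sql
-- Recompile on every execution so the plan fits the actual parameter value,
-- trading compile CPU for plan quality on skewed predicates
SELECT * FROM t WHERE status = @s
OPTION (RECOMPILE);
```

Reserve this for statements where skew is proven; recompiling every hot query wastes CPU and plan-cache benefits.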
178. Why do stale or low-quality statistics cause bad plans?
Difficulty: EasyType: MCQTopic: Query Performance & Optimizer
- They disable joins
- The optimizer misestimates row counts and chooses wrong operators or join order
- They reduce disk space
- They only affect INSERT speed
Cardinality estimates drive plan choices. If the engine expects far fewer or far more rows than reality, it can pick scans, sorts, or joins that are too expensive.
Refresh statistics and increase histogram detail for skew. For filtered workloads, use filtered statistics or extended stats when available.
Correct Answer: The optimizer misestimates row counts and chooses wrong operators or join order
Example Code
-- Postgres
ANALYZE orders;
-- SQL Server
UPDATE STATISTICS orders WITH FULLSCAN;
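The filtered-statistics idea from the explanation can be sketched in SQL Server syntax; the table, column, and predicate are illustrative:

```sql
-- Statistics scoped to a hot subset, so its distribution is captured precisely
CREATE STATISTICS st_orders_us
ON orders(status)
WHERE country = 'US';
```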
179. Which rewrite makes a non-SARGable filter SARGable for date equality?
Difficulty: EasyType: MCQTopic: Query Performance & Optimizer
- WHERE DATE(created_at) = '2025-10-01'
- WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
- WHERE CAST(created_at AS TEXT) LIKE '2025-10-01%'
- WHERE EXTRACT(DAY FROM created_at) = 1
Wrapping the column in a function blocks index seeks. Range boundaries allow the engine to use the index on created_at efficiently.
This pattern also avoids time zone surprises and keeps the predicate precise and fast.
Correct Answer: WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
Example Code
created_at >= :d AND created_at < :d + INTERVAL '1 day'
180. When does a temporary table often outperform a large CTE?
Difficulty: MediumType: MCQTopic: Subqueries & CTEs
- When you need recursion
- When the intermediate result is reused multiple times or benefits from indexing
- When you only need a single pass and small data
- When you must guarantee no spills
Many optimizers inline CTEs, so heavy subqueries can be recomputed. Persisting the step to a temp table lets you index it, reuse it, and control row counts.
This can lower total work and give the optimizer better cardinality for later joins.
Correct Answer: When the intermediate result is reused multiple times or benefits from indexing
Example Code
CREATE TEMP TABLE recent AS
SELECT * FROM events WHERE ts>=NOW()-INTERVAL '7 days';
CREATE INDEX ON recent(user_id);
SELECT ... FROM recent r JOIN users u ON u.id=r.user_id;
181. Why should query hints be a last resort?
Difficulty: MediumType: MCQTopic: Query Performance & Optimizer
- Hints always make queries slower
- Hints bypass the optimizer and can lock you into fragile plans as data changes
- Hints prevent parallelism forever
- Hints disable transactions
Hints force a specific join, index, or plan shape. They may help in a pinch, but as data grows or distributions change, a forced plan can degrade.
Fix root causes first: SARGability, statistics, and indexing. Use hints sparingly and document why.
Correct Answer: Hints bypass the optimizer and can lock you into fragile plans as data changes
Example Code
-- Example (vendor-specific)
SELECT /*+ INDEX(o idx_orders_status) */ * FROM orders o WHERE status='PAID';
182. You see large gaps between estimated and actual rows in a plan. Walk through a systematic way to fix this.
Difficulty: EasyType: SubjectiveTopic: Query Performance & Optimizer
Start by checking statistics on columns in filters and joins. Update them and ensure histograms capture skewed distributions. If correlations exist between columns, consider composite indexes or extended statistics so the optimizer understands joint selectivity.
Next, rewrite non-SARGable predicates and add selective indexes to reduce row counts early. Re-run the plan to confirm estimates converge toward actuals, and verify that the join order now favors smaller, selective inputs.
Example Code
CREATE STATISTICS s_orders ON orders(status, country);
-- or extended stats equivalent per engine
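For Postgres, the extended-statistics equivalent looks roughly like this; object names are illustrative:

```sql
-- Teach the planner about correlation between status and country
CREATE STATISTICS s_orders (ndistinct, dependencies) ON status, country FROM orders;
ANALYZE orders;  -- populate the new statistics object
```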
183. Describe a safe batching strategy for deleting 50 million old rows without locking the table for hours.
Difficulty: MediumType: SubjectiveTopic: SQL Basics & DML
Delete in small, time-bounded chunks that follow an index, such as primary key or date. Use a WHERE with a range or a keyset cursor to grab, say, 5–20k rows per batch. Commit between batches to release locks and let checkpoints advance.
Add supporting indexes, disable unnecessary triggers, and run during low traffic. Monitor fragmentation; if your engine supports it, use partitioning and drop or detach whole partitions for instant archival cleanup.
Example Code
-- Pseudocode (MySQL-style DELETE ... LIMIT; run the loop in a procedure or app)
LOOP
  DELETE FROM events
  WHERE ts < :cutoff
  ORDER BY ts
  LIMIT 10000;
  EXIT WHEN ROW_COUNT() = 0;
  COMMIT;
END LOOP;
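Postgres DELETE has no LIMIT clause, so the same batch pattern is usually written with a keyed subquery; a sketch assuming an indexed id column:

```sql
-- Delete one batch; the application or procedure repeats until 0 rows are affected
DELETE FROM events
WHERE id IN (
  SELECT id FROM events
  WHERE ts < :cutoff
  ORDER BY ts
  LIMIT 10000
);
```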
184. A heavy GROUP BY with a wide table is slow. Suggest two approaches to reduce work without changing results.
Difficulty: MediumType: SubjectiveTopic: Query Performance & Optimizer
Project only needed columns before the GROUP BY using a subquery so fewer bytes flow through sort or hash. If the grouping key has an index that matches the order, use an index-ordered scan to avoid a full sort.
Pre-aggregate upstream where possible, or use partial aggregation by partition key. In some engines, materializing to a temporary table and indexing the group keys first reduces spills and memory pressure.
Example Code
SELECT k, SUM(v)
FROM (
SELECT key AS k, value AS v FROM big_table WHERE dt>=:d
) s
GROUP BY k;
185. Your plan shows sort or hash spills to disk. What are the likely causes and remedies?
Difficulty: MediumType: SubjectiveTopic: Query Performance & Optimizer
Spills occur when the working set for a sort or hash exceeds the available memory grant. Causes include under-estimated cardinality, too many wide columns, or very large result sets. They slow queries dramatically due to temp I/O.
Reduce width by selecting only needed columns, improve estimates via statistics, and add supporting indexes to avoid big sorts. Tuning memory grants or work_mem can help, but fixing row estimates is usually the durable solution.
Example Code
-- Postgres example
SET work_mem = '256MB'; -- as a tactical fix alongside logical rewrites
186. Which is often faster when deduplicating before a join inflates rows?
Difficulty: MediumType: MCQTopic: Query Performance & Optimizer
- Join first, DISTINCT after
- DISTINCT or GROUP BY on the smaller side before joining
- Use CROSS JOIN to avoid duplicates
- Wrap everything in a CTE to force materialization
De-duplicating early on the side that causes multiplicity prevents row explosion and shrinks the join inputs. That reduces memory, sorts, and downstream aggregation cost.
This pattern is especially useful when joining many-to-many relationships or dimension tables with duplicated keys.
Correct Answer: DISTINCT or GROUP BY on the smaller side before joining
Example Code
WITH d AS (
SELECT DISTINCT customer_id FROM clicks WHERE dt=:d
)
SELECT ... FROM d JOIN customers c ON c.id=d.customer_id;
187. After a version upgrade your critical query regressed. Outline a triage plan to isolate and remediate the issue.
Difficulty: HardType: SubjectiveTopic: Query Performance & Optimizer
Capture before-and-after execution plans and compare join order, operator choices, and estimates. Check if new cardinality models or changed defaults altered heuristics. Verify statistics and compatibility settings; sometimes a minor setting flips behavior.
As a quick mitigation, use plan freezing or hints while you fix root causes. Then make predicates SARGable, add missing indexes, and consider breaking the query into optimized stages. Document the change and add a regression test so future upgrades keep the better plan.
Example Code
-- SQL Server
EXEC sp_create_plan_guide_from_handle @name='freeze_q', @plan_handle=...;
188. Which option correctly lists the ACID properties for transactions?
Difficulty: EasyType: MCQTopic: Transactions & Concurrency
- Atomicity, Consistency, Isolation, Durability
- Accuracy, Concurrency, Integrity, Durability
- Atomicity, Concurrency, Idempotency, Durability
- Aggregation, Consistency, Isolation, Distribution
ACID guarantees reliable changes. Atomicity means all-or-nothing. Consistency means rules and constraints hold before and after the unit of work.
Isolation limits interference among concurrent units. Durability means a committed unit survives crashes due to logging, flushing, or replication.
Correct Answer: Atomicity, Consistency, Isolation, Durability
Example Code
BEGIN;
-- do multiple statements
COMMIT; -- all persist or none with ROLLBACK
189. Which commands are core Transaction Control Language (TCL)?
Difficulty: EasyType: MCQTopic: Transactions & Concurrency
- CREATE, ALTER, DROP
- SELECT, INSERT, UPDATE
- COMMIT, ROLLBACK, SAVEPOINT
- GRANT, REVOKE
TCL manages the unit of work. COMMIT makes changes permanent. ROLLBACK undoes the unit. SAVEPOINT lets you roll back partway.
DDL auto-commits in many engines, implicitly ending any open transaction. Be mindful of boundaries so side effects are predictable.
Correct Answer: COMMIT, ROLLBACK, SAVEPOINT
Example Code
BEGIN;
UPDATE accounts SET bal=bal-100 WHERE id=1;
SAVEPOINT s1;
UPDATE accounts SET bal=bal+100 WHERE id=2;
ROLLBACK TO s1; -- optional
COMMIT;
190. Which isolation level prevents dirty reads but may allow non-repeatable reads?
Difficulty: EasyType: MCQTopic: Transactions & Concurrency
- Read Uncommitted
- Read Committed
- Repeatable Read
- Serializable
Read Committed shows only committed data, so you avoid reading a rolled-back change. But the same row can be updated between two reads, producing a non-repeatable read.
Repeatable Read also blocks non-repeatable reads. Serializable adds protection against phantoms, at higher cost.
Correct Answer: Read Committed
Example Code
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- now reads see only committed rows
191. Which pairing of read phenomenon and description is correct?
Difficulty: MediumType: MCQTopic: Transactions & Concurrency
- Dirty read: reading a row that no longer exists
- Non-repeatable read: same query returns extra rows due to inserts by others
- Phantom: a range query sees new rows appear between two scans
- Lost update: two transactions read different values then both write
Phantoms are new or missing rows that match a predicate on a later scan. They show up with range predicates. Non-repeatable reads are the same row showing different values.
Dirty reads are uncommitted values. Lost updates happen when both read the same value and the later write overwrites the earlier without awareness.
Correct Answer: Phantom: a range query sees new rows appear between two scans
Example Code
/* Phantom example */
-- T1: SELECT * FROM orders WHERE amt > 100; -- sees N
-- T2: INSERT order amt=200 COMMIT
-- T1: re-run: now sees N+1
192. In MVCC engines, how are readers and writers decoupled?
Difficulty: MediumType: MCQTopic: Transactions & Concurrency
- Readers block writers with shared locks
- Writers block readers with exclusive locks always
- Readers see a snapshot while writers create new row versions
- Both always serialize on a global mutex
Multi-Version Concurrency Control keeps old versions to serve snapshots. Readers avoid waiting and do not take long read locks. Writers update a new version and mark old versions for cleanup.
This design improves read concurrency but needs vacuuming or GC to reclaim old versions.
Correct Answer: Readers see a snapshot while writers create new row versions
Example Code
-- Repeatable Read: the snapshot is fixed for the whole transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
193. Which lock types best describe typical row-level reads and writes?
Difficulty: EasyType: MCQTopic: Transactions & Concurrency
- Reads take exclusive; writes take shared
- Reads take shared or use versions; writes take exclusive
- Both take update locks
- Neither uses locks at all
Reads either take shared locks (blocking writers on those rows) or read MVCC versions without locking them. Writes take exclusive locks so two writers cannot modify the same row at once.
Lock names and hierarchy vary by engine, but the concept of shared versus exclusive is common.
Correct Answer: Reads take shared or use versions; writes take exclusive
Example Code
UPDATE inventory SET qty=qty-1 WHERE sku='X'; -- exclusive row lock
194. What is a practical strategy to reduce deadlocks in a high-contention workload?
Difficulty: MediumType: MCQTopic: Transactions & Concurrency
- Acquire resources in a consistent order across transactions
- Disable all indexes
- Use NOLOCK everywhere
- Increase the lock timeout to hours
A consistent access order removes cycles in the wait graph. Combine with shorter transactions and appropriate indexes to narrow lock scopes.
NOLOCK can read uncommitted data and produce inconsistent results. Long timeouts only hide the problem.
Correct Answer: Acquire resources in a consistent order across transactions
Example Code
-- Always lock parent, then child
-- e.g., update account, then update its orders
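The consistent-order rule can be made concrete: if every transaction touches rows in ascending key order, no wait cycle can form. A sketch with an illustrative accounts table:

```sql
-- Both transfers lock id 1 before id 2, whichever direction money moves
BEGIN;
UPDATE accounts SET bal = bal - 100 WHERE id = 1;  -- lower key first
UPDATE accounts SET bal = bal + 100 WHERE id = 2;  -- higher key second
COMMIT;
```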
195. Why is WITH (NOLOCK) risky on OLTP reads?
Difficulty: MediumType: MCQTopic: Transactions & Concurrency
- It forces serialization of all reads
- It can read uncommitted, duplicate, or missing rows due to in-flight changes
- It disables the buffer cache
- It prevents deadlocks always
NOLOCK (read uncommitted) sacrifices correctness for speed. You can see rows twice, miss rows, or read values that roll back later.
Use snapshot-based isolation for non-blocking reads while keeping logical consistency.
Correct Answer: It can read uncommitted, duplicate, or missing rows due to in-flight changes
Example Code
SELECT * FROM orders WITH (NOLOCK); -- may be inconsistent
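The snapshot-based alternative mentioned above can be sketched for SQL Server (switching the database option briefly requires exclusive access):

```sql
-- Read Committed Snapshot: readers see committed row versions without shared locks
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;
SELECT * FROM orders;  -- non-blocking and consistent, unlike NOLOCK
```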
196. Optimistic concurrency control typically relies on what?
Difficulty: MediumType: MCQTopic: Transactions & Concurrency
- Locking rows for the entire user session
- Version checks at write time to detect conflicts
- Exclusive table locks on every update
- Disabling transactions
Optimistic control assumes conflicts are rare. It reads without locks and, at update, verifies the row version or timestamp has not changed. If changed, it retries or errors.
This reduces lock contention but needs careful retry logic in the application.
Correct Answer: Version checks at write time to detect conflicts
Example Code
UPDATE t SET val=:v, version=version+1
WHERE id=:id AND version=:expected;
197. What does Two-Phase Commit (2PC) coordinate across?
Difficulty: HardType: MCQTopic: Transactions & Concurrency
- Multiple queries in one session
- Multiple partitions of a single table only
- Multiple resource managers or databases to achieve atomic commit
- Multiple users’ transactions on one server
2PC uses a prepare phase to ensure all participants can commit, then a commit phase to finalize atomically. It provides ACID across systems at the cost of latency and blocking under failures.
Modern systems also use consensus replication to achieve similar guarantees.
Correct Answer: Multiple resource managers or databases to achieve atomic commit
Example Code
-- Coordinator: PREPARE TRANSACTION 'tx1'; then COMMIT PREPARED 'tx1';
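Postgres exposes the participant side of 2PC directly; a minimal sketch (the transaction id and table are illustrative, and max_prepared_transactions must be > 0):

```sql
BEGIN;
UPDATE accounts SET bal = bal - 100 WHERE id = 1;
PREPARE TRANSACTION 'tx1';   -- phase 1: persist this participant's vote to commit
COMMIT PREPARED 'tx1';       -- phase 2: coordinator finalizes everywhere
-- ROLLBACK PREPARED 'tx1';  -- or abort if any participant voted no
```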
198. How would you design safe retries for transient errors like deadlocks or lock timeouts?
Difficulty: MediumType: SubjectiveTopic: Transactions & Concurrency
First, make each unit idempotent: the same unit can run again without double effects. Use natural or surrogate keys to detect duplicates, and use upserts where possible. Log correlation IDs so you can spot duplicates.
Next, implement bounded exponential backoff with jitter. On specific transient codes, roll back, wait, and retry a few times. Keep the unit small and consistent in resource order to reduce collisions. If retries exhaust, surface a controlled failure and alert.
Example Code
-- Pseudocode: bounded retries with exponential backoff and jitter
for attempt in 1..5 {
  BEGIN; -- do the idempotent work
  if success { COMMIT; break }
  ROLLBACK;
  sleep(backoff(attempt) + jitter())
}
199. A report reads SUM(amount) WHERE date BETWEEN d1 AND d2 twice and gets different totals. Explain the cause and two fixes.
Difficulty: MediumType: SubjectiveTopic: Transactions & Concurrency
You are seeing phantoms or concurrent updates changing rows in the range. Between scans, other units inserted or updated values that match the predicate, so the second read changed.
Fix by using a higher isolation level. Serializable prevents phantoms by locking the predicate. Or design the report to read a stable snapshot, such as Repeatable Read in MVCC, or use a time-boxed ETL snapshot table for consistent reads.
Example Code
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(amount) FROM sales WHERE dt BETWEEN :d1 AND :d2;
200. What are the risks of long-running transactions and how do you mitigate them?
Difficulty: MediumType: SubjectiveTopic: Transactions & Concurrency
They hold locks and versions for a long time, increasing contention and bloat. In MVCC, old versions cannot be reclaimed, which inflates storage and slows scans. Checkpoint and log retention also grow.
Mitigate by committing in smaller units, breaking work into batches, and using snapshot reads for reporting. Keep interactive transactions short and avoid idle-in-transaction sessions.
Example Code
-- Batch pattern
DELETE FROM events WHERE created_at < :cutoff LIMIT 10000; COMMIT; -- loop (MySQL-style; DELETE ... LIMIT is non-standard)
201. Explain write skew under snapshot isolation and how to prevent it.
Difficulty: HardType: SubjectiveTopic: Transactions & Concurrency
Under snapshot isolation, two units can each read a state that seems valid, then perform disjoint writes that together violate a constraint. Because neither row is updated by both, no direct conflict is detected, and the invariant breaks.
Prevent by adding a locking read on the predicate (SELECT … FOR UPDATE on the relevant set), using Serializable isolation, or enforcing the invariant with a constraint and retrying on violation.
Example Code
-- Guard with predicate lock
SELECT * FROM shifts WHERE doctor_id=:d AND overlaps(:start,:end) FOR UPDATE;
202. What are gap or key-range locks and when are they useful?
Difficulty: MediumType: SubjectiveTopic: Transactions & Concurrency
They protect not only existing rows but also the gaps between index keys that satisfy a range predicate. This blocks phantoms by preventing inserts into the protected range while the unit runs.
They appear in engines like InnoDB at Repeatable Read with range scans. They can reduce concurrency, so scope them with selective predicates and good indexes.
Example Code
-- InnoDB example range
SELECT * FROM orders WHERE amount BETWEEN 100 AND 200 FOR UPDATE;
203. Which principle should guide database permission design for application users?
Difficulty: EasyType: MCQTopic: Security & Access Control
- Grant all privileges to avoid outages
- Use least privilege and grant only what is required
- Grant schema owner rights to every login
- Rely on application checks and avoid DB permissions
Least privilege shrinks blast radius. If a credential leaks, the attacker is limited to the granted actions. Start with read-only and add specific INSERT, UPDATE, or EXECUTE rights as needed. Prefer role-based grants over per-user grants.
Correct Answer: Use least privilege and grant only what is required
Example Code
CREATE ROLE app_read;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
204. What do GRANT and REVOKE do?
Difficulty: EasyType: MCQTopic: Security & Access Control
- Define table structure
- Control data-at-rest encryption
- Assign or remove privileges on database objects
- Start and stop database services
GRANT gives a principal rights on objects (tables, views, procedures). REVOKE removes those rights. Grant to roles, then attach users to roles for clean, auditable access.
Correct Answer: Assign or remove privileges on database objects
Example Code
GRANT SELECT, INSERT ON orders TO app_writer;
REVOKE INSERT ON orders FROM app_writer;
205. What is the most effective defense against SQL injection?
Difficulty: EasyType: MCQTopic: Security & Access Control
- Manual escaping only
- Parameterized queries or prepared statements
- Blocking single quotes in input
- Client-side validation only
Parameters keep code and data separate. The engine treats the parameter as a literal, not executable SQL. Combine with allow-list validation and least-privilege accounts to layer defenses.
Correct Answer: Parameterized queries or prepared statements
Example Code
SELECT * FROM users WHERE email = $1; -- bind at driver
206. What does Row-Level Security (RLS) enforce?
Difficulty: MediumType: MCQTopic: Security & Access Control
- Automatic encryption of all columns
- Policy-based filtering so a user sees only permitted rows
- Faster full table scans
- Automatic index creation for every foreign key
RLS adds a row predicate evaluated per user or role. It enables multi-tenant isolation inside a single table. Keep predicates SARGable and index the filtering columns to avoid slow plans.
Correct Answer: Policy-based filtering so a user sees only permitted rows
Example Code
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_tenant ON invoices
USING (tenant_id = current_setting('app.tenant')::int);
207. Which two layers cover common database encryption needs?
Difficulty: EasyType: MCQTopic: Security & Access Control
- Only disk encryption
- At rest on disks and in transit on connections
- Only per-column hashing
- Compression and deduplication
Encrypt at rest to protect media theft. Use TLS in transit to protect the wire. For highly sensitive fields, consider app-level encryption or tokenization with secure key management and rotation.
Correct Answer: At rest on disks and in transit on connections
Example Code
ssl = on -- enable TLS per engine
-- TDE or disk encryption configured at server/storage layer
208. Authentication verifies identity; authorization decides what?
Difficulty: EasyType: MCQTopic: Security & Access Control
- What the identity is allowed to do
- Which hash algorithm to use
- When to vacuum tables
- Where backups are stored
Authn answers “who are you.” Authz answers “what can you access.” Use centralized roles and scoped object permissions to enforce authorization consistently.
Correct Answer: What the identity is allowed to do
Example Code
CREATE ROLE reporting;
GRANT USAGE ON SCHEMA analytics TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO reporting;
209. Why expose a VIEW to consumers instead of the base table?
Difficulty: MediumType: MCQTopic: Security & Access Control
- Views are always faster
- Views protect against injection automatically
- Views can project and filter columns, simplifying permissions and audits
- Views prevent locking
Grant SELECT on a view while denying base tables. This hides sensitive columns, encodes business rules, and narrows the surface area for callers.
Correct Answer: Views can project and filter columns, simplifying permissions and audits
Example Code
CREATE VIEW v_orders_public AS
SELECT id, created_at, total FROM orders WHERE status <> 'DELETED';
210. What should a practical audit trail capture for data changes?
Difficulty: MediumType: MCQTopic: Security & Access Control
- Only timing information
- Only new values
- Actor, time, object, and before/after values where feasible
- Nothing if backups exist
A good audit entry ties user, timestamp, target, and the change. Use append-only audit tables or built-in extensions. Keep the path tamper-evident and searchable.
Correct Answer: Actor, time, object, and before/after values where feasible
Example Code
INSERT INTO audit_log(actor, action, table_name, row_id, old_val, new_val)
VALUES (current_user, 'UPDATE', 'orders', :id, :old, :new);
211. Which combination enables point-in-time recovery (PITR)?
Difficulty: EasyType: MCQTopic: Backup & Recovery
- Full backup only
- Differential backups only
- Full backup plus continuous transaction log archiving
- Schema-only backups
A full backup gives you a base. Archiving WAL or transaction logs lets you replay to a chosen timestamp. Practice restores so you can execute under stress.
Correct Answer: Full backup plus continuous transaction log archiving
Example Code
# Concept
base_full.tar + wal_archive/0000... -> recover to '2025-10-15 10:30:00'
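In Postgres terms, the recipe above maps to archive settings plus a recovery target; the paths and timestamp here are illustrative:

```
# postgresql.conf (archiving side)
archive_mode = on
archive_command = 'cp %p /archive/%f'

# recovery side
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2025-10-15 10:30:00'
```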
212. You need production-like test data without exposing PII. Describe a safe masking strategy and where to apply it.
Difficulty: MediumType: SubjectiveTopic: Security & Access Control
Mask or anonymize in the database before export or restore. Replace direct identifiers (email, phone, name) with deterministic but fake values so joins and uniqueness still work. Randomize quasi-identifiers and dates within safe bands to preserve distribution while breaking re-identification.
Automate the process as a repeatable pipeline. Store the masking logic in version control, run it as part of backup-to-test jobs, and restrict access to the pre-masked dump only. Validate that constraints and key relationships remain intact after masking.
Example Code
UPDATE users
SET email = CONCAT('user', id, '@example.test'),
phone = LPAD(id::text, 10, '0');
213. Why must you test restores and not only take backups? Outline a simple validation routine.
Difficulty: EasyType: SubjectiveTopic: Backup & Recovery
Backups can be corrupt, incomplete, or incompatible. A restore is the proof that recovery works. Without tests, failures appear during incidents when time is scarce.
Automate periodic restores to a sandbox, replay logs to a target timestamp, and run canary queries and row-count checks. Compare checksums or counts against baselines, and alert if validation fails. Record RTO and update runbooks.
Example Code
1) Restore latest full
2) Replay logs to T
3) run: SELECT COUNT(*) FROM critical_tables; -- compare to baseline
214. Explain RPO and RTO. How do they shape your backup and HA design?
Difficulty: MediumType: SubjectiveTopic: Backup & Recovery
RPO is tolerated data loss. It drives log backup cadence and replication mode. A near-zero RPO needs synchronous replication or continuous log shipping.
RTO is tolerated downtime. It drives automation, hot standbys, and failover drills. Short RTO favors pre-provisioned replicas, scripted role swaps, and frequent restore testing.
Example Code
Policy: RPO 5 min (logs every 5 min); RTO 15 min (warm standby + automated failover)
215. Compare asynchronous read replicas with synchronous replicas for high availability and recovery.
Difficulty: MediumType: SubjectiveTopic: Backup & Recovery
Asynchronous replicas are great for read scale and reporting. They do not slow writes, but can lag. On failover you may lose recent commits equal to the lag. They suit relaxed RPO.
Synchronous replicas acknowledge commits only after a replica confirms. This reduces or eliminates data loss at the cost of higher write latency and tighter coupling. Choose per workload, per region, and document expected RPO/RTO.
Example Code
-- Concept: sync commit waits for replica ACK; async returns after local log flush
216. You must build dynamic SQL for a flexible report. How do you keep it safe and fast?
Difficulty: MediumType: SubjectiveTopic: SQL Basics & DML
Allow only whitelisted identifiers (columns, sort keys). Bind user values as parameters so the engine can reuse plans and injection is blocked. Expose a wrapper stored procedure with EXECUTE permissions, not direct table access.
Log the final statement and role for audit. Add resource guards such as row limits and timeouts, and index the columns your dynamic filters hit most.
Example Code
-- SQL Server sample
DECLARE @sql nvarchar(max)=N'SELECT col_list FROM t WHERE status=@s';
EXEC sp_executesql @sql, N'@s nvarchar(10)', @s=@in_status;
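Whitelisting identifiers, the other half of the safety story, can be sketched like this in SQL Server; the column names and allow-list are illustrative:

```sql
-- Values are bound as parameters; identifiers pass an allow-list, then QUOTENAME
DECLARE @sort_col sysname = N'total';  -- would come from user input
IF @sort_col NOT IN (N'created_at', N'total') SET @sort_col = N'created_at';
DECLARE @sql nvarchar(max) =
  N'SELECT id, created_at, total FROM orders ORDER BY ' + QUOTENAME(@sort_col);
EXEC sp_executesql @sql;
```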