1. Which SQL statement retrieves all columns and rows from a table named employees?
- READ * FROM employees;
- SELECT ALL FROM employees;
- SELECT * FROM employees;
- GET * FROM employees;
Correct Answer: SELECT * FROM employees;
SELECT * FROM employees;
Get the Preplance app for a seamless learning experience. Practice offline, get daily streaks, and stay ahead with real-time interview updates.
Get it on
Google Play
4.9/5 Rating on Store
SQL & Databases · Complete Question Bank
Practice the complete collection of SQL & Databases interview questions including theory, coding, MCQs and real interview problems.
Questions
216
Full database
Topics
11
Categorised
Difficulty
Mixed Levels
Easy Hard
Scroll through every important SQL & Databases question asked in real interviews. Includes MCQs, subjective questions, and coding prompts.
Correct Answer: SELECT * FROM employees;
SELECT * FROM employees;
Browse more SQL & Databases questions, explore related subjects, or practice full interview sets to strengthen your preparation.
Correct Answer: WHERE
SELECT department_id, COUNT(*) FROM employees WHERE active = 1 GROUP BY department_id;
Correct Answer: Ascending
SELECT name FROM customers ORDER BY name;
Correct Answer: LIMIT n OFFSET m
SELECT id, name FROM products ORDER BY id LIMIT 20 OFFSET 40;
Correct Answer: WHERE col IS NULL
SELECT COUNT(*) total, COUNT(email) with_email FROM users WHERE deleted_at IS NULL;
Correct Answer: To give temporary readable names that improve clarity
SELECT e.id AS emp_id, d.name AS dept FROM employees e JOIN departments d ON d.id = e.dept_id;
Correct Answer: SELECT * FROM stores WHERE city IN ('Delhi','Mumbai');
SELECT * FROM stores WHERE city IN ('Delhi','Mumbai');Correct Answer: INSERT INTO table (cols) VALUES (...);
INSERT INTO employees (id, name, email) VALUES (101, 'Asha', 'asha@example.com');
Correct Answer: Target with a precise WHERE predicate and verify with SELECT first
-- preview SELECT * FROM orders WHERE status='CANCELLED'; -- then DELETE FROM orders WHERE status='CANCELLED';
SELECT id, name, status FROM tickets WHERE status='OPEN';
SELECT COALESCE(phone,'N/A') AS phone FROM users ORDER BY phone NULLS LAST;
SELECT * FROM students WHERE first_name LIKE 'K%'; SELECT * FROM students WHERE first_name LIKE '__K%';
CREATE TABLE payments( id BIGINT PRIMARY KEY, amount NUMERIC(12,2) NOT NULL, paid_at TIMESTAMP NOT NULL );
BEGIN; INSERT INTO orders(id, customer_id, total) VALUES(?, ?, ?); UPDATE customers SET order_count = order_count + 1 WHERE id = ?; SELECT order_count FROM customers WHERE id = ?; COMMIT;
Correct Answer: Date in WHERE, count in HAVING
SELECT customer_id, COUNT(*) AS orders_2025 FROM orders WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01' GROUP BY customer_id HAVING COUNT(*) > 5;
Correct Answer: COUNT(col) ignores NULLs; COUNT(*) counts all rows
SELECT COUNT(*) all_rows, COUNT(email) emails FROM users;
Correct Answer: Non-aggregated columns must all appear in GROUP BY
SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id;
Correct Answer: COUNT(DISTINCT customer_id)
SELECT date_trunc('month', paid_at) AS mon,
COUNT(DISTINCT customer_id) AS unique_buyers
FROM payments
GROUP BY date_trunc('month', paid_at);Correct Answer: SUM(CASE WHEN condition THEN value ELSE 0 END)
SELECT SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END) AS paid_amt, SUM(CASE WHEN status='REFUND' THEN amount ELSE 0 END) AS refund_amt FROM invoices;
Correct Answer: Put the filter in WHERE for better performance
SELECT store_id, SUM(total) FROM sales WHERE region='APAC' GROUP BY store_id;
Correct Answer: Mean of non-NULL ratings only
SELECT product_id, AVG(rating) AS avg_rating FROM reviews GROUP BY product_id;
Correct Answer: GROUP BY CAST(sale_ts AS DATE)
SELECT CAST(sale_ts AS DATE) AS d, SUM(amount) FROM sales GROUP BY CAST(sale_ts AS DATE);
Correct Answer: SUM(value) FILTER (WHERE condition)
SELECT COUNT(*) AS orders, SUM(amount) FILTER (WHERE status='PAID') AS paid_amt FROM orders;
Correct Answer: Use the alias of the aggregate in ORDER BY
SELECT customer_id, SUM(total) AS spend FROM sales GROUP BY customer_id ORDER BY spend DESC;
SELECT COALESCE(category,'Unknown') AS cat,
SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END) AS paid
FROM invoices
GROUP BY COALESCE(category,'Unknown');SELECT customer_id, total
FROM (
SELECT customer_id, total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders
) t
WHERE rn = 1;SELECT p.product_id
FROM products p
LEFT JOIN sales s
ON s.product_id = p.product_id
AND s.sale_date >= '2025-01-01' AND s.sale_date < '2026-01-01'
WHERE s.product_id IS NULL;
-- or
SELECT p.product_id
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM sales s
WHERE s.product_id=p.product_id
AND s.sale_date >= '2025-01-01' AND s.sale_date < '2026-01-01'
);SELECT DISTINCT user_id FROM logins; -- vs SELECT user_id, COUNT(*) FROM logins GROUP BY user_id;
SELECT category_id, product_id, revenue
FROM (
SELECT category_id, product_id, SUM(amount) AS revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY SUM(amount) DESC) AS rn
FROM sales
GROUP BY category_id, product_id
) t
WHERE rn <= 3;Correct Answer: Only rows that satisfy the join predicate in both tables
SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id;
Correct Answer: It returns all left rows and matching right rows; non-matches have NULLs on the right
SELECT c.id, o.id AS order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.id;
Correct Answer: RIGHT JOIN is the mirror image of LEFT JOIN after swapping table positions
SELECT * FROM A LEFT JOIN B ON B.k = A.k; -- same rows as SELECT * FROM B RIGHT JOIN A ON B.k = A.k;
Correct Answer: All rows that appear in either table, matching when possible
SELECT a.id, b.id FROM A a FULL OUTER JOIN B b ON b.id = a.id;
Correct Answer: A join that returns the cartesian product of the two inputs
SELECT d.dt, p.product_id FROM dates d CROSS JOIN products p;
Correct Answer: To join a table with itself to model hierarchies or peer relations
SELECT e.emp_id, e.name, m.name AS manager FROM employees e LEFT JOIN employees m ON m.emp_id = e.manager_id;
Correct Answer: A LEFT JOIN B ON ... WHERE B.id IS NULL
SELECT a.id FROM A a LEFT JOIN B b ON b.id = a.id WHERE b.id IS NULL;
Correct Answer: Use EXISTS with a correlated subquery on B
SELECT a.id FROM A a WHERE EXISTS ( SELECT 1 FROM B b WHERE b.a_id = a.id );
Correct Answer: In the ON clause
SELECT c.id, o.id AS order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'PAID' WHERE c.active = 1;
Correct Answer: UNION ALL
SELECT name FROM customers UNION ALL SELECT name FROM employees;
SELECT SUM(oi.line_total) FROM orders o JOIN order_items oi ON oi.order_id = o.id;
SELECT a.id FROM A a WHERE NOT EXISTS ( SELECT 1 FROM B b WHERE b.a_id = a.id );
SELECT a.*, b.* FROM A a LEFT JOIN B b ON b.k = a.k UNION ALL SELECT a.*, b.* FROM A a RIGHT JOIN B b ON b.k = a.k WHERE a.k IS NULL;
SELECT f.sale_id, d.date_id FROM fact_sales f JOIN dim_date d ON d.date_id = f.date_id WHERE d.yyyy_mm = 202509;
SELECT c.id, o.id AS order_id FROM customers c LEFT JOIN orders o ON o.customer_id = c.id AND o.status='PAID' WHERE c.region='APAC';
Correct Answer: A subquery that references columns from the outer query and runs per outer row
SELECT e.emp_id FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees x WHERE x.dept_id = e.dept_id );
Correct Answer: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
SELECT c.customer_id FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id );
Correct Answer: When the right side has at least one NULL value
SELECT a.id FROM A a WHERE a.id NOT IN ( SELECT b.a_id FROM B b WHERE b.a_id IS NOT NULL );
Correct Answer: It improves readability by naming intermediate results in steps
WITH recent AS ( SELECT * FROM orders WHERE order_date >= '2025-01-01' ) SELECT customer_id, COUNT(*) FROM recent GROUP BY customer_id;
Correct Answer: You can chain CTEs and later ones can read earlier ones
WITH s AS ( SELECT customer_id, SUM(total) AS spend FROM orders GROUP BY customer_id ), r AS ( SELECT * FROM s WHERE spend > 1000 ) SELECT * FROM r;
Correct Answer: An anchor member and a recursive member
WITH RECURSIVE org AS ( SELECT id, manager_id, 1 AS lvl FROM emp WHERE manager_id IS NULL UNION ALL SELECT e.id, e.manager_id, o.lvl+1 FROM emp e JOIN org o ON e.manager_id=o.id ) SELECT * FROM org;
Correct Answer: UPDATE with a correlated subquery in SET
UPDATE orders o SET discount = ( SELECT CASE WHEN vip THEN 0.1 ELSE 0 END FROM customers c WHERE c.id = o.customer_id );
Correct Answer: EXISTS with a selective correlated subquery
SELECT c.id FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id=c.id AND o.total>0 );
Correct Answer: A temp table is a named object you can index and reuse within the session
CREATE TEMP TABLE t AS SELECT * FROM big WHERE flag=1; CREATE INDEX ON t(id); SELECT * FROM t JOIN dim USING(id);
WITH ranked AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC) rn FROM users ) SELECT * FROM ranked WHERE rn=1;
WITH RECURSIVE tree AS ( SELECT id, parent_id, 1 AS lvl, CAST(id AS TEXT) AS path FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.id, n.parent_id, t.lvl+1, t.path||'>'||n.id FROM nodes n JOIN tree t ON n.parent_id=t.id WHERE POSITION(CAST(n.id AS TEXT) IN t.path)=0 AND t.lvl<20 ) SELECT * FROM tree;
SELECT p.product_id FROM products p WHERE NOT EXISTS ( SELECT 1 FROM sales s WHERE s.product_id = p.product_id );
SELECT dept_id, emp_id, salary
FROM (
SELECT dept_id, emp_id, salary,
ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) rn
FROM employees
) x
WHERE rn=1;WITH t AS (SELECT * FROM big WHERE flag=1) SELECT COUNT(*) FROM t;
Correct Answer: ROW_NUMBER() breaks ties arbitrarily; RANK() leaves gaps; DENSE_RANK() has no gaps
SELECT emp_id,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS drnk
FROM employees;Correct Answer: PARTITION BY forms independent groups; ORDER BY defines row order within each group
SUM(amount) OVER (PARTITION BY customer_id ORDER BY sale_ts)
Correct Answer: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT order_id, sale_date, amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY sale_date, order_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales;Correct Answer: AVG(amount) OVER (PARTITION BY product ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
SELECT product_id, day,
AVG(amount) OVER (
PARTITION BY product_id
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS avg7
FROM daily_sales;Correct Answer: Zero, because of the default argument
SELECT day, amount,
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY day) AS prev_amt
FROM daily_sales;Correct Answer: Default frame can include following rows, making LAST_VALUE equal to a future row
LAST_VALUE(amount) OVER ( PARTITION BY product ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last_amt
Correct Answer: NTILE and CUME_DIST
SELECT user_id, spend,
CUME_DIST() OVER (ORDER BY spend) AS cd,
NTILE(4) OVER (ORDER BY spend) AS quartile
FROM users_spend;Correct Answer: GROUP BY collapses rows per group; window functions add metrics while keeping each row
SELECT order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id) AS cust_total
FROM orders;Correct Answer: PARTITION BY customer_id, month; ORDER BY date
SELECT customer_id, DATE_TRUNC('month', sale_ts) AS mon, sale_ts, amount,
SUM(amount) OVER (
PARTITION BY customer_id, DATE_TRUNC('month', sale_ts)
ORDER BY sale_ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS run_month
FROM sales;SELECT invoice_id, status, amount,
SUM(CASE WHEN status='PAID' THEN amount ELSE 0 END)
OVER (ORDER BY issued_at ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS paid_running
FROM invoices
ORDER BY issued_at;SELECT day, val,
LAG(val,1) OVER (PARTITION BY series_id ORDER BY day) AS prev,
val - LAG(val,1,0) OVER (PARTITION BY series_id ORDER BY day) AS delta,
CASE WHEN LAG(val,1) OVER (PARTITION BY series_id ORDER BY day) > 0
THEN (val - LAG(val,1) OVER (PARTITION BY series_id ORDER BY day))
/ LAG(val,1) OVER (PARTITION BY series_id ORDER BY day)
END AS growth
FROM series;SELECT category_id, product_id, revenue
FROM (
SELECT category_id, product_id, SUM(amount) AS revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY SUM(amount) DESC) AS rn
FROM sales
GROUP BY category_id, product_id
) t
WHERE rn <= 3;SUM(amount) OVER ( ORDER BY day, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS safe_running
SELECT order_id, amount,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amt
FROM orders;WITH ranked AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts DESC) AS rn FROM events ) SELECT * FROM ranked WHERE rn = 1;
Correct Answer: It orders the table’s data pages by the index key
CREATE TABLE orders ( order_id BIGINT PRIMARY KEY, -- often clustered by default order_date DATE, customer_id BIGINT );
Correct Answer: On columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY
CREATE INDEX idx_users_email ON users(email);
Correct Answer: email (unique per user)
CREATE UNIQUE INDEX idx_users_email ON users(email);
Correct Answer: An index that includes all columns a query needs so no table lookup is required
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date) INCLUDE (total_amount);
Correct Answer: WHERE country_code = 'IN' AND email = 'a@b.com'
CREATE INDEX idx_users_cc_email ON users(country_code, email);
Correct Answer: WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
SELECT * FROM events WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02';
Correct Answer: WHERE name LIKE 'son%'
SELECT * FROM people WHERE name LIKE 'son%';
Correct Answer: To speed up parent deletes/updates and common joins from child to parent
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Correct Answer: The optimizer may pick poor join orders or scan methods due to bad cardinality estimates
-- engine specific -- PostgreSQL autovacuum/analyze or: ANALYZE orders; -- SQL Server: UPDATE STATISTICS orders;
EXPLAIN ANALYZE SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days';
-- page 1 SELECT * FROM posts ORDER BY id ASC LIMIT 50; -- page 2 SELECT * FROM posts WHERE id > 50 ORDER BY id ASC LIMIT 50;
-- PostgreSQL CREATE INDEX idx_invoices_paid_recent ON invoices(paid_at) WHERE paid = true AND paid_at >= CURRENT_DATE - INTERVAL '90 days';
/* Strategy example: create minimal indexes first, add others after profiling */
-- PostgreSQL CREATE INDEX idx_users_email_lower ON users(LOWER(email)); -- query SELECT * FROM users WHERE LOWER(email) = 'x@y.com';
CREATE INDEX idx_fact_date ON fact_sales(date_id); CREATE INDEX idx_fact_customer ON fact_sales(customer_id); -- Verify unique indexes on dim_date(date_id), dim_customer(customer_id)
Correct Answer: To view the planned or actual execution steps and their costs
EXPLAIN ANALYZE SELECT o.id, c.name FROM orders o JOIN customers c ON c.id=o.customer_id WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';
Correct Answer: Merge Join
/* Engine chooses merge when indexes support order: JOIN ON t1.k = t2.k with both indexed by k */
Correct Answer: They lead the optimizer to pick poor join orders and operators
-- PostgreSQL ANALYZE sales; -- SQL Server UPDATE STATISTICS sales;
Correct Answer: WHERE created_at BETWEEN '2025-10-01' AND '2025-10-01 23:59:59'
WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
Correct Answer: When each disjunct can use a different index and results are later combined
(SELECT * FROM users WHERE email = $1) UNION ALL (SELECT * FROM users WHERE phone = $2)
Correct Answer: A cached plan optimized for a rare value performs poorly for common values
-- SQL Server example EXEC sp_recompile N'dbo.SearchOrders';
Correct Answer: When input is already ordered on GROUP BY keys
SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id; -- index on (customer_id) enables stream aggregate
Correct Answer: Create or use an index that matches the ORDER BY columns and direction
CREATE INDEX idx_orders_date_id ON orders(order_date DESC, id DESC);
Correct Answer: Materialized views store precomputed results and need refresh; views re-run the query each time
-- Postgres
CREATE MATERIALIZED VIEW mv_sales AS
SELECT date_trunc('day', created_at) d, SUM(total) s
FROM orders GROUP BY 1;SELECT ... FROM orders o JOIN items i ON i.order_id=o.id WHERE o.status='PAID' AND o.created_at>=CURRENT_DATE-INTERVAL '30 days';
CREATE TEMP TABLE t AS SELECT * FROM big WHERE flag=1; CREATE INDEX ON t(user_id); SELECT ... FROM t JOIN users u ON u.id=t.user_id;
-- Example (syntax varies) SELECT /*+ USE_HASH(o c) LEADING(o c) */ ...
CREATE INDEX idx_sales_pid_day ON sales(product_id, day);
SELECT product_id, day,
SUM(amount) OVER (PARTITION BY product_id ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rt
FROM sales WHERE day >= CURRENT_DATE - INTERVAL '180 days';-- Example branching IF @IsRare=1 SELECT ... WHERE country='VA' OPTION (RECOMPILE); ELSE SELECT ... WHERE country=@country;
Correct Answer: Atomicity, Consistency, Isolation, Durability
BEGIN; -- multiple writes COMMIT;
Correct Answer: Dirty: sees uncommitted data; Non-repeatable: same row changes; Phantom: new rows appear matching a predicate
SELECT * FROM orders WHERE status='NEW'; -- second read returns different result set
Correct Answer: Read Uncommitted: none; Read Committed: prevents dirty reads; Repeatable Read: prevents dirty and non-repeatable reads; Serializable: prevents dirty, non-repeatable, and phantoms
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Correct Answer: Exclusive lock for writes, Shared lock for consistent reads in locking engines
-- conceptual UPDATE accounts SET balance = balance - 100 WHERE id=1;
Correct Answer: Readers see a snapshot while writers create new versions, so reads avoid blocking writes
START TRANSACTION; -- snapshot is taken here SELECT * FROM orders WHERE id=10;
Correct Answer: They happen when two transactions wait on each other’s locks in a cycle
BEGIN; -- T1 locks row A then tries B -- T2 locks row B then tries A -> cycle
Correct Answer: You may read uncommitted or torn data and see duplicates or missing rows
SELECT * FROM Orders WITH (NOLOCK);
Correct Answer: Roll back to a named point without aborting the whole transaction
BEGIN; SAVEPOINT s1; -- try some write ROLLBACK TO SAVEPOINT s1; COMMIT;
Correct Answer: Guarantees all participants commit or all roll back together
-- conceptual coordinator messages: prepare -> commit
UPDATE accounts SET balance = balance + :delta, version = version + 1 WHERE id = :id AND version = :version;
-- Example invariant: at least one doctor on call -- T1 sets A off call, T2 sets B off call; both read 'someone is on call' -> write skew
-- pseudocode
for attempt in 1..3:
begin tx
try { do_work(); commit; break; }
catch deadlock { rollback; sleep(jitter); continue; }SET idle_in_transaction_session_timeout = '5min';
INSERT INTO idempotency_keys(key, result_hash) VALUES(:k, :h) ON CONFLICT (key) DO NOTHING; -- proceed only once; fetch prior result when conflict occurs
Correct Answer: A correlated subquery is executed once per outer row because it references outer columns
SELECT e.* FROM employees e WHERE e.salary > ( SELECT AVG(salary) FROM employees x WHERE x.dept_id = e.dept_id );
Correct Answer: EXISTS is safer with NULLs and often scales better on indexed correlated checks
SELECT * FROM orders o WHERE EXISTS ( SELECT 1 FROM order_items i WHERE i.order_id = o.id );
Correct Answer: A LEFT JOIN B ON A.k = B.k WHERE B.k IS NULL
SELECT A.* FROM A LEFT JOIN B ON B.k = A.k WHERE B.k IS NULL;
Correct Answer: IN or EXISTS used to test membership without duplicating left rows
SELECT c.* FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id );
Correct Answer: A temporary named result visible only to the next statement
WITH recent AS ( SELECT * FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days' ) SELECT * FROM recent WHERE status = 'PAID';
Correct Answer: Anchor member and recursive member joined by UNION ALL
WITH RECURSIVE org AS ( SELECT id, manager_id, 1 AS lvl FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.manager_id, o.lvl+1 FROM employees e JOIN org o ON e.manager_id = o.id ) SELECT * FROM org;
Correct Answer: UNION ALL skips duplicate removal and avoids sort/hash work
SELECT id FROM a UNION ALL SELECT id FROM b;
Correct Answer: Rewrite the subquery as a JOIN with proper indexes
SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id WHERE c.vip = true;
Correct Answer: To compare rows within the same table, such as employees to their managers
SELECT e.name AS emp, m.name AS mgr FROM employees e LEFT JOIN employees m ON m.id = e.manager_id;
Correct Answer: When you need the cartesian product of two sets, like generating all size–color combinations
SELECT p.product_id, d.dt FROM products p CROSS JOIN dates d;
SELECT order_id, amount,
SUM(amount) OVER (PARTITION BY customer_id) AS cust_total
FROM orders;WITH RECURSIVE walk AS ( SELECT id, parent_id, ARRAY[id] AS path FROM nodes WHERE parent_id IS NULL UNION ALL SELECT n.id, n.parent_id, path || n.id FROM nodes n JOIN walk w ON w.id = n.parent_id WHERE n.id <> ALL(w.path) ) SELECT * FROM walk;
SELECT * FROM ( SELECT customer_id, SUM(total) AS s FROM orders GROUP BY 1 ) t JOIN customers c ON c.id = t.customer_id;
WITH latest AS (
SELECT o.*,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC, id DESC) rn
FROM orders o
), avg_cat AS (
SELECT category_id, AVG(total) AS avg_total FROM orders GROUP BY category_id
)
SELECT l.customer_id, l.total > a.avg_total AS above_avg
FROM latest l
JOIN avg_cat a ON a.category_id = l.category_id
WHERE l.rn = 1;Correct Answer: ROW_NUMBER()
SELECT order_id,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC, order_id DESC) AS rn
FROM orders;Correct Answer: Look at previous or next row values in the same partition order
SELECT day, amount,
amount - LAG(amount) OVER (ORDER BY day) AS delta
FROM sales_by_day;Correct Answer: ROWS counts physical rows; RANGE groups rows with peer values by the ORDER BY
SUM(amount) OVER ( ORDER BY day ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS moving_sum
Correct Answer: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SUM(amount) OVER ( PARTITION BY account_id ORDER BY txn_time, id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_balance
Correct Answer: PERCENTILE_CONT()
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY amount) OVER (PARTITION BY region) AS p90_amount
Correct Answer: Quartile bucket numbers across ordered rows
NTILE(4) OVER (PARTITION BY country ORDER BY revenue DESC) AS quartile
Correct Answer: SUM(CASE WHEN paid THEN 1 ELSE 0 END)
SUM(CASE WHEN status='PAID' THEN 1 ELSE 0 END) OVER (PARTITION BY customer_id) AS paid_cnt
Correct Answer: To avoid nondeterministic row ordering when sort keys tie
ROW_NUMBER() OVER (ORDER BY created_at DESC, id DESC) AS rn
Correct Answer: Align ORDER BY with a supporting index and use ROWS frames
CREATE INDEX idx_sales_pid_day ON sales(product_id, day); -- then order by (product_id, day)
AVG(amount) OVER ( PARTITION BY product_id ORDER BY sale_date, id ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS avg_7d
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY created_at DESC, id DESC) rn
FROM orders
)
SELECT * FROM ranked WHERE rn=1;SUM(amount) OVER (
PARTITION BY account_id, date_trunc('month', txn_time)
ORDER BY txn_time, id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS month_running_balancePERCENT_RANK() OVER (PARTITION BY category ORDER BY score) AS pr -- 0 for first, 1 for last
-- daily uniques first, then window WITH d AS ( SELECT app_id, day, COUNT(DISTINCT user_id) AS du FROM events GROUP BY app_id, day ) SELECT app_id, day, SUM(du) OVER (PARTITION BY app_id ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS du_7d FROM d;
Correct Answer: PRIMARY KEY enforces uniqueness and NOT NULL; UNIQUE enforces uniqueness but can allow a single NULL (engine-dependent)
CREATE TABLE users ( id BIGINT PRIMARY KEY, email TEXT UNIQUE );
Correct Answer: To ensure child rows reference existing parent rows
CREATE TABLE orders ( id BIGINT PRIMARY KEY, customer_id BIGINT REFERENCES customers(id) );
Correct Answer: ON DELETE SET NULL
customer_id BIGINT REFERENCES customers(id) ON DELETE SET NULL
Correct Answer: CHECK
amount NUMERIC(12,2) CHECK (amount >= 0)
Correct Answer: Automatically fills the column when a value is not provided
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
Correct Answer: When the natural identity is a combination of columns that must be unique together
PRIMARY KEY (order_id, line_no)
Correct Answer: Constraint checks postponed until COMMIT within the transaction
ALTER TABLE t ADD CONSTRAINT fk DEFERRABLE INITIALLY DEFERRED;
Correct Answer: Surrogates simplify relationships and change rarely, but add an extra column without business meaning
CREATE TABLE products ( id BIGSERIAL PRIMARY KEY, sku TEXT UNIQUE );
Correct Answer: Use a CHECK constraint with an IN list
status TEXT NOT NULL CHECK (status IN ('NEW','PAID','CANCELLED'))CREATE INDEX idx_orders_customer_id ON orders(customer_id);
ALTER TABLE invoices ADD CONSTRAINT chk_amt CHECK (total >= 0); -- Trigger only if you need side-effects like audit rows
CREATE SEQUENCE seq_id START WITH 1;
CREATE TABLE t (id BIGINT PRIMARY KEY DEFAULT nextval('seq_id'));-- PostgreSQL CREATE UNIQUE INDEX uniq_users_email ON users(email) WHERE email IS NOT NULL;
CREATE TABLE customers(id PK, name, city); CREATE TABLE orders(id PK, customer_id FK REFERENCES customers(id));
Correct Answer: Guard with WHERE filters and verify affected counts, and use FK plus restrictive app paths
BEGIN; -- preview SELECT COUNT(*) FROM orders WHERE customer_id = :id; -- then delete parent DELETE FROM customers WHERE id = :id; COMMIT;
Correct Answer: B-tree index
CREATE INDEX idx_users_created ON users(created_at);
Correct Answer: WHERE country = 'IN' AND city = 'Delhi'
CREATE INDEX idx_loc_time ON places(country, city, created_at);
Correct Answer: The engine can satisfy the query directly from the index without reading the base table
-- SQL Server style CREATE INDEX ix ON orders(customer_id) INCLUDE (status,total); -- Postgres: just place columns in index and rely on index-only scan when visible
Correct Answer: It defines the physical order of rows, so the table is stored by the key
-- SQL Server CREATE CLUSTERED INDEX cix_orders ON orders(order_date, id);
Correct Answer: To validate tuple visibility or MVCC metadata that the index alone cannot confirm
-- Postgres example: VACUUM ANALYZE helps visibility maps
Correct Answer: Columns with high selectivity where filters return a small fraction of rows
CREATE INDEX idx_email ON users(email);
Correct Answer: When most queries filter on a rare condition, like status='ACTIVE'
-- PostgreSQL CREATE INDEX idx_active ON users(email) WHERE active = true;
Correct Answer: Low-cardinality columns combined across many predicates in analytics
-- Engine-specific; common in Oracle and some warehouses
Correct Answer: Each INSERT/UPDATE/DELETE must also update affected indexes
/* Review index usage and drop unused ones */
-- Postgres trigram CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX idx_trgm_name ON people USING GIN(name gin_trgm_ops); -- or store reversed_name and query WHERE reversed_name LIKE 'nos%';
/* Good date filter */ created_at >= '2025-10-01' AND created_at < '2025-10-02' /* Functional index option */ CREATE INDEX ix_email_lower ON users((lower(email)));
CREATE INDEX idx_posts_feed ON posts(user_id, created_at DESC, id DESC);
-- Postgres ANALYZE users; -- SQL Server UPDATE STATISTICS users WITH FULLSCAN;
-- SQL Server examples ALTER INDEX ix ON t REORGANIZE; ALTER INDEX ix ON t REBUILD WITH (FILLFACTOR = 90);
Correct Answer: The chosen operators, join order, and estimated or actual row counts and costs
EXPLAIN ANALYZE SELECT * FROM orders o JOIN customers c ON c.id=o.customer_id WHERE o.status='PAID';
Correct Answer: Nested loops: small outer with indexed inner; Hash join: big inputs, equality; Merge join: both inputs pre-sorted on join keys
-- add index to favor nested loops CREATE INDEX idx_items_order ON order_items(order_id);
Correct Answer: OFFSET forces the engine to scan and discard many rows before returning the page
SELECT * FROM posts WHERE user_id=:u AND (created_at,id) < (:t,:id) ORDER BY created_at DESC, id DESC LIMIT 50;
Correct Answer: The optimizer caches a plan optimized for one parameter value that may be bad for others
-- SQL Server example EXEC sp_executesql N'SELECT * FROM t WHERE status=@s', N'@s nvarchar(10)', @s='HOT';
Correct Answer: The optimizer misestimates row counts and chooses wrong operators or join order
-- Postgres ANALYZE orders; -- SQL Server UPDATE STATISTICS orders WITH FULLSCAN;
Correct Answer: WHERE created_at >= '2025-10-01' AND created_at < '2025-10-02'
created_at >= :d AND created_at < :d + INTERVAL '1 day'
Correct Answer: When the intermediate result is reused multiple times or benefits from indexing
CREATE TEMP TABLE recent AS SELECT * FROM events WHERE ts>=NOW()-INTERVAL '7 days'; CREATE INDEX ON recent(user_id); SELECT ... FROM recent r JOIN users u ON u.id=r.user_id;
Correct Answer: Hints bypass the optimizer and can lock you into fragile plans as data changes
-- Example (vendor-specific) SELECT /*+ INDEX(o idx_orders_status) */ * FROM orders o WHERE status='PAID';
CREATE STATISTICS s_orders ON orders(status, country); -- or extended stats equivalent per engine
WHILE 1=1 LOOP DELETE FROM events WHERE ts < :cutoff ORDER BY ts LIMIT 10000; EXIT WHEN ROW_COUNT()=0; COMMIT; END LOOP;
SELECT k, SUM(v) FROM ( SELECT key AS k, value AS v FROM big_table WHERE dt>=:d ) s GROUP BY k;
-- Postgres example SET work_mem = '256MB'; -- as a tactical fix alongside logical rewrites
Correct Answer: DISTINCT or GROUP BY on the smaller side before joining
WITH d AS ( SELECT DISTINCT customer_id FROM clicks WHERE dt=:d ) SELECT ... FROM d JOIN customers c ON c.id=d.customer_id;
-- SQL Server EXEC sp_create_plan_guide_from_handle @name='freeze_q', @plan_handle=...;
Correct Answer: Atomicity, Consistency, Isolation, Durability
BEGIN; -- do multiple statements COMMIT; -- all persist or none with ROLLBACK
Correct Answer: COMMIT, ROLLBACK, SAVEPOINT
BEGIN; UPDATE accounts SET bal=bal-100 WHERE id=1; SAVEPOINT s1; UPDATE accounts SET bal=bal+100 WHERE id=2; ROLLBACK TO s1; -- optional COMMIT;
Correct Answer: Read Committed
SET TRANSACTION ISOLATION LEVEL READ COMMITTED; -- now reads see only committed rows
Correct Answer: Phantom: a range query sees new rows appear between two scans
/* Phantom example */ -- T1: SELECT * FROM orders WHERE amt > 100; -- sees N -- T2: INSERT order amt=200 COMMIT -- T1: re-run: now sees N+1
Correct Answer: Readers see a snapshot while writers create new row versions
-- Snapshot at statement start SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Correct Answer: Reads take shared or use versions; writes take exclusive
UPDATE inventory SET qty=qty-1 WHERE sku='X'; -- exclusive row lock
Correct Answer: Acquire resources in a consistent order across transactions
-- Always lock parent, then child -- e.g., update account, then update its orders
Correct Answer: It can read uncommitted, duplicate, or missing rows due to in-flight changes
SELECT * FROM orders WITH (NOLOCK); -- may be inconsistent
Correct Answer: Version checks at write time to detect conflicts
UPDATE t SET val=:v, version=version+1 WHERE id=:id AND version=:expected;
Correct Answer: Multiple resource managers or databases to achieve atomic commit
-- Coordinator: PREPARE TRANSACTION 'tx1'; then COMMIT PREPARED 'tx1';
for attempt in 1..5 {
BEGIN; -- do work
if success COMMIT; else ROLLBACK; sleep(backoff(attempt)+jitter);
}SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; SELECT SUM(amount) FROM sales WHERE dt BETWEEN :d1 AND :d2;
-- Batch pattern DELETE FROM events WHERE created_at < :cutoff LIMIT 10000; COMMIT; -- loop
-- Guard with predicate lock SELECT * FROM shifts WHERE doctor_id=:d AND overlaps(:start,:end) FOR UPDATE;
-- InnoDB example range SELECT * FROM orders WHERE amount BETWEEN 100 AND 200 FOR UPDATE;
Correct Answer: Use least privilege and grant only what is required
CREATE ROLE app_read; GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
Correct Answer: Assign or remove privileges on database objects
GRANT SELECT, INSERT ON orders TO app_writer; REVOKE INSERT ON orders FROM app_writer;
Correct Answer: Parameterized queries or prepared statements
SELECT * FROM users WHERE email = $1; -- bind at driver
Correct Answer: Policy-based filtering so a user sees only permitted rows
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
CREATE POLICY p_tenant ON invoices
USING (tenant_id = current_setting('app.tenant')::int);Correct Answer: At rest on disks and in transit on connections
ssl = on -- enable TLS per engine -- TDE or disk encryption configured at server/storage layer
Correct Answer: What the identity is allowed to do
CREATE ROLE reporting; GRANT SELECT ON SCHEMA analytics TO reporting;
Correct Answer: Views can project and filter columns, simplifying permissions and audits
CREATE VIEW v_orders_public AS SELECT id, created_at, total FROM orders WHERE status <> 'DELETED';
Correct Answer: Actor, time, object, and before/after values where feasible
INSERT INTO audit_log(actor, action, table_name, row_id, old_val, new_val) VALUES (current_user, 'UPDATE', 'orders', :id, :old, :new);
Correct Answer: Full backup plus continuous transaction log archiving
# Concept base_full.tar + wal_archive/0000... -> recover to '2025-10-15 10:30:00'
UPDATE users
SET email = CONCAT('user', id, '@example.test'),
phone = LPAD(id::text, 10, '0');1) Restore latest full 2) Replay logs to T 3) run: SELECT COUNT(*) FROM critical_tables; -- compare to baseline
Policy: RPO 5 min (logs every 5 min); RTO 15 min (warm standby + automated failover)
-- Concept: sync commit waits for replica ACK; async returns after local log flush
-- SQL Server sample DECLARE @sql nvarchar(max)=N'SELECT col_list FROM t WHERE status=@s'; EXEC sp_executesql @sql, N'@s nvarchar(10)', @s=@in_status;