Home / SQL / T-SQL / SQL Subqueries and Common Table Expressions (CTEs): A Complete Beginner's Guide

SQL Subqueries and Common Table Expressions (CTEs): A Complete Beginner's Guide

On

Understanding SQL subqueries and Common Table Expressions (CTEs) is a game-changing skill that elevates your SQL development from basic to advanced. Both subqueries and CTEs allow you to break down complex problems into manageable pieces, making your queries more powerful, readable, and maintainable. In this comprehensive guide, I'll show you how to master subqueries and CTEs using practical examples from the WideWorldImporters database, helping you understand when to use each technique and how they transform the way you write SQL queries.

As you advance in SQL development, you'll encounter scenarios where simple SELECT statements can't solve the problem efficiently. Subqueries and CTEs are essential tools for handling complex data analysis, hierarchical relationships, and multi-step calculations. Let's explore each concept with real-world examples that you can immediately apply to your own work.

Understanding SQL Subqueries

A subquery is a query nested inside another query. Think of it as a query within a query—you use the results from an inner query to help the outer query make decisions or retrieve specific data.

Developer Insight: When I first learned subqueries, the concept seemed abstract. The breakthrough came when I realized subqueries are simply breaking a problem into steps: "First, find X, then use X to find Y." Instead of doing mental gymnastics, let SQL do the work.


Subqueries in the WHERE Clause

The most common use of subqueries is filtering data based on results from another query.

-- Find customers who have placed above-average order counts
SELECT 
    CustomerID,
    CustomerName,
    PhoneNumber
FROM Sales.Customers
WHERE CustomerID IN (
    SELECT CustomerID
    FROM Sales.Orders
    GROUP BY CustomerID
    HAVING COUNT(*) > (
        SELECT AVG(OrderCount)
        FROM (
            SELECT CustomerID, COUNT(*) AS OrderCount
            FROM Sales.Orders
            GROUP BY CustomerID
        ) AS CustomerOrderCounts
    )
)
ORDER BY CustomerName;

This query breaks down into steps:

  1. Inner-most query: Calculate average orders per customer
  2. Middle query: Find customers with more orders than average
  3. Outer query: Get full customer details for those customers

Simple Scalar Subqueries

A scalar subquery returns a single value that can be used in comparisons:

-- Find stock items more expensive than the average
SELECT 
    StockItemID,
    StockItemName,
    UnitPrice
FROM Warehouse.StockItems
WHERE UnitPrice > (
    SELECT AVG(UnitPrice)
    FROM Warehouse.StockItems
)
ORDER BY UnitPrice DESC;

The subquery (SELECT AVG(UnitPrice) FROM Warehouse.StockItems) returns one number, which SQL then uses to filter the main query.


Subqueries in the FROM Clause

You can use a subquery as a temporary table (called a derived table):

-- Get customer order statistics
SELECT 
    CustomerName,
    OrderStats.TotalOrders,
    OrderStats.TotalRevenue
FROM Sales.Customers c
INNER JOIN (
    SELECT 
        o.CustomerID,
        COUNT(DISTINCT o.OrderID) AS TotalOrders,
        SUM(ol.Quantity * ol.UnitPrice) AS TotalRevenue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2016-01-01'
    GROUP BY o.CustomerID
) AS OrderStats ON c.CustomerID = OrderStats.CustomerID
WHERE OrderStats.TotalRevenue > 10000
ORDER BY OrderStats.TotalRevenue DESC;

Developer Insight: FROM clause subqueries were confusing until I understood them as "virtual tables." You're creating a temporary table, giving it an alias (OrderStats), and joining to it just like any regular table.


Correlated Subqueries

A correlated subquery references columns from the outer query, executing once for each row:

-- Find products that have never been ordered
SELECT 
    StockItemID,
    StockItemName,
    UnitPrice
FROM Warehouse.StockItems si
WHERE NOT EXISTS (
    SELECT 1
    FROM Sales.OrderLines ol
    WHERE ol.StockItemID = si.StockItemID
)
ORDER BY StockItemName;

For each stock item, the subquery checks if any order lines exist. NOT EXISTS returns items with no matching orders.

Subqueries in the SELECT Clause

You can use subqueries as calculated columns:

-- Show customers with their order counts
SELECT 
    c.CustomerID,
    c.CustomerName,
    (SELECT COUNT(*)
     FROM Sales.Orders o
     WHERE o.CustomerID = c.CustomerID) AS OrderCount,
    (SELECT MAX(o.OrderDate)
     FROM Sales.Orders o
     WHERE o.CustomerID = c.CustomerID) AS LastOrderDate
FROM Sales.Customers c
WHERE c.CustomerID <= 10
ORDER BY c.CustomerID;

Each subquery executes once per customer row, calculating their specific statistics.


Understanding Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a named temporary result set that exists for the duration of a single query. CTEs make complex queries more readable by breaking them into logical, named steps.

Basic CTE Syntax

WITH CTE_Name AS (
    -- Your query here
    SELECT columns
    FROM table
    WHERE conditions
)
SELECT *
FROM CTE_Name;

The WITH keyword introduces the CTE, followed by the CTE name, the AS keyword, and the query definition in parentheses.


Simple CTE Example

Let's rewrite one of our earlier subqueries as a CTE:

-- Find stock items more expensive than average using CTE
WITH AvgPrice AS (
    SELECT AVG(UnitPrice) AS AveragePrice
    FROM Warehouse.StockItems
)
SELECT 
    si.StockItemID,
    si.StockItemName,
    si.UnitPrice,
    ap.AveragePrice
FROM Warehouse.StockItems si
CROSS JOIN AvgPrice ap
WHERE si.UnitPrice > ap.AveragePrice
ORDER BY si.UnitPrice DESC;

This is more readable than the subquery version—you can see exactly what AvgPrice represents.

Developer Insight: CTEs transformed my SQL writing. Instead of nested parentheses creating "arrow code," CTEs read like a story: "First calculate this, then calculate that, then show the final result." It's SQL that human beings can actually maintain.


Multiple CTEs in One Query

You can define multiple CTEs separated by commas:

-- Analyze customer purchasing patterns with multiple CTEs
WITH CustomerOrders AS (
    SELECT 
        CustomerID,
        COUNT(DISTINCT OrderID) AS OrderCount,
        MIN(OrderDate) AS FirstOrder,
        MAX(OrderDate) AS LastOrder
    FROM Sales.Orders
    WHERE OrderDate >= '2016-01-01'
    GROUP BY CustomerID
),
CustomerRevenue AS (
    SELECT 
        o.CustomerID,
        SUM(ol.Quantity * ol.UnitPrice) AS TotalRevenue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2016-01-01'
    GROUP BY o.CustomerID
)
SELECT 
    c.CustomerName,
    co.OrderCount,
    cr.TotalRevenue,
    DATEDIFF(DAY, co.FirstOrder, co.LastOrder) AS CustomerLifetimeDays
FROM Sales.Customers c
INNER JOIN CustomerOrders co ON c.CustomerID = co.CustomerID
INNER JOIN CustomerRevenue cr ON c.CustomerID = cr.CustomerID
WHERE cr.TotalRevenue > 5000
ORDER BY cr.TotalRevenue DESC;

This query uses two CTEs (CustomerOrders and CustomerRevenue) to organize the logic clearly.


CTEs for Complex Calculations

CTEs excel at building calculations step-by-step:

-- Calculate inventory reorder priorities
WITH StockLevels AS (
    SELECT 
        si.StockItemID,
        si.StockItemName,
        si.SupplierID,
        sih.QuantityOnHand,
        sih.ReorderLevel,
        (sih.ReorderLevel - sih.QuantityOnHand) AS UnitsNeeded
    FROM Warehouse.StockItems si
    INNER JOIN Warehouse.StockItemHoldings sih ON si.StockItemID = sih.StockItemID
    WHERE sih.QuantityOnHand < sih.ReorderLevel
),
SupplierInfo AS (
    SELECT 
        s.SupplierID,
        s.SupplierName,
        s.PhoneNumber
    FROM Purchasing.Suppliers s
)
SELECT 
    sl.StockItemName,
    sl.QuantityOnHand,
    sl.ReorderLevel,
    sl.UnitsNeeded,
    si.SupplierName,
    si.PhoneNumber,
    CASE 
        WHEN sl.QuantityOnHand = 0 THEN 'CRITICAL'
        WHEN sl.UnitsNeeded >= 50 THEN 'HIGH'
        ELSE 'NORMAL'
    END AS Priority
FROM StockLevels sl
INNER JOIN SupplierInfo si ON sl.SupplierID = si.SupplierID
ORDER BY 
    CASE 
        WHEN sl.QuantityOnHand = 0 THEN 1
        WHEN sl.UnitsNeeded >= 50 THEN 2
        ELSE 3
    END,
    sl.UnitsNeeded DESC;

Recursive CTEs

Recursive CTEs reference themselves, perfect for hierarchical data like organizational charts or bill-of-materials:

-- Generate a number sequence using recursive CTE
WITH NumberSequence AS (
    -- Anchor: Starting point
    SELECT 1 AS Number
    
    UNION ALL
    
    -- Recursive part: Reference the CTE itself
    SELECT Number + 1
    FROM NumberSequence
    WHERE Number < 10
)
SELECT Number
FROM NumberSequence;

This generates numbers 1 through 10. The anchor query creates the first row, then the recursive part keeps adding rows until the WHERE condition fails.

Practical Example with Real Data:

-- Find all people and simulate a reporting hierarchy
WITH RECURSIVE PeopleHierarchy AS (
    -- Anchor: Top-level people (simulated)
    SELECT 
        PersonID,
        FullName,
        EmailAddress,
        1 AS Level
    FROM Application.People
    WHERE PersonID <= 5
)
SELECT 
    PersonID,
    FullName,
    EmailAddress,
    Level
FROM PeopleHierarchy
ORDER BY Level, PersonID;

Subqueries vs CTEs: When to Use Each

Use Subqueries When:

1. Simple, one-off filtering

-- Quick filter - subquery is fine
SELECT CustomerName
FROM Sales.Customers
WHERE CustomerID IN (SELECT CustomerID FROM Sales.Orders WHERE OrderDate = '2016-05-20');

2. Using WHERE clause with IN, EXISTS, or comparison operators

-- CTEs can't be used directly in WHERE with IN
WHERE CustomerID IN (subquery)
WHERE EXISTS (subquery)

3. Single-column scalar values

SELECT *, (SELECT AVG(Price) FROM Products) AS AvgPrice
FROM Products;

Use CTEs When:

1. Complex queries needing multiple steps

-- Much more readable with CTE
WITH Step1 AS (...),
     Step2 AS (...),
     Step3 AS (...)
SELECT * FROM Step3;

2. Reusing the same calculation multiple times

WITH ImportantCustomers AS (
    SELECT CustomerID FROM Sales.Customers WHERE ...
)
SELECT ... FROM ImportantCustomers  -- Used here
UNION
SELECT ... FROM ImportantCustomers  -- And again here

3. Recursive queries

WITH RECURSIVE TreeData AS (...)
SELECT * FROM TreeData;

4. Improving readability and maintainability

-- CTE version is self-documenting
WITH HighValueOrders AS (...),
     FrequentCustomers AS (...),
     TopProducts AS (...)

Developer Insight: My rule of thumb: If I'm nesting more than two subqueries or using the same subquery twice, I switch to CTEs. If it's a simple filter or comparison, subqueries are fine. Readability always wins.


Practical Real-World Examples

Example 1: Sales Performance Analysis

-- Identify top-performing customers by revenue and order frequency
WITH CustomerMetrics AS (
    SELECT 
        o.CustomerID,
        COUNT(DISTINCT o.OrderID) AS TotalOrders,
        SUM(ol.Quantity * ol.UnitPrice) AS TotalRevenue,
        AVG(ol.Quantity * ol.UnitPrice) AS AvgOrderValue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2016-01-01'
    GROUP BY o.CustomerID
),
TopPerformers AS (
    SELECT 
        CustomerID,
        TotalOrders,
        TotalRevenue,
        AvgOrderValue
    FROM CustomerMetrics
    WHERE TotalRevenue > (SELECT AVG(TotalRevenue) * 1.5 FROM CustomerMetrics)
)
SELECT 
    c.CustomerName,
    tp.TotalOrders,
    tp.TotalRevenue,
    tp.AvgOrderValue
FROM TopPerformers tp
INNER JOIN Sales.Customers c ON tp.CustomerID = c.CustomerID
ORDER BY tp.TotalRevenue DESC;

Example 2: Inventory Management Report

-- Compare stock levels across suppliers
WITH SupplierInventory AS (
    SELECT 
        si.SupplierID,
        COUNT(DISTINCT si.StockItemID) AS TotalItems,
        SUM(sih.QuantityOnHand) AS TotalUnits,
        SUM(CASE WHEN sih.QuantityOnHand < sih.ReorderLevel THEN 1 ELSE 0 END) AS LowStockItems
    FROM Warehouse.StockItems si
    INNER JOIN Warehouse.StockItemHoldings sih ON si.StockItemID = sih.StockItemID
    GROUP BY si.SupplierID
)
SELECT 
    s.SupplierName,
    inv.TotalItems,
    inv.TotalUnits,
    inv.LowStockItems,
    CAST(inv.LowStockItems AS DECIMAL) / inv.TotalItems * 100 AS LowStockPercentage
FROM SupplierInventory inv
INNER JOIN Purchasing.Suppliers s ON inv.SupplierID = s.SupplierID
WHERE inv.LowStockItems > 0
ORDER BY LowStockPercentage DESC;

Example 3: Customer Segmentation Using Subqueries

-- Segment customers by purchase behavior
SELECT 
    c.CustomerID,
    c.CustomerName,
    CASE 
        WHEN (SELECT COUNT(*) FROM Sales.Orders o WHERE o.CustomerID = c.CustomerID) > 50 
            THEN 'VIP'
        WHEN (SELECT COUNT(*) FROM Sales.Orders o WHERE o.CustomerID = c.CustomerID) > 20 
            THEN 'Regular'
        WHEN (SELECT COUNT(*) FROM Sales.Orders o WHERE o.CustomerID = c.CustomerID) > 0 
            THEN 'Occasional'
        ELSE 'New'
    END AS CustomerSegment,
    (SELECT COUNT(*) FROM Sales.Orders o WHERE o.CustomerID = c.CustomerID) AS OrderCount
FROM Sales.Customers c
WHERE c.CustomerID <= 20
ORDER BY OrderCount DESC;

Common Mistakes to Avoid

Mistake 1: Forgetting Aliases for Derived Tables

Problem: Subqueries in FROM clause must have an alias.

-- WRONG: No alias
SELECT CustomerID, TotalOrders
FROM (
    SELECT CustomerID, COUNT(*) AS TotalOrders
    FROM Sales.Orders
    GROUP BY CustomerID
);

-- CORRECT: Add alias
SELECT CustomerID, TotalOrders
FROM (
    SELECT CustomerID, COUNT(*) AS TotalOrders
    FROM Sales.Orders
    GROUP BY CustomerID
) AS OrderCounts;

Mistake 2: Using CTEs Inefficiently

Problem: Referencing a CTE multiple times doesn't guarantee it's executed once.

-- CTE may execute twice (optimizer-dependent)
WITH ExpensiveCalculation AS (
    SELECT ... -- Complex calculation
)
SELECT * FROM ExpensiveCalculation
UNION ALL
SELECT * FROM ExpensiveCalculation;

Developer Insight: SQL Server's query optimizer decides whether to materialize a CTE or inline it. For expensive calculations used multiple times, consider using a temp table instead.

Mistake 3: Correlated Subqueries Performance Issues

Problem: Correlated subqueries can be slow because they execute once per outer row.

-- SLOW: Correlated subquery
SELECT 
    c.CustomerName,
    (SELECT COUNT(*) FROM Sales.Orders o WHERE o.CustomerID = c.CustomerID) AS OrderCount
FROM Sales.Customers c;

-- FASTER: Use JOIN or CTE
SELECT 
    c.CustomerName,
    COUNT(o.OrderID) AS OrderCount
FROM Sales.Customers c
LEFT JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID, c.CustomerName;

Mistake 4: Missing Semicolon Before CTE

Problem: SQL Server requires a statement terminator before WITH.

-- WRONG: No semicolon
SELECT * FROM Sales.Customers
WITH CustomerOrders AS (...)
SELECT * FROM CustomerOrders;

-- CORRECT: Add semicolon
SELECT * FROM Sales.Customers;

WITH CustomerOrders AS (...)
SELECT * FROM CustomerOrders;

Mistake 5: Trying to Use CTEs in WHERE Clause

Problem: CTEs can't be used directly in WHERE with IN or EXISTS.

-- WRONG: Can't use CTE this way
WITH SpecialCustomers AS (SELECT CustomerID FROM ...)
SELECT * FROM Sales.Orders
WHERE CustomerID IN SpecialCustomers;

-- CORRECT: Use subquery or join to CTE
WITH SpecialCustomers AS (SELECT CustomerID FROM ...)
SELECT o.* FROM Sales.Orders o
INNER JOIN SpecialCustomers sc ON o.CustomerID = sc.CustomerID;

Best Practices

1. Name CTEs descriptively - Use names that explain what the CTE contains.

2. Break complex queries into steps - Use multiple CTEs rather than deeply nested subqueries.

3. Consider performance - Correlated subqueries can be slow; test alternatives.

4. Use appropriate tool - Subqueries for simple filters, CTEs for complex logic.

5. Add comments - Explain business logic, especially in recursive CTEs.

6. Test incrementally - Build and verify each CTE before adding the next.

7. Be mindful of scope - CTEs only exist for the immediate query.

-- Example of good practices
-- Step 1: Identify high-value customers
WITH HighValueCustomers AS (
    SELECT CustomerID, SUM(TotalAmount) AS Revenue
    FROM Sales.Orders
    WHERE OrderDate >= '2016-01-01'
    GROUP BY CustomerID
    HAVING SUM(TotalAmount) > 10000
),
-- Step 2: Get their recent orders
RecentOrders AS (
    SELECT o.OrderID, o.CustomerID, o.OrderDate
    FROM Sales.Orders o
    INNER JOIN HighValueCustomers hvc ON o.CustomerID = hvc.CustomerID
    WHERE o.OrderDate >= '2016-05-01'
)
-- Final result: Show customer details with order info
SELECT 
    c.CustomerName,
    ro.OrderID,
    ro.OrderDate
FROM RecentOrders ro
INNER JOIN Sales.Customers c ON ro.CustomerID = c.CustomerID
ORDER BY c.CustomerName, ro.OrderDate DESC;

Summary and Key Takeaways

Mastering SQL subqueries and CTEs unlocks advanced query capabilities and dramatically improves code readability. Here's what you need to remember:

Subqueries are queries nested inside other queries:

  • Used in WHERE, FROM, SELECT clauses
  • Perfect for simple filters and comparisons
  • Can be scalar (single value) or correlated (reference outer query)
  • Required for WHERE clause with IN, EXISTS, or comparison operators

Common Table Expressions (CTEs) are named temporary result sets:

  • Defined with WITH keyword
  • Improve readability by breaking complex queries into steps
  • Can be referenced multiple times in the same query
  • Support recursive queries for hierarchical data
  • Must be followed by SELECT, INSERT, UPDATE, or DELETE

When to use each:

  • Simple filtering → Subquery
  • Complex multi-step logic → CTE
  • Reusing calculations → CTE
  • Recursive operations → CTE (only option)
  • WHERE clause with IN/EXISTS → Subquery
  • Improving maintainability → CTE

Developer Insight: Both subqueries and CTEs solve similar problems differently. Subqueries are compact and familiar; CTEs are readable and maintainable. As your queries grow in complexity, you'll naturally gravitate toward CTEs. The WideWorldImporters database provides excellent practice opportunities—experiment with both techniques to understand their strengths.

Next Steps: Practice converting subqueries to CTEs and vice versa. Try building complex reports using multiple CTEs. Experiment with recursive CTEs for generating sequences or traversing hierarchies. The more you work with these techniques, the more natural they become.


Frequently Asked Questions

Q: Can I use a CTE in a WHERE clause like a subquery?

A: No, you cannot use a CTE directly in a WHERE clause with IN or EXISTS. CTEs must be referenced in the FROM clause. If you need filtering behavior in WHERE, use a subquery or join to the CTE. For example, instead of WHERE CustomerID IN (MyCTE), write WHERE CustomerID IN (SELECT CustomerID FROM MyCTE) or join to MyCTE in the FROM clause.

Q: Are CTEs faster than subqueries or vice versa?

A: Performance is usually similar because SQL Server's query optimizer often produces the same execution plan for equivalent subqueries and CTEs. The optimizer decides whether to materialize or inline either approach. However, readability and maintainability differ significantly—CTEs are generally easier to understand and modify. Always test performance with your actual data volumes using execution plans.

Q: Can I use multiple CTEs in one query, and can one CTE reference another?

A: Yes! You can define multiple CTEs separated by commas in a single WITH clause. Later CTEs can reference earlier ones, allowing you to build complex logic step by step. For example: WITH CTE1 AS (...), CTE2 AS (SELECT * FROM CTE1 WHERE ...), CTE3 AS (SELECT * FROM CTE2 JOIN ...). This is one of the major advantages of CTEs—building sophisticated queries through logical progression.

Tags: