Home / Intermediate SQL / Query Optimization / T-SQL / How SQL Server Indexes Actually Work: A Deep Dive into Clustered, Nonclustered, and Included Columns

How SQL Server Indexes Actually Work: A Deep Dive into Clustered, Nonclustered, and Included Columns

On

Have you ever wondered why some queries run in milliseconds while others take forever? The secret often lies in understanding how indexes actually work in SQL Server. As someone who's spent years optimizing databases and troubleshooting slow queries, I can tell you that mastering indexes is one of the most valuable skills for any SQL developer.

In this guide, we'll explore how SQL Server indexes actually work under the hood, focusing on clustered indexes, nonclustered indexes, and the powerful concept of included columns. By the end, you'll understand not just what indexes are, but how they organize data and why they make your queries faster.

What Are SQL Server Indexes and Why Do They Matter?

Think of a SQL Server index like the index at the back of a textbook. Instead of flipping through every page to find information about "transactions," you look it up in the index and jump directly to page 247. SQL Server indexes work the same way—they help the database engine find data without scanning every single row.

Without indexes, SQL Server performs table scans, reading every row in a table to find what you need. For small tables with a few hundred rows, this isn't a problem. But when you're working with tables containing millions of rows—like the Sales.OrderLines table in a busy e-commerce database—table scans become painfully slow.

Understanding Clustered Indexes: The Foundation of Table Storage

A clustered index determines the physical order of data in a table. This is crucial: a table can have only one clustered index because the data can only be physically sorted one way.

How Clustered Indexes Actually Store Data

When you create a clustered index, SQL Server reorganizes the entire table so that rows are stored in the order of the clustered index key. The data pages themselves become the leaf level of the index.

Let's look at the Sales.Orders table in WideWorldImporters:

-- Check the existing clustered index on Sales.Orders
SELECT 
    i.name AS IndexName,
    i.type_desc AS IndexType,
    COL_NAME(ic.object_id, ic.column_id) AS ColumnName
FROM sys.indexes i
INNER JOIN sys.index_columns ic 
    ON i.object_id = ic.object_id 
    AND i.index_id = ic.index_id
WHERE i.object_id = OBJECT_ID('Sales.Orders')
    AND i.type_desc = 'CLUSTERED';

The Sales.Orders table has a clustered index on OrderID. This means the actual data rows are physically stored in OrderID order on disk. When you query:

SELECT OrderID, CustomerID, OrderDate, ExpectedDeliveryDate
FROM Sales.Orders
WHERE OrderID = 5000;

SQL Server uses the clustered index structure to navigate directly to OrderID 5000. It's like a B-tree structure where:

  1. The root page points to intermediate pages
  2. Intermediate pages point to other intermediate pages or leaf pages
  3. Leaf pages contain the actual data rows

The Impact of Clustered Index Key Selection

Choosing the right clustered index key is critical. In WideWorldImporters, most tables use an auto-incrementing identity column as the clustered index, which is a best practice because:

  • Sequential inserts are efficient (new rows go at the end)
  • No page splits occur during inserts
  • Narrow key (4 bytes for INT) means smaller index size

Here's what happens with a poor clustered index choice:

-- Example: DON'T do this on a large table
-- This would be inefficient for Sales.Orders
CREATE CLUSTERED INDEX IX_BadExample 
ON Sales.Orders(CustomerName); -- NVARCHAR column, not unique

This creates problems:

  • Wide keys (strings) increase index size
  • Non-unique values require SQL Server to add hidden uniquifiers
  • Random inserts cause page splits and fragmentation

Nonclustered Indexes: Pointers to Your Data

Unlike clustered indexes, you can have up to 999 nonclustered indexes per table (though you'd never want that many). Nonclustered indexes create a separate structure that contains:

  1. The index key columns you specify
  2. A pointer to the actual data row

The Anatomy of a Nonclustered Index

Let's create a practical nonclustered index on the Sales.Customers table:

-- Create a nonclustered index for customer lookup by name
CREATE NONCLUSTERED INDEX IX_Customers_CustomerName
ON Sales.Customers(CustomerName);

This index creates a B-tree structure where:

  • Leaf level contains CustomerName values and pointers
  • Pointers are the clustered index key (CustomerID)
  • The structure is sorted by CustomerName

When you execute this query:

SELECT CustomerID, CustomerName, PhoneNumber
FROM Sales.Customers
WHERE CustomerName = 'Tailspin Toys (Head Office)';

SQL Server:

  1. Seeks to "Tailspin Toys (Head Office)" in the nonclustered index
  2. Finds the CustomerID (the clustered key)
  3. Uses the CustomerID to look up the full row in the clustered index
  4. Returns the PhoneNumber (not in the nonclustered index)

That third step is called a key lookup, and it's expensive when returning many rows.

Included Columns: The Performance Game-Changer

This is where included columns become powerful. Included columns allow you to store additional columns at the leaf level of a nonclustered index without making them part of the index key.

Why Included Columns Matter

Let's optimize our previous query using included columns:

-- Drop the old index
DROP INDEX IF EXISTS IX_Customers_CustomerName 
ON Sales.Customers;

-- Create an improved version with included columns
CREATE NONCLUSTERED INDEX IX_Customers_CustomerName_Includes
ON Sales.Customers(CustomerName)
INCLUDE (PhoneNumber, DeliveryAddressLine1, DeliveryPostalCode);

Now when you run the same query:

SELECT CustomerID, CustomerName, PhoneNumber
FROM Sales.Customers
WHERE CustomerName = 'Tailspin Toys (Head Office)';

SQL Server can satisfy the entire query from the nonclustered index alone—no key lookup needed! This is called a covering index.

Real-World Example: Optimizing Order Reporting

Let's tackle a common reporting query in WideWorldImporters:

-- Common query: Orders within a date range with customer details
SELECT 
    o.OrderID,
    o.OrderDate,
    o.ExpectedDeliveryDate,
    c.CustomerName,
    p.FullName AS SalespersonName
FROM Sales.Orders o
INNER JOIN Sales.Customers c ON o.CustomerID = c.CustomerID
INNER JOIN Application.People p ON o.SalespersonPersonID = p.PersonID
WHERE o.OrderDate >= '2016-01-01' 
    AND o.OrderDate < '2016-02-01'
ORDER BY o.OrderDate;

Without proper indexes, this query performs table scans. Let's create an optimal index:

-- Create a covering index for date-range order queries
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
ON Sales.Orders(OrderDate)
INCLUDE (CustomerID, SalespersonPersonID, ExpectedDeliveryDate);

This index:

  • Has OrderDate as the key column (supports the WHERE clause and ORDER BY)
  • Includes CustomerID and SalespersonPersonID (for the JOINs)
  • Includes ExpectedDeliveryDate (in the SELECT list)
  • Eliminates key lookups for this specific query pattern

Key Differences: Clustered vs. Nonclustered vs. Included Columns

FeatureClustered IndexNonclustered IndexIncluded Columns
Limit per table1999Part of nonclustered
StorageData pages are leaf levelSeparate B-tree structureStored at leaf level only
Key size impactAffects all nonclustered indexesOnly affects this indexNo impact on key size
SearchableYesYes (key columns only)No (not searchable)
Best forPrimary access patternCommon query filtersSELECT list columns

Step-by-Step Guide: Building Effective Indexes

Step 1: Identify Slow Queries

Use execution plans to find table scans and key lookups:

-- Enable actual execution plan in SSMS (Ctrl+M)
-- Then run your query
SELECT 
    ol.OrderLineID,
    ol.StockItemID,
    ol.Quantity,
    ol.UnitPrice,
    si.StockItemName
FROM Sales.OrderLines ol
INNER JOIN Warehouse.StockItems si ON ol.StockItemID = si.StockItemID
WHERE ol.OrderID = 5000;

Look for:

  • Table Scan (bad for large tables)
  • Index Scan (reads entire index)
  • Key Lookup (indicates missing included columns)

Step 2: Analyze the WHERE Clause

The most selective columns in your WHERE clause should be your index keys:

-- Good index candidate: filter on StockItemID
CREATE NONCLUSTERED INDEX IX_OrderLines_StockItemID
ON Sales.OrderLines(StockItemID)
INCLUDE (Quantity, UnitPrice);

Step 3: Add Columns from SELECT and JOIN

Included columns should cover the rest of your query needs:

-- Even better: cover the entire query
CREATE NONCLUSTERED INDEX IX_OrderLines_OrderID_StockItemID
ON Sales.OrderLines(OrderID, StockItemID)
INCLUDE (Quantity, UnitPrice, Description);

Step 4: Test and Measure

Always compare execution plans before and after:

-- Check index usage statistics
SELECT 
    OBJECT_NAME(s.object_id) AS TableName,
    i.name AS IndexName,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.dm_db_index_usage_stats s
INNER JOIN sys.indexes i 
    ON s.object_id = i.object_id 
    AND s.index_id = i.index_id
WHERE OBJECT_NAME(s.object_id) = 'OrderLines';

Common Mistakes and How to Avoid Them

Mistake 1: Over-Indexing

Every index speeds up reads but slows down inserts, updates, and deletes. I've seen developers create 20+ indexes on a single table, causing insert operations to grind to a halt.

Solution: Create indexes based on actual query patterns, not hypothetical scenarios. Use the query store to identify your most frequent and expensive queries.

Mistake 2: Wide Clustered Index Keys

-- BAD: Wide clustered index key
CREATE CLUSTERED INDEX IX_Bad
ON Warehouse.StockItems(StockItemName, SupplierID);

This makes every nonclustered index larger because they all store the clustered key as the row pointer.

Solution: Keep clustered index keys narrow—preferably a single INT or BIGINT identity column.

Mistake 3: Forgetting Column Order in Composite Indexes

Column order matters! An index on (OrderDate, CustomerID) is NOT the same as (CustomerID, OrderDate).

-- This index helps: WHERE OrderDate = X AND CustomerID = Y
CREATE INDEX IX_Orders_Date_Customer
ON Sales.Orders(OrderDate, CustomerID);

-- But NOT: WHERE CustomerID = Y (needs CustomerID first)

Solution: Put the most selective columns first, and arrange columns to match your WHERE clause order when possible.

Mistake 4: Not Using Included Columns

Adding columns to the key when they should be included wastes space and limits index effectiveness:

-- BAD: Making columns searchable that don't need to be
CREATE INDEX IX_Wasteful
ON Sales.Customers(CustomerName, PhoneNumber, DeliveryAddressLine1);

-- GOOD: Only searchable column in key
CREATE INDEX IX_Efficient
ON Sales.Customers(CustomerName)
INCLUDE (PhoneNumber, DeliveryAddressLine1);

Performance Tips from the Trenches

1. Use filtered indexes for subset queries:

-- Index only active orders
CREATE NONCLUSTERED INDEX IX_Orders_Active
ON Sales.Orders(OrderDate)
INCLUDE (CustomerID)
WHERE PickingCompletedWhen IS NULL;

2. Monitor index fragmentation:

SELECT 
    OBJECT_NAME(ips.object_id) AS TableName,
    i.name AS IndexName,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
INNER JOIN sys.indexes i 
    ON ips.object_id = i.object_id 
    AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30
    AND ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

3. Consider columnstore indexes for analytics:

For large fact tables with aggregation queries, columnstore indexes can be significantly faster than traditional B-tree indexes.

Summary: Key Takeaways

Understanding how SQL Server indexes actually work is essential for building high-performance databases. Here's what we've covered:

  • Clustered indexes determine the physical storage order of data; you can have only one per table
  • Nonclustered indexes create separate structures with pointers back to the data
  • Included columns make nonclustered indexes "covering" without increasing key size
  • Choose clustered index keys carefully—preferably narrow, unique, and sequential
  • Index the columns in your WHERE clause, and include columns from SELECT
  • Avoid over-indexing; every index has a maintenance cost
  • Column order matters in composite indexes
  • Monitor index usage and fragmentation regularly

The difference between a well-indexed database and a poorly indexed one can be orders of magnitude in performance. Start with your most critical queries, measure the impact, and iterate.

Frequently Asked Questions

Q: How many nonclustered indexes should I create on a table?

A: There's no magic number, but start conservatively. Focus on indexes that support your most frequent and expensive queries. I typically see 3-7 nonclustered indexes per table in OLTP systems. Monitor insert/update performance as you add indexes, and remove indexes that aren't being used (check sys.dm_db_index_usage_stats).

Q: When should I use included columns instead of adding columns to the index key?

A: Use included columns when you need to cover a query but don't need to search, sort, or filter on those columns. Included columns are only stored at the leaf level and don't increase the size of upper-level index pages. If a column appears only in your SELECT list (not in WHERE, JOIN, or ORDER BY), it should be an included column.

Q: Can I have a table without a clustered index?

A: Yes! These are called heaps. However, for most OLTP tables, I strongly recommend having a clustered index. Heaps can suffer from fragmentation and forwarding pointers. The main exception is staging tables where you're doing bulk inserts and then processing the data—heaps can be faster for pure bulk loading scenarios.