Welcome back to our deep dive into SQL window functions! In Part 1, we mastered ranking and sequential functions. Now, we'll explore aggregate window functions that calculate running totals and moving averages, along with advanced functions like NTILE, FIRST_VALUE, and LAST_VALUE. These techniques are essential for financial reporting, trend analysis, and customer segmentation—and I'll show you practical examples using the WideWorldImporters database.
Quick Recap: Window Functions Basics
If you haven't read Part 1, here's what you need to know: window functions perform calculations across related rows without collapsing your result set. They use the OVER clause with three components:
- PARTITION BY: Divides data into groups
- ORDER BY: Defines row sequence
- Frame Specification: Defines which rows to include (the focus of this post!)
Aggregate Window Functions: Running Totals and Cumulative Calculations
You can use familiar aggregate functions (SUM, AVG, COUNT, MIN, MAX) as window functions to create running calculations without losing row-level detail. This is a game-changer for financial reports and dashboards.
Running Totals: Cumulative Sums
Running totals are one of the most requested analytics in financial reporting and dashboards. They show cumulative values as you progress through your data.
-- Calculate cumulative revenue by customer over time
SELECT
c.CustomerName,
o.OrderDate,
SUM(ol.Quantity * ol.UnitPrice) AS OrderTotal,
SUM(SUM(ol.Quantity * ol.UnitPrice)) OVER (
PARTITION BY c.CustomerID
ORDER BY o.OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningTotal
FROM Sales.Customers c
INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE c.CustomerName LIKE 'Tailspin Toys%'
GROUP BY c.CustomerID, c.CustomerName, o.OrderDate, o.OrderID
ORDER BY c.CustomerName, o.OrderDate;
The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame specification explicitly tells SQL Server to include all rows from the start of the partition up to the current row. While this is the default behavior when you use ORDER BY, being explicit improves readability.
Performance Note: Running totals with window functions are dramatically faster than self-joins or correlated subqueries. On tables with millions of rows, the difference can be 10x-100x faster.
Year-to-Date Calculations
A common business requirement is calculating year-to-date totals that reset each year:
-- Year-to-date revenue with monthly breakdown
WITH MonthlyRevenue AS (
SELECT
YEAR(o.OrderDate) AS OrderYear,
MONTH(o.OrderDate) AS OrderMonth,
SUM(ol.Quantity * ol.UnitPrice) AS MonthlyTotal
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE o.OrderDate >= '2015-01-01'
GROUP BY YEAR(o.OrderDate), MONTH(o.OrderDate)
)
SELECT
OrderYear,
OrderMonth,
MonthlyTotal,
SUM(MonthlyTotal) OVER (
PARTITION BY OrderYear
ORDER BY OrderMonth
ROWS UNBOUNDED PRECEDING
) AS YearToDateTotal,
AVG(MonthlyTotal) OVER (
PARTITION BY OrderYear
ORDER BY OrderMonth
ROWS UNBOUNDED PRECEDING
) AS YearToDateAverage
FROM MonthlyRevenue
ORDER BY OrderYear, OrderMonth;
By partitioning by OrderYear, the running total resets each January—exactly what year-to-date reporting requires.
Percentage of Total Calculations
Window functions make it easy to show both detail and context in a single query:
-- Show each order line with its percentage of total order value
SELECT
o.OrderID,
o.OrderDate,
si.StockItemName,
ol.Quantity,
ol.UnitPrice,
ol.Quantity * ol.UnitPrice AS LineTotal,
SUM(ol.Quantity * ol.UnitPrice) OVER (PARTITION BY o.OrderID) AS OrderTotal,
CAST(
(ol.Quantity * ol.UnitPrice) * 100.0 /
SUM(ol.Quantity * ol.UnitPrice) OVER (PARTITION BY o.OrderID)
AS DECIMAL(5,2)
) AS PercentOfOrder
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
INNER JOIN Warehouse.StockItems si ON ol.StockItemID = si.StockItemID
WHERE o.OrderID IN (SELECT TOP 5 OrderID FROM Sales.Orders ORDER BY OrderDate DESC)
ORDER BY o.OrderID, LineTotal DESC;
This query shows each line item with its percentage contribution to the order total—invaluable for understanding order composition and identifying high-value items.
Moving Averages: Smoothing Trends and Detecting Patterns
Moving averages smooth out short-term fluctuations and highlight longer-term trends. They're essential for time series analysis, forecasting, and anomaly detection.
Simple Moving Average
A simple moving average calculates the average over a fixed window of preceding rows:
-- Calculate 7-day moving average of daily orders
WITH DailyOrders AS (
SELECT
CAST(o.OrderDate AS DATE) AS OrderDate,
COUNT(DISTINCT o.OrderID) AS DailyOrderCount,
SUM(ol.Quantity * ol.UnitPrice) AS DailyRevenue
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE o.OrderDate >= '2016-01-01' AND o.OrderDate < '2016-12-31'
GROUP BY CAST(o.OrderDate AS DATE)
)
SELECT
OrderDate,
DailyOrderCount,
DailyRevenue,
AVG(DailyOrderCount) OVER (
ORDER BY OrderDate
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS MovingAvg7Day,
AVG(DailyRevenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS RevenueMovingAvg7Day
FROM DailyOrders
ORDER BY OrderDate;
The frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a 7-day window (6 previous days plus current day). This is perfect for detecting trends while filtering out daily noise.
Multiple Time Windows for Comparison
You can calculate multiple moving averages with different windows to compare short-term and long-term trends:
-- Compare 7-day and 30-day moving averages
WITH DailyMetrics AS (
SELECT
CAST(o.OrderDate AS DATE) AS OrderDate,
COUNT(DISTINCT o.OrderID) AS OrderCount,
SUM(ol.Quantity * ol.UnitPrice) AS Revenue
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE o.OrderDate >= '2016-01-01'
GROUP BY CAST(o.OrderDate AS DATE)
)
SELECT
OrderDate,
Revenue,
AVG(Revenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS MA_7Day,
AVG(Revenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS MA_30Day,
-- Signal: When 7-day crosses above 30-day, might indicate upward trend
CASE
WHEN AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) >
AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
THEN 'Uptrend'
ELSE 'Downtrend'
END AS TrendSignal
FROM DailyMetrics
ORDER BY OrderDate;
When the short-term average crosses above the long-term average, it often signals a positive trend—a technique borrowed from technical analysis in finance.
Understanding Frame Specifications: ROWS vs RANGE
Frame specifications define exactly which rows to include in your window calculation. This is where window functions become truly powerful and flexible.
ROWS: Physical Row Counting
ROWS counts physical rows regardless of their values:
-- Example: Include exactly 3 preceding rows
SELECT
OrderDate,
DailyRevenue,
AVG(DailyRevenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
) AS FourDayAverage
FROM DailyRevenue;
This always includes exactly 4 rows (3 preceding + current) in the average calculation.
RANGE: Logical Value-Based Windows
RANGE includes all rows with values within a specified range:
-- Example: Include all orders within 7 days before current row
SELECT
OrderDate,
DailyRevenue,
SUM(DailyRevenue) OVER (
ORDER BY OrderDate
RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
) AS RollingWeekRevenue
FROM DailyRevenue;
RANGE is useful when you have gaps in your data (like weekends) and want to include all rows within a time period rather than a fixed row count.
Important: For most scenarios with sequential data, ROWS is more predictable and faster than RANGE.
Common Frame Specifications
-- All rows from partition start to current row (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Fixed window: 7 rows (6 before + current)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
-- Centered window: 3 before, current, 3 after
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
-- All rows in partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-- Current row only (default with no frame specification)
ROWS BETWEEN CURRENT ROW AND CURRENT ROW
Advanced Window Functions: NTILE, FIRST_VALUE, and LAST_VALUE
NTILE: Creating Buckets and Quartiles
NTILE divides your result set into a specified number of roughly equal groups. It's perfect for creating percentile groups or categorizing data into buckets.
-- Divide customers into 4 quartiles based on total purchase amount
WITH CustomerSpending AS (
SELECT
c.CustomerID,
c.CustomerName,
SUM(ol.Quantity * ol.UnitPrice) AS TotalSpent,
COUNT(DISTINCT o.OrderID) AS OrderCount
FROM Sales.Customers c
INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
GROUP BY c.CustomerID, c.CustomerName
)
SELECT
CustomerName,
TotalSpent,
OrderCount,
NTILE(4) OVER (ORDER BY TotalSpent DESC) AS SpendingQuartile,
CASE NTILE(4) OVER (ORDER BY TotalSpent DESC)
WHEN 1 THEN 'Premium'
WHEN 2 THEN 'Gold'
WHEN 3 THEN 'Silver'
WHEN 4 THEN 'Bronze'
END AS CustomerTier
FROM CustomerSpending
ORDER BY TotalSpent DESC;
NTILE(4) divides customers into 4 groups. The top 25% get quartile 1, next 25% get quartile 2, etc. This is perfect for customer segmentation, RFM analysis, or ABC classification.
Real-World Application: I've used NTILE extensively in customer analytics to identify VIP customers (top 10%), at-risk customers (bottom 20%), and to create balanced test/control groups for A/B testing. It's also invaluable for salary banding and performance tier assignments.
Creating Deciles and Percentiles
-- Create 10 deciles (10% buckets) for stock items by price
SELECT
StockItemName,
UnitPrice,
NTILE(10) OVER (ORDER BY UnitPrice) AS PriceDecile,
NTILE(100) OVER (ORDER BY UnitPrice) AS PricePercentile
FROM Warehouse.StockItems
WHERE UnitPrice > 0
ORDER BY UnitPrice;
NTILE(100) creates percentiles—the top 1% gets percentile 100, bottom 1% gets percentile 1. This granular segmentation is powerful for pricing strategies and competitive analysis.
FIRST_VALUE and LAST_VALUE: Anchoring to Boundary Values
These functions retrieve values from the first or last row in the window frame.
-- Compare each month's revenue to the best and worst months
WITH MonthlySales AS (
SELECT
YEAR(o.OrderDate) AS OrderYear,
MONTH(o.OrderDate) AS OrderMonth,
SUM(ol.Quantity * ol.UnitPrice) AS MonthlyRevenue
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE YEAR(o.OrderDate) = 2016
GROUP BY YEAR(o.OrderDate), MONTH(o.OrderDate)
)
SELECT
OrderYear,
OrderMonth,
MonthlyRevenue,
FIRST_VALUE(MonthlyRevenue) OVER (
ORDER BY MonthlyRevenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS BestMonth,
LAST_VALUE(MonthlyRevenue) OVER (
ORDER BY MonthlyRevenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS WorstMonth,
MonthlyRevenue - FIRST_VALUE(MonthlyRevenue) OVER (
ORDER BY MonthlyRevenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS GapFromBest,
CAST(
MonthlyRevenue * 100.0 / FIRST_VALUE(MonthlyRevenue) OVER (
ORDER BY MonthlyRevenue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS DECIMAL(5,2)
) AS PercentOfBest
FROM MonthlySales
ORDER BY OrderMonth;
Critical Gotcha: LAST_VALUE requires ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to work as expected. Without this frame specification, LAST_VALUE only looks at rows up to the current row, not the actual last row in the partition. This is one of the most common mistakes I see developers make.
Comparing to Baseline Values
-- Compare each product's current inventory to its highest and lowest levels
SELECT
si.StockItemName,
sih.QuantityOnHand,
FIRST_VALUE(sih.QuantityOnHand) OVER (
PARTITION BY si.StockItemID
ORDER BY sih.QuantityOnHand DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS HighestInventory,
LAST_VALUE(sih.QuantityOnHand) OVER (
PARTITION BY si.StockItemID
ORDER BY sih.QuantityOnHand DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS LowestInventory
FROM Warehouse.StockItems si
CROSS APPLY (
SELECT TOP 1 QuantityOnHand
FROM Warehouse.StockItemHoldings
WHERE StockItemID = si.StockItemID
) sih
WHERE si.IsChillerStock = 0;
Real-World Use Case: Customer Lifetime Value Dashboard
Let me show you a comprehensive example that combines multiple aggregate window functions for customer analytics:
-- Comprehensive customer lifetime value analysis
WITH CustomerOrderDetails AS (
SELECT
c.CustomerID,
c.CustomerName,
o.OrderID,
o.OrderDate,
SUM(ol.Quantity * ol.UnitPrice) AS OrderValue
FROM Sales.Customers c
INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
GROUP BY c.CustomerID, c.CustomerName, o.OrderID, o.OrderDate
),
CustomerMetrics AS (
SELECT
CustomerID,
CustomerName,
OrderID,
OrderDate,
OrderValue,
-- Running total of customer spending
SUM(OrderValue) OVER (
PARTITION BY CustomerID
ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING
) AS LifetimeValue,
-- Average order value up to this point
AVG(OrderValue) OVER (
PARTITION BY CustomerID
ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING
) AS AvgOrderValue,
-- Count of orders so far
COUNT(*) OVER (
PARTITION BY CustomerID
ORDER BY OrderDate
ROWS UNBOUNDED PRECEDING
) AS OrderCount,
-- 3-order moving average
AVG(OrderValue) OVER (
PARTITION BY CustomerID
ORDER BY OrderDate
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS MovingAvg3Orders
FROM CustomerOrderDetails
)
SELECT
CustomerName,
OrderDate,
OrderValue,
OrderCount,
LifetimeValue,
AvgOrderValue,
MovingAvg3Orders,
-- Categorize customer based on lifetime value
NTILE(4) OVER (ORDER BY LifetimeValue DESC) AS ValueQuartile,
-- Compare to best performing customer
CAST(
LifetimeValue * 100.0 / FIRST_VALUE(LifetimeValue) OVER (
ORDER BY LifetimeValue DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS DECIMAL(5,2)
) AS PercentOfTopCustomer
FROM CustomerMetrics
WHERE CustomerID IN (SELECT TOP 5 CustomerID FROM Sales.Customers ORDER BY CustomerID)
ORDER BY CustomerName, OrderDate;
This query provides a complete customer analytics dashboard: lifetime value progression, average order patterns, trending analysis, customer segmentation, and competitive benchmarking—all in one result set.
Performance Optimization for Aggregate Window Functions
1. Pre-Aggregate When Possible
If you're calculating window functions on aggregated data, do the aggregation first:
-- BETTER: Pre-aggregate before window functions
WITH DailySales AS (
SELECT
CAST(OrderDate AS DATE) AS OrderDate,
SUM(OrderTotal) AS DayTotal
FROM Sales.Orders
GROUP BY CAST(OrderDate AS DATE)
)
SELECT
OrderDate,
DayTotal,
SUM(DayTotal) OVER (ORDER BY OrderDate) AS RunningTotal,
AVG(DayTotal) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MA_7Day
FROM DailySales;
This reduces the number of rows the window function processes, significantly improving performance.
2. Index Appropriately
Create indexes on partition and order columns:
-- Support window functions on Orders table
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON Sales.Orders (CustomerID, OrderDate)
INCLUDE (OrderID);
-- Support date-based aggregations
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON Sales.Orders (OrderDate)
INCLUDE (CustomerID);
3. Use ROWS Instead of RANGE
ROWS is generally faster and more predictable:
-- FASTER: Use ROWS for fixed window sizes
AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- SLOWER: RANGE can include variable numbers of rows
AVG(Revenue) OVER (ORDER BY OrderDate RANGE BETWEEN 6 PRECEDING AND CURRENT ROW)
Common Mistakes and How to Avoid Them
Mistake 1: Forgetting Frame Specification for LAST_VALUE
-- WRONG: Without frame spec, LAST_VALUE only sees current row
SELECT OrderDate, Revenue,
LAST_VALUE(Revenue) OVER (ORDER BY OrderDate) AS LastRevenue
FROM DailyRevenue;
-- RIGHT: Specify full frame
SELECT OrderDate, Revenue,
LAST_VALUE(Revenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS LastRevenue
FROM DailyRevenue;
Mistake 2: Mixing Window Functions with GROUP BY Incorrectly
-- WRONG: Window function on grouped data without proper structure
SELECT
CustomerID,
OrderDate,
SUM(OrderTotal) AS DailyTotal,
SUM(SUM(OrderTotal)) OVER (ORDER BY OrderDate) AS RunningTotal -- Nested SUM!
FROM Sales.Orders
GROUP BY CustomerID, OrderDate;
-- RIGHT: Aggregate first in CTE, then apply window function
WITH DailyTotals AS (
SELECT
CustomerID,
OrderDate,
SUM(OrderTotal) AS DailyTotal
FROM Sales.Orders
GROUP BY CustomerID, OrderDate
)
SELECT
CustomerID,
OrderDate,
DailyTotal,
SUM(DailyTotal) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM DailyTotals;
Mistake 3: Not Handling Edge Cases in Moving Averages
-- BETTER: Handle beginning rows with insufficient data
SELECT
OrderDate,
Revenue,
CASE
WHEN ROW_NUMBER() OVER (ORDER BY OrderDate) < 7 THEN NULL
ELSE AVG(Revenue) OVER (
ORDER BY OrderDate
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
END AS MA_7Day
FROM DailyRevenue;
The first 6 rows don't have enough data for a true 7-day average. You can return NULL or use partial windows depending on business requirements.
Key Takeaways: Mastering Aggregate and Advanced Window Functions
Here are the critical points to remember:
-
Running totals are simple: Use SUM with ORDER BY and ROWS UNBOUNDED PRECEDING for cumulative calculations.
-
Moving averages smooth trends: Perfect for filtering noise and detecting patterns in time series data.
-
Frame specifications matter: ROWS for physical row counts, RANGE for value-based windows. Always be explicit.
-
LAST_VALUE needs full frame: Use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING or you'll get unexpected results.
-
NTILE creates equal buckets: Perfect for customer segmentation, percentile analysis, and creating balanced groups.
-
Pre-aggregate for performance: Calculate sums/averages first in CTEs before applying window functions.
-
Index partition and order columns: This is the most impactful performance optimization.
-
Test edge cases: First and last rows in partitions, NULL values, and insufficient data for window frames.
Frequently Asked Questions
Q: What's the difference between ROWS and RANGE in frame specifications?
A: ROWS counts physical rows (e.g., "the 7 rows before this one"), while RANGE includes all rows with values within a specified range (e.g., "all rows with dates within 7 days"). ROWS is more predictable and faster for most use cases. Use RANGE when you have gaps in your data (like weekends) and need to include all rows within a time period regardless of how many there are.
Q: Can I use aggregate window functions without ORDER BY?
A: Yes, but the behavior changes significantly. Without ORDER BY, the window includes all rows in the partition, essentially giving you a partition-level aggregate on every row. With ORDER BY, you get cumulative or moving calculations. For example, SUM(Amount) OVER (PARTITION BY CustomerID) gives the total for all customer orders on every row, while SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) gives a running total.
Q: Why is my running total query so slow?
A: The most common causes are: (1) Missing indexes on PARTITION BY and ORDER BY columns, (2) Not pre-aggregating data before applying window functions, (3) Using RANGE instead of ROWS, or (4) Processing too many rows at once. Start by checking your execution plan—look for table scans, sorts, and high row counts. Add appropriate indexes, consider pre-aggregation in CTEs, and make sure you're using ROWS for frame specifications.