Home / Intermediate SQL / SQL window functions / T-SQL / SQL Window Functions Part 2: Running Totals, Moving Averages and Advanced Techniques

SQL Window Functions Part 2: Running Totals, Moving Averages and Advanced Techniques

On

Welcome back to our deep dive into SQL window functions! In Part 1, we mastered ranking and sequential functions. Now, we'll explore aggregate window functions that calculate running totals and moving averages, along with advanced functions like NTILE, FIRST_VALUE, and LAST_VALUE. These techniques are essential for financial reporting, trend analysis, and customer segmentation—and I'll show you practical examples using the WideWorldImporters database.


Quick Recap: Window Functions Basics

If you haven't read Part 1, here's what you need to know: window functions perform calculations across related rows without collapsing your result set. They use the OVER clause with three components:

  • PARTITION BY: Divides data into groups
  • ORDER BY: Defines row sequence
  • Frame Specification: Defines which rows to include (the focus of this post!)

Aggregate Window Functions: Running Totals and Cumulative Calculations

You can use familiar aggregate functions (SUM, AVG, COUNT, MIN, MAX) as window functions to create running calculations without losing row-level detail. This is a game-changer for financial reports and dashboards.

Running Totals: Cumulative Sums

Running totals are one of the most requested analytics in financial reporting and dashboards. They show cumulative values as you progress through your data.

-- Calculate cumulative revenue by customer over time
SELECT 
    c.CustomerName,
    o.OrderDate,
    SUM(ol.Quantity * ol.UnitPrice) AS OrderTotal,
    SUM(SUM(ol.Quantity * ol.UnitPrice)) OVER (
        PARTITION BY c.CustomerID 
        ORDER BY o.OrderDate
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS RunningTotal
FROM Sales.Customers c
INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
WHERE c.CustomerName LIKE 'Tailspin Toys%'
GROUP BY c.CustomerID, c.CustomerName, o.OrderDate, o.OrderID
ORDER BY c.CustomerName, o.OrderDate;

The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame specification explicitly tells SQL Server to include all rows from the start of the partition up to the current row. While this is the default behavior when you use ORDER BY, being explicit improves readability.

Performance Note: Running totals with window functions are dramatically faster than self-joins or correlated subqueries. On tables with millions of rows, the difference can be 10x-100x faster.

Year-to-Date Calculations

A common business requirement is calculating year-to-date totals that reset each year:

-- Year-to-date revenue with monthly breakdown
WITH MonthlyRevenue AS (
    SELECT 
        YEAR(o.OrderDate) AS OrderYear,
        MONTH(o.OrderDate) AS OrderMonth,
        SUM(ol.Quantity * ol.UnitPrice) AS MonthlyTotal
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2015-01-01'
    GROUP BY YEAR(o.OrderDate), MONTH(o.OrderDate)
)
SELECT 
    OrderYear,
    OrderMonth,
    MonthlyTotal,
    SUM(MonthlyTotal) OVER (
        PARTITION BY OrderYear 
        ORDER BY OrderMonth
        ROWS UNBOUNDED PRECEDING
    ) AS YearToDateTotal,
    AVG(MonthlyTotal) OVER (
        PARTITION BY OrderYear 
        ORDER BY OrderMonth
        ROWS UNBOUNDED PRECEDING
    ) AS YearToDateAverage
FROM MonthlyRevenue
ORDER BY OrderYear, OrderMonth;

By partitioning by OrderYear, the running total resets each January—exactly what year-to-date reporting requires.

Percentage of Total Calculations

Window functions make it easy to show both detail and context in a single query:

-- Show each order line with its percentage of total order value
SELECT 
    o.OrderID,
    o.OrderDate,
    si.StockItemName,
    ol.Quantity,
    ol.UnitPrice,
    ol.Quantity * ol.UnitPrice AS LineTotal,
    SUM(ol.Quantity * ol.UnitPrice) OVER (PARTITION BY o.OrderID) AS OrderTotal,
    CAST(
        (ol.Quantity * ol.UnitPrice) * 100.0 / 
        SUM(ol.Quantity * ol.UnitPrice) OVER (PARTITION BY o.OrderID)
        AS DECIMAL(5,2)
    ) AS PercentOfOrder
FROM Sales.Orders o
INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
INNER JOIN Warehouse.StockItems si ON ol.StockItemID = si.StockItemID
WHERE o.OrderID IN (SELECT TOP 5 OrderID FROM Sales.Orders ORDER BY OrderDate DESC)
ORDER BY o.OrderID, LineTotal DESC;

This query shows each line item with its percentage contribution to the order total—invaluable for understanding order composition and identifying high-value items.


Moving Averages: Smoothing Trends and Detecting Patterns

Moving averages smooth out short-term fluctuations and highlight longer-term trends. They're essential for time series analysis, forecasting, and anomaly detection.

Simple Moving Average

A simple moving average calculates the average over a fixed window of preceding rows:

-- Calculate 7-day moving average of daily orders
WITH DailyOrders AS (
    SELECT 
        CAST(o.OrderDate AS DATE) AS OrderDate,
        COUNT(DISTINCT o.OrderID) AS DailyOrderCount,
        SUM(ol.Quantity * ol.UnitPrice) AS DailyRevenue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2016-01-01' AND o.OrderDate < '2016-12-31'
    GROUP BY CAST(o.OrderDate AS DATE)
)
SELECT 
    OrderDate,
    DailyOrderCount,
    DailyRevenue,
    AVG(DailyOrderCount) OVER (
        ORDER BY OrderDate
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS MovingAvg7Day,
    AVG(DailyRevenue) OVER (
        ORDER BY OrderDate
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS RevenueMovingAvg7Day
FROM DailyOrders
ORDER BY OrderDate;

The frame ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a 7-day window (6 previous days plus current day). This is perfect for detecting trends while filtering out daily noise.

Multiple Time Windows for Comparison

You can calculate multiple moving averages with different windows to compare short-term and long-term trends:

-- Compare 7-day and 30-day moving averages
WITH DailyMetrics AS (
    SELECT 
        CAST(o.OrderDate AS DATE) AS OrderDate,
        COUNT(DISTINCT o.OrderID) AS OrderCount,
        SUM(ol.Quantity * ol.UnitPrice) AS Revenue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE o.OrderDate >= '2016-01-01'
    GROUP BY CAST(o.OrderDate AS DATE)
)
SELECT 
    OrderDate,
    Revenue,
    AVG(Revenue) OVER (
        ORDER BY OrderDate
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS MA_7Day,
    AVG(Revenue) OVER (
        ORDER BY OrderDate
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) AS MA_30Day,
    -- Signal: When 7-day crosses above 30-day, might indicate upward trend
    CASE 
        WHEN AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) > 
             AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
        THEN 'Uptrend'
        ELSE 'Downtrend'
    END AS TrendSignal
FROM DailyMetrics
ORDER BY OrderDate;

When the short-term average crosses above the long-term average, it often signals a positive trend—a technique borrowed from technical analysis in finance.


Understanding Frame Specifications: ROWS vs RANGE

Frame specifications define exactly which rows to include in your window calculation. This is where window functions become truly powerful and flexible.

ROWS: Physical Row Counting

ROWS counts physical rows regardless of their values:

-- Example: Include exactly 3 preceding rows
SELECT 
    OrderDate,
    DailyRevenue,
    AVG(DailyRevenue) OVER (
        ORDER BY OrderDate
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
    ) AS FourDayAverage
FROM DailyRevenue;

This always includes exactly 4 rows (3 preceding + current) in the average calculation.

RANGE: Logical Value-Based Windows

RANGE includes all rows with values within a specified range:

-- Example: Include all orders within 7 days before current row
SELECT 
    OrderDate,
    DailyRevenue,
    SUM(DailyRevenue) OVER (
        ORDER BY OrderDate
        RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
    ) AS RollingWeekRevenue
FROM DailyRevenue;

RANGE is useful when you have gaps in your data (like weekends) and want to include all rows within a time period rather than a fixed row count.

Important: For most scenarios with sequential data, ROWS is more predictable and faster than RANGE.

Common Frame Specifications

-- All rows from partition start to current row (running total)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- Fixed window: 7 rows (6 before + current)
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

-- Centered window: 3 before, current, 3 after
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING

-- All rows in partition
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

-- Current row only (default with no frame specification)
ROWS BETWEEN CURRENT ROW AND CURRENT ROW

Advanced Window Functions: NTILE, FIRST_VALUE, and LAST_VALUE

NTILE: Creating Buckets and Quartiles

NTILE divides your result set into a specified number of roughly equal groups. It's perfect for creating percentile groups or categorizing data into buckets.

-- Divide customers into 4 quartiles based on total purchase amount
WITH CustomerSpending AS (
    SELECT 
        c.CustomerID,
        c.CustomerName,
        SUM(ol.Quantity * ol.UnitPrice) AS TotalSpent,
        COUNT(DISTINCT o.OrderID) AS OrderCount
    FROM Sales.Customers c
    INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    GROUP BY c.CustomerID, c.CustomerName
)
SELECT 
    CustomerName,
    TotalSpent,
    OrderCount,
    NTILE(4) OVER (ORDER BY TotalSpent DESC) AS SpendingQuartile,
    CASE NTILE(4) OVER (ORDER BY TotalSpent DESC)
        WHEN 1 THEN 'Premium'
        WHEN 2 THEN 'Gold'
        WHEN 3 THEN 'Silver'
        WHEN 4 THEN 'Bronze'
    END AS CustomerTier
FROM CustomerSpending
ORDER BY TotalSpent DESC;

NTILE(4) divides customers into 4 groups. The top 25% get quartile 1, next 25% get quartile 2, etc. This is perfect for customer segmentation, RFM analysis, or ABC classification.

Real-World Application: I've used NTILE extensively in customer analytics to identify VIP customers (top 10%), at-risk customers (bottom 20%), and to create balanced test/control groups for A/B testing. It's also invaluable for salary banding and performance tier assignments.

Creating Deciles and Percentiles

-- Create 10 deciles (10% buckets) for stock items by price
SELECT 
    StockItemName,
    UnitPrice,
    NTILE(10) OVER (ORDER BY UnitPrice) AS PriceDecile,
    NTILE(100) OVER (ORDER BY UnitPrice) AS PricePercentile
FROM Warehouse.StockItems
WHERE UnitPrice > 0
ORDER BY UnitPrice;

NTILE(100) creates percentiles—the top 1% gets percentile 100, bottom 1% gets percentile 1. This granular segmentation is powerful for pricing strategies and competitive analysis.

FIRST_VALUE and LAST_VALUE: Anchoring to Boundary Values

These functions retrieve values from the first or last row in the window frame.

-- Compare each month's revenue to the best and worst months
WITH MonthlySales AS (
    SELECT 
        YEAR(o.OrderDate) AS OrderYear,
        MONTH(o.OrderDate) AS OrderMonth,
        SUM(ol.Quantity * ol.UnitPrice) AS MonthlyRevenue
    FROM Sales.Orders o
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    WHERE YEAR(o.OrderDate) = 2016
    GROUP BY YEAR(o.OrderDate), MONTH(o.OrderDate)
)
SELECT 
    OrderYear,
    OrderMonth,
    MonthlyRevenue,
    FIRST_VALUE(MonthlyRevenue) OVER (
        ORDER BY MonthlyRevenue DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS BestMonth,
    LAST_VALUE(MonthlyRevenue) OVER (
        ORDER BY MonthlyRevenue DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS WorstMonth,
    MonthlyRevenue - FIRST_VALUE(MonthlyRevenue) OVER (
        ORDER BY MonthlyRevenue DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS GapFromBest,
    CAST(
        MonthlyRevenue * 100.0 / FIRST_VALUE(MonthlyRevenue) OVER (
            ORDER BY MonthlyRevenue DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS DECIMAL(5,2)
    ) AS PercentOfBest
FROM MonthlySales
ORDER BY OrderMonth;

Critical Gotcha: LAST_VALUE requires ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to work as expected. Without this frame specification, LAST_VALUE only looks at rows up to the current row, not the actual last row in the partition. This is one of the most common mistakes I see developers make.

Comparing to Baseline Values

-- Compare each product's current inventory to its highest and lowest levels
SELECT 
    si.StockItemName,
    sih.QuantityOnHand,
    FIRST_VALUE(sih.QuantityOnHand) OVER (
        PARTITION BY si.StockItemID 
        ORDER BY sih.QuantityOnHand DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS HighestInventory,
    LAST_VALUE(sih.QuantityOnHand) OVER (
        PARTITION BY si.StockItemID 
        ORDER BY sih.QuantityOnHand DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS LowestInventory
FROM Warehouse.StockItems si
CROSS APPLY (
    SELECT TOP 1 QuantityOnHand 
    FROM Warehouse.StockItemHoldings 
    WHERE StockItemID = si.StockItemID
) sih
WHERE si.IsChillerStock = 0;

Real-World Use Case: Customer Lifetime Value Dashboard

Let me show you a comprehensive example that combines multiple aggregate window functions for customer analytics:

-- Comprehensive customer lifetime value analysis
WITH CustomerOrderDetails AS (
    SELECT 
        c.CustomerID,
        c.CustomerName,
        o.OrderID,
        o.OrderDate,
        SUM(ol.Quantity * ol.UnitPrice) AS OrderValue
    FROM Sales.Customers c
    INNER JOIN Sales.Orders o ON c.CustomerID = o.CustomerID
    INNER JOIN Sales.OrderLines ol ON o.OrderID = ol.OrderID
    GROUP BY c.CustomerID, c.CustomerName, o.OrderID, o.OrderDate
),
CustomerMetrics AS (
    SELECT 
        CustomerID,
        CustomerName,
        OrderID,
        OrderDate,
        OrderValue,
        -- Running total of customer spending
        SUM(OrderValue) OVER (
            PARTITION BY CustomerID 
            ORDER BY OrderDate 
            ROWS UNBOUNDED PRECEDING
        ) AS LifetimeValue,
        -- Average order value up to this point
        AVG(OrderValue) OVER (
            PARTITION BY CustomerID 
            ORDER BY OrderDate 
            ROWS UNBOUNDED PRECEDING
        ) AS AvgOrderValue,
        -- Count of orders so far
        COUNT(*) OVER (
            PARTITION BY CustomerID 
            ORDER BY OrderDate 
            ROWS UNBOUNDED PRECEDING
        ) AS OrderCount,
        -- 3-order moving average
        AVG(OrderValue) OVER (
            PARTITION BY CustomerID 
            ORDER BY OrderDate 
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS MovingAvg3Orders
    FROM CustomerOrderDetails
)
SELECT 
    CustomerName,
    OrderDate,
    OrderValue,
    OrderCount,
    LifetimeValue,
    AvgOrderValue,
    MovingAvg3Orders,
    -- Categorize customer based on lifetime value
    NTILE(4) OVER (ORDER BY LifetimeValue DESC) AS ValueQuartile,
    -- Compare to best performing customer
    CAST(
        LifetimeValue * 100.0 / FIRST_VALUE(LifetimeValue) OVER (
            ORDER BY LifetimeValue DESC
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS DECIMAL(5,2)
    ) AS PercentOfTopCustomer
FROM CustomerMetrics
WHERE CustomerID IN (SELECT TOP 5 CustomerID FROM Sales.Customers ORDER BY CustomerID)
ORDER BY CustomerName, OrderDate;

This query provides a complete customer analytics dashboard: lifetime value progression, average order patterns, trending analysis, customer segmentation, and competitive benchmarking—all in one result set.


Performance Optimization for Aggregate Window Functions

1. Pre-Aggregate When Possible

If you're calculating window functions on aggregated data, do the aggregation first:

-- BETTER: Pre-aggregate before window functions
WITH DailySales AS (
    SELECT 
        CAST(OrderDate AS DATE) AS OrderDate,
        SUM(OrderTotal) AS DayTotal
    FROM Sales.Orders
    GROUP BY CAST(OrderDate AS DATE)
)
SELECT 
    OrderDate,
    DayTotal,
    SUM(DayTotal) OVER (ORDER BY OrderDate) AS RunningTotal,
    AVG(DayTotal) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MA_7Day
FROM DailySales;

This reduces the number of rows the window function processes, significantly improving performance.

2. Index Appropriately

Create indexes on partition and order columns:

-- Support window functions on Orders table
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate 
ON Sales.Orders (CustomerID, OrderDate)
INCLUDE (OrderID);

-- Support date-based aggregations
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate 
ON Sales.Orders (OrderDate)
INCLUDE (CustomerID);

3. Use ROWS Instead of RANGE

ROWS is generally faster and more predictable:

-- FASTER: Use ROWS for fixed window sizes
AVG(Revenue) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- SLOWER: RANGE can include variable numbers of rows
AVG(Revenue) OVER (ORDER BY OrderDate RANGE BETWEEN 6 PRECEDING AND CURRENT ROW)

Common Mistakes and How to Avoid Them

Mistake 1: Forgetting Frame Specification for LAST_VALUE

-- WRONG: Without frame spec, LAST_VALUE only sees current row
SELECT OrderDate, Revenue,
       LAST_VALUE(Revenue) OVER (ORDER BY OrderDate) AS LastRevenue
FROM DailyRevenue;

-- RIGHT: Specify full frame
SELECT OrderDate, Revenue,
       LAST_VALUE(Revenue) OVER (
           ORDER BY OrderDate 
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS LastRevenue
FROM DailyRevenue;

Mistake 2: Mixing Window Functions with GROUP BY Incorrectly

-- WRONG: Window function on grouped data without proper structure
SELECT 
    CustomerID,
    OrderDate,
    SUM(OrderTotal) AS DailyTotal,
    SUM(SUM(OrderTotal)) OVER (ORDER BY OrderDate) AS RunningTotal  -- Nested SUM!
FROM Sales.Orders
GROUP BY CustomerID, OrderDate;

-- RIGHT: Aggregate first in CTE, then apply window function
WITH DailyTotals AS (
    SELECT 
        CustomerID,
        OrderDate,
        SUM(OrderTotal) AS DailyTotal
    FROM Sales.Orders
    GROUP BY CustomerID, OrderDate
)
SELECT 
    CustomerID,
    OrderDate,
    DailyTotal,
    SUM(DailyTotal) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RunningTotal
FROM DailyTotals;

Mistake 3: Not Handling Edge Cases in Moving Averages

-- BETTER: Handle beginning rows with insufficient data
SELECT 
    OrderDate,
    Revenue,
    CASE 
        WHEN ROW_NUMBER() OVER (ORDER BY OrderDate) < 7 THEN NULL
        ELSE AVG(Revenue) OVER (
            ORDER BY OrderDate 
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        )
    END AS MA_7Day
FROM DailyRevenue;

The first 6 rows don't have enough data for a true 7-day average. You can return NULL or use partial windows depending on business requirements.


Key Takeaways: Mastering Aggregate and Advanced Window Functions

Here are the critical points to remember:

  1. Running totals are simple: Use SUM with ORDER BY and ROWS UNBOUNDED PRECEDING for cumulative calculations.

  2. Moving averages smooth trends: Perfect for filtering noise and detecting patterns in time series data.

  3. Frame specifications matter: ROWS for physical row counts, RANGE for value-based windows. Always be explicit.

  4. LAST_VALUE needs full frame: Use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING or you'll get unexpected results.

  5. NTILE creates equal buckets: Perfect for customer segmentation, percentile analysis, and creating balanced groups.

  6. Pre-aggregate for performance: Calculate sums/averages first in CTEs before applying window functions.

  7. Index partition and order columns: This is the most impactful performance optimization.

  8. Test edge cases: First and last rows in partitions, NULL values, and insufficient data for window frames.


Frequently Asked Questions

Q: What's the difference between ROWS and RANGE in frame specifications?

A: ROWS counts physical rows (e.g., "the 7 rows before this one"), while RANGE includes all rows with values within a specified range (e.g., "all rows with dates within 7 days"). ROWS is more predictable and faster for most use cases. Use RANGE when you have gaps in your data (like weekends) and need to include all rows within a time period regardless of how many there are.

Q: Can I use aggregate window functions without ORDER BY?

A: Yes, but the behavior changes significantly. Without ORDER BY, the window includes all rows in the partition, essentially giving you a partition-level aggregate on every row. With ORDER BY, you get cumulative or moving calculations. For example, SUM(Amount) OVER (PARTITION BY CustomerID) gives the total for all customer orders on every row, while SUM(Amount) OVER (PARTITION BY CustomerID ORDER BY OrderDate) gives a running total.

Q: Why is my running total query so slow?

A: The most common causes are: (1) Missing indexes on PARTITION BY and ORDER BY columns, (2) Not pre-aggregating data before applying window functions, (3) Using RANGE instead of ROWS, or (4) Processing too many rows at once. Start by checking your execution plan—look for table scans, sorts, and high row counts. Add appropriate indexes, consider pre-aggregation in CTEs, and make sure you're using ROWS for frame specifications.