SQL Window Functions
Introduction
SQL Window Functions are a powerful feature that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions which collapse groups of rows into a single value, window functions maintain the separate identity of each row while still performing calculations across a specified "window" of rows.
Think of a window function as opening a window to a set of rows and performing calculations just within that window. This capability enables complex data analysis operations that would otherwise require multiple queries or join operations.
Why Use Window Functions?
Window functions solve several common analytical problems elegantly:
- Calculating running totals or cumulative sums
- Finding moving averages
- Ranking rows within groups
- Accessing values from preceding or following rows
- Comparing current row values to group statistics
Basic Syntax
The general syntax for window functions follows this pattern:
SELECT
    column1,
    column2,
    window_function() OVER (
        [PARTITION BY partition_column]
        [ORDER BY sort_column]
        [frame_clause]
    ) AS alias
FROM table_name;
Let's break down these components:
- window_function(): The function to apply (e.g.,- ROW_NUMBER(),- SUM(),- AVG())
- OVER: Indicates that this is a window function
- PARTITION BY: Divides rows into groups (windows) for the function to operate on
- ORDER BY: Defines the logical order of rows within each partition
- frame_clause: Specifies which rows within the partition to include in the window (e.g.,- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
Common Window Functions
Ranking Functions
ROW_NUMBER()
Assigns a unique sequential integer to each row within a partition.
SELECT 
    product_category,
    product_name,
    price,
    ROW_NUMBER() OVER (PARTITION BY product_category ORDER BY price DESC) AS price_rank
FROM products;
Output:
product_category   product_name   price   price_rank
--------------------------------------------------
Electronics        Laptop         1200    1
Electronics        Smartphone     800     2
Electronics        Headphones     200     3
Clothing           Winter Coat    150     1
Clothing           Jeans          80      2
Clothing           T-shirt        25      3
RANK() and DENSE_RANK()
These functions assign ranks to rows within a partition. The difference is in how they handle ties:
- RANK()leaves gaps in the ranking when there are ties
- DENSE_RANK()doesn't leave gaps
SELECT 
    student_name,
    test_score,
    RANK() OVER (ORDER BY test_score DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY test_score DESC) AS dense_rank
FROM student_scores;
Output:
student_name   test_score   rank   dense_rank
-------------------------------------------
Alice          95           1      1
Bob            95           1      1
Charlie        90           3      2
David          85           4      3
Eve            85           4      3
Frank          80           6      4
Aggregate Window Functions
You can use standard aggregate functions like SUM(), AVG(), COUNT(), MIN(), and MAX() as window functions.
Running Total Example
SELECT 
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
Output:
sale_date    amount    running_total
-----------------------------------
2023-01-01   100       100
2023-01-02   150       250
2023-01-03   200       450
2023-01-04   120       570
2023-01-05   180       750
Group Comparisons
Compare each row to aggregated group values:
SELECT 
    department,
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
    salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
Output:
department    employee_name    salary    dept_avg_salary    diff_from_avg
-------------------------------------------------------------------
Engineering   Alice            85000     80000              5000
Engineering   Bob              75000     80000              -5000
Engineering   Charlie          80000     80000              0
Marketing     David            65000     70000              -5000
Marketing     Eve              75000     70000              5000
Sales         Frank            90000     85000              5000
Sales         Grace            80000     85000              -5000
Value Functions
LAG() and LEAD()
These functions access data from previous or following rows without using a self-join.
SELECT 
    product_name,
    month,
    sales,
    LAG(sales, 1) OVER (PARTITION BY product_name ORDER BY month) AS prev_month_sales,
    sales - LAG(sales, 1) OVER (PARTITION BY product_name ORDER BY month) AS sales_growth
FROM monthly_sales;
Output:
product_name    month      sales    prev_month_sales    sales_growth
-----------------------------------------------------------------
Product A       2023-01    1000     NULL                NULL
Product A       2023-02    1200     1000                200
Product A       2023-03    950      1200                -250
Product B       2023-01    800      NULL                NULL
Product B       2023-02    820      800                 20
Product B       2023-03    880      820                 60
FIRST_VALUE() and LAST_VALUE()
Return the first or last value in an ordered set of values.
SELECT 
    department,
    employee_name,
    salary,
    FIRST_VALUE(employee_name) OVER (
        PARTITION BY department 
        ORDER BY salary DESC
    ) AS highest_paid_employee
FROM employees;
Output:
department    employee_name    salary    highest_paid_employee
---------------------------------------------------------
Engineering   Alice            85000     Alice
Engineering   Charlie          80000     Alice
Engineering   Bob              75000     Alice
Marketing     Eve              75000     Eve
Marketing     David            65000     Eve
Sales         Frank            90000     Frank
Sales         Grace            80000     Frank
Window Frame Clause
The frame clause defines exactly which rows are included in the window for each calculation. The syntax is:
ROWS BETWEEN <start> AND <end>
Where <start> and <end> can be:
- UNBOUNDED PRECEDING: All rows before the current row
- n PRECEDING: n rows before the current row
- CURRENT ROW: The current row
- n FOLLOWING: n rows after the current row
- UNBOUNDED FOLLOWING: All rows after the current row
Moving Average Example
Calculate a 3-day moving average of sales:
SELECT 
    sale_date,
    daily_sales,
    AVG(daily_sales) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_avg_3day
FROM daily_sales;
Output:
sale_date    daily_sales    moving_avg_3day
------------------------------------------
2023-01-01   100            125
2023-01-02   150            150
2023-01-03   200            190
2023-01-04   220            190
2023-01-05   150            185
Real-World Examples
Example 1: Sales Performance Analysis
Identify each month's sales performance and compare it to previous months and department averages:
SELECT 
    department,
    sales_month,
    sales_amount,
    SUM(sales_amount) OVER (
        PARTITION BY department 
        ORDER BY sales_month
    ) AS cumulative_sales,
    LAG(sales_amount) OVER (
        PARTITION BY department 
        ORDER BY sales_month
    ) AS prev_month_sales,
    sales_amount - LAG(sales_amount) OVER (
        PARTITION BY department 
        ORDER BY sales_month
    ) AS monthly_growth,
    AVG(sales_amount) OVER (
        PARTITION BY department
    ) AS dept_avg_sales,
    RANK() OVER (
        PARTITION BY department 
        ORDER BY sales_amount DESC
    ) AS sales_rank_in_dept
FROM monthly_department_sales;
Example 2: Student Performance Tracking
Analyze student test scores across different subjects and time periods:
SELECT
    student_name,
    subject,
    test_date,
    score,
    AVG(score) OVER (
        PARTITION BY student_name
    ) AS student_avg_score,
    score - AVG(score) OVER (
        PARTITION BY student_name
    ) AS diff_from_student_avg,
    AVG(score) OVER (
        PARTITION BY subject
    ) AS subject_avg_score,
    RANK() OVER (
        PARTITION BY subject 
        ORDER BY score DESC
    ) AS rank_in_subject,
    LAG(score) OVER (
        PARTITION BY student_name, subject 
        ORDER BY test_date
    ) AS previous_test_score,
    score - LAG(score) OVER (
        PARTITION BY student_name, subject 
        ORDER BY test_date
    ) AS score_improvement
FROM student_test_scores;
Example 3: Website User Activity Analysis
Analyze user engagement and identify patterns:
SELECT
    user_id,
    activity_date,
    pages_viewed,
    SUM(pages_viewed) OVER (
        PARTITION BY user_id 
        ORDER BY activity_date
    ) AS cumulative_pages,
    AVG(pages_viewed) OVER (
        PARTITION BY user_id 
        ORDER BY activity_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_avg,
    NTILE(4) OVER (
        ORDER BY SUM(pages_viewed) OVER (
            PARTITION BY user_id
        ) DESC
    ) AS user_activity_quartile
FROM user_activity;
Visual Explanation of Window Functions
Common Mistakes and How to Avoid Them
- 
Forgetting the PARTITION BYclause: Without partitioning, the window function will apply to the entire result set. Always consider if your calculation should be grouped.
- 
Confusing ROW_NUMBER(),RANK(), andDENSE_RANK(): Remember thatRANK()leaves gaps for ties,DENSE_RANK()doesn't leave gaps, andROW_NUMBER()always assigns sequential numbers regardless of ties.
- 
Misunderstanding window frames: By default, frame clauses are different depending on whether you include an ORDER BYin your window definition. Be explicit with your frame clause to avoid surprises.
- 
Overlapping calculations: If you need multiple window functions with different partitioning or ordering, you'll need to define them separately. 
Performance Considerations
Window functions can be resource-intensive, especially on large datasets. Here are some tips:
- 
Be specific with your PARTITION BYclauses: Smaller partitions mean less work per window.
- 
Limit the window size when possible: Using a frame clause like ROWS BETWEEN 5 PRECEDING AND CURRENT ROWis more efficient than processing the entire partition.
- 
Consider adding appropriate indexes: Indexes on columns used in PARTITION BYandORDER BYclauses can improve performance.
- 
Use Common Table Expressions (CTEs): Breaking complex window function queries into CTEs can make them more readable and sometimes more efficient. 
Summary
SQL Window Functions are a powerful tool for data analysis that allow you to:
- Perform calculations across related rows without grouping the results
- Create running totals, moving averages, and rankings
- Compare current rows to previous or following rows
- Analyze data within specific partitions or groups
By mastering window functions, you can write more efficient and elegant SQL queries for complex analytical tasks that would otherwise require multiple queries or complex joins.
Practice Exercises
- 
Basic Ranking: Write a query that ranks products by price within each category, using both RANK()andDENSE_RANK().
- 
Running Totals: Calculate the running total of sales by date for each salesperson. 
- 
Moving Averages: Calculate a 7-day moving average of daily website visitors. 
- 
Percentile Analysis: Use NTILE()to divide customers into quartiles based on their total purchases.
- 
Year-over-Year Comparison: Compare each month's sales to the same month from the previous year using LAG().
Additional Resources
💡 Found a typo or mistake? Click "Edit this page" to suggest a correction. Your feedback is greatly appreciated!