SQL Recursive Queries
Introduction
Have you ever needed to query hierarchical data like an organizational chart, a file system, or a bill of materials? Or perhaps you've needed to generate a sequence of dates or numbers? These problems can be challenging with basic SQL queries, but become much simpler with recursive queries.
Recursive queries in SQL allow you to reference a query within itself, enabling you to traverse hierarchical data structures or generate sequences through iteration. They're implemented using a special form of Common Table Expression (CTE) called a recursive CTE.
In this tutorial, we'll explore how recursive queries work, their syntax, and practical applications that will help you solve complex data problems with elegance and efficiency.
Prerequisites
Before diving into recursive queries, you should be familiar with:
- Basic SQL queries (SELECT, FROM, WHERE)
- JOINs in SQL
- Regular CTEs (Common Table Expressions)
Understanding Recursive CTEs
A recursive CTE is a common table expression that references itself. It consists of two parts:
- Anchor member: The initial query that returns the base result set (no recursion)
- Recursive member: The query that references the CTE itself and is executed repeatedly
Here's the general syntax:
WITH RECURSIVE cte_name AS (
    -- Anchor member
    SELECT initial_columns
    FROM initial_tables
    WHERE initial_conditions
    
    UNION ALL
    
    -- Recursive member
    SELECT recursive_columns
    FROM recursive_tables
    JOIN cte_name ON join_condition
    WHERE recursive_conditions
)
SELECT columns FROM cte_name;
Note: Some database systems like PostgreSQL and SQLite use the
WITH RECURSIVEsyntax, while others like SQL Server and MySQL allowWITHwithout explicitly specifyingRECURSIVE.
How Recursive CTEs Work
The execution of a recursive CTE follows these steps:
- Execute the anchor member to create the initial result set (iteration 0)
- Execute the recursive member, substituting the previous iteration's results
- Add the results to the result set
- Repeat steps 2-3 until the recursive member returns no rows
- Return the complete result set
Let's visualize this process:
Basic Example: Generating a Sequence
Let's start with a simple example - generating a sequence of numbers from 1 to 10:
WITH RECURSIVE number_sequence AS (
    -- Anchor member: Start with 1
    SELECT 1 AS n
    
    UNION ALL
    
    -- Recursive member: Add 1 to previous value until we reach 10
    SELECT n + 1
    FROM number_sequence
    WHERE n < 10
)
SELECT n FROM number_sequence;
Output:
 n
---
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
Let's break down how this works:
- The anchor member returns a single row with the value 1
- The recursive member takes the previous result (1), adds 1, and returns 2
- This process continues, adding 1 each time, until n reaches 10
- When n = 10, the WHERE condition (n < 10) fails, and recursion stops
Working with Hierarchical Data
One of the most common applications of recursive queries is traversing hierarchical data structures. Let's consider an employee organizational chart:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT,
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
);
INSERT INTO employees VALUES
(1, 'John Doe', NULL),       -- CEO
(2, 'Jane Smith', 1),        -- CTO, reports to CEO
(3, 'Bob Johnson', 1),       -- CFO, reports to CEO
(4, 'Alice Williams', 2),    -- Engineering Manager, reports to CTO
(5, 'Charlie Brown', 2),     -- Product Manager, reports to CTO
(6, 'Diana Miller', 4),      -- Senior Developer, reports to Engineering Manager
(7, 'Edward Davis', 4),      -- Developer, reports to Engineering Manager
(8, 'Fiona Clark', 5);       -- Designer, reports to Product Manager
Now, let's use a recursive CTE to display the entire organizational hierarchy with levels:
WITH RECURSIVE org_chart AS (
    -- Anchor member: Start with the CEO (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        0 AS level,
        CAST(name AS VARCHAR(1000)) AS path
    FROM 
        employees
    WHERE 
        manager_id IS NULL
    
    UNION ALL
    
    -- Recursive member: Join with employees who have managers in our current result
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        oc.level + 1,
        CAST(oc.path || ' > ' || e.name AS VARCHAR(1000))
    FROM 
        employees e
    JOIN 
        org_chart oc ON e.manager_id = oc.employee_id
)
SELECT 
    level,
    REPEAT('  ', level) || name AS employee,
    path
FROM 
    org_chart
ORDER BY 
    path;
Output:
 level | employee          | path
-------+-------------------+-----------------------------
 0     | John Doe          | John Doe
 1     |   Jane Smith      | John Doe > Jane Smith
 2     |     Alice Williams| John Doe > Jane Smith > Alice Williams
 3     |       Diana Miller| John Doe > Jane Smith > Alice Williams > Diana Miller
 3     |       Edward Davis| John Doe > Jane Smith > Alice Williams > Edward Davis
 2     |     Charlie Brown | John Doe > Jane Smith > Charlie Brown
 3     |       Fiona Clark | John Doe > Jane Smith > Charlie Brown > Fiona Clark
 1     |   Bob Johnson     | John Doe > Bob Johnson
In this example:
- We start with the CEO (the employee with no manager)
- We recursively find all employees who report to someone in our current result set
- We track the level of hierarchy and build a path string to show the reporting structure
Finding Paths in Graph Data
Recursive CTEs are also excellent for solving graph traversal problems. Let's look at a flight connections example:
CREATE TABLE flights (
    departure_city VARCHAR(50),
    arrival_city VARCHAR(50),
    duration_hours NUMERIC(4,2),
    PRIMARY KEY (departure_city, arrival_city)
);
INSERT INTO flights VALUES
('New York', 'London', 7.5),
('New York', 'Paris', 8.25),
('London', 'Berlin', 2.5),
('London', 'Rome', 3.0),
('Paris', 'Rome', 2.5),
('Paris', 'Madrid', 2.0),
('Rome', 'Athens', 2.5),
('Madrid', 'Lisbon', 1.5);
Now, let's find all possible routes from New York to Athens with up to 3 connections:
WITH RECURSIVE routes AS (
    -- Anchor member: Direct flights from New York
    SELECT 
        departure_city,
        arrival_city,
        1 AS connections,
        ARRAY[departure_city, arrival_city] AS route,
        duration_hours
    FROM 
        flights
    WHERE 
        departure_city = 'New York'
    
    UNION ALL
    
    -- Recursive member: Add connecting flights
    SELECT 
        r.departure_city,
        f.arrival_city,
        r.connections + 1,
        r.route || f.arrival_city,
        r.duration_hours + f.duration_hours
    FROM 
        routes r
    JOIN 
        flights f ON r.arrival_city = f.departure_city
    WHERE 
        r.connections < 3
        AND f.arrival_city <> ALL(r.route) -- Avoid cycles
)
SELECT 
    route,
    connections,
    duration_hours AS total_hours
FROM 
    routes
WHERE 
    arrival_city = 'Athens'
ORDER BY 
    total_hours;
Output:
 route                              | connections | total_hours
-----------------------------------+-------------+-------------
 {New York,Paris,Rome,Athens}      | 3           | 13.25
 {New York,London,Rome,Athens}     | 3           | 13.00
This query:
- Starts with all direct flights from New York
- Recursively adds connecting flights, limiting to 3 connections
- Builds a route array and calculates total duration
- Filters for routes ending in Athens
- Avoids cycles by ensuring we don't visit the same city twice
Preventing Infinite Recursion
It's important to ensure your recursive queries terminate. Without proper termination conditions, you could end up with infinite recursion, which database systems typically prevent with maximum recursion depth settings.
Here are strategies to prevent infinite recursion:
- Use a limiting condition: Include a WHERE clause in your recursive member that will eventually return no rows.
- Track and limit depth: Include a counter field and stop when it reaches a maximum value.
- Detect cycles: For graph traversals, keep track of visited nodes to avoid cycles.
Most database systems also have built-in safeguards:
- PostgreSQL: Has a max_recursive_iterationssetting (default: 100)
- SQL Server: Limits recursive CTEs to 100 levels by default, but can be specified with OPTION (MAXRECURSION n)
- MySQL: Has a cte_max_recursion_depthsystem variable (default: 1000)
Performance Considerations
Recursive queries can be powerful but may impact performance with large data sets. Consider these tips:
- Add termination conditions: Ensure your recursive member has conditions that will stop the recursion.
- Use appropriate indexes: Index the columns used in join conditions.
- Limit recursion depth: Use a level or depth counter to limit how deep the recursion goes.
- Filter early: Apply filters in both the anchor and recursive members when possible.
Real-World Applications
Recursive CTEs are valuable in many scenarios:
1. Organization Charts
As we saw earlier, recursive CTEs can display reporting hierarchies of any depth.
2. Bill of Materials (BOM)
Calculate the components needed for manufacturing:
WITH RECURSIVE bom_explosion AS (
    -- Anchor: Top-level product
    SELECT 
        parent_id,
        component_id,
        quantity,
        1 AS level
    FROM 
        product_components
    WHERE 
        parent_id = 100 -- Product ID
    
    UNION ALL
    
    -- Recursive: Sub-components
    SELECT 
        pc.parent_id,
        pc.component_id,
        be.quantity * pc.quantity,
        be.level + 1
    FROM 
        bom_explosion be
    JOIN 
        product_components pc ON be.component_id = pc.parent_id
)
SELECT 
    component_id,
    SUM(quantity) AS total_quantity
FROM 
    bom_explosion
GROUP BY 
    component_id;
3. File System Navigation
Traverse directory structures of any depth:
WITH RECURSIVE directory_contents AS (
    -- Anchor: Start with root directory
    SELECT 
        id,
        name,
        parent_id,
        CAST(name AS VARCHAR(1000)) AS path,
        0 AS depth
    FROM 
        files
    WHERE 
        parent_id IS NULL
    
    UNION ALL
    
    -- Recursive: Find all children
    SELECT 
        f.id,
        f.name,
        f.parent_id,
        CAST(dc.path || '/' || f.name AS VARCHAR(1000)),
        dc.depth + 1
    FROM 
        files f
    JOIN 
        directory_contents dc ON f.parent_id = dc.id
)
SELECT * FROM directory_contents ORDER BY path;
4. Calendar Generation
Generate a series of dates for reporting:
WITH RECURSIVE date_series AS (
    -- Anchor: Start date
    SELECT 
        CAST('2023-01-01' AS DATE) AS date
    
    UNION ALL
    
    -- Recursive: Add one day until end date
    SELECT 
        date + INTERVAL '1 day'
    FROM 
        date_series
    WHERE 
        date < '2023-12-31'
)
SELECT date FROM date_series;
Summary
Recursive queries in SQL provide an elegant solution for working with hierarchical data, traversing graphs, and generating sequences. They consist of an anchor member and a recursive member, combined with a UNION ALL operator.
Key takeaways:
- Recursive CTEs use self-reference to iterate through data
- They're perfect for hierarchical data like org charts and file systems
- They can solve graph traversal problems like finding routes
- Always include termination conditions to prevent infinite recursion
- Consider performance impacts with large datasets
By mastering recursive queries, you've added a powerful tool to your SQL toolbox that can solve problems that would otherwise require multiple queries or application-level processing.
Exercises
- Write a recursive query to generate Fibonacci numbers up to 100.
- Create a recursive query to find all employees under a specific manager in the employees table.
- Implement a recursive query to find all possible paths between two cities in the flights table, with no more than 4 connections.
- Write a recursive query to calculate the factorial of numbers 1 through 10.
Additional Resources
💡 Found a typo or mistake? Click "Edit this page" to suggest a correction. Your feedback is greatly appreciated!