SQL Window Function | How to write SQL Query using Frame Clause, CUME_DIST | SQL Queries Tutorial

Published on Oct 02, 2024

Introduction

This tutorial covers SQL window functions, also known as analytic functions, which perform calculations across a set of rows related to the current row. You will learn how to use FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE, CUME_DIST, and PERCENT_RANK. The tutorial also covers the frame clause and how to use the OVER clause with PARTITION BY and ORDER BY.

Step 1: Understanding Window Functions

  • Window functions compute a value for each row from a related set of rows (the window) without collapsing those rows into a single output row, as GROUP BY would.
  • Common window functions include:
    • FIRST_VALUE: Returns the value from the first row of the window frame.
    • LAST_VALUE: Returns the value from the last row of the window frame.
    • NTH_VALUE: Returns the value from the N-th row of the window frame.
    • NTILE: Divides the ordered rows of a partition into a specified number of buckets.
    • CUME_DIST: Returns the fraction of rows in the partition that sort at or before the current row.
    • PERCENT_RANK: Returns the relative rank of a row within its partition, scaled from 0 to 1.
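
To make the "without collapsing" point concrete, here is a minimal sketch that runs the same aggregate two ways through Python's built-in sqlite3 module. The employees rows are made-up sample data, and the sketch assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, used only for illustration.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Sales', 100), (2, 'Sales', 300), (3, 'IT', 200);
""")

# GROUP BY collapses each department into a single row.
grouped = conn.execute("""
    SELECT department, SUM(salary)
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()

# The window version keeps every row and attaches the department total to it.
windowed = conn.execute("""
    SELECT employee_id, salary,
           SUM(salary) OVER (PARTITION BY department) AS dept_total
    FROM employees
    ORDER BY employee_id
""").fetchall()

print(grouped)   # [('IT', 200), ('Sales', 400)]
print(windowed)  # [(1, 100, 400), (2, 300, 400), (3, 200, 200)]
```

The grouped query returns two rows, while the windowed query returns all three employees, each carrying its department total.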

Step 2: Using FIRST_VALUE

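  • The FIRST_VALUE function returns the value from the first row of the window frame. Because the default frame starts at the first row of the partition, this is the first row of the partition in ORDER BY order.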
  • Syntax:
    FIRST_VALUE(column_name) OVER (PARTITION BY column_name ORDER BY column_name)
    
  • Example:
    SELECT FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY hire_date) AS first_salary
    FROM employees;
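
The query above can be tried end to end with a small sketch. The four employees rows below are invented for illustration, and the code assumes Python's bundled SQLite is 3.25 or newer. The employee_id column is added to the SELECT list only so the output is easier to read.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, used only for illustration.
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, department TEXT, hire_date TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Sales', '2020-01-15', 100),
        (2, 'Sales', '2021-06-01', 300),
        (3, 'IT',    '2019-03-10', 200),
        (4, 'IT',    '2022-09-30', 150);
""")

# Each row gets the salary of the earliest hire in its own department.
first_salaries = conn.execute("""
    SELECT employee_id,
           FIRST_VALUE(salary) OVER (PARTITION BY department
                                     ORDER BY hire_date) AS first_salary
    FROM employees
    ORDER BY employee_id
""").fetchall()

print(first_salaries)  # [(1, 100), (2, 100), (3, 200), (4, 200)]
```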
    

Step 3: Using LAST_VALUE

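  • The LAST_VALUE function returns the value from the last row of the window frame. When ORDER BY is present, the default frame ends at the current row (and its peers), so an explicit frame that extends to UNBOUNDED FOLLOWING is needed to get the last value of the whole partition.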
  • Syntax:
    LAST_VALUE(column_name) OVER (PARTITION BY column_name ORDER BY column_name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    
  • Example:
    SELECT LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY hire_date
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_salary
    FROM employees;
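
The frame in the syntax line matters more for LAST_VALUE than for most functions. The sketch below compares LAST_VALUE with and without an explicit frame on invented sample rows, assuming Python's bundled SQLite is 3.25 or newer. The column aliases default_frame and full_frame are names chosen for this illustration.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, used only for illustration.
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, department TEXT, hire_date TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Sales', '2020-01-15', 100),
        (2, 'Sales', '2021-06-01', 300),
        (3, 'IT',    '2019-03-10', 200),
        (4, 'IT',    '2022-09-30', 150);
""")

last_salaries = conn.execute("""
    SELECT employee_id,
           -- Default frame stops at the current row, so this echoes each row's own salary.
           LAST_VALUE(salary) OVER (PARTITION BY department
                                    ORDER BY hire_date) AS default_frame,
           -- Explicit frame spans the whole partition, so this is the latest hire's salary.
           LAST_VALUE(salary) OVER (PARTITION BY department
                                    ORDER BY hire_date
                                    ROWS BETWEEN UNBOUNDED PRECEDING
                                             AND UNBOUNDED FOLLOWING) AS full_frame
    FROM employees
    ORDER BY employee_id
""").fetchall()

print(last_salaries)
# [(1, 100, 300), (2, 300, 300), (3, 200, 150), (4, 150, 150)]
```

Only the full_frame column gives the same answer for every row in a department.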
    

Step 4: Understanding the FRAME Clause

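  • The frame clause restricts a window function to a subset of the partition, defined relative to the current row.
  • Key concepts:
    • ROWS: Bounds the frame by a count of physical rows before or after the current row.
    • RANGE: Bounds the frame by the ORDER BY value, so rows that tie with the current row (its peers) are always included together.
    • Default: If ORDER BY is present and no frame is given, the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.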
  • Example:
    SELECT AVG(salary) OVER (ORDER BY hire_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_salary
    FROM employees;
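
The difference between ROWS and RANGE only shows up when the ORDER BY column contains ties. The following sketch computes a running sum both ways over invented salaries with one duplicate, assuming Python's bundled SQLite is 3.25 or newer. The aliases rows_sum and range_sum are names chosen for this illustration.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two employees share the salary 200.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 100), (2, 200), (3, 200), (4, 300);
""")

running_sums = conn.execute("""
    SELECT salary,
           -- ROWS counts physical rows, so tied rows get different running sums.
           SUM(salary) OVER (ORDER BY salary
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           -- RANGE includes all peers of the current row, so tied rows share one sum.
           SUM(salary) OVER (ORDER BY salary
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM employees
    ORDER BY salary, rows_sum
""").fetchall()

print(running_sums)
# [(100, 100, 100), (200, 300, 500), (200, 500, 500), (300, 800, 800)]
```

The two rows with salary 200 differ under ROWS (300, then 500) but share the value 500 under RANGE.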
    

Step 5: Alternate Way of Writing Queries

  • The WINDOW clause lets you name a window specification once and reuse it in several OVER clauses. Support for this clause varies by database, so check your engine's documentation.
  • Example:
    SELECT employee_id, salary,
           SUM(salary) OVER w AS running_total_salary
    FROM employees
    WINDOW w AS (PARTITION BY department ORDER BY hire_date);
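
The benefit of a named window is clearer when more than one function shares it. This sketch reuses one window for a running sum and a row number, on invented sample rows and assuming Python's bundled SQLite is 3.25 or newer. The aliases running_total and hire_order are names chosen for this illustration.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, used only for illustration.
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, department TEXT, hire_date TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Sales', '2020-01-15', 100),
        (2, 'Sales', '2021-06-01', 300),
        (3, 'IT',    '2019-03-10', 200),
        (4, 'IT',    '2022-09-30', 150);
""")

# Both functions share the window named w, defined once in the WINDOW clause.
# Because w has an ORDER BY, SUM produces a running total, not a department total.
shared_window = conn.execute("""
    SELECT employee_id,
           SUM(salary)  OVER w AS running_total,
           ROW_NUMBER() OVER w AS hire_order
    FROM employees
    WINDOW w AS (PARTITION BY department ORDER BY hire_date)
    ORDER BY employee_id
""").fetchall()

print(shared_window)  # [(1, 100, 1), (2, 400, 2), (3, 200, 1), (4, 350, 2)]
```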
    

Step 6: Using NTH_VALUE

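  • The NTH_VALUE function returns the value from the N-th row of the window frame, or NULL if the frame has fewer than N rows. With the default frame, which ends at the current row, the first rows of a partition return NULL unless the frame is widened.
  • Syntax:
    NTH_VALUE(column_name, N) OVER (PARTITION BY column_name ORDER BY column_name ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    
  • Example:
    SELECT NTH_VALUE(salary, 2) OVER (PARTITION BY department ORDER BY hire_date
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_salary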
    FROM employees;
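
As a rough check of how the frame affects NTH_VALUE, this sketch asks for the second salary per department with and without an explicit frame. The rows are invented sample data, the aliases default_frame and full_frame are names chosen for this illustration, and the code assumes Python's bundled SQLite is 3.25 or newer.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data, used only for illustration.
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER, department TEXT, hire_date TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Sales', '2020-01-15', 100),
        (2, 'Sales', '2021-06-01', 300),
        (3, 'IT',    '2019-03-10', 200),
        (4, 'IT',    '2022-09-30', 150);
""")

second_salaries = conn.execute("""
    SELECT employee_id,
           -- Default frame: the first hire's frame holds one row, so the result is NULL.
           NTH_VALUE(salary, 2) OVER (PARTITION BY department
                                      ORDER BY hire_date) AS default_frame,
           -- Whole-partition frame: every row sees the second hire's salary.
           NTH_VALUE(salary, 2) OVER (PARTITION BY department
                                      ORDER BY hire_date
                                      ROWS BETWEEN UNBOUNDED PRECEDING
                                               AND UNBOUNDED FOLLOWING) AS full_frame
    FROM employees
    ORDER BY employee_id
""").fetchall()

print(second_salaries)
# [(1, None, 300), (2, 300, 300), (3, None, 150), (4, 150, 150)]
```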
    

Step 7: Using NTILE

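  • The NTILE function divides the ordered rows into a specified number of buckets that are as equal in size as possible and assigns a bucket number, starting at 1, to each row. When the rows do not divide evenly, the earlier buckets receive one extra row each.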
  • Syntax:
    NTILE(number_of_buckets) OVER (ORDER BY column_name)
    
  • Example:
    SELECT employee_id, salary, 
           NTILE(4) OVER (ORDER BY salary) AS salary_bucket
    FROM employees;
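
To see how NTILE handles a row count that does not divide evenly, this sketch splits six invented salaries into four buckets. It assumes Python's bundled SQLite is 3.25 or newer.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: six employees with distinct salaries.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 100), (2, 200), (3, 300), (4, 400), (5, 500), (6, 600);
""")

# Six rows into four buckets: sizes are 2, 2, 1, 1 (larger buckets come first).
buckets = conn.execute("""
    SELECT salary,
           NTILE(4) OVER (ORDER BY salary) AS salary_bucket
    FROM employees
    ORDER BY salary
""").fetchall()

print(buckets)
# [(100, 1), (200, 1), (300, 2), (400, 2), (500, 3), (600, 4)]
```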
    

Step 8: Using CUME_DIST

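  • The CUME_DIST function returns the fraction of rows in the partition whose ORDER BY value is less than or equal to the current row's value. Results are greater than 0 and at most 1, and tied rows receive the same value.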
  • Syntax:
    CUME_DIST() OVER (PARTITION BY column_name ORDER BY column_name)
    
  • Example:
    SELECT employee_id, salary, 
           CUME_DIST() OVER (ORDER BY salary) AS salary_distribution
    FROM employees;
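
A small worked case helps with reading CUME_DIST values. The sketch below uses four invented salaries, one of them duplicated, and assumes Python's bundled SQLite is 3.25 or newer.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two employees share the salary 200.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 100), (2, 200), (3, 200), (4, 300);
""")

# For each row: (rows with salary <= this salary) / (total rows).
distribution = conn.execute("""
    SELECT salary,
           CUME_DIST() OVER (ORDER BY salary) AS salary_distribution
    FROM employees
    ORDER BY salary
""").fetchall()

print(distribution)
# [(100, 0.25), (200, 0.75), (200, 0.75), (300, 1.0)]
```

Three of the four rows have a salary of 200 or less, so both tied rows get 3/4 = 0.75.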
    

Step 9: Using PERCENT_RANK

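  • The PERCENT_RANK function returns (rank - 1) / (rows in partition - 1), where rank is the row's RANK() value. The first row always scores 0, tied rows share a value, and results fall between 0 and 1 inclusive.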
  • Syntax:
    PERCENT_RANK() OVER (PARTITION BY column_name ORDER BY column_name)
    
  • Example:
    SELECT employee_id, salary, 
           PERCENT_RANK() OVER (ORDER BY salary) AS salary_percent_rank
    FROM employees;
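
The PERCENT_RANK arithmetic can be checked by hand against a small data set. This sketch uses five invented salaries with one tie, selects RANK() alongside for comparison, and assumes Python's bundled SQLite is 3.25 or newer. The alias salary_rank is a name chosen for this illustration.

```python
import sqlite3  # assumes the bundled SQLite is 3.25+ (window function support)

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two employees share the salary 200.
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 100), (2, 200), (3, 200), (4, 300), (5, 400);
""")

# With 5 rows, PERCENT_RANK is (rank - 1) / 4.
percent_ranks = conn.execute("""
    SELECT salary,
           RANK()         OVER (ORDER BY salary) AS salary_rank,
           PERCENT_RANK() OVER (ORDER BY salary) AS salary_percent_rank
    FROM employees
    ORDER BY salary
""").fetchall()

print(percent_ranks)
# [(100, 1, 0.0), (200, 2, 0.25), (200, 2, 0.25), (300, 4, 0.75), (400, 5, 1.0)]
```

The tie at 200 shares rank 2, so the next salary jumps to rank 4 and scores (4 - 1) / 4 = 0.75.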
    

Conclusion

In this tutorial, you learned how to use the SQL window functions FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE, CUME_DIST, and PERCENT_RANK. You also saw how the frame clause controls which rows a window function sees and how the WINDOW clause lets you reuse a window specification. Together, these features let you compute per-row comparisons, rankings, and distributions without collapsing your result set. For further practice, try these functions on your own data or explore more advanced SQL topics.