Home > SQLServer, SQLServer 2012 > SQL Server – "Denali" – Using Analytic Function CUME_DIST()

SQL Server – "Denali" – Using Analytic Function CUME_DIST()


CUME_DIST() calculates relative position of a value relative to a group of values. The value returned by CUME_DIST() is > 0 and <= 1, which represents percentage of number of rows with value less than (for ascending order) or equal to current row.

For example, consider below data:

Year        Month       Amount

2010        1           5000.00

2010        2           6000.00

2010        3           7000.00

2010        4           2000.00

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

Cumulative distribution of row 4 (year = 2010, month = 4) will be 37.50%, as number of rows which has value <= row 4 are 3 rows, and the value returned by CUME_DIST() will be 0.375.

 

Using CUME_DIST():

CUME_DIST() takes an ORDER BY clause as argument, ORDER BY clause determines the order in which the operation is performed:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       –CAST(CUME_DIST() OVER (ORDER BY [Amount]) * 100 AS VARCHAR) + '%' CUME_DIST

FROM   dbo.SalesData

ORDER BY [Amount]

 

Result Set:

Year        Month       Amount                CUME_DIST

2011        1           1000.00               0.125

2011        2           2000.00               0.375

2010        4           2000.00               0.375

2011        3           3000.00               0.5

2011        4           4000.00               0.625

2010        1           5000.00               0.75

2010        2           6000.00               0.875

2010        3           7000.00               1

 

CUME_DUST() with PARTITION BY:

PARTITION BY clause divides the result set into partition to which CUME_DIST() is applied:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (PARTITION BY [Year] ORDER BY [Amount])

       CUME_DIST

FROM   dbo.SalesData

ORDER BY [Year], [Month], [Amount]

Result Set:

Year        Month       Amount                CUME_DIST

2010        1           5000.00               0.5

2010        2           6000.00               0.75

2010        3           7000.00               1

2010        4           2000.00               0.25

2011        1           1000.00               0.25

2011        2           2000.00               0.5

2011        3           3000.00               0.75

2011        4           4000.00               1

 

CUME_DIST() can also be used to find "TOP" rows, check below queries which selects 50% of data and the different result sets generated by them:

;WITH tempTable

AS

(

       SELECT [Year], [Month], [Amount],

              CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       FROM   dbo.SalesData

)

SELECT [Year], [Month], [Amount]

FROM   tempTable

WHERE  CUME_DIST < 0.50

 

 

SELECT TOP 50 PERCENT [Year], [Month], [Amount]

FROM   dbo.SalesData

Result Sets:

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2010        4           2000.00

 

(3 row(s) affected)

 

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

 

(4 row(s) affected)

 

CUME_DIST() has been available in Oracle since 8i, and can also be used as an aggregate function in Oracle.

 

Hope This Helps! Cheers!

Hope This Helps!

Vishal

  1. No comments yet.
  1. No trackbacks yet.