Archive

Posts Tagged ‘analytic function denali’

SQL Server – "Denali" – PERCENT_RANK() Analytic Function

19.08.2011 1 comment

PERCENT_RANK() returns the position of a row within the result set. In contrast to RANK() function, PERCENT_RANK() ranks rows between 0 and 1, both inclusive.

Computation formula used by PERCENT_RANK():

     (RANK() – 1) / (Number of Rows – 1)

     where, RANK() is the rank of the row within the result set.

Using PERCENT_RANK():

PERCENT_RANK() takes and ORDER BY clause as argument:

SELECT [Year], [Month], [Amount],
       PERCENT_RANK() OVER (ORDER BY [Amount]) [PERCENT_RANK],
       RANK() OVER (ORDER BY [Amount]) [RANK]
FROM   dbo.SalesData

 

Result Set:

 

Year        Month       Amount                PERCENT_RANK           RANK
2011        1           1000.00               0                      1
2011        2           2000.00               0.142857142857143      2
2010        4           2000.00               0.142857142857143      2
2011        3           3000.00               0.428571428571429      4
2011        4           4000.00               0.571428571428571      5
2010        1           5000.00               0.714285714285714      6
2010        2           6000.00               0.857142857142857      7
2010        3           7000.00               1                      8

(8 row(s) affected)

 

PERCENT_RANK for row 4 (Year = 2011, Month = 3) is calculated as:

(4 – 1.00) / (8 – 1.00) = 0.4285714..

 

PARTITION BY:

You can use PARTITION BY clause with PERCENT_RANK() to divide result set into partitions:

SELECT [Year], [Month], [Amount],

       PERCENT_RANK() OVER (PARTITION BY [Year] ORDER BY [Amount]) [PERCENT_RANK],

       RANK() OVER (PARTITION BY [Year] ORDER BY [Amount]) [RANK]

FROM   dbo.SalesData

Result Set:

Year        Month       Amount                PERCENT_RANK           RANK
2010        4           2000.00               0                      1
2010        1           5000.00               0.333333333333333      2
2010        2           6000.00               0.666666666666667      3
2010        3           7000.00               1                      4
2011        1           1000.00               0                      1
2011        2           2000.00               0.333333333333333      2
2011        3           3000.00               0.666666666666667      3
2011        4           4000.00               1                      4

(8 row(s) affected)

 

Hope This Helps! Cheers!

Hope This Helps!

Vishal

SQL Server – "Denali" – Using Analytic Function CUME_DIST()

18.08.2011 No comments

CUME_DIST() calculates relative position of a value relative to a group of values. The value returned by CUME_DIST() is > 0 and <= 1, which represents percentage of number of rows with value less than (for ascending order) or equal to current row.

For example, consider below data:

Year        Month       Amount

2010        1           5000.00

2010        2           6000.00

2010        3           7000.00

2010        4           2000.00

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

Cumulative distribution of row 4 (year = 2010, month = 4) will be 37.50%, as number of rows which has value <= row 4 are 3 rows, and the value returned by CUME_DIST() will be 0.375.

 

Using CUME_DIST():

CUME_DIST() takes an ORDER BY clause as argument, ORDER BY clause determines the order in which the operation is performed:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       –CAST(CUME_DIST() OVER (ORDER BY [Amount]) * 100 AS VARCHAR) + '%' CUME_DIST

FROM   dbo.SalesData

ORDER BY [Amount]

 

Result Set:

Year        Month       Amount                CUME_DIST

2011        1           1000.00               0.125

2011        2           2000.00               0.375

2010        4           2000.00               0.375

2011        3           3000.00               0.5

2011        4           4000.00               0.625

2010        1           5000.00               0.75

2010        2           6000.00               0.875

2010        3           7000.00               1

 

CUME_DUST() with PARTITION BY:

PARTITION BY clause divides the result set into partition to which CUME_DIST() is applied:

SELECT [Year], [Month], [Amount],

       CUME_DIST() OVER (PARTITION BY [Year] ORDER BY [Amount])

       CUME_DIST

FROM   dbo.SalesData

ORDER BY [Year], [Month], [Amount]

Result Set:

Year        Month       Amount                CUME_DIST

2010        1           5000.00               0.5

2010        2           6000.00               0.75

2010        3           7000.00               1

2010        4           2000.00               0.25

2011        1           1000.00               0.25

2011        2           2000.00               0.5

2011        3           3000.00               0.75

2011        4           4000.00               1

 

CUME_DIST() can also be used to find "TOP" rows, check below queries which selects 50% of data and the different result sets generated by them:

;WITH tempTable

AS

(

       SELECT [Year], [Month], [Amount],

              CUME_DIST() OVER (ORDER BY [Amount]) CUME_DIST

       FROM   dbo.SalesData

)

SELECT [Year], [Month], [Amount]

FROM   tempTable

WHERE  CUME_DIST < 0.50

 

 

SELECT TOP 50 PERCENT [Year], [Month], [Amount]

FROM   dbo.SalesData

Result Sets:

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2010        4           2000.00

 

(3 row(s) affected)

 

Year        Month       Amount

———– ———– ———————

2011        1           1000.00

2011        2           2000.00

2011        3           3000.00

2011        4           4000.00

 

(4 row(s) affected)

 

CUME_DIST() has been available in Oracle since 8i, and can also be used as an aggregate function in Oracle.

 

Hope This Helps! Cheers!

Hope This Helps!

Vishal

SQL Server – "Denali" – Analytic Functions FIRST_VALUE() and LAST_VALUE()

12.08.2011 No comments

FIRST_VALUE() and LAST_VALUE() are new analytic function introduced in SQL Server "Denali". As the name suggests FIRST_VALUE() returns first value in an ordered set of values, and LAST_VALUE() returns the last value from an ordered set of values.

For example,

SELECT EmployeeID, FirstName, MiddleName, LastName,

       FIRST_VALUE(EmployeeID) OVER (ORDER BY EmployeeID)

       AS [First Value]

FROM   Table_Employees

Result Set:

EmployeeID   FirstName    MiddleName   LastName First Value

1            Ken          J            Sánchez  1

2            Ken          J            Sánchez  1

3            Ken          J            Sánchez  1

4            Terri        Lee          Duffy    1

5            Terri        Lee          Duffy    1

6            Terri        Lee          Duffy    1

7            Terri        Lee          Duffy    1

 

These functions can also be used with PARTITION BY clause, which allows us to group the result set into subsets.

Using PARTITION BY with FIRST_VALUE():

By using PARTITION BY clause with FIRST_VALUE() function we can divide the result set by name:

SELECT EmployeeID, FirstName, MiddleName, LastName,

       FIRST_VALUE(EmployeeID) OVER

       (PARTITION BY FirstName, MiddleName, LastName

       ORDER BY FirstName, MiddleName, LastName)

       AS [First Value]

FROM   Table_Employees

Result Set:

EmployeeID   FirstName    MiddleName   LastName First Value

1            Ken          J            Sánchez  1

2            Ken          J            Sánchez  1

3            Ken          J            Sánchez  1

4            Terri        Lee          Duffy    4

5            Terri        Lee          Duffy    4

6            Terri        Lee          Duffy    4

7            Terri        Lee          Duffy    4

 

Now let’s try to use this to find duplicate rows which we tried yesterday:

;WITH  EmployeesCTE

       (EmployeeID, FirstName, MiddleName, LastName, DuplicateOf)

AS

(

       SELECT EmployeeID, FirstName, MiddleName, LastName,

              FIRST_VALUE(EmployeeID) OVER

              (PARTITION BY FirstName, MiddleName, LastName

              ORDER BY FirstName, MiddleName, LastName)

       FROM   Table_Employees

)

SELECT EmployeeID, FirstName, MiddleName, LastName, DuplicateOf

FROM   EmployeesCTE

WHERE  EmployeeID <> DuplicateOf — Duplicate Rows

       –EmployeeID = DuplicateOf — Unique Rows

Note that we no longer need a self-join in this case.

Result Set:

EmployeeID   FirstName    MiddleName   LastName DuplicateOf

2            Ken          J            Sánchez  1

3            Ken          J            Sánchez  1

5            Terri        Lee          Duffy    4

6            Terri        Lee          Duffy    4

7            Terri        Lee          Duffy    4

LAST_VALUE() has the same syntax and clauses as FIRST_VALUE(), the only difference is, it returns last value from an ordered set:

SELECT EmployeeID, FirstName, MiddleName, LastName,

       FIRST_VALUE(EmployeeID) OVER

       (PARTITION BY FirstName, MiddleName, LastName

       ORDER BY FirstName, MiddleName, LastName)

       AS [FirstValue],

       LAST_VALUE(EmployeeID) OVER

       (PARTITION BY FirstName, MiddleName, LastName

       ORDER BY FirstName, MiddleName, LastName)

       AS [LastValue]

FROM   Table_Employees

Result Set:

EmployeeID FirstName  MiddleName  LastName  FirstValue  LastValue  

1          Ken        J           Sánchez   1           3

2          Ken        J           Sánchez   1           3

3          Ken        J           Sánchez   1           3

4          Terri      Lee         Duffy     4           7

5          Terri      Lee         Duffy     4           7

6          Terri      Lee         Duffy     4           7

7          Terri      Lee         Duffy     4           7

 

Hope This Helps!

Vishal

SQL Server – "Denali" – Analytic Functions – LAG() and LEAD()

28.07.2011 No comments

LAG() function can be used to access data from a previous row in the result set without using a self-join. And it’s counterpart LEAD() can be used to access data from a subsequent row in the same result set. These functions are introduced in “Denali” as T-SQL functions, these has been available as MDX functions since SQL Server 2005.

LAG() and LEAD() functions are also available in Oracle since Oracle 8i.

Syntax for LAG()/LEAD():

LAG/LEAD (scalar_expression [,offset] [,default])
    OVER ( [ partition_by_clause ] order_by_clause )

offset provides the numbers of rows back or forward the current row. default provides a default value when the scalar_expression returned by offset is NULL.

Below example uses LEG() and LEAD() to identify previous or next date for a particular order.

CREATE TABLE #Orders

(

       OrderDate     DATE,

       ProductID     INT,

       Quantity      INT

)

INSERT INTO #Orders VALUES

('2011-07-28',11,12), ('2011-03-18',12,74), ('2011-04-12',13,95),

('2011-07-25',14,57), ('2011-05-30',11,28), ('2011-05-21',10,12),

('2011-04-12',11,38)

 

SELECT OrderDate, ProductID, Quantity

FROM   #Orders

Result Set:

OrderDate  ProductID   Quantity

———- ———– ———–

2011-07-28 11          12

2011-03-18 12          74

2011-04-12 13          95

2011-07-25 14          57

2011-05-30 11          28

2011-05-21 10          12

2011-04-12 11          38

 

(7 row(s) affected)

Last order date using LAG():

SELECT OrderDate, ProductID, Quantity,

       LAG(OrderDate, 1, OrderDate)

       OVER (PARTITION BY ProductID ORDER BY OrderDate)

       AS Last_OrderDate

FROM   #Orders

Result Set:

OrderDate  ProductID   Quantity    Last_OrderDate

———- ———– ———– ————–

2011-05-21 10          12          2011-05-21

2011-04-12 11          38          2011-04-12

2011-05-30 11          28          2011-04-12

2011-07-28 11          12          2011-05-30

2011-03-18 12          74          2011-03-18

2011-04-12 13          95          2011-04-12

2011-07-25 14          57          2011-07-25

 

(7 row(s) affected)

Next order date using LEAD():

SELECT OrderDate, ProductID, Quantity,

       LEAD(OrderDate, 1, OrderDate)

       OVER (PARTITION BY ProductID ORDER BY OrderDate)

       AS Next_OrderDate

FROM   #Orders

Result Set:

OrderDate  ProductID   Quantity    Next_OrderDate

———- ———– ———– ————–

2011-05-21 10          12          2011-05-21

2011-04-12 11          38          2011-05-30

2011-05-30 11          28          2011-07-28

2011-07-28 11          12          2011-07-28

2011-03-18 12          74          2011-03-18

2011-04-12 13          95          2011-04-12

2011-07-25 14          57          2011-07-25

 

(7 row(s) affected)

Hope This Helps!

Vishal