Archive

Posts Tagged ‘row number’

SQL Server – Identifying unique and duplicate rows in a table

11.08.2011 5 comments

Finding duplicate rows in a table can be done easily by using ROW_NUMBER() function. You can number each similar rows by using PARTITION BY clause.

For example, consider the below table:

EmployeeID  FirstName      MiddleName     LastName

———– ———-     ———–    ———

1           Ken            J              Sánchez

2           Terri          Lee            Duffy

3           Gail           A              Erickson

4           Ken            J              Sánchez

5           Terri          Lee            Duffy

6           Gail           A              Erickson

 

(6 row(s) affected)

Selecting using DISTINCT does not help here, as we have an identity column [EmployeeID] which id unique for all rows. To find duplicates here we need to compare only common columns [FirstName], [MiddleName] and [LastName]. We can partition data based on these columns for row numbering as below:

SELECT      EmployeeID, FirstName, MiddleName, LastName,

            ROW_NUMBER() OVER

            (PARTITION BY FirstName, MiddleName, LastName

             ORDER BY EmployeeID)

            AS 'Row Number'

FROM        dbo.Table_Employees

Result Set:

EmployeeID  FirstName      MiddleName     LastName     Row Number

———– ———-     ———–    ———    ———–

3           Gail           A              Erickson     1

6           Gail           A              Erickson     2

1           Ken            J              Sánchez      1

4           Ken            J              Sánchez      2

2           Terri          Lee            Duffy        1

5           Terri          Lee            Duffy        2

 

(6 row(s) affected)

Now we have both unique and duplicate rows from dbo.Table_Employees. All rows with [Row Number] = 1 are unique, and all other rows are duplicate rows. Still we cannot identify that if a row is duplicate, which is a corresponding unique row for the same (except by looking at it ? , which may not be feasible if there are a large number of rows).

 

To easily identify a unique row corresponding to a duplicate row, we would like to add another column which gives us the [EmployeeID] of the unique row. In the above example, row with [EmployeeID] = 6 is duplicate row, and we would like to add a column which contains value 3 ([EmployeeID] of corresponding unique row). So the row should look like below:

EmployeeID  FirstName      MiddleName     LastName     Row Number     Duplicate Of

———– ———-     ———–    ———    ———–    ————-

6           Gail           A              Erickson     2              3

 

This can be achieved by:

 

1. Move all unique rows to TableA (all rows where [RowNumber] = 1),

2. Move all duplicate rows to TableB (all rows where [RowNumber] <> 1),

3. JOIN TableA with TableB to get value for [Duplicate Of].

 

We don’t need to create actual separate tables here, we can use CTE instead. Final query will be as follows:

 

WITH  CTE_Employees

      (EmployeeID, FirstName, MiddleName, LastName, RowNumber)

AS

(

      SELECT      EmployeeID, FirstName, MiddleName, LastName,

                  ROW_NUMBER() OVER

                  (PARTITION BY FirstName, MiddleName, LastName

                  ORDER BY EmployeeID)

      FROM        Table_Employees

)

SELECT      TableB.EmployeeID, TableB.FirstName, TableB.MiddleName,

            TableB.LastName, TableA.EmployeeID AS 'Duplicate Of'

FROM        CTE_Employees TableB

INNER JOIN CTE_Employees TableA

            ON    TableB.FirstName  = TableA.FirstName

            AND   TableB.MiddleName = TableA.MiddleName

            AND   TableB.LastName   = TableA.LastName

WHERE       TableB.RowNumber > 1 — Duplicate rows

AND         TableA.RowNumber = 1 — Unique rows

Result Set:

EmployeeID  FirstName      MiddleName     LastName      Duplicate Of

———– ———-     ———–    ———     ————-

6           Gail           A              Erickson      3

4           Ken            J              Sánchez       1

5           Terri          Lee            Duffy         2

 

(3 row(s) affected)

Hope This Helps!

Vishal

SQL Functions – ROW_NUMBER()

01.07.2011 No comments

ROW_NUMBER() can be used to generate a sequential number for each row in the result set. Unlike RANK() and DENSE_RANK(), ROW_NUMBER() in case of ties it does not generate same number, it simply ignores the tie and generates sequential numbers for each of the tied rows.

For example,

— © 2011 – Vishal (http://SqlAndMe.com)

 

SELECT      Products.ProductID, Products.Name,

            Products.Category, Products.Price,

            ROW_NUMBER() OVER (ORDER BY [Price])

            AS [RowNumber]

FROM

(

      SELECT 804 AS [ProductID],

      'HL Fork' AS [Name], 'Accessories' AS [Category],

      229.49 AS [Price] UNION ALL

      SELECT 871, 'Mountain Bottle Cage', 'Accessories', 9.99    UNION ALL

      SELECT 921, 'Mountain Tire Tube',   'Parts',       24.99   UNION ALL

      SELECT 845, 'Mountain Pump',        'Parts',       24.99   UNION ALL

      SELECT 980, 'Mountain-400-W Silver','Accessories', 769.49  UNION ALL

      SELECT 988, 'Mountain-500 Silver',  'Parts',       564.99  UNION ALL

      SELECT 793, 'Road-250 Black',       'Accessories', 2443.35 UNION ALL

      SELECT 762, 'Road-650 Red',         'Parts',       782.99  UNION ALL

      SELECT 923, 'Touring Tire Tube',    'Accessories', 4.99    UNION ALL

      SELECT 966, 'Touring-1000 Blue',    'Parts',       2384.07

) Products

ROW_NUMBER() Result Set:

ProductID   Name                  Category    Price         RowNumber

———– ——————— ———– ————- ————

923         Touring Tire Tube     Accessories 4.99          1

871         Mountain Bottle Cage  Accessories 9.99          2

845         Mountain Pump         Parts       24.99         3

921         Mountain Tire Tube    Parts       24.99         4

804         HL Fork               Accessories 229.49        5

988         Mountain-500 Silver   Parts       564.99        6

980         Mountain-400-W Silver Accessories 769.49        7

762         Road-650 Red          Parts       782.99        8

966         Touring-1000 Blue     Parts       2384.07       9

793         Road-250 Black        Accessories 2443.35       10

 

(10 row(s) affected)

If we had used RANK() function, it would have generated the result as below:

……………

871         Mountain Bottle Cage  Accessories 9.99            2

845         Mountain Pump         Parts       24.99           3

921         Mountain Tire Tube    Parts       24.99           3

804         HL Fork               Accessories 229.49          5
……………

and DENSE_RANK() would have generated 4 for 'HL Fork'.

 

Using ROW_NUMBER with <PARTITION BY>:

By using PARTITION BY clause with ROW_NUMBER() the result set can be divided into number of result sets based on the PARTITION BY column, and to each result set ROW_NUMBER() function will be applied independently.

We can apply PARTITION BY clause to above data to partition the result sets by Product Category.

The result set would contain:

            Partition By Category

10 Rows ->  Accessories (5 Rows) -> Apply ROW_NUMBER() -> 1, 2, 3, 4, 5

            Parts (5 Rows)       -> Apply ROW_NUMBER() -> 1, 2, 3, 4, 5

SELECT      Products.ProductID, Products.Name,

            Products.Category, Products.Price,

            ROW_NUMBER() OVER (PARTITION BY [Category] ORDER BY [Price])

            AS [RowNumber]

FROM

(

      SELECT 804 AS [ProductID],

      'HL Fork' AS [Name], 'Accessories' AS [Category],

      229.49 AS [Price] UNION ALL

      SELECT 871, 'Mountain Bottle Cage', 'Accessories', 9.99    UNION ALL

      SELECT 921, 'Mountain Tire Tube',   'Parts',       24.99   UNION ALL

      SELECT 845, 'Mountain Pump',        'Parts',       24.99   UNION ALL

      SELECT 980, 'Mountain-400-W Silver','Accessories', 769.49  UNION ALL

      SELECT 988, 'Mountain-500 Silver',  'Parts',       564.99  UNION ALL

      SELECT 793, 'Road-250 Black',       'Accessories', 2443.35 UNION ALL

      SELECT 762, 'Road-650 Red',         'Parts',       782.99  UNION ALL

      SELECT 923, 'Touring Tire Tube',    'Accessories', 4.99    UNION ALL

      SELECT 966, 'Touring-1000 Blue',    'Parts',       2384.07

) Products

Result Set:

ProductID   Name                  Category    Price         RowNumber

———– ——————— ———– ———–   ————

923         Touring Tire Tube     Accessories 4.99          1

871         Mountain Bottle Cage  Accessories 9.99          2

804         HL Fork               Accessories 229.49        3

980         Mountain-400-W Silver Accessories 769.49        4

793         Road-250 Black        Accessories 2443.35       5

845         Mountain Pump         Parts       24.99         1

921         Mountain Tire Tube    Parts       24.99         2

988         Mountain-500 Silver   Parts       564.99        3

762         Road-650 Red          Parts       782.99        4

966         Touring-1000 Blue     Parts       2384.07       5

 

(10 row(s) affected)

Hope This Helps!

Vishal