Archive

Archive for the ‘Common Table Expressions’ Category

SQL Server – Identifying unique and duplicate rows in a table

August 11, 2011 4 comments

Finding duplicate rows in a table can be done easily by using ROW_NUMBER() function. You can number each similar rows by using PARTITION BY clause.

For example, consider the below table:

EmployeeID  FirstName      MiddleName     LastName

———– ———-     ———–    ———

1           Ken            J              Sánchez

2           Terri          Lee            Duffy

3           Gail           A              Erickson

4           Ken            J              Sánchez

5           Terri          Lee            Duffy

6           Gail           A              Erickson

 

(6 row(s) affected)

Selecting using DISTINCT does not help here, as we have an identity column [EmployeeID] which id unique for all rows. To find duplicates here we need to compare only common columns [FirstName], [MiddleName] and [LastName]. We can partition data based on these columns for row numbering as below:

SELECT      EmployeeID, FirstName, MiddleName, LastName,

            ROW_NUMBER() OVER

            (PARTITION BY FirstName, MiddleName, LastName

             ORDER BY EmployeeID)

            AS 'Row Number'

FROM        dbo.Table_Employees

Result Set:

EmployeeID  FirstName      MiddleName     LastName     Row Number

———– ———-     ———–    ———    ———–

3           Gail           A              Erickson     1

6           Gail           A              Erickson     2

1           Ken            J              Sánchez      1

4           Ken            J              Sánchez      2

2           Terri          Lee            Duffy        1

5           Terri          Lee            Duffy        2

 

(6 row(s) affected)

Now we have both unique and duplicate rows from dbo.Table_Employees. All rows with [Row Number] = 1 are unique, and all other rows are duplicate rows. Still we cannot identify that if a row is duplicate, which is a corresponding unique row for the same (except by looking at it :) , which may not be feasible if there are a large number of rows).

 

To easily identify a unique row corresponding to a duplicate row, we would like to add another column which gives us the [EmployeeID] of the unique row. In the above example, row with [EmployeeID] = 6 is duplicate row, and we would like to add a column which contains value 3 ([EmployeeID] of corresponding unique row). So the row should look like below:

EmployeeID  FirstName      MiddleName     LastName     Row Number     Duplicate Of

———– ———-     ———–    ———    ———–    ————-

6           Gail           A              Erickson     2              3

 

This can be achieved by:

 

1. Move all unique rows to TableA (all rows where [RowNumber] = 1),

2. Move all duplicate rows to TableB (all rows where [RowNumber] <> 1),

3. JOIN TableA with TableB to get value for [Duplicate Of].

 

We don’t need to create actual separate tables here, we can use CTE instead. Final query will be as follows:

 

WITH  CTE_Employees

      (EmployeeID, FirstName, MiddleName, LastName, RowNumber)

AS

(

      SELECT      EmployeeID, FirstName, MiddleName, LastName,

                  ROW_NUMBER() OVER

                  (PARTITION BY FirstName, MiddleName, LastName

                  ORDER BY EmployeeID)

      FROM        Table_Employees

)

SELECT      TableB.EmployeeID, TableB.FirstName, TableB.MiddleName,

            TableB.LastName, TableA.EmployeeID AS 'Duplicate Of'

FROM        CTE_Employees TableB

INNER JOIN CTE_Employees TableA

            ON    TableB.FirstName  = TableA.FirstName

            AND   TableB.MiddleName = TableA.MiddleName

            AND   TableB.LastName   = TableA.LastName

WHERE       TableB.RowNumber > 1 – Duplicate rows

AND         TableA.RowNumber = 1 – Unique rows

Result Set:

EmployeeID  FirstName      MiddleName     LastName      Duplicate Of

———– ———-     ———–    ———     ————-

6           Gail           A              Erickson      3

4           Ken            J              Sánchez       1

5           Terri          Lee            Duffy         2

 

(3 row(s) affected)

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

TSQL – Common Table Expressions (CTE) – Part 4 – Multiple References

April 19, 2011 Leave a comment

A CTE can be referenced multiple time in a single statement, this can be useful in a self-join scenario. age old employee manager example…. consider following data,

image

Now to list employees with their respective managers we need to do a self-join

SELECT      EMP1.EmpID, EMP1.EmpName, EMP1.ManagerID,
            EMP2.EmpName AS [ManagerName]
FROM        dbo.Employees EMP1
LEFT JOIN   dbo.Employees EMP2 ON EMP1.ManagerID = EMP2.EmpID

We can also achieve this by using a CTE,

;WITH EmpCTE AS
(
      SELECT      EmpID, EmpName, ManagerID
      FROM        dbo.Employees
)     SELECT      CTE1.EmpID, CTE1.EmpName, CTE1.ManagerID,
                  CTE2.EmpName AS [ManagerName]
      FROM        EmpCTE CTE1
      LEFT JOIN   EmpCTE CTE2 ON CTE1.ManagerID = CTE2.EmpID

Note that we are referencing the same CTE multiple times in the statement. and the output will be as follows:

image

Now, if you try to use a derived table in this scenario – well, you can’t do it…

SELECT      dTable.EmpID, dTable.EmpName, dTable.ManagerID,
            dTable1.EmpName AS [ManagerName]
FROM        (SELECT     EmpID, EmpName, ManagerID
            FROM  dbo.Employees)AS dTable
INNER JOIN  dTable dTable1 ON dTable.ManagerID = dTable.EmpID

if you try to execute above statement, you will get an error:

Msg 208, Level 16, State 1, Line 1

Invalid object name 'dTable'.

You can download the scripts here.

Check out all five parts:

TSQL – Common Table Expressions (CTE) – Part 0 – Introduction
TSQL – Common Table Expressions (CTE) – Part 1 – Recursive Query

TSQL – Common Table Expressions (CTE) – Part 2 – Simplifying a Query

TSQL – Common Table Expressions (CTE) – Part 3 – GROUP BY calculated column

TSQL – Common Table Expressions (CTE) – Part 4 – Multiple References

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

TSQL – Common Table Expressions (CTE) – Part 3 – GROUP BY calculated column

April 16, 2011 2 comments

There are other ways too which can be used to GROUP BY a calculated column such as using a derived table.

Consider, the below data which needs to be categorized based on the Cost Range.

 

USE AdventureWorks2008R2
SELECT StandardCost, ProductID
FROM Production.Product

 

image

 

The data can be categorized using a simple query, which divides the Standard Cost columns into required ranges:

SELECT CASE
WHEN StandardCost BETWEEN    0 AND  499 THEN 'Low'
WHEN StandardCost BETWEEN  500 AND  999 THEN 'Medium'
WHEN StandardCost BETWEEN 1000 AND 1499 THEN 'High'
WHEN StandardCost >=      1500          THEN 'Very High'
END AS [Cost Range], COUNT(ProductID)
FROM Production.Product
GROUP BY 1

 

but, if you try to execute the above query you will receive an error:

Msg 164, Level 15, State 1, Line 8 Each GROUP BY expression must contain at least one column that is not an outer reference.

What is this error? well, in simple terms we are trying to group by before we retrieve all data. How to avoid this error? There are two ways.

1. Use a CTE:

;WITH scTable([Cost Range], ProductID)
AS
(
    SELECT CASE
    WHEN StandardCost BETWEEN    0 AND  499 THEN 'Low'
    WHEN StandardCost BETWEEN  500 AND  999 THEN 'Medium'
    WHEN StandardCost BETWEEN 1000 AND 1499 THEN 'High'
    WHEN StandardCost >=      1500          THEN 'Very High'
    END AS [Cost Range], ProductID
    FROM Production.Product
)

 

we are not using GROUP BY inside the CTE here, this will simply retrieve the data, and after that we can us use GROUP BY:

 

SELECT [Cost Range], COUNT(ProductID) AS ProductsInRange
FROM scTable
GROUP BY [Cost Range]

 

and, here’s the required output:

 

image

2. Using derived table: Same output can be achieved using a derived table:

SELECT dTable.[Cost Range], COUNT(dTable.[ProductID]) AS ProductsInRange
FROM
(
    SELECT CASE
    WHEN StandardCost BETWEEN    0 AND  499 THEN 'Low'
    WHEN StandardCost BETWEEN  500 AND  999 THEN 'Medium'
    WHEN StandardCost BETWEEN 1000 AND 1499 THEN 'High'
    WHEN StandardCost >=      1500          THEN 'Very High'
    END AS [Cost Range], ProductID
    FROM Production.Product
) AS dTable
  GROUP BY dTable.[Cost Range]

 

image

 

And, the execution plans generated will also be identical. so either can be used in such a scenario. image

You can download the sample scripts used in this example here.

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

TSQL – Common Table Expressions (CTE) – Part 2 – Simplifying a Query

April 15, 2011 Leave a comment

Another use of CTE can be to simply a query which joins a lot of tables. we can divide he number of joins between the CTE and the query. Hence, making it more readable and easier to understand.

For this example, I have create five different tables for Product Name, Category, Description, Model and Price. each table contains a primary key Product ID.

Sample data is used as below:

image image image image image

Now, to return all columns in a single Result set we can write a query:

SELECT      P.ProductID, P.ProductName, PC.Category,
            PD.Description, PM.Model, PP.Price
FROM        dbo.Product P
INNER JOIN  dbo.ProductCategory PC
            ON PC.ProductID = P.ProductID
INNER JOIN  dbo.ProductDescription PD
            ON PD.ProductID = P.ProductID
INNER JOIN  dbo.ProductModel PM
            ON PM.ProductID = P.ProductID
INNER JOIN  dbo.ProductPrice PP
            ON PP.ProductID = P.ProductID

This works perfectly, the only problem here is number of JOINS required. And as the number of tables increase it would get more messier!

A CTE can be put to use to steal some joins from the query. for our example, we can perform two JOINs inside a CTE, and two JOINs in the outer query. Thus, making it more readable.

It can be converted to:

;WITH ProductDetails(ProductID, Category, Description, Model)
AS
( 
      – JOINs used in CTE = 2
      SELECT PC.ProductID, PC.Category, PD.Description, PM.Model
      FROM dbo.ProductCategory PC
      INNER JOIN dbo.ProductDescription PD ON PD.ProductID = PC.ProductID
      INNER JOIN dbo.ProductModel PM ON PM.ProductID = PC.ProductID
)
– JOINS used in outer query = 2
      SELECT PD.ProductID, P.ProductName, PD.Category, PD.Description, 
      PD.Model, PP.Price
      FROM ProductDetails PD
      INNER JOIN dbo.ProductPrice PP ON PP.ProductID = PD.ProductID
      INNER JOIN dbo.Product P ON P.ProductID = PD.ProductID

looks good, doesn’t it?

You can download the scripts used in this example here.

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

TSQL – Common Table Expressions (CTE) – Part 1 – Recursive Query

April 14, 2011 2 comments

Let’s take a look at how CTE can be used to create a recursive query:

In a typical employee – supervisor scenario, A supervisor is an employee and an employee can be a supervisor of another employee(s).

This is also known as a self-referencing. For example:

image

The above table contains a list of Employee IDs and Employee Names. It also contains a column named ManagerID for Employee’s Manager. The Manager here is also an employee.

To generate a hierarchical list of all employees:
1.
we first need to retrieve root employee, i.e. top level employee,
2.
after the root level is retrieved, we need to find all child nodes for that root level,
3.
perform step 2 until all levels are retrieved.

This can be achieved using a Recursive Common Table Expression, which consists of two parts:
1. Retrieve root level – this is known as Anchor in a Recursive CTE
2.
retrieve all children of root level – this is known as Recursive Part in a Recursive CTE.

These process is performed recursively until all level are retrieved.

The Anchor part will be written as follows: (Retrieving root level)

SELECT Emp.ManagerID, Emp.EmpID, Emp.EmpName, 0 AS EmpLevel
FROM dbo.Employees Emp
WHERE ManagerID IS NULL

And, the recursive part is written as: (Retrieving all children of root), “EmployeeHierarchy” will be the CTE itself, hence making it recursive:

SELECT Emp.ManagerID, Emp.EmpID, Emp.EmpName, EmpLevel + 1
FROM dbo.[Employees] Emp
INNER JOIN EmployeeHierarchy AS EmpH ON Emp.ManagerID = EmpH.EmpID

After combining both parts the full query will be:

WITH EmployeeHierarchy(ManagerID, EmpID, EmpName, EmpLevel)
AS
( 
      SELECT Emp.ManagerID, Emp.EmpID, Emp.EmpName, 0 AS EmpLevel
      FROM dbo.Employees Emp
      WHERE ManagerID IS NULL 
 
      UNION ALL 
 
      SELECT Emp.ManagerID, Emp.EmpID, Emp.EmpName, EmpLevel + 1
      FROM dbo.Employees Emp
      INNER JOIN EmployeeHierarchy AS EmpH ON Emp.ManagerID = EmpH.EmpID
)     SELECT ManagerID, EmpID, EmpName, EmpLevel
      FROM EmployeeHierarchy
      ORDER BY ManagerID

The output of above query:

image

Download the sample scripts used in this example here.

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

TSQL – Common Table Expressions (CTE) – Part 0 – Introduction

April 13, 2011 Leave a comment

What is it? A CTE Creates a temporary named result set. It is available within the execution of a single DML statement. A Common Table Expression can be used to:

1. Create Recursive Query
2. Create a temporary view, to simplify another query
3. To avoid “Msg 164, Level 15, State 1” – group by a derived column / non deterministic
   
function
4. And, Use the result set returned multiple time in the same statement.

Let’s see how to create one,
I have created two tables for this example – FirstNames, LastNames as below:

Data in FirstNames and LastNames :
image image

To combine the firstname and lastname in output, we can use INNER JOIN:

SELECT FN.EmpID, FN.FirstName, LN.LastName
FROM dbo.FirstNames FN
INNER JOIN dbo.LastNames LN ON LN.EmpID = FN.EmpID

To write the same using CTE, let’s look at the syntax first:

WITH TempResultSetName(Column1, Column2, …)
AS
( 
      SELECT Query
)
SELECT Column1, Column2, FROM TempResultSetName

The “WITH TempResultSetName” defines a name for the result set, and the Column List defines the column names for the result set – this is optional, if column names are not specified it will used the names from the “SELECT Query”.

The “SELECT Query” is the actual data for the results. i.e. which returns actual rows.

The Final part of CTE is the “SELECT Column1, ….. FROM TempResultSetName” – this simply outputs the result set to user. you are not limited to only SELECT here, any valid DML can be used.

However, you can only use the TempResultSet in FROM clause of a SELECT statement, i.e. you can not INSERT INTO or UPDATE or DELETE to/from TempResultSet.

Now, let’s write the actual CTE to replace INNER JOIN:

WITH TempResultSetName(EmpID, FirstName, LastName)
AS
( 
      SELECT FN.EmpID, FN.FirstName, LN.LastName
      FROM dbo.FirstNames FN
      INNER JOIN dbo.LastNames LN ON LN.EmpID = FN.EmpID
)     SELECT EmpID, FirstName, LastName
      FROM TempResultSetName

Line 1 – Defines the Result Set and Columns in Result Set
Line 4 to 6 – Populates the Result Set
Line 8 & 9 – Outputs the Result Set – The output can also include other tables, I have added examples in the sample script.

Download the scripts used in post here.

Hope This Helps!

Vishal

If you like this post, do like my Facebook Page -> SqlAndMe
EMail me your questions -> Vishal@SqlAndMe.com
Follow me on Twitter -> @SqlAndMe

Follow

Get every new post delivered to your Inbox.

Join 248 other followers