Wednesday, March 7, 2012

Hierarchical table Functions Vs Hierarchical CTE

Recently I was in need of a hierarchical tree data. I learned about CTE and how they can be used to build hierarchical data with simple syntax. I used CTE and was through with the task. Later during free time, I tried to compare CTE approach with the traditional SQL 2K Table Function approach. It was surprising to see the query costs when I ran both the modes at one go...

Query Cost (relative to batch) : 0.49%
Query Text : Select * From fn_GetTree(8);

Query Cost (relative to batch) : 99.51%
Query Text : with treedata (id, parentid, status, prevStatus, lvl) as (select ...)


What does that indicate? Does it mean that the Table Function approach is much faster than CTE? I am sure that I was not making unwanted Joins in the CTE mode.

Can someone explain why that huge difference is there? And what the scenarios where CTE is better over Table Functions?

Functions can cause the plan guessing to be off big time. It is just guessing that your function is of the lowest cost possible, since if 8 happens to return no rows, it might be. If 8 returns 200 bajillion rows, it will be the other way around.

Did you do this when running the query, or just the guess?

Try running the statements with statistics io on (look up STATISTICS IO option in BOL) and see how they actually compare. They may be very close to the same size...

|||

Hi Louis,

I followed your advice on turning on the Statistics. Here is what I get when running the queries...

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT * FROM ufn_GetClientTree (9260);
GO

with ClientTree (ClientId, ParentClientId, Status, PrevStatus, Level#) AS
(
Select Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, 0
FROM Client
Where Client.ClientId = 9260
UNION ALL
Select Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, ClientTree.Level# + 1
FROM Client
INNER JOIN ClientTree On ClientTree.ClientId = Client.ParentClientId
)
SELECT * FROM ClientTree;

Here is the output I get in the messages pane...

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(14 row(s) affected)

Table '#1BC821DD'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 47 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

(14 row(s) affected)

Table 'Worktable'. Scan count 16, logical reads 30626, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 276, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 75 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

Total records in the table : 15041

|||

A few more questions:

Does one actually take that much longer to execute?

Can you post the code for the UDF?

Could you run them in two batches, each with DBCC FREEPROCCACHE prior to the execution (probably not on your production server :)

These are kind of interesting results.

|||

Hi Louis,

I appreciate your interest on this topic (infact I knew you would be the one responding to this topic ;)). Here is the function and the CTE for further investigation...


Table Function:

CREATE FUNCTION [dbo].[ufn_GetClientTree] ( @.parentId AS int )
RETURNS @.tree table
(
ClientId INT NOT NULL,
ParentClientId INT NULL,
Status TINYINT NULL,
PrevStatus TINYINT NULL,
Level# INT NOT NULL
)
AS
BEGIN
DECLARE @.lvl AS int
SELECT @.lvl = 1
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT clientId, NULL, Status, PrevStatus, 0
FROM Client
WHERE clientId = @.parentId
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT ClientId, ParentclientId, Status, PrevStatus, @.lvl
FROM Client
WHERE ParentClientId = @.parentId
WHILE @.@.ROWCOUNT > 0
BEGIN
SET @.lvl = @.lvl + 1
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, @.lvl
FROM Client
INNER JOIN @.tree AS T ON Client.ParentClientId = T.ClientId AND T.Level# = @.lvl - 1
END

RETURN;
END;

CTE:

WITH ClientTree (ClientId, ParentClientId, Status, PrevStatus, Level#) AS
(
Select Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, 0
FROM Client
WHERE Client.ClientId = 9260
UNION ALL
SELECT Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, T.Level# + 1
FROM Client
INNER JOIN ClientTree AS T On T.ClientId = Client.ParentClientId
)
SELECT * FROM ClientTree;


The total number of records in the Client table is 15052 and 14112 records of these dont have any children.

Here is the output of the function and CTE:

ClientId -- PClientId -- Sts -- PSts -- Lvl
9260 -- NULL -- 1 -- NULL -- 0
3190 -- 9260 -- 1 -- NULL -- 1
13755 -- 9260 -- 1 -- NULL -- 1
8 -- 13755 -- 1 -- NULL -- 2
6 -- 8 -- 1 -- NULL -- 3
3192 -- 8 -- 1 -- NULL -- 3
7029 -- 8 -- 1 -- NULL -- 3
123740 -- 8 -- 1 -- NULL -- 3
3 -- 6 -- 1 -- NULL -- 4
7 -- 6 -- 1 -- NULL -- 4
11 -- 7 -- 1 -- NULL -- 5
41 -- 7 -- 1 -- NULL -- 5
10684 -- 3 -- 1 -- NULL -- 5
33550 -- 3 -- 1 -- NULL -- 5

|||Itzik has a good article on this topic.
http://www.sqlmag.com/articles/index.cfm?articleid=8826|||

Is there any considerable difference in time for either?

I think that the function is hiding the costs (they tend to do that.) Try executing the code without the function wrapper to see te costs. I took this example from the Pro SQL Server 2005 code I wrote (it runs in Adventureworks)

The first execute showed a lot more reads than the second one with the function (you would think that the function was perfect!) But executing the code without the function showed small sets of reads for each level. The code in the CTE accumulated them in one bucket.

USE AdventureWorks --all of the code is to be executed
--in adventureworks

set statistics io on

GO

--First, using 2000 style sans function
DECLARE @.managerId int
SELECT @.managerId = 140
DECLARE @.outTable table (employeeId int, managerId int, treeLevel int)

--used to hold the level of the tree we are currently at in the loop
DECLARE @.treeLevel as int
SET @.treelevel = 1

--get the top level
INSERT INTO @.outTable
SELECT employeeId, managerId, @.treelevel as treelevel
FROM HumanResources.employee as employee
WHERE (employee.managerId = @.managerId)

WHILE (1 = 1) --imitates do...until construct
BEGIN

INSERT INTO @.outTable
SELECT employee.employeeId, employee.managerId,
treelevel + 1 as treelevel
FROM HumanResources.employee as employee
JOIN @.outTable as ht
ON employee.managerId = ht.employeeId
--this where isolates a given level of the tree
WHERE EXISTS( SELECT *
FROM @.outTable AS holdTree
WHERE treelevel = @.treelevel
AND employee.managerId = holdtree.employeeId)

IF @.@.rowcount = 0
BEGIN
BREAK
END

SET @.treelevel = @.treelevel + 1
END
select *
from @.outTable
go

/* --then here is that code in a function
create function managerTree
(
@.managerId int
)
returns @.outTable table (employeeId int, managerId int, treeLevel int)
as
BEGIN
--used to hold the level of the tree we are currently at in the loop
DECLARE @.treeLevel as int
SET @.treelevel = 1

--get the top level
INSERT INTO @.outTable
SELECT employeeId, managerId, @.treelevel as treelevel
FROM HumanResources.employee as employee
WHERE (employee.managerId = @.managerId)

WHILE (1 = 1) --imitates do...until construct
BEGIN

INSERT INTO @.outTable
SELECT employee.employeeId, employee.managerId,
treelevel + 1 as treelevel
FROM HumanResources.employee as employee
JOIN @.outTable as ht
ON employee.managerId = ht.employeeId
--this where isolates a given level of the tree
WHERE EXISTS( SELECT *
FROM @.outTable AS holdTree
WHERE treelevel = @.treelevel
AND employee.managerId = holdtree.employeeId)

IF @.@.rowcount = 0
BEGIN
BREAK
END

SET @.treelevel = @.treelevel + 1
END
RETURN
END
*/

--this executes that function

select *
from managerTree (140)

--recursive query using CTE

DECLARE @.managerId int
SET @.managerId = 140;

WITH EmployeeHierarchy (EmployeeId, ManagerId)
AS
(
SELECT EmployeeID, ManagerID
FROM HumanResources.Employee as Employee
WHERE ManagerID=@.managerId

UNION ALL

SELECT Employee.EmployeeID, Employee.ManagerID
FROM HumanResources.Employee as Employee
INNER JOIN EmployeeHierarchy
on Employee.ManagerID= EmployeeHierarchy.EmployeeID)

SELECT *
from EmployeeHierarchy

|||


SET STATISTICS IO ON;
SET STATISTICS TIME ON;

;

ONLY FUNCTION CONTENT:


DECLARE @.parentId AS int
DECLARE @.tree table
(
ClientId INT NOT NULL,
ParentClientId INT NULL,
Status TINYINT NULL,
PrevStatus TINYINT NULL,
Level# INT NOT NULL
)
DECLARE @.lvl AS int

SELECT @.lvl = 1, @.parentId = 9260
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT clientId, NULL, Status, PrevStatus, 0
FROM Client
WHERE clientId = @.parentId
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT ClientId, ParentclientId, Status, PrevStatus, @.lvl
FROM Client
WHERE ParentClientId = @.parentId
WHILE @.@.ROWCOUNT > 0
BEGIN
SET @.lvl = @.lvl + 1
INSERT INTO @.tree (ClientId, ParentClientId, Status, PrevStatus, Level#)
SELECT Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, @.lvl
FROM Client
INNER JOIN @.tree AS T ON Client.ParentClientId = T.ClientId AND T.Level# = @.lvl - 1
END

SELECT * FROM @.tree
;


CTE:

WITH ClientTree (ClientId, ParentClientId, Status, PrevStatus, Level#) AS
(
Select Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, 0
FROM Client
WHERE Client.ClientId = 9260
UNION ALL
SELECT Client.ClientId, Client.ParentClientId, Client.Status, Client.PrevStatus, T.Level# + 1
FROM Client
INNER JOIN ClientTree AS T On T.ClientId = Client.ParentClientId
)
SELECT * FROM ClientTree;

***************************************************************************************


SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(1 row(s) affected)

Table '#52593CB8'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(2 row(s) affected)

Table '#52593CB8'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 4 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(1 row(s) affected)

Table '#52593CB8'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 9 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Table '#52593CB8'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(4 row(s) affected)

Table 'Worktable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 8 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(2 row(s) affected)

Table '#52593CB8'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 8 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(4 row(s) affected)

Table '#52593CB8'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 8 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(0 row(s) affected)

Table '#52593CB8'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 8 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

(14 row(s) affected)

Table '#52593CB8'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.

##################################################################
##################################################################

(14 row(s) affected)

Table 'Worktable'. Scan count 16, logical reads 30648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Client'. Scan count 1, logical reads 276, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 75 ms.

Having got this information, how should I check which is better. How do i sum the logical reads. Is it simply adding all the logical reads or has some formula?

When I sum all the logical reads, I get 1707 for the 200 version, and 30924 for the CTE.

I hope i am not asking too many questions and bothering you people by still not closing this thread. Thanks for the time you have spared for this thread so far.

No comments:

Post a Comment