ACCESS_METHODS_DATASET_PARENT – Not one I’d heard of either!

I was doing some database tuning recently and I found a missing index that I wanted to add. This is a reasonably straightforward thing to want to do, so I scripted it up and executed the command and went to grab a coffee.


CREATE INDEX IX_Test ON dbo.Test(TestColumn);

<5 minutes passes>

So, I come back to my desk only to find that the index hasn’t finished creating yet! This was unexpected since it was a reasonably narrow index on a table that was only 2-3GB in size.

Using some queries to dig into the DMV’s and a look at the waits I see my index is waiting on LATCH_EX with a latch class of ACCESS_METHODS_DATASET_PARENT, and it had been waiting from the moment I left my desk! This was not a wait type I was familiar with so some research was required.

Reaching for my favourite search engine I soon stumbled upon this blog post from Paul Randal http://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/.

Basically following his advice and doing some digging I found that the MAXDOP on this server was set to 0 which is the default. This is a 24 core server and I wouldn’t normally advise setting MAXDOP to 0 on a server of this size. The cost threshold for parallelism was set to 5 (also the default) which is quite low considering the workloads performed by this box.

In order to get around the problem I discussed changing the MAXDOP of this server to 8 but the team responsible for it didn’t want to make the change at that time, and opted to change it at a later date. Great, what now? I needed this index and I needed it now…

On this occasion I opted to reach for a MAXDOP hint. For those that didn’t know, you can apply a MAXDOP hint to an index creation statement. The syntax is shown below:


CREATE INDEX IX_Test ON dbo.Test(TestColumn) WITH (MAXDOP = 4);

This time when I executed the script the index creation took only 2 minutes, and the procedures that needed it were now executing much faster than before.

Essentially I’ve written this post in the hope that it helps someone else out if they stumble across the same problem. Aside from Paul’s blog linked above I couldn’t really find any other useful troubleshooting advice for this particular issue. Happy troubleshooting!

Calculating Age with an Inline Table Valued Function

I was working on a data warehousing project recently where I was working with VERY large data sets. It was a customer insight warehouse and as part of that warehouse details about customers were stored including their DOB.

Part of the requirement was to keep a record of their age at the time of the transaction and also be able to track that person’s current age.

As with most things we always try to re-use something (a piece of code, a process) that we’ve used before to save time right?

So there was a UDF in the database called ufn_GetAge. This was a fairly typical scalar UDF that took two date parameters and then worked out the elapsed time between the two to calculate an age:


CREATE FUNCTION [dbo].[ufn_GetAge]
 (
 @DOB    DATETIME,

@Date       DATETIME
 )

RETURNS INT

AS

BEGIN

DECLARE @Age         INT

IF @DOB >= @Date

RETURN 0

SET @Age = DATEDIFF(YY, @DOB, @Date)

IF MONTH(@DOB) > MONTH(@Date) OR

(MONTH(@DOB) = MONTH(@Date) AND

DAY(@DOB)   > DAY(@Date))

SET @Age = @Age - 1

RETURN @Age

END

This had always “worked fine” before and so it was re-used as part of an ETL load for a VERY large data set (300+ million rows). So what happened when we ran the ETL? Well, all I can tell you is it took a very long time…

Some debugging on the ETL was performed and this function was found to be the culprit. A new solution was called for, enter the Inline Table Valued Function.

So… we needed an accurate way of calculating the difference between two dates in order to come up with an Age (in whole years). I wrote several functions that should have done the trick but under testing showed the results generated were not always quite correct. Eventually I came up with a piece of code that does the trick:


CREATE FUNCTION [dbo].[iTVF_CalculateAge]

(

@DOB date,

@Date date

)

RETURNS TABLE

AS

RETURN

(

SELECT

CASE

WHEN (@DOB IS NULL) OR (@Date IS NULL) THEN -1

ELSE DATEDIFF(hour,@DOB, @Date)/8766

END AS Age

)

CAVEAT: This always seems to come up with the correct answer and passed all the unit tests we ran on it. If you wish to use it, ensure that you test it to make sure it generates the results you need. If you do find an inaccuracy then do please let me know in the comments below this post.

So… how does this new inline function perform when compared to the original UDF above? I’m glad you asked!

Firstly I created a Test database and two test tables. Once called Test and once called Test2. Both contain two columns, an ID (INT IDENTITY (1,1)) and a DOB (DATE). I inserted 1 million dates into Test and 10 million dates into Test2.

Testing, Testing, 123….

Firstly the 1 million row dataset:


SET STATISTICS IO ON;

SET STATISTICS TIME ON;

SELECT t.ID, t.DOB FROM dbo.Test t;

SELECT t.ID, t.DOB, [dbo].[ufn_GetAge](t.DOB,GETDATE()) FROM dbo.Test t;

SELECT t.ID, t.DOB, age.Age

FROM dbo.Test t

CROSS APPLY [dbo].[iTVF_CalculateAge](t.DOB,GETDATE()) age;

SET STATISTICS IO OFF;

SET STATISTICS TIME OFF;

The first query is run merely to get an idea of raw performance to select out the data (no functions or calculations applied), the second query runs the UDF and the third uses the new inline function.

And so onto the results:

(1000000 row(s) affected)
Table 'Test'. Scan count 1, logical reads 1986, physical reads 3, read-ahead reads 1973, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 328 ms,  elapsed time = 5164 ms.
(1000000 row(s) affected)
Table 'Test'. Scan count 1, logical reads 1986, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5397 ms,  elapsed time = 6567 ms.
(1000000 row(s) affected)
Table 'Test'. Scan count 1, logical reads 1986, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 874 ms,  elapsed time = 5409 ms.

As you can see from the results, the raw select of all the data (1 million rows) it took 5164ms, the UDF took 6567ms and the iTVF took 5409ms. So on time alone the iTVF wins, but the real benefit is in the CPU cycles… the iTVF took only 874ms where the UDF took 5397ms! I declare iTVF the winner.

And over 10 million rows:

(10000000 row(s) affected)
Table 'Test2'. Scan count 1, logical reads 19838, physical reads 2, read-ahead reads 19834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 3152 ms,  elapsed time = 49175 ms.
(10000000 row(s) affected)
Table 'Test2'. Scan count 1, logical reads 19838, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 52494 ms,  elapsed time = 66646 ms.
(10000000 row(s) affected)
Table 'Test2'. Scan count 1, logical reads 19838, physical reads 2, read-ahead reads 19834, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 6724 ms,  elapsed time = 51387 ms.

The results are very clear. Over the 10 million row results set, using the iTVF we have saved nearly 48,000 ms in CPU time, and 15 seconds in elapsed time! When this was used where I’ve been working recently we took an 18 minute query down to 10 seconds. That is a hell of a saving.

As the result set grows the savings grow, I encourage you to consider converting your UDF’s to iTVF’s and start boosting the performance of some of your procedures!