I admit that sp_prepare is an odd bird, and thankfully one that isn’t used a ton. I still run into applications that are unfortunate enough to have been written by people who hate bloggers and continue to use it, though, so here goes.
When you use sp_prepare, parameterized queries behave differently from normal: the parameters don’t get histogram cardinality estimates, they get density vector cardinality estimates.
Here’s a quick demo to show you that in action:
CREATE INDEX
p
ON dbo.Posts
(ParentId)
WITH
(
SORT_IN_TEMPDB = ON,
DATA_COMPRESSION = PAGE
);
DECLARE
@handle int =
NULL,
@parameters nvarchar(MAX) =
N'@ParentId int',
@sql nvarchar(MAX) =
N'
SELECT
c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.ParentId = @ParentId;
';
EXEC sys.sp_prepare
@handle OUTPUT,
@parameters,
@sql;
EXEC sys.sp_execute
@handle,
184618;
EXEC sys.sp_execute
@handle,
0;
EXEC sys.sp_unprepare
@handle;
The plans for the two executions have the same poor cardinality estimate. In this case, since we have an ideal index and there’s no real complexity, there’s no performance issue.
But you can probably guess (at least for the second query) how being off by 201,694,000% might cause issues in queries that ask a bit more of the optimizer.
The point here is that both queries get the same incorrect estimate of 3 rows. If you add a recompile hint, or execute the same code using sp_executesql, the first query will get a histogram cardinality estimate, and the second query will reuse it.
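If you want to check the math behind the flat guess, or see the sp_executesql version for yourself, here's a quick sketch. The 3-row estimate is just the all density of the stats object multiplied by the table's row count.

/*The stats object here is the index created above*/
DBCC SHOW_STATISTICS('dbo.Posts', p) WITH DENSITY_VECTOR;

/*Same query via sp_executesql: the first execution gets a
  histogram estimate, and later executions reuse that plan*/
EXEC sys.sp_executesql
    N'
SELECT
    c = COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE p.ParentId = @ParentId;
    ',
    N'@ParentId int',
    @ParentId = 184618;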
Given the historical behavior of sp_prepare, I was a little surprised that the Parameter Sensitive Plan (PSP) optimization available in SQL Server 2022 kicked in.
If we change the database compatibility level to 160, the plans change a bit.
ALTER DATABASE StackOverflow2013 SET COMPATIBILITY_LEVEL = 160;
Now we see two different plans without a recompilation, as well as the plan per value option text at the end of the queries, indicating the PSP optimization kicked in.
The differences here are fairly obvious, but let's spell them out:
Each plan gets accurate cardinality
The second plan goes parallel to make processing ~6 million rows faster
Different aggregates more suited to the amount of data in play are chosen (the hash match aggregate is eligible for batch mode)
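If you're curious which queries got variant plans, and you have Query Store enabled, SQL Server 2022 tracks the parent/variant relationship in a catalog view. A quick sketch, assuming Query Store is on for the database:

SELECT
    qv.query_variant_query_id,
    qv.parent_query_id,
    qv.dispatcher_plan_id
FROM sys.query_store_query_variant AS qv;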
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Imagine you have a rather complicated query that you want to abstract into a simple query for your less-than-with-it end users.
A view is probably a pretty good way of doing that, since you can shrink your preposterously-constructed tour through every table in the schema down to a simple select-from-one-object.
The problem is that now everyone expects it to perform well throughout all time, under any circumstances, come what may. It’s sort of like how your parents expect dinner to be $20 and tips to be 20% regardless of where they go or what they order.
Lobster? $5.
Steak? $5.
Bottle of wine? $5.
Any dessert you can imagine? $5.
Tip? Gosh, mister, another $5?
I sincerely apologize to anyone who continues to live in Europe, or who moved there to avoid tipping.
If you’d like some roommates, I have some parents you’d get along with.
Viewfinder
Creating a view in SQL Server doesn’t do anything special for you, outside of not making people remember your preposterously-constructed query.
You can put all manner of garbage in your view, make it reference another half dozen views full of garbage, and expect sparkling clean query performance every time.
Guess what happens?
Reality.
When you use views, the only value is abstraction. You still need to be concerned with how the query is written, and if the query has decent indexes to support it. In other words, you can’t just write a view and expect the optimizer to do anything special with it.
SQL Server doesn’t cache results; it only caches raw data. If you want the results of a view to be saved, you need to index it.
Take these two dummy queries, one against a created view, and the other an ad hoc query identical to what’s in the view:
CREATE OR ALTER VIEW
dbo.just_a_query
WITH SCHEMABINDING
AS
SELECT
p.OwnerUserId,
TotalScore =
ISNULL
(
SUM(p.Score),
0
),
TotalPosts =
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
(
SELECT
1/0
FROM dbo.Votes AS v
WHERE
v.PostId = p.Id
)
GROUP BY
p.OwnerUserId;
GO
SELECT
p.OwnerUserId,
TotalScore =
ISNULL
(
SUM(p.Score),
0
),
TotalPosts =
COUNT_BIG(*)
FROM dbo.Posts AS p
WHERE EXISTS
(
SELECT
1/0
FROM dbo.Votes AS v
WHERE
v.PostId = p.Id
)
AND
p.OwnerUserId = 22656
GROUP BY
p.OwnerUserId;
GO
SELECT
jaq.*
FROM dbo.just_a_query AS jaq
WHERE
jaq.OwnerUserId = 22656;
GO
The plans are identical, and identically bad. Why? Because I didn’t try very hard, and there are no good indexes for them.
Remember when I said that’s important?
Keep in mind this is a query with some batch mode involved, so it could be a lot worse. But both instances complete within a second or so of each other.
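If I had tried harder, a couple of indexes roughly like these would have helped both versions (the names and key choices are just a sketch):

CREATE INDEX
    p
ON dbo.Posts
    (OwnerUserId)
INCLUDE
    (Score)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

CREATE INDEX
    v
ON dbo.Votes
    (PostId)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);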
The horrible thing is that indexed views are so strict in SQL Server that we can’t even create one on the view in question. That really sucks. We get this error.
CREATE UNIQUE CLUSTERED INDEX
cuqadoodledoo
ON dbo.just_a_query
(
OwnerUserId
)
WITH
(
SORT_IN_TEMPDB = ON,
DATA_COMPRESSION = PAGE
);
Msg 10127, Level 16, State 1, Line 95
Cannot create index on view “StackOverflow2013.dbo.just_a_query” because it contains one or more subqueries.
Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
Alternatively, go screw yourself. Allowing joins but not exists is somewhat baffling, since exists is the simpler construct: joins allow for multiple matches, but exists does not. We’d have to do a lot of fancy grouping footwork to get equivalent results with a join, since distinct isn’t allowed in an indexed view in SQL Server either.
We could also pull the exists out of the view, add the Id column to the select list, group by that and OwnerUserId, index both of them, and… yeah nah.
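For the morbidly curious, that rewrite would look something like this: one row per post that has votes, with the per-user aggregation pushed out to query time. I'm reusing the not_just_a_query name here since it's genuinely not just a query anymore; this is a sketch, not a recommendation.

CREATE OR ALTER VIEW
    dbo.not_just_a_query
WITH SCHEMABINDING
AS
SELECT
    p.OwnerUserId,
    p.Id,
    p.Score,
    /*COUNT_BIG(*) is required for an indexed view with GROUP BY*/
    VoteCount =
        COUNT_BIG(*)
FROM dbo.Posts AS p
JOIN dbo.Votes AS v
  ON v.PostId = p.Id
GROUP BY
    p.OwnerUserId,
    p.Id,
    p.Score;
GO

CREATE UNIQUE CLUSTERED INDEX
    cuqadoodledoo
ON dbo.not_just_a_query
    (OwnerUserId, Id)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

Getting per-user totals back out then means summing Score over that view, which is exactly the fancy grouping footwork nobody wants to maintain.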
I have no idea who’s in charge of indexed views in the product at this point, but a sufficiently lubricated republic would likely come calling with tar and feathers in the face of this injustice.
This is basic query syntax. It’s not like uh… min, max, sum, avg, except, intersect, union, union all, cross apply, outer apply, outer joins, or um, hey, is it too late for me to change careers?
The Pain In Pain Falls Painly On The Pain
You may have ended up here looking to learn all the minute differences between views and indexed views in SQL Server.
You may be disappointed in reading this post, but I can assure you that you’re not nearly as disappointed in this post as I am with indexed views in SQL Server.
They’re like one of those articles about flying cars where you read the headline and you’re like “woah, I’m living in the future”, but then three paragraphs in you find out the cars don’t really fly or drive and they might actually just be igloos that are only big enough for an Italian Greyhound or a paper plane that the author’s kid glued wheels to.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
When you’re trying to figure out how to store string data, it often seems easiest to just choose an extra long — even MAX — data type to avoid future truncation errors.
Even if you’re storing strings with a known, absolute length, developers may choose to not enforce that in the application, either via a drop down menu or other form of validation.
And so to avoid errors when users try to put their oh-so-important data in their oh-so-expensive database, we get columns added to tables that can fit a galaxy of data in them, when we only need to store an ashtray’s worth of data.
While getting data into those columns is relatively easy — most application inserts are single rows — getting data out of those columns can be quite painful, whether it’s searching or just presenting in the select portion of a query.
Let’s look at a couple simple examples of how that happens.
Search Engine
Let’s take a query like this one:
SELECT TOP (20)
p.Id,
p.Title,
p.Body
FROM dbo.Posts AS p
WHERE p.Body LIKE N'SQL Server%';
The Body column in the Posts table is nvarchar(max), but the same thing would happen with a varchar(max) column.
If you need a simple way to remember how to pronounce those data types, just remember to Pahk yah (n)vahcah in Hahvahd Yahd.
Moving on – while much has been written about leading wildcard searches (that start with a % sign), we don’t do that here. Also, in general, using charindex or patindex instead of leading wildcard like searching won’t buy you all that much (if anything at all).
Anyway, since you can’t put a MAX datatype in the key of an index, part of the problem with them is that there’s no way to efficiently organize the data for searching. Included columns don’t do that, and so we end up with a query plan that looks some-such like this:
We spend ~13.5 seconds scanning the clustered index on the Posts table, then about two minutes and twenty seven seconds (minus the original 13.5) applying the predicate looking for posts that start with SQL Server.
That’s a pretty long time to track down and return 19 rows.
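If this search really mattered, one common workaround is to index a fixed-length slice of the column through a computed column. A sketch, assuming you're allowed to alter the table (the column and index names are made up):

/*A deterministic slice of Body that's small enough to be an index key*/
ALTER TABLE dbo.Posts
ADD BodyPrefix AS CONVERT(nvarchar(100), LEFT(Body, 100));

CREATE INDEX
    body_prefix
ON dbo.Posts
    (BodyPrefix)
WITH
    (SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);

/*Any prefix search shorter than the slice can seek here*/
SELECT TOP (20)
    p.Id,
    p.Title,
    p.Body
FROM dbo.Posts AS p
WHERE p.BodyPrefix LIKE N'SQL Server%';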
Let’s change the query a little bit and look at how else big string columns can cause problems.
Memory Bank
Rather than search on the Body column, let’s select some values from it ordered by the Score column.
Since Score isn’t indexed, it’s not sorted in the database. That means SQL Server needs to ask for memory to put the data we’re selecting in the order we’re asking for.
SELECT TOP (200)
p.Body
FROM dbo.Posts AS p
ORDER BY p.Score DESC;
The plan for this query asks for a 5GB memory grant:
I know what you’re thinking: the Body column probably has some pretty big data in it, and you’re right. In this case, it’s the right data type to use.
The bad news is that SQL Server will make the same memory grant estimation based on the size of the data we need to sort, whether or not it’s a good choice.
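One way to soften that, sketched here rather than prescribed: sort only the narrow columns first, then join back for the wide one. The expensive sort gets sized for Id and Score, and only 200 rows' worth of Body shows up at the end.

SELECT
    p2.Body
FROM
(
    /*The big sort only handles Id and Score*/
    SELECT TOP (200)
        p.Id,
        p.Score
    FROM dbo.Posts AS p
    ORDER BY p.Score DESC
) AS ids
JOIN dbo.Posts AS p2
  ON p2.Id = ids.Id
ORDER BY ids.Score DESC;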
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Several weeks back, I blogged about a missed opportunity with the new parameter sensitive plan feature. At the time, I thought that there was indeed sufficient skewness available to trigger the additional plan variants, and apparently some nice folks at Microsoft agreed.
I’m not gonna go into a lot of depth here, because I’m writing this while traveling, but we get the (mostly) desired outcome of two different plans being generated. Each plan is more suitable to the amount of data that the query has to process.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Why a DATETRUNC function? Because that’s the way it’s implemented in Postgres and DB2. Oracle, of course, just calls it TRUNC.
So, while it’s nice to have (what appears to be) the same behavior, it doesn’t exactly help to not have a 1:1 calling equivalent to other platforms.
I assume most of these additions to T-SQL are for cross-platform development and migrations.
Of course, Microsoft being so gosh darn late to this game means folks have probably been rolling-their-own versions of these functions for years.
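The classic roll-your-own looks something like this, for anyone who has been flooring dates since before 2022:

SELECT
    /*DATEDIFF counts whole units since the zero date,
      and DATEADD adds them back to land on the boundary*/
    year_start =
        DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0),
    month_start =
        DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0),
    day_start =
        DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0);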
If they went and called their system function DATE_TRUNC or even TRUNC, they might have some object naming issues to contend with.
Well, okay. But how does it work?
Childish Games
Here are some quick examples of how you call it.
SELECT TOP (10)
u.DisplayName,
year =
DATETRUNC(YEAR, u.LastAccessDate),
quarter =
DATETRUNC(QUARTER, u.LastAccessDate),
month =
DATETRUNC(MONTH, u.LastAccessDate),
dayofyear =
DATETRUNC(DAYOFYEAR, u.LastAccessDate),
day =
DATETRUNC(DAY, u.LastAccessDate),
week =
DATETRUNC(WEEK, u.LastAccessDate),
iso_week =
DATETRUNC(ISO_WEEK, u.LastAccessDate),
hour =
DATETRUNC(HOUR, u.LastAccessDate),
minute =
DATETRUNC(MINUTE, u.LastAccessDate),
second =
DATETRUNC(SECOND, u.LastAccessDate),
millisecond =
DATETRUNC(MILLISECOND, u.LastAccessDate),
microsecond =
DATETRUNC(MICROSECOND, u.LastAccessDate) /*Doesn't work with datetime because there are no microseconds*/
FROM dbo.Users AS u;
And here are the results:
The thing to note here is that there’s no rounding logic involved. You just go to the start of whatever unit of time you choose. Of course, this doesn’t seem to do anything to the millisecond portion of DATETIME, because it’s not precise enough.
But for anyone out there who was hoping for a SOMONTH function to complement the EOMONTH function, you get this instead.
Works well enough!
But does it perform, Darling?
UnSARGable?
To make any test like this worthwhile, we need an index to make data searchable.
CREATE INDEX
v
ON dbo.Votes
(CreationDate)
WITH
(
SORT_IN_TEMPDB = ON,
DATA_COMPRESSION = PAGE
);
You know, because that’s what they do. To make searching faster. Hello.
So look, under these perfect circumstances, everything performs well. But we have to do a lot of typing.
SELECT
c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, CONVERT(datetime, '20130101 00:00:00.000'));
Note here that we’re working on a literal value, not a column value, and we have to tell the datetrunc function which type we want via the convert function so that we get a simple seek plan:
SELECT
c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE v.CreationDate >= DATETRUNC(YEAR, '20130101 00:00:00.000');
This has some… obvious performance issues compared to the above plan with correct data types.
Query Pranks
Frequent blog readers will not be surprised that wrapping a table column in the new DATETRUNC function yields old performance problems:
SELECT
c = COUNT_BIG(*)
FROM dbo.Votes AS v
WHERE DATETRUNC(YEAR, v.CreationDate) >= CONVERT(datetime, '20130101 00:00:00.000');
This is particularly annoying because we’re only truncating the column to the beginning of the year, and truncated values follow the same sort order as the index, so you might reasonably expect the predicate to be seekable.
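Since truncation only ever moves values down, the predicate above is equivalent to a plain range check against the year boundary, which is the sketch I'd reach for instead:

SELECT
    c = COUNT_BIG(*)
FROM dbo.Votes AS v
/*Truncating to the year and comparing to a year boundary
  is the same as comparing the raw column to that boundary*/
WHERE v.CreationDate >= CONVERT(datetime, '20130101');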
Like most functions, these are fine in the presentation layer, but terrible in the relational layer. There’s no warning about this performance degradation in the documentation, either in the example that uses the function in a where clause, or in the final closing remarks.
But that’s par for the course with any of these built-in functions.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Let’s say for the sake of argument that @table variables were always more “in-memory” than #temp tables (they’re not). They’d still have a lot of the same fundamental problems:
No column-level statistics
A one-row cardinality guess before SQL Server 2019, and no column statistics even with deferred compilation after it
No parallel plans for queries that modify them
No row-count-based recompiles, the way #temp tables get
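Here's a minimal sketch of the estimate problem, using the Posts table from the usual StackOverflow2013 database:

DECLARE @t table
(
    Id int NOT NULL PRIMARY KEY
);

INSERT @t
    (Id)
SELECT TOP (100000)
    p.Id
FROM dbo.Posts AS p;

/*Without OPTION(RECOMPILE), the optimizer guesses at how many rows
  are in @t rather than using the 100,000 that are actually there,
  and it never has column statistics to work with*/
SELECT
    c = COUNT_BIG(*)
FROM @t AS t
JOIN dbo.Posts AS p
  ON p.Id = t.Id;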
With all that stuff in mind: who cares if they’re only in memory? There are so many downsides for most query patterns, they could all happen in quantum space and performance would still stink.
And here’s the other thing: Most of y’all don’t have enough memory to begin with, and the memory you do have you misuse in a million ways. Why do you wanna make that worse?
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Most of the time, if you get the estimated plan for a query, the actual execution plan will look about the same.
You’ll get the additional information of course, but the shape and most of the other attributes will match up.
Other times, things will change between the two.
Here’s a silly example I came across recently while working on demos for some other stuff.
SELECT TOP (100)
p.Id,
vs.UpVotes,
vs.DownVotes
FROM dbo.VoteStats() AS vs
JOIN dbo.Posts AS p
ON vs.PostId = p.Id
WHERE vs.DownVotes > vs.UpMultipier
AND p.CommunityOwnedDate IS NULL
AND p.ClosedDate IS NULL
ORDER BY vs.UpVotes DESC;
Estimated Execution Plan
Here’s the estimated plan I get for this query:
SQL Server estimates that a serial execution plan will work best, here.
Actual Execution Plan
Here’s the actual execution plan for the same query:
At runtime, SQL Server decides that a parallel plan is the better solution.
Wild Nights
SQL Server can totally estimate parallel plans. I’ve seen it a million times.
In this case, though, it doesn’t do that. If you get the estimated plan after executing the query, it’ll come back with the parallel version.
Why does this happen? Well, as SQL Server adds features to Intelligent Query Processing, more and more decisions get made when a query executes.
Stuff like this can make estimated and actual plans look much different from each other (there’s a sketch of a per-query off-switch after the list):
Adaptive joins
Interleaved execution
Table variable deferred compilation
Batch mode on row store
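If you ever need the two shapes to line up for a comparison, some of those features have per-query off-switches. A sketch, not a recommendation:

SELECT TOP (100)
    p.Id,
    vs.UpVotes,
    vs.DownVotes
FROM dbo.VoteStats() AS vs
JOIN dbo.Posts AS p
  ON vs.PostId = p.Id
WHERE vs.DownVotes > vs.UpMultipier
AND   p.CommunityOwnedDate IS NULL
AND   p.ClosedDate IS NULL
ORDER BY vs.UpVotes DESC
/*Disables one of the runtime decisions listed above for this query only*/
OPTION (USE HINT('DISABLE_BATCH_MODE_ADAPTIVE_JOINS'));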
Wonders of modern query optimizers, eh?
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
When we talk about finding queries to tune, there’s an unfortunate term that gets thrown around: Expensive Queries.
Why is it unfortunate? Well, it reinforces the wrong mindset when it comes to query tuning, and leads people down the wrong path, looking at the wrong metrics.
SQL Server does have a cost-based optimizer, but those costs don’t mean anything to your hardware, or even to how long a query runs.
Those costs are all estimates, based on computer specs from two decades ago. They’re often wrong, or out of line with reality.
Worse, it leads people to want to do crazy things like sort the plan cache by query cost to find things to tune.
Worse than that, they’ll look at “query cost relative to batch” to compare two queries for efficiency.
Ethically Sourced
There are many sources to find queries eating up your server hardware.
The point of this post isn’t to teach you how to use any of those things, but to teach you how to be smarter about using them in whatever way you’re comfortable.
My two favorite metrics to check when hunting for queries to tune are CPU and RAM. I explain why in the post, but the short story is that they’re reliable, real-life metrics that can be directly measured before and after to gauge progress.
I don’t look at things like reads, because those might go up or down while your query runtime doesn’t change at all.
They’re also pretty misleading if you’re looking at STATISTICS IO in a lot of circumstances, like with lookups.
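If you want a quick starting point, here's a sketch against sys.dm_exec_query_stats that sorts the plan cache by CPU instead of by cost (the usual caveats apply: it only sees what's still in cache, and max_grant_kb needs a reasonably modern version):

SELECT TOP (10)
    total_cpu_ms =
        qs.total_worker_time / 1000.,
    avg_cpu_ms =
        (qs.total_worker_time / qs.execution_count) / 1000.,
    max_grant_kb =
        qs.max_grant_kb,
    executions =
        qs.execution_count,
    /*Pull out just the offending statement from the batch text*/
    query_text =
        SUBSTRING
        (
            st.text,
            (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1
                  THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1
        )
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;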
Sortie
A while back I recorded a bunch of videos that show how cached/estimated plans can be really misleading when it comes to costs and all that stuff.
You can find them here:
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.
USE StackOverflow2013;
EXEC dbo.DropIndexes;
SET NOCOUNT ON;
DBCC FREEPROCCACHE;
GO
CREATE INDEX
chunk
ON dbo.Posts
(OwnerUserId, Score DESC)
INCLUDE
(CreationDate, LastActivityDate)
WITH
(MAXDOP = 8, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
CREATE OR ALTER VIEW
dbo.PushyPaul
WITH SCHEMABINDING
AS
SELECT
p.OwnerUserId,
p.Score,
p.CreationDate,
p.LastActivityDate,
PostRank =
DENSE_RANK() OVER
(
PARTITION BY
p.OwnerUserId
ORDER BY
p.Score DESC
)
FROM dbo.Posts AS p;
GO
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656;
GO
CREATE OR ALTER PROCEDURE
dbo.StinkyPete
(
@UserId int
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = @UserId;
END;
GO
EXEC dbo.StinkyPete
@UserId = 22656;
/*Start Here*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION SIMPLE;
DBCC TRACEOFF
(
4199,
-1
);
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Let's cause a problem!*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION FORCED;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Can we fix the problem?*/
DBCC TRACEON
(
4199,
-1
);
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*That's kinda weird...*/
DBCC FREEPROCCACHE;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Turn Down Service*/
DBCC TRACEOFF
(
4199,
-1
);
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Okay then.*/
/*I'm different.*/
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;
SELECT
p.*
FROM dbo.PushyPaul AS p
WHERE p.OwnerUserId = 22656
AND 1 = (SELECT 1); /*Avoid trivial plan/simple parameterization*/
/*Cleanup*/
ALTER DATABASE
StackOverflow2013
SET PARAMETERIZATION SIMPLE;
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = OFF;
DBCC TRACEOFF
(
4199,
-1
);
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.