I was quite publicly mystified by the coupling of Azure Data Studio into the SQL Server Management Studio installer. They’re different tools for different people.
This isn’t a tirade against Azure Data Studio, nor is it a victory lap since apparently the feedback item I posted was part of the decision to remove it.
This is purely informational, since the announcement that it’s not in there anymore is nearly as quiet as the announcement that it was being included back in SQL Server Management Studio 18.7, and only slightly louder than the availability of a command line switch to skip installing it.
Back when I initially complained about the inability to skip installing Azure Data Studio, there was a lot of talk about how SQL Server Management Studio would increasingly rely on it for new functionality. I suppose that was either untrue, or the roadmap changed significantly.
Quite cynically, I thought it was a cheap way to increase the install base of a new product, but who knows? I assume Microsoft has better telemetry about usage than binaries just existing. Again, who knows?
A further miffance was that you could download and install Azure Data Studio independently, but not SQL Server Management Studio.
For SSMS 20 we have removed Azure Data Studio from the installation. In addition to being a highly voted item (Make ADS An Optional Install Alongside SSMS · Community (azure.com)), this aligns with the future extension work for SSMS 21 and we decided to make the change in the current release.
So, hooray! Dreams do come true, etc. When I was but a young boy, I dreamed that someday my desires would be aligned with future extension work for SSMS.
I’d also like to say here that being the public face of SQL Server Management Studio makes Erin Stellato about the bravest soul I know. It’s a tool that millions of people rely on, and nearly everyone actively complains about. Part of my RSS feed is not only posts from Microsoft about data platform related stuff, but also the comment feed. She has saintly patience in her responses to the comments on these things. Comments on the internet maintain their position and title as the worst things on earth. Erin deserves infinite credit for doing the job that she does as well as she does it.
If you’re still stuck using older versions because of a new bug or an old feature, you can skip installing Azure Data Studio by running the installer via the command prompt:
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
I have never once seen anyone use these. The most glaring issue with them is that unlike a lot of other directives in SQL, these ones just don’t do a good job of telling you what they do, and their behavior is sort of weird.
Unlike EXISTS and NOT EXISTS, which state their case very plainly, as do UNION and UNION ALL, figuring these out is not the most straightforward thing. Especially since INTERSECT has operator precedence rules that many other directives do not.
INTERSECT gives you a set of unique rows from both queries
EXCEPT gives you a set of unique rows from the “first” query
So, cool, if you know you want a unique set of rows from somewhere, these are good places to start.
What’s better, is that they handle NULL values without a lot of overly-protective syntax with ISNULL, COALESCE, or expansive and confusing OR logic.
The tricky part is spotting when you should use these things, and how to write a query that makes the most of them.
And in what order.
Easy Examples
Often the best way to get a feel for how things work is to run simple queries and test the results vs. your expectations, whatever they may be.
I like these queries, because the UserId column in the Comments table is not only NULLable, but contains actual NULLs. Wild, right?
SELECT
c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND c.Score > 2
INTERSECT
SELECT
c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND c.Score > 3
ORDER BY
c.Score;
Running this will return results where a Comment’s Score is greater than 3 only, because that’s the starting point for where both query results begin to match results across all the columns.
Note that the UserId column being NULL doesn’t pose any problems here, and doesn’t require any special handling. Like I said. And will keep saying. Please remember what I say, I beg of you.
Moving on to EXCEPT:
SELECT
c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND c.Score > 2
EXCEPT
SELECT
c.*
FROM dbo.Comments AS c
WHERE c.UserId IS NULL
AND c.Score > 3
ORDER BY
c.Score;
This will only return results from the “first” query (often referred to as the left, or outer query) with a Score of 3, because that’s the only data that exists in it that isn’t also in the “second” (or right, or inner) query.
Both queries will find many of the same rows after Score hits 2 or 3, but those get filtered out to show only the difference(s) between the two.
In case it wasn’t obvious, it’s a bit like using NOT EXISTS, in that rows are only checked, and not projected from the second/right/inner query, looking for Scores greater than 3.
Again, NULLs in the UserId column are handled just fine. No ISNULL/COALESCE/OR gymnastics required.
While no database platform adheres strictly or urgently to ANSI standards, waiting 20 years for an implementation in SQL Server is kind of really-extra-super-duper son-of-a-gun boy-howdy dag-nabbit-buster alright-bucko hold-your-horses listen-here-pal levels of irritating.
Think of all the useless, deprecated, and retired things we’ve gotten in the past 20 years instead of basic functionality. It’s infinitely miffing.
Anyway, I like these additions quite a lot. In many ways, these are extensions of INTERSECT and EXCEPT, because the workarounds involved for them involved those very directives. Sort of like Microsoft finally adding GREATEST and LEAST, after decades of developers wondering just what the hell to do instead, I hope they didn’t show up too late to keep SQL Server from being bullied by developers who are used to other platforms.
We can finally start to replace mutton-headed, ill-performing syntax like this:
SELECT
c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.LastEditorUserId
OR (p.LastEditorUserId IS NULL);
With stuff that doesn’t suck, like this:
SELECT
c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId;
The query plan timings tell enough of a story here:
But not everyone is able to use the latest and greatest (or least and greatest, ha ha ha) syntax. And the newest syntax isn’t always better for performance, without additional tweaks.
And that’s okay with me. I do performance tuning for a living, and my job is to know all the available options and test them.
Like here. Like now.
The Only One I Know
Let’s compare these two queries. It’ll be fun, and if you don’t think it’s fun, that’s why you’ll pay me. Hopefully.
SELECT
c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId;
SELECT
c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id = p.LastEditorUserId
WHERE EXISTS
(
SELECT p.LastEditorUserId FROM dbo.Posts AS p
INTERSECT
SELECT u.Id FROM dbo.Users AS u
);
Here’s the supporting index that I have for these queries:
CREATE INDEX
LastEditorUserId
ON dbo.Posts
(LastEditorUserId)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
It’s good enough. That’s what counts, I guess. Showing up.
20th Century Boy
At first glance, many queries may appear to be quite astoundingly better. SQL Server has many tricks up its sleeves in newer versions, assuming that you’re ready to embrace higher compatibility levels, and pay through the nose for Enterprise Edition.
This is a great example. Looking at the final query timing, you might think that the new IS [NOT] DISTINCT FROM syntax is a real dumb dumb head.
But unless you’re invested in examining these types of things, you’ll miss subtle query plan difference, which is why you’ll pay me, hopefully,
The second query receives the blessing of Batch Mode On Row Store, while the first does not. If we use the a helper object to get them both functioning on even terms, performance is quite close:
SELECT
c = COUNT_BIG(*)
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON u.Id IS NOT DISTINCT FROM p.LastEditorUserId
LEFT JOIN dbo.columnstore_helper AS ch
ON 1 = 0;
In this case, the slightly tweaked query just slightly edges out the older version of writing the query.
I Can’t Imagine The World Without Me
There are many ways to write a query, and examine the performance characteristics. As SQL Server adds more options, syntax, capabilities, and considerations, testing and judging them all (especially with various indexing strategies) becomes quite an endeavor.
I don’t blame developers for being unaware or, or unable to test a variety of different rewrites and scenarios. The level of understanding that it takes to tune many queries extends quite beyond common knowledge or sense.
The aim of these posts is to give developers a wider array of techniques, and a better understanding of what works and why, while exposing them to newer options available as upgrade cycles march bravely into the future.
Keeping up with SQL Server isn’t exactly a full time job. Things are changed and added from release to release, which are years apart.
But quite often I find companies full of people struggling to understand basic concepts, troubleshooting, and remediations that are nearly as old as patches for Y2K bugs.
My rates are reasonable, etc.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
UNION and UNION ALL seem to get used with the same lack of discretion and testing as several other things in T-SQL: CTEs vs temp tables, temp tables vs. table variables, etc.
There are many times I’ve seen developers use UNION when result sets have no chance of being non-unique anyway, and many times I’ve seen them use UNION ALL when there would be a great benefit to discarding unnecessary duplicates.
Even worse is when the whole query is written incorrectly in the first place, and both DISTINCT and UNION are dumped all over queries to account for unwanted results across the board.
For example, someone may test a query in isolation, decide that DISTINCT needs to be applied to that result set, and then use UNION when appending another set of results to the final query. Throw in the typical slew of NOLOCK hints and one is left to wonder if anyone even understands what correct output might look like at all.
The answer to most questions about the correct way to write a query of course hinge on the quality of the underlying data, and any observed flaws reported by end users or QA testers.
This all becomes quite difficult to wade through, because developers may understand the correct logic, but not the correct way to implement it.
Just An Onion
To start, let’s flesh out what each operator means in the most basic way.
Using a nifty SQL Server 2022 function, and the power of batch separator loops, we’re going to load the numbers 1-2 into two tables, twice.
CREATE TABLE
#t1
(
i integer
);
INSERT
#t1 (i)
SELECT
gs.*
FROM GENERATE_SERIES(1, 5) AS gs;
GO 2
CREATE TABLE
#t2
(
i integer
);
INSERT
#t2 (i)
SELECT
gs.*
FROM GENERATE_SERIES(1, 6) AS gs;
GO 2
Doing this will provide a unique set of the numbers 1-6 from both temporary tables.
SELECT
t.i
FROM #t1 AS t
UNION
SELECT
t.i
FROM #t2 AS t;
Which is not logically equivalent to doing this:
SELECT DISTINCT
t.i
FROM #t1 AS t
UNION ALL
SELECT DISTINCT
t.i
FROM #t2 AS t;
The first query will not only deduplicate rows within each query, but also in the final result.
The second query will only deduplicate results from each query, but not from the final result.
To avoid playing word games with you, the first query will return the numbers 1-6 only once, and the second query will return 1-5 once, and 1-6 once.
Some additional sense can be made of the situation by looking at the query plans, and where the distinctness is applied.
To put things plainly: if you’re already using UNION to bring to results together, there’s not a lot of sense in adding DISTINCT to each query.
Precedence, etc.
To better understand how UNION and UNION ALL are applied, I’d encourage you to use this simple example:
/*Changing these to UNION makes no difference*/
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
/*Changing these to UNION makes a difference*/
UNION ALL
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3;
Specifically, look at the actual execution plans for these as you quote out ALL from the lines below the second comment.
You may even be surprised by what comes back when you get to the final UNION-ed select!
Orthodoxy
There has been quite a bit of performance debate about UNION and UNION ALL. Obviously, using UNION incurs some overhead to deduplicate results.
When you need it for result correctness, I’d encourage you to think about a few things:
The number of columns you’re selecting
The data types of the columns you’re selecting
What data actually identifies a unique row
I’ve come across many queries that were selecting quite a long list of columns, with lots of string data involved, that did a whole lot better using windowing functions over one, or a limited number of columns, with more manageable data types, to produce the desired results.
Here is a somewhat undramatic example:
DROP TABLE IF EXISTS
#u1;
SELECT
c.CreationDate,
c.PostId,
c.Score,
c.Text,
c.UserId
INTO #u1
FROM dbo.Comments AS c
WHERE c.Score IN (2, 9, 10)
AND c.UserId IS NOT NULL
UNION
SELECT
c.CreationDate,
c.PostId,
c.Score,
c.Text,
c.UserId
FROM dbo.Comments AS c
WHERE c.Score IN (3, 9, 10)
AND c.UserId IS NOT NULL;
DROP TABLE IF EXISTS
#u2;
SELECT
y.CreationDate,
y.PostId,
y.Score,
y.Text,
y.UserId
INTO #u2
FROM
(
SELECT
x.*,
n =
ROW_NUMBER() OVER
(
PARTITION BY
x.UserId,
x.Score,
x.CreationDate,
x.PostId
ORDER BY
x.UserId,
x.Score,
x.CreationDate,
x.PostId
)
FROM
(
SELECT
c.CreationDate,
c.PostId,
c.Score,
c.Text,
c.UserId
FROM dbo.Comments AS c
WHERE c.Score IN (2, 9, 10)
AND c.UserId IS NOT NULL
UNION ALL
SELECT
c.CreationDate,
c.PostId,
c.Score,
c.Text,
c.UserId
FROM dbo.Comments AS c
WHERE c.Score IN (3, 9, 10)
AND c.UserId IS NOT NULL
) AS x
) AS y
WHERE y.n = 1;
In the first query, we’re doing a straight union of all the columns in the Comments table, which includes the Text column (nvarchar 700).
In the second query, the UNION has been replaced by UNION ALL, and I’m using ROW_NUMBER on the non-text columns, and filtering to only the first result.
Here are the query plans:
If you’re looking at the second query plan and wondering why you’re not seeing the usual traces of windowing functions (window aggregates, or segment and sequence project, a filter operator to get n = 1), I’d highly suggest reading Undocumented Query Plans: The ANY Aggregate.
Like I said, this is a somewhat undramatic example. It only shaves about 500ms off the execution time, though that is technically about 30% faster in this scenario. It’s a good technique to keep in mind.
The index in place for these queries has this definition:
CREATE INDEX
c
ON dbo.Comments
(UserId, Score, CreationDate, PostId)
INCLUDE
(Text)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Is UNION Ever Better Than UNION ALL?
There have been a number of times when producing distinct results has improved things rather dramatically, but there are a couple general characteristics they all shared:
Producing unique rows, either via UNION or DISTINCT is not prohibitively time consuming
The source being unique-ified feeds into an operation that is time consuming
Here’s an example:
CREATE INDEX
not_badges
ON dbo.Badges
(Name, UserId)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
CREATE INDEX
not_posts
ON dbo.Posts
(OwnerUserId)
INCLUDE
(Score, PostTypeId)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
GO
DROP TABLE IF EXISTS
#waypops;
CREATE TABLE
#waypops
(
UserId integer NOT NULL
);
INSERT
#waypops WITH(TABLOCKX)
(
UserId
)
SELECT
b.UserId
FROM dbo.Badges AS b
WHERE b.Name IN
(
N'Popular Question', N'Notable Question',
N'Nice Question', N'Good Question',
N'Famous Question', N'Favorite Question',
N'Great Question', N'Stellar Question',
N'Nice Answer', N'Good Answer', N'Great Answer'
);
SELECT
wp.UserId,
SummerHereSummerThere =
SUM(ca.Score)
FROM #waypops AS wp
CROSS APPLY
(
SELECT
u.Score,
ScoreOrder =
ROW_NUMBER() OVER
(
ORDER BY
u.Score DESC
)
FROM
(
SELECT
p.Score,
p.OwnerUserId
FROM dbo.Posts AS p
WHERE p.OwnerUserId = wp.UserId
AND p.PostTypeId = 1
UNION /*ALL*/
SELECT
p.Score,
p.OwnerUserId
FROM dbo.Posts AS p
WHERE p.OwnerUserId = wp.UserId
AND p.PostTypeId = 2
) AS u
) AS ca
WHERE ca.ScoreOrder = 0
GROUP BY
wp.UserId
ORDER BY
wp.UserId;
Executing this query as UNION-ed results gives us a query that finishes in about 3 seconds.
Note that the Distinct Sort operator chosen to implement the desired results of the UNION reduces the rows from 13,300,793 to 2,406,354. This is especially important when Lazy Table Spools are involved.
Here’s the query plan when it’s executed with UNION ALL:
Execution time goes from 3 seconds to 33 seconds. You may notice that the numbers on the inner side of the nested loops join are much larger across the plan, and that the Lazy Table Spool goes from about 900ms (2.587 seconds minus 1.612 seconds) to taking about 18 seconds (20 seconds minus 1.7 seconds). The Nested Loops Join also suffers rather dramatically, taking nearly 9 seconds, instead of the original 300ms, largely owing to the fact that it has to deal with 946,677,558 additional rows.
You’d suffer, too. Mightily.
Championship Belt
Choosing between UNION and UNION ALL is of course primarily driven by logical query correctness, but you should fully consider which columns actually identify a unique row for your query.
There are sometimes better ways of identifying uniqueness than comparing every single column being selected out in the final result set.
When you run into slow queries that are using UNION and UNION ALL, it’s usually worth investigating the overall usage, and if using one over the other gives you better performance along with correct results.
Where UNION can be particularly troublesome:
You’re selecting a lot of columns (especially strings)
You’re attempting to deduplicating many rows
You’re not working with a primary key
You’re not working with useful supporting indexes
Where UNION ALL can be particularly troublesome:
You’re selecting a lot of rows, and many duplicates exist in it
You’re sending those results into other operations, like joins (particularly nested loops)
You’re doing something computationally expensive on the results of the UNION ALL
Keep in mind that using UNION/UNION ALL is a generally better practice than writing some monolithic query with endless OR conditions in it.
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Fixing Parallel Row Skew With TOP In SQL Server (With A Brief Re-Complaint About CXCONSUMER Waits)
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
A Little About SOS_SCHEDULER_YIELD Waits In SQL Server
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Views get a somewhat bad rap from performance tuners, but… It’s not because views are inherently bad. It’s just that we’ve seen things. Horrible things.
Attack ships on fire off the shoulder of Orion… I watched sea-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain…
The problem is really the stuff that people stick into views. They’re sort of like a junk drawer for data. Someone builds a view that returns a correct set of results, which becomes a source of truth. Then someone else comes along and uses that view in another view, because they know it returns the correct results, and so on and so on. Worse, views tend to do a bunch of data massaging, left joining and coalescing and substringing and replacing and case expressioning and converting things to other things. The bottom line is that views are as bad as you make them.
The end result is a trash monster with a query plan that can only be viewed in full from deep space.
When critical processes start to rely on these views, things inevitably slow to a crawl.
I’ve said all that about views to say that the exact same problem can happen with inline table valued functions. I worked with a client last year who (smartly) started getting away from scalar and multi-statement functions, but the end results were many, many layers of nested inline functions.
Performance wasn’t great. It wasn’t worse, but it was nothing to gloat and beam over.
The Case For Views
Really, the main reason to use a view over an inline table valued function is the potential for turning it into an indexed view. If Microsoft would put an ounce of effort into making indexed views more useful and usable, it would loom a bit larger.
There are some niche reasons too, like some query generation applications use metadata discovery to build dynamic queries that can’t “see” into inline table valued functions the way they can with views, but I try not to get bogged down in tool-specific requirements like that without good reason.
Both views and inline table valued functions offer schemabinding as a creation option. This, among other incantations, are necessary if you’re going to follow the indexed view path.
But, here we find ourselves at the end of the case for views. Perhaps I’m not digging deep enough, but I can’t find much realistic upside.
While doing some research for this, I read through the CREATE VIEW documentation to see if I was missing anything. I was a bit surprised by this, but don’t see it as a great reason to use them:
CHECK OPTION
Forces all data modification statements executed against the view to follow the criteria set within select_statement. When a row is modified through a view, the WITH CHECK OPTION makes sure the data remains visible through the view after the modification is committed.
If you’re into that sort of thing, perhaps this will make views more appealing to you. I’m not sure I can think of why I’d want this to happen, but 🤷♂️
The Case For Inline Functions
Now that we’re squared away on views, and we’ve made sure we’re starting with the understanding that either of these module types can be lousy for performance if you put a lousy query in them, and fail to create useful indexes for those queries to access data efficiently.
What would sway my heart of stone towards the humble inline table valued function?
Parameters.
Views can’t be created in a way to pass parameter directly to them. This can be a huge performance win under the right conditions, especially because if you use cross or outer apply to integrate an inline table valued function into your query. You can pass table columns directly in to the function as parameter values. Inline table valued functions take the ick away.
You know how with stored procedures, if you want to use one to process multiple rows from a table, the most workable approach is to use a loop or cursor to assign row values to parameters, and then execute the procedure with them?
Just an example, if you had a stored procedure to take (to make it simple, full) backups, it would be handy to be able to do something like this:
EXEC dbo.TakeAFullBackup
@DatabaseName AS
SELECT
d.name
FROM sys.databases AS d
WHERE d.database_id > 4;
But no, we have to write procedural code to get a list of database names, loop through them, and execute the procedure for each one (or some other close-enough approximation).
Kinda lame, SQL Server. Kinda lame.
Rat Race
When I first came across this oddity, I probably thought (and wrote) things like: “though this is a rare occurrence in views…”
Time has tried that line of thinking and found it wanting. I’ve seen this happen many, many times over now. It’s funny, the more things you learn that can go wrong in a query plan, the more things you become quite paranoid about. The mental checklist is astounding.
Let’s start, as we often do, with an index:
CREATE INDEX
p
ON dbo.Posts
(OwnerUserId, Score DESC)
INCLUDE
(CreationDate, LastActivityDate)
WITH
(SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE);
Now, before we move on, it’s worth noting that this issue is fixed under certain conditions:
I’m not sure which CU this fix was released for in SQL Server 2019, it’s not in any that I can find easily
You’re on SQL Server 2022 and using compatibility level 160
From my testing, it doesn’t matter which compatibility level you’re in on SQL Server 2017 or 2019, as long as optimizer hot fixes are enabled.
/*Using a database scoped configuration*/
ALTER DATABASE SCOPED CONFIGURATION
SET QUERY_OPTIMIZER_HOTFIXES = ON;
/*Using a trace flag instead*/
DBCC TRACEON(4199, -1);
/*SQL Server 2022+ only*/
ALTER DATABASE StackOverflow2013
SET COMPATIBILITY_LEVEL = 160;
For our purposes, we’ll be using SQL Server 2022 in compatibility level 150, with query optimizer hot fixes disabled.
No Problemo (Query)
Here’s a view and a query, where things work just fine:
CREATE OR ALTER VIEW
dbo.DasView
WITH SCHEMABINDING
AS
SELECT
p.Score,
p.OwnerUserId,
p.CreationDate,
p.LastActivityDate,
DENSE_RANK() OVER
(
PARTITION BY
p.OwnerUserId
ORDER BY
p.Score DESC
) AS PostRank
FROM dbo.Posts AS p;
GO
SELECT
p.*
FROM dbo.DasView AS p
WHERE p.OwnerUserId = 22656;
GO
The reason this works fine is because the where clause contains a literal value, and not a variable or parameter placeholder.
Everything is how we would expect this query plan to look, given the indexes available.
Si Problemo (View)
Where things become wantonly unhinged is when we supply a placeholder for that literal value.
CREATE OR ALTER PROCEDURE
dbo.DasProcedure
(
@OwnerUserId integer
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
SELECT
p.*
FROM dbo.DasView AS p
WHERE p.OwnerUserId = @OwnerUserId
/*OPTION(QUERYTRACEON 4199)*/
/*OPTION(USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_160'))*/;
END;
GO
EXEC dbo.DasProcedure
@OwnerUserId = 22656;
GO
Note that I have a query trace on and use hint here, but quoted out. You could also use these to fix the issue for a single query, but my goal is to show you what happens when things aren’t fixed.
Here’s what that looks like:
Rather than a seek into the index we created, the entire thing is scanned, and we have a filter that evaluates our placeholder from 17 million rows and whittles the results down to 27,901 rows.
No Problemo (Function)
Using an inline table valued function allows us to bypass the issue, without any hints or database settings changes.
CREATE OR ALTER FUNCTION
dbo.DasFunction
(
@OwnerUserId integer
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
p.Score,
p.OwnerUserId,
p.CreationDate,
p.LastActivityDate,
DENSE_RANK() OVER
(
PARTITION BY
p.OwnerUserId
ORDER BY
p.Score DESC
) AS PostRank
FROM dbo.Posts AS p
WHERE p.OwnerUserId = @OwnerUserId;
GO
This changes our procedure as well:
CREATE OR ALTER PROCEDURE
dbo.DasProcedure
(
@OwnerUserId integer
)
AS
SET NOCOUNT, XACT_ABORT ON;
BEGIN
SELECT
p.*
FROM dbo.DasFunction(@OwnerUserId) AS p;
END;
GO
EXEC dbo.DasProcedure
@OwnerUserId = 22656;
And our query plan goes back to normal.
Even if you don’t have this specific problem, it’s often worth exploring converting views to inline table valued functions, because quite often there is a common filtering or joining criteria, and having parameters to express that is beneficial in a couple ways:
It better shows the intent of module and what it can be used for
It prevents developers from forgetting filtering criteria and exploding results
Thanks for reading!
Going Further
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.
Whether you want to be the next great query tuning wizard, or you just need to learn how to start solving tough business problems at work, you need a solid understanding of not only what makes things fast, but also what makes them slow.
I work with consulting clients worldwide fixing complex SQL Server performance problems. I want to teach you how to do the same thing using the same troubleshooting tools and techniques I do.
I’m going to crack open my bag of tricks and show you exactly how I find which queries to tune, indexes to add, and changes to make. In this day long session, you’re going to learn about hardware, query rewrites that work, effective index design patterns, and more.
Before you get to the cutting edge, you need to have a good foundation. I’m going to teach you how to find and fix performance problems with confidence.
If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that, and need to solve database performance problems quickly. You can also get a quick, low cost health check with no phone time required.