Does Using DISTINCT In SQL Server Queries Cause Performance Problems?

Footnote


I have two queries. They return the same number of rows.

The only difference is one column in the select list.

This query has the Id column, which is the primary key and clustered index of the Posts table.

    SELECT   DISTINCT
             p.Id, p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
             p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
             p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
             p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
             p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
    FROM     dbo.Posts AS p
    JOIN     dbo.Votes AS v
        ON p.Id = v.PostId
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225'
    ORDER BY p.Id;

The query plan for it looks like this:

[SQL Server query plan screenshot: "Eligible"]

Notice that no operator in this plan performs any kind of aggregation.

There’s no Hash Match Aggregate, no Stream Aggregate, no Distinct Sort, NADA!

It runs for ~1.9 seconds to return about 25k rows.

Lessen


Watch how much changes when we remove that Id column from the select list.

    SELECT   DISTINCT
             p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
             p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
             p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
             p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
             p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
    FROM     dbo.Posts AS p
    JOIN     dbo.Votes AS v
        ON p.Id = v.PostId
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225';

This is what the query plan now looks like:

[SQL Server query plan screenshot: "What’s wrong with you."]

Zooming in a bit…

[SQL Server query plan screenshot: "Woof."]

After we Scan the Posts table, we sort about 47k rows.

After the join to Votes, we aggregate data twice. There are two Stream Aggregate operators.

What do we sort?

[SQL Server query plan screenshot: "Boogers."]

We Sort every column in the table by every column in the table.

In other words, we order by every column we’ve selected.

What do we aggregate?

[SQL Server query plan screenshot]

Everything. Twice.

What Does It All Mean?


When selecting distinct rows, it can be beneficial to include a column that the optimizer can guarantee is unique in the set of selected columns. Think of a primary key, or another column with a uniqueness constraint on it.

Without that, you can end up doing a lot of extra work to create a distinct result set.

Of course, there are times when that changes the logic of the query.
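
If the join to Votes is only there to check that a post has at least one vote, you can sidestep the whole question with an EXISTS instead. Here's a sketch of that rewrite; it's only equivalent if you don't also need to collapse posts that share identical values in every selected column:

    SELECT   p.AcceptedAnswerId, p.AnswerCount, p.Body, p.ClosedDate,
             p.CommentCount, p.CommunityOwnedDate, p.CreationDate,
             p.FavoriteCount, p.LastActivityDate, p.LastEditDate,
             p.LastEditorDisplayName, p.LastEditorUserId, p.OwnerUserId,
             p.ParentId, p.PostTypeId, p.Score, p.Tags, p.Title, p.ViewCount
    FROM     dbo.Posts AS p
    WHERE    p.PostTypeId = 2
    AND      p.CreationDate >= '20131225'
    AND      EXISTS (SELECT 1 FROM dbo.Votes AS v WHERE v.PostId = p.Id);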

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

SQL Server Index Fragmentation And Broken Demos

Egg Meet Face


The other night I was presenting for a user group, and I had a demo break on me.

Not the first time, not the last time. But this was weird because I had rehearsed things that morning.

I skipped over it at the time, but afterwards I started thinking about what had happened, and walking back through other stuff I had done that day.

Turns out, I had fragmented my indexes, and that broke a trivial plan demo.

Just Too Trivial


The working demo looks like this.

I run these queries. The top one receives a trivial plan, and doesn’t have a missing index request.

The bottom one gets full optimization, and one shows up.

	/*Nothing for you*/
	SELECT *
	FROM dbo.Users AS u
	WHERE u.Reputation = 2;

	/*Missing index requests*/
	SELECT *
	FROM dbo.Users AS u
	WHERE u.Reputation = 2
	AND 1 = (SELECT 1);
[SQL Server query plan screenshot: "Picture perfect."]
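
A handy way to confirm which optimization level each query got, without opening the plans, is to read it out of the cached plan XML. This is a sketch against the plan cache; the StatementOptmLevel attribute comes back as TRIVIAL or FULL:

	WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
	SELECT st.text AS query_text,
	       qp.query_plan.value('(//StmtSimple/@StatementOptmLevel)[1]', 'VARCHAR(20)') AS optimization_level
	FROM sys.dm_exec_cached_plans AS cp
	CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
	CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
	WHERE st.text LIKE N'%dbo.Users%'; /*this will match itself once cached; filter to taste*/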

Snap Your Shot


How I had broken this demo was by playing with Snapshot Isolation.

At some point earlier in the day, I had done something like this:

ALTER DATABASE StackOverflow2013 
    SET ALLOW_SNAPSHOT_ISOLATION ON;

BEGIN TRAN
UPDATE dbo.Users SET Reputation += 1
ROLLBACK

When you use optimistic isolation levels, a 14-byte pointer gets added to every row to keep track of row versions.

If I had run the update before turning snapshot isolation on, it wouldn’t have been a big deal. But, you know.

I’m not that lucky. Or smart.

See, after updating every row in the table, my table got uh… bigger.

[SQL Server query results screenshot: "Looter in a riot"]

Now, if I rebuild the table with snapshot still on, the problem goes away.

Trouble is, I didn’t do that before or after my little experiment.
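
What I should have done was check the damage and clean it up. Something like this, sketched out; DETAILED mode reads every page, so mind where you run it:

/*How bloated did the update leave the clustered index?*/
SELECT index_id, index_level, page_count,
       avg_fragmentation_in_percent, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID('dbo.Users'), 1, NULL, 'DETAILED');

/*The cleanup step I skipped*/
ALTER TABLE dbo.Users REBUILD;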

With a heavily fragmented index, both queries not only get fully optimized, but also go parallel.

[SQL Server query results screenshot: "Faster/Stronger"]

They’re both a bit faster. They both use a little more resources.

Why? Because SQL Server looked at the size of the table and decided it would be expensive to scan this big chonker.

Egg Wipes


Moral of the story: Index fragmentation makes your queries better.

Don’t @ me.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Are Table Variable Indexes Ever Useful In SQL Server?

Oughtta Know


Indexes are good for so much more than what they’re given credit for by the general public.

One example where indexes can be useful is with the oft-maligned table variable.

Now, they won’t help you get a better estimate from a table variable. In versions prior to the upcoming 2019 release, table variables will only net you a single row estimate.

Yes, you can recompile to get around that. Yes, there’s a trace flag that occasionally helps with that.

Those defenses are inadequate, and you know it.
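
For completeness, the recompile workaround is just a hint on the query that reads the table variable. A sketch, using the @waypops table variable from later in this post; trace flag 2453 can get you similar behavior without the hint:

SELECT u.DisplayName, w.Date
FROM dbo.Users AS u
JOIN @waypops AS w
    ON w.UserId = u.Id
OPTION (RECOMPILE); /*the optimizer now sees the actual row count*/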

Help How?


Let’s say we have this query against a table variable.

SELECT u.DisplayName, b.Date
FROM dbo.Users AS u
CROSS APPLY
    (
        SELECT TOP 1 *
        FROM @waypops AS w
        WHERE u.Id = w.UserId
        ORDER BY w.Date DESC
    ) AS b
WHERE u.Reputation >= 100000;

With an unindexed table variable, the plan looks like this:

[SQL Server query plan screenshot: "Sucko"]

You can see by the helpful new operator time stats in SSMS 18 that this query runs for 13.443 seconds.

Of that, 13.333 seconds is spent scanning the table variable. Bad guess? You bet.

If we change the table variable definition to include an index, the plan changes, and runs much faster.

[SQL Server query plan screenshot: "Holla holla"]

The query no longer goes parallel, but it runs for 226ms.

A significant change aside from parallelism is that the Top operator is no longer a Top N Sort.

The clustered index has put the table variable data in useful order for our query.

Insertions


The table variable insert looks like this:

DECLARE @waypops TABLE 
(
  UserId INT NOT NULL,
  Date DATETIME NOT NULL
  --, INDEX c CLUSTERED(UserId, Date DESC)
);

INSERT @waypops 
        (UserId, Date)
SELECT b.UserId, b.Date
FROM dbo.Badges AS b
WHERE b.Name IN (N'Popular Question')
UNION ALL 
SELECT b.UserId, b.Date
FROM dbo.Badges AS b
WHERE b.Name IN (N'Notable Question');

Right now, I’ve got the index definition commented out. The insert runs for 662ms.

[SQL Server query plan screenshot: "Oh Boy"]

The insert with the index in place runs for 967ms:

[SQL Server query plan screenshot: "Grab an umbrella"]

Given the 13 second improvement to the final query, I’ll take the ~300ms hit on this one.

Wierda


If you’re wondering why I’ve got the insert query broken up with a UNION ALL, it’s because the alternative really sucks:

DECLARE @waypops TABLE 
(
  UserId INT NOT NULL,
  Date DATETIME NOT NULL
  , INDEX c CLUSTERED(UserId, Date DESC)
);

INSERT @waypops 
        (UserId, Date)
SELECT b.UserId, b.Date
FROM dbo.Badges AS b
WHERE b.Name IN (N'Popular Question', N'Notable Question');

[SQL Server query plan screenshot: "@_@"]

This insert takes 1.4 seconds, and introduces a spilling sort operator.

So uh, don’t do that IRL.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Should An Optimistic Isolation Level Be The New Default For SQL Server Databases?

Two Words


Alright, I’ve lied to you already. There are way more than two words involved, here.

I recently had breakfast (JUST BREAKFAST, GOD) with Microsoft’s Most Handsome Man™, and the topic came up.

Partially because there are such fundamental changes coming to SQL Server, and also because it’s already the default in Azure SQL DB.

If I had to name a redeeming quality of Azure SQL DB, that would be it.

Unboxing


I’ve seen pessimistic locking causing problems and heartache on many, many servers.

While it wasn’t totally the fault of the isolation level (query and index tuning was needed, and there was some other silliness), it shouldn’t take that kind of dedication to make It Just Run Faster©.

Possibly the worst side effect is people leaning heavily on reading dirty data (via read uncommitted/nolock) to avoid issues.

You can’t preach about the evils of dirty reads without offering ways to avoid blocking.

Yes, I’m looking at you.

You all cackle at seeing NOLOCK everywhere, but I don’t hear much about solving blocking problems without it.

Standards and Futures


Right now, “other major database platforms” offer optimistic locking as the default. There are implementation differences, but the net result is the same.

Readers and writers don’t suffer locking contention, and only fully committed data is read.

SQL Server currently offers optimistic locking via Read Committed Snapshot Isolation (RCSI from here), and Snapshot Isolation (SI from here).

Right now, they both send row versions to tempdb. But in the works for SQL Server 2019 is a feature called Accelerated Database Recovery (ADR from here). The mechanism that makes this possible is a Persistent Version Store (PVS from here) that’s local to the database, not tempdb.

While RCSI and SI could end up sending quite a bit of data to tempdb, which could cause contention there without a lot of precautions and setup work, any potential contention from ADR would be localized.

I know, Microsoft has been addressing tempdb, too. Setup now configures tempdb in a mostly sane way, and trace flags 1117 and 1118 are the default behavior in 2016+. That should make optimistic locking a more comfortable setting for people, but no. No. I hardly ever see it in use.
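
For anyone who wants to kick the tires, turning the optimistic options on is the easy part. Test somewhere safe first; RCSI changes read behavior for every query in the database:

/*Opt-in versioning: queries only use it when they ask for SNAPSHOT*/
ALTER DATABASE StackOverflow2013 SET ALLOW_SNAPSHOT_ISOLATION ON;

/*Versioned reads by default for plain read committed queries*/
ALTER DATABASE StackOverflow2013 SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE; /*needs a moment alone with the database*/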

Dependencies


Breaking changes are hard. I know, there are people out there who depend on pessimistic locking for certain patterns to maintain correctness.

The thing is, it’s a lot easier to have just those processes use pessimistic locking, while letting the rest of us skip sprinkling NOLOCK hints every which where just to get a report to run.

For an example of a process that requires specific locking hints, see Paul White’s post about Sequence Tables.

At this point in time, if you’re implementing a queue in SQL Server, you should be the one learning about how to get the kind of locking you need to get it working. Normal people who just want their app to work shouldn’t be the ones staring at articles about isolation levels, locking hints, lock escalation, etc.
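
If you are that queue-implementing person, the usual shape of the fix is a destructive dequeue with explicit hints, so only the queue pays the pessimistic price. A sketch against a hypothetical dbo.WorkQueue table:

DELETE TOP (1)
FROM dbo.WorkQueue WITH (ROWLOCK, READPAST) /*skip rows other sessions have locked*/
OUTPUT Deleted.QueueId, Deleted.Payload; /*hand the work item back to the caller*/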

Get It, Get It


This is a good problem to get in front of. I’d be quite happy to not have to talk about the reader/writer aspects of locking anymore.

In the same way, it would be nice to not have to talk to users who are endlessly frustrated by locking problems, explain dirty reads, explain optimistic isolation levels, caution them against certain issues, and then tell them to have fun removing all those old lock hints.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Real Value Of Optimistic Isolation Levels In SQL Server

Erroneous


While a lot of people speak highly of optimistic locking because it cuts down on blocking issues, I think it’s of equal or higher value that dirty reads are prevented.

“Imagine” a situation where someone ran a query they shouldn’t have.

Either forgetting a join, a where, or some other limiting condition.

You know, while they’re continuously integrating something.

Out, damn spot


    BEGIN TRAN;
    UPDATE h
    SET h.Score = 2147483647
    FROM dbo.HighQuestionScores AS h
    WHERE 1=1;

This will update every row in the table. I know this because once upon a time, I wrote a similar query and learned the importance of backups.

I also learned that tools that throw a warning when modification queries are missing WHERE clauses think this one is just fine.

But now if I go into another window, this query will return wrong results.

    SELECT TOP 10 *
    FROM dbo.HighQuestionScores AS hqs WITH(NOLOCK)
    ORDER BY hqs.Id DESC;
[SQL Server query results screenshot: "BIGINT FOR U"]

In fact, any query that runs with a nolock hint will show incorrect values here.

  • Boss runs a report? Wrong.
  • Trigger fires based on a score change? Wrong.
  • User checks their account? Wrong.

Fully Correct


    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    
    SELECT TOP 10 *
    FROM dbo.HighQuestionScores AS hqs
    ORDER BY hqs.Id DESC;

This query uses the snapshot isolation level.

It would do the same thing under read committed snapshot isolation, but without needing to set the isolation level.
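
That database-level setting is a one-liner, if you can get a moment of exclusive access (assuming the same StackOverflow2013 database as the earlier posts):

    ALTER DATABASE StackOverflow2013 SET READ_COMMITTED_SNAPSHOT ON
        WITH ROLLBACK IMMEDIATE;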

But now we don’t get someone’s accident in our query results.

[SQL Server query results screenshot: "Big Baby"]

Remember folks: yeah, blocking sucks, but incorrect data is usually way, way worse.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Breaking Columnstore Delta Rowgroups

The documentation on delta rowgroups says:

A delta rowgroup is a clustered index that’s used only with columnstore indexes. It improves columnstore compression and performance by storing rows until the number of rows reaches a threshold and are then moved into the columnstore.

This clustered index isn’t chosen by the person who creates the table. It’s a hidden internal column automatically added to delta rowgroups. Perhaps the implementation is similar to that of table spools:

The worktable is structured as if it was defined with a clustered index on zero columns. This means that a 4-byte uniqueifier is added to each row stored except the first.

You know how programmers are: they love reusing old code. If a similar implementation is used for delta rowgroups then it should be possible to see SQL Server errors with the right pattern of data loading. More investigation is required.

Defining the rules


It might be possible to find evidence of this clustered index by using DBCC PAGE. In the T-SQL code below, I create a table with a clustered columnstore index, insert 500 rows, delete 499 rows, insert 500 more rows, and delete 499 rows again:

CREATE TYPE dbo.SEAN_GALLARDY_INT FROM SMALLINT NOT NULL;

DROP TABLE IF EXISTS dbo.view_hidden_clustered_index;

CREATE TABLE dbo.view_hidden_clustered_index (
	ID SEAN_GALLARDY_INT,
	INDEX CCI CLUSTERED COLUMNSTORE
);

GO

CREATE OR ALTER PROCEDURE #p AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @value_to_insert INT = 1;

	WHILE @value_to_insert <= 500
	BEGIN
		INSERT INTO dbo.view_hidden_clustered_index VALUES (@value_to_insert);

		SET @value_to_insert = @value_to_insert + 1;
	END;

	DELETE FROM dbo.view_hidden_clustered_index
	WHERE ID < 500;

	WHILE @value_to_insert <= 1000
	BEGIN
		INSERT INTO dbo.view_hidden_clustered_index VALUES (@value_to_insert);

		SET @value_to_insert = @value_to_insert + 1;
	END;

	DELETE FROM dbo.view_hidden_clustered_index
	WHERE ID > 500 AND ID < 1000;
END;

GO

EXEC #p;

There’s only a single page that stores the 2 rows currently held by the table. This can be viewed with the undocumented DBCC PAGE, TF 3604, and the undocumented sys.dm_db_database_page_allocations:

DECLARE @file_id SEAN_GALLARDY_INT;
DECLARE @page_id INT;

SELECT @file_id = allocated_page_file_id, @page_id = allocated_page_page_id
FROM sys.dm_db_database_page_allocations 
    (DB_ID(), OBJECT_ID('dbo.view_hidden_clustered_index'),NULL, NULL, 'DETAILED')
WHERE is_allocated = 1 AND allocation_unit_type = 1 AND is_iam_page = 0 and page_type = 1;

DBCC TRACEON(3604);
DBCC PAGE('TEST',@file_id,@page_id,3) WITH TABLERESULTS;
DBCC TRACEOFF(3604);

The information that we’re looking for is near the bottom of the result set:

A few things are worth calling out. The 500th row that was inserted into the table has a value of 499 for the “CSILOCATOR” field. This value appears to be stored in little-endian format in the memory dump for Slot 0 Offset 0x60. You can decode the raw value to 499 in T-SQL if desired:

SELECT CAST(0x000001F3 AS INT);

The 1000th row that was inserted into the table has a value of 999 for the CSILOCATOR field. Most importantly, this CSILOCATOR field has a length of four bytes. A typical four-byte int in SQL Server has a maximum value of 2147483647. If it’s possible to load billions of rows into a single delta rowgroup, then we may run out of values for the CSILOCATOR field.

Of course, a single delta rowgroup cannot hold more than 1048576 rows. As you can see in this example, SQL Server does not always reuse freed up values for the CSILOCATOR field. The table currently has two rows, yet the field has advanced to 999. The right pattern of deletes, inserts, and updates should allow the CSILOCATOR to continue to grow without running out of room for rows in the delta rowgroup.

As a final note, I cannot prove that the CSILOCATOR field corresponds to the clustered index, but it certainly seems to serve the function of a uniqueifier that would be needed for said clustered index.

Running up the score


I need to find a relatively efficient way to advance the CSILOCATOR because I need to do it over 2 billion times, if my theory is correct about the maximum allowed value. Both updating all of the rows in a delta rowgroup and deleting and reinserting advance the CSILOCATOR. I expected that small batch sizes would work best, and they did. For my table’s schema, the sweet spot for updates is about 275 rows and the sweet spot for delete/inserts is about 550 rows. Delete/inserts appeared to be faster than updates for the purpose of constantly reloading the same rows over and over.

Strategies that use multiple CPU cores are possible, but I wanted to do other work on this PC and didn’t want to listen to a loud fan all day. Here’s what the final testing code looked like:

DROP TABLE IF EXISTS dbo.delta_store_test;

CREATE TABLE dbo.delta_store_test (
	ID TINYINT NOT NULL,
	INDEX CCI CLUSTERED COLUMNSTORE
);


DROP TABLE IF EXISTS dbo.LOG_TABLE;

CREATE TABLE dbo.LOG_TABLE (
	log_time DATETIME,
	loop_count INT,
	PRIMARY KEY (log_time)
);

GO

DROP TABLE IF EXISTS dbo.delta_store_source;
CREATE TABLE dbo.delta_store_source (
	ID TINYINT NOT NULL
);

INSERT INTO dbo.delta_store_source
SELECT TOP (550) 1
FROM master..spt_values t1
OPTION (MAXDOP 1);

GO

CREATE OR ALTER PROCEDURE #p AS
BEGIN
	SET NOCOUNT ON;

	DECLARE @loops INT = 0;

	WHILE @loops <= 8000000
	BEGIN
		DELETE FROM dbo.delta_store_test;

		INSERT INTO dbo.delta_store_test
		SELECT 1
		FROM dbo.delta_store_source WITH (TABLOCK);

		SET @loops = @loops + 1;

		IF @loops % 10000 = 0
		BEGIN
			INSERT INTO dbo.LOG_TABLE
			VALUES (GETDATE(), @loops);
		END;
	END;
END;

GO

EXEC #p;

If you’re wondering about the temporary stored procedure creation, it’s a habit that I’ve gotten into whenever I write a T-SQL while loop. While not applicable here, very fast loops can incur ASYNC_NETWORK_IO overhead due to the passing of DONE tokens to the client.

Winning the game


After about seven hours the code reaches its glorious end:

Msg 666, Level 16, State 2, Procedure #p, Line 11 [Batch Start Line 26]
The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID 72057596406595584. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.

I loaded a few more rows into the table until I was unable to insert even one row. Using DBCC PAGE and friends again, we can see that the CSILOCATOR has a very large value of 2147483646.

Issuing a REBUILD does resolve the issue because it wipes away our sins, as REBUILDs often do. Naturally using another clustering key is not an option.

Joe: 1

Microsoft: 0

The cost of playing


My desktop computer has an Intel Core i5-4670 processor. Intel claims a TDP of 84 watts. Using CPUID HWMonitor, it looks like my CPU uses about 15 W of additional power when running the workload. It’s a single core workload, so I feel that 15 W is reasonable. 15 watts is 0.015 kW, and when used over 7 hours it translates to 0.105 kWh. My last electric bill has a rate of $0.11663 per kWh, so the total cost of this test comes out to about 1.2 cents. I will be sending Erik an invoice.

Final thoughts


It is possible to hit error 666 when maliciously loading data into a columnstore index. It requires loading over 2 billion rows into the same delta rowgroup, so it’s difficult to think of a production scenario that would lead to this outcome. For those of you with eccentric workloads that give cause for concern, you can roughly check if you are running out of IDs by running a query similar to the following for each columnstore table:

SELECT 2147483647 - 1 - MAX(CAST(SUBSTRING(%%physloc%%, 5, 4) AS INT)) REMAINING_CSILOCATORS
FROM dbo.delta_store_test;

Thanks for reading!

Underused sp_WhoIsActive capabilities: @delta_interval

Great Ghost


In the last post, we looked at how to see if your query is misbehaving, or if things are running normally.

Note that I didn’t say “good” here. Just normal.

For you.

What Have You Done For Me Lately?


Using sp_WhoIsActive in a slightly different way, we can see what a query has been up to over a duration of our choosing.

The delta columns are what happened over the span of time we pick. The columns that come back normally are cumulative to where the query is at now.

So, for a 5-second interval:

EXEC sp_WhoIsActive @delta_interval = 5;
[sp_WhoIsActive results screenshot: "Stacked"]

I’m stacking a couple of screenshots of the normal single line of results, because they’re a little too wide all at once.

What deltas can help you figure out is how “active” your query really is.

In other words, it can answer the question: is my query stuck, or is my query doing something?

As an example, here’s a query that opened a transaction, ran an update, and then… sat there.
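
Something along these lines; a paraphrase, not the exact statement:

BEGIN TRAN;

UPDATE p
SET p.Score = p.Score + 1
FROM dbo.Posts AS p;

/*no COMMIT or ROLLBACK: the session holds its locks and does nothing,
  so the cumulative columns stay big while the deltas flatline*/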

[sp_WhoIsActive results screenshot]

Overall, we’ve done 4.4 million reads, 80k writes, and used about 7,000ms of CPU.

But in the last five seconds, we haven’t done anything at all.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Underused sp_WhoIsActive Capabilities For Troubleshooting SQL Server: @get_avg_time

Ablazemalls


Look, it’s impressive enough that sp_WhoIsActive works as well as it does. Most people who’ve been using it for a while are familiar with using @get_plans to retrieve query plans, or @get_locks to see locking information. I’m gonna spend a couple posts talking about less well known flags that I really like.

We’re gonna start off with one that can help you figure out if a particular query is misbehaving, or if things are just cruising along normally.

Nipples


I’ve got a stored procedure. What it does is unimportant. What’s important is that it’s sensitive to parameter sniffing.

We may not know this walking into an emergency, but we can figure it out pretty quickly using sp_WhoIsActive.

EXEC sp_WhoIsActive @get_avg_time = 1;
[sp_WhoIsActive results screenshot: "Breathe Easy"]

This’ll give us back the usual information about how long our query has been running, but now right next to it is another column that ends in (avg).

When you use this switch, sp_WhoIsActive goes into the plan cache and looks at how long a particular query runs for on average.
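
Under the hood, that average comes from the plan cache’s runtime stats, roughly like this; a sketch, not sp_WhoIsActive’s exact query:

SELECT TOP (10)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000. AS avg_elapsed_ms /*microseconds to milliseconds*/
FROM sys.dm_exec_query_stats AS qs
ORDER BY qs.total_elapsed_time DESC;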

If that number is much lower than how long the query has been running for, and it isn’t being blocked, you just might have a case of parameter sniffing on your hands.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Last Week’s Almost Definitely Not Office Hours: May 3

ICYMI


Last week’s thrilling, stunning, flawless episode of whatever-you-wanna-call-it.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

The Only Thing Worse Than Optional Parameters For SQL Server Query Performance

Kitchen Stink


A lot has been written about “kitchen sink” queries. A couple of my favorites are by Aaron Bertrand and Gail Shaw.

Both articles have similar start and end points. But I’m going to start at an even worse point.

Catch All Parameters


This is the worst possible idea.

    CREATE OR ALTER PROCEDURE dbo.AwesomeSearchProcedure (@SearchString NVARCHAR(MAX))
    AS
    SET NOCOUNT, XACT_ABORT ON;
    SET STATISTICS TIME, IO OFF;
    BEGIN
    
    DECLARE @AltString NVARCHAR(MAX) = N'%'
    
    SELECT TOP (1000) p.OwnerUserId, p.Title, p.CreationDate, p.LastActivityDate, p.Body
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId LIKE ISNULL(N'%' + @SearchString + N'%', @AltString)
    OR    p.Title LIKE ISNULL(N'%' + @SearchString + N'%', @AltString)
    OR    p.CreationDate LIKE ISNULL(N'%' + @SearchString + N'%', @AltString)
    OR    p.LastActivityDate LIKE ISNULL(N'%' + @SearchString + N'%', @AltString)
    OR    p.Body LIKE ISNULL(N'%' + @SearchString + N'%', @AltString);
        
    END;
    GO

It doesn’t get any better if you do this, either.

    SELECT TOP (1000) p.OwnerUserId, p.Title, p.CreationDate, p.LastActivityDate, p.Body
    FROM dbo.Posts AS p
    WHERE (p.OwnerUserId LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
    OR    (p.Title LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
    OR    (p.CreationDate LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
    OR    (p.LastActivityDate LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL)
    OR    (p.Body LIKE N'%' + @SearchString + N'%' OR @SearchString IS NULL);

Sample Run


In both cases, just searching for a single value results in a query that runs for >2 minutes.

    EXEC dbo.AwesomeSearchProcedure @SearchString = N'35004';
    GO

The problem is that we’re just searching for an OwnerUserId, but SQL Server doesn’t know that.

The query plan looks like this:

[SQL Server query plan screenshot: "Badness"]

See that Filter? That’s where we do all of our search work. We scan the whole Posts table, and push every row across the pipe to the Filter.

[SQL Server query plan screenshot: "Messy"]

Irritable


This pattern might work on a small amount of data, but, like most things that are efficient in small doses, it will quickly fall apart when your database reaches a mature size.

My example is pretty simple, too, just hitting one table. In real life, you monsters are doing this across joins, throwing in row numbers, distincts, and ordering by the first 10 columns.

It only gets worse as it gets more complicated.
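
The usual way out is dynamic SQL that only builds the predicates you were actually handed, so each combination compiles into its own plan. A rough sketch with parameters I made up; the articles by Aaron and Gail mentioned above cover the pattern properly:

    CREATE OR ALTER PROCEDURE dbo.LessAwfulSearchProcedure
    (
        @OwnerUserId INT = NULL,
        @Title NVARCHAR(250) = NULL
    )
    AS
    BEGIN
    SET NOCOUNT, XACT_ABORT ON;

    DECLARE @sql NVARCHAR(MAX) = N'
    SELECT TOP (1000) p.OwnerUserId, p.Title, p.CreationDate, p.LastActivityDate, p.Body
    FROM dbo.Posts AS p
    WHERE 1 = 1';

    /*only append predicates for parameters that were actually passed*/
    IF @OwnerUserId IS NOT NULL
        SET @sql += N' AND p.OwnerUserId = @OwnerUserId';

    IF @Title IS NOT NULL
        SET @sql += N' AND p.Title LIKE N''%'' + @Title + N''%''';

    /*still parameterized, so plans get reused per predicate combination*/
    EXEC sys.sp_executesql
        @sql,
        N'@OwnerUserId INT, @Title NVARCHAR(250)',
        @OwnerUserId,
        @Title;

    END;
    GO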

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.