Introducing sp_QuickieStore: Find Your Worst Queries In SQL Server’s Query Store

Querying Query Store Sucks


If you’ve ever tried to write a query to find stuff in Query Store and quickly gotten frustrated, you’re not alone.

Whoever designed that schema hates you. It’s probably the same person who designed Extended Events.

I know, there’s a GUI. The stuff you can find in there is definitely useful, but I often find myself wishing it did a little bit more. Asking for additional features often feels like porridge-begging, and even if something new did get added eventually, it wouldn’t help you in the moment.

With all that in mind, I wrote sp_QuickieStore to try to take some of that pain away. As of this post, it’s V1. There will likely be changes and fixes and all that good stuff, but right now it’s good enough for an introduction.

You can get it here.

What You Can Do


Out of the box, you get a lot of choices about what you want to get and how you want to get it.

You can choose:

  • Which metric you want to focus on:
    • CPU, duration, reads, writes, memory, tempdb, executions
  • How many queries you want to get back
  • The start and end dates of your search
  • Minimum number of executions
  • Minimum duration
  • A specific stored procedure
  • Plan IDs to include or ignore
  • Query IDs to include or ignore
  • Query text to search for
  • Additional query details
    • Compilation metrics, wait stats (2017+), plan cache data, query store options
  • If you want your numbers formatted
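To give you an idea of the shape of a call, here’s a sketch. The parameter names below are my recollection of the repo’s examples, and the database name is made up; check the documentation built into the procedure for the real details.

```sql
/*
Illustrative call only: parameter names are from memory of the
repo's examples, and StackOverflow is a stand-in database name.
Find the top 10 CPU consumers over a one-week window:
*/
EXEC dbo.sp_QuickieStore
    @database_name = N'StackOverflow',
    @sort_order = 'cpu',
    @top = 10,
    @start_date = '20210601',
    @end_date = '20210608';
```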

Where You Can Do It


sp_QuickieStore is compatible with SQL Server 2016 SP2+, 2017 CU3+, any version of 2019, and probably Azure SQL DB. I designed it to work there, but long term who knows what changes will occur that make support not possible. No promises.

It’ll take advantage of new columns added in SQL Server 2017 and 2019, and the wait stats tracking added in SQL Server 2017.

Testing on my end is done on case-sensitive instances from 2016 up. If you find bugs, please report them on GitHub. I do not offer private email support.

GitHub is also where you can go to ask for features, with the usual caveats that open source is not free development or consulting.

Thanks for understanding, and all that.

Getting Started


sp_QuickieStore has a bunch of documentation built into the procedure, and there are a number of example calls in the repo.

Now, I know. It’s called sp_QuickieStore, so you expect it to be fast. I’ve done what I can to make things as efficient as possible, but gosh darn it all, sometimes Query Store is just gonna be Query Store.


It would be nice if the maintainer of a database that promises high performance put some useful indexes on the views you’re supposed to use to analyze performance, so the queries that touch those views run quickly.

I know, I know. Take off your Crazy Pants, Erik.

Anyway, I’ll have some more posts coming this week to fill in details.

Thanks for reading!

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. I’m offering a 75% discount to my blog readers if you click from here. I’m also available for consulting if you just don’t have time for that and need to solve performance problems quickly.

Documentation for dm_db_missing_index_group_stats_query

No, It’s New


When I was checking out early builds of SQL Server 2019, I noticed a new DMV called dm_db_missing_index_group_stats_query, which I thought was pretty cool.

It helped you tie missing index requests to the queries that requested them. Previously, that took a whole lot of heroic effort, or luck.

With this new DMV, it’s possible to combine queries that look for missing indexes with queries that look for tuning opportunities in the plan cache or in Query Store.

It seems to tie back to dm_db_missing_index_groups: the index_group_handle column in that DMV joins to the group_handle column in the new one.

If you’re wondering why I’m not giving you any code samples here, it’s because I’m going to get some stuff built into sp_BlitzIndex to take advantage of it, now that it’s documented.

Special thanks to William Assaf (b|t) for helping to get this done.

Thanks for reading!


Why Microsoft Should Get Rid Of SQL Server’s Plan Cache And Use Query Store Instead

Used To Love Her


I used to think the plan cache was so cool.

  • You can find queries that aren’t good there
  • Plans are full of details (and XML)
  • Supporting DMVs give you extra insights about resource usage

But most of the time now, I’m totally frustrated with it.

It clears out a lot, plans aren’t there for some queries, and the plans that are there can be very misleading.

Can you really tell someone what their worst performing queries are when everything in there is from the last 15 minutes?

No.
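If you want to check for yourself, sys.dm_exec_query_stats records when each cached plan was compiled, so the oldest creation_time tells you how far back your plan cache actually goes:

```sql
/* How old is the oldest plan in your cache, and how many are there? */
SELECT MIN(deqs.creation_time) AS oldest_plan,
       COUNT_BIG(*) AS cached_plans
FROM sys.dm_exec_query_stats AS deqs;
```

If oldest_plan is measured in minutes or hours, the plan cache can’t tell you much about yesterday.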

Why I Like Query Store Better


Plans stick around for more predictable amounts of time, even if you put a recompile hint on them.

Plus, you get a whole lot more information about what went on with all the different query metrics. As far as a source of “what happened and when”, it’s much more reliable. It’s almost like having a monitoring tool in SQL Server.

Do you trust your plan cache to tell you what was slow more than a couple days ago?

One really big bonus is that it’s portable, and it’s per-database. You can back up or clone a database so someone else can easily look at it somewhere else.

On top of that, it’s more usable: there are built-in reports, a graphical interface, settings you can tweak, and plans you can force with a couple of clicks.

You could solve a lot of frustrations by moving things here.

Microsoft Has Work To Do, Though


There have been a number of issues and unanswered user requests for Query Store over the years, and not all of the associated DMVs have full parity.

Querying the data also needs to be made less prone to performance issues. And, living in the future, it’s a much more sensible location for the last known actual plan.

A nice to have would be predicate normalization for identical and morally equivalent queries, with totals and averages for resource usage, so you could group things together. Especially for morally equivalent queries, this would allow plan forcing where the queries get different ids.

Some monitoring tools have this in their interface, and the number of times I’ve been able to track down outliers within a group of executions to solve a problem is, well, at least memorable to me.

I wish more people had monitoring tools.

Thanks for reading!


For Better Query Store Adoption, Make Querying It Faster And Easier

Mama, I Tried


Back when I first wrote sp_BlitzQueryStore, I was totally enamored with Query Store.

Like the plan cache, but better. History. Different plans for the same query. Not disturbed by memory pressure or restarts.

Then I waited patiently to find a client on 2016 using it.

And waited, and waited, and waited.

And finally, some came along.

Slow Pokes And No Pokes


When I ran it, it took forever. Not even the XML part. The XML part was fast.

Gathering the initial set of data was slow.

With some time to experiment and dig in, I found that the IN_MEM tables cause significant performance issues when:

  • Query Store is actively logging data
  • Query Store is > 25 MB or so

Yes, children, in-memory tables can be slow, too.
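If you want to see whether your Query Store is in the size range where I hit this, sys.database_query_store_options reports the current state and storage size per database:

```sql
/* Run this in the database you care about */
SELECT dqso.actual_state_desc,
       dqso.current_storage_size_mb,
       dqso.max_storage_size_mb
FROM sys.database_query_store_options AS dqso;
```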

The Problem


Let’s take a couple simple queries against Query Store tables:

SELECT TOP 10 *
FROM sys.query_store_runtime_stats AS qsrs
WHERE qsrs.avg_cpu_time >= 500000
AND   qsrs.last_execution_time >= DATEADD(DAY, -1, GETDATE())
ORDER BY qsrs.avg_cpu_time DESC;

SELECT TOP 10 *
FROM sys.query_store_plan AS qsp
WHERE qsp.query_plan IS NOT NULL
AND   qsp.last_execution_time >= DATEADD(DAY, -1, GETDATE())
ORDER BY qsp.last_execution_time DESC;

The first query runs for 10 seconds, with the entirety of the time spent filtering data out of the IN_MEM table:

Ho hum.

The second query is even worse, at nearly 2 minutes:


“Unrealistic”


I know, this configuration is probably unsupported because I used SELECT * or something.

I wrote this query hoping to quickly get the worst plans by a specific metric.

WITH the_pits
    AS
     (
         SELECT   TOP ( 101 )
                  qsrs.plan_id,
                  qsp.query_id,
                  qsrs.avg_duration / 1000000. AS avg_duration_s,
                  qsrs.avg_cpu_time / 1000000. AS avg_cpu_time_s,
                  qsrs.avg_query_max_used_memory,
                  qsrs.avg_logical_io_reads,
                  qsrs.avg_logical_io_writes,
                  qsrs.avg_tempdb_space_used,
                  qsrs.last_execution_time,
                  /*
                  You can stick any of the above metrics in here to
                  find offenders by different resource abuse
                  */
                  MAX(qsrs.avg_cpu_time) OVER
                  ( 
                      PARTITION BY 
                          qsp.query_id 
                      ORDER BY
                          qsp.query_id
                      ROWS UNBOUNDED PRECEDING
                  ) AS n
         FROM     sys.query_store_runtime_stats AS qsrs
         JOIN     sys.query_store_plan AS qsp
             ON qsp.plan_id = qsrs.plan_id
         WHERE    qsrs.avg_duration >= ( 5000. * 1000. )
         AND      qsrs.avg_cpu_time >= ( 1000. * 1000. )
         AND      qsrs.last_execution_time >= DATEADD(DAY, -7, GETDATE())
         AND      qsp.query_plan IS NOT NULL
         /*
         Don't forget to change this to the same metric!
         */
         ORDER BY qsrs.avg_cpu_time DESC
     )
SELECT   p.plan_id,
         p.query_id,
         p.avg_duration_s,
         p.avg_cpu_time_s,
         p.avg_query_max_used_memory,
         p.avg_logical_io_reads,
         p.avg_logical_io_writes,
         p.avg_tempdb_space_used,
         p.last_execution_time,
         qsqt.query_sql_text, 
         TRY_CONVERT(XML, qsp.query_plan) AS query_plan
FROM     sys.query_store_plan AS qsp
JOIN     the_pits AS p
    ON p.plan_id = qsp.plan_id
JOIN     sys.query_store_query AS qsq
    ON qsq.query_id = qsp.query_id
JOIN     sys.query_store_query_text AS qsqt
    ON qsq.query_text_id = qsqt.query_text_id
ORDER BY p.n DESC;

 

It works pretty well. Sometimes.

Other times, it runs for 4.5 minutes.

I know what you’re thinking: “Erik, you’re doing all sorts of crazy stuff in there. You’re making it slow.”

But none of the crazy stuff I’m doing is where the slowdown is.

It’s all in the same stuff I pointed out in the simpler queries.

12.5 seconds in one spot, FOUR MINUTES in another.

Testing, testing


I can’t stress how much I want Query Store to be successful. I absolutely love the idea.

But it just wasn’t implemented very well. Simple filtering against the data takes forever.

And yes, you can have NULL query plans for some reason. That’s rich.


Usability issues don’t stop there. You can hit weird server performance issues, and reports are broken.

The irony of needing to tune queries so you can find queries to tune is ironic.

I’m nearly sure of it.

Thanks for reading!


SQL Server T-SQL Protip: watch those TOPs without ORDER BY

In the documentation for TOP, the following is listed as a best practice:

In a SELECT statement, always use an ORDER BY clause with the TOP clause. Because, it’s the only way to predictably indicate which rows are affected by TOP.
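The nondeterminism is easy to demonstrate against any table: without an ORDER BY, TOP simply keeps whichever rows the plan happens to produce first, and a plan change can shuffle them.

```sql
/*
Without ORDER BY, which 5 rows you get is up to the plan:
an index change, parallelism, or a rebuild can reorder them.
*/
SELECT TOP (5) o.name
FROM sys.objects AS o;

/* With ORDER BY, the result is stable and predictable */
SELECT TOP (5) o.name
FROM sys.objects AS o
ORDER BY o.name;
```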

Let’s work through a real world example.

The good


One of the great things about the “Top Resource Consuming Queries” Query Store report in SSMS is that it can always render the query plan, even for very complex queries. I’m not aware of a pure T-SQL solution that avoids making the end user save XML to files in all cases. The report nearly always takes a long time to run, so it’s easy to capture the T-SQL that powers the grid details version:

DECLARE @results_row_count INT = 100,
@interval_start_time DATETIMEOFFSET = '2019-05-24 15:30:00 +00:00',
@interval_end_time DATETIMEOFFSET = '2019-05-24 18:00:00 +00:00';
 
SELECT TOP (@results_row_count)
    p.query_id query_id,
    q.object_id object_id,
    ISNULL(OBJECT_NAME(q.object_id),'') object_name,
    qt.query_sql_text query_sql_text,
    ROUND(CONVERT(float, SUM(rs.avg_duration*rs.count_executions))*0.001,2) total_duration,
    SUM(rs.count_executions) count_executions,
    COUNT(distinct p.plan_id) num_plans
FROM sys.query_store_runtime_stats rs
    JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
    JOIN sys.query_store_query q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE NOT (rs.first_execution_time > @interval_end_time OR rs.last_execution_time < @interval_start_time)
GROUP BY p.query_id, qt.query_sql_text, q.object_id
HAVING COUNT(distinct p.plan_id) >= 1
ORDER BY total_duration DESC;

Note the presence of the ORDER BY. I get exactly the results that I was expecting:

The bad


If I ask for extra details (who doesn’t want more details?), a significantly more complex query is generated:

-- grid format query with additional details
DECLARE @results_row_count INT = 100,
@interval_start_time DATETIMEOFFSET = '2019-05-24 15:30:00 +00:00',
@interval_end_time DATETIMEOFFSET = '2019-05-24 18:00:00 +00:00';
 
With wait_stats AS
(
SELECT
    ws.plan_id plan_id,
    ws.execution_type,
    ROUND(CONVERT(float, SUM(ws.total_query_wait_time_ms)/SUM(ws.total_query_wait_time_ms/ws.avg_query_wait_time_ms))*1,2) avg_query_wait_time,
    ROUND(CONVERT(float, SQRT( SUM(ws.stdev_query_wait_time_ms*ws.stdev_query_wait_time_ms*(ws.total_query_wait_time_ms/ws.avg_query_wait_time_ms))/SUM(ws.total_query_wait_time_ms/ws.avg_query_wait_time_ms)))*1,2) stdev_query_wait_time,
    CAST(ROUND(SUM(ws.total_query_wait_time_ms/ws.avg_query_wait_time_ms),0) AS BIGINT) count_executions,
    MAX(itvl.end_time) last_execution_time,
    MIN(itvl.start_time) first_execution_time
FROM sys.query_store_wait_stats ws
    JOIN sys.query_store_runtime_stats_interval itvl ON itvl.runtime_stats_interval_id = ws.runtime_stats_interval_id
WHERE NOT (itvl.start_time > @interval_end_time OR itvl.end_time < @interval_start_time)
GROUP BY ws.plan_id, ws.runtime_stats_interval_id, ws.execution_type ),
top_wait_stats AS
(
SELECT TOP (@results_row_count)
    p.query_id query_id,
    q.object_id object_id,
    ISNULL(OBJECT_NAME(q.object_id),'') object_name,
    qt.query_sql_text query_sql_text,
    ROUND(CONVERT(float, SUM(ws.avg_query_wait_time*ws.count_executions))*1,2) total_query_wait_time,
    SUM(ws.count_executions) count_executions,
    COUNT(distinct p.plan_id) num_plans
FROM wait_stats ws
    JOIN sys.query_store_plan p ON p.plan_id = ws.plan_id
    JOIN sys.query_store_query q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE NOT (ws.first_execution_time > @interval_end_time OR ws.last_execution_time < @interval_start_time)
GROUP BY p.query_id, qt.query_sql_text, q.object_id
),
top_other_stats AS
(
SELECT TOP (@results_row_count)
    p.query_id query_id,
    q.object_id object_id,
    ISNULL(OBJECT_NAME(q.object_id),'') object_name,
    qt.query_sql_text query_sql_text,
    ROUND(CONVERT(float, SUM(rs.avg_duration*rs.count_executions))*0.001,2) total_duration,
    ROUND(CONVERT(float, SUM(rs.avg_cpu_time*rs.count_executions))*0.001,2) total_cpu_time,
    ROUND(CONVERT(float, SUM(rs.avg_logical_io_reads*rs.count_executions))*8,2) total_logical_io_reads,
    ROUND(CONVERT(float, SUM(rs.avg_logical_io_writes*rs.count_executions))*8,2) total_logical_io_writes,
    ROUND(CONVERT(float, SUM(rs.avg_physical_io_reads*rs.count_executions))*8,2) total_physical_io_reads,
    ROUND(CONVERT(float, SUM(rs.avg_clr_time*rs.count_executions))*0.001,2) total_clr_time,
    ROUND(CONVERT(float, SUM(rs.avg_dop*rs.count_executions))*1,0) total_dop,
    ROUND(CONVERT(float, SUM(rs.avg_query_max_used_memory*rs.count_executions))*8,2) total_query_max_used_memory,
    ROUND(CONVERT(float, SUM(rs.avg_rowcount*rs.count_executions))*1,0) total_rowcount,
    ROUND(CONVERT(float, SUM(rs.avg_log_bytes_used*rs.count_executions))*0.0009765625,2) total_log_bytes_used,
    ROUND(CONVERT(float, SUM(rs.avg_tempdb_space_used*rs.count_executions))*8,2) total_tempdb_space_used,
    SUM(rs.count_executions) count_executions,
    COUNT(distinct p.plan_id) num_plans
FROM sys.query_store_runtime_stats rs
    JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
    JOIN sys.query_store_query q ON q.query_id = p.query_id
    JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE NOT (rs.first_execution_time > @interval_end_time OR rs.last_execution_time < @interval_start_time)
GROUP BY p.query_id, qt.query_sql_text, q.object_id
)
SELECT TOP (@results_row_count)
    A.query_id query_id,
    A.object_id object_id,
    A.object_name object_name,
    A.query_sql_text query_sql_text,
    A.total_duration total_duration,
    A.total_cpu_time total_cpu_time,
    A.total_logical_io_reads total_logical_io_reads,
    A.total_logical_io_writes total_logical_io_writes,
    A.total_physical_io_reads total_physical_io_reads,
    A.total_clr_time total_clr_time,
    A.total_dop total_dop,
    A.total_query_max_used_memory total_query_max_used_memory,
    A.total_rowcount total_rowcount,
    A.total_log_bytes_used total_log_bytes_used,
    A.total_tempdb_space_used total_tempdb_space_used,
    ISNULL(B.total_query_wait_time,0) total_query_wait_time,
    A.count_executions count_executions,
    A.num_plans num_plans
FROM top_other_stats A LEFT JOIN top_wait_stats B on A.query_id = B.query_id and A.query_sql_text = B.query_sql_text and A.object_id = B.object_id
WHERE A.num_plans >= 1
ORDER BY total_duration DESC;

Now we have not 1, not 2, but THREE TOP operators! But only one of them has an ORDER BY. The results are completely different, and are pretty much useless:
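The fix pattern is to give every inner TOP its own ORDER BY, so all the row limits agree on which rows to keep before the final join. Here’s a toy sketch of the failure mode using sys.objects instead of the Query Store views:

```sql
/*
A toy version of the same bug: two TOP (5) samples of one table.
Without ORDER BY, nothing guarantees the two CTEs pick the same
rows, so the join can lose rows arbitrarily. Matching ORDER BY
clauses on both TOPs make the row limits agree.
*/
WITH a AS
(
    SELECT TOP (5) o.object_id, o.name
    FROM sys.objects AS o
    ORDER BY o.object_id  /* remove this and the join gets lucky at best */
),
b AS
(
    SELECT TOP (5) o.object_id, o.modify_date
    FROM sys.objects AS o
    ORDER BY o.object_id
)
SELECT a.name, b.modify_date
FROM a
JOIN b
    ON b.object_id = a.object_id;
```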

The ugly


This has nothing to do with TOP as far as I know, but I included it just for fun:

Final thoughts


All of you developers out there should watch your TOPs and make sure you’re using ORDER BY as needed. Otherwise, you might end up with annoyed end users writing blog posts about your code.

Thanks for reading!
