Better Domain Knowledge Makes For Better Queries

Better Domain Knowledge Makes For Better Queries


Chapters

Full Transcript

Erik Darling here with Darling Data, continuing my upward trajectory as a monitoring tool mogul in the SQL Server community, but today we’re going to take a little break from shoveling free software to the masses, and we’re going to talk a little bit about how you can use better domain knowledge to write better queries, and I’ll give you a little example of that using the Stack Overflow database, because that’s usually how these things work, isn’t it? Anyway, down in the video description, you will find all sorts of helpful links wherein you can hire me for consulting, buy my training, become a paid supporting member of this YouTube channel, and also where you can do other things, like ask me office hours questions, and as always, if you enjoy what I’m doing over here, please do like, subscribe, and tell a friend, because I like to see the number, I like to, bigger number is better for these things, so please, please spread the good word, my friends. Okay, we’re going to briefly pimp some free software here. I built a free SQL Server monitoring tool, free, open source, available on GitHub, the link to that is down in the video description as well.
If you are in need of that sort of thing, if you care about the performance of the SQL Server, but maybe don’t have the budget to get a paid monitoring tool, which charges you way too much for the pieces of crap that they are, then you can use mine, which is not a piece of crap, and well, you know, it’s pretty good at this point. So, you’ve got that. It’s an option for you, right now. I will be out in the world doing all sorts of fun things.
I have SQL Day coming up in Poland, May 11th through 13th. Data Saturday, Croatia, June 12th and 13th. Pass on tour in Chicago, Illinois, May 7th and 8th.
And, of course, Pass Data Summit, the full big deal out in Seattle, Washington, November 9th through 11th. So, if you are in the ticket buying mood and any one of those locations seem seemly to you, well, I suggest you go buy tickets. I’ll have pre-cons at all of them, teaching advanced T-SQL stuff, so, you know, you should show up, do that.
Anyway, it is March, but it doesn’t feel much like March today. Actually, I’m not sure what it feels like, because my fingers aren’t really feeling much. So, you know, hopefully it’s not a stroke, but I’m pretty sure it’s just cold, because I can still do a drumroll.
So, anyway, we are back into having two spawns of SSMS open territory in my life, so here we go. Now, a lot of times when I am helping clients write queries, you know, it takes, there’s like a learning curve with just about any data set. Because when you first sit down and you look at things, you might say, oh, well, they’ve got some tables.
And, you know, this is the way that they’re joining those tables together. These are the existing queries, so maybe I should follow that pattern. But then you might find that some of those patterns aren’t really correct and don’t really yield the results that anyone would expect or want.
You get kind of weird stuff back, and you’re like, I don’t think that’s exactly it, right? Like, you know, you’ll learn to get a sort of taste for these things the longer you’re consulting. And so, like, a lot of the time that I spend, aside from, like, making the queries faster and, you know, figuring out why the queries are slow, stuff like that, is just sort of figuring out the data that’s underneath them.
Because sometimes the more you look, the stranger things you find. So, if you were to sit down at the Stack Overflow, sit down and you became a developer at Stack Overflow. Not that I think they’re hiring at this point.
No, this is the same. If you were to become a developer at Stack Overflow, you might see, like, users and you might see posts and you might say, well, I’m going to assume that there is a pretty strong relationship between users posting things and posts existing and stuff like that. And, like, I’m pretty sure this seems like, you know, a pretty strong relationship, right?
Users post things. And so, you might get a request to write a query and you might say, no problem. I need to find the user with the most posts, so I’m going to go do that.
And you might run a query that looks like this. You might get no results back using the magical live zoom feature available in SQL Server Management Studio 2022 point something. I can zoom right in there and my green screen is having a tough day.
But I can zoom in and see that I got no results back and I am quite saddened by this. So, this is where sort of, like, examining the data and getting better domain knowledge about it can help you write faster queries better and better queries faster. Both ways, right? Effective and efficient.
People say you can’t have both. I say you can’t because I care about you. So, if we were to look at who has the most posts in the post table, we would find this owner user ID zero. Who is this mysterious owner user ID zero with 226,238 posts?
What a prolific, prolific thing that must be. What a user. Well, we don’t have that user ID in the post table.
That is not a person. That is not a thing that exists. And so, we become confused and sad. Maybe a bit listless, limp.
You know? Just a flab in the world. And then we start to lose faith in these relationships. You might say, well, gosh darn it, why don’t we have a foreign key?
You might, maybe it’s not enforced or something, right? Maybe someone untrusted the foreign key. There’s all sorts of things that are like, ah, I thought that made sense, but now it doesn’t make sense. So, like, again, the more domain knowledge you have about the environments you’re working in, the more effective you become in them.
So, if, you know, going through this exercise and saying, well, I no longer trust, like, that foreign key, SQL Server doesn’t trust it. I don’t trust it either now. You might say, well, what if I go and check first, right?
So, what if I say, now, instead of just getting whatever from the post table, I say, well, I need to add a qualifier here. I have to say that whatever user ID I find, that user ID has to exist in the users table because without that, we get no results. And that doesn’t make for a very good query experience.
And so, with that domain knowledge applied, now we can run this query and we can get back a very expected result, right? We get back top post to John Skeet. So, my advice here is don’t take relationships for granted in SQL or in life, I suppose.
When you’re, you know, first starting to work with a data set and learn about that data set, I do encourage you to explore it so that, you know, because you might be helping someone tune a query that’s wrong. You might be helping someone tune a query that doesn’t make sense, right? So, the more time you spend learning the data and exploring, the better off you are.
And I guarantee you, it will help you not only write better queries, but in some cases, it might even help make your queries faster because you’ll be avoiding touching a lot of data that you don’t have to. Anyway, thank you for watching. I hope you enjoyed yourselves.
I hope you learned something. And I will see you m-m-m-m-monday for office hours. All right. I hope I have some questions. We’ll have to go look. I’ll go check out the Google form today.
All right. Goodbye. Have a good weekend.

Going Further


If this is the kind of SQL Server stuff you love learning about, you’ll love my training. Blog readers get 25% off the Everything Bundle — over 100 hours of performance tuning content. Need hands-on help? I offer consulting engagements from targeted investigations to ongoing retainers. Want a quick sanity check before committing to a full engagement? Schedule a call — no commitment required.



Leave a Reply

Your email address will not be published. Required fields are marked *