Troubleshooting MySQL Performance. Step Zero: Check If You Have a Performance Problem to Troubleshoot

In MySQL by Aurimas Mikalauskas1 Comment

I lost count of the number of times I had customers come to me for a regular MySQL performance check-up saying “no, we don’t have any performance problems right now”, just to find they do actually have them, they’re just not aware of it. It’s no big surprise this happens when you’re not using your system as actively as your users are.

So let me share with you one trick I’m using to quickly check if the system I’m on has any problems that need troubleshooting. Actually, it’s dead simple. And no, it’s not a slow query log, it’s Threads_running. Let me show you.

Wait, what’s wrong with the slow query log?

Nothing at all. MySQL’s slow query log is a great tool when you’re looking for slow queries. Or when you’re fishing for things to optimize – either to reduce the resource consumption, or to increase speed for queries you didn’t know are slow.

And it’s absolutely great for that – I’m using slow query log all the time. But:

  1. I want something I can take a look at right away (like vmstat, or pt-diskstats) and with zero performance impact.
  2. A “slow” query doesn’t necessarily imply a performance problem by my definition. How do you define what is a slow query? A query that takes more than 10s? More than 1s? 0.1s?

There are a lot of background queries such as reports or queries generating summary tables, that are bound to take long and that’s totally fine. We should just leave them alone.

Let’s define a MySQL Performance Problem then

Agreed. Here’s how I define it:

MySQL has a performance problem if a request is taking longer than it should.

In other words, if you run a query that should always take 0.01s to return and (sometimes, or always) it takes considerably more than 0.01s – you have a problem.

Oh and notice I didn’t say Application Performance Problem. That’s because Application Response Time often won’t correlate with MySQL Query Response Time. Sure, if MySQL query takes 5s to return, Application will take at least 5s to respond. But it could also take 20s to respond if there’s a RESTful call taking 10s and 5 other API calls taking another 10s, even though MySQL query only takes 0.005s to return.

Makes sense?

Using Threads_running to Spot Performance Problems

I promised it will be dead simple, so here we go. We login to the server, we check the usual stuff – run vmstat 1 for a few minutes to see the server utilisation, pt-diskstats in case the disks look suspicious. Then, we run this:

Did you see that? Yep, that blip at the 6th second. Well that’s what I’m talking about. But first, what does that Threads_running actually show?

Basically, it is a number of requests that MySQL is dealing with currently. Similar to “r” column in vmstat that shows the number of requests that CPU is dealing with. Of course, it doesn’t tell anything about the state of it – whether it’s waiting on a table level lock, mutex or reading the data, but for our purposes it’s perfect. Well almost perfect.

Thing is, Threads_running spikes could mean different things. It could be that the server is just getting bursts of requests (this often happens with cache stampedes), or, way more often, it means queries are piling up.

And if queries are piling up, then you have a performance problem.

There could be many reasons why queries are piling up, and it’s not the goal of this article to discuss them (that would be the Step 1), we’ll talk about it a bit later on this blog. But now at least we know we have something to work with.

You maybe wondering why you would not notice these on zabbix or cacti? The reason is very simple – these tools show 5 minute averages for most status variables and averages tend to gloss things over. In fact, in the case of Threads_running, the zabbix agent should take a sample exactly at the right time when there is a blip, and if the blip only occurs once every 10 minutes, it will take a long time before zabbix catches a single instance of it.

Which is why 1s sampling is sooo important.

Oh and one more thing. If there are no blips, this definitely doesn’t mean there are no performance problems, not at all. But for these other problems you have to dig deeper (usually using the slow query log or performance schema), whereas this Threads_running check takes literally a second to kick off and you can monitor the server workload live right away.

Now before we wrap up, I’d like to show you something even more interesting.

Adding a spark-line for more clarity

When you only have 1-9 concurrent queries running most of the time, noticing spikes is trivial, because all of the sudden you see double digit numbers instead of single digit ones.

On a busy server with over 10-30 concurrent threads, however, spikes are much harder to see, so in such cases I add a simple spark-line that helps me see more vividly. Here, let me show you one such case:

In this case noticing the blip is much harder, especially if you consider that normally you have to scroll through pages and pages of these.. And here’s how the turbo-charged version looks like:

Of course with that many concurrent queries, it’s a good question whether this is a glitch, or just a normal workload fluctuation, but if you monitor the server long enough, you will see if these are repeating and sometimes tend to take longer to clear off.

BTW, this sparkline() function can be used on top of vmstat or other command line tools to help you see trends better. And you can see there’s a variable divisor which is useful with bigger numbers so you don’t get too many dots on your terminal:

I hope you will find this useful.

In next few weeks I’d like to write about how do you actually troubleshoot MySQL now that you know there’s something to troubleshoot. Let me know if that’s something of interest.

Share this Post

About the Author

Aurimas Mikalauskas

Facebook Twitter Google+

Linux hacker since the age of 0xE, deeply passionate about high performance and scalability. Former architect of scalable systems at Percona - largest independent MySQL and MongoDB consultancy company (for 9 years). Audiobook addict, piano & guitar virtuoso, scuba diver, wind surfer and a wannabe glider pilot. Learn more about the author.