Collecting Server Performance Metrics: Performance Monitor

Whether you're a DBA or administrator looking to tune a server, planning for hardware upgrades, or preparing a move to the cloud, there are a few key performance metrics you're going to want to collect. How do you know if the bottleneck is memory, disk, or CPU? How do you know which of the Azure VM sizes to choose? The answer to these questions and others lies in the performance metrics on your existing servers.

Sure, you could just take the number of cores and the amount of memory, go to the Azure portal, and choose a VM that matches. Is that really the best way to go about a cloud migration? Not really. Are you using all the memory that's on the server? Is the CPU constantly being maxed out? How do you even know what size disk to select?

There are a few key performance metrics that you'll want to gather to answer these questions. Today we are going to look at collecting a few of those metrics using a free tool built into Windows called Performance Monitor (also referred to as Perfmon).

Ultimately we are going to collect metrics on three main areas: CPU, memory and disk (then add network bandwidth for good measure).

CPU

For the CPU we will be looking to get information about compute capacity, such as the number of processors and cores, as well as overall utilization.

Relevant CPU counters collected at the total level:

  • Processor - % Processor Time - Total CPU usage percentage
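
Since % Processor Time at the _Total level is an average across all cores, utilization translates directly into "cores actually in use." Here's a minimal sketch of that arithmetic in Python, using made-up numbers for a hypothetical 24-core server:

```python
# Hypothetical numbers for a 24-core server; substitute your own
# collected averages and peaks.
cores = 24
avg_utilization = 0.40   # average of the collected % Processor Time samples
peak_utilization = 0.75  # maximum of the collected samples

# _Total is an average across cores, so effective compute in use
# is simply cores * utilization.
print(f"Average cores in use: {cores * avg_utilization:.1f}")  # 9.6
print(f"Peak cores in use:    {cores * peak_utilization:.1f}")  # 18.0
```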

Memory

Like the CPU, this is pretty straightforward: how much do we have, and how much are we using? Also important here is how often data pages out to disk.

Relevant memory counters collected at the total level:

  • Memory - Available MBytes - Amount of free memory (be careful reading too much into this; SQL Server takes and holds memory)
  • Paging File - % Usage - Percentage of the page file in use (memory swapped to disk)
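
If you also know the server's total RAM (which is not collected by the counter above), Available MBytes converts to a utilization figure with one subtraction. A quick sketch, assuming a hypothetical 64 GB server, and keeping the SQL Server caveat in mind:

```python
# Hypothetical 64 GB server; available_mb is one sample of
# Memory \ Available MBytes.
total_mb = 64 * 1024
available_mb = 6144

# SQL Server grabs and holds memory, so a high number here does not
# by itself prove the server needs more RAM.
used_mb = total_mb - available_mb
print(f"In use: {used_mb} MB ({used_mb / total_mb:.0%})")  # 91%
```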

Disk

This is likely the most complex area and the one that will generate the most discussion, with opinions ranging from "IOPS aren't a real thing" to "IOPS are the best way to measure disk performance". We are going to gather a few different pieces of information on the disk side. IOPS, latency, and throughput are all important in different ways, both for sizing a new system and for tuning an existing one; the sketch after the counter list shows how they relate. So let's go ahead and gather some of the basics. Depending on what this data will be used for, you can collect metrics for individual disks or just the total for all disks. Collecting for individual disks makes analysis more work, but also gives more detail.

Relevant disk counters collected for each logical disk:

  • Logical Disk - Avg. Disk Bytes/Read - Size of reads
  • Logical Disk - Avg. Disk Bytes/Write - Size of writes
  • Logical Disk - Avg. Disk sec/Read - Latency for reads
  • Logical Disk - Avg. Disk sec/Write - Latency for writes
  • Logical Disk - Disk Read Bytes/sec - Throughput for reads
  • Logical Disk - Disk Write Bytes/sec - Throughput for writes
  • Logical Disk - Disk Reads/sec - IOPS for reads
  • Logical Disk - Disk Writes/sec - IOPS for writes
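
To see how these tie together: throughput is just IOPS multiplied by the average I/O size, so Disk Read Bytes/sec should roughly equal Disk Reads/sec times Avg. Disk Bytes/Read over the same interval. A minimal Python sketch with hypothetical numbers:

```python
# Hypothetical samples from the counters above.
reads_per_sec = 1200            # Disk Reads/sec (read IOPS)
avg_bytes_per_read = 64 * 1024  # Avg. Disk Bytes/Read (64 KB reads)

# Throughput = IOPS * average I/O size.
read_bytes_per_sec = reads_per_sec * avg_bytes_per_read
print(f"Read throughput: {read_bytes_per_sec / 1024**2:.0f} MB/s")  # 75 MB/s
```

This relationship matters for sizing because Azure caps IOPS and MB/s separately; the same workload can hit either limit first depending on its I/O size.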

Network Bandwidth

For a migration to Azure VMs this can be useful for sanity-checking the new configuration and for helping identify whether slowdowns are on the Azure side or on your connection from on-premises to Azure.

Relevant network counters collected at the server level:

  • Network Interface - Bytes Total/sec - Total network in and out
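
One small unit trap here: Perfmon reports bytes per second, while network bandwidth (and Azure VM network limits) is quoted in megabits per second. A quick conversion sketch:

```python
# Hypothetical sample of Network Interface \ Bytes Total/sec.
bytes_total_per_sec = 12_500_000

# Multiply by 8 for bits, divide by 1,000,000 for megabits.
mbps = bytes_total_per_sec * 8 / 1_000_000
print(f"{mbps:.0f} Mbps")  # 100 Mbps
```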

Let's get started.

Launching Performance Monitor

To access Windows Performance Monitor, simply click the Start button and start typing "Performance Monitor" until it shows up in the results list. If you happen to be running an older version of Windows without search functionality, or can't seem to find Performance Monitor in the results, go to Run and type "perfmon". It is also available in the Computer Management console under System Tools, then Performance.

This utility allows both live viewing of metrics and background collection. We will want to set up a background collection that runs long enough to get a solid, representative sample of the workloads on the server. In an ideal world this would run for a few days to cover the major events, like changes in daily user load and nightly batch processing.
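
A multi-day run at the 15-second interval used later in this post stays very manageable. A quick back-of-the-envelope sketch in Python (the counter count is a rough guess for a single-disk server):

```python
# Rough collection-volume estimate for a 3-day run.
interval_seconds = 15
days = 3
counters = 12  # roughly the list above for a single-disk server

samples = days * 24 * 60 * 60 // interval_seconds
print(f"{samples:,} sample rows")                   # 17,280
print(f"{samples * counters:,} total data points")  # 207,360
```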

Setting Up the Data Collector

There are a few ways to set up the collection of performance counters. I will outline one way in this post.

In the Performance Monitor navigation tree, expand the Data Collector Sets node. Then, right-click on the User Defined node and select New > Data Collector Set.

Name the data collector set and select the option labeled Create manually (Advanced).

When asked for the type of data to include, make sure to check the box next to Performance counter. Otherwise there will be a few extra steps to get the counters into the collector, and the next few steps won't match up at all.

Now we are at the good part: which performance counters would you like to log? Click the Add button and refer to the lists above for which counters to add. Let's keep the sample interval at 15 seconds. Click OK when all the counters are added. Depending on the number of disks, this could be quite a large list.

After being returned to the counter summary, click Next.

Tell Performance Monitor where to save the results. Click Next.

On the final page you can change the account that will be used for data collection. Be sure this account has access to gather the metrics from this computer, or from the remote computer if you are setting this up to collect from another location. Keep the radio button on Save and close, then click Finish.

You will be returned to the Performance Monitor console. We need to make a small change to the way the data is stored, as we will want it in CSV format. Let's navigate to the collector. In the navigation tree, expand the User Defined node and select the collector set with the name specified in the earlier steps. There should be one collector there, likely named DataCollector01, because Performance Monitor is as creative as I am. Right-click that collector and open its properties.

Note: If you want to control the name of the collector, you would leave the Performance counter option unchecked rather than telling the wizard you wanted to add performance counters. The wizard would then end without letting you add counters. You'd then right-click the data collector set and select New > Data Collector; the first page of that wizard allows for a custom name.

On the Performance Counters tab there is an option for Log format. The default is Binary. Change this to Tab Separated or Comma Separated so the data is more easily consumed. You can also switch over to the File tab and change the name of this collector's file to something more meaningful. Finally, click OK.

Now we are ready to start collecting!

Right-click the data collector set that was just created and select Start. The counters will be written every 15 seconds into a file in the directory specified in the collector wizard.

Viewing the Results

When you are ready to take a look at the collection results it's time to dust off your Excel or Power BI skills.

I like to bring this data into Power BI because it's easy to clean up the results and to build some quick visuals. A few clicks will remove the first row of the results (which is blank), unpivot all the columns into rows, remove my computer name from the counter text, add a few category labels, and change some column headers. If you'd rather script it, a rough pandas equivalent is sketched below.
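
Here's that rough pandas equivalent. Treat it as a sketch, not a drop-in: the file name perfmon_results.csv and the MYSERVER machine name are placeholders for whatever your collector actually wrote.

```python
import pandas as pd

# perfmon_results.csv is a placeholder for the file your collector wrote.
df = pd.read_csv("perfmon_results.csv")

# The first column is the PDH timestamp; give it a friendlier name.
df = df.rename(columns={df.columns[0]: "timestamp"})

# Drop the first row, which Perfmon typically leaves blank.
df = df.iloc[1:]
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Unpivot the counter columns into rows.
tidy = df.melt(id_vars="timestamp", var_name="counter", value_name="value")

# Counter names look like \\MYSERVER\Processor(_Total)\% Processor Time;
# strip the leading \\computername so the labels are portable.
tidy["counter"] = tidy["counter"].str.replace(r"^\\\\[^\\]+", "", regex=True)
tidy["value"] = pd.to_numeric(tidy["value"], errors="coerce")

print(tidy.head())
```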

I can then go build a few visualizations. I have a summary showing the total, median, average, minimum and maximum for each counter, then the same calculations for each counter group.
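
Here's roughly the same per-counter summary built on the tidy frame from the previous sketch:

```python
# One row per counter with the statistics described above.
summary = (
    tidy.groupby("counter")["value"]
        .agg(["sum", "median", "mean", "min", "max"])
        .rename(columns={"sum": "total", "mean": "average"})
)
print(summary)
```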

I also have a page for each of the categories with some visualizations. Because who doesn't love a good line chart?

A table of data is great, and the details are absolutely necessary. However, it can be difficult to see the peaks and valleys in a list of counter results, especially when looking over a day or two. A simple line chart can really help with that. I generally want to see how the different data points trend over the entire collection period. I want to see how often the system runs close to the maximum. I want to see whether the average comes from sustained periods of maximum activity followed by sustained periods of no activity, or from a constant amount of activity with some periodic peaks. All of these charts will look different based on whether the server is running a transactional system or a data warehouse.
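
For completeness, here's a minimal line-chart sketch using matplotlib and the tidy frame from the earlier cleanup example, plotting CPU over the collection period:

```python
import matplotlib.pyplot as plt

# Pull just the CPU counter out of the tidy frame built earlier.
cpu = tidy[tidy["counter"].str.contains("% Processor Time")]

plt.figure(figsize=(10, 4))
plt.plot(cpu["timestamp"], cpu["value"])
plt.xlabel("Time")
plt.ylabel("% Processor Time")
plt.title("CPU over the collection period")
plt.tight_layout()
plt.show()
```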

What do I do with this information?

I'm going to fall back on my consulting answer: it depends. If you're looking to tune the system, then this will help you know where to look. There are other pieces of information in SQL Server these days that will point you in the right direction better than these counters.

For me, I generally use this when I'm working with customers who are looking to move servers into Azure. It's tempting to look at the existing server specs and just use those, but that only gives part of the picture. We still need to understand the disk requirements, as throughput and IOPS in Azure are not fixed values; they vary based on the VM size, the disk type chosen, and the number of disks. We don't want to throw 24 cores at a server just because it has 24 today, because the existing server may have changed over time. It may be that a consolidation was done and the server is now underpowered. It may be that services were moved off the server and now the CPU doesn't go above 26% even at peak times. We don't want to overpower VMs, because then you're wasting money. We don't want to underpower VMs, because then users have a terrible experience.

In a future post I'll go through how to match this information up to Azure VM sizes. In the meantime, now you have a good way to gather some baseline performance metrics from your existing servers.