Reading the load testing report

Reading a test report — especially a load test report — can be tricky at first. But once we understand what each metric means and why it matters, it becomes easier to see what questions the report is trying to answer, like how well the system performs under pressure or where it might slow down.

Core Performance Testing Metrics

  • Response time: the amount of time it takes for a request to be processed and a response to be returned. A high response time can indicate that the system is overloaded or that there is a bottleneck. For performance analysis it is very useful to look at percentiles; the most common are p90 (90th percentile), p95 (95th percentile), and p99 (99th percentile), as the sketch after this list shows.

  • Throughput: the number of requests that a system can handle per unit of time. High throughput is desirable, as it indicates that the system can handle a large volume of traffic.

  • Error rate: the percentage of requests that result in an error. A high error rate can indicate that the system is not functioning properly and needs to be investigated.

  • Resource utilization: the percentage of a system's resources (such as CPU, memory, and network bandwidth) being used during the test. High resource utilization can indicate that the system is reaching its limits and may need to be scaled.
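To make the request-level metrics concrete, here is a minimal sketch in Python that computes throughput, error rate, and response-time percentiles from a hypothetical list of request records. The data, record layout, and test duration are all made up for illustration; resource utilization is measured on the host rather than derived from request data.

```python
import math

# Hypothetical request records: (response_time_ms, succeeded).
requests = [
    (120, True), (95, True), (3100, False), (180, True),
    (210, True), (150, True), (2900, False), (130, True),
]
test_duration_s = 2.0  # assumed wall-clock duration of the test

times = sorted(t for t, _ in requests)
errors = sum(1 for _, ok in requests if not ok)

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p% of the data is at or below it."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[rank - 1]

print(f"throughput: {len(requests) / test_duration_s:.1f} req/s")  # 4.0 req/s
print(f"error rate: {errors / len(requests):.1%}")                 # 25.0%
print(f"p90 / p99:  {percentile(times, 90)} / {percentile(times, 99)} ms")
```

With only eight samples the percentiles are crude; real reports compute them over thousands of requests, but the arithmetic is the same.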

Why are averages misleading?

  • Averages do not show the range or variability of the data. If most response times are very fast but a few are extremely slow, the average might suggest everything is fine even though some users experience poor performance, as the sketch after this list shows.

  • Sensitive to outliers. A single very slow response can significantly raise the average, making the system seem slower overall than it actually is for most users.

  • Lacks detail. The distribution of response times, peak loads, and periods of high latency are all lost in a single number.

  • Doesn't reflect user experience. If a small percentage of users have a very poor experience, it can still lead to dissatisfaction, even if the average response time is acceptable.
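The toy example below, with made-up numbers, illustrates the first two points: 99 fast responses and one very slow one produce an average that looks healthy even though one user waits five seconds.

```python
import statistics

# Made-up numbers: 99 fast responses and one very slow one.
response_times_ms = [100] * 99 + [5000]

print(f"average: {statistics.mean(response_times_ms):.0f} ms")  # average: 149 ms
print(f"slowest: {max(response_times_ms)} ms")                  # slowest: 5000 ms
```

A 149ms average says nothing about the request that took 5 seconds; that user's experience is invisible in the summary.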

How to complement the averages?

Including a few more measurements, such as standard deviation (how spread out values are from the average) and percentiles, helps averages paint a better picture of the system's performance.

Standard Deviation

How are the response times spread out from the average?

A low value means that the responses were close to the average, while a high value means they were more varied. If we observe a high standard deviation, it indicates that our application is not responding consistently.

For example, for an API with an average response time of 800ms, the standard deviation takes the fastest and slowest responses into account to show how tightly the response times cluster around 800ms, or how far they stray from it.
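Here is a minimal sketch of that idea, using two made-up sets of samples that share the same 800ms average but spread very differently:

```python
import statistics

consistent = [790, 800, 810, 795, 805]  # tightly clustered around 800 ms
erratic = [200, 1400, 800, 100, 1500]   # same 800 ms average, widely spread

for name, samples in (("consistent", consistent), ("erratic", erratic)):
    print(f"{name}: mean={statistics.mean(samples):.0f} ms, "
          f"stdev={statistics.stdev(samples):.0f} ms")
# consistent: mean=800 ms, stdev=8 ms
# erratic: mean=800 ms, stdev=652 ms
```

Both sets report the same average; the standard deviation is what exposes the erratic one.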

Percentiles

How does a response time compare to all the others?

The bigger our user base, the closer we want to measure to the 100th percentile. Whether a 3-second response time at the 99th percentile is good enough depends on how many users are in that 1%.

For example, if we have 1,000 users and 10 of them (1%) experience a response time of 3 seconds, that is probably not acceptable. However, if we have 1 million users and only 10 of them (0.001%) experience a response time of 3 seconds, that may well be acceptable.
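As a rough sketch of the first case, the made-up sample below has 1,000 requests where 1% take 3 seconds; the slow tail is invisible at p90 and p95 and only surfaces at p99:

```python
import statistics

# Made-up sample: 990 fast responses and 10 slow ones (1% of the total).
response_times_ms = [100] * 990 + [3000] * 10

cuts = statistics.quantiles(response_times_ms, n=100)  # 99 cut points
p90, p95, p99 = cuts[89], cuts[94], cuts[98]

print(f"p90={p90:.0f} ms  p95={p95:.0f} ms  p99={p99:.0f} ms")
# p90=100 ms  p95=100 ms  p99=2971 ms
```

statistics.quantiles interpolates between neighbouring samples, which is why p99 lands just under the 3000ms tail; the point is that the slow 1% only shows up once we look at p99.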
