
Using the Right Mean for Meaningful Performance Analysis

Performance analytics is a field that deals with huge discrete data sets which need to be grouped, organized, and aggregated to gain an understanding of the data. Synthetic and real user monitoring are the two most popular techniques for evaluating the performance of websites; both techniques use historical data sets to evaluate performance.

In web performance analytics, it is preferable to use statistical values that describe a central tendency (a measure of central location) for the discrete data set under observation. The statistical metric can then be used to evaluate and analyze the data. These data sets have innumerable data points that need to be aggregated using different statistical approaches.

With the number of statistical metrics available, the big question is: how do you determine the right statistical metric for a given data set? Mean, median, and geometric mean are all valid measures of central tendency, but under different conditions, some measures of central tendency are more appropriate to use than others.

This article discusses different statistical approaches used in the world of web performance evaluation and the methods preferred in different contexts of performance analysis using real-world performance data.

Common Statistical Metrics

  • Arithmetic Mean (Average)

The average is used to describe a single central value in a large set of discrete data. It is equal to the sum of all data points divided by the number of items:

Average = (x₁ + x₂ + … + xₙ) / n

where ‘n’ represents the number of data samples.

  • Median

The median is the middle value of a data set that has been arranged in order of magnitude. Let us consider a set of data points: [12, 31, 44, 47, 22, 18, 60, 75, 80]. To get the median of the data set, the data points need to be sorted in ascending order.

12, 18, 22, 31, 44, 47, 60, 75, 80

The median for the above data set is 44: when there is an odd number of items n, the median is the value at position (n+1)/2. When there is an even number of items, the median is the average of the values at positions n/2 and (n/2)+1.

  • Geometric Mean

The geometric mean is the nth root of the product of n positive values. For a data set X containing n discrete data points, it is calculated as:

Geometric mean = (x₁ × x₂ × … × xₙ)^(1/n)

  • Standard Deviation

Standard deviation measures the extent of variation of the data samples around the center. For a set of data samples it is calculated as:

σ = √( Σ(xᵢ − a)² / n )

where ‘a’ denotes the average of the ‘n’ data samples of value ‘x’.
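To make the four metrics concrete, here is a short Python sketch that computes each of them with the standard library's `statistics` module (`geometric_mean` requires Python 3.8+), using the nine-point data set from the median example above:

```python
import statistics

# The nine-point data set used in the median example above (load times in seconds).
load_times = [12, 18, 22, 31, 44, 47, 60, 75, 80]

mean = statistics.mean(load_times)             # arithmetic mean: sum / n
median = statistics.median(load_times)         # middle value of the sorted data
gmean = statistics.geometric_mean(load_times)  # nth root of the product of the values
stdev = statistics.pstdev(load_times)          # population standard deviation around the mean

print(f"mean={mean:.1f} median={median} gmean={gmean:.1f} stdev={stdev:.1f}")
# mean≈43.2, median=44, gmean≈36.3, stdev≈23.3
```

Note how the geometric mean (≈36.3) already sits below the arithmetic mean (≈43.2), since it dampens the influence of the larger values.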

Determining the Right Statistical Approach

The two graphs below illustrate the different data distributions we come across in web performance monitoring. Using the formulae explained above, we have derived the average, median, and geometric mean of the webpage load time for websites A and B.

Webpage load time Website A


 Webpage load time Website B


Let us discuss a few use cases to understand how different statistical metrics are applicable in different scenarios.

USE CASE 1

G1 – Scatter plot showing webpage load time data set


G2 – Histogram showing the distribution of data


The graphs G1 and G2 plot data for webpage load time. The uneven distribution of the data points in the scatter plot and histogram shows how inconsistent the load time is.

We can see a higher number of data points in the trailing end of the distribution in the histogram (G2); this means that many of the data points have higher values.

What would be a good statistical metric in such cases? Before answering this, let us take an example. Consider the following data set:

Data Set = [4,4.3,5,6.5,6.8,7,7.2,20,30]

Using the median gives a value of 6.8. But some data points stretch towards a much higher range, with 30 being the highest. So, taking the median value in cases with high outliers is not an accurate estimate of the page load time. The median should be used for data sets with fewer outliers, whose values are concentrated towards the center of the distribution.

Now let us take the average of this same data set. This gives us a value of approximately 10.1, which is skewed towards the outlier values. Once again, the average is not an accurate measure for web page load time.

Since the median and average don't suit this set of data, let us consider the geometric mean. We get a value of approximately 7.9 using the geometric mean; this value is closer to the central value and is not skewed towards the higher or lower values in the data set.

In this use case, we have determined that the geometric mean is the most accurate statistical method to analyze the data.
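As a quick check, the three values discussed in this use case can be reproduced with Python's `statistics` module (`geometric_mean` requires Python 3.8+):

```python
import statistics

# The data set from the example above.
data = [4, 4.3, 5, 6.5, 6.8, 7, 7.2, 20, 30]

median = statistics.median(data)         # 6.8 – ignores the 20 and 30 outliers entirely
mean = statistics.mean(data)             # ≈10.1 – pulled upward by the outliers
gmean = statistics.geometric_mean(data)  # ≈7.9 – between the two, dampening the outliers

print(median, round(mean, 1), round(gmean, 1))
```

The geometric mean lands between the median and the arithmetic mean, which is why it represents this skewed distribution best.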

USE CASE 2

G3 – Scatter plot showing webpage load time data set


G4 – Histogram showing the distribution of data


In the graphs above (G3 and G4), most of the data points are close to each other, with a higher concentration at the center of the Gaussian curve. The differences between the data points are much smaller than in the distribution considered in the previous scenario. This indicates a consistent page load time across different test runs.

Using the average or median to evaluate the central tendency is more accurate in this case: there are not many outliers, so the average isn't skewed towards outlier values.
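A small sketch with hypothetical, tightly clustered load times (not the article's measurements) shows why: when the values sit close to the center, the mean and median nearly coincide, so either one is a fair summary.

```python
import statistics

# Hypothetical, tightly clustered load times in seconds: few outliers,
# values concentrated around the center of the distribution.
consistent = [2.1, 2.2, 2.2, 2.3, 2.3, 2.4, 2.4, 2.5, 2.6]

mean_c = statistics.mean(consistent)      # ≈2.33
median_c = statistics.median(consistent)  # 2.3

print(round(mean_c, 2), median_c)
```

With a skewed data set like the one in use case 1, these two numbers diverge; here they agree to within a few hundredths of a second.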

USE CASE 3

  Website A


Website B


 

The above data distributions show the webpage load time for two different websites. In performance analysis, we need to evaluate the consistency of a webpage, and if there is high volatility in page performance, we should be able to measure the difference between the central value and the outliers.

In this case, the standard deviation is 9.1 seconds for website A and 1.7 seconds for website B, while the medians are 26.6 and 18.1 seconds respectively. Adding one standard deviation to the median gives roughly 36 seconds for website A (26.6 + 9.1) and 20 seconds for website B (18.1 + 1.7). This means that website A had a significant number of data points at around 36 seconds or more, while website B's were concentrated at around 20 seconds or more.

To know what percentage of the data exceeded the median by more than one standard deviation, we can use the cumulative distribution graph.

Website A
 Website B


From the cumulative distribution graphs shown above, we can see that almost 20% of website A's data points are more than one standard deviation above the median, whereas for website B the figure is about 10%.

In performance analysis, standard deviation can be used to evaluate how far the data points spread from the central value of the distribution, and hence how consistent the site's performance is.
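The "median plus one standard deviation" check can be sketched in Python; the samples below are hypothetical load times chosen to resemble a volatile site, not the article's measurements. The fraction of samples above the threshold is one point read off the right tail of the empirical cumulative distribution.

```python
import statistics

def fraction_above(data, threshold):
    """Fraction of samples strictly above a threshold (a single point on the
    right tail of the empirical cumulative distribution)."""
    return sum(1 for x in data if x > threshold) / len(data)

# Hypothetical load-time samples for a volatile site (seconds).
samples = [18, 19, 20, 21, 22, 24, 26, 30, 45, 60]

median = statistics.median(samples)   # 23.0
sd = statistics.pstdev(samples)       # ≈12.9
threshold = median + sd               # the "median + one SD" cut discussed above

print(fraction_above(samples, threshold))  # 0.2 → 20% of samples exceed the cut
```

For this synthetic data set, 20% of the samples are more than one standard deviation above the median, mirroring the behavior described for website A.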

 

The median and average are applicable when the data points are concentrated towards the center of the distribution. On the other hand, if more data points are distributed towards the tail of the distribution and there are large differences between the data points, then the geometric mean is a better choice. Standard deviation should be used to understand the variance of the data points from the median value and to gauge the consistency of the site's performance.

 

The post Using the Right Mean for Meaningful Performance Analysis appeared first on Catchpoint's Blog - Web Performance Monitoring.


More Stories By Mehdi Daoudi

Catchpoint radically transforms the way businesses manage, monitor, and test the performance of online applications. Truly understand and improve user experience with clear visibility into complex, distributed online systems.

