Today I exported the data from Google Search Console as a CSV file, and I've been playing with it. I sorted the articles of this website from the most clicked to the least clicked.
Here is the result:
The sample has 228 articles in total, and it only includes posts with a minimum of 1 click:
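| Articles (by rank) | Clicks | % of clicks | Mean (µ) | Std. dev. (σ) |
|---|---|---|---|---|
| [1 - 228] | 50264 | 100.00% | 220.46 | 1,024.14 |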
The average (µ) is 220.46 clicks per article, with a standard deviation (σ) of 1,024.14 clicks.
The most visited post accounted for 26.76% of the clicks (13450), and the first 20 articles contributed 80.53% of the clicks (40479):
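| Articles (by rank) | Clicks | % of clicks | Mean (µ) | Std. dev. (σ) |
|---|---|---|---|---|
| [1 - 20] | 40479 | 80.53% | 2,023.95 | 2,947.97 |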
Let's calculate the z-score of the first 3 articles:
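As a sketch of the calculation, assuming the usual definition of the z-score and the µ and σ of the full sample, the most clicked post (13450 clicks) works out as:

$$
z = \frac{x - \mu}{\sigma} = \frac{13450 - 220.46}{1024.14} \approx 12.92
$$

The other posts follow the same way from their own click counts.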
The probability of an event, assuming a standard normal distribution, is given by the cumulative distribution function:
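Assuming the standard definition (it uses the same integrand as the code in this post), the probability that a value falls at or below z standard deviations from the mean is:

$$
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^{2}/2} \, du
$$

The probability of a value above z is then $1 - \Phi(z)$.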
Using the previous formula and calculating the probability for a 12.92σ event:
    # Standard normal probability density function
    function y = f(u)
      y = (1/sqrt(2 * pi)) * exp(-u^2/2);
    endfunction

    # Integrate the density from -Inf to 12.92 to get the cumulative probability
    [P, ier, nfun, err] = quad ("f", -Inf, 12.92)
    # P = 1.0000
The probability of that event is 1 - P ≈ 0.
Under a normal distribution, a 12.92σ event is practically impossible. The second post is a 5.09σ event, and its probability is also approximately 0.
The conclusion is that a single post can move the mean and the standard deviation of the whole distribution. Thus, it's incorrect to assume a normal distribution here.
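The tail probability 1 - P is far smaller than a double-precision subtraction from a value that close to 1 can resolve. A minimal sketch of one way to see the actual magnitudes, assuming the same standard normal model and using Octave's built-in complementary error function `erfc` (the name `upper_tail` is hypothetical and used only for illustration):

```octave
# Upper-tail probability of a standard normal variable:
# P(Z > z) = erfc(z / sqrt(2)) / 2
upper_tail = @(z) 0.5 * erfc(z / sqrt(2));

# z-score of the most clicked post, from the figures in the text (about 12.92)
z_first = (13450 - 220.46) / 1024.14;
# z-score of the second post, as quoted in the text
z_second = 5.09;

printf("z = %.2f -> P(Z > z) = %.3g\n", z_first, upper_tail(z_first));
printf("z = %.2f -> P(Z > z) = %.3g\n", z_second, upper_tail(z_second));
```

Both values should come out vanishingly small but nonzero, which is a more informative statement than "0" about how implausible these posts are under a normal model.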
For example, let's assume that I didn't publish the most clicked post:
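| Articles (by rank) | Clicks | % of clicks | Mean (µ) | Std. dev. (σ) |
|---|---|---|---|---|
| [2 - 228] | 36814 | 73.24% | 162.18 | 525.04 |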
The average number of clicks would drop by about 26% (from 220.46 to 162.18), and the standard deviation would be reduced by approximately half (from 1,024.14 to 525.04).
In practical terms:
Hi, I'm Erik, an engineer from Barcelona. If you like the post or have any comments, say hi.