Jef Spaleta (jspaleta) wrote,

Giving accurate Fedora client counting the 115% effort it deserves.

If you are not familiar with the Fedora Client statistics effort, take a moment and read:
http://fedoraproject.org/wiki/Statistics

I'd like to take a moment to talk specifically about how to do a better job of interpreting the total unique IP connections listed here:
http://fedoraproject.org/wiki/Statistics#Total_repository_connections

There are two competing factors that influence how unique IP counts can be interpreted as client counts. On the one hand, private subnets map multiple clients to a single IP address, which makes the unique IP count an undercount of the actual number of clients. On the other hand, we know some clients roam across networks, and those clients can easily be counted multiple times in the unique IP logs, which makes the unique IP count an overestimate of the actual number of clients.
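To make the two effects concrete, here is a toy model. It is only an illustration: each observation is a hypothetical (client, IP address) pair standing in for one repository connection, and the client names and addresses are made up. NAT collapses several clients onto one address, roaming spreads one client across several addresses, and when both are present they can partially or fully cancel.

```python
"""Toy model: NAT and roaming pull unique-IP counts in opposite directions.

Each observation is a hypothetical (client_id, ip_address) pair standing in
for one repository connection; names and addresses are illustrative only.
"""


def unique_counts(observations):
    """Return (unique clients, unique IPs) for a list of (client, ip) pairs."""
    clients = {client for client, _ in observations}
    ips = {ip for _, ip in observations}
    return len(clients), len(ips)


# Three machines behind one NAT router share a single public address,
# so the unique IP count (1) undercounts the clients (3).
nat_only = [
    ("laptop-a", "203.0.113.5"),
    ("desktop-b", "203.0.113.5"),
    ("server-c", "203.0.113.5"),
]

# One laptop roams between home, office and a coffee shop,
# so the unique IP count (3) overcounts the clients (1).
roaming_only = [
    ("laptop-d", "198.51.100.7"),
    ("laptop-d", "192.0.2.44"),
    ("laptop-d", "198.51.100.200"),
]

if __name__ == "__main__":
    cases = [("NAT", nat_only), ("roaming", roaming_only),
             ("both", nat_only + roaming_only)]
    for name, obs in cases:
        n_clients, n_ips = unique_counts(obs)
        # The clients-per-IP ratio is >1 when NAT dominates, <1 when roaming does.
        print(f"{name:8s} clients={n_clients} ips={n_ips} "
              f"ratio={n_clients / n_ips:.2f}")
```

In the "both" case the two effects happen to cancel exactly (4 clients, 4 addresses); in real data which effect wins is an empirical question, which is what the Smolt ratio is meant to answer.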

So which is it in reality? Is the 14 million+ unique IP count sitting in the Fedora MirrorManager logs an over- or undercount of reality?

I'm here to tell you, friends, that it's an undercount, by about 15%. There are probably about 16 million Fedora clients in the wild. How do I get that?

Easy: I had my buddy Mike "Chops" McGrath do a little data mining of the Smolt logs and come up with an aggregate ratio of Smolt UUIDs to unique IPs. That ratio can be taken as a scaling factor to convert unique IP counts into unique client counts, given the following assumptions:

1) The Smolt userbase represents a sample of the overall client base that is no more likely to be on a private network than the average Fedora client.
2) The Smolt userbase represents a sample of the overall client base that is no more likely to have a dynamic IP address than the average Fedora client.
3) The ratio is reasonably stable over a release cycle timescale, but may be subject to a slowly varying drift.
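The post does not describe how the Smolt logs were actually mined, so the following is only a sketch of one way a monthly UUID-to-IP ratio could be computed. It assumes a hypothetical record format of (month, uuid, ip) tuples; the real Smolt log layout may differ. For each month it counts distinct UUIDs and distinct IPs and divides.

```python
from collections import defaultdict


def monthly_ratios(records):
    """Map each month to (unique UUIDs) / (unique IPs) seen in that month.

    `records` is an iterable of hypothetical (month, uuid, ip) tuples;
    this is an assumed format, not the actual Smolt log schema.
    """
    uuids = defaultdict(set)
    ips = defaultdict(set)
    for month, uuid, ip in records:
        uuids[month].add(uuid)
        ips[month].add(ip)
    # Every month with a record has at least one IP, so no division by zero.
    return {month: len(uuids[month]) / len(ips[month]) for month in sorted(uuids)}


sample = [
    ("2008-01", "uuid-1", "203.0.113.5"),
    ("2008-01", "uuid-2", "203.0.113.5"),  # shares an address with uuid-1 (NAT)
    ("2008-01", "uuid-3", "198.51.100.7"),
    ("2008-02", "uuid-1", "203.0.113.5"),
    ("2008-02", "uuid-1", "192.0.2.44"),   # uuid-1 roamed to a second address
    ("2008-02", "uuid-2", "203.0.113.5"),
]

if __name__ == "__main__":
    print(monthly_ratios(sample))  # {'2008-01': 1.5, '2008-02': 1.0}
```

Computing the ratio per month, rather than once over the whole period, is what makes assumption 3 checkable: a slow drift would show up as a trend in the sequence of monthly values.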

If those three assumptions hold, the ratio of UUIDs to IPs is an adequate scaling factor. We looked over the last 16 months of aggregate Smolt logging data, and here is what we found:
Mean Ratio: 1.16
Ratio Stdev: 0.0263
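As a rough illustration, the reported mean and standard deviation can be applied to the unique IP count. The sketch below takes 14,000,000 as a round stand-in for "14 million+" and treats the standard deviation as month-to-month spread in the ratio, not as a formal error bar on the mean. It also shows why "about 15%" is a fair summary: a ratio of 1.16 means there are 16% more clients than IPs, or equivalently that the IP count sits about 14% below the client count.

```python
MEAN_RATIO = 1.16          # mean monthly Smolt UUID/IP ratio reported above
RATIO_STDEV = 0.0263       # std. dev. of the monthly ratios reported above
UNIQUE_IPS = 14_000_000    # round stand-in for the "14 million+" unique IPs

# Scale unique IPs up to an estimated number of unique clients.
estimated_clients = UNIQUE_IPS * MEAN_RATIO

# Month-to-month spread of the ratio, expressed in clients.
spread = UNIQUE_IPS * RATIO_STDEV

# Two ways of stating the same gap between IPs and clients.
clients_above_ips = MEAN_RATIO - 1               # clients exceed IPs by this fraction
undercount = 1 - UNIQUE_IPS / estimated_clients  # IPs fall short of clients by this fraction

if __name__ == "__main__":
    print(f"estimated clients: {estimated_clients:,.0f} +/- {spread:,.0f}")
    print(f"clients above IPs: {clients_above_ips:.1%}")
    print(f"IP undercount:     {undercount:.1%}")
```

With these inputs the estimate comes out to roughly 16.2 million clients, with the monthly ratio moving the figure by a few hundred thousand either way.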

Here is a graph of the Smolt ratio calculated monthly.

Smolt Correction Graph

I'm pretty confident in the validity of the scaling factor. I'm also very pleased to see that the number is greater than 1. This means that the current unique IP address statistics we are showing are a conservative estimate of the actual client numbers. No caveats, no soft-selling.
There are 14 million+ Fedora clients out in the wild, and it's time we start making that point loudly and confidently.

-jef"Measurement methodology matters"spaleta