May 19th, 2009

Proprietary web services... the open ecosystem's steroid scandal.

Okay, why exactly was steroid use by major league baseball players such a scandal? Objectively speaking, steroid use actually helped the for-profit businesses which keep major league baseball going make more money. Better hitting, better pitching... better ticket sales... better free agent salaries... better business. Steroids were good for the business which makes major league baseball possible... so why weren't they ultimately good for major league baseball?

I'll tell you why: because baseball is more than just the business of selling sports entertainment. Baseball, like all professional sports competitions, is deeply grounded in a sense of fair play.  The ground rules of fair play are easy to summarize.  Give competitors equitable and fair access to similar equipment and resources, and let them compete against each other based primarily on personal dedication and skill under an established set of rules.  Steroids upset the ground rules of fair play because they were perceived to be a quick fix to shore up shortcomings in dedication, skill, or both.  Though I could probably argue it takes a certain sort of personal dedication to the concept of achievement to inject yourself in the arse with a needle every day.  I hate needles; I could never be a major league baseball player.

Steroids were great for the business of baseball... but they were bad for baseball itself.  That's an important point... a point I think the open software ecosystem and the businesses which back it need to understand.  I think proprietary web services may be the equivalent of baseball's steroid scandal. Proprietary web services may in the short run be good for the business interests which support the open software ecosystem... but they might ultimately be very bad for the open software ecosystem itself.  Maybe not for exactly the same reasons, since analogies are only good up to a point, but I think it's worth reflecting on as an example of how business interests can work against the interests of the larger community.

I think the Franklin Street Statement on the freedoms of network services has come a little too late to be a fully preventative measure, but it might not be too late for it to have a positive impact on aligning long term business interests with community interests and prevent a deep, debilitating rift.  Going back to the baseball analogy, it's sort of like it's 1988, when Tom Boswell broke the silence concerning Canseco's use of steroids in an interview with Charlie Rose on CBS.  The Franklin Street Statement may be that sort of moment.  I'd like to hope the open development ecosystem will take less than the decade it took baseball to deal with steroids to confront the business interests which are leveraging proprietary web services in ways that undermine the sense of fairness in the open development model.

-jef"play ball"spaleta

Giving accurate Fedora client counting the 115% effort it deserves.

If you are not familiar with the Fedora Client statistics effort, take a moment and read:
http://fedoraproject.org/wiki/Statistics

I'd like to take a moment and talk specifically about how to do a better job at interpreting the total unique IP connections listed here:
http://fedoraproject.org/wiki/Statistics#Total_repository_connections

There are two competing factors which influence how unique IP counts can be interpreted as client counts.  On the one hand, there is the effect of private subnets, which map multiple clients to a single IP address. This would make the unique IP address count an undercount of the actual number of clients.  On the other hand, we know we have clients which roam across networks, and those clients could easily be counted multiple times in the unique IP logs, making the unique IP count an overestimate of the actual number of clients.

So which is it in reality? Is the 14 million+ unique IP count sitting in the Fedora MirrorManager logs an over or under count of reality?

I'm here to tell you, friends, that it's an undercount... by about 15%.  There are probably about 16 million Fedora clients in the wild. How do I get that?

Easy. I had my buddy Mike "Chops" McGrath do a little data mining of the Smolt logs and come up with an aggregate ratio of Smolt UUIDs to unique IPs.  That ratio can be taken as a scaling factor to convert unique IP counts to unique client counts, given the following assumptions:

1) The Smolt userbase represents a sampling of the overall client base which is no more likely to be on a private network than the average Fedora client.
2) The Smolt userbase represents a sampling of the overall client base which is no more likely to have a dynamic IP address than the average Fedora client.
3) The ratio is reasonably stable over a release cycle timescale, but may be subject to a slowly varying drift.
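
To make the ratio concrete, here is a minimal sketch of how a monthly UUID-to-IP ratio could be computed. The `(month, uuid, ip)` record format and the function name are assumptions made for illustration; this is not the actual Smolt log format or the script Mike ran.

```python
from collections import defaultdict


def monthly_uuid_ip_ratio(records):
    """Return {month: unique UUIDs / unique IPs}.

    `records` is an iterable of (month, uuid, ip) tuples. This layout is a
    hypothetical stand-in for whatever the real Smolt logs contain.
    """
    uuids = defaultdict(set)
    ips = defaultdict(set)
    for month, uuid, ip in records:
        uuids[month].add(uuid)
        ips[month].add(ip)
    # A ratio above 1 means more clients than IPs (private subnets dominate);
    # a ratio below 1 means more IPs than clients (roaming dominates).
    return {month: len(uuids[month]) / len(ips[month]) for month in uuids}
```

Three clients sharing two addresses in a month would give a ratio of 1.5 for that month, while a single client seen from two addresses would give 0.5.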

If those three assumptions hold, the ratio of UUIDs to IPs is an adequate scaling factor.  We looked over the last 16 months of aggregate Smolt logging data, and here is what we found:
Mean Ratio: 1.16
Ratio Stdev: 0.0263
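
The arithmetic behind the 16 million figure can be sketched as follows. The round 14,000,000 input stands in for the "14 million+" unique IP count, and treating the standard deviation of the monthly ratios as a rough spread on the estimate is my simplifying assumption, not a formal error analysis.

```python
MEAN_RATIO = 1.16     # mean Smolt UUID-to-IP ratio reported above
RATIO_STDEV = 0.0263  # standard deviation of the monthly ratios


def estimate_clients(unique_ips, ratio=MEAN_RATIO, stdev=RATIO_STDEV):
    """Scale a unique IP count into an estimated client count.

    Returns (estimate, spread), where spread is the IP count multiplied by
    the ratio's standard deviation (a rough, assumed measure of uncertainty).
    """
    estimate = unique_ips * ratio
    spread = unique_ips * stdev
    return estimate, spread


# 14_000_000 is a round stand-in for the "14 million+" unique IPs.
estimate, spread = estimate_clients(14_000_000)
print(f"{estimate:,.0f} +/- {spread:,.0f} clients")
```

With those inputs the estimate comes out a little over 16.2 million clients, give or take a few hundred thousand.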

Here is a graph of the Smolt ratios calculated monthly.

[Graph: Smolt Correction Graph]

I'm pretty confident in the validity of the scaling factor. I'm also very pleased to see that the number is greater than 1.  This means that the unique IP address statistics we are currently showing are a conservative estimate of the actual client numbers.  No caveats, no soft-selling.
There are 14 million+ Fedora clients out in the wild, and it's time we start making that point loudly and confidently.

-jef"Measurement methodology matters"spaleta