I have an idea! - Jef "I am the pusher robot" Spaleta
ramblings of the self-elected Fedora party whip
I have an idea!
I'm still thinking really, really hard about how to do more focused recruitment and training of new contributors in Fedora.

Anyone reading this and working on the Mugshot service? Or if you happen to be sitting next to someone in your cubicle farm who is, throw something at them. I'm still waiting for a sanitized database dump to play with for package pattern analysis, like I've written about previously in my package tempest post. Can I find distinct clumps of users by looking at the package usage patterns in Mugshot? If so, can I link them up with an existing SIG, or encourage them to form new SIGs for the package sub-collections that most interest them as users, and get them contributing to the distribution release effort?

If we had a popcon implementation running for Fedora, that would be a useful dataset to crunch, but it would be harder to complete the loop and do the targeted recruitment. Mugshot is interesting in that I can do the analysis anonymously, and then hand the results back to the Mugshot service so that invitations can be sent to Mugshot users without leaking any personal information to me or to anyone else. It's just another sort of notification that Mugshot pushes to its users.

I'm also still thinking very, very hard about applying similarity analysis in a task tempest, so we can build a dataset of tasks or, dare I say, ideas to datamine for good contributor recruitment opportunities, and possibly some organized skill development to help people start hacking in areas that interest them. Problem is, we don't have anything in terms of a dataset to crunch on in Fedora that I could use as a starting point...yet. But if the Mugshot data analysis bears some contributor recruitment fruit, that should help get the ball rolling.

From: (Anonymous) Date: January 5th, 2009 01:22 pm (UTC) (Link)


Yet very few people use Mugshot.

If we're interested in packages, what we need is basically Debian's popularity contest, with the installer asking folks whether they want to opt in, so we get a high number of people in there by default. However, that would also require tracking who uses what, which feels like a privacy issue to me (it is -- only terrorists use emacs).

I would be interested in people-who-use-also-use data, but personally I don't find computer-generated recruitment so interesting. Humans are pretty good at finding projects they like, and you'll never get that expressed well by a machine IMHO. There are a lot more factors deciding whether someone would want to participate. What if you had the "these are alike and popular" data combined with a way to identify mailing list contributors on those topics? Still, that's hard, and I think the gain could be small.

Couldn't we benefit first from better advertising of the SIGs on fedoraproject.org and even the Fedora start page?

-- Michael DeHaan who still hasn't got a LJ account and isn't bothering with Fedora open ID again because LJ always yells at him.

From: jspaleta Date: January 5th, 2009 06:36 pm (UTC) (Link)

Re: Hmmm

Once I have some data mining done...I want to target SIG advertising to particular clumps of people that align with SIGs.

How many SIGs do we have right now? 20+ different SIGs, and that's a lot of stuff to throw at people on a website. Hell man, people are confused by the short list of "roles" we expose on the website. A long listing of SIG opportunities is going to look like a link farm, and people will just gloss over it.

I'm trying to get to a point where we can do some targeted ads to connect likely new contributors with SIGs that align with their interests, and get SIGs to prepare a follow-up "classroom" on IRC.

And yes, this Mugshot datamining is not the be-all and end-all... it's the data I have right now. So I want to use it...right now. I want to show that the analysis technique is worth building a specialized UI around...by showing that it's finding interesting information in the datasets on hand.

I've already been talking to lmacken about the analysis technique and how to gather a dataset as part of a "my fedora portal" or other such contributor/user interface, offered as a service to people with FAS accounts, so we can link SIG members into the analysis directly. If this analysis works...there is no reason we couldn't build a service interface which popped up likely SIG membership candidates to SIG members as part of the portal messaging.

"Hey Jef, as a Doomsday Device SIG member, you should know that these 5 other Fedora account holders are likely candidates for your SIG based on interests"

I don't need full coverage for this analysis to work...sparse data is okay. What I am trying to describe is just a slightly more complicated way of doing the sort of similarity analysis that social networking services like last.fm do when they recommend certain songs to you based on what you and other people like or dislike. But now I'm trying to find a way to recommend connections between potential contributors and project areas.
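
To make the "recommend candidates to a SIG" idea a bit more concrete, here is a minimal sketch of one way it could work. It is not the actual analysis: the account names, the interest data, and the `sig_candidates` helper are all made up for illustration, and cosine similarity is just one plausible choice of similarity measure. Each person is a sparse dictionary of votes, so missing entries simply count as non-votes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vote dictionaries."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # someone with no votes is similar to nobody
    return dot / (norm_a * norm_b)

def sig_candidates(interests, members, top_n=5):
    """Rank non-members by their average similarity to current SIG members."""
    scores = {}
    for user, votes in interests.items():
        if user in members:
            continue
        sims = [cosine(votes, interests[m]) for m in members if m in interests]
        if sims:
            scores[user] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical, hand-made data: account name -> {thing: vote}.
interests = {
    "jef": {"doomsday-device": 1.0, "laser": 1.0},
    "ann": {"doomsday-device": 1.0, "laser": 0.5},
    "bo":  {"knitting": 1.0},
    "cy":  {"laser": 1.0, "knitting": 0.2},
}

print(sig_candidates(interests, members={"jef"}, top_n=2))
```

Averaging over the members is a deliberately crude scoring rule; a real service would probably want something smarter, but it shows that sparse data is enough to produce a ranked list.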

The brainstorm dataset is a goldmine for this type of analysis...but the brainstorm UI stresses exactly the wrong thing...popularity. For what I'm doing, the overall popularity of a particular "thing" isn't important at all. The voting record for multiple people across multiple "things" is what is important. I can identify clumps of people who vote similarly, and I can identify clumps of "things" that have highly similar voting records. For Mugshot those "things" are applications. For brainstorm those "things" are ideas. I can do it for other "things" as long as you can give me the data in the form of a matrix that maps users to each thing via a numerical vote scale, where a value of 0.0 represents a non-vote. Negative votes are fine, floating-point values are fine... infinity probably not so fine.
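
As a rough illustration of the matrix form described above, here is a small sketch. The users, the four "things", and the votes are invented, and cosine similarity is an assumed choice rather than the measure actually used in the analysis. Rows are users, columns are things, 0.0 is a non-vote, and negative or fractional votes are allowed. Comparing rows finds people who vote alike; comparing columns finds things with similar voting records.

```python
import math

# Hypothetical vote matrix: one row per user, one column per "thing".
users = ["alice", "bob", "carol"]
votes = [
    [1.0, 1.0, 0.0, -1.0],   # alice
    [1.0, 0.5, 0.0, -1.0],   # bob
    [0.0, 0.0, 1.0,  1.0],   # carol
]

def cosine(a, b):
    """Cosine similarity of two equal-length vote vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no information
    return dot / (norm_a * norm_b)

def pairwise(vectors):
    """Similarity of every vector against every other vector."""
    return [[cosine(a, b) for b in vectors] for a in vectors]

# People who vote similarly: compare rows.
user_sim = pairwise(votes)

# Things with similar voting records: compare columns (the transpose).
things = [list(column) for column in zip(*votes)]
thing_sim = pairwise(things)

for name, row in zip(users, user_sim):
    print(name, [round(s, 2) for s in row])
```

Note that overall popularity never enters the calculation: a thing with three votes and a thing with three thousand are compared only by the pattern of who voted which way.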

From: ex_cgwalter Date: January 5th, 2009 04:09 pm (UTC) (Link)
I don't understand exactly what you're asking. Can you clarify?

Incidentally, the applications collection is now part of GNOME Online, not Mugshot. http://online.gnome.org/applications
From: jspaleta Date: January 5th, 2009 05:49 pm (UTC) (Link)
I had a couple of rounds with Owen Taylor about getting a sanitized data dump from the applications database for me to work with. I'm hoping he just forgot about it. He said the service was migrating to new servers.

The underlying question is this: do application usage patterns provide a way to discover clumps of people which represent niche software interests?
There are no a priori definitions of any usage patterns. The application usage data itself can be mined to find self-consistent groupings of people AND groupings of applications. Once we identify those clumps of people, can we target those users for recruitment to contribute in the interest clumps they care about?
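
Here is a toy sketch of what "no a priori definitions" can mean in practice. Everything in it is an assumption made for illustration: the anonymized user IDs, the application sets, the Jaccard overlap measure, the 0.5 threshold, and the connected-components grouping are stand-ins, not the technique actually used on the Mugshot or brainstorm data. The point is only that the same usage data, flipped around, yields both clumps of people and clumps of applications.

```python
from itertools import combinations

# Hypothetical sanitized usage data: opaque user ID -> applications used.
usage = {
    "u1": {"gimp", "inkscape", "blender"},
    "u2": {"gimp", "inkscape"},
    "u3": {"gcc", "gdb", "emacs"},
    "u4": {"gcc", "emacs"},
    "u5": {"firefox"},
}

def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def invert(mapping):
    """Turn user -> applications into application -> users."""
    inverted = {}
    for key, values in mapping.items():
        for value in values:
            inverted.setdefault(value, set()).add(key)
    return inverted

def clumps(mapping, threshold=0.5):
    """Link keys whose sets overlap enough, then return connected groups."""
    neighbors = {key: set() for key in mapping}
    for a, b in combinations(mapping, 2):
        if jaccard(mapping[a], mapping[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in mapping:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            key = stack.pop()
            if key in seen:
                continue
            seen.add(key)
            group.add(key)
            stack.extend(neighbors[key] - seen)
        groups.append(group)
    return groups

print("clumps of people:", clumps(usage))
print("clumps of applications:", clumps(invert(usage)))
```

Because the analysis only ever sees opaque IDs, the resulting clumps could be handed back to the service that holds the real accounts without any personal information leaving it.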

In past posts I described getting a sanitized dump of brainstorm voting data (a large but sparse matrix), which I can process in a similar way to find clumps of people and clumps of ideas. It's a slightly different question, in that it's focused on voting records and not on application usage. But in the abstract it's the same question...since application usage is a sort of vote. I'm still planning to follow up with the brainstorm people about my analysis.
