FLOSSmole


open source, open data: collaborative collection and analysis of open source project data.

James Howison, Kevin Crowston

Content Tagged FLOSSmole

Problem with Public Areas file

We recently had an error in the Sourceforge Public Areas data dumps for October.

New files are now available for download. They are marked "2".

FLOSSmole: FLOSSmole

October developers

Better late than never on the October Developers and Project Developers for SourceForge:

Download these developer files here

Sorry about the .gz format instead of the .bz - for some reason I have problems using the Google Code auto-uploader with .bz files. It's a very strange problem, random and intermittent, hard to pin down why some .bz files will go and some will not. Anyway, I gzipped them and we are good to go.

In case you never downloaded our developer files before, here's what is included there:

-- this shows the list of all developers
SELECT dev_loginname, realname, date_collected, datasource_id
FROM developers
WHERE datasource_id = 143
ORDER BY dev_loginname;

-- this links each developer to the projects they work on, and also tells what role they have on each project
SELECT dev_loginname, proj_unixname, is_admin, position, date_collected
FROM developer_projects
WHERE datasource_id = 143
ORDER BY dev_loginname;
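
To see how the two developer files fit together, here is a minimal sketch in Python. It loads a few made-up rows into an in-memory SQLite database (a stand-in for a local MySQL import) and joins the two tables. The table and column names come from the queries above; the sample rows, the column types, and the choice of dev_loginname plus datasource_id as the join key are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for a local import of the developer files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE developers (
    dev_loginname TEXT, realname TEXT, date_collected TEXT, datasource_id INTEGER
);
CREATE TABLE developer_projects (
    dev_loginname TEXT, proj_unixname TEXT, is_admin INTEGER,
    position TEXT, date_collected TEXT, datasource_id INTEGER
);
""")

# Hypothetical sample rows, not real FLOSSmole data.
conn.executemany(
    "INSERT INTO developers VALUES (?, ?, ?, ?)",
    [
        ("alice", "Alice Example", "2008-10-01", 143),
        ("bob", "Bob Example", "2008-10-01", 143),
    ],
)
conn.executemany(
    "INSERT INTO developer_projects VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("alice", "proj_a", 1, "Project Manager", "2008-10-01", 143),
        ("alice", "proj_b", 0, "Developer", "2008-10-01", 143),
        ("bob", "proj_a", 0, "Developer", "2008-10-01", 143),
    ],
)


def projects_per_developer(conn, datasource_id):
    """Count projects and admin roles per developer for one datasource."""
    query = """
        SELECT d.dev_loginname,
               COUNT(dp.proj_unixname) AS n_projects,
               SUM(dp.is_admin)        AS n_admin
        FROM developers d
        JOIN developer_projects dp
          ON dp.dev_loginname = d.dev_loginname
         AND dp.datasource_id = d.datasource_id
        WHERE d.datasource_id = ?
        GROUP BY d.dev_loginname
        ORDER BY d.dev_loginname
    """
    return conn.execute(query, (datasource_id,)).fetchall()


rows = projects_per_developer(conn, 143)
for login, n_projects, n_admin in rows:
    print(login, n_projects, n_admin)
```

The same SELECT should carry over to a MySQL copy of the data with little change, provided the join key assumption holds there.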

FLOSSmole: FLOSSmole

October data

The October data are available at the following locations:

Google Code Downloads Page

Notes:
  • SF Stats are unavailable this month because the server was not reliably reporting stats when we did our collection.
  • There were a few errors in the file uploads. These files can't be taken off of Google Code, so I've just marked the files "DO NOT DOWNLOAD". You can usually tell these files by their very tiny filesize.
  • SF Developers and SF Developer_projects did not make the datamart build. I'll release these separately -- UPDATE: these are on our download site now
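
If you want to screen downloads for the kind of bad upload described in the notes above, something like the following Python sketch could help. It flags files below a size threshold and checks that a .gz file decompresses all the way to the end. The threshold value and the file names are hypothetical; choose a threshold that makes sense for the file you expect.

```python
import gzip
import os
import tempfile
import zlib

MIN_BYTES = 1024  # hypothetical threshold, not a FLOSSmole rule


def looks_too_small(path, min_bytes=MIN_BYTES):
    """Return True if the file is smaller than the chosen threshold."""
    return os.path.getsize(path) < min_bytes


def gzip_is_readable(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


# Small demonstration on throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()

good_path = os.path.join(workdir, "good.txt.gz")
with gzip.open(good_path, "wb") as fh:
    fh.write(os.urandom(5000))

# Simulate an interrupted upload by keeping only the first half of the bytes.
with open(good_path, "rb") as fh:
    payload = fh.read()
truncated_path = os.path.join(workdir, "truncated.txt.gz")
with open(truncated_path, "wb") as fh:
    fh.write(payload[: len(payload) // 2])

# Simulate a "very tiny" file.
tiny_path = os.path.join(workdir, "tiny.txt.gz")
with open(tiny_path, "wb") as fh:
    fh.write(b"x" * 20)

for path in (good_path, truncated_path, tiny_path):
    print(os.path.basename(path), looks_too_small(path), gzip_is_readable(path))
```

The two checks complement each other: a truncated file can be well above the size threshold and still fail to decompress.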

FLOSSmole: FLOSSmole

Bad August 2008 file

The August 2008 SF file released to Google Code was corrupted, so anyone who downloaded this file only got 90k projects. Please download the new file here. It is called "Datamarts: SF Other, Aug 2008, New, v.1"

FLOSSmole: FLOSSmole

Direct DB access for FLOSSmole collection available

Hello moles,

I'm excited to give you all a heads up that the entire FLOSSmole database is now available directly via a MySQL server.

We have transferred the database to the NSF TeraGrid Data Central hosting site [1] (based at the San Diego Supercomputer Center). It's a bigger machine and professionally administered, which is much better than we could offer ourselves. See below for the access procedure.

The process of transferring the database also enabled us to prepare comprehensive datamarts for each datasource in the database. These are mysqldump files which can be used for local access to parts of the database. There are two for each datasource: one containing the raw HTML pages and one, substantially smaller, containing just the parsed data points. They will be available shortly for those who want to install a local copy of the DB. We'd be very interested to hear the reasons people find to do that, though: we'd like people to share useful transformations of the data, and the Data Central database should be pretty quick.

So now we have three great options for accessing the FLOSSmole data:

1. The traditional monthly flat files
2. Direct MySQL access to the full database @ DC.
3. Comprehensive datamarts for local access

Database access further info
------------------------------

In order to demonstrate usage to NSF and to monitor run-away queries (hey, I write them myself. Often :), interested users need to contact the FLOSSmole project to request a personal username and password, which should not be shared. Other than that simple request, we're not introducing any new AUPs or conditions.

Initially, requesters should join the ossmole-discuss list and email their request to it, with a preferred username. We can revisit using the list for this if the traffic spikes. Turnaround should be no longer than a business day or two (we email the db admin at Data Central with the request).

OTOH when we, and hopefully you, publish workflows using the database, we would like them to work 'out of the box', without a potential user needing to request a user/pass. To enable this, in addition to the full database we are in the process of creating a small database, with very limited data in each table (~20 rows in each table, just demo data). This is to allow querying through a single, public, shared login which we urge people to use when publishing their workflows; once potential users wish to go beyond the sample data they should request their own user/pass and plug it into the workflow. We're still figuring out the best way to do this (finding 20 projects with total data coverage is actually quite hard :)
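
One way a published workflow could follow this pattern is sketched below in Python: it defaults to the shared demo login and switches to personal credentials only when the user supplies them through environment variables. Every host name, database name, login, and variable name here is a hypothetical placeholder, since the demo database details have not been settled yet.

```python
import os

# Hypothetical placeholders: the real demo host, database and login are not yet announced.
DEMO_SETTINGS = {
    "host": "demo-db.example.org",
    "database": "flossmole_demo",
    "user": "demo_user",
    "password": "demo_password",
}


def connection_settings(environ=os.environ):
    """Return demo settings unless personal credentials are supplied.

    The environment variable names are made up for this sketch.
    """
    settings = dict(DEMO_SETTINGS)
    user = environ.get("FLOSSMOLE_DB_USER")
    password = environ.get("FLOSSMOLE_DB_PASSWORD")
    if user and password:
        # Personal login: point at the full database instead of the demo one.
        settings["user"] = user
        settings["password"] = password
        settings["database"] = environ.get("FLOSSMOLE_DB_NAME", "flossmole_full")
        settings["host"] = environ.get("FLOSSMOLE_DB_HOST", settings["host"])
    return settings


demo = connection_settings({})
personal = connection_settings(
    {"FLOSSMOLE_DB_USER": "my_login", "FLOSSMOLE_DB_PASSWORD": "my_secret"}
)
print(demo["user"], demo["database"])
print(personal["user"], personal["database"])
```

The resulting dictionary can be handed to whichever MySQL client the workflow uses. Keeping personal credentials in the environment rather than in the workflow file also makes it harder to share a password by accident.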

Hopefully this improves the accessibility of the datasets. It will also likely result in finding more bugs, both from the migration [2] and within the dataset. We're asking people to file bugs and documentation requests in the Sourceforge Trackers, although discussing them on this list is always welcome as well.

So, have at it.

--J

[1]: http://datacentral.sdsc.edu/ . The Machine (thor) is on Internet2 and has >80G of RAM.

[2]: If anyone wants to chat about ways to confirm data integrity while migrating 300GB+ databases with some very large tables, ping me :) I think we got it sorted, with the much appreciated help of our Master's student, Vinay Venugopal.

FLOSSmole: FLOSSmole

August data released

Hello moles!

Lots of exciting news this month:

1. All releases are up on Google Code instead of SF this month. Go to the FLOSSmole downloads page on Google Code to pick up this month's files.

2. Sourceforge stats server is down, so no stats this month.

3. Debian is back! The data is minimal, but check out what we do have and enjoy that.

4. I've also updated the Google Code FLOSSmole wiki, so if you're looking for basic how-to or documentation, check there first.

5. coming soon: Free Software Foundation (re-released!), Eclipse (ask and ye shall receive)

ENJOY!!

Update: the SF file was corrupted, so anyone who downloaded this file only got 90k projects. Please download the new file here. It is called "Datamarts: SF Other, Aug 2008, New, v.1"

FLOSSmole: FLOSSmole

OSS Watch in Oxford

I am at Oxford in the UK (staying at Hertford College Graduate Center) for a few days for the OSS Watch workshop on Profiling Open Source Communities.

OSS Watch is the National Advisory Service on open source for UK Further Education (FE) and Higher Education (HE). As such, it is part of our remit to help FE and HE institutions and projects who want to engage with open source development, and a key factor for that is the development of open source communities.


I'm giving The Standard FLOSSmole Talk today at 11:45 GMT.

FLOSSmole: FLOSSmole

July data released

Well moles, we were hoping to move to Google Code over the summer to host our file releases, but unfortunately, they have some size (and other?) limitations which are messing us up. They have no problem storing certain files, but others produce errors when I try to upload them. They are rejecting more and more of our files. I've put in a ticket to see if they can bump up the quota over there, but in the meantime, I'm back to putting files up on the Sourceforge Downloads page.

Here's the download files area over on SF (available files for July: Freshmeat, Rubyforge, and Objectweb)

(Remember, since it's an "odd" month, for Sourceforge data, select the "June" release.)

I was able to get some of the files up on Google Code, so go over there and check the files out there if you like.

FLOSSmole: FLOSSmole

June SF data posted

June Sourceforge data has been posted.

This time, I've released the files onto our Google Code project page. Let's see how we like using Google Code for this (psst, I can tell you as the person who does the file releases: it's a lot easier to use Google Code than Sourceforge for this particular part of the job!)

Enjoy!

UPDATE: Because of file size limits on Google Code, I've had to re-release our code onto Sourceforge.

Here is the link to the FLOSSmole SF Downloads Page, where you can get ALL the June files, including the huge datamart files.

FLOSSmole: FLOSSmole

May 2008 data released

Moles, the May 2008 data has been released. Find it all at our SF project page. Forges included this month are: FM, RF, OW.

FLOSSmole: FLOSSmole

April 2008 data released

Hi moles! April 2008 data has been released on our SF project page. Enjoy it! (Debian data included!)

UPDATE 08-JUL-2008: new files released to get past problem with data quality on these April files.

FLOSSmole: FLOSSmole

February data released

Go to our file release page on Sourceforge and pick up all the latest files.

Included this month: SF, FM, RF, OW flat files and data marts (sql statements).

(Debian is on the way! I'll update here as soon as it's ready.)

Enjoy!

FLOSSmole: FLOSSmole

Jan files released and Dec datamarts posted

Hi Moles!

The January data files have been released, and I went back and released the December data marts too (sql statements to make your own database).

Have fun with it! Grab the data from our Sourceforge project page.

FLOSSmole: FLOSSmole

December data released

Hello moles! After considerable delay, here is the December data. Enjoy!

Download the data from our SF release page!

FLOSSmole: FLOSSmole

November data released

Files for November 2007 have been released here:

Sourceforge File Release Page

FLOSSmole: FLOSSmole

October data released

Sorry for the lack of postings! The data has been released; I guess I just forgot to post the updates.

Here are the October data links.

FLOSSmole: FLOSSmole
