[Building Sakai] Playing with the Sakai Mailing List Data

Charles Severance csev at umich.edu
Sun Apr 7 19:53:43 PDT 2013


As part of the SI301 - Networked Computing class that I am teaching at UMich this semester, I am using the entire Sakai dev list from 2005 to give them a significant amount of data to chew on and visualize. Here is the README for all the code I have written so far:

https://github.com/csev/networks-code/blob/master/gmane/README.txt

It consists of a gmane spider (gmane.py), a data cleanup / indexing step (gmodel.py), and a data analysis phase (gbasic.py). Students will write more analysis and visualization programs over the next few weeks. As you will see if you read the above README, cleaning up the data is pretty tricky and prone to error, so I figured I would give you all access to the data and let you see if it needs more cleaning before I turn my students loose on it:

Here is the cleaned data:

http://www-personal.umich.edu/~csev/sakai/email/index.sqlite

You can use the Firefox SQLiteManager plugin to look at the data.  The data model is pretty obvious and the message headers and bodies are compressed with Python's zlib.  The first and very simple data analysis program to read this file is:

http://www-personal.umich.edu/~csev/sakai/email/gbasic.py
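
If you would rather poke at the compressed columns from Python than from SQLiteManager, here is a minimal sketch of the round trip. It is not taken from gbasic.py: the table name Messages and the columns headers and body are assumptions for illustration, so check the real schema in index.sqlite first. The sketch builds a tiny in-memory database so it runs on its own.

```python
import sqlite3
import zlib


def make_db():
    # Hypothetical schema; the real table and column names in index.sqlite may differ.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages (id INTEGER PRIMARY KEY, headers BLOB, body BLOB)"
    )
    return conn


def store(conn, headers, body):
    # Compress both text fields with zlib before storing them as BLOBs.
    cur = conn.execute(
        "INSERT INTO Messages (headers, body) VALUES (?, ?)",
        (zlib.compress(headers.encode("utf-8")), zlib.compress(body.encode("utf-8"))),
    )
    return cur.lastrowid


def load(conn, message_id):
    # Fetch one message and decompress its headers and body back to text.
    row = conn.execute(
        "SELECT headers, body FROM Messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        return None
    headers, body = (
        zlib.decompress(blob).decode("utf-8", errors="replace") for blob in row
    )
    return headers, body


conn = make_db()
msg_id = store(conn, "From: someone@example.org\nSubject: test", "Hello list")
print(load(conn, msg_id))
```

To try this against the downloaded file, replace make_db() with sqlite3.connect("index.sqlite") and adjust the SELECT to whatever table and column names the file actually uses.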

The entire workflow is described in the github repo. But please don't spider your own copy of the data, otherwise gmane might lose their patience. If folks are interested I will upload the raw data (content.sqlite for Sakai is 635 MB, 10X the size of the properly modeled, cleaned up and compressed data in index.sqlite).

We can do a lot of cool analysis, such as text analysis, average reply speed, or average number of replies. I will see what the students want to do as their projects once I whip up a few cool visualizations to get them started.

Let me know if you see an error or if you or your school are improperly represented in the data. I tried as best I could to map people to their most recent email address if they have had more than one email address over the life of the mailing list.
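
To give a feel for what that mapping and the per-person / per-organization counts involve, here is a rough sketch. It is not the code from gmodel.py or gbasic.py: the ALIASES table and the function names are hypothetical, and it treats everything after the @ as the organization, which is a simplification.

```python
from collections import Counter

# Hypothetical alias table: older address -> most recent address.
ALIASES = {"old.name@example.edu": "new.name@example.org"}


def canonical(address, aliases=ALIASES):
    # Normalize case, then follow alias links until reaching the newest address.
    address = address.strip().lower()
    seen = set()
    while address in aliases and address not in seen:
        seen.add(address)  # guard against accidental loops in the alias table
        address = aliases[address]
    return address


def tally(senders, aliases=ALIASES):
    # Count messages per person, then roll those counts up by domain.
    people = Counter(canonical(s, aliases) for s in senders)
    orgs = Counter()
    for address, count in people.items():
        orgs[address.rpartition("@")[2]] += count
    return people, orgs


senders = ["old.name@example.edu", "new.name@example.org", "Someone@Example.org"]
people, orgs = tally(senders)
for name, count in orgs.most_common(40):
    print(name, count)
for name, count in people.most_common(40):
    print(name, count)
```

The hard part in practice is building the alias table, since that is where a person or school can end up misrepresented.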

/Chuck

P.S. Here is some of the output of the program:

Top 40 Email list organizations
gmail.com 7339
umich.edu 6243
uct.ac.za 2451
indiana.edu 2258
unicon.net 2055
tfd.co.uk 1591
berkeley.edu 1384
longsight.com 1347
stanford.edu 1266
ox.ac.uk 1193
ucdavis.edu 1175
rsmart.com 1063
cam.ac.uk 1035
etudes.org 866
gatech.edu 857
rutgers.edu 758
columbia.edu 700
virginia.edu 644
earthlink.net 606
mtu.edu 585
mac.com 563
ufp.pt 475
rice.edu 442
uva.nl 421
yale.edu 407
sakaifoundation.org 321
csu.edu.au 284
uhi.ac.uk 261
yahoo.com 259
upvnet.upv.es 250
hotmail.com 249
upmc.fr 248
threecanoes.com 245
unisa.ac.za 241
serensoft.com 238
ufp.edu.pt 237
aeroplanesoftware.com 235
unavarra.es 227
ucmerced.edu 223
loi.nl 220

Top 40 Email list participants
steve.swinsburg at gmail.com 2657
azeckoski at unicon.net 1742
ieb at tfd.co.uk 1591
csev at umich.edu 1304
david.horwitz at uct.ac.za 1184
stephen.marquard at uct.ac.za 853
arwhyte at umich.edu 782
matthew at longsight.com 701
adam.marshall at ox.ac.uk 699
jimeng at umich.edu 698
slt at columbia.edu 686
clay.fenlason at gatech.edu 670
adrian.r.fish at gmail.com 612
markjnorton at earthlink.net 605
chmaurer at indiana.edu 601
swgithen at mtu.edu 585
knoop at umich.edu 571
hedrick at rutgers.edu 565
ggolden22 at mac.com 560
sinou at etudes.org 527
ray at berkeley.edu 491
bkirschn at umich.edu 489
tpamsler at ucdavis.edu 485
lance at indiana.edu 479
botimer at umich.edu 461
jholtzman at berkeley.edu 452
jleasia at umich.edu 451
zqian at umich.edu 435
matthew.buckett at ox.ac.uk 434
caseyd1 at stanford.edu 431
nuno at ufp.pt 429
ktsao at stanford.edu 429
ajpoland at indiana.edu 405
dlhaines at umich.edu 377
mmmay at indiana.edu 356
a.m.berg at uva.nl 337
dave.ross at gmail.com 330
ottenhoff at longsight.com 328
john.bush at rsmart.com 325
jpgorrono at ucdavis.edu 298


