[Building Sakai] Playing with the Sakai Mailing List Data

Steve Swinsburg steve.swinsburg at gmail.com
Mon Apr 8 05:04:57 PDT 2013


Now all we need are prizes! Or a metric of message quality ;)

cheers,
Steve

On 08/04/2013, at 12:53 PM, Charles Severance <csev at umich.edu> wrote:

> As part of the SI301 - Networked Computing class that I am teaching at UMich this semester, I am using the entire Sakai dev list archive from 2005 onward to give the students a significant amount of data to chew on and visualize. Here is the README for all the code I have written so far:
> 
> https://github.com/csev/networks-code/blob/master/gmane/README.txt
> 
> It consists of a gmane spider (gmane.py), a data cleanup / indexing step (gmodel.py), and a data analysis phase (gbasic.py). Students will write more programs to analyze and visualize the data over the next few weeks. As the README above explains, cleaning up the data is tricky and error-prone, so I figured I would give you all access to the data and let you check whether it needs more cleaning before I turn my students loose on it:
> 
> Here is the cleaned data:
> 
> http://www-personal.umich.edu/~csev/sakai/email/index.sqlite
> 
> You can use the Firefox SQLiteManager plugin to look at the data.  The data model is pretty obvious and the message headers and bodies are compressed with Python's zlib.  The first and very simple data analysis program to read this file is:
> 
> http://www-personal.umich.edu/~csev/sakai/email/gbasic.py
> 
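
For anyone who would rather read the file from Python than from SQLiteManager, here is a minimal sketch of pulling zlib-compressed headers and bodies out of SQLite. The table name `Messages` and the columns `headers` and `body` are assumptions made for illustration and are not confirmed by the message above; check the real schema in index.sqlite first. The demo builds its own small in-memory database so it runs without the download.

```python
import sqlite3
import zlib


def read_messages(conn):
    """Yield (headers, body) as text from a HYPOTHETICAL Messages table.

    Assumes both columns hold zlib-compressed bytes, as described for
    index.sqlite; the table and column names are illustrative only.
    """
    cur = conn.execute("SELECT headers, body FROM Messages")
    for headers_blob, body_blob in cur:
        headers = zlib.decompress(headers_blob).decode("utf-8", errors="replace")
        body = zlib.decompress(body_blob).decode("utf-8", errors="replace")
        yield headers, body


def make_demo_db():
    """Build a tiny in-memory stand-in for index.sqlite (made-up content)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Messages (id INTEGER PRIMARY KEY, headers BLOB, body BLOB)"
    )
    conn.execute(
        "INSERT INTO Messages (headers, body) VALUES (?, ?)",
        (zlib.compress(b"From: someone@example.org"), zlib.compress(b"Hello list")),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for headers, body in read_messages(make_demo_db()):
        print(headers)
        print(body)
```

To try it against the real file, replace `make_demo_db()` with `sqlite3.connect("index.sqlite")` and adjust the query to whatever the schema actually contains.
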
> The entire workflow is described in the github repo. But please don't spider your own copy of the data, otherwise gmane might lose their patience. If folks are interested I will upload the raw data (content.sqlite for Sakai is 635 MB, 10X the size of the properly modeled, cleaned up and compressed data in index.sqlite).
> 
> We can do a lot of cool analysis, such as text analysis, average reply speed, or average number of replies. I will see what the students want to do as their projects once I whip up a few cool visualizations to get them started.
> 
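
As one illustration of the "average reply speed" idea, here is a rough sketch that assumes each message can be reduced to a (message id, id of the message it replies to, sent time) tuple. That tuple layout and the function name are hypothetical; the message above does not say whether reply links are stored in the data or have to be recovered from the headers.

```python
from datetime import datetime
from statistics import mean


def average_reply_seconds(messages):
    """Mean delay in seconds between a message and each direct reply to it.

    `messages` is an iterable of (message_id, in_reply_to, sent_at) tuples,
    a HYPOTHETICAL layout chosen for this sketch. Replies whose parent is
    not in the data set are skipped. Returns None when there are no replies.
    """
    messages = list(messages)
    sent = {message_id: sent_at for message_id, _, sent_at in messages}
    delays = [
        (sent_at - sent[parent]).total_seconds()
        for _, parent, sent_at in messages
        if parent is not None and parent in sent
    ]
    return mean(delays) if delays else None


if __name__ == "__main__":
    demo = [
        ("a", None, datetime(2013, 4, 8, 12, 0)),
        ("b", "a", datetime(2013, 4, 8, 12, 30)),
        ("c", "a", datetime(2013, 4, 8, 13, 30)),
    ]
    print(average_reply_seconds(demo))
```

A median may be a better summary than a mean here, since a single reply that arrives months later would dominate the average.
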
> Let me know if you see an error or if you or your school are improperly represented in the data. I tried as best I could to map people to their most recent email address if they have had more than one email address over the life of the mailing list.
> 
> /Chuck
> 
> P.S. Here is some of the output of the program:
> 
> Top 40 Email list organizations
> gmail.com 7339
> umich.edu 6243
> uct.ac.za 2451
> indiana.edu 2258
> unicon.net 2055
> tfd.co.uk 1591
> berkeley.edu 1384
> longsight.com 1347
> stanford.edu 1266
> ox.ac.uk 1193
> ucdavis.edu 1175
> rsmart.com 1063
> cam.ac.uk 1035
> etudes.org 866
> gatech.edu 857
> rutgers.edu 758
> columbia.edu 700
> virginia.edu 644
> earthlink.net 606
> mtu.edu 585
> mac.com 563
> ufp.pt 475
> rice.edu 442
> uva.nl 421
> yale.edu 407
> sakaifoundation.org 321
> csu.edu.au 284
> uhi.ac.uk 261
> yahoo.com 259
> upvnet.upv.es 250
> hotmail.com 249
> upmc.fr 248
> threecanoes.com 245
> unisa.ac.za 241
> serensoft.com 238
> ufp.edu.pt 237
> aeroplanesoftware.com 235
> unavarra.es 227
> ucmerced.edu 223
> loi.nl 220
> 
> Top 40 Email list participants
> steve.swinsburg at gmail.com 2657
> azeckoski at unicon.net 1742
> ieb at tfd.co.uk 1591
> csev at umich.edu 1304
> david.horwitz at uct.ac.za 1184
> stephen.marquard at uct.ac.za 853
> arwhyte at umich.edu 782
> matthew at longsight.com 701
> adam.marshall at ox.ac.uk 699
> jimeng at umich.edu 698
> slt at columbia.edu 686
> clay.fenlason at gatech.edu 670
> adrian.r.fish at gmail.com 612
> markjnorton at earthlink.net 605
> chmaurer at indiana.edu 601
> swgithen at mtu.edu 585
> knoop at umich.edu 571
> hedrick at rutgers.edu 565
> ggolden22 at mac.com 560
> sinou at etudes.org 527
> ray at berkeley.edu 491
> bkirschn at umich.edu 489
> tpamsler at ucdavis.edu 485
> lance at indiana.edu 479
> botimer at umich.edu 461
> jholtzman at berkeley.edu 452
> jleasia at umich.edu 451
> zqian at umich.edu 435
> matthew.buckett at ox.ac.uk 434
> caseyd1 at stanford.edu 431
> nuno at ufp.pt 429
> ktsao at stanford.edu 429
> ajpoland at indiana.edu 405
> dlhaines at umich.edu 377
> mmmay at indiana.edu 356
> a.m.berg at uva.nl 337
> dave.ross at gmail.com 330
> ottenhoff at longsight.com 328
> john.bush at rsmart.com 325
> jpgorrono at ucdavis.edu 298
> 
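
The two rankings above are simple frequency counts, so they can be sketched in a few lines. This is not gbasic.py itself; it assumes the sender addresses have already been cleaned up and that an "organization" is everything after the "@", which may differ from the domain handling in the real cleanup step. The function name and the sample addresses are made up.

```python
from collections import Counter


def top_counts(senders, limit=40):
    """Return (top organizations, top participants) as (name, count) lists.

    `senders` is an iterable of already-cleaned addresses such as
    "someone@example.org". Treating the text after "@" as the organization
    is an assumption of this sketch.
    """
    senders = [s.strip().lower() for s in senders]
    people = Counter(senders)
    orgs = Counter(s.rsplit("@", 1)[-1] for s in senders)
    return orgs.most_common(limit), people.most_common(limit)


if __name__ == "__main__":
    demo = ["a@x.org", "b@x.org", "A@x.org", "c@y.edu"]
    orgs, people = top_counts(demo, limit=2)
    for name, count in orgs:
        print(name, count)
    for name, count in people:
        print(name, count)
```
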
> 
> 
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> 
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"
