[sakai2-tcc] Status of Tools Survey/Summaries

Neal Caidin nealcaidin at sakaifoundation.org
Sat Jun 1 19:14:27 PDT 2013


Hi TCC,

I made an effort to summarize and graph the survey results and the SQL query results. I think it would be good to have another pair of eyes review my methodology, and my thoughts on how the data is represented, before we make the results public. Thanks to Matt Jones, who let me bounce some ideas off him and helped me learn about exception handling so I could catch data problems.

The results are in the shared Dropbox folder - CLE Tools Query Results -> Summary Results - Draft

* SurveyMonkey_Summary.pdf - a summary of the survey generated by SurveyMonkey. It does not include comments and is not filtered, except that I trimmed the 84 initial responses down to 70, primarily by removing incomplete responses.

* Summary_of_Comments.txt - comments pulled from SurveyMonkey, organized per tool. Useful reading. Comments can be traced back to the commenter if necessary.

* Graphs.pdf and Graphs.key - the same information, one in PDF and one in Keynote. I attempted to graph the data in a way that makes it meaningful, and added comments to make clear how I chose to present it. The cutoff and grouping of tools is a first pass on my part.

Other files represent detailed source data. 

Methodology
-----------------------
For the Tools and Events SQL data, the methodology was this (I can provide my fairly readable Python code if you wish; note that I ended up reading "percentages" rather than raw "events"). A rough sketch of the aggregation step follows the list below.

* For each institution/file, I manually opened the CSV file and created a new column, then had Excel calculate the percentage of events/tools relative to each other, so that the column added up to 100%.
* I had Python go through every file (tools and events were processed separately) and add up the percentages from each file for each event/tool. Not every event/tool appeared in every file; that was fine, since the values were simply summed over the files in which it did appear.
* That produced an aggregate of a somewhat meaningless quantity, a sum of percentages across files, but the total provides a relative weight for each event/tool. It also means each institution carried the same weight in the results.
* In the case of events, I had Python aggregate both at the individual event level and at the "prefix" level. Each event looks something like asn.submit.submission or asn.revise.opendate; the prefix would be "asn" in this case. It seemed logical to roll up the events this way.
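
To make the aggregation concrete, here is a minimal sketch of what the Python step does. The column names ("event", "percent") and the events/*.csv and tools/*.csv layouts are stand-ins for the actual files, not the real names:

    import csv
    import glob
    from collections import defaultdict

    def aggregate(pattern, key_col="event", pct_col="percent"):
        """Sum per-institution percentages for each event/tool across all
        CSV files matching `pattern`. An event/tool missing from a file
        simply contributes nothing for that institution."""
        totals = defaultdict(float)          # full name -> summed percent
        prefix_totals = defaultdict(float)   # prefix    -> summed percent
        for path in glob.glob(pattern):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    try:
                        pct = float(str(row[pct_col]).rstrip("%"))
                    except (KeyError, ValueError):
                        # Skip malformed rows -- the data problems that
                        # exception handling was needed to catch.
                        continue
                    name = row[key_col]
                    totals[name] += pct
                    # Roll events up to their prefix, e.g.
                    # "asn.submit.submission" -> "asn".
                    prefix_totals[name.split(".")[0]] += pct
        return totals, prefix_totals

    # Tools and events are processed as separate runs:
    event_totals, event_prefixes = aggregate("events/*.csv")
    tool_totals, _ = aggregate("tools/*.csv", key_col="tool")

    for name, weight in sorted(event_prefixes.items(),
                               key=lambda kv: kv[1], reverse=True):
        print("{0}\t{1:.1f}".format(name, weight))

Note that the summed percentages are not themselves percentages, just relative weights; dividing each total by the number of files that contained the event/tool would turn it into an average share per institution.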

Hope that makes sense. 

See you tomorrow!

-- Neal



