[Building Sakai] Building scalable Sakai tools

Wed Mar 20 03:11:36 PDT 2013

There is also a plain scalability issue with loading all the data to
render a view.

At the recent Sakai SA meeting, UNISA were talking about courses of
30 000 students*. If you have a gradebook with 100 items, you will be
fetching 30 000 x 100 = 3 000 000 records from the database and
performing a calculation on them. Even if the DAO does this efficiently
(and I agree with David that it usually doesn't), you will fairly
quickly hit a point where this doesn't scale. I would suggest a couple
of rules of thumb:

1) No tool should load all its data, or a list of all participants, on
its default page (culprits I know of: Site Info, Gradebook).
2) If loading everything is required to calculate information about
high-level items (e.g. average grades), that information should be
calculated at a specific point in the workflow (e.g. grading) and
persisted.
3) If a list of participants is needed, it must be paged, and it should
offer search and filtering options if the number of users goes above
some threshold x.
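
As a rough illustration of the last two rules (this is not Sakai code:
the table names, function names and page size are all made up, and
SQLite stands in for the real database), the totals behind an average
can be updated at grading time, and the participant list can be
fetched one page at a time:

```python
import sqlite3

PAGE_SIZE = 50  # hypothetical page size


def make_db():
    """Create an in-memory database with hypothetical gradebook tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE participant (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE grade (item_id INTEGER, user_id INTEGER, points REAL,
                            PRIMARY KEY (item_id, user_id));
        CREATE TABLE item_stats (item_id INTEGER PRIMARY KEY,
                                 graded_count INTEGER, points_sum REAL);
    """)
    return db


def record_grade(db, item_id, user_id, points):
    """Store a first-time grade and update the persisted totals for the item.

    Both writes happen in one transaction. Re-grading is not handled here.
    """
    with db:
        db.execute("INSERT INTO grade VALUES (?, ?, ?)",
                   (item_id, user_id, points))
        cur = db.execute(
            "UPDATE item_stats SET graded_count = graded_count + 1, "
            "points_sum = points_sum + ? WHERE item_id = ?",
            (points, item_id))
        if cur.rowcount == 0:
            # First grade for this item: create its totals row.
            db.execute("INSERT INTO item_stats VALUES (?, 1, ?)",
                       (item_id, points))


def average(db, item_id):
    """Read the average from one persisted row instead of scanning grades."""
    row = db.execute(
        "SELECT points_sum / graded_count FROM item_stats WHERE item_id = ?",
        (item_id,)).fetchone()
    return row[0] if row else None


def participants_page(db, page, search=""):
    """Return one page of participants, filtered by a name fragment."""
    return db.execute(
        "SELECT user_id, name FROM participant WHERE name LIKE ? "
        "ORDER BY name, user_id LIMIT ? OFFSET ?",
        ("%" + search + "%", PAGE_SIZE, page * PAGE_SIZE)).fetchall()
```

The default page then only needs average(db, item_id) per item and
participants_page(db, 0), regardless of how many students are in the
site. OFFSET-based paging is the simplest form; it gets slower on very
deep pages, which is one more reason to offer search and filtering.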

David

* UNISA consider this a standard course, not a MOOC ;-)

On Wed, 2013-03-20 at 05:58 -0400, Adams, David wrote:
> It will be harder to bridge multi-select JOINs across disparate components (e.g. user service to Mneme or Assignments), but I'm going to guess that within each component you can make a lot of improvements. I've not worked with Mneme, but I'm assuming it uses Hibernate. We log all our SQL statements, and in pretty much every Hibernate tool we see the pattern where each individual object gets loaded separately, one by one.
>
> So, to take a perhaps-not-exact but realistic example of the patterns we see: to load a quiz for a single user, it might load the quiz, and then it will get a list of part IDs, load each part one by one, and from them get lists of question IDs, load each question one by one with individual SQL statements, then check for attachments to each question one by one with individual SQL statements, then load a list of answer IDs, and load the answers one by one with individual SQL statements. And then it will choose the three or four that even need to be displayed on the current page and just display those. So for a quiz with 50 questions, you might see 200-300 individual SQL queries on each page load, and that's only if they aren't somehow loaded twice, as the analysis you attach describes: once to count, and again to display. When the instructor tries to grade the quiz this might all happen again, only multiplied by the number of students in the class. We've seen page loads with tens or hundreds of thousands of single-record queries that could have been compressed to two or three well-designed SQL queries.
>
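
To make the pattern described above concrete, here is a small sketch
that counts the SELECT statements issued by a one-by-one loader and by
a single JOIN. It is an illustration only: the two-table schema and
all function names are invented, SQLite stands in for the real
database, and no Hibernate is involved.

```python
import sqlite3


def make_quiz_db(num_questions):
    """Build a hypothetical quiz with one answer per question."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE question (id INTEGER PRIMARY KEY, quiz_id INTEGER,
                               text TEXT);
        CREATE TABLE answer (id INTEGER PRIMARY KEY, question_id INTEGER,
                             text TEXT);
    """)
    for q in range(num_questions):
        db.execute("INSERT INTO question VALUES (?, 1, ?)",
                   (q, "question %d" % q))
        db.execute("INSERT INTO answer VALUES (?, ?, ?)",
                   (q, q, "answer %d" % q))
    db.commit()
    return db


def load_one_by_one(db, quiz_id):
    """Fetch the list of IDs, then issue separate SELECTs per object."""
    ids = [r[0] for r in db.execute(
        "SELECT id FROM question WHERE quiz_id = ? ORDER BY id", (quiz_id,))]
    result = []
    for qid in ids:
        q = db.execute("SELECT text FROM question WHERE id = ?",
                       (qid,)).fetchone()
        a = db.execute("SELECT text FROM answer WHERE question_id = ?",
                       (qid,)).fetchone()
        result.append((q[0], a[0]))
    return result


def load_joined(db, quiz_id):
    """Fetch the same data with a single joined statement."""
    return db.execute(
        "SELECT q.text, a.text FROM question q "
        "JOIN answer a ON a.question_id = q.id "
        "WHERE q.quiz_id = ? ORDER BY q.id", (quiz_id,)).fetchall()


def count_selects(db, loader, quiz_id):
    """Run a loader and count the SELECT statements the database saw."""
    statements = []
    db.set_trace_callback(statements.append)
    rows = loader(db, quiz_id)
    db.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return rows, len(selects)
```

Both loaders return identical rows. For a 50-question quiz the first
one issues 1 + 2 x 50 = 101 SELECTs and the second issues 1, and the
gap widens with every extra level of nesting (parts, attachments) and
with every student at grading time.
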
> I think the process to fix this would be to take each common activity in a tool (e.g. taking a quiz, grading a quiz, viewing a grade), run those activities on a test instance while logging all database activity, and then analyze the relationship between the number of queries and the number of users in the site, the size and complexity of the quiz, etc. Identify the areas where the relationship is exponential, and go in and replace the SQL-blind Java code that's letting Hibernate make all the decisions about SQL with some carefully crafted and properly cached explicit joined statements.
>
> To avoid this in other Hibernate tools, the only solution is for developers not to let themselves be entranced by Hibernate's ability to hide the database activity from them. The analysis of what queries are being generated needs to be done no matter how you write the logic in the first place. If you've got an O(n^2) relationship between your number of objects and your number of queries, you're doing something wrong.
>
> As for Assignments, what needs to be fixed there is to bring out the XML blob into plain fields in the assignment tables. Until then, performance will be terrible, as most of the things you want to search or sort on are hidden from the database.
>
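
A small sketch of why the blob hurts, under loud assumptions: the XML
layout, attribute names and table names below are invented for the
example and are not the real Assignments schema. With the blob, every
row has to be fetched and parsed before anything can be sorted; with
plain columns the database can sort and limit by itself.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Store the same hypothetical assignments as a blob and as columns."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE assignment_blob (id INTEGER PRIMARY KEY, xml TEXT);
        CREATE TABLE assignment_flat (id INTEGER PRIMARY KEY, title TEXT,
                                      due_date TEXT);
        CREATE INDEX assignment_flat_due ON assignment_flat (due_date);
    """)
    samples = [(1, "Essay", "2013-05-01"),
               (2, "Quiz", "2013-03-15"),
               (3, "Lab", "2013-04-10")]
    for id_, title, due in samples:
        xml = '<assignment title="%s" duedate="%s"/>' % (title, due)
        db.execute("INSERT INTO assignment_blob VALUES (?, ?)", (id_, xml))
        db.execute("INSERT INTO assignment_flat VALUES (?, ?, ?)",
                   (id_, title, due))
    db.commit()
    return db


def next_due_from_blob(db, limit):
    """Fetch and parse every row in the application, then sort there."""
    parsed = [(ET.fromstring(xml).get("duedate"), id_)
              for id_, xml in db.execute("SELECT id, xml FROM assignment_blob")]
    return [id_ for _, id_ in sorted(parsed)[:limit]]


def next_due_from_columns(db, limit):
    """Let the database sort on an indexed column and return only `limit` rows."""
    return [r[0] for r in db.execute(
        "SELECT id FROM assignment_flat ORDER BY due_date, id LIMIT ?",
        (limit,))]
```

Both functions return the same IDs, but only the second keeps its
cost tied to the number of rows requested rather than the number of
rows stored.
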
> -dave
>
> ________________________________________
> From: sakai-dev-bounces at collab.sakaiproject.org [sakai-dev-bounces at collab.sakaiproject.org] On Behalf Of Mark Breuker [mbreuker at loi.nl]
> Sent: Wednesday, March 20, 2013 5:11 AM
> To: sakai-dev at collab.sakaiproject.org
> Cc: Berg, Alan
> Subject: [Building Sakai] Building scalable Sakai tools
>
> Hi all,
>
> We are experiencing performance issues with Mneme in a worksite that has around 2500 students. When an instructor wants to open the list of submissions per quiz / assignment the page takes around 1 minute to load :( We asked Edia to investigate the issue for us (see analysis attached) and found some parts in the code that can be improved.
>
> We are also seeing similar performance issues in other tools. Assignments also performs very badly. Assignment 2 is a lot better but still takes around 6 seconds to load a similar page with the same number of users/submissions. I know Alan Berg has also documented slow performance in a number of other tools here: https://confluence.sakaiproject.org/display/WGMOOC/MOOC+Scalabilty
>
> In order to move forward and fix the issue in Mneme (and other tools) I would like to know if there are common design patterns that can (and should) be used when doing things like loading a list of all users in a site combined with the date they submitted an assignment. Arguably the best way would be to perform a SQL JOIN query (that joins the site member info with the submission info) on the database, but that would break the service-oriented design of Sakai.
>
> Bottom line: I'm looking for some input to document design patterns for highly scalable Sakai tools. I've started a page on Confluence here: https://confluence.sakaiproject.org/x/owPzB
>
> Cheers,
>
> Mark
>
> Mark Breuker
> Product Owner
> Tel.: +31 71 5451 203
>
> Leidse Onderwijsinstellingen bv
> Leidsedreef 2
> 2352 BA Leiderdorp
> www.loi.nl
