[sakai2-tcc] Annotation based injection within Sakai components

Wed May 16 21:42:00 PDT 2012

Regarding session replication.  I think you should talk to folks at
Unicon who did some work there for Wiley a few years back.  I think
the approach was built on top of Terracotta.  Some of that made it into
core, but I don't really know the state of that anymore.

There was a lot of good brainpower thrown at that problem.  What I
remember was that Sakai's model objects have references back to the
service they came from, and this caused a giant nightmare for any type
of session replication.  Attempting to make that stuff transient was
impossible due to classloading issues, and that is how they ended up
on Terracotta.  Or something like that; others over there are probably
better versed.
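
To make the "transient" idea concrete, here is a minimal sketch in
plain Java, not Sakai code.  SiteService, SiteModel, and REGISTRY are
hypothetical names made up for illustration; REGISTRY just stands in
for whatever per-JVM service lookup the receiving server would use.
It only shows the general pattern of skipping the service reference
during serialization and looking it up again afterwards.  It does not
touch the cross-classloader problems that, as far as I remember, made
this unworkable in Sakai.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TransientServiceSketch {

    /** Hypothetical stand-in for a service; not a real Sakai interface. */
    interface SiteService {
        String describe(String siteId);
    }

    /** Hypothetical per-JVM registry standing in for a service lookup. */
    static final Map<String, SiteService> REGISTRY = new ConcurrentHashMap<>();

    /** Model object that keeps a back-reference to the service it came from. */
    static class SiteModel implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String siteId;
        // transient: the service itself is never written to the stream.
        private transient SiteService service;

        SiteModel(String siteId, SiteService service) {
            this.siteId = siteId;
            this.service = service;
        }

        String describe() {
            return service.describe(siteId);
        }

        // Invoked by Java serialization on the receiving side; restores the
        // plain fields, then re-resolves the service from the local registry.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            service = REGISTRY.get("siteService");
        }
    }

    /** Serializes and deserializes a model, as a replicated session would. */
    static SiteModel roundTrip(SiteModel model)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return (SiteModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        REGISTRY.put("siteService", id -> "site:" + id);
        SiteModel original = new SiteModel("abc", REGISTRY.get("siteService"));
        SiteModel copy = roundTrip(original);
        System.out.println(copy.describe()); // prints site:abc
    }
}
```

In a single classloader this works fine.  The trouble starts when the
model class and the service live in different classloaders, which is
the situation described above.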

The skinny is, I agree being more stateless is a big step forward in
that direction, but this problem may prove to be hard to resolve.
There may be other ways to attack this problem now.  It's worth
exploring, for the reasons you've outlined.

On Wed, May 16, 2012 at 9:31 PM, Adams, David <da1 at vt.edu> wrote:
> Not that I'm on the technical committee (not even sure this message will make it through) but as an interested party on the operations side, I think Chuck's list of fixing some of the biggest deployment and scaling headaches ought to be high on the list, as they could mostly be accomplished without breaking tools. In particular, issues surrounding event message passing between servers (sounds like rSmart has a solution here), and finding a way to make user state transferable between app servers (I assume cleaning up the Velocity tools would go a long way towards this goal), so that we can take servers offline without waiting hours for active sessions to drain (or so that server crashes don't destroy hundreds of user sessions at a time) are key, as is runtime configurability.
>
> As for Chuck's mention of the authz tables, here at VT we had about 18 months of intermittent reliability problems (ie, complete system crashes across all app servers that cropped up in just seconds) that we can trace 90% of back to database bottlenecks due to authz queries piling up waiting for shared memory latches to be freed in Oracle--with every new user request wanting to run a dozen or more of those queries, if there was any slowdown whatsoever, the app servers would exhaust their DB pools in seconds, no matter how much headroom we gave them. (We solved the problem by moving from a 16-core server to an 80-core server... Oracle correlates the available shared memory latches to the core count.)
>
> While those queries are heavy on the joins, and we found that we could make improvements in throughput by, for example, hardcoding the role keys for .anon and .auth roles into the primary authz query, and so I think Chuck's proposal of eliminating the string lookups on every query is a good idea, ultimately those fixes, for Oracle at least, don't solve the contention problem, which is all on the sakai_realm_rl_fn table itself. With just three integer fields, you can cram a lot of 12- (or 24-) byte rows into an 8k memory block, and removing the joins from the equation only buys you so much. A better approach, I believe, would be to find ways of either massively reducing the required number of records in that table by expanding the role of something like the !site.helper realm, but with the ability to remove positions at higher levels. So each site type ought to have a basic template of permissions for each role, but instead of copying those out to each realm individually, that default list ought to be the most commonly used set and there should be no reason for most sites to have their own permissions at all. And of the ones that do end up changing permissions, a short list of additions and deletions could be cached upon a visit to a site and merged with the site type defaults. Combined with a more reliable cache thanks to a better event system, it ought to be possible to keep active database queries to a minimum.
>
> Alternatively, make available an alternative authz provider that stores its data in some other database system. The realm-role-function triplet lookup seems well suited to some kind of NoSQL datastore, and just eliminating the authz load from the database would drop our query load by 40+%.
>
> A similar system makes sense for presence/persistent chat. Those are the kind of features that schools like VT end up turning off as soon as we run into performance issues because of the high load in terms of app server memory and database query volume. If those components could be moved out of the primary Sakai Java VM and database pair and into an associated but separate system, we operations folks would be a lot happier about the scalability, reliability, and performance of CLE.
>
> David Adams
> Director, Systems Integration and Support
> Virginia Tech Learning Technologies
>
> ________________________________________
> From: sakai2-tcc-bounces at collab.sakaiproject.org [sakai2-tcc-bounces at collab.sakaiproject.org] On Behalf Of Charles Severance [csev at umich.edu]
> Sent: Wednesday, May 16, 2012 11:39 AM
> To: sakai2-tcc Committee
> Subject: Re: [sakai2-tcc] Annotation based injection within Sakai components
>
> I think that we need to avoid major non-upwards compatible Kernel changes through the end of 2013.  Small changes are OK, but let's not build a whole new kernel.  OAE has done that.   As others have said, there will be so little new development in 2.x that our antique service pattern is not the limiting factor.
>
> What people are proposing sounds a lot like Moodle 2.0.   It was intended as a rewrite, an architectural redo, etc etc.   After it fell behind, they ended up not accomplishing everything they hoped and then released something which was not all that great of an improvement for end users nor all that great of an improvement for module developers.  It was just different and incompatible - not particularly better.
>
> It took a lot of inertia out of Moodle for several years and they are just finally recovering from it.   Re-Writes take forever.
>
> If I were to come up with things to put on a 2012-2013 roadmap it would be things like:
>
> - Create non-SQL message pump and put it in trunk as the default
>
> - Add id columns to AUTHZ tables and switch to all integer joins for significant AUTHZ queries
>
> - Go through and greatly reduce the use of state from Velocity tools
>
> - Careful performance testing and tuning in trunk
>
> - Add multi-tenancy hooks into the data model
>
> - Improve our test coverage for test plans, unit tests, etc
>
> - Strengthen /direct across all tools - build a set of tests that pound /direct
>
>
> I am not saying that these *are* the things to do - what I am saying is that Spring 3 is *not* the thing to do.   We need to not make the Moodle mistake and lose our contrib community in our gung-ho goal of internal cleanliness that has no end-user or system deployer impact.
>
> In terms of OAE (I will speak carefully here), I think that 2014 will be an important year for OAE.  If OAE is still trying to get off the ground in 2014, I think that the broader community might decide that OAE has taken too long and decide that it is not the next big thing given that it started in 2008.   A rewrite that is still not done after six years and millions of dollars per year kind of proves the point.
>
> So from a CLE perspective I think that in 2013 we need to be tactical and then in 2014 become strategic depending on the state of OAE.  Here are some possible scenarios:
>
> (a) OAE emerges beautifully in 2014 and is in reasonably wide production in 2014.  In that case I think we all will just start working on OAE and 2.10 will live long and prosper with a long series of non-expanding dot releases.
>
> (b) OAE does not thrive in 2014.  At this point, we *may* decide that CLE needs to move back to a feature-expanding strategy.   If that is the case, our path forward would likely *not* be writing our own brand new Kernel or evolving the current kernel in new ways.  A *much* smarter approach would be to take parts of OAE - in particular the service model, etc and then extend it and then graft our services into it to make a super-new kernel.  I think that much of the early work in OAE w.r.t. how to make services was pretty solid and brilliant and we don't need to re-visit all the things they tried and discarded.
>
> Sooooo - that is a long way to say that a major kernel redo in 13-14 is not a good idea IMHO.   In 14-15 we either will be in maintenance mode or borrowing tasty bits from OAE.  So starting an effort to make a "third" kernel seems not to be a useful effort at this point.
>
> Sorry for the long text.
>
> /Chuck
>
>
>
>
> _______________________________________________
> sakai2-tcc mailing list
> sakai2-tcc at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai2-tcc

-- 
John Bush
602-490-0470