[sakai2-tcc] Infrastructure discussion for future meeting?

David Adams da1 at vt.edu
Wed Mar 13 10:58:21 PDT 2013


John Bush wrote:
> So regarding the #15.  On one hand we already have that, I mean you
> are using Java, you can always launch a thread via TimerTask or
> Quartz or whatever.  It's not like we are using Ruby or PHP or
> something without real threads, stuck behind a giant global
> interpreter lock.  We do this sort of thing all over the place already.

Agreed. The one thing this is missing is some common structure. What
led me to think of this one was the recent addition of ZIP support in
Resources. Testing it out on some ~3GB folders led to some very
long-running responses, and a user is likely to click the command
again if there's no response for 20+ seconds. That's not the only item
with issues: any extract/zip/export/archive process that involves a
large dataset has the potential to run long enough to cause problems,
or at the very least confusion.

In the long view, I'd like to see operations that risk running long
return immediately with an indicator that the job is in progress. The
little widget at the top with the user account links could be used to
make the job status report (with progress bars or estimated wait times
if possible) available everywhere, and it could also indicate when the
job is complete.

So even if we were to just stick with local threads, there are still
more layers that could be built up to keep users in the loop and to
save implementers the effort of worrying about how to notify users of
the status or completion of a job.
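
Just to make the "common structure" part concrete, something as small
as the sketch below would cover most of what I have in mind: hand the
work to a shared pool, get a job id back immediately, and let the UI
poll for status. (All of the names here are invented for illustration,
not an actual API proposal.)

    // Rough sketch only -- class and method names are invented, not a
    // real proposal.
    import java.util.UUID;
    import java.util.concurrent.*;

    public class LocalJobRunner {

        public static class JobStatus {
            public volatile int percentComplete = 0;
            public volatile boolean done = false;
        }

        private final ExecutorService pool = Executors.newFixedThreadPool(4);
        private final ConcurrentMap<String, JobStatus> jobs =
                new ConcurrentHashMap<String, JobStatus>();

        /** Hand off a long-running task and return a job id immediately. */
        public String submit(final LongRunningTask task) {
            final String id = UUID.randomUUID().toString();
            final JobStatus status = new JobStatus();
            jobs.put(id, status);
            pool.submit(new Runnable() {
                public void run() {
                    task.run(status);   // task updates percentComplete as it goes
                    status.done = true; // the status widget can poll for this
                }
            });
            return id;
        }

        /** Polled by the UI widget for progress bars / completion. */
        public JobStatus getStatus(String id) {
            return jobs.get(id);
        }

        public interface LongRunningTask {
            void run(JobStatus status);
        }
    }

The zip action in Resources, for instance, would just call submit()
and redirect back to the folder listing instead of blocking the
request thread for minutes.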

That said, I very much like the external message queue idea, and I
think that's where we should be looking. The key to success would be
to start small: get the message queue component established as part of
the architecture, and only then start moving major pieces of
functionality onto it. But once it was in place, there's a lot it
could do beyond handling asynchronous requests, with inter-server
event communication high on that list.
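
As a rough sketch of how small that first step could be, publishing an
event to a stand-alone broker over plain JMS is only a few lines. (The
broker implementation, URL, and topic name below are placeholders I
made up, not a recommendation of any particular product or naming
scheme.)

    // Sketch only: assumes an external broker at the given URL and a
    // topic name we'd still have to agree on.
    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class EventPublisher {

        public void publish(String eventJson) throws JMSException {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://broker.example.edu:61616");
            Connection connection = factory.createConnection();
            try {
                Session session =
                        connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("sakai.events"); // placeholder
                MessageProducer producer = session.createProducer(topic);
                producer.send(session.createTextMessage(eventJson));
            } finally {
                connection.close();
            }
        }
    }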

Moving more tasks to asynchronous processing would also mean that
deployers could control the throughput of such batch jobs, or even
pause them if necessary. We could use this infrastructure to manage
long-running schema transitions like the XML->binary properties blob
transition in Content around version 2.5 or so.
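
For example, the number of consumer threads a node runs against the
job queue could be an ordinary deployer setting, and pausing is just a
matter of stopping delivery on the connection. (Sketch only; the
property and queue names are made up.)

    // Sketch: consumer-side throttling. Property and queue names are
    // invented for illustration.
    import javax.jms.*;

    public class BatchJobConsumer {

        private final Connection connection;

        public BatchJobConsumer(ConnectionFactory factory, MessageListener worker)
                throws JMSException {
            connection = factory.createConnection();
            // Deployers tune throughput by changing the consumer count.
            int consumers = Integer.getInteger("sakai.jobs.consumers", 2);
            for (int i = 0; i < consumers; i++) {
                Session session =
                        connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(
                        session.createQueue("sakai.batch.jobs")); // placeholder
                consumer.setMessageListener(worker);
            }
            connection.start();
        }

        /** Pause delivery to every consumer on this node, e.g. during peak load. */
        public void pause() throws JMSException { connection.stop(); }

        /** Resume where we left off. */
        public void resume() throws JMSException { connection.start(); }
    }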

> I think this type of integration model is really becoming more and
> more necessary, especially if we want to keep local code out of sakai.
> It really makes things much more manageable.  Things like analytics,
> warehousing, data integration, etc are all obvious use cases for this.

Warehousing is definitely a high priority around here. And there are
possibilities for using the queue server as an ingress for
course-management change notifications, if your SIS is slick enough.
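
In that scenario, once the SIS can push changes onto a queue, the
Sakai side only needs a listener along these lines. (A sketch; the
payload format and what we would actually do with it are assumptions.)

    // Sketch: SIS -> course management ingress. Message format is an
    // assumption for illustration only.
    import javax.jms.*;

    public class SisChangeListener implements MessageListener {

        public void onMessage(Message message) {
            try {
                String payload = ((TextMessage) message).getText();
                // Parse the enrollment add/drop and apply it through the
                // course management APIs -- omitted here.
                System.out.println("SIS change received: " + payload);
            } catch (JMSException e) {
                // Real code would log and let the broker redeliver.
                throw new RuntimeException(e);
            }
        }
    }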

> In addition, I think you can make a good argument that you really want
> your broker living in a separate process, I don't think you want it
> embedded, unless you have some way to restart it without bringing down
> the whole Sakai instance.  So I'm not sure what people think about
> this architecture, we've always had this approach of running
> everything in a bunch of cloned Tomcats + a database.

I think moving to a more componentized architecture is the right
direction to be heading. In terms of robustness and scalability, we're
seeing lots of people hit limits on the amount of memory churn a
single JVM can handle and on the number of app servers that can run
simultaneously before database contention becomes the problem.

Pulling certain tasks out of the app server JVM is the right move.
Maybe we can get Search, Sitestats, Quartz, and James out of Tomcat
down the road as well. But I think a solid message queue is the first
step in achieving any componentization.

-dave

