[DG: Open Forum] [Announcements] Sakai OAE

Antranig Basman antranig.basman at colorado.edu
Sat Sep 8 02:30:08 PDT 2012


I agree with everything Zach has said here about how the problems with OAE evolved - and I wanted to add 
some details about how they developed in the first place, since I don't think that Bob Martin's description 
matches the early part of our situation very well. I was part of a group who was intimately involved with 
the development of Sakai in its middle years, and got to witness a number of these developments at first hand.

In contrast to "having slowed to a crawl", our development on what is now named Sakai CLE between 2004 and 
2007 was steadily growing more efficient, and we were having very few problems attracting new, enthusiastic 
contributors and getting them up to speed with being productive with adding features and stability to Sakai. 
Features which, one has to mention, in short order found their way in front of real users. Far from new 
developers "making messes, slowing everyone down", Sakai 2.x was enjoying a period of community health and 
productivity exceeding anything it had seen since its inception in the early days of CHEF at Michigan.

This makes the trajectory since then seem all the more sad and inexplicable.

I would like to ensure the community can firmly register what I believe are the right lessons from this 
painful experience, and perhaps avoid a second, or, as Zach has pointed out, even a THIRD repetition of 
exactly the same mistakes ("whatever is going to happen next").

For a start, the grievous performance problems and outright architectural failure of what is now called OAE 
should come as no surprise to anyone. As early as 2007, with some pretty basic measurements of Jackrabbit and 
other JCR implementations on what might be a typical workload of Sakai requests, I could easily establish that 
JCR-based technologies were not even within a factor of 100 of meeting their performance targets. At that 
point I was advised to sit on my findings for fear of spoiling the JCR party - sadly, at the time I didn't 
see that much could be served by making a fuss, since my previous attempts at whistleblowing had been roundly 
suppressed. However, this doesn't excuse the failure of any of the projects since then to make even these 
basic kinds of engineering measurements before committing to a technology -

i) what is a typical workload of queries for a running system - that is, what nature of queries, with what 
kind of terms related in what way?
ii) what are its requirements in terms of transactionality and related throughput, perhaps expressed in 
terms of requests per second, related to an expected deployment size and number of active users

The fact that these kinds of basic measurements were never assembled or published by any of these projects, 
or, where they were assembled, were suppressed, makes it no surprise at all that "After four years, the 
system can't support more than a few dozen users at a time, or search through more than a few thousand objects."
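A sketch of how such a measurement might have been gathered follows. This is a hypothetical harness, not code from the project: the in-memory MapStore merely stands in for whichever backend (JCR, an RDBMS, or otherwise) is under evaluation, and would be replaced by an adapter over the real technology.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical measurement harness, not project code. The MapStore below
// is an in-memory stand-in for whatever storage backend is being evaluated.
public class StorageThroughputSketch {

    interface Store {
        void put(String path, String content);
        String get(String path);
    }

    // Replace with an adapter over the candidate technology (JCR, RDBMS, ...).
    static class MapStore implements Store {
        private final Map<String, String> data = new HashMap<>();
        public void put(String path, String content) { data.put(path, content); }
        public String get(String path) { return data.get(path); }
    }

    // Runs a simple mixed workload (roughly 20% writes, 80% reads over a
    // small set of paths) and returns operations per second.
    static double measureOpsPerSecond(Store store, int operations) {
        long start = System.nanoTime();
        for (int i = 0; i < operations; i++) {
            String path = "/site/" + (i % 100) + "/resource" + (i % 1000);
            if (i % 5 == 0) {
                store.put(path, "content-" + i);
            } else {
                store.get(path);
            }
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return operations / elapsedSeconds;
    }

    public static void main(String[] args) {
        double opsPerSec = measureOpsPerSecond(new MapStore(), 100_000);
        System.out.printf("~%.0f ops/sec on the in-memory stand-in%n", opsPerSec);
    }
}
```

Repeating the same loop against each candidate backend, with the query mix drawn from question i) and the targets from question ii), would have yielded exactly the kind of comparable, reproducible numbers whose absence is lamented above.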

I don't necessarily agree that "anti-RDBMS" sentiment is primarily the issue. The needs of Sakai may well be 
best met by a technology which is an RDBMS, or by one which is not - the issue is that, 5 years later, we 
still do not know which, because the relevant engineering study was never done. The attempt to "hedge bets" 
by trying to create a new storage-neutral platform that would abstract away "any" storage system woefully 
misses the point of the whole NoSQL movement, which was to provide enough diversity that ONE engineering 
choice might be found optimal for a particular workload and stuck with. At the end of the day this might 
actually be an RDBMS after all - but we still simply don't know. It appears that only within the last months 
of the project have some serious attempts at performance characterisation (by Branden) started (again). 
However, even "back of the envelope" work would have been just fine, so long as it was sufficiently public 
and regular. Better still, getting basic "order of magnitude" estimates quickly (public, agreed, and easily 
reproducible) would probably be far more useful than getting highly detailed measurements months (or, as it 
has turned out, years) later.
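By way of illustration, here is the kind of "order of magnitude" estimate being advocated. Every number below is an invented assumption for the sake of the arithmetic, not a figure from the project.

```java
// Back-of-the-envelope capacity estimate. All inputs are illustrative
// assumptions invented for this sketch, not measurements from the project.
public class CapacityEstimate {

    static double shortfallFactor(int activeUsers, double secondsBetweenRequests,
                                  int queriesPerRequest, double measuredBackendQps) {
        double requestsPerSecond = activeUsers / secondsBetweenRequests; // e.g. 10,000 / 30 ~ 333
        double queriesPerSecond = requestsPerSecond * queriesPerRequest; // e.g. 333 * 20 ~ 6,667
        return queriesPerSecond / measuredBackendQps;                    // how far short the backend falls
    }

    public static void main(String[] args) {
        // Assumed: 10,000 active users issuing one request every 30 seconds,
        // 20 storage queries per request, backend measured at 50 queries/sec.
        double factor = shortfallFactor(10_000, 30, 20, 50);
        System.out.printf("Backend is a factor of ~%.0f short of required throughput%n", factor);
    }
}
```

Even such crude figures, published early and kept current, would have exposed an order-of-magnitude mismatch long before years of engineering effort were committed.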

I should mention at this point that I'm in 100% agreement with Dave Adams' characterisation of the causes 
which led to failure, but want to add some details from my own experience. This wasn't simply a "generic 
project failure" a la Brooks, Martin or Gall, but resulted from a number of quite specific failures in the 
technical and management culture of the project. In particular, at the end of the period I mentioned, 
architectural control of Sakai was seized by a very small group with a number of particular failings. 
Amongst these failings were i) a lack of appetite for detail and consistent work, or the ability to stick 
with any architectural choice for longer than 18-24 months, ii) a need to be seen as providing "heroic 
remedies" for the general dissatisfaction within Sakai, together with a gung-ho "high risk, high rewards" 
value system, and iii) an intolerance for shared values and shared technical decision-making that was 
extreme even by the relatively low standards of the community of that decade.

Having already run into failure after 3 years of expressing these qualities, this group then attempted to 
finesse the community by renaming the project "Sakai 3" as "Sakai OAE" and beginning afresh with exactly the 
same approach and outlook, wasting the remaining community goodwill and resources - an act of astonishing 
wickedness and cynicism.

And what was so substantially unsatisfactory, right from the start, about the entire family of Sakai 3/OAE 
approaches? Almost everyone in this thread has put their finger on it already, but I will state it again 
just so there can be no doubt in the record - it was the FAILURE TO MAINTAIN A WORKING SYSTEM.

The banked value in the Sakai 2.x system was, and is, incalculable. It is a "going concern", meeting the 
needs of its userbase, day in, day out. It is a tried and tested codebase and user community, with more than 
a decade of experience behind it. The fact that it has survived the cynical starvation of resources 
perpetrated by the OAE branch for 5 years is only further testament to its real vitality and durability. I 
suggest that those responsible accept as an axiom that this codebase will still be in use in 2020 and 
beyond, and consider what that implies in terms of decisions about resources, staffing, and technical 
direction. The fact that anyone ever dreamed that there was value behind an approach which involved throwing 
away/ignoring its codebase and community is one of the most startling failures of management I've ever seen. 
When, in 2008 I tried to remonstrate with one of the managers responsible, I was presented with analogies 
taken from blackjack, casinos, and the gold rush... I can only refer back to my earlier points about the 
"need to feel heroic" and "lack of appetite for steady, detailed work".

The staff who are the true heroes are the ones who have been all this time thanklessly maintaining this 2.x 
branch with steady and responsible effort, continuing to put out releases, whilst all the glory has been 
going to a will-o'-the-wisp project that has wasted engineering resources and mindshare for 5 years for the 
benefit of just a handful of users.

I would like to add my voice to those calling for a wholesale abandonment of the remnants of OAE and 
OAE-like projects, and a return of resources and prime status to those planning and managing Sakai 2.x 
releases. The problems of that codebase were already well-understood by the middle of the last decade, and 
there are numerous straightforward directions that can be taken to improve it by means of steady, 
INCREMENTAL developments. For example - an architecture-wide agreement on handling of markup fragments and 
URLs both appearing within these fragments and those used to address them. Or, as another example, a 
simplified "shim" API maintained on top of and alongside the ContentHostingSystem that could be used for 
gradual migration of old applications towards a suitable repository system - together with a standard for 
associated RESTful URLs. As others have also noted, the huge investment in UX research and front-end 
implementation on 3/OAE must not be abandoned, but used to inform new developments and improvements on the 
2.x platform. But above all, I recommend that one core principle of sanity be adopted, which is that no work 
be embarked on by the community which is even suspected of being more than a couple of weeks away from being 
integrated into a working branch of the working CLE system which is currently in widespread use, delivering 
value to users.
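As a purely illustrative sketch of the "shim" idea above: a deliberately narrow facade that legacy tools could code against, implemented first over the existing content service and later over whatever repository system is eventually chosen. The interface, names, and URL scheme here are all invented for illustration and are not part of any Sakai API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a simplified content "shim". All names and the URL
// scheme are invented for illustration; they are not real Sakai APIs.
public class ContentShimSketch {

    // The narrow facade legacy tools would code against.
    interface ContentShim {
        byte[] read(String path);
        void write(String path, byte[] content);
        boolean exists(String path);
        // A stable RESTful address for a resource.
        String urlFor(String path);
    }

    // A first implementation could delegate to the existing content service;
    // this in-memory version just demonstrates the contract.
    static class InMemoryShim implements ContentShim {
        private final Map<String, byte[]> store = new HashMap<>();
        public byte[] read(String path) { return store.get(path); }
        public void write(String path, byte[] content) { store.put(path, content); }
        public boolean exists(String path) { return store.containsKey(path); }
        public String urlFor(String path) { return "/content" + path; }
    }

    public static void main(String[] args) {
        ContentShim shim = new InMemoryShim();
        shim.write("/site/demo/syllabus.html", "<h1>Syllabus</h1>".getBytes());
        System.out.println(shim.urlFor("/site/demo/syllabus.html"));
    }
}
```

The point of such a facade is that an application migrated onto it once can later be moved between storage backends without further change, which is exactly the gradual, incremental path argued for above.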

These choices would already have been overdue by 2009.

Cheers,
Antranig.

On 9/7/2012 2:33 PM, Zach A. Thomas wrote:

 > Remarkably, Uncle Bob Martin told the whole story even before it
 > happened, in episode 1 of his Clean Coders video series.[1] Here is
 > the relevant portion. I've added footnotes to show where the OAE story
 > deviates (it's not by much). We lived out versions of this story three
 > times: when OAE started as Sakai 3, when we replaced Jackrabbit with
 > sparsemapcontent, and whatever is going to happen next.
 >
 > Zach
 >
 > P.S. Disclaimer: I'm one of the developers who was downsized. It makes
 > sense; I was relatively expensive.
 >
 >> # The Productivity Trap
 >> ## © 2011, Uncle Bob Martin
 >>
 >> Have you ever worked on a greenfield project? Do you remember how
 >> productive it was? Lightning would crackle from your fingertips as
 >> you got feature after feature working. You were going fast. Users
 >> would come to you and ask you for something new and you could have it
 >> to them in a matter of hours or days.
 >>
 >> But the speed didn't last. A year or two later, things had slowed to
 >> a crawl. The harder and harder you worked, the slower and slower
 >> things seemed to go. And why? Because of the mess that had grown in
 >> the code. And slogging through that mess had really started to slow
 >> you down, and slow you down a lot. That high productivity you enjoyed
 >> at the beginning had now plummeted. Things that took you hours before
 >> took you days. Things that took you days before now took you weeks,
 >> months, or can't even be done at all.
 >>
 >> Managers grew concerned. After all, they had based their plans on
 >> your high productivity at the start.[2] Now they faced a very scary
 >> gap in their business plans. The first strategy used to fill that gap
 >> was to put more pressure on the software developers. This served only
 >> to drive the developers to make even bigger and bigger messes, and
 >> despite heroic efforts, slow down even more.
 >>
 >> Next, managers tried to add more people to the team. This, of course,
 >> caused productivity to plummet as the new people sucked the life out
 >> of the old people.[3] Then, the new people started working faster,
 >> making messes of their own, slowing everybody down even more. Adding
 >> man-power's not cheap. So now managers were faced with ever
 >> increasing cost and ever decreasing productivity.

