[sakai2-tcc] [Building Sakai] Assignments Shows Wrong Submission For Student

David Adams da1 at vt.edu
Wed Mar 6 07:07:00 PST 2013


Matthew Jones wrote:
> Fixing the data model so there is no XML and performance is good in sites with large
> enrollment. Possibly a creative fix to better process the XML could be achieved rather
> than breaking up the data.

Assignments is not the only Sakai tool to have made this mistake in
data design, but it's one of the hardest to dig the data out of
(Content is far harder). As a systems/DB guy I can't fathom the logic
behind putting search-worthy and sortable fields into a non-native
blob.

I see no technical reason not to develop a single, one-time migration
that unpacks the base64-encoded XML attributes into typed, named
fields in the table. The data model is very tightly defined. There aren't
unexpected fields popping up here and there within the XML. I haven't
looked closely at the Assignments data API, but I can't imagine it
would need to change very much. Only the data access layer would need
to change, and it ought to be much simpler in the end.
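A minimal sketch of what such a one-time migration could look like. The table, column, and attribute names here are illustrative inventions, not Sakai's actual schema, and the base64-per-attribute encoding is an assumption about the legacy format:

```python
# Sketch of the one-time migration proposed above: unpack base64-encoded
# XML attributes from a blob column into typed, named columns. Table,
# column, and attribute names are illustrative, not Sakai's real schema.
import base64
import sqlite3
import xml.etree.ElementTree as ET

def migrate(conn):
    """Copy every legacy XML row into a table with typed, named fields."""
    conn.execute("""CREATE TABLE submission_typed (
        id TEXT PRIMARY KEY, submitter TEXT, submitted INTEGER, grade TEXT)""")
    rows = conn.execute("SELECT id, xml FROM submission_blob").fetchall()
    for row_id, blob in rows:
        attrs = ET.fromstring(blob).attrib
        # Attribute values are assumed base64-encoded in the legacy format.
        dec = lambda v: base64.b64decode(v).decode("utf-8")
        conn.execute("INSERT INTO submission_typed VALUES (?, ?, ?, ?)",
                     (row_id, dec(attrs["submitter"]),
                      int(dec(attrs["submitted"])), dec(attrs["grade"])))
    conn.commit()

# Tiny demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission_blob (id TEXT, xml TEXT)")
enc = lambda s: base64.b64encode(s.encode()).decode()
conn.execute("INSERT INTO submission_blob VALUES (?, ?)",
             ("s1", '<submission submitter="%s" submitted="%s" grade="%s"/>'
              % (enc("student01"), enc("1362551000"), enc("A"))))
migrate(conn)
typed = conn.execute(
    "SELECT submitter, submitted, grade FROM submission_typed").fetchone()
```

The point is that the unpacking is mechanical: because the data model is tightly defined, every row yields the same fixed set of typed columns.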

The benefits in terms of data retrieval times would be enormous. On
Oracle, pulling a CLOB field can be literally 100 times slower than
pulling a row of varchar fields containing the exact same data. Not to
mention that we would no longer need to maintain the serialization
code and libraries, or pay the codec CPU cost.

And the search and sort capabilities would be huge bonuses for the
tool developers as well as for sysadmins *ahem* who will no longer
need to reverse engineer non-standard object serialization methods
into a set of scripts that can parse database blobs into sane, typed
data.
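To illustrate the payoff: with typed columns, the searches and sorts described above become ordinary indexed SQL rather than blob parsing. A minimal sketch, again with invented table and column names:

```python
# Sketch of the search/sort payoff once fields are typed columns;
# names are illustrative, not Sakai's real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE submission (
    id TEXT PRIMARY KEY, submitter TEXT, submitted_at INTEGER, grade TEXT)""")
conn.executemany("INSERT INTO submission VALUES (?, ?, ?, ?)", [
    ("s1", "alice", 1362551000, "B"),
    ("s2", "bob",   1362552000, "A"),
    ("s3", "carol", 1362553000, "A"),
])
# An ordinary index makes the lookup cheap even with large enrollment;
# a serialized blob would force decoding every row before filtering.
conn.execute("CREATE INDEX ix_submission_grade ON submission (grade)")
a_grades = conn.execute(
    "SELECT submitter FROM submission WHERE grade = 'A' "
    "ORDER BY submitted_at DESC").fetchall()
```

The same question asked of base64-encoded XML blobs requires custom scripts to decode and parse every row before any filtering or sorting can happen.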

Performance-wise, though, the area of the code that really needs this
treatment is ContentHosting. A huge opportunity was missed when the
decision was made to rewrite the XML serialization code and provide an
online background migration service (code that is still lingering in
kernel somewhere) without addressing the primary design error: putting
relational data and searchable attributes into non-relational,
non-searchable form. In fact, the current binary format is worse for
searching than the XML was.

> The No-SQL XML data structure of Sakai was a good idea but with
> the lack of easily addable indexes and slow readers doesn't work
> very great in practice.

I would argue that the performance failure was entirely predictable,
i.e., it was never a good idea. XML is properly understood as a data
transfer format, not a data storage format. ORM was a fresh concept at
the time these decisions were made, and people rightly saw relational
databases
as imperfect media for representing object structures. But relational
databases are proven, fast, well-understood, and well-supported, and
in any case, that's the data storage system that was being used. You
should design for the technology you've chosen.

Trying to force some non-native blobs into a relational database to
satisfy an urge for purity in system design means that ultimately we
have had to compromise performance, spend countless hours hacking
around bottlenecks, live with a lack of tools for inspecting,
troubleshooting, and tweaking data structures from outside the
application environment, and make the code for accessing and handling
these blobs non-standard, cluttered, and confusing.

There's a valuable lesson to be learned from this if we pay attention.

-dave

