[sakai2-tcc] [Building Sakai] Assignments Shows Wrong Submission For Student

Wed Mar 6 12:47:23 PST 2013

The issue with *our* use of XML as a NoSQL store is that we just treat it as a string of text that needs to be deserialized.
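
To illustrate the cost, here is a minimal Python sketch. The table, column, and attribute names are made up for the example (this is not the actual Sakai schema), and it uses an in-memory SQLite database rather than Oracle or MySQL. Because the database only sees an opaque string, finding one student's submissions means fetching and parsing every row:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table: one opaque XML string per submission (not the real Sakai schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission_blob (id INTEGER PRIMARY KEY, xml TEXT)")
rows = [
    ('<submission student="alice" grade="90"/>',),
    ('<submission student="bob" grade="75"/>',),
    ('<submission student="alice" grade="60"/>',),
]
conn.executemany("INSERT INTO submission_blob (xml) VALUES (?)", rows)


def submissions_for(student):
    """Return ([(id, grade), ...], rows_parsed) for one student."""
    found = []
    parsed = 0
    # No WHERE clause on the student is possible here:
    # the database cannot see inside the stored string.
    query = "SELECT id, xml FROM submission_blob ORDER BY id"
    for row_id, xml_text in conn.execute(query):
        element = ET.fromstring(xml_text)  # deserialize every row
        parsed += 1
        if element.get("student") == student:
            found.append((row_id, int(element.get("grade"))))
    return found, parsed


matches, rows_parsed = submissions_for("bob")
print(matches, rows_parsed)
```

The number of rows parsed always equals the size of the table, however few rows match, which is why this gets painful in sites with large enrollment.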

In Oracle, the XMLTYPE allows for direct querying of the XML within the database, and many other powerful actions. Then you don't need to worry about database upgrades if you want to add another column; the data just goes into the XML and you work with it. You can create a dynamic view from that info for faster data access, if required.

The new engine I am working on (FoxOpen) is completely XML driven and it is amazingly fast since it uses native XML support in Oracle. I don't know if MySQL has this same level of support.

For our uses though, this data should just be converted out to a relational model since it would be a lot of work to retrofit proper XML support.
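
A rough sketch of what that one-time conversion could look like, in Python. It assumes the base64-encoded XML attributes David describes in the quoted message below; the table names, the three fields, and the use of SQLite are all hypothetical and only there to keep the example self-contained:

```python
import base64
import sqlite3
import xml.etree.ElementTree as ET


def b64(text):
    """Encode a string the way the hypothetical blob stores attribute values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def unpack(xml_text):
    """Decode one blob into a typed (student, grade, submitted) tuple."""
    element = ET.fromstring(xml_text)
    decoded = {
        name: base64.b64decode(value).decode("utf-8")
        for name, value in element.attrib.items()
    }
    return decoded["student"], int(decoded["grade"]), decoded["submitted"] == "true"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission_blob (id INTEGER PRIMARY KEY, xml TEXT)")
conn.execute(
    "CREATE TABLE submission (id INTEGER PRIMARY KEY, student TEXT NOT NULL, "
    "grade INTEGER NOT NULL, submitted INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX submission_student ON submission (student)")

# Sample legacy rows (made-up data).
for student, grade, submitted in [("alice", "90", "true"), ("bob", "75", "false")]:
    blob = '<submission student="%s" grade="%s" submitted="%s"/>' % (
        b64(student), b64(grade), b64(submitted))
    conn.execute("INSERT INTO submission_blob (xml) VALUES (?)", (blob,))

# One-time transition: read each blob once, write typed columns, keep the same id.
blobs = conn.execute("SELECT id, xml FROM submission_blob").fetchall()
for row_id, xml_text in blobs:
    conn.execute(
        "INSERT INTO submission (id, student, grade, submitted) VALUES (?, ?, ?, ?)",
        (row_id,) + unpack(xml_text),
    )

# The database can now filter and sort on the fields directly.
result = conn.execute(
    "SELECT student, grade FROM submission WHERE grade > 80 ORDER BY student"
).fetchall()
print(result)
```

After the conversion, only the data access layer needs to know the blobs ever existed, and the index on the student column does the work that the parsing loop used to do.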

cheers,
Steve

On 07/03/2013, at 2:07 AM, David Adams <da1 at vt.edu> wrote:

> Matthew Jones wrote:
>> Fixing the data model so there is no XML and performance is good in sites with large
>> enrollment. Possibly a creative fix to better process the XML could be achieved rather
>> than breaking up the data.
> 
> Assignments is not the only Sakai tool to have made this mistake in
> data design, but it's one of the hardest to dig the data out from
> (Content is way harder). As a systems/DB guy I can't fathom the logic
> behind putting search-worthy and sortable fields into a non-native
> blob.
> 
> I see no technical reason not to just develop a single transition to
> unpack the base64-encoded XML attributes into typed, named fields in
> the table. The data model is very tightly defined. There aren't
> unexpected fields popping up here and there within the XML. I haven't
> looked closely at the Assignments data API, but I can't imagine it
> would need to change very much. Only the data access layer would need
> to change, and it ought to be much simpler in the end.
> 
> The benefits in terms of data retrieval times would be enormous. On
> Oracle pulling a CLOB field can be literally 100 times slower than
> pulling a row of varchar fields containing the exact same data. Not to
> mention the lack of need to maintain the serialization code and
> libraries and the codec CPU time.
> 
> And the search and sort capabilities would be huge bonuses for the
> tool developers as well as for sysadmins *ahem* who will no longer
> need to reverse engineer non-standard object serialization methods
> into a set of scripts that can parse database blobs into sane, typed
> data.
> 
> Performance-wise, though, the area of the code that really needs this
> treatment is ContentHosting. A huge opportunity was missed when the
> decision was made to rewrite the XML serialization code and provide an
> online background migration service (all of which code is still
> lingering in kernel somewhere) but without addressing the primary
> design error in which relational data and searchable attributes were
> put into non-relational, non-searchable form. In fact, the current
> binary format is actually worse for searching than the XML was.
> 
>> The No-SQL XML data structure of Sakai was a good idea, but with
>> the lack of easily addable indexes and with slow readers it doesn't
>> work very well in practice.
> 
> I would argue that the performance failure was entirely predictable,
> i.e. it was never a good idea. XML is properly seen as a data transfer
> format, not a data storage format. ORM was a fresh concept at the time
> these decisions were made, and people rightly see relational databases
> as imperfect media for representing object structures. But relational
> databases are proven, fast, well-understood, and well-supported, and
> in any case, that's the data storage system that was being used. You
> should design for the technology you've chosen.
> 
> Trying to force some non-native blobs into a relational database to
> satisfy an urge for purity in system design means that ultimately we
> have had to compromise performance, spend countless hours hacking
> around bottlenecks, live with a lack of tools for inspecting,
> troubleshooting, and tweaking data structures from outside the
> application environment, and make the code for accessing and handling
> these blobs non-standard, cluttered, and confusing.
> 
> There's a valuable lesson to be learned from this if we pay attention.
> 
> -dave