[sakai2-tcc] [Building Sakai] The myth of schemas (was: Assignments...)

Thu Mar 7 16:15:31 PST 2013

I agree. I think there are use cases where you want high consistency and
others where you want high partition tolerance; those are the only *real*
differences. These are not all-or-nothing though, and relational
MySQL/MariaDB (cluster) achieves some of this by potentially degrading
availability or consistency to scale out in the event of a failure.

However, the bigger problem with Sakai's use of XML in these old tables was
that queries had to read the data in all of the XML, not just a single
entry, to build list views or indexes. Either the data wasn't all indexed
properly, the results weren't paged correctly, or the queries were just wrong.
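
To make the list-view problem concrete, here is a rough sketch in Python
with sqlite3. It is not Sakai's actual schema or code: the "message" table,
its columns, and both function names are made up for illustration. The slow
version parses the XML of every row in a site to build one page; the indexed
version lets the database sort and page on real columns.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db(n=50):
    """Build a hypothetical 'message' table holding both real columns and an XML blob."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE message ("
        " id INTEGER PRIMARY KEY, site_id TEXT, sent_at INTEGER,"
        " subject TEXT, xml TEXT)"
    )
    # The index is what allows the paged query below to avoid a full scan.
    db.execute("CREATE INDEX message_site_sent ON message (site_id, sent_at)")
    for i in range(n):
        xml = f'<message subject="msg {i}" sent="{i}"><body>large body</body></message>'
        db.execute(
            "INSERT INTO message (site_id, sent_at, subject, xml) VALUES (?, ?, ?, ?)",
            ("site-1", i, f"msg {i}", xml),
        )
    return db


def list_page_slow(db, site_id, page, size):
    """Anti-pattern: fetch and parse every XML blob, then sort and page in memory."""
    rows = db.execute("SELECT xml FROM message WHERE site_id = ?", (site_id,)).fetchall()
    items = [ET.fromstring(x).attrib for (x,) in rows]
    items.sort(key=lambda a: int(a["sent"]), reverse=True)
    return [a["subject"] for a in items[page * size:(page + 1) * size]]


def list_page_indexed(db, site_id, page, size):
    """Sort and page in the database using indexed columns; the XML is never read."""
    rows = db.execute(
        "SELECT subject FROM message WHERE site_id = ?"
        " ORDER BY sent_at DESC LIMIT ? OFFSET ?",
        (site_id, size, page * size),
    ).fetchall()
    return [s for (s,) in rows]
```

Both functions return the same page, but the first one's cost grows with
the size of the whole site (and of every message body), which is exactly
the large-site edge case.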

Some tools like MailArchive were significantly improved in "more recent"
versions of Sakai (2.6 - https://jira.sakaiproject.org/browse/SAK-13584).
However, Announcements and MailArchive still have some improvements that
can be made because of edge cases
(https://jira.sakaiproject.org/browse/SAK-16539). The edge cases (large
sites, large message bodies, large numbers of objects) are the biggest
problems with these tools and are rarely tested outside of production.

This data can be stored in a NoSQL-like way moderately efficiently, as long
as you have a good plan for handling indexes and special cases as they arise
(like FriendFeed did/does:
http://backchannel.org/blog/friendfeed-schemaless-mysql).

Then you get a semi-hybrid system where you can use relational tables where
appropriate and also have schema-less blocks of text. The only real
difference between FF and what Sakai does is that Sakai created new columns
(in the same table) every time we needed an index, while FriendFeed created
a new separate table. That seemed to allow a more predictable/repeatable
pattern and also allowed use of a real generic "holds everything" entities
table. (Wow, how interestingly named!) I know I've beaten the FF drum
before, but I think it's a viable model.
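
A minimal sketch of that FF-style layout, in Python with sqlite3 and JSON
so it runs anywhere. This is neither FriendFeed's nor Sakai's actual code:
the "index_site_id" table and the helper names are hypothetical. One generic
entities table holds schema-less bodies, and each index is its own small
table that the application keeps in sync.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create the generic entities table plus one separate index table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entities (
            added_id INTEGER PRIMARY KEY AUTOINCREMENT,
            id       TEXT NOT NULL UNIQUE,
            body     TEXT NOT NULL          -- schema-less JSON blob
        );
        -- Hypothetical index: adding a new one means adding a new table,
        -- not altering the entities table.
        CREATE TABLE index_site_id (
            site_id   TEXT NOT NULL,
            entity_id TEXT NOT NULL,
            PRIMARY KEY (site_id, entity_id)
        );
    """)
    return db


def put_entity(db, entity):
    """Store the whole entity as a blob and refresh its index row."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with db:  # one transaction: the blob and its index row change together
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(entity)),
        )
        db.execute("DELETE FROM index_site_id WHERE entity_id = ?", (entity_id,))
        if "site_id" in entity:
            db.execute(
                "INSERT INTO index_site_id (site_id, entity_id) VALUES (?, ?)",
                (entity["site_id"], entity_id),
            )
    return entity_id


def find_by_site(db, site_id):
    """Look up entities through the index table, then decode their bodies."""
    rows = db.execute(
        "SELECT e.body FROM index_site_id i"
        " JOIN entities e ON e.id = i.entity_id"
        " WHERE i.site_id = ? ORDER BY e.added_id",
        (site_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

Entities can carry any extra fields without a schema change; only the
fields you need to query by get an index table.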

It's just that nobody (AFAIK) other than Dr. Chuck (and Savitha a little in
Announcements?) went back and put much time into cleaning up this
stuff or standardizing any of this old data storage. (Somewhat because of
the possibility of something new and shiny on the horizon!)

On Thu, Mar 7, 2013 at 6:29 PM, Steve Swinsburg
<steve.swinsburg at gmail.com> wrote:

> I think you need to understand that there are complex systems that are
> designed well and that don't suffer from the problems you describe.
> I never said relational databases are a hindrance. They are great. But
> there are times when a non relational model can be better suited. And when
> done right, it works well.
>
> S
>
>
> On Fri, Mar 8, 2013 at 2:50 AM, David Adams <da1 at vt.edu> wrote:
>
>> Steve Swinsburg wrote:
>> > You may not have got the gist of my earlier comments, but I was
>> > agreeing with you.
>>
>> I did get that, but I think a lot of the poor technical decisions in
>> the Sakai community going back to CHEF days and continuing most
>> spectacularly into the OAE project are wrapped up in this idea that
>> relational databases are a hindrance to be gotten around however
>> possible. Your comments reinforced that false idea, and I think
>> it's important to address them to avoid the continued unquestioned
>> acceptance of that idea within this community.
>>
>> I called your claims poisonous because the idea that there's some
>> magic easy path to deal with data storage, sorting, searching, and
>> collating is very tempting, and every time someone claims that
>> "technology X *never* has problem Y that relational databases
>> suffer from", there are a hundred readers out there who take that to
>> heart. But it's not true. The problems are the same, no matter what
>> storage mechanism you use. If they manifest in slightly different
>> ways, that may be a fact you can use to your advantage depending on
>> your system design, but schema changes are part of every system.
>>
>> For any non-trivial data, there is a schema implicit in both the data
>> and the code. And for any non-trivial system, schema transitions will
>> be required. I would argue that non-trivial schema transitions can be
>> *much* harder on non-RDBMS systems. We only have to look at all the
>> work expended in the OAE project on its own schema changes for evidence.
>>
>> -dave
>> _______________________________________________
>> sakai2-tcc mailing list
>> sakai2-tcc at collab.sakaiproject.org
>> http://collab.sakaiproject.org/mailman/listinfo/sakai2-tcc
>>