[Building Sakai] ElasticSearch Testing

John Bush john.bush at rsmart.com
Fri Feb 22 09:15:39 PST 2013


I've been spending the last few weeks tweaking Sakai's elasticsearch
impl so that it scales better.  It would be helpful if folks could
give me an idea of the number of docs in their Sakai repos and the
total size.  I'm sure this varies, but in general for our clients,
especially those that have been using Sakai for a while, I'm seeing
around 400-500k docs and nearly half a terabyte of data.

You can simply run these queries to collect that info:

  select count(resource_id) from content_resource;
  select sum(file_size) from content_resource;
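If you want both numbers in one go, with the size already converted to GB, a small script can run the two queries above. This is only a sketch: it uses Python's built-in sqlite3 with a hypothetical in-memory `content_resource` table (just the `resource_id` and `file_size` columns the queries touch) as a stand-in. Against a real Sakai database (typically MySQL or Oracle) you would open the connection with the matching DB-API driver instead; the helper names `repo_stats` and `to_gb` are made up for illustration.

```python
import sqlite3


def repo_stats(conn):
    """Return (document count, total bytes) using the two queries from the post."""
    cur = conn.cursor()
    doc_count = cur.execute(
        "select count(resource_id) from content_resource").fetchone()[0]
    # sum() yields NULL (None) on an empty table, so fall back to 0.
    total_bytes = cur.execute(
        "select sum(file_size) from content_resource").fetchone()[0] or 0
    return doc_count, total_bytes


def to_gb(num_bytes):
    """Convert a byte count to gigabytes (1 GB = 1024**3 bytes)."""
    return num_bytes / 1024 ** 3


if __name__ == "__main__":
    # Hypothetical stand-in for a real Sakai database, with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table content_resource (resource_id text, file_size integer)")
    conn.executemany(
        "insert into content_resource values (?, ?)",
        [("/group/site1/a.pdf", 2 * 1024 ** 2),
         ("/group/site1/b.doc", 512 * 1024),
         ("/user/u1/c.txt", 1024)])
    count, size = repo_stats(conn)
    print(f"{count} docs, {to_gb(size):.6f} GB")
```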

Currently, using 4 medium-size nodes in AWS with 35k docs and a 20GB
repo, I'm seeing average search response times of around 150ms, and
often much faster.  I keep doubling the repository size and so far am
not seeing much impact on performance, although I imagine there is a
point where that changes.

The code in trunk does not scale well, so I will be making a big
commit once I have all the kinks ironed out.  It turns out that
highlighting in ElasticSearch is slow, and it also greatly increases
the size of the repo.  I had to rewrite that piece to do my own
highlighting, similar to what we were doing in the legacy search.  The
side effect is that we no longer need to store the whole source doc
in ES, and as a result the index size has dropped dramatically.  Right
now I'm seeing an index size that is about half the size of the repo.
I think I can get that down further, but it's significantly better
than the triple-the-repo size I was seeing before.
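To make "doing my own highlighting" concrete, here is a generic sketch of application-side highlighting: find query-term matches in a document's plain text, cut a fixed-size window around each, and wrap the matched terms in tags. This is not the code from the upcoming commit or from the legacy search; it assumes the caller already has the document's plain text from somewhere other than the ES index, and the function name `highlight` and its parameters (`fragment_size`, `max_fragments`, `pre`, `post`) are hypothetical.

```python
import re


def highlight(text, terms, fragment_size=80, max_fragments=3,
              pre="<b>", post="</b>"):
    """Return up to max_fragments snippets of text around whole-word,
    case-insensitive matches of terms, with each match wrapped in pre/post."""
    terms = [t for t in terms if t]
    if not terms:
        return []
    # One alternation over all terms; longest first so longer terms win.
    alternation = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

    fragments = []
    last_end = 0
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this match already appears in the previous fragment
        # Center a window of fragment_size characters on the match.
        start = max(0, match.start() - fragment_size // 2)
        end = min(len(text), start + fragment_size)
        window = text[start:end]
        fragments.append(
            pattern.sub(lambda m: pre + m.group(0) + post, window))
        last_end = end
        if len(fragments) == max_fragments:
            break
    return fragments
```

Because the snippets are built from text the application already holds, ES only has to return matching doc ids and scores. That lines up with not storing the whole source doc in the index.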

I still have some work to do to fine-tune or eliminate the use of
facets.  Facets add an enormous memory requirement.  I think I can
eliminate them with more careful indexing, which may end up
increasing the size of the index again, but I think that is a fair
trade-off versus requiring significantly more RAM.  There is supposed
to be a way to rein in the memory consumption of these guys, but I
have yet to get that configuration working in practice.

Attached is a screenshot: 17,618 hits in 0.118 seconds.  Not bad.

--
John Bush
602-490-0470
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2013-02-22 at 10.08.30 AM.png
Type: image/png
Size: 39185 bytes
Desc: not available
Url : http://collab.sakaiproject.org/pipermail/sakai-dev/attachments/20130222/fdb57b56/attachment.png 
