[Building Sakai] ElasticSearch Testing

Zhen Qian zqian at umich.edu
Tue Feb 26 08:06:39 PST 2013


Hi, John:

Here are the results for CTools at UMich, with 10+ years of data:

select count(resource_id) from content_resource
15652880

select sum(file_size) from content_resource
15565111245559

So it is about 15.6 million docs totaling roughly 15.6 TB. The last time
the system was re-indexed (7/2012), with Sakai 2.7 search, it took 6 hours.
I don't have the search turnaround data at hand, though.
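
For anyone who wants to turn the two raw query results into readable units, here is a minimal Python sketch. The function name `summarize` and the choice of units are only illustrative assumptions, not anything that exists in Sakai; the two input numbers are the query results quoted above.

```python
# Raw results of the two content_resource queries reported above.
doc_count = 15_652_880
total_bytes = 15_565_111_245_559


def summarize(doc_count: int, total_bytes: int) -> dict:
    """Convert a document count and a byte total into readable units."""
    return {
        "million_docs": doc_count / 1e6,
        "terabytes": total_bytes / 1e12,      # decimal TB (10**12 bytes)
        "tebibytes": total_bytes / 2**40,     # binary TiB (2**40 bytes)
        "avg_kb_per_doc": total_bytes / doc_count / 1e3,
    }


stats = summarize(doc_count, total_bytes)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The same function can be fed the output of the two queries from any other Sakai installation to make the numbers comparable across sites.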

Search is near the top of our list of load testing candidates here at
UMich. I hope we can do the load test soon. I think it is fine to get
basic search (without facet support) working first.

BTW, is there a wiki page for the ElasticSearch project on the Sakai
Confluence site?

Thanks,

- Zhen


On Fri, Feb 22, 2013 at 12:15 PM, John Bush <john.bush at rsmart.com> wrote:

> I've spent the last few weeks tweaking Sakai's ElasticSearch
> implementation so that it scales better.  It would be helpful if folks could
> give me an idea of the number of docs in their sakai repos, and the
> total size.  I'm sure this varies, but in general for our clients,
> especially those that have been using Sakai for a while, I'm seeing
> around 400-500k docs and nearly half a terabyte of data.
>
> You can simply run these queries to collect that info:
>
> select count(resource_id) from content_resource
> select sum(file_size) from content_resource
>
> Currently, using 4 medium-size nodes in AWS with 35k docs and a repo
> of 20GB, I'm getting search response times averaging around 150ms and
> often much faster.  I'm doubling the repository size as I go and
> so far not seeing much impact on performance, although I
> imagine there is a point at which that changes.
>
> The code in trunk does not scale well, so I will be making a big
> commit once I have all the kinks ironed out.  It turns out that the
> highlighting in ElasticSearch is slow, and it also greatly increased
> the size of the repo.  I had to rewrite that piece to do my own
> highlighting, similar to what we were doing in the legacy search.  The
> side effect of that is that we no longer need to store the whole
> source doc in ES, so the index size has dropped dramatically.
> Right now I'm seeing an index size that is about half the size of the
> repo.  I think I can get that down further, but it's significantly
> better than triple the repo size, which I was seeing before.
>
> I still have some work to do to fine-tune or eliminate the use of
> facets.  Facets add an enormous memory requirement.  I think I can
> eliminate this with some more careful indexing, which may end up
> increasing the size of the index again, but I think that is a fair
> tradeoff vs requiring significantly more RAM.  There is supposed to be
> a way to rein in the memory consumption of these guys, but I have yet
> to get that configuration working in practice.
>
> Attached is a screenshot, 17,618 hits in 0.118 seconds, not bad.
>
> --
> John Bush
> 602-490-0470