[Building Sakai] Elastic Search (SRCH-111)

Zhen Qian zqian at umich.edu
Fri Jan 25 21:22:11 PST 2013


Thanks, John. It does look like an empirical problem.

Another question: how do you handle the ACLs of searchable items? I briefly
looked through the source code, and it looks to me like permission control
is at the site level, with no support for group permissions yet?

Here is a question for Adrian: is your Solr integration work based on the
recent Solr 4 release, which brings many scalability improvements?

Thanks,

- Zhen


On Fri, Jan 25, 2013 at 6:03 PM, John Bush <john.bush at rsmart.com> wrote:

> Yes, it uses the defaults, which, based on what I've read, might be
> reasonable for most people. I believe that is 5 shards and 1 replica.
> There is a lot of discussion about picking the optimal numbers if you
> google around.
>
> You set them in your sakai.properties like this:
>
>  elasticsearch.index.number_of_shards=5
>  elasticsearch.index.number_of_replicas=1
>
> In most cases I believe you want the number of shards to be set to
> roughly the number of nodes you might grow to. You can't change that
> number without a full reindex. The number of replicas is adjustable
> at runtime, so you could change it on the fly using the JSON API, with
> curl for example.
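The shard and replica arithmetic behind that advice, plus the request body for a runtime replica change, can be sketched as follows. This is a minimal Python sketch, not part of the Sakai code; the helper names are hypothetical, and it assumes Elasticsearch's rule that a primary shard and its replicas are never placed on the same node.

```python
import json
import math


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus replicas: each primary gets `number_of_replicas` extra copies."""
    return number_of_shards * (1 + number_of_replicas)


def max_copies_per_node(number_of_shards: int, number_of_replicas: int, nodes: int) -> int:
    """Most shard copies any single node holds when copies are spread evenly.

    Assumes every copy can be assigned, i.e. nodes > number_of_replicas,
    because two copies of the same shard never share a node.
    """
    if nodes <= number_of_replicas:
        raise ValueError("not enough nodes to assign every replica")
    return math.ceil(total_shard_copies(number_of_shards, number_of_replicas) / nodes)


def replica_update_body(number_of_replicas: int) -> str:
    """JSON body for PUT /<index>/_settings.

    number_of_replicas is a dynamic index setting; number_of_shards is fixed
    when the index is created, which is why changing it needs a full reindex.
    """
    return json.dumps({"index": {"number_of_replicas": number_of_replicas}})


if __name__ == "__main__":
    # Defaults mentioned above: 5 shards, 1 replica.
    print(total_shard_copies(5, 1))      # 10
    print(max_copies_per_node(5, 1, 3))  # 4
    print(replica_update_body(2))
```

With the defaults there are 10 shard copies, so past 10 nodes some nodes would hold no part of the index. The body could then be sent with something like `curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/<index>/_settings' -d '{"index": {"number_of_replicas": 2}}'`; the host, port 9200, and index name are assumptions about how the embedded node is exposed.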
>
> This video does a good job of explaining the dynamics; if you have the
> time, check it out:
>
> http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html
>
> On Fri, Jan 25, 2013 at 1:41 PM, Zhen Qian <zqian at umich.edu> wrote:
> > John:
> >
> > I am new to Elasticsearch. When I look at the Elasticsearch impl code, I
> > cannot find the settings for shards or replicas per node. Is it using the
> > default settings of Elasticsearch?
> >
> > Thanks,
> >
> > - Zhen
> >
> >
> > On Fri, Jan 25, 2013 at 11:29 AM, John Bush <john.bush at rsmart.com> wrote:
> >>
> >> >
> >> > If that's the case, then I agree entirely. It seems mad to be forced to
> >> > cluster your sakai app servers just to scale your search thing. I'm not
> >> > sure that's what he is saying though. ...Anybody serious about search
> >> > will need an external search thing.
> >>
> >> Unless the use cases for search change dramatically, I find it hard
> >> to imagine a case where you would need to add nodes just to handle
> >> search. Once things are indexed, that workload is not really
> >> significant, and, as you pointed out, indexing content on the fly as
> >> it is created is not very significant either. So I disagree that an
> >> embedded approach can't simply scale with the normal user load; the
> >> only change might be how many users you can fit on a node, or how
> >> much RAM each node needs.
> >>
> >> Maybe pounds work differently than dollars, but at the end of the day
> >> this is all about cost. Suppose I went to any sane operations or IT
> >> manager and said: if you want search to work in Sakai, you can add some
> >> more RAM to your existing app server nodes (or maybe do nothing), or
> >> you can set up a new server and potentially a new cluster. Which
> >> option do you think they'd take? Configuration, server deployment,
> >> procurement of the machines, and the knowledge around all that stuff
> >> all amount to cost. So this argument is not solely about which
> >> architectures we like more or think might scale better; at the end of
> >> the day it's about cost. Personally, I think an embedded approach is
> >> more cost effective. For rSmart, which literally has hundreds of Sakai
> >> nodes, a change in the cost structure of that magnitude is very
> >> significant. I realize for others the situation is different.
> >>
> >> The idea that search is somehow the bottleneck of the system that
> >> warrants a new app node or that search activity is so great that it
> >> poses overall risk to the node just isn't consistent with my
> >> experience.  If you really wanted to protect users from risk, I'd
> >> start with externalizing msgcntr and samigo.
> >>
> >> > Surely the integration is using the REST api, not the internal Java
> >> > one? I think the embedded/external argument is moot.
> >>
> >> The integration uses the internal Java APIs, but that doesn't mean you
> >> couldn't conceivably run ES as a separate server. The code as is
> >> doesn't support that yet. It's certainly possible, but it's not
> >> something I ever planned on implementing personally, since I don't
> >> see the usefulness of such a design. Understand that even when ES is
> >> embedded, you can access its REST interface directly with curl or
> >> whatever; this is in fact how I typically work when creating queries
> >> or doing anything administrative.
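Because the embedded node still answers over REST, a query can be prepared outside Sakai and sent with curl or any HTTP client. Below is a hedged Python sketch using only the standard library; the host, port 9200 (Elasticsearch's default HTTP port), index name, and field name are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, field: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <base_url>/<index>/_search with a match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, index, and field names; adjust to the actual deployment.
    req = build_search_request("http://localhost:9200", "sakai_index", "contents", "syllabus")
    print(req.get_method(), req.full_url)
    # To run it against a live node:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the exact JSON before pointing it at a production node.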
> >>
> >> --
> >> John Bush
> >> 602-490-0470
> >>
> >> _______________________________________________
> >> sakai-dev mailing list
> >> sakai-dev at collab.sakaiproject.org
> >> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >>
> >
> >
>
>
>
> --
> John Bush
> 602-490-0470
>


More information about the sakai-dev mailing list