[Building Sakai] Elastic Search (SRCH-111)

John Bush john.bush at rsmart.com
Fri Jan 25 15:03:54 PST 2013


Yes, it uses the defaults, which from what I have read should be
reasonable for most people.  I believe that is 5 shards and 1 replica.
There is a lot of discussion about picking the optimal numbers if you
google around.

You set them in your sakai.properties like this:

 elasticsearch.index.number_of_shards=5
 elasticsearch.index.number_of_replicas=1
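
To make those two numbers concrete, here is a rough sketch of the
arithmetic behind them.  Each primary shard gets number_of_replicas
copies, so 5 shards with 1 replica means 10 shard copies in total.  The
function names are made up for illustration, and the even spread across
nodes is an assumption (it only makes sense with 2 or more nodes, since a
replica is not placed on the same node as its primary).

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies: every primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def copies_per_node(number_of_shards, number_of_replicas, nodes):
    """Average shard copies per node, assuming an even spread (illustrative)."""
    return shard_copies(number_of_shards, number_of_replicas) / nodes


if __name__ == "__main__":
    shards, replicas = 5, 1  # the defaults discussed above
    print("total shard copies:", shard_copies(shards, replicas))
    for nodes in (2, 5, 10):
        print(nodes, "nodes ->", copies_per_node(shards, replicas, nodes), "copies each")
```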

In most cases I believe you want the number of shards to be roughly
the number of nodes you might grow to.  You can't change that
number without a full reindex.  The number of replicas is adjustable
at runtime, so you could change it on the fly using the JSON API and
curl, for example.
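
As a rough illustration of the runtime replica change, the sketch below
builds the same request you would send with curl: a PUT to the index's
_settings endpoint with a small JSON body.  The index name "sakai_index"
and the localhost:9200 address are assumptions for the example, not
something taken from the Sakai code, so substitute whatever your node
actually uses.

```python
import json
import urllib.request


def build_replica_update(base_url, index_name, replicas):
    """Build (but do not send) a PUT request that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "sakai_index" is a hypothetical index name; 9200 is the usual HTTP port.
    req = build_replica_update("http://localhost:9200", "sakai_index", 2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it to a running node, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

The request is only built here, not sent, so you can check the URL and
body before pointing it at a real node.  The shard count has no such
runtime switch, which is why it is worth picking with growth in mind.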

This video does a good job of explaining the dynamics; if you have the
time, check it out:
http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html

On Fri, Jan 25, 2013 at 1:41 PM, Zhen Qian <zqian at umich.edu> wrote:
> John:
>
> I am new to Elastic Search. When I look at the elastic search impl code, I
> cannot find the settings for shards or replicas per node. Is it using the
> default setting of ElasticSearch?
>
> Thanks,
>
> - Zhen
>
>
> On Fri, Jan 25, 2013 at 11:29 AM, John Bush <john.bush at rsmart.com> wrote:
>>
>> >
>> > If that's the case, then I agree entirely. It seems mad to be forced to
>> > cluster your sakai app servers just to scale your search thing. I'm not
>> > sure that's what he is saying though. ...Anybody serious about search will
>> > need an external search thing.
>>
>> Unless the use cases for search change dramatically, I find it hard
>> to imagine a case where you would need to add nodes just to handle
>> search.  Once things are indexed, that workload is not really
>> significant, and as you pointed out, when content is being created
>> and indexed on the fly it's not very significant either.  So I disagree
>> that an embedded approach can't just scale with the normal user load,
>> the only change being perhaps how many users you can fit on a node, or
>> how much RAM it needs.
>>
>> Maybe pounds work differently than dollars, but at the end of the day
>> this is all about cost.  Suppose I went to any sane operations or IT
>> manager and said: if you want search to work in Sakai, you can add some
>> more RAM to your existing app server nodes (or maybe do nothing), or
>> you can set up a new server and potentially a new cluster.  Which
>> option do you think they'd take?  Configuration, server deployment,
>> procurement of the machines, and the knowledge around all that stuff all
>> amount to cost.  So this argument is not solely about which
>> architectures we like more or think might scale better; at the end of
>> the day it's about cost.  Personally, I think an embedded approach is
>> more cost effective.  For rSmart, which literally has hundreds of Sakai
>> nodes, a change in the cost structure of that magnitude is very
>> significant.  I realize for others the situation is different.
>>
>> The idea that search is somehow the bottleneck of the system that
>> warrants a new app node or that search activity is so great that it
>> poses overall risk to the node just isn't consistent with my
>> experience.  If you really wanted to protect users from risk, I'd
>> start with externalizing msgcntr and samigo.
>>
>> > Surely the integration is using the REST
>> > api, not the internal Java one? I think the embedded/external argument
>> > is moot.
>>
>> The integration uses the internal Java APIs, but that doesn't mean you
>> couldn't conceivably run ES as a separate server.  The code as is
>> doesn't support that yet.  It's certainly possible, but it's not
>> something I was ever planning on implementing personally, and I don't
>> see the usefulness of such a design.  Understand that even when ES is
>> embedded you can access the REST API directly with curl or whatever;
>> this is in fact how I typically work to create queries or do anything
>> administrative.
>>
>> --
>> John Bush
>> 602-490-0470
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev at collab.sakaiproject.org
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>> "unsubscribe"
>
>



-- 
John Bush
602-490-0470

