[Building Sakai] Elastic Search (SRCH-111)

Sun Jan 27 09:07:45 PST 2013

Zhen, for the Solr implementation, I decided to use Solr 3.6.1, but
the schema is compatible with Solr 4. SolrJ 3.6.1 is able to send
requests to a Solr 4 server, so yes, it works fine.

I think soon enough I'll move the SolrJ version to 4.0.1, but that's
not a priority now (and shouldn't change anything really).

Colin Hebert

> On 26 January 2013 06:22, Zhen Qian <zqian at umich.edu> wrote:
>> Thanks, John. It does look like an empirical problem.
>>
>> Another question: how do you handle the ACL of searchable items? I briefly
>> looked through the source code, and it looks to me like permission control
>> is at the site level, and there is no support for group permissions yet?
>>
>> Here is a question for Adrian: Is your Solr integration work based on the
>> recent Solr 4 release, which brings in many scalability improvements?
>>
>> Thanks,
>>
>> - Zhen
>>
>>
>> On Fri, Jan 25, 2013 at 6:03 PM, John Bush <john.bush at rsmart.com> wrote:
>>>
>>> Yes, it uses the defaults, which, based on what I have read, should be
>>> reasonable for most people.  I believe that is 5 shards and 1 replica.
>>> There is a lot of discussion about picking the optimal numbers if you
>>> google around.
>>>
>>> You set them in your sakai.properties like this:
>>>
>>>  elasticsearch.index.number_of_shards=5
>>>  elasticsearch.index.number_of_replicas=1
>>>
>>> In most cases I believe you want the number of shards to be set to
>>> roughly the number of nodes you might grow to.  You can't change that
>>> number without a full reindex.  The number of replicas is adjustable
>>> at runtime, so you could change that on the fly using the JSON API and
>>> curl, for example.
>>>
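
To illustrate the runtime replica change described above, here is a minimal
sketch in Python rather than curl. It builds a PUT request against the
Elasticsearch index settings endpoint. The host, the port (9200 is the
Elasticsearch default for HTTP) and the index name "sakai_index" are
assumptions for illustration only; they are not taken from the Sakai code, so
check what your embedded node actually exposes.

```python
import json
import urllib.request


def build_replica_update(base_url, index, replicas):
    """Build (without sending) a PUT request that sets number_of_replicas."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def apply_replica_update(request, timeout=10):
    """Send the request to a running node and return the decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and index name; substitute your own.
    req = build_replica_update("http://localhost:9200", "sakai_index", 2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling apply_replica_update(req) would send the change to a live node. The
number of shards cannot be changed this way; as noted above, that requires a
full reindex.
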
>>> This video does a good job of explaining the dynamics; if you have the
>>> time, check it out:
>>>
>>> http://www.elasticsearch.org/videos/2012/06/05/big-data-search-and-analytics.html
>>>
>>> On Fri, Jan 25, 2013 at 1:41 PM, Zhen Qian <zqian at umich.edu> wrote:
>>> > John:
>>> >
>>> > I am new to Elastic Search. When I look at the elastic search impl code,
>>> > I cannot find the settings for shards or replicas per node. Is it using
>>> > the default setting of ElasticSearch?
>>> >
>>> > Thanks,
>>> >
>>> > - Zhen
>>> >
>>> >
>>> > On Fri, Jan 25, 2013 at 11:29 AM, John Bush <john.bush at rsmart.com>
>>> > wrote:
>>> >>
>>> >> >
>>> >> > If that's the case, then I agree entirely. It seems mad to be forced
>>> >> > to cluster your sakai app servers just to scale your search thing.
>>> >> > I'm not sure that's what he is saying though. ...Anybody serious
>>> >> > about search will need an external search thing.
>>> >>
>>> >> Unless the use cases for search change dramatically, I find it hard
>>> >> to imagine a case where you would need to add nodes just to handle
>>> >> search.  Once things are indexed, that workload is not really
>>> >> significant, and as you pointed out, indexing content on the fly as
>>> >> it is created is not very significant either.  So I disagree that an
>>> >> embedded approach can't just scale with the normal user load, the
>>> >> only change being perhaps how many users you can fit on a node, or
>>> >> the amount of RAM.
>>> >>
>>> >> Maybe pounds work differently than dollars, but at the end of the day
>>> >> this is all about cost.  Suppose I went to any sane operations or IT
>>> >> manager and said: if you want search to work in Sakai, you can add
>>> >> some more RAM to your existing app server nodes (or maybe do nothing),
>>> >> or you can set up a new server and potentially a new cluster.  Which
>>> >> option do you think they'd take?  Configuration, server deployment,
>>> >> procurement of the machines, and the knowledge around all that stuff
>>> >> all amount to cost.  So this argument is not solely about which
>>> >> architectures we like more or think might scale better; at the end of
>>> >> the day it's about cost.  Personally, I think an embedded approach is
>>> >> more cost effective.  For rSmart, which literally has hundreds of Sakai
>>> >> nodes, a change in the cost structure of that magnitude is very
>>> >> significant.  I realize for others the situation is different.
>>> >>
>>> >> The idea that search is somehow the bottleneck of the system that
>>> >> warrants a new app node or that search activity is so great that it
>>> >> poses overall risk to the node just isn't consistent with my
>>> >> experience.  If you really wanted to protect users from risk, I'd
>>> >> start with externalizing msgcntr and samigo.
>>> >>
>>> >> > Surely the integration is using the REST api, not the internal
>>> >> > Java one? I think the embedded/external argument is moot.
>>> >>
>>> >> The integration uses the internal Java APIs, but that doesn't mean you
>>> >> couldn't conceivably run ES as a separate server.  The code as is
>>> >> doesn't support that yet.  It's certainly possible, but it's not
>>> >> something I was ever planning on implementing personally, since I
>>> >> don't see the usefulness of such a design.  Understand that even when
>>> >> ES is embedded you can access the REST API directly with curl or
>>> >> whatever; this is in fact how I typically work to create queries or do
>>> >> anything administrative.
>>> >>
>>> >> --
>>> >> John Bush
>>> >> 602-490-0470
>>> >>
>>> >> _______________________________________________
>>> >> sakai-dev mailing list
>>> >> sakai-dev at collab.sakaiproject.org
>>> >> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>> >>
>>> >> TO UNSUBSCRIBE: send email to
>>> >> sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of
>>> >> "unsubscribe"
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> John Bush
>>> 602-490-0470