[Building Sakai] Elastic Search (SRCH-111)

John Bush john.bush at rsmart.com
Thu Jan 24 08:48:07 PST 2013


Great, that is exactly what we need.  We probably need to talk more
about the UI and about whether we make API changes or not.

On Thu, Jan 24, 2013 at 9:38 AM, Colin Hebert <colin.hebert at it.ox.ac.uk> wrote:
> Regarding the implementation switch, as I said I already added this
> option in my code (it would need to be put in trunk to be usable):
>
> https://github.com/ColinHebert/Sakai-Solr/blob/master/pack/src/main/webapp/WEB-INF/components.xml
>
> You simply need to set "search.service.impl" and
> "search.indexbuilder.impl" to the one provided by the implementation
> you prefer.
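
For readers following the thread, here is a minimal Java sketch of what a property-driven switch of this kind could look like. Only the property name "search.service.impl" comes from the message above. The SearchBackend interface, the SearchBackendSelector class and the ids "legacy", "elasticsearch" and "solr" are hypothetical, and the linked components.xml is Spring configuration, so the actual mechanism will differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical stand-in for a search service interface (not Sakai's real API).
interface SearchBackend {
    String name();
}

// Picks one registered backend based on a configuration property.
final class SearchBackendSelector {
    // Property name taken from the thread; the ids registered in main() are made up.
    static final String SERVICE_PROPERTY = "search.service.impl";

    private final Map<String, Supplier<SearchBackend>> registry = new HashMap<>();
    private final String defaultId;

    SearchBackendSelector(String defaultId) {
        this.defaultId = defaultId;
    }

    void register(String id, Supplier<SearchBackend> factory) {
        registry.put(id, factory);
    }

    // Falls back to the default id when the property is absent, so an
    // existing configuration keeps working without changes.
    SearchBackend select(Properties props) {
        String id = props.getProperty(SERVICE_PROPERTY, defaultId).trim();
        Supplier<SearchBackend> factory = registry.get(id);
        if (factory == null) {
            throw new IllegalArgumentException(
                "Unknown value for " + SERVICE_PROPERTY + ": " + id);
        }
        return factory.get();
    }

    // Small helper that builds a backend which only reports its name.
    static SearchBackend named(String name) {
        return () -> name;
    }

    public static void main(String[] args) {
        SearchBackendSelector selector = new SearchBackendSelector("legacy");
        selector.register("legacy", () -> named("legacy"));
        selector.register("elasticsearch", () -> named("elasticsearch"));
        selector.register("solr", () -> named("solr"));

        Properties props = new Properties();
        props.setProperty(SERVICE_PROPERTY, "solr");
        System.out.println("Selected backend: " + selector.select(props).name());
    }
}
```

Failing fast on an unknown id, rather than silently using the default, is a design choice in this sketch: a typo in the properties file then shows up at startup instead of as the wrong search backend at runtime.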
>
> On 24 January 2013 16:18, John Bush <john.bush at rsmart.com> wrote:
>> Based on my work and what I've seen around the solr work, I think we
>> are well positioned to simply create some configuration that makes a
>> switch simple and easy.  Something we can do from sakai.properties and
>> not by modifying spring config would be the goal.
>>
>> The decision to move it in was made at the unconference in Phoenix
>> last week after I showed a few TCC members the work.  There probably
>> could have been more transparency about it, but everyone agreed this
>> was a good out of the box experience and the transition from the
>> legacy system should be easy.  Beth is finding some issues, and I'm
>> confident I can address those quickly.
>>
>> In terms of the default experience, I don't think solr is an option,
>> unless I'm wrong, as it requires a separate search server.  So I think
>> ES is better suited to be the default.  We don't even require that
>> for the database right now.
>>
>> This work was initiated by Ian's blog, and some conversations I had
>> with him along the way.  I think there are probably a lot of ways these
>> two efforts can join forces.  I realize system architecture varies
>> from organization to organization.  Our goal is to have a search impl
>> that can scale from small to large without ever needing to reengineer
>> anything.  That is what I think ES delivers: you simply add or remove
>> nodes as capacity needs grow and shrink.
>>
>> Now others might prefer a separate server farm and some message queues
>> and a bunch of stuff like that.  I guess it might be helpful to
>> understand who other than Oxford is really interested in supporting
>> that heavyweight a Sakai installation.  Because while it probably
>> isn't a huge amount of work to align these implementations to be
>> easily switchable, it is work.
>>
>> In terms of load testing, we've performed a small amount of testing,
>> mostly to validate the cluster behavior and get some idea of how long
>> reindexing a terabyte of data might take.  I don't have any official
>> reports to share, but the search times have been impressive, and I
>> think the indexing time is acceptable.  Our goal over the next few
>> weeks is to produce some benchmarks comparing ES to the old search; as
>> the information becomes available I'll share it.
>>
>> On Thu, Jan 24, 2013 at 7:31 AM, Colin Hebert <colin.hebert at it.ox.ac.uk> wrote:
>>> In the code written for the Solr implementation we added the
>>> possibility of choosing which implementation of search is used,
>>> allowing users to keep the current search index and to avoid forcing
>>> them to move right away and allowing other implementations to be used.
>>>
>>> There are also multiple additions that I think could be nice to
>>> include in the Elastic Search implementation:
>>>  - The Task system (done with TimerTasks in the ES impl), which in the
>>> current code of the Solr impl allows Tasks to be submitted to a queue.
>>> This queue can either be an in-memory queue, dequeued by a couple of
>>> ExecutorServices, or an external queue (such as an AMQP server [yay,
>>> scalability]).
>>>  - Another nice thing we've done is allowing the indexing server to do
>>> the text extraction (Solr Cell or Attachment Type in ES).
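
The in-memory variant of the queue described above can be sketched roughly as follows. This is not the Sakai-Solr code: IndexTask and InMemoryTaskQueue are hypothetical names, and the stop marker used for shutdown is just one common way to let workers drain the queue before exiting.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical unit of indexing work (for example "index this document").
interface IndexTask {
    void execute();
}

// In-memory task queue drained by a fixed pool of worker threads.
final class InMemoryTaskQueue {
    // Marker task: a worker that takes it stops consuming.
    private static final IndexTask STOP = () -> { };

    private final BlockingQueue<IndexTask> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private final int workerCount;

    InMemoryTaskQueue(int workerCount) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(this::drain);
        }
    }

    // Producers only enqueue; they never block on the indexing itself.
    void submit(IndexTask task) {
        queue.add(task);
    }

    // Worker loop: take tasks in FIFO order until the stop marker arrives.
    private void drain() {
        try {
            for (IndexTask task = queue.take(); task != STOP; task = queue.take()) {
                try {
                    task.execute();
                } catch (RuntimeException e) {
                    // A real implementation would log and possibly re-queue the task.
                    System.err.println("Task failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Enqueues one stop marker per worker, then waits for the workers to finish.
    // Returns true if all workers terminated within the timeout.
    boolean shutdown(long timeout, TimeUnit unit) {
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }
        workers.shutdown();
        try {
            return workers.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger indexed = new AtomicInteger();
        InMemoryTaskQueue taskQueue = new InMemoryTaskQueue(2);
        for (int i = 0; i < 10; i++) {
            taskQueue.submit(indexed::incrementAndGet);
        }
        boolean drained = taskQueue.shutdown(5, TimeUnit.SECONDS);
        System.out.println("drained=" + drained + " indexed=" + indexed.get());
    }
}
```

Because producers only see submit(), the same interface could in principle sit in front of an external broker instead of the LinkedBlockingQueue, which is the kind of swap the message describes.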
>>>
>>> I'm also a bit curious: is this implementation of Search with ES
>>> related in any way to the work of Ian Boston (
>>> http://blog.tfd.co.uk/2012/10/11/sakai-cle-elasticsearch/ )?
>>>
>>> Anyway the code for SolrSearch is still available (
>>> https://github.com/ColinHebert/Sakai-Solr ) if anyone wants to take a
>>> look at it. We're also doing a BOF on the subject of Search next week
>>> during the EuroSakai conference in Paris to talk about what could be
>>> done with search and how to improve it further.
>>>
>>> On another note, as I said when we started talking about the Solr
>>> Implementation, the API could be easier to implement if rewritten.
>>> This has been done ( https://github.com/ColinHebert/Sakai-Search2 )
>>> and is currently just in need of a nice UI probably designed by
>>> someone who has some knowledge of UXP in search.
>>>
>>> Colin Hebert
>>>
>>> On 24 January 2013 14:12, Adam Marshall <adam.marshall at it.ox.ac.uk> wrote:
>>>> We are running SOLR in production with no issues, we were poised to contribute this back (possibly with help from Chuck / Adrian Fish) until this email dropped into my inbox. Now I'm not sure what to do - I think we'd like to support a configurable plugin approach.
>>>>
>>>> I'll get Colin Hebert who wrote the implementation to make a post to outline his thoughts on the matter.
>>>>
>>>> adam
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Beth Kirschner [mailto:bkirschn at umich.edu]
>>>> Sent: 24 January 2013 14:06
>>>> To: Adam Marshall; John Bush
>>>> Cc: sakai-dev (sakai-dev at collab.sakaiproject.org)
>>>> Subject: Re: [Building Sakai] Elastic Search (SRCH-111)
>>>>
>>>> I was wondering the same thing... I remember when discussing SOLR, the thought was that it would provide potential for new functionality (e.g. faceted search), but not address the scalability problems with search. The SRCH-111 JIRA states "The bulk of this work is simply a backend replacement that fixes most of the indexing/merging problems that have been experienced in large deployments...". This all sounds very promising. Has any of this been load tested? I'd like to put this on UM's load test calendar to compare results. I wonder if there's an opportunity to have configurable plugin options for a search back end?
>>>>
>>>> - Beth
>>>>
>>>> On Jan 24, 2013, at 8:39 AM, Adam Marshall wrote:
>>>>
>>>>> Has this been discussed before?
>>>>>
>>>>> I mentioned to the list ages ago that we have reimplemented search using SOLR and nobody mentioned this elastic search work. We have been asked to contribute our SOLR work to 2.10 (by Chuck) - so I think we should have a discussion as to how our implementation and this Elastic search work should fit together.
>>>>>
>>>>> adam
>>>>>
>>>>> -----Original Message-----
>>>>> From: sakai-dev-bounces at collab.sakaiproject.org
>>>>> [mailto:sakai-dev-bounces at collab.sakaiproject.org] On Behalf Of Beth
>>>>> Kirschner
>>>>> Sent: 24 January 2013 13:36
>>>>> To: John Bush
>>>>> Cc: sakai-dev (sakai-dev at collab.sakaiproject.org)
>>>>> Subject: Re: [Building Sakai] Elastic Search (SRCH-111)
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Jan 23, 2013, at 8:12 PM, John Bush wrote:
>>>>>
>>>>>> It's fixed, https://jira.sakaiproject.org/browse/SRCH-112, sorry
>>>>>> about that; some refactoring I did for unit tests introduced it.
>>>>>>
>>>>>> On Wed, Jan 23, 2013 at 5:58 PM, John Bush <john.bush at rsmart.com> wrote:
>>>>>>> hmm, that sounds like a bug, it should be 100% backwards compatible
>>>>>>> with existing configuration.  Put a JIRA in and I'll address it.
>>>>>>>
>>>>>>> On Wed, Jan 23, 2013 at 11:47 AM, Beth Kirschner <bkirschn at umich.edu> wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> The new elastic search (SRCH-111) does not seem to be backward compatible, at least for sakai.properties configuration. My sakai trunk build does not boot with "search.enable = true". I've attached the catalina.out file, but here's the first error:
>>>>>>>>
>>>>>>>> 2013-01-23 10:41:35,127 ERROR Thread-3
>>>>>>>> org.sakaiproject.search.elasticsearch.ElasticSearchIndexBuilder -
>>>>>>>> Failed to load Stop words into Analyzer
>>>>>>>> java.lang.NullPointerException
>>>>>>>>
>>>>>>>> I'm not sure if this is intentional and the other specified sakai.properties (elasticsearch.http.*) also need to be set, or is this a bug? It will definitely break a lot of implementations as it stands. Perhaps I missed some email about this, but we should probably either update the JIRA to indicate properties changes will be _required_, or write this up as a new bug and make sure previous configurations will boot.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> - Beth
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> John Bush
>>>>>>> 602-490-0470
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> John Bush
>>>>>> 602-490-0470
>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev at collab.sakaiproject.org
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"
>>
>>
>>
>> --
>> John Bush
>> 602-490-0470



-- 
John Bush
602-490-0470