[Building Sakai] Elastic Search (SRCH-111)

John Bush john.bush at rsmart.com
Thu Jan 24 08:18:01 PST 2013


Based on my work and what I've seen of the Solr work, I think we
are well positioned to create some configuration that makes switching
simple and easy.  The goal would be something we can do from
sakai.properties rather than by modifying Spring config.
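
For illustration only, here is a minimal sketch of the kind of
sakai.properties-driven switch described above.  The property name
"search.impl", its three values, and the SearchBackend interface are
all hypothetical (nothing here is taken from the SRCH-111 or Solr
code), and a plain java.util.Properties object stands in for Sakai's
configuration service.

```java
import java.util.Locale;
import java.util.Properties;

public class SearchBackendSelector {

    /** Hypothetical stand-in for whatever interface the search backends share. */
    public interface SearchBackend {
        String name();
    }

    // Assumed property name and default; not an existing sakai.properties key.
    public static final String IMPL_PROPERTY = "search.impl";
    public static final String DEFAULT_IMPL = "elasticsearch";

    /** Picks a backend from configuration, falling back to the default when unset. */
    public static SearchBackend select(Properties props) {
        String impl = props.getProperty(IMPL_PROPERTY, DEFAULT_IMPL)
                .trim()
                .toLowerCase(Locale.ROOT);
        switch (impl) {
            case "elasticsearch":
            case "solr":
            case "legacy":
                // A real version would return the matching service instance here.
                return () -> impl;
            default:
                throw new IllegalArgumentException(
                        "Unknown " + IMPL_PROPERTY + " value: " + impl);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("search.enable", "true");
        // No search.impl set, so the default backend is chosen.
        System.out.println(select(props).name());

        props.setProperty(IMPL_PROPERTY, "solr");
        System.out.println(select(props).name());
    }
}
```

The point of the sketch is only that an existing configuration with no
new property set keeps booting on a default, while sites that want a
different backend change one line.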

The decision to move it in was made at the unconference in Phoenix
last week after I showed a few TCC members the work.  There probably
could have been more transparency about it, but everyone agreed this
was a good out of the box experience and the transition from the
legacy system should be easy.  Beth is finding some issues, and I'm
confident I can address those quickly.

In terms of the default experience, I don't think Solr is an option
(correct me if I'm wrong), as it requires a separate search server.
So I think ES is better suited to be the default.  We don't even
require a separate server for the database right now.

This work was initiated by Ian's blog, and some conversations I had
with him along the way.  I think there are probably a lot of ways these
two efforts can join forces.  I realize system architecture varies
from organization to organization.  Our goal is to have a search impl
that can scale from small to large without ever needing to reengineer
anything.  That is what I think ES delivers: you simply add or remove
nodes as capacity grows and shrinks.

Now others might prefer a separate server farm and some message queues
and a bunch of stuff like that.  I guess it might be helpful to
understand who other than Oxford is really interested in supporting
that heavyweight a Sakai installation.  Because while it probably
isn't a huge amount of work to align these implementations to be
easily switchable, it is work.

In terms of load testing, we've performed a small amount of testing,
mostly to validate the cluster behavior and get some idea of how long
reindexing a terabyte of data might take.  I don't have any official
reports to share, but the search times have been impressive, and I
think the indexing time is acceptable.  Our goal over the next few
weeks is to produce some benchmarks comparing ES to the old search; as
the information becomes available I'll share it.

On Thu, Jan 24, 2013 at 7:31 AM, Colin Hebert <colin.hebert at it.ox.ac.uk> wrote:
> In the code written for the Solr implementation we added the
> possibility of choosing which implementation of search is used,
> allowing users to keep the current search index and to avoid forcing
> them to move right away and allowing other implementations to be used.
>
> There are also multiple additions that I think could be nice to
> include in the Elastic Search implementation:
>  - The Task system (done with TimerTasks in the ES impl), which in the
> current code of the Solr impl allows Tasks to be submitted to a queue.
> This queue can either be an in-memory queue, dequeued by a couple of
> ExecutorServices, or an external queue (such as an AMQP server [yay,
> scalability]).
>  - Another nice thing we've done is allowing the indexing server to do
> the text extraction (Solr Cell or Attachment Type in ES).
>
> I'm also a bit curious, is this implementation of Search with ES
> related in any way with the work of Ian Boston (
> http://blog.tfd.co.uk/2012/10/11/sakai-cle-elasticsearch/ ).
>
> Anyway the code for SolrSearch is still available (
> https://github.com/ColinHebert/Sakai-Solr ) if anyone wants to take a
> look at it. We're also doing a BOF on the subject of Search next week
> during the EuroSakai conference in Paris to talk about what could be
> done with search and how to improve it further.
>
> On another note, as I said when we started talking about the Solr
> Implementation, the API could be easier to implement if rewritten.
> This has been done ( https://github.com/ColinHebert/Sakai-Search2 )
> and is currently just in need of a nice UI probably designed by
> someone who has some knowledge of UXP in search.
>
> Colin Hebert
>
> On 24 January 2013 14:12, Adam Marshall <adam.marshall at it.ox.ac.uk> wrote:
>> We are running SOLR in production with no issues, we were poised to contribute this back (possibly with help from Chuck / Adrian Fish) until this email dropped into my inbox. Now I'm not sure what to do - I think we'd like to support a configurable plugin approach.
>>
>> I'll get Colin Hebert who wrote the implementation to make a post to outline his thoughts on the matter.
>>
>> adam
>>
>>
>>
>> -----Original Message-----
>> From: Beth Kirschner [mailto:bkirschn at umich.edu]
>> Sent: 24 January 2013 14:06
>> To: Adam Marshall; John Bush
>> Cc: sakai-dev (sakai-dev at collab.sakaiproject.org)
>> Subject: Re: [Building Sakai] Elastic Search (SRCH-111)
>>
>> I was wondering the same thing... I remember when discussing SOLR, the thought was that it would provide potential for new functionality (e.g. faceted search), but not address the scalability problems with search. The SRCH-111 JIRA states "The bulk of this work is simply a backend replacement that fixes most of the indexing/merging problems that have been experienced in large deployments...". This all sounds very promising. Has any of this been load tested? I'd like to put this on UM's load test calendar to compare results. I wonder if there's an opportunity to have configurable plugin options for a search back end?
>>
>> - Beth
>>
>> On Jan 24, 2013, at 8:39 AM, Adam Marshall wrote:
>>
>>> Has this been discussed before?
>>>
>>> I mentioned to the list ages ago that we have reimplemented search using SOLR and nobody mentioned this elastic search work. We have been asked to contribute our SOLR work to 2.10 (by Chuck) - so I think we should have a discussion as to how our implementation and this Elastic search work should work together.
>>>
>>> adam
>>>
>>> -----Original Message-----
>>> From: sakai-dev-bounces at collab.sakaiproject.org
>>> [mailto:sakai-dev-bounces at collab.sakaiproject.org] On Behalf Of Beth
>>> Kirschner
>>> Sent: 24 January 2013 13:36
>>> To: John Bush
>>> Cc: sakai-dev (sakai-dev at collab.sakaiproject.org)
>>> Subject: Re: [Building Sakai] Elastic Search (SRCH-111)
>>>
>>> Thanks!
>>>
>>> On Jan 23, 2013, at 8:12 PM, John Bush wrote:
>>>
>>>> It's fixed, https://jira.sakaiproject.org/browse/SRCH-112, sorry
>>>> about that; some refactoring I did for unit tests introduced it.
>>>>
>>>> On Wed, Jan 23, 2013 at 5:58 PM, John Bush <john.bush at rsmart.com> wrote:
>>>>> hmm, that sounds like a bug, it should be 100% backwards compatible
>>>>> with existing configuration.  Put a JIRA in and I'll address it.
>>>>>
>>>>> On Wed, Jan 23, 2013 at 11:47 AM, Beth Kirschner <bkirschn at umich.edu> wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> The new elastic search (SRCH-111) does not seem to be backward compatible, at least for sakai.properties configuration. My sakai trunk build does not boot with "search.enable = true". I've attached the catalina.out file, but here's the first error:
>>>>>>
>>>>>> 2013-01-23 10:41:35,127 ERROR Thread-3
>>>>>> org.sakaiproject.search.elasticsearch.ElasticSearchIndexBuilder -
>>>>>> Failed to load Stop words into Analyzer
>>>>>> java.lang.NullPointerException
>>>>>>
>>>>>> I'm not sure whether this is intentional and the other specified sakai.properties (elasticsearch.http.*) also need to be set, or whether this is a bug. It will definitely break a lot of implementations as it stands. Perhaps I missed some email about this, but we should probably either update the JIRA to indicate properties changes will be _required_, or write this up as a new bug and make sure previous configurations will boot.
>>>>>>
>>>>>> Thanks,
>>>>>> - Beth
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> John Bush
>>>>> 602-490-0470
>>>>
>>>>
>>>>
>>>> --
>>>> John Bush
>>>> 602-490-0470
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev at collab.sakaiproject.org
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org with a subject of "unsubscribe"



-- 
John Bush
602-490-0470

