[Building Sakai] Search weirdness

Stephen Marquard stephen.marquard at uct.ac.za
Wed Jun 16 06:30:41 PDT 2010


The JIRA in which this was added is

http://jira.sakaiproject.org/browse/SAK-16034

To support storing digested content, it looks like your ECP should also implement StoredDigestContentProducer. Our reasoning was that storing the digested version isn't always necessary; it depends on how expensive the content is to digest. Some files in CHS are known to be very resource-intensive to digest (PDFs, OOXML docs), so those were the primary target for performance gains.
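
As a rough illustration of the opt-in idea (not the actual Sakai search code), here is a minimal self-contained sketch. Every name in it (ContentProducer, StoredDigest, DigestStore, CountingProducer) is a hypothetical stand-in, and it makes no claim about what StoredDigestContentProducer really looks like. It only shows the behaviour: a producer that opts in is digested once and afterwards served from a file, while a producer that does not opt in is re-digested on every request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a content producer; NOT the real Sakai interface.
interface ContentProducer {
    String getContent(String reference); // potentially expensive digest
}

// Hypothetical marker: producers implementing this opt in to stored digests.
interface StoredDigest {}

// Illustrative store keeping one digest file per reference under a root directory.
final class DigestStore {
    private final Path root;

    DigestStore(Path root) { this.root = root; }

    // Convenience factory for the sketch: use a fresh temporary directory.
    static DigestStore inTempDir() {
        try {
            return new DigestStore(Files.createTempDirectory("digest-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String digestFor(ContentProducer producer, String reference) {
        if (!(producer instanceof StoredDigest)) {
            return producer.getContent(reference); // no opt-in: digest every time
        }
        // Simplification: hashCode can collide; a real store needs a safer key.
        Path file = root.resolve(Integer.toHexString(reference.hashCode()) + ".txt");
        try {
            if (Files.exists(file)) {
                // Already digested earlier: read it back instead of re-digesting.
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            }
            String digest = producer.getContent(reference);
            Files.write(file, digest.getBytes(StandardCharsets.UTF_8));
            return digest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Counts getContent calls so the effect of opting in is visible.
class CountingProducer implements ContentProducer {
    int calls = 0;

    public String getContent(String reference) {
        calls++;
        return "digest of " + reference;
    }
}

// Same producer, but opted in via the marker interface.
final class StoredCountingProducer extends CountingProducer implements StoredDigest {}
```

Calling digestFor twice for the same reference invokes getContent twice on a plain CountingProducer but only once on a StoredCountingProducer, which is the saving described above for content that is expensive to digest.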

Regards
Stephen
 
>>> Adrian Fish <a.fish at lancaster.ac.uk> 6/16/2010 6:59 AM >>> 
Hi Stephen,

It's on the trunk that I'm seeing it. I have actually seen the new 
directory I think (indexwork?). My ECP getContent is definitely getting 
called on a match though.

Cheers,

Adrian.

Stephen Marquard wrote:
> Hi Adrian,
>
> Search does not store the digested content in the index itself (it just stores the search terms), so when the search tool displays a list of matches, it calls the ECP for each item to get back a digested version so it can show the few lines with matching text.
>
> One of the changes between 2.6 and 2.7 is that the digested content now gets stored in a separate filesystem tree on first digesting (at index time), removing the need to re-digest it every time there's a match on the content in the index. Are you seeing the behaviour below in 2.6.x or 2.7/trunk?
>
> Regards
> Stephen
>  
>   
>>>> Adrian Fish <a.fish at lancaster.ac.uk> 6/16/2010 4:45 AM >>> 
>>>>         
> I've implemented an EntityContentProducer in YAFT which works as 
> expected. The search index builder calls into it when a relevant YAFT 
> event occurs or when the index is refreshed and all seems okay. 
> However, when I search for something I know is there, via the search tool 
> interface, the matches and getContent methods on my ECP get called 
> again, even though the content should theoretically be in the index. I'm 
> wondering whether this is the case for all the search enabled tools.
>
> Has anybody any idea what is going on?
>
> Cheers,
>
> Adrian.
>
>   

-- 
==================================
Adrian Fish
Software Engineer
Centre for e-Science
Bowland Tower South C Floor
Lancaster University
Lancaster
LA1 4YW
email: a.fish at lancaster.ac.uk

http://confluence.sakaiproject.org/display/YAFT/Yaft
http://confluence.sakaiproject.org/display/BLOG/Home
http://confluence.sakaiproject.org/display/AGORA/Home




 

 

