[Building Sakai] Search tool: memory problem in rebuilding indexes.

Ian Boston ian at caret.cam.ac.uk
Wed Nov 11 11:13:57 PST 2009


On 11 Nov 2009, at 10:41, Stephen Marquard wrote:

> Hi,
>
> I believe we saw something similar. There may be a fix in trunk  
> though I don't have a JIRA reference handy. If you search recent  
> JIRAs for Search you may find it, otherwise David Horwitz can tell  
> you more though he's away until mid next week.
>
> Also the POI digesters for OOXML (Office 2007+ docx, xlsx, pptx,  
> etc.) are particularly bad at using memory - digesting content with  
> these digesters _significantly_ increases GC activity.
>
> We haven't yet found a solution to this except to minimize the  
> impact through restricting indexing to a single app server.
>
> This is likely to be an issue in Sakai 3 as well AFAIK, as the same  
> underlying libraries are used.


I think Sakai 2 uses older versions of POI.

The indexers in Sakai 3 (Jackrabbit) are more up to date, not least
because there are committers on POI and Lucene working on or in close
contact with the Jackrabbit team, so the use of Lucene is way, way
more advanced than in Sakai Search.

The other thing to note is a) Apache Tika is becoming and b) POI is  
starting to do releases again, so taking a later version of POI will  
almost certainly fix these problems.
IIUC
Ian



>
> Regards
> Stephen
>
>>>> Angel Nueda Lozano <anueda at asic.upv.es> 2009/11/11 12:02 PM >>>
> Hi all.
> We are testing the search tool in our 2.6.x instance. We have seen
> that while the indexes are being built, the Tomcat process keeps
> growing in memory until it occupies the entire system memory. This
> occurs at around 200,000 documents processed, and thereafter the
> process becomes extremely slow.
> Has anyone had similar problems? Is it a bug?
> Thanks in advance
> _______________________________________________
> sakai-dev mailing list
> sakai-dev at collab.sakaiproject.org
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe at collab.sakaiproject.org 
>  with a subject of "unsubscribe"


