[Building Sakai] Search tool: memory problem in rebuilding indexes.

Stephen Marquard stephen.marquard at uct.ac.za
Wed Nov 11 11:56:07 PST 2009


We have current versions of POI and they don't fix the problem.

Regards
Stephen 
 
>>> Ian Boston <ian at caret.cam.ac.uk> 11/11/2009 9:13 PM >>> 

On 11 Nov 2009, at 10:41, Stephen Marquard wrote:

> Hi,
>
> I believe we saw something similar. There may be a fix in trunk  
> though I don't have a JIRA reference handy. If you search recent  
> JIRAs for Search you may find it, otherwise David Horwitz can tell  
> you more though he's away until mid next week.
>
> Also the POI digesters for OOXML (Office 2007+ docx, xlsx, pptx,  
> etc.) are particularly bad at using memory - digesting content with  
> these digesters _significantly_ increases GC activity.
>
> We haven't yet found a solution to this except to minimize the  
> impact through restricting indexing to a single app server.
>
> This is likely to be an issue in Sakai 3 as well AFAIK, as the same  
> underlying libraries are used.


I think Sakai 2 uses older versions of POI.

The indexers in Sakai 3 (Jackrabbit) are more up to date, not least  
because there are committers on POI and Lucene working on or in close  
contact with the Jackrabbit team, so the use of Lucene is way way way  
more advanced than in Sakai Search.

The other thing to note is a) Apache Tika is becoming and b) POI is  
starting to do releases again, so taking a later version of POI will  
almost certainly fix these problems.
IIUC
Ian



