[Deploying Sakai] Search tool and rebuilding index

David Horwitz david.horwitz at uct.ac.za
Thu May 5 04:24:43 PDT 2011


Hi Leon,


These are just individual items that search wasn't able to digest;
explanations inline.

On 05/05/2011 05:01 AM, Leon Kolchinsky wrote:
> Thanks David and Anand,
>
> I've cleared the search tables in the DB, deleted the files in the shared
> and local search folders, and rerun "Rebuild Whole Index
> <http://vera029.its.monash.edu.au/portal/tool/55d5a8bf-b3a1-42ed-81ac-7da250e0dc90/admin/index?command=rebuildinstance>"
>
> It seems OK, except for some warning messages.
>
> Should I be worried about the following?
>
> 2011-05-05 11:37:02,208  WARN Timer-1 
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager - 
> Entity Reference Longer than 255 characters, ignored: 
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi 
> PHD/Anushi PHD/Anushi Immuno nanoparticle library.Data/PDF/M6_Levy et 
> al, 1999 (Effect of shear on plasmid DNA in 
> solution)-4065125014/M6_Levy et al, 1999 (Effect of shear on plasmid 
> DNA in solution).pdf
> 2011-05-05 11:37:02,228  WARN Timer-1 
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager - 
> Entity Reference Longer than 255 characters, ignored: 
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi 
> PHD/Anushi PHD/Anushi Immuno nanoparticle 
> library.Data/PDF/F2_Taussig1974 (Mists and aerosols New studies, new 
> thoughts)-0908373378/F2_Taussig1974 (Mists and aerosols New studies, 
> new thoughts).pdf
>
Search has a 255-character limit on entity references, but content
references can be longer, so these items are skipped. We need to fix this
at some point.
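If you want to find affected resources before a rebuild, the check is simple enough to sketch in Python. This is a hypothetical helper, not part of Sakai: it assumes you have already exported the content references into a list (for example with your own database query), and that the limit is counted in characters, as the warning text says.

```python
from typing import Iterable, List

# Limit reported by SearchBuilderQueueManager ("Entity Reference Longer than
# 255 characters, ignored"); assumed here to be counted in characters.
SEARCH_REFERENCE_LIMIT = 255


def too_long_references(references: Iterable[str],
                        limit: int = SEARCH_REFERENCE_LIMIT) -> List[str]:
    """Return, in input order, the references that search would skip."""
    return [ref for ref in references if len(ref) > limit]


if __name__ == "__main__":
    sample = [
        "/content/group/site-id/short.pdf",
        "/content/group/site-id/" + "x" * 250 + ".pdf",  # 277 characters
    ]
    for ref in too_long_references(sample):
        print(f"{len(ref)} characters: {ref[:50]}...")
```

Because the limit applies to the whole reference, long folder names count against it just as much as long file names.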

>
> 2011-05-05 11:50:42,537  WARN Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Failed to digest 
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR Leadrership 
> course module 2.pdf with 
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139 
> cause: Failed to get content for indexing: cause: Error: expected hex 
> character and not  :32
>

The PDF digester (Apache PDFBox) was not able to read this PDF, so search
fell back to the Default Digester.
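To list every item that hit this fallback after a rebuild, a small Python script can scan the Tomcat log. This is a hypothetical sketch: it assumes each entry sits on a single line in the real log (the lines quoted here are wrapped by email), that the log path is passed on the command line, and that the digester appears as ClassName@hashcode, which is how Java prints an object by default.

```python
import re
import sys
from typing import Iterable, List, NamedTuple

# Matches the WARN line written by ContentHostingContentProducer.
FAILED_DIGEST = re.compile(
    r"Failed to digest (?P<path>.+?) with (?P<digester>\S+) cause: (?P<cause>.*)$"
)


class DigestFailure(NamedTuple):
    path: str      # content reference, e.g. /content/group/...
    digester: str  # short class name, e.g. PDFContentDigester
    cause: str     # everything after the first "cause:"


def find_digest_failures(lines: Iterable[str]) -> List[DigestFailure]:
    failures = []
    for line in lines:
        match = FAILED_DIGEST.search(line.rstrip("\r\n"))
        if match is None:
            continue
        # Drop the "@hashcode" suffix and the package name.
        digester = match.group("digester").split("@")[0].rsplit(".", 1)[-1]
        failures.append(
            DigestFailure(match.group("path"), digester, match.group("cause"))
        )
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python digest_failures.py /path/to/catalina.out
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for failure in find_digest_failures(log):
            print(f"{failure.digester}\t{failure.path}\t{failure.cause}")
```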


> 2011-05-05 11:50:42,538  INFO Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR 
> Leadrership course module 2.pdf into 43 characters with Default Digester
> 2011-05-05 11:50:43,832  WARN Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Failed to digest 
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics 
> Admin/Consultative Council for Human Research Ethics Checklist.pdf 
> with 
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139 
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:43,832  INFO Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics 
> Admin/Consultative Council for Human Research Ethics Checklist.pdf 
> into 66 characters with Default Digester
> 2011-05-05 11:50:45,785  WARN Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Failed to digest 
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics Admin/CCHRE 
> Research Governance SOPs.pdf with 
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139 
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:45,786  INFO Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics 
> Admin/CCHRE Research Governance SOPs.pdf into 40 characters with 
> Default Digester
> 2011-05-05 11:50:46,558  WARN Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Failed to digest 
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics 
> Admin/Consultative Council for Human Research Governance Checklist.pdf 
> with 
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139 
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:46,558  INFO Timer-1 
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer 
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics 
> Admin/Consultative Council for Human Research Governance Checklist.pdf 
> into 70 characters with Default Digester
> Warning, header block comes after data blocks in POIFS block listing
>
>

Ditto

D

> Thanks,
> Leon Kolchinsky
>
>
>
> On Thu, May 5, 2011 at 01:41, Anand Mehta <anand.mehta at yahoo.com> wrote:
>
>     Hi Leon,
>
>     There should be no problem with all Tomcats accessing the
>     sharedJournalBase directory simultaneously. As David mentioned,
>     cleaning the search tables and files should help. You can also try
>     the workaround here https://jira.sakaiproject.org/browse/SRCH-15
>     if you still see some messages about missing segments.
>
>     Thanks,
>     Anand
>     ------------------------------------------------------------------------
>     *From:* David Horwitz <david.horwitz at uct.ac.za>
>     *To:* production at collab.sakaiproject.org
>     *Sent:* Wednesday, May 4, 2011 4:02 AM
>     *Subject:* Re: [Deploying Sakai] Search tool and rebuilding index
>
>     Hi Leon,
>
>     It looks like you need to clear out your search tables in the db.
>     So, with Sakai stopped:
>
>     1) run:
>
>     truncate table search_journal;
>     truncate table search_node_status;
>     truncate table search_segments;
>     truncate table search_transaction;
>     truncate table searchbuilderitem;
>     truncate table searchwriterlock;
>
>     (In Oracle, replace "truncate table" with "delete from".)
>
>     2) delete any files in your shared search folder
>     3) delete any files in the local search index
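For anyone scripting this reset across several nodes, here is a minimal Python sketch. The table names and the Oracle substitution come from the steps above; the function names are hypothetical, and the directories you pass to empty_directory are assumed to be the locations configured as sharedJournalBase and localIndexBase. It does not connect to the database: feed the generated statements to your usual SQL client, and run all of it only with every Sakai node stopped.

```python
import shutil
from pathlib import Path
from typing import List

# Tables listed in the reset instructions above.
SEARCH_TABLES = [
    "search_journal",
    "search_node_status",
    "search_segments",
    "search_transaction",
    "searchbuilderitem",
    "searchwriterlock",
]


def reset_statements(dialect: str) -> List[str]:
    """Step 1: SQL that empties the search tables.

    Uses TRUNCATE TABLE by default and DELETE FROM on Oracle, as advised
    above; DELETE is transactional, so a COMMIT is appended for Oracle.
    """
    if dialect.lower() == "oracle":
        return [f"delete from {table};" for table in SEARCH_TABLES] + ["commit;"]
    return [f"truncate table {table};" for table in SEARCH_TABLES]


def empty_directory(directory: Path) -> int:
    """Steps 2 and 3: delete everything inside `directory`, keeping the directory.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in list(directory.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    for statement in reset_statements("oracle"):
        print(statement)
```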
>
>
>     Regards
>
>     D
>
>     On 05/04/2011 08:16 AM, Leon Kolchinsky wrote:
>>     Thanks Anand,
>>
>>     You've been a great help.
>>     I'm testing it now on one server, and tomorrow I'll test it with
>>     a shared
>>     "sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common
>>     directory accessible by all tomcats>/search"
>>     on an NFS share.
>>
>>     1) Should I expect any problems, given that 2 servers would access
>>     the sharedJournalBase@org.sakaiproject.search.api.JournalSettings
>>     location simultaneously?
>>
>>     2) What could be causing the following messages (after
>>     pressing "Rebuild Whole Index"), and how can I mitigate them?
>>
>>     2011-05-04 15:37:50,859  WARN Timer-1
>>     org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation
>>     - Failed to compete optimize
>>     org.sakaiproject.search.optimize.api.OptimizedFailedIndexTransactionException:
>>     Failed to Optimize indexes
>>             at
>>     org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:292)
>>             at
>>     org.sakaiproject.search.transaction.impl.IndexTransactionImpl.firePrepare(IndexTransactionImpl.java:313)
>>             at
>>     org.sakaiproject.search.transaction.impl.IndexTransactionImpl.prepare(IndexTransactionImpl.java:147)
>>             at
>>     org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation.runOnce(JournalOptimizationOperation.java:82)
>>             at
>>     org.sakaiproject.search.journal.impl.IndexManagementTimerTask.run(IndexManagementTimerTask.java:139)
>>             at java.util.TimerThread.mainLoop(Timer.java:512)
>>             at java.util.TimerThread.run(Timer.java:462)
>>     Caused by: java.io.FileNotFoundException: no
>>     segments* file found in
>>     org.apache.lucene.store.FSDirectory@/srv/apache-tomcat-5.5.26/sakai/indexwork/journal-optimize-import/3:
>>     files:
>>             at
>>     org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>>             at
>>     org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>>             at
>>     org.apache.lucene.index.IndexReader.open(IndexReader.java:167)
>>             at
>>     org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:194)
>>             ... 6 more
>>     2011-05-04 15:39:20,930  INFO Timer-1
>>     org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager
>>     - Rolled Back Failed Shared Index operation a retry will happen
>>     on annother node soon
>>     2011-05-04 15:39:21,060  INFO Timer-1
>>     org.sakaiproject.search.journal.impl.MergeUpdateOperation - Local
>>     Merge Operation
>>             Merged Journal 59 from the redolog in 64 ms
>>             Merged Journal 60 from the redolog in 62 ms
>>
>>     2011-05-04 15:39:23,524  WARN Timer-1
>>     org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>>     - Failed to digest
>>     /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>>     Leadrership course module 2.pdf with
>>     org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@50700ae1
>>     cause: Failed to get content for indexing: cause: Error: expected
>>     hex character and not  :32
>>     2011-05-04 15:39:23,524  INFO Timer-1
>>     org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>>     - Digested
>>     /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>>     Leadrership course module 2.pdf into 43 characters with Default Digester
>>     2011-05-04 15:39:33,461  WARN Timer-1
>>     org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>>     - Failed to digest
>>     /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>>     Leadrership course module 1.pdf with
>>     org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@50700ae1
>>     cause: Failed to get content for indexing: cause: Error: expected
>>     hex character and not  :32
>>     2011-05-04 15:39:33,461  INFO Timer-1
>>     org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>>     - Digested
>>     /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>>     Leadrership course module 1.pdf into 43 characters with Default Digester
>>     2011-05-04 15:39:47,987  INFO Timer-1
>>     org.sakaiproject.search.journal.impl.SharedFilesystemJournalStorage
>>     - ++++++ Saving
>>     /srv/tomcat/sakai/indexwork/indexer-work/indextx-1304483547557 to
>>     shared
>>     2011-05-04 15:39:48,629  INFO Timer-1
>>     org.sakaiproject.search.indexer.impl.TransactionalIndexWorker -
>>     Indexed 198 documents in 27529 ms 7.192415271168586
>>     documents/second into save point 61
>>     2011-05-04 15:39:48,650  INFO Timer-1
>>     org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager
>>     - Locked 59 savePoints
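The "no segments* file found" cause means Lucene found no segments file (the commit point that lists an index's segments) in that directory, so IndexReader.open failed. A hypothetical Python check for which import directories are in that state might look like this; it only inspects file names and does not validate the index contents.

```python
import sys
from pathlib import Path
from typing import List


def has_segments_file(index_dir: Path) -> bool:
    """True if `index_dir` holds at least one file named segments* (e.g. segments_N)."""
    return any(p.is_file() for p in index_dir.glob("segments*"))


def directories_missing_segments(parent: Path) -> List[Path]:
    """Immediate subdirectories of `parent` that contain no segments* file."""
    return sorted(
        child for child in parent.iterdir()
        if child.is_dir() and not has_segments_file(child)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_segments.py <indexwork>/journal-optimize-import
    for broken in directories_missing_segments(Path(sys.argv[1])):
        print(f"no segments* file: {broken}")
```

In the trace above, the parent directory would be /srv/apache-tomcat-5.5.26/sakai/indexwork/journal-optimize-import; the path will differ on other installs.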
>>
>>
>>
>>
>>     Thanks,
>>     Leon Kolchinsky
>>
>>
>>
>>     On Wed, May 4, 2011 at 00:10, Anand Mehta <anand.mehta at yahoo.com> wrote:
>>
>>         Hello Leon,
>>
>>         You can add the following settings to sakai.properties:
>>
>>         localIndexBase@org.sakaiproject.search.api.JournalSettings=${sakai.home}/indexwork
>>         sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common directory accessible by all tomcats>/search
>>
>>         After you have restarted all Tomcats, go to the Search tool
>>         in the Admin Workspace (add it if it is not there), click the
>>         Admin link and then click "Rebuild Whole Index". I hope this helps.
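Before restarting, it can help to confirm that every node carries both settings and that all nodes agree on the shared location. Below is a hypothetical Python pre-flight check; its sakai.properties parser is deliberately simplified (key=value lines plus # or ! comments only, no line continuations or ":" separators), and it takes each node's file contents as a string.

```python
from typing import Dict, List

LOCAL_KEY = "localIndexBase@org.sakaiproject.search.api.JournalSettings"
SHARED_KEY = "sharedJournalBase@org.sakaiproject.search.api.JournalSettings"


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified parser: key=value lines; blank, '#' and '!' lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_nodes(node_properties: Dict[str, str]) -> List[str]:
    """Map of node name -> sakai.properties text; returns a list of problems."""
    problems = []
    shared_values = {}
    for node, text in node_properties.items():
        props = parse_properties(text)
        for key in (LOCAL_KEY, SHARED_KEY):
            if not props.get(key):
                problems.append(f"{node}: {key} is not set")
        if props.get(SHARED_KEY):
            shared_values[node] = props[SHARED_KEY]
    # Only the shared location must match; the local index base is per node.
    if len(set(shared_values.values())) > 1:
        problems.append(f"sharedJournalBase differs between nodes: {shared_values}")
    return problems
```

Matching strings are only a proxy: what matters is that the configured path resolves to the same physical directory (for example the same NFS export) on every node.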
>>
>>         Thanks,
>>         Anand
>>         ------------------------------------------------------------------------
>>         *From:* Leon Kolchinsky <lkolchin at gmail.com>
>>         *To:* production <production at collab.sakaiproject.org>
>>         *Sent:* Tuesday, May 3, 2011 12:32 AM
>>         *Subject:* [Deploying Sakai] Search tool and rebuilding index
>>
>>         Hello,
>>
>>         I'd like to enable the Search tool on both of our Sakai nodes
>>         (v2.6.2), which run behind a load balancer and use the same
>>         Oracle DB.
>>
>>         I'm testing it now on the dev server, and it doesn't seem to
>>         find announcements, for example.
>>
>>         I've read
>>         https://confluence.sakaiproject.org/display/SEARCH/Home but
>>         find it not very user-friendly (i.e. the configuration
>>         instructions ;).
>>
>>         So I've several questions:
>>
>>         1) What other parameters should I set for my
>>         installation (besides search.enable=true)?
>>
>>         2) How can I rebuild the indexes?
>>         What command should I run? What parameter should I configure
>>         for that in sakai.properties?
>>         Should I configure a path for the indexes in sakai.properties?
>>         How would you sync those indexes across both nodes,
>>         given that they run on the same Oracle DB?
>>         (Sorry, I couldn't find much info on that.)
>>
>>
>>         Thanks in advance for any help,
>>         Leon Kolchinsky
>>
>>
>>         _______________________________________________
>>         production mailing list
>>         production at collab.sakaiproject.org
>>         http://collab.sakaiproject.org/mailman/listinfo/production
>>
>>         TO UNSUBSCRIBE: send email to
>>         production-unsubscribe at collab.sakaiproject.org with
>>         a subject of "unsubscribe"
>>
>