[Deploying Sakai] Search tool and rebuilding index
David Horwitz
david.horwitz at uct.ac.za
Thu May 5 04:24:43 PDT 2011
Hi Leon,
These are simply individual items that search wasn't able to digest;
explanations are inline.
On 05/05/2011 05:01 AM, Leon Kolchinsky wrote:
> Thanks David and Anand,
>
> I've cleared search tables in the DB, deleted files in shared and
> local search folders and rerun "Rebuild Whole Index
> <http://vera029.its.monash.edu.au/portal/tool/55d5a8bf-b3a1-42ed-81ac-7da250e0dc90/admin/index?command=rebuildinstance>"
>
> It seems OK except for some warning messages.
>
> Should I be worried about the following?
>
> 2011-05-05 11:37:02,208 WARN Timer-1
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager -
> Entity Reference Longer than 255 characters, ignored:
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi
> PHD/Anushi PHD/Anushi Immuno nanoparticle library.Data/PDF/M6_Levy et
> al, 1999 (Effect of shear on plasmid DNA in
> solution)-4065125014/M6_Levy et al, 1999 (Effect of shear on plasmid
> DNA in solution).pdf
> 2011-05-05 11:37:02,228 WARN Timer-1
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager -
> Entity Reference Longer than 255 characters, ignored:
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi
> PHD/Anushi PHD/Anushi Immuno nanoparticle
> library.Data/PDF/F2_Taussig1974 (Mists and aerosols New studies, new
> thoughts)-0908373378/F2_Taussig1974 (Mists and aerosols New studies,
> new thoughts).pdf
>
Search has a 255-character limit on a reference, but content references
can be longer, so those items are skipped. We need to fix this sometime.
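If you want to know in advance which items search will skip, a rough sketch along these lines could help. It is not taken from the Sakai code: the helper names are made up for illustration, and it assumes the limit is counted in characters, as the warning text suggests.

```python
# Limit taken from the warning message above; helper names are illustrative only.
MAX_REFERENCE_LENGTH = 255


def is_indexable(reference: str) -> bool:
    """Return True if the reference fits within the search limit."""
    return len(reference) <= MAX_REFERENCE_LENGTH


def split_by_length(references):
    """Partition references into (indexable, skipped) lists."""
    indexable, skipped = [], []
    for ref in references:
        (indexable if is_indexable(ref) else skipped).append(ref)
    return indexable, skipped
```

Feeding it a list of content references (however you export them) gives the ones that would trigger the "Longer than 255 characters, ignored" warning.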
>
> 2011-05-05 11:50:42,537 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR Leadrership
> course module 2.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 9e72139
> cause: Failed to get content for indexing: cause: Error: expected hex
> character and not :32
>
The PDF digester (Apache PDFBox) was not able to read this PDF, so search
fell back to the Default Digester (see the next log line).
> 2011-05-05 11:50:42,538 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
> Leadrership course module 2.pdf into 43 characters with Default Digester
> 2011-05-05 11:50:43,832 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Ethics Checklist.pdf
> with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 9e72139
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:43,832 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Ethics Checklist.pdf
> into 66 characters with Default Digester
> 2011-05-05 11:50:45,785 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics Admin/CCHRE
> Research Governance SOPs.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 9e72139
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:45,786 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/CCHRE Research Governance SOPs.pdf into 40 characters with
> Default Digester
> 2011-05-05 11:50:46,558 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Governance Checklist.pdf
> with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 9e72139
> cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:46,558 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Governance Checklist.pdf
> into 70 characters with Default Digester
> Warning, header block comes after data blocks in POIFS block listing
>
>
Ditto
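To get a list of the files that could not be digested, something like the following sketch could be run over the Tomcat log. It assumes each log entry sits on a single line in the raw log (the wrapping above comes from the mail client) and that the digester name contains "@" rather than the " at " shown in the list archive. The function name is hypothetical.

```python
import re

# Matches the WARN entries shown above, e.g.
# "... - Failed to digest <reference> with <digester> cause: <cause>"
FAILED_DIGEST = re.compile(
    r"Failed to digest (?P<ref>.+?) with (?P<digester>\S+) cause: (?P<cause>.*)$"
)


def failed_digests(lines):
    """Return (reference, cause) pairs for every 'Failed to digest' entry."""
    results = []
    for line in lines:
        match = FAILED_DIGEST.search(line.rstrip("\n"))
        if match:
            results.append((match.group("ref"), match.group("cause")))
    return results
```

Calling `failed_digests(open(path))` with the path to your log file returns the references together with the reported cause.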
D
> Thanks,
> Leon Kolchinsky
>
>
>
> On Thu, May 5, 2011 at 01:41, Anand Mehta <anand.mehta at yahoo.com> wrote:
>
> Hi Leon,
>
> There should be no problem with all Tomcats accessing the
> sharedJournalBase directory simultaneously. As David mentioned,
> cleaning the search tables and files should help. You can also try
> the workaround here https://jira.sakaiproject.org/browse/SRCH-15
> if you still see some messages about missing segments.
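The "missing segments" messages come from Lucene not finding a segments* file in an index directory (see the FileNotFoundException in the quoted log further down). A minimal sketch for spotting such directories, assuming each subdirectory of the folder you point it at is meant to be a Lucene index; the function names are made up for illustration:

```python
from pathlib import Path


def has_segments_file(index_dir):
    """True if the directory contains at least one file named segments*."""
    path = Path(index_dir)
    return path.is_dir() and any(p.is_file() for p in path.glob("segments*"))


def dirs_missing_segments(parent_dir):
    """List immediate subdirectories of parent_dir with no segments* file."""
    parent = Path(parent_dir)
    return sorted(
        str(child)
        for child in parent.iterdir()
        if child.is_dir() and not has_segments_file(child)
    )
```

This only reports the directories; whether to delete or rebuild them is a separate decision.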
>
> Thanks,
> Anand
> ------------------------------------------------------------------------
> *From:* David Horwitz <david.horwitz at uct.ac.za>
> *To:* production at collab.sakaiproject.org
> *Sent:* Wednesday, May 4, 2011 4:02 AM
> *Subject:* Re: [Deploying Sakai] Search tool and rebuilding index
>
> Hi Leon,
>
> It looks like you need to clear out your search tables in the db.
> So, with Sakai stopped:
>
> 1) run:
>
> truncate table search_journal;
> truncate table search_node_status;
> truncate table search_segments;
> truncate table search_transaction;
> truncate table searchbuilderitem;
> truncate table searchwriterlock;
>
> (In Oracle, replace "truncate table" with "delete from".)
>
> 2) delete any files in your shared search folder
> 3) delete any files in the local search index
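For step 1, the statements can be generated rather than typed by hand. This is only a sketch: the table names are the ones listed above, while the function name and the `use_delete` flag are invented here to cover the "delete from" variant mentioned for Oracle.

```python
# Table names copied from the list in step 1.
SEARCH_TABLES = [
    "search_journal",
    "search_node_status",
    "search_segments",
    "search_transaction",
    "searchbuilderitem",
    "searchwriterlock",
]


def clear_statements(use_delete=False):
    """Return one SQL statement per search table.

    use_delete=True emits "delete from" instead of "truncate table".
    """
    verb = "delete from" if use_delete else "truncate table"
    return [f"{verb} {table};" for table in SEARCH_TABLES]
```

`print("\n".join(clear_statements(use_delete=True)))` prints the delete variant, ready to paste into a SQL client. With "delete from", remember to commit if your client does not do so automatically.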
>
>
> Regards
>
> D
>
> On 05/04/2011 08:16 AM, Leon Kolchinsky wrote:
>> Thanks Anand,
>>
>> You've been a great help.
>> I'm testing it now on one server and tomorrow I'll test it with
>> shared
>> "sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common
>> directory accessible by all tomcats>/search"
>> on NFS share.
>>
>> 1) Should I expect any problems given the fact that 2 servers
>> would access
>> sharedJournalBase@org.sakaiproject.search.api.JournalSettings
>> location simultaneously?
>>
>> 2) What would be the reason for the following messages (after
>> pressing "Rebuild Whole Index") and how to mitigate it:
>>
>> 2011-05-04 15:37:50,859 WARN Timer-1
>> org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation
>> - Failed to compete optimize
>> org.sakaiproject.search.optimize.api.OptimizedFailedIndexTransactionException:
>> Failed to Optimize indexes
>> at
>> org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:292)
>> at
>> org.sakaiproject.search.transaction.impl.IndexTransactionImpl.firePrepare(IndexTransactionImpl.java:313)
>> at
>> org.sakaiproject.search.transaction.impl.IndexTransactionImpl.prepare(IndexTransactionImpl.java:147)
>> at
>> org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation.runOnce(JournalOptimizationOperation.java:82)
>> at
>> org.sakaiproject.search.journal.impl.IndexManagementTimerTask.run(IndexManagementTimerTask.java:139)
>> at java.util.TimerThread.mainLoop(Timer.java:512)
>> at java.util.TimerThread.run(Timer.java:462)
>> Caused by: java.io.FileNotFoundException: no
>> segments* file found in
>> org.apache.lucene.store.FSDirectory@/srv/apache-tomcat-5.5.26/sakai/indexwork/journal-optimize-import/3:
>> files:
>> at
>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>> at
>> org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>> at
>> org.apache.lucene.index.IndexReader.open(IndexReader.java:167)
>> at
>> org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:194)
>> ... 6 more
>> 2011-05-04 15:39:20,930 INFO Timer-1
>> org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager
>> - Rolled Back Failed Shared Index operation a retry will happen
>> on annother node soon
>> 2011-05-04 15:39:21,060 INFO Timer-1
>> org.sakaiproject.search.journal.impl.MergeUpdateOperation - Local
>> Merge Operation
>> Merged Journal 59 from the redolog in 64 ms
>> Merged Journal 60 from the redolog in 62 ms
>>
>> 2011-05-04 15:39:23,524 WARN Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Failed to digest
>> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 2.pdf with
>> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 50700ae1
>> cause: Failed to get content for indexing: cause: Error: expected
>> hex character and not :32
>> 2011-05-04 15:39:23,524 INFO Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Digested
>> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 2.pdf into 43 characters
>> with Default Digester
>> 2011-05-04 15:39:33,461 WARN Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Failed to digest
>> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 1.pdf with
>> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester at 50700ae1
>> cause: Failed to get content for indexing: cause: Error: expected
>> hex character and not :32
>> 2011-05-04 15:39:33,461 INFO Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Digested
>> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 1.pdf into 43 characters
>> with Default Digester
>> 2011-05-04 15:39:47,987 INFO Timer-1
>> org.sakaiproject.search.journal.impl.SharedFilesystemJournalStorage
>> - ++++++ Saving
>> /srv/tomcat/sakai/indexwork/indexer-work/indextx-1304483547557 to
>> shared
>> 2011-05-04 15:39:48,629 INFO Timer-1
>> org.sakaiproject.search.indexer.impl.TransactionalIndexWorker -
>> Indexed 198 documents in 27529 ms 7.192415271168586
>> documents/second into save point 61
>> 2011-05-04 15:39:48,650 INFO Timer-1
>> org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager
>> - Locked 59 savePoints
>>
>>
>>
>>
>> Thanks,
>> Leon Kolchinsky
>>
>>
>>
>> On Wed, May 4, 2011 at 00:10, Anand Mehta <anand.mehta at yahoo.com> wrote:
>>
>> Hello Leon,
>>
>> You can add the following settings to sakai.properties:
>>
>> localIndexBase@org.sakaiproject.search.api.JournalSettings=${sakai.home}/indexwork
>> sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common
>> directory accessible by all tomcats>/search
>>
>> After you have restarted all Tomcats, go to the search tool
>> in the admin workspace (add it if not there), click on Admin
>> link and then click "Rebuild Whole Index". I hope this helps.
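To keep the two settings identical on every node, the lines can be generated from a single value. This is a sketch only: the property names are the ones given above, the function name is invented, and the example directory is purely hypothetical.

```python
# Bean name as it appears in the two sakai.properties settings above.
BEAN = "org.sakaiproject.search.api.JournalSettings"


def journal_settings(shared_dir, local_dir="${sakai.home}/indexwork"):
    """Return the two sakai.properties lines for the given shared directory."""
    shared = shared_dir.rstrip("/")
    return [
        f"localIndexBase@{BEAN}={local_dir}",
        f"sharedJournalBase@{BEAN}={shared}/search",
    ]
```

For instance, `journal_settings("/mnt/sakai-shared")` (a made-up mount point) yields the pair of lines to paste into sakai.properties on each Tomcat.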
>>
>> Thanks,
>> Anand
>> ------------------------------------------------------------------------
>> *From:* Leon Kolchinsky <lkolchin at gmail.com>
>> *To:* production <production at collab.sakaiproject.org>
>> *Sent:* Tuesday, May 3, 2011 12:32 AM
>> *Subject:* [Deploying Sakai] Search tool and rebuilding index
>>
>> Hello,
>>
>> I'd like to enable "Search tool" on both of our Sakai nodes
>> (v.2.6.2) running behind load balancer and using the same
>> Oracle DB.
>>
>> I'm testing it now on the dev server and it seems that it's
>> not finding announcements, for example.
>>
>> I've read
>> https://confluence.sakaiproject.org/display/SEARCH/Home but
>> find it not very user-friendly (i.e. the configuration
>> instructions ;).
>>
>> So I've several questions:
>>
>> 1) What other parameters should I use for my
>> installation (besides search.enable=true)?
>>
>> 2) How can I rebuild indexes?
>> What command should I run? What parameter should I configure
>> for that in sakai.properties?
>> Should I configure a path for indexes in sakai.properties?
>> How would you sync those indexes between the 2 nodes
>> running on the same Oracle DB?
>> (Sorry, I couldn't find much info on that.)
>>
>>
>> Thanks in advance for any help,
>> Leon Kolchinsky
>>
>>
>> _______________________________________________
>> production mailing list
>> production at collab.sakaiproject.org
>> <mailto:production at collab.sakaiproject.org>
>> http://collab.sakaiproject.org/mailman/listinfo/production
>>
>> TO UNSUBSCRIBE: send email to
>> production-unsubscribe at collab.sakaiproject.org
>> <mailto:production-unsubscribe at collab.sakaiproject.org> with
>> a subject of "unsubscribe"
>>
>>
>>