[Deploying Sakai] Search tool and rebuilding index
Leon Kolchinsky
lkolchin at gmail.com
Thu May 5 18:27:09 PDT 2011
Thanks for the explanation, David ;)
Cheers,
Leon Kolchinsky
On Thu, May 5, 2011 at 21:24, David Horwitz <david.horwitz at uct.ac.za> wrote:
> Hi Leon,
>
>
> These are simply individual items that search wasn't able to digest;
> explanations inline.
>
>
> On 05/05/2011 05:01 AM, Leon Kolchinsky wrote:
>
> Thanks David and Anand,
>
> I've cleared the search tables in the DB, deleted the files in the shared and local
> search folders, and rerun "Rebuild Whole Index"
> (<http://vera029.its.monash.edu.au/portal/tool/55d5a8bf-b3a1-42ed-81ac-7da250e0dc90/admin/index?command=rebuildinstance>).
>
> It seems OK except for some warning messages.
>
> Should I be worried about the following?
>
> 2011-05-05 11:37:02,208 WARN Timer-1
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager - Entity
> Reference Longer than 255 characters, ignored:
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi
> PHD/Anushi PHD/Anushi Immuno nanoparticle library.Data/PDF/M6_Levy et al,
> 1999 (Effect of shear on plasmid DNA in solution)-4065125014/M6_Levy et al,
> 1999 (Effect of shear on plasmid DNA in solution).pdf
> 2011-05-05 11:37:02,228 WARN Timer-1
> org.sakaiproject.search.indexer.impl.SearchBuilderQueueManager - Entity
> Reference Longer than 255 characters, ignored:
> Reference=/content/group/379e49c0-50fe-4a2b-ba24-752f9cc228b4/Anushi
> PHD/Anushi PHD/Anushi Immuno nanoparticle library.Data/PDF/F2_Taussig1974
> (Mists and aerosols New studies, new thoughts)-0908373378/F2_Taussig1974
> (Mists and aerosols New studies, new thoughts).pdf
>
> Search has a 255-character limit on a reference, but content references can
> be longer; we need to fix this sometime.
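As an illustration of the behaviour described above (over-long references are logged and skipped rather than truncated), here is a minimal Python sketch. It is not Sakai's code: the WARN lines come from `SearchBuilderQueueManager`, while `MAX_REFERENCE_LENGTH` and `filter_references` are hypothetical names, and the sketch assumes the limit counts characters of the reference string.

```python
import logging
from typing import Iterable, List, Tuple

# Hypothetical constant mirroring the 255-character limit described above.
MAX_REFERENCE_LENGTH = 255

log = logging.getLogger("search-queue-sketch")


def filter_references(references: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split references into (accepted, skipped) by length.

    Sketch only: references longer than the limit are logged and skipped,
    as in the WARN lines above, rather than truncated.
    """
    accepted: List[str] = []
    skipped: List[str] = []
    for ref in references:
        if len(ref) > MAX_REFERENCE_LENGTH:
            log.warning("Entity Reference Longer than %d characters, ignored: Reference=%s",
                        MAX_REFERENCE_LENGTH, ref)
            skipped.append(ref)
        else:
            accepted.append(ref)
    return accepted, skipped
```

A reference of exactly 255 characters is still accepted; only longer ones land in the skipped list, which is why deeply nested folder names like the ones above never reach the index.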
>
>
>
> 2011-05-05 11:50:42,537 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
> Leadrership course module 2.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139cause: Failed to get content for indexing: cause: Error: expected hex
> character and not :32
>
>
> The PDF digester (Apache PDFBox) was not able to read this PDF.
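The log pairs each PDFBox failure (WARN) with a "Digested ... with Default Digester" line (INFO), so the item is still indexed, just with much less text. A minimal Python sketch of that try-then-fall-back pattern follows; it is an assumption about the shape of the logic, not Sakai's `ContentHostingContentProducer`, and the `Digester` type and `digest_with_fallback` function are hypothetical.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("digester-sketch")

# Hypothetical signature: a digester turns raw file bytes into indexable text.
Digester = Callable[[bytes], str]


def digest_with_fallback(name: str, data: bytes,
                         primary: Tuple[str, Digester],
                         fallback: Tuple[str, Digester]) -> Tuple[str, str]:
    """Try the format-specific digester; if it raises, use the fallback.

    Returns (digester_name, text). The failure is logged and indexing
    continues, matching the WARN/INFO pairs in the log above.
    """
    primary_name, primary_fn = primary
    try:
        return primary_name, primary_fn(data)
    except Exception as exc:  # any parser error downgrades to the fallback
        log.warning("Failed to digest %s with %s cause: %s", name, primary_name, exc)
    fallback_name, fallback_fn = fallback
    text = fallback_fn(data)
    log.info("Digested %s into %d characters with %s", name, len(text), fallback_name)
    return fallback_name, text
```

The design choice this illustrates is that a single unreadable file should degrade that file's search quality, not abort the whole indexing run.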
>
>
>
> 2011-05-05 11:50:42,538 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
> Leadrership course module 2.pdf into 43 characters with Default Digester
> 2011-05-05 11:50:43,832 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Ethics Checklist.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:43,832 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Ethics Checklist.pdf into 66
> characters with Default Digester
> 2011-05-05 11:50:45,785 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics Admin/CCHRE
> Research Governance SOPs.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:45,786 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/CCHRE Research Governance SOPs.pdf into 40 characters with Default
> Digester
> 2011-05-05 11:50:46,558 WARN Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Failed to digest
> /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Governance Checklist.pdf with
> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@9e72139cause: Failed to get content for indexing: cause: null
> 2011-05-05 11:50:46,558 INFO Timer-1
> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/Ethics
> Admin/Consultative Council for Human Research Governance Checklist.pdf into
> 70 characters with Default Digester
> Warning, header block comes after data blocks in POIFS block listing
>
>
>
> Ditto
>
>
> D
>
> Thanks,
> Leon Kolchinsky
>
>
>
> On Thu, May 5, 2011 at 01:41, Anand Mehta <anand.mehta at yahoo.com> wrote:
>
>> Hi Leon,
>>
>> There should be no problem with all Tomcats accessing the
>> sharedJournalBase directory simultaneously. As David mentioned, cleaning the
>> search tables and files should help. You can also try the workaround here
>> https://jira.sakaiproject.org/browse/SRCH-15 if you still see some
>> messages about missing segments.
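To see which index directories would trigger the "no segments* file found" error quoted further down this thread, a small Python check can list directories that contain no file whose name starts with `segments`, which is what the Lucene message says it looked for. This is a hypothetical diagnostic, not part of Sakai or of the SRCH-15 workaround, and it assumes each index sits in its own subdirectory (as with `.../journal-optimize-import/3` in that stack trace).

```python
from pathlib import Path
from typing import List


def dirs_missing_segments(root: Path) -> List[Path]:
    """Return subdirectories of root that contain no file named 'segments*'.

    Sketch only: assumes one Lucene index per immediate subdirectory.
    """
    missing: List[Path] = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        has_segments = any(f.is_file() and f.name.startswith("segments")
                           for f in sub.iterdir())
        if not has_segments:
            missing.append(sub)
    return missing
```

Pointing it at a copy of the local `indexwork` tree (never at a live index) gives a quick list of half-written directories.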
>>
>>
>> Thanks,
>> Anand
>> ------------------------------
>> *From:* David Horwitz <david.horwitz at uct.ac.za>
>> *To:* production at collab.sakaiproject.org
>> *Sent:* Wednesday, May 4, 2011 4:02 AM
>> *Subject:* Re: [Deploying Sakai] Search tool and rebuilding index
>>
>> Hi Leon,
>>
>> It looks like you need to clear out your search tables in the DB. So, with
>> Sakai stopped:
>>
>> 1) run:
>>
>> truncate table search_journal;
>> truncate table search_node_status;
>> truncate table search_segments;
>> truncate table search_transaction;
>> truncate table searchbuilderitem;
>> truncate table searchwriterlock;
>>
>> (In Oracle, replace "truncate table" with "delete from".)
>>
>> 2) delete any files in your shared search folder
>> 3) delete any files in the local search index
>>
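The three clean-up steps above can be scripted. The Python sketch below only builds step 1's SQL (printing it for your usual SQL client rather than executing it) and empties a directory's contents for steps 2 and 3 while keeping the directory itself. The function names are hypothetical; the directories you pass in are assumed to be your own local index and shared search folders, and it should only be run with Sakai stopped.

```python
import shutil
from pathlib import Path
from typing import List

# Table list copied from step 1 above.
SEARCH_TABLES = [
    "search_journal", "search_node_status", "search_segments",
    "search_transaction", "searchbuilderitem", "searchwriterlock",
]


def clear_statements(oracle: bool) -> List[str]:
    """Build step 1's statements, using 'delete from' on Oracle as advised."""
    verb = "delete from" if oracle else "truncate table"
    return [f"{verb} {table};" for table in SEARCH_TABLES]


def empty_directory(directory: Path) -> int:
    """Delete everything inside directory (steps 2 and 3), keeping the directory.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in list(directory.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory tree
        else:
            entry.unlink()  # remove a file or symlink
        removed += 1
    return removed


if __name__ == "__main__":
    for statement in clear_statements(oracle=True):
        print(statement)
```

Keeping the directories themselves (rather than deleting and recreating them) avoids having to restore ownership and permissions on an NFS mount.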
>>
>> Regards
>>
>> D
>>
>> On 05/04/2011 08:16 AM, Leon Kolchinsky wrote:
>>
>> Thanks Anand,
>>
>> You've been a great help.
>> I'm testing it now on one server, and tomorrow I'll test it with a shared
>> "sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common
>> directory accessible by all tomcats>/search"
>> on an NFS share.
>>
>> 1) Should I expect any problems given that 2 servers would access the
>> sharedJournalBase@org.sakaiproject.search.api.JournalSettings location
>> simultaneously?
>>
>> 2) What is the reason for the following messages (after pressing "Rebuild
>> Whole Index"), and how can I mitigate them:
>>
>> 2011-05-04 15:37:50,859 WARN Timer-1
>> org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation -
>> Failed to compete optimize
>> org.sakaiproject.search.optimize.api.OptimizedFailedIndexTransactionException:
>> Failed to Optimize indexes
>> at
>> org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:292)
>> at
>> org.sakaiproject.search.transaction.impl.IndexTransactionImpl.firePrepare(IndexTransactionImpl.java:313)
>> at
>> org.sakaiproject.search.transaction.impl.IndexTransactionImpl.prepare(IndexTransactionImpl.java:147)
>> at
>> org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation.runOnce(JournalOptimizationOperation.java:82)
>> at
>> org.sakaiproject.search.journal.impl.IndexManagementTimerTask.run(IndexManagementTimerTask.java:139)
>> at java.util.TimerThread.mainLoop(Timer.java:512)
>> at java.util.TimerThread.run(Timer.java:462)
>> Caused by: java.io.FileNotFoundException: no
>> segments* file found in
>> org.apache.lucene.store.FSDirectory@/srv/apache-tomcat-5.5.26/sakai/indexwork/journal-optimize-import/3:files:
>> at
>> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>> at org.apache.lucene.index.IndexReader.open(IndexReader.java:167)
>> at
>> org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:194)
>> ... 6 more
>> 2011-05-04 15:39:20,930 INFO Timer-1
>> org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager -
>> Rolled Back Failed Shared Index operation a retry will happen on annother
>> node soon
>> 2011-05-04 15:39:21,060 INFO Timer-1
>> org.sakaiproject.search.journal.impl.MergeUpdateOperation - Local Merge
>> Operation
>> Merged Journal 59 from the redolog in 64 ms
>> Merged Journal 60 from the redolog in 62 ms
>>
>> 2011-05-04 15:39:23,524 WARN Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Failed to digest /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 2.pdf with
>> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@50700ae1cause: Failed to get content for indexing: cause: Error: expected hex
>> character and not :32
>> 2011-05-04 15:39:23,524 INFO Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 2.pdf into 43 characters with Default Digester
>> 2011-05-04 15:39:33,461 WARN Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Failed to digest /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 1.pdf with
>> org.sakaiproject.search.component.adapter.contenthosting.PDFContentDigester@50700ae1cause: Failed to get content for indexing: cause: Error: expected hex
>> character and not :32
>> 2011-05-04 15:39:33,461 INFO Timer-1
>> org.sakaiproject.search.component.adapter.contenthosting.ContentHostingContentProducer
>> - Digested /content/group/963f86af-9012-4626-bdcd-e8592c60c1db/MIHSR
>> Leadrership course module 1.pdf into 43 characters with Default Digester
>> 2011-05-04 15:39:47,987 INFO Timer-1
>> org.sakaiproject.search.journal.impl.SharedFilesystemJournalStorage - ++++++
>> Saving /srv/tomcat/sakai/indexwork/indexer-work/indextx-1304483547557 to
>> shared
>> 2011-05-04 15:39:48,629 INFO Timer-1
>> org.sakaiproject.search.indexer.impl.TransactionalIndexWorker - Indexed 198
>> documents in 27529 ms 7.192415271168586 documents/second into save point 61
>> 2011-05-04 15:39:48,650 INFO Timer-1
>> org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager -
>> Locked 59 savePoints
>>
>>
>>
>>
>> Thanks,
>> Leon Kolchinsky
>>
>>
>>
>> On Wed, May 4, 2011 at 00:10, Anand Mehta <anand.mehta at yahoo.com> wrote:
>>
>> Hello Leon,
>>
>> You can add the following settings to sakai.properties:
>>
>> localIndexBase@org.sakaiproject.search.api.JournalSettings=${sakai.home}/indexwork
>> sharedJournalBase@org.sakaiproject.search.api.JournalSettings=<common directory accessible by all tomcats>/search
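To check what these two settings resolve to on a given node, here is a minimal Python sketch. It assumes the keys use Sakai's `property@component` override form and that `${sakai.home}` expands to the node's Sakai home directory; `load_settings` and `split_key` are hypothetical helpers, not part of Sakai, and the parser handles only simple `key=value` lines.

```python
import re
from typing import Dict, Tuple

# Matches ${name} placeholders such as ${sakai.home}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def split_key(key: str) -> Tuple[str, str]:
    """Split 'property@component' into (property, component)."""
    prop, _, component = key.partition("@")
    return prop, component


def load_settings(text: str, sakai_home: str) -> Dict[str, str]:
    """Parse simple key=value lines and expand ${sakai.home}.

    Sketch only: Sakai parses sakai.properties itself, and real .properties
    files allow more syntax (line continuations, escapes, ':' separators).
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        # Expand only ${sakai.home}; leave any other placeholder untouched.
        value = _PLACEHOLDER.sub(
            lambda m: sakai_home if m.group(1) == "sakai.home" else m.group(0),
            value.strip())
        settings[key.strip()] = value
    return settings
```

Run against each node's properties with that node's Sakai home (for example a placeholder such as `/opt/sakai`), the `localIndexBase` value may differ per node, while the `sharedJournalBase` value should come out identical everywhere, since it must point at the common directory.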
>>
>> After you have restarted all Tomcats, go to the Search tool in the Admin
>> Workspace (add it if it's not there), click the Admin link, and then click "Rebuild
>> Whole Index". I hope this helps.
>>
>>
>> Thanks,
>> Anand
>> ------------------------------
>> *From:* Leon Kolchinsky <lkolchin at gmail.com>
>> *To:* production <production at collab.sakaiproject.org>
>> *Sent:* Tuesday, May 3, 2011 12:32 AM
>> *Subject:* [Deploying Sakai] Search tool and rebuilding index
>>
>> Hello,
>>
>> I'd like to enable the "Search tool" on both of our Sakai nodes (v2.6.2),
>> which run behind a load balancer and use the same Oracle DB.
>>
>> I'm testing it now on the dev server, and it doesn't seem to find
>> announcements, for example.
>>
>> I've read https://confluence.sakaiproject.org/display/SEARCH/Home but
>> find it a bit user-unfriendly (in particular the configuration instructions ;).
>>
>> So I've several questions:
>>
>> 1) What other parameters should I use for my installation (besides
>> search.enable=true)?
>>
>> 2) How can I rebuild the indexes?
>> What command should I run? What parameter should I configure for that in
>> sakai.properties?
>> Should I configure a path for the indexes in sakai.properties? How would you
>> sync those indexes across both nodes, given that the 2 nodes run on the same
>> Oracle DB? (Sorry, I couldn't find much info on that.)
>>
>>
>> Thanks in advance for any help,
>> Leon Kolchinsky
>>
>>
>> _______________________________________________
>> production mailing list
>> production at collab.sakaiproject.org
>> http://collab.sakaiproject.org/mailman/listinfo/production
>>
>> TO UNSUBSCRIBE: send email to
>> production-unsubscribe at collab.sakaiproject.org with a subject of
>> "unsubscribe"
>>
>>
>>
>>
>
>