[Building Sakai] quartz scheduler problem

Mike De Simone michael.desimone at rsmart.com
Thu Jan 24 09:42:55 PST 2013


We usually only run quartz on one server in a cluster.

i.e., set startScheduler@org.sakaiproject.api.app.scheduler.SchedulerManager=false
on all but one server, so that you effectively have just one Tomcat messing
with the QRTZ tables.

If that's not feasible for you to implement, then I suggest staggering
restarts by at least 2 minutes (ideally 5) in order to serialize this kind
of activity on the database.
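
To illustrate why serializing helps, here is a toy in-memory sketch in Java.
It is not Quartz or Sakai code: RestartRaceSketch, ToyJobStore and the method
names are made up for the example, and the store is just a set of job names
standing in for the QRTZ tables. It assumes each node runs an
unschedule-then-schedule pair at startup, as the init code quoted below does,
and that storing a job that already exists fails.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the unschedule-then-schedule startup pattern; not Quartz itself. */
public class RestartRaceSketch {

    /** Hypothetical stand-in for the shared QRTZ tables: a set of job names. */
    static class ToyJobStore {
        private final Set<String> jobs = new HashSet<>();

        /** Removes the job if present; does nothing otherwise. */
        void unschedule(String name) {
            jobs.remove(name);
        }

        /** Stores the job, failing if one with the same name already exists. */
        void schedule(String name) {
            if (!jobs.add(name)) {
                throw new IllegalStateException("job already exists: " + name);
            }
        }
    }

    static final String RUNNER = "ScheduledInvocationManagerImpl.runner";

    /** Node A finishes its startup before node B begins (staggered restarts). */
    static boolean staggeredStartup() {
        ToyJobStore store = new ToyJobStore();
        try {
            store.unschedule(RUNNER); // node A: nothing to remove
            store.schedule(RUNNER);   // node A: stores the job
            store.unschedule(RUNNER); // node B: removes node A's job
            store.schedule(RUNNER);   // node B: stores the job again
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Both nodes start together and their two steps interleave. */
    static boolean simultaneousStartup() {
        ToyJobStore store = new ToyJobStore();
        try {
            store.unschedule(RUNNER); // node A: nothing to remove
            store.unschedule(RUNNER); // node B: nothing to remove
            store.schedule(RUNNER);   // node A: stores the job
            store.schedule(RUNNER);   // node B: fails, the job already exists
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("staggered ok: " + staggeredStartup());
        System.out.println("simultaneous ok: " + simultaneousStartup());
    }
}
```

Running main prints "staggered ok: true" and "simultaneous ok: false". The
real JDBC job store has its own locking and transactions, so treat this only
as a picture of the ordering problem, not as a diagnosis.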

Absent a distributed-transaction type of solution, these are the only two
things I can think of to help here.



Thanks,

-------------------------------
Mike DeSimone
Application Operations Manager
rSmart | 602-490-0473


On Mon, Jan 14, 2013 at 9:38 AM, Zhen Qian <zqian at umich.edu> wrote:

> Hi, Dev team:
>
> We have found problems with our 2.9 cluster quartz job scheduler
> deployment:
>
> 1. During server startup, there are error log messages regarding
> ScheduledInvocationManagerImpl init:
>
> 2013-01-14 05:01:34,645 [localhost-startStop-1] INFO
>  org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl -
> init()
> 2013-01-14 05:01:34,667 [localhost-startStop-1] ERROR
> org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl -
> failed to schedule ScheduledInvocationRunner job
> org.quartz.ObjectAlreadyExistsException: Unable to store Job with name:
> 'org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner'
> and group:
>  'org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl',
> because one already exists with this identification.
>         at
> org.quartz.impl.jdbcjobstore.JobStoreSupport.storeJob(JobStoreSupport.java:1098)
>         at
> org.quartz.impl.jdbcjobstore.JobStoreSupport$3.execute(JobStoreSupport.java:1047)
>         .....
>
> On the other hand, the quartz job is indeed defined in the database:
>
> select job_name, job_group, description, job_class_name, is_durable,
> is_volatile, is_stateful, requests_recovery, job_data from qrtz_job_details
> where
> job_name='org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner'
>
>
> org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner
> org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl
> org.sakaiproject.component.app.scheduler.jobs.ScheduledInvocationRunner 0
> 0 1 0 (BLOB)
>
> But there is no trigger defined for this job:
>
> select * from qrtz_triggers where
> job_name='org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner'
>
> returns empty set.
>
> 2. The
> "org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner"
> job had run successfully, at a rate of one invocation every 10 minutes, for a
> couple of days. It stopped running only after one of our recent app
> server restarts. Since then, tasks have been queuing up, and one can view
> them in the scheduler_delayed_invocation table in the db:
>
> select count(*) from scheduler_delayed_invocation
> 1011
>
>
> =======================================================================================
>
> So far, I cannot recreate the problems with a single-server deployment.
>
> The following function is used for ScheduledInvocationManagerImpl init
> call:
>
> protected void registerScheduledInvocationRunner() throws SchedulerException {
>
>     // trigger will not start immediately, wait until interval has passed before 1st run
>     long startTime = System.currentTimeMillis() + getScheduledInvocationRunnerInterval();
>
>     JobDetail detail = new JobDetail(
>         "org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner",
>         "org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl",
>         ScheduledInvocationRunner.class);
>
>     Trigger trigger = new SimpleTrigger(
>         "org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl.runner",
>         "org.sakaiproject.component.app.scheduler.ScheduledInvocationManagerImpl",
>         new Date(startTime), null, SimpleTrigger.REPEAT_INDEFINITELY,
>         getScheduledInvocationRunnerInterval());
>
>     m_schedulerManager.getScheduler().unscheduleJob(trigger.getName(), trigger.getGroup());
>     m_schedulerManager.getScheduler().scheduleJob(detail, trigger);
> }
>
> Notice that it tries to unscheduleJob first and then add the job back again.
> Will that be a problem in a cluster environment, when all the servers are
> restarting at about the same time and each server tries to unschedule and
> reschedule the same job?
>
> Any suggestions? We will create a JIRA ticket for the jobscheduler
> problems found.
>
> Thanks,
>
> - Zhen