Scheduled job runs twice

I'm trying to schedule my job, but it runs twice or more...

I shut down the server from the job itself: the last step runs a command that shuts down the EC2 instance Matillion is running on.

I start the EC2 instance through a Lambda function; five minutes later, another Lambda starts the Matillion job through the API (I also tried Matillion's scheduler, with the same results).
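For reference, here is a minimal sketch of that setup as two Python Lambda handlers. The instance ID, region, base URL, and the group/project/version/job names are all placeholders, and the REST path is my reading of Matillion ETL's v1 API, so check it against the API docs for your METL version:

```python
import json
import urllib.request

INSTANCE_ID = "i-0123456789abcdef0"   # placeholder: the METL instance
REGION = "eu-west-1"                  # placeholder

def start_instance(event, context):
    """First Lambda: bring the Matillion EC2 instance up."""
    import boto3  # provided by the Lambda runtime
    ec2 = boto3.client("ec2", region_name=REGION)
    ec2.start_instances(InstanceIds=[INSTANCE_ID])

def run_job_url(base_url, group, project, version, job):
    """Build the (assumed) v1 API path for launching a job by name."""
    return (f"{base_url}/rest/v1/group/name/{group}"
            f"/project/name/{project}/version/name/{version}"
            f"/job/name/{job}/run")

def run_job(event, context):
    """Second Lambda, scheduled ~5 minutes later: launch the job over HTTP."""
    url = run_job_url("http://metl.internal", "MyGroup",
                      "MyProject", "default", "NightlyLoad")
    req = urllib.request.Request(url, data=b"{}", method="POST",
                                 headers={"Content-Type": "application/json"})
    # Basic-auth header omitted for brevity; the METL API requires credentials.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The five-minute gap between the two Lambdas gives the instance time to boot before the API call arrives.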

 

This works like a charm, BUT... when I start the EC2 instance later in the day and connect to Matillion, the job starts automatically, and I don't understand why. The source of the running job is shown as API (or Schedule, depending on which method I used to schedule the job).

I think Matillion is queuing jobs, and shutting down the server seems to mess up this queue, so Matillion tries to rerun them when the server is started...

I see similar behaviour for 2 of our 80+ jobs and don't understand why.

Those 2 jobs run whenever they miss their schedule (because METL is shut down at the scheduled time). They run only once, matching the scheduled frequency, but the run happens whenever METL is brought up, outside the scheduled time.

The workaround I tested, and which seems to work so far, is to send an SQS message to a queue and have a Lambda function stop the instance (instead of Matillion launching a shell script to shut itself down). It seems to work well: when I start Matillion again, the job does not launch automatically.
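A sketch of that hand-off, assuming a queue with a Lambda subscribed to it; the queue URL, region, and message field name are invented for illustration:

```python
import json

# Placeholders: substitute your own queue URL and region.
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/metl-shutdown"
REGION = "eu-west-1"

def request_shutdown(instance_id):
    """Last step of the Matillion job: request a stop instead of running one."""
    import boto3  # available on the METL instance and in Lambda
    sqs = boto3.client("sqs", region_name=REGION)
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"instance_id": instance_id}))

def parse_shutdown_message(body):
    """Pull the target instance ID out of an SQS message body."""
    return json.loads(body)["instance_id"]

def handler(event, context):
    """Lambda triggered by the queue: stop the requested instance."""
    import boto3
    ec2 = boto3.client("ec2", region_name=REGION)
    for record in event["Records"]:
        ec2.stop_instances(
            InstanceIds=[parse_shutdown_message(record["body"])])
```

The point of the indirection is that the stop happens from outside Matillion, after the job has finished cleanly, rather than the server killing itself mid-task.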

Any new insights regarding this topic?


What we noticed is that the scheduler reruns everything that was expected to start but did not.

So, for example, if you shut down the server before the scheduler could trigger a job, the job is triggered on the next startup. For us this happens on weekends: we shut down the server earlier, some of our jobs run intra-day, and those are triggered directly after the server starts on Sunday. This could be avoided by creating different schedules for weekdays and weekends, but in our case it is not critical.
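For example, such a weekday/weekend split could be expressed as two separate schedules, sketched here as Quartz-style cron expressions (the times, and whether your scheduler accepts raw cron, are assumptions):

```
# Weekday schedule: every 2 hours between 06:00 and 20:00, Monday-Friday
0 0 6-20/2 ? * MON-FRI

# Weekend schedule: one run on Sunday evening, after the server is back up
0 0 19 ? * SUN
```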


In the past we had some issues with our Matillion instance and needed to clone it to test something on the copy. That gets annoying too: if you don't deactivate all the schedules, the copied image will start all the jobs when it boots... So we had to deactivate every schedule before creating the image...


In my opinion it would be useful to control this behaviour. Depending on the use case it might be desirable for the scheduler to restart everything it expected to run, but most of the time I don't want that: if my server crashes and I restart it maybe a day later, I don't want everything to be triggered...

So maybe a feature like a "do not rerun jobs on startup" flag could solve this problem as well.


If you enable the Ignore Misfire option in the scheduler setup, then when the Matillion server is not on, the missed schedule will not stay in the queue.

Hope this helps.

I always shut the instance down from Lambda (not from within the instance), yet it still happens for a few jobs. I cannot find any difference between the jobs that start abnormally and the ones that behave normally.

Yep, what's needed is probably an option to hold any new jobs from starting when required, like a resource drain for managing situations such as an outage window: something to deactivate all schedules temporarily in one shot, rather than modifying the jobs one by one.

This behaviour has been fixed in Matillion ETL version 1.69.1 (interim release).


New features and improvements

  • A scheduled job that does not start within 30 seconds of its scheduled time (the default threshold) is declared a misfire. When a misfire is detected, it will now show in the task history, decorated with a new "misfire" icon. This should make it clearer when misfires are detected and what that means for misfire handling. These events will not appear when the Ignore Misfire option is selected when creating a schedule.