We're getting tens of thousands of errors in our catalina log for our dual-node HA clustered environment. Error msg below. Is this related to a time sync issue between the nodes?

Frederick.Wright · March 9, 2022, 1:21pm

INFO [QuartzScheduler_QuartzScheduler-10.82.105.132_ClusterManager] org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterRecover ClusterManager: Scanning for instance "x.x.x.x"'s failed in-progress jobs.

Bryan · March 15, 2022, 8:48pm

Hi @Frederick.Wright,

This is probably a question for Matillion support, simply because I don't think there are many HA setups out there, so the footprint and visibility into your situation is pretty small. It won't hurt to post back what you find out from support, though. Sorry I couldn't be of more help. 😞

ChikaMatillionCommunityMgr · March 17, 2022, 8:42pm

Hi @Frederick.Wright

That message specifically relates to running in a clustered HA architecture. In that architecture, you have two or more running nodes that can run Matillion jobs, and if one node fails, another node can take over its job workload. That particular message comes from this behavior: Matillion is checking whether any jobs were running on a node but failed because of an issue with that node. If it found any, it would then trigger those failed jobs to run on an available node. So this is normal Matillion behavior, and the high volume of these messages is a consequence of making Matillion's HA architecture as resilient as possible. I hope that clears up any confusion.
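The log line comes from Quartz's JDBC job store (`org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterRecover`), whose clustering generally works like this: each node periodically writes a check-in timestamp to a shared table, and a node whose last check-in looks stale is treated as failed so a surviving node can pick up its in-progress jobs. The Java sketch below models that recovery pass in simplified form. The class and method names, the 20-second check-in interval, and the 7.5-second grace period are illustrative assumptions, not Matillion's or Quartz's actual code or settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a clustered scheduler's recovery pass.
// Not Matillion or Quartz source code; names and constants are illustrative.
public class ClusterRecoverySketch {

    // Assumed check-in interval and grace period, in milliseconds.
    static final long CHECKIN_INTERVAL_MS = 20_000L;
    static final long GRACE_MS = 7_500L;

    // A peer counts as failed when its last check-in is older than one
    // interval plus the grace period, measured against this node's clock.
    static boolean isFailed(long lastCheckinMs, long nowMs) {
        return nowMs > lastCheckinMs + CHECKIN_INTERVAL_MS + GRACE_MS;
    }

    // Returns the ids of the jobs this node would take over from failed peers.
    static List<String> recover(String self,
                                Map<String, Long> lastCheckins,
                                Map<String, List<String>> inProgressJobs,
                                long nowMs) {
        List<String> recovered = new ArrayList<>();
        for (Map.Entry<String, Long> peer : lastCheckins.entrySet()) {
            String node = peer.getKey();
            if (node.equals(self) || !isFailed(peer.getValue(), nowMs)) {
                continue; // ourselves or a healthy peer: nothing to recover
            }
            // Counterpart of the "Scanning for instance ...'s failed
            // in-progress jobs" INFO line seen in catalina.out.
            System.out.println("Scanning for instance " + node
                    + "'s failed in-progress jobs.");
            recovered.addAll(inProgressJobs.getOrDefault(node, List.of()));
        }
        return recovered;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        Map<String, Long> checkins = Map.of(
                "node-a", now - 5_000L,   // checked in 5 s ago: healthy
                "node-b", now - 60_000L); // silent for 60 s: treated as failed
        Map<String, List<String>> jobs = Map.of(
                "node-a", List.of("job-1"),
                "node-b", List.of("job-2", "job-3"));
        System.out.println(recover("node-a", checkins, jobs, now));
    }
}
```

In a healthy cluster this pass finds nothing stale and logs nothing; the INFO line only appears when a peer's check-in is judged too old.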

Frederick.Wright · March 15, 2022, 9:43pm

Thanks @Bryan - I have temporarily stopped the messages by idling the other node. I'll check again after we complete the upgrade to v1.61.6, since the dependency on NTP may be interfering: the ntp package (ntpd) is no longer available on RHEL 8.x, where chrony replaces it. So I suspect there is a synchronization issue, and fingers crossed it will be resolved after the upgrade!
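The time-sync suspicion is plausible for a Quartz-based cluster, because each node stamps its own check-in using its own clock while the peer judges staleness using a different clock; Quartz's clustering documentation warns that clustered machines' clocks must be kept synchronized by a time-sync service. The Java sketch below shows how clock skew alone could make a healthy node look failed. The staleness rule is a simplification, and the interval, grace period, and skew values are assumptions chosen for illustration; Matillion's actual settings may differ.

```java
// Hypothetical illustration of clock skew between two healthy cluster nodes.
// Not Matillion or Quartz source code; constants are assumed values.
public class ClockSkewSketch {

    static final long CHECKIN_INTERVAL_MS = 20_000L; // assumed
    static final long GRACE_MS = 7_500L;             // assumed

    // Simplified staleness rule: compare the peer's recorded check-in
    // timestamp with the observing node's own clock.
    static boolean looksFailed(long recordedCheckinMs, long observerNowMs) {
        return observerNowMs > recordedCheckinMs + CHECKIN_INTERVAL_MS + GRACE_MS;
    }

    // Node A checks in at true time trueCheckinMs, stamping it with its own
    // clock (offset skewAMs). Node B evaluates that record at true time
    // trueNowMs using its own clock (offset skewBMs).
    static boolean bFlagsA(long trueCheckinMs, long trueNowMs,
                           long skewAMs, long skewBMs) {
        long recorded = trueCheckinMs + skewAMs;
        long observed = trueNowMs + skewBMs;
        return looksFailed(recorded, observed);
    }

    public static void main(String[] args) {
        long checkin = 0L;
        // B evaluates one full interval later, just before A's next check-in.
        long oneIntervalLater = CHECKIN_INTERVAL_MS;
        // Clocks in sync: A is correctly seen as alive.
        System.out.println(bFlagsA(checkin, oneIntervalLater, 0L, 0L));
        // B's clock 10 s ahead of A's: the healthy A is flagged as failed.
        System.out.println(bFlagsA(checkin, oneIntervalLater, 0L, 10_000L));
    }
}
```

In this simplified model, once the observer's clock runs ahead of its peer by more than the grace period, a healthy peer is flagged late in every check-in cycle, which would produce a steady stream of recovery scans. On RHEL 8, running `chronyc tracking` on both nodes is a quick way to confirm that chronyd is synchronized.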

Related topics

Topic	Category	Replies	Views	Activity
Communications Link Failure: The last packet sent successfully to the server was 0 milliseconds ago	Matillion ETL	2	0	October 19, 2021
After upgrading to Matillion version 1.72.7, a lot of our schedules are failing on the initial step itself because it spins for a long time and it fails with a generic error that Network adapter could not establish the connection	Matillion ETL	3	2	March 20, 2024
Prevent duplicate job option in job schedule is checked but it is not working	Matillion ETL	2	0	May 17, 2021
I am receiving errors when trying to update certain dependencies on our Matillion instance on RHEL 8.2. I have put the exact error message in the "details" section below. Has anyone else encountered this error?	Matillion ETL	2	1	October 22, 2021
I set up the new Matillion Error Handling feature and thought that it will alert for errors layers deep in your jobs. However, when one of my components fails in a transformation job within my orchestration job, the Component Message does not come through	Matillion ETL	8	6	December 1, 2022