Hi all,
The number of jobs, schedules, and dependencies keeps growing.
How do you solve the issues with dependencies between different jobs?
With smaller projects and job plans it is possible to add all jobs into one orchestration job.
But with larger and more complex jobs this gets messy pretty fast.
I can imagine different solutions for this issue:
The first solution is to use logging tables and solve this issue using the database.
In this case the jobs need to poll the database to check whether the job they depend on has finished. Alternatively, every finished job triggers all jobs that depend on it, and those only actually run if all of their requirements are fulfilled.
This might get messy fast as well, because the dependencies are maintained inside a database table. And how would errors and delays be handled?
Another solution is to use e.g. Apache Airflow.
In this case Airflow needs to poll Matillion to monitor the job status.
Is anyone using Airflow with Matillion for larger projects? Is the polling an issue for Matillion? I have some trust issues with such an external solution in combination with the Matillion API.
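
For reference, the polling I have in mind would look roughly like the sketch below. It is deliberately independent of any real API: fetch_status is a hypothetical callable that would wrap whatever status call is available, and the status strings "SUCCESS" and "FAILED" are assumptions, not the actual values returned by Matillion.

```python
import time

# Assumed terminal states; the real status values may differ.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def wait_for_job(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a terminal state or time runs out.

    fetch_status is a hypothetical zero-argument callable returning a status
    string. sleep and clock are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)
```

With a poll interval of 30 seconds this is one status request per running job every half minute, which is the load I am unsure about.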
Our requirement is to easily add new dependencies and restart jobs when they fail.
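
As a rough illustration of that requirement, here is a sketch using Python's standard graphlib module (Python 3.9+). The dependencies are a plain dictionary mapping each job to the jobs it depends on, so adding a dependency is a one-line change. run_job is a hypothetical callable that returns True on success; the retry count and the result labels are my own choices.

```python
from graphlib import TopologicalSorter


def run_plan(dependencies, run_job, max_attempts=3):
    """Run jobs in dependency order, retrying failures.

    dependencies maps a job name to the set of jobs it depends on.
    run_job is a hypothetical callable: run_job(name) -> bool.
    Returns a dict of job name -> "success", "failed" or "skipped".
    """
    results = {}
    # static_order() yields every job after all of its upstream jobs.
    for job in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(job, ())
        if any(results[name] != "success" for name in upstream):
            results[job] = "skipped"  # an upstream job failed or was skipped
            continue
        for _attempt in range(max_attempts):
            if run_job(job):
                results[job] = "success"
                break
        else:
            results[job] = "failed"  # all attempts used up
    return results
```

This runs everything sequentially and in one process, so it only shows the bookkeeping, not a real scheduler.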
Am I missing any other solutions?
A fancier solution might be to have jobs running all the time and to rely on some form of eventual consistency. But this won't work everywhere.
Kind regards,
Nils