I created one orchestration job for the entire process. It gets the processing date and triggers all 4 dim loads in parallel (transformation jobs). Once the dim loads are done, the Fact load is triggered (also a transformation job).
If the Fact load or one of the dim loads fails for some reason, the entire orchestration job goes to a Failed state. When I rerun the job, it should process only the failed loads.
E.g. if the Fact load failed, then on rerun the dim loads should be skipped and only the Fact load should be triggered.
Or, if one of the dims failed, only the failed dim load should be triggered, and then the Fact load should start.
How can I achieve this design in Matillion?
Does Matillion have checkpointing to skip the processes that have already completed?
There's no concept of restartability with Matillion jobs. If you re-run a failed job it will just run the whole thing again from the start. I think that's why they have this section on idempotence in the HA design document. The idea is that you design your jobs in such a way that it doesn't matter if they get run multiple times.
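To make the idempotence idea concrete, here is a minimal sketch in plain Python, using the standard-library sqlite3 module as a stand-in for the warehouse. The table `fact_sales`, its columns, and the function `load_fact_for_date` are hypothetical names invented for illustration; this is not a Matillion component or API. The load replaces the whole slice for a processing date, so running it twice leaves the same rows as running it once.

```python
import sqlite3

def load_fact_for_date(conn, processing_date, rows):
    """Idempotent load: replace the whole slice for processing_date."""
    # One transaction: the delete and the inserts commit (or roll back) together.
    with conn:
        conn.execute(
            "DELETE FROM fact_sales WHERE load_date = ?", (processing_date,)
        )
        conn.executemany(
            "INSERT INTO fact_sales (load_date, order_id, amount) VALUES (?, ?, ?)",
            [(processing_date, order_id, amount) for order_id, amount in rows],
        )

# Hypothetical demo data, in-memory database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (load_date TEXT, order_id INTEGER, amount REAL)"
)
rows = [(1, 10.0), (2, 25.5)]

load_fact_for_date(conn, "2024-01-01", rows)
load_fact_for_date(conn, "2024-01-01", rows)  # simulated rerun of the same date

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(row_count)  # the rerun did not duplicate rows
```

With a load shaped like this, rerunning the whole job from the start costs time but does not corrupt the data.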
You can restart a job manually from a chosen component, but that does not let you restart multiple threads of activity, or restart from partway through an iterator. If you do that, there's no easy way to reset the variables to how they were at the point of failure, although this Exchange Job might be helpful.
What you are describing is very typical and definitely doable. There are a couple of ways of tackling this issue, but I will describe the most common one I see, which is also the one we use. We created a config table in Snowflake which we leverage to track state. It's basically an EAV (Entity, Attribute, Value) model. This table can be used to track the state of a whole job, a task, a set of tasks, or all of the above, but the idea is that you use this table to track state.
So, let's apply this to your scenario. You have 4 dim loads running in parallel. For each one of those 4 loads you would have an entry in the config table that defines the schema, table, job, and timestamp. The first step in loading a dim would be to update the corresponding record in the config table and remove the timestamp. This is the indicator that the dim is in a loading status. When the dim has completed, the last step is to set the timestamp value to the current timestamp. This does 2 things for you: it tells you when the last load completed and whether it loaded successfully. If a load fails partway through, its timestamp simply stays empty.
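The start/complete bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 in place of Snowflake, and the table `load_state`, its columns, the job names, and the helper functions are all hypothetical. In Matillion the two updates would more likely be SQL steps at the start and end of each load rather than Python functions.

```python
import sqlite3
from datetime import datetime, timezone

def mark_started(conn, job_name):
    # Clearing the timestamp flags the load as "in progress / not yet complete".
    with conn:
        conn.execute(
            "UPDATE load_state SET completed_at = NULL WHERE job_name = ?",
            (job_name,),
        )

def mark_completed(conn, job_name):
    # Setting the timestamp records both success and the completion time.
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "UPDATE load_state SET completed_at = ? WHERE job_name = ?",
            (now, job_name),
        )

def run_load(conn, job_name, load_fn):
    mark_started(conn, job_name)
    load_fn()  # if this raises, completed_at stays NULL
    mark_completed(conn, job_name)

# Hypothetical config table: one row per load.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE load_state ("
    "schema_name TEXT, table_name TEXT, job_name TEXT PRIMARY KEY, completed_at TEXT)"
)
conn.executemany(
    "INSERT INTO load_state VALUES (?, ?, ?, NULL)",
    [
        ("DW", "DIM_CUSTOMER", "LOAD_DIM_CUSTOMER"),
        ("DW", "DIM_PRODUCT", "LOAD_DIM_PRODUCT"),
    ],
)
conn.commit()

def failing_load():
    raise RuntimeError("simulated failure")

run_load(conn, "LOAD_DIM_CUSTOMER", lambda: None)  # succeeds
try:
    run_load(conn, "LOAD_DIM_PRODUCT", failing_load)  # fails partway
except RuntimeError:
    pass

# Loads with no timestamp are the ones that still need to run.
pending = [
    row[0]
    for row in conn.execute(
        "SELECT job_name FROM load_state WHERE completed_at IS NULL ORDER BY job_name"
    )
]
print(pending)
```

After the simulated failure, only the failed load is left without a timestamp, which is exactly the information a rerun needs.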
Now that you have a state tracking mechanism for your individual loads, you would put some logic at the beginning of your job so that it only processes the Dim or Fact tables that need it. A table needs processing if its timestamp is not set, or if it meets some other rule, for example the timestamp is set but it is now the next day or the next window for the job run, so the load should run anyway. I hope this helps paint a picture of how to tackle this. Let me know if this isn't clear enough and I can probably offer some more info. Thanks!
It's really all about tracking state of each load and determining what to do about it.
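As a rough sketch of that decision logic, the function below takes the tracked state (job name mapped to its completion timestamp, or None if unset) and the processing date, and returns which dim loads to run and whether the fact load should run. The job names and the functions `needs_run` and `plan_rerun` are hypothetical, and the rule "a timestamp older than the processing date means run again" is just one possible assumption for the "next day or window" case. Rerunning the fact load whenever any dim is rerun is also an assumption.

```python
from datetime import date, datetime

# Hypothetical job names for the 4 dim loads and the fact load.
DIM_JOBS = ["DIM_CUSTOMER", "DIM_PRODUCT", "DIM_STORE", "DIM_DATE"]
FACT_JOB = "FACT_SALES"

def needs_run(completed_at, processing_date):
    """A load needs to run if it never completed or completed before this window."""
    if completed_at is None:
        return True
    return completed_at.date() < processing_date

def plan_rerun(state, processing_date):
    """Return (dim jobs to run, whether to run the fact load)."""
    dims = [j for j in DIM_JOBS if needs_run(state.get(j), processing_date)]
    # Assumption: if any dim is reloaded, the fact load runs after it.
    run_fact = bool(dims) or needs_run(state.get(FACT_JOB), processing_date)
    return dims, run_fact

today = date(2024, 1, 2)
finished = datetime(2024, 1, 2, 3, 0)

# Case 1: one dim failed, so the fact load never ran.
state_dim_failed = {
    "DIM_CUSTOMER": finished,
    "DIM_PRODUCT": None,
    "DIM_STORE": finished,
    "DIM_DATE": finished,
    "FACT_SALES": None,
}
# Case 2: all dims completed, only the fact load failed.
state_fact_failed = {**state_dim_failed, "DIM_PRODUCT": finished}

plan_dim_failed = plan_rerun(state_dim_failed, today)
plan_fact_failed = plan_rerun(state_fact_failed, today)
print(plan_dim_failed)
print(plan_fact_failed)
```

In the first case only the failed dim and the fact load are selected; in the second case all dims are skipped and only the fact load runs, which matches the two rerun scenarios in the question.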
Thanks Bryan for the detailed explanation. I understand that Matillion doesn't have built-in functionality for this and the user has to build it based on their use case. We need to maintain a JOB_LOG table to track the loads: if a load is already done, skip it, otherwise trigger it.
You are correct about Matillion not having that level of logging functionality. In all honesty, doing it yourself outside of Matillion helps in a handful of ways. It allows you to design a system that works best for you, your company, and the situation. And if you have ever been in a position where the company decides to take a different approach that requires new tooling, you will know it's easier to pivot to a new toolset if you have not built yourself into a long-term dependency on one tool.