Does anyone else find Shared Jobs clunky and hard to manage?

The concept of job reuse makes perfect sense, but the ability to understand which version was used where, and how it can be updated everywhere, seems non-existent. Am I missing something?


I would be interested to know from others what works well with Shared Jobs for them and what doesn't. Maybe it's just us struggling to make a go of it. We have fallen back to creating small reusable orchestrations and reusing them throughout our projects. This presents issues when trying to share across multiple projects, though, which is why we wanted to use Shared Jobs.


Hi Bryan,

We have many shared jobs in use across our projects, and this works quite well when you want to reuse some business logic and call it with different input parameters.

I agree with you that the versioning feature of shared jobs is sometimes complicated. However, in the end it makes sense. Example: Imagine you update an existing shared job that is currently on v5 and deploy a new version, v6. All existing uses of the shared job in your orchestration jobs will remain on v5, so you can't break anything with the new version of the shared job. If you want to use v6 in your orchestration jobs, you have to delete the v5 components and add the v6 ones (and define all parameters from scratch).

However, there is also an option to replace a version of a shared job. If you deploy changes to a shared job that is on version v5 as above, just edit the version number in the shared job wizard and set it from v6 back to v5. Then all existing occurrences of this shared job are replaced with the new version (which still says it is v5 but includes your new changes).
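The pinning and "replace" behaviour described above can be pictured with a tiny model. This is a hypothetical illustration, not Matillion internals; all names and structures here are invented:

```python
# Illustrative model of shared-job version pinning (hypothetical, not
# Matillion internals): each orchestration reference is pinned to a version.
shared_job_versions = {"load_customers": {5: "logic v5"}}
orchestration_refs = {"nightly_orch": ("load_customers", 5)}  # pinned to v5

# Deploying v6 adds a new version; existing references stay on v5.
shared_job_versions["load_customers"][6] = "logic v6"
job, ver = orchestration_refs["nightly_orch"]
assert shared_job_versions[job][ver] == "logic v5"  # nothing broke

# The "replace" workaround: redeploy the changes under the same number (v5),
# so every existing reference silently picks up the new logic.
shared_job_versions["load_customers"][5] = "new logic, deployed as v5"
print(shared_job_versions[job][ver])  # -> new logic, deployed as v5
```

The trade-off the thread discusses follows directly from this model: the replace route updates every reference at once, which is convenient or dangerous depending on whether you wanted all of them to move.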

Hope this helps,

Michael

Hi @Bryan​, @Michael​,

We want to make some improvements to this feature in the near future. Would you mind if I reached out to you to have a chat about how we can better solve your problems?

Thanks

Paul

Thanks for the insight Michael. It's always good to hear other people's perspective on usage. Between the info you have provided on your experience/usage and the info our Matillion rep has given us, we are comfortable giving this another shot. Knowing what I know now, I am sure we will find it quite beneficial. Thanks again!

@Michael​ @Bryan​ The described workaround "migrates" all references to the new version, which is exactly what you don't want to do in some cases (as @Michael​ explained). It would be helpful to have a means to find all impacted jobs and migrate each to a specific version by choice.

Hi Paul, please feel free to reach out. Kait Zemitis is our Matillion rep and knows how to get a hold of me. Thanks and I look forward to hearing from you!

@Bodo​ ,

I agree that a list of all impacted jobs could be helpful. Matillion still lacks a feature to find all referenced jobs, so I think we will have to wait for this feature for shared jobs as well. Nevertheless, a selective upgrade option would be valuable for us too, especially as we have Dev and Prod projects on the same instance, and it is always tricky to handle if you change a shared job in Dev but don't want to deploy it to Prod right away.

This feature request is closely related to "Show job references / where-used list".


Hi @Bodo​, we solved the issue of needing to find where a shared job is being used with an outside process. We have a process that offloads all groups, which includes jobs, to a Snowflake table. We have views created around that data. One of them allows you to search for a component, or in this case a shared job, across all other projects and jobs. It's handy for the situation you described, where you don't necessarily know where the shared job is being used.

@Bryan​ I've just created a feature request based on this, as the problem for us is even worse. Here

@Michael​'s workaround is a good idea, but it doesn't allow for a proper deployment.

@PaulJohnson-DirectorofProduct​ Would gladly participate in the discussion, as the current behaviour is not acceptable for our situation, especially as we are forced to use shared jobs to allow for parallel execution. See here

@Bryan​ Excellent! So you are already halfway to the solution for the mentioned Feature Request :) I think we will need to do the same. This job is based on the JSON you get from the API, right?

@Bodo​, you are correct. We are just using the Matillion API. There are two different approaches to this, and knowing what I know now, I would suggest the method we did not initially implement: use the API profile to pull the info into the exact fields you want. Because we were in a hurry, we just dumped the entire JSON export into a variant column in a table in Snowflake. This is a quick and dirty way of doing it, but there are caveats. If Matillion ever changes the schema of the export JSON, you will end up with mixed data and will then have to write SQL in your views to handle the different sets of JSON. If you use the API profile the way it was designed, you simply make the change to the API profile, keep the same set of fields being exported, and don't have to worry about changing the views in Snowflake. I hope this makes sense.
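The "search the export for shared-job usage" idea can be sketched in a few lines. This is only an illustration: the field names and JSON shape below are invented, and the real Matillion export schema is richer, but the walk-the-tree logic is the same whether you do it in SQL views over a variant column or in code:

```python
# Hypothetical miniature of an exported job tree (field names are assumptions
# for illustration; the real Matillion export schema differs).
export = {
    "projects": [
        {"name": "ProjA", "jobs": [
            {"name": "orch_daily", "components": [
                {"type": "shared-job", "sharedJobName": "load_customers", "revision": 5},
            ]},
        ]},
        {"name": "ProjB", "jobs": [
            {"name": "orch_weekly", "components": [
                {"type": "sql", "name": "run query"},
            ]},
        ]},
    ]
}

def find_shared_job_uses(export, shared_job_name):
    """Return (project, job, revision) tuples where the shared job is referenced."""
    hits = []
    for project in export["projects"]:
        for job in project["jobs"]:
            for comp in job.get("components", []):
                if comp.get("sharedJobName") == shared_job_name:
                    hits.append((project["name"], job["name"], comp.get("revision")))
    return hits

print(find_shared_job_uses(export, "load_customers"))
# -> [('ProjA', 'orch_daily', 5)]
```

Including the revision in each hit is what enables the selective-upgrade workflow discussed earlier: you can see not just where a shared job is used, but which pinned version each job is on.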

We go a bit further and offload schedules and statuses for job runs to solve another issue: knowing the status of all scheduled jobs no matter what project they live in. We then built a Power BI dashboard over this data, and we can now watch failed vs. successful job runs over time. If we see failures, we can drill in to see the exact project --> job --> component that failed, along with any messages. It makes a great one-stop solution for visibility across all jobs.

We run our orchestration that offloads this data once a night. This gives us a nightly snapshot of all projects and jobs and history of those jobs.

@Bryan​ Thanks for the explanation! In terms of run history we do the same, based on BigQuery and Google Sheets. But for development support we don't yet have the JSON (which we already export and version in Git) in the database as well. That would enable us to do exactly what you described. I tried to use the API Query with the existing Matillion API profile for the run stats (simple parent-child tables for the summary and details), but I was not able to define an XPath expression referencing the JSON parentID (summary) from the child group (details). So I ended up doing it in Python, where it was not a problem. I will give it a new try!

@Bodo​, this may be a hidden gem for you, as it's not documented anywhere that I could find. This is an API profile that I put together. It should tie all the history together in a way that is pretty meaningful. Let me know if there is something I missed, as I just started working on this new format this morning. I have tested it, and I know it works for our 1.44.11 version of Matillion.

<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <api:info title="Baseline" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
    <attr name="executionid" xs:type="string" other:xPath="/id" />
    <attr name="jobtype" xs:type="string" other:xPath="/type" />
    <attr name="customerid" xs:type="string" other:xPath="/customerID" />
    <attr name="groupname" xs:type="string" other:xPath="/groupName" />
    <attr name="projectid" xs:type="string" other:xPath="/projectID" />
    <attr name="projectname" xs:type="string" other:xPath="/projectName" />
    <attr name="versionid" xs:type="string" other:xPath="/versionID" />
    <attr name="versionname" xs:type="string" other:xPath="/versionName" />
    <attr name="jobid" xs:type="string" other:xPath="/jobID" />
    <attr name="jobname" xs:type="string" other:xPath="/jobName" />
    <attr name="environmentid" xs:type="string" other:xPath="/environmentID" />
    <attr name="environmentname" xs:type="string" other:xPath="/environmentName" />
    <attr name="state" xs:type="string" other:xPath="/state" />
    <attr name="enqueuedtime" xs:type="string" other:xPath="/enqueuedTime" />
    <attr name="starttime" xs:type="string" other:xPath="/startTime" />
    <attr name="endtime" xs:type="string" other:xPath="/endTime" />
    <attr name="message" xs:type="string" other:xPath="/message" />
    <attr name="rowcount" xs:type="string" other:xPath="/rowCount" />
    <attr name="taskid" xs:type="string" other:xPath="/tasks/taskID" />
    <attr name="taskparentid" xs:type="string" other:xPath="/tasks/parentID" />
    <attr name="tasktype" xs:type="string" other:xPath="/tasks/type" />
    <attr name="taskjobID" xs:type="string" other:xPath="/tasks/jobID" />
    <attr name="taskjobname" xs:type="string" other:xPath="/tasks/jobName" />
    <attr name="taskjobrevision" xs:type="string" other:xPath="/tasks/jobRevision" />
    <attr name="taskjobtimestamp" xs:type="string" other:xPath="/tasks/jobTimestamp" />
    <attr name="taskcomponentid" xs:type="string" other:xPath="/tasks/componentID" />
    <attr name="taskcomponentname" xs:type="string" other:xPath="/tasks/componentName" />
    <attr name="taskstate" xs:type="string" other:xPath="/tasks/state" />
    <attr name="taskrowcount" xs:type="string" other:xPath="/tasks/rowCount" />
    <attr name="taskstarttime" xs:type="string" other:xPath="/tasks/startTime" />
    <attr name="taskendtime" xs:type="string" other:xPath="/tasks/endTime" />
    <attr name="taskmessage" xs:type="string" other:xPath="/tasks/message" />
    <!-- Below are the expected parameters to be passed in: a project group and project name, plus a start/end date-time range. -->
    <input name="groupname" xs:type="string" />
    <input name="projectname" xs:type="string" />
    <input name="startdate" xs:type="string" />
    <input name="starttime" xs:type="string" />
    <input name="enddate" xs:type="string" />
    <input name="endtime" xs:type="string" />
  </api:info>
  <api:set attr="JSONPath" value="/" />
  <api:set attr="BackwardsCompatibilityMode" value="false" />
  <api:set attr="DataModel" value="DOCUMENT" />
  <api:set attr="RepeatElement" value="/tasks" />
  <api:set attr="URI" value="http://127.0.0.1:8080/rest/v1/group/name/[_input.groupname]/project/name/[_input.projectname]/task/filter/by/start/range/date/[_input.startdate]/time/[_input.starttime]/to/date/[_input.enddate]/time/[_input.endtime]" />
  <api:script method="GET">
    <api:call op="jsonproviderGet">
      <api:push/>
    </api:call>
  </api:script>
</api:script>
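To see what the `RepeatElement="/tasks"` setting in the profile is doing, here is a rough Python equivalent. The sample payload is invented (the real API response has more fields, as the profile's attribute list shows), but the flattening logic is the same: each entry under `tasks` becomes one row, with the run-level fields repeated onto every row:

```python
# Invented miniature of a task-history response (the real payload has the
# full field set listed in the API profile above).
run = {
    "id": "1001", "jobName": "orch_daily", "state": "SUCCESS",
    "tasks": [
        {"componentName": "Start", "state": "SUCCESS"},
        {"componentName": "Load", "state": "SUCCESS"},
    ],
}

def flatten_run(run):
    """One flat row per task, with the parent run's fields repeated on each."""
    parent = {k: v for k, v in run.items() if k != "tasks"}
    rows = []
    for task in run["tasks"]:
        row = dict(parent)
        # Prefix task fields, mirroring the profile's "task..." column names.
        row.update({"task" + k.lower(): v for k, v in task.items()})
        rows.append(row)
    return rows

for row in flatten_run(run):
    print(row)
```

This parent-fields-repeated-per-child shape is exactly what lets the summary and detail land in one table instead of the parent-child pair Bodo described having to join by hand.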

@Bryan​, I think the definition of the repeating element was the missing part here, if I remember correctly. Thanks for the example! This will help me use the API Query / Profile better!

@Bodo (Customer)​, I am glad that might help. The other info I will leave you with is that the API profiles support repeating elements as you get into arrays within arrays within arrays, etc. The newly updated documentation now includes this: https://documentation.matillion.com/docs/2836045