Is it possible to launch a job in a project from another project?

Hey everyone,

 

I build my projects with one orchestration job that then runs everything. It is this job that is scheduled to run every night.

 

Now I'm gonna build another project that will be dependent and will have to be executed once the first project execution is complete.

 

How can I do that? Can I launch a job in project B from a job in project A?

 

If it helps, I'm in Matillion for Snowflake in Azure.

 

Thanks.

 

JFS.

 

 

Hello,

you can

  • (if you have an SQS configuration enabled) use the "SQS Message" component as the last component in your job to send a message to the queue that Matillion is listening on
  • Use the Matillion API to trigger a separate job in another project, for example via a Python script

Hi JFS,

 

It certainly is, yes. There are two main ways:

 

  1. Using the Matillion REST API, for example from a Bash Script (example below)
  2. If you have enabled it, using the messaging listener, e.g. Azure Queue, SQS or Pub/Sub depending on your cloud provider

 

Inside a Bash Script you could run a command like this:

 

curl -k -X POST -u ian:ian "http://localhost:8080/rest/v1/group/name/Matillion/project/name/Other%20Project/version/name/default/job/name/The%20Optional%20Job/run?environmentName=Demo"

 

A few notes on this command:

 

  • The address to use is http://localhost:8080 to launch a job on the Matillion ETL instance you're logged into
  • If the Job names etc have spaces in them, you have to make them URL-safe, i.e. replacing spaces with %20
  • There is no way to access Project Group Passwords from a Bash Script, so I've hardcoded mine in the example above. You might prefer to use variable(s)
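
As a rough illustration of the last two notes, here is a minimal sketch that builds the run URL from its parts and takes the credentials from variables instead of hardcoding them. The helper names (urlencode_spaces, build_run_url) and the MTL_USER / MTL_PASSWORD variables are made up for this example, and only spaces are encoded, so names with other special characters would need more care.

```shell
#!/bin/sh
# Sketch only: helper names and the MTL_USER / MTL_PASSWORD variables are
# hypothetical, not part of Matillion.

# Replace spaces with %20 so a name is safe inside the URL path.
# Only spaces are handled; other reserved characters are left untouched.
urlencode_spaces() {
    printf '%s' "$1" | sed 's/ /%20/g'
}

# Build the run URL.
# Arguments: group, project, version, job, environment.
build_run_url() {
    printf 'http://localhost:8080/rest/v1/group/name/%s/project/name/%s/version/name/%s/job/name/%s/run?environmentName=%s' \
        "$(urlencode_spaces "$1")" "$(urlencode_spaces "$2")" \
        "$(urlencode_spaces "$3")" "$(urlencode_spaces "$4")" \
        "$(urlencode_spaces "$5")"
}

# Example call, commented out so the sketch has no side effects:
# curl -X POST -u "${MTL_USER}:${MTL_PASSWORD}" \
#     "$(build_run_url 'Matillion' 'Other Project' 'default' 'The Optional Job' 'Demo')"
```

With the names from the command above, build_run_url produces the same URL as the hardcoded example.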

 

When you successfully launch a job in a different Group, Project or Version to the one you're logged into you won't see anything happen in the UI. It only shows jobs running in the project you're logged into.

 

To get a single view of all the jobs running on your instance you can use the Matillion ETL JDBC driver, and choose the runningjob table. In the screenshot below you can see that there are jobs running in multiple different projects.

 

 

Best regards,

Ian

 

Hey Ian,

 

Thanks for the example, it helps a lot. But I'm having issues. Here's my command:

 

curl -X POST -u user:password "http://localhost:8080/rest/v1/group/name/Matillion/project/name/ProjectName/version/name/default/job/name/JobName/run?environmentName=EnvironmentName"

 

I think I followed your notes: there are no spaces in the names, and I'm trying to run a job on the same server I'm logged into.

 

In the log, though I get this:

13-Jul-2021 15:49:41.852 WARNING [https-openssl-apr-8443-exec-5] com.matillion.bi.emerald.server.modelmanager.JobReader.getTransformationJob User [jfs] failed to retrieve the Job with ID [769604] from the Version with ID [796] in the Project with ID [795] for the Project Group with ID [694]. Version [796] does not contain a Transformation Job with ID [769604].

 

The log refers to a Transformation job, but I'm trying to launch an Orchestration job... we can do that, can't we?

 

At the end, the message I get is "Failed connect to localhost:8080; Connection refused", yet the log shows that I do connect.

 

Or am I missing something in the path somewhere? Do I need to specify the path through the folders I organised my jobs in?

 

Any idea?

 

Thanks a lot,

 

JFS.

Hi @JFS​,

Lots going on there!

First of all, your curl command looks correct to me. I copied and pasted it, changed the names of the Project, Environment etc and ran it from an SSH session in my own Matillion instance. First with an Orchestration Job, and then afterwards with a Transformation Job:

The first one worked, and it returned the task history ID of the newly running job.

But trying to run a Transformation Job in that way is not permitted. You are only allowed to launch an Orchestration Job via the API. I think we may have changed the error message for that situation over time, so you might see something slightly different. The above screenshot is from Matillion ETL 1.54.

But in any case I’m not 100% certain why that restriction exists. Transformation Jobs are top-level objects, same as Orchestration Jobs, so it might be worthwhile adding that request to our Ideas Portal if it’s a big use case for you. (Same applies to Shared Jobs).

There’s no need to refer to folder structures or paths with the API. All job names are forced unique so you can’t have the same named job in two places in the folder structure.

Regarding the ‘failed to retrieve’ error, I believe that’s just background noise in the logfile. On most (all?) instances, if you run:

sudo grep "failed to retrieve the" /var/log/tomcat*/catalina.out

… it returns a lot of lines. I don’t know why those messages appear, but it does not seem to cause any problems.

The last thing is the ‘Connection refused’ error. Out of the box, Matillion listens on ports 8080 and 8443. Those get redirected to the standard HTTP/S ports by an iptables redirect so it looks to non-local users like it’s using the standard ports. You can see that by running a sudo iptables -L -n -t nat command from an SSH session.

But it is possible to switch the listeners around from their default positions, from the Admin / SSL menu.

So if you're trying port 8080 and getting a connection refused error, my guesses would be:

  • You're not running that curl command on the Matillion instance itself (use port 80 instead if you're not on the instance)
  • You have switched off the HTTP listener
  • The iptables redirects are missing

Hope that makes sense and is helpful!

Ian

Hey @ian.funnell

Thanks a lot for the help.

I had 2 issues in my curl command. First, when adapting your example, I did not replace "Matillion" with the name of the project group my project is in. Completely my bad here. The second issue was the SSL protocol. It was set to HTTPS (443). By selecting BOTH, it now works.

Anyways, thanks again for the help,

JFS.

Hi Ian,

 

Thank you for the very helpful post above. I need to run several jobs in sequence and would like the first job to end before the next one starts, because there are some dependencies between them.

 

Is there a way to wait for the first one to execute before the second one starts? I assume I can capture the job ID and then query it until it is shown as completed. How can I return the ID in Bash? Or have you developed a different approach?

 

Thank you,

Aristotle

 

Ian,

I know it's been a while since you posted this. Can you please share some more on how the "Running Jobs" component is set up? Is it the JDBC Incremental Load (which probably makes no sense)? We are hosting the data in Snowflake.

 

Thank you,

Aristotle

Hi @Aristotle​ and thanks for your patience while I looked into this!

There are two possible endpoints, although the first one is preferable:

http://localhost:8080/rest/v0/tasks

http://localhost:8080/rest/v1/group/name/<<yourgroup>>/project/name/<<yourproject>>/task/running
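
For the "wait for the first job before starting the second" case, one possible approach is to capture the task ID returned by the run call and then poll the second endpoint above until that ID no longer shows up. The sketch below is only that, a sketch: it assumes the run call returns JSON containing a numeric "id" field and that the running-tasks response lists each task with the same "id" field, so check both against your own instance. The function names and the MTL_USER / MTL_PASSWORD variables are invented for the example.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: the run response is JSON with a numeric "id",
# and the task/running response contains "id":<number> for each running task.
# Function names and MTL_USER / MTL_PASSWORD are hypothetical.

# Print the numeric "id" value found in a JSON response string.
extract_task_id() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
        | head -n 1
}

# Succeed (exit 0) if task id $1 appears in the running-tasks response $2.
task_is_running() {
    printf '%s\n' "$2" | grep -Eq "\"id\"[[:space:]]*:[[:space:]]*$1([^0-9]|\$)"
}

# Poll until the task is no longer listed as running.
# $1 = task id, $2 = running-tasks URL, $3 = seconds between polls.
# Note: an empty or failed response also ends the loop, and leaving the
# running list does not tell you whether the job succeeded or failed.
wait_for_task() {
    while task_is_running "$1" "$(curl -s -u "${MTL_USER}:${MTL_PASSWORD}" "$2")"; do
        sleep "$3"
    done
}

# Example usage, commented out so nothing runs by accident:
# response=$(curl -s -X POST -u "${MTL_USER}:${MTL_PASSWORD}" "$RUN_URL")
# wait_for_task "$(extract_task_id "$response")" "$RUNNING_URL" 30
```

Because the loop stops as soon as the ID is missing from the list, a job that has been queued but not yet started could be mistaken for a finished one, so a short delay before the first poll may be worth adding.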

We also have a blog that might be of interest to you that has some good information on the REST API. You can find it here.

Let me know if that helps at all!

Many thanks,

Claire

Thank you for your reply Claire. I was able to successfully implement those API calls.

 

I should have explained myself a little better, but I was referring to the following statement: "To get a single view of all the jobs running on your instance you can use the Matillion ETL JDBC driver, and choose the running job table." I was not able to follow the setup of the JDBC driver to do this.

 

BTW, the blog is very interesting. Thank you for the link.

 

Aristotle