Community update 2021-04

Hi all! It's Ian from Matillion's Customer Success team, with a updated snapshot of the most popular Community tags in the last six weeks.

 

 

Matillion ETL itself, “VM”, “Jobs”, “SQL”, “Run” and “Scripts” all tie in with a new community sourced JDBC driver for Matillion ETL itself which we have just added to the Matillion Exchange.

 

It’s a standard JDBC type 4 driver, which you can install by following this guide.

 

The JSON fragment that you’ll need for /usr/share/emerald/WEB-INF/classes/jdbc-providers.properties looks like this

 

{

  “name”: “METL”,

  “driver”: “com.thereisnogravity.Driver”,

  “url”: “jdbc:metl://”,

  “fetchSize”: “500”,

  “limit”: “limit-inline”,

  “allowUpload”: true

}

 

It’s also available for download on the Exchange.

 

Once installed

  • Create an ordinary Database Query component using the driver
  • Provide the username and password of a user that’s privileged to use the Matillion API

 

You will find various built-in data sources available, including

  • group, project, version and job - standard Matillion metadata
  • schedule and environment - covering all projects that the user is permitted to access
  • runningjob - showing a snapshot of all running jobs in all projects
  • joblaunchstats - gives you insight into any latency issues which are preventing your jobs from starting running immediately after they are launched (for example trying to run too many jobs at once, or trying to run the same named job twice in parallel). The statistics will cycle once per week, so you should aim to capture them to a more permanent table
  • instance - giving you basic metrics on your VM, such as CPU, network, timezone etc

 

The driver will work better on more recent versions of Matillion, and I’d recommend sticking to Basic mode rather than using the SQL editor.

 

You’re free to use this however it’s useful, but one suggested Orchestration Job structure would be like this:

 

 

More posts to follow, but meanwhile your comments, feedback and feature requests are welcome as always.

 

Best regards,

Ian

 

Hi Ian

 

The JDBC API is great!. Any plans for enabling component level information?

Hi User1593422479064255852,

 

Glad it has been useful!

 

That's an interesting idea. You're right: you can fetch all the job names easily, but there's nothing at component level currently.

 

Did you have a use case in mind for component-level information? Presumably something that's not already covered in the web user interface... maybe change tracking?

 

Hello,

i noticed this new driver and try to use this directly without any problems, an additional good way to receive informations from our instances/projects.

I directly created a monitoring based on this driver, but than i see that the informations regarding memory in the table "instance", this is for me not really helpful because my monitoring told me that the memory is always growing to 100% after a restart of the instance. In real that not fit with the internal informations from our instance. Maybe there is a possiblity to collect better meaningful memory informations and publish this in the column "memoryused". All other informations seams to be ok from my side.

Hi Ian

Its nothing urgent really… I wrote an ETL a short time ago take a Customers Matillion JSON project and extract/flatten the JSONs components into a table. The purpose was to help us speed up estimation for work, as the customer wanted to migrate his code from Redshift to Snowflake. With the components flattened, we could filter the volume/ type of components that might cause migration a problem an apply a time weighting quicker.

I was just thinking a component level JDBC API would have made my ETL much easier to develop. However, if there is no customer demand then not worth bothering with.

cheers

Phil