How to use GCP and Python in Matillion

Hello everyone,

I need to work on a transformation process that involves the following steps:

1. Connect to a table in GCP and load the data from that table into a DataFrame variable named "df".

2. Apply transformations to df (in a Python script), then use the transformed data to fill a new table in GCP.


It would help me a lot if someone could share a similar project that I could use as a guide.

Hi there David,


If your input data is a BigQuery table that is already set up in Matillion, this sounds like a simple transformation task. If it is a table in a different database, or a file hosted in GCP, you may first need to use a load component to land the data in your data warehouse before you can work with it in Matillion.
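If the source is already a BigQuery table and the logic really does need to stay in Python, the read/transform/write flow from the question might look roughly like the sketch below. It assumes the google-cloud-bigquery library (plus pandas, db-dtypes and pyarrow) is available to the Python Script component, which may not be true on every Matillion instance. The helper names read_table, write_table and run_pipeline are hypothetical, and the client is passed in rather than created so the flow is easy to follow.

```python
from typing import Any, Callable


def read_table(client: Any, source_table: str) -> Any:
    """Step 1: query a BigQuery table and return the rows as a pandas DataFrame.

    `client` is assumed to be a google.cloud.bigquery.Client; to_dataframe()
    needs pandas and db-dtypes installed alongside the client library.
    """
    return client.query(f"SELECT * FROM `{source_table}`").to_dataframe()


def write_table(client: Any, df: Any, dest_table: str) -> None:
    """Load a DataFrame into a BigQuery table and wait for the load job.

    With default load settings the table is created if it does not exist,
    and rows are appended if it does. Requires pyarrow.
    """
    job = client.load_table_from_dataframe(df, dest_table)
    job.result()  # blocks until the load finishes; raises on failure


def run_pipeline(
    client: Any,
    source_table: str,
    dest_table: str,
    transform: Callable[[Any], Any],
) -> Any:
    """Read the source into df, transform it, then write it to the new table."""
    df = read_table(client, source_table)
    df = transform(df)  # step 2: your own Python transformation
    write_table(client, df, dest_table)
    return df
```

A call might look like run_pipeline(bigquery.Client(), "my-project.my_dataset.source", "my-project.my_dataset.target", my_transform), where the table IDs and my_transform (a function that takes and returns a DataFrame) are placeholders. Note that this pulls the whole table into the instance's memory.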


If your transformations are not especially complex (as an ML model would be), you would probably be better served by refactoring them into components in a transformation job. If it is an ML model, or another complex piece of business logic that can't be translated into SQL or components, you may want to rethink the structure of your process. ETL tools, Matillion included, are not really meant to handle heavy ML workloads on their own: Python scripts run on the Matillion instance itself, which will struggle with data at the scale held in the warehouse. Instead, you can use Matillion to trigger and orchestrate work that runs inside BigQuery, or in another tool built for compute-intensive workloads.
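When the logic can be expressed in SQL, one way to have Matillion trigger the work while BigQuery does the computation is to submit a statement instead of pulling rows into a DataFrame. This is a sketch assuming a google.cloud.bigquery.Client; the amount column and the 1.2 multiplier are hypothetical stand-ins for your own logic, and the same SELECT could just as well live in a transformation job's components.

```python
from typing import Any


def build_transform_sql(source_table: str, dest_table: str) -> str:
    """Return a CREATE TABLE ... AS SELECT statement that transforms inside BigQuery.

    The `amount` column and the 1.2 multiplier are hypothetical placeholders
    for whatever the Python script would otherwise do to the DataFrame.
    """
    return (
        f"CREATE OR REPLACE TABLE `{dest_table}` AS\n"
        "SELECT *, amount * 1.2 AS amount_with_tax\n"
        f"FROM `{source_table}`\n"
        "WHERE amount IS NOT NULL"
    )


def run_in_bigquery(client: Any, sql: str) -> Any:
    """Submit the statement and wait; the data never leaves BigQuery."""
    job = client.query(sql)
    job.result()  # waits for completion and raises if the query failed
    return job
```

Because only the SQL text travels to BigQuery, the size of the table no longer depends on the memory of the Matillion instance.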


Best of luck! Let the community know if you solve your issue, and how you solved it.


Brendan