Export data generated from Python to Snowflake

I have created a Python script that sends a GET request to a website, and I can get the data in JSON format. I would like to use Matillion to place the data into Snowflake. I have tried using a grid variable and exporting it to a table, but failed to do so. Any suggestions on how I should approach this? Thanks.

Where do you execute the Python script? On the Matillion VM?

 

Before you extract data with a Python script, I would recommend first evaluating whether Matillion components such as "Data Transfer" and "API Extract" (incl. an API profile) fit your use case.

 

If you execute the Python code using the "Python Script" component and select local storage as the storage path, your JSON file will be stored on the Matillion VM. As far as I know, there is no simple drag-and-drop component that copies data from the VM to Snowflake. You can certainly find the file path on the VM, but you would then have to connect to Snowflake from within the Python script. For that, you would first have to install the Snowflake Python Connector on the VM (https://docs.snowflake.com/en/user-guide/python-connector.html) and make sure that the credentials are not stored in the script (keyword: information security).
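If you do go that route, a minimal sketch could look like the following. It assumes a table with a single VARIANT column, uses the table's internal stage, and reads the connection parameters from environment variables rather than hard-coding them; the account, warehouse, and database names are placeholders.

```python
# Sketch: load a local JSON file from the Matillion VM into Snowflake
# via the Snowflake Python Connector (pip install snowflake-connector-python).
# All connection parameters below are placeholders / taken from the
# environment -- do not store credentials in the script itself.
import os


def build_copy_statement(table, stage_path):
    """Build the COPY INTO statement for a staged JSON file."""
    return (
        f"COPY INTO {table} "
        f"FROM {stage_path} "
        "FILE_FORMAT = (TYPE = 'JSON')"
    )


def load_json_to_snowflake(local_path, table):
    # Deferred import so the helper above is usable without the package.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="MY_WH",   # placeholder
        database="MY_DB",    # placeholder
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        # PUT uploads the local file into the table's internal stage (@%table)
        cur.execute(f"PUT file://{local_path} @%{table}")
        cur.execute(build_copy_statement(table, f"@%{table}"))
    finally:
        conn.close()
```

This uses the table stage so no separate named stage or external storage is needed, but it ties the data load to the script rather than to a Matillion component.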

 

Matillion is designed to work with cloud blob storage (Azure, AWS, or GCP) and a cloud data warehouse engine (e.g. Snowflake). Therefore, your focus should be on how to get the data into cloud blob storage.

From there, you could then use a corresponding Cloud Storage component (e.g. Azure Blob Storage).

 

API Extract component: API Extract | Matillion ETL Documentation

API Profile: Manage Query Profiles | Matillion ETL Documentation

Data Transfer component: Data Transfer Component (matillion.com)

Cloud Storage components: S3 Load Component (matillion.com), Azure Blob Storage Load (matillion.com)

 

Could you specify: "[...] a python script which will send a get request from a website"?

I assume you extract data from a website via the GET method? Do you use a REST API?

Yes, the script I wrote just retrieves data via a GET request. I will store it in JSON/CSV format and ultimately move it to Snowflake.

Hey, how have you loaded the data into the S3 bucket from the URL? Can you please share your code? I have the same requirement but am unable to do it. Please help.

Do you mean I should upload the queried data to an S3 bucket via Python, and then load the data from S3 into Snowflake?

If you are requesting a REST API via the GET method, I would use the Matillion API Extract component instead of the Python script. See https://documentation.matillion.com/docs/1959484

 

The advantage is that you can store the credentials in the API profile. The sensitive information would then be masked for others and you can conveniently access the credentials via the API Extract component.

 

If there is no other way and you have to rely on the Python script, you have several options. For Snowflake and AWS, I see two:

 

1. Install Boto3 (the AWS SDK for Python) on the Matillion VM. According to the documentation, the Python script then uses the AWS credentials from your Matillion setup. See https://documentation.matillion.com/docs/2234735

2. Run the Python script outside Matillion. See https://documentation.matillion.com/docs/2972828
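For option 1, the upload itself is short. A sketch, assuming Boto3 is installed on the VM and picks up the AWS credentials from the Matillion setup (so no keys appear in the script); the bucket name and key prefix are placeholders:

```python
# Sketch: upload a JSON payload from the Matillion VM to S3 with Boto3
# (pip install boto3). Credentials are resolved by Boto3 from the
# environment / instance profile, not stored in the script.
import json


def build_object_key(prefix, name):
    """Object key under a prefix, e.g. 'raw/api' + 'response.json'."""
    return f"{prefix.rstrip('/')}/{name}"


def upload_json(payload, bucket, key):
    # Deferred import so the key helper is usable without the package.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
        ContentType="application/json",
    )


# Hypothetical usage:
# upload_json({"a": 1}, "my-bucket", build_object_key("raw/api", "response.json"))
```

From there, the S3 Load component can pick the file up and load it into Snowflake.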

 

I have not yet been able to implement either option in practice (I work with Azure and Snowflake). Feel free to share your experiences here.

 

What advantages do you see with the Python script?

 

Thanks for the reply. I have a lot of variables for the URL parameters in a loop, and no credentials are needed for the GET request. Hence I'm using a Python script to get the data.

I will upload it to S3, store the file name in a grid variable, and then load the data from S3 into Snowflake.
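The looping part of that workflow could be sketched like this, using only the standard library. The endpoint and parameter sets are placeholders, and the returned file names are what you would write back into a Matillion grid variable before the S3 upload and load:

```python
# Sketch: loop over URL parameter sets with plain GET requests (no
# credentials), write each response to a JSON file, and collect the file
# names (e.g. for a Matillion grid variable). BASE_URL and PARAM_SETS
# are placeholders.
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://example.com/api/data"            # placeholder endpoint
PARAM_SETS = [{"region": "eu"}, {"region": "us"}]    # placeholder loop values


def build_url(base_url, params):
    """Append the query string for one set of URL parameters."""
    return f"{base_url}?{urlencode(params)}"


def fetch_all(base_url, param_sets):
    file_names = []
    for i, params in enumerate(param_sets):
        with urllib.request.urlopen(build_url(base_url, params)) as resp:
            payload = json.load(resp)
        name = f"response_{i}.json"
        with open(name, "w") as f:
            json.dump(payload, f)
        file_names.append(name)
    return file_names
```

Each file name can then feed the S3 upload step, and the grid variable lets the S3 Load component iterate over the same list.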