I am trying to get data from api which returns the data in csv format. How can I load this data using Matillion into snowflake since API query component cannot be used. What are the best possible options and solution to the option

I tried generating the access token while using python script and stored the output return in one variable. Now how should I push that into snowflake. Also, I tried importing libraries of snowflake while writing python script in matillion but it says no package found .

Excited to hear back for possible solution

Hi @AkashBansal

Seems you have stumped our awesome community on this one, but fear not we can ask our solution architects to look at this for you, if you would like me to raise a support ticket on your behalf please let me know and I can arrange that for you.

Thanks Joe

Hi @AkashBansal​, I will preface my answer with I have not done this personally but I have some thoughts on various ways to tackle this depending on your situation and skill level with different technologies. It sounds like the approach you have started with in regards to pulling data from an API using Python and storing it in a variable is a good start to one approach that comes to mind. Staying with the approach, instead of putting the contents into a single variable, you could create or use an existing grid variable, parse the CSV content and pump it into the grid variable. Once you have the GridVariable you can use a Transformation job to pump the data into a table in Snowflake. This is an example of what the transform job might look like:

Another option is to take the single variable with the contents of the CSV and create a file in an S3 Bucket (if you are on AWS) which can then be loaded using Matillion's S3 Load component.

Another option is to convert the CSV content to JSON using a method like what is described here: https://www.geeksforgeeks.org/convert-csv-to-json-using-python/. From there you could dump that JSON to a single column in a grid variable and use the transform job as described above to pump a single column of JSON content into a Variant column in Snowflake.

There are other methods but these suit the path you have taken thus far based on your original question.

I hope this helps in some way. Let us know if you have more questions.

I forgot to address the importing of python packages issue where Matillion was saying no package found. Matillion only comes with the core Python libraries along with a couple others. If you want libraries that are not currently on the Matillion instance, you will need to connect to the console of the Matillion instance using your favorite method. I prefer SSH using PuTTY. Once you are on the console you will need to install the libraries you wish to use.

 

Example of installing NumPy for Python2 scripts:

pip install numpy

 

Example of installing NumPy for Python3 scripts:

python3 -m pip install numpy

 

Since we build out our Matillion environments in AWS using CloudFormation templates, we include the install commands for our Python Libraries within the UserData section for the Matillion instance. That way when we launch a new Matillion instance, it will come with all the libraries already installed and ready to use.

 

I hope this helps answer the issue related to no package found.

Hey Bryan,

Thanks for the follow up . I somehow understand what to do but struggling with how to load the data into grid from python script. It would be great of you if you can share me an example. Really appreciate such a quick response from Matillion.

Stay Safe

Thanks Bryan and Joe for a quick turnaround. It worked!!!!

I was also curious to know how to generate an rsd file for Json data by passing Bearer token as authentication .It would be great if you show me an example .

Thanks again

Will do thanks bryan

Hi @AkashBansal​, here is a mini example I threw together. I hope this helps.

#Python 3 script below

import csv

# Define local variable to store CSV data as Python List

gridVarValues = []

# Create some CSV data to be used for the example

demoData = 'TestDataCol1_Row1,TestDataCol2_Row1,TestDataCol3_Row1\r\n'

demoData = demoData + 'TestDataCol1_Row2,TestDataCol2_Row2,TestDataCol3_Row2\r\n'

demoData = demoData + 'TestDataCol1_Row3,TestDataCol2_Row3,TestDataCol3_Row3'

# Create reader for demoData in order to iterate over CSV rows and columns

# Since each row is delimited by a carriage return then split it by the lines before reading

reader = csv.reader(demoData.splitlines())

# Iterate over each row in demoData and append the row to our Python List

for row in reader:

gridVarValues.append(row)

# If rows were appended then update the Matillion GridVariable using our Python list

if gridVarValues != []:

context.updateGridVariable('CSVdata',gridVarValues)

@AkashBansal​, for the sake of not burying this question and answer, can you post this question as a separate discussion. I believe there are a lot of people that would like to know that question and answer to this one as I believe it's been asked in the past.

Hi All ,

 

I have written a python code using which you can pull CSV data through API and load it to S3 bucket.

This can be used in python component in Matllion

 

import requests

import boto3

 

#API Endpoint URL

url = 'add you URL here'

 

#Authentication credentials

user='add your username'

password='add your password'

 

#create a session and add authentication credentials to the session

s3_bucket = 'add your S3 Bucket'

s3_path = 'add your S3 Path'

 

#Make API request to get CSV Data

session=requests.Session()

session.auth = (user,password)

 

response =session.get(url)

 

print(response.text)

 

#Load data to S3 Bucket

s3 = boto3.resource('s3')

object = s3.Object(s3_bucket, s3_path)

object.put(Body=response.text)

Per this email thread, I'm doing something similar and I wanted to know if there was a way to instead pass the response from my API to multiple grid variables as against having it in just one grid variable. Something like

 

Name, Gender, City

John , Male, Texas

 

GridVar1 = Name

GridVar1 = Gender

GridVar1 = City

 

If you could post a sample script, or the grid variable sample screenshot, that will be nice.