
Hello,

I would like to import zipped and unzipped files. I am using the Data Transfer component, but it does not take variables.

My database is BigQuery.

Hi @tatus_parker​,

Check to make sure your "path" variable resolves to an actual file by default. Matillion has some nuances around validation at configuration time versus run time: the component needs to validate at configuration time even though you may be changing the "path" variable value dynamically at run time. This means that when you configure your Data Transfer component and use the "path" variable, its default value needs to resolve to an actual file in the bucket. Let us know if this doesn't help. Thanks!

Hi @tatus_parker​,

At this point I don't think I can be of much help. Since we are using the Matillion for Snowflake product, it could be slightly different from Matillion for BigQuery; there could be a bug or slight code difference between the two products.

If you want to invest the time, you might do this via a Python or Bash script. I have moved files via Python in AWS before, and it was pretty easy to get up and running. The added element for you is that your file is zipped, which again should be pretty easy to handle, as there is a Python module (zipfile) that deals with archive-type files.
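For illustration, here is a minimal sketch of reading the members of a zip archive with the standard zipfile module (the in-memory archive is built on the spot just so the example is self-contained; in practice you would point it at the bytes you downloaded):

import io
import zipfile

# Build a tiny zip in memory so the example runs on its own
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('data.csv', 'id,name\n1,test\n')

# Reading side: iterate over the archive members and pull out their bytes
with zipfile.ZipFile(buf, mode='r') as zipf:
    for member in zipf.namelist():
        with zipf.open(member) as f:
            data = f.read()  # raw bytes of the member
            print(member, len(data))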

Hopefully, someone else can jump in here and give you some advice that is running on the same product as you.

Hi @ELTuser77​ ,

Please can you help me ?

Thanks

Hi @tatus_parker

Were you able to resolve this?

@Bryan​ hi! I am working with Matillion 1.50 for Snowflake and having a similar issue to Tatus's.

It seems that the Data Transfer component does not allow variables; I've tried several ways of passing the variable (File Iterator, Grid Iterator, Table Iterator) with the same result:

  1. I click the three dots to open the Source URL config, browse to my bucket and folder, insert the variable there, then click OK.
  2. The variable gets dropped when you view the config in the properties panel.
  3. When I run it, I can see the correct variable value in the iteration, but I get an error on the Data Transfer component: it doesn't point to the file, because it doesn't seem to be using the variable.

I tried to implement the logic in this post: https://matillioncommunity.discourse.group/t/is-it-possible-to-transfer-files-using-a-file-pattern-in-data-transfer-component-or-any-other-component/1167

But I'm thinking there is a bug in this Matillion version for Snowflake.

@Bryan​ have you got it working on your side with Matillion on Snowflake? And if so, which version are you on?

Hi @Bryan

Sorry, it doesn't work; I still get the error.

Thanks

Hi @tatus_parker

This isn't something I'm familiar with, I'm afraid. I would suggest reaching out to Support via email; someone from Customer Success can hopefully assist.

Thanks!

Hi @azucena.coronel​ and @tatus_parker​,

In all honesty, we don't use the Data Transfer component and instead lean on Python to move and copy files within S3. We use Python because we get more functionality and flexibility with the boto3 module.

With that said, I did some research into what you both have reported and found a few key pieces of information that might help.

Source URL for Data Transfer:

Target Object Name:

  • Required
  • Can be a variable

Target URL:

  • Could be the same bucket but a different folder (e.g. an archive folder)
  • Could be a completely different S3 bucket and path, as long as there is access
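To make this concrete, the properties might be filled in something like this (a sketch; the bucket, paths, and the "filename" variable are placeholders, not taken from a real job):

Source URL: s3://my-bucket/incoming/${filename}
Target Object Name: ${filename}
Target URL: s3://my-bucket/archive/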

I will follow up with an example of how to iterate over files in an S3 bucket/folder and copy the files to another location.

Here is an example of an orchestration that will iterate over JSON files in an S3 Bucket/folder and then copy them to an archive folder in the same S3 bucket. I hope this helps you two.
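In script form, the equivalent iterate-and-copy step might look roughly like this (a sketch rather than the orchestration itself; the bucket name and prefixes are placeholders):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')  # placeholder bucket name

# Iterate over the JSON files under the source prefix and copy each one
# to an archive/ prefix in the same bucket
for obj in bucket.objects.filter(Prefix='json/'):
    if obj.key.endswith('.json'):
        target_key = 'archive/' + obj.key.split('/')[-1]
        s3.Object(bucket.name, target_key).copy_from(
            CopySource={'Bucket': bucket.name, 'Key': obj.key})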

Leaving here my little Python script with boto3 and zipfile as a workaround for buggy versions of the Data Transfer component.

Hope it is useful for you @tatus_parker

--------------------------------------------------------

import boto3
import zipfile
import io

###########
# CONFIGS #
###########

# AWS Simple Storage Service clients
s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

def main():
    # Keep the Matillion job variable in sync for downstream components,
    # but use a plain Python variable inside this script
    bucket_name = 'my_bucket'
    context.updateVariable('S3Bucket', bucket_name)
    bucket = s3.Bucket(bucket_name)

    # Collect the keys under the zip/ prefix
    filesList = []
    for obj in bucket.objects.filter(Delimiter='/', Prefix='zip/'):
        filesList.append(obj.key)

    for e in filesList:
        if '.zip' in str(e):
            # 'zip/file.zip' becomes 'csv/file.csv' (every 'zip' is replaced)
            filename = str(e).replace('zip', 'csv')
            obj = bucket.Object(e)
            with io.BytesIO(obj.get()["Body"].read()) as tf:
                tf.seek(0)
                # Read the file as a zipfile and process the members
                # (each write reuses the same key, so this assumes one member per archive)
                with zipfile.ZipFile(tf, mode='r') as zipf:
                    for subfile in zipf.namelist():
                        with zipf.open(subfile) as f:
                            dataBytes = f.read()
                            # upload_file expects a local file path, so it can't take
                            # these bytes; put_object writes them directly and creates
                            # the key if it doesn't exist
                            s3_client.put_object(Body=dataBytes, Bucket=bucket_name, Key=filename)

if __name__ == '__main__':
    main()

What a rockstar @Bryan​ , thanks very much for putting together that example.

Unfortunately, Matillion confirmed that my version 1.50 has a bug that doesn't allow Data Transfer components to take a variable.

I even tried your suggestion of using a variable for bucket and filename, without luck. I can also see that in your version the Source URL and Target Object are two different fields; in my version there is a single one.

I will go down the Python/boto3 path to achieve this, and maybe once we get onto the latest version, I'll try the native Matillion components again.

Thanks again for your response

No problem. Thanks for confirming that info about version 1.50; that could definitely help someone else out later if they are on that version and struggling as you were. The good thing about doing these types of tasks via Python is that they are essentially impervious to Matillion component bugs. Of course, you don't want to write everything in Python if you don't have to, as that adds development time and a level of technical debt over time. It's just good to have Python as a second method to get around issues that you can't efficiently solve with components. It looks like you are well on your way with the script you posted. Nice work!