
Hello,

I would like to import zipped and unzipped files. I am using the Data Transfer component, but it does not take variables.

My database is BigQuery.

Hi @tatus_parker​,

Check to make sure your "path" variable resolves to an actual file by default. Matillion has some nuances around validation at configuration time versus run time: the component needs to validate at configuration time even though you may be changing the "path" variable value dynamically at run time. This means that when you configure your Data Transfer component and use the "path" variable, its default value needs to resolve to an actual file in the bucket. Let us know if this doesn't help. Thanks!

Hi @tatus_parker​,

At this point I don't think I can be of much help. Since we are using the Matillion for Snowflake product, it could be slightly different from Matillion for BigQuery; there could be a bug or slight code difference between the two products.

If you want to invest the time, you might do this via a Python or Bash script. I have moved files via Python in AWS before, and it was pretty easy to get up and running. The added element for you is that your file is zipped, which again should be pretty easy to handle, as there is a Python module (zipfile) that deals with archive-type files.
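For illustration, here is a minimal sketch of reading the members of a zip archive with the standard zipfile module (the in-memory archive is built on the spot just so the example is self-contained; in practice you would point it at the bytes you downloaded):

import io
import zipfile

# Build a tiny zip in memory so the example runs on its own
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('data.csv', 'id,name\n1,test\n')

# Reading side: iterate over the archive members and pull out their bytes
with zipfile.ZipFile(buf, mode='r') as zipf:
    for member in zipf.namelist():
        with zipf.open(member) as f:
            data = f.read()  # raw bytes of the member
            print(member, len(data))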

Hopefully, someone else can jump in here and give you some advice that is running on the same product as you.

Hi @ELTuser77​ ,

Please can you help me ?

Thanks

Hi @tatus_parker

Were you able to resolve this?

@Bryan​ hi! I am working with Matillion 1.50 for Snowflake and having a similar issue to Tatus's.

It seems that the Data Transfer component does not allow variables; I've tried several ways of passing the variable (File Iterator, Grid Iterator, Table Iterator) with the same result:

  1. I click the three dots to open the Source URL config, browse to my bucket and folder, insert the variable there, then click OK.
  2. The variable gets dropped when you view the config in the properties panel.
  3. When I run it, I can see the correct variable value in the iteration, but I get an error on the Data Transfer component: it doesn't point to the file, because it doesn't seem to be using the variable.

I tried to implement the logic in this post: https://matillioncommunity.discourse.group/t/is-it-possible-to-transfer-files-using-a-file-pattern-in-data-transfer-component-or-any-other-component/1167

But I'm thinking there is a bug in this Matillion version for Snowflake.

@Bryan​ have you got it working on your side with Matillion on Snowflake? And if so, which version are you on?

Hi @Bryan

Sorry, it doesn't work; I still get the error.

Thanks

Hi @tatus_parker

This isn't something I'm familiar with, I'm afraid. I would suggest reaching out to Support via email; someone from Customer Success can hopefully assist.

Thanks!

Hi @azucena.coronel​ and @tatus_parker​,

In all honesty, we don't use the Data Transfer component and instead lean on Python to move and copy files within S3. We use Python because we get more functionality and flexibility with the boto3 module.

With that said, I did some research into what you both have reported and found a few key pieces of information that might help.

Source URL for Data Transfer:

Target Object Name:

  • Required
  • Can be a variable

Target URL:

  • Could be the same bucket but a different folder (e.g. an archive folder)
  • Could be a completely different S3 bucket and path, as long as there is access
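To make this concrete, the properties might be filled in something like this (a sketch; the bucket, paths, and the "filename" variable are placeholders, not taken from a real job):

Source URL: s3://my-bucket/incoming/${filename}
Target Object Name: ${filename}
Target URL: s3://my-bucket/archive/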

I will follow up with an example of how to iterate over files in an S3 bucket/folder and copy the files to another location.

Here is an example of an orchestration that will iterate over JSON files in an S3 Bucket/folder and then copy them to an archive folder in the same S3 bucket. I hope this helps you two.
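In script form, the equivalent iterate-and-copy step might look roughly like this (a sketch rather than the orchestration itself; the bucket name and prefixes are placeholders):

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')  # placeholder bucket name

# Iterate over the JSON files under the source prefix and copy each one
# to an archive/ prefix in the same bucket
for obj in bucket.objects.filter(Prefix='json/'):
    if obj.key.endswith('.json'):
        target_key = 'archive/' + obj.key.split('/')[-1]
        s3.Object(bucket.name, target_key).copy_from(
            CopySource={'Bucket': bucket.name, 'Key': obj.key})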

Leaving here my little Python script with boto3 and zipfile as a workaround for buggy versions of the Data Transfer component.

Hope it is useful for you @tatus_parker

--------------------------------------------------------

import boto3
import zipfile
import io

###########
# CONFIGS #
###########

# AWS Simple Storage Service clients
s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

def main():
    # Keep the Matillion job variable in sync for downstream components,
    # but use a plain Python variable inside this script
    bucket_name = 'my_bucket'
    context.updateVariable('S3Bucket', bucket_name)
    bucket = s3.Bucket(bucket_name)

    # Collect the keys under the zip/ prefix
    filesList = []
    for obj in bucket.objects.filter(Delimiter='/', Prefix='zip/'):
        filesList.append(obj.key)

    for e in filesList:
        if '.zip' in str(e):
            # 'zip/file.zip' becomes 'csv/file.csv' (every 'zip' is replaced)
            filename = str(e).replace('zip', 'csv')
            obj = bucket.Object(e)
            with io.BytesIO(obj.get()["Body"].read()) as tf:
                tf.seek(0)
                # Read the file as a zipfile and process the members
                # (each write reuses the same key, so this assumes one member per archive)
                with zipfile.ZipFile(tf, mode='r') as zipf:
                    for subfile in zipf.namelist():
                        with zipf.open(subfile) as f:
                            dataBytes = f.read()
                            # upload_file expects a local file path, so it can't take
                            # these bytes; put_object writes them directly and creates
                            # the key if it doesn't exist
                            s3_client.put_object(Body=dataBytes, Bucket=bucket_name, Key=filename)

if __name__ == '__main__':
    main()

What a rockstar @Bryan​ , thanks very much for putting together that example.

Unfortunately, Matillion confirmed that my version 1.50 has a bug that doesn't allow Data Transfer components to take a variable.

I even tried your suggestion of using a variable for bucket and filename, without luck. I can also see that in your version the Source URL and Target Object are two different fields; in my version there is a single one.

I will go down the Python/boto3 path to achieve this, and maybe once we get onto the latest version, I'll try the native Matillion components again.

Thanks again for your response

No problem. Thanks for confirming that info about version 1.50; that could definitely help someone else out later if they are on that version and struggling as you were. The good thing about doing these types of tasks via Python is that they are essentially impervious to Matillion component bugs. Of course, you don't want to write everything in Python if you don't have to, as that adds development time and a level of technical debt over time. It's just good to have Python as a second method to get around issues that you can't efficiently solve with components. It looks like you are well on your way with the script you posted. Nice work!