Hello,
I would like to import both zipped and unzipped files. I am using the Data Transfer component, but it does not pick up the variables.
My database is BigQuery.
Hi @tatus_parker,
Check to make sure your "path" variable's default value resolves to an actual file. Matillion has some nuances around validation at configuration time versus run time: the component needs to validate correctly at configuration time, even though at run time you may be changing the "path" variable's value dynamically. This means that when you configure your Data Transfer component and use the "path" variable, the value of "path" needs to resolve to an actual file in the bucket. Let us know if this doesn't help. Thanks!
Hi @tatus_parker,
At this point I don't think I can be of much help. Since we are using the Matillion for Snowflake product, things could be slightly different from Matillion for BigQuery; there could be a bug or a slight code difference between the two products.
If you want to invest the time, you could do this via a Python or Bash script. I have moved files via Python in AWS before and it was pretty easy to get up and running. The added element in your case is that your file is zipped, which again should be fairly easy to handle, as Python has a module for dealing with archive files.
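As a rough, untested sketch of what the move itself could look like with boto3 (the bucket and key names here are only placeholders, not code we actually run): S3 has no native "move", so it is a copy followed by a delete.

import boto3

# Placeholder bucket and key names for illustration only
SOURCE_BUCKET = 'my-source-bucket'
SOURCE_KEY = 'incoming/data.zip'
TARGET_KEY = 'archive/data.zip'

s3 = boto3.client('s3')

# Copy the object to its new location, then delete the original
s3.copy_object(Bucket=SOURCE_BUCKET,
               Key=TARGET_KEY,
               CopySource={'Bucket': SOURCE_BUCKET, 'Key': SOURCE_KEY})
s3.delete_object(Bucket=SOURCE_BUCKET, Key=SOURCE_KEY)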
Hopefully, someone else can jump in here and give you some advice that is running on the same product as you.
Hi @tatus_parker
Were you able to resolve this?
@Bryan hi! I am working with Matillion 1.50 for Snowflake and having a similar issue to Tatus's.
It seems that the Data Transfer component does not allow variables; I've tried several ways of passing the variable (File Iterator, Grid Iterator, Table Iterator) with the same result.
I tried to implement the logic from this post: https://matillioncommunity.discourse.group/t/is-it-possible-to-transfer-files-using-a-file-pattern-in-data-transfer-component-or-any-other-component/1167
But I'm thinking there is a bug with this Matillion version on Snowflake.
@Bryan, have you got it working on your side with your Matillion on Snowflake? And if so, which version are you on?
Hi @tatus_parker
This isn't something I'm familiar with, I'm afraid. I would suggest reaching out to Support via email; someone from Customer Success can hopefully assist.
Thanks!
Hi @azucena.coronel and @tatus_parker,
In all honesty, we don't use the Data Transfer component and instead lean on Python to move and copy files within S3. We use Python because we get more functionality and flexibility with the boto3 module.
With that said, I did some research into what you both have reported and found a few key pieces of information that might help.
The relevant Data Transfer properties are the Source URL, the Target Object Name, and the Target URL.
I will follow up with an example of how to iterate over files in an S3 bucket/folder and copy them to another location.
Here is an example of an orchestration that will iterate over JSON files in an S3 Bucket/folder and then copy them to an archive folder in the same S3 bucket. I hope this helps you two.
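The orchestration itself uses Matillion components, but for anyone who prefers to script it, here is a rough boto3 equivalent of the same logic (the bucket and prefix names are just placeholders, and this is an untested sketch rather than the exact job):

import boto3

# Placeholder bucket and prefixes for illustration only
BUCKET = 'my-bucket'
SOURCE_PREFIX = 'landing/'
ARCHIVE_PREFIX = 'archive/'

s3 = boto3.client('s3')

# List the objects under the source prefix and copy each .json file
# into the archive prefix of the same bucket
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=BUCKET, Prefix=SOURCE_PREFIX):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if not key.endswith('.json'):
            continue
        archive_key = ARCHIVE_PREFIX + key[len(SOURCE_PREFIX):]
        s3.copy_object(Bucket=BUCKET,
                       Key=archive_key,
                       CopySource={'Bucket': BUCKET, 'Key': key})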
Leaving here my little Python script using boto3 and zipfile as a workaround for buggy versions of the Data Transfer component.
Hope it is useful for you @tatus_parker
--------------------------------------------------------
import io
import zipfile
import boto3

###########
# CONFIGS #
###########
# Bucket name, set directly here so the script is self-contained;
# inside Matillion this would typically come from a job variable
S3_BUCKET = 'my_bucket'

# AWS Simple Storage Service resource and client
s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

def main():
    # When run inside a Matillion Python Script component, keep the matching
    # job variable in sync (the "context" object only exists in Matillion)
    try:
        context.updateVariable('S3Bucket', S3_BUCKET)
    except NameError:
        pass

    bucket = s3.Bucket(S3_BUCKET)

    # Collect the keys of all objects under the 'zip/' prefix
    files_list = []
    for obj in bucket.objects.filter(Delimiter='/', Prefix='zip/'):
        files_list.append(obj.key)

    for key in files_list:
        if '.zip' in key:
            # 'zip/file.zip' becomes 'csv/file.csv': the unzipped content is
            # written under the csv/ prefix with a .csv extension
            filename = key.replace('zip', 'csv')
            obj = bucket.Object(key)
            # Read the whole object into memory and treat it as a zip archive,
            # then process the members
            with io.BytesIO(obj.get()['Body'].read()) as tf:
                tf.seek(0)
                with zipfile.ZipFile(tf, mode='r') as zipf:
                    for subfile in zipf.namelist():
                        with zipf.open(subfile) as f:
                            data_bytes = f.read()
                            # upload_file expects a path to a local file, so
                            # put_object (which accepts a bytes body) is the
                            # right call here; it also creates the object if
                            # it does not already exist
                            s3_client.put_object(Body=data_bytes,
                                                 Bucket=S3_BUCKET,
                                                 Key=filename)

if __name__ == '__main__':
    main()
What a rockstar @Bryan, thanks very much for putting together that example.
Unfortunately, Matillion confirmed that my version 1.50 has a bug that doesn't allow Data Transfer components to take a variable.
I even tried your suggestion of using a variable for the bucket and filename, without luck. I can also see that in your version the Source URL and Target Object are two different fields; in my version it's a single one.
I will go down the Python/boto3 path to achieve this, and maybe once we're on the latest version I'll try the native Matillion components again.
Thanks again for your response
No problem. Thanks for confirming that info about version 1.50; it could definitely help someone else out later if they are on that version and struggling as you were. The good thing about doing these types of tasks via Python is that they are essentially impervious to Matillion component bugs. Of course, you don't want to write everything in Python if you don't have to, as that adds development time and a level of technical debt over time. It's just good to have Python as a second option for issues you can't efficiently solve with components. It looks like you are well on your way with your script. Nice work!