Hello All,
In Matillion ETL we have the Database Query component, which stages the data onto S3 storage. Over there I could see that it doesn't have any file format (CSV, etc.). If so, how does it load into Snowflake? Can anyone throw some light on this?

There are possibly a couple of things at play here. The actual format of the file when it lands in S3 depends on the options you have selected in the Database Query component. Typically the file will be tab delimited, but it could be compressed using something like GZip if you have chosen that as an option.


Either way, the easiest way to understand what statement Matillion sent to Snowflake in order to load the data is to go into Snowflake and look at the query history. You will see a COPY INTO statement. As far as I know it should pretty much always look similar to this:


COPY INTO "Database_Name"."Schema_Name"."Table_Name" ("Column_X", "Column_Y")
FROM 's3://your_bucket/your_folder' CREDENTIALS=(
    AWS_KEY_ID=''
    AWS_SECRET_KEY=''
    AWS_TOKEN='')
FILE_FORMAT=(TYPE='CSV' FIELD_DELIMITER='\t' COMPRESSION='GZIP' TIMESTAMP_FORMAT='auto' DATE_FORMAT='auto' TIME_FORMAT='auto' TRIM_SPACE=TRUE NULL_IF=('\\N') EMPTY_FIELD_AS_NULL=FALSE)
ON_ERROR=ABORT_STATEMENT TRUNCATECOLUMNS=TRUE


The FILE_FORMAT clause is what Snowflake will use to read the file that is staged in S3. Hopefully this helps. Thanks for posting!
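
As an aside, if you'd rather pull those COPY statements out of the history programmatically instead of scrolling through the History page, something along these lines should work. This is only a sketch; the RESULT_LIMIT value and the ILIKE filter are just illustrative and not anything Matillion-specific:

-- Find recent COPY INTO statements so you can inspect the FILE_FORMAT
-- options Snowflake was actually given when loading the staged files.
SELECT query_text, start_time, execution_status
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 1000))
WHERE query_text ILIKE 'COPY INTO%'
ORDER BY start_time DESC;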

Initially, when we were migrating huge tables from on-prem, the jobs used to fail after running for 12+ hours due to an S3 timeout error. At that time we hadn't found any info regarding the file type being CSV, so this might be a bug where, if a job fails, the file is abandoned part-way through without being completed.
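
If that happens again, one way to see which staged files Snowflake actually loaded into the target table (and whether a failed run left anything incomplete) is the COPY_HISTORY table function. This is just a sketch; the table name and the 24-hour window are placeholders to replace with your own:

-- Show which files were loaded, their status, and any error message
-- for the target table over the last 24 hours.
SELECT file_name, status, row_count, first_error_message, last_load_time
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'MY_SCHEMA.MY_TABLE',
       START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())))
ORDER BY last_load_time DESC;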