How can I "split" 1 CSV file into multiple CSV files, load each of them into S3, and finally load each into their own Snowflake table?

I have a CSV file that I need to get from an SFTP server. Normally, this is pretty straightforward: use a data load component to move the file from SFTP to S3, then use S3 Load to load the file into a Snowflake table. However, the CSV file I need to load is actually 3 different CSV files appended together in a strange format. So the CSV looks like this:

  • CSV_1 field names
  • CSV_2 field names
  • CSV_3 field names
  • CSV_1 rows
  • CSV_2 rows
  • CSV_3 rows

Also, the three CSV files do not necessarily have the same number of fields.

 

I think writing a Python script might be the best way to tackle this: have the script split the file into 3 CSVs and then load each into S3. Of course, if anyone has done something similar or has better ideas, please let me know.
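For what it's worth, here is a minimal sketch of what that split script could look like. It assumes the layout described above (the first three lines are the three header rows) and, since the sub-files have different field counts, it assigns each data row to the sub-file whose header has the same width. The function name and file paths are hypothetical; each output file could then be pushed to S3 with boto3's `upload_file`.

```python
import csv

def split_stacked_csv(path, out_paths):
    # Assumptions: the first 3 lines are the three header rows, and
    # each sub-file has a distinct column count, so every data row can
    # be matched to the header with the same number of fields.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    headers, data = rows[:3], rows[3:]

    files = []
    width_to_writer = {}
    for header, out in zip(headers, out_paths):
        fh = open(out, "w", newline="")
        files.append(fh)
        writer = csv.writer(fh)
        writer.writerow(header)
        width_to_writer[len(header)] = writer

    # Route each data row to the output whose header width matches.
    for row in data:
        width_to_writer[len(row)].writerow(row)

    for fh in files:
        fh.close()
```

This is only a sketch under the stated assumptions; if two of the sub-files happen to share a column count, you would need some other marker (e.g. a known header value) to tell the row blocks apart.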

 

Thanks!

Hello,

Nice problem you've got there :) If the file itself fits into a Snowflake table 'as-is', I would probably try to solve this within Snowflake.

Load everything into a single-column table and continue from there; then you can do everything in Matillion Transformation jobs (in Snowflake).
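A rough sketch of that approach, with hypothetical stage, table, and column names — load each raw line into one VARCHAR column, then peel the sub-files apart by their field counts:

```sql
-- Hypothetical names throughout; assumes the file is already staged in S3.
CREATE OR REPLACE TABLE raw_lines (line VARCHAR);

-- Load every line of the file as-is into the single column
-- (no field delimiter, so the commas survive inside the value).
COPY INTO raw_lines
FROM @my_s3_stage/stacked.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = NONE);

-- Example: select the rows belonging to the sub-file with 3 fields,
-- assuming each sub-file has a distinct column count.
SELECT SPLIT_PART(line, ',', 1) AS col1,
       SPLIT_PART(line, ',', 2) AS col2,
       SPLIT_PART(line, ',', 3) AS col3
FROM raw_lines
WHERE REGEXP_COUNT(line, ',') = 2;   -- 2 commas = 3 fields
```

This only works if the sub-files really do have distinct column counts (as the question suggests) and no quoted field contains a comma; otherwise you'd need a sturdier marker to separate the blocks.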

 

Of course, if it doesn't fit into a table, then Python would be a reasonable part of the integration and could take care of it there, as you mentioned.

 

Br,

Michael