Load latest CSV files dynamically from S3 different folders

sunny_etl · September 15, 2023, 4:36am

Hello experts, I am building a pipeline that dynamically fetches the most recent CSV file from each of several S3 folders and loads it into the corresponding table in Snowflake (an incremental process).

For example:

FOLDER A (FILE1 DATETIME, FILE2 DATETIME, ...) - LOAD LATEST FILE INTO TABLE A

FOLDER B (FILE1 DATETIME, FILE2 DATETIME, ...) - LOAD LATEST FILE INTO TABLE B

AnudeepK · September 29, 2023, 10:02pm

This can be achieved in multiple steps.

1. List the files in S3 and record them in a Snowflake table along with their metadata (last-modified datetime, created datetime, etc.).

2. Use a high watermark approach to filter the unprocessed files.

3. Iterate through the remaining list and load each file into its corresponding table dynamically, extracting the table name from the file path and passing it as a variable to your load job.

4. Make sure you mark the file as processed once it is loaded successfully.
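The steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not a definitive implementation: the names `S3File`, `table_for`, `latest_unprocessed` and `mark_processed` are hypothetical, the file list is hard-coded where a real pipeline would read it from an S3 listing (for instance boto3's `list_objects_v2`, which returns `Key` and `LastModified` for each object), and the watermarks are kept in a dict where the steps above would keep them in a Snowflake table. It also assumes the table name is simply the top-level folder name.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class S3File(NamedTuple):
    key: str                 # hypothetical layout: "<folder>/<file>.csv"
    last_modified: datetime  # last-modified time from the S3 listing


def table_for(key: str) -> str:
    """Derive the target table name from the top-level folder (assumption)."""
    return key.split("/", 1)[0].upper()


def latest_unprocessed(files, watermarks):
    """Return {table: S3File} with the newest file per table that is
    strictly newer than that table's high watermark."""
    latest = {}
    for f in files:
        table = table_for(f.key)
        mark = watermarks.get(table)
        if mark is not None and f.last_modified <= mark:
            continue  # at or below the watermark: already processed
        if table not in latest or f.last_modified > latest[table].last_modified:
            latest[table] = f
    return latest


def mark_processed(watermarks, loaded):
    """Advance each table's watermark after a successful load."""
    for table, f in loaded.items():
        watermarks[table] = f.last_modified


utc = timezone.utc
files = [
    S3File("folder_a/file1.csv", datetime(2023, 9, 13, tzinfo=utc)),
    S3File("folder_a/file2.csv", datetime(2023, 9, 14, tzinfo=utc)),
    S3File("folder_b/file1.csv", datetime(2023, 9, 12, tzinfo=utc)),
]
# FOLDER_A was already loaded up to file1; FOLDER_B has never been loaded.
watermarks = {"FOLDER_A": datetime(2023, 9, 13, tzinfo=utc)}

to_load = latest_unprocessed(files, watermarks)
for table, f in sorted(to_load.items()):
    # A real job would run the Snowflake load for f.key into `table` here.
    print(table, f.key)

mark_processed(watermarks, to_load)  # only after the loads succeed
```

With this sample data, `folder_a/file2.csv` is selected for FOLDER_A and `folder_b/file1.csv` for FOLDER_B; running the selection again after `mark_processed` returns nothing until newer files arrive.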

An alternative approach is to use Snowflake's Snowpipe feature.
