Load latest CSV files dynamically from S3 different folders

Hello experts, I am dynamically building a pipeline to fetch the most recent CSV files from various folders and load them into the corresponding tables in snowflake. (Incremental Process)

for ex:

FOLDER A (FILE1 DATETIME, FILE2 DATETIME.....) - LOAD LATEST FILE INTO TABLE A

FOLDER B (FILE1 DATETIME1, FILE2 DATETIME.....) - LOAD LATEST FILE INTO TABLE B

This can be acheived by multiple steps.

 

1. List the files in S3 and record them in a Snowflake table with its Metadata (last modified datetime, created datetime etc)

2. Use a high watermark approach to filter the unprocessed files.

3.Iterate through the remaining list and load into corresponding tables dynamically by extracting the table name from file path and passing it as a variable to your load job.

4. Make sure you mark the file as processed once it is loaded successfully.

 

 

Another Alternative approach could be using Snowpipe feature on Snowflake.