I am quite new to Matillion and am struggling to create a pipeline that loops through a table and extracts data in batches. Each batch should then be uploaded from the S3 bucket to an SFTP folder as a separate CSV file with a future date stamp.
Example: extract 100,000 records in batches of 25,000. Each batch is exported as a separate CSV file with a date stamp: the first file gets today's date, the second file is dated tomorrow, the third is dated the day after tomorrow, and so on.
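To illustrate the date logic, the stamp for batch N is just today's date plus (N - 1) days. This is roughly the expression I compute (Snowflake-style syntax, purely illustrative):

```sql
-- Illustrative only: date stamp for batch 3, i.e. the day after tomorrow.
SELECT TO_CHAR(DATEADD(day, 3 - 1, CURRENT_DATE), 'YYYYMMDD') AS file_date_stamp;
```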
The pipe components are as follows:
1. Query Result To Scalar: count the records
2. If: check whether any of them are valid
3. Create Table: create a temp table for the specific batch
4. SQL Script: insert the batch into the temp table and set the flag on the extracted rows
5. Query Result To Scalar: determine the date stamp for the file
6. S3 Unload: unload the data from the temp table into the CSV file
7. Data Transfer: push the file to the SFTP folder
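For context, the SQL Script step (4 above) does roughly the following; the table, column, and flag names here are simplified placeholders and the batch size is hard-coded:

```sql
-- Copy the next unprocessed batch of 25,000 rows into the temp table...
INSERT INTO batch_temp
SELECT *
FROM source_table
WHERE extracted_flag = FALSE
ORDER BY id
LIMIT 25000;

-- ...then flag those rows in the source table as extracted.
UPDATE source_table
SET extracted_flag = TRUE
WHERE id IN (SELECT id FROM batch_temp);
```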
Currently, there is a separate pipeline for each CSV file (more than 10 of them). This delivers the desired results, but I believe there must be a much better way to accomplish this than the workaround of one pipeline per file.
Is there any way I can run all 10 batches through just one pipeline, i.e. extract and upload the first batch, then extract and upload the second, and so on until all the files have been created?