Speeding up bulk API POST requests with API Extract

I am implementing a reverse-ETL integration that uses Matillion to push data from our Snowflake data warehouse to a CRM daily. Data is transformed into a staging table that matches the CRM's API format, and a table iterator then sends the requests via an API Extract component. Each run makes on the order of 1,000-2,000 POST requests, totalling 10-20MB of data.

 

My challenge is that each POST request takes approximately 3-5 seconds. At scale, this integration could take up to an hour to run.

 

Ideas I've implemented so far:

  • Running POST requests concurrently by setting the table iterator component to concurrent.
  • Implementing CDC at the column level so that only changed values are sent, significantly reducing data transfer.
  • Utilizing the CRM's batch API to reduce the number of requests (see the sketch below).
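
Roughly, the batch-API idea looks like this. This is a minimal sketch only: the endpoint URL, batch size limit, and payload envelope are stand-ins, not our CRM's actual API.

```python
# Illustrative only: groups per-record payloads into batch requests so that
# 100 records cost one round trip instead of 100.
import json
import requests

BATCH_URL = "https://api.example-crm.com/v1/records/batch"  # hypothetical endpoint
BATCH_SIZE = 100  # assumed batch limit; check the CRM's documentation


def chunk(items, size):
    """Yield successive slices of `items` of length `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def post_in_batches(records, api_key):
    """Send records in batches instead of one POST per record."""
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    for batch in chunk(records, BATCH_SIZE):
        resp = session.post(BATCH_URL, data=json.dumps({"records": batch}))
        resp.raise_for_status()  # surface failures for the job to handle
```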

 

I'm interested in any suggestions. Is this integration out of the realm of possibility within Matillion? Is a table iterator the wrong way to go?

 

Thank you in advance!

Hi Brendan, I was wondering whether you ever figured out any other ways to speed up your requests, or whether you still need feedback? If you do, could you clarify whether the POST requests are 10-20MB each, or whether that total is split over the 1,000-2,000 requests?

 

Many thanks,

Claire

Hey Claire! Thanks for checking in.

 

The POST request payloads are approximately 10KB each, so the 10-20MB is split across the 1,000-2,000 requests.

 

Since this post, I have actually moved away from the API Extract component for the largest request type with this API. I still use the component for fetching information (the format of the response is helpful and reduces the code I have to write), but posting data to the API is now done with the Requests library in the Python Script component. This led to about a 5x speed increase (down to roughly 0.9 seconds per request on average), so I suspect the API Extract component carries a good amount of extra baggage I don't need for POST requests.
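
For reference, the swap looks roughly like this. It's a minimal sketch, not my exact job: the endpoint URL, auth scheme, and variable names are placeholders rather than the CRM's real API or my actual job variables.

```python
# Sketch of posting a staged JSON payload with requests instead of API Extract.
import requests


def post_record(payload_json: str, api_token: str,
                url: str = "https://api.example-crm.com/v1/records") -> None:
    """POST one staged JSON payload and raise if the CRM rejects it."""
    response = requests.post(
        url,
        data=payload_json,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        timeout=30,  # fail fast rather than letting a slow endpoint stall the job
    )
    response.raise_for_status()  # surface API errors to the Matillion job


# In the Python Script component, the iterator-mapped job variables
# (e.g. payload_json, api_token — hypothetical names) would be passed in here:
# post_record(payload_json, api_token)
```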

 

I also tweaked my concurrency to make the best use of the instance hardware. We run Matillion ETL on a medium-sized potato, and 1,000 requests running concurrently led to some sub-optimal runtimes and UI lag for users during the run. I now split the requests into batches of roughly 100 and iterate sequentially over those batches of 100 concurrent requests. This removed the UI lag during runtime and sped requests up by roughly another 10-15%, down to 0.8 seconds per request. This could be worth forwarding to the feature team: it'd be great to have native support for finer concurrency control.
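
Outside of the table iterator, the same batched-concurrency idea looks roughly like this in plain Python. It's just an illustration: post_record is the hypothetical helper from the earlier sketch, and the batch size is whatever your instance (and the CRM's rate limits) can tolerate.

```python
# Fan out up to 100 POSTs at a time; wait for each batch to finish before
# starting the next, so the instance and UI never see 1,000 in-flight requests.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # tune to the instance and the CRM's rate limits


def post_all(payloads, api_token):
    """Send every staged payload, 100 concurrent requests per sequential batch."""
    for start in range(0, len(payloads), BATCH_SIZE):
        batch = payloads[start:start + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
            futures = [pool.submit(post_record, p, api_token) for p in batch]
            for future in futures:
                future.result()  # re-raise any request failure
```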

 

Thanks!

 

Brendan

Hi Brendan,

 

Thanks so much for coming back and letting us know how you got on! I'll pass this back to the team so they can see how you improved the runtime.

 

Many thanks, Claire