Speeding up bulk API POST requests with API Extract

I am implementing a reverse-ETL integration that uses Matillion to push data from our Snowflake data warehouse to a CRM daily. Data is transformed into a staging table that matches the CRM's API format, and a table iterator then sends the requests via an API Extract component. Each run makes on the order of 1,000-2,000 POST requests, totalling 10-20MB of data.

 

My challenge is that each POST request takes approximately 3-5 seconds. At scale, this integration could take up to an hour to run.

 

Ideas I've implemented so far:

  • Running POST requests concurrently by setting the table iterator component to concurrent.
  • Implementing CDC at the column level so that only changed values are sent, significantly reducing data transfer.
  • Utilizing the CRM's batch API to reduce the number of requests (see the sketch below).
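
Roughly, the batch-API idea looks like this. This is a minimal sketch only: the endpoint URL, batch size limit, and payload envelope are stand-ins, not our CRM's actual API.

```python
# Illustrative only: groups per-record payloads into batch requests so that
# 100 records cost one round trip instead of 100.
import json
import requests

BATCH_URL = "https://api.example-crm.com/v1/records/batch"  # hypothetical endpoint
BATCH_SIZE = 100  # assumed batch limit; check the CRM's documentation


def chunk(items, size):
    """Yield successive slices of `items` of length `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def post_in_batches(records, api_key):
    """Send records in batches instead of one POST per record."""
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    for batch in chunk(records, BATCH_SIZE):
        resp = session.post(BATCH_URL, data=json.dumps({"records": batch}))
        resp.raise_for_status()  # surface failures for the job to handle
```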

 

I'm interested in any suggestions. Is this integration out of the realm of possibility within Matillion? Is a table iterator the wrong way to go?

 

Thank you in advance!

Hi Brendan, I was wondering whether you ever figured out any other ways to speed up your requests, or whether you still need feedback? If you do, could you clarify whether the POST requests are 10-20MB each, or whether that total is split over the 1,000-2,000 requests?

 

Many thanks,

Claire

Hey Claire! Thanks for checking in.

 

The POST request payloads are approximately 10KB each, so the 10-20MB is split across the 1,000-2,000 requests.

 

Since this post, I have actually moved away from the API Extract component for the largest request type with this API. I still use the component for fetching information (the format of the response is helpful and reduces the code I have to write), but posting data to the API is now done with the Requests library in the Python Script component. This led to about a 5x speed increase (down to roughly 0.9 seconds per request on average), so I suspect the API Extract component carries a good amount of extra baggage I don't need for POST requests.
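
For reference, the swap looks roughly like this. It's a minimal sketch, not my exact job: the endpoint URL, auth scheme, and variable names are placeholders rather than the CRM's real API or my actual job variables.

```python
# Sketch of posting a staged JSON payload with requests instead of API Extract.
import requests


def post_record(payload_json: str, api_token: str,
                url: str = "https://api.example-crm.com/v1/records") -> None:
    """POST one staged JSON payload and raise if the CRM rejects it."""
    response = requests.post(
        url,
        data=payload_json,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        timeout=30,  # fail fast rather than letting a slow endpoint stall the job
    )
    response.raise_for_status()  # surface API errors to the Matillion job


# In the Python Script component, the iterator-mapped job variables
# (e.g. payload_json, api_token — hypothetical names) would be passed in here:
# post_record(payload_json, api_token)
```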

 

I also tweaked my concurrency to make the best use of the instance hardware. We run Matillion ETL on a medium-sized potato, and 1,000 requests running concurrently led to some sub-optimal runtimes and UI lag for users during the run. I now split the requests into batches of roughly 100 and iterate sequentially over those batches of 100 concurrent requests. This removed the UI lag during runtime and sped requests up by roughly another 10-15%, down to 0.8 seconds per request. This could be worth forwarding to the feature team: it'd be great to have native support for finer concurrency control.
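
Outside of the table iterator, the same batched-concurrency idea looks roughly like this in plain Python. It's just an illustration: post_record is the hypothetical helper from the earlier sketch, and the batch size is whatever your instance (and the CRM's rate limits) can tolerate.

```python
# Fan out up to 100 POSTs at a time; wait for each batch to finish before
# starting the next, so the instance and UI never see 1,000 in-flight requests.
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # tune to the instance and the CRM's rate limits


def post_all(payloads, api_token):
    """Send every staged payload, 100 concurrent requests per sequential batch."""
    for start in range(0, len(payloads), BATCH_SIZE):
        batch = payloads[start:start + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
            futures = [pool.submit(post_record, p, api_token) for p in batch]
            for future in futures:
                future.result()  # re-raise any request failure
```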

 

Thanks!

 

Brendan

Hi Brendan,

 

Thanks so much for coming back and letting us know how you got on! I'll pass this back to the team so they can see how you improved the runtime.

 

Many thanks, Claire