What is making staging from a Database Query so slow?

I'm loading results from an Amazon Athena query into a staging table in Snowflake for further processing. Athena completed the query in 2.5 minutes, but Matillion is taking an hour to load the 10 million rows into Snowflake on an m5.large instance . Snowflake doesn't seem to be under stress (the warehouse used for the load is suspended), and the EC2 instance doesn't seem to be under stress either (it's using less than half of the memory and both CPUs are under 20%).

 

What else can I look at or tune to make this go faster? Thanks

Hi @SRenk​,

What I believe is happening is that Matillion is following the typical loading pattern for Snowflake which means it's generating files based on the Athena query results and then dropping them either in S3 or into internal Snowflake storage and then loading into Snowflake. Since the query takes 2.5 minutes and you noticed that the warehouse goes into a suspended state that means the creation of the files and putting them into S3 or internal storage is likely where the hold up is.

There is probably another approach that will load the data substantially faster. I would lean on the native abilities of AWS and Snowflake. Meaning you can configure Athena to generate it's query result set as a file(s) in a S3 bucket. From there it's as easy as using Matillion's S3 Load component or issuing a Snowflake COPY INTO statement to load the data from S3 into a table.

Let me know if this pattern doesn't make sense and I will try explain it a bit clearer. Thanks for posting!