If we use an ELT tool like Matillion for ingestion from on-prem to Snowflake, does it need to land data in Azure Blob? Or can it push the data directly into Snowflake?

Hi @MS249509,

The ingestion process Matillion provides is built entirely on Snowflake's own capabilities. Meaning, ingestion methods into Snowflake are dictated by Snowflake itself, not by the tool you use. Conceptually, the two approaches Snowflake offers are basically the same with one exception: do you use internal stages or external stages? An example of a Snowflake external stage would be Azure Blob storage you manage yourself; an internal stage is Azure Blob storage that Snowflake manages for you. With internal stages you only ever reference the stage by name, because Snowflake puts an abstraction layer over the underlying storage mechanism (Azure Blob storage). So, fundamentally you don't know or care whether it's Blob storage or not.
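To make the distinction concrete, here's roughly what the two stage types look like in Snowflake SQL. This is just a sketch; the stage names, container URL, and SAS token are hypothetical placeholders, not anything from your environment:

```sql
-- External stage: a Blob container you create and manage yourself
-- (URL and credentials below are illustrative placeholders)
CREATE STAGE my_ext_stage
  URL = 'azure://myaccount.blob.core.windows.net/landing/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<your-sas-token>');

-- Internal stage: Snowflake-managed storage, referenced only by name --
-- you never see or touch the underlying Blob container
CREATE STAGE my_int_stage;
```

Either way, downstream SQL just refers to `@my_ext_stage` or `@my_int_stage`; the difference is who owns and can see the storage underneath.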

Either way, the data needs to land in an external or internal storage location before it's moved into Snowflake. The reason is that there is really only one option to ingest data into Snowflake efficiently: the COPY INTO command. When you use Matillion to ingest data into Snowflake, Matillion is simply automating the SQL (in this case COPY INTO statements) so that you don't have to write them or understand their intricacies.
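For reference, a COPY INTO statement like the ones Matillion generates behind the scenes looks something like this (table name, stage name, and file format are hypothetical examples):

```sql
-- Bulk-load staged files into a table; Matillion automates statements
-- of this shape so you don't write them by hand
COPY INTO analytics.sales
  FROM @my_ext_stage/sales/
  FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```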

So, the approach you go with is up to you, but if you choose to use internal stages then you are effectively saying that you don't want to create a data lake within Blob storage. If you use external stages, you will want to be strategic about the folder structure of the Blob storage you set up. This will ultimately allow you to process specific days, months, years, or everything in Blob storage if needed. If it's a one-time load and you don't need to do ongoing loads, using internal stages is probably the way to go. In my opinion, if you are going to do an initial load and also incremental loads going forward, it's worth setting up external stages in Snowflake. There are a lot of benefits to external stages.
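One common way to be strategic about that folder structure (just an example convention, not a requirement) is to partition the landing area by date, so a load can target one slice of the lake:

```sql
-- Hypothetical layout in Blob storage:
--   orders/2023/07/15/orders_001.csv.gz
-- Pointing COPY INTO at the date prefix loads just that day's files
COPY INTO raw.orders
  FROM @my_ext_stage/orders/2023/07/15/
  FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```

Reloading a month or a year is then just a matter of shortening the path prefix.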

I hope this helps you in understanding the path you should take.

Thank you, Bryan, your explanation sounds very convincing.

But then what advantages does Matillion offer over Snowpipe? Any idea?

It completely depends on your need. If you want to load data as it shows up in Blob storage, in a more streaming fashion, using Snowflake streams and tasks makes a lot of sense. If you have scenarios where you load less often (once a day, once an hour, etc.), something like Matillion might be a good solution since it can be scheduled. As an example, we use Matillion to load data that already exists in AWS S3 storage (the equivalent of Azure Blob storage). We use other loading platforms for CDC-type replication from on-prem SQL Server and DB2 servers, as those are much more setup- and resource-intensive. We'd rather spend our time delivering solutions from data than figuring out how to get data into Snowflake. I hope that helps clarify.
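For comparison, the "load as it shows up" path with Snowpipe is itself just a COPY INTO wrapped in a pipe. A rough sketch, with hypothetical object and integration names (on Azure, AUTO_INGEST also requires a notification integration wired to Blob storage events, which is part of the setup cost mentioned above):

```sql
-- Snowpipe: runs the embedded COPY INTO automatically whenever new
-- files land in the stage location (names here are illustrative)
CREATE PIPE raw.orders_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'MY_NOTIFICATION_INT'
AS
  COPY INTO raw.orders
    FROM @my_ext_stage/orders/
    FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```

So the trade-off is less about loading mechanics and more about whether you want event-driven loading (Snowpipe) or scheduled, orchestrated loading (Matillion).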