I have been looking for best practices for setting up our scenario, but could only find similar documentation for Redshift use cases.
We are running Matillion in a VPC in our company account. Our Snowflake SaaS environment is running in Snowflake's VPC in the same AWS Region.
What are the best practices for setting up this connection in AWS? I know AWS PrivateLink is an option, but what are the non-PrivateLink options, and is there a way to do so without data egress costs?
Hi @McDale, I can shed some light on this, as we have implemented a pattern that seems to work with Snowflake, AWS and Matillion in AWS. We don't use the PrivateLink option. In my opinion it isn't required unless you are in a security-critical industry like banking. If your Snowflake account is on AWS and your Matillion is also in an AWS account, then there are no egress charges as far as I know. The egress charges come in when you pull the data out of the AWS cloud provider. An egress example would be consuming data from Snowflake or AWS where the destination is an on-premises database.
The VPC and subnets you use in AWS for Matillion will need outgoing internet access. In Snowflake, the most secure way of handling connections from other sources is through Network Policies, where you whitelist IPs and/or ranges of IPs. For us, in Snowflake we whitelist our AWS VPC IP range, which gives Matillion network access to Snowflake. Anyone not using Network Policies in Snowflake where PrivateLink isn't being used is just asking for trouble.
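To make the Network Policy idea concrete, here is a minimal sketch that validates a list of CIDR ranges and builds the corresponding CREATE NETWORK POLICY statement. The policy name and the addresses are hypothetical placeholders (the addresses come from the reserved documentation ranges), and the helper function is my own, not part of any Snowflake tooling. One assumption worth checking on your side: if Matillion reaches Snowflake over the public internet, the addresses Snowflake sees are likely the public ones your VPC egresses through (for example a NAT gateway or Elastic IP), so those are the ones to list.

```python
import ipaddress


def build_network_policy_sql(policy_name, allowed_cidrs):
    """Return a CREATE NETWORK POLICY statement for the given CIDR ranges.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # Keep the identifier simple so it is safe to place in the statement.
    if not policy_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected policy name: {policy_name!r}")
    if not allowed_cidrs:
        raise ValueError("at least one CIDR is required")
    # ip_network() raises ValueError on malformed input (including host bits
    # set in a network address), so typos fail before reaching Snowflake.
    normalized = [str(ipaddress.ip_network(cidr, strict=True)) for cidr in allowed_cidrs]
    ip_list = ", ".join(f"'{cidr}'" for cidr in normalized)
    return f"CREATE NETWORK POLICY {policy_name} ALLOWED_IP_LIST = ({ip_list});"


# Placeholder values only: replace with your own egress addresses.
print(build_network_policy_sql("matillion_policy", ["203.0.113.10/32", "198.51.100.0/24"]))
```

The generated statement would be run in Snowflake by a suitably privileged role, and the policy then has to be activated (for example at the account or user level) before it takes effect.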
So, that takes care of the network side of things, but there is still the security side that needs to be resolved. A second layer of secure access is achieved through IAM roles and relationships. Setting this part up allows Snowflake secure access to S3 buckets in your company's AWS account. Check out this article: https://docs.snowflake.com/en/user-guide/data-load-s3-config-aws-iam-role.html. Perhaps the most important part of this article is this: https://docs.snowflake.com/en/user-guide/data-load-s3-config-aws-iam-role.html#step-2-create-an-aws-iam-role. If you read step 2 very carefully, you will notice that you are allowing another AWS account (which is where Snowflake lives) access into your AWS account, and that is where the relationship is built. Getting this part implemented allows you to create Stages in Snowflake that point to S3 buckets in your company's AWS account. This is the most secure way to implement this relationship. My understanding is that even with PrivateLink you should still follow these steps.
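For a rough picture of what that cross-account relationship looks like, here is a sketch of the kind of trust policy document that ends up on the IAM role: it lets a principal in another AWS account assume the role, but only when it presents the agreed external ID. The ARN and external ID below are made-up placeholders; the real values come from Snowflake as described in the linked article, and the helper function is just for illustration.

```python
import json


def build_trust_policy(snowflake_principal_arn, external_id):
    """Build an IAM trust policy allowing one external principal to assume the role.

    Hypothetical helper; both arguments are values you would obtain from Snowflake.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal lives in Snowflake's AWS account, not yours.
                "Principal": {"AWS": snowflake_principal_arn},
                "Action": "sts:AssumeRole",
                # The external ID must match, so knowing the role ARN alone is not enough.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values only: do not use these in a real account.
policy = build_trust_policy(
    "arn:aws:iam::123456789012:user/example-snowflake-user",
    "EXAMPLE_EXTERNAL_ID",
)
print(json.dumps(policy, indent=2))
```

The role itself still needs a separate permissions policy granting access to the specific S3 bucket; the trust policy only controls who is allowed to assume the role.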
I hope this explains how to securely implement connectivity between AWS and Snowflake without using PrivateLink.
Hi Bryan,
Thanks for your answer! I was thinking the same thing: if we are not storing PCI data or something requiring additional security, I don’t see the need for PrivateLink either.
I guess my concern is this: if we allow our VPC, with Matillion on EC2, outgoing internet access and use that to connect to Snowflake’s URL (which Snowflake has deployed in the same AWS Region), does sending data over that link constitute data egress from our account? Or is AWS’s internal network routing smart enough to know that both our VPC and the Snowflake instance we are connecting to are in the same AWS Region, and therefore not egressing data?
Hey @McDale, I asked that specific question when we were setting up our Snowflake environment about 1.5 years ago, and although the answer came across without a ton of confidence, Snowflake was pretty sure that there were no egress charges as long as the data stayed within AWS somewhere. Based on what I have seen on our bills, I would be inclined to confirm that statement. Egress is a little hard to see, as AWS doesn't break it down by what was moved, so it's sort of lumped together on the bill. We know we do have some egress going on, and comparing that known amount against the situations where data egress could happen between Snowflake and AWS, I would expect the number to be quite a bit higher if they were categorizing that data movement as egress.
Like you, our Snowflake account and company AWS account are in the same region. So another question for Snowflake may be whether there would be egress charges if those two accounts were in different regions. I sort of suspect there wouldn't be extra egress charges.
The whole principle of egress charges is about keeping data within AWS in general: the more data that lands and stays in AWS, the more compute is going to be required to use that data, which is where they make their money. If you're pulling that data out of AWS, then they are losing the possible consumption of compute for it, and therefore they charge you for the egress.
I hope this helps, Mike. If you have any concerns about this, I would ask your Snowflake rep, if you have one, as they should have a very solid answer to those questions by now.