I am building the downstream side of a data pipeline. Currently the data (JSON) is landed in S3, where some tokenization is done, and the data is then pushed to Snowflake via Kafka. During this process, the AWS Glue Data Catalog is updated for each data object (which is dynamic). I need to use the schemas from the Glue catalog in Matillion to load and flatten these JSON files in Snowflake. What are the options to do this?
Hi @jefferson.handa
Apologies, I have only just seen this post and appreciate it has been a while, but I wanted to share some feedback I received from our team here at Matillion.
The first option we would suggest is the S3 Load component in METL (Matillion ETL), which can load JSON files from an S3 bucket directly into a Snowflake VARIANT column. From there, transformation components such as Extract Nested Data can flatten the data inside Snowflake (a rough SQL sketch of what happens under the hood is below).
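For context, here is a minimal sketch of the kind of SQL those components generate behind the scenes. The stage, table, and column names (my_json_stage, raw_events, src, items) are hypothetical placeholders, not anything from your pipeline:

```sql
-- Hypothetical names throughout: my_json_stage, raw_events, src, items.

-- 1. Land the raw JSON into a single VARIANT column (roughly what S3 Load does).
CREATE TABLE IF NOT EXISTS raw_events (src VARIANT);

COPY INTO raw_events
  FROM @my_json_stage/landing/
  FILE_FORMAT = (TYPE = 'JSON');

-- 2. Flatten the nested structure (roughly what Extract Nested Data generates).
SELECT
    src:id::STRING            AS id,
    src:created_at::TIMESTAMP AS created_at,
    item.value:name::STRING   AS item_name
FROM raw_events,
     LATERAL FLATTEN(INPUT => src:items) item;
```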
If AWS Glue has to be used for another reason, Snowflake external tables (created over external stages) appear to be able to connect to AWS Glue catalogues. Matillion transformation jobs then work with internal and external Snowflake tables in exactly the same way (see the sketch after the note below).
However, because the data is not held in Snowflake, external tables will be slower to query.
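As a rough illustration rather than a definitive setup, a plain external table over an external stage looks like the sketch below. The bucket path, stage, integration, and table names are placeholders, and any Glue catalogue integration would be configured separately on the Snowflake side:

```sql
-- Hypothetical names: my_s3_integration, json_stage, ext_events, s3://my-bucket/landing/.
CREATE STAGE json_stage
  URL = 's3://my-bucket/landing/'
  STORAGE_INTEGRATION = my_s3_integration;  -- assumes an existing storage integration

CREATE EXTERNAL TABLE ext_events
  WITH LOCATION = @json_stage
  FILE_FORMAT = (TYPE = 'JSON')
  AUTO_REFRESH = FALSE;

-- External tables expose each record through the VALUE variant pseudocolumn,
-- so the same Extract Nested Data / FLATTEN approach applies afterwards.
SELECT VALUE:id::STRING AS id
FROM ext_events;
```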
I also wanted to attach some documentation to help with this:
Keep me posted if you need any further support on this.
Kind regards, Joe