I am building the downstream side of a data pipeline. Currently the data (JSON) is landed in S3, where some tokenization is done, and the data is then pushed to Snowflake via Kafka. During this process, the AWS Glue Data Catalog is updated for each data object (which is dynamic). I need to use the schemas from the Glue catalog in Matillion to load and flatten these JSON files in Snowflake. What are the options to do this?
Hi @jefferson.handa
Apologies, I have only just seen this post and appreciate it has been a while, but I wanted to share some feedback I received from our team here at Matillion.
The first option we would suggest is the S3 Load component in METL (Matillion ETL), which can load JSON files from an S3 bucket directly into a Snowflake VARIANT column. From there, transformation components such as Extract Nested Data can flatten the data inside Snowflake (a rough SQL sketch of what happens under the hood is below).
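For context, here is a minimal sketch of the kind of SQL those components generate behind the scenes. The stage, table, and column names (my_json_stage, raw_events, src, items) are hypothetical placeholders, not anything from your pipeline:

```sql
-- Hypothetical names throughout: my_json_stage, raw_events, src, items.

-- 1. Land the raw JSON into a single VARIANT column (roughly what S3 Load does).
CREATE TABLE IF NOT EXISTS raw_events (src VARIANT);

COPY INTO raw_events
  FROM @my_json_stage/landing/
  FILE_FORMAT = (TYPE = 'JSON');

-- 2. Flatten the nested structure (roughly what Extract Nested Data generates).
SELECT
    src:id::STRING            AS id,
    src:created_at::TIMESTAMP AS created_at,
    item.value:name::STRING   AS item_name
FROM raw_events,
     LATERAL FLATTEN(INPUT => src:items) item;
```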
If AWS Glue has to be used for another reason, Snowflake external tables (created over external stages) appear to be able to connect to AWS Glue catalogues. Matillion transformation jobs then work with internal and external Snowflake tables in exactly the same way (see the sketch after the note below).
However, because the data is not held in Snowflake, external tables will be slower to query.
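As a rough illustration rather than a definitive setup, a plain external table over an external stage looks like the sketch below. The bucket path, stage, integration, and table names are placeholders, and any Glue catalogue integration would be configured separately on the Snowflake side:

```sql
-- Hypothetical names: my_s3_integration, json_stage, ext_events, s3://my-bucket/landing/.
CREATE STAGE json_stage
  URL = 's3://my-bucket/landing/'
  STORAGE_INTEGRATION = my_s3_integration;  -- assumes an existing storage integration

CREATE EXTERNAL TABLE ext_events
  WITH LOCATION = @json_stage
  FILE_FORMAT = (TYPE = 'JSON')
  AUTO_REFRESH = FALSE;

-- External tables expose each record through the VALUE variant pseudocolumn,
-- so the same Extract Nested Data / FLATTEN approach applies afterwards.
SELECT VALUE:id::STRING AS id
FROM ext_events;
```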
I also wanted to attach some documentation to help with this:
Keep me posted if you need any further support on this.
Kind regards, Joe