Hi everyone, I am new to the Matillion ETL tool and I want to know how I can convert a CSV file into Avro format. The source platform might be Snowflake.

I am considering using Python for this, but I would like to hear from you whether there is another way to do it. Thanks in advance, Matillion community :)

Hi @Tony

Welcome — we're glad you're here!

Hey @Bryan​, this question made me think of you and your experience with CSV files, care to share what you know? Thanks in advance.

Again, glad to have you onboard, @Tony

Chika

PS. Feel free to share your best practices too!

Hi @Tony​,

Unfortunately, Matillion has nothing out of the box to do this conversion, so the Python approach you mentioned is going to be your best option. One thing I like to ask is: what are you trying to solve by converting a CSV to Avro? I ask because there may be other methods to get you to your end result.

It looks like you are using Snowflake, so I am making a general assumption that you want to load Avro files into Snowflake. You can load CSV files into Snowflake, but there are a few caveats with that. JSON tends to be the most flexible file type when loading data into a VARIANT column, and Snowflake gives you the most functionality for querying JSON natively, without having to transform it before use.
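To give a sense of what I mean, here is a rough, untested sketch of querying JSON sitting in a VARIANT column straight from Python with the snowflake-connector-python package. The connection details, the table name (my_json_staging_table), and the VARIANT column name (raw) are all placeholders, not anything from your environment:

    import snowflake.connector

    # Placeholder connection details -- swap in your own account info
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="my_password",
        warehouse="my_warehouse",
        database="my_database",
        schema="my_schema",
    )

    try:
        cur = conn.cursor()
        # Snowflake lets you drill into JSON in a VARIANT column with the
        # colon/path syntax and cast the result with :: -- no transform step needed
        cur.execute("""
            SELECT raw:customer.name::string AS customer_name,
                   raw:order_total::number   AS order_total
            FROM   my_json_staging_table
            WHERE  raw:status::string = 'SHIPPED'
        """)
        for customer_name, order_total in cur.fetchall():
            print(customer_name, order_total)
    finally:
        conn.close()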

Converting CSV to Avro is not super common, but there are a few Python modules that can help. I tend to lean towards pandas for most things, but as far as I know pandas has nothing built in for Avro, so unfortunately I wouldn't be of much help in this space.

Let us know what end result you are working towards and maybe we can help give you ideas on how to get there. Thanks for reaching out!

Thank you tons, @Bryan

Hey Bryan, thank you very much for your kind response. Yes, I am using Snowflake. The task I need to complete is to load Snowflake tables (which store data that came from CSV-format files) and then transform those dataframes from CSV format into Avro format. I have found several ways to accomplish that using the Python node: according to my research we can use the PyArrow library (from the Apache foundation) or the Dask library, among other alternatives such as pandas, Koalas, PySpark, etc. PyArrow and Dask give us a more flexible setup in terms of optimization, the ability to partition the data, and so on. This is an extract of the code using Dask (the Avro step itself is sketched just after this extract):

    # Read every CSV in the folder lazily as one Dask dataframe
    import dask.dataframe as dd
    df = dd.read_csv('./data/people/*.csv')
    # As written, this writes Parquet rather than Avro
    df.to_parquet('./tmp/people_parquet2', write_index=False)
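That extract actually writes Parquet, so for the Avro output itself I am planning to pair Dask with the fastavro library. This is only a rough, untested sketch; the schema, column names, and paths are placeholders for my real data:

    import dask.dataframe as dd
    from fastavro import writer, parse_schema

    # Placeholder Avro schema describing the columns in my CSVs
    schema = parse_schema({
        "name": "people",
        "type": "record",
        "fields": [
            {"name": "id",   "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "age",  "type": ["null", "long"], "default": None},
        ],
    })

    df = dd.read_csv('./data/people/*.csv')

    # Write one Avro file per Dask partition, keeping the partitioning benefit
    for i, part in enumerate(df.to_delayed()):
        pdf = part.compute()  # each partition becomes a small pandas dataframe
        with open(f'./tmp/people_{i:05d}.avro', 'wb') as out:
            writer(out, schema, pdf.to_dict(orient='records'))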

 

Thank you, @Bryan, I will appreciate your kind support. Thanks in advance.

That's the approach I would take if I had to do this myself. I am assuming you need the Avro version of the CSV file because you're loading some other system besides Snowflake, is that correct?

That is correct. Thanks very much, Bryan, for all your help.