Which component i can use

MC203075 · March 3, 2023, 6:46pm

Hi team,

In my current project i am having scala code. The scala code is all about it will take a file as input read each line , based on the data in each line it will generate new transformations and write to a new file. New file will have additional columns that are derived as part of transformations. I need to do the same thing in matillion . Can somebody help me how we can do this in matillion or even sf. Since snowflake is dwh i am using. I tried with snowflake stored proc. It is giving time out error. Since file size is big.

Michael · March 6, 2023, 3:44pm

The first step is to load your file to a Snowflake table. Depending on the file format and structure this can be out of the box with Matillion. E.g., for CSV or JSON files on S3 this is easily done with the S3 Load component.

If your file has a custom format, you might need to transform it first.

Once the data is loaded to Snowflake, you can apply standard SQL or a Matillion Transformation Job to transform your data. You'll have to "translate" your Scala code to SQL. With that, it should also be feasible to process larger volumes of data.

MC203075 · March 6, 2023, 6:01pm

Hi Michael,

I have done the samw thing. Loaded the file into table and from table i am iterating each line to transformed as per the requirement. Each line iterating is the issue here in snowflake. My file size is 15gb . Thats the problem. I am using large warehouse to process it. After 30 mins run i was able to iterate 30k records which is very expensive for 15gb file and i will get timed out. Any suggestion on the iteration part will highly appreciated. I am struggling to process it.

Michael · March 7, 2023, 8:28am

If you want to work with a SQL based database like Snowflake you have to stop processing the data with "line by line" iterations. SQL works on sets and you should consider this when designing your transformation processes.

Example: If you have a table with integer columns a and b want to calculate a + b, you don't apply the calculation to line1, then line2, then line3, ... Simply write SELECT a+b FROM table and the calculation is applied to all the lines/rows in the table.

Topic		Replies	Views
Is the a transform transaction entirely on snowflake compute and storage or does matillions instance do any of the work? Matillion ETL	1	0	August 19, 2022
Dynamically generate column list from incoming source file Matillion ETL	21	2	March 31, 2022
Hello All, In Matlllion ETL we have Database Query component, Which stages the data, onto S3 Storage, So overthere i could see that it doesnt have any file format(csv)etc ? If so How does it Load into Snowflake ! Can anyone throw some light on this? Matillion ETL	2	1	February 1, 2022
How to use GCP and Python in matillion Matillion ETL	1	2	December 28, 2023
I am trying to get data from api which returns the data in csv format. How can I load this data using Matillion into snowflake since API query component cannot be used. What are the best possible options and solution to the option Matillion ETL	10	4	September 9, 2021

Which component i can use

Related topics