Load filename along with data using Azure Blob Storage Load

Hi All,

I have an ETL, which extracts data from XML files which I am staging in Azure Blob Storage and loads into Snowflake tables. I am using Azure Blob Storage Load and SQL Query components to achieve this.

As there are multiple files in blob storage, I want the filename also to be loaded along with the data in Snowflake tables.

Please help me in achieving this requirement?

 

The job runs successfully but now I want filename as well to be loaded in the tables.

 

Below is the snapshot of components I am using currently:

 

Thank you

Hey @SR018652​ , if you want to ingest the filename as well then I would recommend using a SQL component instead to leverage a Snowflake COPY INTO statement. You will need to enter your field names etc manually with $ references to their ordinal position, but with this approach you can also query system fields like METADATA$FILENAME and METADATA$FILE_ROW_NUMBER. You can read more about this on Snowflake's documentation here.

So you may end up with a statement such as the following (apologies for badly formatted SQL in this window):

COPY INTO table1(filename, file_row_number, col1, col2)

FROM (

SELECT

METADATA$FILENAME,

METADATA$FILE_ROW_NUMBER,

t.$1,

t.$2

FROM @mystage1/data1.csv.gz (file_format = > myformat) t

);

Thank you, Chris. I could see the file names in a table. But because of how I am receiving my source files. This process is doesn't result in correct final data.

 

Is their any way to store the variable values, which in my case store Filename, from the iterator component to a Snowflake table?

 

Thanks.

Shivroopa

Absolutely you can, but you will still need to leverage a COPY INTO statement via a SQL table. You can then simply enter any Matillion variables into your SQL and store them as strings. For example, consider the following SQL:

COPY INTO table1(matillion_variable_a, matillion_variable_b, filename, file_row_number, col1, col2)

FROM (

SELECT

'${MATILLION_VARIABLE_A}',

'${MATILLION_VARIABLE_B}',

METADATA$FILENAME,

METADATA$FILE_ROW_NUMBER,

t.$1,

t.$2

FROM @mystage1/data1.csv.gz (file_format = > myformat) t

);