Column metadata from an Excel file with the Excel query component

Hi community!

 

I've got a metadata driven data warehouse as in there is some logic that creates tables based on what is available in a data table. I.e. I read metadat from table X so I know what fields are in that table. Based on this the logic will create a target table.

 

Now i've got this Excel file but I do not know if it's possible to somehow read columns inside this file.

Has anyone got any suggestions?

Hi @j.ouwehand1606738637880​ ,

There are several ways to do this. The scenario you choose to use will depend on your situation and your skill level in certain areas. The easiest out of the box solution would be to leverage the Excel Query component. It allows you to read the data from a worksheet in the Excel workbook and create a table from it. Thus, you get all columns in a table based on what is in the Excel sheet. If you decided that you wanted make some changes to those columns, sort the data differently, aggregate it, etc. you can then use a transformation job or a SQL Script component in an orchestration job to "SELECT INTO" a new table from the table that was just created.

If you are comfortable with Python, I would encourage you to change out the Pandas module. There is a read_excel method that will allow you to read a worksheet from an Excel file into a dataframe where you can get the header information.

You could also convert your Excel file into a CSV file (manually or using Python) and then load your CSV file into a Snowflake table that has a variant column as a whole. From there you would create your SELECT INTO statement based on that. (That's assuming you are using Snowflake)

As a side note, the metadata you are referring to is actually the schema. The schema driven approach you mentioned is common throughout any data warehouse. What I am saying is, don't feel like you are alone with this type of effort. Your scenario is very common for most of us.

I hope this helps you! If you feel like you want to tackle this from a different approach than the ones I mentioned or need more detail please feel free to reply and I am sure someone will jump in and help where they can. Thanks for posting!