Hi All,
I have a challenging problem that I’m attempting to solve with DPC.
I’m using the Excel Query component to import a data file from an S3 bucket. The problem is this DPC component requires the XLSX file to be saved using Strict Open XML format. The source system generates this XLSX file as a normal Excel file. There are a lot of null fields in this source data and if I try using the normal Excel file it swizzles the columns when loading into the Snowflake destination table.
The task is to open the XLSX file that is in S3, then resave with Excel’s Strict Open XML format.
I started writing some PythonPushdown code to do this but I’m no Python genius and this is proving to be quite challenging.
Has anyone attempted something like this in DPC or using a different tool? I’d love to hear how you were able to accomplish something like this.
Thanks,
Mick
Hi Mick,
Thank you for your post, I have spoken with the team about this and there are a few things to note.
Firstly you cannot receive a strict type outside of a windows environment, this is due to how Microsoft licence.
An option would be to use a lambda to convert to CSV.
-
Snowflake’s Snowpark Python runtime does not natively support Excel libraries like openpyxl or xlsxwriter, especially not with Excel-specific saving formats.
-
Excel’s Strict Open XML format is a Microsoft-specific save option, and no open-source Python libraries can convert to this format with 100% fidelity—only Excel desktop or Microsoft Graph API can do it properly.
If you need any further support on this, let me know, and I can help arrange this.
Kind regards, Joe
Hi Joe,
Thank you for this message. I also reached out to Zach Ennis yesterday and we are still discussing this. I am still searching for a solution.
I am curious what is meant by “Office Open XML Excel sheet” as documented in the Excel Query component documentation.
Thanks,
Mick
Hi @mick.jaeger
No worries, An Office Open XML Excel sheet is a different standard format. There is some further information on this here.
Thanks Joe