Is it possible to load a file (from Azure Blob Storage) into Snowflake as a single varchar, or a single varchar per line?
My situation is that a source system provides data in a CSV file with unenclosed delimiters.
Example file illustrating the unenclosed delimiter character (the comma inside the amount `$123,456.78` is not quoted):

```
ID, name, amount, date
001,Tammy,$123,456.78,2012-12-12
```
Because this kind of malformed CSV is ambiguous, I don't expect a CSV handler to fix it on its own. We've written a script that solves the issue and could run it as a pre-processing step, but I'm interested in an ETL-style approach that handles the issue within Snowflake/Matillion.
Thank you in advance for your advice and support!
Hi @Brendan,
The first solution that comes to my mind is to use the regular CSV load process with an empty delimiter (if that's even possible) or a delimiter that doesn't exist anywhere in your input data. The idea is to treat every full line in the CSV as one attribute: define the target structure as a single VARCHAR column of maximum length, and the load should then give you one row per input line with the complete content.
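To make it concrete, here's a minimal sketch of the idea; the stage name and file path are placeholders, and I'm assuming `~` never appears anywhere in your data:

```sql
-- Sketch only: @my_azure_stage and source.csv are placeholder names.
CREATE OR REPLACE TABLE raw_lines (
  line VARCHAR  -- VARCHAR defaults to the maximum length (16 MB)
);

COPY INTO raw_lines
FROM @my_azure_stage/source.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '~' SKIP_HEADER = 1);  -- '~' must not occur in the data
```

From there you can pull the fields apart inside Snowflake with SPLIT_PART or REGEXP_SUBSTR.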
Let me know if that works for you. Looking forward to hearing about the results.
Michael
Hey @Brendan, my suggestion of a VARIANT field is purely for the case where your data may not strictly be a string. For example, it could be a semi-structured construct such as a JSON object. VARIANT is the basic fallback for holding any type of data before formatting, and you can convert from a VARIANT to a string or to any other Snowflake data type.
Other than the data type of the destination field being a STRING or a VARIANT, nothing else changes here. Row order would still be maintained in the same way as when ingesting into a STRING; you would simply have a VARIANT field for every record of data instead of a STRING.
For bonus points, if it's useful, you can also include METADATA$FILE_ROW_NUMBER in your ingestion as another field, so the loaded data can easily be compared with the original file. You can pair this with METADATA$FILENAME.
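A hedged sketch of what that ingestion could look like (the stage, path, and table names are placeholders, not your exact pipeline):

```sql
-- Placeholder names throughout; illustrates the metadata columns.
CREATE OR REPLACE TABLE raw_lines (
  file_name VARCHAR,
  row_num   NUMBER,
  line      VARIANT
);

COPY INTO raw_lines (file_name, row_num, line)
FROM (
  SELECT METADATA$FILENAME,         -- staged file the record came from
         METADATA$FILE_ROW_NUMBER,  -- row number of the record within that file
         TO_VARIANT($1)             -- the complete line as a VARIANT
  FROM @my_azure_stage/source.csv
)
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = NONE);
```

Ordering by `file_name, row_num` then reproduces the original file order regardless of how the load was parallelized.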
Hi @Brendan,
As Michael suggested, the idea is to treat every full line in the CSV as one attribute by loading with a delimiter that doesn't occur in your input data. You can create a table with a single column using the VARIANT data type:
```sql
CREATE TABLE csvload (
  data VARIANT  -- each full CSV line lands here
);
```

A VARIANT value can hold up to 16 MB, so a complete CSV line fits easily in a single column value.
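If it helps, a sketch of the matching load step (the stage and path are placeholders, and `~` stands in for any character that never occurs in your data):

```sql
-- Placeholder stage/path; each full line is cast to VARIANT on the way in.
COPY INTO csvload
FROM (
  SELECT TO_VARIANT($1)
  FROM @my_azure_stage/source.csv
)
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '~' SKIP_HEADER = 1);
```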
cheers
Subash S
Michael is correct here. A minor clarification: you can specifically set the FIELD_DELIMITER in your Snowflake file format to `NONE`.
Depending on your use case, you may also find it more helpful to ingest the data into a VARIANT column instead of a STRING, but that's completely your choice.
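For reference, a sketch of that setting as a named file format (the format name is made up):

```sql
-- Hypothetical format name; FIELD_DELIMITER = NONE loads each full line as one field.
CREATE OR REPLACE FILE FORMAT one_column_csv
  TYPE = CSV
  FIELD_DELIMITER = NONE
  SKIP_HEADER = 1;  -- assumes the header row shown in the sample file
```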
Thank you, Michael and Chris! Michael had the right path forward, and Chris's configuration suggestion was right on point.
Using the Azure Blob Storage Load component, I was able to set the field delimiter to NONE and handle the files' formatting within Matillion transformations on Snowflake. IT will not have to hear from me this week about their file formatting!
I'm not sure how VARIANT processing would be different. If I loaded this file in as a single VARIANT, would the row order be maintained? Set-based manipulation was sufficient for my needs, but I could see a benefit to using a VARIANT if order is maintained.