How can we set the PATH in the Azure Blob Storage Load component? Here is what I want to achieve:

COPY INTO ... FROM @"DEV_STG"."PUBLIC"."STAGE"/XX/IN PATTERN = '.*CONNECTIONS.*'

I got it working using the PATTERN option instead of the PATH. From a functional perspective this works the same, but performance is far worse: ADLS simply seems to be much less efficient when filtering with the PATTERN clause than when given a path directly. I did a simple test with the LIST command to demonstrate:

LIST @"DEV_STG"."PUBLIC"."STAGE"/XX/IN PATTERN = '.*CONNECTIONS.*'

-- <1 second

;

LIST @"DEV_STG"."PUBLIC"."STAGE" PATTERN = 'XX/IN/.*CONNECTIONS.*'

-- >60seconds

;

The output is the same. When our ADLS was really small this wasn't an issue, but once it grew to thousands of files (in other directories than the XX example!), performance suddenly became much worse.

The option to supply a relative path (as in the External Table component) seems to be missing from the Azure Blob Storage Load component in Matillion, or am I missing something? For reference, this is the COPY statement currently being generated:

COPY INTO "DEV_STG"."PUBLIC"."STAGE_CSV_FILE" FROM

@"STAGE"

PATTERN='.*'

FILE_FORMAT= (

FORMAT_NAME='"FF_CAR"'

)

ON_ERROR='ABORT_STATEMENT'

PURGE=FALSE

TRUNCATECOLUMNS=FALSE

FORCE=FALSE

Hi @BATENBURG,

There are a couple of ways to accomplish this, but a lot of it hinges on the folder structure of your Blob Storage. As you have probably seen, load performance is better if you can pare down the folders in your Blob Storage you are loading from and the quantity of files in those folders. If the folder structure in your Blob Storage follows best practices, then this could be fairly easy to implement.

The best practice is to partition your storage into folders based on some logic. The typical design is some variation of /YYYY/MM/DD/HH/MM/SS. This type of structure can speed up loads considerably as your data grows over time, while giving you flexibility around what data is loaded or reloaded. For instance, if you want to go back and load or reload all data from a specific day or hour, the folder path gives you a natural way to do that.
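To make that concrete, here is a minimal sketch of how a daily partition translates into a stage path prefix that a load (or reload) can target directly; the XX/IN prefix and the date are purely illustrative:

from datetime import date

# Illustrative only: build the stage path prefix for one day's partition,
# so a load or reload can target just that folder instead of scanning
# the whole container with a PATTERN.
day = date(2021, 6, 15)
prefix = f"XX/IN/{day:%Y/%m/%d}/"
print(prefix)  # -> XX/IN/2021/06/15/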

We use the above pattern. On the Matillion side, we generate a list of paths to load from using string manipulation in Python and store those paths in a grid variable. From there we use a Grid Iterator tied to the Load component, and on each iteration we pass the path into the load component. The load time never grows or shrinks, because it is always the same number of folders being loaded and the file count is similar each day. A sketch of that Python step is below.
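A minimal sketch of that script, assuming it runs in a Matillion Python Script component (which injects the context object) and that a grid variable named load_paths with a single path column already exists in the job; the XX/IN prefix and the three-day window are illustrative:

from datetime import date, timedelta

# Illustrative window: load the three most recent daily partitions.
days_back = 3
today = date.today()

# One row per folder path; each row is itself a list because grid
# variables are written as a list of rows.
paths = [[f"XX/IN/{(today - timedelta(days=n)):%Y/%m/%d}/"]
         for n in range(days_back)]

# context is provided by Matillion's Python Script component;
# this writes the rows into the "load_paths" grid variable.
context.updateGridVariable('load_paths', paths)

A Grid Iterator bound to load_paths then calls the Azure Blob Storage Load component once per row, passing the path in, so each load only ever scans a handful of dated folders.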

This may not apply directly to your situation, but hopefully it gives you an idea.