Is there a way to read SFTP file names in Matillion DPC?

Hello!

 

I'm receiving CSV files from a third-party vendor onto our SFTP server on an daily basis. In METL I typically pull the file names via a file iterator however SFTP is not an option for Input Data Type on the File Iterator component.

 

Is there a way to read SFTP file names to a grid variable in Matillion DPC? This would allow me to dynamically ingest files via the Data Transfer component to my CDW but right now that does not seem feasible.

 

Any and all input is appreciated.

 

Thanks you!

Hello @jason.hoehn

Thank you for your post and welcome to the community, I wanted to share with you this idea, once implemented this will allow you to achieve this and more.

It is currently under review and we should have an update on it soon so please up-vote and subscribe and you will be informed.

Kind regards, Joe

Hello there, @JoeCommunityManager​!

It's good to see that the File Iterator component will be receiving an enhancement for SFTP data types in due time. That being said for the meantime my solution was to write a Python UDF in Snowflake to interface with the SFTP bucket and call that UDF in Matillion. It was a bit more of a lift but achieved exactly what I was looking for.

Cheers,

Jason Hoehn

Hi @jason.hoehn

Thanks for sharing your workaround that's amazing! It's much appreciated :)

Kind regards, Joe

@jason.hoehn​ Can you share what your python script was?

While Matillion should deliver this functionality by the end of the month, I need the capability now, but I was not able to get python to work. Did you have to install libraries in the hybrid agent for it to work as I have not gone down that path yet.

CREATE OR REPLACE FUNCTION <database>.<schema>.<function_name>("SFTP_HOST" VARCHAR(16777216), "SFTP_DIRECTORY" VARCHAR(16777216))

RETURNS VARIANT

LANGUAGE PYTHON

RUNTIME_VERSION = '3.8'

PACKAGES = ('snowflake-snowpark-python','pysftp')

HANDLER = 'get_files'

EXTERNAL_ACCESS_INTEGRATIONS = (<external_access_integration>)

SECRETS = ('cred'=<database>.<schema>.<secrets_name>)

AS '

import _snowflake

import pysftp

from snowflake.snowpark.files import SnowflakeFile

def get_files(sftp_host, sftp_directory):

   

  sftp_cred = _snowflake.get_username_password(''cred'');

  sftp_host = sftp_host

  sftp_directory = sftp_directory

  sftp_username = sftp_cred.username

  sftp_password = sftp_cred.password

 

  cnopts = pysftp.CnOpts()

  cnopts.hostkeys = None

 

  try:

    with pysftp.Connection(host = sftp_host, username = sftp_username, password = sftp_password, cnopts = cnopts) as sftp:

      if sftp.exists(sftp_directory):

        sftp.chdir(sftp_directory)

        files_list = sftp.listdir()

        ret = []

        for file in files_list:

          ret.append(file)

    return files_list

  except Exception as e:

    return f" Error with SFTP : {e}"

';

Script is above. Let me know if you need the code for the external access integration & secrets as well.

 

Once I got this setup I called the UDF via query results to scalar for further use downstream in Matillion.

This is great. Thank you.

 

Yes, how the external access integration and secrets work would be beneficial. I've been using boto3 for AWS secrets in the Matillion python component but have not used Snowflake python based functions so I am still learning that side.

Hello! Apologies for the delayed response here. Below is the code I used to create the Secret & External Access Integration. You also need to setup a Network Rule that enables outbound traffic to the SFTP bucket. I've included that code below as well.

 

-- Network Rule

create network rule if not exists <database>.<schema>.<network_rule>

  type = HOST_PORT

  mode = EGRESS

  value_list = ('<host_port>')

  comment = 'Network rule that allows Snowflake Matillion users to access the SFTP host for <sftp_source>.';

 

-- Secret

create secret if not exists <database>.<schema>.<secret>

  type = PASSWORD

  username = '<username>'

  password = '<password>';

 

-- External Access Integration

create or replace external access integration <eai_name>

  allowed_network_rules = (<database>.<schema>.<network_rule>)

  allowed_authentication_secrets = (<database>.<schema>.<secret>)

  enabled = TRUE;