SQL Script component multiple statements session issue

I want to ask if anyone has the same problem or was able to use an workaround for this.

 

The SQL script component allows to execute multiple scripts by separating them using ";"

This is useful if you want to execute something sequentially and it is one unit of work so you want to keep it inside one component.

I expected that I can use multiple statements inside this component for session based features.

So e.g. temporary Snowflake tables (https://docs.snowflake.com/en/user-guide/tables-temp-transient.html)

This kind of table only exists during the session runtime. I needed an actual table instead of an result set due to another issue.

What happened was that sometimes my script could not find this temporary table.

The issue seems to be that Matillions connection pooling can use different connections/sessions for the statements inside the SQL.

I was able to solve this by creating an transient table at the start and dropping this table at the end of the script.

 

But now I am facing another issue.

We have workloads that require other warehouse sizes.

The current solution is scaling up our default warehouse and scaling it down afterwards. This has some flaws. With multiple jobs scaling the warehouse, it might happen that one job is scaling the warehouse down but the other jobs still needs the larger one.

It also has the issue that during the time we need this scaled warehouse some smaller jobs will use this warehouse and will generate more costs due to the horizontal scaling.

 

So what we want to do is to use warehouses with different sizes instead of scaling one warehouse up and down. e.b. MATILLION_SMALL, MATILLION_LARGE

 

A lot of our business logic is inside sql script components (if this is good or not is another discussion).

The first problem is that this component has no parameter to select the warehouse it should use. My colleague already created an idea for this (https://metlcommunity.matillion.com/s/idea/0874G000000kBoVQAU/detail)

I thought well I can use the statement "USE WAREHOUSE XY;" before executing my script.

But with this session/connection pooling issue in mind this will result in random selection of warehouses when multiple jobs are running at the same time.

 

So you have to use the default warehouse when using the SQL Script Component? Any ideas how to fix this?

 

I created an Idea for this issue:

(https://metlcommunity.matillion.com/s/idea/0874G000000kBpYQAU/detail)

 

My personal opinion is that the SQL Script component should not use this session pooling and the expected behavior is that everything is running in the same session.

But maybe this is only my personal opinion and this has and good reason?

 

 

 

Hi @Nils

Thank you for your post, the level of detail you have given is amazing. Not surprisingly it has really ignited a conversation amongst the @MatillionProductTeam

We are just forming a response to you now and I will be sure to share an official update here with you and the wider community in the next coming days.

However, I would love to hear the thoughts of any other Community members.

Kind regards, Joe

Hi @JoeCommunityManager​,

The above mentioned issue with session pooling has created issues for one of the workloads I have. I have iterators with concurrent execution of an orchestration job for multiple datasets. The orchestration job has a sql script component in which I am trying to set a query tag for the session to associate it with the query within the component( the tag should be different for each dataset within the iterator) as the SQL component does not return the query id associated with the query executed. With concurrent executions, this session query tag need not necessarily get associated with the query we are executing within the component as it can get reset with some other parallel executing job for the iterator. It will be really useful if the set of queries within a SQL component uses the same session for execution. The same issue has happened with the session variables as well if the iterator execution pattern is set to concurrent. With sequential execution, it just works fine.

Hi @Nils@Tinju

And again thank you for such a thought-provoking thread, I have spoken with the Matillion Product team who also enjoyed discussing this subject too.

They had advised me that a method to resolve this is if the SQL Script component is contained within a transaction (Begin, Commit) then all the statements should be executed on the same connection.

Please give that a try and let me know how it goes, I look forward to hearing back from you.

And of course please head to @Nilsidea and up-vote and share any feedback and suggestions on this for the product team.

Kind regards, Joe

I just saw this 2-year old topic, and noticed that the idea was turned down. We are in the process of evaluating Matillion before buying. It's a big concern for my team because we are using temporary tables heavily in our SQL script files. If we are required to re-write them to avoid query failures for "missing" temporary tables, I am not sure if we still want to buy it.

Just to expand on this @Nils​ we mean the "Begin" and "Commit" components rather than coding those statements into the SQL which tends to cause unintended side-effects.

I think this workaround heavily depends on the use case.

Adding the transaction for every SQL Script component won't work because multiple transactions can not run in parallel.

E.g. I have one orchestration job that is starting four other Orchestration Jobs in parallel and inside these jobs I have these Begin/End Statements.

I currently get the error "A transaction is already in progress."

This can be useful to avoid issues when the user expects another outcome but in this case this feels strange again.

On the other hand if I build this transaction Begin/Commit around my outer Orchestration job that is starting these four SQL Script Components in parallel than they will be executed sequentially. So all the four different scripts are not running in parallel anymore.

What I want to do is to run these scripts in parallel but inside these scripts everything should be running in one session.

So this workaround will work with single jobs only or with an massiv performance loss when used for multiple scripts at the same time.

Splitting all jobs into single jobs will cause issues because of the dependencies with upcoming jobs.

But like mentioned at the start this will depend on the use cases. For heavy usage of the SQL Script components with a lot of dependencies and performance requirements this won't work but it might be helpful for other customers.

@Tinju​ I think for your use case you will face the same issues regarding the restriction on parallel running transactions?

I agree with @Nils​. I have the same use case described by Nils where the outer orchestration job calls several jobs concurrently and with Begin, Commit components, I am getting the error - A transaction is already in progress.

While considering this scenario related to session where jobs need to be run concurrently, there are two requirements.

  1. The scripts within each orchestration job should run within a single session
  2. The session used by one job should not be shared by another job running in parallel. This can reset/alter the session variable or the query tag which we are trying to set unique to each job.

I can get the intended results by running the jobs sequentially, but as Nils mentioned it comes at the cost of performance.

Also to add to the above, if Matillion could return the query Id associated with the query we run in a SQL component( assume we run only one query in a SQL component), I could have used this id instead of associating a query tag to the query. This may be different from Nil's use case, but I am sure there are quite a few use cases out there which can take advantage of the query id.

Thank you for your further feedback and detail, I cannot stress enough how helpful this will be for the product team.

I have highlighted this again to them and also looped in with the product owner who has taken ownership of @Nils​ idea.

Upon discussing this post and the idea with him, it does look like there are some factors causing this and the team is already working on fixing them of course, the team is looking at the functionality raised in the idea.

I encourage anyone that wishes to see such functionality added to the product to please head to https://metlcommunity.matillion.com/s/idea/0874G000000kBpYQAU/detail and up-vote it and feel free to leave further comments and use cases.

Thank you again to you both for such an interesting conversation and again for raising an idea that will help improve the product and users' experiences.

Kind regards, Joe