Is it possible to join two tables using grid variables as the join columns?

I'm attempting to find a way to compare two tables dynamically, using grid variables containing column names as the join conditions. Is this possible?

 

So far, the only solution I've seen is using python to read from the grid variables and generate a SQL query from a template, but I'm trying to avoid that if possible.

 

The goal of this job is to compare two tables and determine the different rows between them, then bucket the rows accordingly (which exist in one but not the other, which ones are new, etc). Though, this job will be used across many table sets, and so the column names are going to be different each iteration, and I need the comparison to account for that

Hi @anthony​,

There are a few ways to handle this comparison. It's a bit complicated but I can throw out one scenario that might make it a bit easier. I am making the assumption that you are using Snowflake but this approach should work with the other CDWs.

I like where you are headed with the grid variable idea. I think the approach still applies but for something like this, it might be worth leveraging a table. It's usually best to leverage your CDW as much as possible as it has the most power.

Here is the though I have... Create a table that has 2 columns. Put the tables you are comparing against in that table. (i.e Table A will be compared to Table B). Create an orchestration that iterates over the table with the mappings and on each iteration run a series of queries over them.

Assuming the column names are the same, I would query information schema for the columns in each table and compare them. You can save those results to a transient or temporary table to be used later. Use a Query to Grid component to run another query against the information schema to generate the comparison SQL statements. From there you should be able to iterate over the grid variable and run each query and store the results to a transient table or to a file if necessary.

I know this is high level but hopefully the approach I explained helps spark an idea or gives you some place to start. Please feel free to post back if needed. It's a complicated process and not easy to do in any CDW.

Hey @Bryan​ , thank you for the response! We are using Snowflake, yes.

What I found was that I able to move forward on this by using query to grid variables to assign metadata to grid variables in the necessary format to facilitate a dynamic comparison later on using the detect changes component.

I like the ingenuity! This is definitely nothing to take away from your work or the Matillion product because it makes sense and gives you what you want but I do have some general advice based on long running experience. Try to leverage Snowflake as much as possible and try not to rely so heavily on specific functionality (Grid Variables) in the tool (Matillion). This is nothing against Matillion or any other specific tool but Data Warehouse solutions don't ever go away but many times companies want to move to different ELT products or perhaps use 2 or more. Building a high level of technical dependency around specific functionality makes it really hard to move to another tool or solution later.

 

Again, great solution and dev work to get you to your goal!