How to pass grid variable data to a sub-job being called from the main job

Hi,

Is there a way I can filter the data of a grid variable when setting it for a grid variable in the sub-job?

Here is what I am trying to do:

I have a static table with table names (from which duplicates need to be removed) and comma-separated key columns. I am reading this data into a grid variable and iterating over the table_name column from this grid variable. I now need to send to the sub-job only the grid row whose table name is currently being passed by the grid iterator.

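To make the intent concrete, here is a rough plain-Python sketch of the per-table filtering I am after (the data and function names are made up for illustration; they are not Matillion APIs):

```python
# Hypothetical contents of the grid variable:
# (table_name, comma_separated_key_columns) rows.
grid_rows = [
    ("CUSTOMERS", "CUST_ID"),
    ("ORDERS", "ORDER_ID,ORDER_LINE_NO"),
]

def rows_for_table(rows, table_name):
    """Return only the grid row(s) for the table currently being iterated."""
    return [r for r in rows if r[0] == table_name]

# The iterator visits each table name; only that table's row
# would be passed into the sub-job's grid variable.
for table_name, _ in grid_rows:
    subset = rows_for_table(grid_rows, table_name)
```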
In my sub-job I have the rest of the steps to take care of the duplicate check and rejections.

Can anyone please advise how I can do this?

Hi @Pallavi,

I am wondering if another approach will work better for removing duplicates. Everything you mentioned about removing duplicates and dealing with comma separated key columns has my mind drifting towards a transformation job instead of an orchestration job. Have you tried a transformation job that would read from the table, remove the duplicates, and then write the changes back? I would think that it could look something like this:
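In plain-Python terms, the read-dedupe-write idea sketches out roughly like this (a hypothetical illustration, not the actual Matillion components):

```python
# Hypothetical stand-in for the transformation job: read all rows,
# keep the first occurrence of each, and "write back" the result.
source_rows = [
    ("A", 1),
    ("A", 1),  # exact duplicate
    ("B", 2),
]

def remove_duplicates(rows):
    """Keep only the first occurrence of each row, preserving order."""
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

deduped = remove_duplicates(source_rows)
```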

I hope this helps!

Hello Bryan,

Thank you for your response. A few points that the above design does not address:

  1. The duplicates can be row-level or key-level, so instead of using Distinct (where no key is required), I am using Rank.
  2. I need to "reject" the records from #1, which the business would want to see before proceeding.
  3. I need a generic job, which takes a parameterized source table (Snowflake), the columns and keys (through a grid variable), and a target file.
  4. So my transformation job is: source table <table param> -> Rank -> Filter1 -> STAGE_<tblparam> || Filter2 -> Reject_<Tableparam>

In order to read the keys through the grid variable, I have the above question.

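For reference, the Rank -> Filter1/Filter2 split in point 4 can be sketched in plain Python (a hypothetical illustration, assuming the key columns arrive as a comma-separated string, as in my static table):

```python
def rank_split(header, rows, key_columns):
    """Rank rows per key: rank 1 is staged, later ranks are rejected.

    A full-row duplicate is also a key duplicate, so this covers both
    the row-level and key-level duplicates from point 1.
    """
    keys = [c.strip() for c in key_columns.split(",")]
    key_idx = [header.index(k) for k in keys]
    seen, staged, rejected = set(), [], []
    for row in rows:
        key = tuple(row[i] for i in key_idx)
        if key in seen:
            rejected.append(row)   # would land in Reject_<table>
        else:
            seen.add(key)
            staged.append(row)     # would land in STAGE_<table>
    return staged, rejected

header = ["ORDER_ID", "LINE_NO", "AMOUNT"]
rows = [
    (1, 1, 10.0),
    (1, 1, 10.0),  # row-level duplicate
    (1, 2, 5.0),
    (1, 2, 7.5),   # key-level duplicate (same keys, different amount)
]
staged, rejected = rank_split(header, rows, "ORDER_ID, LINE_NO")
```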
Regards,

Pallavi

Hi @Pallavi,

I don't have a way to filter a data grid, but for number 3 above - "I need to have a generic job, which takes a parameterized source table (Snowflake) and the columns and keys (through a grid variable) and a target file" - you could use the Table Iterator component and select the unique values from the table holding the table names and column list, skipping the step of loading into a grid variable. If you still want to use a grid variable, you can use the Grid Iterator component to spin through the grid variable and call the sub-job for each set of values. Something like:

@Bryan @DataGuy Thank you for the responses! So, I have been able to build my transformation job (generic in nature), which I am calling for each table_name + primary_keys combination driven through a static staging table (screenshot attached).

My transformation job is a generic job which runs for each of the tables that I add to the static table (screenshot of my transformation job attached).

But in the process of running this job for multiple tables, I see a weird constraint. In the "Table Input" component, the column names are passed through a grid variable, which has to have a default value for validation. So far so good. The validation process also required me to set default values for the parameters I pass from the calling orchestration job. But when I run the job for a table which does NOT contain the default value that I pass into the grid variable from the calling orchestration job, my job aborts with "Parameter error: unrecognized column <default value>" in the Table Input component. (Screenshot attached.)

Would any of you be available for a quick call to go over the issue that I am facing?

Regards,

Pallavi

I was heading down the path of @DataGuy, but it sounds like to get there you need to disregard or capture those duplicates in a separate dataset. I am failing to fully understand the big picture and what problem you are trying to solve.

Could you perhaps give us an example of the dataset that contains the two different duplicate scenarios? It would be good to know how this dataset plays into the bigger picture and what functionality it drives in your process.

Thinking about numbers 1 and 2 on your list of points: through Snowflake functions like SPLIT_TO_TABLE and the DISTINCT keyword, you should be able to get a list of dedup'd rows and dup'd rows, which would allow you to process them independently of each other. Again, I am not sure where the dup and dedup rows play into calling sub-orchestration or sub-transformation jobs.
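As a rough Python analogue of what SPLIT_TO_TABLE plus DISTINCT would produce in Snowflake (the table and column values below are made up for illustration):

```python
# Control rows: table name plus comma-separated key columns.
control = [
    ("CUSTOMERS", "CUST_ID, REGION"),
    ("ORDERS", "ORDER_ID"),
    ("ORDERS", "ORDER_ID"),  # duplicate control entry
]

# "SPLIT_TO_TABLE": explode each key list into one (table, key) row per key.
exploded = [
    (table, key.strip())
    for table, key_list in control
    for key in key_list.split(",")
]

# "DISTINCT": collapse duplicate rows.
distinct = sorted(set(exploded))
```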

Perhaps a set of screenshots that show what you are currently doing may help?

Just made an observation - if I make my grid variables shared and run the iterator concurrently, all instances run successfully. Which confuses me even more!

Help! Help! Help!

Regards,

Pallavi

You can use the Query Result to Grid component to filter out the duplicates from your table list. Once you have filtered it, you can pass it on to the grid variable and then iterate through it with the Grid Iterator. Hope this helps.
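In plain-Python terms, the effect would be something like this (a hypothetical sketch of the behaviour, not the actual component internals):

```python
# The raw table list may contain duplicates; keep first-seen order,
# then iterate once per unique (table, keys) row, as the Grid Iterator would.
raw_list = [
    ("T1", "K1"),
    ("T1", "K1"),  # duplicate entry to be filtered out
    ("T2", "K2"),
]
unique_rows = list(dict.fromkeys(raw_list))  # order-preserving dedupe

visited = []
for table_name, keys in unique_rows:
    visited.append(table_name)  # each pass would call the sub-job once
```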