I already have a job flow that is working, but it isn't very efficient. (My row count also isn't working, but that's a separate post.) Here is the flow, described below along with a screenshot:
Calling Salesforce API and loading raw data into table A.
Populating table B from table A and adding our three standard columns (create date, created by, and job id).
Then I update the data for the standard columns with the Metadata Update step on table B.
Then I write the audit data.
I'd rather not have two tables; in this flow I'm not really even using the awesome merge logic from the Salesforce Incremental Load component. I tried adding these standard columns to the first step that loads table A, but since they don't exist in the source table, they are dropped when I execute the job.
Note: I am aware that I could use Python, but even without our additional/standard columns this table has more than 600 columns.
What you have looks like a pretty standard flow, with one possible improvement: you may not need the Metadata Update step at all. What does the Metadata Update query have in it? Is it something like:
update TableB
set create_date = 'Static Date of some sort',
    created_by = 'Static created by name',
    job_id = 'Static Job ID'
where id = 'some load id';
If the values that you are using for the create date, created by, and job id fields are the same for all rows in the last load, you could include the values for the standard fields as you load Table B. I have a feeling there may be more logic in your Table B SQL Script and Metadata Update script that required you to tackle this in two steps instead of one. Perhaps you can give us a snippet of what is in those scripts to give us an idea of further complications with the above suggestion. I hope this helps!
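For illustration, here is a minimal sketch of that one-step load. TableA, TableB, some_field, and the static values are all placeholders, not your actual schema:

insert into TableB (some_field, create_date, created_by, job_id)
select
    some_field,
    'Static Date of some sort',   -- or current_timestamp for a real load date
    'Static created by name',
    'Static Job ID'
from TableA;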
Just as a final comment here: I ended up removing the load from table A into table B, as well as the Metadata Update step. I was able to show my teammates that the Salesforce Incremental Load has columns available to determine the updates/inserts, even though the column names aren't what we are familiar with.
For reference, here is what the Metadata Update step had in it:

UPDATE TableB
SET DL_CREATED_DATE = CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    DL_CREATEDBY = CAST('${environment_username}' AS VARCHAR(50)),
    DL_JOBID = CAST('${run_history_id}' AS VARCHAR(50));
You are correct, it really is that simple, and I could include it in the previous step and eliminate the Metadata Update step.
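For example, the Table B load could carry the standard columns directly. In this sketch, table_a, table_b, and some_field are placeholders (some_field stands in for the 600+ real columns):

INSERT INTO table_b (some_field, DL_CREATED_DATE, DL_CREATEDBY, DL_JOBID)
SELECT
    some_field,
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    CAST('${environment_username}' AS VARCHAR(50)),
    CAST('${run_history_id}' AS VARCHAR(50))
FROM table_a;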
I would only want the created date changed IF the record itself was new or updated. Currently, the way it's set up, I'm truncating and reloading in the step after the Salesforce load, which isn't helpful. I was primarily trying to figure out whether the Salesforce Incremental Load and the next step that loads into Table B could be combined. Based on the suggestion in my other forum post, I could achieve this with the change tracking feature in Snowflake, if I'm understanding that correctly.
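For what it's worth, one rough sketch of how that could look: a stream on table A (which relies on Snowflake change tracking under the hood) feeding a merge into table B, so the standard columns are only touched for rows that actually changed. Here table_a, table_b, some_field, and the Id key are placeholder names:

-- A stream records inserts/updates/deletes on table_a since it was last read
CREATE OR REPLACE STREAM table_a_changes ON TABLE table_a;

-- Merge only the changed rows; updated rows appear in the stream as
-- METADATA$ACTION = 'INSERT' with METADATA$ISUPDATE = TRUE
MERGE INTO table_b tgt
USING (SELECT * FROM table_a_changes WHERE METADATA$ACTION = 'INSERT') src
    ON tgt.Id = src.Id
WHEN MATCHED THEN UPDATE SET
    tgt.some_field = src.some_field,
    tgt.DL_CREATED_DATE = CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    tgt.DL_CREATEDBY = CAST('${environment_username}' AS VARCHAR(50)),
    tgt.DL_JOBID = CAST('${run_history_id}' AS VARCHAR(50))
WHEN NOT MATCHED THEN INSERT
    (Id, some_field, DL_CREATED_DATE, DL_CREATEDBY, DL_JOBID)
VALUES
    (src.Id, src.some_field,
     CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
     CAST('${environment_username}' AS VARCHAR(50)),
     CAST('${run_history_id}' AS VARCHAR(50)));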