I already have a job flow that is working, but it isn't very efficient. (My row count also isn't working, but that's a separate post.) Here is the flow, described below along with a screenshot:
Calling Salesforce API and loading raw data into table A.
Populating table B from table A and adding our three standard columns (create date, created by, and job id).
Then I update the data for the standard columns with the Metadata Update step on table B.
Then I write the audit data.
I'd rather not have two tables; in this flow I'm not really even using the awesome merge logic from the Salesforce Incremental Load component. I tried adding these standard columns to the first step that loads table A, but since they don't exist in the source table, they are dropped when I execute the job.
Note: I am aware that I could use Python, but even without our additional/standard columns this table has more than 600 columns.
What you have looks like a pretty standard flow, with one possible improvement: you may not need the Metadata Update step at all. What does the Metadata Update query have in it? Is it something like:
update TableB
set create_date = 'Static Date of some sort',
    created_by = 'Static created by name',
    job_id = 'Static Job ID'
where id = 'some load id';
If the values that you are using for the create date, created by, and job id fields are the same for all rows in the last load, you could include the values for the standard fields as you load Table B. I have a feeling there may be more logic in your Table B SQL Script and Metadata Update script that required you to tackle this in two steps instead of one. Perhaps you can give us a snippet of what is in those scripts to give us an idea of further complications with the above suggestion. I hope this helps!
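For illustration, here is a minimal sketch of that one-step load. TableA, TableB, some_field, and the static values are all placeholders, not your actual schema:

insert into TableB (some_field, create_date, created_by, job_id)
select
    some_field,
    'Static Date of some sort',   -- or current_timestamp for a real load date
    'Static created by name',
    'Static Job ID'
from TableA;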
Just as a final comment here: I ended up removing the load from table A into table B, as well as the Metadata Update step. I was able to show my teammates that the Salesforce Incremental Load has columns available to determine the updates/inserts, even though the column names aren't what we are familiar with.
For reference, here is what the Metadata Update step had in it:

UPDATE TableB
SET DL_CREATED_DATE = CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    DL_CREATEDBY = CAST('${environment_username}' AS VARCHAR(50)),
    DL_JOBID = CAST('${run_history_id}' AS VARCHAR(50));
You are correct, it really is that simple, and I could include it in the previous step and eliminate the Metadata Update step.
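For example, the Table B load could carry the standard columns directly. In this sketch, table_a, table_b, and some_field are placeholders (some_field stands in for the 600+ real columns):

INSERT INTO table_b (some_field, DL_CREATED_DATE, DL_CREATEDBY, DL_JOBID)
SELECT
    some_field,
    CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    CAST('${environment_username}' AS VARCHAR(50)),
    CAST('${run_history_id}' AS VARCHAR(50))
FROM table_a;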
I would only want the created date changed IF the record itself was new or updated. Currently, the way it's set up, I'm truncating and reloading in the step after the Salesforce load, which isn't helpful. I was primarily trying to figure out whether the Salesforce Incremental Load and the next step that loads into Table B could be combined. Based on the suggestion in my other forum post, I could achieve this with the change tracking feature in Snowflake, if I'm understanding that correctly.
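For what it's worth, one rough sketch of how that could look: a stream on table A (which relies on Snowflake change tracking under the hood) feeding a merge into table B, so the standard columns are only touched for rows that actually changed. Here table_a, table_b, some_field, and the Id key are placeholder names:

-- A stream records inserts/updates/deletes on table_a since it was last read
CREATE OR REPLACE STREAM table_a_changes ON TABLE table_a;

-- Merge only the changed rows; updated rows appear in the stream as
-- METADATA$ACTION = 'INSERT' with METADATA$ISUPDATE = TRUE
MERGE INTO table_b tgt
USING (SELECT * FROM table_a_changes WHERE METADATA$ACTION = 'INSERT') src
    ON tgt.Id = src.Id
WHEN MATCHED THEN UPDATE SET
    tgt.some_field = src.some_field,
    tgt.DL_CREATED_DATE = CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
    tgt.DL_CREATEDBY = CAST('${environment_username}' AS VARCHAR(50)),
    tgt.DL_JOBID = CAST('${run_history_id}' AS VARCHAR(50))
WHEN NOT MATCHED THEN INSERT
    (Id, some_field, DL_CREATED_DATE, DL_CREATEDBY, DL_JOBID)
VALUES
    (src.Id, src.some_field,
     CAST(CURRENT_TIMESTAMP AS TIMESTAMP_NTZ),
     CAST('${environment_username}' AS VARCHAR(50)),
     CAST('${run_history_id}' AS VARCHAR(50)));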