I am currently trying to transform a table.
I would like to get the following result.
Can anyone help me with this?
I am currently trying to transform a table.
I would like to get the following result.
Can anyone help me with this?
Hi @ImmanuelHasselmann,
Without more information I can't say for sure but it appears you want to pivot the data based on either the StartDate or EndDate. The approach is a little different depending on the warehouse product you are using (Snowflake, BigQuery, Redshift, etc.) You can do this by writing the SQL statement by hand or you can use a Matillion Transformation where you take the base table and apply the pivot transformation to it before landing the data in the destination table. I hope this helps!
Hi Immanuel,
As Bryan mentioned, this looks like a densification pivot. On Snowflake you can take advantage of the GENERATOR function to perform the densification, and the LAST_VALUE function for the most-recent lookup.
If your data is in a table named manufacturer, something like this:
WITH spine AS (SELECT TO_DATE('20141231', 'YYYYMMDD')
+ ROW_NUMBER() OVER (ORDER BY SEQ8()) AS mark_date
FROM TABLE(GENERATOR(ROWCOUNT=>20000))),
limiter AS (SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date FROM manufacturer),
codes AS (SELECT DISTINCT manufacturer FROM manufacturer)
SELECT spine.mark_date,
codes.manufacturer,
LAST_VALUE(actual.condition_value IGNORE NULLS) OVER (PARTITION BY codes.manufacturer ORDER BY spine.mark_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM spine
FULL JOIN limiter
FULL JOIN codes
LEFT JOIN manufacturer actual ON actual.manufacturer = codes.manufacturer AND actual.start_date = spine.mark_date
WHERE spine.mark_date >= limiter.start_date
AND spine.mark_date <= limiter.end_date
ORDER BY spine.mark_date, codes.manufacturer
Henry
Unfortunately, I get the wrong result with the code. I want the sum of the valid conditions per date.
OK, I think it's a little simpler then.. you just SUM the values that are within range per code and date.
WITH spine AS (SELECT TO_DATE('20141231', 'YYYYMMDD')
+ ROW_NUMBER() OVER (ORDER BY SEQ8()) AS mark_date
FROM TABLE(GENERATOR(ROWCOUNT=>20000))),
limiter AS (SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date FROM manufacturer),
codes AS (SELECT DISTINCT manufacturer FROM manufacturer)
SELECT spine.mark_date,
codes.manufacturer,
SUM(m.condition_value)
FROM spine
JOIN limiter ON spine.mark_date >= limiter.start_date AND spine.mark_date <= limiter.end_date
FULL JOIN codes
LEFT JOIN manufacturer m ON m.manufacturer = codes.manufacturer AND m.start_date <= spine.mark_date AND m.end_date >= spine.mark_date
GROUP BY spine.mark_date, codes.manufacturer
ORDER BY 1,2;
Henry