Hello everyone, I am new to Matillion

I am currently trying to transform a table.

 

I would like to get the following result.

 

Can anyone help me with this?

Hi @ImmanuelHasselmann​,

Without more information I can't say for sure but it appears you want to pivot the data based on either the StartDate or EndDate. The approach is a little different depending on the warehouse product you are using (Snowflake, BigQuery, Redshift, etc.) You can do this by writing the SQL statement by hand or you can use a Matillion Transformation where you take the base table and apply the pivot transformation to it before landing the data in the destination table. I hope this helps!

Hi Immanuel,

 

As Bryan mentioned, this looks like a densification pivot. On Snowflake you can take advantage of the GENERATOR function to perform the densification, and the LAST_VALUE function for the most-recent lookup.

 

If your data is in a table named manufacturer, something like this:

 

WITH spine AS (SELECT TO_DATE('20141231', 'YYYYMMDD')

+ ROW_NUMBER() OVER (ORDER BY SEQ8()) AS mark_date

FROM TABLE(GENERATOR(ROWCOUNT=>20000))),

limiter AS (SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date FROM manufacturer),

codes AS (SELECT DISTINCT manufacturer FROM manufacturer)

SELECT spine.mark_date,

codes.manufacturer,

LAST_VALUE(actual.condition_value IGNORE NULLS) OVER (PARTITION BY codes.manufacturer ORDER BY spine.mark_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

FROM spine

FULL JOIN limiter

FULL JOIN codes

LEFT JOIN manufacturer actual ON actual.manufacturer = codes.manufacturer AND actual.start_date = spine.mark_date

WHERE spine.mark_date >= limiter.start_date

AND spine.mark_date <= limiter.end_date

ORDER BY spine.mark_date, codes.manufacturer

 

Henry

 

Unfortunately, I get the wrong result with the code. I want the sum of the valid conditions per date. 

OK, I think it's a little simpler then.. you just SUM the values that are within range per code and date.

 

WITH spine AS (SELECT TO_DATE('20141231', 'YYYYMMDD')

+ ROW_NUMBER() OVER (ORDER BY SEQ8()) AS mark_date

FROM TABLE(GENERATOR(ROWCOUNT=>20000))),

limiter AS (SELECT MIN(start_date) AS start_date, MAX(end_date) AS end_date FROM manufacturer),

codes AS (SELECT DISTINCT manufacturer FROM manufacturer)

SELECT spine.mark_date,

codes.manufacturer,

SUM(m.condition_value)

FROM spine

JOIN limiter ON spine.mark_date >= limiter.start_date AND spine.mark_date <= limiter.end_date

FULL JOIN codes

LEFT JOIN manufacturer m ON m.manufacturer = codes.manufacturer AND m.start_date <= spine.mark_date AND m.end_date >= spine.mark_date

GROUP BY spine.mark_date, codes.manufacturer

ORDER BY 1,2;

 

Henry