Welcome, data enthusiasts! 🌟 Join us for an engaging session where we dive deep into the fascinating world of AI, data engineering, and cutting-edge models. I’m thrilled to introduce Julian Wiffen, our Director of Data Science & AI at Matillion. 🚀
Julian & Matillion have explored the potential of generative AI to transform data engineering. From standardizing job titles to handling vast blocks of unstructured data, he’s been at the forefront of innovation. 🤖
Now, it’s your turn! 🗣️ Ask Julian Anything about AI, data pipelines, or the future of analytics. Whether you’re a seasoned pro or just curious, this is your chance to engage with an expert. 🤝
Post your questions below; Julian will be on hand to answer them from Monday 22nd April until Friday 26th April.
- data enrichment (e.g. 'summarise what you know about this company') - a rough sketch of this follows below
- generation of test data...
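To make the data enrichment idea concrete, here is a minimal sketch using the OpenAI Python client; the model name, prompt wording, and company details are illustrative assumptions rather than anything Matillion-specific.

```python
# Minimal sketch: enrich a row of company data with an LLM-generated summary.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and prompt are illustrative choices, not a recommendation.
from openai import OpenAI

client = OpenAI()

def enrich_company(name: str, website: str) -> str:
    """Ask the model to summarise what it knows about a company."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap for whatever you have access to
        messages=[
            {"role": "system", "content": "You are a data enrichment assistant."},
            {"role": "user",
             "content": f"Summarise what you know about the company '{name}' ({website}) in two sentences."},
        ],
        temperature=0.2,  # lower temperature for more repeatable enrichment
    )
    return response.choices[0].message.content

print(enrich_company("Matillion", "https://www.matillion.com"))
```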
What are the biggest challenges? Getting used to the non-deterministic nature of it, first of all. We'll need to learn to wrap quality processes around it, just as we might around any business process done by a human, especially as we are looking at either text-based outputs that are hard to code checks around, or use cases where we are trying to measure the quality of a judgement call.
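As a loose illustration of wrapping a quality process around a non-deterministic step, a retry-until-valid gate might look something like this; `call_llm`, `looks_valid`, and the retry count are hypothetical placeholders, not a prescribed pattern.

```python
# Sketch of a quality gate around a non-deterministic LLM step.
# `call_llm` and `looks_valid` are hypothetical placeholders for your own
# model call and business-rule check; the retry count is arbitrary.
from typing import Callable

def with_quality_gate(call_llm: Callable[[str], str],
                      looks_valid: Callable[[str], bool],
                      prompt: str,
                      max_attempts: int = 3) -> str:
    """Retry a non-deterministic call until its output passes a check."""
    last = ""
    for _ in range(max_attempts):
        last = call_llm(prompt)
        if looks_valid(last):
            return last
    # After exhausting retries, flag for human review rather than failing silently.
    raise ValueError(f"Output failed validation after {max_attempts} attempts: {last!r}")
```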
There's still a lot to be learned about how to measure the quality. Some things we are exploring include:
- the use of multiple choice questions to test model reasoning in a way that can be automated
- taking a random sample for human evaluation of quality
- feedback links on anything that is going to a human end user
- using vector separation to compare how consistently a model answers the same question (there's a rough sketch of this below)

As models become more sophisticated, I can see an approach in which a lower cost, high throughput model processes all transactions and a small percentage are passed to a larger, more costly but more advanced model to evaluate the quality. We may also start to see the approach of asking multiple models the same question in an ensemble method - effectively having them vote on the outcome.
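To make the vector separation idea concrete, here is a rough sketch that embeds repeated answers to the same question and averages their pairwise cosine similarity as a crude consistency score; the embedding model, the toy answers, and any threshold you apply to the score are assumptions for illustration.

```python
# Sketch: measure how consistently a model answers the same question by
# embedding repeated answers and comparing their pairwise cosine similarity.
# The embedding model and the example answers are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def consistency_score(answers: list[str]) -> float:
    """Return the mean pairwise cosine similarity of a set of answers."""
    result = client.embeddings.create(model="text-embedding-3-small", input=answers)
    vectors = np.array([item.embedding for item in result.data])
    # Normalise so the dot product is cosine similarity.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = vectors @ vectors.T
    # Average the off-diagonal entries (pairwise similarities only).
    n = len(answers)
    return float((sims.sum() - n) / (n * (n - 1)))

if __name__ == "__main__":
    # Five answers collected by asking the model the same question repeatedly (toy example).
    answers = [
        "Head of Data is a senior data leadership role.",
        "A senior leadership position responsible for data strategy.",
        "Leads the data function at a company.",
        "A senior data leadership role.",
        "An executive responsible for the data team.",
    ]
    print(f"consistency: {consistency_score(answers):.3f}")
```

A low score suggests the model's answers drift between runs, which is a signal to route a sample of that workload to human review.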
The other big challenge is speed and rate/throughput limits. We've grown used to handling very large datasets - 100,000 rows would not be classed as a large volume for any normal ETL purposes these days - but it would take significant time to process that through even a simple LLM. This will of course improve over time, but we may need to do some mental recalibration when working in this space. We're certainly going to need to get adept at balancing accuracy and quality against cost and speed.
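To put rough numbers on that: at around two seconds per call, 100,000 rows processed one at a time is over 55 hours, so you typically need some concurrency within the provider's rate limits. Below is a minimal sketch of concurrency-capped processing; `summarise_row`, the simulated two-second latency, and the limit of 10 concurrent requests are all illustrative assumptions.

```python
# Sketch: push many rows through an LLM call with a cap on concurrent requests,
# as a crude way of balancing throughput against provider rate limits.
# `summarise_row` is a hypothetical stand-in for a real LLM call; the
# concurrency limit of 10 is an arbitrary illustrative choice.
import asyncio

async def summarise_row(row: dict) -> str:
    await asyncio.sleep(2)  # simulate a ~2 second LLM round trip
    return f"summary of row {row['id']}"

async def process_rows(rows: list[dict], max_concurrency: int = 10) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(row: dict) -> str:
        async with semaphore:
            return await summarise_row(row)

    return await asyncio.gather(*(bounded(r) for r in rows))

if __name__ == "__main__":
    rows = [{"id": i} for i in range(100)]
    results = asyncio.run(process_rows(rows))
    print(len(results), "rows processed")
```

In practice you would also add retries with backoff for rate-limit errors, which is exactly the kind of accuracy/cost/speed balancing described above.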