Monday Morning Data Chat
May 09, 2022
#145 - Data Engineering AMA w/ Matt Housley and Joe Reis
Matt Housley and Joe Reis chat about where data engineering is going, and take audience questions.
#144 - Data Career Advice w/ Matt Housley, Chris Tabb, and Joe Reis
The data job market is certainly evolving. Matt, Chris, and Joe have a candid chat and AMA about career advice going into 2024.
#143 - The Future of Generative AI in Data Analytics w/ Amit Prakash
Amit Prakash (CTO/Co-Founder of ThoughtSpot) joins the show to chat about the future of generative AI in data analytics. ThoughtSpot has been a leader in searchable analytics, and it will be interesting to get Amit's take on where the field of analytics is heading next.
#142 - Incentivizing Devs to Pursue Open-Source Projects w/ Max Howell
Max Howell created Homebrew, one of the most popular open-source software (OSS) packages on the planet. He's also the founder of tea.xyz, which is helping incentivize developers to pursue their OSS projects.
In this episode, we chat about the realities and future of OSS, how developers can be remunerated for their OSS projects, and much more.
#141 - Data Vendors and Grifters w/ Aaron Hunsaker
Aaron Hunsaker joins Matthew Housley and me to chat about data grifters, dealing with vendors, how data people should converse with "the business", and much more. Aaron doesn't hold back.
Enjoy this very candid in-person chat.
#140 - The Power of 3 (Math Nerds, Professors, and Authors) w/ Hala Nelson
Hala Nelson joins the show to chat about writing books, teaching math, and much more. It's not often we get three math nerds, professors, and authors in the same conversation, and this is a lot of fun. Enjoy!
#139 - Streaming Data Processing Deep Dive w/ David Yaffe and Johnny Graettinger (Estuary)
David Yaffe and Johnny Graettinger (both from Estuary) join the show to do a deep dive into streaming data processing. We also cover how to scale change data capture (CDC) and where transformations belong in data pipelines.
Note - Joe's audio was having issues for this episode. Apologies.
#138 - The Rise and Importance of Business Language w/ John O'Gorman
Multiple products, versions, platforms, target technologies, formats, and locales? How do you make sense of the "multiple of multiples" challenge from a technical perspective? The "language of the business" and data in all its structured, semi-structured, and 'unstructured' forms helps drive this home.
John O'Gorman has world-class expertise in language, semantics, and tying this together for the business. We hope you learn something new from this episode.
#137 - Why Apache Iceberg Won the Table Format War, Data Mesh, and More w/ Brian Olsen
Brian Olsen joins us again to chat about why Apache Iceberg won the table format war. We also finish our chat from last time about Data Mesh. #dataengineering #datalake #datamesh
#136 - Programming Languages for Data Science, and Why Your BI Team is Your Best Bet for Data Science w/ Dave Langer
David Langer joins the show to chat about programming languages for data science and how BI teams have a unique advantage in helping introduce data science into their organizations.
#135 - Dataframe Deep Dive w/ Devin Petersohn
Devin Petersohn (Modin, Ponder) knows a thing or two about dataframes, having done his PhD thesis on them, among other related achievements. We'll talk about all things dataframes, both high level and in the weeds. If you've ever wanted to learn about dataframes, this is the discussion for you.
#134 - Should Your Business Chase Generative AI? w/ Andreas Welsch
Andreas Welsch (Chief AI Strategist, Host of the Intelligence Briefing) joins the show to discuss the change management required to succeed with Generative AI in today's business world, prompt engineering, and more.
#133 - Intro to Data Contracts w/ Andrew Jones
#132 - Data Collaboration From the Outside-In w/ Andrew Padilla
Data collaboration is hard. Andrew Padilla chats about how to effectively address data collaboration at a common level across your organization, then apply the relevant parts to your internal orgs. #data
#131 - The Importance of Actionable Data to Inform Decision-Making w/ Joe Perez
Joe Perez ("Dr. Joe") joins the show to chat about the importance of actionable data to inform decision-making. We also discuss the various ways disparate data are brought together into a cohesive data warehouse, and how there should be a finite, measurable, deployable strategy.
#130 - Data Modeling in 2023 w/ Colin Zima
Colin Zima joins the show to chat about what data modeling should look like in 2023 and beyond. We'll chat about real ETL, semantic layers, the troubles with BI, and much more. #datamodeling #data #dataengineering #analytics
#129 - Putting Data Products at the Center of Data Management w/ Saket Saurabh
Saket Saurabh joins the show to chat about taking a data product-centric view to reinventing data management. #data #dataproducts #datamanagement #dataengineering
#128 - Big & Small Data in 2023 w/ Joe Reis & Matt Housley
There's a lot of debate on big and small data. For systems and compute, some say "Big Data is Dead", while others challenge this notion. In AI and ML, big tech companies can pour tons of money and data into building massive LLMs, while open source provides compelling "small data" alternatives to the LLM walled gardens.
So which is it? Will Big Data reign supreme or will small data become more popular? Matt and I riff on these topics and more.
#data #dataengineering #chatgpt #ai #bigdata
#127 - Product Management as a Data Scientist w/ Santona Tuli
Santona Tuli discusses the product management aspects of being a data scientist: product and stakeholder management, due diligence in requirements gathering, and developing a strategy before implementing DS/ML pipelines. You know, the fun stuff ;)
#126 - Data Virtualization Hot Takes w/ Brian Olsen
#125 - The Art of Developer Relations w/ Tim Berglund
#124 - The Rise of the Semantic Layer in the Modern Data Stack w/ Dave Mariani
#123 - Semantic Layers w/ Artyom Keydunov & Pavel Tiunov (Cube.dev)
Artyom Keydunov & Pavel Tiunov (co-founders of Cube.dev) join the show to chat about all things semantic layers. Curious about the future of data apps? Then check out this episode. #data #semanticlayer #dataengineering #analytics
#122 - Data Product Management w/ Malcolm Hawker
Malcolm Hawker joins the show to chat about data product management, generative AI and its impact on data products and the broader data industry, and much more.
#datamesh #dataengineering #dataproducts #data
#121 - It's Joe & Matt! - Random Grab Bag Episode
Joe Reis and Matt Housley rant about generative AI, Data Council, WASM, data modeling, regulations, and how programming languages and paradigms impact the business.
#data #dataengineering #datamodeling #chatgpt
#120 - How Data Modeling Relates to Data Engineering w/ Larry Burns
Larry Burns is a longtime data modeling expert and author on the topic. He joins the show to discuss some of the (often misunderstood) ways data modeling relates to data engineering, logical data modeling, and much more. If you're a data engineer, you can't afford to miss this discussion. #dataengineering #data #datamodeling
#119 - Upskilling in a Downturn, Tech Education, Publishing a Tech Book, and More w/ Jess Haberman
Jess Haberman (Anaconda, formerly at O'Reilly Media) joins the show to chat about ways to upskill in a downturn, tech and data education, publishing a tech book, and much more.
Jess is zero BS and always brings a practical perspective. We should know. She signed our book ;)
#118 - Unlocking the Value of Unstructured Data with AI w/ Cody & Will (Coactive.ai)
Cody Coleman and Will Gaviria Rojas (Co-founders of Coactive.ai) join the show to chat about the rising importance of unstructured data and the role of AI in unlocking the value of unstructured content. Their motto is "Content is King, and AI is the new Queen." Given the rise of unstructured data, this is a must for anyone working in AI.
#117 - Designing Massive Distributed Systems at LinkedIn w/ Felix GV
Felix GV designs HUGE distributed systems as a principal staff engineer at LinkedIn. We talk about the Venice open-source project and the challenges of massively distributed data.
But Felix is also thinking about something WAY bigger - designing software for a multi-planetary civilization! This is a very interesting discussion that you won't usually come across on this planet, or others (yet).
#116 - Calling Data Bullsh*t w/ Rogier Werschkull
Rogier Werschkull is keen on detecting BS in the data industry. This is a very opinionated and informative discussion about the state of affairs of hype and reality in the data industry today.
Matt Turck's Gone MAD! - The 2023 ML, AI, and Data (MAD) Landscape
Matt Turck (Firstmark Capital) joins us to chat about what's new in the data landscape, the economic realities impacting the startup ecosystem, all things AI, and much more.
Matt is easily one of our favorite thinkers in the data and tech space, so this is definitely worth tuning in for.
MAD Landscape 2023: https://mattturck.com/mad2023/
#115 - Building Data Infrastructure w/ Neelesh Salian
Neelesh Salian (Staff software engineer at dbt labs) joins the show to discuss building data infrastructure, general engineering topics at the staff+ level, and more.
#dataengineering #softwareengineering #data
#114 - How to Write a Best-Selling O'Reilly Technical Book w/ Joe Reis & Matt Housley
Joe and Matt discuss getting Fundamentals of Data Engineering signed with O'Reilly, the writing process, and marketing a best-selling tech book. This is worth a listen if you've ever wanted to write a technical book.
#dataengineering #data #author #oreilly
#113 - Random Data Grab Bag w/ Scott Taylor
Scott Taylor has become our regular for Valentine's Day week. Join us for a grab bag of thoughts and listener questions.
#112 - Real-time Analytics at Massive Scale w/ Venkat Venkataramani
Venkat Venkataramani (CEO & co-founder at Rockset) joins the show to chat about real-time analytics at massive scale. In a former life, Venkat led Facebook's infrastructure team that handled billions of events per second for all of Facebook's user data services. Venkat is an OG in the real-time analytics space, so you'll learn a lot about this topic.
#dataengineering #data #realtimedata #analytics
Data Catalogs - A Debate w/ Shirshanka Das & Ananth Packkildurai - Special Episode
#111 - Streaming Data In the Enterprise w/ John Kutay
John Kutay (Striim) joins the show to chat about “easy mode” and “hard mode” use cases for streaming data in the enterprise.
#110 - Data Quality - The Hard Parts w/ Jeremy Stanley (Anomalo)
What's Next, with Zhamak Dehghani (Next Data) - Special Episode
Zhamak Dehghani (CEO & Founder of Next Data) discusses why she started Next Data, and how her company will finally help Data Mesh become a reality.
Next Data: https://www.nextdata.com/
#109 - ColorWise, Content Creation, and More w/ Kate Strachnyi
Kate Strachnyi (DATAcated) joins the show to chat about her new book, ColorWise (O'Reilly), content creation, running DATAcated, and much more.
#108 - Switching From a Batch to Streaming Mindset w/ Chip Huyen
Chip Huyen (Claypot AI) joins the show to chat about switching from a batch mindset to a streaming mindset, the convergence of data scientist and data engineering roles, and much more.
Chip is a well-regarded person in the ML/AI space. She's CEO/co-founder of Claypot AI, teaches Machine Learning Systems Design at Stanford, and wrote the phenomenal book, Designing Machine Learning Systems (O'Reilly), among many achievements.
#datascience #ai #dataengineering
#107 - Data Modeling, Writing books, and More w/ Serge Gershkovich
Serge Gershkovich (SqlDBM) joins the show to chat about a variety of topics, like data modeling, writing books, career pivots, and much more.
Cameo appearance from Chris Tabb!
#datamodeling #dataengineering #data
#106 - Moving on from the Modern Data Stack w/ Sarah Catanzaro
Sarah Catanzaro (Amplify Partners) has a keen sense of the data landscape. She joins the show to chat about why she's shifting her focus from the Modern Data Stack, its limitations, and the culture surrounding it that she thinks is curbing the impact of data.
We'll also talk about whatever else is on her mind.
#105 - The Future of Data Catalogs w/ Ole Olesen-Bagneux
#104 - Our Worst Episode Ever! - Grab Bag Q&A w/ Matt Housley and Joe Reis
Welcome to our worst episode ever! But also pretty fun.
There's a lot going on in the data and tech space right now. Matt and Joe kick it old school and dive into a variety of topics ranging from all things downturn, cloud vs on-prem, what we look for in guests, and much more.
We had to cut the show a bit short because Joe kept getting interrupted by delivery people.
#103 - How Current Market Conditions Affect the Data and Tech Landscape w/ Sanjeev Mohan
Sanjeev Mohan (SanjMo, former Gartner Research VP) joins the show to chat about the current market conditions affecting the data and tech landscape, trends he's seeing, and much more.
#data #it #tech
Data Mesh, The Hard Parts AMA w/ Zhamak Dehghani - Special Episode
Data Mesh has taken the technology and data industry by storm and is easily one of the hottest topics today. Oftentimes, the harder parts of Data Mesh don't get as much attention or coverage.
This was an open session at the Utah Data Engineering Meetup (November 2022) to ask Zhamak about the hard parts of data mesh that have inspired her to start a company to address them head-on.
#102 - Boring Tech is Exciting! w/ Jeremiah Lowin (Prefect)
Jeremiah Lowin (CEO/Co-Founder @ Prefect) joins the show to chat about why he's excited about boring tech, the current VC/startup environment, building trust, meetups, and much more.
#dataengineering #prefect #data
#101 - Honest No-BS Data Modeling w/ Juan Sequeda
Data modeling is seeing a massive resurgence of attention lately. How do you cut through the noise and know what's useful for your situation?
Juan Sequeda (data.world) joins the show to chat about honest, no-BS data modeling, aka data modeling for the real world.
#datamodel #dataengineering #data
#100 - The Intersection of Software Engineering & Data w/ Sonny Rivera (ThoughtSpot)
Sonny Rivera (ThoughtSpot) joins the show to chat about software engineering practices on data, data modeling, approaching things from first principles, the future of BI, and much more.
Note - we had a few hiccups with Streamyard, but the show conversation is very good.
#dataengineering #thoughtspot #softwareengineering #data