Monday Morning Data Chat
Jun 13, 2022
#154 - Getting Business Value From Data, the CXO playbook, AWS ReInvent, and more w/ Sol Rashidi
Sol Rashidi is a heavy hitter in the enterprise data space, having been CXO at Estee Lauder, Sony, Merck, and more. She joins us to chat about getting business value from data, the CXO playbook, AWS ReInvent, and more.
#153 - Automating Analytics with Generative AI w/ Sarah Nagy
Sarah Nagy (CEO, Seek.ai) joins us to chat about automating analytics with generative AI, the generative AI space in general, and much more.
#152 - Knowledge Graphs, Semantics, and More w/ Dave McComb
Dave McComb (Semantic Arts) is a pioneer in the use of knowledge graphs and semantics in data management. He joins to chat about these topics, and much more.
#151 - The EU AI Act w/ Kai Zenner
Kai Zenner joins us to chat about all things EU AI Act. If you've wanted to learn about this upcoming piece of critical regulation, tune in.
#150 - Apache Hudi Deep Dive w/ Nadine Farah
Nadine Farah joins the show to chat about Apache Hudi's core primitives: indexing, CDC, table services, faster UPSERTs, incremental processing framework, and more.
#149 - Why is Data Security so Hard? w/ Yoav Cohen
Yoav Cohen (co-founder & CTO at Satori) joins the show to chat about why data security is hard, strategies companies use to deal with analytics over sensitive data, security and compliance requirements that data teams need to meet, and much more.
#148 - Data Conference Recap (Coalesce, Gitex Dubai, DEWCon) w/ Kevin Hu
Our favorite nerd sniper, Kevin Hu (CEO of Metaplane), joins the show to help us recap some major conferences we attended last week. Lots of data news, gossip, anecdotes, and more.
#147 - Data Warehouses and Semantics Deep Dive, SDF, and more w/ Lukas Schulte (SDF)
Why are semantics important for a data warehouse? Lukas Schulte joins us to chat about why semantics are important, the heterogeneity of data systems, how semantics relate to SQL compilers, his project SDF, and much more.
Please be aware that this discussion will get into the nitty-gritty and technical weeds of all things data.
#146 - Improving Your Health and Wellness - Techie Edition w/ Colleen Fotsch
This is a bit of a different episode, but it's a topic that is long overdue for discussion. Between long hours sitting in front of a monitor, "hustle culture", and prevalent alcohol and drug use, our profession is literally killing us. The negative effects on health and wellness among techies are insane. We've seen our friends go to the ER from stress, diet, and lifestyle-related emergencies. We've lost other friends along the way. Colleen Fotsch is uniquely qualified to discuss this issue. She is used to operating at the highest levels of sports, being an NCAA D1-champion swimmer, multiple-time CrossFit Games athlete and coach, and former US Bobsled team member. She also works as a data analyst and part-time coach for Opex, a leader in fitness education coaching (she's Joe's coach). It's time we wake up and look at how we can improve our health and wellness, and bring our best selves to our work and life. Colleen's IG: https://www.instagram.com/colleenfotsch/?hl=en
#145 - Data Engineering AMA w/ Matt Housley and Joe Reis
Matt Housley and Joe Reis chat about where data engineering is going, and take audience questions.
#144 - Data Career Advice w/ Matt Housley, Chris Tabb, and Joe Reis
The data job market is certainly evolving. Matt, Chris, and Joe have a candid chat and AMA about career advice going into 2024.
#143 - The Future of Generative AI in Data Analytics w/ Amit Prakash
Amit Prakash (CTO/Co-Founder of Thoughtspot) joins the show to chat about the future of generative AI in data analytics. Thoughtspot has been a leader in searchable analytics, and it will be interesting to get Amit's take on where the field of analytics is heading next.
#142 - Incentivizing Devs to Pursue Open-Source Projects w/ Max Howell
Max Howell created Homebrew, one of the most popular open-source software (OSS) packages on the planet. He's also the founder of tea.xyz, which is helping incentivize developers to pursue their OSS projects.
In this episode, we chat about the realities and future of OSS, how developers can be remunerated for their OSS projects, and much more.
#141 - Data Vendors and Grifters w/ Aaron Hunsaker
Aaron Hunsaker joins Matthew Housley and me to chat about data grifters, dealing with vendors, how data people should converse with "the business", and much more. Aaron doesn't hold back.
Enjoy this very candid in-person chat.
#140 - The Power of 3 (Math Nerds, Professors, and Authors) w/ Hala Nelson
Hala Nelson joins the show to chat about writing books, teaching math, and much more. It's not often we get three math nerds, professors, and authors in the same conversation, and this is a lot of fun. Enjoy!
#139 - Streaming Data Processing Deep Dive w/ David Yaffe and Johnny Graettinger (Estuary)
David Yaffe and Johnny Graettinger (both from Estuary) join the show to do a deep dive into streaming data processing. We also cover how to scale change data capture (CDC) and where transformations belong in data pipelines.
Note - Joe's audio was having issues for this episode. Apologies.
#138 - The Rise and Importance of Business Language w/ John O'Gorman
Multiple products, versions, platforms, targets, technologies, formats, and locales? How do you make sense of the "multiple of multiples" challenge from a technical perspective? The "language of the business" and data in all its structured, semi-structured, and 'unstructured' forms help drive this home.
John O'Gorman has world-class expertise in language, semantics, and tying this together for the business. We hope you learn something new from this episode.
#137 - Why Apache Iceberg Won the Table Format War, Data Mesh, and More w/ Brian Olsen
Brian Olsen joins us again to chat about why Apache Iceberg won the table format war. We also finish our chat from last time about Data Mesh. #dataengineering #datalake #datamesh
#136 - Programming Languages for Data Science, and Why Your BI Team is Your Best Bet for Data Science w/ Dave Langer
David Langer joins the show to chat about programming languages for data science, and why BI teams have a unique advantage in helping introduce data science into their organizations.
#135 - Dataframe Deep Dive w/ Devin Petersohn
Devin Petersohn (Modin, Ponder) knows a thing or two about dataframes, having done his PhD thesis on them, among other related achievements. We'll talk about all things dataframes, both high level and in the weeds. If you've ever wanted to learn about dataframes, this is the discussion for you.
#134 - Should Your Business Chase Generative AI? w/ Andreas Welsch
Andreas Welsch (Chief AI Strategist, Host of the Intelligence Briefing) joins the show to discuss the change management required to succeed with Generative AI in today's business world, prompt engineering, and more.
#133 - Intro to Data Contracts w/ Andrew Jones
#132 - Data Collaboration From the Outside-In w/ Andrew Padilla
Data collaboration is hard. Andrew Padilla chats about how to effectively address data collaboration at a common level across your organization, then apply the relevant parts to your internal orgs. #data
#131 - The Importance of Actionable Data to Inform Decision-Making w/ Joe Perez
Joe Perez ("Dr. Joe") joins the show to chat about the importance of actionable data to inform decision-making. We also discuss the various ways disparate data are brought together into a cohesive data warehouse, and why there should be a finite, measurable, deployable strategy.
#130 - Data Modeling in 2023 w/ Colin Zima
Colin Zima joins the show to chat about what data modeling should look like in 2023 and beyond. We'll chat about real ETL, semantic layers, the troubles with BI, and much more. #datamodeling #data #dataengineering #analytics
#129 - Putting Data Products at the Center of Data Management w/ Saket Saurabh
Saket Saurabh joins the show to chat about taking a data product-centric view to reinventing data management. #data #dataproducts #datamanagement #dataengineering
#128 - Big & Small Data in 2023 w/ Joe Reis & Matt Housley
There's a lot of debate on big and small data. For systems and compute, some say "Big Data is Dead", while others challenge this notion. In AI and ML, big tech companies can pour tons of money and data into building massive LLMs, while open source provides compelling "small data" alternatives to the LLM walled gardens.
So which is it? Will Big Data reign supreme or will small data become more popular? Matt and I riff on these topics and more.
#data #dataengineering #chatgpt #ai #bigdata
#127 - Product Management as a Data Scientist w/ Santona Tuli
Santona Tuli discusses the product management aspects of being a data scientist - product and stakeholder management, due diligence in requirements gathering, and developing a strategy before implementing DS/ML pipelines. You know, the fun stuff ;)
#126 - Data Virtualization Hot Takes w/ Brian Olsen
#125 - The Art of Developer Relations w/ Tim Berglund
#124 - The Rise of the Semantic Layer in the Modern Data Stack w/ Dave Mariani
#123 - Semantic Layers w/ Artyom Keydunov & Pavel Tiunov (Cube.dev)
Artyom Keydunov & Pavel Tiunov (co-founders of Cube.dev) join the show to chat about all things semantic layers. Curious about the future of data apps? Then check out this episode. #data #semanticlayer #dataengineering #analytics
#122 - Data Product Management w/ Malcolm Hawker
Malcolm Hawker joins the show to chat about data product management, generative AI and its impact on data products and the broader data industry, and much more.
#datamesh #dataengineering #dataproducts #data
#121 - It's Joe & Matt! - Random Grab Bag Episode
Joe Reis and Matt Housley rant about generative AI, Data Council, WASM, data modeling, regulations, and how programming languages and paradigms impact the business.
#data #dataengineering #datamodeling #chatgpt
#120 - How Data Modeling Relates to Data Engineering w/ Larry Burns
Larry Burns is a longtime data modeling expert and author on the topic. He joins the show to discuss some of the (often misunderstood) ways data modeling relates to data engineering, logical data modeling, and much more. If you're a data engineer, you can't afford to miss this discussion. #dataengineering #data #datamodeling
#119 - Upskilling in a Downturn, Tech Education, Publishing a Tech Book, and More w/ Jess Haberman
Jess Haberman (Anaconda, formerly at O'Reilly Media) joins the show to chat about ways to upskill in a downturn, tech and data education, publishing a tech book, and much more.
Jess is zero BS and always brings a practical perspective. We should know. She signed our book ;)
#118 - Unlocking the Value of Unstructured Data with AI w/ Cody & Will (Coactive.ai)
Cody Coleman and Will Gaviria Rojas (Co-founders of Coactive.ai) join the show to chat about the rising importance of unstructured data and the role of AI in unlocking the value of unstructured content. Their motto is "Content is King, and AI is the new Queen." Given the rise of unstructured data, this is a must for anyone working in AI.
#117 - Designing Massive Distributed Systems at LinkedIn w/ Felix GV
Felix GV designs HUGE distributed systems as a principal staff engineer at LinkedIn. We talk about the Venice open-source project and the challenges of massively distributed data.
But Felix is also thinking about something WAY bigger - designing software for a multi-planetary civilization! This is a very interesting discussion that you won't usually come across on this planet, or others (yet).
#116 - Calling Data Bullsh*t w/ Rogier Werschkull
Rogier Werschkull is keen on detecting BS in the data industry. This is a very opinionated and informative discussion about the state of affairs of hype and reality in the data industry today.
Matt Turck's Gone MAD! - The 2023 ML, AI, and Data (MAD) Landscape
Matt Turck (Firstmark Capital) joins us to chat about what's new in the data landscape, the economic realities impacting the startup ecosystem, all things AI, and much more.
Matt is easily one of our favorite thinkers in the data and tech space, so this is definitely worth tuning in for.
MAD Landscape 2023: https://mattturck.com/mad2023/
#115 - Building Data Infrastructure w/ Neelesh Salian
Neelesh Salian (Staff Software Engineer at dbt Labs) joins the show to discuss building data infrastructure, general engineering topics at the staff+ level, and more.
#dataengineering #softwareengineering #data
#114 - How to Write a Best-Selling O'Reilly Technical Book w/ Joe Reis & Matt Housley
Joe and Matt discuss getting Fundamentals of Data Engineering signed with O'Reilly, the writing process, and marketing a best-selling tech book. This is worth a listen if you've ever wanted to write a technical book.
#dataengineering #data #author #oreilly
#113 - Random Data Grab Bag w/ Scott Taylor
Scott Taylor has become our regular for Valentine's Day week. Join us for a grab bag of thoughts and listener questions.
#112 - Real-time Analytics at Massive Scale w/ Venkat Venkataramani
Venkat Venkataramani (CEO & co-founder at Rockset) joins the show to chat about real-time analytics at massive scale. In a former life, Venkat led Facebook's infrastructure team that handled billions of events per second for all of Facebook's user data services. Venkat is an OG in the real-time analytics space, so you'll learn a lot about this topic.
#dataengineering #data #realtimedata #analytics
Data Catalogs - A Debate w/ Shirshanka Das & Ananth Packkildurai - Special Episode
#111 - Streaming Data In the Enterprise w/ John Kutay
John Kutay (Striim) joins the show to chat about “easy mode” and “hard mode” use cases for streaming data in the enterprise.
#110 - Data Quality - The Hard Parts w/ Jeremy Stanley (Anomalo)
What's Next, with Zhamak Dehghani (Next Data) - Special Episode
Zhamak Dehghani (CEO & Founder of Next Data) discusses why she started Next Data, and how her company will finally help Data Mesh become a reality.
Next Data: https://www.nextdata.com/
#109 - ColorWise, Content Creation, and More w/ Kate Strachnyi
Kate Strachnyi (DATAcated) joins the show to chat about her new book, ColorWise (O'Reilly), content creation, running DATAcated, and much more.
#108 - Switching From a Batch to Streaming Mindset w/ Chip Huyen
Chip Huyen (Claypot AI) joins the show to chat about switching from a batch mindset to a streaming mindset, the convergence of data scientist and data engineering roles, and much more.
Chip is a well-regarded person in the ML/AI space. She's CEO/co-founder of Claypot AI, teaches Machine Learning Systems Design at Stanford, and wrote the phenomenal book, Designing Machine Learning Systems (O'Reilly), among many achievements.
#datascience #ai #dataengineering