

DataTalks.Club, Aug 15, 2024

DataOps, Observability, and The Cure for Data Team Blues - Christopher Bergh
0:00
hi everyone Welcome to our event this event is brought to you by DataTalks.Club which is a community of people who love
0:06
data and we have weekly events and today one is one of such events and I guess we
0:12
are also a community of people who like to wake up early if you're from the states right Christopher or maybe not so
0:19
much because this is the time we usually have uh uh our events uh for our guests
0:27
and presenters from the states we usually do it in the evening of Berlin time but yes unfortunately it kind of
0:34
slipped my mind but anyways we have a lot of events you can check them in the
0:41
description like there's a link um I don't think there are a lot of them right now on that link but we will be
0:48
adding more and more I think we have like five or six uh interviews scheduled so um keep an eye on that do not forget
0:56
to subscribe to our YouTube channel this way you will get notified about all our future streams that will be as awesome
1:02
as the one today and of course very important do not forget to join our community where you can hang out with
1:09
other data enthusiasts during today's interview you can ask any question there's a pinned link in the live chat so click
1:18
on that link ask your question and we will be covering these questions during the interview now I will stop sharing my
1:27
screen and uh there is there's a a message in uh and Christopher is from
1:34
you so we actually have this on YouTube but so they have not seen what you wrote
1:39
but there is a message from to anyone who's watching this right now from Christopher saying hello everyone can I
1:46
call you Chris or you okay I should go I should uh I should look on YouTube then okay yeah but anyways I'll you don't
1:53
need like you we'll need to focus on answering questions and I'll keep an eye
1:58
I'll be keeping an eye on all the questions so um
2:04
yeah if you're ready we can start I'm ready yeah and you prefer Christopher
2:10
not Chris right Chris is fine Chris is fine it's a bit shorter um
2:18
okay so this week we'll talk about DataOps again maybe it's a tradition that we talk about DataOps every like once per
2:25
year but we actually skipped one year so because we did not have we haven't had
2:31
Chris for some time so today we have a very special guest Christopher Christopher is the co-founder CEO and
2:37
head chef or head cook at DataKitchen with 25 years of experience maybe this
2:43
is outdated uh cuz probably now you have more and maybe you stopped counting I
2:48
don't know but like with tons of years of experience in analytics and software engineering Christopher is known as the
2:55
co-author of the DataOps Cookbook and DataOps Manifesto and it's not the
3:00
first time we have Christopher here on the podcast we interviewed him two years ago also about DataOps and this one
3:07
will be about DataOps so we'll catch up and see what actually changed in in
3:13
these two years and yeah so welcome to the interview well thank you for having
3:19
me I'm I'm happy to be here and talking all things related to DataOps and why
3:24
why why bother with DataOps and happy to talk about the company or or what's changed
3:30
excited yeah so let's dive in so the questions for today's interview are prepared by Johanna Bayer as always
3:37
thanks Johanna for your help so before we start with our main topic for today
3:42
DataOps uh let's start with your background can you tell us about your career Journey so far and also for those who
3:50
have not heard have not listened to the previous podcast maybe you can um talk
3:55
about yourself and also for those who did listen to the previous you can also maybe give a summary of what has changed
4:03
in the last two years so we'll do yeah so um my name is Chris so I guess I'm
4:09
a sort of an engineer so I spent about the first 15 years of my career in
4:15
software sort of working and building some AI systems some non- AI systems uh
4:21
at uh US's NASA and MIT Lincoln Lab and then some startups and then um
4:30
Microsoft and then about 2005 I got I got the data bug uh I think you know my
4:35
kids were small and I thought oh this data thing was easy and I'd be able to go home uh for dinner at 5 and life
4:41
would be fine um because I was a big you started your own company right and uh it didn't work out that way
4:50
and um and what was interesting is is for me it the problem wasn't doing the
4:57
data like I we had smart people who did data science and data engineering the act of creating things it was like the
5:04
systems around the data that were hard um things it was really hard to not have
5:11
errors in production and I would sort of driving to work and I had a Blackberry at the time and I would not look at my
5:18
Blackberry all all morning I had this long drive to work and I'd sit in the parking lot and take a deep breath and
5:24
look at my Blackberry and go uh oh is there going to be any problems today and I'd be and if there wasn't I'd walk in
5:30
very happy um and if there was I'd have to like brace myself um and you know and
5:36
then the second problem is the team I worked for we just couldn't go fast enough the customers were super
5:42
demanding they didn't care they all they always thought things should be faster and we are always behind and so um how
5:50
do you you know how do you live in that world where things are breaking left and right you're terrified of making errors
5:57
um and then second you just can't go fast enough um and it's pre-Hadoop era
6:02
right it's like before all this big data Tech yeah before this was we were using
6:08
uh SQL Server um and we actually you know we had smart people so we we we
6:14
built an engine in SQL Server that made SQL Server a columnar
6:20
database so we built a columnar database inside of SQL Server um so uh
6:26
in order to make certain things fast and and uh yeah it was it was really uh it's not
6:33
bad I mean the principles are the same right before Hadoop it's it's still a database there's still indexes there's
6:38
still queries um things like that we we uh at the time uh you would use OLAP
6:43
engines we didn't use those but you those reports you know are for models it's it's not that different um you know
6:50
we had a rack of servers instead of the cloud um so yeah and I think so what what I
6:57
took from that was uh it's just hard to run a team of people to do do data and analytics and it's not
7:05
really I I took it from a manager perspective I started to read Deming and
7:11
think about the work that we do as a factory you know and in a factory that produces insight and not automobiles um
7:18
and so how do you run that factory so it produces things that are good of good
7:24
quality and then second since I had come from software I've been very influenced
7:29
by by the devops movement how you automate deployment how you run in an agile way how you
7:35
produce um how you how you change things quickly and how you innovate and so
7:41
those two things of like running you know running a really good solid production line that has very low errors
7:47
um and then second changing that production line very very often they're kind of opposite right um and so
7:55
how do you how do you as a manager how do you technically approach that and
8:00
then um 10 years ago when we started DataKitchen um we've always been a profitable company and so we started off
8:07
uh with some customers we started building some software and realized that we couldn't work any other way and that
8:13
the way we work wasn't understood by a lot of people so we had to write a book and a Manifesto to kind of share our our
8:21
methods and then so yeah we've been in so we've been in business now about a little over 10
8:28
years oh that's cool and uh like what
8:33
uh so let's talk about DataOps and you mentioned DevOps and how you were inspired by that and by the way like do
8:41
you remember roughly when DevOps as a thing started to appear like when did people start calling these principles
8:49
and like tools around them as de yeah so agile Manifesto well first of all the I
8:57
mean I had a boss in 1990 at NASA who had this idea build a
9:03
little test a little learn a lot right that was his Mantra and then which made
9:09
made a lot of sense um and so and then the sort of agile software Manifesto
9:14
came out which is very similar in 2001 and then um the sort of first real
9:22
DevOps was a guy at Twitter who started to do automated deployment you know
9:27
push a button and that was like 2009-ish and so the first I think DevOps
9:33
Meetup was around then so it's it's it's been 15 years I guess 6 like I was
9:39
trying to so I started my career in 2010 so I my first job was a Java
9:44
developer and like I remember for some things like we would just uh SFTP to the
9:52
machine and then put the jar archive there and then like keep our fingers crossed that it doesn't break uh uh like
10:00
it was not really the I wouldn't call it this way right you were deploying you
10:06
had a deploy process I'd put it yeah
10:11
right was that so that was documented too it was like put the jar on production cross your
10:17
fingers I think there was uh like a page on uh some internal wiki uh yeah that
10:25
describes like with passwords and don't like what you should do yeah that was and and I think what's interesting is
10:33
why that changed right and and we laugh at it now but that was why didn't you
10:38
invest in automating deployment or a whole bunch of automated regression
10:44
tests right that would run because I think in software now that would be rare
10:49
that people wouldn't use CI/CD they wouldn't have some automated tests you know functional
10:56
regression tests that would be the exception whereas that was the norm at the beginning of your career and so that's
11:03
what's interesting and I think you know if we if we talk about what's changed in the last two three years I I think it is
11:10
getting more standard there are um there's a lot more companies who are
11:15
talking DataOps or data observability um there's a lot more tools that are a lot more people are
11:22
using Git in data and analytics than ever before I think thanks to dbt um and
11:29
there's a lot of tools that are I think getting more code Centric right that
11:35
they're not treating their configuration like a black box there there's several
11:41
BI tools that tout the fact that they that they're uh you know they're they're Git-centric you know and and so and that
11:49
they're testable and that they have apis so things like that I think people maybe let's take a step back and just do a
11:57
quick summary of what DataOps is and then we can talk about like what changed in the last two years sure so I
12:06
guess it starts with a problem and that it's it sort of
12:11
admits some dark things about data and analytics and that we're not really successful and we're not really happy um
12:19
and if you look at the statistics on sort of projects and problems and even
12:25
the psychology like I think about a year or two ago we did a survey of
12:31
data Engineers 700 data engineers and 78% of them wanted their job to come with a therapist and 50% were thinking
12:38
of leaving the career altogether and so why why is everyone sort of unhappy well I I I think what happens is
12:46
teams either fall into two buckets they're sort of heroic teams who
12:52
are doing their they're working night and day they're trying really hard for their customer um and then they get
13:01
burnt out and then they quit honestly and then the second team have wrapped
13:06
their projects up in so much process and proceduralism and steps that doing
13:12
anything is sort of so slow and boring that they again leave in frustration um
13:18
or or live in cynicism and and that like the only outcome is quit and
13:24
start uh woodworking yeah the only outcome really is quit and start woodworking
13:29
and um as a as a manager I always hated that right because when when your team
13:35
is either full of heroes or proceduralism you always have people who have the whole system in their head
13:42
they're certainly key people and then when they leave they take all that knowledge with them and then that
13:48
creates a bottleneck and so both of which are aren aren't and I think the
13:53
main idea of DataOps is there's a balance between fear and heroism
14:00
that you can live you don't you know you don't have to be fearful 95% of the time maybe one or two percent it's good to be
14:06
fearful and you don't have to be a hero again maybe one or two percent it's good to be a hero but there's a balance um and
14:13
and in that balance you actually are much more prod

Working as a Core Developer in the Scikit-Learn Universe - Guillaume Lemaître
In this podcast episode, we talked with Guillaume Lemaître about navigating scikit-learn and imbalanced-learn.
🔗 CONNECT WITH Guillaume Lemaître
- LinkedIn: https://www.linkedin.com/in/guillaume-lemaitre-b9404939/
- Twitter: https://x.com/glemaitre58
- Github: https://github.com/glemaitre
- Website: https://glemaitre.github.io/
🔗 CONNECT WITH DataTalksClub
- Join the community: https://datatalks-club.slack.com/join/shared_invite/zt-2hu0sjeic-ESN7uHt~aVWc8tD3PefSlA#/shared-invite/email
- Subscribe to our Google calendar to have all our events in your calendar: https://calendar.google.com/calendar/u/0/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events: https://lu.ma/dtc-events
- LinkedIn: https://www.linkedin.com/company/datatalks-club/
- Twitter: https://twitter.com/DataTalksClub
- Website: https://datatalks.club/
🔗 CONNECT WITH ALEXEY
- Twitter: https://twitter.com/Al_Grigor
- Linkedin: https://www.linkedin.com/in/agrigorev/
🎙 ABOUT THE PODCAST
At DataTalksClub, we organize live podcasts that feature a diverse range of guests from the data field. Each podcast is a free-form conversation guided by a prepared set of questions, designed to learn about the guests’ career trajectories, life experiences, and practical advice. These insightful discussions draw on the expertise of data practitioners from various backgrounds. We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links.
You can access all the podcast episodes here: https://datatalks.club/podcast.html
📚 Check our free online courses
- ML Engineering course: http://mlzoomcamp.com
- Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
- MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
- Analytics in Stock Markets: https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp
- LLM course: https://github.com/DataTalksClub/llm-zoomcamp
- Read about all our courses in one place: https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html
👋🏼 GET IN TOUCH
If you want to support our community, use this link: https://github.com/sponsors/alexeygrigorev
If you're a company and want to support us, contact us at alexey@datatalks.club

Building a Domestic Risk Assessment Tool - Sabina Firtala
Links:
- LinkedIn: https://www.linkedin.com/company/frontline100/
- Ba Linh Le's LinkedIn: https://www.linkedin.com/in/ba-linh-le-/
- Sabina's LinkedIn: https://www.linkedin.com/in/sabina-firtala/
- Twitter: https://x.com/frontline_100?mx=2
- Website: https://www.frontline100.com/
Free LLM course: https://github.com/DataTalksClub/llm-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Berlin Buzzwords 2024

Community Building and Teaching in AI & Tech - Erum Afzal
We talked about:
- Erum's Background
- Omdena Academy and Erum’s Role There
- Omdena’s Community and Projects
- Course Development and Structure at Omdena Academy
- Student and Instructor Engagement
- Engagement and Motivation
- The Role of Teaching in Community Building
- The Importance of Communities for Career Building
- Advice for Aspiring Instructors and Freelancers
- DS and ML Talent Market Saturation
- Resources for Learning AI and Community Building
- Erum’s Resource Recommendations
Links:
LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/
Twitter: https://twitter.com/Erum55449739
Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Working in Open Source - Probabl.ai and sklearn - Vincent Warmerdam
We talked about:
- Vincent’s Background
- SciKit Learn’s History and Company Formation
- Maintaining and Transitioning Open Source Projects
- Teaching and Learning Through Open Source
- Role of Developer Relations and Content Creation
- Teaching Through Calm Code and The Importance of Content Creation
- Current Projects and Future Plans for Calm Code
- Data Processing Tricks and The Importance of Innovation
- Learning the Fundamentals and Changing the Way You See a Problem
- Dev Rel and Core Dev in One
- Why :probabl. Needs a Dev Rel
- Exploration of Skrub and Advanced Data Processing
- Personal Insights on SciKit Learn and Industry Trends
- Vincent’s Upcoming Projects
Links:
- probabl. YouTube channel: https://www.youtube.com/@UCIat2Cdg661wF5DQDWTQAmg
- Calmcode website: https://calmcode.io/
- probabl. website: https://probabl.ai/

AI for Ecology, Biodiversity, and Conservation - Tanya Berger-Wolf
Links:
- Biodiversity and Artificial Intelligence pdf: https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf

Knowledge Graphs and LLMs Across Academia and Industry - Anahita Pakiman
We talked about:
- Anahita's Background
- Mechanical Engineering and Applied Mechanics
- Finite Element Analysis vs. Machine Learning
- Optimization and Semantic Reporting
- Application of Knowledge Graphs in Research
- Graphs vs Tabular Data
- Computational graphs
- Graph Data Science and Graph Machine Learning
- Combining Knowledge Graphs and Large Language Models (LLMs)
- Practical Applications and Projects
- Challenges and Learnings
- Anahita’s Recommendations
Links:
- GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main

Inclusive Data Leadership Coaching - Tereza Iofciu
We talked about:
- Tereza’s background
- Switching from an Individual Contributor to Lead
- Python Pizza and the pizza management metaphor
- Learning to figure things out on your own and how to receive feedback
- Tereza as a leadership coach
- Podcasts
- Tereza’s coaching framework (selling yourself vs bragging)
- The importance of retrospectives
- The importance of communication and active listening
- Convincing people you don’t have power over
- Building relationships and empathy
- Inclusive leadership
Links:
- LinkedIn: https://www.linkedin.com/in/tereza-iofciu/
- Twitter: https://twitter.com/terezaif
- Github: https://github.com/terezaif
- Website: https://terezaiofciu.com

Building Production Search Systems - Daniel Svonava
Links:
- VectorHub: https://superlinked.com/vectorhub/?utm_source=community&utm_medium=podcast&utm_campaign=datatalks
- Daniel's LinkedIn: https://www.linkedin.com/in/svonava/
This podcast is sponsored by VectorHub, a free open-source learning community for all things vector embeddings and information retrieval systems.

Building Machine Learning Products - Reem Mahmoud
We talked about:
- Reem’s background
- Context-aware sensing and transfer learning
- Shifting focus from PhD to industry
- Reem’s experience with startups and dealing with prejudices towards PhDs
- AI interviewing solution
- How candidates react to getting interviewed by an AI avatar
- End-to-end overview of a machine learning project
- The pitfalls of using LLMs in your process
- Mitigating biases
- Addressing specific requirements for specific roles
- Reem’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/
- Website: https://topmate.io/reem_mahmoud

Make an Impact Through Volunteering Open Source Work - Sara EL-ATEIF
We talked about:
- Sara’s background
- On being a Google PhD fellow
- Sara’s volunteer work
- Finding AI volunteer work
- Sara’s Fruit Punch challenge
- How to take part in AI challenges
- AI Wonder Girls
- Hackathons
- Things people often miss in AI projects and hackathons
- Getting creative
- Fostering your social media
- Tips on applying for volunteer projects
- Why it’s worth doing volunteer projects
- Opportunities for data engineers and students
- Sara’s newsletter suggestions
Links:
- Dev and AI hackathons: https://devpost.com/
- Healthcare-focused challenges: https://grand-challenge.org/challenges/
- Volunteering in projects (AI4Good): https://www.fruitpunch.ai/
- Volunteering in projects (AI4Good) 2: https://www.omdena.com/
- Twitter: https://twitter.com/el_ateifSara
- Instagram: https://www.instagram.com/saraelateif/
- LinkedIn: https://www.linkedin.com/in/sara-el-ateif/
- Youtube: www.youtube.com/@elateifsara

Accelerating The Job Hunt for The Perfect Job in Tech - Sarah Mestiri
We talked about:
- Sarah’s background
- How Sarah became a coach and found her niche
- Sarah’s clients
- How Sarah helps her clients find the perfect job
- Finding a specialization
- Informational interviews
- Building a connection for mutual benefit
- The networking strategy
- Listing your projects in the CV
- The importance of doing research yourself and establishing your interests
- How to land a part-time job when the company wants full-time
- Age is not a factor
- Applying for jobs after finishing a course and the importance of sharing your learnings
- Sarah’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/sarahmestiri/
- Website: https://thrivingcareermoms.com/
- Personal Website: https://www.sarahmestiri.com/
- Youtube channel: https://www.youtube.com/@thrivingcareermoms444

Machine Learning Engineering in Finance - Nemanja Radojkovic
We talked about:
- Nemanja’s background
- When Nemanja first worked as a data person
- Typical problems that ML Ops folks solve in the financial sector
- What Nemanja currently does as an ML Engineer
- The obstacle of implementing new things in financial sector companies
- Going through the hurdles of DevOps
- Working with an on-premises cluster
- “ML Ops on a Shoestring” (You don’t need fancy stuff to start w/ ML Ops)
- Tactical solutions
- Platform work and code work
- Programming and soft skills needed to be an ML Engineer
- The challenges of transitioning from electrical engineering and sales to ML Ops
- The ML Ops tech stack for beginners
- Working on projects to determine which skills you need
Links:
- LinkedIn: https://www.linkedin.com/in/radojkovic/

Stock Market Analysis with Python and Machine Learning - Ivan Brigida
We talked about:
- Ivan’s background
- How Ivan became interested in investing
- Getting financial data to run simulations
- Open, High, Low, Close, Volume
- Risk management strategy
- Testing your trading strategies
- Sticking to your strategy
- Important metrics and remembering about trading fees
- Important features
- Deployment
- How DataTalks.Club courses helped Ivan
- Ivan’s site and course sign-up
Links:
- Exploring Finance APIs: https://pythoninvest.com/long-read/exploring-finance-apis
- Python Invest Blog Articles: https://pythoninvest.com/blog
Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Bayesian Modeling and Probabilistic Programming - Rob Zinkov
We talked about:
- Rob’s background
- Going from software engineering to Bayesian modeling
- Frequentist vs Bayesian modeling approach
- About integrals
- Probabilistic programming and samplers
- MCMC and Hakaru
- Language vs library
- Encoding dependencies and relationships into a model
- Stan, HMC (Hamiltonian Monte Carlo) , and NUTS
- Sources for learning about Bayesian modeling
- Reaching out to Rob
Links:
- Book 1: https://bayesiancomputationbook.com/welcome.html
- Book/Course: https://xcelab.net/rm/statistical-rethinking/

Navigating Challenges and Innovations in Search Technologies - Atita Arora
We talked about:
- Atita’s background
- How NLP relates to search
- Atita’s experience with Lucidworks and OpenSource Connections
- Atita’s experience with Qdrant and vector databases
- Utilizing vector search
- Major changes to search Atita has noticed throughout her career
- RAG (Retrieval-Augmented Generation)
- Building a chatbot out of transcripts with LLMs
- Ingesting the data and evaluating the results
- Keeping humans in the loop
- Application of vector databases for machine learning
- Collaborative filtering
- Atita’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/atitaarora/
- Twitter: https://x.com/atitaarora
- Github: https://github.com/atarora
- Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning
- Relevant Search: https://www.manning.com/books/relevant-search
- Let's learn about Vectors: https://hub.superlinked.com/
- Langchain: https://python.langchain.com/docs/get_started/introduction
- Qdrant blog: https://blog.qdrant.tech/
- OpenSource Connections Blog: https://opensourceconnections.com/blog/

The Entrepreneurship Journey: From Freelancing to Starting a Company - Adrian Brudaru
We talked about:
- Adrian’s background
- The benefits of freelancing
- Having an agency vs freelancing
- What led Adrian to switch over from freelancing
- The conception of DLT (Growth Full Stack)
- The investment required to start a company
- Growth through the provision of services
- Growth through teaching (product-market fit)
- Moving on to creating docs
- Adrian’s current role
- Strategic partnerships and community growth through DocDB
- Plans for the future of DLT
- DLT vs Airbyte vs Fivetran
- Adrian’s resource recommendations
Links:
- Adrian's LinkedIn: https://www.linkedin.com/in/data-team/
- Twitter: https://twitter.com/dlt_library
- Github: https://github.com/dlt-hub/dlt
- Website: https://dlthub.com/docs/intro

Become a Data Freelancer - Dimitri Visnadi
We talked about:
- Dimitri’s background
- The first steps of transitioning into freelance
- Working with recruiters (contracting)
- Deciding on what to charge for your services
- Establishing your network
- Self-marketing
- Contracting vs freelancing
- Which channel is better for those starting out?
- Cutting out the middleman
- Where to look for clients and how to vet them
- The different ways of getting into freelancing
- Going back to a full-time job after freelancing
- Common mistakes freelancers make
- Dimitri’s resource suggestions
- Reaching out to Dimitri
Links:
- LinkedIn profile: http://www.linkedin.com/in/visnadi
- The DataFreelancer website: https://thedatafreelancer.com/

AI for Digital Health - Maria Bruckert
We talked about:
- Maria’s background
- Deciding to go into telecare (healthcare)
- Current difficulties in healthcare
- Getting into the healthcare industry as a lifestyle brand
- The importance of a plan B and being flexible
- What is SQIN and the importance of communication
- Going from lipstick to skin health analysis
- The importance of community and broadening your audience
- The importance of feedback and communicating benefits
- The current state and growth of SQIN
- Convincing investors and the importance of proving profitability
- Maria’s role at SQIN
- Balancing a newborn child and a new company

Cracking the Code: Machine Learning Made Understandable - Christoph Molnar
We talked about:
- Christoph’s background
- Kaggle and other competitions
- How Christoph became interested in interpretable machine learning
- Interpretability vs Accuracy
- Christoph’s current competition engagement
- How Christoph chooses topics for books
- Why Christoph started the writing journey with a book
- Self-publishing vs via a publisher
- Christoph’s other books
- What is conformal prediction?
- Christoph’s book on SHAP
- Explainable AI vs Interpretable AI
- Working alone vs with other people
- Christoph’s other engagements and how to stay hands-on
- Keeping a logbook
- Does one have to be an expert on the topic to write a book about it?
- Writing in the open and other feedback gathering methods
- Advice for those who want to be technical writers
- Self-publishing tools
- Finding Christoph online
Links:
- LinkedIn: https://www.linkedin.com/in/christoph-molnar/
- Website: https://christophmolnar.com/
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

The Unwritten Rules for Success in Machine Learning - Jack Blandin
We talked about:
- Jack’s background
- Transitioning from IC to management
- Lessons not taught in traditional school
- The importance of people’s perception, trust, and respect
- How soft skills are relevant to machine learning
- How to put on a salesman hat in machine learning management
- The importance of visuals and building a POC as fast as possible
- 1st Rule of Machine Learning – don’t be afraid to start without machine learning
- The importance of understanding the reality that data represents
- The importance of putting yourself in the shoes of customers
- The importance of software engineering skills in machine learning
- Where to find Jack’s content
- Jack’s next venture
Links:
- Jack's LinkedIn profile: https://www.linkedin.com/in/jackblandin/
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

From a Research Scientist at Amazon to a Machine Learning/AI Consultant - Verena Webber
Links:
- Mini sound bath: https://www.youtube.com/watch?v=g-lDrcSqcrQ
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

From Marketing to Product Owner in Search - Lera Kaimashnikova
We talked about:
- Lera’s background
- Lera’s move from Ukraine to Germany
- The transition from Marketing to Product Ownership
- The importance of communication and one-on-ones
- The role of Product Owner
- Utilizing Scrum as a Product Owner
- Building teams and cross-functionality
- Lera’s experience learning about search
- The importance of having both technical knowledge and business context
- Open developer positions at AUTODOC
- What experience Lera came to AUTODOC with
- How marketing skills helped Lera in her current role
- Lera’s resource recommendations
- Everything is possible
Links:
- Post: https://www.linkedin.com/posts/leracaiman_elasticsearch-ecommerce-activity-7106615081588674560-5WQO
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Collaborative Data Science in Business - Ioannis Mesionis
Links:
- LinkedIn: https://www.linkedin.com/in/ioannis-mesionis/
- Github: https://github.com/ioannismesionis
- Website: https://ioannismesionis.github.io/
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Bridging Data Science and Healthcare - Eleni Stamatelou
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

DataTalks.Club Anniversary Interview - Alexey Grigorev, Johanna Bayer
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Data Engineering for Fraud Prevention - Angela Ramirez
We talked about:
- Angela's background
- Angela's role at Sam's Club
- The usefulness of knowing ML as a data engineer
- Angela's career path
- Transitioning from data analyst to data engineer/system designer
- Best practices for system design and data engineering
- Working with document databases
- Working with network-based databases
- Detecting fraud with a network-based database
- Selecting the database type to work with
- Neo4j vs Postgres
- The importance of having software engineering knowledge in data engineering
- Data quality check tooling
- The greatest challenges in data engineering
- Debugging and finding the root cause of a failed job
- What kinds of tools Angela uses on a daily basis
- Working with external data sources
- Angela's resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/aramirez1305/
- Twitter: https://twitter.com/angelamaria__r
- Github: https://github.com/aramir62
- Previous podcast talk: https://twitter.com/i/spaces/1OwGWwZAZDnGQ?s=20
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

From Data Manager to Data Architect - Loïc Magnien
We talked about:
- Loïc's background
- Data management
- Loïc's transition to data engineer
- Challenges in the transition to data engineering
- What is a data architect?
- The output of a data architect's work
- Establishing metrics and dimensions
- The importance of communication
- Setting up best practices for the team
- Staying relevant and tech-watching
- Setting up specifications for a pipeline
- Be agile, create a POC, iterate ASAP, and build reusable templates
- Reaching out to Loïc for questions
Links:
- Loïc's LinkedIn: https://www.linkedin.com/in/loicmagnien/
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Pragmatic and Standardized MLOps - Maria Vechtomova
We talked about:
- Maria's background
- Marvelous MLOps
- Maria's definition of MLOps
- Alternate team setups without a central MLOps team
- Pragmatic vs non-pragmatic MLOps
- Must-have ML tools (categories)
- Maturity assessment
- What to start with in MLOps
- Standardized MLOps
- Convincing DevOps to implement
- Understanding what the tools are used for instead of knowing all the tools
- Maria's next project plans
- Is LLM Ops a thing?
- What Ahold Delhaize does
- Resource recommendations to learn more about MLOps
- The importance of data engineering knowledge for ML engineers
Links:
- LinkedIn: https://www.linkedin.com/company/marvelous-mlops/
- Website: https://marvelousmlops.substack.com/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Democratizing Causality - Aleksander Molak
We talked about:
- Aleksander's background
- Aleksander as a Causal Ambassador
- Using causality to make decisions
- Counterfactuals and Judea Pearl
- Meta-learners vs classical ML models
- Average treatment effect
- Reducing causal bias, the super efficient estimator, and model uplifting
- Metrics for evaluating a causal model vs a traditional ML model
- Is the added complexity of a causal model worth implementing?
- Utilizing LLMs in causal models (text as outcome)
- Text as treatment and style extraction
- The viability of A/B tests in causal models
- Graphical structures and nonparametric identification
- Aleksander's resource recommendations
Links:
- The Book of Why: https://amzn.to/3OZpvBk
- Causal Inference and Discovery in Python: https://amzn.to/46Pperr
- Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python
- The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw
- New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Mastering Data Engineering as a Remote Worker - José María Sánchez Salas
We talked about:
- José's background
- How José relocated to Norway and his schedule
- Tech companies in Norway and José's role
- Challenges of working as a remote data engineer
- José's newsletter on how to make use of data
- The process of making data useful
- Where José gets inspiration for his newsletter
- Dealing with burnout
- When in Norway, do as the Norwegians do
- The legalities of working remotely in Norway
- The benefits of working remotely
Links:
- LinkedIn: https://www.linkedin.com/in/jmssalas
- Github: https://github.com/jmssalas
- Website & Newsletter: https://jmssalas.com
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

The Good, the Bad and the Ugly of GPT - Sandra Kublik
We talked about:
- Sandra's background
- Making a YouTube channel to break into the LLM space
- The business cases for LLMs
- LLMs as amplifiers
- The benefits of keeping a human in the loop when using LLMs (AI limitations)
- Using LLMs as assistants
- Building an app that uses an LLM
- Prompt whisperers and how to improve your prompts
- Sandra's 7-day LLM experiment
- Sandra's LLM content recommendations
- Finding Sandra online
Links:
- LinkedIn: https://www.linkedin.com/in/sandrakublik/
- Twitter: https://twitter.com/sandra_kublik
- Youtube: https://www.youtube.com/@sandra_kublik
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

LLMs for Everyone - Meryem Arik
We talked about:
- Meryem's background
- The constant evolution of startups
- How Meryem became interested in LLMs
- What is an LLM (generative vs non-generative models)?
- Why LLMs are important
- Open source models vs API models
- What TitanML does
- How fine-tuning a model helps in LLM use cases
- Fine-tuning generative models
- How generative models change the landscape of human work
- How to adjust models over time
- Vector databases and LLMs
- How to choose an open source LLM or an API
- Measuring input data quality
- Meryem's resource recommendations
Links:
- Website: https://www.titanml.co/
- Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml...
- Using llama2.0 in TitanML Blog: https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57
- Discord: https://discord.gg/83RmHTjZgf
- Meryem LinkedIn: https://www.linkedin.com/in/meryemarik/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Investing in Open-Source Data Tools - Bela Wiertz
We talked about:
- Bela's background
- Why startups even need investors
- Why open source is a viable go-to-market strategy
- Building a bottom-up community
- The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention
- Angel investors vs VC Funds vs family offices
- Bela's investment criteria and GitHub stars as a metric
- Inbound sourcing, outbound sourcing, and investor networking
- Making a good impression on an investor
- Balancing open and closed source parts of a product
- The future of open source
- Recent successes of open source companies
- Bela's resource recommendations
Links:
- Understand who is engaging with your open source project article: https://www.crowd.dev/
- Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building
- Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Why Machine Learning Design is Broken - Valerii Babushkin
Links:
- Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
- Discount: poddatatalks21 (35% off)
- Evidently: https://www.evidentlyai.com/
- Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Interpretable AI and ML - Polina Mosolova
We talked about:
- Polina's background
- How common it is for PhD students to build ML pipelines end-to-end
- Simultaneous PhD and industry experience
- Support from both the academic and industry sides
- How common the industrial PhD setup is and how to get into one
- Organizational trust theory
- How price relates to trust
- How trust relates to explainability
- The importance of actionability
- Explainability vs interpretability vs actionability
- Complex glass box models
- Does the explainability of a model follow explainability?
- What explainable AI brings to customers and end users
- Can all trust be turned into a KPI?
Links:
- LinkedIn: https://www.linkedin.com/in/polina-mosolova/
- Neural Additive Models paper: https://proceedings.neurips.cc/paper/2021/file/251bd0442dfcc53b5a761e050f8022b8-Paper.pdf
- Neural Basis Model paper: https://arxiv.org/pdf/2205.14120.pdf
- Interpretable Feature Spaces paper: https://kdd.org/exploration_files/vol24issue1_1._Interpretable_Feature_Spaces_revised.pdf

From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner
We talked about:
- Simon's background
- What MLOps is and what it isn't
- Skills needed to build an ML platform that serves hundreds of models
- Ranking the importance of skills
- The point where you should think about building an ML platform
- The importance of processes in ML platforms
- Weighing your options with SaaS platforms
- The exploratory setup, experiment tracking, and model registry
- What comes after deployment?
- Stitching tools together to create an ML platform
- Keeping data governance in mind when building a platform
- What comes first – the model or the platform?
- Do MLOps engineers need to have deep knowledge of how models work?
- Is API design important for MLOps?
- Simon's recommendations for furthering MLOps knowledge
Links:
- LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
- Github: https://github.com/stiebels
- Medium: https://medium.com/@sistel
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

From MLOps to DataOps - Santona Tuli
We talked about:
- Santona's background
- Focusing on data workflows
- Upsolver vs DBT
- ML pipelines vs Data pipelines
- MLOps vs DataOps
- Tools used for data pipelines and ML pipelines
- The “modern data stack” and today's data ecosystem
- Staging the data and the concept of a “lakehouse”
- Transforming the data after staging
- What happens after the modeling phase
- Human-centric vs Machine-centric pipeline
- Applying skills learned in academia to ML engineering
- Crafting user personas based on real stories
- A framework of curiosity
- Santona's book and resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/santona-tuli/
- Upsolver website: upsolver.com
- Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Data Developer Relations - Hugo Bowne-Anderson
We talked about:
- Hugo's background
- Why do tools and the companies that run them have wildly different names?
- Hugo's other projects besides Metaflow
- Transitioning from educator to DevRel
- What is DevRel?
- DevRel vs Marketing
- How DevRel coordinates with developers
- How DevRel coordinates with marketers
- What skills a DevRel needs
- The challenges that come with being an educator
- Becoming a good writer: nature vs nurture
- Hugo's approach to writing and suggestions
- Establishing a goal for your content
- Choosing a form of media for your content
- Is DevRel intercompany or intracompany?
- The Vanishing Gradients podcast
- Finding Hugo online
Links:
- Hugo Bowne-Anderson's GitHub: http://hugobowne.github.io/
- Vanishing Gradients: https://vanishinggradients.fireside.fm/
- MLOps and DevOps: Why Data Makes It Different: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
- Evaluate Metaflow for free, right from your Browser: https://outerbounds.com/sandbox/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Lessons Learned from Freelancing and Working in a Start-up - Antonis Stellas
We talked about:
- Antonis' background
- The pros and cons of working for a startup
- Useful skills for working at a startup and the Lean way to work
- How Antonis joined the DataTalks.Club community
- Suggestions for students joining the MLOps course
- Antonis contributing to Evidently AI
- How Antonis started freelancing
- Getting your first clients on Upwork
- Pricing your work as a freelancer
- The process after getting approved by a client
- Wearing many hats as a freelancer and while working at a startup
- Other suggestions for getting clients as a freelancer
- Antonis' thoughts on the Data Engineering course
- Antonis' resource recommendations
Links:
- Lean Startup by Eric Ries: https://theleanstartup.com/
- Lean Analytics: https://leananalyticsbook.com/
- Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
- Kafka Streaming with Python by Kris Jenkins (tutorial video): https://youtu.be/jItIQ-UvFI4
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Data Access Management - Bart Vandekerckhove
We talked about:
- Bart's background
- What is data governance?
- Data dictionaries and data lineage
- Data access management
- How to learn about data governance
- What skills are needed to do data governance effectively
- When an organization needs to start thinking about data governance
- Good data access management processes
- Data masking and the importance of automating data access
- DPO and CISO roles
- How data access management works with a data mesh approach
- Avoiding the role explosion problem
- The importance of data governance integration in DataOps
- Terraform as a stepping stone to data governance
- How Raito can help an organization with data governance
- Open-source data governance tools
Links:
- LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/
- Twitter: https://twitter.com/Bart_H_VDK
- Github: https://github.com/raito-io
- Website: https://www.raito.io/
- Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email
- DataQG Website: https://dataqg.com/
- DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email
- DMBOK (Data Management Body of Knowledge): https://www.dama.org/cpages/body-of-knowledge
- DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Data Strategy: Key Principles and Best Practices - Boyan Angelov
We talked about:
- Boyan's background
- What is data strategy?
- Due diligence and establishing a common goal
- Designing a data strategy
- Impact assessment, portfolio management, and DataOps
- Data products
- DataOps, Lean, and Agile
- Data Strategist vs Data Science Strategist
- The skills one needs to be a data strategist
- How does one become a data strategist?
- Data strategist as a translator
- Transitioning from a Data Strategist role to a CTO
- Using ChatGPT as a writing co-pilot
- Using ChatGPT as a starting point
- How ChatGPT can help in data strategy
- Pitching a data strategy to a stakeholder
- Setting baselines in a data strategy
- Boyan's book recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/angelovboyan/
- Twitter: https://twitter.com/thinking_code
- Github: https://github.com/boyanangelov
- Website: https://boyanangelov.com/
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Practical Data Privacy - Katharine Jarmul
We talked about:
- Katharine's background
- Katharine's ML privacy startup
- GDPR, CCPA, and the “opt-in as the default” approach
- What is data privacy?
- Finding Katharine's book – Practical Data Privacy
- The various definitions of data privacy and “user profiles”
- Privacy engineering and privacy-enhancing technologies
- Why data privacy is important
- What is differential privacy?
- The importance of keeping privacy in mind when designing systems
- Data privacy, using ChatGPT as an example
- Katharine's resource suggestions for learning about data privacy
Links:
- LinkedIn: https://www.linkedin.com/in/katharinejarmul/
- Twitter: https://twitter.com/kjam
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Building Scalable and Reliable Machine Learning Systems - Arseny Kravchenko
We talked about:
- Arseny's background
- Working on machine learning in startups
- What is Machine Learning System Design?
- Constraints and requirements
- Known unknowns vs unknown unknowns (Design stage)
- Writing a design document
- Technical problems vs product-oriented problems
- The solution part of the Design Document
- What motivated Arseny to write a book on ML System Design
- Examples of a Design Document in the book
- The types of readers for ML System Design
- Working with the co-author
- Reacting to constraints and feedback when writing a book
- Arseny's favorite chapter of the book
- Other resources where you can learn about ML System Design
- Twitter Giveaway
Links:
- Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
- Discount: poddatatalks21 (35% off)
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Building an Open-Source NLP Tool - Johannes Hötter
We talked about:
- Johannes’s background
- Johannes’s Open Source Spotlight demos – Refinery and Bricks
- The difficulties of working with natural language processing (NLP)
- Incorporating ChatGPT into a process as a heuristic
- What is Bricks?
- The process of starting a startup – Kern
- Making the decision to go with open source
- Pros and cons of launching as open source
- Kern’s business model
- Working with enterprises
- Johannes as a salesperson
- The team at Kern
- Johannes’s role at Kern
- How Johannes and Henrik separate responsibilities at Kern
- Working with very niche use cases
- The short story of how Kern got its funding
- Johannes’s resource recommendation
Links:
- Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
- Bricks' Github repo: https://github.com/code-kern-ai/bricks
- Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
- Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
- Discord: https://discord.com/invite/qf4rGCEphW
- Kern's website: https://www.kern.ai
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Navigating Industrial Data Challenges - Rosona Eldred
We talked about:
- Rosona’s background
- How mathematics knowledge helps in industry
- What is industrial data?
- Setting up an industrial process using blue paint
- Internet companies’ data vs industrial data
- Explaining industrial processes using packing peanuts
- Why productive industry needs data
- Measuring product qualities
- How data specialists use industrial data
- Defining and measuring sustainability
- Using data to react to changing regulations
- Types of industrial data
- Solving problems and optimizing with industrial data
- Industrial solvers
- Tiny data vs Big data in productive industry
- The advantages of coming from academia into productive industry
- Materials and resources for industrial data
- Women in industry
- Why Rosona decided to shift to industrial data
Links:
- Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom

Mastering Self-Learning in Machine Learning - Aaisha Muhammad
We talked about:
- Aaisha’s background
- How homeschooling affects self-study
- Deciding on what to learn about
- Establishing whether a resource is good
- How Aaisha focuses on learning
- Deciding on what kind of project to build
- Finding research materials
- Aaisha’s experience with the DataTalks.Club ML Zoomcamp
- ML Zoomcamp projects
- Aaisha’s interest in bioinformatics
- Keeping motivated with deadlines
- Notes and time-tracking tools
- Drawbacks to self-studying
- Aaisha’s interest in machine learning
- Aaisha’s least favorite part of ML Zoomcamp
- Helping people as a way to learn
- Using ChatGPT as a “study group”
- Is it possible to use self-studying to learn high-level topics?
- Switching topics to avoid burnout
- Aaisha’s resource recommendations
Links:
- LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
- Twitter: https://twitter.com/ZealousMushroom
- Github: https://github.com/AaishaMuhammad
- Website: http://www.aaishamuhammad.co.za/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

The Secret Sauce of Data Science Management - Shir Meir Lador
We talked about:
- Shir’s background
- Debrief culture
- The responsibilities of a group manager
- Defining the success of a DS manager
- The three pillars of data science management
- Managing up
- Managing down
- Managing across
- Managing data science teams vs business teams
- Scrum teams, brainstorming, and sprints
- The most important skills and strategies for DS and ML managers
- Making sure proofs of concept get into production
Links:
- The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
- Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
- How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
- How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
- Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
- Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

SE4ML - Software Engineering for Machine Learning - Nadia Nahar
We talked about:
- Nadia’s background
- Academic research in software engineering
- Design patterns
- Software engineering for ML systems
- Problems that people in industry have with software engineering and ML
- Communication issues and setting requirements
- Artifact research in open source products
- Product vs model
- Nadia’s open source product dataset
- Failure points in machine learning projects
- Finding solutions to issues using Nadia’s dataset and experience
- The problem of siloing data scientists and other structure issues
- The importance of documentation and checklists
- Responsible AI
- How data scientists and software engineers can work in an Agile way
Links:
- Model Card: https://arxiv.org/abs/1810.03993
- Datasheets: https://arxiv.org/abs/1803.09010
- Factsheets: https://arxiv.org/abs/1808.07261
- Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
- Arxiv version: https://arxiv.org/pdf/2110.
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html