DataTalks.Club

By DataTalks.Club

DataTalks.Club - the place to talk about data!

DataOps, Observability, and The Cure for Data Team Blues - Christopher Bergh

DataTalks.Club · Aug 15, 2024

0:00

hi everyone welcome to our event this event is brought to you by DataTalks.Club which is a community of people who love

0:06

data and we have weekly events and today is one of such events and I guess we

0:12

are also a community of people who like to wake up early if you're from the states right Christopher or maybe not so

0:19

much because this is the time we usually have uh uh our events uh for our guests

0:27

and presenters from the states we usually do it in the evening of Berlin time but yes unfortunately it kind of

0:34

slipped my mind but anyways we have a lot of events you can check them in the

0:41

description like there's a link um I don't think there are a lot of them right now on that link but we will be

0:48

adding more and more I think we have like five or six uh interviews scheduled so um keep an eye on that do not forget

0:56

to subscribe to our YouTube channel this way you will get notified about all our future streams that will be as awesome

1:02

as the one today and of course very important do not forget to join our community where you can hang out with

1:09

other data enthusiasts during today's interview you can ask any question there's a pinned link in the live chat so click

1:18

on that link ask your question and we will be covering these questions during the interview now I will stop sharing my

1:27

screen and uh there is there's a a message in uh and Christopher is from

1:34

you so we actually have this on YouTube but so they have not seen what you wrote

1:39

but there is a message from to anyone who's watching this right now from Christopher saying hello everyone can I

1:46

call you Chris or you okay I should go I should uh I should look on YouTube then okay yeah but anyways I'll you don't

1:53

need like you we'll need to focus on answering questions and I'll keep an eye

1:58

I'll be keeping an eye on all the questions so um

2:04

yeah if you're ready we can start I'm ready yeah and you prefer Christopher

2:10

not Chris right Chris is fine Chris is fine it's a bit shorter um

2:18

okay so this week we'll talk about DataOps again maybe it's a tradition that we talk about DataOps every like once per

2:25

year but we actually skipped one year so because we did not have we haven't had

2:31

Chris for some time so today we have a very special guest Christopher Christopher is the co-founder CEO and

2:37

head chef or head cook at DataKitchen with 25 years of experience maybe this

2:43

is outdated uh cuz probably now you have more and maybe you stopped counting I

2:48

don't know but like with tons of years of experience in analytics and software engineering Christopher is known as the

2:55

co-author of the DataOps Cookbook and DataOps Manifesto and it's not the

3:00

first time we have Christopher here on the podcast we interviewed him two years ago also about DataOps and this one

3:07

will be about DataOps so we'll catch up and see what actually changed in

3:13

these two years and yeah so welcome to the interview well thank you for having

3:19

me I'm happy to be here and talking all things related to DataOps and why

3:24

why bother with DataOps and happy to talk about the company or what's changed

3:30

excited yeah so let's dive in so the questions for today's interview are prepared by Johanna Bayer as always

3:37

thanks Johanna for your help so before we start with our main topic for today

3:42

DataOps uh let's start with your background can you tell us about your career journey so far and also for those who

3:50

have not heard have not listened to the previous podcast maybe you can um talk

3:55

about yourself and also for those who did listen to the previous you can also maybe give a summary of what has changed

4:03

in the last two years so we'll do yeah so um my name is Chris so I guess I'm

4:09

a sort of an engineer so I spent about the first 15 years of my career in

4:15

software sort of working and building some AI systems some non-AI systems uh

4:21

at uh the US's NASA and MIT Lincoln Lab and then some startups and then um

4:30

Microsoft and then about 2005 I got I got the data bug uh I think you know my

4:35

kids were small and I thought oh this data thing was easy and I'd be able to go home uh for dinner at 5 and life

4:41

would be fine um because I was a big you started your own company right and uh it didn't work out that way

4:50

and um and what was interesting is is for me it the problem wasn't doing the

4:57

data like I we had smart people who did data science and data engineering the act of creating things it was like the

5:04

systems around the data that were hard um things it was really hard to not have

5:11

errors in production and I would sort of driving to work and I had a Blackberry at the time and I would not look at my

5:18

Blackberry all all morning I had this long drive to work and I'd sit in the parking lot and take a deep breath and

5:24

look at my Blackberry and go uh oh is there going to be any problems today and I'd be and if there wasn't I'd walk in

5:30

very happy um and if there was I'd have to like brace myself um and you know and

5:36

then the second problem is the team I worked for we just couldn't go fast enough the customers were super

5:42

demanding they didn't care they always thought things should be faster and we were always behind and so um how

5:50

do you you know how do you live in that world where things are breaking left and right you're terrified of making errors

5:57

um and then second you just can't go fast enough um and it's pre-Hadoop era

6:02

right it's like before all this big data Tech yeah before this was we were using

6:08

uh SQL Server um and we actually you know we had smart people so we we we

6:14

built an engine in SQL Server that made SQL Server a columnar

6:20

database so we built a columnar database inside of SQL Server um so uh

6:26

in order to make certain things fast and and uh yeah it was it was really uh it's not

6:33

bad I mean the principles are the same right before Hadoop it's it's still a database there's still indexes there's

6:38

still queries um things like that we we uh at the time uh you would use OLAP

6:43

engines we didn't use those but you those reports you know are for models it's it's not that different um you know

6:50

we had a rack of servers instead of the cloud um so yeah and I think so what what I

6:57

took from that was uh it's just hard to run a team of people to do do data and analytics and it's not

7:05

really I I took it from a manager perspective I started to read Deming and

7:11

think about the work that we do as a factory you know and in a factory that produces insight and not automobiles um

7:18

and so how do you run that factory so it produces things that are good of good

7:24

quality and then second since I had come from software I've been very influenced

7:29

by the DevOps movement how you automate deployment how you run in an agile way how you

7:35

produce um how you how you change things quickly and how you innovate and so

7:41

those two things of like running you know running a really good solid production line that has very low errors

7:47

um and then second changing that production line at at very very often they're kind of opposite right um and so

7:55

how do you how do you as a manager how do you technically approach that and

8:00

then um 10 years ago when we started DataKitchen um we've always been a profitable company and so we started off

8:07

uh with some customers we started building some software and realized that we couldn't work any other way and that

8:13

the way we work wasn't understood by a lot of people so we had to write a book and a Manifesto to kind of share our our

8:21

methods and then so yeah we've been in so we've been in business now about a little over 10

8:28

years oh that's cool and uh like what

8:33

uh so let's talk about DataOps and you mentioned DevOps and how you were inspired by that and by the way like do

8:41

you remember roughly when DevOps as a thing started to appear like when did people start calling these principles

8:49

and like tools around them as DevOps yeah so Agile Manifesto well first of all the I

8:57

mean I had a boss in 1990 at NASA who had this idea build a

9:03

little test a little learn a lot right that was his Mantra and then which made

9:09

made a lot of sense um and so and then the sort of agile software Manifesto

9:14

came out which is very similar in 2001 and then um the sort of first real

9:22

DevOps was a guy at Twitter started to do automated deployment you know

9:27

push a button and that was like 2009-ish and so the first I think DevOps

9:33

Meetup was around then so it's it's it's been 15 years I guess 6 like I was

9:39

trying to so I started my career in 2010 so I my first job was a Java

9:44

developer and like I remember for some things like we would just uh SFTP to the

9:52

machine and then put the jar archive there and then like keep our fingers crossed that it doesn't break uh uh like

10:00

it was not really the I wouldn't call it this way right you were deploying you

10:06

had a deploy process I'd put it yeah

10:11

right was that so that was documented too it was like put the jar on production cross your

10:17

fingers I think there was uh like a page on uh some internal wiki uh yeah that

10:25

describes like with passwords and don't like what you should do yeah that was and and I think what's interesting is

10:33

why that changed right and and we laugh at it now but that was why didn't you

10:38

invest in automating deployment or a whole bunch of automated regression

10:44

tests right that would run because I think in software now that would be rare

10:49

that people wouldn't use CI/CD they wouldn't have some automated tests you know functional

10:56

regression tests that would be the exception whereas that was the norm at the beginning of your career and so that's

11:03

what's interesting and I think you know if we if we talk about what's changed in the last two three years I I think it is

11:10

getting more standard there are um there's a lot more companies who are

11:15

talking DataOps or data observability um there's a lot more tools that are a lot more people are

11:22

using Git in data and analytics than ever before I think thanks to dbt um and

11:29

there's a lot of tools that are I think getting more code-centric right that

11:35

they're not treating their configuration like a black box there there's several

11:41

BI tools that tout the fact that they're uh you know they're Git-centric you know and so and that

11:49

they're testable and that they have APIs so things like that I think people maybe let's take a step back and just do a

11:57

quick summary of what DataOps is and then we can talk about like what changed in the last two years sure so I

12:06

guess it starts with a problem and that it's it sort of

12:11

admits some dark things about data and analytics and that we're not really successful and we're not really happy um

12:19

and if you look at the statistics on sort of projects and problems and even

12:25

the psychology like I think about a year or two we did a survey of

12:31

data Engineers 700 data engineers and 78% of them wanted their job to come with a therapist and 50% were thinking

12:38

of leaving the career altogether and so why why is everyone sort of unhappy well I I I think what happens is

12:46

teams either fall into two buckets they're sort of heroic teams who

12:52

are doing their they're working night and day they're trying really hard for their customer um and then they get

13:01

burnt out and then they quit honestly and then the second team have wrapped

13:06

their projects up in so much process and proceduralism and steps that doing

13:12

anything is sort of so slow and boring that they again leave in frustration um

13:18

or or live in cynicism and and that like the only outcome is quit and

13:24

start uh woodworking yeah the only outcome really is quit and start woodworking

13:29

and um as a as a manager I always hated that right because when when your team

13:35

is either full of heroes or proceduralism you always have people who have the whole system in their head

13:42

they're certainly key people and then when they leave they take all that knowledge with them and then that

13:48

creates a bottleneck and so both of which are aren aren't and I think the

13:53

main idea of DataOps is there's a balance between fear and heroism

14:00

that you can live you don't you know you don't have to be fearful 95% of the time maybe one or two percent it's good to be

14:06

fearful and you don't have to be a hero again maybe one or two percent it's good to be a hero but there's a balance um and

14:13

and in that balance you actually are much more prod

Aug 15, 2024 · 53:47
Working as a Core Developer in the Scikit-Learn Universe - Guillaume Lemaître

In this podcast episode, we talked with Guillaume Lemaître about navigating scikit-learn and imbalanced-learn.

🔗 CONNECT WITH Guillaume Lemaître

  • LinkedIn: https://www.linkedin.com/in/guillaume-lemaitre-b9404939/
  • Twitter: https://x.com/glemaitre58
  • Github: https://github.com/glemaitre
  • Website: https://glemaitre.github.io/

🔗 CONNECT WITH DataTalksClub

  • Join the community: https://datatalks-club.slack.com/join/shared_invite/zt-2hu0sjeic-ESN7uHt~aVWc8tD3PefSlA#/shared-invite/email
  • Subscribe to our Google calendar to have all our events in your calendar: https://calendar.google.com/calendar/u/0/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
  • Check other upcoming events: https://lu.ma/dtc-events
  • LinkedIn: https://www.linkedin.com/company/datatalks-club/
  • Twitter: https://twitter.com/DataTalksClub
  • Website: https://datatalks.club/

🔗 CONNECT WITH ALEXEY

  • Twitter: https://twitter.com/Al_Grigor
  • Linkedin: https://www.linkedin.com/in/agrigorev/

🎙 ABOUT THE PODCAST

At DataTalksClub, we organize live podcasts that feature a diverse range of guests from the data field. Each podcast is a free-form conversation guided by a prepared set of questions, designed to learn about the guests’ career trajectories, life experiences, and practical advice. These insightful discussions draw on the expertise of data practitioners from various backgrounds. We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links.

You can access all the podcast episodes here - https://datatalks.club/podcast.html

📚 Check our free online courses

  • ML Engineering course: http://mlzoomcamp.com
  • Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
  • MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
  • Analytics in Stock Markets: https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp
  • LLM course: https://github.com/DataTalksClub/llm-zoomcamp
  • Read about all our courses in one place: https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html

👋🏼 GET IN TOUCH

If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev

If you're a company and want to support us, contact at alexey@datatalks.club

Jul 26, 2024 · 52:31
Building a Domestic Risk Assessment Tool - Sabina Firtala

Links:

  • LinkedIn: https://www.linkedin.com/company/frontline100/
  • Ba Linh Le's LinkedIn: https://www.linkedin.com/in/ba-linh-le-/
  • Sabina's LinkedIn: https://www.linkedin.com/in/sabina-firtala/
  • Twitter: https://x.com/frontline_100?mx=2
  • Website: https://www.frontline100.com/

Free LLM course: https://github.com/DataTalksClub/llm-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Jul 13, 2024 · 49:36
Berlin Buzzwords 2024


Jul 06, 2024 · 37:33
Community Building and Teaching in AI & Tech - Erum Afzal

We talked about:

  • Erum's Background
  • Omdena Academy and Erum’s Role There
  • Omdena’s Community and Projects
  • Course Development and Structure at Omdena Academy
  • Student and Instructor Engagement
  • Engagement and Motivation
  • The Role of Teaching in Community Building
  • The Importance of Communities for Career Building
  • Advice for Aspiring Instructors and Freelancers
  • DS and ML Talent Market Saturation
  • Resources for Learning AI and Community Building
  • Erum’s Resource Recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/erum-afzal-64827b24/
  • Twitter: https://twitter.com/Erum55449739

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

May 10, 2024 · 50:01
Working in Open Source - Probabl.ai and sklearn - Vincent Warmerdam

We talked about:

  • Vincent’s Background
  • scikit-learn’s History and Company Formation
  • Maintaining and Transitioning Open Source Projects
  • Teaching and Learning Through Open Source
  • Role of Developer Relations and Content Creation
  • Teaching Through Calm Code and The Importance of Content Creation
  • Current Projects and Future Plans for Calm Code
  • Data Processing Tricks and The Importance of Innovation
  • Learning the Fundamentals and Changing the Way You See a Problem
  • Dev Rel and Core Dev in One
  • Why :probabl. Needs a Dev Rel
  • Exploration of Skrub and Advanced Data Processing
  • Personal Insights on scikit-learn and Industry Trends
  • Vincent’s Upcoming Projects

Links:

  • probabl. YouTube channel: https://www.youtube.com/@UCIat2Cdg661wF5DQDWTQAmg
  • Calmcode website: https://calmcode.io/
  • probabl. website: https://probabl.ai/


May 03, 2024 · 52:02
AI for Ecology, Biodiversity, and Conservation - Tanya Berger-Wolf

Links:

  • Biodiversity and Artificial Intelligence pdf: https://www.gpai.ai/projects/responsible-ai/environment/biodiversity-and-AI-opportunities-recommendations-for-action.pdf


Apr 26, 2024 · 51:48
Knowledge Graphs and LLMs Across Academia and Industry - Anahita Pakiman

We talked about:

  • Anahita's Background
  • Mechanical Engineering and Applied Mechanics
  • Finite Element Analysis vs. Machine Learning
  • Optimization and Semantic Reporting
  • Application of Knowledge Graphs in Research
  • Graphs vs Tabular Data
  • Computational graphs
  • Graph Data Science and Graph Machine Learning
  • Combining Knowledge Graphs and Large Language Models (LLMs)
  • Practical Applications and Projects
  • Challenges and Learnings
  • Anahita’s Recommendations


Links:

  • GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main

Apr 05, 2024 · 53:15
Inclusive Data Leadership Coaching - Tereza Iofciu

We talked about:

  • Tereza’s background
  • Switching from an Individual Contributor to Lead
  • Python Pizza and the pizza management metaphor
  • Learning to figure things out on your own and how to receive feedback
  • Tereza as a leadership coach
  • Podcasts
  • Tereza’s coaching framework (selling yourself vs bragging)
  • The importance of retrospectives
  • The importance of communication and active listening
  • Convincing people you don’t have power over
  • Building relationships and empathy
  • Inclusive leadership


Links:

  • LinkedIn: https://www.linkedin.com/in/tereza-iofciu/
  • Twitter: https://twitter.com/terezaif
  • Github: https://github.com/terezaif
  • Website: https://terezaiofciu.com


Mar 29, 2024 · 48:17
Building Production Search Systems - Daniel Svonava
Mar 22, 2024 · 58:26
Building Machine Learning Products - Reem Mahmoud

We talked about:


  • Reem’s background
  • Context-aware sensing and transfer learning
  • Shifting focus from PhD to industry
  • Reem’s experience with startups and dealing with prejudices towards PhDs
  • AI interviewing solution
  • How candidates react to getting interviewed by an AI avatar
  • End-to-end overview of a machine learning project
  • The pitfalls of using LLMs in your process
  • Mitigating biases
  • Addressing specific requirements for specific roles
  • Reem’s resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/
  • Website: https://topmate.io/reem_mahmoud


Mar 16, 2024 · 56:48
Make an Impact Through Volunteering Open Source Work - Sara EL-ATEIF

We talked about:

  • Sara’s background
  • On being a Google PhD fellow
  • Sara’s volunteer work
  • Finding AI volunteer work
  • Sara’s Fruit Punch challenge
  • How to take part in AI challenges
  • AI Wonder Girls
  • Hackathons
  • Things people often miss in AI projects and hackathons
  • Getting creative
  • Fostering your social media
  • Tips on applying for volunteer projects
  • Why it’s worth doing volunteer projects
  • Opportunities for data engineers and students
  • Sara’s newsletter suggestions


Links:

  • Dev and AI hackathons: https://devpost.com/
  • Healthcare-focused challenges: https://grand-challenge.org/challenges/
  • Volunteering in projects (AI4Good): https://www.fruitpunch.ai/
  • Volunteering in projects (AI4Good) 2: https://www.omdena.com/
  • Twitter: https://twitter.com/el_ateifSara
  • Instagram: https://www.instagram.com/saraelateif/
  • LinkedIn: https://www.linkedin.com/in/sara-el-ateif/
  • YouTube: www.youtube.com/@elateifsara


Feb 23, 2024 · 55:56
Accelerating The Job Hunt for The Perfect Job in Tech - Sarah Mestiri

We talked about:

  • Sarah’s background
  • How Sarah became a coach and found her niche
  • Sarah’s clients
  • How Sarah helps her clients find the perfect job
  • Finding a specialization
  • Informational interviews
  • Building a connection for mutual benefit
  • The networking strategy
  • Listing your projects in the CV
  • The importance of doing research yourself and establishing your interests
  • How to land a part-time job when the company wants full-time
  • Age is not a factor
  • Applying for jobs after finishing a course and the importance of sharing your learnings
  • Sarah’s resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/sarahmestiri/
  • Website: https://thrivingcareermoms.com/
  • Personal Website: https://www.sarahmestiri.com/
  • YouTube channel: https://www.youtube.com/@thrivingcareermoms444

Feb 02, 2024 · 53:05
Machine Learning Engineering in Finance - Nemanja Radojkovic

We talked about:

  • Nemanja’s background
  • When Nemanja first worked as a data person
  • Typical problems that ML Ops folks solve in the financial sector
  • What Nemanja currently does as an ML Engineer
  • The obstacle of implementing new things in financial sector companies
  • Going through the hurdles of DevOps
  • Working with an on-premises cluster
  • “ML Ops on a Shoestring” (You don’t need fancy stuff to start w/ ML Ops)
  • Tactical solutions
  • Platform work and code work
  • Programming and soft skills needed to be an ML Engineer
  • The challenges of transitioning from electrical engineering and sales to ML Ops
  • The ML Ops tech stack for beginners
  • Working on projects to determine which skills you need


Links:

  • LinkedIn: https://www.linkedin.com/in/radojkovic/

Jan 31, 2024 · 53:11
Stock Market Analysis with Python and Machine Learning - Ivan Brigida

We talked about:

  • Ivan’s background
  • How Ivan became interested in investing
  • Getting financial data to run simulations
  • Open, High, Low, Close, Volume
  • Risk management strategy
  • Testing your trading strategies
  • Sticking to your strategy
  • Important metrics and remembering about trading fees
  • Important features
  • Deployment
  • How DataTalks.Club courses helped Ivan
  • Ivan’s site and course sign-up


Links:

  • Exploring Finance APIs: https://pythoninvest.com/long-read/exploring-finance-apis
  • Python Invest Blog Articles: https://pythoninvest.com/blog


Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Jan 24, 2024 · 55:31
Bayesian Modeling and Probabilistic Programming - Rob Zinkov

We talked about:

  • Rob’s background
  • Going from software engineering to Bayesian modeling
  • Frequentist vs Bayesian modeling approach
  • About integrals
  • Probabilistic programming and samplers
  • MCMC and Hakaru
  • Language vs library
  • Encoding dependencies and relationships into a model
  • Stan, HMC (Hamiltonian Monte Carlo), and NUTS
  • Sources for learning about Bayesian modeling
  • Reaching out to Rob


Links:

  • Book 1: https://bayesiancomputationbook.com/welcome.html
  • Book/Course: https://xcelab.net/rm/statistical-rethinking/

Jan 22, 2024 · 54:16
Navigating Challenges and Innovations in Search Technologies - Atita Arora

We talked about:


  • Atita’s background
  • How NLP relates to search
  • Atita’s experience with Lucidworks and OpenSource Connections
  • Atita’s experience with Qdrant and vector databases
  • Utilizing vector search
  • Major changes to search Atita has noticed throughout her career
  • RAG (Retrieval-Augmented Generation)
  • Building a chatbot out of transcripts with LLMs
  • Ingesting the data and evaluating the results
  • Keeping humans in the loop
  • Application of vector databases for machine learning
  • Collaborative filtering
  • Atita’s resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/atitaarora/
  • Twitter: https://x.com/atitaarora
  • Github: https://github.com/atarora
  • Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning
  • Relevant Search: https://www.manning.com/books/relevant-search
  • Let's learn about Vectors: https://hub.superlinked.com/
  • LangChain: https://python.langchain.com/docs/get_started/introduction
  • Qdrant blog: https://blog.qdrant.tech/
  • OpenSource Connections Blog: https://opensourceconnections.com/blog/

Dec 27, 2023 · 57:00
The Entrepreneurship Journey: From Freelancing to Starting a Company - Adrian Brudaru

We talked about:

  • Adrian’s background
  • The benefits of freelancing
  • Having an agency vs freelancing
  • What led Adrian to switch over from freelancing
  • The conception of DLT (Growth Full Stack)
  • The investment required to start a company
  • Growth through the provision of services
  • Growth through teaching (product-market fit)
  • Moving on to creating docs
  • Adrian’s current role
  • Strategic partnerships and community growth through DocDB
  • Plans for the future of DLT
  • DLT vs Airbyte vs Fivetran
  • Adrian’s resource recommendations


Links:

  • Adrian's LinkedIn: https://www.linkedin.com/in/data-team/
  • Twitter: https://twitter.com/dlt_library
  • Github: https://github.com/dlt-hub/dlt
  • Website: https://dlthub.com/docs/intro


Dec 19, 2023 · 56:22
Become a Data Freelancer - Dimitri Visnadi

We talked about:

  • Dimitri’s background
  • The first steps of transitioning into freelance
  • Working with recruiters (contracting)
  • Deciding on what to charge for your services
  • Establishing your network
  • Self-marketing
  • Contracting vs freelancing
  • Which channel is better for those starting out?
  • Cutting out the middleman
  • Where to look for clients and how to vet them
  • The different ways of getting into freelancing
  • Going back to a full-time job after freelancing
  • Common mistakes freelancers make
  • Dimitri’s resource suggestions
  • Reaching out to Dimitri


Links:

  • LinkedIn profile: http://www.linkedin.com/in/visnadi
  • The DataFreelancer website: https://thedatafreelancer.com/


Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Dec 17, 2023 55:13
AI for Digital Health - Maria Bruckert

We talked about:


  • Maria’s background
  • Deciding to go into telecare (healthcare)
  • Current difficulties in healthcare
  • Getting into the healthcare industry as a lifestyle brand
  • The importance of a plan B and being flexible
  • What is SQIN and the importance of communication
  • Going from lipstick to skin health analysis
  • The importance of community and broadening your audience
  • The importance of feedback and communicating benefits
  • The current state and growth of SQIN
  • Convincing investors and the importance of proving profitability
  • Maria’s role at SQIN
  • Balancing a newborn child and a new company


Links:

  • Free ML Engineering course: http://mlzoomcamp.com
  • Join DataTalks.Club: https://datatalks.club/slack.html
  • Our events: https://datatalks.club/events.html

Dec 04, 2023 50:25
Cracking the Code: Machine Learning Made Understandable - Christoph Molnar

We talked about:

  • Christoph’s background
  • Kaggle and other competitions
  • How Christoph became interested in interpretable machine learning
  • Interpretability vs Accuracy
  • Christoph’s current competition engagement
  • How Christoph chooses topics for books
  • Why Christoph started the writing journey with a book
  • Self-publishing vs via a publisher
  • Christoph’s other books
  • What is conformal prediction?
  • Christoph’s book on SHAP
  • Explainable AI vs Interpretable AI
  • Working alone vs with other people
  • Christoph’s other engagements and how to stay hands-on
  • Keeping a logbook
  • Does one have to be an expert on the topic to write a book about it?
  • Writing in the open and other feedback gathering methods
  • Advice for those who want to be technical writers
  • Self-publishing tools
  • Finding Christoph online


Links:

  • LinkedIn: https://www.linkedin.com/in/christoph-molnar/
  • Website: https://christophmolnar.com/


Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Nov 26, 2023 51:59
The Unwritten Rules for Success in Machine Learning - Jack Blandin

We talked about:

  • Jack’s background
  • Transitioning from IC to management
  • Lessons not taught in traditional school
  • The importance of people’s perception, trust, and respect
  • How soft skills are relevant to machine learning
  • How to put on a salesman hat in machine learning management
  • The importance of visuals and building a POC as fast as possible
  • 1st Rule of Machine Learning – don’t be afraid to start without machine learning
  • The importance of understanding the reality that data represents
  • The importance of putting yourself in the shoes of customers
  • The importance of software engineering skills in machine learning
  • Where to find Jack’s content
  • Jack’s next venture

Links:


  • Jack's LinkedIn profile: https://www.linkedin.com/in/jackblandin/

Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Nov 20, 2023 50:26
From a Research Scientist at Amazon to a Machine learning/AI Consultant - Verena Webber

Links:

  • Mini sound bath: https://www.youtube.com/watch?v=g-lDrcSqcrQ


Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Nov 10, 2023 54:55
From Marketing to Product Owner in Search - Lera Kaimashnikova

We talked about:

  • Lera’s background
  • Lera’s move from Ukraine to Germany
  • The transition from Marketing to Product Ownership
  • The importance of communication and one-on-ones
  • The role of Product Owner
  • Utilizing Scrum as a Product Owner
  • Building teams and cross-functionality
  • Lera’s experience learning about search
  • The importance of having both technical knowledge and business context
  • Open developer positions at AUTODOC
  • What experience Lera came to AUTODOC with
  • How marketing skills helped Lera in her current role
  • Lera’s resource recommendations
  • Everything is possible



Links:

  • Post: https://www.linkedin.com/posts/leracaiman_elasticsearch-ecommerce-activity-7106615081588674560-5WQO


Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Nov 05, 2023 55:14
Collaborative Data Science in Business - Ioannis Mesionis

Links:

  • LinkedIn: https://www.linkedin.com/in/ioannis-mesionis/
  • Github: https://github.com/ioannismesionis
  • Website: https://ioannismesionis.github.io/



Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Oct 27, 2023 55:50
Bridging Data Science and Healthcare - Eleni Stamatelou

Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Oct 20, 2023 54:02
DataTalks.Club Anniversary Interview - Alexey Grigorev, Johanna Bayer

Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Oct 12, 2023 57:45
Data Engineering for Fraud Prevention - Angela Ramirez

We talked about:

  • Angela's background
  • Angela's role at Sam's Club
  • The usefulness of knowing ML as a data engineer
  • Angela's career path
  • Transitioning from data analyst to data engineer/system designer
  • Best practices for system design and data engineering
  • Working with document databases
  • Working with network-based databases
  • Detecting fraud with a network-based database
  • Selecting the database type to work with
  • Neo4j vs Postgres
  • The importance of having software engineering knowledge in data engineering
  • Data quality check tooling
  • The greatest challenges in data engineering
  • Debugging and finding the root cause of a failed job
  • What kinds of tools Angela uses on a daily basis
  • Working with external data sources
  • Angela's resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/aramirez1305/
  • Twitter: https://twitter.com/angelamaria__r
  • Github: https://github.com/aramir62
  • Previous podcast talk: https://twitter.com/i/spaces/1OwGWwZAZDnGQ?s=20


Free ML Engineering course: http://mlzoomcamp.com

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Oct 06, 2023 54:14
From Data Manager to Data Architect - Loïc Magnien

We talked about:

  • Loïc's background
  • Data management
  • Loïc's transition to data engineer
  • Challenges in the transition to data engineering
  • What is a data architect?
  • The output of a data architect's work
  • Establishing metrics and dimensions
  • The importance of communication
  • Setting up best practices for the team
  • Staying relevant and tech-watching
  • Setting up specifications for a pipeline
  • Be agile, create a POC, iterate ASAP, and build reusable templates
  • Reaching out to Loïc for questions


Links:

  • Loïc's LinkedIn: https://www.linkedin.com/in/loicmagnien/


Free ML Engineering course: http://mlzoomcamp.com

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Sep 29, 2023 56:42
Pragmatic and Standardized MLOps - Maria Vechtomova

We talked about:

  • Maria's background
  • Marvelous MLOps
  • Maria's definition of MLOps
  • Alternate team setups without a central MLOps team
  • Pragmatic vs non-pragmatic MLOps
  • Must-have ML tools (categories)
  • Maturity assessment
  • What to start with in MLOps
  • Standardized MLOps
  • Convincing DevOps to implement
  • Understanding what the tools are used for instead of knowing all the tools
  • Maria's next project plans
  • Is LLM Ops a thing?
  • What Ahold Delhaize does
  • Resource recommendations to learn more about MLOps
  • The importance of data engineering knowledge for ML engineers

Links:

  • LinkedIn: https://www.linkedin.com/company/marvelous-mlops/
  • Website: https://marvelousmlops.substack.com/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Sep 08, 2023 53:43
Democratizing Causality - Aleksander Molak

We talked about:

  • Aleksander's background
  • Aleksander as a Causal Ambassador
  • Using causality to make decisions
  • Counterfactuals and Judea Pearl
  • Meta-learners vs classical ML models
  • Average treatment effect
  • Reducing causal bias, the super efficient estimator, and model uplifting
  • Metrics for evaluating a causal model vs a traditional ML model
  • Is the added complexity of a causal model worth implementing?
  • Utilizing LLMs in causal models (text as outcome)
  • Text as treatment and style extraction
  • The viability of A/B tests in causal models
  • Graphical structures and nonparametric identification
  • Aleksander's resource recommendations

Links:


  • The Book of Why: https://amzn.to/3OZpvBk
  • Causal Inference and Discovery in Python: https://amzn.to/46Pperr
  • Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python
  • The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw
  • New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Aug 25, 2023 56:00
Mastering Data Engineering as a Remote Worker - José María Sánchez Salas

We talked about:

  • José's background
  • How José relocated to Norway and his schedule
  • Tech companies in Norway and José's role
  • Challenges of working as a remote data engineer
  • José's newsletter on how to make use of data
  • The process of making data useful
  • Where José gets inspiration for his newsletter
  • Dealing with burnout
  • When in Norway, do as the Norwegians do
  • The legalities of working remotely in Norway
  • The benefits of working remotely


Links:

  • LinkedIn: https://www.linkedin.com/in/jmssalas
  • Github: https://github.com/jmssalas
  • Website & Newsletter: https://jmssalas.com


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Aug 18, 2023 46:31
The Good, the Bad and the Ugly of GPT - Sandra Kublik

We talked about:

  • Sandra's background
  • Making a YouTube channel to break into the LLM space
  • The business cases for LLMs
  • LLMs as amplifiers
  • The benefits of keeping a human in the loop when using LLMs (AI limitations)
  • Using LLMs as assistants
  • Building an app that uses an LLM
  • Prompt whisperers and how to improve your prompts
  • Sandra's 7-day LLM experiment
  • Sandra's LLM content recommendations
  • Finding Sandra online


Links:

  • LinkedIn: https://www.linkedin.com/in/sandrakublik/
  • Twitter: https://twitter.com/sandra_kublik
  • Youtube: https://www.youtube.com/@sandra_kublik


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Aug 04, 2023 50:53
LLMs for Everyone - Meryem Arik

We talked about:


  • Meryem's background
  • The constant evolution of startups
  • How Meryem became interested in LLMs
  • What is an LLM (generative vs non-generative models)?
  • Why LLMs are important
  • Open source models vs API models
  • What TitanML does
  • How fine-tuning a model helps in LLM use cases
  • Fine-tuning generative models
  • How generative models change the landscape of human work
  • How to adjust models over time
  • Vector databases and LLMs
  • How to choose an open source LLM or an API
  • Measuring input data quality
  • Meryem's resource recommendations


Links:

  • Website: https://www.titanml.co/
  • Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml...
  • Using llama2.0 in TitanML Blog: https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57
  • Discord: https://discord.gg/83RmHTjZgf
  • Meryem's LinkedIn: https://www.linkedin.com/in/meryemarik/


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Jul 28, 2023 55:29
Investing in Open-Source Data Tools - Bela Wiertz

We talked about:

  • Bela's background
  • Why startups even need investors
  • Why open source is a viable go-to-market strategy
  • Building a bottom-up community
  • The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention
  • Angel investors vs VC Funds vs family offices
  • Bela's investment criteria and GitHub stars as a metric
  • Inbound sourcing, outbound sourcing, and investor networking
  • Making a good impression on an investor
  • Balancing open and closed source parts of a product
  • The future of open source
  • Recent successes of open source companies
  • Bela's resource recommendations


Links:


  • Understand who is engaging with your open source project article: https://www.crowd.dev/
  • Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building
  • Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Jul 21, 2023 54:58
Why Machine Learning Design is Broken - Valerii Babushkin

Links:


  • Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
  • Discount: poddatatalks21 (35% off)
  • Evidently: https://www.evidentlyai.com/
  • Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Jul 14, 2023 51:20
Interpretable AI and ML - Polina Mosolova

We talked about:

  • Polina's background
  • How common it is for PhD students to build ML pipelines end-to-end
  • Simultaneous PhD and industry experience
  • Support from both the academic and industry sides
  • How common the industrial PhD setup is and how to get into one
  • Organizational trust theory
  • How price relates to trust
  • How trust relates to explainability
  • The importance of actionability
  • Explainability vs interpretability vs actionability
  • Complex glass box models
  • Does the explainability of a model follow explainability?
  • What explainable AI brings to customers and end users
  • Can all trust be turned into a KPI?

Links:


  • LinkedIn: https://www.linkedin.com/in/polina-mosolova/
  • Neural Additive Models paper: https://proceedings.neurips.cc/paper/2021/file/251bd0442dfcc53b5a761e050f8022b8-Paper.pdf
  • Neural Basis Model paper: https://arxiv.org/pdf/2205.14120.pdf
  • Interpretable Feature Spaces paper: https://kdd.org/exploration_files/vol24issue1_1._Interpretable_Feature_Spaces_revised.pdf

Jul 07, 2023 52:48
From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner

We talked about:

  • Simon's background
  • What MLOps is and what it isn't
  • Skills needed to build an ML platform that serves 100s of models
  • Ranking the importance of skills
  • The point where you should think about building an ML platform
  • The importance of processes in ML platforms
  • Weighing your options with SaaS platforms
  • The exploratory setup, experiment tracking, and model registry
  • What comes after deployment?
  • Stitching tools together to create an ML platform
  • Keeping data governance in mind when building a platform
  • What comes first – the model or the platform?
  • Do MLOps engineers need to have deep knowledge of how models work?
  • Is API design important for MLOps?
  • Simon's recommendations for furthering MLOps knowledge


Links:

  • LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
  • Github: https://github.com/stiebels
  • Medium: https://medium.com/@sistel

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Jun 30, 2023 53:33
From MLOps to DataOps - Santona Tuli

We talked about:

  • Santona's background
  • Focusing on data workflows
  • Upsolver vs DBT
  • ML pipelines vs Data pipelines
  • MLOps vs DataOps
  • Tools used for data pipelines and ML pipelines
  • The “modern data stack” and today's data ecosystem
  • Staging the data and the concept of a “lakehouse”
  • Transforming the data after staging
  • What happens after the modeling phase
  • Human-centric vs Machine-centric pipeline
  • Applying skills learned in academia to ML engineering
  • Crafting user personas based on real stories
  • A framework of curiosity
  • Santona's book and resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/santona-tuli/
  • Upsolver website: upsolver.com
  • Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Jun 23, 2023 53:05
Data Developer Relations - Hugo Bowne-Anderson

We talked about:

  • Hugo's background
  • Why do tools and the companies that run them have wildly different names?
  • Hugo's other projects besides Metaflow
  • Transitioning from educator to DevRel
  • What is DevRel?
  • DevRel vs Marketing
  • How DevRel coordinates with developers
  • How DevRel coordinates with marketers
  • What skills a DevRel needs
  • The challenges that come with being an educator
  • Becoming a good writer: nature vs nurture
  • Hugo's approach to writing and suggestions
  • Establishing a goal for your content
  • Choosing a form of media for your content
  • Is DevRel intercompany or intracompany?
  • The Vanishing Gradients podcast
  • Finding Hugo online


Links:

  • Hugo Bowne-Anderson's GitHub: http://hugobowne.github.io/
  • Vanishing Gradients: https://vanishinggradients.fireside.fm/
  • MLOps and DevOps: Why Data Makes It Different: https://www.oreilly.com/radar/mlops-and-devops-why-data-makes-it-different/
  • Evaluate Metaflow for free, right from your Browser: https://outerbounds.com/sandbox/


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html


Jun 16, 2023 50:51
Lessons Learned from Freelancing and Working in a Start-up - Antonis Stellas

We talked about:

  • Antonis' background
  • The pros and cons of working for a startup
  • Useful skills for working at a startup and the Lean way to work
  • How Antonis joined the DataTalks.Club community
  • Suggestions for students joining the MLOps course
  • Antonis contributing to Evidently AI
  • How Antonis started freelancing
  • Getting your first clients on Upwork
  • Pricing your work as a freelancer
  • The process after getting approved by a client
  • Wearing many hats as a freelancer and while working at a startup
  • Other suggestions for getting clients as a freelancer
  • Antonis' thoughts on the Data Engineering course
  • Antonis' resource recommendations

Links:

  • Lean Startup by Eric Ries: https://theleanstartup.com/
  • Lean Analytics: https://leananalyticsbook.com/
  • Designing Machine Learning Systems by Chip Huyen: https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
  • Kafka Streaming with Python by Kris Jenkins tutorial video: https://youtu.be/jItIQ-UvFI4


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Jun 09, 2023 50:31
Data Access Management - Bart Vandekerckhove

We talked about:

  • Bart's background
  • What is data governance?
  • Data dictionaries and data lineage
  • Data access management
  • How to learn about data governance
  • What skills are needed to do data governance effectively
  • When an organization needs to start thinking about data governance
  • Good data access management processes
  • Data masking and the importance of automating data access
  • DPO and CISO roles
  • How data access management works with a data mesh approach
  • Avoiding the role explosion problem
  • The importance of data governance integration in DataOps
  • Terraform as a stepping stone to data governance
  • How Raito can help an organization with data governance
  • Open-source data governance tools

Links:

  • LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/
  • Twitter: https://twitter.com/Bart_H_VDK
  • Github: https://github.com/raito-io
  • Website: https://www.raito.io/
  • Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email
  • DataQG Website: https://dataqg.com/
  • DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email
  • DMBOK (Data Management Body of Knowledge): https://www.dama.org/cpages/body-of-knowledge
  • DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Jun 02, 2023 50:29
Data Strategy: Key Principles and Best Practices - Boyan Angelov

We talked about:


  • Boyan's background
  • What is data strategy?
  • Due diligence and establishing a common goal
  • Designing a data strategy
  • Impact assessment, portfolio management, and DataOps
  • Data products
  • DataOps, Lean, and Agile
  • Data Strategist vs Data Science Strategist
  • The skills one needs to be a data strategist
  • How does one become a data strategist?
  • Data strategist as a translator
  • Transitioning from a Data Strategist role to a CTO
  • Using ChatGPT as a writing co-pilot
  • Using ChatGPT as a starting point
  • How ChatGPT can help in data strategy
  • Pitching a data strategy to a stakeholder
  • Setting baselines in a data strategy
  • Boyan's book recommendations

Links:


  • LinkedIn: https://www.linkedin.com/in/angelovboyan/
  • Twitter: https://twitter.com/thinking_code
  • Github: https://github.com/boyanangelov
  • Website: https://boyanangelov.com/


Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

May 26, 2023 55:49
Practical Data Privacy - Katharine Jarmul

We talked about:

  • Katharine's background
  • Katharine's ML privacy startup
  • GDPR, CCPA, and the “opt-in as the default” approach
  • What is data privacy?
  • Finding Katharine's book – Practical Data Privacy
  • The various definitions of data privacy and “user profiles”
  • Privacy engineering and privacy-enhancing technologies
  • Why data privacy is important
  • What is differential privacy?
  • The importance of keeping privacy in mind when designing systems
  • Data privacy using the example of ChatGPT
  • Katharine's resource suggestions for learning about data privacy


Links:

  • LinkedIn: https://www.linkedin.com/in/katharinejarmul/
  • Twitter: https://twitter.com/kjam

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

May 19, 2023 57:44
Building Scalable and Reliable Machine Learning Systems - Arseny Kravchenko

We talked about:

  • Arseny's background
  • Working on machine learning in startups
  • What is Machine Learning System Design?
  • Constraints and requirements
  • Known unknowns vs unknown unknowns (Design stage)
  • Writing a design document
  • Technical problems vs product-oriented problems
  • The solution part of the Design Document
  • What motivated Arseny to write a book on ML System Design
  • Examples of a Design Document in the book
  • The types of readers for ML System Design
  • Working with the co-author
  • Reacting to constraints and feedback when writing a book
  • Arseny's favorite chapter of the book
  • Other resources where you can learn about ML System Design
  • Twitter Giveaway


Links:

  • Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
  • Discount: poddatatalks21 (35% off)


Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

May 12, 2023 50:59
Building an Open-Source NLP Tool - Johannes Hötter

We talked about:

  • Johannes’s background
  • Johannes’s Open Source Spotlight demos – Refinery and Bricks
  • The difficulties of working with natural language processing (NLP)
  • Incorporating ChatGPT into a process as a heuristic
  • What is Bricks?
  • The process of starting a startup – Kern
  • Making the decision to go with open source
  • Pros and cons of launching as open source
  • Kern’s business model
  • Working with enterprises
  • Johannes as a salesperson
  • The team at Kern
  • Johannes’s role at Kern
  • How Johannes and Henrik separate responsibilities at Kern
  • Working with very niche use cases
  • The short story of how Kern got its funding
  • Johannes’s resource recommendation


Links:

  • Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
  • Bricks' Github repo: https://github.com/code-kern-ai/bricks
  • Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
  • Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
  • Discord: https://discord.com/invite/qf4rGCEphW
  • Kern's website: https://www.kern.ai


Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Apr 21, 2023 56:27
Navigating Industrial Data Challenges - Rosona Eldred

We talked about:

  • Rosona’s background
  • How mathematics knowledge helps in industry
  • What is industrial data?
  • Setting up an industrial process using blue paint
  • Internet companies’ data vs industrial data
  • Explaining industrial processes using packing peanuts
  • Why productive industry needs data
  • Measuring product qualities
  • How data specialists use industrial data
  • Defining and measuring sustainability
  • Using data in reactionary measures to changing regulations
  • Types of industrial data
  • Solving problems and optimizing with industrial data
  • Industrial solvers
  • Tiny data vs Big data in productive industry
  • The advantages of coming from academia into productive industry
  • Materials and resources for industrial data
  • Women in industry
  • Why Rosona decided to shift to industrial data


Links:

  • Kaggle dataset: https://www.kaggle.com/datasets/paresh2047/uci-semcom






Apr 14, 2023 53:22
Mastering Self-Learning in Machine Learning - Aaisha Muhammad

We talked about:

  • Aaisha’s background
  • How homeschooling affects self-study
  • Deciding on what to learn about
  • Establishing whether a resource is good
  • How Aaisha focuses on learning
  • Deciding on what kind of project to build
  • Finding research materials
  • Aaisha’s experience with the DataTalks.Club ML Zoomcamp
  • ML Zoomcamp projects
  • Aaisha’s interest in bioinformatics
  • Keeping motivated with deadlines
  • Notes and time-tracking tools
  • Drawbacks to self-studying
  • Aaisha’s interest in machine learning
  • Aaisha’s least favorite part of ML Zoomcamp
  • Helping people as a way to learn
  • Using ChatGPT as a “study group”
  • Is it possible to use self-studying to learn high-level topics?
  • Switching topics to avoid burnout
  • Aaisha’s resource recommendations


Links:

  • LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
  • Twitter: https://twitter.com/ZealousMushroom
  • Github: https://github.com/AaishaMuhammad
  • Website: http://www.aaishamuhammad.co.za/

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Apr 07, 2023 51:02
The Secret Sauce of Data Science Management - Shir Meir Lador

We talked about:

  • Shir’s background
  • Debrief culture
  • The responsibilities of a group manager
  • Defining the success of a DS manager
  • The three pillars of data science management
  • Managing up
  • Managing down
  • Managing across
  • Managing data science teams vs business teams
  • Scrum teams, brainstorming, and sprints
  • The most important skills and strategies for DS and ML managers
  • Making sure proofs of concept get into production


Links:

  • The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
  • Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
  • How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
  • How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
  • Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
  • Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38


Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Mar 31, 2023 (48:43)
SE4ML - Software Engineering for Machine Learning - Nadia Nahar

We talked about:

  • Nadia’s background
  • Academic research in software engineering
  • Design patterns
  • Software engineering for ML systems
  • Problems that people in industry have with software engineering and ML
  • Communication issues and setting requirements
  • Artifact research in open source products
  • Product vs model
  • Nadia’s open source product dataset
  • Failure points in machine learning projects
  • Finding solutions to issues using Nadia’s dataset and experience
  • The problem of siloing data scientists and other structure issues
  • The importance of documentation and checklists
  • Responsible AI
  • How data scientists and software engineers can work in an Agile way


Links:

  • Model Card: https://arxiv.org/abs/1810.03993
  • Datasheets: https://arxiv.org/abs/1803.09010
  • Factsheets: https://arxiv.org/abs/1808.07261
  • Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
  • arXiv version: https://arxiv.org/pdf/2110.


Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Mar 24, 2023 (53:40)