
DataTalks.Club
By DataTalks.Club

Teaching Data Engineers - Jeff Katz
May 13, 2022 52:33

Starting a Consultancy in the Data Space - Aleksander Kruszelnicki
We talked about:
Aleksander’s background
The difficulty of selling data stack as a service
How Aleksander got into consulting
The Mom Test – extracting feedback from people
User interviews
Why Aleksander’s data stack as a service startup was not viable
How Aleksander decided to switch to consulting
Finding clients to consult
Figuring out how to position your services
Geographical limitations
Figuring out your target audience
The importance of networking and marketing
Pricing your services
The pitfalls of daily and hourly pricing and how to balance incentives
Is Germany a good place to found a company?
Aleksander’s book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/alkrusz/
Twitter: https://twitter.com/alkrusz
Website: www.leukos.io
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 17, 2023 52:28

Biohacking for Data Scientists and ML Engineers - Ruslan Shchuchkin
We talked about:
Ruslan’s background
Fighting procrastination and perfectionism
What is biohacking?
The role of dopamine and other hormones in daily life
How meditation can help
The influence light has on our bodies
Behavioral biohacking
Daylight lamps and using light to wake up
Sleep cycles
How nutrition affects productivity
Measuring productivity
Examples of unsuccessful biohacking attempts
Stoicism, voluntary discomfort, and self-challenges
Biohacking risks and ways to prevent them
Coffee and tea biohacking
Using self-reflection and tracking to measure results
Mindset shifting
Stoicism book recommendation
Work/life balance
Ruslan’s biohacking resource recommendation
Links:
LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 10, 2023 52:58

Analytics for a Better World - Parvathy Krishnan
We talked about:
Parvathy’s background
Brainstorming sessions with nonprofits to establish data maturity
Example of an Analytics for a Better World project
The overall data maturity situation of nonprofits vs private sector
Solving the skill gap
Publicly available content
The Analytics for a Better World Academy
The Academy’s target audience
How researchers can work with Analytics for a Better World
Improving data maturity in nonprofit organizations
People, processes, and technology
Typical tools that Analytics for a Better World recommends to nonprofits
Profiles in nonprofits
Does Analytics for a Better World have a need for data engineers?
The Analytics for a Better World team
Factors that help organizations become more data-driven
Parvathy’s resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/parvathykrishnank/
Twitter: https://twitter.com/ABWInstitute
Github: https://github.com/Analytics-for-a-Better-World
Website: https://analyticsbetterworld.org/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 03, 2023 54:35

Accelerating the Adoption of AI through Diversity - Dânia Meira
We talked about:
Dânia’s background
Founding the AI Guild
Datalift Summit
Coming up with meetup topics
Diversity in Berlin
Other types of diversity besides gender
The pitfalls of lacking diversity
Creating an environment where people can safely share their experiences
How the AI Guild helps organizations become more diverse
How the AI Guild finds women in the fields of AI and data science
Advice for people in underrepresented groups
Organizing a welcoming environment and creating a code of conduct
AI Guild’s consulting work and community
AI Guild team
Dânia’s resource recommendations
Upcoming Datalift Summit
Links:
Call for Speakers for the #datalift summit (Berlin, 14 to 16 June 2023): https://eu1.hubs.ly/H02RXvX0
Coded Bias documentary on Netflix: https://www.netflix.com/de/title/81328723#:~:text=This%20documentary%20investigates%20the%20bias,flaws%20in%20facial%20recognition%20technology.
Book Weapons of Math Destruction by Cathy O'Neil: https://en.wikipedia.org/wiki/Weapons_of_Math_Destruction
Book Lean In by Sheryl Sandberg: https://en.wikipedia.org/wiki/Lean_In
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 24, 2023 57:01

Staff AI Engineer - Tatiana Gabruseva
We talked about:
Tatiana’s background
Going from academia to healthcare to the tech industry
What staff engineers do
Transferring skills from academia to industry and learning new ones
The importance of having mentors
Skipping junior and mid-level straight into the staff role
Convincing employers that you can take on a lead role
Seeing failure as a learning opportunity
Preparing for coding interviews
Preparing for behavioral and system design interviews
The importance of having a network and doing mock interviews
How much do staff engineers work with building pipelines, data science, ETL, MLOps, etc.?
Context switching
Advice for those going from academia to industry
The most exciting thing about working as an AI staff engineer
Tatiana’s book recommendations
Links:
LinkedIn: https://www.linkedin.com/in/tatigabru/
Twitter: https://twitter.com/tatigabru
Github: https://github.com/tatigabru
Website: http://tatigabru.com/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 17, 2023 55:24

The Journey of a Data Generalist: From Bioinformatics to Freelancing - Jekaterina Kokatjuhha
We talked about:
Jekaterina’s background
How Jekaterina started freelancing
Jekaterina’s initial ways of getting freelancing clients
How being a generalist helped Jekaterina’s career
Connecting business and data
How Jekaterina’s LinkedIn posts helped her get clients
Jekaterina’s work in fundraising
Cohorts and KPIs
Improving communication between the data and business teams
Motivating every link in the company’s chain
The cons of freelancing
Balancing projects and networking
The importance of enjoying what you do
Growing the client base
In-office work vs working remotely
Jekaterina’s advice for people who feel stuck
Jekaterina’s resource recommendations
Links:
Jekaterina's LinkedIn: https://www.linkedin.com/in/jekaterina-kokatjuhha/
Join DataTalks.Club: https://datatalks.club/slack.html
Feb 11, 2023 52:18

Navigating Career Changes in Machine Learning - Chris Szafranek
We talked about:
Chris’s background
Switching careers multiple times
Freedom at companies
Chris’s role as an internal consultant
Chris’s sabbatical
ChatGPT
How being a generalist helped Chris in his career
The cons of being a generalist and the importance of T-shaped expertise
The importance of learning things you’re interested in
Tips to enjoy learning new things
Recruiting generalists
The job market for generalists vs for specialists
Narrowing down your interests
Chris’s book recommendations
Links:
Lex Fridman: science, philosophy, media, AI (especially earlier episodes): https://www.youtube.com/lexfridman
Andrej Karpathy, former Senior Director of AI at Tesla, who's now focused on teaching and sharing his knowledge: https://www.youtube.com/@AndrejKarpathy
Beautifully done videos on engineering of things in the real world: https://www.youtube.com/@RealEngineering
Chris' website: https://szafranek.net/
Zalando Tech Radar: https://opensource.zalando.com/tech-radar/
Modal Labs, new way of deploying code to the cloud, also useful for testing ML code on GPUs: https://modal.com
Excellent Twitter account to follow to learn more about prompt engineering for ChatGPT: https://twitter.com/goodside
Image prompts for Midjourney: https://twitter.com/GuyP
Machine Learning Workflows in Production - Krzysztof Szafranek: https://www.youtube.com/watch?v=CO4Gqd95j6k
From Data Science to DataOps: https://datatalks.club/podcast/s11e03-from-data-science-to-dataops.html
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 03, 2023 55:36

Preparing for a Data Science Interview - Luke Whipps
We talked about:
Luke’s background
Luke’s podcast - AI Game Changers
How Luke helps people get jobs
What’s changed in the recruitment market over the last 6 months
Getting ready for the interview process
Stage “zero” – the filter between the candidate and the company
Preparing for the introduction stage – research and communication
Reviewing the fundamentals during preparation
Preparing for the technical part of the interview
Establishing the hiring company’s expectations
Depth vs breadth
Overly theoretical and mathematical questions in interviews
Bombing (failing) in the middle of an interview
Applying to different roles within the same company
Luke’s resource recommendations
Links:
Luke's LinkedIn: https://www.linkedin.com/in/lukewhipps/
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 27, 2023 54:17

Indie Hacking - Pauline Clavelloux
We talked about:
Pauline’s background
Pauline’s work as a manager at IBM
What is indie hacking?
Pauline’s initial indie hacking projects
Getting ready for launch
Responsibilities and challenges in indie hacking
Pauline’s latest indie hacking project
Going live and marketing
Challenges with Unreal Me
Staying motivated with indie hacking projects
Skills Pauline picked up while doing indie hacking projects
Balancing a day job and indie hacking
Micro SaaS and AboutStartup.io
How Pauline comes up with ideas for projects
Going from an idea on paper to building a project
Pauline’s Twitter success
Connecting with Pauline online
Pauline’s indie hacking inspiration
Pauline’s resource recommendation
Links:
Website: https://wintopy.io/
Pauline's Twitter: https://twitter.com/Pauline_Cx
Pauline's LinkedIn: https://www.linkedin.com/in/paulineclavelloux/
Blog about Indiehacking: https://aboutstartup.io
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 20, 2023 51:03

Doing Software Engineering in Academia - Johanna Bayer
We talked about:
Johanna’s background
Open science course and reproducible papers
Research software engineering
Convincing a professor to work on software instead of papers
The importance of reproducible analysis
Why academia is behind on software engineering
The problems with open science publishing in academia
The importance of standard coding practices
How Johanna got into research software engineering
Effective ways of learning software engineering skills
Providing data and analysis for your project
Johanna’s initial experience with software engineering in a project
Working with sensitive data and the nuances of publishing it
How often Johanna does hackathons, open source, and freelancing
Social media as a source of repos and Johanna’s favorite communities
Contributing to Git repos
Publishing in the open in academia vs industry
Johanna’s book and resource recommendations
Conclusion
Links:
The Society of Research Software Engineering, plus regional chapters: https://society-rse.org/
The RSE Association of Australia and New Zealand: https://rse-aunz.github.io/
Research Software Engineers (RSEs), the people behind research software: https://de-rse.org/en/index.html
The Software Sustainability Institute: https://www.software.ac.uk/
The Carpentries (beginner git and programming courses): https://carpentries.org/
The Turing Way Book of Reproducible Research: https://the-turing-way.netlify.app/welcome
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 13, 2023 49:49

Data-Centric AI - Marysia Winkels
We talked about:
Marysia’s background
What data-centric AI is
Data-centric Kaggle competitions
The mindset shift to data-centric AI
Data-centric does not mean you should not iterate on models
How to implement the data-centric approach
Focusing on the data vs focusing on the model
Resources to help implement the data-centric approach
Data-centric AI vs standard data cleaning
Making sure your data is representative
Knowing when your data is good enough
The importance of user feedback
“Shadow Mode” deployment
What to do if you have a lot of bad data or incomplete data
Marysia’s role at PyData
How Marysia joined PyData
The difference between PyData and PyCon
Finding Marysia online
Links:
Embetter & Bulk Demo: https://www.youtube.com/watch?v=L---nvDw9KU
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 06, 2023 53:08

Business Skills for Data Professionals - Loris Marini
We talked about:
Loris’ background
Transitioning from physics to data
Aligning people on concepts
Lead indicators and stickiness
Context, semantics, and meaning
Communication and being memorable
Making data digestible for business and building trust
The importance of understanding the language of business
Stakeholder mapping
Attending business meetings as a data professional
Organizing your stakeholder map
Prioritizing
How to support the business strategy
Learning to speak online
Resource recommendations from Loris
Links:
Discovering Data Discord server: https://bit.ly/discovering-data-discord
Loris' LinkedIn: https://www.linkedin.com/in/lorismarini/
Loris' Twitter: https://twitter.com/LorisMarini
Dec 16, 2022 54:13

From Software Engineer to Data Science Manager - Sadat Anwar
We talked about:
Sadat’s background
Sadat’s backend engineering experience
Sadat’s pivot point as a backend engineer
Sadat’s exposure to ML and Data Science
Sadat’s Act Before you Think approach (with safety nets)
Sadat’s street cred and transition into management
The hiring process as an internal candidate
The importance of people management skills
The Brag List
The most difficult part of transitioning to management
Focusing on projects and setting milestones
Sadat’s transition from EM to data science management
How much domain knowledge is needed for management?
The main difference between engineering and management
How being an EM helped Sadat transition to DS management
Transitioning to DS management from other roles
How to feel accomplished as a manager
Sadat’s book recommendations
Sadat’s meetups
Links:
Sadat's Meetup page: https://www.meetup.com/berlin-search-technology-meetup/
Meetup event "Bias in AI: how to measure it and how to fix it": https://www.meetup.com/data-driven-ai-berlin-meetup/events/289927565/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 09, 2022 52:52

Teaching and Mentoring in Data Analytics - Irina Brudaru
We talked about:
Irina’s background
Irina as a mentor
Designing curriculum and program management at AI Guild
Other things Irina taught at AI Guild
Why Irina likes teaching
Students’ reluctance to learn cloud
Irina as a manager
Cohort analysis in a nutshell
How Irina started teaching formally
Irina’s diversity project in the works
How DataTalks.Club can attract more female students to the Zoomcamps
How to get technical feedback at work
Antipatterns and overrated/overhyped topics in data analytics
Advice for young women who want to get into data science/engineering
Finding Irina online
Fundamentals for data analysts
Suggestions for DataTalks.Club collaborations
Conclusions
Links:
LinkedIn Account: https://www.linkedin.com/in/irinabrudaru/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 02, 2022 53:46

Technical Writing and Data Journalism - Angelica Lo Duca
We talked about:
Angelica’s background
Angelica’s books
Data journalism
How Angelica got into data journalism
The field of digital humanities and Angelica’s data journalism course
Technical articles vs data journalism articles
Transforming reports into data storytelling
Are reports to stakeholders considered technical writing?
Data visualization in articles
Article length
The process of writing an article
Finding writing topics
How Angelica got into writing a book (communication with publishers)
The process for writing a book
Brainstorming
Reviews and revisions
Conclusion
Links:
Data Journalism examples (FENCED OUT): https://www.washingtonpost.com/graphics/world/border-barriers/europe-refugee-crisis-border-control/??noredirect=on
Data Journalism examples (La tierra esclava): https://latierraesclava.eldiario.es/
Small Medium publication aiming to be the Stack Overflow of Medium: https://medium.com/syntaxerrorpub
Example of a self-published book on Data Visualization: https://www.amazon.com/Introduction-Data-Visualization-Storytelling-Scientist-ebook/dp/B07VYCR3Z6/ref=sr_1_4?crid=4JRJ48O7K8TK&keywords=joses+berengueres&qid=1668270728&sprefix=joses+beremguere%2Caps%2C273&sr=8-4
My novels (in Italian) La bambina e il Clown: https://www.amazon.it/Bambina-Clown-Angelica-Lo-Duca/dp/1500984515/ref=sr_1_9?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=2KGK9GMN0FAHI&keywords=la+bambina+e+il+clown&qid=1668270769&sprefix=la+bambina+e+il+clown%2Caps%2C88&sr=8-9
My novels (in Italian) Il Violinista: https://www.amazon.it/Violinista-1-Angelica-Lo-Duca/dp/1501009672/ref=sr_1_1?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=12KTF9EF5UKIG&keywords=il+violinista+lo+duca&qid=1668270791&sprefix=il+violinista+lo+duca%2Caps%2C81&sr=8-1
Course on Data Journalism: https://www.coursera.org/learn/visualization-for-data-journalism
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 25, 2022 50:59

From Digital Marketing to Analytics Engineering - Nikola Maksimovic
We talked about:
Nikola’s background
Making the first steps towards a transition to BI and Analytics Engineering
Learning the skills necessary to transition to Analytics Engineering
The in-between period – from Marketing to Analytics Engineering
Nikola’s current responsibilities
Understanding what a Data Model is
Tools needed to work as an Analytics Engineer
The Analytics Engineering role over time
The importance of dbt for Analytics Engineers
Where can one learn about data modeling theory?
Going from Ancient Greek and Latin to understanding Data (Just-In-Time Learning)
The importance of having domain knowledge for analytics engineering
Suggestion for those wishing to transition into analytics engineering
The importance of having a mentor when transitioning
Finding a mentor
Helpful newsletters and blogs
Finding Nikola online
Links:
Nikola's LinkedIn account: https://www.linkedin.com/in/nikola-maksimovic-40188183/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 18, 2022 46:51

Product Owners in Data Science - Anna Hannemann
We talked about:
About Anna and METRO
Anna’s background
The importance of a technical background for data product owners
What are product owners?
Product owners vs product managers
Anna’s work on recommender systems at METRO
Expanding the data team
Types of algorithms used for recommender systems
What kind of knowledge and skills data product owners need to have
Problems and ideas should come from the business
How Anna handles all her responsibilities
The process for starting work on new domains
Product portfolio management
ProductTank and Anna’s role in it
Anna’s resource recommendations
Links:
Data Science for Business Book: https://www.amazon.de/-/en/Foster-Provost/dp/1449361323/ref=sr_1_1?keywords=data+science+for+business&qid=1666404807&qu=eyJxc2MiOiIxLjg3IiwicXNhIjoiMS41MiIsInFzcCI6IjEuNDYifQ%3D%3D&sr=8-1
Article on Data Science Products: https://www.linkedin.com/pulse/way-create-data-science-products-lessons-learnt-anna-hannemann-phd/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 11, 2022 54:03

Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best DS practice in a traditional company vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (Centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 04, 2022 49:49

Large-Scale Entity Resolution - Sonal Goyal
We talked about:
Sonal’s background
How the idea for Zingg came about
What Zingg is
The difference between entity resolution and identity resolution
How duplicate detection relates to entity resolution
How Sonal decided to start working on Zingg
How Zingg works
What Zingg runs on
Switching from consultancy to working on a new open source solution
Why Zingg is open source
Open source licensing
Working on Zingg initially vs now
Zingg’s current and future team
Sonal’s biggest current challenge
Avoiding problems with entity/identity resolution through database design
Identity resolution vs basic joins, data fusions, and fuzzy joins
Deterministic matching vs probabilistic machine learning
Identity and entity resolution applications for fraud detection
Graph algorithms vs classic ML in entity resolution
Identity resolution success stories
What Sonal would do differently given the chance to start over with Zingg
Advice for those seeking to realize their own solution to a data problem
Reading suggestion from Sonal
Conclusion
Links:
Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs book: https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 28, 2022 53:28

From Data Science to DataOps - Tomasz Hinc
We talked about:
Tomasz’s background
What Tomasz did before DataOps (Data Science)
Why Tomasz made the transition from Data science to DataOps
What is DataOps?
How is DataOps related to infrastructure?
How Tomasz learned the skills necessary to work in DataOps
Becoming comfortable with terminal
The overlap between DataOps and Data Engineering
Suitable/useful skills for DataOps
Minimal operational skills for DataOps
Similarities between DataOps and Data Science Managers
Tomasz’s interesting projects
Confidence in results and avoiding going too deep with edge cases
Conclusion
Links:
Terminal setup video, 19 minutes long: https://www.youtube.com/watch?v=D2PSsnqgBiw
Command line videos, one and a half hours to become somewhat comfy with the terminal: https://www.youtube.com/playlist?list=PLIhvC56v63IKioClkSNDjW7iz-6TFvLwS
Course from MIT talking about just that (command line, git, storing secrets): https://missing.csail.mit.edu/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 21, 2022 51:09

Data Science Career Development - Katie Bauer
We talked about:
Katie’s background
What is a data scientist?
What is a data science manager?
Quality of the craft
How data leaders promote career growth
Supporting senior data professionals
Choosing the IC route vs the management route
Managing junior data professionals
Talking to senior stakeholders and PMs as a junior
The importance of hiring juniors
What skills do data science managers need to get hired?
How juniors that are just starting out can set themselves apart from the competition
Asking senior colleagues for help and the rubber duck channel
The challenges of the head of data
Conclusion
Links:
Jobs at Gloss Genius: https://boards.greenhouse.io/glossgenius
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 14, 2022 53:37

From Testing Phones to Managing NLP Projects - Alvaro Navas Peire
We talked about:
Alvaro’s background
Working as a QA (Quality Assurance) engineer
Transitioning from QA to Machine Learning
Gathering knowledge about the ML field
Searching for an ML job (improving soft skills and CV)
Data science interview skills
Zoomcamp projects
Zoomcamp project deployment
How to not undersell yourself during interviews
Alvaro’s experience with interviews during his transition
Alvaro’s Zoomcamp notes
Alvaro’s coach
The importance of mathematical knowledge to a transition into ML
Preparing for technical interviews
Alvaro’s typical workday
Alvaro’s team’s tech stack
The importance of a technical background to transitioning into ML
Links:
Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
Github profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 07, 2022 48:37

Responsible and Explainable AI - Supreet Kaur
We talked about:
Supreet’s background
Responsible AI
Example of explainable AI
Responsible AI vs explainable AI
Explainable AI tools and frameworks (glass box approach)
Checking for bias in data and handling personal data
Understanding whether your company needs a certain type of data
Data quality checks and automation
Responsibility vs profitability
The human touch in AI
The trade-off between model complexity and explainability
Is completely automated AI out of the question?
Detecting model drift and overfitting
How Supreet became interested in explainable AI
Trustworthy AI
Reliability vs fairness
Bias indicators
The future of explainable AI
About DataBuzz
The diversity of data science roles
Ethics in data science
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/supreet-kaur1995/
Databuzz page: https://www.linkedin.com/company/databuzz-club/
Medium Blog Page: https://medium.com/@supreetkaur_66831
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 30, 2022 53:01

Building Data Science Practice - Andrey Shtylenko
We talked about:
Audience Poll
Andrey’s background
What data science practice is
Best DS practice in a traditional company vs IT-centric companies
Getting started with building data science practice (finding out who you report to)
Who the initiative comes from
Finding out what kind of problems you will be solving (Centralized approach)
Moving to a semi-decentralized approach
Resources to learn about data science practice
Pivoting from the role of a software engineer to data scientist
The most impactful realization from data science practice
Advice for individual growth
Finding Andrey online
Links:
Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 30, 2022 49:49

No episode this week
Have a great weekend!
Sep 23, 2022 00:18

Leading Data Research - David Bader
We talked about:
David’s background
A day in the life of a professor
David’s current projects
Starting a school
The different types of professors
David’s recent papers
Similarities and differences between research labs and startups
Finding (or creating) good datasets
David’s lab
Balancing research and teaching as a professor
David’s most rewarding research project
David’s most underrated research project
David’s virtual data science seminars on YouTube
Teaching at universities without doing research
Staying up-to-date in research
David’s favorite conferences
Selecting topics for research
Convincing students to stay in academia and competing with industry
Finding David online
Links:
David A. Bader: https://davidbader.net/
NJIT Institute for Data Science: https://datascience.njit.edu/
Arkouda: https://github.com/Bears-R-Us/arkouda
NJIT Data Science YouTube Channel: https://www.youtube.com/c/NJITInstituteforDataScience
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 16, 2022 58:42

Dataset Creation and Curation - Christiaan Swart
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distance supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LI: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 09, 2022 56:19

Data Mesh 101 - Zhamak Dehghani
We talked about:
Zhamak’s background
What is Data Mesh?
Domain ownership
Determining what to optimize for with Data Mesh
Decentralization
Data as a product
Self-serve data platforms
Data governance
Understanding Data Mesh
Adopting Data Mesh
Resources on implementing Data Mesh
Links:
Free 30-day code from O'Reilly: https://learning.oreilly.com/get-learning/?code=DATATALKS22
Data Mesh book: https://learning.oreilly.com/library/view/data-mesh/9781492092384/
LinkedIn: https://www.linkedin.com/in/zhamak-dehghani
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 02, 2022 54:09

Growing Data Engineering Team in a Scale-Up - Mehdi OUAZZA
We talked about:
Mehdi’s background
The difference between startup, scale-up and enterprise
Hypergrowth
Data platform engineers in a scale-up environment
What a data platform is and who builds it
Managing the fast pace of a scale-up while ensuring personal growth
Should a senior data person consider a scale-up or an enterprise?
Should a junior data person consider a scale-up or an enterprise?
Sourcing talent for hyper-growth companies and developing a community culture
Generating content and getting feedback
Generalization vs specialization for data engineers in a scale-up
The ratio of work between platform building and use case pipelines
Being proactive in order to progress to mid or senior level
Caps and bass guitars
MehdiO DataTV and DataCreators.Club (Mehdi’s YouTube Channel and podcast)
Links:
Mehdi's YouTube channel: https://www.youtube.com/channel/UCiZxJB0xWfPBE2omVZeWPpQ
Mehdi's LinkedIn: https://linkedin.com/in/mehd-io/
Mehdi's Medium Blog: https://medium.com/@mehdio
Mehdi's data creators club: https://datacreators.club/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 26, 2022 53:13

Lessons Learned About Data & AI at Enterprises - Alexander Hendorf
We talked about:
Alexander’s background
The role of Partner at Königsweg
Being part of the data and AI community
How Alexander became chair at PyData
Alexander’s many talks and advice on giving them
Explaining AI to managers
Why being able to explain machine learning to managers is important
The experimental nature of AI and why it’s not a cure-all
Innovation requires patience
Convincing managers not to use AI or ML when there are better (simpler) solutions
The role of MLOps in enterprises
Thinking about the mid- and long-term when considering solutions
Finding Alexander online
Links:
Alexander's Twitter: https://twitter.com/hendorf
Alexander's LinkedIn: https://www.linkedin.com/in/hendorf/
Königsweg: https://www.koenigsweg.com
PyData Südwest: https://www.meetup.com/pydata-suedwest/
PyData Frankfurt: https://www.meetup.com/pydata-frankfurt/
PyConDE & PyData Berlin: https://pycon.de
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 19, 2022 54:17

MLOps Architect - Danny Leybzon
We talked about:
Danny’s background
What an MLOps Architect does
The popularity of MLOps Architect as a role
Convincing an employer that you can wear many different hats
Interviewing for the role of an MLOps Architect
How Danny prioritizes work with data scientists
Coming to WhyLabs when you’ve already got something in production vs nothing in production
Market awareness regarding the importance of model monitoring
How Danny (WhyLabs) chooses tools
ONNX
Common trends in tooling setups
The most rewarding thing for Danny in ML and data science
Danny’s secret for staying sane while wearing so many different hats
T-shaped specialist, E-shaped specialist, and the horizontal line
The importance of background for the role of an MLOps Architect
Key differences for WhyLogs free vs paid
Conclusion and where to find Danny online
Links:
Matt Turck: https://mattturck.com/data2021/
AI Observability Platform: https://whylabs.ai/observability
Danny's LinkedIn: https://www.linkedin.com/in/dleybz/
WhyLabs' website: https://whylabs.ai/
AI Infrastructure Alliance: https://ai-infrastructure.org/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 12, 2022 53:31

Decoding Data Science Job Descriptions - Tereza Iofciu
We talked about:
DataTalks.Club intro
Tereza’s background
Working as a coach
Identifying the mismatches between your needs and those of a company
How to avoid misalignments
Considering what’s mentioned in the job description, what isn’t, and why
Diversity and culture of a company
Lack of a salary in the job description
Ways of researching the company where you will potentially work
How to avoid a mismatch with a company other than learning from your mistakes
Before data, during data, after data (a company’s data maturity level)
The company’s tech stack
Finding Tereza online
Links:
Decoding Data Science Job Descriptions (talk): https://www.youtube.com/watch?v=WAs9vSNTza8
Talk at ConnectForward: https://www.youtube.com/watch?v=WAs9vSNTza8
Slides: https://www.slideshare.net/terezaif/decoding-data-science-job-descriptions-250687704
Talk at DataLift: https://www.youtube.com/watch?v=pCtQ0szJiLA
Slides: https://www.slideshare.net/terezaif/lessons-learned-from-hiring-and-retaining-data-practitioners
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 05, 2022 49:14

Data Science for Social Impact - Christine Cepelak
We talked about:
Christine’s Background
Private sector vs Public sector
Public policy
The challenges of being a community organizer
How public policy relates to political science
Programs that teach data science for public policy
Data science for public policy vs regular data science
The importance of ethical data science in public policy
How data science in social impact projects differs from other projects
Other resources to learn about data science for public policy
Challenges with getting data in data science for public policy
The problems with accessing public datasets about recycling
Christine’s potential projects after her Master’s degree
Gender inequality in STEM fields
Corporate responsibility and why organizations need social impact data scientists
What you need to start making a social impact with data science
80,000 hours
Other use cases for public policy data science
Coffee, Ethics & AI
Finding Christine online
Links:
Explore some Data Science for Social Good projects: http://www.dssgfellowship.org/projects/
Bi-weekly Ethics in AI Coffee Chat: https://www.meetup.com/coffee-ethics-ai/
Make a Social Impact with your Job: https://tinyurl.com/80khours
Course in Data Ethics: https://ethics.fast.ai/
Data Science for Social Good Berlin: https://dssg-berlin.org/
CorrelAid: https://correlaid.org/
DataKind: https://www.datakind.org/
Christine's LinkedIn: https://www.linkedin.com/in/christinecepelak/
Christine's Twitter: https://twitter.com/CLcep
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 29, 2022 48:23

Hiring Data Science Talent - Olga Ivina
We talked about:
Olga’s career journey
Hiring data scientists now vs 7 years ago
The two qualities of an excellent data scientist
What makes Alexey do this podcast
How Alexey gets the latest information on data science
How Olga checks a candidate’s technical skills
How to make an answer stand out (showing your depth of knowledge)
A strong mathematical background vs a strong engineering background
When Auto ML will replace the need to have data scientists
Should data scientists transition into management? (the importance of communication in an organization)
Switching from a data analyst role to a data scientist
Attracting female talent in data science
Changing a job description to find talent
Long gaps in the CV
Eierlegende Wollmilchsau
Links:
Olga's LinkedIn: https://www.linkedin.com/in/olgaivina/
Olga's Twitter: https://twitter.com/olgaivina
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 22, 2022 52:45

From Open-Source Maintainer to Founder - Will McGugan
We talked about:
Will’s background
Will’s open source projects
S3FS and PyFilesystem
Inspiration for open source projects
Will as a freelancer
Starting a company from a tweet (Rich and Textual)
Building in public (Will’s approach to social media)
The workforce and roadmap of Textualize.io
The importance of working on open source for Textualize employees
The workflow of and contributions to Textualize
Getting your first thousand GitHub Stars (going viral)
Suggestions for those who wish to start in the open-source space
Finding Will online
Links:
Twitter: https://twitter.com/willmcgugan
Textualize website: https://www.textualize.io/
Textualize GitHub: https://github.com/textualize
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 15, 2022 49:34

Designing a Data Science Organization - Lisa Cohen
We talked about:
Lisa’s background
Centralized org vs decentralized org
Hybrid org (centralized/decentralized)
Reporting your results in a data organization
Planning in a data organization
Having all the moving parts work towards the same goals
Which approach Twitter follows (centralized vs decentralized)
Pros and cons of a decentralized approach
Pros and cons of a centralized approach
Finding a common language with all the functions of an org
Finding the right approach for companies that want to implement data science
How many data scientists does a company need?
Who do data scientists report huge findings to?
The importance of partnering closely with other functions of the org
The role of Product Managers in the org and across functions
Who does analytics at Twitter (analysts vs data scientists)
The importance of goals, objectives and key results
Conflicting objectives
The importance of research
Finding Lisa online
Links:
LinkedIn: https://www.linkedin.com/in/cohenlisa/
Twitter: https://twitter.com/lisafeig
Medium: https://medium.com/@lisa_cohen
Lisa Cohen's YouTube videos: https://www.youtube.com/playlist?list=PLRhmnnfr2bX7-GAPHzvfUeIEt2iYCbI3w
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 08, 2022 51:23

Developer Advocacy Engineer for Open-Source - Merve Noyan
We talked about:
Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart
Links:
Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
Github ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face bigscience/T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 01, 2022 50:58

Data Scientists at Work - Mısra Turp
We talked about:
Mısra’s background
What data scientists do
Consultant data scientists vs in-house data scientists (and freelancers)
Expectations for data scientists
The importance of keeping up to date with AI developments (FOMA)
How does DALL·E 2 work and should you care?
Going to conferences to stay up to date
The most pressing issue for data scientists
Fighting FOMA and imposter syndrome
Knowing when you have enough knowledge of a framework
The “best” type of data scientist
Being a generalist vs a specialist
Advice for entry-level data scientists entering an oversaturated market
Catching the eye of big AI companies
Choosing a project for your portfolio
The importance of having a Ph.D. or Master’s degree in data science
Finding Mısra online
Links:
Mısra's YouTube channel: https://www.youtube.com/channel/UCpNUYWW0kiqyh0j5Qy3aU7w
Twitter: https://twitter.com/misraturp
Hands-on Data Science: Complete Your First Portfolio Project: https://www.soyouwanttobeadatascientist.com/hods
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 24, 2022 58:02

Freelancing and Consulting with Data Engineering - Adrian Brudaru
We talked about:
Adrian’s background
Freelancing vs Employment
Risk and occupancy rate in freelancing
The scariest part of freelancing
Adrian’s first projects
Freelancing 5 years later
Pay rates in freelancing
Acquiring skills while freelancing
Working with recruitment agencies and networking
Looking for projects and getting clients
Freelancing vs consulting
Clarity in clients’ expectations (scope of work)
Building your network
Freelancing platforms
Adrian’s data loading prototype
Going from freelancing to making your own product (and other investments)
The usefulness of a portfolio
Introverts in freelancing
Is it possible to work for 3 months a year in freelancing?
Choosing projects and skill-building strategy (focusing on interests)
Freelancing in Berlin
Clients’ expectations for freelancers vs employees
Working with more than one client at the same time
Adrian’s freelance cooperative on Slack
Other advice for novice freelancers (networking)
Finding Adrian online
Links:
Github: https://github.com/scale-vector
Slack Community: https://join.slack.com/t/berlindatacol-szn7050/shared_invite/zt-19dp8msp0-pP4Av3_fVFBbsdrzPROEAg
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 17, 2022 52:02

Getting a Data Engineering Job (Summary and Q&A) - Jeff Katz
We talked about:
Summary of “Getting a Data Engineering Job” webinar
Python and engineering skills
Interview process
Behavioral interviews
Technical interviews
Learning Python and SQL from scratch
Is having non-coding experience a disadvantage?
Analyst or engineer?
Do you need certificates?
Do I need a master’s degree?
Fully remote data engineering jobs
Should I include teaching on my resume?
Object-oriented programming for data engineering
Python vs Java/Scala
SQL and Python technical interview questions
GCP certificates
Is commercial experience really necessary?
From sales to engineering
Solution engineers
Wrapping up
Links:
Getting a Data Engineering Job (webinar): https://www.youtube.com/watch?v=yvEWG-S1F_M
The Flask Mega-Tutorial Part I - Hello, World! blog: https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world
Mode SQL Tutorial: https://mode.com/sql-tutorial/
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 10, 2022 48:05

Using Data for Asteroid Mining - Daynan Crull
We talked about:
Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires
Links:
LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
The source survey data, several are useful for asteroids: Pan-STARRS (https://outerspace.stsci.edu/display/PANSTARRS)
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 03, 2022 53:22

Machine Learning in Marketing - Juan Orduz
We talked about:
Juan’s background
Typical problems in marketing that are solved with ML
Attribution model
Media Mix Model – detecting uplift and channel saturation
Changes to privacy regulations and their effect on user tracking
User retention and churn prevention
A/B testing to detect uplift
Statistical approach vs machine learning (setting a benchmark)
Does retraining MMM models often improve efficiency?
Attribution model baselines
Choosing a decay rate for channels (Bayesian linear regression)
Learning resource suggestions
Bayesian approach vs Frequentist approach
Suggestions for creating a marketing department
Most challenging problems in marketing
The importance of marketing domain knowledge for data scientists
Juan’s blog and other learning resources
Finding Juan online
Links:
Juan's PyData talk on uplift modeling: https://youtube.com/watch?v=VWjsi-5yc3w
Juan's website: https://juanitorduz.github.io
Introduction to Algorithmic Marketing book: https://algorithmic-marketing.online
Preventing churn like a bandit: https://www.youtube.com/watch?v=n1uqeBNUlRM
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
May 27, 2022 52:52

From Academia to Data Analytics and Engineering - Gloria Quiceno
We talked about:
Gloria’s background
Working with MATLAB, R, C, Python, and SQL
Working at ICE
Job hunting after the bootcamp
Data engineering vs Data science
Using Docker
Keeping track of job applications, employers and questions
Challenges during the job search and transition
Concerns over data privacy
Challenges with salary negotiation
The importance of career coaching and support
Skills learned at Spiced
Retrospective on Gloria’s transition to data and advice
Top skills that helped Gloria get the job
Thoughts on cloud platforms
Thoughts on bootcamps and courses
Spiced graduation project
Standing out in a sea of applicants
The cohorts at Spiced
Conclusion
Links:
LinkedIn: https://www.linkedin.com/in/gloria-quiceno/
Github: https://github.com/gdq12
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
May 20, 2022 48:41

Teaching Data Engineers - Jeff Katz
We talked about:
Jeff’s background
Getting feedback to become a better teacher
Going from engineering to teaching
Jeff on becoming a curriculum writer
Creating a curriculum that reinforces learning
Jeff on starting his own data engineering bootcamp
Shifting from teaching ML and data science to teaching data engineering
Making sure that students get hired
Screening bootcamp applicants
Knowing when it’s time to apply for jobs
The curriculum of JigsawLabs.io
The market demand for Spark, Kafka, and Kubernetes (or lack thereof)
Advice for data analysts that want to move into data engineering
The market demand for ETL/ELT and DBT (or lack thereof)
The importance of Python, SQL, and data modeling for data engineering roles
Interview expectations
How to get started in teaching
The challenges of being a one-person company
Teaching fundamentals vs the “shiny new stuff”
JigsawLabs.io
Finding Jeff online
Links:
Jigsaw Labs: https://www.jigsawlabs.io/free
Teaching my mom to code: https://www.youtube.com/watch?v=OfWwfTXGjBM
Getting a Data Engineering Job Webinar with Jeff Katz: https://www.eventbrite.de/e/getting-a-data-engineering-job-tickets-310270877547
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
May 13, 2022 52:33

From Roasting Coffee to Backend Development - Jessica Greene
We talked about:
Jessica’s background
Giving a talk at a tech conference about coffee
Jessica’s transition into tech (How to get started)
Going from learning to actually making money
Landing your first job in tech
Does your age matter when you’re trying to get a job?
Challenges that Jessica faced in the beginning of her career
Jessica’s role at PyLadies
Fighting the Imposter Syndrome
Generational differences in digital literacy and how to improve it
Events organized by PyLadies
Jessica’s beginnings at PyLadies (organizing events)
Jessica’s experience with public speaking
The impact of public speaking on your career
Tips for public speaking
Jessica’s work at Ecosia
Discrimination in the tech industry (and in general)
Finding Jessica online
Links:
Ecosia's website: https://www.ecosia.org/
Ecosia's blog: https://blog.ecosia.org/ecosia-financial-reports-tree-planting-receipts/
PyLadies Berlin: https://berlin.pyladies.com/
PyLadies' Meetup: https://meetup.com/PyLadies-Berlin
Codecademy: https://www.codecademy.com/
Freecodecamp: https://www.freecodecamp.org/
Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
ML Bookcamp code: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Google Summer of Code: https://summerofcode.withgoogle.com/
Outreachy website: https://www.outreachy.org/
Alumni Interview: https://railsgirlssummerofcode.org/blog/2020-03-17-alumni-interview-jessica
Python pizza: https://python.pizza/
Pycon: https://pycon.it/en
Pycon 2022: https://2022.pycon.de/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
May 06, 2022 52:56

Recruiting Data Engineers - Nicolas Rassam
We talked about:
Nicolas’ background
The tech talent market in different countries
Hiring data scientists vs data engineers
A spike in interest for data engineering roles
The importance of recruiters having technical knowledge
The main challenges of hiring data engineers
The difference in hiring junior, mid, and senior level data engineers
Things recruiters look for in people who switch to a data engineering role
The importance of knowing cloud tools
The importance of knowing infrastructure tools
Preparing for the interview
The importance of a formal education
The importance of having a project portfolio
How your current domain influences the interview
Conclusion
Links:
Nicolas' Twitter: https://twitter.com/n_rassam
Nicolas' LinkedIn: https://www.linkedin.com/in/nicolasrassam/
Onfido is hiring: https://onfido.com/engineering-technology/
Interview with Alicja about recruiting data scientists: https://datatalks.club/podcast/s07e02-recruiting-data-professionals.html
Webinar "Getting a Data Engineering Job" with Jeff Katz: https://eventbrite.com/e/310270877547
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Apr 29, 2022 49:43

Storytime for DataOps - Christopher Bergh
We talked about:
Christopher’s background
The essence of DataOps
Also known as Agile Analytics Operations or DevOps for Data Science
Defining processes and automating them (defining “done” and “good”)
The balance between heroism and fear (avoiding deferred value)
The Lean approach
Avoiding silos
The 7 steps to DataOps
Wanting to become replaceable
DataOps is doable
Testing tools
DataOps vs MLOps
The Head Chef at Data Kitchen
What’s grilling at Data Kitchen?
The DataOps Cookbook
Links:
DataOps Manifesto website: https://dataopsmanifesto.org/en/
DataOps Cookbook: https://dataops.datakitchen.io/pf-cookbook
Recipes for DataOps Success: https://dataops.datakitchen.io/pf-recipes-for-dataops-success
DataOps Certification Course: https://info.datakitchen.io/training-certification-dataops-fundamentals
DataOps Blog: https://datakitchen.io/blog/
DataOps Maturity Model: https://datakitchen.io/dataops-maturity-model/
DataOps Webinars: https://datakitchen.io/webinars/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Apr 22, 2022 52:11

Machine Learning and Personalization in Healthcare - Stefan Gudmundsson
We talked about:
Stefan’s background
Applications of machine learning in healthcare
Sidekick Health – gamified therapeutics
How is working for King different from Sidekick Health?
The rewards systems in gamified apps
The importance of building a strong foundation for a data science team
The challenges of building an app in the healthcare industry
Dealing with ethics issues
Sidekick Health’s personalized recommendations and content
The importance of having the right approach in A/B tests (strong analytics and good data)
The importance of having domain knowledge to work as a data professional in the healthcare industry
Making a data-driven company
Risks for Sidekick Health
Sidekick Health growth strategy
Using AI to help people live better lives
Links:
LinkedIn: https://www.linkedin.com/in/stefanfreyrgudmundsson/
Job listings: https://sidekickhealth.bamboohr.com/jobs/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Apr 15, 2022 51:59

Innovation and Design for Machine Learning - Liesbeth Dingemans
We talked about:
Liesbeth’s background
What is design?
The importance of interaction in design
Design as a process (Double Diamond technique)
How long does it take to go from an idea to finishing the second diamond?
Design thinking (Google’s PAIR)
What is a Design Sprint and who should participate in it?
Why should data specialists care about design?
Challenging your task-giver (asking “why”)
How to avoid the “Chinese whisper game” (reiterating the problem)
Defining the roadmap for data science teams
What is innovation?
Bringing innovation to your management
Task force-team approach to solving problems
Innovation, resource management issues, and using data to back your ideas
Words of advice for those interested in design and innovation
Links:
LinkedIn: https://www.linkedin.com/in/liesbeth-dingemans/
Medium posts on design, innovation, art and AI: https://medium.com/@liesbethmd
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Apr 08, 2022 55:49

Hacking Your Data Career - Marijn Markus
We talked about:
Marijn’s background
Standing out in data science
Doing the opposite of what people tell you
Don’t shoot the messenger (carefully sharing your findings)
Advising the seniors
Bite off more than you can chew, then chew
Marijn’s side projects (finding value in doing things you find interesting)
Building a project portfolio
Marijn’s NGO project
The importance of a team
Open source intelligence (OSINT)
The importance of soft skills for data experts
Marijn’s LinkedIn growth strategy and tips
Links:
Twitter: https://twitter.com/MarijnMarkus
LinkedIn: https://www.linkedin.com/in/marijnmarkus/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Apr 01, 2022 55:56

Visualising Machine Learning - Meor Amer
We talked about:
kDimensions
Being self-employed
Visual engineering
Constrain yourself to get creative
Coming up with ideas
Visualising difficult concepts
The process of creating visuals
Creating visuals
Learning to create visuals for engineers
Consuming with intention to create
Learning by breaking code
Earning with visuals
Adding visuals to blog posts
Meor’s book: A Visual Introduction to Deep Learning
Links:
A Visual Introduction to Deep Learning by Meor Amer: https://gumroad.com/a/63231091
kDimensions website: https://kdimensions.com/
Book to learn about Figma: https://figmabook.com/
Jack Butcher's approach: https://www.youtube.com/watch?v=azhqc4K-GAE
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 25, 2022 52:06

From Math Teacher to Analytics Engineer - Juan Pablo
We talked about:
Juan Pablo's background
Data engineering resources
Teaching calculus
Transitioning to Analytics
Data Analytics bootcamp
Getting money while studying
Going to meetups to get a job
Looking for uncrowded doors
Using LinkedIn
Portfolio
Talking to people at meetups
Eight tips to get your first analytics job
Consider contracts and temporary roles
Getting experience with non-profits
Create your own internship
Networking
Website for hosting a portfolio
I’m a math teacher. What should I learn first?
Analytics engineering
Best suggestion: keep showing up
Networking at online conferences
Communication skills and being organized
Links:
Website: https://www.thatjuanpablo.com/
Twitter: https://twitter.com/thatjuanpablo
BROKE teacher to FAANG engineer Twitter thread: https://twitter.com/thatjuanpablo/status/1475806246317875203
LinkedIn: https://www.linkedin.com/in/thatjuanpablo/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 18, 2022 | 50:16

From Data Science to Data Engineering - Ellen König
We talked about:
Ellen’s background
Why Ellen switched from data science to data engineering
The overlap between data science and data engineering
Skills to learn and improve for data engineering
Ways to pick up and improve skills (advice for making the transition)
What makes a data engineering course “good”
Languages to know for data engineering
The easiest part of transitioning into data engineering
The hardest part of transitioning into data engineering
Common data engineering team distributions
People who are both data scientists and data engineers
Pet projects and other ways to pick up development skills
Dealing with cloud processing costs (alerts, billing reports, trial periods)
Advice for getting into entry level positions
Which cloud platform should data engineers learn?
Links:
Twitter: https://twitter.com/ellen_koenig
LinkedIn: https://www.linkedin.com/in/ellenkoenig/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 11, 2022 | 54:15

Becoming a Data Engineering Manager - Rahul Jain
We talked about:
Rahul’s background
What do data engineering managers do and why do we need them?
Balancing engineering and management
Rahul’s transition into data engineering management
The importance of updating your skill set
Planning the transition to manager and other challenges
Setting expectations for the team and measuring success
Data reconciliation
GDPR compliance
Data modeling for Big Data
Advice for people transitioning into data engineering management
Staying on top of trends and enabling team members
The qualities of a good data engineering team
The qualities of a good data engineer candidate (interview advice)
The difference between having knowledge and stuffing a CV with buzzwords
Advice for students and fresh graduates
An overview of an end-to-end data engineering process
Links:
Rahul's LinkedIn: https://www.linkedin.com/in/16rahuljain/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Mar 04, 2022 | 51:24

A/B Testing - Jakob Graff
We talked about:
Jakob’s background
The importance of A/B tests
Statistical noise
A/B test example
A/B tests vs expert opinion
Traffic splitting, A/A tests, and designing experiments
Noisy vs stable metrics – test duration and business cycles
Z-tests, T-tests, and time series
A/B test crash course advice
Frequentist approach vs Bayesian approach
A/B/C/D tests
Pizza dough
Links:
Jakob's LinkedIn: https://www.linkedin.com/in/jakob-graff-a6113a3a/
Product Analyst role at Inkitt: https://jobs.lever.co/inkitt/d2b0427a-f37f-4002-975d-28bd60b56d70
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 25, 2022 | 54:16

Machine Learning System Design Interview - Valerii Babushkin
We talked about:
Valerii’s background
Who goes through an ML system design interview
System design vs ML system design
Preparing for ML system design interviews
Machine learning project checklist
The importance of defining a goal and ways of measuring it
What to do after you set a goal
Typical components of an ML system
Applying ML systems to real-world problems
System design and coding in interviews for new graduates
Humans in the validation of model performance
Links:
Valerii's telegram channel (in Russian): t.me/cryptovalerii
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 18, 2022 | 54:40

Career Coaching - Lindsay McQuade
We talked about:
Lindsay’s background
Spiced Academy
Career coaching role
Reframing your experience
Helping with career problems
Finding what interests you
Tailoring a CV and “spray and pray”
Career coaching outside a bootcamp
Imposter syndrome
After bootcamp
Internships
Working with recruiters
Networking on LinkedIn
Links:
Lindsay's LinkedIn: https://www.linkedin.com/in/lindsay-mcquade/
Impostor questionnaire: http://impostortest.nickol.as/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 11, 2022 | 52:28

Product Management Essentials for Data Professionals - Greg Coquillo
We talked about:
Greg’s background
Responsibilities of Data Product Manager
Understanding customer journey
Interviewing business partners and decision-makers
Product sense, product mindset, and product roadmap
Working backwards
Driving the roadmap
Building a roadmap in Excel
Measuring success
Advice for teams that don’t have a product manager
Links:
Greg's LinkedIn: https://www.linkedin.com/in/greg-coquillo/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Feb 04, 2022 | 53:10

Recruiting Data Professionals - Alicja Notowska
We talked about:
Alicja’s background
The hiring process
Sourcing and recruiting
Managing expectations
Making the job description attractive
Selecting profiles during sourcing
Profile keywords
The importance of a Master’s vs a Bachelor’s degree vs a PhD
Improving CV
Interview with the recruiter
Salary expectations
Advice for “career changers”
Cover letters
Data analysts
Double Bachelor’s degrees
The most difficult part of hiring
Coursera courses on the CV
Making a good impression on recruiters
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 28, 2022 | 57:01

DataTalks.Club Behind the Scenes - Eugene Yan, Alexey Grigorev
We talked about:
Alexey’s background
Being a principal data scientist
DataTalks.Club
The beginning and growth of DataTalks.Club
Sustaining the pace
Types of talks
Popular and favorite talks
Making DataTalks.Club self-sufficient
Alexey’s book and course
Advice for people starting in data science and staying motivated
Not keeping up to date with new tools
Staying productive
Learning technical subjects and keeping notes
Inspiration and idea generation for DataTalks.Club
Links:
https://eugeneyan.com/writing/informal-mentors-alexey-grigorev/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 21, 2022 | 50:29

DTC's minis - From Data Engineering to MLOps - Sejal Vaidya
We don't have a new episode this week, but we have an amazing conversation with Sejal Vaidya from August.
We talked about:
Sejal's background
Why transition to ML engineering
Three phases of development of a project
Why data engineers should get involved in ML
Technologies
Tips for people who want to transition
Soft skills and understanding requirements
Helpful resources
Resources:
ML checklist (https://twolodzko.github.io/ml-checklist.html)
Machine Learning Bookcamp (https://mlbookcamp.com/)
Made with ML course (https://madewithml.com)
Full-stack deep learning (https://fullstackdeeplearning.com)
Newsletters: mlinproduction, huyenchip.com, jeremyjordan.me, mihaileric.com
Sejal's "Production ML" twitter list (https://twitter.com/i/lists/1212819218959351809)
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 14, 2022 | 16:52

Becoming a Data Science Manager - Mariano Semelman
We talked about:
Mariano’s background
Typical day of a manager
Becoming a manager
Preparing for the transition
Balancing projects and assumptions
Search and recommendations
Dealing with unfamiliar domains
Structuring projects
Connecting product and data science
Rules of Machine Learning
CRISP-DM and deployment
Giving feedback
Dealing with people leaving the team
Doing technical work as a manager
Dealing with bad hires
Keeping up with the industry
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jan 07, 2022 | 01:05:51

Leading NLP Teams - Ivan Bilan
We talked about:
Ivan’s role at Personio
Ivan’s background
Studying technical management
Managing a software team
NLP teams
NLP engineers
Becoming an NLP engineer
Computer vision
NLP engineer vs ML engineer
Conversational designers
Linguistics outside of chatbots
When does a team need an NLP engineer or a linguist?
The future of NLP
NLP pipelines
GPT-3
Problems of GPT-3
Does GPT-3 make everything obsolete?
What is NLP, actually?
Does NLP solve problems better than humans?
State of language translation
NLP Pandect
Links:
https://github.com/ivan-bilan/The-NLP-Pandect
https://github.com/ivan-bilan/The-Engineering-Manager-Pandect
https://github.com/ivan-bilan/The-Microservices-Pandect
Ivan's presentation about NLP: https://www.youtube.com/watch?v=VRur3xey31s
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 24, 2021 | 59:10

Product Management for Machine Learning - Geo Jolly
We talked about:
Geo’s background
Technical Product Manager
Building ML platform
Working on internal projects
Prioritizing the backlog
Defining the problems
Observability metrics
Avoiding jumping into “solution mode”
Breaking down the problem
Important skills for product managers
The importance of a technical background
Data Lead vs Staff Data Scientist vs Data PM
Approvals and rollout
Engineering/platform teams
Data scientists’ role in the engineering team
Scrum and Agile in data science
Transitioning from Data Scientist to Technical PM
Books to read for the transition
Transitioning for non-technical people
Doing user research
Quality assurance in ML
Advice for supporting an ML team as a Scrum master
Links:
Geo's LinkedIn: https://www.linkedin.com/in/geojolly/
Product School community: https://productschool.com/
http://theleanstartup.com/
Netflix CPO Medium blog: https://gibsonbiddle.medium.com/
Glovo is hiring: https://jobs.glovoapp.com/en/?d=4040726002
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 17, 2021 | 01:02:46

Moving from Academia to Industry - CJ Jenkins
We talked about:
CJ’s background
Evolutionary biology
Learning machine learning
Learning on the job and being honest with what you don’t know
Convincing that you will be useful
CJ’s first interview
Transitioning to industry
Tailoring your CV
Data science courses
Moving to Berlin
Being selective vs ‘spray and pray’
Moving on to new jobs
Plan for transitioning to industry
Requirements for getting hired
Publications, portfolios and pet projects
Adjusting to industry
Bad habits from academia
Topics with long-term value
CJ’s textbook
Links:
CJ's LinkedIn: https://www.linkedin.com/in/christina-jenkins/
Positions for master students: one two
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 10, 2021 | 59:03

Advancing Big Data Analytics: Post-Doctoral Research - Eleni Tzirita Zacharatou
We talked about:
Eleni’s background
Spatial data analytics
Responsibilities of a postdoc
Publishing papers
Best places for data management papers
Differences between postdoc and PhD
Helping students become successful
Research at the DIMA group
Identifying important research directions
Reviewing papers
Underrated topics in data management
Research in data cleaning
Collaborating with others
Choosing the field for Master’s students
Choosing the topic for a Master’s thesis
Should I do a PhD?
Promoting computer science to female students
Links:
https://www.user.tu-berlin.de/tzirita/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Dec 03, 2021 | 01:00:45

Becoming a Data Product Manager - Sara Menefee
We talked about:
Sara’s background
Product designer’s responsibilities
Data product manager’s responsibilities
Planning with the team
Design thinking and product design
Data PMs vs regular PMs
Skill requirements for Data PMs
Going from a product designer to a data product manager
Case studies
Resources for learning about product management
Data PM’s biggest challenge
Multitasking and context switching
Insights from user interviews
Using new, unfamiliar tools
Documentation
Idea generation
Do Data PMs need to know ML?
Links:
Product Management Courses: https://www.lennyrachitsky.com/course and https://www.reforge.com/mastering-product-management
Product Management Reading: https://svpg.com/inspired-how-to-create-products-customers-love/ and https://steveblank.com/category/customer-development/
Data Engineering for Noobs: https://www.datacamp.com/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 26, 2021 | 59:02

Data Science Manager vs Data Science Expert - Barbara Sobkowiak
We talked about:
Barbara’s background
Do you need a manager or an expert?
Technical and non-technical requirements for managers
Importance of technical skills for managers
Responsibilities and skills of a manager
Importance of technical background for managers
Getting involved in business development and sales
Developing the team
Checking team’s work
Data science expert
Hiring experts
Who should we hire first?
Can an expert build a team?
Data science managers in startups
Project management
Ensuring that projects provide value
Questions before starting a project
Women in data science
Finding Barbara online
General advice
Link:
Barbara's LinkedIn: https://www.linkedin.com/in/barbara-sobkowiak-1a4a9568
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 19, 2021 | 59:39

Ace Non-Technical Data Science Interviews - Nick Singh
We talked about:
Nick’s background
Being a career coach
Overview of the hiring process
Behavioral interviews for data scientists
Preparing for behavioral interviews
Handling "tricky" questions
Project deep dive
Business context
Pacing, rambling, and honesty
“What’s your favorite model?”
What if I haven’t worked on a project that brought $1 mln?
Different questions for different levels
Product-sense interviews
Identifying key metrics in unfamiliar domains
Tech blogs
Cold emailing
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 12, 2021 | 01:01:45

Becoming a Solopreneur in Data - Noah Gift
We talked about:
Noah’s background
Solopreneurship
A day of a solopreneur
Exponential vs linear work
Escaping the office work - digging the tunnel
Structuring goals
Staying motivated
Publishing books
Planning out books
Writing a book is like preparing to run a marathon
Distributed income
Getting started as a solopreneur
Lowering expenses and adding time
The right time to quit full-time
Building a network
Teaching at universities
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Nov 05, 2021 | 59:19

Building Business Acumen for Data Professionals - Thom Ives
Links:
https://join.slack.com/t/integratedmlai/shared_invite/zt-r3hpj44k-gfhf1pzIt3jixrATyXCWnQ
https://www.linkedin.com/in/thomives/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 29, 2021 | 01:05:30

Conquering the Last Mile in Data - Caitlin Moorman
We talked about:
Caitlin’s background
The last mile in data
The Pareto Principle
Failing to use data
Making sure data is used
Communicating with decision-makers
Working backwards from the last mile
Understanding how data drives decisions
Sketching and prototyping
Showing the benefits of power data
Measurability
Driving change in data
Asking high-leverage questions
Resistance from users
Understanding domain experts
Linear projects vs circular projects
Recommendations for data analyst students
Finding Caitlin online
Links:
Emelie's talk
https://locallyoptimistic.com/post/linear-and-circular-projects-part-1/
https://locallyoptimistic.com/post/linear-and-circular-projects-part-2/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 22, 2021 | 01:02:02

Similarities and Differences between ML and Analytics - Rishabh Bhargava
We talked about:
Rishabh's background
Rishabh’s experience as a sales engineer
Prescriptive analytics vs predictive analytics
The problem with the term ‘data science’
Is machine learning a part of analytics?
Day-to-day of people that work with ML
Rule-based systems to machine learning
The role of analysts in rule-based systems and in data teams
Do data analysts know data better than data scientists?
Data analysts’ documentation and recommendations
Iterative work - data scientists/ML vs data analysts
Analyzing results of experiments
Overlaps between machine learning and analytics
Using tools to bridge the gap between ML and analytics
Do companies overinvest in ML and underinvest in analytics?
Do companies hire data scientists while forgetting to hire data analysts?
The difficulty of finding senior data analysts
Is data science sexier than data analytics?
Should ML and data analytics teams work together or independently?
Building data teams
Rishabh’s newsletter – MLOpsRoundup
Links:
https://mlopsroundup.substack.com/
https://twitter.com/rish_bhargava
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 15, 2021 | 59:39

Building and Leading Data Teams - Tammy Liang
We talked about:
Tammy’s background
Being the chief of data
First projects as the first data person in a company
Initial resistance
Expanding the team
Role of business analyst
Platanomelon’s stack
Order for growing the data team
Demand forecasting
Should analysts know machine learning
Qualifications for the first data person in a company
Providing accurate results
Receiving insights in a timely manner
Providing useful insights
Giving ownership to the team
Starting as the first data person in a company
Data For Future podcast
Supporting team members that are stuck
Finding Tammy online
Links:
Tammy's podcast: https://dataforfuture.org/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 08, 2021 | 59:10

What Researchers and Engineers Can Learn from Each Other - Mihail Eric
We talked about:
Mihail’s background
NLP and self-driving vehicles
Transitioning from academia to the industry
Machine learning researchers
Finding open-ended problems
Machine learning engineers
Is data science more engineering or research?
What can engineers and researchers learn from one another?
Bridging the disconnect between researchers and engineers
Breaking down silos
Fluid roles
Full-stack data scientists
Advice to machine learning researchers
Advice to machine learning engineers
Reading papers
Choosing between engineering or research if you’re just starting
Confetti.ai
Links:
https://twitter.com/mihail_eric
http://confetti.ai/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Oct 01, 2021 | 01:01:44

Introducing Data Science in Startups - Marianna Diachuk
We talked about:
Marianna’s background
Being the only data scientist
What should already be in the company
How much experience do you need
Identifying problems
Prioritization
What should the company already know?
First week
First month
First quarter
Managing expectations
Solving problems without ML
Project timelines
Finding the best solution
Evaluating performance
Getting stuck
Communicating with analysts
Transitioning from engineering to data science
Growing the team
Stopping projects
Questions for the company
From research to production
Wrapping up
Links:
Marianna's LinkedIn: https://www.linkedin.com/in/marianna-diachuk-53ba60116/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 24, 2021 | 58:33

Defining Success: Metrics and KPIs - Adam Sroka
We talked about:
Adam’s background
Adam’s laser and data experience
Metrics and why we care about them
Examples of metrics
KPIs
KPI examples
Derived KPIs
Creating metrics — grocery store example
Metric efficiency
North Star metrics
Threshold metrics
Health metrics
Data team metrics
Experiments: treatment and control groups
Accelerate metrics and timeboxing
Links:
Domino's article about measuring value: http://blog.dominodatalab.com/measuring-data-science-business-value
Adam's article about skills useful for data scientists: https://towardsdatascience.com/how-to-apply-your-hard-earned-data-science-skillset-812585e3cc06
Adam's article about standing out: https://towardsdatascience.com/how-to-stand-out-as-a-great-data-scientist-in-2021-3b7a732114a9
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 17, 2021 | 01:02:51

Making Sense of Data Engineering Acronyms and Buzzwords - Natalie Kwong
We talked about:
Natalie’s background
Airbyte
What is ETL?
Why ELT instead of ETL?
Transformations
How does ELT help analysts be more independent?
Data marts and Data warehouses
Ingestion DB
ETL vs ELT
Data lakes
Data swamps
Data governance
Ingestion layer vs Data lake
Do you need both a Data warehouse and a Data lake?
Airbyte and ELT
Modern data stack
Reverse ETL
Is drag-and-drop killing data engineering jobs?
Who is responsible for managing unused data?
CDC – Change Data Capture
Slowly changing dimension
Are there cases where ETL is preferable over ELT?
Why is Airbyte open source?
The case of Elasticsearch and AWS
Links:
Natalie's LinkedIn: https://www.linkedin.com/in/nataliekwong/
https://airbyte.io/blog/why-the-future-of-etl-is-not-elt-but-el
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 11, 2021 | 01:00:21

Mastering Algorithms and Data Structures - Marcello La Rocca
We talked about:
Learning algorithms and data structures
Resources for learning algorithms and data structures
Most important data structures
Learning the abstractions
Learning algorithms if they aren’t needed at work
Common mistakes when using wrong data structures
Importance of data structures for data scientists
Marcello’s book - Advanced Algorithms and Data Structures
Bloom filters
Where Bloom filters are useful
Approximate nearest neighbours
Searching for most similar vectors
Knowing frameworks vs knowing internals of data structures
Serializing Bloom filters
Algorithmic problems in job interviews
Important data structures for data scientists and data engineers
Learning by doing
Importance of compiled languages for data scientists
Links:
Marcello's book: Advanced Algorithms and Data Structures http://mng.bz/eP79 (promo code for 35% discount: poddatatalks21)
MIT, Introduction to Algorithms: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/
Algorithms specialization by Tim Roughgarden: https://www.coursera.org/specializations/algorithms
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Sep 03, 2021 | 01:02:11

Chief Data Officer - Marco De Sa
We talked about:
Marco’s background
Role of CDO
Keeping track of many things
Becoming a CDO
Strategy vs tactics
VP of Data vs CDO
How many VPs of Data could there be?
Splitting the work between VP and CDO
Difference between CTO, CPO, and CDO
Breaking down the goals and working backwards from them
Assessing if we’re moving in the right direction
Dealing with many meetings
Being more effective
Building the data-driven culture
Challenges of working remotely
Does CDO need deep technical skills?
Importance of MBA
The key skills for becoming a CDO
Biggest challenges within OLX so far
Demonstrating the CDO skills on a job interview
Overcoming resistance
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 27, 2021 | 01:01:55

Freelancing in Machine Learning - Mikio Braun
We talked about:
Mikio’s background
What Mikio helps with
Moving from a full-time job to freelancing
Finding clients and importance of a strong network
Building a network
Initial meetings with clients
Understanding what clients need
Template for the offer (Million dollar consulting)
Deciding on rate type: hourly, daily, per project
Taking vacations (and paying twice for them)
Avoiding overworking
Specializing: consulting as a product
Working full-time as a principal vs being a consultant
Is the overhead worth it?
Getting a new client when you already have a project
After freelancing: what’s next?
Output of Mikio’s work
Learning new things
Lessons learned after finding clients
Registering as a freelancer in Germany
Personal liability of a freelancer
Effect of globalization and remote work on consulting
Advice for people who want to start freelancing
Working full-time and freelancing at the same time
Books:
Million Dollar Consulting by Alan Weiss
Built to Sell by John Warrillow
Links:
Mikio's Twitter: https://twitter.com/mikiobraun
Mikio's LinkedIn: https://www.linkedin.com/in/mikiobraun/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 20, 2021 | 01:02:05

Launching a Startup: From Idea to First Hire - Carmine Paolino
We talked about:
Carmine’s background
Carmine’s startup FreshFlow
Doing user research
Design thinking
Entrepreneur first
Finding co-founders: the “expertise edges” framework
The structure of the EF program
Coming up with the idea
How important is going through a startup accelerator?
Finding your first client
Finding investors
Consequences of having a bad investor
Splitting responsibilities between co-founders
Hiring
The importance of delegating
Making work attractive to hires
Plans for the future
Just-in-time supply chain
What would you have done differently?
Advice for people starting a startup
Don’t focus on skills only
Getting motivation
Am I ready for a startup?
Importance of a business school
Advice on finding a co-founder
Do I need EF if I already have an idea?
Having a prototype before the pitch
Books:
The Mom Test by Rob Fitzpatrick
Design Thinking by Robert Curedale
Links:
FreshFlow: https://freshflow.ai/
Carmine's LinkedIn: https://www.linkedin.com/in/carminepaolino
Carmine's Twitter: https://twitter.com/paolino
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 13, 2021 | 01:07:28
Approach Learning as ML Project - Vladimir Finkelshtein [mini]
We don't have an episode lined up for this week, but we recorded a small chat with Vladimir some time ago. Enjoy it!
We talked about:
Vladimir's background
Learning by answering questions
Don't be afraid of being wrong
Winning books
Learning random things
Approach learning as a machine learning project
Links:
Vladimir on LinkedIn: https://www.linkedin.com/in/vladimir-finkelshtein/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Aug 06, 2021 | 13:56

Humans in the Loop - Lina Weichbrodt
We talked about:
Lina’s background
What we need to remember when starting a project (checklists)
Make sure the problem is formalized and close to the core business
Get the buy-in with stakeholders
Building trust with stakeholders
Don’t just focus on upsides – ask about concerns
Turning a concern into a metric
What happens when something goes wrong?
Post mortem reporting
Apply the 5 why’s
If a lot of users say it’s a bug – it’s worth investigating
Post mortem format
Action points
Debugging vs explaining the model
Are there online versions of checklists?
Make sure to log your inputs
Talking to end-users and using your own service
Your ideas vs Stakeholder ideas
Should data practitioners educate the team about data?
People skills and ‘dirty’ hacks
Where to find Lina
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 30, 2021 | 57:55

Running from Complexity - Ben Wilson
We talked about:
Ben’s Background
Building solutions for customers
Why projects don’t make it to production
Why do people choose overcomplicated solutions?
The dangers of isolating data science from the business unit
The importance of being able to explain things
Maximizing chances of making it into production
The IKEA effect
Risks of implementing novel algorithms
If it can be done simply – do that first
Don’t become the guinea pig for someone’s white paper
The importance of stat skills and coding skills
Structuring an agile team for ML work
Timeboxing research
Mentoring
Ben’s book
‘Uncool techniques’ at AI-First companies
Should managers learn data science?
Do data scientists need to specialize to be successful?
Links:
Ben's book: https://www.manning.com/books/machine-learning-engineering-in-action (get 35% off with code "ctwsummer21")
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 23, 2021 | 01:11:43

I Want to Build a Machine Learning Startup! - Elena Samuylova
We talked about:
Elena’s background
Why do a startup instead of being an employee?
Where to get ideas for your startup
Finding a co-founder
What should you consider before starting a startup?
Vertical startup vs infrastructure startup
‘AI First’ startups
Building tools for engineers
What skills do you need to start a startup?
Startup risks
How to be prepared to fail
Work-life balance
The part-time startup approach
Startup investment models
No resources and no technical expertise – what to do?
Productionizing your services
When to hire an expert
Talking to people with a problem before solving the problem
Starting Elena’s startup, Evidently
Elena’s role at Evidently
Why is Evidently open source?
“People will just copy my open source code. Should I be concerned?”
Bottom-up adoption
Creating value so that clients engage with your product
Is there a difference between countries when creating a startup?
Does open source mean the data is safer?
When should you hire engineers?
Following the market
Startups out of genuine interest vs just for money and for fun
Links:
EvidentlyAI: https://evidentlyai.com/
Elena's LinkedIn: https://www.linkedin.com/in/elenasamuylova/
Elena's Twitter: https://twitter.com/elenasamuylova/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 16, 2021 | 58:26

Big Data Engineer vs Data Scientist - Roksolana Diachuk
Links:
Twitter: https://twitter.com/dead_flowers22
LinkedIn: https://www.linkedin.com/in/roksolanadiachuk/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 09, 2021 | 01:01:30

Build Your Own Data Pipeline - Andreas Kretz
We talked about:
Andreas’s background
Why data engineering is becoming more popular
Who to hire first – a data engineer or a data scientist?
How can I, as a data scientist, learn to build pipelines?
Don’t use too many tools
What is a data pipeline and why do we need it?
What is ingestion?
Can just one person build a data pipeline?
Approaches to building data pipelines for data scientists
Processing frameworks
Common setup for data pipelines — car price prediction
Productionizing the model with the help of a data pipeline
Scheduling
Orchestration
Start simple
Learning DevOps to implement data pipelines
How to choose the right tool
Are Hadoop, Docker, Cloud necessary for a first job/internship?
Is Hadoop still relevant or necessary?
Data engineering academy
How to pick up Cloud skills
Avoid huge datasets when learning
Convincing your employer to do data science
How to find Andreas
Links:
LinkedIn: https://www.linkedin.com/in/andreas-kretz
Data engineering cookbook: https://cookbook.learndataengineering.com/
Course: https://learndataengineering.com/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jul 02, 2021 | 01:01:53

From Software Engineering to Machine Learning - Santiago Valdarrama
We talked about:
Santiago’s background
“Transitioning to ML” vs “Adding ML as a skill”
Getting over the fear of math for software developers
Learning by explaining
Seven lessons I learned about starting a career in machine learning
Lesson 1 – Take the first step
Lesson 2 – Learning is a marathon, not a sprint
Lesson 3 – If you want to go quickly, go alone. If you want to go far, go together.
Lesson 4 – Do something with the knowledge you gain
Lesson 5 – ML is not just math. Math is not scary.
Lesson 6 – Your ability to analyze a problem is the most important skill. Coding is secondary.
Lesson 7 – You don’t need to know every detail
Tools and frameworks needed to transition to machine learning
Problem-based learning vs Top-down learning
Learning resources
Santiago’s favorite books
Santiago’s course on transitioning to machine learning
Improving coding skills
Building solutions without machine learning
Becoming a better engineer
What is the difference between machine learning and data science?
Getting into machine learning - Reiteration
Getting past the math
Links:
Santiago's Twitter: https://twitter.com/svpino
Santiago's course: https://gumroad.com/svpino#kBjbC
Pinned tweet with a roadmap: https://twitter.com/svpino/status/1400798154732212230
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 25, 2021 | 59:43

Analytics Engineer: New Role in a Data Team - Victoria Perez Mola
Links:
https://www.notion.so/Analytics-Engineer-New-Role-in-a-Data-Team-9decbf33825c4580967cf3173eb77177
https://www.linkedin.com/in/victoriaperezmola/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Conference: https://datatalks.club/conferences/2021-summer-marathon.html
Jun 18, 2021 | 59:55

Data Governance - Jessi Ashdown, Uri Gilad
We talked about:
Jessi’s background
Uri’s background
Data governance
Implementing data governance: policies and processes
Reasons not to have data governance
Start with “why”
Cataloging and classifying our data
Let data work for you
The human component
Data quality
Defining policies
Implementing policies
Shopping-cart experience for requesting data
Proving the value of data catalog
Using data catalog
Data governance = data catalog?
Links:
Book: https://www.oreilly.com/library/view/data-governance-the/9781492063483/
Jessi’s LinkedIn: https://www.linkedin.com/in/jashdown/
Uri’s LinkedIn: https://linkedin.com/in/ugilad
Uri’s Twitter: https://twitter.com/ugilad
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Conference: https://datatalks.club/conferences/2021-summer-marathon.html
Jun 11, 2021 | 57:59

What Data Scientists Don’t Mention in Their LinkedIn Profiles - Yury Kashnitsky
We talked about:
Yury’s background
Failing fast: Grammarly for science
Not failing fast: Keyword recommender
Four steps to epiphany
Lesson learned when bringing XGBoost into production
When data scientists try to be engineers
Joining a fintech startup: Doing NLP with thousands of GPUs
Working at a Telco company
Having too much freedom
The importance of digital presence
Work-life balance
Quantifying impact of failing projects on our CVs
Business trips to Perm: don’t work on the weekend
What doesn’t kill you makes you stronger
Links:
Yury's course: https://mlcourse.ai/
Yury's Twitter: https://twitter.com/ykashnitsky
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Jun 04, 2021 | 59:56

Becoming a Data-led Professional - Arpit Choudhury
We talked about:
Data-led academy
Arpit’s background
Growth marketing
Being data-led
Data-led vs data-driven
Documenting your data: creating a tracking plan
Understanding your data
Tools for creating a tracking plan
Data flow stages
Tracking events — examples
Collecting the data
Storing and analyzing the data
Data activation
Tools for data collection
Data warehouses
Reverse ETL tools
Customer data platforms
Modern data stack for growth
Buy vs build
People we need in the data flow
Data democratization
Motivating people to document data
Product-led vs data-led
Links:
https://dataled.academy/
Join our Slack: https://datatalks.club/slack.html
May 28, 2021 | 01:00:20

How to Market Yourself (without Being a Celebrity) - Shawn Swyx Wang
We talked about:
Shawn’s background and his book
Marketing ourselves
Components of personal marketing
Personal brand for an average developer
Picking a domain: what to write about?
Being too niche
Finding a good niche
Learning in public
Borrowed platforms vs own platform
Starting on social media: Picking what they put down
Career transitioning: mutual exchange of value
Personal marketing for getting a new job
Getting hired through the back door
Finding content ideas
Marketing yourself in public — summary
Open-source knowledge
Internal marketing: promoting ourselves at work
Signature initiative
Public speaking
Wrapping up
Discount for the coding career book
75% of the engineering ladder criteria are not technical
Links:
Shawn's personal page: https://www.swyx.io/
Twitter: https://twitter.com/swyx
Book of the week page: https://datatalks.club/books/20210510-the-coding-career-handbook.html (with a discount for DTC members!)
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
May 21, 2021 | 01:02:57

From Physics to Machine Learning - Tatiana Gabruseva
We talked about:
Tatiana’s background
12 career hacks and changing career
Hack #1: Change your social circle
Hack #2: Forget your fears and stereotypes
Hack #3: Forget distractions
Hack #4: Don’t overestimate others and don’t underestimate yourself
Hack #5: Attention genius
Hack #6: Make a team
Hack #7: Less is more. Forget about perfectionism
Hack #8: Initial creation
Hack #9: Find mentors
Hack #10: Say “no”
Hack #11: Look for failures
Hack #12: Take care of yourself
Kaggle vs internships and pet projects
Resources for learning machine learning
Starting with Kaggle
Improving focus
Astroinformatics
How background in Physics is helpful for transitioning
Leaving academia
Preparing for interviews
Links:
Mock interviews: https://www.pramp.com/
Learning ML: https://www.coursera.org/learn/machine-learning and https://www.coursera.org/specializations/deep-learning
Python: https://www.coursera.org/learn/machine-learning-with-python
SQL: https://www.sqlhabit.com/
Practice: https://www.kaggle.com/
MIT 6.006: https://courses.csail.mit.edu/6.006/fall11/notes.shtml
Coding: https://leetcode.com/
System design: https://www.educative.io/courses/grokking-the-system-design-interview
Ukrainian Telegram groups for interview preparation: https://t.me/FaangInterviewChannel, https://t.me/FaangTechInterview, https://t.me/FloodInterview
Join DataTalks.Club: https://datatalks.club/slack.html
May 14, 2021 | 01:06:33

What I Learned After Interviewing 300 Data Scientists - Oleg Novikov
We talked about:
Oleg’s background
Standing out in recruitment process
NextRound — a service for free mock interviews
Why rejections are generic
Starting NextRound — preparing a list of situations
Steps in the interview process
Read the job description!
CV is your landing page
Take-home assignments
Questions about your past experience
Hypothetical case questions
Technical rounds
Handling rejections
What to do after receiving an offer?
Do recruiters pay attention to age?
Getting a job with a PhD — it’s a cold start problem
Should I answer rejection emails?
Negotiating when my salary is low
Should I apply for jobs that require 5 years of experience?
Tricking applicant tracking systems
What else Oleg learned after interviewing 300 data scientists
How a horse's ass determined the design of a space shuttle
Links:
Oleg's service for interviews: https://nextround.cc/
LinkedIn: https://www.linkedin.com/in/olegnovikov/
Join DataTalks.Club: https://datatalks.club/slack.html
May 07, 2021 | 01:08:36

Effective Communication with Business for Data Professionals - Lior Barak
We talked about:
DataTalks.Club intro
Lior’s background
Who is a data strategist?
Improving communication between business and tech
Building trust
Putting data and business people together
Dealing with pushbacks
Building things in the lean way (and growing tomatoes)
Starting with ugly code
Convincing others to take our code
MVP vs development and Hummus
Talking to people who can’t code
Break down the silos
Hummus
Hummus places in Berlin
Lior’s book: Data is Like a Plate of Hummus
Data chaos
Links:
Book: https://www.amazon.com/-/en/Sarah-Mayor/dp/B086L277LZ (can be found on any amazon store)
Company: https://www.taleaboutdata.com/
Podcast: https://podcast.whatthedatapodcast.com/
LinkedIn: https://www.linkedin.com/in/liorbarak/
Twitter: https://twitter.com/liorb
Hummus places in Berlin:
Azzam: https://goo.gl/maps/uCkb3ATc5CVKapDa6
Akkawy: https://g.page/akkawy
The Eatery Berlin: https://g.page/theeateryberlin
Join DataTalks.Club: https://datatalks.club/slack.html
Apr 30, 2021 | 57:23

Data Observability - Barr Moses
We covered:
Barr’s background
Market gaps in data reliability
Observability in engineering
Data downtime
Data quality problems and the five pillars of data observability
Example: job failing because of a schema change
Three pillars of observability (good pipelines and bad data)
Observability vs monitoring
Finding the root cause
Who is accountable for data quality? (the RACI framework)
Service level agreements
Inferring the SLAs from the historical data
Implementing data observability
Data downtime maturity curve
Monte Carlo: data observability solution
Open source tools
Test-driven development for data
Is data observability cloud agnostic?
Centralizing data observability
Detecting downstream and upstream data usage
Getting bad data vs getting unusual data
Links:
Learn more about Monte Carlo: https://www.montecarlodata.com/
The Data Engineer's Guide to Root Cause Analysis: https://www.montecarlodata.com/the-data-engineers-guide-to-root-cause-analysis/
Why You Need to Set SLAs for Your Data Pipelines: https://www.montecarlodata.com/how-to-make-your-data-pipelines-more-reliable-with-slas/
Data Observability: The Next Frontier of Data Engineering: https://www.montecarlodata.com/data-observability-the-next-frontier-of-data-engineering/
To get in touch with Barr, ping her in the DataTalks.Club group or use barr@montecarlodata.com
Join DataTalks.Club: https://datatalks.club/slack.html
Apr 23, 2021 | 01:01:44

Shifting Career from Analytics to Data Science - Andrada Olteanu
We talked about:
Andrada’s background
Recommended courses
Kaggle and StackOverflow
Doing notebooks on Kaggle
Projects for learning data science
Finding a job and a mentor with Kaggle’s help
The process for looking for a job
Main difficulties of getting a job
Project portfolio and Kaggle
Helpful analytical skills for transitioning into data science
Becoming better at coding
Learning by imitating
Is doing a master's helpful?
Getting into data science without a master's
Kaggle is not just about competitions
The last tip: use social media
Links:
https://www.kaggle.com/andradaolteanu
https://twitter.com/andradaolteanuu
https://www.linkedin.com/in/andrada-olteanu-3806a2132/
Join DataTalks.Club: https://datatalks.club/slack.html
Apr 16, 2021 | 01:02:34

Transitioning from Project Management to Data Science - Ksenia Legostay
We talked about:
Ksenia’s background
Data analytics vs data science
Skills needed for data analytics and data science
Benefits of getting a master's degree
Useful online courses
How project management background can be helpful for the career transition
Which skills do PMs need to become data analysts?
Going from working with spreadsheets to working with Python
Kaggle
Productionizing machine learning models
Getting experience while studying
Looking for a job
Gap between theory and practice
Learning plan for transitioning
Last tips and getting involved in projects
Links:
Notes prepared by Ksenia with all the info: https://www.notion.so/ksenialeg/DataTalks-Club-7597e55f476040a5921db58d48cf718f
Join DataTalks.Club: https://datatalks.club/slack.html
Apr 09, 2021 | 01:03:32

Building Online Tech Communities - Demetrios Brinkmann
We talked about:
Demetrios’ background and starting the MLOps community
Growing MLOps community
Community moderation and dealing with problems
Becoming a community and connecting with people
Feeling a sense of belonging
Managing a community as an introvert
Keeping communities active
Doing custdev and talking to users
Random coffee and meeting with community members
Organizing community activities
Is community a business?
Five steps for starting a community in 2021
Shameless plug from Demetrios
Links:
https://mlops.community/
Join DataTalks.Club: https://datatalks.club/slack.html
Apr 02, 2021 | 01:13:52

DataOps 101 - Lars Albertsson
We talked about:
Lars’ career
Doing DataOps before it existed
What is DataOps
Data platform
Main components of the data platform and tools to implement it
Books about functional programming principles
Batch vs Streaming
Maturity levels
Building self-service tools
MLOps vs DataOps
Data Mesh
Keeping track of transformations
Lake house
Links:
https://www.scling.com/reading-list/
https://www.scling.com/presentations/
Join DataTalks.Club: https://datatalks.club/slack.html
Mar 26, 2021 | 01:09:26

The Essentials of Public Speaking for Career in Data Science - Ben Taylor
We talked about:
Ben’s background
AI evangelism
Ben’s first experiences speaking in public
Becoming a great speaker
Key Takeaways and Call to Action
Making a good introduction
Being Remembered
Writing a talk proposal for conferences
Landing a keynote
Good topics to start talks on
Pitching a solution talk to meetup organizers
Top public speaking skill to acquire
Book recommendations
Join DataTalks.Club: https://datatalks.club/slack.html
Mar 19, 2021 | 01:08:48

New Roles and Key Skills to Monetize Machine Learning - Vin Vashishta
We discussed monetization roles and the capabilities people need to move into those roles.
The key roles are ML Researcher, ML Architect, and ML Product Manager.
We talked about:
Vin's career journey
What does it mean to "monetize machine learning"
Important monetization metrics
Who should we have on the team to make a project successful
Machine Learning Researcher (applied and scientist) - background, responsibilities, and needed skills
Developing new categories
The best recipe for a startup: angry users + data scientists
What research actually is
ML Product Manager - background, responsibilities, and needed skills
How product managers can actually manage all their responsibilities (and they have a lot of them!)
ML Architect - background, responsibilities, and needed skills
Path to becoming an architect
How should we change education to make it more effective
Important product metrics
And more!
Links:
https://twitter.com/v_vashishta
https://linkedin.com/in/vineetvashishta
https://databyvsquared.com/
Join DataTalks.Club: https://datatalks.club/slack.html
Mar 12, 2021 | 01:19:52

Personal Branding - Admond Lee Kin Lim
We talked about:
Admond's career journey
What is personal brand
How Admond started being active online
Publishing on Medium and LinkedIn
Idea generation process and tools
Other platforms
Podcasts
Offline presence
1x1 meetings
Speaking at conferences
Having confidence to publish
Selling online courses
Personal values
Admond's course
And many other things
Links:
https://twitter.com/admond1994
https://linkedin.com/in/admond1994
https://buzzsumo.com
https://feedly.com/
https://lunchclub.com/
https://thelead.io/data-scientist-personal-brand-toolkit?utm_medium=instructor&utm_source=admond
Join DataTalks.Club: https://datatalks.club/slack.html
Mar 05, 2021 | 01:13:14

The ABC’s of Data Science - Danny Ma
Did you know that there are 3 different types of data scientists? A for analyst, B for builder, and C for consultant - we discuss the key differences between each one and some learning strategies you can use to become A, B, or C.
We talked about:
Inspirations for memes
Danny's background and career journey
The ABCs of data science - the story behind the idea
Data scientist type A - Analyst
Skills, responsibilities, and background for type A
Transitioning from data analytics to type A data scientist (that's the path Danny took)
How can we become more curious?
Data scientist B - Builder
Responsibilities and background for type B
Transitioning from type A to type B
Most important skills for type B
Why you have to learn more about cloud
Data scientist type C - consultant
Skills, responsibilities, and background for type C
Growing into the C type
Ideal data science team
Important business metrics
Getting a job - easier as type A or type B?
Looking for a job without experience
Two approaches for job search: "apply everywhere" and "apply nowhere"
Are bootcamps useful?
Learning path to becoming a data scientist
Danny's data apprenticeship program and "Serious SQL" course
Why SQL is the most important skill
R vs Python
Importance of Masters and PhD
Links:
Danny's profile on LinkedIn: https://linkedin.com/in/datawithdanny
Danny's course: https://datawithdanny.com/
Trailer: https://www.linkedin.com/posts/datawithdanny_datascientist-data-activity-6767988552811847680-GzUK/
Technical debt paper: https://proceedings.neurips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html
Join DataTalks.Club: https://datatalks.club/slack.html
Feb 26, 2021 | 01:25:49

Translating ML Predictions Into Better Real-World Results with Decision Optimization - Dan Becker
We talked about:
How we make decisions with machine learning
What is decision optimization
Specifying the decision function
Emulation for making the best decisions
Decision optimization and reinforcement learning
Getting started with decision optimization
Trends in the industry
Links:
https://datatalks.club/people/danbecker.html
https://www.decision.ai/
Join DataTalks.Club: https://datatalks.club/slack.html
Feb 19, 2021 | 55:44

Feature Stores: Cutting through the Hype - Willem Pienaar
We covered:
What is a feature store
Problems it solves
When to use a feature store
When not to use a feature store
The main components
When a team should start using a feature store
Links:
Feast: https://feast.dev/
https://www.tecton.ai/blog/what-is-a-feature-store/
https://docs.greatexpectations.io/en/latest/reference/core_concepts.html
Join DataTalks.Club: https://datatalks.club
Feb 12, 2021 | 01:01:06

The Rise of MLOps - Theofilos Papapanagiotou
We covered:
What is MLOps
The difference between MLOps and ML Engineering
Getting into MLOps
Kubeflow and its components, ML Platforms
Learning Kubeflow
DataOps
And other things
Links:
Microsoft MLOps maturity model: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model
Google MLOps maturity levels: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
MLOps roadmap 2020-2025: https://github.com/cdfoundation/sig-mlops/blob/master/roadmap/2020/MLOpsRoadmap2020.md
Kubeflow website: https://www.kubeflow.org/
TFX Paper: https://research.google/pubs/pub46484/
Join DataTalks.Club: https://datatalks.club
Feb 05, 2021 | 01:02:51

Getting Started with Open Source - Vincent Warmerdam
We talked about
open source
getting started with open source
convincing your employer to contribute to open source
public speaking
the checklist for open source projects
the role of research advocate
And many more things!
Links from Vincent:
https://www.youtube.com/watch?v=68ABAU_V8qI&t=975s&ab_channel=PyData
https://www.youtube.com/watch?v=kYMfE9u-lMo&t=958s&ab_channel=PyData
https://koaning.io/projects.html
https://calmcode.io/
https://makenames.io/
https://koaning.github.io/clumper/api/clumper.html
Join DataTalks.Club: https://datatalks.club
Jan 29, 2021 | 01:02:47

Developer Advocacy for Data Science - Elle O'Brien
We talked about developer advocacy for data science.
We covered
The role of a developer advocate
The skills needed for the job and the responsibilities
How to become a developer advocate
You can find Elle on:
Twitter: https://twitter.com/DrElleOBrien
LinkedIn: https://linkedin.com/in/drelleobrien
DVC's YouTube channel: https://www.youtube.com/channel/UC37rp97Go-xIX3aNFVHhXfQ
Join DataTalks.Club: https://datatalks.club
Jan 23, 2021 | 55:36

The Importance of Writing in a Tech Career - Eugene Yan
We talk about blogging and technical writing. We cover:
Why should we write online?
What should we write about?
Writing at work: Design documents, wikis, etc.
The writing process (also at work)
Eugene's website: eugeneyan.com
Follow Eugene on Twitter: https://twitter.com/eugeneyan
Suggest topics: https://eugeneyan.com/topic-poll/
Join DataTalks.Club: https://datatalks.club
Jan 15, 2021 | 57:24

Mentoring - Rahul Jain
We talked about:
The role of mentoring in career
Looking for mentors and preparing for mentoring sessions as a mentee
Becoming a mentor
And many other things!
Links:
Rahul's profile on the mentoring club: https://www.mentoring-club.com/the-mentors/rahul-jain
Rahul's article about mentoring: https://rahulj51.github.io/career/coaching/mentoring/2020/06/22/career-coaching.html
Join DataTalks.Club: https://datatalks.club
Dec 25, 2020 | 56:12

Standing out as a Data Scientist - Luke Whipps
We covered:
Getting the recruiter's attention
Making your CV look great
Tailoring your application to the position
And many other things!
Luke's LinkedIn profile: https://www.linkedin.com/in/lukewhipps/
Join DataTalks.Club: https://datatalks.club
Dec 18, 2020 | 01:09:26

Building a Data Science Team - Dat Tran
We talked about:
Dat's career so far and the startup he co-founded (Priceloop)
Who to hire first in a data team
How to hire the first data scientist
And many other things!
You can find Dat on LinkedIn: https://www.linkedin.com/in/dat-tran-a1602320/
Join DataTalks.Club: https://datatalks.club
Dec 11, 2020 | 58:45

Processes in a Data Science Project - Alexey Grigorev
In this podcast, we talk about CRISP-DM - a methodology for organizing data science projects
DataTalks.Club is the place to talk about data. Join our community: https://datatalks.club
Read more about CRISP-DM here: https://mlbookcamp.com/article/crisp-dm
Dec 04, 2020 | 31:33

Roles in a data team - Alexey Grigorev
We talked about:
- different roles in a data team: product managers, data analysts, data engineers, data scientists, ML engineers, MLOps engineers
- their responsibilities
- the skills they need
DataTalks.Club is the place to talk about data. Join our community: https://datatalks.club
Nov 21, 2020 | 42:45