
Unboxing AI: The Podcast for Computer Vision Engineers
By Unboxing AI
I'm Gil Elbaz, Co-founder and CTO of Datagen. In this podcast, I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. On the way, I hope you uncover a new subject or gain a different perspective, as well as enjoying engaging conversation. It’s about much more than the technical processes – it’s about people, journeys, and ideas. Turn up the volume, insights inside.

Synthetic Data: Simulation & Visual Effects at Scale
ABSTRACT
Gil Elbaz speaks with Tadas Baltrusaitis, who recently released the seminal paper DigiFace 1M: 1 Million Digital Face Images for Face Recognition. Tadas is a true believer in synthetic data and shares his deep knowledge of the subject, along with insights on the current state of the field and what CV engineers need to know. Join Gil and Tadas as they discuss morphable models, multimodal learning, domain gaps, edge cases, and more.
TOPICS & TIMESTAMPS
0:00 Introduction
2:06 Getting started in computer science
3:40 Inferring mental states from facial expressions
7:16 Challenges of facial expressions
8:40 OpenFace
10:46 MATLAB to Python
13:17 Multimodal Machine Learning
15:52 Multimodals and Synthetic Data
16:54 Morphable Models
19:34 HoloLens
22:07 Skill Sets for CV Engineers
25:25 What is Synthetic Data?
27:07 GANs and Diffusion Models
31:24 Fake It Till You Make It
35:25 Domain Gaps
36:32 Long Tails (Edge Cases)
39:42 Training vs. Testing
41:53 Future of NeRF and Diffusion Models
48:26 Avatars and VR/AR
50:39 Advice for Next Generation CV Engineers
51:58 Season One Wrap-Up
LINKS & RESOURCES
Tadas Baltrusaitis
LinkedIn, GitHub, Google Scholar
Fake It Till You Make It
Video
GitHub
DigiFace 1M
A 3D Morphable Eye Region Model for Gaze Estimation
HoloLens
Multimodal Machine Learning: A Survey and Taxonomy
3D Face Reconstruction with Dense Landmarks
OpenFace
OpenFace 2.0
Dr. Rana el Kaliouby
Dr. Louis-Philippe Morency
Peter Robinson
Jamie Shotton
Errol Wood
Affectiva
GUEST BIO
Tadas Baltrusaitis is a principal scientist working in the Microsoft Mixed Reality and AI lab in Cambridge, UK, where he leads the human synthetics team. He recently co-authored the groundbreaking paper DigiFace 1M, a dataset of 1 million synthetic images for facial recognition. Tadas is also the co-author of Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone, among other outstanding papers. His PhD research focused on automatic facial expression analysis in difficult real-world settings, and he was a postdoctoral associate at Carnegie Mellon University, where his primary research lay in automatic understanding of human behavior, expressions and mental states using computer vision.
Jan 04, 2023 | 53:51

SLAM and the Evolution of Spatial AI
Host Gil Elbaz welcomes Andrew J. Davison, the father of SLAM. Andrew and Gil dive right into how SLAM started and how it has evolved. They speak about Spatial AI and what it means, along with a discussion of global belief propagation. Of course, they also talk about robotics, how it is impacted by new technologies like NeRF, and what the current state of the art is.
Timestamps and Topics
[00:00:00] Intro
[00:02:07] Early Research Leading to SLAM
[00:04:49] Why SLAM
[00:08:20] Computer Vision Based SLAM
[00:09:18] MonoSLAM Breakthrough
[00:13:47] Applications of SLAM
[00:16:27] Modern Versions of SLAM
[00:21:50] Spatial AI
[00:26:04] Implicit vs. Explicit Scene Representations
[00:34:32] Impact on Robotics
[00:38:46] Reinforcement Learning (RL)
[00:43:10] Belief Propagation Algorithms for Parallel Compute
[00:50:51] Connection to Cellular Automata
[00:55:55] Recommendations for the Next Generation of Researchers
Interesting Links:
Andrew Blake
Hugh Durrant-Whyte
John Leonard
Steven J. Lovegrove
Alex Mordvintsev
Prof. David Murray
Richard Newcombe
Renato Salas-Moreno
Andrew Zisserman
A visual introduction to Gaussian Belief Propagation
Github: Gaussian Belief Propagation
A Robot Web for Distributed Many-Device Localisation
In-Place Scene Labelling and Understanding with Implicit Scene Representation
Video
Video: Robotic manipulation of object using SOTA
Andrew Reacting to NeRF in 2020
Cellular automata
Neural cellular automata
Dyson Robotics
Guest Bio
Andrew Davison is a professor of Robot Vision at the Department of Computing, Imperial College London. In addition, he is the director and founder of the Dyson Robotics Laboratory. Andrew pioneered the cornerstone algorithm SLAM (Simultaneous Localisation and Mapping) and has continued to develop it in substantial ways since then. His research focuses on improving and enhancing SLAM in terms of dynamics, scale, detail level, efficiency and semantic understanding of real-time video. SLAM has evolved into a whole new domain of “Spatial AI”, leveraging neural implicit representations and a suite of cutting-edge methods to create a full, coherent representation of the real world from video.
Nov 07, 2022 | 01:02:34

The Next Frontier: Computer Vision on 3D Data - with Or Litany, Sr. Research Scientist, NVIDIA
Gil Elbaz hosts Or Litany, a senior research scientist at NVIDIA. They discuss the impact of 3D on computer vision and where it’s going in the near future. They also talk about the impact of industry on academia and vice versa. Or speaks about the future of 3D generative models, NeRF and how multi-modal models are changing computer vision. Together, Gil and Or explore the best ways to succeed in the field of AI.
TOPICS & TIMESTAMPS
[0:34] Intro
[2:01] Starting his journey
[5:03] Heat transfer equation in graphics
[10:21] Multimodal changing Computer Vision
[17:47] Why is 3D Important?
[23:17] 3D Generative Models in the next years
[26:25] Neural Rendering
[29:39] Connection between images/video & 3D
[31:39] Temporal Data
[33:45] Autonomous Driving & Simulation
[36:27] Prof Leonidas Guibas
[41:56] NeRF & Editing 3D information
[46:02] Manipulation of 3D representations
[52:23] Future of NeRF
[1:02:31] Google
[1:06:03] Meta [FAIR] experience
[1:09:57] Nvidia
[1:10:58] Sanja Fidler
[1:16:38] Consciousness
[1:21:31] Career Tips for Computer Vision Engineers
Or Litany:
LinkedIn
Google Scholar
Github
Interesting links:
Alex Bronstein
Angel Chang
Sanja Fidler
Leonidas Guibas
Judy Hoffman
Justin Johnson
Fei-Fei Li
Ameesh Makadia
Manolis Savva
Srinath Sridhar
Charles Ruizhongtai Qi
PointNet
Red-Black Tree
Nvidia
Two minute papers
The Three Body Problem
EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks
GUEST BIO
Or Litany currently works as a senior research scientist at Nvidia. He earned his BSc in physics and mathematics from the Hebrew University and his master's degree from the Technion. He then did his PhD at Tel Aviv University, where he worked on analyzing 3D data with graph neural networks under Professor Alex Bronstein. For his postdoc, Or attended Stanford University, studying under the legendary Professor Leonidas Guibas, and worked as part of FAIR, the research group of Meta, where he pushed the cutting edge of 3D data analysis. Or is an extremely accomplished researcher whose work focuses on 3D deep learning for scene understanding, point cloud analysis and shape analysis. In 2023, Or will be joining the Technion as an assistant professor.
Sep 14, 2022 | 01:24:30

Body Models Driving the Age of the Avatar – with Michael J. Black, Director, Perceiving Systems Department, Max Planck Institute for Intelligent Systems
In this episode of Unboxing AI, I host Michael J. Black from the Max Planck Institute. We speak about body models, his journeys in industry and academia, representing all human body types, and the age of the avatar. Michael discusses the early days of computer vision, his experiences commercializing body models through his startup, Body Labs, and how the metaverse and our avatars will revolutionize our everyday lives.
Episode transcript and more at UnboxingAI.show
TOPICS & TIMESTAMPS
00:39 Guest Intro
01:41 What are body models and why are they so useful?
04:17 Human interpretability - important or not?
05:32 Real use cases for body models
10:54 History of body model development leading to SMPL
19:21 Body model development beyond SMPL: MANO, FLAME, SMPL-X, and more
22:11 Edge cases: dealing with unique body shapes
24:45 Early days of computer vision
27:37 Working at Xerox PARC
30:00 Shifting to academia
31:30 The vision for Perceiving Systems at MPI-IS
34:15 Innovation and team structure at Perceiving Systems
37:40 Perceiving Systems - similarities to a startup
40:38 Founding Body Labs
45:30 Body Labs' Acquisition by Amazon
47:24 Distinguished Amazon Scholar role
49:03 About Meshcapade
50:05 What is the metaverse?
50:56 The age of the avatar
56:32 Career Tips for Computer Vision Engineers
LINKS AND RESOURCES
Michael J. Black @ MPI-IS
LinkedIn
Google Scholar
Twitter
YouTube
Papers at CVPR 2022
BEV
OSSO
EMOCA
Body Models
SMPL
FLAME
MANO
SMPL-X
STAR
SCAPE
About Meshcapade
Website
GitHub
Instagram
About Perceiving Systems
Overview Video
Website
GUEST BIO
Our guest is Michael J. Black, one of the founding directors of the Max Planck Institute for Intelligent Systems in Tübingen, Germany. He completed his PhD in computer science at Yale University, his postdoc at the University of Toronto, and has co-authored over 200 peer-reviewed papers to date. His research focuses on understanding humans and their behavior in video, working at the boundary of computer vision, machine learning, and computer graphics. His work on realistic 3D human body models such as SMPL has been widely used in both academia and industry, and in 2017, the start-up he co-founded to commercialize these technologies was acquired by Amazon. Today, Michael and his teams at MPI are developing exciting new capabilities in computer vision that will be important for the future of 3D avatars, the metaverse and beyond.
Jul 14, 2022 | 59:28

Solving Autonomous Driving At Scale – With Vijay Badrinarayanan, VP of AI, Wayve
In this episode of Unboxing AI, meet Vijay Badrinarayanan, the VP of AI at Wayve, and learn about Wayve’s end-to-end machine learning approach to self-driving. Along the way, Vijay shares what it was like working for Magic Leap in the early days, and relates the research journey that led to SegNet.
TOPICS & TIMESTAMPS
00:47 Guest Intro
02:38 Academia & Classic Computer Vision
08:56 PostDoc @ Cambridge - Road scene segmentation
18:42 Technical Challenges Faced During Early Deep Computer Vision
20:24 Meeting Alex Kendall; SegNet
25:15 Transition from Academia to Production Computer Vision at Magic Leap
27:09 Deep Eye-Gaze Estimation at Magic Leap
33:21 Joining Wayve
36:09 AV 1.0: First-gen autonomy
40:08 On Tesla LIDARs and their unique approach to AV
46:37 Wayve's AV 2.0 Approach
48:42 Programming By Data / Data-as-Code
51:02 Addressing the Long Tail Problem in AV
53:13 Powering AV 2.0 with Simulation
58:30 Re-simulation, Closing the Loop & Testing Neural Networks
1:01:44 The Future of AI and Advanced Approaches
1:11:50 Are there other 2.0s? Next industries to revolutionize
1:13:48 Next Steps for Wayve
1:14:59 Human-level AI
1:16:35 Career Tips for Computer Vision Engineers
LINKS AND RESOURCES
- On The Guest - Vijay Badrinarayanan
LinkedIn: https://www.linkedin.com/in/vijay-badrinarayanan-6578692/
Twitter: https://twitter.com/vijaycivs
Google Scholar: https://scholar.google.com/citations?user=WuJckpkAAAAJ
- About Wayve
https://wayve.ai/
https://sifted.eu/articles/wayve-autonomous-driving/
AV 2.0 Technical Thesis - Reimagining an autonomous vehicle: https://arxiv.org/abs/2108.05805
- SegNet
Vijay & Alex Kendall together with Roberto Cipolla release a revolutionary paper on segmentation with a novel and practical deep fully convolutional NN architecture for semantic pixel-wise segmentation
https://ieeexplore.ieee.org/abstract/document/7803544/
- Good NeRF explainer here:
https://datagen.tech/guides/synthetic-data/neural-radiance-field-nerf/
- DALL-E 2
https://openai.com/dall-e-2/
- StyleGAN2
https://github.com/NVlabs/stylegan2
GUEST BIO
Vijay Badrinarayanan is VP of AI at Wayve, a company pioneering AI technology to enable autonomous vehicles to drive in complex urban environments. He has been at the forefront of deep learning and artificial intelligence (AI) research and product development since the inception of the new era of deep-learning-driven AI. His joint research work in semantic segmentation, conducted at Cambridge University along with Alex Kendall, CEO of Wayve, is among the highly cited publications in deep learning. As Director of Deep Learning and Artificial Intelligence (AI) at Magic Leap Inc., California, he led R&D teams to deliver impactful, first-of-its-kind deep neural network driven products for power-constrained Mixed Reality headset applications. As VP of AI, Vijay aims to deepen Wayve’s investment in deep learning to develop the end-to-end learnt brains behind Wayve’s self-driving technology. He is building a vision and learning team in Mountain View, CA, focusing on actively researched AI topics such as representation learning, simulation intelligence and combined vision and language models, with a view towards making meaningful product impact and bringing this cutting-edge approach to AVs to market.
FULL TRANSCRIPT AND MORE AT:
https://unboxingai.show/
May 23, 2022 | 01:19:04

Saving Lives with Deep Learning & Robust Infrastructure - with Idan Bassuk, VP A.I., Aidoc
In this episode of Unboxing AI, I host Idan Bassuk, the VP A.I. at Aidoc, to chat about computer vision, NLP, and AI explainability in the medical field. During the episode, Idan discusses the challenges that should interest anyone who’s building an AI team for scale. For example: what type of roles does he have on his team? Is it engineering first or data first? Does the real world of production resemble the rich academic research in the medical space?
TOPICS & TIMESTAMPS
1:52 - Going from terror tunnel detection to tumor detection
4:00 - Aidoc’s medical imaging AI product in a nutshell
9:09 - Main challenges for Aidoc
10:50 - Data variability in the medical field
14:18 - Explainability in medical AI
17:02 - Incorporating SoTA
19:15 - The state of academia
21:30 - Data-centric AI
23:49 - AI org structure at Aidoc
27:08 - Building test sets
29:19 - Using NLP to accelerate algorithm development
31:18 - Finding the right data
35:53 - How choosing the right annotation method affects accuracy
40:43 - Bringing it to production - team coordination
45:11 - The importance of clean code and code review
51:43 - Hiring with extreme transparency
54:46 - On the role of AI software engineers
57:08 - CI/CD and reproducibility
58:32 - Working agile in AI
1:01:38 - Planning with uncertainty
LINKS AND RESOURCES
Idan Bassuk LI: linkedin.com/in/idanbassuk
Aidoc: aidoc.com
Stanford cs231n course, mentioned @56:46:
Course Lectures: bit.ly/cs231n-Karpathy
Course Homepage: cs231n.stanford.edu
GUEST BIO
Idan was Aidoc's first employee. He started as an AI Algorithm and Software Engineer. Today he leads the A.I. Group, a 90-person group that concentrates all the efforts required for A.I., from dataset development, through algorithm research and engineering, up to deployment and continuous monitoring in production at a scale of over 500 medical centers worldwide. Before joining Aidoc, Idan served for 10 years in the Israel Defense Forces. He started in the elite technological Talpiot course and later served as a team leader in a special operations unit. Idan finished his service as Head of the Technological Section, leading defensive technological projects that were awarded Israel’s most prestigious defense award for their success in detecting tunnels crossing the border into Israel.
FULL TRANSCRIPT AND MORE AT UNBOXINGAI.SHOW:
https://unboxingai.show/podcast-item/saving-lives-with-deep-learning-robust-infrustructure-idan-bassuk-aidoc/
Apr 25, 2022 | 01:06:57

The Pace of Progress in Academia and Industry - with Prof. Lihi Zelnik-Manor, Faculty of Electrical and Computer Engineering, Technion
Listen in as Prof. Lihi Zelnik-Manor shares insights from a career that bridges academia and industry. Lihi walks us through everything from the history of the computer vision development community to the interplay between industry and academia today.
Lihi Zelnik-Manor is a Professor in the Faculty of Electrical and Computer Engineering at the Technion and former General Manager of Alibaba DAMO Israel Lab. Prof. Zelnik-Manor holds a PhD and MSc (with honors) in Computer Science from the Weizmann Institute of Science and a BSc (summa cum laude) in Mechanical Engineering from the Technion. Her main area of expertise is Computer Vision. Prof. Zelnik-Manor has contributed extensively to the community, serving as General Chair of CVPR 2021, Program Chair of CVPR 2016, Associate Editor at TPAMI, and multiple times as Area Chair at CVPR and ECCV, and she was on the award committee of ACCV'18 and CVPR'19. Looking forward, she will serve as General Chair of ECCV'22 and as Program Chair of ICCV'25.
Mar 28, 2022 | 01:04:19

Sorted and Sifted Machine Learning - with Anthony Goldbloom, Kaggle
Are you up to solving a machine learning problem? If so, start on Kaggle. Anthony Goldbloom, co-founder and CEO of Kaggle, joins us to talk about what it takes to found a machine learning community. We answer the question of what happens when you cross a domain expert with a pragmatic problem solver. Anthony talks about the future of AI and computer vision, how important it is to learn through doing and what he is looking forward to in the next year.
Mar 03, 2022 | 38:35