Qualcomm Technologies, Inc.
July 19, 2022
@QCOMResearch
3D perception:
Cutting-edge AI research
to understand the 3D world
3
Today’s agenda
• Advantages of 3D perception over 2D
• The need for 3D perception
across applications
• Advancements in 3D perception
by Qualcomm AI Research
• Future 3D perception
research directions
• Questions?
4
We perceive
the world in 3D
3D perception offers
many benefits and new
capabilities over 2D
• 3D structure is more
reliable than a 2D image
for perception
• 3D provides strong cues for object and scene recognition
• 3D allows accurate size,
pose, and motion estimation
• 3D is needed for rendering with light and RF signals
5
3D data
acquisition
Multiple methods
offering different
benefits
Active sensing
Illuminating the scene and reconstructing 3D from the signals reflected back
Acquisition methods
Light: Time-of-flight, structured light, LiDAR
RF: Radar, SAR, THz imaging
Other: CT scan, MRI, ultrasound, …
Benefits: works in the dark and in varied conditions

Passive sensing
Determining 3D from signals emitted by an uncontrolled illuminator
Computational methods
Geometry: Traditional CV using multiple cameras (DFS, MVS)
Inference: Via shadow, blur, and motion-parallax cues
Regression: From appearance to 3D (AI driven)
Benefits: lower cost and lower power

DFS: Depth From Stereo; MVS: Multi-View Stereo; SAR: Synthetic-Aperture Radar
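As a concrete example of the geometry-based route (DFS), depth follows from disparity by simple triangulation. The sketch below uses hypothetical rig parameters (focal length, baseline) purely for illustration:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Classic stereo triangulation: depth = focal * baseline / disparity.
    disparity_px: horizontal pixel shift of a point between the two views;
    focal_px: focal length in pixels; baseline_m: camera separation in meters."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length and a 12 cm baseline,
# so a 21 px disparity corresponds to 4 m of depth.
depth = depth_from_disparity(21, 700, 0.12)
```

Note the inverse relation: as disparity shrinks toward zero for distant points, depth error grows, which is one reason AI-driven regression complements pure geometry.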
6
XR
Camera
Autonomous vehicles
IoT
Mobile
3D perception enables diverse applications
that make our lives better
7
3D perception greatly facilitates
immersive XR
6-DoF: 6 degrees-of-freedom
8
3D perception empowers
autonomous driving
3D map reconstruction
Vehicle positioning
Finding navigable road
surfaces and avoiding
obstacles
Detecting and estimating
trajectories of vehicles,
pedestrians, and other
objects for path planning
Configurable set of calibrated camera, radar, and LiDAR sensors
(Diagram labels: long-range, mid-range, side, and rear-side cameras; long- and short-range radars; C-V2X)
9
3D perception is important for
robotics
Situational awareness
Scene and human interaction
Object, scene, and motion recognition
Motion planning and navigation
Obstacle avoidance
Delivery and factory automation
Pick-and-place and assembly
Grabbing and placing objects
Putting things together
10
3D perception vastly improves
computational photography and cinematography
Image quality: denoising, deblurring, relighting, HDR, etc.
Filtering effects: bokeh, depth-of-field, etc.
Content editing: seamless object removal, beautification, etc.
11
3D perception is valuable in many more areas
Access control and
surveillance
E-commerce, virtual
try-out, and virtual
walkthrough
Inspection and
asset control
Inventory, port, mining,
and construction
management
Medical
and biology
12
3D perception is crucial for understanding the 3D world

Object detection
Finding positions and regions of individual objects
Scene understanding
Decomposing a scene into its 3D and physical components
Pose estimation
Finding orientation and key-points of objects
Depth estimation & 3DR
Creating 3D models of scenes and objects from 2D images
Neuro SLAM
AI for simultaneous localization & mapping
3D scene in RF
Scene understanding using RF signals
Neural radiance fields
Networks to synthesize virtual views
3D imitation learning
Learning 3D robotics tasks from human videos

3DR: 3D Reconstruction
13
Leading 3D perception research by Qualcomm AI Research
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.

Novel AI techniques for 3D perception
• Depth estimation & 3DR: supervised and self-supervised learning for mono & stereo with transformers
• Pose estimation: efficient networks that can interpret 3D human body pose and hand pose from 2D images
• Object detection: efficient neural architectures that leverage sparsity and polar spaces
• Scene understanding: end-to-end trained pipeline for room-layout, surface normal, albedo, material, object, and lighting estimation

Full-stack AI optimizations to enable real-world deployments; an energy-efficient platform to make 3D perception ubiquitous
• World’s first real-time monocular depth estimation on a phone that can create 3D from a single image
• Top-accuracy detection of vehicles, pedestrians, and traffic signs on LiDAR 3D point clouds
• Computationally scalable architecture that iteratively improves key-point detection with less than 5 mm error
• World’s first transformer-based solution for indoor scenes that enables photorealistic object insertion
14
3D perception challenges

Data challenges
• Image pixels are arranged on a uniform grid, while a 3D point cloud faces an accessibility-vs-memory trade-off
• Sparse vs. volumetric nature of 3D point clouds
• Incompleteness in 3D acquisition
• Availability of high-quality 3D video datasets

Implementation challenges
• HW/SW platform (memory, SDKs, tools)
• Manipulation and viewpoint management
• Computational load (training/inference)
15
Enabling AI-based
self-supervised
depth estimation
from a single image
1: X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation, BMVC 2021
2: InverseForm: A Loss Function for Structured Boundary-Aware Segmentation, CVPR 2021
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
Self-supervised training pipeline of X-Distill (diagram): a depth network predicts depth D_t for the target frame I_t, and a 6DoF network estimates the pose T_t→s to a source frame I_s; view synthesis produces Î_t, driving a photometric loss. A smoothness loss on D_t provides geometric guidance, and a pretrained segmentation network together with a depth-to-segmentation network provides semantic guidance through a segmentation loss. The test pipeline uses only the depth network.
No need for annotated data
• Self-supervised learning from
unlabeled monocular videos
• Utilizes geometric relationship
across video frames
Builds on semantics1
• Significantly improves accuracy by
using semantic segmentation2
Works on any neural network
architecture
• Modifies only training and requires
no additional inference computation
Enables automatic
domain adaptation
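To make the training losses concrete, here is a minimal NumPy sketch of a Monodepth-style edge-aware smoothness term of the kind such self-supervised pipelines use. This is an illustrative stand-in, not the exact X-Distill implementation:

```python
import numpy as np

def edge_aware_smoothness(depth, image):
    """Monodepth-style smoothness term: penalize depth gradients, down-weighted
    where the image itself has strong gradients (true depth edges usually
    coincide with image edges). Illustrative NumPy stand-in, not the exact
    X-Distill loss. depth: (H, W); image: (H, W, 3)."""
    dz_dx = np.abs(np.diff(depth, axis=1))
    dz_dy = np.abs(np.diff(depth, axis=0))
    di_dx = np.mean(np.abs(np.diff(image, axis=1)), axis=-1)
    di_dy = np.mean(np.abs(np.diff(image, axis=0)), axis=-1)
    return (dz_dx * np.exp(-di_dx)).mean() + (dz_dy * np.exp(-di_dy)).mean()
```

The exponential weighting is the key design choice: depth discontinuities are only penalized where the image is flat, so object boundaries stay sharp.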
16
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
Achieving SOTA accuracy with 90% less computation
(Qualitative comparison: PackNet, HR-Depth, Ours)
17
Improved on-device efficiency
with neural architecture search
(From a pre-trained model to a set of Pareto-optimal network architectures)
Diverse search
Supports diverse spaces with different
neural operations to find the best models
Scalable
Scales to many HW devices
at minimal cost
Low cost
Low start-up cost of 1000-4000 epochs,
equivalent to training 2-10 networks
from scratch
Reliable
Uses direct HW measurements
instead of a potentially
inaccurate HW model
Enabling real-time on-device use cases
Distilling Optimal Neural
Architectures (DONNA)1
Hardware aware
20-40% faster models
1: Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces, ICCV 2021
Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
(Pipeline stages: search space, accuracy-prediction training, HW-aware search)
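The output of such a search is a set of Pareto-optimal architectures. A toy sketch of that final selection step, with hypothetical (name, accuracy, latency) candidates:

```python
def pareto_front(candidates):
    """Keep architectures that are not dominated in (accuracy up, latency down).
    `candidates` holds hypothetical (name, accuracy, latency_ms) measurements;
    this is only the final selection step, not the search itself."""
    front = []
    for name, acc, lat in candidates:
        dominated = any(
            a2 >= acc and l2 <= lat and (a2 > acc or l2 < lat)
            for _, a2, l2 in candidates
        )
        if not dominated:
            front.append((name, acc, lat))
    return front

# "B" is both less accurate and slower than "A", so it drops off the front.
nets = [("A", 0.75, 20.0), ("B", 0.73, 25.0), ("C", 0.78, 30.0)]
front = pareto_front(nets)
```

Each point on the resulting front is a different accuracy/latency trade-off, which is what lets one search serve many devices.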
18
1: X-Distill/DONNA demo at NeurIPS 2021
AIMET is a product of Qualcomm Innovation Center, Inc. Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
Running monocular depth estimation in real time
on Snapdragon mobile platform
Model Parameters (M) FPS Error
Monodepth2 (RN50) Quantized 32.0 23 0.83
Our X-Distill (RN50) Quantized 32.0 23 0.71
Our X-Distill DONNA Quantized1 3.7 35 0.75
~10X reduction
in parameters
More efficient
model
More accurate
model
• Quantization through AI Model Efficiency Toolkit (AIMET)
• 30+ FPS by using more efficient backbone optimized via
DONNA neural architecture search
19
Novel transformer architecture
leverages spatial self-attention
for depth network
• Smaller model that runs in real time
• Improved 3D recall in reconstructed meshes
During training:
• Self-supervised learning for fine-tuning
transformers to improve temporal
consistency and eliminate texture copy
• Sparse depth from 6DoF to resolve
global scale issues
Enhancing the model
with a new efficient
transformer architecture
(Diagram: per-frame depth-network outputs are fused into a unified mesh; the depth network is an encoder-decoder with position embedding, multi-head attention, and affinity/context features.)
6DoF: 6 degrees-of-freedom
20
Qualcomm Hexagon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
26x smaller model fits on a phone processor and runs in real time on the Qualcomm® Hexagon™ Processor
Model Parameters (M) 3DR recall
Monodepth2 (RN34) 15.0 0.653
DPT (MIDAS, transformer based) 123.0 0.926
Ours (transformer based) 4.7 0.930
Image from XR camera Monodepth2 (RN34) model Our transformer-based model
Our transformer-based model is
both smaller and more accurate
21
Our transformer-based model achieves
better visual quality than current SOTA
(Qualitative comparison: Time-of-flight, Monodepth2, NeuralRecon, Ours)
22
Similar to human visual perception
Highly accurate disparity estimation
model using two images (left/right)
• Fits into memory by just-in-time computation
• Geometry is much sharper and more detailed
• Greater generalizability
Models leverage Snapdragon’s heterogeneous computing
• Real-time performance
• Subpixel disparity accuracy
Extends to motion parallax
From single image to
stereo depth estimation
for increased accuracy
(Diagram: two encoders and a context branch feed an RNN that performs look-ups into just-in-time cost volumes.)
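The idea behind disparity estimation can be sketched with a brute-force per-scanline matcher. The learned model replaces raw-pixel matching with features and an RNN, and the just-in-time trick computes matching costs on demand rather than materializing the full cost volume. Toy example with hypothetical pixel values:

```python
import numpy as np

def best_disparity(left_row, right_row, max_disp):
    """Brute-force matching along one scanline: for each left pixel x, pick the
    disparity d whose right-image pixel (x - d) best matches it. A just-in-time
    cost volume computes such costs on demand instead of storing the full
    H x W x max_disp tensor; real models match learned features, not raw pixels."""
    width = len(left_row)
    disp = np.zeros(width, dtype=int)
    for x in range(width):
        costs = [abs(int(left_row[x]) - int(right_row[x - d])) if x - d >= 0 else 10**9
                 for d in range(max_disp + 1)]
        disp[x] = int(np.argmin(costs))
    return disp

# Hypothetical 4-pixel scanline whose true disparity is 1
# (each left pixel appears one position to the left in the right image).
disp = best_disparity(np.array([10, 20, 30, 40]), np.array([20, 30, 40, 50]), 2)
```

Combined with the triangulation relation from the acquisition section, the recovered disparities convert directly to depth.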
23
JiT: Just-in-time
SOTA: RAFT-Stereo
Our JiT stereo depth estimation model runs in real time on the Hexagon Processor,
20x faster than SOTA with similar accuracy
Left image Right image
24
Fast Polar Attentive 3D Object Detection on LiDAR Point Clouds, NeurIPS MLAD workshop, 2021
Enabling efficient object detection in 3D point clouds
(Diagram labels: polar features, patch embedding, positional encoding, queries, encoder input, transformer, backbone)
A transformer-based architecture
• Leverages 2D pseudo-image features
extracted in the polar space
• Reduces latency and memory usage
without sacrificing detection accuracy
• Sectors the data for stream-wise processing, making predictions without requiring a complete 360° LiDAR scan
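A toy sketch of the polar-space idea: mapping each LiDAR return into a (sector, ring) cell, which is what makes sector-by-sector streaming possible. Illustrative only; the grid sizes are hypothetical and the paper's polar pseudo-image features are learned:

```python
import math

def polar_bin(x, y, num_sectors=8, num_rings=4, max_range=50.0):
    """Map a LiDAR return at (x, y) to a (sector, ring) cell of a polar grid.
    Processing sector by sector lets a detector start predicting as the sweep
    streams in, without waiting for the full 360° scan. Grid sizes here are
    hypothetical; the paper's polar pseudo-image features are learned."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    sector = min(int(theta / (2 * math.pi / num_sectors)), num_sectors - 1)
    ring = min(int(r / (max_range / num_rings)), num_rings - 1)
    return sector, ring

# A point straight ahead lands in sector 0; one at 90° to the left in sector 2.
cells = polar_bin(10.0, 0.0), polar_bin(0.0, 10.0)
```

Polar cells also match the sensor's native geometry: LiDAR points thin out with range, so rings naturally equalize point density better than a uniform Cartesian grid.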
25
Smaller, faster, lower power model that achieves the top precision
Model Parameters (M) GFLOPs Inference time (ms) Accuracy (AP)
PointRCNN 2.20 25 620 90.34
PV-RCNN 13.10 69 80 92.24
ComplexYolo 65.50 31 19 75.32
PointPillars 1.43 32 16 88.36
Ours 0.59 6 14 94.70
26
Dynamic iterative refinement for efficient 3D hand pose estimation, WACV 2022
Dynamic
refinements
to reduce size
and latency
for hand pose
estimation
(Diagram labels: backbone, pose predictor with FC heads, HAND model, 2D and 3D pose, iterative refiner with attention and variance maps; qualitative comparison of ground truth vs. ours)
Lightweight architecture: applied recursively while incorporating attention and gating for dynamic refinement
Eliminates the need for
precise hand detection
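The refinement loop can be sketched generically: a shared refiner is applied repeatedly and stopped early once its correction becomes small, which is what makes the architecture computationally scalable. Toy 1-D example with a hypothetical correction function:

```python
def iterative_refine(initial, correction_fn, max_iters=20, tol=1e-3):
    """Anytime refinement: apply a shared refiner repeatedly and stop early once
    its correction becomes small, trading compute for accuracy the way a
    dynamically gated refiner does. Toy 1-D version of the idea."""
    est = initial
    for i in range(max_iters):
        delta = correction_fn(est)
        est = est + delta
        if abs(delta) < tol:
            return est, i + 1
    return est, max_iters

# Hypothetical refiner that pulls the estimate halfway toward a target of 1.0;
# corrections halve each step, so the loop stops early instead of running all 20.
est, steps = iterative_refine(0.0, lambda e: 0.5 * (1.0 - e))
```

Because each pass reuses the same small refiner, model size stays fixed while latency adapts to how hard the input is.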
27
Methods AUC (20-50) GFLOPs #Params
Z&B 0.948 78.2 -
Liu et al. 0.964 16.0 -
HAMR 0.982 8.0 -
Cai et al. 0.995 6.2 4.53M
Fan et al. 0.996 1.6 4.76M
Ours 0.997 1.3 1.68M
Methods AUC (0-50) GFLOPs #Params
Tekin et al. 0.653 13.62 14.31M
Fan et al. 0.731 1.37 5.76M
Ours 0.768 0.28 0.46M
Significant reduction in GFLOPs
while achieving better accuracy
AUC: Area under curve
Our method also achieves the best average 3D error
9.76mm (SOTA) → 7.24mm (Ours)
Heatmaps for hand landmark
points from our model
28
IRISformer: Dense Vision Transformers for Single-Image
Inverse Rendering in Indoor Scenes, CVPR 2022 – in
collaboration with Professor Manmohan Chandraker
World’s first
transformer-based
inverse rendering
for scene
understanding
(Architecture diagram: BRDFGeoNet, a dense multi-head vision transformer whose encoder-decoder tokens feed heads for albedo, roughness, depth, and normals; its outputs go to LightNet, an encoder-decoder stage that estimates lighting for a renderer.)
Estimates physically-based
scene attributes from
an indoor image
• End-to-end trained pipeline for
room-layout, surface normal, albedo,
material, object, and lighting estimation
• Leads to better handling of global
interactions between scene components,
achieving better disambiguation of
shape, material, and lighting
• SOTA results on all 3D perception tasks
and enables high-quality AR applications
such as object insertion
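Inverse rendering inverts a forward image-formation model. Below is a minimal Lambertian sketch of such a model; a single directional light is a deliberate simplification, since IRISformer estimates full spatially-varying lighting and materials:

```python
import numpy as np

def lambertian_render(albedo, normals, light_dir, light_color):
    """Forward model that inverse rendering inverts: pixel = albedo *
    light_color * max(0, n . l). A single directional light is a deliberate
    simplification; IRISformer predicts spatially-varying lighting and BRDFs.
    albedo, normals: (H, W, 3); light_dir (unit vector), light_color: (3,)."""
    ndotl = np.clip(normals @ light_dir, 0.0, None)     # (H, W)
    return albedo * light_color * ndotl[..., None]      # (H, W, 3)

# One gray pixel facing a head-on white light keeps exactly its albedo (0.5).
img = lambertian_render(np.full((1, 1, 3), 0.5),
                        np.array([[[0.0, 0.0, 1.0]]]),
                        np.array([0.0, 0.0, 1.0]),
                        np.ones(3))
```

The ambiguity is visible even here: scaling albedo down and light_color up leaves the image unchanged, which is exactly the shape/material/lighting disambiguation the transformer's global attention helps with.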
29
Our model finds more accurate scene attributes compared to SOTA
Small insets are refined estimations obtained by further post-processing with bilateral solvers (BS).
30
Focuses attention on:
1. Chairs and window
2. Highlighted regions over the image
3. Entire floor
4. The chair itself
Focuses attention on:
1. Neighboring shadowed areas of the wall
2. The entire wall
3. Potential light sources and occluders
4. Ambient environment
Attention Maps (Albedo)
Input images
Transformer algorithm automatically learns attention maps to
determine the important areas of the image
31
Our method correctly estimates lighting to realistically insert objects
(Comparison: Barron et al. 2013, Gardner et al. 2017, Li et al. 2021, Ours)
32
33
Coming soon
3D
perception
innovations
Neuro-SLAM
• Greatly facilitates XR, autonomous vehicles, and robotic vision
• Our goal is to create 3D maps in real time on the device
3D imitation learning
• Learning complex dexterous skills in 3D spaces from human videos
• Our goal is to enable such learning in real time
3D scene understanding in RF (Wi-Fi/5G)
• Our goal is to achieve floorplan estimation, human detection,
tracking, and pose using only RF signals
Neural radiance fields (NeRF)
• Single object/scene → single model: virtual images from any viewpoint
• Our goal is to run NeRF in real time on mobile platforms
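NeRF renders a virtual view by alpha-compositing samples along each camera ray. A minimal sketch of that compositing step (the networks that predict density and color per sample are omitted):

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """NeRF-style alpha compositing along one ray:
    alpha_i = 1 - exp(-sigma_i * delta_i);  T_i = prod_{j<i}(1 - alpha_j);
    C = sum_i T_i * alpha_i * c_i. densities, deltas: (N,); colors: (N, 3)."""
    alphas = 1.0 - np.exp(-densities * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# A nearly opaque red sample hides the green one behind it.
color = composite_ray(np.array([50.0, 50.0]),
                      np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
                      np.array([1.0, 1.0]))
```

Running this per ray, with many samples per ray and a network query per sample, is what makes naive NeRF expensive and real-time mobile rendering a research goal.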
34
We are conducting
leading research to
enable 3D perception
Thanks to our full-stack
AI research, we are first to
demonstrate 3D perception
proof-of-concepts on
edge devices
We are solving system
and feasibility challenges
to move from research
to commercialization
35
www.qualcomm.com/research/
artificial-intelligence
@QCOMResearch
www.qualcomm.com/news/onq
www.youtube.com/c/QualcommResearch
www.slideshare.net/qualcommwirelessevolution
Connect with us
Questions
Follow us on:
For more information, visit us at:
qualcomm.com & qualcomm.com/blog
Thank you
Nothing in these materials is an offer to sell any of the components
or devices referenced herein.
©2018-2022 Qualcomm Technologies, Inc. and/or its affiliated
companies. All Rights Reserved.
Qualcomm, Snapdragon, and Hexagon are trademarks or registered
trademarks of Qualcomm Incorporated. Other products and brand names
may be trademarks or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated,
Qualcomm Technologies, Inc., and/or other subsidiaries or business units within
the Qualcomm corporate structure, as applicable. Qualcomm Incorporated
includes our licensing business, QTL, and the vast majority of our patent portfolio.
Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates,
along with its subsidiaries, substantially all of our engineering, research and
development functions, and substantially all of our products and services businesses,
including our QCT semiconductor business.

Mais conteúdo relacionado

Mais procurados

Introduction to image processing
Introduction to image processingIntroduction to image processing
Introduction to image processingThomas K T
 
An Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceAn Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceWasif Altaf
 
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...Hyeongmin Lee
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationVikas Jain
 
Sensors On 3d Digitization
Sensors On 3d DigitizationSensors On 3d Digitization
Sensors On 3d DigitizationRajan Kumar
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM SeminarDong-Won Shin
 
Augmented reality
Augmented reality Augmented reality
Augmented reality pneumonia
 
Research Directions in Transitional Interfaces
Research Directions in Transitional InterfacesResearch Directions in Transitional Interfaces
Research Directions in Transitional InterfacesMark Billinghurst
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIYu Huang
 
Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIYu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
State of mago3D, An Open Source Based Digital Twin Platform
State of mago3D, An Open Source Based Digital Twin PlatformState of mago3D, An Open Source Based Digital Twin Platform
State of mago3D, An Open Source Based Digital Twin PlatformSANGHEE SHIN
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving IYu Huang
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...Edge AI and Vision Alliance
 

Mais procurados (20)

Introduction to image processing
Introduction to image processingIntroduction to image processing
Introduction to image processing
 
An Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial IntelligenceAn Introduction to Image Processing and Artificial Intelligence
An Introduction to Image Processing and Artificial Intelligence
 
Imax technology
Imax technology Imax technology
Imax technology
 
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
PR-386: Light Field Networks: Neural Scene Representations with Single-Evalua...
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Sensors On 3d Digitization
Sensors On 3d DigitizationSensors On 3d Digitization
Sensors On 3d Digitization
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
Introductory Level of SLAM Seminar
Introductory Level of SLAM SeminarIntroductory Level of SLAM Seminar
Introductory Level of SLAM Seminar
 
Screenless display
Screenless displayScreenless display
Screenless display
 
Presentation of Visual Tracking
Presentation of Visual TrackingPresentation of Visual Tracking
Presentation of Visual Tracking
 
Augmented reality
Augmented reality Augmented reality
Augmented reality
 
Research Directions in Transitional Interfaces
Research Directions in Transitional InterfacesResearch Directions in Transitional Interfaces
Research Directions in Transitional Interfaces
 
fusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving IIfusion of Camera and lidar for autonomous driving II
fusion of Camera and lidar for autonomous driving II
 
Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors II
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
State of mago3D, An Open Source Based Digital Twin Platform
State of mago3D, An Open Source Based Digital Twin PlatformState of mago3D, An Open Source Based Digital Twin Platform
State of mago3D, An Open Source Based Digital Twin Platform
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
fusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving Ifusion of Camera and lidar for autonomous driving I
fusion of Camera and lidar for autonomous driving I
 
Rasterization
RasterizationRasterization
Rasterization
 
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
“Introduction to Simultaneous Localization and Mapping (SLAM),” a Presentatio...
 

Semelhante a Understanding the world in 3D with AI.pdf

"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...Edge AI and Vision Alliance
 
Emerging vision technologies
Emerging vision technologiesEmerging vision technologies
Emerging vision technologiesQualcomm Research
 
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...IRJET Journal
 
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERASDISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAScscpconf
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IVYu Huang
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor PositioningA Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor PositioningGuillaume Gales
 
3D Laser Scanning – All You Should Know About
3D Laser Scanning – All You Should Know About3D Laser Scanning – All You Should Know About
3D Laser Scanning – All You Should Know AboutKevinSmith511452
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPrathamesh Joshi
 
Indoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasIndoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasijma
 
Indoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras
Indoor 3D Video Monitoring Using Multiple Kinect Depth-CamerasIndoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras
Indoor 3D Video Monitoring Using Multiple Kinect Depth-Camerasijma
 
Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Lidar for Autonomous Driving II (via Deep Learning)
Lidar for Autonomous Driving II (via Deep Learning)Yu Huang
 
MEMS Laser Scanning, the platform for next generation of 3D Depth Sensors
MEMS Laser Scanning, the platform for next generation of 3D Depth SensorsMEMS Laser Scanning, the platform for next generation of 3D Depth Sensors
MEMS Laser Scanning, the platform for next generation of 3D Depth SensorsJari Honkanen
 
iMinds insights - 3D Visualization Technologies
iMinds insights - 3D Visualization TechnologiesiMinds insights - 3D Visualization Technologies
iMinds insights - 3D Visualization TechnologiesiMindsinsights
 
Enhanced real time semantic segmentation
Enhanced real time semantic segmentationEnhanced real time semantic segmentation
Enhanced real time semantic segmentationAkankshaRawat42
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsPetteriTeikariPhD
 
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Kni...AugmentedWorldExpo
 
Ieee 2015 16 matlab @dreamweb techno solutions-trichy
Ieee 2015 16 matlab  @dreamweb techno solutions-trichyIeee 2015 16 matlab  @dreamweb techno solutions-trichy
Ieee 2015 16 matlab @dreamweb techno solutions-trichysubhu8430
 

Semelhante a Understanding the world in 3D with AI.pdf (20)

"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
 
Emerging vision technologies
Emerging vision technologiesEmerging vision technologies
Emerging vision technologies
 
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...
Inspection of Suspicious Human Activity in the Crowd Sourced Areas Captured i...
 
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERASDISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS
 
EN_3DLS_Oil_Gas_Offshore
EN_3DLS_Oil_Gas_OffshoreEN_3DLS_Oil_Gas_Offshore
EN_3DLS_Oil_Gas_Offshore
 
3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV3-d interpretation from single 2-d image IV
3-d interpretation from single 2-d image IV
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor PositioningA Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
 
3D Laser Scanning – All You Should Know About
3D Laser Scanning – All You Should Know About3D Laser Scanning – All You Should Know About
3D Laser Scanning – All You Should Know About
 
Presentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking ProjectPresentation Object Recognition And Tracking Project
Presentation Object Recognition And Tracking Project
 
Indoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth camerasIndoor 3 d video monitoring using multiple kinect depth cameras
Indoor 3 d video monitoring using multiple kinect depth cameras
 
Indoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras
Indoor 3D Video Monitoring Using Multiple Kinect Depth-CamerasIndoor 3D Video Monitoring Using Multiple Kinect Depth-Cameras
Understanding the world in 3D with AI.pdf

Configurable set of calibrated camera, radar, and LiDAR sensors; C-V2X
9
3D perception is important for robotics
Situational awareness
• Scene and human interaction
• Object, scene, and motion recognition
Motion planning and navigation
• Obstacle avoidance
• Delivery and factory automation
Pick-and-place and assembly
• Grabbing and placing objects
• Putting things together
10
3D perception vastly improves computational photography and cinematography
• Image quality: denoising, deblurring, relighting, HDR, etc.
• Filtering effects: bokeh, depth-of-field, etc.
• Content editing: seamless object removal, beautification, etc.
11
3D perception is highly valuable in many more areas
• Access control and surveillance
• E-commerce, virtual try-out, and virtual walkthrough
• Inspection and asset control
• Inventory, port, mining, and construction management
• Medical and biology
12
3D perception is crucial for understanding the 3D world
• Object detection: finding positions and regions of individual objects
• Scene understanding: decomposing a scene into its 3D and physical components
• Pose estimation: finding orientation and key-points of objects
• Depth estimation & 3DR: creating 3D models of scenes and objects from 2D images
• Neuro SLAM: AI for simultaneous localization & mapping
• 3D scene in RF: scene understanding using RF signals
• Neural radiance fields: networks to synthesize virtual views
• 3D imitation learning: learning 3D robotics tasks from human videos
3DR: 3D Reconstruction
13
Leading 3D perception research by Qualcomm AI Research
• Novel AI techniques for 3D perception
• Full-stack AI optimizations to enable real-world deployments
• Energy-efficient platform to make 3D perception ubiquitous
Depth estimation & 3DR: supervised and self-supervised learning for mono and stereo with transformers; world's first real-time monocular depth estimation on a phone that can create 3D from a single image
Object detection: efficient neural architectures that leverage sparsity and polar spaces; top accuracy detecting vehicles, pedestrians, and traffic signs on LiDAR 3D point clouds
Pose estimation: efficient networks that can interpret 3D human body pose and hand pose from 2D images; computationally scalable architecture that iteratively improves key-point detection with less than 5 mm error
Scene understanding: end-to-end trained pipeline for room layout, surface normal, albedo, material, object, and lighting estimation; world's first transformer-based solution for indoor scenes that enables photorealistic object insertion
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
14
3D perception challenges
Data challenges
• Sparse vs. volumetric nature of 3D point clouds
• Incompleteness in 3D acquisition
• Availability of high-quality 3D video datasets
• Image pixels are arranged on a uniform grid, while 3D point clouds face an accessibility vs. memory trade-off
Implementation challenges
• HW/SW platform (memory, SDKs, tools)
• Manipulation and viewpoint management
• Computational load (training/inference)
15
Enabling AI-based self-supervised depth estimation from a single image
No need for annotated data
• Self-supervised learning from unlabeled monocular videos
• Utilizes the geometric relationship across video frames
Builds on semantics¹
• Significantly improves accuracy by using semantic segmentation²
Works on any neural network architecture
• Modifies only training and requires no additional inference computation
• Enables automatic domain adaptation
Self-supervised training pipeline of X-Distill: a depth network predicts depth D_t for target frame I_t, a 6DoF network estimates the pose T_t→s to source frame I_s, and view synthesis warps the source into Î_t for a photometric loss; a pretrained segmentation network provides semantic guidance through a depth-to-segmentation network and segmentation loss, alongside a smoothness loss. The test pipeline runs only the depth network.
1: X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation, BMVC 2021
2: InverseForm: A Loss Function for Structured Boundary-Aware Segmentation, CVPR 2021
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
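The photometric loss at the heart of this self-supervised setup can be sketched in a few lines. This is a minimal numpy illustration, not the X-Distill implementation: it blends a simplified global SSIM-style term with an L1 term, whereas real pipelines compute SSIM over local windows and add the smoothness and segmentation losses.

```python
import numpy as np

def photometric_loss(target, synthesized, alpha=0.85):
    """Appearance loss between the target frame and the view synthesized by
    warping a source frame with the predicted depth and pose."""
    l1 = np.abs(target - synthesized).mean()
    # Simplified global SSIM-style term (real pipelines use local windows).
    mu_t, mu_s = target.mean(), synthesized.mean()
    var_t, var_s = target.var(), synthesized.var()
    cov = ((target - mu_t) * (synthesized - mu_s)).mean()
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_t * mu_s + c1) * (2 * cov + c2)) / (
        (mu_t ** 2 + mu_s ** 2 + c1) * (var_t + var_s + c2))
    return alpha * (1.0 - ssim) / 2.0 + (1.0 - alpha) * l1

rng = np.random.default_rng(0)
img = rng.random((64, 64))
# A perfectly synthesized view incurs (near-)zero loss.
assert photometric_loss(img, img) < 1e-6
```

Minimizing this loss over unlabeled video is what lets the depth and 6DoF networks train without any ground-truth depth annotations.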
16
Achieving SOTA accuracy with 90% less computation
Qualitative comparison: PackNet vs. HR-Depth vs. ours
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
17
Improved on-device efficiency with neural architecture search
Distilling Optimal Neural Architectures (DONNA)¹: from a pre-trained model, a search space, accuracy-prediction training, and HW-aware search to a set of Pareto-optimal network architectures
• Diverse search: supports diverse spaces with different neural operations to find the best models
• Low cost: scales to many HW devices at minimal cost
• Scalable: low start-up cost of 1,000-4,000 epochs, equivalent to training 2-10 networks from scratch
• Reliable: uses direct HW measurements instead of a potentially inaccurate HW model
Hardware aware: 20-40% faster models, enabling real-time on-device use cases
1: Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces, ICCV 2021
Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
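The output of such a hardware-aware search is a Pareto front over measured latency and task error. A small sketch of that selection step; the candidate numbers are made up for illustration:

```python
def pareto_front(latency_ms, error):
    """Return indices of architectures on the Pareto front: no other
    candidate is both faster (lower latency) and more accurate (lower error)."""
    keep = []
    for i, (l_i, e_i) in enumerate(zip(latency_ms, error)):
        dominated = any(
            (l_j <= l_i and e_j <= e_i) and (l_j < l_i or e_j < e_i)
            for j, (l_j, e_j) in enumerate(zip(latency_ms, error)) if j != i)
        if not dominated:
            keep.append(i)
    return keep

# Four hypothetical candidates measured on target hardware.
lat = [10.0, 12.0, 15.0, 20.0]
err = [0.90, 0.80, 0.85, 0.70]
print(pareto_front(lat, err))  # → [0, 1, 3]: candidate 2 is dominated by 1
```

DONNA's accuracy predictor makes evaluating many such candidates cheap, so the front can be recomputed per device from direct hardware measurements.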
18
Running monocular depth estimation in real time on a Snapdragon mobile platform
• Quantization through the AI Model Efficiency Toolkit (AIMET)
• 30+ FPS by using a more efficient backbone optimized via DONNA neural architecture search

Model | Parameters (M) | FPS | Error
Monodepth2 (RN50), quantized | 32.0 | 23 | 0.83
Our X-Distill (RN50), quantized | 32.0 | 23 | 0.71
Our X-Distill DONNA, quantized¹ | 3.7 | 35 | 0.75

~10x reduction in parameters; more efficient and more accurate model
1: X-Distill/DONNA demo at NeurIPS 2021
AIMET is a product of Qualcomm Innovation Center, Inc.
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
19
Enhancing the model with a new efficient transformer architecture
Novel transformer architecture leverages spatial self-attention for the depth network
• Smaller model that runs in real time
• Improved 3D recall in reconstructed meshes
During training:
• Self-supervised learning for fine-tuning transformers to improve temporal consistency and eliminate texture copy
• Sparse depth from 6DoF to resolve global scale issues
Per-frame depth predictions are fused into a unified mesh; the depth network is an encoder-decoder with position embeddings and multi-head attention.
6DoF: 6 degrees-of-freedom
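The spatial self-attention such a depth network relies on reduces to scaled dot-product attention over patch tokens. A minimal single-head numpy sketch; the shapes and projection matrices are illustrative, not the actual model:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of
    feature tokens (e.g. image patches in a depth network's encoder)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over tokens
    return weights @ v                               # attention-weighted values

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 32))               # 16 patch tokens, dim 32
w = [rng.standard_normal((32, 32)) * 0.1 for _ in range(3)]
out = self_attention(tokens, *w)
assert out.shape == (16, 32)
```

Because every patch attends to every other patch, depth cues can propagate globally across the image, which is hard for purely convolutional backbones.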
20
Our transformer-based model is both smaller and more accurate
26x smaller model fits on a phone processor and runs in real time on the Qualcomm® Hexagon™ Processor

Model | Parameters (M) | 3DR recall
Monodepth2 (RN34) | 15.0 | 0.653
DPT (MiDaS, transformer-based) | 123.0 | 0.926
Ours (transformer-based) | 4.7 | 0.930

Qualitative comparison on an XR camera image: Monodepth2 (RN34) vs. our transformer-based model
Qualcomm Hexagon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
21
Our transformer-based model achieves better visual quality than the current SOTA
Qualitative comparison of reconstructions: time-of-flight sensor vs. Monodepth2 vs. NeuralRecon vs. ours
22
From single-image to stereo depth estimation for increased accuracy
Similar to human visual perception: a highly accurate disparity-estimation model using two images (left/right)
• Fits into memory by just-in-time computation of cost volumes
• Geometry is much sharper and more detailed
• Greater generalizability
• Extends to motion parallax
Models leverage Snapdragon's heterogeneous computing
• Real-time performance
• Subpixel disparity accuracy
Left/right encoder features feed just-in-time cost-volume look-ups that an RNN, conditioned on a context encoder, refines iteratively.
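The core data structure here is the matching cost volume between left and right views. A toy numpy sketch of a full (rather than just-in-time) absolute-difference cost volume, for illustration only:

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Stereo matching cost volume: cost[d, y, x] compares the left pixel
    (y, x) against the right pixel (y, x - d) shifted by disparity d."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return cost

# Toy example: the right image is the left image shifted by 2 pixels,
# so the winning disparity away from the border is 2.
left = np.tile(np.arange(8.0), (4, 1))
right = np.roll(left, -2, axis=1)
disp = cost_volume(left, right, max_disp=4).argmin(axis=0)
assert (disp[:, 2:] == 2).all()
```

Materializing the whole `(max_disp, H, W)` volume is what blows up memory at high resolution; the just-in-time variant instead evaluates only the cost slices the RNN looks up at each refinement step.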
23
Our JiT stereo depth estimation model runs in real time
• 20x faster than SOTA (RAFT-Stereo) with similar accuracy on the Hexagon Processor
Example left and right input images shown.
JiT: just-in-time
24
Enabling efficient object detection in 3D point clouds
A transformer-based architecture that:
• Leverages 2D pseudo-image features extracted in the polar space
• Reduces latency and memory usage without sacrificing detection accuracy
• Sectors data for stream-wise processing to make predictions without requiring a complete 360° LiDAR scan
Pipeline: polar features → patch embedding with positional encoding → queries → transformer backbone
Fast Polar Attentive 3D Object Detection on LiDAR Point Clouds, NeurIPS MLAD workshop, 2021
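Extracting pseudo-image features in polar space starts by binning points over range and azimuth rather than over a Cartesian grid. A small numpy sketch of that gridding step; the grid resolution and range limit are made-up example values:

```python
import numpy as np

def polar_grid_indices(points, r_max=50.0, n_r=32, n_theta=64):
    """Map LiDAR points (x, y) to (range-bin, azimuth-bin) indices of a polar
    pseudo-image grid, the kind a polar-space detector rasterizes point
    features into."""
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)                                   # in [-pi, pi]
    r_bin = np.clip((r / r_max * n_r).astype(int), 0, n_r - 1)
    t_bin = np.clip(((theta + np.pi) / (2 * np.pi) * n_theta).astype(int),
                    0, n_theta - 1)
    return r_bin, t_bin

pts = np.array([[10.0, 0.0], [0.0, 10.0], [-40.0, 0.0]])
r_bin, t_bin = polar_grid_indices(pts)
```

Because azimuth bins map directly to angular sectors of the spinning LiDAR, the detector can start processing each sector as soon as its points arrive, without waiting for the full 360° sweep.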
25
Smaller, faster, lower-power model that achieves the top precision

Model | Parameters (M) | GFLOPs | Inference time (ms) | Accuracy (AP)
PointRCNN | 2.20 | 25 | 620 | 90.34
PV-RCNN | 13.10 | 69 | 80 | 92.24
Complex-YOLO | 65.50 | 31 | 19 | 75.32
PointPillars | 1.43 | 32 | 16 | 88.36
Ours | 0.59 | 6 | 14 | 94.70

Fast Polar Attentive 3D Object Detection on LiDAR Point Clouds, NeurIPS MLAD workshop, 2021
26
Dynamic refinements to reduce size and latency for hand pose estimation
• Lightweight architecture: the refiner is applied recursively, incorporating attention and gating for dynamic refinement
• A backbone feeds a pose predictor that outputs 2D and 3D pose through a HAND model, with attention maps and variance maps driving iterative refinement against ground truth
• Eliminates the need for precise hand detection
Dynamic iterative refinement for efficient 3D hand pose estimation, WACV 2022
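The control flow of such dynamic iterative refinement, reduced to its essentials: repeatedly apply a refinement step and gate out early once the estimate is confident. A generic sketch with a toy refiner, not the WACV 2022 network:

```python
import numpy as np

def iterative_refine(init, step_fn, confident, max_iters=5):
    """Generic iterative-refinement loop: apply a refinement step up to
    max_iters times, with a gating function that exits early once the
    estimate is confident, trading accuracy for compute dynamically."""
    est, iters = init, 0
    for iters in range(1, max_iters + 1):
        est = est + step_fn(est)
        if confident(est):
            break
    return est, iters

# Toy refiner: each step removes half the remaining error toward a target pose.
target = np.array([0.3, -0.2, 0.9])
step = lambda e: 0.5 * (target - e)
done = lambda e: np.linalg.norm(target - e) < 0.05
pose, n = iterative_refine(np.zeros(3), step, done)
```

Easy inputs exit after few iterations while hard ones use the full budget, which is where the average-latency savings come from.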
27
Significant reduction in GFLOPs while achieving better accuracy

Methods | AUC (20-50) | GFLOPs | #Params
Z&B | 0.948 | 78.2 | -
Liu et al. | 0.964 | 16.0 | -
HAMR | 0.982 | 8.0 | -
Cai et al. | 0.995 | 6.2 | 4.53M
Fan et al. | 0.996 | 1.6 | 4.76M
Ours | 0.997 | 1.3 | 1.68M

Methods | AUC (0-50) | GFLOPs | #Params
Tekin et al. | 0.653 | 13.62 | 14.31M
Fan et al. | 0.731 | 1.37 | 5.76M
Ours | 0.768 | 0.28 | 0.46M

Our method also achieves the best average 3D error: 9.76 mm (SOTA) → 7.24 mm (ours)
Heatmaps for hand landmark points from our model
AUC: area under curve
Dynamic iterative refinement for efficient 3D hand pose estimation, WACV 2022
28
World's first transformer-based inverse rendering for scene understanding
Estimates physically-based scene attributes from a single indoor image
• End-to-end trained pipeline for room layout, surface normal, albedo, material, object, and lighting estimation
• Better handling of global interactions between scene components, achieving better disambiguation of shape, material, and lighting
• SOTA results on all 3D perception tasks; enables high-quality AR applications such as object insertion
Architecture: a shared encoder tokenizes the image; dense multi-head transformer decoders (BRDFGeoNet) output albedo, roughness, depth, and normals, while LightNet decoders feed a differentiable renderer for lighting.
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
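Inverse rendering inverts a forward image-formation model. As a point of reference, here is the simplest such forward model, per-pixel Lambertian shading, in numpy; the actual paper estimates a far richer BRDF and spatially varying lighting:

```python
import numpy as np

def lambertian_render(albedo, normals, light_dir, light_color):
    """Forward model that inverse rendering inverts, in its simplest form:
    per-pixel Lambertian shading, image = albedo * max(0, n . l) * light."""
    l = np.asarray(light_dir, float)
    l /= np.linalg.norm(l)
    shading = np.clip(normals @ l, 0.0, None)            # (H, W)
    return albedo * shading[..., None] * light_color     # (H, W, 3)

# Flat wall facing the camera, lit head-on: the image equals the albedo.
h, w = 4, 4
albedo = np.full((h, w, 3), 0.5)
normals = np.zeros((h, w, 3)); normals[..., 2] = 1.0
img = lambertian_render(albedo, normals, light_dir=[0, 0, 1],
                        light_color=np.ones(3))
assert np.allclose(img, albedo)
```

Even this toy model shows the ambiguity the transformer must resolve: a dark pixel can come from low albedo, a grazing normal, or weak light, and only global context disambiguates the three.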
29
Our model finds more accurate scene attributes compared to SOTA
Small insets are refined estimations obtained by further post-processing with bilateral solvers (BS).
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
30
The transformer algorithm automatically learns attention maps to determine the important areas of the image
Attention maps (albedo) for example input images:
Focuses attention on:
1. Chairs and window
2. Highlighted regions over the image
3. The entire floor
4. The chair itself
Focuses attention on:
1. Neighboring shadowed areas of the wall
2. The entire wall
3. Potential light sources and occluders
4. The ambient environment
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
31
Our method correctly estimates lighting to realistically insert objects
Object-insertion comparison: Barron et al. 2013 vs. Gardner et al. 2017 vs. Li et al. 2021 vs. ours
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
  • 31. 32
33
3D perception innovations coming soon
Neuro-SLAM
• Greatly facilitates XR, autonomous vehicles, and robotic vision
• Our goal is to create 3D maps in real time on the device
3D imitation learning
• Learning complex dexterous skills in 3D spaces from human videos
• Our goal is to enable such learning in real time
3D scene understanding in RF (Wi-Fi/5G)
• Our goal is to achieve floorplan estimation, human detection, tracking, and pose using only RF signals
Neural radiance fields (NeRF)
• Single object/scene → single model: virtual images from any viewpoint
• Our goal is to run NeRF in real time on mobile platforms
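For the NeRF direction, the key inference step is volume rendering along each camera ray. A minimal numpy sketch of that alpha compositing; the sample counts and densities are illustrative, and a real NeRF queries a network for the densities and colors:

```python
import numpy as np

def composite_ray(densities, colors, delta):
    """NeRF-style volume rendering along one ray: alpha-composite per-sample
    colors, weighting each by the transmittance accumulated in front of it."""
    alpha = 1.0 - np.exp(-densities * delta)                 # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = trans * alpha                                  # contribution per sample
    return (weights[:, None] * colors).sum(axis=0)

# One nearly opaque red sample between two empty ones: the ray renders red.
sigma = np.array([0.0, 100.0, 0.0])
rgb = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], float)
pixel = composite_ray(sigma, rgb, delta=0.1)
```

Running this per pixel, for hundreds of samples per ray, is exactly the compute load that makes real-time mobile NeRF a research goal rather than a given.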
34
We are conducting leading research to enable 3D perception
• Thanks to our full-stack AI research, we are first to demonstrate 3D perception proofs-of-concept on edge devices
• We are solving system and feasibility challenges to move from research to commercialization
Thank you
Follow us on: @QCOMResearch
For more information, visit us at: qualcomm.com & qualcomm.com/blog
Nothing in these materials is an offer to sell any of the components or devices referenced herein.
©2018-2022 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Snapdragon, and Hexagon are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
References in this presentation to "Qualcomm" may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business.