Autonomous Vehicle Webinar. Crash course in AVs: high-level overview, technology deep-dives, and trends. Follow me on Twitter at https://twitter.com/wileycwj.
Link to YouTube Video: https://www.youtube.com/watch?v=CruCp6vqPQs
Google Slides: https://docs.google.com/presentation/d/1-ZWAXEH-5Xu7_zts-rGhNwan14VH841llZwrHGT_9dQ/edit?usp=sharing
Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence
1. Autonomous Vehicles Webinar
The intersection of robotics and artificial intelligence
Streaming live via Hangouts
8pm CT - August 28th, 2016
Undergraduate student at University of Illinois at Urbana-Champaign, Class of 2017
B.S. Mechanical Engineering, Minor in Electrical Engineering
Previous: PwC, Cummins, UIUC RA
2. Overview
I. What is an AV?
II. Technology
A. AI + Robotics = AVs
B. “Self-Driving Stack”
1. Sensing
2. Processing
3. Actuation
III. Up Next
3. What is an autonomous vehicle (AV)?
Within the context of this discussion we are focusing on roadway motor vehicles.
At its simplest, an AV would be a car with cruise-control capability. At its most complex, it is an entirely driverless vehicle.
Much like everything else in tech, there is a lot of contention over how the classification should be structured. What is “full autonomy”, etc.? Thankfully, the U.S. Dept. of Transportation developed an official tiering with very clear distinctions.
Autonomous vehicles (AVs) are vehicles that are capable of movement with limited or no outside instruction or intervention.
4. Autonomy, per the U.S. Dept. of Transportation:
SOURCE: http://www.nhtsa.gov/About+NHTSA/Press+Releases/U.S.+Department+of+Transportation+Releases+Policy+on+Automated+Vehicle+Development
Level 1
Automation at this level involves one or more specific control functions. Examples include
electronic stability control or pre-charged brakes, where the vehicle automatically assists
with braking to enable the driver to regain control of the vehicle or stop faster than possible
by acting alone.
Level 2
This level involves automation of at least two primary control functions designed to work in
unison to relieve the driver of control of those functions. An example of combined functions
enabling a Level 2 system is adaptive cruise control in combination with lane centering.
Level 3
Vehicles at this level of automation enable the driver to cede full control of all safety-critical
functions under certain traffic or environmental conditions and in those conditions to rely
heavily on the vehicle to monitor for changes in those conditions requiring transition back to
driver control. The driver is expected to be available for occasional control, but with
sufficiently comfortable transition time.
Level 4
The vehicle is designed to perform all safety-critical driving functions and monitor roadway
conditions entirely. The driver could provide destination input and is not expected to be
available for control at any time during the trip. This includes unoccupied vehicles.
6. The intersection of artificial intelligence and robotics
AI: An intelligent system that is capable of taking information/data and acting upon that data, capable of learning how to draw further insight
● Modern machine learning and AI techniques are capable of this for specific tasks (AlphaGo, image classification)
● These same techniques, especially deep learning, could be applied to vehicles to teach them to drive given high volumes of data
Robotics: Study of the design and control of mechanical systems. On a closed loop, these systems are capable of controlling themselves using sensory information
● Robotics is a well understood field of study with decades of research and progress
● Has been applied to planes, cars, etc., but in an extremely limited fashion
● Autonomy cannot be “hard-coded”, it must be “learned”
7. The intersection of artificial intelligence and robotics: where the magic happens
Autonomous vehicles have always been a scientific dream. Planes have been capable of auto-pilot, “self-flying” features for decades. Why is it taking so long for cars? Existing infrastructure and roads cannot support rule-based robotic systems. There are too many possible scenarios that could occur when driving; rules for robotic vehicles cannot be “hard-coded”.
True autonomy requires artificial intelligence: intelligence that resembles the human capability to decipher 3D space changing in time. With decades of advances in machine learning and artificial intelligence, we are nearing a time when machines are better at understanding roads than we are.
11. Autonomous Vehicle Architecture
1 - Sensing: Video camera (still image processing, pixels); LIDAR (light-radar, point clouds); specific sensors (e.g. red light detection, pedestrian detection)
2 - Processing: Sensor data is passed on to algorithms and is processed locally (GPUs) or over a distributed network (the Cloud)
3 - Actuation: Commands are sent to the control unit, which tells the engine/motor to speed up or slow down. An analogous process occurs for vehicle steering.
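To make the three-stage loop concrete, here is a minimal Python sketch of the sense-process-actuate cycle. The function names and returned values are hypothetical stand-ins for illustration, not any real vehicle API.

```python
# A minimal sketch of the sensing -> processing -> actuation loop.
# read_sensors, plan, and send_commands are hypothetical stand-ins.

def read_sensors():
    """Stage 1 - Sensing: gather raw camera frames and LIDAR returns."""
    return {"camera_frame": None, "lidar_points": []}

def plan(sensor_data):
    """Stage 2 - Processing: turn raw data into throttle/steering targets
    (computed locally on GPUs, or over a distributed network)."""
    return {"throttle": 0.2, "steering_angle": 0.0}

def send_commands(commands):
    """Stage 3 - Actuation: forward commands to the engine/steering
    control units."""
    print("actuate:", commands)

for _ in range(3):  # one iteration per control cycle
    send_commands(plan(read_sensors()))
```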
14. LIDAR, video cameras, and radar/sonic sensors are most
commonly used for gathering vehicle environment data
Video Camera (still images
processing, pixels)
LIDAR (light-radar, point clouds)
Specific sensors (e.g. red light
detection, stop signs)
Sensing
● “Light radar” - LIDAR
● Generates point clouds that are 3D representations
of the driving environment
● Seen as the high-resolution input data that is
integral to SLAM + RRT techniques
● Simple video cameras provide feeds of still images that can be processed for lanes, obstacles, pedestrians, etc.
● Cheap and effective, now heavily adopted as the data of choice for deep learning
● Case-specific sensors are heavily leveraged to
provide insight in areas that LiDAR and cameras
cannot handle in a general way
● Ex) a specific camera pointed at where stoplights appear, feeding directly into a specific algorithm for sensing red, yellow, and green colors (see the sketch below)
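As a toy illustration of that last example, the sketch below counts strongly-red pixels in a fixed region of interest using NumPy. The ROI coordinates and thresholds are invented for illustration; production perception systems are far more robust than a color threshold.

```python
import numpy as np

def detect_red_light(frame, roi=(0, 100, 0, 100)):
    """Return True if a fixed region of interest contains enough
    strongly-red pixels. frame is an HxWx3 uint8 RGB image; roi is
    (row0, row1, col0, col1), a hypothetical crop where the stoplight
    is expected to appear."""
    r0, r1, c0, c1 = roi
    patch = frame[r0:r1, c0:c1].astype(int)
    r, g, b = patch[..., 0], patch[..., 1], patch[..., 2]
    # A pixel counts as "red" when the red channel clearly dominates.
    red_mask = (r > 150) & (r > g + 60) & (r > b + 60)
    return red_mask.mean() > 0.02  # trigger if >2% of the patch is red

# Quick check: a frame with a bright red square should trigger.
test = np.zeros((200, 200, 3), dtype=np.uint8)
test[20:80, 20:80, 0] = 255
print(detect_red_light(test))  # True
```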
15. A deep-dive on LIDAR Sensing
● LIDAR has quickly become a go-to
sensor for autonomous applications.
Velodyne is an industry leader with
relatively cheap, easy to calibrate units
● LIDAR units send out pulses of light and
measure the time to return, which can
be used to compute the distance of an
object
● A rotating LIDAR sensor gathering
distances of objects at different angles
can gather enough points of data to
construct a “point cloud”
● Point clouds are clearly useful: much like the human eye, they give a real-time 3D representation of space (a toy sketch follows)
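The range computation is simple enough to sketch: distance = (speed of light × round-trip time) / 2, and pairing each range with the sensor's rotation angle yields the points of the cloud. Below is a minimal Python illustration in 2D only; real units also sweep elevation for full 3D.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def tof_to_range(round_trip_s):
    """The pulse travels out and back, so halve the round trip."""
    return C * round_trip_s / 2.0

def scan_to_points(scan):
    """Turn (round_trip_time, azimuth_radians) pairs from one rotation
    into 2D Cartesian points of the cloud."""
    return [(tof_to_range(t) * math.cos(a), tof_to_range(t) * math.sin(a))
            for t, a in scan]

# An object straight ahead with a ~100 ns round trip sits ~15 m away.
print(scan_to_points([(100e-9, 0.0)]))  # ~[(14.99, 0.0)]
```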
16. Researchers at MIT in collaboration with DARPA have been
able to fabricate and implement a solid-state LIDAR chip:
“Our lidar chips promise to be orders of magnitude smaller, lighter, and cheaper than lidar
systems available on the market today. They also have the potential to be much more robust
because of the lack of moving parts, with a non-mechanical beam steering 1,000 times faster
than what is currently achieved in mechanical lidar systems.”
“At the moment, our on-chip lidar system can detect objects at ranges of up to 2 meters, though
we hope to achieve a 10-meter range within a year. The minimum range is around 5
centimeters. We have demonstrated centimeter longitudinal resolution and expect 3-cm lateral
resolution at 2 meters. There is a clear development path towards lidar on a chip technology
that can reach 100 meters, with the possibility of going even farther.”
Massive size and price reductions in LIDAR sensors could fundamentally change the approach to autonomous vehicles, drones, prosthetics, etc.
“MIT and DARPA pack LIDAR sensor onto single chip”
IEEE Spectrum, Aug 4 2016
A new, cheaper, solid state LIDAR emerging Sensing
SOURCE: http://spectrum.ieee.org/tech-talk/semiconductors/optoelectronics/mit-lidar-on-a-chip
17. The sensing stage needs to gather lots of data from different
sources in order to fully understand the environment
Video Camera (still images
processing, pixels)
LIDAR (light-radar, point clouds)
Specific sensors (e.g. red light
detection, stop signs)
Sensing
19. The Processing Stack Processing
Input Data: LIDAR point cloud data; video camera feed
Computational Muscle
● Local: CPUs, GPUs, SoCs on board; large amounts of flash memory
● Distributed: “Cloud” compute; powerful endpoints, limited only by the speed of data communication
Computational Methods
● Motion Planning / Mapping: RRT*, SLAM, kinematics
● Machine Learning / Deep Learning: end-to-end, DNN, CNN
● Rule-based systems: intersections, left turns
Output Commands
21. Motion Planning - Algorithm 1: SLAM Processing
What is the world around me (mapping)
● Sense from various positions
● Integrate measurements to produce map
Where am I in the world (localization)
● Sense
● Relate sensor reading to a world model (a priori maps)
● Compute (probabilistic) location relative to model
**above points taken from CMU paper cited below
Depicted to the right is a Kalman Filter being applied to
position measurements and sensory information that in turn
generates a Gaussian distribution of the possible positions
Simultaneous localization and mapping (SLAM)
SOURCE: http://www.cs.cmu.edu/~motionplanning/lecture/Chap8-Kalman-Mapping_howie.pdf
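For intuition, here is a minimal 1D Kalman filter cycle in Python, in the spirit of the filter described above: predict with the motion command, then blend in a noisy measurement weighted by the Kalman gain, yielding a Gaussian over position. The noise values are arbitrary illustrative choices.

```python
def kalman_1d(x, p, z, motion, q=0.1, r=1.0):
    """One predict/update cycle of a 1D Kalman filter.
    x, p: current position estimate and its variance
    z: new noisy position measurement (variance r)
    motion: commanded displacement since the last step (process noise q)
    Returns the updated Gaussian (mean, variance) over position."""
    x, p = x + motion, p + q   # predict: move estimate, grow uncertainty
    k = p / (p + r)            # Kalman gain: how much to trust the sensor
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1.0
for z in [1.1, 2.0, 2.9]:      # noisy position readings
    x, p = kalman_1d(x, p, z, motion=1.0)
print(x, p)                    # estimate near 3.0, variance shrunk
```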
22. Motion Planning - Algorithm 1: SLAM Processing
SLAM Walkthrough
SOURCE: http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf
29. Motion Planning - Algorithm 1: SLAM Processing
SLAM Walkthrough (panels 1-7): location likelihood distribution
SOURCE: http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf
30. Motion Planning - Algorithm 2: RRTs Processing
● Rapidly-exploring Random Trees (RRTs) are a set of
exploratory algorithms that are useful for trajectory
planning
● With a set of polygonal obstacles, an RRT can generate
a possible path from the starting configuration to the
ending (goal) configuration
● Sample paths are then input to a controller/model representation of the vehicle dynamics, and the predicted trajectory of the vehicle is computed
● The runtime of these algorithms can vary since
accuracy is based on samples taken
Once a probabilistic localization is realized, a probabilistic
path can be generated using RRTs
SOURCE:
http://acl.mit.edu/papers/KuwataTCST09.pdf
http://www.staff.science.uu.nl/~gerae101/pdf/compare.pdf
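A bare-bones RRT is short enough to sketch. The version below grows a tree in a 10x10 plane toward random samples and stops once a node lands near the goal; the obstacle check, step size, and bounds are illustrative assumptions, and real planners (e.g. the RRT* variants in the cited paper) add rewiring and vehicle dynamics.

```python
import math, random

def rrt(start, goal, is_free, n_samples=2000, step=0.5, goal_tol=0.5):
    """Bare-bones RRT in a 10x10 plane: repeatedly sample a random point,
    extend the tree from its nearest node toward the sample, and stop
    once a new node lands within goal_tol of the goal."""
    nodes, parent = [start], {0: None}
    for _ in range(n_samples):
        sample = (random.uniform(0, 10), random.uniform(0, 10))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        d = math.dist(nodes[i], sample)
        if d == 0:
            continue
        adv = min(step, d)  # do not overshoot the sample
        new = tuple(n + adv * (s - n) / d for n, s in zip(nodes[i], sample))
        if not is_free(new):  # collision check against the obstacle set
            continue
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:  # goal region reached
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

# Free space everywhere except a square obstacle centered on (5, 5).
free = lambda p: not (4 < p[0] < 6 and 4 < p[1] < 6)
print(rrt((1.0, 1.0), (9.0, 9.0), free))
```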
31. Motion Planning - SLAM + RRTs = advanced guesswork Processing
● In order to obtain a higher-resolution probabilistic
model of the ideal trajectory more samples need to be
taken and more computations performed, hence the
need for massive compute power!
● It is understandable that a car driving 60mph would
have issues performing this depth of computation in a
rapidly changing environment
For a more in-depth understanding of how algorithmic robotic motion planning works, check out SLAM for Dummies
A probabilistic path generated from probabilistic input poses
issues for vehicles moving at high speeds
SOURCE:
http://workshops.acin.tuwien.ac.at/clutter2014/papers/ric2014_submission_9.pdf
http://acl.mit.edu/papers/KuwataTCST09.pdf
**white spots represent sampled points used to generate the RRT
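A quick back-of-the-envelope check makes the 60 mph concern concrete: the snippet below computes how far the vehicle travels during a single planning cycle (the cycle times are hypothetical).

```python
# Distance traveled per planning cycle at highway speed.
mph_to_mps = 1609.344 / 3600        # one mph in meters per second
v = 60 * mph_to_mps                  # ~26.8 m/s
for cycle_s in (0.01, 0.05, 0.1):    # hypothetical planning cycle times
    print(f"{cycle_s * 1000:.0f} ms cycle -> {v * cycle_s:.2f} m traveled")
# 10 ms -> 0.27 m, 50 ms -> 1.34 m, 100 ms -> 2.68 m
```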
32. Artificial Intelligence (ML/Deep Learning) Processing
● Newly emerging methodologies all revolve around deep
learning via neural nets
○ RNNs, CNNs, GANs, Autoencoding, etc
● Two main forces driving adoption of these methods:
○ Cheaper and more powerful local and cloud
computing (GPUs)
○ Open-source deep learning platforms
(TensorFlow)
These deep learning methodologies are injecting intelligence
into vehicles, feeding them massive amounts of data, and
letting them learn
Please check out this Deep Learning Playground for a better
visualization of the concept
Artificial Intelligence Methods
Feature extraction performed by a CNN on video from a forward-facing camera; the model was able to determine road edges with relative accuracy (via NVIDIA)
Lane centering generator that predicts the path of vehicles based on video input from a front-facing camera (via Comma.ai)
34. Important Academic Papers Regarding Deep Learning Processing
● NVIDIA - “End to End Learning for Self-Driving Cars”
Video input from a forward-facing camera is trained against steering wheel position; deep learning networks are capable of detecting important road features with limited additional nudging in the right direction
● Comma.ai - "Learning A Driving Simulator"
Using video input with no additional training metadata (IMU, wheel angle), auto-encoded video was generated, predicting many frames into the future while maintaining road features
● Radford et al. (Facebook AI) - "Unsupervised Representational Learning w/ Deep GANs"
Seminal work on deep learning auto-encoding that allowed Comma.ai breakthrough and
similar work i.e. “Autoencoding Blade Runner”
● NYU & Facebook AI - “Deep Multi-Scale Video Prediction Beyond Mean Square Error”
Implications of these papers indicate deep learning is a highly promising solution for AVs
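To give a flavor of the NVIDIA-style end-to-end setup, here is a toy Keras regression model mapping camera frames to a steering angle. The 66x200 input matches the paper's crop, but the layer stack is abbreviated and illustrative, not the paper's actual architecture.

```python
import tensorflow as tf

# Toy end-to-end model: front-camera frames in, steering angle out.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(66, 200, 3)),   # cropped camera frame
    tf.keras.layers.Rescaling(1.0 / 255),        # normalize pixel values
    tf.keras.layers.Conv2D(24, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(36, 5, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(48, 5, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1),                    # predicted steering angle
])
model.compile(optimizer="adam", loss="mse")
# model.fit(frames, steering_angles, epochs=...)  # train on human driving logs
```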
36. Computational muscle limited to local compute, for now Processing
● Current self-driving solutions are all implemented with
local compute due to the need for simplicity, focusing
on software first
● Utilizing GPUs and special SoCs to perform simple operations (e.g. on pixels and point clouds) at massive scale in parallel
● New TPUs (tensor processing units) are being
designed specifically for the purpose of machine
learning and AI, as well as new platforms emerging
specifically for AVs
● A distributed network offering massive computational
muscle would be ideal, but does not offer immediate
simplicity due to latency, security, reliability, ...
● Movement toward an “AWS for AVs” is a huge opportunity that many companies are actively working on
Two paradigms currently, local compute (CPUs, SoCs, GPUs)
and distributed computation over a network (Cloud)
Google’s new TPU that
powered AlphaGo
38. The actuation stage is primarily based on the field of controls and electromechanical systems
Actuation
● The control unit is circuit hardware that manages
electromechanical systems within a car
● A large amount of low-level control has been standardized into protocols like CAN
● Most well-studied and understood portion of the
self-driving technology stack, high feasibility
relative to other parts of the “stack”
● Companies like Delphi and Bosch are large players
in this space and have invested decades of time
and research into vehicle controls
● Innovation in this space is much more iterative,
positioning incumbents to dominate the controls
hardware/software for AVs
The processing stage sends commands via a bus like CAN or similar architectures to the engine control unit/modules (see the sketch below)
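As a sketch of what "sending commands via CAN" can look like in software, the snippet below uses the third-party python-can package. The channel name, arbitration ID, and one-byte payload are hypothetical; real message layouts come from the vehicle's CAN database.

```python
import can  # third-party python-can package

# Open a socketcan interface; "can0" is a hypothetical channel name.
bus = can.interface.Bus(channel="can0", bustype="socketcan")

# The 0x0F0 arbitration ID and one-byte throttle payload are invented
# for illustration, not a real vehicle's message definition.
throttle_pct = 20
msg = can.Message(arbitration_id=0x0F0, data=[throttle_pct],
                  is_extended_id=False)
try:
    bus.send(msg)  # the engine control unit would pick this frame up
    print("throttle command sent")
except can.CanError as err:
    print(f"CAN send failed: {err}")
```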
40. High-level trends: “Self-Driving Stack” trends and general comments
Sensing
● The cost of sensors is falling through the floor
● No “best sensor” yet; converging toward LIDAR and video cameras, dependent on processing approaches
● Accuracy limits, distance limits, and latency of data feeds (LIDAR especially) are improving exponentially with cost
Processing
● Models vs. neural vs. mixed; no “best practice” yet
● Local-compute-only implementations so far; will transition toward the “Cloud” the same way software did
● Mapping is important, but the AI vector bank is the new data network effect
● V2V, V2I communication cannot be relied upon
Actuation / Controls
● Actuation/controls is out in front of the rest of the tech; not a limiting factor
● Mission-critical safety and reliability need to be investigated more heavily, beyond “Six Sigma”
● Incumbents well positioned
● Security has not been investigated thoroughly; will emerge as a large space later on
41. My Thoughts
1. Data network effects for AI systems are the single most important factor in long-term success. Advantage: Uber and Tesla.
2. LIDAR and GPU companies will become important OEMs and provide hardware as a service to Big Auto; theirs is the only non-commodity hardware that matters for enabling AVs.
3. The inherently difficult problems are software-related, and Big Auto is not positioned to “win” at software. Defer to startups with ex-researchers.
42. Companies to pay attention to
- Otto (recently acquired by Uber for ~$600M)
- Zoox
- $200M fundraise without even a landing page; talk about stealthy! Team consists of “fathers of AVs”
- Comma.ai
- Attempting to offer autonomy enablement to vehicle manufacturers
- Drive.ai
- Software for AVs; not much info, but a rockstar team with a very deep background
- Peloton Tech
- More immediate use case for semi-autonomy with platooning. Strategic investors; UPS's venture arm is a positive signal.
- NuTonomy
- Released a functioning product in Singapore; great team