"The development of autonomous driving cars requires the handling of huge amounts of data produced by test vehicles and solving a number of critical challenges specific to the automotive industry.
In this talk we will describe these challenges and how we, at BMW, are overcoming them by adapting and reinventing existing big data solutions for our end-to-end data journey for autonomous driving. Our journey involves ingesting data produced by a variety of sensors into a dedicated Hadoop cluster, decoding the data, conducting quality control, processing and storing the data on the clusters, making it searchable, analyzing it and exposing it to the engineers working on the algorithms development.
In the first part of the talk we will present a general overview of the challenges we faced and the lessons we learned from them. In the second part we will deep dive into the most interesting technical issues. These include: dealing with automotive formats and standards that are not designed for distributed processing; defragmentation of sensory data; assuring the quality of the data coming from complex car hardware and software components; efficient data search across petabytes of data; and reprocessing the computing components running in the car inside the data center, which typically requires high performance computing."
Speakers:
Felix Reuthlinger, Data Engineer for Autonomous Driving, BMW Group
Dogukan Sonmez, Senior Software Engineer, BMW Group
Data Driven Development of Autonomous Driving at BMW
1. BMW at DataWorks Summit 2018 Berlin
18.04.2018
DATA DRIVEN DEVELOPMENT OF
AUTONOMOUS DRIVING AT BMW
2. ABOUT THE SPEAKERS
Felix Reuthlinger
§ Data Engineer for AD
§ Joined BMW in 2015
§ Before joining AD, I was
Big Data Architect at BMW central IT
§ Focus: Data center and data flow architecture for AD
§ Strong in: Spark, Scala
§ Co-founder and member of http://munich-datageeks.de/
Dogukan Sonmez
§ Software Engineer for AD
§ Joined BMW in 2017
§ Prior to BMW, worked on various big data
and machine learning projects at SAP, Siemens and Sony
§ Focus: Data and Simulation for AD
§ Strong in: Distributed systems and software craftsmanship
§ Hobbies: Building wooden furniture, painting, IoT
Data Driven Development of Autonomous Driving at BMW | DataWorks Summit Berlin | April 2018 Page 2
3. AGENDA
Why Autonomous Driving requires data
How we
get data
process data
serve data
ensure data quality
4. WHY AUTONOMOUS DRIVING REQUIRES DATA
5. AUTONOMOUS DRIVING LEVELS
TASK OF THE DRIVER | TASK OF THE VEHICLE, by level:
0 NO SUPPORT: Driver has full control
1 ASSISTANCE: Driver controls steering and checks forward motion | Vehicle controls forward motion
2 PARTLY AUTOMATED: Driver checks forward and sideward motion | Vehicle controls forward and sideward motion
3 HIGHLY AUTOMATED: Driver is ready to take control at any time | Vehicle requests driver to take over control based on situations
4 FULLY AUTOMATED: Driver only required for certain parts of the track | Vehicle does not request driver to take over control
5 AUTONOMOUS: No driver required
Driver state along the scale: HANDS ON, HANDS TEMP. OFF, EYES TEMP. OFF, HANDS OFF / EYES OFF, HANDS OFF / MIND OFF, PASSENGER
Vehicles along the scale: G11 / G30, iNEXT, iNEXT pilot series, tbd.
TRANSITION OF RESPONSIBILITY: HUMAN -> MACHINE
TECHNOLOGICAL ‘MOONSHOT’
TECHNOLOGICAL QUANTUM LEAP
*Source: SAE (Society of Automotive Engineers) International Level of Automation
6. ADAS* SYSTEM SETUP
(* AUTONOMOUS DRIVING ASSISTANCE SYSTEMS)
BMW 5 SERIES, 23 SENSORS
Sensors:
Full Range Radar.
Night Vision.
Side View Camera.
Side Range Radar.
Surround View Camera.
Ultra-sonic.
Stereo Front Camera.
Rear View Camera.
Functions:
STEERING AND LANE CONTROL ASSISTANT INCL. LANE CHANGE ASSISTANT.
SURROUND VIEW.
ACTIVE CRUISE CONTROL.
SPEED LIMIT ASSIST.
EMERGENCY STEERING ASSIST.
WRONG WAY ASSIST.
CROSSROAD ASSIST.
7. DATA DRIVEN DEVELOPMENT FOR AD @ BMW
Several TB/h
Up to 500 PB/a
ML Experiments/Training
Test drives
Data Ingest to Data Center
Organize
Structure
KPI report
Deployment of
trained algorithms
ML data sets
Phase out /
Balance datasets
Combinatorial boost of scenarios
Synthetic data
Focus of this talk
8. We got hundreds of PBs of data to crunch …
Have a lot of squirrels do it?
Probably not …
9. DATA JOURNEY OVERVIEW
Logger File Copy / Ingest Instance
Hadoop
File(s)
Meta store
InputFormat,
Defragmentation, Decoding
Speed Weather
25 km/h Sunny
30 km/h Sunny
Analytics, Functions,
Learning, …
I want to work on
data from a sunny
drive in June, …
10. HOW WE GET DATA
11. FILE FORMAT STANDARD
MDF4 (Measurement Data Format, version 4)
-> https://www.asam.net/standards/detail/mdf/
Standard in automotive industry (by ASAM organization https://www.asam.net/ )
Organized in binary blocks
MDF4 has multiple usage types
sorted / unsorted content
for recording (hardware loggers) or calculated data
for data exchange and long-term storage
BMW AG is one of the standard authors
12. FILE FORMAT – HOW WE USE IT
Logger centric:
Main use case -> hardware logger in the car
Very high data bandwidth -> write down data quickly (FIFO)
Our MDF4 files:
Unsorted content
Multiple small blocks for metadata
One continuous big block for storing record payload data
* Example generated with our custom implementation of Mdf4Writer
* Example hardware logger in the car
13. FILE FORMAT
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header (ID block)
-> this is MDF4 of version X
MDF block
Block header
-> Block description, size
link[0]
link[1]
---
link[n]
Data section
-> Fields
MDF block
Block header
-> Block description, size
link[0]
link[1]
---
link[n]
Data section
-> Fields
Data block
Block header
-> Block description, size
Data section
-> Records / payloads
-> Dynamic record size
-> No indexing
=> This causes the file to be non-splittable
Substructures, like structs, contain metadata down to the Data Block
We use only 1 data block here
It covers 99.99% of the total volume
….
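Why dynamic record sizes without an index prevent splitting can be seen in a minimal sketch. The record layout used here (a 4-byte little-endian length prefix followed by the payload) is a hypothetical stand-in for illustration, not the actual MDF4 record encoding, and the function names are made up.

```python
import struct

def write_records(payloads):
    """Serialize payloads as [4-byte length][payload] records (hypothetical layout)."""
    return b"".join(struct.pack("<I", len(p)) + p for p in payloads)

def scan_records(data):
    """Yield (offset, payload) pairs by walking the block from its first byte.

    Each record's position is only known after the previous length prefix
    has been read, so the scan cannot start in the middle of the block.
    """
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        start = offset + 4
        yield offset, data[start:start + length]
        offset = start + length

block = write_records([b"\x0e\x12\x1a", b"\x87", b"\x00\x01\x2a\x04\x23"])
offsets = [off for off, _ in scan_records(block)]
payloads = [p for _, p in scan_records(block)]
```

A reader dropped at an arbitrary byte cannot tell whether it sits on a length prefix or inside a payload, so a split boundary has no safe place to begin.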
15. DATA LOGGING IN THE CAR
Logger
SSD
File set 1
File set 2
Logger
Logger config :
car 1, drive 1, …
FIFO
Roll over to next file at 2 GB
(ca. 5s data)
0E 1A 87 …
12 1B AA …
00 01 2A …
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header : car 1, drive 1
file set 1, file #2, …
87 1B AA
Header : car 1, drive 1
file set 2, file #1, …
00 01 2A
Header : car 1, drive 1
file set 2, file #2, …
04 23 0A
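The roll-over figures above imply a data rate that is easy to work out. The sketch below assumes decimal units (1 TB = 1000 GB) and takes "ca. 5s" as exactly 5 seconds, so the results are rough estimates for a single logger stream, not measured values.

```python
FILE_SIZE_GB = 2.0        # roll-over threshold from the slide
SECONDS_PER_FILE = 5.0    # "ca. 5s data" per file, treated as exact here

def logging_rates(file_size_gb=FILE_SIZE_GB, seconds_per_file=SECONDS_PER_FILE):
    """Return (GB per second, files per hour, TB per hour) for one logger stream."""
    gb_per_second = file_size_gb / seconds_per_file
    files_per_hour = 3600.0 / seconds_per_file
    tb_per_hour = gb_per_second * 3600.0 / 1000.0   # assumes 1 TB = 1000 GB
    return gb_per_second, files_per_hour, tb_per_hour

gb_s, files_h, tb_h = logging_rates()
print(f"{gb_s:.1f} GB/s, {files_h:.0f} files/h, {tb_h:.2f} TB/h")
```

Under these assumptions one hour of recording yields about 720 files and roughly 1.4 TB.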
16. HOW WE PROCESS DATA
17. DATA PROCESSING
Hadoop / HDFS Hadoop / Spark
InputFormat
RDD / DF
…
Hadoop / HDFS
Speed Weather
25 km/h Sunny
30 km/h Sunny
Drive meta data
Merged header information
Hadoop / HBase
Meta store
Note: we parallelize by scaling out over multiple driving sessions
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header : car 1, drive 1
file set 1, file #2, …
87 1B AA
Header : car 1, drive 1
file set 2, file #1, …
00 01 2A
RDD / DF
RDD / DF
RDD / DF
…
RDD / DF
…
read defragment decode store
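The note about scaling out over driving sessions can be sketched without a cluster. In this illustration a thread pool stands in for Spark executors, the four stages (read, defragment, decode, store) are reduced to placeholders, and the file tuples, including the second car, are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical file descriptors: (car, drive, file_set, file_no, payload bytes)
FILES = [
    ("car 1", "drive 1", 1, 1, b"\x0e\x12\x1a"),
    ("car 1", "drive 1", 1, 2, b"\x87\x1b\xaa"),
    ("car 1", "drive 1", 2, 1, b"\x00\x01\x2a"),
    ("car 2", "drive 1", 1, 1, b"\x04\x23\x0a"),
]

def group_by_session(files):
    """Collect all files of one driving session under the key (car, drive)."""
    sessions = defaultdict(list)
    for car, drive, file_set, file_no, payload in files:
        sessions[(car, drive)].append((file_set, file_no, payload))
    return sessions

def process_session(item):
    """read -> defragment -> decode -> store, reduced to placeholders."""
    session, parts = item
    ordered = sorted(parts)                      # read: restore file order
    merged = b"".join(p for _, _, p in ordered)  # defragment: placeholder concatenation
    decoded = list(merged)                       # decode: placeholder, bytes to ints
    return session, decoded                      # store: returned instead of written

def run(files, workers=4):
    """Process every session independently; sessions are the unit of parallelism."""
    sessions = group_by_session(files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_session, sessions.items()))

results = run(FILES)
```

Because no stage needs data from another session, adding sessions adds independent work that can run side by side.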
18. DEEP DIVE ABOUT REDUCING I/O
Continuous data collection requires continuous processing.
Challenges:
Potentially thousands of files per driving session
MDF4 using dynamic record length, no clear split
Seeks inside file
Defragmentation = grouping transformation
Goal: reduce network I/O.
19. MDF4 INPUT FORMAT
20. CUSTOM INPUT FORMAT IMPLEMENTATION
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
RDD / DF
…
Header : car 1, drive 1
file set 2, file #1, …
00 01 2A
2 GB file size
dfs.blocksize=2G
=> 1 file = 1 input split
Mdf4Record = Metadata + Payload (binary)
Mdf4Reader
InputSplit
…
Mdf4Reader
Executor / Partition
Mdf4InputFormat
Executor / Partition
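The effect of "1 file = 1 input split" can be illustrated with a small split calculator. This is a plain Python model, not Hadoop code. In Hadoop's FileInputFormat the usual way to get one split per file is to override isSplitable to return false; whether Mdf4InputFormat does this is an assumption, since the slides do not show its source. The file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

GIB = 1024 ** 3
MIB = 1024 ** 2

@dataclass(frozen=True)
class InputSplit:
    path: str
    start: int    # byte offset where the split begins
    length: int   # number of bytes in the split

def compute_splits(files: Dict[str, int], block_size: int, splittable: bool) -> List[InputSplit]:
    """Model of split planning; files maps path -> size in bytes."""
    splits = []
    for path, size in sorted(files.items()):
        if not splittable:
            # Non-splittable format: the whole file goes to a single reader.
            splits.append(InputSplit(path, 0, size))
            continue
        for start in range(0, size, block_size):
            splits.append(InputSplit(path, start, min(block_size, size - start)))
    return splits

files = {"set1_file1.mf4": 2 * GIB, "set2_file1.mf4": 2 * GIB}
whole_file_splits = compute_splits(files, block_size=2 * GIB, splittable=False)
block_splits = compute_splits(files, block_size=128 * MIB, splittable=True)
```

With a block size equal to the file size, each reader also finds its whole file in one HDFS block, which fits the stated goal of reducing network I/O.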
22. DATA REPRESENTATION IN THE CAR BUS
Image from camera
Ethernet IPv4
SomeIP
UDP UDP Datagram
Ethernet IPv4 fragment
Ethernet IPv4 fragment
Ethernet IPv4 fragment
Ethernet IPv4
SomeIP
UDP UDP Datagram
…
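How one large payload, such as a camera image, ends up as several fragments and is put back together can be sketched as follows. The sketch is simplified: offsets are counted in bytes rather than in the 8-byte units IPv4 uses, headers are omitted, and the Fragment type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Fragment:
    datagram_id: int      # identifies the original datagram
    offset: int           # byte offset of this piece within the datagram
    more_fragments: bool  # False only on the last piece
    data: bytes

def fragment(datagram_id: int, payload: bytes, max_size: int) -> List[Fragment]:
    """Cut a payload into pieces of at most max_size bytes."""
    pieces = []
    for offset in range(0, len(payload), max_size):
        chunk = payload[offset:offset + max_size]
        last = offset + max_size >= len(payload)
        pieces.append(Fragment(datagram_id, offset, not last, chunk))
    return pieces

def reassemble(fragments: List[Fragment]) -> Optional[bytes]:
    """Return the payload, or None if any piece is missing."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    if not ordered or ordered[-1].more_fragments:
        return None                      # last piece missing
    expected = 0
    for frag in ordered:
        if frag.offset != expected:
            return None                  # gap or overlap in the byte range
        expected += len(frag.data)
    return b"".join(f.data for f in ordered)

image = bytes(range(256)) * 10           # stand-in for a camera image
pieces = fragment(1, image, max_size=1000)
```

Reassembly only succeeds once every piece is present, which is the completeness requirement the defragmentation step has to satisfy.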
23. DATA STRUCTURE FRAGMENTATION
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
In ~2% of the cases, data overlaps over multiple files
……
12 1A90
…
~98% of the data structures are within one MDF4 file
Image from camera
24. WHY NOT USE WHAT IS ALREADY AVAILABLE
Reduce-by-key / group-by-key will shuffle most / all fragments.
The function applied to each group still produces a huge result volume (partial image).
Defragmentation requires completeness; incomplete, partially defragmented results might require another shuffle.
Works for aggregation:
Key Value
A 12
A 23
B 54
A 47
B 24
Key Sum
A 82
B 78
This will not get us a result (values are image fragments):
Key Value
A
A
A
Key Sum
A
What if something
is missing?
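The contrast between aggregation and defragmentation can be reproduced in a few lines. The sketch uses plain Python dictionaries instead of Spark, and the fragment tuples (key, index, data) are a hypothetical representation.

```python
from collections import defaultdict

def sum_by_key(pairs):
    """Aggregation: any subset of values can be combined in any order."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def assemble_by_key(fragments, expected_count):
    """Defragmentation: a key is only usable once all its fragments are present.

    fragments: iterable of (key, index, data); indices are assumed to be unique per key.
    Returns (complete payloads by key, present indices of incomplete keys).
    """
    groups = defaultdict(dict)
    for key, index, data in fragments:
        groups[key][index] = data
    complete, incomplete = {}, {}
    for key, parts in groups.items():
        if len(parts) == expected_count:
            complete[key] = b"".join(parts[i] for i in sorted(parts))
        else:
            incomplete[key] = sorted(parts)
    return complete, incomplete

sums = sum_by_key([("A", 12), ("A", 23), ("B", 54), ("A", 47), ("B", 24)])
done, pending = assemble_by_key([("A", 0, b"ab"), ("A", 2, b"ef")], expected_count=3)
```

A partial sum is still a valid intermediate result, while a partial image is not, which is why a missing fragment changes the problem.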
25. DEFRAGMENTATION PROCESS
(example: completeness = 4 fragments)
RDD #1: RDD with fragments
RDD #2: create local-complete RDD
RDD #3: reduce local-complete RDD
RDD #4: create local-incomplete RDD from remaining fragments
RDD #5: reduce local-incomplete RDD
Union #3 and #5, discard remaining incomplete fragments
[Diagram: fragments #1 to #4 of Message #1 distributed over Partition #1 and Partition #2 on two executors]
26. SHUFFLE RESULTS: EXAMPLE
This example result shows limited shuffling
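Why shuffling stays limited can be sketched with the defragmentation steps from the previous slide as they read in the diagram: assemble what is complete inside each partition first, then bring together only the leftovers. Python lists stand in for RDD partitions, the fragment tuples are hypothetical, and this is an interpretation of the slide rather than the authors' Spark code.

```python
from collections import defaultdict

# A fragment is (message_id, index, total, data); a partition is a list of fragments.

def _assemble(fragments):
    """Group fragments by message and split them into complete messages and leftovers."""
    groups = defaultdict(dict)
    for msg_id, index, total, data in fragments:
        groups[msg_id][index] = (total, data)
    complete, leftovers = {}, []
    for msg_id, parts in groups.items():
        total = next(iter(parts.values()))[0]
        if len(parts) == total:
            complete[msg_id] = b"".join(parts[i][1] for i in sorted(parts))
        else:
            leftovers.extend((msg_id, i, t, d) for i, (t, d) in parts.items())
    return complete, leftovers

def defragment(partitions):
    """Return (messages, number of fragments that had to be shuffled)."""
    messages, remaining = {}, []
    # Phase 1: assemble inside each partition, no data leaves the partition.
    for partition in partitions:
        complete, leftovers = _assemble(partition)
        messages.update(complete)
        remaining.extend(leftovers)
    # Phase 2: only the leftovers are brought together (the "shuffle").
    complete, _discarded = _assemble(remaining)
    messages.update(complete)
    return messages, len(remaining)

partitions = [
    [("m1", 0, 2, b"ab"), ("m1", 1, 2, b"cd"), ("m2", 0, 2, b"ef")],
    [("m2", 1, 2, b"gh"), ("m3", 0, 1, b"ij"), ("m4", 0, 2, b"kl")],
]
messages, shuffled = defragment(partitions)
```

In this toy input three of six fragments move between partitions; with about 98% of structures contained in a single file, the share that needs the second phase would be far smaller.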
27. HOW WE SERVE DATA
28. THE DATA
Speed Weather Environment
85 km/h Rainy Highway
30 km/h Sunny Urban
V1
29. LIDAR
LIDAR: Light Detection and Ranging
Good for generating a precise 3D map
Not reliable during bad weather conditions
30. RADAR
Long and short range in the car
Good for detecting moving objects
Reliable during bad weather conditions
31. IMAGE
RCCC Image
RCCC format, compressed or uncompressed
Good for object recognition (traffic lights, street signs, lane lines)
32. WHO ARE THE DATA USERS
Machine Learning
Engineer
Software Engineer
Algorithm Developer
Robotics Engineer
Applied Scientist
33. WHICH DATA USERS ARE INTERESTED IN
WANTED (Parquet or ORC): Highway drive on a rainy day
WANTED (jpeg): Camera images
WANTED (Rosbag): Sensory data, IMU, GPS
WANTED (DF or DS): Lidar and radar data
WANTED (HDF5): Urban drive with traffic lights
34. WHAT OUR USERS DO WITH THAT
Building driving strategy
Signal processing, sensor fusion
Sensor validation
Simulation
35. OUR PHILOSOPHY FOR DATA PROVISIONING
Evangelize data driven development
Big data trainings
Onboarding new users to use our cluster
Abstract away data cluster complexity but also allow users to develop on top of it
36. DATA ACCESS CHALLENGES
Scalable way of accessing big data
Continuously changing data structure makes it harder to work with data
Variety and complexity of data and their formats
Data centers across the world and data shipping (in case privacy is not affected)
37. DATA ACCESS
Hadoop / HDFS
Speed Weather
25 km/h Sunny
60 km/h Sunny
Meta store
Hadoop / Spark / …
Data search API
RDD / DF
speed Weather front_camera_image
60 km/h Sunny
55 km/h Sunny
select (speed, front_camera_image) where (weather=sunny and speed > 50)
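A query of this shape can be mimicked on an in-memory list to show what the data search API is expected to return. The search function, the condition tuples and the image references are hypothetical. The actual API is not shown in the slides.

```python
import operator

# Hypothetical meta store rows; the image column holds a reference, not pixel data.
ROWS = [
    {"speed": 25, "weather": "sunny", "front_camera_image": "img-001"},
    {"speed": 60, "weather": "sunny", "front_camera_image": "img-002"},
    {"speed": 55, "weather": "sunny", "front_camera_image": "img-003"},
    {"speed": 85, "weather": "rainy", "front_camera_image": "img-004"},
]

OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def search(rows, columns, conditions):
    """Return the selected columns of every row that satisfies all conditions.

    conditions: list of (column, op, value) with op one of "=", ">", "<".
    """
    hits = []
    for row in rows:
        if all(OPS[op](row[col], value) for col, op, value in conditions):
            hits.append({col: row[col] for col in columns})
    return hits

result = search(
    ROWS,
    columns=["speed", "front_camera_image"],
    conditions=[("weather", "=", "sunny"), ("speed", ">", 50)],
)
```

The filter runs on the small metadata rows, and only the matching rows would then be joined with the large sensor payloads.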
38. HOW WE ENSURE DATA QUALITY
39. WHY DATA QUALITY IS IMPORTANT
We don’t want to waste time and resources by having unnecessary test drives
We don’t want to store data that users cannot use
We don’t want to provide bad data
40. IT’S ALL ABOUT …
The GOOD The BAD The UGLY
41. WHAT COULD POSSIBLY GO WRONG
Logger
Image Frame drops
Calibration Errors
Configuration Errors
Corrupted sensory data
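One of the failure modes listed above, image frame drops, lends itself to a simple check on capture timestamps. The sketch below is an illustration only: the function name, the tolerance parameter and the idea of checking gaps against an expected frame interval are assumptions, not a description of the authors' framework.

```python
def find_frame_drops(timestamps, frame_interval, tolerance=0.5):
    """Return (previous_ts, next_ts, missing_frames) for every gap larger than expected.

    timestamps: sorted capture times; frame_interval: expected spacing in the same unit.
    A gap counts as a drop when it exceeds frame_interval * (1 + tolerance).
    """
    drops = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        gap = curr - prev
        if gap > frame_interval * (1 + tolerance):
            missing = round(gap / frame_interval) - 1
            drops.append((prev, curr, missing))
    return drops

# Timestamps in milliseconds at an assumed ~30 frames per second; two frames are missing.
capture_times = [0, 33, 66, 165, 198]
drops = find_frame_drops(capture_times, frame_interval=33)
```

A check like this can run per file right after ingest, so a bad drive is flagged before anyone builds a data set on it.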
42. WHICH DATA IS INTERESTING TO USERS
Highway / urban drives
Drives at night
Drives on rainy days
Drives through crossroads
43. ENSURING DATA QUALITY
Centralized data quality framework
Built on top of Spark
Kafka for inter-application communication
44. CUSTOM INPUT DISCRETIZED STREAM IMPLEMENTATION
CustomInputDStream
InputDStream
Creates a new RDD once new data is available
Uses streaming scheduler to run continuously
Triggered once a new message is sent
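The behaviour described above can be modelled in plain Python. In Spark Streaming, InputDStream is an abstract class whose subclasses implement compute(validTime) and return an optional RDD for each batch. The class below is a hypothetical stand-in that returns a list instead of an RDD and is not the authors' CustomInputDStream.

```python
import queue

class PollingInputStream:
    """Hypothetical stand-in for a custom input stream; not Spark's InputDStream API."""

    def __init__(self):
        self._messages = queue.Queue()

    def notify(self, message):
        """Called when a new message (for example, 'file X was ingested') arrives."""
        self._messages.put(message)

    def compute(self, valid_time):
        """Called once per batch interval; returns a batch, or None if nothing is new."""
        batch = []
        while True:
            try:
                batch.append(self._messages.get_nowait())
            except queue.Empty:
                break
        return (valid_time, batch) if batch else None

stream = PollingInputStream()
first = stream.compute(valid_time=0)      # nothing has arrived yet
stream.notify("file set 1, file #1")
stream.notify("file set 1, file #2")
second = stream.compute(valid_time=1)     # both messages form one batch
third = stream.compute(valid_time=2)      # queue was drained by the previous call
```

Returning nothing for an empty interval lets the scheduler keep running continuously without producing empty jobs.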
45. DATA QUALITY FRAMEWORK
HDFS
Hadoop / HDFS
Header : car 1, drive 1
file set 1, file #1, …
0E 12 1A
Header : car 1, drive 1
file set 2, file #1, …
00 01 2A
46. DATA DRIVEN DEVELOPMENT FOR AD @ BMW
Several TB/h
Up to 500 PB/a
ML Experiments/Training
Test drives
Data Ingest to Data Center
Organize
Structure
KPI report
Deployment of
trained algorithms
ML data sets
Phase out /
Balance datasets
Combinatorial boost of scenarios
Synthetic data
Focus of this talk
47. WE ARE HIRING
The BMW AD organization is growing!
Visit our booth :)
We are also at Strata London in May
Autonomous Driving Campus
We got PB data!