Data-Driven Challenges in AI: Scale, Information Selection, and Safety
Anna Choromanska
New York University, ECE Department, Tandon School of Engineering
Talk dedicated to my son, Marcin Tadeusz.
Characteristics of modern data
Data size / multi-modality / safety:
The amount of available digital data is doubling every two years; by 2020 the amount of data we create and copy annually will reach 44 zettabytes. (EMC Digital Universe study)
The data comes from multiple modalities such as LiDARs (point clouds), cameras (images), natural language (text, speech), ...
Data can be safe or else anomalous/corrupted/adversarial.
Challenges driven by modern data
Data-driven challenges in AI:
scale: how to build AI systems at scale?
information selection: how to process data effectively, i.e., choose relevant data modalities/portions and avoid wasteful computation?
safety: how to verify and trust the data?
eXtreme classification
The eXtreme classification problem
Problem setting:
multi-class classification: each data point is assigned one label
multi-label classification: each data point is assigned a subset of labels
Applications:
search engines
targeted advertising
aggregation and categorization of online news stories
...
Goal: a good predictor with logarithmic training and testing time. Most multi-class algorithms run in O(k) time, where k is the number of classes; the lower bound is O(log k), as the worked example below illustrates.
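To make the gap concrete (a worked example; the label count is borrowed from the Delicious-200k data set used later in the experiments):

$$\text{balanced binary tree: } \lceil \log_2 k \rceil \text{ node evaluations, e.g. } \lceil \log_2 205{,}000 \rceil = 18; \qquad \text{one-vs-all: } k = 205{,}000 \text{ score evaluations.}$$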
Tree-based classifier
Figure: h - hypothesis inducing the split, x - data point.
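A minimal sketch of how such a tree classifies a point, assuming each internal node stores a trained hypothesis that scores the children and the point descends to the best-scoring child (names and structure are illustrative, not the talk's implementation):

```python
import numpy as np

class Node:
    """A node of an M-ary label tree; leaves carry the predicted label."""
    def __init__(self, children=None, weights=None, label=None):
        self.children = children or []   # M child nodes (empty at a leaf)
        self.weights = weights           # (M, d) array: one linear scorer per child
        self.label = label               # label returned at a leaf

def predict(root, x):
    """Route x from the root to a leaf: O(depth) node evaluations,
    i.e. O(log k) for a balanced tree over k classes."""
    node = root
    while node.children:
        scores = node.weights @ x                    # h(x): one score per child
        node = node.children[int(np.argmax(scores))]
    return node.label

# Toy usage: a depth-1 binary tree over two classes in R^2.
leaf0, leaf1 = Node(label=0), Node(label=1)
root = Node(children=[leaf0, leaf1], weights=np.array([[1.0, 0.0], [0.0, 1.0]]))
print(predict(root, np.array([0.2, 0.9])))          # -> 1
```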
Pure and balanced split
Design a per-node objective function that favors:
balanced splits ⇒ efficient tree
pure splits ⇒ small classification error
Objective function

$$J := \underbrace{\sum_{j=1}^{M}\sum_{l=j+1}^{M}\left|P_j - P_l\right|}_{\text{balancing term}} \underbrace{{}-\lambda_1\underbrace{\sum_{y=1}^{K}\pi_y\sum_{j=1}^{M}\sum_{l=j+1}^{M}\left|P_j^y - P_l^y\right|}_{\text{class integrity term}} + \lambda_2\underbrace{\Big(\sum_{j=1}^{M}P_j - 1\Big)}_{\text{multi-way penalty}}}_{\text{purity term}\;\in\;[-\lambda_1,\,\lambda_2]}$$

Here M is the arity of the split, K is the number of labels, \pi_y is the prior of label y, P_j is the probability that an example is sent to child j, and P_j^y is the same probability restricted to examples carrying label y.

J ⇒ splitting criterion (objective function): given a set of n examples, each carrying one of k labels (multi-class) or a subset of them (multi-label), find a partitioner h that minimizes J.
Decreasing J leads to more pure and more balanced splits ⇒ efficient trees with logarithmic depth.
Decreasing J reduces the tree error ⇒ small-error trees.
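A minimal sketch of evaluating J for a candidate split, assuming hard multi-way assignments and empirical estimates of P_j and P_j^y from the examples reaching the node (illustrative code, not the talk's implementation; the multi-class case is shown for simplicity):

```python
import numpy as np

def objective_J(assign, labels, lam1=1.0, lam2=1.0):
    """assign: (n, M) boolean array; assign[i, j] is True when example i is
    sent to child j (an example may be sent to several children).
    labels: (n,) integer label per example."""
    n, M = assign.shape
    K = int(labels.max()) + 1
    P = assign.mean(axis=0)                             # P_j (may sum to > 1)
    pi = np.bincount(labels, minlength=K) / n           # label priors pi_y
    Py = np.zeros((K, M))                               # P_j^y
    for y in range(K):
        mask = labels == y
        if mask.any():
            Py[y] = assign[mask].mean(axis=0)
    pairs = [(j, l) for j in range(M) for l in range(j + 1, M)]
    balance = sum(abs(P[j] - P[l]) for j, l in pairs)
    integrity = sum(pi[y] * abs(Py[y, j] - Py[y, l])
                    for y in range(K) for j, l in pairs)
    multiway = P.sum() - 1.0    # zero when each example goes to exactly one child
    return balance - lam1 * integrity + lam2 * multiway

# Perfectly balanced and pure binary split of four examples, two classes:
assign = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=bool)
labels = np.array([0, 0, 1, 1])
print(objective_J(assign, labels))   # 0 - lam1*1 + 0 = -1.0
```

As the toy usage shows, a perfectly balanced and pure split attains the minimum of the balancing term and the maximum of the class-integrity term simultaneously.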
Objective properties
J extends to trees of arbitrary arity.
J can easily be optimized with SGD (sketched after this list).
J leads to an algorithm for tree construction and training that runs online.
The approach accommodates classification as well as density estimation problems.
J can be used to learn both the label partitioning and the data representation simultaneously!
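A heavily hedged sketch of the online flavor, not the talk's algorithm: keep running routing statistics, assign each arriving example to the child that keeps J small, and nudge the node's scorer toward that assignment by an SGD-style step. Only the balancing term is shown; the class-integrity statistics would be maintained analogously.

```python
import numpy as np

def node_step(W, stats, x, lr=0.01):
    """One streaming step at a node: route x to the child that keeps the
    running child-size estimates P_j most balanced, then nudge the linear
    scorer W toward that routing (multiclass perceptron-style update)."""
    M = stats["cnt"].shape[0]
    stats["n"] += 1
    cands = [stats["cnt"] + np.eye(M, dtype=int)[j] for j in range(M)]
    def balance(c):
        P = c / stats["n"]
        return sum(abs(P[a] - P[b]) for a in range(M) for b in range(a + 1, M))
    target = int(np.argmin([balance(c) for c in cands]))
    stats["cnt"] = cands[target]
    pred = int(np.argmax(W @ x))
    if pred != target:                # push the scorer toward the chosen child
        W[target] += lr * x
        W[pred] -= lr * x
    return target

# Running state for a binary node over 3-dimensional inputs.
W = np.zeros((2, 3))
stats = {"n": 0, "cnt": np.zeros(2, dtype=int)}
for x in np.random.randn(100, 3):
    node_step(W, stats, x)
print(stats["cnt"])   # roughly balanced counts, e.g. [50 50]
```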
Deep eXtreme classification
Deep representation learning: with an extreme number of output labels, computation in the last layer can blow up...
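A quick back-of-the-envelope illustration (the representation width of 512 is an assumption; the label count matches Delicious-200k):

```python
# Size of a dense output layer versus a tree for k labels.
d, k, M = 512, 205_000, 2        # representation width (assumed), labels, arity
dense_params = d * k             # weights in the softmax layer alone
dense_flops = 2 * d * k          # multiply-adds per prediction
depth = 18                       # ceil(log2(205_000))
tree_flops = 2 * d * M * depth   # one M-way linear scorer per level
print(f"{dense_params:,} params, {dense_flops:,} vs {tree_flops:,} FLOPs")
# 104,960,000 params, 209,920,000 vs 36,864 FLOPs
```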
Experiments: classification

Table: Precision P@1, P@3, P@5 (%) and nDCG scores N@1, N@3, N@5 (%) obtained by OAA, LPSR, FastXML, PFastreXML, and LdSM (d, M) with tree depth d and arity M.

Delicious-200k (N = 197k, D = 783k, K = 205k)
Algorithm       P@1    P@3    P@5    N@1    N@3    N@5
LPSR            18.59  15.43  14.07  18.59  16.17  15.13
FastXML         43.07  38.66  36.19  43.07  39.70  37.83
PFastreXML      41.72  37.83  35.58  41.72  38.76  37.08
LdSM (35,2)     43.40  39.80  37.75  43.40  40.66  39.11

Table: Prediction time [ms] per example for FastXML, PFastreXML, and LdSM on the AmazonCat, Wiki10, and Delicious-200k data sets.

                FastXML  PFastreXML  LdSM
AmazonCat       1.21     1.34        0.49
Wiki10          3.00     NA          1.21
Delicious-200k  1.28     7.40        1.30
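For reference, a small sketch of the two metrics in the tables, assuming the model returns labels ranked by score and relevance is binary (an illustrative helper, not the evaluation code behind the slides):

```python
import numpy as np

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked labels that are relevant (P@k)."""
    return sum(l in relevant for l in ranked[:k]) / k

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance (N@k)."""
    dcg = sum(1.0 / np.log2(i + 2) for i, l in enumerate(ranked[:k]) if l in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Example: true labels {3, 7}; the model ranks label 3 first, 5 second, 7 third.
print(precision_at_k([3, 5, 7], {3, 7}, 3))   # 0.666...
print(ndcg_at_k([3, 5, 7], {3, 7}, 3))        # ~0.92
```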
Sensor selection for autonomous driving
The sensor selection problem for autonomous driving
Problem setting:
autonomous car equipped with multiple sensors
end-to-end training framework
steering command: the only available supervision
Goal:
avoid a fast increase of computational complexity with the number of sensing devices
activate feature extractors for relevant inputs only
avoid overfitting to the simplest and most informative input
guarantee real-time operation
allow both discrete and continuous data selection (see the gating sketch below)
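A minimal sketch of the selection idea, assuming per-sensor feature extractors and a lightweight gating module that outputs either soft (continuous) weights or a hard (discrete) choice of which extractor to run; the structure and names are illustrative, not the exact network from the talk:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def gated_forward(inputs, extractors, gate, head, hard=True):
    """inputs: list of raw sensor arrays; extractors: one feature net per sensor;
    gate: cheap net scoring each sensor's relevance from coarse summaries;
    head: maps fused features to a steering command."""
    weights = softmax(gate(inputs))                  # continuous selection weights
    if hard:
        j = int(np.argmax(weights))                  # discrete policy: pick one sensor,
        fused = extractors[j](inputs[j])             # run only its extractor
    else:
        fused = sum(w * f(x) for w, f, x in zip(weights, extractors, inputs))
    return head(fused)

# Toy usage with stand-in callables (two "sensors" with 4-dim inputs):
extractors = [lambda x: x * 2, lambda x: x + 1]
gate = lambda ins: np.array([ins[0].mean(), ins[1].mean()])
head = lambda f: float(f.sum())
print(gated_forward([np.ones(4), np.zeros(4)], extractors, gate, head))  # 8.0
```

In the hard branch only the chosen extractor ever executes, which is where the FLOP savings reported in the table below come from.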
Hardware
Figure: Block diagram of the autonomous platform.
Traxxas X-Maxx remote-control truck (RC car, 1/6 scale)
DrivePX2 for computations
three SEKONIX AR0231 GMSL cameras facing the front of the platform and covering non-overlapping views; each camera has a 60-degree horizontal field of view
Velodyne VLP-16 LiDAR with 16 lasers covering a 30-degree vertical and a 360-degree horizontal FOV
Approach: multi-modality and mixed policy
Figure: The architecture of the reconfigurable network.
Figure: Different stages of training.
Experiments: multi-modality and mixed policy

Table: Computational complexity of the different networks.
Network                                               FLOPs
LiDAR only                                            26.17M
LiDAR with gating                                     14.11M
Single camera                                         25.38M
Three cameras                                         76.01M
Three cameras and LiDAR                               102.49M
Three cameras and LiDAR with gating                   90.08M
Multi-modal experts network (chosen sensor: LiDAR)    17.28M
Multi-modal experts network (chosen sensor: camera)   29.61M
Safety in autonomous driving
The problem of safety in autonomous driving
Problem setting:
autonomous car instrumented with cameras and LiDAR and controlled by an end-to-end learning system
Goal:
develop an online monitoring framework for continuous, real-time safety in learning-based control systems
monitor the validity of the mapping from sensor inputs to actuator commands (a monitoring sketch follows the CEBGAN figure below)
CEBGAN for safety in autonomous driving
Figure: Conditional energy-based generative adversarial network (CEBGAN) framework for controller-focused anomaly detection (CFAM).
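A hedged sketch of how an energy-based model can serve as the monitor, assuming the network assigns low energy to (sensor input, steering command) pairs seen in safe training data and higher energy to anomalous ones; the energy function below is a stand-in, not the CEBGAN from the talk:

```python
import numpy as np

def calibrate_threshold(energy_fn, safe_pairs, q=0.99):
    """Set the alarm threshold at a high quantile of energies on safe data."""
    return float(np.quantile([energy_fn(x, u) for x, u in safe_pairs], q))

def monitor_step(energy_fn, sensor_input, command, threshold):
    """Flag the current control step when the (input, command) pair has
    higher energy than anything typical of safe operation."""
    e = energy_fn(sensor_input, command)
    return e > threshold, e

# Toy stand-in: energy = squared mismatch between command and a nominal map.
energy = lambda x, u: (u - x.mean()) ** 2
safe = [(np.random.rand(8), 0.5 + 0.01 * np.random.randn()) for _ in range(1000)]
thr = calibrate_threshold(energy, safe)
print(monitor_step(energy, np.random.rand(8), 5.0, thr))   # (True, ...) anomalous
```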
Experiments
Figure: Safe operation of the autonomous platform.
Figure: Anomalous operation of the autonomous platform.
Summary
Summary and Future Directions
Discussed approaches:
scale: using decision trees to scale AI systems to large data sizes
information selection: using reconfigurable networks to select relevant data
safety: using GANs to monitor the system's safety
Future directions:
logarithmic-space framework, modern recommendation systems, other applications
scaling information selection algorithms to a large number of inputs
ambiguous scenarios, increasing the system's robustness
practical sample complexity bounds
Research Group
Many thanks to the NVIDIA Autonomous Driving Team in New Jersey!!!
NYU Tandon ECE Seminar Series on Modern AI
DOORS ARE OPEN TO EVERYBODY!!!
Past speakers: Yann LeCun, Yoshua Bengio, Stefano Soatto, Vladimir Vapnik, David Blei, Richard J. Roberts, Anima Anandkumar, Martial Hebert, Tony Jebara
Confirmed future speakers: Manuela Veloso, Eric Kandel, Francis Bach, Raia Hadsell, Léon Bottou, Michael Kearns, Nicolò Cesa-Bianchi