Data-Driven Challenges in AI: Scale, Information Selection, and Safety
Anna Choromanska
New York University, ECE Department, Tandon School of Engineering
Talk dedicated to my son, Marcin Tadeusz.
Characteristics of modern data
Data size / multi-modality / safety:
The amount of available digital data is doubling every two years; by 2020 the amount of data we create and copy annually will reach 44 zettabytes. (EMC Digital Universe study)
The data comes from multiple modalities such as LiDARs (point clouds), cameras (images), natural language (text, speech), ...
Data can be safe or else anomalous/corrupted/adversarial.
Challenges driven by modern data
Data-driven challenges in AI:
scale: how to build AI systems at scale?
information selection: how to process data effectively, i.e., choose relevant data modalities/portions and avoid wasteful computation?
safety: how to verify and trust the data?
eXtreme classification
The eXtreme classification problem
Problem setting:
multi-class classification: each data point is assigned one label
multi-label classification: each data point is assigned a subset of labels
Applications:
search engines
targeted advertising
aggregation and categorization of online news stories
...
Goal: a good predictor with logarithmic training and testing time. Most multi-class algorithms run in O(k) time, where k is the number of classes; the lower bound is O(log k), as the worked example below illustrates.
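To make the gap concrete (a worked example; the label count is borrowed from the Delicious-200k data set used later in the experiments):

$$\text{balanced binary tree: } \lceil \log_2 k \rceil \text{ node evaluations, e.g. } \lceil \log_2 205{,}000 \rceil = 18; \qquad \text{one-vs-all: } k = 205{,}000 \text{ score evaluations.}$$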
Tree-based classifier
Figure: h - hypothesis inducing the split, x - data point.
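A minimal sketch of how such a tree classifies a point, assuming each internal node stores a trained hypothesis that scores the children and the point descends to the best-scoring child (names and structure are illustrative, not the talk's implementation):

```python
import numpy as np

class Node:
    """A node of an M-ary label tree; leaves carry the predicted label."""
    def __init__(self, children=None, weights=None, label=None):
        self.children = children or []   # M child nodes (empty at a leaf)
        self.weights = weights           # (M, d) array: one linear scorer per child
        self.label = label               # label returned at a leaf

def predict(root, x):
    """Route x from the root to a leaf: O(depth) node evaluations,
    i.e. O(log k) for a balanced tree over k classes."""
    node = root
    while node.children:
        scores = node.weights @ x                    # h(x): one score per child
        node = node.children[int(np.argmax(scores))]
    return node.label

# Toy usage: a depth-1 binary tree over two classes in R^2.
leaf0, leaf1 = Node(label=0), Node(label=1)
root = Node(children=[leaf0, leaf1], weights=np.array([[1.0, 0.0], [0.0, 1.0]]))
print(predict(root, np.array([0.2, 0.9])))          # -> 1
```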
Pure and balanced split
Design a per-node objective function that favors:
balanced splits ⇒ efficient tree
pure splits ⇒ small classification error
Objective function

$$J := \underbrace{\sum_{j=1}^{M}\sum_{l=j+1}^{M}\left|P_j - P_l\right|}_{\text{balancing term}} \underbrace{{}-\lambda_1\underbrace{\sum_{y=1}^{K}\pi_y\sum_{j=1}^{M}\sum_{l=j+1}^{M}\left|P_j^y - P_l^y\right|}_{\text{class integrity term}} + \lambda_2\underbrace{\Big(\sum_{j=1}^{M}P_j - 1\Big)}_{\text{multi-way penalty}}}_{\text{purity term}\;\in\;[-\lambda_1,\,\lambda_2]}$$

Here M is the arity of the split, K is the number of labels, \pi_y is the prior of label y, P_j is the probability that an example is sent to child j, and P_j^y is the same probability restricted to examples carrying label y.

J ⇒ splitting criterion (objective function): given a set of n examples, each carrying one of k labels (multi-class) or a subset of them (multi-label), find a partitioner h that minimizes J.
Decreasing J leads to more pure and more balanced splits ⇒ efficient trees with logarithmic depth.
Decreasing J reduces the tree error ⇒ small-error trees.
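A minimal sketch of evaluating J for a candidate split, assuming hard multi-way assignments and empirical estimates of P_j and P_j^y from the examples reaching the node (illustrative code, not the talk's implementation; the multi-class case is shown for simplicity):

```python
import numpy as np

def objective_J(assign, labels, lam1=1.0, lam2=1.0):
    """assign: (n, M) boolean array; assign[i, j] is True when example i is
    sent to child j (an example may be sent to several children).
    labels: (n,) integer label per example."""
    n, M = assign.shape
    K = int(labels.max()) + 1
    P = assign.mean(axis=0)                             # P_j (may sum to > 1)
    pi = np.bincount(labels, minlength=K) / n           # label priors pi_y
    Py = np.zeros((K, M))                               # P_j^y
    for y in range(K):
        mask = labels == y
        if mask.any():
            Py[y] = assign[mask].mean(axis=0)
    pairs = [(j, l) for j in range(M) for l in range(j + 1, M)]
    balance = sum(abs(P[j] - P[l]) for j, l in pairs)
    integrity = sum(pi[y] * abs(Py[y, j] - Py[y, l])
                    for y in range(K) for j, l in pairs)
    multiway = P.sum() - 1.0    # zero when each example goes to exactly one child
    return balance - lam1 * integrity + lam2 * multiway

# Perfectly balanced and pure binary split of four examples, two classes:
assign = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=bool)
labels = np.array([0, 0, 1, 1])
print(objective_J(assign, labels))   # 0 - lam1*1 + 0 = -1.0
```

As the toy usage shows, a perfectly balanced and pure split attains the minimum of the balancing term and the maximum of the class-integrity term simultaneously.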
Objective properties
J extends to trees of arbitrary arity.
J can easily be optimized with SGD (sketched after this list).
J leads to an algorithm for tree construction and training that runs online.
The approach accommodates classification as well as density estimation problems.
J can be used to learn both the label partitioning and the data representation simultaneously!
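A heavily hedged sketch of the online flavor, not the talk's algorithm: keep running routing statistics, assign each arriving example to the child that keeps J small, and nudge the node's scorer toward that assignment by an SGD-style step. Only the balancing term is shown; the class-integrity statistics would be maintained analogously.

```python
import numpy as np

def node_step(W, stats, x, lr=0.01):
    """One streaming step at a node: route x to the child that keeps the
    running child-size estimates P_j most balanced, then nudge the linear
    scorer W toward that routing (multiclass perceptron-style update)."""
    M = stats["cnt"].shape[0]
    stats["n"] += 1
    cands = [stats["cnt"] + np.eye(M, dtype=int)[j] for j in range(M)]
    def balance(c):
        P = c / stats["n"]
        return sum(abs(P[a] - P[b]) for a in range(M) for b in range(a + 1, M))
    target = int(np.argmin([balance(c) for c in cands]))
    stats["cnt"] = cands[target]
    pred = int(np.argmax(W @ x))
    if pred != target:                # push the scorer toward the chosen child
        W[target] += lr * x
        W[pred] -= lr * x
    return target

# Running state for a binary node over 3-dimensional inputs.
W = np.zeros((2, 3))
stats = {"n": 0, "cnt": np.zeros(2, dtype=int)}
for x in np.random.randn(100, 3):
    node_step(W, stats, x)
print(stats["cnt"])   # roughly balanced counts, e.g. [50 50]
```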
Deep eXtreme classification
Deep representation learning: with an extreme number of output labels, computation in the last layer can blow up...
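A quick back-of-the-envelope illustration (the representation width of 512 is an assumption; the label count matches Delicious-200k):

```python
# Size of a dense output layer versus a tree for k labels.
d, k, M = 512, 205_000, 2        # representation width (assumed), labels, arity
dense_params = d * k             # weights in the softmax layer alone
dense_flops = 2 * d * k          # multiply-adds per prediction
depth = 18                       # ceil(log2(205_000))
tree_flops = 2 * d * M * depth   # one M-way linear scorer per level
print(f"{dense_params:,} params, {dense_flops:,} vs {tree_flops:,} FLOPs")
# 104,960,000 params, 209,920,000 vs 36,864 FLOPs
```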
Experiments: classification

Table: Precision P@1, P@3, P@5 (%) and nDCG scores N@1, N@3, N@5 (%) obtained by OAA, LPSR, FastXML, PFastreXML, and LdSM (d, M) with tree depth d and arity M.

Delicious-200k (N = 197k, D = 783k, K = 205k)
Algorithm       P@1    P@3    P@5    N@1    N@3    N@5
LPSR            18.59  15.43  14.07  18.59  16.17  15.13
FastXML         43.07  38.66  36.19  43.07  39.70  37.83
PFastreXML      41.72  37.83  35.58  41.72  38.76  37.08
LdSM (35,2)     43.40  39.80  37.75  43.40  40.66  39.11

Table: Prediction time [ms] per example for FastXML, PFastreXML, and LdSM on the AmazonCat, Wiki10, and Delicious-200k data sets.

                FastXML  PFastreXML  LdSM
AmazonCat       1.21     1.34        0.49
Wiki10          3.00     NA          1.21
Delicious-200k  1.28     7.40        1.30
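For reference, a small sketch of the two metrics in the tables, assuming the model returns labels ranked by score and relevance is binary (an illustrative helper, not the evaluation code behind the slides):

```python
import numpy as np

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked labels that are relevant (P@k)."""
    return sum(l in relevant for l in ranked[:k]) / k

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance (N@k)."""
    dcg = sum(1.0 / np.log2(i + 2) for i, l in enumerate(ranked[:k]) if l in relevant)
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Example: true labels {3, 7}; the model ranks label 3 first, 5 second, 7 third.
print(precision_at_k([3, 5, 7], {3, 7}, 3))   # 0.666...
print(ndcg_at_k([3, 5, 7], {3, 7}, 3))        # ~0.92
```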
Sensor selection for autonomous driving
The sensor selection problem for autonomous driving
Problem setting:
autonomous car equipped with multiple sensors
end-to-end training framework
steering command: the only available supervision
Goal:
avoid a fast increase of computational complexity with the number of sensing devices
activate feature extractors for relevant inputs only
avoid overfitting to the simplest and most informative input
guarantee real-time operation
allow both discrete and continuous data selection (see the gating sketch below)
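A minimal sketch of the selection idea, assuming per-sensor feature extractors and a lightweight gating module that outputs either soft (continuous) weights or a hard (discrete) choice of which extractor to run; the structure and names are illustrative, not the exact network from the talk:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def gated_forward(inputs, extractors, gate, head, hard=True):
    """inputs: list of raw sensor arrays; extractors: one feature net per sensor;
    gate: cheap net scoring each sensor's relevance from coarse summaries;
    head: maps fused features to a steering command."""
    weights = softmax(gate(inputs))                  # continuous selection weights
    if hard:
        j = int(np.argmax(weights))                  # discrete policy: pick one sensor,
        fused = extractors[j](inputs[j])             # run only its extractor
    else:
        fused = sum(w * f(x) for w, f, x in zip(weights, extractors, inputs))
    return head(fused)

# Toy usage with stand-in callables (two "sensors" with 4-dim inputs):
extractors = [lambda x: x * 2, lambda x: x + 1]
gate = lambda ins: np.array([ins[0].mean(), ins[1].mean()])
head = lambda f: float(f.sum())
print(gated_forward([np.ones(4), np.zeros(4)], extractors, gate, head))  # 8.0
```

In the hard branch only the chosen extractor ever executes, which is where the FLOP savings reported in the table below come from.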
Hardware
Figure: Block diagram of the autonomous platform.
Traxxas X-Maxx remote-control truck (RC car, 1/6 scale)
DrivePX2 for computations
three SEKONIX AR0231 GMSL cameras facing the front of the platform and covering non-overlapping views; each camera has a 60-degree horizontal field of view
Velodyne VLP-16 LiDAR with 16 lasers covering a 30-degree vertical and a 360-degree horizontal FOV
Approach: multi-modality and mixed policy
Figure: The architecture of the reconfigurable network.
Figure: Different stages of training.
Experiments: multi-modality and mixed policy

Table: Computational complexity of the different networks.
Network                                               FLOPs
LiDAR only                                            26.17M
LiDAR with gating                                     14.11M
Single camera                                         25.38M
Three cameras                                         76.01M
Three cameras and LiDAR                               102.49M
Three cameras and LiDAR with gating                   90.08M
Multi-modal experts network (chosen sensor: LiDAR)    17.28M
Multi-modal experts network (chosen sensor: camera)   29.61M
Safety in autonomous driving
The problem of safety in autonomous driving
Problem setting:
autonomous car instrumented with cameras and LiDAR and controlled by an end-to-end learning system
Goal:
develop an online monitoring framework for continuous, real-time safety in learning-based control systems
monitor the validity of the mapping from sensor inputs to actuator commands (a monitoring sketch follows the CEBGAN figure below)
CEBGAN for safety in autonomous driving
Figure: Conditional energy-based generative adversarial network (CEBGAN) framework for controller-focused anomaly detection (CFAM).
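A hedged sketch of how an energy-based model can serve as the monitor, assuming the network assigns low energy to (sensor input, steering command) pairs seen in safe training data and higher energy to anomalous ones; the energy function below is a stand-in, not the CEBGAN from the talk:

```python
import numpy as np

def calibrate_threshold(energy_fn, safe_pairs, q=0.99):
    """Set the alarm threshold at a high quantile of energies on safe data."""
    return float(np.quantile([energy_fn(x, u) for x, u in safe_pairs], q))

def monitor_step(energy_fn, sensor_input, command, threshold):
    """Flag the current control step when the (input, command) pair has
    higher energy than anything typical of safe operation."""
    e = energy_fn(sensor_input, command)
    return e > threshold, e

# Toy stand-in: energy = squared mismatch between command and a nominal map.
energy = lambda x, u: (u - x.mean()) ** 2
safe = [(np.random.rand(8), 0.5 + 0.01 * np.random.randn()) for _ in range(1000)]
thr = calibrate_threshold(energy, safe)
print(monitor_step(energy, np.random.rand(8), 5.0, thr))   # (True, ...) anomalous
```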
Experiments
Figure: Safe operation of the autonomous platform.
Figure: Anomalous operation of the autonomous platform.
Summary
Summary and Future Directions
Discussed approaches:
scale: using decision trees to scale AI systems to large data sizes
information selection: using reconfigurable networks to select relevant data
safety: using GANs to monitor the system's safety
Future directions:
logarithmic-space framework, modern recommendation systems, other applications
scaling information selection algorithms to a large number of inputs
ambiguous scenarios, increasing the system's robustness
practical sample complexity bounds
Research Group
Many thanks to the NVIDIA Autonomous Driving Team in New Jersey!!!
NYU Tandon ECE Seminar Series on Modern AI
DOORS ARE OPEN TO EVERYBODY!!!
Past speakers: Yann LeCun, Yoshua Bengio, Stefano Soatto, Vladimir Vapnik, David Blei, Richard J. Roberts, Anima Anandkumar, Martial Hebert, Tony Jebara
Confirmed future speakers: Manuela Veloso, Eric Kandel, Francis Bach, Raia Hadsell, Léon Bottou, Michael Kearns, Nicolò Cesa-Bianchi