Pedestrian Behavior/Intention
Modeling for Autonomous Driving III
Yu Huang
Yu.huang07@gmail.com
Sunnyvale, California
Outline
• SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction
• Pedestrian Path, Pose, and Intention Prediction Through Gaussian Process Dynamical
Models
• StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
• Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
• Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic
Scenes
• TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted
Interactions
• Learning to Infer Relations for Future Trajectory Forecast
• Peeking into the Future: Predicting Future Person Activities and Locations in Videos
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
• 2019.3
• In crowd scenarios, reliable trajectory prediction of pedestrians requires insightful
understanding of their social behaviors.
• These behaviors have been investigated in many studies, yet they are hard to fully
express with hand-crafted rules.
• Recent studies based on LSTM networks have shown great ability to learn social behaviors.
• However, many of these methods rely on previous neighboring hidden states but ignore the
important current intention of the neighbors.
• To address this issue, a data-driven state refinement module for LSTM networks
(SR-LSTM) is proposed, which exploits the current intention of neighbors and
jointly and iteratively refines the current states of all participants in the crowd through a
message-passing mechanism.
• To effectively extract the social effect of neighbors, a social-aware
information selection mechanism is further introduced, consisting of an element-wise motion gate and a
pedestrian-wise attention that select useful messages from neighboring pedestrians (see the sketch below).
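A minimal sketch of how such an element-wise motion gate and pedestrian-wise attention could be wired together, with toy feature sizes; this is an illustration of the idea, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SocialSelection(nn.Module):
    """Toy motion gate + pedestrian-wise attention (illustrative assumptions)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.motion_gate = nn.Linear(2 * hidden + 2, hidden)  # element-wise gate
        self.attn = nn.Linear(2 * hidden + 2, 1)              # per-pedestrian score

    def forward(self, h_i, h_neighbors, rel_pos):
        # h_i: (H,) target state; h_neighbors: (N, H); rel_pos: (N, 2)
        pair = torch.cat([h_i.expand_as(h_neighbors), h_neighbors, rel_pos], dim=-1)
        g = torch.sigmoid(self.motion_gate(pair))   # select features element-wise
        a = torch.softmax(self.attn(pair), dim=0)   # weigh each neighbor's message
        return (a * g * h_neighbors).sum(dim=0)     # aggregated social message

sel = SocialSelection()
msg = sel(torch.randn(64), torch.randn(5, 64), torch.randn(5, 2))
print(msg.shape)  # torch.Size([64])
```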
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
• Current states of neighbors are important for timely interaction inference.
When predicting for the lady at time t, considering the trajectory of the man on the right up to time t
(a), versus only up to time t − 1 (b), can cause large deviations in the predicted results (dashed lines).
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
• Useful information should be adaptively selected from neighbors, based on
their motions and locations.
(a) Activation trajectory patterns of hidden neurons in an LSTM, starting from the origin. Each trajectory
pattern, marked by a certain color, contains the trajectories from the database with the top-20 responses for that
hidden neuron. (b) An example of a three-pedestrian interaction. How will the dyad pay attention to the other
pedestrian on the left?
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
Framework overview of SR-LSTM. The state refinement module is an additional subnetwork of
the LSTM cells, which aligns pedestrians and updates their current states. The refined states
are used to predict the location at the next time step.
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
• A vanilla LSTM extracts features from each pedestrian trajectory separately.
• The main difference is the state refinement (SR) module, which refines the cell
states by passing messages among pedestrians.
• The SR module takes three information sources for all pedestrians as input: the
pedestrians' current locations, and the hidden states and cell states from the LSTM.
• The output of the SR module is the refined cell states.
• In pedestrian trajectory prediction, further refinement can improve the quality
of the interaction model, reflecting the intention negotiation inherent in human interaction.
• The motion gate and the pedestrian-wise attention jointly select the important information
from neighboring pedestrians for message passing (see the sketch below).
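A rough sketch of this message-passing refinement, with assumed sizes and a simplified update in place of the paper's exact equations:

```python
import torch
import torch.nn as nn

class StateRefinement(nn.Module):
    """Illustrative SR-style refinement: pass messages, update cell states."""
    def __init__(self, hidden=64, iters=2):
        super().__init__()
        self.msg = nn.Linear(2 * hidden + 2, hidden)
        self.update = nn.Linear(2 * hidden, hidden)
        self.iters = iters

    def forward(self, pos, h, c):
        # pos: (N, 2) locations; h: (N, H) hidden states; c: (N, H) cell states
        N = pos.shape[0]
        for _ in range(self.iters):                  # joint, iterative refinement
            rel = pos[None, :, :] - pos[:, None, :]              # (N, N, 2)
            pair = torch.cat([h[:, None].expand(N, N, -1),
                              h[None, :].expand(N, N, -1), rel], dim=-1)
            m = torch.tanh(self.msg(pair)).mean(dim=1)           # (N, H) messages
            c = c + self.update(torch.cat([c, m], dim=-1))       # refine cell states
        return c

sr = StateRefinement()
c_refined = sr(torch.randn(4, 2), torch.randn(4, 64), torch.randn(4, 64))
print(c_refined.shape)  # torch.Size([4, 64])
```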
SR-LSTM: State Refinement for LSTM
towards Pedestrian Trajectory Prediction
In SR-LSTM, the current states of pedestrians can
refine each other in a timely manner, particularly
when pedestrians change their intentions.
SR-LSTM can implicitly account for common
social behaviors, yielding plausible future
predictions and relatively low errors.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian
Process Dynamical Models and Pedestrian Activity Recognition
• 2019. 5 IEEE T-ITS
• Predictions of pedestrian paths can improve current automatic emergency braking systems.
• The goal is to predict future pedestrian paths, poses, and intentions up to 1 s in advance.
• This method is based on balanced Gaussian process dynamical models (B-GPDMs), which
reduce the 3-D time-related information extracted from key points or joints placed along
pedestrian bodies into low-dimensional spaces.
• The B-GPDM is also capable of inferring future latent positions and reconstructing their
associated observations.
• However, learning a generic model for all kinds of pedestrian activities normally provides
less accurate predictions.
• The proposed method obtains multiple models of four types of activity, i.e., walking,
stopping, starting, and standing, and selects the most similar model to estimate future
pedestrian states.
• This method detects starting activities 125 ms after gait initiation with an accuracy of 80%
and recognizes stopping intentions 58.33 ms before the event with an accuracy of 70%.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian
Process Dynamical Models and Pedestrian Activity Recognition
General description of the method based on
B-GPDMs. The algorithm is divided into
two stages: offline training (top) and online
execution (bottom).
The method learns multiple models of each
type of pedestrian activity, i.e. walking,
stopping, starting and standing, and selects
the most appropriate one to estimate future
pedestrian states at each time step.
A training dataset of motion sequences, in
which pedestrians perform different activities,
is split into 8 subsets based on typical
crossing orientations and type of activity.
A B-GPDM is obtained for each sequence
with one activity contained in the dataset.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian
Process Dynamical Models and Pedestrian Activity Recognition
• The proposed method is based on B-GPDMs, which reduce the 3D time-related
positions and displacements extracted from key points or joints placed along the
pedestrian bodies into low-dimensional latent spaces.
• The B-GPDM also has the peculiarity of inferring future latent positions and
reconstructing, from the latent space, the observation associated with a latent position.
• Therefore, it is possible to reconstruct future observations from future latent positions.
• In the online execution, given a new pedestrian observation, the current activity is
determined using an HMM.
• Thus, the selection of the most appropriate model among the trained ones is centered
solely on that activity.
• Finally, the selected model is used to predict the future latent positions and reconstruct
the future pedestrian path and poses.
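This selection-then-prediction loop could be sketched as follows; plain Euclidean template matching is used here as a deliberately simplified stand-in for the paper's HMM-based activity recognition and likelihood-based B-GPDM selection:

```python
import numpy as np

ACTIVITIES = ["walking", "stopping", "starting", "standing"]

def best_model(obs, model_bank):
    """Select the trained sequence most similar to the current observation window.

    obs: (T, D) recent joint observations; model_bank: activity -> list of (L, D)
    training sequences. Euclidean distance stands in for likelihood-based selection.
    """
    best, best_d = None, np.inf
    for activity, seqs in model_bank.items():
        for seq in seqs:
            d = np.linalg.norm(seq[: len(obs)] - obs)
            if d < best_d:
                best, best_d = (activity, seq), d
    return best

# Toy bank: two random placeholder sequences per activity.
rng = np.random.default_rng(0)
bank = {a: [rng.normal(size=(30, 6)) for _ in range(2)] for a in ACTIVITIES}
obs = rng.normal(size=(10, 6))
activity, seq = best_model(obs, bank)
prediction = seq[len(obs): len(obs) + 10]  # next poses from the matched template
print(activity, prediction.shape)          # (1 s ahead at an assumed 10 Hz)
```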
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
• 2019.6
• Pedestrian trajectory prediction is crucial for many important applications.
• This problem is a great challenge because of complicated interactions among pedestrians.
• Previous methods model only the pairwise interactions between pedestrians, which not only
oversimplifies the interactions among pedestrians but also is computationally inefficient.
• StarNet has a star topology which includes a unique hub network and multiple host
networks.
• The hub network takes observed trajectories of all pedestrians to produce a comprehensive
description of the interpersonal interactions.
• Then the host networks, each of which corresponds to one pedestrian, consult the
description and predict future trajectories.
• The star topology gives StarNet two advantages over conventional models.
• StarNet is able to consider the collective influence among all pedestrians in the hub network,
making more accurate predictions.
• StarNet is computationally efficient since the number of host networks is linear in the number of
pedestrians.
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
The structure of StarNet. StarNet mainly consists of a centralized hub network and several host
networks. The hub network collects movement information and generates a feature that
describes the joint interactions among pedestrians. Each host network, corresponding to a certain
pedestrian, queries the hub network and predicts that pedestrian’s trajectory.
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
• Pedestrian path prediction is a great challenge due to the uncertainty of future movements.
• Conventional methods tackle this problem with manually crafted features.
• Data-driven methods remove the requirement for hand-crafted features and greatly
improve the ability to predict pedestrian trajectories.
• However, existing methods compute pairwise features and thus oversimplify the
interactions in real-world environments.
• Meanwhile, they suffer from a huge computational burden in crowded scenes.
• StarNet has two advantages over previous methods.
• 1) The representation describes not only pairwise interactions but also collective ones.
• Such a comprehensive representation enables StarNet to make accurate predictions.
• 2) the interactions between one pedestrian and others are efficiently computed.
• When predicting all pedestrians’ trajectories, the computational time increases linearly, rather
than quadratically, as the number of pedestrians increases.
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
The process of predicting the coordinates.
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
• The hub network takes all of the observed trajectories simultaneously and produces a
comprehensive representation r of the crowd of pedestrians.
• The representation r includes both spatial and temporal information of the crowd, which is
key to describing the interactions among pedestrians.
• The hub network produces r in two steps: 1) produce a spatial representation of the crowd
for each time step; 2) feed the spatial representation into an LSTM to produce the spatio-
temporal representation r.
• For the i-th pedestrian, the host network first embeds the observed trajectory Oi and then
combines the embedded trajectory with the spatio-temporal representation r to predict
the future trajectory.
• Specifically, the host network predicts the future trajectory in two steps: 1) take the
observed trajectory Oi and the spatio-temporal representation r as input and generate an
integrated representation; 2) predict the future trajectory of the i-th pedestrian from
the observed trajectory Oi and the integrated representation (see the sketch below).
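A toy sketch of the hub/host division of labor, with max-pooling as an assumed crowd aggregation and all sizes illustrative:

```python
import torch
import torch.nn as nn

class Hub(nn.Module):
    """Illustrative hub: spatial pooling per step, then an LSTM over time."""
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Linear(2, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

    def forward(self, traj):                # traj: (N, T, 2) observed positions
        spatial = self.embed(traj).max(dim=0).values   # (T, hidden) crowd summary
        r, _ = self.lstm(spatial.unsqueeze(0))         # spatio-temporal rep
        return r.squeeze(0)                            # (T, hidden)

class Host(nn.Module):
    """Illustrative host: fuse one pedestrian's trajectory with the hub output."""
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Linear(2, hidden)
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, traj_i, r):           # traj_i: (T, 2); r: (T, hidden)
        x = torch.cat([self.embed(traj_i), r], dim=-1).unsqueeze(0)
        h, _ = self.lstm(x)
        return self.out(h.squeeze(0))       # (T, 2) next-step displacements

hub, host = Hub(), Host()
trajs = torch.randn(5, 8, 2)                # 5 pedestrians, 8 observed steps
r = hub(trajs)
preds = [host(trajs[i], r) for i in range(5)]  # one shared host per pedestrian
```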
StarNet: Pedestrian Trajectory Prediction using
Deep Neural Network in Star Topology
Predicted trajectories and the
corresponding ground truths.
Different colors indicate different
trajectories. Ground-truth
trajectories are labeled with dots;
the predicted trajectories are
labeled with triangles.
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
• CVPRW 2019
• This paper proposes an approach for predicting the motion of pedestrians interacting with
others.
• It uses a Generative Adversarial Network (GAN) to sample plausible predictions for any
agent in the scene.
• As GANs are very susceptible to mode collapse and mode dropping, the authors show that the
recently proposed Info-GAN brings dramatic improvements in multi-modal pedestrian trajectory
prediction by avoiding these issues.
• The L2 loss is also left out when training the generator, unlike in some previous works, because it
causes serious mode collapse despite faster convergence.
• Experiments on real and synthetic data show that the proposed method generates more
diverse samples and preserves the modes of the predictive distribution.
• In particular, to support this claim, the authors designed a toy dataset of trajectories that
can be used to assess how well different methods preserve the predictive
distribution modes.
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
Illustration of the trajectory prediction problem. Having the observed trajectories
of a pedestrian of interest, here shown with a *, and the ones of other pedestrians
in the environment, the system should be able to build a predictive distribution of
possible trajectories (here with two modes in dashed yellow lines).
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
• When deciding on steering actions, a pedestrian anticipates likely scenarios
about the evolution of their surroundings in the near future.
• This anticipation may not always be easy, because of the
uncertainties in the neighbors’ future motions and intentions.
• In most recent NN-based motion prediction systems, the input is
the set of most recent observations of the surrounding pedestrians.
• Hence, the mappings from observations to predicted trajectories built by
the networks do not explicitly consider the uncertain and multimodal nature of
the neighbors’ future trajectories; in a way, the network is expected to
learn this too, which may be too much to expect.
• The Social Ways GAN generates independent random trajectory samples
that mimic the distribution of trajectories among our training data,
conditioned on observed initial tracklets of duration τ for all the agents in the
scene.
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
Block Diagram of the Social Ways prediction system. The yellow ellipses represent loss calculations. The dashed
arrows show the backpropagation directions. The bold arrows carry ground truth data.
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
• GAN training is known to be hard: it may not converge, may exhibit vanishing gradients when
there is an imbalance between the generator and the discriminator, or may be subject to mode
collapse, i.e., sampling of synthetic data without diversity.
• When predicting pedestrian motion, it is critical to avoid mode collapse, as it could result
in catastrophic decisions, e.g., for an autonomous driving agent.
• Two major changes are introduced into the GAN training.
• First, no L2 loss enforcing the generated samples to be close to the true data is used, because
this term was observed to harm the diversity of the generated samples.
• Second, an Info-GAN architecture is implemented, which has a very positive impact on avoiding
mode collapse relative to other GAN variants.
• Info-GAN learns disentangled representations of the sources of variation in the data, and
does so by introducing a new coding variable c as an input.
• Training adds a term that maximizes a lower bound on the mutual
information between the distribution of c and the distribution of the generated outputs;
this requires training another sub-network that serves as a surrogate to evaluate the
likelihoods over the generated data (see the sketch below).
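A minimal Info-GAN-style generator update in this spirit (toy networks, assumed loss weight): the generator loss contains no L2 term to the ground truth, and a surrogate network Q must recover the code c from the generated sample, which bounds the mutual information from below:

```python
import torch
import torch.nn as nn

# Toy networks (sizes are assumptions, not the paper's architecture).
G = nn.Sequential(nn.Linear(16 + 2, 32), nn.ReLU(), nn.Linear(32, 2))  # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))       # discriminator
Q = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))       # surrogate for c

bce = nn.BCEWithLogitsLoss()
z = torch.randn(64, 16)                     # noise vector
c = torch.randn(64, 2)                      # latent code to be preserved in outputs
fake = G(torch.cat([z, c], dim=-1))

# Generator objective: fool D -- no L2 term to ground-truth samples (per the paper).
g_loss = bce(D(fake), torch.ones(64, 1))
# Mutual-information lower bound: Q must recover c from the generated sample.
info_loss = ((Q(fake) - c) ** 2).mean()     # Gaussian log-likelihood up to constants
total = g_loss + 0.1 * info_loss            # 0.1 is an assumed weighting
total.backward()
```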
Social Ways: Learning Multi-Modal Distributions
of Pedestrian Trajectories with GANs
Sample outputs (in magenta). Observed trajectories are shown in blue; the ground-truth
prediction and constant-velocity predictions are shown as cyan and orange lines, respectively.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• Accurate prediction of others’ trajectories is essential for autonomous driving.
• Trajectory prediction is challenging because it requires reasoning about agents’ past
movements, social interactions among varying numbers and kinds of agents, constraints
from the scene context, and the stochasticity of human behavior.
• This approach models these interactions and constraints jointly within a Multi-Agent
Tensor Fusion (MATF) network.
• Specifically, the model encodes multiple agents’ past trajectories and the scene context
into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent
interactions while retaining the spatial structure of agents and the scene context.
• The model decodes recurrently to multiple agents’ future trajectories, using adversarial
loss to learn stochastic predictions.
• Experiments on both highway driving and pedestrian crowd datasets show that the model
achieves state-of-the-art prediction accuracy.
2019.7
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• There are two parallel encoding streams in the MATF architecture.
• One encodes the past trajectories of each individual agent xi independently using single agent
LSTM encoders, and another encodes the static scene context image c with a CNN.
• Each LSTM encoder shares the same set of parameters, so the architecture is invariant to the
number of agents in the scene.
• The outputs of the LSTM encoders are 1-D agent state vectors {x′1, x′2, ..., x′n} without
temporal structure.
• The output of the scene context encoder CNN is a scaled feature map c′ retaining the spatial
structure of the bird’s-eye view static scene context image.
• Next, the two encoding streams are concatenated spatially into a Multi-Agent Tensor.
• Agent encodings {x′1, x′2, .., x′n} are placed into one bird’s-eye view spatial tensor, which is
initialized to 0 and is of the same shape (width and height) as the encoded scene image c′.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• The dimension axis of the encodings fits into the channel axis of the tensor.
• The agent encodings are placed into the spatial tensor with respect to their positions at the
last time step of their past trajectories.
• This tensor is then concatenated with the encoded scene image in the channel dimension to
get a combined tensor. If multiple agents are placed into the same cell in the tensor due to
discretization, element-wise max pooling is performed.
• The Multi-Agent Tensor is fed into fully convolutional layers, which learn to represent
interactions among multiple agents and between agents and the scene context, while
retaining spatial locality, to produce a fused Multi-Agent Tensor.
• Specifically, these layers operate at multiple spatial resolution scales by adopting
U-Net-like architectures to model interaction at different spatial scales.
• The output feature map of this fused model c′′ has exactly the same shape as c′ in width and
height to retain the spatial structure of the encoding.
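A sketch of the Multi-Agent Tensor construction just described, assuming coordinates normalized to [0, 1) and toy channel counts:

```python
import torch

def build_agent_channels(agent_vecs, positions, H=32, W=32):
    """Scatter agent encodings into a bird's-eye grid (illustrative sketch).

    agent_vecs: (N, C) LSTM-encoded agent states; positions: (N, 2) in [0, 1).
    Multiple agents falling into one cell are combined by element-wise max.
    """
    N, C = agent_vecs.shape
    grid = torch.zeros(C, H, W)                          # initialized to 0
    cells = (positions * torch.tensor([H, W])).long()    # discretize coordinates
    for vec, (r, col) in zip(agent_vecs, cells):
        grid[:, r, col] = torch.maximum(grid[:, r, col], vec)  # max pooling
    return grid

agents = torch.randn(4, 8).abs()              # 4 agents, 8-d encodings
pos = torch.rand(4, 2)                        # last observed positions
agent_channels = build_agent_channels(agents, pos)
scene_channels = torch.randn(16, 32, 32)      # CNN-encoded scene context c'
multi_agent_tensor = torch.cat([agent_channels, scene_channels], dim=0)
print(multi_agent_tensor.shape)  # torch.Size([24, 32, 32])
```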
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
The Multi-Agent Tensor encoding is a spatial
feature map of the scene context and multiple
agents from an overhead perspective, including
agent channels (above) and context channels
(below). Agents’ feature vectors (red) output
from single-agent LSTM encoders are placed
spatially w.r.t. agents’ coordinates to form the
agent channels. The agent channels are aligned
spatially with the context channels (a context
feature map) output from scene context
encoding layers to retain the spatial structure.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
• To decode each agent’s predicted trajectory, agent-specific representations with fused
interaction features for each agent {x1′′, x2′′, ..., xn′′} are sliced out according to their
coordinates from the fused Multi-Agent Tensor output c′′.
• These agent-specific representations are then added as a residual to the original encoded
agent vectors to form final agent encoding vectors {x1′ + x1′′ , x2′ + x2′′ , ..., xn′ + xn′′ }, which
encode all the information from the past trajectories of the agents themselves, the static
scene context, and the interaction features among multiple agents.
• In this way, this approach allows each agent to get a different social and contextual
embedding focused on itself.
• Importantly, the model gets these embeddings for multiple agents using shared feature
extractors instead of operating n times for n agents.
• Finally, for each agent in the scene, its final vector xi′ + xi′′ is decoded to future trajectory
prediction yiˆ by LSTM decoders.
• Similar to the encoders for each agent, parameters are shared to guarantee that the network
can generalize well when the number of agents in the scene varies.
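A sketch of this decoding stage: agent-specific features are sliced from the fused tensor c′′, added as residuals to the original encodings, and decoded by a shared LSTM (all sizes assumed):

```python
import torch
import torch.nn as nn

C, H, W = 8, 32, 32
fused = torch.randn(C, H, W)                 # c'' from the convolutional fusion
x_enc = torch.randn(4, C)                    # x'_i: original agent encodings
cells = torch.randint(0, 32, (4, 2))         # agents' last observed grid cells

x_fused = torch.stack([fused[:, r, c] for r, c in cells])  # x''_i sliced out
final = x_enc + x_fused                      # residual: x'_i + x''_i

decoder = nn.LSTMCell(2, C)                  # weights shared across all agents
out = nn.Linear(C, 2)
h, c_state = final, torch.zeros_like(final)  # initialize from the final encoding
pos = torch.zeros(4, 2)
preds = []
for _ in range(12):                          # roll out 12 future steps
    h, c_state = decoder(pos, (h, c_state))
    pos = out(h)
    preds.append(pos)
future = torch.stack(preds, dim=1)           # (4 agents, 12 steps, 2)
```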
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Illustration of the Multi-Agent Tensor Fusion (MATF) architecture.
Multi-Agent Tensor Fusion for Contextual
Trajectory Prediction
Stanford Drone dataset. From left to right: MATF Multi Agent Scene, MATF Multi Agent, and LSTM. Blue: past trajectories,
red: ground truth, green: predictions. The closer the green predicted trajectory is to the red ground truth future
trajectory, the more accurate the prediction. The model predicts that (1) two agents entering the roundabout from the
top will exit to the left; (2) one agent coming from the left on the pathway above the roundabout is turning left to move
toward the top of the image; (3) one agent is decelerating at the door of the building above and to the right of the
roundabout. (4) In one interesting failure case, an agent on the top-right of the roundabout is turning right to move
toward the top of the image; the model predicts the turn, but not how sharp it will be.
Which Way Are You Going? Imitative Decision
Learning for Path Forecasting in Dynamic Scenes
• An Imitative Decision Learning (IDL) approach is proposed here, which delves deeper into the key
factor that inherently characterizes the multimodality: the latent decision.
• The proposed IDL first infers the distribution of such latent decisions by learning from moving
histories.
• A policy is then generated by taking the sampled latent decision into account to predict the
future.
• Different plausible upcoming paths correspond to each sampled latent decision.
• This approach significantly differs from the mainstream literature that relies on a predefined
latent variable to extrapolate diverse predictions.
• To augment the understanding of the latent decision and the resultant multimodal future,
the authors investigate their connection through mutual information optimization.
• Moreover, the proposed IDL integrates spatial and temporal dependencies into one single
framework, in contrast to handling them with two-step settings.
• This approach enables simultaneously anticipating the paths of all pedestrians in the scene.
CVPR 2019
Which Way Are You Going? Imitative Decision
Learning for Path Forecasting in Dynamic Scenes
The multimodal nature of future paths in a dynamic scene: there are multiple plausible
forthcoming paths (the dashed red and cyan lines) based on identical historical moving
records (the solid red and cyan lines). Only three possibilities are displayed here as an example.
One issue that has been challenging for path forecasting in dynamic scenes is the multimodal
nature of the future: given a set of historical observations, there is more than one probable future.
Despite tremendous progress in foreseeing a deterministic future, the
majority of existing studies fail to consider the multiple possibilities of the future.
Which Way Are You Going? Imitative Decision
Learning for Path Forecasting in Dynamic Scenes
• This work focuses on understanding and imitating the underlying human decision-making
process to anticipate future paths in dynamic scenes.
• Fundamentally, IDL can be viewed as jointly training:
• (1) an inference sub-network L that extrapolates the latent decision,
• (2) a policy/generator π that recovers a policy to generate upcoming paths,
• (3) a statistics sub-network Q that discovers the impact of latent decision on predictions,
• (4) a discriminator D that attempts to differentiate our generated outcomes from the expert
demonstrations.
• The detailed schematic diagram for forecasting future paths is shown in the following figure.
Which Way Are You Going? Imitative Decision
Learning for Path Forecasting in Dynamic Scenes
• The red arrows indicate the direction of information flow between each module.
• The black arrows suggest the direction of information flow inside a module.
• The historical trajectories are input into the inference sub-network to infer the distribution of latent decisions.
• The temporal convolutional sub-module receives the output from the pre-trained convolutional
sub-module and produces a two-unit vector.
• A pre-trained deconvolutional sub-module and a softmax layer read each unit to form the mean and
deviation of a Gaussian distribution over latent decisions.
• Meanwhile, the encoder of the policy/generator π processes the historical trajectories with a ConvGRU layer.
• An element-wise addition of the encoded hidden states h^enc_tk and the sampled latent decision S
initializes the decoder.
• The final predictions are generated from the decoded hidden states h^dec_t′ through a deconvolutional layer.
• The statistics sub-network reads the prediction and the latent decision to measure the significance of S
(see the sketch below).
• The discriminator distinguishes predictions from the ground-truth future paths (expert demonstrations).
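A toy version of this sampling step, where a Gaussian latent decision S is drawn and added element-wise to the encoder's final hidden state to initialize the decoder; a plain GRU stands in for the ConvGRU, and all sizes are assumptions:

```python
import torch
import torch.nn as nn

hidden = 16
mu = torch.zeros(hidden)                     # mean from the inference sub-network
sigma = torch.ones(hidden) * 0.5             # deviation from the softmax branch

def sample_decision():
    return mu + sigma * torch.randn(hidden)  # reparameterized latent decision S

encoder = nn.GRU(2, hidden, batch_first=True)    # stand-in for the ConvGRU
history = torch.randn(1, 8, 2)                   # one agent, 8 observed steps
_, h_enc = encoder(history)                      # hidden state at the last step

decoder = nn.GRU(2, hidden, batch_first=True)
out = nn.Linear(hidden, 2)
# Each sampled decision S yields a different plausible future path:
for k in range(3):
    h0 = h_enc + sample_decision()               # element-wise addition
    y, _ = decoder(torch.zeros(1, 12, 2), h0)
    print(k, out(y).shape)                       # (1, 12, 2) predicted path
```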
Which Way Are You Going? Imitative Decision
Learning for Path Forecasting in Dynamic Scenes
Qualitative comparisons on the SAP dataset. The top left shows the observed records and the matching ground
truth (G.T.). For a clear visualization of the multimodality, several trajectories and their
diverse predicted paths are illustrated separately in examples 1 to 5.
TraPHic: Trajectory Prediction in Dense and
Heterogeneous Traffic Using Weighted Interactions
• CVPR 2019
• An algorithm for predicting the near-term trajectories of road agents in dense traffic videos.
• This approach is designed for heterogeneous traffic, where the road agents may correspond
to buses, cars, scooters, bicycles, or pedestrians.
• It models the interactions between different road agents using a novel LSTM-CNN hybrid
network for trajectory prediction.
• In particular, it takes into account heterogeneous interactions that implicitly account for the
varying shapes, dynamics, and behaviors of different road agents.
• It also models horizon-based interactions which are used to implicitly model the driving
behavior of each road agent.
• The prediction algorithm, TraPHic, is evaluated on standard datasets and on a new dense,
heterogeneous traffic dataset of urban Asian videos and agent trajectories.
TraPHic: Trajectory Prediction in Dense and
Heterogeneous Traffic Using Weighted Interactions
• Two observations:
• 1) Road agents in such dense traffic do not react to every road agent
around them; rather, they selectively focus attention on key interactions in a semi-elliptical region
in the field of view, which is called the “horizon”;
• 2) To capture heterogeneous road-agent dynamics, their properties are embedded into the state-space
representation of the road agents and fed into the hybrid network.
• TraPHic Network:
• Input embeddings are generated for all agents based on trajectory information and heterogeneous
dynamic constraints such as agent shape, velocity, traffic concentration at the agent’s spatial
coordinates, and other parameters.
• These embeddings are passed through LSTMs and eventually used to construct the horizon map,
the neighbor map, and the ego agent’s own tensor map.
• The horizon and neighbor maps are passed through separate ConvNets and then concatenated
together with the ego agent tensor to produce latent representations.
• Finally, these latent representations are passed through an LSTM to generate a trajectory
prediction for the ego agent, as in the sketch below.
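A rough, shape-level sketch of this pipeline; the horizon and neighbor maps are treated as prebuilt grids of LSTM states, and all sizes and layer choices are assumptions:

```python
import torch
import torch.nn as nn

# Small ConvNet head shared in structure by both map branches (assumed design).
def conv_head(channels):
    return nn.Sequential(nn.Conv2d(channels, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

horizon_net, neighbor_net = conv_head(4), conv_head(4)
horizon_map = torch.randn(1, 4, 16, 16)   # LSTM states in the semi-elliptical horizon
neighbor_map = torch.randn(1, 4, 16, 16)  # LSTM states in the elliptical neighborhood
ego_tensor = torch.randn(1, 8)            # ego agent's own encoded state

latent = torch.cat([horizon_net(horizon_map),
                    neighbor_net(neighbor_map), ego_tensor], dim=-1)  # (1, 24)

decoder = nn.LSTM(24, 32, batch_first=True)
head = nn.Linear(32, 2)
steps = latent.unsqueeze(1).repeat(1, 10, 1)   # feed fused features at each step
h, _ = decoder(steps)
prediction = head(h)                           # (1, 10, 2) future trajectory
```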
TraPHic: Trajectory Prediction in Dense and
Heterogeneous Traffic Using Weighted Interactions
TraPHic Network Architecture: The ego agent is marked by the red dot. The green elliptical region around it is its
neighborhood and the cyan semi-elliptical region in front of it is its horizon.
TraPHic: Trajectory Prediction in Dense and
Heterogeneous Traffic Using Weighted Interactions
Trajectory prediction results: the performance of various trajectory prediction methods on the TRAF dataset with
different types of road agents. The ground-truth (GT) trajectory is a solid green line, and the TraPHic prediction a solid red line.
The prediction results of other methods (RNN-ED, S-LSTM, S-GAN, CS-LSTM) are drawn with different dashed lines.
Learning to Infer Relations for Future
Trajectory Forecast
• Relational inference flexibly defines ‘an object’ as a spatial feature representation
extracted from each region of the discretized grid, regardless of what exists in that region.
• Inferring relational behavior between road users as well as road users and their surrounding
physical space is an important step toward effective modeling and prediction of navigation
strategies adopted by participants in road scenes.
• Here is a relation-aware framework for future trajectory forecast, which aims to infer
relational information from the interactions of road users with each other and with
environments.
• To address the different importance of relations, a relation gate module (RGM) with
an internal gating process is designed.
• The RGM controls the information flow through multiple switch gates and
identifies descriptive relations that strongly influence the future motion of the target,
conditioned on its past trajectory.
CVPR 2019
Learning to Infer Relations for Future
Trajectory Forecast
The proposed gated relation encoder (GRE) visually discovers both human-human (j-th region: woman ↔ man)
and human-space interactions (i-th region: cyclist ↔ cone) from each region of the discretized grid over time.
Learning to Infer Relations for Future
Trajectory Forecast
• In this framework, an object is a visual encoding of the spatial behavior of road users (if they
exist) and environmental representations, together with their temporal interactions over time,
which naturally corresponds to local human-human and human-space interactions in each
region of the discretized grid.
• On top of this, relational behavior is inferred from all objects (i.e., from the spatio-temporal
interactions in the context) from a global perspective.
• Given a sequence of images, the gated relation encoder (GRE) visually extracts spatio-
temporal interactions (i.e., objects) through the spatial behavior encoder (SBE) and the
temporal interaction encoder (TIE).
• The RGM of the GRE infers pair-wise relations from objects and then focuses on
which relations will be potentially meaningful for forecasting the future motion of the target,
given its past behavior (see the sketch below).
• Future locations are predicted from the aggregated relational features through the trajectory
prediction network (TPN) in the form of heatmaps, which can be further refined by
considering spatial dependencies between predicted locations and extended to learn the
uncertainty of the future forecast at test time.
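A toy relation gate in this spirit: pairwise relations among region features are weighted by switch gates conditioned on the target's past trajectory; layer choices and sizes are assumptions:

```python
import torch
import torch.nn as nn

class RelationGate(nn.Module):
    """Illustrative relation gate: weight pairwise relations by the target's past."""
    def __init__(self, obj_dim=16, traj_dim=8):
        super().__init__()
        self.rel = nn.Linear(2 * obj_dim, obj_dim)
        self.gate = nn.Linear(2 * obj_dim + traj_dim, obj_dim)

    def forward(self, objects, target_past):
        # objects: (K, obj_dim) per-region spatio-temporal features
        # target_past: (traj_dim,) encoding of the target's own trajectory
        K = objects.shape[0]
        pair = torch.cat([objects[:, None].expand(K, K, -1),
                          objects[None, :].expand(K, K, -1)], dim=-1)
        r = torch.relu(self.rel(pair))                          # pairwise relations
        g = torch.sigmoid(self.gate(torch.cat(
            [pair, target_past.expand(K, K, -1)], dim=-1)))     # switch gates
        return (g * r).sum(dim=(0, 1))                          # aggregated relation

rgm = RelationGate()
feat = rgm(torch.randn(6, 16), torch.randn(8))
print(feat.shape)  # torch.Size([16])
```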
Learning to Infer Relations for Future
Trajectory Forecast
The efficacy of the spatial refinement network
(SRN) for spatial dependencies.
Learning to Infer Relations for Future
Trajectory Forecast
• Predicted heatmaps from the TPN are sometimes ambiguous.
• The main cause of this issue is a lack of spatial dependencies among predictions.
• Since the network independently predicts δ heatmaps, there is no constraint enforcing
them to be spatially aligned.
• Thus, a spatial refinement network (SRN) is designed to learn implicit spatial dependencies in
feature space.
• Intermediate activations (early and late features) of the TPN are first concatenated and passed
through the SRN, which uses large receptive fields.
• As a result, the outputs show less confusion between heatmap locations, making use of rich
contextual information from neighboring predictions.
Learning to Infer Relations for Future
Trajectory Forecast
The efficacy of the uncertainty embedding into our framework with MC dropout.
Learning to Infer Relations for Future
Trajectory Forecast
• Bayesian neural networks (BNNs) have been considered for tackling the uncertainty of the
network’s weight parameters.
• Inference in BNNs can be approximated by sampling from the posterior
distribution of the deterministic network’s weight parameters using Monte Carlo (MC) dropout.
• This performs approximate variational inference by using dropout at test time to draw multiple
samples from the dropout distribution.
• It literally enables capturing multiple plausible trajectories over the uncertainties of the
network’s learned weight parameters.
• Here, the mean of L samples is used as the prediction, which best approximates variational
inference in BNNs.
• The variance of L = 5 samples is computed to measure the uncertainty (see the sketch below).
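A minimal MC-dropout sketch: dropout is kept active at inference, and the mean and variance over L stochastic forward passes give the prediction and its uncertainty (the toy network is an assumption):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                    nn.Dropout(p=0.5), nn.Linear(32, 2))

def mc_predict(x, L=5):
    net.train()                      # keeps dropout stochastic during inference
    samples = torch.stack([net(x) for _ in range(L)])
    return samples.mean(dim=0), samples.var(dim=0)  # prediction, uncertainty

x = torch.randn(1, 4)
mean, var = mc_predict(x)
print(mean, var)
```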
Learning to Infer Relations for Future
Trajectory Forecast
Qualitative evaluation. (Color codes: Yellow - given past trajectory, Red - ground-truth, and Green - this prediction)
Illustrations of prediction during complicated human-human interactions. (a) A cyclist (•••) interacts with a person moving
slowly (•••). (b) A person (•••) meets a group of people. (c) A cyclist (•••) first interacts with another cyclist in front (•••)
and then considers the influence of a person (•••). This approach socially avoids potential collisions.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
• CVPR 2019
• Deciphering human behaviors from videos to predict their future paths/trajectories and what
they would do is important in many applications.
• Therefore, this work studies predicting a pedestrian’s future path jointly with future activities.
• They propose an end-to-end, multi-task learning system, called Next, utilizing rich visual
features about human behavioral information and interaction with their surroundings.
• It encodes a person through rich semantic features about visual appearance, body
movement and interaction with the surroundings, motivated by the fact that humans derive
such predictions by relying on similar visual cues.
• To facilitate the training, the network is learned with an auxiliary task of predicting future
location in which the activity will happen.
• In the auxiliary task, a discretized grid called the Manhattan Grid is designed as the location
prediction target for the system.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
The goal is to jointly predict a person’s future path and activity. The green and yellow lines show two
possible future trajectories, and two possible activities are shown in the green and yellow boxes.
Depending on the future activity, the person (top right) may take different paths, e.g., the yellow path
for “loading” and the green path for “object transfer”.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
• Humans navigate through public spaces often with specific purposes in mind, ranging from
simple ones like entering a room to more complicated ones like putting things into a car.
• Such intention, however, is mostly neglected in existing work.
• The joint prediction model can have two benefits:
• 1) learning the activity together with the path may benefit the future path prediction;
Intuitively, humans are able to read from others’ body language to anticipate whether they
are going to cross the street or continue walking along the sidewalk.
• 2) the joint model advances the capability of understanding not only the future path but also
the future activity by taking into account the rich semantic context in videos; this increases
the capabilities of automated video analytics for social good, such as safety applications like
anticipating pedestrian movement at traffic intersections or a road robot helping humans
transport goods to a car.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
Overview of the Next model. Given a sequence of frames containing the person for prediction, this model utilizes
a person behavior module and a person interaction module to encode rich visual semantics into a feature tensor.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
• 4 Key components:
• Person behavior module extracts visual information from the behavioral sequence of
the person.
• Person interaction module looks at the interaction between a person and their
surroundings.
• Trajectory generator summarizes the encoded visual features and predicts the future
trajectory by the LSTM decoder with focal attention.
• Activity prediction utilizes rich visual semantics to predict the future activity label for
the person.
• In addition, the scene is divided into a discretized grid of multiple scales, called the
Manhattan Grid, on which classification and regression are computed for robust activity
location prediction.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
To model appearance changes of a person, a pre-trained object detection model with “RoIAlign”
is utilized to extract fixed-size CNN features for each person bounding box.
The features are averaged along the spatial dimensions for each person and fed into an LSTM
encoder, yielding a feature representation of size Tobs × d, where d is the hidden size of the LSTM. To
capture body movement, a person keypoint detection model is utilized to extract person keypoint
information. A linear transformation embeds the keypoint coordinates before they are fed into
the LSTM encoder; the encoded feature also has shape Tobs × d (see the sketch below). These appearance
and movement features are commonly used in a wide variety of studies and thus do not introduce new
concerns about machine learning fairness.
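A shape-level sketch of these two encoders, assuming 256-channel RoIAlign features and 17 body keypoints (both are illustrative assumptions):

```python
import torch
import torch.nn as nn

T_obs, d = 8, 32

appearance = torch.randn(T_obs, 256, 7, 7)        # RoIAlign features per frame
app_feat = appearance.mean(dim=(2, 3))            # average over spatial dims
app_lstm = nn.LSTM(256, d, batch_first=True)
app_enc, _ = app_lstm(app_feat.unsqueeze(0))      # (1, T_obs, d)

keypoints = torch.randn(T_obs, 17 * 2)            # 17 keypoints, xy per frame
embed = nn.Linear(17 * 2, d)                      # linear keypoint embedding
kp_lstm = nn.LSTM(d, d, batch_first=True)
kp_enc, _ = kp_lstm(embed(keypoints).unsqueeze(0))  # (1, T_obs, d)

print(app_enc.shape, kp_enc.shape)  # both (1, 8, 32), i.e. Tobs x d each
```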
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
The person-objects feature can capture how far away the person is from other
people and from cars. The person-scene feature can capture whether the person is
near the sidewalk or grass. This information is provided to the model in the hope
that it learns, for example, that a person walks more often on the sidewalk than on
the grass and tends to avoid bumping into cars.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
• An LSTM decoder is used to directly predict the future trajectory in xy-coordinates.
• The hidden state of this decoder is initialized using the last state of the person’s trajectory
LSTM encoder.
• An auxiliary task, activity location prediction, is added in addition to predicting the future
activity label of the person.
• At each time instant, the xy-coordinate is computed from the decoder state by a
fully connected layer.
• The model employs an effective focal attention, originally proposed for multimodal inference
over a sequence of images in visual question answering; its key idea is to project
multiple features into a correlation space, where discriminative features are easier for
the attention mechanism to capture (see the sketch below).
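A simplified reading of focal attention, correlating a decoder query with several feature sequences and attending hierarchically, first within and then across sequences (details are simplified assumptions):

```python
import torch

def focal_attention(q, K_feats):
    """Illustrative focal-attention pooling over multiple feature sequences.

    q: (d,) decoder query; K_feats: (M, T, d) M feature types over T steps.
    Correlations are computed per type and per step, then softmax-pooled
    hierarchically, which is the flavor of the mechanism (details simplified).
    """
    scores = (K_feats * q).sum(-1)                       # (M, T) correlations
    step_w = torch.softmax(scores, dim=-1)               # attend within each type
    pooled = (step_w.unsqueeze(-1) * K_feats).sum(1)     # (M, d)
    type_w = torch.softmax((pooled * q).sum(-1), dim=0)  # attend across types
    return (type_w.unsqueeze(-1) * pooled).sum(0)        # (d,) context vector

ctx = focal_attention(torch.randn(32), torch.randn(4, 8, 32))
print(ctx.shape)  # torch.Size([32])
```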
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
To bridge the gap between trajectory generation and activity label prediction, an activity
location prediction (ALP) module is proposed to predict the final location where the person will engage in the future
activity. The activity location prediction includes two tasks: location classification and location regression.
Peeking into the Future: Predicting Future
Person Activities and Locations in Videos
Qualitative comparison between this method and the baselines. Yellow path is the observable trajectory and
green path is the ground truth trajectory during the prediction period. Predictions are shown as blue heatmaps.
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningYu Huang
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainYu Huang
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksYu Huang
 

More from Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Open Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planningOpen Source codes of trajectory prediction & behavior planning
Open Source codes of trajectory prediction & behavior planning
 
Lidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rainLidar in the adverse weather: dust, fog, snow and rain
Lidar in the adverse weather: dust, fog, snow and rain
 
Autonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucksAutonomous Driving of L3/L4 Commercial trucks
Autonomous Driving of L3/L4 Commercial trucks
 

Recently uploaded

THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectssuserb6619e
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the weldingMuhammadUzairLiaqat
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptNarmatha D
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxVelmuruganTECE
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentBharaniDharan195623
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 

Recently uploaded (20)

THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in projectDM Pillar Training Manual.ppt will be useful in deploying TPM in project
DM Pillar Training Manual.ppt will be useful in deploying TPM in project
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Industrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.pptIndustrial Safety Unit-IV workplace health and safety.ppt
Industrial Safety Unit-IV workplace health and safety.ppt
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Configuration of IoT devices - Systems managament
Configuration of IoT devices - Systems managamentConfiguration of IoT devices - Systems managament
Configuration of IoT devices - Systems managament
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 

Pedestrian behavior/intention modeling for autonomous driving III

  • 1. Pedestrian Behavior/Intention Modeling for Autonomous Driving III Yu Huang Yu.huang07@gmail.com Sunnyvale, California
  • 2. Outline • SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction • Pedestrian Path, Pose, and Intention Prediction Through Gaussian Process Dynamical Models • StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology • Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs • Multi-Agent Tensor Fusion for Contextual Trajectory Prediction • Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes • TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions • Learning to Infer Relations for Future Trajectory Forecast • Peeking into the Future: Predicting Future Person Activities and Locations in Videos
  • 3. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction • 2019.3 • In crowd scenarios, reliable trajectory prediction of pedestrians requires insightful understanding of their social behaviors. • These behaviors have been well investigated by plenty of studies, while it is hard to be fully expressed by hand-craft rules. • Recent studies based on LSTM networks have shown great ability to learn social behaviors. • However, many of these methods rely on previous neighboring hidden states but ignore the important current intention of the neighbors. • In order to address this issue, this is a data-driven state refinement module for LSTM network (SR- LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism. • To effectively extract the social effect of neighbors, further introduce a social-aware information selection mechanism consisting of an element-wise motion gate and a pedestrian-wise attention to select useful message from neighboring pedestrians.
  • 4. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction • Current states of neighbors are important for timely interaction inference. When predicting for the lady at time t, considering the trajectory of the man on the right up to time t (a), or the one up to time t − 1 (b), can cause great deviation in predicting results (dashed lines).
  • 5. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction • Useful information should be adaptively selected from neighbors, based on their motions and locations. (a) Activation trajectory patterns of hidden neurons in LSTM, which start from the origin. Each trajectory pattern marked by certain color contains trajectories from database which has top- 20 responses for the hidden neuron. (b) A sample of three pedestrian interaction. How will the dyad pay attention to the other pedestrian on the left?
  • 6. SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction Framework overview of SR-LSTM. States refinement module is considered as an additional subnetwork of the LSTM cells, which aligns pedestrians together and updates current states of them. The refined states are used to predict the location at the next time step.
• The SR module takes three sources of information from all pedestrians as input: their current locations, and the hidden states and cell states from the LSTM.
• The output of the SR module is the refined cell states.
• In pedestrian trajectory prediction, iterative refinement improves the quality of the interaction model, mirroring the intention negotiation inherent in human interaction.
• The motion gate and the pedestrian-wise attention jointly select the important information from neighboring pedestrians for message passing.
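As a rough illustration of how such a refinement step could be wired up, below is a minimal PyTorch sketch assuming a sigmoid element-wise motion gate and a softmax pedestrian-wise attention over pairwise features; the class name, layer sizes, and the residual cell-state update are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a state-refinement step; names and dims are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateRefinement(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Element-wise motion gate: selects which features of a neighbor's
        # hidden state are useful, conditioned on the relative position.
        self.motion_gate = nn.Linear(2 + 2 * hidden_dim, hidden_dim)
        # Pedestrian-wise attention: one scalar weight per neighbor.
        self.attn = nn.Linear(2 + 2 * hidden_dim, 1)
        self.update = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, pos, h, c):
        # pos: (N, 2) current locations; h, c: (N, D) LSTM hidden/cell states.
        N, D = h.shape
        rel = pos.unsqueeze(0) - pos.unsqueeze(1)           # (N, N, 2) offsets to neighbors
        pair = torch.cat([rel,
                          h.unsqueeze(0).expand(N, N, D),   # neighbor hidden states
                          h.unsqueeze(1).expand(N, N, D)],  # own hidden state
                         dim=-1)                            # (N, N, 2 + 2D)
        gate = torch.sigmoid(self.motion_gate(pair))        # element-wise selection
        score = self.attn(pair).squeeze(-1)                 # (N, N)
        eye = torch.eye(N, dtype=torch.bool, device=score.device)
        score = score.masked_fill(eye, float('-inf'))       # no self-message
        w = F.softmax(score, dim=1)                         # pedestrian-wise attention
        msg = torch.einsum('ij,ijd->id',
                           w, gate * h.unsqueeze(0).expand(N, N, D))
        # Refine the cell states with the aggregated social message.
        return c + torch.tanh(self.update(torch.cat([c, msg], dim=-1)))
```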
SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction
In SR-LSTM, the current states of pedestrians can refine each other in a timely manner, particularly when pedestrians change their intentions. SR-LSTM can implicitly account for common social behaviors, which yields plausible future predictions and relatively low errors.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian Process Dynamical Models and Pedestrian Activity Recognition
• 2019.5, IEEE T-ITS
• Predictions of pedestrian paths can improve current automatic emergency braking systems.
• The goal is to predict future pedestrian paths, poses, and intentions up to 1 s in advance.
• The method is based on balanced Gaussian process dynamical models (B-GPDMs), which reduce the 3-D time-related information extracted from key points or joints placed along pedestrian bodies into low-dimensional spaces.
• The B-GPDM is also capable of inferring future latent positions and reconstructing their associated observations.
• However, learning one generic model for all kinds of pedestrian activities normally yields less accurate predictions.
• The proposed method therefore obtains multiple models for four types of activity, i.e., walking, stopping, starting, and standing, and selects the most similar model to estimate future pedestrian states.
• The method detects starting activities 125 ms after gait initiation with an accuracy of 80% and recognizes stopping intentions 58.33 ms before the event with an accuracy of 70%.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian Process Dynamical Models and Pedestrian Activity Recognition
General description of the method based on B-GPDMs. The algorithm is divided into two stages: offline training (top) and online execution (bottom). The method learns multiple models for each type of pedestrian activity, i.e., walking, stopping, starting, and standing, and selects the most appropriate one to estimate future pedestrian states at each time step. A training dataset of motion sequences, in which pedestrians perform different activities, is split into 8 subsets based on typical crossing orientations and type of activity. A B-GPDM is obtained for each sequence with one activity contained in the dataset.
Pedestrian Path, Pose, and Intention Prediction Through Gaussian Process Dynamical Models and Pedestrian Activity Recognition
• The proposed method is based on B-GPDMs, which reduce the 3-D time-related positions and displacements extracted from key points or joints placed along the pedestrian bodies into low-dimensional latent spaces.
• The B-GPDM also has the peculiarity of inferring future latent positions and reconstructing, from the latent space, the observation associated with a latent position.
• Therefore, it is possible to reconstruct future observations from future latent positions.
• During online execution, given a new pedestrian observation, the current activity is determined using an HMM.
• Thus, the selection of the most appropriate model among the trained ones is restricted to that activity.
• Finally, the selected model is used to predict the future latent positions and reconstruct the future pedestrian path and poses.
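To make the latent-dynamics idea concrete, here is a minimal numpy sketch of GP-based one-step dynamics fitted on a latent trajectory and rolled forward by its posterior mean; the RBF kernel, hyperparameters, and function names are assumptions, and a full B-GPDM would additionally learn the latent space itself and a latent-to-observation mapping.

```python
# Minimal sketch: fit a GP mapping x_t -> x_{t+1} on a latent trajectory,
# then roll the posterior mean forward to infer future latent positions.
import numpy as np

def rbf(A, B, ls=1.0, var=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

def gp_rollout(X, steps, noise=1e-4):
    # X: (T, d) latent positions of one training sequence.
    Xin, Xout = X[:-1], X[1:]                      # one-step dynamics pairs
    K = rbf(Xin, Xin) + noise * np.eye(len(Xin))
    alpha = np.linalg.solve(K, Xout)               # (T-1, d)
    x = X[-1:]                                     # start from the last latent state
    preds = []
    for _ in range(steps):
        x = rbf(x, Xin) @ alpha                    # GP posterior mean
        preds.append(x[0])
    return np.stack(preds)                         # (steps, d) future latent path

# Future observations (paths and poses) would then be reconstructed from
# these latent positions via the model's latent-to-observation mapping.
```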
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
• 2019.6
• Pedestrian trajectory prediction is crucial for many important applications.
• The problem is a great challenge because of the complicated interactions among pedestrians.
• Previous methods model only the pairwise interactions between pedestrians, which not only oversimplifies the interactions among pedestrians but is also computationally inefficient.
• StarNet has a star topology which includes a unique hub network and multiple host networks.
• The hub network takes the observed trajectories of all pedestrians to produce a comprehensive description of the interpersonal interactions.
• The host networks, each of which corresponds to one pedestrian, then consult this description and predict future trajectories.
• The star topology gives StarNet two advantages over conventional models.
• StarNet is able to consider the collective influence among all pedestrians in the hub network, making more accurate predictions.
• StarNet is computationally efficient since the number of host networks is linear in the number of pedestrians.
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
The structure of StarNet. StarNet mainly consists of a centralized hub network and several host networks. The hub network collects movement information and generates a feature which describes the joint interactions among pedestrians. Each host network, corresponding to a certain pedestrian, queries the hub network and predicts that pedestrian's trajectory.
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
• Pedestrian path prediction is a great challenge due to the uncertainty of future movements.
• Conventional methods tackle this problem with manually crafted features.
• Data-driven methods remove the requirement of hand-crafted features and greatly improve the ability to predict pedestrian trajectories.
• However, existing methods compute pairwise features and thus oversimplify the interactions in the real-world environment.
• Meanwhile, they suffer from a huge computational burden in crowded scenes.
• StarNet has two advantages over previous methods.
• 1) The representation describes not only pairwise interactions but also collective ones. Such a comprehensive representation enables StarNet to make accurate predictions.
• 2) The interactions between one pedestrian and the others are efficiently computed. When predicting all pedestrians' trajectories, the computational time increases linearly, rather than quadratically, with the number of pedestrians.
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
The process of predicting the coordinates.
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
• The hub network takes all of the observed trajectories simultaneously and produces a comprehensive representation r of the crowd of pedestrians.
• The representation r includes both spatial and temporal information of the crowd, which is the key to describing the interactions among pedestrians.
• The hub network produces r in two steps: 1) produce a spatial representation of the crowd for each time step; 2) feed the spatial representation into an LSTM to produce the spatio-temporal representation r.
• For the i-th pedestrian, the host network first embeds the observed trajectory Oi, and then combines the embedded trajectory with the spatio-temporal representation rt to predict the future trajectory.
• Specifically, the host network predicts the future trajectory in two steps: 1) take the observed trajectory Oi and the spatio-temporal representation rt as input and generate an integrated representation; 2) predict the future trajectory of the i-th pedestrian from the observed trajectory Oi and the integrated representation.
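A minimal PyTorch sketch of this star topology is given below, assuming max pooling for the per-time-step crowd representation and a simple concatenation when a host queries the hub; class names, layer sizes, and the pooling choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of StarNet's hub-and-host topology.
import torch
import torch.nn as nn

class HubNetwork(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.spatial = nn.Sequential(nn.Linear(2, d), nn.ReLU())
        self.temporal = nn.LSTM(d, d, batch_first=True)

    def forward(self, obs):
        # obs: (N, T, 2) observed trajectories of all N pedestrians.
        # 1) per-time-step spatial representation of the whole crowd;
        crowd = self.spatial(obs).max(dim=0, keepdim=True).values  # (1, T, d)
        # 2) an LSTM turns it into the spatio-temporal representation r.
        r, _ = self.temporal(crowd)
        return r[0, -1]                                            # (d,)

class HostNetwork(nn.Module):
    def __init__(self, d=64, horizon=12):
        super().__init__()
        self.embed = nn.LSTM(2, d, batch_first=True)
        self.head = nn.Linear(2 * d, horizon * 2)
        self.horizon = horizon

    def forward(self, obs_i, r):
        # obs_i: (T, 2) one pedestrian; r: (d,) shared crowd representation.
        _, (h, _) = self.embed(obs_i.unsqueeze(0))
        z = torch.cat([h[-1, 0], r])               # query the hub's description
        return self.head(z).view(self.horizon, 2)  # (horizon, 2) future steps

obs = torch.randn(5, 8, 2)                         # 5 pedestrians, 8 observed steps
hub, host = HubNetwork(), HostNetwork()
r = hub(obs)                                       # computed once for the crowd
futures = [host(obs[i], r) for i in range(5)]      # cost linear in pedestrians
```

Because r is computed once and reused by every host, the total cost grows linearly with the number of pedestrians, which is the efficiency argument made above.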
StarNet: Pedestrian Trajectory Prediction using Deep Neural Network in Star Topology
Predicted trajectories and the corresponding ground truths. Different colors indicate different trajectories. The ground-truth trajectories are labeled with dots; the predicted trajectories are labeled with triangles.
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
• CVPRW 2019
• This paper proposes an approach for predicting the motion of pedestrians interacting with others.
• It uses a Generative Adversarial Network (GAN) to sample plausible predictions for any agent in the scene.
• As GANs are very susceptible to mode collapsing and dropping, the authors show that the recently proposed Info-GAN allows dramatic improvements in multi-modal pedestrian trajectory prediction that avoid these issues.
• Unlike some previous works, the L2 loss is left out when training the generator, because it causes serious mode collapsing despite faster convergence.
• Experiments on real and synthetic data show that the proposed method generates more diverse samples and preserves the modes of the predictive distribution.
• In particular, to support this claim, the authors designed a toy dataset of trajectories that can be used to assess how well different methods preserve the modes of the predictive distribution.
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
Illustration of the trajectory prediction problem. Given the observed trajectory of a pedestrian of interest, here shown with a *, and the ones of other pedestrians in the environment, the system should be able to build a predictive distribution of possible trajectories (here with two modes, in dashed yellow lines).
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
• When deciding on steering actions, a pedestrian anticipates likely scenarios about how the surroundings will evolve in the near future.
• This anticipation may not always be easy, because of the uncertainty in the neighbors' future motions and intentions.
• In most recent NN-based motion prediction systems, the input is the set of most recent observations of the surrounding pedestrians.
• Hence, the mappings from observations to predicted trajectories built by the networks do not explicitly consider the uncertain and multimodal nature of the neighbors' future trajectories; in a way, the network is expected to learn this too, which may be too much to expect.
• The Social Ways GAN generates independent random trajectory samples that mimic the distribution of trajectories in the training data, conditioned on observed initial tracklets of duration τ for all the agents in the scene.
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
Block diagram of the Social Ways prediction system. The yellow ellipses represent loss calculations. The dashed arrows show the backpropagation directions. The bold arrows carry ground-truth data.
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
• GAN training is known to be hard: it may not converge, may exhibit vanishing gradients when there is an imbalance between the Generator and the Discriminator, or may be subject to mode collapsing, i.e., sampling of synthetic data without diversity.
• When predicting pedestrian motion, it is critical to avoid mode collapsing, as it could result in catastrophic decisions, e.g., for an autonomous driving agent.
• Two major changes are introduced in the GAN training.
• First, no L2 loss enforcing the generated samples to be close to the true data is used, because of the observed negative impact of this term on the diversity of the generated samples.
• Also, implementing an Info-GAN architecture has a very positive impact on avoiding the mode collapsing problem with respect to other versions of GANs.
• Info-GAN learns disentangled representations of the sources of variation among the data, and does so by introducing a new coding variable c as an input.
• Training adds another term that maximizes a lower bound of the mutual information between the distribution of c and the distribution of the generated outputs, which requires training another sub-network serving as a surrogate to evaluate the likelihoods over the generated data.
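The mutual-information term can be sketched as follows in PyTorch, assuming a continuous latent code with a fixed-variance Gaussian surrogate posterior, in which case the lower bound reduces to an L2 reconstruction of the code; network shapes and names are assumptions, and the conditioning on observed tracklets as well as the adversarial losses are omitted.

```python
# Minimal sketch of the Info-GAN ingredient: a surrogate network Q recovers
# the latent code c from the generated trajectory, maximizing a lower bound
# on the mutual information I(c; G(z, c)).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16 + 2, 64), nn.ReLU(), nn.Linear(64, 12 * 2))  # z + c -> path
Q = nn.Sequential(nn.Linear(12 * 2, 64), nn.ReLU(), nn.Linear(64, 2))       # path -> c_hat

opt = torch.optim.Adam(list(G.parameters()) + list(Q.parameters()), lr=1e-3)
for _ in range(100):
    z = torch.randn(32, 16)                 # noise
    c = torch.rand(32, 2) * 2 - 1           # continuous latent code in [-1, 1]
    fake = G(torch.cat([z, c], dim=1))
    # For a Gaussian posterior with fixed variance, the MI lower bound
    # reduces to an L2 reconstruction of the code (up to constants).
    info_loss = ((Q(fake) - c) ** 2).mean()
    # ... adversarial losses for G and D would be added here; note the
    # deliberate absence of an L2 term pulling `fake` toward ground truth.
    opt.zero_grad()
    info_loss.backward()
    opt.step()
```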
Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
Sample outputs (in magenta). The observed trajectories are shown in blue; the ground-truth predictions and constant-velocity predictions are shown as cyan and orange lines, respectively.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• 2019.7
• Accurate prediction of others' trajectories is essential for autonomous driving.
• Trajectory prediction is challenging because it requires reasoning about agents' past movements, social interactions among varying numbers and kinds of agents, constraints from the scene context, and the stochasticity of human behavior.
• This approach models these interactions and constraints jointly within a Multi-Agent Tensor Fusion (MATF) network.
• Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multi-agent interactions while retaining the spatial structure of agents and the scene context.
• The model decodes recurrently to multiple agents' future trajectories, using an adversarial loss to learn stochastic predictions.
• Experiments on both highway driving and pedestrian crowd datasets show that the model achieves state-of-the-art prediction accuracy.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• There are two parallel encoding streams in the MATF architecture.
• One encodes the past trajectory of each individual agent xi independently using single-agent LSTM encoders, and the other encodes the static scene context image c with a CNN.
• Each LSTM encoder shares the same set of parameters, so the architecture is invariant to the number of agents in the scene.
• The outputs of the LSTM encoders are 1-D agent state vectors {x′1, x′2, ..., x′n} without temporal structure.
• The output of the scene context encoder CNN is a scaled feature map c′ retaining the spatial structure of the bird's-eye view static scene context image.
• Next, the two encoding streams are concatenated spatially into a Multi-Agent Tensor.
• Agent encodings {x′1, x′2, ..., x′n} are placed into one bird's-eye view spatial tensor, which is initialized to 0 and has the same shape (width and height) as the encoded scene image c′.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• The dimension axis of the encodings fits into the channel axis of the tensor.
• The agent encodings are placed into the spatial tensor with respect to their positions at the last time step of their past trajectories.
• This tensor is then concatenated with the encoded scene image in the channel dimension to form a combined tensor. If multiple agents are placed into the same cell of the tensor due to discretization, element-wise max pooling is performed.
• The Multi-Agent Tensor is fed into fully convolutional layers, which learn to represent interactions among multiple agents and between agents and the scene context, while retaining spatial locality, to produce a fused Multi-Agent Tensor.
• Specifically, these layers operate at multiple spatial resolutions by adopting U-Net-like architectures to model interaction at different spatial scales.
• The output feature map of this fusion model, c′′, has exactly the same width and height as c′ to retain the spatial structure of the encoding.
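A minimal PyTorch sketch of assembling the Multi-Agent Tensor follows, assuming a square bird's-eye-view grid of a given metric extent; the grid size, channel counts, and the plain convolutional stand-in for the U-Net-like fusion are illustrative assumptions.

```python
# Minimal sketch: scatter agent encodings into a BEV grid at each agent's
# last observed position (element-wise max on collisions), concatenate
# with the scene feature map along channels, and fuse convolutionally.
import torch
import torch.nn as nn

def build_multi_agent_tensor(agent_vecs, positions, scene_feat, extent=20.0):
    # agent_vecs: (N, C); positions: (N, 2) in meters; scene_feat: (Cs, H, W)
    N, C = agent_vecs.shape
    Cs, H, W = scene_feat.shape
    grid = agent_vecs.new_zeros(C, H, W)
    ij = ((positions + extent) / (2 * extent)
          * torch.tensor([H - 1, W - 1])).long().clamp_min(0)
    ij[:, 0].clamp_(max=H - 1)
    ij[:, 1].clamp_(max=W - 1)
    for n in range(N):                          # element-wise max pooling
        i, j = ij[n]
        grid[:, i, j] = torch.maximum(grid[:, i, j], agent_vecs[n])
    return torch.cat([grid, scene_feat], dim=0)  # (C + Cs, H, W)

fuse = nn.Sequential(                            # stand-in for U-Net-like fusion
    nn.Conv2d(64 + 32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 32, 3, padding=1))
mat = build_multi_agent_tensor(torch.randn(6, 64), torch.randn(6, 2) * 10,
                               torch.randn(32, 40, 40))
fused = fuse(mat.unsqueeze(0))                   # same spatial shape as the scene map
```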
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
The Multi-Agent Tensor encoding is a spatial feature map of the scene context and multiple agents from an overhead perspective, including agent channels (above) and context channels (below). Agents' feature vectors (red), output from single-agent LSTM encoders, are placed spatially w.r.t. the agents' coordinates to form the agent channels. The agent channels are aligned spatially with the context channels (a context feature map) output from the scene context encoding layers to retain the spatial structure.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
• To decode each agent's predicted trajectory, agent-specific representations with fused interaction features {x1′′, x2′′, ..., xn′′} are sliced out of the fused Multi-Agent Tensor output c′′ according to the agents' coordinates.
• These agent-specific representations are then added as residuals to the original encoded agent vectors to form the final agent encoding vectors {x1′ + x1′′, x2′ + x2′′, ..., xn′ + xn′′}, which encode all the information from the past trajectories of the agents themselves, the static scene context, and the interaction features among multiple agents.
• In this way, each agent gets a different social and contextual embedding focused on itself.
• Importantly, the model obtains these embeddings for multiple agents using shared feature extractors instead of operating n times for n agents.
• Finally, for each agent in the scene, its final vector xi′ + xi′′ is decoded into a future trajectory prediction ŷi by an LSTM decoder.
• As with the per-agent encoders, parameters are shared to guarantee that the network generalizes well when the number of agents in the scene varies.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
Illustration of the Multi-Agent Tensor Fusion (MATF) architecture.
Multi-Agent Tensor Fusion for Contextual Trajectory Prediction
Stanford Drone dataset. From left to right: MATF Multi-Agent Scene, MATF Multi-Agent, and LSTM. Blue: past trajectories; red: ground truth; green: predictions. The closer the green predicted trajectory is to the red ground-truth future trajectory, the more accurate the prediction. The model predicts that (1) two agents entering the roundabout from the top will exit to the left; (2) one agent coming from the left on the pathway above the roundabout is turning left to move toward the top of the image; (3) one agent is decelerating at the door of the building above and to the right of the roundabout. (4) In one interesting failure case, an agent on the top-right of the roundabout is turning right to move toward the top of the image; the model predicts the turn, but not how sharp it will be.
Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
• CVPR 2019
• This work proposes an Imitative Decision Learning (IDL) approach, which delves deeper into the key factor that inherently characterizes the multimodality: the latent decision.
• The proposed IDL first infers the distribution of such latent decisions by learning from moving histories.
• A policy is then generated by taking the sampled latent decision into account to predict the future.
• Different plausible upcoming paths correspond to each sampled latent decision.
• This approach differs significantly from the mainstream literature, which relies on a predefined latent variable to extrapolate diverse predictions.
• To augment the understanding of the latent decision and the resultant multimodal future, their connection is investigated through mutual information optimization.
• Moreover, the proposed IDL integrates spatial and temporal dependencies into a single framework, in contrast to handling them in two-step settings.
• This approach enables simultaneously anticipating the paths of all pedestrians in the scene.
Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
The multimodal nature of future paths in a dynamic scene: there are multiple plausible forthcoming paths (the dashed red and cyan lines) based on identical historical moving records (the solid red and cyan lines). Only three possibilities are displayed as an example. One issue that has been challenging for path forecasting in dynamic scenes is the multimodal nature of the future: given a set of historical observations, there is more than one probable future. Despite tremendous accomplishments in foreseeing a deterministic future, the majority of existing studies fail to consider the multiple possibilities of the future.
Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
• This work focuses on understanding and imitating the underlying human decision-making process to anticipate future paths in dynamic scenes.
• Fundamentally, IDL can be viewed as jointly training:
• (1) an inference sub-network L that extrapolates the latent decision,
• (2) a policy/generator π that recovers a policy to generate upcoming paths,
• (3) a statistics sub-network Q that discovers the impact of the latent decision on predictions,
• (4) a discriminator D that attempts to differentiate the generated outcomes from the expert demonstrations.
• The detailed schematic diagram for forecasting future paths is shown in the following figure.
Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
• The red arrows indicate the direction of information flow between modules; the black arrows indicate the flow inside a module.
• The historical trajectories are input into the inference sub-network to infer the distribution of latent decisions.
• The temporal convolutional sub-module receives the output from the pre-trained convolutional sub-module and produces a two-unit vector.
• A pre-trained deconvolutional sub-module and a softmax layer read each unit to form the mean and deviation of a Gaussian distribution over latent decisions.
• Meanwhile, the encoder of the policy/generator π processes the historical trajectories with a ConvGRU layer.
• An element-wise addition of the encoded hidden states h_enc and the sampled latent decision S initializes the decoder.
• The final predictions are generated from the decoded hidden states h_dec through a deconvolutional layer.
• The statistics sub-network reads the prediction and the latent decision to measure the significance of S.
• The discriminator distinguishes predictions from ground-truth future paths (expert demonstrations).
Which Way Are You Going? Imitative Decision Learning for Path Forecasting in Dynamic Scenes
Qualitative comparisons on the SAP dataset. The top left shows the observed records and the matching ground truth (G.T.). For a clear visualization of the multimodality, several trajectories and their diverse predicted paths are illustrated separately in examples 1 to 5.
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
• CVPR 2019
• An algorithm for predicting the near-term trajectories of road agents in dense traffic videos.
• The approach is designed for heterogeneous traffic, where the road agents may be buses, cars, scooters, bicycles, or pedestrians.
• It models the interactions between different road agents using a novel LSTM-CNN hybrid network for trajectory prediction.
• In particular, it takes into account heterogeneous interactions that implicitly account for the varying shapes, dynamics, and behaviors of different road agents.
• It also models horizon-based interactions, which are used to implicitly model the driving behavior of each road agent.
• The prediction algorithm, TraPHic, is evaluated on standard datasets and on a new dense, heterogeneous traffic dataset of urban Asian videos and agent trajectories.
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
• Two observations:
• 1) Road agents in such dense traffic do not react to every road agent around them; rather, they selectively focus attention on key interactions in a semi-elliptical region in the field of view, called the "horizon" (see the sketch after this list).
• 2) To capture heterogeneous road-agent dynamics, agent properties are embedded into the state-space representation of the road agents and fed into the hybrid network.
• TraPHic network:
• Input embeddings are generated for all agents based on trajectory information and heterogeneous dynamic constraints such as agent shape, velocity, traffic concentration at the agent's spatial coordinates, and other parameters.
• These embeddings are passed through LSTMs and eventually used to construct the horizon map, the neighbor map, and the ego agent's own tensor map.
• The horizon and neighbor maps are passed through separate ConvNets and then concatenated together with the ego-agent tensor to produce latent representations.
• Finally, these latent representations are passed through an LSTM to generate a trajectory prediction for the ego agent.
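The horizon idea can be sketched with a simple geometric test, as below in numpy, assuming fixed semi-ellipse axes in front of the ego agent; the axis lengths and function names are illustrative, and TraPHic would feed the selected agents' embeddings into the horizon map rather than return indices.

```python
# Minimal sketch: select agents inside a semi-elliptical "horizon" ahead
# of the ego agent.
import numpy as np

def horizon_agents(ego_pos, ego_heading, others, a=15.0, b=8.0):
    # ego_pos: (2,); ego_heading: heading vector (2,); others: (N, 2).
    fwd = ego_heading / np.linalg.norm(ego_heading)
    left = np.array([-fwd[1], fwd[0]])
    rel = others - ego_pos
    x = rel @ fwd          # longitudinal offset (ahead of ego if > 0)
    y = rel @ left         # lateral offset
    inside = (x > 0) & ((x / a) ** 2 + (y / b) ** 2 <= 1.0)  # front half-ellipse
    return np.where(inside)[0]

others = np.array([[5.0, 1.0], [-3.0, 0.0], [10.0, 5.0], [2.0, 9.0]])
idx = horizon_agents(np.zeros(2), np.array([1.0, 0.0]), others)
print(idx)   # agents the ego selectively attends to, here [0 2]
```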
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
TraPHic network architecture: the ego agent is marked by the red dot. The green elliptical region around it is its neighborhood, and the cyan semi-elliptical region in front of it is its horizon.
TraPHic: Trajectory Prediction in Dense and Heterogeneous Traffic Using Weighted Interactions
Trajectory prediction results: the performance of various trajectory prediction methods on the TRAF dataset with different types of road agents. The ground-truth (GT) trajectory is a solid green line, and the TraPHic prediction is a solid red line. The prediction results of the other methods (RNN-ED, S-LSTM, S-GAN, CS-LSTM) are drawn with different dashed lines.
Learning to Infer Relations for Future Trajectory Forecast
• CVPR 2019
• Relational inference is flexible enough to define 'an object' as a spatial feature representation extracted from each region of a discretized grid, regardless of what exists in that region.
• Inferring relational behavior between road users, as well as between road users and their surrounding physical space, is an important step toward effective modeling and prediction of the navigation strategies adopted by participants in road scenes.
• This is a relation-aware framework for future trajectory forecast, which aims to infer relational information from the interactions of road users with each other and with their environments.
• To address the differing importance of relations, a relation gate module (RGM) with an internal gating process is designed.
• The RGM controls the information flow through multiple switch gates and identifies descriptive relations that highly influence the future motion of the target by conditioning on its past trajectory.
Learning to Infer Relations for Future Trajectory Forecast
The proposed gated relation encoder (GRE) visually discovers both human-human (j-th region: woman ↔ man) and human-space interactions (i-th region: cyclist ↔ cone) from each region of the discretized grid over time.
Learning to Infer Relations for Future Trajectory Forecast
• In this framework, an object is a visual encoding of the spatial behavior of road users (if they exist) and of environmental representations, together with their temporal interactions over time, which naturally corresponds to local human-human and human-space interactions in each region of the discretized grid.
• On top of this, the framework learns to infer relational behavior from all objects (i.e., the spatio-temporal interactions in the context) from a global perspective.
• Given a sequence of images, the gated relation encoder (GRE) visually extracts spatio-temporal interactions (i.e., objects) through the spatial behavior encoder (SBE) and the temporal interaction encoder (TIE).
• The RGM of the GRE infers pairwise relations from objects and then focuses on which relations will be potentially meaningful to forecast the future motion of the target, given its past behavior.
• Future locations are predicted through the trajectory prediction network (TPN), using the aggregated relational features, in the form of heatmaps, which can be further refined by considering the spatial dependencies between predicted locations and extended to learn the uncertainty of the future forecast at test time.
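A minimal PyTorch sketch of such a gated relation module follows, assuming pairwise concatenation of object features and a sigmoid switch gate conditioned on the target's past-trajectory encoding; the gating form, dimensions, and names are assumptions rather than the paper's exact design.

```python
# Minimal sketch of a relation gate module (RGM): pairwise relations between
# grid "objects" are gated by the target's past trajectory, then aggregated.
import torch
import torch.nn as nn

class RelationGateModule(nn.Module):
    def __init__(self, obj_dim=32, traj_dim=16, rel_dim=64):
        super().__init__()
        self.rel = nn.Sequential(nn.Linear(2 * obj_dim, rel_dim), nn.ReLU())
        # Switch gate conditioned on the pair and the target's past motion.
        self.gate = nn.Sequential(nn.Linear(rel_dim + traj_dim, rel_dim),
                                  nn.Sigmoid())

    def forward(self, objects, traj):
        # objects: (K, obj_dim) features of grid regions; traj: (traj_dim,)
        K = objects.size(0)
        pairs = torch.cat([objects.unsqueeze(0).expand(K, K, -1),
                           objects.unsqueeze(1).expand(K, K, -1)], dim=-1)
        r = self.rel(pairs)                                   # (K, K, rel_dim)
        t = traj.expand(K, K, -1)
        g = self.gate(torch.cat([r, t], dim=-1))              # which relations matter
        return (g * r).sum(dim=(0, 1))                        # aggregated relation feature

rgm = RelationGateModule()
feat = rgm(torch.randn(9, 32), torch.randn(16))               # e.g. a 3x3 grid of objects
```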
Learning to Infer Relations for Future Trajectory Forecast
The efficacy of the spatial refinement network (SRN) for spatial dependencies.
Learning to Infer Relations for Future Trajectory Forecast
• Predicted heatmaps from the TPN are sometimes ambiguous.
• The main cause of this issue is a lack of spatial dependencies among predictions.
• Since the network independently predicts δ heatmaps, there is no constraint enforcing them to be spatially aligned across predictions.
• Thus, a spatial refinement network (SRN) is designed to learn implicit spatial dependencies in a feature space.
• Intermediate activations (early and late features) of the TPN are first concatenated and passed through the SRN, which uses large receptive fields.
• As a result, the outputs show less confusion between heatmap locations, making use of rich contextual information from neighboring predictions.
• The total loss combines the TPN and SRN prediction objectives (given as an equation in the original slide).
Learning to Infer Relations for Future Trajectory Forecast
The efficacy of embedding uncertainty into the framework with MC dropout.
Learning to Infer Relations for Future Trajectory Forecast
• Bayesian neural networks (BNNs) have been considered for tackling the uncertainty of a network's weight parameters.
• It has been found that inference in BNNs can be approximated by sampling from the posterior distribution of a deterministic network's weight parameters using Monte Carlo dropout.
• This performs approximate variational inference, using dropout at test time to draw multiple samples from the dropout distribution.
• It effectively enables capturing multiple plausible trajectories over the uncertainties of the network's learned weight parameters.
• The mean of the L samples is used as the prediction, which best approximates variational inference in BNNs.
• The variance of L = 5 samples is computed to measure the uncertainty.
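This test-time procedure is easy to sketch in PyTorch: keep dropout active during inference, draw L stochastic forward passes, and report their mean and variance. The network below is a stand-in, not the paper's TPN/SRN.

```python
# Minimal sketch of MC-dropout inference.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                    nn.Dropout(p=0.2),               # the source of stochasticity
                    nn.Linear(64, 2))

def mc_dropout_predict(net, x, L=5):
    net.train()                    # keeps Dropout sampling at test time
    with torch.no_grad():
        samples = torch.stack([net(x) for _ in range(L)])  # (L, B, 2)
    return samples.mean(0), samples.var(0)                 # prediction, uncertainty

x = torch.randn(4, 8)
mean, var = mc_dropout_predict(net, x, L=5)
```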
Learning to Infer Relations for Future Trajectory Forecast
Qualitative evaluation. (Color codes: yellow - given past trajectory, red - ground truth, green - this prediction.) Illustrations of prediction during complicated human-human interactions. (a) A cyclist (•••) interacts with a person moving slowly (•••). (b) A person (•••) meets a group of people. (c) A cyclist (•••) first interacts with another cyclist in front (•••) and then considers the influence of a person (•••). This approach socially avoids potential collisions.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• CVPR 2019
• Deciphering human behaviors from videos to predict their future paths/trajectories and what they will do is important in many applications.
• Therefore, this work studies predicting a pedestrian's future path jointly with future activities.
• The authors propose an end-to-end, multi-task learning system, called Next, utilizing rich visual features about human behavioral information and interaction with the surroundings.
• It encodes a person through rich semantic features about visual appearance, body movement, and interaction with the surroundings, motivated by the fact that humans derive such predictions by relying on similar visual cues.
• To facilitate training, the network is learned with an auxiliary task of predicting the future location in which the activity will happen.
• For the auxiliary task, a discretized grid called the Manhattan Grid is designed as the location prediction target for the system.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
The goal is to jointly predict a person's future path and activity. The green and yellow lines show two possible future trajectories, and two possible activities are shown in the green and yellow boxes. Depending on the future activity, the person (top right) may take different paths, e.g. the yellow path for "loading" and the green path for "object transfer".
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
• Humans navigate through public spaces often with specific purposes in mind, ranging from simple ones like entering a room to more complicated ones like putting things into a car.
• Such intention, however, is mostly neglected in existing work.
• The joint prediction model can have two benefits:
• 1) Learning the activity together with the path may benefit future path prediction; intuitively, humans are able to read others' body language to anticipate whether they are going to cross the street or continue walking along the sidewalk.
• 2) The joint model advances the capability of understanding not only the future path but also the future activity by taking into account the rich semantic context in videos; this increases the capabilities of automated video analytics for social good, such as safety applications like anticipating pedestrian movement at traffic intersections, or a road robot helping humans transport goods to a car.
Peeking into the Future: Predicting Future Person Activities and Locations in Videos
Overview of the Next model. Given a sequence of frames containing the person for prediction, the model utilizes a person behavior module and a person interaction module to encode rich visual semantics into a feature tensor.
• 54. Peeking into the Future: Predicting Future Person Activities and Locations in Videos • 4 Key components: • Person behavior module extracts visual information from the behavioral sequence of the person. • Person interaction module looks at the interaction between a person and their surroundings. • Trajectory generator summarizes the encoded visual features and predicts the future trajectory by the LSTM decoder with focal attention. • Activity prediction utilizes rich visual semantics to predict the future activity label for the person. • In addition, the scene is divided into a discretized grid of multiple scales, called the Manhattan Grid, on which classification and regression are computed for robust activity location prediction (see the sketch below).
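A minimal sketch of a multi-scale discretized grid in this style, assuming only that the scene is divided into regular cells at several scales; the cell counts and helper names here are illustrative, not taken from the paper:

```python
def build_grids(h, w, scales=((18, 32), (36, 64))):
    """Return one (rows, cols) grid spec per scale for an h x w frame."""
    return [dict(rows=r, cols=c, ch=h / r, cw=w / c) for r, c in scales]

def point_to_cell(x, y, grid):
    """Map an (x, y) pixel to a flat cell index plus the offset to the cell
    center: a classification target and a regression target, respectively."""
    col = min(int(x // grid["cw"]), grid["cols"] - 1)
    row = min(int(y // grid["ch"]), grid["rows"] - 1)
    cx = (col + 0.5) * grid["cw"]
    cy = (row + 0.5) * grid["ch"]
    return row * grid["cols"] + col, (x - cx, y - cy)

grids = build_grids(h=1080, w=1920)
for g in grids:
    idx, (dx, dy) = point_to_cell(1000.0, 500.0, g)
    print(g["rows"], "x", g["cols"], "-> cell", idx,
          "offset", (round(dx, 1), round(dy, 1)))
```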
• 55. Peeking into the Future: Predicting Future Person Activities and Locations in Videos To model appearance changes of a person, a pre-trained object detection model with “RoIAlign” is used to extract fixed-size CNN features for each person bounding box. The features are averaged along the spatial dimensions for each person and fed into an LSTM encoder, yielding a feature representation of shape Tobs × d, where d is the hidden size of the LSTM. To capture body movement, a person keypoint detection model extracts person keypoint information; a linear transformation embeds the keypoint coordinates before they are fed into the LSTM encoder, and the encoded feature likewise has shape Tobs × d. These appearance and movement features are commonly used in a wide variety of studies and thus do not introduce new concerns regarding machine learning fairness.
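A rough sketch of these two behavior encoders (illustrative, not the released Next code): spatially averaged RoIAlign appearance features and embedded keypoint coordinates, each run through its own LSTM. `Tobs`, `d`, and the input feature sizes below are placeholders:

```python
import torch
import torch.nn as nn

Tobs, d = 8, 128
appearance_dim = 256     # e.g., spatially averaged RoIAlign features per box
num_keypoints = 17       # e.g., COCO-style body keypoints

appearance_lstm = nn.LSTM(appearance_dim, d, batch_first=True)
keypoint_embed = nn.Linear(num_keypoints * 2, d)  # embed (x, y) coordinates
keypoint_lstm = nn.LSTM(d, d, batch_first=True)

appearance = torch.randn(1, Tobs, appearance_dim)  # one person over Tobs frames
keypoints = torch.randn(1, Tobs, num_keypoints * 2)

app_feat, _ = appearance_lstm(appearance)               # (1, Tobs, d)
kp_feat, _ = keypoint_lstm(keypoint_embed(keypoints))   # (1, Tobs, d)
print(app_feat.shape, kp_feat.shape)
```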
• 56. Peeking into the Future: Predicting Future Person Activities and Locations in Videos The person-objects feature can capture how far away the person is from other people and cars. The person-scene feature can capture whether the person is near the sidewalk or grass. This information is provided to the model with the hope of learning patterns such as that a person walks more often on the sidewalk than on the grass and tends to avoid bumping into cars.
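A hypothetical sketch of a person-objects geometric feature of this kind: per frame, encode the offset from the person's box to every other detected box. The log scaling is a common choice for such geometric features and is assumed here, not taken from the paper:

```python
import torch

def person_object_feature(person_box, other_boxes):
    """person_box: (4,) as (x1, y1, x2, y2); other_boxes: (K, 4).
    Returns (K, 2) log-scaled center offsets from the person to each object."""
    pc = (person_box[:2] + person_box[2:]) / 2           # person center (2,)
    oc = (other_boxes[:, :2] + other_boxes[:, 2:]) / 2   # object centers (K, 2)
    delta = oc - pc
    return torch.sign(delta) * torch.log1p(delta.abs())

person = torch.tensor([100., 200., 150., 300.])
objects = torch.tensor([[400., 220., 480., 320.],   # e.g., a parked car
                        [90., 190., 140., 290.]])   # another pedestrian
print(person_object_feature(person, objects))
```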
• 57. Peeking into the Future: Predicting Future Person Activities and Locations in Videos • It uses an LSTM decoder to directly predict the future trajectory in xy-coordinates. • The hidden state of this decoder is initialized using the last state of the person’s trajectory LSTM encoder. • An auxiliary task, activity location prediction, is added in addition to predicting the future activity label of the person. • At each time instant, the xy-coordinate is computed from the decoder state by a fully connected layer. • It employs an effective focal attention, originally proposed to carry out multimodal inference over a sequence of images for visual question answering; its key idea is to project multiple features into a space of correlation, where discriminative features can be captured more easily by the attention mechanism.
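A simplified focal-attention sketch following the description above (the released implementation may differ): the decoder query attends first within each encoded feature sequence over time, then across the feature types, via dot-product correlations:

```python
import torch
import torch.nn.functional as F

def focal_attention(query, features):
    """query: (B, d) decoder state; features: (B, K, T, d) — K encoded
    sequences of length T. Returns a (B, d) context vector."""
    B, K, T, d = features.shape
    scores = torch.einsum('bd,bktd->bkt', query, features) / d ** 0.5
    within = F.softmax(scores, dim=2)                    # attention over time
    per_feature = torch.einsum('bkt,bktd->bkd', within, features)
    across = F.softmax(scores.amax(dim=2), dim=1)        # attention over features
    return torch.einsum('bk,bkd->bd', across, per_feature)

ctx = focal_attention(torch.randn(2, 128), torch.randn(2, 4, 8, 128))
print(ctx.shape)   # torch.Size([2, 128])
```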
• 58. Peeking into the Future: Predicting Future Person Activities and Locations in Videos To bridge the gap between trajectory generation and activity label prediction, it proposes an activity location prediction (ALP) module to predict the final location where the person will engage in the future activity. Activity location prediction comprises two tasks: location classification and location regression.
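A minimal sketch of these two ALP heads and their joint loss, assuming a flattened grid of G cells: cross-entropy over which cell the final activity lands in, plus a smooth-L1 regression of the offset inside that cell. The head names and the equal loss weighting are assumptions for illustration:

```python
import torch
import torch.nn as nn

G, d = 576, 128                          # e.g., an 18 x 32 grid, flattened
cls_head = nn.Linear(d, G)               # location classification logits
reg_head = nn.Linear(d, 2)               # (dx, dy) offset within the target cell

feat = torch.randn(4, d)                 # per-person decoded feature
target_cell = torch.randint(0, G, (4,))  # ground-truth cell index
target_offset = torch.randn(4, 2)        # ground-truth offset to the cell center

loss_cls = nn.functional.cross_entropy(cls_head(feat), target_cell)
loss_reg = nn.functional.smooth_l1_loss(reg_head(feat), target_offset)
loss = loss_cls + loss_reg               # relative weighting assumed
print(float(loss))
```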
  • 59. Peeking into the Future: Predicting Future Person Activities and Locations in Videos Qualitative comparison between this method and the baselines. Yellow path is the observable trajectory and green path is the ground truth trajectory during the prediction period. Predictions are shown as blue heatmaps.