Numenta VP Research Subutai Ahmad presents a talk on "Sparsity in the Neocortex and its Implications for Continual Learning" at the virtual CVPR 2020 workshop. In this talk, he discusses how continuous learning systems can benefit from sparsity, active dendrites and other neocortical mechanisms.
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for continuous learning
1. SPARSITY IN THE NEOCORTEX,
AND ITS IMPLICATIONS FOR CONTINUOUS LEARNING
CVPR WORKSHOP ON CONTINUAL LEARNING IN COMPUTER VISION
JUNE 14, 2020
Subutai Ahmad
Email: sahmad@numenta.com
Twitter: @SubutaiAhmad
2. 1) Reverse engineer the neocortex
- biologically accurate theories
- open access neuroscience publications
2) Apply neocortical principles to AI
- improve current techniques
- move toward truly intelligent systems
Mission
Founded in 2005,
by Jeff Hawkins and Donna Dubinsky
3. OUTLINE
1. Sparsity in the neocortex
• Sparse activations and connectivity
• Neuron model
• Learning rules
2. Sparse representations and catastrophic forgetting
• Stability
• Plasticity
3. Network model
• Unsupervised continuously learning system
5. A sparse vector is a vector with mostly zero elements (“mostly missing”).
Most neuroscience papers describe three types of sparsity:
1) Population sparsity
How many neurons are active right now?
Estimate: roughly 0.5% to 2% of cells are active at a time (Attwell & Laughlin, 2001; Lennie, 2003).
2) Lifetime sparsity
How often does a given cell fire?
3) Connection sparsity
When a layer of cells projects to another layer, what percentage are connected?
Estimate: 1% to 5% of possible neuron-to-neuron connections exist (Holmgren et al., 2003).
WHAT EXACTLY IS “SPARSITY”?
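As a quick illustration, the three sparsity measures above can be computed directly from an activity matrix. This is a minimal sketch; the function names are mine, not from the talk:

```python
import numpy as np

def population_sparsity(activity):
    """Fraction of neurons active at one moment (how many are active right now)."""
    return np.count_nonzero(activity) / activity.size

def lifetime_sparsity(activity_over_time):
    """Per-neuron fraction of time steps in which it fires.
    Rows = time steps, columns = neurons."""
    return np.count_nonzero(activity_over_time, axis=0) / len(activity_over_time)

def connection_sparsity(weights):
    """Fraction of possible connections that actually exist (nonzero weights)."""
    return np.count_nonzero(weights) / weights.size

# 1000 neurons with 20 active -> 2% population sparsity, matching the estimates above
x = np.zeros(1000)
x[np.random.choice(1000, 20, replace=False)] = 1.0
print(population_sparsity(x))  # 0.02
```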
6. Point Neuron Model (figure labels: “soma”, “axon”)
NEURON MODEL
✗ Not a neuron
Integrate-and-fire neuron: Lapicque, 1907
Perceptron: Rosenblatt, 1962
Deep learning: Rumelhart et al., 1986; LeCun et al., 2015
8. DENDRITES DETECT SPARSE PATTERNS
(Mel, 1992; Branco & Häusser, 2011; Schiller et al., 2000; Losonczy, 2006; Antic et al., 2010; Major et al., 2013; Spruston, 2008; Milojkovic et al., 2005, etc.)
Major, Larkum and Schiller 2013
Pyramidal neuron
3K to 10K synapses
Dendrites split into dozens of independent computational segments
These segments activate with a cluster of 10-20 active synapses
Neurons detect dozens of highly sparse patterns, in parallel
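A minimal sketch of this segment-level pattern detection, assuming binary activity and set-based synapses (the names and the threshold default are illustrative, not from the talk's code):

```python
def segment_matches(segment_synapses, active_cells, threshold=10):
    """A segment generates a dendritic spike when at least `threshold` of its
    synapses (the 10-20 cluster size cited above) see active presynaptic cells."""
    return len(segment_synapses & active_cells) >= threshold

def neuron_depolarized(segments, active_cells, threshold=10):
    """The neuron checks dozens of segments in parallel; any match depolarizes it."""
    return any(segment_matches(s, active_cells, threshold) for s in segments)

# Example: one segment stores a 15-synapse pattern; 12 of those cells are active
segment = set(range(15))
active = set(range(3, 40))
print(segment_matches(segment, active))  # True (overlap = 12 >= 10)
```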
9. Pyramidal neuron
Sparse feedforward patterns
Sparse local patterns
Sparse top-down patterns
Learning localized to dendritic segments
“Branch specific plasticity”
If cell becomes active:
• If there was a dendritic spike, reinforce that segment
• If there were no dendritic spikes, grow connections by
subsampling cells active in the past
If cell is not active:
• If there was a dendritic spike, weaken the segments
(Gordon et al., 2006; Losonczy et al., 2008; Yang et al., 2014; Cichon & Gan, 2015;
El-Boustani et al., 2018; Weber et al., 2016; Sander et al., 2016; Holthoff et al., 2004)
NEURONS UNDERGO SPARSE LEARNING
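The branch-specific rule above can be sketched as follows. The permanence representation, increments, and subsample size are assumptions for illustration, not values from the cited papers:

```python
import random

P_INC, P_DEC, P_INIT = 0.1, 0.1, 0.2   # assumed learning increments

def branch_specific_update(cell_active, segments, spiking, prev_active, sample=15):
    """Branch-specific plasticity sketch.
    segments: list of dicts {presynaptic cell -> permanence}
    spiking: indices of segments that generated a dendritic spike
    prev_active: cells active in the past (candidates for new synapses)."""
    if cell_active:
        if spiking:
            for i in spiking:                      # reinforce the spiking segment(s)
                for pre in segments[i]:
                    segments[i][pre] = min(1.0, segments[i][pre] + P_INC)
        else:                                      # no spike: grow new connections
            pres = random.sample(sorted(prev_active), min(sample, len(prev_active)))
            segments.append({pre: P_INIT for pre in pres})
    else:
        for i in spiking:                          # false match: weaken the segment(s)
            for pre in segments[i]:
                segments[i][pre] = max(0.0, segments[i][pre] - P_DEC)
```

Because updates touch only the one segment that spiked (or add a fresh segment), learning one pattern leaves synapses serving other patterns untouched.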
10. “We observed substantial spine turnover, indicating that the architecture of the neuronal circuits in the
auditory cortex is dynamic (Fig. 1B). Indeed, 31% ± 1% (SEM) of the spines in a given imaging
session were not detected in the previous imaging session; and, similarly, 31 ± 1% (SEM) of the spines
identified in an imaging session were no longer found in the next imaging session.”
(Loewenstein, et al., 2015)
Learning involves growing and removing synapses
• Structural plasticity: network structure is dynamically altered during learning
HIGHLY DYNAMIC LEARNING AND CONNECTIVITY
11. OUTLINE
1. Sparsity in the neocortex
• Neural activations and connectivity are highly sparse
• Neurons detect dozens of independent sparse patterns
• Learning is sparse and incredibly dynamic
2. Sparse representations and catastrophic forgetting
• Stability
• Plasticity
3. Network model
• Unsupervised continuously learning system
12. Thousands of neurons send input to any single
neuron
On each neuron, 8-20 synapses on tiny segments of
dendrites recognize patterns.
The connections are learned.
STABILITY OF SPARSE REPRESENTATIONS
Pyramidal neuron
3K to 10K synapses
Sparse vector matching over n inputs:
x_i = connections on dendrite
x_j = input activity
A segment matches when x_i · x_j ≥ θ; we want P(x_i · x_j ≥ θ) for a random input.
13. We can get excellent robustness by reducing θ, at the cost of increased “false positives” and interference.
We can compute the probability of a random vector x_j matching a given x_i:

P(x_i · x_j ≥ θ) = [ Σ_{b=θ}^{|x_i|} |Ω_n(x_i, b, |x_j|)| ] / C(n, |x_j|)

Numerator: volume around the point x_i (white)
Denominator: full volume of the space (grey)

where the number of k-sparse vectors overlapping x_i in exactly b bits is

|Ω_n(x_i, b, k)| = C(|x_i|, b) · C(n − |x_i|, k − b)
STABILITY OF SPARSE REPRESENTATIONS
(Ahmad & Scheinkman, 2019)
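These binomial formulas can be evaluated exactly; a minimal sketch (my code, not Numenta's):

```python
from math import comb

def omega(n, xi, b, k):
    """|Omega_n(x_i, b, k)|: number of k-sparse binary vectors in n dimensions
    that overlap a fixed vector with xi active bits in exactly b positions."""
    return comb(xi, b) * comb(n - xi, k - b)

def p_false_match(n, xi, xj, theta):
    """P(x_i . x_j >= theta) for a uniformly random xj-sparse binary vector."""
    numer = sum(omega(n, xi, b, xj) for b in range(theta, min(xi, xj) + 1))
    return numer / comb(n, xj)

# Sparse and high-dimensional: the false-match probability is vanishingly small
print(p_false_match(n=1000, xi=20, xj=20, theta=10))
```

With n = 1000 and 20-of-1000 activity, two random sparse vectors share an expected 0.4 active bits, so clearing a threshold of 10 is astronomically unlikely; that gap is the stability argument in quantitative form.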
14. 1) The false-positive error decreases exponentially with dimensionality when activity is sparse.
2) Error rates do not decrease when activity is dense (a = n/2).
3) Assumes a uniform random distribution of vectors.
Figure panels: probability of interference for sparse binary vectors (left) and sparse scalar vectors (right).
STABILITY OF SPARSE REPRESENTATIONS
(Ahmad & Scheinkman, 2019)
16. (Hawkins & Ahmad, 2016)
Model pyramidal neuron
Simple localized learning rules
When a cell becomes active:
1) If a segment detected a pattern, reinforce that segment
2) If no segment detected a pattern, grow new connections
on a new dendritic segment
If the cell did not become active:
1) If a segment detected a pattern, weaken that segment
- Learning consists of growing new connections
- Neurons learn continuously but since patterns are
sparse and learning is sparse, new patterns don’t
interfere with old ones
Sparse top-down context
Sparse local context
Sparse feedforward patterns
STABILITY VS PLASTICITY
17. OUTLINE
1. Sparsity in the neocortex
• Neural activations and connectivity are highly sparse
• Neurons detect dozens of independent sparse patterns
• Learning is sparse and incredibly dynamic
2. Sparse representations and catastrophic forgetting
• Sparse high dimensional representations are remarkably stable
• Local plasticity rules enable learning new patterns without interference
3. Network model
• Unsupervised continuously learning system
18. (Hawkins & Ahmad, 2016)
HTM SEQUENCE MEMORY
Model pyramidal neuron
Sparse top-down context
Sparse local context
Sparse feedforward patterns
1) Associates past activity as context for current activity
2) Automatically learns from prediction errors
3) Learns continuously without forgetting past patterns
4) Can learn complex, high-order Markov sequences
19. CONTINUOUS LEARNING AND FAULT TOLERANCE
Input: continuous stream of non-Markov sequences interspersed with random input
Task: correctly predict the next element (max accuracy is 50%)
XABCDE noise YABCFG noise YABCFG noise……
time
(Hawkins & Ahmad, 2016)
Figure annotations: sequences were changed mid-stream, and neurons were “killed” to test fault tolerance.
20. CONTINUOUS LEARNING WITH STREAMING DATA SOURCES
(Cui et al., Neural Computation, 2016)
HTM* compared against recurrent neural networks (ESN, LSTM).
Figure: NYC taxi demand data stream: passenger count in 30-minute windows, Monday 2015-04-20 through Sunday 2015-04-26 (roughly 0 to 30k passengers).
Panels compare Shift, ARIMA, LSTM1000, LSTM3000, LSTM6000, and TM on NRMSE, MAPE, and negative log-likelihood.
Source: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
22. ANOMALY DETECTION
Benchmark for anomaly detection in streaming applications
Detector Score
Perfect 100.0
HTM 70.1
CAD OSE† 69.9
nab-comportex† 64.6
KNN CAD† 58.0
Relative Entropy 54.6
Twitter ADVec v1.0.0 47.1
Windowed Gaussian 39.6
Etsy Skyline 35.7
Sliding Threshold 30.7
Bayesian Changepoint 17.7
EXPoSE 16.4
Random 11.0
• Real-world data (365,551 points, 58 data streams)
• Scoring encourages early detection
• Published, open resource
(Ahmad et al, 2017)
23. SUMMARY
1. Sparsity in the neocortex
• Neural activations and connectivity are highly sparse
• Neurons detect dozens of independent sparse patterns
• Learning is sparse and incredibly dynamic
2. Sparse representations and catastrophic forgetting
• Sparse high dimensional representations are remarkably stable
• Local plasticity rules enable learning new patterns without interference
3. Network model
• Biologically inspired unsupervised continuously learning system
• Inherently stable representations
Thank you! Questions? sahmad@numenta.com