Principles of Hierarchical Temporal Memory - Foundations of Machine Intelligence
1. Numenta Workshop
October 17, 2014
Jeff Hawkins
jhawkins@Numenta.com
Principles of Hierarchical Temporal Memory
Foundations of Machine Intelligence
2. Numenta’s Mission
1) Discover the operating principles of the neocortex.
2) Create technology for machine intelligence based on neocortical principles.
3. Why Will Machine Intelligence Be Based on Cortical Principles?
1) The cortex uses a common learning algorithm
   (vision, hearing, touch, behavior)
2) The cortical algorithm is incredibly adaptable
   (languages, engineering, science, arts, ...)
3) Network effects
   (hardware and software efforts will focus on the most universal solution)
4. Talk Topics
- Cortical facts
- Cortical theory (HTM)
- Research roadmap
- Applications roadmap
- Thoughts on Machine Intelligence
(Slide annotations mark the topics as "easy", "deep", "easy".)
5. What the Cortex Does
The neocortex learns a model from fast-changing sensory data: patterns of light, sound, and touch arriving through the retina, cochlea, and somatic senses.
The model generates:
- predictions
- anomalies
- actions
Most sensory changes are due to your own movement.
The neocortex learns a sensory-motor model of the world.
7. Cortical Theory
Cortical facts:
- Hierarchy
- Cellular layers
- Mini-columns
- Neurons with 1,000’s of synapses
  - 10% proximal
  - 90% distal
- Active distal dendrites
- Synaptogenesis
- Remarkably uniform
  - anatomically
  - functionally
- Sheet of cells
HTM (Hierarchical Temporal Memory):
1) Hierarchy of identical regions
2) Each region learns sequences
3) Stability increases going up the hierarchy if the input is predictable
4) Sequences unfold going down the hierarchy
Questions:
- What does a region do?
- What do the cellular layers do?
- How do neurons implement this?
- How does this work in a hierarchy?
(Diagram: cortical layers 2/3, 4, 5, 6.)
8. Cellular Layers
(Diagram: each cellular layer is a sequence memory, linked by feedforward and feedback pathways.)
- Layer 2/3: sequence memory, used for inference
- Layer 4: sequence memory, used for inference
- Layer 5: sequence memory, used for motor
- Layer 6: sequence memory, used for attention
Each layer implements a variation of a common sequence memory algorithm.
9. Two Types of Inference (L4, L2/3)
(Diagram: layers 2/3, 4, 5, 6; layer 4 receives sensor/afferent data plus a copy of motor commands; layer 2/3 projects to the next higher region.)
- Layer 4 learns sensory-motor sequences.
- Layer 2/3 learns high-order sequences, e.g. A-B-C-D vs. X-B-C-Y, so A-B-C-? predicts D and X-B-C-? predicts Y.
- Predicted inputs produce stable representations; un-predicted changes pass through to the next level.
These are universal inference steps.
They apply to all sensory modalities.
They produce the receptive field properties seen in cortex.
11. Sparse Distributed Representations (SDRs): The Language of Intelligence
Dense representations:
• Few bits (8 to 128)
• All combinations of 1’s and 0’s
• Example: 8-bit ASCII, 01101101 = m
• Bits have no inherent meaning
• Arbitrarily assigned by the programmer
Sparse Distributed Representations (SDRs):
• Many bits (thousands)
• Few 1’s, mostly 0’s
• Example: 2,000 bits, 2% active
  01000000000000000001000000000000000000000000000000000010000…………01000
• Each bit has semantic meaning
• Meanings are learned
12. SDR Properties
1) Similarity: shared bits indicate semantic similarity.
2) Store and Compare: store only the indices of the active bits (e.g. indices 1, 2, ..., 40); subsampling is OK (e.g. comparing just 10 of the 40 indices is enough).
3) Union membership: OR together many SDRs (e.g. 10 SDRs at 2% active give a union with about 20% of bits active) and test whether a new SDR is a member.
   E.g. a cell can recognize many unique patterns on a single dendritic branch: ten synapses from Pattern 1, ..., ten synapses from Pattern N.
(A short code sketch of these properties follows below.)
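The three properties are easy to see in code. The sketch below is a toy in Python, not Numenta's implementation; the sizes (roughly 2,000 bits, ~2% active, 40 stored indices, 10-bit subsamples) come from the slides, and every function name is invented for illustration.

import random

N_BITS = 2048
N_ACTIVE = 40

def random_sdr():
    """An SDR represented as the set of its active bit indices."""
    return set(random.sample(range(N_BITS), N_ACTIVE))

def similarity(a, b):
    """Property 1: shared active bits indicate semantic similarity."""
    return len(a & b)

def matches_subsample(stored_subsample, sdr):
    """Property 2: comparing a small subsample (e.g. 10 of 40 indices) is enough."""
    return stored_subsample <= sdr

def union_contains(union_bits, sdr, threshold=N_ACTIVE):
    """Property 3: test membership against a union of many stored SDRs."""
    return len(union_bits & sdr) >= threshold

patterns = [random_sdr() for _ in range(10)]
union_bits = set().union(*patterns)                # ~20% of bits active for 10 SDRs
subsample = set(random.sample(sorted(patterns[0]), 10))

print(similarity(patterns[0], patterns[1]))        # near 0 for unrelated random SDRs
print(matches_subsample(subsample, patterns[0]))   # True
print(union_contains(union_bits, patterns[0]))     # True: a stored pattern is a member
print(union_contains(union_bits, random_sdr()))    # very likely False

Because the space of 40-of-2,048 codes is astronomically large, random overlaps are tiny, which is why subsampling and unions remain reliable.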
13. Neurons Recognize Hundreds of Patterns
- A cell activates from dozens of feedforward patterns.
- A cell predicts its activity in hundreds of contexts.
20. Learning Transitions
- This is a first-order sequence memory.
- It cannot learn A-B-C-D vs. X-B-C-Y.
- Mini-columns turn this into a high-order sequence memory.
- Multiple predictions can occur at once (e.g. A-B, A-C, A-D).
21. Forming High-order Representations
- Feedforward input causes a sparse activation of columns.
- With no prediction, all cells in an active column become active.
- With a prediction, only the predicted cells in the column become active.
22. Representing High-order Sequences
A-B-C-D vs. X-B-C-Y
(Diagram: before training, every cell in the columns for B, C, and D/Y becomes active; after training, the same columns are active but only one cell per column, e.g. B' vs. B'' and C' vs. C'', so the A... and X... contexts stay distinct.)
IF 40 active columns and 10 cells per column,
THEN there are 10^40 ways to represent the same input in different contexts.
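The activation rule of slides 21 and 22 can be sketched in a few lines. This is an illustrative toy, not the NuPIC implementation; the class name TinyTemporalMemory and its learning rule (one randomly chosen winner cell per bursting column) are simplifications made up for this sketch.

from collections import defaultdict
import random

CELLS_PER_COLUMN = 10

class TinyTemporalMemory:
    """Toy high-order sequence memory: context = which cell is active in each column."""

    def __init__(self):
        # (column, cell) -> list of sets of previously active cells that predict this cell
        self.learned_contexts = defaultdict(list)
        self.prev_active = frozenset()

    def compute(self, active_columns, learn=True):
        next_active = set()
        for col in active_columns:
            # cells in this column that were predicted by the previous timestep
            predicted = [c for c in range(CELLS_PER_COLUMN)
                         if any(ctx <= self.prev_active
                                for ctx in self.learned_contexts[(col, c)])]
            if predicted:
                # with prediction: only the predicted cells become active
                next_active.update((col, c) for c in predicted)
            else:
                # no prediction: every cell in the column becomes active ("bursting"),
                # and one cell learns the previous context, making the memory high-order
                next_active.update((col, c) for c in range(CELLS_PER_COLUMN))
                if learn and self.prev_active:
                    winner = random.randrange(CELLS_PER_COLUMN)
                    self.learned_contexts[(col, winner)].append(self.prev_active)
        self.prev_active = frozenset(next_active)
        return next_active

Feeding the column patterns for A-B-C-D and X-B-C-Y through this a few times ends with different cells representing B and C in the two contexts, which is the "same columns, one cell per column" picture above; the real HTM temporal memory adds dendritic segments, permanences, and subsampling on top of this skeleton.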
23. HTM Temporal Memory (aka Cellular Layer)
- Converts input to a sparse activation of columns
- Recognizes and recalls high-order sequences
Properties:
- Continuous learning
- High capacity
- Local learning rules
- Fault tolerant
- No sensitive parameters
- Semantic generalization
HTM Temporal Memory is a building block of the neocortex and of machine intelligence.
(Diagram: the same building block is reused for inference, motor, and attention layers.)
24. Research Roadmap
(Rows correspond to cortical layers 2/3, 4, 5, 6.)
High-order Inference (layer 2/3): theory 98%, extensively tested, commercial
  Data: streaming
  Capabilities: prediction, anomaly detection, classification
  Applications: predictive maintenance, security, natural language processing
Sensory-motor Inference (layer 4): theory 80%, in development
Motor Sequences (layer 5): theory 50%
Attention/Feedback (layer 6): theory 10%
25. Applications Using HTM High-order Inference
- Server anomalies (Grok, available on AWS)
- Unusual human behavior
- Geospatial anomalies
- Natural language search/prediction (Cortical.IO)
- Stock volume anomalies
Pipeline: data -> encoder -> SDR -> HTM high-order sequence memory -> predictions and anomalies.
All use the same HTM code base. (A minimal pipeline sketch follows.)
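A minimal sketch of that pipeline, purely illustrative: encode_scalar, the transition table, and anomaly_score below are simplified stand-ins invented for this sketch, not the NuPIC or Grok APIs, and the tiny first-order transition table stands in for the HTM sequence memory.

def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=2048, n_active=40):
    """Toy scalar encoder: a block of active bits whose position tracks the value."""
    span = n_bits - n_active
    start = int((value - min_val) / (max_val - min_val) * span)
    start = max(0, min(span, start))
    return frozenset(range(start, start + n_active))

def anomaly_score(predicted, actual):
    """Fraction of the actual input's bits that were not predicted (1.0 = fully unexpected)."""
    return 1.0 - len(predicted & actual) / len(actual)

# Repeating metric (e.g. server load) with one surprise value (80).
stream = [20, 22, 24, 20, 22, 24, 20, 22, 24, 80, 20, 22]

transitions = {}     # toy stand-in for the sequence memory: previous SDR -> predicted bits
prev = None
for value in stream:
    sdr = encode_scalar(value)
    predicted = transitions.get(prev, frozenset())
    print(value, round(anomaly_score(predicted, sdr), 2))  # ~1.0 while learning, then 0.0,
                                                           # and 1.0 again at the surprise 80
    if prev is not None:
        transitions.setdefault(prev, set()).update(sdr)    # learn the observed transition
    prev = sdr

The design point the slide makes is that only the encoder changes per application (servers, GPS, words, stock volume); the sequence memory and anomaly scoring stay the same.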
26. Research Roadmap (continued)
High-order Inference (layer 2/3): theory 98%, extensively tested, commercial
  Data: streaming
  Capabilities: prediction, anomaly detection, classification
  Applications: IT, security, natural language processing
Sensory-motor Inference (layer 4): theory 80%, in development
  Data: static (with simple behaviors)
  Capabilities: classification, prediction
  Applications: vision image classification (with saccades), network classification
Motor Sequences (layer 5): theory 50%
Attention/Feedback (layer 6): theory 10%
27. Research Roadmap (continued)
High-order Inference (layer 2/3): theory 98%, extensively tested, commercial
  Data: streaming
  Capabilities: prediction, anomaly detection, classification
  Applications: IT, security, natural language processing
Sensory-motor Inference (layer 4): theory 80%, in development
  Data: static (with simple behaviors)
  Capabilities: classification, prediction
  Applications: vision image classification (with saccades), network classification
Motor Sequences (layer 5): theory 50%
  Data: static and/or streaming
  Capabilities: goal-oriented behavior
  Applications: robotics, smart bots, proactive defense
Attention/Feedback (layer 6): theory 10%
28. Research Roadmap (continued)
High-order Inference (layer 2/3): theory 98%, extensively tested, commercial
  Data: streaming
  Capabilities: prediction, anomaly detection, classification
  Applications: IT, security, natural language processing
Sensory-motor Inference (layer 4): theory 80%, in development
  Data: static (with simple behavior)
  Capabilities: classification, prediction
  Applications: vision image classification (with saccades and hierarchy), network classification
Motor Sequences (layer 5): theory 50%
  Data: static and/or streaming
  Capabilities: goal-oriented behavior
  Applications: robotics, smart bots, proactive defense
Attention/Feedback (layer 6): theory 10%
  Enables: multi-sensory modalities, multi-behavioral modalities
29. Research Roadmap: Open and Transparent
- Algorithms are documented
- Multiple independent implementations
- Numenta’s software is open source (GPLv3): NuPIC, www.Numenta.org
- Active discussion groups for theory and implementation
- Numenta’s research code is posted daily
Collaborations:
- IBM Almaden Research, San Jose, CA
- DARPA, Washington, D.C.
- Cortical.IO, Austria
Contact: jhawkins@numenta.com, @Numenta
35. Machine Intelligence Landscape
Cortical (e.g. HTM):
  Premise: biological
  Data: spatial-temporal, behavior
  Capabilities: prediction, classification, goal-oriented behavior
  Valuable? Yes
  Path to machine intelligence? Yes
ANNs (e.g. deep learning):
  Premise: mathematical
  Data: spatial-temporal
  Capabilities: classification
  Valuable? Yes
  Path to machine intelligence? Probably not
A.I. (e.g. Watson):
  Premise: engineered
  Data: language, documents
  Capabilities: NL query
  Valuable? Yes
  Path to machine intelligence? No
36. The Birth of Programmable Computing
1940’s: many approaches
- Analog vs. digital
- Decimal vs. binary
- Wired vs. memory-based programming
- Serial vs. random access memory
1950’s: one dominant paradigm
- Digital
- Binary
- Memory-based programming
- Two-tier memory
Why did one paradigm win? Network effects.
Why did this paradigm win? It was the most flexible and most scalable.
37. The Birth of Machine Intelligence
2010’s: many approaches
- Specific vs. universal algorithms
- Mathematical vs. memory-based
- Spatial vs. time-based patterns
- Batch vs. on-line learning
2020’s: one dominant paradigm
- Universal algorithms
- Memory-based
- Time-based patterns
- On-line learning
Why will one paradigm win? Network effects.
Why will this paradigm win? It will be the most flexible and most scalable.
How do we know this is going to happen?
- The brain is the proof case.
- We have made great progress.
38. What Can Be Done With Software
- 1 layer
- 30 msec per learning-inference-prediction step
- 2,048 columns, ~65,000 neurons, 300M synapses
- roughly 10^-6 of human cortex
39. Challenges and Opportunities for Neuromorphic HW
Challenges:
- Dendritic regions
- Active dendrites
- 1,000s of synapses, 10,000s of potential synapses
- Continuous learning
Opportunities:
- Low-precision memory (synapses)
- Fault tolerant: memory, connectivity, neurons, natural recovery
- Simple activation states (no spikes)
- Connectivity: very sparse, topological
40. Requirements for On-line Learning
• Train on every new input
• If a pattern does not repeat, forget it
• If a pattern repeats, reinforce it
Learning is the formation of connections (sketched below):
- Connection strength/weight is binary
- Connection permanence is a scalar (e.g. 0.2)
- Training changes permanence
- If permanence > threshold, the synapse is connected; otherwise it is unconnected
(Diagram: permanence scale from 0 to 1, with a threshold dividing unconnected from connected.)
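A small sketch of that learning rule; the threshold (0.5) and the increment/decrement values are assumptions for illustration, not NuPIC defaults.

CONNECTED_THRESHOLD = 0.5   # permanence above this means "connected" (binary weight = 1)
PERM_INCREMENT = 0.1        # reinforcement when the pattern repeats
PERM_DECREMENT = 0.02       # decay when the pattern does not repeat

class Synapse:
    def __init__(self, permanence=0.2):
        self.permanence = permanence       # scalar, changed by training

    @property
    def connected(self):
        # binary strength: connected or not, no graded weight
        return self.permanence >= CONNECTED_THRESHOLD

    def train(self, input_active):
        """Train on every new input: reinforce if the presynaptic bit repeats, else decay."""
        if input_active:
            self.permanence = min(1.0, self.permanence + PERM_INCREMENT)
        else:
            self.permanence = max(0.0, self.permanence - PERM_DECREMENT)

syn = Synapse()
for bit in [1, 1, 1, 1, 0, 0]:      # pattern repeats four times, then stops
    syn.train(bit)
    print(round(syn.permanence, 2), syn.connected)
# permanence climbs to 0.6 and the synapse becomes connected,
# then slowly decays if the pattern never returns.

Keeping the weight binary and pushing all learning into the scalar permanence is what makes low-precision synaptic memory attractive for the neuromorphic hardware discussed on the previous slide.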
41. Cortical Layers as Sequence Memory
(Diagram: a motor region and a sensory region, each with layers 1, 2/3, 4, 5, 6, connected through the thalamus; motor and kinesthetic signals feed the sensory region.)
We believe all layers implement variations of the same learning algorithm:
- They learn transitions in afferent data.
- Stable representations are formed for predicted transitions.
- Unpredicted transitions are passed to the next layer.
Layer 4: learns sensory/motor transitions.
Layer 3: learns high-order sequence transitions.
Layers 5 and 6: learn sequences for motor and attention.
42. Natural Language: Word SDRs
A document corpus (e.g. Wikipedia) is used to build roughly 100K "Word SDRs", each 128 x 128 bits.
Word SDRs support semantic arithmetic, e.g. Apple - Fruit = Computer, Macintosh, Microsoft, Mac, Linux, Operating system, ...
43. Sequences of Word SDRs
Each training sentence (Word 1, Word 2, Word 3) is fed to the HTM as a sequence of Word SDRs.
Training set:
frog eats flies
cow eats grain
elephant eats leaves
goat eats grass
wolf eats rabbit
cat likes ball
elephant likes water
sheep eats grass
cat eats salmon
wolf eats mice
lion eats cow
dog likes sleep
elephant likes water
cat likes ball
coyote eats rodent
coyote eats rabbit
wolf eats squirrel
dog likes sleep
cat likes ball
...
44. Sequences of Word SDRs (continued)
Same training set as above, fed to the HTM as sequences of Word SDRs.
Query with a novel word: "fox" eats ?  ("fox" does not appear in the training set.)
45. Sequences of Word SDRs (continued)
Query: "fox" eats ?  The HTM predicts "rodent".
This demonstrates:
1) Unsupervised learning
2) Semantic generalization
3) Many applications
(A toy sketch of this generalization follows.)
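As a rough illustration of why an untrained word like "fox" can still yield a sensible prediction, the toy below hand-builds overlapping word SDRs and votes over trained (subject, verb, object) triples. Everything here, the tiny SDRs and the predict function, is invented for illustration and is not the Cortical.IO encoder or the HTM sequence memory; it returns a prey word learned from the semantically similar wolf/coyote rows, which is the same kind of semantic generalization the slide shows with "rodent".

word_sdrs = {
    # overlap encodes similarity: wolf, coyote, and fox share bits; cow does not
    "wolf":   {1, 2, 3, 4, 10},
    "coyote": {1, 2, 3, 5, 11},
    "cow":    {20, 21, 22, 23, 24},
    "fox":    {1, 2, 3, 6, 12},   # never seen in training
}

training = [
    ("wolf", "eats", "rabbit"),
    ("wolf", "eats", "mice"),
    ("coyote", "eats", "rodent"),
    ("coyote", "eats", "rabbit"),
    ("cow", "eats", "grain"),
]

def predict(subject, verb):
    """Vote for the object whose training subjects overlap the query subject the most."""
    scores = {}
    for s, v, obj in training:
        if v != verb:
            continue
        overlap = len(word_sdrs[subject] & word_sdrs[s])
        scores[obj] = scores.get(obj, 0) + overlap
    return max(scores, key=scores.get)

# "fox" was never trained, but its SDR overlaps wolf and coyote, so the prediction
# is a prey word learned from those rows (here "rabbit"; the slide's HTM answers "rodent").
print(predict("fox", "eats"))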