© William M. Pottenger, Ph.D.
All Rights Reserved
To be or not to be IID:
That is the Question
Higher Order Learning
William M. Pottenger, Ph.D.
Rutgers University and Intuidex, Inc.
DrWMP@rci.rutgers.edu; www.dimacs.rutgers.edu/~billp
DrWMPottenger@Intuidex.com; www.intuidex.com
Dr. William M. Pottenger
www.dimacs.rutgers.edu/~billp
www.intuidex.com
• Example Application Areas
– Homeland Security/Law Enforcement/Criminal Justice Information Systems
– Decision Support Systems
– Information Retrieval Systems
– High Performance Computing
• Research Funded by
– National Science Foundation
– National Institute of Justice
– Department of Homeland Security
– Army Research Lab
– Commonwealth of Pennsylvania
– Corporate Partners
– E.g., Lockheed-Martin, Kodak, PNNL, Boeing, etc.
• Associate Research Professor @ Rutgers University
– DIMACS & Computer Science
• CEO of Intuidex, Inc.
• Director of Transition for DHS S&T CCI Center
• Research Scientist @ NCSA
• M.S., Ph.D. in CS at UIUC
• Research Interests
– Statistical Relational Learning
– Leveraging higher-order relations in graphs of data
– Parallel and Distributed Visual & Data Analytics
– Analytics in a parallel and/or distributed environment
– Information Extraction
– Automatic extraction of keywords/features from text
What is Higher Order Information?
• Swanson (‘91) posed problem: Migraine headaches (M)
– stress associated with M
– stress leads to loss of magnesium
– calcium channel blockers prevent some M
– magnesium is a natural calcium channel blocker
– spreading cortical depression (SCD) implicated in M
– high levels of magnesium inhibit SCD
– M patients have high platelet aggregability
– magnesium can suppress platelet aggregability
• All extracted from medical journal titles
Slide reused with permission of Marti Hearst @ UCB
Gathering Evidence
[Figure: evidence graph over the terms migraine, stress, CCB (calcium channel blockers), PA (platelet aggregability), SCD and magnesium]
Slide reused with permission of Marti Hearst @ UCB
Higher Order Paths!
[Figure: paths between migraine and magnesium through stress, CCB, PA and SCD]
Slide reused with permission of Marti Hearst @ UCB
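The Swanson example can be sketched as a small graph search. The snippet below is only an illustration, not Swanson's or the author's procedure: the link list is typed in from the bullets above, and the function names `build_graph` and `bridging_terms` are made up for this sketch. It shows that migraine and magnesium never appear together in a single link, yet four intermediate terms connect them.

```python
from collections import defaultdict

# Pairwise links as listed on the Swanson slide (one per journal-title finding).
links = [
    ("stress", "migraine"),
    ("stress", "magnesium"),
    ("calcium channel blockers", "migraine"),
    ("magnesium", "calcium channel blockers"),
    ("spreading cortical depression", "migraine"),
    ("magnesium", "spreading cortical depression"),
    ("platelet aggregability", "migraine"),
    ("magnesium", "platelet aggregability"),
]


def build_graph(pairs):
    """Undirected adjacency sets built from (term, term) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def bridging_terms(graph, source, target):
    """Terms linked to both source and target, i.e. second-order paths."""
    return sorted(graph[source] & graph[target])


graph = build_graph(links)
directly_linked = "magnesium" in graph["migraine"]
bridges = bridging_terms(graph, "migraine", "magnesium")
print(directly_linked, bridges)
```

No single title links the two terms; the connection only emerges from the second-order paths.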
Related Work:
Link Mining and Collective Classification
• Link-based approaches (Taskar et al., 2001; Getoor and Diehl, 2005; Lu and Getoor, 2003; Neville and Jensen, 2004) to collective classification use explicit link information within networked data
• Studies (Chakrabarti et al., 1998; Neville and Jensen, 2000; Taskar et al., 2001) have shown that collective classifiers can achieve significant reductions in classification errors by performing inference about multiple data instances simultaneously
• Collective classifiers are context-dependent and are not designed to classify stand-alone data instances
• We propose classification methods that leverage implicit links between features in small training sets, and that maintain the ability for “context-free” classification of individual data instances
Is there a theoretical basis for the use
of higher order co-occurrence relations?
• Research agenda: study machine learning
algorithms in search of a theoretical foundation
for the use of higher order relations
• First algorithm: Latent Semantic Indexing (LSI)
– Widely used technique in text mining and IR based on
the Singular Value Decomposition (SVD) matrix
factoring algorithm
– Research question: Does LSI use higher order term
co-occurrence?
– First step: study SVD
April Kontostathis
Associate Professor
@ Ursinus College
Is there a theoretical basis for the use of
higher order co-occurrence relations in LSI?
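Singular Value Decomposition
$$A_{(m \times n)} = T_{(m \times r)} \, S_{(r \times r)} \, D^{T}_{(r \times n)}$$
A: term by document; T: term by dimension; S: diagonal matrix of singular values $s_1, \dots, s_r$; $D^{T}$: dimension by document
$$s_1 \ge s_2 \ge s_3 \ge \dots \ge s_r$$
r = rank of A, m = number of terms, n = number of docs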
Is there a theoretical basis for the use of
higher order co-occurrence relations in LSI?
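LSI: Truncation of Singular Values
$$A_{(m \times n)} \approx T_{(m \times k)} \, S_{(k \times k)} \, D^{T}_{(k \times n)}$$
The product on the right is the reduced term by document matrix; T: term by dimension; S: diagonal matrix of the k largest singular values; $D^{T}$: dimension by document
$$s_1 \ge s_2 \ge s_3 \ge \dots \ge s_k, \quad k < r$$
r = rank of A, m = number of terms, n = number of docs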
Is there a theoretical basis for the use of
higher order co-occurrence relations in LSI?
Columns, in order: human, interface, computer, user, system, response, time, EPS, Survey, trees, graph, minors
human x 1 1 0 2 0 0 1 0 0 0 0
interface 1 x 1 1 1 0 0 1 0 0 0 0
computer 1 1 x 1 1 1 1 0 1 0 0 0
user 0 1 1 x 2 2 2 1 1 0 0 0
system 2 1 1 2 x 1 1 3 1 0 0 0
response 0 0 1 2 1 x 2 0 1 0 0 0
time 0 0 1 2 1 2 x 0 1 0 0 0
EPS 1 1 0 1 3 0 0 x 0 0 0 0
Survey 0 0 1 1 1 1 1 0 x 0 1 1
trees 0 0 0 0 0 0 0 0 0 x 2 1
graph 0 0 0 0 0 0 0 0 1 2 x 2
minors 0 0 0 0 0 0 0 0 1 1 2 x
Deerwester Term by Term Matrix
Columns, in order: human, interface, computer, user, system, response, time, EPS, Survey, trees, graph, minors
human x 0.54 0.56 0.94 1.69 0.58 0.58 0.84 0.32 -0.32 -0.34 -0.25
interface 0.54 x 0.52 0.87 1.50 0.55 0.55 0.73 0.35 -0.20 -0.19 -0.14
computer 0.56 0.52 x 1.09 1.67 0.75 0.75 0.77 0.63 0.15 0.27 0.20
user 0.94 0.87 1.09 x 2.79 1.25 1.25 1.28 1.04 0.23 0.42 0.31
system 1.69 1.50 1.67 2.79 x 1.81 1.81 2.30 1.20 -0.47 -0.39 -0.28
response 0.58 0.55 0.75 1.25 1.81 x 0.89 0.80 0.82 0.38 0.56 0.41
time 0.58 0.55 0.75 1.25 1.81 0.89 x 0.80 0.82 0.38 0.56 0.41
EPS 0.84 0.73 0.77 1.28 2.30 0.80 0.80 x 0.46 -0.41 -0.43 -0.31
Survey 0.32 0.35 0.63 1.04 1.20 0.82 0.82 0.46 x 0.88 1.17 0.85
trees -0.32 -0.20 0.15 0.23 -0.47 0.38 0.38 -0.41 0.88 x 1.96 1.43
graph -0.34 -0.19 0.27 0.42 -0.39 0.56 0.56 -0.43 1.17 1.96 x 1.81
minors -0.25 -0.14 0.20 0.31 -0.28 0.41 0.41 -0.31 0.85 1.43 1.81 x
Deerwester Term by Term Matrix, truncated to two dimensions
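A minimal NumPy sketch of how such matrices can be produced. Assumptions: the term-by-document counts below are the nine Deerwester et al. (1990) titles entered from memory (their products do agree with the first-order matrix above), and the truncated term-by-term matrix is taken to be the product of the rank-2 approximation with its own transpose. The slides do not state how Y was formed, so the exact values may differ from those shown.

```python
import numpy as np

terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]

# Rows follow `terms`; columns are documents c1..c5, m1..m4 (assumed data).
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)


def term_by_term(matrix):
    """Term-by-term matrix: entry (i, j) sums co-occurrence over documents."""
    return matrix @ matrix.T


def truncate(matrix, k):
    """Rank-k approximation keeping the k largest singular values."""
    T, s, Dt = np.linalg.svd(matrix, full_matrices=False)
    return T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]


first_order = term_by_term(A)
Y = term_by_term(truncate(A, 2))

h, t = terms.index("human"), terms.index("trees")
print(first_order[h, t], round(float(Y[h, t]), 2))
```

The pair human/trees never co-occurs in a document (first-order entry 0), yet its entry in the truncated matrix is non-zero, which is the effect the two tables illustrate.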
• Answer is in the following theorem we proved:
If the (i, j)th element of the truncated term by
term matrix, Y, is non-zero, then there exists a
co-occurrence path of order ≥ 1 between terms
i and j.
– Kontostathis, A. and Pottenger, W. M. (2006) A
Framework for Understanding LSI Performance.
Information Processing & Management, volume 42,
issue 1, pages 56-73.
• We have both proven mathematically and
demonstrated empirically that LSI is based on
the use of higher order co-occurrence relations.
• Next step?
Is there a theoretical basis for the use of
higher order co-occurrence relations in LSI?
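The notion of a co-occurrence path can be made concrete with a breadth-first search. This is a sketch only, under two assumptions: the document term sets are the Deerwester titles as recalled from memory, and "order" is read as the number of co-occurrence links on the shortest path (order 1 = the terms share a document). The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Index terms per document (assumed encoding of the Deerwester titles).
docs = [
    {"human", "interface", "computer"},
    {"survey", "user", "computer", "system", "response", "time"},
    {"EPS", "user", "interface", "system"},
    {"system", "human", "EPS"},
    {"user", "response", "time"},
    {"trees"},
    {"graph", "trees"},
    {"graph", "minors", "trees"},
    {"graph", "minors", "survey"},
]


def cooccurrence_graph(documents):
    """Link two terms whenever they appear in the same document."""
    graph = {}
    for doc in documents:
        for term in doc:
            graph.setdefault(term, set())
        for a, b in combinations(doc, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def path_order(graph, start, goal):
    """Length of the shortest co-occurrence path, or None if none exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, dist = queue.popleft()
        for nxt in graph[term]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = cooccurrence_graph(docs)
print(path_order(graph, "human", "interface"),
      path_order(graph, "human", "user"),
      path_order(graph, "human", "trees"))
```

Under these assumptions human and user are joined by a second-order path (via interface or system), while human and trees are joined only by a longer chain through survey and graph.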
Using Higher Order Information in both
Generative and Discriminative Learning
• Extend the theoretical foundation that April and
I developed by studying characteristics of
higher-order information in other machine
learning approaches including both generative and
discriminative supervised learning as well as
unsupervised approaches
– Ganiz, M. C., Lytkin, N. I. and Pottenger, W. M.
(2009) Leveraging Higher Order Dependencies Between
Features for Text Classification. In the Proceedings
of the European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in
Databases (ECML PKDD). Bled, Slovenia, September.
Nikita Lytkin
Research Scientist @
NYU Medical Center
Murat Ganiz
Assistant Professor
@ Dogus University
Representation of Boolean
Data by a Bipartite Graph
Multinomial vs. Multivariate Event Model
McCallum & Nigam (1998)
First Order Paths in a Data Graph
Second Order Paths in a Data Graph
Patterns of Connectivity between Features
Probabilistic Characterization of Features by
Second Order Paths
Higher Order Naïve Bayes:
A Generative Learner
Murat Ganiz
Assistant Professor
@ Dogus University
Slonim & Tishby (2001) vs. HONB
Ganiz, M. C., Pottenger, W. M. and George, C. (2010) Higher Order
Naïve Bayes: A Novel Non-IID Approach to Text Classification. IEEE
Transactions on Knowledge and Data Engineering (TKDE).
Multinomial features: NB vs. NB_wc. Binary features: NB vs. HONB.
Dataset        NB (multinomial)  NB_wc  improvement %  NB (binary)  HONB  improvement %
COMP (5)       0.473             0.508  7.4            0.51         0.65  26.5
SCIENCE (4)    0.65              0.725  11.5           0.6          0.84  41.6
POLITICS (3)   0.62              0.67   8.1            0.68         0.83  22.8
RELIGION (3)   0.525             0.553  5.3            0.64         0.74  15.7
Average                                 8.075                             26.65
• HONB achieves statistically significantly better performance than NB for four datasets based on t-test results
• Slonim & Tishby (2001) did not report std dev or t-test results
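The improvement columns in the Slonim & Tishby vs. HONB comparison can be checked with a few lines of arithmetic. This is a sketch that assumes "improvement %" means relative improvement over the NB baseline; the function name is made up. Recomputing from the displayed multinomial scores reproduces that column; recomputing from the displayed binary scores gives slightly different values (for example about 27.5 rather than 26.5 for COMP), presumably because the displayed scores are rounded.

```python
def relative_improvement(baseline, improved):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


# (baseline, improved) pairs as displayed on the slide.
multinomial = {
    "COMP": (0.473, 0.508),
    "SCIENCE": (0.65, 0.725),
    "POLITICS": (0.62, 0.67),
    "RELIGION": (0.525, 0.553),
}
binary = {
    "COMP": (0.51, 0.65),
    "SCIENCE": (0.6, 0.84),
    "POLITICS": (0.68, 0.83),
    "RELIGION": (0.64, 0.74),
}

reported_multinomial = [7.4, 11.5, 8.1, 5.3]
reported_binary = [26.5, 41.6, 22.8, 15.7]

recomputed = {name: round(relative_improvement(*pair), 1)
              for name, pair in multinomial.items()}
recomputed_binary = {name: round(relative_improvement(*pair), 1)
                     for name, pair in binary.items()}
mean_multinomial = sum(reported_multinomial) / len(reported_multinomial)
mean_binary = sum(reported_binary) / len(reported_binary)
print(recomputed, recomputed_binary, mean_multinomial, mean_binary)
```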
Supervised Second Order Transformation
for Discriminative Learning
Nikita Lytkin Research
Scientist @ NYU Medical
Center
Influence of Higher-Order Paths
Experimental Setup
• Support Vector Machine (Vapnik 1998) was used to evaluate the Supervised Second Order Transformation
• Multi-class classification by SVM was performed using the “one-against-one” scheme
• Used RBF and linear kernels in SVM and varied soft margin cost from $10^{-4}$ to $10^{4}$
• Training set size varied from 5% to 60%
• Eight experiments performed at each sample size
Experimental Setup (continued)
• Six benchmark text corpora were selected
• Stop words were removed, others were stemmed
• For the RELIGION, POLITICS, SCIENCE and COMP subsets of the 20 Newsgroups dataset, the top 2000 terms ranked by Information Gain were selected; 500 documents per class were sampled at random for comparison with Slonim and Tishby (2001)
Dataset    # classes  total # docs  # terms
RELIGION   3          1500          2000
POLITICS   3          1500          2000
SCIENCE    4          2000          2000
COMP       5          2500          2000
Citeseer   6          3312          3703
Cora       6          2708          1433
Scalability Across Training Set Sizes
Results for Naïve Bayes, SVM, HONB and
HOSVM on 20NG REL & SCI Datasets
Results for Naïve Bayes, SVM, HONB and
HOSVM on Citeseer & Cora Datasets
Significance of Results for Naïve Bayes,
SVM, HONB and HOSVM on All Datasets
• HONB consistently and statistically significantly outperformed NB on all datasets (significant at <= 5% p-value)
• HOSVM outperformed SVM on the RELIGION, POLITICS and SCIENCE datasets (significant at <= 5% p-value)
• Although the difference between HOSVM and SVM on the COMP dataset was significant only at the 0.158 level, HOSVM outperformed SVM on seven out of eight trials by an average of 3%
What role do higher-order relations play in
supervised machine learning?
• Higher-Order Collective Classification (HOCC)
– Classifies a set of instances simultaneously and thus exploits the
relationships between them; Based on a record-relation graph
– Capable of both supervised event detection as well as
unsupervised anomaly detection
• Application: Classification and Anomaly Detection of
Interdomain Routing Events
– Goal: detect and categorize such events
– Menon, V. and Pottenger, W. M. (2009) A Higher Order
Collective Classifier for Detecting and Classifying Network
Events. In the Proceedings of the IEEE International Conference
on Intelligence and Security Informatics 2009 (ISI 2009)
Vikas Menon
Software Developer @
Bridgewater Associates
HOCC Results
• Detection of Interdomain Routing Events and Anomalies Based on
Higher-Order Path Analysis
– Slammer worm attack, Witty worm attack, 2003 East Coast Blackout
• Real Time Classification of Abnormal Events
– Sliding window samples of 120 three-second instances
– 180th window = start of event
– HOCC detects events and distinguishes anomalies
[Figure panels: Witty (Supervised), Witty (Unsupervised)]
What role do higher-order relations play in
unsupervised machine learning?
• Next step? Consider unsupervised learning…
– Association Rule Mining (ARM)
• ARM is one of the most widely used algorithms in
data mining
– Extend ARM to higher order… Higher Order
Apriori
• LHOIM (Latent Higher-Order Information Mining)
• Experiments confirm the value of Higher Order
Apriori on real world e-marketplace data
Shenzhi Li
Senior Software Engineer
@ Ask (Ask.com)
LHOIM Results on 20NG Computer Dataset
• Average error rate for 1st-order (top left) 2nd-order (top right)
• Average stdev for 1st-order (bottom left) 2nd-order (bottom right)
Li, S. Z., Wu, T., and Pottenger, W. M. (2005) Distributed Higher
Order Association Rule Mining Using Information Extracted from
Textual Data. SIGKDD Explorations, volume 7, issue 1, pages 26-35.
Higher Order Graph Sampling on Reuters
[Figure: four panels plotting accuracy in % (axis 0 to 70) against training sample % (1 to 10) for three series: Naïve Bayes Random Sampling, Higher Order Naïve Bayes Random Sampling, and Higher Order Naïve Bayes Higher Order Sampling]
• Higher Order Naïve Bayes improves the accuracy by at least 10%
• Higher Order Naïve Bayes with Higher Order Sampling gives even better results
• Patterns can be discovered using a much smaller sample, which is important for online learning
Cibin George, M.S. in CS @ Rutgers
Higher Order (Online)
Latent Dirichlet Allocation
Intuitively, this formula can be interpreted as
a word being assigned to a topic in proportion
to its frequency of occurrence in that topic.
This is, in fact, our guiding intuition: we
simply replace these term frequencies with
higher order frequencies.
Nir Grinberg
Ph.D. in CS
@ Rutgers
Kashyap Kolipaka
Ph.D. in CS @
Rutgers
Christie Nelson
Ph.D. at RUTCOR
@ Rutgers
Modeling Social Media for Emergency
Response in Port-au-Prince, Haiti
Cluster Geolocation
Modeling Social Media for Emergency
Response in Port-au-Prince, Haiti
Cluster Geolocation with predicted resource
Research Futures: Privacy-Enhanced
Higher Order Community Partitioning
Let I(i,j) be 1 if vertices i and j are in the same
community (social network), and 0 otherwise, then
Newman’s Q-Modularity is defined as:
$$Q = \sum_{i,j} \left( A_{ij} - \frac{d_i d_j}{2m} \right) I(i,j) = \sum_{i,j} \left( A_{ij} - P_{ij} \right) I(i,j)$$
Generalization
$$Q_l = \frac{1}{l} \sum_{k=1}^{l} \frac{1}{n_k} \sum_{i,j} \left( A^{k}_{ij} - P^{k}_{ij} \right) I(i,j)$$
Q-Modularity counts edges inside each community and
subtracts the expected number of edges inside the same
community. Higher-order Ql counts number of paths inside
each community and subtracts the expected number of
paths. We propose Ql as a measure of a community split
and consider a combinatorial optimization approach.
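A small sketch of the first-order case only: edges inside each community minus the number expected from the degrees, summed over ordered vertex pairs, i.e. the sum of (A_ij - d_i d_j / 2m) over pairs in the same community. Assumptions: a simple undirected graph with no duplicate edges, and a hypothetical function name. The commonly quoted normalized modularity divides this sum by 2m. The higher-order Ql is not implemented here because the slide does not define its path counts or normalizers in enough detail.

```python
def modularity_sum(edges, community):
    """Sum of A_ij - d_i * d_j / (2m) over ordered pairs in one community."""
    nodes = sorted(community)
    degree = {v: 0 for v in nodes}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    m = len(edges)
    total = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                total += a_ij - degree[i] * degree[j] / (2.0 * m)
    return total


# Two triangles joined by a single bridge edge (toy data for illustration).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {v: "a" for v in range(6)}

q_split = modularity_sum(edges, split)
q_single = modularity_sum(edges, single)
print(q_split, q_split / (2 * len(edges)), q_single)
```

Splitting the toy graph at the bridge scores higher than keeping everything in one community, which scores zero.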
Alex Nikolov, Ph.D.
in CS @ Rutgers
Results on Ground Truth Data
• We optimized Ql using an LP rounding based
approximation algorithm for correlation
clustering.
• We ran our experiments on networks with
known communities, and compared the known
communities to our clustering using the
Adjusted Rand Index.
Adjusted Rand Index by value of l:
Dataset          l = 1   l = 2   l = 3   l = 4
Karate           0.5414  0.5669  0.5669  0.5669
Political Books  0.6250  0.6463  0.6463  0.6463
Is Ql easier to approximate?
• We approximated Ql on random G(n, p) graphs for
different values of l and p.
• We used the ratio of the value of the found
solution to the value of an LP relaxation as an
estimate of the approximation factor.
• It seems that Ql is harder for denser graphs (p
high) but easier for higher l.
l = 1 2 3 4 5
p = 0.03 0.9678 0.9840 1.0000 1.0000 0.9986
p = 0.12 0.1828 0.4542 -0.1179 0.8447 1.0000
p = 0.60 -0.1130 0.3975 1.0000 1.0000 1.0000
Differential Privacy
• Differential Privacy [DMNS]: A randomized
function K gives ε-differential privacy if for all
graphs G1, G2 differing in a single edge and all
subsets S of Range(K):
$$\Pr[K(G_1) \in S] \le \exp(\varepsilon) \cdot \Pr[K(G_2) \in S]$$
• The global sensitivity of a real valued function f
is:
$$GS_f = \max_{G_1, G_2} \left| f(G_1) - f(G_2) \right|$$
where G1, G2 differ in a single edge.
Sensitivity of Ql
The global sensitivity of Ql is at most 5(2l – 1)/l
for any fixed clustering.
By [DMNS], given a community split, outputting Ql
+ Lap(5(2l – 1)/lε) satisfies ε-differential privacy.
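A minimal sketch of the Laplace mechanism this slide relies on: add Laplace noise with scale equal to sensitivity divided by ε. Assumptions: the sensitivity is passed in as a parameter; `slide_sensitivity_bound` simply evaluates the bound as the slide text reads, 5(2l - 1)/l, and a superscript may have been lost in extraction; all function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials of mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_release(value, sensitivity, epsilon, rng):
    """Release value + Lap(sensitivity / epsilon)."""
    return value + laplace_noise(sensitivity / epsilon, rng)


def slide_sensitivity_bound(l):
    """Bound as printed on the slide, 5(2l - 1)/l (reading is an assumption)."""
    return 5.0 * (2 * l - 1) / l


rng = random.Random(0)
noisy = private_release(0.57, slide_sensitivity_bound(2), epsilon=1.0, rng=rng)
print(noisy)
```

A smaller ε means a larger noise scale, so stronger privacy costs more distortion of the released Ql value.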
Differentially Private Community Discovery
• The measure of community split Ql is insensitive.
– We can output the value of a community split
differentially privately
• But we would like to design an algorithm Alg,
such that:
– Alg outputs a community partition with high Ql ;
– Alg satisfies ε-differential privacy
• Considered in Differentially Private Combinatorial
Optimization (Gupta et al. 2009), but there is
no general method.
• In HOQL, we classify states as being in a high reward class or a low reward class. States are added to a class based on a threshold. We use HONB classification for action selection. We combine our method with greedy action selection based on the formula:
$$\varepsilon = 1 - \varepsilon_{\mathrm{start}} \left( 1 - \frac{\mathrm{episode}_{\mathrm{current}}}{\mathrm{episode}_{\mathrm{total}}} \right)$$
• Q-values are updated based on the traditional formula:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
• where α is the learning rate and γ is the discount factor. In these results, α = 0.91, γ = 1, and ε_start = 0.8
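The two formulas on this slide can be sketched as plain functions. This covers only the traditional tabular Q-learning update and the ε schedule with the stated constants (α = 0.91, γ = 1, ε_start = 0.8); the HONB-based classification of states into reward classes is not implemented, and the dictionary-based Q-table and function names are assumptions made for illustration.

```python
def exploration_parameter(episode, total_episodes, eps_start=0.8):
    """epsilon = 1 - eps_start * (1 - episode_current / episode_total)."""
    return 1.0 - eps_start * (1.0 - episode / total_episodes)


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.91, gamma=1.0):
    """One tabular Q-learning step; unseen (state, action) pairs count as 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


actions = ["left", "right"]
q_table = {("s1", "left"): 2.0}
updated = q_update(q_table, "s0", "right", reward=1.0,
                   next_state="s1", actions=actions)
print(updated, exploration_parameter(0, 100), exploration_parameter(100, 100))
```

With these constants ε rises from 0.2 in the first episode to 1.0 in the last.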
REU Ashley Edwards
Higher Order Q-Learning (HOQL)
Ashley
Edwards,
Applicant for
Ph.D. in CS
@ Rutgers
Edwards, A. and
Pottenger, W. M. 2011.
Higher Order Q-
Learning. IEEE
Symposium on Adaptive
Dynamic Programming
and Reinforcement
Learning. Paris, France.
Anomaly detection through
machine-learning exposed that the
Chinese government is capable of
“line rate” MITM attacks.
Due to pipelining in modern
browser implementations,
“censorware” is forced to remember
a 5-tuple for every attempt a user
makes to view censored content.
<ipSrc, ipDst, srcPort, dstPort,
proto>
Chinese government routers use
fiber-optics to do censorship at
“line rate.”
They lose the ability to drop
packets, so every censorware
router in the path must store a 5-
tuple and block responses.
This raises the question: “What
kinds of computational complexity
bottlenecks in ‘censorware’ can we
exploit?”
For example, how large a
“botnet” would be required to cause
Chinese censorware routers to run
out of memory?
[Diagram: message sequence between A, B and MITM]
User attempts to restart the
connection.
Government servers use SEQ-1460
attack on TCP.
Government servers get user to
establish new, fake connection
User accepts new, fake connection
and retransmits.
Government rejects data
transmission with RST packet.
Server doesn’t understand new,
fake connection. Sends RSTs.
User rejects attempt to restart
the connection.
Server assumes user is adversarial.
Sends RSTs and kills connection.
REU Becker Polverini
Using Clustering to Detect Censorware
Polverini, A. B. and Pottenger, W. M. 2011. Using Clustering to Detect
Chinese Censorware. CSIIRW ’11 Oak Ridge National Labs, TN USA
CCICADA technology transfer efforts
• Goal: Technology transfer to DHS users and
customers
• Several Tech Transfer programs @ DHS S&T:
– E2E – Engage to Excel
– Tech Solutions
– SECURE
• CCICADA is committed to support these existing
programs and to innovate new approaches – what can
you do?
– Publish your open-source software!
– Commercialize your software!
– Start your own company… and sell to DHS!
www.intuidex.com ©Intuidex 2013
Intuidex, Inc.
Presenter: William M. Pottenger, Ph.D.
DrWMPottenger@intuidex.com
About Intuidex
Data Analytics and Data Model provider
Focused on helping Organizations discover
actionable intelligence from large, varied, and
complex data sources
Provides an open, extensible analytics platform,
Watchman Analytics™
Platform and components that facilitate enhanced
real-time information extraction, consolidation,
fusion and discovery from disparate structured
and unstructured data streams
The problem we solve: “Big Data”
• Data volume and complexity have increased exponentially
• The number of data sources has exploded, as have data formats, schemas and types
• The most valuable data is often unstructured and fragmented
• The necessary data to drive better decisions is often scattered across multiple data silos
• Data that is useful and valuable is often incomplete and requires other data sources to validate
• Data storage systems are often proprietary with limited interoperability
• Data from different sources regarding the same entities sometimes conflicts
Differentiation
• Academic: Commercial Technology
Development
o Lab @ Rutgers University
o Director of Tech Transition for DHS S&T CCI Center
o Close cooperation with Rutgers Office of
Commercialization
o Three patents allowed, fourth pending
• Strategic Partnerships
o Rutgers University and DHS S&T Center of Excellence
o PNNL-DHS S&T National Visual Analytics Center
o Law Enforcement Partners: 3M (PIPS Technology)
o Customers in Intel / Defense sectors
Analyst Information Overload
[Figure: FMV, COMINT, SIGINT, HUMINT, SIGACTS and other sources converge on the analyst and on analyst applications and visualization platforms, e.g., TIGR]
[Architecture diagram: four data sources, each with its own indexing routine, feed the High Performance Index (IxHPI™); Watchman Analytics™ components sit on the index and reach the user through the Watchman Analytics™ visualization or a customer visualization]
Watchman Analytics™ components:
• Entity Extraction (IxExtract™)
• Feature Selection (IxFeatures™)
• Topic Modeling (IxTopics™)
• Rule Learning (IxRules™)
• Recommender (IxRecommend™)
• Alerting (IxAlert™)
• Clustering (IxCluster™)
• Data Validation (IxValidate™)
• Trending (IxEntityTrend™)
• Link Analysis (IxLinks™)
• Data Fusion (IxRelClu™)
• Entity Resolution (IxResolve™)
• Web-based advanced data analytics and visualization solution
• Adobe Flex RIA framework
• Component Modules
• Synchronized
Watchman Analytics™ for BOSS
Intuidex and 3M Partnership
Intuidex, Inc., a leader and innovator in data analytics (machine learning), is the
pioneer of Higher Order Learning™ technologies that deliver unprecedented accuracy and
efficiency in identifying linkages, trends and patterns across disparate information
systems, in real time or near real time. Intuidex analytics have been licensed by
customers in the US Defense and Intelligence Agencies, US Law Enforcement Agencies
and the Fortune 500 to extract latent intelligence and insights from both structured and
unstructured data sources.
3M (formerly PIPS Technology) is the worldwide leader in Automated License
Plate Recognition (ALPR) technology. PIPS designs, manufactures, and supports its
complete line of ALPR products and services for use in law enforcement, parking, tolling,
and intelligent transportation systems. With over 20,000 cameras deployed around the
globe and a wide range of patents covering their technology and its application, PIPS
Technology is easily recognized as the leading provider of traffic related video imaging
and license plate capture technology for public safety agencies everywhere.
APPLICATIONS OF
HIGHER ORDER LEARNING™
FROM
• Objective: determine which COMINT is likely important and
requires further analysis
• Data: plain text representation of comm-hits
• 400 samples drawn from Afghanistan theater
• Classification: two classes
• Class A, Class B
• Evaluation
• Compared IxHONB™ to Naïve Bayes (NB)
• Train on 5% to 90%, test on rest
• Averages (accuracy, precision, recall, ...) across 10 folds
Military Threat Detection Applications of
Intuidex’s Higher Order Learning™
Weighted F-measure performance of NB vs. IxHONB™
MIRC (Chat) Entity Extraction
• Data from MIRC chat Comm Hits (COMINT) has been helpful to GMTI analysts in
– Determining the nature of movements detected by radar (e.g., wild animals don't radio their friends for help)
– Whether ground targets may represent a threat
– Validating known movements by corroborating with statements of locals (if they see a vehicle WE see, then we KNOW what the “dots” are)
– Some “dots” can talk!
• Tactical Ground Reporting System (TIGR)
– A TIGR user on the battlefield has limited ability to refine a search the way an analyst can
– Only has temporal and spatial filters, and relies on pre-packaged intel from various sources input to TIGR (HUMINT, SIGACT, HUMINT)
Example Actionable Information
• IxRules™ aids a user in discovering rules for multiple entity types
• IED Trigger
“On 23 February 2006, at 12:30 PM, in Ba'qubah, Diyala, Iraq,
assailants detonated a probable command-initiated improvised
explosive device (IED) hidden in a soup vendor's handcart near an
Iraqi Army patrol in the central market, killing eight Iraqi soldiers and
eight civilians, wounding four Iraqi soldiers and 11 civilians, and causing
unspecified damage to the public market. The Mujahidin Shura
Council in Iraq (MSC) claimed responsibility.”
• Height
“… The suspect is described as black, medium complexion, 28-30 years
old, clean-shaven, approximately 6 feet 8 inches tall, weighing 180-
200 pounds, with a muscular build. He was last seen wearing a black
sweatshirt, black pants, and a dark blue or black knit hat. …”
Tactical Ground Reporting System: TIGR
Benefits to the Warfighter
1. Fusion of high-value COMINT intel provides
significantly improved situational awareness for
warfighters with ‘boots on the ground’
2. Extraction / summarization of high-value COMINT,
SIGACT, HUMINT from unstructured, unleveraged
text sources
3. Fusion of high-value COMINT and other text-
based intel with GMTI and other intel sources
• Transitioned to: ESC/CIEF, used in DARPA
Tactical Ground Reporting System (TIGR)
Technology Transition Description
• Fielded operationally at: Afghanistan and other
theaters
• Customer(s): TIGR and users, e.g., GEOINT, FSR,
S2, ISR, MAI, CPTI, JIEDDO MID, CIED, RFI, NASIC,
Centcom TFs
Information extraction, summarization and fusion
technologies to provide warfighter with
situational awareness
From theater: “These are exactly the sort of quick and
dirty SIGINT summaries I am trying to get. … Just
wanted to make sure you know how happy our ground
units are to get this information in a wrap up. This daily
tipper has made our supported units very happy. Thanks
for the consistent help.”
• Objective: Classify confidence in perpetrator identification for
incidents in NCTC Worldwide Incident Tracking System (WITS)
• Data: relational tables from WITS
• Sampled ~1,000 incidents from 80,000 record corpus
• Included some free text
• Classification: five confidence classes
• Plausible, Likely, Unknown, Unlikely, Inferred (analyst)
• Evaluation
• Compared IxHONB™ to NB and LSI-kNN
• Train on 5% to 90% of sample, test on rest
• Averages (accuracy, precision, recall, ...) across 10 folds
Counterterrorism Applications of Intuidex’s
Higher Order Learning™
[Figure: F-measure (axis 0 to 0.7) against percentage of training set available for training (5 to 90) for HONB, LSI-kNN and NB]
Non-weighted F-measure performance of NB, LSI-kNN and IxHONB™
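The slides report both a weighted and a non-weighted F-measure without defining them. The sketch below assumes the common reading: non-weighted is the plain mean of per-class F1 (macro average), and weighted is the mean of per-class F1 weighted by the number of true instances of each class. The labels and function names are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 per label, computed as 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Non-weighted F-measure: unweighted mean over classes."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Weighted F-measure: per-class F1 weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
print(per_class_f1(y_true, y_pred), macro_f1(y_true, y_pred),
      weighted_f1(y_true, y_pred))
```

With imbalanced classes, as in the five-class WITS task, the two averages can differ noticeably, which may be why both are reported.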
Nuclear Detection
• Data was taken from a Thermo Scientific handheld Spectroscopic Personal Radiation Detector called the Interceptor™
• 302 gamma-ray spectrum files
– 20 from Tc99m, the rest from other isotopes or background
– Small positive class size
• 1024 numeric channels per spectrum
– High dimensional space
• 14 labeled, high confidence isotopes
– Potassium (40K; 1.3 billion years)
Sample of Results - Accuracy
Accuracy
65% 60% 55% 50% 45% 40% 35% 30% 25% 20%
Ga67 – D-B 0.0002 0 0 0 0 0 0 0 0 0
Ga67 – N-D-B 1 1 1 0.343 0.778 0.697 0.39 0.57 0.26 0.06
I131 – D-B 0 0 0 0 0 0 0 0.01 0.251 0.16
I131 – N-D-B 0 0 0.002 0.008 0 0 0 0.01 0.002 0
In111 – D-B 0.136 0.017 0.01 0.001 0 0 0 0 0 0
In111 – N-D-B 1 0.08 0.389 0.005 0.001 0.037 0 0.18 0 0.45
Tc99m – D-B 0.049 0.095 0.001 0 0 0 0 0 0 0
Tc99m - N-D-B 0 0 0 0 0 0 0 0 0 0
Key
Statistically Significant difference: NB < HONB
Not Statistically Significant
Typical Intuidex Engagement
• Client environment analysis
Infrastructure (hardware, software)
Data sources
Operations (relevant and related policies)
• Requirements Specification with SMEs
Iterate until approved
• Deploy high-performance index engine
Install, configure, test
• Deploy indexing routines
Develop, configure, optimize
• Deploy analytics services
(Optional) Develop custom services to spec
Install, configure, test
Typical Intuidex Engagement
• (Optional) Existing visualization interface
Design interface specification for existing framework
• Ground-truth development with SMEs
• System documentation
Usage documentation
Administration and Configuration documentation
Visualization interface documentation (optional)
• Deployment validation
Quality assurance
Load testing
• Customer acceptance
Watchman Analytics™ Functionality
• Entity Resolution
• Online Monitoring
• Data Deconfliction
• Automated Alerting
• Interactive Analysis
• Entity Extraction
• Ad-hoc Reporting
• Entity Classification
• Privacy Protection*
• Quality Assurance
• Link-based Analysis
• Embedded Analytics
* Privacy protection is a major Intuidex research area and development thrust
• Intuidex, Inc. is a hi-tech start-up incorporated by
William M. Pottenger, Ph.D.
• Thought Leadership in Data Analytics
• Key Partnerships
Acknowledgements
• I am very grateful to my hardworking, intelligent and creative
(current and former) students and postdocs without whom none of
this would have been possible: Kunikazu Yoda, Christie Nelson,
Aleksandar Nikolov, Nir Grinberg, Cibin George, Christopher
Janneck, Nikita Lytkin, Shenzhi Li, Murat Ganiz, Chirag Pandya,
Kashyap Kolipaka, Vikas Menon, April Kontostathis, Tianhao Wu,
Jirada Kuntraruk, Jason Perry, Mark Dilsizian (and >> others).
• I also thank Rutgers University, the National Science Foundation,
the Department of Homeland Security and the National Institute of
Justice. This material is based upon work partially supported by the
National Science Foundation under Grant Numbers 0703698 and
0712139. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not
necessarily reflect the views of the National Science Foundation or
Rutgers University.
• I also gratefully acknowledge the continuing help of my Lord and
Savior, Yeshua the Messiah (Jesus the Christ) in my life and work.
Thank you!
Q&A
75

Mais conteúdo relacionado

Mais procurados

On distributed fuzzy decision trees for big data
On distributed fuzzy decision trees for big dataOn distributed fuzzy decision trees for big data
On distributed fuzzy decision trees for big datanexgentechnology
 
Intelligent information extraction based on artificial neural network
Intelligent information extraction based on artificial neural networkIntelligent information extraction based on artificial neural network
Intelligent information extraction based on artificial neural networkijfcstjournal
 
Data sharing in neuroimaging: incentives, tools, and challenges
Data sharing in neuroimaging: incentives, tools, and challengesData sharing in neuroimaging: incentives, tools, and challenges
Data sharing in neuroimaging: incentives, tools, and challengesKrzysztof Gorgolewski
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifsAustin Benson
 
Proposed-curricula-MCSEwithSyllabus_24_...
Proposed-curricula-MCSEwithSyllabus_24_...Proposed-curricula-MCSEwithSyllabus_24_...
Proposed-curricula-MCSEwithSyllabus_24_...butest
 
HOW NEURAL NETWORKS WORK
HOW NEURAL NETWORKS WORKHOW NEURAL NETWORKS WORK
HOW NEURAL NETWORKS WORKAM Publications
 
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...ijaia
 
Paper id 252014107
Paper id 252014107Paper id 252014107
Paper id 252014107IJRAT
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Sarvesh Kumar
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligenceaciijournal
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graphAustin Benson
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deakin University
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...AI Publications
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 

Mais procurados (20)

On distributed fuzzy decision trees for big data
On distributed fuzzy decision trees for big dataOn distributed fuzzy decision trees for big data
On distributed fuzzy decision trees for big data
 
Intelligent information extraction based on artificial neural network
Intelligent information extraction based on artificial neural networkIntelligent information extraction based on artificial neural network
Intelligent information extraction based on artificial neural network
 
CV
CVCV
CV
 
Satya Sahoo Thesis Defense
Satya Sahoo Thesis DefenseSatya Sahoo Thesis Defense
Satya Sahoo Thesis Defense
 
AI that/for matters
AI that/for mattersAI that/for matters
AI that/for matters
 
Data sharing in neuroimaging: incentives, tools, and challenges
Data sharing in neuroimaging: incentives, tools, and challengesData sharing in neuroimaging: incentives, tools, and challenges
Data sharing in neuroimaging: incentives, tools, and challenges
 
Sampling methods for counting temporal motifs
Sampling methods for counting temporal motifsSampling methods for counting temporal motifs
Sampling methods for counting temporal motifs
 
Proposed-curricula-MCSEwithSyllabus_24_...
Proposed-curricula-MCSEwithSyllabus_24_...Proposed-curricula-MCSEwithSyllabus_24_...
Proposed-curricula-MCSEwithSyllabus_24_...
 
HOW NEURAL NETWORKS WORK
HOW NEURAL NETWORKS WORKHOW NEURAL NETWORKS WORK
HOW NEURAL NETWORKS WORK
 
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...
PREDICTING STUDENT ACADEMIC PERFORMANCE IN BLENDED LEARNING USING ARTIFICIAL ...
 
Paper id 252014107
Paper id 252014107Paper id 252014107
Paper id 252014107
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligence
 
Choosing to grow a graph
Choosing to grow a graphChoosing to grow a graph
Choosing to grow a graph
 
Ijnsa050202
Ijnsa050202Ijnsa050202
Ijnsa050202
 
Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2Deep learning 1.0 and Beyond, Part 2
Deep learning 1.0 and Beyond, Part 2
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
Prediction of Euro 50 Using Back Propagation Neural Network (BPNN) and Geneti...
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
31 34
31 3431 34
31 34
 

Destaque

JPEC 2014 AnnualReport-HR-ToPrint
JPEC 2014 AnnualReport-HR-ToPrintJPEC 2014 AnnualReport-HR-ToPrint
JPEC 2014 AnnualReport-HR-ToPrintAmy Jo Reimer-Myers
 
Program wcci-final[1]
Program wcci-final[1]Program wcci-final[1]
Program wcci-final[1]TARKI AOMAR
 
Dez acontecimentos mais inusitados de Pequim
Dez acontecimentos mais inusitados de PequimDez acontecimentos mais inusitados de Pequim
Dez acontecimentos mais inusitados de Pequimflavia_rodrigues
 
The Great Olympic Lip Sync
The Great Olympic Lip SyncThe Great Olympic Lip Sync
The Great Olympic Lip Synccoolstuff
 
portfolio-Qiao
portfolio-Qiaoportfolio-Qiao
portfolio-Qiaozhang qiao
 
Annual-Report-and-Research-in-Progress-2014-(lr)
Annual-Report-and-Research-in-Progress-2014-(lr)Annual-Report-and-Research-in-Progress-2014-(lr)
Annual-Report-and-Research-in-Progress-2014-(lr)Shaida Darian
 
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]Randy Ikas
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsAhmad Jawwad
 
Ancient china qin dynasty, the great wall, mauseleum
Ancient china   qin dynasty, the great wall, mauseleumAncient china   qin dynasty, the great wall, mauseleum
Ancient china qin dynasty, the great wall, mauseleumAlex Thompson
 
Switching horses midstream - From Waterfall to Agile
Switching horses midstream - From Waterfall to AgileSwitching horses midstream - From Waterfall to Agile
Switching horses midstream - From Waterfall to AgileDoc Norton
 
Solar Pump Applications in South asia
Solar Pump Applications in South asiaSolar Pump Applications in South asia
Solar Pump Applications in South asiatiger power yan
 
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_new
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_newCurriculum Vitae -Carlo Gaetano-November_18th_ 2015_new
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_newCarlo Gaetano
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at XiaomiHBaseCon
 
WeChat preso - Jiaqi Huang
WeChat preso - Jiaqi HuangWeChat preso - Jiaqi Huang
WeChat preso - Jiaqi HuangJiaqi Huang
 
The solar system. Class 4. Vedruna Tordera
The solar system. Class 4. Vedruna TorderaThe solar system. Class 4. Vedruna Tordera
The solar system. Class 4. Vedruna Torderaarualarual
 
Workshop erfa trimbach-120614
Workshop erfa trimbach-120614Workshop erfa trimbach-120614
Workshop erfa trimbach-120614Vorname Nachname
 

Destaque (19)

SW 04-27 Final presentation
SW 04-27 Final presentationSW 04-27 Final presentation
SW 04-27 Final presentation
 
JPEC 2014 AnnualReport-HR-ToPrint
JPEC 2014 AnnualReport-HR-ToPrintJPEC 2014 AnnualReport-HR-ToPrint
JPEC 2014 AnnualReport-HR-ToPrint
 
Senbud 1
Senbud 1Senbud 1
Senbud 1
 
Program wcci-final[1]
Program wcci-final[1]Program wcci-final[1]
Program wcci-final[1]
 
Dez acontecimentos mais inusitados de Pequim
Dez acontecimentos mais inusitados de PequimDez acontecimentos mais inusitados de Pequim
Dez acontecimentos mais inusitados de Pequim
 
The Great Olympic Lip Sync
The Great Olympic Lip SyncThe Great Olympic Lip Sync
The Great Olympic Lip Sync
 
portfolio-Qiao
portfolio-Qiaoportfolio-Qiao
portfolio-Qiao
 
Annual-Report-and-Research-in-Progress-2014-(lr)
Annual-Report-and-Research-in-Progress-2014-(lr)Annual-Report-and-Research-in-Progress-2014-(lr)
Annual-Report-and-Research-in-Progress-2014-(lr)
 
Contoh ragam musik
Contoh ragam musikContoh ragam musik
Contoh ragam musik
 
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]
K10 bs khonghucu_sma kelas x kurikulum 2013_[blogerkupang.com]
 
Gis capabilities on Big Data Systems
Gis capabilities on Big Data SystemsGis capabilities on Big Data Systems
Gis capabilities on Big Data Systems
 
Ancient china qin dynasty, the great wall, mauseleum
Ancient china   qin dynasty, the great wall, mauseleumAncient china   qin dynasty, the great wall, mauseleum
Ancient china qin dynasty, the great wall, mauseleum
 
Switching horses midstream - From Waterfall to Agile
Switching horses midstream - From Waterfall to AgileSwitching horses midstream - From Waterfall to Agile
Switching horses midstream - From Waterfall to Agile
 
Solar Pump Applications in South asia
Solar Pump Applications in South asiaSolar Pump Applications in South asia
Solar Pump Applications in South asia
 
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_new
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_newCurriculum Vitae -Carlo Gaetano-November_18th_ 2015_new
Curriculum Vitae -Carlo Gaetano-November_18th_ 2015_new
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at Xiaomi
 
WeChat preso - Jiaqi Huang
WeChat preso - Jiaqi HuangWeChat preso - Jiaqi Huang
WeChat preso - Jiaqi Huang
 
The solar system. Class 4. Vedruna Tordera
The solar system. Class 4. Vedruna TorderaThe solar system. Class 4. Vedruna Tordera
The solar system. Class 4. Vedruna Tordera
 
Workshop erfa trimbach-120614
Workshop erfa trimbach-120614Workshop erfa trimbach-120614
Workshop erfa trimbach-120614
 

Semelhante a Intuidex - To be or not to be iid by William M. Pottenger (NYC Machine Learning Group)

Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...James Gleeson
 
informatics_future.pdf
informatics_future.pdfinformatics_future.pdf
informatics_future.pdfAdhySugara2
 
A metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalKai Li
 
ICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularityICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularityJames Gleeson
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Intobutest
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
wetenschappelijke vorming: HCI topics
wetenschappelijke vorming: HCI topicswetenschappelijke vorming: HCI topics
wetenschappelijke vorming: HCI topicsErik Duval
 
Lecture-1-Introduction-to-Data-Mining.pdf
Lecture-1-Introduction-to-Data-Mining.pdfLecture-1-Introduction-to-Data-Mining.pdf
Lecture-1-Introduction-to-Data-Mining.pdfJojo314349
 
KDD, Data Mining, Data Science_I.pptx
KDD, Data Mining, Data Science_I.pptxKDD, Data Mining, Data Science_I.pptx
KDD, Data Mining, Data Science_I.pptxYogeshGairola2
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...Carole Goble
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataHerbert Van de Sompel
 
Data Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine ScienceData Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine ScienceAnne Thessen
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedPhilip Bourne
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Rich Heimann
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewPhilip Bourne
 
Data Sharing & Data Citation
Data Sharing & Data CitationData Sharing & Data Citation
Data Sharing & Data CitationMicah Altman
 
"Melting Pot" of the Sciences in interdisciplinary research
"Melting Pot" of the Sciences in interdisciplinary research"Melting Pot" of the Sciences in interdisciplinary research
"Melting Pot" of the Sciences in interdisciplinary researchNatalie de Vries
 

Semelhante a Intuidex - To be or not to be iid by William M. Pottenger (NYC Machine Learning Group) (20)

Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...Effects of Network Structure, Competition and Memory Time on Social Spreading...
Effects of Network Structure, Competition and Memory Time on Social Spreading...
 
informatics_future.pdf
informatics_future.pdfinformatics_future.pdf
informatics_future.pdf
 
A metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposal
 
ICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularityICCSS2015 talk: Null model for meme popularity
ICCSS2015 talk: Null model for meme popularity
 
Lec1-Into
Lec1-IntoLec1-Into
Lec1-Into
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
wetenschappelijke vorming: HCI topics
wetenschappelijke vorming: HCI topicswetenschappelijke vorming: HCI topics
wetenschappelijke vorming: HCI topics
 
Lecture-1-Introduction-to-Data-Mining.pdf
Lecture-1-Introduction-to-Data-Mining.pdfLecture-1-Introduction-to-Data-Mining.pdf
Lecture-1-Introduction-to-Data-Mining.pdf
 
KDD, Data Mining, Data Science_I.pptx
KDD, Data Mining, Data Science_I.pptxKDD, Data Mining, Data Science_I.pptx
KDD, Data Mining, Data Science_I.pptx
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Öppen data och forskningens genomslag
Öppen data och forskningens genomslagÖppen data och forskningens genomslag
Öppen data och forskningens genomslag
 
MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
 
Data Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine ScienceData Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine Science
 
Data Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has ChangedData Science and AI in Biomedicine: The World has Changed
Data Science and AI in Biomedicine: The World has Changed
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 
What Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's ViewWhat Data Science Will Mean to You - One Person's View
What Data Science Will Mean to You - One Person's View
 
Data Sharing & Data Citation
Data Sharing & Data CitationData Sharing & Data Citation
Data Sharing & Data Citation
 
"Melting Pot" of the Sciences in interdisciplinary research
"Melting Pot" of the Sciences in interdisciplinary research"Melting Pot" of the Sciences in interdisciplinary research
"Melting Pot" of the Sciences in interdisciplinary research
 
Process Research With Digital Trace Data
Process Research With Digital Trace DataProcess Research With Digital Trace Data
Process Research With Digital Trace Data
 

Mais de Hakka Labs

Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Hakka Labs
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchHakka Labs
 
DataEngConf SF16 - Data Asserts: Defensive Data Science
DataEngConf SF16 - Data Asserts: Defensive Data ScienceDataEngConf SF16 - Data Asserts: Defensive Data Science
DataEngConf SF16 - Data Asserts: Defensive Data ScienceHakka Labs
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartHakka Labs
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleHakka Labs
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataHakka Labs
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale Hakka Labs
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQHakka Labs
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...Hakka Labs
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...Hakka Labs
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestHakka Labs
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringHakka Labs
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresHakka Labs
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkHakka Labs
 
DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesHakka Labs
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityHakka Labs
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...Hakka Labs
 
DataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInDataEngConf SF16 - Methods for Content Relevance at LinkedIn
DataEngConf SF16 - Methods for Content Relevance at LinkedInHakka Labs
 
DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopHakka Labs
 

Mais de Hakka Labs (20)

Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
 
DataEngConf SF16 - Data Asserts: Defensive Data Science
DataEngConf SF16 - Data Asserts: Defensive Data ScienceDataEngConf SF16 - Data Asserts: Defensive Data Science
DataEngConf SF16 - Data Asserts: Defensive Data Science
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
Intuidex - To be or not to be iid by William M. Pottenger (NYC Machine Learning Group)

  • 1. © William M. Pottenger, Ph.D. All Rights Reserved. To be or not to be IID: That is the Question. Higher Order Learning. William M. Pottenger, Ph.D. Rutgers University and Intuidex, Inc. DrWMP@rci.rutgers.edu; www.dimacs.rutgers.edu/~billp DrWMPottenger@Intuidex.com; www.intuidex.com
  • 2. Dr. William M. Pottenger www.dimacs.rutgers.edu/~billp www.intuidex.com • Example Application Areas – Homeland Security/Law Enforcement/Criminal Justice Information Systems – Decision Support Systems – Information Retrieval Systems – High Performance Computing • Research Funded by – National Science Foundation – National Institute of Justice – Department of Homeland Security – Army Research Lab – Commonwealth of Pennsylvania – Corporate Partners – E.g., Lockheed-Martin, Kodak, PNNL, Boeing, etc. • Associate Research Professor @ Rutgers University – DIMACS & Computer Science • CEO of Intuidex, Inc. • Director of Transition for DHS S&T CCI Center • Research Scientist @ NCSA • M.S., Ph.D. in CS at UIUC • Research Interests – Statistical Relational Learning – Leveraging higher-order relations in graphs of data – Parallel and Distributed Visual & Data Analytics – Analytics in a parallel and/or distributed environment – Information Extraction – Automatic extraction of keywords/features from text
  • 3. What is Higher Order Information? • Swanson (’91) posed problem: Migraine headaches (M) – stress associated with M – stress leads to loss of magnesium – calcium channel blockers prevent some M – magnesium is a natural calcium channel blocker – spreading cortical depression (SCD) implicated in M – high levels of magnesium inhibit SCD – M patients have high platelet aggregability – magnesium can suppress platelet aggregability • All extracted from medical journal titles. Slide reused with permission of Marti Hearst @ UCB
  • 4. Gathering Evidence [Diagram: node labels stress, migraine, CCB, PA, SCD, and magnesium (repeated four times)] Slide reused with permission of Marti Hearst @ UCB
  • 5. Higher Order Paths! [Diagram: node labels migraine, magnesium, stress, CCB, PA, SCD] Slide reused with permission of Marti Hearst @ UCB
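The idea behind slides 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: each journal title is reduced to a hypothetical set of two terms taken from the Swanson bullets, two terms are linked when they share a title, and the order of a path is taken to be the number of co-occurrence links on the shortest route between two terms. The function names `cooccurrence_graph` and `path_order` are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Link two terms whenever they appear together in at least one document."""
    graph = {}
    for doc in docs:
        for term in doc:
            graph.setdefault(term, set())
        for a, b in combinations(sorted(doc), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def path_order(graph, src, dst):
    """Number of links on the shortest co-occurrence path, or None if unconnected."""
    if src not in graph or dst not in graph:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:  # plain breadth-first search
        term, dist = queue.popleft()
        if term == dst:
            return dist
        for nxt in graph[term]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical "titles": each one mentions exactly two of the slide's terms.
titles = [
    {"stress", "migraine"}, {"stress", "magnesium"},
    {"CCB", "migraine"}, {"CCB", "magnesium"},
    {"SCD", "migraine"}, {"SCD", "magnesium"},
    {"PA", "migraine"}, {"PA", "magnesium"},
]
graph = cooccurrence_graph(titles)
first_order = path_order(graph, "stress", "migraine")       # share a title
second_order = path_order(graph, "migraine", "magnesium")   # never share a title
```

In this toy corpus migraine and magnesium never appear in the same title, yet they are connected through stress, CCB, SCD and PA, which is the kind of implicit link the talk calls higher order.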
  • 6. Related Work: Link Mining and Collective Classification • Link-based approaches (Taskar et al., 2001; Getoor and Diehl, 2005; Lu and Getoor, 2003; Neville and Jensen 2004) to collective classification use explicit link information within networked data • Studies (Chakrabarti et al., 1998; Neville and Jensen, 2000; Taskar et al., 2001) have shown that collective classifiers can achieve significant reductions in classification errors by performing inference about multiple data instances simultaneously • Collective classifiers are context-dependent and are not designed to classify stand-alone data instances • We propose classification methods that leverage implicit links between features in small training sets, and that maintain the ability for “context-free” classification of individual data instances
  • 7. Is there a theoretical basis for the use of higher order co-occurrence relations? • Research agenda: study machine learning algorithms in search of a theoretical foundation for the use of higher order relations • First algorithm: Latent Semantic Indexing (LSI) – Widely used technique in text mining and IR based on the Singular Value Decomposition (SVD) matrix factoring algorithm – Research question: Does LSI use higher order term co-occurrence? – First step: study SVD. April Kontostathis, Associate Professor @ Ursinus College
  • 8. Is there a theoretical basis for the use of higher order co-occurrence relations in LSI? Singular Value Decomposition: $A_{m \times n} = T_{m \times r}\, S_{r \times r}\, D^{T}_{r \times n}$, where $A$ is the term-by-document matrix, $T$ is term by dimension, $S = \mathrm{diag}(s_1, s_2, \dots, s_r)$ holds the singular values with $s_1 \ge s_2 \ge s_3 \ge \dots \ge s_r$, and $D^{T}$ is dimension by document; $r$ = rank of $A$, $m$ = number of terms, $n$ = number of documents
  • 9. Is there a theoretical basis for the use of higher order co-occurrence relations in LSI? LSI: Truncation of Singular Values: $A_{m \times n} \approx T_{m \times k}\, S_{k \times k}\, D^{T}_{k \times n}$, the reduced term-by-document matrix obtained by keeping only the $k$ largest singular values, where $T$ is term by dimension, $S$ holds the singular values and $D^{T}$ is dimension by document; $s_1 \ge s_2 \ge s_3 \ge \dots \ge s_r$, $r$ = rank of $A$, $m$ = number of terms, $n$ = number of documents
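As a rough sketch of the truncation step on slides 8 and 9, the snippet below uses NumPy's `numpy.linalg.svd`, which returns the singular values sorted from largest to smallest, and keeps the first `k` of them. The 4-term by 3-document matrix is a made-up example, and the helper name `truncated_svd` is an assumption for illustration rather than anything from the talk.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k approximation T_k S_k D_k^T built from the k largest singular values."""
    T, s, Dt = np.linalg.svd(A, full_matrices=False)  # s is sorted largest first
    return T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]


# Hypothetical 4-term x 3-document count matrix (illustrative values only).
A = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

A2 = truncated_svd(A, k=2)       # reduced term-by-document matrix
error = np.linalg.norm(A - A2)   # Frobenius norm of what truncation discards
```

With `k` equal to the rank of `A` the product reproduces `A`; with smaller `k` the Frobenius error is determined by the singular values that were dropped.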
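  • 10. Is there a theoretical basis for the use of higher order co-occurrence relations in LSI?

    Deerwester Term by Term Matrix

    | | human | interface | computer | user | system | response | time | EPS | Survey | trees | graph | minors |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | human | x | 1 | 1 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
    | interface | 1 | x | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
    | computer | 1 | 1 | x | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
    | user | 0 | 1 | 1 | x | 2 | 2 | 2 | 1 | 1 | 0 | 0 | 0 |
    | system | 2 | 1 | 1 | 2 | x | 1 | 1 | 3 | 1 | 0 | 0 | 0 |
    | response | 0 | 0 | 1 | 2 | 1 | x | 2 | 0 | 1 | 0 | 0 | 0 |
    | time | 0 | 0 | 1 | 2 | 1 | 2 | x | 0 | 1 | 0 | 0 | 0 |
    | EPS | 1 | 1 | 0 | 1 | 3 | 0 | 0 | x | 0 | 0 | 0 | 0 |
    | Survey | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | x | 0 | 1 | 1 |
    | trees | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | x | 2 | 1 |
    | graph | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | x | 2 |
    | minors | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | x |

    Deerwester Term by Term Matrix, truncated to two dimensions

    | | human | interface | computer | user | system | response | time | EPS | Survey | trees | graph | minors |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | human | x | 0.54 | 0.56 | 0.94 | 1.69 | 0.58 | 0.58 | 0.84 | 0.32 | -0.32 | -0.34 | -0.25 |
    | interface | 0.54 | x | 0.52 | 0.87 | 1.50 | 0.55 | 0.55 | 0.73 | 0.35 | -0.20 | -0.19 | -0.14 |
    | computer | 0.56 | 0.52 | x | 1.09 | 1.67 | 0.75 | 0.75 | 0.77 | 0.63 | 0.15 | 0.27 | 0.20 |
    | user | 0.94 | 0.87 | 1.09 | x | 2.79 | 1.25 | 1.25 | 1.28 | 1.04 | 0.23 | 0.42 | 0.31 |
    | system | 1.69 | 1.50 | 1.67 | 2.79 | x | 1.81 | 1.81 | 2.30 | 1.20 | -0.47 | -0.39 | -0.28 |
    | response | 0.58 | 0.55 | 0.75 | 1.25 | 1.81 | x | 0.89 | 0.80 | 0.82 | 0.38 | 0.56 | 0.41 |
    | time | 0.58 | 0.55 | 0.75 | 1.25 | 1.81 | 0.89 | x | 0.80 | 0.82 | 0.38 | 0.56 | 0.41 |
    | EPS | 0.84 | 0.73 | 0.77 | 1.28 | 2.30 | 0.80 | 0.80 | x | 0.46 | -0.41 | -0.43 | -0.31 |
    | Survey | 0.32 | 0.35 | 0.63 | 1.04 | 1.20 | 0.82 | 0.82 | 0.46 | x | 0.88 | 1.17 | 0.85 |
    | trees | -0.32 | -0.20 | 0.15 | 0.23 | -0.47 | 0.38 | 0.38 | -0.41 | 0.88 | x | 1.96 | 1.43 |
    | graph | -0.34 | -0.19 | 0.27 | 0.42 | -0.39 | 0.56 | 0.56 | -0.43 | 1.17 | 1.96 | x | 1.81 |
    | minors | -0.25 | -0.14 | 0.20 | 0.31 | -0.28 | 0.41 | 0.41 | -0.31 | 0.85 | 1.43 | 1.81 | x |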
  • 11. Is there a theoretical basis for the use of higher order co-occurrence relations in LSI? • Answer is in the following theorem we proved: If the ijth element of the truncated term by term matrix, Y, is non-zero, then there exists a co-occurrence path of order ≥ 1 between terms i and j. – Kontostathis, A. and Pottenger, W. M. (2006) A Framework for Understanding LSI Performance. Information Processing & Management, volume 42, issue 1, pages 56-73. • We have both proven mathematically and demonstrated empirically that LSI is based on the use of higher order co-occurrence relations. • Next step?
  • 12. Using Higher Order Information in both Generative and Discriminative Learning • Extend the theoretical foundation that April and I developed by studying characteristics of higher-order information in other machine learning approaches including both generative and discriminative supervised learning as well as unsupervised approaches – Ganiz, M. C., Lytkin, N. I. and Pottenger, W. M. (2009) Leveraging Higher Order Dependencies Between Features for Text Classification. In the Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD). Bled, Slovenia, September. Nikita Lytkin, Research Scientist @ NYU Medical Center; Murat Ganiz, Assistant Professor @ Dogus University
  • 13. Representation of Boolean Data by a Bipartite Graph
  • 14. Multinomial vs. Multivariate Event Model, McCallum & Nigam (1998)
  • 15. First Order Paths in a Data Graph
  • 16. Second Order Paths in a Data Graph
  • 17. Patterns of Connectivity between Features
  • 18. Probabilistic Characterization of Features by Second Order Paths
  • 19. Higher Order Naïve Bayes: A Generative Learner. Murat Ganiz, Assistant Professor @ Dogus University
  • 20. Slonim & Tishby (2001) vs. HONB. Ganiz, M. C., Pottenger, W. M. and George, C. (2010) Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification. IEEE Transactions on Knowledge and Data Engineering (TKDE).

    | Dataset (# classes) | NB (multinomial features) | NB_wc (multinomial features) | improvement % | NB (binary features) | HONB (binary features) | improvement % |
    |---|---|---|---|---|---|---|
    | COMP (5) | 0.473 | 0.508 | 7.4 | 0.51 | 0.65 | 26.5 |
    | SCIENCE (4) | 0.65 | 0.725 | 11.5 | 0.6 | 0.84 | 41.6 |
    | POLITICS (3) | 0.62 | 0.67 | 8.1 | 0.68 | 0.83 | 22.8 |
    | RELIGION (3) | 0.525 | 0.553 | 5.3 | 0.64 | 0.74 | 15.7 |
    | Average | | | 8.075 | | | 26.65 |

    • HONB achieves statistically significantly better performance than NB for four datasets based on t-test results • (Slonim & Tishby, 2001) did not report std dev or t-test results
  • 21. Supervised Second Order Transformation for Discriminative Learning. Nikita Lytkin, Research Scientist @ NYU Medical Center
  • 22. Influence of Higher-Order Paths
  • 23. Experimental Setup • Support Vector Machine (Vapnik 1998) was used to evaluate the Supervised Second Order Transformation • Multi-class classification by SVM was performed using the “one-against-one” scheme • Used RBF and linear kernels in SVM and varied soft margin cost from $10^{-4}$ to $10^{4}$ • Training set size varied from 5% to 60% • Eight experiments performed at each sample size
  • 24. Experimental Setup (continued) • Six benchmark text corpora were selected • Stop words were removed, others were stemmed • For the RELIGION, POLITICS, SCIENCE and COMP subsets of the 20 Newsgroups dataset, the top 2000 terms ranked by Information Gain were selected; 500 documents per class were sampled at random for comparison with Slonim and Tishby (2001)

    | Dataset | # classes | total # docs | # terms |
    |---|---|---|---|
    | RELIGION | 3 | 1500 | 2000 |
    | POLITICS | 3 | 1500 | 2000 |
    | SCIENCE | 4 | 2000 | 2000 |
    | COMP | 5 | 2500 | 2000 |
    | Citeseer | 6 | 3312 | 3703 |
    | Cora | 6 | 2708 | 1433 |
  • 25. Scalability Across Training Set Sizes
  • 26. Results for Naïve Bayes, SVM, HONB and HOSVM on 20NG REL & SCI Datasets
  • 27. Results for Naïve Bayes, SVM, HONB and HOSVM on Citeseer & Cora Datasets
  • 28. Significance of Results for Naïve Bayes, SVM, HONB and HOSVM on All Datasets • HONB consistently and statistically significantly outperformed NB on all datasets (p-value ≤ 5%) • HOSVM outperformed SVM on the RELIGION, POLITICS and SCIENCE datasets (p-value ≤ 5%) • Although the difference between HOSVM and SVM on the COMP dataset was significant only at the 0.158 level, HOSVM outperformed SVM on seven out of eight trials by an average of 3%
  • 29. What role do higher-order relations play in supervised machine learning? • Higher-Order Collective Classification (HOCC) – Classifies a set of instances simultaneously and thus exploits the relationships between them; based on a record-relation graph – Capable of both supervised event detection and unsupervised anomaly detection • Application: Classification and Anomaly Detection of Interdomain Routing Events – Goal: detect and categorize such events – Menon, V. and Pottenger, W. M. (2009) A Higher Order Collective Classifier for Detecting and Classifying Network Events. In the Proceedings of the IEEE International Conference on Intelligence and Security Informatics 2009 (ISI 2009). Vikas Menon, Software Developer @ Bridgewater Associates
  • 30. HOCC Results • Detection of Interdomain Routing Events and Anomalies Based on Higher-Order Path Analysis – Slammer worm attack, Witty worm attack, 2003 East Coast Blackout • Real Time Classification of Abnormal Events – Sliding window samples of 120 three-second instances – 180th window = start of event – HOCC detects events and distinguishes anomalies [Figures: Witty (Supervised); Witty (Unsupervised)]
  • 31. What role do higher-order relations play in unsupervised machine learning? • Next step? Consider unsupervised learning… – Association Rule Mining (ARM) • ARM is one of the most widely used algorithms in data mining – Extend ARM to higher order… Higher Order Apriori • LHOIM (Latent Higher-Order Information Mining) • Experiments confirm the value of Higher Order Apriori on real world e-marketplace data. Shenzhi Li, Senior Software Engineer @ Ask (Ask.com)
  • 32. LHOIM Results on 20NG Computer Dataset • Average error rate for 1st-order (top left) and 2nd-order (top right) • Average stdev for 1st-order (bottom left) and 2nd-order (bottom right). Li, S. Z., Wu, T., and Pottenger, W. M. (2005) Distributed Higher Order Association Rule Mining Using Information Extracted from Textual Data. SIGKDD Explorations, volume 7, issue 1, pages 26-35.
  • 33. Higher Order Graph Sampling on Reuters [Figure: accuracy in % (axis 0 to 70) versus training sample % (axis 1 to 10) for Naïve Bayes with Random Sampling, Higher Order Naïve Bayes with Random Sampling, and Higher Order Naïve Bayes with Higher Order Sampling] • Higher Order Naïve Bayes improves the accuracy by at least 10% • Higher Order Naïve Bayes with Higher Order Sampling gives even better results • Patterns can be discovered using a much smaller sample, which is important for online learning. Cibin George, M.S. in CS @ Rutgers
  • 34. Higher Order (Online) Latent Dirichlet Allocation. Intuitively, this formula can be interpreted as a word being assigned to a topic proportional to its frequency of occurrence in that topic. This is, in fact, our guiding intuition, and we simply replace these term frequencies with higher order frequencies. Nir Grinberg, Ph.D. in CS @ Rutgers; Kashyap Kolipaka, Ph.D. in CS @ Rutgers; Christie Nelson, Ph.D. at RUTCOR @ Rutgers
  • 35. Modeling Social Media for Emergency Response in Port-au-Prince, Haiti: Cluster Geolocation
  • 36. Modeling Social Media for Emergency Response in Port-au-Prince, Haiti: Cluster Geolocation with predicted resource
  • 37. Research Futures: Privacy-Enhanced Higher Order Community Partitioning. Let $I(i,j)$ be 1 if vertices $i$ and $j$ are in the same community (social network), and 0 otherwise; then Newman’s Q-Modularity is defined as: $Q = \sum_{i,j} \left(A_{ij} - \frac{d_i d_j}{2m}\right) I(i,j) = \sum_{i,j} \left(A_{ij} - P_{ij}\right) I(i,j)$. Generalization: $Q_l = \frac{1}{l} \sum_{k=1}^{l} \frac{1}{n^{k-1}} \sum_{i,j} \left(A^{k}_{ij} - P^{k}_{ij}\right) I(i,j)$. Q-Modularity counts edges inside each community and subtracts the expected number of edges inside the same community. Higher-order $Q_l$ counts the number of paths inside each community and subtracts the expected number of paths. We propose $Q_l$ as a measure of a community split and consider a combinatorial optimization approach. Alex Nikolov, Ph.D. in CS @ Rutgers
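To make the first-order definition concrete, here is a small pure-Python sketch that evaluates Newman's Q in the count form the slide describes: for every ordered pair of vertices in the same community, add the adjacency entry and subtract the expected value d_i·d_j/(2m), where d_i is the degree of vertex i and m is the number of edges. It assumes a simple undirected graph with no self-loops or repeated edges. The function name `modularity_count` and the two-triangle test graph are illustrative assumptions; dividing the result by 2m gives the conventionally normalised modularity. The higher-order Q_l is not implemented here.

```python
def modularity_count(edges, community):
    """Sum over same-community vertex pairs (i, j) of A_ij - d_i * d_j / (2m)."""
    degree = {}
    adjacent = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    two_m = 2 * len(edges)
    q = 0.0
    for i in degree:
        for j in degree:
            if community[i] == community[j]:  # I(i, j) = 1
                a_ij = 1 if (i, j) in adjacent else 0
                q += a_ij - degree[i] * degree[j] / two_m
    return q


# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {v: "all" for v in range(6)}

q_split = modularity_count(edges, split)
q_merged = modularity_count(edges, merged)
```

Splitting the graph at the bridge scores higher than lumping all six vertices together, which is the behaviour a community-quality measure should show on this graph.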
  • 38. Results on Ground Truth Data • We optimized $Q_l$ using an LP rounding based approximation algorithm for correlation clustering. • We ran our experiments on networks with known communities, and compared the known communities to our clustering using the Adjusted Rand Index.

    | Dataset | l = 1 | l = 2 | l = 3 | l = 4 |
    |---|---|---|---|---|
    | Karate | 0.5414 | 0.5669 | 0.5669 | 0.5669 |
    | Political Books | 0.6250 | 0.6463 | 0.6463 | 0.6463 |
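The Adjusted Rand Index used for this comparison can be computed from pair counts alone. The following is a minimal sketch of the standard pair-counting formula in pure Python, not the evaluation code behind the table; the function name `adjusted_rand_index` and the six-vertex label lists are made up for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs placed together by both labelings, by the first, and by the second.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:  # both labelings are trivial, treat as identical
        return 1.0
    return (together_both - expected) / (maximum - expected)


known = [0, 0, 0, 1, 1, 1]
relabelled = [1, 1, 1, 0, 0, 0]   # same partition, different label names
shifted = [0, 0, 1, 1, 1, 1]      # one vertex moved to the other community

ari_same = adjusted_rand_index(known, relabelled)
ari_shifted = adjusted_rand_index(known, shifted)
```

The index depends only on which items are grouped together, so renaming the communities leaves it at 1, while moving a single vertex lowers it.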
  • 39. Is $Q_l$ easier to approximate? • We approximated $Q_l$ on random $G_{n,p}$ graphs for different values of $l$ and $p$. • We used the ratio of the value of the found solution to the value of an LP relaxation as an estimate of the approximation factor. • It seems that $Q_l$ is harder for denser graphs (high $p$) but easier for higher $l$.

    | | l = 1 | l = 2 | l = 3 | l = 4 | l = 5 |
    |---|---|---|---|---|---|
    | p = 0.03 | 0.9678 | 0.9840 | 1.0000 | 1.0000 | 0.9986 |
    | p = 0.12 | 0.1828 | 0.4542 | -0.1179 | 0.8447 | 1.0000 |
    | p = 0.60 | -0.1130 | 0.3975 | 1.0000 | 1.0000 | 1.0000 |
  • 40. Differential Privacy • Differential Privacy [DMNS]: A randomized function $K$ gives ε-differential privacy if for all graphs $G_1, G_2$ differing in a single edge and all subsets $S$ of $\mathrm{Range}(K)$: $\Pr[K(G_1) \in S] \le e^{\varepsilon} \Pr[K(G_2) \in S]$ • The global sensitivity of a real valued function $f$ is: $GS_f = \max_{G_1, G_2} |f(G_1) - f(G_2)|$, where $G_1, G_2$ differ in a single edge.
  • 41. Sensitivity of $Q_l$. The global sensitivity of $Q_l$ is at most 5(2l – 1)/l for any fixed clustering. By [DMNS], given a community split, outputting $Q_l$ + Lap(5(2l – 1)/lε) satisfies ε-differential privacy.
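The Laplace mechanism referred to here adds noise drawn from a Laplace distribution whose scale is the global sensitivity divided by ε. The sketch below shows that step in pure Python; it takes the sensitivity bound as a parameter rather than computing it, and the numbers in the example are placeholders, not values from the talk. The noise is sampled as the difference of two independent exponential variates, which is Laplace-distributed. The helper name `laplace_mechanism` is an assumption for illustration.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Return value + Lap(sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two Exp(mean=scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


rng = random.Random(0)   # seeded only to make the example repeatable
true_score = 5.0         # placeholder quality score for a fixed community split
sensitivity = 1.0        # placeholder bound on how much one edge can change it
epsilon = 0.5

noisy_scores = [laplace_mechanism(true_score, sensitivity, epsilon, rng)
                for _ in range(20000)]
mean_score = sum(noisy_scores) / len(noisy_scores)
mean_abs_noise = sum(abs(s - true_score) for s in noisy_scores) / len(noisy_scores)
```

A smaller ε or a larger sensitivity bound means a larger noise scale, so stronger privacy is paid for with a less accurate released score.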
  • 42. Differentially Private Community Discovery • The measure of community split $Q_l$ has low sensitivity. – We can output the value of a community split differentially privately • But we would like to design an algorithm Alg, such that: – Alg outputs a community partition with high $Q_l$; – Alg satisfies ε-differential privacy • Considered in Differentially Private Combinatorial Optimization (Gupta et al. 2009), but there is no general method.
  • 43. Higher Order Q-Learning (HOQL) • In HOQL, we classify states as being in a high reward class or a low reward class. States are added to a class based on a threshold. We use HONB classification for action selection. We combine our method with greedy action selection based on the formula: $\varepsilon = 1 - \varepsilon_{start} \cdot (1 - episode_{current} / episode_{total})$ • Q-values are updated based on the traditional formula: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ • where α is the learning rate and γ is the discount factor. In these results, α = 0.91, γ = 1, and $\varepsilon_{start}$ = 0.8. REU: Ashley Edwards, Applicant for Ph.D. in CS @ Rutgers. Edwards, A. and Pottenger, W. M. 2011. Higher Order Q-Learning. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. Paris, France.
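The two formulas on this slide translate directly into code. The sketch below implements only the ε schedule and the tabular Q-value update, using the slide's α = 0.91, γ = 1 and ε_start = 0.8 as defaults; the HONB-based action selection is not shown. The dictionary-based Q table, the state and action names, and the function names `epsilon_schedule` and `q_update` are assumptions made for this example.

```python
def epsilon_schedule(episode, total_episodes, eps_start=0.8):
    """epsilon = 1 - eps_start * (1 - episode_current / episode_total)."""
    return 1.0 - eps_start * (1.0 - episode / total_episodes)


def q_update(q, state, action, reward, next_state, actions, alpha=0.91, gamma=1.0):
    """One tabular Q-learning step; unseen (state, action) pairs default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical two-action problem with one value already learned.
actions = ["left", "right"]
q_table = {("s1", "right"): 2.0}
new_value = q_update(q_table, "s0", "right", reward=1.0,
                     next_state="s1", actions=actions)

eps_first = epsilon_schedule(0, 100)    # start of training
eps_last = epsilon_schedule(100, 100)   # end of training
```

As written on the slide, ε rises from 1 − ε_start at the first episode to 1 at the last, which suggests it is used as the probability of taking the greedy action; that reading is an interpretation rather than something the slide states.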
  • 44. Using Clustering to Detect Censorware (REU: Becker Polverini). Anomaly detection through machine learning exposed that the Chinese government is capable of “line rate” MITM attacks. Due to pipelining in modern browser implementations, “censorware” is forced to remember a 5-tuple for every attempt a user makes to view censored content: <ipSrc, ipDst, srcPort, dstPort, proto>. Chinese government routers use fiber-optics to do censorship at “line rate.” They lose the ability to drop packets, so every censorware router in the path must store a 5-tuple and block responses. This raises the question: “What kinds of computational complexity bottlenecks in ‘censorware’ can we exploit?” For example, how large a “botnet” would be required to cause Chinese censorware routers to run out of memory? [Diagram: A, B, MITM] User attempts to restart the connection. Government servers use SEQ-1460 attack on TCP. Government servers get user to establish new, fake connection. User accepts new, fake connection and retransmits. Government rejects data transmission with RST packet. Server doesn’t understand new, fake connection. Sends RSTs. User rejects attempt to restart the connection. Server assumes user is adversarial. Sends RSTs and kills connection. Polverini, A. B. and Pottenger, W. M. 2011. Using Clustering to Detect Chinese Censorware. CSIIRW ’11, Oak Ridge National Labs, TN, USA
  • 45. CCICADA technology transfer efforts • Goal: Technology transfer to DHS users and customers • Several Tech Transfer programs @ DHS S&T: – E2E – Engage to Excel – Tech Solutions – SECURE • CCICADA is committed to supporting these existing programs and to innovating new approaches – what can you do? – Publish your open-source software! – Commercialize your software! – Start your own company… and sell to DHS!
  • 46. www.intuidex.com ©Intuidex 2013 48 Intuidex, Inc. Presenter: William M. Pottenger, Ph.D. DrWMPottenger@intuidex.com
  • 47. About Intuidex • Data Analytics and Data Model provider • Focused on helping organizations discover actionable intelligence from large, varied, and complex data sources • Provides an open, extensible analytics platform, the Watchman Analytics™ Platform, and components that facilitate enhanced real-time information extraction, consolidation, fusion and discovery from disparate structured and unstructured data streams
  • 48. The problem we solve: “Big Data” • Data volume and complexity have increased exponentially • The number of data sources has exploded, as have data formats, schemas and types • The most valuable data is often unstructured and fragmented • The data needed to drive better decisions is often scattered across multiple data silos • Data that is useful and valuable is often incomplete and requires other data sources to validate • Data storage systems are often proprietary, with limited interoperability • Data from different sources regarding the same entities sometimes conflicts
  • 49. Differentiation • Academic: Commercial Technology Development – Lab @ Rutgers University – Director of Tech Transition for DHS S&T CCI Center – Close cooperation with Rutgers Office of Commercialization – Three patents allowed, fourth pending • Strategic Partnerships – Rutgers University and DHS S&T Center of Excellence – PNNL-DHS S&T National Visual Analytics Center – Law Enforcement Partners: 3M (PIPS Technology) – Customers in Intel / Defense sectors
  • 50. Analyst Information Overload. Diagram labels: FMV, COMINT, SIGINT, HUMINT, SIGACTS, Other; Analyst; Applications and Visualization Platforms, e.g., TIGR
  • 51. Diagram labels: Data Source (×4); Indexing Routine (×4); High Performance Index (IxHPI™); Watchman Analytics™: Entity Extraction (IxExtract™), Feature Selection (IxFeatures™), Topic Modeling (IxTopics™), Rule Learning (IxRules™), Recommender (IxRecommend™), Alerting (IxAlert™), Clustering (IxCluster™), Data Validation (IxValidate™), Trending (IxEntityTrend™), Link Analysis (IxLinks™), Data Fusion (IxRelClu™), Entity Resolution (IxResolve™); Watchman Analytics™ Visualization; Customer Visualization; User
  • 52. Watchman Analytics™ for BOSS • Web-based advanced data analytics and visualization solution • Adobe Flex RIA framework • Component Modules • Synchronized
  • 53. Intuidex and 3M Partnership. Intuidex, Inc., a leader and innovator in data analytics (machine learning), is the pioneer of Higher Order Learning™ technologies that deliver unprecedented accuracy and efficiency in identifying linkages, trends and patterns across disparate information systems, in real time or near real time. Intuidex analytics have been licensed by customers in the US Defense and Intelligence Agencies, US Law Enforcement Agencies and the Fortune 500 to extract latent intelligence and insights from both structured and unstructured data sources. 3M (formerly PIPS Technology) is the worldwide leader in Automated License Plate Recognition (ALPR) technology. PIPS designs, manufactures, and supports its complete line of ALPR products and services for use in law enforcement, parking, tolling, and intelligent transportation systems. With over 20,000 cameras deployed around the globe and a wide range of patents covering their technology and its application, PIPS Technology is easily recognized as the leading provider of traffic related video imaging and license plate capture technology for public safety agencies everywhere.
  • 54. APPLICATIONS OF HIGHER ORDER LEARNING™ FROM
  • 55. Military Threat Detection Applications of Intuidex’s Higher Order Learning™ • Objective: determine which COMINT is likely important and requires further analysis • Data: plain-text representation of comm-hits – 400 samples drawn from Afghanistan theater • Classification: two classes – Class A, Class B • Evaluation – Compared IxHONB™ to Naïve Bayes (NB) – Train on 5% to 90%, test on rest – Averages (accuracy, precision, recall, ...) across 10 folds
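The evaluation protocol on this slide (train on a growing share of the data, test on the rest, average over ten runs) might be sketched as follows. Several things here are assumptions made for illustration: the slide's "10 folds" is read as ten random splits per training percentage, the intermediate percentages between 5% and 90% are illustrative, the data is synthetic, and a majority-class baseline stands in for the classifiers, since neither IxHONB™ nor the real COMINT data is available.

```python
import random
from collections import Counter
from statistics import mean


def majority_classifier(train):
    """Placeholder learner: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def evaluate(data, train_fraction, fit, trials=10, seed=0):
    """Mean test accuracy over `trials` random train/test splits."""
    rng = random.Random(seed)
    n_train = max(1, int(len(data) * train_fraction))
    scores = []
    for _ in range(trials):
        shuffled = data[:]
        rng.shuffle(shuffled)
        # Train on the first share, test on everything that is left.
        train, test = shuffled[:n_train], shuffled[n_train:]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


# Synthetic two-class data set with 400 samples, mirroring only the slide's sample count.
data = [(i, "A" if i % 4 else "B") for i in range(400)]
percentages = (5, 10, 20, 30, 40, 50, 60, 70, 80, 90)
results = {p: evaluate(data, p / 100, majority_classifier) for p in percentages}
```

Swapping `majority_classifier` for any function that takes training pairs and returns a predictor allows two learners to be compared on identical splits, provided the same seed is used for both.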
  • 56. [Figure: Weighted F-measure performance of NB vs. IxHONB™]
  • 57. MIRC (Chat) Entity Extraction • Data from MIRC chat Comm Hits (COMINT) has been helpful to GMTI analysts in: – Determining the nature of movements detected by radar (e.g., wild animals don't radio their friends for help) – Determining whether ground targets may represent a threat – Validating known movements by corroborating with statements of locals (if they see a vehicle WE see, then we KNOW what the “dots” are) – Some “dots” can talk! • Tactical Ground Reporting System (TIGR) – A TIGR user on the battlefield has limited ability to refine a search the way an analyst can – Only has temporal and spatial filters, and relies on pre-packaged intel from various sources input to TIGR (HUMINT, SIGACT)
  • 58. Example Actionable Information • IxRules™ aids a user in discovering rules for multiple entity types • IED Trigger: “On 23 February 2006, at 12:30 PM, in Ba'qubah, Diyala, Iraq, assailants detonated a probable command-initiated improvised explosive device (IED) hidden in a soup vendor's handcart near an Iraqi Army patrol in the central market, killing eight Iraqi soldiers and eight civilians, wounding four Iraqi soldiers and 11 civilians, and causing unspecified damage to the public market. The Mujahidin Shura Council in Iraq (MSC) claimed responsibility.” • Height: “… The suspect is described as black, medium complexion, 28-30 years old, clean-shaven, approximately 6 feet 8 inches tall, weighing 180-200 pounds, with a muscular build. He was last seen wearing a black sweatshirt, black pants, and a dark blue or black knit hat. …”
  • 59. Tactical Ground Reporting System: TIGR
  • 60. Benefits to the Warfighter. Technology Transition Description: Information extraction, summarization and fusion technologies to provide warfighter with situational awareness. 1. Fusion of high-value COMINT intel provides significantly improved situational awareness for warfighters with ‘boots on the ground’ 2. Extraction / summarization of high-value COMINT, SIGACT, HUMINT from unstructured, unleveraged text sources 3. Fusion of high-value COMINT and other text-based intel with GMTI and other intel sources • Transitioned to: ESC/CIEF, used in DARPA Tactical Ground Reporting System (TIGR) • Fielded operationally at: Afghanistan and other theaters • Customer(s): TIGR and users, e.g., GEOINT, FSR, S2, ISR, MAI, CPTI, JIEDDO MID, CIED, RFI, NASIC, Centcom TFs • From theater: “These are exactly the sort of quick and dirty SIGINT summaries I am trying to get. … Just wanted to make sure you know how happy our ground units are to get this information in a wrap up. This daily tipper has made our supported units very happy. Thanks for the consistent help.”
  • 61. Counterterrorism Applications of Intuidex’s Higher Order Learning™ • Objective: Classify confidence in perpetrator identification for incidents in the NCTC Worldwide Incident Tracking System (WITS) • Data: relational tables from WITS – Sampled ~1,000 incidents from an 80,000-record corpus – Included some free text • Classification: five confidence classes – Plausible, Likely, Unknown, Unlikely, Inferred (analyst) • Evaluation – Compared IxHONB™ to NB and LSI-kNN – Train on 5% to 90% of sample, test on rest – Averages (accuracy, precision, recall, ...) across 10 folds
  • 62. [Figure: Non-weighted F-measure performance of NB, LSI-kNN and IxHONB™. x-axis: percentage of training set available for training (5 to 90); y-axis: F-measure (0 to 0.7); series: HONB, LSI-kNN, NB]
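The deck reports both a weighted and a non-weighted F-measure. One common reading, assumed here rather than confirmed by the slides, is that "non-weighted" means the plain (macro) average of per-class F1 scores, while "weighted" means the average weighted by each class's number of true instances. A small self-contained sketch with hypothetical function names:

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """F1 score for every label that occurs in the truth or the predictions."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Non-weighted average: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Average weighted by the number of true instances of each class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


y_true = ["A", "A", "A", "B"]
y_pred = ["A", "A", "B", "B"]
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

The two averages diverge when classes are imbalanced, as with five confidence classes of unequal size: the macro average gives a rare class the same influence as a common one, while the weighted average lets frequent classes dominate.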
  • 63. Nuclear Detection • Data was taken from a Thermo Scientific handheld Spectroscopic Personal Radiation Detector called the Interceptor™ • 302 gamma-ray spectrum files – 20 from Tc99m, the rest from other isotopes or background – Small positive class size • 1024 numeric channels per spectrum – High-dimensional space • 14 labeled, high-confidence isotopes – Potassium (40K; 1.3 billion years)
  • 64. Sample of Results - Accuracy
      Accuracy         65%     60%     55%     50%     45%     40%     35%     30%     25%     20%
      Ga67 – D-B       0.0002  0       0       0       0       0       0       0       0       0
      Ga67 – N-D-B     1       1       1       0.343   0.778   0.697   0.39    0.57    0.26    0.06
      I131 – D-B       0       0       0       0       0       0       0       0.01    0.251   0.16
      I131 – N-D-B     0       0       0.002   0.008   0       0       0       0.01    0.002   0
      In111 – D-B      0.136   0.017   0.01    0.001   0       0       0       0       0       0
      In111 – N-D-B    1       0.08    0.389   0.005   0.001   0.037   0       0.18    0       0.45
      Tc99m – D-B      0.049   0.095   0.001   0       0       0       0       0       0       0
      Tc99m – N-D-B    0       0       0       0       0       0       0       0       0       0
      Key: Statistically Significant difference: NB < HONB | Not Statistically Significant
  • 65. Typical Intuidex Engagement • Client environment analysis – Infrastructure (hardware, software) – Data sources – Operations (relevant and related policies) • Requirements Specification with SMEs – Iterate until approved • Deploy high-performance index engine – Install, configure, test • Deploy indexing routines – Develop, configure, optimize • Deploy analytics services – (Optional) Develop custom services to spec – Install, configure, test
  • 66. Typical Intuidex Engagement • (Optional) Existing visualization interface – Design interface specification for existing framework • Ground-truth development with SMEs • System documentation – Usage documentation – Administration and Configuration documentation – Visualization interface documentation (optional) • Deployment validation – Quality assurance – Load testing • Customer acceptance
  • 67. Watchman Analytics™ Functionality: Entity Resolution; Online Monitoring; Data Deconfliction; Automated Alerting; Interactive Analysis; Entity Extraction; Ad-hoc Reporting; Entity Classification; Privacy Protection*; Quality Assurance; Link-based Analysis; Embedded Analytics. * Privacy protection is a major Intuidex research area and development thrust
  • 68. • Intuidex, Inc. is a hi-tech start-up incorporated by William M. Pottenger, Ph.D. • Thought Leadership in Data Analytics • Key Partnerships
  • 69. Acknowledgements • I am very grateful to my hardworking, intelligent and creative (current and former) students and postdocs, without whom none of this would have been possible: Kunikazu Yoda, Christie Nelson, Aleksandar Nikolov, Nir Grinberg, Cibin George, Christopher Janneck, Nikita Lytkin, Shenzhi Li, Murat Ganiz, Chirag Pandya, Kashyap Kolipaka, Vikas Menon, April Kontostathis, Tianhao Wu, Jirada Kuntraruk, Jason Perry, Mark Dilsizian (and >> others). • I also thank Rutgers University, the National Science Foundation, the Department of Homeland Security and the National Institute of Justice. This material is based upon work partially supported by the National Science Foundation under Grant Numbers 0703698 and 0712139. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or Rutgers University. • I also gratefully acknowledge the continuing help of my Lord and Savior, Yeshua the Messiah (Jesus the Christ) in my life and work.
  • 70. Thank you! Q&A
  • 71. References • Soumen Chakrabarti, Byron Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. SIGMOD Rec., 27(2):307–318, 1998. • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391–407, 1990. • Lise Getoor and Christopher P. Diehl. Link mining: a survey. SIGKDD Explor. Newsl., 7(2):3–12, 2005. • Murat Can Ganiz, Sudhan Kanitkar, Mooi Choo Chuah, and William M. Pottenger. Detection of interdomain routing anomalies based on higher-order path analysis. In ICDM ’06: Proceedings of the Sixth International Conference on Data Mining, pages 874–879, Washington, DC, USA, 2006. IEEE Computer Society. • Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, March 1953. • April Kontostathis and William M. Pottenger. A framework for understanding latent semantic indexing (LSI) performance. Inf. Process. Manage., 42(1):56–73, 2006.
  • 72. References • Qing Lu and Lise Getoor. Link-based classification. In Tom Fawcett and Nina Mishra, editors, ICML, pages 496–503. AAAI Press, 2003. • Shenzhi Li, Tianhao Wu, and William M. Pottenger. Distributed higher order association rule mining using information extracted from textual data. SIGKDD Explorations Newsl., 7(1):26–35, 2005. • J. Neville and D. Jensen. Iterative classification in relational data. In Proc. AAAI, pages 13–20. AAAI Press, 2000. • J. Neville and D. Jensen. Dependency networks for relational data. Data Mining, 2004. ICDM ’04. Fourth IEEE International Conference, pages 170–177, Nov. 2004. • Noam Slonim and Naftali Tishby. The power of word clusters for text classification. In 23rd European Colloquium on Information Retrieval Research, 2001. • Ben Taskar, Eran Segal, and Daphne Koller. Probabilistic classification and clustering in relational data. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pages 870–878, 2001. • Vladimir Vapnik. Statistical Learning Theory. John Wiley, 1998.