DT Brown Bag: A Primer in Analytics
WELCOME!
R² = 500; p < Marty’s 1-mile time
asymptotically approaching perfect
Thursday, August 22, 13
Outline
•EAT, Guten Appetit, Bon appetit, Buen apetito, Buon appetito!
•Words from the VP
•Why this brown-bag?
•Analytics Services:
•Team Introduction; About YOU!
•Why Analytics!?
•Philosophy...
•Case Studies:
•Case Study (Nathan D.)
•Localview (Marty A.)
•Case Study (me)
•Core Values: Analytical Insights
•On the horizon...
Why this brown bag??
Learning [close] at a pace similar to the pace at which we learn.
Learning and Educating from/to PMs, SWE, and OPs.
PM: Provide insights from RFIs/RFPs.
PM: Atmospherics from our customers.
SWE: Accessing data spaces.
SWE: Integrating algorithms.
OP: How do you best consume the outputs of models?
OP: What models are best to present to OPs?
PM: Program Managers, SWE: Software Engineers, OP: Operators
ISW
USMA
DARPA...
Why this brown bag??
Data Tactics Analytics Practice
The Team:
(Nathan D., Shrayes R., David P., Adam VE., Andrew T., Geoffrey B., Rich H.)
Graduates from top universities...
Degrees include:
mathematics, computer science, aeronautical engineering,
astrophysics, electrical engineering, mechanical engineering, statistics,
social science(s).
Base competencies (horizontals): Clustering, Association Rules,
Regression, Naive Bayesian Classifier, Decision Trees, Time-Series,
Text Analysis.
Going beyond the base (verticals)...
Data Tactics Analytics Practice
ABOUT YOU:
28 confirmed, 18 webex, 14 tentative (n = 60, representing > 25% of the company)
21 confirmed within the first 60 minutes....
Monsee Wood & Steve Moccio 1st
Charles Fuller & Lenesto Page Last
Chris Zilligen: 3,120 (Longest resume)
Catherine Schymanski: 284 (shortest resume)
Linguistic Standard:
Jack Gustafson (FK: -126)
Shrayes Ramesh (FK: -38)
...analytics team below the company average!! :)
Horizontals & Verticals
Clustering || Regression || Decision Trees || Text Analysis
Association Rules || Naive Bayesian Classifier || Time Series Analysis
Verticals: econometrics; spatial econometrics; graph theory algorithms; astrophysical
time-series analysis; path planning algorithms; Bayesian statistics; constrained
optimizations; numerical integration techniques; PCA; GLM; hierarchical models; IRT;
DLISA; latent class analysis; structural equation modeling; mixture models; SVM;
maxent; CART; naive Bayes classifier; ICA
Data Tactics Analytics Practice
[Figure: data science Venn diagram. Circles: Programming & Scripting Skills;
Mathematics & Statistics; Domain Expertise. Region labels: DT Analytics;
Traditional Research; DangerZone! (~statisticulation); ML.]
[1] Statisticulation: Darrell Huff, “How to Lie with Statistics”
[2] http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
[3] https://portal.data-tactics-corp.com/sites/analytics/Wiki/AnalyticsFAQ.aspx
Why Analytics [Business]???
Why are analytics important?
(Business, Analytics, Practical)
"We need to stop reinventing the cloud
and start using it!"
(Dave Boyd)
Why are analytics important?
(Business, Analytics, Practical)
Analytics:
No Free Lunch (NFL) theorems: no algorithm performs better
than any other when their performance is averaged uniformly
over all possible problems of a particular type. Algorithms must
therefore be designed for a particular domain or style of problem;
there is no such thing as a general-purpose algorithm.
Why Analytics [Analytics]???
Marty doesn’t scale - none of us do.
Data Scales
Web Scales
Academic Publications Scale
IC Scales
[Figure: N versus t]
Why Analytics [Practical]???
Why Analytics [Practical]???
Why are analytics important?
(Business, Analytics, Practical)
“…the alternative to good statistics is not “no
statistics,” it’s bad statistics. People who argue
against statistical reasoning often end up backing up
their arguments with whatever numbers they have at
their command, over- or under-adjusting in their
eagerness to avoid anything systematic” Bill James
"companies that have massive amounts of data
without massive amounts of clue are going to be
displaced by startups that have less data but more
clue" (Tim O’Reilly)
Philosophy:
Philosophy:
We are NOT “Data Agnostic”
...this should represent an early warning
system about our culture. The IT notion
of data is dead.
Analytics in Perspective...
http://datatactics.blogspot.com/2013/07/analytics-in-perspective-inquiry-into.html
Analytics in Perspective: An Inquiry into Modes of Inquiry
“Analytics in Perspective” reflects how people arrive at
decisions.
GOOD: Induction, Abduction, Circumscription, Counterfactuals.
BAD: Deduction, Speculation, Justification, Groupthink
Analytics in Perspective...
Identifying Smugglers
Leveraging Big Spatio-Temporal Data
Background: The Strait of Hormuz
Importance:
• Oil
• Embargo
• Smuggling
How to Catch Smugglers
In order to stop smugglers, we must identify:
1. Which boats are undertaking illicit activities
2. Where illicit activities are taking place
3. Points of departure/arrival of suspicious ships
A Difficult Task: Too Much Data
AIS (transponder) provides ship-level data:
• Ship location (lat-long)
• Ship speed
• Ship bearing
• Ship “purpose”
• Time stamp
About 0.5M pings from 1,300 boats between
March 2012 and January 2013.
A Difficult Task: Too Much Data
A Difficult Task: Too Little Data
Individual pings or tracks are not useful on their own: there is no
point of comparison.
Similarly, short-duration plots are too thin to provide
analytic leverage.
A Difficult Task: Too Little Data
A single boat:
A Difficult Task: Too Little Data
A single day:
A Difficult Task: Many Types of Boats
Solution: Analytics
Use a statistical model to discover patterns in
the data…
…then identify observations (boat-times) that do
not fit those patterns.
Goal: Identify boats, places, and times that exhibit
or house discrepant behavior.
Characteristics of a Good Model
A good model for this data should:
• Leverage all of the available data
• Take advantage of local information (not global patterns)
• Be able to accommodate a variety of patterns (shipping,
fishing, etc)
• Be able to identify ships that are only occasionally deviant
• Identify place-times where deviant activity occurs
• Be estimable with reasonable computational resources
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
A LUBaP model?
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
We want to compare apples-to-apples; that is,
compare boats that are near each other in space and time,
not boats that are far-flung.
Assign each observation to a geographically
constrained grid square.
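The grid assignment can be sketched as follows. This is a minimal illustration, not the model's actual code: the function names, the 0.25-degree cell size, and the dictionary layout of a ping are all assumptions made for the example.

```python
import math

def grid_square(lat, lon, cell_deg=0.25, origin=(0.0, 0.0)):
    """Return the (row, col) index of the grid square that contains a point.

    cell_deg and origin are illustrative assumptions, not values from the slides.
    """
    row = math.floor((lat - origin[0]) / cell_deg)
    col = math.floor((lon - origin[1]) / cell_deg)
    return row, col

def assign_to_squares(pings, cell_deg=0.25, origin=(0.0, 0.0)):
    """Group pings by the grid square they fall in."""
    squares = {}
    for ping in pings:
        key = grid_square(ping["lat"], ping["lon"], cell_deg, origin)
        squares.setdefault(key, []).append(ping)
    return squares

# Three example pings; the first two land in the same square.
pings = [
    {"ship": 623432, "lat": 24.546, "lon": 55.005},
    {"ship": 874627, "lat": 24.716, "lon": 55.108},
    {"ship": 523881, "lat": 25.128, "lon": 54.807},
]
squares = assign_to_squares(pings)
```

Each square then gets its own local model, so a boat is only ever compared with its neighbors.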
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
Let m denote the number of observations in a particular grid
square. Then, in each square, add m additional observations
with the following characteristics:
•position, drawn from bivariate uniform distribution
•speed, drawn with replacement from empirical distribution
•time of observation, drawn from a uniform distribution
Now, the task is no longer unsupervised, but supervised.
-> Model the probability of a boat being a “real” boat.
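A minimal sketch of this augmentation step, assuming each observation is a dictionary with lat, lon, speed, and time fields (the field names, the function name, and the example bounds are hypothetical):

```python
import random

def add_synthetic(real_obs, bounds, time_range, rng):
    """Label m real observations 1 and append m synthetic observations labeled 0.

    bounds is (lat_min, lat_max, lon_min, lon_max) for the grid square.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    speeds = [o["speed"] for o in real_obs]
    labeled = [dict(o, real=1) for o in real_obs]
    for _ in range(len(real_obs)):
        labeled.append({
            "lat": rng.uniform(lat_min, lat_max),   # position: bivariate uniform
            "lon": rng.uniform(lon_min, lon_max),
            "speed": rng.choice(speeds),            # speed: resampled with replacement
            "time": rng.uniform(*time_range),       # time: uniform over the window
            "real": 0,
        })
    return labeled

rng = random.Random(0)
real = [
    {"lat": 24.55, "lon": 55.01, "speed": 9.8, "time": 10.0},
    {"lat": 24.60, "lon": 55.10, "speed": 12.4, "time": 20.0},
    {"lat": 24.70, "lon": 55.20, "speed": 4.2, "time": 30.0},
]
labeled = add_synthetic(real, bounds=(24.5, 24.75, 55.0, 55.25),
                        time_range=(0.0, 40.0), rng=rng)
```

The result is a balanced binary data set: any structure that separates the real rows from the synthetic ones is structure in where, when, and how real boats move.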
The Model
The Model
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
•Turned outlier detection, a poorly structured problem, into
modeling a binary target, a very well-understood problem
•Now, simply model the probability that each boat is “real”
•Apply logistic regression to each grid square
•Allow the flexibility (order) of the model fit (splines,
interactions) to depend on the data density in each square
(more data, richer model).
•logit(Pr(“real”)) = f(speed, location, time)
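A minimal sketch of one local logistic fit, under simplifying assumptions: plain gradient descent, a single linear term (position along one axis of the square, rescaled to 0..1), and made-up data. The splines and interactions mentioned above are omitted, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit logistic regression by batch gradient descent; returns [intercept, w1, ...]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability that an observation is 'real'."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Real boats cluster at low x; synthetic points are spread over the whole square.
real_x = [0.10, 0.15, 0.20, 0.25, 0.30, 0.20, 0.15, 0.25]
fake_x = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95, 0.60]
X = [[x] for x in real_x + fake_x]
y = [1] * len(real_x) + [0] * len(fake_x)

w = fit_logistic(X, y)
p_typical = predict(w, [0.2])   # where real boats usually are
p_unusual = predict(w, [0.9])   # where only synthetic points appear
```

An observation sitting where real boats are common gets a high probability of being “real”; one sitting where only synthetic points occur gets a low probability, which is what flags it as discrepant.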
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
Problem: Predictions may be arbitrary due to
random assignment and grid coarseness.
The Model
A local, unsupervised-as-supervised learning,
bagged, probability model.
Problem: Predictions may be arbitrary due to
random assignment and grid coarseness.
Solution:
1. Create multiple grids with different positions.
2. Re-run the local model in each square, for
each different grid.
3. Aggregate the predicted probabilities for each
observation, in each grid, by averaging.
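The three steps above can be sketched as follows. The local model here is a toy stand-in (crowded squares score as more “normal”), used only so the example runs; in the approach described, it would be the per-square logistic model. All names, the cell size, and the offsets are assumptions.

```python
import math
from collections import defaultdict

def square_of(obs, cell, offset):
    """Grid square of an observation for a grid shifted by offset."""
    return (math.floor((obs["lat"] - offset[0]) / cell),
            math.floor((obs["lon"] - offset[1]) / cell))

def bagged_probabilities(observations, cell, offsets, local_model):
    """Average each observation's predicted probability over several shifted grids."""
    totals = [0.0] * len(observations)
    for offset in offsets:                      # step 1: one grid per offset
        squares = defaultdict(list)
        for idx, obs in enumerate(observations):
            squares[square_of(obs, cell, offset)].append(idx)
        for members in squares.values():        # step 2: re-run the local model
            probs = local_model([observations[i] for i in members])
            for i, p in zip(members, probs):
                totals[i] += p
    return [t / len(offsets) for t in totals]   # step 3: aggregate by averaging

def toy_local_model(square_obs):
    """Stand-in for the local model: n / (n + 1) for every member of the square."""
    n = len(square_obs)
    return [n / (n + 1.0)] * n

observations = [
    {"lat": 0.1, "lon": 0.1},
    {"lat": 0.4, "lon": 0.4},
    {"lat": 0.9, "lon": 0.9},
]
averaged = bagged_probabilities(observations, cell=0.5,
                                offsets=[(0.0, 0.0), (0.25, 0.25)],
                                local_model=toy_local_model)
```

Because every observation is scored once per grid, no single placement of the grid lines decides its final probability.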
Computational Efficiency
Estimating a flexible model in each of ~300 grid squares, for
each of 6 grids, means estimating ~1,800 logistic models!
Not a problem, because:
• each one has limited amounts of data (fitting time for most algorithms
grows faster than linearly with data size)
• each local model is separate, allowing for parallel
processing
Computation on my laptop takes ~4 minutes after simple
parallelization across cores.
What is the Output from this Model?
•Predicted probability of each boat-time (i.e. observation)
being a real boat.
•High probabilities indicate observations doing something
“normal” or “predictable.”
•Low probabilities indicate observations doing something
“discrepant.”
Ship ID   Lat     Long    Speed  Timestamp   Pr
623432    24.546  55.005   9.8   1203221230  0.78
874627    24.716  55.108  12.4   1209242230  0.08
523881    25.128  54.807   4.2   1206120947  0.64
Value I: Location of Illicit Activities
Value II: Identify Devious Boats
Value III: Prioritized List of Suspect Boats
•Model generates probabilities on an interval scale
•Facilitates efficient use of scarce enforcement resources
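One way such a list could be produced from the model output is sketched below. Averaging the probabilities per ship is an assumption made for the example (the minimum, or the share of low-probability pings, would be equally plausible); the three rows reuse the sample output table.

```python
from collections import defaultdict

observations = [
    {"ship_id": 623432, "pr": 0.78},
    {"ship_id": 874627, "pr": 0.08},
    {"ship_id": 523881, "pr": 0.64},
]

def prioritize(observations):
    """Rank ships from most to least suspect by mean predicted probability."""
    by_ship = defaultdict(list)
    for obs in observations:
        by_ship[obs["ship_id"]].append(obs["pr"])
    mean_pr = {ship: sum(ps) / len(ps) for ship, ps in by_ship.items()}
    # Lowest probability of being "real" first: most discrepant behavior.
    return sorted(mean_pr.items(), key=lambda item: item[1])

ranking = prioritize(observations)
```

Enforcement resources can then be worked down the list from the top.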
Lessons Learned
Analytics is a powerful tool for identifying patterns in big data.
Identifying outliers is predicated on identifying patterns.
LUBaP models are a powerful tool for outlier detection.
This model utilizes no subject matter expertise and a simple
probability model (implications: portable across domains; fast)
What’s the Next Hot Thing?
Unsupervised Scaling of Text Data
Analyzing Text is Important
The preponderance of data created today is free text, not
structured numerical data.
One thing people want to do with text is “scale” it; that is, rank
order it according to an underlying continuum.
Examples:
-put a numerical value on what each product reviewer thinks of
a particular product
-generate a measure of the extremism of Iranian clerics based
on their writings
Analyzing Text is Difficult
Text data is unstructured, and messy.
“I thought I would love the iPhone, but it’s actually not that
great.”
Standard approaches:
1. Dictionary: Create a numeric value for many content-laden
words; compare texts to the dictionary.
2. Estimation: Hand-score many texts; use the scores as a
basis for training a statistical model for other texts.
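A small sketch of the dictionary approach shows why the example sentence is hard. The three-word dictionary and its scores are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sentiment dictionary: word -> score.
DICTIONARY = {"love": 1, "great": 1, "terrible": -1}

def dictionary_score(text):
    """Sum the dictionary scores of the words in a text; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(DICTIONARY.get(token, 0) for token in tokens)

review = "I thought I would love the iPhone, but it's actually not that great."
score = dictionary_score(review)
```

The review is lukewarm at best, yet it scores +2 because “love” and “great” both appear: word counting ignores the “thought I would” and “not that” around them.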
A New Approach
Each author’s use of a word implies they “support” that
word, as opposed to words they don’t use. The
model, developed for scaling ideological positions of
legislators from votes, can be applied to word use.
Benefits:
1: No dictionary!
2: Language invariant!
https://github.com/DataTacticsCorp/text-analysis
Preliminary Example
Pulled down 2000 tweets, 1000 each with the hashtags #prolife
and #prochoice.
Drop the hashtags (no cheating!), pre-process the text data, and
run the model.
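The pre-processing step might look like the sketch below, which strips links and hashtags and then builds a binary author-by-word matrix (1 = the author used the word), the “vote”-like input a scaling model would consume. The two tweets are invented placeholders, and the scaling model itself is not shown.

```python
import re

def preprocess(tweet):
    """Lowercase a tweet, drop links and hashtags, and return its word tokens."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    text = re.sub(r"#\w+", " ", text)          # no cheating: hashtags removed
    return re.findall(r"[a-z']+", text)

def word_use_matrix(tweets):
    """Return (vocabulary, rows); rows[i][j] is 1 if tweet i uses vocabulary[j]."""
    token_sets = [set(preprocess(t)) for t in tweets]
    vocabulary = sorted(set().union(*token_sets))
    rows = [[1 if word in tokens else 0 for word in vocabulary]
            for tokens in token_sets]
    return vocabulary, rows

tweets = [
    "Rally downtown today #prolife http://example.com/x",
    "Join the rally today #prochoice",
]
vocabulary, rows = word_use_matrix(tweets)
```

Each row plays the role of a legislator's voting record, with words in place of roll-call votes.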
Output
Output
Output
Local Events, Worldwide Impact
Localview
Localview, also known as “Lv”, is a proprietary cloud/web-based dashboard with an
advanced analytics framework. The desired end state is integrated data mining,
knowledge discovery, and pattern recognition of social and spatial patterning.
Lv will provide end users with globally and locally available historical
information as well as globally and locally available real-time social media
data feeds. The service includes news; on-the-spot statistics using a proprietary
Data Tactics tool called “ZoomStat”©; historical facts; social media; economics;
security; military; infrastructure; health; aid; natural disasters; war;
entertainment; weather; transportation; and travel. All results will be analyzed,
ingested, normalized, and then plotted on a dynamic and interactive global map.
...by the numbers
• 7 volunteer, part-time team members (NO OVERHEAD)
• first DEMO delivered in 86 days
• 832 hours of research & development time
The Team:
backend development frontend development data analysis development
Marty A
Joe A
Joon K
Annie W Dave P
Rich H
Shenoa H
Evolution:
Evolution:
Development Process
Lv Development Process
End-Users:
Law Enforcement
IC & DoD Commercial
Directional Space Time
Analytics
Base-Rate Fallacy
Directional Space Time Analytics
Data Tactics has been working on a set of problems that
require considered solutions. The following method
compares distributions at two points in time, with a
particular focus on changes in the overall morphology of the
distribution and on the mobility of individual observations
within the distribution over the same period, while
accounting for neighborhood effects. These dynamics are
illuminating: they communicate change over time and
explicitly account for the underlying spatial dimension (Wy).
Based on the integration of a dynamic local space-time
indicator with directional statistics, these methods provide
insights into the role of spatial dependence and uncontrolled
variance over time and space.
Directional Space Time Analytics
This analysis demonstrates the utility of directional space time analytics
on regional stability distribution dynamics. Drawing on recent advances
in geovisualization [1], we suggest a spatially explicit view of mobility.
Based on the integration of a dynamic local indicator of spatial
association together with directional statistics and mapped data points
to each observation, this framework provides new insights on the role of
spatial dependence in regional stability and change.
These approaches have been illustrated with state-level incomes in the
U.S. (1969-2008), Gross Domestic Product (1960-2011), the Failed State
Index (2010-2012), and GMTI data (t0, t1).
[1] Murray, A. T., Liu, Y., Rey, S. J., and Anselin, L. (2010). Exploring movement object patterns.
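The directional part of the framework can be sketched as follows, assuming each observation is summarized at two time points by a pair (standardized value, spatial lag of that value). The movement between the two pairs is a vector whose angle can be binned and counted, as in a rose diagram. The function names and the example moves are hypothetical.

```python
import math
from collections import Counter

def movement_angle(start, end):
    """Angle in degrees (0 to 360) of the move from start to end in (value, lag) space."""
    d_value = end[0] - start[0]
    d_lag = end[1] - start[1]
    return math.degrees(math.atan2(d_lag, d_value)) % 360.0

def angle_sector(angle, n_sectors=4):
    """Index of the equal-width angular sector that contains the angle."""
    return int(angle // (360.0 / n_sectors))

# (value, lag) at t0 and at t1 for three hypothetical regions.
moves = {
    "A": ((0.5, 0.5), (1.0, 1.0)),      # region and neighbors both rise
    "B": ((1.0, -0.5), (0.5, 0.0)),     # region falls while neighbors rise
    "C": ((-1.0, -1.0), (-1.5, -1.5)),  # region and neighbors both fall
}
angles = {name: movement_angle(t0, t1) for name, (t0, t1) in moves.items()}
sector_counts = Counter(angle_sector(a) for a in angles.values())
```

Moves near 45 or 225 degrees indicate a region moving together with its neighbors; moves near 135 or 315 degrees indicate a region and its neighbors moving in opposite directions.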
Per Capita Gross Domestic Product
A measure of the total output of a country that takes the gross domestic product (GDP)
and divides it by the number of people in the country. The per capita GDP is especially
useful when comparing one country to another because it shows the relative
performance of the countries. A rise in per capita GDP signals growth in the economy
and tends to translate as an increase in productivity.
GDP is widely used by economists to gauge economic recession and recovery and an
economy's general monetary ability to address externalities. It is not meant to measure
externalities. It serves as a general metric for a nominal monetary standard of living and
is not adjusted for costs of living within a region.
Gross Domestic Product
GDP = private consumption + gross investment + government spending + (exports − imports), or
GDP = C + I + G + (X − M)
GDP per Capita
Time Span: 1960 to 2011 (51 temporal bin(s), 1 year intervals): 2000 to 2011 (12 temporal
bin(s), 1 year intervals);
Spatial Area: Global;
Original Sample: 202 obs;
Data processing: imputation;
Pruned Sample: 145 observations;
Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time
classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH);
Spatial Weights: knn4;
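A minimal sketch of the quadrant classification with k-nearest-neighbor weights is given below. It is illustrative only: the function names are hypothetical, distances are plain Euclidean, weights are row-standardized, no significance testing is done, and the toy example uses k=2 on six made-up points rather than knn4 on real data.

```python
import math

def knn_neighbors(coords, k):
    """For each point, the indices of its k nearest neighbors (Euclidean distance)."""
    result = []
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        result.append(others[:k])
    return result

def lisa_quadrants(values, coords, k=4):
    """Local Moran statistic and quadrant label (own value, neighbors' value) per point."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    out = []
    for i, nbrs in enumerate(knn_neighbors(coords, k)):
        lag = sum(z[j] for j in nbrs) / len(nbrs)   # row-standardized spatial lag
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        out.append({"z": z[i], "lag": lag, "local_i": z[i] * lag, "quadrant": label})
    return out

# Six points on a line: a high cluster on the left, a mixed group on the right.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
values = [20, 22, 21, 1, 2, 40]
results = lisa_quadrants(values, coords, k=2)
quadrants = [r["quadrant"] for r in results]
```

The first letter describes the observation itself and the second its neighborhood, so “HL” marks a high value surrounded by low ones.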
> describe(dlisa$yr2000)
> describe(dlisa$yr2011)
V. Name    n   mean     sd  median   mad  min     max   range  skew  kurtosis
yr2000   145   5759   9534    1491  1831   87   46453   46366  2.12      3.72
yr2011   145  13292  20621    4666  5841  231  114232  114001  2.46      6.54
Directional Space Time Analytics
Directional Space Time Analytics
https://vimeo.com/69775085
Directional Space Time Analytics
2000:2011 (12 temporal bin(s), 1 year intervals);
Directional Space Time Analytics
What is wrong with Vermont[1]?
- Seemingly nothing!
- Lies within head of approximately normal distribution
- Not an outlier in a classical statistical sense
- Vermont remains below the US average but is
closing the gap.
[1] State Median Income
State Median Income
Time Span: 1969 to 2008 (40 temporal bin(s), 1 year intervals)
Spatial Area: Contiguous United States;
Original Sample: 48 obs;
Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time
classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH);
Spatial Weights: Rook Contiguity;
Directional Space Time Analytics
1969:2008 (40 temporal bin(s), 1 year intervals)
Directional Space Time Analytics
1969:2008 (40 temporal bin(s), 1 year intervals)
Directional Space Time Analytics
1969:2008 (40 temporal bin(s), 1 year intervals)
Directional Space Time Analytics
Core Values:
Localview as an ecosystem:
Most existing big data analyses of social media are confined to a
single platform. However, most of the topics of interest to such
studies, such as influence or information flow can rarely be confined
to the Internet, let alone to a single platform. Understandable
difficulty in obtaining high-quality multi-platform data does not mean
that we can treat a single platform as a closed and insular system,
as if human information flows were all gases in a chamber.
“Shapes of stories into computers...” Kurt Vonnegut
Nate Silver - Cognition2; Small Multiples; Tukey vs. Tufte
http://kottke.org/11/09/kurt-vonnegut-explains-the-shapes-of-stories
Core Values:
Open-source software where possible.
-Bigger data means bigger cost.
-Scientific Python and R Computing Language reached maturity years ago.
Data = Rough + Smooth Qualities
Rough = impulsive, spiky signal: outliers; Smooth = pervasive
Leverage analytics to help understand patterns in data as well as outliers - so called rough
and smooth elements of data. The “smooth” and the “rough” patterns in data are
informative, depending on the specific questions customers have.
Local, as opposed to global or whole-map statistics:
We believe that micro-level, local patterns are often of key interest, and can be
obscured or distorted by attempts to fit global models to local data.
Analytical Pluralism:
Multi-method approaches dominate single-method approaches. Rather than craft a single
statistical model to answer a customer question, we attack problems from several angles
simultaneously, deriving insights from areas of overlap and divergence in the pattern of findings.
Methodological pathways:
Blend nomothetic and idiographic approaches.
Core Values:
Analytical Resources:
https://portal.data-tactics-corp.com/sites/analytics/SitePages/Home.aspx
https://github.com/DataTacticsCorp
Analytical Resources:
Analytical Resources:
http://datatactics.blogspot.com
...On the Horizon:
DT & USMA Department of Systems Engineering partner together and leverage
the Advanced Individual Academic Development Program.
Rstudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432
https://github.com/rheimann/kiva-master
Data Tactics & US Military Academy:
A Primer in Microfinance Using Kiva
Rstudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432
Understanding the complex nature of microfinance more completely:
The US military is directly involved in microfinance (Iraq & Afghanistan), working primarily
through Provincial Reconstruction Teams (PRTs). These are funded by the DoD and DoS; the
operational requirements of these agencies create a need to demonstrate quick impact on
economic recovery, and therefore the goal is to report high numbers of loans.
Technical complexities separate this data from other datasets:
Heterogeneous forms: structured/unstructured/nominal,ordinal, quantitative/temporal/
geographic/multi-lingual/multiple relationships(lenders to recipients) - multiple sectors/
missing data. Data cleansing is hard!
Big Data(ish): $420M (USD), 1.1 million lenders, 580,000 loans, 250 partners, 4.1M
transactions, 3 WHOLE GBs. (https://vimeo.com/28413747)
Broad appeal:
...government to defense to finance to banking to non-profit organizations to THE POOR.
https://github.com/rheimann/kiva-master
...On the Horizon:
DT & The Institute for the Study of War will collaborate in a balanced but largely
quantitative approach to analyzing revolutions and the role social media plays with
particular focus on the Iraq Spring.
...on the Horizon:
Data Science for Program Managers (late September / early October)
Analytics Brown Bag Volume II (October / Early November)
Thank you...
Questions?
Homepage: http://www.data-tactics.com
Blog: http://datatactics.blogspot.com
Twitter: https://twitter.com/DataTactics
Or, me (Rich Heimann) at rheimann@data-tactics-corp.com
Myths and Mathemagical Superpowers of Data Scientists
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Learning Lunches
Learning LunchesLearning Lunches
Learning Lunches
 
Creation of ideas
Creation of ideasCreation of ideas
Creation of ideas
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Introduction on Data Science
Introduction on Data ScienceIntroduction on Data Science
Introduction on Data Science
 
Intro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big DataIntro to Data Science for Enterprise Big Data
Intro to Data Science for Enterprise Big Data
 
10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions10 R Packages to Win Kaggle Competitions
10 R Packages to Win Kaggle Competitions
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 
Tips for data science competitions
Tips for data science competitionsTips for data science competitions
Tips for data science competitions
 
Tutorial on Deep learning and Applications
Tutorial on Deep learning and ApplicationsTutorial on Deep learning and Applications
Tutorial on Deep learning and Applications
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 

Semelhante a Data Tactics Analytics Brown Bag (Aug 22, 2013)

Analytics Brownbag
Analytics Brownbag Analytics Brownbag
Analytics Brownbag DataTactics
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsKrishna Sankar
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsKrishna Sankar
 
Unlocked Workshop OSCON 2013 - Part I
Unlocked Workshop OSCON 2013 - Part IUnlocked Workshop OSCON 2013 - Part I
Unlocked Workshop OSCON 2013 - Part IWayne Walls
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxGreg Makowski
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk KnowledgeKrishna Sankar
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopRussell Jurney
 
Machine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsMachine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsDavid Lary
 
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...Grigori Fursin
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
Introduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBIntroduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBHector Correa
 
Reasons to select research data and where to start
Reasons to select research data and where to startReasons to select research data and where to start
Reasons to select research data and where to startThe University of Edinburgh
 
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...Discovering emerging effects in Learning Networks with simulations Hendrik Dr...
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...Hendrik Drachsler
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible researchYannick Wurm
 
Bring the Noise
Bring the NoiseBring the Noise
Bring the NoiseJon Cowie
 
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-shareBigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-sharestelligence
 
Lecture on AI and Machine Learning
Lecture on AI and Machine LearningLecture on AI and Machine Learning
Lecture on AI and Machine LearningXiaonan Wang
 

Semelhante a Data Tactics Analytics Brown Bag (Aug 22, 2013) (20)

Analytics Brownbag
Analytics Brownbag Analytics Brownbag
Analytics Brownbag
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science Competitions
 
Unlocked Workshop OSCON 2013 - Part I
Unlocked Workshop OSCON 2013 - Part IUnlocked Workshop OSCON 2013 - Part I
Unlocked Workshop OSCON 2013 - Part I
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptx
 
Data Science Folk Knowledge
Data Science Folk KnowledgeData Science Folk Knowledge
Data Science Folk Knowledge
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Machine Learning for Scientific Applications
Machine Learning for Scientific ApplicationsMachine Learning for Scientific Applications
Machine Learning for Scientific Applications
 
How to crack down big data?
How to crack down big data? How to crack down big data?
How to crack down big data?
 
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
Collective Mind infrastructure and repository to crowdsource auto-tuning (c-m...
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Introduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDBIntroduction to NoSQL with MongoDB
Introduction to NoSQL with MongoDB
 
Reasons to select research data and where to start
Reasons to select research data and where to startReasons to select research data and where to start
Reasons to select research data and where to start
 
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...Discovering emerging effects in Learning Networks with simulations Hendrik Dr...
Discovering emerging effects in Learning Networks with simulations Hendrik Dr...
 
2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research
 
Bring the Noise
Bring the NoiseBring the Noise
Bring the Noise
 
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-shareBigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
 
Lecture on AI and Machine Learning
Lecture on AI and Machine LearningLecture on AI and Machine Learning
Lecture on AI and Machine Learning
 
Big Data Tutorial V4
Big Data Tutorial V4Big Data Tutorial V4
Big Data Tutorial V4
 

Mais de Rich Heimann

Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"Rich Heimann
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Rich Heimann
 
Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)Rich Heimann
 
Spatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBCSpatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBCRich Heimann
 
Spatial Analysis and Geomatics
Spatial Analysis and GeomaticsSpatial Analysis and Geomatics
Spatial Analysis and GeomaticsRich Heimann
 
Week 1 Lecture @ UMBC
Week 1 Lecture @ UMBCWeek 1 Lecture @ UMBC
Week 1 Lecture @ UMBCRich Heimann
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Rich Heimann
 

Mais de Rich Heimann (10)

Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
Guest Talk for Data Society's "INTRO TO DATA SCIENCE BOOT CAMP"
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 
Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)Data Tactics Analytics Brown Bag (November 2013)
Data Tactics Analytics Brown Bag (November 2013)
 
Spatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBCSpatial Analysis; The Primitives at UMBC
Spatial Analysis; The Primitives at UMBC
 
Spatial Analysis and Geomatics
Spatial Analysis and GeomaticsSpatial Analysis and Geomatics
Spatial Analysis and Geomatics
 
Week 1 Lecture @ UMBC
Week 1 Lecture @ UMBCWeek 1 Lecture @ UMBC
Week 1 Lecture @ UMBC
 
Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)Human Terrain Analysis at George Mason University (DAY 1)
Human Terrain Analysis at George Mason University (DAY 1)
 

Último

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Último (20)

Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Data Tactics Analytics Brown Bag (Aug 22, 2013)

  • 1. DT Brown Bag: A Primer in Analytics. WELCOME! R² = 500; p < Marty’s 1-mile time; asymptotically approaching perfect. Thursday, August 22, 13
  • 2. Outline • EAT, Guten Appetit, Bon appetit, Buen apetito, Buon appetito! • Words from the VP • Why this brown bag? • Analytics Services: • Team Introduction; About YOU! • Why Analytics!? • Philosophy... • Case Studies: • Case Study (Nathan D.) • Localview (Marty A.) • Case Study (me) • Core Values: Analytical Insights • On the horizon...
  • 3. Why this brown bag?? Learning [close] at a pace similar to the pace at which we learn. Learning and educating from/to PMs, SWEs, and OPs. PM: Provide insights from RFIs/RFPs. PM: Atmospherics from our customers. SWE: Accessing data spaces. SWE: Integrating algorithms. OP: How do you best consume the outputs of models? OP: What models are best to present to OPs? (PM: Program Managers; SWE: Software Engineers; OP: Operators.)
  • 4. Why this brown bag?? ISW, USMA, DARPA ...
  • 5. Data Tactics Analytics Practice. The Team: (Nathan D., Shrayes R., David P., Adam VE., Andrew T., Geoffrey B., Rich H.) Graduates from top universities... Degrees include: mathematics, computer science, aeronautical engineering, astrophysics, electrical engineering, mechanical engineering, statistics, social science(s). Base competencies (horizontals): Clustering, Association Rules, Regression, Naive Bayesian Classifier, Decision Trees, Time-Series, Text Analysis. Going beyond the base (verticals)...
  • 6. Data Tactics Analytics Practice. ABOUT YOU: 28 confirmed, 18 WebEx, 14 tentative (n = 60, representing > 25% of the company). 21 confirmed within the first 60 minutes. First: Monsee Wood & Steve Moccio. Last: Charles Fuller & Lenesto Page. Longest resume: Chris Zilligen (3,120). Shortest resume: Catherine Schymanski (284). Linguistic Standard: Jack Gustafson (FK: -126), Shrayes Ramesh (FK: -38) ...analytics team below the company average!! :)
  • 7. Horizontals & Verticals. Horizontals: Clustering || Regression || Decision Trees || Text Analysis || Association Rules || Naive Bayesian Classifier || Time Series Analysis. Verticals: econometrics, spatial econometrics, graph theory algorithms, astrophysical time-series analysis, path planning algorithms, Bayesian statistics, constrained optimizations, numerical integration techniques, PCA, GLM, hierarchical models, IRT, DLISA, latent class analysis, structural equation modeling, mixture models, SVM, maxent, CART, naive Bayes classifier, ICA.
  • 8. Data Tactics Analytics Practice. (Venn diagram labels:) Programming & Scripting Skills; Mathematics & Statistics; Domain Expertise; DT Analytics; Traditional Research; Danger Zone! ~statisticulation; ML. [1] Statisticulation: “How to Lie with Statistics”, Darrell Huff. [2] http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram [3] https://portal.data-tactics-corp.com/sites/analytics/Wiki/AnalyticsFAQ.aspx
  • 9. Why Analytics [Business]??? Why are analytics important? (Business, Analytics, Practical) "We need to stop reinventing the cloud and start using it!" (Dave Boyd)
  • 10. Why Analytics [Analytics]??? Why are analytics important? (Business, Analytics, Practical) Analytics: No Free Lunch (NFL) theorems: no algorithm performs better than any other when their performance is averaged uniformly over all possible problems of a particular type. Algorithms must be designed for a particular domain or style of problem; there is no such thing as a general-purpose algorithm.
  • 11. Why Analytics [Practical]??? Marty doesn’t scale; none of us do. Data scales. The Web scales. Academic publications scale. The IC scales.
  • 12. Why Analytics [Practical]??? Why are analytics important? (Business, Analytics, Practical) “…the alternative to good statistics is not ‘no statistics,’ it’s bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command, over- or under-adjusting in their eagerness to avoid anything systematic.” (Bill James)
  • 13. Philosophy: "companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue" (Tim O’Reilly)
  • 14. Philosophy: We are NOT “Data Agnostic” ...this should represent an early warning system about our culture. The IT notion of data is dead.
  • 16. Analytics in Perspective... “Analytics in Perspective” reflects how people arrive at decisions. GOOD: Induction, Abduction, Circumscription, Counterfactuals. BAD: Deduction, Speculation, Justification, Groupthink.
  • 17. Identifying Smugglers: Leveraging Big Spatio-Temporal Data
  • 18. Background: The Strait of Hormuz. Importance: • Oil • Embargo • Smuggling
  • 19. How to Catch Smugglers. In order to stop smugglers, we must identify: 1. Which boats are undertaking illicit activities. 2. Where illicit activities are taking place. 3. Points of departure/arrival of suspicious ships.
  • 20. A Difficult Task: Too Much Data. AIS (transponder) provides ship-level data: • Ship location (lat-long) • Ship speed • Ship bearing • Ship “purpose” • Time stamp. About 0.5M pings from 1,300 boats between March 2012 and January 2013.
  • 21. A Difficult Task: Too Much Data
  • 22. A Difficult Task: Too Little Data. Individual pings or tracks are not useful: there is no point of comparison. Similarly, small-duration plots are too thin to provide analytic leverage.
  • 23. A Difficult Task: Too Little Data. A single boat:
  • 24. A Difficult Task: Too Little Data. A single day:
  • 25. A Difficult Task: Many Types of Boats
  • 26. Solution: Analytics. Use a statistical model to discover patterns in the data… …then identify observations (boat-times) that do not fit those patterns. Goal: Identify boats, places, and times that exhibit or house discrepant behavior.
  • 27. Characteristics of a Good Model. A good model for this data should: • Leverage all of the available data • Take advantage of local information (not global patterns) • Be able to accommodate a variety of patterns (shipping, fishing, etc.) • Be able to identify ships that are only occasionally deviant • Identify place-times where deviant activity occurs • Be estimable with reasonable computational resources
  • 28. The Model: A local, unsupervised-as-supervised learning, bagged, probability model. A LUBaP model?
  • 29. The Model: A local, unsupervised-as-supervised learning, bagged, probability model. We want to compare apples to apples; that is, treat boats that are nearby (spatio-temporally) the same, and don't compare them to far-flung ones. Assign each observation to a geographically constrained grid square.
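
The grid assignment described on slide 29 might look roughly like the following. This is a minimal sketch, assuming a regular latitude/longitude grid; the 0.25-degree cell size, the origin parameters, and the names `grid_cell` and `group_by_cell` are illustrative assumptions, not details from the talk. The three sample positions are the ones shown on slide 38.

```python
import math
from collections import defaultdict

def grid_cell(lat, lon, cell_deg=0.25, origin_lat=0.0, origin_lon=0.0):
    """Return the (row, col) index of the grid square containing a point.

    cell_deg and the origin are hypothetical parameters chosen for illustration.
    """
    row = math.floor((lat - origin_lat) / cell_deg)
    col = math.floor((lon - origin_lon) / cell_deg)
    return row, col

def group_by_cell(pings, cell_deg=0.25):
    """Bucket pings by grid square so each square can be modeled locally."""
    cells = defaultdict(list)
    for ping in pings:
        cells[grid_cell(ping["lat"], ping["lon"], cell_deg)].append(ping)
    return dict(cells)

# Positions taken from the sample output table on slide 38.
pings = [
    {"ship": 623432, "lat": 24.546, "lon": 55.005},
    {"ship": 874627, "lat": 24.716, "lon": 55.108},
    {"ship": 523881, "lat": 25.128, "lon": 54.807},
]
cells = group_by_cell(pings)
```

With this cell size the first two pings share a square and the third falls in a different one, so only the first two would be compared with each other.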
  • 30. The Model: A local, unsupervised-as-supervised learning, bagged, probability model.
  • 31. The Model: A local, unsupervised-as-supervised learning, bagged, probability model. Let m denote the number of observations in a particular grid square. Then, in each square, add m additional observations with the following characteristics: • position, drawn from a bivariate uniform distribution • speed, drawn with replacement from the empirical distribution • time of observation, drawn from a uniform distribution. Now the task is no longer unsupervised, but supervised. → Model the probability of a boat being a “real” boat.
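
The augmentation step on slide 31 can be sketched as below. This is an illustration under stated assumptions, not the speaker's code: the function name `add_background`, the dictionary field names, and the way the square's bounds and time window are passed in are all hypothetical.

```python
import random

def add_background(observations, bounds, time_range, seed=0):
    """Label the m real observations 1 and append m synthetic ones labeled 0.

    bounds = (lat_min, lat_max, lon_min, lon_max) of one grid square;
    time_range = (t_min, t_max). Both are assumed inputs for this sketch.
    """
    rng = random.Random(seed)
    lat_min, lat_max, lon_min, lon_max = bounds
    t_min, t_max = time_range
    speeds = [o["speed"] for o in observations]  # empirical speed distribution

    real = [dict(o, real=1) for o in observations]
    fake = [
        {
            "lat": rng.uniform(lat_min, lat_max),   # position: bivariate uniform
            "lon": rng.uniform(lon_min, lon_max),
            "speed": rng.choice(speeds),            # speed: resampled with replacement
            "time": rng.uniform(t_min, t_max),      # time: uniform
            "real": 0,
        }
        for _ in observations                       # exactly m synthetic rows
    ]
    return real + fake

# Toy square with m = 3 observations (values are made up for illustration).
obs = [
    {"lat": 24.55, "lon": 55.01, "speed": 9.8, "time": 10.0},
    {"lat": 24.60, "lon": 55.10, "speed": 12.4, "time": 22.5},
    {"lat": 24.70, "lon": 55.20, "speed": 4.2, "time": 9.75},
]
labeled = add_background(obs, bounds=(24.5, 24.75, 55.0, 55.25), time_range=(0.0, 24.0))
```

The result has 2m rows with a binary `real` column, which is what lets an ordinary classifier be fitted to what started as an unlabeled outlier-detection problem.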
  • 34. The Model: A local, unsupervised-as-supervised learning, bagged, probability model. • Turned outlier detection, a poorly structured problem, into modeling a binary target, a very well-understood problem • Now, simply model the probability that each boat is “real” • Apply logistic regression to each grid square • Allow the flexibility (order) of the model fit (splines, interactions) to depend on the data density in each square (more data, richer model) • logit(“real”) = f(speed, location, time)
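
A minimal sketch of the per-square fit on slide 34, assuming plain batch gradient descent and a single standardized speed feature with a squared term. The squared term is only a stand-in for the splines and interactions the slide mentions; the function names, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logit(P(real)) = w0 + w . x by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability that an observation is 'real'."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Toy grid square: real boats cluster near a typical (standardized) speed of 0,
# while the synthetic background is spread out. Features are [s, s^2].
real_speeds = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.0]
fake_speeds = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
X = [[s, s * s] for s in real_speeds + fake_speeds]
y = [1] * len(real_speeds) + [0] * len(fake_speeds)

w = fit_logistic(X, y)
p_typical = predict(w, [0.0, 0.0])        # a boat moving at the usual speed
p_unusual = predict(w, [1.8, 1.8 * 1.8])  # a boat moving unusually fast
```

In this toy square the typical-speed observation receives a higher probability of being "real" than the unusually fast one, which is the signal the talk then uses to flag discrepant boat-times.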
  • 35. The Model A local, unsupervised-as-supervised learning, bagged, probability model. Problem: Predictions may be arbitrary due to random assignment and grid coarseness.
  • 36. The Model A local, unsupervised-as-supervised learning, bagged, probability model. Problem: Predictions may be arbitrary due to random assignment and grid coarseness. Solution:
      1. Create multiple grids with different positions.
      2. Re-run the local model in each square, for each different grid.
      3. Aggregate the predicted probabilities for each observation, in each grid, by averaging.
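The three steps above can be sketched as follows. How the grids were actually shifted is not stated on the slide, so the diagonal offsets in `shifted_origins` are an assumption, as are the function names and the dict-of-probabilities input format.

```python
from statistics import mean

def shifted_origins(cell_deg, n_grids):
    """Grid origins that slide the grid diagonally by equal fractions of one cell."""
    return [(k * cell_deg / n_grids, k * cell_deg / n_grids) for k in range(n_grids)]

def average_over_grids(per_grid_probs):
    """Average each observation's predicted probability across grids.

    per_grid_probs: list of dicts, one per grid, mapping observation id -> probability.
    """
    ids = per_grid_probs[0].keys()
    return {i: mean(p[i] for p in per_grid_probs) for i in ids}
```

Averaging over several grid placements smooths out the arbitrariness of any single placement, which is the "bagged" part of the model name.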
  • 37. Computational Efficiency Estimating a flexible model in each of ~300 grid squares, for each of 6 grids, means estimating ~1,800 logistic models! Not a problem, because:
      • each model sees only a limited amount of data (estimation time for many algorithms grows faster than linearly with data size)
      • each local model is separate, allowing for parallel processing
    Computation on my laptop takes ~4 minutes after simple parallelization across cores.
  • 38. What is the Output from this Model?
      • Predicted probability of each boat-time (i.e. observation) being a real boat.
      • High probabilities indicate observations doing something “normal” or “predictable.”
      • Low probabilities indicate observations doing something “discrepant.”
      Ship ID   Lat      Long     Speed   Timestamp    Pr
      623432    24.546   55.005    9.8    1203221230   0.78
      874627    24.716   55.108   12.4    1209242230   0.08
      523881    25.128   54.807    4.2    1206120947   0.64
  • 39. Value I: Location of Illicit Activities
  • 40. Value II: Identify Devious Boats
  • 41. Value III: Prioritized List of Suspect Boats
      • Model generates probabilities on an interval scale
      • Facilitates efficient use of scarce enforcement resources
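As a small illustration of the prioritized list, the snippet below ranks the three example rows from the output table on slide 38 by ascending probability, so the most discrepant observation comes first. The `prioritize` helper and the dictionary keys are hypothetical names.

```python
def prioritize(observations, top_n=None):
    """Rank observations from most to least suspect (lowest probability first)."""
    ranked = sorted(observations, key=lambda o: o["pr"])
    return ranked if top_n is None else ranked[:top_n]

# Example rows taken from the output table on slide 38.
rows = [
    {"ship_id": 623432, "pr": 0.78},
    {"ship_id": 874627, "pr": 0.08},
    {"ship_id": 523881, "pr": 0.64},
]

suspects = prioritize(rows, top_n=2)
```

With limited enforcement resources, `top_n` caps the list at however many boats can actually be inspected.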
  • 42. Lessons Learned Analytics is a powerful tool for identifying patterns in big data. Identifying outliers is predicated on identifying patterns. LUBaP models are a powerful tool for outlier detection. This model utilizes no subject matter expertise and a simple probability model (implications: portable across domains; fast)
  • 43. What’s the Next Hot Thing? Unsupervised Scaling of Text Data
  • 44. Analyzing Text is Important The preponderance of data created today is free text, not structured numerical data. One thing people want to do with text is “scale” it; that is, rank order it according to an underlying continuum. Examples:
      • put a numerical value on what each product reviewer thinks of a particular product
      • generate a measure of the extremism of Iranian clerics based on their writings
  • 45. Analyzing Text is Difficult Text data is unstructured and messy. “I thought I would love the iPhone, but it’s actually not that great.” Standard approaches:
      1. Dictionary: Create a numeric value for many content-laden words; compare texts to the dictionary.
      2. Estimation: Hand-score many texts; use the scores as a basis for training a statistical model for other texts.
  • 46. A New Approach Each author’s use of a word implies they “support” that word, as opposed to words they don’t use. The model, developed for scaling ideological positions of legislators from votes, can be applied to word use. Benefits:
      1. No dictionary!
      2. Language invariant!
    https://github.com/DataTacticsCorp/text-analysis
  • 47. Preliminary Example Pulled down 2,000 tweets, 1,000 each with the hashtags #prolife and #prochoice. Drop the hashtags (no cheating!), pre-process the text data, and run the model.
  • 51. Local Events, Worldwide Impact
  • 52. Localview Localview, also known as “Lv”, is a cloud/web-based proprietary dashboard with an advanced analytics framework. The desired end state is integrated data mining, knowledge discovery, and recognition of social and spatial patterns. Lv will provide end-users with globally and locally available historical information as well as globally and locally available real-time social media data feeds. This service includes: news, on-the-spot statistics using a proprietary Data Tactics tool called “ZoomStat”, historical facts, social media, economics, security, military, infrastructure, health, aid, natural disasters, war, entertainment, weather, transportation, and travel. All results will be analyzed, ingested, normalized, and then plotted on a dynamic and interactive global map.
  • 53. ...by the numbers
      • 7 volunteer, part-time team members (NO OVERHEAD)
      • first demo delivered in 86 days
      • 832 hours of research & development time
  • 54. The Team: backend development, frontend development, data analysis development. Marty A, Joe A, Joon K, Annie W, Dave P, Rich H, Shenoa H
  • 57. Lv Development Process
  • 58. End-Users: Law Enforcement; IC & DoD; Commercial
  • 59. Directional Space Time Analytics Base-Rate Fallacy
  • 60. Directional Space Time Analytics Data Tactics has been working on a set of problems that require considered solutions. The following method compares distributions at two points in time, with a particular focus on changes in the overall morphology of the distribution as well as the mobility of individual observations within the distribution over that same period, while contextually accounting for neighborhood effects. These dynamics are illuminating: they communicate change over time and explicitly account for the underlying spatial dimension (Wy). Based on the integration of a dynamic local space-time indicator with directional statistics, these methods provide insights on the role of spatial dependence and uncontrolled variance over time and space.
  • 61. Directional Space Time Analytics This analysis demonstrates the utility of directional space time analytics on regional stability distribution dynamics. Drawing on recent advances in geovisualization [1], we suggest a spatially explicit view of mobility. Based on the integration of a dynamic local indicator of spatial association together with directional statistics and data points mapped to each observation, this framework provides new insights on the role of spatial dependence in regional stability and change. These approaches have been illustrated with state-level incomes in the U.S. (1969-2008), Gross Domestic Product (1960-2011), the Failed State Index (2010-2012), and GMTI data (t0, t1).
    [1] Murray, A. T., Liu, Y., Rey, S. J., and Anselin, L. (2010). Exploring movement object patterns.
  • 62. Per Capita Gross Domestic Product A measure of the total output of a country that takes the gross domestic product (GDP) and divides it by the number of people in the country. Per capita GDP is especially useful when comparing one country to another because it shows the relative performance of the countries. A rise in per capita GDP signals growth in the economy and tends to reflect an increase in productivity. GDP is widely used by economists to gauge economic recession and recovery and an economy's general monetary ability to address externalities. It is not meant to measure externalities. It serves as a general metric for a nominal monetary standard of living and is not adjusted for costs of living within a region. Gross Domestic Product: GDP = private consumption + gross investment + government spending + (exports − imports).
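In symbols, the expenditure identity stated on this slide is commonly written as below. The letters C, I, G, X, M and the population N are the conventional textbook notation, assumed here because the slide's own formula image did not survive in the transcript.

```latex
\mathrm{GDP} = C + I + G + (X - M),
\qquad
\text{GDP per capita} = \frac{\mathrm{GDP}}{N}
```

Here C is private consumption, I is gross investment, G is government spending, X is exports, M is imports, and N is the number of people in the country.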
  • 63. GDP per Capita
      • Time Span: 1960 to 2011 (51 temporal bin(s), 1 year intervals); 2000 to 2011 (12 temporal bin(s), 1 year intervals)
      • Spatial Area: Global
      • Original Sample: 202 observations
      • Data processing: imputation
      • Pruned Sample: 145 observations
      • Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH)
      • Spatial Weights: knn4 (4 nearest neighbors)
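The method named on this slide can be sketched, under assumptions, as a local Moran's I computed at two time points plus the direction each unit moves in the Moran scatterplot (standardized value on one axis, spatial lag on the other). This is an illustrative reading of "directional LISA", not the analysis code behind the slides; the `neighbors` lists are assumed to be supplied already (for example from a 4-nearest-neighbor search), and all function names are hypothetical.

```python
import math

def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sd for v in values]

def spatial_lag(z, neighbors):
    """Row-standardized spatial lag: mean z-score over each unit's neighbors."""
    return [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]

def quadrant(zi, lag_i):
    """Moran scatterplot class: own value (H/L) followed by neighborhood (H/L)."""
    return ("H" if zi >= 0 else "L") + ("H" if lag_i >= 0 else "L")

def directional_lisa(y0, y1, neighbors):
    """Local Moran's I at t0, quadrant at t0 and t1, and direction of movement."""
    z0, z1 = standardize(y0), standardize(y1)
    l0, l1 = spatial_lag(z0, neighbors), spatial_lag(z1, neighbors)
    out = []
    for i in range(len(y0)):
        angle = math.degrees(math.atan2(l1[i] - l0[i], z1[i] - z0[i])) % 360
        out.append({
            "local_I_t0": z0[i] * l0[i],  # local Moran's I with unit-variance z
            "quad_t0": quadrant(z0[i], l0[i]),
            "quad_t1": quadrant(z1[i], l1[i]),
            "angle_deg": angle,
        })
    return out
```

A unit whose quadrant changes between t0 and t1, or whose movement angle differs sharply from its neighbors', is a candidate for closer inspection.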
  • 64. Directional Space Time Analytics
      > describe(dlisa$yr2000)
      > describe(dlisa$yr2011)
      V. Name    n   mean     sd  median   mad  min     max   range  skew  kurtosis
      yr2000   145   5759   9534    1491  1831   87   46453   46366  2.12      3.72
      yr2011   145  13292  20621    4666  5841  231  114232  114001  2.46      6.54
  • 65. Directional Space Time Analytics https://vimeo.com/69775085
  • 66. Directional Space Time Analytics 2000:2011 (12 temporal bin(s), 1 year intervals)
  • 67. Directional Space Time Analytics What is wrong with Vermont [1]?
      • Seemingly nothing!
      • Lies within the head of an approximately normal distribution
      • Not an outlier in a classical statistical sense
      • Vermont remains below the US average but is closing the gap.
    [1] State Median Income
  • 68. State Median Income
      • Time Span: 1969 to 2008 (40 temporal bin(s), 1 year intervals)
      • Spatial Area: Contiguous United States
      • Original Sample: 48 observations
      • Method: Directional Local Indicator of Spatial Autocorrelation (Moran’s I) with space-time classifications of High-High (HH), Low-Low (LL), High-Low (HL), Low-High (LH)
      • Spatial Weights: Rook Contiguity
  • 69. Directional Space Time Analytics 1969:2008 (40 temporal bin(s), 1 year intervals)
  • 70. Directional Space Time Analytics 1969:2008 (40 temporal bin(s), 1 year intervals)
  • 71. Directional Space Time Analytics 1969:2008 (40 temporal bin(s), 1 year intervals)
  • 72. Directional Space Time Analytics
  • 73. Core Values: Localview as an ecosystem: Most existing big data analyses of social media are confined to a single platform. However, most of the topics of interest to such studies, such as influence or information flow, can rarely be confined to the Internet, let alone to a single platform. The understandable difficulty of obtaining high-quality multi-platform data does not mean that we can treat a single platform as a closed and insular system, as if human information flows were all gases in a chamber.
    “Shapes of stories into computers...” Kurt Vonnegut
    Nate Silver - Cognition2; Small Multiples; Tukey vs. Tufte
    http://kottke.org/11/09/kurt-vonnegut-explains-the-shapes-of-stories
  • 74. Core Values:
      • Open-source software where possible: Bigger data means bigger cost. Scientific Python and the R computing language reached maturity years ago.
      • Data = Rough + Smooth: Rough = impulsive, spiky signal (outliers); Smooth = pervasive. Leverage analytics to help understand patterns in data as well as outliers, the so-called smooth and rough elements of data. Both the “smooth” and the “rough” patterns in data are informative, depending on the specific questions customers have.
      • Local, as opposed to global or whole-map statistics: We believe that micro-level, local patterns are often of key interest, and can be obscured or distorted by attempts to fit global models to local data.
      • Analytical Pluralism: Multi-method approaches dominate single-method approaches. Rather than craft a single statistical model to answer a customer question, we attack problems from several angles simultaneously, deriving insights from areas of overlap and divergence in the pattern of findings.
      • Methodological pathways: Blend nomothetic and idiographic approaches.
  • 79. ...On the Horizon: DT & the USMA Department of Systems Engineering will partner and leverage the Advanced Individual Academic Development Program. RStudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432 https://github.com/rheimann/kiva-master
  • 80. Data Tactics & US Military Academy: A Primer in Microfinance using KIVA RStudio: analytics.data-tactics-corp.com; PostgreSQL: analytics.data-tactics-corp.com Port: 5432
      • Understanding the complex nature of microfinance more completely: The US military is directly involved in microfinance (Iraq & Afghanistan), working primarily through Provincial Reconstruction Teams (PRTs). Funded by the DoD and DoS; the operational requirements of these agencies create a need to demonstrate quick impact on economic recovery, and therefore the goal is to report high numbers of loans.
      • Technical complexities separate this data from other datasets: heterogeneous forms (structured/unstructured; nominal, ordinal, quantitative; temporal; geographic; multi-lingual; multiple relationships, lenders to recipients; multiple sectors; missing data). Data cleansing is hard!
      • Big Data(ish): $420M (USD), 1.1 million lenders, 580,000 loans, 250 partners, 4.1M transactions, 3 WHOLE GBs. (https://vimeo.com/28413747)
      • Broad appeal: ...government to defense to finance to banking to non-profit organizations to THE POOR.
    https://github.com/rheimann/kiva-master
  • 81. ...On the Horizon: DT & The Institute for the Study of War will collaborate on a balanced but largely quantitative approach to analyzing revolutions and the role social media plays, with particular focus on the Iraq Spring.
  • 82. ...On the Horizon:
      • Data Science for Program Managers (late September / early October)
      • Analytics Brown Bag Volume II (October / early November)
  • 83. Thank you... Questions? Homepage: http://www.data-tactics.com Blog: http://datatactics.blogspot.com Twitter: https://twitter.com/DataTactics Or, me (Rich Heimann) at rheimann@data-tactics-corp.com