Discontinuities Demo


DataTactics


Challenges
• How do you decide which variables capture what happened?
• When did an event happen?
• What is the effect of the event on the variables?
• Can we construct a UI and an algorithm to tackle all three problems simultaneously?

Thursday, November 14, 13

Goals
• The goal is to feed in raw data as the sole input and obtain answers to all three questions:
• (1) When did an event likely occur?
• (2) Which variables can we use to measure the event?
• (3) What was the effect of the event on those variables?


Simple example

[Figure: variable outcome plotted against time, with the event and its effect annotated.]

With limited insight…
• If we know the timing and the important variables, we can measure the effect of the shock on those variables (standard regression techniques).
• If we know the set of important variables and track them over time, we can identify the timing of shocks.
• If we know the timing and have a long history of each variable's evolution, we can cluster variables by their behavior at the important point in time (relative to other points in time).

Methodology

[Figure: estimated effect, with the correct effect and the correct timing marked.]

Methodology
• For every time T and variable K, run an OLS regression under the hypothesis that a shock occurred to variable K at time T.
• The sample is restricted to observations in a neighborhood around T, i.e. t in [T - bandwidth, T + bandwidth]:
Y(K,t) = A(K,T) + B(K,T) S(t) + e(K,t)
where S(t) = 1(t > T) is an indicator and T is the candidate break time being tested.
• Results are stored as the matrix of coefficients B(K,T).
• OLS estimates of B(K,T) are biased towards zero to the extent that S(t) is misspecified.
• In other words, B(K,T) will be maximally different from zero (and unbiased) at the true break T.
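
The regression step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function name `break_effects`, the list-of-lists data layout, and the use of None for candidate times too close to the edges of the data are all hypothetical. It relies on the fact that OLS with an intercept and a single 0/1 regressor has a slope equal to the difference between the two group means.

```python
from statistics import mean

def break_effects(Y, bandwidth):
    """Estimate B[k][T] for every variable k and candidate break time T.

    Y is a list of K series of equal length (one value per time period).
    With an intercept and one 0/1 regressor S(t) = 1(t > T), the OLS slope
    is mean(after) - mean(before), so no matrix algebra is needed.
    Candidate times whose window would run off the data stay None.
    """
    if bandwidth < 1:
        raise ValueError("bandwidth must be at least 1")
    n = len(Y[0])
    B = [[None] * n for _ in Y]
    for k, series in enumerate(Y):
        for T in range(bandwidth, n - bandwidth):
            before = series[T - bandwidth : T + 1]      # S(t) = 0 for t <= T
            after = series[T + 1 : T + bandwidth + 1]   # S(t) = 1 for t > T
            B[k][T] = mean(after) - mean(before)
    return B

# Toy data (made up for illustration): variable 0 jumps by 3 after t = 50,
# variable 1 never changes.
Y = [[1.0] * 51 + [4.0] * 49,
     [2.0] * 100]
B = break_effects(Y, bandwidth=10)
print(B[0][50])  # 3.0
```

On this noise-free toy series the estimate at the true break equals the full jump, and it shrinks toward zero as the candidate time moves away from the break, which is the pattern the last two bullets describe.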


Answers
• When did an event likely occur?
– Aggregate (sum) the effects across all variables at each candidate time.

• Which variables can we use to measure the event?
– The variables with the largest effect at that time point.

• What was the effect of the event on those variables?
– We just measured it: the estimated B(K,T).

• Which variables often move together across time?
– Show similar variables.
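
The first two answers can be read off a matrix of estimated effects, as in the sketch below. Several choices here are assumptions rather than anything stated in the slides: effects are summed in absolute value (the slides only say "sum"), missing estimates are represented as None, and the helper names `likely_event_time` and `top_variables`, the variable names, and the numbers are made up for illustration.

```python
def likely_event_time(B):
    """Return the candidate time with the largest summed absolute effect."""
    totals = {}
    for T in range(len(B[0])):
        column = [row[T] for row in B]
        if any(b is None for b in column):
            continue  # no estimate at this candidate time
        totals[T] = sum(abs(b) for b in column)
    return max(totals, key=totals.get)

def top_variables(B, T, names, n_top=3):
    """Rank variables by the absolute size of their effect at time T."""
    ranked = sorted(zip(names, (row[T] for row in B)),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n_top]

# Toy effects matrix: 3 variables, 5 candidate times (edges have no estimate).
names = ["var_a", "var_b", "var_c"]
effects = [
    [None, 0.5, 0.2, 0.1, None],
    [None, 0.1, 4.0, 0.3, None],
    [None, -0.2, -3.0, 0.0, None],
]
T_hat = likely_event_time(effects)
print(T_hat)                                        # 2
print(top_variables(effects, T_hat, names, n_top=2))
# [('var_b', 4.0), ('var_c', -3.0)]
```

Summing absolute values keeps positive and negative shocks from cancelling; summing signed effects instead would be a one-line change.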

Example 1: Super Bowl tweets
• Twitter streaming API (every tweet)
• Sample of data selected from Sunday,
February 3, 1600-2210 hours
• Binned into minute-by-minute word counts
• Out of 651k 1-grams, kept the 1035 least sparse words (> 30% sparse)
• The input data is a 371 x 1035 matrix (minutes by words)
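
A rough sketch of the binning and filtering steps follows. It is not the pipeline used for the talk: the tokenizer (lowercase, split on whitespace), the definition of sparsity as the share of observed minutes in which a word never appears, the `max_sparsity` parameter, and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

def minute_word_counts(tweets):
    """Bin (timestamp, text) pairs into per-minute word counts."""
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        minute = timestamp.replace(second=0, microsecond=0)
        counts[minute].update(text.lower().split())
    return counts

def least_sparse_words(counts, max_sparsity):
    """Keep words absent from at most max_sparsity of the observed minutes.

    Minutes with no tweets at all are not in `counts`, so they are ignored.
    """
    n_minutes = len(counts)
    minutes_present = Counter()
    for counter in counts.values():
        minutes_present.update(counter.keys())
    return sorted(word for word, m in minutes_present.items()
                  if 1 - m / n_minutes <= max_sparsity)

def to_matrix(counts, words):
    """Rows are minutes in time order, columns are the chosen words."""
    return [[counts[minute][word] for word in words]
            for minute in sorted(counts)]

# Made-up tweets; the year is an assumption.
tweets = [
    (datetime(2013, 2, 3, 16, 0, 5), "lights out"),
    (datetime(2013, 2, 3, 16, 0, 40), "Lights are out"),
    (datetime(2013, 2, 3, 16, 1, 10), "lights back"),
]
counts = minute_word_counts(tweets)
kept = least_sparse_words(counts, max_sparsity=0.3)
print(kept)                                    # ['lights']
print(to_matrix(counts, ["lights", "out"]))    # [[2, 2], [1, 0]]
```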


SUPERBOWL SHINY


Network graph of variables with correlations > .95

[Figure: network graph of variables; labeled clusters: Halftime show, Power outage.]
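
One way to build the edge list for such a graph is to connect every pair of variables whose correlation exceeds the threshold. The sketch below assumes Pearson correlation computed on each variable's series (the slides do not say whether raw counts or estimated effects were correlated); the function names and the toy series are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant series: correlation undefined, treat as no edge
    return sxy / sqrt(sxx * syy)

def correlation_edges(series_by_name, threshold):
    """Return (name_a, name_b, r) for every pair with r above the threshold."""
    edges = []
    for (a, x), (b, y) in combinations(sorted(series_by_name.items()), 2):
        r = pearson(x, y)
        if r > threshold:
            edges.append((a, b, r))
    return edges

# Toy series: "a" and "b" spike together, "c" moves the other way.
series = {
    "a": [0, 0, 5, 1, 0],
    "b": [0, 0, 10, 2, 0],
    "c": [3, 1, 0, 2, 4],
}
edges = correlation_edges(series, threshold=0.95)
print([(a, b) for a, b, _ in edges])  # [('a', 'b')]
```

The resulting edge list can be handed to any graph layout tool; connected groups of edges correspond to clusters like the ones labeled in the figure.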

Deployable and Repeatable
• The model only requires the data to be transformed into a K x T matrix:
– K variables
– T time periods

We could use this model on many other data sets!
• minute-by-minute word counts on Twitter
• stock prices
• chatter on social media forums
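
As a sketch of that transformation, the helper below pivots long-format records into a K x T matrix. The record layout (variable, period, value), the function name `to_k_by_t`, the ticker-like names, and the choice to leave unobserved cells as None are assumptions for illustration; real data would need a deliberate rule for missing values.

```python
def to_k_by_t(records, fill=None):
    """Pivot (variable, period, value) records into a K x T matrix.

    Returns (variables, periods, matrix) where matrix[k][t] is the value of
    variables[k] in periods[t], or `fill` if that cell was never observed.
    """
    records = list(records)
    variables = sorted({variable for variable, _, _ in records})
    periods = sorted({period for _, period, _ in records})
    row = {variable: i for i, variable in enumerate(variables)}
    col = {period: j for j, period in enumerate(periods)}
    matrix = [[fill] * len(periods) for _ in variables]
    for variable, period, value in records:
        matrix[row[variable]][col[period]] = value
    return variables, periods, matrix

# Made-up price records for two hypothetical tickers.
records = [
    ("AAA", 1, 10.0), ("AAA", 2, 10.5),
    ("BBB", 1, 20.0), ("BBB", 3, 19.0),
]
variables, periods, matrix = to_k_by_t(records)
print(variables)  # ['AAA', 'BBB']
print(periods)    # [1, 2, 3]
print(matrix)     # [[10.0, 10.5, None], [20.0, None, 19.0]]
```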


Network graph of forums with correlations > .27

[Figure: network graph of forums; labeled cluster: Hezbollah.]

Future improvements
• OLS is simple and efficient, but other models may estimate effects more accurately in some cases.
• Explore different approaches to choosing which variables to consider and to aggregating variable effects.
• Run massively parallel on all 630k words simultaneously?
• Real-time analytics on streaming data.



Discontinuities Demo: Shrayes Ramesh, PhD, Data Tactics Corporation

