This document discusses using Bayesian inference to detect change points in noisy time series data. It provides an overview of Bayesian statistics and Bayes' theorem. Mud pulse telemetry data from oil drilling is used as an example case study, where change point detection can identify different rock types or when oil is reached. The document outlines modelling the problem statistically and developing a Python implementation of a single change point detector based on calculating the posterior probability of change point locations. Several other potential applications are also mentioned, including financial data and web traffic analysis.
6. Mud pulse telemetry
• Information encoded digitally, transmitted via pressure pulses through mud fluid.
• Alert drillers that they have reached oil, detect rock types and general monitoring.
7. The problem
• Poor bit rate and resolution
• Time-consuming analysis
8. Approaches to statistics
• Frequentist
– Data gathered is a repeatable random sample. “Frequency”
– Underlying parameters are constant
– Fisher’s 0.05
• Bayesian
– Data are fixed and observed from the realised sample
– Parameters unknown and described probabilistically
– Introduce “subjectivity”
10. The Theory: Bayesian inference
• Methodology of mathematical inference:
– Choosing between several possible models
– Extracting parameters for these models
• Bayes’ Theorem (Rev. Thomas Bayes, 1702-1761):
$$p(w \mid D) = \frac{p(D \mid w)\,p(w)}{p(D)}$$
where p(w | D) is the posterior probability, p(D | w) the likelihood, p(w) the prior probability and p(D) the evidence.
- Remove nuisance parameters by marginalisation
- Interesting ones remain
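As a small worked illustration of Bayes' theorem, the sketch below compares two candidate models for a single noisy reading. The Gaussian likelihood, the two candidate levels, the equal priors and all the numbers are assumptions chosen for illustration; none of them come from the slides.

```python
import math


def gaussian_likelihood(d, mean, sigma):
    """p(D | w): density of observing d if the true level is `mean`."""
    return math.exp(-0.5 * ((d - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior(d, candidate_means, priors, sigma):
    """Apply Bayes' theorem over a discrete set of candidate models."""
    # Numerator of Bayes' theorem for each candidate: likelihood * prior.
    joint = [gaussian_likelihood(d, w, sigma) * p
             for w, p in zip(candidate_means, priors)]
    evidence = sum(joint)  # p(D): the numerator summed over all candidates
    return [j / evidence for j in joint]


# Hypothetical numbers: one reading of 1.8, two candidate levels, equal priors.
post = posterior(d=1.8, candidate_means=[1.0, 2.0], priors=[0.5, 0.5], sigma=0.5)
print(post)
```

The evidence p(D) only normalises the result, which is why the posterior values sum to one and the reading closer to 2.0 favours that model.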
12. data = model + noise
[Figure: example noisy time series annotated with µ1, µ2, m and N; x-axis 0 to 200, y-axis 0.5 to 2.5]
• A sequence of N samples of data from a piecewise constant source with added Gaussian noise.
• Noise independent of mean, identically distributed and S.D. = σ
• Heterogeneous: divide into two homogeneous segments
$$d_i = \begin{cases} \mu_1 + e_i, & i \le m \\ \mu_2 + e_i, & m < i \le N \end{cases}$$
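The model on this slide (a piecewise constant source with one step at sample m, plus independent Gaussian noise of standard deviation σ) can be simulated in a few lines. This is a minimal sketch using only the standard library; the function name simulate_step and every default value (N = 200, m = 120, the two means, σ and the seed) are hypothetical choices, not figures from the talk.

```python
import random


def simulate_step(n=200, m=120, mu1=1.0, mu2=2.0, sigma=0.3, seed=0):
    """Return n samples d_1..d_n: mean mu1 up to and including sample m, mu2 after."""
    rng = random.Random(seed)  # local generator so the result is repeatable
    data = []
    for i in range(1, n + 1):  # i is the 1-based sample index
        mean = mu1 if i <= m else mu2
        data.append(mean + rng.gauss(0.0, sigma))  # e_i ~ N(0, sigma^2)
    return data


data = simulate_step()
first, second = data[:120], data[120:]
print(sum(first) / len(first), sum(second) / len(second))
```

The two printed segment averages should sit close to the chosen means, which is the structure a changepoint detector tries to recover without being told m.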
13. Single changepoint detector: How does it work?
• Substitute likelihood into Bayes’ Law
– Simple model: consider Ockham’s Razor
• Interested in changepoint location m, integrate w.r.t. the nuisance parameters (µ1, µ2 and σ)…rearrange this…
• …get a BIG expression for p({m}|dI), code in Python
• On running obtain most likely changepoint location
Ockham’s razor: http://www.jstor.org/discover/10.2307/29774559?sid=21105568247973&uid=3738032&uid=4&uid=2
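The steps on this slide can be sketched as follows. The slides do not state the priors, so this sketch assumes flat priors on µ1 and µ2 and a Jeffreys prior (1/σ) on σ. Under those assumptions, integrating out the three nuisance parameters leaves p(m | d) proportional to [m(N - m)]^(-1/2) · S(m)^(-(N - 2)/2), where S(m) is the residual sum of squares about the two segment means. The function names and the test data are hypothetical, and this is not necessarily the expression used in the linked repository.

```python
import math
import random


def log_posterior(data):
    """Unnormalised log p(m | d) for m = 1 .. N-1 (first segment is data[:m]).

    Assumes flat priors on mu1, mu2 and a Jeffreys prior on sigma;
    these priors are an assumption of this sketch.
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 samples")
    total = sum(data)
    total_sq = sum(x * x for x in data)
    log_post = []
    left = 0.0  # running sum of the first segment
    for m in range(1, n):
        left += data[m - 1]
        right = total - left
        # Residual sum of squares about the two segment means.
        rss = total_sq - left * left / m - right * right / (n - m)
        rss = max(rss, 1e-300)  # guard against log(0) on noise-free data
        log_post.append(-0.5 * math.log(m * (n - m)) - 0.5 * (n - 2) * math.log(rss))
    return log_post


def most_likely_changepoint(data):
    """Return the m (1-based) that maximises the posterior."""
    log_post = log_posterior(data)
    best = max(range(len(log_post)), key=log_post.__getitem__)
    return best + 1  # log_post[0] corresponds to m = 1


def normalise(log_post):
    """Turn log posterior values into probabilities that sum to one."""
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical test data: a step from 1.0 to 2.0 after sample 120, sigma = 0.3.
rng = random.Random(42)
data = [(1.0 if i <= 120 else 2.0) + rng.gauss(0.0, 0.3) for i in range(1, 201)]
m_hat = most_likely_changepoint(data)
probs = normalise(log_posterior(data))
print(m_hat, max(probs))
```

Working in logs avoids underflow, since the posterior is a large negative power of S(m); the normalised probabilities also show how sharply the posterior concentrates around the most likely location.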
20. “Google’s algorithm is the “secret sauce recipe” that has enabled it to dominate search.” - FT.com, 16th Sept 2014
http://www.ft.com/cms/s/0/9615661c-3ce1-11e4-9733-00144feabdc0.html?siteedition=uk#axzz3DSwXYAW8
Any business with an online presence today often struggles to accurately evaluate:
● The quality of their website and associated linking pages, as perceived by Google
● The robustness of their website to a sudden change in Google’s search algorithm
21. Web traffic
[Figure: raw daily Google search-sourced pageviews; axis range 30000 to 60000]
22. Web traffic (2)
[Figure: smoothed data using moving average; axis range 30000 to 60000]
23. Web traffic (3)
[Figure: smoothed data with cyclicality removed; axis range 30000 to 60000]
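The two preprocessing steps named on these slides, moving-average smoothing and removal of cyclicality, might look something like the sketch below. The slides do not say which window length or de-cycling method was used, so the 7-day window, the weekly period and the approach of subtracting each weekday's average offset are all assumptions, and the pageview numbers are made up.

```python
def moving_average(data, window=7):
    """Trailing moving average; returns len(data) - window + 1 values."""
    if not 1 <= window <= len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [sum(data[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(data))]


def remove_cycle(data, period=7):
    """Subtract each phase's average offset from the overall mean."""
    if len(data) < period:
        raise ValueError("need at least one full period of data")
    overall = sum(data) / len(data)
    offsets = []
    for phase in range(period):
        same_phase = data[phase::period]  # e.g. every Monday
        offsets.append(sum(same_phase) / len(same_phase) - overall)
    return [x - offsets[i % period] for i, x in enumerate(data)]


# Hypothetical data: a flat 45000 pageviews/day plus a repeating weekly pattern.
weekly = [0, 500, 800, 700, 400, -1000, -1400]
raw = [45000 + weekly[i % 7] for i in range(28)]
flat = remove_cycle(raw)
smooth = moving_average(raw, window=7)
print(flat[:7], smooth[:3])
```

On this synthetic series both steps recover the flat underlying level, which leaves any genuine step change easier for a changepoint detector to pick out.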
24. Web traffic (4)
[Figure: likelihood of change in data plotted over time. Axis ranges: likelihood -838 to -833, pageviews 30000 to 60000. Legend: day removed, likelihood, CP]
26. Number of tropical storms per year in the North Atlantic
Data obtained from IBTrACS database: https://www.ncdc.noaa.gov/ibtracs/
27. "Amo timeseries 1856-present" by Rosentod, Marsupilami - http://www.cdc.noaa.gov/Correlation/amon.us.long.data. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Amo_timeseries_1856-present.svg#mediaviewer/File:Amo_timeseries_1856-present.svg
29. Other applications / possibilities
• Financial markets and political events
• Combine with frequentist statistical methods:
– Use of GLR in online (moving window) detection application
• Your own data / ideas!
30. Thank you
• Link to Python code on github: https://github.com/swhustla/pydata-bayes-changepoint
– Single changepoint detector (as seen tonight)
– Dual changepoint detector
– Ramp detector
• Further reading:
– Numerical Bayesian Methods Applied to Signal Processing (Statistics and Computing) by Fitzgerald, O’Ruanaidh, 1996: http://www.amazon.co.uk/Numerical-Bayesian-Processing-Statistics-Computing/dp/0387946292
– Bayesian Inference on Change Point Problems (2007) http://www.cs.ubc.ca/~murphyk/Students/Xuan_MSc07.pdf
Twitter: @norhustla
Email: frank.kelly@cantab.net
31. Thank you
• Additional links:
– Google Algo updates: http://moz.com/google-algorithm-change
– Mathsight -> insights into algorithm changes http://mathsight.org
– Atlantic multi-decadal oscillation spatial pattern: http://commons.wikimedia.org/wiki/File:AMO_Pattern.png
– National Climatic Data Center https://www.ncdc.noaa.gov/ibtracs/
– Ockham’s Razor and Bayesian Inference: http://www.jstor.org/discover/10.2307/29774559?sid=21105568247973&uid=3738032&uid=4&uid=2
– Converting from Matlab to Python: http://mathesaurus.sourceforge.net/matlab-numpy.html