This article proposes a new distance measure called Spatial Assembling Distance (SpADe) that can handle noise, shifting, and scaling in both temporal and amplitude dimensions of time series. SpADe is applied to the problem of streaming pattern detection, which continuously monitors streaming time series to detect matches with query patterns. Experimental results show that SpADe is an effective distance measure for time series and achieves high accuracy and efficiency for continuous pattern detection in streaming data.
2. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. Y, JANUARY 200Z 2
are Longest Common Subsequence (LCSS) [4] and Edit Distance on Real sequence (EDR) [3]. Compared to the second category, the ε-matching warping distances are robust in the presence of noise and partially handle some amplitude shifting and scaling variances. However, they are still sensitive to certain degrees of amplitude warps because ε-matching is directly based on amplitude values. Figure 2 shows two examples where amplitude shifting and scaling variances may affect the effectiveness of existing warping distances.

Fig. 2. Impact of amplitude shifting and scaling, panels (a) and (b). d(A, C) may be less than d(A, B) for warping distances.

The local shapes of time series also affect the effectiveness of distances. Figure 3 shows an example where the DTW distance of two local shapes is quite small even though they are quite distinct in shape. Existing warping distances lose much information when matching local shapes.

Fig. 3. Impact of local shapes on warping distances.

Global amplitude shifting and scaling can be handled by normalization [3], [8]. Given a time series s, each data item s[i] can be normalized as $s'[i] = (s[i] - \mu)/\sigma$, where μ and σ are the average and standard deviation of the data items in s. Many available time series data sets have been normalized [9]. However, local amplitude shifting and scaling (an example is shown in Figure 1) cannot be handled by simple normalization of global time series. To fully handle noise, local shapes, shifting and scaling in temporal and amplitude dimensions of shape-based time series, we propose a novel distance measure, called Spatial Assembling Distance (SpADe).

We investigate the use of SpADe in the context of detection of streaming patterns. Pattern detection on streaming time series is to continuously monitor matching subsequences of streaming time series against some given query patterns. A pattern in time series is a set of sequential data items collected at discrete time points, describing a meaningful tendency of evolving data items during a period of time, and therefore implying important phenomena of the monitored objects. A subsequence of streaming time series is said to be matched to a query pattern if the distance between the subsequence and the query pattern is no more than a given threshold δ.

All the mentioned distance measures of time series are designed for full sequence matching, in which distance is measured based on the full length of sequences. However, on the problem of streaming pattern detection, we have no a priori knowledge of the positions and lengths of the possible matching subsequences. When using these distances, we need to first divide the potential subsequences from the streaming time series, and then compare them to query patterns based on full matching. An obvious solution is to compare the most recent subsequences of streaming time series to the query patterns whenever a new data item arrives. However, such an approach is computationally intensive, and incurs redundant computational overhead. Segmentation is a simple way to handle subsequence matching, in which potential matching subsequences are extracted from streaming time series and compared to query patterns. However, potential segments may be hard to extract as many time series patterns have no clear boundaries.

As a subsequence matching problem, pattern detection on streaming time series is naturally expensive. Warping distances have so far not been extended for online pattern detection in streaming time series while taking both shifting and scaling into account. SpADe is applied to efficiently perform continuous detection of patterns on streaming time sequences without the need to perform sequence segmentation. Our contributions are as follows:

• We propose a robust distance measure of shape-based time series, SpADe, which can be applied to both full sequence and subsequence matching. It is not sensitive to shifting and scaling in either the temporal or the amplitude dimensions of time series.
• We propose a continuous SpADe computation approach which can naturally be used on streaming pattern detection. We improve the efficiency of pattern detection by using a pruning approach.
• We extend the SpADe distance for streaming pattern detection of multivariate time series.
• We conduct an experimental study. The results show that SpADe is an effective distance measure of time series, and that it is both efficient and effective for subsequence matching on streaming time series.

The rest of the paper is organized as follows. Section 2 gives an overview of distance measures of time series and existing solutions on subsequence matching. Section 3 defines the basic SpADe, and Section 4 proposes effective techniques on computing the SpADe distance. Section 5 introduces the approach of continuous pattern detection by SpADe. Section 6 extends the SpADe distance for streaming pattern detection of multivariate time series. Section 7 shows the experimental study of SpADe. Section 8 summarizes our conclusions.

This paper improves on our previous work [10] by giving a thorough analysis of warping-based subsequence
matching in Section 2.3, a detailed discussion on effective computation of the SpADe distance in Section 4, an extension of SpADe for streaming pattern detection of multivariate time series in Section 6, and an extensive experimental study on the impacts of parameters and the application of streaming human motion pattern detection in Section 7.

2 PRELIMINARIES

2.1 Distance measures of time series

The distance between two time series is essentially computed from the aggregation of pair-wise differences of data items within them. Traditionally, the Euclidean distance is used to measure the distances between time series of the same length. Many dimensionality reduction techniques, such as Discrete Fourier Transform [11], Singular Value Decomposition [12], Discrete Wavelet Transform [13], Adaptive Piecewise Constant Approximation [14] and Chebyshev Polynomials [15], have been applied to feature vector extraction from time series, after which Euclidean distance can then be applied in measuring distances of the extracted feature vectors. However, it has been observed that the Euclidean metric is very sensitive to distortion and noise [3], [6].

Warping distances such as DTW [1] and EDR [3] have been proposed to measure distances of time series with arbitrary lengths. The optimal alignments of data items between two time sequences are obtained by repeating some data items so that the lengths of two sequences can be the same. As a result, local time shifting and scaling [7] are handled under those warping distances. The distance is calculated by finding the best warping path in the distance matrix using dynamic programming, which has a complexity of O(mn) (m and n are the lengths of time series). Lower bounds of warping distances [6], [16] have been proposed to prune some real computations of warping distances. However, existing warping distances are still sensitive to the shifting and scaling in the amplitude dimension of time series.

Effective matching of time series under shifting and scaling variances has been attempted by many studies [5], [17], [18], [19], [20]. However, the techniques proposed in these studies either support only uniform shifting and scaling or cannot fully address the shifting and scaling variances in both temporal and amplitude dimensions of time series. Moreover, time series are matched based on data items in these studies, where meaningful local shapes (as the example in Figure 3) may not be effectively captured and matched.

2.2 Pattern detection on streaming time series

For subsequence matching, ST-index [21], Dual Match [22] and General Match [23] extract local patterns from sequences by fixed size sliding windows. They map each window of data items into a multidimensional point and use indexing techniques to efficiently match the subsequences in feature space. The limitation of these techniques is the use of Euclidean distance on measuring distances in feature space. Park et al. [24] proposed an approach for subsequence matching by applying DTW. The suffix tree is used to index possible subsequences of the data sequences. However, all these studies on subsequence matching try to search the matches of short query patterns to long sequences in a database, where an index can be built on long data sequences.

Pattern detection on streaming time series is to detect matching subsequences within long streaming sequences to any given query pattern. Wu et al. [25] proposed an online segmentation and pruning algorithm to simplify the data sequence as zigzag shapes. However, the piecewise linear representation limits its application in shape-based pattern matching on time series. Euclidean distance or its variations (e.g., correlations) were used in matching patterns in some recent works on streaming time series such as BRAID [26] and SPIRIT [27]. Gao et al. [28] also studied continuous pattern queries on streaming time series. They attempted to detect the nearest neighbor pattern when a new data value arrives. As mentioned earlier, the use of simple Euclidean distance or correlation in these studies affects the effectiveness of pattern matching where shifting and scaling exist. Streaming pattern detection on DTW distance has been recently studied in [29]. The matching subsequences are continuously monitored by computing DTW distances in a continuous fashion. This technique can also be applied to the other warping distances such as EDR. However, as stated earlier, these warping distances do not handle shifting and scaling in amplitude.

2.3 Warping-based subsequence matching

Given two time series $s_1$ and $s_2$ of lengths m and n, a warping distance uses a matrix of $(m + 1) \times (n + 1)$ for computing the full sequence distance by a recursive function:

$$M[i, j] = f_{(x,y) \in \varphi(i,j)} \bigl( M[x, y] + \mathrm{subcost}((x, y), (i, j)) \bigr).$$

$M[i, j]$ records an intermediate result of an optimal substructure, which describes the optimal matching of two prefixes $s_1[1 : i]$ and $s_2[1 : j]$. The main function f is either the min or the max function, depending on whether it is to measure distances or similarities. Notation $\varphi(i, j)$ denotes the set of entries in the matrix from which $M[i, j]$ can be dynamically computed. For each element $(x, y) \in \varphi(i, j)$, it is satisfied that $x \le i$ and $y \le j$ so that $M[i, j]$ can be dynamically computed from those entries which have been already computed. Typically, $\varphi(i, j) = \{(i - 1, j), (i, j - 1), (i - 1, j - 1)\}$. The function $\mathrm{subcost}((x, y), (i, j))$ is the additional cost for computing $M[i, j]$ from $M[x, y]$. It is typically a non-negative function. The actual distance of time series is aggregated over a number of subcosts through dynamic programming. The initial condition for computing the warping distances is $M[0, 0] = 0$, from which the distance is aggregated. The entries in M can be computed row-by-row or column-by-column. The last entry to be computed, $M[m, n]$, finally determines the warping distance of two time series. For each entry
$(i, j) \in M$, there must be a warping path from which $M[i, j]$ is aggregated. In full sequence matching, we should guarantee that the warping path of each entry $(i, j)$ ($i \times j > 0$) is initialized from the entry (0, 0).

The warping distance can also be applied for subsequence matching, to handle the temporal variances between the querying patterns and matching subsequences. Given a querying pattern q and a long time series s of lengths m and n respectively, a wide distance matrix M of $(m + 1) \times (n + 1)$ can be created (shown in Figure 4). Instead of evaluating the distance of two full time series based on the warping path between two fixed corner entries, we propose to evaluate the distances between q and the subsequences of s based on the warping paths from the bottom edge to the top edge of M.

Fig. 4. Subsequence matching using warping distances

The boundary entries are initialized as $M[0, j] = 0$, $M[i, 0] = +\infty$ ($i > 0$). All the other entries in M are computed column-by-column following the same recursive function as full sequence matching. Within each column, they are computed in a bottom-up manner. In each column, the top entry $M[m, e]$ is used to evaluate whether there is a matching subsequence ended at the position e of the long time series s. For each entry $(m, e)$ of the top edge of M, a warping path can be traced out. Given such a warping path (which starts at $(0, b)$ and ends at $(m, e)$), the warping distance (or its square) between q and subsequence $s[b : e]$ can then be measured as $M[m, e]$. The subsequence $s[b : e]$ will be a matching subsequence to q if $M[m, e] \le \delta$.

For streaming time series scenarios, the length of s is not fixed. Data items of s evolve dynamically. We may maintain a sliding window of width w (which is comparable to m) as the width of matrix M. When a new data item is appended to s, a new column of M will be recomputed by refreshing all entries in that column in a bottom-up manner. Such a technique can also be applied in subsequence matching when n is too large. In this case, instead of using a matrix of $(m + 1) \times (n + 1)$, a small matrix of $(m + 1) \times w$ is enough ($w \ll n$).

3 SPATIAL ASSEMBLING DISTANCE

3.1 Local pattern match

In full sequence matching, the distance between two time sequences $s_1[1 : m]$ and $s_2[1 : n]$ is measured based on the full length of two sequences. We borrow the idea from General Match [23], and extract a set of small local patterns from time series by using a fixed size of sliding window. A local pattern l of length w from a time sequence $s_1$ can be described as $l = (\theta_t, \theta_a, \theta_s)$, which are the position (mid point) of l in $s_1$, the mean amplitude of data items in l and the shape signature of l respectively. The distance of two local patterns, l in $s_1$ and $l'$ in $s_2$, can be measured as $D_1(l, l') = f(|\theta_a - \theta'_a|, |\theta_s - \theta'_s|)$, which is a weighted sum of the differences in amplitude and shape features of two local patterns. The weights in f are application-specific, depending on the tolerance of the amplitude difference and that of the shape difference.

A local pattern match (LPM) p is formed from l and $l'$ if $D_1(l, l') < \varepsilon$, which means that there is a match between l and $l'$. We label the positions of l in $s_1$ and $l'$ in $s_2$ as $x_p$ and $y_p$ respectively. A matching matrix of $m \times n$ is shown in Figure 5 to describe the match of local patterns in $s_1$ and $s_2$. The relative positions of l and $l'$ are obtained by projecting p horizontally and vertically. An LPM p can be described by the coordinates of two local patterns: $p = (x_p, y_p, \psi_p) = (\theta_t, \theta'_t, \theta_t - \theta'_t)$, where $\psi_p$ represents the temporal shifting of two local patterns.

Fig. 5. An example of an LPM and its corresponding local patterns in matching matrix.

Note that there are a number of local patterns extracted from two time sequences $s_1$ and $s_2$. A large number of LPMs will be formed if $s_1$ and $s_2$ are similar in shapes. Their distribution can be visualized in the matching matrix formed from the two sequences.

3.2 Distance between two LPMs

We measure the SpADe distance of two time series by finding the best combination of LPMs in the matching matrix, such that they can maximize the matches of $s_1$ and $s_2$. The quality of an LPM combination is determined by the following two criteria: 1) the projections (vertical and horizontal) of LPMs should cover large regions of $s_1$ and $s_2$. The larger the covered regions, the more data items in $s_1$ and $s_2$ are matched; 2) the temporal shifting of two LPMs should be as small as possible, which means that two LPMs can be obtained by a similar transformation from local patterns in $s_1$ to local patterns in $s_2$. We define the gaps between two LPMs $p_1$ and $p_2$ on $s_1$ and $s_2$ as $D_x(p_2, p_1)$ and $D_y(p_2, p_1)$ respectively:

$$D_x(p_2, p_1) = \begin{cases} \max(x_{p_2} - x_{p_1} - w,\ 0) & \text{if } x_{p_2} > x_{p_1}; \\ +\infty & \text{otherwise.} \end{cases}$$

$$D_y(p_2, p_1) = \begin{cases} \max(y_{p_2} - y_{p_1} - w,\ 0) & \text{if } y_{p_2} > y_{p_1}; \\ +\infty & \text{otherwise.} \end{cases}$$
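The local pattern match of Section 3.1 and the gaps $D_x$ and $D_y$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `LocalPattern`, `LPM`, `d1`, `form_lpm`, `gap`, `gap_x` and `gap_y` are hypothetical, the shape signature is assumed to be a short numeric vector compared by Euclidean distance, and the equal weights in `d1` are placeholders for the application-specific weights of f.

```python
import math
from typing import NamedTuple, Optional, Sequence


class LocalPattern(NamedTuple):
    t: float             # theta_t: mid-point position in its sequence
    a: float             # theta_a: mean amplitude
    s: Sequence[float]   # theta_s: shape signature (assumed to be a small vector)


class LPM(NamedTuple):
    x: float    # position of l in s1
    y: float    # position of l' in s2
    psi: float  # temporal shifting, x - y


def d1(l: LocalPattern, lp: LocalPattern,
       w_a: float = 0.5, w_s: float = 0.5) -> float:
    """Weighted sum of amplitude and shape differences (weights are assumed)."""
    return w_a * abs(l.a - lp.a) + w_s * math.dist(l.s, lp.s)


def form_lpm(l: LocalPattern, lp: LocalPattern, eps: float) -> Optional[LPM]:
    """Return an LPM if D1(l, l') < eps, otherwise None."""
    if d1(l, lp) < eps:
        return LPM(l.t, lp.t, l.t - lp.t)
    return None


def gap(later: float, earlier: float, w: float) -> float:
    """max(later - earlier - w, 0) if later > earlier, else +infinity."""
    return max(later - earlier - w, 0.0) if later > earlier else math.inf


def gap_x(p2: LPM, p1: LPM, w: float) -> float:
    return gap(p2.x, p1.x, w)   # D_x(p2, p1)


def gap_y(p2: LPM, p1: LPM, w: float) -> float:
    return gap(p2.y, p1.y, w)   # D_y(p2, p1)
```

Because the gap is infinite whenever the second LPM does not lie strictly after the first in a dimension, any chain of LPMs with finite gaps is ordered in both sequences, and windows that overlap or touch contribute a gap of zero.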
The gaps are used to handle the noise and local unmatched regions within time series.

Definition 1: The distance of two LPMs $p_1$ and $p_2$ is defined as $D_2(p_2, p_1) = g(D_x(p_2, p_1)) + g(D_y(p_2, p_1)) + g(|\psi_{p_2} - \psi_{p_1}|)$.

Function g(x) is a penalty on the gaps between two LPMs, which can be defined by users, but should satisfy the following properties: 1) $g(0) = 0$; 2) $g(x + y) \ge g(x) + g(y)$, ($x, y \ge 0$). In our study, we simply use $g(x) = x$, which satisfies the requirements on g(x). We also define the distance ($D_2$) between an LPM p and a point at the top or bottom of the matching matrix, by assuming that the point is the mid point of a virtual LPM.

3.3 SpADe in full sequence matching

Definition 2: Given a path $r = P_s \to p_1 \to \dots \to p_t \to P_e$ formed by $P_s(0, 0)$, $P_e(m, n)$, and a number of LPMs $p_1, \dots, p_t$, the length of r is defined as

$$\mathrm{Cost}(r) = D_2(p_1, P_s) + \sum_{i=1}^{t-1} D_2(p_{i+1}, p_i) + D_2(P_e, p_t).$$

Given two sequences $s_1[1 : m]$ and $s_2[1 : n]$, a matching matrix can be built based on all the LPMs between $s_1$ and $s_2$. Given two corner points $P_s(0, 0)$ and $P_e(m, n)$ in the matching matrix, $\{r_i\}$ includes all the paths derived from the LPMs and linking $P_s$ and $P_e$.

Definition 3: The SpADe distance of $s_1$ to $s_2$ under full sequence matching is defined as $D(s_1, s_2) = \min_t \mathrm{Cost}(r_t)$, $r_t \in \{r_i\}$.

In other words, the SpADe distance of two given time sequences is the length of the shortest path from the bottom-left corner to the top-right corner in the matching matrix of these two sequences. We find the best combination of LPMs using the shortest path connecting two end points. That is why we call the distance spatial assembling distance. Finding shortest paths has been well studied and the classic Dijkstra's algorithm [30] can be applied.

4 EFFECTIVE SPADE COMPUTATION

4.1 Handling scaling variations

The scaling variations of two time series are not handled in the original definition of SpADe given in the previous section. To handle the scaling variations, one time series needs to be scaled into a number of time series in both temporal and amplitude dimensions. Then, for each local pattern in the original time series, a number of scaled local patterns can be extracted from the scaled time series. Figure 6 shows how a number of scaled local patterns are extracted based on an original local pattern l. First, a number of local patterns ($l_1$ and $l_2$ in the example) with the same mid points and different lengths are extracted from the original time series as a means of temporal scaling. Second, for each temporally scaled local pattern ($l_1$ as an example), a number of amplitude scaled local patterns ($l_3$ and $l_4$) of the same length are extracted from the same positions of the amplitude scaled time series. The set of all scaled (both in time and amplitude) local patterns varied from l (including l) is noted as V(l). If the local pattern l is cast into $S_t$ temporal scales and $S_a$ amplitude scales, then $|V(l)| = S_t \times S_a$.

Fig. 6. Scaling the local patterns (temporal scaling and amplitude scaling).

Given two time series $s_1$ and $s_2$, we actually measure the distance between them by only scaling one time series $s_1$. An LPM p is formed by a local pattern l in $s_1$ and a local pattern $l'$ in $s_2$, if $\exists\, l'' \in V(l),\ D_1(l'', l') < \varepsilon$.

According to the definition of SpADe, to compute the distance of $s_1$ and $s_2$, we need to extract O(n) local patterns from $s_2$ and conduct O(n) ε-range queries over those $O(m S_t S_a)$ scaled local patterns extracted from $s_1$. As a result, the total computational cost of SpADe will be much higher than that of traditional distance measures of time series such as DTW and EDR. Therefore, we propose some approximate techniques to speed up the distance computation of SpADe.

4.2 Efficient detection of LPMs

Short local patterns are preferred to describe the fine grained local shapes of time series. This is because long local patterns generate more false positive LPMs, as a large ε is needed for long patterns to reduce the false dismissal ratio of LPMs. The Haar wavelet [31] is a good candidate for extracting the $\theta_a$ and $\theta_s$ features from local patterns, as low band wavelet coefficients elegantly describe the mean amplitude and the general shape of local patterns. Moreover, the Haar wavelet is computationally efficient. In our solution, we propose to use the first 4 low band wavelet coefficients as the $\theta_a$ (the first low band wavelet coefficient) and $\theta_s$ (the second to the fourth low band wavelet coefficients) features of local patterns.

In many applications of time series, distances of a querying time series to a number of database time series are typically computed online. To improve the efficiency of matching local patterns, those existing instances can be preprocessed, and scaled local patterns can be extracted from them. A multi-dimensional index such as R-tree [32] can be used to index those local patterns so that ε-range queries can be efficiently processed.

To handle the variances of shifting and scaling, given a local pattern $l'$ extracted from a query time series q, a large number of existing local patterns extracted from all data sequences will match $l'$. Therefore, many branches in the R-tree are involved during the query, which incurs much computational overhead. Inspired by VA-File [33], we partition the feature space into cells, and approximate the distance between local patterns according to the cells they fall in. As the number of dimensions is small and adequate variation should be allowed, the total number of filled cells is expected to be much less than the number
of local patterns. Consequently, each cell records a list of original local patterns whose scaled local patterns fall in the cell. Therefore, only one local pattern l is maintained even though more than one local pattern in V(l) falls in a cell c. Given a query local pattern $l'$ (located in cell c), all local patterns within c and the direct neighbor cells around c are treated as the matching local patterns of $l'$. Therefore, efficiency of detecting LPMs is achieved by checking the matching local patterns within cells.

The space of wavelet coefficients of local patterns is partitioned into cells. Effective widths of cells are learned from the distribution of wavelet coefficients extracted from the training data set. For each wavelet coefficient $f_i$, we normalize it as $\bar{f}_i = \frac{f_i - \mu_i}{\sigma_i}$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $f_i$ respectively. The widths of cells in the normalized wavelet coefficient space are set as $\frac{1}{4}$ for each dimension. To limit the number of cells, all $\bar{f}_i > 2$ or all $\bar{f}_i < -2$ are treated as outlier partitions. Therefore, each dimension is segmented into 18 partitions (16 regular partitions covering $[-2, 2]$ plus 2 outlier partitions), and there are in total $18^4$ cells in the feature space of local patterns.

4.3 Fast SpADe using disjoint sliding windows

Local patterns can be extracted from time series with different granularity of sliding steps. The finest granularity is applied in the original definition of SpADe, i.e., local patterns are extracted at every position of both $s_1$ and $s_2$. As a result, the number of detected LPMs will be very large, incurring high computational cost of SpADe. Inspired by the idea applied in [21], we propose to speed up the SpADe computation by using wider sliding steps so that the number of derived LPMs can be remarkably reduced. In our solution, disjoint sliding windows on the query time series $s_2$, and a sliding step of $\frac{w}{c}$ (c is introduced for determining the width of the sliding step) on the other time series $s_1$ are used to extract local patterns from the two time series. The SpADe distance can then be computed from those LPMs. The longer the LPMs, the larger the sliding steps within $s_1$ and $s_2$, and the more efficiency can be achieved on SpADe computation.

Fig. 7. SpADe computation by disjoint sliding windows.

4.4 Parameter learning

There are some parameters, w, $S_t$, $S_a$ and c, which affect the accuracy of the SpADe distance. Effective values of these parameters can be learned from the training data set by maximizing the accuracy of cross validation on a one nearest neighbor classification approach. To facilitate the wavelet transformation, we choose the pattern length w = 4k, where k is an integer. The range of w is considered based on the length of time series. It cannot be too small as short local patterns may not be long enough to represent a meaningful local shape. Moreover, a small w incurs a large number of local patterns, and therefore reduces the efficiency of SpADe. On the other hand, w also cannot be too large as 4 wavelet coefficients will not be enough to approximate the complex local shapes extracted from long local patterns. In practice (tested on many time series data sets), w can be chosen from $\frac{n}{64}$ to $\frac{n}{2}$, where n is the average length of time series. We generate a number of scales in time and amplitude by specifying $S_t$ and $S_a$. The granularity of scales is set as 0.1. For example, if $S_t$ is 7, then we generate temporal scales of 0.7, 0.8, ..., 1.3. Parameter c is chosen from 8 to 16. It cannot be too small as a small c generates large sliding steps which will lose some LPMs. On the contrary, c does not need to be larger than 16 because a sliding step of $\frac{w}{16}$ is already fine enough. The four parameters w, $S_t$, $S_a$ and c are adjusted within their value ranges. The combination achieving the best accuracy in cross validation on the training data set is learned as the parameters in SpADe.

5 SPADE ON SUBSEQUENCE MATCHING

SpADe is useful not only for full sequence matching, but for subsequence matching as well. It is a good candidate to continuously monitor subsequences. In this section, we show how the SpADe distance can be continuously computed in subsequence matching. First we give some notions used in subsequence matching. A number of time series queries q, describing the phenomena of interest to users, are preprocessed and stored in the query engine. The streaming time series s continuously feeds data items to the query engine. The query engine continuously reports the matching subsequences whose distances to any query pattern q are no more than some given query threshold δ.

5.1 Variance of SpADe in subsequence matching

Given a query pattern $q[1 : m]$ and some recent data items $s[t_s : t_e]$ in the streaming time series, the local SpADe distance of s at time point t ($t_s \le t < t_e$) is defined as:

Definition 4: $D(q, s, t) = \min_{i < t_e} D(q, s[t + 1 : i])$.

$D(q, s, t)$ measures the distance of the best matching subsequence (to q) starting at time point t + 1 of s. As shown in Figure 8, $D(q, s, t)$ can be explained as the shortest path from point $P_s(0, t)$ to points $P_e(m, t')$ ($t < t' < t_e$). Let $t_r = \arg\min_{t'} D(q, s[t + 1 : t'])$. $D(q, s, t)$ is actually the full sequence matching SpADe distance of q to $s[t + 1 : t_r]$. The global time scaling of a matching subsequence $s[t + 1 : t_r]$ to q can be measured as $u = \frac{t_r - t}{m}$. If u = 1, the matching subsequence has the same length as q, and it is called an equal-length match; if u > 1, the matching subsequence will be longer than q, and it is
called an expanding match; otherwise (u < 1), it is called a shrinking match.

Fig. 8. An example of local SpADe distance.

Pattern detection tries to find subsequences of s whose SpADe distance to query q is less than some threshold δ. This can be achieved by continuously computing local SpADe distances, i.e., finding matching subsequences satisfying $D(q, s, t) \le \delta$ at every point of s. However, this is not efficient because each computation of the SpADe distance requires finding the shortest path of LPMs within some window size, which consumes much computation. To improve the efficiency of continuous SpADe computation, we propose an incremental way of computing the SpADe distance. For pattern detection, the probability of having a matching subsequence grows as the number of LPMs increases. Much computation will be saved if the SpADe distance is updated only when new LPMs are detected.

Definition 5: The cumulating SpADe distance of a detected LPM p to query q, noted as $D_c(p)$, is the shortest path starting from points at the bottom edge of the matching matrix to p.

Definition 6: The potential SpADe distance of an LPM p to query q is defined as $D_p(p) = D_c(p) + g(m - x_p - \frac{w}{2})$.

$D_c(p)$ is a lower bound on the length of paths passing through p and linking the bottom and top edges of the matching matrix. Once $D_c(p) > \delta$, p will not emerge in the path of any qualified matching subsequence for q. On the other hand, if $D_c(p) \le \delta$, p is a promising LPM. Meanwhile, $D_p(p)$ is an upper bound of the local SpADe distance. Therefore, once $D_p(p) \le \delta$, a qualified matching subsequence to the query q is found.

5.2 Incremental computation of SpADe

On pattern detection in streaming time series, we actually detect LPMs by cutting the most recent local pattern from the streaming data sequence, extracting features from the chopped local pattern, and retrieving LPMs of the local pattern. On detecting an LPM p, it would be ideal if $D_c(p)$ and $D_p(p)$ could be computed on the fly. The following lemma supports this incremental way of SpADe computation.

Lemma 1: The LPMs detected behind an LPM p on streaming time series will not change $D_c(p)$.

Proof: Suppose $p_1$ is detected behind p. Therefore, $y_{p_1} \ge y_p$. If $p_1$ changes $D_c(p)$, it should be in the shortest path of $D_c(p)$. Let $p_1 \to \dots \to p_t \to p$ be a path from $p_1$ [...] consecutive LPMs $p_{t_1}$ and $p_{t_2}$ in the path, such that $p_{t_1}$ is detected behind $p_{t_2}$, i.e., $y_{p_{t_1}} \ge y_{p_{t_2}}$, and $D_x(p_{t_2}, p_{t_1}) = +\infty$. According to Definition 1, $D_2(p_{t_2}, p_{t_1}) = +\infty$. Therefore, $D_c(p) = +\infty$, which is impossible because we can at least find a path from $P_s(0, y_p - \frac{w}{2})$ to p whose cost is only $g(x_p - \frac{w}{2})$. Consequently, $p_1$ cannot change the value of $D_c(p)$.

Lemma 1 guarantees that $D_c(p)$ can be immediately computed when p is detected from the streaming time series. The computation of $D_c(p)$ is to find the previous LPM of p, noted as $p'$, from which the shortest path from the bottom edge of the matching matrix to p is found, i.e., $p' = \arg\min_{p_1} (D_c(p_1) + D_2(p, p_1))$. According to Definition 1, $p'$ should be in the bottom-left corner of p. Figure 9 shows the searching region ABOC of $p'$. This is because for those LPMs whose reference point is beyond ABOC, one of the gaps of p to them will be $+\infty$.

Fig. 9. Searching region of previous LPM.

However, it is not necessary to search $p'$ in the large region of ABOC, as large gaps are usually not allowed in practice. Therefore, the searching region of $p'$ can be reduced by constraining the gaps between two consecutive LPMs. Figure 9 shows the constrained searching region A′B′OC′ with a gap bound of ξ. The efficiency of computing $D_c(p)$ will be improved significantly when a small ξ is applied. The cumulating SpADe distance and potential SpADe distance with the constrained region are denoted as $D_{c,\xi}(p)$ and $D_{p,\xi}(p)$ respectively. On detecting $p'$, we get $D_{c,\xi}(p) = D_{c,\xi}(p') + D_2(p, p')$. For a range query, if $D_{c,\xi}(p) > \delta$, we simply drop p as it will not appear as an LPM in a qualified matching subsequence.

To find $p'$ of p, we need to maintain those LPMs in the searching region of $p'$, and test all the LPMs within this region column-by-column. To reduce the number of detected LPMs, we use disjoint sliding windows on the streaming time series. Meanwhile, for each query pattern q, a sliding step of $\frac{w}{c}$ is applied. As shown in Figure 9, the number of LPMs in A′B′O′O″ is bounded as $\frac{c\xi^2}{w^2}$ due to the strategy of sliding steps.

The above model guarantees that $D_{c,\xi}(p)$ can be computed column-by-column because the previous LPM of p must be in the previous $\frac{\xi}{w}$ columns of the column where p is located. Therefore, for each query pattern q, the number of LPMs that need to be dynamically maintained is bounded as $O(\frac{cm\xi}{w^2})$. If there are N query patterns with largest length of $\bar{m}$, the memory cost of continuous SpADe computation will be bounded as the maximal number
¯
to p in shortest path. Then we must be able to find two of LPMs need to maintained, O( cN mξ ). If t is the av-
w2
8. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. Y, JANUARY 200Z 8
ε
erage number of LP M s detected from one chopped
.
. . .
. . .
. .
. .
. .
.
local pattern of streaming time series, the complexity of . . . . . . .
nξt2 p4 p5
whole pattern detection will be O( Nw2 ). In the regions p3
BZ
where no matching subsequences appear, the number of p2 bX
q
...
bW bY
LPMs will be very small, close to zero. Therefore, the ... b1 bA ...
computation of Dc,ξ (p) will be very efficient. b0
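The incremental update described in this section can be sketched in a few lines of Python. The sketch is illustrative only: the linear gap penalty `g`, the three-term `pair_cost` standing in for $D_2$ (modeled on the gap terms used in the proof of Lemma 2), the bottom-edge start cost $g(x_p - w/2)$, the exact shape of the gap-bounded region, and the names `LPM` and `on_new_lpm` are all assumptions made for the example, not definitions taken from this article.

```python
import math
from dataclasses import dataclass


@dataclass
class LPM:
    x: float              # reference position in the query (assumed: centre of the local pattern)
    y: float              # reference position (column) in the stream
    dc: float = math.inf  # cumulating distance D_{c,xi}, filled in on arrival


def g(gap):
    """Assumed gap penalty: linear for valid gaps, +inf for negative gaps."""
    return math.inf if gap < 0 else float(gap)


def pair_cost(p, prev, w):
    """Assumed stand-in for D2(p, prev): query gap, stream gap, and their difference."""
    dx, dy = p.x - prev.x, p.y - prev.y
    return g(dx - w) + g(dy - w) + g(abs(dx - dy))


def on_new_lpm(p, kept, m, w, xi, delta):
    """Set p.dc from the retained LPMs; return (promising?, potential distance)."""
    # Path that starts at the bottom edge of the matching matrix and goes straight to p.
    best = g(p.x - w / 2)
    for prev in kept:
        # Only predecessors inside the gap-bounded region are examined.
        if 0 < p.x - prev.x <= xi and 0 < p.y - prev.y <= xi:
            best = min(best, prev.dc + pair_cost(p, prev, w))
    p.dc = best
    if p.dc > delta:
        return False, math.inf  # p cannot lie on a qualified path, so it is dropped
    kept.append(p)
    # Forget LPMs that are too far back for any later LPM to reach within the gap bound.
    kept[:] = [k for k in kept if p.y - k.y <= xi]
    return True, p.dc + g(m - p.x - w / 2)
```

Under these assumptions, feeding four LPMs that lie on the diagonal of an 8-point query (with w = 2, ξ = 4, δ = 3) gives potential distances 6, 4, 2 and 0, so a match would be reported once the fourth LPM arrives. LPMs that arrive later never alter the stored `dc` of earlier ones, which is the property Lemma 1 states.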
Along with the computation of $D_{c,\xi}(p)$, we record the starting point of the shortest path to $p$. $D_{p,\xi}(p)$ is computed following the calculation of $D_{c,\xi}(p)$. As mentioned, once $D_{p,\xi}(p)$ is found to be less than $\delta$, a qualified matching subsequence is detected. The position of the matching subsequence is given by the vertical projections from the starting point of the shortest path of $p$ to the end point of $p$. Considering that the potential SpADe distance of some LPMs around $p$ may also satisfy the range query, the LPM with the smallest $D_{p,\xi}(p)$ within a local region is returned as the end of a matching subsequence in this region.

5.3 Pruning approach in SpADe computation

The major computational cost of a range query on streaming time series comes from computing the cumulating SpADe distance of detected LPMs. Query processing will be efficient if some LPMs can be pruned without computing their cumulating SpADe distances. In the following, we introduce the concepts of post-bound and estimate-bound, and show how such a pruning approach is achieved.

Definition 7: The post-bound of an LPM $p$ is the highest position of the potential posteriors of $p$ that can be located in the next column of $p$.

An LPM $p_2$ is a potential posterior of $p_1$ if $D_{c,\xi}(p_1) + D_2(p_2, p_1) \le \delta$. Suppose the post-bound of $p_1$ is $b_1$. According to the definition, any $p_3$ satisfying $y_{p_3} = y_{p_1} + w$ and $x_{p_3} > b_1$ will not be a potential posterior of $p_1$. Based on this, we have the following lemma.

Lemma 2: Any LPM $p_4$ satisfying $y_{p_4} \ge y_{p_1} + w$ and $x_{p_4} > b_1$, where $b_1$ is the post-bound of $p_1$, will not be a potential posterior of $p_1$.

Proof: We simplify $x_{p_i}$ and $y_{p_i}$ as $x_i$ and $y_i$. For a $p_4$ satisfying the conditions in Lemma 2, a virtual LPM $p_3$ can be found such that $y_3 = y_1 + w$, $x_3 = x_4 > b_1$, and $\psi_{p_3} = \psi_{p_4}$. Therefore, $p_3$ is not a potential posterior of $p_1$. To show that $p_4$ is also not a potential posterior of $p_1$, we only need to prove that $D_2(p_4, p_1) \ge D_2(p_3, p_1)$. The relationship of $p_1$, $p_3$ and $p_4$ is shown in Figure 10.

\begin{align*}
D_2(p_3, p_1) &= g(x_3 - x_1 - w) + g(|x_3 - x_1 - w|) \\
D_2(p_4, p_1) &= g(x_4 - x_1 - w) + g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) \\
x_3 &= x_4 \\
g(x + y) &\ge g(x) + g(y), \quad x, y \ge 0 \\
\therefore\ \Delta D &= D_2(p_4, p_1) - D_2(p_3, p_1) \\
&= g(y_4 - y_1 - w) + g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|x_4 - x_1 - w|) \\
&= g(|(x_4 - x_1) - (y_4 - y_1)|) + g(y_4 - y_1 - w) - g(x_4 - x_1 - w) \\
&\ge g(|(x_4 - x_1) - (y_4 - y_1)|) - g(|(x_4 - x_1 - w) - (y_4 - y_1 - w)|) \\
&= 0.
\end{align*}

$\therefore D_2(p_4, p_1) \ge D_2(p_3, p_1)$, and $p_4$ is not a potential posterior of $p_1$.

Fig. 10. Pruning in SpADe distance computation.

As mentioned, disjoint sliding windows are used to chop local patterns from the streaming time series. Therefore, a column of LPMs is obtained for every chopped local pattern. The post-bound of an LPM $p_i$ in column $A$ is $b_i$. The post-bound of column $A$ can be defined as $b_A = \max_{y_{p_i} = y_A} b_i$, i.e., the highest post-bound of the LPMs $p_i$ in column $A$. According to Lemma 2, any LPM over $b_A$ and behind column $A$ will not be a potential posterior of any LPM in column $A$.

Definition 8: The estimate-bound of a column $A$ is $B_A = \max_{y_A - \xi \le y_X < y_A} b_X$ ($X$ is a column before $A$).

Figure 10 shows an example of the estimate-bound $B_Z$ of column $Z$. An LPM $p_5$ over $B_Z$ in column $Z$ is not a potential posterior of any LPM in columns $W$, $X$, $Y$. In other words, the previous LPM of $p_5$ will not be found in the searching region of $p_5$. Therefore, $p_5$ can be pruned without computing $D_{c,\xi}(p_5)$.

The estimate-bound of a column is continuously computed based on the post-bounds of previous columns. On getting a promising LPM in a new column, we update the post-bound of that column, which is then used to compute the estimate-bounds of the following columns.

6 STREAMING PATTERN DETECTION FOR MULTI-FEATURE TIME SERIES

In SpADe, local patterns are approximated for efficient matching by using wavelet transformation and grid indexing. However, when time series are multivariate sequences (i.e., $s[i]$ is a multivariate vector instead of a univariate number), the number of grids for approximating local patterns increases exponentially, due to the curse of dimensionality [34]. As a result, the cost of indexing and matching local patterns increases exponentially. To efficiently apply the SpADe distance to streaming pattern detection of multi-feature time series, we propose to decompose the multi-feature time series into a number of time series of univariate data, and then match them in parallel. The local distances of matching subsequences ending at the same position of different decomposed time series are aggregated on the fly, which gives an overall evaluation of the match between the subsequence (ending at the current position) of the streaming time series and the query pattern.
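As a rough illustration of the aggregation step, the Python sketch below sums, at each stream position, the local distances reported by the decomposed univariate series and flags positions whose total does not exceed a threshold. The function name `aggregate_stream`, the input layout (one tuple of per-feature distances per position), and the thresholding of the total against `delta` are assumptions of the example; how each per-feature distance is obtained is deliberately left out.

```python
def aggregate_stream(columns, delta):
    """Aggregate per-feature local distances position by position.

    `columns` yields, for each stream position, a tuple holding the local
    distance of every decomposed (univariate) series ending at that position.
    Yields (position, aggregated distance, whether it is within delta).
    """
    for j, per_feature in enumerate(columns):
        total = sum(per_feature)  # overall distance across all features
        yield j, total, total <= delta


# Hypothetical example: two features, three stream positions.
for position, total, is_match in aggregate_stream(
        [(3.0, 4.0), (1.0, 0.5), (0.0, 2.0)], delta=2.0):
    print(position, total, is_match)
```

Because the function consumes one position at a time, it can sit behind per-feature matchers that run in parallel and emit their distances as the stream advances.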
For a query pattern $q$ and a streaming sequence $s$, given a dimension $i$, SpADe is applied to evaluate the match between $q_i$ and $s_i$ (which are the $i$th decomposed sequences of $q$ and $s$). At column $j$, the local match of $s_i$ is defined as $D_{i,j}(q, s) = \min_{p \in P} D_p(p)$, where $P$ is the set of all LPMs (between $q_i$ and $s_i$) detected in column $j$. It is the minimal potential SpADe distance of all LPMs detected in column $j$. If there is no LPM in column $j$, $D_{i,j}(q, s) = g(m)$, where $m$ is the length of $q$. In the example shown in Figure 11, $D_{i,j}(q, s) = D_p(p_1)$.

Fig. 11. An example of local (best) match.

For feature sequences $q_i$ and $s_i$, the local best match of $s_i$ at column $j$ is defined as $D_{i,j}(q, s) = \min_{j' \le j} (D_{i,j'}(q, s) + g(w \times (j - j')))$. Assuming that the query pattern contains $d$ features, we then define the local best match of $s$ to $q$ at column $j$ as $D_j(q, s) = \sum_{i=1}^{d} D_{i,j}(q, s)$. $D_j(q, s)$ is the aggregation of the local best matches of $s_i$ over all decomposed sequences. It therefore gives an overall evaluation of the distance of a subsequence of $s$ (ending at column $j$) to the query $q$. Because all decomposed sequences of $s$ are compared against the corresponding decomposed sequences of $q$ in parallel, the local best match of $s$ can be computed continuously (column by column).

7 PERFORMANCE EVALUATION

In our performance evaluation, we compare SpADe with some commonly used distance measures of time series (Euclidean distance, DTW and EDR) in terms of accuracy and efficiency. Our test platform is a PC with a Pentium 4 3.0 GHz CPU and 1 GB of RAM.

7.1 Full sequence matching of SpADe

We use the UCR Time Series Classification/Clustering data sets [9] to test the performance of SpADe in full sequence matching.

7.1.1 Accuracy in full sequence matching

As in many other studies [35], [3], one nearest neighbor classification (1NN) is used to test the accuracy of distances under full sequence matching. In 1NN classification, for each sequence in the testing data set, we predict its label from its nearest neighbor in the training data set. If the derived label is the same as the original label of the testing sequence, we get a hit; otherwise, we get a miss. As shown in Table 1, we compare the distance measures based on the classification accuracy over 19 data sets. For each distance measure, we learn the parameters (e.g., warping width of DTW, matching threshold $\varepsilon$ of EDR, and $w$, $c$, $S_t$ and $S_a$ of SpADe) from the training data set by maximizing the 1NN classification accuracy of leave-one-out cross validation. The classification accuracy on the test data set is shown in Table 1. In many time series data sets, especially in many of those with smooth shapes, SpADe achieves higher accuracy than the other distance measures.

TABLE 1
Accuracy of 1NN classification in full sequence matching.

Data set     Euclidean  DTW    EDR    LCSS   SpADe
Syn. con.    0.880      0.983  0.960  0.877  0.953
Gun point    0.913      0.913  0.980  0.980  1.000
CBF          0.852      0.996  0.989  0.988  0.959
FaceAll      0.714      0.808  0.806  0.718  0.767
OSULeaf      0.517      0.616  0.785  0.777  0.889
Swed. leaf   0.787      0.843  0.904  0.867  0.888
50words      0.631      0.758  0.802  0.773  0.793
Trace        0.760      0.990  0.960  1.000  1.000
Two Pat.     0.910      0.998  0.998  0.999  0.990
Wafer        0.995      0.995  0.993  0.988  0.994
FaceFour     0.784      0.886  0.966  0.920  0.977
Lighting2    0.754      0.869  0.852  0.803  0.755
Lighting7    0.575      0.712  0.699  0.712  0.699
ECG200       0.880      0.880  0.900  0.870  0.840
Adiac        0.611      0.609  0.616  0.558  0.681
Yoga         0.830      0.845  0.806  0.849  0.857
Fish         0.783      0.840  0.920  0.914  0.943
Problem4     0.917      0.900  0.917  0.933  0.933
Problem12    0.829      0.913  0.883  0.895  0.898

7.1.2 Impact of parameters

The length of local patterns $w$ is an important parameter. It determines the complexity of shapes in the extracted local patterns. However, the optimal $w$ can be learned from training data sets, and it can also be set as a trade-off between the accuracy and efficiency of classification. We show the impact of pattern length on the accuracy of leave-one-out cross validation of 1NN classification in Figure 12. Three data sets of different shapes are used in this test. The shapes of some example time series are shown on the left, and the accuracy on the corresponding data set is shown on the right. In this test, given a pattern length $w$, the maximal accuracy achieved by adjusting $c$, $S_t$ and $S_a$ is recorded. We can see that shorter local patterns are preferred in the Fish data set (Figures 12(a) and 12(b)) to capture the local shapes more accurately, because those local shapes are important in identifying the labels of instances in this data set. For the Problem4 data set (Figures 12(c) and 12(d)), longer local patterns are preferred, as there is too much high-frequency dithering within the shapes of the time series. The shapes of short local patterns are meaningless in this data set. In contrast, the wavelet approximation of long local patterns reduces the impact of high-frequency dithering and therefore smooths the shapes of the time series. For the Problem12 data set (Figures 12(e) and 12(f)), pattern length $w$ does not affect the accuracy too much. However, it cannot be too long as the wavelet