SlideShare uma empresa Scribd logo
1 de 16
Mining Time-Series Data
1
Time-Series Database
 Consists of sequences of values or events obtained over
repeated measurements of time (weekly, hourly…)
 Stock market analysis, economic and sales forecasting, scientific
and engineering experiments, medical treatments etc.
 Can also be considered as a Sequence database
 Consists of a sequence of ordered events (time – optional)
 Web page Traversal Sequence
 Time-Series data can be analyzed to:
 Identify correlations
 Similar / Regular patterns, trends, outliers
2
Trend Analysis
Time Series involving a variable Y can be represented as a function
of time t, Y = F(t)
Goals of Time-Series Analysis
 Modeling time series - To gain insight into the mechanism
 Forecasting time series - For prediction
3
Trend Analysis
 Trend Analysis Components
 Long-term or trend movements (trend curve): general direction in
which a time series is moving over a long interval of time
 Cyclic movements or cycle variations: long term oscillations
about a trend line or curve
 e.g., business cycles, may or may not be periodic
 Seasonal movements or seasonal variations
 i.e, almost identical patterns that a time series appears to
follow during corresponding months of successive years.
 Irregular or random movements
 Time series analysis: decomposition of a time series into
these four basic movements
 Additive Model: TS = T + C + S + I
 Multiplicative Model: TS = T × C × S × I
4
Trend Analysis
 Adjusting Seasonal fluctuations
 Given a series of measurements y1, y,2, y3… influences of the data
that are systematic / calendar related must be removed
 Fluctuations conceal true underlying movement of the series and non-seasonal
characteristics
 De-seasonalize the data
 Seasonal Index – set of numbers showing the relative values of a
variable during the months of a year
 Sales during Oct, Nov, Dec – 80%, 120% and 140% of average monthly sales –
Seasonal index – 80, 120, 140
 Dividing original monthly data by seasonal index – De-seasonalizes data
 Auto-Correlation Analysis
 To detect correlations between ith element and (i-k)th element k- lag
 Pearson’s coefficient can be used between <y1, y2,…yN-k> and <yk+1,
yk+2, ..yN>
5
Trend Analysis
 Estimating Trend Curves
 The freehand method
 Fit the curve by looking at the graph
 Costly and barely reliable for large-scaled data mining
 The least-square method
 Find the curve minimizing the sum of the squares of the
deviation di of points yi on the curve from the corresponding
data points - ∑i=1
n
di
2
 The moving-average method
6
Trend Analysis
 Moving Average Method
 Smoothes the data
 Eliminates cyclic, seasonal and irregular movements
 Loses the data at the beginning or end of a series
 Sensitive to outliers (can be reduced by Weighted Moving Average)
 Assigns greater weight to center elements to eliminate smoothing effects
 Ex: 3 7 2 0 4 5 9 7 2
 Moving average of order 3: 4 3 2 3 6 7 6
 Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5
7
Trend Analysis
 Once trends are detected data can be divided by
corresponding trend values
 Cyclic Variations can be handled using Cyclic Indexes
 Time-Series Forecasting
 Long term / Short term predictions
 ARIMA – Auto-Regressive Integrated Moving Average
8
Similarity Search
 Normal database query finds exact match
 Similarity search finds data sequences that differ only slightly
from the given query sequence
 Two categories of similarity queries
 Whole matching: find a sequence that is similar to the query
sequence
 Subsequence matching: find all pairs of similar sequences
 Typical Applications
 Financial market
 Market basket data analysis
 Scientific databases
 Medical diagnosis
9
Data Reduction and Transformation
 Time Series data – high-dimensional data – each point
of time can be viewed as a dimension
 Dimensionality Reduction techniques
 Signal Processing techniques
 Discrete Fourier Transform
 Discrete Wavelet Transform
 Singular Value Decomposition based on PCA
 Random projection-based Sketches
 Time Series data is transformed and strongest coefficients –
features
 Techniques may require values in Frequency domain
 Distance preserving Ortho-normal transformations
 The distance between two signals in the time domain is the same as their
Euclidean distance in the frequency domain
10
Indexing methods for Similarity
Search
 Multi-dimensional index
 Use the index to retrieve the sequences that are at most a certain
small distance away from the query sequence
 Perform post-processing by computing the actual distance
between sequences in the time domain and discard any false
matches
 Sequence is mapped to trails, trail is divided into sub-trails
 Indexing techniques
 R-trees, R*-trees, Suffix trees etc
11
Subsequence Matching
 Break each sequence into a set of pieces of window with length w
 Extract the features of the subsequence inside the window
 Map each sequence to a “trail” in the feature space
 Divide the trail of each sequence into “subtrails” and represent each
of them with minimum bounding rectangle
 Use a multi-piece assembly algorithm to search for longer sequence
matches
 Uses Euclidean distance (Sensitive to outliers)
12
Similarity Search Methods
 Practically there maybe differences in the baseline and
scale
 Distance from one baseline to another – offset
 Data has to be normalized
 Sequence X = <x1, x2, ..xn> can be replaced by X’ = <x1’, x2’, …xn’>
where xi’ = xi - µ / σ
 Two subsequences are considered similar if one lies within an
envelope of ε width around the other, ignoring outliers
 Two sequences are said to be similar if they have enough non-
overlapping time-ordered pairs of similar subsequences
 Parameters specified by a user or expert: sliding window size,
width of an envelope for similarity, maximum gap, and matching
fraction
13
Similarity Search Methods
14
Similarity Search Method
 Atomic matching
 Find all pairs of gap-free windows of a small length that are
similar
 Window stitching
 Stitch similar windows to form pairs of large similar
subsequences allowing gaps between atomic matches
 Subsequence Ordering
 Linearly order the subsequence matches to determine whether
enough similar pieces exist
15
Query Languages for Time Sequence
 Time-sequence query language
 Should be able to specify sophisticated queries like
 Find all of the sequences that are similar to some sequence in class A, but not similar to
any sequence in class B
 Should be able to support various kinds of queries: range queries, all-
pair queries, and nearest neighbor queries
 Shape definition language
 Allows users to define and query the overall shape of time sequences
 Uses human readable series of sequence transitions or macros
 Ignores the specific details
 E.g., the pattern up, Up, UP can be used to describe increasing
degrees of rising slopes
 Macros: spike, valley, etc.
16

Mais conteúdo relacionado

Mais procurados

Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data miningKamal Acharya
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
OLAP operations
OLAP operationsOLAP operations
OLAP operationskunj desai
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningParas Kohli
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methodsrajshreemuthiah
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Salah Amean
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)DheerajPachauri
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 

Mais procurados (20)

Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
Data cubes
Data cubesData cubes
Data cubes
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
lazy learners and other classication methods
lazy learners and other classication methodslazy learners and other classication methods
lazy learners and other classication methods
 
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
Data Mining: Concepts and Techniques chapter 07 : Advanced Frequent Pattern M...
 
Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)Clustering for Stream and Parallelism (DATA ANALYTICS)
Clustering for Stream and Parallelism (DATA ANALYTICS)
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 

Semelhante a 5.2 mining time series data

Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptjohnsaida3101
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.pptPalaniKumarR2
 
Iso 9001 consultants uk
Iso 9001 consultants ukIso 9001 consultants uk
Iso 9001 consultants ukjondarita
 
Methods of quality management
Methods of quality managementMethods of quality management
Methods of quality managementselinasimpson2001
 
What is quality management system
What is quality management systemWhat is quality management system
What is quality management systemselinasimpson0201
 
Quality manual for iso 9001
Quality manual for iso 9001Quality manual for iso 9001
Quality manual for iso 9001jondarita
 
Quality management conference
Quality management conferenceQuality management conference
Quality management conferenceselinasimpson311
 
Iso 9001 course
Iso 9001 courseIso 9001 course
Iso 9001 courseporikgefus
 
Iso 9001 manual
Iso 9001 manualIso 9001 manual
Iso 9001 manualjomharipe
 
Iso 9001 courses
Iso 9001 coursesIso 9001 courses
Iso 9001 coursesdenritafu
 
Iso 9001 templates
Iso 9001 templatesIso 9001 templates
Iso 9001 templatespogerita
 
Quality assurance and management
Quality assurance and managementQuality assurance and management
Quality assurance and managementselinasimpson2301
 
Quality management presentation
Quality management presentationQuality management presentation
Quality management presentationselinasimpson1501
 
Ts 16949 quality management system
Ts 16949 quality management systemTs 16949 quality management system
Ts 16949 quality management systemselinasimpson381
 
Course in quality management
Course in quality managementCourse in quality management
Course in quality managementselinasimpson361
 
What is iso 9001 standard
What is iso 9001 standardWhat is iso 9001 standard
What is iso 9001 standardjomjintra
 

Semelhante a 5.2 mining time series data (20)

Time series.ppt
Time series.pptTime series.ppt
Time series.ppt
 
Using timeseries extraction the mining.ppt
Using timeseries extraction the mining.pptUsing timeseries extraction the mining.ppt
Using timeseries extraction the mining.ppt
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt
 
17329274.ppt
17329274.ppt17329274.ppt
17329274.ppt
 
Quality management thesis
Quality management thesisQuality management thesis
Quality management thesis
 
Iso 9001 consultants uk
Iso 9001 consultants ukIso 9001 consultants uk
Iso 9001 consultants uk
 
Methods of quality management
Methods of quality managementMethods of quality management
Methods of quality management
 
What is quality management system
What is quality management systemWhat is quality management system
What is quality management system
 
Quality manual for iso 9001
Quality manual for iso 9001Quality manual for iso 9001
Quality manual for iso 9001
 
Quality management conference
Quality management conferenceQuality management conference
Quality management conference
 
Iso 9001 course
Iso 9001 courseIso 9001 course
Iso 9001 course
 
Iso 9001 manual
Iso 9001 manualIso 9001 manual
Iso 9001 manual
 
Iso 9001 courses
Iso 9001 coursesIso 9001 courses
Iso 9001 courses
 
Quality management cycle
Quality management cycleQuality management cycle
Quality management cycle
 
Iso 9001 templates
Iso 9001 templatesIso 9001 templates
Iso 9001 templates
 
Quality assurance and management
Quality assurance and managementQuality assurance and management
Quality assurance and management
 
Quality management presentation
Quality management presentationQuality management presentation
Quality management presentation
 
Ts 16949 quality management system
Ts 16949 quality management systemTs 16949 quality management system
Ts 16949 quality management system
 
Course in quality management
Course in quality managementCourse in quality management
Course in quality management
 
What is iso 9001 standard
What is iso 9001 standardWhat is iso 9001 standard
What is iso 9001 standard
 

Mais de Krish_ver2

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back trackingKrish_ver2
 
5.5 back track
5.5 back track5.5 back track
5.5 back trackKrish_ver2
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02Krish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithmKrish_ver2
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03Krish_ver2
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquerKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02Krish_ver2
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing extKrish_ver2
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 

Mais de Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 

Último

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 

Último (20)

Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 

5.2 mining time series data

  • 2. Time-Series Database  Consists of sequences of values or events obtained over repeated measurements of time (weekly, hourly…)  Stock market analysis, economic and sales forecasting, scientific and engineering experiments, medical treatments etc.  Can also be considered as a Sequence database  Consists of a sequence of ordered events (time – optional)  Web page Traversal Sequence  Time-Series data can be analyzed to:  Identify correlations  Similar / Regular patterns, trends, outliers 2
  • 3. Trend Analysis Time Series involving a variable Y can be represented as a function of time t, Y = F(t) Goals of Time-Series Analysis  Modeling time series - To gain insight into the mechanism  Forecasting time series - For prediction 3
  • 4. Trend Analysis  Trend Analysis Components  Long-term or trend movements (trend curve): general direction in which a time series is moving over a long interval of time  Cyclic movements or cycle variations: long term oscillations about a trend line or curve  e.g., business cycles, may or may not be periodic  Seasonal movements or seasonal variations  i.e, almost identical patterns that a time series appears to follow during corresponding months of successive years.  Irregular or random movements  Time series analysis: decomposition of a time series into these four basic movements  Additive Model: TS = T + C + S + I  Multiplicative Model: TS = T × C × S × I 4
  • 5. Trend Analysis  Adjusting Seasonal fluctuations  Given a series of measurements y1, y,2, y3… influences of the data that are systematic / calendar related must be removed  Fluctuations conceal true underlying movement of the series and non-seasonal characteristics  De-seasonalize the data  Seasonal Index – set of numbers showing the relative values of a variable during the months of a year  Sales during Oct, Nov, Dec – 80%, 120% and 140% of average monthly sales – Seasonal index – 80, 120, 140  Dividing original monthly data by seasonal index – De-seasonalizes data  Auto-Correlation Analysis  To detect correlations between ith element and (i-k)th element k- lag  Pearson’s coefficient can be used between <y1, y2,…yN-k> and <yk+1, yk+2, ..yN> 5
  • 6. Trend Analysis  Estimating Trend Curves  The freehand method  Fit the curve by looking at the graph  Costly and barely reliable for large-scaled data mining  The least-square method  Find the curve minimizing the sum of the squares of the deviation di of points yi on the curve from the corresponding data points - ∑i=1 n di 2  The moving-average method 6
  • 7. Trend Analysis  Moving Average Method  Smoothes the data  Eliminates cyclic, seasonal and irregular movements  Loses the data at the beginning or end of a series  Sensitive to outliers (can be reduced by Weighted Moving Average)  Assigns greater weight to center elements to eliminate smoothing effects  Ex: 3 7 2 0 4 5 9 7 2  Moving average of order 3: 4 3 2 3 6 7 6  Weighted average (1 4 1): 5.5 2.5 1 3.5 5.5 8 6.5 7
  • 8. Trend Analysis  Once trends are detected data can be divided by corresponding trend values  Cyclic Variations can be handled using Cyclic Indexes  Time-Series Forecasting  Long term / Short term predictions  ARIMA – Auto-Regressive Integrated Moving Average 8
  • 9. Similarity Search  Normal database query finds exact match  Similarity search finds data sequences that differ only slightly from the given query sequence  Two categories of similarity queries  Whole matching: find a sequence that is similar to the query sequence  Subsequence matching: find all pairs of similar sequences  Typical Applications  Financial market  Market basket data analysis  Scientific databases  Medical diagnosis 9
  • 10. Data Reduction and Transformation  Time Series data – high-dimensional data – each point of time can be viewed as a dimension  Dimensionality Reduction techniques  Signal Processing techniques  Discrete Fourier Transform  Discrete Wavelet Transform  Singular Value Decomposition based on PCA  Random projection-based Sketches  Time Series data is transformed and strongest coefficients – features  Techniques may require values in Frequency domain  Distance preserving Ortho-normal transformations  The distance between two signals in the time domain is the same as their Euclidean distance in the frequency domain 10
  • 11. Indexing methods for Similarity Search  Multi-dimensional index  Use the index to retrieve the sequences that are at most a certain small distance away from the query sequence  Perform post-processing by computing the actual distance between sequences in the time domain and discard any false matches  Sequence is mapped to trails, trail is divided into sub-trails  Indexing techniques  R-trees, R*-trees, Suffix trees etc 11
  • 12. Subsequence Matching  Break each sequence into a set of pieces of window with length w  Extract the features of the subsequence inside the window  Map each sequence to a “trail” in the feature space  Divide the trail of each sequence into “subtrails” and represent each of them with minimum bounding rectangle  Use a multi-piece assembly algorithm to search for longer sequence matches  Uses Euclidean distance (Sensitive to outliers) 12
  • 13. Similarity Search Methods  Practically there maybe differences in the baseline and scale  Distance from one baseline to another – offset  Data has to be normalized  Sequence X = <x1, x2, ..xn> can be replaced by X’ = <x1’, x2’, …xn’> where xi’ = xi - µ / σ  Two subsequences are considered similar if one lies within an envelope of ε width around the other, ignoring outliers  Two sequences are said to be similar if they have enough non- overlapping time-ordered pairs of similar subsequences  Parameters specified by a user or expert: sliding window size, width of an envelope for similarity, maximum gap, and matching fraction 13
  • 15. Similarity Search Method  Atomic matching  Find all pairs of gap-free windows of a small length that are similar  Window stitching  Stitch similar windows to form pairs of large similar subsequences allowing gaps between atomic matches  Subsequence Ordering  Linearly order the subsequence matches to determine whether enough similar pieces exist 15
  • 16. Query Languages for Time Sequence  Time-sequence query language  Should be able to specify sophisticated queries like  Find all of the sequences that are similar to some sequence in class A, but not similar to any sequence in class B  Should be able to support various kinds of queries: range queries, all- pair queries, and nearest neighbor queries  Shape definition language  Allows users to define and query the overall shape of time sequences  Uses human readable series of sequence transitions or macros  Ignores the specific details  E.g., the pattern up, Up, UP can be used to describe increasing degrees of rising slopes  Macros: spike, valley, etc. 16