SlideShare uma empresa Scribd logo
1 de 35
Baixar para ler offline
A Framework for Online Clustering
Based on Evolving Semi-Supervision
Guilherme Alves, Maria Camila N. Barioni and Elaine Faria
Universidade Federal de Uberlândia
Outline
■ Introduction
■ The CABESS Framework
■ Pointwise CABESS
■ Experiments and Results
■ Conclusion
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 2
Introduction
Motivation
■ The desired organization for the data may change over time.
Semi-supervised approaches may be useful for guiding clustering
algorithms in the adaptation process.
■ Additional information (semi-supervision) may change over time.
■ It may causes clustering transitions: birth, split or merge.
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 3
Introduction
Objective and Research Questions
Goal: to provide a framework that is able to use and maintain
semi-supervision correctly to enable efficient and effective online
clustering processes.
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 4
Introduction
Objective and Research Questions
Goal: to provide a framework that is able to use and maintain
semi-supervision correctly to enable efficient and effective online
clustering processes.
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and
labels?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 5
Introduction
Objective and Research Questions
Goal: to provide a framework that is able to use and maintain
semi-supervision correctly to enable efficient and effective online
clustering processes.
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and
labels?
Q3. How the feedback window size variation affects semi-supervision
information and clustering effectiveness?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 6
Introduction
Objective and Research Questions
Goal: to provide a framework that is able to use and maintain
semi-supervision correctly to enable efficient and effective online
clustering processes.
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and
labels?
Q3. How the feedback window size variation affects semi-supervision
information and clustering effectiveness?
Q4. How efficient is our approach compared to existing semi-supervised
approaches?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 7
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 8
𝑡0
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
III.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 9
𝑡0 𝑡
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
III.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 10
𝑡0 𝑡 𝑡
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
3
3 2 2
2
3 2
4
4
4
4 4
4
2
2
4
3
4
4
2 2
2
3
4 4
3 2
2233
3
2
2
3
III.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 11
𝑡0 𝑡 𝑡 𝑡
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
3
3 2 2
2
3 2
4
4
4
4 4
4
2
2
4
3
4
4
2 2
2
3
4 4
3 2
2233
3
2
2
3 3
3 2 2
2
3 2
4
4
4 4
4
2
2
2
4
3
4
4
4
2 23
4 4
3
2 2
4
22
2233
3
3
III.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 12
𝑡0
1 2
1
1
2 2
2
1
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
II.PartitionSetIII.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 13
𝑡0 𝑡
1 2
1
1
2 2
2
1
1 2
1
2 2
23
3
41
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
II.PartitionSetIII.Groundtruth
Time
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 14
𝑡0 𝑡 𝑡
DivisãoSplit: → { , }
1 2
1
1
2 2
2
1
1 2
1
2 2
23
3
41
2
4 4
2
4
3
1
2 2
23
3
2
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
3
3 2 2
2
3 2
4
4
4
4 4
4
2
2
4
3
4
4
2 2
2
3
4 4
3 2
2233
3
2
2
3
II.PartitionSetIII.Groundtruth
Time
I.Clustering
transition
Introduction
Example
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 15
𝑡0 𝑡 𝑡 𝑡
DivisãoSplit: → { , }
1 2
1
1
2 2
2
1
1 2
1
2 2
23
3
41
2
4 4
2
4
3
1
2 2
23
3
2
3 2
4
4
2
4
3
4
2 2
2
23
3
1 1
1 1 2 2
2
1 2
1
1
1
1
1 1
1
2 2
2
2
2
1
1
1
1
1
3
3 3 2 2
2
3 2
4
4
4
4
4
2 2
2
2
2
3
33
4 4
4
4
2 2
4 4
3
3 2 2
2
3 2
4
4
4
4 4
4
2
2
4
3
4
4
2 2
2
3
4 4
3 2
2233
3
2
2
3 3
3 2 2
2
3 2
4
4
4 4
4
2
2
2
4
3
4
4
4
2 23
4 4
3
2 2
4
22
2233
3
3
II.PartitionSetIII.Groundtruth
Time
I.Clustering
transition
Introduction
Main Contributions
■ The introduction of CABESS (Cluster Adaptation Based on
Evolving Semi-Supervision)
■ A framework for online clustering using semi-supervision in the form
of feedback.
■ A strategy that extracts semi-supervision information from
feedback given in the form of labels.
■ An approach to keep the labels consistent over time.
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 16
The CABESS Framework
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 17
AF Summarization
1
Semi-supervised
Clustering 4
Clustering
2
ℱ
Semi-Supervision
Deduction 3
B
Transition
detection 5
𝒟
CABESS
𝑡 ℛ 𝑡
, 𝒮 𝑡
ℱ 𝑡
𝒟 𝑡
T
Partition
Set Storage
𝑡 𝑡−
𝑡
F
T
𝑡−
𝒮 𝑡
A
B
Verify if new instances have been generated or the user has been satisfied with the cluster quality.
Verify if it is the first clustering performed.
The Framework CABESS
Pointwise CABESS
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 18
AF
BIRCH
[Zhang et al.1996]
1
SSDBScan
[Lelis and Sander
2009] 4
DBScan
[Ester et al.1996]
2
ℱ
Semi-Supervision
Deduction 3
B
MONIC
[Spiliopoulou et al.
2006] 5
𝒟
CABESS
𝑡 ℛ 𝑡
, 𝒮 𝑡
ℱ 𝑡
𝒟 𝑡
T
Partition
Set Storage
𝑡 𝑡−
𝑡
F
T
𝑡−
𝒮 𝑡
The CABESS Framework » Pointwise CABESS
Extracting labels from feedbacks
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 19
1. Feedbacks
𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛
Instance-level labels
✓
✓
✓
✓
✓
✓
✓
(a)
The CABESS Framework » Pointwise CABESS
Extracting labels from feedbacks
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 20
1. Feedbacks
𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛
Instance-level labels
■ A neighborhood 𝑁 is defined as a set of instances that must be in the
same cluster.
✓
✓
✓
✓
✓
✓
✓
(a) (b)
The CABESS Framework » Pointwise CABESS
Extracting labels from feedbacks
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 21
1. Feedbacks
𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛
Instance-level labels
■ A neighborhood 𝑁 is defined as a set of instances that must be in the
same cluster.
■ Same previous cluster AND received positive feedback → Same label
■ Received displacement feedback → assign to label of the neighborhood
of the destination rule.
✓
✓
✓
✓
✓
✓
✓
1
1
2
1
2
2
2
2
2
(a) (b) (c)
The CABESS Framework » Pointwise CABESS
Extracting labels from feedbacks
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 22
2. Instance-level labels
𝑑𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛
Summarized-level labels
■ Performed as a propagation task
■ If one of the summarized instances has labeled instances with
different labels
■ Split it to obtain purified summarized instances
■ summarized instances that contain only the same label in labeled
instances.
1
1
2
1
2
2
2
2
2
1
1
2
1
2
2
2
2
2
1
1
2
2
2
2
(c) (d)
The CABESS Framework » Pointwise CABESS
Dealing with obsolete labels
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 23
■ Obsolete labels: labels assigned to instances, for which the
clusters do not exist anymore.
■ Minimizing the problem: adoption of a detector of transitions.
■ Better neighborhood management.
■ when a cluster survives both neighborhood and associated labels are
preserved.
𝑠𝑝𝑙𝑖𝑡
does not exist at 𝒕 → label 2 becomes obsolete
𝑡 𝑡
Experiments
Datasets
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 24
Name # instances d # classes Reference Type
DB7 9,050 2 8 [Silva et al. 2015]
SyntheticSYN3 5,000 2 3 streamMOA
SYN4 10,000 3 5 streamMOA
FROGS 1,484 8 4 [Colonna et al. 2016]
RealIPEA 5,564 5 27 IPEA
KDD’995 24,692 19 11 UCI
Experiments
Comparison Methods
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 25
■ Unsupervised (DBScan)
■ Consists in periodically executing a clustering algorithm without any
semi-supervision.
■ Semi-supervised (SSDBScan)
■ Static Approach.
■ Consists in periodically applying a semi-supervised clustering algorithm.
■ This approach does not discard any label over time.
■ Window-based Approach.
■ It is a variation of the previous approach.
■ Instead of executing the clustering algorithm over all the label set, we
remove old labels.
Experiments
Protocol and Evaluation
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 26
■ Adjusted Rand Index (ARI)
■ Prequential Protocol
■ For each timestamp 𝑡 only one label is considered valid for a
data instance according to the grouping tree.
■ Online arrival of instances and feedback.
■ The arrival of the data instances and the user feedback are given
according to the uniform distribution.
Experimental Results
Semi-supervised vs. Unsupervised
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 27
Experimental Results
Semi-supervised vs. Unsupervised
Q1. How accurately does semi-supervision aid on clustering
effectiveness when there are external clustering transitions over time?
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 28
Experimental Results
Feedbacks vs. Labels
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 29
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and labels?
Experimental Results
Feedbacks vs. Labels
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 30
Q2. Are there major differences between clustering effectiveness when
using semi-supervised clustering approaches based on feedback and labels?
Experimental Results
Feedback Window Size Variation
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 31
Q3. How the feedback window size variation affects semi-supervision
information and clustering effectiveness?
Experimental Results
Efficiency
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 32
Q4. How efficient is our approach compared to existing semi-supervised
approaches?
Conclusion
■ Higher efficiency when compared to other semi-supervised
approaches
■ While keeping an equivalent effectiveness.
■ Future works:
■ Exploring other types of semi-supervision information such as
instance-level constraints.
■ Tackling other strategies for detecting transitions.
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 33
References
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for
discovering clusters in large spatial databases with noise. In KDD, pages 226–231.
AAAI Press
Colonna, J. G., Gama, J., and Nakamura, E. F. (2016). Recognizing Family, Genus, and
Species of Anuran Using a Hierarchical Classification Approach. pages 198–212.
Springer, Cham.
Lai, H. P., Visani, M., Boucher, A., and Ogier, J.-M. (2014). A new interactive semi-
supervised clustering model for large image database indexing. Pattern
Recognition Letters, 37(1):94–106.
Lelis, L. and Sander, J. (2009). Semi-supervised Density-Based Clustering. In IEEE
ICDM, pages 842–847.
Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. (2006). MONIC: Modeling
and Monitoring Cluster Transitions. In ACM SIGKDD, page 706, New York, NY, USA.
ACM Press.
Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data
Clustering Method for very Large Databases. ACM SIGMOD Record, 25(2):103–114.
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 34
A Framework for Online
Clustering Based on
Evolving Semi-Supervision
Guilherme Alves guilhermealves@ufu.br
Maria Camila N. Barioni camila.barioni@ufu.br
Elaine Faria elaine@ufu.br
6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 35
Acknowledgments

Mais conteúdo relacionado

Semelhante a A Framework for Online Clustering Based on Evolving Semi-supervision

A Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsA Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsRodrigo Rocha Silva
 
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG: connecting the knowledge community
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...Lixi Conrads
 
Identity & Access Management Briefing
Identity & Access Management BriefingIdentity & Access Management Briefing
Identity & Access Management BriefingCharise Arrowood
 
Multiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMultiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMohsin Raza
 
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationFrom Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationPaige Cruz
 
A solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPA solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPMaria Mora
 
Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Simon Sweeney
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsMaribel Acosta Deibe
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityDevOps.com
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageIRJET Journal
 
Sql portfolio admin_practicals
Sql portfolio admin_practicalsSql portfolio admin_practicals
Sql portfolio admin_practicalsShelli Ciaschini
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Universidade de São Paulo
 
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET Journal
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationCSCJournals
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsPooyan Jamshidi
 
SDPM in Brazil - 2007
SDPM in Brazil - 2007SDPM in Brazil - 2007
SDPM in Brazil - 2007Peter Mello
 
Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Kanban Conferences
 

Semelhante a A Framework for Online Clustering Based on Evolving Semi-supervision (20)

A Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension RelationsA Hybrid Memory Data Cube Approach for High Dimension Relations
A Hybrid Memory Data Cube Approach for High Dimension Relations
 
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
UKSG Conference 2015 - E-resources: ezPAARSE helps you discover who is readin...
 
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
IGUANA: A Generic Framework for Benchmarking the Read-Write Performance of Tr...
 
Identity & Access Management Briefing
Identity & Access Management BriefingIdentity & Access Management Briefing
Identity & Access Management Briefing
 
Multiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALTMultiple Sequence Alignment Tool Using NCBI COBALT
Multiple Sequence Alignment Tool Using NCBI COBALT
 
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics AggregationFrom Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
From Cardinal(ity) Sins to Cost-Efficient Metrics Aggregation
 
A solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDPA solution to align corporate reporting frameworks: The case of GRI and CDP
A solution to align corporate reporting frameworks: The case of GRI and CDP
 
Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...Investigating Geographic Information System Technologies A Global Positioning...
Investigating Geographic Information System Technologies A Global Positioning...
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of Endpoints
 
Why Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and ReliabilityWhy Distributed Tracing is Essential for Performance and Reliability
Why Distributed Tracing is Essential for Performance and Reliability
 
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug TriageSurvey on Software Data Reduction Techniques Accomplishing Bug Triage
Survey on Software Data Reduction Techniques Accomplishing Bug Triage
 
Sql portfolio admin_practicals
Sql portfolio admin_practicalsSql portfolio admin_practicals
Sql portfolio admin_practicals
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
 
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
IRJET- Sampling Selection Strategy for Large Scale Deduplication of Synthetic...
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based ClassificationImpact of Asymmetry of Internet Traffic for Heuristic Based Classification
Impact of Asymmetry of Internet Traffic for Heuristic Based Classification
 
Learning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain EnvironmentsLearning Software Performance Models for Dynamic and Uncertain Environments
Learning Software Performance Models for Dynamic and Uncertain Environments
 
SDPM in Brazil - 2007
SDPM in Brazil - 2007SDPM in Brazil - 2007
SDPM in Brazil - 2007
 
Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM Susanne Bartel | Breaking the Plateau with KMM
Susanne Bartel | Breaking the Plateau with KMM
 

Último

Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGiovaniTrinidad
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxtuking87
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptAmirRaziq1
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clonechaudhary charan shingh university
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlshansessene
 

Último (20)

Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Gas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptxGas-ExchangeS-in-Plants-and-Animals.pptx
Gas-ExchangeS-in-Plants-and-Animals.pptx
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Immunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.pptImmunoblott technique for protein detection.ppt
Immunoblott technique for protein detection.ppt
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clone
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 

A Framework for Online Clustering Based on Evolving Semi-supervision

  • 1. A Framework for Online Clustering Based on Evolving Semi-Supervision Guilherme Alves, Maria Camila N. Barioni and Elaine Faria Universidade Federal de Uberlândia
  • 2. Outline ■ Introduction ■ The CABESS Framework ■ Pointwise CABESS ■ Experiments and Results ■ Conclusion 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 2
  • 3. Introduction Motivation ■ The desired organization for the data may change over time. Semi-supervised approaches may be useful for guiding clustering algorithms in the adaptation process. ■ Additional information (semi-supervision) may change over time. ■ It may causes clustering transitions: birth, split or merge. 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 3
  • 4. Introduction Objective and Research Questions Goal: to provide a framework that is able to use and maintain semi-supervision correctly to enable efficient and effective online clustering processes. Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 4
  • 5. Introduction Objective and Research Questions Goal: to provide a framework that is able to use and maintain semi-supervision correctly to enable efficient and effective online clustering processes. Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? Q2. Are there major differences between clustering effectiveness when using semi-supervised clustering approaches based on feedback and labels? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 5
  • 6. Introduction Objective and Research Questions Goal: to provide a framework that is able to use and maintain semi-supervision correctly to enable efficient and effective online clustering processes. Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? Q2. Are there major differences between clustering effectiveness when using semi-supervised clustering approaches based on feedback and labels? Q3. How the feedback window size variation affects semi-supervision information and clustering effectiveness? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 6
  • 7. Introduction Objective and Research Questions Goal: to provide a framework that is able to use and maintain semi-supervision correctly to enable efficient and effective online clustering processes. Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? Q2. Are there major differences between clustering effectiveness when using semi-supervised clustering approaches based on feedback and labels? Q3. How the feedback window size variation affects semi-supervision information and clustering effectiveness? Q4. How efficient is our approach compared to existing semi-supervised approaches? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 7
  • 8. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 8 𝑡0 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 III.Groundtruth Time
  • 9. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 9 𝑡0 𝑡 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 III.Groundtruth Time
  • 10. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 10 𝑡0 𝑡 𝑡 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 3 3 2 2 2 3 2 4 4 4 4 4 4 2 2 4 3 4 4 2 2 2 3 4 4 3 2 2233 3 2 2 3 III.Groundtruth Time
  • 11. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 11 𝑡0 𝑡 𝑡 𝑡 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 3 3 2 2 2 3 2 4 4 4 4 4 4 2 2 4 3 4 4 2 2 2 3 4 4 3 2 2233 3 2 2 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 4 3 4 4 4 2 23 4 4 3 2 2 4 22 2233 3 3 III.Groundtruth Time
  • 12. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 12 𝑡0 1 2 1 1 2 2 2 1 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 II.PartitionSetIII.Groundtruth Time
  • 13. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 13 𝑡0 𝑡 1 2 1 1 2 2 2 1 1 2 1 2 2 23 3 41 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 II.PartitionSetIII.Groundtruth Time
  • 14. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 14 𝑡0 𝑡 𝑡 DivisãoSplit: → { , } 1 2 1 1 2 2 2 1 1 2 1 2 2 23 3 41 2 4 4 2 4 3 1 2 2 23 3 2 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 3 3 2 2 2 3 2 4 4 4 4 4 4 2 2 4 3 4 4 2 2 2 3 4 4 3 2 2233 3 2 2 3 II.PartitionSetIII.Groundtruth Time I.Clustering transition
  • 15. Introduction Example 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 15 𝑡0 𝑡 𝑡 𝑡 DivisãoSplit: → { , } 1 2 1 1 2 2 2 1 1 2 1 2 2 23 3 41 2 4 4 2 4 3 1 2 2 23 3 2 3 2 4 4 2 4 3 4 2 2 2 23 3 1 1 1 1 2 2 2 1 2 1 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 2 2 3 33 4 4 4 4 2 2 4 4 3 3 2 2 2 3 2 4 4 4 4 4 4 2 2 4 3 4 4 2 2 2 3 4 4 3 2 2233 3 2 2 3 3 3 2 2 2 3 2 4 4 4 4 4 2 2 2 4 3 4 4 4 2 23 4 4 3 2 2 4 22 2233 3 3 II.PartitionSetIII.Groundtruth Time I.Clustering transition
  • 16. Introduction Main Contributions ■ The introduction of CABESS (Cluster Adaptation Based on Evolving Semi-Supervision) ■ A framework for online clustering using semi-supervision in the form of feedback. ■ A strategy that extracts semi-supervision information from feedback given in the form of labels. ■ An approach to keep the labels consistent over time. 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 16
  • 17. The CABESS Framework 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 17 AF Summarization 1 Semi-supervised Clustering 4 Clustering 2 ℱ Semi-Supervision Deduction 3 B Transition detection 5 𝒟 CABESS 𝑡 ℛ 𝑡 , 𝒮 𝑡 ℱ 𝑡 𝒟 𝑡 T Partition Set Storage 𝑡 𝑡− 𝑡 F T 𝑡− 𝒮 𝑡 A B Verify if new instances have been generated or the user has been satisfied with the cluster quality. Verify if it is the first clustering performed.
  • 18. The Framework CABESS Pointwise CABESS 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 18 AF BIRCH [Zhang et al.1996] 1 SSDBScan [Lelis and Sander 2009] 4 DBScan [Ester et al.1996] 2 ℱ Semi-Supervision Deduction 3 B MONIC [Spiliopoulou et al. 2006] 5 𝒟 CABESS 𝑡 ℛ 𝑡 , 𝒮 𝑡 ℱ 𝑡 𝒟 𝑡 T Partition Set Storage 𝑡 𝑡− 𝑡 F T 𝑡− 𝒮 𝑡
  • 19. The CABESS Framework » Pointwise CABESS Extracting labels from feedbacks 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 19 1. Feedbacks 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛 Instance-level labels ✓ ✓ ✓ ✓ ✓ ✓ ✓ (a)
  • 20. The CABESS Framework » Pointwise CABESS Extracting labels from feedbacks 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 20 1. Feedbacks 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛 Instance-level labels ■ A neighborhood 𝑁 is defined as a set of instances that must be in the same cluster. ✓ ✓ ✓ ✓ ✓ ✓ ✓ (a) (b)
  • 21. The CABESS Framework » Pointwise CABESS Extracting labels from feedbacks 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 21 1. Feedbacks 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑖𝑜𝑛 Instance-level labels ■ A neighborhood 𝑁 is defined as a set of instances that must be in the same cluster. ■ Same previous cluster AND received positive feedback → Same label ■ Received displacement feedback → assign to label of the neighborhood of the destination rule. ✓ ✓ ✓ ✓ ✓ ✓ ✓ 1 1 2 1 2 2 2 2 2 (a) (b) (c)
  • 22. The CABESS Framework » Pointwise CABESS Extracting labels from feedbacks 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 22 2. Instance-level labels 𝑑𝑒𝑑𝑢𝑐𝑡𝑖𝑜𝑛 Summarized-level labels ■ Performed as a propagation task ■ If one of the summarized instances has labeled instances with different labels ■ Split it to obtain purified summarized instances ■ summarized instances that contain only the same label in labeled instances. 1 1 2 1 2 2 2 2 2 1 1 2 1 2 2 2 2 2 1 1 2 2 2 2 (c) (d)
  • 23. The CABESS Framework » Pointwise CABESS Dealing with obsolete labels 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 23 ■ Obsolete labels: labels assigned to instances, for which the clusters do not exist anymore. ■ Minimizing the problem: adoption of a detector of transitions. ■ Better neighborhood management. ■ when a cluster survives both neighborhood and associated labels are preserved. 𝑠𝑝𝑙𝑖𝑡 does not exist at 𝒕 → label 2 becomes obsolete 𝑡 𝑡
  • 24. Experiments Datasets 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 24 Name # instances d # classes Reference Type DB7 9,050 2 8 [Silva et al. 2015] SyntheticSYN3 5,000 2 3 streamMOA SYN4 10,000 3 5 streamMOA FROGS 1,484 8 4 [Colonna et al. 2016] RealIPEA 5,564 5 27 IPEA KDD’995 24,692 19 11 UCI
  • 25. Experiments Comparison Methods 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 25 ■ Unsupervised (DBScan) ■ Consists in periodically executing a clustering algorithm without any semi-supervision. ■ Semi-supervised (SSDBScan) ■ Static Approach. ■ Consists in periodically applying a semi-supervised clustering algorithm. ■ This approach does not discard any label over time. ■ Window-based Approach. ■ It is a variation of the previous approach. ■ Instead of executing the clustering algorithm over all the label set, we remove old labels.
  • 26. Experiments Protocol and Evaluation 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 26 ■ Adjusted Rand Index (ARI) ■ Prequential Protocol ■ For each timestamp 𝑡 only one label is considered valid for a data instance according to the grouping tree. ■ Online arrival of instances and feedback. ■ The arrival of the data instances and the user feedback are given according to the uniform distribution.
  • 27. Experimental Results Semi-supervised vs. Unsupervised Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 27
  • 28. Experimental Results Semi-supervised vs. Unsupervised Q1. How accurately does semi-supervision aid on clustering effectiveness when there are external clustering transitions over time? 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 28
  • 29. Experimental Results Feedbacks vs. Labels 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 29 Q2. Are there major differences between clustering effectiveness when using semi-supervised clustering approaches based on feedback and labels?
  • 30. Experimental Results Feedbacks vs. Labels 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 30 Q2. Are there major differences between clustering effectiveness when using semi-supervised clustering approaches based on feedback and labels?
  • 31. Experimental Results Feedback Window Size Variation 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 31 Q3. How the feedback window size variation affects semi-supervision information and clustering effectiveness?
  • 32. Experimental Results Efficiency 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 32 Q4. How efficient is our approach compared to existing semi-supervised approaches?
  • 33. Conclusion ■ Higher efficiency when compared to other semi-supervised approaches ■ While keeping an equivalent effectiveness. ■ Future works: ■ Exploring other types of semi-supervision information such as instance-level constraints. ■ Tackling other strategies for detecting transitions. 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 33
  • 34. References Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231. AAAI Press Colonna, J. G., Gama, J., and Nakamura, E. F. (2016). Recognizing Family, Genus, and Species of Anuran Using a Hierarchical Classification Approach. pages 198–212. Springer, Cham. Lai, H. P., Visani, M., Boucher, A., and Ogier, J.-M. (2014). A new interactive semi- supervised clustering model for large image database indexing. Pattern Recognition Letters, 37(1):94–106. Lelis, L. and Sander, J. (2009). Semi-supervised Density-Based Clustering. In IEEE ICDM, pages 842–847. Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., and Schult, R. (2006). MONIC: Modeling and Monitoring Cluster Transitions. In ACM SIGKDD, page 706, New York, NY, USA. ACM Press. Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for very Large Databases. ACM SIGMOD Record, 25(2):103–114. 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 34
  • 35. A Framework for Online Clustering Based on Evolving Semi-Supervision Guilherme Alves guilhermealves@ufu.br Maria Camila N. Barioni camila.barioni@ufu.br Elaine Faria elaine@ufu.br 6-Oct-17 32nd Brazilian Symposium on Databases (SBBD 2017) 35 Acknowledgments