Exploring Data
Data exploration is a preliminary examination of the data to better understand its characteristics. Key motivations include helping to select the right tool for preprocessing or analysis, and making use of humans’ ability to recognize patterns: people can recognize patterns that data analysis tools do not capture. Data exploration is related to Exploratory Data Analysis (EDA), a field created by the statistician John Tukey.
Contd…
In EDA as originally defined by Tukey, the focus was on visualization, and clustering and anomaly detection were viewed as exploratory techniques. In data mining, clustering and anomaly detection are major areas of interest and are not thought of as just exploratory. This discussion of data exploration focuses on summary statistics, visualization, and Online Analytical Processing (OLAP).
Iris Sample Data Set
Many of the exploratory data techniques are illustrated with the Iris Plant data set. It has three flower types (classes): Setosa, Virginica, and Versicolour. It has four non-class attributes: sepal width, sepal length, petal width, and petal length.
Summary Statistics
Summary statistics are numbers that summarize properties of the data. Summarized properties include frequency, location, and spread. For example, the mean is a measure of location and the standard deviation is a measure of spread.
Frequency and Mode
The frequency of an attribute value is the percentage of times the value occurs in the data set. For example, given the attribute ‘gender’ and a representative population of people, the gender ‘female’ occurs about 50% of the time. The mode of an attribute is the most frequent attribute value. The notions of frequency and mode are typically used with categorical data.
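As a minimal sketch of these two definitions, the Python snippet below computes relative frequencies and the mode using only the standard library. The species labels are made-up illustrative values, not the real Iris data, and the helper names `frequencies` and `mode` are hypothetical.

```python
from collections import Counter

def frequencies(values):
    """Return {value: fraction of the data set in which it occurs}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

# Made-up categorical attribute values, for illustration only.
species = ["setosa", "virginica", "setosa", "versicolour", "setosa", "virginica"]
freq = frequencies(species)
most_common_species = mode(species)
```

Multiplying each fraction by 100 gives the frequency as a percentage, as in the definition above.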
Percentiles
For continuous data, the notion of a percentile is more useful. Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value x_p of x such that p% of the observed values of x are less than x_p. For instance, the 50th percentile is the value x_50% such that 50% of all values of x are less than x_50%.
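There are several conventions for computing a percentile from a finite sample. The sketch below assumes the nearest-rank convention (the smallest observed value such that at least p% of the observations are less than or equal to it), which is one possible reading of the definition above and not necessarily the one the slides intend. The function name and the data are illustrative.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty sequence, 0 <= p <= 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # 1-based rank of the percentile within the sorted data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # illustrative values
p25 = percentile(data, 25)
p50 = percentile(data, 50)
p90 = percentile(data, 90)
```

Other conventions interpolate between neighbouring values, so results on small samples can differ slightly between tools.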
Measures of Location: Mean and Median
The mean is the most common measure of the location of a set of points. However, the mean is very sensitive to outliers, so the median or a trimmed mean is also commonly used.
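To illustrate the sensitivity to outliers, the following sketch compares the mean, the median, and a trimmed mean on a small made-up sample containing one extreme value. The `trimmed_mean` helper is hypothetical; it assumes that the same fraction of values is dropped from each end of the sorted data.

```python
from statistics import mean, median

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # values dropped per end
    return mean(ordered[k:len(ordered) - k])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # 1000 is an outlier

plain_mean = mean(data)              # pulled far upward by the outlier
middle = median(data)                # unaffected by the outlier
robust_mean = trimmed_mean(data, 0.1)  # drops the 1 and the 1000
```

Here the single outlier moves the mean to 104.5, while the median and the 10% trimmed mean both stay at 5.5.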
Measures of Spread: Range and Variance
The range is the difference between the maximum and minimum values. The variance, or its square root, the standard deviation, is the most common measure of the spread of a set of points.
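A short sketch of these measures using Python's `statistics` module follows. The data values are made up, and the choice between population variance (dividing by n) and sample variance (dividing by n - 1) is an assumption the slides do not settle, so both are shown.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

value_range = max(data) - min(data)   # max minus min
population_variance = pvariance(data)  # mean squared deviation (divide by n)
population_stdev = pstdev(data)        # square root of the population variance
sample_variance = variance(data)       # divide by n - 1 instead of n
```

For this sample the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.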
Visualization
Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported. Humans have a well-developed ability to analyze large amounts of information that is presented visually: they can detect general patterns and trends, and they can detect outliers and unusual patterns.
Visualization Techniques: Histogram
A histogram usually shows the distribution of values of a single variable. The values are divided into bins, and a bar plot shows the number of objects in each bin; the height of each bar indicates the number of objects. The shape of the histogram depends on the number of bins.
Contd…
Example: histogram of petal width with 10 bins.
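The binning step behind a histogram can be sketched in a few lines of Python. The version below assumes equal-width bins spanning the range of the data, which is a common choice but not the only one; the data values and the function name are illustrative.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin, not in a bin of its own.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts

data = [0, 1, 1, 2, 2, 2, 3, 8, 9, 10]  # illustrative values

coarse = histogram(data, 2)  # two wide bins
fine = histogram(data, 5)    # five narrower bins
```

With two bins the data looks like one large group and one small group; with five bins the empty region between 4 and 8 becomes visible, which is the sense in which the shape depends on the number of bins.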
Visualization Techniques: Box Plots
Box plots, invented by J. Tukey, are another way of displaying the distribution of data. The basic parts of a box plot, as labeled in the figure, are the outliers and the 90th, 75th, 50th, 25th, and 10th percentiles.
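The positions a box plot draws can be computed directly. The sketch below assumes the variant in which the box spans the 25th to 75th percentiles, the line inside the box is the 50th percentile, and the whiskers end at the 10th and 90th percentiles; other box plot conventions exist. It uses `statistics.quantiles` with the `inclusive` method, and the data and function name are illustrative.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the percentiles used to draw a 10/25/50/75/90 box plot."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles,
    # so the pth percentile is at index p - 1.
    cuts = quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in (10, 25, 50, 75, 90)}

data = list(range(101))  # illustrative values 0, 1, ..., 100
summary = box_plot_summary(data)
```

Under this convention, observations beyond the whisker ends would be the ones drawn individually as outliers.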
Visualization Techniques: Scatter Plots
In a scatter plot, attribute values determine the position of each point. Two-dimensional scatter plots are the most common, but three-dimensional scatter plots are also possible. Additional attributes can often be displayed by using the size, shape, and color of the markers that represent the objects. Arrays of scatter plots are useful because they can compactly summarize the relationships of several pairs of attributes.
Visualization Techniques: Contour Plots
Contour plots are useful when a continuous attribute is measured on a spatial grid. They partition the plane into regions of similar values; the contour lines that form the boundaries of these regions connect points with equal values. The most common example is a contour map of elevation.
Contour Plot Example: Sea Surface Temperature (SST), December 1998
Visualization Techniques: Matrix Plots
A matrix plot displays the data matrix directly. This can be useful when objects are sorted according to class. Typically, the attributes are normalized to prevent one attribute from dominating the plot. Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects. An example of a matrix plot follows.
Visualization of the Iris Data Matrix
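The normalization step mentioned for matrix plots can be sketched as follows. This version assumes z-score standardization of each attribute (subtract the column mean, divide by the column standard deviation); the slides do not say which normalization is used, and the data and function name are illustrative.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Rescale every column of a data matrix to mean 0 and std deviation 1."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    stdevs = [pstdev(column) for column in columns]
    if any(s == 0 for s in stdevs):
        raise ValueError("a constant column cannot be standardized")
    return [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]

# Two attributes on very different scales (made-up values).
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
normalized = standardize_columns(rows)
```

After standardization both columns cover the same numeric range, so neither attribute dominates the color scale of the plot.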
Visualization Techniques: Parallel Coordinates
Parallel coordinates are used to plot the attribute values of high-dimensional data. Instead of perpendicular axes, a set of parallel axes is used. The attribute values of each object are plotted as a point on each corresponding coordinate axis, and the points are connected by a line. Thus, each object is represented as a line.
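The mapping from an object to its line can be sketched without any plotting library. The snippet below turns each row into a list of (axis index, height) points; rescaling each attribute to the range 0 to 1 is an assumption made here so that all axes share a common height, and the data and function name are illustrative.

```python
def parallel_coordinates(rows):
    """Return one polyline per object as a list of (axis, height) points."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    lines = []
    for row in rows:
        line = []
        for axis, (value, lo, hi) in enumerate(zip(row, lows, highs)):
            # Min-max scaling; a constant attribute is drawn mid-axis.
            height = (value - lo) / (hi - lo) if hi > lo else 0.5
            line.append((axis, height))
        lines.append(line)
    return lines

# Three objects with four attributes each (made-up values).
rows = [
    [5.0, 3.0, 1.0, 0.0],
    [7.0, 3.0, 5.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
lines = parallel_coordinates(rows)
```

Passing each polyline to a plotting routine, with the axis index as the horizontal position, would draw the parallel coordinates plot.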
Other Visualization Techniques
Star plots take a similar approach to parallel coordinates, but the axes radiate from a central point. The line connecting the values of an object is a polygon.
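The polygon of a star plot can be computed by placing axis k at angle 2πk/n and moving outward by the attribute value. The sketch below assumes non-negative attribute values and evenly spaced axes; the function name and the data are illustrative.

```python
import math

def star_polygon(values):
    """Return the (x, y) vertices of the star plot polygon for one object."""
    n = len(values)
    vertices = []
    for k, value in enumerate(values):
        angle = 2 * math.pi * k / n  # direction of axis k
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices

# One object with four attributes (made-up values).
polygon = star_polygon([1.0, 2.0, 1.0, 2.0])
```

With four attributes the axes point right, up, left, and down, so this object becomes a diamond that is taller than it is wide.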
Contd…
Chernoff faces are an approach created by Herman Chernoff. The approach associates each attribute with a characteristic of a face, and the value of each attribute determines the appearance of the corresponding facial characteristic. Each object becomes a separate face. The approach relies on the human ability to distinguish faces.
OLAP
On-Line Analytical Processing (OLAP) was proposed by E. F. Codd, the father of the relational database. Relational databases put data into tables, while OLAP uses a multidimensional array representation. Such representations of data previously existed in statistics and other fields. A number of data analysis and data exploration operations are easier with such a data representation.
OLAP Operations: Data Cube
The key operation of OLAP is the formation of a data cube. A data cube is a multidimensional representation of data, together with all possible aggregates. By all possible aggregates, we mean the aggregates that result from selecting a proper subset of the dimensions and summing over all remaining dimensions. For example, if we choose the species type dimension of the Iris data and sum over all other dimensions, the result is a one-dimensional array with three entries, each of which gives the number of flowers of one type.
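A data cube over a small table can be sketched by enumerating every subset of the dimensions and summing a measure over the rest. The records below are made-up counts, not the real Iris data, and the function and field names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(records, dimensions, measure):
    """Map every subset of dimensions to its table of summed measures."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            totals = defaultdict(int)
            for record in records:
                key = tuple(record[d] for d in kept)
                totals[key] += record[measure]  # sum over dropped dimensions
            cube[kept] = dict(totals)
    return cube

# Made-up flower counts by species and petal-width category.
records = [
    {"species": "setosa", "petal_width": "low", "count": 50},
    {"species": "versicolour", "petal_width": "medium", "count": 48},
    {"species": "versicolour", "petal_width": "high", "count": 2},
    {"species": "virginica", "petal_width": "medium", "count": 4},
    {"species": "virginica", "petal_width": "high", "count": 46},
]
cube = data_cube(records, ("species", "petal_width"), "count")
by_species = cube[("species",)]
grand_total = cube[()][()]
```

The entry for the full set of dimensions holds the original cells, the entries for proper subsets hold the aggregates, and the empty subset holds the grand total. `by_species` is the one-dimensional array with three entries described above.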
OLAP Operations: Slicing and Dicing
Slicing is selecting a group of cells from the entire multidimensional array by specifying a value for one or more dimensions. Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array. In practice, both operations can also be accompanied by aggregation over some dimensions.
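Slicing and dicing can be sketched on a multidimensional array stored as a dictionary from coordinate tuples to values. The cell layout (product ID, month), the sales figures, and the function names are all assumptions made for illustration.

```python
def slice_cells(cells, dim_index, value):
    """Keep the cells whose coordinate on one dimension equals value."""
    return {key: v for key, v in cells.items() if key[dim_index] == value}

def dice_cells(cells, ranges):
    """Keep the cells inside inclusive ranges given as {dim_index: (low, high)}."""
    return {
        key: v
        for key, v in cells.items()
        if all(low <= key[i] <= high for i, (low, high) in ranges.items())
    }

# Made-up sales keyed by (product_id, month).
cells = {
    (1, 1): 10, (1, 2): 12, (1, 3): 9,
    (2, 1): 5, (2, 2): 7, (2, 3): 11,
}

product_1 = slice_cells(cells, 0, 1)        # slice: product_id == 1
later_months = dice_cells(cells, {1: (2, 3)})  # dice: months 2 to 3
product_1_total = sum(product_1.values())   # slice followed by aggregation
```

The last line shows a slice accompanied by aggregation: summing the sliced cells over the month dimension gives total sales for one product.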
OLAP Operations: Roll-up and Drill-down
Attribute values often have a hierarchical structure. Each date is associated with a year, month, and week. A location is associated with a continent, country, state (province, etc.), and city. Products can be divided into various categories, such as clothing, electronics, and furniture.
Contd…
Note that these categories often nest and form a tree or lattice. A year contains months, which contain days. A country contains states, which contain cities.
Contd…
This hierarchical structure gives rise to the roll-up and drill-down operations. For sales data, we can aggregate (roll up) the sales across all the dates in a month. Conversely, given a view of the data where the time dimension is broken into months, we could split the monthly sales totals (drill down) into daily sales totals. Likewise, we can drill down or roll up on the location or product ID attributes.
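The two operations on the time dimension can be sketched as follows. The daily sales figures are made up, and the function names are hypothetical; the same pattern would apply to a location or product hierarchy.

```python
from collections import defaultdict
from datetime import date

def roll_up_to_month(daily_sales):
    """Aggregate daily sales into totals keyed by (year, month)."""
    monthly = defaultdict(int)
    for day, amount in daily_sales.items():
        monthly[(day.year, day.month)] += amount
    return dict(monthly)

def drill_down(daily_sales, year, month):
    """Split one monthly total back into its daily sales figures."""
    return {
        day: amount
        for day, amount in daily_sales.items()
        if (day.year, day.month) == (year, month)
    }

# Made-up daily sales totals.
daily_sales = {
    date(2024, 1, 30): 100,
    date(2024, 1, 31): 150,
    date(2024, 2, 1): 80,
    date(2024, 2, 2): 120,
}

monthly_sales = roll_up_to_month(daily_sales)
february_days = drill_down(daily_sales, 2024, 2)
```

Drilling down requires that the finer-grained data is still available, which is why the sketch keeps the daily figures and derives the monthly view from them.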
Exploring Data

  • 1. Exploring Data. A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include helping to select the right tool for preprocessing or analysis, and making use of humans’ abilities to recognize patterns; people can recognize patterns not captured by data analysis tools. Data exploration is related to the area of Exploratory Data Analysis (EDA), created by statistician John Tukey.
  • 2. Contd… In EDA, as originally defined by Tukey, the focus was on visualization, and clustering and anomaly detection were viewed as exploratory techniques. In data mining, clustering and anomaly detection are major areas of interest, and are not thought of as just exploratory. In our discussion of data exploration, we focus on summary statistics, visualization, and Online Analytical Processing (OLAP).
  • 3. Iris Sample Data Set. Many of the exploratory data techniques are illustrated with the Iris Plant data set. Three flower types (classes): Setosa, Virginica, Versicolour. Four (non-class) attributes: sepal width and length, petal width and length.
  • 4. Summary Statistics. Summary statistics are numbers that summarize properties of the data. Summarized properties include frequency, location, and spread. Examples: location (mean), spread (standard deviation).
  • 5. Frequency and Mode. The frequency of an attribute value is the percentage of the time the value occurs in the data set. For example, given the attribute ‘gender’ and a representative population of people, the gender ‘female’ occurs about 50% of the time. The mode of an attribute is the most frequent attribute value. The notions of frequency and mode are typically used with categorical data.
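As a minimal sketch of the frequency and mode definitions on slide 5, the following Python snippet counts how often each value of a categorical attribute occurs. The list `species` is a made-up sample for illustration, not the actual Iris data, and the function names are hypothetical.

```python
from collections import Counter

def frequencies(values):
    """Map each distinct value to the fraction of records in which it occurs."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical sample of a categorical attribute (not the real Iris data)
species = ["setosa", "versicolour", "setosa", "virginica", "setosa", "versicolour"]

freq = frequencies(species)
most_common_species = mode(species)
```

Multiplying each fraction by 100 gives the percentage form used on the slide.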
  • 6. Percentiles. For continuous data, the notion of a percentile is more useful. Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value x_p of x such that p% of the observed values of x are less than x_p. For instance, the 50th percentile is the value x_50 such that 50% of all values of x are less than x_50.
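The percentile definition on slide 6 can be sketched in Python as follows. This uses the nearest-rank convention, which is only one of several common ways to compute percentiles on a finite sample (it returns the smallest observed value with at least p% of the data at or below it); other tools interpolate and may give slightly different numbers. The function name and the data are assumptions made for illustration.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank convention."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value with at least p% of the data at or below it
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical observations of a continuous attribute
data = [7, 1, 5, 3, 9, 2, 8, 4, 10, 6]

median_value = percentile_nearest_rank(data, 50)
p90 = percentile_nearest_rank(data, 90)
```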
  • 7. Measures of Location: Mean and Median. The mean is the most common measure of the location of a set of points. However, the mean is very sensitive to outliers. Thus, the median or a trimmed mean is also commonly used.
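To illustrate why the mean is sensitive to outliers, here is a small Python sketch that compares the mean, the median, and a trimmed mean on a made-up sample containing one extreme value. The `trimmed_mean` helper and its `trim_fraction` parameter are hypothetical names; the convention assumed here drops the same fraction of values from each end of the sorted data.

```python
import statistics

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the sorted values from each end."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number of values dropped per end
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# Hypothetical sample with a single outlier (100)
data = [1, 2, 3, 4, 100]

mean_value = statistics.mean(data)      # pulled upward by the outlier
median_value = statistics.median(data)  # unaffected by the outlier
trimmed = trimmed_mean(data, 0.2)       # drops 1 and 100, averages the rest
```

On this sample the mean is 22, while the median and the 20% trimmed mean are both 3.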
  • 8. Measures of Spread: Range and Variance. Range is the difference between the max and min. The variance or standard deviation is the most common measure of the spread of a set of points.
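The measures of spread on slide 8 can be computed with Python's standard `statistics` module, as in this short sketch. The data are made up, and both the population and the sample variance are shown because the slide does not say which one is meant.

```python
import statistics

# Hypothetical sample of a continuous attribute
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)                # difference between max and min
population_variance = statistics.pvariance(data)   # divides by n
sample_variance = statistics.variance(data)        # divides by n - 1
population_std = statistics.pstdev(data)           # square root of the population variance
```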
  • 9. Visualization. Visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported. Humans have a well-developed ability to analyze large amounts of information that is presented visually: they can detect general patterns and trends, as well as outliers and unusual patterns.
  • 10. Visualization Techniques: Histogram. A histogram usually shows the distribution of values of a single variable. Divide the values into bins and show a bar plot of the number of objects in each bin; the height of each bar indicates the number of objects. The shape of the histogram depends on the number of bins.
  • 11. Contd… Example: histogram of petal width with 10 bins.
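The binning step behind a histogram can be sketched without any plotting library. The function below is a hypothetical helper that assumes equal-width bins between the minimum and maximum value, with the maximum placed in the last bin; the data are made up to show how the counts change with the number of bins.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical: put everything in the first bin
        return [len(values)] + [0] * (num_bins - 1)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical measurements
measurements = list(range(11))  # 0, 1, ..., 10

five_bins = histogram_counts(measurements, 5)
two_bins = histogram_counts(measurements, 2)
```

The same data produce differently shaped histograms for 5 and 2 bins, which is the dependence on bin count noted on slide 10.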
  • 12. Visualization Techniques: Box Plots. Invented by J. Tukey. Another way of displaying the distribution of data. [Figure: the basic parts of a box plot, labeled outlier, 90th percentile, 75th percentile, 50th percentile, 25th percentile, 10th percentile.]
  • 13. Visualization Techniques: Scatter Plots. In a scatter plot, attribute values determine the position of each point. Two-dimensional scatter plots are most common, but three-dimensional scatter plots are also possible. Often additional attributes can be displayed by using the size, shape, and color of the markers that represent the objects. Arrays of scatter plots are useful because they can compactly summarize the relationships of several pairs of attributes.
  • 15. Visualization Techniques: Contour Plots. Contour plots are useful when a continuous attribute is measured on a spatial grid. They partition the plane into regions of similar values. The contour lines that form the boundaries of these regions connect points with equal values. The most common example is contour maps of elevation.
  • 16. Contour Plot Example: SST Dec, 1998
  • 17. Visualization Techniques: Matrix Plots. Matrix plots can plot the data matrix. This can be useful when objects are sorted according to class. Typically, the attributes are normalized to prevent one attribute from dominating the plot. Plots of similarity or distance matrices can also be useful for visualizing the relationships between objects. Examples of matrix plots are p
  • 18. Visualization of the Iris Data Matrix
  • 19. Visualization Techniques: Parallel Coordinates. Parallel coordinates are used to plot the attribute values of high-dimensional data. Instead of using perpendicular axes, use a set of parallel axes. The attribute values of each object are plotted as a point on each corresponding coordinate axis, and the points are connected by a line. Thus, each object is represented as a line.
  • 20. Other Visualization Techniques: Star Plots. A similar approach to parallel coordinates, but the axes radiate from a central point. The line connecting the values of an object is a polygon.
  • 21. Contd… Chernoff Faces. An approach created by Herman Chernoff. This approach associates each attribute with a characteristic of a face. The values of each attribute determine the appearance of the corresponding facial characteristic. Each object becomes a separate face. Relies on humans’ ability to distinguish faces.
  • 22. OLAP. On-Line Analytical Processing (OLAP) was proposed by E. F. Codd, the father of the relational database. Relational databases put data into tables, while OLAP uses a multidimensional array representation. Such representations of data previously existed in statistics and other fields. There are a number of data analysis and data exploration operations that are easier with such a data representation.
  • 23. OLAP Operations: Data Cube. The key operation of OLAP is the formation of a data cube. A data cube is a multidimensional representation of data, together with all possible aggregates. By all possible aggregates, we mean the aggregates that result from selecting a proper subset of the dimensions and summing over all remaining dimensions. For example, if we choose the species type dimension of the Iris data and sum over all other dimensions, the result will be a one-dimensional array with three entries, each of which gives the number of flowers of that type.
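The data cube idea on slide 23 can be sketched in plain Python: for every subset of the dimensions, keep that subset and sum the counts over the remaining dimensions. The cell counts, the discretized petal categories, and the `data_cube` function below are all hypothetical and only illustrate the aggregation; they are not the real Iris counts or a real OLAP engine.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(facts, dimensions):
    """Compute the aggregate for every subset of dimensions.

    facts is a list of (coordinates, count) pairs, where coordinates is a tuple
    aligned with dimensions. The result maps each kept-dimension tuple to a dict
    from coordinate tuples to summed counts.
    """
    index_of = {name: i for i, name in enumerate(dimensions)}
    cube = {}
    # Sizes 0..n-1 are the aggregates; size n is the original array itself
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            totals = defaultdict(int)
            for coords, count in facts:
                key = tuple(coords[index_of[d]] for d in kept)
                totals[key] += count  # sums over all dimensions not in kept
            cube[kept] = dict(totals)
    return cube

# Hypothetical counts of flowers per (species, petal_length, petal_width) cell
dimensions = ("species", "petal_length", "petal_width")
facts = [
    (("setosa", "short", "narrow"), 46),
    (("setosa", "short", "medium"), 4),
    (("versicolour", "medium", "medium"), 45),
    (("versicolour", "long", "medium"), 5),
    (("virginica", "long", "wide"), 47),
    (("virginica", "long", "medium"), 3),
]

cube = data_cube(facts, dimensions)
species_totals = cube[("species",)]  # one-dimensional array with three entries
grand_total = cube[()][()]           # sum over every dimension
```

With three dimensions there are 2^3 = 8 groupings in total, including the original array and the grand total.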
  • 24. OLAP Operations: Slicing and Dicing. Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions. Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array. In practice, both operations can also be accompanied by aggregation over some dimensions.
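Slicing and dicing can be sketched on a multidimensional array stored as a Python dict keyed by coordinate tuples. The dimension names, the sales figures, and the helper functions are assumptions made for illustration: `slice_cells` fixes one dimension to a single value, and `dice_cells` keeps the cells whose coordinates fall in the allowed values for each listed dimension.

```python
# Hypothetical dimensions of a small sales array
DIMENSIONS = ("product", "location", "month")

# Hypothetical cells: (product, location, month) -> units sold
cells = {
    ("shirt", "Paris", 1): 10,
    ("shirt", "Paris", 2): 12,
    ("shirt", "Tokyo", 1): 7,
    ("lamp", "Paris", 3): 4,
    ("lamp", "Tokyo", 2): 9,
}

def slice_cells(cells, dimension, value):
    """Slice: keep the cells where one dimension equals a specific value."""
    axis = DIMENSIONS.index(dimension)
    return {coords: v for coords, v in cells.items() if coords[axis] == value}

def dice_cells(cells, allowed):
    """Dice: keep the cells whose coordinates lie in the allowed values per dimension."""
    def keep(coords):
        return all(coords[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    return {coords: v for coords, v in cells.items() if keep(coords)}

paris = slice_cells(cells, "location", "Paris")
early_shirts = dice_cells(cells, {"product": {"shirt"}, "month": range(1, 3)})

# Either selection can be followed by aggregation, here a simple sum
paris_total = sum(paris.values())
```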
  • 25. OLAP Operations: Roll-up and Drill-down. Attribute values often have a hierarchical structure. Each date is associated with a year, month, and week. A location is associated with a continent, country, state (province, etc.), and city. Products can be divided into various categories, such as clothing, electronics, and furniture.
  • 26. Contd… Note that these categories often nest and form a tree or lattice. A year contains months, which contain days. A country contains states, which contain cities.
  • 27. Contd… This hierarchical structure gives rise to the roll-up and drill-down operations. For sales data, we can aggregate (roll up) the sales across all the dates in a month. Conversely, given a view of the data where the time dimension is broken into months, we could split the monthly sales totals (drill down) into daily sales totals. Likewise, we can drill down or roll up on the location or product ID attributes.
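The roll-up and drill-down operations on the time dimension can be sketched as follows. The daily sales figures and the function names are made up for illustration; rolling up sums daily sales into monthly totals, and drilling down goes back from a month to its daily values, which is only possible when the finer-grained data are still available.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales totals
daily_sales = {
    date(2024, 1, 5): 100,
    date(2024, 1, 20): 150,
    date(2024, 2, 3): 80,
    date(2024, 2, 14): 120,
    date(2024, 2, 28): 60,
}

def roll_up_to_month(daily):
    """Roll up: aggregate daily sales into (year, month) totals."""
    monthly = defaultdict(int)
    for day, amount in daily.items():
        monthly[(day.year, day.month)] += amount
    return dict(monthly)

def drill_down(daily, year, month):
    """Drill down: split one month's total back into its daily sales."""
    return {day: amount for day, amount in daily.items()
            if (day.year, day.month) == (year, month)}

monthly_sales = roll_up_to_month(daily_sales)
february_days = drill_down(daily_sales, 2024, 2)
```

The same pattern applies to the location and product hierarchies, with the grouping key replaced by, for example, a country or a product category.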
  • 28. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net