Data analytics in computer networking


Stenio Fernandes

Applied Exploratory Data Analysis - Computer Networking


Data Analytics in Computer Networking
The Case for Exploratory Data Analysis
Stenio Fernandes
Carleton University / CIn-UFPE
March 2016

Outline
Data Analysis - background
EDA basics
Applied EDA (Examples: WiFi simulated data)
Q&A
References

Data Science Pipeline
Elements of Reproducible Research
• Analytic Data
• Analytic Code
• Documentation
• Distribution
Source: Report Writing for Data Science in R, Roger D. Peng, 2016

Epicycle of Analysis
1. Stating and refining the question
2. Exploring the data
3. Building formal statistical models
4. Interpreting the results
5. Communicating the results
Source: The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016

• Descriptive: summarizes the measurements in a single data set without further interpretation
• Exploratory: searches for discoveries, trends, correlations, or relationships between multiple variables to generate ideas or hypotheses
• Inferential: quantifies whether an observed pattern will likely hold beyond the data set in hand
• Predictive: uses a subset of measurements (the features) to predict another measurement (the outcome)
• Causal: asks what happens to one measurement if you make another measurement change
• Deterministic: changing one measurement always and exclusively leads to a specific, deterministic behavior in another
Source: The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015
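
As a rough illustration of the first two categories above, the sketch below summarizes a single variable (descriptive) and then checks how two variables move together (exploratory). It uses Python's standard library as a stand-in for the R/ggplot2 tooling the talk mentions, and the distance and rate samples are made-up values, not data from the slides.

```python
import math
import statistics

# Hypothetical samples (not from the slides): distance from the AP in metres
# and the measured rate in Mbps for the same six observations.
distance_m = [10, 20, 30, 40, 50, 60]
rate_mbps = [54.0, 48.0, 36.0, 24.0, 18.0, 12.0]


def pearson(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Descriptive: summarize one variable without interpreting it.
mean_rate = statistics.fmean(rate_mbps)
median_rate = statistics.median(rate_mbps)

# Exploratory: look for a relationship between two variables.
r = pearson(distance_m, rate_mbps)

print(f"mean={mean_rate:.1f} Mbps, median={median_rate:.1f} Mbps, r={r:.3f}")
```

A strongly negative r here would only suggest a hypothesis (rate drops with distance); whether it holds beyond this sample is an inferential question.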

Why use EDA - Summary
NIST:
• Maximize insight into a data set
• Uncover underlying structure
• Extract important variables
• Detect outliers and anomalies
• Test underlying assumptions
• Develop parsimonious models
• Determine optimal factor settings
Johns Hopkins University:
• Show comparisons
• Show causality, mechanism, explanation
• Show multivariate data
• Integrate multiple modes of evidence
• Describe and document the evidence
• Content is king

Answer to initial questions
What is a typical value for a certain feature?
What is the uncertainty for a typical value of a feature?
What is a good distributional fit for a feature?
What is the percentile distribution?
Does modifying one variable have an effect on another variable?
Does a factor have an effect on performance metrics?
What are the most important factors?
What is the best function for relating a response variable to other variables?
What are the best settings for factors (i.e. levels)?
Can we separate signal from noise?
Can we extract any structure from multivariate data?
Does the data have outliers?
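
Several of these questions (typical value, its uncertainty, the percentile distribution, outliers) can be answered with a few summary statistics. The following is a minimal sketch using Python's standard library; the rate samples are hypothetical, and the 1.5 x IQR outlier rule is one common convention rather than something the slides prescribe.

```python
import statistics

# Hypothetical rate measurements in Mbps (not from the slides).
rates = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 6.8, 19.0]

# Typical value: the median is robust to the one extreme reading.
typical = statistics.median(rates)

# Uncertainty: standard error of the mean, one simple measure of spread
# in an estimate of the typical value.
std_err = statistics.stdev(rates) / len(rates) ** 0.5

# Percentile distribution: quartile cut points (Q1, Q2, Q3).
q1, q2, q3 = statistics.quantiles(rates, n=4)

# Outliers: Tukey's rule flags points beyond 1.5 * IQR from the quartiles.
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in rates if x < low or x > high]

print(f"median={typical:.2f}, std_err={std_err:.2f}")
print(f"quartiles=({q1:.2f}, {q2:.2f}, {q3:.2f}), outliers={outliers}")
```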

EDA Graphs
Understand data properties
Find patterns in data
Suggest modeling strategies
Debug analyses
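
A graph for EDA does not need to be polished to be useful. As a sketch of that idea, the snippet below draws a text histogram with Python's standard library instead of R/ggplot2; the `text_histogram` helper and the rate samples are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical rate samples in Mbps (not from the slides).
rates = [5.1, 5.4, 5.6, 5.8, 6.0, 6.1, 6.3, 6.5, 6.8, 19.0]


def text_histogram(values, bin_width):
    """Return one line per non-empty bin: left edge and a bar of '#' marks."""
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in sorted(counts):
        left = b * bin_width
        lines.append(f"{left:6.1f} | {'#' * counts[b]}")
    return lines


hist = text_histogram(rates, bin_width=1.0)
print("\n".join(hist))
```

Even this crude view shows a data property (most samples sit between 5 and 7 Mbps) and one isolated bin that may be worth debugging.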

Applied EDA
Using R/ggplot2
(mpg dataset -> fake WiFi dataset)

Practical Steps
• Before performing any measurements or simulations
  • Identify
    • Performance Metrics
    • Performance Factors and Levels
  • Caution: sometimes you have to guess the ranges for the levels
    • Use an educated guess
• Don’t run tons of simulations / experiments (as previously discussed)
• Plot quick and dirty graphs
  • No need for titles, labels
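
Listing factors and levels up front also shows how quickly the number of runs grows. The sketch below enumerates a full factorial design in Python; the three factors and their levels are borrowed from the simulated WiFi data in this deck, while the dictionary keys and the `replications` count are assumptions made for illustration.

```python
from itertools import product

# Factors and levels fixed before any run; key names are illustrative.
factors = {
    "distance_m": [50, 100],
    "ber": [4, 5, 6, 8],
    "user_type": ["4", "f", "r"],
}

# Full factorial design: every combination of levels is one experiment.
design = [dict(zip(factors, combo)) for combo in product(*factors.values())]

replications = 5  # assumed number of repeated runs per combination
total_runs = len(design) * replications

print(f"{len(design)} combinations x {replications} replications = {total_runs} runs")
```

Adding one more factor multiplies the count by its number of levels, which is a practical reason to explore a small design first.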

Some examples of EDA Graphs - WiFi Data (simulated)
Features (Observation Variables):
• "Vendor" - factor / levels: LinkSys, …
• "Model" - factor / levels: GST200, …
• "Users_Max_Rate" - factor (background traffic) / levels: 1.6, 1.8, …, 7.0 Mbps
• "Year" - factor / levels: 1999, 2008
• "BER" - factor / levels: 4, 5, 6, and 8
• "Type" - factor (type of user) / levels: 4, f, r
• "Rate" - performance metric (Mbps)
• "Distance" - factor (distance from the AP) / levels: 50, 100 m
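
One way to work with such a feature table is to hold each observation as a record and average the performance metric per factor level. The sketch below does this in Python; the `Observation` class, the `mean_rate_by` helper, and all row values are hypothetical, and only the column names and levels follow the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Observation:
    vendor: str
    model: str
    users_max_rate: float  # background traffic, Mbps
    year: int
    ber: int
    user_type: str
    rate: float  # performance metric, Mbps
    distance: int  # metres from the AP


# Made-up rows for illustration; they are not the simulated data from the talk.
rows = [
    Observation("LinkSys", "GST200", 1.8, 1999, 4, "f", 29.0, 50),
    Observation("LinkSys", "GST200", 1.8, 2008, 4, "f", 31.0, 100),
    Observation("LinkSys", "GST200", 7.0, 2008, 8, "r", 17.0, 50),
    Observation("LinkSys", "GST200", 7.0, 1999, 8, "r", 15.0, 100),
]


def mean_rate_by(rows, factor):
    """Average the Rate metric for each level of the given factor."""
    groups = defaultdict(list)
    for row in rows:
        groups[getattr(row, factor)].append(row.rate)
    return {level: fmean(values) for level, values in groups.items()}


print(mean_rate_by(rows, "ber"))
print(mean_rate_by(rows, "distance"))
```

Comparing the per-level means across factors is a quick first check of which factors appear to affect the metric.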

References
• NIST/SEMATECH Engineering Statistics Handbook (online)
• Report Writing for Data Science in R, Roger D. Peng, 2016
• The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016
• The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015



Editor's Notes

Left figure: Report Writing for Data Science in R, Roger D. Peng, 2016
Left figure: The Art of Data Science, A Guide for Anyone Who Works with Data, Roger D. Peng and Elizabeth Matsui, 2016
Figure and Text: The Elements of Data Analytic Style, A guide for people who want to analyze data, Jeff Leek, 2015
