Python Visualisation for Data Science


This document discusses data visualization libraries for data science in Python. It outlines the data science pipeline and how visualization fits in at each step. Popular Python visualization libraries like Matplotlib, Pandas, ggplot, Altair, Seaborn, Plotly, Bokeh, and HoloViews are presented. Guidance is provided on choosing a library based on ease of use, functionality, and support. Examples demonstrate basic plotting with Pandas and adding annotations with Matplotlib, as well as using Altair for grammar-based visualization. Interactivity options with libraries like Bokeh and Plotly are also briefly covered.

Data Vis for Data Science
Usage of Python Visualisation Libraries
Amit Kapoor
@amitkaps

Data Science Pipeline
— Frame: Problem definition
— Acquire: Data ingestion
— Refine: Data wrangling
— Transform: Feature creation
— Explore: Feature selection
— Model: Model creation & assessment
— Insight: Solution communication

Role of Visualisation
— Frame: Structuring (issue tree, hypotheses)
— Acquire: Loading (progress, errors)
— Refine: Profiling (missing values, outliers)
— Transform: Univariate & Bivariate Vis (1D, 2D)
— Explore: Multi Dimensional Vis (3D ... ND)
— Model: Model Vis (predictions, errors, models)
— Insight: Vis Comm (chart, narrative, dashboard)
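
As a rough illustration of the Refine step above, profiling can be as simple as charting missing-value counts and a box plot. This is only a sketch: the DataFrame and its column names are hypothetical toy data, not from the talk.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps and one implausible age
df = pd.DataFrame({
    "age": [23, 35, np.nan, 41, 29, 120],
    "income": [40.0, np.nan, np.nan, 58.5, 47.0, 52.0],
    "city": ["Pune", "Delhi", None, "Delhi", "Pune", "Pune"],
})

fig, (ax_missing, ax_box) = plt.subplots(1, 2, figsize=(9, 4))

# Missing values: count of NA entries per column as a bar chart
missing = df.isna().sum()
missing.plot(kind="bar", ax=ax_missing, title="Missing values per column")

# Outliers: a box plot makes the age of 120 stand out
df[["age"]].plot(kind="box", ax=ax_box, title="Age distribution")

fig.tight_layout()
```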

Understanding Visualisation
— Domain & Task Layer e.g. Tabular Data for EDA
— Data Layer e.g. Data Types, Transformation
— Visual Layer e.g. Encoding, Marks, Coordinate
— Annotation Layer e.g. Labels, Ticks, Titles
— Interaction Layer e.g. Filtering, Highlighting, Selection
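
One way to read these layers is to label the lines of a small Matplotlib script by the layer they belong to. The sketch below is an interpretation, not the author's code, and the data values are hypothetical. The interaction layer is left out because a static backend does not provide it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Data layer: hypothetical values, one number per year
years = [2011, 2012, 2013, 2014, 2015]
values = [10.0, 12.5, 11.0, 15.0, 18.5]

fig, ax = plt.subplots()

# Visual layer: year is encoded as x position, value as y position,
# and the mark is a line with point markers on Cartesian coordinates
(line,) = ax.plot(years, values, marker="o")

# Annotation layer: labels, ticks and title
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_xticks(years)
ax.set_title("Layers of a chart")
```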

Python Visualisation Libraries
— Matplotlib
— Pandas built-in plotting
— ggpy
— Altair
— Seaborn
— Plotly
— Bokeh
— HoloViews
— VisPy
— Lightning
— pygg

Choosing a Visualisation Library
— Ease of Learning: How hard is the API?
— Coverage: How many graphic types can it cover?
— Approach: Is it Charting or Grammar based?
— Documentation: How easy is it to make basic graphs?
— Community Support: How hard is it to make complex graphs?

Notes in Circulation
year | type | denom | value | money | number |
------- | -------| ------ | ------ | ------- | ------ |
1977 | Notes | 0001 | 1 | 2.72 | 2.720 |
1977 | Notes | 1000 | 1000 | 0.55 | 0.001 |
1977 | Notes | 0002 | 2 | 1.48 | 0.740 |
1977 | Notes | 0050 | 50 | 9.95 | 0.199 |
... | ... | ... | ... | ... | ... |
2015 | Notes | 0500 | 500 | 7853.75 | 15.708 |
2015 | Notes | 0001 | 1 | 3.09 | 3.090 |
2015 | Notes | 0010 | 10 | 320.15 | 32.015 |
2015 | Notes | 1000 | 1000 | 6325.68 | 6.326 |
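
For experimenting without the original file, the eight rows visible above can be typed into a DataFrame. This is a sketch: the elided rows are left out, and keeping denom as a zero-padded string is an assumption based on how it is printed. In these rows, number appears to equal money divided by value, rounded to three decimals.

```python
import pandas as pd

# Only the rows shown in the table; the elided rows are not reconstructed.
# denom is kept as a zero-padded string, as printed (an assumption).
notes_sample = pd.DataFrame(
    [
        (1977, "Notes", "0001", 1, 2.72, 2.720),
        (1977, "Notes", "1000", 1000, 0.55, 0.001),
        (1977, "Notes", "0002", 2, 1.48, 0.740),
        (1977, "Notes", "0050", 50, 9.95, 0.199),
        (2015, "Notes", "0500", 500, 7853.75, 15.708),
        (2015, "Notes", "0001", 1, 3.09, 3.090),
        (2015, "Notes", "0010", 10, 320.15, 32.015),
        (2015, "Notes", "1000", 1000, 6325.68, 6.326),
    ],
    columns=["year", "type", "denom", "value", "money", "number"],
)

# Difference between money / value and the printed number column
gap = (notes_sample["money"] / notes_sample["value"] - notes_sample["number"]).abs()
print(gap.max())
```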

Use Pandas for Base Plotting
# Loading Data
import pandas as pd
notes = pd.read_csv('notes.csv')
# Data Transformation
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")
# Plotting
notes_wide.plot(kind="line")
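
To see what the pivot does, here is a self-contained sketch that replaces notes.csv with four rows from the table, read from an in-memory string. The CSV layout is an assumption based on the table's column names. The dtype argument is an addition: without it, read_csv would parse 0001 as the integer 1.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Stand-in for notes.csv (assumed layout), using four rows from the table
csv_text = """year,type,denom,value,money,number
1977,Notes,0001,1,2.72,2.720
1977,Notes,1000,1000,0.55,0.001
2015,Notes,0001,1,3.09,3.090
2015,Notes,1000,1000,6325.68,6.326
"""

# Keep denom as text so the zero padding survives
notes = pd.read_csv(io.StringIO(csv_text), dtype={"denom": str})

# Long to wide: one row per year, one column per denomination
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")
print(notes_wide)

# Each column of the wide frame becomes one line
ax = notes_wide.plot(kind="line")
```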

Use Matplotlib for Annotation
# Basic Styling
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (9,6)
plt.style.use('ggplot')
# Plotting
notes_wide.plot(kind="line")
# Adding Annotation
plt.ylabel('Value INR Bns')
plt.title('Notes in Circulation')
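
Beyond axis labels and titles, Matplotlib can also point at a single data value. The sketch below uses the Axes object returned by the Pandas plot call and adds an arrow annotation. The wide frame holds only four values from the table, and the annotation text and its position are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Small wide-format frame built from four values in the table
notes_wide = pd.DataFrame(
    {"0001": [2.72, 3.09], "1000": [0.55, 6325.68]},
    index=pd.Index([1977, 2015], name="year"),
)

# Pandas returns a Matplotlib Axes, which can then be annotated
ax = notes_wide.plot(kind="line", figsize=(9, 6))
ax.set_ylabel("Value INR Bns")
ax.set_title("Notes in Circulation")

# Arrow pointing at the 2015 value of the 1000 denomination
label = ax.annotate(
    "1000 notes: 6325.68",
    xy=(2015, 6325.68),       # the data point being annotated
    xytext=(1990, 5000),      # where the text is placed
    arrowprops={"arrowstyle": "->"},
)
```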

Ideally use ggplot like R
from ggplot import *
ggplot(notes, aes(x='year',
                  y='money',
                  color='denom')) + \
    geom_line()

Use Altair for Grammar Visualisation
from altair import Chart
Chart(notes).mark_line().encode(
    x='year:N',
    y='money',
    color='denom'
)
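
Altair compiles a chart like this into a Vega-Lite JSON specification. The sketch below writes the core of such a spec by hand as a plain dictionary, to show what "grammar based" means: data, a mark, and encodings from fields to channels. The helper name is hypothetical. The ':N' shorthand maps to the nominal type. The types for money and denom are assumed here, since Altair would infer them from the DataFrame.

```python
import json


def vega_lite_line_spec(records):
    """Hypothetical helper: core of a Vega-Lite line chart spec as a dict."""
    return {
        "data": {"values": records},
        "mark": "line",
        "encoding": {
            "x": {"field": "year", "type": "nominal"},        # 'year:N'
            "y": {"field": "money", "type": "quantitative"},  # assumed type
            "color": {"field": "denom", "type": "nominal"},   # assumed type
        },
    }


# Four rows from the table, as plain records
records = [
    {"year": 1977, "denom": "0001", "money": 2.72},
    {"year": 1977, "denom": "1000", "money": 0.55},
    {"year": 2015, "denom": "0001", "money": 3.09},
    {"year": 2015, "denom": "1000", "money": 6325.68},
]

spec = vega_lite_line_spec(records)
print(json.dumps(spec["encoding"], indent=2))
```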

Personal Usage
— Use Pandas for base plotting and time series
— Use Matplotlib for matrices and customisation
— Use Seaborn for 1D & 2D statistical graphs, especially for categorical variables
— Use IPython Widgets for model interaction
— Use Datashader for Big Data Visualisation
— Experimenting with Altair
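
As one possible reading of "Matplotlib for matrices", a year-by-denomination matrix can be drawn as a heatmap with imshow. This is a sketch using four values from the table. The log scale and the colour map are illustrative choices, not from the talk.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Rows are years, columns are denominations (four values from the table)
years = [1977, 2015]
denoms = ["0001", "1000"]
money = np.array([[2.72, 0.55],
                  [3.09, 6325.68]])

fig, ax = plt.subplots()

# Log scale so the small values are not washed out by the largest one
image = ax.imshow(np.log10(money), cmap="viridis")

# Customisation: label each cell row and column explicitly
ax.set_xticks(list(range(len(denoms))))
ax.set_xticklabels(denoms)
ax.set_yticks(list(range(len(years))))
ax.set_yticklabels([str(y) for y in years])
ax.set_xlabel("denom")
ax.set_ylabel("year")
fig.colorbar(image, ax=ax, label="log10(money)")
```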

What about interactivity?
— Watch out for Altair: interaction will be built in soon
— Use Bokeh for web-based interactive dashboards, but it requires learning a different API
— Use Plotly for creating fully interactive charts. Integration with Matplotlib is available.

Get in touch with me
Amit Kapoor
@amitkaps
amitkaps.com
