This document discusses data visualization libraries for data science in Python. It outlines the data science pipeline and how visualization fits in at each step. Popular Python visualization libraries like Matplotlib, Pandas, ggplot, Altair, Seaborn, Plotly, Bokeh, and HoloViews are presented. Guidance is provided on choosing a library based on ease of use, functionality, and support. Examples demonstrate basic plotting with Pandas and adding annotations with Matplotlib, as well as using Altair for grammar-based visualization. Interactivity options with libraries like Bokeh and Plotly are also briefly covered.
Python Visualisation for Data Science
1. Data Vis for Data Science
Usage of Python Visualisation Libraries
Amit Kapoor
@amitkaps
2. Data Science Pipeline
— Frame: Problem definition
— Acquire: Data ingestion
— Refine: Data wrangling
— Transform: Feature creation
— Explore: Feature selection
— Model: Model creation & assessment
— Insight: Solution communication
3. Role of Visualisation
— Frame: Structuring (issue tree, hypotheses)
— Acquire: Loading (progress, errors)
— Refine: Profiling (missing values, outliers)
— Transform: Univariate & Bivariate Vis (1D, 2D)
— Explore: Multi Dimensional Vis (3D ... ND)
— Model: Model Vis (predictions, errors, models)
— Insight: Vis Comm (chart, narrative, dashboard)
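The Refine step (profiling for missing values and outliers) could be sketched with Pandas as below. The DataFrame is hypothetical illustration data, and the 1.5 * IQR rule is just one common outlier convention, not something the slides prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value and one extreme value in "money".
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014, 2015],
    "money": [10.0, 11.0, np.nan, 12.0, 13.0, 95.0],
})

# Count missing values per column.
missing = df.isna().sum()

# Flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["money"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["money"] < q1 - 1.5 * iqr) | (df["money"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
```

The same column can be inspected visually with `df["money"].plot(kind="box")`, which draws the extreme value as a separate point.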
4. Understanding Visualisation
— Domain & Task Layer e.g. Tabular Data for EDA
— Data Layer e.g. Data Types, Transformation
— Visual Layer e.g. Encoding, Marks, Coordinate
— Annotation Layer e.g. Labels, Ticks, Titles
— Interaction Layer e.g. Filtering, Highlighting, Selection
6. Choosing a Visualisation Library
— Ease of Learning: How hard is the API?
— Coverage: How many graphic types can it cover?
— Approach: Is it Charting or Grammar based?
— Documentation: How easy is it to make basic graphs?
— Community Support: How hard is it to make complex graphs?
8. Use Pandas for Base Plotting
# Loading Data
import pandas as pd
notes = pd.read_csv('notes.csv')
# Data Transformation
notes_wide = pd.pivot_table(data = notes, index="year",
columns="denom", values="money")
# Plotting
notes_wide.plot(kind="line")
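Since `notes.csv` is not included here, the following is a minimal runnable sketch of the same load, pivot, and plot steps on a small made-up table. The values are hypothetical; only the column names `year`, `denom`, and `money` come from the slide.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import pandas as pd

# Hypothetical stand-in for notes.csv: one row per (year, denom) pair.
notes = pd.DataFrame({
    "year":  [2014, 2014, 2015, 2015, 2016, 2016],
    "denom": ["100", "500", "100", "500", "100", "500"],
    "money": [1.5, 6.0, 1.6, 7.0, 1.7, 8.0],
})

# Long to wide: one row per year, one column per denomination.
notes_wide = pd.pivot_table(data=notes, index="year",
                            columns="denom", values="money")

# DataFrame.plot draws one line per column and returns a Matplotlib Axes.
ax = notes_wide.plot(kind="line")
```

Pivoting to the wide layout first is what lets a single `plot` call draw one line per denomination.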
10. Use Matplotlib for Annotation
# Basic Styling
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (9,6)
plt.style.use('ggplot')
# Plotting
notes_wide.plot(kind="line")
# Adding Annotation
plt.ylabel('Value INR Bns')
plt.title('Notes in Circulation')
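As a possible extension, the same labels can be set through the Axes object that Pandas draws on, which also makes it easy to point at a single data value with `annotate`. The table values and the annotation text below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide table: one column per denomination, indexed by year.
notes_wide = pd.DataFrame(
    {"100": [1.5, 1.6, 1.7], "500": [6.0, 7.0, 8.0]},
    index=pd.Index([2014, 2015, 2016], name="year"),
)

# Create the figure explicitly and let Pandas draw onto its Axes.
fig, ax = plt.subplots(figsize=(9, 6))
notes_wide.plot(kind="line", ax=ax)

# Axis label and title, as on the slide.
ax.set_ylabel("Value INR Bns")
ax.set_title("Notes in Circulation")

# Arrow plus text pointing at one data value (coordinates are in data units).
ax.annotate(
    "500 denomination, 2016",
    xy=(2016, 8.0),
    xytext=(2014.5, 7.5),
    arrowprops={"arrowstyle": "->"},
)
```

Calling `fig.savefig("notes.png")` afterwards would write the annotated chart to disk; the filename is only an example.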
12. Ideally use ggplot like R
from ggplot import *
ggplot(notes, aes(x='year',
y='money',
color='denom')) + \
geom_line()
13. Use Altair for Grammar Visualisation
from altair import Chart
Chart(notes).mark_line().encode(
x='year:N',
y='money',
color='denom'
)
15. Personal Usage
— Use Pandas for base plotting and time series
— Use Matplotlib for matrices and customisation
— Use Seaborn for 1D & 2D statistical graphs,
especially for categorical variables
— Use IPython Widgets for model interaction
— Use Datashader for Big Data Visualisation
— Experimenting with Altair
16. What about interactivity?
— Watch out for Altair: interaction will be built
in soon
— Use Bokeh for web-based interactive dashboards,
but it requires learning a different API
— Use Plotly for creating fully interactive charts.
Integration with Matplotlib is available.
17. Get in touch with me
Amit Kapoor
@amitkaps
amitkaps.com