2. Talking Topics
Jupyter notebook
About me
Python modules for Data Science
Anaconda
Pandas
About pandas
Data Munging / Data Preparation.
Demo
Seaborn
About seaborn
Machine Learning
Linear Regression.
3. About me
Job title: Architect QA
Build tools in Python for QA automation testing.
Currently Learning
4. Python modules for Data Science
Packages used for Data Analysis and Analytics
Jupyter Notebook
Pandas
NumPy
SciPy
Matplotlib
Seaborn
Scikit-learn
7. What is Anaconda?
Essentially a large (~400 MB) Python installation.
It contains everything you need for data analysis.
Unless you have a specific reason not to, just install and use it.
9. About Pandas
What is Pandas?
Pandas is a Python library for data analysis and data manipulation. Its DataFrame is the Python
counterpart of R's data.frame.
Key Features of Pandas
It has APIs for loading data from different file formats into memory
(Excel, TSV, CSV, databases, etc.).
Data is structured in the form of Rows and Columns.
Data retrieval is similar to SQL: many SQL-style operations such as group-by, joins, and views are available.
Merging of data from multiple datasets.
Supports much time series functionality: date-times, time zones, business days, holidays, etc.
Boolean Indexing
Fancy Indexing
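A minimal sketch of several of the features listed above. The CSV text, column names, and prices are made up for illustration, and the data is read from an in-memory string so the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content, kept in memory so no file is needed.
csv_text = """city,product,units
Pune,apple,10
Pune,banana,4
Delhi,apple,7
Delhi,banana,12
"""

# Loading: read_csv accepts a path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# SQL-like retrieval: roughly SELECT city, SUM(units) ... GROUP BY city.
units_per_city = sales.groupby("city")["units"].sum()

# Merging: join a second (hypothetical) dataset on a shared column.
prices = pd.DataFrame({"product": ["apple", "banana"], "price": [3.0, 1.5]})
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Boolean indexing: keep only the rows where the condition is True.
big_orders = sales[sales["units"] > 8]

# Fancy indexing: select rows and columns with explicit lists of labels.
picked = sales.loc[[0, 3], ["city", "units"]]

# Time series: a business-day range skips weekends (2024-01-05 is a Friday).
business_days = pd.bdate_range("2024-01-05", periods=3)
```

In a Jupyter notebook, typing a variable name such as `merged` on the last line of a cell displays it as a table.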
10. Core Data Structures of Pandas
DataFrames
Series
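As a small illustration of the two structures, assuming made-up names and values: a Series is a one-dimensional labeled array, and a DataFrame is a table whose columns are each a Series.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
temps = pd.Series([21.5, 19.0, 24.2], index=["mon", "tue", "wed"], name="temp_c")

# A DataFrame: a two-dimensional table of rows and columns.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "age": [31, 27, 45]})

# Selecting a single column of a DataFrame gives back a Series.
age_column = df["age"]
```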
Core Operations
Create Select Insert Map
Join Sort Clean ApplyMap
View Update Filter Append
Group Summarize Confirm Rotate
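A rough sketch of a few of the operations named above, on made-up data. Two choices here are assumptions made for the example, not taken from the slides: "Append" is done with `pd.concat` because `DataFrame.append` was removed in pandas 2.0, and "Rotate" is read as a transpose.

```python
import pandas as pd

# Create: build a small table with one missing score.
df = pd.DataFrame({"name": ["b", "a", "c"], "score": [70.0, None, 90.0]})

# Insert: add a new column.
df["team"] = ["x", "y", "x"]

# Clean: replace the missing score with 0.
df["score"] = df["score"].fillna(0.0)

# Map: apply a function to every element of one column.
df["name_upper"] = df["name"].map(str.upper)

# Sort: order rows by a column.
df_sorted = df.sort_values("name")

# Filter: keep rows that satisfy a condition.
passed = df[df["score"] >= 70]

# Append: add rows from another table.
extra = pd.DataFrame(
    {"name": ["d"], "score": [55.0], "team": ["y"], "name_upper": ["D"]}
)
combined = pd.concat([df, extra], ignore_index=True)

# Group and summarize: mean score per team.
mean_by_team = combined.groupby("team")["score"].mean()

# Rotate: swap rows and columns.
rotated = combined.set_index("name").T
```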
36. What is Seaborn?
Seaborn is a Python visualization library built on matplotlib. It provides a high-level
interface for drawing attractive statistical graphics.
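A minimal sketch of that high-level interface, assuming seaborn and matplotlib are installed (both ship with Anaconda). The data is made up; `regplot` draws a scatter plot with a fitted linear regression line in a single call. The `Agg` backend line is only there so the sketch also runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import pandas as pd
import seaborn as sns

# Made-up data: hours studied vs. test score.
data = pd.DataFrame(
    {"hours": [1, 2, 3, 4, 5, 6], "score": [52, 55, 61, 68, 70, 79]}
)

# One call produces the scatter points and the regression line.
ax = sns.regplot(data=data, x="hours", y="score")
ax.set_title("Score vs. hours studied (made-up data)")
```

The returned `ax` is an ordinary matplotlib Axes, so titles, labels, and limits can be adjusted with the usual matplotlib methods.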