Data science

  1. Introduction to Data Science, by Purna Chander Rao Kathula
  2. Agenda ● What is Data Science? ● Domains: the Need for Data Science ● Data Life Cycle ● Data Science Sub-Domains ● Why Python for Data Science? ● Python Modules for Data Science ○ Introduction to Pandas ○ Introduction to NumPy ○ Introduction to Matplotlib ○ Introduction to Seaborn ● What is Machine Learning?
  3. What is Data Science? Data Science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. Analysts and business users then translate these insights into tangible business value.
  4. Data Life Cycle
  5. Data Science Sub-Domains
  6. Domains: the Need for Data Science ● E-commerce ○ Recommendation systems, customer sentiment analysis, inventory management, improved customer service ● Healthcare ○ Castlight: helps customers and clients choose an appropriate plan ● Finance ○ Chatbots, call-center automation, paperwork automation ● And more...
  7. Why Python for Data Science? ● It is easy to learn ○ It is now the language of choice for introductory courses at 8 of the top 10 US computer science departments ● Full-featured ○ Not just a statistics language: it has full capabilities for data acquisition, cleaning, databases, high-performance computing and more ● Strong data science libraries ○ Pandas, NumPy, Matplotlib, SciPy, Seaborn, NLTK, scikit-learn, etc.
  8. Anaconda
  9. What is Anaconda? ● Essentially a large (~400 MB) Python distribution ● It contains everything you need for data engineering, analytics and machine learning ● Unless you have a special reason not to, you should just install and use it.
  10. Introduction to Pandas. What is Pandas? Pandas is a Python library for data analysis and data manipulation, a Python counterpart to R's data.frame. Key features of Pandas: ● APIs for loading data from many file formats into memory (Excel, TSV, CSV, databases, etc.) ● Data is structured in rows and columns ● Data retrieval works much like SQL, with support for operations such as group-by, joins and views ● Merging of data from multiple datasets ● Extensive date/time and time series functionality: time zones, business days, holidays, etc. ● Boolean indexing ● Fancy indexing
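A minimal sketch of several features listed on this slide (loading a file, an SQL-style join and group-by, boolean indexing, business days), assuming Pandas is installed. The CSV contents, column names and dates are hypothetical; an in-memory buffer stands in for a file on disk so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV contents; pd.read_csv accepts any file-like object,
# so an in-memory buffer stands in for a file on disk.
ORDERS_CSV = """order_id,customer,amount,order_date
1,alice,120.0,2024-01-02
2,bob,35.5,2024-01-03
3,alice,80.0,2024-01-05
4,carol,15.0,2024-01-08
"""

CUSTOMERS_CSV = """customer,region
alice,north
bob,south
carol,north
"""

# Load both tables as DataFrames (rows and columns), parsing the date column
orders = pd.read_csv(io.StringIO(ORDERS_CSV), parse_dates=["order_date"])
customers = pd.read_csv(io.StringIO(CUSTOMERS_CSV))

# SQL-like join, roughly: SELECT * FROM orders JOIN customers USING (customer)
merged = orders.merge(customers, on="customer", how="inner")

# SQL-like GROUP BY: total order amount per region
totals = merged.groupby("region")["amount"].sum()

# Boolean indexing: keep only the rows where the condition is True
large_orders = merged[merged["amount"] > 50]

# Business days: weekdays between two dates (weekends are skipped)
business_days = pd.bdate_range("2024-01-01", "2024-01-10")

print(totals)
print(large_orders[["order_id", "customer", "amount"]])
print(len(business_days))
```

`pd.bdate_range` skips weekends only. To also skip holidays, a `CustomBusinessDay(calendar=USFederalHolidayCalendar())` offset (from `pandas.tseries.offsets` and `pandas.tseries.holiday`) can be passed as its `freq`.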
  11. Core Data Structures of Pandas ● DataFrame ● Series. Core operations: Create, Select, Insert, Map, Join, Sort, Clean, ApplyMap, View, Update, Filter, Append, Group, Summarise, Confirm, Rotate
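Several of the core operations named on this slide map onto Pandas calls roughly as below. This is an illustrative sketch on hypothetical data; reading "Rotate" as a transpose is an assumption (`pivot` or `melt` are other candidates), and `pd.concat` stands in for "Append" because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# Create: a Series is a labelled 1-D array; a DataFrame is a table of columns
prices = pd.Series([3.0, 1.5, 2.25], index=["apple", "banana", "cherry"], name="price")
df = pd.DataFrame({"fruit": ["apple", "banana", "cherry"], "qty": [10, 0, 4]})

# Select: one column (a Series) and a block of rows/columns by label with .loc
qty = df["qty"]
first_two = df.loc[0:1, ["fruit", "qty"]]  # label slices include both ends

# Map + Insert: look up each fruit's price, then add derived columns
df["price"] = df["fruit"].map(prices)
df["total"] = df["qty"] * df["price"]

# Filter + Sort: in-stock rows, most valuable first
in_stock = df[df["qty"] > 0].sort_values("total", ascending=False)

# Append: DataFrame.append was removed in pandas 2.0; pd.concat replaces it
extra = pd.DataFrame({"fruit": ["date"], "qty": [7], "price": [4.0], "total": [28.0]})
df = pd.concat([df, extra], ignore_index=True)

# Summarise: count, mean, std, min, quartiles and max per numeric column
summary = df[["qty", "total"]].describe()

# Rotate (assumed here to mean transpose): rows become columns and vice versa
rotated = df.set_index("fruit").T

print(in_stock)
print(summary)
print(rotated)
```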
  12. Introduction to NumPy ● NumPy is used extensively in scientific computing ● Three main benefits of using a NumPy array over a Python list ○ Less memory ○ Faster ○ More convenient ● Broadcasting lets universal functions (ufuncs) operate in a meaningful way on arrays of different shapes.
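A small sketch of the three benefits and of broadcasting, assuming NumPy is installed. The list's memory figure is an approximation: it counts the list's pointer array plus every int object, including small ints that CPython caches and shares.

```python
import sys

import numpy as np

n = 100_000

# Less memory: a list holds pointers to separate Python int objects,
# while a NumPy array stores fixed-size raw values in one contiguous buffer.
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)  # approximate
array_bytes = np_array.nbytes  # exactly n * 8 bytes for int64

# Convenient and fast: arithmetic is element-wise and runs in compiled code,
# with no explicit Python loop.
squares_list = [v * v for v in py_list]
squares_array = np_array * np_array

# Broadcasting: a (3, 1) column and a (4,) row behave as if stretched to a
# common (3, 4) shape, and the universal function np.add is applied element-wise.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = np.add(col, row)

print(list_bytes, array_bytes)
print(grid)
```

Timing the two squaring lines with `timeit` typically shows the array version running much faster; exact numbers depend on the machine.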
  13. Introduction to Matplotlib. A picture is worth a thousand words. Matplotlib is a Python library for creating 2-D graphs and plots from Python scripts, emulating MATLAB-style graphs and visualizations. Its pyplot module makes plotting easy by providing control over line styles, font properties, axis formatting, etc. It supports a very wide variety of graphs and plots, including histograms, bar charts, power spectra and error charts. Used together with NumPy, it provides an effective open-source alternative to MATLAB.
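A minimal pyplot sketch, assuming Matplotlib and NumPy are installed, showing line-style and font control plus two of the plot types the slide mentions (histogram and bar chart). The data is synthetic, and the non-interactive `Agg` backend is selected so the figure renders without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render off-screen
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

fig, (ax_line, ax_hist, ax_bar) = plt.subplots(1, 3, figsize=(12, 3.5))

# Line plot with explicit line styles, colours, axis label and title font
ax_line.plot(x, np.sin(x), linestyle="-", color="tab:blue", label="sin(x)")
ax_line.plot(x, np.cos(x), linestyle="--", color="tab:orange", label="cos(x)")
ax_line.set_xlabel("x (radians)")
ax_line.set_title("Line plot", fontsize=12, fontweight="bold")
ax_line.legend()

# Histogram of the random samples
ax_hist.hist(samples, bins=30, color="tab:green", edgecolor="black")
ax_hist.set_title("Histogram")

# Bar chart of three categories
ax_bar.bar(["A", "B", "C"], [3, 7, 5], color="tab:purple")
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Save to an in-memory PNG; fig.savefig("plots.png") would write a file instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```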
  14. Introduction to Seaborn. Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Important features of Seaborn: ● Built-in themes for styling Matplotlib graphics ● Fitting and visualizing linear regression models ● Plotting statistical time series data ● Works well with NumPy and Pandas data structures
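A sketch of two of the listed features, a built-in theme and a fitted linear regression, assuming Seaborn 0.11 or later (for `sns.set_theme`). The data is hypothetical and synthetic; `sns.regplot` fits a least-squares line and draws it with a confidence band over the scatter.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Built-in theme: restyles every subsequent Matplotlib/Seaborn figure
sns.set_theme(style="whitegrid")

# Hypothetical synthetic data: y depends linearly on x plus Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1.5, size=100)})

# regplot draws the scatter, the fitted regression line and its confidence band
ax = sns.regplot(data=df, x="x", y="y")
ax.set_title("Linear fit with seaborn.regplot")

buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
```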
  15. BOX PLOTS
  16. VIOLIN PLOTS
  17. BAR PLOTS
  18. BOX PLOTS
  19. VIOLIN PLOTS
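The box, violin and bar plot slides contain only figures in this transcript. The sketch below draws one of each with Seaborn on hypothetical synthetic data; it illustrates the plot types and does not reconstruct the original figures.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one numeric value per row, tagged by group
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 50),
    "value": np.concatenate([
        rng.normal(10, 2, 50),
        rng.normal(14, 3, 50),
        rng.normal(12, 1, 50),
    ]),
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5), sharey=True)

# Box plot: median, quartiles (box), whiskers and outliers per group
sns.boxplot(data=df, x="group", y="value", ax=axes[0])
axes[0].set_title("Box plot")

# Violin plot: a box-plot summary plus a kernel density estimate per group
sns.violinplot(data=df, x="group", y="value", ax=axes[1])
axes[1].set_title("Violin plot")

# Bar plot: mean of each group with an error bar (a confidence interval by default)
sns.barplot(data=df, x="group", y="value", ax=axes[2])
axes[2].set_title("Bar plot")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```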
  20. Machine Learning ● What is Machine Learning? ● Types of Machine Learning ● Supervised and Unsupervised Learning ● Use Cases ○ Linear Regression (Supervised) ○ K-Means (Unsupervised) ○ Sentiment Analysis
  21. What is Machine Learning? Machine Learning is a subset of Artificial Intelligence (AI) that gives machines the ability to learn automatically and improve from experience without being explicitly programmed.
  22. Types of Machine Learning ● Supervised Learning ● Unsupervised Learning ● Reinforcement Learning
  23. Linear Regression (Supervised). Linear regression is a supervised machine learning algorithm that performs a regression task: it models a target prediction value as a linear function of independent variables. It is mostly used to find relationships between variables and for forecasting.
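A sketch of ordinary least squares with NumPy, fitting ŷ = w0 + w1·x1 + ... + wn·xn to labelled data. The function names and training data are hypothetical; in practice scikit-learn's `LinearRegression` (with `fit`, `predict`, `coef_` and `intercept_`) is the usual choice, but the NumPy version keeps the math visible.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares: return (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept w0 is learned like any other weight
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimises ||design @ w - y||^2 without forming an explicit inverse
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w[0], w[1:]


def predict(intercept, coef, X):
    """Apply the fitted linear model to new rows of independent variables."""
    return intercept + np.asarray(X, dtype=float) @ coef


# Hypothetical labelled data generated exactly by y = 3 + 2*x1 - x2
X_train = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
y_train = 3 + 2 * X_train[:, 0] - X_train[:, 1]

intercept, coef = fit_linear_regression(X_train, y_train)
prediction = predict(intercept, coef, [[4, 2]])
print(intercept, coef, prediction)
```

Because the training data here has no noise, the fit recovers the generating weights up to floating-point rounding; with real data the weights minimise the squared prediction error instead.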
  24. K-Means (Unsupervised). K-means clustering is a type of unsupervised learning, used when you have unlabeled data (i.e., data without defined categories or groups). The goal of the algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features provided, so that data points are clustered by feature similarity. The results of the K-means clustering algorithm are: ● The centroids of the K clusters, which can be used to label new data ● Labels for the training data (each data point is assigned to a single cluster)
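A minimal NumPy sketch of the iterative procedure described above (Lloyd's algorithm): alternately assign each point to its nearest centroid and move each centroid to the mean of its points. The function names, random initialisation and data are illustrative assumptions; production code would typically use scikit-learn's `KMeans`, which adds k-means++ initialisation and multiple restarts.

```python
import numpy as np


def assign(centroids, X):
    """Label each row of X with the index of its nearest centroid (Euclidean)."""
    X = np.asarray(X, dtype=float)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def k_means(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the group of its nearest centroid
        labels = assign(centroids, X)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids
    return centroids, assign(centroids, X)


# Hypothetical unlabeled data: two well-separated blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
    rng.normal([5.0, 5.0], 0.5, size=(50, 2)),
])

centroids, labels = k_means(X, k=2)
# The learned centroids can label new, unseen points
new_labels = assign(centroids, [[0.2, -0.1], [4.8, 5.3]])
print(centroids)
print(new_labels)
```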
  25. References ● Python / Anaconda: https://www.anaconda.com/distribution/ ● Pandas: https://pandas.pydata.org/ ● NumPy: https://numpy.org/ ● Matplotlib: https://matplotlib.org/ ● Seaborn: https://seaborn.pydata.org/ ● SciPy: https://www.scipy.org/ ● Bokeh: https://bokeh.pydata.org/en/latest/