
Data scientist What is inside it?


  1. A new beginning in your career
     The Future of Data Science: What does Data Science have in store?
     Saif Shaikh
     Innovations in Business Solutions Inc. (IIBS)
     403-151 City Centre Dr. | Mississauga, ON | L5B 2T4 | Tel: (905)-268-0958 | E-mail: info@iibs.ca | Website: http://www.iibs.ca
  2. Agenda
     • Defining data science
     • Skills required
     • Choosing R programming
     • Career opportunities
     • Q&A
  3. Presenter introduction
     Saif Shaikh
     • Instructor at IIBS for the Data Scientist with R Programming course
     • Consultant involved in the data analytics and modeling fields
     • Formerly employed in the medical devices field
     • Education: B.S.E.E. (Massachusetts), M.Eng. (McMaster)
  4. Defining data science
     • Relatively new multidisciplinary field in which scientific procedures are used to gain knowledge from data that can take various forms
     • Inexpensive data collection and storage have produced big data (enormous data sets), and inexpensive computational power makes it practical to apply data science to them
     • A data scientist follows the data science process
     • Types of data to be analyzed:
        - Structured: Stored according to a model and organized, such as a relational database or spreadsheet
        - Unstructured: No model or organization, such as raw data including text, images, sound files and video
        - Semi-structured: Combination of the two, such as a smartphone picture where the image data is unstructured but the appended camera information is structured
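As a rough illustration of the three data types on slide 4, the following Python sketch models each one. The record fields, the review text and the layout of `photo` are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
# Structured: every record follows the same schema, like rows in a table.
structured_rows = [
    {"customer_id": 1, "city": "Mississauga", "purchases": 3},
    {"customer_id": 2, "city": "Toronto", "purchases": 7},
]

# Unstructured: raw content with no predefined model (free text here).
unstructured_text = "Great service, but the delivery took too long."

# Semi-structured: an unstructured payload plus structured metadata,
# like a smartphone picture with appended camera information.
photo = {
    "pixels": bytes([255, 0, 0, 0, 255, 0]),  # hypothetical raw RGB data
    "metadata": {"camera": "example-phone", "width": 2, "height": 1},
}


def total_purchases(rows):
    """Structured data can be queried directly by field name."""
    return sum(row["purchases"] for row in rows)


def word_count(text):
    """Unstructured text needs processing before it yields numbers."""
    return len(text.split())


print(total_purchases(structured_rows))
print(word_count(unstructured_text))
print(photo["metadata"]["width"] * photo["metadata"]["height"])
```

The structured rows can be aggregated by field name right away, while the text and the pixel bytes need some processing step before any analysis is possible.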
  5. Defining data science
     Data science is multidisciplinary
     http://blogs.gartner.com/christi-eubanks/three-lessons-crossfit-taught-data-science/
  6. Defining data science
     Various disciplines contribute to data science
     https://en.wikibooks.org/wiki/Data_Science:_An_Introduction/A_Mash-up_of_Disciplines
  7. Defining data science
     The data science process
     https://en.wikipedia.org/wiki/Data_science
     http://blog.operasolutions.com/bid/384900/what-is-data-science
  8. Defining data science
     • Data science subfields:
        - Machine learning: Subfield of artificial intelligence that gives computers access to data so they can learn by themselves. It focuses on designing algorithms that can learn from the supplied data and make predictions.
        - Natural language processing: Subfield of artificial intelligence that uses computers to understand and derive meaning from human languages without explicit clues.
        - Deep learning: Subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain.
        - Data mining: Discovering patterns in data using methods such as machine learning, statistics and database systems in order to explain a phenomenon.
        - Data visualization: Presentation of data in graphical format so that patterns, trends and correlations can be noticed easily.
        - Statistical modeling: Subfield of mathematics that uses mathematical equations to find relationships between variables in data.
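To make the statistical modeling and prediction ideas on slide 8 concrete, here is a minimal Python sketch that fits a straight line to data by ordinary least squares and then uses it to predict a new value. The function name `fit_line` and the study-hours data are hypothetical; the presentation itself recommends R for this kind of work, so this is only one possible illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied versus test score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)

# Use the fitted relationship to predict the score for 5 hours of study.
predicted = intercept + slope * 5.0
print(slope, intercept, predicted)
```

The fitted equation is the "model": it summarizes the relationship between the two variables and can be applied to inputs that were not in the original data.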
  9. Defining data science
     • Data science applications:
        - Finance: Fraud detection, risk modeling, trading
        - Communication, media and entertainment: Consumer insights, content recommendation, sentiment analysis, customer acquisition
        - Healthcare and pharmaceutical: Clinical trials, genetics analysis, epidemic forecasting
        - Education: Behavioral classification, teacher effectiveness
        - Manufacturing: Internet of Things, failure detection
        - Retail: Shelf-space optimization, pricing, promotions, up-sell
        - Energy and utilities: Smart meter analysis, service quality optimization, outage management and restoration, accident prevention, exploration
        - Agriculture: Climatology insights, field characteristics, weather information
        - Transportation: Fleet vehicle maintenance, self-driving vehicles, logistics optimization
        - Insurance: Fraud detection, call center optimization, risk assessment
  10. Skills required
     • Challenging field with high potential that has 4 prerequisites:
     • Programming
        - Analytic software such as R
        - Ability to readily adapt and learn new software tools and libraries as required
     • Quantitative analysis
        - Ability to analyze data, develop algorithms and build models
        - Basic statistics, probability theory, algebra and the ability to read mathematical notation
     • Communication
        - Ability to explain your findings to a non-technical audience such as marketing or sales
        - Ability to tell a story with graphs and presentations
        - Teamwork skills, since you will be working with many departments
  11. Skills required
     • Intuition
        - Understanding the business product allows you to ask the correct questions and find the correct answers
        - A curious mindset for solving problems
  12. Choosing R Programming
     • Software environment for statistical computing and graphics
     • Free and open source language
     • Excellent software tools, such as RStudio, to get started right away
     • Own set of tools to produce publication-quality plots and documentation
     • Continuous backing by statisticians, scientists, scholars and research institutes
     • Publicly available for over 20 years with a regular release cycle
     • Comes with a robust packaging system that lets developers and domain experts easily distribute their code; packages are often written by researchers to accompany scientific papers
  13. Choosing R Programming
     • All-in-one environment for data manipulation, visualization, machine learning, reporting and more
     • R's popularity in academia is important because it creates a pool of talent that feeds industry, which in turn creates more demand for R talent
     • Top-tier companies using R include Facebook, Google, Twitter, Microsoft, Uber, Airbnb, IBM, HP, Ford, Accenture, American Express, Citibank and many more
  14. Choosing R Programming
     TIOBE Index, September 2017: Number of search engine searches
     https://www.tiobe.com/tiobe-index/r/
  15. Choosing R Programming
     RedMonk Programming Language Rankings, June 2017: Usage and discussion
     http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
  16. Choosing R Programming
     Scholarly articles with data science software
     http://r4stats.com/articles/popularity/
  17. Career opportunities
     • One of the hottest jobs right now for 3 reasons: there is a shortage of talent, organizations continue to face enormous challenges in organizing data, and the need for data scientists is no longer restricted to tech giants
     • Data science can apply to a number of occupations
     • Glassdoor released a report in January 2017 ranking data scientist as the best job in America (median salary $110,000)
     • Careercast revealed in 2017 that data scientists have the best growth potential over the next 7 years, as the position is the toughest to fill (median salary $111,267)
  18. Career opportunities
     Indeed: Data scientist job trends
     https://www.indeed.com/jobtrends/q-%22Data-Scientist%22.html
  19. Career opportunities
     Emsi: Q4 2016 data science related occupations and their earnings
     https://www.forbes.com/sites/emsi/2016/11/16/want-to-become-a-data-scientist-where-the-jobs-are-and-what-employers-are-looking-for/2/
  20. Career opportunities
     Robert Half: Top 10 tech jobs in 2017
     https://www.roberthalf.com/sites/default/files/Media_Root/images/rht-pdfs/rht_0916_ig_sg2017-jobstowatch_nam_eng.pdf
  21. Q&A
     • If you require further training, we have a course available at IIBS:
        - Data Scientist with R Programming (55 hours)
        - Instructor: Saif Shaikh
        - Weekend classes
        - Mississauga, ON
        - 905-268-0958
        - info@iibs.ca
