
Lecture #01

  1. Introduction to Data Science, Lecture #1. Program: BS(DS), Fall 2019. Instructor: Konpal Darakshan
  2. Books:
     1. Doing Data Science: Straight Talk from the Frontline, by Cathy O'Neil and Rachel Schutt.
     2. The R Primer, by Claus Thorn Ekstrom.
     Other:
     1. An Introduction to Data Science, by Jeffrey M. Stanton and Jeffrey S. Saltz.
     2. Learn R for Applied Statistics: With Data Visualizations, Regressions, and Statistics, by Eric Goh Ming Hui.
     3. Practical Statistics for Data Scientists: 50 Essential Concepts, by Andrew Bruce and Peter C. Bruce.
     4. Data Analysis for the Life Sciences with R, by Michael I. Love and Rafael A. Irizarry.
     5. R Programming for Data Science, by Roger D. Peng.
  3. Marking Scheme
     • Exams
       - Final Exam: 40 marks
       - 1st Hourly: 15 marks
       - 2nd Hourly: 15 marks
     • Sessional Marks
       - Lab Manual: 5 marks
       - Presentation: 5 marks
       - Assignments: 10 marks
       - Quizzes: 10 marks
  4. Chapter # 01 Introduction: What is Data Science?
     • Big Data and Data Science hype
     • Getting past the hype
     • Why now?
     • Datafication
     • Current landscape of perspectives
     • Data Science Jobs
     • What is a data scientist?
       - In academia
       - In industry
  5. Basic Terminologies
     • Data can be generated, collected, or retrieved.
     • Other terms: simulation, similarity measures, data structures, algorithms.
  6. • Data: facts with no meaning attached.
     • Information: what is learned from facts.
     • Knowledge: practical understanding of a subject.
     • Understanding: the ability to absorb knowledge and learn to reason.
     • Wisdom: the quality of having experience and good judgment; the ability to think and foresee.
     • Validity: ways to confirm truth.
  7. • Cross-sectional data: data observed without a time dimension.
     • Temporal data: data observed over time (time series).
     • Spatial data: data that includes location, e.g. the coordinates determined by a touchscreen phone.
     • Temporal cum spatial data (GIS): location data that changes with the passage of time, e.g. population density.
     • Scales of Measurement: there are 4 scales of measurement.
       - Nominal: classifies data into unordered categories, e.g. male/female.
       - Ordinal: puts data in order; values can be numerical or non-numerical, e.g. time of day (dawn, morning, noon, afternoon, evening, night).
       - Interval: ordered values whose differences are meaningful but which have no true zero, e.g. temperature in degrees Celsius.
       - Ratio: like interval, but with a true zero, so ratios are meaningful, e.g. weight, height, number of children.
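The four scales of measurement on slide 7 determine which summary statistics are meaningful. The sketch below is an illustration in Python, not part of the original slides (the course texts use R): the `PERMITTED` mapping and the `summarize` helper are hypothetical names, and the sample values are made up.

```python
from statistics import mean, median, mode

# Summaries that are meaningful on each scale. Each scale also permits
# everything allowed on the scales listed before it.
PERMITTED = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean", "ratio"],
}


def summarize(values, scale, order=None):
    """Return only the summaries that make sense for the given scale.

    `order` lists the categories of a non-numerical ordinal variable
    from lowest to highest.
    """
    allowed = PERMITTED[scale]
    result = {"mode": mode(values)}
    if "median" in allowed:
        if order is not None:
            # Rank the categories, then take the lower middle rank so the
            # median is always one of the actual categories.
            ranks = sorted(order.index(v) for v in values)
            result["median"] = order[ranks[(len(ranks) - 1) // 2]]
        else:
            result["median"] = median(values)
    if "mean" in allowed:
        result["mean"] = mean(values)
    if "ratio" in allowed:
        # Ratios need a true zero, so they are only computed on the ratio scale.
        result["max_to_min_ratio"] = max(values) / min(values)
    return result


TIME_OF_DAY = ["dawn", "morning", "noon", "afternoon", "evening", "night"]

print(summarize(["male", "female", "female"], "nominal"))
print(summarize(["morning", "night", "noon", "morning", "evening"],
                "ordinal", order=TIME_OF_DAY))
print(summarize([10, 20, 20, 30], "interval"))  # temperatures in Celsius
print(summarize([50, 60, 100], "ratio"))        # weights in kilograms
```

Note that the nominal result contains only a mode, and that a ratio such as "twice as heavy" appears only for the ratio-scale data: 20 degrees Celsius is not "twice as hot" as 10 degrees.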
  8. Chapter # 01 Introduction: What is Data Science?
     Big Data and Data Science Hype: reasons to be skeptical about data science.
     • Is data science only the stuff going on at companies like Google, Facebook, and other tech companies?
     • There’s a distinct lack of respect for the researchers in academia and industry labs who have been working on this kind of stuff for years, and whose work builds on decades of earlier work.
     • The hype is crazy. In general, hype masks reality and increases the noise-to-signal ratio.
     • Statisticians already feel that they are studying and working on the “Science of Data.”
  9. Getting Past the Hype
     • Rachel’s experience going from getting a PhD in statistics to working at Google. In her words:
  10. Getting Past the Hype
      We have a couple of replies to this:
      • Sure, there is a difference between industry and academia. But does it really have to be that way? Why do many courses in school have to be so intrinsically out of touch with reality?
      • Even so, the gap doesn’t represent simply a difference between industry statistics and academic statistics. The general experience of data scientists is that, at their job, they have access to a larger body of knowledge and methodology, as well as a process, which we now define as the data science process, that has foundations in both statistics and computer science.
      Around all the hype, in other words, there is a ring of truth: this is something new.
  11. Why Now?
      • We have massive amounts of data about many aspects of our lives. What people might not know is that the “datafication” of our offline behavior has started as well.
      • On the Internet, this means Amazon recommendation systems, friend recommendations on Facebook, film and music recommendations, and so on.
      • In finance, this means credit ratings, trading algorithms, and models.
      • In education, this is starting to mean dynamic personalized learning and assessments coming out of places like Knewton and Khan Academy.
      • In government, this means policies based on data.
  12. Datafication
      • In the May/June 2013 issue of Foreign Affairs, Kenneth Neil Cukier and Viktor Mayer-Schoenberger wrote an article called “The Rise of Big Data”. In it they discuss the concept of datafication, which they define as a process of “taking all aspects of life and turning them into data.”
      • They follow up their definition in the article with a line that speaks volumes about their perspective: “Once we datafy things, we can transform their purpose and turn the information into new forms of value.”
  13. Datafication
      Examples:
      • We quantify friendships with “likes”.
      • Google’s augmented-reality glasses datafy the gaze.
      • Twitter datafies stray thoughts.
      • LinkedIn datafies professional networks.
      • When we “like” someone or something online, we are intending to be datafied.
      • When we browse the Web, we are datafied unintentionally through cookies.
      • When we walk around in a store, or even on the street, we are being datafied via sensors, cameras, or Google glasses.
      • Taking part in a social media experiment.
      • All-out surveillance and stalking.
      But it’s all datafication.
  14. Current landscape of perspectives
      For example:
      • On Quora there’s a discussion from 2010 about “What is Data Science?”, and here’s Metamarkets CEO Mike Driscoll’s answer: “Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.”
      • Driscoll then refers to Drew Conway’s Venn diagram of data science from 2010.
  15. Current landscape of perspectives
      • Nathan Yau’s 2009 post, “Rise of the Data Scientist”, includes:
        1. Statistics (traditional analysis you’re used to thinking about)
        2. Data munging (parsing, scraping, and formatting data)
        3. Visualization (graphs, tools, etc.)
      • ASA President Nancy Geller’s 2011 Amstat News article, “Don’t shun the ‘S’ word”, defends statistics.
      • DJ Patil and Jeff Hammerbacher, then at LinkedIn and Facebook, respectively, coined the term “data scientist” in 2008.
      • Wikipedia finally gained an entry on data science in 2012.
  16. Current landscape of perspectives
      • In 2001, William Cleveland wrote a position paper about data science called “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.”
      • Harvard Business Review declared data scientist to be the “Sexiest Job of the 21st Century”.
      So data science existed before data scientists? Is this semantics, or does it make sense?
  17. Data Science Jobs
      • For three years running, data science has been dubbed “the best job in America.” According to Stack Overflow, it is one of the highest-paying jobs in the software sector.
      • The GDPR has increased companies’ reliance on data scientists because of the need for real-time analytics and for storing data responsibly.
      • There are 465 job openings for data scientists in New York City alone.
      • LinkedIn recently picked data scientist as its most promising career of 2019. One reason it took the top spot is that the average salary for the role is $130,000.
      • The January report from Indeed, one of the top job sites, showed a 29% year-over-year increase in demand for data scientists and a 344% increase since 2013, a dramatic upswing. But while demand, in the form of job postings, continues to rise sharply, searches by job seekers skilled in data science grew at a slower pace (14%), suggesting a gap between supply and demand.
  18. Chart: the growth in data scientist job postings on Indeed, from December 2016 to December 2018.
  19. OK, So What Is a Data Scientist, Really?
      Perhaps the most concrete approach is to define data science by its usage.
      • In Academia
        An academic data scientist is a scientist, trained in anything from social science to biology, who works with large amounts of data and must grapple with computational problems posed by the structure, size, messiness, complexity, and nature of the data, while simultaneously solving a real-world problem.
      • In Industry
        More generally, a data scientist is someone who knows how to:
        - Design experiments.
        - Collect, clean, and munge data. These skills are also necessary for understanding biases in the data and for debugging logging output from code.
        - Perform exploratory data analysis, which combines visualization and data sense.
        - Find patterns and build models and algorithms.
        - Use analyses for decision making.
  20. What Is a Data Scientist
      • Data engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by data scientists.
      • A data analyst is someone who curates meaningful insights from data.
      • A data scientist is a professional with the capabilities to gather large amounts of data, and to analyze and synthesize the information into actionable plans for companies and other organizations.
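The industry skills listed on slide 19 (cleaning and munging, exploratory analysis, model building, and decision making) can be sketched end to end on a toy data set. This is a minimal illustration in Python, not the course's own material (the course texts use R): the `RAW` records are made up, and `clean`, `explore`, `fit_line`, and the pass mark of 75 are hypothetical names and values chosen for the example.

```python
from statistics import mean

# Made-up raw records for illustration: "hours studied, exam marks",
# including the kind of mess that real logs tend to contain.
RAW = ["2, 50", "4, 60", " 6 ,70", "n/a, 65", "8, 80", ""]


def clean(rows):
    """Cleaning and munging: parse each row, dropping malformed ones."""
    records = []
    for row in rows:
        parts = [p.strip() for p in row.split(",")]
        if len(parts) != 2:
            continue  # empty or incomplete row
        try:
            records.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue  # non-numeric value such as "n/a"
    return records


def explore(records):
    """Exploratory data analysis: a few basic summaries."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    return {"n": len(records), "mean_x": mean(xs), "mean_y": mean(ys),
            "min_y": min(ys), "max_y": max(ys)}


def fit_line(records):
    """Model building: least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in records)
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


records = clean(RAW)
print(explore(records))

slope, intercept = fit_line(records)
predicted = slope * 7 + intercept  # predicted marks after 7 hours of study

# Decision making: compare the prediction with a hypothetical pass mark.
print("likely to pass" if predicted >= 75 else "needs more study")
```

Two of the six raw rows are discarded during cleaning, which is itself worth reporting: dropped rows are one of the places where bias can enter the data.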
