Data Science - An emerging Stream of Science with its Spreading Reach & Impact

This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled statistics and data from a range of sources. It may be useful for students and anyone interested in this field of study.


  1. Data Science. Dr. Sunil Kr Pandey, Professor & Director (IT & UG), Institute of Technology & Science, Mohan Nagar, Ghaziabad
  2. Evolution of Databases
  3. Data, data everywhere… There's certainly a lot of it! (Chart on a logarithmic scale with marks at 1 Petabyte, 1 Exabyte and 1 Zettabyte; 1 Petabyte == 1000 TB, 1 TB = 1000 GB.)
     • Data produced each year: 5 EB (2002), 161 EB (2006), 800 EB (2009), 1.8 ZB (2011), 8.0 ZB (2015).
     • Human brain's capacity: 14 PB.
     • 100 years of HD video + audio (life in video): 60 PB in 4320p resolution, extrapolated from 16 MB for 1:21 of 640x480 video (w/sound); almost certainly a gross overestimate, as sleep can be compressed significantly!
     • The chart also carries a 120 PB label.
     • References:
       (brain) 14 PB: http://www.quora.com/Neuroscience-1/How-much-data-can-the-human-brain-store
       (2002) 5 EB: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm
       (2006) 161 EB: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
       (2009) 800 EB: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf
       (2011) 1.8 ZB: http://www.emc.com/leadership/programs/digital-universe.htm
       (2015) 8 ZB: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
  4. Data has become a resource that needs to be carefully stored, processed, analyzed, visualized, and presented securely wherever it is required.
  5. Growing Need for Analytics
     • DATA HARNESSING: Companies store each piece of information generated during business operations and customer interactions. Data is generated. Data is analyzed. Learning from the data is used in decision making and process optimization.
     • DATA VOLUMES (in trillion GB): 1.2 (2010), 2.4 (2012), 7.9 (2015).
     • DID YOU KNOW? Generation of a large amount of data from business transactions: 4 billion transactions every year; 900 stores; 10,000 to 1 lakh (100,000) SKUs.
  6. Growth in Data Volume, 2010-2025 (projections). Data volume in zettabytes, by year:
     • 2010: 2; 2011: 5; 2012: 6.5; 2013: 9; 2014: 12.5; 2015: 15.5; 2016: 18; 2017: 26
     • 2018: 33; 2019: 41; 2020: 50.5; 2021: 64.5; 2022: 79.5; 2023: 101; 2024: 129.5; 2025: 175
  7. Fourth Paradigm of Science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science:
     • Thousands of years: Empirical (अनुभवजन्य)
     • Few hundreds of years: Theoretical (सैद्धांतिक)
     • Last fifty years: Computational (गणनात्मक); “Query the world”
     • Last twenty years: eScience (Data Science); “Download the world”
  8. What is Data Science
     • Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.
     • Data Science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science.
     • The availability of high-capacity networks, low-cost computers and storage devices as well as the widespread adoption of hardware virtualization, service-oriented architecture and autonomic and utility computing has led to growth in cloud computing.
  9. Data Science – A Visual Definition
  10. Data Science: A Definition. Data Science is the science which uses computer science, statistics and machine learning, visualization and human-computer interactions to:
      1. Collect
      2. Clean
      3. Integrate
      4. Analyze
      5. Visualize
      6. Interact with data to create data products.
      The objective of Data Science is to “Turn Data into Data Products”.
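The first five steps of the definition above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the two CSV snippets, the store and region names, and every function name are hypothetical and do not come from the slides, and the sixth step (interacting with the data) is left out.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs: two small CSV sources with messy values.
SALES_CSV = "store,units\nA, 10\nB,\nC,7\nD,4\n"
REGION_CSV = "store,region\nA,North\nB,South\nC,North\nD,South\n"

def collect(text):
    """Step 1, Collect: read rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 2, Clean: strip whitespace and drop rows with missing units."""
    cleaned = []
    for row in rows:
        units = row["units"].strip()
        if units:
            cleaned.append({"store": row["store"].strip(), "units": int(units)})
    return cleaned

def integrate(sales, regions):
    """Step 3, Integrate: join each sales row with its region by store."""
    region_of = {r["store"].strip(): r["region"].strip() for r in regions}
    return [{**s, "region": region_of[s["store"]]} for s in sales]

def analyze(rows):
    """Step 4, Analyze: average units sold per region."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["units"])
    return {region: mean(units) for region, units in by_region.items()}

def visualize(summary):
    """Step 5, Visualize: render the summary as a plain-text bar chart."""
    return "\n".join(
        f"{region:<6} {'#' * round(avg)} {avg}"
        for region, avg in sorted(summary.items())
    )

summary = analyze(integrate(clean(collect(SALES_CSV)), collect(REGION_CSV)))
print(visualize(summary))
```

Store B has no unit count, so the cleaning step drops it before the join; the remaining three stores are averaged per region.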
  11. Traditionally, data was mostly structured and small in size, so it could be analyzed with simple BI tools. Today, most data is unstructured or semi-structured. The data-trends image on this slide shows that by 2020, more than 80% of the data will be unstructured.
  12. Data Science Team
      • Business Analyst
      • Data & Analytics Manager
      • Data Analyst
      • Database Administrator
      • Data Scientist
      • Statistician
      • Data Engineer
      • Data Architect
  13. Role of Business Analyst
  14. What is Analytics? Data on its own is useless unless you can make sense of it! Analytics is the scientific process of transforming data into insight for making better decisions, offering new opportunities for a competitive advantage.
  15. Types of Analytics
      • Descriptive analytics: mining data to provide business insights. What has happened?
      • Predictive analytics: predicting the future based on historical patterns. What could happen?
      • Prescriptive analytics: enabling smart decisions based on data. What should we do?
  16. Types of Analytics
      • Prescriptive Analytics: advice on possible outcomes
      • Predictive Analytics: understanding the future
      • Descriptive Analytics: insight into the past
      Why do airline prices change every hour? How do grocery cashiers know to hand you coupons you might actually use? How does Netflix frequently recommend just the right movie?
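A minimal sketch of how the three types of analytics differ in practice, assuming an invented series of monthly sales. The figures, the straight-line forecast, and the reorder rule are all illustrative assumptions rather than anything stated in the slides; real predictive and prescriptive systems use far richer models.

```python
from statistics import mean

# Hypothetical monthly sales (units sold), invented for illustration.
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: what has happened?
average_sales = mean(sales)

# Predictive analytics: what could happen?
# Fit an ordinary least-squares line through the history and extrapolate.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept  # estimate for the next month

# Prescriptive analytics: what should we do?
# Assumed rule: order enough stock to cover the forecast.
def recommend_order(forecast_units, stock_on_hand):
    return max(0, round(forecast_units) - stock_on_hand)

order = recommend_order(forecast, stock_on_hand=40)
print(average_sales, forecast, order)
```

The same data feeds all three questions; only the question being asked, and so the computation, changes.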
  17. Business Intelligence (BI) vs. Data Science
      • Data sources. BI: structured (usually SQL, often a data warehouse). Data Science: both structured and unstructured (logs, cloud data, SQL, NoSQL, text).
      • Approach. BI: statistics and visualization. Data Science: statistics, machine learning, graph analysis, natural language processing (NLP).
      • Focus. BI: past and present. Data Science: present and future.
      • Tools. BI: Pentaho, Microsoft BI, QlikView, R. Data Science: RapidMiner, BigML, Weka, R.
  18. Scope of Business Intelligence techniques employed in 2018.
  19. Interest in the term “Data Science” since December 2013 (source: Google Trends). A bag of hype words: let’s focus not on buzzwords, but on what the underlying technologies can actually solve.
  20. Lifecycle of Data Science
  21. Contrast: Databases vs. Data Science
      • Data value. Databases: “precious”. Data Science: “cheap”.
      • Data volume. Databases: modest. Data Science: massive.
      • Examples. Databases: bank records, personnel records, census, medical records. Data Science: online clicks, GPS logs, tweets, building sensor readings.
      • Priorities. Databases: consistency, error recovery, auditability. Data Science: speed, availability, query richness.
      • Structured. Databases: strongly (schema). Data Science: weakly or none (text).
      • Properties. Databases: transactions, ACID*. Data Science: CAP* theorem (2/3), eventual consistency.
      • Realizations. Databases: SQL. Data Science: NoSQL: MongoDB, CouchDB, HBase, Cassandra, Riak, Memcached, Apache River, …
      * ACID = Atomicity, Consistency, Isolation and Durability; CAP = Consistency, Availability, Partition Tolerance
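To make the "transactions, ACID" side of this contrast concrete, here is a small sketch of atomicity using Python's built-in `sqlite3` module. The `accounts` table, the names and the amounts are hypothetical; the point is only that a transfer that fails halfway leaves no partial update behind.

```python
import sqlite3

# In-memory database with an assumed schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit is undone too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)        # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
print(ok, rejected, balances(conn))
```

In the second call the credit to `bob` runs first and is then rolled back when the debit violates the constraint, which is the atomicity and consistency behaviour the slide attributes to databases. Eventually consistent NoSQL stores trade some of these guarantees for availability and speed.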
  22. Contrast: Machine Learning vs. Data Science
      • Machine Learning: develop new (individual) models; prove mathematical properties of models; improve/validate on a few, relatively clean, small datasets; publish a paper.
      • Data Science: explore many models, build and tune hybrids; understand empirical properties of models; develop/use tools that can handle massive datasets; take action!
  23. The companies are expanding as fast as the data!
  24. The first war: Terminology
      • Analyzing data has a long history!
      • Many terms have been used to describe such endeavors: Statistics, Artificial Intelligence, Machine Learning, Data Analytics.
      • Since I happen to work in a “Data Science” program perhaps I may be allowed the indulgence of using that terminology…
  25. The Case for Business Analytics (diagram labels: Business Need, Solution, Goal)
      • The business environment today is more complex than ever before.
      • Businesses are expected to be diligently responsive to the increasing demands of customers, various stakeholders and even regulators.
      • Organizations have been turning to the use of analytics.
      • More than 83% of global CIOs surveyed by IBM in 2010 singled out Business Intelligence and Analytics as one of their visionary plans for enhancing competitiveness.
      In most cases the primary objective of an organization that turns to analytics is:
      • Revenue/profit growth
      • Optimized expenditure
  26. Data Analysis Has Been Around for a While… R.A. Fisher, Howard Dresner, Peter Luhn, W.E. Deming
  27. Two quotations:
      • "Experiments, observations, and numerical simulations in many areas of science and business are currently generating terabytes of data, and in some cases are on the verge of generating petabytes and beyond. Analyses of the information contained in these data sets have already led to major breakthroughs in fields ranging from genomics to astronomy and high-energy physics and to the development of new information-based industries." - Frontiers in Massive Data Analysis, National Research Council of the National Academies
      • "Given a large mass of data, we can by judicious selection construct perfectly plausible unassailable theories, all of which, some of which, or none of which may be right." - Paul Arnold Srere
  28. "The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it, that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it." - Hal Varian, Google's Chief Economist, http://www.mckinsey.com/insights/innovation/hal_varian_on_how_the_web_challenges_managers
      My personal goal: getting students to be able to think critically about data.
  29. What is Big Data? There are many examples of "data", but what makes some of it “big”? The classic definition revolves around the three V’s: volume, velocity, and variety.
      • Volume: There is just a lot of it being generated all the time. Things get interesting, and “big”, when you can’t fit it all on one computer anymore. Why? There are many ideas here, such as MapReduce, Hadoop, etc., that all revolve around being able to process data that goes from terabytes, to petabytes, to exabytes.
      • Velocity: Data is being generated very quickly. Can you even store it all? If not, then what do you get rid of and what do you keep?
      • Variety: The data types you mention all take different shapes. What does it mean to store them so that you can play with or compare them?
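The MapReduce idea mentioned under Volume can be sketched in a few lines as a word count. This is a single-process illustration only: the function names and the two sample documents are made up, and a real framework such as Hadoop would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into one result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}

# Hypothetical input: each string stands in for a chunk stored on a different machine.
documents = ["big data is big", "data science uses data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines when the data no longer fits on one computer.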
  30. BIG DATA: data that is too large and too complex for conventional data tools to capture, store and analyze. Big Data everywhere!
      • Shares traded on US stock markets each day: 7 billion
      • Data generated in one flight from NY to London: 10 terabytes
      • Number of tweets per day on Twitter: 400 million
      • Number of ‘Likes’ each day on Facebook: 3 billion
      • The 3 V’s of Big Data: volume, variety, velocity
      • 90% of the world’s data was generated in the last two years
      • Source: www.imarticus.org
  31. Is Big Data the same as Data Science? (Diagram: Big Data and Data Science.)
      • Are Big Data and Data Science the same thing?
      • I wouldn't say so...
      • Data Science can be done on small data sets.
      • And not everything done using Big Data would necessarily be called Data Science.
  32. Is Big Data the same as Data Science? (Diagram: Big Data and Data Science.)
      • Are Big Data and Data Science the same thing?
      • I wouldn't say so...
      • Data Science can be done on small data sets.
      • And not everything done using Big Data would necessarily be called Data Science.
      • But there certainly is a substantial overlap!
  33. Perspective of Big Data's Growth
      • Worldwide Big Data market revenues for software and services are projected to increase from $42B in 2018 to $103B in 2027, attaining a Compound Annual Growth Rate (CAGR) of 10.48%, according to Wikibon.
      • According to an Accenture study, 79% of enterprise executives agree that companies that do not embrace Big Data will lose their competitive position and could face extinction. Even more, 83%, have pursued Big Data projects to seize a competitive edge.
      • Forrester predicts the global Big Data software market will be worth $31B this year, growing 14% from the previous year. The entire global software market is forecast to be worth $628B in revenue, with $302B from applications.
      • 59% of executives say Big Data at their company would be improved through the use of AI, according to PwC.
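The Wikibon growth rate quoted above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The function name `cagr` is just a label chosen for this sketch; the inputs are the $42B (2018) and $103B (2027) figures from the slide.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

# Wikibon projection from the slide: $42B in 2018 to $103B in 2027 (9 years).
wikibon = cagr(42, 103, 2027 - 2018)
print(f"{wikibon:.2%}")
```

Over the nine years from 2018 to 2027 this works out to roughly 10.5% a year, in line with the 10.48% on the slide.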
  34. Future Trends. Technologies and industries to watch in the near future:
      • Progressive Web Apps (PWAs): a mixture of mobile and web apps.
      • Blockchain & Fintech: meta-model building, reliable trading & credit scoring.
      • Healthcare: diagnosis by medical imaging (computer vision & ML).
      • AR/VR: sport analysis, business cards (image tracking), real-life gaming (Hado).
      • AI speech assistants, smarter chat-bot integrations.
      • Smart Supply Chain: digital twins (IoT sensors).
      • 5G: big data, mobile cloud computing, scalable IoT & network function virtualisation (NFV).
      • 3D Printing: prefabrication efficiency, defect detection, predictive ML maintenance.
      • Dark Data: information that is yet to become available in digital format.
      • Quantum Computing: cutting data processing times into fractions.
  35. Thank You! Dr. Sunil Kr Pandey, Professor & Director (IT & UG), Institute of Technology & Science, Mohan Nagar, Ghaziabad. Email: sunilpandey@its.edu.in
