DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION


Today, data science is enabling companies, governments, research centres and other organisations to turn their volumes of big data into valuable and actionable insights. It is important to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. According to the McKinsey Global Institute, the U.S. alone could face a shortage of about 190,000 data scientists and 1.5 million managers and analysts who can understand and make decisions using big data by 2018. In coming years, data scientists will be vital to all sectors, from law and medicine to media and nonprofits. Has the African continent planned to train the next generation of data scientists it will need?


  1. 1. Data Science a Multifaceted Discipline: Data Science Engineering and Data Science Analytics A Keynote Address, 24 February 2017 By Prof. Venansius Baryamureeba, PhD Chairman and Managing Director, ICT Consults Ltd www.ict.co.ug www.baryamureeba.ug; barya@baryamureeba.ug www.utamu.ac.ug/barya; barya@utamu.ac.ug Africa Data Forum Johannesburg Conference, 22-24 February 2017.
  2. 2. Outline • Data Science • Data Science a Multifaceted Discipline • Foundations of Data Science • Data (Science) Engineering • Importance and Evolving Role of Data Science • Examples of Data Science in Action • Data (Science) Analytics • Big Data Analytics • Conclusion
  3. 3. Data Science • Data Science is an interdisciplinary field about methods and systems to extract knowledge or insights from large quantities of data coming in various forms. Historically, no single practice described the simultaneous use of so many different skill sets and bases of knowledge. • Data science has emerged as the field at the intersection of mathematics, statistics and computer science knowledge on the one hand and expertise in a scientific discipline on the other. • Data science employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, and computer and information sciences and applies them to a wide range of data-rich domains such as biomedical sciences, physical science, geoscience, social science, engineering, business, and education.
  4. 4. Data Science a Multifaceted Discipline • Data science is a very broad and multifaceted field • Data science combines aspects of computer science, information science, mathematics and statistics. • Data Science requires a multidisciplinary skill set (i.e. requires skills in computer science, analytics, data management, art and design and entrepreneurship among others). • Data science uses automated methods to analyze massive amounts of data and extract knowledge from them. • Data science applies various tools and techniques to data in order to gain a data product, an exploitable insight derived from collected facts. • Data science provides the underlying theory and methods of the data revolution.
  5. 5. Foundations of Data Science • While there is not yet a consensus on what precisely constitutes data science, three professional communities, all within computer science and/or statistics, are emerging as foundational to data science: • (i) Database Management enables transformation, conglomeration, and organization of data resources; • (ii) Statistics and Machine Learning convert data into knowledge; and • (iii) Distributed and Parallel Systems provide the computational infrastructure to carry out data analysis.
  6. 6. Role of Statistics in Data Science • In a policy statement issued on October 1, 2015, the American Statistical Association (ASA) stated that statistics is "foundational to data science"—along with database management and distributed and parallel systems—and its use in this emerging field empowers researchers to extract knowledge and obtain better results from Big Data and other analytics projects. • The statement also encouraged "maximum and multifaceted collaboration" between statisticians and data scientists to maximize the full potential of big data and data science.
  7. 7. Data Scientist Vs Data Engineer • Data Scientists and Data Engineers may be new job titles, but the core job roles have been around for a while. • Traditionally, anyone who analyzed data would be called a “Data Analyst” and anyone who created backend platforms to support data analysis would be called a “Business Intelligence (BI) Developer”. • With the emergence of big data, new roles emerged in corporations, research centers and governments — namely, Data Scientists and Data Engineers.
  8. 8. Data Analyst • Data Analysts are experienced data professionals in their organization who can query and process data, provide reports, summarize and visualize data. • They have a strong understanding of how to leverage existing tools and methods to solve a problem, and help people from across the organisation understand specific queries with ad-hoc reports and charts. • However, they are not expected to deal with analyzing big data, nor are they typically expected to have the mathematical or research background to develop new algorithms for specific problems. • Skills and Tools: Data Analysts need to have a baseline understanding of some core skills: statistics, data munging, data visualization, exploratory data analysis, Microsoft Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Microsoft Access, Tableau, SSAS.
  9. 9. Business Intelligence Developer • Business Intelligence (BI) Developers are data experts that interact more closely with internal stakeholders to understand the reporting needs, and then to collect requirements, design, and build BI and reporting solutions for the organisation. • They have to design, develop and support new and existing data warehouses, ETL ( Extract, Transform and Load) packages, cubes, dashboards and analytical reports. • They work with databases, both relational and multidimensional, and should have great SQL development skills to integrate data from different resources. They use all of these skills to meet the enterprise-wide self- service needs. • BI Developers are typically not expected to perform data analyses. • Skills and tools: ETL, developing reports, OLAP, cubes, web intelligence, business objects design, Tableau, dashboard tools, SQL, SSAS, SSIS.
  10. 10. Data Scientist • A data scientist is the alchemist of the 21st century: someone who can turn raw data into purified insights. Data scientists apply statistics, machine learning, and other analytic approaches to solve critical business problems. Their primary function is to help organizations turn their volumes of big data into valuable and actionable insights. • In addition to data analytical skills, Data Scientists are expected to have strong programming skills, an ability to design new algorithms, handle big data, with some expertise in the domain knowledge. • Data Scientists are also expected to interpret and eloquently deliver the results of their findings, by visualization techniques, building data science apps, or narrating interesting stories about the solutions to their data (business) problems.
  11. 11. Data Scientist Cont’d • The problem-solving skills of a data scientist require an understanding of traditional and new data analysis methods to build statistical models or discover patterns in data. Examples include creating a recommendation engine, predicting the stock market, diagnosing patients based on their similarity, or finding the patterns of fraudulent transactions. • Data Scientists may sometimes be presented with big data without a particular business problem in mind. In this case, the curious Data Scientist is expected to explore the data, come up with the right questions, and provide interesting findings! • They should have experience working with datasets of different sizes and shapes, and be able to run their algorithms on large data effectively and efficiently, which typically means staying up-to-date with the latest technologies. • Skills and tools: Python, R, Scala, Apache Spark, Hadoop, data mining tools and algorithms, machine learning, statistics.
  12. 12. Data Engineer • Data engineering includes what some organisations might call Data Infrastructure or Data Architecture. • The data engineer gathers and collects the data, stores it, does batch processing or real-time processing on it, and serves it via an API to a data scientist who can easily query it. • A good data engineer has extensive knowledge on databases and best engineering practices. These include handling and logging errors, monitoring the system, building human-fault-tolerant pipelines, understanding what is necessary to scale up, addressing continuous integration, knowledge of database administration, maintaining data cleaning, and ensuring a deterministic pipeline.
  13. 13. Data Engineer Cont’d • Data Engineers are the data professionals who prepare the “big data” infrastructure to be analyzed by Data Scientists. • They are software engineers who design and build systems, integrate data from various sources, and manage big data. They then write complex queries against that data, make sure it is easily accessible and works smoothly, and aim to optimize the performance of their organisation’s big data ecosystem. • They might also run some ETL (Extract, Transform and Load) on top of big datasets and create big data warehouses that can be used for reporting or analysis by data scientists. Beyond that, because Data Engineers focus more on the design and architecture, they are typically not expected to know any machine learning or analytics for big data. • Skills and tools: Hadoop, MapReduce, Hive, Pig, MySQL, MongoDB, Cassandra, Data streaming, NoSQL, SQL, programming.
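The extract, transform and load steps described in the two Data Engineer slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RAW_CSV` sample, the `payments` table and every function name are hypothetical, and an in-memory SQLite database stands in for a real warehouse. It shows two of the practices the slides name: logging errors instead of failing silently, and keeping the pipeline deterministic.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

# Hypothetical raw export; a real pipeline would read from files, queues or APIs.
RAW_CSV = """customer_id,amount,currency
c1,10.50,UGX
c2,not-a-number,UGX
c1,4.25,UGX
c3,7.00,UGX
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: keep valid rows, log and skip malformed ones."""
    clean = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            clean.append((row["customer_id"], float(row["amount"])))
        except (KeyError, ValueError) as exc:
            log.warning("skipping line %d: %s", line_no, exc)
    return sorted(clean)  # sorting makes the load order deterministic


def load(rows, conn):
    """Load: write the cleaned rows into a table that analysts can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (customer_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def totals_by_customer(conn):
    """Serve: the kind of query a data scientist might run on the result."""
    cur = conn.execute(
        "SELECT customer_id, SUM(amount) FROM payments "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(totals_by_customer(conn))
```

The malformed row for `c2` is logged and dropped rather than aborting the run, so downstream queries only ever see validated data.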
  14. 14. Data Scientist and Data Engineer • There is a great deal of overlap between these two roles. • For instance, a data scientist might use the Hadoop ecosystem to serve up answers to their data questions, and a data engineer might be programming an iterative machine learning algorithm to run over a Spark cluster. • Some companies, research centres or governments prefer that candidates are comfortable with aspects of both data science and data engineering. Additionally, if a company, research centre or government has defined these two roles separately, it may be possible to switch from one role to the other.
  15. 15. Key Skill Areas for a Graduate in Data Science and Engineering • For a Graduate in Data Science and Engineering, the core computer science and statistics courses should cover: Process Mining, Data Mining, Algorithms, Visualization, Real-life data challenges, Statistics for Big Data, Statistical Learning Theory, and Probability and Stochastic Processes. • For a Graduate in Data Science the core courses should cover: Database and Cloud Computing technology for Big Data; Data Mining, Statistics and Predictive Modeling; Machine Learning and Graph Analytics; Information Retrieval and Natural Language Processing; Business Intelligence and Visual Analytics; Data Warehousing and Decision Support; Communication and Visualization of Results; Privacy, Security and Ethics; and Entrepreneurship and Data Product Design. • For a Graduate in Data Engineering, additional courses can be drawn from a graduate program in software engineering to complement the common courses of a graduate program in data science and engineering above.
  16. 16. Importance of Data Science • We live in a digitized world in which massive amounts of data are harvested daily to inform actions and policies for the future. • We build sophisticated systems to collect, organize, analyze, and share data. • We each have unlimited access to huge amounts of information and the tools to interpret it. • We are more aware than ever how molecules and cells move, how inflation fluctuates, and how the flu travels, all in real time. • We can efficiently distribute bus stations and plan transit schedules. • With the right tools, we can predict how proteins misfold in our brains, or what our galaxy might look like in a thousand years. • In a society driven by data, knowledge is a commodity that is created and shared transparently all over the world.
  17. 17. Disciplinary Trends • Data science is a rapidly growing field with an increasing demand in industry, research, and government. • A recent McKinsey Global Institute study states that the US will face a shortage of about 190,000 data scientists and 1.5 million managers and analysts who can understand and make decisions using big data by 2018. • In a recent MIT Sloan Management Review survey, four in ten (43%) companies report their lack of appropriate analytical skills as a key challenge. • The ideal data scientist is a scientist with entrepreneurial skills, who is used to asking the right business questions, understands the techniques and is familiar with the tools for solving them.
  18. 18. Turning Data into Insight • From government, social networks and ecommerce sites to sensors, smart meters and mobile networks, data is being collected at an unprecedented speed and scale. • The networked world is generating big data that no human, or group of humans, can process fast enough. • This big data has the potential to transform the way business, government, science and healthcare are carried out. • Data science holds the key to unlocking that potential i.e. Data science can put big data to use.
  19. 19. The Evolving Role of Data Science • In the social sciences, modern research problems demand analysis beyond traditional statistical hypothesis testing. Students are increasingly faced with the prospect of building their own analysis software and methodologies. • In the life sciences, vast quantities of data generated by new Deoxyribonucleic acid (DNA), Ribonucleic acid (RNA), and protein sequencing technologies have engulfed biologists and chemists, who rarely have training in statistics or computer science. • Physicists, who traditionally have the most computational training, are tackling data sets of orders of magnitude larger than the previous generation of researchers ever dealt with. As Bloom says, “big data is when you have more data than you’re used to.”
  20. 20. Evolving Role of Data Science cont’d • Why would customers go to physical shops if the majority of the products can be bought online on Amazon - that in turn even suggests products/articles that are bought by like-minded people? • Why would future generations go to expensive financial advisors of established banks, when Google offers often better financial advice by analyzing search behavior using Google Trends? • Understanding the needs of the new online society is key for succeeding in today’s business world, and Data Science is one approach towards data-driven decision making as opposed to using “gut feelings”.
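The idea of suggesting products "bought by like-minded people" can be illustrated with a simple co-purchase count. The sketch below is an assumption-laden toy, not a description of how Amazon or any retailer actually does it: the `BASKETS` data and the function names are hypothetical, and production recommenders use far richer signals.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase baskets, one per customer.
BASKETS = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"phone", "case"},
    {"laptop", "mouse", "case"},
]


def build_co_purchase_counts(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for item, other in permutations(basket, 2):
            counts[item][other] += 1
    return counts


def recommend(item, counts, k=2):
    """Return up to k items most often bought together with `item`."""
    # Sort by descending count, then by name, so ties resolve deterministically.
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


counts = build_co_purchase_counts(BASKETS)
print(recommend("laptop", counts))
```

Here a customer viewing a laptop would be shown the mouse first, because that pair occurs in three baskets, and the bag second, because that pair occurs in two.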
  21. 21. Scientific Method Vs Analytical Method • The Scientific Method is a method of procedure that has characterized natural science since the 17th century, consisting of systematic observation, measurement, and experiment, and the formulation, testing, and modification of hypotheses. • The Analytical Method is a generic process combining the power of the Scientific Method with the use of a formal process to solve any type of problem.
  22. 22. Analytical Method • The Analytical Method has nine steps: • 1. Identify the problem to solve. • 2. Choose an appropriate process. (THE KEY STEP) • 3. Use the process to hypothesize analysis or solution elements. • 4. Design an experiment(s) to test the hypothesis. • 5. Perform the experiment(s). • 6. Accept, reject, or modify the hypothesis. • 7. Repeat steps 3, 4, 5, and 6 until the hypothesis is accepted. • 8. Implement the solution. • 9. Continuously improve the process as opportunities arise.
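The loop formed by steps 3 to 7 of the Analytical Method can be sketched as a short Python function. Everything here is a hypothetical illustration: the candidate hypotheses are guesses at a price per unit, the experiment measures the worst gap between predicted and observed revenue, and the tolerance is an arbitrary acceptance threshold.

```python
def run_analytical_method(candidates, experiment, tolerance):
    """Try hypotheses in turn until one passes the experiment (steps 3 to 7)."""
    for attempt, hypothesis in enumerate(candidates, start=1):  # step 3
        error = experiment(hypothesis)                           # steps 4 and 5
        if error <= tolerance:                                   # step 6: accept
            return hypothesis, attempt
        # Otherwise reject and move on to a modified hypothesis (step 7).
    return None, len(candidates)


# Hypothetical observations: units sold (x) and revenue (y).
OBSERVATIONS = [(1, 3.1), (2, 5.9), (3, 9.2), (4, 11.8)]


def make_experiment(observations):
    """Build an experiment that scores a proposed price per unit."""
    def experiment(price_per_unit):
        # Worst absolute gap between predicted and observed revenue.
        return max(abs(price_per_unit * x - y) for x, y in observations)
    return experiment


accepted, attempts = run_analytical_method(
    candidates=[1.0, 2.0, 3.0, 4.0],
    experiment=make_experiment(OBSERVATIONS),
    tolerance=0.5,
)
print(accepted, attempts)
```

The first two hypotheses are rejected because their predictions miss the observations by more than the tolerance; the third is accepted, at which point the loop stops and the solution would be implemented (step 8).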
  23. 23. Examples of Data Science in Action Problems that we used to solve using operations research techniques are now better solved using data science techniques. • Planning and forecasting: • identifying possible future developments in telecommunications • Identifying possible future developments in banking • deciding how much capacity is needed in a holiday business • Marketing: evaluating the value of sale promotions, developing customer profiles and computing the life-time value of a customer. • Credit scoring: deciding which customers offer the best prospects for credit companies.
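As an illustration of "computing the life-time value of a customer", the sketch below uses one common simplified model: the expected margin in each year is weighted by the probability that the customer is still retained and then discounted to present value. The slides do not specify a formula, so the model, the function name and the example figures are all assumptions.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Sum discounted expected margins over a fixed horizon.

    Year t contributes annual_margin * retention_rate**t / (1 + discount_rate)**t,
    with t starting at 0 for the current year.
    """
    if not 0 <= retention_rate <= 1:
        raise ValueError("retention_rate must be between 0 and 1")
    total = 0.0
    for t in range(years):
        survival = retention_rate ** t      # chance the customer is still active
        discount = (1 + discount_rate) ** t  # time value of money
        total += annual_margin * survival / discount
    return total


# Hypothetical customer: margin of 100 per year, 80% retention, 10% discount rate.
clv = customer_lifetime_value(100.0, 0.8, 0.1, years=3)
print(round(clv, 2))
```

A marketing team could compare this figure with the cost of a promotion to judge whether acquiring or retaining such a customer is worthwhile.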
  24. 24. Examples of Data Science in Action cont’d • Scheduling: • of aircrews and the fleet for airlines • of vehicles in supply chains • of orders in a factory • of operating theatres in a hospital • Yield management: • setting the prices of airline seats and hotel rooms to reflect changing demand and the risk of no shows • Facility planning: • computer simulations of airports for the rapid and safe processing of travellers • improving appointments systems for medical practice. • Defense and peace keeping: finding ways to deploy troops rapidly.
  25. 25. Big Data Analytics • Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. • Big data analytics is used in many industries to allow companies and organizations to make better business decisions and in the sciences to verify or disprove existing models or theories. • Big data analytics requires substantial computing power.
  26. 26. Why Big Data Analytics is Important • To maximize the discovery potential, we must employ advanced big data analytics methods and algorithms, visualization techniques, and high-performance computing. • The unprecedented and multifaceted challenges demand advanced big data analytics skills in statistics, data mining, machine learning, signal/image processing and visualization, data management and programming. • These skills bridge several disciplines and push research frontiers: from the methods disciplines of computer science, electrical engineering, applied mathematics, and statistics to domain disciplines across science and engineering.
  27. 27. Why Big Data Analytics is Important Cont’d • Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. • With today’s technology, it’s possible to analyze data and get answers from it almost immediately whereas the traditional business intelligence solutions are slower and less efficient. • Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits and happier customers. • Businesses can learn key insights about their customers to make informed business decisions. • Scientists can discover previously unknown patterns hidden deep inside the mountains of data.
  28. 28. Why Big Data Analytics is Important Cont’d • Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data – plus they can identify more efficient ways of doing business. • Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned. • New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. With big data analytics, more companies are creating new products to meet customers’ needs.
  29. 29. Emphasis on a Few Application Areas of Big Data Analytics • E-Business • Politics • Informal Sector • Healthcare Management • Mobile Money
  30. 30. E-Business Systems • E-business systems are a set of online technologies, equipment and tools that a business uses to conduct business via the Internet. These systems help a company/ organisation connect with customers, process orders and manage information. • For instance, one high-profit e-business system is a web-based retail store where customers can purchase products online. • Components of Business • Business Process • Managing Business and Firm Hierarchies • The Business Environment • The Role of Information Systems in Business • Systems that Span the Enterprise • Enterprise Applications • Intranets and Extranets • E-Business, E-Commerce and E-Government
  31. 31. Politics and Big Data Analytics • Winning politics is now tied to big data analytics • One of the storylines in the November 2016 US presidential election is how both major political parties used big data analytics to inform their decisions and tried to get ahead. • In winning the 2012 US presidential election, the Obama campaign successfully employed big data analytics to influence people and get them to vote. Analytics experts say enterprises can apply these same tactics to influence customers and drive sales. • The 2012 US Presidential election was a watershed event for leveraging technology in the political arena. Both the Obama and Romney campaigns relied heavily on technology, but many analysts say the Obama campaign tapped into the power of big data analytics more effectively.
  32. 32. The Informal Sector and Big Data Analytics • The International Labour Organization's (ILO) guidelines on measuring the informal sector use big data analytics techniques • Knowing the size of the informal sector in any country/continent helps in planning and deploying key interventions • Big data analytics is critical in informing strategies aimed at transforming the informal sector into the formal sector
  33. 33. Healthcare Management and Big Data Analytics • The healthcare industry historically has generated large amounts of data, driven by record keeping, compliance & regulatory requirements, and patient care. • Big data in healthcare refers to electronic health data sets so large and complex that they are difficult (or impossible) to manage with traditional software and/or hardware, or with common data management tools and methods. • Big data analytics in healthcare is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. • Big data analytics in healthcare has great potential despite the various challenges to overcome.
  34. 34. Mobile Money and Big Data Analytics • Mobile money providers, particularly mobile operators, are sitting on two gold mines of data: one from their core GSM operations (Telco Call Detail Record (CDR) data, detailed coordinates of their Cell IDs, etc.) and one from their mobile money operation (Know Your Customer (KYC) data for customers, agent registration forms, transactional databases, etc.). • Uncovering, analyzing and transforming mobile money data into action: • Big data analytics can help in understanding how issues like customer demographics, usage in the first month after sign-up and quality of agents impact ongoing customer activity. • Big data analytics can also yield very powerful insights to track mobile money fraud, how to better manage an agent network, manage float and cash, drive the marketing expenditures, etc. • Big data analytics can feed into most of the key business decisions a mobile money manager can make.
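The question of how usage in the first month after sign-up relates to ongoing customer activity can be explored with a simple grouped rate. The sketch below is a toy under stated assumptions: the `CUSTOMERS` records, the band boundaries and the 90-day activity flag are hypothetical, and a real analysis would run over the operator's transactional database.

```python
from collections import defaultdict

# Hypothetical records:
# (customer_id, transactions in first month, still active after 90 days)
CUSTOMERS = [
    ("a", 0, False), ("b", 1, False), ("c", 1, True),
    ("d", 4, True), ("e", 6, True), ("f", 0, False),
    ("g", 3, True), ("h", 2, False),
]


def usage_band(first_month_txns):
    """Bucket customers by how much they transacted in their first month."""
    if first_month_txns == 0:
        return "none"
    if first_month_txns <= 2:
        return "low (1-2)"
    return "high (3+)"


def active_rate_by_band(customers):
    """Share of customers in each first-month band who remain active."""
    totals = defaultdict(int)
    active = defaultdict(int)
    for _, txns, is_active in customers:
        band = usage_band(txns)
        totals[band] += 1
        active[band] += int(is_active)
    return {band: active[band] / totals[band] for band in totals}


rates = active_rate_by_band(CUSTOMERS)
for band, rate in sorted(rates.items()):
    print(f"{band}: {rate:.0%} still active")
```

In this made-up sample, customers who transacted three or more times in their first month all stayed active, which is the kind of pattern that could inform onboarding campaigns or agent incentives.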
  35. 35. Conclusion • Data Science continues to evolve as a multifaceted discipline • The demand for Data Scientists and Data Engineers is growing by leaps and bounds every passing day • According to the McKinsey Global Institute, the U.S. alone could face a shortage of about 190,000 professionals with data science skills by 2018. • McKinsey Global Institute found that sectors such as computer and electronic products and information, finance and insurance, and government will likely gain the most value from using big data, and thus employ many of the world’s data scientists. • Data scientists will be vital to all sectors in coming years—from law and medicine to media and nonprofits. Thank You. END

Editor's Notes

  • datascience.nyu.edu
  • Data Science Skill Set, T. Stadelmann et al., Applied Data Science in Europe.
