Ml in a day v 1.1


  1. 1. Machine Learning 101 Advanced Analytics and Data Science
  2. 2. CCG Analytics Solutions & Services DATA MANAGEMENT Data & analytics consultants with a passion for helping clients overcome business challenges & increase performance by leveraging modern analytic solutions. BUSINESS ANALYTICS DATA STRATEGY
  3. 3. VOICES OF OUR CUSTOMERS “CCG brought the expertise and the vision to help us execute, to provide visibility to the data in a manner that we can use it faster.” - Gary Gray, Business Solutions Executive, Corsicana Mattress Company “The people we talked to know us. CCG wasn’t trying to fit us into a boilerplate template but prescribe a tailored solution. Their RapidRoadmap was the basis of our BI Strategy for the next two years.” - Kevin Davis, Sr. Director of BI, Kforce “Many times with CCG, we come to the table with questions or ideas and within a couple of days or weeks the team comes back with above and beyond what we actually asked for. They care.” - Chris Fitzpatrick, Vice President of Business Analytics & Strategy, vineyard vines “I'm amazed at the talent at CCG, not just the skillset - they're really good people. We've already referred them once and will do so again!” - CIO, Ruth’s Chris Hospitality Group
  4. 4. Objectives By the end of this workshop, you should be able to: Describe what Machine Learning is and how it fits into the analytic landscape Understand the difference between traditional and “advanced” analytics Describe what a statistical model is Understand a machine learning approach to statistical modeling Explain the conceptual methodology behind the Machine Learning areas of classification and clustering Describe some of the most common tools for implementing data science and Machine Learning
  5. 5. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  6. 6. The concepts in Machine Learning are not new. How has Machine Learning Evolved? https://www.quantinsti.com/blog/machine-learning-basics
  7. 7. Even though the concepts are decades old, machine learning has only become feasible at scale in recent years. Why Machine Learning Now? Flood of data and decreasing costs of storage Increasing computational power Increased attention from researchers Growth of open source technologies Support from industries
  8. 8. Machine Learning has tons of useful applications you already encounter or hear about every day. Analyzing Images Understanding Language Forming & Executing Strategy Personalized Recommendations Autonomous Decisions Predicting Asset Values How is Machine Learning used?
  9. 9. Machine Learning isn’t just applicable to high tech. There are suitable use cases present in most business sectors. Where is Machine Learning used? Healthcare • Claims Fraud • Real-time mortality risk for ICU patients • Response Adapted Radiotherapy • Predicting patient medication adherence • Translational/precision medicine Finance • Foreclosure/credit risk • Risk analysis • Fraud detection • Demand forecasting • Anti Money Laundering • Algorithmic trading Energy • Resource allocation • Load forecasting • Grid optimization • Robotics • Anomaly detection • Image recognition • Predictive maintenance Retail • Single view of customer • Customer service analysis • Inventory planning • Social media analysis • Lead scoring • Marketing campaign evaluation
  10. 10. Machine learning sits at the intersection of statistics and computer science to help businesses make decisions. Why Machine Learning Now? Computational Power Statistics Predictive & Prescriptive Decision Support Faster More Accurate More Powerful Self-Improving Always-On
  11. 11. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  12. 12. Machine Learning is a technique that can be used in the data science process to achieve several possible outputs. What is Machine Learning? Data Science: A broad process for generating insights that may involve data ingestion from one or many sources (including external data, streaming data, or big data), data processing and cleansing, model generation using statistical or machine learning approaches, model selection, model deployment and maintenance, and visualization of data. Advanced Analytics: Apply data science to predictive (what will happen?) or prescriptive (what should we do?) business use cases. Artificial Intelligence / Cognitive Computing: Apply data science to approximate human intuition and decision making (e.g. strategy, creativity, planning) or human sensory functions (e.g. computer vision, natural language understanding, etc.). Statistics: A branch of math for generating descriptions of or inferences about a population, often based on samples of the population. Inferences may take the form of “models,” which are equations that approximate the data’s inherent relationships. Machine Learning: Combines computer science with math concepts to generate models by rapidly iterating on large datasets. Other Analytics Disciplines (Data Engineering, Visualization). Diagram labels: Disciplines, Process, Outputs, Automation / Robotics / Intelligent Devices, Actions, Strategy / Operations
  13. 13. Advanced Analytics (“AA”) enables predictive and prescriptive uses of data by applying sophisticated math and statistics to automate parts of the analysis. What is Advanced Analytics? Traditional analytics focuses on understanding and explaining the data that has been collected. AA focuses on generating new data in the form of predictions or decisions, and going the extra step to automate decision-making when possible.
  14. 14. Advanced Analytics deals with making “best guesses” faster, better, and more consistently than relying on human SMEs. Traditional vs. Advanced Analytics Traditional Analytics: Provide insights on existing data using: • Raw data points • Summaries of data • Calculations across existing data fields • KPIs The data reported are historical or current facts. Generally requires the application of basic mathematics or arithmetic. Advanced Analytics: Generate new data, including: • Predicted future values • Best guesses of missing values • Suggested next steps • Categorizations The data generated are “best guesses” and contain some uncertainty. Requires the application of advanced mathematics, statistics and computing principles.
  15. 15. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  16. 16. Machine Learning works by using “algorithms” to generate “models.” How does Machine Learning work? A model is a repeatable, data-driven approach to making a best guess. It does this by formalizing mathematical relationships between data in the form of either: – Rules (e.g. predict applicants will default on a loan if Credit Score < 700 and Debt to Income Ratio > 30%) – Or an equation (e.g. predict Home Price = 100*Square Footage + 2*Average Income in the Area) NOTE that this is not the same as a DATA model. A data model and a statistical model are different things.
  17. 17. In the past we’ve told computers how to use data to answer our questions. What’s a model? Data Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Program / Model This month sales = (prior month + 2 months prior + 3 months prior) / 3 Answer This month’s sales = $3MM?
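The two model forms on this slide can be sketched in a few lines of Python. The thresholds and coefficients come straight from the slide's examples; the function and parameter names are illustrative assumptions, not part of the deck.

```python
def predict_default(credit_score: float, debt_to_income: float) -> bool:
    """Rule-based model: predict default when both slide conditions hold."""
    return credit_score < 700 and debt_to_income > 0.30


def predict_home_price(square_footage: float, average_area_income: float) -> float:
    """Equation-based model using the slide's coefficients (100 and 2)."""
    return 100 * square_footage + 2 * average_area_income


# Hypothetical inputs, chosen only to exercise both models.
print(predict_default(credit_score=650, debt_to_income=0.35))
print(predict_home_price(square_footage=2000, average_area_income=60000))
```

Either form is "repeatable" in the slide's sense: the same inputs always produce the same best guess.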
  18. 18. But we’ve found that if we give the machine historic facts, we can let it find the right program/model to plug in for future answers. What’s a model? Historic facts (shown as a stack of repeated cards): Data Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $1MM Answer Last month’s sales: $2MM; Data Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Answer Last month’s sales: $2MM Program / Model This month’s sales = 1/8 * Prior month + 1/3 * 2 months prior + 1/4 * 3 months prior
  19. 19. Once we have our machine-defined program, we can use it with new data to make better predictions. What’s a model? Historic facts (shown as a stack of repeated cards): Data Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $1MM Answer Last month’s sales: $2MM; Data Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Answer Last month’s sales: $2MM Program / Model This month’s sales = 1/8 * Prior month + 1/3 * 2 months prior + 1/4 * 3 months prior New Data Prior month sales: $8MM 2 months prior: $6MM 3 months prior: $8MM Answer This month’s sales = $5MM
  20. 20. The word algorithm gets used a lot, but it isn’t always defined. What is an algorithm? A defined set of steps for solving a problem Often involves repeating steps May or may not have an ending condition – The problem is solved to our satisfaction • For example – stop when the last 4 iterations have been 95% accurate or better – The problem hasn’t been solved but we don’t seem to be getting any closer to solving it • For example – stop if the last 10 iterations have not seen any improvement in accuracy – The process has run for a long time • For example – stop after the program has run for 12 hours, regardless of whether progress is still being made
  21. 21. Almost all machine learning algorithms follow the same general pattern. What is an algorithm? Create a hypothesis: Collect the data and randomly create initial decision rules. Evaluate the hypothesis: Design a method for measurably evaluating how good or bad your hypothesis is. Adjust the hypothesis: Update your hypothesis in a way that marginally improves the performance of your decision rules. Repeat until convergence: Continue this process until either you are satisfied with the results, or your hypothesis can’t improve anymore with the data available.
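As a rough illustration of "letting the machine find the program," the sketch below fits three weights to past sales with plain gradient descent. The history rows, learning rate, and step count are all hypothetical assumptions; the slide does not say which fitting method or data produced its weights.

```python
# Hypothetical history, in $MM:
# (prior month, 2 months prior, 3 months prior) -> actual sales that followed.
history = [
    ((4.0, 3.0, 2.0), 2.0),
    ((2.0, 4.0, 3.0), 2.5),
    ((2.5, 2.0, 4.0), 2.0),
    ((2.0, 2.5, 2.0), 1.5),
]


def predict(weights, features):
    """Weighted sum of the three prior months."""
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, rows):
    return sum((predict(weights, x) - y) ** 2 for x, y in rows) / len(rows)


def fit(rows, learning_rate=0.01, steps=5000):
    """Start from all-zero weights and repeatedly nudge them downhill."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for x, y in rows:
            error = predict(weights, x) - y
            for i, xi in enumerate(x):
                gradient[i] += 2 * error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


weights = fit(history)
print("learned weights:", [round(w, 3) for w in weights])
print("error before:", mean_squared_error([0.0, 0.0, 0.0], history))
print("error after: ", mean_squared_error(weights, history))
```

The point is the division of labor: we supply facts and a way to measure error, and the weights are found rather than hand-written.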
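To check the slide's arithmetic, the machine-defined program can be applied to the new data directly. The weights and sales figures are the slide's own; the function name is an assumption, and `Fraction` is used only to keep the thirds exact.

```python
from fractions import Fraction

# Weights from the slide's machine-defined program.
WEIGHTS = (Fraction(1, 8), Fraction(1, 3), Fraction(1, 4))


def forecast_sales(prior_month, two_months_prior, three_months_prior):
    """This month's sales as a weighted sum of the three prior months ($MM)."""
    months = (prior_month, two_months_prior, three_months_prior)
    return sum(w * m for w, m in zip(WEIGHTS, months))


print(forecast_sales(8, 6, 8))  # 1 + 2 + 2 = 5, matching the slide's $5MM
```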
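The create / evaluate / adjust / repeat pattern, together with the three kinds of ending condition from the previous slide, might look like the sketch below. The loan data, the single-threshold rule, and every parameter value are hypothetical; real algorithms adjust their hypotheses far less naively than this random nudge.

```python
import random
import time

# Hypothetical toy data: (credit score, did the borrower repay?)
LOANS = [(580, False), (610, False), (640, False), (660, True), (690, False),
         (700, True), (720, True), (750, True), (780, True), (800, True)]


def accuracy(threshold):
    """Share of loans where the rule 'repays if score >= threshold' is right."""
    correct = sum((score >= threshold) == repaid for score, repaid in LOANS)
    return correct / len(LOANS)


def learn_threshold(target=0.9, patience=10, max_seconds=1.0, seed=0):
    rng = random.Random(seed)
    best = rng.uniform(500, 850)        # create a hypothesis (a random rule)
    best_accuracy = accuracy(best)      # evaluate the hypothesis
    stale_rounds = 0
    started = time.monotonic()
    while True:                         # repeat until an ending condition fires
        if best_accuracy >= target:
            return best, best_accuracy, "good enough"
        if stale_rounds >= patience:
            return best, best_accuracy, "no recent improvement"
        if time.monotonic() - started > max_seconds:
            return best, best_accuracy, "out of time"
        candidate = best + rng.uniform(-25, 25)   # adjust the hypothesis
        candidate_accuracy = accuracy(candidate)
        if candidate_accuracy > best_accuracy:
            best, best_accuracy, stale_rounds = candidate, candidate_accuracy, 0
        else:
            stale_rounds += 1


threshold, score, reason = learn_threshold()
print(f"threshold={threshold:.0f} accuracy={score:.0%} stopped because: {reason}")
```

Which ending condition fires depends on the random starting rule, which is why all three are checked on every pass.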
  22. 22. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  23. 23. There are two main families of algorithms to choose from. Supervised Learning: We know the “right answers” for some of the scenarios. – We may have history we can look back on – We may be hoping to replicate human decision making Unsupervised Learning: There aren’t necessarily “right answers,” we just want to get a better understanding of our data.
  24. 24. Supervised or Unsupervised? Predict our profits next quarter. Supervised Identify the number written on a check. Supervised Group our customers into segments. Unsupervised Predict a user’s rating for a given product. Supervised Find the most important variables in a dataset. Unsupervised Identify credit card transactions that are out of the ordinary. Unsupervised
  25. 25. Now let’s walk through two of the most popular machine learning approaches and discuss how the algorithms are applied. How does an algorithm really work for businesses? Classification Clustering
  26. 26. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Create a hypothesis Everyone will repay their loan. 20 outstanding loans
  27. 27. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Evaluate the hypothesis Calculate accuracy as the % of predictions that are correct based on your current set of rules. 20 outstanding loans 12 repaid, 8 defaulted Accuracy = 12/20 = 60%
  28. 28. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Adjust the hypothesis Find the next branch by looking for the data split that would have the biggest impact on the purity of each node. There are several ways to do this mathematically (Gini Index, Information Gain, Chi-Square). Three candidate splits of the 20 outstanding loans: Credit Score > 700 vs. Credit Score < 700; Income > 60k vs. Income < 60k; DTI > 40% vs. DTI < 40%. Node values shown: 80%, 73%, 70%, 50%, 71%, 53%. Weighted values shown: 59%, 60%, 75%.
  29. 29. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Repeat until convergence Repeat the process for each of your new “leaf” nodes. Stop when you reach an acceptable level of accuracy, or when your accuracy begins getting worse with independent data. 20 outstanding loans split on DTI > 40% vs. DTI < 40%, then on Credit Score > 700 vs. Credit Score < 700 and on Income > $60k vs. Income < $60k. Leaf values shown: 100%, 50%, 100%, 100%. 80% weighted
  30. 30. Classification is used for lots of problems that copy human intuition. Think about how you classify information to identify these images! These use cases are obviously more complex than our simple decision tree, but with more advanced approaches like convolutional neural networks these pictures can definitely be classified by a machine.
  31. 31. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Imagine Marketing has asked you to split these customers into 3 groups. How would you do it?
  32. 32. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Create a hypothesis I can segment my customers by assigning them to 3 groups. We’ll set down 3 random “anchors” and assign each customer to its closest anchor.
  33. 33. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Evaluate the hypothesis Find the distance between each customer and the center of each group. Take note of which customers are actually closest to a different center than the one they’re assigned to.
  34. 34. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Adjust the hypothesis Reassign each customer to the group corresponding to the center they’re closest to, and move the anchors to the middle of their new group.
  35. 35. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Repeat until convergence Keep moving the anchors and re-assigning customers until the anchors stop moving.
  36. 36. This is just the tip of the iceberg. There are several algorithms available for various types of problems.
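The evaluation step on this slide is a one-line calculation. The sketch below reproduces the 12-of-20 figure; the variable names are illustrative assumptions.

```python
# The slide's 20 outstanding loans: 12 repaid (True), 8 defaulted (False).
outcomes = [True] * 12 + [False] * 8

# Initial hypothesis from the previous slide: everyone will repay.
predictions = [True] * len(outcomes)

correct = sum(p == o for p, o in zip(predictions, outcomes))
accuracy = correct / len(outcomes)
print(f"Accuracy = {correct}/{len(outcomes)} = {accuracy:.0%}")  # Accuracy = 12/20 = 60%
```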
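One way to compare candidate splits is the Gini Index named on this slide: compute the impurity of each side, weight by size, and pick the lowest. The eight loans below are hypothetical and deliberately arranged so that DTI separates them perfectly; they are not the slide's 20 loans, and the helper names are assumptions.

```python
def gini(labels):
    """Gini impurity of a group of True/False outcomes (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1 - p ** 2 - (1 - p) ** 2


def weighted_gini(rows, feature, threshold):
    """Size-weighted impurity of the two groups produced by a split."""
    left = [r["repaid"] for r in rows if r[feature] < threshold]
    right = [r["repaid"] for r in rows if r[feature] >= threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, candidates):
    """Pick the (feature, threshold) pair with the lowest weighted impurity."""
    return min(candidates, key=lambda c: weighted_gini(rows, *c))


# Hypothetical loans: income in $k, DTI as a fraction.
loans = [
    {"credit_score": 720, "income": 50, "dti": 0.25, "repaid": True},
    {"credit_score": 680, "income": 75, "dti": 0.30, "repaid": True},
    {"credit_score": 710, "income": 40, "dti": 0.35, "repaid": True},
    {"credit_score": 650, "income": 65, "dti": 0.20, "repaid": True},
    {"credit_score": 730, "income": 80, "dti": 0.45, "repaid": False},
    {"credit_score": 690, "income": 55, "dti": 0.50, "repaid": False},
    {"credit_score": 640, "income": 70, "dti": 0.55, "repaid": False},
    {"credit_score": 705, "income": 45, "dti": 0.42, "repaid": False},
]
candidates = [("credit_score", 700), ("income", 60), ("dti", 0.40)]

for feature, threshold in candidates:
    print(feature, threshold, weighted_gini(loans, feature, threshold))
print("best split:", best_split(loans, candidates))
```

A full decision tree would repeat this search inside each resulting node, which is the "repeat until convergence" step on the next slide.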
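The four k-means steps walked through above fit in a short pure-Python sketch. The customer coordinates and starting anchors are hypothetical; the slides place the anchors randomly, while fixed starting points are used here so the run is repeatable.

```python
import math


def closest(point, anchors):
    """Index of the anchor nearest to a point (Euclidean distance)."""
    return min(range(len(anchors)), key=lambda i: math.dist(point, anchors[i]))


def k_means(points, anchors, max_iterations=100):
    anchors = list(anchors)
    assignments = []
    for _ in range(max_iterations):
        # Evaluate and reassign: each customer joins its closest anchor.
        assignments = [closest(p, anchors) for p in points]
        # Adjust: move each anchor to the middle of its group.
        new_anchors = []
        for i, old in enumerate(anchors):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_anchors.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_anchors.append(old)  # an empty group keeps its anchor
        # Repeat until convergence: stop once the anchors stop moving.
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return anchors, assignments


# Hypothetical customers as (x, y) points forming three loose groups.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 9), (2, 9), (1, 8)]
starting_anchors = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

final_anchors, groups = k_means(customers, starting_anchors)
print("groups:", groups)
print("anchors:", final_anchors)
```

With well-separated groups like these the anchors settle after one move; messier data takes more passes and can settle differently depending on where the anchors start.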
  37. 37. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  38. 38. Delivering analytics with Machine Learning requires alignment across people, process, technology, and data. Engaging with Machine Learning Image inspired by Microsoft People Process Technology Data Guide Support Enable
  39. 39. Data scientists combine broad skills to integrate data, build models, and drive business value. People Process Technology Data
  40. 40. Let’s look at the Microsoft Team Data Science Process to see how data scientists spend their time. People Process Technology Data
  41. 41. The outputs of the process can be used in traditional analytics, analyzed directly, or fed into automated decision-making. Traditional Analytics: Store and access data. Filter and aggregate it. Visualize it. Show it to the business so they can take action. Machine Learning: Filter and aggregate it. Create a model. Generate new data (predictions, etc.). The new data can be stored with the rest of the data for use in analytics. Or it can be visualized directly to gain insights. Or it can automate decisions or actions, allowing better processes to run faster and 24/7. People Process Technology Data
  42. 42. The sources of data for use in data science can be broad. People Process Technology Data Data Warehouses • Curated & governed data • Big data • Cloud or on-prem Data Lakes • Unstructured & semi-structured data • Streaming data • Partially curated Externally Procured Data • May be purchased from 3rd party providers • May be scraped from the web • May require designing research experiments Data scientists typically have the programming and data integration skills to use data from anywhere it can be found.
  43. 43. The Microsoft technology stack provides a holistic solution to your Machine Learning needs. People Process Technology Data
  44. 44. We can work with your business to deliver custom predictive and prescriptive analytics across the lifecycle. What can CCG do? Use Case Definition • Develop a backlog of predictive and prescriptive use cases • Refine and prioritize use cases by value • Develop a predictive roadmap Model Development • Aggregate data from across internal and external data sources • Develop and test multiple models to find the best approach to making predictions Model Maintenance • Monitor and maintain statistical models to sustain predictive power • Develop a model telemetry dashboard • Test model design changes to improve predictive power Model Governance & Processes • Assess existing Data Science capabilities • Develop standards and processes to help guide data science output • Build a Data Science Center of Excellence Model Deployment • Customize and deploy pre-existing models from Azure Cognitive Services • Deploy custom model as an API or batch job, or support deployment in existing systems Rapid Insight Prototype Offering Model as a Service Subscription Offering
  45. 45. CCG’s Rapid Insight Solution Actionable Backlog – Of use cases ripe for predictive analytics to transform your business Detailed Readouts – The materials we leave behind will include extensive analysis of our methodology, findings, and recommendations Ownership of the Model – Just because the project ends doesn’t mean the model stops working. Unlike other managed service providers, what we produce on your behalf is yours to keep Identify Use Cases – By holding a workshop with process SMEs to identify opportunities to supercharge the business Summarize the Findings – So you can understand the model’s outputs and begin taking action on what we’ve learned Develop a Prototype Model – To generate forecasts, classifications, or exploratory analysis for one of your use cases using an industry-standard tool like Azure Machine Learning Studio or Databricks Week 1 Weeks 2-5 Week 6
  46. 46. Fully Operational Production Model – Available at all times, in production – Batch & API integrations Model Supervision – Model is monitored for ongoing usability – Performance dashboard – Guaranteed accuracy SLAs Model Retraining & Support – Scheduled & triggered model re-tuning or re-training – Add new data features over time Model as a Service Solution Set up model as a web service Visualize model performance in a dashboard Maintain and enhance model
  47. 47. THANK YOU! What questions do you have?
  48. 48. Microsoft offers pre-built APIs through Cognitive Services that can expedite the deployment of AI capabilities. People Process Technology Data
  49. 49. VISUAL DRAG-AND-DROP Azure Machine Learning Studio
  50. 50. What is Azure Databricks? A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure Best of Databricks Best of Microsoft Designed in collaboration with the founders of Apache Spark One-click set up; streamlined workflows Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage) Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
  51. 51. Azure Databricks key audiences & benefits Unified analytics platform Integrated workspace Easy data exploration Collaborative experience Interactive dashboards Faster insights • Best of Spark & serverless • Databricks managed Spark Improved ETL performance • Zero management clusters, serverless Easy to schedule jobs Automated workflows Enhanced monitoring & troubleshooting • Automated alerts & easy access to logs Zero Management Spark Cluster democratization (serverless) Fast, collaborative analytics platform accelerating time to market No dev-ops required Enterprise grade security • Encryption • End-to-end auditing • Role-based control • Compliance Data scientist Data engineer CDO, VP of analytics Provided by Microsoft and Databricks under NDA
