Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data Management and Machine Learning" - Bjorn Staender

  1. How to be more productive with Autonomous Data Management and Machine Learning. Bjoern Staender, Analytics & Data Innovation, Oracle Deutschland B.V. & Co KG. September 17, 2019.
  2. Safe harbor statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. © 2019 Oracle
  3. How to manage and analyze all this data? Precision Agriculture, Marketing Analytics, Data Monetization, Customer Experience, Higher Education, Smart Cities, Factory of the Future, Gene Therapy.
  4. Manage and Analyze All Your Data Assets: Architecturally, Many Options and Flexibility. Big Data SQL / R; R / Python / SQL; Object Store. Boil down the data lake into “engineered features”: derived attributes that reflect domain knowledge and are key to the best models, e.g. counts, totals, and changes over time. * OML4Py: coming soon.
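The “engineered features” idea on slide 4 (counts, totals, changes over time) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the transaction records, the field layout, and the `engineer_features` helper are hypothetical and do not come from the deck, and in the architecture the slide describes this work would more likely be done in SQL inside the database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (customer_id, transaction_date, amount).
transactions = [
    ("C1", date(2019, 7, 3), 120.0),
    ("C1", date(2019, 8, 14), 80.0),
    ("C1", date(2019, 8, 30), 100.0),
    ("C2", date(2019, 8, 2), 40.0),
]


def engineer_features(rows):
    """Derive a count, a total, and a change-over-time value per customer."""
    monthly = defaultdict(lambda: defaultdict(float))  # customer -> (year, month) -> amount
    counts = defaultdict(int)                          # customer -> number of transactions
    for customer, day, amount in rows:
        monthly[customer][(day.year, day.month)] += amount
        counts[customer] += 1

    features = {}
    for customer, by_month in monthly.items():
        months = sorted(by_month)
        latest = by_month[months[-1]]
        # Compare with the previous month that had activity; 0.0 if there is none.
        previous = by_month[months[-2]] if len(months) > 1 else 0.0
        features[customer] = {
            "n_transactions": counts[customer],          # count
            "total_amount": sum(by_month.values()),      # total
            "change_vs_prev_month": latest - previous,   # change over time
        }
    return features


features = engineer_features(transactions)
```

Each customer ends up as one row of derived attributes, which is the shape a model-building step typically expects.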
  5. Algorithms Operate on Data. ML and AI are just “algorithms.” Move the algorithms, not the data! It changes everything!
  6. Oracle Machine Learning Algorithms
     CLASSIFICATION: Naïve Bayes; Logistic Regression (GLM); Decision Tree; Random Forest; Neural Network; Support Vector Machine; Explicit Semantic Analysis
     CLUSTERING: Hierarchical K-Means; Hierarchical O-Cluster; Expectation Maximization (EM)
     ANOMALY DETECTION: One-Class SVM
     TIME SERIES: State-of-the-art forecasting using Exponential Smoothing; includes all popular models, e.g. Holt-Winters with trends, seasons, irregularity, missing data
     REGRESSION: Linear Model; Generalized Linear Model; Support Vector Machine (SVM); Stepwise Linear Regression; Neural Network; LASSO *
     ATTRIBUTE IMPORTANCE: Minimum Description Length; Principal Component Analysis (PCA); Unsupervised pair-wise KL divergence; CUR decomposition for row & AI
     ASSOCIATION RULES: Apriori / market basket
     PREDICTIVE QUERIES: Predict, cluster, detect, features
     SQL ANALYTICS: SQL Windows, SQL Patterns, SQL Aggregates
     FEATURE EXTRACTION: Principal Component Analysis (PCA); Non-negative Matrix Factorization; Singular Value Decomposition (SVD); Explicit Semantic Analysis (ESA)
     TEXT MINING SUPPORT: Algorithms support text; tokenization and theme extraction; Explicit Semantic Analysis (ESA) for document similarity
     STATISTICAL FUNCTIONS: Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
     R PACKAGES: CRAN R algorithm packages through Embedded R Execution; Spark MLlib algorithm integration
     EXPORTABLE ML MODELS: REST APIs for deployment
     OAA (Oracle Data Mining + Oracle R Enterprise) and ORAAH combined. OAA includes support for Partitioned Models, Transactional, Unstructured, Geo-spatial, Graph data, etc.
  7. Oracle SQL Functions
     STATISTICAL FUNCTIONS: Descriptive statistics (e.g. median, stdev, mode, sum, etc.); hypothesis testing (t-test, F-test, Kolmogorov-Smirnov test, Mann-Whitney test, Wilcoxon Signed Ranks test); correlation analysis (parametric and nonparametric, e.g. Pearson’s test for correlation, Spearman's rho coefficient, Kendall's tau-b correlation coefficient); ranking functions; cross tabulations with Chi-square statistics; linear regression; ANOVA (analysis of variance); distribution fit tests (e.g. Normal distribution test, Binomial test, Weibull test, Uniform test, Exponential test, Poisson test, etc.); statistical aggregates (min, max, mean, median, stdev, mode, quantiles, plus x sigma, minus x sigma, top n outliers, bottom n outliers)
     ANALYTICAL SQL: SQL Windows; SQL Aggregate functions; LAG/LEAD functions; SQL for Pattern Matching; additional approximate query processing: APPROX_COUNT, APPROX_SUM, APPROX_RANK; Regular Expressions
  8. SQL Statistical Functions. Simple SQL Syntax: Statistical Comparisons (t-tests). STATS_T_TEST_INDEPU (SQL) example: compare average purchase amounts, men vs. women, grouped by INCOME_LEVEL. P-values < 0.05 show statistically significant differences in the amounts purchased by men vs. women.
         SELECT SUBSTR(cust_income_level, 1, 22) income_level,
                AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
                AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
                STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
                STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
         FROM customers c, sales s
         WHERE c.cust_id = s.cust_id
         GROUP BY ROLLUP(cust_income_level)
         ORDER BY income_level, sold_to_men, sold_to_women, t_observed;
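STATS_T_TEST_INDEPU on slide 8 is Oracle's independent two-sample t-test with unpooled (unequal) variances, commonly known as Welch's t-test. The sketch below shows the arithmetic behind the `t_observed` column in plain Python. The sample amounts and the `welch_t_statistic` helper are hypothetical; the p-value is left out because it needs the Student t distribution, which the Python standard library does not provide; and the sign of the statistic here follows the argument order, which may differ from the convention Oracle applies via its fourth argument.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with unpooled variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 denominator), kept separate rather than pooled.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)

    # Squared standard error contributed by each group.
    se_sq_a, se_sq_b = var_a / n_a, var_b / n_b

    t = (mean_a - mean_b) / math.sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical purchase amounts for two customer groups.
sold_to_men = [1.0, 2.0, 3.0, 4.0, 5.0]
sold_to_women = [2.0, 4.0, 6.0, 8.0, 10.0]
t_observed, dof = welch_t_statistic(sold_to_men, sold_to_women)
```

The SQL on the slide computes the same kind of statistic once per income level, inside the database, without moving the rows out.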
  9. OML for SQL: Model Build & SQL Apply Prediction. Simple SQL Syntax: Classification Model.
     ML Model Build (PL/SQL):
         BEGIN
           DBMS_DATA_MINING.CREATE_MODEL(
             model_name          => 'BUY_INSUR1',
             mining_function     => dbms_data_mining.classification,
             data_table_name     => 'CUST_INSUR_LTV',
             case_id_column_name => 'CUST_ID',
             target_column_name  => 'BUY_INSURANCE',
             settings_table_name => 'CUST_INSUR_LTV_SET');
         END;
         /
     Model Apply (SQL query):
         Select prediction_probability(BUY_INSUR1, 'Yes'
                  USING 3500 as bank_funds, 825 as checking_amount, 400 as credit_balance,
                        22 as age, 'Married' as marital_status,
                        93 as MONEY_MONTLY_OVERDRAWN, 1 as house_ownership)
         from dual;
  10. OML for SQL: Model Build. Simple SQL Syntax: Attribute Importance.
      ML Model Build (PL/SQL):
          BEGIN
            DBMS_DATA_MINING.CREATE_MODEL(
              model_name          => 'BUY_INSURANCE_AI',
              mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
              data_table_name     => 'CUST_INSUR_LTV',
              case_id_column_name => 'cust_id',
              target_column_name  => 'BUY_INSURANCE',
              settings_table_name => 'Att_Import_Mode_Settings');
          END;
          /
      Model Results (SQL query):
          SELECT attribute_name, explanatory_value, rank
          FROM BUY_INSURANCE_AI
          ORDER BY rank, attribute_name;

          ATTRIBUTE_NAME            RANK  ATTRIBUTE_VALUE
          BANK_FUNDS                1     0.2161
          MONEY_MONTLY_OVERDRAWN    2     0.1489
          N_TRANS_ATM               3     0.1463
          N_TRANS_TELLER            4     0.1156
          T_AMOUNT_AUTOM_PAYMENTS   5     0.1095
  11. OML for R: Model Build. Simple R Language Syntax: Attribute Importance.
      ML Model Build (R):
          > ore.odmAI(BUY_INSURANCE ~ ., CUST_INSUR_LTV)
      Model Results (R):
          Call:
          ore.odmAI(formula = BUY_INSURANCE ~ ., data = CUST_INSUR_LTV)

          Importance:
                                    importance    rank
          BANK_FUNDS                0.2161187797  1
          MONEY_MONTLY_OVERDRAWN    0.1489347141  2
          N_TRANS_ATM               0.1463026512  3
          N_TRANS_TELLER            0.1155879786  4
          T_AMOUNT_AUTOM_PAYMENTS   0.1095178647  5
  12. OML for Python*: Model Build. Simple Python Language Syntax: Attribute Importance.
      ML Model Build (Python):
          > ai_mod = ai(**setting)  # Create AI model object
          > ai_mod = ai_mod.fit(train_x, train_y)
      Model Results (Python):
          Importance:
          variable                  importance    rank
          BANK_FUNDS                0.2161187797  1
          MONEY_MONTLY_OVERDRAWN    0.1489347141  2
          N_TRANS_ATM               0.1463026512  3
          N_TRANS_TELLER            0.1155879786  4
          T_AMOUNT_AUTOM_PAYMENTS   0.1095178647  5
      * OML4Py: coming soon
  13. Database Developer to Data Scientist Journey: You Are Probably Already Doing Most of This Work! Typically 80% of the work: data extraction; data wrangling; deriving new attributes (“feature engineering”). … … Where the Machine Learning “Magic” Happens. Eliminate or minimize with Oracle: import predictions & insights; translate and deploy ML models. Automate. The data management platform becomes a combined/hybrid DM + machine learning platform. “Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data, which is an inefficient data strategy” [1]. [1] https://www.infoworld.com/article/3228245/data-science/the-80-20-data-science-dilemma.html
  14. Additional Data Management Challenges. Maintenance: 72% of IT budget is spent on generic maintenance tasks vs. innovation (ComputerWorld). Reliability: 91% experience unplanned data center outages (Healthcare IT News); database downtime costs $7,900 / minute (DB Maestro). Cost and Complexity: ¾ (75%) of the cost of database management is spent on labor (IDC).
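To put the $7,900 per minute figure from slide 14 into perspective, the arithmetic can be sketched as follows. Only the per-minute cost comes from the slide; the availability level, the outage length, and the helper names are hypothetical inputs chosen for illustration.

```python
COST_PER_MINUTE_USD = 7_900  # figure quoted on the slide (DB Maestro)
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Cost in USD of an outage lasting the given number of minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


def annual_downtime_minutes(availability):
    """Expected minutes of downtime per year for an availability such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical scenarios: a one-hour outage, and a year at 99.9% availability.
one_hour_outage_cost = downtime_cost(60)
three_nines_cost = downtime_cost(annual_downtime_minutes(0.999))
```

Under these assumptions a single one-hour outage costs $474,000, and a year at 99.9% availability implies roughly 526 minutes of downtime.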
  15. Journey to Autonomous Database: focus on innovation, not maintenance. Examples of automation capabilities introduced through releases 9i, 10g, 11g, 12c, 18c: • Automatic Query Rewrite • Automatic Undo Management • Autonomous Health Framework • Automatic Diagnostic Framework • Automatic Refresh of Clones • Automatic SQL Tuning • Automatic Workload Capture / Replay • Automatic SQL Plan Management • Automatic Capture of SQL Monitor • Automatic Data Optimization • Automatic Memory Management • Automatic Segment Space Mgmt • Automatic Statistics Gathering • Automatic Storage Management • Automatic Workload Repository • Automatic Diagnostic Monitor • Automatic Columnar Flash • Automatic IM population • Automatic Application Continuity. 19c: • Automatic Indexing
  16. Oracle Autonomous Database | Components: brings full automation to the entire database lifecycle. Autonomous Database; Automated Data Center Operations and Machine Learning; Complete Infrastructure Automation; Complete Database Automation; Oracle Cloud.
  17. Using Machine Learning to Drive Autonomous. WORKLOAD OPTIMIZATIONS: automatically adapts to changing workloads. SECURITY: protects against external malicious attacks. MONITORING & DIAGNOSTICS: detects anomalies and fixes known issues. “The big ticket item for 2018 & 2019 is the use of ML and AI in the DBMS allowing the DBMS to maintain itself – the DBMS becomes Self-Driving. The job of the DBA evolves to use their skills for tasks with greater business value.” - Donald Feinberg, Distinguished Analyst, Gartner (Oracle Webcast)
  18. What Autonomous Database Means for DBAs. Value scale: Innovation, Maintenance. • Tasks Specific to the Business – Architecture, planning, data modeling – Data security and data lifecycle management – Application-related tuning – End-to-end service level management • Tactical Operations – Configuration and tuning of systems, network, storage – Database provisioning, patching – Database backups, H/A, disaster recovery – Database optimization
  19. What Autonomous Database Means for DBAs. Removes tactical drudgery, more time to innovate. Value scale: Innovation, Maintenance. • Tasks Specific to the Business – Architecture, planning, data modeling – Data security and data lifecycle management – Application-related tuning – End-to-end service level management • Tactical Operations – Configuration and tuning of systems, network, storage – Database provisioning, patching – Database backups, H/A, disaster recovery – Database optimization
  20. Select an Autonomous Database Solution that Meets Your Workload Needs. ORACLE AUTONOMOUS DATABASE. AUTONOMOUS DATA WAREHOUSE: All Analytic Workloads (Data Warehouse, Data Mart, Data Lakes). AUTONOMOUS TRANSACTION PROCESSING: Online TP & Mixed Workloads (Transactions, Mixed Workloads, Application Development).
  21. Autonomous Data Warehouse - Key Use Cases: Query All Data; Machine Learning; Data Marts / Warehouses; Sandboxes for Data Scientists; Data Lakes; Business Analytics.
  22. Autonomous Transaction Processing - Key Use Cases: Innovate Faster; Real-Time Analytics and Machine Learning; Departmental or Mission Critical Applications [1]; Mixed Workloads; Application Development; Support Business Operations; Analytics; Transactions. [1] Coming in Calendar Year 2019.
  23. http://oracle.com/cloud/free
  24. Thank you! Bjoern.Staender@oracle.com, Analytics & Data Innovation, Oracle Germany