20200402 oracle cloud infrastructure data science

Oracle Cloud Infrastructure Data Science: Technical Overview

  1. April 2, 2020: Oracle Cloud Infrastructure Data Science
  2. Oracle and Java are registered trademarks of Oracle Corporation and/or its affiliates. Copyright © 2020 Oracle and/or its affiliates.
  3. Overview • ML, OSS, Oracle Accelerated Data Science (ADS) • ML • PaaS, IaaS
  4. • Notebook • Jupyter Notebook, ML, Compute • Compartment, VCN, Subnet, Compute, Block Volume • ML: Keras, scikit-learn, XGBoost, Oracle Accelerated Data Science (ADS) • Diagram labels: Accelerated Data Science, scikit-learn, ML, Jupyter Notebook, Notebook, Compute, Block Storage
  5. Notebook, Python, OCI, Jupyter Notebook
  6. Accelerated Data Science • Oracle Cloud Infrastructure Data Science, Python • API • Oracle AutoML • Oracle Accelerated Data Science (ADS), AutoML • Diagram labels: (2) data transformation, (5) model evaluation, (6) model interpretation
  7. • ADS • DatasetFactory • OCI Object Storage, Amazon S3, Google Cloud Storage, Azure Blob • Oracle DB, ADW, MongoDB, HDFS, NoSQL DB, Elasticsearch, etc. • CSV, TSV, Parquet, libsvm, JSON, Excel, HDF5, SQL, XML, Apache server log files (clf, log), arff
  8. Sample code:

       import os
       from ads.dataset.factory import DatasetFactory

       # Local file
       ds = DatasetFactory.open("/path/to/data.data", format='csv', delimiter=" ")

       # OCI Object Storage Service
       ds = DatasetFactory.open("oci://<bucket-name>/<file-name>", storage_options={
           "config": "~/.oci/config",
           "profile": "DEFAULT_USER"
       })

       # Amazon S3
       ds = DatasetFactory.open("s3://bucket_name/iris.csv", storage_options={
           'key': 'aws key',
           'secret': 'aws secret',
           'blocksize': 1000000,
           'client_kwargs': {
               "endpoint_url": "https://s3-us-west-1.amazonaws.com"
           }
       })

       # ADW (table and index_col are placeholders that must be defined beforehand)
       uri = f'oracle+cx_oracle://{os.environ["ADW_USER"]}:{os.environ["ADW_PASSWORD"]}@{os.environ["ADW_SID"]}'
       ds = DatasetFactory.open(uri, format="sql", table=table, index_col=index_col, target='label')
  9. • RDB • etc.
  10. • String • Null
  11. ADS, AutoML. Sample code:

       # Display the transformations that ADS recommends for the dataset
       ds.get_recommendations()
       transformed_ds = ds.get_transformed_dataset()

       # Or apply all recommended transformations automatically
       transformed_ds = ds.auto_transform()
  12. ADS get_recommendations(): “Drop”
  13. get_recommendations(): “Drop”
  14. get_recommendations(): “Drop”
  15. get_recommendations(): “Up-sample”, “Down-sample”
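The “Up-sample” and “Down-sample” recommendations concern rebalancing class sizes. The following is a minimal sketch of the general idea only, not of how ADS implements it; the function name `resample` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def resample(rows: Sequence, labels: Sequence[Hashable],
             mode: str = "up", seed: int = 0) -> Tuple[List, List]:
    """Rebalance classes by random up-sampling or down-sampling (illustrative only)."""
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Up-sampling grows every class to the largest class size;
    # down-sampling shrinks every class to the smallest class size.
    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > target:
            # Down-sample: draw without replacement.
            chosen = rng.sample(members, target)
        else:
            # Up-sample: keep all rows and add random duplicates.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


if __name__ == "__main__":
    rows = list(range(10))
    labels = [0] * 8 + [1] * 2
    print(Counter(resample(rows, labels, mode="up")[1]))
    print(Counter(resample(rows, labels, mode="down")[1]))
```

Up-sampling keeps every original row but repeats minority rows, while down-sampling discards majority rows; which one is preferable depends on how much data can be spared.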
  16. • API (Seaborn, Matplotlib, GIS)
  17. Sample code:

       # show_in_notebook()
       ds.show_in_notebook()
  18. Sample code:

       # Plot a single column
       ds.plot("col02").show_in_notebook(figsize=(4,4))

       # Plot one column against another
       ds.plot("col02", y="col01").show_in_notebook(figsize=(4,4))

       # Plot another pair of columns with the default figure size
       ds.plot("col01", y="col03").show_in_notebook()
  19. API: Seaborn, Matplotlib, GIS

       # Matplotlib
       import pandas as pd
       import matplotlib.pyplot as plt
       from numpy.random import randn
       from ads.dataset.factory import DatasetFactory

       df = pd.DataFrame(randn(1000, 4), columns=list('ABCD'))

       def ts_plot(df, figsize):
           ts = pd.Series(randn(1000), index=pd.date_range('1/1/2000', periods=1000))
           # set_index returns a new DataFrame, so the result must be assigned;
           # the dates of ts are used as the index
           df = df.set_index(ts.index)
           df = df.cumsum()
           plt.figure()
           df.plot(figsize=figsize)
           plt.legend(loc='best')

       ds = DatasetFactory.from_dataframe(df, target='A')
       ds.call(ts_plot, figsize=(7,7))
  20. • ADS AutoML

       import logging
       from ads.automl.provider import OracleAutoMLProvider
       from ads.automl.driver import AutoML

       # Split the transformed dataset into training and test data
       train, test = transformed_ds.train_test_split(test_size=0.1)

       # Train a model with Oracle AutoML
       ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
       oracle_automl = AutoML(train, provider=ml_engine)
       automl_model1, baseline = oracle_automl.train()

      • AdaBoostClassifier • DecisionTreeClassifier • ExtraTreesClassifier • KNeighborsClassifier • LGBMClassifier • LinearSVC • LogisticRegression • RandomForestClassifier • SVC • XGBClassifier
  21. Oracle AutoML

       oracle_automl.visualize_algorithm_selection_trials()
       oracle_automl.visualize_adaptive_sampling_trials()
  22. Oracle AutoML

       oracle_automl.visualize_feature_selection_trials()
       oracle_automl.visualize_tuning_trials()
  23. Diagram: the data is divided into N parts; in each round, 1 part is used as TEST and the remaining N-1 parts as TRAIN, and this is repeated N times (the diagram shows rounds 1 to 5).
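The TRAIN/TEST rotation in the diagram can be sketched as follows. This is a plain-Python illustration of the splitting scheme under the assumption that samples are split in order without shuffling; the function name `n_fold_splits` is hypothetical and is not part of ADS.

```python
from typing import Iterator, List, Tuple


def n_fold_splits(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the n_folds rounds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]                  # 1 part is TEST
        train = indices[:start] + indices[start + size:]    # N-1 parts are TRAIN
        yield train, test
        start += size


if __name__ == "__main__":
    for round_no, (train, test) in enumerate(n_fold_splits(10, 5), start=1):
        print(round_no, "TEST", test, "TRAIN", train)
```

Every sample appears in TEST exactly once across the N rounds, so the N scores can be averaged into a single estimate.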
  24. • PR, ROC

       from ads.evaluations.evaluator import ADSEvaluator

       # Create an evaluator for the two binary classification models
       bin_evaluator = ADSEvaluator(test, models=[bin_lr_model, bin_rf_model], training_data=train)

       # Show the evaluation plots in the notebook
       bin_evaluator.show_in_notebook(perfect=True)
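As background for the PR and ROC plots, the points on both curves come from the confusion counts at different score thresholds. The sketch below illustrates those general definitions and is not what ADSEvaluator runs internally; all names in it are hypothetical, and the convention of reporting precision 1.0 when nothing is predicted positive is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def pr_and_roc_points(y_true: Sequence[int], scores: Sequence[float],
                      thresholds: Sequence[float]) -> List[Dict[str, float]]:
    """One point per threshold: (recall, precision) for PR, (fpr, recall) for ROC."""
    points = []
    for t in thresholds:
        tp, fp, fn, tn = confusion_counts(y_true, scores, t)
        precision = tp / (tp + fp) if tp + fp else 1.0  # assumed convention
        recall = tp / (tp + fn) if tp + fn else 0.0     # also the true positive rate
        fpr = fp / (fp + tn) if fp + tn else 0.0        # false positive rate
        points.append({"threshold": t, "precision": precision,
                       "recall": recall, "fpr": fpr})
    return points


if __name__ == "__main__":
    for point in pr_and_roc_points([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2], [0.5, 0.8]):
        print(point)
```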
  25. • Global Explainer: Feature Permutation Importance; Individual Conditional Expectation (ICE); Partial Dependence Plot (PDP) • Local Explainer
  26. ADS Global Explainer – Feature Permutation Importance

       Original data (baseline_score):

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       1            0         3       Braund, Mr. Owen              male    22   1      0      7.25     S
       2            1         1       Cumings, Mrs. John            female  38   1      0      71.2833  C
       3            1         3       Heikkinen, Miss. Laina        female  26   0      0      7.925    S
       4            1         1       Futrelle, Mrs. Jacques Heath  female  35   1      0      53.1     S

       Data with the Sex column shuffled (shuffled_score):

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       1            0         3       Braund, Mr. Owen              Female  22   1      0      7.25     S
       2            1         1       Cumings, Mrs. John            Male    38   1      0      71.2833  C
       3            1         3       Heikkinen, Miss. Laina        Male    26   0      0      7.925    S
       4            1         1       Futrelle, Mrs. Jacques Heath  male    35   1      0      53.1     S

      • Importance = baseline_score - shuffled_score
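The baseline_score - shuffled_score idea can be illustrated with a small standalone sketch. It is not the MLX implementation: the scoring metric (accuracy), the toy model, and the names `accuracy` and `permutation_importance` are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[Row], int], X: Sequence[Row],
                           y: Sequence[int], column: int,
                           n_iter: int = 20, seed: int = 0) -> float:
    """Average of baseline_score - shuffled_score over n_iter shuffles of one column."""
    rng = random.Random(seed)
    baseline_score = accuracy(predict, X, y)
    drops = []
    for _ in range(n_iter):
        # Shuffle only the chosen column; every other column stays untouched.
        shuffled_values = [row[column] for row in X]
        rng.shuffle(shuffled_values)
        X_shuffled = [row[:column] + [value] + row[column + 1:]
                      for row, value in zip(X, shuffled_values)]
        shuffled_score = accuracy(predict, X_shuffled, y)
        drops.append(baseline_score - shuffled_score)
    return sum(drops) / len(drops)


if __name__ == "__main__":
    # Toy data: column 0 encodes sex (1 = female), column 1 is unrelated noise.
    X = [[1, 5], [0, 3], [1, 2], [0, 8], [1, 1], [0, 4]]
    y = [1, 0, 1, 0, 1, 0]
    toy_model = lambda row: int(row[0] == 1)  # hypothetical model that only uses column 0
    print("sex:", permutation_importance(toy_model, X, y, column=0))
    print("noise:", permutation_importance(toy_model, X, y, column=1))
```

A feature the model ignores yields an importance of zero, because shuffling it cannot change any prediction; a feature the model relies on yields a positive drop in score.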
  27. Global Explainer – Feature Importance Sample Code

       # Build the model using AutoML. 'model' is a subclass of type ADSModel.
       # Note that the ADSExplainer below works with any model (classifier or
       # regressor) that is wrapped in an ADSModel
       import logging
       from ads.automl.provider import OracleAutoMLProvider
       from ads.automl.driver import AutoML
       ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
       oracle_automl = AutoML(train, provider=ml_engine)
       model, baseline = oracle_automl.train()

       # Create the ADS explainer object, which is used to construct global
       # and local explanation objects. The ADSExplainer takes as input the
       # model to explain and the train/test dataset
       from ads.explanations.explainer import ADSExplainer
       explainer = ADSExplainer(test, model, training_data=train)

       # With ADSExplainer, create a global explanation object using
       # the MLXGlobalExplainer provider
       from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
       global_explainer = explainer.global_explanation(
           provider=MLXGlobalExplainer())

       # A summary of the global feature permutation importance algorithm and
       # how to interpret the output can be displayed with
       global_explainer.feature_importance_summary()

       # Compute the global Feature Permutation Importance explanation
       importances = global_explainer.compute_feature_importance()

       # ADS supports multiple visualizations for the global Feature
       # Permutation Importance explanations (see "Interpretation" above)
       # Simple bar chart highlighting the average impact on model score
       # across multiple iterations of the algorithm
       importances.show_in_notebook()
  28. ADS Global Explainer - Individual Conditional Expectation (ICE)

       F1  F2   F3   T
       2   1.2  0    15.1
       7   2.4  4    12.5
       8   9.7  3    18.1
       .   ...  ...  13.5

       F1  F2   F3   T
       2   1.2  0    15.1

       F1  F2   F3   T
       1   1.2  0    ?
       2   2.4  4    ?
       3   9.7  3    ?
       .   ...  ...  ?

       F1  F2   F3   T
       1   1.2  0    13.5
       2   2.4  4    15.1
       3   9.7  3    17.5
       .   ...  ...  ...

      Plot: F1 (input) against T.
  29. ADS Global Explainer - Partial Dependence Plot (PDP)

       F1  F2   F3   T
       2   1.2  0    15.1
       7   2.4  4    12.5
       8   9.7  3    18.1
       .   ...  ...  13.5

       F1  F2   F3   T
       2   1.2  0    15.1

       F1  F2   F3   T
       1   1.2  0    ?
       2   2.4  4    ?
       3   9.7  3    ?
       .   ...  ...  ?

       F1  F2   F3   T
       1   1.2  0    13.5
       2   2.4  4    15.1
       3   9.7  3    17.5
       .   ...  ...  ...

      Plot: F1 against T. PDP = average of the ICE curves.
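The relationship between ICE and PDP can be sketched in a few lines. This is an illustration of the general technique, not the MLX implementation; the toy model, the sample values, and the names `ice_curves` and `partial_dependence` are hypothetical.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(predict: Callable[[Row], float], X: Sequence[Row],
               column: int, grid: Sequence[float]) -> List[List[float]]:
    """One ICE curve per sample: predictions as the chosen feature sweeps the grid."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)      # copy so the original sample is not changed
            modified[column] = value  # replace only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """PDP: the point-wise average of the ICE curves."""
    n = len(curves)
    return [sum(curve[i] for curve in curves) / n for i in range(len(curves[0]))]


if __name__ == "__main__":
    toy_model = lambda row: 2 * row[0] + row[1]  # hypothetical model: T = 2*F1 + F2
    X = [[2, 1.0, 0], [7, 3.0, 4]]
    curves = ice_curves(toy_model, X, column=0, grid=[1, 2, 3])
    print("ICE:", curves)
    print("PDP:", partial_dependence(curves))
```

ICE keeps one curve per sample and so can reveal samples that react differently to the feature, whereas the PDP summarizes them into a single average curve.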
  30. Global Explainer – ICE/PDP Sample Code

       # Note that the ADSExplainer below works with any model (classifier or
       # regressor) that is wrapped in an ADSModel
       import logging
       from ads.automl.provider import OracleAutoMLProvider
       from ads.automl.driver import AutoML
       ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
       oracle_automl = AutoML(train, provider=ml_engine)
       model, baseline = oracle_automl.train()

       # Create the ADS explainer object, which is used to construct
       # global and local explanation objects. The ADSExplainer takes
       # as input the model to explain and the train/test dataset
       from ads.explanations.explainer import ADSExplainer
       explainer = ADSExplainer(test, model, training_data=train)

       # With ADSExplainer, create a global explanation object using
       # the MLXGlobalExplainer provider
       from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
       global_explainer = explainer.global_explanation(
           provider=MLXGlobalExplainer())

       # A summary of the global partial feature dependence explanation
       # algorithm and how to interpret the output can be displayed with
       global_explainer.partial_dependence_summary()

       # Compute the 1-feature PDP on the categorical feature, "sex",
       # and numerical feature, "age"
       pdp_sex = global_explainer.compute_partial_dependence("sex")
       pdp_age = global_explainer.compute_partial_dependence(
           "age", partial_range=(0, 1))

       # ADS supports PDP visualizations for both 1-feature and 2-feature
       # Feature Dependence explanations, and ICE visualizations for 1-feature
       # Feature Dependence explanations (see "Interpretation" above)
       # Visualize the categorical feature PDP for the True (Survived) label
       pdp_sex.show_in_notebook(labels=True)
  31. Local Explainer • (α) • (Survived = 0 or 1)

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       1            0         3       Braund, Mr. Owen              male    22   1      0      7.25     S
       2            1         1       Cumings, Mrs. John            female  38   1      0      71.2833  C
       3            1         3       Heikkinen, Miss. Laina        female  26   0      0      7.925    S
       ...          ...       ...     ...                           ...     ...  ...    ...    ...      ...

      Source: https://www.kaggle.com/c/titanic

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       500          ?         1       Anna. Miss. Bworn             female  36   1      0      71.283   C

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       500          1         1       Anna. Miss. Bworn             female  36   1      0      71.283   C

      Why?
  32. Local Explainer

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       1            0         3       Braund, Mr. Owen              male    22   1      0      7.25     S
       2            1         1       Cumings, Mrs. John            female  38   1      0      71.2833  C
       3            1         3       Heikkinen, Miss. Laina        female  26   0      0      7.925    S
       ...          ...       ...     ...                           ...     ...  ...    ...    ...      ...

       PassengerId  Survived  Pclass  Name                          Sex     Age  SibSp  Parch  Fare     Embarked
       500          ?         1       Anna. Miss. Bworn             female  36   1      0      71.283   C

      Diagram labels: Oracle; Passenger ID = 500; Oracle MLX
  33. Local Explainer: PassengerID 500
  34. Local Explainer

       # Build the model using AutoML. 'model' is a subclass of type ADSModel.
       # Note that the ADSExplainer below works with any model (classifier or
       # regressor) that is wrapped in an ADSModel
       import logging
       from ads.automl.provider import OracleAutoMLProvider
       from ads.automl.driver import AutoML
       ml_engine = OracleAutoMLProvider(n_jobs=-1, loglevel=logging.ERROR)
       oracle_automl = AutoML(train, provider=ml_engine)
       model, baseline = oracle_automl.train()

       # Create the ADS explainer object, which is used to construct
       # global and local explanation objects. The ADSExplainer takes
       # as input the model to explain and the train/test dataset
       from ads.explanations.explainer import ADSExplainer
       explainer = ADSExplainer(test, model, training_data=train)

       # With ADSExplainer, create a local explanation object using
       # the MLXLocalExplainer provider
       from ads.explanations.mlx_local_explainer import MLXLocalExplainer
       local_explainer = explainer.local_explanation(
           provider=MLXLocalExplainer())

       # A summary of the local explanation algorithm and how to interpret
       # the output can be displayed with
       local_explainer.summary()

       # Select a specific sample (instance/row) to generate a local
       # explanation for
       sample = 14

       # Compute the local explanation on our sample from the test set
       explanation = local_explainer.explain(test.X.iloc[sample:sample+1],
                                             test.y.iloc[sample:sample+1])

       # Visualize the explanation for the label True (Survived). See
       # the "Interpretation" section above for more information
       explanation.show_in_notebook(labels=True)
  35. • Data Science Platform • ADS, ML • scikit-learn, keras, xgboost, lightGBM • Diagram labels: scikit-learn, lightGBM, OCI, Notebook
  36. Oracle Functions • Diagram labels: OCI Data Science; OCI API Gateway; REST endpoint (http://hoge:8080/invoke/..); OCI Functions Service; OCI Registry Service; Application; func.yml, func.py, scorefn.py, requirement.txt; cURL • func.yml • func.py • scorefn.py • requirement.txt • Fn, OCI Functions • OCI API Gateway • OCI (OCI Functions) • REST (API Gateway) • OCI • REST, OCI Functions
