Introduction to Apache Hivemall v0.5.2 and v0.6

Lightning talk at Hadoop Conference Japan 2019.

Published in: Data & Analytics

  1. Introduction to Apache Hivemall v0.5.2 and v0.6. Makoto YUI, Principal Engineer (@myui, @ApacheHivemall). Hadoop Conf Japan, Mar 14, 2019.
  2. We Open-source! Streaming log collector; Bulk data import/export; Efficient binary serialization; Machine learning on Hadoop; Workflow Engine; Embedded version of Fluentd
  3. Machine Learning in SQL queries
  4. BigQuery ML at Google I/O 2018: https://ai.googleblog.com/2018/07/machine-learning-in-google-bigquery.html
  5. Could I use ML-in-SQL in my cluster?
  6. Open-source Machine Learning Solution for SQL-on-Hadoop: hivemall.apache.org (incubating)
  7. Hivemall is a multi/cross-platform ML library that provides a rich set of functions for HiveQL, SparkSQL/Dataframe API, and Pig Latin
  8. Hivemall on Apache Hive
  9. Hivemall on Apache Spark Dataframe
  10. Hivemall on SparkSQL
  11. Hivemall on Apache Pig
  12. Online Prediction by Apache Streaming
  13. New in v0.5.2 – Brickhouse UDFs: JSON, HyperLogLog
  14. New in v0.5.2 – Field-aware Factorization Machines
  15. New in v0.5.2 – Okapi BM25 term weighting
  16. Plan for v0.6 (release in April-May, 2019)
      • New state-of-the-art optimizers like AdamHD (merged)
      • Gradient boosting
      • Stable XGBoost support
      • More efficient sparse vector support in RandomForest
      • Spark 2.4 support
  17. XGBoost support in Hivemall (beta version)

      SELECT train_xgboost_classifier(features, label) as (model_id, model)
      FROM training_data

      SELECT rowid, AVG(predicted) as predicted
      FROM (
        -- predict with each model
        SELECT xgboost_predict(rowid, features, model_id, model) AS (rowid, predicted)
        -- join each test record with each model
        FROM xgboost_models CROSS JOIN test_data_with_id
      ) t
      GROUP BY rowid;
  18. Future work (v0.7 or later); related pull requests: PR#91, PR#116
      • Word2Vec support
      • Multi-class Logistic Regression
      • Hyperparameter tuning (e.g., grid search)
      • YARN application/standalone Hivemall
  19. We are hiring: Engineer (Java/Scala/Ruby), Data Scientist, Sales Engineer, SRE, Support Engineer
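
The prediction query on slide 17 cross-joins every test record with every trained model, scores each pair, and then averages the scores per row. The sketch below reproduces only that query shape in SQLite so it can run without a cluster. It does not call Hivemall or XGBoost: `toy_predict`, the `models` table with its `bias` and `weight` columns, and the `row_id` and `x` columns are hypothetical stand-ins introduced for illustration.

```python
import sqlite3


def toy_predict(bias, weight, x):
    # Hypothetical stand-in for a real model: a simple linear score.
    return bias + weight * x


def ensemble_predict(models, rows):
    """Average the per-model predictions for each test row.

    models: list of (model_id, bias, weight) tuples (assumed toy models)
    rows:   list of (row_id, x) tuples (assumed single-feature test data)
    """
    conn = sqlite3.connect(":memory:")
    # Register the Python function so the SQL below can call it.
    conn.create_function("toy_predict", 3, toy_predict)
    conn.execute("CREATE TABLE models (model_id TEXT, bias REAL, weight REAL)")
    conn.execute("CREATE TABLE test_data_with_id (row_id INTEGER, x REAL)")
    conn.executemany("INSERT INTO models VALUES (?, ?, ?)", models)
    conn.executemany("INSERT INTO test_data_with_id VALUES (?, ?)", rows)

    query = """
        SELECT row_id, AVG(predicted) AS predicted
        FROM (
            -- predict with each model
            SELECT t.row_id AS row_id,
                   toy_predict(m.bias, m.weight, t.x) AS predicted
            -- join each test record with each model
            FROM models m CROSS JOIN test_data_with_id t
        ) p
        GROUP BY row_id
        ORDER BY row_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    toy_models = [("m1", 0.0, 1.0), ("m2", 1.0, 1.0)]
    toy_rows = [(1, 2.0), (2, 4.0)]
    for row_id, predicted in ensemble_predict(toy_models, toy_rows):
        print(row_id, predicted)
```

The cross join is what lets a plain `GROUP BY` act as the ensemble step: each test row appears once per model, so `AVG` combines the per-model scores without any model-specific logic in the outer query.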
