BigML Education - Datasets

Datasets are the fundamental building block for your BigML workflows. Learn how to filter, sample, add new fields, or split a dataset into training and test datasets.
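
As a rough illustration of the training/test split mentioned above, the sketch below uses plain Python rather than the BigML API. The function name `split_dataset`, the 80/20 default rate, and the seed string are assumptions made for this example. It only shows two properties such a split should have: a fixed seed makes it reproducible, and the two parts are complementary.

```python
import random


def split_dataset(rows, sample_rate=0.8, seed="bigml-education"):
    """Return (train, test) lists; the same seed always gives the same split.

    Hypothetical helper for illustration only, not part of any BigML library.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    train, test = [], []
    for row in rows:
        # Each row lands in exactly one of the two parts.
        (train if rng.random() < sample_rate else test).append(row)
    return train, test


# Toy data standing in for a dataset's rows.
rows = [{"id": i} for i in range(1000)]
train, test = split_dataset(rows)
```

The input list is never modified, which mirrors the idea that a dataset is immutable and every sample or split yields new datasets.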

Published in: Data & Analytics

  1. BigML Education: Datasets (June 2017)
  2. In This Video (BigML Education Program: Datasets)
     • Introduction
       • Typical workflow: 1-click creation
       • Purpose of datasets in BigML
       • Exploration
       • Pre-flight check
     • Basic Features
       • Other ways to create datasets
       • Train/Test split
       • More Exploration
     • Advanced Features
       • Filtering
       • Feature engineering with Flatline
  3. Sources Introduction
  4. What is a Dataset?
     • Datasets are the fundamental building blocks
     • Models, Clusters, etc. derive from datasets
     • Sources can only become datasets
     • Data exploration / Pre-flight check
     • Missing/Errors
     • Summary statistics
     • Non-preferred fields
     • Default objective for 1-click actions
  5. Datasets: Basic Features
  6. Dataset Features
     • Immutable - “dataset/5943226f01440401bf0003bd”
     • Creating Datasets
       • From a source
       • From a dataset: sampling, training/test
       • From a batch output
     • Dynamic scatterplot
  7. Datasets: Advanced Features
  8. Advanced Configuration
     • Dataset Filtering
     • Feature Engineering
  9. Loan Status
     • Loan Status values: Charged Off, Current, Default, Fully Paid, In Grace, Late (16-30), Late (31-120)
     • Filter: Open (Current, In Grace, Late (16-30), Late (31-120)) vs. Closed (Charged Off, Default, Fully Paid)
     • Engineer: Quality (Good, Bad)
  10. Summary
     • Dataset Purpose
       • Fundamental building block
       • Pre-flight check: counts, histograms, scatterplot
     • Creating dataset
       • From source: 1-click and sampling
       • Training / Test split
       • From batch output
       • From dataset: sampling, filtering, new features
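
The Loan Status slide combines the two advanced features, filtering and feature engineering. The sketch below walks through that idea in plain Python rather than BigML filters or Flatline. The field names `loan_status` and `quality` and both helper functions are hypothetical. Keeping only the closed loans, and labelling "Fully Paid" as Good and the other closed statuses as Bad, are assumptions: the slide text lists the groups but does not spell out the mapping.

```python
# Status groups as listed on the Loan Status slide.
OPEN_STATUSES = {"Current", "In Grace", "Late (16-30)", "Late (31-120)"}
CLOSED_STATUSES = {"Charged Off", "Default", "Fully Paid"}


def filter_closed(rows):
    """Filtering step: keep only loans whose status is in the Closed group."""
    return [row for row in rows if row["loan_status"] in CLOSED_STATUSES]


def add_quality(rows):
    """Feature engineering step: derive a new 'quality' field.

    Assumed mapping: 'Fully Paid' -> 'Good', any other status -> 'Bad'.
    Returns new rows and leaves the input rows unchanged.
    """
    return [
        {**row, "quality": "Good" if row["loan_status"] == "Fully Paid" else "Bad"}
        for row in rows
    ]


# Toy rows standing in for a loan dataset.
loans = [
    {"id": 1, "loan_status": "Current"},
    {"id": 2, "loan_status": "Fully Paid"},
    {"id": 3, "loan_status": "Charged Off"},
    {"id": 4, "loan_status": "Late (16-30)"},
    {"id": 5, "loan_status": "Default"},
]
closed = filter_closed(loans)
labeled = add_quality(closed)
```

Each step returns a new list instead of editing rows in place, in the same spirit as creating a new dataset from an existing one.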