
Machine Learning RUM - Velocity 2016


Presentation from Velocity 2016 on using machine learning to determine the metrics that drive bounce and conversions.

Published in: Technology

Machine Learning RUM - Velocity 2016

  1. Using machine learning to determine drivers of bounce and conversion (Velocity 2016, Santa Clara)
  2. Pat Meenan (@patmeenan), Tammy Everts (@tameverts)
  3. What we did
  4. Get the code: https://github.com/WPO-Foundation/beacon-ml
  5. Deep Learning: Weights
  6. Random Forest: lots of random decision trees
  7. Vectorizing the data (expanded in the sketches after the slide list)
     • Everything needs to be numeric
     • Strings converted to several inputs as yes/no (1/0), e.g. Device Manufacturer: "Apple" would be a discrete input
     • Watch out for input explosion (UA String)
  8. Balancing the data
     • 3% conversion rate
     • 97% accurate by always guessing no
     • Subsample the data for 50/50 mix
  9. Validation data
     • Train on 80% of the data
     • Validate on 20% to prevent overfitting
  10. Smoothing the data
      • ML works best on normally distributed data

        scaler = StandardScaler()
        x_train = scaler.fit_transform(x_train)
        x_val = scaler.transform(x_val)

  11. Input/Output relationships
      • SSL highly correlated with conversions
      • Long sessions highly correlated with not bouncing
      • Remove correlated features from training
  12. Training Deep Learning

        model = Sequential()
        model.add(...)
        model.compile(optimizer='adagrad', loss='binary_crossentropy', metrics=["accuracy"])
        model.fit(x_train, y_train, nb_epoch=EPOCH_COUNT, batch_size=32,
                  validation_data=(x_val, y_val), verbose=2, shuffle=True)

  13. Training Random Forest

        clf = RandomForestClassifier(n_estimators=FOREST_SIZE, criterion='gini',
                                     max_depth=None, min_samples_split=2,
                                     min_samples_leaf=1, min_weight_fraction_leaf=0.0,
                                     max_features='auto', max_leaf_nodes=None,
                                     bootstrap=True, oob_score=False, n_jobs=12,
                                     random_state=None, verbose=2, warm_start=False,
                                     class_weight=None)
        clf.fit(x_train, y_train)

  14. Feature Importances

        clf.feature_importances_

  15. What we learned
  16. Takeaways
  17. Thanks!
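
The slides above only show fragments of the beacon-ml pipeline, so the sketches below expand them step by step. Every column name, layer size, and constant that the deck does not give is an assumption, and the actual code in the repository linked on slide 4 may differ. First, slide 7's vectorizing step, sketched with pandas get_dummies so that each string value becomes its own 0/1 input:

    import pandas as pd

    # Hypothetical beacon rows; the real data comes from the RUM beacon feed.
    df = pd.DataFrame({
        "device_manufacturer": ["Apple", "Samsung", "Apple", "Other"],
        "session_length":      [3, 1, 7, 2],
        "converted":           [1, 0, 1, 0],
    })

    # One-hot encode string columns: "Apple" becomes its own discrete 0/1 input,
    # the yes/no conversion the slide describes.
    labels = df["converted"].to_numpy()
    features = pd.get_dummies(df.drop(columns=["converted"]),
                              columns=["device_manufacturer"])

    # A high-cardinality string such as the raw User-Agent would explode into
    # thousands of columns, hence the slide's warning about the UA string.
    print(features.columns.tolist())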
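
Slide 8's balancing step as a small NumPy helper. With a 3% conversion rate, a model that always predicts "no" is 97% accurate, so the majority class is subsampled to a 50/50 mix before training; the exact sampling used in beacon-ml is an assumption here:

    import numpy as np

    def balance_50_50(x, y, seed=0):
        """Subsample the majority class so positive and negative rows are 50/50."""
        rng = np.random.default_rng(seed)
        pos = np.flatnonzero(y == 1)
        neg = np.flatnonzero(y == 0)
        n = min(len(pos), len(neg))
        keep = np.concatenate([rng.choice(pos, n, replace=False),
                               rng.choice(neg, n, replace=False)])
        rng.shuffle(keep)
        return x[keep], y[keep]

    x_bal, y_bal = balance_50_50(features.to_numpy(dtype=float), labels)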
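
Slides 9 and 10 combined: hold out 20% of the balanced data for validation, then standardize using statistics fitted on the training split only. The scaler lines are the ones shown on slide 10; train_test_split is an assumption about how the 80/20 split was made:

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # 80% for training, 20% held out purely to watch for overfitting.
    x_train, x_val, y_train, y_val = train_test_split(
        x_bal, y_bal, test_size=0.2, random_state=42)

    # Fit the scaler on the training split only, then apply the same transform
    # to the validation data so no validation statistics leak into training.
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)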
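
Slide 11 warns that features such as SSL and session length correlate so strongly with the label that they give the answer away. One way to screen for and drop them; the 0.9 threshold is an assumption:

    import numpy as np

    def label_correlated(x, y, names, threshold=0.9):
        """Return indices of features whose correlation with the label exceeds the threshold."""
        drop = []
        for i, name in enumerate(names):
            r = np.corrcoef(x[:, i], y)[0, 1]
            if not np.isnan(r) and abs(r) >= threshold:
                print(f"dropping {name}: |corr with label| = {abs(r):.2f}")
                drop.append(i)
        return drop

    # Identify giveaway columns on the training split, then drop the same
    # columns from both splits (names come from the one-hot step above).
    feature_names = features.columns.tolist()
    drop_idx = set(label_correlated(x_train, y_train, feature_names))
    keep_idx = [i for i in range(len(feature_names)) if i not in drop_idx]
    x_train = x_train[:, keep_idx]
    x_val = x_val[:, keep_idx]
    feature_names = [feature_names[i] for i in keep_idx]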
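
Slide 12 elides the layer definitions behind model.add(...). A runnable version with two hidden ReLU layers whose sizes are assumptions (the actual beacon-ml network may differ); epochs is the current Keras spelling of the slide's Keras 1.x nb_epoch argument:

    from tensorflow import keras
    from tensorflow.keras import layers

    EPOCH_COUNT = 30  # assumption; the slide does not give the value

    model = keras.Sequential([
        keras.Input(shape=(x_train.shape[1],)),
        layers.Dense(64, activation="relu"),    # hidden layer sizes are assumptions
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # single bounce/conversion probability
    ])
    model.compile(optimizer="adagrad",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train,
              epochs=EPOCH_COUNT,
              batch_size=32,
              validation_data=(x_val, y_val),
              verbose=2,
              shuffle=True)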
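
Slides 13 and 14 show the RandomForestClassifier call and the feature_importances_ attribute. The sketch below trims the call to its non-default arguments, adds the import, and pairs the importances with column names so the ranking of what drives bounce and conversion is readable; FOREST_SIZE is an assumption:

    from sklearn.ensemble import RandomForestClassifier

    FOREST_SIZE = 500  # assumption; the slide does not give the value

    clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                                 criterion="gini",
                                 n_jobs=12,
                                 verbose=2)
    clf.fit(x_train, y_train)

    # Rank inputs by how much each one drives the bounce/conversion prediction.
    ranked = sorted(zip(feature_names, clf.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    for name, importance in ranked:
        print(f"{name}: {importance:.3f}")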
