9. Can we predict survival from data?
• Survived: first class, female, 1 sibling, 35 years old
• Perished: third class, female, 2 siblings, 18 years old
• Perished: second class, male, 0 siblings, 50 years old
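One way to picture the task: each passenger becomes a dictionary of features plus a label, and a learner searches for predictive patterns such as passenger class. A minimal stdlib-only sketch with a hand-written rule (illustrative only, not SKLL code; field names are invented):

```python
# Each passenger as a feature dict plus a label (from the slide)
passengers = [
    ({"pclass": 1, "sex": "female", "sibsp": 1, "age": 35}, "survived"),
    ({"pclass": 3, "sex": "female", "sibsp": 2, "age": 18}, "perished"),
    ({"pclass": 2, "sex": "male", "sibsp": 0, "age": 50}, "perished"),
]

def predict(p):
    # A hand-written rule of the kind a learner might discover:
    # first-class passengers were far more likely to survive
    return "survived" if p["pclass"] == 1 else "perished"

correct = sum(predict(x) == y for x, y in passengers)
print(correct / len(passengers))  # 1.0 on these three examples
```

A real classifier replaces the hand-written rule with one fit from many such examples.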
39. Using All Available Data
• Use training and dev to generate predictions on test
[General]
experiment_name = Titanic_Predict
task = predict

[Input]
train_location = train+dev
test_location = test
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
label_col = Survived

[Tuning]
grid_search = true
objective = accuracy

[Output]
results = output
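SKLL experiment configs are plain INI files, so they can be sanity-checked with Python's stdlib configparser before launching a run (this is just an illustration; SKLL does its own parsing). The string below mirrors the slide's config:

```python
import configparser

# The experiment config from the slide, embedded as a string
config_text = """
[General]
experiment_name = Titanic_Predict
task = predict

[Input]
train_location = train+dev
test_location = test
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "SVC", "MultinomialNB"]
label_col = Survived

[Tuning]
grid_search = true
objective = accuracy

[Output]
results = output
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)
print(parser["General"]["task"])                   # predict
print(parser["Tuning"].getboolean("grid_search"))  # True
```

Note that configparser returns raw strings, so list-valued options like featuresets would still need to be decoded (SKLL handles that internally).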
43–49. Advanced SKLL Features
• Read/write .arff, .csv, .jsonlines, .megam, .ndj, and .tsv data
• Parameter grids for all supported classifiers/regressors
• Parallelize experiments on DRMAA clusters
• Ablation experiments
• Collapse/rename classes from config file
• Rescale predictions to be closer to observed data
• Feature scaling
• Python API
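Class collapsing/renaming, for instance, is driven from the experiment config. A hypothetical fragment might look like the following (the option name and mapping syntax are from memory and should be checked against the SKLL documentation):

```
[Input]
class_map = {"animal": ["cat", "dog"], "vehicle": ["car", "truck"]}
```

Here every "cat" or "dog" label would be collapsed into a single "animal" class before training.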
59–71. SKLL API

from skll import Learner, load_examples
# Load training examples
train_examples = load_examples('myexamples.megam')
# Train a linear SVM
learner = Learner('LinearSVC')
learner.train(train_examples)
# Load test examples and evaluate; evaluate() returns the confusion
# matrix, accuracy, precision/recall/f-score for each class, the tuned
# model parameters, and the objective function score on the test set
test_examples = load_examples('test.tsv')
(conf_matrix, accuracy, prf_dict, model_params,
 obj_score) = learner.evaluate(test_examples)
# Generate predictions from trained model
predictions = learner.predict(test_examples)
# Perform 10-fold cross-validation with a radial SVM; returns per-fold
# evaluation results and per-fold training set objective scores
learner = Learner('SVC')
(fold_result_list,
 grid_search_scores) = learner.cross_validate(train_examples)
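As a reminder of what the first two items of the evaluate() tuple contain, here is a tiny stdlib-only computation of a confusion matrix and accuracy from gold and predicted labels (not SKLL code; SKLL's actual return types may differ):

```python
from collections import Counter

gold = ["cat", "dog", "dog", "cat", "dog"]
pred = ["cat", "dog", "cat", "cat", "dog"]

# Confusion matrix as a mapping (gold, predicted) -> count
conf_matrix = Counter(zip(gold, pred))
# Accuracy: fraction of examples where prediction matches gold
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)

print(conf_matrix[("dog", "cat")])  # 1 (one dog mislabeled as cat)
print(accuracy)                     # 0.8
```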
72. SKLL API

import numpy as np
import os
from skll import write_feature_file

# Create some training examples
num_train_examples = 100  # defined here so the snippet is self-contained
classes = []
ids = []
features = []
for i in range(num_train_examples):
    y = "dog" if i % 2 == 0 else "cat"
    ex_id = "{}{}".format(y, i)
    x = {"f1": np.random.randint(1, 4),
         "f2": np.random.randint(1, 4),
         "f3": np.random.randint(1, 4)}
    classes.append(y)
    ids.append(ex_id)
    features.append(x)

# Write them to a file (_my_dir is the experiment's base directory,
# defined elsewhere)
train_path = os.path.join(_my_dir, 'train',
                          'test_summary.jsonlines')
write_feature_file(train_path, ids, classes, features)
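write_feature_file infers the output format from the extension; for .jsonlines, each example becomes one JSON object per line. A stdlib-only sketch of that shape (the exact key names, e.g. "id", "y", and "x", are assumptions here — check the SKLL docs for the real schema):

```python
import json

# Two examples in the assumed jsonlines shape: id, label, feature dict
examples = [
    {"id": "dog0", "y": "dog", "x": {"f1": 2, "f2": 1, "f3": 3}},
    {"id": "cat1", "y": "cat", "x": {"f1": 1, "f2": 3, "f3": 2}},
]

# One JSON object per line, as in a .jsonlines file
text = "\n".join(json.dumps(ex, sort_keys=True) for ex in examples)

# Reading it back recovers the original examples
round_trip = [json.loads(line) for line in text.splitlines()]
print(round_trip[0]["y"])  # dog
```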