Project Report
Albert Chu
Introduction:
Insurance companies need a way to assess people in order to decide on insurance plans and prices, and my internship with Northwestern Mutual involves finding clients and assessing which insurance portfolio is most suitable for them. My method of analysis uses Newton's Method to minimize risk and maximize expected returns. Given that the insurance industry is worth hundreds of billions of dollars, tiny inaccuracies are magnified.
Method:
How can we make portfolio optimization more accurate? We are given data sets containing numerous factors that influence eligibility for life insurance. Each of these factors is encoded as a dummy variable (0 = no, 1 = yes), but each is weighted differently by a function. By using more than just a yes-no scale or a 1-10 scale, we can draw a more accurate result from the given data. As said previously, our function weights each predictor variable differently in the result returned by R, and insignificant variables are eliminated from the model. With that, we build a second-degree polynomial with all of the interaction terms. The first step is to minimize the variance (v = xᵀVx) and maximize the expected return (r = aᵀx). We then use Newton's method to correct the tiny inaccuracies introduced by rounding in the function's output.
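As a rough illustration of these two quantities only, the short sketch below computes the portfolio variance xᵀVx and the expected return aᵀx; the allocation x, covariance matrix V, and expected-return vector a are made-up numbers, not values from the actual data set.

import numpy as np

# hypothetical allocation across three insurance products (weights sum to 1)
x = np.array([0.5, 0.3, 0.2])
# hypothetical covariance matrix V of the products' returns
V = np.array([[0.040, 0.006, 0.002],
              [0.006, 0.025, 0.004],
              [0.002, 0.004, 0.010]])
# hypothetical expected-return vector a
a = np.array([0.06, 0.05, 0.03])

variance = x @ V @ x        # v = x^T V x, the quantity to minimize
expected_return = a @ x     # r = a^T x, the quantity to maximize
print('variance:', variance)
print('expected return:', expected_return)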
The function returned by R, σ²(x) = σ²x² + σᵣx + V + error, can now be applied to the dataset in the attached files. The overall variance and expected values are calculated and then placed into the function. Each person has a different and unique profile that determines whether they will be eligible for certain plans. To find the optimal point, we apply Newton's Method to our liability function and check the accuracy of the results against real-life models to test whether it is viable. Newton's method then runs additional iterations to eliminate numerical rounding errors and to represent the output more accurately, as in this pseudocode from the main set of code:
newtons_method(test_preds, test['Response'].values, N, .01) starts with i = 1, then:
    while i <= N:
        p = p0 - f(p0) / fprime(p0)        # Newton update on the current estimate p0
        if abs(p - p0) < TOL:              # TOL = .01
            print('Took ' + str(i) + ' iterations')
            return p
        i = i + 1
        p0 = p
        print(p)
Results:
The results would allow an insurance company to invest less money in risk, and thus offer lower rates to attract more clients. After a certain number of iterations the method outputs a number that determines a client's risk assessment and profitability, and it does so very accurately. The original train score is what the client originally received from the test given by the insurance company. Since we are currently testing with a tolerance of only .01, Newton's method runs only a few iterations on most results. You can now independently find a person's train score with much greater accuracy and see what kind of package they are eligible for. For example, clients 1, 2, and 3 respectively:
Eliminate missing values
Train score is: 6.5
Optimization terminated successfully.
Current function value: 6.48286
Iterations: 4
Train score is: 7.3
Optimization terminated successfully.
Current function value: 7.29378
Iterations: 5
Train score is: 8.0
Optimization terminated successfully.
Current function value: 8.00905
Iterations: 8
The output for client 1 gives Absolute Error: .01714 and Relative Error: .0026438, with the relative error below the .01 tolerance that was set. This amount is significant: with millions of insured people in the US, each paying thousands of dollars a year, such errors add up to huge monetary losses for insurance companies. To make this even better, I would need more time to implement considerably more code and make it viable for the current economy.
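For reference, these two error figures for client 1 follow directly from the train score of 6.5 and the optimized function value of 6.48286 reported above; the short check below simply recomputes them.

train_score = 6.5          # original train score for client 1
opt_value = 6.48286        # function value after Newton's iterations

abs_error = abs(train_score - opt_value)   # = 0.01714
rel_error = abs_error / opt_value          # ≈ 0.0026438
print('Absolute Error:', abs_error)
print('Relative Error:', rel_error)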
Code:
import pandas as pd
import numpy as np
import xgboost as xgb
from scipy.optimize import fmin_powell
from ml_metrics import quadratic_weighted_kappa
def eval_wrapper(yhat, y):
    y = np.array(y)
    y = y.astype(int)
    yhat = np.array(yhat)
    yhat = np.clip(np.round(yhat), np.min(y), np.max(y)).astype(int)
    return quadratic_weighted_kappa(yhat, y)
def get_params():
    params = {}
    params["objective"] = "reg:linear"
    params["eta"] = 0.05
    params["min_child_weight"] = 240
    params["subsample"] = 0.9
    params["colsample_bytree"] = 0.67
    params["silent"] = 1
    params["max_depth"] = 6
    plst = list(params.items())
    return plst
def apply_offset(data, bin_offset, sv, scorer=eval_wrapper):
    # data has the format of pred=0, offset_pred=1, labels=2 in the first dim
    data[1, data[0].astype(int)==sv] = data[0, data[0].astype(int)==sv] + bin_offset
    score = scorer(data[1], data[2])
    return score
# global variables
columns_to_drop = ['Id', 'Response', 'Medical_History_10', 'Medical_History_24']
xgb_num_rounds = 700
num_classes = 8
eta_list = [0.05] * 200
eta_list = eta_list + [0.02] * 500
print("Load the data using pandas")
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")
# combine train and test
all_data = train.append(test)
all_data['BMI_Age'] = all_data['BMI'] * all_data['Ins_Age']
med_keyword_columns = all_data.columns[all_data.columns.str.startswith('Medical_Keyword_')]
all_data['Med_Keywords_Count'] = all_data[med_keyword_columns].sum(axis=1)
print('Eliminate missing values')
# Use -1 for any others
all_data.fillna(-1, inplace=True)
# fix the dtype on the label column
all_data['Response'] = all_data['Response'].astype(int)
# split train and test
train = all_data[all_data['Response']>0].copy()
test = all_data[all_data['Response']<1].copy()
# convert data to xgb data structure
xgtrain = xgb.DMatrix(train.drop(columns_to_drop, axis=1), train['Response'].values)
xgtest = xgb.DMatrix(test.drop(columns_to_drop, axis=1), label=test['Response'].values)
# get the parameters for xgboost
plst = get_params()
print(plst)
# train model
model = xgb.train(plst, xgtrain, xgb_num_rounds, learning_rates=eta_list)
# get preds
train_preds = model.predict(xgtrain, ntree_limit=model.best_iteration)
print('Train score is:', eval_wrapper(train_preds, train['Response']))
test_preds = model.predict(xgtest, ntree_limit=model.best_iteration)
train_preds = np.clip(train_preds, -0.99, 8.99)
test_preds = np.clip(test_preds, -0.99, 8.99)
# train offsets
# determine iterations for more accurate read
offsets = np.array([0.1, -1, -2, -1, -0.8, 0.02, 0.8, 1])
data = np.vstack((train_preds, train_preds, train['Response'].values))
for j in range(num_classes):
    data[1, data[0].astype(int)==j] = data[0, data[0].astype(int)==j] + offsets[j]
for j in range(num_classes):
    train_offset = lambda x: -apply_offset(data, x, j)
    offsets[j] = fmin_powell(train_offset, offsets[j])
newtons_method(test_preds, test['Response'].values, 1000, .01)
# apply offsets to test
data = np.vstack((test_preds, test_preds, test['Response'].values))
for j in range(num_classes):
    data[1, data[0].astype(int)==j] = data[0, data[0].astype(int)==j] + offsets[j]
final_test_preds = np.round(np.clip(data[1], 1, 8)).astype(int)
preds_out = pd.DataFrame({"Id": test['Id'].values, "Response": final_test_preds})
preds_out = preds_out.set_index('Id')
preds_out.to_csv('xgb_offset_submission.csv')
def newtons_method(fstring, fpstring, p0, N, TOL):
    # classic Newton iteration: p = p0 - f(p0)/f'(p0), stop once the step is below TOL
    i = 1
    while i <= N:
        p = p0 - fstring(p0) / fpstring(p0)
        print(p)
        if abs(p - p0) < TOL:
            print('Took ' + str(i) + ' iterations')
            return p
        i = i + 1
        p0 = p
    return p0
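A quick usage sketch for the function above; the equation x² - 2 = 0 is only an illustrative toy problem, not the actual liability function from the report.

# example call: solve x^2 - 2 = 0 starting from x = 1, with the .01 tolerance
root = newtons_method(lambda x: x**2 - 2, lambda x: 2*x, 1.0, 1000, .01)
print(root)   # converges to roughly 1.4142 in a few iterations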
