1. Over the past few years, the percentage of students who leave Dana College at the end of their
first year has increased. Last year, Dana started voluntary, one-credit, hour-long seminars with
faculty to help first-year students establish an on-campus connection. If Dana can show that the
seminars have a positive effect on retention, college administrators will be convinced to continue
funding this initiative. Dana's administration also suspects that first-year students with lower
high school GPAs have a higher probability of leaving Dana at the end of the first year.
Data on the 500 first-year students from last year have been collected. Each observation consists
of a first-year student's high school GPA, whether they enrolled in a seminar, and whether they
dropped out and did not return to Dana. Apply logistic regression to classify observations as
dropped out or not dropped out, using GPA and Seminar as input variables and Dropped as the
target (or response) variable.
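In equation form, the model estimates P(Dropped = Yes) = 1 / (1 + e^-(b0 + b1(GPA) + b2(Seminar))). A negative b1 would be consistent with the administration's suspicion that lower GPAs go with a higher dropout probability, and a negative b2 would mean seminar enrollment is associated with lower odds of dropping out. The Python sketch below is an illustration under stated assumptions, not Rattle's implementation: it assumes Dropped is coded 1 for Yes and 0 for No and Seminar is coded 0/1, and it finds the maximum-likelihood coefficients with Newton-Raphson steps. The helper names `sigmoid`, `solve`, and `fit_logistic` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    # Solve the small linear system a x = b by Gaussian elimination
    # with partial pivoting (a is a list of rows, b a list).
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(gpa, seminar, dropped, iters=25):
    # Hypothetical helper: maximum-likelihood fit of
    # P(Dropped = 1) = sigmoid(b0 + b1*GPA + b2*Seminar) by Newton-Raphson.
    # Each step is beta += (X^T W X)^-1 X^T (y - p), with W = diag(p(1 - p)).
    beta = [0.0, 0.0, 0.0]
    rows = [(1.0, float(g), float(s)) for g, s in zip(gpa, seminar)]
    for _ in range(iters):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for x, y in zip(rows, dropped):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(3):
                grad[i] += (y - p) * x[i]
                for j in range(3):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta  # [b0, b1, b2]
```

Called as `fit_logistic(gpa, seminar, dropped)` with three equal-length lists, the sketch returns `[b0, b1, b2]`; `math.exp(b2)` is then the estimated odds ratio for dropping out with versus without a seminar at a fixed GPA. If an input perfectly separates dropouts from non-dropouts, the maximum-likelihood estimate does not exist and the coefficients grow without bound instead of converging, so the result should be checked before it is interpreted.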
For part a., in the Data tab of the Rattle GUI - R window, click inside the box next to Filename:
and navigate to the location of the file DanaTrain.csv. Select the file DanaTrain.csv, click Open,
and then click the Execute button. Uncheck the box next to Partition. For the Student variable,
select the Ident button. For the GPA and Seminar variables, select the Input button. For the
Dropped variable, select the Target button. Then click the Execute button.

In the Model tab, in the Type: row, select the button next to Linear, and then select the button
next to Logistic. Click the Execute button.

To evaluate the performance of the logistic regression model on a validation set, click the
Evaluate tab. In the Model: row, select the box next to Linear, and in the Data: row, select CSV
File. Click inside the box next to CSV File and navigate to the location of the file
DanaValidation.csv. Select the file DanaValidation.csv and click Open. To generate the ROC chart
in the Plots pane of the RStudio interface, in the Evaluate tab, select ROC in the Type: row,
and click Execute.

Student  GPA   Seminar  Dropped
1        3.78  1        No
2        3.22  0        No
3        2.69  0        Yes
4        3.12  0        No
5        2.96  0        Yes
6        2.47  0        No
7        2.37  0        No
8        2.76  0        Yes
9        3.10  0        No
10       2.86  0        No
11       2.62  0        No
12       3.07  0        No
13       3.30  0        No
14       2.43  0        Yes
15       2.98  0        Yes
16       2.63  0        No
17       2.68  0        Yes
18       2.62  0        Yes
19       2.82  1        No
20       2.38  0        No
21       3.33  0        No
22       2.33  0        Yes
23       2.93  0        Yes
24       3.06  0        No
25       3.98  1        No
26       2.88  0        No
27       3.12  0        No
28       3.12  0        No
29       3.38  0        No
30       3.29  1        No
31       3.49  1        No
32       2.89  0        No
33       2.77  0        No
34       2.61  0        Yes
35       3.10  0        No
36       3.06  1        No
37       2.61  0        Yes
38       2.91  1        No
39       2.49  0        Yes
40       3.45  0        No
41       2.93  0        Yes
42       2.22  0        Yes
43       2.51  0        Yes
44       3.71  1        No
45       2.11  0        Yes
46       2.87  0        No
47       2.64  1        No
48       3.04  0        No
49       2.54  0        Yes
50       2.57  0        No
51       3.85  0        No
52       2.53  0        No
53       3.20  1        No
54       3.12  0        No
55       2.90  0        No
56       3.27  0        No
57       2.73  0        Yes
58       2.70  0        Yes
59       2.29  0        No
60       2.34  0        No
61       3.40  0        No
62       2.62  0        No
63       2.62  1        No
64       2.64  0        No
65       2.14  0        No
66       3.06  0        No
67       2.83  0        Yes
68       3.35  1        No
69       3.33  0        No
70       2.16  0        No
71       2.40  0        No
72       2.51  0        No
73       2.86  1        No
74       3.12  0        No
75       3.38  1        No
76       2.82  1        No
77       3.04  0        No
78       2.78  0        No
79       2.74  0        No
80       2.02  0        No
81       2.56  0        No
82       3.48  0        No
83       2.64  0        Yes
84       2.63  0        No
85       2.11  0        No
86       3.03  0        No
87       2.33  0        No
88       2.01  0        No
89       2.94  0        Yes
90       3.81  1        No
91       2.68  0        No
92       2.58  0        No
93       2.70  1        No
94       2.75  0        Yes
95       2.53  0        No
96       3.08  0        Yes
97       2.78  0        No
98       3.23  0        No
99       3.17  0        No
100      2.40  0        No
101      3.27
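The ROC chart in the Evaluate tab plots the true positive rate (the share of actual dropouts flagged as Dropped = Yes) against the false positive rate (the share of non-dropouts flagged) as the probability cutoff for predicting Dropped = Yes is lowered; the area under that curve (AUC) summarizes it in one number. The Python sketch below reproduces that calculation outside Rattle, for illustration only. It assumes the validation file has the same Student, GPA, Seminar, and Dropped columns as the table above, with Dropped recorded as Yes/No, and that the predicted probabilities come from an already-fitted logistic model. The helper names `read_dana_csv`, `confusion_counts`, `roc_points`, and `auc` are hypothetical.

```python
import csv
import io


def read_dana_csv(text):
    # Hypothetical loader: assumes columns Student, GPA, Seminar, Dropped,
    # and returns (gpa, seminar, dropped) tuples with Dropped coded Yes -> 1, No -> 0.
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        dropped = 1 if rec["Dropped"].strip() == "Yes" else 0
        rows.append((float(rec["GPA"]), int(rec["Seminar"]), dropped))
    return rows


def confusion_counts(probs, labels, cutoff=0.5):
    # Predict Dropped = Yes when p >= cutoff; return (TP, FP, FN, TN).
    tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
    return tp, fp, fn, tn


def roc_points(probs, labels):
    # Sweep the cutoff from the highest score to the lowest; each distinct
    # score adds one (false positive rate, true positive rate) point.
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one dropout and one non-dropout")
    pairs = sorted(zip(probs, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Tied scores move the curve together (diagonal step).
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    # Trapezoidal area under the ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with predicted probabilities `[0.9, 0.8, 0.7, 0.6, 0.4, 0.3]` and actual outcomes `[1, 1, 0, 1, 0, 0]`, `auc(roc_points(...))` gives 8/9 (about 0.889), the fraction of (dropout, non-dropout) pairs in which the dropout received the higher probability; `confusion_counts` at the default 0.5 cutoff returns `(3, 1, 0, 2)`.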