*Town of Barpak after Gorkha earthquake. Image from The Telegraph (UK)*

by George Taniwaki

This is the final set of my notes from a machine learning class offered by edX. Part 1 of this blog entry was posted in June 2018.

### Step 7: Optimize model

At the end of step 6, I discovered that none of my three models met the minimum F1 score (at least 0.60) needed to pass the class. Starting with the configuration shown in Figure 5, I modified my experiment by replacing the static data split with the Partition and Sample module, using 10 evenly sized folds. I used a random seed of 123 to ensure reproducibility.
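As a rough illustration of what a seeded 10-fold partition does, here is a minimal Python sketch. It is not the algorithm MAML Studio uses, and the same seed will not reproduce the same folds; the function names are hypothetical. The 10,000-row count matches the totals in the truth tables below.

```python
import random

def assign_folds(n_rows, n_folds=10, seed=123):
    """Return a list giving the fold index (0..n_folds-1) of each row."""
    # Cycle through 0..n_folds-1 so fold sizes differ by at most one row.
    folds = [i % n_folds for i in range(n_rows)]
    # A local Random instance makes the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(folds)
    return folds

def train_validation_split(folds, held_out):
    """Return (training row indices, validation row indices) for one fold."""
    train = [i for i, f in enumerate(folds) if f != held_out]
    valid = [i for i, f in enumerate(folds) if f == held_out]
    return train, valid

folds = assign_folds(10000)
train_idx, valid_idx = train_validation_split(folds, held_out=0)
print(len(train_idx), len(valid_idx))  # 9000 1000
```

Each of the 10 folds takes a turn as the validation set while the other nine are used for training.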

I added both a cross-validation step and a hyperparameter tuning step to optimize results. To improve performance, I also added a Convert to Indicator Values module, which converts the categorical variables into binary dummy variables before the model runs.
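The idea behind indicator (dummy) variables can be sketched in a few lines of Python. This is only an illustration of the transformation, not the MAML module itself; the column name `roof_type`, its levels, and the output column naming are hypothetical.

```python
def to_indicator_columns(rows, column):
    """Replace one categorical column with one 0/1 column per level."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the categorical one being expanded.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            new_row[f"{column}-{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded

# Hypothetical rows, for illustration only
rows = [
    {"age": 10, "roof_type": "n"},
    {"age": 25, "roof_type": "q"},
    {"age": 5, "roof_type": "n"},
]
encoded = to_indicator_columns(rows, "roof_type")
print(encoded[0])  # {'age': 10, 'roof_type-n': 1, 'roof_type-q': 0}
```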

Unfortunately, the MAML ordinal regression module does not support hyperparameter tuning, so I replaced it with the one-vs-all multiclass classifier. The new configuration is shown in Figure 6 below. (Many thanks to my classmate Robert Ritz for sharing his model.)
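For readers unfamiliar with one-vs-all, the approach trains one binary classifier per class (that class versus everything else) and predicts the class whose classifier is most confident. A minimal sketch of the prediction step, with made-up scores and a hypothetical function name:

```python
def one_vs_all_predict(class_scores):
    """Return the class whose binary classifier gives the highest score."""
    return max(class_scores, key=class_scores.get)

# Hypothetical scores from three binary classifiers, one per damage grade
scores = {1: 0.10, 2: 0.55, 3: 0.35}
print(one_vs_all_predict(scores))  # 2
```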

**Figure 6.** Layout of MAML Studio experiment with hyperparameter tuning

For an explanation of how hyperparameter tuning works, see the Microsoft documentation and the MSDN blog post.

### Model 5 – One-vs-all multiclass model using logistic regression classifier

In the earlier experiments, the two-class logistic regression classifier gave the best results, so I use it again with the one-vs-all multiclass model. The default parameter ranges for the two-class logistic regression classifier are: optimization tolerance = 1E-4, 1E-7; L1 regularization weight = 0, 0.01, 0.1, 1.0; L2 regularization weight = 0.01, 0.1, 1.0; and memory size for L-BFGS = 5, 20, 50.
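To get a feel for the size of this search space, the sketch below enumerates every combination of the ranges as a full grid. Two assumptions: I read the L1 list as 0, 0.01, 0.1, 1.0, and I treat the sweep as exhaustive, which may not be how MAML Studio actually samples the ranges. The dictionary keys are hypothetical names.

```python
from itertools import product

param_ranges = {
    "optimization_tolerance": [1e-4, 1e-7],
    "l1_weight": [0.0, 0.01, 0.1, 1.0],  # assumed reading of the L1 list
    "l2_weight": [0.01, 0.1, 1.0],
    "lbfgs_memory_size": [5, 20, 50],
}

# Build one dict per combination of parameter values.
names = list(param_ranges)
grid = [dict(zip(names, values)) for values in product(*param_ranges.values())]

print(len(grid))  # 72
```

Under these assumptions there are 2 × 4 × 3 × 3 = 72 candidate settings, which is far more than I would want to try by hand.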

**Table 12a**. Truth table for one-vs-all multiclass model using logistic regression classifier

| Truth table | Is 1 | Is 2 | Is 3 | TOTAL |
|---|---|---|---|---|
| Predict 1 | 296 | 156 | 20 | 472 |
| Predict 2 | 621 | 4633 | 1651 | 6905 |
| Predict 3 | 21 | 847 | 1755 | 2623 |
| TOTAL | 938 | 5636 | 3426 | 10000 |

**Table 12b**. Performance measures for one-vs-all multiclass model using logistic regression classifier

| Performance | Value |
|---|---|
| Avg Accuracy | 0.76 |
| F1 Score | 0.64 |
| F1 Score (test data) | Not submitted |

The result is disappointing. The new model has an F1 score of 0.64, which is lower than the F1 score of the ordinal regression model using the logistic regression classifier.

### Model 6 – Add geo_level_2 to model

Originally, I excluded geo_level_2 from the model because it consumed too many degrees of freedom, even though its Chi-square test was significant. I rerun the experiment with this variable included, keeping all other variables and parameters the same.

**Table 13a**. Truth table for one-vs-all multiclass model using logistic regression classifier and including geo_level_2

| Truth table | Is 1 | Is 2 | Is 3 | TOTAL |
|---|---|---|---|---|
| Predict 1 | 355 | 218 | 27 | 600 |
| Predict 2 | 564 | 4662 | 1446 | 6672 |
| Predict 3 | 19 | 756 | 1953 | 2728 |
| TOTAL | 938 | 5636 | 3426 | 10000 |

**Table 13b**. Performance measures for one-vs-all multiclass model using logistic regression classifier and including geo_level_2

| Performance | Value |
|---|---|
| Avg Accuracy | 0.80 |
| F1 Score | 0.70 |
| F1 Score (test data) | Not submitted |

The resulting F1 score is 0.70, which is better than in any prior experiment and exactly meets our target of 0.70.
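The two performance measures can be recomputed from the truth table in Table 13a. The sketch below assumes that the F1 score is micro-averaged and that average accuracy is the mean of the per-class one-vs-rest accuracies. These definitions are my assumption about what MAML Studio reports, but they do reproduce the 0.70 and 0.80 shown in Table 13b.

```python
# Rows are predictions, columns are actual classes (values from Table 13a).
truth_table = [
    [355, 218, 27],
    [564, 4662, 1446],
    [19, 756, 1953],
]

def class_counts(table, k):
    """Return (TP, FP, FN, TN) for class index k, treated as one-vs-rest."""
    total = sum(map(sum, table))
    tp = table[k][k]
    fp = sum(table[k]) - tp                  # predicted k, actually another class
    fn = sum(row[k] for row in table) - tp   # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

def micro_f1(table):
    """Micro-averaged F1: pool TP, FP and FN over all classes."""
    counts = [class_counts(table, k) for k in range(len(table))]
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return 2 * tp / (2 * tp + fp + fn)

def average_accuracy(table):
    """Mean over classes of the one-vs-rest accuracy (TP + TN) / N."""
    total = sum(map(sum, table))
    accs = []
    for k in range(len(table)):
        tp, fp, fn, tn = class_counts(table, k)
        accs.append((tp + tn) / total)
    return sum(accs) / len(accs)

print(round(micro_f1(truth_table), 3))          # 0.697
print(round(average_accuracy(truth_table), 3))  # 0.798
```

With one label per building, micro-averaged F1 equals the overall accuracy (6,970 correct out of 10,000).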

### Model 7 – Add height/floor to the model

I will try to improve the model by adding a variable measuring height/floor. This variable is always positive, skewed toward zero and has a long tail. To normalize it, I apply the natural log transform and name the variable ln_height_per_floor. Table 14 and Figure 7 show the summary statistics.

**Table 14.** Descriptive statistics for ln_height_per_floor

| Variable name | Min | Median | Max | Mean | Std dev |
|---|---|---|---|---|---|
| ln_height_per_floor | -1.79 | 0.69 | 2.30 | 0.76 | 0.25 |

**Figure 7.** Histogram of ln_height_per_floor
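The transform itself is a one-liner. A minimal sketch, in which the function and parameter names and the example values are hypothetical:

```python
import math

def ln_height_per_floor(height, floor_count):
    """Natural log of building height divided by the number of floors."""
    if height <= 0 or floor_count <= 0:
        raise ValueError("height and floor_count must be positive")
    return math.log(height / floor_count)

# A height-to-floor ratio of 2 gives ln(2)
print(round(ln_height_per_floor(6, 3), 2))  # 0.69
```

Because the raw ratio is always positive, the log is defined for every row, and it compresses the long right tail.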

I run the model again with no other changes.

**Table 15a**. Truth table for one-vs-all multiclass model using logistic regression classifier, including geo_level_2, height/floor

| Truth table | Is 1 | Is 2 | Is 3 | TOTAL |
|---|---|---|---|---|
| Predict 1 | 366 | 227 | 28 | 621 |
| Predict 2 | 557 | 4640 | 1436 | 6633 |
| Predict 3 | 15 | 769 | 1962 | 2746 |
| TOTAL | 938 | 5636 | 3426 | 10000 |

**Table 15b**. Performance measures for one-vs-all multiclass model using logistic regression classifier, including geo_level_2, height/floor

| Performance | Value |
|---|---|
| Avg Accuracy | 0.80 |
| F1 Score | 0.70 |
| F1 Score (test data) | Not submitted |

The number of correct predictions for damage_grade = 1 and 3 increases, but the number for damage_grade = 2 decreases, resulting in no change in average accuracy or the F1 score.
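The offsetting changes are easy to see by comparing the diagonals of Tables 13a and 15a (the counts of correct predictions per class). A quick check in Python:

```python
model6_correct = {1: 355, 2: 4662, 3: 1953}  # diagonal of Table 13a
model7_correct = {1: 366, 2: 4640, 3: 1962}  # diagonal of Table 15a

# Change in correct predictions per class when height/floor is added
change = {k: model7_correct[k] - model6_correct[k] for k in model6_correct}

print(change)                # {1: 11, 2: -22, 3: 9}
print(sum(change.values()))  # -2
```

The net change is two fewer correct predictions out of 10,000, which disappears when the measures are rounded to two decimal places.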

### Model 8 – Go back to ordinal regression

The accuracy of the one-vs-all multiclass model improved significantly when I added geo_level_2. Let’s see what happens if I add this variable to the ordinal regression model, which produced a higher F1 score than the one-vs-all model did before the variable was added.

**Table 16a**. Truth table for ordinal regression model using logistic regression classifier, including geo_level_2, height/floor

| Truth table | Is 1 | Is 2 | Is 3 | TOTAL |
|---|---|---|---|---|
| Predict 1 | 80 | 59 | 1 | 140 |
| Predict 2 | 227 | 1557 | 542 | 2326 |
| Predict 3 | 3 | 246 | 585 | 834 |
| TOTAL | 310 | 1862 | 1128 | 3300 |

**Table 16b**. Performance measures for ordinal regression model using logistic regression classifier, including geo_level_2, height/floor

| Performance | Value |
|---|---|
| Avg Accuracy | 0.78 |
| F1 Score | 0.67 |
| F1 Score (test data) | Not submitted |

Surprisingly, ordinal regression produces worse results when the geo_level_2 variable is included than without it.

### Model 9 – Convert numeric to categorical

I spent a lot of effort adjusting and normalizing my numeric variables. They were mostly integer values with a small range and did not appear to be correlated with damage_grade. Could the model be improved by treating them as categorical? Let’s find out.

First, I perform a Chi-square test to confirm that all of the variables are significant. Then I run the model after converting the values from numeric to string and the variable types from numeric to categorical.

**Table 17**. Chi-square test results for numeric variables against damage_grade

| Variable name | Chi-square | Deg. of freedom | P value |
|---|---|---|---|
| count_floor_pre_eq | 495 | 14 | < 2.2e-16* |
| height | 367 | 37 | < 2.2e-16* |
| age | 690 | 60 | < 2.2e-16* |
| area | 738 | 314 | < 2.2e-16* |
| count_families | 76 | 14 | 1.3e-10* |
| count_superstructure | 104 | 14 | 7.1e-16* |
| count_secondary_use | 79 | 4 | 3.6e-16* |

*One or more enums have sample sizes too small to use Chi-square approximation

[ ] P value greater than 0.05 significance level
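For reference, the Chi-square statistic behind a table like this compares the observed count in each cell of a contingency table with the count expected if the variable and damage_grade were independent. A minimal sketch using made-up counts (not the earthquake data) and a hypothetical function name:

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = two levels of a variable, columns = damage_grade 1, 2, 3
counts = [
    [30, 50, 20],
    [10, 50, 40],
]
stat, dof = chi_square(counts)
print(round(stat, 2), dof)  # 16.67 2
```

The approximation becomes unreliable when expected counts are small, which is what the asterisk in Table 17 flags.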

**Table 18a**. Truth table for ordinal regression model using logistic regression classifier, including geo_level_2, height/floor, and converting numeric to categorical

| Truth table | Is 1 | Is 2 | Is 3 | TOTAL |
|---|---|---|---|---|
| Predict 1 | 83 | 62 | 3 | 148 |
| Predict 2 | 224 | 1544 | 540 | 2308 |
| Predict 3 | 3 | 256 | 585 | 844 |
| TOTAL | 310 | 1862 | 1128 | 3300 |

**Table 18b**. Performance measures for ordinal regression model using logistic regression classifier, including geo_level_2, height/floor, and converting numeric to categorical

| Performance | Value |
|---|---|
| Avg Accuracy | 0.78 |
| F1 Score | 0.67 |
| F1 Score (test data) | Not submitted |

Changing the integer variables to categorical has almost no impact on the F1 score.

### Conclusion

Table 19 below summarizes all nine models I built. Seven of them achieved an F1 score of 0.60 or higher on the training data, which would probably have been sufficient to pass the class. Two of them had an F1 score of 0.70, which would earn a grade of 95 out of 100.

I was unable to run most of these models on the test dataset and submit the results to the data science capstone website. Thus, I do not know what my leaderboard F1 score would be. It is possible that I overfit my model to the training data and my leaderboard F1 score might be lower.

Finding the best combination of variables, models, and model hyperparameters is difficult to do manually. It took me several hours to build the nine models described in this blog post. Machine learning automation tools exist, but they are not yet robust and are not built into platforms like MAML Studio. (Many thanks again to Robert Ritz, who pointed me to TPOT, a Python-based tool for auto ML.)

**Table 19.** Summary of models. Green indicates differences from base case, model 2

| Model | Variables | Algorithm | Training data | F1 score (test data) |
|---|---|---|---|---|
| 1 | None | Naïve guess = 2 | None | 0.56 |
| 3 | 27 from Table 5 | Ordinal regression with decision forest | 0.67 split | 0.64 |
| 4 | 27 from Table 5 | Ordinal regression with SVM | 0.67 split | 0.57 (0.5644) |
| 2 | 27 from Table 5 | Ordinal regression with logistic regression | 0.67 split | 0.68 (0.5687) |
| 5 | 27 from Table 5 | One-vs-all multiclass with logistic regression, hyperparameter tuning | 10-fold partition | 0.64 |
| 6 | 27 from Table 5, geo_level_2 | One-vs-all multiclass with logistic regression, hyperparameter tuning | 10-fold partition | 0.70 |
| 7 | 27 from Table 5, geo_level_2, height/floor | One-vs-all multiclass with logistic regression, hyperparameter tuning | 10-fold partition | 0.70 |
| 8 | 27 from Table 5, geo_level_2, height/floor | Ordinal regression with logistic regression | 0.67 split | 0.67 |
| 9 | 27 from Table 5, convert numeric to categorical, geo_level_2, height/floor | Ordinal regression with logistic regression | 0.67 split | 0.67 |