Continuing from the Supervised Classification post, we ran a grid search on the email classification algorithm, varying gamma and C. The results are below. Each column represents a gamma value and each row represents a C value. Each cell is the error rate, in percent, for that combination of gamma and C.
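The gamma and C values in the table are all powers of two, so the grid can be written down compactly. The sketch below is only an illustration: the exponents are inferred from the table values, and the variable names are mine, not taken from the original experiment.

```python
# Exponents inferred from the table values (an assumption, not stated in the post).
gamma_exponents = [-10, -5, -2, 0, 2]
c_exponents = [-5, -2, 0, 2, 5, 10]

# Powers of two reproduce the exact values used as column and row labels.
gammas = [2.0 ** e for e in gamma_exponents]
cs = [2.0 ** e for e in c_exponents]

# Every (C, gamma) pair the grid search has to evaluate.
grid = [(c, g) for c in cs for g in gammas]

print("gamma values:", gammas)
print("C values:", cs)
print("combinations:", len(grid))
```

Spacing the values by powers of two covers several orders of magnitude with only a handful of training runs.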

The eye-opening observation from this experiment is how much more gamma influences the error rate than C does. Any value of C less than 1 performs badly, probably due to underfitting, since a small C regularizes the model heavily. Once C >= 1, though, the performance of the classification algorithm barely changes. By contrast, changing gamma has a much larger effect on the error rate of the classification.
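To put a number on that observation, here is a small sketch that measures how far the error rate moves along each axis once C >= 1. The error values are copied from the results table; the `spread` helper and the variable names are just for illustration.

```python
# Error rates (%) from the results table: one row per C, one column per gamma.
gammas = [0.0009765625, 0.03125, 0.25, 1, 4]
cs = [0.03125, 0.25, 1, 4, 32, 1024]
errors = [
    [18.15, 9.75, 44.15, 46.2, 46.4],
    [7.25, 5.4, 18.05, 14.2, 38.15],
    [5.65, 4.15, 11, 14.85, 17],
    [4.6, 3.75, 10.95, 14.55, 17],
    [3.95, 3.95, 10.95, 14.55, 17],
    [3.9, 4.15, 10.95, 14.55, 17],
]

def spread(values):
    """Difference between the worst and best error rate in a sequence."""
    return round(max(values) - min(values), 2)

# Keep only the rows where C >= 1.
stable_rows = [row for c, row in zip(cs, errors) if c >= 1]

# Spread across gamma for each fixed C (walk along a row).
gamma_spread = [spread(row) for row in stable_rows]

# Spread across C for each fixed gamma (walk down a column).
c_spread = [spread(col) for col in zip(*stable_rows)]

print("spread across gamma, per C:", gamma_spread)
print("spread across C, per gamma:", c_spread)
```

For C >= 1, changing C moves the error rate by at most 1.75 percentage points, while changing gamma moves it by roughly 13 points in every row.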

Test data:

I took the parameters with the lowest error rate (3.75%), which are gamma = 0.03125 and C = 4, and used them to evaluate the algorithm on the test data. The resulting error rate is 4.6%.
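Picking the best cell can also be done programmatically rather than by eye. This is a minimal sketch over the error rates from the results table, with variable names chosen for illustration; it is not the code used in the experiment.

```python
# Error rates (%) from the results table: one row per C, one column per gamma.
gammas = [0.0009765625, 0.03125, 0.25, 1, 4]
cs = [0.03125, 0.25, 1, 4, 32, 1024]
errors = [
    [18.15, 9.75, 44.15, 46.2, 46.4],
    [7.25, 5.4, 18.05, 14.2, 38.15],
    [5.65, 4.15, 11, 14.85, 17],
    [4.6, 3.75, 10.95, 14.55, 17],
    [3.95, 3.95, 10.95, 14.55, 17],
    [3.9, 4.15, 10.95, 14.55, 17],
]

# Flatten the grid into (error, C, gamma) tuples and take the smallest error.
best_error, best_c, best_gamma = min(
    (err, c, g)
    for c, row in zip(cs, errors)
    for g, err in zip(gammas, row)
)

print(f"lowest error: {best_error}% at C={best_c}, gamma={best_gamma}")
```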

| C (rows) / Gamma (columns) | 0.0009765625 | 0.03125 | 0.25 | 1 | 4 |
|---|---|---|---|---|---|
| 0.03125 | 18.15 | 9.75 | 44.15 | 46.2 | 46.4 |
| 0.25 | 7.25 | 5.4 | 18.05 | 14.2 | 38.15 |
| 1 | 5.65 | 4.15 | 11 | 14.85 | 17 |
| 4 | 4.6 | 3.75 | 10.95 | 14.55 | 17 |
| 32 | 3.95 | 3.95 | 10.95 | 14.55 | 17 |
| 1024 | 3.9 | 4.15 | 10.95 | 14.55 | 17 |