Continuing from the Supervised Classification post, we have been running a grid search on the email classification algorithm, varying Gamma and C. The results are below. Each column represents a Gamma value and each row represents a C value. Each cell holds the error rate, in percent, for that combination of Gamma and C.
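For readers who want to reproduce the sweep, here is a minimal sketch of the grid search loop. The grid values are the ones from the results table (they are all powers of two). The `grid_search` helper and the `toy_error` function are hypothetical names introduced for illustration: `toy_error` is a made-up stand-in, not the real classifier, and its minimum is placed at a known grid point on purpose so the loop can be checked. In practice you would pass a function that trains the classifier with the given Gamma and C and returns its validation error.

```python
import math
from itertools import product

# Grid values as they appear in the results table (all powers of two).
GAMMAS = [2.0 ** e for e in (-10, -5, -2, 0, 2)]
CS = [2.0 ** e for e in (-5, -2, 0, 2, 5, 10)]


def grid_search(error_fn, gammas=GAMMAS, cs=CS):
    """Evaluate error_fn(gamma, C) on every grid point.

    Returns (results, best): results maps (gamma, C) to the error,
    and best is the (gamma, C) pair with the lowest error.
    """
    results = {(g, c): error_fn(g, c) for g, c in product(gammas, cs)}
    best = min(results, key=results.get)
    return results, best


def toy_error(gamma, c):
    # ASSUMPTION: placeholder only, NOT the email classifier.
    # It is smallest at gamma = 2**-5 and C = 2**2 by construction,
    # purely so the search loop has something to find.
    return abs(math.log2(gamma) + 5) + abs(math.log2(c) - 2)


results, best = grid_search(toy_error)
print(len(results), "grid points evaluated; best (gamma, C):", best)
```

Swapping `toy_error` for a real train-and-validate function is the only change needed; the loop itself does not care how the error is computed.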
The eye-opening observation from this experiment is how much more Gamma influences the error rate than C does. Values of C below 1 perform poorly, probably because such a small C regularizes too strongly and the model underfits, but the performance of the classifier barely changes once C >= 1. Gamma, in contrast, has a large and consistent effect on the classification error: at every C >= 1, the error climbs from roughly 4% at the best Gamma to 17% at Gamma = 4.
Test data:
I took the parameters with the lowest error rate in the grid (3.75%), which are Gamma = 0.03125 and C = 4, and used them to evaluate the algorithm on the test data. The resulting error rate is 4.6%.
| C (rows) / Gamma (columns) | 0.0009765625 | 0.03125 | 0.25 | 1 | 4 |
|---|---|---|---|---|---|
| 0.03125 | 18.15 | 9.75 | 44.15 | 46.2 | 46.4 |
| 0.25 | 7.25 | 5.4 | 18.05 | 14.2 | 38.15 |
| 1 | 5.65 | 4.15 | 11 | 14.85 | 17 |
| 4 | 4.6 | 3.75 | 10.95 | 14.55 | 17 |
| 32 | 3.95 | 3.95 | 10.95 | 14.55 | 17 |
| 1024 | 3.9 | 4.15 | 10.95 | 14.55 | 17 |
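The observations about the table can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the table above, and the variable names (`ERRORS`, `spread_over_c`, `spread_over_gamma`) are my own. It finds the best cell and then, for the rows with C >= 1, compares how far the error moves when only C changes against how far it moves when only Gamma changes.

```python
# Error rates (%) copied from the table: one row per C, one column per Gamma.
GAMMAS = [0.0009765625, 0.03125, 0.25, 1, 4]
ERRORS = {
    0.03125: [18.15, 9.75, 44.15, 46.2, 46.4],
    0.25: [7.25, 5.4, 18.05, 14.2, 38.15],
    1: [5.65, 4.15, 11, 14.85, 17],
    4: [4.6, 3.75, 10.95, 14.55, 17],
    32: [3.95, 3.95, 10.95, 14.55, 17],
    1024: [3.9, 4.15, 10.95, 14.55, 17],
}

# Lowest error in the whole grid, with the C and Gamma that produced it.
best_error, best_c, best_gamma = min(
    (err, c, g) for c, row in ERRORS.items() for g, err in zip(GAMMAS, row)
)

# Restrict to C >= 1, where the post says C stops mattering much.
stable = {c: row for c, row in ERRORS.items() if c >= 1}

# For each Gamma column: how much the error moves as C varies.
spread_over_c = [max(col) - min(col) for col in zip(*stable.values())]
# For each C row: how much the error moves as Gamma varies.
spread_over_gamma = [max(row) - min(row) for row in stable.values()]

print("best:", best_error, "at C =", best_c, "and Gamma =", best_gamma)
print("largest change from varying C:", round(max(spread_over_c), 2))
print("smallest change from varying Gamma:", round(min(spread_over_gamma), 2))
```

For C >= 1, changing C moves the error by at most 1.75 percentage points at any fixed Gamma, while changing Gamma moves it by at least 12.85 points at any fixed C, which is the asymmetry described earlier.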