Grid Search for Email Classification

2012-02-05 15:13:51 » Machine Learning, Python

Continuing from the Supervised Classification post, we ran a grid search on the email classification algorithm, varying Gamma and C. The results are in the table below. Each column represents a Gamma value and each row represents a C value. Each cell holds the error rate, in percent, for that combination of Gamma and C.
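As a rough illustration of the procedure, here is a minimal grid search sketch in plain Python. The Gamma and C values are the ones from the results table (all powers of two). The `grid_search` helper and the `toy_error` function are hypothetical names introduced for this sketch. `toy_error` is an arbitrary stand-in that does not train anything; in the real experiment it would be replaced by a function that trains the classifier with the given Gamma and C and returns the validation error.

```python
import math
from itertools import product

# Grid values from the results table: powers of two for both parameters.
GAMMAS = [2**-10, 2**-5, 2**-2, 2**0, 2**2]
CS = [2**-5, 2**-2, 2**0, 2**2, 2**5, 2**10]


def grid_search(error_fn, gammas, cs):
    """Evaluate error_fn on every (gamma, C) pair.

    Returns a dict mapping (gamma, C) to its error, plus the pair
    with the lowest error.
    """
    errors = {(g, c): error_fn(g, c) for g, c in product(gammas, cs)}
    best = min(errors, key=errors.get)
    return errors, best


def toy_error(gamma, c):
    """Hypothetical stand-in for 'train and measure validation error'.

    An arbitrary bowl-shaped function, used only so the sketch runs.
    It is not the email classifier and does not reproduce its results.
    """
    return abs(math.log2(gamma) + 2) + abs(math.log2(c))


errors, best = grid_search(toy_error, GAMMAS, CS)
print(len(errors), "combinations evaluated; best (gamma, C):", best)
```

The exhaustive loop costs one training run per cell (30 here), which is why the grid is spaced in powers of two rather than linearly.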

The eye-opening observation from this experiment is how much more Gamma influences the error rate than C does. Values of C below 1 perform poorly, probably due to underfitting (a small C penalizes misclassified training examples only weakly), but for C >= 1 the performance of the classification algorithm barely changes. Gamma, by contrast, has a pronounced effect: at any fixed C >= 1, the error rate ranges from about 4% to 17% depending on Gamma.

Test data:

I took the parameters with the lowest error rate (3.75%), which are Gamma = 0.03125 and C = 4, and used them to evaluate the algorithm on the test data. The resulting error rate is 4.6%.

Error rate (%) for each C (rows) and Gamma (columns):

C \ Gamma    0.0009765625    0.03125    0.25     1        4
0.03125      18.15           9.75       44.15    46.2     46.4
0.25         7.25            5.4        18.05    14.2     38.15
1            5.65            4.15       11       14.85    17
4            4.6             3.75       10.95    14.55    17
32           3.95            3.95       10.95    14.55    17
1024         3.9             4.15       10.95    14.55    17
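The parameter selection described under "Test data" and the Gamma-versus-C comparison can both be checked directly against the table. The sketch below copies the table values into a dictionary (the variable names are my own), finds the cell with the lowest error, and then, for rows with C >= 1, compares how much the error varies across Gamma with how much it varies across C.

```python
# Error rates (%) copied from the table: one row per C, one column per Gamma.
GAMMAS = [0.0009765625, 0.03125, 0.25, 1, 4]
ERROR_BY_C = {
    0.03125: [18.15, 9.75, 44.15, 46.2, 46.4],
    0.25: [7.25, 5.4, 18.05, 14.2, 38.15],
    1: [5.65, 4.15, 11, 14.85, 17],
    4: [4.6, 3.75, 10.95, 14.55, 17],
    32: [3.95, 3.95, 10.95, 14.55, 17],
    1024: [3.9, 4.15, 10.95, 14.55, 17],
}

# Flatten into (error, gamma, C) triples and take the smallest error.
cells = [
    (err, gamma, c)
    for c, row in ERROR_BY_C.items()
    for gamma, err in zip(GAMMAS, row)
]
best_error, best_gamma, best_c = min(cells)
print(best_error, best_gamma, best_c)  # 3.75 0.03125 4

# Restrict to C >= 1 and measure the spread (max - min) along each axis.
rows = [row for c, row in ERROR_BY_C.items() if c >= 1]
spread_over_gamma = [max(row) - min(row) for row in rows]        # C held fixed
spread_over_c = [max(col) - min(col) for col in zip(*rows)]      # Gamma held fixed

print("smallest spread across Gamma:", round(min(spread_over_gamma), 2))
print("largest spread across C:", round(max(spread_over_c), 2))
```

For C >= 1, changing Gamma moves the error by roughly 13 percentage points in every row, while changing C moves it by less than 2 points in every column.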
