Grid Search for Email Classification

Continuing from the Supervised Classification post, we have been running a grid search over Gamma and C for the email classification algorithm. The results are tabulated below: each column is a Gamma value, each row is a C value, and each cell is the error rate (%) for that combination of Gamma and C.
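Every Gamma and C value in the table is a power of two (Gamma from 2^-10 to 2^2, C from 2^-5 to 2^10). The Python sketch below builds that grid and runs a plain exhaustive search over it. The post does not show the training code, so `evaluate` is a hypothetical placeholder for "train the classifier with this Gamma and C and return its error rate in percent", and `toy_evaluate` exists only so the sketch runs on its own.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# Exponents chosen to reproduce the Gamma and C values in the table.
GAMMA_EXPONENTS = [-10, -5, -2, 0, 2]   # 0.0009765625 ... 4
C_EXPONENTS = [-5, -2, 0, 2, 5, 10]     # 0.03125 ... 1024

GAMMAS = [2.0 ** e for e in GAMMA_EXPONENTS]
CS = [2.0 ** e for e in C_EXPONENTS]


def grid_search(
    evaluate: Callable[[float, float], float],
) -> Dict[Tuple[float, float], float]:
    """Call evaluate(gamma, c) for every pair; return {(c, gamma): error %}."""
    results: Dict[Tuple[float, float], float] = {}
    for c, gamma in product(CS, GAMMAS):
        results[(c, gamma)] = evaluate(gamma, c)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for real training/validation; replace it.
    def toy_evaluate(gamma: float, c: float) -> float:
        return abs(gamma - 0.03125) * 10 + abs(c - 4) / 100

    grid = grid_search(toy_evaluate)
    best = min(grid, key=grid.get)  # (C, Gamma) with the lowest error
    print(len(grid), best)
```

Searching on a logarithmic (powers-of-two) scale covers several orders of magnitude with only a handful of runs, which is why a coarse grid like this one can still locate a good region.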

The eye-opening result of this experiment is how much more Gamma influences the error rate than C does. Values of C below 1 perform noticeably worse, probably due to underfitting (a small C means stronger regularization), but for C >= 1 the error rate barely changes with C. Different Gamma values, in contrast, shift the error rate of the classifier substantially.
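One way to see why Gamma matters so much: assuming the classifier is an SVM with an RBF (Gaussian) kernel, which the post does not state but which is the usual model behind a Gamma/C grid, Gamma controls how quickly the similarity between two emails decays with their distance in feature space, K(x, y) = exp(-gamma * ||x - y||^2). This sketch prints that similarity for two toy feature vectors (made up for illustration) at each Gamma value in the grid.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF similarity exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Toy vectors at squared distance 2; not real email features.
    x, y = [1.0, 0.0], [0.0, 1.0]
    for gamma in (0.0009765625, 0.03125, 0.25, 1.0, 4.0):
        print(f"gamma={gamma:<14} K={rbf_kernel(x, y, gamma):.4f}")
```

For these vectors the kernel value falls from almost 1 at the smallest Gamma to almost 0 at Gamma = 4. With a very large Gamma each training email is similar only to near-duplicates of itself, which tends to overfit; that is one plausible reading of why the error climbs in the larger Gamma columns.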

Test data:

I took the parameter pair with the lowest error rate in the grid (3.75%): Gamma = 0.03125 and C = 4. Evaluating the algorithm on the test data with these values gives an error rate of 4.6%.
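The post reports error rates as percentages without showing how they are computed. Assuming the usual definition (misclassified emails divided by total emails, times 100), a minimal Python sketch; the integer class labels are hypothetical.

```python
from typing import Sequence


def error_rate_percent(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one example")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return 100.0 * wrong / len(y_true)


if __name__ == "__main__":
    # Hypothetical labels for four emails; one prediction is wrong.
    print(error_rate_percent([1, 0, 1, 1], [1, 0, 0, 1]))
```

For instance, 46 misclassifications out of 1,000 test emails would give 4.6%; the post does not say how large its test set is.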

Error rate (%) for each C (rows) and Gamma (columns)

C \ Gamma    0.0009765625   0.03125   0.25    1       4
0.03125      18.15          9.75      44.15   46.2    46.4
0.25         7.25           5.4       18.05   14.2    38.15
1            5.65           4.15      11      14.85   17
4            4.6            3.75      10.95   14.55   17
32           3.95           3.95      10.95   14.55   17
1024         3.9            4.15      10.95   14.55   17
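To put numbers on the observation that Gamma matters far more than C, this sketch copies the error rates from the table into Python, finds the lowest-error pair, and measures how much the error varies along each axis. The "spread" (maximum minus minimum error) is an ad hoc measure chosen for illustration, not part of the original experiment.

```python
from typing import Dict, List, Tuple

# Error rates (%) from the table: ERRORS[C][i] is the error for GAMMAS[i].
GAMMAS: List[float] = [0.0009765625, 0.03125, 0.25, 1, 4]
ERRORS: Dict[float, List[float]] = {
    0.03125: [18.15, 9.75, 44.15, 46.2, 46.4],
    0.25: [7.25, 5.4, 18.05, 14.2, 38.15],
    1: [5.65, 4.15, 11, 14.85, 17],
    4: [4.6, 3.75, 10.95, 14.55, 17],
    32: [3.95, 3.95, 10.95, 14.55, 17],
    1024: [3.9, 4.15, 10.95, 14.55, 17],
}


def best_pair(errors: Dict[float, List[float]]) -> Tuple[float, float, float]:
    """Return (error, C, gamma) for the lowest error in the grid."""
    return min(
        (err, c, gamma)
        for c, row in errors.items()
        for gamma, err in zip(GAMMAS, row)
    )


def spread_over_gamma(errors: Dict[float, List[float]], c: float) -> float:
    """Max minus min error across all Gamma values for a fixed C."""
    row = errors[c]
    return max(row) - min(row)


def spread_over_c(errors: Dict[float, List[float]], gamma_index: int,
                  min_c: float = 1) -> float:
    """Max minus min error across C >= min_c for a fixed Gamma column."""
    col = [row[gamma_index] for c, row in errors.items() if c >= min_c]
    return max(col) - min(col)


if __name__ == "__main__":
    print("lowest error (error, C, gamma):", best_pair(ERRORS))
    for c in ERRORS:
        if c >= 1:
            print(f"C={c}: spread across Gamma = {spread_over_gamma(ERRORS, c):.2f}")
    for i, gamma in enumerate(GAMMAS):
        print(f"Gamma={gamma}: spread across C>=1 = {spread_over_c(ERRORS, i):.2f}")
```

For C >= 1, the spread across Gamma is about 13 percentage points in every row, while the spread across C >= 1 for any fixed Gamma is at most 1.75 points (at Gamma = 0.0009765625). The lowest-error pair it finds is C = 4, Gamma = 0.03125, the same pair used for the test-data evaluation.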

About Sharmila G Sivakumar

Hi, I’m a software professional exploring new ideas and experimenting with web mining. Python is my programming language of choice. I love dark chocolate, DBC (Death by Chocolate), Batman comics and the animated series, and Phantom comics. I’m attached to my Z60m laptop and the SanDisk Sansa player my hubby gifted me. :)