Grid Search for Email Classification

Continuing from the Supervised Classification post, we have been running a grid search over Gamma and C for the email classification algorithm. The results are below: each column corresponds to a Gamma value and each row to a C value, and each cell gives the error rate (%) for that combination of Gamma and C.

The eye-opening observation from this experiment is how much more Gamma influences the error rate than C does. Values of C below 1 generally perform worse, probably due to underfitting (a small C tolerates more training errors, i.e. regularizes more strongly), but for C >= 1 the classifier's performance barely changes. Distinct Gamma values, by contrast, have a much more pronounced effect on the classification error.
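For readers who want to reproduce the procedure, here is a minimal sketch of an exhaustive grid search in Python. It assumes a hypothetical `validation_error(C, gamma)` callable that trains the classifier (C and Gamma suggest an RBF-kernel SVM, which is an assumption here) and returns its error rate on held-out data; `toy_error` is a made-up stand-in so the sketch runs on its own. The grid values are the ones from the results table, which happen to be powers of two.

```python
import math
from itertools import product

# Grid values from the results table, written as powers of two.
GAMMAS = [2.0 ** e for e in (-10, -5, -2, 0, 2)]   # 0.0009765625 ... 4
CS = [2.0 ** e for e in (-5, -2, 0, 2, 5, 10)]     # 0.03125 ... 1024


def grid_search(validation_error, cs, gammas):
    """Exhaustive grid search over every (C, gamma) pair.

    validation_error(c, gamma) must return an error rate (lower is better).
    Returns the full error grid and the (C, gamma) pair with the lowest error.
    """
    errors = {(c, g): validation_error(c, g) for c, g in product(cs, gammas)}
    best = min(errors, key=errors.get)
    return errors, best


def toy_error(c, gamma):
    """Hypothetical stand-in for 'train the classifier, measure held-out error'."""
    return abs(math.log2(gamma) + 5) + abs(math.log2(c) - 2)


errors, best = grid_search(toy_error, CS, GAMMAS)
print(best)  # (4.0, 0.03125)
```

Searching on a log (powers-of-two) scale is a common choice because C and Gamma affect the model multiplicatively; the real cost of the search is one training run per cell, 30 here.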

Test data:

I took the parameters with the lowest error rate (3.75%), namely Gamma = 0.03125 and C = 4, and used them to evaluate the algorithm on the test data. The resulting error rate is 4.6%.

 

Error rate (%) for each combination of C (rows) and Gamma (columns):

C \ Gamma    0.0009765625    0.03125    0.25     1        4
0.03125      18.15           9.75       44.15    46.2     46.4
0.25         7.25            5.4        18.05    14.2     38.15
1            5.65            4.15       11       14.85    17
4            4.6             3.75       10.95    14.55    17
32           3.95            3.95       10.95    14.55    17
1024         3.9             4.15       10.95    14.55    17
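The table can also be checked programmatically. This sketch transcribes the error rates above into a dictionary, picks the lowest-error (C, Gamma) pair, and compares how much the error moves when only C changes (for C >= 1) with how much it moves when only Gamma changes.

```python
# Error rates (%) transcribed from the grid-search table: ERRORS[C][i] is the
# error for GAMMAS[i].
GAMMAS = [0.0009765625, 0.03125, 0.25, 1, 4]
ERRORS = {
    0.03125: [18.15, 9.75, 44.15, 46.2, 46.4],
    0.25:    [7.25, 5.4, 18.05, 14.2, 38.15],
    1:       [5.65, 4.15, 11, 14.85, 17],
    4:       [4.6, 3.75, 10.95, 14.55, 17],
    32:      [3.95, 3.95, 10.95, 14.55, 17],
    1024:    [3.9, 4.15, 10.95, 14.55, 17],
}

# Lowest error and the (C, gamma) pair that produced it.
best_err, best_c, best_gamma = min(
    (err, c, gamma)
    for c, row in ERRORS.items()
    for gamma, err in zip(GAMMAS, row)
)
print(best_c, best_gamma, best_err)  # 4 0.03125 3.75

# How much does the error move when only C changes (C >= 1)?
for j, gamma in enumerate(GAMMAS):
    col = [row[j] for c, row in ERRORS.items() if c >= 1]
    print(f"gamma={gamma}: spread over C >= 1 = {max(col) - min(col):.2f}")

# ...versus when only gamma changes, at the chosen C.
row = ERRORS[best_c]
print(f"C={best_c}: spread over gamma = {max(row) - min(row):.2f}")
```

For C >= 1 the per-Gamma spread stays under 2 percentage points, while at C = 4 the error ranges from 3.75% to 17% across Gamma, which is the pattern described above.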

Chardet: Detecting Unknown String Encodings

Have you ever worked with data extracted from an arbitrary source, such as an unknown website? This can become a nightmare for developers, because the encoding is often undeclared or wrongly declared and cannot be determined with certainty. Further text processing without the correct encoding is error prone. Let's see how to handle the different situations where the encoding is known or unknown.
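Here is a minimal sketch of the known-versus-unknown split, assuming the third-party chardet package, whose `chardet.detect()` takes bytes and returns a dict with an `encoding` guess (possibly None) and a `confidence`. The helper name `to_text` and the UTF-8 fallback are choices made for this illustration, not necessarily what the full post does.

```python
from typing import Optional

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # let the sketch run without chardet (UTF-8 fallback only)
    chardet = None


def to_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode raw bytes into a str.

    - Encoding known: decode with it and let any decoding error surface.
    - Encoding unknown: use chardet's guess if available; otherwise (or if
      the guess fails) fall back to UTF-8, replacing undecodable bytes.
    """
    if encoding is not None:
        return data.decode(encoding)

    if chardet is not None:
        guessed = chardet.detect(data).get("encoding")  # may be None
        if guessed:
            try:
                return data.decode(guessed)
            except (LookupError, UnicodeDecodeError):
                pass  # bad or unsupported guess; use the fallback below

    return data.decode("utf-8", errors="replace")


print(to_text("café".encode("latin-1"), encoding="latin-1"))  # café
print(to_text(b"plain ASCII text"))  # plain ASCII text
```

chardet's answer is a statistical guess, so for short inputs it can be wrong; when a source does declare its encoding (an HTTP header or an HTML meta tag), passing that explicitly is the safer path.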

Creating Universally Unique IDs in Python

Unique as a Snowflake

GUID is a term that was bandied about in my office to mean any unique ID we used to identify our database records. I never gave it a second thought until I heard UUIDs mentioned in the context of CouchDB as an enabler of distributed data storage. (By the way, our GUIDs were just sequential numbers generated by our database.) This piqued my interest, so I started digging up more information on UUIDs in general and Python's uuid module in particular.
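A minimal sketch of what Python's standard-library uuid module offers, in contrast with the sequential database IDs mentioned above. The choice of these three UUID versions and the `python.org` name are illustrative only.

```python
import uuid

# Random (version 4) UUID: no central counter or coordination is needed,
# so separate machines can mint IDs independently with negligible collision risk.
record_id = uuid.uuid4()

# Time- and node-based (version 1) UUID: built from a timestamp and the host's
# MAC address (or a random stand-in), so it can reveal when and where it was made.
time_based_id = uuid.uuid1()

# Name-based (version 5, SHA-1) UUID: the same namespace + name always yields
# the same UUID, which is handy for deriving stable IDs from natural keys.
name_id = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")

print(record_id)                      # random, differs on every run
print(record_id.version)              # 4
print(time_based_id.version)          # 1
print(name_id == uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))  # True
print(len(str(record_id)))            # 36 (32 hex digits + 4 hyphens)
```

The version 4 case is the one that matters for distributed storage: unlike a database sequence, generating an ID requires no round trip to a shared authority.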

How to SSH in Python Using Paramiko

If only decoding were so easy!

If you have ever agonized over connecting to and communicating with a remote machine in Python, give Paramiko a go. Paramiko is most helpful when you need to communicate and exchange data securely, execute commands on remote machines, handle connection requests from remote machines, or access SSH services such as SFTP. As described on Paramiko's homepage:

“Paramiko is a module for python 2.2 (or higher) that implements the SSH2 protocol for secure (encrypted and authenticated) connections to remote machines.”
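A minimal sketch of two of those use cases, running a remote command and fetching a file over SFTP, using Paramiko's `SSHClient`, `exec_command()` and `open_sftp()`. The function names are hypothetical, and the sketch assumes the remote host is already in your known_hosts file; with `password=None`, Paramiko tries the SSH agent and default key files instead.

```python
from typing import Optional, Tuple


def run_remote_command(host: str, username: str, command: str,
                       password: Optional[str] = None,
                       port: int = 22) -> Tuple[int, str, str]:
    """Run one command over SSH; return (exit_status, stdout_text, stderr_text)."""
    import paramiko  # third-party (pip install paramiko); imported lazily

    client = paramiko.SSHClient()
    # Trust only hosts already in the user's known_hosts file; with the default
    # RejectPolicy, connecting to an unknown host raises an SSHException.
    client.load_system_host_keys()
    try:
        client.connect(host, port=port, username=username, password=password)
        _stdin, stdout, stderr = client.exec_command(command)
        # Read the output before asking for the exit status so a large
        # output cannot fill the channel buffer and stall the command.
        out = stdout.read().decode("utf-8", errors="replace")
        err = stderr.read().decode("utf-8", errors="replace")
        status = stdout.channel.recv_exit_status()
        return status, out, err
    finally:
        client.close()


def download_file(host: str, username: str, remote_path: str, local_path: str,
                  password: Optional[str] = None, port: int = 22) -> None:
    """Copy one file from the remote machine to local_path over SFTP."""
    import paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()
    try:
        client.connect(host, port=port, username=username, password=password)
        sftp = client.open_sftp()
        try:
            sftp.get(remote_path, local_path)
        finally:
            sftp.close()
    finally:
        client.close()
```

Usage, with a hypothetical host and user: `run_remote_command("example.com", "alice", "uname -a")` returns the exit status plus the decoded stdout and stderr. Opening a fresh connection per call keeps the sketch simple; for many commands you would reuse one connected client.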
