python

Grid Search for Email Classification

Continuing from the Supervised Classification post, we have been running a grid search on the email classification algorithm, varying Gamma and C. The results are below. Each column represents a Gamma value and each row represents a C value. The values in the grid cells are the error rate percentages for the given value of … Continue reading Grid Search for Email Classification
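As a rough illustration of the layout described in this excerpt (rows for C, columns for Gamma, each cell an error rate), here is a minimal sketch of a grid search loop. The `error_rate` callable is an assumption: in the real experiment it would train and evaluate the classifier for one (C, Gamma) pair. The `toy_error_rate` function and the grid values below are stand-ins invented for the example, not results from the post.

```python
from itertools import product


def grid_search(error_rate, c_values, gamma_values):
    """Evaluate error_rate(c, gamma) for every pair and return (table, best_pair)."""
    table = {}
    for c, gamma in product(c_values, gamma_values):
        table[(c, gamma)] = error_rate(c, gamma)
    # The best pair is the one with the lowest error rate.
    best = min(table, key=table.get)
    return table, best


def format_table(table, c_values, gamma_values):
    """Render the grid as text: one row per C value, one column per Gamma value."""
    header = "C \\ Gamma".ljust(10) + "".join(f"{g:>10}" for g in gamma_values)
    rows = [header]
    for c in c_values:
        cells = "".join(f"{table[(c, g)]:>10.2f}" for g in gamma_values)
        rows.append(f"{c:<10}" + cells)
    return "\n".join(rows)


def toy_error_rate(c, gamma):
    """Hypothetical stand-in for training and scoring a classifier."""
    return (c - 1) ** 2 + (gamma - 0.1) ** 2


if __name__ == "__main__":
    c_values = [0.1, 1, 10]          # illustrative values only
    gamma_values = [0.01, 0.1, 1]    # illustrative values only
    table, best = grid_search(toy_error_rate, c_values, gamma_values)
    print(format_table(table, c_values, gamma_values))
    print("Lowest error at (C, Gamma) =", best)
```

Passing the evaluation step in as a callable keeps the loop independent of whichever classifier and dataset are actually used.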

Coreference Resolution Tools : A First Look

Coreference occurs when two or more noun phrases refer to the same entity. It is an integral part of natural languages, used to avoid repetition, show possession or relation, etc. E.g.: Harry wouldn’t bother to read “Hogwarts: A History” as long as Hermione is around. He knows she knows the book by heart. The different types of coreference … Continue reading Coreference Resolution Tools : A First Look

Chardet: Detecting Unknown String Encodings

Have you ever worked with data extracted from a random source, like an unknown website? This can become a nightmare for developers, because the encoding is often not declared and cannot be determined with certainty. Further text processing without the correct encoding is error-prone. Let's see how to handle the different situations where encoding is known … Continue reading Chardet: Detecting Unknown String Encodings
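To make the two situations concrete, here is a minimal sketch of a decoding helper: when the encoding is known it is used directly, and otherwise the bytes are handed to `chardet.detect`, which returns a dictionary with `encoding` and `confidence` keys. The helper name `decode_bytes`, the `detect` parameter, and the UTF-8 fallback are assumptions made for this example, not something taken from the post.

```python
def decode_bytes(raw, encoding=None, detect=None):
    """Decode raw bytes, guessing the encoding only when it is not known.

    detect is an optional callable with the same shape as chardet.detect;
    it exists so the guessing step can be swapped out (an assumption of
    this sketch).
    """
    if encoding is None:
        if detect is None:
            import chardet  # third-party package: pip install chardet
            detect = chardet.detect
        guess = detect(raw)  # e.g. {"encoding": ..., "confidence": ...}
        # The detector can report None when it has no guess; falling back
        # to UTF-8 here is an assumption, not a rule.
        encoding = guess.get("encoding") or "utf-8"
    # errors="replace" avoids an exception if the guess turns out wrong.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    data = "Crème brûlée".encode("latin-1")
    print(decode_bytes(data, encoding="latin-1"))  # encoding known
    print(decode_bytes(data))                      # encoding guessed by chardet
```

The guess is a statistical estimate, so checking the reported `confidence` before trusting the result is a sensible extra step for short or ambiguous inputs.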