Metonym

Monday, February 06, 2012 » Machine Learning, Python

The open source world often provides us with treasure troves of golden libraries, though you might have to spend some time mining for the nuggets. WordNet is such a nugget, if there ever was one. It catalogues a great many words and their senses, often along with their inherent structure. A definition for each word sense and plenty of examples are also provided. What is missing is a well-rounded algorithm for comparing the similarity of words.

Decruft

Monday, September 20, 2010 » Machine Learning, Python

One of the pressing problems in web data extraction is separating the meaningful content on a page's subject from cruft such as navigation links, ads, footnotes, and sidebar content promoting links to other pages within the site or on other sites.
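To make the problem concrete, here is a minimal sketch of one common heuristic, link density: the share of a block's text that sits inside anchor tags. This is an illustration only and not necessarily how Decruft works; the names `LinkDensityParser` and `link_density` are made up for this example, and it uses nothing beyond Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class LinkDensityParser(HTMLParser):
    """Counts all text characters and those that sit inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0   # > 0 while we are inside an <a> element
        self.total_chars = 0
        self.link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Ignore surrounding whitespace so layout spacing does not skew the ratio.
        n = len(data.strip())
        self.total_chars += n
        if self.anchor_depth > 0:
            self.link_chars += n


def link_density(html_fragment):
    """Return the fraction of text characters that are link text (0.0 to 1.0)."""
    parser = LinkDensityParser()
    parser.feed(html_fragment)
    parser.close()
    if parser.total_chars == 0:
        return 0.0
    return parser.link_chars / parser.total_chars


# Hypothetical fragments: a navigation bar and a piece of article text.
nav = '<div><a href="/">Home</a> <a href="/about">About</a></div>'
article = '<div>WordNet catalogues words and their senses. <a href="/more">More</a></div>'

print(link_density(nav))      # every character is link text
print(link_density(article))  # mostly plain text
```

A block made up almost entirely of link text is likely navigation or promotion, while a block with a low ratio is more likely real content. Any cutoff between the two is a tuning choice, and a practical extractor would combine this signal with others.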