Practising data scientist. Experimenting in web mining is my passion. Currently dabbling in finance and number crunching. Rediscovering Statistics.
Thursday, May 21, 2015 » python, women in tech
The post was initially conceived as a response to a discussion on women’s participation initiated by Shreyas in the ChennaiPy group, inspired by Jacob Kaplan-Moss’s keynote at PyCon 2015. But it grew into a huge report, with a lot of my personal stories woven in. It was too much for a mailing list, so I wrote a shorter version (only comparatively ;) ) as a response. I decided to give this rambling a new life as a post on my blog.
Tuesday, January 13, 2015 » julia, machine learning, python
In recent years, nginx has been fast overtaking Apache as the default web server due to its smaller footprint and easier configuration.
Friday, August 01, 2014 » Elasticsearch, Nginx, Python, Tutorial
Elasticsearch does not provide security features by default; they are left entirely to the developer. This guide will help you set up a secure single-node Elasticsearch server. It is based on days of searching the internet and poring over the available alternatives for implementing a secure server seamlessly. As one of the first options, I tried the elasticsearch-jetty plugin but ran into issues like “org.elasticsearch.ElasticsearchIllegalStateException: Can’t create an index”. Further searching did not turn up any solutions. The next best solution seemed to be an nginx reverse proxy. While this works well for a single server, it can become complicated for multi-node clusters. But my use case warranted only a single node. So I decided to put Elasticsearch behind an nginx reverse proxy and provide SSL and password-based HTTP authentication to secure the server.
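To give a feel for what a client looks like once the proxy is in place, here is a minimal sketch using only the Python standard library. It only builds a request with an HTTP Basic Authorization header and does not send it. The host name, user name and password are hypothetical placeholders, and the sketch assumes the proxy forwards paths unchanged to Elasticsearch.

```python
import base64
import urllib.request


def build_es_request(base_url, path, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    # Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"{base_url.rstrip('/')}/{path.lstrip('/')}")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_es_request("https://es.example.com", "/_cluster/health", "esuser", "secret")

# To actually send it through the proxy:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Since Basic auth sends the credentials merely base64-encoded, it only makes sense over the SSL connection that the proxy provides.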
Wednesday, December 18, 2013 » Julia, Machine Learning, Python
Nothing helps like practice. I’m fiddling around with machine learning and data visualization libraries to become proficient in them. So I wanted to try my hand at coaxing information from data. That’s when I came across the CO2 emission index of countries on Quandl, one of the excellent sources of open data. So I wondered how it would be to take other attributes of the countries and find a relation between those attributes and the CO2 emission index.
This is a series of mini posts about using Julia from a Python programmer’s perspective. There have already been quite a few opinion pieces on Julia. So let us get to the code directly :)
Thursday, October 24, 2013 » Message Brokers, Python, Python External Library
I have been looking into job queues for one of my personal projects. The excellent post by Muriel Salvan, “A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo”, gives a good comparison of popular message brokers. The consensus is on RabbitMQ, which is well established, but one upcoming option not covered is Redis. With its recent support for PubSub, it is shaping up as a strong contender.
The open source world often provides us with treasure troves of golden libraries, though you might have to spend some time mining for the nuggets. WordNet is such a nugget, if there ever was one. So many words and their senses are catalogued, often along with their inherent structure. A definition for each word sense and plenty of examples are also provided. What is missing is a well-rounded algorithm for comparing the similarity of words.
As continued from the Supervised Classification post, we have been running a grid search on the email classification algorithm, varying Gamma and C. The results are as below. Each column represents a Gamma value and each row represents a C value. Each cell in the grid is the error rate percentage for the corresponding values of Gamma and C.
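For readers new to grid search, the layout described above (rows for C, columns for Gamma, error rates in the cells) can be sketched in a few lines of plain Python. This is only an illustration: `toy_error_rate` is a made-up placeholder standing in for training and evaluating the real classifier, and the C and Gamma values are arbitrary, not the ones used in the post.

```python
import math


def grid_search(c_values, gamma_values, error_rate):
    """Evaluate error_rate(C, gamma) for every pair; rows are C, columns are gamma."""
    return [[error_rate(c, g) for g in gamma_values] for c in c_values]


def best_cell(grid, c_values, gamma_values):
    """Return (error, C, gamma) for the lowest error rate in the grid."""
    return min(
        (grid[i][j], c, g)
        for i, c in enumerate(c_values)
        for j, g in enumerate(gamma_values)
    )


def toy_error_rate(c, gamma):
    # Placeholder, NOT a real classifier: a made-up surface that is
    # lowest at C=10, gamma=0.01. Replace with train-and-evaluate code.
    return abs(math.log10(c) - 1) + abs(math.log10(gamma) + 2)


# Arbitrary illustrative values on a log scale.
c_values = [1, 10, 100]
gamma_values = [0.001, 0.01, 0.1]

grid = grid_search(c_values, gamma_values, toy_error_rate)
best_error, best_c, best_gamma = best_cell(grid, c_values, gamma_values)
```

Searching on a log scale is a common choice for C and Gamma, since useful values tend to span several orders of magnitude.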
Sunday, January 29, 2012
Those who know me well will know that I took a break from my job for my pregnancy and am now enjoying my son’s company. But my hands itched to code and to create. I have been fleshing out my pet project and experimenting with crawling and natural language processing. The project is highly ambitious and may take several man-years to materialize. Some of the side projects, like decruft, were good enough to give back to the community. Yet most of them are still works in progress, and the more I do, the more I learn.
Sunday, January 29, 2012
Background: Learning to Data Crunch. The first challenge I set for myself was email classification. While spam classification might seem like beating a dead horse, some of my reasons for choosing it are as below.