Practising data scientist. Experimenting in web mining is my passion. Currently dabbling in finance and number crunching. Rediscovering Statistics.
Friday, August 01, 2014 » elasticsearch, nginx, python, tutorial
Elasticsearch by default does not support security features. They are left to the sole discretion of the developer. This guide will help you set up a secure single-node Elasticsearch server. It is based on days of searching the internet and poring over the available alternatives for implementing a secure server seamlessly. As one of the first options, I tried the elasticsearch-jetty plugin but ran into issues like “org.elasticsearch.ElasticsearchIllegalStateException: Can’t create an index”. Further searching did not turn up any solutions. The second best solution seemed to be an nginx reverse proxy. While this is a good solution for a single server, it can become complicated for multi-node clusters. But my use case warranted only a single node. So I decided to put Elasticsearch behind an nginx reverse proxy and secure the server with SSL and password-based HTTP authentication.
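From the client side, talking to such a setup means sending HTTPS requests that carry HTTP Basic credentials, which nginx checks before passing the request on to Elasticsearch. Here is a minimal sketch of that, not the configuration from the guide itself. The host name `search.example.com`, the user `alice`, the password and the helper name `build_search_request` are all made up for illustration.

```python
import base64
import urllib.request


def build_search_request(base_url, username, password, path="/_search"):
    """Build (but do not send) a request carrying HTTP Basic credentials.

    base_url is assumed to be the HTTPS address of the nginx proxy,
    not of Elasticsearch itself.
    """
    # HTTP Basic auth is "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url + path)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_search_request("https://search.example.com", "alice", "s3cret")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Basic credentials are only encoded, not encrypted, which is why the SSL part of the setup matters.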
Wednesday, December 18, 2013 » julia, machine learning, python
Nothing helps like practice. I’m fiddling around with machine learning and data visualization libraries to become proficient in them. So I wanted to try my hand at coaxing information from data. That’s when I came across the CO2 emission index of countries on Quandl, one of the excellent sources of open data. So I wondered what it would be like to take other attributes of the countries and look for a relation between those attributes and the CO2 emission index.
This is a series of mini posts about using Julia from a Python programmer’s perspective. There have already been quite a few opinion pieces on Julia. So let us get to code directly :)
Thursday, October 24, 2013 » Message Brokers, Python, Python External Library
I have been looking into job queues for one of my personal projects. This excellent post by Muriel Salvan, “A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo”, gives a good comparison of popular message brokers. The consensus is on RabbitMQ, which is well established, but one of the upcoming options not covered is Redis. With its recent support for PubSub, it is shaping up as a strong contender.
The open source world often provides us with treasure troves of golden libraries. You might just have to spend some time mining for the nuggets. WordNet is such a nugget, if there ever was one. So many words and their senses are catalogued, often along with their inherent structure. A definition for each word sense and plenty of examples are also provided. The only thing missing is a well-rounded algorithm for comparing the similarity of words.
As continued from the Supervised Classification Post, we have been running grid search on the email classification algorithm, varying Gamma and C. The results are below. Each column represents a Gamma value and each row represents a C value. The values in the grid cells are the error rate percentages for the given values of Gamma and C.
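The grid layout described above can be sketched in a few lines of plain Python. This is only an illustration of the bookkeeping, not the classifier from the post: `toy_error_rate` is a made-up stand-in for training and scoring a model, and the C and Gamma values are arbitrary.

```python
import math
from itertools import product


def grid_search(c_values, gamma_values, error_rate):
    """Evaluate error_rate(C, gamma) for every combination on the grid."""
    return {(c, g): error_rate(c, g) for c, g in product(c_values, gamma_values)}


def format_table(results, c_values, gamma_values):
    """Render the results with one row per C value and one column per gamma value."""
    lines = ["C \\ gamma" + "".join(f"{g:>10}" for g in gamma_values)]
    for c in c_values:
        cells = "".join(f"{results[(c, g)]:>10.2f}" for g in gamma_values)
        lines.append(f"{c:<9}" + cells)
    return "\n".join(lines)


def toy_error_rate(c, gamma):
    """Hypothetical stand-in for 'train with (C, gamma) and measure error %'."""
    return 5.0 * (abs(math.log10(c) - 1) + abs(math.log10(gamma) + 2))


c_values = [1, 10, 100]
gamma_values = [0.001, 0.01, 0.1]
results = grid_search(c_values, gamma_values, toy_error_rate)
best = min(results, key=results.get)  # (C, gamma) pair with the lowest error
print(format_table(results, c_values, gamma_values))
print("best (C, gamma):", best)
```

In a real run, `toy_error_rate` would be replaced by a function that trains the classifier with the given parameters and returns its error rate on held-out data.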
Sunday, January 29, 2012
Those who know me well would know that I took a break from my job for my pregnancy and am now enjoying my son’s company. But my hands itched to code and to create. I have been fleshing out my pet project and experimenting with crawling and natural language processing. The project is highly ambitious and may take several man-years to materialize. Some of the side projects, like decruft, were good enough to give back to the community. Yet most of them are still works in progress, and the more I do, the more I learn.
Sunday, January 29, 2012
Background: Learning to Data Crunch. The first challenge I set for myself was email classification. While spam classification might seem like beating a dead horse, some of my reasons for choosing it are as below.
Well, the title is kinda misleading, as no one has yet produced a satisfactorily working installation of iTunes on Linux, not even with Wine. I tried, and failed. To put it in context, a few days back I came across the article “Harvard Statistics 110: Introduction to Probability, on iTunes” on Hacker News. I was looking at options to learn probability and statistics properly, and this course offered the best coverage I had ever seen. I wanted to use the resource. I tried installing Wine and then iTunes on Wine. But it did not work. Most search results were dead ends. So I went back to the Hacker News post and looked at the comments section for clues. http://news.ycombinator.com/item?id=3469393
Tuesday, November 22, 2011
Today I came upon a link on Hacker News. When I clicked on the link, I was not expecting much, except that there might be something interesting, considering that it was on the Hacker News front page. What I saw blew me away. It’s the story of how Sugru was dreamt up in February 2003, and how it evolved through conceptualisation, fleshing out the product, funding, branding and what not to where it is today, serving customers on all 7 continents.