I have been looking into job queues for one of my personal projects. This excellent post by Muriel Salvan, A quick message queue benchmark: ActiveMQ, RabbitMQ, HornetQ, QPID, Apollo, gives a good comparison of popular message brokers. The consensus is on RabbitMQ, which is well established, but one upcoming option not covered is Redis. With its recent support for Pub/Sub, it is shaping up as a strong contender.
Advantages of RabbitMQ
- Highly customizable routing
- Persistent queues
Advantages of Redis
- High speed due to its in-memory datastore
- Can double up as both a key-value datastore and a job queue
Since I’m working in Python, I decided to go with Celery. I tested both RabbitMQ and Redis by adding 100,000 messages to the queue and using a worker to process the queued messages. The test was run thrice and averaged. The Celery worker doesn’t seem to have a burst mode, i.e. the worker does not exit once all the messages in the queue have been processed. So I had to use the next best approximation: the timestamps in the log messages.
tasks.py has the task definition and the message broker to use.
```python
from celery import Celery

celery = Celery('tasks', broker='amqp://guest@localhost//')
#celery = Celery('tasks', broker='redis://localhost//')

@celery.task
def newtask(somestr, dt, value):
    pass
```
test.py does the actual adding of the tasks to the queue
```python
from tasks import newtask
from datetime import datetime
import time

dt = datetime.utcnow()
st_time = time.time()
for i in xrange(100000):
    newtask.delay('shortstring', dt, 67.8)
print time.time() - st_time
```
The celery worker retrieves the messages by running the command
```
time celery -A tasks worker --loglevel=info -f tasks.log --concurrency 1
```
`--concurrency` indicates how many simultaneous workers to run, and `-f` indicates the logfile to use. We can infer the time taken for the run from the timestamp of the log entry for the last processed message. Next we need to estimate the time the worker spends on its INFO-level logging and deduct it from the total time taken.
```python
import logging
import sys
import time

logger = logging.getLogger('MainProcess')
hdlr = logging.FileHandler('/tmp/myapp.log')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.INFO)

def main():
    for inputf in sys.argv[1:]:
        loglines = file(inputf).readlines()
        # Strip the original timestamp prefix before replaying each line
        loglines = [line.split(']', 1)[-1].strip() for line in loglines]
        st_time = time.time()
        for line in loglines:
            logger.info(line)
        print inputf, time.time() - st_time

if __name__ == "__main__":
    main()
```
Here is the tabulation of the results for each trial, each consisting of 100,000 messages. It is apparent that RabbitMQ takes about 76% of Redis' time to add a message and about 86% of its time to process one. Since the message-processing capacity is of the same order, the decision comes down to features: if you want sophisticated routing capabilities, go with RabbitMQ; if you need an in-memory key-value store, go with Redis.
| Activity (times in seconds) | Trial 1 | Trial 2 | Trial 3 | Average | Per Message |
|---|---|---|---|---|---|
| RabbitMQ - Adding Message to Queue | 56.96 | 54.18 | 57.13 | 56.09 | 0.0005609 |
| Redis - Adding Message to Queue | 68.81 | 76.52 | 76.95 | 74.09 | 0.0007409 |
| RabbitMQ - Processing Messages off the Queue | 122.406 | 132.55 | 195.885 | 150.28 | 0.0015028 |
| Redis - Processing Messages off the Queue | 157.59 | 177.774 | 186.332 | 173.9 | 0.001739 |
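The averages and per-message figures are straight arithmetic over the three trials; a quick check over the trial numbers above reproduces them (and the RabbitMQ/Redis ratios quoted earlier):

```python
trials = {
    'RabbitMQ add': [56.96, 54.18, 57.13],
    'Redis add': [68.81, 76.52, 76.95],
    'RabbitMQ process': [122.406, 132.55, 195.885],
    'Redis process': [157.59, 177.774, 186.332],
}
MESSAGES = 100000

def avg(times):
    return sum(times) / len(times)

for name, times in trials.items():
    # Average run time over three trials, then cost per message
    print('%s: average %.2f s, %.7f s per message'
          % (name, avg(times), avg(times) / MESSAGES))

# RabbitMQ's time as a fraction of Redis' time
add_ratio = avg(trials['RabbitMQ add']) / avg(trials['Redis add'])       # ~0.76
proc_ratio = avg(trials['RabbitMQ process']) / avg(trials['Redis process'])  # ~0.86
```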