Django and Celery - Death to Cron

For those who have been following me, you will know that I’ve been working on the technical side of managing an EVE Online corporation named Dreddit. As part of this I was tasked (what seems a lifetime ago) with creating an access management system for the corporation’s services and tools.

EVE Online provides a basic API that lets external tools query the game for information. It is now common for corporations to use these API keys as permission codes: the webapp connects to the EVE API and determines whether the user is allowed to access the service.

The tool I created is simply called Auth, and is designed to be as automated as possible, allowing users to gain access to their accounts without admin intervention. One of the key processes in Auth is the update job, which checks all the API keys and updates the permissions as needed. In a corporation which has in the region of 1700 players and is the biggest corporation in EVE, you can imagine that this job is a problem.

In the old design it was a simple cronjob that iterated through the keys and updated them. Over time this job was taking in the region of an hour to complete, so it was split up to process batches of 50 keys at a time. Even after dividing the load, we’re still hampered by this job.

Over the last 24 hours we’ve decided to move this job to a task management system, and we’ve picked Celery for its excellent Django integration and its distributed nature. This has allowed us to do something we couldn’t have done before.

The idea is that you break down your common processes into tasks, nuggets of work that can be shipped to any machine and run in a time-insensitive manner. That wouldn’t work well for Django views and UI, but for the backend processing an app does (sending mails, calling webservices, updating permissions) it removes the need to run cronjobs or slow down the user’s response times.

So a few key features are:

Distributed

With RabbitMQ as its backend, it has allowed us to spin up specific queue workhorses for Auth. As our infrastructure is based on XenServer, this will allow us to add more workers to the processing pool without significant overhead and work: just a simple deployment of the Django app and configuration to point to our RabbitMQ and database instance.

Task Queueing

We can throw those hundreds of keys on the queue to be processed when a worker is available, saving time and load by not checking the database every five minutes for keys that require an update.

Rate Limiting

This is a key item. Recently CCP decided to start enforcing limits on the EVE API server; if your service makes too many invalid requests, you could end up with a nasty letter in your mailbox.

With Celery we can limit how often specific tasks run, throttling the authenticated requests (like reading your character list) while allowing the anonymous cached requests to be sent through more often.

No more sleep(), no more worrying that CCP will hassle me for hammering the API, bliss!

I’m sure I’ll have a lot more to write about it over time, but for the moment it’s giving us that massive productivity boost we’ve been looking for. Hopefully this will now scale to handle one of the largest alliances in the game as well.
