Digital Solutions & Online Creative 

Alex runs a small digital creative business from an office in London. It's called Outside In Media.

About

Soyrex is a web development and design resource intended as a place for me to share tips and tricks relating to html, css, web design, web development and other internet and web topics. If you like what you read, leave a comment, or send an email. Also, check out my portfolio.

    Thursday, 28 Aug 2008

    Django, Nginx + Memcached 

    OK, so Django has a pretty good caching engine built right into the framework, right? So why would I be trying to implement my own caching solution for my Django project? I'm a masochist, that's why.

    My sites are hosted on a VPS (with a very sparse memory allocation), so I'm looking to minimise memory usage wherever possible. I also had the joy of watching my Django FCGIs for http://the-hive-mind.com get completely annihilated by metafilter.org a few weeks ago - I think it stood up to the first 200-300 requests.. then completely died.

    Anyway, I have just launched a website for a friend in Nepal, who runs a trekking business (he comes highly recommended if you ever want to go hiking in that region). Basically, I decided to load test the Django instances on the site, to see how much traffic they could handle... I downloaded and installed siege.. which (as the name suggests) lays siege to your web server.

    My out-of-the-box Django FCGI, called by an Nginx instance, was only capable of reliably handling about 10 concurrent connections before it started dropping requests.. not really acceptable (caching, at this point, was completely turned off).

    So I set about sorting out caching. I read the Django documentation.. and quickly decided that the built-in caching wasn't quite to my liking. I already knew a little about memcached and wanted to use it to cache my generated responses, so the fact that Django supported it was nice. However, the idea of using the Django cache middleware doesn't really cut it. Nginx supports memcached.. so why would I want to fire the request off to my (inherently bulky and inefficient) Python FCGI instance, just to use Python's undoubtedly slower memcached library to return the cached content? I wouldn't.

    The solution I've come up with is somewhat simplistic, however it DOES solve my immediate problem... and it's done wonders for the server's load capacity.

    The solution : Django creates the cache object, but Nginx retrieves it.

    The Django caching middleware is halfway to the perfect model: it correctly creates cached objects, but it builds its keys from a strange combination of dot-separated Python words, the URI and an MD5 hexdigest. That all seems like a little much to expect Nginx to replicate (remembering that my goal here is to avoid hitting the FCGI at all for cached content).

    So after some digging, I decided it would be fun to write a simple MiddleWare for Django that would let me cache my responses under nice, Nginx-friendly keys, so I could then implement my cache override directly in my Nginx config.

    Step 1: Creating the Cache Objects: MiddleWare

    I'd never written Django Middleware before, but it turned out to be surprisingly simple. A MiddleWare is just a Python object that implements any of a series of methods. My NginxMiddleWare looks like this:

    import re
    
    from django.conf import settings
    from django.core.cache import cache
    
    class NginxMemCacheMiddleWare:
        def process_response(self, request, response):
            theUrl = request.get_full_path()
    
            # only cache successful responses to GET requests, so that
            # error pages and POST results never end up in memcached
            cacheIt = request.method == 'GET' and response.status_code == 200
    
            # loop over our CACHE_IGNORE_REGEXPS and skip
            # any urls that match.
            for exp in settings.CACHE_IGNORE_REGEXPS:
                if re.match(exp, theUrl):
                    cacheIt = False
    
            if cacheIt:
                # this key has to match $memcached_key in the Nginx config
                key = '%s-%s' % (settings.CACHE_KEY_PREFIX, theUrl)
                cache.set(key, response.content)
    
            return response
    

    We also need to install our new MiddleWare into the site. I saved the above class definition into a file called NginxMiddleWare.py and installed it into my site-packages, as I intend to implement this caching scheme on my other Django sites (including this blog). So in my settings.py I add 'NginxMiddleWare.NginxMemCacheMiddleWare' to MIDDLEWARE_CLASSES.

    Also in the Django project's settings file we add the following:

    CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
    CACHE_KEY_PREFIX = '/your-site-name'
    CACHE_IGNORE_REGEXPS = (
        r'/admin.*',
    )
    

    Firstly we are telling Django to use memcached (I'm assuming you already have memcached set up).

    I have defined two new settings variables, CACHE_KEY_PREFIX and CACHE_IGNORE_REGEXPS. These allow me to control the caching. CACHE_KEY_PREFIX allows me to store multiple sites in the same memcached by giving each site's keys a unique string prefix. And CACHE_IGNORE_REGEXPS allows me to define a set of regular expressions for URLs that I DO NOT want to cache - like the admin site.
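
To make the two settings a bit more concrete, here is a small standalone sketch of how they combine. The helper names `build_cache_key` and `is_ignored` are made up for illustration (they are not part of Django or of the middleware above), and the values simply mirror the example settings. One thing it shows is that `re.match` only anchors at the start of the string, so `r'/admin.*'` skips URLs that begin with `/admin` but not URLs that merely contain it.

```python
import re

# Values mirroring the example settings shown above.
CACHE_KEY_PREFIX = '/your-site-name'
CACHE_IGNORE_REGEXPS = (
    r'/admin.*',
)


def build_cache_key(url, prefix=CACHE_KEY_PREFIX):
    """Hypothetical helper: join the site prefix and the URL with a dash."""
    return '%s-%s' % (prefix, url)


def is_ignored(url, patterns=CACHE_IGNORE_REGEXPS):
    """Hypothetical helper: True if any ignore pattern matches the URL.

    re.match anchors at the start of the string only, so r'/admin.*'
    matches '/admin/...' but not a URL that contains '/admin' later on.
    """
    return any(re.match(pattern, url) for pattern in patterns)


if __name__ == '__main__':
    # Same URL, two sites sharing one memcached: the keys stay distinct.
    print(build_cache_key('/treks/'))
    print(build_cache_key('/treks/', '/other-site'))

    # The admin is skipped, a page that only mentions "admin" is not.
    print(is_ignored('/admin/auth/user/'))
    print(is_ignored('/treks/admin-picks/'))
```

Note that the pattern as written would also skip any other URL starting with `/admin`, such as `/administrators/`; tighten it to `r'/admin/'` if that matters for your site.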

    Step 2: Nginx Configuration

    The configuration of Nginx was a bit fiddly. I really wanted my Django project to continue to run in exactly the same way as it did previously - i.e. no silly fake URL prefixes or other cruft to confuse my urls.py..

    So I needed to get Nginx to firstly serve my Django pages off a fake, internal server, like so:

    server {
            listen 9004;
            location / {
    
                    # host and port to fastcgi server
                    # @fastcgi_pass unix:/var/www/trekkingnepaltours.com/django.sock;
                    fastcgi_pass 127.0.0.1:8004;
                    fastcgi_param PATH_INFO $fastcgi_script_name;
                    fastcgi_param REQUEST_METHOD $request_method;
                    fastcgi_param QUERY_STRING $query_string;
                    fastcgi_param CONTENT_TYPE $content_type;
                    fastcgi_param CONTENT_LENGTH $content_length;
                    fastcgi_pass_header Authorization;
                    fastcgi_intercept_errors off;
                    include /etc/nginx/fastcgi_params;
            }
    
    }
    

    Pick a high port, then configure your FCGI however you like it to be configured; above is what I use, but I'm assuming that if you've read this far you know enough to configure your FCGI under Nginx.

    Basically this creates another server that we can talk to, allowing us to use Nginx as a proxy to serve our Django pages. The logic is something like:

    check if url is cached
    if url IS cached then
        return the cached response
    else
        proxy the connection to our django server
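
The same flow can be sketched in plain Python to make the moving parts explicit. This is only an illustration of the logic, not how Nginx implements it: `serve` is a made-up function, a dict stands in for memcached, and `render_with_django` stands in for the FCGI backend.

```python
def serve(url, cache, render_with_django, prefix='/your-site-name'):
    """Return (body, source) for a GET request, mimicking the flow above.

    `cache` is a plain dict standing in for memcached, and
    `render_with_django` is any callable standing in for the FCGI backend.
    Both are placeholders for illustration.
    """
    key = '%s-%s' % (prefix, url)
    body = cache.get(key)
    if body is not None:
        # Cache hit: the backend is never touched.
        return body, 'cache'

    # Cache miss: let the backend render the page, then store the result.
    # In the real setup the Django middleware does the storing.
    body = render_with_django(url)
    cache[key] = body
    return body, 'django'


if __name__ == '__main__':
    calls = []

    def fake_django(url):
        calls.append(url)
        return '<html>%s</html>' % url

    fake_memcached = {}
    print(serve('/treks/', fake_memcached, fake_django))  # rendered by the backend
    print(serve('/treks/', fake_memcached, fake_django))  # served from the cache
    print(len(calls))  # the backend was only called once
```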
    

    So the guts of the logic for the Nginx config are as follows:

       location / {
                if ($request_method = POST) {
                        proxy_pass http://localhost:9004;
                        break;
                }
                default_type  "text/html; charset=utf-8";
                set $memcached_key "/your-site-name-$uri";
                memcached_pass localhost:11211;
                error_page 404 502 = /django;
        }
    
        location = /django  {
                proxy_pass http://localhost:9004;
                break;
        }
    

    Before this definition, I have a whole bunch of locations set up to handle my static content, so those requests never reach this stage. The / location first checks whether it's a POST request; if so, it proxies the request off to Django directly, since we never cache POSTs. If we get past that point, we set the default type of our response to HTML with a UTF-8 charset. Then we set $memcached_key to the prefix we used in our Django settings.py, plus a dash, plus the $uri from Nginx. Next we pass off to the local memcached to check for the cached object: if it exists, memcached returns it; otherwise we get an error. Errors are handled by our error_page directive, which farms them off to a virtual location called /django, which in turn sends the request to the internal Django instance.

    So, if there is no cached object, then Nginx will get Django to render the page, and the MiddleWare we defined above will save off our cached objects with the correct keys.
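
One detail worth checking when wiring the two halves together: Nginx's `$uri` does not include the query string, whereas Django's `request.get_full_path()` does. The sketch below approximates both sides in plain Python to show where the keys diverge. The function names are illustrative only, and real `$uri` is also decoded and normalised, which is not modelled here.

```python
from urllib.parse import urlsplit

PREFIX = '/your-site-name'


def django_key(request_uri):
    """Key as the middleware builds it: prefix + path + query string."""
    return '%s-%s' % (PREFIX, request_uri)


def nginx_key(request_uri):
    """Rough stand-in for "/your-site-name-$uri": the query string is dropped."""
    return '%s-%s' % (PREFIX, urlsplit(request_uri).path)


if __name__ == '__main__':
    for uri in ('/treks/', '/treks/?page=2'):
        # Prints whether both sides agree on the key for this request.
        print(uri, django_key(uri) == nginx_key(uri))
```

For a request like `/treks/?page=2`, Nginx would look up the key for `/treks/` and, if that page is cached, return it regardless of the query string. If your pages vary by query string, building the Nginx key from `$request_uri` (which keeps the arguments) may be worth testing.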

    But Wait - what if I change data in the DB?

    Of course, Django is a Content Management Framework and I make extensive use of the provided admin system.. so when I add a new trek to http://trekkingnepaltours.com I want the site to update itself. To achieve this we override the save methods on the relevant DB models and get them to clear the cache.

    I assume here that you follow the standard Django convention of defining a get_absolute_url method on your models, so we just override the save method and call Django's cache delete function on the correct cache key to remove it from memcached. Below is the save method from my Photo model:

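    def save(self, *args, **kwargs):
        # drop the cached page for the trek this photo belongs to
        theUrl = self.trek.get_absolute_url()
        key = '%s-%s' % (settings.CACHE_KEY_PREFIX, theUrl)
        cache.delete(key)
    
        # photos also show up on the home page, so drop that too
        key = '%s-/' % (settings.CACHE_KEY_PREFIX)
        cache.delete(key)
    
        # pass any arguments through to the normal model save
        super(Photo, self).save(*args, **kwargs)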
    

    As you can see, I'm removing the cache entry for the Trek that this photo is associated with (since that's where the actual page is) and also the entry for the home page, as the photos often get rendered there too. Obviously we can be as granular as we like in this save method, removing whatever we need to in order to update the site. We could probably even write some code to wipe the entire cache..
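
If several models need the same treatment, the key handling could be pulled out into a small helper. The following is just a sketch of that idea: `keys_to_invalidate`, `invalidate` and `FakeCache` are hypothetical names, and `FakeCache` is a dict-backed stand-in so the example runs without Django. In a real project you would pass in Django's `cache` object, which has the same `delete(key)` method.

```python
def keys_to_invalidate(prefix, urls, include_home=True):
    """Hypothetical helper: cache keys to drop when a model changes."""
    keys = ['%s-%s' % (prefix, url) for url in urls]
    home = '%s-/' % prefix
    if include_home and home not in keys:
        keys.append(home)
    return keys


def invalidate(cache, prefix, urls):
    """Delete each key from `cache`, which only needs a delete(key) method."""
    for key in keys_to_invalidate(prefix, urls):
        cache.delete(key)


class FakeCache(dict):
    """Dict-backed stand-in for the cache so the sketch runs without Django."""

    def delete(self, key):
        self.pop(key, None)


if __name__ == '__main__':
    cache = FakeCache({
        '/your-site-name-/': 'home',
        '/your-site-name-/treks/everest/': 'trek',
        '/your-site-name-/about/': 'about',
    })
    # Saving a photo on the Everest trek drops that page and the home page.
    invalidate(cache, '/your-site-name', ['/treks/everest/'])
    print(sorted(cache))
```

As for wiping everything, later Django releases have a `cache.clear()` call, but with several sites sharing one memcached it would flush all of them, so deleting specific keys is the more targeted option.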

    The end result

    Siege now shows the site successfully handling 2000-odd concurrent connections under constant load for 1 minute... while hardly registering any memory usage at all on my VPS - problem solved.

    Disclaimer..

    I know this is probably not the most elegant solution, however it has solved my problem for me. Having said that, if you have ANY comments about how this could be made better, please leave a comment... also if Django can already do this out of the box... someone smarter than me needs to tell me how ;)


    Reader Comments (2)

    Hi,
    Thanks for the nice Idea.
    I have a travel company / adventure travel in the Himalaya and it is usually to encounter different kinds of problems with the website and the net.
    Thanks

    Prem

    November 12, 2009 | Unregistered CommenterPrem

    A lot students transpire the responsibility to qualified writers because they lack the skill to compose a respectable paper about this post thats the cause why you need to use online plagiarism, but such customers like composer don't do that. Thanks for the text

    January 18, 2010 | Unregistered CommenterOdryBZ29
