I was generating some reports recently that involved calling expensive object methods whose results were known not to change on subsequent calls. Instead of using local variables, I sketched out this quick memoization decorator to save method results as variables on the object (using a leading ‘_’ followed by the method name as the variable name):

def cache_method_results(fn):
    def _view(self, *args, **kwargs):
        var_name = '_{n}'.format(n=fn.__name__)

        if var_name in self.__dict__:  # Return the copy we have
            return self.__dict__[var_name]

        else:  # Run the function and save its result
            self.__dict__[var_name] = fn(self, *args, **kwargs)
            return self.__dict__[var_name]

    return _view

You might use it like this:

class Foo(object):
    @cache_method_results
    def some_expensive_operation(self):
        ...calculate something big and unchanging...
        return results


f = Foo()
print(f.some_expensive_operation())  # This first call will run the calculation
...
print(f.some_expensive_operation())  # but this one will use the cached result instead
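
One caveat: the decorator above stores a single value per method name, so a method called with different arguments gets back whatever the first call produced. Here is a minimal sketch of an argument-aware variant. The names `cache_method_results_by_args` and `Report` are made up for illustration, and it assumes every argument is hashable:

```python
import functools


def cache_method_results_by_args(fn):
    """Hypothetical variant: cache one result per distinct argument set."""
    var_name = '_{n}_cache'.format(n=fn.__name__)

    @functools.wraps(fn)  # Keep the wrapped method's name and docstring
    def _view(self, *args, **kwargs):
        # Per-instance dict of results, created on first use
        cache = self.__dict__.setdefault(var_name, {})
        # Assumes every argument is hashable
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(self, *args, **kwargs)
        return cache[key]

    return _view


class Report(object):
    def __init__(self):
        self.calls = 0

    @cache_method_results_by_args
    def total(self, multiplier):
        self.calls += 1  # Count how often the real work runs
        return 21 * multiplier


r = Report()
first = r.total(2)
second = r.total(2)  # Served from the cache
third = r.total(3)   # New argument, so the method runs again
```

For methods that take no arguments, the original one-value-per-method version is simpler and does the same job.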

It’s not rocket science, but these little tricks add to the fun of using Python.

“Fun example of programming language scope” is only “fun” for a certain type of geek, but I like programming examples that help explain how your code is interpreted, particularly if the lesson can help prevent a certain class of bug.

Now that you’re expecting a scope puzzle, what will the following JavaScript print? (Ignoring the line numbers, of course, which are here to aid in discussion.)


 1:  var foo = 1;
 2: 
 3:  function bar1() {
 4:   print("A: " + foo );
 5:  }
 6: 
 7:  function bar2() {
 8:   print("B: " + foo );
 9:   var foo = 2;
10:   print("C: " + foo );
11:  }
12: 
13:  function bar3() {
14:    print("D: " + foo );
15:    eval("var foo = 2;")
16:    print("E: " + foo );
17:  }
18:
19:  bar1();
20:  bar2();
21:  bar3();

 
 

No peeking…

 
 


Answer


A: 1
B: undefined
C: 2
D: 1
E: 2

 

“A” is easy. Since foo is a “free variable” (i.e., not defined within function bar1), the interpreter goes up the scope chain, and finds the global foo, defined on line 1.

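For bar2, “C: 2” is obvious; it’s “B: undefined” that lets you in on the magic under the hood. You sort of expect to see “B: 1” (or a compiler error). However, JavaScript interpreters scan ahead, searching for variable declarations (e.g., var statements) when parsing a function body, and treat each one as if it were declared at the top of the function (a behavior commonly called “hoisting”). Only the declaration moves; the assignment stays where it was written. The interpreter effectively sees bar2 like this: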


 function bar2() {
   var foo;
   print("B: " + foo );
   foo = 2;
   print("C: " + foo );
 }

With that definition, “B: undefined” makes perfect sense.

To short-circuit the magic, bar3 uses eval() to do its trickery. At line 14, foo still points to the global foo, much like in bar1. However, the eval statement on line 15 modifies the local scope at run time, introducing a new, local foo. By line 16, “E: 2” is using the newly introduced foo.

 

The lesson: Even though JavaScript allows you to declare variables at any point within a block, putting your var statements at the beginning of the block can help eliminate scope confusion around whether an inner- or outer-closure contains the correct value.

 
 


Bonus Question

Is JavaScript’s var a let or let*? It’s easy to find out using the following:


 var x = 2, y = 3, z = x + y;
 print(z);

Is this legal? Will it print ‘5’?


[Update: 2010/09/21: If you liked this, you'll also enjoy "JavaScript Scoping and Hoisting".]

The latest jsmacro (v0.2.3) adds support for “else” clauses to “if”, “ifdef”, and “ifndef” statements. Combine this with the command-line variable definition support and you can now do fun things like this:


//@ifdef IE6_BUILD
 ...custom IE6 code here
//@else
 ...code for other browsers here
//@end

Of course, this goes against the idea that your JavaScript would remain usable for development without needing to be processed, but it’s just an example. Longer term, I hope to have a different approach available that will allow conditional code substitution so that browser specific optimizations won’t get in the way of an easy development/test/debug process.

jsmacro 0.2 was a full rewrite (because version 0.2s are always a full rewrite). It’s now a little closer to what I was originally thinking. Instead of a line-by-line state machine, the parser now uses regular expressions and dynamically calls macro-handling methods based on the name of the macro. That’s a little vague, but in practice it means that extending the macro language is easier, and it may be possible to do it on the fly (as in, writing new macro implementations within the JavaScript source file that’s being parsed, which is a geeky goal I’m going for).
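
To make “dynamically calls macro-handling methods based on the name of the macro” a little less vague, here is a rough sketch of the general pattern. It is not jsmacro’s actual code: the `MacroDispatcher` class, the `handle_` naming scheme, and the regex are all assumptions made for illustration.

```python
import re

# Matches lines such as "//@define DEBUG 1": macro name, then optional arguments
MACRO_PATTERN = re.compile(r'^\s*//@(\w+)\s*(.*)$')


class MacroDispatcher(object):
    """Hypothetical sketch: route each macro to a handle_<name> method."""

    def __init__(self):
        self.env = {}

    def handle_define(self, args):
        name, value = args.split()
        self.env[name] = int(value)
        return 'defined'

    def handle_ifdef(self, args):
        return args.strip() in self.env

    def dispatch(self, line):
        match = MACRO_PATTERN.match(line)
        if match is None:
            return None  # Not a macro line
        macro_name, args = match.groups()
        # Look the handler up by name; a new macro only needs a new method
        handler = getattr(self, 'handle_' + macro_name, None)
        if handler is None:
            raise ValueError('Unknown macro: ' + macro_name)
        return handler(args)


d = MacroDispatcher()
d.dispatch('//@define IE6_BUILD 1')
is_defined = d.dispatch('//@ifdef IE6_BUILD')
is_missing = d.dispatch('//@ifdef OTHER_BUILD')
```

The appeal of this shape is that supporting another macro means adding one method, with no changes to the parsing loop.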

Other new additions:

  • Test files are now picked-up automatically when named correctly. This makes it painless to add more tests.
  • Added support for setting DEFINE flags from the command-line. Handy if you automate builds for different environments (like IE6 vs. the rest of the world.)
  • Added support for #ifdef and #ifndef

The next big hurdles will be how to handle else statements, and coming up with a reason to implement some type of #inline capability.

For a while now I’ve wanted a JavaScript preprocessor to conditionally include debug and testing code when needed. It’s always registered as merely a “nice to have”, so I hadn’t sought one out. However, I had a little time over the weekend and wanted to play with the idea, so here it is: jsmacro (on GitHub).

[Note that before writing this I did seek out existing implementations, and found js-preprocess to be the most interesting. However, I needed something that would work as part of an existing build chain, so authoring the tool in Python instead of JavaScript made more sense.]

jsmacro is poorly named at the moment, as I didn’t write the macro system that was in my head. Instead, it’s a basic preprocessor supporting only DEFINE and IF statements, which happened to be all I needed at the time. Usage works like this:

Input JavaScript


  //@define DEBUG 0

  var foo = function() {
    //@if DEBUG
    alert('This.');
    alert('That.');
    //@end

    print("Hi");
  };

Pass the above JavaScript through jsmacro from the command line like this: ./jsmacro.py -f infile.js > outfile.js (assuming the files are all in the same directory), and you get the following:

Output JavaScript


  var foo = function() {

    print("Hi");
  };

The tool has registered the variable ‘DEBUG’ as 0 (i.e., false), so the conditional include statements omit the alert() calls. If DEBUG had been set to 1 (i.e., true), the alert() statements would remain (though all jsmacro instructions would be removed either way.)

One of the tricky things about doing macros or preprocessing in JavaScript is that I wanted the code to be valid JavaScript before the tool is run (which is why C-preprocessors won’t work.) The idea is that you develop as you normally would, but wrap your debug and testing code in conditional jsmacro statements so that they are automatically removed as part of your build process.

There’s nothing fancy about the current implementation: it’s a crude state machine that scans line by line, top to bottom, looking for regex patterns and deciding whether to output the line or not. Crude as it may be, though, it completely solved a problem for me, and hopefully it will help you out as well.
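
For the curious, a line-by-line state machine of this kind can be sketched in a few lines of Python. This is a simplified illustration rather than the jsmacro source: the `preprocess` function and its regexes are my own stand-ins, and it assumes there are no nested conditionals.

```python
import re

DEFINE_RE = re.compile(r'^\s*//@define\s+(\w+)\s+(\d+)\s*$')
IF_RE = re.compile(r'^\s*//@if\s+(\w+)\s*$')
END_RE = re.compile(r'^\s*//@end\s*$')


def preprocess(source):
    """Hypothetical sketch: drop //@if blocks whose flag is 0 (no nesting)."""
    env = {}
    emitting = True  # The only piece of state: are we currently outputting?
    output = []
    for line in source.splitlines():
        define = DEFINE_RE.match(line)
        condition = IF_RE.match(line)
        if define:
            env[define.group(1)] = int(define.group(2))
        elif condition:
            emitting = bool(env.get(condition.group(1), 0))
        elif END_RE.match(line):
            emitting = True
        elif emitting:
            output.append(line)
    return '\n'.join(output)


SAMPLE = """//@define DEBUG 0
var foo = function() {
  //@if DEBUG
  alert('This.');
  //@end
  print("Hi");
};"""

result = preprocess(SAMPLE)
```

Macro lines are never emitted, so the output is clean whether or not the guarded block survives.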

For one of my projects running on Google App Engine, I’m using Google Analytics to track traffic, but I didn’t want the tracking code showing up in my local development environment. Fortunately, detecting where your code is running is easy.

Take a look at this sample:

import os

try:
  is_dev = os.environ['SERVER_SOFTWARE'].startswith('Dev')
except KeyError:  # Variable not set, so assume we're not on the dev server
  is_dev = False

is_prod = not is_dev

If the is_dev and is_prod variables are exposed to your templates, you can do something like this:

{% if is_prod %}
 {% include "analytics.html" %}
{% endif %}

Hope this helps!

Sometimes classic Basic Access Authentication is the right approach to password protecting a webpage. It’s not secure from sniffing, but functional if you’re just trying to ward off the casual surfer in the wrong spot. (For example, restricting access to your cat pictures, not your missile silo codes.)

Basic authentication is often added to sites (or directories) using a .htaccess file and something like this:

AuthUserFile /home/foo/.htpasswd
AuthName "Private Area"
AuthType Basic

<Limit GET>
require valid-user
</Limit>

…but you can also do basic authentication on the fly by reading and writing HTTP headers. To ask the browser for a user/password, respond with a 401 status and write a “WWW-Authenticate” header containing something like ‘Basic realm="Secure Area"’. To read the user/password, look for an Authorization header, grab its value, Base64-decode the part after “Basic ”, and you should have a string in the form of “user:password”.
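
The header round trip described above can be sketched in plain Python, independent of any web framework. The helper names `build_basic_auth` and `parse_basic_auth` are made up for this illustration, and the sketch assumes credentials are UTF-8 text:

```python
import base64


def build_basic_auth(user, password):
    """Return the value for an Authorization header."""
    raw = '{0}:{1}'.format(user, password).encode('utf-8')
    return 'Basic ' + base64.b64encode(raw).decode('ascii')


def parse_basic_auth(header_value):
    """Return (user, password), or None if the header is not valid Basic auth."""
    scheme, _, encoded = header_value.partition(' ')
    if scheme.lower() != 'basic' or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip()).decode('utf-8')
    except ValueError:  # Covers bad Base64 and bad UTF-8 alike
        return None
    # Split on the first colon only: a password may itself contain colons
    user, sep, password = decoded.partition(':')
    if not sep:
        return None
    return user, password


header = build_basic_auth('erik', 'open:sesame')
credentials = parse_basic_auth(header)
```

Splitting on the first colon only is the detail that is easiest to get wrong, since a naive split breaks any password that contains a colon.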

Here’s how you might handle it with Google App Engine. (Well, really you might use a decorator, but this example is easier to explain.)

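import base64
import logging

from google.appengine.ext import webapp
from google.appengine.ext.webapp import template


class AuthTest(webapp.RequestHandler):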
  def get(self):

    # Wrapping in a huge try/except isn't the best approach. This is just 
    # an example for how you might do this.
    try:
      # Parse the header to extract a user/password combo.
      # We're expecting something like "Basic XZxgZRTpbjpvcGVuIHYlc4FkZQ=="
      auth_header = self.request.headers['Authorization']

      # Isolate the encoded user/passwd and decode it
      auth_parts = auth_header.split(' ')
      # Split on the first colon only, since passwords may contain colons
      user_pass_parts = base64.b64decode(auth_parts[1]).split(':', 1)
      user_arg = user_pass_parts[0]
      pass_arg = user_pass_parts[1]
    
      checkAuth(user_arg, pass_arg) # have this call raise an exception if it fails

      self.response.out.write(template.render('templates/foo.html', {}))

    except Exception, e:
      logging.debug("AuthTest Exception: %s" % (e))

      # Here's how you set the headers requesting the browser to prompt
      # for a user/password:
      self.response.set_status(401, message="Authorization Required")
      self.response.headers['WWW-Authenticate'] = 'Basic realm="Secure Area"'

      # Rendering a 401 Error page is a good way to go...
      self.response.out.write(template.render('templates/error/401.html', {}))

That’s all there is to it.

If you want to programmatically write an Authorization header (as in, sending authentication credentials to another site, like the Twitter APIs, for example) you’ll do something like this:

request = urllib2.Request(url)
request.add_header('Authorization', "Basic %s" % (base64.b64encode("%s:%s" % (user, password))))

Enjoy!

I was reading Ben Fry’s thesis, Organic Information Design, yesterday, came across the section on Conway’s Game of Life, and thought it would make a nice NodeBox demo.

Here it is: conway-life.py

Nodebox screenshot

There’s not much to it, but it does show a software pattern I’ve been using frequently with NodeBox. Many of the NodeBox examples make heavy use of non-namespaced, global variables. I suppose it makes simple code easy to read for those new to programming, but it’s a habit you’ll want to break before your code starts getting more complex.

What I’ve found helpful is to create a World/Universe/Controller/Stage object that drives the rendering. Instead of using multiple globals in draw(), the controller object keeps the main parameters as its own properties, and instantiates any needed objects in its __init__(). This approach prevents global variable names from clashing, and allows for creative reuse of rendering components.
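
As a rough illustration of the pattern (this is not the code in conway-life.py), here is a stripped-down controller for the Game of Life. The `World` class and its method names are invented for this sketch, and the NodeBox drawing calls are left out so that it runs in any Python interpreter:

```python
class World(object):
    """Hypothetical controller: owns the grid so draw() needs no globals."""

    def __init__(self, width, height, live_cells):
        self.width = width
        self.height = height
        self.live = set(live_cells)  # Set of (x, y) tuples

    def neighbours(self, x, y):
        count = 0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0) and (x + dx, y + dy) in self.live:
                    count += 1
        return count

    def step(self):
        """Advance one generation using Conway's rules."""
        next_live = set()
        for x in range(self.width):
            for y in range(self.height):
                n = self.neighbours(x, y)
                if n == 3 or (n == 2 and (x, y) in self.live):
                    next_live.add((x, y))
        self.live = next_live


# A "blinker": three cells in a row that flip between horizontal and vertical
world = World(5, 5, [(1, 2), (2, 2), (3, 2)])


def draw():
    # In NodeBox this would be the animation callback; rendering calls are
    # left out so the sketch runs anywhere.
    world.step()


draw()
after_one = set(world.live)
draw()
after_two = set(world.live)
```

The only module-level name the animation needs is `world`; everything else lives on the controller.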

Enjoy!

Posted in Uncategorized.

NodeBox makes a great environment for data visualizations and generative art. It’s easy to get started in, and you get basic drawing, type, and image manipulation. When you’re ready for more, it’s not too difficult to bring in external Python libraries to connect NodeBox to other systems, or add physics and particle simulation to spice up your visuals.

For those unfamiliar with NodeBox,

“NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a QuickTime movie.”

It uses PyObjC to embed a Python runtime into an OS X native application, and fits into the same toolbox as Processing and openFrameworks. It’s a bit slower to run complex animations in, but you’re coding in Python, you get the gorgeous fonts and anti-aliasing you’d expect on OS X, and it provides easy access to some OS X native libraries, like Core Image. For a quick look at what it can do, check out the NodeBox gallery.

NodeBox includes its own Python build, which is nice for portability and reliability, but it uses a custom sys.path that doesn’t look for Python packages you might already have installed on your system. There are a few ways to deal with this:

  1. You can install your packages into NodeBox’s path, i.e., ~/Library/Application\ Support/NodeBox/ — meaning that you can use them from NodeBox, but not from other scripts…
  2. You can import sys in your NodeBox code and manually modify the sys.path value to add your existing packages…
  3. You can install packages into your system site-packages directory, and sym-link them from NodeBox’s directory…
  4. You can make NodeBox use your system packages instead of its own by sym-linking ~/Library/Application\ Support/NodeBox to your site-packages directory of choice (e.g., /Library/Python/2.5/site-packages)
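
For reference, option 2 in the list above might look something like the following at the top of a NodeBox script. The `add_to_path` helper is just an illustration, and the directory is an assumption; substitute wherever your packages actually live:

```python
import os
import sys

# Assumed location; substitute wherever your packages actually live
SITE_PACKAGES = '/Library/Python/2.5/site-packages'


def add_to_path(directory):
    """Append directory to sys.path once, returning True if it was added."""
    directory = os.path.abspath(directory)
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True


added = add_to_path(SITE_PACKAGES)
```

Appending (rather than inserting at the front) means NodeBox’s bundled modules still win if a name exists in both places.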

For this exercise, I’ll be adding pymunk (Python bindings for the Chipmunk physics library) to NodeBox using option #3: Installing pymunk globally, and sym-linking from NodeBox’s package directory. This allows me to run the pymunk examples from the command-line (which use PyGame and Pyglet), but still use pymunk from NodeBox. This may not always be the best solution, so you’ll have to pick what’s right for your needs.

Let’s get started.

Pymunk (at the time of writing) includes its own copy of the Chipmunk source code, making this whole process rather easy. Once you’ve downloaded and uncompressed the pymunk source, cd into its directory and build Chipmunk using:

> python setup.py build_chipmunk

Now you can build and install pymunk:

> python setup.py install

This will install an egg (which I normally hate dealing with, but that’s another story.) If you don’t want the egg, just copy the pymunk directory into your site-packages.

Now we’ll add pymunk to NodeBox’s path. My pymunk is in /Library/Python/2.5/site-packages/, so I’ll:

> cd ~/Library/Application\ Support/NodeBox

> ln -s /Library/Python/2.5/site-packages/pymunk-0.8.1-py2.5.egg/pymunk .

Finally, NodeBox needs access to libchipmunk. I used this approach:

> cd /Applications/NodeBox/NodeBox.app/Contents/MacOS/

> ln -s ~/Library/Application\ Support/NodeBox/pymunk/libchipmunk.dylib .

We should be done! Fire up NodeBox and try an `import pymunk` to see if it loads. If you don’t see any error messages, you’re good to go.

If you’re new to pymunk (as I was until this week), head over to the Slide and Pin Joint tutorial to see how it works. The example is written for PyGame, so you’ll be doing a little rewriting to bring it into NodeBox.

The following screenshot shows the Slide and Pin Joint demo within NodeBox using my take on porting it. I’m having a little trouble with the slide joint, but you can check out my code if you’re curious: slide_and_pinjoint_example.py

pymunk in nodebox screenshot

Another simple Ubiquity command for the morning… This one, called ‘expandlinks’, finds all links on the current page and adds the link’s URL (as a hyperlink itself) next to each existing link label. This is particularly handy if you’re going to print an HTML page for later reference.


CmdUtils.CreateCommand({
  name: "expandlinks",
  homepage: "http://eriksmartt.com/blog/",
  author: { name: "Erik Smartt"},
  license: "MPL",
  preview: "Expands all hyperlinks, showing link locations.",
  execute: function() {
    var doc =  Application.activeWindow.activeTab.document;
    jQuery(doc.body).find("a").each(function(i) {
        jQuery(this).after(" &lt;<a href='" + this.href + "'>" + this.href + "</a>&gt;");
    });
  }
})

And yes, it will be much easier to subscribe to these commands once I gather them into a JS file for Ubiquity. For now, you can copy/paste into the command editor if you’re interested in trying it out.

Even simpler than my last Ubiquity example, this one came about from an actual project need to verify a custom character-length-based text truncation filter. Select the text in the browser, invoke Ubiquity, and type: charcount

CmdUtils.CreateCommand({
  name: "charcount",
  takes: {"text to count chars in": noun_arb_text},
  preview: function( pblock, argText ) {
    pblock.innerHTML = argText.text.length;
  }
})

Update: See comments below for Ubiquity 0.5 compatibility updates

Mozilla Ubiquity was released this week, and the functionality was so inspiring that I couldn’t help playing with it. For those that haven’t checked it out yet, think “Quicksilver inside Firefox”… or perhaps, “a contextually-aware command-line for your web browser.” If that still doesn’t mean anything to you… well, you’ll have to watch the intro video ;-)

Extending Ubiquity’s vocabulary is done via JavaScript, and the developer docs are pretty straightforward.

The docs cover Hello World, so I figured that the next best intro test would be a way to look up stock charts and quotes. Here’s the result of a few minutes hacking on it:

CmdUtils.CreateCommand({
  name: "tik",
  takes: {"stock ticker symbol": noun_arb_text},
  preview: function( pblock, argText ) {
    var charturl = "http://chart.finance.yahoo.com/c/1y/a/" + argText.text;
    pblock.innerHTML = "<img src='" + charturl + "'/>";
  },
  execute: function( argText ) {
    var windowManager = Components.classes["@mozilla.org/appshell/window-mediator;1"]
                      .getService(Components.interfaces.nsIWindowMediator);
    var browserWindow = windowManager.getMostRecentWindow("navigator:browser");
    var browser = browserWindow.getBrowser();
    var url = "http://finance.google.com/finance?q=" + argText.text;
    browser.loadOneTab(url, null, null, null, false, false);
  }
})

This command introduces a ‘tik’ keyword, which loads 1-year stock symbol charts (from Yahoo) into the preview pane, and allows click-through to open a new tab for the Google Finance page of said symbol. Note that the preview-pane doesn’t always resize correctly for the chart to fit (though you can generally make it happen by typing a space after the stock symbol.) I guess there’s still some work to do there.

In case someone else needs it, here’s a simple shell script I use for grabbing the current weather conditions by U.S. zipcode using the Yahoo APIs:

function weather {
  zipcode=$1
  if [ -n "$zipcode" ]; then
    lynx -dump "http://weather.yahooapis.com/forecastrss?p=$zipcode" | grep -i condition | awk -F' ' '{print $4 $5 $6}' | awk -F'<' '{print $1}' | sed 's/,/ /'
  else
    echo 'USAGE: weather <zipcode>'
  fi
}

The script uses lynx to grab the Yahoo RSS feed, piping the output to grep, which extracts the line containing the word ‘condition’. awk then pulls out some specific fields (delimited by spaces, then later by ‘<’), and sed converts the commas to spaces for prettier output. Obviously, the whole thing is fairly fragile, and changes to the RSS format (which happened maybe a month back) break the code. Checking the weather in Austin with the command: `weather 78701`, currently outputs: “Fair 67F”.

Django “lorem ipsum” generator (and a new contrib.webdesign module)

The Django Web Framework project just added a new contrib.webdesign module with an amazingly simple, but incredibly handy first feature: a lorem ipsum generator. The idea is that a project’s base templates can include generated lorem ipsum for testing layout and page flow, but inheriting templates can override the generated text once real content is available.

The lorem tag is used like this (via the contrib.webdesign docs):

  • {% lorem %} will output the common “lorem ipsum” paragraph.
  • {% lorem 3 p %} will output the common “lorem ipsum” paragraph and two random paragraphs each wrapped in HTML <p> tags.
  • {% lorem 2 w random %} will output two random Latin words.

In practice, you might do this:

templates/template.html:


<html>
  <head>
    <title>{% block page_title %}{% lorem 5 w %}{% endblock %}</title>
  </head>
  <body>
    <div class="article">
      <div class="article_title">{% block article_title %}{% lorem 5 w %}{% endblock %}</div>
      <div class="article_body">{% block article_body %}{% lorem 4 p %}{% endblock %}</div>
    </div>
  </body>
</html>

And then inherit when you’re ready:

templates/article.html:


{% extends "template.html" %}

{% if article %}
  {% block article_title %}{{ article.title }}{% endblock %}
  {% block article_body %}{{ article.body }}{% endblock %}
{% endif %}

I used to just paste lorem ipsum text directly into the main template (wrapped in block tags for overriding), but this new tag will let you skip the copy/paste routine. Very nice!

My previous post, “Passing JSON via the X-JSON HTTP header with Django and Prototype”, contained an example on writing custom HTTP headers from a Django-based web application. Continuing with that theme, here’s another header trick that I use in one of my apps to force the browser’s “Save As…” dialog box when viewing a particular URL.

The feature that I wanted was the ability to generate an XML file based on an HTTP GET request, but to have the browser open a “Save As…” dialog instead of attempting to render it (as would normally happen with XML in a modern browser.) The solution is to exploit the web browser behavior of not handling unknown mime types. A sample implementation (written in Python for the Django Web Framework) follows:

from datetime import datetime

from django.http import HttpResponse


def save_as_xml(request):
    current_time = datetime.now()

    response = HttpResponse('PUT THE XML HERE')
    # A made-up content type keeps the browser from rendering the file
    response['Content-Type'] = 'application/x-generated-xml-backup'
    response['Content-disposition'] = 'Attachment; filename=export.%s.xml' % (current_time.strftime("%Y-%m-%d"))

    return response

Setting the Content-Type header to a made-up type ensures that the browser will not attempt to render the file. The Content-disposition header provides the mechanism for suggesting the filename of the content to be saved on the viewer’s system. In this case, I’m using the standard `datetime` module to insert the date into the suggested filename.

One of the demo sites I was working on this week needed to pass a small amount of JSON back with its page results. There are a few ways to do this (and I’d suggest this post, “Loading Content with JSON” as a starting point if you’re looking for ideas), but for simplicity, I decided to take advantage of the automatic X-JSON HTTP Header parsing feature in Prototype 1.5.0. (The Ajax.Request docs address this capability.)

The sample code below demonstrates the use of the X-JSON header with a simple “sticky notes” web app. On the client-side, the JavaScript is quite simple. The second argument to the onSuccess callback handler will be automatically initialized using the data in the X-JSON header:

function display_note(id) {
    new Ajax.Request('/api/note/' + id + '/', {
        method: 'get',
        onSuccess: function(transport, results) {
            alert("Note(" + results['id'] + ") `" + results['title'] + "`: " + results['body']);
        }
    });
}

To handle this request, I’m using Django on the server with the following URL pattern:

(r'^api/note/(?P<id>\d+)/$', 'views.get_note')

The `get_note` method implementation looks like this: [NOTE: For production use, you'll want some exception handling, but I removed the error handling to simplify the example.]

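import cjson
from django.http import HttpResponse

# Note is the app's own model class; import it from wherever your models live.


def get_note(request, id):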
    # Fetch the Note from the DB:
    note = Note.objects.get(pk=id)
    # Create the response object (with some dummy text for now):
    response = HttpResponse('Check the X-JSON header.')
    # Manually set the X-JSON header using the JSON generated from the Note record:
    response['X-JSON'] = cjson.encode(note.__dict__)
    # Return the response object:
    return response

If you’d like to use this technique on your own sites, there are a couple of points to remember:

  1. You can’t return an empty HTTP Response regardless of there being an X-JSON header. If the response is empty, the browser will hang waiting for content to arrive.
  2. The X-JSON header should only be used for small payloads. Don’t stuff more than 8 KB in your headers. If you’re sending more than that, move the JSON to the body of the response.
  3. The cjson and simplejson encoders don’t handle Django DateTime fields. For objects with DateTime fields, write an alternate method for converting the object into a dictionary before passing it to the json encoder.
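
For the third point, the conversion step might look something like this. It’s only a sketch: it uses the standard library’s `json` module rather than cjson or simplejson, `FakeNote` is a stand-in rather than a real Django model, and the choice of ISO 8601 strings for dates is my own assumption.

```python
import datetime
import json


def to_json_safe_dict(obj):
    """Copy an object's public attributes, converting dates to ISO strings."""
    safe = {}
    for name, value in vars(obj).items():
        if name.startswith('_'):
            continue  # Skip private/internal attributes
        if isinstance(value, (datetime.datetime, datetime.date)):
            value = value.isoformat()
        safe[name] = value
    return safe


class FakeNote(object):
    """Stand-in for a model instance; not a real Django model."""

    def __init__(self):
        self.id = 7
        self.title = 'Groceries'
        self.created = datetime.datetime(2007, 3, 1, 9, 30, 0)
        self._internal = object()


payload = json.dumps(to_json_safe_dict(FakeNote()), sort_keys=True)
```

Skipping underscore-prefixed attributes also keeps any internal bookkeeping on the object out of the header.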

[Update: 2009/04/10]
This post got some flak from a firewall vendor for demonstrating a technique of pushing JSON out that will likely bypass their security checks. Since I haven’t used their products in a long time, I’m not too bothered by this, but I do want to point out that you likely don’t want to use HTTP Headers to pass this kind of data in a production site anyway. You’ll find out quickly that there are character limits to how much you can put in the headers, and before long, the distinction between what data goes in the header and what goes in the body will blur. Once that happens you’ve got a big mess on your hands. Better would be to avoid this pattern altogether. My post here simply demonstrates how to use the technique, should you be interested in doing so.

In Part 1 of this series, I described some of the motivation, and the components being used to build a new blog for myself. In this (lengthy) post, I’ll address the solution I used to move my content archives from WordPress to the new app.

Installing new blog software is generally easy, but if you have legacy content that you need to preserve, the ability to move content between systems becomes of utmost importance. Fortunately, it’s quite common for popular software to provide import/export features. Having good tools to migrate content reduces switching costs, making it easy to try new software without fear of content lock-in. Unfortunately, with a home-grown blog platform, these tools need to be written from scratch.

For my soon-to-be-launched Django-based blog, importing content from my WordPress installation was an early priority — there’s only so much testing you can do with lorem ipsum posts. In tackling this content migration, I considered the following four options:

  1. Support the legacy database schema.
  2. Export and Import at the database level (i.e., SQL dump, some text file munching, and SQL imports.)
  3. Write an adapter layer to pull from the existing database and insert into the new database.
  4. Export the content into a neutral format, and import from that format.

Regardless of the approach taken, I also added one important requirement: The import solution had to be so easy (and easily repeatable) that I would never hesitate to make a change to the database models when needed. Naturally, it’s nice to freeze the model once you have a stable release, but during development, even the database model should be open to agile iteration. I’ve worked on systems where every model change meant writing accompanying SQL scripts to alter the tables, and while effective, it wastes time, and I wanted the option to simply export, wipe the database clean, and re-import whenever needed. (And preferably by simply running a single script.)

I finally settled on option #4, to export into a neutral format (XML), and write an importer for that format. However, I did briefly consider each of the other options:

1) Supporting the legacy (WordPress) database schema sounds nice on the surface. This would allow the two systems to share the same database (thus eliminating the need to migrate content at all), while making it extremely easy to run the systems side-by-side (perhaps even balancing traffic between the two to test the deployment.) The downside, though, is that the custom application would need to maintain the data relationships that WordPress was relying on. It’s certainly doable, but on further investigation, I found that I didn’t actually like everything about the WordPress schema. There was a bit too much de-normalized data that I didn’t want to keep around.

2) Exporting and Importing at the database level would essentially involve a mysqldump, some sed/grep/perl magic, and a SQL import into a new database. This would get the job done, but could very well lead to endless hours of tweaking regex patterns; and the end result would basically be throw-away code.

3) Writing an adapter layer was actually the most tempting at first. I knew that Django contained a tool for generating model definitions based on an existing database schema. If this worked for the WordPress database, then all I would need to do is write a thin layer to fetch content from one model and stick it into another. Sure enough, the `inspectdb` tool did do a good job, and I got so far as having routines for pulling posts and comments before realizing that this also wasn’t as reusable a solution as I wanted. Complicating matters was the need to do all this magic in a single database, since the Multiple Database Support branch of Django is still in development/testing.

With the above options scratched off the list, I went in search of a means to export directly from WordPress into a neutral format. With a little googling, I found some posts about an export/import feature that might be “in development” in the WordPress tree, but I found no documentation on the feature. Fortunately, a few more searches turned up the “WordPress XML Export” plugin, which sounded like an effort to backport the exporting feature to early versions of WordPress. After first installing the XML Export plugin, I found that it didn’t actually work with the version of WordPress on my server, but a quick look through the source code revealed a hardcoded version check that was easy enough to modify. With that change made, the plugin has run like a champ ever since.

The XML Export plugin outputs the full contents of a WordPress blog into a WXR file (WordPress eXtended RSS), which is an RSS 2.0 file, extended with a WordPress export namespace so that it can include extra metadata and comments.

With the content archives now in a massive RSS file, the next task was to write an importer. To parse the XML, I decided to use ElementTree for its simplicity in getting the job done. Pulling the file into ElementTree is a one-liner (when wordpress_xml_file is a File object):

tree = ET.parse(wordpress_xml_file)

The entries can be easily iterated:

for item in tree.findall("channel/item"):

Extracting the basic elements was also straightforward (I stored them in a dictionary):


results['link'] = item.find("link").text
results['pubDate'] = item.find("pubDate").text
results['summary'] = item.find("description").text
results['body'] = item.find("{http://purl.org/rss/1.0/modules/content/}encoded").text
results['post_date'] = item.find("{http://wordpress.org/export/1.0/}post_date").text
results['post_date_gmt'] = item.find("{http://wordpress.org/export/1.0/}post_date_gmt").text

Extracting the Categories/Tags was only slightly more work:


results['categories'] = []

categories = item.findall("category")

for c in categories:
    results['categories'].append(c.text)

Pulling the comments was the only messy part of the process. The list of comments is easy enough to fetch…

comments = item.findall("{http://wordpress.org/export/1.0/}comment")

…but extracting the actual comment text is a little more work because some comments may contain child nodes. For example, a comment containing a hyperlink, bold tag, or any other HTML will be truncated if you simply use the `.text` attribute. To crawl the comment text and child tags, I used the `getiterator()` method, while concatenating `.text` attributes to assemble the full comment text. While doing this, I also decided to filter out any HTML tags from the comments, which made the process fairly simple:


tmp_comment_list = []

comment_tag = comment.find("{http://wordpress.org/export/1.0/}comment_content")

for comment_tag_child in comment_tag.getiterator():
    tmp_comment_text = comment_tag_child.text
    if tmp_comment_text: tmp_comment_list.append(tmp_comment_text)

the_comment['body'] = ' '.join(tmp_comment_list)

results['comments'].append(the_comment)
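One thing worth knowing if you try this on a current Python: `getiterator()` was removed from ElementTree in Python 3.9 (its replacement is `iter()`), and any text that follows a child tag’s closing tag lives in that child’s `.tail` rather than in anyone’s `.text`. A minimal sketch of the same idea using `itertext()`, which walks both, might look like this. The `SAMPLE_COMMENT` markup and the `comment_body` name are invented for the example.

```python
import xml.etree.ElementTree as ET

WP_NS = "{http://wordpress.org/export/1.0/}"

# Hypothetical comment whose content contains a real child element.
SAMPLE_COMMENT = (
    '<wp:comment xmlns:wp="http://wordpress.org/export/1.0/">'
    "<wp:comment_content>Nice post, see "
    '<a href="http://example.com/">this link</a>'
    " for more.</wp:comment_content>"
    "</wp:comment>"
)


def comment_body(comment):
    """Join all text inside comment_content, dropping the tags themselves."""
    content = comment.find(WP_NS + "comment_content")
    # itertext() yields every .text and .tail fragment in document order.
    pieces = (piece.strip() for piece in content.itertext())
    return " ".join(piece for piece in pieces if piece)


comment = ET.fromstring(SAMPLE_COMMENT)
print(comment_body(comment))
```

Here the trailing “for more.” is the `.tail` of the `<a>` element, so a loop that only reads `.text` would silently drop it.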

Writing an importer for the WXR/RSS 2.0 format not only solves the problem at hand, but also lays the groundwork for a reusable RSS importer. IMO, this potential reuse adds value to the solution (as opposed to one-off SQL munging or custom adaptation layers), which makes it worth any additional work that might have gone into it. With a little refactoring, the same system could also be extended to support the Movable Type Import Format, making the software very easy to set up and evaluate.

In Part 3, I’ll skip some of the development details and jump into the server issues, with a focus on why the new blog hasn’t launched yet. The answer lies heavily in the challenge of running a Python-based application server in shared hosting environments. The common lack of mod_python, the RAM hit, etc., all add to the complexity in adopting Django.

Posted in Uncategorized.

For one of my Django-based projects, I decided to set up an automated functional-testing system using Selenium to add content to the Admin tool and verify that it works in the site. In order to use this in a “continuous-integration”-like manner, I needed a way to automate the tear-down, initialization, and setup of a fresh installation of the app.

I use a few more tricks to get this all working, but I wanted to share a couple of scripts I wrote to handle the database re-initialization. I gather from some of the Django discussions that similar functionality may be working its way into the mainline already, but for the time being, here’s what I’m doing.

I broke the process into two scripts, not because it’s the best thing to do, but because doing the first part as a shell script made sense, and doing the second part in Python was easier.

This first script takes a brute-force approach to pulling the database settings from the project’s settings.py file, and uses them to delete the existing database and create a new one by driving the command-line ‘mysqladmin’ tool. (There’s also some voodoo done elsewhere which results in the script using a different database name if it’s in the testing environment, but that’s for another post.)


#!/bin/bash

# Extract the user/passwd/database name from the settings file
username=$(grep DATABASE_USER settings.py | awk -F\' '{print $2}')
password=$(grep DATABASE_PASSWORD settings.py | awk -F\' '{print $2}')
database=$(grep DATABASE_NAME settings.py | awk -F\' '{print $2}')

echo 'Clearing the database...'
# Quote the values so a password containing spaces or shell characters survives intact
echo 'y' | mysqladmin --host=localhost --user="$username" --password="$password" drop "$database"
mysqladmin --host=localhost --user="$username" --password="$password" create "$database"

echo 'Setting up the database and test account...'
./dbinit.py

echo 'Done.'
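For comparison, the same brute-force extraction could be sketched in Python. This is an illustration rather than a drop-in replacement: `SAMPLE_SETTINGS`, `read_setting`, and `mysqladmin_commands` are made-up names, the sketch assumes single-quoted values as in the awk version, and it only builds the command lines instead of running them.

```python
import re

# Hypothetical excerpt of a settings.py file using single-quoted values.
SAMPLE_SETTINGS = """
DATABASE_ENGINE = 'mysql'
DATABASE_NAME = 'myproject'
DATABASE_USER = 'dbuser'
DATABASE_PASSWORD = 's3cret pass'
"""


def read_setting(settings_text, name):
    """Return the single-quoted value assigned to `name`."""
    # Anchoring at the start of a line skips commented-out assignments.
    pattern = r"^%s\s*=\s*'([^']*)'" % re.escape(name)
    match = re.search(pattern, settings_text, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return match.group(1)


def mysqladmin_commands(settings_text):
    """Build the drop/create command lines as argument lists."""
    base = [
        "mysqladmin",
        "--host=localhost",
        "--user=" + read_setting(settings_text, "DATABASE_USER"),
        "--password=" + read_setting(settings_text, "DATABASE_PASSWORD"),
    ]
    database = read_setting(settings_text, "DATABASE_NAME")
    # --force skips mysqladmin's drop confirmation (instead of piping in 'y').
    return [base + ["--force", "drop", database], base + ["create", database]]


for command in mysqladmin_commands(SAMPLE_SETTINGS):
    print(command)
```

Keeping each command as a list means it could be handed to `subprocess.run()` without any shell quoting worries.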

This second script (named ‘dbinit.py’, and called from the script above) uses pexpect (an Expect-like module for Python) to drive the ‘syncdb’ function of Django’s manage.py tool. When using pexpect, the thing to remember is that the pattern you “expect” has to match the child process’s output exactly as it is printed, and that `expect()` treats the pattern as a regular expression, so punctuation such as “?” and “(” is special. I got hung up on this at first, which is why you’ll see me using the cruder “.*” pattern below:


#!/usr/bin/python

import sys
import pexpect

child = pexpect.spawn('python manage.py syncdb')
child.logfile = sys.stdout

#child.expect('Would you like to create one now.*:')
child.expect('.*:')
child.sendline('yes')

child.expect('Username.*:')
child.sendline('SOMEUSERNAME')

child.expect('E-mail address:')
child.sendline('SOMEUSERNAME@foo.com')

child.expect('Password:')
child.sendline('NOTSOSECRETPASSWORD')

child.expect('Password.*:')
child.sendline('NOTSOSECRETPASSWORD')

child.expect(pexpect.EOF)
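Since `expect()` compiles string patterns as regular expressions, the matching behaviour can be explored with the standard `re` module alone, no child process required. The sketch below assumes the prompt text is exactly the `PROMPT` string shown (I’m reconstructing it from memory, so treat it as an assumption).

```python
import re

# Assumed wording of the prompt that syncdb prints.
PROMPT = "Would you like to create one now? (yes/no): "

# Pasting the prompt in verbatim fails: "?" makes the "w" optional and
# "(yes/no)" becomes a group, so the literal punctuation never matches.
verbatim = "Would you like to create one now? (yes/no):"
print(re.search(verbatim, PROMPT))

# Escaping the special characters gives a pattern that matches the text as-is.
escaped = re.escape(verbatim)
print(re.search(escaped, PROMPT) is not None)

# The looser ".*" style used in the script also matches.
loose = "Would you like to create one now.*:"
print(re.search(loose, PROMPT) is not None)
```

pexpect also offers `expect_exact()`, which matches plain strings without any regular-expression interpretation.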

With these scripts in place, not only have I been able to set up an automated testing solution, but I also use them in early development when I’m still fleshing out a data model. This approach allows me to quickly reinitialize an environment, although you should use it with caution since it also deletes all content from the database.


Admittedly, this is perhaps more of an interesting trick than a needed feature; however, if you’ve ever wanted to print man pages or simply read them in a nice, anti-aliased document view instead of within the Terminal, here’s a tip you might like. The following bash script (and credit goes 100% to my friend Victor, who is sans-blog) will format and open man pages in Preview:


#!/bin/bash

cmd=$1
if [ -z "$cmd" ]; then
    me=$(basename "$0")
    echo "Usage: $me command_name"
    exit 1
fi

# Make sure a man page exists before handing anything to Preview
man "$cmd" > /dev/null 2>&1
if [ $? -ne 0 ]; then
    echo "No man page for $cmd"
    exit 1
fi

man -t "$cmd" | open -f -a /Applications/Preview.app

On my box, I called the script ‘manpreview’ and dropped it in ~/bin/ for easy access. Once you `chmod u+x` it (and have ~/bin/ in your path), you’ll be able to do fun things like `manpreview tcpdump` for some extended reading.

Earlier this week I was looking for a nice HTML editor for Eclipse to help ease life when using PyDev with a Django project. I didn’t have much luck, other than finding a few syntax coloring tools that were HTML aware. That changed today when I found Aptana: The Web IDE. It’s a free, open source IDE for HTML, JavaScript, and CSS, built on Eclipse (available as a stand-alone application, or an Eclipse plugin) that offers target-browser-aware code assist and syntax checking. The site includes some great screencasts to demo the product (and an interesting use of a .tv domain name).

Though it’s officially unsupported on Eclipse 3.2 (they only support 3.1), it seems to work just fine in my environment.

(Via eHub)

[Minor update: Aptana ran fine on my OS X machine, but crashes hard on my AMD64 Ubuntu Dapper box running Eclipse 3.2.]
