This month’s APUG meeting will feature guest speaker Greg Wilson, author of Beautiful Code, Data Crunching, Parallel Programming Using C++, Practical Parallel Programming, etc.
For more details, see: http://wiki.python.org/moin/AustinPythonUserGroup and http://python.meetup.com/188/.
Hope to see you there!
Last week I received a review copy of the new “The Definitive Guide to Django” book from Apress. I hadn’t planned on buying the book since it seemed a little too beginner-focused, but I agreed to give it an honest reading, so I happily dove in with an “it’s Python, of course I’m going to like it” attitude.
The book was written by Adrian Holovaty and Jacob Kaplan-Moss, the creators and “Benevolent Dictators” of the Django Web Framework. It was Holovaty and Kaplan-Moss’ first book, and, I believe, meant to be the first Django book to market. The book was drafted online, opened to peer review and community feedback, and ultimately published under the GNU Free Documentation License.
From the get-go, the print edition had a few inherent market challenges to face: First, the entire book is available online, for free, at: <http://www.djangobook.com/>. Second, in many ways the book is a re-hash of the docs available at <http://www.djangoproject.com/documentation/>, which are also free. Third, the book covers Django 0.96, not the SVN trunk. (0.96 is technically the latest official release, but a lot has changed in trunk since 0.96.) And finally, the $45 MSRP could be seen as a little steep for what is effectively a printed copy of a free, online book.
Diving in, the book takes the reader through the basic installation process, provides a brief background on how the framework came to be (and why you want one), then steps through the major features (i.e., the template system, ORM, URLconfs, generic views, etc.). It’s what you’d expect from a technical reference: no fluff, and straight to the details. There are plenty of code snippets to learn from, and the sidebar notes tend to be insightful.
Since it wasn’t new material for me, the book was a fairly quick read, but the experience of reading Django documentation in book form was actually quite fascinating. There’s something about settling into a comfortable chair with a book, pen, and highlighter that you just can’t get with online documentation. Perhaps it was just a little more noticeable given the material. When I read the Django docs online, I tend to skim over them while trying to solve a problem. I use them as a reference more than a learning tool, and it’s usually while actively coding, so my brain is partially distracted with whatever it is I’m building.
With a physical book, you can unplug, step away from the computer, and give the material your undivided attention. This isolation from distraction results in a much deeper understanding of the text. This is the real value of the printed book: it’s an opportunity to digest online documentation in an environment more conducive to learning and retention.
The market needed a good Django book, and this one delivered a solid reference for the framework. Arguably, it’s not really a “Beginner’s Guide to Django”, but hopefully it covers enough of the basics that future books can focus on best practices and more advanced techniques. (On a related note, there’s apparently an upcoming “Practical Django Projects” book, also from Apress, that will focus more on building “reusable Django applications from start to finish”. This might actually make for a better beginner’s book, depending on how it turns out. [Via The B-List: Speaking and writing].)
The million-dollar question, then, is “Should you buy this book?” My answer ended up being a bit more positive than I expected, but there are two parts: First, if you’re a front-end developer only, you don’t need this book. You can just read Chapter 4: The Django Template System online, and then use the “Django Templates: Guide for HTML authors” section of the online docs as a reference. For back-end developers, the story is different. If you’re going to just “read it while you hack”, then you might as well just read it online; but if you’re serious about building applications with Django (especially if you’re new to it), then you should consider buying the book and investing the time to step away from the computer and really let yourself get into it. Unless you are an active contributor to Django (which I’m not, just to be clear), the odds are pretty good that you’ll learn something new, even if you’re already using Django today.
Django “lorem ipsum” generator (and a new contrib.webdesign module)
The Django Web Framework project just added a new contrib.webdesign module with an amazingly simple, but incredibly handy first feature: a lorem ipsum generator. The idea is that a project’s base templates can include generated lorem ipsum for testing layout and page flow, but inheriting templates can override the generated text once real content is available.
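To make the idea concrete, here’s a rough, standalone sketch of what a lorem ipsum generator does: emit the familiar opening words first, then pad with random filler. It is not Django’s implementation; the function names (`lorem_words`, `lorem_paragraphs`) and the word lists are hypothetical, chosen only to loosely mirror what `{% lorem 5 w %}` and `{% lorem 4 p %}` produce.

```python
import random

# Opening words of the classic "lorem ipsum" passage (the "common" text).
COMMON_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipisicing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()

# Hypothetical extra filler vocabulary used once the common words run out.
EXTRA_WORDS = (
    "exercitationem perferendis perspiciatis laborum eveniet sunt iure "
    "nam nobis eum cum officiis excepturi odio"
).split()


def lorem_words(count, common=True, rng=random):
    """Return `count` space-separated words, starting with the common
    opening when `common` is true (roughly like `{% lorem 5 w %}`)."""
    words = list(COMMON_WORDS[:count]) if common else []
    vocab = COMMON_WORDS + EXTRA_WORDS
    while len(words) < count:
        words.append(rng.choice(vocab))
    return " ".join(words)


def lorem_paragraphs(count, common=True, rng=random):
    """Return `count` <p>-wrapped paragraphs (roughly like `{% lorem 4 p %}`).
    Only the first paragraph uses the common opening."""
    paras = []
    for i in range(count):
        if i == 0 and common:
            text = lorem_words(len(COMMON_WORDS), rng=rng)
        else:
            text = lorem_words(rng.randint(20, 40), common=False, rng=rng)
        paras.append("<p>%s.</p>" % text.capitalize())
    return "\n\n".join(paras)


print(lorem_words(5))  # lorem ipsum dolor sit amet
```

Starting from a fixed opening keeps layouts predictable between page loads, while the random filler exercises different line lengths.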
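The lorem tag is used like this (via the contrib.webdesign docs): after adding django.contrib.webdesign to INSTALLED_APPS and loading it with {% load webdesign %}, write {% lorem [count] [method] [random] %}, where count is the number of words or paragraphs to generate (default 1), method is w for words, p for HTML paragraphs, or b for plain-text paragraph blocks (default b), and the optional word random skips the common “Lorem ipsum dolor sit amet…” opening in favor of random text.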
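In practice, you might do this (each block name may appear only once per template, so the page title gets its own block):
templates/template.html:
{% load webdesign %}
<html>
<head>
<title>{% block page_title %}{% lorem 5 w %}{% endblock %}</title>
</head>
<body>
<div class="article">
<div class="article_title">{% block article_title %}{% lorem 5 w %}{% endblock %}</div>
<div class="article_body">{% block article_body %}{% lorem 4 p %}{% endblock %}</div>
</div>
</body>
</html>
And then inherit when you’re ready (a child template only renders its blocks, so anything outside them, including an {% if %}, is ignored; the article check therefore goes inside each block, falling back to the lorem text via {{ block.super }}):
templates/article.html:
{% extends "template.html" %}
{% block page_title %}{% if article %}{{ article.title }}{% else %}{{ block.super }}{% endif %}{% endblock %}
{% block article_title %}{% if article %}{{ article.title }}{% else %}{{ block.super }}{% endif %}{% endblock %}
{% block article_body %}{% if article %}{{ article.body }}{% else %}{{ block.super }}{% endif %}{% endblock %}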
Previously, I just pasted lorem ipsum text directly into the main template (wrapped in block tags for overriding), but this new tag lets you skip the copy/paste routine. Very nice!
I’m back from PyCon 2007. It was a busy weekend, with 593 Pythonistas attending the conference. I took quite a few notes, but I’ve pulled out some highlights below:
Public school education is so bad that real eLearning solutions can’t go to the schools; they need to be outside of schools, so that you don’t have the traditional censorship that comes with public schools, and you don’t have the associations with the bad experiences kids have while at “school”.
I’m off to PyCon 2007 (Dallas, TX) in the morning. I managed to get into the Advanced Django tutorial (which I’m really looking forward to), so I’m heading up a day early. If you happen to be there, hopefully we’ll cross paths!
In Part 1 of this series, I described some of the motivation, and the components being used to build a new blog for myself. In this (lengthy) post, I’ll address the solution I used to move my content archives from WordPress to the new app.
Installing new blog software is generally easy, but if you have legacy content that you need to preserve, the ability to move content between systems becomes of utmost importance. Fortunately, it’s quite common for popular software to provide import/export features; having good tools to migrate content reduces switching costs, making it easy to try new software without fear of content lock-in. Unfortunately, with a home-grown blog platform, these tools need to be written from scratch.
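For my soon-to-be-launched Django-based blog, importing content from my WordPress installation was an early priority; there’s only so much testing you can do with lorem ipsum posts. In tackling this content migration, I considered the following four options:
1) Support the legacy (WordPress) database schema directly.
2) Export and import at the database level.
3) Write an adapter layer between the two systems.
4) Export into a neutral format (XML), and write an importer for that format.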
Regardless of the approach taken, I also added one important requirement: The import solution had to be so easy (and easily repeatable) that I would never hesitate to make a change to the database models when needed. Naturally, it’s nice to freeze the model once you have a stable release, but during development, even the database model should be open to agile iteration. I’ve worked on systems where every model change meant writing accompanying SQL scripts to alter the tables, and while effective, it wastes time, and I wanted the option to simply export, wipe the database clean, and re-import whenever needed. (And preferably by simply running a single script.)
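The “export, wipe, and re-import” requirement boils down to a rebuild step that is safe to run over and over. Here’s a minimal sketch of that idea, assuming an in-memory SQLite database and a hypothetical one-table schema as stand-ins for the real MySQL database and Django models; in the real project, the rows would come from the neutral export format rather than a hard-coded list.

```python
import sqlite3

# Hypothetical one-table schema standing in for the blog's Django models.
SCHEMA = """
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def rebuild(conn, posts):
    """Wipe the table, recreate it from SCHEMA, and re-import every post.

    `posts` is any iterable of (title, body) pairs, e.g. parsed from an
    XML export. Running this twice leaves exactly one copy of each post.
    """
    conn.execute("DROP TABLE IF EXISTS post")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO post (title, body) VALUES (?, ?)", posts)
    conn.commit()


exported = [("Hello World", "First post!"), ("Second post", "More text.")]
conn = sqlite3.connect(":memory:")
rebuild(conn, exported)
rebuild(conn, exported)  # safe to repeat: no duplicate rows pile up
print(conn.execute("SELECT COUNT(*) FROM post").fetchone()[0])  # 2
```

Because the table is recreated from `SCHEMA` on every run, changing the model only means editing that definition and running the script again, with no hand-written ALTER TABLE scripts.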
I finally settled on option #4, to export into a neutral format (XML), and write an importer for that format. However, I did briefly consider each of the above options:
1) Supporting the legacy (WordPress) database schema sounds nice on the surface. This would allow the two systems to share the same database (thus eliminating the need to migrate content at all), while making it extremely easy to run the systems side-by-side (perhaps even balancing traffic between the two to test the deployment). The downside, though, is that the custom application would need to maintain the data relationships that WordPress was relying on. It’s certainly doable, but on further investigation, I found that I didn’t actually like everything about the WordPress schema: there was a bit too much denormalized data that I didn’t want to keep around.
2) Exporting and Importing at the database level would essentially involve a mysqldump, some sed/grep/perl magic, and a SQL import into a new database. This would get the job done, but could very well lead to endless hours of tweaking regex patterns; and the end result would basically be throw-away code.
3) Writing an adapter layer was actually the most tempting at first. I knew that Django contained a tool for generating model definitions based on an existing database schema. If this worked for the WordPress database, then all I would need to do is write a thin layer to fetch content from one model and stick it into another. Sure enough, the `inspectdb` tool did do a good job, and I got so far as having routines for pulling posts and comments before realizing that this also wasn’t as reusable a solution as I wanted. Complicating matters was the need to do all this magic in a single database, since the Multiple Database Support branch of Django is still in development/testing.
With the above options scratched off the list, I went in search of a means to export directly from WordPress into a neutral format. With a little googling, I found some posts about an export/import feature that might be “in development” in the WordPress tree, but I found no documentation on the feature. Fortunately, a few more searches turned up the “WordPress XML Export” plugin, which sounded like an effort to backport the exporting feature to earlier versions of WordPress. After first installing the XML Export plugin, I found that it didn’t actually work with the version of WordPress on my server, but a quick look through the source code revealed a hardcoded version check that was easy enough to modify. With that change made, the plugin has run like a champ ever since.
The XML Export plugin outputs the full contents of a WordPress blog into a WXR file (WordPress eXtended RSS), which is an RSS 2.0 file, extended with a wordpress export namespace so that it can include extra metadata and comments.
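Those extra namespaces are why the importer code refers to tags like `{http://wordpress.org/export/1.0/}post_date`: ElementTree stores namespaced tags in that `{uri}localname` form. A small sketch, using a trimmed, hypothetical WXR-style sample (real exports carry many more elements per item):

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WXR-style document.
SAMPLE_WXR = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <item>
      <title>Hello World</title>
      <content:encoded>First post!</content:encoded>
      <wp:post_date>2007-01-01 12:00:00</wp:post_date>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE_WXR)
item = root.find("channel/item")

# Namespaced children show up with their full URI in braces.
print([child.tag for child in item])

# A prefix map makes the same lookups easier to read.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.0/",
}
print(item.find("wp:post_date", NS).text)     # 2007-01-01 12:00:00
print(item.find("content:encoded", NS).text)  # First post!
```

The `namespaces` argument to `find()` needs ElementTree 1.3 (Python 2.7 and later); the fully spelled-out `{uri}` form works with older versions too.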
With the content archives now in a massive RSS file, the next task was to write an importer. To parse the XML, I decided to use ElementTree for its simplicity in getting the job done. Pulling the file into ElementTree is a one-liner (where wordpress_xml_file is a file object or a filename):
import xml.etree.ElementTree as ET
tree = ET.parse(wordpress_xml_file)
The entries can be easily iterated:
for item in tree.findall("channel/item"):
    results = {}  # each post's fields are collected into a dictionary
Extracting the basic elements was also straightforward (I stuck them into a dictionary):
results['link'] = item.find("link").text
results['pubDate'] = item.find("pubDate").text
results['summary'] = item.find("description").text
results['body'] = item.find("{http://purl.org/rss/1.0/modules/content/}encoded").text
results['post_date'] = item.find("{http://wordpress.org/export/1.0/}post_date").text
results['post_date_gmt'] = item.find("{http://wordpress.org/export/1.0/}post_date_gmt").text
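One caveat with the `.find(...).text` pattern: if an item lacks one of those elements, `find()` returns None and `.text` raises an AttributeError. A slightly more defensive variant, sketched here with a hypothetical `extract_post()` helper, uses `findtext()` with a default value instead:

```python
import xml.etree.ElementTree as ET

CONTENT = "{http://purl.org/rss/1.0/modules/content/}"
WP = "{http://wordpress.org/export/1.0/}"


def extract_post(item):
    """Collect the basic fields of one WXR <item> into a dictionary.

    findtext() returns the default instead of raising when an element is
    missing, and returns "" for an element that exists but is empty.
    """
    return {
        "link": item.findtext("link", default=""),
        "pubDate": item.findtext("pubDate", default=""),
        "summary": item.findtext("description", default=""),
        "body": item.findtext(CONTENT + "encoded", default=""),
        "post_date": item.findtext(WP + "post_date", default=""),
        "post_date_gmt": item.findtext(WP + "post_date_gmt", default=""),
        # Skip empty <category/> elements rather than storing None.
        "categories": [c.text for c in item.findall("category") if c.text],
    }


# Hypothetical item with no <description> or <wp:post_date_gmt> element.
item = ET.fromstring(
    '<item xmlns:wp="http://wordpress.org/export/1.0/">'
    "<link>http://example.com/hello</link>"
    "<category>Python</category><category>Django</category>"
    "<wp:post_date>2007-01-01 12:00:00</wp:post_date>"
    "</item>"
)
print(extract_post(item)["summary"] == "")  # True
```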
Extracting the Categories/Tags was only slightly more work:
results['categories'] = []
categories = item.findall("category")
for c in categories:
    results['categories'].append(c.text)
Pulling the comments was the only messy part of the process. The list of comments is easy enough to fetch…
comments = item.findall("{http://wordpress.org/export/1.0/}comment")
…but extracting the actual comment text is a little more work because some comments may contain child nodes. For example, a comment containing a hyperlink, bold tag, or any other HTML will be truncated if you simply use the `.text` attribute. Crawling the child tags with the `getiterator()` method (removed in Python 3.9 in favor of `iter()`) and concatenating their `.text` attributes gets most of the way there, but it drops any text that follows a child tag, because ElementTree stores that text in the child’s `.tail`. The `itertext()` method yields both `.text` and `.tail` strings in document order, and since I also decided to filter out any HTML tags from the comments, it makes the process fairly simple:
results['comments'] = []
for comment in comments:
    the_comment = {}
    comment_tag = comment.find("{http://wordpress.org/export/1.0/}comment_content")
    # itertext() yields the text and tails of the element and its children
    # in document order, so inline HTML tags are dropped without losing
    # the words around them.
    the_comment['body'] = "".join(comment_tag.itertext())
    results['comments'].append(the_comment)
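To see why `.text` alone isn’t enough, here is a small, self-contained comparison on a hypothetical comment containing nested inline HTML. `text_only()` mimics walking the tree and concatenating `.text` attributes, while `full_text()` uses `itertext()`:

```python
import xml.etree.ElementTree as ET

# Hypothetical comment body with a <b> nested inside an <a>.
COMMENT = ET.fromstring(
    '<comment_content>Great post, see <a href="http://example.com/">'
    "my <b>notes</b> here</a> for more.</comment_content>"
)


def text_only(elem):
    """Join just the .text of every node; each child's .tail is skipped."""
    return " ".join(n.text for n in elem.iter() if n.text)


def full_text(elem):
    """Join all text and tails in document order, dropping the tags."""
    return "".join(elem.itertext())


print(text_only(COMMENT))  # the words after </b> and </a> are missing
print(full_text(COMMENT))  # Great post, see my notes here for more.
```

The words “here” and “for more.” sit in the `.tail` of `<b>` and `<a>`, which is exactly what the first approach never visits.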
By writing an importer for the WXR/RSS 2.0 format, I not only solved the problem at hand, but also laid the groundwork for a reusable RSS importer. IMO, this potential reuse adds value to the solution (as opposed to one-off SQL munging or custom adaptation layers), which makes it worth any additional work that might have gone into it. With a little refactoring, the same system could also be extended to support the Movable Type Import Format, making the software very easy to set up and evaluate.
In Part 3, I’ll skip some of the development details and jump into the server issues, with a focus on why the new blog hasn’t launched yet. The answer lies heavily in the challenge of running a Python-based application server in shared hosting environments. The common lack of mod_python, the RAM hit, etc., all add to the complexity in adopting Django.
I was hoping to write this post as an announcement for my new blogging solution, but instead (since I haven’t flipped the switch yet), I thought I’d start off with why I’m doing it, and what software I’ve pulled together to keep from reinventing the wheel. (In future posts I’ll address the development itself, the unique features, and the major obstacles in moving from a WordPress installation on a shared server, to a custom web app written using Django. This last bit, the actual hosting of a Django app, is a significant one, as it is the primary issue causing a delay in switching over).
I moved my blog to WordPress software (from PyBlosxom, and a number of home-grown solutions before that) back in April 2005. I’ve been quite happy with WordPress, and would definitely recommend it for anyone who doesn’t enjoy coding (and maintaining) their own web apps. After writing a few custom plugins and a plain, but functional theme, my WordPress-based blog has been churning reliably for well over a year. However, after also using Django for over a year to build other web apps, it became too tempting not to use Django for my own site. (It really is a great framework to work with, particularly if you’re a fan of Python.)
Building a custom app isn’t all roses and cherries. (I’m not sure what that means, but it sounds good.) With an established open source solution like WordPress, you have access to thousands of testers and hackers, all working to ensure that the software is reliable. You have access to good documentation, and plenty of bloggers who post solutions for custom integration problems. Furthermore, with PHP support being almost ubiquitous in shared hosting environments, you can have a WordPress installation up and running in a matter of minutes.
With a home-grown system, you do ALL the heavy lifting in development, testing, and maintenance, and in that regard, you’re re-inventing the wheel in some areas, and leaving a community of support behind. Viewed in this light, it seems a little silly to build a custom solution when a proven, free system already exists. But custom apps can have their advantages if you can still leverage some open source communities while assembling a solution that is architected to address the specific needs you have. In my case, I tried to do as little custom, one-off engineering as possible (except in the fun areas), while enabling a unique flexibility to re-think content interaction on my blog. I wanted the ability to prototype new feature ideas at the speed of Python + Django (which is to say, very fast), but not get bogged down debugging ORMs and template engines. (I’ve spent plenty of time doing that in the past.)
Not wanting to write everything from scratch, my new solution is LAMP based (Linux [Ubuntu to be specific], Apache 2, MySQL, and Python), using the Django framework for its generated Admin CMS, object-relational mapping library, templating engine, URL mapping, etc. In other words, the only thing not leveraged from the open source community is the actual business logic of my app (which, in a blog, can be quite simple). I’m even leveraging external services like Akismet (for filtering comment spam), and del.icio.us, flickr, and Technorati for pulling in external content and metrics. I’m also using ElementTree (for the XML parsing in my content import system), Pygments (for syntax highlighting the code embedded in blog posts), simplejson (for generating JSON from Python objects), PyTechnorati (for accessing Technorati’s APIs), the Universal Feed Parser (for pulling in external RSS/Atom feeds), and the Yahoo! User Interface Library (for the CSS Fonts and Grids libraries). During development, I’ve also relied heavily on Subversion and Bazaar for my revision control needs.
With this arsenal of open source software, I was able to feature-match the bits I wanted from WordPress rather quickly, and then iterate on the presentation and interaction without the burden of implementing everything from scratch. Needless to say, I’m excited about the new site (it’s been running parallel to my WordPress blog for several months), and I’m eager to see what happens when I finally flip the switch and start routing traffic to it!
Just a reminder, the Austin (TX) Python Users Group meeting is tonight, 7pm, at Enthought, in downtown Austin. Eugene Oden will be giving a presentation on using Pyro (Python Remote Objects.)