For the past six months I’ve been writing, breaking, and rewriting a new CMS for this site. The current incarnation is something like a Blogging tool, but it didn’t start that way. It started as a Wiki.
Last year I was testing a variety of Wikis as a means of keeping and sharing project notes. During my search for all things Wiki, I started using a text editor for OS X called VoodooPad that blends browsing and editing a Wiki into a single desktop application. Having a stand-alone Wiki is quite cool, but what sold me on this application was the ability to do remote editing via XML-RPC. Unfortunately, the only Wiki on the market at the time that supported the VoodooPad API was a demo written by the application’s developers. The sample Wiki didn’t quite meet my needs, but the novelty of managing content using this unique hypertext editor was enough motivation to dive into the APIs and see what I could hack together.
Implementing the VoodooPad APIs turned out to be rather simple, so I started with a class that could pass data around for testing. Working with test data got old quickly, so I decided to build a simple web front-end to edit the test cases. This meant I had to store ‘pages’ and look up ‘keys’ for those pages so I could edit and view them online while still returning the contents to VoodooPad. Once I got started, I decided to put off debugging the APIs for a while and instead focused on the web interface. Deep down I wanted to do this anyway, but now I had a little more motivation.
After committing to writing a Wiki engine from scratch, I broke down the main functions of a Wiki as problems to be solved. These are:
- Page storage
- Key lookup and key matching
- Plain-text based markup scheme
- Hyperlinking
- Version control of content
- Caching
- Templating the presentation layer (separating logic from presentation)
- The ability to author/edit articles online (Web Interface)
- Support for remote publishing (XML-RPC / Web APIs)
- Viewing and restoring diffs
- The ability to associate Metadata with content
- Spell checking
- Search
- Comment posting
More advanced features:
- Syndication
- Subscribing to comment threads (not just article posts)
- Trackback / Pingback support
- Remote administration (XML-RPC)
- Ability to author/administrate multiple sites
- Ability to mirror another site
There is a school of thought that encourages documenting and determining solutions for all problems before writing the first line of code. But that’s not how I do my own projects. Instead, I like to select core functionality and deliver a working proof-of-concept first. Then I schedule feature releases based on groups of functionality and iterate on the design. I find that by diving right in I uncover problems I wasn’t aware of at first, and I get a better understanding of the problem space. It’s also better to be missing some features than to never deliver at all (i.e., getting overwhelmed and never finishing, which is quite common on personal projects like this).
In future articles I will describe the various approaches I took in solving each of these problems, as the evolution of a solution is often as interesting as the final result. But before we dive into the technical details, I’ll finish the rest of the story.
I like the concept of a Wiki as a means of revising and interlinking a collection of content, but once the repositories grew, I found myself needing a page to track changes, updates, and additions to the site. This soon became the home page of the Wiki, listing changes by date. The layout and use of this new page was quickly blending its purpose with another genre of CMS I’m interested in: the Blog.
I started thinking about Blogging and Wiki’ing as content management problems, and as it happens, there’s so much overlap that it is feasible to build one framework that can do both. The difference is not so much how content is stored and managed, but how the templating and interactions are handled. From the author’s perspective, a Blog post and a Wiki page are identical: a Blog is just a single-user Wiki. From a reader’s perspective, the Blog page just isn’t editable; it still reads the same.
Naturally, the project’s scope grew. It was now going to be a single CMS framework capable of running (or blending) Blogs and Wikis. Fortunately, with the core CMS engine already working, getting a crude blog site running was almost as simple as writing some new layout templates and queries. Of course, Blogs do bring their own set of problems. To have a competitive feature set, a modern Blog engine has to keep up with the hotly debated and diverse world of Blog APIs.
With a Blog mode and a Wiki mode well under way, there was still one more piece to the puzzle: Content Aggregation. This one doesn’t seem to fit as well, but I have good reasons to think it might blend with the other two. The motivation came from two angles:
First, I use a desktop newsreader application called NetNewsWire to track syndicated content. When I have a network connection, this app is fantastic. But being the content junkie I am, it bothers me that I’m not able to update my feeds while I’m traveling or otherwise offline. I’ve debated switching to a web-based aggregator that can run on a server (keeping content up-to-date), but I really do like NetNewsWire.
Second, while many bloggers list their blogroll, I find this minimally useful. It’s nice to know what sites someone follows, but I’ve got more than enough subscriptions to fill my spare cycles. Instead of listing subscription URLs, it would be far more useful to know which articles from their blogroll a person recommends. Some bloggers do this with “check this out” posts and off-site links, but that seems rather inefficient since it requires an active post by the blogger.
It turns out there is a simple solution to both of these problems: a proxy. Not just any proxy though; for this to work, it has to be knowledgeable about syndication formats, namely RSS, RDF, and Atom. Instead of using my desktop reader to fetch feeds directly, I pass the requested RSS URL as an HTTP GET argument to a CGI script on my server. The server fetches the RSS, parses it, modifies it, and passes it to my desktop client. With the RSS parsed and stored, I’ve also enabled a History mechanism with long-term storage, and the ability to browse the archive using a web browser if needed.
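To make the request flow concrete, here is a minimal Python sketch of the fetch, parse, and store steps. It is not the code running on this site: the `feed` parameter name, the `proxy.example` URL, the injected `fetch` callable, and the dictionary used as the history store are all assumptions made for illustration. It only understands plain RSS 2.0 `<item>` elements, so RDF and Atom (which use XML namespaces) would need extra handling, and the modification step is left out.

```python
from urllib.parse import parse_qs, urlsplit
import xml.etree.ElementTree as ET


def feed_url_from_request(request_url):
    """Pull the upstream feed URL out of the proxy request's query string.

    The 'feed' parameter name is a hypothetical choice for this sketch.
    """
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("feed")
    if not values:
        raise ValueError("missing 'feed' parameter")
    target = values[0]
    # Refuse anything that is not a web URL (file:, ftp:, and so on).
    if urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("only http(s) feeds are proxied")
    return target


def proxy_feed(request_url, fetch, history):
    """Fetch the requested feed, record its items, and return the XML.

    fetch   -- callable taking a URL and returning the feed as text or bytes
               (injected so the sketch can run without a network)
    history -- dict used as a stand-in for long-term storage:
               {feed_url: {item_link: item_title}}
    """
    feed_url = feed_url_from_request(request_url)
    root = ET.fromstring(fetch(feed_url))
    for item in root.iter("item"):  # RSS 2.0 items only
        link = item.findtext("link")
        if link:
            history.setdefault(feed_url, {})[link] = item.findtext("title")
    return ET.tostring(root, encoding="unicode")
```

In a real CGI script, `fetch` could wrap `urllib.request.urlopen`, and the history would live in a database rather than in memory, which is what would allow browsing the archive later.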
You may have caught the bit about the proxy modifying the RSS (and indeed you should, as it seems rather odd). The modifications are to the “about” or “link” URL for the RSS item. Instead of keeping the original link, the proxy inserts a unique URL pointing to another CGI service on the server. By proxying these links, the server now knows which articles I request, and that is far more valuable than a blogroll! Instead of giving a list of bookmarks or “read this” posts, this system enables a view into my aggregated content, editorialized by my action of following links from a feed. By offering this list as its own RSS feed, readers can subscribe to the “what am I reading online” feed.
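The link rewriting can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than the actual implementation: the `go.cgi` address, the `to` parameter, and the list used as a reading log are hypothetical, and carrying the original URL in the query string is just one simple way to make each rewritten link unique. As before, only un-namespaced RSS 2.0 `<link>` elements are handled.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
import xml.etree.ElementTree as ET

# Hypothetical address of the second CGI service that records clicks.
REDIRECT_BASE = "http://example.org/cgi-bin/go.cgi"


def rewrite_links(rss_text, redirect_base=REDIRECT_BASE):
    """Point every item <link> at the redirect service instead of the article."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        link = item.find("link")
        if link is not None and link.text:
            original = link.text.strip()
            link.text = redirect_base + "?" + urlencode({"to": original})
    return ET.tostring(root, encoding="unicode")


def follow(request_url, reading_list):
    """Handle a click: log the article, then return where to redirect.

    reading_list stands in for whatever store backs the
    "what am I reading online" feed.
    """
    target = parse_qs(urlsplit(request_url).query)["to"][0]
    reading_list.append(target)
    return target  # a CGI script would answer with a redirect to this URL
```

One caveat with this particular design: a service that redirects to whatever URL it is handed can be abused by third parties, so a real version would probably only honor links it has already seen in a proxied feed.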
At first pass, this RSS proxy was a stand-alone system; however, during some unfortunate late-night refactoring, it merged into the Blog/Wiki CMS engine tree. The rationale was that the Blog engine should have access to the Aggregator tool and the editorialized content feed. This is the current state of affairs, but I no longer feel this is the right direction.
So there you have it, the history behind my latest CMS project. I’m still working on the next batch of articles to discuss the approaches taken during development, the mistakes made, and the plans for growth. Finishing the code is a priority, but I will hopefully have a few articles ready soon.