My current focus is on re-writing the article storage mechanism. In the first version, everything was stored in a database (even the version control mechanism used the database), but this soon became problematic: as new functionality was added, the data-model changed so often that I couldn’t keep up with writing import/export routines. To simplify, I moved everything to flat-file storage. Since the system was designed to cache the HTML, the performance hit of reading text files wasn’t really an issue. Furthermore, I’m a big fan of leveraging the tools one already has. I’m a Unix geek, so having ‘diff’, ‘vi’, ‘sed’, etc. on hand is a good thing. Managing plain-text files is also a very natural way to work with articles.
With the CMS working from flat-files, the next step was naturally to re-write it. The plan now is that the web interface will only work with my CMS libraries, and the CMS libraries will work with a database that can be imported from or exported to the current flat-file format. From a content authoring perspective, articles can still be written as plain-text and manually placed in the “data” directory. The admin tool will contain a “sync with file-system” function, where the database will re-sync and find the new articles. The file-system will remain the ultimate authority, meaning that the synchronization process could in theory start by dropping all articles from the database and rebuilding from scratch (although that wouldn’t be very efficient).
The database replication process may seem cumbersome for frequent content editing, but there’s a hidden benefit: since the articles must always be scanned and imported, sanity checks and validation can run before anything is committed to the database. If an updated article fails those checks, its update is skipped, so the previously valid content stays live until the updated article is parse-able.
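To make the validate-before-commit idea concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not the CMS's actual code: the article format (title on the first line, body after it), the `articles` table, and the function names `parse_article` and `sync_with_filesystem` are all assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def parse_article(text):
    """Hypothetical format: the first line is the title, the rest is the body."""
    title, _, body = text.partition("\n")
    if not title.strip() or not body.strip():
        raise ValueError("article needs a title line and a body")
    return title.strip(), body.strip()


def sync_with_filesystem(conn, data_dir):
    """Import every *.txt file in data_dir; return the names that failed validation."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(name TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    rejected = []
    for path in sorted(Path(data_dir).glob("*.txt")):
        try:
            title, body = parse_article(path.read_text(encoding="utf-8"))
        except ValueError:
            # Validation failed: leave any previously imported row untouched.
            rejected.append(path.name)
            continue
        with conn:  # commit each valid article in its own transaction
            conn.execute(
                "INSERT OR REPLACE INTO articles (name, title, body) "
                "VALUES (?, ?, ?)",
                (path.name, title, body),
            )
    return rejected
```

Calling `sync_with_filesystem(sqlite3.connect("blog.db"), "data")` would import whatever parses and hand back a list of files that need fixing. Because a rejected file never reaches the `INSERT`, the last good version of that article stays in the database.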
The logic to make all this work is fairly straightforward, so now it’s the data-model holding things up. The problem is that I don’t want to pollute my personal article storage with the storage of the news aggregator. The aggregator will be bringing in hundreds of items daily, but I post articles only a couple of times a week. The aggregator content is also less valuable to me, meaning that if it became corrupted, nuking that database would only be a mild setback.
The main CMS class was designed to use a single database, but to add tables only as needed. If you never instantiated a news aggregator, it would never run the create statements for that service, thus keeping the data model neat and tidy. What I’m changing to now is multiple database files: one for the aggregator, one for the blog engine, and one for each wiki instance. That last sentence would definitely sound odd if I were using MySQL, PostgreSQL, or some other traditional database daemon, but I’m not. I’m using SQLite. I’ll save my full review of SQLite for another post, but the thing to note at this point is that SQLite stores each database as an ordinary file on the filesystem instead of in a single master repository. The immediate benefit is in data portability.
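As a rough illustration of the one-file-per-service layout, here is a sketch using Python's standard `sqlite3` module. The class names (`Service`, `Blog`, `Aggregator`), the table definitions, and the `.db` file naming are hypothetical, not taken from the actual CMS.

```python
import sqlite3
from pathlib import Path


class Service:
    """Hypothetical base class: each service owns one SQLite file."""

    schema = ""  # CREATE statements, overridden by each service

    def __init__(self, db_dir, name):
        self.path = Path(db_dir) / f"{name}.db"
        # SQLite creates the database file if it does not exist yet.
        self.conn = sqlite3.connect(str(self.path))
        # The create statements run only when the service is instantiated.
        self.conn.executescript(self.schema)


class Blog(Service):
    schema = (
        "CREATE TABLE IF NOT EXISTS articles "
        "(name TEXT PRIMARY KEY, title TEXT, body TEXT);"
    )


class Aggregator(Service):
    schema = (
        "CREATE TABLE IF NOT EXISTS items "
        "(url TEXT PRIMARY KEY, title TEXT, fetched_at TEXT);"
    )
```

With this layout, a site that never instantiates `Aggregator` never gets an aggregator file at all, and a corrupted aggregator database can be deleted without touching the blog's articles.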