rewriting the web

I've been spending a little time lately brushing up my web-fu by tinkering with Ajax and Greasemonkey. The Ajax stuff is SO much nicer to use than the old-school IFRAME hacks for dynamic content. It brings a little web-developer tear to my eye to remember the mountains of code I once wrote to make this work cross-platform back in the NN/IE 4.x days. Now it's a method call and a callback. Beautiful.

Greasemonkey, on the other hand, is a whole different animal. I first heard about Greasemonkey at ETech, which prompted a huge light-bulb to appear overhead. Unfortunately it wasn't shining very brightly, and it took a little while for the gears to crunch over why I'd want to modify the pages I was surfing (probably because I get most of my content via RSS feeds). There's the obvious task of stripping ad banners and such, but that can be done with other tools already. The other problem is that I bounce back and forth between Firefox and Safari, and for the past couple of months I've been on a Safari kick (and Safari has no Greasemonkey).

So there I was, debating switching back to Firefox to get Greasemonkey, when I had two thoughts: first, JavaScript wouldn't be my language of choice for something like this, and second, there's an obscure feature in PithHelmet (a Safari plugin/hack) that can already do this.

In the revision history notes for PithHelmet, there's an entry on 2004-08-12 that reads "Machete allows you to clean up or remix web sites with small scripts." Oh yeah. Once you figure out how to use the extremely obtuse user interface, you'll find that PithHelmet has the ability to pipe incoming HTML to a shell script as stdin data, then route the stdout from the script back to the browser. The choice of which scripts to pipe to is decided based on pattern matching the URL.
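
To make that pipe model concrete, here's a rough sketch of the idea in Python. This is not how PithHelmet is implemented; the names run_filter, apply_rules, and RULES are made up for illustration, a Python one-liner stands in for a real filter script, and running every matching rule in order is my assumption rather than documented behavior.

```python
import re
import subprocess
import sys


def run_filter(argv, html):
    """Pipe html to the command's stdin and return whatever it writes to stdout."""
    result = subprocess.run(
        argv, input=html, stdout=subprocess.PIPE, text=True, check=True
    )
    return result.stdout


def apply_rules(url, html, rules):
    """Run html through each filter whose URL pattern matches (assumed behavior)."""
    for pattern, argv in rules:
        if re.search(pattern, url):
            html = run_filter(argv, html)
    return html


# Hypothetical rule table: the "script" here just upper-cases the page.
UPPERCASE = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]
RULES = [(r"^http://example\.com/", UPPERCASE)]

# The first URL matches the rule and gets filtered; the second passes through untouched.
print(apply_rules("http://example.com/page", "<p>hello</p>", RULES))
print(apply_rules("http://other.example/", "<p>hello</p>", RULES))
```

The point is only that a filter needs no knowledge of the browser: anything that reads stdin and writes stdout can be dropped in, in whatever language you like.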

To get started I picked something simple -- nuking 'target="_blank"' attributes from hyperlinks. Why? Because I hate it when sites assume that I want to open a link in a new window, and this is a pretty simple pattern to match. The script looks something like this:

#!/usr/local/bin/python
import re, sys

# Compile a regex pattern to match 'target="_blank"' inside an <a ...> tag.
# [^>]*? (rather than .*?) keeps each match within a single tag.
re_pattern = re.compile(r"""(<a )([^>]*?)(target\s*=\s*["']_blank["'])([^>]*?)(>)""")

if __name__ == '__main__':
    # Loop over each line in stdin
    for line in sys.stdin:
        # Write the line back to stdout after dropping any 'target="_blank"' matches
        sys.stdout.write(re_pattern.sub(lambda mo: "%s%s%s%s" % (mo.group(1), mo.group(2), mo.group(4), mo.group(5)), line))

With the script saved somewhere convenient and chmod u+x'd, I made a new rule in PithHelmet using "Regex URL Match" with the pattern: ^http:\/\/. And just like that... 'target="_blank"' was gone.

For more info and fodder on Greasemonkey, check out the following links: