<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Moving my blog from WordPress to Django; Part 2: Migrating the data</title>
	<atom:link href="http://www.eriksmartt.com/blog/archives/306/feed" rel="self" type="application/rss+xml" />
	<link>http://www.eriksmartt.com/blog/archives/306</link>
	<description>my little chunk of bandwidth</description>
	<lastBuildDate>Sat, 06 Mar 2010 19:36:28 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<item>
		<title>By: erik</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-38281</link>
		<dc:creator>erik</dc:creator>
		<pubDate>Wed, 24 Jun 2009 01:24:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-38281</guid>
		<description>Hmm..  I wonder if you might use a SAX parser instead, so that the whole XML document doesn&#039;t have to be loaded into RAM.</description>
		<content:encoded><![CDATA[<p>Hmm..  I wonder if you might use a SAX parser instead, so that the whole XML document doesn&#8217;t have to be loaded into RAM.</p>
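<p>Here is a minimal sketch of the SAX idea using Python's standard <code>xml.sax</code> module. The handler keeps only the item currently being parsed and hands it to a callback, so memory use stays flat however large the export is. The class and function names are hypothetical, and so is the choice to collect only the title and <code>content:encoded</code> fields. Namespace processing is left off, which is the <code>xml.sax</code> default, so element names arrive with their prefixes. Like any conforming XML parser, it still stops at the first well-formedness error.</p>
```python
import xml.sax


class ItemHandler(xml.sax.ContentHandler):
    """Collect one RSS/WXR item at a time and pass it to a callback.

    Only the item currently being parsed is held in memory, so the
    whole export never has to be loaded into RAM at once.
    """

    # Child elements of an item to keep (an illustrative choice).
    FIELDS = ("title", "content:encoded")

    def __init__(self, on_item):
        super().__init__()
        self.on_item = on_item  # called with a dict for each finished item
        self.current = None     # dict for the item being parsed, if any
        self.field = None       # name of the field element being read
        self.buffer = []        # text chunks for that field

    def startElement(self, name, attrs):
        if name == "item":
            self.current = {}
        elif self.current is not None and name in self.FIELDS:
            self.field = name
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one field's text in several chunks.
        if self.field is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.buffer)
            self.field = None
        elif name == "item" and self.current is not None:
            self.on_item(self.current)
            self.current = None


def stream_items(source, on_item):
    """Parse a file path or binary file object, one item at a time."""
    xml.sax.parse(source, ItemHandler(on_item))
```
<p>A call might look like <code>stream_items("wordpress.xml", save_post)</code>, where <code>save_post</code> is a hypothetical function that creates the Django object for one post.</p>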
]]></content:encoded>
	</item>
	<item>
		<title>By: Sajal</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-38278</link>
		<dc:creator>Sajal</dc:creator>
		<pubDate>Tue, 23 Jun 2009 15:43:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-38278</guid>
		<description>Interesting post... I&#039;ve been planning for months to port my largest site to Django, but due to laziness and other projects I haven&#039;t started work yet.

I played with this a little earlier and figured the XML dump route wouldn&#039;t work for me. It would eat up all the RAM on the server and freeze MySQL. (My WordPress site has &gt; 200,000 posts and &gt; 500,000 tags... it&#039;s a news site.) I tested this method long ago when I wanted to play with Drupal.

The option I see is to use XML-RPC: write a Python script that fetches posts one by one and saves them in Django...

Or I&#039;d have to write my own code that generates the XML (or any other dump format) by pulling posts one by one.

If a script running on my server fetches posts one by one over XML-RPC, it would take 30+ minutes but wouldn&#039;t generate any load.</description>
		<content:encoded><![CDATA[<p>Interesting post&#8230; I&#8217;ve been planning for months to port my largest site to Django, but due to laziness and other projects I haven&#8217;t started work yet.</p>
<p>I played with this a little earlier and figured the XML dump route wouldn&#8217;t work for me. It would eat up all the RAM on the server and freeze MySQL. (My WordPress site has &gt; 200,000 posts and &gt; 500,000 tags&#8230; it&#8217;s a news site.) I tested this method long ago when I wanted to play with Drupal.</p>
<p>The option I see is to use XML-RPC: write a Python script that fetches posts one by one and saves them in Django&#8230;</p>
<p>Or I&#8217;d have to write my own code that generates the XML (or any other dump format) by pulling posts one by one.</p>
<p>If a script running on my server fetches posts one by one over XML-RPC, it would take 30+ minutes but wouldn&#8217;t generate any load.</p>
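<p>Here is a minimal sketch of the one-post-at-a-time XML-RPC approach using Python's standard <code>xmlrpc.client</code>. It assumes the blog's XML-RPC endpoint is enabled and that you already have a list of post IDs to fetch. <code>metaWeblog.getPost(postid, username, password)</code> belongs to the MetaWeblog API, which WordPress implements. The following are all assumptions: the function name, the pause between calls, and the choice to skip posts that return a fault. The optional <code>proxy</code> argument exists only so the function can be exercised without a live blog.</p>
```python
import time
import xmlrpc.client


def fetch_posts(endpoint, username, password, post_ids, pause=0.5, proxy=None):
    """Yield posts one at a time over XML-RPC instead of dumping them all at once.

    endpoint is the blog's XML-RPC URL (an assumption; WordPress usually
    serves it at /xmlrpc.php). Sleeping between calls keeps server load low.
    """
    server = proxy if proxy is not None else xmlrpc.client.ServerProxy(endpoint)
    for post_id in post_ids:
        try:
            # MetaWeblog API call: returns a dict (XML-RPC struct) for one post.
            post = server.metaWeblog.getPost(str(post_id), username, password)
        except xmlrpc.client.Fault as fault:
            # e.g. a deleted or missing ID: report it and move on.
            print(f"skipping post {post_id}: {fault.faultString}")
        else:
            yield post
        time.sleep(pause)
```
<p>Because this is a generator, each post can be saved to Django and discarded before the next one is requested. Only one post is in memory at a time, and the pause spreads the work out, so the run takes longer overall but stays light on the server.</p>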
]]></content:encoded>
	</item>
	<item>
		<title>By: erik</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-36880</link>
		<dc:creator>erik</dc:creator>
		<pubDate>Fri, 11 Jul 2008 14:15:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-36880</guid>
		<description>Thanks for the link, Martey... and the point can&#039;t be overstressed: the WXR that WordPress produces is an invalid mess! A strict parser will fail. If you want to work with it, you need a liberal parser that tolerates malformed input (and even then, you might need to touch up character encoding issues before you start).</description>
		<content:encoded><![CDATA[<p>Thanks for the link, Martey&#8230; and the point can&#8217;t be overstressed: the WXR that WordPress produces is an invalid mess! A strict parser will fail. If you want to work with it, you need a liberal parser that tolerates malformed input (and even then, you might need to touch up character encoding issues before you start).</p>
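<p>Here is a minimal sketch of the "touch up the encoding first" step using only Python's standard library. The standard library has no recovering parser; lxml's <code>XMLParser(recover=True)</code> is one third-party option. The sketch assumes the export is UTF-8, or latin-1 if it fails to decode as UTF-8. It strips the control characters XML 1.0 forbids before handing the text to ElementTree. It does not repair other well-formedness errors, such as unescaped ampersands. The function names are hypothetical.</p>
```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow anywhere in a document
# (tab, newline and carriage return are permitted, so they are kept).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_export(raw, fallback="latin-1"):
    """Decode raw export bytes and drop characters XML 1.0 forbids."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is latin-1.
        text = raw.decode(fallback)
    return ILLEGAL_XML_CHARS.sub("", text)


def parse_export(raw):
    """Parse the cleaned-up text with ElementTree and return the root element."""
    return ET.fromstring(clean_export(raw))
```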
]]></content:encoded>
	</item>
	<item>
		<title>By: Martey</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-36879</link>
		<dc:creator>Martey</dc:creator>
		<pubDate>Fri, 11 Jul 2008 10:11:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-36879</guid>
		<description>I tried doing this, but ran into serious issues because the WXR file it produced was not well-formed XML.

Other people have had similar issues: http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/</description>
		<content:encoded><![CDATA[<p>I tried doing this, but ran into serious issues because the WXR file it produced was not well-formed XML.</p>
<p>Other people have had similar issues: <a href="http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/" rel="nofollow">http://lucumr.pocoo.org/cogitations/2008/02/18/how-not-to-do-xml/</a></p>
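<p>Before trying any workaround, it can help to find exactly where the export stops being well-formed. Here is a minimal sketch using the standard library's ElementTree, whose <code>ParseError</code> carries a <code>(line, column)</code> position. The function name is hypothetical. It parses the whole document in memory, so it suits modest exports best.</p>
```python
import xml.etree.ElementTree as ET


def first_xml_error(raw):
    """Return None if raw (bytes or str) is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the underlying expat parser
        return line, column, str(err)
    return None
```
<p>With the export read in binary mode, a call to <code>first_xml_error(data)</code> points at the first offending spot, which can then be fixed by hand or by a cleanup script.</p>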
]]></content:encoded>
	</item>
	<item>
		<title>By: erik</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-31335</link>
		<dc:creator>erik</dc:creator>
		<pubDate>Wed, 28 Feb 2007 17:39:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-31335</guid>
		<description>Good point Jesse -- I had a number of problems with the XML that came out of the plugin as well (and I also found out along the way that my MySQL tables were storing &#039;latin-1&#039; instead of &#039;utf8&#039;... fun fun.) I ended up adding some entity encoding/translation to swap various entities and Unicode characters into something more ASCII-friendly. I hate doing this (I really wish it were easier to run everything as Unicode), but that&#039;s just where we&#039;re at with our tools right now, I guess.

The following are the characters I swapped:

&quot;&#8211;&quot; becomes &quot;--&quot;
&quot;&#8212;&quot; becomes &quot;--&quot;
&quot;&#8216;&quot; becomes &quot;&#039;&quot;
&quot;&#8217;&quot; becomes &quot;&#039;&quot;
&quot;&#8218;&quot; becomes &quot;&#039;&quot;
&quot;&#8220;&quot; becomes &#039;&quot;&#039;
&quot;&#8221;&quot; becomes &#039;&quot;&#039;
&quot;&#8222;&quot; becomes &#039;&quot;&#039;
&quot;&#8230;&quot; becomes &quot;...&quot;
&quot;&#8243;&quot; becomes &#039;&quot;&#039;
&quot;\u2019&quot; becomes &quot;&#039;&quot;
&quot;\u2029&quot; becomes &quot;?&quot;</description>
		<content:encoded><![CDATA[<p>Good point Jesse &#8212; I had a number of problems with the XML that came out of the plugin as well (and I also found out along the way that my MySQL tables were storing &#8216;latin-1&#8217; instead of &#8216;utf8&#8217;&#8230; fun fun.) I ended up adding some entity encoding/translation to swap various entities and Unicode characters into something more ASCII-friendly. I hate doing this (I really wish it were easier to run everything as Unicode), but that&#8217;s just where we&#8217;re at with our tools right now, I guess.</p>
<p>The following are the characters I swapped:</p>
<p>&#8220;&amp;#8211;&#8221; becomes &#8220;--&#8221;<br />
&#8220;&amp;#8212;&#8221; becomes &#8220;--&#8221;<br />
&#8220;&amp;#8216;&#8221; becomes &#8220;'&#8221;<br />
&#8220;&amp;#8217;&#8221; becomes &#8220;'&#8221;<br />
&#8220;&amp;#8218;&#8221; becomes &#8220;'&#8221;<br />
&#8220;&amp;#8220;&#8221; becomes &#8216;"&#8217;<br />
&#8220;&amp;#8221;&#8221; becomes &#8216;"&#8217;<br />
&#8220;&amp;#8222;&#8221; becomes &#8216;"&#8217;<br />
&#8220;&amp;#8230;&#8221; becomes &#8220;...&#8221;<br />
&#8220;&amp;#8243;&#8221; becomes &#8216;"&#8217;<br />
&#8220;\u2019&#8221; becomes &#8220;'&#8221;<br />
&#8220;\u2029&#8221; becomes &#8220;?&#8221;</p>
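<p>The swaps above can be sketched as a small lookup table in Python. This is an illustration, not the code actually used for the migration. It applies each replacement to both the numeric-entity text and the raw character, which slightly generalizes the list (the list gives \u2019 and \u2029 only as raw characters). Like the original list, it is lossy. The table and function names are hypothetical.</p>
```python
# Code points from the list above and their ASCII stand-ins.
ASCII_SWAPS = {
    8211: "--",   # en dash
    8212: "--",   # em dash
    8216: "'",    # left single quotation mark
    8217: "'",    # right single quotation mark (U+2019)
    8218: "'",    # single low-9 quotation mark
    8220: '"',    # left double quotation mark
    8221: '"',    # right double quotation mark
    8222: '"',    # double low-9 quotation mark
    8230: "...",  # horizontal ellipsis
    8243: '"',    # double prime
    0x2029: "?",  # paragraph separator (U+2029)
}


def asciify(text):
    """Replace each listed character, in numeric-entity or raw form, with its ASCII stand-in."""
    for code, replacement in ASCII_SWAPS.items():
        text = text.replace("&#%d;" % code, replacement)  # numeric entity text
        text = text.replace(chr(code), replacement)       # the character itself
    return text
```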
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse Legg</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-31334</link>
		<dc:creator>Jesse Legg</dc:creator>
		<pubDate>Wed, 28 Feb 2007 17:04:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-31334</guid>
		<description>One additional note, in case anyone else reads this and has a similar problem. 

The export plugin *might* generate some malformed XML, like unescaped or illegal characters. For example, I often use &#8212;, an em-dash. ElementTree seemed fine with it, but working with that data subsequently caused Python to throw a UnicodeEncodeError. A simple &#039;print results[&#039;body&#039;]&#039;, for instance, raises the following error:

&gt;&gt;&gt; UnicodeEncodeError: &#039;ascii&#039; codec can&#039;t encode character u&#039;\u2014&#039; in position 499: ordinal not in range(128)

I&#039;m no XML expert. In fact, I know very little about character sets in general. So perhaps I need to change Python&#039;s character set or something. I didn&#039;t bother. Instead, I manually edited the RSS file and removed the offending characters.</description>
		<content:encoded><![CDATA[<p>One additional note, in case anyone else reads this and has a similar problem. </p>
<p>The export plugin *might* generate some malformed XML, like unescaped or illegal characters. For example, I often use &#8212;, an em-dash. ElementTree seemed fine with it, but working with that data subsequently caused Python to throw a UnicodeEncodeError. A simple 'print results['body']', for instance, raises the following error:</p>
<p>&gt;&gt;&gt; UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 499: ordinal not in range(128)</p>
<p>I&#8217;m no XML expert. In fact, I know very little about character sets in general. So perhaps I need to change Python&#8217;s character set or something. I didn&#8217;t bother. Instead, I manually edited the RSS file and removed the offending characters.</p>
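<p>The error most likely comes from printing rather than parsing. <code>print</code> had to encode the em-dash for an output stream whose encoding was ASCII. In the Python 2 of the time, the usual fix was to encode explicitly, e.g. <code>print results['body'].encode('utf-8')</code>. Below is a minimal Python 3 sketch of the same idea. It assumes you would rather see a backslash escape than lose the character, and the helper name is hypothetical.</p>
```python
import sys


def printable(text, encoding=None):
    """Return text in a form the target encoding can represent.

    Characters the encoding lacks (such as an em dash under ASCII) become
    backslash escapes instead of raising UnicodeEncodeError.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


print(printable("an em dash \u2014 here", "ascii"))  # prints: an em dash \u2014 here
```
<p>This approach leaves the stored data untouched, so unlike editing the export by hand, no characters are lost on the way into Django.</p>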
]]></content:encoded>
	</item>
	<item>
		<title>By: erik</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-31228</link>
		<dc:creator>erik</dc:creator>
		<pubDate>Sat, 17 Feb 2007 19:45:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-31228</guid>
		<description>Hi Jesse, thanks for the comment. I updated the URL. (And thanks for the tip on your blog about Wagamama coming to Boston. I&#039;m in Boston more often than London lately.)</description>
		<content:encoded><![CDATA[<p>Hi Jesse, thanks for the comment. I updated the URL. (And thanks for the tip on your blog about Wagamama coming to Boston. I&#8217;m in Boston more often than London lately.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse Legg</title>
		<link>http://www.eriksmartt.com/blog/archives/306#comment-31225</link>
		<dc:creator>Jesse Legg</dc:creator>
		<pubDate>Sat, 17 Feb 2007 14:23:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.eriksmartt.com/blog/archives/306#comment-31225</guid>
		<description>Awesome post! I&#039;m working on this very same thing at the moment, and your insights were helpful. I was having a hard time finding a decent export solution. You might want to include the full link to the WXR export plugin instead of the Google search. The link is &lt;a href=&quot;http://www.technosailor.com/wordpress-to-wordpress-import/&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Awesome post! I&#8217;m working on this very same thing at the moment, and your insights were helpful. I was having a hard time finding a decent export solution. You might want to include the full link to the WXR export plugin instead of the Google search. The link is <a href="http://www.technosailor.com/wordpress-to-wordpress-import/" rel="nofollow">here</a>.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
