Blog-split

Occasionally I post (or could post) things that are not fit for Debian Planet (non-English) or Ubuntu.lv Planet (unrelated to Latvian or Debian/Ubuntu matters), so I finally budged and switched the planet feeds to their own categories so that such things can be managed on a per-post basis.

Does anyone know a Wordpress plugin that would allow setting multiple default categories?



Debconf7 photos

My blog server has been down ever since I left for Edinburgh for the 7th Debian conference. Since then I have been busy taking photos of everything and everyone here, and I am not alone in that. Enjoy!



A must

It is really a must-have - my blog gets mentioned in the DWN for the first time and at the same time (or even a bit earlier) the electricity gets cut to the building where my server is co-located. And it takes a couple of days for the local administrator to get through all that chaos and turn my server back on. Perfect timing :P.



Spelling

From the previous version of my last post, people with even moderate knowledge of English could have easily understood that I suck at spelling and that, consequently, I did not have a spelling checker installed on this blog. Both of those conclusions would be true.
So I decided to give in and install something to help me, and after some mishaps I settled on the Visual Spellcheck plug-in. I am editing my posts in HTML anyway, so the lack of WYSIWYG editor support is not critical for me, more like the opposite. All I needed was to install the plug-in, activate it in Wordpress, install the php5-pspell package and restart Apache. I forgot to restart Apache at first and got a cryptic error from the included fake pspell wrapper. Also, aspell and the corresponding language libraries must be installed on the server side.
Note: after you have corrected all spelling errors in your text you must also remember to press the “Continue Editing” link, otherwise the changes will not be saved. I think that is a bug.



Another blog move

Got access to a new, more powerful server, moved my blog from Mnemosyne to Wordpress, hopefully did not flood p.d.o, did not move the comments over yet.

To avoid flooding planet.d.o with the move, I made this little change to the Wordpress feed - at the very end of the wp-atom.php file there is this line:


<?php $items_count++; if (($items_count == get_settings('posts_per_rss')) && empty ...

So I changed it to this:


<?php $items_count++; if ((($items_count == get_settings('posts_per_rss')) || $post->post_date < 1150758000 ) && empty ...

Where 1150758000 is the UNIX time of some point between the last post from the old blog and the first post in the new blog.
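To pick or double-check such a cutoff value, it helps to convert it to a readable date. The following is a small Python sketch of the cutoff idea only, not a translation of the Wordpress code; the function name posts_for_feed and the (timestamp, title) post format are made up for illustration.

```python
from datetime import datetime, timezone

CUTOFF = 1150758000  # the UNIX time used in the feed change above


def posts_for_feed(posts, limit, cutoff=CUTOFF):
    """Return titles for the feed.

    posts is a hypothetical list of (unix_time, title) tuples, newest first.
    Collection stops once `limit` posts are taken or a post older than the
    cutoff is reached, so imported old entries never show up in the feed.
    """
    selected = []
    for posted_at, title in posts:
        if len(selected) == limit or posted_at < cutoff:
            break
        selected.append(title)
    return selected


# Human-readable form of the cutoff, in UTC.
cutoff_utc = datetime.fromtimestamp(CUTOFF, tz=timezone.utc)
print(cutoff_utc.isoformat())
```

Going the other way, int(datetime(...).timestamp()) on a timezone-aware datetime gives the number to paste into the feed template.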

And I also imported my old post entries from an Atom feed, using the RSS feed importer as a base. The diff is after the break.

(Continued)



Reiserfs glitch

Yesterday my server went down again. This time it was a combination of hardware and software. On the software side my ReiserFS filesystem got corrupted in a funny way - I could not delete some files under /var. Those were files modified after a specific time yesterday, but the effect was local to the /var directory and did not affect any files in any other directories on the same partition. No suspicious messages in the logs for the whole day. It gives back EPERM = “Permission denied” errors even for root. I “fixed” the problem by “cp -r /var /var.new && mv /var /var.old && mv /var.new /var”. All files were fine in the new directory (no corruption) and I could delete the same files there. But /var.old is still not deletable even by “rm -rf” as root. Any ideas?

Of course I just thought that I would reboot now so that all services would use the new /var. And of course the system did not come up because the USB chip died. I had to go to the server room and disable USB in the BIOS to get it up again. I hate hardware. :P



Just when you think you’re done …

Just a day after I announced my new blog, the server hosting it went off-line. The funny thing is - it was because of an upgrade. As it turns out, after I updated Debian sid on the server, a new version of udev was installed. A version that no longer worked with kernels << 2.6.12, and sure enough the kernel I had there was 2.6.11. I mean, what the hell is that? Why would you drop support for kernels that are only 4 versions old???

Could you not just have a copy of the old code around and call that if the kernel is old?

Anyway, it appears that udev was tasked with loading the drivers for my server's network card, so when the server was rebooted (power problems) it would not bring its network back up. Only today did I manage to get to that server and find out what the problem was.

Oh, and I also noticed that the LVM packages are about to break too - apparently the new packages will only work with kernels >> 2.6.14. Just wonderful!



Migration finished

Hello again, dear Debian Planet! I have a surprise for you. While you were not watching I made myself a brand new shining blog using the fine Mnemosyne software. It is blog software written in Python that uses Kid templates and a lot of wonderful pythonic magic (for the uninitiated) to generate a set of XML-compliant static files.

I added support for comments, a tag cloud and some other dynamic features using the magic of AJAX.

The feed for the planet is tweaked to only display things that have not been posted to the old blog, so this is no flood - it is only the new posts (and one old one for completeness). I especially hope the DPL platform comparison will get some feedback.



Testing blogging from GMail

This particular post is being written from GMail, transferred to my blog server by email, processed by some massage scripts and then integrated into my blog without me doing much about it. Neat, ne?

Pārbāūdīšu arī rāķštīšāņū Latviešu valodā.

Убедитесь в том, что вы готовы встретиться с любой проблемой, возможной в этой ситуации.

日本

This concludes language testing. :)



Migrating from Blogger to your own Mnemosyne blog

As I decided to migrate my blog from blogger.com to my own blogging software on my own domain, I did not want to leave my old posts behind. As I had more than 150 of them, I had to automate the process to get it done within my lifetime.

The first step is to get the data from the Blogger.com database to somewhere where you can access it in full. You will need FTP or SFTP access to a computer with a real IP address. The idea is to go to blogger.com and specify that you want to switch the storage of your blog pages from their blogspot.com service to your own FTP or SFTP server. You will have to provide the server address, URL, folder, username and password. The URL does not matter at this point, but the other parameters define where your data will be dumped.

Save the changes and republish the blog. After a few dozen minutes all your data will be on your server as a set of complete HTML files. Now we need something that will parse those HTML files and write out something that your blogging software can understand. As I use Mnemosyne, my format is a set of mail message files.

To allow me to use HTML formatted messages in my blog (and not only reST) I added this to my config.py:

    # Imports needed by this snippet (add them at the top of config.py if missing).
    import re, time
    import xml.dom.minidom, xml.parsers.expat
    import docutils.core

    class EntryMixin:
        def _init_content(self):
            """Read in the message's body, strip any signature, and format using
            reStructuredText unless X-Format=='html'."""

            s = self.msg.get_payload(decode=True)
            if not s: return ''

            # Cut off the mail signature, if there is one.
            try: s = s[:s.rindex('-- \n')]
            except ValueError: pass

            body = False

            try:
                if self.msg['X-Format'] == "html":
                    body = s.replace("&nbsp;", " ")
                    # Escape bare ampersands that do not start an entity.
                    body = re.sub(r'&(?!\w{1,10};)', r'&amp;', body)
                    # Round-trip through the XML parser to make sure it is valid.
                    body = xml.dom.minidom.parseString("<div>"+body+"</div>").toxml()
            except KeyError:
                pass
            except xml.parsers.expat.ExpatError, e:
                print "W: Parse failed for "+self.msg['Subject']+" at "+self.msg['Date']+" from "+str(int(time.mktime(time.strptime(self.msg["Date"],"%a %b %d %H:%M:%S %Y"))))
                print xml.parsers.expat.ErrorString(e.code), e.lineno, e.offset

            # No usable HTML body: fall back to reST.
            if not body:
                parts = docutils.core.publish_parts(s, writer_name='html')
                body = parts['body']

            self.cache('content', body)
            return body

This will try to parse messages as pure xHTML if the custom header “X-Format” is set to “html” in the blog entry. There is one problem with this approach - the xHTML must be valid, otherwise the XML parser in the Kid templating engine will fail and there will be no end of trouble. That is why even in HTML mode we reparse the body to XML and back to a string again. If we get a parsing error at that point, we fall back to the reST parser.
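The validate-or-fall-back idea can be tried outside Mnemosyne with a few lines of Python 3. This is only a minimal sketch under my own assumptions, not the config.py code itself: the function name normalize_fragment is made up, and returning None stands in for "fall back to reST".

```python
import re
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def normalize_fragment(fragment):
    """Return the fragment as well-formed XML wrapped in a <div>, or None.

    None means the text is not valid XML, so the caller should fall back
    to another formatter.
    """
    body = fragment.replace("&nbsp;", " ")
    # Escape ampersands that are not the start of something like &name;
    body = re.sub(r"&(?!\w{1,10};)", "&amp;", body)
    try:
        document = xml.dom.minidom.parseString("<div>" + body + "</div>")
    except ExpatError:
        return None
    # Serialise only the root element, without the XML declaration.
    return document.documentElement.toxml()


print(normalize_fragment("<p>a & b</p>"))
print(normalize_fragment('<img src="smile.png">'))
```

The second call returns None because the img tag is never closed, which is exactly the kind of input that would otherwise break the templating step later.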

Now we need something that will analyse our Blogger.com generated HTML files and get our content from there. Here is the script that I used:

    #!/usr/bin/python

    import os, os.path, time, sys, glob, re, xml.dom.minidom

    id = 1
    host = "old"

    # Date header, post body and footer (permalink plus hour:minute) patterns.
    mdate = re.compile( r'\">(\d+) (\D+) (\d+)<' )
    mbody = re.compile( r'</div>(.+)<div style=' )
    mfooter = re.compile( r'<a href=\"http://example.com/(.+)\" title=\"permanent link\">(\d+):(\d+)</a></em>' )

    files = glob.glob("old/*/*/*.html")

    for file in files:
        year = 0
        month = 0
        day = 0
        hour = 0
        minute = 0

        subject = ""
        oldurl = ""
        body = ""
        date = ""
        title = ""
        footer = ""

        # Simple state machine: 0 = look for date, 1 = look for title or body,
        # 2 = read title, 3 = look for body, 4 = read body, 5 = look for footer.
        f = open( file, "r" )
        status = 0
        for l in f:
            if status == 0:
                if l.find('class="date-header"')>0:
                    date = l
                    status = 1
            elif status == 1:
                if l.find('class="post-title"')>0:
                    status = 2
                if l.find('class="post-body"')>0:
                    status = 4
            elif status == 2:
                if len(l.strip()) > 0:
                    title = l
                    status = 3
            elif status == 3:
                if l.find('class="post-body"')>0:
                    status = 4
            elif status == 4:
                if len(l.strip()) > 5:
                    body += l
                if l.find('padding-bottom: 0.25em;')>0:
                    status = 5
            elif status == 5:
                if l.find('posted by ')>0:
                    footer = l
                    break

        f.close()

        rdate = mdate.search( date )
        rbody = mbody.search( body )
        rfooter = mfooter.search( footer )

        year = rdate.groups()[2].strip()
        month = rdate.groups()[1].strip()
        day = rdate.groups()[0].strip()

        subject = title.strip()

        body = rbody.groups()[0]
        body = "<p>"+body+" </p>"
        # Close unclosed img tags so that the result is valid XML.
        body = re.sub(r'<img([^>]*?[^/])>',r'<img\1/>',body)

        # No title: build a subject from the first line of the body text.
        if subject == "":
            subject = re.sub(r'<br.*?>',' ', body)
            subject = re.sub(r'</p.*?>',' ', subject)
            subject = re.sub(r'<.*?>','', subject)
            subject = subject.strip()
            line = subject.find('\n')
            if line < 0:
                # Single-line text: treat the whole text as the first line.
                line = len(subject)
            if line > 45:
                subject = subject[:40]+"..."
            else:
                subject = subject[:line]

        oldurl = "http://oldblog.blogspot.com/"+rfooter.groups()[0].strip()
        hour = rfooter.groups()[1].strip()
        minute = rfooter.groups()[2].strip()

        mtime = time.mktime(time.strptime(day+" "+month+" "+year+" "+hour+":"+minute, "%d %B %Y %H:%M"))
        outname = str(int(mtime))
        # Check for collisions in the directory that we actually write to.
        while os.path.exists( "entries/new/"+outname+"."+str(id)+"."+host ):
            id += 1
        outname = outname+"."+str(id)+"."+host
        out = open( "entries/new/"+outname, "w" )

        out.write("Date: "+time.ctime(mtime))
        out.write("\nSubject: "+subject)
        out.write("\nX-URL: "+oldurl)
        out.write("\nX-Tags: untagged")
        out.write("\nX-Format: html")
        out.write("\n\n")
        out.write(body)
        out.close()

Here I parse the HTML files using a simple state machine and then assemble all the data that we need to have, like the timestamp in the filename and the Date field. The script assumes that the current directory contains an “old/” directory with subdirectories like “2005”, “2006”, … which in turn have subdirectories for months, in which there are xHTML files for individual posts. The output is written to “entries/new/$timestamp.$id.$host” files.
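For readers who have not met the pattern, here is the state machine idea in isolation. It is a simplified Python 3 sketch and not the import script itself: the state names, the extract_post function and the class="post-footer" end marker are assumptions made for illustration (the real script stops on a style attribute and then looks for the “posted by” line).

```python
def extract_post(lines):
    """Walk the lines once, switching state whenever a marker is seen."""
    state = "find-date"
    post = {"date": "", "title": "", "body": ""}
    for line in lines:
        if state == "find-date":
            if 'class="date-header"' in line:
                post["date"] = line.strip()
                state = "find-title"
        elif state == "find-title":
            # A post may have no title, so the body marker is accepted too.
            if 'class="post-title"' in line:
                state = "read-title"
            elif 'class="post-body"' in line:
                state = "read-body"
        elif state == "read-title":
            if line.strip():
                post["title"] = line.strip()
                state = "find-body"
        elif state == "find-body":
            if 'class="post-body"' in line:
                state = "read-body"
        elif state == "read-body":
            if 'class="post-footer"' in line:  # hypothetical end marker
                break
            post["body"] += line
    return post


sample = [
    '<h2 class="date-header">20 June 2006</h2>\n',
    '<h3 class="post-title">\n',
    'Hello\n',
    '</h3>\n',
    '<div class="post-body">\n',
    '<p>Text</p>\n',
    '<p class="post-footer">posted by me</p>\n',
]
print(extract_post(sample))
```

Each state only knows which marker it is waiting for, which keeps the parser short and avoids needing a real HTML parser for pages whose layout is known in advance.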

After running this script and then running mnemosyne you will see a bunch of messages about failed parsing of some entries. That shows you where your xHTML is not valid. The usual problems are HTML entities (like &euro;) that the parser does not recognise and unclosed img or br tags:

 This will fail because of &euro;
 <img src="smile.png"> will fail, it must be <img src="smile.png"/>
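Closing such tags can be automated with one regular expression. The following Python 3 sketch is a variation on the idea and not the exact expression from the import script; the names VOID_TAG and close_void_tags are made up, and only lowercase img and br tags are handled.

```python
import re

# Matches <img ...> or <br ...>, with or without a trailing slash.
VOID_TAG = re.compile(r'<(img|br)\b([^>]*?)\s*/?>')


def close_void_tags(html):
    """Rewrite img and br tags in self-closing form, keeping their attributes."""
    return VOID_TAG.sub(r'<\1\2/>', html)


print(close_void_tags('<img src="smile.png"> and a<br>break'))
```

Tags that are already self-closing come out unchanged, so the function can safely be run more than once on the same text.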

My import script fixes img and br tags, but for other problems there is not much choice but to go through the entries and fix them up manually. Also, later on, if you want to paste some custom HTML into your post, you will have to mark the whole post as an HTML mode post and also manually check that your pasted HTML is valid XML.
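Some of that manual work could probably be automated too. One option, which I have not used in the import itself, is to turn named HTML entities into numeric character references, which an XML parser accepts without a DTD. A Python 3 sketch, where the function name numeric_entities is made up for illustration:

```python
import re
from html.entities import name2codepoint

# These five names are predefined in XML and can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def numeric_entities(html):
    """Replace named HTML entities such as &euro; with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave it alone
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&(\w+);", replace, html)


print(numeric_entities("This costs 5 &euro; &amp; more"))
```

Unknown names are left untouched on purpose, so they will still show up as parse failures and can be fixed by hand.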

There are some other fun things in this blog, but I will go into that in later posts.
