WoWHead client for Linux

This is highly unofficial, but if you want to upload your World of Warcraft statistics to WoWHead on Linux, you might be able to do so with the following script. You will need curl and wget installed.

(Continued)



TurboGears widget errors

I am making a small project with TurboGears and I just love the widgets and the identity framework. There are some rough spots, like the documentation, but luckily, for the most part, you can just launch an IPython interpreter and use tab completion to see which functions are available. But then I got this error:
Traceback (most recent call last):
  File "./start-foo.py", line 23, in <module>
    from foo.controllers import Root
  File "…Foo/foo/controllers.py", line 22, in <module>
    action="save_settings"
  File "/var/lib/python-support/python2.5/turbogears/widgets/meta.py", line 169, in widget_init
    validator = generate_schema(self.validator, widgets)
  File "/var/lib/python-support/python2.5/turbogears/widgets/meta.py", line 277, in generate_schema
    if widget.is_named:
AttributeError: type object 'SettingsFields' has no attribute 'is_named'

I could not find any help on this. Luckily I found Lucas's manual, and by comparing the code I traced the problem to this part of mine:
settings_form = widgets.TableForm(
fields=SettingsFields,
action="save_settings"
)

Instead of the SettingsFields class itself, an instance of it needs to be passed (as in "fields=SettingsFields(),"). I can't wait to see all the goodies they are making now in TG 2.0, but considering my experience, I will wait until they update the documentation before diving in.
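To see why the class fails where an instance works, here is a minimal, self-contained sketch. Widget, WidgetsList and named_widgets are hypothetical stand-ins written for this illustration, not TurboGears' real classes; it assumes SettingsFields is a WidgetsList-style class, and the only detail carried over from the traceback is that generate_schema() checks widget.is_named on each field.

```python
class Widget(object):
    """Hypothetical stand-in for a TurboGears form field widget."""
    is_named = True


class WidgetsList(list):
    """Hypothetical, simplified stand-in for a WidgetsList: creating an
    instance collects the Widget attributes declared on the class."""

    def __init__(self):
        fields = [value for key, value in sorted(vars(type(self)).items())
                  if isinstance(value, Widget)]
        super(WidgetsList, self).__init__(fields)


def named_widgets(fields):
    """Mimic the failing check in meta.py's generate_schema()."""
    if not isinstance(fields, list):
        fields = [fields]  # a lone object is treated as a single widget
    return [widget for widget in fields if widget.is_named]


class SettingsFields(WidgetsList):
    email = Widget()
    nickname = Widget()
```

With an instance, named_widgets(SettingsFields()) returns the two declared widgets. Passing the bare class wraps the class object itself as a "widget", and looking up is_named on it raises the same "type object 'SettingsFields' has no attribute 'is_named'" error as in the traceback.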



Die C

I am not a good coder. In my mind, a coder is measured by how much he loves C (or, in clinical cases, assembler). I hate C and avoid it whenever I can, because I am always off by one, and often in more than one place, which makes debugging even more .. fun. :(

Currently I am coding an application that is centred around the use of artificial neural networks (more precisely, a multilayer perceptron). The NN part must be done in C for speed and memory usage (imagine adding and multiplying hundreds of millions of floating-point values just for one operation .. and having to do that 10 000 times for training .. while holding the whole data set in RAM, because you access every single value of it every second). With that in mind, I engineered my system so that the C part can be reduced to a couple of simple operations that I can adapt from the examples of the NN library. Even so, I barely managed to write those pieces. Working with MySQL in C, for example, is pure pain.
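Where the "hundreds of millions" come from: in a multilayer perceptron every connection costs one multiply and one add per input pattern. The pure-Python sketch below is an illustration only, not the Lightweight Neural Network library's API; the logistic activation and the (weights, biases) layer layout are assumptions made for the example.

```python
import math


def layer_forward(inputs, weights, biases):
    """One fully connected layer with a logistic activation.

    weights[j][i] is the weight from input i to output j.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        total = bias
        for w, x in zip(row, inputs):
            total += w * x  # one multiply and one add per connection
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


def mlp_forward(inputs, layers):
    """Propagate an input vector through a list of (weights, biases) layers."""
    for weights, biases in layers:
        inputs = layer_forward(inputs, weights, biases)
    return inputs


def multiply_adds(sizes):
    """Multiply-add count for one forward pass; sizes are the layer widths."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))
```

For a hypothetical net with 10 000 inputs, 10 000 hidden units and one output, multiply_adds([10000, 10000, 1]) is 100 010 000 for a single pattern in a single forward pass, which is why that inner loop belongs in C.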

The rest of the system is in Python, and I must say that Python is a godsend by itself, without even comparing it to C. I am now so used to Python that, while debugging my C program, I automatically wrote a chunk of Python into it and was very surprised when it did not work.

With that in mind, to ease the pain for future generations of neural network researchers, I will (someday) find some time and write Python wrappers for the nice and cool Lightweight Neural Network C library that I am currently using here.

BTW: I found that SourceForge is packed full of great software projects, but it is very hard to find them with Google. Moral: seeking good free software? wajig (or apt-get) it! Then SourceForge it! Then Freshmeat it! And only then Google it.



Migrating from Blogger to your own Mnemosyne blog

When I decided to migrate my blog from blogger.com to my own blogging software on my own domain, I did not want to leave my old posts behind. As I had more than 150 of them, I had to automate the move to get it done within my lifetime.

The first step is to get the data from the Blogger.com database to somewhere you can access it in full. You will need FTP or SFTP access to a computer with a real (public) IP address. The idea is to go to blogger.com and specify that you want to switch the storage of your blog pages from their blogspot.com service to your own FTP or SFTP server. You will have to provide the server address, URL, folder, username and password. The URL does not matter at this point, but the other parameters define where your data will be dumped.

Save the changes and republish the blog. After a few dozen minutes all your data will be on your server as a set of complete HTML files. Now we need something that will parse those HTML files and write out something your blogging software can understand. As I use Mnemosyne, my format is a set of mail message files.
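A minimal sketch of what one such entry looks like, built with Python's standard email module rather than by writing header lines by hand. make_entry is a hypothetical helper; the header names (Date, Subject, X-URL, X-Tags, X-Format) are the ones the import script in this post writes, and the assumption is that Mnemosyne accepts any well-formed mail message carrying them.

```python
import time
from email.message import Message


def make_entry(subject, body_html, mtime, old_url, tags="untagged"):
    """Return one blog entry as mail-message text (hypothetical helper)."""
    msg = Message()
    msg["Date"] = time.ctime(mtime)  # same format the config.py code parses
    msg["Subject"] = subject
    msg["X-URL"] = old_url
    msg["X-Tags"] = tags
    msg["X-Format"] = "html"  # tells config.py to treat the body as XHTML
    msg.set_payload(body_html)
    return msg.as_string()
```

The headers come out one per line, followed by a blank line and the body, which is exactly the layout the import script produces with its own write() calls.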

To allow HTML-formatted messages in my blog (and not only reST), I added this to my config.py:

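```python
# Python 2, like Mnemosyne itself; config.py may already import some of these.
import re
import time
import xml.dom.minidom
import xml.parsers.expat

import docutils.core


class EntryMixin:
    def _init_content(self):
        """Read in the message's body, strip any signature, and format using
        reStructuredText unless X-Format=='html'."""

        s = self.msg.get_payload(decode=True)
        if not s:
            return ''

        # Strip a trailing mail signature (introduced by a "-- " line), if any.
        try:
            s = s[:s.rindex('-- \n')]
        except ValueError:
            pass

        body = False

        if self.msg['X-Format'] == "html":
            # The XML parser knows no HTML entities: turn &nbsp; into a space
            # and escape bare ampersands before checking well-formedness.
            html = s.replace("&nbsp;", " ")
            html = re.sub(r'&(?!\w{1,10};)', r'&amp;', html)
            try:
                body = xml.dom.minidom.parseString("<div>" + html + "</div>").toxml()
            except xml.parsers.expat.ExpatError, e:
                # Not well-formed: warn and leave body False, so the reST
                # fallback below is used instead of the broken HTML.
                stamp = int(time.mktime(time.strptime(self.msg['Date'], "%a %b %d %H:%M:%S %Y")))
                print "W: Parse failed for %s at %s from %d" % (self.msg['Subject'], self.msg['Date'], stamp)
                print xml.parsers.expat.ErrorString(e.code), e.lineno, e.offset

        if not body:
            parts = docutils.core.publish_parts(s, writer_name='html')
            body = parts['body']

        self.cache('content', body)
        return body
```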

This will try to parse a message as pure XHTML if the custom header "X-Format" is set to "html" in the blog entry. There is one problem with this approach: the XHTML must be valid, otherwise the XML parser in the Kid templating engine will fail and there will be no end of trouble. That is why, even in HTML mode, we reparse the body to XML and back to a string again. If we get a parsing error at that point, we fall back to the reST parser.

Now we need something that will analyse our Blogger.com generated HTML files and get our content from there. Here is the script that I used:

```python
#!/usr/bin/python
# Convert Blogger-published HTML files (old/YEAR/MONTH/*.html) into
# Mnemosyne mail-message entries in entries/new/.
import glob
import os
import re
import time

id = 1
host = "old"
outdir = "entries/new/"

mdate = re.compile(r'">(\d+) (\D+) (\d+)<')
mbody = re.compile(r'</div>(.+)<div style=')
mfooter = re.compile(r'<a href="http://example.com/(.+)" title="permanent link">(\d+):(\d+)</a></em>')

for path in glob.glob("old/*/*/*.html"):
    date = title = body = footer = ""

    # States: 0 = looking for the date header, 1 = looking for title or body,
    # 2 = reading the title, 3 = looking for the body, 4 = reading the body,
    # 5 = looking for the "posted by" footer.
    f = open(path, "r")
    status = 0
    for l in f:
        if status == 0:
            if 'class="date-header"' in l:
                date = l
                status = 1
        elif status == 1:
            if 'class="post-title"' in l:
                status = 2
            if 'class="post-body"' in l:
                status = 4
        elif status == 2:
            if l.strip():
                title = l
                status = 3
        elif status == 3:
            if 'class="post-body"' in l:
                status = 4
        elif status == 4:
            if len(l.strip()) > 5:
                body += l
            if 'padding-bottom: 0.25em;' in l:
                status = 5
        elif status == 5:
            if 'posted by ' in l:
                footer = l
                break
    f.close()

    rdate = mdate.search(date)
    rbody = mbody.search(body)
    rfooter = mfooter.search(footer)

    day, month, year = [g.strip() for g in rdate.groups()]
    subject = title.strip()

    body = "<p>" + rbody.group(1) + " </p>"
    # Self-close <img> and bare <br> tags so the body is well-formed XHTML.
    body = re.sub(r'<img([^>]*?[^/])>', r'<img\1/>', body)
    body = re.sub(r'<br\s*>', '<br/>', body)

    if subject == "":
        # Untitled post: use its first line, shortened if it is long.
        subject = re.sub(r'<br.*?>', '\n', body)
        subject = re.sub(r'</p.*?>', '\n', subject)
        subject = re.sub(r'<.*?>', '', subject)
        subject = subject.strip()
        line = subject.find('\n')
        if line == -1:
            line = len(subject)
        if line > 45:
            subject = subject[:40] + "..."
        else:
            subject = subject[:line]

    oldurl = "http://oldblog.blogspot.com/" + rfooter.group(1).strip()
    hour = rfooter.group(2).strip()
    minute = rfooter.group(3).strip()

    mtime = time.mktime(time.strptime(day + " " + month + " " + year + " " + hour + ":" + minute, "%d %B %Y %H:%M"))
    outname = str(int(mtime))
    # Look for clashes in the output directory, not the current one.
    while os.path.exists(outdir + outname + "." + str(id) + "." + host):
        id += 1
    outname = outname + "." + str(id) + "." + host

    out = open(outdir + outname, "w")
    out.write("Date: " + time.ctime(mtime))
    out.write("\nSubject: " + subject)
    out.write("\nX-URL: " + oldurl)
    out.write("\nX-Tags: untagged")
    out.write("\nX-Format: html")
    out.write("\n\n")
    out.write(body)
    out.close()
```

Here I parse the HTML files using a simple state machine and then assemble all the data we need, such as the timestamp for the filename and the Date header. The script assumes that the current directory contains an "old/" directory with year subdirectories like "2005", "2006", …, each of which has month subdirectories holding the XHTML files of individual posts. The output is written to "entries/new/$timestamp.$id.$host" files.
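The filename scheme can be isolated into a small helper. This is a sketch, not the script itself: entry_name is a hypothetical name, the counter restarts at 1 for every entry instead of being shared across the run, and %B assumes English month names (the default C locale).

```python
import os
import time


def entry_name(date_text, host, directory):
    """Return an unused "$timestamp.$id.$host" file name inside directory.

    date_text looks like "25 December 2006 14:30" (day, month name, year, time).
    """
    stamp = str(int(time.mktime(time.strptime(date_text, "%d %B %Y %H:%M"))))
    seq = 1
    while os.path.exists(os.path.join(directory, "%s.%d.%s" % (stamp, seq, host))):
        seq += 1  # another post was published in the same minute
    return "%s.%d.%s" % (stamp, seq, host)
```

Two posts from the same minute end up as "$timestamp.1.old" and "$timestamp.2.old", so neither overwrites the other.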

After running this script and then running mnemosyne, you will see a bunch of messages about failed parsing of some entries. They show you where your XHTML is not valid. The usual problems are HTML entities (like &euro;) that the parser does not recognise and unclosed img or br tags:

 This will fail because of &euro;
 <img src="smile.png"> will fail; it must be <img src="smile.png"/>

My import script fixes img and br tags, but for other problems there is not much choice but to go through the entries and fix them manually. Also, later on, if you want to paste some custom HTML into a post, you will have to mark the whole post as an HTML-mode post and check manually that the pasted HTML is valid XML.
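Both failures are easy to reproduce with xml.dom.minidom, the same parser the config.py code above relies on. is_well_formed and close_void_tags are hypothetical helper names; the <img> rule is the import script's regular expression, while the <br> rule is a simple assumption that only handles bare <br> and <br > tags.

```python
import re
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(fragment):
    """True if fragment parses as XML once wrapped in a <div>, as config.py does."""
    try:
        xml.dom.minidom.parseString("<div>" + fragment + "</div>")
    except ExpatError:
        return False
    return True


def close_void_tags(html):
    """Self-close <img ...> (the import script's regex) and bare <br> tags."""
    html = re.sub(r'<img([^>]*?[^/])>', r'<img\1/>', html)
    return re.sub(r'<br\s*>', '<br/>', html)
```

&euro; cannot be repaired by a tag regex: XML predefines only &lt;, &gt;, &amp;, &quot; and &apos;, so named HTML entities need to be replaced, for example with the numeric reference &#8364;, which parses fine.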

There are some other fun things in this blog, but I will go into that in later posts.



Learning TurboGears now. True MVC separa…

Learning TurboGears now. True MVC separation. Nice and powerful. Templates are valid XHTML documents (and not a mess of gibberish). Not much magic. I hate magic. Especially when it fails. I look forward to writing a task tracking system for a very special project in this framework. More about the project when it is launched. You’re gonna love it ;)

Also, I am going to visit my family. They still have no Internet, so I'll be mostly offline 'till the New Year (except for this and this). So I wish you all a Merry Christmas (note: there is absolutely nothing Christian about Christmas: it existed some 2000 years before that character from the Bible was born, according to that very book, and in Latvian this day is called "Ziemassvētki", "ziema" = winter, "svētki" = celebration) and a Happy New Year (if you are really an Orthodox Christian, why do you count years the new, non-Christian way? You should be celebrating New Year on the 13th of January, like Christ did and the Orthodox Church still does, because it uses the Julian Calendar more than 420 years after it was superseded)!

Oh, the fun of taunting religious people … of any religion … :D Have fun, everybody!

Footnote: in October I noted that there were 11000 spam mails in my GMail spam box at that point (it stores only spam that arrived in the last 30 days). Today I am pleased to say that I can see only 4300 spam mails there. It could be that GMail has implemented some procedures so that some spam does not even reach that folder, but I shall be very optimistic and say that the amount of spam has declined! Maybe spammers are on holiday? If so, I wish they would stay there :D



Another tiny note - why is it so that al…

Another tiny note: why is it that all the Python wrappers for Gnome and Freedesktop related things (GnomeVFS and DBus in my experience) have absolutely no API documentation!!! PyGTK has a nice set of documentation for GTK work, but sadly it doesn't extend beyond that. When I was writing SBackup, I had to resort to Python's built-in function dir() to show me what names the gnomevfs module exported and guesstimate my way from there. That was ugly as hell, but it worked. Now I am trying to find any information about that "new" DBus thing that everyone was so excited about approximately a year ago, and I can only find a few blog posts about rewrites of said API and a few simple programs that do not even work with the rewritten API. I mean, I can understand not having documentation for the internal functions of a desktop program, but not having a public API document for a critical library of the freedesktop.org desktop infrastructure is just plain dumb.
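The dir() approach can be stretched a little with inspect.getdoc(), which at least shows the first docstring line wherever a wrapper has one. A minimal sketch; public_api is a hypothetical helper, and json stands in for gnomevfs, which is unlikely to be installed today.

```python
import inspect


def public_api(module):
    """Map each public name in module to the first line of its docstring ("" if none)."""
    api = {}
    for name in dir(module):
        if name.startswith("_"):
            continue  # skip private and dunder names
        doc = inspect.getdoc(getattr(module, name))
        api[name] = doc.splitlines()[0] if doc else ""
    return api
```

Usage: `import json` and then `for name, summary in sorted(public_api(json).items()): print(name, "-", summary)`. For an undocumented wrapper most summaries will be empty, which is exactly the complaint above, but the list of names is still a starting point.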

Edit: It seems that there is some kind of dbus tutorial with a Python API section. I do not know why it didn't appear on the first 10 pages of Google results for "dbus python" or "dbus python API", but I hope this link will help that a bit. Also, we will see how useful it actually is; the rest of the document is quite cryptic to me.

Edit2: I am impressed: the Python chapter was definitely written by someone other than the people who wrote the rest of the dbus tutorial. This part actually makes sense, and it is very detailed and hand-holding when needed. I got almost all my questions answered. Thanks to whoever wrote that!



Remote restore fixed. Full planned funct…

Remote restore fixed. Full planned functionality reached.
All that I planned to do for Simple Backup is done as of version 0.7.
Now I only need to wait for the evaluation from my mentors at Ubuntu and to fix all bugs they and all other users find :D

I also did a bit of refactoring in this release oriented towards less memory usage. Results:

  • Memory usage while restoring dropped by 30-50%
  • Memory usage while making a new backup reduced … tenfold?!?!

I like it :D

What I do not like is the performance of GnomeVFS over ssh (Bug #155872) and also the need to download the whole backup image *twice* to restore anything (in the worst case). Sadly I cannot do much in either case :(
( Of course I could fix the GnomeVFS ssh module and write a new tar implementation with an external file positioning cache, but I fear that it is somewhat beyond my capabilities :) )



Ok, now the 0.5 release of my Simple Bac…

OK, the 0.5 release of my Simple Backup suite can now actually restore something from your backups! (both command-line and Gnome interfaces) It now even does automatic backups (and not just claims that it does). It even skips empty folders in the backup. Oh, and some usability fixes are also thrown in at no extra charge. :)
Note: due to a small, tiny bug in gnomevfs, restoring files from remote backup locations doesn’t work yet. I’ll have to do a lot of hacking to get that working :P



0.3 release of SBackup :)

This week was productive for me, but I did run into some unexpected
technical difficulties and thus had to work through the weekend to
catch up.

I just did the 0.3 release on Freshmeat and SourceForge.

Progress checklist:
* backend daemon - ok
* GUI configurator - ok
* commandline restore - 50% (need to write a directory extraction
function that is missing from the Python tarfile module)
* GUI restore - 95% (depends on command line restore)
* GUI to write a backup snapshot to cd - 0%, optional

Of course, extensive optimisation, testing and polishing are needed too.

I should be able to finish the restore tools tomorrow if no other
major problems occur.



Still writing a restore tool :(

Still writing a restore tool :(
The last two hours were spent debugging an interesting problem with TreeView in PyGTK. Parsing and adding all files from a backup snapshot to the tree view at once was too slow (not to mention that it took 35 MB of RAM :P), so I decided to load the tree on demand: I would add the children of a node only when that node gets expanded. So I happily wrote a handler for the 'row-expanded' signal that does just that: it adds children to the newly expanded node.
Note: as a node cannot expand if it doesn't already have some children, I also add a dummy child to all directory nodes.
Then the problem came up: once I enabled my handler, the nodes would not expand anymore. The expansion handles were there, I could click on them and see the CPU being chewed away by the parsing of the 6 MB node list, but nothing changed in the interface; even the dummy node didn't come up.
That confused not only me, but also the #pygtk people. I wrote a simple 15-line script to replicate the problem, but everything worked fine there :O. At this point I started commenting stuff out at random and found out that breaking the link between the treestore and the treeview (recommended in the docs to avoid excessive updates) resets the expansion state. Doh.
But that was not the end yet. After that I noticed that the nodes didn't expand on the first try, only on the second. 8) After some mental mumbo-jumbo I came to an idea that proved to be dumb, but correct. Prepare for a gem, boys and girls: if at any point during the execution of the row-expanded handler the expanding node has no children (like when you have removed the dummy node but haven't added the real ones yet), the expansion doesn't happen!
Two bugs^Wfeatures with the same effect. Oh, the fun of debugging never stops :D
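The fix for the second problem boils down to ordering. The toy model below is plain Python, not PyGTK: Node, on_row_expanded and the "(loading...)" label are hypothetical, and the model only records the smallest child count a node has while being populated. Adding the real rows before removing the dummy keeps that count above zero; in PyGTK terms that would mean calling treestore.append() for the real rows before treestore.remove() on the dummy row, with the model left attached to the view.

```python
DUMMY = "(loading...)"


class Node(object):
    """Toy stand-in for a tree store row; NOT the GTK API (hypothetical)."""

    def __init__(self, name, is_dir=False):
        self.name = name
        # Directory rows start with a dummy child so they show an expander.
        self.children = [Node(DUMMY)] if is_dir else []
        self.fewest_children = len(self.children)  # lowest count seen so far

    def track(self):
        self.fewest_children = min(self.fewest_children, len(self.children))


def on_row_expanded(node, listing):
    """Replace the dummy child with real children from listing (name -> is_dir)."""
    if not (len(node.children) == 1 and node.children[0].name == DUMMY):
        return  # already populated on an earlier expansion
    dummy = node.children[0]
    # Add the real children first...
    for name, is_dir in sorted(listing.items()):
        node.children.append(Node(name, is_dir))
        node.track()
    # ...and only then remove the placeholder, so the node is never childless.
    node.children.remove(dummy)
    node.track()
```

Swapping the two steps (remove the dummy, then append) would drop fewest_children to 0 for a moment, which is the state in which the real TreeView silently refuses to expand.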
