Do you want to hear the most incredible ...
Do you want to hear the most incredible "it's not a bug - it's a feature" story ever?
After shooting hundreds of megs of RAWs with my Canon 350D last couple of weeks, I noticed a very strange thing - importing this large amount of files from my camera into F-Spot took ages. F-Spot ate memory in tens and hundreds of megabytes and never returned it back to the system. Well I blamed it on Mono and went searching for a better way. Then I found out that command-line C program gphoto also take the same horrific amount of memory to import my photos. I saw that to download 900 Mb of photos (~250 photos) photo memory use went up to ~910 Mb (2 Mb were shared). Luckily Linux managed to swap out part of gphoto, so I could finish the download with my 512 Mb of real RAM and a 1 Gb swap file. I googled and founds tens of bug reports on this - first of them as early as December 2004. Ouch.
Well - let's see what the problem is, shall we? Some bugreports reference a bug in gphoto's SourceForge bug tracker where a users reports that downloading a 250 Mb video file takes 250 Mb of RAM and developers reply that unfortunately that is the limitation of current infrastructure and it is very hard to fix. Bumer.
But wait! He says that downloading ONE file takes a lot of RAM. This limit should not exist when downloading multiple files - we should be able to drop information about previous file as soon as we start downloading the next one, right?
Ok, lest see, what really is going on there. Downloading source of gphoto. Looking at it. Seeing a lot of mess. After around 10 minutes I start to understand that there is a table of option names and functions and the real job is doe by command line parser who calls a function as soon as he encounters a proper parameter on the command line. :P After 3 more minutes jumping around the code I finally get to a function that gets called to download a single file. Looks pretty easy:
- take a CameraFile pointer
- pass it to gp_file_new() for inicialization
- pass it to gp_get_file() to get the actual data of file (download happens here)
- pass it to gp_write_file_to_file() to dump the data to a file on disk
- pass it to gp_file_unref() to free the data
Looks all fine and dandy so far. However I see the memory use that suggest that this last operation does not happen as it should, so I search for the gp_file_unref() function. I do not find it in gphoto source, but as I soon figure out - it is in libgphoto2. The function is pretty straight forward - the reference count of the structure is reduced by 1 and if it has reached 0, the structure is freed from memory via gp_file_free() function.
Hmm, I wonder what will happen if I replace gp_file_unref() with gp_file_free() in gphoto? After a quick compile and installation (I thank the Gods and all DD's for the wonders of "debuild -us -uc && sudo dpkg -i ../gphoto*.deb") I ran gphoto again. Wow, it now only consumes 8-16 Mb of RAM and not 900. The files downloaded fine, but in the end glibc made a lot of fuss about "double free". What does that mean? It means that someone managed to get a reference to our MemoryFile and didn't give it back. Naughty boy!
We only call three functions using that pointer, so it should not be hard to trace them trough the source to see what they do. The gp_file_new() function looks good, it sets reference count to 1 always. gp_get_file is more complex - I get to crawl through a lot of strange redirects to all levels of gphoto architecture. At one point I get a bit alarmed as I see a local variable called ref_count, but then I see that the code just stores reference count there for safekeeping while data is copied from another object and right after that copy reference count is put back safely. After all that I get to the end of the gp_get_file function, just a couple thing left - cache the result, clean up and return the file. Wait a minute ....
It appears that someone thought that it is a good idea to use a gig or so of my RAM for cache, just in case if I would like to download the same photos the second time around in the same program call. IT IS NOT!
Results: one line patch, one NMU building for upload, one *very* long bug in upstream bug tracker, one developer quite upset and not too convinced about the correctness of free software ways any more :P