ANN: LightMediaScanner 0.4.4 released

I'm proud to announce LightMediaScanner 0.4.4 was released and I'd like to take some time to remind you of this awesome project ;-) While 0.4.4 is no big leap ahead of 0.4.3 I'm doing this post more to make clear people that the project is still alive and it may be the right tool for you, particularly if you need fast media scanning on your desktop or embedded device.

Introduction (History)

LightMediaScanner (LMS) was born in 2007 during the development of Canola2 media player for Nokia N800 Maemo device. We did a previous try at media scanners for Canola1 and did learn a lot, that would be the foundation of LMS. We had two major problems to solve:
  1. scan large amounts of media as fast as possible
  2. don't mess with interactivity.
The first problem had more qualifiers, like the media being unreliable (broken files), be removed without prior notification (mmc removed when system was off), slow discs (sd/mmc), small amount of available RAM, low-end CPU and so on. Traditional approaches were slow and would bring your device to unusable state for minutes, even after the scan process finished as they consumed too much memory that would swap your applications and services. They would also be naive with file scanning and would trash filesystem's cache with useless data.

Existing Solutions Problems

At that time we had Maemo's media scanner that was awful and Tracker was being developed to solve its problems. However Tracker was immature and had other focus, more like being fancy with new standards and trying to solve all the diseases of the world... while we need to be fast in some specific cases. To highlight technical problems, let's consider JPEG scanners. Everything one Media Player application would need is the title, author/artist, date, width, height, orientation and maybe geolocation. These are few bytes, particularly because Title and Author are missing in most cases. We should open the file, get these bytes, register these and close the file. Every system at the time used libexif to do so, however it would copy all the header to memory prior to scanning. Given that most files contain thumbnails in the header, it would mean useless malloc + read of around 10-50kb, to get dozen bytes and exit. This trashes the VM and disc. Moreover, some libraries would open the file themselves and would not fadvise as POSIX_FADV_DONTNEED, thus the filesystem would keep it in cache for further access that never happen! Similar situations existed for PNG, Mp3 and others.

LightMediaScanner Solution

We decided to go with a 2 process solution to avoid crashes and hang-ups on the main process. This would be safe and we could kill the slave process if it took too long to report on a given file. Later on we introduced threaded and single process scanning that would be useful in some cases. We also had dynamic loadable plugins (shared objects) with the media parsers to be used given matching extensions. The plugins can be selected by the user at runtime. Yes, we would not scan every file with every parser, we did trust the extension as a way to save work and it worked fine, and fast. The main process just scans the filesystem and pipes the files to the slave (thread or process). This slave would open the file, get the data to populate a structure to be written to database, then close the file and report back the progress. If the slave takes too long to reply or dies, it's respawned with the next file in the queue. The database uses SQLite3 and a configurable number of files are processed in one SQL transaction, thus we'd not hit the filesystem with writes that often. The database access is provided with helper functions that uses pre-compiled statements. However we only provide the insert/update methods, as the fetch (select) are up to the application to use (maybe using their own ORM framework). Some metadata formats did not specify encoding (ASCII was assumed) and fallbacks can be applied with the charsets. It will use iconv in a priority list to figure out what works.

Just the Indexer!

LightMediaScanner is not a daemon-service, it does not talk DBus neither requires you to speak RDF or SPARQL. If you want, you can create one with it, but it does not provide this stuff out of the box. Basically you call it to check for media and then access the database yourself, using the SQLite3 directly or using some ORM, doing caches or not, it's up to you. The database tables are quite simple and try to provide common useful relations. It's not designed to cover all weird cases in the world (like multiple artists per song or which lens did you use with your camera).

Using it!

Get the code from our GIT or use the release tarball, see http://people.profusion.mobi/cgit.cgi/lightmediascanner.git/. It uses standard autoconf/automake to build, so should be familiar to most developers. It hard-depends only on SQLite3 and iconv (most libC ships with it). Optional dependencies are libvorbis, libmp4v2 and libflac. You can call the scanner from C, C++ or Python using our bindings. Comprehensive example can be found at src/bin/test.c References: previous post about lightmediascanner 0.2.0 release