PyScan: A Little Helper For The HP CM1312nfi MFP Scanner

18. April, 2009

Update: Find the latest version (0.6) here.

I recently bought an HP CM1312nfi MFP scanner (a multifunction device with scanner and color laser printer). After scanning some 1’000 pages, I’m still satisfied with the device. The document feeder (ADF) sometimes tries to eat the paper after spitting it out and the colors could be a little more brilliant, but overall it’s a good deal for the price.

What bothered me was that the “Start scan” button doesn’t work on Linux. But someone posted a script in the bug report which can poll the button by reading the URL http://$ip/hp/device/notifications.xml (replace “$ip” with the IP address or DNS name of the scanner). This returns some XML with two interesting elements: StartScan and ADFLoaded. The first one becomes 1 when someone presses the “Start scan” button on the scanner and the second one tells us whether there is some paper in the ADF.
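A minimal sketch of the parsing side of such a poll loop. The exact layout of notifications.xml is not shown here, so the sample document, the root element name and the assumption that ADFLoaded uses 1 for "paper present" are all hypothetical; only the element names StartScan and ADFLoaded come from the text above. The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit('}', 1)[-1]


def parse_notifications(xml_text):
    """Return (start_scan, adf_loaded) as booleans.

    Assumption: both elements contain the text "1" when active.
    """
    flags = {}
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        if name in ('StartScan', 'ADFLoaded'):
            flags[name] = (element.text or '').strip() == '1'
    return flags.get('StartScan', False), flags.get('ADFLoaded', False)


# Hypothetical sample; the real document probably has more elements.
SAMPLE = """<Notifications>
  <StartScan>1</StartScan>
  <ADFLoaded>0</ADFLoaded>
</Notifications>"""

start_scan, adf_loaded = parse_notifications(SAMPLE)
print(start_scan, adf_loaded)
```

In a real poll loop, the XML text would come from reading the URL above (for example with urllib) every second or so, and a scan would be started whenever start_scan is true.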

With that and some source code, it was simple to create a little tool that works quite like xsane but fixes a couple of things that annoyed me for a long time:

  1. The UI of xsane is dead while it scans
  2. There is no online preview of the scan; you have to open the file in some extra tool to verify that the scan looks OK
  3. xsane doesn’t know about scan “projects”
  4. xsane doesn’t start to scan when I press the button on the scanner

As with all OSS software, this thing can seriously ruin your day, so be a bit careful. One of the biggest problems is the file size: To be able to edit files without loss of quality, TIFF format is the default. Each full page scan takes 26MB, 100 pages need 2.6GB!
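To see where a number like 26MB can come from, here is a back-of-the-envelope calculation. The scan settings are an assumption (A4 page, 300 dpi, 24-bit color, uncompressed); the post does not state them.

```python
# Assumed scan settings (not stated in the post): A4 at 300 dpi, 24-bit RGB.
DPI = 300
WIDTH_PX = round(210 / 25.4 * DPI)    # A4 is 210 mm wide
HEIGHT_PX = round(297 / 25.4 * DPI)   # A4 is 297 mm tall
BYTES_PER_PIXEL = 3                   # one byte each for red, green, blue

page_bytes = WIDTH_PX * HEIGHT_PX * BYTES_PER_PIXEL
print(WIDTH_PX, HEIGHT_PX, page_bytes)   # raw pixel data of one page
print(100 * page_bytes)                  # raw pixel data of 100 pages
```

Under these assumptions one page is about 26 million bytes of raw pixel data, which matches the file size mentioned above.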

Plans for 0.5: Allow editing projects in the UI, as well as selecting, saving and loading them. Right now, you must define your projects via the command line or by editing the source code.

Download: PyScan-0.4.tar.gz (12KB, MD5 checksum)

Dependencies (see README.txt for download links):

  • Python 2.6
  • PyQt4 4.4.3
  • Python Imaging Library 1.1.6
  • Python Imaging SANE 1.1.6 (needs included patch; see README.txt for instructions).

Features

  • Code to load images in a background thread, generate thumbnails (compatible with Konqueror/Dolphin) and display them in a list view
  • Display a (big) image with various manual and automatic zoom levels and modes (fit to window, percent) with zoom and pan
  • Online preview of scan in progress

Hideous details of the source

Again and again, I’m astonished how simple some tasks are in Python and Qt … if you’re willing to accept some “non-OO-ness” of the solution. I’ll explain some things I did here to give you an idea what’s going on.

Online preview

PyScan has an online preview of the currently active scan. If you look at the documentation, the Python Imaging SANE interface offers no way to do that. After looking at the source, I found that the SANE interface simply reads bytes from the SANE scanner module and copies these into a PIL image which was created on the Python side.

So my solution is to be notified that a scan is in progress and then copy said image every second (all 26MB) into a string. That string is then used to build a QImage which is turned into a QPixmap which is then displayed in the right pixmap view. See pilImage2QImage() for the details.

Background threads

I also moved all expensive code into threads: Loading big TIFF images, scaling them down to thumbnails, saving the images, etc. All threads have a method to add work to their input queues and they send Qt signals when they’re done. Continuing the scanning when there is paper in the ADF tray was a bit of a problem, though.
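The pattern can be sketched with the standard library alone. This is not the PyScan code: the class name WorkerThread is made up, and a plain callback stands in for the Qt signal. Unlike a queued Qt signal, the callback here runs in the worker thread, not in the UI thread.

```python
import queue
import threading


class WorkerThread(threading.Thread):
    """Runs jobs from an input queue and reports each result via a callback."""

    def __init__(self, work, on_done):
        super().__init__(daemon=True)
        self._jobs = queue.Queue()
        self._work = work          # function doing the expensive part
        self._on_done = on_done    # stand-in for emitting a Qt signal

    def add_job(self, job):
        self._jobs.put(job)

    def stop(self):
        self._jobs.put(None)       # sentinel: finish queued jobs, then exit

    def run(self):
        while True:
            job = self._jobs.get()
            if job is None:
                break
            self._on_done(job, self._work(job))


results = []
saver = WorkerThread(
    work=lambda name: name.upper(),                  # pretend this is slow
    on_done=lambda job, res: results.append((job, res)),
)
saver.start()
saver.add_job('page-001.tif')
saver.add_job('page-002.tif')
saver.stop()
saver.join()
print(results)
```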

Since the saving of the images is happening in a background thread, the code could start the next scan before the saving was completed. This wasn’t such a big problem except that the “scan next image” code looks for files on the disk to determine the next filename. This would lead to overwrites. So I had to synchronize this somehow. My simple hack was to set a boolean “waiting” in the scanner thread which indicates that the scanner has more paper to process and waits for the save thread to complete. When the UI gets the “image saved” signal, it triggers the scanner to continue.
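A single-threaded model of that handshake, to show why the next scan must wait for the save. The class and method names are invented for this sketch; the real code spreads this logic over the scanner thread, the save thread and the UI.

```python
class ScanSession:
    """Single-threaded model of the scan/save handshake."""

    def __init__(self, pages_in_adf):
        self.pages_in_adf = pages_in_adf
        self.files_on_disk = []     # stands in for the directory listing
        self.pending_save = None    # filename currently being saved
        self.waiting = False        # scanner has paper but waits for the save

    def next_filename(self):
        # Derived from what is already on disk, so it is only correct
        # once the previous image has actually been written.
        return 'page-%03d.tif' % (len(self.files_on_disk) + 1)

    def scan_page(self):
        self.pages_in_adf -= 1
        self.pending_save = self.next_filename()   # hand over to the saver
        self.waiting = self.pages_in_adf > 0

    def on_image_saved(self):
        # Called when the "image saved" signal arrives.
        self.files_on_disk.append(self.pending_save)
        self.pending_save = None
        if self.waiting:
            self.waiting = False
            self.scan_page()


session = ScanSession(pages_in_adf=3)
session.scan_page()
while session.pending_save is not None:
    session.on_image_saved()
print(session.files_on_disk)
```

If scan_page() were called again before on_image_saved(), next_filename() would return the same name twice, which is exactly the overwrite described above.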

Generating thumbnails

The last hack in the code is the generation of the thumbnails. The main issue here was that I need the thumbnails for the gallery view really deep down in the Qt render code. Wasting time at that level is really a no-no but at first glance, the API offers no way to defer loading of the images and then later update the items in the list view when the data is available. Keep in mind what I need to do:

  1. Load a 26MB file from disk
  2. Scale it with antialiasing
  3. … for hundreds, possibly thousands of files!

My solution: In the render code, I create a LazyPixmap. This is just a dumb object to save the filename and a placeholder pixmap which is used until the real thumbnail becomes available. The LazyPixmap will schedule a job for the LoaderThread.

In my first code, I tried to create a QPixmap in the LoaderThread but that doesn’t work: Only the UI thread is allowed to create a QPixmap. Duh. But luckily, Qt offers the QImage class which works even without a UI and which offers basically the same API as QPixmap. So the LoaderThread can load the image from disk and scale it down (to save memory and avoid heavy computation in the UI thread) right before emitting a “loaded” signal.

There are two places where a LazyPixmap is used: In the PixmapWidget (which can display and zoom a QPixmap) and in the ThumbnailDelegate which draws the thumbnails for the filenames in the GalleryModel.

In the case of the PixmapWidget, the signal will be handled in lazyLoaded(). Here, we convert the QImage into a QPixmap (in lpm.getPixmap()) and assign that pixmap, recalculate the zoom factor, realign the view, etc.

In the GalleryModel, I have the problem that I need to tell Qt somehow that the pixmap has changed but the API offers nothing except rendering the whole widget by calling update(). This will render at most (on a huge screen) 30 pixmaps. It happens once per visible pixmap and causes no flicker. Probably not worth wasting another second on it.

If you look at the code, you’ll see that a class called KDEThumbnailCache is used. This class accesses the same thumbnails as Konqueror (KDE3) or Dolphin (KDE4). This means once the images are scaled down (either by my code or Dolphin), all tools can quickly load the small, precalculated thumbnails instead of having to scale the 26MB files again.
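Shared thumbnails of this kind are usually located via the freedesktop.org thumbnail convention: the file name is the MD5 hash of the image's file:// URI, stored as a PNG in a per-size directory. The sketch below assumes that KDEThumbnailCache follows this convention and that the cache lives in ~/.thumbnails; the function name is made up.

```python
import hashlib
import os
from urllib.parse import quote


def thumbnail_path(image_path, size_dir='normal', cache_root='~/.thumbnails'):
    """Return where a shared thumbnail for image_path would be stored.

    Assumptions: freedesktop.org layout, 'normal' means up to 128x128
    pixels, and the cache root is ~/.thumbnails.
    """
    uri = 'file://' + quote(os.path.abspath(image_path))
    digest = hashlib.md5(uri.encode('utf-8')).hexdigest()
    return os.path.join(os.path.expanduser(cache_root), size_dir,
                        digest + '.png')


print(thumbnail_path('/tmp/scan 001.tif'))
```

A loader would first check whether this file exists (and is newer than the image) and only fall back to scaling the full TIFF when it does not.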

Conclusion

Well, that’s it for a small walk through the code. Feel free to give feedback if you like PyScan (or not) or when you have patches.


Subtext: Visual Programming, New Angle

2. March, 2009

If you have no idea what subtext is, lean back and watch this presentation.

It’s nice to finally find another person concerned about the state of programming languages. I started with C, toyed a bit with some other languages, moved to Java and today, I’m working mostly with Java, Groovy and Python. I’m doing all my spare-time code in Python. Why Python? Because I get more bang for the key press. And my spare time is most valuable.

So while I thoroughly agree that the idea of subtext is convincing, it’s too limiting at the same time: There are simple problems which you can’t express well in subtext, for example: A switch with 10 cases and some complex code in each case. It would just become too wide. The same applies when formulas have more than ten parameters. Your flow tree looks nice but it takes more screen real estate than the “traditional” version.

So my argument is that we need a way to choose. Software projects need to give up the holy grail of “one language to rule them all.” The IDE should allow mixing and matching various languages and more complex “objects” like tables, rich text, animations. Why do I have to waste my time formatting tabular content in a Java file (think array of values) when I can have Excel? Why can’t Java read the data directly from Excel? Why can’t I embed Excel tables in Java source code and access them like a 2D O array? Why can’t I use a rich word processor to write my comments? Why is TAB 1-8 spaces instead of “one level of indent”? Why do I have to use braces when I already indent my code?

Because our computers are not powerful enough today. With every key we press, we have to worry about RAM and performance. Because companies still believe in lock-in. Sun would probably add a cross-platform COM API into Java but will Microsoft port Excel to any platform where a Java compiler is available? Oh, we could use OpenOffice. Let’s see. People working for a software development company that has more than two employees: Comment here if your company policy allows you to use OO instead of Office. Now let’s see how long it takes to get 10 comments.

In the end, what we have today is the most simple thing that actually works and doesn’t take too much RAM. I hope the time is ripe for the next step. I’m sick of fixed-width fonts, curly braces and source code which is 1% functionality and 99% “make the damn compiler happy”.


UPCScan 0.7: Where is my stuff?

16. November, 2008

UPCScan 0.7 is released. New features:

  • UPCScan can now find music CDs
  • If UPCScan can’t find something on Amazon, it will still create an entry which you can then edit to fill in the details.
  • Entries can be deleted.
  • I’ve added lending information so you can quickly figure out who your new “ex-friends” should be.
  • I’m working on a series/issue information system to make it more simple to complete your collection. With this version, you’ll need to edit the database directly to add series/issue information but the user interface can already display this data.
  • I’m working on a feature to create an OpenOffice document with the locations. This would allow you to print this out and then scan the locations in as you scan your collection to tell UPCScan under which location to file the items. If you can’t wait, then you can use the barcode.py script to generate PNG images with barcodes which you can import in OpenOffice to achieve the same effect.

Download: upcscan-0.7.tar.gz (26,921 Bytes, MD5)


Enthought Traits

9. October, 2008

I’m always looking for more simple ways to build applications. Let’s face it, it’s 2008 and after roughly 50 years, writing something that collects a few bits of data and presents them in a nice way is still several days of work. And that’s without Undo/Redo, a way to persist the data, a way to evolve the storage format, etc.

Python was always promising and with the tkinter module, they set a rather high watermark for how easily you could build UIs … alas Tk is not the most powerful UI framework out there and … well … let’s just leave it at that.

With Traits, we have a new contender and I have to admit that I like it … a lot. The traits framework solves a lot of the standard issues out of the box while leaving all the hooks and bolts available beneath a very thin polish so you can still get at them when you have to.

For example, you have a list of persons and you want to assign each person a gender. Here is the model:

class Gender(HasTraits):
    name = Str
    
    def __repr__(self):
        return 'Gender %s' % self.name

class Person(HasTraits):
    name = Str
    gender = Instance(Gender)
    
    def __repr__(self):
        return 'Person %s' % self.name

class Model(HasTraits):
    genderList = List(Gender)
    persons = List(Person)

Here is how you use this model:

female = Gender(name='Female')
male = Gender(name='Male')
undefined = Gender(name='Undefined')

aMale = Person(name='a male', gender=male)
aFemale = Person(name='a female', gender=female)

model = Model()
model.genderList.append(female)
model.genderList.append(male)
model.genderList.append(undefined)
model.persons.append(aFemale)
model.persons.append(aMale)

Nothing fancy so far. Unlike the rest of Python, with Traits, you can make sure that an attribute of an instance has the correct type. For example, “aMale.gender = aFemale” would throw an exception in the assignment.
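To illustrate the idea without installing Traits, here is a plain Python descriptor that mimics this kind of type check. It is not how Traits is implemented and not its API: the TypedAttribute class is invented for this sketch, and it raises TypeError where Traits raises its own TraitError.

```python
class TypedAttribute(object):
    """Descriptor that only accepts None or instances of one class."""

    def __init__(self, name, klass):
        self.name = name
        self.klass = klass

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if value is not None and not isinstance(value, self.klass):
            raise TypeError('%s must be a %s, got %r'
                            % (self.name, self.klass.__name__, value))
        obj.__dict__[self.name] = value


class Gender(object):
    def __init__(self, name):
        self.name = name


class Person(object):
    gender = TypedAttribute('gender', Gender)

    def __init__(self, name, gender=None):
        self.name = name
        self.gender = gender


male = Gender('Male')
a_male = Person('a male', male)
a_female = Person('a female', Gender('Female'))

try:
    a_male.gender = a_female      # a Person, not a Gender
except TypeError as exc:
    print('rejected:', exc)
```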

The nice stuff is that the UI components honor the information you use to build your model. So if you want to show a tree with all persons and genders, you use code like this:

class Model(HasTraits):
    genderList = List(Gender)
    persons = List(Person)
    tree = Property
    
    def _get_tree(self):
        return self

class ModelView(View):
    def __init__(self):
        super(ModelView, self).__init__(
            Item('tree',
                editor=TreeEditor(
                    nodes = [
                       TreeNode(node_for = [ Model ],
                           children = 'persons',
                           label = '=Persons',
                           view = View(),
                       ),
                       TreeNode(node_for = [ Person ],
                           children = '',
                           label = 'name',
                           view = View(
                               Item('name'),
                               Item('gender',
                                  editor=EnumEditor(values=model.genderList,)
                               ),
                           ),
                       ),
                       TreeNode(node_for = [ Model ],
                           children = 'genderList',
                           label = '=Persons by Gender',
                           view = View(),
                       ),
                       TreeNode(node_for = [ Gender ],
                           children = '',
                           label = 'name',
                           view = View(),
                       ),
                    ],
                ),
            ),
            Item('genderList', style='custom'),
            title = 'Tree Test',
            resizable = True,
            width = .5,
            height = .5,
        )

model.configure_traits(view=ModelView())

First of all, I needed to add a property “tree” to my “Model” class. This is a calculated field which just returns “self” and I need this to be able to reference it in my tree editor. The tree editor defines nodes by defining their properties. So a “Model” node has “persons” and “genderList” as children. The tree view is smart enough to figure out that these are in fact lists of elements and it will try to turn each element into a node if it can find a definition for it.

That’s it. Everything else has already been defined in your model and what would be the point in doing that again?

But there is more. With just a few more lines of code, we can get a list of all persons from a Gender instance and with just a single change in the tree view, we can see them in the view. If you select a person and change its name, all nodes in the tree will update. Without any additional wiring. Sounds too good to be true?

First, we must be able to find all persons with a certain sex in Gender. To do that, we add a property which gives us access to the model and then query the model for all persons, filter this list by gender and that’s it. Sounds complex? Have a look:

class Gender(HasTraits):
    name = Str
    persons = Property
    
    def _get_persons(self):
        return [p for p in self.model.persons
                if p.gender == self]

But how do I define the attribute “model” in Gender? This is a chicken-and-egg problem. Gender references Model and vice versa. Python to the rescue. Add this line after the definition of Model:

Gender.model = Instance(Model)

That’s it. Now we need to assign this new field in Gender. We could do this manually but Traits offers a much better way: You can listen for changes on genderList!

    def _genderList_items_changed(self, new):
        for child in new.added:
            child.model = self

This code will be executed for every change to the list. I walk over the list of new children and assign “model”.

Does that work? Let’s check: Append this line at the end of the file:

assert male.persons == [aMale], male.persons

And the icing on the cake: The tree. Just change the argument “children = ''” to “children = 'persons'” in the TreeNode for Gender. Run and enjoy!

One last polish: The editor for genders looks a bit ugly. To suppress the persons list, add this to the Gender class:

    traits_view = View(
        Item('name')
    )

There is one minor issue: You can’t assign a type to the property “persons” in Gender. If you do, you’ll get strange exceptions and bugs. Other than that, this is probably the most simple way to build a tree of objects in your model that I’ve seen so far.

To make things easier for you to try, here is the complete source again in one big block. You can download the Enthought Python Distribution, which contains everything you need, from the Enthought website.

from enthought.traits.api import \
        HasTraits, Str, Instance, List, Property, This

from enthought.traits.ui.api import \
        TreeEditor, TreeNode, View, Item, EnumEditor

class Gender(HasTraits):
    name = Str
    # Bug1: This works
    persons = Property
    # This corrupts the UI:
    # wx._core.PyDeadObjectError: The C++ part of the ScrolledPanel object has been 
    # deleted, attribute access no longer allowed.
    #persons = Property(List)
    
    traits_view = View(
        Item('name')
    )
    
    def _get_persons(self):
        return [p for p in self.model.persons if p.gender == self]
    
    def __repr__(self):
        return 'Gender %s' % self.name

class Person(HasTraits):
    name = Str
    gender = Instance(Gender)
    
    def __repr__(self):
        return 'Person %s' % self.name

# Bug1: This doesn't work; you'll get ForwardProperty instead of a list when
# you access the property "persons"!
#Gender.persons = Property(fget=Gender._get_persons, trait=List(Person),)
# Same
#Gender.persons = Property(trait=List(Person),)
# Same
#Gender.persons = Property()
# Same, except it's now a TraitFactory
#Gender.persons = Property

class Model(HasTraits):
    genderList = List(Gender)
    persons = List(Person)
    tree = Property
    
    def _get_tree(self):
        return self
    
    def _genderList_items_changed(self, new):
        for child in new.added:
            child.model = self

Person.model = Instance(Model)
Gender.model = Instance(Model)

female = Gender(name='Female')
male = Gender(name='Male')
undefined = Gender(name='Undefined')

aMale = Person(name='a male', gender=male)
aFemale = Person(name='a female', gender=female)

model = Model()
model.genderList.append(female)
model.genderList.append(male)
model.genderList.append(undefined)
model.persons.append(aFemale)
model.persons.append(aMale)

assert male.persons == [aMale], male.persons

# This must be external because it references "Model"
# Usually, you would define this in the class to edit
# as a class field called "traits_view".
class ModelView(View):
    def __init__(self):
        super(ModelView, self).__init__(
            Item('tree',
                editor=TreeEditor(
                    nodes = [
                       TreeNode(node_for = [ Model ],
                           children = 'persons',
                           label = '=Persons',
                           view = View(),
                       ),
                       TreeNode(node_for = [ Person ],
                           children = '',
                           label = 'name',
                           view = View(
                               Item('name'),
                               Item('gender',
                                  editor=EnumEditor(
                                      values=model.genderList,
                                  )
                               ),
                           ),
                       ),
                       TreeNode(node_for = [ Model ],
                           children = 'genderList',
                           label = '=Persons by Gender',
                           view = View(),
                       ),
                       TreeNode(node_for = [ Gender ],
                           children = 'persons',
                           label = 'name',
                           view = View(),
                       ),
                    ],
                ),
            ),
            Item('genderList', style='custom'),
            title = 'Tree Test',
            resizable = True,
            width = .5,
            height = .5,
        )

model.configure_traits(view=ModelView())

UPCScan 0.6: It’s Qt, Man!

8. October, 2008

Update: Version 0.7 released.

Getting drowned in your ever growing CD, DVD, book or comic collection? Then UPCScan might be for you.

UPCScan 0.6 is ready for download. There are many fixes and improvements. The biggest one is probably the live PyQt4 user interface (live means that the UI saves all your changes instantly, so no data loss if your computer crashes because of some other program ;-)).

The search field accepts barcodes (from a barcode laser scanner) and ISBN numbers. There is a nice cover image dialog where you can download and assign images if Amazon doesn’t have one. Note: Amazon sometimes has an image but it’s marked as “customer image”. Use the “Visit” button on the UI to check if an image is missing and click on the “No Cover” button to open the “Cover Image” dialog where you can download and assign images. I haven’t checked if the result of the search query contains anything useful in this case.

UPCScan 0.6 – 24,055 bytes, MD5 Checksum. Needs Python 2.5. PyQt4 4.4.3 is optional.

Security notice: You need an Amazon Web Service Account (get one here). When you run the program for the first time, it will tell you what to do. This means two things:

  1. Your queries will be logged. So if you don’t want Amazon to know what you own, this program is not very useful for you.
  2. Your account ID will be stored in the article database at various places. I’m working on an export function which filters all private data out. Until then, don’t give this file to your friends unless you know what that means (and frankly, I don’t). You have been warned.

Scanning Your DVD, Book, Comic, … Collection

4. October, 2008

Update: Version 0.6 released.

If you’re like me, you have a lot of DVDs, books, comics, whatever … and a few years ago, you kind of lost your grip on your collection. Whenever there is a DVD sale, you invariably come home with a movie you already have.

After the German Linux Magazin published an article on how to set up a laser scanner with Amazon, I decided to get me one and give it a try. Unfortunately, the Perl script has a few problems:

  • It’s written in Perl.
  • It’s written in Perl.
  • It’s written in Perl.
  • There is no download link for the script without line numbers.
  • The DB setup script is missing.
  • The script uses POE.
  • It’s hard to add new services.
  • Did I mention that it’s written in Perl? Right.

So I wrote a new version in Python. You can find the docs on how to use it in the header of each file. Additionally, I’ve included a file “Location codes.odt”. You can edit it with OpenOffice and put the names of the places where you store your stuff in there. Before you start to scan in the EAN/UPC codes of the stuff in a new place, scan the location code and upcscan.py will make the link for you. It will also ask you for a nice name of the location when you scan a location code for the first time.

If you need more location codes, you can generate them yourself. The codes starting with “200” are for private use, so there is no risk of a collision. I’m using this Python script to generate the GIF images. Just put this at the end of the script:

if __name__=='__main__':
    import sys
    s = checksum(sys.argv[1])
    img = genbarcode(s, 1)
    img.save('EAN13-%s.gif' % s, 'GIF')
    print error
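If you just want to know which 13-digit numbers are valid location codes, the EAN-13 check digit is easy to compute by hand. The sketch below is independent of the barcode script mentioned above; the function names and the layout "200" plus a nine-digit running number are assumptions made for this example.

```python
def ean13_check_digit(first12):
    """Return the EAN-13 check digit for a string of 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError('expected exactly 12 digits')
    # Weights alternate 1, 3, 1, 3, ... starting with the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def location_code(number):
    """Build a private-use code: '200' + nine-digit number + check digit."""
    body = '200%09d' % number
    return body + str(ean13_check_digit(body))


print(location_code(1))
print(location_code(2))
```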

There is a primitive tool to generate an HTML page from your goods and a small tool to push your own cover images into the database if Amazon doesn’t provide one.

Note: You’ll need an AWS account for the script to work. The script will tell you where to get your account ID and where you need to put the ID when you start it for the first time.

Download upscan-0.1.tar.gz (54KB, MD5 Checksum)


Docs? Ask The Sphinx

16. July, 2008

If you need to generate docs for your Python projects, try Sphinx.


TurboGears 2.0 Is On Track

8. July, 2008

There are three things which hooked me to TurboGears:

  1. Every day stuff is simple, complex stuff is possible
  2. Automatic reload after code change (no need to restart)
  3. It’s in Python

What I didn’t like is that TG 2.0 has been so quiet for so long. I’m on the Planet Turbogears RSS feed and I wasn’t sure whether 2.0 was alive or dead or whatever.

Well, it seems to be more alive than I expected and hopefully, we’ll see a 2.0 soon. In “Doing the right thing should be easy” by Mark Ramm, you can find more details.


Portable UI

18. January, 2008

For many years, I’ve been looking for a way to write portable applications with a nice, responsive user interface. Many have tried and many have failed:

  • Python with tcl/tk – A nice experience from the developer side. The Python wrapper around the tk widget set shows how you can get compact, yet easily understandable code and write UIs in a short time. If it just weren’t that ugly …
  • Java with Swing – Swing borrows a lot from X11, the grandfather of all graphical desktops. I have yet to see anyone managing to impress the world with their grandfather …
  • Java with SWT – Now, here comes a contender. Java is pretty widely available (not quite as many platforms as Python, but still), it is pretty fast, okay, the download is a bit on the big side … but no DLL hell, easy to setup (especially if you don’t provide an installer and just push a ZIP out). SWT is nice, fast … and bare bones. MFC? Well, they have JFace and in a few years, there might even be a text editing component that can do word wrap and still show line numbers. Oh, and SWT is available on even fewer platforms than Java. Palm, anyone?
  • HTML – Web based apps are all the hype. If you want to use your app on the run, it gets tricky. I don’t know about the US, but here in Europe, going online with your mobile will ruin you. Literally. Also, I’ve had my struggles with HTML and CSS and I can do without. Either and both.

I’ve tried a few more but in the end, things never felt right. Until recently. I’m a big fan of treeline. Treeline uses Python and PyQt which wraps Qt (say: “cute”). Qt is a mature framework, currently at version 4.3.3, with 4.4 around the corner. It doesn’t have all the nifty stuff I can imagine (like an RTF editor; QTextEdit can only do a (big) fraction of that) but it gets closer to what I want than anything else.

In the past two weeks, I wrote a little clone of yWriter4. The little baby has currently about 8000 loc and about half of the functionality I want to give it (especially the text editing is still leaving a lot to be desired). Except for two bugs (signal names and GC issues), it’s been a real pleasure to use. I managed to implement almost every feature within a few minutes or few hours (the storyboard took 6 hours, the scene chart view took two), also thanks to the good defaults of the framework. Here is an impression of v0.2:

So when you’re considering writing a small to medium sized application which needs to run on Windows, Linux and MacOS, give PyQt a try.


Sorting Number Table Columns in PyQt4

14. January, 2008

Here is a simple trick to sort number columns in the QTableWidget of Qt4 and PyQt4: Format the number as a right aligned string:

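import random
from PyQt4.QtCore import Qt
from PyQt4.QtGui import QTableWidgetItem
# "table" is an existing QTableWidget with at least 12 rows and 2 columns
for i in range(12):
    item = QTableWidgetItem(u'%7d' % random.randint(1, 10000))
    item.setTextAlignment(Qt.AlignRight)
    table.setItem(i, 1, item)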
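Why this works can be checked without Qt: the table compares the cell texts as strings, and padding every number to the same width makes string order agree with numeric order, because a space sorts before any digit. A small stand-alone demonstration with made-up sample numbers:

```python
numbers = [9, 10000, 42, 512, 7]

# Plain strings sort character by character: "10000" comes before "7".
plain = sorted(str(n) for n in numbers)

# Right-aligned to a fixed width of 7, shorter numbers start with more
# spaces and therefore sort first.
padded = sorted('%7d' % n for n in numbers)

print(plain)
print([int(s) for s in padded])
```

The width of 7 only has to be at least as large as the longest number in the column.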