Boo

8. July, 2011

It seems a dream of mine has come true: A language like Python, compiled into something that can be executed as fast as Java or C++ and which allows to extend the compiler at compile time.

Ladies and gentlemen, please a warm welcome for Boo.

Guess I’ve found a reason to install Mono.


Maven Tools for Eclipse: M2 Repository Analysis And Dependency Management

13. May, 2011

I’ve finished RC1 of my set of tools to import Eclipse plug-ins into Maven 2 repositories. You can find the source on github. It needs Python 2.7 and lxml. pip is your friend.

The new features: There is now a tool to analyze the M2 repository for oddities. Currently, it can find these issues:

  • Dependencies which are used but not part of the repository
  • Dependencies which are used with different versions or version ranges (i.e. when one POM includes a dependency with 1.0 and another POM pulls in the very same dependency with version 1.1)
  • Dependencies which are used without versions or version ranges or a catch-all version like [0,)
  • Several versions of the same artifact in the repository

Plus it prints a list of all POMs in the repo with files (jar, pom, sources, test-sources, …). Here is a sample report.

The last tool can create a POM file with a dependencyManagement element containing the versions of the POMs in the repository. You can use this to nail down all versions to the ones existing in your repository (so you don’t accidentally pull in something you don’t want).

Lastly, I’ve enhanced the patch tool. Instead of overwriting replaced dependencies, it will now move them into a new profile. This way, users of the repository can specify which dependency they want (the one from the repository or, say, one from Maven Central).

I will try to build a new testing repo over the weekend so we can start wrapping up the necessary patches for a release.

Related posts: Eclipse 3.6.2 Artifacts for Maven 2


Numpty Physics

17. March, 2011

I really liked Crayon Physics. It was simple idea, great brain teaser, the perfect UI.

If you liked it as well, have a look at Numpty Physics.


pygments syntax highlighting

4. January, 2011

Need a good syntax highlighter? Check out Pygments.


The Python Module of the Week Blog

17. March, 2010

If you love Python, this blog is for you: Doug Hellmann posts about one Python module every week (PyMOTW). The posts contain lots of real world examples how to use a module. This week, it’s ElementTree.


PyScan 0.6

10. October, 2009

I’ve done some more work on PyScan. Most work is under the hood but I’ve got a system to load and save projects and a bug was fixed: If you pressed the Scan button too quickly with 0.4, the last image could have been overwritten. PyScan is now based on hplip 3.9.8.

PyScan 0.6.tar.gz (16KB, MD5 Sum: 2b5e23099be438ceceb69ec23d64cec6)

See the original post for features and the system requirements.


Attaching Sources To Eclipse Artifacts

8. October, 2009

After running mvn eclipse:make-artifacts -DstripQualifier=true -DeclipseDir=...path/to/eclipse, you might have lots of source JARs but they aren’t in the right place for Maven 2 to pick them up.

This little Python script fixes that. Just run it after eclipse:make-artifacts.

Note: You may be wondering why I use eclipse:make-artifacts instead of the recommended eclipse:to-maven. Simple: I don’t like to have a dozen core-*.jar in my project.

"""Maven 2 Eclipse Artifact Source resolver

After importing Eclipse artifacts into an M2 repository with

> mvn eclipse:make-artifacts -DstripQualifier=true -DeclipseDir=.../eclipse

run this script to move all source JARs in the right place for
Maven 2 to pick them up.
"""
import os, sys
from shutil import copyfile

def processGroup(path):
    print 'group %s' % path
    for dir in os.listdir(path):
        path2 = os.path.join(path, dir)
        if '.' in dir:
            processArtifact(path2)
        else:
            processGroup(path2)

def processArtifact(path):
    srcPath = path + '.source'
    #print 'processArtifact',srcPath
    if os.path.exists(srcPath):
        processArtifactWithSource(path)

def processArtifactWithSource(basePath):
    binPath = basePath
    srcPath = basePath + '.source'
    baseName = os.path.basename(basePath)
    print 'artifact', baseName
    
    for version in os.listdir(binPath):
        vbinPath = os.path.join(binPath, version)
        vsrcPath = os.path.join(srcPath, version)
        
        srcJar = os.path.join(vsrcPath, '%s.source-%s.jar' % (baseName, version))
        destJar = os.path.join(vbinPath, '%s-%s-sources.jar' % (baseName, version))
        
        if os.path.exists(srcJar):
            print '%s -> %s' % (srcJar, destJar)
            copyfile(srcJar, destJar)

m2repo = os.environ['M2_REPO']
if not m2repo:
    raise Exception('Env variable M2_REPO is not set')

root = os.path.join(m2repo, 'org', 'eclipse')
for dir in os.listdir(root):
    path = os.path.join(root, dir)
    processGroup(path)

Updated 01. December 2009: The script now figures out where your M2 repo is. Plus a minimum of documentation.


Printing Big Stuff On Linux

22. June, 2009

During the weekend, I tried to print my family tree. It was pretty simple to generate with GenealogyJ but frankly, the family tree view sucks. I was constantly shifting my “root” node to be able to see the parts I was interested in, etc. And printing this sucks even more. GJ will use lots of paper, printing huge empty boxes with lots of space around them. If I reduced the size of the parts which I don’t need, the layout fell apart. In short, I needed something better.

Graphviz to the rescue. A small Python program (100 lines) read the GED file and turned it into a graph with the layout of the persons just the way I wanted them to be. dot thought about the graph for a few seconds until it emitted a nice SVG which I could then print. Or so I thought.

I opened Inkscape and clicked on print. Yeah … that looks like the document … or rather a small part of it. Where is the poster print option? Ah, there is none. Great. Export as PNG with … oh … 150 DPI. Starting GIMP. No poster printing either. lpr tries to scale the image by .114537 which results in a 87 MB file. You gotta be kidding! kprinter? Nope … unless the command poster can be found.

A word of warning: The poster for openSUSE 11.1 (which you get with zypper) is broken. I tried it with both PostScript and Encapsulated PostScript and both resulting files were unusable in GhostView, GhostScript and Okular. Use this one instead.

After installing psutils, I could rotate the file to fit better on the page, too.

Conclusion: Most programs on Linux create PostScript files but printing them is still something for the command line. The recent print dialogs all look somewhat similar (but are different in tiny, annoying ways), each version is missing some important detail and most of them fail to simply print an oversized image on a single piece of paper or spread it over several. That Inkscape spits out an empty page after the document is just a minor issue.

What’s worse: The print preview either doesn’t work or doesn’t exist, printing to file is not implemented either or isn’t persistent. In the whole process, I ruined roughly 50 sheets of paper. *sigh*

In the end, I had a process which involved six different programs (GenealogyJ, Python, graphviz, Inkscape, pstops, poster) just for this standard task. Printing on Linux has some way to go, yet. On the positive side, I could script all these tasks and I could do it.


PyScan: A Little Helper For The HP CM1312nfi MFP Scanner

18. April, 2009

Update: Find the latest version (0.6) here.

I recently bought a HP CM1312nfi MFP scanner (multi function device with scanner and color laser printer). After scanning some 1’000 pages, I’m still satisfied with the device. The document feeder (ADF) sometimes tries to eat the paper after spitting it out and the colors could be a little more brilliant but overall, a good deal for the price.

What bothered me was that the “Start scan” button doesn’t work on Linux. But someone posted a script in the bug report which can poll the button by reading the URL http://$ip/hp/device/notifications.xml (replace “$ip” with the IP address or DNS name of the scanner). This returns some XML with two interesting elements: StartScan and ADFLoaded. The first one becomes 1 when someone presses the “Start scan” button on the scanner and the second one tells us whether there is some paper in the ADF.

With that and some source code, it was simple to create a little tool that works quite like xsane but fixes a couple of things that annoyed me for a long time:

  1. The UI of xsane is dead while it scans
  2. There is no online preview of the scan; you have to open the file in some extra tool to verify that the scan looks OK
  3. xsane doesn’t know about scan “projects”
  4. xsane doesn’t start to scan when I press the button on the scanner

As with all OSS software, this thing can seriously ruin your day, so be a bit careful. One of the biggest problems is the file size: To be able to edit files without loss of quality, TIFF format is the default. Each full page scan takes 26MB, 100 pages need 2.6GB!

Plans for 0.5: Allow to edit projects in the UI, select them, save and load them. Right now, you must define your projects via the command line or by editing the source code.

Download: PyScan-0.4.tar.gz (12KB, MD5 checksum)

Dependencies (see README.txt for download links):

  • Python 2.6
  • PyQt4 4.4.3
  • Python Imaging Library 1.1.6
  • Python Imaging SANE 1.1.6 (needs included patch; see README.txt for instructions).

Features

  • Code to load images in a background thread, generate thumbnails (compatible to Konqueror/Dolphin) and display them in a list view
  • Display a (big) image with various manual and automatic zoom levels and modes (fit to window, percent) with zoom and pan
  • Online preview of scan in progress

Hideous details of the source

Again and again, I’m astonished how simple some tasks are in Python and Qt … if you’re willing to accept some “non-OO-ness” of the solution. I’ll explain some things I did here to give you an idea what’s going on.

Online preview

PyScan has an online preview of the currently active scan. If you look at the documentation, the Python Imaging SANE interface offers no way to do that. After looking at the source, I found that the SANE interface simply reads bytes from the SANE scanner module and copies these into a PIL image which was created on the Python side.

So my solution is to be notified that a scan is in progress and then copy said image every second (all 26MB) into a string. That string is then used to build a QImage which is turned into a QPixmap which is then displayed in the right pixmap view. See pilImage2QImage() for the details.

Background threads

I also moved all expensive code into threads: Loading big TIFF images, scaling them down to thumbnails, saving the images, etc. All threads have a method to add work to their input queues and they send Qt signals when they’re done. Continuing the scanning when there is paper in the ADF tray was a bit of a problem, through.

Since the saving of the images is happening in a background thread, the code could start the next scan before the saving was completed. This wasn’t such a big problem except that the “scan next image” code looks for files on the disk to determine the next filename. This would lead to overwrites. So I had to synchronize this somehow. My simple hack was to set a boolean “waiting” in the scanner thread which indicates that the scanner has more paper to process and waits for the save thread to complete. When the UI gets the “image saved” signal, it triggers the scanner to continue.

Generating thumbnails

The last hack in the code is the generation of the thumbnails. The main issue here was that I need the thumbnails for the gallery view really deep down in the Qt render code. Wasting time at that level is really a no-no but at first glance, the API offers no way to defer loading of the images and then later update the items in the list view when the data is available. Keep in mind what I need to do:

  1. Load a 26MB file from disk
  2. Scale it with antialiasing
  3. … for hundreds, possibly thousands of files!

My solution: In the render code, I create a LazyPixmap. This is just dumb object to save the filename and a placeholder pixmap which is used into the real thumbnail becomes available. The LazyPixmap will schedule a job for the LoaderThread.

In my first code, I tried to create a QPixmap in the LoaderThread but that doesn’t work: Only the UI thread is allowed to create a QPixmap. Duh. But luckily, Qt offers the QImage class which works even without a UI and which offers basically the same API as QPixmap. So the LoaderThread can load the image from disk and scale it down (to save memory and avoid heavy computation in the UI thread) right before emitting a “loaded” signal.

There are two places where a LazyPixmap is used: In the PixmapWidget (which can display and zoom a QPixmap) and in the ThumbnailDelegate which draws the thumbnails for the filenames in the GalleryModel.

In the case of the PixmapWidget, the signal will be handled in lazyLoaded(). Here, we convert the QImage into a QPixmap (in lpm.getPixmap()) and assign that pixmap, recalculate the zoom factor, realign the view, etc.

The GalleryModel, I have the problem that I need to tell Qt somehow that the pixmap has changed but the API offers nothing except rendering the whole widget by calling update(). This will render at most (on a huge screen) 30 pixmaps. Happens one time per visible pixmap, causes no flicker. Probably not worth to waste another second on it.

If you look at the code, you’ll see that a class called KDEThumbnailCache is used. This class accesses the same thumbnails als konqueror (KDE3) or Dolphin (KDE4). This means once the images are scaled down (either by my code or Dolphin), all tools can quickly load the small, precalculated thumbnails instead of having to scale the 26MB files again.

Conclusion

Well, that’s it for a small walk through the code. Feel free to give feedback if you like PyScan (or not) or when you have patches.


Subtext: Visual Programming, New Angle

2. March, 2009

If you have no idea what subtext is, lean back and watch this presentation.

It’s nice to finally find another person concerned about the state of programming languages. I started with C, toyed a bit with some other languages, moved to Java and today, I’m working mostly with Java, Groovy and Python. I’m doing all my spare-time code in Python. Why Python? Because I get more bang for the key press. And my spare time is most valuable.

So while I thoroughly agree that the idea of subtext is convincing, it’s too limiting at the same time: There are simple problems which you can’t express well in subtext, for example: A switch with 10 cases and some complex code in each case. It would just become too wide. The same applies when formulas have more than ten parameters. You’re flow tree looks nice but it takes more screen real estate than the “traditional” version.

So my argument is that we need a way to choose. Software projects need to give up the holy grail of “one language to rule them all” but the IDE should allow to mix and match various languages and more complex “objects” like tables, rich text, animations. Why do I have to waste my time formatting tabular content in a Java file (think array of values) when I can have Excel? Why can’t Java read the data directly from Excel? Why can’t I embed Excel tables in Java source code and access them like a primitive array? Why can’t I use a rich word processor to write my comments? Why is TAB 1-8 spaces instead of “one level of indent”? Why do I have to use braces when I already indent my code?

Because our computers are not powerful enough today. With every key we press, we have to worry about RAM and performance. Because companies still believe in lock in. Sun would probably add a cross-platform COM API into Java but will Microsoft port Excel to any platform where a Java compiler is available? Oh, we could use OpenOffice. Let’s see. People working for a software development company that has more then two employees: Comment here if your company policy allows to use OO instead of Office. Now let’s see how long it takes to get 10 comments.

In the end, what we have today is the most simple thing that actually works and doesn’t take too much RAM. I hope the time is ripe for the next step. I’m sick of fixed-width fonts, curly braces and source code which is 1% functionality and 99% “make the damn compiler happy”.


Follow

Get every new post delivered to your Inbox.