Suspend Fail in openSUSE 12.1 After Upgrading KDE

29. June, 2012

When you upgrade openSUSE 12.1’s KDE 4.7 to 4.8 (using this repo), suspend to disk or RAM might stop working. If so, you’ve encountered bug 758379: STR (Suspend to RAM) fails when NetworkManager running and NFS shares mounted

The description is a bit misleading. It also happens for suspend to disk (STD) and when you don’t use NetworkManager.

Workaround: Unmount your NFS shares before you try to suspend:

sudo umount -t nfs -a

If you use NFS v4, then the command is:

sudo umount -t nfs4 -a

To check whether it worked, use this:

mount | grep nfs

This shouldn’t print anything with “type nfs” anymore. Afterwards, suspend should work.

Jazoon 2012: Divide&Conquer: Efficient Java for Multicore World

29. June, 2012

Not much new in the talk “Divide&Conquer: Efficient Java for Multicore World” by Sunil Mundluri and Velmurugan Periasamy.

Amdahl’s law shows that you can’t get an arbitrary speed-up when running part of your code in parallel. In practice, you can expect serial code to execute 2-4 times faster if you run it with, say, the fork/join framework of Java 7. This is due to setup + join cost and the fact that the tasks themselves don’t get faster – you just execute more of them at the same time. So if a task takes 10 seconds and you can run all of them in parallel, the total execution time will be a bit over 10s.
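To put numbers on Amdahl’s law, here is a small sketch of the usual formula, speed-up = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The value p = 0.75 is only an assumed figure for illustration; it is not from the talk.

```java
// Amdahl's law: upper bound for the speed-up when only part of the work runs in parallel.
final class Amdahl {
    static double speedup(double parallelFraction, int workers) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / workers);
    }

    public static void main(String[] args) {
        // Assumed example: 75% of the work can be parallelized.
        for (int workers : new int[] {2, 4, 8, 1024}) {
            System.out.printf("workers=%d -> %.2fx%n", workers, speedup(0.75, workers));
        }
    }
}
```

With p = 0.75, the speed-up can never reach 4, no matter how many cores you add, which fits the 2-4 times range mentioned above.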

If you want to use fork/join with Java 6, you can add the jsr166y.jar to your classpath.
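For readers who haven’t seen fork/join yet, this is roughly what a divide-and-conquer task looks like with the Java 7 API. It is a minimal sketch and not code from the talk; the class name SumTask and the threshold of 10,000 elements are assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by splitting it in halves until the pieces are small enough.
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cut-off, tune for real workloads
    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // run the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

The threshold is where the setup and join cost shows up: make the pieces too small and the overhead eats the gain.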

Again, functional programming makes everything simpler. With Java 8 and lambda expressions, syntactic sugar will make things even more readable but at a price.

You might want to check one of today’s new languages like Xtend, Scala or Groovy to get these features today with Java 6.

Jazoon 2012: Serialization: Tips, Traps, and Techniques

29. June, 2012

Every once in a while, you learn something new even though you thought you knew it all. That’s what happened to me during the talk “Serialization: Tips, Traps, and Techniques” by Ian Partridge.

Serialization is such a basic, old technique that it’s surprising that you can learn something new about it. In a nutshell, serialization converts between object graphs and byte streams.

Unfortunately, the API is one of the oldest in the Java runtime. And it’s not one of the best. On the other hand, it’s used in many places like RMI, EJB, SDO, JPA and distributed caching.

What did I learn? Let’s see.

Did you know that it’s possible to serialize a class (without error) that you can’t read back in? It’s actually pretty simple to do: Extend a non-serializable superclass that doesn’t provide an accessible default constructor (you know, the one without any arguments).
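Here is a minimal sketch of this trap. Strictly speaking, the missing no-argument constructor has to be in the nearest non-serializable superclass. The class names Base and Child and the helper methods are made up for illustration.

```java
import java.io.*;

final class ConstructorTrap {
    // Not Serializable and no no-argument constructor: this is the trap.
    static class Base {
        final String name;
        Base(String name) { this.name = name; }
    }

    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        Child(String name) { super(name); }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns false when the stream cannot be turned back into an object.
    static boolean canDeserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // "no valid constructor"
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Child("x")); // writing works without complaint
        System.out.println("can read back: " + canDeserialize(data));
    }
}
```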

You also shouldn’t try to serialize non-static inner classes because they keep a hidden reference to the outer instance.

When you use serialization, then you must take into account that the serialized form becomes part of the public API. This means that private and even final fields are suddenly part of your API that you need to document. Why? Because ObjectInputStream creates the instance without running any constructor of your serializable class (only the default constructor of the first non-serializable superclass is called) and then sets the fields, even the final ones, using Unsafe.putObject().

If you validate the parameters in your constructor, then you will have to repeat those checks in readObject(). On top of that, you can’t trust the instances which you get from the Serialization API. An attacker can manipulate the byte stream to get references to internal data structures which you most certainly don’t want to expose.

The readUnshared() and writeUnshared() methods were added with Java 1.4 to solve this but they don’t work. Forget about them.

Anything else? Oh, yes: The serialVersionUID. Besides all the known problems, there is another one:

private static int COUNTER = 0;
public static class Version1 implements Serializable {
    void foo() {
        COUNTER = COUNTER + 1;
    }
}

If someone fixes this code to

private static int COUNTER = 0;
public static class Version1 implements Serializable {
    void foo() {
        COUNTER += 1;
    }
}

then deserialization fails for some versions of Java because the generated hidden accessor methods change.

The Serialization Proxy Pattern solves many of these problems.

If you use proxies, consider using Externalizable instead of Serializable.
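For reference, a minimal sketch of the proxy pattern as it is usually described (for example in “Effective Java”). The Range class and its fields are invented for this example. The point is that deserialization goes through the normal constructor again, so the checks only live in one place.

```java
import java.io.*;

final class Range implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int low;
    private final int high;

    Range(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low > high");
        }
        this.low = low;
        this.high = high;
    }

    int low() { return low; }
    int high() { return high; }

    // Write the proxy to the stream instead of this object.
    private Object writeReplace() {
        return new Proxy(low, high);
    }

    // Reject byte streams that try to bypass the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    private static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int low;
        private final int high;

        Proxy(int low, int high) {
            this.low = low;
            this.high = high;
        }

        // Rebuild the real object through its constructor, so the checks run again.
        private Object readResolve() {
            return new Range(low, high);
        }
    }

    // Helper for the example: serialize to a byte array and read it back.
    static Range roundTrip(Range range) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(range);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Range) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```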

Jazoon 2012: Akka 2.0 – Scaling up and out with Actors

29. June, 2012

Concurrency is too hard but we need it. In his talk “Akka 2.0 – Scaling up and out with Actors,” Viktor Johan Klang showed new features of Akka 2.0.

The framework now uses Future to create pipes between actors and Promise to write data to, say, a stream (docs).

To make error handling simpler, there is now “parental supervision.”

Decoupling actors becomes even easier with the Event Bus API.

There is support for ZeroMQ to create grids/meshes of actors (docs).

But every framework has its limitations. If you hit one of those, it’s usually either “Use the Source, Luke” or “You’re out of luck”. Akka 2.0 comes with a new extensions mechanism to hook into the framework.

Jazoon 2012: Messaging in the cloud – why do i care?

29. June, 2012

In his talk “Messaging in the cloud – why do i care?”, Oleg Zhurakousky showed some examples why you should know about messaging even if you don’t use cloud computing.

What is messaging? When a producer sends a message to a consumer over a channel/transport.

What kinds of messaging are there? Point-to-point (P2P) and publish/subscribe. An example of the former is writing a file to hard disk – you don’t expect that file to appear in several places. The latter is used in mailing lists.

P2P can be active or passive. In the active scenario, the consumer gets the message immediately. Example: Watching a web page in your browser. You wouldn’t want the browser to tell you “go drink some coffee, I’ll let you know when it’s done.”

In the passive case, the message is stored somewhere so the consumer can process it at its leisure. Your mailbox is an example of this (off- and on-line).

All messaging systems are only one-way. If the consumer can reply, the implementations always make the consumer a producer. Think web sites. Your browser (producer) sends a message to the server (consumer): “I want to see this page”. Then the server becomes the new producer when it sends data to the browser (new consumer).

As you can see, we’re using message-based systems all the time. What makes them so interesting?

They are easy to set up, easy to maintain and easy to make fault tolerant. For example, you can have these generic kinds of consumers in your network:

  • Transformers – Turn one kind of message into another. XML to JSON or CSV, binary data to text, insert data into a database
  • Filters to ignore some messages without changing the code of the consumer
  • Routers to redirect messages to consumers that are interested in them
  • Splitters that can copy (parts of) a message to several consumers (distribute part in map-reduce framework)
  • Aggregators that can join several messages into a single one (reduce part in map-reduce framework)

Messaging also allows some nifty tricks like sending a message again after a timeout. If you keep a claim check, you can easily make sure that the receiver will get only a single copy of the message.
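To make the building blocks from the list a bit more concrete, here is a toy sketch in plain Java where messages are just strings. Everything in it (the class name, the comma-separated format, the “ERR” routing rule) is an assumption for illustration; a real messaging framework would give you these pieces ready-made.

```java
import java.util.*;

final class MessagePipeline {
    // Filter: ignore messages nobody cares about (here: blank ones).
    static boolean accept(String message) {
        return !message.trim().isEmpty();
    }

    // Splitter: one message becomes several parts.
    static List<String> split(String message) {
        return Arrays.asList(message.split(","));
    }

    // Transformer: turn one kind of message into another (here: normalized text).
    static String transform(String part) {
        return part.trim().toUpperCase(Locale.ROOT);
    }

    // Aggregator: join several parts into a single message again.
    static String aggregate(List<String> parts) {
        return String.join("|", parts);
    }

    // Router: pick the consumer that is interested in the message.
    static String route(String message) {
        return message.startsWith("ERR") ? "errors" : "default";
    }

    // Wires the pieces together and returns the messages per destination.
    static Map<String, List<String>> process(List<String> incoming) {
        Map<String, List<String>> destinations = new LinkedHashMap<>();
        for (String message : incoming) {
            if (!accept(message)) {
                continue;
            }
            List<String> transformed = new ArrayList<>();
            for (String part : split(message)) {
                transformed.add(transform(part));
            }
            String result = aggregate(transformed);
            destinations.computeIfAbsent(route(result), key -> new ArrayList<>()).add(result);
        }
        return destinations;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("err,disk full", "  ", "ok")));
    }
}
```

None of the steps knows about the others, which is why you can add or remove one without touching the consumer code.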


Jazoon 2012: Syntactic Salt and Sugar

29. June, 2012

Syntactic Salt and Sugar was a presentation given by James Gould and Alex Holmes. They were talking about some recent developments and whether they are good (sugar) or bad (salt).


DSLs are becoming ubiquitous. Everyone wants, needs and does DSLs today. But think of this for a moment: Is SQL a DSL?

Scary thought, eh? It’s certainly a limited language but since it’s Turing complete, the limits are more in the pain of writing queries and not in the fact that it’s a language designed to query data sets.

The advantage of DSLs is that you can fine tune them to your domain. That can help to avoid a lot of confusion.

But …

  • There are five people on this planet who can develop a nice syntax that is easy to use, easy to read, easy to understand and mostly consistent. Guido van Rossum is one of them. You’re not.
  • It’s easy to fall for the “one more feature” trap in a DSL. The most important property of a DSL is that it’s limited. It’s not a general purpose programming language.
  • Getting the syntax right is very, very hard. It’s easy to define syntax in the Xtext grammar editor – as long as you blissfully ignore the consumers of your DSL. As soon as you try to make their lives easier, all hell will break loose. Do you allow trailing commas? How do you handle ambiguities? Did you make sure all error messages make sense? Is it still readable? Can you add features without breaking all existing code?
  • YALTL – Yet another language to learn

Default Methods in Java 8

In Java 8, you can add method bodies to methods defined in interfaces:

public interface Foo {
    default String getName() { return "Foo"; }
}

Finally, you can have mixins in Java. Yay ^_^

Now, some people will wonder: Isn’t that multiple inheritance?

Yup. And as usual, because of some “features” of Java, they had to implement this in a … surprising way. What does this code print?

public interface A {
    default String getName() { return "A"; }
}

public interface B {
    default String getName() { return "B"; }
}

public class C implements A, B {
    public static void main(String[] args) {
        System.out.println(new C().getName());
    }
}

Nothing. It doesn’t compile because the compiler can’t decide which method to call. But this one compiles:

public interface A {
    default String getName() { return "A"; }
}

public interface B {
    default String getName() { return "B"; }
}

public interface C extends B {}

public class D implements A, C {
    public static void main(String[] args) {
        System.out.println(new D().getName());
    }
}

If you’re wondering: Instead of inheriting directly from “B”, I added a new interface “C”. Now, “A” is “closer” and it will print “A”. In Java 8 as released, however, the compiler rejects D as well (“inherits unrelated defaults”), so you have to override getName() in D and pick an implementation explicitly, for example with A.super.getName().

That means changes in A or C can modify the behavior of D. If you’re lucky, the compiler will refuse to compile it. *sigh*
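If you want to be independent of such rules, you can always resolve the conflict yourself. The sketch below uses the syntax of the released Java 8 and nests the types in a wrapper class (DefaultMethodDemo, a made-up name) only to keep the example in one file.

```java
final class DefaultMethodDemo {
    interface A {
        default String getName() { return "A"; }
    }

    interface B {
        default String getName() { return "B"; }
    }

    interface C extends B {}

    static class D implements A, C {
        // Without this override, D inherits two unrelated defaults and does not compile.
        @Override
        public String getName() {
            return A.super.getName(); // explicitly pick the implementation from A
        }
    }

    public static void main(String[] args) {
        System.out.println(new D().getName());
    }
}
```

Now a change in A or C can no longer silently switch the method that D uses.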

No Free Lunch

Again, it’s easy to see that each feature comes with a cost attached.

Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?


In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid using RPC/RMI. They try to lure you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASE: Drop ACID And Think About Data
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak in places where you can’t have it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Trying to be clever with locks and concurrency will only get you more bugs, unmaintainable code and a heart attack.
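As a reminder of how little it takes, here is a sketch of an immutable value class that several threads use without any locking. The Money class and the amounts are invented for this example.

```java
// Immutable value: all fields are final and "changing" it returns a new instance.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    String currency() { return currency; }
    long cents() { return cents; }

    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(currency, cents + other.cents);
    }

    public static void main(String[] args) throws InterruptedException {
        final Money price = new Money("CHF", 1999); // shared by all threads, no locks needed
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int quantity = i + 1;
            workers[i] = new Thread(() -> {
                Money total = new Money("CHF", 0);
                for (int n = 0; n < quantity; n++) {
                    total = total.plus(price);
                }
                System.out.println(quantity + " x price = " + total.cents() + " cents");
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }
}
```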

Dependencies kill a project faster and more efficiently than almost any other technique. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”

Obvious programming has a value: You can see what happens. It has a price: Boilerplate code. You can try to hide this but it will still leak. Hibernate is a perfect example of this. Yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It does generate proxies to save you from seeing the query but that can break equals().

Same applies to RMI. When RMI decides that you can’t handle the message, then you won’t even see it. In many cases, a slightly “unusual” message (like one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network in your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources that are caused by adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message

Claim checks allow you to resend a message after a timeout without having it processed twice by the consumer.
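One way to read this is that the claim check acts as a unique ID which the consumer remembers. That interpretation is my assumption, and so are all names in the following sketch. The consumer simply ignores any message whose claim check it has already seen, so the producer can resend as often as it likes.

```java
import java.util.*;

// Consumer that processes each claim check at most once.
final class IdempotentConsumer {
    private final Set<String> seenClaimChecks = new HashSet<>();
    private final List<String> processed = new ArrayList<>();

    // Returns true if the message was processed, false if it was a duplicate.
    synchronized boolean receive(String claimCheck, String payload) {
        if (!seenClaimChecks.add(claimCheck)) {
            return false; // already handled, the resend is harmless
        }
        processed.add(payload);
        return true;
    }

    synchronized List<String> processed() {
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        consumer.receive("check-1", "charge 10 CHF");
        // The producer ran into a timeout and sends the same message again.
        consumer.receive("check-1", "charge 10 CHF");
        System.out.println(consumer.processed());
    }
}
```

In a real system, the set of seen claim checks would have to survive a restart of the consumer, which this sketch ignores.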

CALM and BASE refer to the fact that you can only have two of the three CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.
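Until then, here is a rough sketch of the idea behind restoring a shopping cart from an event stream: The cart is never stored directly, it is rebuilt by replaying the events. The event types and class names are assumptions for illustration and not taken from any particular framework.

```java
import java.util.*;

final class CartEvents {
    enum Type { ADDED, REMOVED }

    // One entry in the event stream: what happened to which item.
    static final class Event {
        final Type type;
        final String item;

        Event(Type type, String item) {
            this.type = type;
            this.item = item;
        }
    }

    // Rebuilds the current cart (item -> quantity) by replaying all events in order.
    static Map<String, Integer> replay(List<Event> stream) {
        Map<String, Integer> cart = new TreeMap<>();
        for (Event event : stream) {
            int delta = event.type == Type.ADDED ? 1 : -1;
            int quantity = cart.getOrDefault(event.item, 0) + delta;
            if (quantity <= 0) {
                cart.remove(event.item);
            } else {
                cart.put(event.item, quantity);
            }
        }
        return cart;
    }

    public static void main(String[] args) {
        List<Event> stream = Arrays.asList(
            new Event(Type.ADDED, "book"),
            new Event(Type.ADDED, "book"),
            new Event(Type.ADDED, "pen"),
            new Event(Type.REMOVED, "pen"));
        System.out.println(replay(stream));
    }
}
```

When a shard dies, another node can run the same replay to get the cart back, as long as the events themselves were stored somewhere safe.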

Jazoon 2012: IBM Watson since Jeopardy!

29. June, 2012

From the summary:

In February 2011, IBM demonstrated its latest Research breakthroughs in natural language processing and deep question answering. Named Watson, it made history when it was entered into the famously complex US television quiz-show ‘Jeopardy!’ where it comfortably beat two of the greatest human players ever to appear on the show. Since then, work has focused on bringing these breakthroughs to real-world problems.

If you haven’t seen the video, now is a good time: Episode 1, Episode 2, Episode 3

Before the show, Watson was trained with data from a variety of sources, including Wikipedia and dbpedia. The software is able to process both unstructured and structured data and learn from it. That means it converts the data into an internal representation that its various answer finding modules can then use. These modules include classic AI inferencing algorithms as well as Lucene based full-text search modules.

This is basically what makes Watson different: Instead of relying on a single, one-size-fits-all strategy, Watson uses many different strategies and each of them returns a “result” where a result consists of an answer and a “confidence” that this answer might be useful or correct.

Instead of mapping all the confidence values to a predefined range, each module can return any number for confidence. So some modules return values between 0 and 1, others from -1 to 1 and yet others return values between +/-∞ (including both). The trick is that Watson uses an extensive training session to learn how to weigh the outputs of the different modules. To do this, the correct answers for a large set of questions are necessary.

Which makes Jeopardy! such a perfect fit: They have accumulated the correct answers for thousands of questions that were asked during the show and that made it so “easy” to train Watson automatically because IBM engineers could debug the answering process when Watson erred.

But Watson isn’t about winning TV shows. The current goal is to turn Watson into a tool that can be used by doctors around the world to identify illnesses. Today, doctors work so many hours per week that they can only read a tiny fraction of all the articles that are published. Surveys show that 81% of doctors read less than 5h/month. One solution would be to hire more doctors. Guess what that would mean for costs in the health sector.

Or we could make Watson read all that and present all that blabla in a compressed form when the symptoms match. Think Google where you don’t know what you’re looking for.

Sounds good? Or frightening? Some people in the audience were thinking “Skynet” but here are some facts that you should know:

  • In health care, Watson is a “medical device”. These are heavily regulated.
  • The goal is not to have “Dr. Watson.” The goal is to give doctors a smart library, not a smart ass or something that can directly make unsupervised decisions about the next therapy step.
  • IBM isn’t developing the product alone. They are working with companies from the health care sector who know how doctors (should) work. You might want to see this video: Watson Computer Comes to the University of Maryland and have a look at this channel: IBMWatsonSolutions
  • Privacy is an important concern. Watson will see millions of medical records. There are pretty strict laws governing this (HIPAA)
  • Watson isn’t a data warehouse. It won’t process all the medical records into one huge data set which it can query. Instead, doctors will enter symptoms in a standardized way and Watson will present a list of things to check plus medical conditions that match.
  • For training, Watson needs a huge list of correct answers. It doesn’t try to find patterns by itself.

So unlike Skynet, Watson is much more like a boring tool. Sorry.

One very interesting aspect is that Watson is something that you won’t buy as a product. Instead, it’s probably going to be a cloud service which charges, say, per question.

Other fields where it would be useful:

  • Justice. Who has time to read all the laws and regulations that the government ships all the time?
  • Legislation
  • Engineering
  • Research in chemistry, physics and genetics