Good Introduction to Threads and Shared Data Structures

20. June, 2014

Soon, we’ll have computers with 1024 cores but that won’t help unless software developers write code that make use of them.

To do that, you need a fundamental understanding on how threads work and what parallel algorithms are and what the real-world problems might be.

Dmitry Vyukov has created a web site “1024cores” which gives you both an introduction to the basics (what are we talking about? which tools to we have? what is a memory model and why should I care?) as well as some practical algorithms like concurrent skip lists with detailed descriptions of the problems that you will encounter and how to fix them.

If you want to know what the discussion is all about or if you want to polish your knowledge or if you need a specific solution, this is a good place to start 🙂

Jazoon 2010, day 2

7. June, 2010

This is my report of day 2 (see my posts about day 1).

Total Cost of Ownership by Ken Schwaber

This talk was basically about Scrum and the fact that you can’t get something of value for free. Or to put it another way around: If you save some time today by quickly hacking up a feature, you’re gonna pay in the future. There is even an interest on this, so the later you have to pay, the more expensive it will get.

So the next time your boss asks you to do something “quick”, ask him whether (s)he is aware of the total cost and whether (s)he is really willing to pay it.

Unleash your processor(s) by Václav Pech

We all know that CPU’s don’t get faster, they just reproduce faster. PCs sold today have 4 cores (and each core can execute two threads at the same time). In 2012, intel is planning to release a 50 core chip and that’s just peanuts to what you can find on your graphics card (which can have roughly 600 very simple CPUs on a single chip).

The main problem here is that we, as humans, are extremely good at parallel processing at the hardware level (most of our body continues to works while we talk, think, eat, etc.) but we’re extremely bad when thinking about parallel problems.

Concepts from the HPC world and mainframes come to the rescue: Actors, Fork/Join, Parallel Arrays, Agents and Dataflow.

The idea is to get away from the tedious synchronization and use data structures which are already thread-safe and then write simple algorithms which are invoked by a framework on an as-needed basis. Imagine you have a huge amount of images to scale to thumbnails. The algorithm is always the same and it works independent of the input. So you can allocate a number of generic worker threads. Each of them gets a copy of the algorithm at runtime plus the arguments (image and thumbnail file name).

Then you have an algorithm to traverse the directory tree which produces the input and output file names. Instead of doing everything yourself, you take a parallel array and add file names as your tree workers discover them. This will trigger the thumbnail workers.

The interesting thing here: No synchronization. You don’t even write the threads. All you do is a single call:

inParallel (filenames, thumbnailer);

The thumbnailer is just as simple:

public class Thumbnailer extends Actor {
    public void act (Object item) {
        File input = ((File[])item)[0];
        File output = ((File[])item)[0];
        ... insert favorite scaler here ...

See? No synchronized, volatile or extends Thread. Can’t wait? Check out JSR-166y

JavaFX: Designer developer workflow by Martin Gunnarsson and Pär Sikö

Tough one since I couldn’t decide where to go. HTML 5 with WebSockets? Maven 3.0?

Mouth-wetting talk about what you could do with software if everyone was just a little bit more open. In the talk, they showed how you could draw something in Photoshop and then export the design and use it directly in JavaFX.

It also showed some of the new features of JavaFX 1.3 which seems to follow the historical model of Java: 1.2 is the first version which is really usable. But it’s nice to see some progress in the Java world at last. I just wished we’d have got these five years ago when it would have mattered 🙂

Lunch break. 🙂

JavaFX – The condemned live longer by Andreas Fürer, David Sauter and Daniel Seiler

Along the same lines as the previous talk but this time, it shows some of the dark sides of JavaFX. Mostly boils down to: If you want to do fancy graphics in JavaFX, just do it. If you want to use it for more traditional UIs , then think again. Everything but the most simple components are still missing and those which exist sometimes have ugly bugs. 😦

My conclusion: Immature, oversold technology (or in managese: Sun’s bold response to the threads imposed by RCP and Flex/AIR).

Patterns and Practices in Hibernate by Patrycja Wegrzynowicz

Hibernate might be the most successful OR mapper for Java but it’s not the most simple (which is partly because of the documentation and that the problem itself isn’t very simple). I own the standard book about Hibernate, too, and I can agree that it explains in detail all the great features of Hibernate but it doesn’t answer “Why would I use that?”

Patrycja did a code review of the examples in the book and came up with lots of small problems and a couple of major ones (like code which doesn’t lock the rows in the correct order leading to an illegal state in the database). I talked to her and she wants to put the results of her code review online. This would mean we’d get a project with correct examples for using Hibernate.

She also mentioned SOLID which is an acronym made up from acronyms and stands for:

S – SRP (Single responsibility principle),
O – OCP (Open/closed principle),
L – LSP (Liskov substitution principle),
I – ISP (Interface segregation principle),
D – DIP (Dependency inversion principle)

This blog also explains it very well.

Migration to JPA – real life experience by Jan Sliwa

It’s always interesting when marketing hits the real world and all those buzz words are stripped to the bones. Jan talked about how to build a Java application which connects data centers all over Europe which contain sensitive data (medical records). To make the data secure, they applied a simple solution: The personal data is stored on the computers of the responsible doctor and only the medical files are saved on the servers. This means that the medical data itself is anonymous.

Two of the problems they encountered were:

  1. Creating an EntityManagerFactory is expensive. When do I open/close one? Is one enough for the whole application? Do I need a pool?
  2. How do I know whether an object is detached from the session?

He also talked about problems during testing. Maybe he should read my blog more often 🙂

Managed JPA in an OSGi framework – getting the best of both worlds by Tim Ward

OSGi is a framework for the paranoid. By default, it hides everything. So how do you expose your model to both the JPA framework and all the other places where it is used?

Tim explains the problems they encountered and how they solved them. My conclusion: For your problem, OSGi is not the solution. Spring and similar frameworks have shown how to do DI properly and Maven has shown how to handle dependencies. OSGi more and more feels like a remnant from the cold war where no one trusted anyone.

That’s for all for day 2. Next: day 3.