Jazoon 2012: Syntactic Salt and Sugar

29. June, 2012

Syntactic Salt and Sugar was a presentation given by James Gould and Alex Holmes. They were talking about some recent developments and whether they are good (sugar) or bad (salt).

DSLs

DSLs are becoming ubiquitous. Everyone wants, needs and does DSLs today. But think of this for a moment: Is SQL a DSL?

Scary thought, eh? It’s certainly a limited language, but since it’s Turing complete, the limits lie more in how painful it is to write certain queries than in the fact that it’s a language designed to query data sets.

The advantage of DSLs is that you can fine-tune them to your domain. That can help avoid a lot of confusion.

But …

  • There are five people on this planet who can develop a nice syntax that is easy to use, easy to read, easy to understand and mostly consistent. Guido van Rossum is one of them. You’re not.
  • It’s easy to fall for the “one more feature” trap in a DSL. The most important property of a DSL is that it’s limited. It’s not a general purpose programming language.
  • Getting the syntax right is very, very hard. It’s easy to define syntax in the Xtext grammar editor – as long as you blissfully ignore the consumers of your DSL. As soon as you try to make their lives easier, all hell will break loose. Do you allow trailing commas? How do you handle ambiguities? Did you make sure all error messages make sense? Is it still readable? Can you add features without breaking all existing code?
  • YALTL – Yet another language to learn

Default Methods in Java 8

In Java 8, you can add method bodies to methods defined in interfaces:

public interface Foo {
    String getName() default { return "Foo"; }
}

Finally, you can have mixins in Java. Yay ^_^

Now, some people will wonder: Isn’t that multiple inheritance?

Yup. And as usual, because of some “features” of Java, they had to implement this in a … surprising way. What does this code print?

public interface A {
    String getName() default { return "A"; }
}

public interface B {
    String getName() default { return "B"; }
}

public class C implements A, B {
    public static void main(String[] args) {
        System.out.println(new C().getName());
    }
}

Nothing – it doesn’t compile because the compiler can’t decide which method to call. But this one compiles:

public interface A {
    String getName() default { return "A"; }
}

public interface B {
    String getName() default { return "B"; }
}

public interface C extends B {}

public class D implements A, C {
    public static void main(String[] args) {
        System.out.println(new D().getName());
    }
}

If you’re wondering: Instead of implementing “B” directly, “D” now goes through a new interface “C”. That makes “A” “closer”, so it will print “A”.

That means changes in A or C can modify the behavior of D. If you’re lucky, the compiler will refuse to compile it. *sigh*
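For reference, in the default-method syntax that eventually shipped with Java 8, the keyword default moves in front of the return type and the compiler forces the implementing class to resolve such a conflict explicitly. A minimal sketch (not from the talk):

public interface A {
    default String getName() { return "A"; }
}

public interface B {
    default String getName() { return "B"; }
}

public class C implements A, B {
    // Without this override, javac rejects C because it inherits
    // unrelated defaults for getName() from both A and B.
    @Override
    public String getName() {
        return A.super.getName(); // pick A's default explicitly
    }

    public static void main(String[] args) {
        System.out.println(new C().getName()); // prints "A"
    }
}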

No Free Lunch

Again, it’s easy to see that each feature comes with a cost attached.


Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?

Parallelism.

In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid RPC/RMI – it lures you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASE (“Drop ACID And Think About Data”)
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak, usually in places where you can least afford it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Trying to get by with locks and clever concurrency tricks will only get you more bugs, unmaintainable code and a heart attack.
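A minimal sketch of what “prefer immutable” means in plain Java (the class and its fields are made up for illustration):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// An immutable value object: all fields final, no setters, defensive copies.
// Instances can be handed to any number of threads without any locking.
public final class Order {
    private final String id;
    private final List<String> items;

    public Order(String id, List<String> items) {
        this.id = id;
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    public String getId() { return id; }
    public List<String> getItems() { return items; }

    // "Changes" produce a new instance instead of mutating shared state.
    public Order withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new Order(id, copy);
    }
}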

Dependencies kill a project faster and more efficiently than almost any other technique. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”
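A hedged sketch of that separation: the business logic is a pure calculation that returns a result, while the caller owns persistence and failure handling (all class and method names below are invented for illustration):

import java.math.BigDecimal;

// Business logic: a pure calculation. It neither reads from nor writes to a
// database and does not know where its inputs come from.
class DiscountCalculator {
    BigDecimal discountFor(boolean loyalCustomer, BigDecimal orderTotal) {
        return loyalCustomer ? orderTotal.multiply(new BigDecimal("0.10")) : BigDecimal.ZERO;
    }
}

// The caller decides what to do with the result and owns the error handling.
class Checkout {
    private final DiscountCalculator calculator = new DiscountCalculator();

    void checkout(boolean loyalCustomer, BigDecimal total) {
        BigDecimal discount = calculator.discountFor(loyalCustomer, total);
        try {
            persist(total.subtract(discount));      // persistence stays out here
        } catch (RuntimeException e) {
            // infrastructure failures are handled by the caller,
            // not inside the business logic
            System.err.println("Could not store order: " + e.getMessage());
        }
    }

    private void persist(BigDecimal amount) {
        // stand-in for a real repository call
    }
}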

Obvious programming has a value: You can see what happens. It has a price: boilerplate code. You can try to hide this but it will still leak. Hibernate is a perfect example: yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It generates proxies to save you from seeing the query, but those proxies can break equals().
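A hedged illustration of the equals() problem (a sketch, not tied to a specific Hibernate version): Hibernate may hand you a runtime-generated subclass of your entity as a lazy proxy, so an equals() that compares classes silently stops working:

public class Child {
    private Long id;

    public Long getId() { return id; }

    // Fragile: a lazy-loading proxy is a generated subclass of Child, so
    // getClass() != Child.class and this returns false for the "same" entity.
    @Override
    public boolean equals(Object other) {
        if (other == null || getClass() != other.getClass()) return false;
        return id != null && id.equals(((Child) other).id);
    }

    // More robust: use instanceof (a proxy subclass still satisfies it)
    // and go through the getter.
    public boolean robustEquals(Object other) {
        if (!(other instanceof Child)) return false;
        return getId() != null && getId().equals(((Child) other).getId());
    }

    @Override
    public int hashCode() {
        return 31; // constant hash is a common workaround for entities with generated ids
    }
}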

The same applies to RMI. When RMI decides that you can’t handle a message, you won’t even see it – even though in many cases a slightly “unusual” message (say, one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network into your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources caused by adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message

Claim checks allow you to resend a message after a timeout without having it processed twice by the consumer.
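A minimal sketch of the consumer side of that idea: if every message carries a stable ID (the claim check), the consumer can remember which IDs it has already processed and safely ignore a resend (names and structure are invented for illustration):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentConsumer {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    void onMessage(String messageId, String payload) {
        // add() returns false if the ID was already present, i.e. the
        // sender re-sent the message after a timeout.
        if (!processedIds.add(messageId)) {
            return; // duplicate – already handled
        }
        handle(payload);
    }

    private void handle(String payload) {
        System.out.println("processing " + payload);
    }
}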

CALM and BASE refer to the fact that you can only have two of the three CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well, but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.
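A toy sketch of the event-sourcing idea behind that (plain Java, not Akka’s API; all names are made up): the shopping cart is not a row that gets updated in place, it is derived state that can be rebuilt at any time by replaying its event stream:

import java.util.ArrayList;
import java.util.List;

class ShoppingCart {
    // The events are the source of truth; the cart itself is derived state.
    static class CartEvent {
        final String type;  // "ADDED" or "REMOVED"
        final String sku;
        CartEvent(String type, String sku) { this.type = type; this.sku = sku; }
    }

    private final List<String> items = new ArrayList<>();

    void apply(CartEvent event) {
        if ("ADDED".equals(event.type))   items.add(event.sku);
        if ("REMOVED".equals(event.type)) items.remove(event.sku);
    }

    // Rebuilding after a shard died: replay the stored event stream.
    static ShoppingCart replay(List<CartEvent> eventStream) {
        ShoppingCart cart = new ShoppingCart();
        for (CartEvent event : eventStream) {
            cart.apply(event);
        }
        return cart;
    }

    List<String> items() { return items; }
}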


Jazoon 2012: IBM Watson since Jeopardy!

29. June, 2012

From the summary:

In February 2011, IBM demonstrated its latest Research breakthroughs in natural language processing and deep question answering. Named Watson, it made history when it was entered into the famously complex US television quiz-show ‘Jeopardy!’ where it comfortably beat two of the greatest human players ever to appear on the show. Since then, work has focused on bringing these breakthroughs to real-world problems.

If you haven’t seen the video, now is a good time: Episode 1, Episode 2, Episode 3

Before the show, Watson was trained with data from a variety of sources, including Wikipedia and dbpedia. The software is able to process both unstructured and structured data and learn from it. That means it converts the data into an internal representation that its various answer finding modules can then use. These modules include classic AI inferencing algorithms as well as Lucene based full-text search modules.

This is basically what makes Watson different: Instead of relying on a single, one-size-fits-all strategy, Watson uses many different strategies and each of them returns a “result” where a result consists of an answer and a “confidence” that this answer might be useful or correct.

Instead of mapping all the confidence values to a predefined range, each module can return any number as its confidence. Some modules return values between 0 and 1, others from -1 to 1, and yet others anything from -∞ to +∞ (both included). The trick is that Watson uses an extensive training session to learn how to weigh the outputs of the different modules. To do this, the correct answers for a large set of questions are necessary.

Which makes Jeopardy! such a perfect fit: The show has accumulated the correct answers for thousands of questions, which made it comparatively “easy” to train Watson automatically – IBM engineers could debug the answering process whenever Watson erred.
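To make that concrete, here is a toy sketch (definitely not IBM’s implementation) of the combination step: every module reports candidate answers with a confidence on its own scale, and per-module weights learned during training decide how much each module counts:

import java.util.HashMap;
import java.util.Map;

class AnswerMerger {
    // Weights learned during training from questions with known correct answers.
    // Module names and values are made up.
    private final Map<String, Double> moduleWeight = new HashMap<>();

    AnswerMerger() {
        moduleWeight.put("fulltext-search", 0.7);
        moduleWeight.put("inference-engine", 1.3);
    }

    /** resultsByModule: module name -> (candidate answer -> raw confidence). */
    Map<String, Double> merge(Map<String, Map<String, Double>> resultsByModule) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> module : resultsByModule.entrySet()) {
            double weight = moduleWeight.getOrDefault(module.getKey(), 1.0);
            for (Map.Entry<String, Double> candidate : module.getValue().entrySet()) {
                combined.merge(candidate.getKey(), weight * candidate.getValue(), Double::sum);
            }
        }
        return combined; // the answer with the highest combined score wins
    }
}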

But Watson isn’t about winning TV shows. The current goal is to turn Watson into a tool that can be used by doctors around the world to identify illnesses. Today, doctors work so many hours per week that they can only read a tiny fraction of all the articles that are published. Surveys show that 81% of doctors read less than 5h/month. One solution would be to hire more doctors. Guess what that would mean for costs in the health sector.

Or we could make Watson read all that and present all that blabla in a compressed form when the symptoms match. Think Google where you don’t know what you’re looking for.

Sounds good? Or frightening? Some people in the audience were thinking “Skynet” but here are some facts that you should know:

  • In health care, Watson is a “medical device”. These are heavily regulated.
  • The goal is not to have “Dr. Watson.” The goal is to give doctors a smart library, not a smart ass or something that can directly make unsupervised decisions about the next therapy step.
  • IBM isn’t developing the product alone. They are working with companies from the health care sector who know how doctors (should) work. You might want to see this video: Watson Computer Comes to the University of Maryland and have a look at this channel: IBMWatsonSolutions
  • Privacy is an important concern. Watson will see millions of medical records. There are pretty strict laws governing this (HIPAA).
  • Watson isn’t a data warehouse. It won’t process all the medical records into one huge data set which it can query. Instead, doctors will enter symptoms in a standardized way and Watson will present a list of things to check plus medical conditions that match.
  • For training, Watson needs a huge list of correct answers. It doesn’t try to find patterns by itself.

So unlike Skynet, Watson is much more like a boring tool. Sorry.

One very interesting aspect is that Watson is something that you won’t buy as a product. Instead, it’s probably going to be a cloud service which charges, say, per question.

Other fields where it would be useful:

  • Justice. Who has time to read all the laws and regulations that the government ships all the time?
  • Legislation
  • Engineering
  • Research in chemistry, physics and genetics



Jazoon 2012: Architecting non-trivial browser applications

28. June, 2012

Marc Bächinger gave a presentation on how to develop HTML5 browser applications.

The big advantage of HTML5+JavaScript is that it gives users a better experience and usability. One of the first steps should be to decide which framework(s) you want to use. You can use one of the big, monolithic, one-size-fits-all frameworks that do everything or select best-of-breed frameworks for specific aspects (browser facade, MVC framework, helper libraries and components).

You should use REST on the server side because that makes the server and the components of your application easier to reuse.

The main drawback is that you have (often much) more complexity on the client. This can be controlled by strict application of the MVC pattern.

Browser facades

Every browser has its quirks and most of the time, you just don’t want to know. Browser facades try hard to make all browsers look the same to your code. Examples are jQuery and zepto.js.

MVC frameworks

Backbone.js, Spine.js, Knockout.js, ember.js, JavaScriptMVC, Top 10 JavaScript MVC frameworks

Helper libraries and frameworks

gMap, OSM, Raphaël, jQuery UI, Twitter bootstrap.js, mustache, jade

Important

Since the whole application now runs in the client, security is even more important since attackers can do anything that you don’t expect.


Jazoon 2012: How to keep your Architecture in good Shape?!

28. June, 2012

Ingmar Kellner presented some tips on how to prevent your architecture from rotting into a mess. When that happens, you will have these problems:

  • Rigidity – The system is hard to change because every change forces many other changes.
  • Fragility – Changes cause the system to break in conceptually unrelated places.
  • Immobility – It’s hard to disentangle the system into reusable components.
  • Viscosity – Doing things right is harder than doing things wrong.
  • Opacity – It is hard to read and understand. It does not express its intent well.

(Robert C. Martin)

According to Tom DeMarco, your ability to manage this depends on control. And control depends on measurements – if you can’t measure something, you can’t control it.

How rotten is your software? Look for cycle groups (some package X depends on Y depends on Z depends on A depends on X):

  • They tend to stay
  • They tend to grow
  • They are a strong smell

Ingmar showed some examples from the JDK 6 (lots of cycles) and ActiveMQ (lots of cycles in 4.x, much better in 5.0 but growing again since then).
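A small sketch of how such cycle groups can be detected when all you have is a map of package dependencies – a depth-first search that checks whether a package can reach itself again (real tools do much more, e.g. finding whole cycle groups and tracking them over time):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CycleFinder {
    /** Returns true if `start` can reach itself via the dependency graph. */
    static boolean inCycle(String start, Map<String, Set<String>> dependsOn) {
        Deque<String> toVisit = new ArrayDeque<>(dependsOn.getOrDefault(start, Set.of()));
        Set<String> visited = new HashSet<>();
        while (!toVisit.isEmpty()) {
            String pkg = toVisit.pop();
            if (pkg.equals(start)) return true;     // X -> ... -> X
            if (visited.add(pkg)) {
                toVisit.addAll(dependsOn.getOrDefault(pkg, Set.of()));
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
                "x", Set.of("y"),
                "y", Set.of("z"),
                "z", Set.of("x"));                   // x -> y -> z -> x
        System.out.println(inCycle("x", deps));      // prints "true"
    }
}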

What can you do?

Use a consistent “architecture blueprint” that makes it obvious which layer/slice can use what. In the blueprint, layers are horizontal (presentation, domain, persistence) and slices are vertical (everything related to contracts, customers, users, and finally common code).

You will need someone with the role “Architect” who “defines the architecture, thresholds for coding metrics, identifies ‘hot spots'” and developers who “implement use cases, respecting the architecture and coding metrics thresholds.” All this is verified by a CI server.

At the same time, avoid “rulitis” – the false belief that more and stricter rules make things “better.”

Some rules you might want to use:

  • The blueprint is free of cycles
  • Package naming convention that matches the blueprint
  • Control coupling and cycles with tools
  • Use tools to control code duplication, file size, cyclomatic complexity, number of classes per package, etc.
  • Reserve 20% of your time for refactoring

Following these rules can help to reduce costs during the maintenance phase:

  • 50% less time
  • 50% of the budget
  • 85% fewer defects

according to a study conducted by Barry M. Horowitz for the Department of Defense.


Jazoon 2012: Why you should care about software assessment

28. June, 2012

Tudor Girba gave a presentation at the Jazoon about a topic that is very dear to him: Software assessment. To quote:

What is assessment? The process of understanding a given situation to support decision-making. During software development, engineers spend as much as 50% of the overall effort on doing precisely that: they try to understand the current status of the system to know what to do next.

In other words: Assessment is a process and a set of tools that help developers make decisions. The typical example: a bug shows up and you need to fix it. That raises the usual questions:

  1. What happened?
  2. Why did it happen?
  3. Where did it happen?
  4. How can I fix it?

As we all know, each of these steps can be difficult. As an extreme example, someone mentioned selling software to the NSA. It crashed. The NSA calls the developer:

NSA: “There is a problem with your software.”

You: “Who am I talking with?”

NSA: “Sorry, I can’t tell you that.”

You: “Well … okay. So what problem?”

NSA: “I can’t tell you that either.”

You: “… Can you give me a stack trace?”

NSA: “I’m afraid not.”

Unlikely but we all know similar situations. Even seasoned software developers are guilty of giving completely useless failure reports: “It didn’t work.” … “What are you talking about? What’s ‘it’?”

Tudor gave some nice examples of how he used simple assessment tools to query the log files and sources of an application in order to locate bugs, find similar bugs and figure out why some part doesn’t behave well. Examples:

  1. An application usually returns text in the user’s language, but some rare error message is always in German. Cause: When the error message was created, the code called Locale.getDefault() (see the sketch after this list).
  2. By searching the source code for places where Locale.getDefault() was called either directly or indirectly, several other places with the same behavior were found. A test case was added to prevent this from happening again.
  3. A cache had a hit ratio of less than 50%. Analyzing the logs showed that two components used the same cache. When each got its own cache, the hit ratios reached sane levels.
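A hedged sketch of the fix behind the first two examples: instead of silently falling back to Locale.getDefault(), the message lookup takes the current user’s locale as a parameter (class and bundle names are invented):

import java.util.Locale;
import java.util.ResourceBundle;

class Messages {
    // Problematic: uses the server's default locale (German, in the example above)
    // no matter which language the current user speaks.
    static String errorMessageWithDefaultLocale(String key) {
        return ResourceBundle.getBundle("messages", Locale.getDefault()).getString(key);
    }

    // Fixed: the caller passes the user's locale explicitly.
    static String errorMessage(String key, Locale userLocale) {
        return ResourceBundle.getBundle("messages", userLocale).getString(key);
    }
}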

So assessments allow you to do strategic planning by showing you all the dependencies that some part of the code has (or the whole application).

In a spike assessment, you can analyze some small part to verify that a change would or could have the desired effect (think performance).

Did you know that developers spend about 50% of the time reading code? If tools can help them understand some piece of code faster, that makes them more productive. Unfortunately, today’s tools are pretty limited when it comes to this. Eclipse can show me who calls Locale.getDefault() but it can’t show me indirect calls.

Worse: If the developer makes the wrong decision because she couldn’t see all the important facts, that often has a huge impact.

Another important aspect is how you use metrics. Metrics are generally useful but the same is not true for every metric. Just like you wouldn’t copy unit tests from one project to the next, you need to reevaluate the metrics that you extract from each project. Some will just be a waste of time for certain projects.

My comments:

We really, really need better tooling to chop data. IDEs should allow me to run queries against my source code, collect and aggregate data and check the results in unit tests to validate design constraints.

It was also interesting to see how Tudor works. He often uses simple words, which can be misleading. But look at the slides: there was a graph of some data points. Most graphs show a linear Y axis with evenly spread ticks. He uses a different approach:

Usual diagram to the left, Tudor’s version to the right



Jazoon 2012: Large scale testing in an Agile world

28. June, 2012

Alan Ogilvie works in a division of IBM responsible for testing IBM’s Java SE product. Some numbers from his presentation:

  • A build for testing is about 500MB (takes 17 min to download to a test machine)
  • There are 20 different versions (operating systems like AIX, Linux, Windows and z/OS multiplied by hardware like x86, POWER and zSeries)
  • The different teams create 80..200 builds every day
  • The tests run on heaps from 32MB to 500GB
  • They use hardware with 1 to 128+ cores
  • 4 GC policies
  • More than 1000 different combinations of command line options
  • Some tests have to be repeated many times to catch “1 out of 100” failures that happen only very rarely

That amounts to millions of test cases that run every month.

1% of them fail.

To tame this beast, the team uses two approaches:

  1. Automated failure analysis that can match error messages from the test case to known bugs
  2. Not all of the tests are run every time

The first approach makes sure that most test failures can be handled automatically. If some test is there to trigger a known bug, that shouldn’t take any time from a human – unless the test suddenly succeeds.

The second approach is more interesting: They run only a small fraction of the tests every time the test suite is started. How can that possibly work?

If you ran a test yesterday and it succeeded, you have some confidence that it still works today. You’re not 100% sure but, well, maybe 99.5%. So you might skip this test today and mark it as “light green” in the test results (as opposed to “full green” for a test that actually ran this time).

What about the next day? You’re still 98% sure. And the day after that? Confidence is waning fast, but you’re still pretty sure – 90%.

The same goes for tests that fail. Unless someone did something about them (and requested that this specific test be run again), you can be pretty sure that the test would fail again. So it gets marked light red, unlike the tests that actually failed in this run.

This way, most tests only have to be run once every 4-5 days during development.
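A toy sketch of such a selection heuristic (the decay factor and threshold are made up, this is not IBM’s system): confidence in a passing test decays a little for every day it is skipped, and the test is only re-run once it drops below a threshold:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TestPicker {
    private static final double DAILY_DECAY = 0.98;  // confidence kept per skipped day
    private static final double RERUN_BELOW = 0.90;  // re-run once confidence drops below this

    /** daysSinceLastRun: for each test, how many days ago it last passed. */
    static List<String> testsToRunToday(Map<String, Integer> daysSinceLastRun) {
        List<String> toRun = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : daysSinceLastRun.entrySet()) {
            double confidence = Math.pow(DAILY_DECAY, entry.getValue());
            if (confidence < RERUN_BELOW) {
                toRun.add(entry.getKey());  // confidence has decayed too far -> run it again
            }
        }
        return toRun;
    }

    public static void main(String[] args) {
        // "fastTest" ran yesterday (still light green), "staleTest" six days ago (re-run).
        System.out.println(testsToRunToday(Map.of("fastTest", 1, "staleTest", 6)));
    }
}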

Why would they care?

For a release, all tests need to be run. That takes three weeks.

They really can’t possibly run all tests all the time.


Jazoon 2012: Development Next – And Now For Something Completely Different?

28. June, 2012

Dave Thomas gave the keynote speech about how technology seems to change all around us, only to surface the same old problems over and over again.

Things that I took home:

  • In the future, queries will be more important than languages
  • Big data is big

Some comments from me:

How often were you irritated by how source code from someone else looked? I don’t mean sloppy, I mean indentation or how they place spaces and braces. In 2012, it should be possible to separate the model (source code) from the view (text editor) – why can’t my IDE simply show me the source in the way that I like and keep the source code in a nice, common format? (see Bug 45423 – [formatting] Separate presentation from formatting)

And how often have you wondered “Which parts of the code call Locale.getDefault() either directly or indirectly?”

How often did you need to make a large-scale change in the source code which could have been done with a refactoring in a few minutes – but writing the refactoring would have taken days because there simply are no tools to quickly write even simple refactorings in your IDE?

Imagine this: You can load the complete AST of your source code into a NoSQL database. And all the XML files. And the configuration files. And the UML. And everything else. And then create links between those parts. And query those links … apply a piece of JavaScript to each matching node …

Customers of my applications always want new business reports every day. It takes way too long to build these reports. It takes too much effort to change them. And it’s impossible to know whether a report is correct because no one can write test cases for them.


Jazoon 2011, Day 3

26. June, 2011

The day started with a keynote from an M$ guy:


Jazoon 2011, Day 3 – How to become a famous author and publish a book: Using Freemium Content with a Profit – Pouline Middleton

26. June, 2011


Pouline showed us the obstacles you run into when you try to publish your own book. It confirmed my own conclusions: Unless your name is Stephen King or J. K. Rowling, publishers aren’t really for you. They most often want to own your work, but selling it is more of an afterthought.

Instead of using the common channels to sell her book, she chose a freemium model. You can read the book as blog posts or by email, or buy it from her directly (she eventually founded her own publishing company, Fiction Works).

The fun (or sad) part is that she made much more money this way than she could have hoped for if she had used the traditional channels.

Here is an example: Say you have 1’000 die-hard fans (not so hard to come by when the Internet has almost 1 billion users). Each of them buys $100 worth of your work. Again, not that much. That gives you $100’000.

