Jazoon 2012: Building Scalable, Highly Concurrent and Fault-Tolerant Systems: Lessons Learned

29. June, 2012

What do Cloud Computing, multi-core processors and Big Data have in common?

Parallelism.

In his presentation, Jonas Bonér showed what you should care about:

  • Always prefer immutable
  • Separate concerns in different layers with the minimum amount of dependencies
  • Separate error handling from the business logic
  • There is no free lunch: For every feature, you will have to pay a price
  • Avoid using RPC/RMI. Try lure you into “convenience over correctness”
  • Make sure you handle timeouts correctly
  • Use CALM if you can
  • Not all your data needs ACID.
  • Know about CAP and BASEDrop ACID And Think About Data
  • Get rid of dependencies by using event sourcing/CQS/CQRS
  • Frameworks like Hibernate always leak in places where you can’t have it. KISS.

Longer explanation:

Immutables can always be shared between threads. Usually, they are also simple to share between processes, even when they run on different computers. Trying locks and clever concurrency will only get you more bugs, unmaintainable code and a heart attack.

Dependencies kill a project faster and more efficiently than almost any other technique. Avoid them. Split your projects into Maven modules. You can’t import what you don’t have on the classpath.

Error handling in your business logic (BL) will bloat the code and make it harder to maintain. Business logic can’t handle database failures. Parameters should have been validated before they were passed to business logic. Business logic should produce a result and the caller should then decide what to do with it (instead of mixing persistence code into your business layer). The BL shouldn’t be aware that the data comes from a database or that the result goes back into a database. What would your unit tests say? See also Akka 2.0 and “parental supervision.”

Obvious programming has a value: You can see what happens. It has a price: Boiler plate code. You can try to hide this but it will still leak. Hibernate is a prefect example for this. Yes, it hides the fact that getChildren() needs to run a query against the database – unless the entity leaks outside of your transaction. It does generate proxies to save you from seeing the query but that can break equals().

Same applies to RMI. When RMI decides that you can’t handle the message, then you won’t even see it. In many cases, a slightly “unusual” message (like one with additional fields) wouldn’t hurt.

As soon as you add RMI or clustering, you add an invisible network in your method calls. Make sure you have the correct timeouts (so your callers don’t block forever) and that you handle them correctly. New error sources that are caused adding the network:

  1. Failure to serialize the message
  2. Host unreachable
  3. Packet drops
  4. Network lag
  5. Destination doesn’t accept message because of configuration error
  6. Message is sent to the wrong destination
  7. Destination can’t read message
Claim checks allow to resend a message again after a timeout without having it processed twice by the consumer.

CALM and BASE refer to the fact that you can only have two of the tree CAP characteristics: Consistency, Availability and Partition Tolerance. Since Partition Tolerance (necessary for scaling) and Availability (what’s the point of having a consistent but dead database?) are most important, you have to sacrifice consistency. CALM and BASE show ways to eventually reach consistency, even without manual intervention. For all data related to money, you will want consistency as well but think about it: How many accounts are there in your database? And how many comments? Is ACID really necessary for each comment?

Solution: Put your important data (when money is involved) into an old school relational database. Single instance. Feed that database with queues, so it doesn’t hurt (much) when it goes down once in a while. Put comments, recommendations, shopping carts into a NoSQL database. So what if a shopping cart isn’t synchronized over all your partitions? Just make sure that users stay on one shard and they will only notice when the shard dies and you can’t restore the shopping cart quickly enough from the event stream.

Which event stream? The one which your CQRS design created. More on that in another post. You might also want to look at Akka 2.0 which comes with a new EventBus.


Jazoon 2011, Day 1 – Opening Keynote

26. June, 2011

Opening Keynote

The opening keynote was “Platforms in the Cloud: Where Will Your Next Application Run?” by David Chappell. He put a lot of the bits and piece of cloud computing into perspective: Private and public clouds, when a cloud makes sense and why people use clouds. Some use it because it’s a way to avoid their own IT which says a lot. He also put a couple of frameworks and products next to each other to make it more easy to see through all the fog.

Personally, I agree with him: Cloud computing is the next step. It solves one of the basic problems in computers today: You have too much computer power when you don’t need it and too little when you do.

Actually, I hope that CC won’t only make life easier for the business but also for developers. More on that in my next installment of TNBT – The Next Best Thing.

Some highlights from the talk:

Start-ups need to fail fast or scale fast. So clouds are perfect for them: Cheap, salable.

In the long run PaaS (Platform as a Service) will win over IaaS (Infrastructure as a Service). There are already many companies which offer PaaS by tailoring an IaaS VM to do what you need.

When it comes to NoSQL, that means “not only SQL”. Most applications need a mix of SQL and non-SQL data sources. For example an MP3 cloud player will keep the song titles and other meta data in an SQL table (so you can easily sort and search) but the songs will be in a non-SQL storage.

Another use case for cloud computing is off-site backup. That puts your data at risk to being copied but which is more hazardous for your company: That a competitor might be able to break the encryption or that the data is lost forever? If you lose your business data, you’ll probably bankrupt faster.

I talked to him after the presentation and he made an odd comment about open source (“Oh, you’re one of those open source guys. Don’t you have children to feed?”) My guess is he makes the same mistake as many people: Free software is free as in freedom, not as in beer. You can change it but there is no reason to give it away for free. Some people do but that only means they have some other means to generate revenue.


AeroFS – A New Distributed File System

11. May, 2011

AeroFS is a new distributed file system (from their website):

Unlimited Storage

Using AeroFS, you can sync allthe data on your devices. No limits. No caps. You already have your storage, now use it!

Ultimate Privacy

AeroFS will never store your files in the cloud (unless you want to, of course ;-). Your files will only be shared with those who you invite.

Better Security

AeroFS encrypts your data end-to-end. This way, we are able to provide better security than most online storage services. Seriously.

  • Because AeroFS is completely distributed, even if we experience downtime,you won’t!
Sounds like an interesting solution. Especially since your data never leaves your country (unless you add foreign servers) and there are only very little cost for the company behind the service (you run all the involved servers).
With Dropbox and similar services, you can never be sure where your data ends up. They say it’s safe but that only holds true until a) the company goes bankrupt or b) some government agency knocks on their doors to hunt terrorists.