TNBT: Persistence

19. February, 2011

In this issue of “The Next Big Thing”, I’ll talk about something that every piece of software needs and which is nonetheless developed again from scratch for every application: Persistence.

Every application needs to load data from “somewhere” (user preferences, config settings, data to process) and after processing the data, it needs to save the results. Persistence is the most important feature of any software. Without it, the code would be useless.

Oddly, the most important area of the software isn’t a shiny skyscraper but a swamp: Muddy, boggy, suffocating.

Therefore, the next big thing in software development must make loading and saving data blissfully simple. Some features it needs to have:

  • Transaction contexts to define which data needs to be rolled back in case of an error. Changes to the data model must be atomic by default. Even if I add 5,000 elements at once, either all of them are added or, if an error happens, none are.
  • Persistence must be transparent. The language should support rules for how to transform data for a specific storage (file, database), but these rules should be generic. I don’t want to poison my data model with thousands of annotations.
  • All types must support persistence by default; not being persistable must be the exception.
  • Creating a binary file format must be as simple as defining the XML format.
  • It must have optimizers (running in the background, like garbage collection does today) that determine how much of the model graph needs to be loaded from storage.
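The transaction-context idea from the first point can be sketched in a few lines. This is a minimal illustration, not an existing API: `Model` and `transaction` are hypothetical names, and the rollback works by snapshotting an in-memory list.

```python
import copy
from contextlib import contextmanager

class Model:
    """A toy data model: just a list of items."""
    def __init__(self):
        self.items = []

@contextmanager
def transaction(model):
    """Roll the model back to its previous state if the block raises."""
    snapshot = copy.deepcopy(model.items)
    try:
        yield model
    except Exception:
        model.items = snapshot  # atomic: all changes are undone together
        raise

model = Model()
try:
    with transaction(model):
        for i in range(5000):
            if i == 2500:
                raise RuntimeError("storage failure")
            model.items.append(i)
except RuntimeError:
    pass

assert model.items == []  # all 2,500 partial additions were rolled back
```

A real implementation would snapshot far more cheaply (e.g. with an undo log), but the contract is the point: partial changes never survive an error.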
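For comparison, here is what defining even a trivial binary record format costs today with Python’s `struct` module — the kind of manual layout bookkeeping the binary-format wish above wants to make as easy as declaring an XML schema. The record fields are made up for illustration:

```python
import struct

# A made-up record layout: id (unsigned 32-bit int), score (float64),
# flag (unsigned byte), little-endian, standard sizes -> 13 bytes total.
RECORD = struct.Struct("<IdB")

def save(rec):
    """Serialize a record dict into its binary form."""
    return RECORD.pack(rec["id"], rec["score"], rec["flag"])

def load(blob):
    """Deserialize the binary form back into a record dict."""
    id_, score, flag = RECORD.unpack(blob)
    return {"id": id_, "score": score, "flag": flag}

rec = {"id": 42, "score": 3.5, "flag": 1}
assert load(save(rec)) == rec
assert RECORD.size == 13
```

Every field, its width, and its byte order must be spelled out by hand and kept in sync with the load/save code — exactly the boilerplate a persistence-aware language should generate from the type itself.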

Related Articles:

  • The Next Big Thing – Series in my blog where I dream about the future of software development

The future of data

23. April, 2010

RFID data table from a BBC exhibition (CC: by-nc-nd)

People love to share. They share emotions, affection, information, files and personal data. But they don’t want to share all of that with everyone. Imagine sharing your bank statements with the IRS. Or letting a burglar know that you just bought a very expensive TV set. Or that you’re not at home for the next four weeks. Or sharing photos and films of your children with a pedophile.

While people don’t talk about their private life in a public forum, they do post it on social networks. They don’t want anyone to look at the data on their hard disks, but they back up the very same data with online backup services. The line between private data and the public web is blurring.

Unfortunately, data can’t protect itself, so as soon as you put something online, anyone can see it, copy it, give it to someone else, or keep a copy even after you have deleted it yourself. The Internet doesn’t forget.

So the obvious solution is that data must become active. It must check who is allowed to access it and only reveal its details to those who have permission. How would that work?

Let’s have a look at ssh. At work, I access a server and work with an account, but I have no idea what that account’s password is. How do I log in? With my own credentials: I give a public key to the system administrator and he adds me to the list of people who can log in. If he doesn’t want me anymore, he deletes the key from the list and I lose access. He doesn’t know my password, and I don’t know his.
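That ssh model boils down to a per-account list of authorized public keys. A toy sketch (names are illustrative, not a real ssh API):

```python
class Account:
    """Toy model of ssh-style access: a set of authorized public keys."""
    def __init__(self):
        self.authorized_keys = set()

    def grant(self, public_key):
        """The admin adds a key -- its owner can now log in."""
        self.authorized_keys.add(public_key)

    def revoke(self, public_key):
        """The admin removes a key -- its owner loses access."""
        self.authorized_keys.discard(public_key)

    def may_login(self, public_key):
        return public_key in self.authorized_keys

server = Account()
server.grant("ssh-rsa AAAA alice@laptop")
assert server.may_login("ssh-rsa AAAA alice@laptop")
server.revoke("ssh-rsa AAAA alice@laptop")
assert not server.may_login("ssh-rsa AAAA alice@laptop")
```

Note that no password ever changes hands: granting and revoking access is purely an edit to the key list.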

To achieve the same with data, the data must be encrypted. To decrypt it, users must ask a server for the decryption key and identify themselves with their public key.
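An end-to-end toy sketch of that protocol, under loudly stated assumptions: the “cipher” below is a deliberately simplistic SHA-256 keystream stand-in (a real system would use something like AES), and `KeyServer` and all names are hypothetical:

```python
import hashlib
import secrets

def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 -- a stand-in for a real cipher, NOT secure."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; encrypting and decrypting are the same operation."""
    return bytes(b ^ k for b, k in zip(data, _keystream(key, len(data))))

class KeyServer:
    """Hands out a data item's decryption key only to authorized public keys."""
    def __init__(self):
        self._entries = {}  # data_id -> (data_key, set of authorized pub keys)

    def register(self, data_id, data_key, owner_pub):
        self._entries[data_id] = (data_key, {owner_pub})

    def grant(self, data_id, pub):
        self._entries[data_id][1].add(pub)

    def revoke(self, data_id, pub):
        self._entries[data_id][1].discard(pub)

    def request_key(self, data_id, pub):
        data_key, acl = self._entries[data_id]
        if pub not in acl:
            raise PermissionError("not authorized")
        return data_key

# Alice encrypts a document and registers its key with the server.
server = KeyServer()
data_key = secrets.token_bytes(32)
blob = crypt(data_key, b"my bank statement")
server.register("doc-1", data_key, owner_pub="alice-pub")

# Bob can read only while Alice grants him access.
server.grant("doc-1", "bob-pub")
assert crypt(server.request_key("doc-1", "bob-pub"), blob) == b"my bank statement"
server.revoke("doc-1", "bob-pub")
```

The encrypted blob itself can live anywhere — on a disk, a backup service, the open web — because only the key server decides who can turn it back into readable data.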

Of course, there are a couple of issues with this approach:

  1. First of all, it will bloat the data and make processing (much) slower. That might be an issue today, but progress will soon solve it.
  2. Users could decrypt the data once and then keep a decrypted copy. While this is true, is it really an issue? These people once had access, so it only becomes a problem if we want to revoke that access. Also, if they don’t back up the data regularly, a hardware failure will solve the problem sooner or later.

    Lastly, we could attach a license to the data that forbids sharing the decrypted copy with anyone. Anyone who did could be sued for license infringement. And let us not forget that most people won’t understand how any of this works, so they won’t be able to do it. Plus, as long as the system works and is comfortable enough to use, they won’t see a reason to.

    For those who do understand the technology or want to abuse it, no amount of protection will be enough to stop them. This is why we have laws and courts.

  3. People could lose their data or their password. Happens all the time. But wouldn’t this approach solve both issues? If all data was encrypted and there were servers to distribute credentials, people would have to remember just a single password for all services. That password could be strong, and it could be changed with ease. Web sites could add users based on their public keys (just like ssh or OpenID). And there would be no need to worry about losing data: you could back it up with an online backup service, because the encryption happens before the data is backed up.

Comments?