Jazoon 2012: IBM Watson since Jeopardy!

29. June, 2012

From the summary:

In February 2011, IBM demonstrated its latest Research breakthroughs in natural language processing and deep question answering. Named Watson, it made history when it was entered into the famously complex US television quiz-show ‘Jeopardy!’ where it comfortably beat two of the greatest human players ever to appear on the show. Since then, work has focused on bringing these breakthroughs to real-world problems.

If you haven’t seen the video, now is a good time: Episode 1, Episode 2, Episode 3

Before the show, Watson was trained with data from a variety of sources, including Wikipedia and DBpedia. The software can process both unstructured and structured data and learn from it: it converts the data into an internal representation that its various answer-finding modules can then use. These modules include classic AI inference algorithms as well as Lucene-based full-text search modules.

This is basically what makes Watson different: instead of relying on a single, one-size-fits-all strategy, Watson uses many different strategies, each of which returns a “result” consisting of an answer and a “confidence” that this answer might be useful or correct.

Instead of mapping all confidence values to a predefined range, each module can return any number as its confidence. Some modules return values between 0 and 1, others between −1 and 1, and yet others anywhere between −∞ and +∞. The trick is that Watson uses an extensive training phase to learn how to weigh the outputs of the different modules. To do this, the correct answers for a large set of questions are necessary.

Which is what makes Jeopardy! such a perfect fit: the show has accumulated the correct answers to thousands of questions asked over the years. That made it comparatively “easy” to train Watson automatically, and it let IBM engineers debug the answering process whenever Watson erred.
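A toy sketch of that weighting step (my guess at the general shape, not IBM's actual model; all names and numbers below are made up): treat each module's raw confidence as a feature and learn one weight per module from questions whose correct answers are known, for example with a simple logistic regression trained by gradient descent:

```python
# Hypothetical sketch: combine per-module confidence scores into one
# calibrated score via weights learned from labeled training questions.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=1000, lr=0.1):
    """examples: list of (module_scores, correct) pairs.
    Returns one weight per module plus a bias term."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for scores, correct in examples:
            p = sigmoid(sum(wi * si for wi, si in zip(w, scores)) + b)
            err = p - (1.0 if correct else 0.0)
            w = [wi - lr * err * si for wi, si in zip(w, scores)]
            b -= lr * err
    return w, b

# Toy data: module 0 reports on a 0..1 scale, module 1 on a -1..1 scale.
data = [([0.9, 0.8], True), ([0.2, -0.5], False),
        ([0.8, 0.6], True), ([0.1, -0.9], False)]
w, b = train(data)

# Score a new candidate answer whose modules reported 0.85 and 0.7:
score = sigmoid(sum(wi * si for wi, si in zip(w, [0.85, 0.7])) + b)
```

The nice property: because the weights are learned, it doesn't matter that one module reports 0..1 and another −1..1 — the training data calibrates the modules against each other.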

But Watson isn’t about winning TV shows. The current goal is to turn Watson into a tool that can be used by doctors around the world to identify illnesses. Today, doctors work so many hours per week that they can only read a tiny fraction of all the articles that are published. Surveys show that 81% of doctors read less than 5h/month. One solution would be to hire more doctors. Guess what that would mean for costs in the health sector.

Or we could have Watson read all of it and present all that blabla in compressed form when the symptoms match. Think Google for when you don’t know what you’re looking for.

Sounds good? Or frightening? Some people in the audience were thinking “Skynet” but here are some facts that you should know:

  • In health care, Watson is a “medical device”. These are heavily regulated.
  • The goal is not to have “Dr. Watson.” The goal is to give doctors a smart library, not a smart ass or something that can directly make unsupervised decisions about the next therapy step.
  • IBM isn’t developing the product alone. They are working with companies from the health care sector who know how doctors (should) work. You might want to see this video: Watson Computer Comes to the University of Maryland and have a look at this channel: IBMWatsonSolutions
  • Privacy is an important concern. Watson will see millions of medical records. There are pretty strict laws governing this (HIPAA).
  • Watson isn’t a data warehouse. It won’t process all the medical records into one huge data set which it can query. Instead, doctors will enter symptoms in a standardized way and Watson will present a list of things to check plus medical conditions that match.
  • For training, Watson needs a huge list of correct answers. It doesn’t try to find patterns by itself.

So unlike Skynet, Watson is much more like a boring tool. Sorry.

One very interesting aspect is that Watson is something that you won’t buy as a product. Instead, it’s probably going to be a cloud service which charges, say, per question.

Other fields where it would be useful:

  • Justice. Who has time to read all the laws and regulations that the government ships all the time?
  • Legislation
  • Engineering
  • Research in chemistry, physics and genetics

Jazoon 2012: Large scale testing in an Agile world

28. June, 2012

Alan Ogilvie is working at a division of IBM responsible for testing IBM’s Java SE product. Some numbers from his presentation:

  • A build for testing is about 500MB (takes 17 min to download to a test machine)
  • There are 20 different versions (AIX, Linux, Windows, z/OS × x86, POWER, zSeries)
  • The different teams create 80–200 builds every day
  • The tests run on heaps from 32MB to 500GB
  • They use hardware with 1 to 128+ cores
  • 4 GC policies
  • More than 1000 different combinations of command line options
  • Some tests have to be repeated many times to catch “1 out of 100” failures that happen only very rarely

That amounts to millions of test cases that run every month.

1% of them fail.

To tame this beast, the team uses two approaches:

  1. Automated failure analysis that can match error messages from the test case to known bugs
  2. Not all of the tests are run every time

The first approach makes sure that most test failures can be handled automatically. If a test exists to trigger a known bug, it shouldn’t take any of a human’s time – unless the test suddenly succeeds.
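The matching could be as simple as a table of regular expressions over the failure output (a hypothetical sketch of the idea, not IBM's actual tooling; the bug IDs and patterns are made up):

```python
# Assumed design: match a failing test's output against signatures of
# known bugs, so only unmatched failures reach a human.
import re

KNOWN_BUGS = {  # bug id -> pattern matched against the failure output
    "JDK-1234": re.compile(r"OutOfMemoryError: Metaspace"),
    "JDK-5678": re.compile(r"SIGSEGV .* in libzip"),
}

def triage(test_name, output, expected_bug=None):
    """Classify a failure. expected_bug marks a test that exists solely
    to reproduce a known bug; if that bug stops reproducing, a human
    should look (the bug may have been fixed, or the test broke)."""
    matched = [bug for bug, pat in KNOWN_BUGS.items() if pat.search(output)]
    if expected_bug and expected_bug not in matched:
        return ("needs-human", f"{test_name}: {expected_bug} did not reproduce")
    if matched:
        return ("known-bug", matched[0])
    return ("needs-human", f"{test_name}: unrecognized failure")
```

For example, `triage("GcStressTest", "java.lang.OutOfMemoryError: Metaspace")` would file the failure under the known bug instead of paging anyone.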

The second approach is more interesting: They run only a small fraction of the tests every time the test suite is started. How can that possibly work?

If you ran a test yesterday and it succeeded, you have some confidence that it still works today. You’re not 100% sure but, well, maybe 99.5%. So you might skip the test today and mark it as “light green” in the test results (as opposed to “full green” for a test that actually ran this time).

What about the next day? You’re still 98% sure. And the day after that? Your confidence is waning fast, but you’re still pretty sure – 90%.

The same goes for tests that fail. Unless someone did something about them (and requested that this specific test be run again), you can be pretty sure that the test would fail again. So it gets marked “light red”, unlike the tests that actually failed today.

This way, most tests only have to be run once every 4-5 days during development.
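The scheduling rule above can be sketched like this (the decay numbers echo the talk's examples; the function and field names, and the exact threshold, are my own assumptions):

```python
# Illustrative sketch of confidence-decay test selection: a recently
# passed test is skipped and marked "light green"; confidence wanes
# with each day since the last real run, forcing a re-run every few days.
CONFIDENCE_BY_AGE = {0: 1.0, 1: 0.995, 2: 0.98, 3: 0.90}  # days since last run

def select(test, today, threshold=0.90, rerun_requested=False):
    """test: dict with 'last_run' (day number) and 'last_result'.
    Returns what to do with the test in today's run."""
    age = today - test["last_run"]
    if test["last_result"] == "fail":
        # Known failures stay "light red" until someone requests a re-run.
        return "run" if rerun_requested else "light-red"
    confidence = CONFIDENCE_BY_AGE.get(age, 0.0)  # older than 3 days: no confidence
    return "light-green" if confidence >= threshold else "run"
```

With these numbers a passing test is skipped for up to three days and re-run on day four, which lines up with the “once every 4-5 days” cadence mentioned above.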

Why would they care?

For a release, all tests need to be run. That takes three weeks.

They really can’t possibly run all tests all the time.


At last: Filing patents has been patented!

11. January, 2011

Just before the end of last year, a gaping hole was closed in the struggle to turn the world into a lawyer’s playground: IBM has filed a patent on filing patents.

Whenever you apply for a new patent, you’ll have to pay royalties to IBM! It’s like inventing self-printing money! Well done! 🙂
