Jazoon 2012: IBM Watson since Jeopardy!

29. June, 2012

From the summary:

In February 2011, IBM demonstrated its latest research breakthroughs in natural language processing and deep question answering. Named Watson, it made history when it was entered into the famously complex US television quiz show ‘Jeopardy!’, where it comfortably beat two of the greatest human players ever to appear on the show. Since then, work has focused on bringing these breakthroughs to real-world problems.

If you haven’t seen the video, now is a good time: Episode 1, Episode 2, Episode 3

Before the show, Watson was trained with data from a variety of sources, including Wikipedia and DBpedia. The software can process both unstructured and structured data and learn from it: it converts the data into an internal representation that its various answer-finding modules can then use. These modules include classic AI inference algorithms as well as Lucene-based full-text search modules.

This is basically what makes Watson different: instead of relying on a single, one-size-fits-all strategy, Watson uses many different strategies, and each of them returns a “result” consisting of an answer and a “confidence” that this answer might be useful or correct.

Instead of mapping all confidence values to a predefined range, each module can return any number as its confidence. So some modules return values between 0 and 1, others from -1 to 1, and yet others return anything from -∞ to +∞. The trick is that Watson uses an extensive training phase to learn how to weigh the outputs of the different modules. To do this, the correct answers for a large set of questions are necessary.

This is what makes Jeopardy! such a perfect fit: the show has accumulated the correct answers to thousands of questions asked over the years. That made it comparatively “easy” to train Watson automatically, because IBM engineers could debug the answering process whenever Watson erred.
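To make the idea concrete, here is a minimal conceptual sketch in Java (my own illustration, not IBM's actual code): each module returns a raw confidence on its own scale, the raw scores are squashed onto a common scale, and weights learned during training combine them into one score per candidate answer. The class, method names, and weights are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;

public class ConfidenceMerger {

    /** A candidate answer from one module, with that module's raw confidence. */
    static class Candidate {
        final String answer;
        final double rawConfidence; // NOTE: the scale differs per module!
        Candidate(String answer, double rawConfidence) {
            this.answer = answer;
            this.rawConfidence = rawConfidence;
        }
    }

    /** Squash any raw score into (0, 1) so modules become comparable. */
    static double squash(double raw) {
        return 1.0 / (1.0 + Math.exp(-raw)); // logistic function
    }

    /**
     * Combine the per-module scores for one candidate answer using
     * weights learned during training (here simply passed in).
     */
    static double combinedScore(List<Candidate> votes, double[] weights) {
        double score = 0.0;
        for (int i = 0; i < votes.size(); i++) {
            score += weights[i] * squash(votes.get(i).rawConfidence);
        }
        return score;
    }

    public static void main(String[] args) {
        // Three hypothetical modules scored the same candidate on different scales:
        List<Candidate> votes = Arrays.asList(
            new Candidate("Toronto", 0.9),   // module A: 0..1
            new Candidate("Toronto", -0.2),  // module B: -1..1
            new Candidate("Toronto", 42.0)); // module C: unbounded
        double[] learnedWeights = { 0.5, 0.3, 0.2 }; // imagined output of training
        System.out.println(combinedScore(votes, learnedWeights));
    }
}
```

Training then amounts to adjusting the weights until the known-correct answers come out on top.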

But Watson isn’t about winning TV shows. The current goal is to turn Watson into a tool that can be used by doctors around the world to identify illnesses. Today, doctors work so many hours per week that they can only read a tiny fraction of all the articles that are published. Surveys show that 81% of doctors read less than 5h/month. One solution would be to hire more doctors. Guess what that would mean for costs in the health sector.

Or we could have Watson read all of it and present all that blabla in a compressed form when the symptoms match. Think Google, but for when you don’t know what you’re looking for.

Sounds good? Or frightening? Some people in the audience were thinking “Skynet” but here are some facts that you should know:

  • In health care, Watson is a “medical device”. These are heavily regulated.
  • The goal is not to have a “Dr. Watson.” The goal is to give doctors a smart library, not a smart ass or something that can directly make unsupervised decisions about the next therapy step.
  • IBM isn’t developing the product alone. They are working with companies from the health care sector who know how doctors (should) work. You might want to watch this video: Watson Computer Comes to the University of Maryland, and have a look at this channel: IBMWatsonSolutions
  • Privacy is an important concern. Watson will see millions of medical records, and there are pretty strict laws governing those (HIPAA, for example).
  • Watson isn’t a data warehouse. It won’t merge all the medical records into one huge data set that it can query. Instead, doctors will enter symptoms in a standardized way, and Watson will present a list of things to check plus the medical conditions that match.
  • For training, Watson needs a huge list of questions with known correct answers. It doesn’t try to find patterns by itself.

So unlike Skynet, Watson is much more like a boring tool. Sorry.

One very interesting aspect is that Watson probably won’t be something you buy as a product. Instead, it’s likely to be offered as a cloud service that charges, say, per question.

Other fields where it would be useful:

  • Justice. Who has time to read all the laws and regulations that governments produce all the time?
  • Legislation
  • Engineering
  • Research in chemistry, physics and genetics

Back from JaZOOn, Second Day

26. June, 2007

Well, modern medicine worked its usual miracle and my brain was much less clogged today. I went to the keynotes but left a bit disappointed. The history of the web and REST was nice to see, but my interest in the past is usually limited to mining it for cynical comments about mistakes that still bite us today, and there wasn’t much in it for me in that regard. The second talk contained nothing that I didn’t already know. Well, you can’t always win.

Next, I went to see a software demonstration (Automated (J)Unit Testing), but I had seen that one before, so I left early and attended Hibernate Search: Unstructured Search for Hibernate instead. The group around Emmanuel Bernard has managed to extend the query API of Hibernate to Apache Lucene. Nice work, easy to use, looks promising. If you have a web application that allows users to search for something, this is definitely something you should try. Like Google, you can offer a single text field, and the search results will be ranked in an intelligent way. Cool.

After lunch, I enjoyed The Zen of jMaki. They have started to collect each and every JavaScript web widget set out there and wrap them all in the same way, so they become much simpler to use. I don’t like JSPs and tag libraries, but they have done a nice job, and the demos looked real enough to believe that this can actually help.

In the same room, I watched David Nuescheler’s Blitzing the Content Repository: AJAX meets JCR. He developed a little JavaScript library called “R-JAX” which allows you to create something resembling CRUD with a JCR and a few lines of HTML. Since you can access the content repository via HTTP, all you need to do is copy all the files (JavaScript, HTML, CSS, etc.) into the repository, make sure you use the right (relative) URLs, and you’re ready to go. This JCR stuff also looks very interesting. I hope I’ll find the time to have a closer look at Apache Jackrabbit one of these days.

Of course, when you do a lot of AJAX, you need to test it somehow. Ed Burns held the talk Java Platform Automated Testing of Ajax Applications, where he compared four different tools for this (some commercial, some OSS) and Webclient a.k.a. MCP (Mozilla Control Program), which allows you to embed a web browser in a Java program and control it from a unit test (so you can load a web page, examine it, check AJAX requests, etc.). GWT’s own testing framework only gets you so far, especially since it’s insane to set up and some things (like UI elements) can’t be tested at all. MCP solves all that, but you have to deploy the webapp somewhere. Choose your poison.

Right now, MCP can only run Firefox (but they are working on getting at least IE on Windows). It would be nice to see the same integration on Linux using the IEs4Linux project. You did know that you can run IE on Linux, didn’t you? Not that anyone ever wanted to (except for those web pages which stubbornly refuse to display correctly in Firefox … and those which insist on Flash 123.5, which will come for Linux in 2150 … but who needs them anyway).

The next talk was an obvious pick: Java and Scripting: One VM, Many Languages. Rags Srinivas (with hat!) showed us around the Java Scripting API. A pretty low-level presentation with little new information; I had hoped for more meat here. The only interesting thing he mentioned was that Sun doesn’t really care about dynamic languages per se. They care that as many of them as possible run on the Java VM, but not about the languages themselves. That probably explains the strange maneuvering of the last months: hiring key Ruby developers, working on standardizing Groovy (JSR 241), and then suddenly JavaFX is the Next Great Thing(TM). Actually, JavaFX just seems to be another building block in a growing forest (some would say swamp) of dynamic languages flourishing around Java.
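For those who missed the talk: the Scripting API (javax.script, added in Java 6) is small enough to show in a few lines. A minimal sketch, assuming a JavaScript engine ships with your VM (it did from Java 6 on, but later JDKs dropped it, hence the null check):

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptingDemo {
    public static void main(String[] args) throws ScriptException {
        ScriptEngineManager manager = new ScriptEngineManager();
        // Look up an engine by name; availability depends on the JRE.
        ScriptEngine engine = manager.getEngineByName("JavaScript");
        if (engine == null) {
            System.out.println("No JavaScript engine on this VM");
            return;
        }
        // Evaluate a script string and get the result back as an Object.
        Object result = engine.eval("1 + 2");
        System.out.println(result);
    }
}
```

The same few calls work for any language that provides a script engine, which is exactly the “one VM, many languages” pitch.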

Smells a lot like .NET (one runtime, any language you like) and probably makes sense. There are so many common problems (singletons, DB access, HTML generation, mixing HTML and Java) which you can’t really solve well in Java but perfectly well in other languages which don’t (have to) drag the Java legacy along. Java is over ten years old now, and it begins to show. GC was a fantastic new feature when Java came out, but today, every contender for the language of the next decade can do that. In Java, beans, lists, maps and other important types and concepts are second-class citizens. To create a simple list and sort it, you have to write ten lines of code. In Groovy, you write:

def list = ['a', 1, 'b']

The 1 is of course turned into an Integer. Try that in Java 5, and the vital information, the data in the list, drowns in syntax needed to club the compiler into silence:

import java.util.Arrays;
import java.util.List;

public class Foo {
    // 'a' and 'b' are boxed to Character, 1 to Integer
    List<Object> list = Arrays.asList(new Object[] { 'a', 1, 'b' });
}

The sad part is that I had to start Eclipse to make sure the syntax is correct. The Java code is six times as long, and only a sixth of it is actual information; the rest is there only to make the compiler happy. 😦
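To back the “ten lines” claim about creating and sorting a list, here is the full ceremony in Java 5 era code (a minimal sketch, using strings so the elements are actually comparable):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortDemo {
    public static void main(String[] args) {
        // Build the list element by element...
        List<String> list = new ArrayList<String>();
        list.add("b");
        list.add("a");
        list.add("c");
        // ...then sort it in place.
        Collections.sort(list);
        System.out.println(list); // [a, b, c]
    }
}
```

Three imports, a class, a main method, and an explicit type argument, all to express what Groovy says in one line.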

Back to Jazoon. I would have loved to attend the BOFs, especially the ones registered by Neal Gafter about Java closures and something else (I forgot), but I still wasn’t too well and didn’t want to risk having to miss the last two days.

All in all, I enjoyed this day. My thanks go to the JUGS guys for organizing it.