rhu | My two cents on Watson

First of all, this is a demonstration, not an experiment. And of course it's not fair; "Jeopardy!" contestants at this level know almost all of the answers, and it's really all about who buzzes in first. A computer will almost always buzz in first. Game over. And I'm surprised by how many people seem to think that the contestants don't get to read the question onscreen as soon as it's revealed.

So to everyone who's whining about this being neither an honest experiment nor a level playing field, my advice is to sit back and relax; enjoy the show.

But it's been interesting to look at Watson's mistakes. Getting "finis" instead of "terminus" is well within the realm of mistakes that a human player might make; so too "chic" instead of "class".

But "Toronto" was interesting for two reasons. First, that was an eminently fair question, as demonstrated by the fact that Ken, Brad, and I (heh) all got it right. Yet it involved too many levels of indirection for Watson's pattern-matching to make any headway. Here's a case where I would have loved to have the top three answers visible. And second, it was a classic example of where the computer's answer was not even in the correct category; this is the kind of mistake that makes smug humans laugh but it's actually far more impressive that Watson's filtering usually works. (This was one of the things that the Nova episode last week went into.) [Edited to add: the principal researcher at IBM explains that because FJ categories in particular are often misleading, Watson downplays the importance of the category title in weighing its responses. The question could just as well have been "In 1897, Boston was the first U.S. city to build one of these" --- the response to which would not have been the name of a U.S. city.]

It was a nice touch that the programmers adopted the "Jeopardy!" conventions of adding a bunch of question marks at the end to indicate uncertainty, just as they did with the "I'm going to have to guess" on the Daily Double where Watson's confidence level was low.

The Daily Double wagers were also interesting. Watson, like a human player, assessed the current scores and estimated its odds of success, and then computed a wager that maximized its expected utility. The only difference is that humans can carry one and a half significant digits in their heads, and Watson didn't have that limitation. So a bet of $947 makes perfect sense, and if it could have gotten more precise by wagering cents and fractional cents I'm sure it would have done so. [Edited to add: IBM explains the betting subsystem]
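To make the expected-utility idea concrete, here is a minimal sketch of how an odd-looking wager can fall out of that kind of calculation. This is not IBM's betting subsystem: the logistic "chance of winning" curve, the `scale` parameter, the function names, and the simplified wager range (from $5 up to the current score) are all my own assumptions for illustration.

```python
import math


def win_utility(my_score, opponent_score, scale=5000.0):
    """Hypothetical utility: a logistic curve on the lead over the best opponent.

    This stands in for "probability of winning the game" and is an assumption,
    not anything Watson is known to use.
    """
    lead = my_score - opponent_score
    return 1.0 / (1.0 + math.exp(-lead / scale))


def best_wager(my_score, opponent_score, p_correct, min_wager=5):
    """Return the whole-dollar wager with the highest expected utility.

    Simplification (assumed): wagers range from min_wager up to the
    current score, and only the strongest opponent matters.
    """
    max_wager = max(my_score, min_wager)

    def expected_utility(wager):
        # Weight the two outcomes by the estimated chance of answering correctly.
        return (p_correct * win_utility(my_score + wager, opponent_score)
                + (1.0 - p_correct) * win_utility(my_score - wager, opponent_score))

    # Brute-force search over every whole-dollar wager.
    return max(range(min_wager, max_wager + 1), key=expected_utility)


# Example: a modest lead and a 60% confidence estimate.
wager = best_wager(my_score=10000, opponent_score=8000, p_correct=0.6)
print(wager)
```

Nothing in the search prefers round numbers, so whatever dollar amount happens to sit at the top of the curve is the one that gets bet. A human doing the same reasoning would round to the nearest few hundred.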

One point where I was impressed was when Watson correctly pronounced "Jean Valjean." I was also glad to see they got the Roman numerals bug fixed so that "Henry VIII" was said correctly; I do wonder whether they put in the exception for "Malcolm X". (I also wonder if my GPS would correctly handle "Houston St.", but that's another matter.)

I also was amused that Watson's preferred answer for "reinstate" was "reinstate 2" --- this, of course, is because the clue's "To bring back someone to his original function or position" matches "reinstate 2 : to restore to a proper condition : replace in an original or equivalent state" in the MW unabridged (or the equivalent in whatever dictionary they're using).

Overall, IBM got their money's worth. They've clearly demonstrated that a bank of POWER7 servers running their very sophisticated software can effectively data-mine a large, loosely structured data set and quickly assign confidence values to its results.

What I'd love to see is a similar demonstration -- roughly based on the "Jeopardy!" format, except that all three contestants respond to every clue, and perhaps with each clue being scored with a daily-double-like wager -- with Watson versus equivalent systems from Google and Microsoft.


mabfan.livejournal.com

Thank you for this cogent analysis. Nomi figured out why it said "reinstate 2," and I was among the rest of the humans amused by Watson's choice of wagers and incorrect Final Jeopardy! answer.

qaqaq.livejournal.com

My guess was that it was the second superscript definition of REINSTATE in a given dictionary? I wondered if the show would have taken "reinstate two" as a correct response.

crs.livejournal.com

I was glad that they at least made the computer use a physical buzzer. But I do wonder if they made the computer have to guess the right timing to activate its buzzer based on sound data, or if they just gave it an input signal "buzzer active now"...

530nm330hz.livejournal.com

The latter.

soph.livejournal.com

Remember that the humans on the show have the same "buzzer active now" signal; lights around the game board will light up when the buzzer is active. So it's not like Watson really has an advantage in that regard.

[edit: Oh, I should explain my appearance here. I found this page when I was Googling to find out why Watson had "reinstate 2" as its top answer on one of the questions, and this post answered it well! So thank you, 530nm330hz, and I'm sorry if my appearance here wasn't wanted. I temporarily forgot I wasn't viewing an LJ post from one of my friends. :D]

Edited Date: 2011-02-18 11:45 am (UTC)

Welcome! I'm always happy to make new acquaintances via Google.

But I disagree with your first point. Watson was given an unambiguous signal of when to buzz in, and could respond with a fixed and small delay to it. Human players don't wait for the light to go on; they are listening to the pace of Alex reading the question (or so Ken Jennings explained on NPR last night, and I've heard this from friends who have been on the show) and anticipating when the light will come on --- which is why sometimes humans buzz in early and are locked out, which Watson will never do, and sometimes humans buzz in late and miss their chance to answer.

So Watson *does* have an advantage, and pressed it, and won because of it. Which is fine with me --- if Watson had been unable to come up with the correct response in a matter of seconds, it would not have been able to make use of its advantage in buzzing in, so it doesn't diminish IBM's achievement to note this.

diceytillerman.livejournal.com

Just for my own curiosity, which version of Les Miserables were they using as Jean Valjean's context? Book, movie, musical?

It was in the category "Literary APB"; I think all the books they used have also been made into either a movie or a stage production, in some cases both, but I'd call the context "books".

Thanks! That's interesting.
