The document summarizes a presentation on question answering systems. It begins by providing context on information overload and defining question answering. It then discusses the evolution of QA systems from early databases to today's open-domain systems. The presentation focuses on IBM's Watson system, providing an overview of its unprecedented ability to answer open-domain questions as well as the massive resources required for its development. It concludes by arguing that open-domain QA remains unsolved and that closed-domain, interactive QA may be more practical for real-world applications.
From TREC to Watson: is open domain question answering a solved problem?
1. Constantin Orasan, Research Group in Computational Linguistics, University of Wolverhampton, UK http://www.wlv.ac.uk/~in6093/ From TREC to Watson: is open domain question answering a solved problem?
2. Structure of the talk 4 July 2011 Constantin Orasan - KEPT 2011 Brief introduction to QA; Video 1: Where are we now – IBM Watson; The structure of a QA system; Video 2: Watson vs. humans; Overview of Watson; QA from the point of view of users/companies; Conclusions
3. Information overload “Getting information off the Internet is like taking a drink from a fire hydrant” Mitchell Kapor
4. What is question answering? A way to address the problem of information overload. Question answering aims at identifying the answer to a question posed in natural language in a large collection of documents. The information provided by QA is more focused than information retrieval. The output can be the exact answer or a text snippet which contains the answer. The domain took off as a result of the introduction of the QA track in TREC, whilst cross-lingual QA developed as a result of CLEF.
6. Evolution of the QA domain Early QA systems date back to the 1960s; they were mainly front ends to databases and had limited usability. Open-domain QA emerged as a result of the increasing amount of data available; to answer a question, systems need to find and extract the answer. It developed in the late 1990s as a result of the QA track at the Text REtrieval Conferences (TREC), with emphasis on factoid questions, although other types of questions were also explored. CLEF competitions have encouraged the development of cross-lingual systems.
7. Where are we now? IBM and the Jeopardy! Challenge. Jeopardy! is an American quiz show where participants are given clues and need to guess the question (e.g. if the clue is “The Father of Our Country; he didn't really chop down a cherry tree”, the contestant would respond “Who is George Washington?”). Watson is a QA system developed by IBM. http://www.youtube.com/watch?v=FC3IryWr4c8
8. Structure of an open domain QA system A typical open domain QA system consists of: a question processor, a document processor, and an answer extractor (and validation). It can have components for cross-lingual processing and has access to several external resources.
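The three components above can be sketched as a minimal pipeline. This is a toy illustration, not the architecture of any particular system; all function names, the stop-word list and the tiny document collection are invented for the example.

```python
# Minimal sketch of the open-domain QA pipeline described above.
# All names and heuristics here are illustrative assumptions.

def process_question(question: str) -> dict:
    """Question processor: guess the expected answer type and
    build a keyword query (heavily simplified)."""
    eat = "PERSON" if question.lower().startswith("who") else "UNKNOWN"
    stop = {"who", "is", "the", "of"}
    keywords = [w for w in question.rstrip("?").split() if w.lower() not in stop]
    return {"eat": eat, "query": keywords}

def retrieve_passages(query: list, collection: list) -> list:
    """Document processor: return passages containing all query keywords."""
    return [p for p in collection if all(k.lower() in p.lower() for k in query)]

def extract_answer(passages: list, eat: str) -> str:
    """Answer extractor: here we simply return the first matching passage."""
    return passages[0] if passages else "NO ANSWER"

docs = ["Traian Basescu is the president of Romania.",
        "Paris is the capital of France."]
q = process_question("Who is the president of Romania?")
print(extract_answer(retrieve_passages(q["query"], docs), q["eat"]))
```

A real system would replace each stub with the machinery described in the following slides: EAT taxonomies, query expansion, retrieval engines, and pattern-based extraction.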
9. Question processor Produces an interpretation of the question: determines the Question Type (e.g. factoid, definition, procedure, etc.) and the Expected Answer Type (EAT). On the basis of the question it produces a query, determines syntactic and semantic relations between the words in the question, and expands the query with synonyms. It may perform translation of the keywords in the query in the case of cross-lingual QA.
10. Expected answer type calculation Relies on the existence of an answer type taxonomy. This taxonomy can be made open-domain by linking it to general ontologies such as WordNet. The EAT can be determined using rule-based as well as machine learning approaches (e.g. Who is the president of Romania? Where is Paris?). Knowledge of the domain can greatly improve the identification of the EAT and help deal with ambiguities.
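A rule-based EAT classifier of the kind mentioned above can be sketched in a few lines. The taxonomy and the trigger words are illustrative assumptions, not taken from any particular system:

```python
# Toy rule-based Expected Answer Type (EAT) classifier.
# The taxonomy labels and trigger prefixes are illustrative only.
EAT_RULES = [
    ("how many", "NUMBER"),   # checked first: more specific than "how"
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]

def expected_answer_type(question: str) -> str:
    q = question.lower()
    for prefix, eat in EAT_RULES:
        if q.startswith(prefix):
            return eat
    return "OTHER"

print(expected_answer_type("Who is the president of Romania?"))  # PERSON
print(expected_answer_type("Where is Paris?"))                   # LOCATION
```

A machine learning approach would instead train a classifier over features of the question (wh-word, head noun, syntactic structure), which is where linking the taxonomy to WordNet pays off.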
11. Query formulation Produces a query from the question, as a list of keywords or as a list of phrases. Identifies entities present in the question and produces variants of the query by introducing morphological, lexical and semantic variations. Domain knowledge is very important for the identification of entities and the generation of valid variations, and vital in cross-lingual scenarios.
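The generation of lexical variants described above can be sketched as follows; the hand-made synonym table is a stand-in for a real resource such as WordNet:

```python
# Sketch of query-variant generation by lexical substitution.
# The synonym table is an invented stand-in for a lexical resource.
SYNONYMS = {
    "invented": ["created", "devised"],
    "president": ["head of state"],
}

def query_variants(keywords: list) -> list:
    """Return the original query plus one variant per known synonym,
    substituting a single keyword at a time."""
    variants = [keywords]
    for i, word in enumerate(keywords):
        for syn in SYNONYMS.get(word, []):
            variant = keywords.copy()
            variant[i] = syn
            variants.append(variant)
    return variants

for v in query_variants(["telephone", "invented"]):
    print(" ".join(v))
```

Morphological variants (invent, invention, inventor) and, in cross-lingual QA, translated keywords would be produced the same way, each feeding a separate retrieval query.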
12. Document processing Uses the query produced in the previous step to retrieve paragraphs which may contain the answer. It is largely domain independent as it relies on text retrieval engines. It ranks results, but this is largely independent of the QA task. For limited collections of texts it is possible to enrich the index with various linguistic information which can help further processing. When the domain is known, characteristics of the input files can improve the retrieval (e.g. the presence of metadata).
13. Answer extraction Uses a variety of techniques to identify the answer to a question. The answer should have the type of the EAT. Systems very often rely on previously created patterns (e.g. When was the telephone invented? can be answered if there is a sentence that matches the pattern The telephone was invented in <date>). Many patterns can express the same answer (e.g. the telephone, invented in <date>). Relations identified in the question between the expected answer and entities from the question can be exploited by patterns.
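The two surface patterns given above can be expressed directly as regular expressions; here `<date>` is approximated by a four-digit year, which is an assumption made for the sketch:

```python
import re

# Pattern-based answer extraction as in the slide: several surface
# patterns can capture the same <date> slot. Patterns are illustrative.
PATTERNS = [
    r"[Tt]he telephone was invented in (\d{4})",
    r"[Tt]he telephone, invented in (\d{4})",
]

def extract_date(text: str):
    """Return the first <date> captured by any pattern, or None."""
    for pattern in PATTERNS:
        m = re.search(pattern, text)
        if m:
            return m.group(1)
    return None

print(extract_date("The telephone was invented in 1876 by Bell."))    # 1876
print(extract_date("Bell patented the telephone, invented in 1876.")) # 1876
```

Real systems generalise this by templating the entity slot as well (X was invented in <date>), so one pattern set serves every question of the same type.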
14. Answer extraction (II) Potential answers are ranked according to functions which are usually learned from data. The ranking and validation of answers can be done using external sources such as the Internet. QA for well-defined domains can rely on better patterns. The learned functions usually work well only on the type of data used for training.
15. Open domain QA - evaluation Great coverage, but low accuracy. For example, the EPHYRA QA system in TREC 2007 reported an accuracy of 0.20 for factoid questions (Schlaefer et al. 2007). OpenEphyra was used for a cross-lingual Romanian – English QA system and we obtained 0.11 accuracy for factoid questions (Dornescu et al. 2008) – the best performing system for all cross-lingual QA tasks in CLEF 2008. The results are not directly comparable (different QA engines, tuned differently, different collections, different tasks). But does it make sense to do open domain question answering?
16. How did Watson perform? http://www.youtube.com/watch?v=Puhs2LuO3Zc
17. How was this achieved? The starting point was the Practical Intelligent Question Answering Technology (PIQUANT), developed by IBM to participate in TREC. It had been under development at IBM for more than 6 years by a team of 4 full-time researchers and was one of the top three to five systems in many TRECs. PIQUANT performed at around 0.33 accuracy on the TREC data and used a standard architecture for QA.
18. How was this achieved? (II) Lots of extra work was put into the system: a core team of 20 researchers working for almost 4 years. The PIQUANT system was enriched with a large number of modules for language processing, and the processing was parallelised heavily. Lots of components were developed to deal with specific problems (lots of experts). Watson tries to combine deep and shallow knowledge, and had access to large data sets and very good hardware.
20. Hardware used Watson is a workload-optimized system designed for complex analytics, made possible by integrating massively parallel POWER7 processors and the IBM DeepQA software to answer Jeopardy! questions in under three seconds. Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. The POWER7 processor's massively parallel processing capability is an ideal match for Watson's IBM DeepQA software, which is embarrassingly parallel (that is, a workload that is easily split up into multiple parallel tasks). According to John Rennie, Watson can process 500 gigabytes, the equivalent of a million books, per second. IBM's master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about $3 million; with 80 TeraFLOPs it would be placed 94th on the Top 500 Supercomputers list. From: http://en.wikipedia.org/wiki/Watson_(computer)
21. Speed of answer In Jeopardy! an answer needs to be provided in 3-5 seconds. In initial experiments running Watson on a single processor, an answer was obtained in about 2 hours. The system was implemented using Apache UIMA Asynchronous Scaleout, a massively parallel architecture. The indexes used to answer the questions had to be pre-processed using Hadoop.
22. Watson was not only NLP Betting strategy: http://www.youtube.com/watch?v=vA9aqAd2iso
23. To sum up, Watson is: an amazing engineering project, a massive investment, research in many domains of NLP, a big PR stunt, and a way to improve IBM's position in text analytics. But it is not really a technology ready to be deployed. And was it real progress in open-domain QA?
24. So is open domain QA a solved problem? Can we really solve open domain QA? Do we really need open domain QA? Do we care?
34. The QALL-ME project Demonstrators in the domain of tourism – they can answer questions in the domains of cinema/movies and accommodation, e.g. What movies can I see in Wolverhampton this week? How can I get to Novotel Hotel, Wolverhampton? The questions can be asked in any of the four languages of the consortium; a small-scale demonstrator was built for Romanian.
36. The QALL-ME ontology All the reasoning and processing is done using a domain ontology. The ontology also provides the means of achieving cross-lingual QA and determines the way data is stored in the database. Ontologies need to be developed for each domain.
38. Evaluation of the QALL-ME prototype For the cinema domain the accuracy ranged between 60% and 85% depending on the language. The system was tested on real questions posed by users which were completely independent from the ones used to develop the system. The errors were mainly caused by wrongly identified named entities, missing patterns and mistakes of the entailment engine. In a commercial environment this system can be revised every day in order to obtain much higher performance.
39. Closed domain QA for commercial companies Closed domain QA has a certain appeal to companies. These companies normally have large databases of questions and answers from customers, and the domain can be very clearly defined. In some cases the systems needed are actually canned QA systems.
40. Interactive QA It is easy to ask: Where can I eat paella tonight? But what about: What mobile phones are smart phones with a camera, have GPS, have a touch screen, are produced by HTC or Nokia, but definitely not Samsung, have 1G internal memory …? Interactivity is needed to deal with ambiguities, wrong answers, user feedback, etc.
41. Interactive QA A “process where the user is a continual part of the information loop”, at the intersection of Dialogue Systems and Question Answering. The system initiates a dialogue with the user in cases where there are too many or too few answers, or there is some ambiguity in the request, and can suggest follow-up questions. It provides a more natural way of locating needed information. Companies are interested in developing IQA systems as a way of providing customer services.
42. Hypothetical IQA session [1] SYS: Hi! [2] USER: Can you find me a smart phone with a camera? [3] SYS: There are [NUMBER][big number] of smart phones featuring a camera, would you like to search for some [BRAND]? [4] USER: No, which have GPS? [5] SYS: [STILL BIG NUMBER] have GPS, are you interested in ones having TOUCH SCREEN? [Suggesting new constraints] [6] USER: Yes, it would be great. [7] SYS: [NUMBER] of Nokia phones, [NUMBER] HTC phones, [NUMBER] Samsung phones, [NUMBER] of other brands. [8] USER: Ok, what about HTC phones? [9] SYS: [NUMBER] of them have [feature], [NUMBER] of them have [feature]. [10] USER: What are the ones with [feature]? [11] SYS: Here you are: [RESULTS].
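Behind such a session sits a constraint-narrowing loop: each user turn adds a filter and the system reports how many candidates remain. A minimal sketch, with an invented phone catalogue and feature names:

```python
# Minimal sketch of the constraint-narrowing loop behind an IQA session.
# The catalogue and feature names are invented for illustration.
PHONES = [
    {"brand": "Nokia",   "camera": True, "gps": True,  "touch": False},
    {"brand": "HTC",     "camera": True, "gps": True,  "touch": True},
    {"brand": "HTC",     "camera": True, "gps": False, "touch": True},
    {"brand": "Samsung", "camera": True, "gps": True,  "touch": True},
]

def narrow(candidates: list, **constraints) -> list:
    """Apply the constraints accumulated over the dialogue turns."""
    return [p for p in candidates
            if all(p.get(k) == v for k, v in constraints.items())]

step1 = narrow(PHONES, camera=True)                  # turn [2]
step2 = narrow(step1, gps=True, touch=True)          # turns [4]-[6]
step3 = narrow(step2, brand="HTC")                   # turn [8]
print(len(step1), len(step2), len(step3))            # 4 2 1
```

When the candidate set is still large, the system proposes the constraint that best splits it (as in turn [5]); when it is small enough, it simply lists the results.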
43. Answers from more than one source Many complex questions need the answer to be composed from several sources. List questions: List all the cantons in Switzerland which border Germany. Sentiment questions: What features do people like in Vista? This is part of the new trend in “deep QA”. Even though users probably really need such answers, the technology is still at the stage of research projects.
44. To sum up … Some researchers believe that search is dead and “deep QA” is the future. This was largely fuelled by IBM's Watson winning Jeopardy! Watson is a fantastic QA system, but it does not solve the problem of open domain QA. For real applications we still want to focus on very well defined domains, and to have the user in the loop to facilitate asking questions. Watson may have revived the interest in QA.
45. Watson is not always right but it kind of knows this …. http://www.youtube.com/watch?v=7h4baBEi0iA
46. Thank you for your attention