Fusion 2010, 13th International Conference on Information Fusion, EICC, Edinburgh, UK, Thursday, 29 July 2010
Current Approaches to Automated Information Evaluation and their Applicability to Priority Intelligence Requirement Answering
Outline
Overview
Priority Intelligence Requirements: Doctrine
NATO STANAG 2022 (the standard A-F source reliability and 1-6 information credibility rating scheme)
Question-Answering Technologies by Source Data Format
- Tables (Relational DBs, Spreadsheets). Familiar: Structured Query Language (SQL). Advanced: Wolfram Alpha (Mathematica).
- Text. Familiar: Web search engines (Google, Yahoo!, Ask). Advanced: systems from the AQUAINT (IC) competition; IBM Watson.
- Tagged Text. Familiar: Google Patent Search. Advanced: Metacarta; Palantir.
- Logic Statements. Familiar: Prolog. Advanced: Powerset (acquired by MS Bing); Cyc.
- Trusted Teammates. Familiar: personal communication. Advanced: Yahoo! Answers; Vark (acquired by Google); US Army Intelligence Knowledge Network Shoutbox.
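To make the first row concrete, here is a minimal sketch of structured-data question answering against an in-memory SQLite table. The people table, its contents, and the question-to-SQL mapping are illustrative assumptions; Wolfram Alpha's actual translation into Mathematica is far more general.

```python
import sqlite3

# Hypothetical toy knowledge table standing in for a curated datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthplace TEXT)")
conn.execute("INSERT INTO people VALUES ('Elvis Presley', 'Tupelo, Mississippi')")

# A real system translates the natural-language question into a formal
# query (as Wolfram Alpha translates to Mathematica); here that step is
# a hand-written template for illustration only.
question = "Where was Elvis born?"
sql = "SELECT birthplace FROM people WHERE name LIKE ?"
answer = conn.execute(sql, ("%Elvis%",)).fetchone()
print(answer[0])  # -> Tupelo, Mississippi
```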
Structured Data Q-A: Wolfram Alpha
Wolfram Alpha identifies Tupelo as the place where Elvis was born (Elvis disambiguated as Elvis Presley) and provides a map overlay and additional information, such as the current city population. Reference sources are listed by title on another screen, with no access to the source data. The query "Where was Elvis born?" is automatically translated into the Mathematica query: Elvis Presley, place of birth.
Text: Google
Google disambiguates the query (Elvis = Elvis Presley) by PageRank. Top-ranked snippets can easily be scanned for a consensus answer from independent sources: Tupelo, MS. PageRank is less useful in the MI context because intelligence reports are not hyperlinked.
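A minimal sketch, under assumed inputs, of the consensus idea on this slide: count candidate answers across top-ranked snippets, allowing one vote per domain so that near-duplicate copies of the same story do not masquerade as independent sources. The snippets and the "City, State" regex are stand-ins for a real result page and a real named-entity tagger.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Illustrative top-ranked (url, snippet) pairs; a real system would take
# these from a search engine's result page.
snippets = [
    ("http://en.wikipedia.org/wiki/Elvis_Presley",
     "Elvis Presley was born in Tupelo, Mississippi."),
    ("http://www.biography.example/elvis",
     "Born in Tupelo, Mississippi, Elvis moved to Memphis."),
    ("http://news.example/elvis-fans",
     "Fans gathered in Memphis, Tennessee, to mark the anniversary."),
]

# Crude candidate extractor: "City, State" phrases only.
candidate = re.compile(r"([A-Z][a-z]+, [A-Z][a-z]+)")

votes = Counter()
seen_domains = set()
for url, text in snippets:
    domain = urlparse(url).netloc
    if domain in seen_domains:      # one vote per source: a weak
        continue                    # stand-in for source independence
    seen_domains.add(domain)
    votes.update(candidate.findall(text))

print(votes.most_common(1))  # -> [('Tupelo, Mississippi', 2)]
```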
Text-Based Q-A: IBM Watson
IBM's text-based algorithms identified these phrases as the top potential "Jeopardy" answers, with confidence scores displayed. In "Jeopardy", the answer is given in the form of a question. The query is in "Jeopardy" format (including the category "Musical Pastiche").
Tagged Text: Metacarta
The query "Where was Elvis born?" identifies documents that contain "elvis", "born", and a location. Answers are literally all over the map; a consensus answer is not obvious from the location clusters. The documents are recent news articles.
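The sketch below illustrates, with made-up documents and geo-tags, why location clusters alone did not surface a consensus here: Metacarta-style retrieval keeps any geo-tagged document matching "elvis" and "born", so false hits dilute the Tupelo cluster.

```python
from collections import Counter

# Hypothetical geo-tagged documents: (text, tagged_place).
docs = [
    ("Elvis Presley was born in Tupelo in 1935", "Tupelo, MS"),
    ("An Elvis-loving priest, born in Ireland, now serves in Alaska", "Anchorage, AK"),
    ("Elvis impersonators born to perform met in Paris", "Paris, France"),
    ("Elvis was born in a two-room house in Tupelo", "Tupelo, MS"),
]

# Metacarta-style retrieval: keyword match plus a location tag.
hits = [place for text, place in docs
        if "elvis" in text.lower() and "born" in text.lower()]

# With many false hits, the densest cluster may still be too weak
# to indicate the answer, which is the failure mode on this slide.
print(Counter(hits).most_common())
```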
Logic-Based Q-A: Powerset
Answers involve multiple "Elvises". The source data is Wikipedia only.
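A toy sketch of logic-statement question answering and of the co-reference problem behind the multiple "Elvises": facts live in a triple store keyed by distinct identifiers, so a surface name matches several subjects and the query returns all of their birthplaces. The identifiers and facts are illustrative, not Powerset's actual representation.

```python
# Toy triple store: (subject URI, predicate, object). Distinct URIs for
# people who share the surface name "Elvis" -- the URI co-reference
# problem noted in the comparison table.
triples = [
    ("person:elvis_presley", "name", "Elvis Presley"),
    ("person:elvis_presley", "birthplace", "Tupelo, Mississippi"),
    ("person:elvis_costello", "name", "Elvis Costello"),
    ("person:elvis_costello", "birthplace", "London, England"),
]

def birthplaces(name_fragment):
    # Find every subject whose name contains the fragment, then look
    # up that subject's birthplace -- a two-step logical join.
    subjects = {s for s, p, o in triples
                if p == "name" and name_fragment in o}
    return [(s, o) for s, p, o in triples
            if p == "birthplace" and s in subjects]

# Without disambiguation, "Elvis" yields answers for several Elvises,
# as Powerset does over Wikipedia.
print(birthplaces("Elvis"))
```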
Social Question-Answering: Vark
The question is routed to an unknown user in my 'network' computed as likely to provide an answer; the answer is returned in less than a minute. Optimized for the mobile environment. Feedback can be given on answers. Vark queries need to be over a certain length, hence this phrasing.
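Vark's routing model was proprietary, so the following is only a plausible sketch of expertise-based routing: score each candidate answerer by topic overlap with the question, weighted by responsiveness, and route to the highest scorer. All names, topics, and weights are invented.

```python
# Hypothetical profiles: topics a user claims expertise in, plus an
# observed answer rate. Vark's real routing model is not public; this
# is only a plausible shape for expertise-based routing.
users = {
    "gregory": {"topics": {"elvis", "music", "memphis"}, "answer_rate": 0.9},
    "dana":    {"topics": {"cooking", "travel"},         "answer_rate": 0.7},
    "lee":     {"topics": {"music", "guitars"},          "answer_rate": 0.4},
}

def route(question_topics):
    def score(profile):
        overlap = len(question_topics & profile["topics"])
        return overlap * profile["answer_rate"]   # match x responsiveness
    return max(users, key=lambda u: score(users[u]))

# "Where was Elvis born?" tagged with its topics routes to the
# self-identified Elvis expert.
print(route({"elvis", "music"}))  # -> gregory
```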
Comparison by Technology
Each STANAG requirement is compared across the five technology classes: Tables (Wolfram Alpha); Text (Google, IBM Watson); Tagged Text (Metacarta, Palantir); Logic Statements (Powerset); Teammates (Vark, Y! Answers).

Source:
- Tables: reference document title only (no URL).
- Text and tagged text: URL of the document in which the information appears (usually; not Watson). No further attempt to match information to its source within the document, i.e. not "1,000 demonstrators, according to police."
- Teammates: the teammate is known but may not say where the information originates.

Source Reliability:
- Tables: curated data (reference works, government data).
- Text: centrality measures: Google PageRank (eigenvector centrality); Technorati Authority (inlink centrality); VIStology blogger authority (centrality + engagement).
- Logic statements: curated data (Wikipedia); Wikipedia has a PageRank of 9 out of 10 (reliable).
- Teammates: track record, reputation, votes on answers, longevity, number of answers.

Source Independence:
- Tables: no; one unified datastore.
- Text: duplicate document detection; explicit source tracking (href; bit.ly); Leskovec meme tracking; SNA metrics of independence.
- Logic statements: no; single data source.
- Teammates: user authentication.

Information Credibility:
- Tables: partial integrity constraints; can't easily verify information.
- Text: consensus answers; the same answer identified in multiple distinct sources.
- Logic statements: could check integrity constraints; URI co-reference is a problem; contradictions halt inference.
- Teammates: demonstrated area of expertise.
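Since the Source Reliability row leans on centrality measures, here is a minimal power-iteration PageRank over a toy link graph, showing the eigenvector-centrality computation the table names. The graph, damping factor, and iteration count are illustrative.

```python
# Minimal PageRank (eigenvector centrality with damping) by power
# iteration over a toy link graph -- the reliability proxy named in
# the table. Graph and damping factor are illustrative.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
nodes = sorted(links)
damping, n = 0.85, len(nodes)

rank = {u: 1.0 / n for u in nodes}
for _ in range(50):  # a fixed iteration count suffices for a tiny graph
    new = {u: (1 - damping) / n for u in nodes}
    for u, outs in links.items():
        share = damping * rank[u] / len(outs)
        for v in outs:
            new[v] += share
    rank = new

# Heavily linked-to pages score highest; a page's score is a crude
# stand-in for source reliability.
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```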
Research Gaps
Conclusions
Thank You

Editor's Notes

1. Wolfram Alpha identifies Tupelo as the place of Elvis' birth (Elvis disambiguated as Elvis Presley) and provides additional information on the city. Reference sources are listed by title only and are not easily checked.
2. Elvis disambiguated as "Elvis Presley" by PageRank. The consensus answer is apparent by inspection, though the highest-ranking document doesn't contain the answer in its snippet.
  3. http://www.nytimes.com/interactive/2010/06/16/magazine/watson-trivia-game.html?scp=3&sq=ibm%20watson&st=cse
4. (Partial) geographic overlay tied to a trailing-month archive of news articles. Matches are documents with a location that contain "Elvis" and "born". At least one document contains the correct answer, but there are many false hits, including an article about an Elvis-loving Episcopal priest in Alaska. Hits appear in France, Spain, Haiti, etc.; somewhat denser around Tupelo, but not enough to indicate the answer clearly.
5. Elvis is disambiguated to several Elvises; Powerset highlights the birthplace in each. Uses Wikipedia data only.
6. The question was routed to a self-identified Elvis expert (who assumed Elvis = Elvis Presley), and the correct answer was returned in less than a minute. Feedback can be provided ("Was Gregory's answer helpful?": Yes / Kind of, but not for me / No). The question was phrased this way because questions have to be over a certain length.
7. Green means an automated solution exists; yellow means the solution is partial or not wholly automated (requires human judgment); red means no automated solution.