Overview of Hybrid Architecture in Project Halo
Jesse Wang, Peter Clark
March 18, 2013
Status of Hybrid Architecture
Goals, Modularity, Dispatcher, Evaluation
Hybrid System Near Term Goals
•   Set up the infrastructure to communicate with existing reasoners
•   Reliably dispatch questions and collect answers
•   Create related tools and resources
    o   Question generation/selection, answer evaluation, report analysis, etc.
•   Experiment with ways to choose answers from the available reasoners – as a hybrid solver
[Diagram: a Dispatcher connected to the AURA, CYC, and TEQA reasoners]
Focus Areas of Hybrid Framework (until mid-2013)

  Modularity
  • Loose coupling, high cohesion, data exchange protocols

  Dispatching
  • Send requests and handle the responses

  Evaluation
  • Ability to get ratings on answers, and report results
Hybrid System Core Components
[Diagram: a filtered set of questions (Chapter 7 in Campbell) feeds Find-A-Value and DirectQA, which dispatch to AURA, CYC, TEQA, and possibly IR]
Yellow outline: new or updated
SQs: suggested questions
SQA: QA with suggested questions
TEQA: Textual Entailment QA
IR: Information Retrieval
Infrastructure: Dispatchers
[Diagram: the Dispatcher routes questions to AURA, CYC, TEQA, and IR, across three modes: Live Single QA, Suggested QA, and Batch QA]
Dispatcher Features
•   Asynchronous batch mode and single/experiment mode
•   Parallel dispatching to reasoners (see the sketch after this list)
    o   Functional UI: live progress indicator, view of the question file, logs
    o   Exception and error handling
        • Retry a question when the server is busy
•   The batch service can run to completion even if the client dies
    o   Cancelling/stopping the batch process is also available
•   Input and output support both XML and CSV/TSV formats
    o   Pipeline support: accepts Question-Selector input
•   Configurable dispatchers; selectable reasoners
    o   Collect answers and compute basic statistics
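As a concrete illustration, here is a minimal sketch of parallel dispatching with busy-server retry. It assumes each reasoner exposes an HTTP QA endpoint; the URLs and payload shape below are hypothetical placeholders, not the project's actual API.

```python
# Minimal sketch of a parallel dispatcher with retry, assuming each reasoner
# exposes an HTTP endpoint (the URLs below are hypothetical placeholders).
import asyncio
import aiohttp

REASONERS = {
    "Aura": "http://aura.example/qa",
    "Cyc": "http://cyc.example/qa",
    "TextQA": "http://teqa.example/qa",
}

async def ask(session, name, url, question, retries=3):
    """Send one question to one reasoner, retrying when the server is busy."""
    for attempt in range(retries):
        try:
            async with session.post(url, json={"question": question}) as resp:
                if resp.status == 503:          # server busy: back off, retry
                    await asyncio.sleep(2 ** attempt)
                    continue
                resp.raise_for_status()
                return name, await resp.json()
        except aiohttp.ClientError:
            await asyncio.sleep(2 ** attempt)
    return name, None                           # give up after all retries

async def dispatch(question):
    """Fan one question out to all reasoners in parallel; collect answers."""
    async with aiohttp.ClientSession() as session:
        tasks = [ask(session, n, u, question) for n, u in REASONERS.items()]
        return dict(await asyncio.gather(*tasks))

# answers = asyncio.run(dispatch("What does ribosome make?"))
```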
Question-Answering via Suggested Questions
•   Similar features to Live/Direct QA
•   Aggregates suggested questions' answers as a solver
•   Unique features (see the filtering sketch after this list):
    o   Interactively browse the suggested-questions database
    o   Filter on certain facets
    o   Use Q/A concepts, question types, etc. to improve relevance
    o   Automatic comparison of filtered and non-filtered results by chapter
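To make the facet filtering concrete, here is a hedged sketch over an in-memory suggested-questions store; the record fields (chapter, qtype, concepts) are assumptions chosen to match the facets named above.

```python
# Hedged sketch of faceted filtering over a suggested-questions store;
# the records and field names are illustrative assumptions.
SUGGESTED = [
    {"q": "What do ribosomes synthesize?", "chapter": 7,
     "qtype": "WHAT-DOES-X-DO", "concepts": {"ribosome", "protein"}},
    {"q": "Where does glycolysis occur?", "chapter": 9,
     "qtype": "WHERE", "concepts": {"glycolysis", "cytosol"}},
]

def filter_facets(records, chapter=None, qtype=None, concepts=None):
    """Keep only the records that match every facet that was supplied."""
    out = records
    if chapter is not None:
        out = [r for r in out if r["chapter"] == chapter]
    if qtype is not None:
        out = [r for r in out if r["qtype"] == qtype]
    if concepts:
        out = [r for r in out if concepts & r["concepts"]]
    return out

# filter_facets(SUGGESTED, qtype="WHERE") -> the glycolysis question only
```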
Question and Answer Handling
•   Handling and parsing each reasoner's returned results
    o   Customized programming per reasoner (a normalization sketch follows this slide)
•   Information on execution: details and summary
•   Report generation
    o   Automatic evaluation
•   Question Selector
    o   Supports multiple facets/filters
    o   Question banks
    o   Dynamic UI to pick questions
    o   Hidden-tags support
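As an illustration of that per-reasoner "customized programming", here is a sketch that normalizes two hypothetical response formats into one record type; the formats shown are assumptions, not the reasoners' actual outputs.

```python
# Illustrative sketch: normalize heterogeneous reasoner output into a single
# Answer record. Both input formats below are hypothetical.
from dataclasses import dataclass

@dataclass
class Answer:
    solver: str
    question_id: str
    text: str
    confidence: float

def parse_csv_row(solver, row):
    """e.g. a CSV-returning reasoner: 'q13,Proteins,0.82'"""
    qid, text, conf = row.split(",")
    return Answer(solver, qid, text, float(conf))

def parse_dict(solver, payload):
    """e.g. a JSON-returning reasoner."""
    return Answer(solver, payload["id"], payload["answer"],
                  payload.get("confidence", 0.0))

answers = [
    parse_csv_row("TextQA", "q13,Proteins,0.82"),
    parse_dict("Aura", {"id": "q13", "answer": "Protein", "confidence": 0.9}),
]
```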
Automatic Evaluation: Status as of 2013.3
•   Automatic result evaluation features
    o   Web UI/service to use
    o   Algorithms to score exact and variable answers
        – brevity/clarity
        – relevance: correctness + completeness
        – overall score
    o   Generate reports
        – summary & details
        – graph plot
•   Improving evaluation result accuracy (one trick is sketched after this slide)
    o   Using basic text-processing tricks (stop words, stemming, trigram similarity, etc.), location of answer, length of answer, bio concepts, counts of concepts, chapters referred to, question types, and answer type
    o   Experiments and analysis (several rounds, work in progress)
[Chart: user overall vs. AutoEval overall scores]
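To make one of these tricks concrete, here is a hedged sketch of trigram similarity between a system answer and a reference answer; the real AutoEval combines many more signals (answer location and length, bio concepts, chapters, question and answer types).

```python
# Character-trigram similarity between a candidate and a reference answer,
# sketched as a Jaccard measure over trigram sets.
def trigrams(text):
    t = " ".join(text.lower().split())
    return {t[i:i + 3] for i in range(len(t) - 2)}

def trigram_similarity(candidate, reference):
    a, b = trigrams(candidate), trigrams(reference)
    return len(a & b) / len(a | b) if a | b else 0.0

# trigram_similarity("Proteins", "A ribosome makes proteins.") -> ~0.25
```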
Hybrid Performance
How we evaluate, and how we can improve, overall system performance
Caveats: Question Generation and Selection
•   Generated by a small group of SMEs (senior biology students)
•   In natural language, without the textbook (only the syllabus)
Question Set Facets
[Pie chart: question types – FIND-A-VALUE 46%, IS-IT-TRUE-THAT 9%, Other 9%, HOW 7%, HAVE-RELATIONSHIP 7%, PROPERTY 5%, WHY 5%, WHERE 5%, HOW-MANY 4%, WHAT-IS-A 3%, WHAT-DOES-X-DO 3%, HAVE-SIMILARITIES 2%, X-OR-Y 2%, FUNCTION-OF-X 1%, HAVE-DIFFERENCES 1%]
[Pie chart: question distribution over chapters 0 and 4–12, with E/V answer types]
Caveat: Evaluation Criteria
•   We provided a clear guideline, but ratings are still subjective
    o   A (4.0) = correct, complete answer, no major weakness
    o   B (3.0) = correct, complete answer with small cosmetic issues
    o   C (2.0) = partially correct or complete answer, with some big issues
    o   D (1.0) = somewhat relevant answer or information, or poor presentation
    o   F (0.0) = wrong, irrelevant, conflicting, or hard-to-locate answers
•   Only 3 users rated the answers, under a tight timeline
[Chart: User Preferences – average ratings for Aura, Cyc, and Text QA]
Evaluation Example
Q: What is the maximum number of different atoms a carbon atom can bind at once?
More Evaluation Samples (Snapshot)
Reasoner Quality Overview
[Chart: answer counts by rating (0.00–4.00 in steps of 0.33) for Aura, Cyc, and Text QA]
Performance Numbers
[Charts: precision, recall, and F1 for Aura, Cyc, and Text QA – one chart over all ratings (0..4), one restricted to "good" (>= 3.0) answers; the metric definitions are sketched below]
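For reference, the precision/recall/F1 numbers behind these charts can be computed as below; the "good" variant simply restricts credit to answers rated at or above 3.0. The rating records are illustrative.

```python
# Precision/recall/F1 over rated answers, under the assumption that an
# answer "counts" when its rating clears the chosen threshold.
def prf1(ratings, total_questions, threshold=3.0):
    """ratings: {question_id: rating} for the questions a solver answered."""
    answered = len(ratings)
    good = sum(1 for r in ratings.values() if r >= threshold)
    precision = good / answered if answered else 0.0
    recall = good / total_questions if total_questions else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# e.g. prf1({"q1": 4.0, "q2": 2.0, "q3": 3.3}, total_questions=60)
```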
Answers Over Question Types
[Charts: count of answered questions and overall answer rating by question type (FIND-A-VALUE, HOW, HOW-MANY, PROPERTY, WHAT-DOES-X-DO, WHAT-IS-A, X-OR-Y, IS-IT-TRUE-THAT, HAVE-DIFFERENCES, HAVE-SIMILARITIES, HAVE-RELATIONSHIP) for Aura, Cyc, and Text QA]
Answer Distribution Over Chapters
[Chart: answer quality over chapters for Aura, Cyc, and Text QA; underlying averages below]

Chapter:  0     4     5     6     7     8     9     10    11    12
Aura      3.13  3.67  1.83  2.33  –     –     0.58  1.83  1.00  0.50
Cyc       1.75  2.17  1.00  1.67  –     3.17  1.11  1.83  –     2.67
Text QA   2.21  2.27  1.23  2.67  2.89  1.20  1.28  1.97  2.06  2.50
Answers on Questions with E/V Answer Type
[Chart: Exact/Various (E/V) answer counts per reasoner – Aura 5/25, Cyc 5/13, Text QA 45/40]
[Chart: Exact/Various answer quality per reasoner, on a 0.00–3.00 scale]
Improve Performance: Hybrid Solver – Combine!
•   Random selector (dumbest, baseline)
    o   Total questions answered correctly should still beat the best single solver
•   Priority selector (less dumb)
    o   Pick a reasoner following a fixed preference order (e.g. Aura > Cyc > Text QA) *
    o   Expected performance: better than the best individual
•   Trained selector: feature- and rule-based (smarter)
    o   Decision-tree (CTree, …) learning over Q-Type, Chapter, …
    o   Expected performance: slightly better than the above
•   Theoretical best selector: MAX – the upper limit (smartest)
    o   Assumes we can always pick the best-performing reasoner
(sketches of these selectors follow below)
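The four selector strategies can be sketched as follows. The feature-based selector uses scikit-learn's DecisionTreeClassifier as a stand-in for the CTree learner mentioned above, and the training-data shape (chapter and question-type features, best observed solver as the label) is an assumption.

```python
# Sketches of the four answer-selection strategies, assuming each question
# comes with per-solver answers and (for MAX) per-solver ratings.
import random

SOLVERS = ["Aura", "Cyc", "TextQA"]

def random_selector(answers):
    """Baseline: pick any solver that produced an answer."""
    available = [s for s in SOLVERS if answers.get(s)]
    return random.choice(available) if available else None

def priority_selector(answers, order=("Aura", "Cyc", "TextQA")):
    """Pick the first solver in a fixed preference order that answered."""
    return next((s for s in order if answers.get(s)), None)

def max_selector(ratings):
    """Theoretical upper bound: always pick the best-rated solver."""
    return max(ratings, key=ratings.get) if ratings else None

# Trained selector: learn which solver to trust from question features.
# (DecisionTreeClassifier stands in for the CTree learner on the slide.)
from sklearn.tree import DecisionTreeClassifier

X = [[7, 0], [9, 1], [2, 0]]        # e.g. [chapter, question-type id]
y = ["Aura", "TextQA", "Cyc"]       # best solver observed in training data
trained = DecisionTreeClassifier().fit(X, y)
# trained.predict([[7, 0]]) -> predicted best solver for a new question
```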
Performance (F1) with Hybrid Solvers
[Chart: F1 on good answers (rating >= 3.0) for Aura, Cyc, Text QA, Random, Priority, D-Tree, and Max]
Conclusion
•   Each reasoner has its own strengths and weaknesses
    o   Some aspects are not handled well by AURA and CYC
    o   Low-hanging fruit: IS-IT-TRUE-THAT for all, WHAT-IS-A for CYC, …
•   Aggregated performance easily beats the best individual (Text QA)
    o   Even the random solver does a good job (F1: mean = 0.609): F1(MAX) – F1(random) ≈ 2.5%
•   Little room for better performance via answer selection alone
    o   F1(MAX) – F1(D-Tree) ≈ 0.5%
    o   Better to focus on MORE and/or BETTER solvers
Future and Discussions
Near Future Plans
•   Include SQDB-based answers as a "solver"
    o   Helps alleviate reasoners' question-interpretation problems
•   Include Information Retrieval-based answers as a "solver"
    o   Helps us understand the extra power reasoners have over search
•   Improve the evaluation mechanism
•   Extract more features from questions and answers to enable a better solver, and see how close we can get to the upper limit (MAX)
•   Improve the question selector to support multiple sources and automatic update/merge of question metadata
•   Find ways to handle question bank evolution
Further Technical Directions (2013.6+)

  Get more, better reasoners

  Machine learning, evidence combination
  • Extract and use more features to select the best answers
  • Evidence collection and weighting

  Analytics & tuning
  • Make it easier to explore individual results and diagnose failures
  • Support tuning and optimizing performance over target question-answer datasets

  Inter-solver communication
  • Support shared data, shared answers
  • Subgoaling
    o Allow reasoners to call each other for subgoals
Open *Data*

  Requirements
  • Clear semantics, a common (standard) format, easy to access, persistent (available)

  Data Sources
  • Question bank, training sets, knowledge base, protocol for intermediate and final data exchange (an illustrative exchange record follows this slide)

  Open Data Access Layer
  • Design and implement protocols and services for data I/O
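As an illustration only, here is a hypothetical common exchange record for questions and answers; the field names are assumptions for the sake of the example, not the project's actual protocol.

```python
# Illustrative sketch of a common, easy-to-persist exchange record;
# every field name below is a hypothetical choice.
import json

record = {
    "question_id": "ch07-q013",
    "question": "What does ribosome make?",
    "question_type": "WHAT-DOES-X-DO",
    "chapter": 7,
    "answers": [
        {"solver": "Aura", "text": "Proteins", "confidence": 0.9},
        {"solver": "TextQA", "text": "Ribosomes synthesize proteins.",
         "confidence": 0.7},
    ],
}
print(json.dumps(record, indent=2))  # common format, easy to exchange
```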
Open *Services*

   Two Categories
   • Purely machine/algorithm based
   • Human computation (social, crowdsourcing)

   Requirements
   • Communicate with open data; generate metadata
   • More reliable, scalable, reusable

   Goal: process and refine data
   • Convert raw, noisy, inaccurate data → refined, structured, useful data
Open *Environment*

  Definition
  • An AI development environment to facilitate collaboration, efficiency, and scalability

  Operation
  • Like an MMPOG, each "player" gets credits: contribution, resource consumption; interest, loans; ratings…

  Opportunities
  • Self-organized projects, growth potential, encouraged collaboration, grand prizes
Thank You!
For having the opportunity for Q&A ☺

Backup slides next
IBM Watson’s “DeepQA” Hybrid Architecture
DeepQA Answer Merging And Ranking Module
Wolfram Alpha Hybrid Architecture
•   Data curation
•   Computation
•   Linguistic components
•   Presentation
Answer Distribution (Density)
[Chart: count of answers vs. average user rating (0.00–4.00 in steps of 0.33) for Aura, Cyc, and Text QA]
Data Table for Answer Quality Distribution
Work Performed
•   Created web-based dispatcher infrastructure
    o   For both Live Direct QA and Live Suggested Questions
    o   Batch mode to handle larger volumes
•   Built a web UI for UW students to rate answers to questions (HEF)
    o   Coherent UI, duplicate removal, queued tasks
•   Established automatic ways to evaluate and compare results
•   Applied first versions of the file and data exchange formats and protocols
•   Set up faceted browsing and search (retrieval) UI
    o   And web services for third-party consumption
•   Carried out many rounds of relevance studies and analysis
First Evaluation via Halo Evaluation Framework
•   We sent individual QA result sets to UW students for evaluation
•   First-round hybrid system evaluation:
    o   Cyc SQA: 9 best (3 ties), 14 good, 15/60 answered
    o   Aura QA: 1 best, 9 good, 14/60 answered
    o   Aura SQA: 4 best (3 ties), 7 good, 8/60 answered
    o   Text QA: 27 best, 29 good; SQA: 3 best, 5 good, 7/60 answered
    o   Best scenario: 41/60 answered
    o   Note: Cyc Live was not included
    o   * SQA = answering via suggested questions
Live Direct QA Dispatcher Service
[Screenshots: ask a question ("What does ribosome make?"), wait for answers, answers returned]
Live Suggested QA Dispatcher Service
Batch QA Dispatcher Service
Live Solver Service Dispatchers
Direct Live QA: What does ribosome make?
Suggested Questions Dispatcher
Results for Suggested Question Dispatcher
Batch Mode QA Dispatcher
[Screenshot: Batch QA progress bar]
Suggested Questions Database Browser
Faceted Search on Suggested Questions
Tuning the Suggested Question Recommendation

Accomplished
•   Indexed the suggested-questions database
    – Concepts, questions, answers
•   Created a web service for uploading new sets of suggested questions
•   Extracted chapter information from answer text (TEXT)
•   Analyzed question types
    – Pattern-based
•   Experimented with some basic retrieval criteria

Not Yet Implemented
•   Parsing the questions
•   More experiments (heuristics) on retrieval/ranking criteria
    – Manual
•   Get SMEs to generate training data for evaluation
    – Automatic
•   More feature extraction
Parsing, Indexing and Ranking

In place (a minimal indexing sketch follows this slide)
•   New local concept-extraction service
•   Concepts extracted and included in the index
•   Both sentences and paragraphs are in the index
•   Basic sentence types identified
•   Chapter and section information in the index
•   Several ways of ranking evaluated

Not yet implemented
•   More sentence features
    – Content type: question, figure, header, regular, review…
    – Previous and next concepts
    – Count of concepts
    – Clauses
    – Universal truth
    – Relevance or not
•   Question parsing
•   More refinement of ranking
•   Learning to Rank??
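Here is a minimal sketch of sentence-level indexing with term-overlap ranking, under the simplifying assumption that concept extraction is replaced by plain tokenization; it is an illustration of the idea, not the project's actual indexer.

```python
# Minimal sentence index with term-overlap ranking (concept extraction
# stubbed out as plain tokenization for illustration).
from collections import defaultdict

def tokenize(text):
    return [w.lower().strip(".,?!") for w in text.split()]

class SentenceIndex:
    def __init__(self):
        self.sentences = []               # (sentence, chapter) records
        self.inverted = defaultdict(set)  # term -> sentence ids

    def add(self, sentence, chapter=None):
        sid = len(self.sentences)
        self.sentences.append((sentence, chapter))
        for term in tokenize(sentence):
            self.inverted[term].add(sid)

    def rank(self, question, top_k=5):
        """Score indexed sentences by term overlap with the question."""
        scores = defaultdict(int)
        for term in tokenize(question):
            for sid in self.inverted.get(term, ()):
                scores[sid] += 1
        best = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return [self.sentences[sid] for sid in best]
```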
Browse Hybrid System
WIP: Ranking Experiments (Ablation Study)

Feature                        | Only (Easy) | W/O (Easy) | Only (Hard) | W/O (Hard)
Sentence text                  | 139/201     |            | 31/146      |
Sentence concepts              | 79/201      |            | 13/146      |
Prev/next sentence concepts    | –           |            | –           |
Locality info (chapter, etc.)  | –           |            | –           |
Stopword list                  | –           |            | –           |
Stemming comparison            | –           |            | –           |
Other features (type…)         | –           |            | –           |
Weighting (variations)         |             |            |             |
Automatic Evaluation of IR Results
•   Inexpensive, consistent results for tuning
    o   Always using human judgments would be expensive and somewhat inconsistent
•   Quick turnaround
•   With both "easy" and "difficult" question-answer sets
•   Validated by UW students to be trustworthy
    o   95% accuracy on average, with a threshold (a thresholded check is sketched after this slide)
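As a hedged sketch of a thresholded auto-evaluation check: does a retrieved sentence contain (or nearly contain) the gold answer? Simple token overlap stands in for the project's richer feature set; the 80% default mirrors the threshold discussed on the later slides.

```python
# Thresholded containment check: what fraction of the gold answer's tokens
# appear in the retrieved sentence? (Token overlap is a stand-in here.)
def tokenize(text):
    return {w.lower().strip(".,?!") for w in text.split()}

def answer_score(sentence, gold_answer):
    """Fraction of gold-answer tokens present in the retrieved sentence."""
    gold = tokenize(gold_answer)
    return len(gold & tokenize(sentence)) / len(gold) if gold else 0.0

def auto_eval(sentence, gold_answer, threshold=0.8):
    """True if the sentence is judged to contain the answer."""
    return answer_score(sentence, gold_answer) >= threshold
```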
First UW Students' Evaluation of AutoEval
•   Notation:
    o   0 = right on: 100% means right, 0% means wrong
    o   -1 = false positive: we gave it a high score (>50%), but the retrieved text does NOT contain or imply the answer
    o   +1 = false negative: we gave it a low score (<50%), but the retrieved text actually DOES contain or imply the answer
•   We gave each of 4 students:
    o   15 questions, i.e. 15*5 = 75 sentences and scores to rank
    o   5 of the questions were shared; 10 were unique to each student
    o   23/45 questions from the "hard" set, 22/45 from the "easy" set
Results: Auto-Evaluation Validity Verification
[Chart: per-student (1–4) agreement with AutoEval at the 50% and 80% thresholds]
The "Easy" QA Set *
•   Task: automatically evaluate whether retrieved sentences contain the answer
•   Scoring: max score, Mean Average Precision (MAP) – both sketched after this slide
•   Results using max (with threshold at 80%):
    o   193 regular questions and 8 yes/no questions (via concept overlap)
        •   Only with sentence text: 139 (69.2%)
        •   Peter's test set: 149 (74.1%)
        •   Peter's more refined: 158 (78.6%)
        •   (Lower) upper bound for IR: 170 (84.2%)
        •   Jesse's best: ??

* The evaluation is for the IR portion ONLY; no answer pinpointing
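For clarity, here is a minimal sketch of the two scoring schemes named above: "max" counts a question as answered if any retrieved sentence clears the threshold, while MAP averages precision at each relevant rank. The inputs are illustrative.

```python
# Minimal sketches of the "max" and MAP scoring schemes.
def max_score(scores, threshold=0.8):
    """Question counts as answered if any sentence clears the threshold."""
    return max(scores, default=0.0) >= threshold

def average_precision(relevant_flags):
    """AP over one ranked list; relevant_flags[i] is True if rank i+1 is relevant."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(all_rankings):
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

# e.g. mean_average_precision([[True, False, True], [False, True]])
```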
"Easy" QA Set Auto-Evaluation
[Chart: results for Q-text only, Vulcan Basic, Vulcan Refined, BaseIR, and current upper bound]
Best Upper Bound for the Hard Set as of Today

Achieved with weighting on answer text, answer concepts, question text, and question concepts; matching over sentence text, concepts, concepts from the previous and next sentences, and sentence type; plus comparison using keyword overlap, concept overlap, stopword removal, and smart stemming techniques.
Sharing the Data and Knowledge
•   Information we want, and each solver may also want:
    o   Everyone's results
    o   Everyone's confidence in their results
    o   Everyone's supporting evidence
        •   From textbook sentences, reviews, homework sections, figures…
        •   From related web material, e.g. biology articles on Wikipedia
        •   From common world knowledge: ParaPara, WordNet, …
    o   Training data – for offline use
More Timeline Details for First Integration

We are in control
•   AURA: now
•   Text: before 12/7
•   Vulcan IR Baseline: before 12/15
•   Initial Hybrid System Output: before 12/21
    o   Without unified data format
    o   With limited (possibly outdated) suggested questions

Partners
•   Cyc: ? hopefully before EOY 2012
•   JHU: ?? hopefully before EOY 2012
•   ReVerb: ??? EOM January 2013
Rounds of Improvements

    Infrastructure (modules & services)
    • Integrate solvers
    • Data I/O

    Tricks (algorithms & data)
    • Refine the hybrid strategy
    • Heuristics + machine learning

    Analysis (evaluation)
    • Evaluation with humans
    • With each solver + the hybrid system
OpenHalo
[Diagram: the Vulcan Hybrid System at the center, connected to AURA, CYC QA, SILK QA, TEQA, and other QA systems, resting on three pillars: Data, Service, Collaboration]

More Related Content

Similar to Hybrid system architecture overview

Oracle Application Management Suite
Oracle Application Management SuiteOracle Application Management Suite
Oracle Application Management SuiteOracleVolutionSeries
 
20110812 CyberTAN presentation
20110812 CyberTAN presentation20110812 CyberTAN presentation
20110812 CyberTAN presentationRichard Hsu
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...DataScienceConferenc1
 
Sledgehammer to Fine Brush for QA
Sledgehammer to Fine Brush for QASledgehammer to Fine Brush for QA
Sledgehammer to Fine Brush for QAShelley Lambert
 
Visualizing content in metadata stores
Visualizing content in metadata storesVisualizing content in metadata stores
Visualizing content in metadata storesXavier Llorà
 
Faster apps. faster time to market. faster mean time to repair
Faster apps. faster time to market. faster mean time to repairFaster apps. faster time to market. faster mean time to repair
Faster apps. faster time to market. faster mean time to repairCompuware ASEAN
 
Testing in an Open Source Middleware Platform Space The WSO2 Way.
Testing in an Open Source Middleware Platform Space  The WSO2 Way.Testing in an Open Source Middleware Platform Space  The WSO2 Way.
Testing in an Open Source Middleware Platform Space The WSO2 Way.WSO2
 
Visionbi Quality Gates
Visionbi Quality GatesVisionbi Quality Gates
Visionbi Quality GatesRam Yonish
 
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...Stuart Wrigley
 
High performance database applications with pure query and ibm data studio.ba...
High performance database applications with pure query and ibm data studio.ba...High performance database applications with pure query and ibm data studio.ba...
High performance database applications with pure query and ibm data studio.ba...Vladimir Bacvanski, PhD
 
Performance Engineering Case Study V1.0
Performance Engineering Case Study    V1.0Performance Engineering Case Study    V1.0
Performance Engineering Case Study V1.0sambitgarnaik
 
Acumen Fuse & Customer Case Study
Acumen Fuse & Customer Case StudyAcumen Fuse & Customer Case Study
Acumen Fuse & Customer Case StudyAcumen
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...HostedbyConfluent
 
Top100summit christina
Top100summit christinaTop100summit christina
Top100summit christinaChristina Geng
 
Elements of a Test Framework
Elements of a Test FrameworkElements of a Test Framework
Elements of a Test FrameworkSmartBear
 
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab Management
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab ManagementAccelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab Management
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab ManagementBIOVIA
 
NG BB 34 Analysis of Variance (ANOVA)
NG BB 34 Analysis of Variance (ANOVA)NG BB 34 Analysis of Variance (ANOVA)
NG BB 34 Analysis of Variance (ANOVA)Leanleaders.org
 

Similar to Hybrid system architecture overview (20)

Oracle Application Management Suite
Oracle Application Management SuiteOracle Application Management Suite
Oracle Application Management Suite
 
20110812 CyberTAN presentation
20110812 CyberTAN presentation20110812 CyberTAN presentation
20110812 CyberTAN presentation
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...
[DSC Europe 23] Vladimir Ageev - From Tables to Answers: building QA System f...
 
Sledgehammer to Fine Brush for QA
Sledgehammer to Fine Brush for QASledgehammer to Fine Brush for QA
Sledgehammer to Fine Brush for QA
 
Visualizing content in metadata stores
Visualizing content in metadata storesVisualizing content in metadata stores
Visualizing content in metadata stores
 
Faster apps. faster time to market. faster mean time to repair
Faster apps. faster time to market. faster mean time to repairFaster apps. faster time to market. faster mean time to repair
Faster apps. faster time to market. faster mean time to repair
 
Testing in an Open Source Middleware Platform Space The WSO2 Way.
Testing in an Open Source Middleware Platform Space  The WSO2 Way.Testing in an Open Source Middleware Platform Space  The WSO2 Way.
Testing in an Open Source Middleware Platform Space The WSO2 Way.
 
Visionbi Quality Gates
Visionbi Quality GatesVisionbi Quality Gates
Visionbi Quality Gates
 
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Tech...
 
High performance database applications with pure query and ibm data studio.ba...
High performance database applications with pure query and ibm data studio.ba...High performance database applications with pure query and ibm data studio.ba...
High performance database applications with pure query and ibm data studio.ba...
 
Performance Engineering Case Study V1.0
Performance Engineering Case Study    V1.0Performance Engineering Case Study    V1.0
Performance Engineering Case Study V1.0
 
Acumen Fuse & Customer Case Study
Acumen Fuse & Customer Case StudyAcumen Fuse & Customer Case Study
Acumen Fuse & Customer Case Study
 
PraveenResumeNewL
PraveenResumeNewLPraveenResumeNewL
PraveenResumeNewL
 
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
Chill, Distill, No Overkill: Best Practices to Stress Test Kafka with Siva Ku...
 
Top100summit christina
Top100summit christinaTop100summit christina
Top100summit christina
 
Elements of a Test Framework
Elements of a Test FrameworkElements of a Test Framework
Elements of a Test Framework
 
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab Management
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab ManagementAccelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab Management
Accelrys Announces Experiment Knowledge Base (EKB) for Enterprise Lab Management
 
Lafauci dv club oct 2006
Lafauci dv club oct 2006Lafauci dv club oct 2006
Lafauci dv club oct 2006
 
NG BB 34 Analysis of Variance (ANOVA)
NG BB 34 Analysis of Variance (ANOVA)NG BB 34 Analysis of Variance (ANOVA)
NG BB 34 Analysis of Variance (ANOVA)
 

More from Jesse Wang

Agile lean workshop
Agile lean workshopAgile lean workshop
Agile lean workshopJesse Wang
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platformJesse Wang
 
Social shopping with semantic power
Social shopping with semantic powerSocial shopping with semantic power
Social shopping with semantic powerJesse Wang
 
Smart datamining semtechbiz 2013 report
Smart datamining semtechbiz 2013 reportSmart datamining semtechbiz 2013 report
Smart datamining semtechbiz 2013 reportJesse Wang
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commonsJesse Wang
 
Experiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionExperiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionJesse Wang
 
Chinese New Year
Chinese New Year Chinese New Year
Chinese New Year Jesse Wang
 
SemTech 2012 Talk semantify office
SemTech 2012 Talk  semantify officeSemTech 2012 Talk  semantify office
SemTech 2012 Talk semantify officeJesse Wang
 
Building SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteBuilding SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteJesse Wang
 
SMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateSMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateJesse Wang
 
SMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksSMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksJesse Wang
 
Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Jesse Wang
 
Msra talk smw+apps
Msra talk smw+appsMsra talk smw+apps
Msra talk smw+appsJesse Wang
 
Jist tutorial semantic wikis and applications
Jist tutorial   semantic wikis and applicationsJist tutorial   semantic wikis and applications
Jist tutorial semantic wikis and applicationsJesse Wang
 
Semantic Wiki Page Maker
Semantic Wiki Page MakerSemantic Wiki Page Maker
Semantic Wiki Page MakerJesse Wang
 
Facets of applied smw
Facets of applied smwFacets of applied smw
Facets of applied smwJesse Wang
 
Smwcon widget editor - first preview
Smwcon widget editor - first previewSmwcon widget editor - first preview
Smwcon widget editor - first previewJesse Wang
 
Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Jesse Wang
 
Smwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiSmwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiJesse Wang
 
Semantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionSemantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionJesse Wang
 

More from Jesse Wang (20)

Agile lean workshop
Agile lean workshopAgile lean workshop
Agile lean workshop
 
Big data analytic platform
Big data analytic platformBig data analytic platform
Big data analytic platform
 
Social shopping with semantic power
Social shopping with semantic powerSocial shopping with semantic power
Social shopping with semantic power
 
Smart datamining semtechbiz 2013 report
Smart datamining semtechbiz 2013 reportSmart datamining semtechbiz 2013 report
Smart datamining semtechbiz 2013 report
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Experiment on Knowledge Acquisition
Experiment on Knowledge AcquisitionExperiment on Knowledge Acquisition
Experiment on Knowledge Acquisition
 
Chinese New Year
Chinese New Year Chinese New Year
Chinese New Year
 
SemTech 2012 Talk semantify office
SemTech 2012 Talk  semantify officeSemTech 2012 Talk  semantify office
SemTech 2012 Talk semantify office
 
Building SMWCon Spring 2012 Site
Building SMWCon Spring 2012 SiteBuilding SMWCon Spring 2012 Site
Building SMWCon Spring 2012 Site
 
SMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateSMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev Update
 
SMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome RemarksSMWCon Spring 2012 Welcome Remarks
SMWCon Spring 2012 Welcome Remarks
 
Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)Pre-SMWCon Spring 2012 meetup (short)
Pre-SMWCon Spring 2012 meetup (short)
 
Msra talk smw+apps
Msra talk smw+appsMsra talk smw+apps
Msra talk smw+apps
 
Jist tutorial semantic wikis and applications
Jist tutorial   semantic wikis and applicationsJist tutorial   semantic wikis and applications
Jist tutorial semantic wikis and applications
 
Semantic Wiki Page Maker
Semantic Wiki Page MakerSemantic Wiki Page Maker
Semantic Wiki Page Maker
 
Facets of applied smw
Facets of applied smwFacets of applied smw
Facets of applied smw
 
Smwcon widget editor - first preview
Smwcon widget editor - first previewSmwcon widget editor - first preview
Smwcon widget editor - first preview
 
Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011Microsoft Office Connector Update at SMWCon Spring 2011
Microsoft Office Connector Update at SMWCon Spring 2011
 
Smwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawikiSmwcon spring2011 tutorial applied semantic mediawiki
Smwcon spring2011 tutorial applied semantic mediawiki
 
Semantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in ActionSemantic Wikis - Social Semantic Web in Action
Semantic Wikis - Social Semantic Web in Action
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Hybrid system architecture overview

  • 1. Overview of Hybrid Architecture in Project Halo Jesse Wang, Peter Clark March 18, 2013
  • 2. Status of Hybrid Architecture Goals, Modularity, Dispatcher, Evaluation 2
  • 3. Hybrid System Near Term Goals CYC • Setup the infrastructure to communicate with existing reasoners AURA AURA TEQA • Reliably dispatch questions and collect answers CYC • Create related tools and resources Question generation/selection, answer TEQA o evaluation, report analysis, etc. • Experiment ways to choose the answers from Dispatcher available reasoners – as hybrid solver 3
  • 4. Focus Areas of Hybrid Framework (until mid 2013) Modularity • Loose coupling, high cohesion, data exchange protocols Dispatching • Send requests and handle the responses Evaluation • Ability to get ratings on answers, and report results 4
  • 5. Hybrid System Core Components CYC TEQA Find-A- Value Chapt 7 In AURA IR? Campbell DirectQA Filtered Yellow Outline: New or Updated Set of Questions SQs: suggested questions SQA: QA with suggested questions TEQA: Textual Entailment QA IR: Information Retrieval 5
  • 6. Infrastructure: Dispatchers CYC TEQA AURA IR Dispatcher Live Single QA Suggested QA Batch QA 6
  • 7. Dispatcher Features. Asynchronous batch mode plus a single/experiment mode; parallel dispatching to reasoners, with a functional UI (live progress indicator, question-file viewer, logs) and exception/error handling (retrying a question when the server is busy); the batch service runs to completion even if the client dies, and the batch process can also be cancelled or stopped; input and output support both XML and CSV/TSV formats, with pipeline support (accepts Question-Selector input); configurable dispatchers and reasoner selection, collecting answers and computing basic statistics. The fan-out-with-retry pattern is sketched below.
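The slides do not show the dispatcher's code, but the pattern they describe (parallel fan-out, retry on a busy server, answer collection) is conventional. Below is a minimal Python sketch; `REASONERS`, `call_reasoner`, `BusyError`, and the retry limits are all illustrative stand-ins, not the actual Halo implementation.

```python
import concurrent.futures
import random
import time

# Hypothetical reasoner registry; the real dispatcher talks to AURA, CYC, TEQA, IR services.
REASONERS = ["AURA", "CYC", "TEQA"]

class BusyError(Exception):
    """Raised when a reasoner server reports it is busy."""

def call_reasoner(name, question):
    """Stand-in for the HTTP call to a reasoner; randomly simulates a busy server."""
    if random.random() < 0.3:
        raise BusyError(name)
    return f"{name} answer to: {question}"

def ask_with_retry(name, question, retries=3, backoff=0.5):
    """Send one question to one reasoner, retrying with backoff when busy."""
    for attempt in range(retries):
        try:
            return name, call_reasoner(name, question)
        except BusyError:
            time.sleep(backoff * (attempt + 1))  # wait longer on each retry
    return name, None                            # give up; record no answer

def dispatch(question):
    """Dispatch a question to all reasoners in parallel and collect the answers."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(REASONERS)) as pool:
        futures = [pool.submit(ask_with_retry, r, question) for r in REASONERS]
        return dict(f.result() for f in concurrent.futures.as_completed(futures))

print(dispatch("What does ribosome make?"))
```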
  • 8. Question-Answering via Suggested Questions. Similar features to Live/Direct QA, aggregating the suggested questions' answers as a solver. Unique features: interactively browse the suggested-questions database; filter on certain facets; use Q/A concepts, question types, etc. to improve relevance; automatically compare filtered and non-filtered results by chapter.
  • 9. Question and Answer Handling. Handling and parsing each reasoner's returned results (customized programming); details and summary information on execution; report generation with automatic evaluation. Question Selector: supports multiple facets/filters, question banks, a dynamic UI to pick questions, and hidden tags.
  • 10. Automatic Evaluation: Status as of 2013.3. Automatic result evaluation features; a web UI/service to use them; algorithms to score exact and variable answers (brevity/clarity; relevance: correctness + completeness; overall score); report generation (summary and details, graph plot). Improving evaluation result accuracy using basic text processing tricks (stop words, stemming, trigram similarity, etc.), location of answer, length of answer, bio concepts, counts of concepts, chapters referred, question types, and answer type; experiments and analysis (several rounds, work in progress). [Chart: user overall rating vs. AutoEval overall rating.] A toy version of the text-similarity tricks appears below.
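As a rough illustration of the "text processing tricks" the slide lists, here is a toy answer scorer combining stopword removal, crude stemming, and character-trigram similarity. The stopword list, the plural-stripping "stemmer", and the Jaccard measure are simplifying assumptions; the real AutoEval also weighs answer location, length, bio concepts, and question/answer types.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "and", "in", "it"}  # abbreviated list

def normalize(text):
    """Lowercase, drop stopwords, and crudely stem by trimming a plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w.rstrip("s") for w in words if w not in STOPWORDS]

def trigrams(words):
    """Character trigrams over the normalized token stream."""
    s = " ".join(words)
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(candidate, gold):
    """Jaccard overlap of character trigrams: 1.0 = identical, 0.0 = disjoint."""
    a, b = trigrams(normalize(candidate)), trigrams(normalize(gold))
    return len(a & b) / len(a | b) if a | b else 0.0

print(similarity("Carbon atoms can bond to four other atoms",
                 "A carbon atom binds at most four atoms"))  # partial overlap, between 0 and 1
```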
  • 11. Hybrid Performance: how we evaluate, and how we can improve, overall system performance.
  • 12. Caveats: Question Generation and Selection. Questions were generated by a small group of SMEs (senior biology students), in natural language, without the textbook (only the syllabus).
  • 13. Question Set Facets. [Pie charts: question-type distribution, dominated by FIND-A-VALUE at 46%, with IS-IT-TRUE-THAT and Other at 9% each and the remaining types (WHY, HAVE-RELATIONSHIP, HOW, PROPERTY, WHERE, HOW-MANY, WHAT-IS-A, WHAT-DOES-X-DO, HAVE-SIMILARITIES, X-OR-Y, FUNCTION-OF-X, HAVE-DIFFERENCES) between 1% and 7%; and question distribution over chapters 0 and 4-12.]
  • 14. Caveat: Evaluation Criteria. We provided a clear guideline, but rating is still subjective: A (4.0) = correct, complete answer, no major weakness; B (3.0) = correct, complete answer with small cosmetic issues; C (2.0) = partially correct or complete answer, with some big issues; D (1.0) = somewhat relevant answer or information, or poor presentation; F (0.0) = wrong, irrelevant, conflicting, or hard-to-locate answers. Only 3 users rated the answers, under a tight timeline. [Chart: per-user rating preferences for Aura, Cyc, and Text QA.]
  • 15. Evaluation Example. Q: What is the maximum number of different atoms a carbon atom can bind at once?
  • 16. More Evaluation Samples (Snapshot)
  • 17. Reasoner Quality Overview. [Chart: answer counts over rating (0.00-4.00 in third-point steps) for Aura, Cyc, and Text QA.]
  • 18. Performance Numbers. [Charts: reasoner precision, recall, and F1 on all ratings (0-4) and on "good" (>= 3.0) answers, for Aura, Cyc, and Text QA.] One common way to compute these figures is sketched below.
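The slides do not spell out how precision and recall are defined over a question set, so the sketch below uses one common convention: precision over attempted questions, recall over the whole set, with "good" meaning a rating of at least 3.0. Treat it as an assumed reading, not the presenters' exact formulas.

```python
def prf1(num_good, num_answered, num_questions):
    """Precision over attempted questions, recall over the full question set.

    num_good:      answers rated "good" (e.g., >= 3.0)
    num_answered:  questions the reasoner attempted
    num_questions: size of the whole question set
    """
    precision = num_good / num_answered if num_answered else 0.0
    recall = num_good / num_questions if num_questions else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative numbers only (not the actual Halo results):
print(prf1(num_good=20, num_answered=40, num_questions=60))  # (0.5, 0.333..., 0.4)
```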
  • 19. Answers Over Question Types. [Charts: count of answered questions and overall answer rating per question type (FIND-A-VALUE, HOW, HOW-MANY, PROPERTY, WHAT-DOES-X-DO, WHAT-IS-A, X-OR-Y, IS-IT-TRUE-THAT, HAVE-DIFFERENCES, HAVE-SIMILARITIES, HAVE-RELATIONSHIP) for Aura, Cyc, and Text QA.]
  • 20. Answer Distribution Over Chapters. [Chart and data table: answer quality (0.00-4.00) over chapters 0 and 4-12 for Aura, Cyc, and Text QA; Text QA covers all ten chapters (per-chapter means 2.21, 2.27, 1.23, 2.67, 2.89, 1.20, 1.28, 1.97, 2.06, 2.50), while Aura and Cyc each have ratings for eight.]
  • 21. Answers on Questions with E/V Answer Type. [Charts: counts and quality of answers on exact (E) vs. various (V) answer types for Aura, Cyc, and Text QA.]
  • 22. Improve Performance: Hybrid Solver (Combine!). Random selector (dumbest; the baseline): the total of questions answered correctly should beat the best single solver. Priority selector (less dumb): pick reasoners in a good fixed order (e.g., Aura > Cyc > Text QA); expected performance: better than the best individual. Trained selector, feature- and rule-based (smarter): decision-tree learning (CTree, etc.) over question type, chapter, and so on; expected performance: slightly better than the above. Theoretical best selector, MAX (smartest): the upper limit, supposing we can always pick the best-performing reasoner. Minimal sketches of these selectors follow below.
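A minimal sketch of the random, priority, and MAX selectors, assuming each reasoner's answer (possibly None) is already collected in a dict; the oracle `ratings` argument to `max_selector` is exactly what makes MAX a theoretical upper bound rather than a deployable solver. The trained decision-tree selector is omitted, since it needs labeled training data.

```python
import random

def random_selector(answers):
    """Baseline: pick any available answer uniformly at random."""
    available = [(r, a) for r, a in answers.items() if a is not None]
    return random.choice(available) if available else None

def priority_selector(answers, order=("Aura", "Cyc", "Text QA")):
    """Pick the first reasoner in a fixed preference order that answered."""
    for reasoner in order:
        if answers.get(reasoner) is not None:
            return reasoner, answers[reasoner]
    return None

def max_selector(answers, ratings):
    """Theoretical upper bound: always take the best-rated available answer."""
    answered = {r: a for r, a in answers.items() if a is not None}
    if not answered:
        return None
    best = max(answered, key=lambda r: ratings.get(r, 0.0))
    return best, answered[best]

answers = {"Aura": None, "Cyc": "4", "Text QA": "four atoms"}
print(priority_selector(answers))                           # ('Cyc', '4')
print(max_selector(answers, {"Cyc": 3.0, "Text QA": 4.0}))  # ('Text QA', 'four atoms')
```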
  • 23. Performance (F1) with Hybrid Solvers. [Chart: F1 of each solver on good answers (rating >= 3.0): Aura, Cyc, Text QA, Random, Priority, D-Tree, Max.]
  • 24. Conclusion. Each reasoner has its own strengths and weaknesses; some aspects are not handled well by AURA and CYC (low-hanging fruit: IS-IT-TRUE-THAT for all reasoners, WHAT-IS-A for CYC, etc.). Aggregated performance easily beats the best individual (Text QA); even the random solver does a good job (F1 mean = 0.609), with F1(MAX) - F1(random) of roughly 2.5%. There is little room for better performance via answer selection alone (F1(MAX) - F1(D-Tree) of roughly 0.5%), so effort is better focused on MORE and/or BETTER solvers.
  • 26. Near Future Plans. Include SQDB-based answers as a "Solver" (helps alleviate the reasoners' question-interpretation problems); include Information Retrieval-based answers as a "Solver" (helps us understand the extra power reasoners can have over search); improve the evaluation mechanism; extract more features from questions and answers to enable a better solver, and see how close we can get to the upper limit (MAX); improve the question selector to support multiple sources and automatic update/merge of question metadata; find ways to handle question-bank evolution.
  • 27. Further Technical Directions (2013.6+). Get more, better reasoners. Machine learning and evidence combination: extract and use more features to select the best answers; evidence collection and weighing. Analytics and tuning: make it easier to explore individual results and diagnose failures; support tuning and optimizing performance over target question-answer datasets. Inter-solver communication: support shared data and shared answers; subgoaling, allowing reasoners to call each other for subgoals.
  • 28. Open *Data*. Requirements: clear semantics, a common (standard) format, easy to access, persistent (available). Data sources: question bank, training sets, knowledge base, and a protocol for intermediate and final data exchange. Open Data Access Layer: design and implement protocols and services for data I/O.
  • 29. Open *Services*. Two categories: pure machine/algorithm-based, and human computation (social, crowdsourcing). Requirements: communicate with open data, generate metadata; be more reliable, scalable, and reusable. Goal: process and refine data, converting raw, noisy, inaccurate data into refined, structured, useful data.
  • 30. Open *Environment*. Definition: an AI development environment to facilitate collaboration, efficiency, and scalability. Operation: like an MMOG (massively multiplayer online game), each "player" gets credits: contribution, resource consumption; interests, loans; ratings, and so on. Opportunities: self-organized projects, growth potential, encouraged collaboration, a grand prize.
  • 31. Thank You! And thank you for the opportunity for Q&A. Backup slides next.
  • 32. IBM Watson's "DeepQA" Hybrid Architecture
  • 33. DeepQA Answer Merging and Ranking Module
  • 34. Wolfram Alpha Hybrid Architecture: data curation, computation, linguistic components, presentation
  • 35. [Image-only slide; no text content.]
  • 36. [Image-only slide; no text content.]
  • 37. Answer Distribution (Density). [Chart: count of answers vs. average user rating (0.00-4.00) for Aura, Cyc, and Text QA.]
  • 38. Data Table for Answer Quality Distribution. [Table shown as an image.]
  • 39. Work Performed. Created web-based dispatcher infrastructure, for both Live Direct QA and Live Suggested Questions, with a batch mode to handle larger volumes. Built a web UI for UW students to rate answers to questions (HEF): coherent UI, duplicate removal, queued tasks. Established automatic ways for result evaluation and comparison. Employed initial file and data exchange formats and protocols. Set up a faceted browsing and search (retrieval) UI, with web services for third-party consumption. Carried out many rounds of relevance studies and analysis.
  • 40. First Evaluation via Halo Evaluation Framework. We sent each individual QA result set to UW students for evaluation. First-round hybrid system evaluation: Cyc SQA: 9 best (3 ties), 14 good, 15/60 answered; Aura QA: 1 best, 9 good, 14/60 answered; Aura SQA: 4 best (3 ties), 7 good, 8/60 answered; Text QA: 27 best, 29 good; SQA: 3 best, 5 good, 7/60 answered. Best scenario: 41/60 answered. Note: Cyc Live was not included. (SQA = answering via suggested questions.)
  • 41. Live Direct QA Dispatcher Service. [Screenshot: asking "What does ribosome make?" and waiting for answers to return.]
  • 42. Live Suggested QA Dispatcher Service
  • 43. Batch QA Dispatcher Service
  • 44. Live Solver Service Dispatchers
  • 45. Direct Live QA: What does ribosome make?
  • 46. Direct Live QA: What does ribosome make?
  • 48. Results for Suggested Question Dispatcher
  • 49. Batch Mode QA Dispatcher
  • 52. Faceted Search on Suggested Questions
  • 53. Tuning the Suggested Question Recommendation. Accomplished: indexed the suggested-questions database (concepts, questions, answers); created a web service for manual upload of new sets of suggested questions; extracted chapter information from answer text (TEXT); analyzed question types; experimented with some basic retrieval criteria. Not yet implemented: parsing the questions; more experiments (heuristics) on retrieval/ranking criteria; getting SMEs to generate training data for automatic evaluation; more feature extraction (pattern-based).
  • 54. Parsing, Indexing and Ranking. In place: a new local concept-extraction service; concepts extracted and in the index; both sentences and paragraphs in the index; basic sentence types identified; chapter and section information in the index; several ways of ranking evaluated. Not yet implemented: more sentence features (content type: questions, figures, headers, regular, review, etc.; previous and next concepts; count of concepts; clauses; universal truth; relevance or not); question parsing; more refinement of ranking; Learning to Rank?? A minimal concept-overlap ranker is sketched below.
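To make the indexing-and-ranking idea concrete, here is a minimal concept-overlap ranker over an already-built sentence index. The data shapes and example sentences are invented for illustration; the real system also used sentence types, locality, and paragraph context.

```python
def rank_sentences(question_concepts, index):
    """Rank indexed sentences by how many concepts they share with the question."""
    return sorted(index,
                  key=lambda entry: len(question_concepts & entry[1]),
                  reverse=True)

# Toy index: (sentence text, extracted concepts) pairs.
index = [
    ("Ribosomes are the sites of protein synthesis.", {"ribosome", "protein", "synthesis"}),
    ("The nucleus stores the cell's DNA.", {"nucleus", "dna", "cell"}),
]
print(rank_sentences({"ribosome", "protein"}, index)[0][0])
# -> "Ribosomes are the sites of protein synthesis."
```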
  • 56. WIP: Ranking Experiments (Ablation Study). Features are scored in four conditions: Only (Easy), Without (Easy), Only (Hard), Without (Hard). Sentence Text alone: 139/201 (easy), 31/146 (hard); Sentence Concept alone: 79/201 (easy), 13/146 (hard); Prev/Next Sentence Concept, locality info (chapter, etc.), stopword list, stemming comparison, other features (type, etc.), and weighting (variations) not yet measured. The ablation harness pattern is sketched below.
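The ablation harness itself is simple. A sketch under the assumption that some `evaluate(features)` routine runs retrieval with a given feature subset and returns the number of questions answered; the stub below just fakes that score.

```python
def run_ablation(feature_names, evaluate):
    """Score retrieval with each feature alone ("Only") and removed ("Without")."""
    all_features = set(feature_names)
    return {f: {"only": evaluate({f}), "without": evaluate(all_features - {f})}
            for f in feature_names}

def evaluate(features):
    """Stub scorer; the real system would run retrieval with these features enabled."""
    return 10 * len(features)

print(run_ablation(["sentence_text", "sentence_concept"], evaluate))
```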
  • 57. Automatic Evaluation of IR Results. Inexpensive, consistent results for tuning (always using human judgments would be expensive and somewhat inconsistent); quick turnover; works with both "easy" and "difficult" question-answer sets; validated by UW students to be trustworthy (95% accuracy on average with the threshold).
  • 58. First UW Students' Evaluation of AutoEval. Notation: 0 = right on (100% means right, 0% means wrong); -1 = false positive, meaning we gave a high score (> 50%) but the retrieved text does NOT contain or imply the answer; +1 = false negative, meaning we gave a low score (< 50%) but the retrieved text actually DOES contain or imply the answer. We gave each of 4 students 15 questions, i.e., 15 x 5 = 75 sentences and scores to rank; 5 of the questions were shared, 10 unique to each student; 23/45 questions came from the "hard" set, 22/45 from the "easy" set. The labeling scheme is sketched below.
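The 0 / -1 / +1 notation maps directly to code. A small sketch, assuming a 50% threshold and a boolean human judgment:

```python
def label_judgment(auto_score, human_found_answer, threshold=0.5):
    """0 = agreement; -1 = false positive (high score, no answer);
    +1 = false negative (low score, answer actually present)."""
    auto_positive = auto_score > threshold
    if auto_positive == human_found_answer:
        return 0
    return -1 if auto_positive else +1

print(label_judgment(0.9, False))  # -1: AutoEval said yes, the human said no
print(label_judgment(0.2, True))   # +1: AutoEval said no, the human said yes
```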
  • 59. Results: Auto-Evaluation Validity Verification. [Chart: per-student (1-4) agreement rates at the 50% and 80% thresholds.]
  • 60. The "Easy" QA Set.* Task: automatically evaluate whether retrieved sentences contain the answer. Scoring: max score and Mean Average Precision (MAP). Results using max score (threshold at 80%), over 193 regular questions and 8 yes/no questions (via concept overlap): sentence text only, 139 (69.2%); Peter's test set, 149 (74.1%); Peter's more refined set, 158 (78.6%); (lower) upper bound for IR, 170 (84.2%); Jesse's best: ?? (*The evaluation covers the IR portion ONLY; no answer pinpointing.) Both scoring rules are sketched below.
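Both scoring rules are standard. A minimal sketch follows; the 80% threshold matches the slide, while the example inputs are illustrative.

```python
def max_score_answered(sentence_scores, threshold=0.8):
    """Count a question as answered if any retrieved sentence clears the threshold."""
    return max(sentence_scores, default=0.0) >= threshold

def average_precision(relevance):
    """MAP building block: average precision over one ranked retrieval list.

    relevance: booleans, True where the ranked sentence contains the answer.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

print(max_score_answered([0.3, 0.85, 0.6]))           # True: one sentence clears 0.8
print(average_precision([True, False, True, False]))  # (1/1 + 2/3) / 2 = ~0.833
```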
  • 61. "Easy" QA Set Auto-Evaluation Result. [Chart: scores for Q text Only, Vulcan Basic, Vulcan Refined, BaseIR Current, and Upper Bound.]
  • 62. Best Upper Bound for the Hard Set as of Today. With weighting on answer text, answer concepts, question text, and question concepts; matching over sentence text, concepts, concepts from the previous and next sentences, and sentence type; comparison with keyword overlap, concept overlap, stopword removal, and smart stemming techniques.
  • 63. Sharing the Data and Knowledge. Information we want (and each solver may also want): everyone's results; everyone's confidence in their results; everyone's supporting evidence (from textbook sentences, reviews, homework sections, figures; from related web material, e.g., biology articles on Wikipedia; from common world knowledge: ParaPara, WordNet, etc.); and training data, for offline use.
  • 64. More Timeline Details for the First Integration. Under our control: AURA (now); Text (before 12/7); Vulcan IR baseline (before 12/15); initial hybrid system output (before 12/21, without a unified data format, with limited and possibly outdated suggested questions). Partners: Cyc (hopefully before EOY 2012); JHU (hopefully before EOY 2012); ReVerb (EOM January 2013).
  • 65. Rounds of Improvements. Infrastructure (modules and services): integrate solvers; data I/O. Tricks (algorithms and data): refine the hybrid strategy; heuristics + machine learning. Analysis (evaluation): evaluation with humans, for each solver and the hybrid system.
  • 66. OpenHalo. [Diagram: AURA, SILK QA, CYC QA, TEQA, and other QA systems collaborating with the Vulcan Hybrid System through a shared data service.]

Editor's Notes

  1. We've been debating whether it is necessary to evaluate a separate Information Retrieval module for comparison purposes: to see how well an Information Retrieval-based module can do as a baseline, and how much better we can add on to it, i.e., our value added.