SlideShare uma empresa Scribd logo
1 de 58
Baixar para ler offline
Watson UIMA Pablo Summary 
Apache UIMA and the Watson Jeopardy!TM 
System 
Big Data Montreal Meetup 
Pablo Ariel Duboue 
Les Laboratoires Foulab 
999 Rue du College 
Montreal, H4C 2S3, Quebec 
February 4th 2014 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
Problem 
from http://en.wikipedia.org/wiki/Jeopardy UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
Example Questions 
Category: "J.P." 
He played Duke Washburn, Curly’s twin brother, in 
"City Slickers II". 
I Answer: Jack Palance 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
About the Speaker 
I am passionate about improving society through language 
technology and split my time between teaching, doing 
research and contributing to free software projects 
I Columbia University 
I Natural Language Generation 
I Thesis: “Indirect Supervised Learning of Strategic 
Generation Logic”, defended Jan. 2005. 
I IBM Research Watson 
I Question Answering 
I Deep QA - Watson 
I Independent Research, living in Montreal 
I Collaboration with Universite de Montreal 
I Free Software projects and consulting for small companies 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
The Challenges of a Research Team 
I Blazzingly fast development 
I Quick experimental turn-around is not a “nice to have” 
feature, it is key 
I Dead code 
I Lack of documentation 
I Reproducibility of results 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
Architecture 
Incremental progress from June 2007 to November 2010, from Ferrucci (2012) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Jeopardy!TM 
The Challenges of a Grand Challenge 
I Very expensive. 
I Constantly on the verge of being canceled. 
I Plenty of issues beyond the control of the research team. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Approach 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Approach 
Approach 
I Keep all options open till the end 
I Do not overcommit 
I Propose candidate answers by perfoming searches 
I Gather supporting evindence by performing a search for 
each candidate answer (!) 
I Analyze all this wealth of information in parallel 
I Central scoring and ranking using Machine Learning 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Approach 
Architecture 
DeepQA Architecture, from Ferrucci (2012) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Approach 
Components Descriptions 
Question Analysis. Extract keywords, assign to known classes, 
expand entities. 
Primary Search. Obtain a set of documents relevant to the 
question. 
Candidate Answer Generation. Extract from the documents 
candidate answers. 
Evidence Retrieval and Scoring. Fetch passages (sentences) 
containing the candidate answers and relevant 
keywords, then score the candidates in context. 
Final Confidence Merging. Apply a trained model based on the 
evidence. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Frameworks 
I Frameworks enable: 
I Sharing and Collaboration 
I Growth 
I Deployment and Large scale implementations 
I Adoption 
I Frameworks need: 
I Maintenance (no software is ever “completed”) 
I Documentation (to further collaboration / adoption) 
I Neutrality (w.r.t. applications being implemented) 
I Ownership (on behalf of their developers / maintainers) 
I Publicity (for widespread adoption) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Enabling Sharing and Collaboration 
I Sharing within an organization 
I Code is the documentation 
I Agile sharing 
I Convention-over-configuration 
I Sharing with the world 
I Enabling the greater good, without paying a high price 
(support time, spoiling potential ventures) 
I Sharing with new / potential partners 
I Bringing new people up to speed 
I Attracting talent 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Enabling Growth 
I New phenomena 
I From syntactic parsing to semantic parsing 
I From parsing sentences to parsing USB traffic data 
I New artifacts 
I From text to speech 
I New architectures 
I From Understanding to Generation 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Enable Deployment and Large scale implementations 
I Multiple architectures 
I Windows, Linux 
I On-line vs. off-line 
I Batch corpus processing vs. user-oriented Web services 
I New programming languages (and old, efficient ones) 
I New human languages 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
What is UIMA 
I UIMA is a framework, a means to integrate text or other 
unstructured information analytics. 
I Reference implementations available for Java, C++ and 
others. 
I An Open Source project under the umbrella of the Apache 
Foundation. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Analytics Frameworks 
I Find all telephone numbers in running text 
I (((n([0-9]{3}n))|[0-9]{3})-? 
[0-9]{3}-?[0-9]{4} 
I Nice but... 
I How are you going to feed this result for further processing? 
I What about finding non-standard proper names in text? 
I Acquiring technology from external vendors, free software 
projects, etc? 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
In-line Annotations 
I Modify text to include annotations 
I This/DET happy/ADJ puppy/N 
I It gets very messy very quickly 
I (S (NP (This/DET happy/ADJ puppy/N) (VP eats/V (NP 
(the/DET bone/N))) 
I Annotations can easily cross boundaries of other 
annotations 
I He said <confidential>the project can’t go on. The funding 
is lacking.</confidential> 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Standoff Annotations 
I Standoff annotations 
I Do not modify the text 
I Keep the annotations as offsets within the original text 
I Most analytics frameworks support standoff annotations. 
I UIMA is built with standoff annotations at its core. 
I Example: 
He said the project can’t go on. The funding is lacking. 
012345678901234567890123567890123456789012345678901234567 
I Sentence Annotation: 0-32, 35-57. 
I Confidential Annotation: 8-57. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
Type Systems 
I Key to integrating analytic packages developed by 
independent vendors. 
I Clear metadata about 
I Expected Inputs 
I Tokens, sentences, proper names, etc 
I Produced Outputs 
I Parse trees, opinions, etc 
I The framework creates an unified typesystem for a given 
set of annotators being run. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Advantages 
UIMA Advantages 
I CAS 
I Memory Efficiency 
I Indices 
I Types 
I Interoperability 
I Lean protocol serialization 
I UIMA AS sends and retrieves from network nodes only the 
required information 
I (default XMI serialization is anything but lean) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
UIMA Concepts 
I Common Annotation Structure or CAS 
I Subject of Analysis (SofA or View) 
I JCas 
I Feature Structures 
I Annotations 
I Indices and Iterators 
I Analysis Engines (AEs) 
I AEs descriptors 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
Room annotator 
I From the UIMA tutorial, write an Analysis Engine that 
identifies room numbers in text. 
Yorktown patterns: 20-001, 31-206, 04-123 (Regular 
Expression Pattern: [0-9][0-9]-[0-2][0-9][0-9]) 
Hawthorne patterns: GN-K35, 1S-L07, 4N-B21 (Regular 
Expression Pattern: [G1-4][NS]-[A-Z][0-9]) 
I Steps: 
1. Define the CAS types that the annotator will use. 
2. Generate the Java classes for these types. 
3. Write the actual annotator Java code. 
4. Create the Analysis Engine descriptor. 
5. Test the annotator. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
Editing a Type System 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
The XML descriptor 
<?xml version=" 1.0 " encoding="UTF8 ? 
typeSystemDescr ipt ion xmlns= h t t p : / / uima . apache . org / resour ceSpec i f ier  
nameTutorialTypeSystem / name 
 d e s c r i p t i o n Type System De f i n i t i o n f o r the t u t o r i a l examples  
as of Exercise 1 / d e s c r i p t i o n  
vendorApache Software Foundation / vendor 
version1.0 / version 
types 
t ypeDes c r ipt ion 
nameorg . apache . uima . t u t o r i a l .RoomNumber / name 
 d e s c r i p t i o n  / d e s c r i p t i o n  
supertypeNameuima . tcas . Annotat ion / supertypeName 
features 
 f e a t u r eDe s c r i p t i o n  
name b u i l d i n g  / name 
 d e s c r i p t i o n Bui lding containing t h i s room / d e s c r i p t i o n  
rangeTypeNameuima . cas . St r i n g  / rangeTypeName 
 / f e a t u r eDe s c r i p t i o n  
 / features 
 / t ypeDes c r ipt ion 
 / types 
 / typeSystemDescr ipt ion 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
The AE code 
package org . apache . uima . t u t o r i a l . ex1 ; 
import java . u t i l . regex . Matcher ; 
import java . u t i l . regex . Pat tern ; 
import org . apache . uima . analysis_component . JCasAnnotator_ImplBase ; 
import org . apache . uima . jcas . JCas ; 
import org . apache . uima . t u t o r i a l .RoomNumber ; 
/ 
 Example annotator t h a t detects room numbers using 
 Java 1.4 regular expressions . 
/ 
public class RoomNumberAnnotator extends JCasAnnotator_ImplBase { 
pr ivate Pat tern mYorktownPattern = 
Pat tern . compile (    b[04]  d[02]d   d   b  ) ; 
pr ivate Pat tern mHawthornePattern = 
Pat tern . compile (    b [G14][NS][AZ ]   d   d   b  ) ; 
public void process ( JCas aJCas ) { 
/ / next s l i d e 
} 
} 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
The AE code (cont.) 
public void process ( JCas aJCas ) { 
/ / get document t e x t 
St r i n g docText = aJCas . getDocumentText ( ) ; 
/ / search f o r Yorktown room numbers 
Matcher matcher = mYorktownPattern . matcher ( docText ) ; 
int pos = 0; 
while (matcher . f i n d ( pos ) ) { 
/ / found one  create annotat ion 
RoomNumber annotat ion = new RoomNumber ( aJCas ) ; 
annotat ion . setBegin (matcher . s t a r t ( ) ) ; 
annotat ion . setEnd (matcher . end ( ) ) ; 
annotat ion . s e tBu i l d i n g (  Yorktown  ) ; 
annotat ion . addToIndexes ( ) ; 
pos = matcher . end ( ) ; 
} 
/ / search f o r Hawthorne room numbers 
/ / . . 
} 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
UIMA Document Analyzer 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
UIMA Document Analyzer (cont) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Tutorial 
Custom Flow Controllers 
I UIMA allows you to specify which AE will receive the CAS 
next, based on all the annotations on the CAS. 
I examples/descriptors/flow_controller/WhiteboardFlowController.xml 
I FlowController implementing a simple version of the 
“whiteboard” flow model. Each time a CAS is received, it 
looks at the pool of available AEs that have not yet run on 
that CAS, and picks one whose input requirements are 
satisfied. Limitations: only looks at types, not features. 
Does not handle multiple Sofas or CasMultipliers. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
UIMA AS: ActiveMQ 
ActiveMQ 
Broker UIMA AS 
AEs 
queue 
Client 
queue 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
UIMA AS: Wrapping Primitive AEs 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
UIMA AS: Advantages 
I Very flexible in terms of splitting the load between your 
nodes. 
I You have full control of how to split the queues into 
subqueues, etc. 
I Very efficient in terms of network overhead. 
I A CAS to be split and processed multiple times (on different 
parts) is only sent once. 
I Only the needed annotations are sent and the new 
annotations are sent back. 
I Correct metadata (descriptor files) are key for that to work 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
UIMA AS: More information 
I http://uima.apache.org/doc-uimaas-what.html 
I http://svn.apache.org/viewvc/uima/uima-as/trunk/README?view=markup 
I http://uima.apache.org/d/uima-as-2.4.2/uima_async_scaleout.html 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
Many frameworks 
I Besides UIMA 
I http://uima.apache.org 
I LingPipe 
I http://alias-i.com/lingpipe/ 
I Gate 
I http://gate.ac.uk/ 
I NLTK 
I http://www.nltk.org/ 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
UIMA Advantages 
I Apache Licensed 
I Enterprise-ready code quality 
I Demonstrated scalability 
I Developed by experts in building frameworks 
I Not domain (e.g., NLP) experts 
I Interoperable (C++, Java, others) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
UIMA AS 
How Hard is to Learn UIMA? 
I It is hard. 
I Documentation is very good but very bulky. 
I Take the time to read it cover to cover, it goes down fast. 
I Use the Eclipse tooling whenever possible. 
I Learn uimaFIT first, then pure JCas, then CAS if needed. 
I Focus on the “goodies”: 
I Apache UIMA Ruta – rule based annotators 
I OpenNLP – trained models for POS, NER, etc., easy to 
train your own 
I ClearTk – a wrapper around machine learning libraries 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
To Watson 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
To Watson 
My Contributions in the Watson System 
I Sources Team 
I Internal Tooling 
I Machine learning in watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
To Watson 
Systems Team 
Systems Team, from https://www.research.ibm.com/deepqa/. 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
To Watson 
Machine Learning in Watson 
I Multiple phases of Logistic Regression 
I Feature Engineering 
I DSL for Feature Engineering 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
To Watson 
First Four Phases of Merging and Ranking 
from Gondek, Lally, Kalyanpur, Murdock, Duboue, Zhang, Pan, Qiu, Welty (2012) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Outline 
Watson 
Jeopardy!TM 
Approach 
Apache Unstructured Information Management Architecture 
Advantages 
Mini-Tutorial 
UIMA Asynchronous Scale-out (Low-latency) 
My Own Personal Contributions 
To Watson 
After Watson 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
After Watson 
I Consulting 
I Academic Work 
I Teaching 
I Hunter Gatherer 
I Thoughtland 
I Free Software 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Consulting 
I MatchFWD: LinkedIn data 
I UrbanOrca: Facebook data 
I KeaText: legal data 
I Radialpoint: tech support data 
I Signed the open letter from Montreal tech community 
I http://wearemtltech.ca 
I Contact me at http://duboue.net 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Academic Work 
I Taught a graduate-level course in NLG in Argentina. 
I Published: 
I Pablo Duboue, Jing He and Jian-Yun Nie. “Hunter Gatherer: UdeM at 1Click-2”. NTCIR (2013). 
I Pablo Duboue. “On the Feasibility of Automatically Describing n-dimensional Objects”. EWNLG 
(2013). 
I Pablo Duboue. Thoughtland: Natural Language Descriptions for Machine Learning n-dimensional 
Error Functions (demo)”. Proc. of EWNLG (2013). 
I Jing He, Pablo Duboue, and Jian-Yun Nie. “Bridging the Gap between Intrinsic and Perceived 
Relevance in Snippet Generation” ”. COLING (2012). 
I Fabian Pacheco, Pablo Duboue, and Martin Dominguez. “On The Feasibility of Open Domain 
Referring Expression Generation Using Large Scale Folksonomies (short paper)”. NAACL (2012). 
I Pablo Duboue. “Extractive email thread summarization: Can we do better than He Said She Said?”. 
INLG (2012). 
I David Nicolas Racca, Luciana Benotti, and Pablo Duboue. “The GIVE-2.5 C Generation System” 
EWNLG (2011). 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Hunter Gatherer 
I What? 1-Click Search 
I Input: Query and 200 ranked Web pages 
I Output: a 1,000 characters summary 
I Summary should contain the information the pages relevant 
to the query. 
I A research challenge part of NTICR 
I Queries belong to 8 types (celebrities, how to, location, 
etc) 
I But the type is not explicit 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Hunter Gatherer Approach 
I Apply the DeepQA architecture to 1-Click task 
I Do not explicitly type the query 
I Hunt nuggets, gather evidence 
1. Hunt text nuggets on relevant passages 
2. Gather evidence passages that contain nuggets and query 
terms 
3. Score nuggets based on evidence 
4. Final output are sentences containing highly scored 
nuggets 
https://github.com/DrDub/hunter-gatherer 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Thoughtland 
I Generation of textual descriptions for n-dimensional data. 
I Early stage research 
I Focus on describing the error surface for Machine Learning 
models 
I Presented at the European Workshop in Natural Language 
Generation in Sofia, Bulgaria (2013) 
I Written in Scala, using Mahout on top of Hadoop for 
clustering and Weka for machine learning. 
I Demo: http://thoughtland.duboue.net 
I Code: https://github.com/DrDub/Thoughtland 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Thoughtland: Input 
I A small data set from the UCI ML repo, the Auto-Mpg Data: 
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/ 
@relation auto_mpg 
@attribute mpg numeric 
@attribute cylinders numeric 
@attribute displacement numeric 
@attribute horsepower numeric 
@attribute weight numeric 
@attribute acceleration numeric 
@attribute modelyear numeric 
@attribute origin numeric 
@data 
18.0,8,307.0,130.0,3504.,12.0,70,1 
14.0,8,455.0,225.0,3086.,10.0,70,1 
24.0,4,113.0,95.00,2372.,15.0,70,3 
22.0,6,198.0,95.00,2833.,15.5,70,1 
27.0,4,97.00,88.00,2130.,14.5,70,3 
26.0,4,97.00,46.00,1835.,20.5,70,2 
... +400 more rows 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Thoughtland: Output 
I MLP, 2 hidden layers (3, 2 units), acc. 65%, Thoughtland 
generates: 
There are four components and eight dimensions. Components One, Two and Three are 
small. Components One, Two and Three are very dense. Components Four, Three and 
One are all far from each other. The rest are all at a good distance from each other. 
I MLP, 1 hidden layer (8 units), acc. 65.7%, Thoughtland 
generates: 
There are four components and eight dimensions. Components One, Two and Three are 
small. Components One, Two and Three are very dense. Components Four and Three 
are far from each other. The rest are all at a good distance from each other. 
(difference is highlighted) 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
After Watson 
Thoughtland: Architecture 
UIMA and Watson Les Laboratoires Foulab
Watson UIMA Pablo Summary 
Summary 
I UIMA is a production ready framework for unstructured 
information processing. 
I It enables both batch processing and low latency 
processing. 
I UIMA is a framework and it contains little or no annotators 
itself. 
I But new annotators are becoming available through 
OpenNLP and ClearTk. 
I It is an efficient framework that requires commitment on 
behalf of its practitioners. 
I UIMA learning curve is fairly steep. 
UIMA and Watson Les Laboratoires Foulab

Mais conteúdo relacionado

Mais procurados

Java notes | All Basics |
Java notes | All Basics |Java notes | All Basics |
Java notes | All Basics |ShubhamAthawane
 
Introduction to Java Programming
Introduction to Java ProgrammingIntroduction to Java Programming
Introduction to Java ProgrammingRavi Kant Sahu
 
Java basic-tutorial for beginners
Java basic-tutorial for beginners Java basic-tutorial for beginners
Java basic-tutorial for beginners Muzammil Ali
 
Introduction to java
Introduction to javaIntroduction to java
Introduction to javaSteve Fort
 
Core java lessons
Core java lessonsCore java lessons
Core java lessonsvivek shah
 
PROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part IIPROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part IISivaSankari36
 

Mais procurados (11)

Java essential notes
Java essential notesJava essential notes
Java essential notes
 
Java notes | All Basics |
Java notes | All Basics |Java notes | All Basics |
Java notes | All Basics |
 
Java basic introduction
Java basic introductionJava basic introduction
Java basic introduction
 
Introduction to Java Programming
Introduction to Java ProgrammingIntroduction to Java Programming
Introduction to Java Programming
 
Java basic-tutorial for beginners
Java basic-tutorial for beginners Java basic-tutorial for beginners
Java basic-tutorial for beginners
 
Introduction to java
Introduction to javaIntroduction to java
Introduction to java
 
Java
JavaJava
Java
 
Chapter 1
Chapter 1Chapter 1
Chapter 1
 
Core java lessons
Core java lessonsCore java lessons
Core java lessons
 
Presentation on java
Presentation  on  javaPresentation  on  java
Presentation on java
 
PROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part IIPROGRAMMING IN JAVA-unit 3-part II
PROGRAMMING IN JAVA-unit 3-part II
 

Semelhante a Pablo Duboue

2017 03 25 Microsoft Hacks, How to code efficiently
2017 03 25 Microsoft Hacks, How to code efficiently2017 03 25 Microsoft Hacks, How to code efficiently
2017 03 25 Microsoft Hacks, How to code efficientlyBruno Capuano
 
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...Laura M. Castro
 
Watir Presentation Sumanth Krishna. A
Watir Presentation   Sumanth Krishna. AWatir Presentation   Sumanth Krishna. A
Watir Presentation Sumanth Krishna. ASumanth krishna
 
Automating Security Testing with the OWTF
Automating Security Testing with the OWTFAutomating Security Testing with the OWTF
Automating Security Testing with the OWTFJerod Brennen
 
QA or the Highway - Extra-functional testing, improve how you observe the sys...
QA or the Highway - Extra-functional testing, improve how you observe the sys...QA or the Highway - Extra-functional testing, improve how you observe the sys...
QA or the Highway - Extra-functional testing, improve how you observe the sys...Federico Toledo
 
Hol 1940-01-net pdf-en
Hol 1940-01-net pdf-enHol 1940-01-net pdf-en
Hol 1940-01-net pdf-endborsan
 
Beyond the basics of SonarQube: improve your Java(Script) code even further
Beyond the basics of SonarQube: improve your Java(Script) code even furtherBeyond the basics of SonarQube: improve your Java(Script) code even further
Beyond the basics of SonarQube: improve your Java(Script) code even furtherJohan Janssen
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)CODE WHITE GmbH
 
Codeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansaiCodeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansaiFlorent Batard
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programsgreenwop
 
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverThe Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverQA or the Highway
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
 
War of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowWar of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowPVS-Studio
 
Mistral and StackStorm
Mistral and StackStormMistral and StackStorm
Mistral and StackStormDmitri Zimine
 
Integrate Your Test Automation Tools for More Power
Integrate Your Test Automation Tools for More PowerIntegrate Your Test Automation Tools for More Power
Integrate Your Test Automation Tools for More PowerTechWell
 
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers Don't Drop the SOAP: Real World Web Service Testing for Web Hackers
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers Tom Eston
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech RecognitionIRJET Journal
 
How to build scalable and maintainable test automation solution for Web / API...
How to build scalable and maintainable test automation solution for Web / API...How to build scalable and maintainable test automation solution for Web / API...
How to build scalable and maintainable test automation solution for Web / API...Dmitriy Romanov
 

Semelhante a Pablo Duboue (20)

2017 03 25 Microsoft Hacks, How to code efficiently
2017 03 25 Microsoft Hacks, How to code efficiently2017 03 25 Microsoft Hacks, How to code efficiently
2017 03 25 Microsoft Hacks, How to code efficiently
 
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...
A Backpack to go the Extra-Functional Mile (a hitched hike by the PROWESS pro...
 
Watir Presentation Sumanth Krishna. A
Watir Presentation   Sumanth Krishna. AWatir Presentation   Sumanth Krishna. A
Watir Presentation Sumanth Krishna. A
 
IBM Watson
IBM WatsonIBM Watson
IBM Watson
 
Railway Route Optimizer
Railway Route OptimizerRailway Route Optimizer
Railway Route Optimizer
 
Automating Security Testing with the OWTF
Automating Security Testing with the OWTFAutomating Security Testing with the OWTF
Automating Security Testing with the OWTF
 
QA or the Highway - Extra-functional testing, improve how you observe the sys...
QA or the Highway - Extra-functional testing, improve how you observe the sys...QA or the Highway - Extra-functional testing, improve how you observe the sys...
QA or the Highway - Extra-functional testing, improve how you observe the sys...
 
Hol 1940-01-net pdf-en
Hol 1940-01-net pdf-enHol 1940-01-net pdf-en
Hol 1940-01-net pdf-en
 
Beyond the basics of SonarQube: improve your Java(Script) code even further
Beyond the basics of SonarQube: improve your Java(Script) code even furtherBeyond the basics of SonarQube: improve your Java(Script) code even further
Beyond the basics of SonarQube: improve your Java(Script) code even further
 
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)
Java Deserialization Vulnerabilities - The Forgotten Bug Class (RuhrSec Edition)
 
Codeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansaiCodeception Testing Framework -- English #phpkansai
Codeception Testing Framework -- English #phpkansai
 
Performance Analysis of Idle Programs
Performance Analysis of Idle ProgramsPerformance Analysis of Idle Programs
Performance Analysis of Idle Programs
 
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas HaverThe Automation Firehose: Be Strategic and Tactical by Thomas Haver
The Automation Firehose: Be Strategic and Tactical by Thomas Haver
 
Chaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in ProductionChaos Engineering - The Art of Breaking Things in Production
Chaos Engineering - The Art of Breaking Things in Production
 
War of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowWar of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlow
 
Mistral and StackStorm
Mistral and StackStormMistral and StackStorm
Mistral and StackStorm
 
Integrate Your Test Automation Tools for More Power
Integrate Your Test Automation Tools for More PowerIntegrate Your Test Automation Tools for More Power
Integrate Your Test Automation Tools for More Power
 
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers Don't Drop the SOAP: Real World Web Service Testing for Web Hackers
Don't Drop the SOAP: Real World Web Service Testing for Web Hackers
 
IRJET - Automation in Python using Speech Recognition
IRJET -  	  Automation in Python using Speech RecognitionIRJET -  	  Automation in Python using Speech Recognition
IRJET - Automation in Python using Speech Recognition
 
How to build scalable and maintainable test automation solution for Web / API...
How to build scalable and maintainable test automation solution for Web / API...How to build scalable and maintainable test automation solution for Web / API...
How to build scalable and maintainable test automation solution for Web / API...
 

Mais de ClusterCba

Esteban Próspero
Esteban PrósperoEsteban Próspero
Esteban PrósperoClusterCba
 
Leandro Di Persia
Leandro Di PersiaLeandro Di Persia
Leandro Di PersiaClusterCba
 
Nicolás Ramos
Nicolás RamosNicolás Ramos
Nicolás RamosClusterCba
 
Agustín Bergallo
Agustín BergalloAgustín Bergallo
Agustín BergalloClusterCba
 
Matias Cuenca Acuña
Matias Cuenca AcuñaMatias Cuenca Acuña
Matias Cuenca AcuñaClusterCba
 
Christian Oviedo
Christian OviedoChristian Oviedo
Christian OviedoClusterCba
 
Mauricio Rucci
Mauricio RucciMauricio Rucci
Mauricio RucciClusterCba
 
Presentacion _Utrera_Pticomex_060312
Presentacion _Utrera_Pticomex_060312Presentacion _Utrera_Pticomex_060312
Presentacion _Utrera_Pticomex_060312ClusterCba
 
Presentacion jaimez romero_pticomex_ 060312
Presentacion jaimez romero_pticomex_ 060312Presentacion jaimez romero_pticomex_ 060312
Presentacion jaimez romero_pticomex_ 060312ClusterCba
 
Presentacion_Mordezki_PTICOMEX
Presentacion_Mordezki_PTICOMEXPresentacion_Mordezki_PTICOMEX
Presentacion_Mordezki_PTICOMEXClusterCba
 

Mais de ClusterCba (14)

Esteban Próspero
Esteban PrósperoEsteban Próspero
Esteban Próspero
 
Leandro Di Persia
Leandro Di PersiaLeandro Di Persia
Leandro Di Persia
 
Nicolás Ramos
Nicolás RamosNicolás Ramos
Nicolás Ramos
 
Ivan Arce
Ivan ArceIvan Arce
Ivan Arce
 
Darren Camas
Darren CamasDarren Camas
Darren Camas
 
Diego May
Diego MayDiego May
Diego May
 
Agustín Bergallo
Agustín BergalloAgustín Bergallo
Agustín Bergallo
 
Matias Cuenca Acuña
Matias Cuenca AcuñaMatias Cuenca Acuña
Matias Cuenca Acuña
 
Christian Oviedo
Christian OviedoChristian Oviedo
Christian Oviedo
 
Mauricio Rucci
Mauricio RucciMauricio Rucci
Mauricio Rucci
 
Diego Casali
Diego CasaliDiego Casali
Diego Casali
 
Presentacion _Utrera_Pticomex_060312
Presentacion _Utrera_Pticomex_060312Presentacion _Utrera_Pticomex_060312
Presentacion _Utrera_Pticomex_060312
 
Presentacion jaimez romero_pticomex_ 060312
Presentacion jaimez romero_pticomex_ 060312Presentacion jaimez romero_pticomex_ 060312
Presentacion jaimez romero_pticomex_ 060312
 
Presentacion_Mordezki_PTICOMEX
Presentacion_Mordezki_PTICOMEXPresentacion_Mordezki_PTICOMEX
Presentacion_Mordezki_PTICOMEX
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Pablo Duboue

  • 1. Watson UIMA Pablo Summary Apache UIMA and the Watson Jeopardy!TM System Big Data Montreal Meetup Pablo Ariel Duboue Les Laboratoires Foulab 999 Rue du College Montreal, H4C 2S3, Quebec February 4th 2014 UIMA and Watson Les Laboratoires Foulab
  • 2. Watson UIMA Pablo Summary Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 3. Watson UIMA Pablo Summary Jeopardy!TM Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 4. Watson UIMA Pablo Summary Jeopardy!TM Problem from http://en.wikipedia.org/wiki/Jeopardy UIMA and Watson Les Laboratoires Foulab
  • 5. Watson UIMA Pablo Summary Jeopardy!TM Example Questions Category: "J.P." He played Duke Washburn, Curly’s twin brother, in "City Slickers II". I Answer: Jack Palance UIMA and Watson Les Laboratoires Foulab
  • 6. Watson UIMA Pablo Summary Jeopardy!TM About the Speaker I am passionate about improving society through language technology and split my time between teaching, doing research and contributing to free software projects I Columbia University I Natural Language Generation I Thesis: “Indirect Supervised Learning of Strategic Generation Logic”, defended Jan. 2005. I IBM Research Watson I Question Answering I Deep QA - Watson I Independent Research, living in Montreal I Collaboration with Universite de Montreal I Free Software projects and consulting for small companies UIMA and Watson Les Laboratoires Foulab
  • 7. Watson UIMA Pablo Summary Jeopardy!TM The Challenges of a Research Team I Blazzingly fast development I Quick experimental turn-around is not a “nice to have” feature, it is key I Dead code I Lack of documentation I Reproducibility of results UIMA and Watson Les Laboratoires Foulab
  • 8. Watson UIMA Pablo Summary Jeopardy!TM Architecture Incremental progress from June 2007 to November 2010, from Ferrucci (2012) UIMA and Watson Les Laboratoires Foulab
  • 9. Watson UIMA Pablo Summary Jeopardy!TM The Challenges of a Grand Challenge I Very expensive. I Constantly on the verge of being canceled. I Plenty of issues beyond the control of the research team. UIMA and Watson Les Laboratoires Foulab
  • 10. Watson UIMA Pablo Summary Approach Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 11. Watson UIMA Pablo Summary Approach Approach I Keep all options open till the end I Do not overcommit I Propose candidate answers by perfoming searches I Gather supporting evindence by performing a search for each candidate answer (!) I Analyze all this wealth of information in parallel I Central scoring and ranking using Machine Learning UIMA and Watson Les Laboratoires Foulab
  • 12. Watson UIMA Pablo Summary Approach Architecture DeepQA Architecture, from Ferrucci (2012) UIMA and Watson Les Laboratoires Foulab
  • 13. Watson UIMA Pablo Summary Approach Components Descriptions Question Analysis. Extract keywords, assign to known classes, expand entities. Primary Search. Obtain a set of documents relevant to the question. Candidate Answer Generation. Extract from the documents candidate answers. Evidence Retrieval and Scoring. Fetch passages (sentences) containing the candidate answers and relevant keywords, then score the candidates in context. Final Confidence Merging. Apply a trained model based on the evidence. UIMA and Watson Les Laboratoires Foulab
  • 14. Watson UIMA Pablo Summary Advantages Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 15. Watson UIMA Pablo Summary Advantages Frameworks I Frameworks enable: I Sharing and Collaboration I Growth I Deployment and Large scale implementations I Adoption I Frameworks need: I Maintenance (no software is ever “completed”) I Documentation (to further collaboration / adoption) I Neutrality (w.r.t. applications being implemented) I Ownership (on behalf of their developers / maintainers) I Publicity (for widespread adoption) UIMA and Watson Les Laboratoires Foulab
  • 16. Watson UIMA Pablo Summary Advantages Enabling Sharing and Collaboration I Sharing within an organization I Code is the documentation I Agile sharing I Convention-over-configuration I Sharing with the world I Enabling the greater good, without paying a high price (support time, spoiling potential ventures) I Sharing with new / potential partners I Bringing new people up to speed I Attracting talent UIMA and Watson Les Laboratoires Foulab
  • 17. Watson UIMA Pablo Summary Advantages Enabling Growth I New phenomena I From syntactic parsing to semantic parsing I From parsing sentences to parsing USB traffic data I New artifacts I From text to speech I New architectures I From Understanding to Generation UIMA and Watson Les Laboratoires Foulab
  • 18. Watson UIMA Pablo Summary Advantages Enable Deployment and Large scale implementations I Multiple architectures I Windows, Linux I On-line vs. off-line I Batch corpus processing vs. user-oriented Web services I New programming languages (and old, efficient ones) I New human languages UIMA and Watson Les Laboratoires Foulab
  • 19. Watson UIMA Pablo Summary Advantages What is UIMA I UIMA is a framework, a means to integrate text or other unstructured information analytics. I Reference implementations available for Java, C++ and others. I An Open Source project under the umbrella of the Apache Foundation. UIMA and Watson Les Laboratoires Foulab
  • 20. Watson UIMA Pablo Summary Advantages Analytics Frameworks I Find all telephone numbers in running text I (((n([0-9]{3}n))|[0-9]{3})-? [0-9]{3}-?[0-9]{4} I Nice but... I How are you going to feed this result for further processing? I What about finding non-standard proper names in text? I Acquiring technology from external vendors, free software projects, etc? UIMA and Watson Les Laboratoires Foulab
  • 21. Watson UIMA Pablo Summary Advantages In-line Annotations I Modify text to include annotations I This/DET happy/ADJ puppy/N I It gets very messy very quickly I (S (NP (This/DET happy/ADJ puppy/N) (VP eats/V (NP (the/DET bone/N))) I Annotations can easily cross boundaries of other annotations I He said <confidential>the project can’t go on. The funding is lacking.</confidential> UIMA and Watson Les Laboratoires Foulab
  • 22. Watson UIMA Pablo Summary Advantages Standoff Annotations I Standoff annotations I Do not modify the text I Keep the annotations as offsets within the original text I Most analytics frameworks support standoff annotations. I UIMA is built with standoff annotations at its core. I Example: He said the project can’t go on. The funding is lacking. 012345678901234567890123567890123456789012345678901234567 I Sentence Annotation: 0-32, 35-57. I Confidential Annotation: 8-57. UIMA and Watson Les Laboratoires Foulab
  • 23. Watson UIMA Pablo Summary Advantages Type Systems I Key to integrating analytic packages developed by independent vendors. I Clear metadata about I Expected Inputs I Tokens, sentences, proper names, etc I Produced Outputs I Parse trees, opinions, etc I The framework creates an unified typesystem for a given set of annotators being run. UIMA and Watson Les Laboratoires Foulab
  • 24. Watson UIMA Pablo Summary Advantages UIMA Advantages I CAS I Memory Efficiency I Indices I Types I Interoperability I Lean protocol serialization I UIMA AS sends and retrieves from network nodes only the required information I (default XMI serialization is anything but lean) UIMA and Watson Les Laboratoires Foulab
  • 25. Watson UIMA Pablo Summary Tutorial Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 26. Watson UIMA Pablo Summary Tutorial UIMA Concepts I Common Annotation Structure or CAS I Subject of Analysis (SofA or View) I JCas I Feature Structures I Annotations I Indices and Iterators I Analysis Engines (AEs) I AEs descriptors UIMA and Watson Les Laboratoires Foulab
  • 27. Watson UIMA Pablo Summary Tutorial Room annotator I From the UIMA tutorial, write an Analysis Engine that identifies room numbers in text. Yorktown patterns: 20-001, 31-206, 04-123 (Regular Expression Pattern: [0-9][0-9]-[0-2][0-9][0-9]) Hawthorne patterns: GN-K35, 1S-L07, 4N-B21 (Regular Expression Pattern: [G1-4][NS]-[A-Z][0-9]) I Steps: 1. Define the CAS types that the annotator will use. 2. Generate the Java classes for these types. 3. Write the actual annotator Java code. 4. Create the Analysis Engine descriptor. 5. Test the annotator. UIMA and Watson Les Laboratoires Foulab
  • 28. Watson UIMA Pablo Summary Tutorial Editing a Type System UIMA and Watson Les Laboratoires Foulab
  • 29. Watson UIMA Pablo Summary Tutorial The XML descriptor <?xml version=" 1.0 " encoding="UTF8 ? typeSystemDescr ipt ion xmlns= h t t p : / / uima . apache . org / resour ceSpec i f ier nameTutorialTypeSystem / name d e s c r i p t i o n Type System De f i n i t i o n f o r the t u t o r i a l examples as of Exercise 1 / d e s c r i p t i o n vendorApache Software Foundation / vendor version1.0 / version types t ypeDes c r ipt ion nameorg . apache . uima . t u t o r i a l .RoomNumber / name d e s c r i p t i o n / d e s c r i p t i o n supertypeNameuima . tcas . Annotat ion / supertypeName features f e a t u r eDe s c r i p t i o n name b u i l d i n g / name d e s c r i p t i o n Bui lding containing t h i s room / d e s c r i p t i o n rangeTypeNameuima . cas . St r i n g / rangeTypeName / f e a t u r eDe s c r i p t i o n / features / t ypeDes c r ipt ion / types / typeSystemDescr ipt ion UIMA and Watson Les Laboratoires Foulab
  • 30. Watson UIMA Pablo Summary Tutorial The AE code package org . apache . uima . t u t o r i a l . ex1 ; import java . u t i l . regex . Matcher ; import java . u t i l . regex . Pat tern ; import org . apache . uima . analysis_component . JCasAnnotator_ImplBase ; import org . apache . uima . jcas . JCas ; import org . apache . uima . t u t o r i a l .RoomNumber ; / Example annotator t h a t detects room numbers using Java 1.4 regular expressions . / public class RoomNumberAnnotator extends JCasAnnotator_ImplBase { pr ivate Pat tern mYorktownPattern = Pat tern . compile ( b[04] d[02]d d b ) ; pr ivate Pat tern mHawthornePattern = Pat tern . compile ( b [G14][NS][AZ ] d d b ) ; public void process ( JCas aJCas ) { / / next s l i d e } } UIMA and Watson Les Laboratoires Foulab
  • 31. Watson UIMA Pablo Summary Tutorial The AE code (cont.) public void process ( JCas aJCas ) { / / get document t e x t St r i n g docText = aJCas . getDocumentText ( ) ; / / search f o r Yorktown room numbers Matcher matcher = mYorktownPattern . matcher ( docText ) ; int pos = 0; while (matcher . f i n d ( pos ) ) { / / found one create annotat ion RoomNumber annotat ion = new RoomNumber ( aJCas ) ; annotat ion . setBegin (matcher . s t a r t ( ) ) ; annotat ion . setEnd (matcher . end ( ) ) ; annotat ion . s e tBu i l d i n g ( Yorktown ) ; annotat ion . addToIndexes ( ) ; pos = matcher . end ( ) ; } / / search f o r Hawthorne room numbers / / . . } UIMA and Watson Les Laboratoires Foulab
  • 32. Watson UIMA Pablo Summary Tutorial UIMA Document Analyzer UIMA and Watson Les Laboratoires Foulab
  • 33. Watson UIMA Pablo Summary Tutorial UIMA Document Analyzer (cont) UIMA and Watson Les Laboratoires Foulab
  • 34. Watson UIMA Pablo Summary Tutorial Custom Flow Controllers I UIMA allows you to specify which AE will receive the CAS next, based on all the annotations on the CAS. I examples/descriptors/flow_controller/WhiteboardFlowController.xml I FlowController implementing a simple version of the “whiteboard” flow model. Each time a CAS is received, it looks at the pool of available AEs that have not yet run on that CAS, and picks one whose input requirements are satisfied. Limitations: only looks at types, not features. Does not handle multiple Sofas or CasMultipliers. UIMA and Watson Les Laboratoires Foulab
  • 35. Watson UIMA Pablo Summary UIMA AS Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 36. Watson UIMA Pablo Summary UIMA AS UIMA AS: ActiveMQ ActiveMQ Broker UIMA AS AEs queue Client queue UIMA and Watson Les Laboratoires Foulab
  • 37. Watson UIMA Pablo Summary UIMA AS UIMA AS: Wrapping Primitive AEs UIMA and Watson Les Laboratoires Foulab
  • 38. Watson UIMA Pablo Summary UIMA AS UIMA AS: Advantages I Very flexible in terms of splitting the load between your nodes. I You have full control of how to split the queues into subqueues, etc. I Very efficient in terms of network overhead. I A CAS to be split and processed multiple times (on different parts) is only sent once. I Only the needed annotations are sent and the new annotations are sent back. I Correct metadata (descriptor files) are key for that to work UIMA and Watson Les Laboratoires Foulab
  • 39. Watson UIMA Pablo Summary UIMA AS UIMA AS: More information I http://uima.apache.org/doc-uimaas-what.html I http://svn.apache.org/viewvc/uima/uima-as/trunk/README?view=markup I http://uima.apache.org/d/uima-as-2.4.2/uima_async_scaleout.html UIMA and Watson Les Laboratoires Foulab
  • 40. Watson UIMA Pablo Summary UIMA AS Many frameworks I Besides UIMA I http://uima.apache.org I LingPipe I http://alias-i.com/lingpipe/ I Gate I http://gate.ac.uk/ I NLTK I http://www.nltk.org/ UIMA and Watson Les Laboratoires Foulab
  • 41. Watson UIMA Pablo Summary UIMA AS UIMA Advantages I Apache Licensed I Enterprise-ready code quality I Demonstrated scalability I Developed by experts in building frameworks I Not domain (e.g., NLP) experts I Interoperable (C++, Java, others) UIMA and Watson Les Laboratoires Foulab
  • 42. Watson UIMA Pablo Summary UIMA AS How Hard is to Learn UIMA? I It is hard. I Documentation is very good but very bulky. I Take the time to read it cover to cover, it goes down fast. I Use the Eclipse tooling whenever possible. I Learn uimaFIT first, then pure JCas, then CAS if needed. I Focus on the “goodies”: I Apache UIMA Ruta – rule based annotators I OpenNLP – trained models for POS, NER, etc., easy to train your own I ClearTk – a wrapper around machine learning libraries UIMA and Watson Les Laboratoires Foulab
  • 43. Watson UIMA Pablo Summary To Watson Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 44. Watson UIMA Pablo Summary To Watson My Contributions in the Watson System I Sources Team I Internal Tooling I Machine learning in watson UIMA and Watson Les Laboratoires Foulab
  • 45. Watson UIMA Pablo Summary To Watson Systems Team Systems Team, from https://www.research.ibm.com/deepqa/. UIMA and Watson Les Laboratoires Foulab
  • 46. Watson UIMA Pablo Summary To Watson Machine Learning in Watson I Multiple phases of Logistic Regression I Feature Engineering I DSL for Feature Engineering UIMA and Watson Les Laboratoires Foulab
  • 47. Watson UIMA Pablo Summary To Watson First Four Phases of Merging and Ranking from Gondek, Lally, Kalyanpur, Murdock, Duboue, Zhang, Pan, Qiu, Welty (2012) UIMA and Watson Les Laboratoires Foulab
  • 48. Watson UIMA Pablo Summary After Watson Outline Watson Jeopardy!TM Approach Apache Unstructured Information Management Architecture Advantages Mini-Tutorial UIMA Asynchronous Scale-out (Low-latency) My Own Personal Contributions To Watson After Watson UIMA and Watson Les Laboratoires Foulab
  • 49. Watson UIMA Pablo Summary After Watson After Watson I Consulting I Academic Work I Teaching I Hunter Gatherer I Thoughtland I Free Software UIMA and Watson Les Laboratoires Foulab
  • 50. Watson UIMA Pablo Summary After Watson Consulting I MatchFWD: LinkedIn data I UrbanOrca: Facebook data I KeaText: legal data I Radialpoint: tech support data I Signed the open letter from Montreal tech community I http://wearemtltech.ca I Contact me at http://duboue.net UIMA and Watson Les Laboratoires Foulab
  • 51. Watson UIMA Pablo Summary After Watson Academic Work I Taught a graduate-level course in NLG in Argentina. I Published: I Pablo Duboue, Jing He and Jian-Yun Nie. “Hunter Gatherer: UdeM at 1Click-2”. NTCIR (2013). I Pablo Duboue. “On the Feasibility of Automatically Describing n-dimensional Objects”. EWNLG (2013). I Pablo Duboue. Thoughtland: Natural Language Descriptions for Machine Learning n-dimensional Error Functions (demo)”. Proc. of EWNLG (2013). I Jing He, Pablo Duboue, and Jian-Yun Nie. “Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation” ”. COLING (2012). I Fabian Pacheco, Pablo Duboue, and Martin Dominguez. “On The Feasibility of Open Domain Referring Expression Generation Using Large Scale Folksonomies (short paper)”. NAACL (2012). I Pablo Duboue. “Extractive email thread summarization: Can we do better than He Said She Said?”. INLG (2012). I David Nicolas Racca, Luciana Benotti, and Pablo Duboue. “The GIVE-2.5 C Generation System” EWNLG (2011). UIMA and Watson Les Laboratoires Foulab
  • 52. Watson UIMA Pablo Summary After Watson Hunter Gatherer I What? 1-Click Search I Input: Query and 200 ranked Web pages I Output: a 1,000 characters summary I Summary should contain the information the pages relevant to the query. I A research challenge part of NTICR I Queries belong to 8 types (celebrities, how to, location, etc) I But the type is not explicit UIMA and Watson Les Laboratoires Foulab
  • 53. Watson UIMA Pablo Summary After Watson Hunter Gatherer Approach I Apply the DeepQA architecture to 1-Click task I Do not explicitly type the query I Hunt nuggets, gather evidence 1. Hunt text nuggets on relevant passages 2. Gather evidence passages that contain nuggets and query terms 3. Score nuggets based on evidence 4. Final output are sentences containing highly scored nuggets https://github.com/DrDub/hunter-gatherer UIMA and Watson Les Laboratoires Foulab
  • 54. Watson UIMA Pablo Summary After Watson Thoughtland I Generation of textual descriptions for n-dimensional data. I Early stage research I Focus on describing the error surface for Machine Learning models I Presented at the European Workshop in Natural Language Generation in Sofia, Bulgaria (2013) I Written in Scala, using Mahout on top of Hadoop for clustering and Weka for machine learning. I Demo: http://thoughtland.duboue.net I Code: https://github.com/DrDub/Thoughtland UIMA and Watson Les Laboratoires Foulab
  • 55. Watson UIMA Pablo Summary After Watson Thoughtland: Input I A small data set from the UCI ML repo, the Auto-Mpg Data: http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/ @relation auto_mpg @attribute mpg numeric @attribute cylinders numeric @attribute displacement numeric @attribute horsepower numeric @attribute weight numeric @attribute acceleration numeric @attribute modelyear numeric @attribute origin numeric @data 18.0,8,307.0,130.0,3504.,12.0,70,1 14.0,8,455.0,225.0,3086.,10.0,70,1 24.0,4,113.0,95.00,2372.,15.0,70,3 22.0,6,198.0,95.00,2833.,15.5,70,1 27.0,4,97.00,88.00,2130.,14.5,70,3 26.0,4,97.00,46.00,1835.,20.5,70,2 ... +400 more rows UIMA and Watson Les Laboratoires Foulab
  • 56. Watson UIMA Pablo Summary After Watson Thoughtland: Output I MLP, 2 hidden layers (3, 2 units), acc. 65%, Thoughtland generates: There are four components and eight dimensions. Components One, Two and Three are small. Components One, Two and Three are very dense. Components Four, Three and One are all far from each other. The rest are all at a good distance from each other. I MLP, 1 hidden layer (8 units), acc. 65.7%, Thoughtland generates: There are four components and eight dimensions. Components One, Two and Three are small. Components One, Two and Three are very dense. Components Four and Three are far from each other. The rest are all at a good distance from each other. (difference is highlighted) UIMA and Watson Les Laboratoires Foulab
  • 57. Watson UIMA Pablo Summary After Watson Thoughtland: Architecture UIMA and Watson Les Laboratoires Foulab
  • 58. Watson UIMA Pablo Summary Summary I UIMA is a production ready framework for unstructured information processing. I It enables both batch processing and low latency processing. I UIMA is a framework and it contains little or no annotators itself. I But new annotators are becoming available through OpenNLP and ClearTk. I It is an efficient framework that requires commitment on behalf of its practitioners. I UIMA learning curve is fairly steep. UIMA and Watson Les Laboratoires Foulab