SystemT:
Declarative Information Extraction
Laura Chiticariu, IBM Research
Cognitive Systems Institute Speaker Series
11/17/2016
• Use cases
• Requirements
• SystemT
• SystemT courses
Outline
Case Study 1: Sentiment Analysis
Text Analytics
Product catalog, Customer Master Data, …
Social Media
360° Profile
• Products Interests
• Relationships
• Personal Attributes
• Life Events
Statistical Analysis, Report Gen.
[Chart: Bank 1 15%, Bank 2 15%, Bank 3 5%, Bank 4 21%, Bank 5 20%, Bank 6 23%]
Scale
500M+ tweets a day,
100M+ consumers, …
Breadth
Buzz, intent, sentiment, life
events, personal atts, …
Complexity
Sarcasm, wishful thinking, …
Case Study 2: Machine Analysis
Raw Logs: Web Server Logs, Custom Application Logs, Database Logs
Parse: Parsed Log Records
Extract: Linkage Information
Integrate: End-to-End Application Sessions
Analyze: Graphical Models (Anomaly detection), Correlation Analysis
Correlation Analysis (columns: Cause, Effect, Delay, Correlation)
• Delay 5 min, Correlation 0.85
• Delay 10 min, Correlation 0.62
• Every customer has unique
components with unique log record
formats
• Need to quickly customize all stages
of analysis for these custom log data
sources
Expressivity
• Need to express complex algorithms for a variety of tasks and data sources
Scalability
• Large number of documents and large-size documents
Transparency
• Every customer’s data and problems are unique in some way
• Need to easily comprehend, debug and enhance extractors
Enterprise Requirements
Expressivity Example: Different Kinds of Parses
Natural Language
We are raising our tablet forecast.
[Figure: dependency tree of the sentence, with nodes labeled S, NP and DET and edges labeled subj, pred and obj]
Machine Log
Oct 1 04:12:24 9.1.1.3 41865: %PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected
Time: Oct 1 04:12:24
Host: 9.1.1.3
Process: 41865
Category: %PLATFORM_ENV-1-DUAL_PWR
Message: Faulty internal power supply B detected
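The machine log parse above can be sketched in a few lines of Java. This is only an illustration, not how SystemT does it: the class name LogLineParser and the regular expression are assumptions fitted to the single log line shown on the slide, and real log formats will differ per customer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    // Hypothetical layout, inferred from the one example line on the slide:
    // "<Mon> <day> <hh:mm:ss> <host> <process>: <category>: <message>"
    private static final Pattern LINE = Pattern.compile(
        "^(\\w{3}\\s+\\d{1,2}\\s+\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+(\\d+):\\s+(%[\\w-]+):\\s+(.*)$");

    /** Returns the parsed fields in slide order, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> record = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return record; // unrecognized format
        }
        record.put("Time", m.group(1));
        record.put("Host", m.group(2));
        record.put("Process", m.group(3));
        record.put("Category", m.group(4));
        record.put("Message", m.group(5));
        return record;
    }

    public static void main(String[] args) {
        String raw = "Oct 1 04:12:24 9.1.1.3 41865: %PLATFORM_ENV-1-DUAL_PWR: "
            + "Faulty internal power supply B detected";
        parse(raw).forEach((field, value) -> System.out.println(field + ": " + value));
    }
}
```

A hand-written pattern like this has to be redone for every new log format, which is the customization cost the requirements slides point at.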
Expressivity Example: Fact Extraction (Tables)
Singapore 2012 Annual Report
(136 pages PDF)
Identify note breaking down
Operating expenses line item,
and extract opex components
Identify line item for Operating
expenses from Income statement
(financial table in pdf document)
• Social Media: Twitter alone has 500M+ messages / day; 1TB+ per day
• Financial Data: SEC alone has 20M+ filings, several TBs of data, with documents ranging from a few KBs to a few MBs
• Machine Data: One application server under moderate load at medium logging level → 1GB of logs per day
Scalability Example
Transparency Example: Need to incorporate in-situ data
Intel's 2013 capex is elevated at 23% of sales, above average of 16%
FHLMC reported $4.4bn net loss and requested $6bn in capital from Treasury.
I'm still hearing from clients that Merrill's website is better.
Customer or
competitor?
Good or bad?
Entity of interest
I need to go back to Walmart, Toys R Us has the same
toy $10 cheaper!
AQL Language: Specify extractor semantics declaratively
Optimizer: Choose efficient execution plan that implements semantics
Operator Runtime
Example AQL Extractor
Fundamental Results & Theorems
• Expressivity: The class of extraction tasks expressible in AQL is a strict superset of that expressible through cascaded regular automata.
• Performance: For any acyclic token-based finite state transducer T, there exists an operator graph G such that evaluating T and G has the same computational complexity.
create view PersonPhone as
select P.name as person, N.number as phone
from Person P, PhoneNum N, Sentence S
where
Follows(P.name, N.number, 0, 30)
and Contains(S.sentence, P.name)
and Contains(S.sentence, N.number)
and ContainsRegex( /\b(phone|at)\b/ ,
SpanBetween(P.name, N.number) );
Within a single sentence
<Person> <PhoneNum>
0-30 chars
Contains "phone" or "at"
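To make the semantics of the PersonPhone view concrete, here is a rough Java emulation of its four predicates over character spans. It is a sketch under stated assumptions, not SystemT code: the class PersonPhoneSketch, the toy PERSON, PHONE and SENTENCE patterns, and the helper names are all hypothetical stand-ins for the Person, PhoneNum and Sentence views that the AQL takes as given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PersonPhoneSketch {
    /** Character span [begin, end) over the document text. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
        String text(String doc) { return doc.substring(begin, end); }
    }

    // Toy stand-ins (assumptions) for the Person, PhoneNum and Sentence views.
    static final Pattern PERSON = Pattern.compile("\\b(Anna|James)\\b");
    static final Pattern PHONE = Pattern.compile("\\b\\d{3}-\\d{4}\\b");
    static final Pattern SENTENCE = Pattern.compile("[^.?!]+[.?!]?");
    static final Pattern CLUE = Pattern.compile("\\b(phone|at)\\b");

    static List<Span> find(Pattern p, String doc) {
        List<Span> out = new ArrayList<>();
        Matcher m = p.matcher(doc);
        while (m.find()) out.add(new Span(m.start(), m.end()));
        return out;
    }

    /** Follows(a, b, min, max): b starts between min and max characters after a ends. */
    static boolean follows(Span a, Span b, int min, int max) {
        int gap = b.begin - a.end;
        return gap >= min && gap <= max;
    }

    /** Contains(outer, inner): inner lies completely inside outer. */
    static boolean contains(Span outer, Span inner) {
        return outer.begin <= inner.begin && inner.end <= outer.end;
    }

    static List<String> personPhone(String doc) {
        List<String> out = new ArrayList<>();
        for (Span s : find(SENTENCE, doc)) {
            for (Span p : find(PERSON, doc)) {
                for (Span n : find(PHONE, doc)) {
                    // follows() is checked first, so p.end <= n.begin holds for substring().
                    if (follows(p, n, 0, 30) && contains(s, p) && contains(s, n)
                            && CLUE.matcher(doc.substring(p.end, n.begin)).find()) {
                        out.add(p.text(doc) + " -> " + n.text(doc));
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Call Anna at 555-1234 today. James is away. His phone is 555-9999.";
        System.out.println(personPhone(doc));
    }
}
```

The nested loops spell out one fixed evaluation order. The AQL view states only the predicates and leaves the order to the optimizer.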
More than 35 papers across ACL, EMNLP, VLDB, PODS and SIGMOD
SystemT: IBM Research project started in 2006
SystemT Architecture
package com.ibm.avatar.algebra.util.sentence;
import java.io.BufferedWriter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.regex.Matcher;
public class SentenceChunker
{
private Matcher sentenceEndingMatcher = null;
public static BufferedWriter sentenceBufferedWriter = null;
private HashSet<String> abbreviations = new HashSet<String> ();
public SentenceChunker ()
{
}
/** Constructor that takes in the abbreviations directly. */
public SentenceChunker (String[] abbreviations)
{
// Generate the abbreviations directly.
for (String abbr : abbreviations) {
this.abbreviations.add (abbr);
}
}
/**
* @param doc the document text to be analyzed
* @return true if the document contains at least one sentence boundary
*/
public boolean containsSentenceBoundary (String doc)
{
String origDoc = doc;
/*
* Based on getSentenceOffsetArrayList()
*/
// String origDoc = doc;
// int dotpos, quepos, exclpos, newlinepos;
int boundary;
int currentOffset = 0;
do {
/* Get the next tentative boundary for the sentenceString */
setDocumentForObtainingBoundaries (doc);
boundary = getNextCandidateBoundary ();
if (boundary != -1) {
String candidate = doc.substring (0, boundary + 1);
String remainder = doc.substring (boundary + 1);
/*
* Looks at the last character of the String. If this last
* character is part of an abbreviation (as detected by
* REGEX) then the sentenceString is not a fullSentence and
* "false" is returned
*/
// while (!(isFullSentence(candidate) &&
// doesNotBeginWithCaps(remainder))) {
while (!(doesNotBeginWithPunctuation (remainder)
&& isFullSentence (candidate))) {
/* Get the next tentative boundary for the sentenceString */
int nextBoundary = getNextCandidateBoundary ();
if (nextBoundary == -1) {
break;
}
boundary = nextBoundary;
candidate = doc.substring (0, boundary + 1);
remainder = doc.substring (boundary + 1);
}
if (candidate.length () > 0) {
// sentences.addElement(candidate.trim().replaceAll("\n", " "));
// sentenceArrayList.add(new Integer(currentOffset + boundary
// + 1));
// currentOffset += boundary + 1;
// Found a sentence boundary. If the boundary is the last
// character in the string, we don't consider it to be
// contained within the string.
int baseOffset = currentOffset + boundary + 1;
if (baseOffset < origDoc.length ()) {
// System.err.printf("Sentence ends at %d of %d\n",
// baseOffset, origDoc.length());
return true;
}
else {
return false;
}
}
// origDoc.substring(0,currentOffset));
// doc = doc.substring(boundary + 1);
doc = remainder;
}
}
while (boundary != -1);
// If we get here, didn't find any boundaries.
return false;
}
public ArrayList<Integer> getSentenceOffsetArrayList (String doc)
{
ArrayList<Integer> sentenceArrayList = new ArrayList<Integer> ();
// String origDoc = doc;
// int dotpos, quepos, exclpos, newlinepos;
int boundary;
int currentOffset = 0;
sentenceArrayList.add (new Integer (0));
do {
/* Get the next tentative boundary for the sentenceString */
setDocumentForObtainingBoundaries (doc);
boundary = getNextCandidateBoundary ();
if (boundary != -1) {
String candidate = doc.substring (0, boundary + 1);
String remainder = doc.substring (boundary + 1);
/*
* Looks at the last character of the String. If this last character
* is part of an abbreviation (as detected by REGEX) then the
* sentenceString is not a fullSentence and "false" is returned
*/
// while (!(isFullSentence(candidate) &&
// doesNotBeginWithCaps(remainder))) {
while (!(doesNotBeginWithPunctuation (remainder) &&
isFullSentence (candidate))) {
/* Get the next tentative boundary for the sentenceString */
int nextBoundary = getNextCandidateBoundary ();
if (nextBoundary == -1) {
break;
}
boundary = nextBoundary;
candidate = doc.substring (0, boundary + 1);
remainder = doc.substring (boundary + 1);
}
if (candidate.length () > 0) {
sentenceArrayList.add (new Integer (currentOffset + boundary + 1));
currentOffset += boundary + 1;
}
// origDoc.substring(0,currentOffset));
doc = remainder;
}
}
while (boundary != -1);
if (doc.length () > 0) {
sentenceArrayList.add (new Integer (currentOffset + doc.length ()));
}
sentenceArrayList.trimToSize ();
return sentenceArrayList;
}
private void setDocumentForObtainingBoundaries (String doc)
{
sentenceEndingMatcher = SentenceConstants.
sentenceEndingPattern.matcher (doc);
}
private int getNextCandidateBoundary ()
{
if (sentenceEndingMatcher.find ()) {
return sentenceEndingMatcher.start ();
}
else
return -1;
}
private boolean doesNotBeginWithPunctuation (String remainder)
{
Matcher m = SentenceConstants.punctuationPattern.matcher (remainder);
return (!m.find ());
}
private String getLastWord (String cand)
{
Matcher lastWordMatcher = SentenceConstants.lastWordPattern.matcher (cand);
if (lastWordMatcher.find ()) {
return lastWordMatcher.group ();
}
else {
return "";
}
}
/*
* Looks at the last character of the String. If this last character is
* part of an abbreviation (as detected by REGEX)
* then the sentenceString is not a fullSentence and "false" is returned
*/
private boolean isFullSentence (String cand)
{
// cand = cand.replaceAll("\n", " "); cand = " " + cand;
Matcher validSentenceBoundaryMatcher =
SentenceConstants.validSentenceBoundaryPattern.matcher (cand);
if (validSentenceBoundaryMatcher.find ()) return true;
Matcher abbrevMatcher = SentenceConstants.abbrevPattern.matcher (cand);
if (abbrevMatcher.find ()) {
return false; // Means it ends with an abbreviation
}
else {
// Check if the last word of the sentenceString has an entry in the
// abbreviations dictionary (like Mr etc.)
String lastword = getLastWord (cand);
if (abbreviations.contains (lastword)) { return false; }
}
return true;
}
}
Java Implementation of Sentence Boundary Detection
Expressivity: AQL vs. Custom Java Code
create dictionary AbbrevDict from file
'abbreviation.dict';
create view SentenceBoundary as
select R.match as boundary
from ( extract regex /(([.?!]+\s)|(\n\s*\n))/
on D.text as match from Document D ) R
where
Not(ContainsDict('AbbrevDict',
CombineSpans(LeftContextTok(R.match, 1),R.match)));
Equivalent AQL Implementation
Expressivity: Rich Set of Operators
Language to express NLP Algorithms → AQL
• Deep Syntactic Parsing
• Semantic Role Labels
• ML Training & Scoring
• Core Operators: Tokenization, Parts of Speech, Dictionaries, Regular Expressions, HTML/XML Detagging, Span Operations, Relational Operations, Aggregation Operations, …
Expressivity: Multilingual Support
create dictionary PurchaseVerbs as
('buy.01', 'purchase.01', 'acquire.01', 'get.01');
create view Relation as
select A.verb as BUY_VERB, R.head as PURCHASE, A.polarity as WILL_BUY
from Action A, Role R
where
MatchesDict('PurchaseVerbs', A.verbClass)
and Equals(A.aid, R.aid)
and Equals(R.type, 'A1');
ACL ’15, ’16, EMNLP ’16, COLING ’16
• Parsing all languages into same
predicate argument structure
(language-independent)
• Common predicate argument
structure surfaced in AQL
Scalability: Class of Optimizations in SystemT
• Rewrite-based: rewrite algebraic operator graph
– Shared Dictionary Matching
– Shared Regular Expression Evaluation
– Regular Expression Strength Reduction
– On-demand tokenization
• Cost-based: relies on novel selectivity estimation for
text-specific operators
– Standard transformations
• E.g., push down selections
– Restricted Span Evaluation
• Evaluate expensive operators on restricted regions of
the document
Tokenization overhead is paid only once
Example: Person followed by Phone within 30 tokens
• Plan A: Join Person and Phone
• Plan B: for each Person, extract text to the right and identify Phone starting within 30 tokens
• Plan C: for each Phone, extract text to the left and identify Person ending within 30 tokens
Restricted Span Evaluation (Plans B and C)
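The difference between a plain join and Restricted Span Evaluation can be sketched as follows. This is a simplified illustration, not the SystemT optimizer: the class RestrictedSpanSketch and its toy PERSON and PHONE patterns are assumptions, and the distance is measured in characters rather than tokens to keep the code short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RestrictedSpanSketch {
    // Toy stand-ins (assumptions) for the Person and Phone extractors.
    static final Pattern PERSON = Pattern.compile("\\b(Anna|James)\\b");
    static final Pattern PHONE = Pattern.compile("\\b\\d{3}-\\d{4}\\b");
    static final int MAX_GAP = 30;  // characters here; the slide counts tokens
    static final int PHONE_LEN = 8; // length of any PHONE match

    /** Plan A: run both extractors over the whole document, then join on distance. */
    static List<String> planA(String doc) {
        List<int[]> phones = new ArrayList<>();
        Matcher pm = PHONE.matcher(doc);
        while (pm.find()) phones.add(new int[] {pm.start(), pm.end()});

        List<String> out = new ArrayList<>();
        Matcher m = PERSON.matcher(doc);
        while (m.find()) {
            for (int[] ph : phones) {
                int gap = ph[0] - m.end();
                if (gap >= 0 && gap <= MAX_GAP) {
                    out.add(m.group() + " -> " + doc.substring(ph[0], ph[1]));
                }
            }
        }
        return out;
    }

    /** Plan B: find persons, then run the phone pattern only on the window to their right. */
    static List<String> planB(String doc) {
        List<String> out = new ArrayList<>();
        Matcher m = PERSON.matcher(doc);
        Matcher pm = PHONE.matcher(doc);
        pm.useTransparentBounds(true); // let \b see characters outside the window
        while (m.find()) {
            int windowEnd = Math.min(doc.length(), m.end() + MAX_GAP + PHONE_LEN);
            pm.region(m.end(), windowEnd); // restrict the expensive operator to this region
            while (pm.find()) {
                if (pm.start() - m.end() <= MAX_GAP) {
                    out.add(m.group() + " -> " + pm.group());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Anna at James St. office (555-5555). Later, far away in the text, "
            + "another number 555-0000 appears.";
        System.out.println(planA(doc));
        System.out.println(planB(doc));
    }
}
```

Plan B never scans text that is far from any person mention, which pays off when persons are rare. Plan C would be the mirror image, starting from phones and scanning to the left. Which plan is cheaper depends on how selective each extractor is, which is why the slides describe the choice as cost-based.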
Transparency in SystemT
• Clear and simple provenance → ease of comprehension, ease of debugging, ease of enhancement
• Explains output data in terms of input data, intermediate data, and the transformation
• For predicate-based rule languages (e.g., SQL), can be computed automatically
[Liu et al., VLDB’10]
Input text (Person: Anna, James; Phone: 555-5555):
Anna at James St. office (555-5555) ….
create view PersonPhone as
select P.name as person, N.number as phone
from Person P, PhoneNum N
where Follows(P.name, N.number, 0, 30);
Person
id | name
t1 | Anna
t2 | James
Phone
id | number
t3 | 555-5555
PersonPhone
person | phone | provenance
Anna | 555-5555 | t1 ∧ t3
James | 555-5555 | t2 ∧ t3
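The provenance annotations in this example can be computed mechanically while the join runs. The sketch below is a hedged illustration of that idea, not the algorithm from the cited paper: the class ProvenanceSketch, the Tuple type and the output format are hypothetical, and the offsets are hard-coded for the example text.

```java
import java.util.ArrayList;
import java.util.List;

class ProvenanceSketch {
    /** An input tuple: an id such as "t1", the matched text, and its character offsets. */
    static final class Tuple {
        final String id;
        final String text;
        final int begin;
        final int end;
        Tuple(String id, String text, int begin, int end) {
            this.id = id;
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Emulates the PersonPhone join and records, for every output row,
     * the conjunction of the input tuple ids that produced it.
     */
    static List<String> personPhone(List<Tuple> persons, List<Tuple> phones) {
        List<String> out = new ArrayList<>();
        for (Tuple p : persons) {
            for (Tuple n : phones) {
                int gap = n.begin - p.end; // Follows(P.name, N.number, 0, 30)
                if (gap >= 0 && gap <= 30) {
                    out.add(p.text + " | " + n.text + " | " + p.id + " \u2227 " + n.id);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Offsets into "Anna at James St. office (555-5555)".
        List<Tuple> persons = new ArrayList<>();
        persons.add(new Tuple("t1", "Anna", 0, 4));
        persons.add(new Tuple("t2", "James", 8, 13));
        List<Tuple> phones = new ArrayList<>();
        phones.add(new Tuple("t3", "555-5555", 26, 34));
        personPhone(persons, phones).forEach(System.out::println);
    }
}
```

The row for James is a false positive, since "James St." is a street name. Its provenance points at input tuple t2, so the rule to refine is the one that produced that Person tuple.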
Web Tools Overview
• Ease of Programming
• Ease of Sharing
SystemT in Universities
2013 2015 2016 2017
• UC Santa Cruz
(full Graduate class)
2014
• U. Washington (Grad)
• U. Oregon (Undergrad)
• U. Aalborg, Denmark (Grad)
• UIUC (Grad)
• U. Maryland Baltimore County
(Undergrad)
• UC Irvine (Grad)
• NYU Abu-Dhabi (Undergrad)
• U. Washington (Grad)
• U. Oregon (Undergrad)
• U. Maryland Baltimore County
(Undergrad)
• …
• UC Santa Cruz, 3 lectures
in one Grad class
SystemT MOOC
Alon Halevy, November 2015
"The tutorial of System T was an important hands-on component of a Ph.D course on Structured Data on the Web at the
University of Aalborg in Denmark taught by Alon Halevy. The tutorial did a great job giving the students the feeling for
the challenges involved in extracting structured data from text."
Shimei Pan, University of Maryland, Baltimore County, May 2016
"The Information Extraction with SystemT class, which was offered for the first time at UMBC, has been a wonderful
experience for me and my students. I really enjoyed teaching this course. Students were also very enthusiastic. Based
on the feedback from my students, they have learned a lot. Some of my students even want me to offer this class every
semester."
SystemT in Universities: Testimonials
• Professors like teaching it
• Students do interesting projects and have fun!
• Identify character relationships in Game of Thrones
• Determine the cause and treatment for PTSD patients
• Extract job experiences and skills from resumes
• …
SystemT in Universities: Summary
SystemT MOOC
Bring SystemT to 100,000+ students
Target Audience: NLP Engineer, Data Scientist
e.g., University students (grad, undergrad), industry professionals
Learning Path with two classes
TA0105: Text Analytics: Getting Results with SystemT
• Basics of IE with SystemT, writing extractors using the web tool, learning AQL
• 5 lectures + 1 hands-on lab + final exam
TA0106EN: Advanced Text Analytics: Getting Results with SystemT
• Advanced material on declarative approach and SystemT optimization
• 2 lectures + final exam
Summary
AQL Extractors
AQL Rules
create view Company as
select ...
from ...
where ...
create view SentimentFeatures as
select …
from …;
Embedded machine learning model
create view SentimentForCompany as
select T.entity, T.polarity
from classifyPolarity (SentimentFeatures) T;
Declarative AQL language
AQL: a declarative language that can be used to build extractors outperforming the state-of-the-art [ACL’10, EMNLP’10, PODS’13, PODS’14, EMNLP Tutorial’15]
Multilingual: [ACL’15, ACL’16, EMNLP’16, COLING’16]
Development Environment
Discovery tools for AQL development
A suite of novel development tooling leveraging machine learning and HCI [EMNLP’08, VLDB’10, ACL’11, CIKM’11, ACL’12, EMNLP’12, CHI’13, SIGMOD’13, ACL’13, VLDB’15, NAACL’15]
Cost-based optimization
Cost-based optimization for text-centric operations [ICDE’08, ICDE’11, FPL’13, FPL’14]
SystemT Runtime
Input Documents → Extracted Objects
Highly embeddable runtime with high throughput and small memory footprint. [SIGMOD Record’09, SIGMOD’09]
A declarative information extraction system with cost-based optimization, high-performance runtime and novel development tooling based on solid theoretical foundation [PODS’13, PODS’14]. Shipping with 10+ IBM products, and installed on 400+ client sites.
IBM Engines: InfoSphere Streams, InfoSphere BigInsights, …
SystemT
Find out more about SystemT!
https://ibm.biz/BdF4GQ
• Try out SystemT
• Watch a demo
• Learn about using SystemT in university courses

Mais conteúdo relacionado

Mais procurados

DBMask: Fine-Grained Access Control on Encrypted Relational Databases
DBMask: Fine-Grained Access Control on Encrypted Relational DatabasesDBMask: Fine-Grained Access Control on Encrypted Relational Databases
DBMask: Fine-Grained Access Control on Encrypted Relational DatabasesMateus S. H. Cruz
 
Fosdem 2013 petra selmer flexible querying of graph data
Fosdem 2013 petra selmer   flexible querying of graph dataFosdem 2013 petra selmer   flexible querying of graph data
Fosdem 2013 petra selmer flexible querying of graph dataPetra Selmer
 
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...Algorithm and Programming (Introduction of dev pascal, data type, value, and ...
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...Adam Mukharil Bachtiar
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Primya Tamil
 
Information Retrieval 02
Information Retrieval 02Information Retrieval 02
Information Retrieval 02Jeet Das
 
Java Foundations: Methods
Java Foundations: MethodsJava Foundations: Methods
Java Foundations: MethodsSvetlin Nakov
 
Object oriented programming in C++
Object oriented programming in C++Object oriented programming in C++
Object oriented programming in C++jehan1987
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
C++ questions And Answer
C++ questions And AnswerC++ questions And Answer
C++ questions And Answerlavparmar007
 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
 
ENKI: Access Control for Encrypted Query Processing
ENKI: Access Control for Encrypted Query ProcessingENKI: Access Control for Encrypted Query Processing
ENKI: Access Control for Encrypted Query ProcessingMateus S. H. Cruz
 
Object Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of ExamsObject Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of ExamsMuhammadTalha436
 
IE for Semi-structured Document: Supervised Approach
IE for Semi-structured Document: Supervised ApproachIE for Semi-structured Document: Supervised Approach
IE for Semi-structured Document: Supervised Approachbutest
 
Homomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationHomomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationMohammed Ashour
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With RJahnab Kumar Deka
 
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudPrivacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudMateus S. H. Cruz
 
Programming in c++
Programming in c++Programming in c++
Programming in c++Baljit Saini
 

Mais procurados (20)

DBMask: Fine-Grained Access Control on Encrypted Relational Databases
DBMask: Fine-Grained Access Control on Encrypted Relational DatabasesDBMask: Fine-Grained Access Control on Encrypted Relational Databases
DBMask: Fine-Grained Access Control on Encrypted Relational Databases
 
Fosdem 2013 petra selmer flexible querying of graph data
Fosdem 2013 petra selmer   flexible querying of graph dataFosdem 2013 petra selmer   flexible querying of graph data
Fosdem 2013 petra selmer flexible querying of graph data
 
Ir models
Ir modelsIr models
Ir models
 
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...Algorithm and Programming (Introduction of dev pascal, data type, value, and ...
Algorithm and Programming (Introduction of dev pascal, data type, value, and ...
 
Text Mining with R
Text Mining with RText Mining with R
Text Mining with R
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
Information Retrieval 02
Information Retrieval 02Information Retrieval 02
Information Retrieval 02
 
Java Foundations: Methods
Java Foundations: MethodsJava Foundations: Methods
Java Foundations: Methods
 
Object oriented programming in C++
Object oriented programming in C++Object oriented programming in C++
Object oriented programming in C++
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
C++ questions And Answer
C++ questions And AnswerC++ questions And Answer
C++ questions And Answer
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
ENKI: Access Control for Encrypted Query Processing
ENKI: Access Control for Encrypted Query ProcessingENKI: Access Control for Encrypted Query Processing
ENKI: Access Control for Encrypted Query Processing
 
Object Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of ExamsObject Oriented Programming Short Notes for Preperation of Exams
Object Oriented Programming Short Notes for Preperation of Exams
 
IE for Semi-structured Document: Supervised Approach
IE for Semi-structured Document: Supervised ApproachIE for Semi-structured Document: Supervised Approach
IE for Semi-structured Document: Supervised Approach
 
Homomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning ClassificationHomomorphic encryption and Private Machine Learning Classification
Homomorphic encryption and Private Machine Learning Classification
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the CloudPrivacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud
 
Programming in c++
Programming in c++Programming in c++
Programming in c++
 

Semelhante a Declarative Multilingual Information Extraction with SystemT

2015 07-30-sysetm t.short.no.animation
2015 07-30-sysetm t.short.no.animation2015 07-30-sysetm t.short.no.animation
2015 07-30-sysetm t.short.no.animationdiannepatricia
 
SystemT: Declarative Information Extraction
SystemT: Declarative Information ExtractionSystemT: Declarative Information Extraction
SystemT: Declarative Information ExtractionYunyao Li
 
Declarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTDeclarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTLaura Chiticariu
 
2017-01-25-SystemT-Overview-Stanford
2017-01-25-SystemT-Overview-Stanford2017-01-25-SystemT-Overview-Stanford
2017-01-25-SystemT-Overview-StanfordLaura Chiticariu
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 
엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630Yong Joon Moon
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout source{d}
 
Dsm as theory building
Dsm as theory buildingDsm as theory building
Dsm as theory buildingClarkTony
 
CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29Bilal Ahmed
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxpetabridge
 
Whats New In C# 4 0 - NetPonto
Whats New In C# 4 0 - NetPontoWhats New In C# 4 0 - NetPonto
Whats New In C# 4 0 - NetPontoPaulo Morgado
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?Kevin Pilch
 
Применение паттерна Page Object для автоматизации веб сервисов - новый взгляд
Применение паттерна Page Object для автоматизации веб сервисов - новый взглядПрименение паттерна Page Object для автоматизации веб сервисов - новый взгляд
Применение паттерна Page Object для автоматизации веб сервисов - новый взглядCOMAQA.BY
 
Clean code _v2003
 Clean code _v2003 Clean code _v2003
Clean code _v2003R696
 
The Swift Compiler and Standard Library
The Swift Compiler and Standard LibraryThe Swift Compiler and Standard Library
The Swift Compiler and Standard LibrarySantosh Rajan
 
PythonStudyMaterialSTudyMaterial.pdf
PythonStudyMaterialSTudyMaterial.pdfPythonStudyMaterialSTudyMaterial.pdf
PythonStudyMaterialSTudyMaterial.pdfdata2businessinsight
 
To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2Bahul Neel Upadhyaya
 
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Spark Summit
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Petr Zapletal
 

Semelhante a Declarative Multilingual Information Extraction with SystemT (20)

2015 07-30-sysetm t.short.no.animation
2015 07-30-sysetm t.short.no.animation2015 07-30-sysetm t.short.no.animation
2015 07-30-sysetm t.short.no.animation
 
SystemT: Declarative Information Extraction
SystemT: Declarative Information ExtractionSystemT: Declarative Information Extraction
SystemT: Declarative Information Extraction
 
Declarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTDeclarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemT
 
2017-01-25-SystemT-Overview-Stanford
2017-01-25-SystemT-Overview-Stanford2017-01-25-SystemT-Overview-Stanford
2017-01-25-SystemT-Overview-Stanford
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Dsm as theory building
Dsm as theory buildingDsm as theory building
Dsm as theory building
 
CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29CS101- Introduction to Computing- Lecture 29
CS101- Introduction to Computing- Lecture 29
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptx
 
Whats New In C# 4 0 - NetPonto
Whats New In C# 4 0 - NetPontoWhats New In C# 4 0 - NetPonto
Whats New In C# 4 0 - NetPonto
 
devLink - What's New in C# 4?
devLink - What's New in C# 4?devLink - What's New in C# 4?
devLink - What's New in C# 4?
 
Применение паттерна Page Object для автоматизации веб сервисов - новый взгляд
Применение паттерна Page Object для автоматизации веб сервисов - новый взглядПрименение паттерна Page Object для автоматизации веб сервисов - новый взгляд
Применение паттерна Page Object для автоматизации веб сервисов - новый взгляд
 
Sharbani bhattacharya VB Structures
Sharbani bhattacharya VB StructuresSharbani bhattacharya VB Structures
Sharbani bhattacharya VB Structures
 
Clean code _v2003
 Clean code _v2003 Clean code _v2003
Clean code _v2003
 
The Swift Compiler and Standard Library
The Swift Compiler and Standard LibraryThe Swift Compiler and Standard Library
The Swift Compiler and Standard Library
 
PythonStudyMaterialSTudyMaterial.pdf
PythonStudyMaterialSTudyMaterial.pdfPythonStudyMaterialSTudyMaterial.pdf
PythonStudyMaterialSTudyMaterial.pdf
 
To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2
 
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 

Declarative Multilingual Information Extraction with SystemT

  • 1. SystemT: Declarative Information Extraction Laura Chiticariu, IBM Research Cognitive Systems Institute Speaker Series 11/17/2016
  • 2. • Use cases • Requirements • SystemT • SystemT courses Outline
  • 3. Case Study 1: Sentiment Analysis. Diagram elements: Social Media; Product catalog, Customer Master Data, …; Text Analytics; 360° Profile (Products, Interests, Relationships, Personal Attributes, Life Events); Statistical Analysis, Report Gen. [Chart: Bank1 15%, Bank2 15%, Bank3 5%, Bank4 21%, Bank5 20%, Bank6 23%] Scale: 500M+ tweets a day, 100M+ consumers, … Breadth: Buzz, intent, sentiment, life events, personal atts, … Complexity: Sarcasm, wishful thinking, …
  • 4. Case Study 2: Machine Analysis. Pipeline: Raw Logs (Web Server Logs, Custom Application Logs, Database Logs) → Parse → Parsed Log Records → Extract → Linkage Information → Integrate → End-to-End Application Sessions → Analyze: Graphical Models (Anomaly detection) and Correlation Analysis (columns: Cause, Effect, Delay, Correlation; sample rows: 5 min, 0.85 and 10 min, 0.62). • Every customer has unique components with unique log record formats • Need to quickly customize all stages of analysis for these custom log data sources
  • 5. • Use cases • Requirements • SystemT • SystemT courses Outline
  • 6. Expressivity • Need to express complex algorithms for a variety of tasks and data sources Scalability • Large number of documents and large-size documents Transparency • Every customer’s data and problems are unique in some way • Need to easily comprehend, debug and enhance extractors Enterprise Requirements
  • 7. Expressivity Example: Different Kinds of Parses. Natural Language: “We are raising our tablet forecast.” is parsed into a Dependency Tree [figure: nodes labeled S, NP, DET over the words; edges labeled subj, obj, pred]. Machine Log: “Oct 1 04:12:24 9.1.1.3 41865: %PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected” is parsed into fields: Time = Oct 1 04:12:24; Host = 9.1.1.3; Process = 41865; Category = %PLATFORM_ENV-1-DUAL_PWR; Message = Faulty internal power supply B detected
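The machine-log half of slide 7 can be illustrated with a minimal Java sketch that splits the sample record into the five fields shown on the slide. The class name `LogRecordSketch` and the regular expression are assumptions written for this one sample format; they are not part of SystemT, and real customer log formats would need their own patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: parses the slide's sample log record into named fields. */
final class LogRecordSketch {

    // Hypothetical pattern for the single sample format on the slide.
    private static final Pattern RECORD = Pattern.compile(
        "(?<time>\\w{3}\\s+\\d{1,2}\\s+\\d{2}:\\d{2}:\\d{2})\\s+"   // Oct 1 04:12:24
      + "(?<host>\\d{1,3}(?:\\.\\d{1,3}){3})\\s+"                    // 9.1.1.3
      + "(?<process>\\d+):\\s+"                                      // 41865:
      + "(?<category>%[A-Z0-9_-]+):\\s+"                             // %PLATFORM_ENV-1-DUAL_PWR:
      + "(?<message>.*)");                                           // rest of the line

    /** Returns the five fields in slide order, or an empty map if the line does not match. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(line);
        if (m.matches()) {
            fields.put("Time", m.group("time"));
            fields.put("Host", m.group("host"));
            fields.put("Process", m.group("process"));
            fields.put("Category", m.group("category"));
            fields.put("Message", m.group("message"));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "Oct 1 04:12:24 9.1.1.3 41865: "
            + "%PLATFORM_ENV-1-DUAL_PWR: Faulty internal power supply B detected";
        parse(line).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A hand-written pattern like this has to be redone for every new log source, which is the customization cost the case study points at.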
  • 8. Expressivity Example: Fact Extraction (Tables). Singapore 2012 Annual Report (136 pages PDF). Identify line item for Operating expenses from Income statement (financial table in pdf document). Identify note breaking down Operating expenses line item, and extract opex components.
  • 9. Scalability Example • Social Media: Twitter alone has 500M+ messages / day; 1TB+ per day • Financial Data: SEC alone has 20M+ filings, several TBs of data, with documents ranging from a few KBs to a few MBs • Machine Data: One application server under moderate load at medium logging level → 1GB of logs per day
  • 10. Transparency Example: Need to incorporate in-situ data • “Intel's 2013 capex is elevated at 23% of sales, above average of 16%” • “FHLMC reported $4.4bn net loss and requested $6bn in capital from Treasury.” • “I'm still hearing from clients that Merrill's website is better.” • “I need to go back to Walmart, Toys R Us has the same toy $10 cheaper!” Callouts on the examples: Customer or competitor? Good or bad? Entity of interest
  • 11. • Use cases • Requirements • SystemT • SystemT courses Outline
  • 12. SystemT Architecture. SystemT: IBM Research project started in 2006. More than 35 papers across ACL, EMNLP, VLDB, PODS and SIGMOD. Components: AQL Language (specify extractor semantics declaratively), Optimizer (choose efficient execution plan that implements semantics), Operator Runtime. Fundamental Results & Theorems • Expressivity: The class of extraction tasks expressible in AQL is a strict superset of that expressible through cascaded regular automata. • Performance: For any acyclic token-based finite state transducer T, there exists an operator graph G such that evaluating T and G has the same computational complexity. Example AQL Extractor:

      create view PersonPhone as
      select P.name as person, N.number as phone
      from Person P, PhoneNum N, Sentence S
      where Follows(P.name, N.number, 0, 30)
        and Contains(S.sentence, P.name)
        and Contains(S.sentence, N.number)
        and ContainsRegex(/\b(phone|at)\b/, SpanBetween(P.name, N.number));

      Rule illustrated: <Person> followed by <PhoneNum> 0-30 chars later, within a single sentence, where the text in between contains “phone” or “at”.
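To make the semantics of the PersonPhone rule on slide 12 concrete, here is a minimal Java sketch that evaluates the same four predicates with a naive nested loop. Everything in it (`PersonPhoneSketch`, `Span`, `follows`, the half-open character offsets) is a hypothetical illustration, not the SystemT runtime or its API; in SystemT the optimizer, not the rule author, picks the evaluation order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative sketch only: hand-evaluates the predicates of the PersonPhone rule. */
final class PersonPhoneSketch {

    /** Half-open character span [begin, end) over the document text (assumed representation). */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
        boolean contains(Span other) { return begin <= other.begin && other.end <= end; }
        String text(String doc) { return doc.substring(begin, end); }
    }

    // Same cue words as the rule: "phone" or "at" as whole words.
    private static final Pattern CUE = Pattern.compile("\\b(phone|at)\\b");

    /** True if b starts between min and max characters after a ends. */
    static boolean follows(Span a, Span b, int min, int max) {
        int gap = b.begin - a.end;
        return gap >= min && gap <= max;
    }

    /** Returns {person, phone} pairs satisfying all predicates of the rule. */
    static List<String[]> personPhone(String doc, List<Span> persons,
                                      List<Span> phones, List<Span> sentences) {
        List<String[]> out = new ArrayList<>();
        for (Span p : persons) {
            for (Span n : phones) {
                if (!follows(p, n, 0, 30)) continue;
                // Text strictly between the person and the phone number.
                if (!CUE.matcher(doc.substring(p.end, n.begin)).find()) continue;
                for (Span s : sentences) {
                    // Sentences are assumed not to overlap, so at most one can contain both.
                    if (s.contains(p) && s.contains(n)) {
                        out.add(new String[] { p.text(doc), n.text(doc) });
                        break;
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Anna at James St. office (555-5555)";
        int a = doc.indexOf("Anna"), j = doc.indexOf("James"), n = doc.indexOf("555-5555");
        List<String[]> pairs = personPhone(doc,
            List.of(new Span(a, a + 4), new Span(j, j + 5)),
            List.of(new Span(n, n + 8)),
            List.of(new Span(0, doc.length())));
        for (String[] pair : pairs) System.out.println(pair[0] + " -> " + pair[1]);
    }
}
```

Compared with the declarative rule, the imperative version fixes one evaluation order and mixes the matching logic with the looping, which is the contrast the slide draws.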
  • 13. Expressivity: AQL vs. Custom Java Code

      Java Implementation of Sentence Boundary Detection

      package com.ibm.avatar.algebra.util.sentence;

      import java.io.BufferedWriter;
      import java.util.ArrayList;
      import java.util.HashSet;
      import java.util.regex.Matcher;

      public class SentenceChunker {
        private Matcher sentenceEndingMatcher = null;
        public static BufferedWriter sentenceBufferedWriter = null;
        private HashSet<String> abbreviations = new HashSet<String> ();

        public SentenceChunker () { }

        /** Constructor that takes in the abbreviations directly. */
        public SentenceChunker (String[] abbreviations) {
          // Generate the abbreviations directly.
          for (String abbr : abbreviations) {
            this.abbreviations.add (abbr);
          }
        }

        /**
         * @param doc the document text to be analyzed
         * @return true if the document contains at least one sentence boundary
         */
        public boolean containsSentenceBoundary (String doc) {
          String origDoc = doc;
          /*
           * Based on getSentenceOffsetArrayList()
           */
          // String origDoc = doc;
          // int dotpos, quepos, exclpos, newlinepos;
          int boundary;
          int currentOffset = 0;
          do {
            /* Get the next tentative boundary for the sentenceString */
            setDocumentForObtainingBoundaries (doc);
            boundary = getNextCandidateBoundary ();
            if (boundary != -1) {
              String candidate = doc.substring (0, boundary + 1);
              String remainder = doc.substring (boundary + 1);
              /*
               * Looks at the last character of the String. If this last
               * character is part of an abbreviation (as detected by
               * REGEX) then the sentenceString is not a fullSentence and
               * "false" is returned
               */
              // while (!(isFullSentence(candidate) &&
              // doesNotBeginWithCaps(remainder))) {
              while (!(doesNotBeginWithPunctuation (remainder) && isFullSentence (candidate))) {
                /* Get the next tentative boundary for the sentenceString */
                int nextBoundary = getNextCandidateBoundary ();
                if (nextBoundary == -1) {
                  break;
                }
                boundary = nextBoundary;
                candidate = doc.substring (0, boundary + 1);
                remainder = doc.substring (boundary + 1);
              }
              if (candidate.length () > 0) {
                // sentences.addElement(candidate.trim().replaceAll("\n", " "));
                // sentenceArrayList.add(new Integer(currentOffset + boundary + 1));
                // currentOffset += boundary + 1;
                // Found a sentence boundary. If the boundary is the last
                // character in the string, we don't consider it to be
                // contained within the string.
                int baseOffset = currentOffset + boundary + 1;
                if (baseOffset < origDoc.length ()) {
                  // System.err.printf("Sentence ends at %d of %d\n",
                  // baseOffset, origDoc.length());
                  return true;
                }
                else {
                  return false;
                }
              }
              // origDoc.substring(0,currentOffset));
              // doc = doc.substring(boundary + 1);
              doc = remainder;
            }
          }
          while (boundary != -1);
          // If we get here, didn't find any boundaries.
          return false;
        }

        public ArrayList<Integer> getSentenceOffsetArrayList (String doc) {
          ArrayList<Integer> sentenceArrayList = new ArrayList<Integer> ();
          // String origDoc = doc;
          // int dotpos, quepos, exclpos, newlinepos;
          int boundary;
          int currentOffset = 0;
          sentenceArrayList.add (new Integer (0));
          do {
            /* Get the next tentative boundary for the sentenceString */
            setDocumentForObtainingBoundaries (doc);
            boundary = getNextCandidateBoundary ();
            if (boundary != -1) {
              String candidate = doc.substring (0, boundary + 1);
              String remainder = doc.substring (boundary + 1);
              /*
               * Looks at the last character of the String. If this last character
               * is part of an abbreviation (as detected by REGEX) then the
               * sentenceString is not a fullSentence and "false" is returned
               */
              // while (!(isFullSentence(candidate) &&
              // doesNotBeginWithCaps(remainder))) {
              while (!(doesNotBeginWithPunctuation (remainder) && isFullSentence (candidate))) {
                /* Get the next tentative boundary for the sentenceString */
                int nextBoundary = getNextCandidateBoundary ();
                if (nextBoundary == -1) {
                  break;
                }
                boundary = nextBoundary;
                candidate = doc.substring (0, boundary + 1);
                remainder = doc.substring (boundary + 1);
              }
              if (candidate.length () > 0) {
                sentenceArrayList.add (new Integer (currentOffset + boundary + 1));
                currentOffset += boundary + 1;
              }
              // origDoc.substring(0,currentOffset));
              doc = remainder;
            }
          }
          while (boundary != -1);
          if (doc.length () > 0) {
            sentenceArrayList.add (new Integer (currentOffset + doc.length ()));
          }
          sentenceArrayList.trimToSize ();
          return sentenceArrayList;
        }

        private void setDocumentForObtainingBoundaries (String doc) {
          sentenceEndingMatcher = SentenceConstants.sentenceEndingPattern.matcher (doc);
        }

        private int getNextCandidateBoundary () {
          if (sentenceEndingMatcher.find ()) {
            return sentenceEndingMatcher.start ();
          }
          else
            return -1;
        }

        private boolean doesNotBeginWithPunctuation (String remainder) {
          Matcher m = SentenceConstants.punctuationPattern.matcher (remainder);
          return (!m.find ());
        }

        private String getLastWord (String cand) {
          Matcher lastWordMatcher = SentenceConstants.lastWordPattern.matcher (cand);
          if (lastWordMatcher.find ()) {
            return lastWordMatcher.group ();
          }
          else {
            return "";
          }
        }

        /*
         * Looks at the last character of the String. If this last character is
         * part of an abbreviation (as detected by REGEX)
         * then the sentenceString is not a fullSentence and "false" is returned
         */
        private boolean isFullSentence (String cand) {
          // cand = cand.replaceAll("\n", " "); cand = " " + cand;
          Matcher validSentenceBoundaryMatcher = SentenceConstants.validSentenceBoundaryPattern.matcher (cand);
          if (validSentenceBoundaryMatcher.find ()) return true;
          Matcher abbrevMatcher = SentenceConstants.abbrevPattern.matcher (cand);
          if (abbrevMatcher.find ()) {
            return false; // Means it ends with an abbreviation
          }
          else {
            // Check if the last word of the sentenceString has an entry in the
            // abbreviations dictionary (like Mr etc.)
            String lastword = getLastWord (cand);
            if (abbreviations.contains (lastword)) {
              return false;
            }
          }
          return true;
        }
      }

      Equivalent AQL Implementation

      create dictionary AbbrevDict from file 'abbreviation.dict';

      create view SentenceBoundary as
      select R.match as boundary
      from ( extract regex /(([.?!]+\s)|(\n\s*\n))/ on D.text as match
             from Document D ) R
      where Not(ContainsDict('AbbrevDict',
                CombineSpans(LeftContextTok(R.match, 1), R.match)));
  • 14. Expressivity: Rich Set of Operators. Language to express NLP Algorithms → AQL. Deep Syntactic Parsing, ML Training & Scoring, Core Operators, Tokenization, Parts of Speech, Dictionaries, Semantic Role Labels, Regular Expressions, HTML/XML Detagging, Span Operations, Relational Operations, Aggregation Operations, …
  • 15. Expressivity: Multilingual Support [ACL ‘15, ‘16, EMNLP ‘16, COLING ‘16] • Parsing all languages into same predicate argument structure (language-independent) • Common predicate argument structure surfaced in AQL

      create dictionary PurchaseVerbs as
        ('buy.01', 'purchase.01', 'acquire.01', 'get.01');

      create view Relation as
      select A.verb as BUY_VERB, R.head as PURCHASE, A.polarity as WILL_BUY
      from Action A, Role R
      where MatchesDict('PurchaseVerbs', A.verbClass)
        and Equals(A.aid, R.aid)
        and Equals(R.type, 'A1');
  • 16. Scalability: Class of Optimizations in SystemT • Rewrite-based: rewrite algebraic operator graph – Shared Dictionary Matching – Shared Regular Expression Evaluation – Regular Expression Strength Reduction – On-demand tokenization • Cost-based: relies on novel selectivity estimation for text-specific operators – Standard transformations (e.g., push down selections) – Restricted Span Evaluation (evaluate expensive operators on restricted regions of the document). Callout: tokenization overhead is paid only once. Example plans for “Person followed within 30 tokens by Phone”: Plan A: join Person and Phone. Plan B (Restricted Span Evaluation): identify Person, extract text to the right, identify Phone starting within 30 tokens. Plan C (Restricted Span Evaluation): identify Phone, extract text to the left, identify Person ending within 30 tokens.
  • 17. Transparency in SystemT • Clear and simple provenance → ease of comprehension, ease of debugging, ease of enhancement • Explains output data in terms of input data, intermediate data, and the transformation • For predicate-based rule languages (e.g., SQL), can be computed automatically [Liu et al., VLDB’10]. Example text: “Anna at James St. office (555-5555) ….”

      create view PersonPhone as
      select P.name as person, N.number as phone
      from Person P, PhoneNum N
      where Follows(P.name, N.number, 0, 30);

      Person (name): t1 = Anna, t2 = James
      Phone (number): t3 = 555-5555
      PersonPhone (person, phone): (Anna, 555-5555) with provenance t1 ∧ t3; (James, 555-5555) with provenance t2 ∧ t3
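The provenance example on slide 17 can be sketched in a few lines of Java: each output tuple of the join simply records the identifiers of the input tuples that produced it. The class and field names (`ProvenanceSketch`, `Tuple`, `Result`) are hypothetical and chosen for illustration; this is not how SystemT represents provenance internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: a join that tracks which input tuples produced each output. */
final class ProvenanceSketch {

    /** An input tuple: an identifier such as "t1", its text, and its character offsets. */
    static final class Tuple {
        final String id;
        final String text;
        final int begin;
        final int end;
        Tuple(String id, String text, int begin, int end) {
            this.id = id; this.text = text; this.begin = begin; this.end = end;
        }
    }

    /** An output tuple together with the ids of the inputs it was derived from. */
    static final class Result {
        final String person;
        final String phone;
        final List<String> provenance;
        Result(String person, String phone, List<String> provenance) {
            this.person = person; this.phone = phone; this.provenance = provenance;
        }
    }

    /** Joins persons and phones where the phone starts 0 to 30 characters after the person ends. */
    static List<Result> personPhone(List<Tuple> persons, List<Tuple> phones) {
        List<Result> out = new ArrayList<>();
        for (Tuple p : persons) {
            for (Tuple n : phones) {
                int gap = n.begin - p.end;
                if (gap >= 0 && gap <= 30) {
                    // The provenance of a join result is the conjunction of its two inputs.
                    out.add(new Result(p.text, n.text, List.of(p.id, n.id)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "Anna at James St. office (555-5555)";
        int a = doc.indexOf("Anna"), j = doc.indexOf("James"), n = doc.indexOf("555-5555");
        List<Result> results = personPhone(
            List.of(new Tuple("t1", "Anna", a, a + 4), new Tuple("t2", "James", j, j + 5)),
            List.of(new Tuple("t3", "555-5555", n, n + 8)));
        for (Result r : results) {
            System.out.println(r.person + ", " + r.phone + " <- " + String.join(" AND ", r.provenance));
        }
    }
}
```

Seeing that the questionable (James, 555-5555) output traces back to t2 and t3 tells the developer which input and which predicate to tighten, for instance by adding the cue-word condition used in the fuller PersonPhone rule.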
  • 18. Web Tools Overview Ease of Programming Ease of Sharing
  • 19. • Use cases • Requirements • SystemT • SystemT courses Outline
  • 20. SystemT in Universities 2013 2015 2016 2017 • UC Santa Cruz (full Graduate class) 2014 • U. Washington (Grad) • U. Oregon (Undergrad) • U. Aalborg, Denmark (Grad) • UIUC (Grad) • U. Maryland Baltimore County (Undergrad) • UC Irvine (Grad) • NYU Abu-Dhabi (Undergrad) • U. Washington (Grad) • U. Oregon (Undergrad) • U. Maryland Baltimore County (Undergrad) • … • UC Santa Cruz, 3 lectures in one Grad class SystemT MOOC
  • 21. SystemT in Universities: Testimonials. Alon Halevy, November 2015: "The tutorial of System T was an important hands-on component of a Ph.D course on Structured Data on the Web at the University of Aalborg in Denmark taught by Alon Halevy. The tutorial did a great job giving the students the feeling for the challenges involved in extracting structured data from text." Shimei Pan, University of Maryland, Baltimore County, May 2016: "The Information Extraction with SystemT class, which was offered for the first time at UMBC, has been a wonderful experience for me and my students. I really enjoyed teaching this course. Students were also very enthusiastic. Based on the feedback from my students, they have learned a lot. Some of my students even want me to offer this class every semester."
  • 22. SystemT in Universities: Summary • Professors like teaching it • Students do interesting projects and have fun! • Identify character relationships in Game of Thrones • Determine the cause and treatment for PTSD patients • Extract job experiences and skills from resumes • …
  • 23. SystemT MOOC • Bring SystemT to 100,000+ students • Target audience: NLP engineers and data scientists, e.g., university students (grad, undergrad) and industry professionals • Learning path with two classes • TA0105, Text Analytics: Getting Results with SystemT • Basics of IE with SystemT, learning to write extractors using the web tool, learning AQL • 5 lectures + 1 hands-on lab + final exam • TA0106EN, Advanced Text Analytics: Getting Results with SystemT • Advanced material on the declarative approach and SystemT optimization • 2 lectures + final exam
  • 24. Summary • SystemT: a declarative information extraction system with cost-based optimization, a high-performance runtime and novel development tooling, based on a solid theoretical foundation [PODS’13, PODS’14]. Shipping with 10+ IBM products, and installed at 400+ client sites. • Architecture diagram: Development Environment (declarative AQL language, discovery tools for AQL development) • Cost-based optimization • SystemT Runtime: Input Documents → Extracted Objects • IBM Engines: InfoSphere Streams, InfoSphere BigInsights, … • AQL extractors combine AQL rules with an embedded machine learning model: create view Company as select ... from ... where ... create view SentimentFeatures as select … from …; create view SentimentForCompany as select T.entity, T.polarity from classifyPolarity (SentimentFeatures) T; • AQL: a declarative language that can be used to build extractors outperforming the state-of-the-art [ACL’10, EMNLP’10, PODS’13, PODS’14, EMNLP Tutorial’15] • Multilingual: [ACL’15, ACL’16, EMNLP’16, COLING’16] • A suite of novel development tooling leveraging machine learning and HCI [EMNLP’08, VLDB’10, ACL’11, CIKM’11, ACL’12, EMNLP’12, CHI’13, SIGMOD’13, ACL’13, VLDB’15, NAACL’15] • Cost-based optimization for text-centric operations [ICDE’08, ICDE’11, FPL’13, FPL’14] • Highly embeddable runtime with high throughput and small memory footprint [SIGMOD Record’09, SIGMOD’09]
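The Summary slide's three views (Company, SentimentFeatures, SentimentForCompany) can be read as a pipeline in which rule-based views feed a table function that wraps a model. The Python sketch below illustrates only that shape. Everything in it is an assumption made for illustration: the company dictionary, the word lists, the per-sentence feature counts, and the threshold rule inside `classify_polarity`, which stands in for the embedded machine learning model named `classifyPolarity` on the slide.

```python
import re

# Hypothetical dictionaries; a real extractor would use curated resources.
COMPANIES = {"Acme", "Globex"}
POSITIVE = {"great", "love", "raising"}
NEGATIVE = {"faulty", "hate", "cutting"}


def company_view(text: str) -> list:
    """Rule-based view: capitalized words that appear in the company dictionary."""
    return [m.group() for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
            if m.group() in COMPANIES]


def sentiment_features_view(doc: str) -> list:
    """Rule-based view: per-sentence cue counts for each company mention."""
    rows = []
    for sentence in re.split(r"(?<=[.!?])\s+", doc):
        words = [w.lower() for w in re.findall(r"[A-Za-z]+", sentence)]
        for entity in company_view(sentence):
            rows.append({
                "entity": entity,
                "pos": sum(w in POSITIVE for w in words),
                "neg": sum(w in NEGATIVE for w in words),
            })
    return rows


def classify_polarity(features: list) -> list:
    """Stand-in for the embedded model: a fixed rule, not a trained classifier."""
    out = []
    for row in features:
        score = row["pos"] - row["neg"]
        polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        out.append({"entity": row["entity"], "polarity": polarity})
    return out


doc = "I love Acme. Globex shipped a faulty charger."
sentiment_for_company = classify_polarity(sentiment_features_view(doc))
print(sentiment_for_company)
```

Keeping the model behind a single function mirrors the slide's design: the rules stay readable and debuggable, and the classifier can be swapped without touching the views that prepare its input.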
  • 25. Find out more about SystemT! https://ibm.biz/BdF4GQ • Try out SystemT • Watch a demo • Learn about using SystemT in university courses