Parsing Techniques
for
Lexicalized
Context-Free Grammars*
Giorgio Satta

University of Padua
* Joint work with : Jason Eisner, Mark-Jan Nederhof
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms

• Part II: standard parsing
– TD techniques
– BU techniques

• Part III: novel algorithms
– BU enhanced
– TD enhanced
Lexicalized grammars
• each rule specialized for one or more
lexical items
• advantages over non-lexicalized
formalisms:
– express syntactic preferences that are
sensitive to lexical words
– control word selection
Syntactic preferences
• adjuncts
Workers [ dumped sacks ] into a bin
*Workers dumped [ sacks into a bin ]

• N-N compound
[ hydrogen ion ] exchange
*hydrogen [ ion exchange ]
Word selection
• lexical
Nora convened the meeting
?Nora convened the party

• semantics
Peggy solved two puzzles
?Peggy solved two goats

• world knowledge
Mary shelved some books
?Mary shelved some cooks
Lexicalized CFG
Motivations :
• study computational properties common
to generative formalisms used in
state-of-the-art real-world parsers
• develop parsing algorithms that can be
directly applied to these formalisms
Lexicalized CFG

Example (tree diagram) : parse of “dumped sacks into a bin” with lexicalized
nonterminals, bracketed here for readability :

[VP[dump][sack]
  [VP[dump][sack] [V[dump] dumped] [NP[sack] [N[sack] sacks]]]
  [PP[into][bin] [P[into] into] [NP[bin] [Det[a] a] [N[bin] bin]]]]
Lexicalized CFG
Context-free grammars with :
• alphabet VT:
– dumped, sacks, into, ...

• delexicalized nonterminals VD:
– NP, VP, ...

• nonterminals VN:
– NP[sack], VP[dump][sack], ...
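
As an illustration only (not part of the original slides), here is a minimal Python sketch of how a 2-lexical nonterminal can be represented: a delexicalized label from VD paired with one or two lexical heads from VT. All names are hypothetical.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class LexNonterminal:
        """A nonterminal of a 2-lex CFG: a delexicalized label plus its lexical head(s).

        Hypothetical encoding, for illustration only.
        """
        label: str                 # element of VD, e.g. "VP"
        heads: Tuple[str, ...]     # one or two elements of VT, e.g. ("dump", "sack")

    # Examples from the slides: NP[sack] and VP[dump][sack]
    np_sack = LexNonterminal("NP", ("sack",))
    vp_dump_sack = LexNonterminal("VP", ("dump", "sack"))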
Lexicalized CFG
Delexicalized nonterminals encode :
• word sense
N, V, ...

• grammatical features
number, tense, ...

• structural information
bar level, subcategorization state, ...

• other constraints
distribution, contextual features, ...
Lexicalized CFG
• productions have two forms :
– V[dump] → dumped
– VP[dump][sack]
→ VP[dump][sack] PP[into][bin]

• lexical elements in lhs inherited from rhs
Lexicalized CFG
• production is k-lexical :
k occurrences of lexical elements in rhs
– NP[bin] → Det[a] N[bin]
is 2-lexical
– VP[dump][sack]
→ VP[dump][sack] PP[into][bin]
is 4-lexical
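
A small hedged sketch (not from the slides) of the k-lexicality count, with a production's right-hand side encoded simply as a list of (label, heads) pairs; names are illustrative only.

    def k_lexical(rhs):
        """Return k, the number of lexical-head occurrences in the rhs of a production.

        Hypothetical encoding: each rhs symbol is a (delexicalized label, tuple of heads) pair.
        """
        return sum(len(heads) for _, heads in rhs)

    # NP[bin] -> Det[a] N[bin] is 2-lexical
    assert k_lexical([("Det", ("a",)), ("N", ("bin",))]) == 2
    # VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical
    assert k_lexical([("VP", ("dump", "sack")), ("PP", ("into", "bin"))]) == 4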
LCFG at work
• 2-lexical CFG
– Alshawi 1996 : Head Automata
– Eisner 1996 : Dependency Grammars
– Charniak 1997 : CFG
– Collins 1997 : generative model
LCFG at work
Probabilistic LCFG G is strongly equivalent
to probabilistic grammar G’ iff
• one-to-one mapping between derivations
• each direction is a homomorphism
• derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG :

(tree diagram, rendered here as a production; superscripts mark the parent category)

NP^S[profits] → ADJ^NP[corporate] N^NP[profits]

with probability
Pr1 (corporate | ADJ, NP, profits) ×
Pr1 (profits | N, NP, profits) ×
Pr2 ( NP → ADJ N | NP, S, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG :

(tree diagram, rendered here as a production)

〈VPS, {}, ∆left ⊕ ∆, ∆S ⊕ ∆〉 [bought]
→ 〈N, ∆〉 [IBM] 〈VPS, {NP-C}, ∆left , ∆S 〉 [bought]

with probability
Prleft (NP, IBM | VP, S, bought, ∆left , {NP-C})
LCFG at work
Major Limitation :
Cannot capture relations involving
lexical items outside actual constituent
(cf. history based models)

(tree diagram with nodes NP[d1][d0], V[d0], NP[d1] and PP[d2][d3] over words
d0, d1, d2 : we cannot look at d0 when computing PP attachment)
LCFG at work
• lexicalized context-free parsers that
are not LCFG :
– Magerman 1995 : Shift-Reduce+
– Ratnaparkhi 1997 : Shift-Reduce+
– Chelba & Jelinek 1998 : Shift-Reduce+
– Hermjakob & Mooney 1997 : LR
Related work
Other frameworks for the study of
lexicalized grammars :
• Carroll & Weir 1997 :
Stochastic Lexicalized Grammars;
emphasis on expressiveness
• Goodman 1997 :
Probabilistic Feature Grammars;
emphasis on parameter estimation
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms

• Part II: standard parsing
– TD techniques
– BU techniques

• Part III: novel algorithms
– BU enhanced
– TD enhanced
Standard Parsing
• standard parsing algorithms
(CKY, Earley, LC, ...) run on LCFG in time
O( |G| × |w|^3 )
• for 2-lex CFG (simplest case) |G| grows
with |VD|^3 × |VT|^2 !!
Goal :
Get rid of |VT| factors
Standard Parsing: TD
Result (to be refined) :
Algorithms satisfying the correct-prefix
property are “unlikely” to run on LCFG in
time independent of VT
Correct-prefix property
Earley, Left-Corner, GLR, ... :

(diagram: a partial parse tree rooted in S covering the prefix of w read so far,
with the left-to-right reading position marked)
On-line parsing
No grammar precompilation (Earley) :

(diagram: grammar G and input w are both fed directly to the Parser, which
produces the Output)
Standard Parsing: TD
Result :
On-line parsers with correct-prefix property
cannot run in time O( f(|VD|, |w|) ),
for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR) :

(diagram: G is first processed by PreComp into a compiled form C(G); the Parser
then takes C(G) and w and produces the Output)
Standard Parsing: TD
Fact :
We can simulate a nondeterministic
FA M on w in time O( |M| × |w| )
Conjecture :
Fix a polynomial p.
We cannot simulate M on w in time
p( |w | ) unless we spend exponential
time in precompiling M
Standard Parsing: TD
Assume our conjecture holds true
Result :
Off-line parsers with correct-prefix property
cannot run in time O( p(|VD|, |w|) ),
for any polynomial p, unless we spend
exponential time in precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar
parsing :
• select productions that are lexically
grounded in w
• parse BU with selected subset of G
Problem :
Algorithm removes |VT| factors but
introduces new |w | factors !!
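
A minimal sketch of this selection step (an illustration, not the paper's code), assuming productions are stored as (lhs, rhs) pairs of (label, heads) symbols, that heads and input words share the same vocabulary, and ignoring terminal productions; all names are hypothetical.

    def select_grounded(productions, w):
        """Keep only productions of G whose lexical anchors all occur in the input w.

        Hypothetical encoding: every symbol is a (delexicalized label, tuple of heads)
        pair; productions is an iterable of (lhs, rhs) with rhs a list of such symbols.
        """
        words = set(w)
        selected = []
        for lhs, rhs in productions:
            anchors = set(lhs[1])
            for _, heads in rhs:
                anchors.update(heads)
            if anchors <= words:           # every lexical element is grounded in w
                selected.append((lhs, rhs))
        return selected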
Standard Parsing: BU
Time charged :
• i, k, j ⇒ |w|^3
• A, B, C ⇒ |VD|^3
• d1, d2 ⇒ |w|^2

(diagram: B[d1] spanning i..k with head position d1 combines with C[d2]
spanning k..j with head position d2 to give A[d2] spanning i..j)

Running time is O( |VD|^3 × |w|^5 ) !!
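
To make the |w|^5 behaviour concrete, here is a schematic Python sketch (a hypothetical encoding, not the authors' implementation) of the naive bottom-up step: the three boundaries i, k, j and the two head positions d1, d2 are all enumerated, on top of the |VD|^3 choices of labels.

    from collections import defaultdict

    def naive_bilexical_recognizer_step(w, binary_rules):
        """Naive CKY-style recognition with a 2-lex CFG (schematic sketch).

        binary_rules: set of (A, B, C) triples of delexicalized labels licensing
        A[d2] -> B[d1] C[d2] (the symmetric case, head from the left child, is omitted).
        chart[(i, j, A, h)] is True when A headed at position h derives w[i:j];
        seeding the chart with lexical items is omitted in this sketch.
        """
        n = len(w)
        chart = defaultdict(bool)
        for span in range(2, n + 1):
            for i in range(0, n - span + 1):
                j = i + span
                for k in range(i + 1, j):              # split point
                    for (A, B, C) in binary_rules:     # up to |VD|^3 triples
                        for d1 in range(i, k):         # head position of left child
                            for d2 in range(k, j):     # head position of right child (and of A)
                                if chart[(i, k, B, d1)] and chart[(k, j, C, d2)]:
                                    chart[(i, j, A, d2)] = True
        return chart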
Standard BU : Exhaustive

(log-log plot, time vs. sentence length, exhaustive parsing :
BU naive fits y = c·x^5.2019)
Standard BU : Pruning

(log-log plot, time vs. sentence length, with pruning :
BU naive fits y = c·x^3.8282)
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms

• Part II: standard parsing
– TD techniques
– BU techniques

• Part III: novel algorithms
– BU enhanced
– TD enhanced
BU enhanced
Result :
Parsing with 2-lex CFG in time
O( |VD|^3 × |w|^4 )
Remark :
Result transfers to models in Alshawi 1996,
Eisner 1996, Charniak 1997, Collins 1997
Remark :
Technique extends to improve parsing of
Lexicalized Tree-Adjoining Grammars
Algorithm #1
Basic step in naive BU :

(diagram: A[d2] → B[d1] C[d2], where B[d1] spans i..k and contains head
position d1, and C[d2] spans k..j and contains head position d2)

Idea :
Indices d1 and j can be processed
independently
Algorithm #1
• Step 1 :
pair B[d1] (spanning i..k) with the head position d2 of the sister
constituent starting at k, producing a partial A[d2] item over i..k ;
index j is not needed here and d1 can then be discarded
• Step 2 :
complete the partial A[d2] item with C[d2], extending it from k to j ;
index d1 is no longer involved

(diagram of the two sub-steps; no more than four string positions are
combined at each step)
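
A schematic sketch (hypothetical item encoding, not the published algorithm) of this two-step decomposition: Step 1 pairs the left dependent with the head position d2 and forgets d1 and j; Step 2 restores j. At most four string positions are combined in either step, which is where the O(|w|^4) bound comes from.

    from collections import defaultdict

    def algorithm1_steps(complete_items, rules):
        """Decompose the naive step A[d2] -> B[d1] C[d2] into two sub-steps.

        Hypothetical encoding: complete_items is a set of (label, head_pos, i, j)
        items, e.g. ("B", d1, i, k); rules is a set of (A, B, C) label triples.
        """
        # Index right-child candidates by start position and head only,
        # so their right boundary j is not enumerated in Step 1.
        starts = defaultdict(set)                    # k -> {(C, d2)}
        spans = defaultdict(set)                     # (C, d2, k) -> {j}
        for (X, d, i, j) in complete_items:
            starts[i].add((X, d))
            spans[(X, d, i)].add(j)

        # Step 1: pair B[d1] over (i, k) with a head d2 available at k; drop d1 and j.
        partial = set()                              # (A, C, d2, i, k)
        for (B, d1, i, k) in complete_items:
            for (C, d2) in starts[k]:
                for (A, B2, C2) in rules:
                    if B2 == B and C2 == C:
                        partial.add((A, C, d2, i, k))

        # Step 2: extend the partial item to the right boundary j of the C[d2] item.
        new_items = set()
        for (A, C, d2, i, k) in partial:
            for j in spans[(C, d2, k)]:
                new_items.add((A, d2, i, j))
        return new_items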
BU enhanced
Upper bound provided by Algorithm #1 :
O( |w|^4 )
Goal :
Can we go down to O( |w|^3 ) ?
Spine
The spine of a parse tree is the path from
the root to the root’s head

(tree diagram for “IBM bought Lotus last week” :
S[buy] → S[buy] AdvP[week], S[buy] → NP[IBM] VP[buy], VP[buy] → V[buy] NP[Lotus] ;
the spine is S[buy] – S[buy] – VP[buy] – V[buy] – bought)
Spine projection
The spine projection is the yield of the subtree composed of the spine and all its
sibling nodes

(same tree as before, “IBM bought Lotus last week” ; the spine projection is)

NP[IBM] bought NP[Lotus] AdvP[week]
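
A small sketch (with a hypothetical tree encoding, not taken from the slides) of both notions: the spine follows the head child at each level, and the spine projection collects the siblings of spine nodes left to right.

    def _label(node):
        """Label of a node: internal nodes are (label, children, head_index), leaves are words."""
        return node[0] if isinstance(node, tuple) else node

    def spine(tree):
        """The spine: labels on the path from the root down to the root's head word."""
        path, node = [], tree
        while isinstance(node, tuple):
            label, children, head_index = node
            path.append(label)
            node = children[head_index]
        path.append(node)                          # the lexical head itself
        return path

    def spine_projection(tree):
        """Yield of the spine together with all siblings of spine nodes, left to right."""
        left, right, node = [], [], tree
        while isinstance(node, tuple):
            _, children, head_index = node
            left.extend(_label(c) for c in children[:head_index])
            right = [_label(c) for c in children[head_index + 1:]] + right
            node = children[head_index]
        return left + [node] + right

    # Example from the slides ("IBM bought Lotus last week"; AdvP subtree abbreviated):
    t = ("S[buy]",
         [("S[buy]",
           [("NP[IBM]", ["IBM"], 0),
            ("VP[buy]", [("V[buy]", ["bought"], 0),
                         ("NP[Lotus]", ["Lotus"], 0)], 0)], 1),
          ("AdvP[week]", ["last week"], 0)], 0)
    assert spine(t) == ["S[buy]", "S[buy]", "VP[buy]", "V[buy]", "bought"]
    assert spine_projection(t) == ["NP[IBM]", "bought", "NP[Lotus]", "AdvP[week]"]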
Split Grammars
Split spine projections at head :

??
Problem :
how much information do we need to store
in order to construct new grammatical
spine projections from splits ?
Split Grammars
Fact :
Set of spine projections is a linear context-free language
Definition :
2-lex CFG is split if set of spine projections
is a regular language
Remark :
For split grammars, we can recombine
splits using finite information
Split Grammars
Non-split grammar :

(diagram: productions S[d] → AdvP[a] S1[d] and S1[d] → S[d] AdvP[b],
applied repeatedly, so left and right modifiers keep alternating)

• unbounded # of
dependencies
between left and
right dependents
of head

• linguistically
unattested and
unlikely
Split Grammars
Split grammar :

(diagram: productions S[d] → AdvP[a] S[d] and S[d] → S[d] AdvP[b],
applied repeatedly)

finite # of dependencies
between left and right
dependents of lexical head
Split Grammars
Precompile grammar such that splits are
derived separately

(diagram: the tree for “IBM bought Lotus last week”, with spine
S[buy] – S[buy] – VP[buy] – V[buy] – bought, is transformed (⇒) into
S[buy] → NP[IBM] r3[buy], r3[buy] → r2[buy] AdvP[week],
r2[buy] → r1[buy] NP[Lotus], r1[buy] → bought)

r3[buy] is a split symbol
Split Grammars
• t : max # of states per spine automaton
• g : max # of split symbols per spine
automaton (g < t )
• m : # of delexicalized nonterminals
that are maximal projections
BU enhanced
Result :
Parsing with split 2-lexical CFG in time
O( t^2 g^2 m^2 |w|^3 )
Remark:
Models in Alshawi 1996, Charniak 1997
and Collins 1997 are not split
Algorithm #2
Idea :
• recognize left and
right splits separately
• collect head
dependents one
split at a time

(diagram: a constituent B[d] is represented by its left split and its right
split, each carrying the head position d and a spine-automaton state s)
Algorithm #2

(example: the spine projection NP[IBM] bought NP[Lotus] AdvP[week],
split at the head)
Algorithm #2
A left dependent B[d1] is collected by a head d2 that is recognizing its left
split ; the combination is decomposed into two sub-steps, each involving only
three string positions
• Step 1
• Step 2

(diagram of the two sub-steps, over positions i, k, head positions d1, d2,
and spine-automaton states s1, s2, r1, r2)
Algorithm #2 : Exhaustive

(log-log plot, time vs. sentence length, exhaustive parsing :
BU naive fits y = c·x^5.2019, BU split fits y = c·x^3.328)
Algorithm #2 : Pruning

(log-log plot, time vs. sentence length, with pruning :
BU naive fits y = c·x^3.8282, BU split fits y = c·x^2.8179)
Related work
Cubic time algorithms for lexicalized
grammars :
• Sleator & Temperley 1991 :
Link Grammars
• Eisner 1997 :
Bilexical Grammars (improved by
transfer of Algorithm #2)
TD enhanced
Goal :
Introduce TD prediction for 2-lexical
CFG parsing, without |VT| factors
Remark :
Must relax left-to-right parsing
(because of previous results)
TD enhanced
Result :
TD parsing with 2-lex CFG in time
O( |VD|^3 × |w|^4 )
Open :
O( |w|^3 ) extension to split grammars
TD enhanced
Strongest version of correct-prefix property :

(diagram: a partial parse tree rooted in S over the input w, with the current
reading position marked)
Data Structures
Prods with lhs A[d] :
• A[d] → X1[d1] X2[d2]
• A[d] → Y1[d3] Y2[d2]
• A[d] → Z1[d2] Z2[d1]

Trie for A[d] :
(diagram: the rhs head sequences, reversed, stored in a trie with branches
d2–d1, d2–d3 and d1–d2)
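
A minimal sketch (hypothetical encoding) of this trie: the sequences of lexical heads in the right-hand sides of A[d]'s productions are reversed and inserted, so they can later be matched rightmost-first against the input.

    def build_reversed_trie(productions_for_lhs):
        """Build a trie over the reversed sequences of rhs lexical heads.

        Hypothetical encoding: productions_for_lhs is a list of rhs's, each a list
        of (label, head) symbols, all sharing the same lhs A[d].  The trie is a
        nested dict keyed by heads; the key None marks the end of a stored sequence.
        """
        trie = {}
        for rhs in productions_for_lhs:
            node = trie
            for _, head in reversed(rhs):          # rightmost head first
                node = node.setdefault(head, {})
            node[None] = rhs                       # remember which production ends here
        return trie

    # The three example productions for A[d] from the slide:
    rhss = [[("X1", "d1"), ("X2", "d2")],
            [("Y1", "d3"), ("Y2", "d2")],
            [("Z1", "d2"), ("Z2", "d1")]]
    trie = build_reversed_trie(rhss)
    # Branches from the root: d2 -> {d1, d3} and d1 -> d2, as in the slide's picture.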
Data Structures
Rightmost subsequence recognition
by precompiling input w into a
deterministic FA :

(diagram: a deterministic FA over an example string on the alphabet {a, b, c},
recognizing its subsequences when read right-to-left)
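
A sketch of one standard construction that supports this (an assumption on my part, not necessarily the exact automaton used in the paper): a table giving, for every position, the nearest occurrence of each word to its left; scanning a reversed head sequence through this table performs rightmost subsequence matching.

    def rightmost_subsequence_table(w):
        """prev_occ[i][a] = largest position p < i with w[p] == a (absent if none)."""
        prev_occ = [dict() for _ in range(len(w) + 1)]
        last = {}
        for i, a in enumerate(w):
            prev_occ[i].update(last)
            last[a] = i
        prev_occ[len(w)].update(last)
        return prev_occ

    def match_rightmost(prev_occ, reversed_heads, k):
        """Rightmost occurrence (right to left) of reversed_heads as a subsequence of w[0:k]."""
        positions, i = [], k
        for a in reversed_heads:
            p = prev_occ[i].get(a)
            if p is None:
                return None                        # not a subsequence of w[0:k]
            positions.append(p)
            i = p
        return positions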
Algorithm #3

(diagram: a partial analysis A[d] under S, spanning positions i..j, with a
limit position k to its right)

Item representation :
• i, j indicate extension
of A[d] partial analysis
• k indicates rightmost
possible position for
completion of A[d]
analysis
Algorithm #3 : Prediction

(diagram: item B[d1] starting at i, with a predicted production
A[d2] → B[d1] C[d2] ; heads d1, d2 are located before positions k’ and k)

• Step 1 :
find rightmost
subsequence
before k for some
A[d2] production
• Step 2 :
make Earley
prediction
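
A hedged sketch of how Step 1 might combine the two precompiled structures above (the reversed-head trie and the rightmost-occurrence table); the encoding is the hypothetical one used in the earlier sketches.

    def predict_step1(trie, prev_occ, k):
        """Step 1 (schematic): match, rightmost-first, the reversed head sequences
        stored in the trie for some A[d2] against the part of w before position k.

        trie: nested dicts as in the earlier trie sketch (None key marks a stored rhs);
        prev_occ: table from the earlier rightmost-occurrence sketch.
        Returns (rhs, leftmost matched head position) pairs, to feed Earley prediction.
        """
        results = []
        stack = [(trie, k)]
        while stack:
            node, i = stack.pop()
            for key, child in node.items():
                if key is None:
                    results.append((child, i))     # a complete rhs was matched before k
                else:
                    p = prev_occ[i].get(key)       # nearest occurrence of this head left of i
                    if p is not None:
                        stack.append((child, p))
        return results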
Conclusions
• standard parsing techniques are not
suitable for processing lexicalized
grammars
• novel algorithms have been introduced
using enhanced dynamic programming
• work to be done :
extension to history-based models
The End

Many thanks for helpful discussion to :
Jason Eisner, Mark-Jan Nederhof


Editor's Notes

  1. This is a longer version of my invited talk at IWPT00, including some slides not shown at the presentation (because of time restrictions). The following very short notes are a summary of additional comments (not appearing in the slides) that I have made during the presentation. Work by Jason, Mark-Jan and myself presented at ACL99, NAACL00, TAG+5 plus work in progress. Many thanks to Jason and Mark-Jan for plenty of discussion during talk preparation.
  2. Standard parsing techniques are those used by our community in the 80’s with unlexicalized formalisms, and still exploited at present with several language models. Part II goes through a computational analysis of these techniques/strategies and shows that they do not perform efficiently with lexicalized formalisms. Novel solutions are presented in Part III.
  3. Development of lexicalized grammars has been strongly enhanced by the availability of large tree banks, starting at the beginning of the 90’s. Certain problems (see above) cannot be solved if the system does not use lexical information. Therefore extensive lexical relations have been introduced into the syntactic model. I am not going to dispute here whether the choice was a good one; things might evolve in the future toward more modular and explanatory systems.
  4. This slide not shown at IWPT00. These are standard examples taken from the existing literature. Change the linguistic examples: use minimal pairs showing how syntactic structures switch with change of lexical element(s).
  5. This slide not shown at IWPT00. “Convene” shows a case of genuinely lexical factors. “Solve” needs an object with an actual state and a desired state, etc.
  6. Abstract away from differences in language models that are not relevant for computational complexity analysis.
  7. Standard CFG + lexical element percolation. We call co-head the lexical head of a complement of the actual constituent. The formalisms captures relations among lexical elements. When weights are introduced, this translates into preferences.
  8. I am trying to keep to a minimum # of definitions, these are the only symbols you need to cache to follow this talk. The really new notion here is the set of delexicalized nonterminals.
  9. This slide not shown at IWPT00. No direct reference to lexical elements is allowed in a delexicalized nonterminal. Among “other constraints”: Charniak uses constraint on father node label (to differentiate distributions); Collins uses gap feature (see GPSG) and other finite info about distribution of certain lexical elements.
  10. Here is an important notion expressing the amount of lexicalization the grammar supports. We measure the complexity of the grammar on the basis of the arity of the represented lexical relations. Remark: Bilexical grammar implies binary productions. Remark: The degree of lexicalization k induces a proper hierarchy on the generated languages.
  11. Here are some state-of-the-art language models that are strongly related to lexicalized CFG. Strongly related means there exists a linear time, one-to-one mapping (actually, a homomorphism) between derivations, that preserves derivation probabilities. The main reason for the large development of bilexical models rests on the difficulty, given available annotated data, of estimating lexical relations of arity greater than 2. Open: is Lafferty et al. 1992 probabilistic version of Link Grammars a 3-lexical CFG?
  12. This slide not shown at IWPT00.
  13. This slide not shown at IWPT00.
  14. This slide not shown at IWPT00. Multi-sets denote a particular state in the process of generating complements subcategorized for by the head. Delta’s are finite functions denoting distributions of certain lexical elements; these can be merged.
  15. NP is direct object of the Verb. In this case, taking into consideration the verbal head would help in computation of likelihood of PP-attachment. Contrast with more general history based models.
  16. I am not considering here models that are not based on CFG, as Brill’s transformation based model. Computational limitations derived in Part II transfer to these models, since they are more powerful than 2-lexical CFG. Our results in Part III should however be extended. Open: relation between 2-lex CFG and lexicalized version of DOP model.
  17. Let us consider the least complex case, namely that of bilexical CFG. Grammar size grows with the square of alphabet size (|V_T|^2 is tight in real world cases, |V_D|^3 is not). For the rest of this talk, our goal will be to develop parsing strategies running in time independent of |V_T| factors (in our case this will always result in sublinear time wrt the grammar).
  18. We present a computational analysis of a family of parsing algorithms that use the so-called top-down prediction. These algorithms share the correct-prefix property (to be introduced in next slide). The result, to be refined in the next slides, supports the empirically based common belief that BU strategies perform better than TD strategies in processing lexicalized formalisms based on context-free grammars.
  19. The green productions are called here bridging productions. Note that this property is defined in combination with a strict left-to-right processing requirement. We will show that computation of this combination of requirements is expensive to maintain.
  20. G is some data structure representing the input grammar but no additional information that could be inferred from it.
  21. Result says we cannot get rid of |V_T| factors when top-down prediction is involved. Mathematically: it does not matter how fast function f grows, it cannot be independent of V_T. The proof shows that the correct-prefix property is not a local property, you must consider the whole grammar in order to test this condition.
  22. We are allowed to extract additional information from the input grammar previous to parsing.
  23. The on-line parsing result was easy to establish. Transferring this result to the off-line case is quite difficult, as it is difficult in general to prove non-trivial lower bounds for many problems in computer science. The best we have been able to do is to provide evidence in support of the result, by relating it to a conjecture regarding FA. When parsing with a nondeterministic FA, in order to remove any dependence on the size of M from the running time, we need to spend exponential time in determinizing M. No algorithm is known for parsing in time linear in |w| and independent of M that only requires polytime precompilation. What if we are prepared to spend more than linear time in processing w? Can we compensate for the lack of information on M, due to the poly bound on precompilation, by allowing more time in processing w? We conjecture the answer is no, due to the independence between M and w.
  24. Similar to the on-line result, but relativized to our FA conjecture.
  25. Standard CYK parsing of bilexical CFG shows an increase in time complexity of a factor of |w|^2, ascribed to the choice of productions that are grounded in the input string span under recognition.
  26. How much does this affect us in real life? Here are some experiments that were run by Jason Eisner, on a bilexical CFG that was estimated from the Penn tree bank. I will say more later about the grammar, when we compare these results to the performance of our novel algorithms. Exhaustive parsing has applications in EM algorithms.
  27. We consider here a quite simple pruning strategy, using standard beam-search and some thresholding. The strategy gives a speed-up of a factor of between 10 and 20. Note that the polynomial represented by the line gives a running time that is worse than the worst case we have with unlexicalized CFG. This is an English text experiment. If we use input with some higher degree of noise, as for instance a language with word boundary uncertainty or a word lattice input from some low level speech recognizer, then I expect the degree of ambiguity in the choice of head position to be even worse, resulting in an increased gap between the unlexicalized and the lexicalized cases.
  28. Result directly transfers to the above language models; published implementations of parsers for these models are all O(|w|^5). Naive method for Lexicalized Tree-Adjoining Grammar Parsing is O(|w|^8), see Resnik 1992 and Schabes 1992. Result improves to O(|w|^7). Grammar constant involved is small and not dependent on alphabet size.
  29. Some predicate headed in d_2 is seeking a left dependent with head d_1. In pairing with head d_2 the constituent headed in d_1 and extending from i to k, we do not need to consider index j. After pairing is done, d_1 can be discarded, and then we can come to the processing of index j, that we need to compute the extension of the final constituent (the combination of the two). Thus we can split the basic step into two sub-steps.
  30. No more than 4 indices are involved at each step, thus complexity is O(|w|^4). There are some optimizations that we can apply here (but that do not improve the time asymptotic complexity). We can apply Step 1 only in case d_2 is head of a constituent starting at k and extending to the right. This amounts to say that we pack together all constituents starting at k and with head at a given position, before applying Step 1. The unpacking is then done when applying Step 2.
  31. The spine is the path along which the root head is percolated. Let us consider only spines that correspond to maximal projections of some head.
  32. The idea with spine projections is to look at possible dependents of a given head in their left-to-right order. (Again, assume maximal projections.) Convention : dependents are all phrases that can combine with a given head; they divide up into complements and adjuncts.
  33. Let us split spine projections at the head position (either before or after). If we combine left and right splits coming from different spine projections, how do we find out whether the combination is still a grammatical spine projection?
  34. Linear context-free grammar: only one nonterminal in the right-hand side of each production. Derivation trees grow along a single spine. When combining splits, we can use finite information related to the states of an FA that recognizes spine projections. We need one FA for each lexical head.
  35. In a non-split grammar, we can establish an unbounded number of dependencies between left and right dependents of some head. In the example, we alternate sentential modifiers to the left and the right, using a special extra symbol (S_1) to record the state in this process. In this way, each left modifier depends on some right modifier, and we obtain non-regular linear context-free languages of the form a^n # b^n. I don’t know of any natural language that behaves in this way, and it seems extremely unlikely that such a language exists.
  36. In contrast, split grammars can only establish a finite number of dependencies between left and right dependents of the head. Here is a more realistic natural language example, where left or right attachment of the modifier depends only on the head (i.e., not on the number of modifiers attached at the opposite side). In this case we have zero dependencies between left and right dependents of the head.
  37. For split grammars, we can precompile the grammar in such a way that the above transformation is established for each spine. The new symbols on the right spine are related to states of a FA that recognizes spine projections for the head “buy”. Symbol r_3 is a split symbol: it marks the switch from derivation of right split to derivation of left split.
  38. These are “small” grammar constants that do not depend on the alphabet size (only used in the next slide, do not cache them). Roughly speaking, the product t times m is of the same order as the size of the delexicalized nonterminals in the grammar. Constant g is quite small in practice, and equal to one for most of the lexical entries of the grammar.
  39. Grammar constant here is of the same order as that in Algorithm #1 (we are not trading computational resources). The above language models have enough power to express non-split grammars. If only split grammars are interesting for natural language modeling, then we should not have any loss of accuracy if we restrict the above models to this class, and this will result in an asymptotic improvement in time efficiency as stated by the result.
  40. Each constituent is represented by the algorithm by means of its left and right splits. Splits are recognized separately, and each head collects its dependents one split at a time.
  41. Here is a high-level view of how Algorithm #2 works.
  42. We see how a left dependent is collected by some head d_2 that is recognizing its left split. Recognition of right splits proceeds symmetrically. Note that each of the two steps only involves three input string indices.
  43. We report comparison between the naive, O(n^3) CYK implementation and Algorithm #2. We are running a split CFG that has been estimated from the Penn tree bank according to the model presented in Jason 1998. At 30-length sentences, we have a speed up of 19 for exhaustive parsing.
  44. At 30-length sentences, we have a speed up of 5 when our pruning strategy is applied to both algorithms.
  45. First algorithm is defined for a non-lexicalized grammar model. But it has the nice property that, when used with lexicalized version of the grammar, it still works in n^3 time. These two algorithms work for dependency-like models, the employed techniques are quite different from the ones for phrase structures we have presented.
  46. From the perspective of unlexicalized grammar formalisms, top-down prediction is effective in reducing the parsing search space, as compared to crude bottom-up parsing. The correct-prefix property is a desired filtering technique that we would like to have in a parsing algorithm for lexicalized CFG as well. As we have seen, direct implementation of the correct-prefix property in a parser for bilexical CFG that works strictly left-to-right is very likely to result in an inefficient computation. So we have to give up something in the end. We will be able to implement an even stronger version of the correct-prefix property, by relaxing only the left-to-right condition.
  47. We present a version of the algorithm that works for general bilexical grammars and extends the BU result. Open: split grammar case.
  48. We exploit a generalization of look-ahead. Recall that we require the existence of bridging productions (the green productions) associated with each partially or completely recognized constituent (yellow constituent). We restrict possible bridging productions to those that have unexpanded nonterminals with heads grounded in the portion of the string not yet analyzed, and in the same order.
  49. In order to have running time independent of the alphabet size, we need to develop specific data structures for grammar and input string representation. Productions with the same left-hand side have the sequences of lexical elements appearing in their right-hand sides reversed and stored into a trie.
  50. We also employ standard deterministic FA constructions that represent the set of all subsequences appearing in a given string (when read right-to-left).
  51. As in the Earley algorithm or in chart-based algorithms, each item representing a partial analysis is associated with two indices corresponding to its extension in the input string. In addition, we keep a third index, indicating the rightmost position in the input beyond which the partial analysis cannot be extended without violating our version of the correct prefix property.
  52. The algorithm can be viewed as an extension of the Earley algorithm, in which standard left-to-right phrase analysis is synchronized with right-to-left analysis for lexical item subsequences. The prediction step of the algorithm works as follows. In order to expand a predicted left-hand side symbol A[d_2] that should not extend beyond position k, we read the trie associated with A[d_2] and our input string representation. This step locates all rightmost occurrences of subsequences of the input string that correspond to sequences of lexical elements appearing in the right-hand side of some A[d_2] production. Then we apply the standard prediction step as in the Earley algorithm, and predict the left-corner of the selected productions. Again, each item is associated with a third index beyond which the analysis cannot be expanded. This index is directly computed from the recognized subsequences.
  53. Further work needs to be done in order to extend the presented results to more general history-based models.