SlideShare uma empresa Scribd logo
1 de 18
Baixar para ler offline
The Word-based Regular Expressions
(originally CHeuSoV :-) )
Aleksey Cheusov
vle@gmx.net
Minsk LUG meeting, 29 Dec. 2012, Belarus
Plan for my presentation
Mathematics: RL definition and theorem
Real World: applications of RL and some notes
Linguistics: POS tagset, Semantic tagset, Text tagging,
Tagged text corpus
Wanted: (mumbling...)
Mathematics: ExpRL definition, WRL definition
Linguistics + Real World: WRE syntax and examples
Вброс, драка, разбор завалов, развоз трупов
Regular language
Definition: given a finite non-empty set of elements Σ (alphabet):
∅ is a regular language.
For any element v ∈ Σ, {v} is a regular language.
If A and B are regular languages, so is A ∪ B.
If A and B are regular languages, so is {ab|a ∈ A, b ∈ B},
where ab means string concatenation.
If A is a regular language, so is A∗ where
A∗ = ∅ ∪ {a1a2. . .an|ai ∈ A, n > 0}.
No other languages over Σ are regular.
Example:
Let Σ = {a, b, c}. Then since aab and cc are members of Σ∗,
{aab} and {cc} are regular languages. So is the union of these
two sets {aab, cc}, and so is the concatenation of the two
{aabcc}. Likewise, {aab}∗, {cc}∗ and {aab, cc, aabcc}∗ are
regular languages.
Regular language (interesting fact)
Provable fact:
Regular languages are closed under intersection and negation
operations.
Finite state automaton
Definition: 5-tuple (Σ, S, S0, F, δ) is a finite state automaton
Σ is a finite non-empty set of elements (alphabet).
S is a finite non-empty set of states.
S0 ⊆ S is a set of start states.
F ⊆ S is a set of finite states.
δ : S × Σ → 2S is a state transition function.
Theorem: ∀ regular language R, ∃ FSA f , such that L(f ) = R; ∀
FSA f , L(f ) is a regular language (L — language of FSA, that is a
set of accepted inputs).
Real world. Regular expressions (hello UNIX!).
Regular language is a foundation for so called regular expressions,
in most cases alphabet Σ is a set of characters (ASCII, Unicode
etc.):
POSIX basic regular expressions (grep, sed, vi etc.) and
extended regular expressions (grep -E, awk etc.)
Google re2, Yandex PIRE
Perl, pcre, Ruby, Python (superset of regular language,
and. . . extremely inefficient ;-) )
lex
...
Widely used extensions:
submatch (depending on implementation may still be FSM but
not FSA)
backreferences (incompatible with regular language and FSM
at all)
Tagsets
Penn part-of-speech tag set:
NN noun, singular or mass (apple, computer, fruit etc.)
NNS noun plural (apples, computers, fruits etc.)
CC coordinating conjunction (and, or)
VB verb, base form (give, book, destroy etc.)
VBD verb, past tense form (gave, booked, destroyed etc.)
VBN verb, past participle form (given, booked, destroyed etc.)
VBG verb, gerund/present participle (giving, booking,
destroying etc.)
etc.
Semantic tag set:
LinkVerb a verb that connects the subject to the
complement(seem, feel, look etc.)
AnimateNoun (brother, son etc.)
etc.
Tagged sentence, POS tagging, semantic tagging
Examples:
The book is red →
The_DT book_NN is_VBZ red_JJ →
The_DT book_NN/Object is_VBZ red_JJ/Color
Note: Words book and red are ambiguous.
My son goes to school →
My_PRP$ son_NN goes_VBZ to_IN school_NN →
My_PRP$ son_NN/Person goes_VBZ to_TO
school_NN/Establishment
Questions: How to match tagged sentence or find portions of it?
Can regular expressions help? Do we need
regcomp(3)/regexec(3)-like functionality?
Expanded regular language
Definition: given a non-empty set of elements Σ (alphabet) and a
set of one-place predicates P = {P1, P2, . . . Pk},
Pi : Σ → {true, false}:
∅ is an expanded regular language.
For any element v ∈ Σ, {v} is an expanded regular language.
For any i {v|Pi (v) = true} is an expanded regular language.
Σ is an expanded regular language.
If A and B are expanded regular languages, so is A ∪ B.
If A and B are expanded regular languages, so is A  B.
If A and B are expanded regular languages, so is
{ab|a ∈ A, b ∈ B}, where ab means string concatenation.
If A is a regular language, so is A∗ where
A∗ = ∅ ∪ {a1a2. . .an|ai ∈ A, n > 0}.
No other languages over Σ are expanded regular.
Expanded regular language (interesting facts)
Provable facts:
Expanded regular languages are closed under intersection and
negation operations.
∀ expanded regular language R we can build a regular
language R∗ over alphabet Σ∗, such that ∃f : Σ∗ → 2Σ and
L(R) = L(R∗) (expanding all elements in L(R∗) with a help of
f ). Thus, we can build expanded regular expression engine
based on traditional FSA/FSM-based algorithms.
The Word-based regular language
Definition: given
W — set of words (character sequences), e.g.
”the”, ”apple””123”, ”; ”, ”C2H5OH”, . . .
TPOS — finite non-empty set of part-of-speech tags, e.g.
{DT, NN, NNS, VBP, VBZ, . . . }
TSem — finite non-empty set of semantic tags, e.g.
{LinkVerb, Person, Object, TransitiveVerb, . . . }
DPOS : W → 2TPOS — POS dictionary, e.g.
DPOS (”the”) = {DT}, DPOS(”book”) = {NN, VB, VBP},
DPOS (”and”) = {CC}
DSem : W → 2TSem — semantic dictionary, e.g.
DSem(”son”) = {Person},
DSem(”mouse”) = {Animal, ComputerDevice}
EREs — finite set of POSIX extended regular expressions, e.g.
{”. ∗ ing”, ”.multi. ∗ al”, ”[A − Z][a − z]”, ”[0 − 9] + ” . . . } etc.
(to be continued)
The Word-based regular language
(continuation) the word-based regular language
(TPOS , TSem, DPOS, DSem, EREs) is an expanded regular
language over alphabet W × TPOS × 2TSem and one-place
predicates P = {Pcheck
tPOS
, Pcheck
tSem
, Ptagging
tPOS
, Ptagging
tSem
, Pword
re }, where
Pcheck
tPOS
(w, ·, ·) = true if tPOS ∈ DPOS(w), and false otherwise
Pcheck
tSem
(w, ·, ·) = true if tSem ∈ DSem(w), and false otherwise
Ptagging
tPOS
(·, tagPOS , ·) = true if tPOS = tagPOS, and false
otherwise
Ptagging
tSem
(·, ·, tagsSem) = true if tSem ∈ tagsSem, and false
otherwise
Pword
re (w, ·, ·) = true if POSIX extended regular expression
re ∈ EREs matches w, and false otherwise
The Word-based regular expressions (Finally!).
Syntax:
"word" — word itself (Pword
re ), e.g. "the", "2012-12-29" etc.
’regexp’ — words matched by specified regexp (Pword
re )
Tag — words tagged as tagPOS (Ptagging
tPOS
), e.g. NN, DT, VB
etc.
%Tag — words tagged as tagSem (Ptagging
tSem
), e.g. %Person,
%Object, %LinkLerb etc.
_Tag — words having as tagPOS in POS dictionary (Pcheck
tPOS
),
e.g. _NN, _DT, _VB etc.
@Tag — words having as tagSem in semantic dictionary
(Pcheck
tSem
), e.g. @LinkVerb, @Object etc.
. (dot) — any word with any POS and semantic tags
ˆ — beginning of the sentence
$ — end of the sentence
(to be continued)
The Word-based regular expressions (Finally!).
Syntax (continuation):
( R ) — grouping like in mathematical expressions
<num R > — submatch and extraction
R ? and [ R ] — optional WRE
R * and R + — possibly empty and non-empty repetitions
R {n,m}, R {n,} and R {,m} — repetitions
R – S — subtraction
R & S and R / S — intersection, / is for single word WREs,
& is for complex WREs
R | S — union
R S — concatenation
!R — negation (L(!R) is equal to either Σ  L(R) or Σ ∗ L(R)
depending on a context of use)
(to be continued)
The Word-based regular expressions (Finally!).
Syntax (priorities from highest to lowest, continuation):
, / and | in single word non-spaced WREs
Prepositional unary operation !
Postpositional unary operations {n,m}, ’?’, ’+’ and ’*’
( R ) and <num R >
R & S
R – S
R S
R | S
(to be continued)
WRE examples
How to select noun phrases (leftmost-longest match, only POS
tags is our rules)
( DT | CD+ )? RB * CC|JJ|JJR|JJS * (NN|NNS + | NP + )
Ex.: This absolutely stupid decision
Ex.: The best fuel cell
Ex.: Black and white colors
Ex.: Vasiliy Pupkin
NER (Named Entity Recognition) for person names
DSem("MrDr") = {"Mrs.", "Mr.", "Dr.", "Doctor",
"President", "Dear", . . . }
@MrDr <1 ’[A-Z][a-z]+’/!@MrDr␣+>
Ex.: Doctor <1 Zhivago>
Ex.: Dr. <1 Vasiliy Pupkin>
Ex.: Mrs. <1 Kate>
but not
Ex.: Mr. <1 President>
WRE examples
Question focus (object attributes)
ˆ "What" "is" "the" <1 @Attribute > "of" <2 .* > "?"
Ex.: What is the <1 color > of <2 your book> ?
Tiny definitions
m4_define(NounPhrase, “(see previous slide)”) ˆ (NounPhrase
& (. . . . . . .)) "is" (NounPhrase & (. . . . . . .))
Ex.: What is the <1 color > of <2 your book> ?
Sentiment analysis
DSem("BadCharact") = {"sucks", "stupid", "crappy",
"shitty", . . . }
DSem("GoodCharact") = {"rocks", "awesome", "excellent",
. . . }
DSem("OurProduct") = {"Linux", "NetBSD", "Ubuntu",
"AltLinux", "iPad", "Android", . . . }
<1 @BadCharact|@GoodCharact> <2 @OurProduct> |
<2 @OurProduct> <1 @BadCharact|@GoodCharact>
The Word-based Regular Expressions
is really cool DSL!

Mais conteúdo relacionado

Mais procurados

Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4shah zeb
 
Deterministic context free grammars &non-deterministic
Deterministic context free grammars &non-deterministicDeterministic context free grammars &non-deterministic
Deterministic context free grammars &non-deterministicLeyo Stephen
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsShiraz316
 
Formal Languages (in progress)
Formal Languages (in progress)Formal Languages (in progress)
Formal Languages (in progress)Jshflynn
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsCS, NcState
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
3.1 intro toautomatatheory h1
3.1 intro toautomatatheory  h13.1 intro toautomatatheory  h1
3.1 intro toautomatatheory h1Rajendran
 
context free language
context free languagecontext free language
context free languagekhush_boo31
 
Introduction of predicate logics
Introduction of predicate  logicsIntroduction of predicate  logics
Introduction of predicate logicschauhankapil
 
Introduction of predicate logics
Introduction of predicate  logicsIntroduction of predicate  logics
Introduction of predicate logicschauhankapil
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of ComputationAnkur Singh
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched textsHarsh Jhamtani
 
Regular expressions and languages pdf
Regular expressions and languages pdfRegular expressions and languages pdf
Regular expressions and languages pdfDilouar Hossain
 
Top school in ghaziabad
Top school in ghaziabadTop school in ghaziabad
Top school in ghaziabadEdhole.com
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free GrammarAkhil Kaushik
 
Class7
 Class7 Class7
Class7issbp
 
R.4 solving literal equations
R.4 solving literal equationsR.4 solving literal equations
R.4 solving literal equationshisema01
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsEran Zimbler
 

Mais procurados (20)

Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4
 
Deterministic context free grammars &non-deterministic
Deterministic context free grammars &non-deterministicDeterministic context free grammars &non-deterministic
Deterministic context free grammars &non-deterministic
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Formal Languages (in progress)
Formal Languages (in progress)Formal Languages (in progress)
Formal Languages (in progress)
 
Lecture 7: Definite Clause Grammars
Lecture 7: Definite Clause GrammarsLecture 7: Definite Clause Grammars
Lecture 7: Definite Clause Grammars
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
3.1 intro toautomatatheory h1
3.1 intro toautomatatheory  h13.1 intro toautomatatheory  h1
3.1 intro toautomatatheory h1
 
context free language
context free languagecontext free language
context free language
 
Introduction of predicate logics
Introduction of predicate  logicsIntroduction of predicate  logics
Introduction of predicate logics
 
Introduction of predicate logics
Introduction of predicate  logicsIntroduction of predicate  logics
Introduction of predicate logics
 
Hak ontoforum
Hak ontoforumHak ontoforum
Hak ontoforum
 
Types of Language in Theory of Computation
Types of Language in Theory of ComputationTypes of Language in Theory of Computation
Types of Language in Theory of Computation
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched texts
 
Regular expressions and languages pdf
Regular expressions and languages pdfRegular expressions and languages pdf
Regular expressions and languages pdf
 
First order logic
First order logicFirst order logic
First order logic
 
Top school in ghaziabad
Top school in ghaziabadTop school in ghaziabad
Top school in ghaziabad
 
Context Free Grammar
Context Free GrammarContext Free Grammar
Context Free Grammar
 
Class7
 Class7 Class7
Class7
 
R.4 solving literal equations
R.4 solving literal equationsR.4 solving literal equations
R.4 solving literal equations
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 

Semelhante a Алексей Чеусов - Расчёсываем своё ЧСВ

INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptLamhotNaibaho3
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammarmeresie tesfay
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfDeptii Chaudhari
 
Formal methods 3 - languages and machines
Formal methods   3 - languages and machinesFormal methods   3 - languages and machines
Formal methods 3 - languages and machinesVlad Patryshev
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
ToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfjaishreemane73
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfDrIsikoIsaac
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter Systemkkkseld
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptxsiddhantroy13
 
01-Introduction&Languages.pdf
01-Introduction&Languages.pdf01-Introduction&Languages.pdf
01-Introduction&Languages.pdfTariqSaeed80
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingKaterina Vylomova
 
1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.pptMarvin886766
 
Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4shah zeb
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfDeptii Chaudhari
 
Lecture 1,2
Lecture 1,2Lecture 1,2
Lecture 1,2shah zeb
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture NotesFellowBuddy.com
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Rajnish Raj
 

Semelhante a Алексей Чеусов - Расчёсываем своё ЧСВ (20)

INFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.pptINFO-2950-Languages-and-Grammars.ppt
INFO-2950-Languages-and-Grammars.ppt
 
Ch3 4 regular expression and grammar
Ch3 4 regular expression and grammarCh3 4 regular expression and grammar
Ch3 4 regular expression and grammar
 
Lecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdfLecture Notes-Are Natural Languages Regular.pdf
Lecture Notes-Are Natural Languages Regular.pdf
 
L_2_apl.pptx
L_2_apl.pptxL_2_apl.pptx
L_2_apl.pptx
 
Formal methods 3 - languages and machines
Formal methods   3 - languages and machinesFormal methods   3 - languages and machines
Formal methods 3 - languages and machines
 
Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
ToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdfToC_M1L3_Grammar and Derivation.pdf
ToC_M1L3_Grammar and Derivation.pdf
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdf
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter System
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptx
 
Ch2 automata.pptx
Ch2 automata.pptxCh2 automata.pptx
Ch2 automata.pptx
 
01-Introduction&Languages.pdf
01-Introduction&Languages.pdf01-Introduction&Languages.pdf
01-Introduction&Languages.pdf
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
 
1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt
 
Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4
 
Lecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdfLecture Notes-Finite State Automata for NLP.pdf
Lecture Notes-Finite State Automata for NLP.pdf
 
Lecture 1,2
Lecture 1,2Lecture 1,2
Lecture 1,2
 
10651372.ppt
10651372.ppt10651372.ppt
10651372.ppt
 
Theory of Computation Lecture Notes
Theory of Computation Lecture NotesTheory of Computation Lecture Notes
Theory of Computation Lecture Notes
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 

Mais de Minsk Linux User Group

Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
 Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P... Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...Minsk Linux User Group
 
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...Minsk Linux User Group
 
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасцьСвятлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасцьMinsk Linux User Group
 
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использованияТимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использованияMinsk Linux User Group
 
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDАндрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDMinsk Linux User Group
 
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
Vitaly  ̈_Vi ̈ Shukela - My FOSS projectsVitaly  ̈_Vi ̈ Shukela - My FOSS projects
Vitaly ̈_Vi ̈ Shukela - My FOSS projectsMinsk Linux User Group
 
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниAlexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниMinsk Linux User Group
 
Vikentsi Lapa — How does software testing become software development?
Vikentsi Lapa — How does software testing  become software development?Vikentsi Lapa — How does software testing  become software development?
Vikentsi Lapa — How does software testing become software development?Minsk Linux User Group
 
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеМихаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеMinsk Linux User Group
 
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNМаксим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNMinsk Linux User Group
 
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...Minsk Linux User Group
 
MajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMinsk Linux User Group
 
Максим Салов - Отладочный монитор
Максим Салов - Отладочный мониторМаксим Салов - Отладочный монитор
Максим Салов - Отладочный мониторMinsk Linux User Group
 
Максим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewМаксим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewMinsk Linux User Group
 
Константин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsКонстантин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsMinsk Linux User Group
 
Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Minsk Linux User Group
 
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Minsk Linux User Group
 
Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Minsk Linux User Group
 

Mais de Minsk Linux User Group (20)

Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
 Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P... Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
 
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
 
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасцьСвятлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
 
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использованияТимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
 
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDАндрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
 
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
Vitaly  ̈_Vi ̈ Shukela - My FOSS projectsVitaly  ̈_Vi ̈ Shukela - My FOSS projects
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
 
Vitaly ̈_Vi ̈ Shukela - Dive
Vitaly  ̈_Vi ̈ Shukela - DiveVitaly  ̈_Vi ̈ Shukela - Dive
Vitaly ̈_Vi ̈ Shukela - Dive
 
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниAlexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
 
Vikentsi Lapa — How does software testing become software development?
Vikentsi Lapa — How does software testing  become software development?Vikentsi Lapa — How does software testing  become software development?
Vikentsi Lapa — How does software testing become software development?
 
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеМихаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
 
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNМаксим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
 
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
 
MajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного Дома
 
Максим Салов - Отладочный монитор
Максим Салов - Отладочный мониторМаксим Салов - Отладочный монитор
Максим Салов - Отладочный монитор
 
Максим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewМаксим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overview
 
Константин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsКонстантин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о Jenkins
 
Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»
 
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
 
Vikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testingVikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testing
 
Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?
 

Último

Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 

Último (20)

Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 

Алексей Чеусов - Расчёсываем своё ЧСВ

  • 1. The Word-based Regular Expressions (originally CHeuSoV :-) ) Aleksey Cheusov vle@gmx.net Minsk LUG meeting, 29 Dec. 2012, Belarus
  • 2. Plan for my presentation Mathematics: RL definition and theorem Real World: applications of RL and some notes Linguistics: POS tagset, Semantic tagset, Text tagging, Tagged text corpus Wanted: (mumbling...) Mathematics: ExpRL definition, WRL definition Linguistics + Real World: WRE syntax and examples Вброс, драка, разбор завалов, развоз трупов
  • 3. Regular language Definition: given a finite non-empty set of elements Σ (alphabet): ∅ is a regular language. For any element v ∈ Σ, {v} is a regular language. If A and B are regular languages, so is A ∪ B. If A and B are regular languages, so is {ab|a ∈ A, b ∈ B}, where ab means string concatenation. If A is a regular language, so is A∗ where A∗ = ∅ ∪ {a1a2. . .an|ai ∈ A, n > 0}. No other languages over Σ are regular. Example: Let Σ = {a, b, c}. Then since aab and cc are members of Σ∗, {aab} and {cc} are regular languages. So is the union of these two sets {aab, cc}, and so is the concatenation of the two {aabcc}. Likewise, {aab}∗, {cc}∗ and {aab, cc, aabcc}∗ are regular languages.
  • 4. Regular language (interesting fact) Provable fact: Regular languages are closed under intersection and negation operations.
  • 5. Finite state automaton Definition: 5-tuple (Σ, S, S0, F, δ) is a finite state automaton Σ is a finite non-empty set of elements (alphabet). S is a finite non-empty set of states. S0 ⊆ S is a set of start states. F ⊆ S is a set of finite states. δ : S × Σ → 2S is a state transition function. Theorem: ∀ regular language R, ∃ FSA f , such that L(f ) = R; ∀ FSA f , L(f ) is a regular language (L — language of FSA, that is a set of accepted inputs).
  • 6. Real world. Regular expressions (hello UNIX!). Regular language is a foundation for so called regular expressions, in most cases alphabet Σ is a set of characters (ASCII, Unicode etc.): POSIX basic regular expressions (grep, sed, vi etc.) and extended regular expressions (grep -E, awk etc.) Google re2, Yandex PIRE Perl, pcre, Ruby, Python (superset of regular language, and. . . extremely inefficient ;-) ) lex ... Widely used extensions: submatch (depending on implementation may still be FSM but not FSA) backreferences (incompatible with regular language and FSM at all)
  • 7. Tagsets Penn part-of-speech tag set: NN noun, singular or mass (apple, computer, fruit etc.) NNS noun plural (apples, computers, fruits etc.) CC coordinating conjunction (and, or) VB verb, base form (give, book, destroy etc.) VBD verb, past tense form (gave, booked, destroyed etc.) VBN verb, past participle form (given, booked, destroyed etc.) VBG verb, gerund/present participle (giving, booking, destroying etc.) etc. Semantic tag set: LinkVerb a verb that connects the subject to the complement(seem, feel, look etc.) AnimateNoun (brother, son etc.) etc.
  • 8. Tagged sentence, POS tagging, semantic tagging Examples: The book is red → The_DT book_NN is_VBZ red_JJ → The_DT book_NN/Object is_VBZ red_JJ/Color Note: Words book and red are ambiguous. My son goes to school → My_PRP$ son_NN goes_VBZ to_IN school_NN → My_PRP$ son_NN/Person goes_VBZ to_TO school_NN/Establishment Questions: How to match tagged sentence or find portions of it? Can regular expressions help? Do we need regcomp(3)/regexec(3)-like functionality?
  • 9. Expanded regular language Definition: given a non-empty set of elements Σ (alphabet) and a set of one-place predicates P = {P1, P2, . . . Pk}, Pi : Σ → {true, false}: ∅ is an expanded regular language. For any element v ∈ Σ, {v} is an expanded regular language. For any i {v|Pi (v) = true} is an expanded regular language. Σ is an expanded regular language. If A and B are expanded regular languages, so is A ∪ B. If A and B are expanded regular languages, so is A B. If A and B are expanded regular languages, so is {ab|a ∈ A, b ∈ B}, where ab means string concatenation. If A is a regular language, so is A∗ where A∗ = ∅ ∪ {a1a2. . .an|ai ∈ A, n > 0}. No other languages over Σ are expanded regular.
  • 10. Expanded regular language (interesting facts) Provable facts: Expanded regular languages are closed under intersection and negation operations. ∀ expanded regular language R we can build a regular language R∗ over alphabet Σ∗, such that ∃f : Σ∗ → 2Σ and L(R) = L(R∗) (expanding all elements in L(R∗) with a help of f ). Thus, we can build expanded regular expression engine based on traditional FSA/FSM-based algorithms.
  • 11. The Word-based regular language Definition: given W — set of words (character sequences), e.g. ”the”, ”apple””123”, ”; ”, ”C2H5OH”, . . . TPOS — finite non-empty set of part-of-speech tags, e.g. {DT, NN, NNS, VBP, VBZ, . . . } TSem — finite non-empty set of semantic tags, e.g. {LinkVerb, Person, Object, TransitiveVerb, . . . } DPOS : W → 2TPOS — POS dictionary, e.g. DPOS (”the”) = {DT}, DPOS(”book”) = {NN, VB, VBP}, DPOS (”and”) = {CC} DSem : W → 2TSem — semantic dictionary, e.g. DSem(”son”) = {Person}, DSem(”mouse”) = {Animal, ComputerDevice} EREs — finite set of POSIX extended regular expressions, e.g. {”. ∗ ing”, ”.multi. ∗ al”, ”[A − Z][a − z]”, ”[0 − 9] + ” . . . } etc. (to be continued)
  • 12. The Word-based regular language (continuation) the word-based regular language (TPOS , TSem, DPOS, DSem, EREs) is an expanded regular language over alphabet W × TPOS × 2TSem and one-place predicates P = {Pcheck tPOS , Pcheck tSem , Ptagging tPOS , Ptagging tSem , Pword re }, where Pcheck tPOS (w, ·, ·) = true if tPOS ∈ DPOS(w), and false otherwise Pcheck tSem (w, ·, ·) = true if tSem ∈ DSem(w), and false otherwise Ptagging tPOS (·, tagPOS , ·) = true if tPOS = tagPOS, and false otherwise Ptagging tSem (·, ·, tagsSem) = true if tSem ∈ tagsSem, and false otherwise Pword re (w, ·, ·) = true if POSIX extended regular expression re ∈ EREs matches w, and false otherwise
  • 13. The Word-based regular expressions (Finally!). Syntax: "word" — word itself (Pword re ), e.g. "the", "2012-12-29" etc. ’regexp’ — words matched by specified regexp (Pword re ) Tag — words tagged as tagPOS (Ptagging tPOS ), e.g. NN, DT, VB etc. %Tag — words tagged as tagSem (Ptagging tSem ), e.g. %Person, %Object, %LinkLerb etc. _Tag — words having as tagPOS in POS dictionary (Pcheck tPOS ), e.g. _NN, _DT, _VB etc. @Tag — words having as tagSem in semantic dictionary (Pcheck tSem ), e.g. @LinkVerb, @Object etc. . (dot) — any word with any POS and semantic tags ˆ — beginning of the sentence $ — end of the sentence (to be continued)
  • 14. The Word-based regular expressions (Finally!). Syntax (continuation): ( R ) — grouping like in mathematical expressions <num R > — submatch and extraction R ? and [ R ] — optional WRE R * and R + — possibly empty and non-empty repetitions R {n,m}, R {n,} and R {,m} — repetitions R – S — subtraction R & S and R / S — intersection, / is for single word WREs, & is for complex WREs R | S — union R S — concatenation !R — negation (L(!R) is equal to either Σ L(R) or Σ ∗ L(R) depending on a context of use) (to be continued)
  • 15. The Word-based regular expressions (Finally!). Syntax (priorities from highest to lowest, continuation): , / and | in single word non-spaced WREs Prepositional unary operation ! Postpositional unary operations {n,m}, ’?’, ’+’ and ’*’ ( R ) and <num R > R & S R – S R S R | S (to be continued)
  • 16. WRE examples How to select noun phrases (leftmost-longest match, only POS tags is our rules) ( DT | CD+ )? RB * CC|JJ|JJR|JJS * (NN|NNS + | NP + ) Ex.: This absolutely stupid decision Ex.: The best fuel cell Ex.: Black and white colors Ex.: Vasiliy Pupkin NER (Named Entity Recognition) for person names DSem("MrDr") = {"Mrs.", "Mr.", "Dr.", "Doctor", "President", "Dear", . . . } @MrDr <1 ’[A-Z][a-z]+’/!@MrDr␣+> Ex.: Doctor <1 Zhivago> Ex.: Dr. <1 Vasiliy Pupkin> Ex.: Mrs. <1 Kate> but not Ex.: Mr. <1 President>
  • 17. WRE examples Question focus (object attributes) ˆ "What" "is" "the" <1 @Attribute > "of" <2 .* > "?" Ex.: What is the <1 color > of <2 your book> ? Tiny definitions m4_define(NounPhrase, “(see previous slide)”) ˆ (NounPhrase & (. . . . . . .)) "is" (NounPhrase & (. . . . . . .)) Ex.: What is the <1 color > of <2 your book> ? Sentiment analysis DSem("BadCharact") = {"sucks", "stupid", "crappy", "shitty", . . . } DSem("GoodCharact") = {"rocks", "awesome", "excellent", . . . } DSem("OurProduct") = {"Linux", "NetBSD", "Ubuntu", "AltLinux", "iPad", "Android", . . . } <1 @BadCharact|@GoodCharact> <2 @OurProduct> | <2 @OurProduct> <1 @BadCharact|@GoodCharact>
  • 18. The Word-based Regular Expressions is really cool DSL!