Separating Shuffle Regular Expressions for Data Words
Manoj Kilaru Etienne Lozes Sylvain Schmitz
May 27, 2016
1 Definitions
1.1 Background
A data word is a finite sequence $w = \binom{a_1}{d_1} \cdots \binom{a_n}{d_n} \in (\Sigma \times D)^*$, where $\Sigma$ is a
finite set of symbols, and $D$ is an infinite set of data (in all examples, $D = \mathbb{N}$).
The string projection [standard name? -el] $\mathrm{str}(w)$ is the word $a_1 \cdots a_n \in \Sigma^*$.
For a fixed data word $w = \binom{a_1}{d_1} \cdots \binom{a_n}{d_n}$, the length $|w|$ of $w$ is $n$, and the
underlying equivalence relation $\sim_w$ is $\{(i, j) \mid i, j \le |w| \text{ and } d_i = d_j\}$. For two
data words $w, w'$, we write $w \approx w'$ if $\mathrm{str}(w) = \mathrm{str}(w')$ and ${\sim_w} = {\sim_{w'}}$.
A data language is a set of words $L \subseteq (\Sigma \times D)^*$ closed under $\approx$.
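As an illustration of these definitions, here is a minimal Python sketch. The encoding of a data word as a list of (letter, datum) pairs and the function names are assumptions made for this example, not notation from the draft.

```python
from typing import Hashable, List, Tuple

# Hypothetical encoding: a data word is a list of (letter, datum) pairs.
DataWord = List[Tuple[str, Hashable]]


def str_projection(w: DataWord) -> Tuple[str, ...]:
    """str(w): keep the letters, drop the data."""
    return tuple(a for a, _ in w)


def class_pattern(w: DataWord) -> List[int]:
    """Canonical encoding of the relation ~_w: each position is mapped to
    the first position that carries the same datum."""
    first_seen = {}
    pattern = []
    for i, (_, d) in enumerate(w):
        first_seen.setdefault(d, i)
        pattern.append(first_seen[d])
    return pattern


def equivalent(w1: DataWord, w2: DataWord) -> bool:
    """w1 ≈ w2: same string projection and same equivalence relation."""
    return (str_projection(w1) == str_projection(w2)
            and class_pattern(w1) == class_pattern(w2))
```

Two words are related by ≈ exactly when one is obtained from the other by an injective renaming of the data, which is why comparing the two patterns suffices.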
[TODO: Define RA and DA -el]
Several formalisms for describing data languages have been proposed; they
can be grouped into three categories:
• register automata (RA, see [KF94])
• first/second order logic (most notably $\mathrm{FO}^2(+1, <, \sim)$, see [BDM+11]) [define these logics -el]
• data automata (DA, see [BDM+11])
For any RA, an $\mathrm{FO}^2(+1, <, \sim)$ formula recognizing the same language can be
computed [complexity? -el], and for any $\mathrm{FO}^2(+1, <, \sim)$ formula, an equivalent
DA can be computed in 2-EXPTIME. However, no converse translations are
possible: the language defined by the formula $\forall x, y.\ x \neq y \Rightarrow x \not\sim y$ cannot
be recognized by a register automaton (example language for $\mathrm{FO}^2$ RA?).
[We also found a direct encoding of RA into DA a bit simpler than the one
that can be found in [BS07]. The idea is to guess the run of the RA, and guess
for every position of the run manipulating a register r whether this is the last
access to this register until an update of its content. -el]
Given a language L described by an RA, a DA, or an FO formula, one might want to
know whether it contains at least one word; this is called the emptiness problem
for RA/DA and the satisfiability problem for FO formulas. For RA/DA,
emptiness is decidable, and satisfiability is decidable as well for $\mathrm{FO}^2(+1, <, \sim)$.
However, closely related formalisms for data words, like $\mathrm{FO}^3(+1, \sim)$, have
undecidable emptiness/satisfiability.
1.2 Separating shuffle
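A data word $w$ is a separating shuffle of $w_1$ and $w_2$, noted $w \in w_1 \shuffle w_2$, if
$w$ is a shuffle of $w_1$ and $w_2$, and if the data occurring in $w_1$ are disjoint from
the data of $w_2$. For instance,
$\binom{a}{1}\binom{b}{2}\binom{c}{1}\binom{d}{3} \in \binom{a}{1}\binom{c}{1} \shuffle \binom{b}{2}\binom{d}{3}$, but
$\binom{a}{1}\binom{b}{2}\binom{c}{1}\binom{d}{3} \notin \binom{a}{1}\binom{d}{3} \shuffle \binom{b}{2}\binom{c}{1}$, because 1 is a datum common to the
two words. The operator $\shuffle$ is extended to languages as expected: $L_1 \shuffle L_2$ is the set of words
$w$ such that $w \in w_1 \shuffle w_2$ for at least one $w_1 \in L_1$ and one $w_2 \in L_2$.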
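The membership test for the separating shuffle can be sketched in Python as follows. Because the two components have disjoint data, each position of w is attributed to a component by its datum alone, so no search over interleavings is needed. The list-of-pairs encoding and the function name are assumptions of this sketch.

```python
def in_separating_shuffle(w, w1, w2):
    """Return True iff w is a separating shuffle of w1 and w2.

    Data words are lists of (letter, datum) pairs (hypothetical encoding).
    """
    data1 = {d for _, d in w1}
    data2 = {d for _, d in w2}
    if data1 & data2:
        # The components share a datum, so no separating shuffle exists.
        return False
    # Since the data sets are disjoint, the datum of a position decides
    # which component it must come from.
    part1 = [p for p in w if p[1] in data1]
    part2 = [p for p in w if p[1] in data2]
    return (len(part1) + len(part2) == len(w)
            and part1 == list(w1)
            and part2 == list(w2))
```

On the two examples above, the first call returns True and the second returns False because the datum 1 occurs in both components.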
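The separating shuffle can be combined with the Boolean connectives and
regular expressions (RE) to define separating shuffle regular expressions (SSRE):
$e ::= \varepsilon \mid a \in \Sigma \mid e.e \mid e \vee e \mid e^*$ (RE)
$\varphi ::= e \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \neg\varphi \mid \varphi \shuffle \varphi$ (SSRE)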
[define SSRE with homomorphisms -el]
The semantics of an SSRE is defined as expected. For instance, the SSRE
$\neg(\neg\varepsilon \shuffle \neg\varepsilon) \wedge \neg\varepsilon$
describes the set of data words in which the same datum occurs at all
positions. If we abbreviate this SSRE as atom, the following
$\neg\big(\Sigma^* \shuffle (\mathit{atom} \wedge \neg\bigvee_{a \in \Sigma} a)\big)$
describes the set of words that satisfy the formula $\forall x, y.\ x \neq y \Rightarrow x \not\sim y$.
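To make the two example expressions concrete, the following Python sketch evaluates them by brute force. It relies on the observation that a component of a separating shuffle is always a union of equivalence classes. The second expression is read here as "no shuffle factor is a single class of length other than 1"; that reading, the list-of-pairs encoding and the function names are assumptions of this sketch.

```python
from itertools import combinations


def splits(w):
    """Yield every (w1, w2) such that w is a separating shuffle of w1 and w2.

    Each split corresponds to choosing which data values go to w1.
    """
    values = list({d for _, d in w})
    for k in range(len(values) + 1):
        for chosen in combinations(values, k):
            s = set(chosen)
            yield ([p for p in w if p[1] in s],
                   [p for p in w if p[1] not in s])


def atom(w):
    """not(not-eps shuffle not-eps) and not-eps: w is non-empty and cannot
    be split into two non-empty components."""
    splittable = any(w1 and w2 for w1, w2 in splits(w))
    return len(w) > 0 and not splittable


def all_distinct(w):
    """No shuffle factor of w is an atom of length different from 1."""
    return not any(atom(w2) and len(w2) != 1 for _, w2 in splits(w))
```

The enumeration is exponential in the number of distinct data values, so this is only meant for checking small examples by hand.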
2 Questions
We are interested in the following questions:
• how do SSRE relate to RA/DA/$\mathrm{FO}^2$?
• how the projections of SSRE languages relate to multicounter automata,
or shuffle-star definable languages?
• is emptiness decidable for SSRE?
• what about variants of this (for instance, if we allow concatenation
above shuffle)?
• can we axiomatize SSRE (like Kleene algebras [Koz02], concurrent Kleene
algebras [HMSW11], nominal Kleene algebras [GC11, KMS15])?
• what is the role of homomorphisms in SSRE (expressiveness, decidability)?
3 Results
Theorem 1. Let $L$ be such that $L/{\approx}$ is finite. Then $L$ is SSRE definable.
Proof. It is enough to show that $L_0 := \{w \mid w \approx w_0\}$ is SSRE definable for any
$w_0$. We reason by induction on $|w_0|$:
• if $w_0 = \varepsilon$, then $L_0$ is defined by the SSRE $\varepsilon$
• if $w_0 = \binom{a}{d}$, then $L_0$ is defined by the SSRE $a$
• otherwise $w_0 = \binom{a}{d} \cdot w_R = w_L \cdot \binom{a'}{d'}$ and by induction there are $e_R$
and $e_L$ that define $[w_R]_\approx$ and $[w_L]_\approx$ respectively. Then
$a.e_R \wedge e_L.a' \wedge \underbrace{\mathrm{1class} \shuffle \cdots \shuffle \mathrm{1class}}_{k \text{ times}}$,
where $k$ is the index of $\sim_{w_0}$, defines $[w_0]_\approx$. Indeed,
if either $d$ or $d'$ occurs in the middle of the word, then $\sim_{w_R}$ and $\sim_{w_L}$
completely determine $\sim_{w_0}$. Otherwise, whether $d = d'$ is determined by
the number of equivalence classes of $\sim_{w_0}$.
Theorem 2. Every DA definable language is definable by an SSRE+hom.
Proof. [TODO. Check whether this can be done without any negations, using
just the duals of all connectives. -el]
Proof using negations.
Consider a data automaton $A = \langle \tau, B \rangle$, where $\tau$ is a letter-to-letter
transducer and $B$ is the class automaton, a nondeterministic finite automaton.
Since $B$ is an NFA, its language is described by a regular expression, say $B_e$.
A data word $v$ is in the language of $A$ if, in the output of the base automaton,
say $v'$, there is no class that is rejected by the class automaton.
Let the language of data words all of whose classes are accepted by $B$ be represented by
$B_{\shuffle} = \neg((\neg B_e \wedge \mathrm{1class}) \shuffle \Sigma^*)$.
Then $L(A) = \tau^{-1}(B_{\shuffle})$, where $\tau : A^* \to C^*$, so $\tau^{-1} : C^* \to A^*$. Let us label
every transition by a letter of a new alphabet $\Sigma'$. For example, let the transition
$\xrightarrow{c:a}$ be labelled $t$, where $c \in C$, $a \in A$ and $t \in \Sigma'$,
and let the functions $g$ and $h$ be $g : t \mapsto c$ and $h : t \mapsto a$.
After this labelling, let the automaton be $T$.
Now
$L(A) = \tau^{-1}(B_{\shuffle})$
$= h(T \cap g^{-1}(B_{\shuffle}))$
$= f(T' \cap B')$
where
$B' = \{b_1 t_1 b_2 t_2 \cdots b_m t_m \mid b_1 b_2 \cdots b_m \in B_{\shuffle}\}$
$T' = \{b_1 t_1 b_2 t_2 \cdots b_m t_m \mid t_1 t_2 \cdots t_m \in L(T) \wedge \forall i.\ g(t_i) = b_i\}$
$f : b_i \mapsto \varepsilon,\ t_i \mapsto h(t_i)$
Hence the language of the data automaton is expressed by an SSRE + homomorphisms
accepting the same language.
Theorem 3. Emptiness and universality of SSRE + homomorphisms are
undecidable.
Proof. It is known that universality for RA is undecidable [NSV04], and every
RA can be expressed as a DA accepting the same language [BDM+11].
So universality of DA is undecidable, and thereby, by Theorem 2, universality of
SSRE + homomorphisms is undecidable. Since SSRE + homomorphisms are closed under
negation, emptiness of SSRE + homomorphisms is also undecidable.
Let $\Sigma_n = \{a_i, b_i \mid i = 1, \dots, n\}$ and $L_n$ the language defined by
$\bigwedge_{i=1}^{n} \forall x.\ b_i(x) \Rightarrow \exists y.\ y < x \wedge a_i(y) \wedge y \sim x \wedge \forall z.\ y < z < x \Rightarrow \neg a_i(z)$
[How can it be written with two variables only? -el] This language corresponds
to the language of an RA with $n$ registers that writes in $reg_i$ while reading $a_i$ and
checks against $reg_i$ while reading $b_i$.
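The register-automaton reading of $L_n$ can be sketched directly in Python: the formula says that every $b_i$ must carry the datum of the most recent $a_i$ before it. The encoding of the letters $a_i$, $b_i$ as pairs ("a", i) and ("b", i), and the function name, are assumptions made for this sketch.

```python
def in_L_n(w, n):
    """Check membership in L_n by simulating n registers.

    w is a list of ((kind, i), datum) with kind in {"a", "b"} and
    1 <= i <= n (hypothetical encoding of the letters a_i and b_i).
    """
    registers = {}  # registers[i] holds the datum of the latest a_i
    for (kind, i), d in w:
        if kind not in ("a", "b") or not 1 <= i <= n:
            raise ValueError(f"letter {(kind, i)!r} is not in Sigma_n")
        if kind == "a":
            registers[i] = d  # write the current datum into register i
        elif i not in registers or registers[i] != d:
            # b_i must match the latest a_i, and such an a_i must exist
            return False
    return True
```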
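Theorem 4. The above language $L_n$ is SSRE definable.
Proof. [I forgot... -el] The expression
$\forall\mathrm{concatenationfactor}\Big(\bigwedge_{i=1}^{n} \big(w_i.(\Sigma \setminus w_i)^* \Longrightarrow ((w_i r_i^*) \shuffle (\Sigma \setminus \{w_i, r_i\})^*)\big)\Big)$
expresses the above language $L_n$, where $\forall\mathrm{concatenationfactor}(e)$ means $\neg(\Sigma^*.\neg e.\Sigma^*)$.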
Let $L_{no}$ be the language of data words with non-overlapping classes, i.e.
$L_{no} = \{w \mid [i]_{\sim_w} \text{ is a connected set of positions for all } i = 1, \dots, |w|\} = \{w \mid w \models \forall x, y.\ x \overset{+1}{\sim} y \Rightarrow x + 1 = y\}$.
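A direct membership test for $L_{no}$ can be sketched as follows: a word has non-overlapping classes exactly when no datum reappears after a different datum has been seen. The list-of-pairs encoding and the function name are assumptions of this sketch.

```python
def non_overlapping(w):
    """Return True iff every class of w is a block of consecutive positions.

    w is a list of (letter, datum) pairs (hypothetical encoding).
    """
    closed = set()       # data whose block has already started
    previous = object()  # sentinel that differs from every datum
    for _, d in w:
        if d != previous:
            if d in closed:
                # d starts a second block: its class is not connected
                return False
            closed.add(d)
        previous = d
    return True
```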
Theorem 5. $L_{no}$ is not RA definable, but it is SSRE definable.
Proof. [I forgot... -el]
Suppose an RA has $r$ registers, and consider a word of length $r+2$ whose first $r+1$
positions carry distinct data: the datum at position $r+2$ has to be compared with all
$r+1$ previous data, which is not possible with only $r$ registers. Hence the language
cannot be expressed by an RA. The expression
$\forall\mathrm{shufflefactor}(\mathrm{2classes} \Longrightarrow \mathrm{1class}.\mathrm{1class})$
recognizes the above language, where $\forall\mathrm{shufflefactor}(e) = \neg(\neg e \shuffle \Sigma^*)$.
Theorem 6. The emptiness of SSRE (without homomorphisms) is undecidable.
Proof. The proof uses the idea in Section 9 of [BDM+11].
It is by reduction from Post's Correspondence Problem (PCP). An instance of PCP
is a list of $n$ pairs of words $(u_i, v_i)$ over $\Sigma$, and the undecidable question
is whether there is a sequence of indices $i_0, i_1, \dots, i_m \in \{1, 2, \dots, n\}$ such that
$u_{i_0} u_{i_1} \cdots u_{i_m} = v_{i_0} v_{i_1} \cdots v_{i_m}$.
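For readers less familiar with PCP, here is a small Python sketch that checks a candidate index sequence and runs a bounded brute-force search. The bound is essential, since the unbounded problem is undecidable. The function names, the 0-based indexing and the sample instance in the usage note are illustrative assumptions, not taken from the draft.

```python
from itertools import product


def is_solution(pairs, indices):
    """Check whether a non-empty index sequence solves the PCP instance.

    pairs is a list of (u_i, v_i) strings; indices are 0-based.
    """
    top = "".join(pairs[i][0] for i in indices)
    bottom = "".join(pairs[i][1] for i in indices)
    return len(indices) > 0 and top == bottom


def bounded_search(pairs, max_len):
    """Try all index sequences of length 1..max_len.

    Returns a solution as a list of indices, or None if none exists
    within the bound (which says nothing about longer sequences).
    """
    for m in range(1, max_len + 1):
        for indices in product(range(len(pairs)), repeat=m):
            if is_solution(pairs, indices):
                return list(indices)
    return None
```

For the instance `[("a", "baa"), ("ab", "aa"), ("bba", "bb")]`, the sequence `[2, 1, 2, 0]` is a solution: both sides spell `bbaabbbaa`.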
Consider a map from each letter in $\Sigma$ to a letter in a copy $\Sigma'$ such that $\Sigma \cap \Sigma' = \emptyset$.
For a word $w \in \Sigma^*$, a word $w' \in \Sigma'^*$ can be constructed by replacing each
symbol of $\Sigma$ with the corresponding symbol of $\Sigma'$. Consider a new alphabet $\hat\Sigma = \Sigma \cup \Sigma'$.
Now a sequence $i_0, i_1, \dots, i_m$ is a solution of PCP iff a data word $w$ with the
following properties is accepted by our SSRE.
1. The string projection is of the form $(u_i v'_i)^*$, where $u_i \in \Sigma^*$ and $v'_i \in \Sigma'^*$
corresponds to $v_i \in \Sigma^*$.
2. All the data on letters of $\Sigma$ are distinct, all the data on letters of
$\Sigma'$ are distinct, and all the data that appear on letters of $\Sigma$
also appear on letters of $\Sigma'$, in the same order.
We now encode these properties in SSRE without homomorphisms.
1. The language $(u_i v'_i)^*$ is regular, hence there exists a regular
expression for it.
2. (a) All the data on letters of $\Sigma$ should be distinct. For this, consider the expression
$\neg((\hat\Sigma^*.\Sigma.\hat\Sigma^*.\Sigma.\hat\Sigma^* \wedge \mathrm{1class}) \shuffle \hat\Sigma^*)$
(b) All the data on letters of $\Sigma'$ should be distinct. For this, consider the expression
$\neg((\hat\Sigma^*.\Sigma'.\hat\Sigma^*.\Sigma'.\hat\Sigma^* \wedge \mathrm{1class}) \shuffle \hat\Sigma^*)$
(c) For every datum present on a letter of $\Sigma$ there is a corresponding position of $\Sigma'$
with the same datum, and conversely. For this, consider the expression
$\neg((\Sigma \wedge \mathrm{1class}) \shuffle \hat\Sigma^*) \wedge \neg((\Sigma' \wedge \mathrm{1class}) \shuffle \hat\Sigma^*)$
(d) All the data on letters of $\Sigma$ should appear in the same order on letters of $\Sigma'$. For this, consider
the expression
$\forall\mathrm{shufflefactor}(\mathrm{2classes} \Longrightarrow \neg(\hat\Sigma.(\hat\Sigma^2 \wedge \mathrm{1class}).\hat\Sigma))$
Hence emptiness of SSRE without homomorphisms is undecidable.
4 Restricted Negation
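An SSRE is positive if $\shuffle$ does not occur underneath a negation.
Theorem 7. If $e$ is a positive SSRE, then $L(e)$ is $\mathrm{EMSO}^2(<, \sim, +1, \overset{+1}{\sim})$ definable.
Proof. [TODO, this is only a sketch of the proof -el] Let $\varphi_e(X)$ be the
$\mathrm{EMSO}^2(<, \sim, +1, \overset{+1}{\sim})$ formula with $\mathrm{fv}(\varphi_e) = \{X\}$ defined by induction on $e$ as follows:
• ...
Then for all $\hat X \subseteq \{1, \dots, |w|\}$, it holds that $w, X \mapsto \hat X \models \varphi_e$ iff $w|_{\hat X} \in L(e)$,
where $w|_{\hat X} = \binom{a_{i_1}}{d_{i_1}} \cdots \binom{a_{i_k}}{d_{i_k}}$ if $\hat X = \{i_1 < \dots < i_k\}$.
Finally, $\exists X.\ (\forall x.\ x \in X) \wedge \varphi_e(X)$ defines $L(e)$.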
5 Conjectures
Let the language $L_{FL}$ be defined as $\exists x, y\,(x \not\sim y \wedge (\forall y.\ x < y \vee x = y) \wedge (\forall x.\ x < y \vee x = y))$,
which expresses that the first and last positions contain different data.
Conjecture 1. The language $L_{FL}$ cannot be represented by SSRE (without
homomorphisms).
Conjecture 2. RA, DA, and $\mathrm{FO}^2(+1, <, \sim)$ are not comparable with SSRE (without
homomorphisms) in expressiveness over languages of data words.
Proof. The above-mentioned language $L_{FL}$ can be represented using RA, $\mathrm{FO}^2(+1, <, \sim)$, and DA.
• As the definition is expressed in $\mathrm{FO}^2(+1, <, \sim)$, the language is $\mathrm{FO}^2(+1, <, \sim)$ definable.
• It is RA expressible because the automaton stores the datum of the first
position, nondeterministically guesses the last position, and performs a
read operation on the same register there.
• It is DA expressible: the transducer guesses the first and last positions and
relabels them with #, keeping the rest unchanged, and the class automaton checks
that each class string is in $(\Sigma^* + \#.\Sigma^*.\#)$.
Assuming Conjecture 1, the language $L_{FL}$ cannot be represented in SSRE
(without homomorphisms).
The language $\forall x, y.\ x \neq y \Rightarrow x \not\sim y$, describing that all data are distinct, can
be expressed using SSRE (without homomorphisms) but cannot be represented
using an RA.
• Suppose the language can be represented using an RA with $r$ registers.
Consider a word of length $r+2$ whose first $r+1$ positions carry distinct data:
the datum at position $r+2$ has to be compared with all $r+1$ previous data,
which is not possible with only $r$ registers. Hence a contradiction.
• The expression
$\neg\big(\Sigma^* \shuffle (\mathrm{1class} \wedge \neg\varepsilon \wedge \neg\bigvee_{a \in \Sigma} a)\big)$
represents the above language.
The language in which all the positions carry the same datum and the length of the
data word is even cannot be represented in $\mathrm{FO}^2(+1, <, \sim)$, but can be represented
using an SSRE (without homomorphisms).
• The expression
$\mathrm{1class} \wedge (\Sigma\Sigma)^*$
represents the above-mentioned language.
• The above-mentioned language cannot be represented in $\mathrm{FO}^2(+1, <, \sim)$.
This can be shown by playing an Ehrenfeucht-Fraïssé game on the family of words
$w_n = \binom{a}{d}^{2^n}$ and $w'_n = \binom{a}{d}^{2^n + 1}$.
Let $L'$ be the language of data words satisfying the following properties:
1. The string projection is of the form $a^* b a^*$.
2. The data value at the position of $b$ is distinct from those of the rest of the positions.
3. Each of the remaining data values occurs precisely twice, once before $b$ and once after it.
4. The order of the data before $b$ must not be exactly the same as the order of
the data after $b$.
[BDM+11]
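The four properties above translate into a short membership test. The following Python sketch assumes data words are lists of (letter, datum) pairs and uses a hypothetical function name; it checks the properties in the order they are listed.

```python
def in_L_prime(w):
    """Check the four defining properties of the language on a data word.

    w is a list of (letter, datum) pairs (hypothetical encoding).
    """
    letters = [a for a, _ in w]
    # Property 1: string projection in a* b a*
    if letters.count("b") != 1 or any(x not in ("a", "b") for x in letters):
        return False
    k = letters.index("b")
    before = [d for _, d in w[:k]]
    after = [d for _, d in w[k + 1:]]
    # Property 2: the datum of b occurs nowhere else
    if w[k][1] in before or w[k][1] in after:
        return False
    # Property 3: every other datum occurs once before b and once after b
    if len(set(before)) != len(before) or len(set(after)) != len(after):
        return False
    if set(before) != set(after):
        return False
    # Property 4: the two orders must differ
    return before != after
```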
The language $L'$ can be represented using a DA. The letter-to-letter transducer
changes all the a's before b to #, except that it marks two positions, labelling the
first x and the next y; it maps b to b; and it changes all the a's after b to #, except
that it marks two positions, labelling the first x and the next y. The class automaton
accepts the language
b + ## + xy + yx
[BDM+11]
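The language $L'$ can be represented using an SSRE (without homomorphisms).
The expression
$a^* b a^* \wedge \neg((\Sigma^+.b.\Sigma^* \wedge \mathrm{1class}) \shuffle \Sigma^*) \wedge \neg((\Sigma^*.b.\Sigma^+ \wedge \mathrm{1class}) \shuffle \Sigma^*)$
$\wedge\ \neg((\Sigma^*.a.\Sigma^*.a.\Sigma^*.b.\Sigma^* \wedge \mathrm{2classes}) \shuffle \Sigma^*) \wedge \neg((\Sigma^*.b.\Sigma^*.a.\Sigma^*.a.\Sigma^* \wedge \mathrm{2classes}) \shuffle \Sigma^*)$
$\wedge\ \neg(a \shuffle \Sigma^*) \wedge ((a.(aa \wedge \mathrm{1class}).a \wedge \mathrm{2classes}) \shuffle \Sigma^*)$
expresses the language $L'$.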
It can be shown that $\neg L'$ cannot be represented by a DA [BDM+11].
But SSRE are closed under negation, hence $\neg L'$ can be represented using an
SSRE (without homomorphisms).
Conjecture 3. The language $\{\binom{a_1}{d_1} \cdots \binom{a_n}{d_n} \mid d_1 = d_n\}$ is not SSRE definable.
However, it is RA definable with one register.
Open question: can SSRE (without homomorphisms) express all languages
defined by deterministic RA?
References
[BDM+11] Mikolaj Bojańczyk, Claire David, Anca Muscholl, Thomas
Schwentick, and Luc Segoufin. Two-variable logic on data words.
ACM Trans. Comput. Log., 12(4):27, 2011.
[BS07] Henrik Björklund and Thomas Schwentick. On notions of regularity
for data languages. In Fundamentals of Computation Theory, 16th
International Symposium, FCT 2007, Budapest, Hungary, August
27-30, 2007, Proceedings, pages 88–99, 2007.
[GC11] Murdoch James Gabbay and Vincenzo Ciancia. Freshness and
name-restriction in sets of traces with names. In FOSSACS 2011,
pages 365–380, 2011.
[HMSW11] Tony Hoare, Bernhard Möller, Georg Struth, and Ian Wehrman.
Concurrent Kleene algebra and its foundations. The Journal of Logic
and Algebraic Programming, 80(6):266–296, 2011. Relations and
Kleene Algebras in Computer Science.
[KF94] Michael Kaminski and Nissim Francez. Finite-memory automata.
Theor. Comput. Sci., 134(2):329–363, 1994.
[KMS15] Dexter Kozen, Konstantinos Mamouras, and Alexandra Silva.
Completeness and incompleteness in nominal Kleene algebra. In
Relational and Algebraic Methods in Computer Science - 15th
International Conference, RAMiCS 2015, Braga, Portugal, September 28 -
October 1, 2015, Proceedings, pages 51–66, 2015.
[Koz02] Dexter Kozen. On the complexity of reasoning in Kleene algebra.
Inf. Comput., 179(2):152–162, 2002.
[NSV04] Frank Neven, Thomas Schwentick, and Victor Vianu. Finite state
machines for strings over infinite alphabets. ACM Trans. Comput.
Logic, 5(3):403–435, July 2004.

Mais conteúdo relacionado

Mais procurados

Алексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВАлексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВMinsk Linux User Group
 
05 8640 (update email) multiset cs closure propertie (edit lafi)2
05 8640 (update email) multiset cs closure propertie (edit lafi)205 8640 (update email) multiset cs closure propertie (edit lafi)2
05 8640 (update email) multiset cs closure propertie (edit lafi)2IAESIJEECS
 
3.1 intro toautomatatheory h1
3.1 intro toautomatatheory  h13.1 intro toautomatatheory  h1
3.1 intro toautomatatheory h1Rajendran
 
A new technique for proving non regularity based on the measure of a language
A new technique for proving non regularity based on the measure of a languageA new technique for proving non regularity based on the measure of a language
A new technique for proving non regularity based on the measure of a languageRyoma Sin'ya
 
形式言語理論への 測度論的アプローチ
形式言語理論への 測度論的アプローチ形式言語理論への 測度論的アプローチ
形式言語理論への 測度論的アプローチRyoma Sin'ya
 
Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4shah zeb
 
Chapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata TheoryChapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata TheoryTsegazeab Asgedom
 
Theory of Automata Lesson 02
Theory of Automata Lesson 02Theory of Automata Lesson 02
Theory of Automata Lesson 02hamzamughal39
 
Automata
AutomataAutomata
AutomataGaditek
 
Theory of Automata
Theory of AutomataTheory of Automata
Theory of AutomataFarooq Mian
 
Cd32939943
Cd32939943Cd32939943
Cd32939943IJMER
 
IRJET- On Semigroup and its Connections with Lattices
IRJET- On Semigroup and its Connections with LatticesIRJET- On Semigroup and its Connections with Lattices
IRJET- On Semigroup and its Connections with LatticesIRJET Journal
 
Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal languageRabia Khalid
 
Theory of Automata Lesson 01
 Theory of Automata Lesson 01  Theory of Automata Lesson 01
Theory of Automata Lesson 01 hamzamughal39
 
Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"Ra'Fat Al-Msie'deen
 

Mais procurados (20)

Алексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВАлексей Чеусов - Расчёсываем своё ЧСВ
Алексей Чеусов - Расчёсываем своё ЧСВ
 
05 8640 (update email) multiset cs closure propertie (edit lafi)2
05 8640 (update email) multiset cs closure propertie (edit lafi)205 8640 (update email) multiset cs closure propertie (edit lafi)2
05 8640 (update email) multiset cs closure propertie (edit lafi)2
 
3.1 intro toautomatatheory h1
3.1 intro toautomatatheory  h13.1 intro toautomatatheory  h1
3.1 intro toautomatatheory h1
 
A new technique for proving non regularity based on the measure of a language
A new technique for proving non regularity based on the measure of a languageA new technique for proving non regularity based on the measure of a language
A new technique for proving non regularity based on the measure of a language
 
Lesson 03
Lesson 03Lesson 03
Lesson 03
 
形式言語理論への 測度論的アプローチ
形式言語理論への 測度論的アプローチ形式言語理論への 測度論的アプローチ
形式言語理論への 測度論的アプローチ
 
Theory of computation Lec1
Theory of computation Lec1Theory of computation Lec1
Theory of computation Lec1
 
Unit i
Unit iUnit i
Unit i
 
Lecture 3,4
Lecture 3,4Lecture 3,4
Lecture 3,4
 
Chapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata TheoryChapter1 Formal Language and Automata Theory
Chapter1 Formal Language and Automata Theory
 
Theory of Automata Lesson 02
Theory of Automata Lesson 02Theory of Automata Lesson 02
Theory of Automata Lesson 02
 
Automata
AutomataAutomata
Automata
 
Theory of Automata
Theory of AutomataTheory of Automata
Theory of Automata
 
Cd32939943
Cd32939943Cd32939943
Cd32939943
 
Theory of computation Lec2
Theory of computation Lec2Theory of computation Lec2
Theory of computation Lec2
 
IRJET- On Semigroup and its Connections with Lattices
IRJET- On Semigroup and its Connections with LatticesIRJET- On Semigroup and its Connections with Lattices
IRJET- On Semigroup and its Connections with Lattices
 
Hak ontoforum
Hak ontoforumHak ontoforum
Hak ontoforum
 
Theory of automata and formal language
Theory of automata and formal languageTheory of automata and formal language
Theory of automata and formal language
 
Theory of Automata Lesson 01
 Theory of Automata Lesson 01  Theory of Automata Lesson 01
Theory of Automata Lesson 01
 
Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"Theory of Computation "Chapter 1, introduction"
Theory of Computation "Chapter 1, introduction"
 

Semelhante a draft

Theory of Computation - Lectures 4 and 5
Theory of Computation - Lectures 4 and 5Theory of Computation - Lectures 4 and 5
Theory of Computation - Lectures 4 and 5Dr. Maamoun Ahmed
 
1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.pptMarvin886766
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxRaviAr5
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfdawod yimer
 
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurEnd semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurVivekananda Samiti
 
Automata
AutomataAutomata
AutomataGaditek
 
Regular expression with DFA
Regular expression with DFARegular expression with DFA
Regular expression with DFAMaulik Togadiya
 
Formal Languages (in progress)
Formal Languages (in progress)Formal Languages (in progress)
Formal Languages (in progress)Jshflynn
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfDrIsikoIsaac
 
Regular expression
Regular expressionRegular expression
Regular expressionRajon
 
RegularExpressions.pdf
RegularExpressions.pdfRegularExpressions.pdf
RegularExpressions.pdfImranBhatti58
 
Formal methods 3 - languages and machines
Formal methods   3 - languages and machinesFormal methods   3 - languages and machines
Formal methods 3 - languages and machinesVlad Patryshev
 
Hw2 2017-spring
Hw2 2017-springHw2 2017-spring
Hw2 2017-spring奕安 陳
 

Semelhante a draft (20)

Theory of Computation - Lectures 4 and 5
Theory of Computation - Lectures 4 and 5Theory of Computation - Lectures 4 and 5
Theory of Computation - Lectures 4 and 5
 
1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt1LECTURE 8 Regular_Expressions.ppt
1LECTURE 8 Regular_Expressions.ppt
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptx
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
 
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT KanpurEnd semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
End semexam | Theory of Computation | Akash Anand | MTH 401A | IIT Kanpur
 
L_2_apl.pptx
L_2_apl.pptxL_2_apl.pptx
L_2_apl.pptx
 
Automata
AutomataAutomata
Automata
 
Unit ii
Unit iiUnit ii
Unit ii
 
Algorithms on Strings
Algorithms on StringsAlgorithms on Strings
Algorithms on Strings
 
Theory of computation
Theory of computationTheory of computation
Theory of computation
 
QB104541.pdf
QB104541.pdfQB104541.pdf
QB104541.pdf
 
Regular expression with DFA
Regular expression with DFARegular expression with DFA
Regular expression with DFA
 
Ch02
Ch02Ch02
Ch02
 
Formal Languages (in progress)
Formal Languages (in progress)Formal Languages (in progress)
Formal Languages (in progress)
 
Chapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdfChapter2CDpdf__2021_11_26_09_19_08.pdf
Chapter2CDpdf__2021_11_26_09_19_08.pdf
 
Regular expression
Regular expressionRegular expression
Regular expression
 
PART A.doc
PART A.docPART A.doc
PART A.doc
 
RegularExpressions.pdf
RegularExpressions.pdfRegularExpressions.pdf
RegularExpressions.pdf
 
Formal methods 3 - languages and machines
Formal methods   3 - languages and machinesFormal methods   3 - languages and machines
Formal methods 3 - languages and machines
 
Hw2 2017-spring
Hw2 2017-springHw2 2017-spring
Hw2 2017-spring
 

draft

  • 1. Separating Shuffle Regular Expressions for Data Words Manoj Kilaru Etienne Lozes Sylvain Schmitz May 27, 2016 1 Definitions 1.1 Background A data word is a finite sequence w = a1 d1 . . . an dn ∈ (Σ × D)∗ , where Σ is a finite set of symbols, and D is an infinite set of data (in all examples, D = N). The string projection [standard name? -el] str(w) is the word a1 . . . an ∈ Σ∗ . For a fixed data word w = a1 d1 . . . an dn , the length |w| of w is n, and the underlying equivalence relation ∼w is {(i, j) | i, j ≤ |w| and di = dj}. For w, w two data words, we write w ≈ w if str(w) = str(w ) and ∼w=∼w . A data language is a set of words L ⊆ (Σ × D)∗ closed under ≈. [TODO: Define RA and DA -el] Several formalisms for describing data languages have been proposed, that can be grouped in three categories • register automata (RA, see [KF94]) • first/second order logic (most notably FO2 (+1, <, ∼), see [BDM+ 11]) [de- fine these logics -el] • data automata (DA, see [BDM+ 11]) For any RA, a FO2 (+1, <, ∼) formula recognizing the same language can be computed [complexity? -el], and for any FO2 (+1, <, ∼) formula, an equivalent DA can be computed in 2-EXPTIME. However, there are no converse transla- tions possible: the language defined by the formula ∀x, y.x = y ⇒ x ∼ y can’t be recognized by a register automaton (example language for FO2 RA?). [We also found a direct encoding of RA into DA a bit simpler than the one that can be found in [BS07]. The idea is to guess the run of the RA, and guess for every position of the run manipulating a register r whether this is the last acces to this register until an update of its content. -el] 1
  • 2. Given a language L described by a RA/DA/FO formula, one might want to know whether it contains at least one word; this is called the emptiness prob- lem for RA/DA and the satisfiability problem for FO formulas. For RA/DA, emptiness is decidable, and satisfiability is decidable as well for FO2 (+1, <, ∼). However, pretty close formalisms for data words, like FO3 (+1, ∼), have unde- cidable emptiness/satisfiability. 1.2 Separating shuffle A data word w is a separating shuffle of w1 and w2, noted w ∈ w1 w2, if w is a shuffle of w1 and w2, and if the data occurring in w1 are disjoint from the data of w2. For instance, a 1 b 2 c 1 d 3 ∈ a 1 c 1 b 2 d 3 , but a 1 b 2 c 1 d 3 ∈ a 1 d 3 b 2 c 1 , because 1 is a common data of the two words. is extended to languages as expected: L1 L2 is the set of words w such that w ∈ w1 w2 for at least one w1 ∈ L1 and one w2 ∈ L2. The separating shuffle can be combined with the boolean connectives and regular expressions (RE) to define separating shuffle regular expressions (SSRE): e ::= | a ∈ Σ | e.e | e ∨ e | e∗ (RE) ϕ ::= e | ϕ ∧ ϕ | ϕ ∨ ϕ | ¬ϕ | ϕ ϕ (SSRE) [define SSRE with homomorphisms -el] The semantics of a SSRE is defined as expected. For instance, the SSRE ¬(¬ ¬ ) ∧ ¬ describes the set of data words with always the same data occurring in all positions. If we abbreviate this SSRE as atom, the following ¬ (atom ∧ ¬ a∈Σ a) describes the set of words that satisfy the formula ∀x, y.x = y ⇒ x ∼ y. 2 Questions We are interesting in the following questions: • how SSRE relate to RA/DA/FO2 ? • how the projections of SSRE languages relate to multicounter automata, or shuffle-star definable languages? • is emptiness decidable for SSRE? • what about variants of this (like for instance if we allow concatenation above shuffle)? 2
  • 3. • can we axiomatize SSRE (like Kleene algebras [Koz02], concurrent Kleene algebras [HMSW11], nominal Kleene algebras [GC11, KMS15]). • what is the role of homormorphisms in SSRE (expressiveness, decidabil- ity)? 3 Results Theorem 1. Let L be such that L/ ≈ is finite. Then L is SSRE definable. Proof. It is enough to show that L0 := {w|w ≈ w0} is SSRE definable for any w0. We reason by induction on |w0|: • if w0 = , then L0 is defined by the SSRE • if w0 = a, d , then L0 is defined by the SSRE a • otherwise w0 = a d · wR = wL · a d and by induction there are eR and eL that define [wR]≈ and [wL]≈ respectively. Then a.eR ∧ eL.a ∧ 1class . . . 1class k times , where k is the index of ∼w0 , defines [w0]≈. Indeed, if either d or d occurs in the middle of the word, then ∼wR and ∼wL completely determine ∼w0 . Otherwise, wether d = d is determined by the number of equivalence classes of ∼w0 . Theorem 2. Every DA definable language is definable by a SSRE+hom. Proof. [TODO. Check wether this can be done without any negations, using just the duals of all connectives. -el] Proof using negations Consider Data automata A = < τ, B > where the τ is a letter to letter trans- ducer and B is a class automata which is non deterministic finite automata. Now as B is NFA it can be expressed in a regular expression let Be . A Data word v will be in language of A if in the output of the base automata (let) v there is no class which get rejected by the class automata let the language, shuffle of B be represented by B = ¬((¬Be ∧ 1class) Σ∗ ) Then L(A) = τ−1 (B ) where τ : A∗ → C∗ so τ−1 : C∗ → A∗ and let us label every transition by new alphabet Σ . For example let the c:a −−→ t where c ∈ C∗ and b ∈ B∗ and t ∈∗ Σ let functions g and h be g : t → c and h : t → a. 3
  • 4. After this labelling let the automata be T Now L(A) = τ−1 (B ) = h(T ∩ g−1 (B )) = f(T ∩ B ) where B = {b1t1b2t2 . . . bmtm/b1b2 · · ·m ∈ B } T = {b1t1b2t2 · · ·m tm/t1t2 . . . tm ∈ L(T) ∧ ∀ig(ti) = bi} f : bi → ∧ ti → h(ti) Hence the language of Data automata is expressed in SSRE + homomorphisms accepting the same language. Theorem 3. The emptiness and universality of SSRE + homomorphisms is undecidable. Proof. It is known that universality for RA is undecidable [NSV04] and every RA can be expressed as DA accepting the same language [BDM+ 11]. So universality of DA is undecidable and thereby universality of SSRE + ho- momorphisms is undecidable and SSRE + homomorphisms are closed under negations hence emptiness of SSRE + homomorphisms is also undecidable. Let Σn = {ai, bi | i = 1, . . . , n} and Ln the language defined by n i=1 ∀x.bi(x) ⇒ ∃y.y < x ∧ ai(y) ∧ y ∼ x ∧ ∀z.y < z < x ⇒ ¬ai(z) [How can it be written with two variables only? -el] This language corresponds to the language of a RA with n registers that write in regi while reading ai and checks wrt to regi while reading bi. Theorem 4. The above language Ln is SSRE definable. Proof. [I forgot... -el] The expression ∀concatenationfactor( n i=1 (wi.(Σ/wi)∗ =⇒ ((wir∗ i ) (Σ/wi, ri)∗ ))) expresses the above language Ln where ∀concatenationfactor(e)means¬(Σ∗ .¬e.Σ∗ ). Let Lno the language of data words with non-overlapping classes, i.e. Lno = {w | [i]∼w is a connex set of positions for all i = 1, . . . , |w|} = {w | w |= ∀x, y.x +1 ∼ y ⇒ x + 1 = y}. 4
  • 5. Theorem 5. Lno is not RA definable, but it is SSRE definable. Proof. [I forgot... -el] Let RA has r number of states then consider a word of length r+2 with first r+1 positions distinct data and r+2 position has to be checked with all r+1 previous positions which is not possible as it has only r registers. Hence the Language cannot be expressed in RA. The expression ∀shufflefactor(2classes =⇒ 1class.1class) recognizes above language. where ∀shufflefactor(e) = ¬(¬e Σ∗ ) Theorem 6. The emptiness of SSRE (without homomorphisms) is undecidable. Proof. The proof uses the idea in section 9 of [BDM+ 11] The proof is by reduction from Post’s Correspondence Problem(PCP).PCP problem is given a pair of n words (ui, vi) from Σ∗ and the undecidable ques- tion is there a sequence of numbers i0, i1, . . . , im ∈ {1, 2, . . . , n} such that ui0 ui1 . . . uim = vi0 vi1 . . . vim . consider a map from each letter in Σ to each letter in Σ such that Σ ∩ Σ = φ If a word w ∈ Σ∗ ,then a word w ∈ Σ ∗ can be constructed by replacing each sym- bol in Σ with corresponding symbol in Σ consider a new alphabet Σ = Σ ∪ Σ . Now sequence i0, i1, . . . , im is the solution of PCP iff the Data word w with following properties is accepted by our SSRE. 1. The string projection is of the form (uivi)∗ where ui ∈ Σ∗ and vi ∈ Σ ∗ which corresponds to the vi ∈ Σ∗ 2. all the data in letters in Σ should be distinct and data in all letters in Σ should be distinct and all the data which appeared in the Σ∗ should appear in the Σ ∗ and in same order. Now we encode this properties in SSRE without homomorphisms 1. The language (uivi)∗ is a regular language hence there exists an regular expression for it 2. 
(a) all the data in Σ∗ should be distinct for this consider expression ¬((Σ ∗ .Σ.Σ ∗ .Σ.Σ ∗ ∧ 1class) Σ ∗ ) (b) all the data in Σ ∗ should be distinct for this consider expression ¬((Σ ∗ .Σ .Σ ∗ .Σ .Σ ∗ ∧ 1class) Σ ∗ ) (c) for all the data present in Σ∗ there is corresponding position of Σ ∗ with same data for this consider the expression ¬((Σ ∧ 1class) Σ ∗ ) ∧ ¬((Σ ∧ 1class) Σ ∗ ) 5
(d) All data on letters of Σ appear in the same order on letters of Σ′ (where Σ″ = Σ ∪ Σ′):
    ∀shufflefactor(2classes ⇒ ¬(Σ″.(Σ″² ∧ 1class).Σ″))

Hence the emptiness problem for SSRE without homomorphisms is undecidable.

4 Restricted Negation

An SSRE is positive if ⧢ does not occur underneath a negation.

Theorem 7. If e is a positive SSRE, then L(e) is EMSO²(<, ∼, +1, +1 ∼) definable.

Proof. [TODO, this is only a sketch of the proof -el] Let ϕ_e(X) be the EMSO²(<, ∼, +1, +1 ∼) formula with fv(ϕ_e) = {X} defined by induction on e as follows:
• ...
Then for all X̂ ⊆ {1, ..., |w|}, it holds that w, X ↦ X̂ ⊨ ϕ_e iff w|_X̂ ∈ L(e), where w|_X̂ = (a_{i_1}, d_{i_1}) ... (a_{i_k}, d_{i_k}) if X̂ = {i_1, ..., i_k} with i_1 < ... < i_k. Finally, ∃X.(∀x. x ∈ X) ∧ ϕ_e(X) defines L(e).

5 Conjectures

Let the language L_FL be defined by the formula
    ∃x, y (x ≁ y ∧ (∀y. x < y ∨ x = y) ∧ (∀x. x < y ∨ x = y)),
which expresses that the first and last positions carry different data.

Conjecture 1. The language L_FL cannot be represented by an SSRE (without homomorphisms).

Conjecture 2. RA, DA and FO²(+1, <, ∼) are incomparable with SSRE (without homomorphisms) in expressiveness over languages of data words.

Proof. The language L_FL can be represented using RA, FO²(+1, <, ∼) and DA:
• Its definition is an FO²(+1, <, ∼) formula, so the language is FO²(+1, <, ∼) definable.
• It is RA expressible: the automaton stores the datum of the first position in a register, nondeterministically guesses the last position, and there performs a read operation on the same register.
• It is DA expressible: the transducer guesses the first and last positions and relabels them with #, keeping every other letter unchanged, and the class automaton checks that each class string belongs to (Σ* + #.Σ*.#).
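The data conditions (a)-(d) used in the proof of Theorem 6 can be checked directly on a concrete data word. The following is a minimal Python sketch, not part of the construction itself: a data word is assumed to be a list of (letter, datum) pairs, and the names `satisfies_data_conditions`, `sigma` and `sigma_copy` are hypothetical. It checks only the data conditions as stated there, not the string projection.

```python
from typing import Hashable, List, Set, Tuple

# Assumed representation: a data word is a list of (letter, datum) pairs.
DataWord = List[Tuple[str, Hashable]]


def satisfies_data_conditions(w: DataWord, sigma: Set[str], sigma_copy: Set[str]) -> bool:
    """Check conditions (a)-(d) on the data of w.

    sigma and sigma_copy are hypothetical names for the alphabet and its
    disjoint copy; letters outside both sets are ignored.
    """
    # Data carried by letters of the alphabet and of its copy, in word order.
    top = [d for a, d in w if a in sigma]
    bottom = [d for a, d in w if a in sigma_copy]

    distinct_top = len(set(top)) == len(top)            # condition (a)
    distinct_bottom = len(set(bottom)) == len(bottom)   # condition (b)
    same_data = set(top) == set(bottom)                 # condition (c)
    same_order = top == bottom                          # condition (d)
    return distinct_top and distinct_bottom and same_data and same_order


if __name__ == "__main__":
    sigma, sigma_copy = {"a", "b"}, {"A", "B"}
    good = [("a", 1), ("A", 1), ("b", 2), ("B", 2)]
    swapped = [("a", 1), ("b", 2), ("B", 2), ("A", 1)]
    print(satisfies_data_conditions(good, sigma, sigma_copy))
    print(satisfies_data_conditions(swapped, sigma, sigma_copy))
```

Given (a)-(c), condition (d) amounts to the two data sequences being equal, which is why a single list comparison suffices in this sketch.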
By Conjecture 1, the language L_FL cannot be represented by an SSRE (without homomorphisms).

The language ∀x, y. x ≠ y ⇒ x ≁ y, describing that all data are distinct, can be expressed by an SSRE (without homomorphisms) but cannot be represented by an RA:
• Suppose the language can be represented by an RA with r registers. Consider a word of length r+2 whose first r+1 positions carry pairwise distinct data. The datum at position r+2 has to be compared with all r+1 previous data, which is not possible with only r registers. Hence a contradiction.
• The expression ¬((1class ∧ ¬ε ∧ ¬⋁_{a∈Σ} a) ⧢ Σ*) represents the language.

The language in which all positions carry the same datum and the length of the data word is even cannot be represented in FO²(+1, <, ∼) but can be represented by an SSRE (without homomorphisms):
• The expression 1class ∧ (ΣΣ)* represents the language.
• The language cannot be represented in FO²(+1, <, ∼). This can be shown by playing an Ehrenfeucht-Fraïssé game on the family of words w_n = (a, d)^{2^n} and w′_n = (a, d)^{2^n + 1}.

Let L be the language of data words satisfying the following properties [BDM+11]:
1. The string projection is of the form a*ba*.
2. The data value at the position of b is distinct from the data at all other positions.
3. Every other data value occurs precisely twice, once before b and once after it.
4. The order of the data before b is not exactly the same as the order of the data after b.

The language L can be represented using a DA. The letter-to-letter transducer relabels every a before b with #, except for two marked positions, the first labelled x and the next y; it keeps b unchanged; and it relabels every a after b with #, again except for two marked positions, the first labelled x and the next y. The class automaton accepts the language b + ## + xy + yx [BDM+11].
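The four properties defining L can be tested directly on a concrete data word. The following is a minimal Python sketch under the assumption that a data word is a list of (letter, datum) pairs; the function name `in_L` is hypothetical. It is a plain membership test and simulates neither the DA nor an SSRE.

```python
import re
from typing import Hashable, List, Tuple

# Assumed representation: a data word is a list of (letter, datum) pairs.
DataWord = List[Tuple[str, Hashable]]


def in_L(w: DataWord) -> bool:
    """Membership test for L, following properties 1-4."""
    letters = "".join(a for a, _ in w)
    # Property 1: the string projection has the form a*ba*.
    if re.fullmatch(r"a*ba*", letters) is None:
        return False

    k = letters.index("b")
    before = [d for _, d in w[:k]]
    after = [d for _, d in w[k + 1:]]
    b_datum = w[k][1]

    # Property 2: the datum at b occurs nowhere else.
    if b_datum in before or b_datum in after:
        return False

    # Property 3: every other datum occurs exactly once before b and once after.
    if len(set(before)) != len(before) or len(set(after)) != len(after):
        return False
    if set(before) != set(after):
        return False

    # Property 4: the two orders must differ.
    return before != after


if __name__ == "__main__":
    w = [("a", 1), ("a", 2), ("b", 0), ("a", 2), ("a", 1)]
    print(in_L(w))
```

Dropping the final comparison and returning `before == after` instead gives a test for the words that satisfy properties 1-3 but violate property 4.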
The language L can be represented using an SSRE (without homomorphisms). The expression

    a*ba*
    ∧ ¬((Σ⁺.b.Σ* ∧ 1class) ⧢ Σ*)
    ∧ ¬((Σ*.b.Σ⁺ ∧ 1class) ⧢ Σ*)
    ∧ ¬((Σ*.a.Σ*.a.Σ*.b.Σ* ∧ 2classes) ⧢ Σ*)
    ∧ ¬((Σ*.b.Σ*.a.Σ*.a.Σ* ∧ 2classes) ⧢ Σ*)
    ∧ ¬(a ⧢ Σ*)
    ∧ ((a.(aa ∧ 1class).a ∧ 2classes) ⧢ Σ*)

expresses the language L. It can be shown that ¬L cannot be represented by a DA [BDM+11]. But SSRE are closed under negation, hence ¬L can be represented using an SSRE (without homomorphisms).

Conjecture 3. The language { (a_1, d_1) ... (a_n, d_n) | d_1 = d_n } is not SSRE definable. However, it is RA definable with one register.

Open question: can SSRE (without homomorphisms) express all languages defined by deterministic RA?

References

[BDM+11] Mikołaj Bojańczyk, Claire David, Anca Muscholl, Thomas Schwentick, and Luc Segoufin. Two-variable logic on data words. ACM Trans. Comput. Log., 12(4):27, 2011.
[BS07] Henrik Björklund and Thomas Schwentick. On notions of regularity for data languages. In Fundamentals of Computation Theory, 16th International Symposium, FCT 2007, Budapest, Hungary, August 27-30, 2007, Proceedings, pages 88–99, 2007.
[GC11] Murdoch James Gabbay and Vincenzo Ciancia. Freshness and name-restriction in sets of traces with names. In FOSSACS 2011, pages 365–380, 2011.
[HMSW11] Tony Hoare, Bernhard Möller, Georg Struth, and Ian Wehrman. Concurrent Kleene algebra and its foundations. The Journal of Logic and Algebraic Programming, 80(6):266–296, 2011. Relations and Kleene Algebras in Computer Science.
[KF94] Michael Kaminski and Nissim Francez. Finite-memory automata. Theor. Comput. Sci., 134(2):329–363, 1994.
[KMS15] Dexter Kozen, Konstantinos Mamouras, and Alexandra Silva. Completeness and incompleteness in nominal Kleene algebra. In Relational and Algebraic Methods in Computer Science - 15th International Conference, RAMiCS 2015, Braga, Portugal, September 28 - October 1, 2015, Proceedings, pages 51–66, 2015.
[Koz02] Dexter Kozen. On the complexity of reasoning in Kleene algebra. Inf. Comput., 179(2):152–162, 2002.
[NSV04] Frank Neven, Thomas Schwentick, and Victor Vianu. Finite state machines for strings over infinite alphabets. ACM Trans. Comput. Logic, 5(3):403–435, July 2004.