1. Combinatory Logic and Language Engineering
Ismail Biskri, Adam Joly and Boucif Amar Bensaber
LAMIA, Université du Québec à Trois-Rivières
2. Introduction (1/3)
Language engineering:
Everything related to NLP and knowledge extraction
Main goal: help humans access the knowledge contained in texts
Definition:
The study and description of the concepts, approaches, methods and techniques that allow data extraction, knowledge modeling and knowledge acquisition from texts
Knowledge acquisition from texts needs to be assisted by corpus analysis tools, such as:
Semantic or syntactic analyzers
Marker tracking tools supported by contextual exploration
Statistical analyzers
Etc.
Numerous application fields have appeared since the development of the Web and office tools
2 Biskri, Joly & Amar Bensaber, ICGST 2011
3. Introduction (2/3)
Many generations of tools:
At the beginning (about 40 years ago):
Applications focusing on a single functionality
Since the 1990s:
Industry requires more complex approaches to text analysis
There is interest in assembling functions and operations into complex processing chains (Hallab et al., 2000; Moscarola et al., 2002)
Most of the tools proposed then offer various functionalities
Despite some success with scientists and industry, these tools have important limitations:
The technologies offer a closed and limited set of functionalities
They are designed as autonomous entities that are difficult or impossible to integrate into more complex processing chains
They can be unusable by researchers with particular analysis needs (lack of adaptability)
4. Introduction (3/3)
Recently, a new generation of software platforms for language engineering
has started to emerge
Statistical analysis:
Aladin (Seffah et al., 1995)
T2K and Knime (Warr, 2007)
Linguistic analysis:
Context (Crispino et al., 1999)
Gate (Cunningham et al., 2002)
These new platforms bring renewed interest in processing chains, in particular in:
Their coherence
Their flexibility
Their adaptability
Etc.
5. General Framework
Processing chain:
Integrated sequence of computational modules dedicated to specific processing,
assembled in a pertinent order according to a processing goal determined by the
language engineer
A module performs an operation that applies to one or more objects of a given type and returns objects of another type
A processing chain allows modules to be composed
We need a formal system that can answer two fundamental questions:
Given a set of modules, which arrangements lead to coherent processing chains?
Given a coherent processing chain, how can we automate (as much as possible) its evaluation (in the sense of its computability)?
Such a system will be at the center of our theoretical model
6. General Framework
Theoretical framework chosen: Applicative Grammars (Desclés, 1990; Shaumyan, 1998)
Instead of designing a rewriting grammar for the syntactic validation of the processing chain, we use a typed logic.
Types are assigned to inputs and outputs (integer, char, …)
Types constrain the possible compositions of modules
Main advantages of this formalism:
Ensures strict compositionality of the modules in the different processing chains, by validating the types assigned to the modules
Allows an unbounded number of modules to be composed
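How types constrain composition can be illustrated with a minimal Python sketch. The `Module` record, the type names (`text`, `tokens`, `frequencies`, `chart`) and both helper functions are hypothetical; they only stand in for the applicative types of the actual formalism, which the slides do not detail.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Module:
    """A module described only by its name and its input/output types."""
    name: str
    inputs: Tuple[str, ...]
    output: str


def can_feed(producer: Module, consumer: Module, slot: int = 0) -> bool:
    """True if the producer's output type matches the consumer's input slot."""
    return slot < len(consumer.inputs) and producer.output == consumer.inputs[slot]


def is_coherent_serial_chain(chain: Sequence[Module]) -> bool:
    """Check every adjacent pair of a serial chain (first input slot only)."""
    return all(can_feed(a, b) for a, b in zip(chain, chain[1:]))


# Hypothetical modules; the type names are illustrative assumptions.
tokenizer = Module("tokenizer", ("text",), "tokens")
counter = Module("counter", ("tokens",), "frequencies")
plotter = Module("plotter", ("frequencies",), "chart")

ok_chain = is_coherent_serial_chain([tokenizer, counter, plotter])
bad_chain = is_coherent_serial_chain([tokenizer, plotter])
```

Here `ok_chain` is accepted because each output type matches the next input type, while `bad_chain` is rejected because a `tokens` output cannot feed a `frequencies` input.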
7. Combinatory Logic
Combinator   Role           β-reduction rule
B            Composition    B x y z → x (y z)
C            Permutation    C x z y → x y z
Φ            Distribution   Φ x y z u → x (y u) (z u)
W            Duplication    W x y → x y y
From the work of Schönfinkel (1924) and Curry and Feys (1958)
Eliminates the need for variables in mathematics
Combinators:
Abstract operators that apply to other operators in order to build more complex operators
Act as functions over arguments, in an operator-operand structure
Each specific action is represented by a unique rule that defines the equivalence between a logical expression with the combinator and one without it (β-reduction rule)
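The four reduction rules on this slide can be tried out directly by encoding each combinator as a curried Python function. This is only an illustrative sketch: the arithmetic helpers (`add`, `sub`, `double`, `inc`) are assumptions introduced for the example and are not part of the original material.

```python
# Each combinator takes its arguments one at a time (curried form).
B = lambda x: lambda y: lambda z: x(y(z))                   # B x y z -> x (y z)
C = lambda x: lambda y: lambda z: x(z)(y)                   # C x y z -> x z y
PHI = lambda x: lambda y: lambda z: lambda u: x(y(u))(z(u)) # PHI x y z u -> x (y u) (z u)
W = lambda x: lambda y: x(y)(y)                             # W x y -> x y y

# Hypothetical operators used only to observe the combinators at work.
add = lambda a: lambda b: a + b
sub = lambda a: lambda b: a - b
double = lambda a: 2 * a
inc = lambda a: a + 1

b_result = B(double)(inc)(3)           # double(inc(3))
c_result = C(sub)(1)(10)               # sub(10)(1): arguments swapped
phi_result = PHI(add)(double)(inc)(3)  # add(double(3))(inc(3))
w_result = W(add)(5)                   # add(5)(5): argument duplicated
```

The rule for C is written here with the variable names in the usual order; it is the same rule as `C x z y → x y z` up to renaming.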
8. Combinatory Logic (2/3)
Complex combinators:
We can recursively combine elementary combinators to form an infinite range of complex combinators
The global action is determined by the successive application of the combinators (from left to right)
Example:
i. B B C x y z u v
ii. B (C x) y z u v
iii. C x (y z) u v
iv. x u (y z) v
Power combinators (χⁿ):
Repeat the action of the combinator χ n times
Distance combinators (χₙ):
Postpone the action of the combinator χ by n steps
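The four-step example on this slide can be replayed mechanically with a small symbolic reducer. The sketch below is an assumption-laden illustration, not the authors' implementation: terms are nested pairs, only the leftmost (head) combinator is reduced, and all function names are hypothetical.

```python
# A term is an atom (str) or an application encoded as a pair (function, argument).
RULES = {
    "B": (3, lambda x, y, z: (x, (y, z))),
    "C": (3, lambda x, y, z: ((x, z), y)),
    "W": (2, lambda x, y: ((x, y), y)),
    "PHI": (4, lambda x, y, z, u: ((x, (y, u)), (z, u))),
}


def app(*terms):
    """Left-associative application: app(f, a, b) encodes ((f a) b)."""
    result = terms[0]
    for term in terms[1:]:
        result = (result, term)
    return result


def spine(term):
    """Split a term into its head atom and its list of arguments."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    return term, args[::-1]


def step(term):
    """Perform one head reduction, or return None if none applies."""
    head, args = spine(term)
    if head in RULES:
        arity, rule = RULES[head]
        if len(args) >= arity:
            return app(rule(*args[:arity]), *args[arity:])
    return None


def show(term):
    """Print a term with parentheses only around compound arguments."""
    head, args = spine(term)
    parts = [head]
    for arg in args:
        parts.append("(" + show(arg) + ")" if isinstance(arg, tuple) else arg)
    return " ".join(parts)


def trace(term):
    """Return the printed form of every term visited during head reduction."""
    steps = [show(term)]
    nxt = step(term)
    while nxt is not None:
        term = nxt
        steps.append(show(term))
        nxt = step(term)
    return steps


steps = trace(app("B", "B", "C", "x", "y", "z", "u", "v"))
```

Printing `steps` line by line reproduces expressions i to iv of the example.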
9. Combinatory Logic (3/3)
Combinatory logic fulfils two major goals:
It gives an interoperable and formal representation of the solution:
Combinatory logic expressions formally represent the composition of the modules of the processing chain and give the direct execution order
Combinators provide operators to support the different types of interactions between modules:
B: expresses the composition of 2 interconnected modules
C: ensures that all combinators and modules of the expression appear together on the left and all inputs on the right (ordering)
Φ: distributes the same input to 2 or more different modules
10. Processing Chains
Our model builds systems using metaprogramming:
The metaprograms act as controllers over the programs (modules) by specifying the interactions between modules and their execution flow
The goal is to be able to easily replace a module with another one that has compatible inputs and outputs
Module:
It acts like a mathematical function:
It takes arguments as inputs
It performs a specific action
It returns a result as output
Each module is independent (black box: we know what it does but are not interested in how)
It must be able to communicate with other modules following a protocol
11. Processing Chains (2/2)
A controller supervises the flow of communication:
It verifies the validity of connections between modules (whether the processing chain is syntactically correct)
It determines the execution order of modules (following the combinatory
expression)
It triggers the execution of a module (one at a time only)
[Figure: Controller 1 supervising processing chain 1 (modules M1 and M2/C3, inputs I1 to I3, outputs O1 and O2) and processing chain 2 (modules M1, M2, M3/C2 and M4, inputs I1 to I5, outputs O1 to O4)]
By abstraction, a processing chain (the controller and its modules) can itself be considered as a (super or meta) module
Thus it can be used as a module in another processing chain
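The controller's three duties and the idea of a chain acting as a meta-module can be sketched in Python. This is a loose illustration under stated assumptions: the execution order is derived here with a topological sort from the standard `graphlib` module (Python 3.9+) rather than from a combinatory expression as in the model, and `run_chain`, the wiring format and the example modules are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_chain(modules, wiring, inputs):
    """Run each module once, in an order compatible with the wiring.

    modules: module name -> callable
    wiring:  module name -> list of sources (module names or input names)
    inputs:  input name -> value
    """
    # Validity check: every source must be a known module or a supplied input.
    for name, sources in wiring.items():
        for source in sources:
            if source not in modules and source not in inputs:
                raise ValueError(f"{name} is wired to unknown source {source}")

    # Only module-to-module links constrain the execution order.
    dependencies = {
        name: [s for s in sources if s in modules]
        for name, sources in wiring.items()
    }

    values = dict(inputs)
    for name in TopologicalSorter(dependencies).static_order():
        arguments = [values[s] for s in wiring[name]]
        values[name] = modules[name](*arguments)  # one module at a time
    return values


# Inner chain: split a text into tokens, then count them.
inner_modules = {"M1": str.split, "M2": len}
inner_wiring = {"M1": ["I1"], "M2": ["M1"]}


def chain_1(text):
    """The whole inner chain, exposed as a single (meta) module."""
    return run_chain(inner_modules, inner_wiring, {"I1": text})["M2"]


# Outer chain: uses chain_1 as its first module.
outer = run_chain(
    {"M1": chain_1, "M2": lambda n: n * 2},
    {"M1": ["I1"], "M2": ["M1"]},
    {"I1": "a b c"},
)
```

Because `chain_1` has the same shape as any other module (inputs in, one output out), the outer controller does not need to know that it wraps a complete chain.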
12. Basic Processing Chains (1 module)
[Figure: module M1 with a single input I1 and output O1; module M1 with n inputs I1, I2, …, In and output O1]
1 input:
No combinator needed
O1 is obtained by applying M1 to I1
O1 = M1 I1
n inputs:
We add the inputs at the end of the expression
O1 = M1 I1 I2 … In
13. Serial processing chains
Relation of composition between modules (B)
2 connected modules:
O1 = M1 I1
O2 = M2 I2
I2 = O1
O2 = M2 (M1 I1)
O2 = B M2 M1 I1
3 connected modules:
O3 = M3 I3
I3 = O2
O3 = M3 (B M2 M1 I1)
O3 = B³ M3 B M2 M1 I1
O3 = C B³ B M3 M2 M1 I1
4 connected modules: O4 = C B⁴ (C B³ B) M4 M3 M2 M1 I1
(…)
The power of B is induced by the number of modules in the chain
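The serial expressions above can be checked by running them. The sketch below assumes the standard definition of power combinators from Curry and Feys (X¹ = X, Xⁿ⁺¹ = B X Xⁿ), which is consistent with the derivation on this slide; the four text-processing modules are hypothetical placeholders.

```python
# Elementary combinators as curried functions.
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)


def power(combinator, n):
    """Power combinator: X^1 = X and X^(n+1) = B X X^n."""
    result = combinator
    for _ in range(n - 1):
        result = B(combinator)(result)
    return result


# Hypothetical single-input modules.
M1 = lambda text: text.split()                    # text -> tokens
M2 = lambda tokens: [t.lower() for t in tokens]   # tokens -> normalized tokens
M3 = lambda tokens: sorted(set(tokens))           # tokens -> vocabulary
M4 = len                                          # vocabulary -> size

I1 = "The cat saw the other cat"

# O2 = B M2 M1 I1
two = B(M2)(M1)(I1)
# O3 = C B^3 B M3 M2 M1 I1
three = C(power(B, 3))(B)(M3)(M2)(M1)(I1)
# O4 = C B^4 (C B^3 B) M4 M3 M2 M1 I1
four = C(power(B, 4))(C(power(B, 3))(B))(M4)(M3)(M2)(M1)(I1)
```

Each combinatory expression gives the same value as the directly nested call, for example `three == M3(M2(M1(I1)))`, with all modules on the left and the input on the right.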
[Figure: serial processing chains with two modules (M1, M2) and with three modules (M1, M2, M3), each output feeding the input of the next module]
14. Parallel processing chains
Contain modules that have several inputs
Module connected on the 1st input of a 2nd module:
O2 = M2 I2 I3
O1 = M1 I1
I2 = O1
O2 = M2 (M1 I1) I3
O2 = B M2 M1 I1 I3
2 modules connected to a 3rd module:
O3 = M3 I3 I4
I3 = M1 I1
I4 = M2 I2
O3 = M3 (M1 I1) (M2 I2)
O3 = B M3 M1 I1 (M2 I2)
O3 = C₂ B M3 M1 (M2 I2) I1
O3 = B₃ C₂ B M3 M1 M2 I2 I1
3 modules connected to a 4th module: B₇ C₆ C₆ B₃ C₂ B M4 M1 M2 M3 I3 I2 I1
(…)
The distance of combinators B and C can be induced by the number of modules
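The parallel expressions can be checked in the same way. The sketch below assumes the standard definition of deferred (distance) combinators, X₀ = X and Xₙ₊₁ = B Xₙ, which reproduces each step of the derivation above; the modules are hypothetical, and multi-input modules are written in curried form.

```python
# Elementary combinators as curried functions.
B = lambda x: lambda y: lambda z: x(y(z))
C = lambda x: lambda y: lambda z: x(z)(y)


def distance(combinator, n):
    """Distance combinator: X_0 = X and X_(n+1) = B X_n."""
    result = combinator
    for _ in range(n):
        result = B(result)
    return result


B3, B7 = distance(B, 3), distance(B, 7)
C2, C6 = distance(C, 2), distance(C, 6)

# Hypothetical single-input modules.
tokenize = lambda text: text.split()
shout = lambda text: text.upper()
size = len

# Hypothetical multi-input modules, curried.
pair = lambda a: lambda b: (a, b)
triple = lambda a: lambda b: lambda c: (a, b, c)

I1, I2, I3 = "a b", "x", "abc"

# O2 = B M2 M1 I1 I3: M1 feeds the first input of M2.
one_link = B(pair)(tokenize)(I1)(I3)
# O3 = B3 C2 B M3 M1 M2 I2 I1: M1 and M2 both feed M3.
two_links = B3(C2)(B)(pair)(tokenize)(shout)(I2)(I1)
# B7 C6 C6 B3 C2 B M4 M1 M2 M3 I3 I2 I1: three modules feed M4.
three_links = B7(C6)(C6)(B3)(C2)(B)(triple)(tokenize)(shout)(size)(I3)(I2)(I1)
```

In each case the result equals the nested form, for example `two_links == pair(tokenize(I1))(shout(I2))`, which mirrors `M3 (M1 I1) (M2 I2)`.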
[Figure: parallel processing chains: M1 feeding the first input of M2 (inputs I1 and I3, outputs O1 and O2), and M1 and M2 both feeding M3 (inputs I1 and I2, output O3)]
16. SATIM
Following these formalisms and principles, we have implemented a prototype
(work in progress) named SATIM.
SATIM: « Système d’Analyse et de Traitement de l’Information Multidimensionnelle » (Multidimensional Data Analysis and Processing System)
The architecture of this modular platform provides 3 levels of interaction with a language engineer:
1. Workshop:
Contains various modules, procedures and functions and their assigned applicative categories
Modules can be added to or deleted from a « database » of modules
2. Laboratory:
Allows an engineer to build a processing chain and adjust it through tests, according to the objective
3. Application:
It is the output of the previous level: the processing chain becomes standalone software that contains a coherent and well-organized subset of modules
17. Conclusion
We are at the prototype and testing stage
Eventually, it will become the full-size project within which we aspire to design tools for language engineering and other tools for NLP in general
The strong foundations (formalism and principles) at the heart of SATIM are meant to address the need for coherence, flexibility, adaptability and easy communication between programs (processing chains):
Modules are independent: we can easily replace a module with another one that has compatible inputs and outputs to change parts of a given program
We believe that the approach could help research teams collaborate by sharing components
Editor's Notes
The composition combinator B combines two operators x and y to form the complex operator B x y, which acts on an operand z according to the β-reduction rule. The permutation combinator C uses an operator x to build the complex operator C x such that, if x acts on the operands y and z, C x acts on those operands in the reverse order, that is, z and y. Given the three operators x, y and z and the operand u, the distribution combinator Φ distributes the operand u to the two operators y and z, whose results are then passed to x. Finally, given the binary operator x and the operand y, the combinator W duplicates y so that the operator x receives its two arguments.