Part on Practice of the IJCAI 2017 Tutorial titled "Argumentation in Artificial Intelligence: From Theory to Practice", by Federico Cerutti and Mauro Vallati
Argumentation in Artificial Intelligence: From Theory to Practice (Practice)
1. Argumentation in Artificial Intelligence: From Theory to Practice
Part 2: Practice!
Federico Cerutti, Cardiff University
Mauro Vallati, University of Huddersfield
2. Table of contents
1. Assessing the State of the Art
2. Analysis of the State of the Art in Abstract Argumentation
3. Learning for Argumentation
4. How to Select a Solver
I understand what argumentation is about, and I want to use it to solve some of my problems. How do I pick the best solver(s)?
... or, how to fairly compare solvers
5. How to Select a Solver
Clearly, one may not have enough time, resources, benchmarks, or experience to run a full experimental comparison among solvers.
This is one of the reasons why standards are introduced and usually exploited.
6. Standards
First, we need to define some standard way of comparing solvers. Specifically:
• a standard language for input and output
• challenging, diverse, and representative instances to deal with (a.k.a. benchmarks)
• or, ways of creating and selecting benchmarks
The larger and more diverse the set of available benchmarks, the higher the probability that the results of the comparison are relevant for your specific set of instances and problems.
7. Something more about benchmarks
Benchmarks can be created using generators such as AFBenchGen [4, 5] or Probo [6] (see the sketch below):
• Purely randomly generated AFs
• AFs based on structured graphs:
• Watts–Strogatz [16]
• Erdős–Rényi [9]
• Barabási–Albert [1]
• Generators with a focus on stable semantics
• Generators with a focus on SCCs
Otherwise, AFs can be generated by considering "applications":
• Planning
• Wikipedia pages
• etc.
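To make the idea concrete, here is a minimal sketch of an Erdős–Rényi-style generator emitting an AF in the APX format used later in this tutorial. This is not AFBenchGen's actual code; the function name and the use of networkx are our own choices.

```python
# Minimal sketch of an Erdos-Renyi-style AF generator (not AFBenchGen itself):
# every ordered pair of arguments becomes an attack with probability p_attack.
import networkx as nx

def random_af_apx(n_args: int, p_attack: float, seed: int = 0) -> str:
    g = nx.gnp_random_graph(n_args, p_attack, seed=seed, directed=True)
    facts = [f"arg(a{v})." for v in g.nodes]
    facts += [f"att(a{u},a{v})." for u, v in g.edges]
    return "\n".join(facts)

print(random_af_apx(5, 0.3))
```

Structured generators (Watts–Strogatz, Barabási–Albert) follow the same pattern with a different networkx graph constructor.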
8. Competitions in AI: problem solved?
A standardised way of comparing solvers.
9. Can I Blindly Trust Competition Results?
NO
Ok, let me elaborate on this...
10. Sources of Performance Variation
There are various sources of performance variation that affect results.
Your settings (in a wide sense) and needs can be very different from
those used during competitions
(Sorry Ariel, not only low-level details) 8
12. Sources of Performance Variation (1)
Solver randomisation and other stochastic effects
• Many solvers take advantage of randomisation
• Very different solver trajectories
• Computationally expensive to draw a complete picture of the performance of a randomised solver [11]
Other sources: operating system, cache, shared hard drives, etc.
Instances solved across 100 runs on application benchmarks for the top 3 SAT 2014 solvers (from [11])
14. Sources of Performance Variation (2)
Running time and memory limits
• Generally, more running time or memory results in higher coverage
• Improved performance with increased limits tends not to be distributed evenly across all solvers
[Figure: coverage (number of solved instances) of IPC 2014 planners as a function of the memory limit, from 1 to 8 GB.]
IPC 2014: planners that perform extensive precomputation benefit more from increased memory limits [14]
16. Sources of Performance Variation (3)
Hardware and software environment
• Solvers are affected to varying degrees by different CPUs or other hardware elements [10]
• Java, C++ compilers, libraries, Python, linkers, etc.
[Figure: coverage of the top IPC 2014 planners under eight software-environment combinations, labelled gpj through GPJ.]
IPC 2014: coverage of top solvers w.r.t. C++, Python, and Java versions
17. Sources of Performance Variation (4)
Choice of benchmark (distribution)
• Benchmarks should be challenging (not trivial, not too hard)
• What does challenging mean? (a dynamic or static property?) [15]
• How to create them?
• How to select them?
[Figure: percentage of planners solving the instances of IPC domains such as Floortile, Transport, Openstacks, Maintenance, and Visitall.]
18. Sources of Performance Variation (5)
Ranking mechanism: the techniques for aggregating results across the set of benchmarks strongly affect competition outcomes [14]
Two main orthogonal dimensions:
• What metrics do we care about?
• Absolute vs relative ranking
• Examples: IPC score, coverage, Borda ranking, PAR10, etc.
20. Are Competitions Useful?
Don't get me wrong, competitions in AI are awesome.
• Foster the advancement of the state of the art
• Provide a large set of benchmarks
• Support standardisation
• Provide a large number of ready-to-use solvers
• Highlight issues that need to be tackled by the community (e.g., areas not receiving enough attention, lack of applications, etc.)
21. A Pinch of Salt
Results from competitions in AI cannot necessarily be easily generalised.
They refer to the considered solvers, solving the selected benchmarks,
ordered according to selected metrics, run on the specific hardware and
software configuration used during the competition.
23. IPC Score
$$\mathrm{IPC}(s, P) = \begin{cases} 0 & \text{if } P \text{ is unsolved} \\ \dfrac{1}{1 + \log_{10}\left(T_P(s)/T^{*}_{P}\right)} & \text{otherwise} \end{cases}$$
$T_P(s)$ denotes the time needed by solver $s$ to solve $P$; $T^{*}_{P}$ is the minimum amount of time required by any considered solver to solve $P$.
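The definition translates directly into code. The following one-function sketch is our own, with `runtime=None` marking an unsolved instance:

```python
import math

def ipc_score(runtime, best_runtime):
    """IPC score for one instance; runtime is None if the solver failed."""
    if runtime is None:
        return 0.0
    # best_runtime is the fastest time of any considered solver on this instance
    return 1.0 / (1.0 + math.log10(runtime / best_runtime))
```

A solver matching the best time scores 1; slower solvers score between 0 and 1.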
24. PAR10 score
Penalised Average Runtime 10.
$$\mathrm{PAR10}(s, P) = \begin{cases} 10 \cdot T & \text{if } P \text{ is unsolved} \\ t_P(s) & \text{otherwise} \end{cases}$$
$T$ indicates the considered timeout; $t_P(s)$ denotes the time needed by solver $s$ to solve $P$.
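PAR10 is even simpler to compute. Again a sketch of our own; note that, unlike the IPC score, lower PAR10 is better:

```python
def par10_score(runtime, timeout):
    """PAR10 for one instance; unsolved runs cost ten times the timeout."""
    return 10.0 * timeout if runtime is None else runtime
```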
25. ICCMA 2015 (1)
Four Semantics:
• complete (CO)
• preferred (PR)
• grounded (GR)
• stable (ST)
Four computational tasks:
• determine some extension (SE)
• determine all extensions (EE)
• decide whether a given argument is contained in some extension
(DC)
• decide whether a given argument is contained in all extensions (DS)
26. ICCMA 2015 (2)
18 solvers, tested on 192 AFs.
10 minutes and 4 GB of RAM for solving a task.
1 point for each solved instance (used for in-track ranking).
General ranking done using the Borda score (sketched below).
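The exact tie-handling used at ICCMA 2015 is not detailed here, but a basic Borda aggregation over per-track rankings looks like the following sketch (written under that caveat):

```python
# Borda aggregation sketch: in each track, a solver ranked at position i
# (0 = best) among n solvers earns n - 1 - i points; points are summed.
from collections import defaultdict

def borda(track_rankings):
    points = defaultdict(int)
    for ranking in track_rankings:
        n = len(ranking)
        for pos, solver in enumerate(ranking):
            points[solver] += n - 1 - pos
    return sorted(points.items(), key=lambda kv: -kv[1])

# Two tracks over three solvers; A and B tie on 3 points:
print(borda([["A", "B", "C"], ["B", "A", "C"]]))
```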
27. Main Classes of Solvers
Solvers that took part in ICCMA 2015 can be (roughly) classified as
• reduction-based approaches: the argumentation problem is
encoded as a known problem such as SAT, ASP, MAX-SAT, etc.
• Can exploit availability of well-engineered solvers and established
techniques.
• direct approaches: the argumentation problem is tackled directly.
31. State of the Art
• It is not the case that reduction-based solvers always outperform non-reduction-based systems;
• State-of-the-art solvers show a high level of complementarity (especially those able to deal with EE-PR problems), and are thus well suited to being combined in portfolios.
33. Parallelising the Reasoning Process
Quick and clean solution: run multiple solvers in parallel (see the sketch below).
Strengths
• Easy to implement
• Low communication overhead
Weaknesses
• No information shared among the solvers
• Does not make it possible to solve instances that are too large for sequential solvers
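A minimal sketch of this strategy: launch each solver binary on the same instance and return the first answer. The solver command lines and the 600 s timeout are placeholders; in production the slower processes should also be killed rather than left to run out their timeout.

```python
# Run several solver binaries on the same instance; first answer wins.
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_solver(cmd, timeout=600):
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout).stdout

def parallel_portfolio(solver_cmds):
    pool = ThreadPoolExecutor(max_workers=len(solver_cmds))
    futures = [pool.submit(run_solver, cmd) for cmd in solver_cmds]
    answer = None
    for fut in as_completed(futures):
        try:
            answer = fut.result()   # first solver to terminate successfully
            break
        except Exception:
            continue                # crashed or timed out: wait for the others
    pool.shutdown(wait=False)       # do not wait for the slower solvers
    return answer

# Example (hypothetical solver invocations):
# parallel_portfolio([["solverA", "-p", "EE-PR", "-f", "af.apx"],
#                     ["solverB", "-p", "EE-PR", "-f", "af.apx"]])
```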
34. Parallelising the Reasoning Process
Example: P-SCC-REC [7], for enumerating preferred extensions in large AFs.
It leverages the notion of Strongly Connected Components and the extension-based semantics definition schema SCC-recursiveness [2].
35. P-SCC-REC: idea
Creation of the SCC-tree structure: {S1, S2}, {S3}, where S1 = {c, d}, S2 = {e, f}, and S3 = {g, h}.
[Figure: example AF over arguments a–h; S1 and S2 lie at level 1 of the SCC tree, S3 at level 2.]
39. What does “Learning” Mean?
I have a set of AFs that I want to analyse, I know the problem I am working on, and I have picked a solver that works decently.
...but, in order to deploy the system, I need it to be faster.
Let's learn something then.
45. Which Kind of Knowledge?
• Combination and Selection of solvers
• Configuration of solvers
• Configuration (Reformulation) of AFs
Here we focus on knowledge that can be automatically extracted.
46. Combining and Selecting Solvers
(Solver selection can be seen as a particular case of portfolio
configuration)
• Static: the same portfolio is used for analysing any AF
• Dynamic: portfolio is configured according to some characteristics of
the AF
48. Static Portfolio
Defined by:
1. the selected solvers;
2. the order in which solvers will be run; and
3. the runtime allocated to each solver.
49. Static Portfolio: Approaches
In [8] two approaches were proposed (FDSS is sketched below):
Shared-k
Each component solver is allocated maxRuntime/k seconds. Solvers are selected and ordered according to their overall PAR10.
FDSS
Starting from an empty portfolio, we iteratively add either a new component solver or extend the CPU-time allocated to a solver already in the portfolio, depending on which choice maximises the improvement of the portfolio's PAR10 score.
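The following sketch captures the greedy idea behind FDSS; it is our own paraphrase of [8], with `score` a placeholder that evaluates a candidate schedule on training instances (higher being better), and extending a solver modelled as adding it another timeslice.

```python
# Greedy FDSS-style construction: grow the schedule in fixed timeslices,
# always taking the (solver, timeslice) addition that improves the
# training score the most.
def greedy_portfolio(solvers, max_runtime, step, score):
    schedule = []                             # list of (solver, seconds)
    while sum(t for _, t in schedule) + step <= max_runtime:
        candidates = [schedule + [(s, step)] for s in solvers]
        best = max(candidates, key=score)
        if score(best) <= score(schedule):    # no candidate helps any more
            break
        schedule = best
    return schedule
```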
51. Dynamic Portfolio
For each AF, a vector of features is computed.
Similar instances should have similar feature vectors.
Portfolios are configured using empirical performance models
52. Dynamic Portfolio: Features
Features can be extracted from different representations of an AF [3], e.g., the directed graph representation (see the sketch below).
• Graph size features: number of vertices, number of edges, the ratio vertices/edges and its inverse, and graph density
• Degree features: average, standard deviation, maximum, and minimum degree values across the nodes in the graph
• SCC features: number of SCCs; average, standard deviation, maximum, and minimum SCC size
• Graph structure: presence of self-loops, number of isolated vertices, etc.
Similarly, features can be extracted by considering the undirected graph or the matrix representation.
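A sketch of the directed-graph features above, computed with networkx. The feature names are our own and [3] uses a richer set; the function assumes a non-empty AF.

```python
# Directed-graph features of an AF, in the spirit of [3].
import statistics
import networkx as nx

def af_features(g: nx.DiGraph) -> dict:
    degrees = [d for _, d in g.degree()]
    scc_sizes = [len(c) for c in nx.strongly_connected_components(g)]
    return {
        "vertices": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "density": nx.density(g),
        "degree_avg": statistics.mean(degrees),
        "degree_std": statistics.pstdev(degrees),
        "degree_max": max(degrees),
        "degree_min": min(degrees),
        "scc_count": len(scc_sizes),
        "scc_max": max(scc_sizes),
        "scc_min": min(scc_sizes),
        "self_loops": nx.number_of_selfloops(g),
        "isolated_vertices": sum(1 for _ in nx.isolates(g)),
    }
```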
53. Dynamic Portfolio: Approaches
Classification-based
Classify
Classifies a given AF into a single category, corresponding to the single solver predicted to be the fastest, and allocates it all the available CPU-time.
Regression-based
1-Regression
Given the predicted runtime of each solver, the solver predicted to be the fastest is selected and is allocated all the available CPU-time.
M-Regression
Initially we select the solver predicted to be the fastest, but allocate only its predicted CPU-time +10%. If the solver does not solve the given AF in the allocated time, it is stopped and is no longer available for selection, and the process iterates by selecting a different solver (sketched below).
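A sketch of the M-Regression loop. Here `predict` stands for the learned empirical performance model and `run` for launching a solver with a time budget (returning the answer, or None on timeout); both are placeholders.

```python
# M-Regression scheduling sketch: repeatedly pick the solver predicted
# fastest, give it its predicted runtime +10%, and drop it on failure.
def m_regression(solvers, features, predict, run, budget):
    remaining = list(solvers)
    while remaining and budget > 0:
        best = min(remaining, key=lambda s: predict(s, features))
        slot = min(1.1 * predict(best, features), budget)
        answer = run(best, slot)
        if answer is not None:
            return answer
        remaining.remove(best)        # a failed solver is never re-selected
        budget -= slot
    return None
```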
57. Configuration of Algorithms
Solvers can be configured to improve performance on a class of problems
/ instances.
Image taken from [13].
58. Configuration of Algorithms
There exist several configuration approaches, based on different underlying ideas.
For the sake of this talk, we focus on SMAC [12], used for configuring ArgSemSAT.
Image taken from [12].
60. Configuration of the Framework
Order arguments/attacks according to:
1. The number of attacks received;
2. The number of attacks to other arguments;
3. The presence of self-attacks;
4. The difference between the number of received attacks and the number of attacks to other arguments;
5. Being an argument in a mutual attack.
In addition, arguments can be listed in direct or inverse order.
The orderings of arguments and attacks are independent.
61. Configuration of the Framework (2)
[Figure: AF in which a1 and a3 attack each other, a3 attacks a2, and a2 attacks itself.]
Original encoding:
arg(a1).
arg(a2).
arg(a3).
att(a1,a3).
att(a2,a2).
att(a3,a1).
att(a3,a2).
Reordered encoding:
arg(a2).
arg(a3).
arg(a1).
att(a2,a2).
att(a3,a2).
att(a3,a1).
att(a1,a3).
The list of arguments is ordered according to the number of received attacks and, subsequently, the number of outgoing attacks; the list of attacks is ordered prioritising self-attacks and, subsequently, the number of outgoing attacks (see the sketch below).
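A sketch reproducing this re-ordering. Ties between attacks are broken arbitrarily here, as the stated criteria do not fix them; the function names are our own.

```python
# Re-order APX facts: arguments by (received attacks, outgoing attacks),
# both descending; attacks with self-attacks first, then by the number of
# the attacker's outgoing attacks, descending.
def reorder_apx(args, atts):
    received = {a: sum(t == a for _, t in atts) for a in args}
    outgoing = {a: sum(s == a for s, _ in atts) for a in args}
    args_sorted = sorted(args, key=lambda a: (received[a], outgoing[a]),
                         reverse=True)
    atts_sorted = sorted(atts, key=lambda e: (e[0] != e[1], -outgoing[e[0]]))
    return ([f"arg({a})." for a in args_sorted] +
            [f"att({s},{t})." for s, t in atts_sorted])

print("\n".join(reorder_apx(
    ["a1", "a2", "a3"],
    [("a1", "a3"), ("a2", "a2"), ("a3", "a1"), ("a3", "a2")])))
```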
62. Parametrisation
Parameter Domain Default
args ingoingFirst [-1.0,1.0] 0
args outgoingFirst [-1.0,1.0] 0.2
args autoFirst [-1.0,1.0] -1
args eachOther [-1.0,1.0] -1
args differenceFirst [-1.0,1.0] -1
atts ingoingFirst [-1.0,1.0] 0
atts outgoingFirst [-1.0,1.0] 0
atts autoFirst [-1.0,1.0] 0.2
atts eachOther [-1.0,1.0] 0
atts differenceFirst [-1.0,1.0] 0
atts orders {0,1,2,3,4} 0
0 Same ordering applied to the first argument of the attack pair
1 Same ordering applied to the second argument of the attack pair
2 Inverse ordering applied to the first argument of the attack pair
3 Inverse ordering applied to the second argument of the attack pair
4 Attack-specific ordering
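SMAC itself builds a model of the configuration space while searching it. As a much simpler stand-in, the sketch below does plain random search over the parameter space of this table; the parameter names follow the table with underscores added, and `evaluate` is a placeholder returning a training-set score (higher is better).

```python
# Random-search configurator over the ArgSemSAT-style parameter space
# (a naive stand-in for SMAC's model-based optimisation).
import random

REAL_PARAMS = [
    "args_ingoingFirst", "args_outgoingFirst", "args_autoFirst",
    "args_eachOther", "args_differenceFirst",
    "atts_ingoingFirst", "atts_outgoingFirst", "atts_autoFirst",
    "atts_eachOther", "atts_differenceFirst",
]

def random_config():
    cfg = {p: random.uniform(-1.0, 1.0) for p in REAL_PARAMS}
    cfg["atts_orders"] = random.choice([0, 1, 2, 3, 4])
    return cfg

def configure(evaluate, n_samples=100):
    return max((random_config() for _ in range(n_samples)), key=evaluate)
```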
67. Learning for Argumentation: Summarising
Exploiting additional knowledge can help argumentation reasoners to
improve their runtime performance.
3 main approaches analysed so far:
• Portfolio / Algorithm Selection
• Algorithm Configuration
• Model Reformulation
69. References I
[1] A. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[2] P. Baroni and M. Giacomin. A general recursive schema for argumentation semantics. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 783–787, 2004.
[3] F. Cerutti, M. Giacomin, and M. Vallati. Algorithm selection for preferred extensions enumeration. In Computational Models of Argument - Proceedings of COMMA, pages 221–232, 2014.
70. References II
[4] F. Cerutti, M. Giacomin, and M. Vallati. Generating challenging benchmark AFs. In Computational Models of Argument - Proceedings of COMMA, pages 457–458, 2014.
[5] F. Cerutti, M. Giacomin, and M. Vallati. Generating challenging benchmark AFs: AFBenchGen2. In Computational Models of Argument - Proceedings of COMMA, 2016.
[6] F. Cerutti, N. Oren, H. Strass, M. Thimm, and M. Vallati. A benchmark framework for a computational argumentation competition. In Computational Models of Argument - Proceedings of COMMA, pages 459–460, 2014.
71. References III
[7] F. Cerutti, I. Tachmazidis, M. Vallati, S. Batsakis, M. Giacomin, and G. Antoniou. Exploiting parallelism for hard problems in abstract argumentation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 1475–1481, 2015.
[8] F. Cerutti, M. Vallati, and M. Giacomin. Where are we now? State of the art and future trends of solvers for hard argumentation problems. In Computational Models of Argument - Proceedings of COMMA, pages 207–218, 2016.
[9] P. Erdős and A. Rényi. On random graphs. I. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
72. References IV
[10] A. E. Howe and E. Dahlman. A critical assessment of benchmark comparison in planning. J. Artif. Intell. Res. (JAIR), 17:1–33, 2002.
[11] B. Hurley and B. O'Sullivan. Statistical regimes and runtime prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, pages 318–324, 2015.
[12] F. Hutter, H. H. Hoos, K. Leyton-Brown, and K. P. Murphy. Time-bounded sequential parameter optimization. In Learning and Intelligent Optimization, 4th International Conference, LION, pages 281–298, 2010.
[13] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle. ParamILS: an automatic algorithm configuration framework. J. Artif. Intell. Res. (JAIR), 36:267–306, 2009.
73. References V
[14] C. Linares López, S. J. Celorrio, and A. G. Olaya. The deterministic part of the seventh international planning competition. Artif. Intell., 223:82–119, 2015.
[15] M. Vallati and T. Vaquero. Towards a protocol for benchmark selection in IPC. In Proceedings of the 4th Workshop on the International Planning Competition (WIPC), 2015.
[16] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.