Part on Practice of the IJCAI 2017 Tutorial titled "Argumentation in Artificial Intelligence: From Theory to Practice", by Federico Cerutti and Mauro Vallati
Argumentation in Artificial Intelligence: From Theory to Practice (Practice)
1. Argumentation in Artificial Intelligence: From Theory to Practice
Part 2: Practice!
Federico Cerutti, Cardiff University
Mauro Vallati, University of Huddersfield
2. Table of contents
1. Assessing the State of the Art
2. Analysis of the State of the Art in Abstract Argumentation
3. Learning for Argumentation
4. How to Select a Solver
I understand what argumentation is about, and I want to use it to solve some of my problems. How do I pick the best solver(s)?
... or, how to fairly compare solvers
5. How to Select a Solver
Clearly, one may not have enough time, resources, benchmarks, or experience to run a full experimental comparison among solvers.
This is one of the reasons why standards are introduced and usually exploited.
6. Standards
First, we need to define some standard way of comparing solvers. Specifically:
• a standard language for input and output
• challenging, diverse, and representative instances to deal with (a.k.a. benchmarks)
• or, ways of creating and selecting benchmarks
The larger and more diverse the set of available benchmarks, the higher the probability that the results of the comparison are relevant for your specific set of instances and problems.
7. Something more about benchmarks
Benchmarks can be created using generators such as AFBenchGen [4, 5] or Probo [6] (see the sketch below):
• Purely randomly generated AFs
• AFs based on structured graphs:
• Watts–Strogatz [16]
• Erdős–Rényi [9]
• Barabási–Albert [1]
• Generators with a focus on stable semantics
• Generators with a focus on SCCs
Otherwise, AFs can be generated by considering "applications":
• Planning
• Wikipedia pages
• etc.
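To make the idea concrete, here is a minimal sketch of an Erdős–Rényi-style generator emitting an AF in the APX format used later in this tutorial. This is not AFBenchGen's actual code; the function name and the use of networkx are our own choices.

```python
# Minimal sketch of an Erdos-Renyi-style AF generator (not AFBenchGen itself):
# every ordered pair of arguments becomes an attack with probability p_attack.
import networkx as nx

def random_af_apx(n_args: int, p_attack: float, seed: int = 0) -> str:
    g = nx.gnp_random_graph(n_args, p_attack, seed=seed, directed=True)
    facts = [f"arg(a{v})." for v in g.nodes]
    facts += [f"att(a{u},a{v})." for u, v in g.edges]
    return "\n".join(facts)

print(random_af_apx(5, 0.3))
```

Structured generators (Watts–Strogatz, Barabási–Albert) follow the same pattern with a different networkx graph constructor.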
8. Competitions in AI: problem solved?
A standardised way of comparing solvers.
9. Can I Blindly Trust Competition Results?
NO
Ok, let me elaborate on this...
10. Sources of Performance Variation
There are various sources of performance variation that affect results.
Your settings (in a wide sense) and needs can be very different from
those used during competitions
(Sorry Ariel, not only low-level details) 8
12. Sources of Performance Variation (1)
Solver randomisation and other stochastic effects
• Many solvers take advantage of randomisation
• Very different solver trajectories
• Computationally expensive to draw a complete picture of the performance of a randomised solver [11]
Other sources: operating system, cache, shared hard drives, etc.
Instances solved across 100 runs on application benchmarks for the top 3 SAT 2014 solvers (from [11])
14. Sources of Performance Variation (2)
Running time and memory limits
• Generally, more running time or memory results in higher coverage
• Improved performance with increased limits tends not to be distributed evenly across all solvers
[Figure: coverage (number of solved instances) of IPC 2014 planners as a function of the memory limit, from 1 to 8 GB.]
IPC 2014: planners that perform extensive precomputation benefit more from increased memory limits [14]
16. Sources of Performance Variation (3)
Hardware and software environment
• Solvers are affected to varying degrees by different CPUs or other hardware elements [10]
• Java, C++ compilers, libraries, Python, linkers, etc.
[Figure: coverage of the top IPC 2014 planners under eight software-environment combinations, labelled gpj through GPJ.]
IPC 2014: coverage of top solvers w.r.t. C++, Python, and Java versions
17. Sources of Performance Variation (4)
Choice of benchmark (distribution)
• Benchmarks should be challenging (not trivial, not too hard)
• What does challenging mean? (a dynamic or static property?) [15]
• How to create them?
• How to select them?
[Figure: percentage of planners solving the instances of IPC domains such as Floortile, Transport, Openstacks, Maintenance, and Visitall.]
18. Sources of Performance Variation (5)
Ranking mechanism: the techniques for aggregating results across the set of benchmarks strongly affect competition outcomes [14]
Two main orthogonal dimensions:
• What metrics do we care about?
• Absolute vs relative ranking
• Examples: IPC score, coverage, Borda ranking, PAR10, etc.
20. Are Competitions Useful?
Don't get me wrong, competitions in AI are awesome.
• Foster the advancement of the state of the art
• Provide a large set of benchmarks
• Support standardisation
• Provide a large number of ready-to-use solvers
• Highlight issues that need to be tackled by the community (e.g., areas not receiving enough attention, lack of applications, etc.)
21. A Pinch of Salt
Results from competitions in AI cannot necessarily be easily generalised.
They refer to the considered solvers, solving the selected benchmarks,
ordered according to selected metrics, run on the specific hardware and
software configuration used during the competition.
23. IPC Score
$$\mathrm{IPC}(s, P) = \begin{cases} 0 & \text{if } P \text{ is unsolved} \\ \dfrac{1}{1 + \log_{10}\left(T_P(s)/T^{*}_{P}\right)} & \text{otherwise} \end{cases}$$
$T_P(s)$ denotes the time needed by solver $s$ to solve $P$; $T^{*}_{P}$ is the minimum amount of time required by any considered solver to solve $P$.
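The definition translates directly into code. The following one-function sketch is our own, with `runtime=None` marking an unsolved instance:

```python
import math

def ipc_score(runtime, best_runtime):
    """IPC score for one instance; runtime is None if the solver failed."""
    if runtime is None:
        return 0.0
    # best_runtime is the fastest time of any considered solver on this instance
    return 1.0 / (1.0 + math.log10(runtime / best_runtime))
```

A solver matching the best time scores 1; slower solvers score between 0 and 1.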
24. PAR10 score
Penalised Average Runtime 10.
$$\mathrm{PAR10}(s, P) = \begin{cases} 10 \cdot T & \text{if } P \text{ is unsolved} \\ t_P(s) & \text{otherwise} \end{cases}$$
$T$ indicates the considered timeout; $t_P(s)$ denotes the time needed by solver $s$ to solve $P$.
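PAR10 is even simpler to compute. Again a sketch of our own; note that, unlike the IPC score, lower PAR10 is better:

```python
def par10_score(runtime, timeout):
    """PAR10 for one instance; unsolved runs cost ten times the timeout."""
    return 10.0 * timeout if runtime is None else runtime
```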
25. ICCMA 2015 (1)
Four Semantics:
• complete (CO)
• preferred (PR)
• grounded (GR)
• stable (ST)
Four computational tasks:
• determine some extension (SE)
• determine all extensions (EE)
• decide whether a given argument is contained in some extension
(DC)
• decide whether a given argument is contained in all extensions (DS)
26. ICCMA 2015 (2)
18 solvers, tested on 192 AFs.
10 minutes and 4 GB of RAM for solving a task.
1 point for each solved instance (used for in-track ranking).
General ranking done using the Borda score (sketched below).
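The exact tie-handling used at ICCMA 2015 is not detailed here, but a basic Borda aggregation over per-track rankings looks like the following sketch (written under that caveat):

```python
# Borda aggregation sketch: in each track, a solver ranked at position i
# (0 = best) among n solvers earns n - 1 - i points; points are summed.
from collections import defaultdict

def borda(track_rankings):
    points = defaultdict(int)
    for ranking in track_rankings:
        n = len(ranking)
        for pos, solver in enumerate(ranking):
            points[solver] += n - 1 - pos
    return sorted(points.items(), key=lambda kv: -kv[1])

# Two tracks over three solvers; A and B tie on 3 points:
print(borda([["A", "B", "C"], ["B", "A", "C"]]))
```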
27. Main Classes of Solvers
Solvers that took part in ICCMA 2015 can be (roughly) classified as
• reduction-based approaches: the argumentation problem is
encoded as a known problem such as SAT, ASP, MAX-SAT, etc.
• Can exploit availability of well-engineered solvers and established
techniques.
• direct approaches: the argumentation problem is tackled directly.
31. State of the Art
• It is not the case that reduction-based solvers always outperform non-reduction-based systems;
• State-of-the-art solvers show a high level of complementarity (especially those able to deal with EE-PR problems), and are thus well suited to being combined in portfolios.
33. Parallelising the Reasoning Process
Quick and clean solution: run multiple solvers in parallel (see the sketch below).
Strengths
• Easy to implement
• Low communication overhead
Weaknesses
• No information shared among the solvers
• Does not make it possible to solve instances that are too large for sequential solvers
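A minimal sketch of this strategy: launch each solver binary on the same instance and return the first answer. The solver command lines and the 600 s timeout are placeholders; in production the slower processes should also be killed rather than left to run out their timeout.

```python
# Run several solver binaries on the same instance; first answer wins.
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_solver(cmd, timeout=600):
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout).stdout

def parallel_portfolio(solver_cmds):
    pool = ThreadPoolExecutor(max_workers=len(solver_cmds))
    futures = [pool.submit(run_solver, cmd) for cmd in solver_cmds]
    answer = None
    for fut in as_completed(futures):
        try:
            answer = fut.result()   # first solver to terminate successfully
            break
        except Exception:
            continue                # crashed or timed out: wait for the others
    pool.shutdown(wait=False)       # do not wait for the slower solvers
    return answer

# Example (hypothetical solver invocations):
# parallel_portfolio([["solverA", "-p", "EE-PR", "-f", "af.apx"],
#                     ["solverB", "-p", "EE-PR", "-f", "af.apx"]])
```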
34. Parallelising the Reasoning Process
Example: P-SCC-REC [7], for enumerating preferred extensions in large AFs.
It leverages the notion of Strongly Connected Components and the extension-based semantics definition schema SCC-recursiveness [2].
35. P-SCC-REC: idea
Creation of the SCC-tree structure: {S1, S2}, {S3}, where S1 = {c, d}, S2 = {e, f}, and S3 = {g, h}.
[Figure: example AF over arguments a–h; S1 and S2 lie at level 1 of the SCC tree, S3 at level 2.]
39. What does “Learning” Mean?
I have a set of AFs that I want to analyse, I know the problem I am working on, and I have picked a solver that works decently.
...but, in order to deploy the system, I need it to be faster.
Let's learn something then.
45. Which Kind of Knowledge?
• Combination and Selection of solvers
• Configuration of solvers
• Configuration (Reformulation) of AFs
Here we focus on knowledge that can be automatically extracted.
46. Combining and Selecting Solvers
(Solver selection can be seen as a particular case of portfolio
configuration)
• Static: the same portfolio is used for analysing any AF
• Dynamic: portfolio is configured according to some characteristics of
the AF
48. Static Portfolio
Defined by:
1. the selected solvers;
2. the order in which solvers will be run; and
3. the runtime allocated to each solver.
49. Static Portfolio: Approaches
In [8] two approaches were proposed (FDSS is sketched below):
Shared-k
Each component solver is allocated maxRuntime/k seconds. Solvers are selected and ordered according to their overall PAR10.
FDSS
Starting from an empty portfolio, we iteratively add either a new component solver or extend the CPU-time allocated to a solver already in the portfolio, depending on which choice maximises the improvement of the portfolio's PAR10 score.
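The following sketch captures the greedy idea behind FDSS; it is our own paraphrase of [8], with `score` a placeholder that evaluates a candidate schedule on training instances (higher being better), and extending a solver modelled as adding it another timeslice.

```python
# Greedy FDSS-style construction: grow the schedule in fixed timeslices,
# always taking the (solver, timeslice) addition that improves the
# training score the most.
def greedy_portfolio(solvers, max_runtime, step, score):
    schedule = []                             # list of (solver, seconds)
    while sum(t for _, t in schedule) + step <= max_runtime:
        candidates = [schedule + [(s, step)] for s in solvers]
        best = max(candidates, key=score)
        if score(best) <= score(schedule):    # no candidate helps any more
            break
        schedule = best
    return schedule
```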
51. Dynamic Portfolio
For each AF, a vector of features is computed.
Similar instances should have similar feature vectors.
Portfolios are configured using empirical performance models
52. Dynamic Portfolio: Features
Features can be extracted from different representations of an AF [3], e.g., the directed graph representation (see the sketch below).
• Graph size features: number of vertices, number of edges, the ratio vertices/edges and its inverse, and graph density
• Degree features: average, standard deviation, maximum, and minimum degree values across the nodes in the graph
• SCC features: number of SCCs; average, standard deviation, maximum, and minimum SCC size
• Graph structure: presence of self-loops, number of isolated vertices, etc.
Similarly, features can be extracted by considering the undirected graph or the matrix representation.
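A sketch of the directed-graph features above, computed with networkx. The feature names are our own and [3] uses a richer set; the function assumes a non-empty AF.

```python
# Directed-graph features of an AF, in the spirit of [3].
import statistics
import networkx as nx

def af_features(g: nx.DiGraph) -> dict:
    degrees = [d for _, d in g.degree()]
    scc_sizes = [len(c) for c in nx.strongly_connected_components(g)]
    return {
        "vertices": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "density": nx.density(g),
        "degree_avg": statistics.mean(degrees),
        "degree_std": statistics.pstdev(degrees),
        "degree_max": max(degrees),
        "degree_min": min(degrees),
        "scc_count": len(scc_sizes),
        "scc_max": max(scc_sizes),
        "scc_min": min(scc_sizes),
        "self_loops": nx.number_of_selfloops(g),
        "isolated_vertices": sum(1 for _ in nx.isolates(g)),
    }
```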
53. Dynamic Portfolio: Approaches
Classification-based
Classify
Classifies a given AF into a single category, corresponding to the single solver predicted to be the fastest, and allocates it all the available CPU-time.
Regression-based
1-Regression
Given the predicted runtime of each solver, the solver predicted to be the fastest is selected and is allocated all the available CPU-time.
M-Regression
Initially we select the solver predicted to be the fastest, but allocate only its predicted CPU-time +10%. If the solver does not solve the given AF in the allocated time, it is stopped and is no longer available for selection, and the process iterates by selecting a different solver (sketched below).
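A sketch of the M-Regression loop. Here `predict` stands for the learned empirical performance model and `run` for launching a solver with a time budget (returning the answer, or None on timeout); both are placeholders.

```python
# M-Regression scheduling sketch: repeatedly pick the solver predicted
# fastest, give it its predicted runtime +10%, and drop it on failure.
def m_regression(solvers, features, predict, run, budget):
    remaining = list(solvers)
    while remaining and budget > 0:
        best = min(remaining, key=lambda s: predict(s, features))
        slot = min(1.1 * predict(best, features), budget)
        answer = run(best, slot)
        if answer is not None:
            return answer
        remaining.remove(best)        # a failed solver is never re-selected
        budget -= slot
    return None
```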
57. Configuration of Algorithms
Solvers can be configured to improve performance on a class of problems
/ instances.
Image taken from [13].
58. Configuration of Algorithms
There exist several configuration approaches, based on different underlying ideas.
For the sake of this talk, we focus on SMAC [12], used for configuring ArgSemSAT.
Image taken from [12].
60. Configuration of the Framework
Order arguments/attacks according to:
1. The number of attacks received;
2. The number of attacks to other arguments;
3. The presence of self-attacks;
4. The difference between the number of received attacks and the number of attacks to other arguments;
5. Being an argument in a mutual attack.
In addition, arguments can be listed in direct or inverse order.
The orderings of arguments and attacks are independent.
61. Configuration of the Framework (2)
[Figure: AF in which a1 and a3 attack each other, a3 attacks a2, and a2 attacks itself.]
Original encoding:
arg(a1).
arg(a2).
arg(a3).
att(a1,a3).
att(a2,a2).
att(a3,a1).
att(a3,a2).
Reordered encoding:
arg(a2).
arg(a3).
arg(a1).
att(a2,a2).
att(a3,a2).
att(a3,a1).
att(a1,a3).
The list of arguments is ordered according to the number of received attacks and, subsequently, the number of outgoing attacks; the list of attacks is ordered prioritising self-attacks and, subsequently, the number of outgoing attacks (see the sketch below).
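A sketch reproducing this re-ordering. Ties between attacks are broken arbitrarily here, as the stated criteria do not fix them; the function names are our own.

```python
# Re-order APX facts: arguments by (received attacks, outgoing attacks),
# both descending; attacks with self-attacks first, then by the number of
# the attacker's outgoing attacks, descending.
def reorder_apx(args, atts):
    received = {a: sum(t == a for _, t in atts) for a in args}
    outgoing = {a: sum(s == a for s, _ in atts) for a in args}
    args_sorted = sorted(args, key=lambda a: (received[a], outgoing[a]),
                         reverse=True)
    atts_sorted = sorted(atts, key=lambda e: (e[0] != e[1], -outgoing[e[0]]))
    return ([f"arg({a})." for a in args_sorted] +
            [f"att({s},{t})." for s, t in atts_sorted])

print("\n".join(reorder_apx(
    ["a1", "a2", "a3"],
    [("a1", "a3"), ("a2", "a2"), ("a3", "a1"), ("a3", "a2")])))
```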
62. Parametrisation
Parameter Domain Default
args ingoingFirst [-1.0,1.0] 0
args outgoingFirst [-1.0,1.0] 0.2
args autoFirst [-1.0,1.0] -1
args eachOther [-1.0,1.0] -1
args differenceFirst [-1.0,1.0] -1
atts ingoingFirst [-1.0,1.0] 0
atts outgoingFirst [-1.0,1.0] 0
atts autoFirst [-1.0,1.0] 0.2
atts eachOther [-1.0,1.0] 0
atts differenceFirst [-1.0,1.0] 0
atts orders {0,1,2,3,4} 0
0 Same ordering applied to the first argument of the attack pair
1 Same ordering applied to the second argument of the attack pair
2 Inverse ordering applied to the first argument of the attack pair
3 Inverse ordering applied to the second argument of the attack pair
4 Attack-specific ordering
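SMAC itself builds a model of the configuration space while searching it. As a much simpler stand-in, the sketch below does plain random search over the parameter space of this table; the parameter names follow the table with underscores added, and `evaluate` is a placeholder returning a training-set score (higher is better).

```python
# Random-search configurator over the ArgSemSAT-style parameter space
# (a naive stand-in for SMAC's model-based optimisation).
import random

REAL_PARAMS = [
    "args_ingoingFirst", "args_outgoingFirst", "args_autoFirst",
    "args_eachOther", "args_differenceFirst",
    "atts_ingoingFirst", "atts_outgoingFirst", "atts_autoFirst",
    "atts_eachOther", "atts_differenceFirst",
]

def random_config():
    cfg = {p: random.uniform(-1.0, 1.0) for p in REAL_PARAMS}
    cfg["atts_orders"] = random.choice([0, 1, 2, 3, 4])
    return cfg

def configure(evaluate, n_samples=100):
    return max((random_config() for _ in range(n_samples)), key=evaluate)
```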
67. Learning for Argumentation: Summarising
Exploiting additional knowledge can help argumentation reasoners to
improve their runtime performance.
3 main approaches analysed so far:
• Portfolio / Algorithm Selection
• Algorithm Configuration
• Model Reformulation
69. References I
[1] A. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[2] P. Baroni and M. Giacomin. A general recursive schema for argumentation semantics. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), pages 783–787, 2004.
[3] F. Cerutti, M. Giacomin, and M. Vallati. Algorithm selection for preferred extensions enumeration. In Computational Models of Argument - Proceedings of COMMA, pages 221–232, 2014.
70. References II
[4] F. Cerutti, M. Giacomin, and M. Vallati. Generating challenging benchmark AFs. In Computational Models of Argument - Proceedings of COMMA, pages 457–458, 2014.
[5] F. Cerutti, M. Giacomin, and M. Vallati. Generating challenging benchmark AFs: AFBenchGen2. In Computational Models of Argument - Proceedings of COMMA, 2016.
[6] F. Cerutti, N. Oren, H. Strass, M. Thimm, and M. Vallati. A benchmark framework for a computational argumentation competition. In Computational Models of Argument - Proceedings of COMMA, pages 459–460, 2014.
71. References III
[7] F. Cerutti, I. Tachmazidis, M. Vallati, S. Batsakis, M. Giacomin, and G. Antoniou. Exploiting parallelism for hard problems in abstract argumentation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 1475–1481, 2015.
[8] F. Cerutti, M. Vallati, and M. Giacomin. Where are we now? State of the art and future trends of solvers for hard argumentation problems. In Computational Models of Argument - Proceedings of COMMA, pages 207–218, 2016.
[9] P. Erdős and A. Rényi. On random graphs. I. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
72. References IV
[10] A. E. Howe and E. Dahlman. A critical assessment of benchmark comparison in planning. J. Artif. Intell. Res. (JAIR), 17:1–33, 2002.
[11] B. Hurley and B. O'Sullivan. Statistical regimes and runtime prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI, pages 318–324, 2015.
[12] F. Hutter, H. H. Hoos, K. Leyton-Brown, and K. P. Murphy. Time-bounded sequential parameter optimization. In Learning and Intelligent Optimization, 4th International Conference, LION, pages 281–298, 2010.
[13] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle. ParamILS: an automatic algorithm configuration framework. J. Artif. Intell. Res. (JAIR), 36:267–306, 2009.
73. References V
[14] C. Linares López, S. J. Celorrio, and A. G. Olaya. The deterministic part of the seventh international planning competition. Artif. Intell., 223:82–119, 2015.
[15] M. Vallati and T. Vaquero. Towards a protocol for benchmark selection in IPC. In Proceedings of the 4th Workshop on the International Planning Competition (WIPC), 2015.
[16] D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, 1998.