Robust Agent Execution

1. Robust Agent Execution: Generalising Plans to Landscapes "What I done on my MRes"

6. It's important to consider what happens when plans are executed.

7. Typically, the assumptions we've made to make our models tractable also mean that the plan won't execute as intended. This is a problem.

9. Agents based around reaction can deal with changes in their environment.

10. But they tend to be ill-suited to acting towards long-term objectives.

12. Deliberation is a slow process, but leads to high quality decisions.

13. Very good for long-term goal achievement.

14. Less well suited to environments where things are changing outside the agent's control, and fast-paced environments.

16. Not too hard to remodel the world based on current observations and rethink, but it is time consuming.

17. That also assumes that the model is rich enough to capture unforeseen consequences.

18. For reactive agents, things are rarely going to go tragically wrong, but in dealing with "threats", the agent may not achieve its objectives.

20. It isn't enough to label some aspects as being handled by a reactive component and others by a deliberative one - the interactions of the two can still disrupt goal satisfaction.

21. We can't reason fast enough to make near-real-time deliberative decisions.

23. The basic approach is to allow an agent to make decisions based on a continuous range of stimuli from a mixture of sources.

24. This will allow the agent to react deliberatively, or deliberate reactively.

27. Developers aim for 60fps execution - each frame gets roughly 16.7ms of CPU time, but most of this goes to graphics processing and similar work. AI decisions might expect to get at most 1ms in total for all agents.
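The arithmetic behind that budget can be sketched as below. This is only an illustration: it reads the 1ms figure as a total shared between all agents, and the agent count of 50 is an assumed number, not one from the slides.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame, in milliseconds."""
    return 1000.0 / fps


def per_agent_budget_us(ai_budget_ms: float, agent_count: int) -> float:
    """Share of the total AI budget each agent gets, in microseconds."""
    return ai_budget_ms * 1000.0 / agent_count


frame_ms = frame_budget_ms(60)           # about 16.7 ms per frame at 60fps
agent_us = per_agent_budget_us(1.0, 50)  # 50 agents is an assumed figure
```

Under these assumptions each agent has around 20 microseconds per frame, which is why search at execution time is hard to justify.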

29. Evaluating mathematical functions is fast.

30. We use heuristics to evaluate individual states and guide search towards likely good solutions.

31. What if we were evaluating the entire state space?

33. "Influence" radiates out from a point of interest; the amount of influence exerted decays as the distance from the point increases.

34. Influence can be positive or negative, to attract an agent to a point or repel it.

35. Influence from multiple points can interact, for example additively or multiplicatively.
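A minimal sketch of a spatial influence map follows, to make the three points above concrete. The exponential decay, the decay rate of 0.5, the grid size and the function names are all assumptions for illustration; the slides do not specify a decay function.

```python
import math


def influence_at(cell, source, strength, decay=0.5):
    """Influence of one source on one cell: strength * decay ** distance."""
    return strength * decay ** math.dist(cell, source)


def build_map(width, height, sources, decay=0.5):
    """Additively combine the influence of every source on each grid cell."""
    return [
        [sum(influence_at((x, y), pos, s, decay) for pos, s in sources)
         for x in range(width)]
        for y in range(height)
    ]


# One attractive point (positive strength) and one repellent point (negative).
sources = [((0, 0), 1.0), ((4, 0), -1.0)]
grid = build_map(5, 1, sources)
```

In this one-row grid the value falls from positive near the attractor to negative near the repeller, cancelling to zero at the midpoint. Swapping `sum` for a product would give the multiplicative variant.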

37. In planning domains, our variables don't have this kind of spatial mapping.

38. Or do they...

40. Domain Transition Graphs (DTGs) define the manner in which variables can change value.

41. Gives a sense of adjacency of values within the domain of a single variable.

42. Each SAS+ variable can be seen then as having an order, allowing an Influence Map to be defined across the representation.

44. Nodes we feel are important are attractive.

45. Nodes we need to avoid are repellent.

46. Influence propagates across the DTG.
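The same idea can be sketched on a graph rather than a grid, using hop count in the DTG as the distance. The graph below, its node names, the undirected treatment of transitions and the decay rate are all hypothetical choices for illustration.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of transitions from start to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def propagate(graph, seeds, decay=0.5):
    """Sum decayed influence from every seed node over the whole graph."""
    landscape = {node: 0.0 for node in graph}
    for seed, strength in seeds.items():
        for node, hops in hop_distances(graph, seed).items():
            landscape[node] += strength * decay ** hops
    return landscape


# Hypothetical DTG for a "location" variable, treated here as undirected.
dtg = {
    "depot": ["london"],
    "london": ["depot", "paris", "dover"],
    "paris": ["london", "berlin"],
    "dover": ["london"],
    "berlin": ["paris"],
}
# berlin is attractive, dover is repellent.
landscape = propagate(dtg, {"berlin": 1.0, "dover": -1.0})
```

Here `paris` ends up with a higher value than `london`, because it is closer to the attractive node and further from the repellent one.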

48. A layered system involves giving priority to certain aspects, or arbitrating between them.

49. We instead use a "stack" model, in which each unit feeds directly into the executive.

50. This gives an architecture free from hierarchical bias and prioritisation.

52. The landscapes are then fed to the executive which can incorporate all the relevant information into its decision making.

54. Environmental Data

55. Plan Data

57. Given a goal node, it propagates influence across a DTG, providing information for the shortest path through the DTG as well as highlighting the existence of alternate routes.

58. A naïve baseline for manipulating the environment, which other stacks then modify with more detailed information.
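As a rough sketch of goal propagation, the code below spreads influence backwards from a goal node over a directed DTG and then follows the steepest ascent to recover a shortest path. The graph, the decay rate and the function names are assumptions, not the implementation described in the slides.

```python
from collections import deque


def goal_landscape(dtg, goal, decay=0.5):
    """Value of each node is decay ** (transitions needed to reach goal)."""
    # Reverse the edges so the search walks from the goal to its predecessors.
    reverse = {node: [] for node in dtg}
    for node, successors in dtg.items():
        for succ in successors:
            reverse[succ].append(node)
    value = {goal: 1.0}
    queue = deque([goal])
    while queue:
        node = queue.popleft()
        for pred in reverse[node]:
            if pred not in value:
                value[pred] = value[node] * decay
                queue.append(pred)
    return value


def follow(dtg, value, start, goal):
    """Repeatedly move to the highest-valued successor until goal is reached."""
    if start not in value:
        return None  # the goal cannot be reached from start
    path = [start]
    while path[-1] != goal:
        options = [n for n in dtg[path[-1]] if n in value]
        path.append(max(options, key=value.get))
    return path


# Hypothetical DTG with a short route (a-b-d) and a longer one (a-c-e-d).
dtg = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "e": ["d"], "d": []}
value = goal_landscape(dtg, "d")
path = follow(dtg, value, "a", "d")
```

The path found is the short route, but node `c` still carries a non-zero value, so the longer route stays visible to the agent as an alternative.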

60. It allows for updates to the perceived value of a state, and for these value alterations to be propagated out as influence to allow an agent to exploit opportunities or avoid dangers.

64. In general Roadblocks do not signify an alteration to the physical characteristics of the world, rather an addition of other characteristics.

65. E.g. Actual roadblocks - the road still exists, and maybe will be available again within the lifecycle of the agent.

67. This is where the majority of reasoning is done, and issues such as Causal satisfaction are primarily handled.

68. Forms the basis of the execution - with no additional data and no problems, the plan should be what the agent executes.

70. Deviation from the plan should be permissible and achievable if necessary.

71. Imagining the plan as a trajectory through the space, do we want to strongly describe the trajectory as a ridge through the space, or weakly describe it as a set of waypoints?

72. Allowing loose conformity through weak description gives scope for deviation and alternate routes being found.

74. Based on SAS+ notation we know that nodes can be grouped together, such as cities representing the UK and EU.

75. Clustering allows us to build these groupings automatically (although not necessarily as obviously as by inspection).

77. Calculate the centroid of each cluster, taking the centroid to be the node with the smallest average weighted distance to every node in the cluster.

78. Update weights based on relative distance to each centroid.

79. Repeat to stability.
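The three steps above can be sketched as a soft clustering loop. The inverse-distance weighting, the starting centroids and the toy distances (cities placed on a line rather than real DTG distances) are all assumptions for illustration; the slides do not give the weight update formula.

```python
def weighted_centroid(nodes, dist, weights):
    """Node with the smallest weighted average distance to all nodes."""
    total = sum(weights[n] for n in nodes)
    return min(
        nodes,
        key=lambda m: sum(weights[n] * dist[m][n] for n in nodes) / total,
    )


def memberships(nodes, dist, centroids, eps=1e-9):
    """Weight of each node in each cluster, from closeness to each centroid."""
    result = [dict() for _ in centroids]
    for n in nodes:
        closeness = [1.0 / (dist[c][n] + eps) for c in centroids]
        norm = sum(closeness)
        for k, value in enumerate(closeness):
            result[k][n] = value / norm
    return result


def cluster(nodes, dist, initial_centroids, max_rounds=100):
    """Alternate centroid and weight updates until the centroids stop moving."""
    centroids = list(initial_centroids)
    for _ in range(max_rounds):
        weights = memberships(nodes, dist, centroids)
        updated = [weighted_centroid(nodes, dist, w) for w in weights]
        if updated == centroids:
            break
        centroids = updated
    return centroids, memberships(nodes, dist, centroids)


# Toy stand-in for DTG distances: hypothetical positions along a line.
pos = {"glasgow": 0, "leeds": 1, "london": 2, "paris": 10, "lyon": 11, "rome": 12}
nodes = list(pos)
dist = {m: {n: abs(pos[m] - pos[n]) for n in pos} for m in pos}
centroids, weights = cluster(nodes, dist, ["glasgow", "rome"])
```

With these toy distances the centroids settle on the middle city of each group, and each city's largest weight is in the cluster of its own group.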

82. We have a list of nodes that are "Focal Nodes" within a DTG.

83. Activated Focal Nodes are those that appear in both lists.

84. These are the waypoints that we use to form a landscape that loosely conforms to the original plan, and these nodes radiate influence to form the Plan Data Landscape.

86. In the initial implementation, these are simply summed to find the overall value of each node in the landscape.

87. More sophisticated techniques might be appropriate instead.
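A minimal sketch of the summation step, and of an executive choosing its next move from the combined landscape, is given below. The layer contents, node names and the greedy choice rule are hypothetical; only the idea of summing the layers comes from the slides.

```python
def combine(landscapes):
    """Sum the value each landscape assigns to a node; missing nodes count as 0."""
    total = {}
    for layer in landscapes:
        for node, value in layer.items():
            total[node] = total.get(node, 0.0) + value
    return total


def choose_next(dtg, landscape, current):
    """Greedy executive: move to the reachable node with the highest value."""
    return max(dtg[current], key=lambda n: landscape.get(n, 0.0))


# Hypothetical DTG and per-stack landscapes.
dtg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
structure = {"b": 0.5, "c": 0.25, "d": 1.0}   # domain structure layer
plan = {"b": 0.5}                             # plan waypoint at b
environment = {"b": -1.0}                     # observed danger at b

calm_choice = choose_next(dtg, combine([structure, plan]), "a")
wary_choice = choose_next(dtg, combine([structure, plan, environment]), "a")
```

Without the environmental layer the agent follows the plan through `b`; once the danger is added, the summed value of `b` drops below that of `c` and the agent deviates.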

89. The Stacks are independent of each other, so can run asynchronously to update the landscape, and can be parallelised.

90. Much of the heavy-lifting can be done offline, meaning that the execution-time components are kept very simple.

92. Domain structure analysis means that the agent can see alternative routes to the intended node; when the total cost of a route becomes too great, other routes will be used.

93. Environmental data means that the agent can be influenced by what is happening around it and can react to a dynamic environment.

95. Still not entirely satisfied that the specific details laid out here (or glossed over) are completely appropriate.

96. Entire system now functional and tested though!

102. Need to have a much more robust implementation in place, ideally built into an actual application.

103. Want to show the extensibility of the approach by developing new Stacks, such as an Opponent Modelling Stack that captures expectations of other agents' actions and shows how these can be factored in as an additional landscape.
