1. Learning agents spec for CSE333 9/19/02 1
CSE333 project initial spec: Learning agents
Participants:
Huayan Gao (huayan.gao@uconn.edu),
Thibaut Jahan, (thj@ifrance.com)
David Keil, (DavidKeil@aol.com)
Jian Lian (lianjian@yahoo)
1. Objectives and goals
This project will investigate current research on software learning agents and will
implement a simple system that demonstrates such agents. Our goal is to build a
distributed learning agent system that interactively finds a policy for navigating a maze.
Our implementation will be component-based, using UML and Java. It may also include
investigation of the system's scalability, robustness, and adaptability. Four candidate
components of a distributed learning agent are perception, action, communication, and
learning. Our ambition is to build a general architectural model of components for
learning agents. We will implement the different "generic" components so that they can be
assembled easily into an agent. Learning through interaction with the agent's environment
is the problem of reinforcement learning. We will therefore address reinforcement
learning, specifically Q-learning.
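One way the four candidate components might fit together can be sketched in Java. All type and method names below are our own illustrative placeholders, not a committed design:

```java
// Illustrative sketch of the four "generic" components; names are placeholders.
interface Perception { int sense(); }          // produce a percept (here: a state id)
interface Actuator   { void act(int action); } // apply an action to the environment
interface Channel    { void send(String msg); String receive(); } // inter-agent messages
interface Learner {
    int choose(int state);                     // pick an action for a state
    void update(int state, int action, double reward, int nextState);
}

// An agent is then an assembly of the generic components.
class SketchAgent {
    private final Perception perception;
    private final Actuator actuator;
    private final Learner learner;

    SketchAgent(Perception p, Actuator a, Learner l) {
        perception = p; actuator = a; learner = l;
    }

    // One perceive-decide-act cycle; returns the chosen action.
    int step() {
        int state = perception.sense();
        int action = learner.choose(state);
        actuator.act(action);
        return action;
    }
}
```

Separating the interfaces this way is what would let the same Learner be reused with different mazes or communication schemes.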
2. Topic summary
Reinforcement learning
Reinforcement learning is rational policy search and revives ideas associated with
adaptive systems and related to optimal control and dynamic programming [sut-bar98].
Traditional machine-learning research approaches assumed that learning was offline
(separated from application of knowledge learned).
A policy maps from agent states (shaped by percepts) to actions, defining an agent’s
actions as a series of responses to previously unknown, dynamically generated percepts.
A rational agent is one that acts to maximize its expected utility or future reward or
performance measure. Because their actions may affect the environment, such agents
must incorporate thinking or planning ahead into their computations. Because they obtain
information from their environments only through percepts, they have incomplete
knowledge of the environment and must conduct a trial-and-error search for a policy that
obtains a high performance measure.
Q-learning
Q-learning is a variant of reinforcement learning in which the agent incrementally
computes, from its interaction with its environment, a table of expected aggregate future
rewards, with values discounted as they extend into the future. As it proceeds, the agent
modifies the values in the table to refine its estimates. The Q function maps a
state-action pair to its expected discounted future reward; the optimal action in a state
is the one that maximizes Q. The evolving table of estimated Q values is called Q̂.
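The incremental table update described above can be sketched as follows. The tabular representation and the learning-rate and discount values are illustrative assumptions for discussion, not part of this spec:

```java
// Minimal tabular Q-learning sketch (one-step Q-learning, as in [sut-bar98]).
// States and actions are small integers indexing the Q table.
class QTable {
    final double[][] q;       // q[state][action]: estimated discounted future reward
    final double alpha = 0.1; // learning rate (illustrative value)
    final double gamma = 0.9; // discount factor (illustrative value)

    QTable(int numStates, int numActions) {
        q = new double[numStates][numActions]; // initialized to all zeros
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int s, int a, double reward, int sNext) {
        double best = q[sNext][0];
        for (double v : q[sNext]) best = Math.max(best, v);
        q[s][a] += alpha * (reward + gamma * best - q[s][a]);
    }

    // Greedy action: the argmax over the current estimates for state s.
    int greedy(int s) {
        int best = 0;
        for (int a = 1; a < q[s].length; a++)
            if (q[s][a] > q[s][best]) best = a;
        return best;
    }
}
```

Each call to update nudges one table entry toward the reward just observed plus the discounted best estimate for the next state, which is how the table converges toward the true Q values over many trials.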
Intelligent agents
The term “agent” is used in two senses: (a) programs that act on behalf of humans to
gather information [syc-pan96]; (b) entities that interact with their environments by
retrieving percepts and generating actions. We will use the common restriction of (b) to
rational agents that act in such a way as to maximize future expected reward.
In this project, we consider only software agents, as opposed to autonomous robots,
expert assistants, etc.
A central problem with any intelligent agent system is the degree of trust placed in
the agent's ability to cope with the information its sensors provide about its
environment. This will be our emphasis when we study the agent.
Agent applications span economics, business (commercial databases), management,
telecommunications (network management), and e-societies (e.g., e-commerce). These
areas combine techniques from databases, statistics, and machine learning, and agent
applications are widely used in them. In telecommunications, agent technology is used
to support efficient (in terms of both cost and performance) service provision to fixed
and mobile users in competitive telecommunications environments.
3. Topic breakdown
An example problem
The concrete problem described below will help to define how the project breaks
down into components:
Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for
which the learner must find a policy, where the reward is determined by eventually
reaching or not reaching a goal location in the maze.
We propose to modify the original problem definition by permitting multiple
distributed agents that communicate, either directly or via the environment. Either the
multi-agent system, or each agent, will use Q-learning. The mazes can be made arbitrarily
simple or complex to fit the speed and computational power and effectiveness of the
system we are able to develop in the time available.
A further interesting variant of the problem would be to allow the maze to change
dynamically, either autonomously or in response to the learning agents. Robust
reinforcement learners will adapt successfully to such changes.
Topic breakdown
1. Machine learning
Part of this project will consist of investigating the literature on machine learning,
particularly reinforcement learning, and defining an approach based on this literature that
is realistically implementable by the team.
2. Agent computing
We will survey the agent paradigm of computing, focusing on rational agents, as
described in part 2 above. We will apply these concepts to the problem of machine
learning, as is done in much reinforcement-learning research.
3. Distributed computing
In multiagent learning in the strong sense, a common learning goal is pursued or, in
the weaker sense, agents pursue separate goals but share information. Distributed agents
may identify or execute distinct learning subtasks [weiss99]. We will survey the literature
on distributed computing, looking for connections to learning agents, and will apply what
we find in an attempt to build a distributed system of cooperating learning agents.
4. Implementation using Together, UML, and Java
The maze described above could be represented as a bitmap or two-dimensional
array of squares. Starting with a simple example is useful in order to concentrate on good
component design and successful implementation.
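A minimal sketch of such a maze representation follows; the grid encoding (walls as '#', goal as 'G') and the reward values are illustrative assumptions:

```java
// Sketch: a maze as a two-dimensional array of squares.
// '#' marks a wall, 'G' the goal, '.' an open square (our own encoding).
class Maze {
    final char[][] grid;

    Maze(char[][] grid) { this.grid = grid; }

    // A move is blocked by walls and by the maze boundary.
    boolean isWall(int row, int col) {
        return row < 0 || row >= grid.length
            || col < 0 || col >= grid[0].length
            || grid[row][col] == '#';
    }

    // Reward signal for entering a square: positive only at the goal,
    // matching the example problem, where reward depends on reaching it.
    double reward(int row, int col) {
        return grid[row][col] == 'G' ? 1.0 : 0.0;
    }

    // Each square maps to one state id, so the maze plugs into a Q table.
    int stateId(int row, int col) {
        return row * grid[0].length + col;
    }
}
```

Keeping the maze behind this small surface (wall test, reward, state id) is what would let us swap in arbitrarily simple or complex mazes without touching the learner.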
Division of labor
The team members will work together on each main aspect of the project; however,
it is envisioned that leadership of the work in the respective areas will be distributed as
follows:
• Machine learning: David
• Agent computing: Jian
• Distributed computing: Huayan
• Tools and implementation: Thibaut
Scope
We will consult sources to gain a survey knowledge of the fields of agent computing,
distributed computing, and reinforcement learning, especially Q-learning. Our design and
implementation effort will focus narrowly on an artifact of realistic limited scope that
solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will
relate the features of our implementation to recent research in the same narrow area and
to broader concepts encountered in the sources.
4. Planned activities
Sept 19th – Oct 22nd:
We will acquire the knowledge needed in order for us to design learning agents.
Concurrently, we will start designing the learning agents using UML; design issues are
critical for the future implementation. Simple Java prototypes of standalone agents that
navigate a maze will be built, beginning with classes generated by Together.
Oct 12th – Oct 21st:
Further research in sources on distributed learning agents. Drafting of a summary of
the source research. Design and implementation of communicating distributed agents
with simple learning features. Preparation of the mid-term report.
Oct 22nd – mid November:
Java implementation of the learning aspect of the agents and enhancement of
communication efficiency. Each participant will code the components decided on and
described in the design part. Once these components are tested, they will be integrated
and the resulting system tested.
End of November:
Preparation of the final report and last adjustments of the learning agents.
APPENDICES
Appendix A: References
The list of references below will be reduced to those actually cited in the paper or
used in the implementation.
[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in
a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.
[anders02] Chuck Anderson. Robust Reinforcement Learning with Static and Dynamic
Stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.
[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence. MIT Press, 1999, pp. 121ff.
[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?:
A taxonomy for autonomous agents. Proceedings of the Third International Workshop
on Agent Theories, Architectures, and Languages, 1996.
www.msci.memphis.edu/~franklin/AgentProg.html
[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies
of agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to
distributed artificial intelligence, MIT Press, 1999, pp. 79-120.
[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and
methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B,
MIT Press, 1990, pp. 1158-1199.
[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.
[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning
through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.
[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence.
IEEE Expert, December 1996.
[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach.
Prentice Hall, 1995.
[SAG97] Software Agents Group MIT Media Laboratory. “CHI97 Software Agents
Tutorial”, http://pattie.www.media.mit.edu/people/pattie/CHI97/.
[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence, MIT Press, 1999, pp. 201-258.
[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence, MIT Press, 1999, pp. 259-298.
[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science
Press, 1994.
[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An
introduction. MIT Press, 1998.
[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, Keith
Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.
[venners97] Bill Venners. The architecture of aglets. JavaWorld, April 1997.
[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall. A note on distributed
computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.
[weiss99] Gerhard Weiss, Ed. Multiagent systems: A modern approach to distributed
artificial intelligence. MIT Press, 1999.
[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, Ed., Multiagent
systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp.
27-77.
Appendix B: Definition and classification of agents
Definition of agents
Researchers involved in agent computing have offered a variety of definitions. General
features used to characterize an agent include autonomy, goal orientation,
collaboration, flexibility, self-starting behavior, temporal continuity, character,
adaptivity, mobility, and learning. According to IBM's definition, "Intelligent agents
are software entities that carry out some set of operations on behalf of a user or
another program with some degree of independence or autonomy, and in so doing, employ
some knowledge or representation of the user's goals or desires".
From Stan Franklin, “An autonomous agent is a system situated within and a part
of an environment that senses that environment and acts on it, over time, in pursuit of its
own agenda and so as to effect what it senses in the future”.
These features give rise to a wide range of agent types.
Interface Agents
Computer programs that employ artificial intelligence techniques to provide active
assistance to a user with computer-based tasks.
Mobile Agents
Software processes capable of moving around networks such as the World Wide Web
(WWW), interacting with other hosts, gathering information on behalf of their owners,
and returning with any requested information they find.
Co-operative Agents
A co-operative agent can communicate with, and react to, its environment. An agent's
view of its environment may be very narrow due to its limited sensors. Co-operation
exists when the actions of an agent achieve not only the agent's own goals but also
the goals of agents other than itself.
Reactive Agents
Reactive agents are a special type of agent that do not possess internal symbolic
models of their environment. Instead, a reactive agent "reacts" to a stimulus or input
that is governed by some state or event in its environment. This environmental event
triggers a reaction or response from the agent.
Appendix C: Agent Development and Implementation
JADE (Java Agent DEvelopment Framework) is a software framework fully
implemented in the Java language. It simplifies the implementation of multi-agent
systems through middleware and a set of tools that support the debugging and
deployment phases. The agent platform can be distributed across machines (which need
not even share the same OS), and the configuration can be controlled via a remote GUI. The
configuration can even be changed at run time by moving agents from one machine to
another as and when required. The minimal system requirement is version 1.2 of Java
(the runtime environment or the JDK).
Appendix D: Pros and cons of smart/learning agents
and applications
The pros of learning agents are:
1) An agent adapts to changes in its environment.
2) An agent can be customized.
3) An agent has manageable flexibility.
The cons are:
1) Agents need time to learn or relearn.
2) Agents can only automate preexisting patterns.
3) Agents have no common sense.
Appendix E: Exploitation and exploration in learning
For agents that use reinforcement learning, unlike systems that learn by training
examples, the issue arises of exploitation of obtained knowledge versus exploration to
obtain new information. Exploration gains no immediate reward and is only useful if it
can improve future utility. An exploitation-only policy, on the other hand, would
sacrifice any learning that could improve future expected reward in favor of
immediate reward.
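One common way to balance the two is an epsilon-greedy rule: exploit the current estimates most of the time, but explore a random action with small probability epsilon. This is a standard reinforcement-learning technique, not one prescribed by the spec above, and the epsilon value below is illustrative:

```java
import java.util.Random;

// Epsilon-greedy action selection: exploit (argmax over current Q estimates)
// with probability 1 - epsilon; explore (uniform random action) with
// probability epsilon.
class EpsilonGreedy {
    final double epsilon;
    final Random rng;

    EpsilonGreedy(double epsilon, Random rng) {
        this.epsilon = epsilon;
        this.rng = rng;
    }

    int choose(double[] qValuesForState) {
        if (rng.nextDouble() < epsilon)
            return rng.nextInt(qValuesForState.length); // explore
        int best = 0;                                   // exploit
        for (int a = 1; a < qValuesForState.length; a++)
            if (qValuesForState[a] > qValuesForState[best]) best = a;
        return best;
    }
}
```

Decaying epsilon over time (e.g., starting near 0.1 and shrinking) is a common refinement: early trials favor exploration, later trials favor exploiting what has been learned.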
Appendix F: Risks
We will seek to avoid several possible obstacles, including:
• The construction of “toy worlds,” i.e., problem specifications tailored to the
envisioned solution;
• Complexity of design without performance gain;
• Overfitting the generalizable components to the specific problem at hand,
putting reusability at risk;
• Premature commitment to a specific solution (Q-learning) as opposed to exploration
of various alternatives.
Reference to get title, author:
[xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.