1. Learning agents spec for CSE333 9/19/02 1
CSE333 project initial spec: Learning agents
Participants:
Huayan Gao (huayan.gao@uconn.edu),
Thibaut Jahan, (thj@ifrance.com)
David Keil, (DavidKeil@aol.com)
Jian Lian (lianjian@yahoo)
1. Objectives and goals
This project will investigate current research on software learning agents and will
implement a simple system that demonstrates such agents. Our goal is to build a
distributed learning agent system that interactively finds a policy for navigating a maze.
Our implementation will be component-based, using UML and Java. It may also include
investigation of the system's scalability, robustness, and adaptability. Four candidate
components of a distributed learning agent are perception, action, communication, and
learning. Our ambition is to build a general architectural model of components for
learning agents. We will implement the different "generic" components so that they can be
assembled easily into an agent. Learning through interaction with the agent's environment
is the problem of reinforcement learning. We will therefore address reinforcement
learning, specifically Q-learning.
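One way the four candidate components might fit together can be sketched in Java. All type and method names below are our own illustrative placeholders, not a committed design:

```java
// Illustrative sketch of the four "generic" components; names are placeholders.
interface Perception { int sense(); }          // produce a percept (here: a state id)
interface Actuator   { void act(int action); } // apply an action to the environment
interface Channel    { void send(String msg); String receive(); } // inter-agent messages
interface Learner {
    int choose(int state);                     // pick an action for a state
    void update(int state, int action, double reward, int nextState);
}

// An agent is then an assembly of the generic components.
class SketchAgent {
    private final Perception perception;
    private final Actuator actuator;
    private final Learner learner;

    SketchAgent(Perception p, Actuator a, Learner l) {
        perception = p; actuator = a; learner = l;
    }

    // One perceive-decide-act cycle; returns the chosen action.
    int step() {
        int state = perception.sense();
        int action = learner.choose(state);
        actuator.act(action);
        return action;
    }
}
```

Separating the interfaces this way is what would let the same Learner be reused with different mazes or communication schemes.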
2. Topic summary
Reinforcement learning
Reinforcement learning is rational policy search and revives ideas associated with
adaptive systems and related to optimal control and dynamic programming [sut-bar98].
Traditional machine-learning research approaches assumed that learning was offline
(separated from application of knowledge learned).
A policy maps from agent states (shaped by percepts) to actions, defining an agent’s
actions as a series of responses to previously unknown, dynamically generated percepts.
A rational agent is one that acts to maximize its expected utility or future reward or
performance measure. Because their actions may affect the environment, such agents
must incorporate thinking or planning ahead into their computations. Because they obtain
information from their environments only through percepts, they have incomplete
knowledge of the environment and must conduct a trial-and-error search for a policy that
obtains a high performance measure.
Q-learning
Q-learning is a variant of reinforcement learning in which the agent incrementally
computes, from its interaction with its environment, a table of expected aggregate future
rewards, with values discounted as they extend into the future. As it proceeds, the agent
modifies the values in the table to refine its estimates. The Q function maps a
state-action pair to its expected discounted future reward; the optimal action in a state
is the one that maximizes Q. The evolving table of estimated Q values is called Q̂.
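The incremental table update described above can be sketched as follows. The tabular representation and the learning-rate and discount values are illustrative assumptions for discussion, not part of this spec:

```java
// Minimal tabular Q-learning sketch (one-step Q-learning, as in [sut-bar98]).
// States and actions are small integers indexing the Q table.
class QTable {
    final double[][] q;       // q[state][action]: estimated discounted future reward
    final double alpha = 0.1; // learning rate (illustrative value)
    final double gamma = 0.9; // discount factor (illustrative value)

    QTable(int numStates, int numActions) {
        q = new double[numStates][numActions]; // initialized to all zeros
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int s, int a, double reward, int sNext) {
        double best = q[sNext][0];
        for (double v : q[sNext]) best = Math.max(best, v);
        q[s][a] += alpha * (reward + gamma * best - q[s][a]);
    }

    // Greedy action: the argmax over the current estimates for state s.
    int greedy(int s) {
        int best = 0;
        for (int a = 1; a < q[s].length; a++)
            if (q[s][a] > q[s][best]) best = a;
        return best;
    }
}
```

Each call to update nudges one table entry toward the reward just observed plus the discounted best estimate for the next state, which is how the table converges toward the true Q values over many trials.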
Intelligent agents
The term “agent” is used in two senses: (a) programs that act on behalf of humans to
gather information [syc-pan96]; (b) entities that interact with their environments by
retrieving percepts and generating actions. We will use the common restriction of (b) to
rational agents that act in such a way as to maximize future expected reward.
In this project, we consider only software agents, as opposed to autonomous robots,
expert assistants, etc.
A central problem with any intelligent agent system is the degree of trust placed in
the agent's ability to cope with the information its sensors provide about its
environment. This will be our emphasis when we study the agent.
Agent applications span economics, business (commercial databases), management,
telecommunications (network management), and e-societies (e.g., e-commerce). These
areas combine techniques from databases, statistics, and machine learning, and agent
applications are widely used in them. In telecommunications, agent technology is used
to support efficient (in terms of both cost and performance) service provision to fixed
and mobile users in competitive telecommunications environments.
3. Topic breakdown
An example problem
The concrete problem described below will help to define how the project breaks
down into components:
Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for
which the learner must find a policy, where the reward is determined by eventually
reaching or not reaching a goal location in the maze.
We propose to modify the original problem definition by permitting multiple
distributed agents that communicate, either directly or via the environment. Either the
multi-agent system, or each agent, will use Q-learning. The mazes can be made arbitrarily
simple or complex to fit the speed and computational power and effectiveness of the
system we are able to develop in the time available.
A further interesting variant of the problem would be to allow the maze to change
dynamically, either autonomously or in response to the learning agents. Robust
reinforcement learners will adapt successfully to such changes.
Topic breakdown
1. Machine learning
Part of this project will consist of investigating the literature on machine learning,
particularly reinforcement learning, and defining an approach based on this literature that
is realistically implementable by the team.
2. Agent computing
We will survey the agent paradigm of computing, focusing on rational agents, as
described in part 2 above. We will apply these concepts to the problem of machine
learning, as is done in much reinforcement-learning research.
3. Distributed computing
In multiagent learning in the strong sense, a common learning goal is pursued or, in
the weaker sense, agents pursue separate goals but share information. Distributed agents
may identify or execute distinct learning subtasks [weiss99]. We will survey the literature
on distributed computing, looking for connections to learning agents, and will apply what
we find in an attempt to build a distributed system of cooperating learning agents.
4. Implementation using Together, UML, and Java
The maze described above could be represented as a bitmap or two-dimensional
array of squares. Starting with a simple example is useful in order to concentrate on good
component design and successful implementation.
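A minimal sketch of such a maze representation follows; the grid encoding (walls as '#', goal as 'G') and the reward values are illustrative assumptions:

```java
// Sketch: a maze as a two-dimensional array of squares.
// '#' marks a wall, 'G' the goal, '.' an open square (our own encoding).
class Maze {
    final char[][] grid;

    Maze(char[][] grid) { this.grid = grid; }

    // A move is blocked by walls and by the maze boundary.
    boolean isWall(int row, int col) {
        return row < 0 || row >= grid.length
            || col < 0 || col >= grid[0].length
            || grid[row][col] == '#';
    }

    // Reward signal for entering a square: positive only at the goal,
    // matching the example problem, where reward depends on reaching it.
    double reward(int row, int col) {
        return grid[row][col] == 'G' ? 1.0 : 0.0;
    }

    // Each square maps to one state id, so the maze plugs into a Q table.
    int stateId(int row, int col) {
        return row * grid[0].length + col;
    }
}
```

Keeping the maze behind this small surface (wall test, reward, state id) is what would let us swap in arbitrarily simple or complex mazes without touching the learner.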
Division of labor
The team members will work together on each main aspect of the project; however,
it is envisioned that leadership of the work in the respective areas will be distributed as
follows:
• Machine learning: David
• Agent computing: Jian
• Distributed computing: Huayan
• Tools and implementation: Thibaut
Scope
We will consult sources to gain a survey knowledge of the fields of agent computing,
distributed computing, and reinforcement learning, especially Q-learning. Our design and
implementation effort will focus narrowly on an artifact of realistic limited scope that
solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will
relate the features of our implementation to recent research in the same narrow area and
to broader concepts encountered in the sources.
4. Planned activities
Sept 19th – Oct 22nd:
We will acquire the knowledge needed in order for us to design learning agents.
Concurrently, we will start designing the learning agents using UML; design issues are
critical for the future implementation. Simple Java prototypes of standalone agents that
navigate a maze will be built, beginning with classes generated by Together.
Oct 12th – Oct 21st:
Further research in sources on distributed learning agents. Drafting of a summary of
the source research. Design and implementation of communicating distributed agents
with simple learning features. Preparation of the mid-term report.
Oct 22nd – mid November:
Java implementation of the learning aspect of the agents and enhancement of
communication efficiency. Each participant will code the components decided on and
described in the design part. Once these components are tested, they will be integrated
and the resulting system tested.
End of November:
Preparation of the final report and last adjustments of the learning agents.
APPENDICES
Appendix A: References
The list of references below will be reduced to those actually cited in the paper or
used in the implementation.
[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in
a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.
[anders02] Chuck Anderson. Robust Reinforcement Learning with Static and Dynamic
Stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.
[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence. MIT Press, 1999, pp. 121ff.
[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?:
A taxonomy for autonomous agents. Proceedings of the Third International Workshop
on Agent Theories, Architectures, and Languages, 1996.
www.msci.memphis.edu/~franklin/AgentProg.html
[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies
of agents. In Gerhard Weiss, Ed., Multiagent systems: A modern approach to
distributed artificial intelligence, MIT Press, 1999, pp. 79-120.
[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and
methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B,
MIT Press, 1990, pp. 1158-1199.
[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.
[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning
through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.
[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence.
IEEE Expert, December 1996.
[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach.
Prentice Hall, 1995.
[SAG97] Software Agents Group MIT Media Laboratory. “CHI97 Software Agents
Tutorial”, http://pattie.www.media.mit.edu/people/pattie/CHI97/.
[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence, MIT Press, 1999, pp. 201-258.
[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard
Weiss, Ed., Multiagent systems: A modern approach to distributed artificial
intelligence, MIT Press, 1999, pp. 259-298.
[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science
Press, 1994.
[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An
introduction. MIT Press, 1998.
[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, Keith
Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.
[venners97] Bill Venners. The architecture of aglets. JavaWorld, April 1997.
[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall. A note on distributed
computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.
[weiss99] Gerhard Weiss, Ed. Multiagent systems: A modern approach to distributed
artificial intelligence. MIT Press, 1999.
[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, Ed., Multiagent
systems: A modern approach to distributed artificial intelligence, MIT Press, 1999, pp.
27-77.
Appendix B: Definition and classification of agents
Definition of agents
Researchers involved in agent computing have offered a variety of definitions. General
features used to characterize an agent include autonomy, goal orientation,
collaboration, flexibility, self-starting behavior, temporal continuity, character,
adaptivity, mobility, and learning. According to IBM's definition, "Intelligent agents
are software entities that carry out some set of operations on behalf of a user or
another program with some degree of independence or autonomy, and in so doing, employ
some knowledge or representation of the user's goals or desires".
From Stan Franklin, “An autonomous agent is a system situated within and a part
of an environment that senses that environment and acts on it, over time, in pursuit of its
own agenda and so as to effect what it senses in the future”.
These features give rise to a wide range of agent types.
Interface Agents
Computer programs that employ artificial intelligence techniques to provide active
assistance to a user with computer-based tasks.
Mobile Agents
Software processes capable of moving around networks such as the World Wide Web
(WWW), interacting with other hosts, gathering information on behalf of their owners,
and returning with any requested information they find.
Co-operative Agents
A co-operative agent can communicate with, and react to, its environment. An agent's
view of its environment may be very narrow due to its limited sensors. Co-operation
exists when the actions of an agent achieve not only the agent's own goals but also
the goals of agents other than itself.
Reactive Agents
Reactive agents are a special type of agent that do not possess internal symbolic
models of their environment. Instead, a reactive agent "reacts" to a stimulus or input
that is governed by some state or event in its environment. This environmental event
triggers a reaction or response from the agent.
Appendix C: Agent Development and Implementation
JADE (Java Agent DEvelopment Framework) is a software framework fully
implemented in the Java language. It simplifies the implementation of multi-agent
systems through middleware and a set of tools that support the debugging and
deployment phases. The agent platform can be distributed across machines (which need
not even share the same OS), and the configuration can be controlled via a remote GUI. The
configuration can even be changed at run time by moving agents from one machine to
another as and when required. The minimal system requirement is version 1.2 of Java
(the runtime environment or the JDK).
Appendix D: Pros and cons of smart/learning agents
and applications
The pros of learning agents are:
1) An agent adapts to changes in its environment.
2) An agent can be customized.
3) An agent has manageable flexibility.
The cons are:
1) Agents need time to learn or relearn.
2) Agents can only automate preexisting patterns.
3) Agents have no common sense.
Appendix E: Exploitation and exploration in learning
For agents that use reinforcement learning, unlike systems that learn by training
examples, the issue arises of exploitation of obtained knowledge versus exploration to
obtain new information. Exploration gains no immediate reward and is only useful if it
can improve future utility. An exploitation-only policy, on the other hand, would
sacrifice any learning that could improve future expected reward in favor of
immediate reward.
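One common way to balance the two is an epsilon-greedy rule: exploit the current estimates most of the time, but explore a random action with small probability epsilon. This is a standard reinforcement-learning technique, not one prescribed by the spec above, and the epsilon value below is illustrative:

```java
import java.util.Random;

// Epsilon-greedy action selection: exploit (argmax over current Q estimates)
// with probability 1 - epsilon; explore (uniform random action) with
// probability epsilon.
class EpsilonGreedy {
    final double epsilon;
    final Random rng;

    EpsilonGreedy(double epsilon, Random rng) {
        this.epsilon = epsilon;
        this.rng = rng;
    }

    int choose(double[] qValuesForState) {
        if (rng.nextDouble() < epsilon)
            return rng.nextInt(qValuesForState.length); // explore
        int best = 0;                                   // exploit
        for (int a = 1; a < qValuesForState.length; a++)
            if (qValuesForState[a] > qValuesForState[best]) best = a;
        return best;
    }
}
```

Decaying epsilon over time (e.g., starting near 0.1 and shrinking) is a common refinement: early trials favor exploration, later trials favor exploiting what has been learned.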
Appendix F: Risks
We will seek to avoid several possible obstacles, including:
• The construction of “toy worlds,” i.e., problem specifications tailored to the
envisioned solution;
• Complexity of design without performance gain;
• Overfitting the generalizable components to the specific problem at hand,
putting reusability at risk;
• Premature commitment to a specific solution (Q-learning) as opposed to exploration
of various alternatives.
Reference to get title, author:
[xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.