Wednesday, June 3, 2015

Reinforcement Learning Agents

I thought I'd use agents for scheduling in the hybrid cloud, since they show good autonomic behaviour and are also capable of learning.

Notes from Stuart Russell and Peter Norvig, "Artificial Intelligence: A Modern Approach".

Supervised learning

Supervised learning methods are appropriate when a teacher provides correct values, or when the function's output represents a prediction about the future that can be checked against the percepts in the next time step.

How can agents learn in much less generous environments,
where the agent receives no examples,
and starts with no model of the environment and no utility function?

The agent must have some sort of feedback. It can try some random moves; if a move brings it closer to the solution, it receives a reward (reinforcement).

End state: the terminal state in the state-history sequence.

The task of reinforcement learning is to use rewards to learn a successful agent function.
The reward can be provided by a percept.
In complex domains, reinforcement learning is the only feasible way to train a
program to perform at high levels.

Basic working of Reinforcement learning

An agent in an environment gets percepts, maps some of them to positive or negative utilities, and then has to decide what action to take.


  • The environment can be accessible or inaccessible. 
    • In an accessible environment, states can be identified with percepts
    • In an inaccessible environment, the agent must maintain some internal state to try to keep track of the environment.
  • The agent can begin with knowledge of the environment and the effects of its actions; 
    • or it may have to learn this model as well as the utility information
  • Rewards can be received only in terminal states, or in any state
  • Rewards can be components of the actual utility that the agent is trying to maximize, or they can be hints as to the actual utility.
  • The agent can be a passive learner or an active learner.

Two basic designs

  • The agent learns a utility function on states and uses it to select actions that maximize the expected utility of their outcomes.
  • The agent learns an action-value function giving the expected utility of taking a given action in a given state. This is called Q-learning.
Utility function
An agent that learns utility functions must also have a model of the environment.
It must know the basic rules, i.e. which outcome states its actions lead to;
only then can it apply the utility function to those outcome states.
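The model-based selection described above can be sketched in a few lines. Everything here (the states, the transition model M, and the utilities U) is a made-up toy example, not anything from the book:

```python
# A sketch of model-based action selection: with a transition model M and a
# utility function U over states, pick the action whose expected outcome
# utility is highest. All names and numbers are illustrative assumptions.

# M[state][action] = list of (probability, next_state) pairs
M = {
    "s": {
        "safe":  [(1.0, "mid")],
        "risky": [(0.8, "good"), (0.2, "bad")],
    }
}
U = {"mid": 0.5, "good": 1.0, "bad": -1.0}

def choose(state):
    # Expected utility of action a in state i: sum_j P(j | i, a) * U(j)
    return max(M[state], key=lambda a: sum(p * U[j] for p, j in M[state][a]))

# "risky" wins: 0.8 * 1.0 + 0.2 * (-1.0) = 0.6 > 0.5 for "safe"
print(choose("s"))
```

The point is that the model M is indispensable: without it, the agent cannot compute which outcome states an action leads to.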

Action-value function
Agents that learn an action-value function do not know where their actions lead, so they cannot look ahead.
But they can compare the values of actions directly, without having to consider their outcomes.

An action-value function assigns an expected utility to taking a given action in a given state.
Q(a, i) denotes the value of doing action a in state i.
Q-values are directly related to utility values:
U(i) = max_a Q(a, i)
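The relationship U(i) = max_a Q(a, i) is easy to see with a tiny Q-table. The states, actions, and values below are invented purely for illustration:

```python
# Hypothetical Q-table: Q[state][action] = expected utility of that action.
Q = {
    "s1": {"left": 0.2, "right": 0.7},
    "s2": {"left": -0.1, "right": 0.4},
}

def utility(state):
    """U(i) = max over actions a of Q(a, i)."""
    return max(Q[state].values())

print(utility("s1"))  # 0.7 -- the best action ("right") determines U(s1)
```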

Two advantages:
  • they suffice for decision making without the use of a model
  • they can be learned directly from reward feedback
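Both advantages show up in the standard tabular Q-learning update, which learns Q-values directly from reward feedback with no model of the transitions. The toy environment below (a chain of states 0..3 where state 3 is terminal with reward +1) and the parameters alpha, gamma, and the episode count are all assumptions for the sketch:

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount (illustrative)
ACTIONS = [-1, +1]           # step left or step right along the chain
Q = defaultdict(float)       # Q[(state, action)], implicitly zero at start

def step(state, action):
    """Toy environment: states 0..3, reward +1 on reaching terminal state 3."""
    nxt = max(0, min(3, state + action))
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward, nxt == 3

random.seed(0)
for _ in range(500):                # 500 episodes of random exploration
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        # Q-learning update: nudge Q toward reward + discounted best next value
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, stepping right from state 2 looks better than stepping left.
assert Q[(2, +1)] > Q[(2, -1)]
```

Note that the update never consults a transition model: it only needs the observed next state and reward, which is exactly why Q-values can be learned directly from reward feedback.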

Passive learning in a known environment

In passive learning, the environment generates state transitions and the agent perceives them.
The objective is to use the information about rewards to learn the expected utility U(i) associated with each nonterminal state i.
The utility of a sequence is the sum of the rewards accumulated in the states of the sequence, i.e. it is additive.
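Given additive utilities, the simplest passive learner just averages the observed reward-to-go for each state over many training sequences (direct utility estimation). The episodes below are a made-up toy example:

```python
from collections import defaultdict

# Each episode is a list of (state, reward) pairs ending in a terminal state.
episodes = [
    [("a", 0.0), ("b", 0.0), ("win", 1.0)],
    [("a", 0.0), ("c", 0.0), ("lose", -1.0)],
    [("a", 0.0), ("b", 0.0), ("win", 1.0)],
]

totals = defaultdict(float)
counts = defaultdict(int)
for ep in episodes:
    rewards = [r for _, r in ep]
    for t, (state, _) in enumerate(ep):
        # Reward-to-go: sum of rewards from this step onward (additivity).
        totals[state] += sum(rewards[t:])
        counts[state] += 1

# U(i) is estimated as the average observed reward-to-go from state i.
# Here U["a"] averages the rewards-to-go 1, -1, 1, giving 1/3.
U = {s: totals[s] / counts[s] for s in totals}
```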
