Friday, July 16, 2010

planning paradigms

I plan to unify the following planning paradigms (this is just a draft). Let me outline the scope and try to ask / answer some basic questions:

The paradigms are:
  1. DP (deductive planning) -- in which planning is reduced to inference
  2. RL (reinforcement learning)
  3. classical, state-space-search planning (e.g. STRIPS)
  4. partial order planning, GRAPHPLAN, HTN (hierarchical task networks)
  5. utility theory, decision theory
  6. decision networks / influence graphs
  7. MDP (Markov decision processes), POMDP (partially observable MDPs)
  8. decision-theoretic agents
  9. "action theories" -- how to specify actions in logic
Some basic questions:


Should planning subsume inference, or inference subsume planning, or shall we have both?  Are there advantages (or disadvantages) if we have both mechanisms interleaved?  Or maybe it'll just create a mess -- "techniques must not be multiplied beyond necessity"?

A basic question is whether DP can mimic the mechanisms of:
  1. decomposition
  2. partial order planning
  3. hierarchical task planning
[ TO-DO:  insert architecture diagram ]


My latest conclusion is:
      DP + inductive learning   subsumes   RL

RL is the learning of an optimal policy.  A policy is a mapping:
        states -> actions

In deductive planning, we can have backward-chaining from a goal, but also forward-chaining from the current state.  The latter approach seems equivalent to RL, especially when augmented with inductive learning, where the learner can learn rules that function like the mappings above.
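As a concrete toy illustration of the policy-as-mapping view, here is a minimal tabular Q-learning sketch; the 5-state chain world, the reward scheme, and all names are invented for illustration only:

```python
import random

# Tabular Q-learning on a toy 5-state chain: states 0..4, actions
# "left"/"right", reward 1 for reaching state 4. Illustrative only.

random.seed(0)
N_STATES = 5
ACTIONS = ("left", "right")
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, s - 1) if a == "left" else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

alpha, gamma, epsilon = 0.5, 0.9, 0.2
for _ in range(300):                     # 300 episodes of experience
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # TD update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The learned policy is literally a lookup table  states -> actions:
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)])
          for s in range(N_STATES - 1)}
```

The `policy` table at the end is exactly the kind of states -> actions mapping that a forward-chaining rule learner would have to acquire as rules.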

However, I should add that DP + IL (inductive learning) only mimics RL functionally, but is not equivalent to RL algorithmically.  Perhaps there are some merits in the RL algorithm?

One thing can be said for sure:  RL cannot simply co-exist with another planning method without creating conflicts.  In my earlier analysis I concluded that the planner should override RL whenever it has solutions, since it is knowledge-based and more sophisticated.  RL is useful when the planner is clueless, particularly in areas where knowledge is scant.


What is their relation to classical planning?

To be continued...


  1. YKY,

    I believe that RL can coexist with classical planning (in the limit of crisp, observable domains) and with Monte Carlo tree planning (for partially observable Markov processes), in the following manner.

    In my previous post, I talked about the idea of having a set of rules which would look much like the existing BN rules, but which would talk about utility and expected value. (NOTE: we need some sort of temporal logic to state these in?) RL-like techniques, informed by decision network / influence graph algorithms, would be used to update the values of these rules. However, as in *some* existing RL algorithms, there would be a separation between the attempt to learn the model of the world (including consequences of actions) and the attempt to compute expected values from this. This means that, as you said, RL emerges as a combination of inductive learning and some form of planning technique.
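To make the idea slightly more concrete, here is a highly speculative sketch of such utility-annotated rules; the representation (`UtilityRule`), the TD-style update, and the example rule are all hypothetical stand-ins, not Genifer's actual design:

```python
from dataclasses import dataclass

# Speculative sketch: BN-style rules annotated with an expected value,
# whose values are updated by an RL-like (TD) rule. All names hypothetical.

@dataclass
class UtilityRule:
    condition: str      # e.g. "hungry(self)" -- stands in for a logical formula
    action: str         # e.g. "eat"
    value: float = 0.0  # current estimate of the rule's expected utility

def td_update(rule, observed_reward, next_best_value, alpha=0.1, gamma=0.9):
    """Move the rule's value toward reward + discounted best successor value."""
    rule.value += alpha * (observed_reward + gamma * next_best_value
                           - rule.value)

r = UtilityRule("hungry(self)", "eat")
for _ in range(200):          # repeated experience of reward 1.0, no successor
    td_update(r, 1.0, 0.0)
# r.value converges toward 1.0
```

The point of the separation mentioned above is that `td_update` only maintains values; learning the world model (the rules' conditions and consequences) would be done by the inductive learner.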

    A reasonable choice for the planning part seems to be MCTS, because of its generality, compatibility with partially observable Markov domains, and compatibility with RL. Moreover, MCTS has shown excellent performance in the field. The only thing that makes me hesitate is that there are some "dark art" issues of fine-tuning to make it work well: the MC stage is not well understood, and changes that intuitively should make the MC stage more accurate appear to degrade performance instead!
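For reference, a minimal one-level UCT sketch: UCB1 selection at the root plus random playouts, with deeper tree growth omitted for brevity. The toy chain domain and all names here are illustrative assumptions, not Genifer code:

```python
import math, random

# One-level UCT on a toy chain world: states 0..4, actions step left/right,
# reward 1 for reaching state 4 within a fixed horizon.

random.seed(0)
GOAL, HORIZON = 4, 10
ACTIONS = (-1, +1)

def step(s, a):
    return min(GOAL, max(0, s + a))

def rollout(s, depth):
    # The Monte Carlo stage: a random playout estimates the state's value.
    for _ in range(depth):
        if s == GOAL:
            return 1.0
        s = step(s, random.choice(ACTIONS))
    return 1.0 if s == GOAL else 0.0

def uct_action(root, n_iter=2000, c=1.4):
    N = {a: 0 for a in ACTIONS}    # visit count per root action
    W = {a: 0.0 for a in ACTIONS}  # accumulated rollout reward per action
    for i in range(1, n_iter + 1):
        # Selection by the UCB1 rule; unvisited actions are tried first.
        a = max(ACTIONS, key=lambda a: float("inf") if N[a] == 0
                else W[a] / N[a] + c * math.sqrt(math.log(i) / N[a]))
        W[a] += rollout(step(root, a), HORIZON - 1)
        N[a] += 1
    return max(ACTIONS, key=lambda a: N[a])   # most-visited action wins

best = uct_action(2)   # from state 2, stepping toward the goal should dominate
```

The `rollout` function is the poorly-understood MC stage: it is just a random walk, and that random walk is precisely what one might want to replace with actual reasoning.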

    Come to think of it, in the context of Genifer, there are some obvious modifications to MCTS that we could try. We prefer to use belief prop rather than MCMC as our estimation algorithm... so perhaps we should try prop. algs before MC algs... and more generally, let Genifer *reason* about the goodness of a potential path, rather than just doing random-walk trials to evaluate its goodness... (Perhaps this would move us toward DP.)

  2. I assume by "inference" here you mean "inference control"? I see two aspects to "online planning" (online in the machine learning sense): "strategy construction" and "action selection". Inference control is an example of online planning. More sophisticated planning would be emergent from the system's dynamics, but at the bottom there are the algorithms doing inference control, and these are planning algorithms.

    It's always been that "Reinforcement Learning = Dynamic Programming + Inductive Learning", so your observation is literally true :-) (It could be argued that "deductive planning" is "dynamic programming in relational domains".)
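To spell out the slogan, here is the dynamic-programming half on its own: value iteration over a *known* toy model (a 5-state chain, chosen purely for illustration). RL would additionally have to learn this model inductively from experience:

```python
# Value iteration: the DP half of "RL = DP + inductive learning",
# run on a given model of a toy chain MDP (states 0..4, goal at 4).

N_STATES, GAMMA = 5, 0.9
ACTIONS = (-1, +1)

def step(s, a):                      # known transition model
    return min(N_STATES - 1, max(0, s + a))

def reward(s, a):                    # known reward model: 1 for entering goal
    return 1.0 if step(s, a) == N_STATES - 1 else 0.0

V = [0.0] * N_STATES
for _ in range(100):                 # Bellman backups to convergence
    V = [0.0 if s == N_STATES - 1 else
         max(reward(s, a) + GAMMA * V[step(s, a)] for a in ACTIONS)
         for s in range(N_STATES)]

# Greedy policy extracted from the converged values:
policy = {s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V[step(s, a)])
          for s in range(N_STATES - 1)}
```

Because the model is given, no trial-and-error experience is needed here; plugging an inductive model-learner into this loop is what recovers RL.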