Thursday, July 15, 2010

Thoughts on Goals

These are some thoughts on what might constitute an effective goal system.

One use-case I like to keep in mind is inference guidance. The goal system would not necessarily be consulted for guidance at every inference step (that might be too expensive), but imagining that it were, we'd like to have a system that could perform fairly well.

The basic case, then, would be to think of a query as a goal and create a system which would respond to it in a manner similar to our existing inference algorithm (utilizing some background knowledge about how to achieve this goal).

Basically, we'd want to tell the system: if we have the goal of estimating variable A, and we have a rule involving a variable that unifies with A, then the other variables in that rule become subgoals; we can get a better estimate for A by improving our estimates for them and then applying belief propagation to compute A. This is a fairly messy statement, because it implicitly relies on a convergence argument. The idea is that taking more rules into account tends to produce a better estimate... but this is not always the case, of course. What we're relying on is something like: "taking all the rules into account would yield the best estimate we could come up with, and chains of reasoning with more elements will tend to get closer to it."
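As a rough illustration, here is a minimal Python sketch of that recursion. Everything in it is hypothetical: the Rule representation, the prior fallback, and the message-averaging stand-in for belief propagation are simplifications for the sake of the example, not Genifer's actual machinery.

```python
# A minimal sketch of goal-directed inference guidance. Names (Rule,
# estimate) and the averaging "combine" step are illustrative only;
# real belief propagation would pass proper messages.

from dataclasses import dataclass

@dataclass
class Rule:
    head: str    # the variable this rule concludes about
    body: list   # the other variables the rule involves

    def apply(self, body_estimates):
        # stand-in for one belief-propagation message toward the head
        return sum(body_estimates) / max(len(body_estimates), 1)

def estimate(goal, rules, priors, depth=2):
    """Estimate `goal`, recursively treating rule bodies as subgoals."""
    if depth == 0:
        return priors.get(goal, 0.5)        # fall back on the prior
    messages = [priors.get(goal, 0.5)]
    for rule in rules:
        if rule.head == goal:               # head unifies with the goal
            subgoal_estimates = [estimate(v, rules, priors, depth - 1)
                                 for v in rule.body]
            messages.append(rule.apply(subgoal_estimates))
    # "taking more rules into account": pool all incoming messages
    return sum(messages) / len(messages)
```

Note how the quality of the answer depends on the depth cutoff: deeper recursion takes more rules into account, which is exactly the convergence assumption made above.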

The line of thought from the basic facts to the appropriate action would be highly nontrivial. As a general principle, I think it's usually infeasible to plan an action from first principles every time we need to decide what to do. This means (roughly) that the framework of utility theory, which tells us to take the action that maximizes expected utility, is insufficient on its own. In principle we could get by with just declarative knowledge, but in practice we need procedural knowledge.

Genifer currently uses declarative rules, which tell her how facts follow from other facts. What I'm suggesting is that we also use procedural rules, which would tell her what to do when. Put together, procedural rules would resemble programs.

The idea here is that rather than deciding what to do for each new problem based on a planning algorithm, the system should always be in the process of constructing an overall policy, a program which it uses to make decisions. This reinforces the theme of automatic programming in Genifer.

The basic goal is to construct a set of rules which imply the highest success rate (the highest expected utility). Updateless decision theory tells us that we should calculate the expected success rate in our earliest prior distribution, that is, refusing to update on anything we learn. This technically loses nothing, because we can create rules "If you have learned such-and-such then ...", and the method solves a few problems with conventional decision theory. Notably, though, conditional decision theory (in which we look for the policy with the highest expected value given all that we know) works just fine in most cases, and would probably be faster to reason about.
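To make the contrast concrete, here is a toy sketch of updateless-style policy choice: whole policies (mappings from possible observations to actions, i.e. bundles of "if you have learned such-and-such then ..." rules) are scored by expected utility under the prior, never updating on evidence. All the worlds, likelihoods, and payoffs below are invented numbers, purely for illustration.

```python
# Toy updateless policy selection: score entire observation->action
# policies in the prior distribution. Assumed worlds and payoffs.

from itertools import product

prior = {"world1": 0.5, "world2": 0.5}          # assumed prior over worlds
observations = ["obs_a", "obs_b"]
actions = ["act_x", "act_y"]

def utility(world, action):                     # assumed payoff table
    table = {("world1", "act_x"): 1.0, ("world1", "act_y"): 0.0,
             ("world2", "act_x"): 0.0, ("world2", "act_y"): 1.0}
    return table[(world, action)]

def obs_prob(obs, world):                       # assumed likelihoods
    return 0.8 if (obs, world) in {("obs_a", "world1"),
                                   ("obs_b", "world2")} else 0.2

def expected_utility(policy):
    # expectation taken in the prior, never updating on the observation
    return sum(prior[w] * obs_prob(o, w) * utility(w, policy[o])
               for w in prior for o in observations)

policies = [dict(zip(observations, acts))
            for acts in product(actions, repeat=len(observations))]
best = max(policies, key=expected_utility)
print(best, expected_utility(best))
```

Nothing is lost relative to updating, because the policy itself reacts to each possible observation; the expectation is just taken earlier.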

Either way, the basic paradigm for converting declarative knowledge into procedural knowledge is to search for rules with high expected utility. Unfortunately, the utilities of rules are interrelated: we've got to know how we will behave in the future in order to know which action is best to take now. This problem has standard treatments in the reinforcement learning literature, however.
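Value iteration is one such standard treatment: the Bellman backup handles the interrelation by iterating "the value of acting now, given how we act later" to a fixed point. The toy MDP below is assumed for illustration and has nothing to do with Genifer's actual state space.

```python
# Value iteration on a toy MDP: the value of each state depends on
# future behavior, so we iterate the Bellman backup to a fixed point.

states = ["s0", "s1"]
actions = ["a0", "a1"]
gamma = 0.9                                   # discount factor (assumed)

def step(s, a):                               # assumed dynamics: (next_state, reward)
    return ("s1", 1.0) if (s, a) == ("s0", "a1") else ("s0", 0.0)

V = {s: 0.0 for s in states}
for _ in range(100):                          # iterate toward the fixed point
    V = {s: max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
         for s in states}

# read off the greedy policy from the converged values
policy = {s: max(actions,
                 key=lambda a, s=s: step(s, a)[1] + gamma * V[step(s, a)[0]])
          for s in states}
print(V, policy)
```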

The basic idea, then: a declarative rule has the form statements -> statements, with the conclusion carrying a probability, while an imperative rule has the form statements -> actions, with the actions carrying (estimated) expected utilities.
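As data structures, the two rule forms might look like the following sketch. The class and field names are assumptions for illustration, not an existing Genifer API.

```python
# A sketch of the two rule forms as data: declarative conclusions carry
# probabilities; imperative actions carry estimated expected utilities.

from dataclasses import dataclass

@dataclass
class DeclarativeRule:
    premises: tuple          # statements
    conclusion: str          # a statement
    probability: float       # confidence in the conclusion given the premises

@dataclass
class ImperativeRule:
    premises: tuple          # statements describing the situation
    action: str              # what to do
    expected_utility: float  # estimated value of doing it

rain = DeclarativeRule(("clouds", "low_pressure"), "rain", 0.8)
grab = ImperativeRule(("rain",), "take_umbrella", 0.7)
```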

What about goals and subgoals? It makes sense to think of a goal as a utility-bearing statement which is not under our direct control. We want to achieve this goal by our actions; an action is something which is under direct control. So if we generalize the above a bit, allowing imperative rules to be arbitrary "statements -> statements" rules, we get goals and subgoals automatically via the formula "subgoal -> goal." A top-level goal is a statement given unconditional utility (that is to say, utility not derived from any rule).

[Edit: this is a declarative rule subgoal -> goal,  which should be accompanied by an imperative rule trying(goal) -> trying(subgoal), or something to that effect.]
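A minimal sketch of that pairing, with an assumed string representation for trying(...): each declarative rule subgoal -> goal mechanically yields an imperative rule trying(goal) -> trying(subgoal).

```python
# Deriving imperative subgoaling rules from declarative rules, under the
# convention above. The representation (strings, pairs) is assumed.

declarative = [("have_ingredients", "have_dinner")]   # subgoal -> goal
top_level_utility = {"have_dinner": 10.0}             # unconditional utility

def trying(goal):
    return f"trying({goal})"

def subgoal_rules(decl_rules):
    """For each subgoal -> goal, emit trying(goal) -> trying(subgoal)."""
    return [(trying(goal), trying(subgoal)) for subgoal, goal in decl_rules]

print(subgoal_rules(declarative))
# [('trying(have_dinner)', 'trying(have_ingredients)')]
```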

Top-level goals should have basic utility, which cannot be changed except by us. They may also have expected value, which is the sum of the basic utility and the additional utility conferred by the future consequences of achieving that goal. Subgoals and actions will generally just have expected value. Basic utility can be thought of as the declarative, decision-theoretic measure; expected value, on the other hand, serves as our imperative measure, because it is a local quantity that can be used immediately to make decisions. Essentially, the imperative program we follow emerges from our imperative rules via the rule of always taking the action with the highest expected value.
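A small sketch of the two measures, under invented numbers: expected value adds (discounted) downstream consequences to basic utility, and the decision rule simply maximizes expected value over the available actions.

```python
# Basic utility vs. expected value, with assumed goals and numbers.
# Expected value = basic utility + discounted value of consequences.

basic_utility = {"goal_g": 5.0, "goal_h": 2.0}  # set only by us
consequences = {"goal_g": ["goal_h"],           # achieving g enables h
                "goal_h": []}
gamma = 0.9                                     # assumed discount

def expected_value(node):
    base = basic_utility.get(node, 0.0)
    return base + gamma * sum(expected_value(c)
                              for c in consequences.get(node, []))

def choose(actions_to_outcomes):
    """Always take the action with the highest expected value."""
    return max(actions_to_outcomes,
               key=lambda a: expected_value(actions_to_outcomes[a]))

print(choose({"act_1": "goal_g", "act_2": "goal_h"}))   # -> act_1
```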

Fleshing out this solution will draw on ideas from reinforcement learning, and may also draw on ideas from influence diagrams (the Bayes-net answer to utility reasoning), causal Bayesian reasoning (though causal decision theory solves fewer problems than updateless decision theory!), and other areas.

[Edit: the declarative/imperative distinction I am making is somewhat messy. I should instead distinguish 3 classes of information: information about probability, information about utility, and information about expected value. "Information" takes the form of both rules ("B->A") and naked statements ("A").]
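One way to picture those three classes as tagged data (all names assumed):

```python
# The three information classes as tagged records. Each class admits
# both rules ("B -> A", condition set) and naked statements ("A").

from dataclasses import dataclass
from typing import Optional

@dataclass
class Info:
    kind: str                 # "probability" | "utility" | "expected_value"
    statement: str            # the statement "A"
    condition: Optional[str]  # "B" for a rule "B -> A"; None for naked "A"
    value: float              # probability, utility, or expected value

facts = [
    Info("probability", "rain", "clouds", 0.8),       # rule: clouds -> rain
    Info("utility", "have_dinner", None, 10.0),       # naked utility statement
    Info("expected_value", "take_umbrella", "rain", 0.7),
]
```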

Comments:
Nice. Conditional actions "C->A" having separate expected values allow for hierarchical RL, intra-option learning, etc.