Saturday, July 31, 2010

Current Design

We've reached a point where we're worrying about implementation more than design for a bit, so I thought it might be good to record my understanding of the design being implemented.

Language:

statement := head <- body truthvalue
head := atomicstatement
body := [atomicstatement { & [~] atomicstatement }*]
atomicstatement := predicate(term{,term}*)
term := constant | variable | function(term{,term}*)
truthvalue := (probability, confidence)
probability is in range [0,1]
confidence is in range [0,1]

IE, a statement has a conclusion (which is an atomic statement) in the head, and a possibly empty list of possibly negated atomic statements which are conditions, in its body. It records a probability for the conclusion given the conditions, including a (NARS-style) confidence for that conclusion. (If the list is empty, then we are just asserting something about the head's probability; if it's nonempty, we're asserting a conditional probability.)

Functions may be Skolem functions, which is how the system expresses existential statements. Universal statements are expressed by leaving variables unbound; in this case, the probability associated with the statement is intended to be the probability that a randomly chosen instantiation is true.

A simple query for the truth value of an atomic statement proceeds in roughly the following way. We are trying to construct a bayesian network to answer the query. The query statement is the first node, and each new node is another atomic statement which is deemed relevant; the links in the network come from the conditional rules. At each step, we look for new statements which "match," meaning that they have an atomic formula whose free variables can be instantiated to match the appearance of one of the current nodes in the bayesian network.

When a new matching statement is found, it is instantiated and added to the growing network. In the case of statements with empty condition-lists, matching provides a probability estimate for one of the nodes, which allows belief propagation to occur. When two different rules match at their head, however, (conditions or no conditions) we cannot simply add them to the network; we have a rule combination problem (my prev. post).

The procedure for dealing with rule combination problems that we've decided on at present is as follows. (Rules given for probability only; deciding how to propagate confidence is not hard, but we haven't done it yet.)

  1. If the two statements have conditions which are mutually exclusive, we can just add the two resulting probabilities together. With the simple statement format given above, this is easy to test for: when one rule has A in its condition list and the other has ~A, the two are mutually exclusive.
  2. Failing this, we check for primacy of one rule over another. This is equivalent to my notion of "derived rule" in my post on the combination problem, but we'll be storing it as causal information (YKY has convinced me that causal information really does encompass derivative-ness information). The general idea is to take the more causally primal rule whenever we know that one is more primal.
  3. Finally, if neither of the first two conditions hold, we do rule averaging. My reason for advocating noisy-or was that it allowed logical reasoning, but the present formalism allows logical reasoning with simple averaging, so long as we take confidence into account: a rule conveys 0 confidence when its conditions don't match, so it does not weaken other rules when it averages with it.
Additional random topics.

The weights for rules can be adjusted on-line by counting instances where the conditions are met, and incrementing the confidence based on the probability that the conditions are indeed satisfied in the instance. This rule could (should) be added to the above query-answering algorithm, but it means we are no longer simply building a bayesian network.

The design right now makes totally ubiquitous independence assumptions; it would be interesting to find a way to dynamically relax these assumptions when we have more processing power. It's very expensive to relax too many, but it might be worthwhile to relax a few (and maybe save the results in a way that avoids re-doing the expensive computations).

No comments:

Post a Comment