YKY has suggested that the strategy will be as follows:
- 1. start with horn clauses, replacing classic conditional with probabilistic. This relatively simple to reason about, even including abduction and lifted reasoning, with a bayes-net construction algorithm that YKY calls "prolog trick".
- Add in negation. If we were dealing with classical conditional, we'd now have full FOL clauses-- so we've got what can be thought of as a probabilistic generalization of FOL clauses. The extra complication to the Bayesian networks is trivial (and doesn't need to fall back on negation-as-failure), but I want to examine some less obvious complications.
- Where the "prolog trick" uses FOL resolution (to minimally instantiate a rule so as to apply to the present case), put in HOL resolution so that we can start using HOL predicates and probabilistically generalizing over HOL variables. (The second one there is exciting.) This also seems simple, except for the bit about having to implement HOL unification. Again, I'll see about complications.
Complications with #2
Completeness for FOL and completeness for horn-clauses are, to an extent, different beasts. Horn-clause reasoning is complete with respect to the goal of establishing "concrete" facts (statements which do not include alternatives). FOL completeness, in contrast, wants to be able to reason about truth values for any statement that the system can express.
It isn't hard to apply FOL resolution to Horn clauses to deduce more rules which are consequences of the existing ones (in fact, it's quite easy). It's also not hard to allow conditional queries to the Bayesian networks which we'll be constructing, which would allow us to deduce new rules from existing information. My worry, though, is that we'll be able to have two rules whose information cannot be integrated by the system.
An example:
A1: P(C|A&B)=1
A2: P(~B|C)=1
We should be able to deduce P(~B|A)=1.
C1: P(A&B&C)=P(A&B) [from A1]
C2: P(B|C)=0 [from A2]
C3: P(B&C)=0 [mult. C2 by P(C)]
C4: P(A&B&C)=P(A|B&C)*P(B&C)=0 [decompose P(A&B&C) and then apply C3]
C5: P(A&B)=0 [from C1, C4]
C6: P(B|A)=0 [divide by P(A), assuming it's not 0]
C7: P(~B|A)=1
Let's call this "rule-level reasoning". YKY has already asserted (private email) that this sort of reasoning is needed for abduction, although I disagree with him there: I think abduction should be handled by the standard Bayes Net belief propagation, rather than by reversing the direction of a rule via probabilistic reasoning. This means that I think, when we do "prolog trick" construction of the baye's net, we don't just look for rules which have consequences which are nodes in our current network; we also look for rules whose premises are in our current network. When we glue those rules onto the network, the consequences are added as uncertain variables, which means we now try to determine their truth. If we find any evidence one way or the other, it gets added to the network, and the network starts doing abductive reasoning automatically via belief propagation.
My impression is that YKY prefers the idea of looking for rule candidates in basically the same way, but before adding the rule to the network, use rule-level reasoning to change its direction so that all reasoning is deduction. I don't see the point of this, but I do agree that rule-level reasoning is needed when we can't paste a rule into the network without creating a directed cycle.
How should rule-level reasoning be implemented? I think the right way of doing it is just to implement the possibility of conditional queries (a standard belief propagation task, which would complicate "prolog trick" network construction just in that we'd be looking for links between the query variables). When we want to perform rule-level reasoning, we just make a query that is in the form of the rule we want to construct. The resulting probability is the conditional probability for the rule! Easy as pie, and we don't have to worry about all the probability algebra stuff I did in the explicit derivation above. (This gets approximated by belief propagation.)
This does not quite tidy up the completeness concerns. Essentially, I want to know that the reasoning is (an approximation of) complete reasoning given the Kolmogorov axioms and the semantics for our probabilistic generalizations, ie, "the probability of a statement with unbound variables is the probability that it will be true under a random instantiation."
Bayes net reasoning (belief prop) is approximately complete for propositional Horn-with-negation networks so long as the rules form no directed cycles. (It is precisely complete if there are no cycles at all. Actually, it's also been proven to be precise for networks with single cycles. The precise conditions for belief prop giving exact answers are unknown.) With rule-level reasoning, it should be complete for rule-sets with (correctable) directed cycles as well. If that were established rigorously (showing for example that this sort of reasoning will not accidentally use a rule twice in different forms and get the probabilities wrong as a result), then what would remain would be precisely the question of whether our reasoning system is complete over our definition of probabilistic generalizations.
An approximate reasoning method that would be "obviously complete" would be to actually take random instantiations and use the frequency of their truth (ie, their average probability) to decide the probability of the generalization. YKY has proposed essentially this technique for the estimation of a generalization's probability. However, we also want to do
lifted estimation: deduce the probability of a predicate as a marginal probability in the lifted network constructed by instantiating rules only enough to apply to the predicate. This lifted estimation is certainly correct under the assumption that there is no other relevant information, but certainly not correct for all cases. At present we have no formalism for combining the lifted reasoning with random sampling or other sources of information.
We need one.
(Tackling this will probably be the topic of my next post.)
Moving on to HOL...
Complications with #3
My reservations here arise essentially from my desire to take the "descent" approach I mentioned in my last post: start with system IG, an untyped HOL, and create a "prolog trick" reasoning system compatible with that. YKY's approach is an "ascent" path, which catches me off-guard as to how we should approach soundness and completeness.
HOL is never classically complete, but it can be complete in various weaker senses. A popular option is to use Henkin semantics rather than classical semantics. IG is known to be sound and complete w.r.t. its FOL subset, which is not quite as good; as far as I know, it's Henkin soundness and completeness is an open question. Another appealing approach would be to show soundness and completeness with respect to an interpretation of ZFC, typed HOL, or another standard system. We could investigate other untyped HOLs to find systems complete in stronger senses if we wanted.
That's all for today. :)