## Thursday, June 24, 2010

### Probabilistic HOL

**Where We Are**
The formalism we will be using for the Horn-clause subset of first-order probabilistic reasoning will be something close to the following. (Propositions may end up being crisp, fuzzy, or both in the initial version. For stating crisp propositions, we have boolean predicates, and boolean truth functions (at least *and*) to make sentences. For stating fuzzy propositions, we have continuous functions as predicates, and fuzzy truth functions (at least *and*) for combining them. The choice between the two will not matter in what follows.)
1. Predicates, or more generally sentences with free variables, can have probabilities associated with them. These represent the probability that they will be true on random arguments (random instantiations of their variables). I call these "unconditional probabilities," since the system will also store conditional probabilities; they are also called "marginal probabilities." We can specify these prior probabilities ahead of time for some predicates, and explicit conditional probabilities elsewhere. Together, these probabilities can be thought of as the probabilistic version of universal quantification that will be employed.
2. Existential quantification could be represented by allowing statements to the effect that the probability of a predicate is nonzero; however, we will be using skolem functions in any case, and it seems (to me) more reasonable to always represent existential quantification with skolem functions. (Eliminating zero from the range does not eliminate places arbitrarily close to zero, so the meaning of such a statement is unclear to me. Perhaps existential statements could be generalized to say "the probability is not below x" for typically-small x, but that introduces interval probabilities.)
3. Conditional probabilities will be employed, which will allow the construction of Bayesian networks (along with the unconditional probabilities, which get used for the initial nodes in the network). Predicates, variables, constants, skolem functions, &, and a conditional put together give the capability to express Horn clauses. Conditional assertions play the role of rules, and unconditional assertions (which can be seen as just assertions with empty conditions) play the role of ground information which the rules act upon. Note, however, that since unconditional assertions can still have variables, lifted inference (inference which reaches general conclusions rather than only ground conclusions) can take place via the same algorithm that would do the ground inference. Lifted inference can be approximated to get the marginal probabilities for predicates, and for partial instantiations of predicates, for which we might otherwise not have marginal probabilities (i.e., when we did ground reasoning later we'd otherwise be forced to instantiate more of the Bayes net than we might want to for efficiency, because some predicate needs more information to generate a probability estimate for it).
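Point 1 above can be sketched concretely: the unconditional ("marginal") probability of an open predicate is the fraction of random instantiations on which it holds. A minimal Monte Carlo estimate, using a toy domain and predicate invented purely for illustration:

```python
import random

# Illustrative toy domain and predicate -- not from the post itself.
domain = list(range(100))

def divisible_by_3(x):
    return x % 3 == 0

def estimate_marginal(pred, domain, samples=10_000, seed=0):
    """Estimate P(pred) as the fraction of random instantiations
    of the predicate's free variable on which it comes out true."""
    rng = random.Random(seed)
    hits = sum(pred(rng.choice(domain)) for _ in range(samples))
    return hits / samples

print(estimate_marginal(divisible_by_3, domain))  # close to 0.34
```

In the real system such marginals would mostly be asserted or inferred rather than sampled, but the sampling view is what "probability on random arguments" means.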
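The skolemization in point 2 is plain term rewriting: the existential variable is replaced by a fresh function symbol applied to the enclosing universal variables. A minimal sketch, representing terms as nested tuples (the predicate and variable names are illustrative):

```python
# Sketch of skolemization for point 2: replace an existentially
# quantified variable with a skolem function of the universals.
# Terms are nested tuples like ('loves', 'x', 'y').

def skolemize(term, existential_var, universal_vars, sk_name='sk1'):
    """Replace every occurrence of existential_var with
    sk_name applied to the universally quantified variables."""
    if term == existential_var:
        return (sk_name,) + tuple(universal_vars)
    if isinstance(term, tuple):
        return tuple(skolemize(t, existential_var, universal_vars, sk_name)
                     for t in term)
    return term  # constants and other variables pass through unchanged

# forall x. exists y. loves(x, y)  becomes  forall x. loves(x, sk1(x))
print(skolemize(('loves', 'x', 'y'), 'y', ['x']))
# ('loves', 'x', ('sk1', 'x'))
```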
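Point 3's ground case can be illustrated end-to-end in a few lines: instantiate a lifted conditional P(B(x) | A(x)) for one constant, and read off the child's marginal by summing over the parent. The numbers, and the assumption that P(B | ~A) = 0, are invented for the example:

```python
# Illustrative sketch of point 3: a lifted rule plus an unconditional
# assertion, ground-instantiated into a two-node Bayes net.
prior_A = 0.3          # unconditional assertion: P(A(x)) on random x
cond_B_given_A = 0.9   # conditional assertion ("rule"): P(B(x) | A(x))

def ground_marginal_B(prior_a, b_given_a, b_given_not_a=0.0):
    """P(B(c)) for a ground instantiation, summing over the parent A(c).
    b_given_not_a = 0 is an assumption made here for simplicity."""
    return prior_a * b_given_a + (1 - prior_a) * b_given_not_a

print(ground_marginal_B(prior_A, cond_B_given_A))  # 0.27
```

Lifted inference would perform this same computation schematically, without first picking a constant.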
**Where We May Go**
We will eventually want to move on to full first-order logic (FOL) and higher-order logic (HOL). These are some speculations on how.

1. In the resulting system, skolem functions and skolem constants are treated as always existing, but not necessarily meaningful: they may not be equal to any entities in the domain that we are actually interested in, i.e., they may have an "undefined" value. (This happens when they exist in an uncertain generalization; if the generalization turns out to be false in some particular case, the skolem-function entities are "stranded" and do not take part in further meaningful reasoning.) This is consistent with the HOL approach we want to take, i.e., resolving paradoxes by accepting that some expressions have an undefined value.
2. For FOL, it is sufficient to generalize the above treatment from Horn clauses to general clauses. If we add negation to the mix, then we've got general FOL clauses on our hands. This is the case even if we give up conditionals, though, which may indicate that we're adding too much power; we want most of the reasoning to stay within probabilistic conditionals so that it can be performed by Bayes nets. Restricting this, we can allow negated predicates but not arbitrary negations of larger compounds. This still corresponds to general clauses in the crisp (p=1) case, and Bayesian networks can easily handle negation of individual variables. However, it becomes unclear whether the same reasoning method is complete for the expanded logic. So, I need to think about this more, as well as the several other ways to generalize away from strict Horn clauses.
3. Right now, I am trying to come up with a higher-order logic without going into higher-order probabilities. (2-level PZ is not what I'm worried about; I'm worried about unbounded levels.) Essentially, what's desired is probabilistic reasoning about HOL. This means that although we may speak of a HOL statement having a probability as a value, it does not have that "value" in the HOL sense. Otherwise, the existence of higher-order functions would practically force us to encode higher-order probabilities. Maybe this is the right way to go in the long run, but for this round I'm trying to come up with a relatively simple system...
4. The "ascent" strategy for designing the HOL is to first decide on the best way to generalize the present formalism to FOL, and then perhaps start considering SOL, TOL, et cetera until the pattern is clear enough to generalize to HOL. The "descent" strategy is to skip even the full FOL step and instead just try to figure out how to apply lifted bayesian network style reasoning to HOL directly. My approach is somewhere in between, trying to think about both in parallel so that I might apply constraints from the top to the bottom, and from the bottom to the top, until both meet.
5. One "descent" constraint is the idea that expressions do not always have truth values as their values. Predicates are just functions that map to truth values, and there are other types of functions. Bayesian networks can certainly deal with this! It might be good to use type reasoning to determine the sets of possible values for the bayes-net variables. (Even untyped HOL does type reasoning, after all.) This is not a difficult one.
Lots more to think about... but I might as well post this now.

1. 0. We need 2 terms to distinguish between [un]conditioned statements (whether they contain a Bayesian link or not) and open/ground statements (whether they contain free variables or not). All 4 combinations are possible: conditional open, conditional ground, unconditioned open, unconditioned ground.

1. IMO there is no need to generalize from Horn to full FOL -- assuming that we take the route "logic on top of uncertainty", then there exists an informal correspondence between Horn formulas and Bayes-net links; and since BNs cover everything probabilistic, "Horn" is all we need. The classical implication operator A -> B = (~A \/ B) is gone for good.

2. We can, and will, have negation. Negation can exist perfectly well with Horn. What is ditched is the relation "A -> B = (~A \/ B)".

So far, the system we have is "FO-BN" (first-order BN), except that it does not have classical-FOL-style implication.

3. If we add higher-order unification to "FO-BN", we can get "HO-BN". I think HO-BN should be MORE than sufficient for bootstrapping AGI.

Notice, in saying "HO unification", we're already assuming type theory. I don't know of any unification algorithm that works for untyped lambda calculus.

I don't like type theory either; if it can be avoided I would avoid it.

4. I'm wondering how HOL Light deals with HO unification -- it doesn't seem to contain Huet's algorithm in the code. This is interesting...

5. Re Curry's paradox: I'm inclined to just ignore it in the prototype. I'm guessing it will not lead to EFSQ (ex falso sequitur quodlibet) unless we introduce self-referential statements into the query or KB. And if the user does that, I think it's OK for Genifer to break down -- kind of a radical position, I know. But an advantage is that we can avoid type theory.

7. Re the use of HO functions to represent predicates. The semantics have changed a bit with the introduction of BNs: a predicate does not merely return a TV. A statement "A -> B" gets translated to a BN link, *outside* of the logic. In other words, this translation occurs on a layer below the logic and hence is invisible to the logic.

We may need a way to tell whether a lambda expression is a "proposition", during the generation of BNs. But it can be a simple "sort" as in sorted logic, while a full-blown type theory can be avoided.
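The "simple sort" idea above can be as lightweight as classifying a term by its head symbol — proposition-sorted if the head is a known predicate, non-proposition otherwise — with no full type theory. A minimal sketch; the symbol tables are invented for the example:

```python
# Two-sorts check during BN generation: is this term a proposition?
# Classification is by head symbol only -- no type theory involved.
PREDICATES = {'male', 'loves'}        # heads that yield propositions
FUNCTIONS  = {'sine', 'mother_of'}    # heads that yield individuals

def is_proposition(term):
    """True iff the term's head is a known predicate symbol.
    Terms are tuples like ('male', 'john'), or bare symbols."""
    head = term[0] if isinstance(term, tuple) else term
    return head in PREDICATES

print(is_proposition(('male', 'john')))   # True  -> gets a BN node
print(is_proposition(('sine', 3.14159)))  # False -> not a BN node
```

Of course, this only works once the head symbol is known; partially instantiated higher-order terms are exactly where the simple check runs out.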

2. YKY,

0. Sure.

1. Interesting point! By the way, that reminds me. If we've got A->B = .5 and C->B = .75, what combined BN do we construct? What combination rule do you prefer?

2. Well, strictly speaking, Horn rejects negation; but yea, there does not seem to be a problem with adding it to our formalism... however, I'm not *totally* confident of the properties of that system. Perhaps my next post will be an examination of that concern.

3. Let's call them "FOB" and "HOB" if you're ok with that. :) I see what you're saying... you'd just use HOB as a way of specifying more abstract rules, which would need more sophisticated instantiations to apply. Certainly seems ok. However, again there is the concern with respect to completeness. Essentially, it completely rejects the constraints from my "descent" strategy, which means I have trouble comparing it to existing HOL systems to determine logical properties. I want to be able to show that the system emulates some existing HOL. Specifically, IG, which Luke Palmer pointed us to. (IG is IXI's more-powerful cousin.)

The way IG, and most untyped HOL, work is that they reason *about* types. The logic is untyped, but in order to perform certain reasoning steps, you've got to prove that the statement you're working with *does* have a defined truth value, which mostly means proving that it's well-typed. (The "mostly" is a Very Good Thing, in both directions: we can use existing type-checking algorithms to automate this, but we're not ever restricted to them.) My suggestion with using the BN to reason about function values rather than just truth values would take advantage of this.

4. Hmm... perhaps it's up to a tactic to implement that sort of thing?

5. I think the right way to go is system IG. However, your idea might be a workable initial version... perhaps the answer to my (Henkin-)completeness concerns at that stage will be just "We don't want to be complete just yet, because if we were complete we'd be inconsistent!"

7. Well, it'd be possible to just use "sorts," but at that point you're crippling lambda calculus almost beyond recognition: you can't use any higher-order functions without needing to introduce type reasoning. Better to just not verify at all initially... let the User Beware, because wrong rules will lead to wrong results.

3. 1. We can use a number of ways to create the CPT (conditional probability table). Also, each variable can carry more than one parameter. I think this is a relatively minor detail... your suggestion at that time was OK.

2. Maybe in binary logic, negation does not live well with Horn. But we're just using Horn-like ("Horny"?) rules.

I am quite confident that this formulation is sound -- we're just basing everything on [propositional] BNs. This approach has been used in several probabilistic logics developed by academic researchers.

3. OK, I can see your concern. But remember, we're using the "logic on top of uncertainty" approach (I've made another blog post to justify it). In this approach, you cannot expect HOL to directly evaluate to truth values. HOL will evaluate to BNs, and the BNs provide the semantics. I think Palmer's IG will still be of value, as it deals with Curry's paradox, which is an issue orthogonal to the issue of semantics.

7. In *untyped* lambda calculus, do we still need types in order to use "higher-order functions"? I thought we could use arbitrary-order functions in untyped =)

4. 7. Hmm... Perhaps I was interpreting you as wanting to apply sorts more widely than would be necessary... actually, why do we need them at all? If something is not a "proposition" during BN construction, the user made a mistake... the current plan is to just let paradox unfold until we can handle it better, right?

Barring that, I do not really know your intention behind the term "sorts"... so long as all the variables are not totally instantiated yet, it is possible that complicated type reasoning will be needed to verify that something will evaluate to a truth value. Right? On the other hand, we can just go untyped and allow anything, but verify nothing (until sufficient instantiation of course).

5. > the current plan is to just let paradox unfold until we can handle
> it better, right?

Yes.

What I mean by "sorts" is that in untyped lambda calculus, all terms are just terms and have no type. So, a term can be either a proposition (eg male(john)) or not a proposition (eg sine(3.14159)). So, I can say the first term is of the proposition sort while the second term is of the non-proposition sort.

I don't know how these 2 sorts can be determined, though.

In my mind, I'm guessing the Curry Paradox can be resolved by meta-reasoning, which is akin to type inference anyway.

One more point: The formulas in our FO/HO KB do *not* evaluate to truth values. They are used to construct BNs, and then the BNs provide the semantics (ie, truth values). Maybe this is a misunderstanding that blocks you from grasping this approach...?