

Thursday, August 5, 2010

My Current HOL Algorithm

The following is equivalent in strength to HOL Light. Continues this post.

  • As an extension of the notion of term (defined here), add the notation of the polymorphically typed lambda calculus, an equality predicate, and the choice operator (I'll call it epsilon-- it's a function that takes a predicate and forms a term representing an arbitrary satisfier of that predicate, if one exists; otherwise, it just represents an arbitrary term). A minimal sketch of such terms appears after this list.
  • Take terms with boolean type as a drop-in replacement for atomic statements in that design. (It may seem strange to use them to replace atomic statements, since they are compound statements, but bear with me. Rules will correspond to valid sequents, rather than to compound statements in the logic.)
  • Use lambda-calculus unification as a drop-in replacement for first-order unification. This includes normalization; we could either store the normal forms of statements in the KB or let the unification algorithm normalize on the spot, depending on whether space efficiency of the KB or time efficiency of the unifier is a bigger concern.
  • Implement equational reasoning. Fundamentally, we treat an equality statement which has a given truth value, T, as if it represented two rules with that truth value: one justifying inferences in each direction, substituting one term for another. These rules cannot be stated as FOL rules, however, since they represent substitution of terms within larger statements. Nonetheless, the same basic propagation scheme for the truth values will work. In addition to the 2 pseudo-rules for each equality statement, we treat the system as knowing A=A for all terms A. Knuth-Bendix completion should be implemented eventually, which allows us to ignore one of the two inference directions for each equality statement without losing anything. Ultimately, I believe this means equational reasoning can be implemented as an extension to the normalization step of the unification algorithm (but I could be wrong here).
  • Finally, we also need one hypothetical reasoning rule: if assuming A implies B with truth value TV1, and assuming B implies A with truth value TV2, then A=B with truth value (& TV1 TV2) for an appropriate &-formula. Rather than implementing this as one complicated hypothetical reasoning step, it seems to me that the most convenient way would be to implement rule-level reasoning in general (ie, a query can be a rule, rules can be combined with other rules to produce new rules, etc) and then add this as a way of recording high-probability mutual implication.
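To make the first bullet more concrete, here is a minimal sketch of such terms in Python (purely illustrative -- the names Var, Const, App, Lam, EQ, EPSILON and the type encoding are my own, not anything already implemented). A statement is just a term of boolean type:

```python
from dataclasses import dataclass
from typing import Union

# --- Types: type variables give polymorphism; Bool is the type of statements.
@dataclass(frozen=True)
class TyVar:
    name: str                      # e.g. 'a' in the polymorphic type a -> Bool

@dataclass(frozen=True)
class TyCon:
    name: str                      # e.g. 'Bool', 'Ind'
    args: tuple = ()               # e.g. TyCon('fun', (a, BOOL)) for a -> Bool

Type = Union[TyVar, TyCon]
BOOL = TyCon('Bool')

def fun(dom: Type, cod: Type) -> Type:
    return TyCon('fun', (dom, cod))

# --- Terms: ordinary lambda terms plus two logical constants.
@dataclass(frozen=True)
class Var:
    name: str
    ty: Type

@dataclass(frozen=True)
class Const:
    name: str
    ty: Type

@dataclass(frozen=True)
class App:
    func: 'Term'
    arg: 'Term'

@dataclass(frozen=True)
class Lam:
    var: Var
    body: 'Term'

Term = Union[Var, Const, App, Lam]

a = TyVar('a')
# Equality has type a -> a -> Bool; epsilon picks a satisfier: (a -> Bool) -> a.
EQ = Const('=', fun(a, fun(a, BOOL)))
EPSILON = Const('epsilon', fun(fun(a, BOOL), a))

# A "statement" is simply a term of type Bool, e.g. (= x x):
x = Var('x', a)
refl = App(App(EQ, x), x)
print(refl)
```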
That's all, folks!

PS:

In addition to the above reasoning system, for HOL Light, we also need the axiom of choice and the axiom of infinity. (The axiom of extensionality is covered by normalization.) Both of these axioms are very much "mathematical" in nature, and it isn't clear whether they are needed for more common-sense reasoning. (My suspicion is that they would emerge in other forms, ie, in coding the KB we'd encode something like these plus much more.)

Sunday, July 11, 2010

Eliminating my HOL "Meta-Rules"

HOL systems, as I've mentioned before, are not usually specified just as inference rules on statements. Instead, they are specified indirectly, as inference rules on sequents. A sequent is a set of premises and a conclusion, representing a valid argument in the logic. (This is represented symbolically in the form A, B, C,... |- D where A, B, C,... is the set of premises and D is the conclusion. The |- symbol is called the turnstile.) So, specifying a system via a sequent calculus is a slightly curious practice: rather than just specifying inference rules directly, we specify a system of reasoning which comes up with valid inference rules. Historically, this move to the meta-level is what led to the field we call "proof theory".

My previous approach was to preserve this multi-level structure, calling the sequents "rules" and the sequent inference rules "meta-rules" (since they are rules for creating rules). However, it would be more efficient to eliminate the extra level if possible.

In general, this may be a complicated process. Some sequent derivation rules have special syntactical restrictions, such as "... X and Y share no free variables ...". Also consider sequent rules like "If A,B |- C, then A |- (B->C)". It isn't obvious that there is a simple way to turn a specification of a system in terms of sequent derivation rules into a direct specification consisting of inference rules on statements.

Typical systems, though, will obey a few nice properties. First, adding more premises to a valid sequent will always result in a valid sequent. Furthermore, many sequent derivation rules just propagate along the premises from the sequents they start with to the sequents they produce. For sequents which do that, the job is easy. Consider an example:

"From A, B, C... |- P and D, E, F... |- Q conclude A, B, C... D, E, F... |- (P&Q)"


In this case, we'd just translate the whole thing to the rule "From P and Q conclude (P&Q)". Rules of this sort can be eliminated from the meta-level immediately, which is good (every little bit helps).
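As a toy illustration of that collapse (a sketch only; the Sequent and Rule representations here are hypothetical, not taken from any implementation), the conjunction rule above merges premise lists without ever inspecting them, so it can be translated mechanically by just dropping those lists:

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Sequent:
    premises: FrozenSet[str]       # A, B, C, ... (statements, here just strings)
    conclusion: str                # D  in  A, B, C, ... |- D

@dataclass(frozen=True)
class Rule:
    antecedents: tuple             # statements the rule consumes
    consequent: str                # statement the rule produces

def and_intro_sequent(s1: Sequent, s2: Sequent) -> Sequent:
    """Meta-rule: from G1 |- P and G2 |- Q conclude G1, G2 |- (P & Q)."""
    return Sequent(s1.premises | s2.premises,
                   f"({s1.conclusion} & {s2.conclusion})")

def collapse_and_intro() -> Rule:
    """Since the premises are only propagated, never touched, the meta-rule
    collapses to an ordinary object-level rule: from P and Q conclude (P & Q)."""
    return Rule(("P", "Q"), "(P & Q)")

s1 = Sequent(frozenset({"A", "B", "C"}), "P")
s2 = Sequent(frozenset({"D", "E", "F"}), "Q")
print(and_intro_sequent(s1, s2))
print(collapse_and_intro())
```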

Unfortunately, not all cases are so easy. Many inference rules will almost pass along the premises untouched, but make some little modifications to the list. These modifications will usually be to remove specific premises, occasionally to add premises, and quite rarely to modify the internals of premises. The example above, "If A,B |- C, then A |- (B->C)", is one example of this: one premise is eliminated in the process of drawing the conclusion. Getting rid of meta-rules of this sort requires cleverness.

First-order natural deduction systems handle that particular example in a nice way: at any time during reasoning, we may make a new assumption. This can be anything. We keep the list of assumptions that have been made. Now, say we just concluded something, S. One deduction move we can now make is to remove an assumption from the list (call it A), and stick it on the beginning of S with an arrow: A->S is our new conclusion, which is true in the new context (ie, the old context minus one assumption).

This general idea could be extended to any rules which add or eliminate premises: they are seen as just modifying the list of assumptions.
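A small sketch of that bookkeeping (hypothetical names; just one way to picture the assumption list and the discharge step, not actual Genifer code):

```python
class NaturalDeductionContext:
    """Tracks open assumptions; discharging one turns a conclusion S
    into the implication A -> S, valid in the context without A."""

    def __init__(self):
        self.assumptions = []      # currently open assumptions
        self.conclusions = []      # statements derived so far

    def assume(self, a: str) -> None:
        self.assumptions.append(a)

    def conclude(self, s: str) -> None:
        # In a real system this would be justified by some inference rule.
        self.conclusions.append(s)

    def discharge(self, a: str, s: str) -> str:
        """Remove assumption a and wrap the conclusion s as (a -> s)."""
        self.assumptions.remove(a)
        implication = f"({a} -> {s})"
        self.conclusions.append(implication)
        return implication

ctx = NaturalDeductionContext()
ctx.assume("B")
ctx.conclude("C")                  # pretend C followed from A and B
print(ctx.discharge("B", "C"))     # (B -> C), mirroring  A, B |- C  =>  A |- (B -> C)
```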

A practical implementation of this sort of idea for Genifer might be to engage in hypothetical reasoning via conditional queries: we make a query with a list of assumptions attached, to be used in addition to the actual facts. However, some issues may come up with respect to the difference between the classical conditional and the probabilistic conditional. I need to consider this more. (The approach is similar to my "rule-level reasoning", but rather than reasoning about rules using conditional queries, we reason about statements...)

This leaves rules which modify the sequents internally. The primary example of this is variable instantiation, but that's handled by unification! So, in summary, it appears to me that my "meta-level" is not needed in Genifer and can be eliminated with a little patience...

All of the above discussion was done with a translation of HOL Light in mind, but could apply to IG or an IG variant equally well.

Monday, July 5, 2010

HOL update

YKY doesn't want to focus on HOL at the moment, but he suggested I write down my thoughts anyway. I was going to put it off since I'd really want to do a more detailed write-up taking care of all the loose ends, but since there is some interest from Russel, I'm outlining my approach now.

The approach is to translate system IG from the paper Illative Combinatory Logic. This is an untyped lambda-calculus logic which soundly & completely interprets first-order logic both "directly" (ie, as truth functions) and via the Curry-Howard isomorphism (which indicates the strength of the type reasoning the system can interpret).

The way I'm planning on translating it is to treat a statement and its negation as essentially separate statements in the probability distribution; ie, the Liar sentence can have probability 0, but its negation can simultaneously have probability 0. Only when we know a statement is a proposition can we derive that P(prop) = 1 - P(~prop).
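One way to picture this bookkeeping (a sketch with made-up names, not the actual design): store the probability of a statement and of its negation as separate entries, and only link them via the complement law once the statement is known to be a proposition.

```python
class TruthStore:
    """Probabilities of statements and their negations are kept separately;
    P(s) = 1 - P(~s) is only used once s is known to be a proposition."""

    def __init__(self):
        self.prob = {}             # statement -> probability
        self.propositions = set()  # statements known to denote truth values

    def set_prob(self, stmt: str, p: float) -> None:
        self.prob[stmt] = p

    def mark_proposition(self, stmt: str) -> None:
        self.propositions.add(stmt)

    def get_prob(self, stmt: str):
        if stmt in self.prob:
            return self.prob[stmt]
        # Derive from the negation only when the complement law applies.
        neg = "~" + stmt
        if stmt in self.propositions and neg in self.prob:
            return 1.0 - self.prob[neg]
        return None

store = TruthStore()
store.set_prob("Liar", 0.0)
store.set_prob("~Liar", 0.0)       # both can be 0: the Liar is not a proposition
print(store.get_prob("Liar"), store.get_prob("~Liar"))   # 0.0 0.0
```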

After this, what we do is just encode the valid sequents as rules in our existing logic. However, IG has an infinite number of valid sequents, specified with some sequent derivation rules. This means that we've got an infinite number of rules now-- we need an efficient mechanism for constructing valid rules as they are needed. The sequent derivation rules get turned into "meta-rules" in our logic. This is a bit complicated, but better than the alternatives (see my previous post).

One complication is that sequent-based rules need to be combined with something like the noisy-or combination rule [I need to add a reference...]. At present we're planning on dealing with this by having noisy-or be the default combination rule in the whole system; there are some cases where it is a terrible rule, though. YKY thinks we can determine which cases to apply it to using causal reasoning.
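For reference, noisy-or treats each rule that fires as an independent cause of the same conclusion, which fails only if every cause fails: P = 1 - Π(1 - p_i) over the active causes. A small sketch of that standard formula:

```python
def noisy_or(cause_probs, active):
    """Noisy-or: each active cause i independently produces the effect with
    probability cause_probs[i]; the effect fails only if every one fails."""
    p_all_fail = 1.0
    for p, on in zip(cause_probs, active):
        if on:
            p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# Two rules both supporting the same conclusion, with strengths 0.7 and 0.5:
print(noisy_or([0.7, 0.5], [True, True]))   # 0.85
print(noisy_or([0.7, 0.5], [True, False]))  # 0.7
```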

For efficient reasoning, we'd like to normalize terms before reasoning so that we can quickly recognize equality. However, in untyped HOL, some terms have nonterminating normalizations. There are a few things we can do. One is to just have a time cutoff on normalization attempts. Another is to try and do some type reasoning before normalization (so we'd only normalize when we could first prove termination). A third option that might have some interesting properties is what I'm calling "compressive normalization"... rather than the usual normalization, use the shortest form of a term as the normal form. This is non-terminating and we'd still need a time cutoff, but unlike the usual normal form, it exists even for terms which aren't propositions-- it just always exists. It might be useful to keep short forms of terms around for compression purposes.
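A minimal sketch of the cutoff idea (untyped lambda terms as nested tuples; all names are mine): beta-reduce with a step budget and give up, rather than loop forever, on terms like (λx. x x)(λx. x x).

```python
import itertools

fresh = (f"_v{i}" for i in itertools.count())

def subst(term, name, value):
    """Capture-avoiding substitution of `value` for the free variable `name`."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'app':
        return ('app', subst(term[1], name, value), subst(term[2], name, value))
    # kind == 'lam'
    bound, body = term[1], term[2]
    if bound == name:
        return term                          # `name` is shadowed; nothing to do
    new = next(fresh)                        # rename the bound variable to avoid capture
    body = subst(body, bound, ('var', new))
    return ('lam', new, subst(body, name, value))

def step(term):
    """One leftmost-outermost beta step, or None if `term` is already normal."""
    kind = term[0]
    if kind == 'app':
        f, a = term[1], term[2]
        if f[0] == 'lam':
            return subst(f[2], f[1], a)      # the beta reduction itself
        for i, sub in ((1, f), (2, a)):
            r = step(sub)
            if r is not None:
                return term[:i] + (r,) + term[i + 1:]
    elif kind == 'lam':
        r = step(term[2])
        if r is not None:
            return ('lam', term[1], r)
    return None

def normalize(term, max_steps=100):
    """Normal form of `term`, or None if the step budget is exhausted."""
    for _ in range(max_steps):
        nxt = step(term)
        if nxt is None:
            return term
        term = nxt
    return None                              # cutoff: possibly non-terminating

identity = ('lam', 'x', ('var', 'x'))
omega = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
print(normalize(('app', identity, ('var', 'y'))))   # ('var', 'y')
print(normalize(('app', omega, omega)))              # None -- cutoff hit
```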

Speaking of compression: integrating both HOL and likelihood reasoning into Gen means that it might be feasible to specify a prior probability distribution over models within Gen itself, meaning induction just becomes probabilistic deduction. However, I have not worked out the details here.

That's all for today! :)

Thursday, July 1, 2010

HOL, Paradox Free: a plan of attack, or two.

Looking at the Illative Combinatory Logic paper today, and thinking about paradox-free HOL.

There are plenty of paradox-free HOLs out there. The question is, how do we translate them into our system? Most HOLs seem to be described in sequent calculus. This means the system has an extra meta-level that ours lacks: it's not really specified as a system for reasoning about HOL statements; rather, it's specified as a system for reasoning about HOL proofs. This means there are at least 3 levels at which the translation can be made: (H) we could try to use the sequent derivation rules as rules in our system, (M) we could use the sequents themselves as rules in our system, and (L) we could use the conditional statements of the HOL as our rules. (I'm labeling these as High, Medium, and Low, in case you're wondering what the letters stand for.)

H might be easiest in that it would only require a finite number of rules to be added to the system. However, it would mean making room for sequents as a type of statement in the system, which could be awkward. We would still want to be able to make basic statements, ie, statements about actual things rather than just about what follows from what... so we would need to also add rules for using sequents to derive base facts from one another. On top of all this, we need to worry about how the probabilistic reasoning is done. It seems that the sequent reasoning would end up being crisp, which means the probabilistic reasoning would only happen when we applied sequents to uncertain base-statements. Again, this seems awkward.

M would require an infinite number of rules, meaning that we'd have to implement rule-level reasoning to derive new rules as they are needed. I discussed rule-level reasoning before; the additional sequent-derivation rules would have to be integrated with my earlier thoughts. For the most part, I think this would mean restricting my earlier suggestion to make sure that it's consistent for the same reasons the sequent rules are consistent. Figuring out the maximum level of rule-level reasoning I can preserve may be difficult. Figuring out the minimum level that's sufficient probably won't be, though! What would be somewhat difficult would be generalizing the backwards-chaining reasoner to create new rules for itself to backwards-chain on (in a sorta meta-backwards-chaining way).

L would require a doubly infinite number of rules (so to speak). The approach would again require rule-level reasoning, but this time rather than implementing a finite number of meta-rules (based on the rules for deriving new sequents) we'd appear to need an infinite number, specified by a finite number of meta-meta rules. This is obviously too much; so, we'd have to pull back and instead figure out a system which satisfied the constraints laid down by the sequent rules. In other words, we treat the sequent-calculus description of the logic as a specification of target features rather than a description of the deduction rules of a system. This is obviously a more difficult approach... an interesting challenge, but more difficult.

Also worth a mention: a tempting, but probably incorrect direction: merge the levels so that we don't have to deal with so many. We'd start allowing conditionals to compose with other language elements, like in regular FOL, so that we can have stuff like A->(B->C). We'd then try to use regular reasoning to implement all the different levels at once. This sounds powerful, but it would probably just create an inconsistent system, defeating the point.




Conclusion

Approach M seems to be the best plan, so I will get to work on that!

Saturday, June 26, 2010

Careful Ascent

YKY has suggested that the strategy will be as follows:

  1. Start with Horn clauses, replacing the classical conditional with a probabilistic one. This is relatively simple to reason about, even including abduction and lifted reasoning, with a bayes-net construction algorithm that YKY calls the "prolog trick" (see the sketch after this list).
  2. Add in negation. If we were dealing with classical conditional, we'd now have full FOL clauses-- so we've got what can be thought of as a probabilistic generalization of FOL clauses. The extra complication to the Bayesian networks is trivial (and doesn't need to fall back on negation-as-failure), but I want to examine some less obvious complications.
  3. Where the "prolog trick" uses FOL resolution (to minimally instantiate a rule so as to apply to the present case), put in HOL resolution so that we can start using HOL predicates and probabilistically generalizing over HOL variables. (The second one there is exciting.) This also seems simple, except for the bit about having to implement HOL unification. Again, I'll see about complications.
Complications with #2

Completeness for FOL and completeness for horn-clauses are, to an extent, different beasts. Horn-clause reasoning is complete with respect to the goal of establishing "concrete" facts (statements which do not include alternatives). FOL completeness, in contrast, wants to be able to reason about truth values for any statement that the system can express.

It isn't hard to apply FOL resolution to Horn clauses to deduce more rules which are consequences of the existing ones (in fact, it's quite easy). It's also not hard to allow conditional queries to the Bayesian networks which we'll be constructing, which would allow us to deduce new rules from existing information.  My worry, though, is that we'll be able to have two rules whose information cannot be integrated by the system.

An example:

A1: P(C|A&B)=1
A2: P(~B|C)=1

We should be able to deduce P(~B|A)=1.

C1: P(A&B&C)=P(A&B) [from A1]
C2: P(B|C)=0 [from A2]
C3: P(B&C)=0 [mult. C2 by P(C)]
C4: P(A&B&C)=P(A|B&C)*P(B&C)=0 [decompose P(A&B&C) and then apply C3]
C5: P(A&B)=0 [from C1, C4]
C6: P(B|A)=0 [divide by P(A), assuming it's not 0]
C7: P(~B|A)=1 [complement of C6]
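A quick numerical sanity check of this derivation (a sketch: it samples random joint distributions over A, B, C that satisfy A1 and A2 by construction, and confirms that P(~B|A)=1 whenever P(A)>0):

```python
import itertools, random

def random_constrained_joint():
    """Random joint over (A,B,C) with the A1/A2 constraints built in:
    A1: P(C|A&B)=1  =>  P(A & B & ~C) = 0
    A2: P(~B|C)=1   =>  P(B & C) = 0  (for either value of A)
    """
    joint = {}
    for a, b, c in itertools.product([True, False], repeat=3):
        forbidden = (a and b and not c) or (b and c)
        joint[(a, b, c)] = 0.0 if forbidden else random.random()
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

def prob(joint, pred):
    return sum(p for world, p in joint.items() if pred(*world))

for _ in range(5):
    j = random_constrained_joint()
    p_a = prob(j, lambda a, b, c: a)
    p_nb_a = prob(j, lambda a, b, c: a and not b)
    if p_a > 0:
        print(round(p_nb_a / p_a, 6))   # always 1.0, matching C7
```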

Let's call this "rule-level reasoning". YKY has already asserted (private email) that this sort of reasoning is needed for abduction, although I disagree with him there: I think abduction should be handled by the standard Bayes net belief propagation, rather than by reversing the direction of a rule via probabilistic reasoning. This means that I think, when we do "prolog trick" construction of the Bayes net, we don't just look for rules which have consequences which are nodes in our current network; we also look for rules whose premises are in our current network. When we glue those rules onto the network, the consequences are added as uncertain variables, which means we now try to determine their truth. If we find any evidence one way or the other, it gets added to the network, and the network starts doing abductive reasoning automatically via belief propagation.

My impression is that YKY prefers the idea of looking for rule candidates in basically the same way, but before adding the rule to the network, use rule-level reasoning to change its direction so that all reasoning is deduction. I don't see the point of this, but I do agree that rule-level reasoning is needed when we can't paste a rule into the network without creating a directed cycle.

How should rule-level reasoning be implemented? I think the right way of doing it is just to implement the possibility of conditional queries (a standard belief propagation task, which would complicate "prolog trick" network construction just in that we'd be looking for links between the query variables). When we want to perform rule-level reasoning, we just make a query that is in the form of the rule we want to construct. The resulting probability is the conditional probability for the rule! Easy as pie, and we don't have to worry about all the probability algebra stuff I did in the explicit derivation above. (This gets approximated by belief propagation.)

This does not quite tidy up the completeness concerns. Essentially, I want to know that the reasoning is (an approximation of) complete reasoning given the Kolmogorov axioms and the semantics for our probabilistic generalizations, ie, "the probability of a statement with unbound variables is the probability that it will be true under a random instantiation."

Bayes net reasoning (belief prop) is approximately complete for propositional Horn-with-negation networks so long as the rules form no directed cycles. (It is precisely complete if there are no cycles at all. Actually, it's also been proven to be precise for networks with single cycles. The precise conditions for belief prop giving exact answers are unknown.) With rule-level reasoning, it should be complete for rule-sets with (correctable) directed cycles as well. If that were established rigorously (showing for example that this sort of reasoning will not accidentally use a rule twice in different forms and get the probabilities wrong as a result), then what would remain would be precisely the question of whether our reasoning system is complete over our definition of probabilistic generalizations.

An approximate reasoning method that would be "obviously complete" would be to actually take random instantiations and use the frequency of their truth (ie, their average probability) to decide the probability of the generalization. YKY has proposed essentially this technique for the estimation of a generalization's probability. However, we also want to do lifted estimation: deduce the probability of a predicate as a marginal probability in the lifted network constructed by instantiating rules only enough to apply to the predicate. This lifted estimation is certainly correct under the assumption that there is no other relevant information, but certainly not correct for all cases. At present we have no formalism for combining the lifted reasoning with random sampling or other sources of information.

We need one.

(Tackling this will probably be the topic of my next post.)
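For what it's worth, the "obviously complete" sampling baseline from a few paragraphs back can be pictured like this (the toy domain and predicates are entirely my own): estimate the probability of a generalization by averaging its truth over random instantiations.

```python
import random

DOMAIN = list(range(100))

def bird(x):  return x % 2 == 0                 # toy predicates standing in for a KB
def flies(x): return x % 2 == 0 and x % 10 != 0

def prob_of_generalization(statement, samples=10_000):
    """P(statement) ~ frequency of truth under random instantiation
    of its free variable over the domain."""
    hits = sum(statement(random.choice(DOMAIN)) for _ in range(samples))
    return hits / samples

def conditional(consequent, antecedent, samples=10_000):
    """P(consequent(X) | antecedent(X)), estimated by conditional sampling."""
    both = ante = 0
    for _ in range(samples):
        x = random.choice(DOMAIN)
        if antecedent(x):
            ante += 1
            both += consequent(x)
    return both / ante if ante else None

print(prob_of_generalization(bird))   # ~0.5
print(conditional(flies, bird))       # ~0.8
```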

Moving on to HOL...

Complications with #3

My reservations here arise essentially from my desire to take the "descent" approach I mentioned in my last post: start with system IG, an untyped HOL, and create a "prolog trick" reasoning system compatible with that. YKY's approach is an "ascent" path, which catches me off-guard as to how we should approach soundness and completeness.

HOL is never classically complete, but it can be complete in various weaker senses. A popular option is to use Henkin semantics rather than classical semantics. IG is known to be sound and complete w.r.t. its FOL subset, which is not quite as good; as far as I know, its Henkin soundness and completeness is an open question. Another appealing approach would be to show soundness and completeness with respect to an interpretation of ZFC, typed HOL, or another standard system. We could investigate other untyped HOLs to find systems complete in stronger senses if we wanted.

That's all for today. :)

Thursday, June 24, 2010

Probabilistic HOL

Where We Are
The formalism we will be using for the horn-clause subset of first-order probabilistic reasoning will be something close to the following. (Propositions may end up being either crisp or fuzzy or both in the initial version. For stating crisp propositions, we have boolean-valued predicates, and boolean truth functions (at least "and") to make sentences. For stating fuzzy propositions, we have continuous functions as predicates, and fuzzy truth functions (at least "and") for combining them. The choice between the two will not matter in what follows.)
  1. Predicates, or more generally sentences with free variables, can have probabilities associated with them. These represent the probability that they will be true on random arguments (random instantiations of their variables). I call these "unconditional probabilities" since the system will also store conditional probabilities. They are also called "marginal probabilities." We can specify this prior probability ahead of time for some predicates, and explicit conditional probabilities elsewhere. Together, these probabilities can be thought of as the probabilistic version of universal quantification that will be employed.
  2. Existential quantification could be represented by allowing statements to the effect that the probability of a predicate is nonzero; however, we will be using skolem functions in any case, and it seems (to me) more reasonable to always represent existential quantification with skolem functions. (Eliminating zero from the range does not eliminate places arbitrarily close to zero, so the meaning of such a statement is unclear to me. Perhaps existential statements could be generalized to say "the probability is not below x" for typically-small x, but that introduces interval probabilities.)
  3. Conditional probabilities will be employed, which will allow the construction of Bayesian networks (along with the unconditional probabilities, which get used for the initial nodes in the network). Predicates, variables, constants, skolem functions, &, and a conditional put together give the capability to express Horn clauses. Conditional assertions play the role of rules, and unconditional assertions (which can be seen as just assertions with empty conditions) play the role of ground information which the rules act upon; note, however, that since unconditional assertions can still have variables, lifted inference (inference which reaches general conclusions rather than only ground conclusions) can take place via the same algorithm that would do the ground inference. Lifted inference can be approximated to get the marginal probabilities for predicates and partial instantiations of predicates which we might otherwise not have marginal probabilities for (ie, when we do ground reasoning later, we would otherwise be forced to instantiate more of the bayes net than we would want to for efficiency, because some predicate needs more information to generate a probability estimate for it). A minimal sketch of this representation appears after this list.
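As a purely illustrative picture of items 1 and 3 (all names here are made up), a knowledge base in this formalism is just a collection of conditional-probability assertions over possibly-open atoms, with the unconditional (marginal) assertions as the degenerate case of an empty condition:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:
    predicate: str
    args: Tuple[str, ...]          # constants, or capitalized names for variables

@dataclass(frozen=True)
class ProbRule:
    head: Atom
    body: Tuple[Atom, ...]         # empty body = unconditional (marginal) assertion
    prob: float                    # P(head | body), or P(head) if body is empty

KB = [
    # marginal: P(smokes(X)) = 0.3, i.e. probability over a random instantiation of X
    ProbRule(Atom('smokes', ('X',)), (), 0.3),
    # conditional (Horn-clause-like rule): P(cancer(X) | smokes(X)) = 0.1
    ProbRule(Atom('cancer', ('X',)), (Atom('smokes', ('X',)),), 0.1),
    # ground fact: P(smokes(john)) = 1.0
    ProbRule(Atom('smokes', ('john',)), (), 1.0),
]

def show(a: Atom) -> str:
    return f"{a.predicate}({', '.join(a.args)})"

for r in KB:
    cond = ' & '.join(show(a) for a in r.body) or 'true'
    print(f"P({show(r.head)} | {cond}) = {r.prob}")
```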
Where We May Go
We will eventually want to move on to full first-order logic (FOL) and higher-order logic (HOL). These are some speculations on how.

  1. In the resulting system, skolem functions and skolem constants are treated as always existing, but not necessarily meaningful: they may not be equal to any entities in the domain that we are actually interested in, ie, they may have an "undefined" value. (This happens when they exist in an uncertain generalization; if the generalization turns out to be false in some particular case, the skolem-function entities are "stranded" and do not take part in further meaningful reasoning.) This is consistent with the HOL approach we want to take, ie, resolving paradoxes by accepting that some expressions have an undefined value.
  2. For FOL, it is sufficient to generalize the above treatment from Horn clauses to general clauses.  If we add negation to the mix, then we've got general FOL clauses on our hands. This is the case even if we give up conditionals, though, which may indicate that we're adding too much power; we want most of the reasoning to stay within probabilistic conditionals so that it can be performed by Bayes nets. Restricting this, we can allow negated predicates but not arbitrary negations of larger compounds. This still corresponds to general clauses in the crisp (p=1) case, and Bayesian networks can easily handle negation of individual variables. However, it becomes unclear that the same reasoning method is complete for the expanded logic. So, I need to think about this more, as well as the several other ways to generalize away from strict Horn clauses.
  3. Right now, I am trying to come up with a higher-order logic without going into higher-order probabilities. (2-level PZ is not what I'm worried about; I'm worried about unbounded levels.) Essentially, what's desired is probabilistic reasoning about HOL. This means that although we may speak of a HOL statement having a probability as a value, it does not have that "value" in the HOL sense. Otherwise, the existence of higher-order functions would practically force us to encode higher-order probabilities. Maybe this is the right way to go in the long run, but for this round I'm trying to come up with a relatively simple system...
  4. The "ascent" strategy for designing the HOL is to first decide on the best way to generalize the present formalism to FOL, and then perhaps start considering SOL, TOL, et cetera until the pattern is clear enough to generalize to HOL. The "descent" strategy is to skip even the full FOL step and instead just try to figure out how to apply lifted bayesian network style reasoning to HOL directly. My approach is somewhere in between, trying to think about both in parallel so that I might apply constraints from the top to the bottom, and from the bottom to the top, until both meet.
  5. One "descent" constraint is the idea that expressions do not always have truth values as their values. Predicates are just functions that map to truth values, and there are other types of functions. Bayesian networks can certainly deal with this! It might be good to use type reasoning to determine the sets of possible values for the bayes-net variables. (Even untyped HOL does type reasoning, after all.) This is not a difficult one.
Lots more to think about... but I might as well post this now.