Geniferology: P = NP, my first stab at the problem

[ This idea was inspired by Karmarkar's polynomial-time interior point method for linear programming. It turned out to be a dead end, at least from my current perspective, but I am leaving the record here for posterity; it might give other people ideas. ]

In mathematics, there is a long tradition of using "smooth" things to calculate "discrete, combinatorial" things. A famous early example is Euler's use of the Newton/Leibniz integral to approximate sums of the form $ \sum^n_{i=1} f(i) $, now known as Euler-Maclaurin summation [cf the book "Mathematical Masterpieces", Knoebel et al 2007]. A more recent example is Alexander Grothendieck's theory of schemes, which builds a bridge between (smooth) algebraic geometry and (discrete) Diophantine geometry.

So my hunch is to "move the SAT problem to a smooth setting and then apply smooth techniques". That is exactly what happened with linear programming. The simplex algorithm is discrete in the sense that it traverses the vertices on the boundary of the feasible region, and such a traversal can take exponential time in the worst case, because the number of vertices can grow exponentially with the number of constraints. Interior point methods, by contrast, avoid the boundary and its vertices altogether and move through the interior of the feasible region, guided by ellipsoids that approximate it.
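A minimal sketch of the interior point idea, under simplifying assumptions: it uses a basic log-barrier (central-path) method rather than Karmarkar's projective algorithm, on the simplest possible LP, minimizing $c \cdot x$ over the box $0 \le x_i \le 1$, which has $2n$ constraints but $2^n$ vertices. The function `barrier_box_lp` and its parameters are hypothetical names chosen for illustration. Because the box barrier is separable, each Newton step reduces to a scalar update per coordinate, and the iterates stay strictly inside the box, never visiting a vertex.

```python
def barrier_box_lp(c, mu=10.0, t_max=1e8, newton_steps=50):
    """Minimize sum(c[i] * x[i]) over the box 0 <= x[i] <= 1 (log-barrier sketch).

    For each barrier weight t we minimize
        t * c.x - sum(log(x[i]) + log(1 - x[i])),
    whose minimizer lies strictly inside the box (the "central path").
    """
    x = [0.5] * len(c)  # analytic centre of the box: the barrier's own minimizer
    t = 1.0
    while t <= t_max:
        for i, ci in enumerate(c):  # the barrier is separable, so coordinates decouple
            for _ in range(newton_steps):
                xi = x[i]
                grad = t * ci - 1.0 / xi + 1.0 / (1.0 - xi)
                hess = 1.0 / xi**2 + 1.0 / (1.0 - xi) ** 2
                step = -grad / hess
                # Halve the Newton step until the iterate stays strictly inside (0, 1).
                s = 1.0
                while not 0.0 < xi + s * step < 1.0:
                    s *= 0.5
                x[i] = xi + s * step
        t *= mu  # tighten the barrier; suboptimality is at most 2n / t
    return x


x = barrier_box_lp([-1.0, 2.0, 0.0])
print([round(v, 6) for v in x])  # [1.0, 0.0, 0.5]
```

The optimum sits at a vertex, yet the iterates only approach it from the interior; the coordinate with zero cost stays at the centre.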

Using an analogy, imagine a hotel with an exponential number of rooms, some of which contain a prize, while we have only a polynomial amount of time to check them. On the surface this seems impossible; that is why P vs NP is hard. But only the ambient space is exponential in size: the targets are generated by a description of length N (the input size), so there is hope.
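For contrast, here is a sketch of the naive algorithm the analogy warns against: treat each of the $2^n$ truth assignments as a room and check them one by one. Clauses follow a DIMACS-style convention (an assumption of this sketch): `+i` stands for $x_i$ and `-i` for its negation; `brute_force_sat` is a hypothetical helper name.

```python
from itertools import product


def brute_force_sat(n, clauses):
    """Check every "room" (truth assignment) until one satisfies all clauses.

    Each clause is a list of non-zero ints: +i means x_i, -i means NOT x_i.
    Returns (satisfying assignment or None, number of rooms checked).
    """
    checked = 0
    for bits in product([False, True], repeat=n):  # 2**n rooms in total
        checked += 1
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses):
            return bits, checked
    return None, checked


# A two-clause description (x1 AND NOT x1) forces a visit to all 2**10 rooms.
print(brute_force_sat(10, [[1], [-1]]))  # (None, 1024)
```

The description has length 2, but the search space it lives in has size $2^{10}$; that gap is the whole difficulty.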

Now add the inspiration from interior point methods: our algorithm should avoid testing the hotel rooms at all, because as soon as we test the first room we may get trapped into testing an exponential number of them, just as in linear programming we have to avoid "touching" the boundary. Perhaps we should instead iterate through the input constraints, updating some "data structure" (which could be any mathematical structure in a broad sense), so that the structure asymptotically approaches the feasible solution(s).

The final shape of this data structure cannot contain the entire solution set, since that set may have an exponential number of vertices, and there isn't enough time to update that many vertices before the answer is due. It cannot even contain a subset of those vertices, because then some vertices would have to be eliminated along the way, which makes our function discontinuous. (Topologically, a function is continuous if the preimage of every open set is open.)

Nor can our function output a value in $\{0, 1\}$ (a plain yes/no answer), because the answer may jump between 0 and 1, which again is discontinuous.

Note that SAT can be reduced to deciding whether a system of polynomial equations has a solution in the reals (the "existential theory of the reals"). The best known algorithm for the latter is singly exponential in the number of variables [cf "Algorithms in Real Algebraic Geometry", Basu et al 2006, ch. 13].
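The reduction in this direction is standard; here is a sketch of one common encoding, using the same DIMACS-style clause convention (an assumption). Each variable gets the constraint $x_i^2 - x_i = 0$, which forces $x_i \in \{0, 1\}$ over the reals; a literal is read as $x_i$ or $1 - x_i$; and a clause becomes the product of $(1 - \text{literal})$ over its literals, which vanishes iff some literal is true. A clause with $k$ literals thus gives a polynomial of degree $k$. The helper names are hypothetical, and the polynomials are plain Python callables rather than symbolic objects. Because the first $n$ polynomials already force 0/1 values, it is enough to look for common zeros inside $\{0,1\}^n$.

```python
from itertools import product


def literal_value(x, lit):
    """Real-valued reading of a literal: x_i for +i, 1 - x_i for -i."""
    v = x[abs(lit) - 1]
    return v if lit > 0 else 1 - v


def polynomial_system(clauses, n):
    """Return polynomials (as callables) whose common real zeros are exactly the
    satisfying 0/1 assignments of the CNF `clauses` over variables 1..n."""
    # x_i**2 - x_i = 0 has only the real roots 0 and 1.
    polys = [lambda x, i=i: x[i] * x[i] - x[i] for i in range(n)]

    def clause_poly(clause):
        def p(x):
            value = 1
            for lit in clause:
                value *= 1 - literal_value(x, lit)  # on 0/1 points: zero iff lit is true
            return value
        return p

    polys.extend(clause_poly(c) for c in clauses)
    return polys


clauses = [[1, 2], [-1, 3], [-2, -3]]  # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
polys = polynomial_system(clauses, 3)
zeros = [x for x in product([0, 1], repeat=3) if all(p(x) == 0 for p in polys)]
print(zeros)  # [(0, 1, 0), (1, 0, 1)]
```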

But our function cannot output the volume of the set of real solutions either, because the volume depends on the entire boundary, which we want to avoid. The Oleinik-Petrovsky-Thom-Milnor theorem [cf Basu et al 2006, p.4] bounds the sum of the Betti numbers of a real algebraic set by a quantity that is polynomial in the degree and exponential in the number of variables. The zeroth Betti number, $b_0$, is the number of connected components of the set. This seems to confirm my intuition that the boundary has "exponential" complexity and thus should be avoided.
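To see the bound in numbers, here is a small sketch using Milnor's form of it, $d(2d-1)^{n-1}$ for polynomials of degree at most $d$ in $n$ variables ($n$ is used here to match the variable count elsewhere in this post). The system $x_i^2 - x_i = 0$, $i = 1, \dots, n$, has degree 2 and exactly the $2^n$ points of $\{0,1\}^n$ as real zeros, each its own connected component, so $b_0 = 2^n$ grows exponentially, while the bound for $d = 2$ is $2 \cdot 3^{n-1}$. The function names are illustrative.

```python
from itertools import product


def milnor_bound(d, n):
    """Milnor's bound on the sum of the Betti numbers of the real zero set of
    polynomials of degree <= d in n variables."""
    return d * (2 * d - 1) ** (n - 1)


def hypercube_components(n):
    """Connected components of the real zero set of x_i**2 - x_i = 0 (i = 1..n).

    The real roots of x**2 - x are exactly 0 and 1, so the zero set is the finite
    set {0, 1}**n; each isolated point is its own component.
    """
    return sum(1 for x in product([0, 1], repeat=n) if all(v * v - v == 0 for v in x))


for n in range(1, 6):
    print(n, hypercube_components(n), milnor_bound(2, n))
# 1 2 2
# 2 4 6
# 3 8 18
# 4 16 54
# 5 32 162
```

For fixed $n$ the bound is a polynomial of degree $n$ in $d$; for fixed $d$ it is exponential in $n$, and this simple example shows the exponential growth really occurs.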

Having eliminated these options, one path may still be viable: find the "minimal" real solution in the algebraic set. That is, for a (complex) solution $z$, we take only its real part and require that its Euclidean norm, $||Re z||_2$, be smallest. A slight glitch is that the real part of a complex number is $Re z = (z + z^*)/2$, where $z^*$ denotes complex conjugation, and conjugation cannot be expressed as a polynomial in $z$.
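A worked version of the identity and the resulting objective, writing a solution as $\mathbf{z} = (z_1, \dots, z_n) \in \mathbb{C}^n$ with $z_j = a_j + i\,b_j$ (notation assumed for this illustration):

$$ z_j^* = a_j - i\,b_j, \qquad \operatorname{Re} z_j = \frac{z_j + z_j^*}{2} = a_j $$

$$ \|\operatorname{Re}\mathbf{z}\|_2^2 = \sum_{j=1}^{n} \left(\frac{z_j + z_j^*}{2}\right)^2 = \sum_{j=1}^{n} a_j^2 $$

The objective is a polynomial in the real coordinates $a_j, b_j$, but not in the $z_j$ themselves: $z \mapsto z^*$ fails the Cauchy-Riemann equations, so it is not complex-differentiable and therefore equals no polynomial in $z$.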

But then I realized that this approach has a fatal problem: we still have not avoided testing an exponential number of points. This can be illustrated by the following picture:

The regions are defined by a set of $k$ polynomials of maximum degree $m$ in $n$ variables. We need to find the common intersections (if any exist) of all $k$ polynomials. Figuratively speaking (the diagram is not realistic), the potential number of intersections is related to the number of "peaks", i.e. the degree of the polynomials.

If we must test each "peak" against every other, across all $k$ polynomials, the maximal number of tests would be $m^k$; already for $k = n$ this is $m^n$, i.e. exponential in the number of variables.
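A toy construction (hypothetical, chosen only to make the count concrete): take $n$ polynomials in $n$ variables, $f_i(x) = \prod_{j=0}^{m-1} (x_i - j)$, each of degree $m$ with $m$ roots along its own axis. Picking one root per polynomial gives $m^n$ combinations, and here every one of them is a genuine common zero, so no combination can be skipped; this matches Bézout's bound of $m^n$ isolated common zeros for $n$ polynomials of degree $m$ in $n$ variables. The system is trivially decomposable, so it is not a hard instance; it only shows that the count is actually attained.

```python
from itertools import product


def grid_system(n, m):
    """n polynomials in n variables, f_i(x) = prod_{j<m} (x_i - j), each of degree m."""
    def make(i):
        def f(x):
            p = 1
            for j in range(m):
                p *= x[i] - j
            return p
        return f
    return [make(i) for i in range(n)]


def common_zeros_by_combination(n, m):
    """Test every combination of one root per polynomial: m**n candidate points.

    Returns (number of combinations tested, number that are common zeros).
    """
    polys = grid_system(n, m)
    tested, zeros = 0, 0
    for point in product(range(m), repeat=n):
        tested += 1
        if all(f(point) == 0 for f in polys):
            zeros += 1
    return tested, zeros


for n in (1, 2, 3, 4):
    print(n, common_zeros_by_combination(n, 3))
# 1 (3, 3)
# 2 (9, 9)
# 3 (27, 27)
# 4 (81, 81)
```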

To put it succinctly, the problem is this: when we test a peak at a certain location, the effort spent there does not contribute to testing the peaks at other locations, and there is an exponential number of peak-to-peak combinations to test.

This is the general feeling one has when tackling an NP-hard problem.

So the critical insight that may ultimately settle P vs NP is that we should not waste time testing local cases, but should instead exploit the fact that the localities are somehow correlated with each other.

It is well known that 2-SAT is in P and 3-SAT is NP-complete. When SAT is encoded as polynomials, a clause with $k$ literals becomes a polynomial of degree $k$, so in this sense NP-hardness appears once the degree reaches 3. Any higher-degree instance can be reduced to degree 3 (k-SAT reduces to 3-SAT), so we only need to consider cubic hypersurfaces. That would be my next goal... :)
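The reduction from longer clauses to 3-clauses is the textbook one: a clause $(l_1 \vee \dots \vee l_k)$ with $k > 3$ is replaced by $(l_1 \vee l_2 \vee y_1) \wedge (\neg y_1 \vee l_3 \vee y_2) \wedge \dots \wedge (\neg y_{k-3} \vee l_{k-1} \vee l_k)$ with fresh variables $y_j$. Below is a sketch assuming DIMACS-style integer literals; the function names are hypothetical, and the brute-force checker is only there to confirm equisatisfiability on tiny inputs.

```python
from itertools import product


def to_3sat(n, clauses):
    """Rewrite a CNF over variables 1..n so that every clause has at most 3 literals.

    Literals are non-zero ints (+i for x_i, -i for NOT x_i). Each clause with
    k > 3 literals is split into k - 2 clauses using k - 3 fresh variables.
    Returns (new variable count, new clause list); satisfiability is preserved.
    """
    out = []
    for clause in clauses:
        if len(clause) <= 3:
            out.append(list(clause))
            continue
        n += 1  # fresh variable y_1
        out.append([clause[0], clause[1], n])  # (l1 OR l2 OR y_1)
        for lit in clause[2:-2]:
            n += 1  # fresh variable y_j
            out.append([-(n - 1), lit, n])  # (NOT y_{j-1} OR lit OR y_j)
        out.append([-n, clause[-2], clause[-1]])  # (NOT y_last OR l_{k-1} OR l_k)
    return n, out


def satisfiable(n, clauses):
    """Brute-force satisfiability check, only suitable for tiny instances."""
    return any(
        all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in clauses)
        for bits in product([False, True], repeat=n)
    )


print(to_3sat(5, [[1, 2, 3, 4, 5], [-1, -2]]))
# (7, [[1, 2, 6], [-6, 3, 7], [-7, 4, 5], [-1, -2]])
```

The blow-up is linear (at most $k - 3$ new variables per clause), which is why restricting attention to degree 3 loses no generality.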

Friday, May 23, 2014
