## Computer vision

Some of you have told me that you're interested in computer vision, so I'll start with visual perception. The first thing to know is that the computer vision problem is theoretically already __solved__. There is no mystery in machine vision:

When we look at a scene, what we have is an image, which is a projection of the scene's objects onto the retina. The mathematics of such a projection is completely specified by **projective geometry**, with some changes due to occlusion (the blocking of light rays). The machine vision problem is to compute the inverse of the projection, i.e., $f^{-1}$.
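As a toy sketch of the forward map $f$, here is a pinhole-camera projection (the function and variable names are my own, for illustration). Inverting it is exactly what makes vision hard: depth along the viewing ray is lost.

```python
import numpy as np

# Pinhole-camera sketch of the forward projection f: a 3D point
# (X, Y, Z) in camera coordinates maps to image coordinates
# (x, y) = (f*X/Z, f*Y/Z), where f is the focal length.  Inverting
# this is ill-posed: every point on a ray through the pinhole lands
# on the same pixel, which is why computing f^{-1} needs extra
# assumptions about the scene.

def project(points_3d, focal_length=1.0):
    """Project Nx3 camera-frame points onto the image plane."""
    points_3d = np.asarray(points_3d, dtype=float)
    z = points_3d[:, 2:3]
    return focal_length * points_3d[:, :2] / z

# Two points on the same ray project to the same image location:
p1 = project([[1.0, 2.0, 4.0]])
p2 = project([[2.0, 4.0, 8.0]])  # same ray, twice as far away
```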

Of course, you also need a way to represent physical objects in the scene. There are many data structures designed for that purpose, with **Clifford algebra** (or geometric algebra) being a more mathematical technique.

So why is the vision problem still unsolved? I guess it's mainly because researchers try to solve it using relatively simple or "clever" mathematical tricks instead of really solving every step along the chain of $f^{-1}$. Doing this would require a large-scale project, probably beyond the reach of a single professor with a few grad students.

And if someone takes the trouble to solve vision completely, they might just as well tackle the problem of general intelligence (AGI)!

## 2 types of machine learning

All of machine learning is a **search for hypotheses** in the **hypothesis space**. The resulting hypothesis may not necessarily be correct, but it is the *best* given the **data** and based on some **criteria** ("scores").
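A toy illustration of this idea (my own example, not from the text): let the hypothesis space be the threshold rules "predict 1 iff $x \ge t$", the data a handful of labeled numbers, and the score classification accuracy.

```python
# Toy "search for hypotheses": enumerate candidate hypotheses
# (threshold rules), score each against the data, keep the best one.
data = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]  # (input, label) pairs

def score(t):
    """Accuracy of the hypothesis 'predict 1 iff x >= t' on the data."""
    return sum((x >= t) == bool(y) for x, y in data) / len(data)

best_t = max(range(7), key=score)  # search the (tiny) hypothesis space
# best_t == 3: "x >= 3" fits this data perfectly, but a different data
# set or a different scoring criterion could pick a different hypothesis.
```

The winner is only the *best* hypothesis relative to this data and this score, which is the point being made above.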
There are 2 major paradigms of machine learning:

**logic-based learning**

-- hypotheses are logic formulas

-- search in the space of logic formulas

-- uses classical AI search techniques

**statistical learning** (including neural networks)

-- emerged in the 1990s

-- data points exist in a high-dimensional **space**

-- uses geometric structures (planes or surfaces) to **cut** the space into regions

-- learning is achieved through **parameter** learning

-- includes neural networks, support vector machines, ...

## Neural networks

This is a neuron:

A neuron receives **input** signals on its **dendrites** from other neurons. When the **sum** of the inputs exceeds a certain **threshold**, an **action potential** is fired and propagates along the **axon**, reaching other neurons.

**Equation of a single neuron**

The equation of a neuron is thus:

$$ \sum w_i x_i \ge \theta $$

where the $x_i$'s are the inputs, the $w_i$'s are called the **weights**, and $\theta$ is the **threshold**.
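A minimal sketch of this equation in code (the particular weights and threshold below are invented, chosen so the neuron computes AND):

```python
# A single neuron: fires (outputs 1) exactly when the weighted sum of
# its inputs reaches the threshold theta.
def neuron(inputs, weights, theta):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= theta else 0

# With weights (1, 1) and threshold 1.5, the neuron acts as an AND gate:
assert neuron([1, 1], [1, 1], 1.5) == 1
assert neuron([1, 0], [1, 1], 1.5) == 0
assert neuron([0, 0], [1, 1], 1.5) == 0
```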
Graphically, each neuron is a **hyper-plane** that cuts the **hyperspace** into 2 regions:

Thus imagine multiple neurons **chopping up hyperspace** into desired regions.

**Back-propagation**

Back-propagation is a technique invented in the 1970s to train a **multi-layer** neural network to recognize complex, non-linear patterns.
This is the **step function** of the inequality $\sum w_i x_i \ge \theta$:

This is replaced by the smooth **sigmoid function**:

So the equation now becomes: $\Phi( \sum w_i x_i ) \ge \theta$.
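A quick sketch of the two functions in Python (the function names are mine):

```python
import math

# Step function of the inequality: fires (1) iff the weighted sum s
# reaches the threshold theta.  Its derivative is zero everywhere
# except at theta, where it does not exist -- useless for gradients.
def step(s, theta=0.0):
    return 1.0 if s >= theta else 0.0

# Smooth sigmoid replacement: differentiable everywhere, with the
# convenient derivative sigmoid(s) * (1 - sigmoid(s)).
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))
```

For large positive or negative sums the sigmoid approaches the step function's two output values, which is why it can stand in for the threshold while staying differentiable.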

Recall that in calculus, we have the **chain rule** for differentiation:

$$ D \; f(g(x)) = f'(g(x)) \cdot g'(x) $$

(The derivative does not exist for the step function because it is not smooth at the threshold point).

Imagine each layer of the neural network as a function $f_i$. The chain rule then allows us to propagate the error ($\Delta x$) from the lowest layer ($f_1$) to the highest layer ($f_n$):

$$ f_n(f_{n-1}( \ldots f_2(f_1(\Delta x)) \ldots )) $$
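To make this concrete, here is a toy numerical sketch (my own, not the author's code): a one-weight "network" $y = f_2(f_1(x))$ with $f_1(x) = w_1 x$ and $f_2$ the sigmoid, where the chain rule supplies the gradient that gradient descent uses. The input, target, and learning rate are all invented for illustration.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

x, target = 1.0, 1.0   # invented training example
w1, lr = 0.0, 1.0      # invented initial weight and learning rate

for _ in range(500):
    u = w1 * x                      # f1: linear layer
    y = sigmoid(u)                  # f2: smooth activation
    err = y - target                # error at the output
    # Chain rule: dE/dw1 = (y - target) * sigmoid'(u) * x,
    # where sigmoid'(u) = y * (1 - y).
    grad = err * y * (1.0 - y) * x
    w1 -= lr * grad                 # tune the parameter to reduce error

# After many small chain-rule updates, the output is close to the target.
```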

(The formula is just a figurative illustration.) So we can gradually tune the parameters of the neurons in each layer to **minimize** the error.

## Support vector machines

[ See next blog post. ]

## Logical / relational learning

This is an example (correct) hypothesis:

$ uncle(X, Y) \leftarrow parent(Z, Y) \wedge brother(Z, X) $

This hypothesis would be too **general**:
$ uncle(X, Y) \leftarrow $ ~~parent(Z, Y)~~ $ \wedge brother(Z, X) $

Whereas this one is too **specific**:

$ uncle(X, Y) \leftarrow parent(Z, Y) \wedge brother(Z, X) \wedge has\_beard(X) $

Hypotheses like these are arranged in a **general-specific order** (also known as the **subsumption order**).

An **order** (<, Chinese 序) creates a structure called a **lattice** (Chinese 格). This is an example of a lattice:

This is another example of a lattice:

Sometimes we use $\top$ to denote the **top** element (i.e., the most general statement, meaning "everything is correct"), and $\bot$ to denote the **bottom** element (i.e., the most specific statement, meaning "everything is false").
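A simplified sketch of the subsumption order (my own propositionalized encoding, not full first-order subsumption): treat each hypothesis body as a set of conditions, so that dropping a condition generalizes and adding one specializes.

```python
# Simplified subsumption sketch: a hypothesis body is a set of
# conditions.  The predicate strings follow the uncle(X, Y) example.
correct      = frozenset({"parent(Z,Y)", "brother(Z,X)"})
too_general  = frozenset({"brother(Z,X)"})         # dropped parent(Z,Y)
too_specific = correct | {"has_beard(X)"}          # added has_beard(X)

def more_general(a, b):
    """a is at least as general as b iff a's conditions are a subset of b's."""
    return a <= b

# In this encoding the order forms a lattice: moving toward the top
# (more general) is set intersection, moving toward the bottom (more
# specific) is set union, and the empty set plays the role of the
# top element, covering everything.
assert more_general(too_general, correct)
assert more_general(correct, too_specific)
```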
