I also combined materials from chapter 2 of the book "A First Course in Applied Mathematics" by Jorge Rebaza (2012):
How much is the PageRank algorithm worth?
In 2012:
Google annual revenue = US$50.2 billion
Google total assets = US$93.8 billion
Google market cap = US$248.2 billion
Hong Kong's GDP = US$243.7 billion
So the entire Hong Kong population working for a year is roughly worth the outstanding shares of Google. Perhaps learning the PageRank algorithm can give us an estimate of how useful mathematics can be :)
The algorithm
As of 2012, Google searches 8.9 billion web pages. First, it builds a directed graph ($\Gamma$) of how the web pages are linked to each other.
Example 1:
$$ A = \left[ \begin{array}{c c c c c} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 0 & 1/2 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{array} \right] $$
Example 2:
$$ A = \left[ \begin{array}{c c c c c c} 0 & 1 & 0 & 0 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 0 & 1/3 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 & 0 \end{array} \right] $$
In the graph:
nodes = web pages
links = out-going links from page to page
The matrices are the adjacency matrices of the graphs. Each row represents the out-going links from page $P_i$; if a page has $n$ out-going links, each link is weighted by $1/n$. Notice that each row of the matrices sums to 1 (except for page 5 in Example 2, which has no out-going links at all: a "dangling node"). Matrices whose rows each sum to 1 are called stochastic matrices.
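As a concrete check, here is a minimal Python/NumPy sketch that builds the matrix of Example 1 from its link structure (the `out_links` dictionary is my own encoding of that graph, with pages numbered 1 to 5):

```python
import numpy as np

# Out-going links of Example 1 (my own encoding of the graph),
# pages numbered 1 to 5.
out_links = {1: [4], 2: [5], 3: [2, 5], 4: [1], 5: [2]}

n = 5
H = np.zeros((n, n))
for page, targets in out_links.items():
    for t in targets:
        H[page - 1, t - 1] = 1.0 / len(targets)  # each link weighted by 1/n

print(H)              # reproduces the matrix A of Example 1
print(H.sum(axis=1))  # every row sums to 1: a stochastic matrix
```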
Then the rank of a page is defined by adding up the ranks of pages with out-links pointing to it:
$$ r(P) = \sum_{i} \frac{r(P_i)}{|P_i|} $$
where $r(P_i)$ is the rank of Page $i$, and $|P_i|$ denotes the total number of out-going links from Page $i$. The rationale is that it is a good thing to be pointed to, so the "goodness" of Page $i$ is passed to Page $P$; but it has to be divided by $|P_i|$ because a page that points outwards "promiscuously" should have its influence diluted.
But this definition is circular (ranks are defined in terms of other ranks), so the ranking has to be calculated iteratively, starting from an initial ranking. The iterative equation has the form:
$$ r_k(P) = \sum_i \frac{r_{k-1}(P_i)}{|P_i|}, \quad \quad k = 1, 2, ... $$
This means that the "goodness" (ranking) is passed around the graph iteratively until the values at every node converge. Collecting the rankings of all pages at step $k$ into a row vector $v^T_k$, the iteration can be written in matrix form:
$$ v^T_k = v^T_{k-1} H $$
where
$$ H_{ij} = \left\{ \begin{array}{ll} \frac{1}{|P_i|} & \mbox{if } P_i \mbox{ has a link to } P_j \\ 0 & \mbox{otherwise} \end{array} \right. $$
Notice that the matrix $H$ is constant throughout the iteration. That means we are repeatedly multiplying by $H$:
$$ v^T = \lim_{k \rightarrow \infty} v^T_k = \lim_{k \rightarrow \infty} v^T_0 H^k $$
where $v$ is the PageRank vector.
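Here is a minimal sketch of this iteration in NumPy (the function name and tolerance are my own choices; as the sections below explain, convergence is only guaranteed once $H$ is adjusted to be stochastic and irreducible):

```python
import numpy as np

def pagerank_power(H, tol=1e-10, max_iter=1000):
    """Iterate v_k^T = v_{k-1}^T H until the rankings stabilize."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n)        # v_0: uniform initial ranking
    for _ in range(max_iter):
        v_next = v @ H             # each page passes its rank to its out-links
        if np.abs(v_next - v).sum() < tol:   # L1 distance between iterates
            break
        v = v_next
    return v_next
```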
This iteration is reminiscent of the way we repeatedly press the "cosine" button on a calculator and watch the results converge. That value is the solution to the equation:
$$ \cos x = x. $$
In general, when we want to solve an equation of the form $ f(x) = x $, we can repeatedly apply $f(\cdot)$ starting from an initial value $x_0$. A point $x^*$ for which $f(x^*) = x^*$ is called a fixed point, a very useful concept in mathematics.
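The calculator trick, as a three-line sketch in Python:

```python
import math

x = 1.0                  # any initial guess x_0
for _ in range(100):     # "pressing the cosine button" 100 times
    x = math.cos(x)
print(x)                 # ~0.7390851, the fixed point of cos(x) = x
```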
The PageRank iteration above is called the power method, and it converges to the eigenvector associated with the eigenvalue of largest magnitude of the matrix $H$. This is called the dominant eigenvector, which I will now explain...
Power method
This section explains why the power method converges. Recall the definition of eigenvector and eigenvalue:
$$ A x = \lambda x $$
where $x$ = eigenvector, $\lambda$ = eigenvalue.
Assume that the matrix A satisfies these conditions:
- $A$ has a complete set of linearly independent eigenvectors $v_1, ... v_n$
- the corresponding eigenvalues can be ordered:
$ |\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n| $
(notice the strict > in the first term)
To iterate, choose an arbitrary initial vector $x_0$. Since the eigenvectors form a basis, we can express $x_0$ as:
$$ x_0 = c_1 v_1 + ... + c_n v_n $$
Now we repeatedly multiply by the matrix $A$ and, using $A v_i = \lambda_i v_i$, get:
$$ \begin{aligned} A x_0 &= c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \dots + c_n \lambda_n v_n \\ A^2 x_0 &= c_1 \lambda_1^2 v_1 + c_2 \lambda_2^2 v_2 + \dots + c_n \lambda_n^2 v_n \\ &\;\;\vdots \\ A^k x_0 &= c_1 \lambda_1^k v_1 + c_2 \lambda_2^k v_2 + \dots + c_n \lambda_n^k v_n \end{aligned} $$
Factoring out $\lambda_1^k$:
$$ A^k x_0 = \lambda_1^k \left( c_1 v_1 + c_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k v_2 + \dots + c_n \left(\frac{\lambda_n}{\lambda_1}\right)^k v_n \right) $$
And because $ |\lambda_1| > |\lambda_2| \ge ... \ge |\lambda_n| $, every ratio $(\lambda_i / \lambda_1)^k \rightarrow 0$ as $k \rightarrow \infty$, so (provided $c_1 \neq 0$) the weight of the first term outgrows those of the rest, and the vector $A^k x_0$ turns toward the dominant eigenvector direction.
The rate of convergence is determined by the ratio $\frac{ |\lambda_2| } { |\lambda_1| }$.
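A small sketch to see this ratio in action (the $2 \times 2$ matrix is an arbitrary example of mine, with eigenvalues 3 and 1, so the error should shrink by a factor of about 3 per step):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # eigenvalues 3 and 1; dominant eigenvector (1, 1)

x = np.array([1.0, 0.0])     # arbitrary start (its c_1 component is nonzero)
for k in range(8):
    x = A @ x
    x = x / np.linalg.norm(x)     # normalize to keep the numbers bounded
    print(k, abs(x[0] - x[1]))    # distance from the (1,1) direction:
                                  # shrinks by |lambda_2|/|lambda_1| = 1/3 per step
```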
For the power method to converge, we need to ensure that the matrix is stochastic and irreducible.
Irreducible matrices
This section explains a necessary condition for the power method to converge; the measure taken to force convergence also gave rise to Google's personalized search, as we shall see.
A matrix is called reducible if there exists a permutation matrix $P$ such that:
$$ P^T A P = \left[ \begin{array}{cc} C& D \\ \mathbf{0} & E \end{array} \right] $$
where $\mathbf{0}$ is an $(n-r) \times r$ block of zeros.
Observe that the above definition expresses the similarity between $A$ and a block upper triangular matrix. (Note: two matrices $A$ and $B$ are similar if there is an invertible matrix $P$ such that $B = P A P^{-1}$. This means that applying $A$ has the same effect as applying $B$, up to a change of coordinates there and back. For a permutation matrix, $P^{-1}$ coincides with $P^T$.)
The significance of this definition is that:
matrix A is irreducible $\Longleftrightarrow \Gamma(A) $ is strongly connected
A graph is strongly connected if every node is reachable from any other node.
Example: consider the following matrices $A$ and $B$, related by the permutation matrix $P$:
$$ A = \left[ \begin{array}{c c c c} 0 & 1/3 & 1/3 & 1/ 3 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array} \right] $$
$$ B = \left[ \begin{array}{c c c c} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 1/3 \\ 1/2 & 0 & 1/2 & 0 \end{array} \right] $$
$$ B = P^T A P $$
$$ P = \left[ \begin{array}{c c c c} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array} \right] $$
The effect of multiplying by $P$ (a permutation matrix) is to exchange some rows (or columns), hence the name.
Notice that in the graph of matrix $A$ there is no way to go from nodes 3 & 4 to nodes 1 & 2. Thus matrix $A$ is reducible.
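Both claims can be checked numerically; a minimal sketch (the reachability test via powers of the adjacency pattern is my own choice):

```python
import numpy as np

A = np.array([[0,  1/3, 1/3, 1/3],
              [1/2, 0,  1/2,  0],
              [0,   0,   0,   1],
              [0,   0,   1,   0]])
B = np.array([[0,   1,   0,   0],
              [1,   0,   0,   0],
              [1/3, 1/3, 0,  1/3],
              [1/2, 0,  1/2,  0]])
P = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

print(np.allclose(P.T @ A @ P, B))   # True: B = P^T A P

# Accumulate paths of length 1..n: R[i, j] > 0 iff node j is reachable from node i.
M = (A > 0).astype(int)
R, Mk = M.copy(), M.copy()
for _ in range(len(A) - 1):
    Mk = Mk @ M
    R = R + Mk
print(R[2, 0] > 0, R[3, 0] > 0)   # False False: nodes 3 & 4 never reach node 1
print((R > 0).all())              # False: Gamma(A) is not strongly connected
```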
To ensure irreducibility (in order for the power method to work), Google forces every page to be reachable from every other page.
Google's solution is to add a perturbation matrix to $H$:
$$ G = \alpha H + (1 - \alpha) E $$
where $G$ is the Google matrix, $\alpha \in (0,1)$, and $E = e \; e^T / n$, where $e$ is a vector of 1's. This makes every page reachable from every other page, but later Google modified this to:
$$ E = e \; u^T $$
where $u$ is a probability vector called the personalization vector, which plays a great role in Google's success. More details are explained in Rebaza's book.
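A sketch of how the Google matrix could be assembled, reusing `H` (Example 1) and `pagerank_power` from the sketches above ($\alpha = 0.85$ is the commonly cited value, not something fixed here; a uniform $u$ recovers $E = e \; e^T / n$):

```python
import numpy as np

def google_matrix(H, alpha=0.85, u=None):
    """G = alpha * H + (1 - alpha) * e u^T; a uniform u gives E = e e^T / n."""
    n = H.shape[0]
    if u is None:
        u = np.full(n, 1.0 / n)    # default: no personalization
    return alpha * H + (1 - alpha) * np.outer(np.ones(n), u)

G = google_matrix(H)     # H as built for Example 1 above
v = pagerank_power(G)    # every entry of G is positive, so the iteration converges
print(v, v.sum())        # the PageRank vector, a probability vector
```

(One caveat: Example 2 has a dangling node, page 5, whose row of zeros would first have to be replaced by $u^T$ for $G$ to stay stochastic; this is a standard fix in the PageRank literature.)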
Hope this is helpful to you :)