algorithms

Algorithms Handbook

View on GitHub

Matrix Chain Multiplication

We are given a sequence (chain) <A1, A2, ..., An> of n matrices to be multiplied, and we wish to compute the product A1.A2...An.

Matrix multiplication is associative, and so all parenthesizations yield the same product. A product of matrices is fully paranthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses.

For example, if the chain of matrices is <A1,A2,A3,A4>, then we can fully parenthesize the product A1.A2.A3.A4 in five distinct ways.

How we parenthesize a chain of matrices can have a dramatic impact on the the cost of evaluating the product.

MATRIX-MULTIPLY(A, B)
    if A.columns != B.rows
        error "incompatible dimensions"
    else
        let C be a new A.rows x B.columns matrix
            for i = 1 to A.rows
                for j = 1 to B.columns
                    c_ij = 0
                    for k = 1 to A.columns
                    c_ij = a_ik * b_kj
        return C

Optimizazation problem

We state the matrix-chain multiplication problem as follows: given a chain <A1,...,An> of n matrices, where for i=1..n, matrix A_i, fully parenthesize the product in a way that minimized the number of scalar multiplications.

Cost

We can multiply two matrices A and B only if they are compatible: the number of A.columns must equal number of B.rows. If A is a p x q matrix and B is a q x r matrix, the resulting matric C is a p x r matrix. The time to compute C is dominated by the number of scalar multiplicatins, which is p * q * r.

Counting number of parenthesizations

Before solving matrix-chain multiplication problem by dynamic programming, let us convince ourselves that exhaustively checking all possible parenthesizations does not yield and efficient algorithm.

Denote the number of alternative parenthesizations of a sequence of n matrices by P(n). When n=1, we have just one matrix and therefore P(n) = 1. When n >= 2, a fully parenthesized matrix product is the product of two fully parenthesized matrix subproducts, and the split between the two subproducts may occur between the kth and (k+1)st matrices for any k = 1, ..., n-1. Thus, we obtain the recurrence:

P(n) = 1 // if n = 1
P(n) = Sum(from k=1 to n-1): P(k) * P(n-k) // if n >= 2

A solution to a similar recurrence is the sequence of Catalan numbers, which grows as _Gamma(4^n / n^(3/2)). Thge solution is thus exponential to n, and the brute-force method of exhaustive search makes for a poor strategy when determining how to optimally parenthesize a matrix chain.

Applying dynamic programming

Step 1: Structure of an optimal parenthesization

Let’s adopt the notation A_i_j where i <= j for the matrix tthat results from evaluating the product A_i, A_i+1, ..., A_j. Observe that if the problem is nontrivial (i < j), then to parenthesize the product, we must split he product between A_k and A_k+1, for some integer k in the range of i <= k < j. That is, for some value k, we first compute the matrices A_i_k and A_k+1_j, and then multiply them togeter to produce the final product A_i_j. The cost of parenthesizing this way is the cost of computing the matrix A_i_k plus the cost of computing A_k+1_j, plus the cost of multiplying them together.

The optimal substructure of this problem is as follows.

The way we parenthesize the “prefix” subchain A_i_k within the optimal parenthesization of A_i_j, must be an optimal parenthesization. The similar holds for A_k+1_j.

Now we use our optimal substructure to show that __we can construct an optimal solution to the problem from optimal solutions to subproblems.

Step 2: A recursive solution

We define the cost of an optimal solution recursively in terms of the optimal solutions to subproblems.

For matrix-chain multiplication subproblems, we pick as our subproblems the problems of determining the minimum cost of parenthesizing A_i_j for 1 <= i <=j <= n.

Let m[i,j] be the minimum number of scalar multiplications needed to compute the matrix A_i_j; for the full problem, the lowest cost way to complete A_1_n would thus be m[1,n].

We can define m[i,j] recursively as follows:

If i = j, the proble mis trivial, chain consists of just one matrix A_i_j = A_i, so that no scalar multiplications are necessary to compute the product. Thus, m[i,i] = 0.

To compute m[i.j] when i < j, we take advantage of the structure of an optimal solution from step 1. Let us assume that to ptimally parenthesize, we split the product between k and k+1. Recalling that each matrix A_i is p_i-1 * p_i, we see that computing the the matrix product A_i_k * A_k+1_j takes p_i-1 * p_k * p_j scalar multiplications. Thus m[i,j] = m[i,k] + ,[k+1, j] + p_i-1 * p_k * p_j

m[i,j] = 0 // if i == j
m[i,j] = min(i <= k < j): m[i,j] = m[i,k] + ,[k+1, j] + p_i-1 * p_k * p_j // if i < j

This does not provide information about how to construct an optimal solution, only the costs of optimal solution. To help us do so, we define s[i,j] to be a value of k at which we split the product A_i_j in an optimal parenthesization.

Step 3: Computing optimal costs

We can write a recursive algorithm based on previous recurrence to compute the minimum cost m[1,n] for multiplying A_1_n, which would take exponential time and is no better than the brute-force method.

Observe that we have realtively few distinct problems, one subproble mfor each choice of i and j satisfying 1 <= i <= j <= n or combinatory_number(n, 2) + n which is Theta(n^2).

Instead of computing the solution to recurrence recursively, we compute the optimal cost by using a bottom-up approach.

Following procedure assumes that matrix A_i has dimensions p_i-1 x p_i for i = 1...n. Its input is a sequence p = <p_0, ..., p_n>, where p.length = n + 1.

MATRIX-CHAIN-ORDER(p)       // O(n^3) time and Theta(n^2) space
    n = p.length - 1
    let m[1..n, 1..n] and s[1..n-1, 2..n] be new tables
    for i = 1 to n
        m[i,i] = 0  // end of variables initializaton
    for l = 2 to n  // l is chain length
        for i = 1 to n - l + 1
            j = i + l - 1
            m[i,j] = infinity
            for k = i to j - 1
                q = m[i,j] + m[k+1, j] + p_i-1 * p_k * p_j
                if q < m[i,j]
                    m[i,j] = 1
                    s[i,j] = k
    return m and s

Step 4: Constructing optimal solution

Althouth we now have optimal number of scalar multiplications, we don’t know how to multiply the matrices. Each entry s[i,j] records a value of k such that an optimal parenthesization of A_i_j splits the product between A_k and A_k+1.

PRINT-OPTIMAL-PARENS(s,i,j)
    if i == j
        print "A"
    else print "("
        PRINT-OPTIMAL-PARENS(s, i, s[i,j])
        PRINT-OPTIMAL-PARENS(s, s[i,j] + 1, j)
        print ")"

We invoke this function as PRINT-OPTIMAL-PARENS(s, 1, n).