Matrix chain multiplication is an optimization problem concerning the most efficient way to multiply a given sequence of matrices
This is going to be an in-depth explanation of matrix chain multiplication. This may be solved using brute force or overlapping subproblems but in this blog, we would be using the interesting principles of dynamic programming to explain it. I’ll be using handwritten notes compiled on OneNote by Microsoft to better explain the concepts in a comprehensible manner. I have also uploaded a video on YouTube on the same topic.(https://youtu.be/sgoqQzf3MqA)
We need to find the most efficient way to multiply a sequence of matrices. The focus would be to determine the order of parenthesization to minimize the cost of multiplication. Before we get into this, let’s look at some pre-requisites required to study this algorithm.
You first need to know the basics of simple matrix multiplication. Let’s understand this with an example.
Suppose we have 2 matrices- A1 of dimensions 5 ×4, denoted by i×j and A2 of dimensions 4×6, denoted by j×k. Notice that for the matrices to be multipliable, the number of columns in the first matrix must be equal to the number of rows in the second matrix. Then the dimensions of the product would be 5×6, that is i×k.
Now we need to calculate the total number of multiplication operations required. For getting the first element of the product matrix, we need to multiply the first row containing 4 elements with the first column which will also have 4 elements. We need to perform overall 4 multiplication operations to get this one single element. Similarly, for all the elements.
So, the overall the number of multiplication operations would be 5×6×4, that is, i×j×k.
Another property is that matrix multiplication is not commutative, meaning A1.A2≠A2.A1, although it maybe equal in some cases. But it is associative, that is A1×(A2×A3)=(A1×A2)×A3.
Moving on, let’s look at certain notations:
- We have to multiply ’n’ matrices: A1×A2×⋯×An such that the overall number of operations is minimized. By operations, we mean the multiplication operations.
- Let the dimensions of the matrix A1 be P0 ×P1, P1 X P2 for A2 and so on. So, for any matrix, Pi-1 would denote the row and Pi would denote the column.
Understanding the solution through an example
Now, for the main example, we would be using 4 matrices, A1 of dimensions 5×4, A2 of 4×6, A3 of 6×2 and A4 of 2×7. Our aim is to find an optimal parenthesization of all matrices to break the product into two parts each of which is parenthesized independently.
Note:- We would be denoting the splitting point by ‘k’.
Here, if k=2 then the matrices would be multiplied in this way-
And if k=1, then matrices would be multiplied in this way-
We would also have to look within this bracket, We would be seeing the table later on but suppose the value of k for this bracket is 2 then matrices would be multiplied in this way-
Hence, we need to find the value of k to minimize the cost of multiplication.
To solve this, we would be applying the principles of dynamic programming:-
· Memoization: Storing the intermediate results of the solutions of sub-problems, allowing us to speed up the computation of the overall solution.
· Bottom-up approach: The table would be filled from top to bottom, from left to right.
We would be using another notation, m[i,j], which denotes the minimum number of operations required to multiply matrices Ai to Aj. For example, m[2,4] is the minimum number of multiplication operations required to multiply A2 to A4(A2.A3.A4)
Let’s look at the procedure of filling the entries one by one.
The first entry is m[1,1]. This means we just have one matrix. The important point to note is that we are not multiplying A1 by A1. So, no product is required in this case and the number of operations required would be 0. Similarly, for all the diagonal elements.
Now, look at the lower triangular elements. m[2,1] means A2×A1 which cannot be considered as the order of the matrices cannot be changed in matrix multiplication. Similarly, for all the other lower triangular elements as well. So, we have reduced our problem to just half the matrix.
Before calculating the other entries, let us look at an important formula.
We just saw that m[i,j] is 0 when i=j, that is for the diagonal elements.
Now, to understand the second part, let’s look at an example(see above figure). Suppose the optimal value of k=2. So, the overall number of operations would be the sum of operations to multiply A1 and A2, that is m[1,2] and the operations to multiply A3 and A4, that is m[3,4]. Now these products will have to multiplied within themselves also! X will be multiplied with Y and the number of operations would be Pi-1×Pk×Pj.
An example of the table entry
In order to fill up the rest of the table, we would be using the formula we just derived. And, it would be filled in a bottom up manner. To understand this better, we would take an example of one of the entries, that is m[1,3]. Notice that i≤k<j. So, the possible values of k in this case would be 1 and 2 and we need to find the minimum of both the cases. Substituting the values from the formula, we obtain the value of m to be 88 for the first case and 180 for the second case. So, we choose 88 as that is the minimum of the two.
Now, we write this value of k in the S table which stores the optimal values of k.
Similarly, we fill up all the other entries as well!
Looking at the original question, we need to focus on the last entry that was filled which is m[1,4]. This signifies that the entire problem can be solved in a minimum of 158 operations.
For the parenthesization, we look at the last entry in the S table, that is s[1,4].
This means that the matrices would be split in this way-
But what about A1to A3? If we check s[1,3], it is 1. So, it is further split in this way-
Time complexity analysis
Since we are just generating half the table, so number of elements would be (n*(n-1))/2. And we are getting each element by calculating all then finding the minimum. This takes at most ‘n’ time. So, overall time complexity becomes θ(n3).
1. n ← length[p]-1
2. for i ← 1 to n
3. do m [i, i] ← 0
4. for l ← 2 to n // l is the chain length
5. do for i ← 1 to n-l + 1
6. do j ← i+ l -1
7. m[i,j] ← ∞
8. for k ← i to j-1
9. do q ← m [i, k] + m [k + 1, j] + pi-1 pk pj
10. if q < m [i,j]
11. then m [i,j] ← q
12. s [i,j] ← k
13. return m and s
We learnt how dynamic programming can make solving a complex problem a lot more easier. Hope you all had fun in learning about this interesting algorithm!!