Matrix Mathematics chapter
QMRNumerical analysis is the study of algorithms that use numerical approximation (as opposed to general symbolic manipulations) for the problems of mathematical analysis (as distinguished from discrete mathematics).
One of the earliest mathematical writings is a Babylonian tablet from the Yale Babylonian Collection (YBC 7289), which gives a sexagesimal numerical approximation of \sqrt{2}, the length of the diagonal in a unit square. Being able to compute the sides of a triangle (and hence, being able to compute square roots) is extremely important, for instance, in astronomy, carpentry and construction.[2]
The Babylonian tablet is a quadrant etched into the tablet
QMRMatrix representation is a method used by a computer language to store matrices of more than one dimension in memory. Fortran and C use different schemes. Fortran uses "Column Major", in which all the elements for a given column are stored contiguously in memory. C uses "Row Major", which stores all the elements for a given row contiguously in memory. LAPACK defines various matrix representations in memory. There is also Sparse matrix representation and Morton-order matrix representation. According to the documentation, in LAPACK the unitary matrix representation is optimized.[1] Some languages such as Java store matrices using Iliffe vectors. These are particularly useful for storing irregular matrices. Matrices are of primary importance in linear algebra.
An m × n (read as m by n) order matrix is a set of numbers arranged in m rows and n columns. Matrices of the same order can be added by adding the corresponding elements. Two matrices can be multiplied, the condition being that the number of columns of the first matrix is equal to the number of rows of the second matrix. Hence, if an m × n matrix is multiplied with an n × r matrix, then the resultant matrix will be of the order m × r.[2]
Operations like row operations or column operations can be performed on a matrix, using which we can obtain the inverse of a matrix. The inverse may be obtained by determining the adjoint as well.[2] rows and columns are the different classes of matrices
Basics of 2D array[edit]
The mathematical definition of a matrix finds applications in computing and database management, a basic starting point being the concept of arrays. A two-dimensional array can function exactly like a matrix. Two-dimensional arrays can be visualized as a table consisting of rows and columns.
int a[3][4], declares an integer array of 3 rows and 4 columns. Index of row will start from 0 and will go up to 2.
Similarly, index of column will start from 0 and will go up to 3.[3]
Column 0 Column 1 Column 2 Column 3
row 0 a[0][0] a[0][1] a[0][2] a[0][3]
row 1 a[1][0] a[1][1] a[1][2] a[1][3]
row 2 a[2][0] a[2][1] a[2][2] a[2][3]
This table shows arrangement of elements with their indices.
Initializing Two-Dimensional arrays: Two-Dimensional arrays may be initialized by providing a list of initial values.
int a[2][3] = {1,2,3,4,5,6,} or int a[2][3] = {{2,3,4}},{{4,4,5}};
Calculation of Address : An m x n matrix (a[1...m][1...n]) where the row index varies from 1 to m and column index from 1 to n,aij denotes the number in the ith row and the jth column. In the computer memory, all elements are stored linearly using contiguous addresses. Therefore,in order to store a two-dimensional matrix a, two dimensional address space must be mapped to one-dimensional address space. In the computer's memory matrices are stored in either Row-major order or Column-major order form.
QMRIn scientific computing, skyline matrix storage, or SKS, or a variable band matrix storage, or envelope storage scheme[1] is a form of a sparse matrix storage format matrix that reduces the storage requirement of a matrix more than banded storage. In banded storage, all entries within a fixed distance from the diagonal (called half-bandwidth) are stored. In column-oriented skyline storage, only the entries from the first nonzero entry to the last nonzero entry in each column are stored. There is also row oriented skyline storage, and, for symmetric matrices, only one triangle is usually stored.[2]
Skyline storage has become very popular in the finite element codes for structural mechanics, because the skyline is preserved by Cholesky decomposition (a method of solving systems of linear equations with a symmetric, positive-definite matrix; all fill-in falls within the skyline), and systems of equations from finite elements have a relatively small skyline. In addition, the effort of coding skyline Cholesky[3] is about same as for Cholesky for banded matrices (available for banded matrices, e.g. in LAPACK; for a prototype skyline code, see [3]).
Before storing a matrix in skyline format, the rows and columns are typically renumbered to reduce the size of the skyline (the number of nonzero entries stored) and to decrease the number of operations in the skyline Cholesky algorithm. The same heuristic renumbering algorithm that reduce the bandwidth are also used to reduce the skyline. The basic and one of the earliest algorithms to do that is reverse Cuthill–McKee algorithm.
However, skyline storage is not as popular for very large systems (many millions of equations) because skyline Cholesky is not so easily adapted for massively parallel computing, and general sparse methods,[4] which store only the nonzero entries of the matrix, become more efficient for very large problems due to much less fill-in.
QMRIn mathematics, a block matrix or a partitioned matrix is a matrix which is interpreted as having been broken into sections called blocks or submatrices.[1] Intuitively, a matrix interpreted as a block matrix can be visualized as the original matrix with a collection of horizontal and vertical lines which break it out, or partition it, into a collection of smaller matrices.[2] Any matrix may be interpreted as a block matrix in one or more ways, with each interpretation defined by how its rows and columns are partitioned.
This notion can be made more precise for an n by m matrix M by partitioning n into a collection rowgroups, and then partitioning m into a collection colgroups. The original matrix is then considered as the "total" of these groups, in the sense that the (i,j) entry of the original matrix corresponds in a 1-to-1 way with some (s,t) offset entry of some (x,y), where x \in \mathit{rowgroups} and y \in \mathit{colgroups}.
The matrix
\mathbf{P} = \begin{bmatrix}
1 & 1 & 2 & 2\\
1 & 1 & 2 & 2\\
3 & 3 & 4 & 4\\
3 & 3 & 4 & 4\end{bmatrix}
can be partitioned into 4 2×2 blocks
\mathbf{P}_{11} = \begin{bmatrix}
1 & 1 \\
1 & 1 \end{bmatrix}, \mathbf{P}_{12} = \begin{bmatrix}
2 & 2\\
2 & 2\end{bmatrix}, \mathbf{P}_{21} = \begin{bmatrix}
3 & 3 \\
3 & 3 \end{bmatrix}, \mathbf{P}_{22} = \begin{bmatrix}
4 & 4\\
4 & 4\end{bmatrix}.
The partitioned matrix can then be written as
\mathbf{P} = \begin{bmatrix}
\mathbf{P}_{11} & \mathbf{P}_{12}\\
\mathbf{P}_{21} & \mathbf{P}_{22}\end{bmatrix}.
Block matrix multiplication[edit]
It is possible to use a block partitioned matrix product that involves only algebra on submatrices of the factors. The partitioning of the factors is not arbitrary, however, and requires "conformable partitions"[3] between two matrices A and B such that all submatrix products that will be used are defined.[4] Given an (m \times p) matrix \mathbf{A} with q row partitions and s column partitions
\mathbf{A} = \begin{bmatrix}
\mathbf{A}_{11} & \mathbf{A}_{12} & \cdots &\mathbf{A}_{1s}\\
\mathbf{A}_{21} & \mathbf{A}_{22} & \cdots &\mathbf{A}_{2s}\\
\vdots & \vdots & \ddots &\vdots \\
\mathbf{A}_{q1} & \mathbf{A}_{q2} & \cdots &\mathbf{A}_{qs}\end{bmatrix}
and a (p\times n) matrix \mathbf{B} with s row partitions and r column partitions
\mathbf{B} = \begin{bmatrix}
\mathbf{B}_{11} & \mathbf{B}_{12} & \cdots &\mathbf{B}_{1r}\\
\mathbf{B}_{21} & \mathbf{B}_{22} & \cdots &\mathbf{B}_{2r}\\
\vdots & \vdots & \ddots &\vdots \\
\mathbf{B}_{s1} & \mathbf{B}_{s2} & \cdots &\mathbf{B}_{sr}\end{bmatrix},
that are compatible with the partitions of A, the matrix product
\mathbf{C}=\mathbf{A}\mathbf{B}
can be formed blockwise, yielding \mathbf{C} as an (m\times n) matrix with q row partitions and r column partitions. The matrices in your matrix \mathbf{C} are calculated by multiplying:
\mathbf{C}_{\alpha \beta} = \sum^s_{\gamma=1}\mathbf{A}_{\alpha \gamma}\mathbf{B}_{\gamma \beta}.
Or, using the Einstein notation that implicitly sums over repeated indices:
\mathbf{C}_{\alpha \beta} = \mathbf{A}_{\alpha \gamma}\mathbf{B}_{\gamma \beta}.
Block matrix inversion[edit]
Main article: Helmert–Wolf blocking
If a matrix is partitioned into four blocks, it can be inverted blockwise as follows:
\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{CA}^{-1}\mathbf{B})^{-1}\mathbf{CA}^{-1} & -\mathbf{A}^{-1}\mathbf{B}(\mathbf{D}-\mathbf{CA}^{-1}\mathbf{B})^{-1} \\ -(\mathbf{D}-\mathbf{CA}^{-1}\mathbf{B})^{-1}\mathbf{CA}^{-1} & (\mathbf{D}-\mathbf{CA}^{-1}\mathbf{B})^{-1} \end{bmatrix},
\,
where A, B, C and D have arbitrary size. (A and D must be square, so that they can be inverted. Furthermore, A and D−CA−1B must be nonsingular.[5])
Equivalently,
\begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}^{-1} = \begin{bmatrix} (\mathbf{A}-\mathbf{BD}^{-1}\mathbf{C})^{-1} & -(\mathbf{A}-\mathbf{BD}^{-1}\mathbf{C})^{-1}\mathbf{BD}^{-1} \\ -\mathbf{D}^{-1}\mathbf{C}(\mathbf{A}-\mathbf{BD}^{-1}\mathbf{C})^{-1} & \quad \mathbf{D}^{-1}+\mathbf{D}^{-1}\mathbf{C}(\mathbf{A}-\mathbf{BD}^{-1}\mathbf{C})^{-1}\mathbf{BD}^{-1}\end{bmatrix}.
Block diagonal matrices [edit]
A block diagonal matrix is a block matrix which is a square matrix, and having main diagonal blocks square matrices, such that the off-diagonal blocks are zero matrices. A block diagonal matrix A has the form
\mathbf{A} = \begin{bmatrix}
\mathbf{A}_{1} & 0 & \cdots & 0 \\ 0 & \mathbf{A}_{2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{A}_{n}
\end{bmatrix}
where Ak is a square matrix; in other words, it is the direct sum of A1, …, An. It can also be indicated as A1 \oplus A2 \oplus\,\ldots\,\oplus An or diag(A1, A2,\ldots, An) (the latter being the same formalism used for a diagonal matrix). Any square matrix can trivially be considered a block diagonal matrix with only one block.
For the determinant and trace, the following properties hold
\operatorname{det} \mathbf{A} = \operatorname{det} \mathbf{A}_1 \times \ldots \times \operatorname{det} \mathbf{A}_n,
\operatorname{tr} \mathbf{A} = \operatorname{tr} \mathbf{A}_1 +\cdots +\operatorname{tr} \mathbf{A}_n.
The inverse of a block diagonal matrix is another block diagonal matrix, composed of the inverse of each block, as follows:
\begin{pmatrix}
\mathbf{A}_{1} & 0 & \cdots & 0 \\
0 & \mathbf{A}_{2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{A}_{n}
\end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}_{1}^{-1} & 0 & \cdots & 0 \\
0 & \mathbf{A}_{2}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathbf{A}_{n}^{-1}
\end{pmatrix}.
The eigenvalues and eigenvectors of A are simply those of A_{1} and A_{2} and ... and A_{n} (combined).
Block tridiagonal matrices[edit]
A block tridiagonal matrix is another special block matrix, which is just like the block diagonal matrix a square matrix, having square matrices (blocks) in the lower diagonal, main diagonal and upper diagonal, with all other blocks being zero matrices. It is essentially a tridiagonal matrix but has submatrices in places of scalars. A block tridiagonal matrix A has the form
\mathbf{A} = \begin{bmatrix}
\mathbf{B}_{1} & \mathbf{C}_{1} & & & \cdots & & 0 \\
\mathbf{A}_{2} & \mathbf{B}_{2} & \mathbf{C}_{2} & & & & \\
& \ddots & \ddots & \ddots & & & \vdots \\
& & \mathbf{A}_{k} & \mathbf{B}_{k} & \mathbf{C}_{k} & & \\
\vdots & & & \ddots & \ddots & \ddots & \\
& & & & \mathbf{A}_{n-1} & \mathbf{B}_{n-1} & \mathbf{C}_{n-1} \\
0 & & \cdots & & & \mathbf{A}_{n} & \mathbf{B}_{n}
\end{bmatrix}
where Ak, Bk and Ck are square sub-matrices of the lower, main and upper diagonal respectively.
Block tridiagonal matrices are often encountered in numerical solutions of engineering problems (e.g., computational fluid dynamics). Optimized numerical methods for LU factorization are available and hence efficient solution algorithms for equation systems with a block tridiagonal matrix as coefficient matrix. The Thomas algorithm, used for efficient solution of equation systems involving a tridiagonal matrix can also be applied using matrix operations to block tridiagonal matrices (see also Block LU decomposition).
Block Toeplitz matrices[edit]
A block Toeplitz matrix is another special block matrix, which contains blocks that are repeated down the diagonals of the matrix, as a Toeplitz matrix has elements repeated down the diagonal. The individual block matrix elements, Aij, must also be a Toeplitz matrix.
A block Toeplitz matrix A has the form
\mathbf{A} = \begin{bmatrix}
\mathbf{A}_{(1,1)} & \mathbf{A}_{(1,2)} & & & \cdots & \mathbf{A}_{(1,n-1)} & \mathbf{A}_{(1,n)} \\
\mathbf{A}_{(2,1)} & \mathbf{A}_{(1,1)} & \mathbf{A}_{(1,2)} & & & & \mathbf{A}_{(1,n-1)} \\
& \ddots & \ddots & \ddots & & & \vdots \\
& & \mathbf{A}_{(2,1)} & \mathbf{A}_{(1,1)} & \mathbf{A}_{(1,2)} & & \\
\vdots & & & \ddots & \ddots & \ddots & \\
\mathbf{A}_{(n-1,1)} & & & & \mathbf{A}_{(2,1)} & \mathbf{A}_{(1,1)} & \mathbf{A}_{(1,2)} \\
\mathbf{A}_{(n,1)} & \mathbf{A}_{(n-1,1)} & \cdots & & & \mathbf{A}_{(2,1)} & \mathbf{A}_{(1,1)}
\end{bmatrix}.
Direct sum[edit]
For any arbitrary matrices A (of size m × n) and B (of size p × q), we have the direct sum of A and B, denoted by A \oplus B and defined as
\mathbf{A} \oplus \mathbf{B} =
\begin{bmatrix}
a_{11} & \cdots & a_{1n} & 0 & \cdots & 0 \\
\vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\
a_{m 1} & \cdots & a_{mn} & 0 & \cdots & 0 \\
0 & \cdots & 0 & b_{11} & \cdots & b_{1q} \\
\vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\
0 & \cdots & 0 & b_{p1} & \cdots & b_{pq}
\end{bmatrix}.
For instance,
\begin{bmatrix}
1 & 3 & 2 \\
2 & 3 & 1
\end{bmatrix}
\oplus
\begin{bmatrix}
1 & 6 \\
0 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 3 & 2 & 0 & 0 \\
2 & 3 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 6 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
This operation generalizes naturally to arbitrary dimensioned arrays (provided that A and B have the same number of dimensions).
Note that any element in the direct sum of two vector spaces of matrices could be represented as a direct sum of two matrices.
Direct product[edit]
Main article: Kronecker product
Application[edit]
In linear algebra terms, the use of a block matrix corresponds to having a linear mapping thought of in terms of corresponding 'bunches' of basis vectors. That again matches the idea of having distinguished direct sum decompositions of the domain and range. It is always particularly significant if a block is the zero matrix; that carries the information that a summand maps into a sub-sum.
Given the interpretation via linear mappings and direct sums, there is a special type of block matrix that occurs for square matrices (the case m = n). For those we can assume an interpretation as an endomorphism of an n-dimensional space V; the block structure in which the bunching of rows and columns is the same is of importance because it corresponds to having a single direct sum decomposition on V (rather than two). In that case, for example, the diagonal blocks in the obvious sense are all square. This type of structure is required to describe the Jordan normal form.
This technique is used to cut down calculations of matrices, column-row expansions, and many computer science applications, including VLSI chip design. An example is the Strassen algorithm for fast matrix multiplication, as well as the Hamming(7,4) encoding for error detection and recovery in data transmissions.
QMRIn linear algebra, a Jordan normal form (often called Jordan canonical form)[1] of a linear operator on a finite-dimensional vector space is an upper triangular matrix of a particular form called a Jordan matrix, representing the operator with respect to some basis. Such a matrix has each non-zero off-diagonal entry equal to 1, immediately above the main diagonal (on the superdiagonal), and with identical diagonal entries to the left and below them. If the vector space is over a field K, then a basis with respect to which the matrix has the required form exists if and only if all eigenvalues of the matrix lie in K, or equivalently if the characteristic polynomial of the operator splits into linear factors over K. This condition is always satisfied if K is the field of complex numbers. The diagonal entries of the normal form are the eigenvalues of the operator, with the number of times each one occurs being given by its algebraic multiplicity.[2][3][4]
If the operator is originally given by a square matrix M, then its Jordan normal form is also called the Jordan normal form of M. Any square matrix has a Jordan normal form if the field of coefficients is extended to one containing all the eigenvalues of the matrix. In spite of its name, the normal form for a given M is not entirely unique, as it is a block diagonal matrix formed of Jordan blocks, the order of which is not fixed; it is conventional to group blocks for the same eigenvalue together, but no ordering is imposed among the eigenvalues, nor among the blocks for a given eigenvalue, although the latter could for instance be ordered by weakly decreasing size.[2][3][4]
The Jordan–Chevalley decomposition is particularly simple with respect to a basis for which the operator takes its Jordan normal form. The diagonal form for diagonalizable matrices, for instance normal matrices, is a special case of the Jordan normal form.[5][6][7]
The Jordan normal form is named after Camille Jordan.One can see that the Jordan normal form is essentially a classification result for square matrices, and as such several important results from linear algebra can be viewed as its consequences.
Spectral mapping theorem[edit]
Using the Jordan normal form, direct calculation gives a spectral mapping theorem for the polynomial functional calculus: Let A be an n × n matrix with eigenvalues λ1, ..., λn, then for any polynomial p, p(A) has eigenvalues p(λ1), ..., p(λn).
Cayley–Hamilton theorem[edit]
The Cayley–Hamilton theorem asserts that every matrix A satisfies its characteristic equation: if p is the characteristic polynomial of A, then p(A) = 0. This can be shown via direct calculation in the Jordan form, since any Jordan block for λ is annihilated by (X − λ)m where m is the multiplicity of the root λ of p, the sum of the sizes of the Jordan blocks for λ, and therefore no less than the size of the block in question. The Jordan form can be assumed to exist over a field extending the base field of the matrix, for instance over the splitting field of p; this field extension does not change the matrix p(A) in any way.
Minimal polynomial[edit]
The minimal polynomial P of a square matrix A is the unique monic polynomial of least degree, m, such that P(A) = 0. Alternatively, the set of polynomials that annihilate a given A form an ideal I in C[x], the principal ideal domain of polynomials with complex coefficients. The monic element that generates I is precisely P.
Let λ1, ..., λq be the distinct eigenvalues of A, and si be the size of the largest Jordan block corresponding to λi. It is clear from the Jordan normal form that the minimal polynomial of A has degree Σsi.
While the Jordan normal form determines the minimal polynomial, the converse is not true. This leads to the notion of elementary divisors. The elementary divisors of a square matrix A are the characteristic polynomials of its Jordan blocks. The factors of the minimal polynomial m are the elementary divisors of the largest degree corresponding to distinct eigenvalues.
The degree of an elementary divisor is the size of the corresponding Jordan block, therefore the dimension of the corresponding invariant subspace. If all elementary divisors are linear, A is diagonalizable.
Invariant subspace decompositions[edit]
The Jordan form of a n × n matrix A is block diagonal, and therefore gives a decomposition of the n dimensional Euclidean space into invariant subspaces of A. Every Jordan block Ji corresponds to an invariant subspace Xi. Symbolically, we put
\mathbb{C}^n = \bigoplus_{i = 1}^k X_i
where each Xi is the span of the corresponding Jordan chain, and k is the number of Jordan chains.
One can also obtain a slightly different decomposition via the Jordan form. Given an eigenvalue λi, the size of its largest corresponding Jordan block si is called the index of λi and denoted by ν(λi). (Therefore the degree of the minimal polynomial is the sum of all indices.) Define a subspace Yi by
\; Y_i = \operatorname{Ker} (\lambda_i I - A)^{\nu(\lambda_i)}.
This gives the decomposition
\mathbb{C}^n = \bigoplus_{i = 1}^l Y_i
where l is the number of distinct eigenvalues of A. Intuitively, we glob together the Jordan block invariant subspaces corresponding to the same eigenvalue. In the extreme case where A is a multiple of the identity matrix we have k = n and l = 1.
The projection onto Yi and along all the other Yj ( j ≠ i ) is called the spectral projection of A at λi and is usually denoted by P(λi ; A). Spectral projections are mutually orthogonal in the sense that P(λi ; A) P(λj ; A) = 0 if i ≠ j. Also they commute with A and their sum is the identity matrix. Replacing every λi in the Jordan matrix J by one and zeroising all other entries gives P(λi ; J), moreover if U J U−1 is the similarity transformation such that A = U J U−1 then P(λi ; A) = U P(λi ; J) U−1. They are not confined to finite dimensions. See below for their application to compact operators, and in holomorphic functional calculus for a more general discussion.
Comparing the two decompositions, notice that, in general, l ≤ k. When A is normal, the subspaces Xi's in the first decomposition are one-dimensional and mutually orthogonal. This is the spectral theorem for normal operators. The second decomposition generalizes more easily for general compact operators on Banach spaces.
It might be of interest here to note some properties of the index, ν(λ). More generally, for a complex number λ, its index can be defined as the least non-negative integer ν(λ) such that
\mathrm{Ker}(\lambda - A)^{\nu(\lambda)} = \operatorname{Ker} (\lambda - A)^m, \; \forall m \geq \nu(\lambda) .
So ν(λ) > 0 if and only if λ is an eigenvalue of A. In the finite-dimensional case, ν(λ) ≤ the algebraic multiplicity of λ.
Generalizations[edit]
Matrices with entries in a field[edit]
Jordan reduction can be extended to any square matrix M whose entries lie in a field K. The result states that any M can be written as a sum D + N where D is semisimple, N is nilpotent, and DN = ND. This is called the Jordan–Chevalley decomposition. Whenever K contains the eigenvalues of M, in particular when K is algebraically closed, the normal form can be expressed explicitly as the direct sum of Jordan blocks.
Similar to the case when K is the complex numbers, knowing the dimensions of the kernels of (M − λI)k for 1 ≤ k ≤ m, where m is the algebraic multiplicity of the eigenvalue λ, allows one to determine the Jordan form of M. We may view the underlying vector space V as a K[x]-module by regarding the action of x on V as application of M and extending by K-linearity. Then the polynomials (x − λ)k are the elementary divisors of M, and the Jordan normal form is concerned with representing M in terms of blocks associated to the elementary divisors.
The proof of the Jordan normal form is usually carried out as an application to the ring K[x] of the structure theorem for finitely generated modules over a principal ideal domain, of which it is a corollary.
Compact operators[edit]
In a different direction, for compact operators on a Banach space, a result analogous to the Jordan normal form holds. One restricts to compact operators because every point x in the spectrum of a compact operator T, the only exception being when x is the limit point of the spectrum, is an eigenvalue. This is not true for bounded operators in general. To give some idea of this generalization, we first reformulate the Jordan decomposition in the language of functional analysis.
Simply laced groups[edit]
A simply laced group is a Lie group whose Dynkin diagram only contain simple links, and therefore all the nonzero roots of the corresponding Lie algebra have the same length. The A, D and E series groups are all simply laced, but no group of type B, C, F, or G is simply laced.
QMRA tridiagonal matrix is a matrix that is both upper and lower Hessenberg matrix.[2] In particular, a tridiagonal matrix is a direct sum of p 1-by-1 and q 2-by-2 matrices such that p + q/2 = n -- the dimension of the tridiagonal. Although a general tridiagonal matrix is not necessarily symmetric or Hermitian, many of those that arise when solving linear algebra problems have one of these properties. Furthermore, if a real tridiagonal matrix A satisfies ak,k+1 ak+1,k > 0 for all k, so that the signs of its entries are symmetric, then it is similar to a Hermitian matrix, by a diagonal change of basis matrix. Hence, its eigenvalues are real. If we replace the strict inequality by ak,k+1 ak+1,k ≥ 0, then by continuity, the eigenvalues are still guaranteed to be real, but the matrix need no longer be similar to a Hermitian matrix.[3]
The set of all n × n tridiagonal matrices forms a 3n-2 dimensional vector space.
Many linear algebra algorithms require significantly less computational effort when applied to diagonal matrices, and this improvement often carries over to tridiagonal matrices as well.
Determinant[edit]
Main article: continuant (mathematics)
The determinant of a tridiagonal matrix A of order n can be computed from a three-term recurrence relation.[4] Write f1 = |a1| = a1 and
f_n = \begin{vmatrix}
a_1 & b_1 \\
c_1 & a_2 & b_2 \\
& c_2 & \ddots & \ddots \\
& & \ddots & \ddots & b_{n-1} \\
& & & c_{n-1} & a_n
\end{vmatrix}.
The sequence (fi) is called the continuant and satisfies the recurrence relation
f_n = a_n f_{n-1} - c_{n-1}b_{n-1}f_{n-2}
with initial values f0 = 1 and f-1 = 0. The cost of computing the determinant of a tridiagonal matrix using this formula is linear in n, while the cost is cubic for a general matrix.
The inverse of a non-singular tridiagonal matrix T
T = \begin{pmatrix}
a_1 & b_1 \\
c_1 & a_2 & b_2 \\
& c_2 & \ddots & \ddots \\
& & \ddots & \ddots & b_{n-1} \\
& & & c_{n-1} & a_n
\end{pmatrix}
is given by
(T^{-1})_{ij} = \begin{cases}
(-1)^{i+j}b_i \cdots b_{j-1} \theta_{i-1} \phi_{j+1}/\theta_n & \text{ if } i \leq j\\
(-1)^{i+j}c_j \cdots c_{i-1} \theta_{j-1} \phi_{i+1}/\theta_n & \text{ if } i > j\\
\end{cases}
where the θi satisfy the recurrence relation
\theta_i = a_i \theta_{i-1} - b_{i-1}c_{i-1}\theta_{i-2} \quad \text{ for } i=2,3,\ldots,n
with initial conditions θ0 = 1, θ1 = a1 and the ϕi satisfy
\phi_i = a_i \phi_{i+1} - b_i c_i \phi_{i+2} \quad \text{ for } i=n-1,\ldots,1
with initial conditions ϕn+1 = 1 and ϕn = an.[5][6]
Closed form solutions can be computed for special cases such as symmetric matrices with all off-diagonal elements equal[7] or Toeplitz matrices[8] and for the general case as well.[9][10]
In general, the inverse of a tridiagonal matrix is a semiseparable matrix and vice versa.[11]
Solution of linear system[edit]
Main article: tridiagonal matrix algorithm
A system of equations A x = b for \scriptstyle b\in \reals^n can be solved by an efficient form of Gaussian elimination when A is tridiagonal called tridiagonal matrix algorithm, requiring O(n) operations.[12]
Eigenvalues[edit]
When a tridiagonal matrix is also Toeplitz, there is a simple closed-form solution for its eigenvalues, namely a + 2 \sqrt{bc} \, \cos(k \pi / {(n+1)}) , for k=1,...,n. [13][14]
A real symmetric tridiagonal matrix has real eigenvalues, and all the eigenvalues are distinct (simple) if all off-diagonal elements are nonzero.[15] Numerous methods exist for the numerical computation of the eigenvalues of a real symmetric tridiagonal matrix to arbitrary finite precision, typically requiring O(n^2) operations for a matrix of size n\times n, although fast algorithms exist which require only O(n\ln n).[16]
Computer programming[edit]
A transformation that reduces a general matrix to Hessenberg form will reduce a Hermitian matrix to tridiagonal form. So, many eigenvalue algorithms, when applied to a Hermitian matrix, reduce the input Hermitian matrix to tridiagonal form as a first step.
A tridiagonal matrix can also be stored more efficiently than a general matrix by using a special storage scheme. For instance, the LAPACK Fortran package stores an unsymmetric tridiagonal matrix of order n in three one-dimensional arrays, one of length n containing the diagonal elements, and two of length n − 1 containing the subdiagonal and superdiagonal elements.
QMRIn linear algebra, a pentadiagonal matrix is a matrix that is nearly diagonal; to be exact, it is a matrix in which the only nonzero entries are on the main diagonal, and the first two diagonals above and below it. So it is of the form
{\begin{pmatrix}c_{1}&d_{1}&e_{1}&0&\cdots &\cdots &0\\b_{1}&c_{2}&d_{2}&e_{2}&\ddots &&\vdots \\a_{1}&b_{2}&\ddots &\ddots &\ddots &\ddots &\vdots \\0&a_{2}&\ddots &\ddots &\ddots &e_{n-3}&0\\\vdots &\ddots &\ddots &\ddots &\ddots &d_{n-2}&e_{n-2}\\\vdots &&\ddots &a_{n-3}&b_{n-2}&c_{n-1}&d_{n-1}\\0&\cdots &\cdots &0&a_{n-2}&b_{n-1}&c_{n}\end{pmatrix}}.
It follows that a pentadiagonal matrix has at most 5n-6 nonzero entries, where n is the size of the matrix. Hence, pentadiagonal matrices are sparse. This makes them useful in numerical analysis.
In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The fraction of non-zero elements over the total number of elements (i.e., that can fit into the matrix, say a matrix of dimension of m x n can accommodate m x n total number of elements) in a matrix is called the sparsity (density).
Conceptually, sparsity corresponds to systems which are loosely coupled. Consider a line of balls connected by springs from one to the next: this is a sparse system as only adjacent balls are coupled. By contrast, if the same line of balls had springs connecting each ball to all other balls, the system would correspond to a dense matrix. The concept of sparsity is useful in combinatorics and application areas such as network theory, which have a low density of significant data or connections.
Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations.
When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus require significantly less storage. Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms.
Storing a sparse matrix[edit]
A matrix is typically stored as a two-dimensional array. Each entry in the array represents an element ai,j of the matrix and is accessed by the two indices i and j. Conventionally, i is the row index, numbered from top to bottom, and j is the column index, numbered from left to right. For an m × n matrix, the amount of memory required to store the matrix in this format is proportional to m × n (disregarding the fact that the dimensions of the matrix also need to be stored).
In the case of a sparse matrix, substantial memory requirement reductions can be realized by storing only the non-zero entries. Depending on the number and distribution of the non-zero entries, different data structures can be used and yield huge savings in memory when compared to the basic approach. The trade-off is that accessing the individual elements becomes more complex and additional structures are needed to be able to recover the original matrix unambiguously.
Formats can be divided into two groups:
Those that support efficient modification, such as DOK (Dictionary of keys), LIL (List of lists), or COO (Coordinate list). These are typically used to construct the matrices.
Those that support efficient access and matrix operations, such as CSR (Compressed Sparse Row) or CSC (Compressed Sparse Column).
Dictionary of keys (DOK)[edit]
DOK consists of a dictionary that maps (row, column)-pairs to the value of the elements. Elements that are missing from the dictionary are taken to be zero. The format is good for incrementally constructing a sparse matrix in random order, but poor for iterating over non-zero values in lexicographical order. One typically constructs a matrix in this format and then converts to another more efficient format for processing.[1]
List of lists (LIL)[edit]
LIL stores one list per row, with each entry containing the column index and the value. Typically, these entries are kept sorted by column index for faster lookup. This is another format good for incremental matrix construction.[2]
Coordinate list (COO)[edit]
COO stores a list of (row, column, value) tuples. Ideally, the entries are sorted (by row index, then column index) to improve random access times. This is another format which is good for incremental matrix construction.[3]
Yale[edit]
The Yale sparse matrix format stores an initial sparse m × n matrix, M, in row form using three (one-dimensional) arrays (A, IA, JA). Let NNZ denote the number of nonzero entries in M. (Note that zero-based indices shall be used here.)
The array A is of length NNZ and holds all the nonzero entries of M in left-to-right top-to-bottom ("row-major") order.
The array IA is of length m + 1 and contains the index in A of the first element in each row, followed by the total number of nonzero elements NNZ. IA[i] contains the index in A of the first nonzero element of row i. Row i of the original matrix extends from A[IA[i]] to A[IA[i + 1] − 1], i.e. from the start of one row to the last index before the start of the next. The last entry, IA[m], must be the number of elements in A. Another way to understand this array is to define IA[0] as zero ,then the nth element IA[n] is the total count of nonzero element from row[0] to row[n-1]. [4]
The third array, JA, contains the column index in M of each element of A and hence is of length NNZ as well.
For example, the matrix
\begin{pmatrix}
0 & 0 & 0 & 0 \\
5 & 8 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 6 & 0 & 0 \\
\end{pmatrix}
is a 4 × 4 matrix with 4 nonzero elements, hence
So, in array JA, the element "5" from A has column index 0, "8" and "6" have index 1, and element "3" has index 2.
In this case the Yale representation contains 13 entries, compared to 16 in the original matrix. The Yale format saves on memory only when NNZ < (m (n − 1) − 1) / 2. Another example, the matrix
\begin{pmatrix}
10 & 20 & 0 & 0 & 0 & 0 \\
0 & 30 & 0 & 40 & 0 & 0 \\
0 & 0 & 50 & 60 & 70 & 0 \\
0 & 0 & 0 & 0 & 0 & 80 \\
\end{pmatrix}
is a 4 × 6 matrix (24 entries) with 8 nonzero elements, so
The whole is stored as 21 entries.
IA splits the array A into rows: (10, 20) (30, 40) (50, 60, 70) (80);
JA aligns values in columns: (10, 20, ...) (0, 30, 0, 40, ...)(0, 0, 50, 60, 70, 0) (0, 0, 0, 0, 0, 80).
Note that in this format, the first value of IA is always zero and the last is always NNZ, so they are in some sense redundant. However, they can make accessing and traversing the array easier for the programmer.
Compressed row Storage (CRS or CSR)[edit]
CSR is effectively identical to the Yale Sparse Matrix format, except that the column array is normally stored ahead of the row index array. I.e. CSR is (val, col_ind, row_ptr), where val is an array of the (left-to-right, then top-to-bottom) non-zero values of the matrix; col_ind is the column indices corresponding to the values; and, row_ptr is the list of value indexes where each row starts. The name is based on the fact that row index information is compressed relative to the COO format. One typically uses another format (LIL, DOK, COO) for construction. This format is efficient for arithmetic operations, row slicing, and matrix-vector products. See scipy.sparse.csr_matrix.
CSC is similar to CSR except that values are read first by column, a row index is stored for each value, and column pointers are stored. I.e. CSC is (val, row_ind, col_ptr), where val is an array of the (top-to-bottom, then left-to-right) non-zero values of the matrix; row_ind is the row indices corresponding to the values; and, col_ptr is the list of val indexes where each column starts. The name is based on the fact that column index information is compressed relative to the COO format. One typically uses another format (LIL, DOK, COO) for construction. This format is efficient for arithmetic operations, column slicing, and matrix-vector products. See scipy.sparse.csc_matrix. This is the traditional format for specifying a sparse matrix in MATLAB (via the sparse function).
Recall matrices ar quadrant grids
Banded[edit]
Main article: Band matrix
An important special type of sparse matrices is band matrix, defined as follows. The lower bandwidth of a matrix A is the smallest number p such that the entry ai,j vanishes whenever i > j + p. Similarly, the upper bandwidth is the smallest number p such that ai,j = 0 whenever i < j − p (Golub & Van Loan 1996, §1.2.1). For example, a tridiagonal matrix has lower bandwidth 1 and upper bandwidth 1. As another example, the following sparse matrix has lower and upper bandwidth both equal to 3. Notice that zeros are represented with dots for clarity.
\left(
\begin{smallmatrix}
X & X & X & \cdot & \cdot & \cdot & \cdot & \\
X & X & \cdot & X & X & \cdot & \cdot & \\
X & \cdot & X & \cdot & X & \cdot & \cdot & \\
\cdot & X & \cdot & X & \cdot & X & \cdot & \\
\cdot & X & X & \cdot & X & X & X & \\
\cdot & \cdot & \cdot & X & X & X & \cdot & \\
\cdot & \cdot & \cdot & \cdot & X & \cdot & X & \\
\end{smallmatrix}
\right)
Matrices with reasonably small upper and lower bandwidth are known as band matrices and often lend themselves to simpler algorithms than general sparse matrices; or one can sometimes apply dense matrix algorithms and gain efficiency simply by looping over a reduced number of indices.
By rearranging the rows and columns of a matrix A it may be possible to obtain a matrix A′ with a lower bandwidth. A number of algorithms are designed for bandwidth minimization.
Diagonal[edit]
A very efficient structure for an extreme case of band matrices, the diagonal matrix, is to store just the entries in the main diagonal as a one-dimensional array, so a diagonal n × n matrix requires only n entries.
Symmetric[edit]
A symmetric sparse matrix arises as the adjacency matrix of an undirected graph; it can be stored efficiently as an adjacency list.
Reducing fill-in[edit]
The fill-in of a matrix are those entries which change from an initial zero to a non-zero value during the execution of an algorithm. To reduce the memory requirements and the number of arithmetic operations used during an algorithm it is useful to minimize the fill-in by switching rows and columns in the matrix. The symbolic Cholesky decomposition can be used to calculate the worst possible fill-in before doing the actual Cholesky decomposition.
There are other methods than the Cholesky decomposition in use. Orthogonalization methods (such as QR factorization) are common, for example, when solving problems by least squares methods. While the theoretical fill-in is still the same, in practical terms the "false non-zeros" can be different for different methods. And symbolic versions of those algorithms can be used in the same manner as the symbolic Cholesky to compute worst case fill-in.
Solving sparse matrix equations[edit]
Both iterative and direct methods exist for sparse matrix solving.
Iterative methods, such as conjugate gradient method and GMRES utilize fast computations of matrix-vector products Ax_i, where matrix A is sparse. The use of preconditioners can significantly accelerate convergence of such iterative methods.
QMRBabylonian clay tablet YBC 7289 (c. 1800–1600 BC) with annotations. The approximation of the square root of 2 is four sexagesimal figures, which is about six decimal figures. 1 + 24/60 + 51/602 + 10/603 = 1.41421296...[1]
The picture is a quadrant
QMR In the mathematical discipline of matrix theory, a Jordan block over a ring R (whose identities are the zero 0 and one 1) is a matrix composed of 0 elements everywhere except for the diagonal, which is filled with a fixed element \lambda \in R, and for the superdiagonal, which is composed of ones. The concept is named after Camille Jordan.
{\begin{pmatrix}\lambda &1&0&\cdots &0\\0&\lambda &1&\cdots &0\\\vdots &\vdots &\vdots &\ddots &\vdots \\0&0&0&\lambda &1\\0&0&0&0&\lambda \end{pmatrix}}
Every Jordan block is thus specified by its dimension n and its eigenvalue \lambda and is indicated as J_{\lambda ,n}. Any block diagonal matrix whose blocks are Jordan blocks is called a Jordan matrix; using either the \oplus or the “\mathrm {diag} ” symbol, the (l+m+n)\times (l+m+n) block diagonal square matrix whose first diagonal block is J_{\alpha ,l}, whose second diagonal block is J_{\beta ,m} and whose third diagonal block is J_{\gamma ,n} is compactly indicated as J_{\alpha ,l}\oplus J_{\beta ,m}\oplus J_{\gamma ,n} or \mathrm {diag} \left(J_{\alpha ,l},J_{\beta ,m},J_{\gamma ,n}\right), respectively. For example the matrix
J=\left({\begin{matrix}0&1&0&0&0&0&0&0&0&0\\0&0&1&0&0&0&0&0&0&0\\0&0&0&0&0&0&0&0&0&0\\0&0&0&i&1&0&0&0&0&0\\0&0&0&0&i&0&0&0&0&0\\0&0&0&0&0&i&1&0&0&0\\0&0&0&0&0&0&i&0&0&0\\0&0&0&0&0&0&0&7&1&0\\0&0&0&0&0&0&0&0&7&1\\0&0&0&0&0&0&0&0&0&7\end{matrix}}\right)
is a 10\times 10 Jordan matrix with a 3\times 3 block with eigenvalue 0, two 2\times 2 blocks with eigenvalue the imaginary unit and a 3\times 3 block with eigenvalue 7. Its Jordan-block structure can also be written as either J_{0,3}\oplus J_{i,2}\oplus J_{i,2}\oplus J_{7,3} or \mathrm {diag} \left(J_{0,3},J_{i,2},J_{i,2},J_{7,3}\right).
Linear algebra[edit]
Any n\times n square matrix A whose elements are in an algebraically closed field K is similar to a Jordan matrix J, also in \mathbb {M} _{n}(K), which is unique up to a permutation of its diagonal blocks themselves. J is called the Jordan normal form of A and corresponds to a generalization of the diagonalization procedure.[1][2][3] A diagonalizable matrix is similar, in fact, to a special case of Jordan matrix: the matrix whose blocks are all 1\times 1.[4][5][6]
More generally, given a Jordan matrix J=J_{\lambda _{1},m_{1}}\oplus J_{\lambda _{2},m_{2}}\oplus \ldots \oplus J_{\lambda _{N},m_{N}}, i.e. whose k^{\text{th}} diagonal block, 1\leq k\leq N is the Jordan block J_{\lambda _{k},m_{k}} and whose diagonal elements \lambda _{k} may not all be distinct, the geometric multiplicity of \lambda \in K for the matrix J, indicated as \mathrm {gmul} _{J}\lambda \,, corresponds to the number of Jordan blocks whose eigenvalue is \lambda . Whereas the index of an eigenvalue \lambda for J, indicated as \mathrm {idx} _{J}\lambda \,, is defined as the dimension of the largest Jordan block associated to that eigenvalue.
The same goes for all the matrices A similar to J, so \mathrm {idx} _{A}\lambda \, can be defined accordingly with respect to the Jordan normal form of A for any of its eigenvalues \lambda \in \mathrm {spec} A. In this case one can check that the index of \lambda for A is equal to its multiplicity as a root of the minimal polynomial of A (whereas, by definition, its algebraic multiplicity for A, \mathrm {mul} _{A}\lambda \,, is its multiplicity as a root of the characteristic polynomial of A, i.e. \det(A-xI)\in K[x]). An equivalent necessary and sufficient condition for A to be diagonalizable in K is that all of its eigenvalues have index equal to 1, i.e. its minimal polynomial has only simple roots.
Note that knowing a matrix's spectrum with all of its algebraic/geometric multiplicities and indexes does not always allow for the computation of its Jordan normal form (this may be a sufficient condition only for spectrally simple, usually low-dimensional matrices): the Jordan decomposition is, in general, a computationally challenging task. From the vector space point of view, the Jordan decomposition is equivalent to finding an orthogonal decomposition (i.e. via direct sums of eigenspaces represented by Jordan blocks) of the domain which the associated generalized eigenvectors make a basis for.
Functions of matrices[edit]
Let A\in \mathbb {M} _{n}(\mathbb {C} ) (i.e. a n\times n complex matrix) and C\in \mathrm {GL} _{n}(\mathbb {C} ) be the change of basis matrix to the Jordan normal form of A, i.e. A=C^{-1}JC. Now let f(z) be a holomorphic function on an open set {\mathit {\Omega }} such that \mathrm {spec} A\subset {\mathit {\Omega }}\subseteq \mathbb {C} , i.e. the spectrum of the matrix is contained inside the domain of holomorphy of f. Let
f(z)=\sum _{h=0}^{\infty }a_{h}(z-z_{0})^{h}
be the power series expansion of f around z_{0}\in {\mathit {\Omega }}\backslash \mathrm {spec} A, which will be hereinafter supposed to be 0 for simplicity's sake. The matrix f(A) is then defined via the following formal power series
f(A)=\sum _{h=0}^{\infty }a_{h}A^{h}
is absolutely convergent with respect to the Euclidean norm of \mathbb {M} _{n}(\mathbb {C} ). To put it another way, f(A)\, converges absolutely for every square matrix whose spectral radius is less than the radius of convergence of f around 0 and is uniformly convergent on any compact subsets of \mathbb {M} _{n}(\mathbb {C} ) satisfying this property in the matrix Lie group topology.
The Jordan normal form allows the computation of functions of matrices without explicitly computing an infinite series, which is one of the main achievements of Jordan matrices. Using the facts that the k^{\mathrm {th} } power (k\in \mathbb {N} _{0}) of a diagonal block matrix is the diagonal block matrix whose blocks are the k^{\mathrm {th} } powers of the respective blocks, i.e. \left(A_{1}\oplus A_{2}\oplus A_{3}\oplus \ldots \right)^{k}=A_{1}^{k}\oplus A_{2}^{k}\oplus A_{3}^{k}\oplus \ldots , and that A^{k}=C^{-1}J^{k}C\,, the above matrix power series becomes
f(A)=C^{-1}f(J)C=C^{-1}\left(\bigoplus _{k=1}^{N}f\left(J_{\lambda _{k},m_{k}}\right)\right)C
where the last series must not be computed explicitly via power series of every Jordan block. In fact, if \lambda \in {\mathit {\Omega }}, any holomorphic function of a Jordan block f(J_{\lambda ,n})\, is the following upper triangular matrix:
f(J_{\lambda ,n})=\left({\begin{matrix}f(\lambda )&f^{\prime }(\lambda )&{\frac {f^{\prime \prime }(\lambda )}{2}}&\cdots &{\frac {f^{(n-2)}(\lambda )}{(n-2)!}}&{\frac {f^{(n-1)}(\lambda )}{(n-1)!}}\\0&f(\lambda )&f^{\prime }(\lambda )&\cdots &{\frac {f^{(n-3)}(\lambda )}{(n-3)!}}&{\frac {f^{(n-2)}(\lambda )}{(n-2)!}}\\0&0&f(\lambda )&\cdots &{\frac {f^{(n-4)}(\lambda )}{(n-4)!}}&{\frac {f^{(n-3)}(\lambda )}{(n-3)!}}\\\vdots &\vdots &\vdots &\ddots &\vdots &\vdots \\0&0&0&\cdots &f(\lambda )&f^{\prime }(\lambda )\\0&0&0&\cdots &0&f(\lambda )\\\end{matrix}}\right)=\left({\begin{matrix}a_{0}&a_{1}&a_{2}&\cdots &a_{n-2}&a_{n-1}\\0&a_{0}&a_{1}&\cdots &a_{n-3}&a_{n-2}\\0&0&a_{0}&\cdots &a_{n-4}&a_{n-3}\\\vdots &\vdots &\vdots &\ddots &\vdots &\vdots \\0&0&0&\cdots &a_{0}&a_{1}\\0&0&0&\cdots &0&a_{0}\\\end{matrix}}\right).
As a consequence of this, the computation of any functions of a matrix is straightforward whenever its Jordan normal form and its change-of-basis matrix are known. Also, \mathrm {spec} f(A)=f(\mathrm {spec} A), i.e. every eigenvalue \lambda \in \mathrm {spec} A corresponds to the eigenvalue f(\lambda )\in \mathrm {spec} f(A), but it has, in general, different algebraic multiplicity, geometric multiplicity and index. However, the algebraic multiplicity may be computed as follows:
{\text{mul}}_{f(A)}f(\lambda )=\sum _{\mu \in {\text{spec}}A\cap f^{-1}(f(\lambda ))}~{\text{mul}}_{A}\mu .\,
The function f(T) of a linear transformation T between vector spaces can be defined in a similar way according to the holomorphic functional calculus, where Banach space and Riemann surface theories play a fundamental role. In the case of finite-dimensional spaces, both theories perfectly match.
Dynamical systems[edit]
Now suppose a (complex) dynamical system is simply defined by the equation
{\dot {\mathbf {z} }}(t)=A(\mathbf {c} )\mathbf {z} (t),
\mathbf {z} (0)=\mathbf {z} _{0}\in \mathbb {C} ^{n},
where \mathbf {z} :\mathbb {R_{+}} \rightarrow {\mathcal {R}} is the (n-dimensional) curve parametrization of an orbit on the Riemann surface {\mathcal {R}} of the dynamical system, whereas A(\mathbf {c} ) is an n\times n complex matrix whose elements are complex functions of a d-dimensional parameter \mathbf {c} \in \mathbb {C} ^{d}. Even if A\in \mathbb {M} _{n}\left(\mathrm {C} ^{0}(\mathbb {C} ^{d})\right) (i.e. A continuously depends on the parameter \mathbf {c} ) the Jordan normal form of the matrix is continuously deformed almost everywhere on \mathbb {C} ^{d} but, in general, not everywhere: there is some critical submanifold of \mathbb {C} ^{d} on which the Jordan form abruptly changes its structure whenever the parameter crosses or simply “travels” around it (monodromy). Such changes mean that several Jordan blocks (either belonging to different eigenvalues or not) join together to a unique Jordan block, or vice versa (i.e. one Jordan block splits into two or more different ones). Many aspects of bifurcation theory for both continuous and discrete dynamical systems can be interpreted with the analysis of functional Jordan matrices.
From the tangent space dynamics, this means that the orthogonal decomposition of the dynamical system's phase space changes and, for example, different orbits gain periodicity, or lose it, or shift from a certain kind of periodicity to another (such as period-doubling, cfr. logistic map).
In a sentence, the qualitative behaviour of such a dynamical system may substantially change as the versal deformation of the Jordan normal form of A(\mathbf {c} ).
Linear ordinary differential equations[edit]
The simplest example of a dynamical system is a system of linear, constant-coefficient, ordinary differential equations, i.e. let A\in \mathbb {M} _{n}(\mathbb {C} ) and \mathbf {z} _{0}\in \mathbb {C} ^{n}:
{\dot {\mathbf {z} }}(t)=A\mathbf {z} (t),
\mathbf {z} (0)=\mathbf {z} _{0},
whose direct closed-form solution involves computation of the matrix exponential:
\mathbf {z} (t)=e^{tA}\mathbf {z} _{0}.
Another way, provided the solution is restricted to the local Lebesgue space of n-dimensional vector fields \mathbf {z} \in \mathrm {L} _{\mathrm {loc} }^{1}(\mathbb {R} _{+})^{n}, is to use its Laplace transform \mathbf {Z} (s)={\mathcal {L}}[\mathbf {z} ](s). In this case
\mathbf {Z} (s)=\left(sI-A\right)^{-1}\mathbf {z} _{0}.
The matrix function \left(A-sI\right)^{-1} is called the resolvent matrix of the differential operator {\frac {\mathrm {d} }{\mathrm {d} t}}-A. It is meromorphic with respect to the complex parameter s\in \mathbb {C} since its matrix elements are rational functions whose denominator is equal for all to \det(A-sI). Its polar singularities are the eigenvalues of A, whose order equals their index for it, i.e. \mathrm {ord} _{(A-sI)^{-1}}\lambda =\mathrm {idx} _{A}\lambda .
QMRIn mathematics, the matrix exponential is a matrix function on square matrices analogous to the ordinary exponential function. Abstractly, the matrix exponential gives the connection between a matrix Lie algebra and the corresponding Lie group.
Let X be an n×n real or complex matrix. The exponential of X, denoted by eX or exp(X), is the n×n matrix given by the power series
e^X = \sum_{k=0}^\infty{1 \over k!}X^k.
The above series always converges, so the exponential of X is well-defined. If X is a 1×1 matrix the matrix exponential of X is a 1×1 matrix whose single element is the ordinary exponential of the single element of X.
Properties[edit]
Let X and Y be n×n complex matrices and let a and b be arbitrary complex numbers. We denote the n×n identity matrix by I and the zero matrix by 0. The matrix exponential satisfies the following properties:[1]
e0 = I
eaXebX = e(a + b)X
eXe−X = I
If XY = YX then eXeY = eYeX = e(X + Y).
If Y is invertible then eYXY−1 = YeXY−1.
exp(XT) = (exp X)T, where XT denotes the transpose of X. It follows that if X is symmetric then eX is also symmetric, and that if X is skew-symmetric then eX is orthogonal.
exp(X∗) = (exp X)∗, where X∗ denotes the conjugate transpose of X. It follows that if X is Hermitian then eX is also Hermitian, and that if X is skew-Hermitian then eX is unitary.
A Laplace transform of matrix exponentials amounts to the resolvent, ∫0∞dt e−t etX = I / (I−X).
Linear differential equation systems[edit]
Main article: matrix differential equation
One of the reasons for the importance of the matrix exponential is that it can be used to solve systems of linear ordinary differential equations. The solution of
\frac{d}{dt} y(t) = Ay(t), \quad y(0) = y_0,
where A is a constant matrix, is given by
y(t) = e^{At} y_0. \,
The matrix exponential can also be used to solve the inhomogeneous equation
\frac{d}{dt} y(t) = Ay(t) + z(t), \quad y(0) = y_0.
See the section on applications below for examples.
There is no closed-form solution for differential equations of the form
\frac{d}{dt} y(t) = A(t) \, y(t), \quad y(0) = y_0,
where A is not constant, but the Magnus series gives the solution as an infinite sum.
The exponential of sums[edit]
For any real numbers (scalars) x and y we know that the exponential function satisfies ex+y = ex ey. The same is true for commuting matrices. If matrices X and Y commute (meaning that XY = YX), then
e^{X+Y} = e^Xe^Y ~.
However, for matrices that do not commute the above equality does not necessarily hold. In this case the Baker–Campbell–Hausdorff formula can be used to calculate eX+Y.
The converse is not true in general. The equation eX+Y = eX eY does not imply that X and Y commute.
For Hermitian matrices there are two notable theorems related to the trace of matrix exponentials.
Golden–Thompson inequality[edit]
Main article: Golden–Thompson inequality
If A and H are Hermitian matrices, then
\operatorname{tr}\exp(A+H) \leq \operatorname{tr}(\exp(A)\exp(H)). [2]
Note that there is no requirement of commutativity. There are counterexamples to show that the Golden–Thompson inequality cannot be extended to three matrices – and, in any event, tr(exp(A)exp(B)exp(C)) is not guaranteed to be real for Hermitian A, B, C. However, the next theorem accomplishes this in one sense.
Lieb's theorem[edit]
The Lieb's theorem, named after Elliott H. Lieb, states that, for a fixed Hermitian matrix H, the function
f(A) = \operatorname{tr} \,\exp \left (H + \log A \right)
is concave on the cone of positive-definite matrices.[3]
The exponential map[edit]
Note that the exponential of a matrix is always an invertible matrix. The inverse matrix of eX is given by e−X. This is analogous to the fact that the exponential of a complex number is always nonzero. The matrix exponential then gives us a map
\exp \colon M_n(\mathbb C) \to \mathrm{GL}(n,\mathbb C)
from the space of all n×n matrices to the general linear group of degree n, i.e. the group of all n×n invertible matrices. In fact, this map is surjective which means that every invertible matrix can be written as the exponential of some other matrix[4] (for this, it is essential to consider the field C of complex numbers and not R).
For any two matrices X and Y,
\| e^{X+Y} - e^X \| \le \|Y\| e^{\|X\|} e^{\|Y\|},
where || · || denotes an arbitrary matrix norm. It follows that the exponential map is continuous and Lipschitz continuous on compact subsets of Mn(C).
The map
t \mapsto e^{tX}, \qquad t \in \mathbb R
defines a smooth curve in the general linear group which passes through the identity element at t = 0.
In fact, this gives a one-parameter subgroup of the general linear group since
e^{tX}e^{sX} = e^{(t+s)X}.\,
The derivative of this curve (or tangent vector) at a point t is given by
\frac{d}{dt}e^{tX} = Xe^{tX} = e^{tX}X. \qquad (1)
The derivative at t = 0 is just the matrix X, which is to say that X generates this one-parameter subgroup.
More generally,[5] for a generic t-dependent exponent, X(t),
\frac{d}{dt}e^{X(t)} = \int_0^1 e^{\alpha X(t)} \frac{dX(t)}{dt} e^{(1-\alpha) X(t)}\,d\alpha ~.
Taking the above expression eX(t) outside the integral sign and expanding the integrand with the help of the Hadamard lemma one can obtain the following useful expression for the derivative of the matrix exponent,
\left(\frac{d}{dt}e^{X(t)}\right)e^{-X(t)} = \frac{d}{dt}X(t) + \frac{1}{2!}[X(t),\frac{d}{dt}X(t)] + \frac{1}{3!}[X(t),[X(t),\frac{d}{dt}X(t)]]+\cdots
Note that the coefficients in the expression above are different from what appears in the exponential. For a closed form, see derivative of the exponential map.
The determinant of the matrix exponential[edit]
By Jacobi's formula, for any complex square matrix the following trace identity holds:[6]
\det (e^A)= e^{\operatorname{tr}(A)}~.
In addition to providing a computational tool, this formula demonstrates that a matrix exponential is always an invertible matrix. This follows from the fact that the right hand side of the above equation is always non-zero, and so det(eA) ≠ 0, which implies that eA must be invertible.
In the real-valued case, the formula also exhibits the map
\exp \colon M_n(\mathbb R) \to \mathrm{GL}(n,\mathbb R)
to not be surjective, in contrast to the complex case mentioned earlier. This follows from the fact that, for real-valued matrices, the right-hand side of the formula is always positive, while there exist invertible matrices with a negative determinant.
Computing the matrix exponential[edit]
Finding reliable and accurate methods to compute the matrix exponential is difficult, and this is still a topic of considerable current research in mathematics and numerical analysis. Matlab, GNU Octave, and SciPy all use the Padé approximant.[7][8][9]
Diagonalizable case[edit]
If a matrix is diagonal:
A=\begin{bmatrix} a_1 & 0 & \ldots & 0 \\
0 & a_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & a_n \end{bmatrix} ,
then its exponential can be obtained by exponentiating each entry on the main diagonal:
e^A=\begin{bmatrix} e^{a_1} & 0 & \ldots & 0 \\
0 & e^{a_2} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & e^{a_n} \end{bmatrix} .
This also allows one to exponentiate diagonalizable matrices. If A = UDU−1 and D is diagonal, then eA = UeDU−1. Application of Sylvester's formula yields the same result. (To see this, note that addition and multiplication, hence also exponentiation, of diagonal matrices is equivalent to element-wise addition and multiplication, and hence exponentiation; in particular, the "one-dimensional" exponentiation is felt element-wise for the diagonal case.)
Projection case[edit]
If P is a projection matrix (i.e. is idempotent), its matrix exponential is eP = I + (e − 1)P. This may be derived by expansion of the definition of the exponential function and by use of the idempotency of P:
e^P = \sum_{k=0}^{\infty} \frac{P^k}{k!}=I+\left(\sum_{k=1}^{\infty} \frac{1}{k!}\right)P=I+(e-1)P ~.
Rotation case[edit]
For a simple rotation in which the perpendicular unit vectors a and b specify a plane,[10] the rotation matrix R can be expressed in terms of a similar exponential function involving a generator G and angle θ.[11][12]
G=ba^\mathsf{T}-ab^\mathsf{T} \qquad a^\mathsf{T}b=0
-G^2=aa^\mathsf{T}+bb^\mathsf{T}=P \qquad P^2=P \qquad PG=GP=G ~,
\begin{align}
R\left( \theta \right) &= {{e}^{G\theta }}=I+G\sin (\theta ) +{{G}^{2}}(1- \cos (\theta ) ) \\
&=I-P+P\cos (\theta )+G\sin (\theta ) ~.\\
\end{align}
The formula for the exponential results from reducing the powers of G in the series expansion and identifying the respective series coefficients of G2 and G with −cos(θ) and sin(θ) respectively. The second expression here for eGθ is the same as the expression for R(θ) in the article containing the derivation of the generator, R(θ) = eGθ.
No comments:
Post a Comment