Linear Algebra
Lecture slides for Chapter 2 of Deep Learning
Ian Goodfellow
2016-06-24
(Goodfellow 2016)
About this chapter
• Not a comprehensive survey of all of linear algebra
• Focused on the subset most relevant to deep
learning
• Larger subset: e.g., Linear Algebra by Georgi Shilov
(Goodfellow 2016)
Scalars
• A scalar is a single number
• Integers, real numbers, rational numbers, etc.
• We denote it with italic font:
a, n, x
(Goodfellow 2016)
Vectors
• A vector is a 1-D array of numbers:
• Can be real, binary, integer, etc.
• Example notation for type and size:
A vector is an array of numbers arranged in order; we can identify each individual number by its index in that ordering. Typically we give vectors lowercase names written in bold typeface, such as x. The elements of the vector are identified by writing its name in italic typeface with a subscript: the first element of x is x_1, the second element is x_2, and so on. We also need to say what kind of numbers are stored in the vector. If each element is in R and the vector has n elements, then the vector lies in the set formed by taking the Cartesian product of R n times, denoted R^n. When we need to explicitly identify the elements of a vector, we write them as a column enclosed in square brackets:

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. \qquad (2.1)

We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis.

Sometimes we need to index a set of elements of a vector. In this case, we define a set containing the indices and write the set as a subscript. For example, to access x_1, x_3 and x_6, we define the set S = {1, 3, 6} and write x_S. We use the − sign to index the complement of a set: x_{−1} is the vector containing all elements of x except for x_1, and x_{−S} is the vector containing all of the elements of x except for x_1, x_3 and x_6.
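Not part of the original slides — a minimal NumPy sketch of this indexing convention (the array values are illustrative, and NumPy indices start at 0 rather than 1):

```python
import numpy as np

x = np.array([4.0, 7.0, 1.0, 3.0, 9.0, 2.0])  # a vector in R^6

S = [0, 2, 5]          # plays the role of S = {1, 3, 6} in 0-based indexing
x_S = x[S]             # x_S: the elements x_1, x_3, x_6

mask = np.ones(len(x), dtype=bool)
mask[S] = False
x_not_S = x[mask]      # x_{-S}: every element of x except x_1, x_3, x_6
```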
(Goodfellow 2016)
Matrices
• A matrix is a 2-D array of numbers:
• Example notation for type and shape:
If a real-valued matrix A has a height of m and a width of n, we say that A ∈ R^{m×n}. We usually identify the elements of a matrix using its name in italic but not bold font: A_{i,j} is the entry in row i and column j, A_{i,:} denotes row i, and A_{:,j} denotes the j-th column of A. When we need to explicitly identify the elements of a matrix, we write them as an array enclosed in square brackets:

\begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}. \qquad (2.2)

Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression but do not convert anything to lowercase: f(A)_{i,j} gives element (i, j) of the matrix computed by applying the function f to A.

In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named "A" with a sans-serif typeface, and identify the element of A at coordinates (i, j, k) by writing A_{i,j,k}.

(In the slide figure, the first index runs down a column and the second index runs across a row.)
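Not from the slides — a short NumPy sketch of the matrix indexing notation above (the function f and the values are illustrative):

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(3, 2)  # A in R^{3x2}: height 3, width 2

row_i = A[0, :]        # A_{1,:}, the first row
col_j = A[:, 1]        # A_{:,2}, the second column
elem = A[2, 1]         # A_{3,2}, a single entry

f = lambda M: M ** 2   # some matrix-valued function f (hypothetical example)
fA_ij = f(A)[2, 1]     # f(A)_{3,2}: subscript the whole expression, no lowercasing
```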
(Goodfellow 2016)
Tensors
• A tensor is an array of numbers that may have
• zero dimensions, and be a scalar
• one dimension, and be a vector
• two dimensions, and be a matrix
• or more dimensions.
(Goodfellow 2016)
Matrix Transpose
A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \\ A_{3,1} & A_{3,2} \end{bmatrix} \Rightarrow A^\top = \begin{bmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \end{bmatrix}

Figure 2.1: The transpose of the matrix can be thought of as a mirror image across the main diagonal.

One important operation on matrices is the transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right, starting from its upper left corner. See Fig. 2.1 for a graphical depiction of this operation. We denote the transpose of a matrix A as A^⊤, and it is defined such that

(A^\top)_{i,j} = A_{j,i}. \qquad (2.3)

Vectors can be thought of as matrices that contain only one column; the transpose of a vector is therefore a matrix with only one row.

Matrix operations have many useful properties that make mathematical analysis more convenient. For example, matrix multiplication is distributive and associative:

A(B + C) = AB + AC \qquad (2.6)
A(BC) = (AB)C \qquad (2.7)

Matrix multiplication is not commutative (the condition AB = BA does not always hold), unlike scalar multiplication. However, the dot product between two vectors is commutative:

x^\top y = y^\top x. \qquad (2.8)

The transpose of a matrix product has a simple form:

(AB)^\top = B^\top A^\top. \qquad (2.9)

One can demonstrate Eq. 2.8 by exploiting the fact that the value of a dot product is a scalar and therefore equal to its own transpose: x^⊤y = (x^⊤y)^⊤ = y^⊤x.
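Not from the slides — a small NumPy check of the transpose identities above (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 4))
C = rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = rng.normal(size=3)

assert np.allclose(A.T[1, 0], A[0, 1])          # (A^T)_{i,j} = A_{j,i}   (Eq. 2.3)
assert np.allclose(A @ (B + C), A @ B + A @ C)  # distributivity          (Eq. 2.6)
assert np.isclose(x @ y, y @ x)                 # x^T y = y^T x           (Eq. 2.8)
assert np.allclose((A @ B).T, B.T @ A.T)        # (AB)^T = B^T A^T        (Eq. 2.9)
```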
(Goodfellow 2016)
Matrix (Dot) Product
The matrix product of matrices A and B is a third matrix C. In order for the product to be defined, A must have the same number of columns as B has rows: if A is of shape m × n and B is of shape n × p, then C is of shape m × p. We write the matrix product just by placing two or more matrices together:

C = AB. \qquad (2.4)

The product operation is defined by

C_{i,j} = \sum_k A_{i,k} B_{k,j}. \qquad (2.5)

The standard product of two matrices is not just a matrix containing the products of the individual elements. Such an operation exists and is called the element-wise or Hadamard product, denoted A ⊙ B.

The dot product between two vectors x and y of the same dimensionality is the matrix product x^⊤y. We can think of the matrix product C = AB as computing C_{i,j} as the dot product between row i of A and column j of B.

(Slide diagram: an m × n matrix times an n × p matrix yields an m × p matrix — the inner dimensions must match.)
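Not from the slides — a minimal NumPy sketch of the shape rule and of Eq. 2.5 spelled out entry by entry (shapes and values are illustrative):

```python
import numpy as np

m, n, p = 2, 3, 4
A = np.arange(m * n, dtype=float).reshape(m, n)
B = np.arange(n * p, dtype=float).reshape(n, p)

C = A @ B                    # matrix product: inner dimensions must match
assert C.shape == (m, p)     # (m x n)(n x p) -> m x p

# C_{i,j} = sum_k A_{i,k} B_{k,j}  (Eq. 2.5), verified for one entry:
i, j = 0, 1
assert np.isclose(C[i, j], sum(A[i, k] * B[k, j] for k in range(n)))

# The element-wise (Hadamard) product is a different operation entirely:
D = np.ones((m, n))
H = A * D                    # A ⊙ D, requires equal shapes
```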
(Goodfellow 2016)
Identity Matrix
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Figure 2.2: Example identity matrix: this is I_3.

Linear algebra offers a powerful tool called matrix inversion that allows us to analytically solve Ax = b for many values of A. To describe matrix inversion, we first need to define the concept of an identity matrix: a matrix that does not change any vector when we multiply the vector by that matrix. We denote the identity matrix that preserves n-dimensional vectors as I_n. Formally, I_n ∈ R^{n×n}, and

\forall x \in \mathbb{R}^n, \; I_n x = x. \qquad (2.20)

The structure of the identity matrix is simple: all of the entries along the main diagonal are 1, while all of the other entries are zero. See Fig. 2.2 for an example.
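Not from the slides — a two-line NumPy check of Eq. 2.20 (the vector is illustrative):

```python
import numpy as np

I3 = np.eye(3)                    # the identity matrix I_3
x = np.array([2.0, -1.0, 5.0])
assert np.allclose(I3 @ x, x)     # I_n x = x for every x  (Eq. 2.20)
```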
(Goodfellow 2016)
Systems of Equations
We list only a few useful properties of the matrix product here, but many more exist.

We now know enough linear algebra notation to write down a system of linear equations:

Ax = b \qquad (2.11)

where A ∈ R^{m×n} is a known matrix, b ∈ R^m is a known vector, and x ∈ R^n is a vector of unknown variables we would like to solve for. Each element x_i of x is one of these unknowns. Each row of A and each element of b provide another constraint. We can rewrite Eq. 2.11 as:

A_{1,:} x = b_1 \qquad (2.12)
A_{2,:} x = b_2 \qquad (2.13)
\dots \qquad (2.14)
A_{m,:} x = b_m \qquad (2.15)

or, even more explicitly, as:

A_{1,1} x_1 + A_{1,2} x_2 + \dots + A_{1,n} x_n = b_1 \qquad (2.16)
\dots
A_{m,1} x_1 + A_{m,2} x_2 + \dots + A_{m,n} x_n = b_m.
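Not from the slides — a small NumPy sketch showing that each row of Ax = b is one scalar equation (the particular A and b are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])         # known matrix
b = np.array([5.0, 10.0])          # known vector

x = np.linalg.solve(A, b)          # the unknowns solving Ax = b
assert np.allclose(A @ x, b)

# Row i of Ax = b is the scalar equation A_{i,1} x_1 + ... + A_{i,n} x_n = b_i:
for i in range(A.shape[0]):
    assert np.isclose(A[i, :] @ x, b[i])
```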
(Goodfellow 2016)
Solving Systems of Equations
• A linear system of equations can have:
• No solution
• Many solutions
• Exactly one solution: this means multiplication by
the matrix is an invertible function
(Goodfellow 2016)
Matrix Inversion
• Matrix inverse:
• Solving a system using an inverse:
• Numerically unstable, but useful for abstract
analysis
The matrix inverse of A is denoted as A^{-1}, and it is defined as the matrix such that

A^{-1} A = I_n. \qquad (2.21)

We can then solve Eq. 2.11 by the following steps:

Ax = b \qquad (2.22)
A^{-1} A x = A^{-1} b \qquad (2.23)
I_n x = A^{-1} b \qquad (2.24)
x = A^{-1} b.
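Not from the slides — a NumPy sketch of the bullet above: forming A^{-1} works, but in practice a solver is preferred for numerical stability (the matrix is illustrative):

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
b = np.array([1.0, 0.0])

A_inv = np.linalg.inv(A)
assert np.allclose(A_inv @ A, np.eye(2))   # A^{-1} A = I_n  (Eq. 2.21)

x_via_inverse = A_inv @ b                  # x = A^{-1} b: fine for abstract analysis,
x_via_solver = np.linalg.solve(A, b)       # but solvers are more stable numerically
assert np.allclose(x_via_inverse, x_via_solver)
```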
(Goodfellow 2016)
Invertibility
• Matrix can’t be inverted if…
• More rows than columns
• More columns than rows
• Redundant rows/columns (“linearly dependent”,
“low rank”)
(Goodfellow 2016)
Norms
• Functions that measure how “large” a vector is
• Similar to a distance between zero and the point
represented by the vector
Formally, the L^p norm is given by

||x||_p = \left( \sum_i |x_i|^p \right)^{1/p}

for p ∈ R, p ≥ 1.

Norms, including the L^p norm, are functions mapping vectors to non-negative values. On an intuitive level, the norm of a vector x measures the distance from the origin to the point x. More rigorously, a norm is any function f satisfying the following properties:

• f(x) = 0 ⇒ x = 0
• f(x + y) ≤ f(x) + f(y) (the triangle inequality)
• ∀α ∈ R, f(αx) = |α| f(x)

The L^2 norm, with p = 2, is known as the Euclidean norm: it is simply the Euclidean distance from the origin to the point identified by x.
(Goodfellow 2016)
• Lp
norm
• Most popular norm: L2 norm, p=2
• L1 norm, p=1:
• Max norm, infinite p:
Norms
The squared L^2 norm is convenient to work with, but it increases very slowly near the origin. In several machine learning applications, it is important to discriminate between elements that are exactly zero and elements that are small but nonzero. In these cases, we turn to a function that grows at the same rate in all locations but retains mathematical simplicity: the L^1 norm, which may be simplified to

||x||_1 = \sum_i |x_i|. \qquad (2.31)

The L^1 norm is commonly used in machine learning when the difference between zero and nonzero elements is very important: every time an element of x moves away from 0 by ε, the L^1 norm increases by ε.

We sometimes measure the size of a vector by counting its number of nonzero elements. Some authors refer to this function as the "L^0 norm," but this is incorrect terminology: the number of nonzero entries in a vector is not a norm, because scaling the vector by α does not change the number of nonzero entries. The L^1 norm is often used as a substitute for the number of nonzero entries.

One other norm that commonly arises in machine learning is the L^∞ norm, also known as the max norm. This norm simplifies to the absolute value of the element with the largest magnitude in the vector:

||x||_\infty = \max_i |x_i|. \qquad (2.32)

Sometimes we may also wish to measure the size of a matrix. In the context of deep learning, the most common way to do this is with the otherwise obscure Frobenius norm,

||A||_F = \sqrt{\sum_{i,j} A_{i,j}^2},

which is analogous to the L^2 norm of a vector.
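Not from the slides — a NumPy sketch computing each of the norms above (the vector is illustrative):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l2 = np.linalg.norm(x)                 # L^2 (Euclidean) norm -> 5.0
l1 = np.linalg.norm(x, ord=1)          # L^1 norm: sum of |x_i| -> 7.0
linf = np.linalg.norm(x, ord=np.inf)   # max norm: largest |x_i| -> 4.0
nnz = np.count_nonzero(x)              # nonzero count ("L^0"): not a true norm

A = np.ones((2, 3))
fro = np.linalg.norm(A)                # Frobenius norm of a matrix
```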
(Goodfellow 2016)
• Unit vector:
• Symmetric Matrix:
• Orthogonal matrix:
Special Matrices and Vectors
A diagonal matrix has nonzero entries only along its main diagonal. Multiplying by a diagonal matrix is computationally efficient: we can write a machine learning algorithm in terms of arbitrary matrices, but obtain a less expensive (and less descriptive) algorithm by restricting some matrices to be diagonal. Not all diagonal matrices need be square; it is possible to construct a rectangular diagonal matrix. Non-square diagonal matrices do not have inverses, but it is still possible to multiply by them cheaply: for a non-square diagonal matrix D, the product Dx involves scaling each element of x, and either concatenating some zeros to the result if D is taller than it is wide, or discarding some of the last elements of the vector if D is wider than it is tall.

A symmetric matrix is any matrix that is equal to its own transpose:

A = A^\top. \qquad (2.35)

Symmetric matrices often arise when the entries are generated by some function of two arguments that does not depend on the order of the arguments. For example, if A is a matrix of distance measurements, with A_{i,j} giving the distance from point i to point j, then A_{i,j} = A_{j,i} because distance functions are symmetric.

A unit vector is a vector with unit norm:

||x||_2 = 1. \qquad (2.36)

A vector x and a vector y are orthogonal to each other if x^⊤y = 0. If both vectors have nonzero norm, this means that they are at a 90 degree angle to each other. In R^n, at most n vectors may be mutually orthogonal with nonzero norm. If the vectors are not only orthogonal but also have unit norm, we call them orthonormal.

An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:

A^\top A = A A^\top = I, \qquad (2.37)

which implies

A^{-1} = A^\top. \qquad (2.38)
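Not from the slides — a NumPy sketch checking the orthogonal-matrix identities on a 2-D rotation matrix (the angle is illustrative):

```python
import numpy as np

# A symmetric matrix equals its own transpose (Eq. 2.35):
A = np.array([[1.0, 2.0],
              [2.0, 5.0]])
assert np.allclose(A, A.T)

# A rotation matrix is orthogonal: its rows and columns are orthonormal.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(Q.T @ Q, np.eye(2))      # Q^T Q = Q Q^T = I  (Eq. 2.37)
assert np.allclose(np.linalg.inv(Q), Q.T)   # hence Q^{-1} = Q^T (Eq. 2.38)
```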
(Goodfellow 2016)
Eigendecomposition
• Eigenvector and eigenvalue:
• Eigendecomposition of a diagonalizable matrix:
• Every real symmetric matrix has a real, orthogonal
eigendecomposition:
We can learn about a matrix by decomposing it, rather than viewing it only as an array of elements. One of the most widely used kinds of matrix decomposition is called eigendecomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues. An eigenvector of a square matrix A is a nonzero vector v such that multiplication by A alters only the scale of v:

Av = \lambda v. \qquad (2.39)

The scalar λ is known as the eigenvalue corresponding to this eigenvector. (One can also find a left eigenvector such that v^⊤A = λv^⊤, but we are usually concerned with right eigenvectors.)

If v is an eigenvector of A, then so is any rescaled vector sv for s ∈ R, s ≠ 0, and sv has the same eigenvalue. For this reason, we usually look only for unit eigenvectors.

Suppose a matrix A has n linearly independent eigenvectors. We can form a matrix V with one eigenvector per column, V = [v^{(1)}, ..., v^{(n)}], and concatenate the eigenvalues to form a vector λ = [λ_1, ..., λ_n]^⊤. The eigendecomposition of A is then given by

A = V \operatorname{diag}(\lambda) V^{-1}. \qquad (2.40)

Constructing matrices with specific eigenvalues and eigenvectors allows us to stretch space in desired directions. However, we often want to decompose matrices into their eigenvalues and eigenvectors; doing so can help us analyze certain properties of the matrix, much as decomposing an integer into its prime factors can help us understand the behavior of that integer.

Not every matrix can be decomposed into eigenvalues and eigenvectors. In some cases the decomposition exists but may involve complex rather than real numbers. In this book, we usually need to decompose only a specific class of matrices that have a simple decomposition. Specifically, every real symmetric matrix can be decomposed into an expression using only real-valued eigenvectors and eigenvalues:

A = Q \Lambda Q^\top, \qquad (2.41)

where Q is an orthogonal matrix composed of eigenvectors of A, and Λ is a diagonal matrix: eigenvalue Λ_{i,i} is associated with the eigenvector in column i of Q.
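Not from the slides — a NumPy sketch of the real symmetric case, using `np.linalg.eigh` (intended for symmetric/Hermitian matrices; the example matrix is illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # a real symmetric matrix

eigvals, Q = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
Lam = np.diag(eigvals)
assert np.allclose(A, Q @ Lam @ Q.T)    # A = Q Λ Q^T  (Eq. 2.41)

v = Q[:, 0]                             # one eigenvector (a column of Q)
assert np.allclose(A @ v, eigvals[0] * v)   # A v = λ v  (Eq. 2.39)
```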
(Goodfellow 2016)
Effect of Eigenvalues
[Figure: an example of the effect of eigenvectors and eigenvalues. A matrix A has two orthonormal eigenvectors, v^{(1)} with eigenvalue λ_1 and v^{(2)} with eigenvalue λ_2. Left panel, "Before multiplication": the set of all unit vectors u ∈ R^2, plotted as a unit circle. Right panel, "After multiplication": the set of all points Au. Observing how A distorts the unit circle shows that it scales space in direction v^{(i)} by λ_i.]
(Goodfellow 2016)
Singular Value Decomposition
• Similar to eigendecomposition
• More general; matrix need not be square
The singular value decomposition is more generally applicable: every real matrix has a singular value decomposition, but the same is not true of the eigenvalue decomposition. For example, if a matrix is not square, the eigendecomposition is not defined, and we must use a singular value decomposition instead.

Recall that the eigendecomposition involves analyzing a matrix A to discover a matrix V of eigenvectors and a vector of eigenvalues λ such that we can rewrite A as

A = V \operatorname{diag}(\lambda) V^{-1}. \qquad (2.42)

The singular value decomposition is similar, except this time we write A as a product of three matrices:

A = U D V^\top. \qquad (2.43)

Suppose that A is an m × n matrix. Then U is defined to be an m × m matrix, D an m × n matrix, and V an n × n matrix.
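Not from the slides — a NumPy sketch of the SVD on a non-square matrix. Note `np.linalg.svd` returns the singular values as a vector and returns V^⊤ rather than V (the example matrix is illustrative):

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(2, 3)   # a non-square (2 x 3) matrix

U, s, Vt = np.linalg.svd(A)         # U is 2x2, s holds singular values, Vt is V^T
D = np.zeros_like(A)                # embed the singular values in an m x n D
D[:len(s), :len(s)] = np.diag(s)

assert np.allclose(A, U @ D @ Vt)   # A = U D V^T  (Eq. 2.43)
```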
(Goodfellow 2016)
Moore-Penrose Pseudoinverse
• If the equation Ax = y has:
• Exactly one solution: this is the same as the inverse.
• No solution: this gives us the solution with the
smallest error
• Many solutions: this gives us the solution with the
smallest norm of x.
Practical algorithms for computing the pseudoinverse are not based on its limit definition, but rather on the formula

A^+ = V D^+ U^\top, \qquad (2.47)

where U, D and V are the singular value decomposition of A, and the pseudoinverse D^+ of a diagonal matrix D is obtained by taking the reciprocal of its nonzero elements and then taking the transpose of the resulting matrix.

When A has more columns than rows, solving a linear equation using the pseudoinverse provides one of the many possible solutions. Specifically, it provides the solution x = A^+ y with minimal Euclidean norm ||x||_2 among all possible solutions.

When A has more rows than columns, it is possible for there to be no solution. In this case, using the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of Euclidean norm ||Ax − y||_2.
(Goodfellow 2016)
Computing the Pseudoinverse
The pseudoinverse allows us to make some headway in the cases where A is not invertible. The pseudoinverse of A is defined as the matrix

A^+ = \lim_{\alpha \searrow 0} (A^\top A + \alpha I)^{-1} A^\top. \qquad (2.46)

Practical algorithms for computing the pseudoinverse are not based on this definition, but rather on the formula

A^+ = V D^+ U^\top, \qquad (2.47)

where U, D and V come from the singular value decomposition of A, and the pseudoinverse D^+ of the diagonal matrix D is obtained by taking the reciprocal of its nonzero entries and then taking the transpose of the resulting matrix.

The SVD allows the computation of the pseudoinverse: take the reciprocal of the nonzero entries of D, then transpose.
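Not from the slides — a NumPy sketch implementing Eq. 2.47 directly and comparing against `np.linalg.pinv` (the example matrix, which has more columns than rows, is illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])              # more columns than rows, full row rank

U, s, Vt = np.linalg.svd(A)
D_plus = np.zeros(A.shape[::-1])              # D^+ is n x m
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)   # reciprocal of nonzero singular values
A_plus = Vt.T @ D_plus @ U.T                  # A^+ = V D^+ U^T  (Eq. 2.47)

assert np.allclose(A_plus, np.linalg.pinv(A))

y = np.array([1.0, 2.0])
x = A_plus @ y                 # the minimum-norm solution of Ax = y
assert np.allclose(A @ x, y)
```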
(Goodfellow 2016)
Trace
The trace operator gives the sum of all of the diagonal entries of a matrix:

\operatorname{Tr}(A) = \sum_i A_{i,i}. \qquad (2.48)

The trace operator is useful for a variety of reasons. Some operations that are difficult to specify without resorting to summation notation can be specified using matrix products and the trace operator, and such expressions can be manipulated using many useful identities. For example, the trace operator is invariant to the transpose operator:

\operatorname{Tr}(A) = \operatorname{Tr}(A^\top). \qquad (2.50)

The trace of a square matrix composed of many factors is also invariant to moving the last factor into the first position, if the shapes of the corresponding matrices allow the resulting product to be defined:

\operatorname{Tr}(ABC) = \operatorname{Tr}(CAB) = \operatorname{Tr}(BCA) \qquad (2.51)

or, more generally,

\operatorname{Tr}\left( \prod_{i=1}^{n} F^{(i)} \right) = \operatorname{Tr}\left( F^{(n)} \prod_{i=1}^{n-1} F^{(i)} \right). \qquad (2.52)

This invariance to cyclic permutation holds even if the resulting product has a different shape. For example, for A ∈ R^{m×n} and B ∈ R^{n×m}, we have Tr(AB) = Tr(BA), even though AB ∈ R^{m×m} and BA ∈ R^{n×n}.
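Not from the slides — a NumPy check of the trace identities, including the cyclic property across different product shapes (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))

# Cyclic invariance: AB is 2x2 while BA is 3x3, yet the traces agree.
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

M = rng.normal(size=(4, 4))
assert np.isclose(np.trace(M), np.trace(M.T))   # Tr(A) = Tr(A^T)  (Eq. 2.50)
```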
(Goodfellow 2016)
Learning linear algebra
• Do a lot of practice problems
• Start out with lots of summation signs and indexing
into individual entries
• Eventually you will be able to mostly use matrix
and vector product notation quickly and easily