Introduction to Mathematics for Intelligent Systems
Andres Mendez-Vazquez
April 30, 2018
1 Introduction
Heavy mathematical modeling lies at the center of Machine Learning algorithms, so a student who dedicates her or his time and effort to this area must be proficient and comfortable with that mathematics. However, the standard courses in a Computer Science curriculum do not cover the mathematics that Machine Learning demands. Therefore, if we want to prepare a new generation of people in Machine Learning with the correct skills, it is necessary to teach those skills explicitly. This is the main reason why this course on the Mathematics for Intelligent Systems exists.
2 Objectives
The objective of this course is to teach three important areas in the Mathematics
for Intelligent Systems:
• Theory and basic methods in Linear Algebra,
• Introduction to probability and statistical methods,
• Basic methods in Optimization.
3 Course Grading
The total grading of the course is split in the following way:

Requirements    % of the Grade
Midterm 1       10%
Midterm 2       10%
Midterm 3       10%
Final           10%
8 Homeworks     60%
3.1 Homework
We will have homework assignments with a variable number of problems. Homework
will be handwritten and scanned into PDFs for grading, with the following format:
1. Name and date must appear on the first page.
2. Each problem must be stated before its solution.
The programming sections will be done in Python.
4 Syllabus
I.1 Linear Algebra [1, 2, 3]
1. Vector Spaces
(a) Using Vector to Represent Samples
(b) Lengths and Dot Products
(c) Matrices
(d) The Example of Regression
2. System of Linear Equations
(a) Introduction
(b) Elementary row operations and elementary matrices
(c) Squared Matrices and Inverse
(d) Homogeneous and Inhomogeneous Systems
(e) Solving Linear Equations
i. Elimination - A = LU
(f) Solving the Regression Problem
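As a taste of this topic, here is a minimal Python sketch of elimination producing A = LU and then solving a system by forward and back substitution. The 3×3 system and the helper name `lu_no_pivot` are assumptions for illustration only; the code assumes nonzero pivots, so no row exchanges are needed.

```python
import numpy as np

# Assumed example system Ax = b.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 5.0])

def lu_no_pivot(A):
    """Plain LU factorization by elimination (assumes nonzero pivots)."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]   # store the multiplier
            U[i] -= L[i, k] * U[k]        # eliminate entry below the pivot
    return L, U

L, U = lu_no_pivot(A)
# Solve Ly = b (forward), then Ux = y (backward); np.linalg.solve is
# used here for brevity since both factors are triangular.
y = np.linalg.solve(L, b)
x = np.linalg.solve(U, y)
```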
3. Vector Spaces
(a) Space of Vectors
(b) The Null Space of A.
(c) Rank of a Matrix
(d) Solving Homogeneous and Inhomogeneous Systems
(e) The idea of Basis and Independence
(f) Dimension of a Vector Space
4. Orthogonality
(a) Orthonormal Basis
(b) Projections onto sub-spaces
Cinvestav GDL
(c) Least Squares Regression
(d) Gram-Schmidt
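A sketch of classical Gram-Schmidt in Python, producing an orthonormal basis from two independent vectors; the input matrix and the helper name `gram_schmidt` are assumed for illustration:

```python
import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt: orthonormalize the columns of V."""
    Q = []
    for v in V.T:
        w = v.astype(float)
        for q in Q:
            w = w - (q @ v) * q          # subtract the projection onto q
        Q.append(w / np.linalg.norm(w))  # normalize what remains
    return np.array(Q).T

# Assumed example: two independent vectors in R^3.
V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(V)
```

The columns of `Q` span the same subspace as those of `V`, so the matrix `Q @ Q.T` projects onto that subspace, which is exactly the projection used in least squares.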
5. Determinants
(a) The Properties of the Determinants
(b) Permutations
(c) Cofactors
(d) Cramer’s Rule
6. The Eigenvectors
(a) Eigenvalues and Eigenvectors
(b) Diagonalization
(c) Singular Value Decomposition
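The two factorizations above can be illustrated with NumPy; the matrices below are assumed toy examples. A symmetric matrix diagonalizes as A = QΛQᵀ, and the SVD extends the idea to any rectangular matrix:

```python
import numpy as np

# Diagonalization of an assumed symmetric matrix: A = Q diag(lam) Q^T.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, Q = np.linalg.eigh(A)       # eigenvalues in ascending order
A_rebuilt = Q @ np.diag(lam) @ Q.T

# SVD of an assumed rectangular matrix: B = U S V^T.
B = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U, S, Vt = np.linalg.svd(B, full_matrices=False)
```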
7. Linear Transformations
(a) The Concept of Linear Transformations
(b) The Matrix to represent a Linear Transformation
(c) The Pseudo-inverse
(d) Application to the Regression
8. Derivatives of Matrices
(a) Generalizing the concept of derivatives
9. Applications
(a) Solving the Linear Regression Problem
(b) Application: Principal Component Analysis
(c) Application to image processing: Eigenfaces
(d) Markov Matrices and the Google Matrix
(e) Linear Algebra in Probability and Statistics
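The regression application that runs through this unit can be sketched with the normal equations (XᵀX)w = Xᵀy; the toy data below, which fit the line y = 1 + 2x exactly, are an assumption for illustration:

```python
import numpy as np

# Assumed toy data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix with a bias column of ones.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) w = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)
```

Since the data are noise-free, the recovered weights are the intercept 1 and slope 2 exactly.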
I.2 Probability and Statistical Theory [4, 5]
1. Introduction
(a) Probability Definition
(b) The Sample Space
(c) Basic Set Operations
(d) Counting
i. How to produce probabilities?
2. Conditional Probability
(a) Definition and intuition
(b) Bayes’ Rule
(c) Conditional Probabilities are probabilities
(d) Independence of events
(e) The use of conditioning to solve problems
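A numeric sketch of Bayes' rule; the sensitivity, specificity, and prevalence figures below are made-up numbers for illustration only:

```python
# Assumed figures: a test with 99% sensitivity, 95% specificity,
# applied to a condition with 1% prevalence.
p_disease = 0.01
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.05

# Law of total probability: P(positive).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(disease | positive).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
```

Despite the accurate test, the posterior is only about 1/6, which illustrates why conditioning on the evidence, rather than intuition, is essential.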
3. Random Variables
(a) The basic intuition and definition
(b) Distributions
i. Continuous
ii. Discrete
(c) Cumulative distribution function
(d) Function of Random Variables
4. Expectation
(a) Define expectation as a weighted average
(b) Linearity of the expectation
(c) Examples: Geometric and Negative Binomial
(d) Indicator Random Variable and the Fundamental Bridge
(e) Variance
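The fundamental bridge, P(A) = E[1_A], can be sketched by simulation; the event (a fair die showing an even face, so P(A) = 1/2) and the sample size are assumptions for illustration:

```python
import random

# Estimate P(even face) as the sample mean of the indicator 1_A
# over many simulated fair-die rolls.
random.seed(1)
n = 100_000
rolls = (random.randint(1, 6) for _ in range(n))
indicator_mean = sum(1 if r % 2 == 0 else 0 for r in rolls) / n
```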
5. Moments
(a) Moments as summarizing a distribution
(b) How do we interpret moments?
(c) Sample moments
i. The Sample Mean
ii. The Sample Variance
(d) Moment generating functions
6. Joint distributions
(a) Joint, Marginal and Conditional
(b) Transformation of variables
(c) Covariance and Correlations
(d) Conditional Expectation
7. Inequalities and Limit Theorems
(a) Inequalities
(b) Law of Large Numbers
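The Law of Large Numbers can be illustrated by simulation; the fair die and sample sizes below are assumed for the example:

```python
import random

# The sample mean of i.i.d. fair-die rolls approaches the
# true mean 3.5 as the sample size grows.
random.seed(0)

def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
```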
8. Applications
(a) Bayesian inference
(b) Generative Models vs. Discriminative Models
i. Models that learn from the data
(c) Markov Chains
(d) Markov Chain Monte Carlo
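A minimal sketch of a Markov chain converging to its stationary distribution; the two-state transition matrix is an assumed example. Repeatedly applying the transition matrix drives any starting distribution toward the stationary one, which is the idea behind both the Google matrix and MCMC:

```python
import numpy as np

# Assumed two-state transition matrix (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi = np.array([1.0, 0.0])   # start deterministically in state 0
for _ in range(100):
    pi = pi @ P             # one step of the chain
# pi is now essentially the stationary distribution (5/6, 1/6).
```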
I.3 Optimization [6, 7]
1. Introduction
(a) Formulation
(b) Example: Least Squared Error
(c) Continuous versus Discrete Optimization
(d) Constrained and Unconstrained Optimization
2. The Basics
(a) What is a solution?
(b) How to recognize a minimum?
(c) Overview:
i. Line Search
ii. Trust Region
3. Convex Functions
(a) Why Convex Functions?
(b) Differentiable Convex Functions
(c) Characterizing Convex Functions using differentiability
4. The Concept of an Algorithm
(a) Algorithms
(b) Convergence
(c) Comparison of Algorithms
5. Line Search Methods
(a) Introduction
(b) Line Search
(c) Search using Derivatives
i. Gradient Descent
ii. Newton’s Method
(d) Applications in the Perceptron Algorithm and Logistic Regression
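Gradient descent, the simplest of these search methods, can be sketched on an assumed one-dimensional objective f(x) = (x − 3)², whose derivative is f′(x) = 2(x − 3); the fixed step size is an assumption for illustration:

```python
# Gradient descent with a fixed step size on f(x) = (x - 3)^2.
x = 0.0
step = 0.1
for _ in range(200):
    x -= step * 2 * (x - 3)   # move against the gradient f'(x)
# x has converged to the minimizer x* = 3.
```

With this step size each iteration contracts the error by a factor of 0.8, so convergence is geometric; a line search would instead choose the step adaptively at every iteration.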
6. Trust-Region Methods
(a) Line Search vs. Trust-Region
(b) The Basics for the Trust-Region
(c) Basic Algorithms
i. Levenberg-Marquardt algorithm
ii. Dogleg Method
7. Specific Methods:
(a) Conjugate Gradient Methods
(b) Quasi-Newton Methods
8. Optimality Conditions and Duality
(a) Lagrange Multipliers
(b) The Karush-Kuhn-Tucker Conditions
(c) Constraint Qualifications
(d) Lagrangian Dual Problems
(e) Formulating the Dual Problem
(f) Linear and Quadratic Programming
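The Lagrange-multiplier idea can be sketched on an assumed toy problem: minimize f(x, y) = x² + y² subject to x + y = 1. Stationarity of the Lagrangian L = x² + y² − λ(x + y − 1) gives 2x = λ and 2y = λ, so x = y, and feasibility then forces x = y = 1/2. The Python check below confirms this numerically by searching along the constraint line:

```python
# Search along the constraint x + y = 1 for the point minimizing x^2 + y^2.
candidates = [(i / 1000, 1 - i / 1000) for i in range(1001)]
best = min(candidates, key=lambda p: p[0] ** 2 + p[1] ** 2)
# best is (0.5, 0.5), matching the Lagrange-multiplier solution.
```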
References
[1] G. Strang, Introduction to Linear Algebra. Wellesley, MA: Wellesley-
Cambridge Press, fourth ed., 2009.
[2] K. Hoffman and R. Kunze, Linear algebra. Prentice-Hall mathematics series,
Prentice-Hall, 1971.
[3] S. Lang, Algebra. Graduate Texts in Mathematics, Springer New York, 2005.
[4] R. Ash, Basic Probability Theory. Dover Books on Mathematics Series,
Dover Publications, Incorporated, 2012.
[5] J. Blitzstein and J. Hwang, Introduction to Probability. Chapman &
Hall/CRC Texts in Statistical Science, CRC Press, 2014.
[6] J. Nocedal and S. J. Wright, Numerical Optimization. Springer Series in
Operations Research, Springer, second ed., 2006.
[7] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming:
Theory and Algorithms. Wiley Publishing, 3rd ed., 2013.