This document contains lecture notes for a pattern recognition course taught by Dr. Mostafa Gadal-Haqq at Ain Shams University. The notes cover mathematical foundations of pattern recognition including probability theory, statistics, and mathematical notations. Specifically, the notes define concepts like random variables, probability distributions, expected values, variance, and conditional probability. They also provide examples of applying these concepts to problems involving events, outcomes, and data modeling. The document concludes by noting that the next lecture will cover Bayesian decision theory.
1. CSC446 : Pattern Recognition
Prof. Dr. Mostafa G. M. Mostafa
Faculty of Computer & Information Sciences
Computer Science Department
AIN SHAMS UNIVERSITY
Lecture Note 3:
Mathematical Foundations
ASU-CSC446 : Pattern Recognition. Prof. Dr. Mostafa Gadal-Haqq slide - 1
Appendix, Pattern Classification and PRML
2. CS446 : Pattern Recognition
Readings: Chapter 1 in Bishop’s PRML
Data Modeling (Regression)
3. Learning: Data Modeling
• Assume we have examples of pairs (x , y) and we
want to learn the mapping 𝑭: 𝑿 → 𝒀 to predict y
for future values of x.
$y(x) = \sin(2\pi x)$
4. Polynomial Curve Fitting
• Problem: Many possible mapping
functions 𝑭: 𝑿 → 𝒀 exist!
Which one to choose?
• We could choose the one
that minimizes the error E(w) over the model parameters w.
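A common choice of error, assumed here from the Chapter 1 reading in Bishop's PRML rather than taken from this slide, is the sum-of-squares error. The symbols $t_n$ (observed target for input $x_n$), $N$ (number of training pairs) and $y(x_n, \mathbf{w})$ (model prediction with parameters $\mathbf{w}$) are notational assumptions:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2
```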
5. Polynomial Curve Fitting
• Fitting different polynomials (models) to the
data:
$y(x) = w_0$
$y(x) = w_0 + w_1 x$
6. Polynomial Curve Fitting
• Fitting different polynomials (models) to the
data:
$y(x) = w_0 + w_1 x + w_2 x^2$
$y(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_8 x^8$
7. Overfitting
• At M = 9, we get zero training error but the
highest test error.
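The effect can be sketched in a few lines of Python. The sketch assumes N = 10 noisy samples of sin(2πx) (the sample count and the noise level 0.3 are illustrative assumptions, not values from the slide). With N distinct points, the least-squares polynomial of order N − 1 is the interpolating polynomial, so it is evaluated here with Lagrange's formula instead of solving for w explicitly.

```python
import math
import random

def lagrange_predict(xs, ts, x):
    """Evaluate the degree-(N-1) polynomial through the points (xs, ts) at x."""
    total = 0.0
    for j, (xj, tj) in enumerate(zip(xs, ts)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                basis *= (x - xk) / (xj - xk)
        total += tj * basis
    return total

def mean_squared_error(xs, ts, predict):
    """Mean of squared differences between predictions and targets."""
    return sum((predict(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs)

random.seed(0)
N = 10        # assumed number of training samples
NOISE = 0.3   # assumed noise standard deviation

def noisy_target(x):
    return math.sin(2 * math.pi * x) + random.gauss(0.0, NOISE)

# Training inputs on a grid, test inputs halfway between them.
train_x = [i / (N - 1) for i in range(N)]
train_t = [noisy_target(x) for x in train_x]
test_x = [(i + 0.5) / (N - 1) for i in range(N - 1)]
test_t = [noisy_target(x) for x in test_x]

def predict(x):
    return lagrange_predict(train_x, train_t, x)

train_error = mean_squared_error(train_x, train_t, predict)
test_error = mean_squared_error(test_x, test_t, predict)
print("training error:", train_error)
print("test error:", test_error)
```

The polynomial reproduces every training target, so the training error vanishes, while the error on the held-out points does not.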
8. Effect of Data Size
• As the number of data samples N increases, a
higher-order model gets closer to the real data
model.
[Figure: two panels, both fitted with M = 9]
9. Performance Evaluation
• Generalization error is the true error for the
population of examples we would like to optimize
– Sample mean only approximates it.
• Two ways to assess the generalization error are:
• Theoretical: Law of Large numbers
– statistical bounds on the difference between the true and
sample mean errors
• Practical: Use a separate data set with m data
samples to test the model
(Mean) test error =
10. Assignment 1
1. Derive an equation for estimating the
parameters w from the sample data for
the cases M = 1 and M = 2.
2. Use these equations to plot E(w) against
w for each M. Use the estimated values of
w as the midpoints of the plotted w
ranges.
11. CS446 : Pattern Recognition
Readings: Appendix A
Probability & Statistics
12. 1- Probability Theory
• Randomness:
–we call a phenomenon random if individual outcomes
are uncertain but there is nonetheless a regular
distribution of outcomes in a large number of
repetitions.
• Probability:
–the probability of any outcome of a random phenomenon
is the proportion of times the outcome would occur in a
very long series of repetitions.
–Probability is the long-term relative frequency.
13. 1- Probability Theory
• Discrete random variables:
–Let $x \in X$; the sample space is $X = \{v_1, v_2, \ldots, v_m\}$.
–We denote by $p_i$ the probability that $x = v_i$:
$p_i = \Pr\{x = v_i\}, \quad i = 1, \ldots, m.$
• where $p_i$ must satisfy the following two conditions:
$p_i \ge 0 \quad \text{and} \quad \sum_{i=1}^{m} p_i = 1$
14. 1- Probability Theory
• Equally likely outcomes:
“Equally likely outcomes are outcomes that
have the same probability of occurring.”
• Examples:
– Rolling a fair die
– Tossing a fair coin
• P(x) is a “Uniform Distribution”
15. 1- Probability Theory
• Equally likely outcomes:
• If we have ten identical balls numbered from 0 to 9 in a box,
find the probability of randomly drawing a ball with a positive
number divisible by 3.
– the event space (desired outcomes): A = {3, 6, 9}.
– the sample space (possible outcomes): S = {0, 1, 2, . . . , 9}.
• Since the drawing is at random, each outcome is equally
likely to occur, i.e.: P(0) = P(1) = P(2) = … = P(9) = 1/10
• P(A) = {number of outcomes in A} / {number of outcomes in S}
= 3/10 = 0.3
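The counting argument can be checked with a short Python sketch. It takes the event A = {3, 6, 9} and the sample space S = {0, ..., 9} from the slide; the variable names are illustrative.

```python
from fractions import Fraction

S = set(range(10))   # sample space: balls numbered 0..9
A = {3, 6, 9}        # event space taken from the slide

# Equally likely outcomes: each ball has probability 1/|S|.
p_outcome = Fraction(1, len(S))

# P(A) is the sum of the probabilities of the outcomes in A.
p_A = sum(p_outcome for v in S if v in A)
print(p_A, float(p_A))
```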
16. 1- Probability Theory
• Biased outcomes (non-uniform dist.):
“Biased outcomes are outcomes that have
different probabilities of occurring.”
• Examples:
– Rolling an unfair die
– Tossing an unfair coin
• P(x) is a “Non-uniform Dist.”
17. 1- Probability Theory
• Biased outcomes (non-uniform dist.):
• A biased coin, twice as likely to come up tails as
heads, is tossed twice:
– What is the probability that at least one head occurs?
• Solution:
– Sample space = {HH, HT, TH, TT}
– P(H= head) = 1/3 , P(T= tail) =2/3
– Sample points/probability for the event:
• P(HH) = 1/3 × 1/3 = 1/9
• P(HT) = 1/3 × 2/3 = 2/9
• P(TH) = 2/3 × 1/3 = 2/9
• P(TT) = 2/3 × 2/3 = 4/9
– Answer: P(HH) + P(HT) + P(TH) = 1/9 + 2/9 + 2/9 = 5/9 ≈ 0.56
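As a check on the arithmetic, the four sample points can be enumerated in Python. This is a minimal sketch using the probabilities stated above (P(H) = 1/3, P(T) = 2/3) and assuming the two tosses are independent; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

# Probability of each two-toss outcome, assuming independent tosses.
outcomes = {a + b: p[a] * p[b] for a, b in product("HT", repeat=2)}

# Event: at least one head occurs.
p_at_least_one_head = sum(pr for o, pr in outcomes.items() if "H" in o)
print(outcomes)
print(p_at_least_one_head)
```

The same value follows from the complement: 1 − P(TT) = 1 − 4/9 = 5/9.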
18. 1- Probability Theory
• Probability and Language
What’s the probability of a random word (from a random
dictionary page) being a verb?
P(drawing a verb) = (# of ways to get a verb) / (# of all words)
• Solution:
• All words: just count all the words in the dictionary
• # of ways to get a verb: the number of words which are verbs
• If a dictionary has 50,000 entries, and 10,000 are verbs,
then:
• P(Verb) = 10000/50000 = 1/5 = 0.20
19. 1- Probability Theory
• Conditional Probability
– A way to reason about the outcome of an
experiment based on partial information:
• In a word guessing game, the first letter of the word
is a “t”. How likely is it that the second letter is an “h”?
• How likely is it that a person has a disease given that a
medical test was negative?
• A spot shows up on a radar screen. How likely is it
to correspond to an aircraft?
• I saw your friend. How likely is it that I will see you?
20. 1- Probability Theory
• Conditional Probability
• Let A and B be events.
• P(A|B) = the probability of event A occurring given that event B
occurs
• Definition:
$P(A \mid B) = \frac{P(A, B)}{P(B)}$
[Figure: Venn diagram of events A and B with their overlap A,B]
Note: P(A,B) = P(A|B) · P(B)
Also: P(A,B) = P(B,A)
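A small worked case may help. The sketch below uses a hypothetical example that is not from the slides: one roll of a fair die, with A = "the result is even" and B = "the result is greater than 3".

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # fair die (hypothetical example)
A = {v for v in S if v % 2 == 0}   # even result
B = {v for v in S if v > 3}        # result greater than 3

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

p_joint = prob(A & B)              # P(A,B): outcomes in both A and B
p_A_given_B = p_joint / prob(B)    # P(A|B) = P(A,B) / P(B)
print(p_joint, p_A_given_B)
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, because two of the three outcomes in B are even.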
21. 1- Probability Theory
• Conditional Probability
• One of the following 30 items is chosen at random.
• What is P(X), the probability that it is an X?
• What is P(X|red), the probability that it is an X given that it
is red?
22. 1- Probability Theory
• Statistically Independent events
–Variables x and y are said to be
statistically independent if and only if:
$P(x, y) = P(x)\,P(y)$
–That is, knowing the value of x does not
give us any additional knowledge about
the possible value of y.
23. 1- Probability Theory
• Marginal Probability
• Conditional Probability
• Joint Probability
24. 1- Probability Theory
• Sum Rule
• Product Rule
25. 1- Probability Theory
• The Rules of Probability
• Sum Rule:
$p(X) = \sum_{Y} p(X, Y)$
• Product Rule:
$p(X, Y) = p(Y \mid X)\,p(X) = p(X \mid Y)\,p(Y)$
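Both rules can be verified numerically on a small joint table. The joint probabilities below are hypothetical values chosen only so that they sum to 1; the labels x1, x2, y1, y2 are illustrative.

```python
from fractions import Fraction

# Hypothetical joint distribution p(X, Y) over two binary variables.
joint = {
    ("x1", "y1"): Fraction(3, 10),
    ("x1", "y2"): Fraction(1, 10),
    ("x2", "y1"): Fraction(2, 10),
    ("x2", "y2"): Fraction(4, 10),
}

def marginal_x(x):
    """Sum rule: p(X) = sum over Y of p(X, Y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """Sum rule applied to the other variable."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_y_given_x(y, x):
    """Conditional p(Y|X) = p(X, Y) / p(X)."""
    return joint[(x, y)] / marginal_x(x)

def cond_x_given_y(x, y):
    """Conditional p(X|Y) = p(X, Y) / p(Y)."""
    return joint[(x, y)] / marginal_y(y)

# Product rule: both factorizations recover the joint probability.
for (x, y), p in joint.items():
    assert cond_y_given_x(y, x) * marginal_x(x) == p
    assert cond_x_given_y(x, y) * marginal_y(y) == p

print(marginal_x("x1"), cond_y_given_x("y1", "x1"))
```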
26. 1- Probability Theory
• Bayes Theorem
$p(Y \mid X) = \frac{p(X \mid Y)\,p(Y)}{p(X)}$
where
$p(X) = \sum_{Y} p(X \mid Y)\,p(Y)$
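Bayes' theorem answers questions such as "how likely is a disease given a negative test?". The sketch below uses made-up numbers (a 1% prior, a 90% detection rate and a 5% false-positive rate); all three are assumptions for illustration, not data from the lecture.

```python
from fractions import Fraction

# Hypothetical quantities, chosen only for illustration.
p_disease = Fraction(1, 100)             # prior p(disease)
p_pos_given_disease = Fraction(9, 10)    # p(positive | disease)
p_pos_given_healthy = Fraction(5, 100)   # p(positive | healthy)

p_healthy = 1 - p_disease
p_neg_given_disease = 1 - p_pos_given_disease
p_neg_given_healthy = 1 - p_pos_given_healthy

# Denominator via the sum and product rules:
# p(negative) = sum over states of p(negative | state) p(state).
p_neg = p_neg_given_disease * p_disease + p_neg_given_healthy * p_healthy

# Bayes' theorem: p(disease | negative).
p_disease_given_neg = p_neg_given_disease * p_disease / p_neg
print(p_disease_given_neg, float(p_disease_given_neg))
```

Under these assumed numbers a negative result lowers the probability of disease from 1% to roughly 0.1%.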
27. 1- Probability Theory
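• Probability mass function, P(x):
$P(x) \ge 0 \quad \text{and} \quad \sum_{x \in X} P(x) = 1$
– For a continuous variable with density p(x), P is the cumulative distribution of p(x):
$P(x \le z) = \int_{-\infty}^{z} p(x)\,dx$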
28. 2- Statistics
• Statistics is the science of collecting, organizing, and interpreting numerical
facts, which we call data.
• The best way of looking at data is to draw its histogram
(frequency distribution).
29. 2- Statistics
• Univariate Gaussian/Normal Density:
–A density that is analytically tractable
–Continuous density
–A lot of processes are asymptotically Gaussian
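$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$
Where:
$\mu$ = mean (or expected value) of x
$\sigma^2$ = squared deviation or variance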
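The normalization of the univariate Gaussian density can be checked numerically. The sketch below evaluates the density and integrates it with the trapezoid rule over ±10 standard deviations; the parameter values, integration range and step count are arbitrary illustrative choices.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def trapezoid_area(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 1.0, 2.0   # illustrative parameters
area = trapezoid_area(
    lambda x: gaussian_pdf(x, mu, sigma),
    mu - 10 * sigma,
    mu + 10 * sigma,
    20000,
)
print("area under p(x):", area)
print("peak value p(mu):", gaussian_pdf(mu, mu, sigma))
```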
30. 2- Statistics
• Univariate Gaussian/Normal Density
p(u) ~ N(0,1)
31. 2- Statistics
• Multivariate Normal Density
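– Multivariate normal density in d dimensions is:
$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t\,\Sigma^{-1}\,(\mathbf{x} - \boldsymbol{\mu}) \right]$
where:
$\mathbf{x} = (x_1, x_2, \ldots, x_d)^t$ = the multivariate random variable
$\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_d)^t$ = the mean vector
$\Sigma$ = the d×d covariance matrix; $|\Sigma|$ and $\Sigma^{-1}$ are its determinant
and inverse, respectively.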
32. 2- Statistics
• Multivariate Density: Statistically Independent
– If $x_i$ and $x_j$ are statistically independent, then
$\sigma_{ij} = 0$.
– In this case, p(x) reduces to the product of the
univariate normal densities for the components of
x. That is, if $p(x_i) \sim N(x_i \mid \mu_i, \sigma_i)$, then
$p(\mathbf{x}) = p(x_1, x_2, \ldots, x_d) = p(x_1)\,p(x_2) \cdots p(x_d) = \prod_{i=1}^{d} p(x_i)$
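The factorization can be confirmed numerically for a diagonal covariance matrix. In this sketch the σ_i are treated as standard deviations, so the covariance matrix has σ_i² on its diagonal and determinant ∏ σ_i²; that reading, the function names and the test point are assumptions for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density with standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def diagonal_mvn_pdf(x, mu, sigmas):
    """Multivariate normal density when the covariance is diag(sigma_i^2)."""
    d = len(x)
    det = math.prod(s * s for s in sigmas)   # determinant of a diagonal matrix
    quad = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigmas))
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (d / 2) * math.sqrt(det))

x = [0.5, -1.0, 2.0]        # illustrative point
mu = [0.0, 0.0, 1.0]        # illustrative mean vector
sigmas = [1.0, 2.0, 0.5]    # illustrative standard deviations

joint_density = diagonal_mvn_pdf(x, mu, sigmas)
product_density = math.prod(
    gaussian_pdf(xi, mi, si) for xi, mi, si in zip(x, mu, sigmas)
)
print(joint_density, product_density)
```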
33. 2- Statistics
• Multivariate Normal Density
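– From the multivariate normal density, the loci of
points of constant density are hyperellipsoids for
which the quadratic form $(\mathbf{x} - \boldsymbol{\mu})^t\,\Sigma^{-1}\,(\mathbf{x} - \boldsymbol{\mu})$ is
constant.
– The quantity:
$r^2 = (\mathbf{x} - \boldsymbol{\mu})^t\,\Sigma^{-1}\,(\mathbf{x} - \boldsymbol{\mu})$
is sometimes called the squared Mahalanobis
distance from x to µ.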
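A two-dimensional sketch shows how the squared Mahalanobis distance differs from the Euclidean one. The covariance matrix and the test points are hypothetical, and the 2×2 inverse is written out by hand to keep the example free of external libraries.

```python
def mahalanobis_squared(x, mu, cov):
    """Squared Mahalanobis distance r^2 for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # inverse of a 2x2 matrix
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # tmp = inverse covariance times (x - mu)
    tmp = [
        inv[0][0] * dx[0] + inv[0][1] * dx[1],
        inv[1][0] * dx[0] + inv[1][1] * dx[1],
    ]
    return dx[0] * tmp[0] + dx[1] * tmp[1]

mu = [0.0, 0.0]
cov = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical, positively correlated components

# Two points at the same Euclidean distance from mu.
r2_along = mahalanobis_squared([1.0, 1.0], mu, cov)     # along the correlation
r2_across = mahalanobis_squared([1.0, -1.0], mu, cov)   # against the correlation
print(r2_along, r2_across)
```

The point lying along the direction of correlation is closer in the Mahalanobis sense, which is why contours of constant density are ellipses rather than circles.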
35. 2- Statistics
Expected values:
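• The expected value, mean or average of the random variable
x is defined by:
$E[x] = \mu = \sum_{x \in X} x\,P(x) = \sum_{i=1}^{m} v_i\,p_i$
• If f(x) is any function of x, the expected value of f is defined
by:
$E[f(x)] = \sum_{x \in X} f(x)\,P(x)$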
36. 2- Statistics
Expected values:
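• The second moment of x is defined by:
$E[x^2] = \sum_{x \in X} x^2\,P(x)$
• The variance of x is defined by:
$\mathrm{Var}[x] = \sigma^2 = E[(x - \mu)^2] = \sum_{x \in X} (x - \mu)^2\,P(x)$
where σ is the standard deviation of x.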
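For a discrete variable these quantities are weighted sums. The sketch below computes them for a fair six-sided die, an illustrative choice of distribution, and uses exact fractions so the results can be read off directly.

```python
from fractions import Fraction

# Fair die: values v_i with probabilities p_i = 1/6 (illustrative distribution).
pmf = {v: Fraction(1, 6) for v in range(1, 7)}

def expectation(f):
    """E[f(x)] = sum over x of f(x) P(x)."""
    return sum(f(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x)                     # E[x]
second_moment = expectation(lambda x: x * x)        # E[x^2]
variance = expectation(lambda x: (x - mean) ** 2)   # E[(x - mu)^2]
std_dev = float(variance) ** 0.5

print(mean, second_moment, variance, std_dev)
```

The variance also equals E[x²] − µ², which gives a quick consistency check on the two moments.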