Correspondence Analysis for Discrete Multivariate Statistics


Correspondence analysis is a technique for approximating a contingency table with lower rank tables to analyze the relationship between two categorical variables. It works by finding pairs of correspondence factors that have unit variance with respect to the marginal distributions and are maximally correlated. The correspondence factors and their correlations are obtained from the singular value decomposition of a normalized contingency table. Hypothesis tests can then be conducted to test the independence of the categorical variables and how well a lower rank approximation fits the data. The analysis also provides a spatial representation of the row and column categories in lower dimensions.


MULTIVARIATE STATISTICS, Lesson 11.
Correspondence Analysis
(Discrete version of the canonical correlation analysis)
We have two discrete (categorical) random variables $X$ and $Y$, usually not independent, taking on $n$ and $m$ different values, respectively (e.g., hair color and eye color). The joint distribution $R$ is estimated from the joint frequencies $f_{ij}$ based on $N$ observations:

$$r_{ij} = \frac{f_{ij}}{N} \qquad (i = 1, \dots, n;\ j = 1, \dots, m).$$
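Let $R$ denote the $n \times m$ matrix of the $r_{ij}$'s. The marginal distributions $P$ and $Q$ are given by the row and column sums of $R$:

$$p_i = r_{i\cdot} = \sum_{j=1}^{m} r_{ij} \quad (i = 1, \dots, n) \qquad \text{and} \qquad q_j = r_{\cdot j} = \sum_{i=1}^{n} r_{ij} \quad (j = 1, \dots, m).$$

With a slight abuse of notation, let $P$ and $Q$ also denote the $n \times n$ and $m \times m$ diagonal matrices containing the marginal probabilities in their main diagonals, respectively.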
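The estimation step above can be sketched in a few lines of plain Python. The frequency table `F` is hypothetical (made-up counts used only for illustration), and the variable names are assumptions chosen to mirror the notation of the notes.

```python
# Hypothetical 3 x 2 table of joint frequencies f_ij (made-up counts).
F = [
    [20, 10],
    [30, 15],
    [5, 20],
]
n, m = len(F), len(F[0])

# Number of observations N and estimated joint distribution r_ij = f_ij / N.
N = sum(sum(row) for row in F)
R = [[f / N for f in row] for row in F]

# Marginal distributions: p_i is the i-th row sum, q_j the j-th column sum.
p = [sum(R[i]) for i in range(n)]
q = [sum(R[i][j] for i in range(n)) for j in range(m)]

# Diagonal matrices P (n x n) and Q (m x m) holding the marginals.
P = [[p[i] if i == k else 0.0 for k in range(n)] for i in range(n)]
Q = [[q[j] if j == k else 0.0 for k in range(m)] for j in range(m)]
```

Both `p` and `q` sum to 1 up to rounding, since they are marginals of the same estimated joint distribution.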

• Problem: to approximate the table $R$ with a lower rank table. For this purpose we are looking for correspondence factor pairs $\alpha_l, \beta_l$: variables defined on the row and column categories, respectively, having unit variance with respect to the marginal distributions and being maximally correlated under certain constraints. $\mathbb{E}_R\,\alpha\beta$ attains its maximum (1) for the trivial (constantly 1) variable pair $\alpha_1, \beta_1$. Then for $k = 2, \dots, \min\{n, m\}$ we are looking for the maximum of $\mathbb{E}_R\,\alpha\beta$ under the constraints that

$$\mathbb{E}_P\,\alpha = \mathbb{E}_Q\,\beta = 0, \qquad D^2_P\,\alpha = D^2_Q\,\beta = 1,$$
$$\mathrm{Cov}_P(\alpha, \alpha_i) = \mathrm{Cov}_Q(\beta, \beta_i) = 0 \qquad (i = 1, \dots, k - 1).$$

• Solution: by the SVD of the $n \times m$ matrix

$$B = P^{-1/2} R Q^{-1/2} = \sum_{l=1}^{r} s_l v_l u_l^T,$$

where $r$ is the rank of $R$. The singular values $1 = s_1 \ge s_2 \ge \dots \ge s_r > 0$ are the correlations of the correspondence factor pairs, while the values taken on by the $k$-th pair are the coordinates of the correspondence vectors $P^{-1/2} v_k$ and $Q^{-1/2} u_k$, respectively.
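• Hypothesis testing: to test the independence of $X$ and $Y$, the test statistic

$$t = N \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(r_{ij} - p_i q_j)^2}{p_i q_j} = N \left( \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{r_{ij}^2}{p_i q_j} - 1 \right) = N \|B - B_1\|_F^2 = N \sum_{l=2}^{r} s_l^2$$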

is used, which under independence asymptotically follows the $\chi^2((n - 1)(m - 1))$ distribution. To test the hypothesis that a $k$-rank approximation of the contingency table suffices, the statistic

$$N \|B - B_k\|_F^2 = N \sum_{l=k+1}^{r} s_l^2, \qquad k = 1, \dots, r - 1,$$

should "take on small values" (its distribution is yet unknown), where $B_k = \sum_{l=1}^{k} s_l v_l u_l^T$ is the best $k$-rank approximation of $B$ in Frobenius norm (in spectral norm, too).
• Definition: The above $t$ is called the total correspondence of the table, while $N \sum_{l=2}^{k} s_l^2 / t$ is called the correspondence explained by the first $k$ correspondence factor pairs.
• Spatial representation of the row and column categories in $k$ dimensions: by the points

$$(\alpha_1(i), \alpha_2(i), \dots, \alpha_k(i)) \quad \text{and} \quad (\beta_1(j), \beta_2(j), \dots, \beta_k(j))$$

$(i = 1, \dots, n;\ j = 1, \dots, m)$. Store them for further analysis.
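The whole procedure (SVD of $B$, test statistic, explained correspondence, and the $k$-dimensional representation) can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the frequency table `F` is hypothetical (made-up counts), the choice `k = 2` is arbitrary, and all variable names are assumptions chosen to mirror the notation above.

```python
import numpy as np

# Hypothetical 3 x 3 table of joint frequencies f_ij (made-up counts).
F = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])

N = F.sum()
R = F / N              # estimated joint distribution
p = R.sum(axis=1)      # row marginals p_i
q = R.sum(axis=0)      # column marginals q_j

# B = P^{-1/2} R Q^{-1/2}, i.e. b_ij = r_ij / sqrt(p_i q_j).
B = R / np.sqrt(np.outer(p, q))

# SVD B = sum_l s_l v_l u_l^T: columns of V are the v_l, rows of Ut the u_l^T.
V, s, Ut = np.linalg.svd(B, full_matrices=False)

# Correspondence factors: column l of alpha is P^{-1/2} v_l,
# column l of beta is Q^{-1/2} u_l.
alpha = V / np.sqrt(p)[:, None]
beta = Ut.T / np.sqrt(q)[:, None]

# Total correspondence t = N * sum_{l>=2} s_l^2, and the same quantity
# computed directly as the chi-square sum. To test independence, t would be
# compared with a chi-square quantile with (n-1)(m-1) degrees of freedom
# (not computed here).
t = N * np.sum(s[1:] ** 2)
chi2_direct = N * np.sum((R - np.outer(p, q)) ** 2 / np.outer(p, q))

k = 2  # arbitrary choice for illustration
residual_k = N * np.sum(s[k:] ** 2)        # N * ||B - B_k||_F^2
explained_k = N * np.sum(s[1:k] ** 2) / t  # correspondence explained by first k pairs

# k-dimensional representation; the first coordinate belongs to the
# trivial (constant) factor pair.
row_points = alpha[:, :k]
col_points = beta[:, :k]
```

The signs of the singular vectors returned by an SVD routine are not unique, so the coordinates are determined only up to a simultaneous sign flip of $\alpha_l$ and $\beta_l$; the singular values, $t$, and the explained correspondence are unaffected.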


