Exploring residential clusters using an optimisation technique
Seong-Yun Hong (hong.seongyun@gmail.com)
2012/05/22
General trends in research on residential segregation
• Patterns of segregation – which population group is separated from other population groups?
• Causes of segregation – what are the underlying reasons for the residential separation?
• Consequences of segregation – what does segregation imply for our society?
Measures of segregation
• Duncan and Duncan’s index of dissimilarity (1955)
• White’s index of spatial proximity (1983)
• Morrill’s adjusted index of dissimilarity (1991)
• Wong’s adjusted index of dissimilarity (1993)
• Reardon and O’Sullivan’s spatial segregation indices (2004)
Enclave vs. Ethnoburb
                             Enclave                    Ethnoburb
   Dynamics                  Forced segregation         Voluntary segregation
   Spatial form              Small scale                Small to medium scale
   Population                High density               Medium density
   Location                  Inner city                 Suburbs
   Economy                   Labour-intensive sectors   Business of all kinds
   Internal stratification   Minimum                    Very stratified
   Interaction               Mainly within group        Both within- & inter-groups
   Tension                   Between groups             Inter- & intra-group
   Community                 Mainly inward              Both inward and outward
   Example                   Traditional Chinatown      San Gabriel Valley

Source: Li, 1997
Some candidates …
• GAM (Geographical Analysis Machine) and Kulldorff’s scan statistic?
  • Originally developed for epidemiological or ecological studies, where clustering is often very rare
  • Typically used where the data are generated from observed events, such as the occurrence of a disease
• Getis-Ord’s local G* statistic and local Moran’s I?
  • Designed to detect statistically significant clustering of sample points, tested against a null hypothesis of no spatial autocorrelation in the study region
  • Have at least appeared in the relevant literature
Problems with existing methods
Source: Poulsen et al., 2010

$P(z < -5.17) = 0.000000117047$
$P(z > 10.32) = 2.861158 \times 10^{-25}$
$P(z > 20.64) = 6.003128 \times 10^{-95}$
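The probabilities above are standard normal tail areas for the quoted z-scores. The minimal Python sketch below reproduces them using only the standard library. `upper_tail` and `lower_tail` are hypothetical helper names. The sketch evaluates $P(Z > z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ rather than `1 - cdf`, because `1 - cdf` would round to zero at these magnitudes.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z, computed via erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lower_tail(z: float) -> float:
    """P(Z < z) for a standard normal variable Z (equal to P(Z > -z) by symmetry)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


for label, p in [
    ("P(z < -5.17)", lower_tail(-5.17)),
    ("P(z > 10.32)", upper_tail(10.32)),
    ("P(z > 20.64)", upper_tail(20.64)),
]:
    print(f"{label} = {p:.6e}")
```

At z-scores this large, a cluster is "significant" by many orders of magnitude. The p-values alone therefore say little about how far a cluster extends.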
Characteristics of research on residential segregation
• Studies often use census data as the primary source of information.
• The presence of residential clusters is usually apparent even on a simple choropleth map of the population.
• The difficulty lies in delineating the boundaries of residential clusters, because clusters in suburban areas have no clear borders.
• The question a statistical tool should address is therefore the extent of residential clustering rather than its presence or approximate location.
Applying optimisation techniques
• Suppose that the study region is divided into $n$ census tracts, $\Omega = \{x_1, x_2, x_3, \ldots, x_n\}$, and the aim is to identify a given number of groups whose data values are distinctly larger than those of the remaining census tracts.
• The idea behind the proposed clustering method is that the quality of a given clustering can be represented by a numerical index, so the best possible subsets can be found by optimising the index value.
• Which index should we use?
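Applying optimisation techniques
• Within-group sum of absolute deviations:

$$w = \sum_{i=0}^{g} \sum_{j=1}^{n_i} a_{ij} \left| \mu_i - b_{ij} \right|$$

 where the census tracts are partitioned into groups $A_0, A_1, \ldots, A_g$ (the clusters $A_1, \ldots, A_g$ and the remaining tracts $A_0$). Here $n_i$ is the number of census tracts in $A_i$, $a_{ij}$ is the weight of the corresponding census tract, and $b_{ij}$ is the data value of interest, such as the population density of an ethnic group. Finally, $\mu_i = \sum_{j} a_{ij} b_{ij} / \sum_{j} a_{ij}$ is the weighted mean of all data values in $A_i$.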
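To make the index concrete, the minimal Python sketch below computes $w$ for a given partition. It rests on several assumptions. Each tract carries an integer label: 0 for the remaining tracts $A_0$, and 1 to $g$ for the clusters. Weights are positive. The seven-tract values are toy numbers, not data from the talk, and `within_group_sad` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def within_group_sad(labels: Sequence[int], a: Sequence[float], b: Sequence[float]) -> float:
    """Within-group sum of absolute deviations w for one partition of the tracts.

    labels[j] is the group of tract j (0 = remaining tracts A_0, 1..g = clusters),
    a[j] its weight (assumed positive) and b[j] its data value.
    """
    groups: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for label, weight, value in zip(labels, a, b):
        groups[label].append((weight, value))
    w = 0.0
    for members in groups.values():
        total_weight = sum(weight for weight, _ in members)
        mu = sum(weight * value for weight, value in members) / total_weight  # weighted mean
        w += sum(weight * abs(mu - value) for weight, value in members)
    return w


# Toy data: seven tracts with equal weights; tracts 2-4 have high values
b = [1, 1, 9, 10, 9, 1, 1]
a = [1.0] * len(b)
print(within_group_sad([0] * 7, a, b))                # ≈ 28.57 (= 200/7), no cluster
print(within_group_sad([0, 0, 1, 1, 1, 0, 0], a, b))  # ≈ 1.33 (= 4/3), tracts 2-4 as A_1
```

Moving the three high-value tracts into their own cluster drops $w$ from about 28.57 to about 1.33. This is the kind of improvement the optimisation searches for.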
Applying optimisation techniques
• Because it is infeasible to evaluate every possible combination of census tracts, a heuristic search algorithm is needed.
• The algorithm implemented for this demonstration works as follows:
  • Step 1: Choose starting points (seeds)
  • Step 2: Calculate and compare the clustering measure
  • Step 3: Expand the current cluster
  • Step 4: Repeat the procedure for each cluster
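The four steps can be sketched in Python as a greedy region-growing search. This is an illustration under stated assumptions, not the author's implementation:

- Each cluster grows only by absorbing an adjacent, not-yet-assigned tract.
- On each pass, a cluster takes the single candidate that lowers $w$ the most.
- The search stops when no cluster can lower $w$ any further.

The function names, the `neighbours` adjacency dictionary and the toy data are hypothetical.

```python
from typing import Collection, Dict, List, Sequence, Set, Tuple


def group_sad(members: Collection[int], a: Sequence[float], b: Sequence[float]) -> float:
    """Weighted sum of absolute deviations around the weighted mean (0.0 for an empty group)."""
    if not members:
        return 0.0
    total_weight = sum(a[j] for j in members)
    mu = sum(a[j] * b[j] for j in members) / total_weight
    return sum(a[j] * abs(mu - b[j]) for j in members)


def total_w(clusters: List[Set[int]], a: Sequence[float], b: Sequence[float]) -> float:
    """w summed over the clusters A_1..A_g and the remaining tracts A_0."""
    taken = set().union(*clusters)
    remaining = [j for j in range(len(b)) if j not in taken]
    return group_sad(remaining, a, b) + sum(group_sad(c, a, b) for c in clusters)


def grow_clusters(seeds: Sequence[int], a: Sequence[float], b: Sequence[float],
                  neighbours: Dict[int, Sequence[int]]) -> Tuple[List[Set[int]], float]:
    """Greedy growth from distinct seed tracts (hypothetical sketch of Steps 1-4)."""
    clusters = [{s} for s in seeds]                  # Step 1: starting points
    w = total_w(clusters, a, b)
    improved = True
    while improved:
        improved = False
        for k, cluster in enumerate(clusters):       # Step 4: repeat for each cluster
            taken = set().union(*clusters)
            # Unassigned tracts adjacent to the current cluster are the candidates
            candidates = sorted({m for j in cluster for m in neighbours[j]} - taken)
            best_w, best_m = w, None
            for m in candidates:                     # Step 2: compute and compare w
                trial = [c | {m} if i == k else c for i, c in enumerate(clusters)]
                trial_w = total_w(trial, a, b)
                if trial_w < best_w - 1e-12:
                    best_w, best_m = trial_w, m
            if best_m is not None:                   # Step 3: expand the cluster
                cluster.add(best_m)
                w = best_w
                improved = True
    return clusters, w


# Toy example: seven tracts in a row, each adjacent to its left and right neighbour
values = [9, 10, 1, 1, 1, 8, 9]
adjacency = {j: [k for k in (j - 1, j + 1) if 0 <= k < len(values)] for j in range(len(values))}
clusters, w = grow_clusters([1, 5], [1.0] * len(values), values, adjacency)
print(clusters, w)  # [{0, 1}, {5, 6}] 2.0
```

Because the search is greedy, the final clusters can depend on the chosen seeds.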
Synthetic data sets
• Patterns generated from an exponential distribution with λ = 0.005
Synthetic data sets
• (More) patterns from the same exponential distribution
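The slides do not say how the synthetic values were arranged in space. As a hedged illustration, the Python sketch below draws values from an exponential distribution with rate $\lambda = 0.005$ (mean $1/\lambda = 200$). It then plants the largest values in a square block of a grid and shuffles the rest across the other cells. The grid size, the block position and the `synthetic_grid` name are all assumptions.

```python
import random
from typing import List, Tuple


def synthetic_grid(size: int = 20, lam: float = 0.005, block: Tuple[int, int] = (5, 10),
                   seed: int = 1) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Draw size*size values from Exp(lam) and plant the largest ones in a square block."""
    rng = random.Random(seed)
    values = sorted((rng.expovariate(lam) for _ in range(size * size)), reverse=True)
    lo, hi = block
    cluster_cells = [(r, c) for r in range(lo, hi) for c in range(lo, hi)]
    in_cluster = set(cluster_cells)
    other_cells = [(r, c) for r in range(size) for c in range(size) if (r, c) not in in_cluster]
    k = len(cluster_cells)
    rest = values[k:]
    rng.shuffle(rest)                                 # background values: no spatial structure
    grid = [[0.0] * size for _ in range(size)]
    for (r, c), v in zip(cluster_cells, values[:k]):  # hypothetical planted cluster
        grid[r][c] = v
    for (r, c), v in zip(other_cells, rest):
        grid[r][c] = v
    return grid, cluster_cells


grid, cells = synthetic_grid()
```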
Local G* with a distance-based adjacency function
• Centre-to-centre distance less than 1, 2 and 8 m
Local G* with a queen-contiguity matrix
Local G* with a queen-contiguity matrix
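For reference, here is a minimal Python sketch of the Getis-Ord local $G_i^*$ z-score with binary weights. The weight $w_{ij}$ is 1 when the centre-to-centre distance is less than a threshold $d$ (tract $i$ itself included) and 0 otherwise:

$$G_i^* = \frac{\sum_j w_{ij} x_j - \bar{x} \sum_j w_{ij}}{S \sqrt{\left(n \sum_j w_{ij}^2 - \left(\sum_j w_{ij}\right)^2\right) / (n - 1)}}, \qquad S = \sqrt{\frac{\sum_j x_j^2}{n} - \bar{x}^2}$$

On a regular grid of unit cells, a threshold of $d = 1.5$ reproduces queen contiguity plus the cell itself. The same function therefore covers both weighting schemes on the preceding slides. The toy grid and the `local_g_star` name are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def local_g_star(coords: Sequence[Tuple[float, float]], x: Sequence[float],
                 d: float) -> List[float]:
    """Getis-Ord G_i* z-scores with binary distance-band weights (w_ii = 1, requires d > 0)."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)  # population standard deviation
    scores = []
    for ci in coords:
        # w_ij = 1 if the centre-to-centre distance is less than d, else 0
        w = [1.0 if math.dist(ci, cj) < d else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, x)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy 5 x 5 grid of unit cells with a 3 x 3 block of high values in one corner
coords = [(float(c), float(r)) for r in range(5) for c in range(5)]
x = [10.0 if r < 3 and c < 3 else 1.0 for r in range(5) for c in range(5)]
z = local_g_star(coords, x, d=1.5)  # d = 1.5: queen contiguity on a unit grid
```

Changing $d$ changes the neighbourhood of every tract, and hence every $G_i^*$ value.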
Proposed approach
Proposed approach
Population composition in Auckland
Table 1. Index of dissimilarity (D) for major ethnic groups in Auckland, 2001

   Group                         D
   European                      0.387
   Asian: Chinese                0.330
   Asian: Indian                 0.358
   Asian: Korean                 0.453
   Asian: All                    0.300
   Māori                         0.321
   Pacific peoples: Samoan       0.490
   Pacific peoples: Tongan       0.511
   Pacific peoples: Cook Island  0.484
   Pacific peoples: All          0.527
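Duncan and Duncan’s index of dissimilarity is $D = \tfrac{1}{2}\sum_i \left| g_i/G - r_i/R \right|$. Here $g_i$ and $r_i$ are the counts of the group and of the comparison population in tract $i$, and $G$ and $R$ are their totals. $D$ can be read as the share of either population that would have to move for the two distributions to match. The table does not state the comparison population. The Python sketch below assumes it is the rest of the population, and its tract counts are toy values.

```python
from typing import Sequence


def dissimilarity(group: Sequence[float], rest: Sequence[float]) -> float:
    """Index of dissimilarity D between a group and a comparison population across tracts."""
    g_total = sum(group)
    r_total = sum(rest)
    return 0.5 * sum(abs(g / g_total - r / r_total) for g, r in zip(group, rest))


# Toy counts for four census tracts (not Auckland data)
print(dissimilarity([80, 10, 5, 5], [20, 30, 25, 25]))  # ≈ 0.6
```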
Pacific peoples in Auckland
• Geographic distribution of Pacific peoples in the Auckland urban areas, 2006
Results
Results
Koreans in Auckland
• Geographic distribution of Koreans in the Auckland urban areas, 2006
Results
Results
How many iterations?
• Pacific peoples in Auckland, 2006 (based on 100 simulations)
How many iterations?
• Pacific peoples in Auckland, 2006 (based on 100 simulations)
How many iterations?
• Koreans in Auckland, 2006 (based on 100 simulations)
How many iterations?
• Koreans in Auckland, 2006 (based on 100 simulations)
How many clusters (partitions)?
How many clusters (partitions)?
Random seeds vs. manual seeds
• Some unpublished figures for Pacific peoples ...
Random seeds vs. manual seeds
• Some unpublished figures for Koreans ...
Summary of results
• Like most other local statistics, the method attempts to identify a set of geographically close observations with high (or low, depending on the context) data values relative to the rest of the data.
• It does not require ‘close’ or ‘high’ to be defined before it is applied, which gives it an advantage over traditional methods in delineating the boundaries of arbitrarily shaped clusters.
Summary of results
• Similar results may be obtained from other recently developed clustering methods (e.g. Tango and Takahashi 2005, Mu and Wang 2008, Yao et al. 2011). However, these methods either set an upper limit on cluster size for computational reasons or adopt inferential statistics as the clustering criterion.
  • This may be reasonable for epidemiological research, where the clusters to be found can be small and the data are usually derived from samples. It is probably not reasonable for residential clusters of population groups.
  • Computation is more straightforward than in the other (scan statistic-based) ‘flexible’ approaches.
Possible applications
• Similar to k-means
[Figure: maps of neighbourhood type (N’hood Type) for Albany, Buffalo, Cincinnati and Newark]
Computer implementation
• Some proof-of-concept functions have been written in R.
  • They work, but they are slow.
• More stable versions will be included in the ‘seg’ package, hopefully before August of this year.
References
Duncan OD, and Duncan B. 1955. A methodological analysis of segregation indexes. American Sociological Review 20: 210-217.
White MJ. 1983. The measurement of spatial segregation. The American Journal of Sociology 88: 1008-1018.
Reardon SF, and O'Sullivan D. 2004. Measures of spatial segregation. Sociological Methodology 34: 121-162.
Poulsen M, Johnston R, and Forrest J. 2010. The intensity of ethnic residential clustering: exploring scale effects using local indicators of spatial association. Environment and Planning A 42: 874-894.
Hong S-Y, and O'Sullivan D. 2012. Detecting ethnic residential clusters using an optimisation clustering method. International Journal of Geographical Information Science: 1-21.

185th Colloquium: presentation slides by Dr. Seong-Yun Hong
