The Problem 
A MapReduce Algorithm to Create Contiguity Weights for Spatial Analysis of Big Data
Xun Li, Wenwen Li, Luc Anselin, Sergio Rey, Julia Koschinsky
Nov 4, 2014
BIGSPATIAL 2014
Big Spatial Data Challenge 
Cyber-Framework: CyberGIS, Spatial Hadoop 
Big Spatial Data Domain
• Spatial Data Management
• Computing Grids
• Super Computers
• HPC
• Spatial Analysis
• Cloud Computing Platform
• Visualization
• Spatial Process Modeling
• Spatial Pattern Detection
Spatial Analysis on Big Data 
Spatial Analysis
• Spatial Data Preprocessing
• Spatial Data Exploration
• Spatial Model Specification
• Spatial Model Estimation
• Spatial Model Validation
Example:
• Spatial Clustering/Autocorrelation
• Spatial Lag Model
• Spatial Error Model
• Spatial Weights: W
• Spatial Statistics
Spatial Weights
• Spatial weights are an essential component of spatial analysis wherever a representation of spatial structure is needed.
• Tobler: “Everything is related to everything else, but near things are more related than distant things”.
Create Spatial Weights (W) 
• Extract spatial structure: 
• Spatial neighboring information (contiguity based weights) 
• Spatial distance information (distance based weights) 
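Contiguity based Weights
  A B C D E
A 0 1 0 0 0
B 1 0 1 1 0
C 0 1 0 1 1
D 0 1 1 0 0
E 0 0 1 0 0
Distance based Weights
[Figure: symmetric 5 x 5 distance matrix over A to E with a zero diagonal. Entries legible in the extracted text: A-B = 1.2, B-C = 2.3, B-D = 0.7, C-D = 1.1. The remaining values (0.1, 0.3, 2.5, 2.5, 3.5, 4.5) cannot be assigned to cells from the extracted text.]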
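As a small illustration of how neighboring information becomes a contiguity weights matrix, the sketch below builds the 0/1 matrix from per-polygon neighbor lists. The neighbor lists are read off the contiguity example on this slide; the function name `contiguity_matrix` and the list-of-lists layout are assumptions made for illustration, not part of the original talk.

```python
# Neighbor lists read off the contiguity example (A to E) on this slide.
neighbors = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D", "E"],
    "D": ["B", "C"],
    "E": ["C"],
}


def contiguity_matrix(neighbors):
    """Return (ids, W) where W[i][j] == 1 if ids[i] and ids[j] are neighbors.

    Hypothetical helper: a dense list-of-lists matrix is only practical
    for small examples like this one.
    """
    ids = sorted(neighbors)
    index = {pid: i for i, pid in enumerate(ids)}
    w = [[0] * len(ids) for _ in ids]
    for pid, nbrs in neighbors.items():
        for other in nbrs:
            w[index[pid]][index[other]] = 1
    return ids, w


ids, W = contiguity_matrix(neighbors)
for pid, row in zip(ids, W):
    print(pid, *row)
```

For large data sets a sparse neighbor list is usually the more practical representation than a dense matrix, since most polygon pairs are not neighbors.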
Contiguity Spatial Weights: how to find neighbors 
Classic Algorithms:
• Brute-force search:
• Test A against B,C,D,E | B against C,D,E | C against D,E | D against E
• O(n^2)
• Spatial Index:
• Binning algorithm
• r-tree index
• O(n log n)
• Rook Contiguity: neighbors share borders
• Queen Contiguity: neighbors share borders or vertices
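The brute-force search and the two contiguity rules can be sketched as follows. This is a minimal illustration, assuming polygons are given as rings of exactly matching vertex tuples; the toy squares and the names `ring_edges` and `brute_force_neighbors` are hypothetical and not taken from the talk.

```python
from itertools import combinations


def ring_edges(ring):
    """Undirected edges of a closed ring given as a list of vertices."""
    n = len(ring)
    return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}


def brute_force_neighbors(polygons, rule="queen"):
    """Test every pair of polygons once: O(n^2) pair comparisons.

    queen: neighbors share at least one vertex.
    rook:  neighbors share at least one edge.
    Assumption: shared vertices have exactly identical coordinates.
    """
    pairs = set()
    for (a, ring_a), (b, ring_b) in combinations(sorted(polygons.items()), 2):
        if rule == "queen":
            touching = bool(set(ring_a) & set(ring_b))
        else:
            touching = bool(ring_edges(ring_a) & ring_edges(ring_b))
        if touching:
            pairs.add((a, b))
    return pairs


# Hypothetical toy data: A and B share an edge, B and C share one corner.
squares = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 1), (3, 1), (3, 2), (2, 2)],
}
queen_pairs = brute_force_neighbors(squares, "queen")
rook_pairs = brute_force_neighbors(squares, "rook")
print(sorted(queen_pairs), sorted(rook_pairs))
```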
Parallelize Spatial Weights Creation for big data? 
Split data with a buffer zone 
A B C D E 
A 0 1 1 1 0 
B 1 0 0 1 0 
C 1 0 0 1 0 
D 1 1 1 0 1 
E 0 0 0 1 0
Counting Algorithm for Contiguity Weights Creation 
Counting Algorithms:
• Inspired by TopoJSON:
• Identical vertices are stored only once.
• Count how many polygons share a point (Queen Weights): O(n)
[Figure: polygon map with vertices numbered 1 to 20]
Count A:
{1:[A], 2:[A], 3:[A], 4:[A], 5:[A], 6:[A]}
Count B:
{1:[A], 2:[A], 3:[A], 4:[A], 5:[A,B], 6:[A,B], 7:[B], 8:[B], 9:[B], 10:[B]}
Count C:
{1:[A], 2:[A], 3:[A,C], 4:[A,C], 5:[A,B], 6:[A,B], 7:[B], 8:[B], 9:[B], 10:[B], 13:[C], 14:[C], 15:[C], 16:[C]}
Neighbors:
[A,C]
[A,B]
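The queen counting steps above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the vertex lists for A, B and C are read off the Count A/B/C tables, and the function name `queen_neighbors` is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Vertex ids per polygon, as implied by the Count A / Count B / Count C tables.
polygons = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [5, 6, 7, 8, 9, 10],
    "C": [3, 4, 13, 14, 15, 16],
}


def queen_neighbors(polygons):
    """One pass over all vertices: record which polygons use each vertex."""
    owners = defaultdict(list)
    for pid, vertices in polygons.items():
        for v in set(vertices):  # a closed ring may repeat its first vertex
            owners[v].append(pid)
    # Any vertex used by two or more polygons makes those polygons neighbors.
    pairs = set()
    for pids in owners.values():
        pairs.update(combinations(sorted(pids), 2))
    return owners, pairs


owners, pairs = queen_neighbors(polygons)
print(sorted(pairs))
```

Each vertex is visited once, so the work grows with the total number of vertices instead of the number of polygon pairs.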
Counting Algorithm for Contiguity Weights Creation 
Counting Algorithms:
• Count how many polygons share an edge (Rook Weights): O(n)
[Figure: polygon map with vertices numbered 1 to 20]
Count A:
{(1,2):[A], (2,3):[A], (3,4):[A], (4,5):[A], (5,6):[A], (6,1):[A]}
Count B:
{(1,2):[A], (2,3):[A], (3,4):[A], (4,5):[A], (5,6):[A,B], (6,1):[A], (6,7):[B], (7,8):[B], (8,9):[B], (9,10):[B]}
Neighbors:
[A,B]
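The rook variant counts edges instead of points. In the sketch below, each edge key is stored with its smaller vertex id first, so that two polygons traversing the same edge in opposite directions produce the same key; that normalization is an assumption of this sketch (the slide writes the closing edge of A as (6,1)), and the function name `rook_neighbors` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Vertex rings for A and B, as implied by the Count A / Count B tables.
polygons = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [5, 6, 7, 8, 9, 10],
}


def rook_neighbors(polygons):
    """One pass over all edges: record which polygons use each edge."""
    owners = defaultdict(set)
    for pid, ring in polygons.items():
        for i, v in enumerate(ring):
            w = ring[(i + 1) % len(ring)]  # wrap around to close the ring
            # Assumption: normalize direction so (6,1) and (1,6) are one key.
            owners[(min(v, w), max(v, w))].add(pid)
    pairs = set()
    for pids in owners.values():
        pairs.update(combinations(sorted(pids), 2))
    return owners, pairs


owners, pairs = rook_neighbors(polygons)
print(sorted(pairs))
```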
Parallel Counting Algorithm? 
[Figure: the numbered-vertex polygon map divided into two splits]
Count Results (split 1):
{1:[A], 2:[A], 3:[A,C], 4:[A,C], 5:[A], 6:[A], 13:[C], 14:[C], …}
Count Results (split 2):
{5:[B,D], 6:[B], …, 9:[B], 10:[B,D], 11:[D,E], 12:[D,E], 13:[D], …}
Parallel Counting Algorithm? (cont.)
Print line by line (split 1):
1:[A]
2:[A]
3:[A,C]
4:[A,C]
5:[A]
6:[A]
13:[C]
14:[C]
…
Print line by line (split 2):
5:[B,D]
6:[B]
…
9:[B]
10:[B,D]
11:[D,E]
12:[D,E]
13:[D]
…
[Figure: the numbered-vertex polygon map divided into two splits]
Merge & Sort Two Results:
1:[A]
2:[A]
3:[A,C]
4:[A,C]
4:[A]
4:[D]
5:[A]
5:[B,D]
6:[A]
6:[B]
7:[B]
11:[D,E]
12:[D,E]
13:[C]
13:[D]
14:[C]
…
Vertices shared by two or more polygons after merging:
{3:[A,C]}
{4:[A,C,D]}
{5:[A,B,D]}
{6:[A,B]}
{11:[D,E]}
{12:[D,E]}
{13:[C,D]}
Resulting weights:
  A B C D E
A 0 1 1 1 0
B 1 0 0 1 0
C 1 0 0 1 0
D 1 1 1 0 1
E 0 0 0 1 0
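The merge-and-sort step can be simulated locally to check the numbers on this slide. The sketch below is an assumption-laden illustration: the two input lists are abridged from the lines shown on the slide (the elided "…" entries are left out), and the helper names `reduce_counts` and `neighbor_pairs` are hypothetical.

```python
import heapq
from itertools import combinations, groupby
from operator import itemgetter

# Per-split counting output, abridged from the slide and sorted by vertex id.
split1 = [(1, ["A"]), (2, ["A"]), (3, ["A", "C"]), (4, ["A", "C"]),
          (5, ["A"]), (6, ["A"]), (13, ["C"]), (14, ["C"])]
split2 = [(4, ["D"]), (5, ["B", "D"]), (6, ["B"]), (7, ["B"]),
          (11, ["D", "E"]), (12, ["D", "E"]), (13, ["D"])]


def reduce_counts(*sorted_splits):
    """Merge sorted per-split results and combine entries per vertex."""
    merged = heapq.merge(*sorted_splits, key=itemgetter(0))
    reduced = {}
    for vertex, group in groupby(merged, key=itemgetter(0)):
        owners = sorted({pid for _, pids in group for pid in pids})
        if len(owners) > 1:  # only shared vertices imply neighbors
            reduced[vertex] = owners
    return reduced


def neighbor_pairs(reduced):
    """Every pair of polygons sharing a vertex is a (queen) neighbor pair."""
    return {pair for owners in reduced.values()
            for pair in combinations(owners, 2)}


reduced = reduce_counts(split1, split2)
pairs = neighbor_pairs(reduced)
print(reduced)
print(sorted(pairs))
```

The resulting pairs correspond to the non-zero cells of the weights matrix shown on the slide.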
MapReduce Contiguity Weights Creation 
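[Figure: MapReduce data flow]
• Input HDFS: Data divided into split1, split2, split3, split4
• map: one map task per split
• Sorted results1, Sorted results2
• reduce: two reduce tasks write W.part0 and W.part1
• Output HDFS: W.part0, W.part1, then DistCP, then W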
MapReduce Contiguity Weights Creation (cont.)
Other Details:
• Input data (one polygon per line: polygon ID followed by its vertex IDs), e.g.
A, 1,2,3,4,5,6
• Output data: *.gal file (two lines per polygon: ID and neighbor count, then the neighbor IDs), e.g.
A 3
B C D
• Source code:
https://github.com/lixun910/mrweights
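To make the input and output formats concrete, here is a hedged sketch of a mapper and reducer in the style of a Hadoop streaming job. It is not the code from the repository above: the function names, the tab-separated `vertex<TAB>polygon` intermediate records, and the sample lines for B, C and D are all assumptions chosen so that the output reproduces the `A 3` / `B C D` example.

```python
from collections import defaultdict
from itertools import groupby


def map_lines(lines):
    """Mapper: 'A, 1,2,3' becomes one 'vertex<TAB>polygon' record per vertex."""
    for line in lines:
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pid, vertices = fields[0], fields[1:]
        for v in sorted(set(vertices)):
            yield f"{v}\t{pid}"


def reduce_lines(sorted_records):
    """Reducer: records arrive sorted by key, so equal vertices are adjacent."""
    neighbors = defaultdict(set)
    keyed = (rec.rstrip("\n").split("\t") for rec in sorted_records)
    for _, group in groupby(keyed, key=lambda kv: kv[0]):
        owners = {pid for _, pid in group}
        for pid in owners:
            neighbors[pid] |= owners - {pid}
    return neighbors


def to_gal(neighbors):
    """Two lines per polygon: 'ID count', then the neighbor IDs."""
    out = []
    for pid in sorted(neighbors):
        nbrs = sorted(neighbors[pid])
        out.append(f"{pid} {len(nbrs)}")
        out.append(" ".join(nbrs))
    return "\n".join(out)


# Hypothetical input: only the line for A appears on the slide.
lines = ["A, 1,2,3,4,5,6", "B, 5,6,7,8", "C, 3,4,13,14", "D, 4,5,11,12"]
gal = to_gal(reduce_lines(sorted(map_lines(lines))))
print(gal)
```

In an actual streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output, with the framework performing the sort between them; `sorted(...)` stands in for that step here.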
Experiments 
Original Data:
• Parcel data for the city of Chicago, United States
• 592,521 polygons
Artificial Big Data:
• Duplicate the original data several times, side by side
• For example: 4x the original data gives 2,370,084 polygons
• The largest test data set is 32x the original data
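The polygon counts of the artificial data sets follow directly from the duplication factor. The short calculation below reproduces the 4x figure quoted above; the list of factors (1, 2, 4, 8, 16, 32) is taken from the Experiment-1 slide, and the variable names are illustrative.

```python
ORIGINAL_POLYGONS = 592_521  # Chicago parcel data, from the slide

# Duplication factors used in the experiments (1x up to 32x).
sizes = {f"{k}x": k * ORIGINAL_POLYGONS for k in (1, 2, 4, 8, 16, 32)}

for label, count in sizes.items():
    print(f"{label:>3}: {count:,} polygons")
```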
Experiment 
Test System
• Desktop Computer
• 2.93 GHz 8-core CPU, 16 GB memory, 100 GB HD and 64-bit Operating System
• Hadoop System
• Amazon Elastic MapReduce (EMR)
• 1 to 18 nodes of “C3 Extra Large” compute instances (7.5 GB memory, 14 compute units (4 cores x 3.5 units) CPU, 80 GB storage (2 x 40 GB SSD), 64-bit Operating System and 500 Mbps moderate network speed)
Experiment 
Code/Application
• Desktop version (Python)
• No parallelization
• Hadoop version (Python)
• Executed via the Hadoop streaming pipeline
Experiment-1 
PC vs. Hadoop
• Data: 1x, 2x, 4x, 8x, 16x and 32x data, respectively
• Hadoop setup: 6 nodes of C3.xlarge
Experiment-2 
Hadoop with different numbers of nodes on the 32x data
• Hadoop setup: 6, 12, 14, 18 nodes of C3.xlarge
Integration into a Weights Creation Web Service
HPC Pool & Hadoop
Threshold to trigger Hadoop Weights Creation: 2 million polygons
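A threshold rule like this one could be expressed as a small dispatch function. The sketch below is purely illustrative: the backend labels, the function name `choose_backend`, and the choice to treat the threshold as inclusive are assumptions, since the slide only states the 2 million polygon figure.

```python
HADOOP_THRESHOLD = 2_000_000  # polygons, as stated on the slide


def choose_backend(n_polygons, threshold=HADOOP_THRESHOLD):
    """Pick where to run weights creation for a data set of a given size.

    Assumption: data sets at or above the threshold go to Hadoop; the
    slide does not say whether the boundary itself is included.
    """
    return "hadoop" if n_polygons >= threshold else "hpc_pool"


print(choose_backend(592_521))    # original Chicago data
print(choose_backend(2_370_084))  # 4x data
```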
Issues 
• This algorithm does not work when spatial neighbors do not share points or edges (it requires shared points to be exactly identical)
• This algorithm cannot generate distance-based weights
• Potential solution
• Use MapReduce r-tree (SpatialHadoop)
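The first issue is easy to reproduce with a point-counting sketch. In the hypothetical example below, two squares that share a border are detected when their common vertices are identical, but not when one square's coordinates are off by a tiny amount, because the counting relies on exact key matches. The data and the function name `shared_vertices` are assumptions for illustration only.

```python
from collections import defaultdict


def shared_vertices(polygons):
    """Return the vertices used by two or more polygons (exact matches only)."""
    owners = defaultdict(set)
    for pid, ring in polygons.items():
        for v in ring:
            owners[v].add(pid)
    return {v: pids for v, pids in owners.items() if len(pids) > 1}


# Two unit squares sharing the border x = 1.0 with identical vertices.
exact = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

# Same squares, but B's left border is digitized with a tiny offset.
eps = 1e-9
shifted = {
    "A": exact["A"],
    "B": [(1.0 + eps, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0 + eps, 1.0)],
}

print(len(shared_vertices(exact)))    # shared vertices found
print(len(shared_vertices(shifted)))  # none found, so no neighbors detected
```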
Conclusion 
• Contribution: a MapReduce algorithm to create a contiguity weights matrix for big spatial data
• Ongoing work: use an existing MapReduce r-tree to address the limitations of this algorithm
Thanks! 

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Big spatial2014 mapreduceweights

  • 1. The Problem. A MapReduce Algorithm to Create Contiguity Weights for Spatial Analysis of Big Data. Xun Li, Wenwen Li, Luc Anselin, Sergio Rey, Julia Koschinsky. BIGSPATIAL 2014, Nov 4, 2014.
  • 2. Big Spatial Data Challenge
      Cyber-framework: CyberGIS, Spatial Hadoop
      Computing resources: computing grids, supercomputers, HPC, cloud computing platforms
      Big spatial data domain: spatial data management, spatial analysis, visualization, spatial process modeling, spatial pattern detection
  • 3. Spatial Analysis on Big Data
      Spatial analysis: spatial data preprocessing, spatial data exploration, spatial model specification, spatial model estimation, spatial model validation
      Examples: spatial clustering/autocorrelation, spatial lag model, spatial error model, spatial statistics
      Spatial weights: W
  • 4. Spatial Weights
      • Spatial weights are an essential component of spatial analysis wherever a representation of spatial structure is needed.
      • Tobler: “Everything is related to everything else, but near things are more related than distant things.”
      Create Spatial Weights (W)
      • Extract spatial structure:
        • Spatial neighboring information (contiguity-based weights)
        • Spatial distance information (distance-based weights)
      Contiguity-based weights:
          A B C D E
        A 0 1 0 0 0
        B 1 0 1 1 0
        C 0 1 0 1 1
        D 0 1 1 0 0
        E 0 0 1 0 0
      Distance-based weights: [matrix over A to E; cell values not recoverable from the slide extraction]
  • 5. Contiguity Spatial Weights: how to find neighbors
      • Rook contiguity: neighbors share borders
      • Queen contiguity: neighbors share borders or vertices
      Classic algorithms:
      • Brute-force search: O(n^2)
        • Test A against B, C, D, E | B against C, D, E | C against D, E | D against E
      • Spatial index: O(n log n)
        • Binning algorithm
        • r-tree index
  • 6. Parallelize Spatial Weights Creation for big data?
      Split data with a buffer zone
          A B C D E
        A 0 1 1 1 0
        B 1 0 0 1 0
        C 1 0 0 1 0
        D 1 1 1 0 1
        E 0 0 0 1 0
  • 7. Counting Algorithm for Contiguity Weights Creation
      Counting algorithms:
      • Inspired by TopoJSON: identical vertices are stored only once.
      • Count how many polygons share a point (queen weights): O(n)
      [Figure: polygons with vertices numbered 1 to 20]
      Count A: {1:[A], 2:[A], 3:[A], 4:[A], 5:[A], 6:[A]}
      Count B: {1:[A], 2:[A], 3:[A], 4:[A], 5:[A,B], 6:[A,B], 7:[B], 8:[B], 9:[B], 10:[B]}
      Count C: {1:[A], 2:[A], 3:[A,C], 4:[A,C], 5:[A,B], 6:[A,B], 7:[B], 8:[B], 9:[B], 10:[B], 13:[C], 14:[C], 15:[C], 16:[C]}
      Neighbors: [A,C], [A,B]
  • 8. Counting Algorithm for Contiguity Weights Creation
      Counting algorithms:
      • Count how many polygons share an edge (rook weights): O(n)
      [Figure: polygons with vertices numbered 1 to 20]
      Count A: {(1,2):[A], (2,3):[A], (3,4):[A], (4,5):[A], (5,6):[A], (6,1):[A]}
      Count B: {(1,2):[A], (2,3):[A], (3,4):[A], (4,5):[A], (5,6):[A,B], (6,1):[A], (6,7):[B], (7,8):[B], (8,9):[B], (9,10):[B]}
      Neighbors: [A,B]
  • 9. Parallel Counting Algorithm?
      [Figure: the polygons divided into two splits, vertices numbered 1 to 20]
      Count results (first split): {1:[A], 2:[A], 3:[A,C], 4:[A,C], 5:[A], 6:[A], 13:[C], 14:[C], …}
      Count results (second split): {5:[B,D], 6:[B], …, 9:[B], 10:[B,D], 11:[D,E], 12:[D,E], 13:[D], …}
  • 10. Parallel Counting Algorithm? (cont.)
      [Figure: the same two splits]
      Print line by line (first split): 1:[A] 2:[A] 3:[A,C] 4:[A,C] 5:[A] 6:[A] 13:[C] 14:[C] …
      Print line by line (second split): 5:[B,D] 6:[B] … 9:[B] 10:[B,D] 11:[D,E] 12:[D,E] 13:[D] …
      Merge and sort the two results: 1:[A] 2:[A] 3:[A,C] 4:[A,C] 4:[A] 4:[D] 5:[A] 5:[B,D] 6:[A] 6:[B] 7:[B] 11:[D,E] 12:[D,E] 13:[C] 13:[D] 14:[C] …
      Shared vertices after merging: {3:[A,C]} {4:[A,C,D]} {5:[A,B,D]} {6:[A,B]} {11:[D,E]} {12:[D,E]} {13:[C,D]}
      Resulting contiguity matrix:
          A B C D E
        A 0 1 1 1 0
        B 1 0 0 1 0
        C 1 0 0 1 0
        D 1 1 1 0 1
        E 0 0 0 1 0
  • 11. MapReduce Contiguity Weights Creation
      Input (HDFS): data divided into split1, split2, split3, split4
      map (one per split) → sorted results1, sorted results2
      reduce (one per sorted result) → W.part0, W.part1
      DistCP → W (output, HDFS)
  • 12. MapReduce Contiguity Weights Creation (cont.)
      Other details:
      • Input data (each line), e.g.
          A, 1,2,3,4,5,6
      • Output data, a *.gal file (every two lines), e.g.
          A 3
          B C D
      • Source code: https://github.com/lixun910/mrweights
  • 13. Experiments
      Original data:
      • Parcel data for the city of Chicago, United States
      • 592,521 polygons
      Artificial big data:
      • The original data duplicated several times, side by side
      • For example, the 4x data has 2,370,084 polygons
      • The largest test dataset is 32x the original data
  • 14. Experiment
      Test systems:
      • Desktop computer
        • 2.93 GHz 8-core CPU, 16 GB memory, 100 GB hard disk, 64-bit operating system
      • Hadoop system
        • Amazon Elastic MapReduce (EMR)
        • 1 to 18 nodes of the “C3 Extra Large” instance type (7.5 GB memory, 4 cores at 3.5 compute units each for 14 units in total, 80 GB of storage as 2 x 40 GB SSD, 64-bit operating system, and moderate network speed of 500 Mbps)
  • 15. Experiment
      Code/application:
      • Desktop version (Python): no parallelism
      • Hadoop version (Python): executed via the Hadoop streaming pipeline
  • 16. Experiment 1: PC vs. Hadoop
      • Data: 1x, 2x, 4x, 8x, 16x, and 32x data
      • Hadoop setup: 6 nodes of C3.xlarge
  • 17. Experiment 2: Hadoop with different numbers of nodes on the 32x data
      • Hadoop setup: 6, 12, 14, and 18 nodes of C3.xlarge
  • 18. Integration into the Weights Creation Web Service
      HPC pool and Hadoop
      Threshold to trigger Hadoop weights creation: 2 million polygons
  • 19. Issues
      • This algorithm does not work when spatial neighbors do not share points or edges (it requires shared points to be exactly the same).
      • This algorithm cannot generate distance-based weights.
      • Potential solution: use a MapReduce r-tree (SpatialHadoop).
  • 20. Conclusion
      • Contribution: a MapReduce algorithm to create a contiguity weights matrix for big spatial data
      • Ongoing work: use an existing MapReduce r-tree to address the potential issues of this algorithm
  • 21. Thanks!
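
The counting idea on slides 7 and 8 can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the authors' implementation (their code is in the repository linked on slide 12). The function names are hypothetical. The vertex lists for A and B follow the slides' count tables, the vertex order within C is assumed, and polygon X (vertices 1, 21, 22) is an invented example added only to show a pair that is a queen neighbor but not a rook neighbor.

```python
from collections import defaultdict
from itertools import combinations


def _owners_to_neighbors(polygons, owners):
    """Turn {key: polygons owning that key} into {polygon: set of neighbors}."""
    neighbors = {name: set() for name in polygons}
    for sharing in owners.values():
        # Every pair of polygons that owns the same key is contiguous.
        for p, q in combinations(sorted(sharing), 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors


def queen_neighbors(polygons):
    """Queen contiguity: polygons sharing at least one vertex id."""
    owners = defaultdict(set)
    for name, ring in polygons.items():
        for vertex in ring:
            owners[vertex].add(name)
    return _owners_to_neighbors(polygons, owners)


def rook_neighbors(polygons):
    """Rook contiguity: polygons sharing at least one edge."""
    owners = defaultdict(set)
    for name, ring in polygons.items():
        # Pair each vertex with the next one, closing the ring at the end.
        for a, b in zip(ring, ring[1:] + ring[:1]):
            owners[frozenset((a, b))].add(name)  # undirected edge
    return _owners_to_neighbors(polygons, owners)


# Vertex ids per polygon, listed in ring order.
polys = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [5, 6, 7, 8, 9, 10],
    "C": [3, 4, 13, 14, 15, 16],   # ring order assumed
    "X": [1, 21, 22],              # hypothetical: touches A at vertex 1 only
}
queen = queen_neighbors(polys)
rook = rook_neighbors(polys)
print(queen)
print(rook)
```

Both functions make one pass over the vertices, which is where the O(n) on the slides comes from. As slide 19 notes, the approach depends on shared points having exactly the same identifier.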

Editor's Notes

  1. Hot topic. Much research has focused on creating a cyber-framework. Computing resources include computing grids, supercomputers, HPC, cloud computing platforms, etc. Five important components. Spatial analysis gives scientists the ability to analyze big data statistically.
  2. Is a process of ... Spatial weights are an essential part of spatial analysis since they represent the geographic structure of spatial objects. For example, ... However, current data structures and algorithms are based on a single-desktop computer architecture. Some research has tried to parallelize spatial analysis, but it is still not capable of dealing with big data. And no one talks about creating spatial weights, which is the first step toward solving this problem.
  3. Spatial Weights. Create Spatial Weights. What is W? W is most often represented as a matrix, called a weights matrix. Each cell value represents the spatial relationship between objects i and j. If the cell value is zero, the two objects have no spatial relationship in this weights matrix. A contiguity weights matrix is a binary matrix: a value of 1 means the two objects are contiguous, that is, they are neighbors. A distance weights matrix uses the actual distance between two objects.
  4. An r-tree works by grouping nearby objects using their bounding boxes at different hierarchical levels for fast search. For each spatial object, it takes O(log n) time to find candidate neighbors. An r-tree has a faster search time than the binning algorithm, but it takes longer to build the r-tree index. So the binning algorithm is more practical than the r-tree.
  5. However, finding a buffer zone takes extra time, and since the geometries have irregular shapes, most of the time it is hard to find a proper buffer zone. Another solution, which we are trying now, is to use a MapReduce r-tree; we can talk about it later.
  6. HDFS: Hadoop Distributed File System
  7. Since Hadoop spends extra time distributing the program and communicating with the running nodes, it is actually slower than running the same program on the desktop computer for datasets smaller than 4x the raw data (2 million polygons). However, the bigger the data, the better this algorithm performs on the Hadoop system. For example, on the 8x data the algorithm took 167 seconds to complete on Hadoop, much faster than on the desktop computer (482.67 seconds). The PC cannot handle the 16x data (8 million). The running time increases linearly, which means this algorithm can be scaled up as the data grows.
  8. The best performance in all tests came from using 18 computing nodes in Hadoop, which created the contiguity weights file for the 32x data in 163 seconds. The running time does not decline linearly as the number of computing nodes increases. This is reasonable, since a larger number of nodes spends extra time communicating inside the Hadoop system.
  9. Web Processing Service (WPS)
  10. We demonstrate the capability and efficiency of this algorithm by generating the weights file for big spatial data using Amazon's Hadoop system.
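
The map, sort, and reduce flow of slides 10 to 12 can be imitated locally as below. This is a simplified sketch under stated assumptions rather than the deck's code. The function names are hypothetical, and the mapper here emits one (vertex, polygon) pair per vertex instead of the per-split count tables shown on slide 9. The vertex lists are reconstructed from the slides' count tables, with the vertices of E beyond 11 and 12 assumed, and any header line a GAL reader may expect is omitted. In an actual Hadoop streaming job the mapper and reducer would be separate scripts reading standard input and writing tab-separated lines, with the framework doing the sort.

```python
from itertools import combinations, groupby


def mapper(lines):
    """Yield (vertex_id, polygon_id) for each input line like 'A, 1,2,3'."""
    for line in lines:
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        poly, vertices = fields[0], fields[1:]
        for v in set(vertices):
            yield int(v), poly


def reducer(sorted_pairs):
    """Consume pairs sorted by vertex id and yield contiguous polygon pairs."""
    for _, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        polys = sorted({poly for _, poly in group})
        for a, b in combinations(polys, 2):
            yield a, b


def to_gal(pairs, all_ids):
    """Format neighbor pairs as two-line records: 'id count' then neighbors."""
    neighbors = {pid: set() for pid in all_ids}
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    out = []
    for pid in sorted(neighbors):
        nbrs = sorted(neighbors[pid])
        out.append(f"{pid} {len(nbrs)}")
        out.append(" ".join(nbrs))
    return "\n".join(out)


# Two input splits in the slide-12 line format.
split1 = ["A, 1,2,3,4,5,6", "C, 3,4,13,14,15,16"]
split2 = ["B, 5,6,7,8,9,10", "D, 4,5,10,11,12,13",
          "E, 11,12,17,18,19,20"]  # E's vertices 17 to 20 are assumed

# Stand-in for the shuffle phase: merge the mapper outputs and sort by key.
shuffled = sorted(list(mapper(split1)) + list(mapper(split2)))
pairs = set(reducer(shuffled))
gal = to_gal(pairs, ["A", "B", "C", "D", "E"])
print(gal)
```

Because contiguity is decided only after the merge, a vertex such as 4 that appears in both splits still links A, C, and D, which is why this approach needs no buffer zone around each split.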