© 2016 IBM Corporation
High Performance Spatial-Temporal Trajectory Analysis with Spark
YongHua (Henry) Zeng
zengyh@cn.ibm.com
Big Data & Analytics Solution Architect
Analytics Platform Services, IBM China Lab
Agenda
• Background
• Architecture
• Technical Design
  • Big Data Platform design
  • Data governance design
  • Algorithm model
  • Spark spatial computing
• Scenarios demo
• Conclusion and Next step
Background Introduction
Studying human trajectories with mobile signal data
Problem
• Traditional planning methods cannot handle the variety of data now available
• Much of this data has the characteristics of big data (volume, velocity, variety)
• Cellular signaling data is a typical example: it can enable new types of applications that support smarter urban planning
• Analyzing cellular signal data can help urban planners and city governing bodies better understand the city
Data Set
• Cellular signal data
• 5M mobile users
• 25M to 50M of data every minute; 30 GB of data daily
• ~400M cellular signal records daily
• More data coming from GPS and RFID for 4M vehicles
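The data set figures translate into average ingest rates that are useful for capacity sizing. The short sketch below does the arithmetic, assuming decimal units (1 GB = 10^9 bytes) and that the ~30 GB and ~400M records describe the same daily feed; the function name `average_rates` is illustrative.

```python
# Back-of-the-envelope average rates from the daily data set figures.
# Assumes decimal units and a uniform rate over the day (an assumption:
# real cellular signaling traffic is bursty).
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def average_rates(records_per_day: int, bytes_per_day: int) -> dict:
    return {
        "records_per_second": records_per_day / SECONDS_PER_DAY,
        "bytes_per_record": bytes_per_day / records_per_day,
        "megabytes_per_minute": bytes_per_day / 10**6 / MINUTES_PER_DAY,
    }


rates = average_rates(records_per_day=400_000_000, bytes_per_day=30 * 10**9)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the feed averages roughly 4,600 records per second at about 75 bytes per record. Because traffic is uneven through the day, the streaming path should be sized for peak rather than average throughput.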
Solution Architecture
[Figure: solution architecture diagram with data sources; data ingestion (streaming, batch); distributed file system (HDFS); resource management (YARN); computation engine; orchestration; API services; cluster management; security service (LDAP); relational database with spatial extension; visualization & report (JavaScript, Flex, Shp file, etc.)]
Business Architecture
[Figure: business architecture on the big data platform. Source data passes through data collection; pre-processing (data aggregation, coordinate formalization, abnormal detection, data cleansing); base model computing; application model computing (residential statistics, working region statistics, regional commuting analysis); and final computing, with data quality metrics tracked alongside. Results reach application views through a GIS server and a GIS database holding residential and community data.]
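The business architecture lists abnormal detection and data cleansing as pre-processing steps without saying how they work. A common technique for positioning data, shown here as an assumption rather than the authors' method, is to drop a record when reaching it from the same user's previous kept record would require an implausible speed (for example, a jump caused by a handover to a distant cell). The 50 m/s threshold, the `Record` layout and the function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass
class Record:
    user_id: str
    ts: int      # epoch seconds
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def drop_speed_outliers(records: List[Record], max_speed_mps: float = 50.0) -> List[Record]:
    """Keep a record only if travelling to it from the same user's last kept
    record implies a plausible speed. Records are sorted by (user, time) first."""
    kept: List[Record] = []
    last_kept: Dict[str, Record] = {}
    for rec in sorted(records, key=lambda r: (r.user_id, r.ts)):
        prev = last_kept.get(rec.user_id)
        if prev is not None:
            dt = rec.ts - prev.ts
            dist = haversine_m(prev.lat, prev.lon, rec.lat, rec.lon)
            if dt <= 0 or dist / dt > max_speed_mps:
                continue  # duplicate timestamp or implausible jump: treat as abnormal
        kept.append(rec)
        last_kept[rec.user_id] = rec
    return kept
```

Comparing against the last *kept* record, rather than the immediately preceding one, prevents a single bad point from also invalidating the good point that follows it.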
Architecture Decision Points
[Figure: architecture decision points. Components: mobile signaling data (online/offline) and data collection; data fusion standard; big data platform with ELT, data store & analysis, streaming, index computing, data quality computing, OD analysis and home-office analysis (base and application algorithms); home-office DW/mart; data export; database (business, spatial) and GIS spatial DB; GIS application presentation (heatmap) for users 1 to 3. Technology choices: Spark/HDFS, Spark Streaming, Java, Sqoop, Oozie/YARN shell scripts for job and resource scheduling, spatial DB (spatial extension), ArcGIS, Flex/JS.]
System front-end architecture
[Figure: system front-end architecture showing the geospatial analysis big data platform (HDFS), with FTP and Sqoop for data movement.]
Items on Big Data Platform Design
• Planning and product selection
• Deployment and operation
• Application deployment
• Job scheduling
• Resource management
• Spark within BigInsights
IBM BigInsights for Apache Hadoop and Spark
[Figure: IBM Analytics Platform, "Built on Spark. Hybrid. Trusted." A Spark analytics operating system with machine learning, on premises and on cloud, covering discovery & exploration, prescriptive analytics, predictive analytics, content analytics, business intelligence, data management, Hadoop & NoSQL, content management, data warehouse, and information integration & governance. Data at rest & in motion; inside & outside the firewall; structured & unstructured.]
• Analytical platform for persistent big data
  – 100% open source core with IBM add-ons for analysts, data scientists, and admins
  – On site or cloud
• Distinguishing characteristics
  – Built-in analytics: enhances business knowledge
  – Enterprise software integration: complements and extends existing capabilities
  – Production-ready: speeds time-to-value
• IBM advantage
  – Combination of software, hardware, services and research
Overview of BigInsights
• IBM Open Platform: 100% open source platform compliant with ODPi
  – Apache Hadoop ecosystem
  – Apache Spark ecosystem
• IBM-specific BigInsights features
  – Big SQL (industry-standard SQL)
  – Text analytics
  – BigSheets (spreadsheet-style tool)
  – Big R (R support)
  – IBM Streams, Cognos (limited-use licenses)
• Free Quick Start (non-production):
  – IBM Open Platform
  – IBM value-added features
  – Community support
Big data platform job scheduling and resource management
- Dedicated slave nodes for computing: almost all CPU and memory resources on each slave node are managed by YARN
- Capacity Scheduler with dedicated queues for different business uses: production (batch and streaming processing, data movement) and development
- Elastic capacity for each queue, achieved by specifying a large maximum capacity, for high resource utilization
- Fine-grained YARN container allocation, achieved by specifying small vcore/memory increments, to support big, medium and small jobs
- No CGroups-based CPU isolation, because it caused system stability issues in our IOP 4.1/RHEL 6.5 environments
Job scheduling with Oozie; resource management with YARN
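The two YARN tuning levers above, elastic queue capacity and fine-grained container sizing, can be illustrated with a simplified model. This is a sketch, not YARN's actual scheduler code: in Hadoop 2.x the Capacity Scheduler rounds container requests up to a multiple of `yarn.scheduler.minimum-allocation-mb`, while the Fair Scheduler uses a separate increment setting, so the hypothetical `increment_mb` parameter stands in for whichever step your scheduler applies. Queue elasticity corresponds to a queue's `maximum-capacity` being larger than its guaranteed `capacity`.

```python
import math


def normalize_request(request_mb: int, minimum_mb: int, increment_mb: int) -> int:
    """Simplified model of YARN request normalization: raise the request to
    the minimum allocation, then round up to a multiple of the increment."""
    size = max(request_mb, minimum_mb)
    return math.ceil(size / increment_mb) * increment_mb


def elastic_share(guaranteed_pct: float, maximum_pct: float,
                  demand_pct: float, idle_pct: float) -> float:
    """Simplified model of an elastic queue: it receives its demand up to the
    guaranteed capacity, then may borrow idle cluster capacity, but never
    more than its maximum capacity."""
    base = min(demand_pct, guaranteed_pct)
    extra_wanted = max(demand_pct - guaranteed_pct, 0.0)
    borrowed = min(extra_wanted, idle_pct, maximum_pct - guaranteed_pct)
    return base + borrowed


if __name__ == "__main__":
    # A 2.5 GB request with a coarse vs a fine allocation step.
    print(normalize_request(2560, minimum_mb=1024, increment_mb=1024))  # 3072
    print(normalize_request(2560, minimum_mb=1024, increment_mb=256))   # 2560
    # A queue guaranteed 40% with a 90% maximum, wanting 80% while 50% is idle.
    print(elastic_share(40, 90, 80, 50))  # 80
```

A smaller step wastes less memory per container (2,560 MB instead of 3,072 MB for a 2.5 GB request), which adds up when many small jobs share the cluster; a large maximum capacity lets a busy queue absorb capacity that other queues leave idle.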
Spark within BigInsights
• Deployment
  – Ambari for installation and deployment
  – Spark compute nodes co-located with HDFS data nodes
  – Cluster mode with YARN as the resource manager
• Runtime configuration
  – A bad configuration can make jobs underperform or fail, and can destabilize the cluster
  – A methodology for choosing the number of partitions, executor cores and memory, and the number of executors
• Monitoring and tuning
  – Spark Streaming stability (log monitoring, checkpointing)
  – Handling massive numbers of small files
  – Shuffle, partitioning, I/O utilization, etc.
  – Job execution, GC time, etc. via the dashboard
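The slide mentions a methodology for sizing partitions and executors but does not spell it out. The sketch below applies a widely used rule of thumb, offered as an assumption rather than the team's actual method: reserve one core and 1 GB per node for the OS and Hadoop daemons, use about five cores per executor, leave one executor slot for the YARN ApplicationMaster, subtract the executor memory overhead (assumed here to be max(384 MB, 10% of the heap)), and start with about three tasks per core. `plan_executors` and `ExecutorPlan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_mb: int   # JVM heap, i.e. the value for --executor-memory
    partitions: int           # starting point for shuffle/input partitions


def plan_executors(nodes: int, cores_per_node: int, mem_per_node_mb: int,
                   cores_per_executor: int = 5, reserved_cores: int = 1,
                   reserved_mem_mb: int = 1024, overhead_fraction: float = 0.10,
                   min_overhead_mb: int = 384, tasks_per_core: int = 3) -> ExecutorPlan:
    usable_cores = cores_per_node - reserved_cores
    usable_mem_mb = mem_per_node_mb - reserved_mem_mb
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("cores_per_executor exceeds usable cores per node")
    # Leave one executor slot for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1
    # Each container must hold the heap plus the off-heap overhead.
    container_mb = usable_mem_mb // executors_per_node
    heap_mb = int(container_mb / (1 + overhead_fraction))
    if container_mb - heap_mb < min_overhead_mb:
        heap_mb = container_mb - min_overhead_mb
    total_cores = num_executors * cores_per_executor
    return ExecutorPlan(num_executors, cores_per_executor, heap_mb,
                        total_cores * tasks_per_core)


if __name__ == "__main__":
    print(plan_executors(nodes=6, cores_per_node=16, mem_per_node_mb=64 * 1024))
```

For six 16-core, 64 GB nodes this yields 17 executors with 5 cores and about 19 GB of heap each, and 255 partitions as a starting point. The scheduler may still round each container up to its allocation step, so the result is a baseline to refine with the monitoring listed above.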
Data Perspective Considerations
• Data process flow
• Data management
  – Capacity sizing, layout in HDFS, lifecycle management
• Data movement
  – Between the big data platform and other systems (e.g., RDBMS)
[Figure: data process flow]
Data lifecycle management
• 5 layers of data in the system
  – L1: raw data ingested into HDFS
  – L2: ELT data (pre-processed with streaming) in HDFS
  – L3: result data from the algorithm models in HDFS
  – L4: data for visualization (in HDFS or RDBMS)
  – L5: archived data in external storage (compressed)
• Design the data layout, number of copies (replicas), retention, etc. in HDFS
• Oozie jobs prune outdated data in HDFS
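Pruning outdated data comes down to comparing each dated partition against a per-layer retention window. In the described system Oozie schedules this job; the sketch below shows only the selection logic, assuming data is partitioned by layer and date, with hypothetical retention periods (the source gives none). L4 and L5 have no policy here, so they are never pruned.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical retention periods in days per data layer (not given in the source).
RETENTION_DAYS: Dict[str, int] = {"L1": 7, "L2": 30, "L3": 180}


def partitions_to_prune(partitions: List[Tuple[str, date]], today: date,
                        retention: Dict[str, int] = RETENTION_DAYS) -> List[Tuple[str, date]]:
    """Return the (layer, partition_date) pairs older than their layer's
    retention window. Layers without a policy are never pruned."""
    expired = []
    for layer, day in partitions:
        keep_days = retention.get(layer)
        if keep_days is not None and day < today - timedelta(days=keep_days):
            expired.append((layer, day))
    return expired


if __name__ == "__main__":
    parts = [("L1", date(2016, 6, 20)), ("L1", date(2016, 6, 25)),
             ("L2", date(2016, 5, 1)), ("L3", date(2016, 5, 1)),
             ("L5", date(2015, 1, 1))]
    print(partitions_to_prune(parts, today=date(2016, 6, 30)))
```

The selected partitions would then be deleted, for example with `hdfs dfs -rm -r` on the corresponding directories.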
Data movement
• Data ingestion into the big data platform
  – Offline/online data ingestion: HDFS loading from external storage/FTP server, plus streaming
  – Future: Kafka + Spark Streaming (more data sources and analytics paths)
• Data export from the big data platform
  – Near real-time heatmap generation
  – Algorithm model results exported to RDBMS with Sqoop
[Figure: data movement. Data is pushed to an FTP server and either loaded into HDFS or consumed by Spark Streaming; the basic algorithm (stay points) writes results to HDFS; results are pushed to the database every 30 minutes for real-time display and historical queries; ArcGIS Server generates heatmaps based on shp files.]
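Near real-time heatmap generation amounts to counting people per spatial cell in fixed time windows, with results pushed to the database every 30 minutes. The sketch below shows that aggregation in plain Python, assuming each event already carries a cell identifier and that the heatmap value is the number of distinct users per cell and window; in the described system the same logic would run in Spark Streaming. `heatmap_counts` and the event tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

WINDOW_S = 30 * 60  # the 30-minute push interval from the slide


def window_start(ts: int, window_s: int = WINDOW_S) -> int:
    """Align an epoch-seconds timestamp to the start of its window."""
    return ts - ts % window_s


def heatmap_counts(events: Iterable[Tuple[str, int, str]],
                   window_s: int = WINDOW_S) -> Counter:
    """events are (user_id, epoch_seconds, cell_id) tuples. Returns the number
    of distinct users per (window_start, cell_id), so a user who pings
    repeatedly from one cell within a window is counted once."""
    seen = {(window_start(ts, window_s), cell, user) for user, ts, cell in events}
    return Counter((w, cell) for w, cell, _ in seen)


if __name__ == "__main__":
    events = [("u1", 0, "A"), ("u1", 100, "A"), ("u2", 200, "A"),
              ("u3", 1799, "B"), ("u1", 1800, "B")]
    for (w, cell), n in sorted(heatmap_counts(events).items()):
        print(w, cell, n)
```

Counting distinct users rather than raw records keeps frequently pinging devices from inflating the heatmap.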
Algorithm Model of Trajectory (OD) Analysis
[Figure: OD analysis algorithm model. Base algorithm: cellular signal data, ELT, trajectory sequence, stay points, multi-day stable points, OD identification (commute). Application algorithm: OD statistics, OD index stats, commute stats, people flow stats and a data quality index, broken down by area type (groups 1 to 3, with >1 km distance labels), followed by statistics export. Design goals: algorithm accuracy validation, performance, stability, extensibility and configurability.]
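The OD model turns trajectory sequences into stay points before identifying multi-day stable points. The slides do not name the stay point algorithm; the sketch below uses a widely used distance-and-duration threshold method, with hypothetical thresholds of 500 m and 30 minutes. Because cellular positions are coarse (typically the serving cell's location), real thresholds would need tuning against validation data. `Ping`, `StayPoint` and `detect_stay_points` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Ping:
    ts: int       # epoch seconds
    lat: float
    lon: float


@dataclass
class StayPoint:
    lat: float
    lon: float
    arrive: int
    leave: int


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def detect_stay_points(pings: List[Ping], dist_m: float = 500.0,
                       min_duration_s: int = 1800) -> List[StayPoint]:
    """pings must be one user's trajectory, sorted by ts."""
    stays: List[StayPoint] = []
    i, n = 0, len(pings)
    while i < n:
        j = i + 1
        # Extend the window while pings stay within dist_m of the anchor ping i.
        while j < n and haversine_m(pings[i].lat, pings[i].lon,
                                    pings[j].lat, pings[j].lon) <= dist_m:
            j += 1
        if pings[j - 1].ts - pings[i].ts >= min_duration_s:
            window = pings[i:j]
            stays.append(StayPoint(
                lat=sum(p.lat for p in window) / len(window),
                lon=sum(p.lon for p in window) / len(window),
                arrive=window[0].ts,
                leave=window[-1].ts,
            ))
            i = j      # continue after the stay
        else:
            i += 1     # anchor was a pass-through point
    return stays
```

Each stay point carries arrival and departure times, which later steps can use to decide whether a location is visited at night (home) or during working hours (office) across multiple days.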
Geospatial Computation with Spark
• Requirements
  – Direct Spark support for SDE, Shapefile (shp) and GeoJSON data
  – Most geospatial computation done in the Spark cluster (point-area relationships, spherical distance, geospatial statistics, etc.)
  – Performance challenge: 20M records per iteration of geospatial computation
• Solution Design
[Figure: SDE and shp file data enter the Spark cluster through SDE and SHP interfaces. A Spark-GIS library provides a geospatial API and a grid API (grid definition), used by the basic algorithm (geospatial computation) and the application algorithm (geospatial statistics).]
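Two of the core geospatial operations named above are spherical distance and the point-area relationship. The sketch below implements them with the haversine formula and even-odd ray casting; it is illustrative, not the Spark-GIS library's API, and assumes polygons given as a single ring of (lon, lat) vertices at city scale (planar treatment, no antimeridian crossing, no holes).

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; the choice of radius is an assumption


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Spherical (great-circle) distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def point_in_polygon(lon: float, lat: float, ring: List[Tuple[float, float]]) -> bool:
    """Even-odd ray casting: cast a ray towards +lon and count edge crossings.
    Points exactly on the boundary may be classified either way."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):          # edge straddles the ray's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(point_in_polygon(0.5, 0.5, square), point_in_polygon(1.5, 0.5, square))
    print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))
```

In Spark, functions like these would typically be applied per record inside a `map` or a UDF, with a grid or bounding-box pre-filter so that each of the 20M records is tested against only a few candidate polygons.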
Spatial Grid Design for Spark
[Figure: spatial grid design. The base algorithm in Spark produces the home-office model. Spark application statistics aggregate it by grid (grid home-office statistic table, grid statistic table) and, through a group-grid mapping, by group. The tables are stored in a relational database and served to the Web GIS application and front-end through predefined and user-defined queries (with convert and expansion formulas).]
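The grid design maps individual results onto grid cells and then rolls grid statistics up to groups. The sketch below illustrates that flow for the home-office model: each multi-day stay is assigned to a grid cell, a user's home is taken as the most frequent night-time cell and the office as the most frequent working-hours cell, and users are counted per (home cell, office cell) pair before a group-grid mapping aggregates them. The grid origin, cell size, hour windows, group names and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]

GRID_ORIGIN = (30.0, 120.0)   # hypothetical (lat, lon) of the south-west corner
CELL_DEG = 0.005              # hypothetical cell size, roughly 500 m in latitude
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 7))   # assumed "home" hours
WORK_HOURS = set(range(9, 18))                         # assumed "office" hours


def grid_cell(lat: float, lon: float) -> Cell:
    return (math.floor((lat - GRID_ORIGIN[0]) / CELL_DEG),
            math.floor((lon - GRID_ORIGIN[1]) / CELL_DEG))


def home_office(stays: Iterable[Tuple[str, float, float, int]]) -> Dict[str, Tuple[Cell, Cell]]:
    """stays are (user_id, lat, lon, hour_of_day) tuples from multiple days.
    Users without both night and work-hour stays are skipped."""
    night: Dict[str, Counter] = defaultdict(Counter)
    work: Dict[str, Counter] = defaultdict(Counter)
    for user, lat, lon, hour in stays:
        cell = grid_cell(lat, lon)
        if hour in NIGHT_HOURS:
            night[user][cell] += 1
        elif hour in WORK_HOURS:
            work[user][cell] += 1
    return {user: (night[user].most_common(1)[0][0], work[user].most_common(1)[0][0])
            for user in night.keys() & work.keys()}


def grid_home_office_table(pairs: Dict[str, Tuple[Cell, Cell]]) -> Counter:
    """Users per (home_cell, office_cell): a grid-level commute OD table."""
    return Counter(pairs.values())


def roll_up(table: Counter, cell_to_group: Dict[Cell, str]) -> Counter:
    """Aggregate the grid table to groups via a group-grid mapping."""
    out: Counter = Counter()
    for (home, office), n in table.items():
        out[(cell_to_group.get(home, "other"), cell_to_group.get(office, "other"))] += n
    return out
```

Keeping statistics at grid level and applying the group-grid mapping afterwards means a new grouping (for example a different set of districts) needs only a new mapping, not a rerun of the base algorithm.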
Agenda
• Background – problem and data
• Architecture
• Technical Design
• Big Data Platform design
• Data governance design
• Algorithm model
• Spark spatial computing
• Scenarios demo
• Conclusion and Next step
Scenario 1
[Figure panels: Population Heatmap, Commute OD Route]
Better Understanding of Key Metrics for Urban Planning with Big Data (sampled data vs. all data; historical data vs. current data)
• Urban planners can plan the city more soundly based on the current population distribution
• Traffic planning institutes can use this to optimize the traffic network
• City management units can better plan city service facilities and detect abnormal city events based on population flow
New Methodology and New Applications Using Big Data for Better Urban Planning, Monitoring and Decision Making
• Quickly understand current commute traffic volume and direction, and identify bottlenecks
• Optimize traffic planning and scheduling during commute peak hours
• Build new applications for planners and administrators, and provide new data services that let city residents participate in city management
Scenario 2
[Figure panels: Commute Time Cost; Office-Residence Imbalance]
Big Data Architecture Key Point – System
• Big data product selection
  – ODPi (Open Data Platform initiative)
• Big data component selection
  – Data movement, data storage, computing, SQL interface…
  – …
• Deployment mode selection
  – Local cluster
  – IaaS
  – Big data cloud
• Separate deployment and data exploration environments
Big Data Architecture Key Point – Data
• Data collection
• Data ELT
• Data pipeline
• Data lifecycle governance
• Data volume planning
• Data fusion
• Spatial data analysis and visualization
Big Data Architecture Key Point – Platform
• HA
• Security
• Monitoring & stability
• Scale-out and upgrade
• Resource management
• Job scheduling
• Multi-tenancy
Big Data Architecture Key Point – Algorithm & Model Level
• Business analysis
• Algorithm model design
• Model verification
• Model adjustment
• Model validity assurance
Road Ahead…
Deeper analysis with more scenarios
• Traffic prediction
• Trip prediction
• Commute methods
• etc.
More data sources for trajectory/traffic
• GPS for taxis and buses
• RFID on roads
• Road monitoring data
• Subway station check-in/out info
• Parking lots
• Fusion with weather and social data
Data exploration environment to support data science and continuous engineering of new features
Leverage more Spark ML for traffic prediction
Cluster scale-out with more data and algorithms
Data ingestion with Kafka/Flume (message hub)
SQL on Hadoop
Graph computation for shortest paths and road matching
[Figure: roadmap from the current deployment: big data platform scale-out, new scenarios with new data, a data exploration environment, engineering and deployment, and data movement.]
Spark GeoSpatial Analysis for Other Scenarios
[Figure: layered stack. Applications: smart transportation, smart logistics, smart tourism, others. Spatial-temporal trajectory analysis for humans and for vehicles, built on trajectory data management and trajectory analysis functions. Common API: geospatial data pre-processing, geospatial geometry computing, surface mesh computing. Distributed geospatial computing API (based on Spark), running on IBM's Big Data Analytics Platform.]
Big Data University and Data Science Workbench
− A community initiative led by IBM
− Online courses about data, at your pace and at your place
− Developed by industry experts
− Free courses by the community with hands-on labs
− Certificate of completion and badges
− Looking for contributors!
Integrated Set of Tools, Languages and Execution Environments
Clean and Prepare Data
• OpenRefine
Experiment with and Analyze Data
• Jupyter Notebooks, RStudio, Seahorse
Connect to data processing engines:
• Spark, Hadoop, dashDB, Big SQL, Big R
http://DataScientistWorkbench.com
http://bigdatauniversity.com

Mais conteúdo relacionado

Mais procurados

Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioImproving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioAlluxio, Inc.
 
Benefit SAP S4HANA.pptx
Benefit SAP S4HANA.pptxBenefit SAP S4HANA.pptx
Benefit SAP S4HANA.pptxAlexYuniarto1
 
Procurement Transformation with S/4 HANA Sourcing and Procurement
Procurement Transformation with S/4 HANA Sourcing and ProcurementProcurement Transformation with S/4 HANA Sourcing and Procurement
Procurement Transformation with S/4 HANA Sourcing and ProcurementSAP Ariba
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksDatabricks
 
Data Monitoring with whylogs
Data Monitoring with whylogsData Monitoring with whylogs
Data Monitoring with whylogsAlexey Grigorev
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorDatabricks
 
End to End Process Transformation with Signavio.pdf
End to End Process Transformation with Signavio.pdfEnd to End Process Transformation with Signavio.pdf
End to End Process Transformation with Signavio.pdfIgnacioPeredoCL
 
Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Databricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introductionAlexey Grigorev
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentDatabricks
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerProvectus
 

Mais procurados (20)

Apo core interface cif
Apo core interface cifApo core interface cif
Apo core interface cif
 
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioImproving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
 
Benefit SAP S4HANA.pptx
Benefit SAP S4HANA.pptxBenefit SAP S4HANA.pptx
Benefit SAP S4HANA.pptx
 
Procurement Transformation with S/4 HANA Sourcing and Procurement
Procurement Transformation with S/4 HANA Sourcing and ProcurementProcurement Transformation with S/4 HANA Sourcing and Procurement
Procurement Transformation with S/4 HANA Sourcing and Procurement
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with Databricks
 
SAP S/4HANA Cloud
SAP S/4HANA CloudSAP S/4HANA Cloud
SAP S/4HANA Cloud
 
Data Monitoring with whylogs
Data Monitoring with whylogsData Monitoring with whylogs
Data Monitoring with whylogs
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
End to End Process Transformation with Signavio.pdf
End to End Process Transformation with Signavio.pdfEnd to End Process Transformation with Signavio.pdf
End to End Process Transformation with Signavio.pdf
 
SAP Workloads on AWS
SAP Workloads on AWSSAP Workloads on AWS
SAP Workloads on AWS
 
Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2Data Distribution and Ordering for Efficient Data Source V2
Data Distribution and Ordering for Efficient Data Source V2
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
SAP Overview
SAP OverviewSAP Overview
SAP Overview
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Unified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model DeploymentUnified MLOps: Feature Stores & Model Deployment
Unified MLOps: Feature Stores & Model Deployment
 
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMakerMLOps and Reproducible ML on AWS with Kubeflow and SageMaker
MLOps and Reproducible ML on AWS with Kubeflow and SageMaker
 

Destaque

SexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataSexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataCharalampos (Babis) Nikolaou
 
Neo4j Spatial - FooCafe September 2015
Neo4j Spatial - FooCafe September 2015Neo4j Spatial - FooCafe September 2015
Neo4j Spatial - FooCafe September 2015Craig Taverner
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaSpark Summit
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial dataKudos S.A.S
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingZbigniew Jerzak
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingZbigniew Jerzak
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...DataStax
 
Multilevel aggregation for Hadoop/MapReduce
Multilevel aggregation for Hadoop/MapReduceMultilevel aggregation for Hadoop/MapReduce
Multilevel aggregation for Hadoop/MapReduceTsuyoshi OZAWA
 
HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)Eron Wright
 
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram SriharshaMagellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram SriharshaSpark Summit
 
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkFOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkRob Emanuele
 
Monitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataMonitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataBeniamino Murgante
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
High Scalability Network Monitoring for Communications Service Providers
High Scalability Network Monitoring for Communications Service ProvidersHigh Scalability Network Monitoring for Communications Service Providers
High Scalability Network Monitoring for Communications Service ProvidersCA Technologies
 
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing ApplicationsWill it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing ApplicationsNavina Ramesh
 
Linked data presentation for who umc 21 jan 2015
Linked data presentation for who umc 21 jan 2015Linked data presentation for who umc 21 jan 2015
Linked data presentation for who umc 21 jan 2015Kerstin Forsberg
 
Modern Applications Demand Network Analytics
Modern Applications Demand Network AnalyticsModern Applications Demand Network Analytics
Modern Applications Demand Network AnalyticsPluribus Networks
 
Is your MQTT broker IoT ready?
Is your MQTT broker IoT ready?Is your MQTT broker IoT ready?
Is your MQTT broker IoT ready?Eurotech
 

Destaque (20)

SexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial DataSexTant: Visualizing Time-Evolving Linked Geospatial Data
SexTant: Visualizing Time-Evolving Linked Geospatial Data
 
Neo4j Spatial - FooCafe September 2015
Neo4j Spatial - FooCafe September 2015Neo4j Spatial - FooCafe September 2015
Neo4j Spatial - FooCafe September 2015
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
 
Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial data
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream Processing
 
Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream Processing
 
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...
 
Multilevel aggregation for Hadoop/MapReduce
Multilevel aggregation for Hadoop/MapReduceMultilevel aggregation for Hadoop/MapReduce
Multilevel aggregation for Hadoop/MapReduce
 
HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)HTM & Apache Flink (2016-06-27)
HTM & Apache Flink (2016-06-27)
 
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram SriharshaMagellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
Magellan-Spark as a Geospatial Analytics Engine by Ram Sriharsha
 
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and SparkFOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
FOSDEM 2015: Distributed Tile Processing with GeoTrellis and Spark
 
Monitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network dataMonitoring temporary populations through cellular core network data
Monitoring temporary populations through cellular core network data
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
High Scalability Network Monitoring for Communications Service Providers
High Scalability Network Monitoring for Communications Service ProvidersHigh Scalability Network Monitoring for Communications Service Providers
High Scalability Network Monitoring for Communications Service Providers
 
Spatial Data Model 2
Spatial Data Model 2Spatial Data Model 2
Spatial Data Model 2
 
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing ApplicationsWill it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing Applications
 
Linked data presentation for who umc 21 jan 2015
Linked data presentation for who umc 21 jan 2015Linked data presentation for who umc 21 jan 2015
Linked data presentation for who umc 21 jan 2015
 
Modern Applications Demand Network Analytics
Modern Applications Demand Network AnalyticsModern Applications Demand Network Analytics
Modern Applications Demand Network Analytics
 
Is your MQTT broker IoT ready?
Is your MQTT broker IoT ready?Is your MQTT broker IoT ready?
Is your MQTT broker IoT ready?
 

Semelhante a High Performance Spatial-Temporal Trajectory Analysis with Spark

Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016
IBM Data Engine for Hadoop and Spark - POWER System Edition ver1 March 2016Anand Haridass
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
L'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliL'architettura di classe enterprise di nuova generazione - Massimo Brignoli
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliData Driven Innovation
 
High Performance Spatial-Temporal Trajectory Analysis with Spark

  • 1. © 2016 IBM Corporation. High Performance Spatial-Temporal Trajectory Analysis with Spark. YongHua (Henry) Zeng, zengyh@cn.ibm.com, Big Data & Analytics Solution Architect, Analytics Platform Services, IBM China Lab
  • 2. Agenda • Background • Architecture • Technical Design • Big Data Platform design • Data governance design • Algorithm model • Spark spatial computing • Scenarios demo • Conclusion and next steps
  • 3. Background Introduction: studying human trajectories with mobile signal data. Problem: • Traditional planning cannot handle the variety of data now available • Much of this data has the characteristics of big data (volume, velocity, variety) • Cellular signaling data is one such typical source; it can enable new types of applications for smarter urban planning • Analyzing cellular signal data can help urban planners and city governing bodies better understand the city. Data Set: • Cellular signal data • 5M mobile users • 25M to 50M of data every minute; 30 GB of data daily • ~400M cellular signal records daily • More data coming from GPS and RFID for 4M vehicles
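The daily totals on this slide imply average ingest rates that are useful for sizing. This is a minimal back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes), a uniform rate over 24 hours, and that the 30 GB and ~400M records describe the same daily feed. The slide does not give a unit for "25M to 50M every minute", so that figure is not used.

```python
# Average ingest rates implied by the slide's daily totals.
# Assumptions: decimal units (1 GB = 1e9 bytes) and a uniform rate over 24 h.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

def ingest_rates(bytes_per_day: float, records_per_day: float) -> dict:
    """Return the average rates implied by daily totals."""
    return {
        "mb_per_minute": bytes_per_day / MINUTES_PER_DAY / 1e6,
        "records_per_minute": records_per_day / MINUTES_PER_DAY,
        "records_per_second": records_per_day / SECONDS_PER_DAY,
        "bytes_per_record": bytes_per_day / records_per_day,
    }

rates = ingest_rates(bytes_per_day=30e9, records_per_day=400e6)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

On average that is about 21 MB and about 278K records per minute (roughly 4.6K records/s), at about 75 bytes per record. Peak-hour rates will be higher than these uniform averages.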
  • 4. Solution Architecture (diagram): Data sources (Shp files, etc.) • Data ingestion • Distributed file system (HDFS) • Computation engine (batch, streaming) • Resource management (YARN) • Orchestration • Relational database with spatial extension • API services • Visualization & reports (JavaScript, Flex) • Cluster management • Security service • LDAP service
  • 5. Business Architecture (diagram): on the big data platform, source data passes through pre-processing, base model computing and application model computing. Processing steps: data collection, data aggregation, data cleansing, coordinate formalization, abnormal detection, final computing, data quality metrics. Application outputs: residential statistics, working-region statistics, regional commuting analysis. Application views: GIS server and GIS database (residential and community data)
  • 6. Architecture Decision Points (diagram): Mobile signaling data (online/offline) • Data collection • Standard big data platform: ELT, streaming, data store & analysis, data fusion • Base and application algorithms: OD analysis, home-office analysis, index computing, data quality computing • Home-office DW/mart • Data export • Database (business, spatial) and GIS spatial DB • GIS application display: heatmap for users (User 1, User 2, User 3) • Job and resource scheduling. Technology choices: Spark Streaming • Spark/HDFS • Sqoop • Java • Oozie/YARN shell scripts • Spatial DB (spatial extension) • ArcGIS • Flex/JS
  • 7. System front-end architecture (diagram): Geospatial analysis • Big data platform (HDFS) • Sqoop • FTP
  • 8. Agenda • Background • Architecture • Technical Design • Big Data Platform design • Data governance design • Algorithm model • Spark spatial computing • Scenarios demo • Conclusion and next steps
  • 9. Items on Big Data Platform Design: • Planning and product selection • Deployment and operation • Application deployment • Job scheduling • Resource management • Spark within BigInsights
  • 10. IBM BigInsights for Apache Hadoop and Spark. IBM Analytics Platform (diagram): Discovery & Exploration • Prescriptive Analytics • Predictive Analytics • Content Analytics • Business Intelligence • Data Mgmt • Hadoop & NoSQL • Content Mgmt • Data Warehouse • Information Integration & Governance; built on Spark, hybrid, trusted; Spark analytics operating system; machine learning; on premises and on cloud; data at rest and in motion; inside and outside the firewall; structured and unstructured. • Analytical platform for persistent big data: 100% open source core with IBM add-ons for analysts, data scientists, and admins; on site or cloud • Distinguishing characteristics: built-in analytics (enhances business knowledge); enterprise software integration (complements and extends existing capabilities); production-ready (speeds time-to-value) • IBM advantage: combination of software, hardware, services, and research
  • 11. Overview of BigInsights. IBM Open Platform: 100% open source platform compliant with ODPi (Apache Hadoop ecosystem, Apache Spark ecosystem). IBM-specific BigInsights features: • Big SQL (industry-standard SQL) • Text analytics • BigSheets (spreadsheet-style tool) • Big R (R support) • IBM Streams, Cognos (limited-use licenses). Free Quick Start edition (non-production): • IBM Open Platform • IBM value-added features • Community support
  • 12. Big data platform job scheduling and resource management (job scheduling with Oozie; resource management with YARN): - Dedicated slave nodes for computing; almost all CPU and memory resources on each slave node are managed by YARN - Capacity Scheduler with dedicated queues for different business uses: production (batch and streaming processing, data movement) and development - Elastic capacity for each queue, achieved by specifying a large maximum capacity, for high resource utilization - Fine-grained YARN container allocation, achieved by specifying small vcore/memory increments, to support different workload types (big, medium, and small jobs) - No cgroups-based CPU isolation, because it caused system stability issues in our IOP 4.1/RHEL 6.5 environments
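The queue design above maps onto standard Capacity Scheduler properties. This sketch generates those properties as a Python dict and checks that child-queue capacities sum to 100. It also models how YARN normalizes a memory request by rounding it up to a multiple of `yarn.scheduler.minimum-allocation-mb`, which is the behavior of the Capacity Scheduler in the Hadoop 2.x releases we are aware of. That rounding is why small increments give finer-grained containers. The queue names (`production`, `development`) and all percentages and sizes are hypothetical; the slide does not give the actual values.

```python
import math

def capacity_scheduler_props(queues):
    """Build Capacity Scheduler properties for child queues of root.

    queues maps queue name -> (capacity %, maximum-capacity %).
    """
    total = sum(cap for cap, _ in queues.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"child queue capacities must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (cap, max_cap) in queues.items():
        if not 0 < cap <= max_cap <= 100:
            raise ValueError(f"invalid capacity settings for queue {name!r}")
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = f"{cap:g}"
        # A maximum-capacity above capacity lets the queue borrow idle resources.
        props[f"{prefix}.maximum-capacity"] = f"{max_cap:g}"
    return props

def normalize_container_mb(request_mb, min_alloc_mb, max_alloc_mb):
    """Round a memory request up to a multiple of the minimum allocation."""
    if request_mb > max_alloc_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    steps = math.ceil(max(request_mb, min_alloc_mb) / min_alloc_mb)
    return min(steps * min_alloc_mb, max_alloc_mb)

# Hypothetical queues: production may grow to the whole cluster, development to 60%.
props = capacity_scheduler_props({"production": (70, 100), "development": (30, 60)})
for key, value in props.items():
    print(f"{key} = {value}")

# A 1536 MB request under a coarse vs a fine-grained minimum allocation.
coarse = normalize_container_mb(1536, min_alloc_mb=1024, max_alloc_mb=16384)  # 2048
fine = normalize_container_mb(1536, min_alloc_mb=256, max_alloc_mb=16384)     # 1536
```

The queue properties belong in capacity-scheduler.xml. `yarn.scheduler.minimum-allocation-mb` and `yarn.scheduler.minimum-allocation-vcores` are set in yarn-site.xml instead.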
  • 13. Spark within BigInsights. Deployment: - Ambari for installation and deployment - Spark compute nodes co-located with HDFS data nodes - Cluster mode with YARN as the resource manager. Runtime configuration: - A bad configuration can make jobs underperform or fail and can destabilize the cluster - A methodology for configuring the number of partitions, executor cores/memory, and the number of executors. Monitoring and tuning: - Spark Streaming stability (log monitoring, checkpointing) - Handling massive numbers of small files - Shuffle, partitioning, I/O utilization, etc. - Job execution, GC time, etc. via the dashboard
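The slide mentions a methodology for choosing partitions and executor cores, memory and count, but does not spell it out. The sketch below applies a widely used rule of thumb, which is not necessarily the authors' method:

- Reserve one core and 1 GB per node for the OS and Hadoop daemons.
- Use about 5 cores per executor.
- Leave one executor slot for the YARN ApplicationMaster.
- Subtract Spark's default memory overhead on YARN, max(384 MB, 10% of executor memory). This is `spark.executor.memoryOverhead` in Spark 2.3+ and `spark.yarn.executor.memoryOverhead` earlier.
- Target about 3 tasks per core.

For the small-files issue, the sketch also computes how many output partitions would make each file roughly one 128 MB HDFS block. All node sizes and ratios are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class ExecutorPlan:
    num_executors: int       # --num-executors
    executor_cores: int      # --executor-cores
    executor_memory_mb: int  # --executor-memory (JVM heap)
    overhead_mb: int         # off-heap overhead YARN adds to each container
    partitions: int          # e.g. spark.default.parallelism

def plan_executors(nodes, cores_per_node, mem_mb_per_node, cores_per_executor=5,
                   reserved_cores=1, reserved_mem_mb=1024, tasks_per_core=3):
    executors_per_node = (cores_per_node - reserved_cores) // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node too small for the requested executor size")
    container_mb = (mem_mb_per_node - reserved_mem_mb) // executors_per_node
    # Container = heap + overhead, with overhead = max(384 MB, 10% of heap).
    heap_mb = int(container_mb / 1.10)
    overhead_mb = max(384, int(0.10 * heap_mb))
    if heap_mb + overhead_mb > container_mb:  # small containers hit the 384 MB floor
        heap_mb = container_mb - overhead_mb
    num_executors = nodes * executors_per_node - 1  # one slot for the ApplicationMaster
    return ExecutorPlan(num_executors, cores_per_executor, heap_mb, overhead_mb,
                        num_executors * cores_per_executor * tasks_per_core)

def output_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Partitions to coalesce to so that each output file is about one HDFS block."""
    return max(1, math.ceil(total_bytes / target_file_bytes))

# Hypothetical cluster: 10 worker nodes with 16 cores and 64 GB each.
plan = plan_executors(nodes=10, cores_per_node=16, mem_mb_per_node=64 * 1024)
print(plan)
print(output_partitions(30 * 1024**3))  # a 30 GiB daily output
```

These values are only a starting point. They still need tuning against the shuffle, GC time and I/O metrics the slide lists.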
  • 14. Data Perspective Considerations: • Data process flow • Data management: capacity sizing, layout in HDFS, lifecycle management • Data movement: between the big data platform and other systems (RDBMS). Diagram: data process flow
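HDFS capacity sizing for layered data reduces to daily volume × retention × replication ÷ compression, plus headroom for temporary and shuffle data. In this sketch only the 30 GB/day raw ingest comes from the Data Set slide. The per-layer volumes, retention periods, compression ratios and the 25% headroom are hypothetical. Replication defaults to 3, which is HDFS's default `dfs.replication`. Layers kept outside HDFS (an RDBMS or external archive) are left out.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    gb_per_day: float               # logical (uncompressed) volume written per day
    retention_days: int
    compression_ratio: float = 1.0  # 3.0 means data shrinks to one third
    replication: int = 3            # HDFS default dfs.replication

    def raw_gb(self) -> float:
        """Raw disk consumed by this layer once retention is reached."""
        return self.gb_per_day * self.retention_days * self.replication / self.compression_ratio

def cluster_raw_capacity_gb(layers, headroom=0.25):
    """Total raw disk across data nodes, plus free space for temp/shuffle data."""
    return sum(layer.raw_gb() for layer in layers) * (1 + headroom)

# Hypothetical layer settings; only the 30 GB/day raw ingest is from the slides.
layers = [
    Layer("L1 raw", gb_per_day=30, retention_days=30, compression_ratio=3.0),
    Layer("L2 ELT", gb_per_day=15, retention_days=90, compression_ratio=3.0),
    Layer("L3 results", gb_per_day=1, retention_days=365, compression_ratio=2.0),
]
for layer in layers:
    print(f"{layer.name}: {layer.raw_gb():,.0f} GB raw")
print(f"total with headroom: {cluster_raw_capacity_gb(layers):,.0f} GB")
```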
  • 15. Data lifecycle management: • Five layers of data in the system: L1 raw data ingested into HDFS; L2 ELT data (pre-processed with streaming) in HDFS; L3 result data from the algorithm models in HDFS; L4 data for visualization (in HDFS or an RDBMS); L5 archived data in external storage (compressed) • Design the data layout, number of replicas, retention, etc. in HDFS • Oozie jobs prune outdated data in HDFS
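A per-layer retention table can drive the pruning jobs. This minimal sketch makes several assumptions:

- The HDFS layout is hypothetical: one directory per layer and day, for example `/data/L1/dt=2016-06-01`.
- The retention periods are hypothetical.
- A daily Oozie coordinator would pass the selected paths to `hdfs dfs -rm -r`.

Only path selection is shown; nothing is deleted.

```python
import re
from datetime import date, timedelta

# Hypothetical retention (days) per layer; L5 lives outside HDFS and is not pruned here.
RETENTION_DAYS = {"L1": 30, "L2": 90, "L3": 365, "L4": 365}
PARTITION = re.compile(r"^/data/(L[1-4])/dt=(\d{4}-\d{2}-\d{2})$")

def paths_to_prune(paths, today, retention=RETENTION_DAYS):
    """Return partition paths older than their layer's retention period."""
    expired = []
    for path in paths:
        m = PARTITION.match(path)
        if m is None:
            continue  # leave anything that does not follow the layout alone
        layer, day = m.group(1), date.fromisoformat(m.group(2))
        if day < today - timedelta(days=retention[layer]):
            expired.append(path)
    return sorted(expired)

paths = [
    "/data/L1/dt=2016-04-30",  # older than 30 days: pruned
    "/data/L1/dt=2016-05-31",  # recent: kept
    "/data/L2/dt=2016-04-30",  # within L2's 90 days: kept
    "/data/L3/tmp",            # not a dated partition: ignored
]
print(paths_to_prune(paths, today=date(2016, 6, 1)))
```

The regex deliberately ignores paths outside the expected layout, so a malformed directory name can never be selected for deletion.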
  • 16. Data movement: • Data ingestion into the big data platform: offline/online ingestion, i.e., HDFS loading from external storage/FTP server plus streaming; future: Kafka + Spark Streaming (more data sources and analytics paths) • Data export from the big data platform: near real-time heatmap generation; algorithm model results exported to the RDBMS with Sqoop. Diagram labels: FTP push; HDFS load from FTP; Spark Streaming; basic algorithm (stay points) in HDFS; pushed to the database every 30 minutes; ArcGIS Server (generates the heatmap from shp files); real-time display; retrospective queries
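Pushing results to the database every 30 minutes implies aggregating positions into fixed time windows before ArcGIS renders the heatmap. This minimal sketch counts distinct users per 30-minute window and grid cell. The record shape `(user_id, epoch_seconds, grid_id)` is an assumption, and in the deployed system this step would run in Spark rather than in plain Python.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # the slide's 30-minute push interval

def window_start(epoch_seconds: int) -> int:
    """Floor a Unix timestamp to the start of its 30-minute window."""
    return epoch_seconds - epoch_seconds % WINDOW_SECONDS

def heatmap_counts(records):
    """Map (window_start, grid_id) -> number of distinct users seen there."""
    users = defaultdict(set)
    for user_id, ts, grid_id in records:
        users[(window_start(ts), grid_id)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}

records = [
    ("u1", 1464739200, "g7"),  # 2016-06-01 00:00:00 UTC
    ("u1", 1464739500, "g7"),  # same user, same window: counted once
    ("u2", 1464740000, "g7"),
    ("u3", 1464741000, "g9"),  # falls in the next window
]
print(heatmap_counts(records))
```

Counting distinct users rather than raw records stops a user who generates many signaling events in one window from inflating that cell.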
  • 17. Algorithm Model of Trajectory (OD) Analysis (diagram). Base algorithm: cellular signal data, ELT, trajectory sequence, stay points, multi-day stable points, OD identification, commute (thresholds marked ">1 km"). Application algorithm: OD statistics, OD index stats, commute stats, people flow stats, data quality index; grouped by area type (Group 1, Group 2, Group 3); statistics export. Algorithm requirements: accuracy validation, performance, stability, extensibility, configurability
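The slide names the steps but not the rules behind them. A common way to derive stay points from a trajectory sequence is the distance/time-threshold method. A stay point is a run of consecutive records that remain within a distance threshold of the run's first record for at least a minimum duration. The sketch below implements that technique, which is not necessarily the authors' exact algorithm, using a spherical (haversine) distance. It then pairs consecutive stay points into OD trips, reading the slide's ">1 km" label as a minimum OD distance. The 500 m / 30 min thresholds, the record format and the coordinates are assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle (spherical) distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

@dataclass
class StayPoint:
    lon: float
    lat: float
    arrive: int  # epoch seconds
    leave: int

def stay_points(track, max_dist_m=500.0, min_stay_s=30 * 60):
    """track: time-ordered list of (epoch_seconds, lon, lat) for one user."""
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the run while records stay within max_dist_m of anchor record i.
        while j < n and haversine_m(track[i][1], track[i][2],
                                    track[j][1], track[j][2]) <= max_dist_m:
            j += 1
        run = track[i:j]
        if run[-1][0] - run[0][0] >= min_stay_s:
            stays.append(StayPoint(lon=sum(p[1] for p in run) / len(run),
                                   lat=sum(p[2] for p in run) / len(run),
                                   arrive=run[0][0], leave=run[-1][0]))
            i = j  # continue after the stay
        else:
            i += 1  # too short: slide the anchor forward
    return stays

def od_trips(stays, min_od_m=1000.0):
    """Pair consecutive stay points into (origin, destination) trips over min_od_m."""
    return [(a, b) for a, b in zip(stays, stays[1:])
            if haversine_m(a.lon, a.lat, b.lon, b.lat) > min_od_m]

home, office = (116.30, 39.90), (116.40, 39.95)  # hypothetical coordinates
track = [(t, *home) for t in range(0, 3601, 600)]
track.append((4200, 116.32, 39.90))              # a single in-transit record
track += [(t, *office) for t in range(5400, 9001, 600)]
stays = stay_points(track)
trips = od_trips(stays)
print(len(stays), len(trips))  # 2 1
```

Multi-day stable points would then be derived by looking for stay points that recur in the same place across days.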
  • 18. Geospatial Computation with Spark. Requirements: − Direct Spark support for SDE/Shp/GeoJSON files − Most of the geospatial computation done in the Spark cluster (point-area relationships, spherical distance, geospatial statistics, etc.) − Performance challenge: 20M records per iteration of geospatial computation. Solution design (diagram): SDE and shp files reach the Spark cluster through an SDE interface and an SHP interface; a Spark-GIS lib provides a geospatial API and a grid API (with grid definitions) used by the basic algorithm (geospatial computation) and the application algorithm (geospatial statistics)
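A standard approach to the point-area relationship listed above is a bounding-box pre-filter followed by an even-odd ray-casting test. This pure-Python sketch shows the per-record logic that would run inside a Spark map over the 20M records. It is not the Spark-GIS lib API from the slide. It treats lon/lat as planar coordinates, which is acceptable for city-scale polygons that do not cross the antimeridian, and the area names and shapes are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Area:
    name: str
    ring: list  # polygon vertices [(lon, lat), ...], first vertex not repeated
    bbox: tuple = field(init=False)

    def __post_init__(self):
        lons = [p[0] for p in self.ring]
        lats = [p[1] for p in self.ring]
        self.bbox = (min(lons), min(lats), max(lons), max(lats))

def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):  # edge straddles the ray's latitude
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside

def locate(lon, lat, areas):
    """Return the name of the first area containing the point, or None."""
    for area in areas:
        min_lon, min_lat, max_lon, max_lat = area.bbox
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # cheap rejection before the exact test
        if point_in_ring(lon, lat, area.ring):
            return area.name
    return None

areas = [
    Area("district_a", [(116.0, 39.8), (116.2, 39.8), (116.2, 40.0), (116.0, 40.0)]),
    Area("district_b", [(116.2, 39.8), (116.5, 39.8), (116.35, 40.0)]),  # triangle
]
print(locate(116.1, 39.9, areas))    # district_a
print(locate(116.45, 39.95, areas))  # None: inside the bbox, outside the triangle
```

With many polygons, a spatial index or the grid described on the next slide would replace the linear scan over `areas`.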
  • 19. Spatial Grid Design for Spark (diagram labels): Spark base algorithm; home-office model; Spark application statistics; statistics by grid; grid home-office statistic table; grid statistic table; relational database; group-grid mapping (conversion formula, expansion formula); statistics by group; user-defined queries; predefined queries; web GIS application; web GIS front-end
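A regular lon/lat grid makes the home-office statistics cheap to join and re-aggregate. This minimal sketch makes several assumptions:

- The grid uses a hypothetical fixed cell size of 0.01°.
- The home-office rule is hypothetical and not from the slides. Home is the cell where a user's stay records most often fall during night hours across days, and office is the most frequent cell during working hours.
- Rolling per-grid counts up to groups through a group-grid mapping is one reading of the slide's "statistics by group".

The slide's conversion and expansion formulas are not reproduced.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.01  # hypothetical grid size in degrees

def grid_id(lon, lat, cell=CELL_DEG):
    """(row, col) index of the fixed-size grid cell containing a point."""
    return (math.floor(lat / cell), math.floor(lon / cell))

def home_office(stays, night=range(0, 6), work=range(9, 17)):
    """stays: iterable of (user, day, hour, lon, lat) over several days.

    Returns {user: (home_grid, office_grid)}, with None where nothing matched.
    """
    homes, offices = defaultdict(Counter), defaultdict(Counter)
    for user, _day, hour, lon, lat in stays:
        if hour in night:
            homes[user][grid_id(lon, lat)] += 1
        elif hour in work:
            offices[user][grid_id(lon, lat)] += 1

    def top(counter):
        return counter.most_common(1)[0][0] if counter else None

    users = set(homes) | set(offices)
    return {u: (top(homes[u]), top(offices[u])) for u in users}

def grid_home_office_table(assignments):
    """Count users per (home_grid, office_grid) pair."""
    return Counter(pair for pair in assignments.values() if None not in pair)

def rollup_by_group(table, group_of_grid):
    """Aggregate a grid-pair table to group pairs via a group-grid mapping."""
    out = Counter()
    for (home, office), n in table.items():
        out[(group_of_grid.get(home), group_of_grid.get(office))] += n
    return out

stays = [
    ("u1", 1, 2, 116.305, 39.905), ("u1", 2, 3, 116.305, 39.905),    # nights
    ("u1", 1, 10, 116.405, 39.955), ("u1", 2, 11, 116.405, 39.955),  # work hours
    ("u2", 1, 1, 116.305, 39.905), ("u2", 1, 14, 116.405, 39.955),
]
table = grid_home_office_table(home_office(stays))
groups = {grid_id(116.305, 39.905): "residential_1", grid_id(116.405, 39.955): "cbd"}
print(rollup_by_group(table, groups))
```

Degree-based cells are not equal-area: they narrow in kilometres as latitude increases. That is acceptable within one city but matters when comparing regions far apart.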
  • 20. Agenda • Background: problem and data • Architecture • Technical Design • Big Data Platform design • Data governance design • Algorithm model • Spark spatial computing • Scenarios demo • Conclusion and next steps
  • 21. Scenario 1: Population Heatmap and Commute OD Routes. Better understanding of key metrics for urban planning with big data (sampled data vs. all data; historical data vs. current data): • Urban planners can plan the city more rationally based on the current population distribution • Traffic planning institutes can use this to optimize the traffic network • City management units can better plan city service facilities and detect abnormal city events based on population flow. New methodology and new applications using big data for better urban planning, monitoring, and decision making: • Quickly understand the current volume and direction of commuter traffic, and identify bottlenecks • Optimize traffic plans and scheduling during commute peak times • Build more new applications for planners and administrators, and provide new data services that let city residents participate in city management
  • 22. Scenario 2: Commute Time Cost; Office-Residence Imbalance
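The slide gives only the scenario names. This sketch shows one simple way to express both metrics. It assumes per-user records of home zone, office zone and an observed commute duration, for example the gap between leaving the home stay point and arriving at the office stay point. Imbalance is expressed per zone as the ratio of office users to residents, and commute time cost as the mean commute time of residents. Both metric definitions and the sample data are hypothetical, not the deck's formulas.

```python
from collections import defaultdict
from statistics import mean

def zone_metrics(commuters):
    """commuters: iterable of (home_zone, office_zone, commute_minutes).

    A ratio above 1 means more people work in the zone than live in it.
    """
    residents, workers = defaultdict(int), defaultdict(int)
    commute = defaultdict(list)
    for home, office, minutes in commuters:
        residents[home] += 1
        workers[office] += 1
        commute[home].append(minutes)
    zones = set(residents) | set(workers)
    return {
        z: {
            "residents": residents[z],
            "workers": workers[z],
            "office_residence_ratio": (workers[z] / residents[z]
                                       if residents[z] else float("inf")),
            "mean_commute_min": mean(commute[z]) if commute[z] else None,
        }
        for z in zones
    }

sample = [("north", "cbd", 50), ("north", "cbd", 40), ("south", "cbd", 25), ("cbd", "cbd", 10)]
for zone, m in sorted(zone_metrics(sample).items()):
    print(zone, m)
```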
  • 23. Big Data Architecture Key Points. System: • Big data product selection: ODPi (Open Data Platform) • Big data component selection: data movement, data store, computing, SQL interface, … • Deployment mode selection: local cluster, IaaS, big data cloud • Separate deployment and data exploration environments. Data: • Data collection • Data ELT • Data pipeline • Data lifecycle governance • Data volume planning • Data fusion • Spatial data analysis and visualization. Platform: • HA • Security • Monitoring & stability • Scale-out and upgrade • Resource management • Job scheduling • Multi-tenancy. Algorithm & model level: • Business analysis • Algorithm model design • Model verification • Model adjustment • Model validity assurance
  • 24. Road Ahead. Deeper analysis with more scenarios: • Traffic prediction • Trip prediction • Commute modes • etc. More data sources for trajectory/traffic: • GPS for taxis and buses • RFID on roads • Road monitoring data • Subway station check-in/out data • Parking lots • Fusion with weather and social data. Other directions: • A data exploration environment to support data science and continuous engineering of new features • Wider use of Spark ML for traffic prediction • Cluster scale-out with more data and algorithms • Data ingestion with Kafka/Flume (message hub) • SQL on Hadoop • Graph computation for shortest paths and road matching. Diagram labels: current deployment; big data platform scale-out; new scenarios with new data; data exploration environment; engineering and deployment; data movement
  • 25. Spark Geospatial Analysis for Other Scenarios (diagram): Spatial-temporal trajectory analysis for humans and for vehicles (trajectory data management, trajectory analysis functions) • Common API: geospatial data pre-processing, geospatial geometry computing, surface mesh computing • Distributed geospatial computing API (based on Spark) • IBM's Big Data Analytics Platform • Applications: smart transportation, smart logistics, smart tourism, and others
  • 26. Big Data University and Data Science Workbench. Big Data University: − A community initiative led by IBM − @yourpace, @yourplace online courses about data − Developed by industry experts − Free courses by the community with hands-on labs − Certificates of completion and badges − Looking for contributors! An integrated set of tools, languages, and execution environments: • Clean and prepare data: OpenRefine • Experiment with and analyze data: Jupyter Notebooks, R Studio, SeaHorse • Connect to data processing engines: Spark, Hadoop, dashDB, BigSQL, BigR. http://DataScientistWorkbench.com http://bigdatauniversity.com