High Performance Spatial-Temporal Trajectory Analysis with Spark
DataWorks Summit/Hadoop Summit
1.
© 2016 IBM Corporation
High Performance Spatial-Temporal Trajectory Analysis with Spark
YongHua (Henry) Zeng, zengyh@cn.ibm.com
Big Data & Analytics Solution Architect, Analytics Platform Services, IBM China Lab
2.
Agenda
• Background
• Architecture
• Technical Design
  • Big Data Platform design
  • Data governance design
  • Algorithm model
  • Spark spatial computing
• Scenarios demo
• Conclusion and Next step
3.
Background
Introduction: studying human trajectories with mobile signal data
Problem
• Traditional planning methods cannot handle the variety of data now available
• Much of this data has the characteristics of big data (volume, velocity, variety)
• Cellular signaling data is one typical example, and it enables new types of applications for smarter urban planning
• Analyzing cellular signaling data can help urban planners and city governing bodies better understand the city
Data Set
• Cellular signal data
• 5M mobile users
• 25M to 50M of data every minute; 30 GB of data daily
• ~400M cellular signal records daily
• More data coming from GPS and RFID for 4M vehicles
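The Data Set figures can be turned into a quick sizing check: what average ingest rate and record size they imply. This is a minimal sketch, not a calculation from the deck. It assumes "30G" means decimal gigabytes and that load is spread evenly over the day, although real signaling traffic peaks during commuting hours.

```python
# Back-of-envelope ingest sizing from the Data Set figures on this slide.
# Assumption: load is spread evenly over 24 hours (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60

RECORDS_PER_DAY = 400_000_000   # "~400M cellular signal records daily"
BYTES_PER_DAY = 30 * 10**9      # "30 GB of data daily", read as decimal GB (assumption)


def average_rate(records_per_day: int) -> float:
    """Average records per second if the daily load were perfectly uniform."""
    return records_per_day / SECONDS_PER_DAY


def bytes_per_record(bytes_per_day: int, records_per_day: int) -> float:
    """Average raw record size implied by the daily volume and record count."""
    return bytes_per_day / records_per_day


if __name__ == "__main__":
    print(f"~{average_rate(RECORDS_PER_DAY):,.0f} records/s on average")
    print(f"~{bytes_per_record(BYTES_PER_DAY, RECORDS_PER_DAY):.0f} bytes per record")
```

These figures work out to roughly 4,600 records per second and about 75 bytes per record on average. The peak rate will be higher, so streaming capacity should be sized with headroom above this average.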
4.
Solution Architecture
[Diagram labels: Data sources (Shp files, etc.); Data Ingestion; Distributed File System (HDFS); Streaming; Batch; Computation Engine; Resource Management (YARN); Relational Database with Spatial Extension; API Services; Orchestration; Visualization & Report (JavaScript, Flex); LDAP Service; Cluster Management; Security Service]
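A relational database with a spatial extension typically answers predicates such as "which region contains this point?", for example when mapping signal coordinates to residential or community areas. The sketch below illustrates that kind of predicate with the classic ray-casting test. It is not the database's implementation. It assumes simple (non-self-intersecting) polygons in planar coordinates, and the `assign_region` helper and region names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y): projected metres, or lon/lat over a small area


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: shoot a ray from pt towards +x and count edge crossings.
    An odd number of crossings means the point is inside. Points lying
    exactly on an edge may be classified either way."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_region(pt: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first region whose polygon contains pt, if any."""
    for name, polygon in regions.items():
        if point_in_polygon(pt, polygon):
            return name
    return None
```

At hundreds of millions of points per day, a production system would prefilter candidate regions with a spatial index, such as bounding boxes or a grid, before running the exact test.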
5.
Business Architecture
[Diagram] The Big Data Platform:
• Source Data Pre-processing: Data Collection, Data Aggregation, Data Cleansing, Coordinates Formalization, Abnormal Detections
• Base Model Computing: Final Computing, Data Quality Metrics
• Application Model Computing: Residential Statistics, Working Region Statistics, Regional Commuting Analysis
Application Views: GIS Server, GIS Database (residential and community data)
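The slide does not say how the "Abnormal Detections" step works. One common approach for signaling trajectories is a speed filter: drop a record if reaching it from the previous accepted position would require an implausible speed, which is typical of cell "ping-pong" or bad coordinates. The sketch below shows that technique for a single user's records. The `SignalRecord` fields and the 300 km/h threshold are hypothetical choices.

```python
import math
from dataclasses import dataclass
from typing import List

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass
class SignalRecord:  # hypothetical record layout
    user_id: str
    ts: float   # epoch seconds
    lon: float  # degrees, WGS84
    lat: float


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two lon/lat points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def drop_jumps(records: List[SignalRecord], max_speed_kmh: float = 300.0) -> List[SignalRecord]:
    """Filter one user's records, dropping any record whose implied speed from
    the last accepted record exceeds max_speed_kmh (threshold is an assumption).
    Records sharing the last accepted record's timestamp are dropped as duplicates."""
    kept: List[SignalRecord] = []
    for rec in sorted(records, key=lambda r: r.ts):
        if kept:
            prev = kept[-1]
            hours = (rec.ts - prev.ts) / 3600.0
            if hours <= 0:
                continue
            speed = haversine_km(prev.lon, prev.lat, rec.lon, rec.lat) / hours
            if speed > max_speed_kmh:
                continue
        kept.append(rec)
    return kept
```

In Spark, this per-user function would run after grouping records by user, for example within a partition sorted by user and timestamp.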
6.
Architecture Decision Points
[Diagram] Layers: mobile signaling data (online/offline); data collection; standard big data platform (ELT, data fusion, data store & analysis, streaming); base algorithms (index computing, data quality computing); application algorithms (OD analysis, home-office analysis); data export to the Home-Office DW/Market database (business, spatial) and the GIS spatial DB; GIS application display (heat map) for User 1, User 2 and User 3; job and resource scheduling across the platform.
Technology choices: Java, Sqoop, Spark/HDFS, Spark Streaming, Oozie/YARN and shell scripts, spatial DB (spatial extension), ArcGIS, Flex/JS.
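The home-office and OD (origin-destination) analyses are named but not specified in the deck. A widely used heuristic takes a user's most frequent night-time cell as "home" and most frequent working-hours cell as "work", then counts home-to-work pairs into an OD matrix. The sketch below follows that heuristic and is not the authors' model. The hour windows, the `(user_id, hour, cell_id)` input shape and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

NIGHT_HOURS = set(range(21, 24)) | set(range(0, 7))  # assumed "at home" window
WORK_HOURS = set(range(9, 18))                       # assumed "at work" window

HomeWork = Tuple[Optional[str], Optional[str]]


def infer_home_work(obs: Iterable[Tuple[str, int, str]]) -> Dict[str, HomeWork]:
    """obs holds (user_id, hour_of_day, cell_id) observations.
    Home = most frequent night cell; work = most frequent working-hours cell.
    Either may be None if the user has no observations in that window."""
    night: Dict[str, Counter] = defaultdict(Counter)
    day: Dict[str, Counter] = defaultdict(Counter)
    for user, hour, cell in obs:
        if hour in NIGHT_HOURS:
            night[user][cell] += 1
        elif hour in WORK_HOURS:
            day[user][cell] += 1
    result: Dict[str, HomeWork] = {}
    for user in set(night) | set(day):
        home = night[user].most_common(1)[0][0] if night[user] else None
        work = day[user].most_common(1)[0][0] if day[user] else None
        result[user] = (home, work)
    return result


def od_matrix(home_work: Dict[str, HomeWork]) -> Counter:
    """Count commuters per (home cell, work cell) pair; incomplete users are skipped."""
    return Counter((h, w) for h, w in home_work.values() if h is not None and w is not None)
```

A real implementation would aggregate over many days, handle weekends separately and map cells to planning regions before building the matrix.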
7.
System front-end architecture
[Diagram labels: Geospatial Analysis; Big Data Platform (HDFS); Sqoop; FTP]
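Front-end geospatial views such as a population heat map ("thermodynamic diagram") are usually fed pre-aggregated counts rather than raw points. A simple way to prepare them on the platform is to bin coordinates into a fixed grid. The sketch below assumes square cells measured in degrees, with a hypothetical default size of 0.01°. It is not the deck's stated method.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_cell(lon: float, lat: float, cell_deg: float) -> Tuple[int, int]:
    """Index of the square grid cell (cell_deg x cell_deg degrees) containing the point."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def heat_map(points: Iterable[Tuple[float, float]], cell_deg: float = 0.01) -> Counter:
    """Count points per grid cell; these counts are what a heat-map layer would colour."""
    return Counter(grid_cell(lon, lat, cell_deg) for lon, lat in points)
```

Degree-based cells shrink in east-west width away from the equator. A projected grid in metres gives more uniform cell areas if that matters for the visualization.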
8.
Agenda (same as slide 2)
9.
Items on Big Data Platform Design
• Planning and product selection
• Deployment and operation
• Application deployment
• Job scheduling
• Resource management
• Spark within BigInsights
10.
IBM BigInsights for Apache Hadoop and Spark
[Diagram: IBM Analytics Platform, "Built on Spark. Hybrid. Trusted." Capabilities: Discovery & Exploration, Prescriptive Analytics, Predictive Analytics, Content Analytics, Business Intelligence; Data Mgmt, Hadoop & NoSQL, Content Mgmt, Data Warehouse; Information Integration & Governance; Spark Analytics Operating System; Machine Learning. On premises and on cloud; data at rest and in motion; inside and outside the firewall; structured and unstructured.]
• Analytical platform for persistent Big Data
– 100% open source core with IBM add-ons for analysts, data scientists, and admins
– On site or cloud
• Distinguishing characteristics
– Built-in analytics: enhances business knowledge
– Enterprise software integration: complements and extends existing capabilities
– Production-ready: speeds time-to-value
• IBM advantage
– Combination of software, hardware, services and research
11.
Overview of BigInsights
• IBM Open Platform: 100% open source platform compliant with ODPi
  - Apache Hadoop ecosystem
  - Apache Spark ecosystem
• IBM-specific BigInsights features
  - Big SQL (industry-standard SQL)
  - Text analytics
  - BigSheets (spreadsheet-style tool)
  - Big R (R support)
  - IBM Streams, Cognos (limited-use licenses)
• Free Quick Start (non-production)
  - IBM Open Platform
  - IBM value-added features
  - Community support
12.
Big Data Platform Job Scheduling and Resource Management
(Diagram: job scheduling with Oozie; resource management with YARN)
• Dedicated slave nodes for computing: almost all CPU and memory resources on each slave node are managed by YARN
• Capacity Scheduler with dedicated queues for different business uses: production (batch and streaming processing, data movement) and development
• Elastic capacity for each queue, obtained by specifying a large maximum capacity, to achieve high resource utilization
• Fine-grained YARN container allocation, obtained by specifying small vcore/memory increments, to support various workload types (big, medium and small jobs)
• No cgroups-based CPU resource isolation, because it caused system stability issues in our IOP 4.1 / RHEL 6.5 environments
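Why small increments help can be shown with a little arithmetic. The sketch below assumes YARN rounds each container memory request up to a multiple of an allocation step (in the Hadoop 2.x Capacity Scheduler this step is, to our understanding, `yarn.scheduler.minimum-allocation-mb`) and rejects requests above the per-container maximum. The request sizes are hypothetical, and only memory is modeled, not vcores.

```python
import math


def normalize_mb(request_mb: int, increment_mb: int, max_mb: int) -> int:
    """Round a container memory request up to the next multiple of the increment.

    Requests above the per-container maximum are rejected rather than capped.
    """
    if request_mb <= 0 or request_mb > max_mb:
        raise ValueError("request_mb must be in (0, max_mb]")
    granted = math.ceil(request_mb / increment_mb) * increment_mb
    return min(granted, max_mb)


def wasted_mb(requests, increment_mb: int, max_mb: int) -> int:
    """Memory granted to containers but not asked for by them."""
    return sum(normalize_mb(r, increment_mb, max_mb) - r for r in requests)


# Hypothetical container sizes (MB) for a big, a medium and a small job.
requests = [10116, 4496, 1224]

coarse = wasted_mb(requests, increment_mb=4096, max_mb=32768)
fine = wasted_mb(requests, increment_mb=512, max_mb=32768)
print(f"idle memory with a 4096 MB increment: {coarse} MB")
print(f"idle memory with a 512 MB increment:  {fine} MB")
```

With these numbers the coarse step leaves 8740 MB granted but unused across three containers, against 548 MB with the fine step, which is the utilization argument behind the slide's choice for a mixed workload.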
13.
Spark within BigInsights
• Deployment
  - Ambari for installation and deployment
  - Spark (compute) co-located with the HDFS data nodes
  - Cluster mode, with YARN as the resource manager
• Runtime configuration
  - A bad configuration can make jobs underperform or fail, or make the cluster unstable
  - Methodology for configuring the number of partitions, the cores and memory per executor, and the number of executors
• Monitoring and tuning
  - Spark Streaming stability (log monitoring, checkpointing)
  - Handling massive numbers of small files
  - Shuffle, partitioning, I/O utilization, etc.
  - Job execution, GC time, etc. via the dashboard
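The slide mentions a configuration methodology without giving it. As an illustration only, here is the widely cited rule of thumb for Spark on YARN, not necessarily the authors' method: reserve 1 core and 1 GB per node for the OS and Hadoop daemons, use about 5 cores per executor, keep one executor slot for the ApplicationMaster, subtract the YARN memory overhead (in the Spark 1.x line roughly 10% of executor memory with a 384 MB floor), and aim for 2 to 3 tasks per core. The node sizes in the example are hypothetical, and `mem_gb_per_node` should be read as the memory YARN may allocate on a node.

```python
import math
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int
    num_partitions: int


def plan_executors(nodes: int, cores_per_node: int, mem_gb_per_node: float,
                   executor_cores: int = 5, overhead_fraction: float = 0.10,
                   min_overhead_gb: float = 0.384, tasks_per_core: int = 3) -> ExecutorPlan:
    # Rule of thumb: leave 1 core and 1 GB per node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1
    executors_per_node = usable_cores // executor_cores
    if executors_per_node < 1:
        raise ValueError("executor_cores exceeds the usable cores of a node")
    # Leave one executor slot for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1
    slot_gb = usable_mem_gb / executors_per_node
    # A YARN container holds heap + overhead, overhead = max(min_overhead, fraction * heap).
    heap_gb = slot_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < min_overhead_gb:
        heap_gb = slot_gb - min_overhead_gb
    # 2-3 tasks per CPU core is the range suggested in the Spark tuning guide.
    num_partitions = num_executors * executor_cores * tasks_per_core
    return ExecutorPlan(num_executors, executor_cores, math.floor(heap_gb), num_partitions)


plan = plan_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
print(plan)
print(f"--num-executors {plan.num_executors} --executor-cores {plan.executor_cores} "
      f"--executor-memory {plan.executor_memory_gb}g")
```

For six 16-core, 64 GB nodes this yields 17 executors with 5 cores and 19 GB each and 255 partitions, values that would then be passed to `spark-submit` and refined by watching shuffle and GC time on the dashboard.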
14.
Data Perspective Considerations
• Data process flow
• Data management: capacity sizing, layout in HDFS, lifecycle management
• Data movement: between the big data platform and other systems (RDBMS)
(Diagram: data process flow)
15.
Data Lifecycle Management
• Five layers of data in the system
  - L1: raw data ingested into HDFS
  - L2: ELT data (pre-processed with streaming) in HDFS
  - L3: result data from the algorithm model in HDFS
  - L4: data for visualization (in HDFS or an RDBMS)
  - L5: archived data in external storage (compressed)
• Design the data layout, number of copies (replication), retention, etc. in HDFS
• Jobs (Oozie) to prune outdated data in HDFS
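A pruning job of the kind listed above mostly needs to decide which partitions have outlived their layer's retention. The sketch assumes a hypothetical HDFS layout of one date-partitioned directory per layer (`/data/L1/dt=2016-05-01`) and hypothetical retention periods; it only prints `hdfs dfs -rm -r` commands as a dry run, and in the setup described here an Oozie coordinator would be what runs such a step on a schedule.

```python
from datetime import date, timedelta
from typing import Iterable, List

# Hypothetical retention (days) per layer; L5 lives in external storage.
RETENTION_DAYS = {"L1": 7, "L2": 30, "L3": 180, "L4": 365}


def partition_date(path: str) -> date:
    """Parse the date from a hypothetical layout such as /data/L1/dt=2016-05-01."""
    return date.fromisoformat(path.rsplit("dt=", 1)[1])


def expired_partitions(paths: Iterable[str], today: date) -> List[str]:
    expired = []
    for path in paths:
        layer = path.split("/")[2]
        keep_days = RETENTION_DAYS.get(layer)
        if keep_days is None:
            continue  # unknown layer: never delete automatically
        if partition_date(path) < today - timedelta(days=keep_days):
            expired.append(path)
    return expired


paths = [
    "/data/L1/dt=2016-05-20",
    "/data/L1/dt=2016-05-30",
    "/data/L2/dt=2016-04-15",
    "/data/L3/dt=2016-04-15",
]
for p in expired_partitions(paths, today=date(2016, 6, 1)):
    print(f"hdfs dfs -rm -r {p}")
```

Skipping unknown layers is a deliberate safety choice: a typo in the layout should leave data in place rather than delete it.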
16.
Data Movement
• Data ingestion into the big data platform
  - Offline/online ingestion: HDFS loading from external storage / FTP server, plus streaming
  - Future: Kafka + Spark Streaming (more data sources and analytics paths)
• Data export from the big data platform
  - Near-real-time heatmap generation
  - Algorithm model results exported to the RDBMS with Sqoop
(Diagram labels: FTP push; HDFS load from FTP; Spark Streaming; basic algorithm (stay points in HDFS); pushed to the database every 30 minutes; ArcGIS Server (generates the heatmap from a shapefile); real-time display; historical query)
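The 30-minute push behind the near-real-time heatmap amounts to bucketing records into fixed time windows and counting people per grid cell. A minimal sketch, assuming each record has already been reduced to (timestamp, user, grid cell) and that a heatmap counts distinct users rather than raw signaling events (both assumptions); in Spark Streaming the same keying would happen per batch or window before the results are written to the database.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

WINDOW_SECONDS = 30 * 60  # the 30-minute push interval from the slide


def window_start(ts: int) -> int:
    """Start (epoch seconds) of the 30-minute window containing ts."""
    return ts - ts % WINDOW_SECONDS


def heatmap_counts(records: Iterable[Tuple[int, Hashable, Hashable]]) -> Dict[Tuple[int, Hashable], int]:
    """records: (epoch_seconds, user_id, grid_cell). Returns distinct users per (window, cell)."""
    users = defaultdict(set)
    for ts, user_id, cell in records:
        users[(window_start(ts), cell)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


records = [(0, "u1", "A"), (60, "u1", "A"), (1799, "u2", "A"), (1800, "u1", "A"), (100, "u3", "B")]
print(heatmap_counts(records))
```

User u1 appears twice in the first window of cell A but is counted once, so repeated pings from a phone that is standing still do not inflate the heatmap.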
17.
Algorithm Model of Trajectory (OD) Analysis
(Diagram)
• Processing chain: cellular signal data, ELT, trajectory sequence, stay points, multi-day stable points, OD identification (commute)
• Statistics: OD statistics (OD index, commute and people flow statistics), data quality index, statistics data export; results grouped by distance (> 1 km) and by area type (Group 1 to Group 3)
• Layers: base algorithm and application algorithm
• Algorithm requirements: accuracy validation, performance, stability, extensibility, configurability
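The slide does not spell out how stay points are extracted from a trajectory sequence. A common approach, sketched here as an assumption rather than the project's algorithm, is threshold-based stay-point detection: a stay is a run of consecutive records that remain within a distance threshold of the first record for at least a minimum duration, with distances measured on the sphere. The 500 m and 30-minute thresholds are hypothetical; because cellular signaling positions are only cell-tower accurate, real thresholds would need tuning against the data.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (spherical) distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class StayPoint:
    lat: float
    lon: float
    arrive: int
    leave: int


def detect_stay_points(traj: List[Tuple[int, float, float]],
                       dist_m: float = 500.0, min_stay_s: int = 30 * 60) -> List[StayPoint]:
    """traj: time-ordered (epoch_seconds, lat, lon) records of one user."""
    stays: List[StayPoint] = []
    i, n = 0, len(traj)
    while i < n:
        _, lat_i, lon_i = traj[i]
        j = i + 1
        # Grow the window while points remain within dist_m of the anchor point i.
        while j < n and haversine_m(lat_i, lon_i, traj[j][1], traj[j][2]) <= dist_m:
            j += 1
        if traj[j - 1][0] - traj[i][0] >= min_stay_s:
            window = traj[i:j]
            stays.append(StayPoint(
                lat=sum(p[1] for p in window) / len(window),
                lon=sum(p[2] for p in window) / len(window),
                arrive=traj[i][0],
                leave=traj[j - 1][0],
            ))
            i = j  # continue after the stay
        else:
            i += 1
    return stays


traj = [(0, 39.90, 116.40), (600, 39.90, 116.40), (1200, 39.90, 116.40),
        (1800, 39.90, 116.40), (2400, 39.95, 116.40), (3000, 39.95, 116.40)]
for sp in detect_stay_points(traj):
    print(sp)
```

The function works on one user's sorted records, so in Spark it fits naturally after grouping records by user; multi-day stable points would then be stay points that recur at the same place across days.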
18.
Geospatial Computation with Spark
• Requirements
  - Spark to directly support SDE, shapefile and GeoJSON files
  - Most of the geospatial computation runs in the Spark cluster (point-area relationship, spherical distance, geospatial statistics, etc.)
  - Performance challenge: 20M records per iteration of geospatial computation
• Solution design (diagram components): SDE and shapefile inputs; Spark cluster running the basic algorithm (geospatial computation) and the application algorithm (geospatial statistics); Spark-GIS lib with an SDE interface, SHP interface, geospatial API, grid API and grid definition
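The Spark-GIS library's API is not shown, so as an illustration of the point-area relationship here is a plain-Python ray-casting point-in-polygon test with a cheap bounding-box prefilter. A per-record function like this could be applied inside a Spark `map`, with the area polygons shipped to executors as a broadcast variable (an assumption about the wiring). Coordinates are treated as planar lon/lat, which is adequate for city-scale polygons but not near the poles or the antimeridian; the areas below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates


@dataclass
class Area:
    area_id: str
    ring: List[Point]  # polygon vertices; closing vertex optional
    bbox: Tuple[float, float, float, float] = field(init=False)

    def __post_init__(self) -> None:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))

    def contains(self, p: Point) -> bool:
        x, y = p
        min_x, min_y, max_x, max_y = self.bbox
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            return False  # cheap bounding-box rejection before the exact test
        inside = False
        n = len(self.ring)
        for k in range(n):
            x1, y1 = self.ring[k]
            x2, y2 = self.ring[(k + 1) % n]
            # Ray casting: toggle on each edge crossed by a ray going right from p.
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def locate(p: Point, areas: Sequence[Area]) -> Optional[str]:
    """Return the id of the first area containing p, or None."""
    for area in areas:
        if area.contains(p):
            return area.area_id
    return None


areas = [Area("square", [(0, 0), (10, 0), (10, 10), (0, 10)]),
         Area("triangle", [(20, 0), (30, 0), (25, 10)])]
print(locate((5, 5), areas), locate((25, 2), areas), locate((50, 50), areas))
```

At 20M records per iteration, the bounding-box check matters because most point/area pairs are rejected without touching the polygon edges; with many areas, a spatial index that narrows the candidate areas per point would be the next step.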
19.
Spatial Grid Design for Spark
(Diagram components)
• Spark base algorithm: home-office model, statistics by group, group-grid mapping (relation), statistics by grid
• Result tables: grid home-office statistics table, grid statistics table
• Spark application statistics: user-defined queries and predefined queries, using conversion and expansion formulas
• Relational database, Web GIS application, Web GIS front-end
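The diagram names statistics by grid and a group-grid mapping but not the grid definition itself. A minimal sketch, assuming a regular latitude/longitude grid with a hypothetical origin and 0.01° cell size, and a hypothetical mapping from groups (for example districts) to cells. Degree-based cells are not equal-area; a projected, meter-based grid is a common alternative when cell areas must be comparable.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Cell = Tuple[int, int]  # (row, col)


class Grid:
    """Regular lat/lon grid anchored at (lat0, lon0) with square cells of cell_deg degrees."""

    def __init__(self, lat0: float, lon0: float, cell_deg: float) -> None:
        self.lat0, self.lon0, self.cell_deg = lat0, lon0, cell_deg

    def cell_of(self, lat: float, lon: float) -> Cell:
        return (math.floor((lat - self.lat0) / self.cell_deg),
                math.floor((lon - self.lon0) / self.cell_deg))


def stats_by_grid(grid: Grid, points: Iterable[Tuple[float, float]]) -> Counter:
    """Count points per grid cell."""
    return Counter(grid.cell_of(lat, lon) for lat, lon in points)


def stats_by_group(cell_counts: Counter, group_cells: Dict[str, List[Cell]]) -> Dict[str, int]:
    """Roll grid-level counts up to groups through a group-to-grid mapping."""
    return {group: sum(cell_counts[c] for c in cells) for group, cells in group_cells.items()}


grid = Grid(lat0=39.90, lon0=116.30, cell_deg=0.01)
counts = stats_by_grid(grid, [(39.905, 116.305), (39.906, 116.306), (39.915, 116.305)])
groups = {"district_a": [(0, 0), (0, 1)], "district_b": [(1, 0)]}
print(counts)
print(stats_by_group(counts, groups))
```

Per-record work reduces to integer arithmetic in `cell_of`, so grid statistics can be computed in Spark with a key-by-cell aggregation, and group-level figures follow from the mapping table without rescanning the raw records.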
20.
Agenda
• Background: problem and data
• Architecture
• Technical design
• Big data platform design
• Data governance design
• Algorithm model
• Spark spatial computing
• Scenarios demo
• Conclusion and next steps
21.
Scenario 1: Population Heatmap, Commute OD Routes
Better understanding of key metrics for urban planning with big data (sampled data vs. all data; historical data vs. current data)
• Urban planners can plan the city more soundly based on the current population distribution
• Traffic planning institutes can use this to optimize the traffic network
• City management units can better plan city service facilities and detect abnormal city events based on population flow
New methodology and new applications: using big data for better urban planning, monitoring and decision making
• Quickly understand current commute traffic volume and direction, and identify bottlenecks
• Optimize traffic planning and scheduling during commute peak times
• More new applications can be built for planners and administrators, and new data services can be provided to city residents so they can take part in city management
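Commute volume and direction reduce to counting commuters per home-cell/office-cell pair. A minimal sketch, assuming each user's home and office have already been assigned to grid cells (the user data is hypothetical) and that users who live and work in the same cell are excluded. The net-inflow helper is one simple way to look at office-residence imbalance per cell, not necessarily the metric used in the project.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
OD = Tuple[Cell, Cell]


def od_matrix(home_office: Dict[str, Tuple[Cell, Cell]]) -> Counter:
    """Count commuters per (home cell, office cell); same-cell pairs are skipped."""
    return Counter((home, office) for home, office in home_office.values() if home != office)


def top_flows(matrix: Counter, k: int) -> List[Tuple[OD, int]]:
    """The k largest commute flows, i.e. candidate bottleneck directions."""
    return matrix.most_common(k)


def net_inflow(matrix: Counter) -> Counter:
    """Commuters arriving minus commuters leaving, per cell."""
    net: Counter = Counter()
    for (home, office), n in matrix.items():
        net[office] += n
        net[home] -= n
    return net


users = {
    "u1": ((0, 0), (2, 3)),
    "u2": ((0, 0), (2, 3)),
    "u3": ((1, 1), (2, 3)),
    "u4": ((5, 5), (5, 5)),
}
m = od_matrix(users)
print(top_flows(m, 1))
print(net_inflow(m))
```

A strongly positive cell is a job-heavy destination and a strongly negative one a residential origin; pairing the top flows with a route layer is what turns the matrix into the commute OD routes shown on the map.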
22.
Scenario 2: Commute Time Cost, Office-Residence Imbalance
23.
Big Data Architecture Key Points: System
• Big data product selection: ODPi (Open Data Platform)
• Big data component selection: data movement, data storage, computing, SQL interface… • …
• Deployment mode selection: local cluster, IaaS, big data cloud
• Separate the deployment environment from the data exploration environment
Big Data Architecture Key Points: Data
• Data collection
• Data ELT
• Data pipeline
• Data lifecycle governance
• Data volume planning
• Data fusion
• Spatial data analysis and visualization
Big Data Architecture Key Points: Platform
• HA
• Security
• Monitoring and stability
• Scale-out and upgrade
• Resource management
• Job scheduling
• Multi-tenancy
Big Data Architecture Key Points: Algorithm and Model Level
• Business analysis
• Algorithm model design
• Model verification
• Model adjustment
• Model validity assurance
24.
Road Ahead…
• Deeper analysis with more scenarios: traffic prediction, trip prediction, commute methods, etc.
• More data sources for trajectory/traffic: GPS for taxis and buses, RFID on roads, road monitoring data, subway stop check-in/out information, parking lots; fusion with weather and social data
• Data exploration environment to support data science and continuous engineering of new features
• Leverage more Spark ML for traffic prediction
• Cluster scale-out with more data and algorithms
• Data ingestion with Kafka/Flume (message hub)
• SQL on Hadoop
• Graph computation for nearest path and road matching
(Diagram: current deployment; big data platform scale-out; new scenarios with new data; data exploration environment; engineering and deployment; data movement)
25.
Spark Geospatial Analysis for Other Scenarios
(Diagram components)
• Applications: smart transportation, smart logistics, smart tourism, others
• Spatial-temporal trajectory analysis for humans; spatial-temporal trajectory analysis for vehicles
• Trajectory data management; trajectory analysis functions
• Common API: geospatial data preprocessing, geospatial geometry computing, surface mesh computing
• Distributed geospatial computing API (based on Spark)
• IBM's Big Data Analytics Platform
26.
Big Data University and Data Science Workbench
Big Data University
• A community initiative led by IBM
• @yourpace, @yourplace online courses about data
• Developed by industry experts
• Free courses by the community with hands-on labs
• Certificate of completion and badges
• Looking for contributors!
Integrated set of tools, languages and execution environments
• Clean and prepare data: OpenRefine
• Experiment with and analyze data: Jupyter Notebooks, R Studio, SeaHorse
• Connect to data processing engines: Spark, Hadoop, dashDB, BigSQL, BigR
http://DataScientistWorkbench.com
http://bigdatauniversity.com