Railroad Modeling at Hadoop Scale
DataWorks Summit
1.
© 2014 Silicon Valley Data Science LLC. All Rights Reserved. svds.com @SVDataScience
Railroad Modeling at Hadoop Scale
Hadoop Summit, 3 June 2014, San Jose, CA
John Akred (@BigDataAnalysis), Tatsiana Maskalevich (@notrockstar)
2.
Why is a data science & engineering consulting company building its own Caltrain app?
3.
[Image-only slide]
4.
• Commuter rail between San Francisco and San Mateo and Santa Clara counties
• ~30 stations
• 118 passenger cars
  – 60% >= 30 years old
• 2014 weekday ridership is 52,019 people
• On-time performance is about 92%
• No reliable real-time status information
• API outage between April 5th and June 2nd
5.
HOW DO WE KNOW IF THE TRAIN IS LATE?
• Direct observation
  – We can hear the train horn
  – We can see the train when it goes by
• Purpose-built systems
  – We can use Caltrain APIs (when working)
• Other signals
  – We can check Twitter for delay info or rider comments
6.
SVDS Approach
• Take advantage of the available signals
• Use historical data to make direct and latent observations more useful
• Provide a service that gives users valuable planning and riding features
• Don’t let the perfect be the enemy of the good
7.
DATA RESILIENCY
• Stovepipe: one-to-one relationship from data source to product
• Hard Failure: if the data source is broken, so is the app
• Multi-sourced: redundancy of overlapping data sources makes your products more resilient
• Graceful Degradation: if a data source breaks, there is a backup and your app continues to function
Production data services abstract the probabilistic integration of overlapping data sources. We call this model a Data Mesh.
[Diagram labels: Products, Data Sources, Broken Data Sources, Data Services]
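The graceful-degradation idea above can be sketched in a few lines of Python. This is a minimal illustration, not the SVDS data service: the source names, the (delay, confidence) return shape, and the confidence-weighted average are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: each source returns (delay_minutes, confidence) or raises if down.
Source = Callable[[], Tuple[float, float]]


def estimate_delay(sources: Dict[str, Source]) -> Optional[float]:
    """Confidence-weighted average delay over the sources that respond."""
    weighted_sum = 0.0
    total_weight = 0.0
    for name, read in sources.items():
        try:
            delay, confidence = read()
        except Exception:
            # A broken source is skipped instead of failing the whole service.
            continue
        weighted_sum += delay * confidence
        total_weight += confidence
    if total_weight == 0.0:
        return None  # every source is down: no estimate available
    return weighted_sum / total_weight


def broken_api() -> Tuple[float, float]:
    raise ConnectionError("API outage")


# Hypothetical sources: the API is down, audio and Twitter still report.
sources = {
    "api": broken_api,
    "audio": lambda: (4.0, 0.6),
    "twitter": lambda: (6.0, 0.2),
}
delay = estimate_delay(sources)
```

With the API down, the estimate still comes back from the two remaining signals, which is the behavior the slide calls graceful degradation.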
8.
Source Signals: Audio, Image, Text, API
Variety, Volume, Velocity
9.
Audio Capture and Ingest
• Microphone connected to Raspberry Pi: mic -> preamp -> analog-to-digital converter -> USB
• PyAudio running on Raspberry Pi serializes audio as an array of 2-byte integers
• Sound data + metadata -> Flume on AWS via flumelogger
• We use FFT + Decision Trees to detect and classify the trains into express and local based on the whistle sound
[Diagram labels: Raspberry Pi, Raw Audio Agent]
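As a rough illustration of "sound data + metadata", the sketch below packs 2-byte integer samples and a few metadata fields into one JSON line that a log-style agent could forward. The field names and the observation post name are hypothetical, and neither PyAudio nor flumelogger is used here; only the standard library.

```python
import array
import base64
import json
import time


def pack_audio_event(samples, sample_rate_hz, post):
    """Serialize 16-bit samples plus metadata into one JSON line."""
    pcm = array.array("h", samples)  # "h" = signed 2-byte integer
    return json.dumps({
        "post": post,  # hypothetical observation post identifier
        "captured_at": time.time(),
        "sample_rate_hz": sample_rate_hz,
        "num_samples": len(pcm),
        "pcm_base64": base64.b64encode(pcm.tobytes()).decode("ascii"),
    })


def unpack_audio_event(line):
    """Inverse of pack_audio_event: returns (metadata dict, list of samples)."""
    event = json.loads(line)
    pcm = array.array("h")
    pcm.frombytes(base64.b64decode(event["pcm_base64"]))
    return event, pcm.tolist()


line = pack_audio_event([0, 1200, -1200, 32767, -32768], 44100, "post-1")
event, samples = unpack_audio_event(line)
```

The samples use the machine's native byte order, so a real pipeline that crosses machines would want to fix the endianness explicitly.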
10.
Image Capture and Ingest
• wget pulls images from the camera’s built-in server 2-3 times a second and saves them on a local server/NAS
• Flume pushes the image data to our EC2 servers
• OpenCV (Python) is used to detect trains in images
[Diagram labels: Raw Image Agent, Local Server]
11.
Text Capture and Ingest: Twitter
• Capturing all the tweets with keyword ‘Caltrain’ via Twitter API
• Flume agent sends tweets to Apache Storm topology for processing
• Tweets are parsed and written to HDFS and HBase
• Event Detection is based on the baseline number of tweets per hour and keywords
[Diagram labels: Raw Image Agent, Twitter APIs]
12.
Caltrain API Data Capturing
• Real-time departure times available via 511.org developer APIs
• Python script collects data once a minute from 511.org APIs and stores it in HDFS as sequence files using the WebHDFS API
• Python script collects data from the Caltrain site that includes run #
• Didn’t function from April 5th until June 2nd 2014
[Diagram labels: scraper.py, 511.org APIs, Caltrain Webpage, data_collector_api.py]
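A minimal sketch of the storage side of such a collector, assuming a per-minute file layout. The host name, user, and HDFS path scheme are hypothetical; the `op=CREATE` and `user.name` query parameters and the `/webhdfs/v1` prefix come from the WebHDFS REST API. The sketch only builds the request URL; it does not call 511.org or write sequence files.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def hdfs_path_for(collected_at):
    """One file per minute, partitioned by day (hypothetical layout)."""
    return collected_at.strftime("/data/caltrain/departures/%Y-%m-%d/%H%M")


def webhdfs_create_url(namenode, path, user, port=50070):
    """Build the URL for the first step of a WebHDFS CREATE request."""
    query = urlencode({"op": "CREATE", "user.name": user, "overwrite": "false"})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


collected_at = datetime(2014, 6, 3, 8, 15, tzinfo=timezone.utc)
url = webhdfs_create_url(
    "namenode.example.com", hdfs_path_for(collected_at), "collector"
)
```

WebHDFS file creation is a two-step exchange: an HTTP PUT to this URL returns a redirect to a datanode, and a second PUT to that location carries the file contents.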
13.
Combining the Signals
Audio Signal Detection + Image Recognition + Text Analysis -> STATE of complex system
14.
[Architecture diagram labels]
External Data Sources: Twitter API, Microphone on Raspberry Pi, Web Camera, Caltrain API
Agents: Twitter Agent, Sound Agent, Image Agent, Caltrain Agent
Spouts: Twitter Spout, Sound Spout, Image Spout, Caltrain Spout
Processing: Tweet Parser, Tweets Counter, Sounds Classifier, Train Detector, Schedule Integrator, Event Detector, HDFS Writer, HBase Writer
Data Platform: Event Storage, MapReduce, Analytics Dev
Outputs: Alerts, Transmit to APP
15.
Data Science: Audio
Batch:
• Apply FFT to audio data to identify the train based on the train whistle’s fundamental frequencies
• Decision tree trained to classify trains into local or express based on minimum and maximum fundamental frequencies (Doppler effect)
Real-Time:
• Apply FFT to audio signal
• Extract min and max fundamental frequencies
• Execute local / express classifier
• Send data to the Event Detector for APP alerts
• Store results in HBase
[Chart: Histogram of Whistle Frequencies Over a Period of Time; x-axis: Frequency, Hz; y-axis: Frequency Counts]
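The audio steps above can be sketched with a small pure-Python FFT. This is an illustration under stated assumptions, not the SVDS model: the strongest spectral bin stands in for the fundamental frequency, and `classify` is a one-rule stand-in for the trained decision tree with a made-up threshold. The intuition it encodes is the Doppler effect: for a source moving at speed v, the ratio of approaching to receding pitch is (c + v) / (c - v), so a faster train shows a wider spread between maximum and minimum frequency.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC bin in the lower half-spectrum."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate_hz / len(samples)


def classify(min_hz, max_hz, ratio_threshold=1.08):
    """One-rule stand-in for the decision tree (threshold is hypothetical)."""
    return "express" if max_hz / min_hz > ratio_threshold else "local"


# Synthetic 440 Hz test tone, 1024 samples at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1024)]
peak = dominant_frequency(tone, rate)
```

In a real recording the strongest bin is not always the fundamental, because horns carry strong harmonics, so a production detector would need a more careful pitch estimate.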
16.
Data Science: Image
Real-Time:
• ORB algorithm (OpenCV) is used to detect the train in the image
• Sends results to the Event Detector to identify the train and compare to schedule
• Event Detector updates APP with the train’s status, alerts if late
[Chart: Number of Key Points That Are the Same in Two Consecutive Images; x-axis: Time (sec); y-axis: Number of Matching Points]
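The chart's quantity, the number of key points shared by two consecutive images, can be illustrated without OpenCV. ORB produces binary descriptors that are compared by Hamming distance; the sketch below assumes the descriptors are already available as Python integers (toy 8-bit values here, where real ORB descriptors are 256 bits) and counts cross-checked matches. The distance threshold is a made-up value.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors held as ints."""
    return bin(a ^ b).count("1")


def count_matches(prev_desc, curr_desc, max_distance=2):
    """Count cross-checked nearest-neighbour matches between two frames."""
    if not prev_desc or not curr_desc:
        return 0

    def nearest(d, pool):
        return min(range(len(pool)), key=lambda i: hamming(d, pool[i]))

    matches = 0
    for i, d in enumerate(prev_desc):
        j = nearest(d, curr_desc)
        # Keep the pair only if each is the other's best match and it is close.
        if nearest(curr_desc[j], prev_desc) == i and hamming(d, curr_desc[j]) <= max_distance:
            matches += 1
    return matches


# Toy descriptors for two consecutive frames (hypothetical values).
frame_a = [0b10110010, 0b01001101, 0b11110000]
frame_b = [0b10110011, 0b01001101, 0b00001111]
same_scene = count_matches(frame_a, frame_b)
```

Tracking this count over time gives the series plotted on the slide; a sharp change in the count between frames is the signal a train detector could react to.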
17.
Data Science: Text
Batch:
• Update baseline tweet frequencies for each hour as additional historical data is collected
• Store model parameters in HBase
Real-Time:
• Count tweets as they stream through the topology
• Alert based on frequency deviations from the baseline
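A minimal sketch of the baseline-and-deviation idea, assuming the baseline is a per-hour mean and standard deviation of historical tweet counts. The alert rule (three standard deviations, with a minimum excess of five tweets) is a hypothetical choice for illustration, and keyword matching is left out.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(history):
    """history: iterable of (hour_of_day, tweet_count) pairs from past days."""
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    return {h: (mean(c), pstdev(c)) for h, c in by_hour.items()}


def is_event(baseline, hour, count, num_sigmas=3.0, min_excess=5):
    """Alert when the count is well above the usual level for this hour."""
    avg, sd = baseline[hour]
    return count > avg + max(num_sigmas * sd, min_excess)


# Hypothetical history: three mornings at 08:00 and two nights at 23:00.
history = [(8, 10), (8, 12), (8, 14), (23, 1), (23, 3)]
baseline = hourly_baseline(history)
```

Recomputing `hourly_baseline` as new history arrives corresponds to the batch step, while `is_event` is the check a streaming counter would run on each hour's running total.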
18.
Baseline Calculation
[Chart: Baseline]
19.
Future Work
• Detect direction of train in image processing
• Use natural language processing on Twitter data for the Event Detector
• Continue evaluation of analytical frameworks for model computation
• Add observation posts
• Release Caltrain Rider Application
20.
COMING SOON: CALTRAIN RIDER APP
• Find out what train to catch using our ‘Ride Now’ view
• Select a train and see when it should reach each stop in a trip detail view
• For more info: www.svds.com/trains
21.
Questions
Yes, We’re Hiring: www.svds.com/join-us
22.
THANK YOU
John @BigDataAnalysis
Tatsiana @notrockstar
Editor's Notes
When a train is detected, the information is sent to HBase and to the Event Detector. The camera has a network connection, so we can drop images via wget to the local server. Label wget.
Add API setup