SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Think ahead. Act now.
Think ahead. Act now.
Managing and querying
datasets with Data Factory,
Cosmos DB and Azure
Functions.
Marc Duiker
Think ahead. Act now.
The Case
A murder has happened in New York City on 29th Jan 2014.
The suspect most likely escaped by using a taxi.
It’s our job to find out which taxi the suspect could have used.
Think ahead. Act now.
Original data sources
NYPD Complaint Data
2014
NYC Taxi Trip Data
2014
https://data.cityofnewyork.us/Public-Safety/NYPD-
Complaint-Data-Historic/qgea-i56i
https://www.kaggle.com/kentonnlp/2014-new-york-
city-taxi-trips
5.6M rows
1.3 GB
15M rows
2.3 GB
Think ahead. Act now.
NYPD Complaint Data (trimmed down)
CSV with 39k records for Jan 2014
Attributes
• Date & time
• Offense classification (KY_CD)
• Latitude & longitude
• …
More details in NYPD_Complaint_Data_Column_Descriptions.csv
Think ahead. Act now.
Think ahead. Act now.
NYC Taxi Trip Data (trimmed down)
CSV with 477k records for 29th Jan 2014
Attributes
• Pickup date & time
• Pickup latitude & longitude
• …
Think ahead. Act now.
Think ahead. Act now.
What can we use on Azure?
Query
Azure FunctionsData FactoryCosmos DB
Storage TransferCsv
Think ahead. Act now.
Cosmos DB
Think ahead. Act now.
Cosmos DB: databases & collections
Account
(gabc-nyc-db)
Database
(nycdatabase)
Collection
(complaints)
Collection
(taxitrips)
Documents
{ .. }
Documents
{ .. }
SQL API
Think ahead. Act now.
Cosmos DB: GeoJSON
https://docs.microsoft.com/en-us/azure/cosmos-db/geospatial
Azure Cosmos DB supports indexing and querying of
geospatial point data that's represented using
the GeoJSON specification.
{
"type":"Point",
"coordinates":[-73.88, 40.76]
}
Think ahead. Act now.
Cosmos DB: Change feed
• Cosmos DB persists events about insertion and updates to
documents in the change feed.
Think ahead. Act now.
Cosmos DB
Think ahead. Act now.
Azure Data Factory (v2)
Source Sink
• Mapping
• Transforms
• Scheduling
• Throughput
Think ahead. Act now.
Data Factory: source schema
• When importing a csv DataFactory looks at the first line of
data to determine the data types.
• Inspect the data for empty and numeric values
• “”  empty value for String, Int64 or Double?
• 0  Int64 or Double?
• Sometimes numbers are categories (String).
Think ahead. Act now.
Azure Data Factory
https://datafactoryv2.azure.com/
Think ahead. Act now.
Azure Functions (Runtime 2)
• Serverless compute service to run code
on demand
• Support for various languages: C#, F#, Node.js, Java, or PHP
• Automatic scaling
• Pay-per-use
Think ahead. Act now.
Azure Functions Triggers
Think ahead. Act now.
Azure Functions
Think ahead. Act now.
Integrating the services
UpdateTaxiTripGeoData
NYPD Complaint
CSV
NYC Taxi Trip
CSV
Data Factory
Pipelines
Cosmos DB
Collections
GetComplaintsByOffenseId
GetTaxiTripsWithinRange
Lab 1
Lab 2
Think ahead. Act now.
Hands-on labs
https://github.com/XpiritBV/GABC2018_HandsOnLabs/

Mais conteúdo relacionado

Mais procurados

Open Source Databases And Gis
Open Source Databases And GisOpen Source Databases And Gis
Open Source Databases And GisKudos S.A.S
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...Gianfranco Palumbo
 
AWS July Webinar Series - Getting Started with Amazon DynamoDB
AWS July Webinar Series - Getting Started with Amazon DynamoDBAWS July Webinar Series - Getting Started with Amazon DynamoDB
AWS July Webinar Series - Getting Started with Amazon DynamoDBAmazon Web Services
 
R statistics with mongo db
R statistics with mongo dbR statistics with mongo db
R statistics with mongo dbMongoDB
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Stefan Urbanek
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifyNeville Li
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevAltinity Ltd
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBNosh Petigara
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkSpark Summit
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoSupercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoAltinity Ltd
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
Improving the usability of the Information system of land cover in Spain (SIOSE)
Improving the usability of the Information system of land cover in Spain (SIOSE)Improving the usability of the Information system of land cover in Spain (SIOSE)
Improving the usability of the Information system of land cover in Spain (SIOSE)Benito Zaragozí
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 

Mais procurados (18)

Open Source Databases And Gis
Open Source Databases And GisOpen Source Databases And Gis
Open Source Databases And Gis
 
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A...
 
AWS July Webinar Series - Getting Started with Amazon DynamoDB
AWS July Webinar Series - Getting Started with Amazon DynamoDBAWS July Webinar Series - Getting Started with Amazon DynamoDB
AWS July Webinar Series - Getting Started with Amazon DynamoDB
 
R statistics with mongo db
R statistics with mongo dbR statistics with mongo db
R statistics with mongo db
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Scio
ScioScio
Scio
 
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' MeetupMongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
MongoDB at the Silicon Valley iPhone and iPad Developers' Meetup
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Locality Sensitive Hashing By Spark
Locality Sensitive Hashing By SparkLocality Sensitive Hashing By Spark
Locality Sensitive Hashing By Spark
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoSupercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Improving the usability of the Information system of land cover in Spain (SIOSE)
Improving the usability of the Information system of land cover in Spain (SIOSE)Improving the usability of the Information system of land cover in Spain (SIOSE)
Improving the usability of the Information system of land cover in Spain (SIOSE)
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 

Semelhante a Managing and querying large data sets using Data Factory, Cosmos DB and Azure Functions

Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADtab0ris_1
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax Academy
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesSwami Sundaramurthy
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Alex Pinto
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To ProductionJen Aman
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
 
Document Data Modelling with Couchbase Server 4.0
Document Data Modelling with Couchbase Server 4.0Document Data Modelling with Couchbase Server 4.0
Document Data Modelling with Couchbase Server 4.0Cihan Biyikoglu
 
Presentation sql server to oracle a database migration roadmap
Presentation    sql server to oracle a database migration roadmapPresentation    sql server to oracle a database migration roadmap
Presentation sql server to oracle a database migration roadmapxKinAnx
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Intro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage ServiceIntro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage ServiceRod Boothby
 
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardware
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardwareSomeSQL at Skyscanner - Scaling in a changing world of databases and hardware
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardwarealistair_hann
 
Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...jaxLondonConference
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in MotionRuhani Arora
 

Semelhante a Managing and querying large data sets using Data Factory, Cosmos DB and Azure Functions (20)

Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
Big Data Seervices in Danaos Use Case
Big Data Seervices in Danaos Use CaseBig Data Seervices in Danaos Use Case
Big Data Seervices in Danaos Use Case
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To Production
 
How do you decide where your customer was?
How do you decide where your customer was?How do you decide where your customer was?
How do you decide where your customer was?
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
 
Document Data Modelling with Couchbase Server 4.0
Document Data Modelling with Couchbase Server 4.0Document Data Modelling with Couchbase Server 4.0
Document Data Modelling with Couchbase Server 4.0
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
Presentation sql server to oracle a database migration roadmap
Presentation    sql server to oracle a database migration roadmapPresentation    sql server to oracle a database migration roadmap
Presentation sql server to oracle a database migration roadmap
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Intro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage ServiceIntro to Joyent's Manta Object Storage Service
Intro to Joyent's Manta Object Storage Service
 
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardware
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardwareSomeSQL at Skyscanner - Scaling in a changing world of databases and hardware
SomeSQL at Skyscanner - Scaling in a changing world of databases and hardware
 
Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...Large scale, interactive ad-hoc queries over different datastores with Apache...
Large scale, interactive ad-hoc queries over different datastores with Apache...
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 

Mais de Marc Duiker

Take your Azure Functions to the next level with Durable Functions - Serverle...
Take your Azure Functions to the next level with Durable Functions - Serverle...Take your Azure Functions to the next level with Durable Functions - Serverle...
Take your Azure Functions to the next level with Durable Functions - Serverle...Marc Duiker
 
Take your Azure Functions to the next level with Durable Functions - WAZUG
Take your Azure Functions to the next level with Durable Functions - WAZUGTake your Azure Functions to the next level with Durable Functions - WAZUG
Take your Azure Functions to the next level with Durable Functions - WAZUGMarc Duiker
 
Put Your Web App on a Diet with Azure Functions
Put Your Web App on a Diet with Azure FunctionsPut Your Web App on a Diet with Azure Functions
Put Your Web App on a Diet with Azure FunctionsMarc Duiker
 
Take your azure functions to the next level with durable functions @ Experts ...
Take your azure functions to the next level with durable functions @ Experts ...Take your azure functions to the next level with durable functions @ Experts ...
Take your azure functions to the next level with durable functions @ Experts ...Marc Duiker
 
Orchestrate your Azure Functions with Durable Functions - AzureThursday Meetup
Orchestrate your Azure Functions with Durable Functions - AzureThursday MeetupOrchestrate your Azure Functions with Durable Functions - AzureThursday Meetup
Orchestrate your Azure Functions with Durable Functions - AzureThursday MeetupMarc Duiker
 
Improving your vision with Azure Cognitive Services - /dev/070
Improving your vision with Azure Cognitive Services - /dev/070Improving your vision with Azure Cognitive Services - /dev/070
Improving your vision with Azure Cognitive Services - /dev/070Marc Duiker
 
Getting Started with Serverless Architectures using Azure Functions
Getting Started with Serverless Architectures using Azure FunctionsGetting Started with Serverless Architectures using Azure Functions
Getting Started with Serverless Architectures using Azure FunctionsMarc Duiker
 
Improving your vision with Azure Cognitive Services - MixUG
Improving your vision with Azure Cognitive Services - MixUGImproving your vision with Azure Cognitive Services - MixUG
Improving your vision with Azure Cognitive Services - MixUGMarc Duiker
 

Mais de Marc Duiker (8)

Take your Azure Functions to the next level with Durable Functions - Serverle...
Take your Azure Functions to the next level with Durable Functions - Serverle...Take your Azure Functions to the next level with Durable Functions - Serverle...
Take your Azure Functions to the next level with Durable Functions - Serverle...
 
Take your Azure Functions to the next level with Durable Functions - WAZUG
Take your Azure Functions to the next level with Durable Functions - WAZUGTake your Azure Functions to the next level with Durable Functions - WAZUG
Take your Azure Functions to the next level with Durable Functions - WAZUG
 
Put Your Web App on a Diet with Azure Functions
Put Your Web App on a Diet with Azure FunctionsPut Your Web App on a Diet with Azure Functions
Put Your Web App on a Diet with Azure Functions
 
Take your azure functions to the next level with durable functions @ Experts ...
Take your azure functions to the next level with durable functions @ Experts ...Take your azure functions to the next level with durable functions @ Experts ...
Take your azure functions to the next level with durable functions @ Experts ...
 
Orchestrate your Azure Functions with Durable Functions - AzureThursday Meetup
Orchestrate your Azure Functions with Durable Functions - AzureThursday MeetupOrchestrate your Azure Functions with Durable Functions - AzureThursday Meetup
Orchestrate your Azure Functions with Durable Functions - AzureThursday Meetup
 
Improving your vision with Azure Cognitive Services - /dev/070
Improving your vision with Azure Cognitive Services - /dev/070Improving your vision with Azure Cognitive Services - /dev/070
Improving your vision with Azure Cognitive Services - /dev/070
 
Getting Started with Serverless Architectures using Azure Functions
Getting Started with Serverless Architectures using Azure FunctionsGetting Started with Serverless Architectures using Azure Functions
Getting Started with Serverless Architectures using Azure Functions
 
Improving your vision with Azure Cognitive Services - MixUG
Improving your vision with Azure Cognitive Services - MixUGImproving your vision with Azure Cognitive Services - MixUG
Improving your vision with Azure Cognitive Services - MixUG
 

Último

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Managing and querying large data sets using Data Factory, Cosmos DB and Azure Functions

  • 1. Think ahead. Act now. Think ahead. Act now. Managing and querying datasets with Data Factory, Cosmos DB and Azure Functions. Marc Duiker
  • 2. Think ahead. Act now. The Case A murder has happened in New York City on 29th Jan 2014. The suspect most likely escaped by using a taxi. It’s our job to find out which taxi the suspect could have used.
  • 3. Think ahead. Act now. Original data sources NYPD Complaint Data 2014 NYC Taxi Trip Data 2014 https://data.cityofnewyork.us/Public-Safety/NYPD- Complaint-Data-Historic/qgea-i56i https://www.kaggle.com/kentonnlp/2014-new-york- city-taxi-trips 5.6M rows 1.3 GB 15M rows 2.3 GB
  • 4. Think ahead. Act now. NYPD Complaint Data (trimmed down) CSV with 39k records for Jan 2014 Attributes • Date & time • Offense classification (KY_CD) • Latitude & longitude • … More details in NYPD_Complaint_Data_Column_Descriptions.csv
  • 6. Think ahead. Act now. NYC Taxi Trip Data (trimmed down) CSV with 477k records for 29th Jan 2014 Attributes • Pickup date & time • Pickup latitude & longitude • …
  • 8. Think ahead. Act now. What can we use on Azure? Query Azure FunctionsData FactoryCosmos DB Storage TransferCsv
  • 9. Think ahead. Act now. Cosmos DB
  • 10. Think ahead. Act now. Cosmos DB: databases & collections Account (gabc-nyc-db) Database (nycdatabase) Collection (complaints) Collection (taxitrips) Documents { .. } Documents { .. } SQL API
  • 11. Think ahead. Act now. Cosmos DB: GeoJSON https://docs.microsoft.com/en-us/azure/cosmos-db/geospatial Azure Cosmos DB supports indexing and querying of geospatial point data that's represented using the GeoJSON specification. { "type":"Point", "coordinates":[-73.88, 40.76] }
  • 12. Think ahead. Act now. Cosmos DB: Change feed • Cosmos DB persists events about insertion and updates to documents in the change feed.
  • 13. Think ahead. Act now. Cosmos DB
  • 14. Think ahead. Act now. Azure Data Factory (v2) Source Sink • Mapping • Transforms • Scheduling • Throughput
  • 15. Think ahead. Act now. Data Factory: source schema • When importing a csv DataFactory looks at the first line of data to determine the data types. • Inspect the data for empty and numeric values • “”  empty value for String, Int64 or Double? • 0  Int64 or Double? • Sometimes numbers are categories (String).
  • 16. Think ahead. Act now. Azure Data Factory https://datafactoryv2.azure.com/
  • 17. Think ahead. Act now. Azure Functions (Runtime 2) • Serverless compute service to run code on demand • Support for various languages: C#, F#, Node.js, Java, or PHP • Automatic scaling • Pay-per-use
  • 18. Think ahead. Act now. Azure Functions Triggers
  • 19. Think ahead. Act now. Azure Functions
  • 20. Think ahead. Act now. Integrating the services UpdateTaxiTripGeoData NYPD Complaint CSV NYC Taxi Trip CSV Data Factory Pipelines Cosmos DB Collections GetComplaintsByOffenseId GetTaxiTripsWithinRange Lab 1 Lab 2
  • 21. Think ahead. Act now. Hands-on labs https://github.com/XpiritBV/GABC2018_HandsOnLabs/