Dharmesh Vaya
@DRVaya
http://drvaya.wordpress.com/
Agenda
● What is Big Data ?
● Available Big Data Solutions & Issues
● Why Google BigQuery ?
● Inside BigQuery
● Features & Components
● RESTful API
● Development with BigQuery (Live Demo)
○ Query History, Projects, DataSets, Public Datasets, Table Details, Writing Queries, Save Results.
○ Integration with Applications.
● BigQuery Tools
● Big Data Solution with BigQuery & Google Cloud Platform
● Pricing Model
● Any questions ?
What is Big Data ?
Is it a Data Type ?
No
It's a buzzword - a massive volume of
structured and/or unstructured data.
It is so large that it is difficult to
process/analyze using traditional databases.
Data that has the following attributes can be ‘Big Data’
So how Big is B - I - G ?
Library of Congress - Textual Data
20 Terabytes
(20 000 000 000 000 bytes)
Amazon.com - Inventory & Customer Data
42 Terabytes
(42 000 000 000 000 bytes)
YouTube.com - Media Data
100+ Terabytes
(100 000 000 000 000 bytes)
Google.com - Search, Mail, Media & anything you can think of !!
850+ Terabytes
(850 000 000 000 000 bytes)
(Speculated Figures)
World Data Center for Climate - Meteorology Data
6.2 Petabytes
(6 200 000 000 000 000 bytes)
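The byte counts on these slides follow from decimal (SI) prefixes, where 1 Terabyte is 10^12 bytes and 1 Petabyte is 10^15 bytes. A minimal sketch of that arithmetic, assuming decimal rather than binary units; the `to_bytes` helper is hypothetical and only illustrates the conversion:

```python
from decimal import Decimal

# Decimal (SI) unit sizes in bytes; binary units (TiB, PiB) would differ.
UNITS = {"TB": 10**12, "PB": 10**15}

def to_bytes(value, unit):
    """Convert a size such as 6.2 PB into a whole number of bytes."""
    # Decimal avoids floating-point rounding on values like 6.2.
    return int(Decimal(str(value)) * UNITS[unit])

for value, unit in [(20, "TB"), (42, "TB"), (6.2, "PB")]:
    print(f"{value} {unit} = {to_bytes(value, unit):,} bytes")
```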
Available Big Data Solutions & Issues
- Highly Scalable and Distributed Computing.
- Storage (HDFS) optimized for high throughput
- Security, disabled by default
- MapReduce is batch-based, hence no real-time operations.
- Costly to maintain.
- Highly scalable, claims to handle petabytes
- Elastic set of resources to return result sets
- Almost 10x faster than Hadoop.
- High costs of Data Migration and integration
- Operations/Maintenance cost may shoot up
Why Google BigQuery ?
[Figure: comparison of Hadoop (with Hive), Amazon Redshift and Google BigQuery; legend: = 1.4 TB]
On average, it's within 8-10 seconds !!
Inside Google BigQuery
● BigQuery is based on Dremel, a technology pioneered by Google and used extensively within the company.
● It uses columnar storage & multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets.
● BigQuery's performance advantage comes from its parallel processing architecture.
● The query is processed by thousands of servers in a multi-level execution tree structure, with the final results aggregated at the root. BigQuery stores the data in a columnar format so that only data from the columns being queried is read.
● All this & more is now offered as a public service for any business or developer to use. This release made it possible for those outside of Google to utilize the power of Dremel for their Big Data processing requirements.
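The benefit of a columnar layout can be illustrated with a toy sketch. This is not Dremel's or BigQuery's actual storage format; the table, the column names and both counting helpers are hypothetical, chosen only to show that a query touching one column need not read the others.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and values are hypothetical; this is not Dremel's real format.

rows = [
    {"word": "big", "corpus": "a", "count": 3},
    {"word": "data", "corpus": "a", "count": 5},
    {"word": "query", "corpus": "b", "count": 2},
]

# Columnar layout: one list per column instead of one record per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def values_read_row_store(table, wanted):
    """A row store touches every field of every row, whatever is wanted."""
    return sum(len(row) for row in table)

def values_read_column_store(table, wanted):
    """A column store touches only the columns named in the query."""
    return sum(len(table[name]) for name in wanted)

total = sum(columns["count"])  # roughly: SELECT SUM(count) FROM table
print(total,
      values_read_row_store(rows, ["count"]),
      values_read_column_store(columns, ["count"]))
```

With three columns, the row layout reads three times as many values as the columnar one for this single-column query; the gap grows with the number of columns in the table.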
Columnar Storage & Trees
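A multi-level execution tree can be sketched as a recursive aggregation: leaves compute partial results over their own shard of data, and each level above combines its children until the root holds the final answer. The nested-list tree and the `aggregate` function below are hypothetical simplifications that run in a single process, not a description of how BigQuery schedules work across servers.

```python
def leaf_partial_sum(shard):
    """Work done by a leaf server: aggregate its own shard of values."""
    return sum(shard)

def aggregate(node):
    """Combine results bottom-up.

    A node is either a leaf shard (a list of numbers) or an
    intermediate node (a list of child nodes).
    """
    if node and isinstance(node[0], list):
        # Intermediate node: merge the partial results of its children.
        return sum(aggregate(child) for child in node)
    return leaf_partial_sum(node)

# Root -> two intermediate nodes -> four leaf shards.
tree = [
    [[1, 2], [3, 4]],
    [[5], [6, 7, 8]],
]
result = aggregate(tree)
print(result)
```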
"Hey, you mentioned Dremel. Isn't MapReduce based on it?"
There's a difference:
● Dremel is designed as an interactive data analysis tool for large datasets.
● MapReduce is designed as a programming framework to batch process large datasets.
Features & Components
Features:
● Web GUI for BigQuery
● Affordable
● Run in Background
● Easy Data Import
● Flexible (Addition of Columns, Native Support for Timestamp Type of Data)
● REST API Support
● More than just Standard SQL
Components:
● Project
● Tables
● DataSets
● Jobs
RESTful API
For Datasets
Method    HTTP Request
delete    DELETE /projects/projectId/datasets/datasetId
get       GET /projects/projectId/datasets/datasetId
insert    POST /projects/projectId/datasets
list      GET /projects/projectId/datasets
patch     PATCH /projects/projectId/datasets/datasetId
update    PUT /projects/projectId/datasets/datasetId
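The dataset methods listed above map directly onto HTTP requests. A minimal sketch that builds (but does not send) such a request with the standard library; the base URL `https://www.googleapis.com/bigquery/v2` is assumed from the upload URL shown on the Jobs slide, the project and dataset names are hypothetical, and authentication is left out entirely.

```python
import urllib.request

# Assumed API root; the slides only show paths relative to it.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# (HTTP verb, path template) for each dataset method listed on the slide.
DATASET_METHODS = {
    "delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/projects/{projectId}/datasets"),
    "list": ("GET", "/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/projects/{projectId}/datasets/{datasetId}"),
}

def build_dataset_request(method, project_id, dataset_id=""):
    """Return an unsent urllib Request for the given dataset method."""
    verb, template = DATASET_METHODS[method]
    url = BASE_URL + template.format(projectId=project_id, datasetId=dataset_id)
    return urllib.request.Request(url, method=verb)

# Hypothetical project and dataset identifiers.
req = build_dataset_request("get", "my-project", "my_dataset")
print(req.get_method(), req.full_url)
```

A real call would also need an authorization header, which is omitted here.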
For Jobs
Method             HTTP Request
get                GET /projects/projectId/jobs/jobId
getQueryResults    GET /projects/projectId/queries/jobId
insert             POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs
                   and
                   POST /projects/projectId/jobs
list               GET /projects/projectId/jobs
query              POST /projects/projectId/queries
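For the `query` and `getQueryResults` methods, a client posts a JSON body to the queries path and later fetches results by job ID. The sketch below only assembles the paths and body; the SQL text, project ID, job ID and default values are hypothetical, and `timeoutMs` and `maxResults` are optional request fields included for illustration.

```python
import json

def query_path(project_id):
    """Path for the 'query' method (POST)."""
    return f"/projects/{project_id}/queries"

def results_path(project_id, job_id):
    """Path for the 'getQueryResults' method (GET)."""
    return f"/projects/{project_id}/queries/{job_id}"

def build_query_body(sql, timeout_ms=10000, max_results=100):
    """Serialize a query request body as JSON text."""
    return json.dumps({
        "query": sql,
        "timeoutMs": timeout_ms,    # how long to wait for the query to finish
        "maxResults": max_results,  # cap on rows returned in this response
    })

# Hypothetical identifiers and query text.
body = build_query_body("SELECT word FROM [publicdata:samples.shakespeare] LIMIT 10")
print("POST", query_path("my-project"))
print(body)
print("GET", results_path("my-project", "job_123"))
```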
Similar methods for -
● Projects
● Tables
● TableData
Demo using Web Interface
Demo : Excel Connector
BigQuery Tools
● BigQuery Excel Connector
● bq Command Line
● BigQuery Browser Tool
● Visualization & BI Tools
● ETL Tools
● ODBC Connector
Big Data Solution with BigQuery
Data Pipeline - transforming and loading data into BigQuery
Uploading data into BigQuery with the Google Cloud Platform involves first uploading the CSV or JavaScript Object Notation (JSON) files to Google Cloud Storage and then loading the data into BigQuery. Alternatively, the REST API can be used for programmatic integration with the current computing environment.
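The second step of that pipeline, loading a file that already sits in Google Cloud Storage, is expressed as a load job posted to the jobs endpoint. A minimal sketch of such a job description, assuming hypothetical project, dataset, table and bucket names; a real load usually also needs a schema for a new table, which is omitted here.

```python
import json

def build_load_job(project_id, dataset_id, table_id, gcs_uri, source_format="CSV"):
    """Describe a load job that reads one Cloud Storage file into a table."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [gcs_uri],        # file(s) already uploaded to Cloud Storage
                "sourceFormat": source_format,  # "CSV" or "NEWLINE_DELIMITED_JSON"
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }

# Hypothetical names throughout.
job = build_load_job("my-project", "my_dataset", "my_table", "gs://my-bucket/data.csv")
print(json.dumps(job, indent=2))
```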
Data Visualization - performing data analysis on BigQuery and visualizing the results
A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST API to execute the queries and Google Chart Tools to visualize the results.
Pricing Model
Action              Example
Loading Data        Loading files/data into BigQuery
Exporting Data      Exporting data, saving results from BigQuery
Table Reads         Browsing through data
Table Copies        Copy existing table to new table

Storage Action      Cost
Storage             $0.020 per GB, per month
Streaming Inserts   Free until January 1, 2015; after January 1, 2015, $0.01 per 100,000 rows

Query Pricing       Cost
On-demand           $5 per TB
Reserved Capacity   5 GB per second, $20k/month
Wow that’s like 800MB for 1 Rupee,
even Internet ain’t that cheap here.
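The prices on this slide can be turned into a rough monthly estimate. The sketch below uses only the figures quoted above (storage, on-demand queries, streaming inserts after January 1, 2015); the `monthly_cost` function and the example workload are hypothetical, and actual billing rules may differ.

```python
# Prices as quoted on the slide (USD).
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
STREAMING_USD_PER_100K_ROWS = 0.01

def monthly_cost(stored_gb, queried_tb, streamed_rows=0):
    """Estimate one month's bill from storage, query and streaming volume."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    queries = queried_tb * QUERY_USD_PER_TB
    streaming = streamed_rows / 100_000 * STREAMING_USD_PER_100K_ROWS
    return {
        "storage": round(storage, 2),
        "queries": round(queries, 2),
        "streaming": round(streaming, 2),
        "total": round(storage + queries + streaming, 2),
    }

# Hypothetical workload: 500 GB stored, 2 TB queried, 1 million rows streamed.
estimate = monthly_cost(500, 2, 1_000_000)
print(estimate)
```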
Where to use ?
● Not a replacement for traditional systems, but it complements the ecosystem !!
● Major strength is Handling Large DataSets
● Major usage in Data Analytics
● Important component of Google Cloud Platform
● People are interested in numbers/data, and they want them quickly…
Google BigQuery is the future of Analytics!!
Any questions ?
What we covered ...
✓ What is Big Data ?
✓ Available Big Data Solutions & Issues
✓ Why Google BigQuery ?
✓ Features, Components & Tools
✓ RESTful API
✓ Demo using Web Interface
✓ BigQuery Tools
✓ Big Data Solution with BigQuery
✓ Pricing Model
✓ Usage
https://bigquery.cloud.google.com
No registration, just sign-in with your Google account
About the presenter
Follow Dharmesh Vaya on @DRVaya
or subscribe to my blog at http://drvaya.wordpress.com/
You can also add me on +DharmeshVaya
https://cloud.google.com/developers/articles/getting-started-with-google-bigquery
https://cloud.google.com/files/Redbus.pdf
http://www.reddit.com/r/bigquery/comments/28ialf/173_million_2013_nyc_taxi_rides_shared_on_bigquery/
http://www.datawrangling.com/some-datasets-available-on-the-web/
http://bigqueri.es/
https://developers.google.com/bigquery/pricing#data

Exploring BigData with Google BigQuery

  • 2. Agenda ● What is Big Data ? ● Available Big Data Solutions & Issues ● Why Google BigQuery ? ● Inside BigQuery ● Features & Components ● RESTful API ● Development with BigQuery (Live Demo) ○ Query History, Projects, DataSets, Public Datasets, Table Details, Writing Queries, Save Results. ○ Integration with Applications. ● BigQuery Tools ● Big Data Solution with BigQuery & Google Cloud Platform ● Pricing Model ● Any questions ?
  • 3. What is Big Data ? Is it a Data Type ? No. It's a buzzword: a massive volume of structured and/or unstructured data, so large that it is difficult to process or analyze using traditional databases.
  • 4. What is Big Data ? Data that has following attributes can be ‘Big Data’
  • 5. So how Big is B - I - G ?
  • 6. So how Big is B - I - G ? Library of Congress - Textual Data 20 Terabytes (20 000 000 000 000 bytes)
  • 7. So how Big is B - I - G ? Amazon.com - Inventory &Customer Data 42 Terabytes (42 000 000 000 000 bytes)
  • 8. So how Big is B - I - G ? YouTube.com - Media Data 100+ Terabytes (100 000 000 000 000 bytes)
  • 9. So how Big is B - I - G ? Google.com - Search, Mail, Media & anything you can think of !! 850+ Terabytes (850 000 000 000 000 bytes) (Speculated Figures)
  • 10. So how Big is B - I - G ? World Data Center for Climate - Meteorology Data 6.2 Petabytes (6 200 000 000 000 000 bytes)
  • 11. Available Big Data Solutions & Issues
      - Highly scalable, distributed computing.
      - Storage (HDFS) optimized for high throughput.
      - Security is disabled by default.
      - MapReduce is batch based, hence no real-time operations.
      - Costly to maintain.

      - Highly scalable; claims to handle petabytes.
      - Elastic set of resources to return result sets.
      - Almost 10x faster than Hadoop.
      - High costs of data migration and integration.
      - Operations/maintenance costs may shoot up.
  • 12. Why Google BigQuery ? Hadoop (with Hive) vs. Amazon Redshift vs. Google BigQuery (= 1.4 TB). On average it's within 8-10 seconds !!
  • 13. Inside Google BigQuery ● BigQuery is based on Dremel, a technology pioneered by Google and used extensively within the company. ● It uses columnar storage and multi-level execution trees to achieve interactive performance for queries against multi-terabyte datasets. ● BigQuery's performance advantage comes from its parallel processing architecture. ● A query is processed by thousands of servers in a multi-level execution tree, with the final results aggregated at the root. BigQuery stores data in a columnar format so that only the columns being queried are read. ● All this and more is now available as a public service for any business or developer to use. This release made it possible for those outside Google to use the power of Dremel for their Big Data processing requirements.
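
As a rough illustration of the two ideas on this slide (columnar storage and a multi-level execution tree), here is a toy Python sketch. It is not how Dremel or BigQuery is actually implemented; the table contents, the column names, and the helpers `leaf_sum` and `tree_sum` are all made up for illustration.

```python
# Toy model: a table stored column by column (hypothetical data).
columns = {
    "title": ["a", "b", "c", "d", "e", "f"],
    "views": [10, 20, 30, 40, 50, 60],
    "bytes": [100, 200, 300, 400, 500, 600],
}


def leaf_sum(values):
    """A leaf server scans only its shard of the one queried column."""
    return sum(values)


def tree_sum(column, fanout=2, shard_size=2):
    """Sum a column through a multi-level tree: leaves, then mixers, then root.

    `fanout` must be at least 2 so that each level shrinks the list of partials.
    """
    # Split the column into shards, one per (imaginary) leaf server.
    shards = [column[i:i + shard_size] for i in range(0, len(column), shard_size)]
    partials = [leaf_sum(shard) for shard in shards]
    # Each intermediate level combines `fanout` partial results at a time.
    while len(partials) > 1:
        partials = [sum(partials[i:i + fanout]) for i in range(0, len(partials), fanout)]
    return partials[0] if partials else 0


# Roughly "SELECT SUM(views)": only the "views" column is touched;
# the "title" and "bytes" columns are never read.
total_views = tree_sum(columns["views"])
print(total_views)
```

The point of the sketch is only the shape of the computation: each leaf reads a slice of a single column, and partial results are merged level by level until one value reaches the root.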
  • 15. Inside Google BigQuery Hey, you mentioned Dremel, isn't MapReduce based on it ? There's a difference: ● Dremel is designed as an interactive data analysis tool for large datasets. ● MapReduce is designed as a programming framework to batch process large datasets.
  • 16. Features & Components Features: ● Web GUI for BigQuery ● Affordable ● Run in Background ● Easy Data Importation ● Flexible (Addition of Columns, Native Support For Timestamp Type Of Data) ● REST API Support ● More than just Standard SQL Components: ● Project ● Tables ● DataSets ● Jobs
  • 17. RESTful API: methods for Datasets
      Method   HTTP Request
      delete   DELETE /projects/projectId/datasets/datasetId
      get      GET /projects/projectId/datasets/datasetId
      insert   POST /projects/projectId/datasets
      list     GET /projects/projectId/datasets
      patch    PATCH /projects/projectId/datasets/datasetId
      update   PUT /projects/projectId/datasets/datasetId
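
The Datasets routes above can be sketched as a small lookup table in Python. This only builds the HTTP verb and URL; it sends nothing. The base URL is an assumption inferred from the upload URL on the Jobs slide, and `DATASET_ROUTES` and `dataset_request` are hypothetical names, not part of any Google client library.

```python
# Assumed API root, inferred from the upload URL shown on the Jobs slide.
BASE_URL = "https://www.googleapis.com/bigquery/v2"

# Method name -> (HTTP verb, path template), copied from the Datasets table.
DATASET_ROUTES = {
    "delete": ("DELETE", "/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/projects/{projectId}/datasets"),
    "list": ("GET", "/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method, **ids):
    """Return (verb, url) for a Datasets method; raises KeyError if an id is missing."""
    verb, template = DATASET_ROUTES[method]
    return verb, BASE_URL + template.format(**ids)


# "my-project" and "logs" are placeholder identifiers.
print(dataset_request("get", projectId="my-project", datasetId="logs"))
print(dataset_request("list", projectId="my-project"))
```

A real call would additionally need an OAuth 2.0 access token, which is left out of this sketch.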
  • 18. RESTful API: methods for Jobs
      Method            HTTP Request
      get               GET /projects/projectId/jobs/jobId
      getQueryResults   GET /projects/projectId/queries/jobId
      insert            POST https://www.googleapis.com/upload/bigquery/v2/projects/projectId/jobs and POST /projects/projectId/jobs
      list              GET /projects/projectId/jobs
      query             POST /projects/projectId/queries
      Similar methods exist for: ● Projects ● Tables ● TableData
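
To make the `query` method more concrete, the sketch below builds (but does not send) the POST request with Python's standard library. It rests on several assumptions: the API root is inferred from the upload URL above, the body fields `query` and `timeoutMs` reflect my understanding of the v2 query request, the project, dataset, and table names are placeholders, and the OAuth 2.0 authorization header a real call needs is omitted. `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://www.googleapis.com/bigquery/v2"  # assumed API root


def build_query_request(project_id, sql, timeout_ms=10000):
    """Build (but do not send) the POST for the `query` method."""
    url = f"{BASE_URL}/projects/{project_id}/queries"
    body = json.dumps({"query": sql, "timeoutMs": timeout_ms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder project and table names; no network traffic happens here.
req = build_query_request("my-project", "SELECT COUNT(*) FROM [my_dataset.my_table]")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would issue the call; if the query does not finish within the timeout, the results would be fetched afterwards with `getQueryResults` using the returned job ID.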
  • 19. Demo using Web Interface
  • 20. Demo : Excel Connector +
  • 21. BigQuery Tools ● BigQuery Excel Connector ● bq Command Line ● BigQuery Browser Tool ● Visualization & BI Tools ● ETL Tools ● ODBC Connector
  • 22. Big Data Solution with BigQuery
  • 23. Big Data Solution with BigQuery Data Pipeline: transforming and loading data into BigQuery. Uploading data into BigQuery through the Google Cloud Platform involves uploading CSV or JavaScript Object Notation (JSON) files to Google Cloud Storage and then loading them into BigQuery. Alternatively, the REST API can be used for programmatic integration with the existing computing environment. Data Visualization: performing data analysis on BigQuery and visualizing the results. A custom, web-based dashboard can be built on Google App Engine, using the BigQuery REST API to execute queries and Google Chart Tools to visualize the results.
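
The second half of the data pipeline (loading a file that already sits in Google Cloud Storage) could look roughly like the job description below, which would be sent to the Jobs `insert` method. This is a sketch under assumptions: the field names follow my understanding of the v2 load-job configuration, while the bucket, project, dataset, and table names and the helper `build_load_job` are invented for illustration.

```python
import json


def build_load_job(project_id, dataset_id, table_id, gcs_uri, source_format="CSV"):
    """Describe a load job: read one file from Cloud Storage into one table."""
    return {
        "configuration": {
            "load": {
                # File previously uploaded to Google Cloud Storage.
                "sourceUris": [gcs_uri],
                # "CSV" or "NEWLINE_DELIMITED_JSON".
                "sourceFormat": source_format,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


# Placeholder names throughout; the dict would be POSTed as the request body.
job = build_load_job("my-project", "web_logs", "events", "gs://my-bucket/events.csv")
print(json.dumps(job, indent=2))
```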
  • 24. Pricing Model
      Action           Example
      Loading Data     Loading files/data into BigQuery
      Exporting Data   Exporting data, saving results from BigQuery
      Table Reads      Browsing through data
      Table Copies     Copy existing table to new table

      Storage Action      Cost
      Storage             $0.020 per GB, per month
      Streaming Inserts   Free until January 1, 2015; after that, $0.01 per 100,000 rows

      Query Pricing                         Cost
      On-demand                             $5 per TB
      Reserved Capacity (5 GB per second)   $20k per month

      Wow, that's like 800 MB for 1 Rupee; even Internet ain't that cheap here.
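
The prices on this slide can be turned into a small calculator. The rates below are the ones quoted on the slide; the exchange rate of 60 rupees per US dollar is my own assumption (the slide does not state one), chosen because it makes the "800 MB for 1 Rupee" remark come out as roughly one month of storage. All function names are hypothetical.

```python
# Rates quoted on the slide.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
STREAMING_USD_PER_100K_ROWS = 0.01

# Assumption, not from the slide: approximate rupees per US dollar.
INR_PER_USD = 60.0


def storage_cost_usd(gigabytes, months=1):
    """Cost of keeping `gigabytes` stored for `months` months."""
    return gigabytes * months * STORAGE_USD_PER_GB_MONTH


def query_cost_usd(terabytes_scanned):
    """On-demand cost of a query that scans `terabytes_scanned` TB."""
    return terabytes_scanned * QUERY_USD_PER_TB


def streaming_cost_usd(rows):
    """Cost of streaming inserts at the post-January-2015 rate."""
    return rows / 100_000 * STREAMING_USD_PER_100K_ROWS


def gb_stored_per_rupee():
    """How many GB one rupee stores for a month, under the assumed exchange rate."""
    return (1.0 / INR_PER_USD) / STORAGE_USD_PER_GB_MONTH


print(f"1 rupee stores about {gb_stored_per_rupee():.2f} GB for a month")
print(f"Scanning 1.4 TB on demand costs about ${query_cost_usd(1.4):.2f}")
```

The 1.4 TB in the last line reuses the figure from the "Why Google BigQuery ?" slide purely as an example input.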
  • 25. Where to use ? ● Not a replacement for traditional systems, but it complements the ecosystem !! ● Major strength is handling large datasets ● Major usage is in data analytics ● Important component of Google Cloud Platform ● People are interested in numbers/data, and they want them quickly. Google BigQuery is the future of Analytics!!
  • 26. Any questions ? What we covered ... ✓ What is Big Data ? ✓ Available Big Data Solutions & Issues ✓ Why Google BigQuery ? ✓ Features, Components & Tools ✓ RESTful API ✓ Demo using Web Interface ✓ BigQuery Tools ✓ Big Data Solution with BigQuery ✓ Pricing Model ✓ Usage
  • 27. About the presenter: Dharmesh Vaya. Try BigQuery at https://bigquery.cloud.google.com (no registration, just sign in with your Google account). Follow Dharmesh Vaya on @DRVaya, subscribe to http://drvaya.wordpress.com/, or add +DharmeshVaya.