BigQuery Basics

Paris 2014

Who? Why?
Ido Green
Solutions Architect
plus.google.com/greenido

greenido.wordpress.com

Topics we cover in this lesson
● BigQuery Overview
● Typical Uses
● Project Hierarchy
● Access Control and Security
● Datasets and Tables
● Tools
● Demos

How does BigQuery fit in the analytics landscape?
● MapReduce based analysis can be slow for ad-hoc queries
● Managing data centers and tuning software takes time & money
● Analytics tools should be services

Why BigQuery?
● Generating big data reports requires expensive servers
and skilled database administrators
● Interacting with big data has been expensive, slow and
inefficient
● BigQuery changes all that
○ It reduces the time and expense of querying data

What's BigQuery?
● Service for interactive analysis of massive datasets (TBs)
○ Query billions of rows: seconds to write, seconds to return
○ Uses a SQL-style query syntax
○ It's a service, accessed by a RESTful API
● Reliable and secure
○ Replicated across multiple sites
○ Secured through Access Control Lists
● Scalable
○ Store hundreds of terabytes
○ Pay only for what you use
● Fast (really)
○ Run ad hoc queries on multi-terabyte data sets in seconds

Analyzing Large Amounts of Data
... at high speed

demobigquery.appspot.com
Uses

Typical Uses
Analyzing query results using a visualization library such as Google
Charts Tools API

Typical Uses
Another way to analyze query results with Google Spreadsheets
○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/

BigQuery Use Cases
● Log Analysis. Making sense of computer generated records
● Retailer. Using data to forecast product sales
● Ads Targeting. Targeting proper customer sections
● Sensor Data. Collect and visualize ambient data
● Data Mashup. Query terabytes of heterogeneous data

Some Customer Case Studies
Uses BigQuery to hone ad targeting
and gain insights into their business
Dashboards using BigQuery to
analyze booking and inventory data

Use BigQuery to provide their
customers ways to expand game
engagement and find new channels for
monetization
Used BigQuery, App Engine and the
Visualization API to build a business
intelligence solution
BigQuery Basic Technical Details

Project Hierarchy
● Project. All data in BigQuery belongs inside a project
○ Set of users, APIs, authentication, billing information
● Dataset. Holds one or more tables
○ Lowest access control unit (to which ACLs are applied)
● Table. Row-column structure that contains actual data
● Job. Used to start potentially long running queries

Datasets and Tables
A table name is represented as follows:
● Current Project
<dataset>.<table name>
● Different Project
<project>:<dataset>.<table>

e.g. publicdata:samples.wikipedia
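
The two naming forms can be told apart mechanically. The following is a minimal sketch, not part of any BigQuery library: `TableRef` and `parse_table_name` are hypothetical names chosen for illustration, and the sketch assumes that the last colon in the name separates the project from the dataset.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project: Optional[str]  # None means "use the current project"
    dataset: str
    table: str


def parse_table_name(name: str) -> TableRef:
    """Split '<project>:<dataset>.<table>' or '<dataset>.<table>' into parts."""
    # Assumption: everything before the last ':' is the project.
    project, _, rest = name.rpartition(":")
    dataset, sep, table = rest.partition(".")
    if not sep or not dataset or not table:
        raise ValueError(f"expected <dataset>.<table>, got {rest!r}")
    return TableRef(project or None, dataset, table)


print(parse_table_name("publicdata:samples.wikipedia"))
print(parse_table_name("samples.wikipedia"))
```

A `project` of `None` stands for the current project, matching the short form on the slide.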

Schema Example
● Schema of a table recording demographics about name occurrences
name:string,gender:string,count:integer
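
To make the compact `column_name:datatype` notation concrete, here is a small sketch that expands the schema string above into a list of field descriptions. `parse_schema` is a hypothetical helper written for this example, not a BigQuery API.

```python
from typing import Dict, List


def parse_schema(spec: str) -> List[Dict[str, str]]:
    """Expand 'name:type,name:type' into a list of {'name', 'type'} dicts."""
    fields = []
    for pair in spec.split(","):
        name, sep, type_ = pair.strip().partition(":")
        if not sep or not name or not type_:
            raise ValueError(f"expected column_name:datatype, got {pair!r}")
        fields.append({"name": name, "type": type_})
    return fields


for field in parse_schema("name:string,gender:string,count:integer"):
    print(field)
```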

Data Types
● String
○ UTF-8 encoded, <64kB
● Integer
○ 64 bit signed
● Float
● Boolean
○ "true" or "false", case insensitive
● Timestamp
○ String format
■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
○ Numeric format (seconds from UNIX epoch)
■ 1234567890, 1.234567890123456E9

(*) Max row size: 64kB
Date type is supported as timestamp
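
The two timestamp notations describe the same instant in different ways. The sketch below converts either one into a UTC `datetime`. It is an illustration only: `parse_timestamp` is a hypothetical helper, and treating a string without an offset as UTC is an assumption made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
_TS = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\.\d+)?(?: ?([+-])(\d{2}):(\d{2}))?$"
)


def parse_timestamp(value: str) -> datetime:
    """Return a UTC datetime for either the string or the numeric format."""
    text = value.strip()
    match = _TS.match(text)
    if match is None:
        # Numeric format: seconds since the UNIX epoch, possibly fractional.
        return datetime.fromtimestamp(float(text), tz=timezone.utc)
    base, frac, sign, hours, minutes = match.groups()
    # Assumption: a string without an explicit offset is already in UTC.
    moment = datetime.strptime(base, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if frac:
        moment += timedelta(seconds=float(frac))
    if sign:
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        # Local time is UTC plus the offset, so subtract it to get UTC.
        moment -= offset if sign == "+" else -offset
    return moment


print(parse_timestamp("1234567890"))
print(parse_timestamp("2009-02-14 01:31:30 +02:00"))
```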

Data Format
BigQuery supports the following formats for loading data:
1. Comma Separated Values (CSV)
2. JSON
a. BigQuery can load data faster if your data contains
embedded newlines.
b. Supports nested/repeated data fields

Repeated and Nested Fields

Loading data with repeated and nested fields is supported by
the JSON data format only.

Schema example:

[
  {
    "fields": [
      {
        "mode": "nullable",
        "name": "country",
        "type": "string"
      },
      {
        "mode": "nullable",
        "name": "city",
        "type": "string"
      }
    ],
    "mode": "repeated",
    "name": "location",
    "type": "record"
  },
  ...........
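
To show what data for such a schema could look like, the sketch below writes two records as newline-delimited JSON, one object per line. Only the repeated `location` record from the schema excerpt is filled in, since the remaining fields are elided on the slide. The values are invented for illustration, and `to_ndjson` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable


def to_ndjson(rows: Iterable[Dict[str, Any]]) -> str:
    """Serialize each row as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


rows = [
    # "location" is repeated: a list of nested country/city records.
    {"location": [{"country": "France", "city": "Paris"},
                  {"country": "Israel", "city": None}]},
    # A repeated field may also hold no values at all.
    {"location": []},
]

payload = to_ndjson(rows)
print(payload)
```

Because `country` and `city` are nullable, a missing city is written as `null` in the output.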

Accessing BigQuery
● BigQuery Web browser
○ Imports/exports data, runs queries
● bq command line tool
○ Performs operations from the command line
● Service API
○ RESTful API to access BigQuery programmatically
○ Requires authorization by OAuth2
○ Google client libraries for Python, Java, JavaScript, PHP, ...

Third-party Tools
ETL tools for loading data into BigQuery

Visualization and Business Intelligence

Example of Visualization Tools
Using commercial visualization tools to graph the query results

Loading Data Using the Web Browser
● Upload from local disk or from Cloud Storage
● Start the Web browser
● Select Dataset
● Create table and follow the wizard steps

Loading Data Using bq Tool
"bq load" command
Syntax
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV]
destination_table data_source_uri table_schema

● If not specified, the default file format is CSV (comma separated values)
● The files can also use newline delimited JSON format
● Schema
○ Either a filename or a comma-separated list of column_name:datatype
pairs that describe the file format.
● Data source may be on local machine or on Cloud Storage
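
As a sketch of how the syntax above fits together, the helper below assembles a `bq load` argument list from its three positional parts and the `--source_format` flag shown on the slide. It only builds and prints the command; it does not run it. `bq_load_command`, the table `mydataset.names` and the URI `gs://my-bucket/names.csv` are hypothetical names used for illustration.

```python
import shlex
from typing import List

SOURCE_FORMATS = ("CSV", "NEWLINE_DELIMITED_JSON")


def bq_load_command(destination_table: str, data_source_uri: str,
                    table_schema: str, source_format: str = "CSV") -> List[str]:
    """Build the argument list for a 'bq load' call (not executed here)."""
    if source_format not in SOURCE_FORMATS:
        raise ValueError(f"unsupported source format: {source_format!r}")
    return ["bq", "load", f"--source_format={source_format}",
            destination_table, data_source_uri, table_schema]


# Hypothetical table and bucket names, for illustration only.
cmd = bq_load_command("mydataset.names", "gs://my-bucket/names.csv",
                      "name:string,gender:string,count:integer")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list avoids quoting problems if it is later passed to a process runner.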

Load Limitations
● 1,000 import jobs per table per day
● 10,000 import jobs per project per day
● File size (for both CSV and JSON)
○ 1GB for compressed file
○ 1TB for uncompressed
■ 4GB for uncompressed CSV with newlines in strings
● 10,000 files per import job
● 1TB per import job
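
A planned load job can be checked against these limits before any upload starts. The sketch below encodes only the per-file and per-job figures quoted on this slide, which may differ from current quotas. `LoadFile` and `check_load_job` are hypothetical names, and reading 1GB as 1024^3 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4


@dataclass
class LoadFile:
    name: str
    size_bytes: int
    compressed: bool = False
    csv_with_quoted_newlines: bool = False


def max_file_size(f: LoadFile) -> int:
    """Per-file limit as quoted on the slide."""
    if f.compressed:
        return 1 * GB
    if f.csv_with_quoted_newlines:
        return 4 * GB
    return 1 * TB


def check_load_job(files: List[LoadFile]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(files) > 10_000:
        problems.append(f"{len(files)} files exceeds 10,000 files per job")
    if sum(f.size_bytes for f in files) > 1 * TB:
        problems.append("total size exceeds 1TB per job")
    for f in files:
        if f.size_bytes > max_file_size(f):
            problems.append(f"{f.name} exceeds its per-file size limit")
    return problems


print(check_load_job([LoadFile("big.csv.gz", 2 * GB, compressed=True)]))
```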

A Few Best Practices
CSV/JSON must be split into chunks less than 1TB
● "split" command with --line-bytes option
● Split into smaller files
○ Easier error recovery
○ Use smaller data units (day or month instead of year)
● Uploading to Cloud Storage is recommended

[Diagram: data flows from Cloud Storage into BigQuery]
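
The idea behind splitting on line boundaries can be sketched in a few lines: group complete lines into chunks that stay under a byte limit, so no CSV row or JSON record is cut in half. `split_lines` is a hypothetical helper, not a reimplementation of the `split` command. Unlike `split`, it raises an error when a single line is larger than the limit.

```python
from typing import Iterable, Iterator, List


def split_lines(lines: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Yield groups of whole lines, each group at most max_bytes long."""
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError("a single line exceeds max_bytes")
        # Start a new chunk when adding this line would pass the limit.
        if chunk and size + len(line) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


for part in split_lines([b"aa\n", b"bb\n", b"cc\n"], max_bytes=6):
    print(part)
```

In practice each chunk would be written to its own file and uploaded separately, which also helps with error recovery.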

A Few Best Practices
● Split Tables by Dates
○ Minimize cost of data scanned
○ Minimize query time
● Upload Multiple Files to Cloud Storage
○ Allows parallel upload into BigQuery
● Denormalize your data
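
Splitting tables by date means a query over a few days touches only the tables for those days. As a sketch, the helper below lists the per-day table names for a date range. The `<prefix>_YYYYMMDD` naming pattern and the name `daily_table_names` are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import List


def daily_table_names(prefix: str, start: date, end: date) -> List[str]:
    """List one table name per day from start to end, inclusive."""
    names = []
    day = start
    while day <= end:
        # Assumed naming pattern: <prefix>_YYYYMMDD
        names.append(f"{prefix}_{day:%Y%m%d}")
        day += timedelta(days=1)
    return names


print(daily_table_names("logs", date(2014, 1, 30), date(2014, 2, 1)))
```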

Exercise & Questions

Exercise
Work through Big Query Exercise 1 -- Basics
● Use the BigQuery UI
● Use the bq command line tool
● Upload a dataset
You will query the public sample GSOD (global summary of
day) weather dataset.
You will get and upload earthquake data.

Questions
● What are the different ways to load data into
BigQuery?
● What is the maximum size of data in a BigQuery
table?
● How can we import data into BigQuery?
○ What's the limitation?
○ What formats does BigQuery accept?

Google I/O Data Sensing
● Start the BigQuery Web browser
● Click on Display Project in the project chooser dialog window
● Enter data-sensing-lab when prompted
● In the dataset data-sensing-lab:io_sensor_data, select the table
moscone_io13
● In the New Query box, enter the following query:
SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10

● Click the Run Query button
● Scroll to see relevant results

Data Structure
● Define table schema when creating table
● Data is stored in per-column structure
● Each column is handled separately and only combined when
necessary
Advantage of this data structure:
● No need to define indexes in advance
● Queries load only the relevant columns
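
A toy illustration of the per-column layout, using the name/gender/count schema from earlier in the deck: once rows are regrouped by column, a sum over `count` reads one list and never touches the other fields. This is a conceptual sketch with invented values, not a description of BigQuery's actual storage engine, and `to_columns` is a hypothetical helper.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Regroup row dicts into one list of values per column."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


# Invented sample rows; every row is assumed to have the same fields.
rows = [
    {"name": "Alice", "gender": "F", "count": 120},
    {"name": "Bob", "gender": "M", "count": 95},
    {"name": "Carol", "gender": "F", "count": 60},
]

columns = to_columns(rows)
# Only the "count" column is read to answer this question.
total = sum(columns["count"])
print(total)
```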

Thank you!
Questions?

What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 


Big Query Basics

  • 2. BigQuery Basics: Who? Why?
    Ido Green, Solutions Architect
    plus.google.com/greenido
    greenido.wordpress.com
  • 3. Topics we cover in this lesson
    ● BigQuery Overview
    ● Typical Uses
    ● Project Hierarchy
    ● Access Control and Security
    ● Datasets and Tables
    ● Tools
    ● Demos
  • 4. How does BigQuery fit in the analytics landscape?
    ● MapReduce-based analysis can be slow for ad hoc queries
    ● Managing data centers and tuning software takes time and money
    ● Analytics tools should be services
  • 5. Why BigQuery?
    ● Generating big data reports requires expensive servers and skilled database administrators
    ● Interacting with big data has been expensive, slow and inefficient
    ● BigQuery changes all that
      ○ It reduces the time and expense of querying data
  • 6. What's BigQuery?
    ● Service for interactive analysis of massive datasets (TBs)
      ○ Query billions of rows: seconds to write, seconds to return
      ○ Uses a SQL-style query syntax
      ○ It's a service, accessed by a RESTful API
    ● Reliable and secure
      ○ Replicated across multiple sites
      ○ Secured through Access Control Lists
    ● Scalable
      ○ Store hundreds of terabytes
      ○ Pay only for what you use
    ● Fast (really)
      ○ Run ad hoc queries on multi-terabyte data sets in seconds
  • 7. Analyzing large amounts of data at high speed
    Demo: demobigquery.appspot.com
  • 9. Typical Uses
    Analyzing query results using a visualization library such as the Google Charts Tools API
  • 10. Typical Uses
    Another way to analyze query results is with Google Spreadsheets
      ○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
      ○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/
  • 11. BigQuery Use Cases
    ● Log Analysis. Making sense of computer generated records
    ● Retailer. Using data to forecast product sales
    ● Ads Targeting. Targeting proper customer sections
    ● Sensor Data. Collect and visualize ambient data
    ● Data Mashup. Query terabytes of heterogeneous data
  • 12. Some Customer Case Studies
    ● Uses BigQuery to hone ad targeting and gain insights into their business
    ● Dashboards using BigQuery to analyze booking and inventory data
    ● Use BigQuery to provide their customers ways to expand game engagement and find new channels for monetization
    ● Used BigQuery, App Engine and the Visualization API to build a business intelligence solution
  • 14. Project Hierarchy
    ● Project. All data in BigQuery belongs inside a project
      ○ Set of users, APIs, authentication, billing information
    ● Dataset. Holds one or more tables
      ○ Lowest access control unit (to which ACLs are applied)
    ● Table. Row-column structure that contains actual data
    ● Job. Used to start potentially long running queries
  • 15. Datasets and Tables
    A table name is represented as follows:
    ● Current project: <dataset>.<table name>
    ● Different project: <project>:<dataset>.<table>
      e.g. publicdata:samples.wikipedia
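The naming rule on this slide can be sketched as a small parser. This is only an illustration of the `<project>:<dataset>.<table>` form shown above; the `TableRef` type and `parse_table_name` function are hypothetical names, not part of any BigQuery library.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    """Hypothetical container for the parts of a table name."""
    project: Optional[str]  # None means "the current project"
    dataset: str
    table: str


def parse_table_name(name: str) -> TableRef:
    """Split '<project>:<dataset>.<table>' or '<dataset>.<table>' into parts."""
    project, colon, rest = name.partition(":")
    if not colon:
        # No project prefix: the table lives in the current project.
        project, rest = None, name
    elif not project:
        raise ValueError(f"empty project in table name: {name!r}")
    dataset, dot, table = rest.partition(".")
    if not dot or not dataset or not table:
        raise ValueError(f"expected <dataset>.<table> in: {name!r}")
    return TableRef(project, dataset, table)


ref = parse_table_name("publicdata:samples.wikipedia")
print(ref.project, ref.dataset, ref.table)
```

A name without a project prefix, such as `samples.wikipedia`, yields `project=None`, which the caller would resolve to the current project.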
  • 16. Schema Example
    ● Table schema for demographics about name occurrences:
      name:string,gender:string,count:integer
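To make the compact schema notation concrete, here is a minimal sketch that turns a `column_name:datatype` list into field descriptions. The `parse_schema` helper is hypothetical and assumes every column carries an explicit type.

```python
from typing import Dict, List


def parse_schema(spec: str) -> List[Dict[str, str]]:
    """Parse 'name:type,name:type,...' into a list of field dicts."""
    fields = []
    for pair in spec.split(","):
        name, colon, type_ = pair.strip().partition(":")
        if not colon or not name or not type_:
            raise ValueError(f"expected column_name:datatype, got {pair!r}")
        fields.append({"name": name, "type": type_.lower()})
    return fields


for field in parse_schema("name:string,gender:string,count:integer"):
    print(field["name"], field["type"])
```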
  • 17. Data Types
    ● String
      ○ UTF-8 encoded, <64kB
    ● Integer
      ○ 64 bit signed
    ● Float
    ● Boolean
      ○ "true" or "false", case insensitive
    ● Timestamp
      ○ String format
        ■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
      ○ Numeric format (seconds from UNIX epoch)
        ■ 1234567890, 1.234567890123456E9
    (*) Max row size: 64kB
    Date type is supported as timestamp
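The two timestamp notations on this slide can be illustrated with a small parser. This is a sketch, not BigQuery's own parsing logic: `parse_timestamp` is a hypothetical helper, and it assumes that a string without an offset is in UTC and that fractional seconds beyond microseconds can be truncated.

```python
import re
from datetime import datetime, timedelta, timezone

# YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
_STRING_FORMAT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\.\d+)?(?: ?([+-])(\d{2}):(\d{2}))?$"
)


def parse_timestamp(value: str) -> datetime:
    """Return an aware UTC datetime for either timestamp notation."""
    match = _STRING_FORMAT.match(value.strip())
    if match is None:
        # Numeric format: seconds from the UNIX epoch, e.g. 1.234567890123456E9.
        return datetime.fromtimestamp(float(value), tz=timezone.utc)
    base, fraction, sign, hours, minutes = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    if fraction:
        # Keep at most six fractional digits (microseconds), truncating the rest.
        parsed = parsed.replace(microsecond=int(fraction[1:7].ljust(6, "0")))
    offset = timedelta(0)  # assumption: no offset means UTC
    if sign:
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        if sign == "-":
            offset = -offset
    return parsed.replace(tzinfo=timezone(offset)).astimezone(timezone.utc)


print(parse_timestamp("1234567890"))
print(parse_timestamp("2009-02-14 01:31:30.5 +02:00"))
```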
  • 18. Data Format
    BigQuery supports the following formats for loading data:
    1. Comma Separated Values (CSV)
    2. JSON
       a. BigQuery can load data faster, embedded newlines.
       b. Supports nested/repeated data fields if your data con
  • 19. Repeated and Nested Fields
    Loading data with repeated and nested fields is supported by JSON data format only.
    Schema example:
      [
        {
          "fields": [
            {
              "mode": "nullable",
              "name": "country",
              "type": "string"
            },
            {
              "mode": "nullable",
              "name": "city",
              "type": "string"
            }
          ],
          "mode": "repeated",
          "name": "location",
          "type": "record"
        },
        ...........
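As an illustration of what a row matching this schema might look like, the sketch below encodes one record as a single JSON line and checks it against the `location` field shown on the slide. Only that one field is modeled (the slide's schema continues past it), the sample values are invented, and `conforms` is a hypothetical checker that handles strings and records only.

```python
import json

# The part of the schema visible on the slide; the original continues beyond it.
SCHEMA = [
    {
        "name": "location",
        "type": "record",
        "mode": "repeated",
        "fields": [
            {"name": "country", "type": "string", "mode": "nullable"},
            {"name": "city", "type": "string", "mode": "nullable"},
        ],
    }
]


def conforms(value, fields) -> bool:
    """Check a dict against field definitions (string and record types only)."""
    if not isinstance(value, dict) or set(value) - {f["name"] for f in fields}:
        return False
    for field in fields:
        item = value.get(field["name"])
        if field["mode"] == "repeated":
            items = [] if item is None else item
            if not isinstance(items, list):
                return False
        else:
            items = [] if item is None else [item]  # nullable: may be absent
        for element in items:
            if field["type"] == "record":
                if not conforms(element, field["fields"]):
                    return False
            elif not isinstance(element, str):
                return False
    return True


# Invented sample row: one record holding two repeated "location" entries.
record = {
    "location": [
        {"country": "France", "city": "Paris"},
        {"country": "Israel", "city": "Tel Aviv"},
    ]
}
line = json.dumps(record)  # one JSON object per line, no embedded newlines
print(conforms(record, SCHEMA), line)
```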
  • 20. Accessing BigQuery
    ● BigQuery Web browser
      ○ Imports/exports data, runs queries
    ● bq command line tool
      ○ Performs operations from the command line
    ● Service API
      ○ RESTful API to access BigQuery programmatically
      ○ Requires authorization by OAuth2
      ○ Google client libraries for Python, Java, JavaScript, PHP, ...
  • 21. Third-party Tools
    ● ETL tools for loading data into BigQuery
    ● Visualization and Business Intelligence
  • 22. Example of Visualization Tools
    Using commercial visualization tools to graph the query results
  • 23. Loading Data Using the Web Browser
    ● Upload from local disk or from Cloud Storage
    ● Start the Web browser
    ● Select Dataset
    ● Create table and follow the wizard steps
  • 24. Loading Data Using bq Tool
    "bq load" command syntax:
      bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV] destination_table data_source_uri table_schema
    ● If not specified, the default file format is CSV (comma separated values)
    ● The files can also use newline delimited JSON format
    ● Schema
      ○ Either a filename or a comma-separated list of column_name:datatype pairs that describe the file format
    ● Data source may be on local machine or on Cloud Storage
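The argument order above can be sketched as a helper that assembles the command line without running it. The `bq_load_command` function is hypothetical, and the table name and `gs://` URI in the example are invented placeholders; only the `--source_format` flag and its two values come from the slide.

```python
import shlex
from typing import List, Optional


def bq_load_command(
    destination_table: str,
    data_source_uri: str,
    table_schema: str,
    source_format: Optional[str] = None,
) -> List[str]:
    """Build the argument list for 'bq load' in the order shown on the slide."""
    if source_format not in (None, "CSV", "NEWLINE_DELIMITED_JSON"):
        raise ValueError(f"unsupported source format: {source_format!r}")
    args = ["bq", "load"]
    if source_format is not None:
        # Omitting the flag leaves the default file format, CSV.
        args.append(f"--source_format={source_format}")
    args += [destination_table, data_source_uri, table_schema]
    return args


# Placeholder names; substitute a real dataset, table and source.
command = bq_load_command(
    "mydataset.names",
    "gs://my-bucket/names.csv",
    "name:string,gender:string,count:integer",
)
print(shlex.join(command))
```

Returning a list rather than a string keeps each argument intact if the command is later passed to `subprocess.run`.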
  • 25. Load Limitations
    ● 1,000 import jobs per table per day
    ● 10,000 import jobs per project per day
    ● File size (for both CSV and JSON)
      ○ 1GB for compressed file
      ○ 1TB for uncompressed
        ■ 4GB for uncompressed CSV with newlines in strings
    ● 10,000 files per import job
    ● 1TB per import job
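A pre-flight check against the per-job and per-file limits listed on this slide could look like the sketch below. The numbers are the ones quoted here and may differ from current quotas; treating 1GB as 10^9 bytes is an assumption, and `SourceFile` and `check_import_job` are hypothetical names.

```python
from typing import List, NamedTuple

GB = 10**9  # assumption: decimal units
TB = 10**12


class SourceFile(NamedTuple):
    size_bytes: int
    compressed: bool = False
    csv_with_newlines: bool = False  # uncompressed CSV with newlines in strings


def check_import_job(files: List[SourceFile]) -> List[str]:
    """Return a list of limit violations for one import job (empty if none)."""
    problems = []
    if len(files) > 10_000:
        problems.append("more than 10,000 files in one import job")
    if sum(f.size_bytes for f in files) > TB:
        problems.append("more than 1TB in one import job")
    for index, f in enumerate(files):
        if f.compressed:
            limit = GB
        elif f.csv_with_newlines:
            limit = 4 * GB
        else:
            limit = TB
        if f.size_bytes > limit:
            problems.append(f"file {index} exceeds its per-file limit of {limit} bytes")
    return problems


print(check_import_job([SourceFile(2 * GB, compressed=True)]))
```

The daily job counts (per table and per project) are not covered, since checking them needs job history rather than file sizes.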
  • 26. A Few Best Practices
    CSV/JSON must be split into chunks less than 1TB
    ● "split" command with --line-bytes option
    ● Split to smaller files
      ○ Easier error recovery
      ○ To smaller data unit (day, month instead of year)
    ● Uploading to Cloud Storage is recommended
    (diagram: Cloud Storage, BigQuery)
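The idea behind splitting on line boundaries can be sketched in a few lines: pack whole lines into a chunk until the next line would push it over the size limit. This is an illustration of the technique, not a reimplementation of the `split` command; `split_lines` is a hypothetical helper, and unlike `split` it rejects a single line that is larger than the limit instead of breaking it.

```python
from typing import Iterable, Iterator


def split_lines(lines: Iterable[bytes], max_bytes: int) -> Iterator[bytes]:
    """Yield chunks of whole lines, each at most max_bytes long."""
    chunk, size = [], 0
    for line in lines:
        if len(line) > max_bytes:
            raise ValueError("a single line is larger than the chunk limit")
        if chunk and size + len(line) > max_bytes:
            # The next line would not fit: close the current chunk first.
            yield b"".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield b"".join(chunk)


rows = [b"a,1\n", b"b,2\n", b"c,3\n"]
for number, chunk in enumerate(split_lines(rows, max_bytes=8)):
    print(number, chunk)
```

Because no row is ever cut in half, each chunk stays a valid CSV or newline delimited JSON file and can be reloaded on its own after an error.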
  • 27. A Few Best Practices
    ● Split Tables by Dates
      ○ Minimize cost of data scanned
      ○ Minimize query time
    ● Upload Multiple Files to Cloud Storage
      ○ Allows parallel upload into BigQuery
    ● Denormalize your data
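One way to picture date-split tables: if each day's data lives in its own table, a query only needs to name the tables for the days it cares about. The sketch below assumes a `<prefix>YYYYMMDD` naming scheme and the bracketed, comma-separated table list of the SQL dialect used elsewhere in this deck; the dataset and prefix are invented, and `daily_tables` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import List


def daily_tables(dataset: str, prefix: str, start: date, end: date) -> List[str]:
    """List bracketed table names for every day from start to end, inclusive."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end date is before start date")
    return [
        f"[{dataset}.{prefix}{(start + timedelta(days=n)):%Y%m%d}]"
        for n in range(days + 1)
    ]


# Invented dataset and prefix; only three days of tables are scanned.
tables = daily_tables("mydataset", "logs_", date(2014, 1, 30), date(2014, 2, 1))
query = "SELECT COUNT(*) FROM " + ", ".join(tables)
print(query)
```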
  • 29. Exercise
    Work through Big Query Exercise 1 -- Basics
    ● Use the BigQuery UI
    ● Use the bq command line tool
    ● Upload a dataset
    You will query the public sample GSOD (global summary of day) weather dataset.
    You will get and upload earthquake data.
  • 30. Questions
    ● What are the different ways to load data into BigQuery?
    ● What is the maximum size of data in a BigQuery table?
    ● How can we import data into BigQuery?
      ○ What's the limitation?
      ○ What formats does BigQuery accept?
  • 31. Google I/O Data Sensing
    ● Start the BigQuery Web browser
    ● Click on Display Project in the project chooser dialog window
    ● Enter data-sensing-lab when prompted
    ● In the dataset data-sensing-lab:io_sensor_data, select the table moscone_io13
    ● In the New Query box, enter the following query:
        SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10
    ● Click the Run Query button
    ● Scroll to see relevant results
  • 32. Data Structure
    ● Define table schema when creating table
    ● Data is stored in per-column structure
    ● Each column is handled separately and only combined when necessary
    Advantage of this data structure:
    ● No need to set index in advance
    ● Load only the relevant columns
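A toy comparison helps show why a per-column layout lets a query load only the relevant columns. The sketch below is a simplification in plain Python lists, not BigQuery's actual storage format; the sample rows are invented and `to_columns` is a hypothetical helper that assumes every row has the same keys.

```python
from typing import Any, Dict, List

# Invented sample rows, using the name/gender/count schema from earlier.
rows = [
    {"name": "Mary", "gender": "F", "count": 7065},
    {"name": "Anna", "gender": "F", "count": 2604},
    {"name": "John", "gender": "M", "count": 9655},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Regroup row records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over "count" reads one list and never touches the
# "name" or "gender" columns, which is the point of the columnar layout.
total = sum(columns["count"])
print(total)
```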