Google Cloud for
Data Crunchers
Patrick Chanezon, Developer Advocate, Cloud
@chanezon, chanezon@google.com

Ryan Boyd, Developer Advocate, Apps
@ryguyrg, rboyd@google.com

Kirrily Robert, Data Engineer, Freebase.com
@skud, skud@google.com
Agenda

•   Google App Engine
•   Google Storage for Developers
•   Prediction API
•   BigQuery
•   Google Fusion Tables
•   Google Refine




                                    Google Developer Day 2010
Google App Engine




What is cloud computing?
Cloud Computing Defined



         SaaS

         PaaS


          IaaS

                          Source: Gartner AADI Summit Dec 2009
Google's Cloud Offerings
• SaaS
    1. Google Apps
    2. Third party Apps: Google Apps Marketplace
    3. ________ (Your Apps)
• PaaS
    o Google App Engine
• IaaS
    o Google Storage
    o Prediction API
    o BigQuery
Google App Engine

 - Easy to build
 - Easy to maintain
 - Easy to scale




Cloud development in a box
• SDK & “The Cloud”
• Hardware
• Networking
• Operating system
• Application runtime
    o Java, Python
•   Static file serving
•   Services
•   Fault tolerance
•   Load balancing



App Engine Services


• Memcache
• Datastore
• URL Fetch
• Mail
• XMPP
• Task Queue
• Images
• Blobstore
• User Service
Always free to get started

  ~5M pageviews/month
   • 6.5 CPU hrs/day
   • 1 GB storage
   • 650K URL Fetch calls/day
   • 2,000 recipients emailed
   • 1 GB/day bandwidth
   • 100,000 tasks enqueued
   • 650K XMPP messages/day
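
As a rough sanity check on the figures above, the sketch below works out what the free quota allows per page view. The 30-day month is an assumption; the other numbers are taken from the slide.

```python
# Back-of-the-envelope check of the free quota figures quoted on the slide.
PAGEVIEWS_PER_MONTH = 5_000_000   # "~5M pageviews/month"
CPU_HOURS_PER_DAY = 6.5           # "6.5 CPU hrs/day"
DAYS_PER_MONTH = 30               # assumption: a 30-day month

pageviews_per_day = PAGEVIEWS_PER_MONTH / DAYS_PER_MONTH
cpu_seconds_per_day = CPU_HOURS_PER_DAY * 3600
cpu_seconds_per_pageview = cpu_seconds_per_day / pageviews_per_day

print(round(pageviews_per_day))            # about 167 thousand page views a day
print(round(cpu_seconds_per_pageview, 2))  # CPU seconds available per page view
```

Under these assumptions the free tier leaves roughly 0.14 CPU seconds for each page view.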
Purchase additional resources *




 * free monthly quota of ~5 million page views still in full effect
Google App Engine for Business
   Same scalable cloud hosting platform. Designed for the enterprise.

    • Enterprise application management
        – Centralized domain console
    • Enterprise reliability and support
        – 99.9% Service Level Agreement
        – Premium Developer Support
    • Hosted SQL
        – Managed relational SQL database in the cloud
    • SSL on your domain
        – Including "naked" domain support
    • Secure by default
        – Integrated Single Sign On (SSO)
    • Pricing that makes sense
        – Pay only for what you use
* Hosted SQL and SSL on your domain available later this year
App Engine for Data Crunchers

• High Performance Image Serving
• OpenID/OAuth integration
• Increased quotas
    • > 1k entities per query
    • 10-minute task queue requests
• Async URL Fetch
• Mapper API (Reduce coming soon)
• Channel API
• Matcher API




Mapper API
 • First component of App Engine’s MapReduce toolkit
 • Large scale data manipulation
 • Examples include:
    • Report generation
    • Computing statistics and metrics …
 • Python Example:
    • http://blog.notdot.net/2010/05/Exploring-the-new-mapper-API
 • Java Example:
    • http://ikaisays.com/2010/07/09/using-the-java-mapper-framework-for-app-engine/
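
To make the mapper idea concrete, here is a minimal plain-Python sketch of the pattern: a map function is called once per entity and reports counters, which a runner tallies. This is not the App Engine Mapper API; `count_by_country`, `run_mapper`, and the sample entities are hypothetical names chosen for illustration.

```python
from collections import Counter

def count_by_country(entity):
    """Hypothetical map function: called once per entity, yields counter names."""
    yield "installs/" + entity["country"]

def run_mapper(entities, map_fn):
    """Call map_fn on every entity and tally the counter names it yields."""
    counters = Counter()
    for entity in entities:
        for name in map_fn(entity):
            counters[name] += 1
    return counters

# Made-up entities standing in for datastore records.
entities = [
    {"app": "MailChimp", "country": "us"},
    {"app": "MailChimp", "country": "se"},
    {"app": "Smartsheet", "country": "us"},
]
report = run_mapper(entities, count_by_country)
print(report["installs/us"], report["installs/se"])
```

In the hosted service the iteration would be spread over many task queue requests; the sketch only shows the shape of the per-entity function.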




Channel API
• Allows for Server Push (Comet) to browser
• Blog post announcement:
    • http://googleappengine.blogspot.com/2010/05/app-engine-at-google-io-2010.html
• External coverage:
    • Sneak peek from an early trusted tester
    • http://bitshaq.com/2010/09/01/sneak-peak-gae-channel-api/
• Demo code for Dance Dance Robot available here:
    • http://code.google.com/p/dance-dance-robot/
    • Also see: https://groups.google.com/group/google-appengine-java/browse_thread/thread/6fa09953ffae2cd3/c1db7de5fdb82b65?pli=1#




Matcher API
• Allows an app to register a set of queries to match against a stream of documents
• Trusted testers, Python only
• Group post announcement:
    • http://groups.google.com/group/google-appengine/msg/40021537e2e58962
• Docs:
    • http://code.google.com/p/google-app-engine-samples/wiki/AppEngineMatcherService
• Demo code:
    • http://code.google.com/p/google-app-engine-samples/source/browse/#svn/trunk/matcher-sample
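
The idea of registering queries up front and matching them against incoming documents can be sketched in a few lines of plain Python. This illustrates the concept only, not the Matcher API itself; the `Matcher` class, its methods, and the sample documents are hypothetical.

```python
class Matcher:
    """Toy matcher: queries are registered first, documents arrive later."""

    def __init__(self):
        self._subscriptions = {}  # subscription id -> predicate function

    def subscribe(self, sub_id, predicate):
        """Register a query, expressed here as a Python predicate."""
        self._subscriptions[sub_id] = predicate

    def match(self, document):
        """Return the ids of all registered queries that the document satisfies."""
        return [sid for sid, pred in self._subscriptions.items() if pred(document)]

matcher = Matcher()
matcher.subscribe("cheap", lambda doc: doc["price"] < 10)
matcher.subscribe("books", lambda doc: doc["category"] == "book")

# Made-up documents arriving on the stream.
print(matcher.match({"category": "book", "price": 8}))
print(matcher.match({"category": "toy", "price": 25}))
```

A real service would index the queries so that each document is not tested against every subscription; the linear scan here keeps the sketch short.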




Google Storage for Developers
Store your data in Google's cloud




What Is Google Storage?

• Store your data in Google's cloud
  o any format, any amount, any time

• You control access to your data
  o private, shared, or public

• Access via Google APIs or 3rd party tools/libraries




Google Storage Technical Details

RESTful API 
• Verbs: GET, PUT, POST, HEAD, DELETE 
• Resources: identified by URI, like:
   http://commondatastorage.googleapis.com/bucket/object
• Compatible with S3 
Buckets
• Flat containers (no bucket hierarchy)
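
A small sketch of how a resource URI of the form shown above could be assembled. The host comes from the slide; the helper name `object_url` and the bucket and object names are illustrative assumptions.

```python
from urllib.parse import quote

BASE_URL = "http://commondatastorage.googleapis.com"  # host shown on the slide

def object_url(bucket, object_name):
    """Return BASE_URL/bucket/object with unsafe characters percent-encoded.

    Slashes inside the object name are kept, since buckets are flat and any
    apparent hierarchy lives in the object name itself.
    """
    return "{}/{}/{}".format(
        BASE_URL, quote(bucket, safe=""), quote(object_name, safe="/")
    )

# Hypothetical bucket and object names.
print(object_url("appdata", "installs"))
print(object_url("appdata", "reports/2010 q4.csv"))
```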




Performance and Scalability
Object types and size
• Objects of any type, 100 GB+ per object
• Unlimited numbers of objects, 1000s of buckets
• Range-get support for data retrieval

Replication
• All data replicated to multiple US data centers
• Leveraging Google's worldwide network for data delivery

Consistency
• “Read-your-writes” data consistency
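
Range-get support means a large object can be fetched in pieces using the standard HTTP `Range` header. The sketch below only builds the header values and byte ranges; it sends no requests, and the helper names are assumptions made for illustration.

```python
def range_header(start, end=None):
    """Build an HTTP Range header for bytes start..end (inclusive).

    With end omitted, the range runs to the end of the object.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    value = "bytes={}-{}".format(start, "" if end is None else end)
    return {"Range": value}

def chunk_ranges(total_size, chunk_size):
    """Split an object of total_size bytes into inclusive (start, end) pairs."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]

print(range_header(0, 1023))
print(chunk_ranges(10, 4))
```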




Security and Privacy Features
Authenticated downloads from a web browser
• Sharing with individuals
• Group sharing via Google Groups
• Sharing with Google Apps domains

Permissions set on Buckets or Objects
• READ (an object, or list a bucket’s contents)
• WRITE (applicable to buckets, allows upload/delete/etc)
• FULL_CONTROL (read/write ACLs on objects or buckets)




Tools
• Google Storage Manager
• gsutil
Google Storage Benefits
• High Performance and Scalability
  o Backed by Google infrastructure
• Strong Security and Privacy
  o Control access to your data
• Easy to Use
  o Get started fast with Google & 3rd party tools
Some Early Google Storage Adopters




Google Storage usage within Google

• Google BigQuery
• Google Prediction API
• Haiti Relief Imagery
• USPTO data
• Partner Reporting
Google Storage - Availability
Limited preview in US* currently
• 100GB free storage and network per account
• Sign up for wait list at
      • http://code.google.com/apis/storage/




* Non-US preview available on case-by-case basis
Google Prediction API
Google's prediction engine in the cloud




Introducing the Google Prediction API

• Google's sophisticated machine learning technology
• Available as an on-demand RESTful HTTP web service




A virtually endless number of applications...


• Customer Sentiment
• Transaction Risk
• Species Identification
• Message Routing
• Diagnostics
• Churn Prediction
• Legal Docket Classification
• Suspicious Activity
• Work Roster Assignment
• Inappropriate Content
• Recommend Products
• Political Bias
• Uplift Marketing
• Email Filtering
• Career Counseling

... and many more ...
How does it work?
1. TRAIN
The Prediction API finds relevant features in the sample data during training.

   "english"   The quick brown fox jumped over the lazy dog.
   "english"   To err is human, but to really foul things up you need a computer.
   "spanish"   No hay mal que por bien no venga.
   "spanish"   La tercera es la vencida.

2. PREDICT
The Prediction API later searches for those features during prediction.

   ?   To be or not to be, that is the question.
   ?   La fe mueve montañas.
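
The train-then-predict flow can be imitated with a deliberately naive stand-in: count words per label during training, then pick the label whose vocabulary overlaps most with the new text. This is a toy illustration of the workflow, not the technique the Prediction API uses; all function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def train(examples):
    """Build a word-count model per label from (label, text) pairs."""
    model = {}
    for label, text in examples:
        model.setdefault(label, Counter()).update(tokenize(text))
    return model

def predict(model, text):
    """Return the label whose training vocabulary overlaps most with text."""
    words = tokenize(text)
    scores = {
        label: sum(1 for word in words if word in vocab)
        for label, vocab in model.items()
    }
    return max(scores, key=scores.get)

# Training sentences taken from the slide.
model = train([
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
])
print(predict(model, "To be or not to be, that is the question."))
print(predict(model, "La fe mueve montañas."))
```

With these four training sentences the sketch labels the first query "english" and the second "spanish".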

A Prediction API Example
Automatically determine application recommendations


• Goal: Increase relevancy on the Apps Marketplace via
  recommendations
• Customers: Businesses of various sizes and industries
  using Google Apps around the world
• Data: Sampling of previous installs of applications
• Outcome: Predict applications which would be
  appropriate for a new customer visiting the site




Using the Prediction API
A simple three step process...

1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Step 1: Upload
 Upload your training data to Google Storage

• Training data: outputs and input features
• Data format: comma separated value format (CSV), result in first column

  "SlideRocket","EDUCATION","us","en","10","5"
  "MailChimp","BUSINESS","us","en","7","0"
  "MailChimp","STANDARD","se","sv","1","0"
  "Smartsheet","BUSINESS","us","en","13","4"


  Upload to Google Storage

  gsutil cp installs gs://appdata/
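
A minimal sketch of producing a training file in the format shown above: every field quoted, the result (the application name) in the first column. The helper name `to_training_csv` is an assumption; the rows are copied from the slide.

```python
import csv
import io

# Rows from the slide: result first, then the input features.
rows = [
    ("SlideRocket", "EDUCATION", "us", "en", "10", "5"),
    ("MailChimp", "BUSINESS", "us", "en", "7", "0"),
]

def to_training_csv(rows):
    """Serialize rows as CSV with every field quoted, one example per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

print(to_training_csv(rows), end="")
```

The resulting text can be saved to a local file (named `installs` in the slide) before running the `gsutil cp` command.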



Step 2: Train
Create a new model by training on data

To train a model:

POST prediction/v1.1/training?data=appdata%2Finstalls


Training runs asynchronously. To see if it has finished:
GET prediction/v1.1/training/appdata%2Finstalls

{"data":{
   "data":"appdata/installs",
   "modelinfo":"estimated accuracy: 0.xx"}}



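
In the two training requests, the bucket and object name appear percent-encoded (`appdata%2Finstalls`). The sketch below builds both request lines from a bucket and object name. The paths follow the slide; the helper names are assumptions, and no request is actually sent.

```python
from urllib.parse import quote

def model_id(bucket, object_name):
    """Percent-encode 'bucket/object', including the slash, as on the slide."""
    return quote("{}/{}".format(bucket, object_name), safe="")

def training_request(bucket, object_name):
    """Return (method, path) for starting training."""
    return "POST", "prediction/v1.1/training?data=" + model_id(bucket, object_name)

def status_request(bucket, object_name):
    """Return (method, path) for checking whether training has finished."""
    return "GET", "prediction/v1.1/training/" + model_id(bucket, object_name)

print(training_request("appdata", "installs"))
print(status_request("appdata", "installs"))
```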
Step 3: Predict
 Apply the trained model to make predictions on new data
POST prediction/v1.1/query/appdata%2Finstalls/predict
{ "data":{
    "input": { "mixture" : [
      "EDUCATION","us","en","10","0" ]}}}

{ "data" : {
  "kind" : "prediction#output",
  "outputLabel":"Manymoon",
  "outputMulti" :[
     {"label":"OffiSync", "score": x.xx},
     {"label":"Zoho CRM", "score": x.xx},
     {"label":"MailChimp", "score": x.xx}]}}
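
A sketch of how a client might build the request body and read the reply shown above. The field names (`input`, `mixture`, `outputLabel`, `outputMulti`) come from the slide; the helper names are assumptions, and the scores in the sample reply are made up because the slide elides the real values.

```python
import json

def predict_body(features):
    """Build the JSON request body for a prediction from a list of features."""
    return json.dumps({"data": {"input": {"mixture": list(features)}}})

def top_labels(response_text):
    """Return the winning label and the other labels sorted by score, highest first."""
    data = json.loads(response_text)["data"]
    ranked = sorted(
        data.get("outputMulti", []), key=lambda item: item["score"], reverse=True
    )
    return data["outputLabel"], [(item["label"], item["score"]) for item in ranked]

# Sample reply with made-up scores for illustration only.
sample_reply = json.dumps({"data": {
    "kind": "prediction#output",
    "outputLabel": "Manymoon",
    "outputMulti": [
        {"label": "OffiSync", "score": 0.3},
        {"label": "Zoho CRM", "score": 0.5},
        {"label": "MailChimp", "score": 0.2},
    ],
}})

print(predict_body(["EDUCATION", "us", "en", "10", "0"]))
print(top_labels(sample_reply))
```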




Demo!
Demo Screenshots




Predicting apps for a 501-1,000 seat educational institution
Demo Screenshots




    Predicting apps for a small business
Prediction API Capabilities
Data
• Input Features: numeric or unstructured text
• Output: up to hundreds of discrete categories, or
  continuous values

Training
• Many machine learning techniques
• Automatically selected
• Performed asynchronously

Access from many platforms:
• Web app from Google App Engine
• Apps Script (e.g. from Google Spreadsheet)
• Desktop app

Prediction API - Pricing
Free Quota in trial/development
• 100 predictions/day, 5MB trained/day
• Available for 6 months

Paid Usage
• $10/month per project includes 10,000 predictions
• Additional predictions are $0.50 per 1,000
• Absolute limit of 60,000 predictions per day
• $0.002 per MB trained (max size per dataset is 100MB)
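
The paid pricing above can be turned into a small estimator. The rates are taken from the slide. Billing extra predictions pro rata rather than in whole blocks of 1,000 is an assumption, as is the function name, and the 60,000 per day limit is not enforced.

```python
from decimal import Decimal

BASE_FEE = Decimal("10")            # $10/month per project
INCLUDED_PREDICTIONS = 10000        # predictions covered by the base fee
PER_1000_EXTRA = Decimal("0.50")    # $0.50 per 1,000 additional predictions
PER_MB_TRAINED = Decimal("0.002")   # $0.002 per MB trained

def monthly_cost(predictions, mb_trained):
    """Estimate one month's charge in dollars.

    Assumption: additional predictions are charged pro rata.
    """
    extra = max(0, predictions - INCLUDED_PREDICTIONS)
    return (
        BASE_FEE
        + PER_1000_EXTRA * extra / 1000
        + PER_MB_TRAINED * Decimal(str(mb_trained))
    )

print(monthly_cost(10000, 0))
print(monthly_cost(30000, 100))
```

Under these assumptions, 30,000 predictions plus 100 MB of training comes to $20.20 for the month.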




Prediction API - Availability
Limited preview in US* currently
• Sign up for wait list at
      • http://code.google.com/apis/predict/




* Non-US preview available on case-by-case basis
Google BigQuery
Interactive analysis of large datasets in Google's cloud




Introducing Google BigQuery
 – Google's large data ad hoc analysis technology
   • Analyze massive amounts of data in seconds
 – Simple SQL-like query language
 – Flexible access
   • REST APIs, JSON-RPC, Google Apps Script




Why BigQuery?
Working with large data is a challenge




Many Use Cases ...



• Interactive Tools
• Spam
• Trends Detection
• Web Dashboards
• Network Optimization
Key Capabilities of BigQuery
• Scalable: Billions of rows

• Fast: Response in seconds

• Simple: Queries in SQL

• Web Service
  o REST
  o JSON-RPC
  o Google Apps Script




Using BigQuery
Another simple three step process...

1. Upload: Upload your raw data to Google Storage
2. Import: Import raw data into BigQuery table
3. Query: Perform SQL queries on table
Writing Queries
Compact subset of SQL
  o SELECT ... FROM ...
    WHERE ...
    GROUP BY ... ORDER BY ...
    LIMIT ...;

Common functions
  o Math, String, Time, ...

Additional statistical approximations
  o TOP
  o COUNT DISTINCT
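
The clause order listed above can be captured in a small query builder. This is a sketch: the helper name and the table and column names (`revisions`, `title`) are hypothetical, and the builder does no escaping or validation.

```python
def build_query(select, table, where=None, group_by=None, order_by=None, limit=None):
    """Assemble a query in the order SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT."""
    parts = ["SELECT " + ", ".join(select), "FROM " + table]
    if where:
        parts.append("WHERE " + where)
    if group_by:
        parts.append("GROUP BY " + ", ".join(group_by))
    if order_by:
        parts.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        parts.append("LIMIT {}".format(int(limit)))
    return " ".join(parts) + ";"

# Hypothetical table and columns.
query = build_query(
    select=["title", "COUNT(*)"],
    table="revisions",
    group_by=["title"],
    order_by=["title"],
    limit=10,
)
print(query)
```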




BigQuery via REST
GET /bigquery/v1/tables/{table name}

GET /bigquery/v1/query?q={query}
Sample JSON Reply:
{
    "results": {
      "fields": { [
         {"id":"COUNT(*)","type":"uint64"}, ... ]
      },
      "rows": [
         {"f":[{"v":"2949"}, ...]},
         {"f":[{"v":"5387"}, ...]}, ... ]
    }
}

Also supports JSON-RPC
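
A sketch of the client side of the REST call above: URL-encode the query into the `q` parameter, then turn the reply into one dictionary per row. The helper names and the table name `t` are assumptions. The code also assumes `fields` arrives as a plain JSON array, since the `{ [ ... ] }` shown on the slide is not valid JSON.

```python
import json
from urllib.parse import quote

def query_path(sql):
    """Return the GET path for a query, with the SQL percent-encoded."""
    return "/bigquery/v1/query?q=" + quote(sql, safe="")

def rows_from_reply(reply_text):
    """Convert a reply into a list of {field id: value} dictionaries."""
    results = json.loads(reply_text)["results"]
    names = [field["id"] for field in results["fields"]]
    return [
        dict(zip(names, (cell["v"] for cell in row["f"])))
        for row in results["rows"]
    ]

# Sample reply modelled on the slide (assumption: "fields" is a list).
sample_reply = json.dumps({"results": {
    "fields": [{"id": "COUNT(*)", "type": "uint64"}],
    "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}],
}})

print(query_path("SELECT COUNT(*) FROM t;"))
print(rows_from_reply(sample_reply))
```

Values come back as strings in the sample reply, so a caller would convert them using the declared field type.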



Security and Privacy
Standard Google Authentication
• Client Login
• OAuth
• AuthSub

HTTPS support
• protects your credentials
• protects your data

Relies on Google Storage to manage access




Large Data Analysis Example
Wikimedia Revision History




Wikimedia Revision history data from:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
Using BigQuery Shell
Python DB API 2.0 + B. Clapper's sqlcmd
http://www.clapper.org/software/python/sqlcmd/




BigQuery from a Spreadsheet




Google Fusion Tables




Google Fusion Tables

• Manage large collections of tabular data in the cloud
    • 100 MB tables
    • Filters, Aggregation, Merge
    • ACL, Collaboration, Discuss Data
    • Visualizations
• REST API
    • Geo queries
• Maps Integration
    • FusionTablesLayer




Google Visualization API




Google Visualization API

• Collection of JavaScript Visualization components
    • Some from Google (Chart Tools)
    • Some from other developers
    • Share the same wire protocol for Data Sources




Example: Weather data

• US National Climatic Data Center
    • weather data at stations around the globe since 1929
    • Stored in Google Storage
    • Created a table for BigQuery
    • Uploaded weather station coordinates to Fusion Tables
    • App Engine App
       • Maps API to display weather station maps
       • BigQuery to query average temperature in January
       • A bit of Python to create a JSON Data Source
       • Visualization API
• Just an example: rinse, repeat, enhance!
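
The "bit of Python" step could look roughly like the sketch below, which turns query results into the `cols`/`rows` table structure that Visualization API charts consume. The function name, column ids, station names, and temperatures are all made up for illustration, and the surrounding data source response wrapper is omitted.

```python
import json

def to_data_table(station_temps):
    """Convert [(station, temperature), ...] into a DataTable-style dict."""
    return {
        "cols": [
            {"id": "station", "label": "Station", "type": "string"},
            {"id": "avg_temp", "label": "Average January temperature", "type": "number"},
        ],
        "rows": [
            {"c": [{"v": name}, {"v": temp}]} for name, temp in station_temps
        ],
    }

# Hypothetical query results.
table = to_data_table([("Station A", -3.5), ("Station B", 12.0)])
print(json.dumps(table, indent=2))
```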



Google Refine




Google Refine

• Power tool for working with messy data
     • Cleanup
     • Transform
     • Augment
     • (Link with Freebase)
• Desktop software for now
• http://code.google.com/p/google-refine/




Recap
• Google App Engine
  o Easy to build, deploy and manage web apps
• Google Storage
  o High speed data storage on Google Cloud
• Prediction API
  o Google's machine learning technology
• BigQuery
  o Interactive analysis of very large data sets
• Google Fusion Tables
  o Manage collections of tabular data in the cloud
• Google Refine
  o Power tool for working with messy data
• Google Visualization
  o Collection of JavaScript visualization components


More information
http://code.google.com/apis/
http://code.google.com/more/table/





Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Google Cloud for Data Crunchers - Strata Conf 2011

  • 1. Google Cloud for Data Crunchers Patrick Chanezon, Developer Advocate, Cloud @chanezon, chanezon@google.com Ryan Boyd, Developer Advocate, Apps @ryguyrg, rboyd@google.com Kirrily Robert, Data Engineer, Freebase.com @skud, skud@google.com
  • 2. Agenda • Google App Engine • Google Storage for Developers • Prediction API • BigQuery • Google Fusion Tables • Google Refine Google Developer Day 2010
  • 3. Google App Engine
  • 4. What is cloud computing?
  • 5. Cloud Computing Defined: SaaS, PaaS, IaaS (Source: Gartner AADI Summit, Dec 2009)
  • 6. Google's Cloud Offerings: Your Apps • SaaS: 1. Google Apps 2. Third party Apps: Google Apps Marketplace 3. ________ • PaaS: Google App Engine • IaaS: Google Storage, Prediction API, BigQuery
  • 7. Google App Engine - Easy to build - Easy to maintain - Easy to scale
  • 8. Cloud development in a box • SDK & “The Cloud” • Hardware • Networking • Operating system • Application runtime o Java, Python • Static file serving • Services • Fault tolerance • Load balancing
  • 9. App Engine Services: Memcache, Datastore, URL Fetch, Mail, XMPP, Task Queue, Images, Blobstore, User Service
  • 10. Always free to get started: ~5M pageviews/month • 6.5 CPU hrs/day • 1 GB storage • 650K URL Fetch calls/day • 2,000 recipients emailed • 1 GB/day bandwidth • 100,000 tasks enqueued • 650K XMPP messages/day
  • 11. Purchase additional resources* (* free monthly quota of ~5 million page views still in full effect)
  • 12. Google App Engine for Business: Same scalable cloud hosting platform. Designed for the enterprise. • Enterprise application management – Centralized domain console • Enterprise reliability and support – 99.9% Service Level Agreement – Premium Developer Support • Hosted SQL – Managed relational SQL database in the cloud • SSL on your domain – Including "naked" domain support • Secure by default – Integrated Single Sign On (SSO) • Pricing that makes sense – Pay only for what you use * Hosted SQL and SSL on your domain available later this year
  • 13. App Engine for Data Crunchers • High Performance Image Serving • OpenID/OAuth integration • Increased quotas • > 1k entities per query • 10’’ task queues • Async UrlFetch • Mapper API (Reduce coming soon) • Channel API • Matcher API
  • 14. Mapper API • First component of App Engine’s MapReduce toolkit • Large scale data manipulation • Examples include: • Report generation • Computing statistics and metrics … • Python Example: • http://blog.notdot.net/2010/05/Exploring-the-new-mapper-API • Java Example: • http://ikaisays.com/2010/07/09/using-the-java-mapper-framework-for-app-engine/
  • 15. Channel API • Allows for Server Push (Comet) to browser • Blog post announcement: • http://googleappengine.blogspot.com/2010/05/app-engine-at-google-io-2010.html • External coverage: • Sneak Peak from an early trusted tester • http://bitshaq.com/2010/09/01/sneak-peak-gae-channel-api/ • Demo code for Dance Dance Robot available here: • http://code.google.com/p/dance-dance-robot/ • Also see: https://groups.google.com/group/google-appengine-java/browse_thread/thread/6fa09953ffae2cd3/c1db7de5fdb82b65?pli=1#
  • 16. Matcher API • Allows an app to register a set of queries to match against a stream of documents • Trusted Testers, Python only • Group post announcement: • http://groups.google.com/group/google-appengine/msg/40021537e2e58962 • Docs: • http://code.google.com/p/google-app-engine-samples/wiki/AppEngineMatcherService • Demo code: • http://code.google.com/p/google-app-engine-samples/source/browse/#svn/trunk/matcher-sample
  • 17. Google Storage for Developers: Store your data in Google's cloud
  • 18. What Is Google Storage? • Store your data in Google's cloud o any format, any amount, any time • You control access to your data o private, shared, or public • Access via Google APIs or 3rd party tools/libraries
  • 19. Google Storage Technical Details. RESTful API: • Verbs: GET, PUT, POST, HEAD, DELETE • Resources: identified by URI, like: http://commondatastorage.googleapis.com/bucket/object • Compatible with S3. Buckets: • Flat containers (no bucket hierarchy)
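The URI scheme on slide 19 can be sketched in a few lines of Python. This is only an illustration: the host comes from the slide, while the helper name `object_uri` and the bucket and object names are made up for the example.

```python
from urllib.parse import quote

# Host taken from the slide; everything else here is illustrative.
STORAGE_HOST = "http://commondatastorage.googleapis.com"


def object_uri(bucket, object_name):
    """Return the URI of an object stored in a (flat) bucket."""
    # Buckets are flat, so a "/" inside the object name is just part of the
    # name. quote() leaves "/" alone by default and escapes spaces etc.
    return f"{STORAGE_HOST}/{quote(bucket)}/{quote(object_name)}"


print(object_uri("appdata", "installs"))
print(object_uri("appdata", "2010/weather data.csv"))
```

A GET, PUT, HEAD or DELETE request against such a URI would then act on that single object; authentication is left out of this sketch.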
  • 20. Performance and Scalability. Object types and size: • Objects of any type and 100GB+ / Object • Unlimited numbers of objects, 1000s of buckets • Range-get support for data retrieval. Replication: • All data replicated to multiple US data centers • Leveraging Google's worldwide network for data delivery. Consistency: • “Read-your-writes” data consistency
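Slide 20 only says that range gets are supported. Assuming this means the standard HTTP `Range` request header, a large object could be fetched piece by piece along these lines. The helper names `range_header` and `chunk_ranges` are hypothetical.

```python
def range_header(first_byte, last_byte=None):
    """Build an HTTP Range header for bytes first_byte..last_byte (inclusive).

    With last_byte omitted, the range runs to the end of the object.
    """
    if first_byte < 0 or (last_byte is not None and last_byte < first_byte):
        raise ValueError("invalid byte range")
    end = "" if last_byte is None else str(last_byte)
    return {"Range": f"bytes={first_byte}-{end}"}


def chunk_ranges(total_size, chunk_size):
    """Yield (first, last) inclusive byte offsets that cover total_size bytes."""
    for first in range(0, total_size, chunk_size):
        yield first, min(first + chunk_size, total_size) - 1


for first, last in chunk_ranges(2500, 1000):
    print(range_header(first, last))
```

Each header would be sent with a GET on the object URI; the chunk size is a tuning choice and not something the slides specify.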
  • 21. Security and Privacy Features. Authenticated downloads from a web browser: • Sharing with individuals • Group sharing via Google Groups • Sharing with Google Apps domains. Permissions set on Buckets or Objects: • READ (an object, or list a bucket’s contents) • WRITE (applicable to buckets, allows upload/delete/etc) • FULL_CONTROL (read/write ACLs on objects or buckets)
  • 22. Tools: Google Storage Manager, gsutil
  • 23. Google Storage Benefits • High Performance and Scalability: Backed by Google infrastructure • Strong Security and Privacy: Control access to your data • Easy to Use: Get started fast with Google & 3rd party tools
  • 24. Some Early Google Storage Adopters
  • 25. Google Storage usage within Google: Google BigQuery, Google Prediction API, Haiti Relief Imagery, USPTO data, Partner Reporting, Partner Reporting
  • 26. Google Storage - Availability: Limited preview in US* currently • 100GB free storage and network per account • Sign up for wait list at http://code.google.com/apis/storage/ * Non-US preview available on case-by-case basis
  • 27. Google Prediction API: Google's prediction engine in the cloud
  • 28. Introducing the Google Prediction API • Google's sophisticated machine learning technology • Available as an on-demand RESTful HTTP web service
  • 29. A virtually endless number of applications... Customer Sentiment • Transaction Risk • Species Identification • Message Routing • Diagnostics • Churn Prediction • Legal Docket Classification • Suspicious Activity • Work Roster Assignment • Inappropriate Content • Recommend Products • Political Bias • Uplift Marketing • Email Filtering • Career Counseling ... and many more ...
  • 30. How does it work? 1. TRAIN: The Prediction API finds relevant features in the sample data during training. • "english": The quick brown fox jumped over the lazy dog. • "english": To err is human, but to really foul things up you need a computer. • "spanish": No hay mal que por bien no venga. • "spanish": La tercera es la vencida. 2. PREDICT: The Prediction API later searches for those features during prediction. • ?: To be or not to be, that is the question. • ?: La fe mueve montañas.
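The train/predict idea on slide 30 can be mimicked with a toy word-count classifier. This is not how the Prediction API works internally (the slides do not say which techniques it uses). It is a deliberately naive stand-in that treats each word as a feature, using the slide's own sample sentences.

```python
import re
from collections import Counter, defaultdict

# Labelled sample sentences copied from the slide.
TRAINING = [
    ("english", "The quick brown fox jumped over the lazy dog."),
    ("english", "To err is human, but to really foul things up you need a computer."),
    ("spanish", "No hay mal que por bien no venga."),
    ("spanish", "La tercera es la vencida."),
]


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def train(examples):
    """Count how often each word occurs under each label."""
    model = defaultdict(Counter)
    for label, text in examples:
        model[label].update(tokenize(text))
    return model


def predict(model, text):
    """Pick the label whose training words overlap most with the text."""
    tokens = tokenize(text)
    scores = {label: sum(counts[t] for t in tokens) for label, counts in model.items()}
    return max(scores, key=scores.get)


model = train(TRAINING)
print(predict(model, "To be or not to be, that is the question."))
print(predict(model, "La fe mueve montañas."))
```

With only four training sentences the overlap is thin ("to", "is", "the" for English, "la" for Spanish), which is enough for the two queries on the slide but nothing more.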
  • 31. Introducing the Google Prediction API
  • 32. A Prediction API Example: Automatically determine application recommendations • Goal: Increase relevancy on the Apps Marketplace via recommendations • Customers: Businesses of various sizes and industries using Google Apps around the world • Data: Sampling of previous installs of applications • Outcome: Predict applications which would be appropriate for a new customer visiting the site
  • 33. Using the Prediction API: A simple three step process... 1. Upload: Upload your training data to Google Storage 2. Train: Build a model from your data 3. Predict: Make new predictions
  • 34. Step 1: Upload. Upload your training data to Google Storage • Training data: outputs and input features • Data format: comma separated value format (CSV), result in first column "SlideRocket","EDUCATION","us","en","10","5" "MailChimp","BUSINESS","us","en","7","0" "MailChimp","STANDARD","se","sv","1","0" "Smartsheet","BUSINESS","us","en","13","4" Upload to Google Storage: gsutil cp installs gs://appdata/
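A training file in the layout shown on slide 34 (result in the first column, every field quoted) could be produced with Python's `csv` module. The helper name `training_csv` is hypothetical, and the slides do not name the feature columns, so they are treated as an opaque list here.

```python
import csv
import io


def training_csv(rows):
    """Serialize (label, features) pairs as fully quoted CSV, label first."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field, numbers included, matching the slide.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, features in rows:
        writer.writerow([label, *features])
    return buf.getvalue()


rows = [
    ("SlideRocket", ["EDUCATION", "us", "en", 10, 5]),
    ("MailChimp", ["BUSINESS", "us", "en", 7, 0]),
]
print(training_csv(rows), end="")
```

The resulting text would be saved to a local file (named `installs` on the slide) before the `gsutil cp` step.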
  • 35. Step 2: Train. Create a new model by training on data. To train a model: POST prediction/v1.1/training?data=appdata%2Finstalls Training runs asynchronously. To see if it has finished: GET prediction/v1.1/training/appdata%2Finstalls {"data":{ "data":"appdata/installs", "modelinfo":"estimated accuracy: 0.xx"}}
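The `appdata%2Finstalls` segment in both requests on slide 35 is the `bucket/object` name with its slash percent-encoded. A small sketch of building both paths follows. The function name `training_paths` is hypothetical, and the paths are kept relative exactly as on the slide.

```python
from urllib.parse import quote


def training_paths(bucket, obj):
    """Return (POST path to start training, GET path to check on it)."""
    # safe="" forces "/" to be encoded as %2F, as in the slide's example.
    data = quote(f"{bucket}/{obj}", safe="")
    return (
        f"prediction/v1.1/training?data={data}",
        f"prediction/v1.1/training/{data}",
    )


start_path, status_path = training_paths("appdata", "installs")
print("POST", start_path)
print("GET ", status_path)
```

Because training is asynchronous, a client would poll the second path until the reply carries the `modelinfo` field. What the reply looks like before training finishes is not shown in the slides.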
  • 36. Step 3: Predict. Apply the trained model to make predictions on new data. POST prediction/v1.1/query/appdata%2Finstalls/predict { "data":{ "input": { "mixture" : [ "EDUCATION","us","en","10","0" ]}}} Reply: { "data" : { "kind" : "prediction#output", "outputLabel":"Manymoon", "outputMulti" :[ {"label":"OffiSync", "score": x.xx}, {"label":"Zoho CRM", "score": x.xx}, {"label":"MailChimp", "score": x.xx}]}}
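The request and reply shapes on slide 36 can be handled with the standard `json` module. In this sketch the helper names are hypothetical and the scores in the sample reply are invented placeholders, since the slide only shows `x.xx`.

```python
import json


def predict_body(features):
    """Build the JSON body for a predict request from a list of input features."""
    return json.dumps({"data": {"input": {"mixture": list(features)}}})


def ranked_labels(reply_text):
    """Return the labels from outputMulti, highest score first."""
    reply = json.loads(reply_text)
    multi = reply["data"]["outputMulti"]
    return [m["label"] for m in sorted(multi, key=lambda m: m["score"], reverse=True)]


# Sample reply in the slide's shape; the numeric scores are made up.
sample_reply = json.dumps({
    "data": {
        "kind": "prediction#output",
        "outputLabel": "Manymoon",
        "outputMulti": [
            {"label": "OffiSync", "score": 0.2},
            {"label": "Zoho CRM", "score": 0.5},
            {"label": "MailChimp", "score": 0.3},
        ],
    }
})

print(predict_body(["EDUCATION", "us", "en", "10", "0"]))
print(ranked_labels(sample_reply))
```

Sorting `outputMulti` by score is one way to turn a single prediction into a short list of recommendations, which is what the Apps Marketplace example needs.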
  • 37. Demo!
  • 38. Demo Screenshots: Predicting apps for a 501-1,000 seat educational institution
  • 39. Demo Screenshots: Predicting apps for a 501-1,000 seat educational institution
  • 40. Demo Screenshots: Predicting apps for a small business
  • 41. Demo Screenshots: Predicting apps for a small business
  • 42. Prediction API Capabilities. Data: • Input Features: numeric or unstructured text • Output: up to hundreds of discrete categories, or continuous values. Training: • Many machine learning techniques • Automatically selected • Performed asynchronously. Access from many platforms: • Web app from Google App Engine • Apps Script (e.g. from Google Spreadsheet) • Desktop app
  • 43. Prediction API - Pricing. Free Quota in trial/development: • 100 predictions/day, 5MB trained/day • Available for 6 months. Paid Usage: • $10/month per project includes 10,000 predictions • Additional predictions are $0.50 per 1,000 • Absolute limit of 60,000 predictions per day • $0.002 per MB trained (max size per dataset is 100MB)
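As a worked example of the paid-usage figures on slide 43, the function below estimates one project's monthly bill. It assumes that additional predictions are billed pro rata rather than per started block of 1,000. The slide does not say which, so treat the result as an approximation.

```python
BASE_FEE = 10.00              # USD per project per month
INCLUDED_PREDICTIONS = 10_000
PRICE_PER_1000_EXTRA = 0.50   # USD per 1,000 additional predictions
PRICE_PER_MB_TRAINED = 0.002  # USD per MB of training data


def monthly_cost(predictions, mb_trained):
    """Estimate the monthly charge in USD for a single project."""
    extra = max(0, predictions - INCLUDED_PREDICTIONS)
    return (
        BASE_FEE
        + extra / 1000 * PRICE_PER_1000_EXTRA  # assumption: pro-rata billing
        + mb_trained * PRICE_PER_MB_TRAINED
    )


# 25,000 predictions and 50 MB trained: 10 + 15 * 0.50 + 50 * 0.002
print(round(monthly_cost(25_000, 50), 2))
```

The 60,000 predictions/day cap and the 100MB per-dataset limit are not modelled here.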
  • 44. Prediction API - Availability: Limited preview in US* currently • Sign up for wait list at http://code.google.com/apis/predict/ * Non-US preview available on case-by-case basis
  • 45. Google BigQuery: Interactive analysis of large datasets in Google's cloud
  • 46. Introducing Google BigQuery – Google's large data ad hoc analysis technology • Analyze massive amounts of data in seconds – Simple SQL-like query language – Flexible access • REST APIs, JSON-RPC, Google Apps Script
  • 47. Why BigQuery? Working with large data is a challenge
  • 48. Many Use Cases ... Spam • Trends Detection • Interactive Tools • Web Dashboards • Network Optimization
  • 49. Key Capabilities of BigQuery • Scalable: Billions of rows • Fast: Response in seconds • Simple: Queries in SQL • Web Service o REST o JSON-RPC o Google Apps Script
  • 50. Using BigQuery: Another simple three step process... 1. Upload: Upload your raw data to Google Storage 2. Import: Import raw data into BigQuery table 3. Query: Perform SQL queries on table
  • 51. Writing Queries. Compact subset of SQL: o SELECT ... FROM ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...; Common functions: o Math, String, Time, ... Additional statistical approximations: o TOP o COUNT DISTINCT
  • 52. BigQuery via REST: GET /bigquery/v1/tables/{table name} GET /bigquery/v1/query?q={query} Sample JSON Reply: { "results": { "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] }, "rows": [ {"f":[{"v":"2949"}, ...]}, {"f":[{"v":"5387"}, ...]}, ... ] } } Also supports JSON-RPC
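A client for the REST calls on slide 52 mostly needs to URL-encode the query and unpack the `rows` / `f` / `v` nesting of the reply. The sketch below does both. The helper names are hypothetical, and the sample reply assumes that `fields` is a plain JSON list, because the extra braces shown on the slide would not parse as JSON.

```python
import json
from urllib.parse import urlencode


def query_path(sql):
    """Return the request path for running a query, with the SQL URL-encoded."""
    return "/bigquery/v1/query?" + urlencode({"q": sql})


def rows_as_lists(reply_text):
    """Flatten the reply's rows into plain lists of cell values."""
    reply = json.loads(reply_text)
    return [[cell["v"] for cell in row["f"]] for row in reply["results"]["rows"]]


# Sample reply modelled on the slide, with "fields" assumed to be a list.
sample_reply = json.dumps({
    "results": {
        "fields": [{"id": "COUNT(*)", "type": "uint64"}],
        "rows": [{"f": [{"v": "2949"}]}, {"f": [{"v": "5387"}]}],
    }
})

print(query_path("SELECT 1"))
print(rows_as_lists(sample_reply))
```

Values arrive as strings even for numeric types such as `uint64`, so a caller would convert them using the `type` given in `fields`.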
  • 53. Security and Privacy. Standard Google Authentication: • Client Login • OAuth • AuthSub. HTTPS support: • protects your credentials • protects your data. Relies on Google Storage to manage access
  • 54. Large Data Analysis Example: Wikimedia Revision History. Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z
  • 55. Using BigQuery Shell: Python DB API 2.0 + B. Clapper's sqlcmd http://www.clapper.org/software/python/sqlcmd/
  • 56. BigQuery from a Spreadsheet
  • 57. Google Fusion Tables
  • 58. Google Fusion Tables • Manage large collections of tabular data in the cloud • 100 MB tables • Filters, Aggregation, Merge • ACL, Collaboration, Discuss Data • Visualizations • REST API • Geo queries • Maps Integration • FusionTablesLayer
  • 59. Google Fusion Tables
  • 60. Google Visualization API
  • 61. Google Visualization API • Collection of JavaScript Visualization components • Some from Google (Chart Tools) • Some from other developers • Share the same wire protocol for Data Sources
  • 62. Example: Weather data • US National Climatic Data Center • weather data at stations around the globe since 1929 • Stored in Google Storage • Created a Table for BigQuery • Upload Weather Station coordinates in Fusion Tables • App Engine App • Maps API to display weather station Maps • BigQuery to query average temperature in January • A bit of Python to create a JSON Data Source • Visualization API • Just an example: rinse, repeat, enhance!
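The "bit of Python to create a JSON Data Source" on slide 62 is not shown in the deck. A minimal sketch of the idea, assuming the Visualization API's DataTable JSON layout (`cols` with id/label/type, `rows` with `c` cells holding `v` values), might look like this. The station names and temperatures are invented, and the real app's code may differ.

```python
import json


def to_datatable(stations):
    """Turn (station name, average temperature) pairs into a DataTable-style dict."""
    return {
        "cols": [
            {"id": "station", "label": "Station", "type": "string"},
            {"id": "avg_temp", "label": "Average January temperature", "type": "number"},
        ],
        "rows": [{"c": [{"v": name}, {"v": temp}]} for name, temp in stations],
    }


# Invented sample values standing in for BigQuery results.
stations = [("STATION_A", -3.5), ("STATION_B", 12.0)]
print(json.dumps(to_datatable(stations), indent=2))
```

A full data source response wraps a table like this in the wire protocol's envelope, which is omitted here.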
  • 63. Example: Weather data
  • 64. Google Refine
  • 65. Google Refine • Power tool for working with messy data • Cleanup • Transform • Augment • (Link with Freebase) • Desktop software for now • http://code.google.com/p/google-refine/
  • 66. Google Refine
  • 67. Recap • Google App Engine o Easy to build, deploy and manage web apps • Google Storage o High speed data storage on Google Cloud • Prediction API o Google's machine learning technology • BigQuery o Interactive analysis of very large data sets • Google Fusion Tables o Manage collections of tabular data in the cloud • Google Refine o Power tool for working with messy data • Google Visualization o Collection of JavaScript visualization components