Treasure Data
                      The architecture of data analytics PaaS on AWS



                                    Masahiro Nakagawa

                                   JAWS Days: 2013/03/16




Friday, April 5, 13
Who are you?
          Masahiro Nakagawa
              • @repeatedly / masa@treasure-data.com


          Treasure Data, Inc.
              • Senior Software Engineer, since 2012/11

          Open Source projects
              •   D Programming Language
              •   MessagePack: D, Python, etc...
              •   Fluentd: Core, mongo, etc...
              •   etc...

Introduction to
          Treasure Data




Company Overview
          Silicon Valley-based Company
              • All Founders are Japanese
                      • Hironobu Yoshikawa
                      • Kazuki Ohta
                      • Sadayuki Furuhashi


          OSS Enthusiasts
              • MessagePack, Fluentd, etc.




Investors
             Bill Tai
             Naren Gupta - Nexus Ventures, Director of Redhat, TIBCO
             Othman Laraki - Former VP Growth at Twitter
             James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
             Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
             Yukihiro “Matz” Matsumoto - Creator of Ruby
             Dan Scheinman - Director of Arista Networks
             Jerry Yang - Founder of Yahoo!
             + 10 more people
              • and....
Treasure Data = Cloud + Big Data
     [Figure: market map of data platforms]
          Vertical axis: On-Premise to Cloud
          Horizontal axis: Data Volume, with a boundary at 1Bil entry or 10TB
          On-Premise: Lightweight RDBMS, Enterprise RDBMS (DB2), Traditional Data Warehouse
          Cloud: Database-as-a-service, Big Data-as-a-Service
          Market sizes shown: $34B market, $10B market
          © 2012 Forrester Research, Inc. Reproduction Prohibited

Why Cloud? ‘Time’ is Money
     [Figure: Customer Value (vertical axis) over Time (horizontal axis), starting at Sign-up or PO]
          Curves: Ideal Expectation; Reality (On-Premise)
          Annotations on the Reality curve: HW/SW Selection, PoC, Deploy...; Obsolete over time; Upgrade

Big Data Adoption Stages
          Stage                    Question            Category
          Optimization             What’s the best?    Analytics
          Predictive Analysis      What’s a trend?     Analytics
          Statistical Analysis     Why?                Analytics
          Alerts                   Error?              Reporting
          Drill Down Query         Where exactly?      Reporting
          Ad-hoc Reports           Where?              Reporting
          Standard Reports         What happened?      Reporting

          Treasure Data’s FOCUS: Reporting (80% of needs)
          Stages are listed from highest to lowest intelligence sophistication.

Full Stack Support for Big Data Reporting

        Our best-in-class architecture and operations team ensure the
        integrity and availability of your data.

        Data from almost any source can be securely and reliably
        uploaded using td-agent in streaming or batch mode.

        Our SQL, REST, JDBC, ODBC and command-line interfaces
        support all major query tools and approaches.

        You can store gigabytes to petabytes of data efficiently and
        securely in our cloud-based columnar datastore.

Vision: Single Analytics Platform for the World

         Our Customers – Fortune Global 500 leaders and
         start-ups including:




Treasure Data’s
          Service Architecture




Treasure Data = Collect + Store + Query
Example in AdTech: MobFox




           1. Europe’s largest independent mobile ad exchange.
           2. 20 billion imps/month (circa Jan. 2013)
           3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
           4. Needed Big Data Analytics infrastructure ASAP.

Two Weeks From Start to Finish!




Used AWS Products (1)
          RDS
              • Store user information, job status, etc...
              • Store metadata of our columnar database
              • Worker queue (perfectqueue / perfectsched)


          EC2
              • API servers
              • Hadoop clusters
              • Job workers
                      • Using Chef to deploy


Used AWS Products (2)
          ELB
              • Load balancing of API servers
              • Load balancing of td-agents


          S3
              • Columnar storage built on top of S3
                      • MessagePack columnar format
                      • realtime / archive storage
              • Our Result feature supports S3 output.

                  No EMR, SQS, or other products!
Architecture Breakdown



      Data Collection
      • Increasing variety of data sources
      • No single data schema
      • Lack of streaming data collection method
      • 60% of Big Data project resources consumed

      Data Store/Analytics
      • Remaining complexity in both traditional DWH and Hadoop (very slow time to market)
      • Challenges in scaling data volume and expanding cost

      Connectivity
      • Required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC
      • Output to other services, e.g. S3, RDBMS, etc.

1) Data Collection
          60% of BI project resources are consumed here
          Most ‘underestimated’ and ‘unsexy’ but MOST important
          Fluentd: OSS lightweight but robust Log Collector
              • http://fluentd.org/




Fluentd
                      the missing log collector



                               fluentd.org

In short
             Open source log collector written in Ruby
             Using rubygems ecosystem for plugins



                  It’s like syslogd, but
              uses JSON for log messages

[Figure: a Fluentd event in flight]
        Apache --write--> access log --tail--> Fluentd (event buffering) --insert--> Mongo

        Access log lines:
          127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...
          127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ...
          127.0.0.1 - - [11/Dec/2012:07:26:32] "GET / ...
          127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...
          127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ...
          ...

        Resulting event:
          Time    2012-02-04 01:33:51
          Tag     apache.log
          Record  {
                    "host": "127.0.0.1",
                    "method": "GET",
                    "path": "/",
                    ...
                  }

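As a rough illustration of the time / tag / record triple shown on this slide, here is a minimal Python sketch that turns one access-log line into such an event. It is not Fluentd's own parser: the regular expression only covers the shortened log lines shown on the slide (no timezone field), and the names `LINE_RE` and `to_event` are made up for this example.

```python
import re
from datetime import datetime

# Assumption: simplified pattern for the shortened log lines on the slide.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def to_event(line, tag="apache.log"):
    """Turn one access-log line into a (tag, time, record) triple, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    time = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S")
    record = {
        "host": m.group("host"),
        "method": m.group("method"),
        "path": m.group("path"),
    }
    return tag, time, record


if __name__ == "__main__":
    print(to_event('127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / HTTP/1.1" 200 123'))
```

The record is a plain dictionary, which mirrors the "like syslogd, but uses JSON" point: every downstream output receives structured fields instead of a raw string.
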
Architecture
             Pluggable     Pluggable   Pluggable



                  Input     Buffer     Output

             > Forward     > Memory    > Forward
             > HTTP        > File      > File
             > File tail               > Amazon S3
             > dstat                   > MongoDB
             > ...                     > ...
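
The input / buffer / output split above can be sketched in a few lines of Python. This is only an illustration of the pluggable design, not Fluentd's plugin API (Fluentd plugins are written in Ruby); the classes `ListInput`, `MemoryBuffer`, `ListOutput` and the function `run` are hypothetical names invented for this example.

```python
class ListInput:
    """Hypothetical input plugin: yields events from an in-memory list."""

    def __init__(self, events):
        self.events = events

    def read(self):
        yield from self.events


class MemoryBuffer:
    """Hypothetical buffer plugin: collects events into fixed-size chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunk = []

    def append(self, event):
        # Return a full chunk when one is ready, otherwise None.
        self.chunk.append(event)
        if len(self.chunk) >= self.chunk_size:
            return self.flush()
        return None

    def flush(self):
        chunk, self.chunk = self.chunk, []
        return chunk


class ListOutput:
    """Hypothetical output plugin: records every chunk it is asked to write."""

    def __init__(self):
        self.chunks = []

    def write(self, chunk):
        self.chunks.append(chunk)


def run(inp, buf, out):
    """Move events from input to output through the buffer."""
    for event in inp.read():
        chunk = buf.append(event)
        if chunk:
            out.write(chunk)
    rest = buf.flush()  # write whatever is left at shutdown
    if rest:
        out.write(rest)


if __name__ == "__main__":
    out = ListOutput()
    run(ListInput(["e1", "e2", "e3", "e4", "e5"]), MemoryBuffer(2), out)
    print(out.chunks)  # [['e1', 'e2'], ['e3', 'e4'], ['e5']]
```

Because each stage only depends on a small interface (`read`, `append`/`flush`, `write`), any of the three can be swapped without touching the others, which is the point of the three "Pluggable" columns.
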

Before Fluentd
              Server1           Server2               Server3

          Application         Application           Application


                        ・・・               ・・・                    ・・・




                                                High Latency!
                                                must wait for a day...
                               Fluent
                              Log Server
After Fluentd
              Server1                Server2              Server3

          Application            Application             Application


               Fluentd   ・・・         Fluentd   ・・・        Fluentd   ・・・




                                                     In streaming!

                           Fluentd             Fluentd

[Figure: Fluentd sits between log sources and destinations, doing filter / buffer / routing]
     Sources
       • Access logs: Apache
       • App logs: Frontend, Backend
       • System logs: syslogd
       • Databases
     Destinations
       • Alerting: Nagios
       • Analysis: MongoDB, MySQL, Hadoop
       • Archiving: Amazon S3

td-agent
             Open source distribution package of fluentd
             ETL part of Treasure Data
             Including useful components
                 • ruby, jemalloc, fluentd
                 • 3rd party gems: td, mongo, webhdfs, etc...
                      •   td plugin is for Treasure Data

             http://packages.treasure-data.com/



Treasure Data Service Architecture
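     [Figure: end-to-end data flow, with a "This!" callout for td-agent]
          Data sources (Apache, App, RDBMS, other data sources) -> td-agent -> Treasure Data columnar data warehouse
          User (td-command) and BI apps (JDBC, REST) -> Query API -> Query Processing Cluster
          Query Processing Cluster: HIVE, PIG (to be supported), run as MAPREDUCE JOBS against the columnar data warehouse
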
AWS plugins
             S3
             SNS
             SQS
             DynamoDB
             forward-aws
             RDS                       http://fluentd.org/plugin/
             RedShift
             CloudWatch
             Yet Another Cloud Watch
             CloudWatch Lite

2) Data Store / Analytics - Columnar Storage




Treasure Data Service Processing Flow
     [Figure: job processing flow]
          Frontend -> Job Queue -> Worker -> Hadoop

     [Figure: metrics flow]
          Applications push metrics to Fluentd (via local Fluentd)
          Fluentd sums up data minutes (partial aggregation)
          -> Treasure Data, for historical analysis
          -> Librato Metrics, for realtime analysis

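The partial aggregation step in the metrics flow can be pictured as summing values per minute before they are forwarded. The sketch below is an assumption about what such a step might look like, not the actual Fluentd configuration used here; the event shape `(epoch_seconds, name, value)` and the function name `aggregate_per_minute` are hypothetical.

```python
from collections import defaultdict


def aggregate_per_minute(events):
    """Sum metric values per (minute, name).

    Each event is assumed to be (epoch_seconds, name, value).
    """
    totals = defaultdict(int)
    for ts, name, value in events:
        minute = ts - ts % 60  # start of the minute this event falls in
        totals[(minute, name)] += value
    return dict(totals)


if __name__ == "__main__":
    events = [(0, "req", 1), (59, "req", 2), (60, "req", 5), (61, "err", 1)]
    print(aggregate_per_minute(events))
    # {(0, 'req'): 3, (60, 'req'): 5, (60, 'err'): 1}
```

Forwarding one row per minute and metric, rather than one row per event, keeps the volume sent to the realtime service small while the raw events can still go to long-term storage.
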
Structure of Columnar Storages

     [Figure: structure of the columnar storages]
          Operations: import, bulk import, SELECT ...
          Stores: Import Storage, Bulk Import Storage, Realtime Storage, Archive Storage
          Realtime Storage -> Archive Storage: merge (every 1 hour)

          Realtime Storage entries         Archive Storage entries
          23c82b0ba3405d4c15aa85d2190e     2013-03-15 00:23:00 912ec80
          6d7b1482412ab14f0332b8aee119     2013-03-16 00:01:00 277a259
          8a7bc848b2791b8fd603c719e54f     ...
          0e3d402b17638477c9a7977e7dab
          ...

Query Language




                      Query Execution




                      Columnar Data




                      Object Storage




1/4: Compile SQL into MapReduce

                         SQL Statement
                                  SELECT COUNT(DISTINCT ip) FROM tbl;



                              Hive
                      SQL - to - MapReduce




2/4: MapReduce is executed in parallel

                                                           SELECT COUNT(DISTINCT ip) FROM tbl;




                      cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)
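
To make the SQL-to-MapReduce step concrete, here is a single-process Python sketch of how `SELECT COUNT(DISTINCT ip) FROM tbl` can be expressed as map, shuffle and reduce phases. It illustrates the general idea only and is not the plan Hive actually generates; the function names and the `splits` input shape are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit (ip, None) for each row; the ip acts as the shuffle key."""
    for row in rows:
        yield row["ip"], None


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Each key appears once after grouping, so counting keys counts distinct ips."""
    return sum(1 for _ in groups)


def count_distinct_ip(splits):
    """Run map over every input split, then shuffle and reduce."""
    pairs = (pair for split in splits for pair in map_phase(split))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    splits = [
        [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
        [{"ip": "10.0.0.1"}],
    ]
    print(count_distinct_ip(splits))  # 2
```

In a real cluster each split would be mapped on a different node, which is where the parallelism on this slide comes from.
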



3/4: Columnar Data Access

                                                              SELECT COUNT(DISTINCT ip) FROM tbl;




                      10Gbps Network




                                       Read ONLY the Required Part of Data
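
A small Python sketch of why a columnar layout lets the query read only what it needs: with one list per column, `COUNT(DISTINCT ip)` touches the `ip` column and nothing else. The `ColumnStore` class is a hypothetical in-memory stand-in, not Treasure Data's MessagePack-based format; it records which columns were read so the effect is visible.

```python
class ColumnStore:
    """Hypothetical in-memory column store that tracks which columns are read."""

    def __init__(self, rows, names):
        # Pivot row-oriented records into one list per column.
        self._columns = {name: [row.get(name) for row in rows] for name in names}
        self.columns_read = []

    def read(self, name):
        self.columns_read.append(name)
        return self._columns[name]


def count_distinct(store, name):
    """Equivalent of SELECT COUNT(DISTINCT name): only one column is fetched."""
    return len(set(store.read(name)))


if __name__ == "__main__":
    rows = [
        {"ip": "10.0.0.1", "path": "/", "agent": "curl"},
        {"ip": "10.0.0.2", "path": "/a", "agent": "curl"},
        {"ip": "10.0.0.1", "path": "/b", "agent": "wget"},
    ]
    store = ColumnStore(rows, ["ip", "path", "agent"])
    print(count_distinct(store, "ip"))  # 2
    print(store.columns_read)           # ['ip']
```

With a row-oriented layout the same query would have to scan `path` and `agent` as well, so the saving grows with the number of columns per record.
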


4/4: Object-based Storage




Data first, Schema later


            SELECT           54 (int)    "test" (string)    120 (int)    NULL
            Schema           user:int    name:string        value:int    host:int
            Raw data (JSON)  {"user":54, "name":"test", "value":"120", "host":"local"}

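The example on this slide can be reproduced with a short Python sketch of schema-on-read: the raw JSON is stored untouched, and the schema is applied only when the data is selected. The casting rules below (a value that cannot be converted becomes NULL) are inferred from the slide's output and are an assumption, as are the function names `cast` and `apply_schema`.

```python
import json


def cast(value, type_name):
    """Cast a raw value to the schema type; return None (SQL NULL) if it does not fit."""
    if type_name == "int":
        try:
            return int(value)
        except (TypeError, ValueError):
            return None
    if type_name == "string":
        return None if value is None else str(value)
    raise ValueError(f"unsupported type: {type_name}")


def apply_schema(raw_json, schema):
    """Apply a name -> type schema to one raw JSON record at read time."""
    record = json.loads(raw_json)
    return {name: cast(record.get(name), type_name) for name, type_name in schema.items()}


if __name__ == "__main__":
    raw = '{"user":54, "name":"test", "value":"120", "host":"local"}'
    schema = {"user": "int", "name": "string", "value": "int", "host": "int"}
    print(apply_schema(raw, schema))
    # {'user': 54, 'name': 'test', 'value': 120, 'host': None}
```

Because the raw record is never rewritten, changing `host:int` to `host:string` later would make the same stored data return "local" instead of NULL.
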
3) Connectivity

     [Figure: connectivity paths]
          Query:  td-command (REST API) and BI apps (JDBC, ODBC Driver) -> Query API -> Query Processing Cluster -> Treasure Data Columnar Storage
          Result: Web App, MySQL, S3, …

Multi-Tenancy
    All customers share the Hadoop clusters (Multi Data Centers)
    Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade

     [Figure: scheduling across data centers]
          Job Submission + Plan Change -> Global Scheduler (On-Demand Resource Allocation)
          Global Scheduler -> Local FairScheduler in each of datacenter A, B, C, D

Conclusion
          Treasure Data
              • Cloud based Big-data analytics platform
              • Provide Machete for Big data reporting

          Big Data processing
              • Collect / Store / Analytics / Visualization
                       Our focus!
          AWS products we use
              • EC2, S3, RDS, ELB
              • Building Treasure Data specific systems on AWS


Big Data for the Rest of Us

                      www.treasure-data.com | @TreasureData




Friday, April 5, 13

Mais conteúdo relacionado

Mais procurados

Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정PgDay.Seoul
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2PgTraining
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabaseTung Nguyen Thanh
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid arupmalakar
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우PgDay.Seoul
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLUmair Mansoob
 
20221117_クラウドネイティブ向けYugabyteDB活用シナリオ
20221117_クラウドネイティブ向けYugabyteDB活用シナリオ20221117_クラウドネイティブ向けYugabyteDB活用シナリオ
20221117_クラウドネイティブ向けYugabyteDB活用シナリオMasaki Yamakawa
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재PgDay.Seoul
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
 

Mais procurados (20)

Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
[pgday.Seoul 2022] 서비스개편시 PostgreSQL 도입기 - 진소린 & 김태정
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
 
Performance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL DatabasePerformance Tuning And Optimization Microsoft SQL Database
Performance Tuning And Optimization Microsoft SQL Database
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Streaming sql and druid
Streaming sql and druid Streaming sql and druid
Streaming sql and druid
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우Mvcc in postgreSQL 권건우
Mvcc in postgreSQL 권건우
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Migrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQLMigrating Oracle database to PostgreSQL
Migrating Oracle database to PostgreSQL
 
20221117_クラウドネイティブ向けYugabyteDB活用シナリオ
20221117_クラウドネイティブ向けYugabyteDB活用シナリオ20221117_クラウドネイティブ向けYugabyteDB活用シナリオ
20221117_クラウドネイティブ向けYugabyteDB活用シナリオ
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 

Destaque

Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSAmazon Web Services
 
AWS User Group Sydney - Atlassian 5-10-16
AWS User Group Sydney - Atlassian 5-10-16AWS User Group Sydney - Atlassian 5-10-16
AWS User Group Sydney - Atlassian 5-10-16PolarSeven Pty Ltd
 
应用开发利器 IBM Bluemix平台云介绍
应用开发利器 IBM Bluemix平台云介绍应用开发利器 IBM Bluemix平台云介绍
应用开发利器 IBM Bluemix平台云介绍Hardway Hou
 
Hadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyHadoop meets Cloud with Multi-Tenancy
Hadoop meets Cloud with Multi-TenancyTreasure Data, Inc.
 
PaaS is dead, Long live PaaS - Defrag 2016
PaaS is dead, Long live PaaS - Defrag 2016PaaS is dead, Long live PaaS - Defrag 2016
PaaS is dead, Long live PaaS - Defrag 2016brendandburns
 
Toyko azure meetup # 1 azure paa s overview
Toyko azure meetup # 1   azure paa s overviewToyko azure meetup # 1   azure paa s overview
Toyko azure meetup # 1 azure paa s overviewTokyo Azure Meetup
 
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingMicrosoft Azure vs Amazon Web Services (AWS) Services & Feature Mapping
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
 
Define y desarrolla tu primera api
Define y desarrolla tu primera apiDefine y desarrolla tu primera api
Define y desarrolla tu primera apiCloudAppi
 
Big data y las apis (big data spain)
Big data y las apis (big data spain)Big data y las apis (big data spain)
Big data y las apis (big data spain)CloudAppi
 
Big Data as PaaS in Enterprises
Big Data as PaaS in EnterprisesBig Data as PaaS in Enterprises
Big Data as PaaS in EnterprisesPankaj Khattar
 
Using Red Hat’s OpenShift PaaS to Develop Scalable Applications on AWS (DMG21...
Using Red Hat’s OpenShift PaaS to Develop Scalable Applications on AWS (DMG21...Using Red Hat’s OpenShift PaaS to Develop Scalable Applications on AWS (DMG21...
Using Red Hat’s OpenShift PaaS to Develop Scalable Applications on AWS (DMG21...Amazon Web Services
 
Database Consolidation using Oracle Multitenant
Database Consolidation using Oracle MultitenantDatabase Consolidation using Oracle Multitenant
Database Consolidation using Oracle MultitenantPini Dibask
 
Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSSN Masahiro
 
Data Analytics on AWS
Data Analytics on AWSData Analytics on AWS
Data Analytics on AWSDanilo Poccia
 
Microsoft PaaS Cloud Windows Azure Platform
Microsoft PaaS Cloud Windows Azure PlatformMicrosoft PaaS Cloud Windows Azure Platform
Microsoft PaaS Cloud Windows Azure PlatformEsri
 
(Draft) lambda architecture by using TreasureData
(Draft) lambda architecture by using TreasureData(Draft) lambda architecture by using TreasureData
(Draft) lambda architecture by using TreasureDataToru Takahashi
 
BIG DATA en CLOUD PaaS para Internet de las Cosas (IoT)
BIG DATA en CLOUD PaaS para Internet de las Cosas (IoT)BIG DATA en CLOUD PaaS para Internet de las Cosas (IoT)
BIG DATA en CLOUD PaaS para Internet de las Cosas (IoT)pmluque
 

Treasure Data PaaS Architecture on AWS

  • 1. Treasure Data: The architecture of data analytics PaaS on AWS. Masahiro Nakagawa. JAWS Days: 2013/03/16. Friday, April 5, 13
  • 2. Who are you? Masahiro Nakagawa: @repeatedly / masa@treasure-data.com. Treasure Data, Inc.: Senior Software Engineer, since 2012/11. Open Source projects: D Programming Language; MessagePack (D, Python, etc.); Fluentd (core, mongo, etc.); and others.
  • 3. Introduction to Treasure Data
  • 4. Company Overview. Silicon Valley-based company; all founders are Japanese: Hironobu Yoshikawa, Kazuki Ohta, Sadayuki Furuhashi. OSS enthusiasts: MessagePack, Fluentd, etc.
  • 5. Investors: Bill Tai; Naren Gupta (Nexus Ventures; Director of Red Hat, TIBCO); Othman Laraki (former VP Growth at Twitter); James Lindenbaum, Adam Wiggins, Orion Henry (Heroku founders); Anand Babu Periasamy, Hitesh Chellani (Gluster founders); Yukihiro “Matz” Matsumoto (creator of Ruby); Dan Scheinman (Director of Arista Networks); Jerry Yang (founder of Yahoo!); and 10 more people.
  • 6. Treasure Data = Cloud + Big Data. [Figure: market map contrasting cloud offerings (Big Data-as-a-Service, Database-as-a-Service) with on-premise ones (lightweight RDBMS, traditional RDBMS, enterprise data warehouses such as DB2) along a data-volume axis marked at 1 billion entries or 10 TB, with markets labeled $34B and $10B. © 2012 Forrester Research, Inc. Reproduction Prohibited]
  • 7. Why Cloud? ‘Time’ is Money. [Figure: value plotted against time, starting at sign-up or PO. The ideal customer expectation is compared with the on-premise reality: HW/SW selection, PoC, deployment, then upgrades as the system becomes obsolete over time.]
  • 8. Big Data Adoption Stages, ordered by increasing sophistication and intelligence. Reporting: Standard Reports (What happened?), Ad-hoc Reports (Where?), Drill Down Query (Where exactly?), Alerts (Error?). Analytics: Statistical Analysis (Why?), Predictive Analysis (What’s a trend?), Optimization (What’s the best?). The chart marks Treasure Data’s focus (80% of needs).
  • 9. Full Stack Support for Big Data Reporting. Our best-in-class architecture and operations team ensure the integrity and availability of your data. Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode. Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches. You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
  • 10. Vision: Single Analytics Platform for the World
  • 11. Our Customers: Fortune Global 500 leaders and start-ups. [Customer logos]
  • 12. Treasure Data’s Service Architecture
  • 13. Treasure Data = Collect + Store + Query
  • 14. Example in AdTech: MobFox. 1. Europe’s largest independent mobile ad exchange. 2. 20 billion impressions/month (circa Jan. 2013). 3. Serving ads for 15,000+ mobile apps (circa Jan. 2013). 4. Needed Big Data analytics infrastructure ASAP.
  • 15. Two Weeks From Start to Finish!
  • 16. Used AWS Products (1). RDS: stores user information, job status, etc.; stores metadata of our columnar database; serves as the worker queue (perfectqueue / perfectsched). EC2: API servers; Hadoop clusters; job workers; deployed using Chef.
  • 17. Used AWS Products (2). ELB: load balancing of API servers; load balancing of td-agents. S3: columnar storage built on top of S3; MessagePack columnar format; realtime / archive storage; our Result feature supports S3 output. No EMR, SQS or other products!
  • 18. Architecture Breakdown. Data Collection: increasing variety of data sources; no single data schema; lack of a streaming data collection method; 60% of Big Data project resources consumed. Data Store/Analytics: remaining complexity in both traditional DWH and Hadoop (very slow time to market); challenges in scaling data volume and expanding cost. Connectivity: required to ensure connectivity with existing BI/visualization/apps by JDBC, REST and ODBC; output to other services, e.g. S3, RDBMS, etc.
  • 19. 1) Data Collection. 60% of BI project resources are consumed here. The most ‘underestimated’ and ‘unsexy’ part, but the MOST important. Fluentd: a lightweight but robust OSS log collector (http://fluentd.org/).
  • 20. Fluentd: the missing log collector (fluentd.org)
  • 21. In short: an open-source log collector written in Ruby, using the RubyGems ecosystem for plugins. It’s like syslogd, but uses JSON for log messages.
  • 22. [Figure: a Fluentd event consists of Time (2012-02-04 01:33:51), Tag (apache.log) and Record ({ "host": "127.0.0.1", "method": "GET", "path": "/", ... }). Apache writes log lines such as 127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...; Fluentd tails them, buffers the events, and inserts them into Mongo.]
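The event shape on slide 22 (time, tag, record) can be illustrated with a short Python sketch. This is not Fluentd's own parser: the regular expression, the `to_event` helper and the sample line are assumptions made for illustration, and the part of the log line that the slide elides is matched loosely.

```python
import json
import re
from datetime import datetime

# Simplified pattern for the access-log lines shown on the slide. Everything
# after the request path is elided there, so it is deliberately not parsed.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)


def to_event(tag, line):
    """Turn one log line into a (tag, time, record) triple, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    time = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S")
    record = {
        "host": match.group("host"),
        "method": match.group("method"),
        "path": match.group("path"),
    }
    return tag, time, record


# Hypothetical sample line: the protocol, status code and size are made up.
sample = '127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / HTTP/1.1" 200 777'
event = to_event("apache.log", sample)
if event is not None:
    tag, time, record = event
    print(tag, time.isoformat(), json.dumps(record))
```

Keeping the record as a flat JSON-serializable dictionary is what lets downstream outputs such as MongoDB store it without a fixed schema.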
  • 23. Architecture. Pluggable Input: Forward, HTTP, File tail, dstat, ... Pluggable Buffer: Memory, File. Pluggable Output: Forward, File, Amazon S3, MongoDB, ...
  • 24. Before Fluentd [Diagram: the Applications on Server1, Server2, Server3, ... send their logs to a Log Server. High latency! Must wait for a day...]
  • 25. After Fluentd [Diagram: Server1, Server2, Server3, ... each run the Application with a local Fluentd, which sends logs on to further Fluentd instances. In streaming!]
  • 26. [Diagram: Fluentd as a hub for filter / buffer / routing]
      • Inputs
          • Access logs: Apache
          • App logs: Frontend, Backend
          • System logs: syslogd
          • Databases
      • Outputs
          • Alerting: Nagios
          • Analysis: MongoDB, MySQL, Hadoop
          • Archiving: Amazon S3
  • 27. td-agent
      • Open sourced distribution package of fluentd
      • ETL part of Treasure Data
      • Including useful components
          • ruby, jemalloc, fluentd
          • 3rd party gems: td, mongo, webhdfs, etc.
          • td plugin is for Treasure Data
      • http://packages.treasure-data.com/
  • 28. Treasure Data Service Architecture
      • Data sources: Apache, App, App, RDBMS, other data sources
      • td-agent (This!)
      • Treasure Data columnar data warehouse
      • Query Processing Cluster: MAPREDUCE JOBS; HIVE, PIG (to be supported)
      • Query API: td-command, JDBC, REST
      • Consumers: User, BI apps
  • 29. AWS plugins (http://fluentd.org/plugin/)
      • S3
      • SNS
      • SQS
      • DynamoDB
      • forward-aws
      • RDS
      • Redshift
      • CloudWatch
      • Yet Another Cloud Watch
      • CloudWatch Lite
  • 30. 2) Data Store / Analytics - Columnar Storage
  • 31. Treasure Data Service Processing Flow
      • Applications (Frontend, Worker, Job Queue, Hadoop, Hadoop) push metrics to Fluentd (via local Fluentd)
      • Fluentd sums up data per minute (partial aggregation)
      • Treasure Data: for historical analysis
      • Librato Metrics: for realtime analysis
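The partial aggregation step on this slide can be pictured as bucketing metric samples into one-minute windows and summing them before they are sent on. The sketch below assumes a sample is a `(unix_seconds, metric_name, value)` tuple; that shape and the function name are illustrative choices, not something the slides specify.

```python
from collections import defaultdict

def aggregate_per_minute(samples):
    """Sum metric values per (minute_start, metric_name) bucket.

    samples: iterable of (unix_seconds, metric_name, value) tuples.
    """
    totals = defaultdict(int)
    for timestamp, name, value in samples:
        minute_start = timestamp - timestamp % 60  # floor to the minute
        totals[(minute_start, name)] += value
    return dict(totals)

samples = [(0, "requests", 1), (59, "requests", 2), (60, "requests", 5), (61, "errors", 1)]
print(aggregate_per_minute(samples))
# {(0, 'requests'): 3, (60, 'requests'): 5, (60, 'errors'): 1}
```

Aggregating this way means the downstream services receive one row per metric per minute instead of one row per raw sample.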
  • 33. Structure of Columnar Storages
      • [Diagram labels: import, Import Storage, bulk import, Bulk Import Storage, Realtime Storage, Archive Storage, merge (every 1 hour), SELECT ...]
      • Sample rows shown in the diagram:
          • 23c82b0ba3405d4c15aa85d2190e  2013-03-15 00:23:00  912ec80
          • 6d7b1482412ab14f0332b8aee119  2013-03-16 00:01:00  277a259
          • 8a7bc848b2791b8fd603c719e54f  ...
          • 0e3d402b17638477c9a7977e7dab  ...
  • 34. [Layer diagram, top to bottom: Query Language / Query Execution / Columnar Data / Object Storage]
  • 35. 1/4: Compile SQL into MapReduce
      • SQL Statement: SELECT COUNT(DISTINCT ip) FROM tbl;
      • Hive: SQL-to-MapReduce
  • 36. 2/4: MapReduce is executed in parallel
      • SELECT COUNT(DISTINCT ip) FROM tbl;
      • cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)
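To make the two steps above concrete, here is a toy map/reduce formulation of `SELECT COUNT(DISTINCT ip) FROM tbl;`. It only shows the shape of the computation (independent work per data split, then a merge); it is not the plan Hive actually generates, and the function names and sample rows are made up for this example.

```python
def map_split(rows):
    """Map step: each split independently collects the distinct ips it sees."""
    return {row["ip"] for row in rows}

def reduce_distinct(partials):
    """Reduce step: union the per-split sets and count the result."""
    seen = set()
    for partial in partials:
        seen |= partial
    return len(seen)

# Two splits that could be processed on different nodes.
splits = [
    [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
    [{"ip": "10.0.0.2"}, {"ip": "10.0.0.3"}],
]
partials = [map_split(split) for split in splits]
print(reduce_distinct(partials))  # 3
```

The map calls share no state, so they can run in parallel across as many nodes as there are splits; only the final union needs to see all partial results.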
  • 37. 3/4: Columnar Data Access
      • SELECT COUNT(DISTINCT ip) FROM tbl;
      • 10Gbps Network
      • Read ONLY the Required Part of Data
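"Read ONLY the Required Part of Data" is the core benefit of a columnar layout: a query that touches one column never has to load the others. A minimal in-memory sketch, assuming every row has the same fields; the helper names are hypothetical and the real storage uses a MessagePack-based columnar format on S3, which is not reproduced here.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def count_distinct(columns, name):
    """Answer COUNT(DISTINCT name) by reading a single column."""
    return len(set(columns[name]))

rows = [
    {"ip": "10.0.0.1", "path": "/", "status": 200},
    {"ip": "10.0.0.2", "path": "/about", "status": 200},
    {"ip": "10.0.0.1", "path": "/", "status": 404},
]
columns = to_columns(rows)
print(count_distinct(columns, "ip"))  # 2; "path" and "status" are never read
```

If each column were stored as a separate object, the query above would fetch only the `ip` object over the network.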
  • 38. 4/4: Object-based Storage
  • 39. Data first, Schema later
      • Raw data (JSON): {"user":54, "name":"test", "value":"120", "host":"local"}
      • Schema: user:int, name:string, value:int, host:int
      • SELECT: 54 (int), "test" (string), 120 (int), NULL
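The slide's example can be reproduced with a small schema-on-read sketch: the raw JSON is stored untouched, and the schema is applied only when a query reads it. A value that cannot be cast to the declared type becomes NULL (`None` here), which is how `"host":"local"` under `host:int` ends up as NULL. The `apply_schema` function and the `CASTS` table are assumptions made for this example, not Treasure Data's implementation.

```python
import json

# Hypothetical mapping from schema type names to Python casts.
CASTS = {"int": int, "string": str}

def apply_schema(raw_json, schema):
    """Cast each schema column at read time; uncastable or missing values become None."""
    record = json.loads(raw_json)
    row = []
    for column, type_name in schema:
        try:
            row.append(CASTS[type_name](record[column]))
        except (KeyError, ValueError, TypeError):
            row.append(None)
    return row

schema = [("user", "int"), ("name", "string"), ("value", "int"), ("host", "int")]
raw = '{"user": 54, "name": "test", "value": "120", "host": "local"}'
print(apply_schema(raw, schema))  # [54, 'test', 120, None]
```

Since the raw record is never rewritten, changing `host` to `string` later would make the same stored data return `'local'` without any re-import.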
  • 40. 3) Connectivity
      • Clients: REST API, td-command, JDBC / ODBC Driver (BI apps)
      • Query API
      • Query Processing Cluster
      • Treasure Data Columnar Storage
      • Result: Web App, MySQL, S3, …
  • 41. Multi-Tenancy
      • All customers share the Hadoop clusters (Multi Data Centers)
      • Resource Sharing (Burst Cores), Rapid Improvement, Ease of Upgrade
      • [Diagram: Job Submission + Plan Change → Global Scheduler → Local FairScheduler in each of datacenter A, B, C, D; On-Demand Resource Allocation]
  • 42. Conclusion
      • Treasure Data
          • Cloud based Big-data analytics platform
          • Provide Machete for Big data reporting
      • Big Data processing
          • Collect / Store / Analytics / Visualization (Our focus!)
      • AWS products we use
          • EC2, S3, RDS, ELB
          • Building Treasure Data specific systems on AWS
  • 43. Big Data for the Rest of Us
      • www.treasure-data.com | @TreasureData