Treasure Data:
Big Data Analytics on Heroku
Muga Nishizawa, Chief Software Architect
Muga Nishizawa (@muga_nishizawa)
Chief Software Architect, Treasure Data
Treasure Data Overview
▪ Founded to deliver big data analytics in days, not months, without
  specialist IT resources, at one-tenth the cost of alternatives
▪ Service-based subscription business model
▪ World-class open source team
  • Founded the world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
▪ Treasure Data is in production
  • 20 customers, incl. Fortune 500 companies
  • 100+ billion records stored
▪ Processing 10,000 messages per second




Our Customers – Fortune Global 500
leaders and start-ups including:




One Hundred Billion Records and Growing!
[Chart: records stored, in billions (y-axis 20 to 120), Sep 2011 to Aug 2012]
Treasure Data Service
 “Store Your Data Now for Future Insights”




Treasure Data Service
[Diagram: data flow through the Treasure Data service]
  Inputs: User, Apache, App, RDBMS, other data sources
  Storage: Treasure Data columnar data storage
  Processing: query processing cluster running MapReduce jobs
  Query languages: Hive, Pig (to be supported)
  Access: td-command, query API (JDBC, REST)
  Consumers: User, BI apps
Treasure Data Service
[Same diagram as on the previous slide, overlaid with a sample record:]

    2012-02-04 01:33:51
    myappdb.buylog {
      "user": "12345",
      "path": "/buyItem",
      "price": 150,
      "referer": "/landing"
    }
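The sample record on this slide has three parts: a timestamp, a `database.table` tag, and a JSON body. A minimal sketch of that shape in plain Python follows. The `make_event` helper and its field names are hypothetical and are not part of any Treasure Data client; reading the slide's timestamp as UTC is also an assumption.

```python
import json
from datetime import datetime, timezone


def make_event(database, table, record, when):
    """Bundle a record with its destination tag and a Unix timestamp.

    Hypothetical helper for illustration only; not a Treasure Data API.
    """
    return {
        "tag": f"{database}.{table}",      # e.g. "myappdb.buylog"
        "time": int(when.timestamp()),     # seconds since the Unix epoch
        "record": record,                  # schemaless JSON body
    }


event = make_event(
    "myappdb",
    "buylog",
    {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"},
    # The slide shows no time zone; UTC is assumed here.
    datetime(2012, 2, 4, 1, 33, 51, tzinfo=timezone.utc),
)

print(json.dumps(event, sort_keys=True))
```

Keeping the body as free-form JSON matches the "store now, analyze later" message of the slide: no table schema has to be declared before the first record is stored.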
Treasure Data Service
[Same diagram, overlaid with a query issued through td-command:]

    $ td query -w -d myappdb \
        "SELECT
           TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
           COUNT(1) AS cnt
         FROM buylog
         GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT')
         ORDER BY cnt"
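To make clear what the query on this slide computes, here is a rough local equivalent in Python: bucket each record's Unix timestamp into a calendar day in a given time zone, count records per day, and order by count. The function name, the fixed UTC-7 offset standing in for "PDT", and the sample timestamps are all assumptions for illustration; this is not how the service itself evaluates the query.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset standing in for "PDT" (assumption for this sketch).
PDT = timezone(timedelta(hours=-7), "PDT")


def daily_counts(unix_times, tz=PDT):
    """Count records per calendar day in tz, ordered by count ascending."""
    days = Counter(
        datetime.fromtimestamp(t, tz).strftime("%Y-%m-%d") for t in unix_times
    )
    # Mirrors ORDER BY cnt (ascending is the SQL default).
    return sorted(days.items(), key=lambda kv: kv[1])


# Made-up sample timestamps: two events on one day, one on the next.
sample = [
    datetime(2012, 5, 26, 10, 0, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 26, 23, 30, tzinfo=PDT).timestamp(),
    datetime(2012, 5, 27, 0, 15, tzinfo=PDT).timestamp(),
]

for day, cnt in daily_counts(sample):
    print(day, cnt)
```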
Treasure Data Service
[Same diagram, overlaid with the query result:]

    +------------+------+
    | day        | cnt  |
    +------------+------+
    | 2012-05-26 | 4981 |
    | 2012-05-27 | 4481 |
    | 2012-05-28 |  481 |
    +------------+------+
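Comparing On-Premise & Cloud Big Data Markets
[Chart: deployment model (On-Premise to Cloud) vs. data volume (Low to High)]
  Cloud:      Database-as-a-Service (lower volume), Big Data-as-a-Service (higher volume)
  On-Premise: Traditional DBMS (ODS, Data Mart), Data Warehouse, Hadoop (in order of increasing volume)
© 2012 Forrester Research, Inc. Reproduction Prohibited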
Treasure Data as Heroku Add-on




Demo with Heroku




Synergy Effect for Data-Driven
Development!



                  ×


The Power of the Cloud

Easier to Scale
Easier to Maintain
Easier to Iterate



Implementation Process
Traditional DW and
On-Premise Big Data




Implementation Process
  Traditional DW and On-Premise Big Data
  Heroku × Treasure Data: dramatically streamlined implementation process
Viki.com: “Global Hulu”




Viki Before
▪ Hard to manage Hadoop
▪ Complicated data collection




Viki After
▪ No more Hadoop maintenance
▪ Versatile data collector, td-agent




Please Try It!




How Does It Work?




Query Processing
                   Query Language




                   Query Execution




                   Columnar Data




                   Object Storage




1/4: Compile SQL into MapReduce


                 SELECT COUNT(DISTINCT ip) FROM tbl;
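One way to picture how `SELECT COUNT(DISTINCT ip) FROM tbl;` can be expressed as MapReduce is sketched below in plain Python. It is a conceptual illustration under stated assumptions, not the plan the service's compiler actually emits: the function names and the in-memory "shuffle" are hypothetical stand-ins for the framework's phases.

```python
from itertools import groupby


def map_phase(rows):
    """Map: emit each row's ip as the key; the value is unused."""
    for row in rows:
        yield (row["ip"], None)


def shuffle(pairs):
    """Shuffle: bring all pairs with the same key together."""
    by_key = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(by_key, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reduce: each distinct key arrives exactly once, so count the keys."""
    return sum(1 for _key, _values in grouped)


def count_distinct_ip(rows):
    return reduce_phase(shuffle(map_phase(rows)))


# Made-up rows for illustration.
tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]
print(count_distinct_ip(tbl))
```

The point of the decomposition is that the map step needs no knowledge of other rows, so it can run wherever the data lives; only the grouping step moves data between workers.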




2/4: MapReduce is executed in parallel

                                         SELECT COUNT(DISTINCT ip) FROM tbl;




      cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)
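The parallel step can be sketched on a single machine as follows: each worker computes the distinct `ip` values of its own partition, and the partial results are merged at the end. Threads in one process stand in for cluster nodes here, and the partitioning, function names, and worker count are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def distinct_ips(partition):
    """Per-worker task: the set of distinct ips in one partition."""
    return {row["ip"] for row in partition}


def parallel_count_distinct(partitions, max_workers=4):
    """Run one task per partition, then merge the partial sets."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(distinct_ips, partitions))
    # The same ip may appear in several partitions, so merge before counting.
    return len(set().union(*partials))


# Made-up partitions; "10.0.0.2" appears in both.
partitions = [
    [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}],
    [{"ip": "10.0.0.2"}, {"ip": "10.0.0.3"}],
]
print(parallel_count_distinct(partitions))
```

Summing the per-partition counts would overcount values that occur in more than one partition, which is why the merge operates on sets rather than on numbers.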


3/4: Columnar Data Access
                                         SELECT COUNT(DISTINCT ip) FROM tbl;




   10Gbps Network




                    Read ONLY the Required Part of Data
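"Read only the required part of data" is the core benefit of a columnar layout. The toy sketch below pivots row-oriented records into one list per column, so a query that touches only `ip` never reads the other fields. The in-memory dictionaries and helper names are hypothetical, and the sketch assumes every row has the same keys; the service's real storage format is not described in the slides.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per column name."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def count_distinct(columns, name):
    """Touch only the requested column; other columns are never read."""
    return len(set(columns[name]))


# Made-up rows for illustration.
rows = [
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 150},
    {"ip": "10.0.0.2", "path": "/landing", "price": 0},
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 90},
]

columns = to_columns(rows)
print(count_distinct(columns, "ip"))
```

With a row layout, the same query would have to scan `path` and `price` as well, so the saving grows with the number and width of the unused columns.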


4/4: Object-based Storage




Enjoy Data-Driven Development!




Big Data for the Rest of Us

www.treasure-data.com | @TreasureData
Great Investors
▪ Bill Tai
▪ Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
▪ Dave Stamm – Clarify, Daisy Systems, Enkata
▪ Othman Laraki – Twitter
▪ James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
▪ Anand Babu Periasamy and Hitesh Chellani – Gluster
▪ Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
▪ Dan Scheinman – Former Cisco SVP
▪ Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
▪ + executives from Cisco, Red Hat, Salesforce.com, GREE




What are your options?
▪ Traditional
  • Too much complexity
  • Too long to get live
  • Too expensive to maintain
  • Can only innovate at speed of vendor
▪ On-Premise Hadoop
  • Never designed for analytic processing
  • Too many people
  • Too much software from too many sources
▪ Cloud Hadoop
  • Partial solution
  • Vendor lock-in
Example Use Case – MySQL to TD




                                 35
