SlideShare uma empresa Scribd logo
1 de 25
Baixar para ler offline
Data-Driven Development Era
and Its Technologies
Developers Summit 2015 Autumn (Oct 14, 2015)
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, Norikra, Hadoop, ...
Treasure Data, Inc.
HQ
Branch
http://www.treasuredata.com/
Main Topics around "Data"
• Data collection
• Storage
• Data processing
• Batch distributed processing
• Stream processing
• Machine Learning
• Near real-time query & Data lake
• Visualization
Data Analytics Flow
Collect Store Process Visualize
Data source
Reporting
Monitoring
Where before What
Using Services or Not
• Using services fully-managed:
• Google BigQuery & Dataflow
• Treasure Data services
• Using services self-managed:
• Amazon EMR & Redshift
• Google Cloud Dataproc
• Using your own environment & cluster
Using Services or Not
• Using services fully-managed:
• Google BigQuery & Dataflow
• Treasure Data services
• Using services self-managed:
• Amazon EMR & Redshift
• Google Cloud Dataproc
• Using your own environment & cluster
a bit more cost
extremely less efforts
fully controlled by self
extremely more efforts
less cost
less efforts
Using Services or Not:
"Use Services!"
To concentrate
DATA and Analytics,
NOT tools
Why should we use services?
• About distributed systems:
• hard to operate & upgrade
• impossible to "small-start"
• very hard to hire professional engineer
• Data Driven Development:
• collect/store data at first!
• consider output data at second!
• "before building your own environment"
Really? Are you TD guy?
• ...Really!
• But it requires very long discussions :P
• "スタートアップのデータ処理基盤、作るか、使うか"

http://tsuchinoko.dmmlabs.com/?p=1770
How to choose software/services
in
Data-Driven Development
"What" decides "How"
• Distributed systems are to solve problems
• There're many kind of data
• There're many problems
• Systems solve different problems from each other
• There are no "Silver bullet"!
What First, How Second
• What do you want to do?
• Reporting? Analytics? Recommendation? or ...
• What type of data you wan to process?
• Stored large log? Stream sensor data? or ...
• What is you need as result?
• CSV? Spreadsheet? Graph? DB Relation? or ...
How?(just for example)
• MapReduce, Tez
• Large batch jobs, big JOINs, high stability
• Spark
• Small/Middle batch jobs, machine learning
• Impala, Presto, Drill, Redshift, BigQuery
• Near-real-time search, small-to-large analytics
• Storm, Spark streaming
• Stream data conversion/aggregation
"Processing" is just a part
of whole dataflow!
Data Analytics Flow (again)
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Analytics Flow (again)
Collect Store Process Visualize
Data source
Reporting
Monitoring
Data Collection
• Data Driven Development -> collect at first!
• As batch: Data already exists as files
• Easily integrated with existing batch systems
• Sqoop, Embulk, ...
• As stream: Data just generated now
• Easily connected with monitoring systems
• Without burst network traffic
• Flume, Logstash, Fluentd, ...
Fluentd: Support Service
by SRA OSS
with Treasure Data
Released
TODAY!
Other Important Topics
• Storage: Performance, Availability, Schema management
• Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ...
• Visualization: Functionality, Connectivity, Visibility
• Tableau, Pentaho, Many other enterprise products, ...
• Distributed Queues: Performance, Stability, Connectivity
• Apache Kafka, Amazon Kinesis, ...
Get Familiar with Options
NOT to Take Pains about Technology!
Concentrate
DATA and Analytics,
NOT tools.
Thanks!

Mais conteúdo relacionado

Mais procurados

A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
Itai Yaffe
 

Mais procurados (20)

Presto
PrestoPresto
Presto
 
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group MeetupPresto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
Presto: SQL-on-Anything. Netherlands Hadoop User Group Meetup
 
Where Is My Data - ILTAM Session
Where Is My Data - ILTAM SessionWhere Is My Data - ILTAM Session
Where Is My Data - ILTAM Session
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Análisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic StackAnálisis del roadmap del Elastic Stack
Análisis del roadmap del Elastic Stack
 
Hadoop at Ebay
Hadoop at EbayHadoop at Ebay
Hadoop at Ebay
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方ストリーム処理を支えるキューイングシステムの選び方
ストリーム処理を支えるキューイングシステムの選び方
 
Using Elasticsearch for Analytics
Using Elasticsearch for AnalyticsUsing Elasticsearch for Analytics
Using Elasticsearch for Analytics
 
Using Embulk at Treasure Data
Using Embulk at Treasure DataUsing Embulk at Treasure Data
Using Embulk at Treasure Data
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLONAli Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
Ali Asad Lotia (DevOps at Beamly) - Riemann Stream Processing at #DOXLON
 
Amazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian MeyersAmazon Elastic Map Reduce - Ian Meyers
Amazon Elastic Map Reduce - Ian Meyers
 

Destaque

Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetup
Makoto Yui
 

Destaque (20)

Stream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LTStream processing in Mercari - Devsumi 2015 autumn LT
Stream processing in Mercari - Devsumi 2015 autumn LT
 
データファースト開発
データファースト開発データファースト開発
データファースト開発
 
失敗から学ぶ データ分析グループの チームマネジメント変遷
失敗から学ぶデータ分析グループのチームマネジメント変遷失敗から学ぶデータ分析グループのチームマネジメント変遷
失敗から学ぶ データ分析グループの チームマネジメント変遷
 
hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話hivemallを使って4日間で性別推定した話
hivemallを使って4日間で性別推定した話
 
Hivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetupHivemall v0.3の機能紹介@1st Hivemall meetup
Hivemall v0.3の機能紹介@1st Hivemall meetup
 
Sano hmm 20150512
Sano hmm 20150512Sano hmm 20150512
Sano hmm 20150512
 
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解するPGに簡単なゲームのやり方を学習させる Vol.1  - まずはQ学習を理解する
PGに簡単なゲームのやり方を学習させる Vol.1 - まずはQ学習を理解する
 
Henry Cipolla - Data Driven Development
Henry Cipolla - Data Driven DevelopmentHenry Cipolla - Data Driven Development
Henry Cipolla - Data Driven Development
 
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...AURA: Aerial Unpaved Roads Assessment System Demonstration  - Data Collection...
AURA: Aerial Unpaved Roads Assessment System Demonstration - Data Collection...
 
Sales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA SellersSales Tax Bootcamp for Amazon FBA Sellers
Sales Tax Bootcamp for Amazon FBA Sellers
 
(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications(MBL309) Analyze Mobile App Data and Build Predictive Applications
(MBL309) Analyze Mobile App Data and Build Predictive Applications
 
同じサービスを ECSとOpsWorksで 運用してみた
同じサービスをECSとOpsWorksで運用してみた同じサービスをECSとOpsWorksで運用してみた
同じサービスを ECSとOpsWorksで 運用してみた
 
(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS(BDT205) Your First Big Data Application On AWS
(BDT205) Your First Big Data Application On AWS
 
Improve Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile AppsImprove Monitoring & Monetization of Your Mobile Apps
Improve Monitoring & Monetization of Your Mobile Apps
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
BDT101 Big Data with Amazon Elastic MapReduce - AWS re: Invent 2012
 
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
(MBL311) Workshop: Build an Android App Using AWS Mobile Services | AWS re:In...
 
TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기TDD: Test Driven Development 첫번째 이야기
TDD: Test Driven Development 첫번째 이야기
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWS
 
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
黄色いゾウさんと愉快な仲間たちの近況報告 #hadoopreading
 

Semelhante a Data-Driven Development Era and Its Technologies

Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
Christopher Whitaker
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
Janani Eshwaran
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
Janani Eshwaran
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
Mercedes Coyle
 

Semelhante a Data-Driven Development Era and Its Technologies (20)

Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Accelerating analytics in a new era of data
Accelerating analytics in a new era of dataAccelerating analytics in a new era of data
Accelerating analytics in a new era of data
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real WorldWSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
WSO2Con ASIA 2016: Patterns for Deploying Analytics in the Real World
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 

Mais de SATOSHI TAGOMORI

Mais de SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Technologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise BusinessTechnologies, Data Analytics Service and Enterprise Business
Technologies, Data Analytics Service and Enterprise Business
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Perfect Norikra 2nd Season
Perfect Norikra 2nd SeasonPerfect Norikra 2nd Season
Perfect Norikra 2nd Season
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
How to Make Norikra Perfect
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra Perfect
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

Data-Driven Development Era and Its Technologies

  • 1. Data-Driven Development Era and Its Technologies Developers Summit 2015 Autumn (Oct 14, 2015) Satoshi Tagomori (@tagomoris)
  • 2. Satoshi "Moris" Tagomori (@tagomoris) Fluentd, Norikra, Hadoop, ... Treasure Data, Inc.
  • 3.
  • 6. Main Topics around "Data" • Data collection • Storage • Data processing • Batch distributed processing • Stream processing • Machine Learning • Near real-time query & Data lake • Visualization
  • 7. Data Analytics Flow Collect Store Process Visualize Data source Reporting Monitoring
  • 9. Using Services or Not • Using services fully-managed: • Google BigQuery & Dataflow • Treasure Data services • Using services self-managed: • Amazon EMR & Redshift • Google Cloud Dataproc • Using your own environment & cluster
  • 10. Using Services or Not • Using services fully-managed: • Google BigQuery & Dataflow • Treasure Data services • Using services self-managed: • Amazon EMR & Redshift • Google Cloud Dataproc • Using your own environment & cluster a bit more cost extremely less efforts fully controlled by self extremely more efforts less cost less efforts
  • 11. Using Services or Not: "Use Services!" To concentrate DATA and Analytics, NOT tools
  • 12. Why should we use services? • About distributed systems: • hard to operate & upgrade • impossible to "small-start" • very hard to hire professional engineer • Data Driven Development: • collect/store data at first! • consider output data at second! • "before building your own environment"
  • 13. Really? Are you TD guy? • ...Really! • But it requires very long discussions :P • "スタートアップのデータ処理基盤、作るか、使うか"
 http://tsuchinoko.dmmlabs.com/?p=1770
  • 14. How to choose software/services in Data-Driven Development
  • 15. "What" decides "How" • Distributed systems are to solve problems • There're many kind of data • There're many problems • Systems solve different problems from each other • There are no "Silver bullet"!
  • 16. What First, How Second • What do you want to do? • Reporting? Analytics? Recommendation? or ... • What type of data you wan to process? • Stored large log? Stream sensor data? or ... • What is you need as result? • CSV? Spreadsheet? Graph? DB Relation? or ...
  • 17. How?(just for example) • MapReduce, Tez • Large batch jobs, big JOINs, high stability • Spark • Small/Middle batch jobs, machine learning • Impala, Presto, Drill, Redshift, BigQuery • Near-real-time search, small-to-large analytics • Storm, Spark streaming • Stream data conversion/aggregation
  • 18. "Processing" is just a part of whole dataflow!
  • 19. Data Analytics Flow (again) Collect Store Process Visualize Data source Reporting Monitoring
  • 20. Data Analytics Flow (again) Collect Store Process Visualize Data source Reporting Monitoring
  • 21. Data Collection • Data Driven Development -> collect at first! • As batch: Data already exists as files • Easily integrated with existing batch systems • Sqoop, Embulk, ... • As stream: Data just generated now • Easily connected with monitoring systems • Without burst network traffic • Flume, Logstash, Fluentd, ...
  • 22. Fluentd: Support Service by SRA OSS with Treasure Data Released TODAY!
  • 23. Other Important Topics • Storage: Performance, Availability, Schema management • Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ... • Visualization: Functionality, Connectivity, Visibility • Tableau, Pentaho, Many other enterprise products, ... • Distributed Queues: Performance, Stability, Connectivity • Apache Kafka, Amazon Kinesis, ...
  • 24. Get Familiar with Options NOT to Take Pains about Technology!