SlideShare a Scribd company logo
1 of 26
Download to read offline
Technologies,
Data Analytics Service
and Enterprise Businesses
SENDAI IT COMMUNE #2
2018-01-09
Satoshi Tagomori (@tagomoris)
Treasure Data, Inc.
Satoshi Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, Woothee, ...
Treasure Data, Inc.
Retry-able Failures or Not
Idempotent Operations: (冪等な操作)
べきとう
Technologies
Data Analytics Service
Enterprise Business
Technologies
↓
Data Analytics Service
↓
Enterprise Business
Enterprise Business ?
• Many different definitions and discussions about "Enterprise"... :(
• MY DEFINITION IN THIS TALK:



"Businesses NOT about IT"

• Thus, most of businesses are "Enterprise", everywhere, not only in Tokyo
Data Analytics Service ?
• Provides ways to know:
• How many people are reaching our products?
• How many times are they seeing our advertisements?
• And how many times do they buy our products?
• When are they use our products?
• When did they buy our products?
• Where did they buy our products?
• ...
• Something helps our business using data
Data Analytics Service

for Enterprise Business ?
• Something helps "Business not about IT", using data (IT)
• Staffs (using data analytics service) doesn't know about IT
• and also don't take care about IT
• but "need" result of analytics
• Everyone are checking report about yesterday at 10:00 AM
• We need results before 10:00AM
• 10:10 AM is too late, but 2:00 AM is too early...
Deadline and Retries
Big Job: Power 1
10:00AM00:00AM 05:30AM01:00AM
Big Job: Power 1
Crash! Delay...
Big Job: Power 2
Big Job: Power 2
Crash! OK!
Small Jobs: Power 1
Small Jobs: Power 1
Crash! OK!
Missions of Data Analytics Service
for Enterprise Business
Fast "enough"
Cheap "enough"
Stable
Easy to use "enough"
Technologies for Data Analytics Service
• Data Management System
• Distributed Processing System
• Queue and Scheduler
• Connecting Systems and Services
• Controlling Jobs, Tasks and Workflows
• Managing Retries
Data Management Systems
• Data Collecting Systems
• Fluentd, Embulk, ...
• Distributed Database and Storage
• Storing data in efficient format (MPC1, MessagePack columnar format)
• Managing index
• Managing schema
• Providing transactional operations
Distributed Processing System
• Running Analytics Queries
• MapReduce engines: Hadoop + Hive
• MPP (Massive Parallel Processing systems): Presto
• Running Data Management Jobs
• Converting data formats, re-index, detecting schema, ...
• Computing Resource Management
• Customer queries (and internal use) must be separated!
Queue and Scheduler
• Queuing Queries
• Allow to enqueue queries, run these next-to-next
Power 1
Customer
Request
• Scheduling Queries
• Run queries when it's ok to run
Data for Queries
01:00AM 03:00AM
Connecting Systems and Services
• Non-"connected" Data Analytics Service
Ultra Super Great
Analytics Service
Database
Query
Result
Not "easy enough"
Connecting Systems and Services
• Data Analytics Service MUST be "connected"
Treasure Data
Database
Query
Result
Control Jobs/Tasks
• A Job needs results of other jobs
"Risky"
Time based schedule
A,B,C -> D,E -> F
01:00AM
03:10AM ?
03:30AM
06:30AM ?
07:00AM 10:00AM
"Risky"
Time based schedule
A,B,C -> D,E -> F
01:00AM
Crash!
03:30AM
Oops, No Data...
10:00AM
• "Risk" for failures
07:00AM
Oops, No Data...
08:15AM ?
Control Jobs/Tasks
• A Job needs results of other jobs
Time based schedule
A,B,C -> D,E -> F
01:00AM
03:10AM ?
06:00AM
08:30AM ?
11:00AM ???
• "Time based schedule" needs
• Wide space for retries
• Big resource for fast results (not cheap!)
Space for Retries Space for Retries
Control Jobs/Tasks
• Workflow pattern
Workflow execution
A,B,C -> D,E -> F
01:00AM
07:15AM ?
10:00AMWorkflow control barriers
Workflow execution
A,B,C -> D,E -> F
01:00AM 10:00AMWorkflow control barriers
• Workflow pattern with retries
Crash!
Retries !!!!!!!!!!!!!!!!!!!!!!!!
Retry-able Failures or Not
• "Retry-able Failures"
• Crash of compute nodes
• Communication errors
• Service down of "connected" services
• ...
• Non-"Retry-able Failures"
• SQL syntax error
• Missing data sources / Missing tables
• Wrong API key of "connected" services
• ...
Table B
Table B
Retry-able Operations ?
• For example.... :
• Run Query A
• Append result of A into B
• Count rows of B
• Failures?:
• Run Query A
• Append result of A into B ... (Failed!)
• Retry Query A
• Retry to append result of A into B
• Count rows of B
1
2
3
4
1
2
1
2
3
4
Idempotent Operations
• "Idempotent" (冪等である) operation
• can get "same" result when it's executed twice or more
べきとう
Table B
1
2
3
4
• Idempotent Operation:
• Run Query A
• "Replace" table B with result of A
• Count rows of B
Table B
1
2
Replay-able Data Analytics Workflow
• Need to do many "try-and-error"
• w/ updated queries
• w/ updated data...
• Idempotent operations makes workflow "Replay-able"
• Fast try-and-error (PDCA!) cycles
• → Fast business growth!
Enterprise Business
❤
Technologies
Thank you!
@tagomoris

More Related Content

What's hot

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookTreasure Data, Inc.
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceSATOSHI TAGOMORI
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_enOgibayashi
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...viirya
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloudQubole
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
 

What's hot (20)

Presto
PrestoPresto
Presto
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
Open Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Presto in the cloud
Presto in the cloudPresto in the cloud
Presto in the cloud
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
 

Similar to Technologies, Data Analytics Service and Enterprise Business

Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...uptime software
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - PublicJulian Payne
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesSATOSHI TAGOMORI
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructureSimon Belak
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Jim Adcock
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 MinutesDATABASICS
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Bernardo Najlis
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Jim Adcock
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017BrandonWilhelm4
 
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Chris McNulty
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISequelGate
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talkKrishna Gade
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven ProductsDataWorks Summit
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015Bipin Singh
 

Similar to Technologies, Data Analytics Service and Enterprise Business (20)

Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
Unified IT Monitoring: Beautiful Dashboards vs. Deep Reporting - What’s More ...
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - Public
 
Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
Levelling up your data infrastructure
Levelling up your data infrastructureLevelling up your data infrastructure
Levelling up your data infrastructure
 
Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017Movin on Up - SPEngage Phoenix 2017
Movin on Up - SPEngage Phoenix 2017
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes7 Trends in Time & Expense In 7 Minutes
7 Trends in Time & Expense In 7 Minutes
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016Movin on Up - ScarePoint Friday Cincinnati 2016
Movin on Up - ScarePoint Friday Cincinnati 2016
 
Maxis Alchemize imug 2017
Maxis Alchemize imug 2017Maxis Alchemize imug 2017
Maxis Alchemize imug 2017
 
Scalable web architecture
Scalable web architectureScalable web architecture
Scalable web architecture
 
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
Australia SharePoint Conference 2012 - SharePoint Performance - Tales from th...
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BI
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 
Pinterest hadoop summit_talk
Pinterest hadoop summit_talkPinterest hadoop summit_talk
Pinterest hadoop summit_talk
 
50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products50 Billion pins and counting: Using Hadoop to build data driven Products
50 Billion pins and counting: Using Hadoop to build data driven Products
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
 

More from SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"SATOSHI TAGOMORI
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsSATOSHI TAGOMORI
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDSATOSHI TAGOMORI
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading RoleSATOSHI TAGOMORI
 

More from SATOSHI TAGOMORI (20)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Tale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench Tools
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading Role
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 

Recently uploaded (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 

Technologies, Data Analytics Service and Enterprise Business

  • 1. Technologies, Data Analytics Service and Enterprise Businesses SENDAI IT COMMUNE #2 2018-01-09 Satoshi Tagomori (@tagomoris) Treasure Data, Inc.
  • 2. Satoshi Tagomori (@tagomoris) Fluentd, MessagePack-Ruby, Norikra, Woothee, ... Treasure Data, Inc.
  • 3.
  • 4. Retry-able Failures or Not Idempotent Operations: (冪等な操作) べきとう
  • 7. Enterprise Business ? • Many different definitions and discussions about "Enterprise"... :( • MY DEFINITION IN THIS TALK:
 
 "Businesses NOT about IT"
 • Thus, most of businesses are "Enterprise", everywhere, not only in Tokyo
  • 8. Data Analytics Service ? • Provides ways to know: • How many people are reaching our products? • How many times are they seeing our advertisements? • And how many times do they buy our products? • When are they use our products? • When did they buy our products? • Where did they buy our products? • ... • Something helps our business using data
  • 9. Data Analytics Service
 for Enterprise Business ? • Something helps "Business not about IT", using data (IT) • Staffs (using data analytics service) doesn't know about IT • and also don't take care about IT • but "need" result of analytics • Everyone are checking report about yesterday at 10:00 AM • We need results before 10:00AM • 10:10 AM is too late, but 2:00 AM is too early...
  • 10. Deadline and Retries Big Job: Power 1 10:00AM00:00AM 05:30AM01:00AM Big Job: Power 1 Crash! Delay... Big Job: Power 2 Big Job: Power 2 Crash! OK! Small Jobs: Power 1 Small Jobs: Power 1 Crash! OK!
  • 11. Missions of Data Analytics Service for Enterprise Business Fast "enough" Cheap "enough" Stable Easy to use "enough"
  • 12. Technologies for Data Analytics Service • Data Management System • Distributed Processing System • Queue and Scheduler • Connecting Systems and Services • Controlling Jobs, Tasks and Workflows • Managing Retries
  • 13. Data Management Systems • Data Collecting Systems • Fluentd, Embulk, ... • Distributed Database and Storage • Storing data in efficient format (MPC1, MessagePack columnar format) • Managing index • Managing schema • Providing transactional operations
  • 14. Distributed Processing System • Running Analytics Queries • MapReduce engines: Hadoop + Hive • MPP (Massive Parallel Processing systems): Presto • Running Data Management Jobs • Converting data formats, re-index, detecting schema, ... • Computing Resource Management • Customer queries (and internal use) must be separated!
  • 15. Queue and Scheduler • Queuing Queries • Allow to enqueue queries, run these next-to-next Power 1 Customer Request • Scheduling Queries • Run queries when it's ok to run Data for Queries 01:00AM 03:00AM
  • 16. Connecting Systems and Services • Non-"connected" Data Analytics Service Ultra Super Great Analytics Service Database Query Result Not "easy enough"
  • 17. Connecting Systems and Services • Data Analytics Service MUST be "connected" Treasure Data Database Query Result
  • 18. Control Jobs/Tasks • A Job needs results of other jobs "Risky" Time based schedule A,B,C -> D,E -> F 01:00AM 03:10AM ? 03:30AM 06:30AM ? 07:00AM 10:00AM "Risky" Time based schedule A,B,C -> D,E -> F 01:00AM Crash! 03:30AM Oops, No Data... 10:00AM • "Risk" for failures 07:00AM Oops, No Data... 08:15AM ?
  • 19. Control Jobs/Tasks • A Job needs results of other jobs Time based schedule A,B,C -> D,E -> F 01:00AM 03:10AM ? 06:00AM 08:30AM ? 11:00AM ??? • "Time based schedule" needs • Wide space for retries • Big resource for fast results (not cheap!) Space for Retries Space for Retries
  • 20. Control Jobs/Tasks • Workflow pattern Workflow execution A,B,C -> D,E -> F 01:00AM 07:15AM ? 10:00AMWorkflow control barriers Workflow execution A,B,C -> D,E -> F 01:00AM 10:00AMWorkflow control barriers • Workflow pattern with retries Crash!
  • 22. Retry-able Failures or Not • "Retry-able Failures" • Crash of compute nodes • Communication errors • Service down of "connected" services • ... • Non-"Retry-able Failures" • SQL syntax error • Missing data sources / Missing tables • Wrong API key of "connected" services • ...
  • 23. Table B Table B Retry-able Operations ? • For example.... : • Run Query A • Append result of A into B • Count rows of B • Failures?: • Run Query A • Append result of A into B ... (Failed!) • Retry Query A • Retry to append result of A into B • Count rows of B 1 2 3 4 1 2 1 2 3 4
  • 24. Idempotent Operations • "Idempotent" (冪等である) operation • can get "same" result when it's executed twice or more べきとう Table B 1 2 3 4 • Idempotent Operation: • Run Query A • "Replace" table B with result of A • Count rows of B Table B 1 2
  • 25. Replay-able Data Analytics Workflow • Need to do many "try-and-error" • w/ updated queries • w/ updated data... • Idempotent operations makes workflow "Replay-able" • Fast try-and-error (PDCA!) cycles • → Fast business growth!