@laclefyoshi / ysaeki@r.recruit.co.jp
2
• 2011/04
• 2015/09
• Publications
• Druid (KDP, 2015)
• RDB NoSQL ( , 2016; : HBase )
• ESP8266 Wi-Fi IoT (KDP, 2016)
• Talks
• (WebDB Forum 2014)
• Spark Streaming (Spark Meetup December; 2015)
• Kafka AWS Kinesis (Apache Kafka Meetup Japan #1; 2016)
• (FutureOfData; 2016)
• Queryable State for Kafka Streams (Apache Kafka Meetup Japan #2; 2016)
• Apache Spark ( Geek Night #11; 2016)
3
5
6
http://www.datascientist.or.jp/
8
SQL
9
http://www.datascientist.or.jp/news/2015-11-20.html
Apache Spark SQL
https://databricks.com/blog/2016/09/27/spark-survey-2016-released.html
11
• SQL
• SELECT, GROUP BY, JOIN
• AWS Kinesis / Apache Kafka
12
Kinesis Analytics
PipelineDB
MemSQL
VoltDB
13
Kinesis Analytics
the easiest way to process streaming data in real time
with standard SQL
14
PipelineDB
relational database that runs SQL queries continuously
on streams, incrementally storing results in tables
15
MemSQL
a high performance data warehouse designed for the
cloud and on-premises
16
VoltDB
modern applications processing millions of data points
in milliseconds with 100% accuracy
17
☓
☓
19
OSS
20
NewSQL 1 CockroachDB
☓
☓ ☓
☓ ☓
☓
21
JSON
Web
☓ ☓
(※) JSON is stored in a VARCHAR(N) column
22
JSON
PipelineDB:
SELECT
info#>'{features, 0, geometry, coordinates}' as coord
FROM geo_view;
SELECT
info->'features'->0->'geometry'->'coordinates' as coord
FROM geo_view;
MemSQL:
SELECT
JSON_EXTRACT_JSON(info::features, 0, 'geometry', 'coordinates')
FROM geo;
SELECT
JSON_EXTRACT_JSON(info, 'features', 0, 'geometry', 'coordinates')
FROM geo;
VoltDB:
SELECT
FIELD(info, 'features[0].geometry.coordinates')
FROM geo;
SELECT
FIELD(ARRAY_ELEMENT(FIELD(info, 'features'), 0), 'geometry.coordinates')
FROM geo;
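The queries above address the same node with different syntax: PostgreSQL-style `#>` and `->` operators (PipelineDB), `JSON_EXTRACT_JSON` and the `::` accessor (MemSQL), and `FIELD`/`ARRAY_ELEMENT` (VoltDB). To make the path semantics concrete, here is a minimal Python sketch that resolves a VoltDB-style dotted path such as 'features[0].geometry.coordinates' against a parsed document. It returns None, standing in for SQL NULL, when a key or index is missing. `extract_path` is a hypothetical helper and the GeoJSON-like sample is invented; this is not how any of these engines implements the lookup internally.

```python
import json
import re
from typing import Any

# One path segment: a key optionally followed by [index] parts,
# e.g. "features[0]" -> key "features", index part "[0]".
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def extract_path(doc: Any, path: str) -> Any:
    """Resolve a dotted path like 'features[0].geometry.coordinates'.

    Returns None when any key or index along the path is missing,
    loosely mirroring how SQL JSON accessors yield NULL.
    """
    current = doc
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"bad path segment: {segment!r}")
        key, index_part = match.groups()
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
        # Apply each [n] array index in order.
        for index in re.findall(r"\[(\d+)\]", index_part):
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return None
            current = current[i]
    return current


if __name__ == "__main__":
    info = json.loads(
        '{"features": [{"geometry": {"type": "Point",'
        ' "coordinates": [139.69, 35.69]}}]}'
    )
    print(extract_path(info, "features[0].geometry.coordinates"))  # [139.69, 35.69]
    print(extract_path(info, "features[1].geometry"))  # None
```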
Kinesis Analytics
Kinesis Analytics:
Kinesis Stream
26
Kinesis Analytics:
Bitcoin ASK: BID: LAST: Volume:
https://api.bitcoinaverage.com/ticker/global/all
27
Kinesis Analytics:
JSON
28
Kinesis Analytics:
timestamp
1 1000
• 1 50KB
• JSON Array
29
Kinesis Analytics:
Web UI
30
Kinesis Analytics: SQL
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
min_ask INTEGER,
max_bid INTEGER,
avg_last INTEGER
);
CREATE OR REPLACE PUMP "TEST_STREAM_PUMP"
AS INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
MIN("ask") as min_ask,
MAX("bid") as max_bid,
AVG("last") as avg_last
FROM "SOURCE_SQL_STREAM_001"
GROUP BY PARTITION_KEY,
FLOOR(("SOURCE_SQL_STREAM_001".ROWTIME
- TIMESTAMP '1970-01-01 00:00:00') SECOND / 120 TO SECOND);
CREATE STREAM
CREATE PUMP
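The Kinesis Analytics pump above assigns each row to a 120-second bucket by flooring its ROWTIME (seconds since the epoch) divided by 120, then computes MIN/MAX/AVG per partition key and bucket. This is a tumbling window: consecutive, non-overlapping intervals. A minimal Python sketch of the same computation over an in-memory list follows. The tuple layout and sample ticks are assumptions. Unlike the SQL, which declares the output columns as INTEGER, the sketch keeps the average as a float.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 120  # same width as "/ 120" in the GROUP BY above

# (partition_key, epoch_seconds, ask, bid, last): a hypothetical row layout
Row = Tuple[str, float, float, float, float]
Result = Dict[Tuple[str, int], Tuple[float, float, float]]


def tumbling_aggregate(rows: Iterable[Row]) -> Result:
    """Return {(partition_key, window_index): (min_ask, max_bid, avg_last)}."""
    groups = defaultdict(list)
    for key, ts, ask, bid, last in rows:
        # Equivalent of FLOOR((ROWTIME - epoch) SECOND / 120): each row lands
        # in exactly one non-overlapping window.
        window_index = int(ts // WINDOW_SECONDS)
        groups[(key, window_index)].append((ask, bid, last))

    result = {}
    for group_key, values in groups.items():
        asks, bids, lasts = zip(*values)
        result[group_key] = (min(asks), max(bids), sum(lasts) / len(lasts))
    return result


if __name__ == "__main__":
    ticks = [
        ("btc", 0.0, 101.0, 99.0, 100.0),
        ("btc", 60.0, 102.0, 98.0, 102.0),
        ("btc", 130.0, 105.0, 103.0, 104.0),  # falls into the next window
    ]
    for k, v in sorted(tumbling_aggregate(ticks).items()):
        print(k, v)
```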
31
Kinesis Analytics:
Web UI
32
PipelineDB
PipelineDB: Kinesis
CREATE STREAM bitcoins (info JSON);
SELECT pipeline_kinesis.add_endpoint('input_stream',
'ap-northeast-1',
'/path_to_credential_file');
SELECT pipeline_kinesis.consume_begin('input_stream',
'kinesis-stream-name',
'bitcoins',
format := 'json');
CREATE CONTINUOUS VIEW bitcoins_view AS SELECT info FROM bitcoins;
SELECT * FROM bitcoins_view LIMIT 10;
34
CREATE STREAM / SELECT pipeline_*.consume_begin
CREATE CONTINUOUS VIEW
PipelineDB: Kafka
CREATE STREAM bitcoins (info JSON);
SELECT pipeline_kafka.add_broker('172.17.0.3:9092');
SELECT pipeline_kafka.consume_begin('test-bitcoin-j',
'bitcoins',
format := 'json');
CREATE CONTINUOUS VIEW bitcoins_view AS SELECT info FROM bitcoins;
SELECT * FROM bitcoins_view LIMIT 10;
CREATE STREAM / SELECT pipeline_*.consume_begin
CREATE CONTINUOUS VIEW
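Both PipelineDB consumers above read with format := 'json' into a stream whose only column is `info JSON`, so each message is expected to carry one complete JSON document. Here is a minimal producer-side sketch that serializes each ticker snapshot as one compact, single-line JSON message. It assumes the kafka-python package (`KafkaProducer` with `bootstrap_servers` and `value_serializer`). It reuses the broker address and topic from the example, and the ticker field names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def encode_tick(tick: Dict[str, Any]) -> bytes:
    """Serialize one ticker snapshot as a single-line UTF-8 JSON document."""
    # Compact separators keep the message small; sort_keys makes it stable.
    return json.dumps(tick, separators=(",", ":"), sort_keys=True).encode("utf-8")


def publish(ticks: Iterable[Dict[str, Any]],
            broker: str = "172.17.0.3:9092",
            topic: str = "test-bitcoin-j") -> None:
    """Send each tick as its own Kafka message (requires kafka-python)."""
    # Imported lazily so encode_tick can be used without Kafka installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=broker, value_serializer=encode_tick)
    for tick in ticks:
        producer.send(topic, tick)
    producer.flush()  # block until buffered messages are delivered
    producer.close()


if __name__ == "__main__":
    # Hypothetical field names; the real ticker payload may differ.
    sample = {"ask": 101.5, "bid": 99.5, "last": 100.0, "volume": 12.3}
    print(encode_tick(sample))
```

The same serializer would suit the MemSQL pipeline below, which reads the same topic into a single JSON column.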
35
MemSQL
MemSQL: Kafka
CREATE TABLE bitcoins (info JSON);
CREATE PIPELINE `test_kafka_bitcoin` AS LOAD DATA
KAFKA '172.17.0.3:9092/test-bitcoin-j' INTO TABLE `bitcoins`;
TEST PIPELINE test_kafka_bitcoin LIMIT 1;
START PIPELINE test_kafka_bitcoin;
CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;
SELECT * FROM bitcoins LIMIT 10;
SELECT * FROM bitcoins_view LIMIT 10;
CREATE TABLE / CREATE PIPELINE
CREATE VIEW
37
VoltDB
VoltDB: Kinesis
CREATE TABLE bitcoins (info VARCHAR(5000));
CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;
SELECT * FROM bitcoins LIMIT 10;
SELECT * FROM bitcoins_view LIMIT 10;
<deployment>
<import>
<configuration type="kinesis" format="csv" enabled="true">
<property name="stream.name"> kinesis-stream-name </property>
<property name="region"> ap-northeast-1 </property>
<property name="access.key"> ... </property>
<property name="secret.key"> ... </property>
<property name="procedure"> bitcoins.insert </property>
</configuration>
</import>
</deployment>
39
CREATE TABLE /
CREATE VIEW
VoltDB: Kafka
CREATE TABLE bitcoins (info VARCHAR(5000));
CREATE VIEW bitcoins_view AS SELECT info FROM bitcoins;
SELECT * FROM bitcoins LIMIT 10;
SELECT * FROM bitcoins_view LIMIT 10;
<deployment>
<import>
<configuration type="kafka" format="csv" enabled="true">
<property name="topics"> test-bitcoin-j </property>
<property name="brokers"> 172.17.0.3:9092 </property>
<property name="procedure"> bitcoins.insert </property>
</configuration>
</import>
</deployment>
40
CREATE TABLE /
CREATE VIEW
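Both VoltDB importer configurations above use format="csv", while the table stores the whole JSON document in a single VARCHAR column. If the importer splits each message with ordinary CSV rules (an assumption here; check the importer's documentation), the commas inside a JSON object would yield several fields instead of one. The Python sketch below uses the standard csv module to compare an unquoted JSON line with the same document written as one quoted CSV field. The helper names are hypothetical.

```python
import csv
import io
import json
from typing import List


def csv_fields(line: str) -> List[str]:
    """Parse one line with standard CSV rules and return its fields."""
    return next(csv.reader(io.StringIO(line)))


def as_single_csv_field(doc: dict) -> str:
    """Encode a JSON document as exactly one quoted CSV field."""
    buf = io.StringIO()
    # QUOTE_ALL wraps the field in quotes and doubles any quotes inside it.
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([json.dumps(doc)])
    return buf.getvalue().rstrip("\r\n")  # drop the row terminator


if __name__ == "__main__":
    doc = {"ask": 101.5, "bid": 99.5}
    raw = json.dumps(doc)
    print(len(csv_fields(raw)))  # 2: split at the comma between the keys
    quoted = as_single_csv_field(doc)
    print(len(csv_fields(quoted)))  # 1: the document survives as one field
    print(json.loads(csv_fields(quoted)[0]) == doc)  # True
```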
SQL JOIN
x JOIN
42
☓ ☓
☓
(※) CREATE VIEW
x JOIN
43
☓ ☓ ☓ ☓
☓
(※) CREATE VIEW
JOIN
CREATE REFERENCE TABLE
Nested JOIN:
... FROM A LEFT JOIN
(B LEFT JOIN C ON B.col = C.col)
ON A.col = B.col ...
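In the nested form, `B LEFT JOIN C` is evaluated first and `A` is then left-joined to that result. Every row of `A` survives, and the B and C columns are NULL wherever no match exists. Here is a minimal nested-loop sketch of those semantics in Python. The tables, the prefixed column names (`a_col`, `b_col`, `b_val`, `c_col`, `c_val`), and the use of None for NULL are all illustrative assumptions. The sketch describes the result, not how any engine plans the join.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], lkey: str, rkey: str,
              right_cols: List[str]) -> List[Row]:
    """Nested-loop LEFT JOIN on left[lkey] == right[rkey].

    Unmatched left rows are kept with every column in right_cols set to None.
    """
    out = []
    for l in left:
        # NULL never equals anything, so a None key cannot match.
        matches = [r for r in right if l[lkey] is not None and l[lkey] == r[rkey]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out


def nested_left_join(a: List[Row], b: List[Row], c: List[Row]) -> List[Row]:
    """A LEFT JOIN (B LEFT JOIN C ON B.col = C.col) ON A.col = B.col."""
    bc = left_join(b, c, "b_col", "c_col", ["c_col", "c_val"])
    return left_join(a, bc, "a_col", "b_col", ["b_col", "b_val", "c_col", "c_val"])


if __name__ == "__main__":
    A = [{"a_col": 1}, {"a_col": 2}, {"a_col": 3}]
    B = [{"b_col": 1, "b_val": "b1"}, {"b_col": 2, "b_val": "b2"}]
    C = [{"c_col": 1, "c_val": "c1"}]
    for row in nested_left_join(A, B, C):
        print(row)
```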
44
SQL WINDOW
Tumbling Window
Sliding Window
SQL Tumbling Window
SELECT
AVG(last) AS avg_last
FROM bitcoins
GROUP BY
FLOOR(EXTRACT(MINUTE FROM row_timestamp) / 2);
GROUP BY
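The tumbling-window query above groups only by FLOOR(EXTRACT(MINUTE FROM row_timestamp) / 2), which is a minute-of-the-hour bucket. Unless the view is also grouped by hour or date, 10:02 and 11:02 fall into the same group. A key derived from the full timestamp avoids this, as in the Kinesis Analytics query that divides seconds since the epoch by 120. Here is a minimal Python sketch comparing the two bucket functions. The function names and timestamps are illustrative.

```python
from datetime import datetime, timezone


def minute_of_hour_bucket(ts: datetime) -> int:
    """Mirror of FLOOR(EXTRACT(MINUTE FROM ts) / 2): ignores hour and date."""
    return ts.minute // 2


def epoch_bucket(ts: datetime, width_seconds: int = 120) -> int:
    """Bucket on the full timestamp, so each 2-minute window is distinct."""
    return int(ts.timestamp()) // width_seconds


if __name__ == "__main__":
    t1 = datetime(2016, 11, 1, 10, 2, 30, tzinfo=timezone.utc)
    t2 = datetime(2016, 11, 1, 11, 2, 30, tzinfo=timezone.utc)
    print(minute_of_hour_bucket(t1) == minute_of_hour_bucket(t2))  # True: collide
    print(epoch_bucket(t1) == epoch_bucket(t2))  # False: one hour apart
```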
SQL Sliding Window
SELECT STREAM
count(*) OVER lastHour
FROM APP_STREAM
WINDOW lastHour AS (PARTITION BY ... RANGE INTERVAL '1' HOUR PRECEDING);
CREATE VIEW app_stream_view
WITH (sw = '1 hour', step_factor = 50)
AS SELECT count(*) FROM app_stream;
SELECT
count(*) OVER (PARTITION BY ... ORDER BY ...)
FROM app_stream_table;
timediff()
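The sliding-window forms above (a WINDOW ... RANGE INTERVAL '1' HOUR PRECEDING clause, a view created WITH (sw = '1 hour'), and an OVER (...) window function) all compute an aggregate over the hour preceding each point in time, rather than over fixed buckets. Here is a minimal Python sketch of a one-hour sliding count. It keeps timestamps in a deque and evicts those older than the window. `SlidingCounter` is a hypothetical name, and the sketch assumes timestamps arrive in non-decreasing order, as ROWTIME does in a stream.

```python
from collections import deque
from typing import Deque


class SlidingCounter:
    """Count events whose timestamp lies within the last `window` seconds."""

    def __init__(self, window: float = 3600.0) -> None:
        self.window = window
        self._times: Deque[float] = deque()

    def add(self, ts: float) -> int:
        """Record an event at `ts` and return the count over [ts - window, ts]."""
        self._times.append(ts)
        # Evict events older than ts - window; the boundary itself is kept.
        while self._times[0] < ts - self.window:
            self._times.popleft()
        return len(self._times)


if __name__ == "__main__":
    counter = SlidingCounter(window=3600)
    for t in (0, 1800, 3600, 3601, 7300):
        print(t, counter.add(t))
```

Each event is appended once and evicted at most once, so the amortized cost per event is constant.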
50
Kinesis Analytics
• Kinesis Stream KPL
• Aggregated Record
• Web UI
• API aws
• AddApplicationReferenceDataSource
• S3 / UpdateApplication
• S3 1GB
52


Appendix: ...
http://jp.techcrunch.com/2016/09/23/20160922apple-acquires-another-machine-learning-company-tuplejump/
54


Comparison of Ad Hoc Analysis Engines for Streaming Data