SlideShare uma empresa Scribd logo
1 de 64
Baixar para ler offline
O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
Time Series Processing with Solr and Spark
Josef Adersberger (@adersberger)
CTO, QAware
TIME SERIES 101
4
01
WE’RE SURROUNDED BY TIME SERIES
▸ Operational data: Monitoring data, performance
metrics, log events, …
▸ Data Warehouse: Dimension time
▸ Measured Me: Activity tracking, ECG, …
▸ Sensor telemetry: Sensor data, …
▸ Financial data: Stock charts, …
▸ Climate data: Temperature, …
▸ Web tracking: Clickstreams, …
▸ …
@adersberger
5
WE’RE SURROUNDED BY TIME SERIES (Pt. 2)
▸ Oktoberfest: Visitor and beer consumption trend
the singularity
6
01
TIME SERIES: BASIC TERMS
univariate time series multivariate time series multi-dimensional time
series (time series tensor)
time series setobservation
@adersberger
7
01
ILLUSTRATIVE OPERATIONS ON TIME SERIES
align
Time series => Time series
diff downsampling outlier
min/max avg/med slope std-dev
Time series => Scalar
@adersberger
OUR USE CASE
Monitoring Data Analysis 

of a business-critical,

worldwide distributed 

software system. Enable

root cause analysis and

anomaly detection.

> 1,000 nodes worldwide
> 10 processes per node
> 20 metrics per process

(OS, JVM, App-spec.)
Measured every second.
= about 6.3 trillions observations p.a.

Data retention: 5 yrs.
11
01
USE CASE: EXPLORING
Drill-down
host
process
measurements
counters (metrics)
Query time series metadata
Superimpose time series
@adersberger
12
01
USE CASE: STATISTICS
@adersberger
13
01
USE CASE: ANOMALY DETECTION
Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection

and Yahoo EGDAS https://github.com/yahoo/egads
@adersberger
14
01
USE CASE: SQL AND ZEPPELIN
@adersberger
CHRONIX SPARK
https://github.com/ChronixDB/chronix.spark
http://www.datasciencecentral.com
19
01
AVAILABLE TIME SERIES DATABASES
https://github.com/qaware/big-data-landscape
EASY-TO-USE BIG TIME
SERIES DATA STORAGE &
PROCESSING ON SPARK
21
01
THE CHRONIX STACK chronix.io
Big time series database
Scale-out
Storage-efficient
Interactive queries

No separate servers: Drop-in 

to existing Solr and Spark 

installations

Integrated into the relevant

open source ecosystem
@adersberger
Core
Chronix Storage
Chronix Server
Chronix Spark
ChronixFormat
GrafanaChronix Analytics
Collection
Analytics Frontends
Logstash fluentd collectd
Zeppelin
Prometheus Ingestion Bridge
KairosDB OpenTSDBInfluxDB Graphite
22
node
Distributed Data &

Data Retrieval
‣ Data sharding
‣ Fast index-based queries
‣ Efficient storage format
Distributed Processing
‣ Heavy lifting distributed
processing
‣ Efficient integration of Spark
and Solr
Result Processing
Post-processing on a
smaller set of time series
data flow
icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (takslist)
@adersberger
23
TIME SERIES MODEL
Set of univariate multi-dimensional numeric time series
▸ set … because it’s more flexible and better to parallelise if operations can input and
output multiple time series.
▸ univariate … because multivariate will introduce too much complexity (and we have our
set to bundle multiple time series).
▸ multi-dimensional … because the ability to slice & dice in the set of time series is very
convenient for a lot of use cases.
▸ numeric … because it’s the most common use case.
A single time series is identified by a combination of its non-temporal
dimensional values (e.g. unit “mem usage” + host “aws42” + process “tomcat”)
@adersberger
24
01
CHRONIX SPARK API: ENTRY POINTS
CHRONIX SPARK


ChronixRDD
ChronixSparkContext
‣ Represents a set of time series
‣ Distributed operations on sets of time series
‣ Creates ChronixRDDs
‣ Speaks with the Chronix Server (Solr)
@adersberger
25
01
CHRONIX SPARK API: DATA MODEL
MetricTimeSeries
MetricObservationDataFrame
+ toDataFrame()
@adersberger
Dataset<MetricTimeSeries>
Dataset<MetricObservation>
+ toDataset()
+ toObservationsDataset()
ChronixRDD
26
01
SPARK APIs FOR DATA PROCESSING
RDD DataFrame Dataset
typed yes no yes
optimized medium highly highly
mature yes yes medium
SQL no yes no
@adersberger
27
01
CHRONIX RDD
Statistical operations
the set characteristic: 

a JavaRDD of 

MetricTimeSeries
Filter the set (esp. by

dimensions)
@adersberger
28
01
METRICTIMESERIES DATA TYPE
access all timestamps
the multi-dimensionality:

get/set dimensions

(attributes)
access all observations as
stream
access all numeric values
@adersberger
30
01
//Create Chronix Spark context from a SparkContext / JavaSparkContext

ChronixSparkContext csc = new ChronixSparkContext(sc);



//Read data into ChronixRDD

SolrQuery query = new SolrQuery(

"metric:"java.lang:type=Memory/HeapMemoryUsage/used"");



ChronixRDD rdd = csc.query(query,

"localhost:9983", //ZooKeeper host

"chronix", //Solr collection for Chronix

new ChronixSolrCloudStorage());



//Calculate the overall min/max/mean of all time series in the RDD

double min = rdd.min();

double max = rdd.max();

double mean = rdd.mean();
DataFrame df = rdd.toDataFrame(sqlContext);

DataFrame res = df

.select("time", "value", "process", "metric")

.where("process='jenkins-jolokia'")

.orderBy("time");

res.show();
@adersberger
CHRONIX SPARK INTERNALS
32
Distributed Data &

Data Retrieval
‣ Data sharding (OK)
‣ Fast index-based queries (OK)
‣ Efficient storage format
@adersberger
33
01
CHRONIX FORMAT: CHUNKING TIME SERIES
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ dimensions: Map<String, String>
‣ observations: byte[]
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ dimensions: Map<String, String>
‣ observations: byte[]
Logical
TIME SERIES
‣ start: TimeStamp
‣ end: TimeStamp
‣ dimensions: Map<String, String>
‣ observations: byte[]
Physical
Chunking:
1 logical time series = 

n physical time series (chunks)
1 chunk = fixed amount of
observations
1 chunk = 1 Solr document
@adersberger
34
01
CHRONIX FORMAT: ENCODING OF OBSERVATIONS
Binary encoding of all timestamp/value pairs (observations) with ProtoBuf incl.
binary compression.
Delta encoding leading to more effective binary compression
… of time stamps (DCC, Date-Delta-Compaction)













… of values: diff
chunck
• timespan
• nbr. of observations
periodic distributed time stamps (pts): timespan / nbr. of observations
real time stamps (rts) if |pts(x) - rts(x)| < threshold : rts(x) = pts(x)
value_to_store = pts(x) - rts(x)
value_to_store = value(x) - value(x-1)
@adersberger
35
01
CHRONIX FORMAT: TUNING CHUNK SIZE AND CODEC
GZIP +
128
kBytes
Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger

Chronix: Efficient Storage and Query of Operational Time Series
International Conference on Software Maintenance and Evolution 2016 (submitted)
@adersberger
storage 

demand access

time
36
01
CHRONIX FORMAT: STORAGE EFFICIENCY BENCHMARK
@adersberger
37
01
CHRONIX FORMAT: PERFORMANCE BENCHMARK
unit: secondsnbr of queries query
@adersberger
38
Distributed Processing
‣ Heavy lifting distributed
processing
‣ Efficient integration of Spark

and Solr
@adersberger
39
01
SPARK AND SOLR BEST PRACTICES: ALIGN PARALLELISM
SolrDocument

(Chunk)
Solr Shard Solr Shard
TimeSeries TimeSeries TimeSeries TimeSeries TimeSeries
Partition Partition
ChronixRDD
• Unit of parallelism in Spark: Partition
• Unit of parallelism in Solr: Shard
• 1 Spark Partition = 1 Solr Shard
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
SolrDocument

(Chunk)
@adersberger
40
01
ALIGN THE PARALLELISM WITHIN CHRONIXRDD
public ChronixRDD queryChronixChunks(

final SolrQuery query,

final String zkHost,

final String collection,

final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage)
throws SolrServerException, IOException {



// first get a list of replicas to query for this collection

List<String> shards = chronixStorage.getShardList(zkHost, collection);



// parallelize the requests to the shards

JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(

(FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(

new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);

return new ChronixRDD(docs);

}
Figure out all Solr
shards (using
CloudSolrClient in
the background)
Query each shard in parallel and convert
SolrDocuments to MetricTimeSeries
@adersberger
41
01
SPARK AND SOLR BEST PRACTICES: PUSHDOWN
SolrQuery query = new SolrQuery(

“<Solr query containing filters and aggregations>");



ChronixRDD rdd = csc.query(query, …
@adersberger
Predicate pushdown
• Pre-filter time series based on their 

metadata (dimensions, start, end)

with Solr.

Aggregation pushdown
• Perform pre-aggregations (min/max/avg/…) at
ingestion time and store it as metadata.
• (to come) Perform aggregations on Solr-level at
query time by enabling Solr to decode
observations
42
01
SPARK AND SOLR BEST PRACTICES: EFFICIENT DATA TRANSFER
Reduce volume: Pushdown & compression

Use efficient protocols: 

Low-overhead, bulk, stream

Avoid remote transfer: Place Spark

tasks (processes 1 partition) on the 

Solr node with the appropriate shard.

(to come by using SolrRDD)
@adersberger
Export 

Handler
Chronix

RDD
CloudSolr

Stream
Format
Decoder
bulk of 

JSON tuples
Chronix Spark
Solr / SolrJ
43
private Stream<MetricTimeSeries> 

streamWithCloudSolrStream(String zkHost, String collection, String shardUrl, SolrQuery query,

TimeSeriesConverter<MetricTimeSeries> converter)
throws IOException {

Map params = new HashMap();

params.put("q", query.getQuery());

params.put("sort", "id asc");

params.put("shards", extractShardIdFromShardUrl(shardUrl));

params.put("fl",

Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +

", metric, host, measurement, process, ag, group");

params.put("qt", "/export");

params.put("distrib", false);



CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);

solrStream.open();

SolrTupleStreamingService tupStream = new SolrTupleStreamingService(solrStream, converter);

return StreamSupport.stream(
Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);

}
Pin query to one shard
Use export request handler
Boilerplate code to stream response
@adersberger
Time Series Databases should be first-class citizens.
Chronix leverages Solr and Spark to 

be storage efficient and to allow interactive 

queries for big time series data.
THANK YOU! QUESTIONS?
Mail: josef.adersberger@qaware.de
Twitter: @adersberger
TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE
BONUS SLIDES
PERFORMANCE
codingvoding.tumblr.com
PREMATURE OPTIMIZATION IS
NOT EVIL IF YOU HANDLE BIG
DATA
Josef Adersberger
PERFORMANCE
USING A JAVA PROFILER WITH A LOCAL CLUSTER
PERFORMANCE
HIGH-PERFORMANCE, LOW-OVERHEAD COLLECTIONS
PERFORMANCE
830 MB -> 360 MB

(- 57%)
unveiled wrong Jackson 

handling inside of SolrClient
53
01
THE SECRETS OF DISTRIBUTED PROCESSING PERFORMANCE


Rule 1: Be as close to the data as possible!

(CPU cache > memory > local disk > network)

Rule 2: Reduce data volume as early as possible! 

(as long as you don’t sacrifice parallelization)

Rule 3: Parallelize as much as possible! 

(max = #cores * x)
PERFORMANCE
THE RULES APPLIED
‣ Rule 1: Be as close to the data as possible!
1. Solr caching

2. Spark in-memory processing with activated RDD compression

3. Binary protocol between Solr and Spark

‣ Rule 2: Reduce data volume as early as possible!
‣ Efficient storage format (Chronix Format)

‣ Predicate pushdown to Solr (query)

‣ Group-by & aggregation pushdown to Solr (faceting within a query)

‣ Rule 3: Parallelize as much as possible!
‣ Scale-out on data-level with SolrCloud

‣ Scale-out on processing-level with Spark
APACHE SPARK 101
CHRONIX SPARK WONDERLAND
ARCHITECTURE
APACHE SPARK
SPARK TERMINOLOGY (1/2)
▸ RDD: Has transformations and actions. Hides data partitioning &
distributed computation. References a set of partitions (“output
partitions”) - materialized or not - and has dependencies to
another RDD (“input partitions”). RDD operations are evaluated as
late as possible (when an action is called). As long as not being the
root RDD the partitions of an RDD are in memory but they can be
persisted by request.
▸ Partitions: (Logical) chunks of data. Default unit and level of
parallelism - inside of a partition everything is a sequential
operation on records. Has to fit into memory. Can have different
representations (in-memory, on disk, off heap, …)
APACHE SPARK
SPARK TERMINOLOGY (2/2)
▸ Job: A computation job which is launched when an action is called on a
RDD.
▸ Task: The atomic unit of work (function). Bound to exactly one partition.
▸ Stage: Set of Task pipelines which can be executed in parallel on one
executor.
▸ Shuffling: If partitions need to be transferred between executors. Shuffle
write = outbound partition transfer. Shuffle read = inbound partition
transfer.
▸ DAG Scheduler: Computes DAG of stages from RDD DAG. Determines
the preferred location for each task.
THE COMPETITORS / ALTERNATIVES
CHRONIX RDD VS. SPARK-TS
▸ Spark-TS provides no specific time series storage it uses the Spark persistence
mechanisms instead. This leads to a less efficient storage usage and less possibilities
to perform performance optimizations via predicate pushdown.
▸ In contrast to Spark-TS Chronix does not align all time series values on one vector of
timestamps. This leads to greater flexibility in time series aggregation
▸ Chronix provides multi-dimensional time series as this is very useful for data
warehousing and APM.
▸ Chronix has support for Datasets as this will be an important Spark API in the near
future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML.
▸ Chronix is purely written in Java. There is no explicit support for Python and Scala yet.
▸ Chronix doesn not support a ZonedTime as this makes it way more complicated.
CHRONIX SPARK INTERNALS
61
01
CHRONIXRDD: GET THE CHUNKS FROM SOLR
public ChronixRDD queryChronixChunks(

final SolrQuery query,

final String zkHost,

final String collection,

final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage)
throws SolrServerException, IOException {



// first get a list of replicas to query for this collection

List<String> shards = chronixStorage.getShardList(zkHost, collection);



// parallelize the requests to the shards

JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(

(FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(

new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);

return new ChronixRDD(docs);

}
Figure out all Solr
shards (using
CloudSolrClient in
the background)
Query each shard in parallel and convert
SolrDocuments to MetricTimeSeries
62
01
BINARY PROTOCOL WITH STANDARD SOLR CLIENT
private Stream<MetricTimeSeries> streamWithHttpSolrClient(String shardUrl,

SolrQuery query,

TimeSeriesConverter<MetricTimeSeries>
converter) {

HttpSolrClient solrClient = getSingleNodeSolrClient(shardUrl);

solrClient.setRequestWriter(new BinaryRequestWriter());

query.set("distrib", false);

SolrStreamingService<MetricTimeSeries> solrStreamingService = 

new SolrStreamingService<>(converter, query, solrClient, nrOfDocumentPerBatch);

return StreamSupport.stream(

Spliterators.spliteratorUnknownSize(solrStreamingService, Spliterator.SIZED), false);

}
Use HttpSolrClient pinned to one shard
Use binary (request)

protocol
Boilerplate code to stream response
63
private Stream<MetricTimeSeries> 

streamWithCloudSolrStream(String zkHost, String collection, String shardUrl, SolrQuery query,

TimeSeriesConverter<MetricTimeSeries> converter)
throws IOException {

Map params = new HashMap();

params.put("q", query.getQuery());

params.put("sort", "id asc");

params.put("shards", extractShardIdFromShardUrl(shardUrl));

params.put("fl",

Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +

", metric, host, measurement, process, ag, group");

params.put("qt", "/export");

params.put("distrib", false);



CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);

solrStream.open();

SolrTupleStreamingService tupStream = new SolrTupleStreamingService(solrStream, converter);

return StreamSupport.stream(
Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);

}
EXPORT HANDLER PROTOCOL
Pin query to one shard
Use export request handler
Boilerplate code to stream response
64
01
CHRONIXRDD: FROM CHUNKS TO TIME SERIES
public ChronixRDD joinChunks() {

JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd

= this.groupBy(MetricTimeSeriesKey::new);



JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd

= groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {

MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();

List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);

MetricTimeSeries result = null;

for (MetricTimeSeries mts : orderedChunks) {

if (result == null) {

result = new MetricTimeSeries

.Builder(mts.getMetric())

.attributes(mts.attributes()).build();

}

result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());

}

return result;

});



JavaRDD<MetricTimeSeries> resultJavaRdd =

joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);



return new ChronixRDD(resultJavaRdd); }
group chunks
according
identity
join chunks to

logical time 

series

Mais conteúdo relacionado

Mais procurados

Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3Rob Skillington
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and SparkJosef Adersberger
 
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...NETWAYS
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoNathaniel Braun
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...NoSQLmatters
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0HBaseCon
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017HBaseCon
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an actionGordon Chung
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldOliver Hankeln
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...DataStax
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBGeoffrey Anderson
 
Gnocchi v4 - past and present
Gnocchi v4 - past and presentGnocchi v4 - past and present
Gnocchi v4 - past and presentGordon Chung
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase HBaseCon
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 

Mais procurados (20)

Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
Time Series Processing with Solr and Spark
Time Series Processing with Solr and SparkTime Series Processing with Solr and Spark
Time Series Processing with Solr and Spark
 
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
OSDC 2016 - Chronix - A fast and efficient time series storage based on Apach...
 
JEE on DC/OS
JEE on DC/OSJEE on DC/OS
JEE on DC/OS
 
OpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ CriteoOpenTSDB for monitoring @ Criteo
OpenTSDB for monitoring @ Criteo
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
OpenTSDB 2.0
OpenTSDB 2.0OpenTSDB 2.0
OpenTSDB 2.0
 
OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017OpenTSDB: HBaseCon2017
OpenTSDB: HBaseCon2017
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an action
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed world
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDB
 
Gnocchi v4 - past and present
Gnocchi v4 - past and presentGnocchi v4 - past and present
Gnocchi v4 - past and present
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Influxdb and time series data
Influxdb and time series dataInfluxdb and time series data
Influxdb and time series data
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Gnocchi v3
Gnocchi v3Gnocchi v3
Gnocchi v3
 

Destaque

Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusQAware GmbH
 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrFlorian Lautenschlager
 
Analysis of time series
Analysis of time seriesAnalysis of time series
Analysis of time seriesPablosperessos
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTetiana Ivanova
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 

Destaque (12)

Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
Chronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache SolrChronix: A fast and efficient time series storage based on Apache Solr
Chronix: A fast and efficient time series storage based on Apache Solr
 
Analysis of time series
Analysis of time seriesAnalysis of time series
Analysis of time series
 
Time series Forecasting
Time series ForecastingTime series Forecasting
Time series Forecasting
 
Time series
Time seriesTime series
Time series
 
Time series
Time seriesTime series
Time series
 
Time Series Analysis Ravi
Time Series Analysis RaviTime Series Analysis Ravi
Time Series Analysis Ravi
 
Time series
Time seriesTime series
Time series
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 
Timeseries forecasting
Timeseries forecastingTimeseries forecasting
Timeseries forecasting
 
Time Series
Time SeriesTime Series
Time Series
 

Semelhante a Time Series Analysis

Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache SparkQAware GmbH
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockQAware GmbH
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Matthias Niehoff
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseAll Things Open
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Mathias Herberts
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruitymrphilroth
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkQAware GmbH
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5SAP Concur
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016DataStax
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra MetricsChris Lohfink
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeFlink Forward
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015Patrick McFadin
 

Semelhante a Time Series Analysis (20)

Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the Block
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 
Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108Artimon - Apache Flume (incubating) NYC Meetup 20111108
Artimon - Apache Flume (incubating) NYC Meetup 20111108
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Time Series Analysis for Network Secruity
Time Series Analysis for Network SecruityTime Series Analysis for Network Secruity
Time Series Analysis for Network Secruity
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with Spark
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra Metrics
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Spark etl
Spark etlSpark etl
Spark etl
 
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-timeChris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
Chris Hillman – Beyond Mapreduce Scientific Data Processing in Real-time
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 

Mais de QAware GmbH

50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdfQAware GmbH
 
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...QAware GmbH
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzQAware GmbH
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureQAware GmbH
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!QAware GmbH
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringQAware GmbH
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightQAware GmbH
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAsQAware GmbH
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo QAware GmbH
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...QAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.QAware GmbH
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPQAware GmbH
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.QAware GmbH
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysQAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 

Mais de QAware GmbH (20)

50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf
 
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile Architecture
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform Engineering
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAs
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API Gateways
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 

Último

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 

Último (20)

➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Time Series Analysis

  • 1. O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A
  • 2. Time Series Processing with Solr and Spark Josef Adersberger (@adersberger) CTO, QAware
  • 4. 4 01 WE’RE SURROUNDED BY TIME SERIES ▸ Operational data: Monitoring data, performance metrics, log events, … ▸ Data Warehouse: Dimension time ▸ Measured Me: Activity tracking, ECG, … ▸ Sensor telemetry: Sensor data, … ▸ Financial data: Stock charts, … ▸ Climate data: Temperature, … ▸ Web tracking: Clickstreams, … ▸ … @adersberger
  • 5. 5 WE’RE SURROUNDED BY TIME SERIES (Pt. 2) ▸ Oktoberfest: Visitor and beer consumption trend the singularity
  • 6. 6 01 TIME SERIES: BASIC TERMS univariate time series multivariate time series multi-dimensional time series (time series tensor) time series setobservation @adersberger
  • 7. 7 01 ILLUSTRATIVE OPERATIONS ON TIME SERIES align Time series => Time series diff downsampling outlier min/max avg/med slope std-dev Time series => Scalar @adersberger
  • 9. Monitoring Data Analysis 
 of a business-critical,
 worldwide distributed 
 software system. Enable
 root cause analysis and
 anomaly detection.
 > 1,000 nodes worldwide > 10 processes per node > 20 metrics per process
 (OS, JVM, App-spec.) Measured every second. = about 6.3 trillions observations p.a.
 Data retention: 5 yrs.
  • 10.
  • 11. 11 01 USE CASE: EXPLORING Drill-down host process measurements counters (metrics) Query time series metadata Superimpose time series @adersberger
  • 13. 13 01 USE CASE: ANOMALY DETECTION Featuring Twitter Anomaly Detection (https://github.com/twitter/AnomalyDetection
 and Yahoo EGDAS https://github.com/yahoo/egads @adersberger
  • 14. 14 01 USE CASE: SQL AND ZEPPELIN @adersberger
  • 16.
  • 18.
  • 19. 19 01 AVAILABLE TIME SERIES DATABASES https://github.com/qaware/big-data-landscape
  • 20. EASY-TO-USE BIG TIME SERIES DATA STORAGE & PROCESSING ON SPARK
  • 21. 21 01 THE CHRONIX STACK chronix.io Big time series database Scale-out Storage-efficient Interactive queries
 No separate servers: Drop-in 
 to existing Solr and Spark 
 installations
 Integrated into the relevant
 open source ecosystem @adersberger Core Chronix Storage Chronix Server Chronix Spark ChronixFormat GrafanaChronix Analytics Collection Analytics Frontends Logstash fluentd collectd Zeppelin Prometheus Ingestion Bridge KairosDB OpenTSDBInfluxDB Graphite
  • 22. 22 node Distributed Data &
 Data Retrieval ‣ Data sharding ‣ Fast index-based queries ‣ Efficient storage format Distributed Processing ‣ Heavy lifting distributed processing ‣ Efficient integration of Spark and Solr Result Processing Post-processing on a smaller set of time series data flow icon credits to Nimal Raj (database), Arthur Shlain (console) and alvarobueno (takslist) @adersberger
  • 23. 23 TIME SERIES MODEL Set of univariate multi-dimensional numeric time series ▸ set … because it’s more flexible and better to parallelise if operations can input and output multiple time series. ▸ univariate … because multivariate will introduce too much complexity (and we have our set to bundle multiple time series). ▸ multi-dimensional … because the ability to slice & dice in the set of time series is very convenient for a lot of use cases. ▸ numeric … because it’s the most common use case. A single time series is identified by a combination of its non-temporal dimensional values (e.g. unit “mem usage” + host “aws42” + process “tomcat”) @adersberger
  • 24. 24 01 CHRONIX SPARK API: ENTRY POINTS CHRONIX SPARK 
 ChronixRDD ChronixSparkContext ‣ Represents a set of time series ‣ Distributed operations on sets of time series ‣ Creates ChronixRDDs ‣ Speaks with the Chronix Server (Solr) @adersberger
  • 25. 25 01 CHRONIX SPARK API: DATA MODEL MetricTimeSeries MetricObservationDataFrame + toDataFrame() @adersberger Dataset<MetricTimeSeries> Dataset<MetricObservation> + toDataset() + toObservationsDataset() ChronixRDD
  • 26. 26 01 SPARK APIs FOR DATA PROCESSING RDD DataFrame Dataset typed yes no yes optimized medium highly highly mature yes yes medium SQL no yes no @adersberger
  • 27. 27 01 CHRONIX RDD Statistical operations the set characteristic: 
 a JavaRDD of 
 MetricTimeSeries Filter the set (esp. by
 dimensions) @adersberger
  • 28. 28 01 METRICTIMESERIES DATA TYPE access all timestamps the multi-dimensionality:
 get/set dimensions
 (attributes) access all observations as stream access all numeric values @adersberger
  • 29.
  • 30. 30 01 //Create Chronix Spark context from a SparkContext / JavaSparkContext
 ChronixSparkContext csc = new ChronixSparkContext(sc);
 
 //Read data into ChronixRDD
 SolrQuery query = new SolrQuery(
 "metric:"java.lang:type=Memory/HeapMemoryUsage/used"");
 
 ChronixRDD rdd = csc.query(query,
 "localhost:9983", //ZooKeeper host
 "chronix", //Solr collection for Chronix
 new ChronixSolrCloudStorage());
 
 //Calculate the overall min/max/mean of all time series in the RDD
 double min = rdd.min();
 double max = rdd.max();
 double mean = rdd.mean(); DataFrame df = rdd.toDataFrame(sqlContext);
 DataFrame res = df
 .select("time", "value", "process", "metric")
 .where("process='jenkins-jolokia'")
 .orderBy("time");
 res.show(); @adersberger
  • 32. 32 Distributed Data &
 Data Retrieval ‣ Data sharding (OK) ‣ Fast index-based queries (OK) ‣ Efficient storage format @adersberger
  • 33. 33 01 CHRONIX FORMAT: CHUNKING TIME SERIES TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ dimensions: Map<String, String> ‣ observations: byte[] TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ dimensions: Map<String, String> ‣ observations: byte[] Logical TIME SERIES ‣ start: TimeStamp ‣ end: TimeStamp ‣ dimensions: Map<String, String> ‣ observations: byte[] Physical Chunking: 1 logical time series = 
 n physical time series (chunks) 1 chunk = fixed amount of observations 1 chunk = 1 Solr document @adersberger
  • 34. 34 01 CHRONIX FORMAT: ENCODING OF OBSERVATIONS Binary encoding of all timestamp/value pairs (observations) with ProtoBuf incl. binary compression. Delta encoding leading to more effective binary compression … of time stamps (DCC, Date-Delta-Compaction)
 
 
 
 
 
 
 … of values: diff chunck • timespan • nbr. of observations periodic distributed time stamps (pts): timespan / nbr. of observations real time stamps (rts) if |pts(x) - rts(x)| < threshold : rts(x) = pts(x) value_to_store = pts(x) - rts(x) value_to_store = value(x) - value(x-1) @adersberger
  • 35. 35 01 CHRONIX FORMAT: TUNING CHUNK SIZE AND CODEC GZIP + 128 kBytes Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger
 Chronix: Efficient Storage and Query of Operational Time Series International Conference on Software Maintenance and Evolution 2016 (submitted) @adersberger storage 
 demand access
 time
  • 36. 36 01 CHRONIX FORMAT: STORAGE EFFICIENCY BENCHMARK @adersberger
  • 37. 37 01 CHRONIX FORMAT: PERFORMANCE BENCHMARK unit: secondsnbr of queries query @adersberger
  • 38. 38 Distributed Processing ‣ Heavy lifting distributed processing ‣ Efficient integration of Spark
 and Solr @adersberger
  • 39. 39 01 SPARK AND SOLR BEST PRACTICES: ALIGN PARALLELISM SolrDocument
 (Chunk) Solr Shard Solr Shard TimeSeries TimeSeries TimeSeries TimeSeries TimeSeries Partition Partition ChronixRDD • Unit of parallelism in Spark: Partition • Unit of parallelism in Solr: Shard • 1 Spark Partition = 1 Solr Shard SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) SolrDocument
 (Chunk) @adersberger
  • 40. 40 01 ALIGN THE PARALLELISM WITHIN CHRONIXRDD public ChronixRDD queryChronixChunks(
 final SolrQuery query,
 final String zkHost,
 final String collection,
 final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage) throws SolrServerException, IOException {
 
 // first get a list of replicas to query for this collection
 List<String> shards = chronixStorage.getShardList(zkHost, collection);
 
 // parallelize the requests to the shards
 JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
 (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
 new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
 return new ChronixRDD(docs);
 } Figure out all Solr shards (using CloudSolrClient in the background) Query each shard in parallel and convert SolrDocuments to MetricTimeSeries @adersberger
  • 41. 41 01 SPARK AND SOLR BEST PRACTICES: PUSHDOWN SolrQuery query = new SolrQuery(
 “<Solr query containing filters and aggregations>");
 
 ChronixRDD rdd = csc.query(query, … @adersberger Predicate pushdown • Pre-filter time series based on their 
 metadata (dimensions, start, end)
 with Solr.
 Aggregation pushdown • Perform pre-aggregations (min/max/avg/…) at ingestion time and store it as metadata. • (to come) Perform aggregations on Solr-level at query time by enabling Solr to decode observations
  • 42. 42 01 SPARK AND SOLR BEST PRACTICES: EFFICIENT DATA TRANSFER Reduce volume: Pushdown & compression
 Use efficient protocols: 
 Low-overhead, bulk, stream
 Avoid remote transfer: Place Spark
 tasks (processes 1 partition) on the 
 Solr node with the appropriate shard.
 (to come by using SolrRDD) @adersberger Export 
 Handler Chronix
 RDD CloudSolr
 Stream Format Decoder bulk of 
 JSON tuples Chronix Spark Solr / SolrJ
  • 43. 43 private Stream<MetricTimeSeries> 
 streamWithCloudSolrStream(String zkHost, String collection, String shardUrl, SolrQuery query,
 TimeSeriesConverter<MetricTimeSeries> converter) throws IOException {
 Map params = new HashMap();
 params.put("q", query.getQuery());
 params.put("sort", "id asc");
 params.put("shards", extractShardIdFromShardUrl(shardUrl));
 params.put("fl",
 Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +
 ", metric, host, measurement, process, ag, group");
 params.put("qt", "/export");
 params.put("distrib", false);
 
 CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);
 solrStream.open();
 SolrTupleStreamingService tupStream = new SolrTupleStreamingService(solrStream, converter);
 return StreamSupport.stream( Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);
 } Pin query to one shard Use export request handler Boilerplate code to stream response @adersberger
  • 44. Time Series Databases should be first-class citizens. Chronix leverages Solr and Spark to 
 be storage efficient and to allow interactive 
 queries for big time series data.
  • 45. THANK YOU! QUESTIONS? Mail: josef.adersberger@qaware.de Twitter: @adersberger TWITTER.COM/QAWARE - SLIDESHARE.NET/QAWARE
  • 49. PREMATURE OPTIMIZATION IS NOT EVIL IF YOU HANDLE BIG DATA Josef Adersberger
  • 50. PERFORMANCE USING A JAVA PROFILER WITH A LOCAL CLUSTER
  • 52. PERFORMANCE 830 MB -> 360 MB
 (- 57%) unveiled wrong Jackson 
 handling inside of SolrClient
  • 53. 53 01 THE SECRETS OF DISTRIBUTED PROCESSING PERFORMANCE 
 Rule 1: Be as close to the data as possible!
 (CPU cache > memory > local disk > network)
 Rule 2: Reduce data volume as early as possible! 
 (as long as you don’t sacrifice parallelization)
 Rule 3: Parallelize as much as possible! 
 (max = #cores * x)
  • 54. PERFORMANCE THE RULES APPLIED ‣ Rule 1: Be as close to the data as possible! 1. Solr caching 2. Spark in-memory processing with activated RDD compression 3. Binary protocol between Solr and Spark
 ‣ Rule 2: Reduce data volume as early as possible! ‣ Efficient storage format (Chronix Format) ‣ Predicate pushdown to Solr (query) ‣ Group-by & aggregation pushdown to Solr (faceting within a query)
 ‣ Rule 3: Parallelize as much as possible! ‣ Scale-out on data-level with SolrCloud ‣ Scale-out on processing-level with Spark
  • 57. APACHE SPARK SPARK TERMINOLOGY (1/2) ▸ RDD: Has transformations and actions. Hides data partitioning & distributed computation. References a set of partitions (“output partitions”) - materialized or not - and has dependencies to another RDD (“input partitions”). RDD operations are evaluated as late as possible (when an action is called). As long as not being the root RDD the partitions of an RDD are in memory but they can be persisted by request. ▸ Partitions: (Logical) chunks of data. Default unit and level of parallelism - inside of a partition everything is a sequential operation on records. Has to fit into memory. Can have different representations (in-memory, on disk, off heap, …)
  • 58. APACHE SPARK SPARK TERMINOLOGY (2/2) ▸ Job: A computation job which is launched when an action is called on a RDD. ▸ Task: The atomic unit of work (function). Bound to exactly one partition. ▸ Stage: Set of Task pipelines which can be executed in parallel on one executor. ▸ Shuffling: If partitions need to be transferred between executors. Shuffle write = outbound partition transfer. Shuffle read = inbound partition transfer. ▸ DAG Scheduler: Computes DAG of stages from RDD DAG. Determines the preferred location for each task.
  • 59. THE COMPETITORS / ALTERNATIVES CHRONIX RDD VS. SPARK-TS ▸ Spark-TS provides no specific time series storage it uses the Spark persistence mechanisms instead. This leads to a less efficient storage usage and less possibilities to perform performance optimizations via predicate pushdown. ▸ In contrast to Spark-TS Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation ▸ Chronix provides multi-dimensional time series as this is very useful for data warehousing and APM. ▸ Chronix has support for Datasets as this will be an important Spark API in the near future. But Chronix currently doesn’t support an IndexedRowMatrix for SparkML. ▸ Chronix is purely written in Java. There is no explicit support for Python and Scala yet. ▸ Chronix doesn not support a ZonedTime as this makes it way more complicated.
  • 61. 61 01 CHRONIXRDD: GET THE CHUNKS FROM SOLR public ChronixRDD queryChronixChunks(
 final SolrQuery query,
 final String zkHost,
 final String collection,
 final ChronixSolrCloudStorage<MetricTimeSeries> chronixStorage) throws SolrServerException, IOException {
 
 // first get a list of replicas to query for this collection
 List<String> shards = chronixStorage.getShardList(zkHost, collection);
 
 // parallelize the requests to the shards
 JavaRDD<MetricTimeSeries> docs = jsc.parallelize(shards, shards.size()).flatMap(
 (FlatMapFunction<String, MetricTimeSeries>) shardUrl -> chronixStorage.streamFromSingleNode(
 new KassiopeiaSimpleConverter(), shardUrl, query)::iterator);
 return new ChronixRDD(docs);
 } Figure out all Solr shards (using CloudSolrClient in the background) Query each shard in parallel and convert SolrDocuments to MetricTimeSeries
  • 62. 62 01 BINARY PROTOCOL WITH STANDARD SOLR CLIENT private Stream<MetricTimeSeries> streamWithHttpSolrClient(String shardUrl,
 SolrQuery query,
 TimeSeriesConverter<MetricTimeSeries> converter) {
 HttpSolrClient solrClient = getSingleNodeSolrClient(shardUrl);
 solrClient.setRequestWriter(new BinaryRequestWriter());
 query.set("distrib", false);
 SolrStreamingService<MetricTimeSeries> solrStreamingService = 
 new SolrStreamingService<>(converter, query, solrClient, nrOfDocumentPerBatch);
 return StreamSupport.stream(
 Spliterators.spliteratorUnknownSize(solrStreamingService, Spliterator.SIZED), false);
 } Use HttpSolrClient pinned to one shard Use binary (request)
 protocol Boilerplate code to stream response
  • 63. 63 private Stream<MetricTimeSeries> 
 streamWithCloudSolrStream(String zkHost, String collection, String shardUrl, SolrQuery query,
 TimeSeriesConverter<MetricTimeSeries> converter) throws IOException {
 Map params = new HashMap();
 params.put("q", query.getQuery());
 params.put("sort", "id asc");
 params.put("shards", extractShardIdFromShardUrl(shardUrl));
 params.put("fl",
 Schema.DATA + ", " + Schema.ID + ", " + Schema.START + ", " + Schema.END +
 ", metric, host, measurement, process, ag, group");
 params.put("qt", "/export");
 params.put("distrib", false);
 
 CloudSolrStream solrStream = new CloudSolrStream(zkHost, collection, params);
 solrStream.open();
 SolrTupleStreamingService tupStream = new SolrTupleStreamingService(solrStream, converter);
 return StreamSupport.stream( Spliterators.spliteratorUnknownSize(tupStream, Spliterator.SIZED), false);
 } EXPORT HANDLER PROTOCOL Pin query to one shard Use export request handler Boilerplate code to stream response
  • 64. 64 01 CHRONIXRDD: FROM CHUNKS TO TIME SERIES public ChronixRDD joinChunks() {
 JavaPairRDD<MetricTimeSeriesKey, Iterable<MetricTimeSeries>> groupRdd
 = this.groupBy(MetricTimeSeriesKey::new);
 
 JavaPairRDD<MetricTimeSeriesKey, MetricTimeSeries> joinedRdd
 = groupRdd.mapValues((Function<Iterable<MetricTimeSeries>, MetricTimeSeries>) mtsIt -> {
 MetricTimeSeriesOrdering ordering = new MetricTimeSeriesOrdering();
 List<MetricTimeSeries> orderedChunks = ordering.immutableSortedCopy(mtsIt);
 MetricTimeSeries result = null;
 for (MetricTimeSeries mts : orderedChunks) {
 if (result == null) {
 result = new MetricTimeSeries
 .Builder(mts.getMetric())
 .attributes(mts.attributes()).build();
 }
 result.addAll(mts.getTimestampsAsArray(), mts.getValuesAsArray());
 }
 return result;
 });
 
 JavaRDD<MetricTimeSeries> resultJavaRdd =
 joinedRdd.map((Tuple2<MetricTimeSeriesKey, MetricTimeSeries> mtTuple) -> mtTuple._2);
 
 return new ChronixRDD(resultJavaRdd); } group chunks according identity join chunks to
 logical time 
 series