Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
November 2017
Who am I
User and contributor to Spark since 0.9,
Cassandra since 0.6
Created Spark Job Server and FiloDB
Talks at Spark Summit, Cassandra Summit, Strata,
Scala Days, etc.
http://velvia.github.io/
Why Build a New
Streaming Database?
Needs
• Ingest HUGE streams of events (IoT etc.)
• Real-time, low latency, and somewhat flexible queries
  • Dashboards, quick answers on new data
• Flexible schemas and query patterns
• Keep your streaming pipeline super simple
  • Streaming = hardest to debug. Simplicity rules!
[Diagram: Events, Message Queue, Stream Processing Layer, State / Database, Happy Users]
Spark + HDFS Streaming
[Diagram: Kafka -> Spark Streaming -> many small files (microbatches) -> dedup / consolidate job -> larger efficient files]
• High latency
• Big impedance mismatch between streaming
systems and a file system designed for big blobs
of data
Cassandra?
• Ingest HUGE streams of events (IoT etc.)
  • C* is not efficient for writing raw events
• Real-time, low latency, and somewhat flexible queries
  • C* is real-time, but only low latency for simple
    lookups. Add Spark => much higher latency
• Flexible schemas and query patterns
  • C* only handles simple lookups
Introducing FiloDB
A distributed, columnar time-series/event database.
Built for streaming.
http://www.github.com/filodb/FiloDB
[Diagram: Events; Message Queue; Spark Streaming; Cassandra (short term storage, K-V); FiloDB (events, ad-hoc, batch); Spark (ad hoc, SQL, ML); dashboards, maps]
100% Reactive
• Scala
• Akka Cluster
• Spark
• Monix / Reactive Streams
• Typesafe Config for all configuration
• Scodec, Ficus, Enumeratum, Scalactic, etc.
• Even most of the performance critical parts are written in Scala :)
Scala, Akka, and
Spark for Database
Why use Scala and Akka?
• Akka Cluster!
• Just the right abstractions - streams, futures,
Akka, type safety….
• Failure handling and supervision are critical for
databases
• All the pattern matching and immutable goodness :)
Scala Big Data Projects
• Spark
• GeoMesa
• Khronus - Akka time-series DB
• Sirius - Akka distributed KV Store
• FiloDB!
Actors vs Futures vs
Observables
One FiloDB Node
[Diagram: NodeCoordinatorActor (NCA); two DatasetCoordinatorActors (DsCA); Active MemTable; Flushing MemTable; Reprojector; ColumnStore; label: Data, commands]
Akka vs Futures
[Diagram: NodeCoordinatorActor (NCA); two DatasetCoordinatorActors (DsCA); Active MemTable; Flushing MemTable; Reprojector; ColumnStore; label: Data, commands. Annotations: "Akka - control flow" and "Core I/O - Futures/Observables"]
Akka vs Futures
• Akka Actors:
  • External FiloDB node API (remote + cluster)
  • Async messaging with clients
  • Cluster/distributed state management
• Futures and Observables:
  • Core I/O
  • Columnar data processing / ingestion
  • Type-safe processing stages
Futures for Single Actions
/**
 * Clears all data from the column store for that given projection, for all versions.
 * More like a truncation, not a drop.
 * NOTE: please make sure there are no reprojections or writes going on before calling this
 */
def clearProjectionData(projection: Projection): Future[Response]

/**
 * Completely and permanently drops the dataset from the column store.
 * @param dataset the DatasetRef for the dataset to drop.
 */
def dropDataset(dataset: DatasetRef): Future[Response]

/**
 * Appends the ChunkSets and incremental indices in the segment to the column store.
 * @param segment the ChunkSetSegment to write / merge to the columnar store
 * @param version the version # to write the segment to
 * @return Success. Future.failed(exception) otherwise.
 */
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response]
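A minimal sketch of how single actions that each return a Future can be chained with a for-comprehension. Everything here is hypothetical and uses only the Scala standard library: the `Response` values, the in-memory `store`, and the simplified `appendSegment` / `dropDataset` signatures merely mimic the shape of the methods above and are not FiloDB's implementation.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.collection.concurrent.TrieMap

object FutureActionsSketch {
  // Hypothetical stand-ins for the Response values named on the slide.
  sealed trait Response
  case object Success extends Response
  case object NotApplied extends Response

  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical in-memory "column store": dataset name -> appended segments.
  private val store = TrieMap.empty[String, Vector[String]]

  // One action, one Future: append a segment, or report NotApplied if it is empty.
  def appendSegment(dataset: String, segment: String): Future[Response] = Future {
    if (segment.isEmpty) NotApplied
    else {
      // Read-then-write is not atomic; acceptable for this sequential sketch.
      store.put(dataset, store.getOrElse(dataset, Vector.empty) :+ segment)
      Success
    }
  }

  def dropDataset(dataset: String): Future[Response] = Future {
    store.remove(dataset)
    Success
  }

  def segments(dataset: String): Vector[String] = store.getOrElse(dataset, Vector.empty)

  // Each step starts only after the previous Future has completed.
  def appendThenDrop(dataset: String): Future[(Response, Response, Response)] =
    for {
      first  <- appendSegment(dataset, "seg-0")
      second <- appendSegment(dataset, "")
      third  <- dropDataset(dataset)
    } yield (first, second, third)

  // Blocking is only for the demo; real callers would keep composing Futures.
  def runDemo(dataset: String): (Response, Response, Response) =
    Await.result(appendThenDrop(dataset), 5.seconds)
}
```

Calling `FutureActionsSketch.runDemo("demo")` runs the three actions in order and returns their responses as a tuple.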
Monix / Reactive Streams
• http://monix.io
• “observable sequences that are exposed as
asynchronous streams, expanding on the
observer pattern, strongly inspired by ReactiveX
and by Scalaz, but designed from the ground up
for back-pressure and made to cleanly interact
with Scala’s standard library, compatible out-of-the-box
with the Reactive Streams protocol”
• Much better than Future[Iterator[_]]
Monix / Reactive Streams
def readChunks(projection: RichProjection,
               columns: Seq[Column],
               version: Int,
               partMethod: PartitionScanMethod,
               chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
  scanPartitions(projection, version, partMethod)
    // Partitions to pipeline of single chunks
    .flatMap { partIndex =>
      stats.incrReadPartitions(1)
      readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
    // Collate single chunks to ChunkSetReaders
    }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
    .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
}
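The scan-then-collect collation in `readChunks` can be tried without Monix. The sketch below is an assumption-laden stand-in: a plain `Iterator` replaces the `Observable` (so no back-pressure or asynchrony), and `Chunk` and `Agg` are hypothetical types that only imitate an aggregator with `add` and an "emit when complete" state. It also assumes all chunks of one chunk id arrive together.

```scala
object CollateSketch {
  // Hypothetical single chunk: one column's data for a given chunk id.
  final case class Chunk(chunkId: Int, column: String, bytes: Int)

  // Hypothetical aggregator: collects chunks until every expected column is present.
  final case class Agg(expected: Set[String],
                       pending: Map[String, Chunk] = Map.empty,
                       ready: Option[Map[String, Chunk]] = None) {
    def add(c: Chunk): Agg = {
      val next = pending + (c.column -> c)
      if (next.keySet == expected) copy(pending = Map.empty, ready = Some(next)) // complete set
      else copy(pending = next, ready = None)                                    // keep gathering
    }
  }

  // scanLeft threads the aggregator through the stream; collect keeps only the
  // states that hold a complete set, mirroring the `canEmit` / `emit()` step.
  def collate(columns: Set[String], chunks: Iterator[Chunk]): List[Map[String, Chunk]] =
    chunks
      .scanLeft(Agg(columns))(_ add _)
      .collect { case Agg(_, _, Some(set)) => set }
      .toList
}
```

A trailing incomplete set is simply never emitted, which is the same behaviour a `collect` over aggregator states gives in the streaming version.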
Functional Reactive Stream
Processing
• Ingest stream merged with flush commands
• Built in async/parallel tasks via mapAsync
• Notify on end of stream, errors
val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
combinedStream.map {
    case SomeData(records) =>
      shard.ingest(records)
      None
    case FlushCommand(group) =>
      shard.switchGroupBuffers(group)
      Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
  }.collect { case Some(flushGroup) => flushGroup }
  .mapAsync(numParallelFlushes)(shard.createFlushTask _)
  .foreach { x => }
  .recover { case ex: Exception => errHandler(ex) }
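To see why merging data and flush commands into one ordered stream is useful, here is a small sketch with a plain `Iterator` in place of the Monix stream. The `Shard` class is hypothetical and only counts records; the event and `FlushGroup` case classes borrow their names from the slide but their fields are assumptions. The point it shows: each flush command observes exactly the offset reached by the data that preceded it.

```scala
object FlushStreamSketch {
  sealed trait Event
  final case class SomeData(records: Seq[String]) extends Event
  final case class FlushCommand(group: Int) extends Event

  final case class FlushGroup(shardNum: Int, group: Int, latestOffset: Long)

  // Hypothetical shard: "ingesting" just advances an offset counter.
  final class Shard(val shardNum: Int) {
    private var offset = 0L
    def ingest(records: Seq[String]): Unit = offset += records.size
    def latestOffset: Long = offset
  }

  // Data events produce None, flush commands produce Some(FlushGroup);
  // collect then keeps only the flush groups, in stream order.
  def flushGroups(shard: Shard, events: Iterator[Event]): List[FlushGroup] =
    events.map {
      case SomeData(records) =>
        shard.ingest(records)
        None
      case FlushCommand(group) =>
        Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
    }.collect { case Some(flushGroup) => flushGroup }.toList
}
```

Because the iterator is consumed one element at a time, ingestion and flush decisions interleave in arrival order, with no locking between them.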
Akka Cluster and
Spark
Spark/Akka Cluster Setup
[Diagram: Driver (NodeClusterActor, Client); two Executors, each with NCA, DsCA1, DsCA2]
Adding one executor
[Diagram: Driver (NodeClusterActor, Client); executor1 (NCA, DsCA1, DsCA2); State: Executors -> (executor1); labels: MemberUp, ActorSelection, ActorRef]
Adding second executor
[Diagram: Driver (NodeClusterActor, Client); executor1 and executor2, each with NCA, DsCA1, DsCA2; State: Executors -> (executor1, executor2); labels: MemberUp, ActorSelection, ActorRef]
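The "State: Executors -> (...)" bookkeeping on these slides can be pictured as a fold over membership events. This is only a sketch using plain case classes: `MemberUp` and `MemberRemoved` here are hypothetical stand-ins that carry an executor name, not the Akka cluster event classes, and the real NodeClusterActor presumably tracks actor references rather than strings.

```scala
object MembershipSketch {
  // Hypothetical events, named after the labels on the slide.
  sealed trait ClusterEvent
  final case class MemberUp(executor: String) extends ClusterEvent
  final case class MemberRemoved(executor: String) extends ClusterEvent

  // Immutable state: the executors currently known, in join order.
  final case class State(executors: Vector[String] = Vector.empty) {
    def updated(ev: ClusterEvent): State = ev match {
      case MemberUp(e) if executors.contains(e) => this // duplicate join is ignored
      case MemberUp(e)                          => copy(executors :+ e)
      case MemberRemoved(e)                     => copy(executors.filterNot(_ == e))
    }
  }

  // Replaying the event log yields the current membership.
  def replay(events: Seq[ClusterEvent]): State =
    events.foldLeft(State())(_ updated _)
}
```

An actor would hold one `State` value and replace it on each message, which keeps the update logic testable without starting a cluster.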
Sending a command
[Diagram: Driver (NodeClusterActor, Client); two Executors, each with NCA, DsCA1, DsCA2; command shown: Flush()]
Yes, Akka in Spark
• Columnar ingestion is stateful - need stickiness of
state. This is inherently difficult in Spark.
• Akka (cluster) gives us a separate, asynchronous
control channel to talk to FiloDB ingestors
• Spark only gives data flow primitives, not async
messaging
• We need to route incoming records to the correct
ingestion node. Sorting data is inefficient and forces
all nodes to wait for sorting to be done.
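Routing records to the correct ingestion node, without a global sort, might look roughly like the sketch below. The hashing scheme, the `PartitionMap` layout, and the node names are all assumptions made for illustration; the slides do not say how FiloDB assigns partitions to nodes.

```scala
object RoutingSketch {
  final case class Record(partitionKey: String, payload: String)

  // Hypothetical partition map: a fixed number of shards spread over the known nodes.
  final case class PartitionMap(nodes: Vector[String], numShards: Int) {
    require(nodes.nonEmpty && numShards > 0, "need at least one node and one shard")

    // floorMod keeps the shard number non-negative even for negative hash codes.
    def shardOf(key: String): Int = java.lang.Math.floorMod(key.hashCode, numShards)
    def nodeOf(shard: Int): String = nodes(shard % nodes.size)
    def route(r: Record): String = nodeOf(shardOf(r.partitionKey))
  }

  // Each record is routed independently, so no node waits on a sort of the whole batch.
  def groupByNode(pm: PartitionMap, records: Seq[Record]): Map[String, Seq[Record]] =
    records.groupBy(r => pm.route(r))
}
```

Records with the same partition key always land on the same node, which is the "stickiness of state" that stateful columnar ingestion needs.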
Data Ingestion Setup
[Diagram: two Executors, each with NCA, DsCA1, DsCA2 and tasks task0, task1; a Row Source Actor per task; Node Cluster Actor; Partition Map]
FiloDB separate nodes
[Diagram: two boxes labeled FiloDB Node; two Executors; NCA, DsCA1, DsCA2; tasks task0, task1; a Row Source Actor per task; Node Cluster Actor; Partition Map]
Testing Akka Cluster
• MultiNodeSpec / sbt-multi-jvm
• NodeClusterSpec
• Tests joining of different cluster nodes and
partition map updates
• Is partition map updated properly if a cluster
node goes down — inject network failures
• Lessons
Kamon Tracing
• http://kamon.io
• One trace can encapsulate multiple Future steps
all executing on different threads
• Tunable tracing levels
• Summary stats and histograms for segments
• Super useful for production debugging of reactive
stack
Kamon Tracing
def appendSegment(projection: RichProjection,
                  segment: ChunkSetSegment,
                  version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
  val ctx = Tracer.currentContext
  stats.segmentAppend()
  if (segment.chunkSets.isEmpty) {
    stats.segmentEmpty()
    return(Future.successful(NotApplied))
  }
  for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
        writeIndexResp  <- writeIndices(projection, version, segment, ctx)
        if writeChunksResp == Success
  } yield {
    ctx.finish()
    writeIndexResp
  }
}

private def writeChunks(dataset: DatasetRef,
                        version: Int,
                        segment: ChunkSetSegment,
                        ctx: TraceContext): Future[Response] = {
  asyncSubtrace(ctx, "write-chunks", "ingestion") {
    val binPartition = segment.binaryPartition
    val segmentId = segment.segmentId
    val chunkTable = getOrCreateChunkTable(dataset)
    Future.traverse(segment.chunkSets) { chunkSet =>
      chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
    }.map { responses => responses.head }
  }
}
Kamon Metrics
• Uses HDRHistogram for much finer and more
accurate buckets
• Built-in metrics for Akka actors, Spray, Akka-Http,
Play, etc. etc.
KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
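For readers unfamiliar with the min / p50 / p90 / p99 / max fields in the lines above, the sketch below computes the same kind of summary from raw samples using the nearest-rank percentile definition. It is deliberately naive: it sorts every sample, whereas HDRHistogram records values into buckets, so this is not how Kamon computes its numbers. The output format is a simplified imitation.

```scala
object PercentileSketch {
  /** Nearest-rank percentile over already sorted samples; p must be in (0, 100]. */
  def percentile(sorted: IndexedSeq[Long], p: Double): Long = {
    require(sorted.nonEmpty && p > 0 && p <= 100, "need samples and 0 < p <= 100")
    val rank = math.ceil(p * sorted.size / 100.0).toInt // 1-based rank
    sorted(rank - 1)
  }

  // Formats one summary line in a shape loosely modeled on the log lines above.
  def summary(name: String, samples: Seq[Long]): String = {
    require(samples.nonEmpty, "need at least one sample")
    val s = samples.sorted.toIndexedSeq
    s"trace name=$name n=${s.size} min=${s.head} " +
      s"p50=${percentile(s, 50)} p90=${percentile(s, 90)} p99=${percentile(s, 99)} max=${s.last}"
  }
}
```

Reading the real output the same way: for append-segment, half of the 2863 traces finished at or below the p50 value, and only about one in a hundred exceeded the p99 value.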
Validation: Scalactic
private def getColumnsFromNames(allColumns: Seq[Column],
                                columnNames: Seq[String]): Seq[Column] Or BadSchema = {
  if (columnNames.isEmpty) {
    Good(allColumns)
  } else {
    val columnMap = allColumns.map { c => c.name -> c }.toMap
    val missing = columnNames.toSet -- columnMap.keySet
    if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
    else                  { Good(columnNames.map(columnMap)) }
  }
}

for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
      dataColumns <- getColumnsFromNames(columns, normProjection.columns)
      richColumns = dataColumns ++ computedColumns
      // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
      segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
      keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
      partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
yield {
• Notice how multiple validations compose!
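The way validations compose can be reproduced with the standard library's `Either`, which behaves like Scalactic's `Or` for this purpose but with the error on the left. The sketch below is not the FiloDB code: `Column`, `Projection`, and the error type are simplified, hypothetical versions chosen so the example runs without Scalactic.

```scala
object ValidationSketch {
  final case class Column(name: String, colType: String)

  sealed trait BadSchema
  final case class MissingColumnNames(missing: Seq[String], kind: String) extends BadSchema

  // Hypothetical result of a fully validated schema.
  final case class Projection(rowKey: Seq[Column], partition: Seq[Column])

  // One validation step: resolve names to columns or report which ones are missing.
  def columnsFromNames(all: Seq[Column],
                       names: Seq[String],
                       kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind))
    else Right(names.map(byName))
  }

  // The for-comprehension stops at the first Left, so later steps run only
  // when every earlier validation succeeded.
  def validate(all: Seq[Column],
               rowNames: Seq[String],
               partNames: Seq[String]): Either[BadSchema, Projection] =
    for {
      row  <- columnsFromNames(all, rowNames, "row")
      part <- columnsFromNames(all, partNames, "partition")
    } yield Projection(row, part)
}
```

The caller gets either a complete `Projection` or the first schema error, with no exceptions and no nested if/else.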
Machine-Speed Scala
How do you go REALLY fast?
• Don’t serialize
• Don’t allocate
• Don’t copy
Filo fast
• Filo binary vectors - 2 billion records/sec
• Spark InMemoryColumnStore - 125 million
records/sec
• Spark CassandraColumnStore - 25 million
records/sec
Filo: High Performance
Binary Vectors
• Designed for NoSQL, not a file format
• random or linear access
• on or off heap
• missing value support
• Scala only, but cross-platform support possible
http://github.com/velvia/filo is a binary data vector library designed
for extreme read performance with minimal deserialization costs.
Billions of Ops / Sec
• JMH benchmark: 0.5ns per FiloVector element access / add
• 2 Billion adds per second - single threaded
• Who said Scala cannot be fast?
• Spark API (row-based) limits performance significantly
val randomInts = (0 until numValues).map(i => util.Random.nextInt)
val randomIntsArray = randomInts.toArray
val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
val sc = FiloVector[Int](filoBuffer)

@Benchmark
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.MICROSECONDS)
def sumAllIntsFiloApply(): Int = {
  var total = 0
  for { i <- 0 until numValues optimized } {
    total += sc(i)
  }
  total
}
JVM Inlining
• Very small methods can be inlined by the JVM
• final def avoids virtual method dispatch.
• Thus methods in traits and abstract classes are not inlinable
0.5ns/read is achieved through a stack of very small methods:

val base = baseReader.readInt(0)
final def apply(i: Int): Int = base + dataReader.read(i)

case (32, _) => new TypedBufferReader[Int] {
  final def read(i: Int): Int = reader.readInt(i)
}

final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
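A self-contained sketch of the same layering: a base value plus per-element deltas, read through a stack of tiny final methods. It substitutes `java.nio.ByteBuffer` for the `unsafe.getInt` call on the slide, and the class names are hypothetical, so it shows the structure rather than Filo's actual readers or their speed.

```scala
import java.nio.ByteBuffer

object InlineSketch {
  // Lowest layer: one small final method that reads a 4-byte int by index.
  final class IntReader(buf: ByteBuffer) {
    final def readInt(i: Int): Int = buf.getInt(i * 4) // absolute read, no position change
  }

  // Next layer: element i is the shared base plus a stored delta.
  final class DeltaIntVector(base: Int, deltas: IntReader, val length: Int) {
    final def apply(i: Int): Int = base + deltas.readInt(i)
  }

  // Builds the vector from plain ints. Assumes max - min fits in an Int.
  def build(values: Array[Int]): DeltaIntVector = {
    require(values.nonEmpty, "need at least one value")
    val base = values.min
    val buf = ByteBuffer.allocate(values.length * 4)
    values.foreach(v => buf.putInt(v - base))
    new DeltaIntVector(base, new IntReader(buf), values.length)
  }

  // A while loop keeps the hot path free of closures and boxing.
  def sum(v: DeltaIntVector): Long = {
    var total = 0L
    var i = 0
    while (i < v.length) {
      total += v(i)
      i += 1
    }
    total
  }
}
```

Each method body is a single expression on concrete final classes, which is the shape that gives the JIT the best chance to inline the whole call chain.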
BinaryRecord
• Tough problem: FiloDB must handle many
different datasets, each with different schemas
• Cannot rely on static types and standard
serialization mechanisms - case classes,
Protobuf, etc.
• Serialization very costly, especially strings
• Solution: BinaryRecord
BinaryRecord II
• BinaryRecord is a binary (i.e., transport-ready) record
class that supports any schema or mix of column
types
• Values can be extracted or written with no serialization
cost
• UTF8-encoded string class
• String compare as fast as native Java strings
• Immutable API once built
Use Case: Sorting
• Regular sorting: deserialize record, create sort
key, compare sort key
• BinaryRecord sorting: binary compare fields
directly — no deserialization, no object allocations
Regular Sorting
[Diagram: two Protobuf/Avro etc. records, each turned into a deserialized instance and then a Sort Key; the two Sort Keys are compared (Cmp)]
BinaryRecord Sorting
• BinaryRecord sorting: binary compare fields
directly — no deserialization, no object allocations
[Diagram: two BinaryRecords, each with fields name: Str, age: Int, lastTimestamp: Long, group: Str]
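Comparing fields directly in their binary form might look like the sketch below. The byte layout (`age`, then a length-prefixed UTF-8 `name`) is invented for this example and is not the BinaryRecord format. The sketch still wraps arrays in `ByteBuffer` views for readability, which a tuned implementation would avoid, and unsigned byte order over UTF-8 matches code point order rather than `String.compareTo` in every case.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object BinaryCompareSketch {
  // Hypothetical layout: [age: Int, 4 bytes][nameLen: Int, 4 bytes][name: UTF-8 bytes]
  private val NameOffset = 8

  def encode(name: String, age: Int): Array[Byte] = {
    val n = name.getBytes(UTF_8)
    ByteBuffer.allocate(NameOffset + n.length).putInt(age).putInt(n.length).put(n).array()
  }

  // Only used to inspect results; sorting itself never calls this.
  def decodeName(rec: Array[Byte]): String =
    new String(rec, NameOffset, ByteBuffer.wrap(rec).getInt(4), UTF_8)

  // Sort by age, then by name, reading both fields straight from the bytes.
  def compareRecords(a: Array[Byte], b: Array[Byte]): Int = {
    val ba = ByteBuffer.wrap(a)
    val bb = ByteBuffer.wrap(b)
    val byAge = Integer.compare(ba.getInt(0), bb.getInt(0))
    if (byAge != 0) byAge
    else compareBytes(a, ba.getInt(4), b, bb.getInt(4))
  }

  // Unsigned lexicographic comparison of the two name byte ranges.
  private def compareBytes(a: Array[Byte], aLen: Int, b: Array[Byte], bLen: Int): Int = {
    val n = math.min(aLen, bLen)
    var i = 0
    while (i < n) {
      val c = (a(NameOffset + i) & 0xff) - (b(NameOffset + i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    aLen - bLen
  }

  def sortRecords(records: Seq[Array[Byte]]): Seq[Array[Byte]] =
    records.sortWith(compareRecords(_, _) < 0)
}
```

No record is turned back into a case class or a `String` during the sort; the comparison touches only the bytes of the fields it needs.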
SBT-JMH
• Super useful tool to leverage JMH, the best micro
benchmarking harness
• JMH is written by the JDK folks
In Summary
• Scala, Akka, reactive can give you both awesome
abstractions AND performance
• Use Akka for distribution, state, protocols
• Use reactive/Monix for functional, concurrent
stream processing
• Build (or use FiloDB’s) fast low-level abstractions
with good APIs
Thank you Scala OSS!

Mais conteúdo relacionado

Destaque

Introduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFVIntroduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFV
Kingston Smiler
 
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
Azamat Abdoullaev
 

Destaque (20)

Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
Linux Security APIs and the Chromium Sandbox (SwedenCpp Meetup 2017)
 
What in the World is Going on at The Linux Foundation?
What in the World is Going on at The Linux Foundation?What in the World is Going on at The Linux Foundation?
What in the World is Going on at The Linux Foundation?
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
 
Advanced memory allocation
Advanced memory allocationAdvanced memory allocation
Advanced memory allocation
 
Docker Networking
Docker NetworkingDocker Networking
Docker Networking
 
Virtualization
VirtualizationVirtualization
Virtualization
 
Server virtualization
Server virtualizationServer virtualization
Server virtualization
 
Go Execution Tracer
Go Execution TracerGo Execution Tracer
Go Execution Tracer
 
In-depth forensic analysis of Windows registry files
In-depth forensic analysis of Windows registry filesIn-depth forensic analysis of Windows registry files
In-depth forensic analysis of Windows registry files
 
SDN Architecture & Ecosystem
SDN Architecture & EcosystemSDN Architecture & Ecosystem
SDN Architecture & Ecosystem
 
OpenFlow
OpenFlowOpenFlow
OpenFlow
 
Deep dive into Coroutines on JVM @ KotlinConf 2017
Deep dive into Coroutines on JVM @ KotlinConf 2017Deep dive into Coroutines on JVM @ KotlinConf 2017
Deep dive into Coroutines on JVM @ KotlinConf 2017
 
Network Virtualization
Network VirtualizationNetwork Virtualization
Network Virtualization
 
Introduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFVIntroduction to OpenFlow, SDN and NFV
Introduction to OpenFlow, SDN and NFV
 
Scaling and Transaction Futures
Scaling and Transaction FuturesScaling and Transaction Futures
Scaling and Transaction Futures
 
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
2018 State of the Union Address: Rediscovering the American way: USA XXI: Fut...
 
Blockchain demystification
Blockchain demystificationBlockchain demystification
Blockchain demystification
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
Jörg Schad - NO ONE PUTS Java IN THE CONTAINER - Codemotion Milan 2017
 
Andrea Tosatto - Kubernetes Beyond - Codemotion Milan 2017
Andrea Tosatto - Kubernetes Beyond - Codemotion Milan 2017Andrea Tosatto - Kubernetes Beyond - Codemotion Milan 2017
Andrea Tosatto - Kubernetes Beyond - Codemotion Milan 2017
 

Mais de Evan Chan

Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server Talk
Evan Chan
 

Mais de Evan Chan (17)

Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Designing Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and KubernetesDesigning Stateful Apps for Cloud and Kubernetes
Designing Stateful Apps for Cloud and Kubernetes
 
Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019Histograms at scale - Monitorama 2019
Histograms at scale - Monitorama 2019
 
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at ScaleFiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
FiloDB: Reactive, Real-Time, In-Memory Time Series at Scale
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
MIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data ArchitectureMIT lecture - Socrata Open Data Architecture
MIT lecture - Socrata Open Data Architecture
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server Talk
 
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
Spark Job Server and Spark as a Query Engine (Spark Meetup 5/14)
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
 
Real-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and SharkReal-time Analytics with Cassandra, Spark, and Shark
Real-time Analytics with Cassandra, Spark, and Shark
 

Último

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 

2017 High Performance Database with Scala, Akka, Spark

  • 1. Building a High- Performance Database with Scala, Akka, and Spark Evan Chan November 2017
  • 2. Who am I User and contributor to Spark since 0.9, Cassandra since 0.6 Created Spark Job Server and FiloDB Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc. http://velvia.github.io/
  • 3. Why Build a New Streaming Database?
  • 4. Needs • Ingest HUGE streams of events — IoT etc. • Real-time, low latency, and somewhat flexible queries • Dashboards, quick answers on new data • Flexible schemas and query patterns • Keep your streaming pipeline super simple • Streaming = hardest to debug. Simplicity rules!
  • 6. Spark + HDFS Streaming Kafka Spark Streaming Many small files (microbatches) Dedup, consolidate job Larger efficient files • High latency • Big impedance mismatch between streaming systems and a file system designed for big blobs of data
  • 7. Cassandra? • Ingest HUGE streams of events — IoT etc. • C* is not efficient for writing raw events • Real-time, low latency, and somewhat flexible queries • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency • Flexible schemas and query patterns • C* only handles simple lookups
  • 8. Introducing FiloDB A distributed, columnar time-series/event database. Built for streaming. http://www.github.com/filodb/FiloDB
  • 9. Message Queue Events Spark Streaming Short term storage, K-V Adhoc, SQL, ML Cassandra FiloDB: Events, ad-hoc, batch Spark Dashboa rds, maps
  • 10. 100% Reactive • Scala • Akka Cluster • Spark • Monix / Reactive Streams • Typesafe Config for all configuration • Scodec, Ficus, Enumeratum, Scalactic, etc. • Even most of the performance critical parts are written in Scala :)
  • 11. Scala, Akka, and Spark for Database
  • 12. Why use Scala and Akka? • Akka Cluster! • Just the right abstractions - streams, futures, Akka, type safety…. • Failure handling and supervision are critical for databases • All the pattern matching and immutable goodness :)
  • 13. Scala Big Data Projects • Spark • GeoMesa • Khronus - Akka time-series DB • Sirius - Akka distributed KV Store • FiloDB!
  • 14. Actors vs Futures vs Observables
  • 16. Akka vs Futures NodeCoordinatorActor (NCA) DatasetCoordinatorActor (DsCA) DatasetCoordinatorActor (DsCA) Active MemTable Flushing MemTable Reprojector ColumnStore Data, commands Akka - control flow Core I/O - Futures/Observables
  • 17. Akka vs Futures • Akka Actors: • External FiloDB node API (remote + cluster) • Async messaging with clients • Cluster/distributed state management • Futures and Observables: • Core I/O • Columnar data processing / ingestion • Type-safe processing stages
  • 18. Futures for Single Actions /** * Clears all data from the column store for that given projection, for all versions. * More like a truncation, not a drop. * NOTE: please make sure there are no reprojections or writes going on before calling this */ def clearProjectionData(projection: Projection): Future[Response] /** * Completely and permanently drops the dataset from the column store. * @param dataset the DatasetRef for the dataset to drop. */ def dropDataset(dataset: DatasetRef): Future[Response] /** * Appends the ChunkSets and incremental indices in the segment to the column store. * @param segment the ChunkSetSegment to write / merge to the columnar store * @param version the version # to write the segment to * @return Success. Future.failure(exception) otherwise. */ def appendSegment(projection: RichProjection, segment: ChunkSetSegment, version: Int): Future[Response]
  • 19. Monix / Reactive Streams • http://monix.io • “observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala’s standard library, compatible out-of- the-box with the Reactive Streams protocol” • Much better than Future[Iterator[_]]
  • 20. Monix / Reactive Streams def readChunks(projection: RichProjection, columns: Seq[Column], version: Int, partMethod: PartitionScanMethod, chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = { scanPartitions(projection, version, partMethod) // Partitions to pipeline of single chunks .flatMap { partIndex => stats.incrReadPartitions(1) readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod) // Collate single chunks to ChunkSetReaders }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ } .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() } } }
  • 21. Functional Reactive Stream Processing • Ingest stream merged with flush commands • Built in async/parallel tasks via mapAsync • Notify on end of stream, errors val combinedStream = Observable.merge(stream.map(SomeData), flushStream) combinedStream.map { case SomeData(records) => shard.ingest(records) None case FlushCommand(group) => shard.switchGroupBuffers(group) Some(FlushGroup(shard.shardNum, group, shard.latestOffset)) }.collect { case Some(flushGroup) => flushGroup } .mapAsync(numParallelFlushes)(shard.createFlushTask _) .foreach { x => } .recover { case ex: Exception => errHandler(ex) }
  • 24. Adding one executor [Diagram. Labels: Driver, NodeClusterActor, Client; executor1 with NCA, DsCA1, DsCA2; State: Executors -> (executor1); MemberUp, ActorSelection, ActorRef]
  • 25. Adding second executor [Diagram. Labels: Driver, NodeClusterActor, Client; executor1 with NCA, DsCA1, DsCA2; executor2 with NCA, DsCA1, DsCA2; State: Executors -> (executor1, executor2); MemberUp, ActorSelection, ActorRef]
  • 27. Yes, Akka in Spark • Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark. • Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors • Spark only gives data flow primitives, not async messaging • We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
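The routing point above can be sketched without any sorting: each record is mapped independently to a shard and then to a node. The sketch below assumes a simple hash-based scheme. `Record`, `PartitionMap`, `shardFor`, and `nodeFor` are hypothetical names for illustration, and FiloDB's real partition map may assign shards differently.

```scala
object RoutingSketch {
  final case class Record(partitionKey: String, value: Double)

  // Hypothetical partition map: partition key -> shard -> owning node.
  final case class PartitionMap(nodes: Vector[String], numShards: Int) {
    // floorMod keeps the shard number non-negative even for negative hash codes.
    def shardFor(key: String): Int = Math.floorMod(key.hashCode, numShards)
    def nodeFor(key: String): String = nodes(shardFor(key) % nodes.size)
  }

  // Each record is routed on its own, so no node waits for a global sort.
  def route(records: Seq[Record], pm: PartitionMap): Map[String, Seq[Record]] =
    records.groupBy(r => pm.nodeFor(r.partitionKey))
}
```

Because the target node depends only on the record's own key, all records for one partition land on the same node, which is the stickiness that stateful columnar ingestion needs.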
  • 28. Data Ingestion Setup [Diagram. Labels: two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 29. FiloDB separate nodes [Diagram. Labels: two FiloDB Nodes; two Executors, each with NCA, DsCA1, DsCA2, task0, task1, and two Row Source Actors; Node Cluster Actor; Partition Map]
  • 30. Testing Akka Cluster • MultiNodeSpec / sbt-multi-jvm • NodeClusterSpec • Tests joining of different cluster nodes and partition map updates • Is partition map updated properly if a cluster node goes down — inject network failures • Lessons
  • 31. Kamon Tracing • http://kamon.io • One trace can encapsulate multiple Future steps all executing on different threads • Tunable tracing levels • Summary stats and histograms for segments • Super useful for production debugging of reactive stack
  • 32. Kamon Tracing
      def appendSegment(projection: RichProjection,
                        segment: ChunkSetSegment,
                        version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
        val ctx = Tracer.currentContext
        stats.segmentAppend()
        if (segment.chunkSets.isEmpty) {
          stats.segmentEmpty()
          return(Future.successful(NotApplied))
        }
        for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
              writeIndexResp  <- writeIndices(projection, version, segment, ctx)
                                 if writeChunksResp == Success
        } yield {
          ctx.finish()
          writeIndexResp
        }
      }

      private def writeChunks(dataset: DatasetRef,
                              version: Int,
                              segment: ChunkSetSegment,
                              ctx: TraceContext): Future[Response] = {
        asyncSubtrace(ctx, "write-chunks", "ingestion") {
          val binPartition = segment.binaryPartition
          val segmentId = segment.segmentId
          val chunkTable = getOrCreateChunkTable(dataset)
          Future.traverse(segment.chunkSets) { chunkSet =>
            chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
          }.map { responses => responses.head }
        }
      }
  • 33. Kamon Metrics • Uses HDRHistogram for much finer and more accurate buckets • Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.
      KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
      KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
      KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
  • 34. Validation: Scalactic • Notice how multiple validations compose!
      private def getColumnsFromNames(allColumns: Seq[Column],
                                      columnNames: Seq[String]): Seq[Column] Or BadSchema = {
        if (columnNames.isEmpty) {
          Good(allColumns)
        } else {
          val columnMap = allColumns.map { c => c.name -> c }.toMap
          val missing = columnNames.toSet -- columnMap.keySet
          if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
          else                  { Good(columnNames.map(columnMap)) }
        }
      }

      for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
            dataColumns <- getColumnsFromNames(columns, normProjection.columns)
            richColumns = dataColumns ++ computedColumns
            // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
            segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
            keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
            partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition")
      } yield {
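To show the same fail-fast composition in a runnable form, the sketch below swaps Scalactic's `Or` for the standard library's `Either`, so it needs no extra dependency. `Column`, `WrongType`, `requireType`, and `validate` are hypothetical names made up for this example; only the overall shape (each step returns a good value or a schema error, and a for-comprehension chains them) follows the slide.

```scala
object ValidationSketch {
  final case class Column(name: String, colType: String)

  sealed trait BadSchema
  final case class MissingColumnNames(missing: Seq[String], where: String) extends BadSchema
  final case class WrongType(column: String, expected: String) extends BadSchema

  // Left carries the error, Right the validated value.
  def columnsFromNames(all: Seq[Column], names: Seq[String]): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, "projection"))
    else Right(names.map(byName))
  }

  def requireType(col: Column, expected: String): Either[BadSchema, Column] =
    if (col.colType == expected) Right(col) else Left(WrongType(col.name, expected))

  // The first Left short-circuits the rest of the comprehension.
  def validate(all: Seq[Column],
               names: Seq[String],
               timeCol: String): Either[BadSchema, (Seq[Column], Column)] =
    for {
      cols <- columnsFromNames(all, names)
      ts   <- columnsFromNames(all, Seq(timeCol)).map(_.head)
      ok   <- requireType(ts, "long")
    } yield (cols, ok)
}
```

Like the `Or`-based code on the slide, this version stops at the first failing step rather than collecting every error.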
  • 36. How do you go REALLY fast? • Don’t serialize • Don’t allocate • Don’t copy
  • 37. Filo fast • Filo binary vectors - 2 billion records/sec • Spark InMemoryColumnStore - 125 million records/sec • Spark CassandraColumnStore - 25 million records/sec
  • 38. Filo: High Performance Binary Vectors • http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs. • Designed for NoSQL, not a file format • random or linear access • on or off heap • missing value support • Scala only, but cross-platform support possible
  • 39. Billions of Ops / Sec • JMH benchmark: 0.5ns per FiloVector element access / add • 2 Billion adds per second - single threaded • Who said Scala cannot be fast? • Spark API (row-based) limits performance significantly
      val randomInts = (0 until numValues).map(i => util.Random.nextInt)
      val randomIntsAray = randomInts.toArray
      val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
      val sc = FiloVector[Int](filoBuffer)

      @Benchmark
      @BenchmarkMode(Array(Mode.AverageTime))
      @OutputTimeUnit(TimeUnit.MICROSECONDS)
      def sumAllIntsFiloApply(): Int = {
        var total = 0
        for { i <- 0 until numValues optimized } {
          total += sc(i)
        }
        total
      }
  • 40. JVM Inlining • Very small methods can be inlined by the JVM • final def avoids virtual method dispatch. • Thus methods in traits, abstract classes not inlinable • 0.5ns/read is achieved through a stack of very small methods:
      val base = baseReader.readInt(0)
      final def apply(i: Int): Int = base + dataReader.read(i)

      case (32, _) => new TypedBufferReader[Int] {
        final def read(i: Int): Int = reader.readInt(i)
      }

      final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
  • 41. BinaryRecord • Tough problem: FiloDB must handle many different datasets, each with different schemas • Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc. • Serialization very costly, especially strings • Solution: BinaryRecord
  • 42. BinaryRecord II • BinaryRecord is a binary (i.e. transport-ready) record class that supports any schema or mix of column types • Values can be extracted or written with no serialization cost • UTF8-encoded string class • String compare as fast as native Java strings • Immutable API once built
  • 43. Use Case: Sorting • Regular sorting: deserialize record, create sort key, compare sort key • BinaryRecord sorting: binary compare fields directly — no deserialization, no object allocations
  • 44. Regular Sorting [Diagram. For each of two records: Protobuf/Avro etc record -> Deserialized instance -> Sort Key; the two Sort Keys are then compared (Cmp)]
  • 45. BinaryRecord Sorting • BinaryRecord sorting: binary compare fields directly, with no deserialization and no object allocations [Diagram. Two records with the same fields, compared field by field: name: Str, age: Int, lastTimestamp: Long, group: Str]
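A minimal sketch of the idea, assuming a made-up fixed layout with only the two numeric fields (a big-endian `age: Int` at offset 0 and `lastTimestamp: Long` at offset 4). The layout, `encode`, `readInt`, `readLong`, and `byAgeThenTimestamp` are all hypothetical; the real BinaryRecord format also handles strings and arbitrary schemas, which this does not attempt.

```scala
import java.nio.ByteBuffer

object BinarySortSketch {
  // Hypothetical layout: [age: Int (4 bytes)][lastTimestamp: Long (8 bytes)], big-endian.
  val AgeOffset = 0
  val TimestampOffset = 4
  val RecordSize = 12

  def encode(age: Int, lastTimestamp: Long): Array[Byte] = {
    val buf = ByteBuffer.allocate(RecordSize) // ByteBuffer is big-endian by default
    buf.putInt(AgeOffset, age)
    buf.putLong(TimestampOffset, lastTimestamp)
    buf.array()
  }

  // Reads fields straight out of the bytes: no record object is materialized.
  def readInt(bytes: Array[Byte], offset: Int): Int =
    ((bytes(offset) & 0xff) << 24) | ((bytes(offset + 1) & 0xff) << 16) |
      ((bytes(offset + 2) & 0xff) << 8) | (bytes(offset + 3) & 0xff)

  def readLong(bytes: Array[Byte], offset: Int): Long =
    (readInt(bytes, offset).toLong << 32) | (readInt(bytes, offset + 4) & 0xffffffffL)

  // Compares by age, then by lastTimestamp, directly on the encoded bytes.
  val byAgeThenTimestamp: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = {
      val byAge = java.lang.Integer.compare(readInt(a, AgeOffset), readInt(b, AgeOffset))
      if (byAge != 0) byAge
      else java.lang.Long.compare(readLong(a, TimestampOffset), readLong(b, TimestampOffset))
    }
  }
}
```

Sorting is then `records.sorted(BinarySortSketch.byAgeThenTimestamp)`: the comparator reads primitives from the byte arrays and creates no sort key or deserialized instance per comparison.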
  • 46. SBT-JMH • Super useful tool to leverage JMH, the best microbenchmarking harness • JMH is written by the JDK folks
  • 47. In Summary • Scala, Akka, reactive can give you both awesome abstractions AND performance • Use Akka for distribution, state, protocols • Use reactive/Monix for functional, concurrent stream processing • Build (or use FiloDB’s) fast low-level abstractions with good APIs