High performant in-memory datasets with Netflix Hollow
Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018 1
Who am I?
Roberto Perez Alcolea
• Mexican
• Streaming Platform @ Target
• Groovy Enthusiast
• roberto@perezalcolea.info
• @rpalcolea
Dataset Distribution
The problem
• Dissemination of small or moderately sized datasets (not
big data)
Common approaches:
• Sending data to a data store (RDBMS, NoSQL)
• Serializing and keeping a local copy
Is there a solution?
What is Hollow?
• Java library and toolset for disseminating in-memory
datasets from a single producer to many consumers
Goals:
• Maximum development agility
• Highly optimized performance and resource
management
• Extreme stability and reliability
How it works
Key concepts
• Blob / Blob Store
• Snapshot/Delta
• Type
• Deduplication
Key concepts
• State Engine
• Ordinal
    • unique identifier of the record within a type
    • sufficient to locate the record within a type
• Immutability
• Cycle
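As a rough illustration of how ordinals and deduplication relate, here is a minimal conceptual sketch in plain Java. It is not Hollow's implementation or API; `OrdinalTable` is a hypothetical name, and the only assumption taken from the slides is that an ordinal identifies and locates a record within its type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: OrdinalTable is a hypothetical class, not part of Hollow.
class OrdinalTable<T> {
    private final Map<T, Integer> ordinalsByRecord = new HashMap<>();
    private final List<T> recordsByOrdinal = new ArrayList<>();

    // Returns the ordinal of an equal, already stored record, or assigns the next free one.
    int add(T record) {
        Integer existing = ordinalsByRecord.get(record);
        if (existing != null) {
            return existing; // deduplicated: equal content shares one ordinal
        }
        int ordinal = recordsByOrdinal.size();
        recordsByOrdinal.add(record);
        ordinalsByRecord.put(record, ordinal);
        return ordinal;
    }

    // The ordinal alone is sufficient to locate the record within this type.
    T get(int ordinal) {
        return recordsByOrdinal.get(ordinal);
    }

    // Number of distinct records stored.
    int size() {
        return recordsByOrdinal.size();
    }
}
```

Adding the same value twice returns the same ordinal, so a repeated value such as a publisher name would be stored once in this sketch.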
Hollow Producer
• A single machine that retrieves all data from a source of
truth and produces a delta chain
• Encapsulates the details of compacting, publishing,
announcing, validating, and (if necessary) rolling back data
states
Hollow Producer
class MyProducer {
    HollowProducer buildProducer() {
        ...
        HollowProducer
            .withPublisher(publisher)            // required: a BlobPublisher
            .withAnnouncer(announcer)            // optional: an Announcer
            .withValidators(validators)          // optional: one or more Validator
            .withListeners(listeners)            // optional: one or more HollowProducerListeners
            .withBlobStagingDir(dir)             // optional: a java.io.File
            .withBlobCompressor(compressor)      // optional: a BlobCompressor
            .withBlobStager(stager)              // optional: a BlobStager
            .withSnapshotPublishExecutor(e)      // optional: a java.util.concurrent.Executor
            .withNumStatesBetweenSnapshots(n)    // optional: an int
            .withTargetMaxTypeShardSize(size)    // optional: a long
            .build()
    }
}
Hollow Producer - Capabilities
• Restore at Startup
• Rolling Back
• Validating Data
• Compacting Data
Hollow Consumer
• Encapsulates the details of initializing and keeping a dataset
up to date
• At initialization time, loads snapshot
• After initialization time, keeps a local copy of the dataset
current by applying delta transitions
• Each time a new version is announced, triggerRefresh()
should be called on the HollowConsumer
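The snapshot-then-delta behaviour described above can be sketched in plain Java roughly as follows. This is a conceptual illustration under simplifying assumptions (records are a key-to-string map, and a delta lists removed keys plus added or modified entries). `SketchDelta`, `SketchConsumer`, and `triggerRefreshTo` are hypothetical names and do not reflect how Hollow encodes blobs internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only: these classes are hypothetical, not Hollow classes.
class SketchDelta {
    final long fromVersion;
    final long toVersion;
    final Set<Long> removedKeys;
    final Map<Long, String> addedOrModified;

    SketchDelta(long fromVersion, long toVersion,
                Set<Long> removedKeys, Map<Long, String> addedOrModified) {
        this.fromVersion = fromVersion;
        this.toVersion = toVersion;
        this.removedKeys = removedKeys;
        this.addedOrModified = addedOrModified;
    }
}

class SketchConsumer {
    private long currentVersion = Long.MIN_VALUE;
    private final Map<Long, String> data = new HashMap<>();

    // Initialization: load a full snapshot of the dataset.
    void loadSnapshot(long version, Map<Long, String> snapshot) {
        data.clear();
        data.putAll(snapshot);
        currentVersion = version;
    }

    // After initialization: follow the delta chain up to the announced version.
    void triggerRefreshTo(long announcedVersion, List<SketchDelta> deltas) {
        while (currentVersion < announcedVersion) {
            SketchDelta next = null;
            for (SketchDelta d : deltas) {
                if (d.fromVersion == currentVersion
                        && d.toVersion > currentVersion
                        && d.toVersion <= announcedVersion) {
                    next = d;
                    break;
                }
            }
            if (next == null) {
                return; // chain is broken; this sketch simply stops here
            }
            data.keySet().removeAll(next.removedKeys);
            data.putAll(next.addedOrModified);
            currentVersion = next.toVersion;
        }
    }

    long currentVersion() {
        return currentVersion;
    }

    Map<Long, String> data() {
        return Collections.unmodifiableMap(data);
    }
}
```

In this sketch the announcement only carries a version number; the consumer decides which deltas to apply to get there.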
Hollow Consumer
class MyConsumer {
    HollowConsumer buildConsumer() {
        ...
        HollowConsumer
            .withBlobRetriever(blobRetriever)                // required: a BlobRetriever
            .withLocalBlobStore(localDiskDir)                // optional: a local disk location
            .withAnnouncementWatcher(announcementWatcher)    // optional: an AnnouncementWatcher
            .withRefreshListener(refreshListener)            // optional: a RefreshListener
            .withGeneratedAPIClass(MyGeneratedAPI.class)     // optional: a generated client API class
            .withFilterConfig(filterConfig)                  // optional: a HollowFilterConfig
            .withDoubleSnapshotConfig(doubleSnapshotCfg)     // optional: a DoubleSnapshotConfig
            .withObjectLongevityConfig(objectLongevityCfg)   // optional: an ObjectLongevityConfig
            .withObjectLongevityDetector(detector)           // optional: an ObjectLongevityDetector
            .withRefreshExecutor(refreshExecutor)            // optional: an Executor
            .build()
    }
}
Hollow Consumer API Generation
• We can initialize the data model using our POJOs
HollowAPIGenerator generator = new HollowAPIGenerator.Builder()
    .withAPIClassname("BooksAPI")
    .withPackageName("io.perezalcolea.hollow.api")
    .withDataModel(Book.class)
    .withDestination("./data-model/src/main/java")
    .build();
generator.generateSourceFiles();
Insight Tools
Hollow Explorer
• UI which can be used to browse and search records within
any dataset
class MyExplorer {
    void startExplorer() {
        // Initialize consumer
        HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
        hollowConsumer.triggerRefresh()
        HollowExplorerUIServer server = new HollowExplorerUIServer(hollowConsumer, 8080)
        server.start()
        server.join()
    }
}
Hollow History
• UI which can be used to browse and search changes in a
dataset over time
class MyHistory {
    void startHistory() {
        // Initialize consumer
        HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
        hollowConsumer.triggerRefresh()
        HollowHistoryUIServer server = new HollowHistoryUIServer(hollowConsumer, 8090)
        server.start()
        server.join()
    }
}
Heap Usage Analysis
• Given a loaded HollowReadStateEngine, it is possible to
iterate over each type and gather statistics about its
approximate heap usage
class MyMemoryInsight {
    void printMemoryUsage() {
        HollowReadStateEngine stateEngine = consumer.getStateEngine()
        long totalApproximateHeapFootprint = 0
        for (HollowTypeReadState typeState : stateEngine.typeStates) {
            String typeName = typeState.schema.name
            long heapCost = typeState.approximateHeapFootprintInBytes
            println(typeName + ": " + heapCost)
            totalApproximateHeapFootprint += heapCost
        }
        println("TOTAL: " + totalApproximateHeapFootprint)
    }
}
Other tools
• Metrics Collector
• Blob Storage Cleaner
• Filtering
• Combining/Splitting
• Patching
Interacting with a Hollow Dataset
Sample model
@HollowPrimaryKey(fields={"bookId"})
public class Book {
    long bookId;
    String title;
    String isbn;
    String publisher;
    String language;
    Set<Author> authors;
}
public class Author {
    long id;
    String authorName;
}
Indexing/Querying
Default Primary Keys
• Each type in our data model gets a custom index class
called <typename>PrimaryKeyIndex
• Backed by Hollow Consumer
• Will automatically stay up-to-date as your dataset updates
class MyPrimaryKeyIndex {
    Book findBook(long bookId) {
        HollowConsumer hollowConsumer = ... // my consumer
        BookPrimaryKeyIndex bookPrimaryKeyIndex = new BookPrimaryKeyIndex(hollowConsumer)
        return bookPrimaryKeyIndex.findMatch(bookId)
    }
}
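To make the idea of a primary key index concrete, here is a minimal conceptual sketch in plain Java. It is not the generated `BookPrimaryKeyIndex` and does not use Hollow; `UniqueKeyIndex` and its `rebuild` method are hypothetical, and rebuilding on every refresh is just one simple way to model an index that stays up to date.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Conceptual sketch only: UniqueKeyIndex is a hypothetical class, not part of Hollow.
class UniqueKeyIndex<K, R> {
    private final Function<R, K> keyField;
    private Map<K, R> recordsByKey = new HashMap<>();

    // keyField extracts the primary key value from a record.
    UniqueKeyIndex(Function<R, K> keyField) {
        this.keyField = keyField;
    }

    // Called after each dataset update so lookups reflect the current state.
    void rebuild(Collection<R> records) {
        Map<K, R> fresh = new HashMap<>();
        for (R record : records) {
            K key = keyField.apply(record);
            if (fresh.putIfAbsent(key, record) != null) {
                throw new IllegalStateException("Duplicate primary key: " + key);
            }
        }
        recordsByKey = fresh;
    }

    // Returns the single record for the key, or null when there is no match.
    R findMatch(K key) {
        return recordsByKey.get(key);
    }
}
```

Passing a different `keyField` function indexes the same records by another unique field, which mirrors the idea of a consumer-specified key.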
Consumer-specified Primary Keys
• A primary key index is not restricted to just default primary
keys
class MyPrimaryKeyIndex {
    Author findAuthor(long authorId) {
        HollowConsumer hollowConsumer = ... // my consumer
        AuthorPrimaryKeyIndex authorPrimaryKeyIndex = new AuthorPrimaryKeyIndex(hollowConsumer, "id")
        return authorPrimaryKeyIndex.findMatch(authorId)
    }
}
Hash Index
• Indexes records based on keys for which there is not a
one-to-one mapping between records and key values
• Must specify each of a query type, a select field, and one or
more match fields
class MyHashIndex {
    Iterable<Book> findBooksByPublisher(String publisher) {
        HollowConsumer hollowConsumer = ... // my consumer
        BooksAPIHashIndex publisherHashIndex = new BooksAPIHashIndex(hollowConsumer, "Book", "", "publisher.value")
        return publisherHashIndex.findMatch(publisher)
    }
}
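The difference from a primary key index is that one key value can match many records. A minimal conceptual sketch in plain Java follows. `MultiMatchIndex` is a hypothetical class, not the generated `BooksAPIHashIndex`, and it models only a single match field with the whole record as the selected result.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Conceptual sketch only: MultiMatchIndex is a hypothetical class, not part of Hollow.
class MultiMatchIndex<K, R> {
    private final Map<K, List<R>> matchesByKey;

    // matchField extracts the (non-null) value to match on from each record.
    MultiMatchIndex(List<R> records, Function<R, K> matchField) {
        this.matchesByKey = records.stream().collect(Collectors.groupingBy(matchField));
    }

    // Returns every record whose match field equals the key, or an empty list.
    List<R> findMatches(K key) {
        return matchesByKey.getOrDefault(key, Collections.emptyList());
    }
}
```

For example, indexing `{title, publisher}` pairs by publisher returns all titles that share a publisher, which a one-to-one index cannot express.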
Data Modeling
Primary Keys
• @HollowPrimaryKey annotation
• Provide a shortcut when creating a primary key index
@HollowPrimaryKey(fields={"bookId"})
public class Book {
    long bookId;
    String title;
    String isbn;
    String publisher;
    String language;
    Set<Author> authors;
}
Inlined vs Referenced Fields
• Inline: fields that are no longer REFERENCE fields, but
instead encode their data directly in each record
@HollowPrimaryKey(fields={"bookId"})
public class Book {
    long bookId;
    @HollowInline
    String title;
    String isbn;
    String publisher;
    String language;
    Set<Author> authors;
}
Namespaced Record Type Names
• Fields with like values may reference the same record type,
but reference fields of the same primitive type elsewhere in
the data model use different record types
• Types can be filtered
@HollowPrimaryKey(fields={"bookId"})
public class Book {
    long bookId;
    ...
    @HollowTypeName(name="Publisher")
    String publisher;
    @HollowTypeName(name="Language")
    String language;
    ...
}
Maintaining Backwards Compatibility
• Adding a new type
• Removing an existing type
• Adding a new field to an existing type
• Removing an existing field from an existing type
Demo
Useful links
https://hollow.how/
https://github.com/Netflix/hollow-reference-implementation
https://github.com/Netflix/hollow
Thank You

More Related Content

What's hot

Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
kawamuray
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Flink Forward
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
confluent
 

What's hot (20)

Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINEKafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
Kafka Multi-Tenancy - 160 Billion Daily Messages on One Shared Cluster at LINE
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
Large-Scale Stream Processing in the Hadoop Ecosystem - Hadoop Summit 2016
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and SolutionsHBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
HBaseConEast2016: Coprocessors – Uses, Abuses and Solutions
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for Isolation
 
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
HBaseConEast2016: How yarn timeline service v.2 unlocks 360 degree platform i...
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Espresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom QuiggleEspresso Database Replication with Kafka, Tom Quiggle
Espresso Database Replication with Kafka, Tom Quiggle
 
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data StreamingOracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
Oracle GoldenGate and Apache Kafka A Deep Dive Into Real-Time Data Streaming
 
Streaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka StreamsStreaming process with Kafka Connect and Kafka Streams
Streaming process with Kafka Connect and Kafka Streams
 

Similar to GR8ConfUS 2018 - High performant in-memory datasets with Netflix H0110W

Similar to GR8ConfUS 2018 - High performant in-memory datasets with Netflix H0110W (20)

The State of containerd
The State of containerdThe State of containerd
The State of containerd
 
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia GuptaIntro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
Intro to InfluxDB 2.0 and Your First Flux Query by Sonia Gupta
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
Open Source LinkedIn Analytics Pipeline - BOSS 2016 (VLDB)
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Containerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container RuntimeContainerd Internals: Building a Core Container Runtime
Containerd Internals: Building a Core Container Runtime
 
Architectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesArchitectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetes
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
 
Tech Talk - Blockchain presentation
Tech Talk - Blockchain presentationTech Talk - Blockchain presentation
Tech Talk - Blockchain presentation
 
Meetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on KubernetesMeetup 12-12-2017 - Application Isolation on Kubernetes
Meetup 12-12-2017 - Application Isolation on Kubernetes
 
Node.js and the MySQL Document Store
Node.js and the MySQL Document StoreNode.js and the MySQL Document Store
Node.js and the MySQL Document Store
 
Containerd internals: building a core container runtime
Containerd internals: building a core container runtimeContainerd internals: building a core container runtime
Containerd internals: building a core container runtime
 
Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Intro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly DetectionIntro to Kapacitor for Alerting and Anomaly Detection
Intro to Kapacitor for Alerting and Anomaly Detection
 
The state of containerd
The state of containerdThe state of containerd
The state of containerd
 
Tips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight DeploymentsTips, Tricks & Best Practices for large scale HDInsight Deployments
Tips, Tricks & Best Practices for large scale HDInsight Deployments
 
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 
K8s vs Cloud Foundry
K8s vs Cloud FoundryK8s vs Cloud Foundry
K8s vs Cloud Foundry
 

More from Roberto Pérez Alcolea

Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Roberto Pérez Alcolea
 
Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)
Roberto Pérez Alcolea
 
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Roberto Pérez Alcolea
 
Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020
Roberto Pérez Alcolea
 
Dependency Management at Scale
Dependency Management at ScaleDependency Management at Scale
Dependency Management at Scale
Roberto Pérez Alcolea
 

More from Roberto Pérez Alcolea (10)

Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
Escaping Dependency Hell: A deep dive into Gradle's dependency management fea...
 
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
[DPE Summit] How Improving the Testing Experience Goes Beyond Quality: A Deve...
 
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at NetflixDPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
DPE Summit - A More Integrated Build and CI to Accelerate Builds at Netflix
 
Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)Dependency Management in a Complex World (JConf Chicago 2022)
Dependency Management in a Complex World (JConf Chicago 2022)
 
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
Leveraging Gradle @ Netflix (Guadalajara JUG Feb 25, 2021)
 
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
Leveraging Gradle @ Netflix (Madrid GUG Feb 2, 2021)
 
Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020Dependency Management at Scale @ JConf Centroamérica 2020
Dependency Management at Scale @ JConf Centroamérica 2020
 
Dependency Management at Scale
Dependency Management at ScaleDependency Management at Scale
Dependency Management at Scale
 
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and HystrixGR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
GR8ConfUS 2017 - Real Time Traffic Visualization with Vizceral and Hystrix
 

Recently uploaded

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
GR8ConfUS 2018 - High performant in-memory datasets with Netflix H0110W

  • 1. High performant in-memory datasets with Netflix Hollow. Roberto Perez Alcolea - High performant in-memory datasets with Netflix H0110W - GR8ConfUS 2018
  • 2. Who am I? Roberto Perez Alcolea
      • Mexican
      • Streaming Platform @ Target
      • Groovy Enthusiast
      • roberto@perezalcolea.info
      • @rpalcolea
  • 3. Dataset Distribution
  • 4. The problem
      • Dissemination of small or moderately sized data sets (not big data)
    Common approaches:
      • Sending data to a data store (RDBMS, NoSQL)
      • Serializing and keeping a local copy
  • 5. Is there a solution?
  • 6. [Slide without text]
  • 7. What is Hollow?
      • Java library and toolset for disseminating in-memory datasets from a single producer to many consumers
    Goals:
      • Maximum development agility
      • Highly optimized performance and resource management
      • Extreme stability and reliability
  • 8. How it works
  • 9. [Slide without text]
  • 10. Key concepts
      • Blob / Blob Store
      • Snapshot/Delta
      • Type
      • Deduplication
  • 11. Key concepts
      • State Engine
      • Ordinal
          • unique identifier of the record within a type
          • sufficient to locate the record within a type
      • Immutability
      • Cycle
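The ordinal and deduplication ideas listed above can be illustrated with a small plain-Java sketch. This is not Hollow's implementation: the class `OrdinalTable` and its methods are hypothetical names invented for this example, and it only assumes that records of one type are compared with `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy per-type table, not Hollow's state engine.
class OrdinalTable<T> {
    private final Map<T, Integer> ordinalByRecord = new HashMap<>();
    private final List<T> recordByOrdinal = new ArrayList<>();

    /** Returns the ordinal of an equal, already stored record, or assigns the next free one. */
    int add(T record) {
        Integer existing = ordinalByRecord.get(record);
        if (existing != null) {
            return existing; // deduplication: equal records share a single ordinal
        }
        int ordinal = recordByOrdinal.size();
        recordByOrdinal.add(record);
        ordinalByRecord.put(record, ordinal);
        return ordinal;
    }

    /** The ordinal alone is enough to locate the record within this type. */
    T get(int ordinal) {
        return recordByOrdinal.get(ordinal);
    }

    int size() {
        return recordByOrdinal.size();
    }
}
```

Adding the same publisher name twice to an `OrdinalTable<String>` returns the same ordinal both times, so the value is stored once however many records refer to it.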
  • 12. Hollow Producer
      • A single machine that retrieves all data from a source of truth and produces a delta chain
      • Encapsulates the details of compacting, publishing, announcing, validating, and (if necessary) rolling back data states
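To make "produces a delta chain" concrete, here is a minimal plain-Java sketch of computing the difference between two consecutive data states. It is a conceptual illustration under simplifying assumptions (a state is a map from a numeric key to a non-null string value); `StateDelta`, `between`, and `applyTo` are hypothetical names, and Hollow's real blobs use a compact binary encoding rather than maps.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a toy delta between two states of a keyed dataset.
class StateDelta {
    final Map<Long, String> upserts = new TreeMap<>(); // records added or changed
    final Set<Long> removals = new TreeSet<>();        // keys no longer present

    /** Computes what must change to move from the previous state to the current one. */
    static StateDelta between(Map<Long, String> previous, Map<Long, String> current) {
        StateDelta delta = new StateDelta();
        for (Map.Entry<Long, String> entry : current.entrySet()) {
            if (!entry.getValue().equals(previous.get(entry.getKey()))) {
                delta.upserts.put(entry.getKey(), entry.getValue());
            }
        }
        for (Long key : previous.keySet()) {
            if (!current.containsKey(key)) {
                delta.removals.add(key);
            }
        }
        return delta;
    }

    /** Applies this delta in place to a copy of the previous state. */
    void applyTo(Map<Long, String> state) {
        state.keySet().removeAll(removals);
        state.putAll(upserts);
    }
}
```

Publishing only such differences each cycle, plus an occasional full snapshot, is what keeps the amount of data shipped to consumers small when most records are unchanged.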
  • 13. Hollow Producer
        class MyProducer {
            HollowProducer buildProducer() {
                ...
                HollowProducer
                    .withPublisher(publisher)            /// required: a BlobPublisher
                    .withAnnouncer(announcer)            /// optional: an Announcer
                    .withValidators(validators)          /// optional: one or more Validator
                    .withListeners(listeners)            /// optional: one or more HollowProducerListeners
                    .withBlobStagingDir(dir)             /// optional: a java.io.File
                    .withBlobCompressor(compressor)      /// optional: a BlobCompressor
                    .withBlobStager(stager)              /// optional: a BlobStager
                    .withSnapshotPublishExecutor(e)      /// optional: a java.util.concurrent.Executor
                    .withNumStatesBetweenSnapshots(n)    /// optional: an int
                    .withTargetMaxTypeShardSize(size)    /// optional: a long
                    .build()
            }
        }
  • 14. Hollow Producer - Capabilities
      • Restore at Startup
      • Rolling Back
      • Validating Data
      • Compacting Data
  • 15. Hollow Consumer
      • Encapsulates the details of initializing and keeping a dataset up to date
      • At initialization time, loads a snapshot
      • After initialization time, keeps a local copy of the dataset current by applying delta transitions
      • Each time a new version is announced, triggerRefresh() should be called on the HollowConsumer
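The snapshot-then-deltas lifecycle described above can be sketched in plain Java as follows. This is a conceptual illustration only, not the `HollowConsumer` implementation: `SketchConsumer`, `Delta`, `loadSnapshot`, and `refreshTo` are hypothetical names, and the sketch assumes that versions only increase and that every delta moves forward from `fromVersion` to a larger `toVersion`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy consumer that follows a delta chain.
class SketchConsumer {
    /** A hypothetical delta that moves the dataset from one version to the next. */
    static final class Delta {
        final long fromVersion;
        final long toVersion;
        final Map<Long, String> upserts;
        final List<Long> removals;

        Delta(long fromVersion, long toVersion, Map<Long, String> upserts, List<Long> removals) {
            this.fromVersion = fromVersion;
            this.toVersion = toVersion;
            this.upserts = upserts;
            this.removals = removals;
        }
    }

    private final Map<Long, String> data = new HashMap<>();
    private long currentVersion = -1;

    /** Initialization: replace local data with a full snapshot. */
    void loadSnapshot(long version, Map<Long, String> snapshot) {
        data.clear();
        data.putAll(snapshot);
        currentVersion = version;
    }

    /** Refresh: apply deltas one by one until the announced version is reached. */
    void refreshTo(long announcedVersion, List<Delta> availableDeltas) {
        while (currentVersion < announcedVersion) {
            Delta next = null;
            for (Delta candidate : availableDeltas) {
                if (candidate.fromVersion == currentVersion) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalStateException("No delta available from version " + currentVersion);
            }
            for (Long id : next.removals) {
                data.remove(id);
            }
            data.putAll(next.upserts);
            currentVersion = next.toVersion;
        }
    }

    long version() {
        return currentVersion;
    }

    String get(long id) {
        return data.get(id);
    }
}
```

In this sketch the caller plays the role of the announcement mechanism: whenever it learns of a newer version it calls `refreshTo`, much as the slide says `triggerRefresh()` should be called on the real consumer.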
  • 16. Hollow Consumer
        class MyConsumer {
            HollowConsumer buildConsumer() {
                ...
                HollowConsumer
                    .withBlobRetriever(blobRetriever)                /// required: a BlobRetriever
                    .withLocalBlobStore(localDiskDir)                /// optional: a local disk location
                    .withAnnouncementWatcher(announcementWatcher)    /// optional: an AnnouncementWatcher
                    .withRefreshListener(refreshListener)            /// optional: a RefreshListener
                    .withGeneratedAPIClass(MyGeneratedAPI.class)     /// optional: a generated client API class
                    .withFilterConfig(filterConfig)                  /// optional: a HollowFilterConfig
                    .withDoubleSnapshotConfig(doubleSnapshotCfg)     /// optional: a DoubleSnapshotConfig
                    .withObjectLongevityConfig(objectLongevityCfg)   /// optional: an ObjectLongevityConfig
                    .withObjectLongevityDetector(detector)           /// optional: an ObjectLongevityDetector
                    .withRefreshExecutor(refreshExecutor)            /// optional: an Executor
                    .build()
            }
        }
  • 17. Hollow Consumer API Generation
      • We can initialize the data model using our POJOs
        HollowAPIGenerator generator = new HollowAPIGenerator.Builder()
            .withAPIClassname("BooksAPI")
            .withPackageName("io.perezalcolea.hollow.api")
            .withDataModel(Book.class)
            .withDestination("./data-model/src/main/java")
            .build();
        generator.generateSourceFiles();
  • 18. Insight Tools
  • 19. Hollow Explorer
      • UI which can be used to browse and search records within any dataset
        class MyExplorer {
            void startExplorer() {
                // Initialize consumer
                HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
                hollowConsumer.triggerRefresh()
                HollowExplorerUIServer server = new HollowExplorerUIServer(hollowConsumer, 8080)
                server.start()
                server.join()
            }
        }
  • 20. Hollow History
      • UI which can be used to browse and search changes in a dataset over time
        class MyHistory {
            void startHistory() {
                // Initialize consumer
                HollowConsumer hollowConsumer = HollowConsumerBuilder.build("publish-dir", BooksAPI.class)
                hollowConsumer.triggerRefresh()
                HollowHistoryUIServer server = new HollowHistoryUIServer(hollowConsumer, 8090)
                server.start()
                server.join()
            }
        }
  • 21. Heap Usage Analysis
      • Given a loaded HollowReadStateEngine, it is possible to iterate over each type and gather statistics about its approximate heap usage
        class MyMemoryInsight {
            void printMemoryUsage() {
                HollowReadStateEngine stateEngine = consumer.getStateEngine()
                long totalApproximateHeapFootprint = 0
                for (HollowTypeReadState typeState : stateEngine.typeStates) {
                    String typeName = typeState.schema.name
                    long heapCost = typeState.approximateHeapFootprintInBytes
                    println(typeName + ": " + heapCost)
                    totalApproximateHeapFootprint += heapCost
                }
                println("TOTAL: " + totalApproximateHeapFootprint)
            }
        }
  • 22. Other tools
      • Metrics Collector
      • Blob Storage Cleaner
      • Filtering
      • Combining/Splitting
      • Patching
  • 23. Interacting with a Hollow Dataset
  • 24. Sample model
        @HollowPrimaryKey(fields={"bookId"})
        public class Book {
            long bookId;
            String title;
            String isbn;
            String publisher;
            String language;
            Set<Author> authors;
        }

        public class Author {
            long id;
            String authorName;
        }
  • 25. Indexing/Querying
  • 26. Default Primary Keys
      • Each type in our data model gets a custom index class called <typename>PrimaryKeyIndex
      • Backed by the Hollow Consumer
      • Will automatically stay up to date as your dataset updates
        class MyPrimaryKeyIndex {
            Book findBook(long bookId) {
                HollowConsumer hollowConsumer = ... // my consumer
                BookPrimaryKeyIndex bookPrimaryKeyIndex = new BookPrimaryKeyIndex(hollowConsumer)
                return bookPrimaryKeyIndex.findMatch(bookId)
            }
        }
  • 27. Consumer-specified Primary Keys
      • A primary key index is not restricted to just default primary keys
        class MyPrimaryKeyIndex {
            Author findAuthor(long authorId) {
                HollowConsumer hollowConsumer = ... // my consumer
                AuthorPrimaryKeyIndex authorPrimaryKeyIndex = new AuthorPrimaryKeyIndex(hollowConsumer, "id")
                return authorPrimaryKeyIndex.findMatch(authorId)
            }
        }
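What a primary key index provides, whether the key is the default one or chosen by the consumer, is a one-to-one lookup from key value to record. The plain-Java sketch below shows that idea only; it is not Hollow's generated index code, and `UniqueKeyIndex` and `AuthorRow` are hypothetical names made up for the example. Unlike the real index, this toy version does not update itself when the dataset changes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a one-to-one index over an arbitrary key extractor.
class UniqueKeyIndex<K, R> {
    private final Map<K, R> recordByKey = new HashMap<>();

    UniqueKeyIndex(List<R> records, Function<R, K> keyOf) {
        for (R record : records) {
            K key = keyOf.apply(record);
            if (recordByKey.put(key, record) != null) {
                throw new IllegalArgumentException("Duplicate key: " + key);
            }
        }
    }

    /** Returns the single record for the key, or null when nothing matches. */
    R findMatch(K key) {
        return recordByKey.get(key);
    }
}

// Hypothetical stand-in for the Author type in the sample model.
class AuthorRow {
    final long id;
    final String authorName;

    AuthorRow(long id, String authorName) {
        this.id = id;
        this.authorName = authorName;
    }
}
```

The same class can index authors by `id` or by `authorName` simply by passing a different key extractor, which mirrors the point that the key used for lookup is the consumer's choice.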
  • 28. Hash Index
      • Indexes records by keys for which there is not a one-to-one mapping between records and key values
      • Must specify each of a query type, a select field, and one or more match fields
        class MyHashIndex {
            Iterable<Book> findBooksByPublisher(String publisher) {
                HollowConsumer hollowConsumer = ... // my consumer
                BooksAPIHashIndex publisherHashIndex = new BooksAPIHashIndex(hollowConsumer, "Book", "", "publisher.value")
                return publisherHashIndex.findMatch(publisher)
            }
        }
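The difference from a primary key index is that one key value may match many records. A minimal plain-Java sketch of that one-to-many lookup follows; it is not Hollow's hash index, and `OneToManyIndex` and `BookRow` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: an index where several records can share one key value.
class OneToManyIndex<K, R> {
    private final Map<K, List<R>> recordsByKey = new HashMap<>();

    OneToManyIndex(List<R> records, Function<R, K> keyOf) {
        for (R record : records) {
            recordsByKey.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }
    }

    /** Returns every record with the given key, or an empty list when none match. */
    List<R> findMatches(K key) {
        return recordsByKey.getOrDefault(key, Collections.emptyList());
    }
}

// Hypothetical stand-in for the Book type in the sample model.
class BookRow {
    final long bookId;
    final String publisher;

    BookRow(long bookId, String publisher) {
        this.bookId = bookId;
        this.publisher = publisher;
    }
}
```

Here the match field is the publisher and the selected records are the books themselves, which corresponds to the empty select field in the slide's example.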
  • 29. Data Modeling
  • 30. Primary Keys
      • @HollowPrimaryKey annotation
      • Provides a shortcut when creating a primary key index
        @HollowPrimaryKey(fields={"bookId"})
        public class Book {
            long bookId;
            String title;
            String isbn;
            String publisher;
            String language;
            Set<Author> authors;
        }
  • 31. Inlined vs Referenced Fields
      • Inline: such fields are no longer REFERENCE fields; instead, they encode their data directly in each record
        @HollowPrimaryKey(fields={"bookId"})
        public class Book {
            long bookId;
            @HollowInline
            String title;
            String isbn;
            String publisher;
            String language;
            Set<Author> authors;
        }
  • 32. Namespaced Record Type Names
      • Fields with like values may reference the same record type, but reference fields of the same primitive type elsewhere in the data model use different record types
      • Types can be filtered
        @HollowPrimaryKey(fields={"bookId"})
        public class Book {
            long bookId;
            ...
            @HollowTypeName(name="Publisher")
            String publisher;
            @HollowTypeName(name="Language")
            String language;
            ...
        }
  • 33. Maintaining Backwards Compatibility
      • Adding a new type
      • Removing an existing type
      • Adding a new field to an existing type
      • Removing an existing field from an existing type
  • 34. Demo
  • 36. Thank You