© 2020 SPLUNK INC.
How Splunk is using
Pulsar IO
Jerry Peng
Principal Software Engineer @ Splunk | Committer and PMC member @
Apache {Pulsar, Heron, Storm}
Pulsar Summit 2021
How Splunk is using Pulsar IO
1. Overview of Pulsar IO
2. Pulsar @ Splunk
3. Pulsar IO @ Splunk
4. Improvements to Pulsar IO / Functions
5. Future of Pulsar IO @ Splunk
Agenda
Overview of Pulsar IO
A connector framework to ingress and egress data to and from Pulsar
● Source - Ingress data into Pulsar from an external system.
● Sink - Egress data from Pulsar to an external system.
What is Pulsar IO?
Overview of Pulsar IO
● An integrated solution to answer questions like
○ What is the best way to move data into and out of Pulsar?
○ Where should I run my application to publish data to or consume data from Pulsar?
○ How should I run my application to publish data to or consume data from Pulsar?
Why Pulsar IO?
Overview of Pulsar IO
● Easy to use
○ Users should be able to ingress or egress data from and to external systems without having to
write any code
○ Built-in connectors
● Managed Runtime
○ A user does not need to worry about where and how to run a connector
○ Execution, scheduling, scaling, and fault tolerance are taken care of by the runtime
● Flexible Runtime
○ Run instances as threads, processes, K8s pods, etc.
Design Goals
Overview of Pulsar IO
public interface Source<T> extends AutoCloseable {
    /**
     * Open connector with configuration.
     *
     * @param config initialization config
     * @param sourceContext environment where the source connector is running
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;

    /**
     * Reads the next message from source.
     * If source does not have any new messages, this call should block.
     * @return next message from source. The return result should never be null
     * @throws Exception
     */
    Record<T> read() throws Exception;
}

public interface Sink<T> extends AutoCloseable {
    /**
     * Open connector with configuration.
     *
     * @param config initialization config
     * @param sinkContext environment where the sink connector is running
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(final Map<String, Object> config, SinkContext sinkContext) throws Exception;

    /**
     * Write a message to Sink.
     *
     * @param record record to write to sink
     * @throws Exception
     */
    void write(Record<T> record) throws Exception;
}
API
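As an illustration of the two interfaces above, here is a minimal sketch of a source that emits a fixed list of strings and a sink that collects them. To stay runnable without the Pulsar jars, it declares simplified local stand-ins (`SimpleRecord`, `SimpleSource`, `SimpleSink`) that drop the context parameter; these names, the `lines` config key, and the `ListSource` / `CollectingSink` classes are assumptions made for this example, not part of Pulsar IO.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the Pulsar IO types (assumptions, not the real API).
interface SimpleRecord<T> {
    T getValue();
}

interface SimpleSource<T> extends AutoCloseable {
    void open(Map<String, Object> config) throws Exception;
    SimpleRecord<T> read() throws Exception;
}

interface SimpleSink<T> extends AutoCloseable {
    void open(Map<String, Object> config) throws Exception;
    void write(SimpleRecord<T> record) throws Exception;
}

// Hypothetical source: emits the strings listed under the "lines" config key.
class ListSource implements SimpleSource<String> {
    private Iterator<String> it;

    @Override
    @SuppressWarnings("unchecked")
    public void open(Map<String, Object> config) {
        it = ((List<String>) config.get("lines")).iterator();
    }

    @Override
    public SimpleRecord<String> read() {
        // A real source is expected to block when it has no data; this sketch
        // returns null instead so that the demo loop can terminate.
        if (!it.hasNext()) {
            return null;
        }
        String value = it.next();
        return () -> value;
    }

    @Override
    public void close() {
    }
}

// Hypothetical sink: keeps every written value in memory.
class CollectingSink implements SimpleSink<String> {
    final List<String> written = new ArrayList<>();

    @Override
    public void open(Map<String, Object> config) {
    }

    @Override
    public void write(SimpleRecord<String> record) {
        written.add(record.getValue());
    }

    @Override
    public void close() {
    }
}

class ConnectorDemo {
    // Plays the role of the runtime: opens both ends and moves records across.
    static List<String> pipe(List<String> lines) {
        try (ListSource source = new ListSource(); CollectingSink sink = new CollectingSink()) {
            source.open(Map.of("lines", lines));
            sink.open(Map.of());
            SimpleRecord<String> record;
            while ((record = source.read()) != null) {
                sink.write(record);
            }
            return sink.written;
        }
    }
}
```

In Pulsar IO the loop in `ConnectorDemo.pipe` is not written by the connector author; the managed runtime calls `open`, `read`, and `write` on the connector.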
Overview of Pulsar IO
● Pulsar IO is powered by the Pulsar Functions framework
● Inherits all the features and benefits of the Pulsar Functions framework
Architecture and Execution
Overview of Pulsar IO
● Deployment flexibility - Run a custom source/sink or a built-in one
● Execution flexibility - Sources and sinks can run as part of an existing cluster, as a standalone
process, on Kubernetes, etc.
● Parallelism - To increase the throughput of a source or sink, multiple instances of it can be
run by just adding a simple configuration.
● Load balancing - Available when sources and sinks are run in “cluster” mode
● Fault-tolerance, monitoring, and metrics - If sources and sinks are run in “cluster” mode, the
worker service that is part of the Pulsar Functions framework will automatically monitor deployed
sources and sinks. When nodes fail, sources and sinks will be redeployed to operational nodes.
Metrics are also automatically collected
● Dynamic updates - Each connector’s parallelism, source code, ingress and egress topics, and
many other configurations can be changed on the fly
● Stateful - Has access to State API
Benefits Summarized
Overview of Pulsar IO
● https://pulsar.apache.org/docs/en/io-overview/
● https://www.splunk.com/en_us/blog/it/introducing-pulsar-io.html
References
Pulsar @ Splunk
Overview
● Pulsar used for both streaming and queueing use cases
● Deployed as a multi-tenant SaaS offering internally
● Data nervous system at Splunk
○ Moving / routing data
○ Connecting apps / services together
Pulsar @ Splunk
DSP Architecture
Pulsar IO @ Splunk
● Powers the Unified Connector Framework (UCF) of Splunk’s Data Stream
Processor platform.
● Responsible for data ingress and egress of the DSP platform
What was Pulsar IO used for?
Pulsar IO @ Splunk
● Inherent issues with existing homegrown / legacy platform
○ Complex Architecture
○ Scalability Issues
○ Performance
○ Infra Cost
○ Maintainability
● Leverage Open Source
○ Already using Pulsar, why not leverage more of its functionality?
○ Cost and risks of maintaining homegrown / proprietary platforms and protocols.
○ Leverage existing OSS connectors
○ Engage with community
Why was Pulsar IO chosen?
Pulsar IO @ Splunk
Architecture
Pulsar IO @ Splunk
● Large Scale Data Collection
○ Collecting large amounts of static data and ingesting them into DSP
● Requirements
○ Ingest petabytes of data per day
○ Tens of thousands of connectors
○ Low startup time
○ Cost effective
Use Case
Pulsar IO Batch Source
● Designed for ingesting large amounts of static data into Pulsar
○ For example, ingesting data stored in AWS S3 or GCS
● Two phases
○ Discovery - discover data to collect and ingest into Pulsar. The discovery phase will output
a stream of tasks for the collect phase to execute.
■ The discovery phase is triggered by user-defined logic, usually periodically
■ The discovery phase runs on only one instance, i.e. instance-0
■ This is done to reduce redundant API calls to external systems that enforce rate limits and charge per call
○ Collect - collect the data. Execute tasks generated by discovery phase
■ Run in parallel among instances
Overview
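The two phases above can be sketched as a small in-memory simulation. This is a sketch under stated assumptions, not the Pulsar implementation: a `ConcurrentLinkedQueue` stands in for the intermediate Pulsar topic, threads stand in for connector instances, discovery finishes before collection starts, and the `fetch:` task encoding and all class and method names are made up for the example.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BatchPhasesSketch {
    // Discovery phase: runs once (the role of instance-0) and emits one task per object.
    static void discover(List<String> objectKeys, Queue<String> taskQueue) {
        for (String key : objectKeys) {
            taskQueue.add("fetch:" + key);
        }
    }

    // Collect phase: an instance drains tasks from the shared queue and emits events.
    static void collect(int instanceId, Queue<String> taskQueue, Queue<String> eventQueue) {
        String task;
        while ((task = taskQueue.poll()) != null) {
            String key = task.substring("fetch:".length());
            eventQueue.add(key + "@instance-" + instanceId);
        }
    }

    // Runs discovery once, then the collect phase on the given number of instances.
    static Queue<String> run(List<String> objectKeys, int instances) {
        Queue<String> taskQueue = new ConcurrentLinkedQueue<>();   // stand-in for the task topic
        Queue<String> eventQueue = new ConcurrentLinkedQueue<>();  // stand-in for the output topic
        discover(objectKeys, taskQueue);

        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            final int id = i;
            pool.execute(() -> collect(id, taskQueue, eventQueue));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("collect phase timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return eventQueue;
    }
}
```

Each task is taken by exactly one instance, so the external system is listed once per discovery run while the downloads themselves are spread across instances.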
Pulsar IO Batch Source
Architecture
[Diagram] Trigger (for example, every 5 minutes) → Discover (in instance-0) → Tasks → Task queue (intermediate Pulsar topic) → Tasks → Collect in instance-0, instance-1, ..., instance-n → Events → Event queue (configured output Pulsar topic)
Pulsar IO Batch Source
public interface BatchSource<T> extends AutoCloseable {
    /**
     * Open connector with configuration.
     *
     * @param config config that's supplied for source
     * @param context environment where the source connector is running
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(final Map<String, Object> config, SourceContext context) throws Exception;

    /**
     * Discovery phase of a connector. This phase will only be run on one instance, i.e. instance 0, of the connector.
     * Implementations use the taskEater consumer to output serialized representation of tasks as they are discovered.
     *
     * @param taskEater function to notify the framework about the new task received.
     * @throws Exception during discover
     */
    void discover(Consumer<byte[]> taskEater) throws Exception;

    /**
     * Called when a new task appears for this connector instance.
     *
     * @param task the serialized representation of the task
     */
    void prepare(byte[] task) throws Exception;

    /**
     * Read data and return a record
     * Return null if no more records are present for this task
     * @return a record
     */
    Record<T> readNext() throws Exception;
}

public interface BatchSourceTriggerer {
    /**
     * Initializes the Triggerer with given config. Note that the triggerer doesn't start running
     * until start is called.
     *
     * @param config config needed for triggerer to run
     * @param sourceContext The source context associated with the source
     * The parameter passed to this trigger function is an optional description of the event that caused the trigger
     * @throws Exception throws any exceptions when initializing
     */
    void init(Map<String, Object> config, SourceContext sourceContext) throws Exception;

    /**
     * Triggerer should actually start looking out for trigger conditions.
     *
     * @param trigger The function to be called when it's time to trigger the discover
     *                This function can be passed any metadata about this particular
     *                trigger event as its argument
     * This method should return immediately. It is expected that implementations will use their own mechanisms
     * to schedule the triggers.
     */
    void start(Consumer<String> trigger);

    /**
     * Triggerer should stop triggering.
     */
    void stop();
}
API
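To make the triggerer contract above concrete, here is a minimal sketch of a triggerer that fires discovery at a fixed interval. It implements a local stand-in interface (`SimpleTriggerer`) that mirrors `BatchSourceTriggerer` without the `SourceContext` parameter so that it compiles without the Pulsar jars; the stand-in, the `FixedIntervalTriggerer` class, and the `intervalMillis` config key are all assumptions made for this example.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Stand-in mirroring BatchSourceTriggerer, minus SourceContext (an assumption).
interface SimpleTriggerer {
    void init(Map<String, Object> config) throws Exception;
    void start(Consumer<String> trigger);
    void stop();
}

// Hypothetical triggerer that fires discovery at a fixed interval.
class FixedIntervalTriggerer implements SimpleTriggerer {
    private long intervalMillis;
    private ScheduledExecutorService scheduler;

    @Override
    public void init(Map<String, Object> config) {
        // "intervalMillis" is an illustrative key; it defaults to 5 minutes here.
        intervalMillis = ((Number) config.getOrDefault("intervalMillis", 300_000L)).longValue();
    }

    @Override
    public void start(Consumer<String> trigger) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        // Returns immediately; the scheduler thread calls the trigger later.
        scheduler.scheduleAtFixedRate(
                () -> trigger.accept("fixed-interval"),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}

class TriggererDemo {
    // Returns true if the triggerer fires at least `times` times within 5 seconds.
    static boolean firesAtLeast(int times, long intervalMillis) {
        FixedIntervalTriggerer triggerer = new FixedIntervalTriggerer();
        triggerer.init(Map.of("intervalMillis", intervalMillis));
        CountDownLatch latch = new CountDownLatch(times);
        triggerer.start(reason -> latch.countDown());
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            triggerer.stop();
        }
    }
}
```

Keeping `start` non-blocking matches the contract in the Javadoc: the triggerer owns its own scheduling thread, and the framework only supplies the callback that kicks off discovery.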
Pulsar IO @ Splunk
Deployment @ Splunk
● Function Workers are deployed separately from Brokers
● Function / Connector instances run as threads within the Worker JVM process
○ Connectors are viewed as vetted / safe code.
● Leveraging the built-in connectors functionality of Pulsar IO
● The state store for Pulsar Functions, i.e. the table service, is deployed as an
independent cluster as well.
Giving back to OSS @ Splunk
● Pulsar IO Batch Source
● Re-designed core pieces of the Pulsar Functions architecture to improve
scalability and stability
● Performance testing
○ Running 100,000 instances (10,000 sources, 10 instances each)
○ Petabytes per day ingest rates
● Numerous bug fixes
Overview for Pulsar IO / Functions
Improvements to Pulsar IO @ Splunk
● Pulsar Functions uses an internal topic called the “metadata” topic to hold a log
of functions/sources/sinks submitted to run. Previously, this topic grew unbounded
over time. At Splunk, we re-designed and re-implemented the metadata registration
workflow in Pulsar Functions to support topic compaction so that old metadata can
be safely truncated.
● https://github.com/apache/pulsar/pull/7255
Metadata Topic Compaction
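The idea behind compacting a metadata log can be sketched as follows: only the most recent entry per key needs to survive, and a delete marker removes the key altogether. This is an illustrative in-memory model, not the code from the pull request; the `MetadataEntry` class, the use of a `null` value as the delete marker, and the function names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One log entry: key = function/source/sink name, value = its metadata.
// A null value is used here as a delete marker (an assumption of this sketch).
final class MetadataEntry {
    final String key;
    final String value;

    MetadataEntry(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

class CompactionSketch {
    // Replays the log in order and keeps only the latest live value per key.
    static Map<String, String> compact(List<MetadataEntry> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (MetadataEntry entry : log) {
            if (entry.value == null) {
                latest.remove(entry.key);   // the key was deleted
            } else {
                latest.put(entry.key, entry.value);
            }
        }
        return latest;
    }
}
```

For compaction to be safe, every entry has to be a full, keyed snapshot of a function's metadata rather than an incremental change request, since older entries for the same key are discarded.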
Improvements to Pulsar IO @ Splunk
● Problem
○ When the leader worker isn't processing assignment messages fast enough, the
background routine that checks for unassigned function instances will trigger the scheduler to
schedule and write more assignments to the assignment topic. There is essentially a
feedback loop that can cause many unnecessary assignment updates to be published to the
assignment topic.
● Modification
○ When a worker becomes the leader, it stops tailing the assignment topic. Since the leader
runs the scheduling process, it already knows which instances are assigned to which
worker, so it is unnecessary for it to tail the assignment topic.
● https://github.com/apache/pulsar/pull/7237
Improving Scheduling Performance and Stability
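The feedback loop described above can be illustrated with a toy model. It is a deliberately simplified sketch, not Pulsar's scheduler: the leader reads back only one assignment message per check round (modelling slow tailing), and every name in it is made up for the example. The method counts how many assignment messages are published, depending on whether the leader decides from the lagging tailed view or from its own scheduling state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AssignmentLoopSketch {
    // Returns the number of assignment messages published over the given rounds.
    static int publishedAssignments(List<String> instances, int rounds, boolean leaderUsesOwnState) {
        List<String> assignmentTopic = new ArrayList<>(); // stand-in for the assignment topic
        Set<String> tailedView = new HashSet<>();         // what the leader has read back so far
        Set<String> ownState = new HashSet<>();           // what the leader itself has scheduled
        int readPosition = 0;

        for (int round = 0; round < rounds; round++) {
            Set<String> known = leaderUsesOwnState ? ownState : tailedView;
            // Background check: schedule every instance that looks unassigned.
            for (String instance : instances) {
                if (!known.contains(instance)) {
                    assignmentTopic.add(instance);
                    ownState.add(instance);
                }
            }
            // Slow tailing: only one message is read back per round.
            if (readPosition < assignmentTopic.size()) {
                tailedView.add(assignmentTopic.get(readPosition));
                readPosition++;
            }
        }
        return assignmentTopic.size();
    }
}
```

With three instances and three rounds, the tailing variant publishes six messages in this model (3, then 2, then 1), while the variant that trusts its own state publishes three.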
Using Exclusive Producer
Improvements to Pulsar IO @ Splunk
● Problem
○ Previously, there was no mechanism for re-balancing instances scheduled on Function Workers.
○ Scheduling of instances to workers may become skewed over time
● Modification
○ Add interface for rebalance strategy.
○ Allow users to trigger a rebalance on-demand
○ Implement ability to automatically periodically rebalance
● https://github.com/apache/pulsar/pull/7237
Automatic Re-balancing of Instances
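A rebalance strategy of the kind listed above could look like the following sketch. The `RebalanceStrategy` interface and the `EvenSpreadStrategy` class are hypothetical and are not the interface added to Pulsar; the example only shows the general idea of moving instances from the most loaded worker to the least loaded one until the counts differ by at most one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical strategy interface: maps worker id -> instance ids.
interface RebalanceStrategy {
    Map<String, List<String>> rebalance(Map<String, List<String>> current);
}

// Moves one instance at a time from the busiest worker to the least busy one.
class EvenSpreadStrategy implements RebalanceStrategy {
    @Override
    public Map<String, List<String>> rebalance(Map<String, List<String>> current) {
        // Copy the input so the caller's assignment is left untouched.
        Map<String, List<String>> result = new TreeMap<>();
        current.forEach((worker, assigned) -> result.put(worker, new ArrayList<>(assigned)));
        if (result.isEmpty()) {
            return result;
        }
        while (true) {
            String busiest = null;
            String idlest = null;
            for (String worker : result.keySet()) {
                int load = result.get(worker).size();
                if (busiest == null || load > result.get(busiest).size()) {
                    busiest = worker;
                }
                if (idlest == null || load < result.get(idlest).size()) {
                    idlest = worker;
                }
            }
            // Stop once the skew is at most one instance.
            if (result.get(busiest).size() - result.get(idlest).size() <= 1) {
                return result;
            }
            List<String> from = result.get(busiest);
            result.get(idlest).add(from.remove(from.size() - 1));
        }
    }
}
```

A strategy like this could be invoked on demand or from a periodic task; it only computes the target assignment and leaves publishing the changes to the scheduler.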
Future of Pulsar IO @ Splunk
● Autoscaling
○ Workers and instances
■ Autoscaling workers on K8s using K8s HPAs (almost done)
● Resource Aware Scheduling
○ Scheduling that takes into account
● Continue to work on getting topic compaction to work fully with the internal
topics used by Pulsar Functions
● Integrating client memory limits into Pulsar Functions
○ Adding per producer and consumer memory limits
■ More isolation even when running instances as threads
● More fixes and optimizations!
Connectors as a Service
● Providing a platform for all connectors to run on at Splunk based on Pulsar IO
Connectors as a Service
Continued...
Thank you!
We are hiring!
jerryp@splunk.com

Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 

How Splunk Is Using Pulsar IO

  • 1. © 2020 SPLUNK INC. How Splunk is using Pulsar IO Jerry Peng Principal Software Engineer @ Splunk | Committer and PMC member @ Apache {Pulsar, Heron, Storm} Pulsar Summit 2021
  • 2. How Splunk is using Pulsar IO: Agenda
    1. Overview of Pulsar IO
    2. Pulsar @ Splunk
    3. Pulsar IO @ Splunk
    4. Improvements to Pulsar IO / Functions
    5. Future of Pulsar IO @ Splunk
  • 3. Overview of Pulsar IO: What is Pulsar IO?
    A connector framework to ingress and egress data to and from Pulsar
    ● Source - Ingress data into Pulsar from an external system.
    ● Sink - Egress data from Pulsar to an external system.
  • 4. Overview of Pulsar IO: Why Pulsar IO?
    ● An integrated solution to answer questions like
      ○ What is the best way to move data into and out of Pulsar?
      ○ Where should I run my application to publish data to or consume data from Pulsar?
      ○ How should I run my application to publish data to or consume data from Pulsar?
  • 5. Overview of Pulsar IO: Design Goals
    ● Easy to use
      ○ Users should be able to ingress or egress data from and to external systems without having to write any code
      ○ Built-in connectors
    ● Managed Runtime
      ○ A user does not need to worry about where and how to run a connector
      ○ Execution, scheduling, scaling, and fault tolerance are taken care of by the runtime
    ● Flexible Runtime
      ○ Run instances as threads, processes, K8s pods, etc.
  • 6. Overview of Pulsar IO: API

    public interface Source<T> extends AutoCloseable {
        /**
         * Open connector with configuration.
         *
         * @param config initialization config
         * @param sourceContext environment where the source connector is running
         * @throws Exception IO type exceptions when opening a connector
         */
        void open(final Map<String, Object> config, SourceContext sourceContext) throws Exception;

        /**
         * Reads the next message from source.
         * If source does not have any new messages, this call should block.
         * @return next message from source. The return result should never be null
         * @throws Exception
         */
        Record<T> read() throws Exception;
    }

    public interface Sink<T> extends AutoCloseable {
        /**
         * Open connector with configuration.
         *
         * @param config initialization config
         * @param sinkContext environment where the sink connector is running
         * @throws Exception IO type exceptions when opening a connector
         */
        void open(final Map<String, Object> config, SinkContext sinkContext) throws Exception;

        /**
         * Write a message to Sink.
         *
         * @param record record to write to sink
         * @throws Exception
         */
        void write(Record<T> record) throws Exception;
    }
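As a rough illustration of the Source contract above, the sketch below implements a source backed by an in-memory queue. To stay self-contained it declares simplified stand-ins named `SketchRecord`, `SketchSourceContext`, and `SketchSource`; these are assumptions made for illustration and are not the real Pulsar types, which carry more methods. The `prefix` config key and the `offer` helper are also hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified stand-ins (assumptions) for the Pulsar IO types shown on the slide.
interface SketchRecord<T> {
    T getValue();
}

interface SketchSourceContext {
    int getInstanceId();
}

interface SketchSource<T> extends AutoCloseable {
    void open(Map<String, Object> config, SketchSourceContext context) throws Exception;
    SketchRecord<T> read() throws Exception;
}

// Hypothetical source that emits strings pushed into an in-memory queue.
class QueueSource implements SketchSource<String> {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private String prefix = "";

    @Override
    public void open(Map<String, Object> config, SketchSourceContext context) {
        // "prefix" is a made-up config key for this sketch.
        Object configured = config.get("prefix");
        if (configured != null) {
            prefix = configured.toString();
        }
    }

    // Stands in for the external system delivering data.
    void offer(String value) {
        queue.add(value);
    }

    @Override
    public SketchRecord<String> read() {
        try {
            // take() blocks until data is available, so read() never returns null.
            String value = prefix + queue.take();
            return () -> value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }

    @Override
    public void close() {
        queue.clear();
    }
}
```

The blocking `take()` call mirrors the requirement that `read()` should block when no new messages are available instead of returning null.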
  • 7. Overview of Pulsar IO: Architecture and Execution
    ● Pulsar IO is powered by the Pulsar Functions framework
    ● Inherits all the features and benefits of the Pulsar Functions framework
  • 8. Overview of Pulsar IO: Benefits Summarized
    ● Deployment flexibility - Run a custom source/sink or a built-in one
    ● Execution flexibility - Sources and sinks can run as part of an existing cluster, as a standalone process, on Kubernetes, etc.
    ● Parallelism - To increase the throughput of a source or sink, multiple instances of it can be run by adding a simple configuration.
    ● Load balancing - If sources and sinks are run in “cluster” mode
    ● Fault-tolerance, monitoring, and metrics - If sources and sinks are run in “cluster” mode, the worker service that is part of the Pulsar Functions framework will automatically monitor deployed sources and sinks. When nodes fail, sources and sinks will be redeployed to operational nodes. Metrics are also automatically collected.
    ● Dynamic updates - Each connector’s parallelism, source code, ingress and egress topics, and many other configurations can be changed on the fly
    ● Stateful - Has access to the State API
  • 9. Overview of Pulsar IO: References
    ● https://pulsar.apache.org/docs/en/io-overview/
    ● https://www.splunk.com/en_us/blog/it/introducing-pulsar-io.html
  • 10. Pulsar @ Splunk: Overview
    ● Pulsar used for both streaming and queueing use cases
    ● Deployed as a multi-tenant SaaS offering internally
    ● Data nervous system at Splunk
      ○ Moving / routing data
      ○ Connecting apps / services together
  • 11. Pulsar @ Splunk: DSP Architecture
  • 12. Pulsar IO @ Splunk: What was Pulsar IO used for?
    ● Powers the Unified Connector Framework (UCF) of Splunk’s Data Stream Processor (DSP) platform.
    ● Responsible for data ingress and egress of the DSP platform
  • 13. Pulsar IO @ Splunk: Why was Pulsar IO chosen?
    ● Inherent issues with existing homegrown / legacy platform
      ○ Complex Architecture
      ○ Scalability Issues
      ○ Performance
      ○ Infra Cost
      ○ Maintainability
    ● Leverage Open Source
      ○ Already using Pulsar, why not leverage more of its functionality?
      ○ Cost and risks of maintaining homegrown / proprietary platforms and protocols.
      ○ Leverage existing OSS connectors
      ○ Engage with community
  • 14. Pulsar IO @ Splunk: Architecture
  • 15. Pulsar IO @ Splunk: Use Case
    ● Large Scale Data Collection
      ○ Collecting large amounts of static data and ingesting them into DSP
    ● Requirements
      ○ Ingest petabytes of data per day
      ○ Tens of thousands of connectors
      ○ Low startup time
      ○ Cost effective
  • 16. Pulsar IO Batch Source: Overview
    ● Designed for ingesting large amounts of static data into Pulsar
      ○ For example, ingesting data stored in AWS S3 or GCS
    ● Two phases
      ○ Discovery - discover data to collect and ingest into Pulsar. The discovery phase will output a stream of tasks for the collect phase to execute.
        ■ Discovery phase is triggered by user-defined logic, usually periodically
        ■ Discovery phase only runs on one instance, i.e. instance-0
        ■ This is done to reduce redundant API calls to external systems that enforce rate limits and charge per call
      ○ Collect - collect the data. Execute the tasks generated by the discovery phase
        ■ Runs in parallel among instances
  • 17. Pulsar IO Batch Source: Architecture
    [Diagram: Trigger (for example, every 5 minutes) → Discover (in instance-0) → Tasks → Task queue (intermediate Pulsar topic) → Tasks → Collect in instance-0, instance-1, ..., instance-n → Events → Event queue (configured output Pulsar topic)]
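The two-phase flow in the diagram can be sketched in plain Java. This is a minimal simulation under stated assumptions, not the Pulsar implementation: in-memory queues stand in for the intermediate task topic and the output topic, threads stand in for connector instances, and the names `BatchFlowSketch`, `discover`, `collect`, and `runOnce` are hypothetical.

```java
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BatchFlowSketch {
    // Discovery: runs once (as on instance-0) and emits one task per object to collect.
    static void discover(List<String> objectKeys, Queue<String> taskQueue) {
        taskQueue.addAll(objectKeys);
    }

    // Collect: an instance drains tasks and turns each one into an event.
    static void collect(Queue<String> taskQueue, Queue<String> eventQueue) {
        String task;
        while ((task = taskQueue.poll()) != null) {
            eventQueue.add("contents-of:" + task);
        }
    }

    // One trigger cycle: a single discovery pass followed by parallel collection.
    static Set<String> runOnce(List<String> objectKeys, int instances) {
        Queue<String> taskQueue = new ConcurrentLinkedQueue<>();  // stands in for the intermediate topic
        Queue<String> eventQueue = new ConcurrentLinkedQueue<>(); // stands in for the output topic
        discover(objectKeys, taskQueue);

        ExecutorService pool = Executors.newFixedThreadPool(instances);
        for (int i = 0; i < instances; i++) {
            pool.submit(() -> collect(taskQueue, eventQueue));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("collectors did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new TreeSet<>(eventQueue);
    }
}
```

In this sketch each task is taken by exactly one collector because `poll()` removes it from the shared queue, while discovery runs only once per trigger regardless of how many collectors exist.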
  • 18. Pulsar IO Batch Source: API

    public interface BatchSource<T> extends AutoCloseable {
        /**
         * Open connector with configuration.
         *
         * @param config config that's supplied for source
         * @param context environment where the source connector is running
         * @throws Exception IO type exceptions when opening a connector
         */
        void open(final Map<String, Object> config, SourceContext context) throws Exception;

        /**
         * Discovery phase of a connector. This phase will only be run on one instance, i.e. instance 0, of the connector.
         * Implementations use the taskEater consumer to output serialized representation of tasks as they are discovered.
         *
         * @param taskEater function to notify the framework about the new task received.
         * @throws Exception during discover
         */
        void discover(Consumer<byte[]> taskEater) throws Exception;

        /**
         * Called when a new task appears for this connector instance.
         *
         * @param task the serialized representation of the task
         */
        void prepare(byte[] task) throws Exception;

        /**
         * Read data and return a record
         * Return null if no more records are present for this task
         * @return a record
         */
        Record<T> readNext() throws Exception;
    }

    public interface BatchSourceTriggerer {
        /**
         * Initializes the Triggerer with given config. Note that the triggerer doesn't start running
         * until start is called.
         *
         * @param config config needed for triggerer to run
         * @param sourceContext The source context associated with the source
         * The parameter passed to this trigger function is an optional description of the event that caused the trigger
         * @throws Exception throws any exceptions when initializing
         */
        void init(Map<String, Object> config, SourceContext sourceContext) throws Exception;

        /**
         * Triggerer should actually start looking out for trigger conditions.
         *
         * @param trigger The function to be called when it's time to trigger the discover
         * This function can be passed any metadata about this particular
         * trigger event as its argument
         * This method should return immediately. It is expected that implementations will use their own mechanisms
         * to schedule the triggers.
         */
        void start(Consumer<String> trigger);

        /**
         * Triggerer should stop triggering.
         */
        void stop();
    }
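To make the triggerer contract above more concrete, here is a minimal sketch of a periodic triggerer. It is written against a simplified stand-in interface, `SketchTriggerer`, which omits `SourceContext` so the example compiles without Pulsar on the classpath; that interface, the `PeriodicTriggerer` class, and the `intervalMillis` config key are all assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Stand-in (assumption) mirroring the shape of BatchSourceTriggerer without SourceContext.
interface SketchTriggerer {
    void init(Map<String, Object> config) throws Exception;
    void start(Consumer<String> trigger);
    void stop();
}

class PeriodicTriggerer implements SketchTriggerer {
    private long intervalMillis = 300_000; // default of 5 minutes, matching the example in the diagram
    private ScheduledExecutorService scheduler;

    @Override
    public void init(Map<String, Object> config) {
        // "intervalMillis" is a hypothetical config key for this sketch.
        Object value = config.get("intervalMillis");
        if (value != null) {
            intervalMillis = Long.parseLong(value.toString());
        }
    }

    @Override
    public void start(Consumer<String> trigger) {
        // Returns immediately; the scheduler thread fires the trigger later.
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> trigger.accept("periodic"),
                intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

Scheduling on a separate executor is one way to honor the note that `start` should return immediately and leave the timing mechanism to the implementation.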
  • 19. Pulsar IO @ Splunk: Deployment @ Splunk
    ● Deploying Function Workers separately from Brokers
    ● Function / Connector instances running as threads within the Worker JVM process
      ○ Connectors are viewed as vetted / safe code.
    ● Leveraging the built-in connectors functionality of Pulsar IO
    ● State store for Pulsar Functions, i.e. the table service, deployed as an independent cluster as well.
  • 20. Giving back to OSS @ Splunk: Overview for Pulsar IO / Functions
    ● Pulsar IO Batch Source
    ● Re-designed core pieces of the Pulsar Functions architecture to improve scalability and stability
    ● Performance testing
      ○ Running 100,000 instances (10,000 sources, 10 instances each)
      ○ Petabytes per day ingest rates
    ● Numerous bug fixes
  • 21. Improvements to Pulsar IO @ Splunk: Metadata Topic Compaction
    ● Pulsar Functions uses an internal topic called the “metadata” topic to hold a log of functions/sources/sinks submitted to run. Previously, this topic grew unbounded over time. At Splunk, we re-designed and re-implemented the metadata registration workflow in Pulsar Functions to support topic compaction so that old metadata can be safely truncated.
    ● https://github.com/apache/pulsar/pull/7255
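The idea behind compacting a keyed log can be sketched as follows: only the newest entry per key needs to survive, so older entries can be dropped. This is an illustrative in-memory model, not Pulsar's compaction code. The `MetaEntry` and `CompactionSketch` names are hypothetical, and treating a null value as a delete marker is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log entry: a key (for example a function name) and its metadata.
final class MetaEntry {
    final String key;
    final String value; // null marks a delete in this sketch

    MetaEntry(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

class CompactionSketch {
    // Keeps only the most recent value per key and drops keys whose latest entry is a delete.
    static List<MetaEntry> compact(List<MetaEntry> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (MetaEntry entry : log) {
            latest.remove(entry.key); // re-insert so ordering follows the newest write
            if (entry.value != null) {
                latest.put(entry.key, entry.value);
            }
        }
        List<MetaEntry> compacted = new ArrayList<>();
        latest.forEach((key, value) -> compacted.add(new MetaEntry(key, value)));
        return compacted;
    }
}
```

For a log of `(fn-a, v1)`, `(fn-b, v1)`, `(fn-a, v2)`, `(fn-b, null)`, `compact` returns the single entry `(fn-a, v2)`, which is why a compacted log stays bounded by the number of live keys instead of the number of writes.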
  • 22. Improvements to Pulsar IO @ Splunk: Improving Scheduling Performance and Stability
    ● Problem
      ○ When the leader worker isn't processing assignment messages fast enough, the background routine that checks for unassigned function instances triggers the scheduler to schedule and write more assignments to the assignment topic. This creates a feedback loop that can cause many unnecessary assignment updates to be published to the assignment topic.
    ● Modification
      ○ When a worker becomes the leader, it stops tailing the assignment topic. Since the leader runs the scheduling process, it already knows which instances are assigned to which worker, so it is unnecessary for it to tail the assignment topic.
    ● https://github.com/apache/pulsar/pull/7237
  • 23. Using Exclusive Producer
  • 24. Improvements to Pulsar IO @ Splunk: Automatic Re-balancing of Instances
    ● Problem
      ○ Previously, there was no mechanism for re-balancing instances scheduled on Function Workers.
      ○ Scheduling of instances to workers may become skewed over time
    ● Modification
      ○ Add an interface for rebalance strategies.
      ○ Allow users to trigger a rebalance on demand
      ○ Implement the ability to rebalance automatically and periodically
    ● https://github.com/apache/pulsar/pull/7237
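One way to picture a rebalance strategy is as a function from the current instance-to-worker assignment to a more even one. The slide does not show the actual interface, so `RebalanceSketch` and `EvenRebalance` below are hypothetical names with an assumed signature. The sketch moves one instance at a time from the most loaded worker to the least loaded one until their counts differ by at most one, and it assumes every assigned worker appears in the `workers` list.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical strategy interface; the real signature is not given in the slides.
interface RebalanceSketch {
    Map<String, String> rebalance(Map<String, String> instanceToWorker, List<String> workers);
}

class EvenRebalance implements RebalanceSketch {
    @Override
    public Map<String, String> rebalance(Map<String, String> instanceToWorker, List<String> workers) {
        Map<String, String> result = new TreeMap<>(instanceToWorker);

        // Group instances by current worker; workers without instances get empty queues.
        Map<String, Deque<String>> byWorker = new TreeMap<>();
        for (String worker : workers) {
            byWorker.put(worker, new ArrayDeque<>());
        }
        result.forEach((instance, worker) -> byWorker.get(worker).add(instance));

        while (true) {
            String max = null;
            String min = null;
            for (String worker : byWorker.keySet()) {
                if (max == null || byWorker.get(worker).size() > byWorker.get(max).size()) {
                    max = worker;
                }
                if (min == null || byWorker.get(worker).size() < byWorker.get(min).size()) {
                    min = worker;
                }
            }
            // Stop when there are no workers or the load is already even.
            if (max == null || byWorker.get(max).size() - byWorker.get(min).size() <= 1) {
                break;
            }
            // Move a single instance so that as few instances as possible are rescheduled.
            String moved = byWorker.get(max).removeLast();
            byWorker.get(min).add(moved);
            result.put(moved, min);
        }
        return result;
    }
}
```

Moving only the surplus instances keeps most assignments untouched, which matters when every move means stopping and restarting a running connector instance.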
  • 25. Future of Pulsar IO @ Splunk
    ● Autoscaling
      ○ Workers and instances
        ■ Autoscaling workers on K8s using K8s HPAs (almost done)
    ● Resource Aware Scheduling
      ○ Scheduling that takes into account
    ● Continue to work on getting topic compaction to work fully with the internal topics used by Pulsar Functions
    ● Integrating client memory limits into Pulsar Functions
      ○ Adding per-producer and per-consumer memory limits
        ■ More isolation even when running instances as threads
    ● More fixes and optimizations!
  • 26. Connectors as a Service
    ● Providing a platform for all connectors to run on at Splunk based on Pulsar IO
  • 27. Connectors as a Service Continued...
  • 28. Connectors as a Service Continued...
  • 29. Thank you! We are hiring! jerryp@splunk.com