Mais conteúdo relacionado Semelhante a Pulsar Functions Deep Dive_Sanjeev kulkarni (20) Mais de StreamNative (20) Pulsar Functions Deep Dive_Sanjeev kulkarni1. © 2020 SPLUNK INC.
Pulsar Functions
A Deep Dive | Pulsar Summit 2020
Sanjeev Kulkarni
sanjeevk@splunk.com
2. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
3. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
4. © 2020 SPLUNK INC.
Pulsar Functions:- A Brief Introduction
Bringing Serverless concepts to the
streaming world.
Execute processing logic per
message on input topic
Function output goes to an output
topic
• Optional
Abstract View
Core Concept
5. © 2020 SPLUNK INC.
Pulsar Functions:- A Brief Introduction
Emphasis on simplicity
Great for 90% use-cases on streams
• Filtering
• Routing
• Enrichment
Not meant to replace Spark/Flink
SDK-less API
import java.util.function.Function;
public class ExclamationFunction implements Function<String, String> {
@Override
public String apply(String input) {
return input + "!";
}
}
Simple API
6. © 2020 SPLUNK INC.
Pulsar Functions:- A Brief Introduction
Flexible execution environments
• Pulsar managed
– Thread
– Process
• Externally managed
– Kubernetes
CRUD based Rest API
Function lifecycle
7. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
8. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Submit to any worker
Json repr of FunctionConfig
• tenant/namespace/name
• Input/Output
• configs
• lot more knobs ….
Function Code
• jars/.py/zip/etc
FunctionConfig
public class FunctionConfig {
private String tenant;
private String namespace;
private String name;
private String className;
private Collection<String> inputs;
private String output;
private ProcessingGuarantees processingGuarantees;
private Map<String, Object> userConfig;
private Map<String, Object> secrets;
private Integer parallelism;
private Resources resources;
...
}
Function Representation
9. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
AuthN/AuthZ checks
FunctionConfig validation
• missing parameters
• Incorrect parameters
• Local Configs
Function Code Validation
• class presence, etc
Copy Code to Bookeeper
FunctionMetaData
message FunctionMetaData {
FunctionDetails functionDetails;
PackageLocationMetaData packageLocation;
uint64 version;
uint64 createTime;
map<int32, FunctionState> instanceStates;
FunctionAuthenticationSpec functionAuthSpec;
}
Submission Checks
10. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
System of record
Stores all Functions
• map from <FQFN, FunctionMetaData>
FQFN:- Fully Qualified Function
Name
Backed by Pulsar Topic
• Function MetaData Topic
Contains a MetaData Topic Tailer
Function MetaData Manager
Function MetaData Manager
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
11. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Just before Function
creation/update/delete
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
12. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
foo -> {functionDetails : {...},
version: 2,
…}
13. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Merge the updates
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
foo -> {functionDetails : {......},
version: 2,
…}
14. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Merge the updates
Increment the version
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
foo -> {functionDetails : {......},
version: 3,
…}
15. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Merge the updates
Increment the version
Write to MetaData Topic
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
16. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Merge the updates
Increment the version
Write to MetaData Topic
Tailer reads and verifies
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
17. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Make a copy of the current state
Merge the updates
Increment the version
Write to MetaData Topic
Tailer reads and verifies
Upon no conflict, tailer updates
Function MetaData Manager:- Update State Machine
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {.....},
version: 3,
…}
18. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Multiple Workers
Function MetaData Manager:- When do conflicts occur?
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 2,
…}
Worker 2
19. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Multiple Workers
Concurrent updates to same function
Function MetaData Manager:- When do conflicts occur?
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 2,
…}
Worker 2
20. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Multiple Workers
Concurrent updates to same function
First Writer Wins
Function MetaData Manager:- When do conflicts occur?
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 3,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 3,
…}
Worker 2
21. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Multiple Workers
Concurrent updates to same function
First Writer Wins
Others are rejected
Function MetaData Manager:- When do conflicts occur?
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 3,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 3,
…}
Worker 2
22. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
Submit to any worker
Validation load scales linearly
Deterministic State Machine
MetaData Topic is audit log
Advantages
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 3,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 3,
…}
Worker 2
23. © 2020 SPLUNK INC.
Pulsar Functions:- Submission Workflow
MetaData topic topic growth
MetaData Topic compaction
non-trivial
Worker Start time
All Workers know everything
Pitfalls
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 3,
…}
Worker 1
MetaData Topic Tailer
foo -> {functionDetails : {...},
version: 3,
…}
Worker 2
24. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
25. © 2020 SPLUNK INC.
Pulsar Functions:- Scheduling Workflow
Abstracts out Scheduler
Executed only on a Leader
Invoked when
• Function CRUD operations
– create/update
– delete
• Worker Changes
– Unresponsive/dead workers
– New workers
– Periodic
– Leadership changes
IScheduler Interface
public interface IScheduler {
List<Assignment> schedule(<List<Instance> unassigned,
List<Instance> current,
Set<String> workers);
}
Pluggable Scheduler
26. © 2020 SPLUNK INC.
Pulsar Functions:- Scheduling Workflow
Empty Coordination Topic
Failover Subscription
Active Consumer is the Leader
Leader Election
Leader Election
Coordination Topic
Worker 1Worker 2Worker 3
27. © 2020 SPLUNK INC.
Pulsar Functions:- Scheduling Workflow
Assignment Topic
Written by the Leader
Compacted based on key(FQFN +
Instance Id)
All workers know about all
assignments
Function Assignments
Assignment Topic
Worker 1Worker 2Worker 3
{foo, 1} : worker-1,
...
{foo, 1} : worker-1,
...
{foo, 1} : worker-1,
...
Assignment Tailer Assignment Tailer Assignment Tailer
28. © 2020 SPLUNK INC.
Pulsar Functions:- Scheduling Workflow
Stores Assignment
Compacted
Key -> (FQFN + InstanceId)
Assignment
message Instance {
FunctionMetaData functionMetaData = 1;
int32 instanceId = 2;
}
message Assignment {
Instance instance = 1;
string workerId = 2;
}
Assignment Topic
29. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
30. © 2020 SPLUNK INC.
Pulsar Functions:- Execution Workflow
Triggered by Changes to
Assignment Table
Takes care of the worker’s specific
assignments
Function lifecycle management via
Spawner
Function RunTime Manager
Assignment Topic
Worker
{foo, 1} : worker-1,
...
Assignment Tailer
RunTime Manager
Spawner Spawner
31. © 2020 SPLUNK INC.
Pulsar Functions:- Execution Workflow
Abstracts out execution
environments using Runtime Factory
Manages Function lifecycle
Maintains grpc connection with
Function instance
Spawner
GRPC Channel
Spawner
Function
32. © 2020 SPLUNK INC.
Pulsar Functions:- Execution Workflow
Short-circuit MetaData Manager and
Runtime Manager
Directly use Spawner
Local Runner
GRPC Channel
Spawner
Function
Local Runner
33. © 2020 SPLUNK INC.
Pulsar Functions:- Execution Workflow
Simple interface for creating
execution environments
Creates Runtimes
Runtime Factory
public interface RuntimeFactory {
void initialize(WorkerConfig workerConfig);
Runtime createContainer(InstanceConfig instanceConfig,
String codeFile);
void close();
}
Runtime Factory
34. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
35. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Java Instance is (source, function,
sink) ensemble.
Source abstracts reading from input
topics
Sink abstracts writing to output topic
Java Instance
Source -> Process -> Sink
Source Sink
f
36. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Pulsar Source implements the
Source interface to read from Pulsar
topics
Pulsar Sink implements Sink
interface to write to Pulsar topic
Java Instance
Regular Pulsar Functions
Pulsar
Source
Pulsar
Sink
f
37. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Java Instance
What if we have non-Pulsar Source?
Non
Pulsar
Source
Pulsar
Sink
f
38. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Java Instance
Pulsar IO
Non
Pulsar
Source
Pulsar
Sink
f
39. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Non Pulsar Source reads from
external system
Identity Function lets the data pass
thru
Pulsar Sink writes to Pulsar
Java Instance
Pulsar IO Source
Non
Pulsar
Source
SinkIdentity
40. © 2020 SPLUNK INC.
Pulsar Functions:- Java Instance
Pulsar Source reads from Pulsar
topics
Identity Function lets the data pass
thru
Non Pulsar Sink writes to an external
system
Java Instance
Pulsar IO Sink
Pulsar
Source
Non
Pulsar
Sink
Identity
41. © 2020 SPLUNK INC.
Pulsar Functions:- A Deep Dive
Brief introduction to Pulsar Functions
Deep Dive into internals
• Submission workflow
• Scheduling workflow
• Execution workflow
• Java Instance concepts
Current/Future Work
Agenda
42. © 2020 SPLUNK INC.
Pulsar Functions:- Future Work
Each setup only supports a static
Runtime(Process/Thread/Pods)
Change it to be dynamically
specified during submission
Function RunTime Manager
Changes
Dynamic Runtime Selection
Assignment Topic
Worker
{foo, 1} : worker-1,
...
Assignment Tailer
RunTime Manager
Spawner
Function-1
Spawner
Function-2
Thread
Process
43. © 2020 SPLUNK INC.
Pulsar Functions:- Future Work
MetaData Topic not compacted
Stores all function change requests
Worker needs to read from
beginning upon startup
Function MetaData Topic Compaction
MetaData Topic Tailer
MetaData Topic
foo -> {functionDetails : {...},
version: 2,
…}
44. © 2020 SPLUNK INC.
Pulsar Functions:- Future Work
Chaining Functions
Output of one going as input of
others
A simple workflow API
Function Mesh
f1
f2
f3
f4
45. © 2020 SPLUNK INC.
Pulsar Functions:- Future Work
Discover/Collect Cycle
Repeating Cycle
Don’t drop discovered tasks on
failures
BatchSource
public interface BatchSource<T> {
void open(final Map<String, Object> config,
SourceContext context);
void discover(Consumer<byte[]> taskEater);
void prepare(byte[] task);
Record<T> readNext();
}
Batch Sources