SlideShare uma empresa Scribd logo
1 de 69
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How Nextdoor Built a Scalable, Serverless
Data Pipeline for Billions of Events per Day
S R V 3 1 9
Prakash Janakiraman · Co-Founder & Chief Architect
Matt Wise · Sr. Systems Architect
Slava Markeyev · Software Engineer
N o v e m b e r 2 9 , 2 0 1 7
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Speakers
Prakash Janakiraman Slava MarkeyevMatt Wise
Co-Founder & Chief Architect Sr. Systems Architect Systems Engineer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to Expect from the Session
• About Nextdoor
• Overview of our legacy pipeline
• Requirements for new pipeline
• Recap of AWS Serverless offerings
• Tool we built to help standardize Serverless ETL
• Deployment example via Cloudformation
• Q&A
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Used by 80% of US neighborhoods
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data drives products at Nextdoor
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data at Nextdoor
0
1
2
3
2013 2014 2015 2016 2017
BillionsEvents/Day
Data Ingested
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Realtime Data Ingestion Pipeline
Slava Markeyev · Software Engineer
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Legacy pipeline overview
• Built using Apache Flume
• A distributed self-hosted service for
collecting and aggregating log data
(similar to AWS Kinesis Streams,
Apache Kafka, and Fluentd)
• Served us well for 4 years
• Scaled to handle ~1.5B events per
day
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Legacy pipeline overview
Flume Collector Host
Application
Host Flume Agent
Syslog
Flume Collector Host
Flume Collector Host
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Legacy pipeline overview
Amazon S3 bucket
Flume CollectorHost
Amazon Redshift
Developers
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Legacy pipeline pain points
• Highly susceptible to back pressure
• Slow destination
• Influx of data
• Occasional slow Flume Collector (process, IO, CPU, bugs?)
• Restarts were very slow (tens of minutes)
• Hard to apply changes to the pipeline
• Proprietary storage format made it difficult to near-impossible to recover
corrupted data
• Flume agents were CPU hungry and this caused issues on critical infrastructure
• Hard to scale quickly with influxes of log data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What we wanted in a new pipeline
• Spend less (or no) time managing the pipeline so we could focus on other
responsibilities
• Leaner server infrastructure
• Easy (re)configuration of the pipeline
• Tighter SLAs
• Streaming logs into Elasticsearch with an SLA of 1 minute
• Writing partitioned batches of data into S3 with an SLA of 5 minutes
• Automatically handle influx in data and normal cyclical patterns
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Example: Trending post notifications
Clicks and impressions
through Data pipeline
Analysis and
aggregation
DynamoDB
Amazon S3
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis Streams Lambda Kinesis Firehose S3
AWS managed offerings
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What your pipeline might look like
Lambda
Function
Lambda
Function
Kinesis
Firehose
S3
Kinesis Stream
Flume Collector
Host
Syslog
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS Lambda functions
• Easy to write lightweight processing functions in Node, Python, Java, and C#
• However, it’s ETL and there’s a lot of boilerplate
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What is ETL?
Extract
• Read data from a source and make available for consumption
• Consume the data into a data structure readable by a machine language
(for example Java Objects)
Transform
• Lowercase, Redact, Augment, Join, and so on
Load
• Write object back to binary form (maybe compress)
• Transport the binary data to a destination
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Where is the boilerplate?
Step Example
Reading data source Reading file from S3 in batches
Deserialization Binary → Object
Filtering Remove trace level logging
Transformation Lowercase, redact, augment, join
Serialization Object → Binary
Transport Binary → Destination format → Destination
Insight Counting, logging, monitoring, and alerting
Error handling Retrying, exception handling, error reporting
Extract
Transform
Load
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Standardizing Lambda functions
• What if tomorrow we wanted to ingest
logs from ALB, ELB, VPC...
• How do we reuse existing code while
adding features and minimizing spaghetti
code and copypasta?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Design considerations
• CPU cycles
• Memory footprint
• Network I/O
• Concurrency and thread pooling
• Buffering and compressing
• Retry logic
• Graceful error handling
• Metric reporting
• Configuration
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Our solution
github.com/nextdoor/bender
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
No more spaghetti and copypasta!
Bender handles the complex plumbing and provides the interfaces necessary to
build your own business logic for any step of the ETL process.
Filter Deserialize Transform Wrap Serialize Transport
Data Sources Data Sinks
Kinesis Streams SNS TopicS3 Other
?
S3 Kinesis Firehose Other
?
Elasticsearch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What ETL steps are supported?
ETL Step
Sources Kinesis, S3, and SNS
Deserialization JSON and anything parseable by a RegEx
Filtering RegEx and str contains
Operations GeoIp lookup, multi-record splits, Elasticsearch Ingest
Wrappers Basic, S3, and Kinesis
Transporters Firehose, S3, Elasticsearch, Splunk, Sumologic, Scalyr,
Generic HTTP
Extract
Transform
Load
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Additional functionality
Other
Error handling Graceful retries and robust exception handling
Reporting CloudWatch metrics and Datadog
Configuration Config validation before deployment
Documentation Auto generated documentation from code
Useful
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Bender: Configuration
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
Type of Lambda trigger this function is using
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
How to treat data coming from the source
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
Filter out data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
How to read data coming from the source
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
Operations to perform on the data
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
{
"partition_key": "1234.456",
"sequence_number": "12345",
"source_arn": "us-east-1:1234:stream/logs.",
"event_source": "aws:kinesis",
"function_name": "log-pipeline...",
"function_version": "97",
"processing_time": 1507823274129,
"arrival_time": 1507823273628,
"processing_delay": 1199,
"timestamp": 1507823272930,
"payload": {
// JSON data from original message
}
}
Add a wrapper
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
Write it back to JSON string or maybe something else?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch
Send it off somewhere
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What does the configuration look like?
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
regex_patterns: [".*TRACE.*"]
deserializer:
type: GenericJson
operations:
- ...
wrapper:
type: KinesisWrapper
serializer:
type: Json
transport:
type: ElasticSearch
index_time_format: yyyy-MM-dd
reporters:
- type: Cloudwatch Report metrics
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Extending Bender
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Operation to transform JSON
If key exists replace with value
if payload.has_key(key):
payload[key] = new_value Something like this…
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What the config code will look like
@JsonTypeName("JsonReplaceKVOperation")
@JsonSchemaDescription("Replaces a given key with specified value.")
public class JsonReplaceKVOperationConfig extends OperationConfig {
@JsonSchemaDescription("Key name to find")
@JsonProperty(required = true)
private String key;
@JsonSchemaDescription("New string value")
@JsonProperty(required = true)
private String value;
@Override
public Class<JsonReplaceKVOperationFactory> getFactoryClass() {
return JsonReplaceKVOperationFactory.class;
}
// getters and setters
}
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What the config file will look like
handler:
type: KinesisHandler
sources:
- name: Events
source_regex: .*:stream/my-stream
deserializer:
type: GenericJson
operations:
- type: JsonReplaceKVOperation
key: foo
value: bar
...
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What the documentation will look like
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Operation code
public class JsonReplaceKVOperation extends PayloadOperation {
private String key;
private String value;
public JsonReplaceKVOperation(String key, String value) {
this.key = key;
this.value = value;
}
@Override
protected void perform(JsonObject payload) {
if (payload.has(this.key))
// Note: addProperty replaces the current value
payload.addProperty(this.key, this.value);
}
}
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Code on GitHub
github.com/nextdoor/bender
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Why did we choose AWS serverless offerings?
Matt Wise · Sr. Systems Architect
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Kinesis for data ingestion
• Operational overhead is near zero
• Scaling streams up and down is a simple API call
• Scaling up gives you more performance in every dimensions
• Data retention of 24 hours to 7 days
• Stream access is fast – delays measured in milliseconds
• AWS provides tools…
• Log collection agent
• Stream Auto Scaling agent
• KCL for in-app direct stream publishing
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Lambda for our execution layer
• Scales automatically with Kinesis and your data
• Manages Kinesis Triggers, Function Iterators all on its own
• Provides plenty of metrics that are easy to alarm on
• Blazing fast … backlogs are worked through as fast as your target can handle it
• Built in easy to use management operations
• Pausing and enabling triggers
• Releasing by updating a function to the latest version
• Rollbacks by just rolling back to the last known good version
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Firehose for data aggregation
• Lambda+Kinesis Streams = Lots of small operations!
• Kinesis -> Lambda -> S3 = Tons of tiny files
• Bulk-uploading files to S3 leads to faster downstream analytics performance
• Bender can batch multiple individual events up into a single Firehose Record,
keeping the costs sane and maximizing performance!
• Max-file-size and Max-event-age settings allow you to control your uploads and
your data delay
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
S3 for short term and long term storage
• Fast, cost effective, and stable
• Lifecycle policies allow for automatic purging or migration of data
• Hands-off operation
• Fine grained access controls
• And so on, and so on, and so on...
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Final months of the old pipeline…
Pipeline 80% full here!
Data delays for days!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Well that looks better!
Y-Axis here less than 10 minutes!
Peak delay of less than 1 minute!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Totally worth it!
• Old pipeline was actually dropping events regularly - ~0.02% is millions and millions of eventsFlume Kinesis/Lambda/Firehose/S3
@ 1.5B events/day, was dropping 0.02% @ 3B events/day, all events make it
Required significant architectural work to
scale beyond ~2B events/day
Built to scale on day one, all major scaling
components are AWS-managed
Near weekly outages that lasted for hours In 12 months, 1 outage that lasted more than
an hour.. Automatically resolved by AWS
Significant operational knowledge required
for operation and scaling of the system
Java knowledge required to add features –
AWS operates the rest
~15-20% of a an engineers time to manage Literally almost zero management in months
Combined ingest/storage/forwarding make
changes hard and reduce stability
Separated responsibilities are easier to scale
and tweak, improve reliability
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A real pipeline!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Anatomy of a pipeline
CloudTrail S3 Bucket SNS Topic Subscription ElasticSearch
Domain
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Office
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Geo Filter
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
S3 Bucket SNS
Elasticsearch
Users
App
Container Kinesis
Application
Endpoint
Instance Logs Kinesis
CloudTrail
CloudFront
ELB/ALB
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Final thoughts...
• The Systems Team @ Nextdoor has 3 people…
• We manage hundreds of servers, and we automate everything
• Hands-off auto-scaling is critical to keeping our team small and efficient
• Lambda/Kinesis costs are a drop in the bucket compared to engineering
productivity costs when things don’t work
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
We’re hiring!
nextdoor.com/jobs
https://s3.amazonaws.com/bender-presentation-assets-reinvent-2017/bender-stack.yaml
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!

Mais conteúdo relacionado

Mais procurados

ABD310 big data aws and security no notes
ABD310 big data aws and security no notesABD310 big data aws and security no notes
ABD310 big data aws and security no notesAmazon Web Services
 
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...Amazon Web Services
 
SID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWSSID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWSAmazon Web Services
 
DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationAmazon Web Services
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...Amazon Web Services
 
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingGAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingAmazon Web Services
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTAmazon Web Services
 
GPSTEC305-Machine Learning in Capital Markets
GPSTEC305-Machine Learning in Capital MarketsGPSTEC305-Machine Learning in Capital Markets
GPSTEC305-Machine Learning in Capital MarketsAmazon Web Services
 
SRV334-Making Things Right with AWS Config Rules and AWS Lambda
SRV334-Making Things Right with AWS Config Rules and AWS LambdaSRV334-Making Things Right with AWS Config Rules and AWS Lambda
SRV334-Making Things Right with AWS Config Rules and AWS LambdaAmazon Web Services
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...Amazon Web Services
 
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDSDAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDSAmazon Web Services
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...Amazon Web Services
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...Amazon Web Services
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesAmazon Web Services
 
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum Security
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum SecurityFSV306_Getting to Yes—Minimal Viable Cloud with Maximum Security
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum SecurityAmazon Web Services
 
Deploying Business Analytics at Enterprise Scale - AWS Online Tech Talks
Deploying Business Analytics at Enterprise Scale - AWS Online Tech TalksDeploying Business Analytics at Enterprise Scale - AWS Online Tech Talks
Deploying Business Analytics at Enterprise Scale - AWS Online Tech TalksAmazon Web Services
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamAmazon Web Services
 

Mais procurados (20)

ABD310 big data aws and security no notes
ABD310 big data aws and security no notesABD310 big data aws and security no notes
ABD310 big data aws and security no notes
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...
Deploy and Enforce Compliance Controls When Archiving Large-Scale Data Stores...
 
SID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWSSID344-Soup to Nuts Identity Federation for AWS
SID344-Soup to Nuts Identity Federation for AWS
 
GPSTEC307_Too Many Tools
GPSTEC307_Too Many ToolsGPSTEC307_Too Many Tools
GPSTEC307_Too Many Tools
 
DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational Transformation
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
 
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game BalancingGAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
GAM310_Build a Telemetry and Analytics Pipeline for Game Balancing
 
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPTHow TrueCar Gains Actionable Insights with Splunk Cloud PPT
How TrueCar Gains Actionable Insights with Splunk Cloud PPT
 
GPSTEC305-Machine Learning in Capital Markets
GPSTEC305-Machine Learning in Capital MarketsGPSTEC305-Machine Learning in Capital Markets
GPSTEC305-Machine Learning in Capital Markets
 
SRV334-Making Things Right with AWS Config Rules and AWS Lambda
SRV334-Making Things Right with AWS Config Rules and AWS LambdaSRV334-Making Things Right with AWS Config Rules and AWS Lambda
SRV334-Making Things Right with AWS Config Rules and AWS Lambda
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
 
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDSDAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
DAT309_Best Practices for Migrating from Oracle and SQL Server to Amazon RDS
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
 
ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...ABD207 building a banking utility leveraging aws to fight financial crime and...
ABD207 building a banking utility leveraging aws to fight financial crime and...
 
What’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial DatabasesWhat’s New in Amazon RDS for Open-Source and Commercial Databases
What’s New in Amazon RDS for Open-Source and Commercial Databases
 
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum Security
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum SecurityFSV306_Getting to Yes—Minimal Viable Cloud with Maximum Security
FSV306_Getting to Yes—Minimal Viable Cloud with Maximum Security
 
Deploying Business Analytics at Enterprise Scale - AWS Online Tech Talks
Deploying Business Analytics at Enterprise Scale - AWS Online Tech TalksDeploying Business Analytics at Enterprise Scale - AWS Online Tech Talks
Deploying Business Analytics at Enterprise Scale - AWS Online Tech Talks
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security Team
 

Semelhante a How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Events per Day - SRV319 - re:Invent 2017

I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...Amazon Web Services
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Amazon Web Services
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural PatternsAmazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWSAdrian Hornsby
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
Journey Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million UsersJourney Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million UsersAdrian Hornsby
 
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_SingaporeAmazon Web Services
 
Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...Amazon Web Services
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Amazon Web Services
 
Reactive Architectures with Microservices
Reactive Architectures with MicroservicesReactive Architectures with Microservices
Reactive Architectures with MicroservicesAWS Germany
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Brendan Bouffler
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightAmazon Web Services
 
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta penggunaScale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta penggunaAmazon Web Services
 
DAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudDAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudAmazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 

Semelhante a How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Events per Day - SRV319 - re:Invent 2017 (20)

I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural Patterns
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWS
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million Users
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWS
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
Journey Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million UsersJourney Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million Users
 
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
 
Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...
 
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
Serverless Stream Processing Pipeline Best Practices (SRV316-R1) - AWS re:Inv...
 
Reactive Architectures with Microservices
Reactive Architectures with MicroservicesReactive Architectures with Microservices
Reactive Architectures with Microservices
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
 
Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018Genomics on aws-webinar-april2018
Genomics on aws-webinar-april2018
 
Serverless Architectures.pdf
Serverless Architectures.pdfServerless Architectures.pdf
Serverless Architectures.pdf
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
 
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta penggunaScale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
 
DAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the CloudDAT317_Migrating Databases and Data Warehouses to the Cloud
DAT317_Migrating Databases and Data Warehouses to the Cloud
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Events per Day - SRV319 - re:Invent 2017

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Events per Day S R V 3 1 9 Prakash Janakiraman · Co-Founder & Chief Architect Matt Wise · Sr. Systems Architect Slava Markeyev · Software Engineer N o v e m b e r 2 9 , 2 0 1 7
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Speakers Prakash Janakiraman Slava MarkeyevMatt Wise Co-Founder & Chief Architect Sr. Systems Architect Systems Engineer
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What to Expect from the Session • About Nextdoor • Overview of our legacy pipeline • Requirements for new pipeline • Recap of AWS Serverless offerings • Tool we built to help standardize Serverless ETL • Deployment example via Cloudformation • Q&A
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Used by 80% of US neighborhoods
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data drives products at Nextdoor
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data at Nextdoor 0 1 2 3 2013 2014 2015 2016 2017 BillionsEvents/Day Data Ingested
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Realtime Data Ingestion Pipeline Slava Markeyev · Software Engineer
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Legacy pipeline overview • Built using Apache Flume • A distributed self-hosted service for collecting and aggregating log data (similar to AWS Kinesis Streams, Apache Kafka, and Fluentd) • Served us well for 4 years • Scaled to handle ~1.5B events per day
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Legacy pipeline overview Flume Collector Host Application Host Flume Agent Syslog Flume Collector Host Flume Collector Host
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Legacy pipeline overview Amazon S3 bucket Flume CollectorHost Amazon Redshift Developers
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Legacy pipeline pain points • Highly susceptible to back pressure • Slow destination • Influx of data • Occasional slow Flume Collector (process, IO, CPU, bugs?) • Restarts were very slow (tens of minutes) • Hard to apply changes to the pipeline • Proprietary storage format made it difficult to near-impossible to recover corrupted data • Flume agents were CPU hungry and this caused issues on critical infrastructure • Hard to scale quickly with influxes of log data
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What we wanted in a new pipeline • Spend less (or no) time managing the pipeline so we could focus on other responsibilities • Leaner server infrastructure • Easy (re)configuration of the pipeline • Tighter SLAs • Streaming logs into Elasticsearch with an SLA of 1 minute • Writing partitioned batches of data into S3 with an SLA of 5 minutes • Automatically handle influx in data and normal cyclical patterns
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example: Trending post notifications Clicks and impressions through Data pipeline Analysis and aggregation DynamoDB Amazon S3
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kinesis Streams Lambda Kinesis Firehose S3 AWS managed offerings
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What your pipeline might look like Lambda Function Lambda Function Kinesis Firehose S3 Kinesis Stream Flume Collector Host Syslog
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Lambda functions • Easy to write lightweight processing functions in Node, Python, Java, and C# • However, it’s ETL and there’s a lot of boilerplate
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What is ETL? Extract • Read data from a source and make available for consumption • Consume the data into a data structure readable by a machine language (for example Java Objects) Transform • Lowercase, Redact, Augment, Join, and so on Load • Write object back to binary form (maybe compress) • Transport the binary data to a destination
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Where is the boilerplate? Step Example Reading data source Reading file from S3 in batches Deserialization Binary → Object Filtering Remove trace level logging Transformation Lowercase, redact, augment, join Serialization Object → Binary Transport Binary → Destination format → Destination Insight Counting, logging, monitoring, and alerting Error handling Retrying, exception handling, error reporting Extract Transform Load
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Standardizing Lambda functions • What if tomorrow we wanted to ingest logs from ALB, ELB, VPC... • How do we reuse existing code while adding features and minimizing spaghetti code and copypasta?
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Design considerations • CPU cycles • Memory footprint • Network I/O • Concurrency and thread pooling • Buffering and compressing • Retry logic • Graceful error handling • Metric reporting • Configuration
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Our solution github.com/nextdoor/bender
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. No more spaghetti and copypasta! Bender handles the complex plumbing and provides the interfaces necessary to build your own business logic for any step of the ETL process. Filter Deserialize Transform Wrap Serialize Transport Data Sources Data Sinks Kinesis Streams SNS TopicS3 Other ? S3 Kinesis Firehose Other ? Elasticsearch
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What ETL steps are supported? ETL Step Sources Kinesis, S3, and SNS Deserialization JSON and anything parseable by a RegEx Filtering RegEx and str contains Operations GeoIp lookup, multi-record splits, Elasticsearch Ingest Wrappers Basic, S3, and Kinesis Transporters Firehose, S3, Elasticsearch, Splunk, Sumologic, Scalyr, Generic HTTP Extract Transform Load
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Additional functionality Other Error handling Graceful retries and robust exception handling Reporting CloudWatch metrics and Datadog Configuration Config validation before deployment Documentation Auto generated documentation from code Useful
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Bender: Configuration
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Type of Lambda trigger this function is using
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch How to treat data coming from the source
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Filter out data
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch How to read data coming from the source
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Operations to perform on the data
  • 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch { "partition_key": "1234.456", "sequence_number": "12345", "source_arn": "us-east-1:1234:stream/logs.", "event_source": "aws:kinesis", "function_name": "log-pipeline...", "function_version": "97", "processing_time": 1507823274129, "arrival_time": 1507823273628, "processing_delay": 1199, "timestamp": 1507823272930, "payload": { // JSON data from original message } } Add a wrapper
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Write it back to JSON string or maybe something else?
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Send it off somewhere
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What does the configuration look like? handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream regex_patterns: [".*TRACE.*"] deserializer: type: GenericJson operations: - ... wrapper: type: KinesisWrapper serializer: type: Json transport: type: ElasticSearch index_time_format: yyyy-MM-dd reporters: - type: Cloudwatch Report metrics
  • 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Extending Bender
  • 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Operation to transform JSON If key exists replace with value if payload.has_key(key): payload[key] = new_value Something like this…
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What the config code will look like @JsonTypeName("JsonReplaceKVOperation") @JsonSchemaDescription("Replaces a given key with specified value.") public class JsonReplaceKVOperationConfig extends OperationConfig { @JsonSchemaDescription("Key name to find") @JsonProperty(required = true) private String key; @JsonSchemaDescription("New string value") @JsonProperty(required = true) private String value; @Override public Class<JsonReplaceKVOperationFactory> getFactoryClass() { return JsonReplaceKVOperationFactory.class; } // getters and setters }
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What the config file will look like handler: type: KinesisHandler sources: - name: Events source_regex: .*:stream/my-stream deserializer: type: GenericJson operations: - type: JsonReplaceKVOperation key: foo value: bar ...
  • 41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What the documentation will look like
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Operation code public class JsonReplaceKVOperation extends PayloadOperation { private String key; private String value; public JsonReplaceKVOperation(String key, String value) { this.key = key; this.value = value; } @Override protected void perform(JsonObject payload) { if (payload.has(this.key)) // Note: addProperty replaces the current value payload.addProperty(this.key, this.value); } }
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Code on GitHub github.com/nextdoor/bender
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Why did we choose AWS serverless offerings? Matt Wise · Sr. Systems Architect
  • 45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Kinesis for data ingestion • Operational overhead is near zero • Scaling streams up and down is a simple API call • Scaling up gives you more performance in every dimensions • Data retention of 24 hours to 7 days • Stream access is fast – delays measured in milliseconds • AWS provides tools… • Log collection agent • Stream Auto Scaling agent • KCL for in-app direct stream publishing
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Lambda for our execution layer • Scales automatically with Kinesis and your data • Manages Kinesis Triggers, Function Iterators all on its own • Provides plenty of metrics that are easy to alarm on • Blazing fast … backlogs are worked through as fast as your target can handle it • Built in easy to use management operations • Pausing and enabling triggers • Releasing by updating a function to the latest version • Rollbacks by just rolling back to the last known good version
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Firehose for data aggregation • Lambda+Kinesis Streams = Lots of small operations! • Kinesis -> Lambda -> S3 = Tons of tiny files • Bulk-uploading files to S3 leads to faster downstream analytics performance • Bender can batch multiple individual events up into a single Firehose Record, keeping the costs sane and maximizing performance! • Max-file-size and Max-event-age settings allow you to control your uploads and your data delay
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. S3 for short term and long term storage • Fast, cost effective, and stable • Lifecycle policies allow for automatic purging or migration of data • Hands-off operation • Fine grained access controls • And so on, and so on, and so on...
  • 49. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Final months of the old pipeline… Pipeline 80% full here! Data delays for days!
  • 50. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Well that looks better! Y-Axis here less than 10 minutes! Peak delay of less than 1 minute!
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Totally worth it! • Old pipeline was actually dropping events regularly - ~0.02% is millions and millions of eventsFlume Kinesis/Lambda/Firehose/S3 @ 1.5B events/day, was dropping 0.02% @ 3B events/day, all events make it Required significant architectural work to scale beyond ~2B events/day Built to scale on day one, all major scaling components are AWS-managed Near weekly outages that lasted for hours In 12 months, 1 outage that lasted more than an hour.. Automatically resolved by AWS Significant operational knowledge required for operation and scaling of the system Java knowledge required to add features – AWS operates the rest ~15-20% of a an engineers time to manage Literally almost zero management in months Combined ingest/storage/forwarding make changes hard and reduce stability Separated responsibilities are easier to scale and tweak, improve reliability
  • 52. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A real pipeline!
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Anatomy of a pipeline CloudTrail S3 Bucket SNS Topic Subscription ElasticSearch Domain
  • 54. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 55. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 56. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 57. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 58. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 59. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 60. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 61. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 62. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 63. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 64. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Office
  • 65. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Geo Filter
  • 66. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. S3 Bucket SNS Elasticsearch Users App Container Kinesis Application Endpoint Instance Logs Kinesis CloudTrail CloudFront ELB/ALB
  • 67. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Final thoughts... • The Systems Team @ Nextdoor has 3 people… • We manage hundreds of servers, and we automate everything • Hands-off auto-scaling is critical to keeping our team small and efficient • Lambda/Kinesis costs are a drop in the bucket compared to engineering productivity costs when things don’t work
  • 68. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. We’re hiring! nextdoor.com/jobs https://s3.amazonaws.com/bender-presentation-assets-reinvent-2017/bender-stack.yaml
  • 69. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Thank you!