Come learn how to build, launch, and scale a Big Data application in a serverless context. This is going to be an information-packed meetup on Big Data processing, AWS Lambda functions, AWS Step Functions, and everything that ties them together.
Big Data is something we're very passionate about. As the cost of servers has come down and much of the software stack has become free and open source, using data to drive your business has become attainable for a much larger group of companies. The serverless methodology arrived on the scene only recently, and it is proving to be just as transformational for Big Data analytics as the cloud has been. We will share some of what we have learned over the last two years of working with Big Data in a serverless context. We will cover one or two examples of event-driven Big Data processing and the impact it can have on your business, both in speed of analytics and in cost savings to the bottom line.
2. HOW TO BUILD A BIG DATA APPLICATION
We start off by building 3-tier applications
• Web Server
• Application Server
• Database
3. HOW TO BUILD A BIG DATA APPLICATION
We break the application into parts so that each can scale:
• Remove state from the application servers
• Shard Database
• Introduce caching
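A minimal sketch of two of these ideas, sharding and caching, assuming a fixed number of shards, plain dictionaries standing in for the database shards, and another dictionary standing in for the cache node. `ShardedStore` and `shard_for` are hypothetical names for illustration, not part of any library. Because all state lives in the store, the application servers calling it can stay stateless.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard with a stable hash (Python's built-in hash() of str is randomized per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical store: each shard is a dict standing in for one database instance."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]
        self.cache = {}    # stands in for a cache node
        self.db_reads = 0  # counts reads that reached a shard

    def put(self, key: str, value: dict) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value
        self.cache.pop(key, None)  # invalidate so the next read sees the new value

    def get(self, key: str):
        # Cache-aside: try the cache first, then fall back to the shard that owns the key.
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.shards[shard_for(key, len(self.shards))].get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = ShardedStore(num_shards=4)
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"), store.get("user:42"), store.db_reads)
```

The second read is served from the cache, so only one read reaches a shard. Note that simple modulo hashing moves most keys when the shard count changes; consistent hashing is the usual remedy.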
4. 3-TIER ARCHITECTURE V1
[Architecture diagram. Components: client, mobile client, Amazon Route 53, Internet Gateway, router, Elastic Load Balancing, Data Collection Instances, Application Server (Application Instances), MySQL DB instances, Data Analysis, Business Users.]
5. HOW TO BUILD A BIG DATA APPLICATION
To deal with data volume, we move to a NoSQL database:
• Wide-column / key-value database
• Fast reads, no joins
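Without joins, data is usually stored already combined in the shape it will be read. A minimal in-memory sketch, assuming a DynamoDB-style layout with a partition key and a sort key; the `Table` class and the key formats are hypothetical and are not the DynamoDB API.

```python
from collections import defaultdict


class Table:
    """Hypothetical key-value table: items grouped by partition key, read in sort-key order."""

    def __init__(self) -> None:
        self._partitions = defaultdict(dict)  # partition key -> {sort key: item}

    def put(self, pk: str, sk: str, item: dict) -> None:
        self._partitions[pk][sk] = item

    def query(self, pk: str, sk_prefix: str = "") -> list:
        """Read one partition in sort-key order: a single lookup, no join."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = Table()
# The customer profile and its orders share a partition key, so one query
# returns what a relational schema would assemble with a JOIN.
table.put("CUSTOMER#7", "PROFILE", {"name": "Ada", "city": "London"})
table.put("CUSTOMER#7", "ORDER#2023-01-05", {"total": 40})
table.put("CUSTOMER#7", "ORDER#2023-02-11", {"total": 25})

orders = table.query("CUSTOMER#7", sk_prefix="ORDER#")
print(orders)
```

The trade-off is that access patterns must be known up front: reads are fast because the data is pre-arranged, but a new query shape may require duplicating data under a new key.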
7. 3-TIER ARCHITECTURE V2
[Architecture diagram. Components: client, mobile client, Amazon Route 53, Internet Gateway, router, Elastic Load Balancing, Data Collection Instances, Application Server (Application Instances), Amazon DynamoDB, Cache Node, Data Analysis, Business Users.]
8. HOW TO BUILD A BIG DATA APPLICATION
But we still need SQL for some parts of our application:
• We add an Amazon Redshift data warehouse
  • Columnar database
  • Fast reads
  • SQL engine
  • Petabyte-scale data warehouse
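To illustrate why a columnar layout gives fast analytical reads, here is a toy comparison of row and column layouts in plain Python. It only models the idea; it is not how Redshift actually lays out blocks on disk, and the data is made up.

```python
# Row layout: each record is stored together (typical of OLTP databases such as MySQL).
rows = [
    {"event_id": 1, "user": "a", "country": "US", "revenue": 12.5},
    {"event_id": 2, "user": "b", "country": "DE", "revenue": 3.0},
    {"event_id": 3, "user": "a", "country": "US", "revenue": 7.5},
]

# Column layout: each column is stored contiguously (the idea behind columnar warehouses).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# SELECT SUM(revenue) needs only one column. In this model the row layout
# touches every field of every record, while the column layout reads one list.
values_scanned_row_layout = sum(len(row) for row in rows)
values_scanned_column_layout = len(columns["revenue"])
total = sum(columns["revenue"])

print(total, values_scanned_row_layout, values_scanned_column_layout)
```

With many columns and billions of rows, scanning only the columns a query references (and compressing each column, whose values share a type) is where most of the read speedup comes from.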
9. 3-TIER ARCHITECTURE V2
[Architecture diagram. Components: client, mobile client, Amazon Route 53, Internet Gateway, router, Elastic Load Balancing, Data Collection Instances, Application Server (Application Instances), Amazon DynamoDB, Cache Node, Amazon Redshift, Data Analysis, Business Users.]
10. HOW TO BUILD A BIG DATA APPLICATION
Let's push further toward removing as many “servers” as possible:
• How can we remove the data collection servers?
• How can we remove the application servers?
• How can we improve throughput?
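One way to remove the data collection servers is to put Amazon API Gateway in front of an AWS Lambda function that accepts each event. A minimal sketch of such a handler, assuming the API Gateway Lambda proxy integration (the request body arrives as a string in `event["body"]`, and the response is a dict with `statusCode` and `body`). `store_event` is a hypothetical stand-in for wherever events are written (for example a stream or a DynamoDB table); here it appends to a list so the handler can run locally.

```python
import json
import time
import uuid

COLLECTED = []  # stand-in sink; a real deployment would write to a stream or table


def store_event(record: dict) -> None:
    """Hypothetical sink for collected events."""
    COLLECTED.append(record)


def handler(event, context):
    """Lambda entry point for an API Gateway proxy-integration request."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "body must be JSON"})}

    if not isinstance(payload, dict) or "type" not in payload:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'type'"})}

    record = {
        "event_id": str(uuid.uuid4()),
        "received_at": int(time.time()),
        "type": payload["type"],
        "data": payload.get("data", {}),
    }
    store_event(record)
    # 202 Accepted: the event is stored for later processing.
    return {"statusCode": 202, "body": json.dumps({"event_id": record["event_id"]})}


# Local invocation with a fake API Gateway event (the context argument is unused here).
response = handler({"body": json.dumps({"type": "page_view", "data": {"page": "/home"}})}, None)
print(response["statusCode"], len(COLLECTED))
```

Because the function only runs when an event arrives, there is no fleet of idle collection instances; capacity follows the request rate, within the account's concurrency limits.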
12. 3-TIER SERVERLESS
[Architecture diagram. Components: client, mobile client, Eventful Data Collection, Amazon API Gateway, AWS Lambda, Amazon Route 53, Internet Gateway, router, Elastic Load Balancing, Application Server (Application Instances), Amazon DynamoDB, Cache Node, Amazon Redshift, Amazon Machine Learning, Data Analysis, Business Users.]
15. HOW TO BUILD A BIG DATA APPLICATION
Removing your application servers is not so easy:
• You run your application server as a monolith
• You have a proven build/deployment process
• You understand how to debug your application locally
18. MICROSERVICES
• It’s not as scary as you think
• Most applications are small (compared to Netflix)
• There are many frameworks to manage your application
  • Serverless Framework (multi-cloud, multi-language)
  • AWS SAM (AWS-specific, multi-language)
19. MICROSERVICES
This architecture requires different tooling and a different mindset:
• More difficult to run offline
• More difficult to debug
• More difficult to monitor
• Many more places where it can and will break
20. MICROSERVICES
With all these problems, why would I want to move to microservices?
• There are many presentations on why
• Microservices at Netflix scale (great video)
23. MOVING YOUR APPLICATION TO SERVERLESS
The steps we took to move our monolithic Django/Java/NodeJS apps to serverless:
• Separate presentation logic from business logic
• Understand the execution-time requirements of each function
• Understand the input and output payload of each function
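A minimal sketch of what these steps can look like after the split, assuming a Lambda function behind API Gateway's proxy integration. The business logic is a plain function with an explicit input and output payload, so it can be unit-tested locally and shared as a library; the handler only translates HTTP to and from that payload. `monthly_revenue` and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RevenueRequest:
    """Explicit input payload for the business function."""
    month: str
    orders: list  # list of {"month": str, "amount": float}


@dataclass
class RevenueResult:
    """Explicit output payload."""
    month: str
    total: float
    count: int


def monthly_revenue(req: RevenueRequest) -> RevenueResult:
    """Business logic: no HTTP, no templates, no framework imports."""
    matching = [o["amount"] for o in req.orders if o["month"] == req.month]
    return RevenueResult(month=req.month, total=sum(matching), count=len(matching))


def handler(event, context):
    """Thin adapter: parse the HTTP body, call the business function, serialize the result."""
    body = json.loads(event.get("body") or "{}")
    try:
        req = RevenueRequest(month=body["month"], orders=body.get("orders", []))
    except KeyError:
        return {"statusCode": 400, "body": json.dumps({"error": "missing 'month'"})}
    return {"statusCode": 200, "body": json.dumps(asdict(monthly_revenue(req)))}


event = {"body": json.dumps({"month": "2018-03", "orders": [
    {"month": "2018-03", "amount": 10.0},
    {"month": "2018-04", "amount": 5.0},
    {"month": "2018-03", "amount": 2.5},
]})}
response = handler(event, None)
print(response)
```

Writing the payloads down as types also makes it easier to estimate each function's execution time and payload size against the platform limits.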
24. MOVING YOUR APPLICATION TO SERVERLESS
What technologies will you use?
• Serverless Framework
  • Similar to AWS SAM
  • Multi-language (Python, Go, C#, JavaScript, Java)
  • Multi-platform (AWS, GCP, Microsoft Azure, IBM Cloud)
25. MOVING YOUR APPLICATION TO SERVERLESS
Understanding the limitations and constraints:
• AWS Lambda
  • 5-minute maximum execution time (raised to 15 minutes in late 2018)
  • 50 MB (zipped) deployment packages
• API Gateway
  • 30-second timeout (29 seconds for REST API integrations)
  • 10 MB payload limit
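These limits can be turned into a simple routing decision for each endpoint. A minimal sketch, assuming the figures on this slide plus a hypothetical 80% safety margin; `choose_path`, `fits_in_response`, and the labels `"sync"`, `"ticket"`, and `"workflow"` are illustrative names, not part of any AWS SDK.

```python
API_GATEWAY_TIMEOUT_S = 30
API_GATEWAY_MAX_PAYLOAD_BYTES = 10 * 1024 * 1024
LAMBDA_MAX_RUNTIME_S = 5 * 60  # limit at the time of the talk; now configurable up to 15 minutes
SAFETY_MARGIN = 0.8            # hypothetical: plan to use at most 80% of a limit


def choose_path(expected_runtime_s: float) -> str:
    """Decide how a request should be served, given its expected runtime."""
    if expected_runtime_s > LAMBDA_MAX_RUNTIME_S * SAFETY_MARGIN:
        # Too long for one Lambda invocation: split the work into a multi-step workflow.
        return "workflow"
    if expected_runtime_s > API_GATEWAY_TIMEOUT_S * SAFETY_MARGIN:
        # Fits in Lambda but not in one API Gateway request: return a ticket and poll.
        return "ticket"
    return "sync"


def fits_in_response(payload_bytes: int) -> bool:
    """True if a result can be returned directly through API Gateway."""
    return payload_bytes <= API_GATEWAY_MAX_PAYLOAD_BYTES


print([choose_path(t) for t in (2, 25, 60, 600)], fits_in_response(11 * 1024 * 1024))
```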
26. MOVING YOUR APPLICATION TO SERVERLESS
Breaking down the frontend of our monolithic application:
• Separate view logic from business logic
• Turn the business logic into libraries so it is easy to share
• Translate view logic and templates into ReactJS components
• Create an API Gateway endpoint for each view
27. MOVING YOUR APPLICATION TO SERVERLESS
Wait, how do we manage 30-50 endpoints?
• That's where the Serverless Framework comes into play
• Define each endpoint using YAML
• Define the API Gateway connection for each Lambda function
• Define access to any other resources the functions need
28. MOVING YOUR APPLICATION TO SERVERLESS
But my functions take longer than 5 minutes to execute.
• Even functions that finish within 5 minutes have a problem.
• API Gateway has a 30-second timeout.
• So any function behind an API Gateway endpoint has to respond within 30 seconds.
• That will not work for my application. (It did not work for ours either.)
29. MOVING YOUR APPLICATION TO SERVERLESS
How do we get past the 30-second timeout from API Gateway?
• Create a ticketing system.
• A ReactJS component requests data from the backend, and the backend returns a ticket ID.
• The ReactJS application polls the backend for results every (x) seconds.
• The ticketing system checks the job status and tells the frontend when the job has completed and the data is ready.
• The ReactJS application then requests the data for its ticket ID.
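The ticket flow above can be sketched in a single process. This is a model under stated assumptions: a thread stands in for the asynchronously launched Lambda function, and an in-memory dict stands in for the shared status store (in a real deployment it must be something every function can reach, such as a DynamoDB table). `TicketService`, `submit`, `status`, and `result` are hypothetical names.

```python
import threading
import time
import uuid


class TicketService:
    """Hypothetical ticketing backend: submit() returns a ticket ID immediately."""

    def __init__(self) -> None:
        self._jobs = {}  # ticket_id -> {"status": ..., "result": ...}
        self._lock = threading.Lock()

    def submit(self, work, *args) -> str:
        ticket_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[ticket_id] = {"status": "PENDING", "result": None}
        # Stand-in for an asynchronous Lambda invocation.
        threading.Thread(target=self._run, args=(ticket_id, work, args), daemon=True).start()
        return ticket_id

    def _run(self, ticket_id, work, args) -> None:
        try:
            outcome = {"status": "DONE", "result": work(*args)}
        except Exception as exc:  # record failures so the poller can stop
            outcome = {"status": "FAILED", "result": str(exc)}
        with self._lock:
            self._jobs[ticket_id] = outcome

    def status(self, ticket_id: str) -> str:
        with self._lock:
            return self._jobs[ticket_id]["status"]

    def result(self, ticket_id: str):
        with self._lock:
            job = self._jobs[ticket_id]
        if job["status"] != "DONE":
            raise RuntimeError(f"ticket {ticket_id} is {job['status']}")
        return job["result"]


def slow_report(n: int) -> int:
    time.sleep(0.2)  # simulates long-running processing (kept short here)
    return sum(range(n))


service = TicketService()
ticket = service.submit(slow_report, 1000)
while service.status(ticket) == "PENDING":  # the frontend's polling loop
    time.sleep(0.05)
print(service.status(ticket), service.result(ticket))
```

Each HTTP call in this flow (submit, status, result) returns quickly, so none of them runs into the API Gateway timeout, no matter how long the job itself takes.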
30. MOVING YOUR APPLICATION TO SERVERLESS
How do I do all of that with ReactJS?
• We won't go in-depth on ReactJS (that's a whole talk by itself).
• With ReactJS, we use Redux for data-flow coordination.
• We use Redux Saga for side effects (Ajax calls to the server).
• Redux Saga lets you create a kind of daemon process for fetching data from the backend.
• We create a long-running daemon sequence that polls for the job status and fetches the data once processing is complete.
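A real saga is JavaScript; to keep one language across these examples, here is the same poll-then-fetch loop as a Python sketch, not Redux Saga code. It assumes two hypothetical callables, `check_status(ticket_id)` and `fetch_result(ticket_id)`, standing in for the Ajax calls, and it adds a timeout and exponential backoff (not mentioned on the slide) so the loop cannot spin forever.

```python
import time


def poll_and_fetch(ticket_id, check_status, fetch_result,
                   interval_s=1.0, max_interval_s=10.0, timeout_s=300.0, sleep=time.sleep):
    """Poll until the job finishes, then fetch its data (mirrors a saga's poll loop)."""
    waited = 0.0
    while waited < timeout_s:
        status = check_status(ticket_id)
        if status == "DONE":
            return fetch_result(ticket_id)
        if status == "FAILED":
            raise RuntimeError(f"job {ticket_id} failed")
        sleep(interval_s)
        waited += interval_s
        interval_s = min(interval_s * 2, max_interval_s)  # exponential backoff
    raise TimeoutError(f"job {ticket_id} not ready after {timeout_s} s")


# Fake backend: two PENDING responses, then DONE. Sleeps are recorded instead of taken.
statuses = iter(["PENDING", "PENDING", "DONE"])
sleeps = []
data = poll_and_fetch("t-1", lambda t: next(statuses), lambda t: {"rows": 3}, sleep=sleeps.append)
print(data, sleeps)
```

Injecting `sleep` keeps the loop testable without real waiting; in a saga the equivalent is yielding a `delay` effect between polls.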
31. MOVING YOUR APPLICATION TO SERVERLESS
Wait, I thought you needed NodeJS to run ReactJS?
• ReactJS is just JavaScript and can be served from a CDN.
• You compile your ReactJS application (NodeJS is needed only for this build step) into a JavaScript bundle that's loaded from a static HTML page.
• Once ReactJS is loaded, it runs in the user's browser.
32. MOVING YOUR APPLICATION TO SERVERLESS
What about this ticketing system?
• This is another place where Lambda comes in handy.
• You can asynchronously launch a Lambda function per request.
• If you need a complex workflow, you can use AWS Step Functions to coordinate your Lambda functions.
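A sketch of both ideas, assuming boto3-style clients are passed in (in AWS you would create them with `boto3.client("lambda")` and `boto3.client("stepfunctions")`), so the code also runs locally against fakes. `invoke` with `InvocationType="Event"` is the Lambda API's asynchronous mode, and `start_execution` starts a Step Functions state machine. The function name, state machine ARN, state names, and placeholder resource ARNs are hypothetical. The workflow is written in Amazon States Language; `json.dumps(WORKFLOW)` is the form a state machine definition takes.

```python
import json

# Amazon States Language definition for a hypothetical three-step job (placeholder ARNs).
WORKFLOW = {
    "Comment": "Hypothetical ticketed report job",
    "StartAt": "Extract",
    "States": {
        "Extract": {"Type": "Task", "Next": "Aggregate",
                    "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:extract"},
        "Aggregate": {"Type": "Task", "Next": "MarkDone",
                      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:aggregate"},
        "MarkDone": {"Type": "Task", "End": True,
                     "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:mark-done"},
    },
}


def check_chain(definition: dict) -> list:
    """Follow Next links from StartAt in a simple linear workflow; raise on a loop."""
    order, name = [], definition["StartAt"]
    while True:
        if name in order:
            raise ValueError(f"loop at state {name}")
        order.append(name)
        state = definition["States"][name]  # KeyError if a Next points nowhere
        if state.get("End"):
            return order
        name = state["Next"]


def launch_async(lambda_client, function_name: str, ticket_id: str, params: dict) -> int:
    """Fire-and-forget invocation: Lambda queues the event and returns 202 right away."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps({"ticket_id": ticket_id, "params": params}).encode("utf-8"),
    )
    return response["StatusCode"]


def launch_workflow(sfn_client, state_machine_arn: str, ticket_id: str, params: dict) -> str:
    """Start a Step Functions execution named after the ticket."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=ticket_id,
        input=json.dumps({"ticket_id": ticket_id, "params": params}),
    )
    return response["executionArn"]


class FakeLambda:
    """Local stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)
        return {"StatusCode": 202}


class FakeStepFunctions:
    def start_execution(self, **kwargs):
        return {"executionArn": "arn:fake:" + kwargs["name"]}


fake = FakeLambda()
status = launch_async(fake, "report-worker", "t-123", {"month": "2018-03"})
arn = launch_workflow(FakeStepFunctions(), "arn:fake:stateMachine", "t-123", {})
print(status, check_chain(WORKFLOW), arn)
```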
33. MOVING YOUR APPLICATION TO SERVERLESS
Put all that together and you get something pretty interesting:
• No longer paying for idle time.
• Better handling of spikes in your system.
• Ability to scale dynamically.
• Ability to start small and grow.
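To make "no longer paying for idle time" concrete, here is a back-of-the-envelope comparison. The Lambda prices are assumptions based on AWS's published us-east-1 x86 list prices ($0.20 per million requests, $0.0000166667 per GB-second) and ignore the free tier; the hourly server price and the 730-hour month are hypothetical placeholders. Check current pricing before relying on any of these numbers.

```python
PRICE_PER_MILLION_REQUESTS = 0.20   # assumption: check current AWS Lambda pricing
PRICE_PER_GB_SECOND = 0.0000166667  # assumption: x86, us-east-1, free tier ignored
HOURS_PER_MONTH = 730               # assumption: average month length


def lambda_monthly_cost(requests: int, memory_mb: int, avg_duration_ms: float) -> float:
    """Pay per request plus compute time (GB-seconds); idle time costs nothing."""
    gb_seconds = requests * (memory_mb / 1024) * (avg_duration_ms / 1000)
    return requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS + gb_seconds * PRICE_PER_GB_SECOND


def server_monthly_cost(hourly_price: float, instances: int) -> float:
    """Always-on servers are billed for every hour, busy or idle."""
    return hourly_price * HOURS_PER_MONTH * instances


serverless = lambda_monthly_cost(requests=1_000_000, memory_mb=512, avg_duration_ms=200)
servers = server_monthly_cost(hourly_price=0.10, instances=2)  # hypothetical instance price
print(f"Lambda: ${serverless:.2f}/month, servers: ${servers:.2f}/month")
```

At 1 million 200 ms requests the function uses 100,000 GB-seconds. The comparison can flip for sustained, high, steady load, which is why the benefit is framed around idle time and spikes rather than as a blanket saving.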
36. MOVING YOUR APPLICATION TO SERVERLESS
It's not all good:
• Need for more monitoring.
• Need for more alerting.
• Need for better logging.
• Need for better error handling.
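A small step toward better logging and error handling: a decorator that wraps a Lambda handler, emits one JSON log line per invocation (structured fields that log tools such as CloudWatch Logs Insights can filter on, and that an alert can be built around), and turns unexpected exceptions into a logged stack trace and a 500 response. The decorator name `instrumented` and the log field names are hypothetical; `aws_request_id` is the request ID attribute on the Python Lambda context object.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)


def instrumented(handler):
    """Wrap a Lambda handler with structured logging, timing, and a catch-all error path."""

    @functools.wraps(handler)
    def wrapper(event, context):
        request_id = getattr(context, "aws_request_id", "local")
        start = time.perf_counter()
        try:
            response = handler(event, context)
            outcome = "ok"
        except Exception:
            # Log the stack trace, but do not leak internals to the caller.
            logger.exception(json.dumps({"request_id": request_id, "outcome": "error"}))
            response = {"statusCode": 500, "body": json.dumps({"error": "internal error"})}
            outcome = "error"
        logger.info(json.dumps({
            "request_id": request_id,
            "handler": handler.__name__,
            "outcome": outcome,
            "status": response.get("statusCode"),
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
        }))
        return response

    return wrapper


@instrumented
def get_report(event, context):
    if event.get("fail"):
        raise ValueError("simulated bug")
    return {"statusCode": 200, "body": json.dumps({"rows": 3})}


class FakeContext:
    aws_request_id = "req-123"


logging.basicConfig(level=logging.INFO, format="%(message)s")  # local only; Lambda configures logging itself
ok = get_report({}, FakeContext())
failed = get_report({"fail": True}, FakeContext())
print(ok["statusCode"], failed["statusCode"])
```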