My presentation at QConNY 2017 about the Internet of Things and Edge Compute architecture / strategy at Chick-fil-A. I discuss using a cloud-native approach to computing at the Edge, and discuss the services that are part of our architecture to enable data collection and control of "things" in our restaurants.
5. Principles: Security
TODO – some sort of intro to IOT
design principles / considerations
slide
Maybe just some pictures over a few
slides that tell the story
Secure
Credit: https://www.glassdoor.com/Photos/AMG-National-Trust-Bank-Office-Photos-IMG491177.htm
Secure
Credit: Brook Ward / https://creativecommons.org/licenses/by-nc/2.0/
Secure
10. Let’s create a new product…
Requirements
• Should be amazing!
• Produced with a new machine we’ll develop
• Should be able to collect data from our machine
• Should be able to command our machine to cook what we want on demand
15. Registration & AuthN/AuthZ
• Dynamic Client Registration for OAuth Clients
• Authorization – Human authorization
• Auth Code Flow / Device Code Flow
• Stateless Tokens – JWT
• No degradation when WAN offline
• Software Development Kit (SDK) to make it easy
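The "Stateless Tokens – JWT" and "No degradation when WAN offline" bullets fit together: a service at the edge can check a token's signature and expiry locally, without calling the auth server. A minimal sketch of that idea using only the Python standard library, assuming an HS256 shared key for brevity (the talk does not say which signing algorithm is used, and a production setup would more likely use asymmetric keys); `sign_jwt` and `verify_jwt` are hypothetical helper names:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Stand-in for the auth server: issue an HS256-signed token."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}"
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_jwt(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Validate signature and expiry locally; no network call is needed."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

Because everything needed to validate the token travels inside it, the check keeps working while the WAN link is down; the trade-off is that a token cannot be revoked before it expires unless you add a separate mechanism.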
16. Security: Demo
What happens with a new device?
1. Connect (Wi-Fi in our case)
2. Discover endpoints via .well-known
3. Register with Auth Server
4. Request authorization as Johnny 5
5. Approve the request (SSO / MFA)
6. Return a JWT
7. Switch Wi-Fi Networks
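Steps 2 to 6 can be sketched as a sequence of standard OAuth requests. This is a rough outline only, assuming the device uses the Device Code Flow and that the auth server publishes standard authorization server metadata with no path component in its issuer URL; the `Request` type, the function names, and the example URLs are illustrative, not the actual Chick-fil-A SDK. The sketch only builds the requests, it does not send them:

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode

# Grant type identifier defined for the OAuth 2.0 device authorization grant.
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    body: str


def discovery_url(issuer: str) -> str:
    # Step 2: well-known location of the authorization server metadata.
    return issuer.rstrip("/") + "/.well-known/oauth-authorization-server"


def registration_request(metadata: dict, device_name: str) -> Request:
    # Step 3: dynamic client registration (JSON body).
    body = json.dumps({"client_name": device_name, "grant_types": [DEVICE_GRANT]})
    return Request("POST", metadata["registration_endpoint"], body)


def device_authorization_request(metadata: dict, client_id: str, scope: str) -> Request:
    # Step 4: ask for a device code and a user code for a human to approve.
    body = urlencode({"client_id": client_id, "scope": scope})
    return Request("POST", metadata["device_authorization_endpoint"], body)


def token_poll_request(metadata: dict, client_id: str, device_code: str) -> Request:
    # Steps 5-6: poll until the human approves, then receive the token.
    body = urlencode(
        {"grant_type": DEVICE_GRANT, "device_code": device_code, "client_id": client_id}
    )
    return Request("POST", metadata["token_endpoint"], body)
```

The device never holds a powerful credential up front: everything it needs comes from the discovery document, and it gets nothing useful until a person approves the request.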
18. Security Recommendations
1. Don’t hardcode permanent, powerful credentials at manufacture time, and then never change them
2. Require human authorization for devices whenever possible
3. Monitor device traffic profiles to ensure they are behaving normally
4. Don’t allow inbound connectivity if possible
4. Don’t allow inbound connectivity if possible
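As one illustration of recommendation 3, a very simple traffic-profile check compares a device's current message count against its own recent baseline. This is a sketch under stated assumptions, not the monitoring used at Chick-fil-A: the per-interval counts, the function name `is_anomalous`, and the threshold of three standard deviations are all illustrative choices.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(baseline_counts: Sequence[int], observed: int, threshold: float = 3.0) -> bool:
    """Flag an interval whose message count is far from the device's baseline.

    baseline_counts: messages per interval seen during normal operation.
    observed: messages seen in the interval being checked.
    """
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        # A perfectly steady baseline: any deviation at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

A device that normally sends about ten messages a minute and suddenly sends a hundred would be flagged for a closer look.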
23. What if we lose connectivity?
What if the network is too slow?
24. Edge Architecture
Why Edge Compute?
• Support critical businesses when network is down
• Reduce latency for “thing” interactions
• Data aggregation before shipping to cloud
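To make the "data aggregation before shipping to cloud" point concrete, here is a small sketch of an edge-side roll-up: many raw readings are reduced to one summary per device before anything crosses the WAN. The reading format (device id plus a numeric value) and the function name `summarize` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize(readings: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Collapse raw (device_id, value) readings into one summary per device."""
    by_device: Dict[str, List[float]] = defaultdict(list)
    for device_id, value in readings:
        by_device[device_id].append(value)
    return {
        device_id: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for device_id, values in by_device.items()
    }
```

Shipping one summary per device per interval instead of every raw reading reduces WAN traffic, at the cost of losing per-reading detail in the cloud.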
33. How do I build an application to control my device?
34. Edge Applications
• Run in Docker containers
• On-board as a software “thing”
• Interact with local and cloud services
• Short-lived vs Long-lived
• Service Limits
35. CI/CD for IOT
Pipeline stages (diagram labels): Commit, Build, Virtual Edge, Validate, Release Candidate, Deploy, Integration Tests
36. Edge Applications: Putting it together
[Diagram: Johnny 5 and the Controller App at the Edge, a Cloud Controller in the Cloud, connected through MQTT. Message flows: Cook, State, Get Data, Pub State, Subscribe]
41. Key Takeaways
Connecting things creates the opportunity to orchestrate interactions between devices and people
• Think ecosystem: secure, open, scalable
• Cloud First, but if you need Edge, design it like a micro-cloud
• Ensure that you have a strong security story
42. What’s Next for Chick-fil-A?
• Analytics and Machine Learning on IoT Data
• Machine Learning at the Edge
• Considering providing local queueing for Edge apps
• Re-evaluating persistence
• Support for short-lived apps
43. Where to find me
www.linkedin.com/in/brian-chambers
@brianchambers21
http://brianchambers.blog
Editor's Notes
Intro to ME / Connect personally
Maybe make it personal
Going to be friends and co-workers here for the next 50 mins
Tell a few things I do outside of work so people know something about me
What I do at CFA as EA
Intro to CFA
make sure to talk about the scale and the scope of what we’re doing, and make sure people understand what CFA is (distributed nature and scale)
2000 restaurants across the US and Canada
Fast growth
Culture of innovation – what are the stories that tell about who we are as a brand?
(No longer applicable) Key Takeaway: How you can take MQTT, OAuth, and Docker Swarm to build a scalable IOT solution
Intro to CFA / Me
Agenda overview
Dive into each of the key topics and do a quick demo
Go into the CFA architecture overview
Time for QA at the end
What are “things” in the IOT world?
Things can mean a lot of different…. Things…
Mobile devices and wearables
Consumer electronics and assets that are connected (long-lived like ovens and refrigerators)
Could be big assets like cars… or industrial machines… engines, etc. (look at GE).
In the CFA world here’s the way we think of things.
They can be hardware things…
Kitchen equipment like we mentioned
Software things
Often many software things on a single hardware thing
IOT is simply taking things and connecting them
Why?
To collect data
To create interactions between people and things that are meaningful / create amazing experiences
So why is a company like CFA working on IOT?
Do we even have technology? We’re just fast food right? Fast food companies don’t do interesting engineering…
Not so much…
Why???
Capacity
Quality of products
Equipment usage and health
Food safety
Customer experiences
Automation
We see technology as critical to the future of our customer experiences, our ability to scale, and our ability to keep our foothold as a leader in the industry with quality food and great customer service experiences for everyone who comes.
Principles
Secure
Open
Scalable
Why was secure important to us? Tell a story about security.. Maybe the nanny cam one from a ways back
Security
Lot of headlines about security issues with IOT devices
Firmware often not updated
Ability to access other services is not well thought out
Design goal to ensure that we have a very secure solution
Important to have layers – connectivity, service authentication, granular permissions
Idea of where inertia takes you
Lack of standards in IOT today
Credit: Brook Ward / https://creativecommons.org/licenses/by-nc/2.0/
Open
Minimal hurdles to engage
Easy rules of engagement to follow
SDK for vendors that want to participate
Platform / Ecosystem
Mention the digital ecosystem we are building at CFA
Credit - https://www.inc.com/14-tips-for-jumping-entrepreneurships-hurdles.html
Scalable
Where inertia takes you is not good
A lot of IOT vendors see themselves as the center of the universe
Everyone provides a gateway
Everyone provides connectivity
Everyone provides a portal with analytics
There is no interoperability
No ability to build bigger things that are composites of different solutions
Costs get out of hand
Complexity gets out of hand
Tons of single points of failure
Difficult to support and manage
No interoperability
Credit - http://www.content4demand.com/blog/better-approach-building-modular-content/
Quick look at the end state architecture we use. We’re going to tear it down and build back up as we go.
Overview of the structure
We have services in the cloud that support the overall solution.
We have the Edge
We have a layer for connectivity
And we have things.
So the first question to answer
So today we are going to learn about Chick-fil-A’s approach to IOT but we are also going to do something that I think is a first at this conference…
We are going to create a new product together and figure out how to make the architecture work to support it.
Sound good?
Let’s just say that, hypothetically, I wanted to be able to automate the creation of my new product idea, the IOT sandwich…
We have an opportunity together today to create a brand new product. The IOT sandwich. You might shake your head… But trust me… it’s going to be amazing.
We’ll talk through the thought process and the architecture to be able to do all the things we need to do to be able to bring the device online, get connectivity, collect data from it, automate the preparation, and more.
Now that we have our IOT sandwich machine ready to go… Super high tech.
Let’s get it to a point where it can securely send out data about what it’s doing, because that’s useful data for us to collect and use for future decisions.
We’ll need to make sure that all of this is done VERY SECURELY.
Let’s take a look at what we should think about when it comes to security for things.
Alright, we have a product that we’ve developed that looks amazing
And we have a state of the art machine ready to be able to make it
So if Johnny is going to be an IOT device, how do we get him connected? Let’s take a look at that. This is where security really comes into play…
Let’s talk for a second about what we need to think about when it comes to onboarding devices
There are a lot of different approaches…
We could put crypto material on at manufacture time, but unfortunately that doesn’t work for our case… and it creates some management challenges as well
So let’s talk about the approach we used to keep it simple for device manufacturers, and what our new IOT Sandwich Bot will use.
Network credentials – don’t want to have real credentials or certs for networks hardcoded at manufacture time. Sometimes that’s not possible. Talk about our approach. We have certain credential profiles at manufacture time, but that really doesn’t let you do anything other than register yourself. You still have to be authorized on install by a person with credentials that have access to authorize that particular type of device. And that profile varies… so it’s pretty abstract and difficult to figure out
TLS – really goes without saying that everything needs to be TLS these days… both internal and external just in case. We use our own certificate authority for SSL at the edge.
Device Registration – how does a device show up in a restaurant and get connected and get permissions – more on this in a second
Authentication and Authorization – more on that as well
Brokered Communications – devices don’t get to talk direct to each other. They send messages via the MQTT broker. Single point of authentication. In fact, we actually don’t allow peer to peer communications at the network level, so if you’re a device, all you have to talk to is our edge, and perhaps internet services as well.
Use Industry standards instead of inventing our own approach
No degradation when network is offline
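With brokered communications, the broker becomes the single place where "who may publish or subscribe to what" is decided. A sketch of how such a check could look, using the standard MQTT topic-filter wildcards (`+` for one level, `#` for all remaining levels); the topic layout, the ACL structure, and the function names are hypothetical, and the special handling of topics starting with `$` is left out for brevity:

```python
from typing import Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of a filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


def is_allowed(acl: Dict[str, List[str]], client_id: str, topic: str) -> bool:
    """Check a client's topic against the filters granted to it (deny by default)."""
    return any(topic_matches(f, topic) for f in acl.get(client_id, []))
```

For example, granting a device only `restaurants/0042/johnny5/#` would let it publish its own state while keeping it out of every other device's topics.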
Demo time – 5-7 mins
So let’s do it. Let’s get our Johnny 5 IOT Sandwich maker connected.
My laptop will be Johnny and we will interact with the Auth server in the cloud to get Johnny on-boarded and ready to start cooking our sandwich.
So far, this is what we have seen from the architecture perspective
We can onboard a “thing” or device into the CFA ecosystem
We have connectivity as part of that process
We can get a token from the auth server
Why aren’t we talking about power management and constrained devices?
Most of the use cases we have solved so far don’t have power as constraints.
We do support Bluetooth but have not seen a lot of use cases so far.
The way we’ve solved that just requires a separate piece that we call an “advocate”. It has special permissions. If you’re interested come find me and I can tell you more afterwards.
Much better to use refreshable tokens. With network credentials, distribute via some authenticated service. Supports being able to change the credentials in the future if needed.
Human authorization – ensures a person makes a decision, and adds more layers that have to be compromised. In our case we use SSO and MFA so you really need to be the intended person to onboard a device.
Great, so we’re connected and we have a token…
Our device can be manually controlled, or be “app controlled / autonomous”
For now, let’s focus on a manually controlled device.
So what can we do with it?
First, we might want to be able to collect data from the machine so we understand how it’s being used, how often, if it’s being cleaned correctly, if it’s throwing error codes, or if it has a strange pattern of behavior for some reason. So let’s do that.
IN SHORT…
We basically need a way to send and receive messages. A messaging service.
There are a number of options, but one of the market leaders is called MQTT.
We provide a messaging service for our IOT ecosystem to solve this.
Why is it a good protocol to use for us?
We run it both at the edge and in the cloud.
We use it at the edge to broker interactions – relatively high volumes of messages.
We use it in the cloud to bridge back down and send messages to the edge, relatively low volume.
https://github.com/mqtt/mqtt.github.io/wiki/Design-Principles
Explain the kinds of messages we might want to send out from a device
When it starts cooking
When it finishes
When someone presses a button
When it cleans
The fact that it came online
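The kinds of messages listed above could be represented as small JSON events published to per-device topics. The sketch below only builds the topic and payload; the topic layout, field names, and event type names are illustrative assumptions rather than the actual Chick-fil-A message format, and publishing would be done with whatever MQTT client the device uses.

```python
import json
from typing import Optional, Tuple

# Event types mirroring the list above (names are hypothetical).
EVENT_TYPES = {"online", "cook_started", "cook_finished", "button_pressed", "cleaned"}


def build_event(
    restaurant_id: str,
    device_id: str,
    event_type: str,
    timestamp: float,
    detail: Optional[dict] = None,
) -> Tuple[str, str]:
    """Return the (topic, JSON payload) pair for one device event."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    topic = f"restaurants/{restaurant_id}/{device_id}/events/{event_type}"
    payload = json.dumps(
        {
            "device": device_id,
            "type": event_type,
            "ts": timestamp,
            "detail": detail or {},
        },
        sort_keys=True,
    )
    return topic, payload
```

Putting the event type in the topic lets subscribers pick out just the events they care about with a topic filter, instead of parsing every payload.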
Putting it together so far,
here’s where we are.
All good, but what if we lose connectivity?
We won’t be able to collect any data for a while which might be okay
But it might not be good enough for us to make our sandwich. We probably want to be able to do that whether the network is up or down… so how are we going to solve that problem?
Our goal is to make the IOT sandwich anytime we want
This is where Edge compute comes into play.
What is edge compute? Why do we have it? What does it actually look like for us (3-5 devices, commodity hardware, 8GB RAM, SSD storage)
How do we think about Edge?
Edge Design is like cloud design
More than just hardware, it’s really a software ecosystem that we’ve developed to enable our business.
In a sense, Edge computing is really just cloud thinking at a micro scale.
We have sometimes referred to it as a micro-private-cloud or micro-datacenter.
Quick narrative
I believe CLOUD THINKING is relevant and directly applicable to EDGE THINKING
Has been a really interesting dynamic to live in. Cloud on one side with unlimited resources.
Edge on the other, highly constrained.
We were able to take cloud concepts and bring them into play at the Edge, but still have to manage our limited resource capacity. This causes us to implement some interesting patterns when it comes to Applications that run at the edge.
We have tried to apply what we’ve learned from cloud to the edge from a design principle perspective.
Build reusable, scalable platforms that have reasonable, well-documented service levels and limitations
Talk about what services we actually provide at the edge (next slide)
Deeper dive into the Edge and what we do there
How it talks out to the cloud for services
What kinds of apps we run there
What we will do at the edge in the future
Devices vs Edge apps… what they need
Devices need auth and messaging
Apps need HTTP server and persistence store
Need event collection
What is it?
Why do we use it?
Same reason you would want to use containers in the cloud basically. The design principles hold.
Isolation of apps
Self-healing architecture. When one of our edge devices dies or a service dies, the Swarm will ensure another instance is started back up
Edge tools are used to interact with the swarm remotely to handle edge cases (heh) where we have swarm failures and need to rebuild remotely, or issue other kinds of commands to the edge. Actually refining our toolset now. Came
Explain a little more how this works when services are offline
Devices almost always need only the edge to do their jobs
They rarely depend on cloud
Edge apps have some dependency on cloud and have graceful degradation when WAN is down
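One common way for an edge app to degrade gracefully when the WAN is down is store-and-forward: keep working locally, buffer what was meant for the cloud, and flush once the link returns. The sketch below shows that pattern under stated assumptions; it is not a description of how Chick-fil-A's edge apps behave. The `send` callable is a stand-in for any cloud publish function and is assumed to raise `ConnectionError` while the link is down.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffer outbound messages while the cloud link is down."""

    def __init__(self, send: Callable[[str], None], max_buffered: int = 1000) -> None:
        self._send = send
        # Bounded buffer: when full, the oldest message is dropped, because
        # edge resources are limited.
        self._pending: Deque[str] = deque(maxlen=max_buffered)

    @property
    def pending(self) -> int:
        return len(self._pending)

    def publish(self, message: str) -> int:
        """Queue a message, then try to drain the queue. Returns messages sent."""
        self._pending.append(message)
        return self.flush()

    def flush(self) -> int:
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # Still offline; keep the rest for a later attempt.
            self._pending.popleft()
            sent += 1
        return sent
```

The bounded buffer is a deliberate choice: dropping the oldest data is usually better than letting one app exhaust shared resources at the edge during a long outage.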
Putting it together so far, here’s where we are.
So I want to build an application here that can interact with the IOT Sandwich maker…
First, I’ll need to be able to authenticate so I need to onboard like any other device to get a token, like we saw before.
Then I’ll need to be able to interact with the device so we’ll need to solve that.
I might need to know what demand there is for my sandwich at any given time so I know what to make…
And I might need to persist some state in case the server where my app is running dies. In fact, I’d really like to be HA so that I have little to no outages.
What kinds of inputs could an app take?
MQTT
Could take images or video if we wanted
This speaks to the power of the edge.
If you have enough resources (but not too many) you can do some cool stuff very quickly at the Edge.
We aren’t quite there yet, but perhaps in the future.
We are certainly building with that in mind in a world where images and video are increasingly common ways to solve problems
When the cloud isn’t there, we want graceful degradation of services.
Putting it together so far, here’s where we are.
Now we’ve got a full Edge architecture including apps that we want to run
All this is great, but we need to consider Operations and Management as well…
Really want to think about scale…
ML and analytics to diagnose failures
Auto-shipping for devices diagnosed as failed (costs as much to troubleshoot remotely as replace)
Edge Tools – issue whitelisted tasks to the Swarm to fix it, perform other kinds of updates, etc.
CI/CD critical as well
Congrats, we’ve successfully been able to create the IOT sandwich.
Give yourselves a hand!
And one more look at our final architecture to recap.
So now we’ve added Operations and Management… and we have our final view of the architecture
To recap…
What can we do?
Support hundreds of thousands of events per day across thousands of restaurants
Decentralized onboarding
Enabled ecosystem
Why did we do it ourselves instead of using a cloud platform?
Wanted to use open standards as much as possible, especially with security / OAuth
Wanted to be able to run our stuff and vendor stuff at the edge so we needed an open platform like Docker
So what should you take away from this?
Areas we are focusing on now…
Analytics / Machine Learning on the data collected
ML at the Edge
MQTT broker – QOS 2 not implemented in ours, may need for some use cases
Persistence change to shared Cassandra
Waking up apps when there are requests to conserve resources
Putting a post up with some helpful links related to the talk and more resources for those that want to go deeper
Quick thanks to Wes, Jean, and CFA team for all the hard work put into this, and to audience for listening. Love to connect further.