This session features RAINN and the always-up, always-on infrastructure required to support its mission. You will gain an understanding of their environment, why they chose AWS, how they tackle security, and more.
Scale and Reach: Always Up - Always On - AWS Symposium 2014 - Washington D.C. - Partner Presentation - PBS Digital
1. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
“Always Up-Always On”:
Scale and Reach
Jacob Hileman, RAINN
Mike Howsden, PBS
Crisis Services in the Cloud
Jacob Hileman
About RAINN
• Nation’s largest anti-sexual assault
organization
• Public policy
• Prevention and education
• Victim services
Victim Services
• 1-866-95-HOPE
– 1100 partner centers
• www.RAINN.org
• Online Hotline: online.RAINN.org
National Sexual Assault Hotline
• Over 1.5 million callers helped
• Routes to the closest Rape Crisis Center
– 1100 partner centers
RAINN.org
• First contact point for our users
• Wide array of content:
– Laws in your state
– Articles for victims
Why 24/7 Is Important
• Many service providers only operate
during business hours
• Getting help when you need it is important
• Different demographics seek help at
different parts of the day
Importance of 24/7
APRIL 2014:
270,000 visitors
650,000 page views
Importance of 24/7
2014 pace:
3,000,000
visitors
Importance of 24/7
54%
of traffic outside of
business hours
Online Hotline
• Web-based crisis intervention chat.
• It’s private. No transcripts are kept.
• It’s anonymous. No IP addresses stored.
• It’s encrypted. All chats over SSL.
• Yearly security review.
Online Hotline
350,000
visits since launch
Online Hotline
JANUARY – JUNE 2014
3x
the traffic we saw in all of 2011
Security & Availability
• Security Groups
– Servers have only the network access
they need
– No master security group that holds
everything
Security & Availability
• Use a staging environment
– Updates verified in a staging environment
– Then promoted to production
– CloudFormation
Security & Availability
• Use Amazon’s VPN Gateway
• Launch private instances only available
to your office network
• Keep database and app servers off the
internet
Security & Availability
• Elastic Load Balancers
• Update servers with no downtime
– Connection Draining
Security & Availability
• Amazon RDS
– Point in time backups
– Access to logs
– Multi-AZ failover
• MySQL & SQL Server
Security & Availability
• Use multi-factor authentication
– Google Authenticator
User Trust
• User privacy and safety are important
• Keep your users first
Thank You
JacobH@RAINN.org
Scaling Downton Abbey with AWS
Mike Howsden
mvhowsden@pbs.org
Downton Abbey Audience
• Record-breaking 11.1 million streams of full episodes for season 4.
– 46% of that content was consumed on mobile and over-the-top devices.
• There were 15 million streams of all Downton Abbey content during the
streaming window (clips, previews and full episodes combined)
• The MASTERPIECE website saw 10.1 million unique visitors who
generated 22.2 million visits in January and February.
• Single-day streams of 410,000 for the premiere of Episode 1. The biggest
premiere out of any Downton season.
Source: Google Analytics January 6 - March 9, 2014
Not your normal week at PBS
Scaling Reads
• EC2 Servers
(Horizontally Scaled)
• RDS
• ELB
• Cache Servers
• S3 + CloudFront for
HLS/RTMP video
and static media
delivery
Lessons learned:
read-only mode, long
cache-times help
weather a storm
Advantages in AWS
• Sunk costs aren’t sunk
– Servers can scale without having to purchase new
hardware
– Optimize at your leisure, fixes hit the budget
immediately
• Lower barriers to adopting new services
Looking Forward
• Utilizing Elastic MapReduce to better
understand video QoS
• Better leverage CDN and delivery routes to
improve end users’ experiences
• Launching on numerous additional platforms
(OTT/mobile)
• Scaling writes (more user specific services)
Thank You
Mike Howsden
mvhowsden@pbs.org
Editor’s Notes
I’m Jacob Hileman. I’m the director of technology at RAINN. I’m going to talk about RAINN, a few of the services we offer, and what we do to keep those services up 24/7.
So first, about RAINN.
RAINN is the nation’s largest anti-sexual assault organization. Our efforts are primarily focused on three areas:
--Public policy
--Prevention and education
--And victim services
I’m going to concentrate on victim services and some of the programs we offer them.
We have three major services we offer survivors.
The first is a telephone hotline that connects you to your closest rape crisis center. We partner with over 1,100 centers in the US.
Next we have our website, rainn.org, which has articles, a rape crisis center map and search, and other resources that survivors and family members can use.
And last we have an online chat hotline at online.rainn.org. The online hotline is a web-based chat where you can talk with one of our staffers for support, crisis intervention, and referrals.
The National Sexual Assault Hotline was RAINN’s first program, originally launched in 1994. It gives victims a centralized number they can call to be connected with their local rape crisis center.
The hotline worker they reach can tell the caller about therapeutic services, medical attention, or reporting options in their area.
For example, if we call from DC, we’d reach a hotline worker through the DC Rape Crisis Center. They’re up in Fort Totten. They could explain to the caller that the right place in DC to get a rape kit is the Washington Hospital Center. They could also arrange to have an advocate meet them at the hospital, and then offer therapy for afterwards. Or, they could talk to the caller about pros and cons of reporting if they’re feeling unsure.
Next is rainn.org.
--People searching Google for rape help or sexual assault help will typically find us as the number one search result. So we’re often the first or second resource for people trying to understand their assault or a friend or family member’s assault.
--And we help them do this with a wide array of content, including the laws in your state regarding everything from HIV testing to the statute of limitations on rape and even confidentiality laws.
-For example, a visitor might want to know if she can still report her child sexual abuse because, after years of silence, she’s ready to tell.
-Another visitor might need to know whether they can receive medical attention and disclose the rape to a physician without being forced to report to local police.
--We have resources for survivors, including articles about the effects of sexual assault, info for adult survivors, depression, suicide, and even resources for victims of military sexual assault.
-For example, visitors who are struggling with flashbacks can visit RAINN’s page about that to read a breathing exercise and tips on how to avoid or manage flashbacks.
-Loved ones of survivors can visit our page on “How to Help a Loved One” to better understand how to be supportive and sensitive to their loved one’s needs.
-Anyone who is unsure whether their experience would be considered rape can visit “Was I Raped?” to gain a better understanding of what would or would not be considered rape.
-Survivors often feel crazy or isolated by their experience. They can visit our page that discusses common side effects and get some affirmation that their thoughts and feelings are natural reactions to what happened.
As you can tell, 24/7 is very important to our organization.
--Sexual violence doesn’t happen on a 9-5 schedule, and getting a voicemail is not helpful during a crisis.
--Getting help when you need it can turn a crisis into an opportunity. For example, a visitor may be unable to decide whether to report, or may not understand the importance of medical attention.
--Different demographics seek help at different points in the day. For example:
-Moms that need to chat but can only do so after the kids go to bed
-Or teens that need to use it after school
-Or working professionals who can’t get help during business hours
-Or survivors who are having trouble getting through the night because of nightmares
-Or even someone doing Peace Corps abroad.
And like I mentioned earlier, our website rainn.org is typically the first point of contact for our users.
And we have a lot of them.
So in April of this year, we had 270,000 visitors,
with over 650,000 page views.
That’s a lot of people looking for information.
And we’re on pace for over 3,000,000 visits this year.
What I think is most surprising about these numbers is that 54% of all those visits happened outside of business hours.
These numbers really illustrate how important high availability is.
When serving our demographic we need to be available to them when they need it.
Now, the other major service we offer victims is our online hotline.
The online hotline is a web-based crisis intervention chat and our flagship program.
You visit our site and chat one on one with a trained staffer.
We take the privacy and security of the online hotline users very seriously.
--It’s private, anonymous, and one on one.
--We don’t store transcripts of chats. Once the chat is over the transcripts are cleared.
--It’s anonymous. We don’t store access logs or web logs.
--And it’s encrypted. All our chats occur over SSL.
--It goes through a yearly security review done by an independent third party.
Essentially, what this privacy means to our users, whether a woman experiencing domestic violence or a teen being abused by her dad, is that they don’t have to worry that their abuser will be able to find any record of their contact with RAINN. Staffers are trained to teach visitors how to clear their history, cookies, and cache just in case. We’re even working on a new service that allows victims to call our telephone hotline using their browser, so that there’s no record of the call on their phone bill.
Providing a safe environment to our users is a major priority for us.
And our hotline has had a lot of users visit it since its launch.
--Over 350,000 visits have been made to our hotline service since its launch in 2007.
--That’s an incredible number of people looking for crisis services.
--And they are coming from all across the country and at all times of the day.
And traffic is continuing to grow.
In the first 5 months of 2014 we’ve already tripled the traffic we saw in all of 2011.
And it’s only June.
3 times the traffic. We’re on pace to do nearly 12x the traffic from 2011 by the end of the year.
And like I said before, this traffic is spread out all across the day. There isn’t a time when there are no users visiting our hotline. It is a true 24/7 service.
Okay, so now let’s talk about some of the ways we use AWS for security and high availability.
We do a few things to minimize threats and maximize uptime. One of the easiest and most beneficial things we do is take advantage of VPC security groups.
Using security groups, you can restrict network access to servers at the protocol and port level.
This lets us lock down servers so they have only the network access they need to do their job.
Only give your servers the network access they need to do their job.
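The least-privilege idea above can be sketched in a few lines of Python. This is an illustration only: the tier names, ports, and the `is_allowed` helper are assumptions made for the example, not RAINN’s actual configuration, and the model is a plain-Python stand-in rather than a call to the AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One ingress rule: who may reach which port over which protocol."""
    protocol: str
    port: int
    source: str  # a CIDR block or the name of another security group


# Hypothetical tiers: each group lists only the traffic that tier needs.
# There is deliberately no catch-all group that opens everything.
SECURITY_GROUPS = {
    "web": [Rule("tcp", 443, "0.0.0.0/0")],  # HTTPS from the internet
    "app": [Rule("tcp", 8080, "web")],       # only the web tier
    "db": [Rule("tcp", 3306, "app")],        # only the app tier (MySQL port)
}


def is_allowed(group: str, protocol: str, port: int, source: str) -> bool:
    """Return True only if an explicit rule permits the traffic."""
    return Rule(protocol, port, source) in SECURITY_GROUPS.get(group, [])


if __name__ == "__main__":
    print(is_allowed("db", "tcp", 3306, "app"))        # app tier reaching the database
    print(is_allowed("db", "tcp", 3306, "0.0.0.0/0"))  # the internet reaching the database
```

Anything not listed is denied by default, which is the property the talk is recommending.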
Another thing we do is to never update production servers in place.
We’ve built a staging environment that matches our production environment, and we do all our testing there.
If an update to our software is needed, we launch the AMI that the production servers are running, update it, test it, and then promote it to production. No accidental bugs. No accidental downtime.
Having a staging environment and a production environment at Amazon still costs less than half of what we were paying previously at a physical host. And the best part is staging doesn’t even have to be up all the time. Using CloudFormation you can spin up the entire environment from a configuration file on demand, then turn it off when you don’t need it anymore. So you’re only paying for those resources when you use them.
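As a rough sketch of what such a configuration file might contain, the Python below builds a minimal CloudFormation template as JSON. The resource name `StagingWebServer`, the AMI ID, and the instance type are placeholders chosen for the example; a real staging stack would describe far more (load balancers, databases, security groups).

```python
import json


def staging_template(ami_id: str, instance_type: str = "t3.micro") -> str:
    """Build a minimal CloudFormation template with a single EC2 instance.

    The AMI ID is the image production is running, so staging starts
    from the same software before an update is applied and tested.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical on-demand staging environment",
        "Resources": {
            "StagingWebServer": {  # placeholder logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": ami_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    # "ami-12345678" is a made-up ID used only for illustration.
    print(staging_template("ami-12345678"))
```

Creating a stack from the template brings the environment up, and deleting the stack removes its resources again, which is what keeps an occasional staging environment cheap.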
So I can’t recommend this enough.
Use Amazon’s VPN gateway.
It allows you to extend your office data center or server room straight into the cloud. You can keep database and app servers private and off the internet while still being able to connect to them from your office. It’s a fantastic feature.
Elastic Load Balancers.
We use Elastic Load Balancers in front of all of our public instances. They allow you to better distribute traffic across your application and web servers for quick and easy scaling.
Now, while being able to scale your infrastructure based on demand is pretty cool, I think the connection draining feature is their biggest selling point.
When you enable Connection Draining on a load balancer, any back-end instances that you deregister will complete requests that are in progress before they are removed.
For example, when we update Drupal we add the new Drupal instances to the production ELB pool while deregistering the old instances. Users gradually get routed to the new Drupal instances without ever noticing a difference.
It’s a feature that makes high availability dead simple to achieve.
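The behavior described above can be modeled with a toy load balancer in Python. This is a simplified sketch, not how ELB is implemented: the class and method names are invented for the example, and the drain timeout that a real load balancer enforces is left out.

```python
class LoadBalancer:
    """Toy model of connection draining (illustrative only)."""

    def __init__(self):
        self.in_flight = {}    # instance name -> requests in progress
        self.draining = set()  # deregistered instances still finishing work

    def register(self, name):
        self.in_flight[name] = 0

    def deregister(self, name):
        # Stop sending new requests, but let in-progress ones finish.
        self.draining.add(name)
        self._remove_if_idle(name)

    def route(self):
        """Send a new request to the least busy instance that is not draining."""
        candidates = [n for n in self.in_flight if n not in self.draining]
        if not candidates:
            raise RuntimeError("no in-service instances")
        target = min(candidates, key=lambda n: self.in_flight[n])
        self.in_flight[target] += 1
        return target

    def finish(self, name):
        """Mark one request on the instance as complete."""
        self.in_flight[name] -= 1
        self._remove_if_idle(name)

    def _remove_if_idle(self, name):
        # A draining instance leaves the pool once its last request is done.
        if name in self.draining and self.in_flight[name] == 0:
            del self.in_flight[name]
            self.draining.discard(name)


if __name__ == "__main__":
    lb = LoadBalancer()
    lb.register("drupal-old")
    lb.route()                   # a user is mid-request on the old instance
    lb.register("drupal-new")
    lb.deregister("drupal-old")  # the old instance starts draining
    print(lb.route())            # new traffic goes to drupal-new
    lb.finish("drupal-old")      # the last old request completes
    print(sorted(lb.in_flight))  # only drupal-new remains in the pool
```

The point of the sketch is the ordering: the old instance stops receiving new requests immediately, but it is only removed after the request already in progress has finished.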
Here’s one way that moving our infrastructure has already saved us time.
We had one of the physical servers hosting rainn.org crash a few months ago. Even though we were receiving fanatical support, it took over 24 hours for our site to be restored from backups. That’s 8,000 visitors we weren’t able to help. During our disaster recovery tabletop exercises, we found that at Amazon we’d be able to bring our website back up in that situation within an hour. That’s 350 lost visitors instead of 8,000.
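The visitor figures can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes traffic is spread evenly across the day, which is a simplification, and the `visitors_missed` helper is introduced here only for illustration. It estimates the hourly rate two ways: from the outage itself (about 8,000 visitors over 24 hours) and from the April 2014 figure of 270,000 visitors.

```python
def visitors_missed(visitors_per_hour: float, outage_hours: float) -> int:
    """Estimate visitors affected by an outage, assuming evenly spread traffic."""
    return round(visitors_per_hour * outage_hours)


# Rate implied by the outage in the talk: about 8,000 visitors over 24 hours.
rate_from_outage = 8_000 / 24          # roughly 333 visitors per hour

# Rate implied by April 2014 traffic: 270,000 visitors over 30 days.
rate_from_april = 270_000 / (30 * 24)  # 375 visitors per hour

if __name__ == "__main__":
    print(visitors_missed(rate_from_outage, 1))   # one-hour outage, first estimate
    print(visitors_missed(rate_from_april, 1))    # one-hour outage, second estimate
    print(visitors_missed(rate_from_april, 24))   # a full day at the April rate
```

The two one-hour estimates come out at 333 and 375 visitors, which brackets the figure of 350 quoted in the talk.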
We also take full advantage of Amazon RDS, which is their fully managed database service.
It includes point-in-time backups, access to server logs, and even Multi-AZ failover.
They handle all of the server upgrades and security patches. You just give them a window of time once a week, usually 30 minutes, in which they can perform the upgrades. The best part is that during this time the servers fail over to a redundant copy with no downtime to you. It happens in the background so your applications never notice it.
While MySQL has had Multi-AZ for a while, they just added the feature to SQL Server, so it’s a perfect fit for our applications.
Multi-factor.
We also use multi-factor authentication. You can install Google Authenticator on your phone and it’ll give you a six-digit number you enter after your password. It’s free and adds a simple but enormously effective layer of security to your system.
Your entire infrastructure could easily be hosted at Amazon. Letting someone get access to it just by guessing your password is a scary thought.
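For readers curious where the six-digit number comes from, authenticator apps of this kind generally implement the time-based one-time password (TOTP) algorithm from RFC 6238. The Python below is a minimal sketch of that algorithm using only the standard library. The defaults (HMAC-SHA-1, 30-second step, 6 digits) are the common ones and are assumed here, and the secret in the usage example is the RFC’s published test key, not a real credential.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32: str, at=None, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (RFC 6238, HMAC-SHA-1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    counter = int(now // step)  # number of time steps since the Unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as defined in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Test key from the RFC, base32-encoded the way authenticator apps expect.
    secret = base64.b32encode(b"12345678901234567890").decode()
    print(totp(secret, at=59))  # prints 287082, matching the RFC test vector
    print(totp(secret))         # the code for the current 30-second window
```

Because the server and the phone share the secret and the clock, both can compute the same code independently, and a guessed password alone is no longer enough to sign in.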
For my last slide I want to shift and talk a little bit about user privacy and safety, and why your users’ trust is important.
If you work in a direct service environment, as I’m sure some of you do, you want to always keep your users’ privacy and safety in mind.
When implementing or changing any process or system, put yourself in the user’s shoes. Knowing what you know and what you’re able to do, would you be comfortable using the service if you were in a crisis situation? If you wouldn’t be, fix it. Change the process or the system. Revisit the drawing board and figure out where the breakdown is happening.
If people are coming to your service for help, it’s because they trust you. The last thing you want to do is jeopardize that trust. You want to always keep your users first.
Thank you.
So this is the task: more than I’ve handled before, and in a very different context. How do we handle this?
How have I handled scaling reads in the past (Atlantic Media, datacenter, CDN, no cloud)?
The number of video streams and website views goes up significantly during Downton Abbey but stays pretty constant for the rest of the year.