Lamar University CAS HA
1. High Availability in Hurricane Alley
Multi-site multi-node CAS
Deep in the Heart of Texas
Srinivas Varadaraj & Bill Thompson
Jasig Sakai Conference 1
2. Agenda
1. Strategy
2. Technical requirements
3. Constraints
4. Stuff at hand
5. Architectural decisions
6. Cluster & production architecture
7. Challenges and solutions
8. Multi-site routing
9. Production experiences
10. Questions & Comments
4. Technical requirements
• Application Compatibility
• High Availability
• Rolling maintenance
• Transparency
• Scalability
• AD integration
• Customization (branding)
5. Constraints
• Limited budget; use existing resources
– Power in the datacenters
– Single internet connection
– High latency connectivity
• Limited in-house development & experience
– Stay close to release code
• Aggressive timeframe
6. Stuff we had at hand
• SAN infrastructure with replication to DR
• VM clusters
• Site-to-site VPN based connectivity to DR
• F5 load balancers
• Dedicated firewalls
• Opportunity
7. Decisions! Decisions! Decisions!
• Virtual machines
• SAN-based storage
• The great ticket registry debate
• To replicate tickets or NOT!
• Building by cloning
• “Appliance”-like
• SSL: local vs. offloading
• Clustered vs. standalone application servers
• Timeout!
10. “Holy troubles, Batman!”
• SSL offloading
– Tomcat offloading workaround
• Authentication and validation persistence
– The user and the application can each go to either site.
– Enter site identifiers
• Multi-site ticket replication
– Latency in the WAN
• Differing algorithms in phpCAS and Java CAS clients
• Slow performance of mod_auth_cas on VMs
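The site-identifier idea can be sketched as follows. CAS's default ticket ID generator can append a configurable suffix to every ticket ID; assuming each site sets its own suffix, any node that later sees a ticket can tell which site issued it. The `PREFIX-counter-random-siteID` layout, the site names, and the 20-character random part below are illustrative assumptions, not the production configuration.

```python
import secrets
import string
from itertools import count
from typing import Optional

# Illustrative only: the real site suffixes used in production are not given.
_ALPHABET = string.ascii_letters + string.digits
_counter = count(1)


def new_ticket(prefix: str, site_id: str) -> str:
    """Build a ticket ID shaped like PREFIX-counter-random-siteID (assumed layout)."""
    random_part = "".join(secrets.choice(_ALPHABET) for _ in range(20))
    return f"{prefix}-{next(_counter)}-{random_part}-{site_id}"


def site_of(ticket: str) -> Optional[str]:
    """Return the trailing site identifier, or None if the ticket carries none."""
    parts = ticket.split("-")
    # PREFIX, counter, random part and site ID give at least four fields
    return parts[-1] if len(parts) >= 4 else None
```

For example, `site_of(new_ticket("ST", "site2"))` returns `"site2"` no matter which node performs the lookup, which is what lets authentication and validation land on different sites.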
12. HTTP_REQUEST (request from the client)
HTTP_REQUEST {
  1) Grab the Content-Length header to determine the payload size
  2) If both sites are down, redirect to a branded "service unavailable" page
  3) If the URI carries the other site's siteID and the other site is up,
     route to the other site
  4) Otherwise, route to the local site by default
}
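The routing steps above can be sketched in Python for clarity. This is not F5 iRule code: it assumes the site identifier is the last dash-separated field of the `ticket` query parameter, uses hypothetical site names `site1` (local) and `site2` (other), and models site health as two booleans. Step 1 (sizing the payload so the body can be inspected later) is omitted.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical identifiers; the slides do not give the real values.
LOCAL_SITE = "site1"
OTHER_SITE = "site2"
UNAVAILABLE = "branded-unavailable-page"  # stands in for the redirect target


def ticket_site(uri: str) -> Optional[str]:
    """Return the site ID at the end of the ticket= parameter, if present."""
    tickets = parse_qs(urlsplit(uri).query).get("ticket")
    if not tickets:
        return None
    fields = tickets[0].split("-")
    return fields[-1] if len(fields) >= 4 else None


def route_request(uri: str, local_up: bool, other_up: bool) -> str:
    """Pick a destination following steps 2-4 of the HTTP_REQUEST outline."""
    if not local_up and not other_up:
        return UNAVAILABLE  # step 2: both sites down
    if ticket_site(uri) == OTHER_SITE and other_up:
        return OTHER_SITE  # step 3: the other site issued this ticket
    return LOCAL_SITE  # step 4: default to the local site
```

The sketch follows the four steps literally, so it does not model any pool-level failover that applies when only the local site is down.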
13. HTTP_REQUEST_DATA (payload manipulation)
HTTP_REQUEST_DATA {
  1) Grab <samlp:AssertionArtifact> from the payload; it may contain a siteID
  2) If we have a siteID of the other site {
       If the siteID was introduced by the load balancer {
         blank the load balancer extension
       }
       Route to the other site
     } else if we have a siteID of the local site {
       If the siteID was introduced by the load balancer {
         blank the load balancer extension
       }
       Route to the local site
     }
}
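A minimal sketch of the payload step, assuming the SAML 1.1 request carries the ticket in a literal `<samlp:AssertionArtifact>` element and that site IDs added by the load balancer are marked with a hypothetical `-LB<site>` suffix (the real marker is not given). A regular expression stands in for the string matching an iRule would do; a production version would also need to tolerate other namespace prefixes.

```python
import re
from typing import Tuple

# Hypothetical identifiers; the slides do not give the real values.
LOCAL_SITE = "site1"
OTHER_SITE = "site2"
LB_MARK = "LB"  # marks a site ID that the load balancer appended

_ARTIFACT = re.compile(
    r"<samlp:AssertionArtifact>([^<]+)</samlp:AssertionArtifact>"
)


def route_saml(payload: str) -> Tuple[str, str]:
    """Return (target site, payload to forward) for a /samlValidate POST body."""
    match = _ARTIFACT.search(payload)
    if match is None:
        return LOCAL_SITE, payload  # no artifact: default to the local site
    artifact = match.group(1)
    head, _, tail = artifact.rpartition("-")
    lb_added = tail.startswith(LB_MARK)
    site = tail[len(LB_MARK):] if lb_added else tail
    if site not in (LOCAL_SITE, OTHER_SITE):
        return LOCAL_SITE, payload  # no recognizable siteID
    if lb_added:
        # Blank the LB extension so CAS sees the ticket ID it actually issued
        payload = payload.replace(artifact, head, 1)
    return site, payload
```

Stripping the marker before forwarding matters because CAS looks tickets up by their exact ID; a ticket carrying an extra LB suffix would fail validation.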
14. HTTP_RESPONSE (response from the server)
HTTP_RESPONSE {
  1) Grab the server's response headers
  2) If the siteID is not in the response header {
       Introduce a load balancer siteID to
       compensate for the Java CAS client
     }
  3) Release the HTTP response to the client
}
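The response step can be sketched the same way. Assumptions, all hypothetical: the ticket travels in the redirect's `Location` header as a `ticket` (CAS protocol) or `SAMLart` (SAML 1.1) query parameter, CAS-issued IDs end in a known site suffix, and the load balancer marks its own additions with `-LB<site>`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical identifiers; the slides do not give the real values.
LOCAL_SITE = "site1"
KNOWN_SITES = ("site1", "site2")
LB_MARK = "LB"
TICKET_PARAMS = ("ticket", "SAMLart")  # CAS protocol / SAML 1.1 parameter names
_LB_TAGS = {LB_MARK + s for s in KNOWN_SITES}


def has_site_id(ticket: str) -> bool:
    """True if the ticket already ends in a CAS- or LB-supplied site ID."""
    last = ticket.rsplit("-", 1)[-1]
    return last in KNOWN_SITES or last in _LB_TAGS


def tag_location(location: str, site: str = LOCAL_SITE) -> str:
    """Append an LB site ID to a ticket in a redirect URL that lacks one."""
    parts = urlsplit(location)
    query = parse_qsl(parts.query, keep_blank_values=True)
    changed = False
    for i, (name, value) in enumerate(query):
        if name in TICKET_PARAMS and not has_site_id(value):
            query[i] = (name, f"{value}-{LB_MARK}{site}")
            changed = True
    if not changed:
        return location  # nothing to compensate for: release as-is
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Re-encoding the query with `urlencode` may normalize the escaping of other parameters; an in-place string edit on the header would avoid that at the cost of more fragile parsing.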
16. Experiences in Production
• Approx. 8 months in production
• 7 applications in production, 10 in development
• Survived two power outages at the DR site
• Survived multiple internet outages
• Successful rolling upgrades of MySQL & CAS
• Flow-based redesign
• LPPE
• Revisit the ticket registry
17. Questions/Comments
• Credits:
– CAS developers and community
– F5 & F5 devcentral
– Unicon
– LU & Txstate
• Thank you for your time!
• Contacts:
– Sri: Sri@lamar.edu
– Bill: wgthom@unicon.net
Editor's Notes
Consolidate identities on campus and standardize on a single LDAP service that provides that identity. Provide a Single Sign-On (SSO) standard for the campus, compatible with the vast majority of existing applications (Zimbra email, Self-Service Banner, Blackboard LMS, payment gateway system, degree audit system, library systems, etc.). Introduce a self-service mechanism for account management and include it in the login flow. Provide capabilities for a lightweight portal that can be replaced with a full portal in the future.
The solution had to be compatible with the majority of applications, so solutions based on open standards were most favorable. The solution had to be highly available and resilient to both local disasters (datacenter outages, LAN and WAN outages, application service failures) and regional disasters (hurricanes). No single points of failure. Ability to do maintenance work on the system without an outage (rolling maintenance). This availability had to be transparent to the user to keep the SSO process seamless. The system had to scale easily with load (horizontally or vertically). Tie into the pre-existing AD, with attribute release to applications. Branding to match university standards for both regular and mobile browser access. Active-active or active-failover delivery model with automated failover of services.
Had to use existing network resources as much as possible (existing LAN and WAN equipment and connections). Limited in-house developers and experience led to a very good partnership with Unicon, which provided installation and ongoing support for the install base. Desire to stay as close to the release code as possible, without heavy customization. High-latency link between the datacenters. Short timeframe :)
SAN-based infrastructure with VM clusters. Replication of core/mission-critical data from SAN to SAN at the DR site over a site-to-site VPN tunnel. Load balancer infrastructure (LTM) that supported SSL offload with a wildcard certificate. Dedicated datacenter firewalls at both the primary and DR datacenters. Opportunity!
VM clusters provided protection from hardware failures and the ability to scale horizontally via cloning. SAN storage provided storage redundancy and protection from hardware failures, with offline backup capability. SSL offloading to the load balancers saved money by letting us use the wildcard certificate. The choice of ticket registry was much debated; it came down to Ehcache versus the database ticket registry. We chose to scale horizontally to protect against service failures. To keep database failures from taking down CAS, we chose an NDB cluster, which offers in-memory data with periodic writes to disk, built-in replication between nodes, and multiple access points to the same data, allowing us to build VM-like 'appliances' where each application instances
SSL offloading from Tomcat requires special setup in Apache + AJP; it is not a true solution but a workaround. < insert configuration here> To replicate tickets between sites or not? Maintain persistence within the site (options: source address vs. JSESSIONID). Maintain persistence across sites? Authentication and validation can happen at different sites: enter site identifiers. Routing traffic based on site identifiers. Site identifiers are all over the place: for /serviceValidate they are in the URI, for /samlValidate they are in the SAML payload, and the Java CAS client is unique!! Its unique ID is not generated using the same algorithms. So we had to add and remove site identifiers on the load balancers.