The Apereo Open Academic Environment is a platform that focusses on group collaboration between researchers, students and lecturers, and strongly embraces openness, creation, re-use, re-mixing and discovery of content, people and groups.
How does Apereo OAE work? OAE targets a large scale and a multi-tenant cloud-compatible deployment model, where a single installation can host multiple institutions at the same time.
This presentation provides a very detailed overview of the overall architecture and the different components and technologies. We will take a closer look into all of the following components and how they are being used:
- Node.js
- OAE Widgets
- Apache Cassandra
- ElasticSearch
- Redis
- Nginx
- RabbitMQ
We also talk about the approach used for continuous nightly performance testing and how we are validating the desired (horizontal) scalability. Details around back-end and UI unit testing, code coverage and security testing are shared as well.
2. Topics
1. Project Goals
2. Hilary System Architecture
3. Clustering
4. Hilary Design and Extension Patterns
5. Performance Testing
6. Deployment and Automation
7. UI Architecture
8. Customization and Configuration
9. Part 2: Hands on example
Or something else?
Wednesday, 12 June 13
13. Multi-tenancy
• Market is heading
• Support multiple institutions at the same time
• Multi-tenancy+
• Easily created, maintained and configured
16. OAE Architecture
The Apereo OAE project is made up of 2 distinct source code
platforms:
• “Hilary”
• Server-side RESTful web platform that exposes the
OAE services
• Written entirely using server-side JavaScript in Node.js
• “3akai-ux”
• A client-side / browser platform that provides the
HTML, JavaScript and CSS that make up the browser
UI of the application
19. Application Servers
• Written in server-side JavaScript, run in Node.js
• Node.js used by: eBay, LinkedIn, Storify, Trello
• Lightweight, single-threaded, event-driven platform that processes I/O
asynchronously (non-blocking)
• Uses callbacks and an event queue to defer work until I/O or
other heavy processes complete
• App servers can be configured for functional specialization:
• User Request Processor
• Activity Processor
• Search Indexer
• Preview Processor
20. Apache Cassandra
• Authoritative data source
• Provides high-availability and fault-tolerance without trading away performance
• Regarding the CAP theorem, Cassandra favours Availability and Partition Tolerance over
Consistency; however, consistency is tunable per query (we almost always
use “quorum”)
• Uses a ring topology to shard data across nodes, with configurable replication levels
• Why no RDBMS?
• Cassandra gives more flexibility with incremental scalability in a cloud
environment
• Flexible scaling helps to overcome unpredictable growth of multi-tenant systems
• Medium-to-long term options for replicating data to multiple data-centers for
localizing both reads and writes
• Used by: Netflix, eBay, Twitter
21. ElasticSearch
• Lucene-backed search platform
• Built for masterless incremental scaling and high availability
• Powers Hilary search, including library, related content,
group members and memberships
• Exposes HTTP RESTful APIs for indexing and querying
documents
• RESTful query interface uses JSON-based Query DSL
• Used by: GitHub, FourSquare, StackOverflow, WordPress
22. RabbitMQ
• Message queue platform written in Erlang
• Used for distributing tasks to specialized
application server instances
• Supports active-active queue mirroring for
high availability
• Used by: Joyent
23. Redis
• Fills a variety of roles:
• Broadcast messaging (can move to RabbitMQ)
• Locking
• Caching of basic user profiles
• Holds volatile activity aggregation data
• Comes with no managed clustering solution (yet), but has slave
replication for active fail-over
• Some clients manage master-slave switching, and distributed
reads for you
• Used by: Twitter, Instagram, StackOverflow, Flickr
24. Etherpad
• Open Source collaborative editing
application written in Node.js
• Descends from the original EtherPad by AppJet, which Google acquired and open-sourced
• Licensed under Apache License v2
• Powers collaborative document editing in
OAE
25. Nginx
• HTTP and reverse-proxy server
• Used to distribute load across application
servers and etherpad servers, and to stream file
downloads
• Useful rate-limiting features based on
source IP
• Used by: Netflix, WordPress.com
27. Clustering Cassandra
• Cassandra uses a partitioned ring topology to distribute data
• Nodes are given a numeric token which determines their
“location” in the ring
• When data rows are read / written to Cassandra, the row
key is hashed using a “Partitioner”, which determines what
node holds the row’s data
• “Replication Strategy” is used to determine which nodes will
hold replicas of which rows
• E.g., “Simple Strategy” places the other N - 1 replicas on the next nodes clockwise
around the ring from the primary node (N is the replication factor)
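The ring placement described above can be sketched in a few lines of JavaScript. This is only an illustration, not Cassandra's implementation: the node names, the token values and the toy `partition` function are assumptions made for the example (real partitioners hash with MD5 or Murmur3).

```javascript
// Toy ring: each node owns a numeric token (names and values are made up).
var ring = [
    { name: 'db0', token: 0 },
    { name: 'db1', token: 64 },
    { name: 'db2', token: 128 },
    { name: 'db3', token: 192 }
];

// Stand-in partitioner: maps a row key to a token in [0, 256).
// Hypothetical hash, used only to keep the example small.
function partition(rowKey) {
    var token = 0;
    for (var i = 0; i < rowKey.length; i++) {
        token = (token * 31 + rowKey.charCodeAt(i)) % 256;
    }
    return token;
}

// The primary replica is the first node whose token is >= the key's token,
// wrapping around to the start of the ring. The other (rf - 1) replicas are
// the next nodes clockwise. Assumes rf <= ring.length.
function replicasFor(rowKey, rf) {
    var token = partition(rowKey);
    var primary = 0;
    for (var i = 0; i < ring.length; i++) {
        if (ring[i].token >= token) {
            primary = i;
            break;
        }
    }
    var replicas = [];
    for (var j = 0; j < rf; j++) {
        replicas.push(ring[(primary + j) % ring.length].name);
    }
    return replicas;
}

console.log(replicasFor('a', 2)); // [ 'db2', 'db3' ]
```

With a replication factor of 2, the key `'a'` hashes to token 97 in this toy ring, so `db2` (token 128) is the primary and `db3` holds the replica.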
29. Clustering Cassandra (cont’d)
• Query consistency specified at request time
• ALL - All nodes must respond successfully
• LOCAL_QUORUM - floor(RF/2) + 1 nodes in the local
datacenter must respond successfully
• EACH_QUORUM - floor(RF/2) + 1 nodes in each
datacenter must respond successfully
• ONE - Only one node must respond successfully
• Therefore, if you write with QUORUM and then read with
QUORUM, the read and write sets overlap in at least one node, so results should always be consistent
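The quorum reasoning can be checked with a little arithmetic. The sketch below is a worked example only; the function names are made up for illustration. A read is guaranteed to touch at least one replica that acknowledged the write whenever the number of write acknowledgements plus the number of read responses exceeds the replication factor.

```javascript
// Number of replicas that must respond for a quorum at replication factor rf.
function quorum(rf) {
    return Math.floor(rf / 2) + 1;
}

// True when any read set of size readAcks must overlap any write set of
// size writeAcks among rf replicas.
function overlaps(writeAcks, readAcks, rf) {
    return writeAcks + readAcks > rf;
}

var rf = 3;
console.log(quorum(rf));                           // 2
console.log(overlaps(quorum(rf), quorum(rf), rf)); // true: 2 + 2 > 3
console.log(overlaps(1, 1, rf));                   // false: ONE + ONE may miss the write
```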
30. Clustering ElasticSearch
• ElasticSearch shards data into a configurable
number of shards
• “Number of Replicas” can be configured at
runtime, which determines how many replicas
of each shard should exist
• Shard replicas are distributed among the
nodes in the cluster
• The shard for a document is selected by a hash of the document id
32. Clustering Etherpad
• Short and sweet: It doesn’t really cluster
• Data is stored in Cassandra, but active sessions must all share
the same etherpad server
• Configure number of etherpad servers and their hosts in
Hilary, and configure Nginx to proxy to the appropriate
server
• Server is selected based on a numeric hash of the content
item id
• No high availability. If an etherpad server goes down, those
sessions are disconnected :( But etherpad content is retained
34. Clustering RabbitMQ
• Uses a Master-Slave Active/Active queue mirroring
policy for redundancy
• We apply a policy of ha-mode=all to all OAE queues
• Ensures messages are replicated to all nodes
• Ensures all subscribed consumers receive
messages
• Since all nodes are active peers, when a node fails, the
consumer simply reconnects
35. Clustering Search Indexers
• Search Indexers are regular application
nodes configured to consume search index
tasks
• INDEX_UPDATE, INDEX_DELETE
• Offloads fetching and processing of
indexing tasks to nodes that don’t impact
request latency
36. Clustering Activity Processors
• Regular application nodes that:
• Receive and route activity tasks
• Collect and aggregate routed activities
• May be configured with dedicated Redis server for aggregation
• Aggregation is the process of deeming 2 or more activities “similar” and grouping them
into a single activity
• Maintains temporary information in Redis to keep track of what activities have occurred
recently and aggregate them on-the-fly
• Due to concurrency issues when aggregating new activities into feeds, routed activities are
sharded into “concurrency buckets”
• Avoids duplicate activities in streams, while not completely serializing the aggregation
process
• Bucket is selected based on a hash of
• Activity Stream ID (e.g., user or group id)
• Activity Type (e.g., share content, add to group, etc...)
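Bucket selection of this kind can be sketched as follows. The hash function, the `#` separator and the identifiers are assumptions for illustration, not necessarily what Hilary uses; the point is only that the same stream and activity type always land in the same bucket, so they are never aggregated concurrently.

```javascript
var NUM_BUCKETS = 3; // configurable; 3 is just an example value

// Simple deterministic string hash (hypothetical, for illustration only).
function hashString(str) {
    var hash = 0;
    for (var i = 0; i < str.length; i++) {
        // Keep the running value within unsigned 32-bit range
        hash = (hash * 31 + str.charCodeAt(i)) >>> 0;
    }
    return hash;
}

// Pick the concurrency bucket for a routed activity.
function bucketFor(streamId, activityType) {
    return hashString(streamId + '#' + activityType) % NUM_BUCKETS;
}

// Two "content-share" activities routed to the same user's stream share a bucket.
var first = bucketFor('u:cam:userA', 'content-share');
var second = bucketFor('u:cam:userA', 'content-share');
console.log(first === second); // true
```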
37. Clustering Activity Processors (cont’d)
• Activity0 and Activity1 were serialized into bucket 0 to
avoid concurrency collisions in the stream of “user A”
• A bucket is only collected by one activity processor at
a time
• An activity processor can concurrently collect multiple
buckets (max concurrent buckets is configurable)
• Number of buckets is configurable (is 3 in this
example)
38. Clustering Preview Processor
• Regular application node that is specialized
to handle preview processing tasks
• GENERATE_PREVIEWS
• Offloads the CPU and memory-intensive
process of generating previews to machines
that don’t impact user request latency
40. Hilary design and extension points
• Common patterns
• Search
• Activities
• File storage
• Preview Processor
41. Search producers / transformers
• Search producers
• Generate documents that need to go in the
index
• Search transformers
• Transform query results coming back from ES
into something the UI can use
42. Search Producers
• Produces documents that can be indexed/stored
by ElasticSearch
• Simple JSON document
• A search document contains the full profile (i.e., no data
is hidden)
• Runs on separate Search Indexing servers
44. Search Transformers
• Transforms an ElasticSearch result to something
the UI can use.
• Hides sensitive user data (if necessary)
• Adds thumbnail URLs
• Runs on the application servers
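The split between producers and transformers might look roughly like the sketch below. Everything in it is hypothetical: the function names, the document fields and the `/thumbnails/` URL prefix are invented for illustration and are not Hilary's actual API. It only shows the idea that the indexed document is complete, while the transformer decides what the current user gets to see.

```javascript
// Producer: builds the full document that gets indexed (nothing is hidden).
function produceUserDocument(user) {
    return {
        id: user.id,
        resourceType: 'user',
        displayName: user.displayName,
        publicAlias: user.publicAlias,
        thumbnailUri: user.thumbnailUri
    };
}

// Transformer: shapes one search hit for the UI. Sensitive fields are only
// exposed when the current user is allowed to see the profile, and the
// stored thumbnail reference is turned into a URL the browser can load.
function transformUserDocument(doc, canSeeProfile) {
    var result = {
        id: doc.id,
        resourceType: doc.resourceType,
        displayName: canSeeProfile ? doc.displayName : doc.publicAlias
    };
    if (canSeeProfile && doc.thumbnailUri) {
        result.thumbnailUrl = '/thumbnails/' + encodeURIComponent(doc.thumbnailUri);
    }
    return result;
}

var doc = produceUserDocument({
    id: 'u:cam:bert',
    displayName: 'Bert Pareyn',
    publicAlias: 'Bert',
    thumbnailUri: 'local:bert.png'
});
console.log(transformUserDocument(doc, false));
// { id: 'u:cam:bert', resourceType: 'user', displayName: 'Bert' }
```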
46. Custom search queries
• Exposed as a REST API
• Ex:
• /api/search/general
• /api/search/content-library
• /api/search/<search name>
47. Custom search queries
// GET http://cam.oae.com/api/search/custom-foo?q=bar
var SearchAPI = require('oae-search');
// SearchUtil (used below) is assumed to be required elsewhere in the module
SearchAPI.registerSearch('custom-foo', function(ctx, opts, callback) {
    // The query you need to write
    // opts.q = "bar"
    var query = …
    // Scope the results to what the current user can access
    var filter = ..
    callback(null, SearchUtil.createQuery(query, filter, opts));
});
48. Activities
• Follows the activitystrea.ms spec
• Each activity has:
• an object (content item “presentation.ppt”)
• an actor (Branden Visser)
• a target (Bert Pareyn)
• a verb (to share)
• End up in an activity stream
• Generated by separate activity servers
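An activity with these four parts could be represented as a plain object like the one below. The ids, the timestamp and the `objectType` values are made up for illustration; only the actor / verb / object / target structure comes from the slide.

```javascript
// "Branden Visser shared presentation.ppt with Bert Pareyn"
var activity = {
    published: '2013-06-12T10:00:00.000Z',
    verb: 'share',
    actor: { objectType: 'user', id: 'u:cam:branden', displayName: 'Branden Visser' },
    object: { objectType: 'content', id: 'c:cam:presentation', displayName: 'presentation.ppt' },
    target: { objectType: 'user', id: 'u:cam:bert', displayName: 'Bert Pareyn' }
};

// Render a compact, human-readable summary of an activity.
function summarize(a) {
    var text = a.actor.displayName + ' [' + a.verb + '] ' + a.object.displayName;
    if (a.target) {
        text += ' -> ' + a.target.displayName;
    }
    return text;
}

console.log(summarize(activity));
// Branden Visser [share] presentation.ppt -> Bert Pareyn
```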
50. Activities and notifications
• Notifications are “special” activities that
are routed to a separate activity stream
• E-mails can be sent out for notifications
• Piggy-backing notifications on activities
gives you free aggregation
52. Activity seeds
• When something happens an activity seed
is created and sent out to RabbitMQ
• Contains the data for the Activity Servers
to produce the persisted activities and
generate the routes
53.
ContentAPI.on('content-comment', function(ctx, comment) {
// Create the actor, object and target objects for the activity
..
// Construct the activity seed
..
// Submit to RabbitMQ
ActivityAPI.postActivity(ctx, activitySeed);
});
Local events get offloaded to RabbitMQ
54. Activity Producers
• Produces the persisted entity that should be
stored for each activity.
• Each entity should hold all the data necessary for
producing routes and transforming into UI friendly
data
• Should try to be compact, as activities will be denormalized
and an entity will be saved per stream
(each user has at least 2 streams, so this is a lot of data)
• Produced on separate activity servers
57. Activity routes
• Activities can be routed to “activity streams”
• Each user and group has an activity stream; users also have a notification
stream
• Routing is the process of taking an activity and
determining who should receive the activity
• A route is a simple string with the ID of the
principal to which the activity should be delivered
• Routed on separate activity servers
58. Activity Routes - Propagation
• Permissions/privacy in activities
• Possible values:
• ANY: The produced entity data can be routed to any route, i.e., the entity is public or loggedin
• ROUTES: The entity can only be propagated to the activity routes specified by the router
• SPECIFY: Specify additional routes the entity should be routed to
• ex: Branden adds Bert to the private group “OAE-Team”
• Actor: Branden, Object: “OAE-Team” group, Target: Bert
• The default routing would generate an activity for all the managers of the OAE-Team group, so a
route for Bert needs to be added as well
60. Activity Transformers
• Transforms persisted activities into
activitystrea.ms compliant results
• Adds OAE specific data that can be
consumed by the UI.
• ex: More images
62. File storage
• New file storage backends can be plugged in
• Available storage backends:
• Local disk storage / Mounted NFS
• Amazon S3
• Try not to serve actual file bodies with Hilary
63. Preview Processor
• Generates thumbnail and large readable preview
images of content in the system
• Uses the REST API to interact with OAE
• Isolation from the application server
• Informed of new content to process by RabbitMQ
• Allows for multiple processors, prevents
dropped messages
64. Preview Processor
• Existing processors for:
• Collabdocs - Uses webshot to take a browser screenshot of the etherpad
• Images - GraphicsMagick used to make various sized thumbnails
• Office Docs and PDFs - Individual page snapshots using LibreOffice and GraphicsMagick,
so the whole document may be viewed in the browser
• Arbitrary links
• Flickr, SlideShare, Vimeo, YouTube: Special handling for REST APIs to fetch display
name, description and preview images directly
• Other links: Uses webshot to take a screenshot of links for which it doesn’t have a
specific handler
• Creating custom processors
• New processors can be added in new NPM modules
• Uses a registration pattern to hook in to the Preview Processor
• Flexible meta-data to back custom widgets
68. Workflow
1. Generate data
2. Load data into the system
3. Tsung tests
4. Circonus
5. Analysis
69. Workflow
• Establish a baseline by cycling through
testing, analysis and improvement
• Iterate-and-improve
• Try and maintain an acceptable baseline
when adding new features
70. OAE model loader
• NodeJS tool that generates and loads data
• Tries to reflect real life scenarios
• ex: 30% of the members of a group
should be managers
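Generating data from a distribution like this can be sketched as below. This is an assumption-laden illustration rather than the model loader's actual code: the seeded generator, the function names and the ids are invented for the example. A seeded generator is used so that a generation run can be repeated with the same outcome.

```javascript
// Small seeded pseudo-random generator (a linear congruential generator).
// Returns a function that yields numbers in [0, 1).
function createRandom(seed) {
    var state = seed >>> 0;
    return function() {
        state = (state * 1664525 + 1013904223) >>> 0;
        return state / 4294967296;
    };
}

// Give each member a role; roughly `managerRatio` of them become managers.
function assignRoles(memberIds, managerRatio, random) {
    var roles = {};
    memberIds.forEach(function(id) {
        roles[id] = random() < managerRatio ? 'manager' : 'member';
    });
    return roles;
}

var members = ['user-0', 'user-1', 'user-2', 'user-3', 'user-4'];
console.log(assignRoles(members, 0.3, createRandom(42)));
```

Running it again with the same seed produces the same assignment, which is what makes the generated files re-runnable.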
71. Model loader - Generation
• Generates re-runnable JSON files that define data to
be loaded into the system
• All data is based on predefined distributions and can
be tweaked
• Supports:
• Users
• Groups
• Content
• Links / Collaborative documents / Files
• Discussions
72. Model loader - loading
• Loads the data into the system
• Writes the generated IDs to disk so they
can be re-used in the Tsung tests
73. Tsung
• Tsung is a distributed load-testing tool written in
Erlang
• Able to simulate thousands of concurrent
users
• Used to stress test the application
74. Tsung
• Takes an XML file that defines the HTTP requests and fires them off at the
server
• Sessions
• Each session has a probability of execution
• Contains: Transactions, dynamic variables
• Transactions
• Contains: Requests, Thinktime
• Requests
76. node-oae-tsung
• Tsung’s XML syntax is hard and boring, and XML files can get
huge
• => Automate it
• NodeJS tool to generate XML file
• Uses generated IDs from the model loader data load
• Contains 3 layers:
• API
• Tests
• Suites
77. node-oae-tsung API
• Each method in the Tsung API represents a high-level
user action against your UI
• Performs the REST API requests that would get called
when a user performs that “click” / action / page visit
• Encapsulates those requests into a “transaction”
• ex: Loading the group members page would do a
request to:
• GET /api/me
• GET /api/group/<group>/members
78. node-oae-tsung API
// “Show the members of a group”-page
var members = module.exports.members = function(session, group) {
var tx = session.addTransaction('group_members');
tx.addRequest('GET', '/api/me');
tx.addRequest('GET', '/api/group/' + group + '/members');
};
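To make the mapping to Tsung XML concrete, here is a minimal sketch of what could sit behind `addTransaction` and `addRequest`. The `Session` class, `think` and `toXml` are assumptions written for this example and are not the actual node-oae-tsung internals; the emitted `session`, `transaction`, `request`, `http` and `thinktime` elements follow Tsung's scenario format.

```javascript
// Escape the few characters that are unsafe inside an XML attribute value.
function escapeAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/"/g, '&quot;');
}

function Session(name, probability) {
    this.name = name;
    this.probability = probability;
    this.steps = [];
}

// Start a named transaction; its requests are grouped together by Tsung.
Session.prototype.addTransaction = function(name) {
    var tx = {
        name: name,
        requests: [],
        addRequest: function(method, url) {
            this.requests.push({ method: method, url: url });
        }
    };
    this.steps.push({ type: 'transaction', tx: tx });
    return tx;
};

// Pause the simulated user for a number of seconds.
Session.prototype.think = function(seconds) {
    this.steps.push({ type: 'think', seconds: seconds });
};

Session.prototype.toXml = function() {
    var lines = ['<session name="' + escapeAttr(this.name) + '" probability="' +
        this.probability + '" type="ts_http">'];
    this.steps.forEach(function(step) {
        if (step.type === 'think') {
            lines.push('  <thinktime value="' + step.seconds + '"/>');
            return;
        }
        lines.push('  <transaction name="' + escapeAttr(step.tx.name) + '">');
        step.tx.requests.forEach(function(req) {
            lines.push('    <request><http url="' + escapeAttr(req.url) +
                '" method="' + req.method + '" version="1.1"/></request>');
        });
        lines.push('  </transaction>');
    });
    lines.push('</session>');
    return lines.join('\n');
};

var session = new Session('group_members_page', 100);
var tx = session.addTransaction('group_members');
tx.addRequest('GET', '/api/me');
tx.addRequest('GET', '/api/group/g:cam:oae-team/members');
session.think(2);
console.log(session.toXml());
```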
79. node-oae-tsung test case
• Each test case describes a possible user session by executing a
number of “transactions” / API methods:
• User logs in
• User searches for groups
• User visits a group
• Thinks a bit
• Searches for users to add
• Adds some users
• Performs a general search
• Visits a content item
• Shares that content item with a group
• Logs out
• End result is a Tsung “session” object that contains many
transactions and “think times”
80.
module.exports.test = function(runner, probability) {
probability = probability || 100;
// Create a new session.
var session = runner.addSession('add_group_users', probability);
var user = User.login(session, '%%_group_add_users_manager_username%%', '%%_group_add_users_manager_passwo
var groupId = '%%_group_add_users_group_id%%';
Group.profile(session, groupId);
session.think(2);
// Go to the members list
Group.members(session, groupId);
session.think(6);
// Add 2 users
var update = {
'%%_group_add_users_user_0%%': 'member',
'%%_group_add_users_user_1%%': 'member'
};
Group.updateMembers(session, groupId, update);
session.think(2);
...
}
81. node-oae-tsung suite
• Contains a list of test cases / sessions you want to include in your test suite (i.e., Tsung XML file)
• Each entry has an optional probability to control the session distribution
• Standard test suite file:
general_interest_term_anon,15
general_interest_term_auth,50
general_interest_content_auth,5
general_interest_group_auth,7
general_interest_user_auth,10
private_groups_interest,40
study_group_content,40
edit_content,40
edit_group,15
add_content_users,10
add_content_groups,5
add_group_users,10
add_group_groups,5
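A suite file in this `name,probability` format is easy to parse. The sketch below is an illustration, not the tool's own parser: the function names are invented, the default of 100 mirrors the `probability || 100` default in the test case above, and the normalization step assumes the weights are scaled to percentages before being written to the Tsung XML.

```javascript
// Parse "name,probability" lines; the probability is optional.
function parseSuite(text) {
    return text.split('\n')
        .map(function(line) { return line.trim(); })
        .filter(function(line) { return line.length > 0; })
        .map(function(line) {
            var parts = line.split(',');
            return {
                name: parts[0].trim(),
                probability: Number(parts[1]) || 100
            };
        });
}

// Scale the weights so they add up to 100 (assumed behaviour).
function normalize(entries) {
    var total = entries.reduce(function(sum, entry) {
        return sum + entry.probability;
    }, 0);
    return entries.map(function(entry) {
        return { name: entry.name, probability: (entry.probability / total) * 100 };
    });
}

var suite = parseSuite('general_interest_term_anon,15\nadd_group_users,10\nedit_group\n');
console.log(normalize(suite));
```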
82. Putting it together
• Need to create a new mix of sessions and distributions? Just
create a new suite file.
• Need to create new sessions that do different things? Just
create new test cases that use the API methods.
• Did web requests in your application change on certain pages?
Just update the requests executed in the API methods, all test
cases / suites update with it.
• New feature in your application? Create the API methods for
the new actions, incorporate into test cases.
• Did your UI get completely overhauled? Oops... remodel your
API methods from scratch.
83. Tsung
• Take the tsung.xml file and run it
• Depending on the session lengths, can take
a couple of hours
• Generates graphs
96. So, does it scale?
• Yes. We can scale the application
horizontally by adding more nodes
• Doubling the hardware roughly doubles
the throughput
99. Case: Permissions
• Identified as a key component that would impact performance
the most
• Permissions are propagated indirectly through group membership
• Steve has access to ContentA via GroupC
• Steve has access to GroupA via GroupC and GroupB
100. Case: Permissions (cont’d)
Attempt #1:
First do a direct association check against the target. If unsuccessful, “explode” and store
denormalized group memberships for the user when the permission check is performed (if
necessary). Group membership changes intelligently try to keep the denormalized groups list
up to date to avoid invalidation.
When doing a permission check, fetch the exploded list of groups and run a select against the
target’s members with the indirect list of groups.
Assumption #1: Most permission checks would be for direct access to resources, so the
exploded check would not happen often
Assumption #2: When needed, fetching the exploded group hierarchy would be quite fast:
fetch one row from Cassandra.
Assumption #3: Selecting matches in the direct target members would be quite fast:
querying a finite number of columns from one row would be fast
Result: Baseline test peaked at about 300 requests per second. Scaling up to larger sizes
would explode the resource costs unacceptably, so we needed to do better.
101. Case: Permissions (cont’d)
What went wrong?
For starters, DTrace analysis shows our app
servers are spending over 90% of their time
serializing / deserializing Thrift bodies from
Cassandra
102. Case: Permissions (cont’d)
How do we fix it?
Maybe we’re querying exploded groups more than we assumed, and there is much
more data in the exploded group memberships than we assumed.
Attempt #2:
Query the target resource’s direct members. If there are no groups, simply compare
the direct members with the source user. If there are groups, explode the groups of
the source user (if not already denormalized), and only query the group ids from the
exploded memberships that are directly associated with the target resource. If there are no
results, there is no access. If there are results, there is access.
Assumption #1: Fetching full “exploded” groups is expensive, so avoid it
Assumption #2: Most permission checks are against resources that don’t have
groups assigned as members, and there are generally far fewer groups assigned as
members than groups a user will indirectly be a member of
Result: Same baseline test peaked at 850 requests per second. That’s reasonable!
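The second attempt can be sketched in memory as follows. This is an illustration under stated assumptions, not the Hilary implementation: plain objects stand in for Cassandra rows, ids starting with `g:` are treated as groups, and the membership graph for Steve is one assumed reading of the earlier example.

```javascript
function isGroup(principalId) {
    return principalId.indexOf('g:') === 0;
}

// Collect every group a principal belongs to, directly or indirectly.
function explodeGroups(principalId, memberships) {
    var seen = {};
    var queue = (memberships[principalId] || []).slice();
    while (queue.length > 0) {
        var groupId = queue.shift();
        if (!seen[groupId]) {
            seen[groupId] = true;
            queue = queue.concat(memberships[groupId] || []);
        }
    }
    return seen;
}

function hasAccess(userId, resourceId, resourceMembers, memberships) {
    var members = resourceMembers[resourceId] || [];
    // Cheap path: the user is a direct member
    if (members.indexOf(userId) !== -1) {
        return true;
    }
    // No groups on the resource: nothing indirect to check, skip exploding
    var memberGroups = members.filter(isGroup);
    if (memberGroups.length === 0) {
        return false;
    }
    // Only look up the few groups attached to the resource in the exploded set
    var exploded = explodeGroups(userId, memberships);
    return memberGroups.some(function(groupId) {
        return exploded[groupId] === true;
    });
}

// principal -> groups it is a direct member of (assumed example data)
var memberships = { 'u:steve': ['g:C'], 'g:C': ['g:B'], 'g:B': ['g:A'] };
// resource -> direct members
var resourceMembers = { 'c:contentA': ['g:C'], 'g:A': ['g:B'], 'c:private': ['u:bert'] };

console.log(hasAccess('u:steve', 'c:contentA', resourceMembers, memberships)); // true
console.log(hasAccess('u:steve', 'c:private', resourceMembers, memberships));  // false
```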
103. Case: Activity
Activity Aggregation is the process of collecting multiple
“similar” activities into a single aggregated activity. In order to
accomplish this, lots of “recent activities” need to be pooled
together, and consulted for each routed activity.
Attempt #1: Store the activity history and activity buckets
in Cassandra
Result: With 3 Cassandra nodes, it breaks down at a routed
activity throughput of 135 per second during data-load.
Equivalent to roughly 1.3 activities per second in our tests.
Not acceptable.
104. Case: Activity (cont’d)
Problem #1: Cassandra latency reports in DataStax OpsCenter show that the Cassandra
nodes are surpassing 5000 requests per second, latency is climbing to over 10 seconds,
and load on the Cassandra servers is pushing 7.
Solution #1: Move aggregation into Redis. The data is volatile and gets evicted over time
anyway. Avoids cluster co-ordination gossip and disk I/O for a large part of the activity
load.
Problem #2: Initial tests showed the memory footprint easily skyrocketed over 8 GB.
Not acceptable.
Solution #2: Normalize the aggregated entities in Redis rather than duplicating them for
each activity aggregation entry.
Result: With 3 Cassandra nodes and 3 activity servers, collection happens at ~1500
routed activities per second (translates to 15 activities per second). With 3 Cassandra
nodes and 6 activity servers, collection was occurring at 2500 routed activities per
second, which is as fast as the mass data-load was creating them. Memory footprint in all
tests remains less than 4 GB for the entire duration. Cassandra remains stable in all tests.
Good to go!
106. Deployment and Automation
• As you can imagine, many machines to manage. Current inventory:
• 3x Cassandra
• 2x Redis
• 2x RabbitMQ
• 4x Application + Indexer
• 3x Preview Processor
• 1x Activity Processor
• 1x Nginx
• 3x Etherpad
• Performance testing with a cluster of 21 virtual machines
• Additional scalability testing and verification with ~30 virtual machines
107. Puppet
• Use puppet to centralize machine configuration and prevent configuration drift
• Collection of “Manifests” that define the state that the machine should be in based on
its hostname / role:
• What files should exist? What should their contents be?
• What packages should be installed?
• What services should be running, or stopped?
• http://github.com/sakaiproject/puppet-hilary
• All 20+ machines in the cluster have Puppet installed, which asks for “catalog” info
(expected configuration state) from a single puppet master machine
• Puppet Master knows how to determine the machine state from the manifests
based on its host (e.g., db0 is a cassandra node, it should have cassandra, java, etc...)
• Use puppetdb with “Exported Resources” to share machine-specific information
with each other node in the cluster
108. Hiera
• Serves configuration information for puppet
• JSON-based data-format, which can be inherited in a flexible manner
• Keeps large complex configuration data as clean as possible
{
"classes": [
"::oaeservice::hosts",
"::oaeservice::firewall::open",
"::oaeservice::rsyslog"
],
"nodetype": "%{nodetype}",
"nodesuffix": "%{nodesuffix}",
"web_domain": "oae-performance.oaeproject.org"
}
109. MCollective
• Provides parallel execution over a number of machines at
one time
• Start / Stop / Check status of services
• Install / Remove / Check version of packages
• Use puppet resource syntax to check ad hoc machine
facts
• Apply puppet manifests
• Each cluster node subscribes to an ActiveMQ server to
receive commands. Central machine (the “client”) publishes
the command and waits for reply
110. Slapchop
• Missing piece: We need to create 21 machines of different specs in a
cloud service, and somehow get MCollective on them
• A tool we lovingly call slapchop
• Define a JSON manifest that holds machine configs and instances
• Run slapchop to create the machines in Joyent cloud, start them, get
mcollective installed
• Well, kind of...
• Now you can log in to the MCollective client and run mco puppet
apply
• Well, kind of...
• Go from empty cloud to working 21 machine cluster in ~15 minutes
111. Nagios
• Provides monitoring and alerts
• Cassandra health
• Disk space
• ElasticSearch JVM stats (memory usage, garbage collection)
• Application server health
• OS memory usage
• Nginx health
• RabbitMQ queue sizes
• RabbitMQ health
• Redis health
• Nagios NRPE scripts deployed with puppet
113. Munin
• We are using Munin for time-series OS
resource statistics
• Disk, network, redis, load, memory
• Deployed automatically through puppet
manifests to all nodes
116. Security
• Machines deployed in private VLAN -- private interfaces are isolated
• Public interface firewall completely closed on all nodes
• Single bastion node with public SSH enabled, key-only authentication
• Single Nginx node with public Web ports enabled
• TCP SYN cookies enabled for publicly exposed machines to prevent SYN floods
• Rate-limiting to 50 req/s per source IP applied for Web API requests
• Several XSS tests performed and all issues followed up
• Using OWASP jQuery plugin for XSS filtering of user-created data
• All infrastructure security deployed automatically with puppet
• Penetration testing performed by University of Mercia
• UI vulnerability testing performed by SCIRT group
126. Twitter Bootstrap
• Re-usable, consistent CSS is hard
• Most popular CSS framework
• Documentation already there
• Basic components, styles, etc.
• Override where necessary
145. Namespacing
• Widgets share same container
• Avoid clashes
• Namespace:
• HTML IDs
• CSS classes
• jQuery selectors
146. Widget JS
• Require the APIs the widget needs
• Return function to be executed as widget
147. i18n
• UI available in multiple languages
• Standard .properties files
• 2 types of bundles
• Core bundles
• Widget bundles
148. i18n
Translation priority
1. Widget user language file
2. Widget default language file
3. Container user language file
4. Container default language file
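A lookup that follows this priority order could look like the sketch below. It assumes the fourth and last fallback is the container's default language bundle, and the bundle property names and keys are invented for illustration. Bundles are plain key/value objects, as they would be after parsing the .properties files.

```javascript
// Return the first translation found, checking bundles in priority order.
// Falls back to the key itself when no bundle has a translation.
function translate(key, bundles) {
    var order = [
        bundles.widgetUser,
        bundles.widgetDefault,
        bundles.containerUser,
        bundles.containerDefault
    ];
    for (var i = 0; i < order.length; i++) {
        var bundle = order[i];
        if (bundle && Object.prototype.hasOwnProperty.call(bundle, key)) {
            return bundle[key];
        }
    }
    return key;
}

var bundles = {
    widgetUser: { TITLE: 'Mes groupes' },
    widgetDefault: { TITLE: 'My groups', EMPTY: 'No groups yet' },
    containerUser: { SAVE: 'Enregistrer' },
    containerDefault: { SAVE: 'Save', CANCEL: 'Cancel' }
};

console.log(translate('TITLE', bundles));  // Mes groupes
console.log(translate('EMPTY', bundles));  // No groups yet
console.log(translate('CANCEL', bundles)); // Cancel
```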
150. i18n
• English
• French
• German
• Italian
• Spanish
• Russian (Partial)
• Chinese (Partial)
151. l10n
• API methods for localizing:
• Timezones
• Date Formatting
• Currency
152. UI templating
• TrimPath
• Avoids lots of DOM manipulation
• Pass in JSON data
• Supports if statements, for loops, etc.
153. UI templating
• Templates are defined in between <!-- -->
• oae.api.util.template().render(...)
154. UI templating
• Template
<div id="example_template"><!--
<h4>Welcome ${firstName}.</h4>
You are ${profile.age} years old.
--></div>
• Input
oae.api.util.template().render($("#example_template"), {
"firstName": "John",
"profile": {
"placeofbirth": "Los Angeles",
"age": 45
}
});
• Result
<h4>Welcome John.</h4>
You are 45 years old.
155. UI templating
• Template
<div id="example_template"><!--
{if score >= 5}
<h1>Congratulations, you have succeeded!</h1>
{elseif score >= 0}
<h1>Sorry, you have failed</h1>
{else}
<h1>You have cheated</h1>
{/if}
--></div>
• Input
oae.api.util.template().render($("#example_template"), {
"score": 6
});
• Result
<h1>Congratulations, you have succeeded!</h1>
156. UI templating
• Template
<div id="example_template"><!--
{for conference in conferences}
<div>${conference.name} (${conference.year})</div>
{forelse}
<div>No conferences have been organized</div>
{/for}
--></div>
• Input
oae.api.util.template().render($("#example_template"), {
"conferences": [
{"name": "Sakai San Diego", "year": 2013},
{"name": "Sakai Atlanta", "year": 2012}
]
});
• Result
<div>Sakai San Diego (2013)</div>
<div>Sakai Atlanta (2012)</div>
161. Tenant configuration
• Configure global defaults (which tenants can override) or
individual tenant configurations
• Configure on the fly
• Single Sign On integration
• Default UI language
• Default visibility settings
• Data storage settings
• etc.
167. Extending with NPM
• NPM - Node Package Manager
• Dependency management, including remotely fetching custom modules from
the NPM registry or GitHub
• Stored inside the node_modules directory of your project
• Usually a logical set of functionality (e.g., a back-end REST API, or a set of
related widgets)
• NPM modules in 3akai-ux are searched for custom widgets
• NPM modules in Hilary (whose names start with oae-) are searched for init.js to
integrate into the application container
• New dependencies can be added to package.json file
• Changes to this file must be maintained with a patch, though :(
168. UI Release Processes
• Grunt
• Task-based build system implemented in JavaScript
• Similar in theory of operation to Make, Rake
• Rich ecosystem of plug-ins to do most tasks
• Easy to implement new task when a plugin doesn’t exist
yet
• Used for running test suites, production builds, linting
tools
169. UI Release Processes
• Production Build
• Optimizes the static assets to reduce bandwidth and request frequency, and to improve
caching across versions
• Require.js Optimization:
• Concatenate JavaScript dependencies (reduces number of web requests significantly)
• Minify / Uglify JavaScript files (reduces payload sizes significantly, even when gzip is
enabled on web server)
• Hash optimization:
• Hash the contents of static assets and append result to the filename, then cache
them indefinitely on the browsers
• When the files change, the hash in the filename changes to force reloading of the
updated asset
• If a file never changes across versions, the client never reloads it until its cache is
cleared
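The hash optimization can be illustrated with Node's built-in crypto module. This is a sketch only: the function name, the SHA-1 choice, the 8-character prefix and the `name.hash.ext` layout are assumptions, not necessarily what the production build does.

```javascript
var nodeCrypto = require('crypto');
var path = require('path');

// Build a cache-busting filename from the file's contents. The name only
// changes when the contents change, so browsers can cache it indefinitely.
function hashedName(filename, contents) {
    var hash = nodeCrypto.createHash('sha1').update(contents).digest('hex').slice(0, 8);
    var ext = path.extname(filename);
    var base = filename.slice(0, filename.length - ext.length);
    return base + '.' + hash + ext;
}

var v1 = hashedName('js/main.js', 'console.log("v1");');
var v2 = hashedName('js/main.js', 'console.log("v2");');
console.log(v1);        // js/main.<8 hex characters>.js
console.log(v1 === v2); // false: changed contents produce a new filename
```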
170. Developer Resources: Widget SDK
• Contains help on creating widgets
• Code best practices
• Design style guide
• UI and API documentation
• Widget Builder
• Examples
171. Developer Resources: Docs UI
• UI that has documentation automatically
generated from the docs in the Hilary and
3akai-ux source code
• Accessible from /docs path of any tenant