Using Amazon S3 & CloudFront for syllabus storage & delivery; SNS & SQS for a message bus; and Elasticsearch on ECS for content search.
Eric Grysko (Software Engineer, Student Services IT)
Presented on Aug 30 2017 to Software Development - Special Interest Group at Cornell University
Cornell University, Central Syllabi - A look at the ingredients for a major new Class Roster feature.
1. Student Services IT
Central Syllabi
A look at the ingredients for a major new Class Roster feature.
classes.cornell.edu
ECS S3 SNS SQS CloudFront SES
2. Student Services IT
Supported Customers
Division of Student and Campus Life (SCL)
‣ Athletics & Physical Education
‣ Dean of Students
‣ Career Services
‣ Cornell Health
‣ Chimes
‣ Public Service Center
‣ Campus Life Enterprise Services (Cornell Store, Housing, Dining...)
‣ Student Disability Services
‣ Campus & Community Engagement
‣ ... and more
Vice Provost for Enrollment
‣ University Registrar
‣ Student Employment
‣ Financial Aid
‣ Undergraduate Admissions
‣ Graduate School
‣ ... and more
3. Central Syllabi
Class Roster
‣ Official schedule of classes
classes.cornell.edu
‣ Office of the University Registrar
‣ October 2014
Launched on-premises
Symfony Framework (PHP), MongoDB, MySQL
‣ February 2016
AWS Migration
AWS (EC2, RDS, ELB, ASG), Jenkins, Ansible
‣ March 2016
Released "Scheduler"
AngularJS
‣ June 2017
Released "Central Syllabi"
4. Central Syllabi
New Functionality - June 2017
‣ Expanded API
Internal & Academic Units
‣ Syllabus Upload
Instructors & Admin. Asst.
‣ Syllabus Download
All
‣ Syllabus Content Search
All
‣ Email Alerts
All
‣ Reports
Instructors, Admin Asst., and Staff
5. Central Syllabi
New Ingredients
‣ Expanded API
Internal & Academic Units
Message Bus: Amazon SNS + SQS
‣ Syllabus Upload
Instructors & Admin. Asst.
Object Storage: Amazon S3
‣ Syllabus Download
All
Content Delivery: Amazon CloudFront
‣ Syllabus Content Search
All
Full Text Search: Elasticsearch on Amazon ECS
‣ Email Alerts
All
Notifications: Amazon SES (API, not SMTP)
‣ Reports
Instructors, Admin Asst., and Staff
6. Central Syllabi
‣ Class Roster introduced "API 4.0" to support the "Central Syllabi" AngularJS app
‣ Stumbling into an event-driven model and a message bus
‣ Amazon SNS - Simple Notification Service
‣ Publish/Subscribe (Pub/Sub) Messaging Service
‣ Publish messages to SNS Topic using AWS SDK/API
‣ Subscribe endpoints (SQS, HTTPS, AWS Lambda, SMS, etc)
‣ Amazon SQS - Simple Queue Service
‣ SQS Queue subscribed to SNS Topic
‣ ReceiveMessage from SQS using AWS SDK/API
‣ At-least-once delivery; message order is not guaranteed
‣ Easy entry point to pub/sub
‣ AWS SDK; serialize/deserialize objects; publish to a topic and receive from a queue
‣ Pricing: Less than $1/million messages
Expanded API
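A minimal Python sketch of the publish/receive round trip described on this slide. It assumes the SQS subscription does not use raw message delivery, so each SQS body is the standard SNS JSON envelope with the event carried as a string in its `Message` field. No AWS calls are made; the helper names, the topic ARN, and the sample payload are hypothetical.

```python
import json


def serialize_event(event_type, payload):
    """Serialize an event as the JSON string that would be passed to sns:Publish."""
    return json.dumps({"type": event_type, **payload}, sort_keys=True)


def wrap_as_sns_notification(message, topic_arn):
    """Simulate the envelope SNS places around a message delivered to SQS.

    Only the fields used below are included; real envelopes carry more
    (MessageId, Timestamp, Signature, ...).
    """
    return json.dumps({"Type": "Notification", "TopicArn": topic_arn, "Message": message})


def unwrap_sqs_body(body):
    """Deserialize the event from an SQS message body that holds an SNS envelope."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        raise ValueError("not an SNS notification envelope")
    return json.loads(envelope["Message"])


# Placeholder ARN and payload, for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:000000000000:example-topic"

published = serialize_event("SyllabusFileUpload", {"syllabusId": "abc123", "rosterId": 13})
sqs_body = wrap_as_sns_notification(published, TOPIC_ARN)
event = unwrap_sqs_body(sqs_body)
```

In a real worker the body would come from sqs:ReceiveMessage rather than from `wrap_as_sns_notification`.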
7. Central Syllabi
Events
Published to SNS Topic (JSON)
SyllabusCreate SyllabusFileUpload SyllabusAttach ReportGenerate
SyllabusUpdate SyllabusFileDelete SyllabusDetach ReportRun
SyllabusReplace SyllabusFileUnsafe SyllabusPublish RosterRefresh
SyllabusDelete SyllabusUnpublish
SyllabusSearchIngest
8. Central Syllabi
‣ Publish hydrated messages to SNS
‣ SQS Queue subscribed to SNS Topic
‣ Worker Process (Run Loop)
‣ SQS Long Polling (20s) sqs:ReceiveMessage
‣ Processing of a message may trigger additional sns:Publish calls
‣ On success: sqs:DeleteMessage
‣ Advantages
‣ decouple components
‣ async/non-blocking; can parallelize
‣ "Maximum Receives"; redrive policy to DLQ (Dead Letter Queue)
‣ CloudWatch monitor message age (alarmed on failure)
‣ Facilitates future integrations; ex. publishing messages not internally used
(RosterRefresh)
‣ Challenges
‣ Handlers must be idempotent ("at least once" delivery, order not guaranteed); not using FIFO queues
‣ Fast (async)
Event Processing
{
  "type": "SyllabusPublish",
  "syllabusId": "...",
  "rosterId": 13,
  "roster": {
    "strm": 2678,
    "rosterSlug": "FA17",
    ...
  },
  "syllabus": {
    "viewPermission": ...,
    "publishedDttm": ...,
    "resourceId": ...,
    ...
  }
}
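The run loop and the idempotency requirement on this slide can be sketched in Python as follows. This is an illustration under assumptions, not the Class Roster worker: `InMemoryQueue` stands in for SQS so the loop runs without AWS, and `handle_publish`, `run_worker`, and the sample message are hypothetical.

```python
import json


class InMemoryQueue:
    """Stand-in for an SQS queue (an assumption, so the sketch needs no AWS)."""

    def __init__(self, bodies):
        self._messages = [
            {"ReceiptHandle": str(i), "Body": body} for i, body in enumerate(bodies)
        ]

    def receive(self, wait_seconds=20):
        # A real worker would call sqs:ReceiveMessage with WaitTimeSeconds=20.
        return list(self._messages[:1])

    def delete(self, receipt_handle):
        # Equivalent of sqs:DeleteMessage, called only after successful processing.
        self._messages = [m for m in self._messages if m["ReceiptHandle"] != receipt_handle]


def handle_publish(event, published_state):
    """Idempotent handler: applying the same event twice leaves the same state."""
    published_state[event["syllabusId"]] = True
    # Follow-up event that the worker would publish via sns:Publish.
    return {"type": "SyllabusSearchIngest", "syllabusId": event["syllabusId"]}


def run_worker(queue, published_state, outbox):
    """Drain the queue. If a handler raises, the message is not deleted,
    so the queue would redeliver it (and eventually redrive it to the DLQ)."""
    while True:
        messages = queue.receive(wait_seconds=20)
        if not messages:
            break  # a long-running worker would keep polling instead
        for message in messages:
            event = json.loads(message["Body"])
            if event["type"] == "SyllabusPublish":
                outbox.append(handle_publish(event, published_state))
            # Other event types are acknowledged without work in this sketch.
            queue.delete(message["ReceiptHandle"])


# The same message delivered twice, as "at least once" delivery allows.
duplicate = json.dumps({"type": "SyllabusPublish", "syllabusId": "abc123"})
state, outbox = {}, []
run_worker(InMemoryQueue([duplicate, duplicate]), state, outbox)
```

The duplicate delivery leaves `state` unchanged but emits two follow-up events, which is why the downstream handler has to be idempotent as well.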
9. Central Syllabi
‣ Amazon S3 (Simple Storage Service)
‣ Unlimited object storage (Key/Value); not a file system
‣ Durability: 99.999999999%
‣ Pricing: about 2 cents per gigabyte-month
‣ S3 Bucket "cu-classroster-prod"
‣ Evaluation of several permissions (IAM, Bucket Policy, ACL)
‣ Bucket policy: requires encryption (AES256) and grants CloudFront OAI (user) access
‣ CORS Policy
‣ IAM Role w/policy: GetObject, PutObject only. No DeleteObject
‣ Class Roster provides clients temporary credentials to enable direct upload to S3
Syllabus Upload
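The slides do not show the bucket policy itself. The Python sketch below builds one plausible shape for a policy that requires AES256 encryption on upload and grants read access to a CloudFront OAI. The statement IDs and the OAI ID `EXAMPLEOAIID` are placeholders, and the exact statements used in production are an assumption.

```python
import json


def build_bucket_policy(bucket, oai_id):
    """Return a bucket policy dict: deny unencrypted uploads, allow OAI reads."""
    objects_arn = "arn:aws:s3:::" + bucket + "/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any PutObject that does not request SSE with AES256.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
            {
                # Let the CloudFront Origin Access Identity read private objects.
                "Sid": "AllowCloudFrontRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/"
                    "CloudFront Origin Access Identity " + oai_id
                },
                "Action": "s3:GetObject",
                "Resource": objects_arn,
            },
        ],
    }


# Bucket name from the slide; the OAI ID is a placeholder.
policy = build_bucket_policy("cu-classroster-prod", "EXAMPLEOAIID")
policy_json = json.dumps(policy, indent=2)
```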
10. Central Syllabi
‣ AngularJS app w/API 4.0
‣ Client given API Token
‣ Client issues $http request
‣ Client request includes an Authorization header
‣ Class Roster API uses the AWS SDK to generate and return a security policy and signature to the client
‣ Client uploads to S3
‣ Browser-based POST upload, Signature Version 4 (authorized as PutObject)
http://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-UsingHTTPPOST.html
‣ Client notifies API on completion
‣ SyllabusFileUpload event published to SNS
Create a POST Policy
{
  "s3Key": "prod/syllabus-file/2678/ae51683709939c0f2ab250e9511606803ab0020e6024e03",
  "formInputs": {
    "acl": "private",
    "X-Amz-Security-Token": "...",
    "key": "${filename}",
    "X-Amz-Credential": "ASIAJKOTJGRPYKUDMUHA/20170828/us-east-1/s3/aws4_request",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Date": "20170828T231949Z",
    "Policy": "...",
    "X-Amz-Signature": "..."
  },
  "formAttributes": {
    "action": "https://cu-classroster-prod.s3.amazonaws.com",
    "method": "POST",
    "enctype": "multipart/form-data"
  },
  "fileId": "4c06f8ee0830e747eb791dd01b9a0b7796"
}
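The `Policy` and `X-Amz-Signature` form inputs above are normally produced by the AWS SDK. For illustration, the Python sketch below derives them by hand with the Signature Version 4 procedure for browser-based POST uploads: the base64-encoded policy is the string to sign, and the signing key is derived from the secret key, date, region, and service. The secret key, the key ID `EXAMPLEKEYID`, and the policy conditions are placeholders, not values from the deck.

```python
import base64
import hashlib
import hmac
import json


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message, returned as raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sign_post_policy(secret_key, policy, date_stamp, region="us-east-1", service="s3"):
    """Return (base64 policy, hex signature) for an S3 POST upload form."""
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # SigV4 signing-key derivation chain: date -> region -> service -> terminator.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    # For POST uploads, the string to sign is the base64-encoded policy.
    signature = hmac.new(k_signing, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()
    return policy_b64, signature


# Hypothetical policy; with temporary credentials an x-amz-security-token
# condition would be required as well.
policy = {
    "expiration": "2017-08-28T23:34:49Z",
    "conditions": [
        {"bucket": "cu-classroster-prod"},
        {"acl": "private"},
        ["starts-with", "$key", "prod/syllabus-file/"],
        {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
        {"x-amz-credential": "EXAMPLEKEYID/20170828/us-east-1/s3/aws4_request"},
        {"x-amz-date": "20170828T231949Z"},
    ],
}

policy_b64, signature = sign_post_policy("example-secret-key", policy, "20170828")
```

Because the signature covers the policy, the client cannot change the key prefix, ACL, or expiration without invalidating the upload.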
11. Central Syllabi
‣ Amazon CloudFront (Content Delivery Network)
‣ CloudFront vs S3
‣ Or ... why are we not using Amazon S3 for delivery?
‣ Distribution
‣ vanity domain: files.classes.cornell.edu + SSL Certificate (AWS Certificate Manager)
‣ Behaviors (path routing to origin)
‣ Origin (S3)
‣ Allow CloudFront distribution to access origin via bucket policy w/OAI (Origin Access Identity)
‣ Class Roster authorizes client to access private content
Syllabus Download
12. Central Syllabi
‣ Signed URL vs Signed Cookie
‣ Signed URL: similar in concept to S3 pre-signed URLs, but signed with a CloudFront key pair (root credentials are needed to create the key pair)
‣ short expiration
‣ content-disposition to rename
‣ Client requests syllabus from Class Roster
‣ https://classes.cornell.edu/download/syllabus-simple/FA17/AEM/1220/1/001
‣ Server returns 302 Redirect to Signed URL:
‣ https://files.classes.cornell.edu/prod/syllabus-file/2678/16d90dc10961b1f4aaf758691fc0ca82b8ca5accc487c85f477a020dc65ffbd
?response-content-disposition=inline; filename="Syllabus-FA17-AEM1220-LEC001-PRIOR-TERM.pdf"
&Expires=<expireUnixTime>
&Signature=<base64Text>
&Key-Pair-Id=<access key>
‣ http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls.html
CloudFront Signed URLs
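A Python sketch of how a signed URL with a canned policy is assembled. The policy JSON and the character substitutions in the base64 output follow the CloudFront documentation. The RSA-SHA1 signing step is passed in as a callable (`rsa_sha1_sign`, hypothetical) so the sketch stays free of third-party crypto libraries; the stand-in signer, the URL, and the key pair ID below are placeholders.

```python
import base64
import json


def cloudfront_b64(data):
    """Base64-encode bytes, replacing the characters that are invalid in a
    CloudFront URL query string: '+' -> '-', '=' -> '_', '/' -> '~'."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url, expires_epoch):
    """Canned policy: one resource and one expiry, serialized without whitespace."""
    policy = {
        "Statement": [
            {"Resource": url, "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}}}
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def sign_url(url, expires_epoch, key_pair_id, rsa_sha1_sign):
    """Append Expires, Signature, and Key-Pair-Id to the URL.

    rsa_sha1_sign: callable taking the policy bytes and returning the raw
    RSA-SHA1 signature made with the CloudFront private key.
    """
    signature = rsa_sha1_sign(canned_policy(url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return "{}{}Expires={}&Signature={}&Key-Pair-Id={}".format(
        url, separator, expires_epoch, cloudfront_b64(signature), key_pair_id
    )


# Stand-in signer that returns fixed bytes; NOT a real signature.
fake_signer = lambda policy_bytes: b"\xfb\xff\xfe"

signed = sign_url(
    "https://files.classes.cornell.edu/prod/syllabus-file/example",
    1503962389,
    "EXAMPLEKEYPAIRID",
    fake_signer,
)
```

A short expiry keeps the redirect target useful only for the download that Class Roster just authorized.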
13. Central Syllabi
‣ Full text search
‣ "Options": RDBMS (MySQL) vs NoSQL (MongoDB) vs Elasticsearch
‣ What is Elasticsearch?
‣ Full text search engine based on Apache Lucene, supported by elastic.co
‣ Open source; massively scalable; RESTful; "E" in ELK Stack
‣ Not for the faint of heart
‣ Considered Amazon Elasticsearch Service
‣ pros: AWS managed service (backups, scaling)
‣ cons: limited plugins; out of date; costly; not in VPC
‣ ** at time of development, lacked attachment pipeline plugin
‣ Decision: Elasticsearch on EC2 Container Service
‣ Utilize official Elastic Docker images for base
docker pull docker.elastic.co/elasticsearch/elasticsearch
‣ Elasticsearch is I/O intensive; the ECS Task Definition uses a placement constraint that pins the task to the container instance carrying an attribute that identifies our persistent EBS volume
‣ Ansible + Jenkins; lay down task definition, service, IAM roles, etc.
Syllabus Content Search
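The placement constraint can be pictured with a Python sketch of an ECS task definition fragment. The `memberOf` constraint type and the `attribute:name == value` expression syntax come from the ECS cluster query language. The attribute name and value, volume name, host path, memory size, and image tag are all hypothetical, since the slides do not give them.

```python
def build_task_definition(image, attribute_name, attribute_value):
    """Return a task definition dict that pins Elasticsearch to tagged instances."""
    return {
        "family": "elasticsearch",
        "containerDefinitions": [
            {
                "name": "elasticsearch",
                "image": image,
                "memory": 2048,  # hypothetical size, in MiB
                "mountPoints": [
                    {
                        "sourceVolume": "esdata",
                        # Data directory used by the official Elastic image.
                        "containerPath": "/usr/share/elasticsearch/data",
                    }
                ],
            }
        ],
        # Host path where the persistent EBS volume is assumed to be mounted.
        "volumes": [{"name": "esdata", "host": {"sourcePath": "/mnt/esdata"}}],
        # Only place the task on instances carrying the custom attribute.
        "placementConstraints": [
            {
                "type": "memberOf",
                "expression": "attribute:{} == {}".format(attribute_name, attribute_value),
            }
        ],
    }


task_definition = build_task_definition(
    "docker.elastic.co/elasticsearch/elasticsearch:<tag>",  # tag left unspecified
    "es-data-volume",  # hypothetical custom attribute name
    "true",
)
```

Tagging exactly one container instance with the attribute means a restarted task lands back on the host that holds the index data.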
14. Central Syllabi
‣ Index - Collection of Types
Syllabi
‣ Type - Consistent Fields within Type
Syllabus
‣ Plugins
ingest-attachment (Apache Tika for text extraction)
‣ APIs (RESTful and JSON) - Indices, Document, Search, etc
PUT /syllabi
PUT /syllabi/syllabus/<syllabusId>
GET /_search
Elasticsearch at a Glance
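To make the API list concrete, the Python sketch below builds the request bodies for an ingest pipeline that uses the ingest-attachment processor, for a document indexed through it, and for a full-text query over the extracted text. It only constructs the requests and sends nothing. The pipeline name `attachment` and the field `data` match the ingestion code in this deck; the function names and the query shape are assumptions.

```python
import base64


def attachment_pipeline(field="data"):
    """Body for PUT /_ingest/pipeline/attachment: run Tika on a base64 field."""
    return {
        "description": "Extract text from uploaded syllabus files",
        "processors": [{"attachment": {"field": field}}],
    }


def index_request(syllabus_id, file_bytes, pipeline="attachment"):
    """Method, path, and body for indexing one syllabus through the pipeline."""
    path = "/syllabi/syllabus/{}?pipeline={}".format(syllabus_id, pipeline)
    body = {
        "syllabusId": syllabus_id,
        # The attachment processor expects base64-encoded file content.
        "data": base64.b64encode(file_bytes).decode("ascii"),
    }
    return "PUT", path, body


def search_request(text):
    """Method, path, and body for a match query on the extracted content."""
    # The processor stores extracted text under attachment.content by default.
    return "GET", "/syllabi/_search", {"query": {"match": {"attachment.content": text}}}


pipeline_body = attachment_pipeline()
method, path, body = index_request("abc123", b"%PDF-1.4 example")
search = search_request("office hours")
```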
15. Central Syllabi
‣ Worker processing of SyllabusPublish results in new event: SyllabusSearchIngest
‣ Worker receives SyllabusSearchIngest
‣ GetObject from S3
‣ Document API
‣ PUT syllabi/syllabus/<id>
‣ Backup vs DR Plan
‣ re-ingest all docs
Elasticsearch Ingestion
// Fetch the uploaded syllabus file from S3 using the key stored with the syllabus.
$key = $syllabus->getGeneralFile()->getS3Key();
try {
    $s3Object = $s3Client->getObject([
        'Bucket' => $this->getS3Bucket(),
        'Key'    => $key,
    ]);
} catch (Exception $e) {
    throw new FileNotFoundException('File Not Found');
}

// The attachment processor expects the file content as a base64-encoded string.
$body = base64_encode((string) $s3Object['Body']);

$esParams = [
    'index'    => 'syllabi',
    'type'     => 'syllabus',
    'id'       => $syllabus->getId(),
    'pipeline' => 'attachment', // ingest pipeline that extracts text from 'data'
    'body'     => [
        'rosterId'    => $roster->getId(),
        'strm'        => $roster->getStrm(),
        'syllabusId'  => $syllabus->getId(),
        'updatedDttm' => $syllabus->getUpdatedDttm()->format(DATE_ATOM),
        'data'        => $body,
    ],
];

// Index (or re-index) the document; the same id overwrites any earlier version.
$result = $this->getElasticsearchClient()->index($esParams);