Ever wonder how some applications are built? Ever wonder how to combine components of the Windows Azure platform? Stop wondering and learn how we’ve built MyGet.org, a multi-tenant software-as-a-service. In this session we’ll discuss architecture, commands, events, access control, multi-tenancy and how to mix and match those things. Learn about the growing pains and misconceptions we had on the Windows Azure platform. The result just may be a reliable, cost-effective solution that scales.
2. Who am I?
Maarten Balliauw
Daytime: Technical Evangelist, JetBrains
Co-founder of MyGet
Author – Pro NuGet http://amzn.to/pronuget
AZUG
Focus on web
ASP.NET MVC, Windows Azure, SignalR, ...
MVP Windows Azure & ASPInsider
http://blog.maartenballiauw.be
@maartenballiauw
5. Agenda
NuGet? MyGet?
How we started
What we did not know
Our first architecture
Our second architecture
Multi-tenancy
ACS
Tough times (learning moments)
When business meets technology
Conclusion
9. Why MyGet?
Safely store your IP with us
Creating packages is hard. We have Build Services!
Granular security
Activity streams
Symbol server
Analytics
10. I’m not alone!
Xavier Decoster
@xavierdecoster
Yves Goeleven
@yvesgoeleven
Also known as @MyGetTeam
13. NuPack!
Using OData as their feeds
Which is some sort of WCF…
Multiple feeds?
Exchanged some ideas with Xavier
Prototyped something during TechDays Belgium, 2011
16. Here’s some code from back then…
[Authorize]
public class FeedController : Controller {
    public ActionResult List() {
        var privateFeedTable = PrivateFeedTable.Create();
        var privateFeeds = privateFeedTable.GetAll(
            f => f.PartitionKey == User.Identity.Name.ToBase64());
        var model = new PrivateFeedListViewModel();
        foreach (var privateFeed in privateFeeds.Where(f => f.IsVisible)) {
            var privateFeedViewModel = new PrivateFeedViewModel();
            model.Items.Add(AutoMapper.Mapper.Map(privateFeed, privateFeedViewModel));
        }
        return View(model);
    }
}
17. How about this one?
try
{
privateFeedNuGetPackageTable.Add(privateFeedPackage);
}
catch
{
// Omnomnom!
}
24. ReSharper time!
A lot of refactoring done
Direct data access -> repositories
Repositories used by services
Services used by controllers
Using best practices
SOLID and DRY (well, not everywhere but refactoring takes time)
Running on two instances (availability, yay!)
25. We became a startup
Someone mentioned they would pay for our service
Think about business model
Volume of feeds and packages kept going up
Users in EU and US
29. Not so awesome…
Best practices!
Are they?
Layers!
No spaghetti code but lasagna code
Typical business application architecture!
Proved to be very inflexible
31. Awesome!
Datacenters near our users
Centralized storage
Packages on CDN for faster throughput
DNS fail-over if one of the DCs goes down
32. Not so awesome…
Datacenters near our users
Or not?
Centralized storage
Speed of light! USA was slow!
Packages on CDN for faster throughput
Sync issues, downtime, …
DNS fail-over if one of the DCs goes down
Seems not every ISP follows DNS standards
33. We persisted!
Local caching in USA added
2 instances in EU, 3 in the USA
Speed of light! Syncing all data kept being slow
Populating cache was a nightmare
CDN kept having issues
Of 3 instances, only 1 was being used with enough load (60%)
34. We were growing!
We had public subscription plans
We added enterprise tenants (multi-tenancy added)
Resulting in…
Architecture became complex
Caching and syncing became complex
37. We had a look at our workloads
Managing feeds and packages
Doesn’t matter much where (sync vs. bandwidth)
Downloading packages
May matter where, let the tenant decide
Builds
Who cares where!
40. Our first architecture…
… was scaled across the globe
… but as synchronous as it could be
… prone to all issues with latency vs. synchrony
Event Driven Architecture?*
*disclaimer: we borrowed some concepts from EDA
41. EDA in MyGet
Some actions put an ICommand on a queue
(ground rule: if it can’t be done in 1 write, use ICommand)
All actions complete with an IEvent on a queue
Handlers can subscribe to ICommand and IEvent
Handlers are idempotent and do not depend on each other
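A minimal sketch of the rules above, not MyGet's actual code. Only the names ICommand and IEvent come from the slide; InMemoryBus, IHandler, LastLoginProjection and the Id property are hypothetical, and the in-memory queue stands in for a real queue service. The handler tracks the message ids it has already applied, so a message that is delivered twice changes state only once.

```csharp
using System;
using System.Collections.Generic;

// ICommand and IEvent are named on the slide; the Id property is an assumption
// that lets handlers recognise duplicate deliveries.
public interface ICommand { Guid Id { get; } }
public interface IEvent { Guid Id { get; } }

// Hypothetical handler contract.
public interface IHandler<in TMessage> { void Handle(TMessage message); }

// Hypothetical in-memory stand-in for the real queue.
public sealed class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<object>>> _subscribers = new();
    private readonly Queue<object> _queue = new();

    public void Subscribe<TMessage>(IHandler<TMessage> handler)
    {
        if (!_subscribers.TryGetValue(typeof(TMessage), out var handlers))
        {
            handlers = new List<Action<object>>();
            _subscribers[typeof(TMessage)] = handlers;
        }
        handlers.Add(message => handler.Handle((TMessage)message));
    }

    public void Enqueue(object message) => _queue.Enqueue(message);

    // Drains the queue and hands each message to the handlers
    // subscribed to its exact type.
    public void ProcessAll()
    {
        while (_queue.Count > 0)
        {
            var message = _queue.Dequeue();
            if (_subscribers.TryGetValue(message.GetType(), out var handlers))
            {
                foreach (var handle in handlers) handle(message);
            }
        }
    }
}

public sealed record UserLoggedInEvent(Guid Id, string UserName) : IEvent;

// Idempotent handler: remembers which event ids it has already applied.
public sealed class LastLoginProjection : IHandler<UserLoggedInEvent>
{
    private readonly HashSet<Guid> _seen = new();
    public int Applied { get; private set; }

    public void Handle(UserLoggedInEvent message)
    {
        if (!_seen.Add(message.Id)) return; // duplicate delivery, nothing to do
        Applied++;
    }
}
```

Because each handler keeps its own state and never calls another handler, handlers can be added or removed without touching the code that raises the event.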
42. Example: log in
2 operations: 1 read, 1 write
Read the profile
Store the profile with LastLogin date
No use of ICommand
Finishes with UserLoggedInEvent
43. Example: change feed owner
Many operations!
Read two user profiles
Read current access rights
Change access rights
Push new privileges to SymbolSource.org
One command, one event
ChangeFeedOwnerCommand
FeedOwnerChangedEvent
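The "one command, one event" flow could look roughly like this. The two type names come from the slide; their properties, the handler class, the dictionary used as storage and the publish callback are assumptions made for illustration. The reads of both profiles and the push to SymbolSource.org are left out.

```csharp
using System;
using System.Collections.Generic;

// Type names from the slide; the properties are assumptions.
public sealed record ChangeFeedOwnerCommand(string FeedName, string CurrentOwner, string NewOwner);
public sealed record FeedOwnerChangedEvent(string FeedName, string PreviousOwner, string NewOwner);

// Hypothetical handler: performs all writes for the command, then publishes one event.
public sealed class ChangeFeedOwnerHandler
{
    private readonly Dictionary<string, string> _feedOwners; // feed name -> owner
    private readonly Action<object> _publish;                // stand-in for "put IEvent on a queue"

    public ChangeFeedOwnerHandler(Dictionary<string, string> feedOwners, Action<object> publish)
    {
        _feedOwners = feedOwners;
        _publish = publish;
    }

    public void Handle(ChangeFeedOwnerCommand command)
    {
        // Idempotent: if the change was already applied, write nothing and publish nothing.
        if (_feedOwners.TryGetValue(command.FeedName, out var owner) && owner == command.NewOwner)
        {
            return;
        }

        _feedOwners[command.FeedName] = command.NewOwner;
        _publish(new FeedOwnerChangedEvent(
            command.FeedName, owner ?? command.CurrentOwner, command.NewOwner));
    }
}
```

Running the same command twice leaves the feed with the new owner and produces a single FeedOwnerChangedEvent, which is what makes blind redelivery from a queue safe.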
45. Gain?
We now run on 2 instances, mostly for redundancy
Average CPU usage? 20% (across machines)
Flexibility!
Way easier to implement new features!
New feature: activity log
Simply subscribe to events we want to see in that log
46. Storage
No relational database (why not?)
Event-driven architecture
How do you store a feed’s packages and versions in an optimal way?
Three important values: feed name, package id, package version
Table per feed
Package id = PartitionKey
Package version = RowKey
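The key layout can be sketched without any storage SDK. PackageEntity, the "feed" table-name prefix and the lower-casing of keys are assumptions; only the mapping (table per feed, package id as PartitionKey, package version as RowKey) is from the slide. The sketch assumes feed names are already valid table names (alphanumeric).

```csharp
using System;

// Hypothetical entity that only carries the three key values.
public sealed record PackageEntity(string TableName, string PartitionKey, string RowKey)
{
    public static PackageEntity For(string feedName, string packageId, string version)
    {
        // Assumption: keys are lower-cased so lookups do not depend on casing.
        return new PackageEntity(
            "feed" + feedName.ToLowerInvariant(), // one table per feed
            packageId.ToLowerInvariant(),          // all versions of a package share a partition
            version.ToLowerInvariant());           // one row per version
    }
}
```

With this layout, "all versions of package X in feed Y" is a query on a single partition, and "version Z of package X" is a point lookup on PartitionKey plus RowKey.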
47. Storage
Reading 1,000 rows and deserializing them is SLOW (many seconds)
We cache some tables on blob storage
1,000 rows in serialized JSON = small
Loading one file over HTTP = fast
Searching in memory through 1,000 rows = fast
Cache update subscribed to IEvent
49. How to bring this into code
Just like Request, Response and User:
a Tenant is contextual
All of those are potentially different for every request
DI containers with lifetimes exist…
50. Resolving a tenant
public interface ITenantContext {
    Tenant Tenant { get; }
}

// Registration in container
builder.RegisterType<RequestTenantContext>()
    .As<ITenantContext>().InstancePerLifetimeScope();

public class RequestTenantContext {
    // Resolves the tenant by matching the host name of the current request
    public Tenant Resolve(RequestContext context, IEnumerable<Tenant> tenants) {
        var hostname = context.HttpContext.Request.Url.Host;
        return tenants.FirstOrDefault(t => t.HostName == hostname);
    }
}
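A self-contained sketch of how a per-request context object could expose the resolved tenant. It is an illustration under assumptions, not the code behind the slide: HostNameTenantContext is a hypothetical name, the Tenant record only has the two properties needed here, the host name is passed in as a plain string instead of being read from RequestContext, and the case-insensitive comparison is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced Tenant type for the sketch; the real one has more members (for example Realm).
public sealed record Tenant(string Name, string HostName);

public interface ITenantContext
{
    Tenant Tenant { get; }
}

// Hypothetical implementation: one instance per request (per lifetime scope).
public sealed class HostNameTenantContext : ITenantContext
{
    private readonly Lazy<Tenant> _tenant;

    public HostNameTenantContext(string requestHost, IEnumerable<Tenant> tenants)
    {
        // Resolve lazily and at most once for this request.
        _tenant = new Lazy<Tenant>(() => tenants.FirstOrDefault(
            t => string.Equals(t.HostName, requestHost, StringComparison.OrdinalIgnoreCase)));
    }

    // Null when no tenant is registered for the host name.
    public Tenant Tenant => _tenant.Value;
}
```

Anything that needs tenant-specific behaviour can then take an ITenantContext as a constructor dependency, the same way it would take the current user.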
54. ACS for MyGet
No more user registration
One single trust relationship (= less coding)
Microsoft Account, Yahoo!, Google, Facebook
Other IdP’s (tenants and our own)*
*We built many others and are working on a spin-off
http://socialsts.com (Twitter, LinkedIn, Microsoft Account, …)
55. One small trick…
var realm = TenantContext.Tenant.Realm;
var allowedAudienceUris = FederatedAuthentication.FederationConfiguration
    .IdentityConfiguration
    .AudienceRestriction
    .AllowedAudienceUris;
// Register the current tenant's realm as an allowed audience if it is not there yet
if (allowedAudienceUris.All(audience => audience.ToString() != realm))
{
    allowedAudienceUris.Add(new Uri(realm));
}
57. Huge downtime on July 2nd, 2012
Symptoms:
Users complaining about “downtime”
No monitoring SMS alert
Half an hour later: “site up!”, “site down!”, “site up!”, “site down!” SMS alerts
No sign of issues in the Windows Azure Management portal
But what’s the cause?
We just deployed our multi-tenant architecture
We just enabled storage analytics
ELMAH was showing storage throttling
16,000 unprocessed commands and events in queue
Full story at http://blog.myget.org/post/2012/07/02/Site-issues-on-July-2nd-2012.aspx
58. Huge downtime on July 2nd, 2012
One, simple piece of code…
GetHashCode() on Package object faulty
GetHashCode() used to track object in data context (new vs. update)
2 objects with the same hashcode = UnhandledException
Full story at http://blog.myget.org/post/2012/07/02/Site-issues-on-July-2nd-2012.aspx
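The slide does not say what exactly was wrong with GetHashCode(), so the following is only a general illustration of the contract a tracked object should honour: objects that are equal return the same hash code, and the hash is computed from immutable identity fields only. The Package class and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Package: identity is (Id, Version), compared case-insensitively.
public sealed class Package : IEquatable<Package>
{
    public Package(string id, string version)
    {
        Id = id;
        Version = version;
    }

    public string Id { get; }
    public string Version { get; }

    // Mutable data is deliberately NOT part of Equals/GetHashCode.
    public string Description { get; set; } = "";

    public bool Equals(Package other) =>
        other is not null
        && StringComparer.OrdinalIgnoreCase.Equals(Id, other.Id)
        && StringComparer.OrdinalIgnoreCase.Equals(Version, other.Version);

    public override bool Equals(object obj) => Equals(obj as Package);

    // Uses the same fields and the same comparer as Equals.
    public override int GetHashCode() => HashCode.Combine(
        StringComparer.OrdinalIgnoreCase.GetHashCode(Id),
        StringComparer.OrdinalIgnoreCase.GetHashCode(Version));
}
```

A hash-based collection then treats two instances describing the same package version as one entry, regardless of casing or description.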
59. An exception killed the site? WTF?!?
No. Back then we caught every Exception and blindly retried the operation
Resulting in 16,000 commands and events being retried continuously
Causing storage throttling
Causing the website to retry reads
Causing more throttling
Starving IIS worker threads
Lessons learned?
A simple bug can halt the entire application
Only retry transient errors
Our monitoring wasn’t optimal
Our code wasn’t optimal (code from back when MyGet was a blog post…)
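"Only retry transient errors" can be sketched as a small helper. This is a hedged illustration rather than MyGet's retry code: TransientException and Retry are hypothetical names, and a real implementation would have to decide which storage errors count as transient (timeouts and throttling, for example). Anything else, such as the bug described above, fails immediately instead of being retried forever.

```csharp
using System;
using System.Threading;

// Hypothetical marker for errors that are worth retrying.
public sealed class TransientException : Exception
{
    public TransientException(string message) : base(message) { }
}

public static class Retry
{
    // Runs the operation, retrying only on TransientException, at most maxAttempts times.
    public static T Execute<T>(Func<T> operation, int maxAttempts = 3,
                               Func<int, TimeSpan> delay = null)
    {
        // Default back-off: 100 ms, 200 ms, 400 ms, ...
        delay ??= attempt => TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt - 1));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TransientException) when (attempt < maxAttempts)
            {
                // Transient and attempts left: wait, then try again.
                Thread.Sleep(delay(attempt));
            }
            // Any other exception, or a transient one on the last attempt, propagates.
        }
    }
}
```

The bounded attempt count and the growing delay are what keep a queue of failing messages from turning into a continuous storm of storage requests.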
60. Huge downtime February 23rd, 2013
Symptoms:
Everything down
Furious users on social media
Windows Azure Management Portal down
Furious tweets about #WindowsAzure
The cause?
Global outage of Windows Azure due to an expired SSL certificate on storage
Full story at http://blog.myget.org/post/2013/02/24/We-were-down.aspx
61. Considerations and lessons learned
Move storage to HTTP instead of HTTPS?
Windows Azure down globally impacts us quite a bit
Fail-over to another solution costs money and lots of effort
Decided against it for now
Considering off-Windows Azure backups of at least all packages
Full story at http://blog.myget.org/post/2013/02/24/We-were-down.aspx
62. One more! New features…
“Retention policies” introduced
Seemed to be a success!
3+ million commands and events in queue
Solution: scale out
20 instances did it in a few minutes
Solution for the future: feature toggling
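Feature toggling in its simplest form might look like the sketch below. All names (FeatureToggles, RetentionPolicyHandler, the "RetentionPolicies" key) are hypothetical, and a real toggle store would live in configuration or storage so it can be changed without a deployment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory toggle store.
public sealed class FeatureToggles
{
    private readonly HashSet<string> _enabled = new(StringComparer.OrdinalIgnoreCase);

    public void Enable(string feature) => _enabled.Add(feature);
    public void Disable(string feature) => _enabled.Remove(feature);
    public bool IsEnabled(string feature) => _enabled.Contains(feature);
}

// Hypothetical handler that does nothing while its feature is switched off.
public sealed class RetentionPolicyHandler
{
    private readonly FeatureToggles _toggles;
    public int FeedsProcessed { get; private set; }

    public RetentionPolicyHandler(FeatureToggles toggles) => _toggles = toggles;

    // Returns true when the retention policy was applied to the feed.
    public bool Handle(string feedName)
    {
        if (!_toggles.IsEnabled("RetentionPolicies")) return false;
        FeedsProcessed++; // placeholder for the actual clean-up work
        return true;
    }
}
```

With a switch like this, a feature that floods the queue can be turned off first and investigated afterwards.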
65. We’re constantly being bitten
Introduce a new beta feature
Come up with a revenue model
See the feature needs serious rewriting (metering)
Lesson learned? Think revenue early on.
66. Measure everything, test assumptions
“The Lean Startup” book says this
Don’t build it yourself: Google Analytics
67. This is why we built username/password registration: it seems a lot of people prefer typing over one click
We must keep investing in Build Services
Feed discovery is more popular than we imagined from the zero reactions on our blog and Twitter
The technical fear we had about “download as ZIP” consuming too much server resources?
Seems 19 people used it this month. *yawn*