This document evaluates the impact of a content delivery network (CDN) on an e-commerce environment. It finds that the CDN improved user-perceived performance and scalability by distributing content to edge servers located closer to users. It also identifies drawbacks, such as added server-side latency from the DNS redirects the CDN introduces, and longer turnaround times for configuration and content changes. The document outlines the research methodology, presents results on key metrics such as response times and resource utilization, and concludes that the CDN provided benefits along with downsides that could be addressed in future work.
Slide 1
Evaluating the Impact of Content Delivery Networks on N-tier E-commerce Environments
Witold Rzepnicki
March 27th, 2007
Slide 2
Short Bio
• Moved to U.S. from Poland circa 1995
• Completed undergraduate studies in Computer
Information Systems at Missouri State University
• I have worked for Hallmark Cards since 1998 as a Java EE
developer, project manager/lead and a technology
architect
• PMP and SCEA certifications… 811, 816 and 818 came in handy
• Hobbies: travel, foreign languages, tennis (outdoors and
on Nintendo Wii)
Slide 6
The Non-technical Introduction
What Matters to Consumers?
• Are you happy with the web sites you visit?
• Consumers cite website performance and responsiveness as key challenges for E-commerce environments (Nielsen Research)
• Role of content and content delivery
Satisfaction Level      2005   2004   2003   2002   2002 vs 2005 change
Very Satisfied          40%    37%    40%    37%    +3%
Somewhat Satisfied      24%    24%    23%    22%    +2%
Neutral                 31%    32%    30%    33%    -3%
Somewhat Dissatisfied    4%     5%     5%     5%    -1%
Very Dissatisfied        2%     3%     3%     3%    -1%
Slide 7
Typical Hourly Downtime Costs
• Brokerage operations $6,450,000
• Credit card authorization $2,600,000
• Ebay $225,000
• Amazon.com $180,000
• Package shipping services $150,000
• Home shopping channel $113,000
• Catalog sales center $90,000
• Airline reservation center $89,000
Source: Pp. 185-188 of the Proceedings of LISA '02: Sixteenth Systems Administration Conference,
(Berkeley, CA: USENIX Association, 2002).
Slide 13
Problem Statement
• Insufficient performance and scalability during peaks
• Tactics to-date do not fully address the content
delivery layer
– Last-mile, first-mile, peering and backbone problems
– Upper limit to bandwidth scalability for content
delivery (single hosting site)
– Cost factors
• Symptom: performance degrades as Web servers get
overloaded with requests
Slide 15
Content Delivery Networks
• CDNs offload some or all of the content delivery from the origin Web servers.
• A CDN is a large set of replica servers, called edge servers, that deliver content on behalf of the origin server.
• CDNs claim to address
– Client perceived latency (e.g. Web browsers)
– Capacity management of the servers
– Static content caching requirements
Slide 16
Research Focus
• Quality attribute evaluation of the CDN claim
– Performance
– Scalability
– Availability
– Maintainability
• Consumer and server-side measurements
• Infrastructure footprint impact
– Potential cost savings can be significant
– One hosting center versus two
– Resilience of a geographically dispersed network
• Research to-date focuses on network impacts alone
Slide 19
Tactics Implemented To-date
• Horizontal and vertical scalability strategies
implemented to-date
– Clustering
– Origin server caching – content and application
– Scaling individual nodes’ CPU and memory capacity
– Application and database tuning
– Additional bandwidth and switching improvements
– Considered introducing another hosting site to
further improve bandwidth
Slide 21
Why a CDN?
• Server-side caching approaches not sufficient
• Fewer “hops” and more efficient routing
• Ease of implementation versus establishing a
set of secondary hosting facilities
• CDNs (e.g., Akamai) improve web performance
by
– Performing extensive network & server
measurements
– DNS redirection to the most efficient servers
Slide 23
Content Delivery Network
• Browser requests are redirected to the most suitable edge server
• Browser gets the web site’s DNS CNAME entry with a domain name in the CDN network
• A hierarchy of the CDN’s DNS servers directs the client to a “nearby” server
• Based on current network conditions as measured by the CDN
Slide 24
CDN Selection and Implementation
• Redirect method selection: URL rewrite vs. URL redirect, partial-site vs. full-site
• DNS changes
– Local name server
• CDN configuration changes
Slide 25
How To Measure Quality Attribute Impacts?
• Performance
– Page response times
– Java EE component processing times
– Data center network latency
• Scalability
– Ability to sustain traffic spikes while maintaining the
same resource footprint
– Resource utilization (bandwidth, CPU, etc.)
• Other QA impacts
– Availability and maintainability
Slide 26
Experimental Challenges
• Scalability
– Requires sufficient load to test elasticity of
resources
– Need to simulate fast transactional bursts
– Gather production environment data during the
February peak
• Performance
– Establish pre-CDN and post-CDN baselines under
steady state
– Eliminate outside “noise” by isolating transactions in
a non-production environment
Slide 27
Monitoring and Measurement Framework
• Consumer perspective
– Real-time user monitoring
– Browse versus shop transactions
– Geographic distribution
– Consistent and sustained rate
• Application perspective
– URI stem-level performance measurement
– Host, network and end-to-end times
• System perspective
– Vmstat and bandwidth utilization
Slide 28
Consumer Transaction Emulation
• Response times before and after CDN
• Real-time user monitoring
• Transaction characteristics and frequency
ISP        City and State
Level3     Los Angeles, CA
Savvis     Santa Clara, CA
Verizon    Denver, CO
MFN        Washington, D.C.
Internap   Miami, FL
Level3     Chicago, IL
Sprint     New York, NY
Slide 30
Browse and Shop Transaction Characteristics
Transaction workload characteristic                          Browse    Shop
Number of transaction steps                                  9         6
Number of images retrieved                                   163       94
Number of scripts, HTML, CSS, Flash components               57        39
Number of server-side J2EE components accessed               12        15
Average image size                                           2.9 KB    2.8 KB
Average size of HTML, script and Flash                       4.9 KB    5.8 KB
Total number of bytes retrieved per connection               250 KB    98 KB
Number of web-server connections initiated from the browser  4         5
Slide 38
Web Tier Scale Factor
• Maximum concurrent Web server socket threads
• Maximum object “hits” in Akamai
• 16,000 hits / 3,600 threads
• Equivalent to 4x of our Web server farm
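As a quick sanity check of this ratio (back-of-the-envelope, not a figure from the study): 16,000 hits / 3,600 threads ≈ 4.4, which is consistent with the quoted 4x equivalent of our Web server farm.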
Slide 42
Performance: HTML Object Download Time
• Browse transaction (chart)
• Shop transaction (chart)
• Why the discrepancy between the RTUM and server performance?
Slide 43
Maintainability and Availability
• Configuration management
– 2 hours on average to deploy configuration changes
• Content management
– 7-10 minutes to propagate content across edge
servers
• Achieved 100% availability during the observed
February peak
Slide 48
Future Work
• Edge computing
– Edge delivery of applications
• Impact of edge delivery on media streaming
and protocols other than HTTP
– RTSP, MMS
Speaker Notes

Question for the audience:
Why is this still a problem after all these years?
Focus on how little progress was made from 2002 to 2005 in terms of customer satisfaction and discuss the whys:
Traffic growth, exponential growth of online transactions and infrastructures not always keeping up with demand
Discuss content as a driver to the website and enabler of shopping transactions
Now on to the dollars and cents of what it costs to be unavailable… these figures are from real research and they are likely to be much higher these days
Surprising that the airline reservation center would have lower downtime costs than a home shopping channel
This presentation will focus on an e-commerce environment similar to the ones on slide 6 although we can’t really say what it costs per hour to be unavailable
This is a quick overview of our e-commerce environment from the architecture, workload and content delivery perspective
Seasonal spikes between 6-10x for different metrics: visits or page views
In subsequent slides, we’ll cover the architectural views and the content delivery model and its potential shortcomings
Typical things to consider in content delivery and management.
On first bullet bring up AJAX, RIAs and heavy Flash usage on some sites
This is a generic model of architecture. We’ll discuss potential problems with content delivery that result from this type of architecture.
Define STATIC and DYNAMIC content
Define performance and scalability as key quality attributes
Consumer and server-side views of performance
Static = non-unique to a particular consumer (images, article pages)
Dynamic = based on individual consumer characteristics (JSPs)
Describe interactions, differences between static and dynamic elements and how they’re served
Server-side caching helps offload repetitive requests for dynamic content
Function of load balancing in the context of scalability and performance
Describe where the content delivery problems from scalability and performance perspectives may reside
Internet cloud and its role in content delivery
Web servers - static
Application servers – dynamic
DB - dynamic
The First Mile bottleneck refers to the limitations in the website’s connectivity to
the Internet via its Internet Service Provider (ISP). In order to reach desired scalability
it needs to continuously expand its connectivity to the ISP. The ISP, in turn, must also
expand its capacity in order to meet its customers’ scalability requirements.
Peering points also represent potential bottlenecks as large networks are not economically motivated
to scale up the number of peering points with the networks of their competitors,
especially since a significant portion of the traffic handled by the peering points
is transit traffic with packets originating on other networks. This lack of competitive
and financial motivation over time has resulted in a limited number of peering points
across major networks.
The Backbone Problem refers to the fact that the ISPs’ router
capacity has historically not kept up with growth of traffic demands.
Finally, the Last Mile problem reflects the limited capacity of a typical user’s connection to their ISP.
85% of our website’s consumers have broadband access, so this is less of a problem for our website.
It’s important to note that just solving one of the above bottlenecks, such as the Last
Mile, by increasing the reach of broadband connectivity at home will not automatically
address the other limitations. These need to be treated as separate problems
that, if addressed, would help solve the problem as a whole.
The problems with the Internet cloud compound the other potential scalability and performance problems we discussed earlier.
Let’s talk about workload in terms of page views.
Traffic spikes several times a year and it’s “bursty” in nature. The weekly picture does not reflect hourly spikes we experience. Quick slide!
This slide suggests the need to scale 5x based on page views alone.
Can’t talk about content delivery without discussing content management and publishing
This is a generic content management model….
Describes differences between static and dynamic content and catalog data vs. article pages. Static content tags are embedded in the JSPs which are rendered within the application server and usually contain static and dynamic content elements.
Refer to outages during peaks from slide 15
With single hosting facility we cannot control the efficiency of content delivery once it leaves our network
We could create our own network of geographically dispersed servers, but it would be cost prohibitive
We have attempted to scale horizontally and vertically (define each)
A Web page download consists of the following basic steps: server name resolution, TCP connection establishment, transmission of the HTTP request, reception of the HTTP response, reception of data packets, and TCP connection termination. Using HTTP/1.0 results in repeating the above steps for each embedded object within a composite page. Note that when the embedded objects are stored on another server (e.g., servers in a content distribution service), having HTTP/1.1 support for persistent TCP connections across multiple HTTP requests does not eliminate the first two steps, but it reduces them by a factor of 2 to 10.
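To make these steps concrete, here is a minimal timing sketch in Python (standard library only; the hostname is a hypothetical stand-in, and the breakdown approximates the phases above rather than the study's instrumentation):

    import socket
    import time

    HOST = "www.example.com"  # hypothetical origin or edge server

    t0 = time.monotonic()
    ip = socket.gethostbyname(HOST)                        # server name resolution
    t_dns = time.monotonic()

    sock = socket.create_connection((ip, 80), timeout=10)  # TCP connection establishment
    t_conn = time.monotonic()

    req = f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
    sock.sendall(req.encode("ascii"))                      # transmission of the HTTP request

    first = sock.recv(4096)                                # reception of the HTTP response (first packet)
    t_first = time.monotonic()

    while sock.recv(4096):                                 # reception of the remaining data packets
        pass
    sock.close()                                           # TCP connection termination
    t_done = time.monotonic()

    print(f"DNS lookup:  {t_dns - t0:.3f}s")
    print(f"TCP connect: {t_conn - t_dns:.3f}s")
    print(f"First byte:  {t_first - t_conn:.3f}s")
    print(f"Download:    {t_done - t_first:.3f}s")

With HTTP/1.0 (or Connection: close, as here) every embedded object repeats the resolution and connect phases; HTTP/1.1 persistent connections let subsequent objects fetched from the same server skip them.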
Our challenge is not only how many connections we have open, but also for how long…large video files
We’ll discuss significance from two perspectives:
The impact on our e-commerce environment and other e-commerce environments – practical value
The additional CDN research aspects evaluated in this work – research value
When is a website suitable for a CDN?
- It has a high ratio of reads compared to writes
- Client access patterns tend to access some set of objects more frequently
- Limited windows of inconsistent data are acceptable
- Data updates occur relatively slowly
CDN stands for Content Delivery Network
What do CDNs claim to help with?
The spikes create extra load on our infrastructure that causes outages. It’s worth noting that 2006 is the year with the CDN in place…just a little preview of the results.
Hypothesis: could we reduce resource utilization with a CDN?
Here’s what we could address if we were to solve the problem…..
This chart is showing CPU utilization spike in the web tier, but we experience similar curves for bandwidth and memory.
Why do we even need to explore a new tactic?
Refer to the definition of a tactic from Bass et al.:
A design decision that is influential in the control of a quality attribute response. Tactics tell you what to do in order to affect a quality attribute response measure. Unlike sensitivity points, tactics are independent of any specific system.
How we went about determining the criteria to measure impact of a CDN.
Akarouting promises one-hop routing
DNS is essentially a distributed database that follows
the client-server architecture. Adequate performance of DNS is achieved through
replication and caching. The server side portion of a request is handled by programs
called name servers. They contain information about a portion of the global database
and are capable of forwarding requests to other authoritative servers if necessary. The
information is made available to the client-side software components called resolvers.
A typical domain name on the Internet consists of two or more parts separated by
dots such as my.yahoo.com. Top-level domain (TLD) represents the rightmost portion,
.com in our case, while the subdomain(s) are represented by the labels to the left
of the top-level domain.
In our example, my.yahoo.com is a subdomain of yahoo.com, which in turn
belongs to the .com top-level domain. Finally, the hostname refers to a domain name
that has one or more IP addresses associated with it. Each domain or subdomain has
an authoritative server associated with it. It contains and publishes information about
the domain and any other domains it encapsulates.
Root nameservers reside at the top of the DNS hierarchy and they are queried first to resolve the TLD. Caching and time-to-live (TTL) are very important concepts in DNS and, as we will later discover, in CDN implementations. IP mappings obtained from DNS can be stored in the local resolver for a period of time as defined by the TTL. This greatly reduces the load on the DNS servers.
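As a small illustration of TTL-driven caching, the following sketch (assuming the third-party dnspython package; the hostname is just the example used above) fetches a record and prints the TTL a resolver would honor before re-querying:

    import dns.resolver  # third-party package: dnspython

    # Resolve an A record and inspect the TTL that governs how long
    # a resolver may cache this answer before asking again.
    answer = dns.resolver.resolve("my.yahoo.com", "A")
    print("TTL (seconds):", answer.rrset.ttl)
    for record in answer:
        print("address:", record.address)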
Figure 1 illustrates how a client typically finds the address
of a service using DNS. The client application uses a resolver,
usually implemented as a set of operating system library routines,
to make a recursive query to its local nameserver. The
local nameserver may be configured statically (e.g., in a system
file), or dynamically using protocols like DHCP or PPP. After
making the request, the client waits as the local nameserver iteratively
tries to resolve the name (www.service.com in this
example). The local nameserver first sends an iterative query to
the root to resolve the name (steps 1 and 2), but since the subdomain
service.com has been delegated, the root server responds
with the address of the authoritative nameserver for the
sub-domain, i.e., ns.service.com (step 3). The client’s nameserver then queries ns.service.com and receives the IP address of www.service.com (steps 4 and 5). Finally, the nameserver returns the address to the client (step 6) and the client is able to connect to the server (step 7).
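The iterative walk in Figure 1 can be mimicked by hand; here is a sketch with dnspython (www.service.com is the figure's hypothetical name, so a real domain would be needed to run this end-to-end):

    import dns.message
    import dns.query
    import dns.rdatatype

    name = "www.service.com."   # hypothetical name from Figure 1
    server = "198.41.0.4"       # a.root-servers.net (steps 1 and 2)

    while True:
        query = dns.message.make_query(name, dns.rdatatype.A)
        response = dns.query.udp(query, server, timeout=5)
        if response.answer:     # authoritative answer reached (steps 4 and 5)
            print(response.answer[0])
            break
        # Referral: follow an IPv4 glue record for the delegated
        # nameserver (step 3) and query it next.
        for rrset in response.additional:
            if rrset.rdtype == dns.rdatatype.A:
                server = rrset[0].address
                break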
Very simple extension of the DNS redirection mechanism, but the complexity lies in the algorithm that measures current network conditions.
In essence, Akamai performs a highly complex translation of a customer’s domain
to the IP address of the most suitable edge server.
First, the Web browser requests an HTML object. In order to accommodate this request, the local DNS resolver has
to translate the domain name into an IP address. The resolver issues a query to the
customer’s DNS server which in turn forwards the request to the Akamai network.
This is enabled via a configuration of a canonical name record (CNAME) in the origin
site’s DNS name server. The CNAME triggers the request redirection to the CDN.
Next, a hierarchy of Akamai servers responds to the request using the requestor’s
IP address, the name of the CDN customer, and the name of the requested content
as seeds for its DNS resolution. The CDN name resolution step is perhaps the most
critical in this sequence of events. Configuration of the Akamai CDN is described
in [4].
The steps for our deployment can be summarized as follows:
1. Create origin hostname
2. Activate Akamai edge hostname
3. Activate content delivery configuration
4. Point website to Akamai network
In our case, this process begins with configuration of a CNAME in our DNS name
server. A CNAME record maps an alias or nickname to the real name which may lie
outside the current zone. Typical format of a CNAME entry is as follows:
name ttl class rr canonical name
www IN CNAME joe.example.com.
We need to set up an origin server hostname that will resolve to our content server.
This server will be used by Akamai edge servers to retrieve our content, so it can be
made available to all of the nodes in the CDN. The naming convention for the origin
server is:
origin-<website>
where “website” refers to the hostname for our content that will be delivered
from Akamai. Our website stores all of its static content in the generic images folder,
so we will define the following origin server name:
origin-images.example.com for images.example.com
Next, we will create a DNS record for our origin server hostname on our authoritative
name server. We will use the CNAME record type for this step.
origin-www.example.com IN CNAME loadbalancer.example.com
We are now pointing our website to the Akamai network. An edge hostname will
need to be activated on an Akamai domain for our website using the CDN’s configuration
console. It will resolve to the Akamai network. For example,
www.example.com
would have to point to
www.example.edgesuite.net
and www.example.edgesuite.net would in turn resolve to individual servers on
the Akamai network since it owns the edgesuite.net domain. The remaining configuration
steps need to be performed in the configuration console of Akamai and they
are covered in-depth in [4].
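Once these records are in place, the wiring can be sanity-checked by following the alias chain; a sketch using dnspython (the hostnames are the illustrative ones from this section):

    import dns.resolver

    def follow_cname_chain(name: str) -> None:
        """Print each CNAME hop, then the final A records."""
        while True:
            try:
                answer = dns.resolver.resolve(name, "CNAME")
            except dns.resolver.NoAnswer:
                break  # no more aliases; name should now have A records
            target = str(answer[0].target)
            print(f"{name} -> {target}")
            name = target
        for record in dns.resolver.resolve(name, "A"):
            print(f"{name} -> {record.address}")

    # Expected shape: www.example.com -> www.example.edgesuite.net
    # -> (Akamai-internal names) -> edge server IPs
    follow_cname_chain("www.example.com")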
The main purpose of a CDN is to direct consumer requests for objects to a server at
the optimal Internet location relative to the consumer’s location. The key components
of a CDN architecture are described in [37]. They are defined as: overlay network formation,
client request redirection, content routing and last-mile content delivery.
The two most common techniques employed by the networks are DNS redirection and URL
rewriting. The DNS redirection technique utilizes a series of DNS resolutions based on
several factors such as server availability and network conditions with the purpose
of identifying the most suitable server.
The end result is a DNS response with the IP address to the content server. The
response includes a time-to-live value that is usually limited to less than a minute (in
the case of Akamai it is 20 seconds). The TTL has to be set to a relatively low value
because the network conditions and server availability change constantly and quick
IP re-mapping is key.
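The effect of the short TTL can be observed directly by re-resolving a CDN-mapped name at TTL-sized intervals and logging IP changes (a sketch with dnspython; the name is illustrative, and a caching stub resolver in between can mask the re-mapping):

    import time
    import dns.resolver

    NAME = "www.example.edgesuite.net"  # illustrative CDN-mapped name

    last_ip = None
    for _ in range(10):
        answer = dns.resolver.resolve(NAME, "A")
        ip = answer[0].address
        if ip != last_ip:
            ttl = answer.rrset.ttl
            print(f"{time.strftime('%H:%M:%S')} mapped to {ip} (TTL={ttl}s)")
            last_ip = ip
        time.sleep(answer.rrset.ttl)    # wait out the short TTL, then re-resolve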
The DNS redirection technique can facilitate either a full- or partial-site delivery.
With full-site delivery, all requests to the origin server are directed using DNS to a
CDN server. If the CDN server can’t fulfill the request it simply routes it back to the
origin server. Several networks, including Adero and NetCaching, employ this delivery
model. The main shortcoming of this model is the additional routing overhead
of wasted DNS requests that could have been handled by the origin server to begin
with.
With partial-site content delivery, on the other hand, the origin site modifies the
URLs for certain objects or object directory locations to be resolved by the CDN’s DNS
server. This approach seems to be well suited for our website due to its combination of
static digital assets and dynamically generated server-side presentation components.
URL rewriting is another potential solution for server lookups. With this technique,
the origin server continuously rewrites the URL links for dynamically generated
pages in order to redirect them to the appropriate CDN server. The DNS functionality remains on the origin site with this approach. When a page is requested by the user it will be served from the origin server. However, before it is served, all of the embedded links will be rewritten to point to the CDN’s DNS. Figure 3.1 shows a typical
rewrite approach. The main drawback to the URL rewrite approach from the measurement
standpoint is the fact that the rewrites usually take place at the Web server
tier. Hence, the rewrite steps would inevitably introduce additional background noise
to our performance measurements. Therefore, we decided to avoid this approach for
the purpose of our study.
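For concreteness, the rewrite step amounts to a transformation like the following at page-generation time (a sketch; the hostnames and pattern are illustrative, not any CDN's actual mechanism):

    import re

    CDN_HOST = "cdn.example.net"  # hypothetical CDN-resolved hostname

    def rewrite_links(html: str) -> str:
        """Point embedded static-object links at the CDN before serving the page."""
        return re.sub(
            r'(src|href)="https?://www\.example\.com/(images|css|js)/',
            rf'\1="https://{CDN_HOST}/\2/',
            html,
        )

    page = '<img src="https://www.example.com/images/logo.gif">'
    print(rewrite_links(page))
    # <img src="https://cdn.example.net/images/logo.gif">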
At the time of writing of this thesis, we counted 18 different networks on Davison’s website. It
is not our primary purpose to evaluate tradeoffs between the various networks and
their implementations. The choice we made does not reflect a belief in superiority of
one network over others - it is merely a reflection of the need to get our experimental
test bed up and running as quickly as possible within boundaries imposed on us by
our existing hosting facility. For our implementation, we settled on partial-site, DNS
redirection-based CDN implementation using the Akamai delivery network.
Why are availability and maintainability important?
Bandwidth utilization
Hosting facility
CDN
Server resource utilization
CPU and run queues
Memory page-ins and page-outs
Measured in the context of traffic and page views
Research to-date focused on network impacts alone
This is different from research to-date
What does a transaction consist of?
Why geographically dispersed locations for testing are important
We ran over 1K transactions over a period of 48 hrs before and after
DNS look-up: The process of calling a DNS server to look up and convert a hostname to an IP address; for instance, converting www.foo.com to 10.0.0.1.
Connect time: The time it takes to connect to a Web server (or CDN edge server in our case) across a network from a client browser or an RTUM agent.
Secure sockets layer time: The time it takes to create an SSL TCP/IP connection with a website.
First byte time: The time between the completion of the TCP connection with the destination server that will provide the displayed page’s HTML, graphic, or other component and the reception of the first packet (also known as first byte) for that object.
Content download time: The time in seconds that measures the actual time to deliver content (images, HTML, or other objects) from the Web server to the browser.
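These metrics map almost one-to-one onto libcurl's timing counters, so a client-side approximation of the RTUM breakdown could look like this (a sketch assuming the third-party pycurl package; the URL is hypothetical):

    from io import BytesIO
    import pycurl

    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "https://www.example.com/")  # hypothetical page
    c.setopt(pycurl.WRITEDATA, buf)
    c.perform()

    dns = c.getinfo(pycurl.NAMELOOKUP_TIME)                                           # DNS look-up
    connect = c.getinfo(pycurl.CONNECT_TIME) - dns                                    # connect time
    ssl = c.getinfo(pycurl.APPCONNECT_TIME) - c.getinfo(pycurl.CONNECT_TIME)          # SSL time
    first = c.getinfo(pycurl.STARTTRANSFER_TIME) - c.getinfo(pycurl.APPCONNECT_TIME)  # first byte time
    download = c.getinfo(pycurl.TOTAL_TIME) - c.getinfo(pycurl.STARTTRANSFER_TIME)    # content download time
    c.close()

    for label, value in [("DNS look-up", dns), ("Connect", connect), ("SSL", ssl),
                         ("First byte", first), ("Content download", download)]:
        print(f"{label:18s}{value * 1000:8.1f} ms")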
The application perspective will be captured using an appliance based application
monitoring solution. The network location of this appliance is depicted in Figure 3.3.1.
We will configure “watchpoints” using the appliance’s configuration tool to capture
server-side response times of Java EE components corresponding to the transaction
steps defined in the RTUM service. The appliance uses passive traffic analysis to capture
actual transactions from the RTUM within our hosting environment and measures
performance and availability of our e-commerce application as a whole. The important
difference between this and other approaches is that our appliance does not
generate any traffic and the only performance overhead it introduces is reading the
copy of traffic from the network connection. The data is assembled into requests for
objects, pages and user sessions. Performance metrics include host, SSL and redirect
times. This solution also measures server errors or prematurely terminated connections
due to increase in traffic. Figure 3.3.1 depicts the measurement timeline for a
sample request that would be captured by our appliance [26].
The appliance solution groups latency into the following six categories and defines
them as follows:
Host time: This is the combined time the Web, application, and database servers take
to process a request. Host time is a key measure to assess performance implications
of implementing a CDN on performance of our Java EE components
(servlet, EJBs, etc.). It can be very short in the case of a static image or long
in cases of long reports and complex server-side transactions such as adding a
list of items to the shopping basket.
Network time: This is the time spent traveling across intervening networks. Once the
server has prepared its response, host time is over and network time begins. A
small object might be delivered quickly; a large one might take a long time. This
time is highly dependent on the type of consumer’s connection. Low-bandwidth
connections will result in higher network times and vice versa with broadband
connections. Our monitoring appliance also records additional information on
packet loss, out-of-order delivery, and round-trip time to help with this diagnosis.
SSL time: The appliance will record the time spent negotiating the encryption of encrypted
transactions. This portion of the SSL time represents the server-side
latency elements of the handshake versus the client-side SSL time captured by
the RTUM.
Redirect time: This is the time the site spends sending a request on to other pages. In
some applications, a request for a page results in a redirect that usually points
elsewhere. This delay is recorded as redirect time.
Idle time: When a browser is retrieving a page, but there is no activity between objects
on the same page, the HTTP interaction is defined as “idle”. This measurement
is key to understanding the amount of time spent processing client-side
scripts such as JavaScript. When there is inactivity in the middle of rendering
the page within the browser, our appliance will measure it as idle time.
End-to-end: This is the total time for the object or page, from the moment the first
packet of a request is seen until the browser acknowledges delivery of the last
packet.
Differentiate between the two types and discuss why we expect the browse transaction to benefit more. Also discuss the number of web server connections.
What does the appliance do?
Physical memory is a finite resource on any system. The UNIX memory handler
manages the memory allocations. The kernel is responsible for freeing up physical
memory of idle processes by saving it to disk until it is needed again. Paging and
swapping are used to accomplish this task. Paging refers to writing portions, termed
pages, of a process’ memory to disk. Swapping refers to writing the entire process, not
just part, to disk. Page-out represents the event of writing pages to disk, while page-in
is defined as retrieving memory data from disk. Page-ins are common and under normal
circumstances are not a cause for concern. However, if page-ins become excessive
the kernel can reach a point where it’s actually spending more time managing paging
activity than running the applications, and system performance suffers.
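The system perspective watches exactly these counters; a small sampling sketch follows (column positions vary across UNIX flavors — the indices below assume Linux-style vmstat output, where the si/so columns approximate the page-in/page-out activity described above, while on Solaris the pi/po columns are the direct counterparts):

    import subprocess

    # Sample vmstat once per second, five times, and pull out the
    # run queue (field 0) and swap-in/swap-out rates (fields 6 and 7).
    out = subprocess.run(["vmstat", "1", "5"],
                         capture_output=True, text=True).stdout
    for line in out.splitlines()[2:]:   # skip the two header rows
        fields = line.split()
        runq, page_in, page_out = fields[0], fields[6], fields[7]
        print(f"runq={runq}  page-in={page_in}/s  page-out={page_out}/s")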
We decided to look at all tiers. The application servers experienced a spike in CPU utilization – probably due to the elimination of the Web server bottleneck and more traffic going to the app servers. The Web servers experienced the most benefit. The DB server improvements were not related to the CDN, but rather a DB tuning exercise we undertook.
Another way to look at the network efficiencies gained from offloading: high-content pages experienced the largest drops in packet counts. This results in lower resource utilization of the network gear (routers, switches, etc.), though we didn’t measure that as part of the experiments.
Note the spike over a period of just a couple of hours.
Discuss implications from hosting and cost perspective. For example, we could avoid start-up costs of a new hosting center.
Our bandwidth utilization went up because we eliminated the bottleneck in the Web tier
Akamai offloaded the equivalent of one Gbps connection to our hosting facility
First time in a few years we had an unqualified success in terms of availability.