Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1zBtGDa.
Dave McCrory talks about what Data Gravity is, how it affects performance and portability, and why these effects are amplified at larger volumes of data. Filmed at qconlondon.com.
Dave McCrory most recently served as SVP of engineering at Warner Music Group, where he led over 100 engineers building the company’s new Digital Services Platform, based on an open source enterprise platform as a service. His extensive experience in the cloud and virtualization industry includes positions as a senior architect on Cloud Foundry at VMware and as a cloud architect at Dell.
Don’t Let Data Gravity Crush Your Infrastructure
1. Don’t let Data Gravity
CRUSH your infrastructure
Dave @McCrory
CTO @Basho
2. InfoQ.com: News & Community Site
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/data-gravity-infrastructure
3. Presented at QCon London
www.qconlondon.com
29. Data Creation
• Human Driven and Machine Driven
• Humans use Human Interfaces
(UIs are an example)
• Machines use Machine Interfaces
(APIs are an example)
30. Moore’s Law meets Big Data
• IDC estimates Data will double every 2 years through 2020
http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf
• McKinsey estimates Data will increase by 40% yearly
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
• Data production will be 44 times greater in 2020 than it was in 2009
http://www.csc.com/big_data/flxwd/83638-big_data_just_beginning_to_explode_interactive_infographic
Data is doubling at an equal or greater
rate than CPU transistor counts!
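The three estimates above are stated in different forms (a doubling period, an annual percentage, and a total multiple), so compound-growth arithmetic helps compare them. The minimal Python sketch below uses illustrative function names that are not from the talk. Doubling every 2 years works out to about 41% per year, 40% per year doubles roughly every 2.06 years, and 44x over the 11 years from 2009 to 2020 is also about 41% per year, which is close to the pace of transistor counts doubling roughly every two years under Moore's Law.

```python
import math


def annual_rate_from_doubling(years_to_double: float) -> float:
    """Compound annual growth rate implied by doubling every `years_to_double` years."""
    return 2 ** (1 / years_to_double) - 1


def annual_rate_from_multiple(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years` years."""
    return multiple ** (1 / years) - 1


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    print(f"IDC, 2x every 2 years:  {annual_rate_from_doubling(2):.1%} per year")
    print(f"McKinsey, 40% per year: doubles every {doubling_time(0.40):.2f} years")
    print(f"44x from 2009 to 2020:  {annual_rate_from_multiple(44, 2020 - 2009):.1%} per year")
```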
31. Causes of Data Growth
• Big Data
• Internet of Things (Sensors)
• Machine to Machine
• Machine Learning
33. • Data is stored on a physical device (subject to degradation)
• The device must consume power to make data available
• Data is subject to “bitrot” (gradual corruption) and error propagation
• The more data you have, the more costly it is to maintain
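One common way to catch bitrot before it spreads into copies is periodic scrubbing: record a checksum when data is written and re-verify it later. The sketch below is a hypothetical illustration, not a technique from the talk; an in-memory dict stands in for real storage and SHA-256 is an assumed choice of checksum. Because a scrub has to read every stored byte, its cost grows with data volume, which echoes the last bullet.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest recorded alongside the data when it is written."""
    return hashlib.sha256(data).hexdigest()


def scrub(store: dict, manifest: dict) -> list:
    """Return the keys whose current bytes no longer match their recorded checksum
    (keys with no recorded checksum are also reported)."""
    # Reads every stored byte, so scrub time grows linearly with data volume.
    return [key for key, data in store.items() if checksum(data) != manifest.get(key)]


if __name__ == "__main__":
    store = {"a.bin": b"hello world", "b.bin": b"data gravity"}
    manifest = {key: checksum(data) for key, data in store.items()}

    # Simulate bitrot: flip the lowest bit of the first byte of b.bin.
    rotted = bytearray(store["b.bin"])
    rotted[0] ^= 0x01
    store["b.bin"] = bytes(rotted)

    print(scrub(store, manifest))  # ['b.bin']
```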
35. • Data warehousing?
• Data needs to have context (structure)
• Data Entropy problem
• Junk Data is still Junk Data, no matter how long you store it
• Each storage type has trade-offs
• When has this ever gone well?
37. Businesses care about Latency
• Amazon found every 100ms of latency cost them 1% in sales
• Google found an extra 0.5 seconds in search page generation time dropped traffic by 20%
• A broker could lose … if their electronic trading platform is 5 milliseconds behind the competition
source: http://blog.gigaspaces.com/amazon-found-every-100ms-of-latency-cost-them-1-in-sales/
• Shopzilla: … increase in revenue, a 50% reduction in hardware, and a 120% increase in traffic from Google
• Bing found that a 2 second slowdown changed queries/user by -1.8% and revenue/user by -4.3%
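To get a feel for what figures like these mean for a specific business, the Amazon number from the source URL (1% of sales per 100 ms) can be turned into a back-of-the-envelope estimate. This is a minimal sketch under a loud assumption: it extrapolates linearly from that single data point, which the source does not claim holds in general. The revenue and latency values in the usage example are hypothetical.

```python
def estimated_revenue_loss(annual_revenue: float, added_latency_ms: float,
                           loss_per_100ms: float = 0.01) -> float:
    """Rough revenue lost to extra latency.

    ASSUMPTION: losses scale linearly at `loss_per_100ms` (default 1%, the
    Amazon figure) for every 100 ms of added latency, capped at 100%.
    """
    fraction_lost = min(1.0, loss_per_100ms * added_latency_ms / 100.0)
    return annual_revenue * fraction_lost


if __name__ == "__main__":
    # Hypothetical business: $50M annual revenue, 250 ms of extra latency.
    loss = estimated_revenue_loss(50_000_000, 250)
    print(f"Estimated annual loss: ${loss:,.0f}")
```

Real latency/revenue curves are unlikely to be linear across large ranges, so the output is best read as an order-of-magnitude signal rather than a forecast.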
41. Moving Data to the Processing
Considerations
• How fast is the Bus/Network?
• How much data is being moved?
• How variable is the latency?
• How much latency is acceptable?
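The four considerations for moving data to the processing combine into a simple estimate: transfer time is roughly latency plus data size divided by bandwidth, checked against the acceptable latency. Because latency varies, the sketch compares a high percentile of observed latency, rather than the mean, against the budget. It is a hypothetical illustration: the function names, the 99th-percentile default, and the sample numbers are assumptions, and it ignores protocol overhead and congestion.

```python
import statistics


def transfer_time_s(data_bytes: float, bandwidth_bytes_per_s: float,
                    latency_s: float) -> float:
    """Time to move data to the processing: link latency plus bytes / bandwidth."""
    return latency_s + data_bytes / bandwidth_bytes_per_s


def fits_latency_budget(data_bytes: float, bandwidth_bytes_per_s: float,
                        latency_samples_s: list, budget_s: float,
                        percentile: int = 99) -> bool:
    """Check the move against a latency budget using a high percentile (1-99)
    of observed latency rather than the mean, because latency varies."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(latency_samples_s, n=100, method="inclusive")
    tail_latency = cut_points[percentile - 1]
    return transfer_time_s(data_bytes, bandwidth_bytes_per_s, tail_latency) <= budget_s


if __name__ == "__main__":
    ten_gb = 10e9
    ten_gbit_link = 1.25e9  # 10 Gbit/s expressed in bytes per second
    samples = [0.001, 0.002, 0.002, 0.003, 0.5]  # hypothetical latencies, one bad outlier
    print(transfer_time_s(ten_gb, ten_gbit_link, 0.002))  # about 8 s of pure transfer
    print(fits_latency_budget(ten_gb, ten_gbit_link, samples, budget_s=8.1))
```

Note the unit conversion: link speeds are usually quoted in bits per second, while data sizes are in bytes.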
45. Moving Processing to the Data
Considerations
• How fast is the Bus/Network?
• How much processing power is near the Data?
• How variable is the latency?
• How much latency is acceptable?
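The trade-off between the two approaches can be sketched with a toy cost model: moving data pays for shipping every byte, while moving processing pays only to ship code and results but is limited by how much processing power sits near the data. The minimal Python sketch below assumes throughputs in bytes per second and ignores latency variability, queuing, and parallelism; all names and numbers are hypothetical.

```python
def time_moving_data(data_bytes, bandwidth, latency_s, remote_rate):
    """Ship all data across the link, then process it where the compute lives."""
    return latency_s + data_bytes / bandwidth + data_bytes / remote_rate


def time_moving_processing(data_bytes, code_bytes, result_bytes,
                           bandwidth, latency_s, local_rate):
    """Ship the code to the data, process it there, then ship the results back."""
    network = 2 * latency_s + (code_bytes + result_bytes) / bandwidth
    return network + data_bytes / local_rate


def better_strategy(data_bytes, code_bytes, result_bytes, bandwidth,
                    latency_s, local_rate, remote_rate):
    """Pick whichever strategy finishes sooner under this simple model."""
    move_data = time_moving_data(data_bytes, bandwidth, latency_s, remote_rate)
    move_proc = time_moving_processing(data_bytes, code_bytes, result_bytes,
                                       bandwidth, latency_s, local_rate)
    return "move data" if move_data < move_proc else "move processing"


if __name__ == "__main__":
    # Hypothetical job: 1 TB of data, 50 MB of code, 1 GB of results,
    # a 10 Gbit/s (1.25e9 B/s) link with 50 ms latency.
    job = dict(data_bytes=1e12, code_bytes=5e7, result_bytes=1e9,
               bandwidth=1.25e9, latency_s=0.05, remote_rate=2e9)
    print(better_strategy(**job, local_rate=5e8))  # move data
    print(better_strategy(**job, local_rate=2e9))  # move processing
```

With the same network, the answer flips depending on the compute available near the data, which is exactly the second consideration on the slide.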