"Big Data" gets thrown around a lot, but what does it really mean?
This presentation, from SES Chicago, breaks down three of the main elements of Big Data: Volume, Variety, and Velocity.
The Age of Big Data
1. The Age of Big Data
and the Modern
Marketer
Chicago | November 12–16
Seth Dotterer
Conductor
VP, Marketing and Product
(speaker logo)
2. Chicago | November 12–16, 2012 | #SESCHI
Link: http://www.youtube.com/watch?v=QV3t-3QIf1E
3. Chicago | November 12–16, 2012 | #SESCHI
“Data underpins our
economy and our society
- data about how much is
being spent and where,
data about how schools,
hospitals and police are
performing, data about
where things are and data
about the weather.”
Tim Berners-Lee,
Director of W3C.
5. Chicago | November 12–16, 2012 | #SESCHI
A collection of data sets so large and
complex that it becomes difficult to
process using on-hand database
management tools.
6. Chicago | November 12–16, 2012 | #SESCHI
Definition of Big Data
Volume - Variety - Velocity
15. Chicago | November 12–16, 2012 | #SESCHI
Variety
• Structured Data:
• Unstructured Data
• Even Structured Data has issues
Unstructured
(mostly)
20. Chicago | November 12–16, 2012 | #SESCHI
Hadoop has its roots in a search engine
• Apache Hadoop, the leading open source solution, was modeled on Google’s MapReduce
and the Google File System, and was carried forward by a Yahoo employee.
• Now, Hadoop and its related technologies and toolsets are used to distribute
processing across clusters of computers. These clusters scale up to thousands of machines,
each of which can store and process information and can take over the work of a failed
neighbor machine.
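The map-and-reduce idea behind Hadoop can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example. In a real cluster, each input split would be mapped on a different machine and the framework would handle the grouping step and any machine failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two "splits" that could live on two different machines.
counts = word_count(["big data is big", "data in motion"])
print(counts)
```

Because each call to `map_phase` depends only on its own split, the map work can be spread across as many machines as there are splits, which is what lets the approach scale.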
@dotterer
21. Chicago | November 12–16, 2012 | #SESCHI
“Big data is the fuel. It is like oil. If you leave
it in the ground, it doesn’t have value. But
when we find ways to ingest, curate, and
analyze the data in new and different ways,
such as in Watson, Big Data becomes very
interesting.”
Stephen Gold, VP of
Marketing for IBM’s Watson
29. Chicago | November 12–16, 2012 | #SESCHI
How Search Marketers can use big data
• Think Flow, not Batch.
• PPC vendors such as Marin, Kenshoo, and Adobe all offer big data optimization
• Leverage cross-discipline data (SEO position data to drive bids, and vice versa)
• Mining and teasing out the tactics and strategies that are working, vs. what is noise
• Opportunity forecasting
@dotterer
30. Chicago | November 12–16, 2012 | #SESCHI
Big Data Means…
Cross-channel marketing is changing how independent silos
are working
No More Silos
36. Chicago | November 12–16, 2012 | #SESCHI
Are you referring to MCA-O2S, MCA-AMS or MCA-ADC?
http://www.kaushik.net/avinash/multi-channel-attribution-definitions-models/
• MCA-ADC - Multi-Channel Attribution, Across Digital Channels
• MCA-O2S - Multi-Channel Attribution, Online to Store
• MCA-AMS - Multi-Channel Attribution, Across Multiple Screens
37. Chicago | November 12–16, 2012 | #SESCHI
“The ability to
take data - to be
able to
understand it, to
process it, to
extract value
from it, to
visualize it, to
communicate it -
that's going to be a
hugely important
skill in the next
decades”
Hal Varian
- Google’s Chief Economist
42. Chicago | November 12–16, 2012 | #SESCHI
Takeaways
• Big Data is what YOU say it is, so that you can solve your problems.
Don't lose sight of that, or blindly 'trust the data'.
• The more frequently (or faster) you analyze your data, the more likely it is to be valuable.
Think less about ‘batch’ and more about flow.
• Find new sources of data – don’t just dump more of the same into it.
• Keep data as long as you can
• Become (or hire) a statistician - programs in Big Data and advanced analytics have emerged,
sprouting up at USC, N.C. State, NYU and elsewhere.
• Figuring out what the data tells you is hard.
• Convincing others of what the data tells you is harder. Visualization helps.
@dotterer
43. Chicago | November 12–16, 2012 | #SESCHI
Thanks!
sdotterer@conductor.com
@dotterer
Name, VP of Product and Marketing for Conductor, the leader in SEO platform technology. Excited to talk about Big Data.
This morning, we heard Avinash talk about how to not just measure, but learn from the data points we are measuring.
So what do I know about this? At Conductor, we’re not just a consumer of the benefits of big data, we’re also building big data solutions for our customers, which is near and dear to my heart. Every day our platform gathers data on our customers, their websites, and their competitors’ websites, and provides insight, reporting and workflow around how to get better. We gather enormous amounts of data while we do this.
Up to 50% of all internet traffic is non-human. And that’s through browser type
The structure of the data itself. The structure of the container that hosts the data. The structure of the access method used to access the data.
Some people use the term ‘multi-structured’ to describe this.
Need to capture and store data in bursts.
Storage is cheap, but that has led to some sloppy data storage problems.
It is a challenge to query that data in a responsive enough way.
How long is the delay until the data is accessible?
Hadoop was created by Doug Cutting and Michael J. Cafarella.[5] Doug, who was working at Yahoo at the time,[6] named it after his son's toy elephant.[7] It was originally developed to support distribution for the Nutch search engine project.[8]
2004-2006: gestation. GFS & MapReduce papers are published and directly address Nutch's scaling issues.
The other two classes are NoSQL and Massively Parallel Processing (MPP) data stores.
The evidence is clear: Data-driven decisions tend to be better decisions. I
In 2001 PASSUR began offering its own arrival estimates as a service called RightETA. It calculated these times by combining publicly available data about weather, flight schedules, and other factors with proprietary data the company itself collected, including feeds from a network of passive radar stations it had installed near airports to gather data about every plane in the local sky.
PASSUR started with just a few of these installations, but by 2012 it had more than 155. Every 4.6 seconds it collects a wide range of information about every plane that it "sees." This yields a huge and constant flood of digital data. What's more, the company keeps all the data it has gathered over time, so it has an immense body of multidimensional information spanning more than a decade. RightETA essentially works by asking itself "What happened all the previous times a plane approached this airport under these conditions? When did it actually land?"
After switching to RightETA, the airline virtually eliminated gaps between estimated and actual arrival times. PASSUR believes that enabling an airline to know when its planes are going to land and plan accordingly is worth several million dollars a year at each airport. It's a simple formula: Using big data leads to better predictions, and better predictions yield better decisions.
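The question RightETA is described as asking ("what happened all the previous times a plane approached this airport under these conditions?") amounts to a lookup over history. The sketch below is only a toy illustration of that idea, not PASSUR's method: the records, the two matching conditions (airport and weather), the plain average, and the function name `estimate_minutes_to_landing` are all assumptions invented for the example.

```python
from statistics import mean

# Hypothetical history: (airport, weather, minutes from radar pickup to landing).
history = [
    ("ORD", "snow", 31.0),
    ("ORD", "snow", 35.0),
    ("ORD", "clear", 22.0),
    ("ORD", "clear", 24.0),
    ("LGA", "clear", 19.0),
]


def estimate_minutes_to_landing(airport, weather, records):
    """Average the outcomes of past approaches made under matching conditions.

    Returns None when no past approach matches, so the caller can fall back
    to another estimate instead of trusting an empty average.
    """
    matches = [minutes for a, w, minutes in records if a == airport and w == weather]
    if not matches:
        return None
    return mean(matches)


estimate = estimate_minutes_to_landing("ORD", "snow", history)
print(estimate)
```

The more history that is kept, and the more conditions each record carries, the narrower the set of matching approaches can be made, which is the reason the notes stress keeping data over time.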
Tell the story of me, and how I got here…
NASA’s Big Data Challenge: We have deep space spacecraft that send back data on the order of MB/s. Then we have earth orbiters that can send back data at GB/s. We are planning missions today that will easily stream more than 24 TB a day. That’s roughly 2.4 times the entire Library of Congress, EVERY DAY. For one mission.
Sandy knocked out our washing machine, so I went online and saw some reviews. That information was stored in a DB, but also, while I was looking at it, the site generated recommendations for me based on what I’d already looked at, where I was, and what it thought I was more likely to purchase based on previous visitors.
My wife and I bought the refrigerator and scheduled delivery. I got a text message from Chase 5 minutes later asking if I had really purchased it.
Identify me as a visitor.
Look up my profile (whether my password is right or not, what I’m looking for).
Select content that matches the visitor’s preferences.
Assemble that page.
Remember, in the background it has to build a description of me, compare it to other folks, and query its entire product database to find what I want, my points, what product manuals they should pass along, and reviews that are going to be helpful to me.
Went into the store, and now it’s tying my online buying behavior (my profile) to my offline purchase.
I don’t buy expensive stuff at Best Buy all that often.
My recommendations – People I may know.
And amazingly! Bob Sacco posted an article about how Big Data can add years to a CMO’s tenure.
http://www.kaushik.net/avinash/multi-channel-attribution-definitions-models/
An example of MCA-AMS is the ability to understand that a search I did on my tablet computer while watching a television commercial resulted in a click on a paid search ad to a camera site which logged into my memory which later caused me to read reviews of the camera on my Nexus S while stuck in traffic and that finally caused a sale for Sony when I got home and happened to be on my laptop.
The most common attribution models bundled into even the simplest web analytics tools are: Last click, first click, and even distribution.
If you are lucky, you have access to a more sophisticated tool which would include: Adjustable, based on mathematical algorithms, time decay model.
But most of what you'll get out of playing with these models is a deep and profound appreciation for how they'll, even in their most shining moment, give you directional guidance how to adjust your media spend (shift dollars/euros/pesos from Search to Display or from Display to Email or… other combinations).
You'll realize (even if you use the greatest customized model created by your most magnificent consultant at an equally magnificent cost to you) that success then will come not from that rough output, but rather from your ability to take that rough output, make changes, observe the impact (over weeks, or months if you are small sized), identify insights and be less wrong over time.
If you happen to be in a larger company, say you spend more than $10 million on digital marketing per year, you'll quickly see, having learned to be less wrong over time, that the question you want to answer with multi-channel attribution modeling is not "who gets how much credit" but rather "how can I optimally balance my digital marketing portfolio."
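To make the attribution models named above concrete, here is a minimal sketch of last click, first click, even distribution, and a simple time decay model. It is an illustration under stated assumptions, not how any particular analytics tool computes credit: the function names, the sample path, and the `half_life` parameter are hypothetical, and the time decay version uses each touch's position in the path as a stand-in for real timestamps.

```python
from collections import defaultdict


def _credit(path, weights):
    """Split one conversion across channels in proportion to the weights."""
    total = sum(weights)
    credit = defaultdict(float)
    for channel, weight in zip(path, weights):
        credit[channel] += weight / total
    return dict(credit)


def first_click(path):
    """All credit goes to the first touch in the path."""
    return _credit(path, [1.0] + [0.0] * (len(path) - 1))


def last_click(path):
    """All credit goes to the final touch before the conversion."""
    return _credit(path, [0.0] * (len(path) - 1) + [1.0])


def even_distribution(path):
    """Every touch gets an equal share, so repeated channels accumulate."""
    return _credit(path, [1.0] * len(path))


def time_decay(path, half_life=2.0):
    """A touch's weight halves for every `half_life` touches before the conversion.

    Assumption: position in the path stands in for elapsed time.
    """
    n = len(path)
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    return _credit(path, weights)


# Hypothetical path to one conversion, oldest touch first.
path = ["display", "search", "email", "search"]
for model in (first_click, last_click, even_distribution, time_decay):
    print(model.__name__, model(path))
```

Running the four models on the same path shows how much the answer depends on the model chosen, which fits the point that their output is directional guidance, not a final verdict.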
Approximately 80% of the costs of a data project are spent on prepping the data, mostly cleaning up data quality issues.
If you’re budgeting, don’t make the very common mistake of spending money on frameworks that are only useful once you have clean data.