2. What Is He Going To Talk About?
What are we trying to do?
How do we approach doing it?
How do you navigate to success?
Creating A Single View
Part 1: Overview & Data Analysis
Part 2: Data Design & Loading Strategies
Part 3: Securing Your Deployment
3. A data platform is a common process, tooling
and management from ingestion to
presentation
• Integrating technologies
• Streamlining ingestion
• Centralizing access
• Supporting analytics
4. I have sat in your chair looking for answers
Source: Elephant by James Fujii @ http://jamesfujiistoryartist.blogspot.com/2010/05/dancing-elephant-april-2010.html
Source: Gorilla by Luigi Lucarelli @ http://loaduniverse.blogspot.com/2012/04/gorilla-sketch.html
13+ years 1.5 years Present
6. Single view use cases abound
Financial Services
• Creating a single view of a product, trader, client
• Generating recommendations to sales people
• Identifying anomalous patterns
• Data sources: client profile, trade data, market data, service catalog
Retail
• Creating a single view of a product or customer
• Generating recommendations to customers
• Proactive, personal marketing to customers
• Data sources: customer profile, loyalty program, inventory, product catalog
Healthcare
• Creating 360-degree patient view
• Managing electronic health-care records for patients
• Population management for at-risk demographics
• Data sources: patient records, sensor data, care center metrics
11. Getting to a Single View of the customer
• Calculation using the aggregation framework
• MapReduce analytics
• Embedding data in context
• Generic container to manage all the contact points and analytical observations
• Full-text search for ad hoc query
• Geo-indexing for extending insight
12. Questions we can now ask …
• What kinds of products has this person purchased?
e.g. distinct( "orders.category", { "id" : 12345678 } )
• How close are they to a point of service right now?
e.g. find( { "location" : { $near : [-73.787, 40.8768] } } )
• Who is most dissatisfied with our service?
e.g. find().sort( { sentiment : 1 } ).limit( 100 )
• What should my customer representative mention in the next conversation?
e.g. find( { "action.topic" : "talkingpoint" } ).sort( { createdOn : -1 } )
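To make the shape of these questions concrete, here is a minimal sketch that answers them against a small in-memory array instead of a live MongoDB collection. The helper functions are plain JavaScript stand-ins, not the MongoDB API, and the sample customers, values and helper names are hypothetical; only the field names (id, orders.category, location, sentiment, action.topic, createdOn) come from the slide.

```javascript
// Hypothetical sample data; field names follow the slide's queries.
const customers = [
  {
    id: 12345678,
    sentiment: -0.6,
    location: [-73.787, 40.8768], // [longitude, latitude]
    orders: [{ category: "shoes" }, { category: "outdoor" }, { category: "shoes" }],
    action: [{ topic: "talkingpoint", text: "Ask about the delayed order", createdOn: "2014-05-10" }],
  },
  {
    id: 23456789,
    sentiment: 0.8,
    location: [-0.1276, 51.5072],
    orders: [{ category: "books" }],
    action: [{ topic: "recommendation", text: "Suggest the new e-reader", createdOn: "2014-05-12" }],
  },
];

// Stand-in for distinct("orders.category", { id }): unique order categories for one customer.
function distinctCategories(docs, id) {
  const matches = docs.filter((c) => c.id === id);
  return [...new Set(matches.flatMap((c) => c.orders.map((o) => o.category)))];
}

// Stand-in for find().sort({ sentiment: 1 }).limit(n): least satisfied customers first.
function mostDissatisfied(docs, n) {
  return [...docs].sort((a, b) => a.sentiment - b.sentiment).slice(0, n);
}

// Rough stand-in for $near: orders customers by squared distance in degrees.
// This ignores the curvature of the earth and is only adequate for a toy example.
function nearest(docs, [lng, lat]) {
  const d2 = (c) => (c.location[0] - lng) ** 2 + (c.location[1] - lat) ** 2;
  return [...docs].sort((a, b) => d2(a) - d2(b));
}

// Stand-in for the "talkingpoint" query. Unlike find(), which returns whole
// customer documents, this returns the matching action entries, newest first.
function talkingPoints(docs) {
  return docs
    .flatMap((c) =>
      c.action
        .filter((a) => a.topic === "talkingpoint")
        .map((a) => ({ customerId: c.id, ...a }))
    )
    .sort((a, b) => b.createdOn.localeCompare(a.createdOn));
}
```

Because everything hangs off the customer document, each question is a single pass over one collection rather than a join across source systems.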
13. Start focused and iterate quickly
1. Focus on a single effort, a couple of data sources, and a
familiar ‘actor’ to avoid being overwhelmed
2. Consider what the data is and a few questions you might ask
of it – this informs the initial data model
3. Prototype, prove then iterate
14. Dynamic schemas enable flexibility and
agility
• Data is coming in many forms
• New data sources surface constantly
• Single-views facilitate analytics and breed more data
• Flexible schemas make it easy to evolve vs. reinvent
• Schemas reflect the access patterns of the data
Source: Sogeti: bigdata-introduction-sogeti-consultingservices-businesstechnology-20130628v5-130723151309-phpapp02.ppt
15. Modeling a single view of the customer begins
with the types of queries we think we will want
to make
• What is the customer (individual or company) doing that is not normal?
• What are the customer’s peers doing that the customer would do, but hasn’t?
• What products has the customer purchased and what is the likelihood that they will
buy something new in the next week | thirty days?
• What will the customer do this week?
• What are customers telling us about themselves, us, the world?
• What content / products are effective at generating sales?
• Who is influential – both us and customers?
• What does the world look like from our proprietary point of view vs. what is
happening in public?
• Which customers should I focus on, what should I show them and why is it smart?
16. What do all these crazy formats look like?
Twitter
timeline
Product
catalog
Comments
Content
Binary
17. Centering around customers enables a single
access pattern for fast rich queries and
iterative extension
• Using arrays to allow for multiple relationships
(organizations & locations)
• Tags as a means to flexibly profile
e.g. top_10_percent, badge_of_honor, flight_risk
• Actions array to store analytical outputs that
describe the customer to sales channels and
people e.g. anomalies, recommendations, predictions
• Products array to store products the
customer has purchased and whether they publicly
engaged, e.g. rating, commenting, tweeting, returning etc.
• Products collection to store details
• Product comments reference Customers
19. Adding streaming customer sentiment to the base
model enables new analytics and customer
insight
• Sentiment analytics could be performed
– Across all customer expressions
– Towards the company
– Towards specific products
• An aggregate sentiment score might place the
customer on a positive / negative scale as a
means to identify predisposition and can be
contrasted with sentiment towards the
company
• Sentiment towards products might be stored
in an array, similar to comments, enabling
context specific dispositions
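As a rough illustration of the scoring described above, here is a minimal sketch that derives an overall score, a score towards the company, and a per-product array from a list of already-scored expressions. The score range (-1 to 1), the field names (expressions, target, productId, score) and the function names are all assumptions for illustration; how each expression gets its score is outside the sketch.

```javascript
// Hypothetical customer with scored expressions; scores run from -1 (negative) to 1 (positive).
const customer = {
  id: 12345678,
  expressions: [
    { target: "general", score: 0.4 },
    { target: "company", score: -0.5 },
    { target: "company", score: -0.1 },
    { target: "product", productId: "p-100", score: 0.9 },
    { target: "product", productId: "p-100", score: 0.5 },
    { target: "product", productId: "p-200", score: -0.2 },
  ],
};

// Arithmetic mean, or null when there is nothing to average.
const mean = (xs) => (xs.length ? xs.reduce((sum, x) => sum + x, 0) / xs.length : null);

function summarizeSentiment(c) {
  const scoresFor = (pred) => c.expressions.filter(pred).map((e) => e.score);

  // Group product-directed scores by product id.
  const byProduct = new Map();
  for (const e of c.expressions) {
    if (e.target !== "product") continue;
    if (!byProduct.has(e.productId)) byProduct.set(e.productId, []);
    byProduct.get(e.productId).push(e.score);
  }

  return {
    overall: mean(scoresFor(() => true)), // across all customer expressions
    company: mean(scoresFor((e) => e.target === "company")), // towards the company
    products: [...byProduct].map(([productId, scores]) => ({ productId, score: mean(scores) })),
  };
}

// The summary could be embedded in the customer document next to the base model.
const sentiment = summarizeSentiment(customer);
```

In this sample the customer is mildly positive overall but negative towards the company, which is the kind of contrast the slide suggests surfacing.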
21. So, we have a model, but what if I need more
than one way to look at my data?
• Aggregate instead of recalculate
Examples: Creating an activity score based on a few variables on the fly
or… Storing a variety of counters, but not the individual records
• Store the original granularity for a detailed query and embed
aggregations
Example: Total number of reviews vs. counting the reviews themselves
• Embedding and linking gives you the best of performance and
management
Example: a master collection for addresses, but an embedded copy for
performance
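The review-count example can be sketched as follows: keep the individual reviews at their original granularity, and maintain an embedded counter on the product at write time so that readers never have to count. This is a minimal in-memory sketch; the field names (reviewCount, ratingSum) and the addReview helper are hypothetical. In MongoDB itself the counter update would typically be an `$inc` on the product document.

```javascript
// Hypothetical product document: reviewCount and ratingSum are embedded aggregates.
const product = { id: "p-100", reviewCount: 0, ratingSum: 0 };

// Hypothetical separate store that keeps every review at its original granularity.
const reviews = [];

// Write path: store the detail record and update the embedded aggregates together.
function addReview(productDoc, reviewStore, review) {
  reviewStore.push({ productId: productDoc.id, ...review });
  productDoc.reviewCount += 1;
  productDoc.ratingSum += review.rating;
}

addReview(product, reviews, { customerId: 12345678, rating: 4 });
addReview(product, reviews, { customerId: 23456789, rating: 5 });

// Read path A: recalculate by scanning the detail records.
const countedFromDetail = reviews.filter((r) => r.productId === product.id).length;

// Read path B: use the stored aggregates, no scan required.
const averageRating = product.ratingSum / product.reviewCount;
```

The trade-off is the one the next slide names: the application now owns keeping the aggregate and the detail in step.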
22. MongoDB shifts responsibilities to developers
• Enforcing security, auditing, redaction, schema
• Assuming change
• Managing the data holistically
• Balancing the decision to embed vs. link and
the extra management of data that may be required
23. What does your single-view look like?
• What is at the center of your single-view?
• What questions are you asking of your data?
• How do you plan to get the data in there?
• How do you make sure it doesn’t exit out the backdoor?
We are here to talk about one of the most coveted and yet seemingly intractable challenges companies face, the single view of their data.
I am relatively new to MongoDB ... I come from large businesses, leading a team at JPMorgan and as a distinguished engineer at IBM. My interests are varied, but one thing I think we can all relate to is the circus we experience when we think about our data.
Everyone, almost no matter the size of the company, has many islands of data … all individually, intentionally and intelligently defined, but as a whole disjointed and fragile.
Hopefully we capture some of the things we are all struggling with, and my experience working with MongoDB over the last few years will be useful.
So, let's get right into it.
We are trying to get from multiple systems … in this case some we control and others we do not … in all their gnarly forms … and create richer document models.
The use case is going to be customer centric – so a single view of the customer – and we are going to look at how we bring these sources together in a more useful fashion and how that might evolve over time.
So, even though we plan to cover a customer centric use case, these single views exist in many industries. These are three examples.
Single view of a trader
Single view of a product
Single patient view
Let's take a look at the high-level architecture for a single view. This should look a lot like what you document for your solutions.
On the left we have the internet – our users, their devices, our applications – increasingly wearable devices.
In the middle we have the web stack and hopefully the expression of some web services.
Let’s zoom in on the heart of this system…
On the right we have a series of data sources from orders, content management, web logs, profiles etc.
Some of that data is magically flowing across the ETL arrows … maybe it flows to Hadoop, maybe it doesn’t.
You might have real-time event data, like a Twitter feed or news, flowing to a stream analytics engine like Storm.
What we do know is that the life of data in MongoDB is not static. It exists as an evolving data set where real-time and offline analytics continuously enhance the profile.
This approximates the relationships between some of the data sources we have been considering.
The green elements show external systems, where we might simply be loading data (i.e. we do not control them).
The other systems are ones we own, and the colors mark discrete systems.
This is just a portion of what this might look like and it is useful when comparing it with a MongoDB mental model.
There are a few things happening here, so let's just take a moment with this diagram.
First, we have two major views or documents being created – customers and products.
A bunch of the data that was once external is now embedded into the document.
Other elements are linked, but you can see some elements are now being aggregated and stored into the respective documents.
For example, the web traffic is being aggregated to produce the top pages for a given user … you could imagine linking that to the topics that might be of interest through categories or tags.
You might use MapReduce to crunch a sentiment score … etc. etc.
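The top-pages aggregation mentioned above can be sketched in a few lines. This is a minimal in-memory version, assuming a hypothetical web log of { customerId, page } entries and a hypothetical topPages helper; in practice the same grouping could be expressed with the aggregation framework or MapReduce and the result embedded in the customer document.

```javascript
// Hypothetical web log entries, one per page view.
const webLog = [
  { customerId: 12345678, page: "/shoes" },
  { customerId: 12345678, page: "/returns" },
  { customerId: 12345678, page: "/shoes" },
  { customerId: 23456789, page: "/home" },
  { customerId: 12345678, page: "/home" },
  { customerId: 12345678, page: "/shoes" },
  { customerId: 12345678, page: "/returns" },
];

// Count page views for one customer and keep the n most visited pages.
// Ties are broken alphabetically so the output is deterministic.
function topPages(log, customerId, n) {
  const counts = new Map();
  for (const hit of log) {
    if (hit.customerId !== customerId) continue;
    counts.set(hit.page, (counts.get(hit.page) || 0) + 1);
  }
  return [...counts]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([page, hits]) => ({ page, hits }));
}

// The result is small enough to embed directly in the customer document.
const customerTopPages = topPages(webLog, 12345678, 2);
```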
Which highly rated reviews have “health” in the user-supplied text? e.g. find( { $text : { $search : "health" } } )
How often has this person reviewed a product? e.g. reviews.count()
Here is the thing … this process is iterative and is going to require repetition to hone.
So focus on a single effort and stop overwhelming yourself.
What would you ask if you could … this links you back to the business which justifies the whole activity in the first place.
Prototype.
The good news is that MongoDB’s dynamic schemas make this easy.
Whatever we think are today’s requirements, they will change.
The data types we are dealing with certainly have changed.
The key is that schemas reflect the access patterns – the use – of the data.
The simplest way to figure out which single view you are trying to create is to list all the questions your business is interested in asking and see what the main nouns are.
Items in RED are other nouns, but the prevailing theme is the customer.
So what do all these formats look like …
Everything from opaque binaries with a little metadata, Twitter feeds, product catalogs, social commenting, rating services, content management systems etc. This stuff is structured for the most part, but it is understandably not linked, and it would be valuable if it were!
----
Content: http://developer.nytimes.com/docs/article_search_api#h2-examples
Comments: http://developers.gigya.com/037_API_reference/030_Comments/comments.getComments
Product catalog (ebay): https://developer.ebay.com/DevZone/Product/CallRef/findProducts.html
Binary is from Wikipedia
Twitter: https://dev.twitter.com/docs/api/1.1/get/statuses/user_timeline
So let's actually look at what this might look like …
Customer is modeled with metadata + a generic container called actions and a more defined array for products.
Notice how products are managed externally. This is because some of the content does not make sense to embed. What is being embedded is the key and whatever summary or metadata might be useful. For example, did the customer return the item?
Let’s take a look at what the document that represents this structure might look like.
At the top we have the typical metadata about the customer, then actions and then products. [explain what is in actions]
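As one possible shape for such a document, here is a minimal sketch with customer metadata at the top, then tags, the generic actions container, and the products array holding only keys plus summary metadata. Every value and most field names (organizations, locations, purchasedOn, returned, rating) are hypothetical, as are the separate products collection and the purchaseSummary helper; this is an illustration of the structure described in the deck, not the presenter's actual document.

```javascript
// Hypothetical customer document following the structure described in the deck.
const customer = {
  // Typical metadata about the customer.
  id: 12345678,
  name: "Example Customer",
  organizations: ["Example Corp"], // arrays allow multiple relationships
  locations: [{ label: "home", coordinates: [-73.787, 40.8768] }], // [longitude, latitude]
  tags: ["top_10_percent", "flight_risk"],

  // Generic container for analytical outputs aimed at sales channels and people.
  actions: [
    { type: "recommendation", topic: "talkingpoint", text: "Mention the loyalty upgrade", createdOn: "2014-05-10" },
    { type: "anomaly", topic: "spending", text: "Order volume dropped sharply", createdOn: "2014-05-01" },
  ],

  // Embedded key plus summary metadata; product details live elsewhere.
  products: [
    { productId: "p-100", purchasedOn: "2014-04-02", returned: false, rating: 4 },
    { productId: "p-200", purchasedOn: "2014-04-20", returned: true },
  ],
};

// Hypothetical products collection that stores the details.
const products = [
  { id: "p-100", name: "Trail Shoe", description: "Lightweight trail running shoe" },
  { id: "p-200", name: "Rain Jacket", description: "Packable waterproof jacket" },
];

// Follow the links only when the details are needed.
function purchaseSummary(c, catalog) {
  const byId = new Map(catalog.map((p) => [p.id, p]));
  return c.products.map((p) => ({
    name: byId.has(p.productId) ? byId.get(p.productId).name : null,
    returned: p.returned,
  }));
}
```

Most reads (tags, actions, whether an item was returned) are served from the customer document alone; the lookup into products is the exception rather than the rule.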
So, great! We have had success and just as would happen in real life, we need to add a feature. Not 24 hours ago we didn’t even have a single view and now all of a sudden we need to extend it.
In this case we are being asked to include sentiment analytics in our system to help customer representatives and our offer systems be aware and target customers more effectively.
Customer: Personal sentiment and sentiment to the company
Product: Sentiment around a specific product / specific version of a product
So many companies are facing change as a result of innovations.
What questions do I want to answer … often I get questions about DBA etc.
The call to action is to think about where there is opportunity and what you are anchoring your data hub around.
----- Meeting Notes (5/19/14 12:48) -----
How are you going to organize around ....
Data hub is here in the deck
Refining ... animals are mating ... horizon ....
----
Data source that might exist for each example ...
----