1. Social Data Analytics using IBM Big Data technologies
Vijay Bommireddipalli vijayrb@us.ibm.com
Development Manager, Social Data Accelerator
IBM Big Data
October 21, 2013
© 2012 IBM Corporation
2. Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or
withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information
about potential future products may not be incorporated into any contract. The
development, release, and timing of any future features or functionality described for
our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or performance that
any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user’s job stream, the
I/O configuration, the storage configuration, and the workload processed. Therefore,
no assurance can be given that an individual user will achieve results similar to those
stated here.
© 2011 IBM Corporation
4. Tag! You’re it!
- Micro-segmentation
5. Social Data Analytics
- Using social media as a rich source of information
Example posts:
• "Maybe our politicians should take a playbook out of the rivalry between duke/unc and take it to the courts http://ity.com/wfUsir"
• "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http://4sq.com/gbsaYR"
• "@silliesylvia good!!! U shouldnt! Think about the important stuff, like ur 43rd birthday ;)"
• "@silliesylvia I <3 your leather leggings!! Its so katniss!!"
• "btw happy birthday Sylvia ;)"
• "dear redbox please have kings speech for my new tv colin firth movie marathon"
• "@silliesylvia $10 dollars says matthew & mary get married next season :) #downtownabbey"
• "OMG OMG. just dropped my new ipad3 crappola!!!"
• "@bamagirl can’t wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing"
Signals annotated on the posts: Behavior, Interest, Location, Consumption, Age, Intent to consume, Prediction
360 degree profile
Personal Attributes
• Sylvia Campbell, Female, In a
Relationship
• 32 years old, birthday on 7/17
• Lives near Raleigh, NC
• College graduate; Income of 80-120k
Buzz/Sentiment
• Retweets BF’s comments
• Interest in BBC shows: Downton Abbey,
Sherlock, Fringe, (P&P?)
• Sherlock Holmes, Robert Downey, Jr.
• Hunger Games, Katniss/J. Lawrence
Interests/Behavior
• Watch movies, tv shows
• Romance plots, “hero types”, strong
women
• Uses iPad 3, Redbox, Hulu
• Shopping, interest in sales/deals
• Duke/UNC basketball
6. Social Data Analytics
- Comprehensive Entity Extraction and Integration
Profile records shown on the slide:
• Name: Jane Doe; Id: jaydee; Address: Home of the Buccaneers; Interests: running, yoga, football…
• Name: Jane Doe; Address: Tampa, FL; Twitter: jaydee; Blog Topic: food; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: Jane Doe, Cava; Address: Tampa, Fl; Twitter: @maryguida; Blog Topic: politics; Hobbies: running, yoga, …; Relationships: Tony C (brother)…
• Name: J Doe; Blog Topic: food
• Name: jane; Address: Tampa, FL; Relationships: Tony C (brother), …
Entity Integration
All names are fictitious
Challenges:
• Scale: 1000s of sites, 100s of millions of users
• Complex matching decisions
• Partial, noisy and incomplete profile attributes
• Only 3% of consumers have sufficient attribute information in their profiles.
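A minimal sketch of the matching problem described above, assuming a hand-picked scoring scheme. The `Profile` fields, the weights (0.4 / 0.4 / 0.2 / 0.2) and the 0.6 threshold are all hypothetical choices for illustration; they are not taken from the Social Data Accelerator. The point is only to show why partial and noisy attributes make matching a graded decision, and how two sparse records can end up linked through a third, richer one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    source: str                      # hypothetical site identifier
    name: str = ""
    address: str = ""
    handle: str = ""
    relationships: frozenset = frozenset()


def norm(value):
    """Lowercase and strip punctuation so 'Tampa, FL' equals 'Tampa, Fl'."""
    return re.sub(r"[^a-z0-9 ]", "", value.lower()).strip()


def name_compatible(a, b):
    """True when every token of the shorter name matches the longer one,
    either exactly or as an initial ('J Doe' vs 'Jane Doe')."""
    ta, tb = norm(a).split(), norm(b).split()
    if not ta or not tb:
        return False
    short, long_ = sorted((ta, tb), key=len)
    return all(
        any(t == u
            or (len(t) == 1 and u.startswith(t))
            or (len(u) == 1 and t.startswith(u))
            for u in long_)
        for t in short
    )


def match_score(p, q):
    """Assumed weights: a compatible name alone is weak evidence (0.4)."""
    if not name_compatible(p.name, q.name):
        return 0.0
    score = 0.4
    if p.handle and norm(p.handle) == norm(q.handle):
        score += 0.4
    if p.address and norm(p.address) == norm(q.address):
        score += 0.2
    if p.relationships & q.relationships:
        score += 0.2
    return min(score, 1.0)


def resolve(profiles, threshold=0.6):
    """Group profiles whose pairwise score reaches the threshold (union-find)."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if match_score(profiles[i], profiles[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, p in enumerate(profiles):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

With the fictitious records from the slide, a name-only record such as "jane" in Tampa does not reach the threshold against the "jaydee" record on its own, but both can match a third record that carries the handle, the address and the shared relationship. The pairwise loop is quadratic, so at the scale quoted above a real system would need blocking or indexing, which this sketch omits.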
7. Consumer Intelligence
Social Media based 360-degree Consumer Profiles
Timely Insights
• Intent to buy various products
• Current Location
Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental
Products Interests
• Personal preferences of products
• Product Purchase history
Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…
Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…
Example posts
Monetizable intent to buy products
• "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"
• "I need a new digital camera for my food pictures, any recommendations around 300?"
Location announcements
• "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"
Life Events
• "College: Off to Stanford for my MBA! Bbye chicago!"
• "Looks like we'll be moving to New Orleans sooner than I thought."
Intent to buy a house
• "I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin"
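The example posts above can be used to sketch how rule-based signal detection might look. This is a toy illustration under stated assumptions: the pattern lists and the `detect_signals` function are hypothetical and written only to cover the posts on this slide; they are not the accelerator's actual extractors, which are far richer.

```python
import re

# Hypothetical hand-written patterns, one list per signal type.
SIGNAL_PATTERNS = {
    "intent_to_buy": [
        r"\bwhat should i buy\b",
        r"\bi need a new\b",
        r"\bthinking about buying\b",
    ],
    "location": [r"\bi'm at\b"],
    "life_event": [r"\boff to \w+ for my\b", r"\bmoving to\b"],
}

# Compile once so the patterns can be reused across many posts.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in SIGNAL_PATTERNS.items()
}


def detect_signals(post):
    """Return the sorted list of signal labels whose patterns match the post."""
    text = post.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    return sorted(
        label
        for label, patterns in COMPILED.items()
        if any(p.search(text) for p in patterns)
    )
```

Keyword rules like these are brittle (sarcasm, negation and misspellings all defeat them), which is part of the motivation for the dedicated text analytics tooling described later in the deck.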
10. Big Data Platform and Accelerators - Summary
Accelerators
• Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
• Provide business logic, data processing, and UI/visualization, tailored for a given use case
• Bundled with Big Data platform components – InfoSphere BigInsights and InfoSphere Streams
Key Benefits
• Time to value
• Leverage best practices around implementation of a given use case.
Platform diagram (top to bottom):
Analytic Applications: BI / Reporting | Exploration / Visualization | Functional App | Industry App | Predictive Analytics | Content Analytics
IBM Big Data Platform: Visualization & Discovery | Applications & Development | Systems Management
Accelerators
Hadoop System | Stream Computing | Data Warehouse | Contextual Search
Information Integration & Governance
Cloud | Mobile | Security
11. Social Media Analytics Architecture
Online flow: Data-in-motion analysis
Stream Computing and Analytics: Social Media Data Ingest and Prep -> Entity Analytics: Profile Resolution -> Extract Buzz, Intent, Sentiment -> Dashboard (real time analytics; pre-defined views and charts)
Offline flow: Data-at-rest analysis
BigInsights System and Analytics: Social Media Data -> Extract Buzz, Intent, Sentiment and Consumer Profiles -> Entity Analytics and Integration -> Comprehensive Social Media Customer Profiles -> Pre-defined Workbooks and Dashboards
Optional: Indexed Search
Data Explorer: Index using Push API; ad hoc access
12. SDA 1.2
Social Media Sources Supported
– Gnip, Boardreader
– Tweets, Boards, Blogs
Analyze Streaming data as well as data at rest
– Streams for processing of streaming data
– BigInsights/Hadoop for input, output and configuration data
Key Micro-segmentation Attributes (out-of-box)
– Personal Info: Gender, Location, Parental status, Marital status, Employment
– Interests: Movie interest, Comic book fan, Product interest, Current customer
of, Products owned
– ** Attributes can be added in (requires some development effort)
Entity resolution across the different social media sources
13. SDA 1.2
Outputs/Measures (out-of-box)
– Buzz
– Sentiment
– Intent to buy/start service
– Intent to attend/see
Example use cases
– Retail – Lead generation, Brand management
– Financial – Lead generation and Brand management
– Media & Entertainment – Brand management
– Generic
Visualization using BigSheets
Extendable/Customizable Solution
14. SDA - Acting on the insights
Metrics based understanding of Feedback in Social Media
– And more importantly, feedback from whom!
Comprehensive (social media) profiles with microsegmentation
information
Campaign execution can be done in Social Media
Entity resolution across the different social media sources
External (social media) to Internal (CRM) linkage **coming
15. SDA Outputs
Pre-defined Workbooks
Dashboards
Granular outputs for further slicing and dicing by Data Scientists
17. BigInsights & Streams Text Analytics
High Performance rule based Information Extraction Engine
Highly scalable solution available for at-rest and in-motion analytics
Pre-built extractors, and toolkit to build custom Extractors
• Rich Extractor library supports multiple languages
• Declarative Information Extraction (IE) system based on an algebraic
framework
Sophisticated tooling to help build, test, and refine rules
Developed at IBM Research since 2004
Embedded in several IBM products
• BigInsights, Streams.
• Lotus Notes
• Cognos Consumer Insights
Section outline: What is TA | Why BigInsights TA | How is TA deployed & used | Dev. tools
18. Applications of Text analytics
Broad range of applications in many industries
• CRM Analytics
Voice of customer
Product and Services gap analysis
Customer churn
• Social Media Analytics
Purchase intent
Customer churn prediction
Reputational Risk
• Digital Piracy
Illegal broadcast of streaming and video content
• Log Analytics
Failure analysis and root cause identification
Availability assurance
• Regulatory Compliance
Data Redaction
• Identify and protect sensitive information
19. Performance Comparison (with ANNIE open source **)
Task: Named Entity Recognition
Dataset: Different document collections from the Enron corpus, obtained by randomly sampling 1000 documents for each size
[Chart: throughput (KB/sec, axis 0 to 700) against average document size (KB, axis 0 to 100) for ANNIE (open source entity tagger) and SystemT. Callouts for SystemT: >10x faster, < 60% memory]
** http://dl.acm.org/citation.cfm?id=1858681.1858695
Performance comparison with GATE 5
20. Text Analytics Development Flow
Declarative language for extractor logic
• Rule based language: Annotator Query Language - AQL, with familiar SQL-like syntax
• Specify annotator semantics declaratively
Optimization and deployment to scalable runtime
• Choose an efficient execution plan
• Highly scalable, embeddable Java runtime
Flow: Development Tooling (Extractor, Sample Input Documents) -> Text Analytics Optimizer -> Compiled Operator Graph -> Text Analytics Runtime -> Extracted Information
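The separation between a declarative extractor specification, a compile step, and a runtime can be illustrated with a small Python sketch. This is a toy stand-in, not AQL and not the SystemT optimizer: the `RULES` format, `compile_rules` and `run` are hypothetical names invented here, and the "compile" step only merges each dictionary into a single regular expression.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    view: str    # name of the rule that produced the span
    begin: int   # start offset in the document text
    end: int     # end offset (exclusive)
    text: str    # the matched text


# Declarative part: what to extract, with no control flow.
# The view names and entries are made up for illustration.
RULES = {
    "FirstName": {"dictionary": ["Jane", "Tony"]},
    "Phone": {"regex": r"\d{3}-\d{4}"},
}


def compile_rules(rules):
    """Turn each declarative rule into one compiled pattern (done once)."""
    compiled = {}
    for view, spec in rules.items():
        if "dictionary" in spec:
            # Longest entries first so longer dictionary terms win on overlap.
            entries = sorted(spec["dictionary"], key=len, reverse=True)
            alternatives = "|".join(re.escape(e) for e in entries)
            pattern = rf"\b(?:{alternatives})\b"
        else:
            pattern = spec["regex"]
        compiled[view] = re.compile(pattern)
    return compiled


def run(compiled, text):
    """Apply the compiled rules to one document and return spans by offset."""
    spans = [
        Span(view, m.start(), m.end(), m.group())
        for view, pattern in compiled.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.begin)
```

Because each span records the view that produced it, a result can be traced back to its rule, which is a very reduced form of the lineage idea the development tooling exposes.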
21. Invoking Text Analytics within BigInsights
Document encoded as JSON record.
Jaql runtime coordinates a multi-stage map-reduce flow.
JAQL Function Wrapper: Input Adapter -> SystemT Runtime -> Output Adapter
AQL and Dictionaries -> SystemT Optimizer -> Compiled Plan -> SystemT Runtime
Input Record
{
  label: "http://www.ibm ...",
  text: "<html>\n<head> …"
}
Output Record (annotations added as additional attributes to the JSON record)
{
  label: "http://www.ibm ...",
  text: "<html>\n<head> …",
  Person:
  [
    { firstName: [10, 15],
      lastName: [16, 25] },
    …
    { firstName: [1042, 1045],
      lastName: [1046, 1050] }
  ],
  Hyperlink:
  [
    { anchorText: [25, 33] },
    …
    { anchorText: [990, 997] }
  ],
  H1: …
}
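The record shapes above suggest a simple contract: the output record is the input record plus one attribute per annotation type, each holding [begin, end] character offsets into `text`. The Python sketch below imitates only that contract. It does not call Jaql or SystemT; the `annotate` function and the `heading` field name are assumptions made for illustration (the slide elides the contents of `H1`), and the regular expressions are a crude substitute for real extractors.

```python
import re

# Crude, illustrative patterns; real HTML should be handled by a proper parser.
ANCHOR = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
HEADING = re.compile(r"<h1\b[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)


def annotate(record):
    """Return a copy of the record with span annotations added as new keys.

    Each span is a [begin, end] pair of character offsets into record["text"],
    with `end` exclusive, mirroring the layout of the output record above.
    """
    text = record["text"]
    out = dict(record)  # the input record itself is left unchanged
    out["Hyperlink"] = [
        {"anchorText": [m.start(1), m.end(1)]} for m in ANCHOR.finditer(text)
    ]
    out["H1"] = [
        {"heading": [m.start(1), m.end(1)]}  # "heading" is a made-up field name
        for m in HEADING.finditer(text)
    ]
    return out
```

Storing offsets rather than copied strings keeps the annotations small and lets a consumer recover the matched text with `record["text"][begin:end]`.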
22. Additional Advantages of IBM Text Analytics
Quality: Drives effectiveness of entire application
• Enables high accuracy and coverage
Performance: Dominant cost is CPU
• Process large documents and large number of documents
with high throughput
Explainability
• Determine the cause of errors and fix them without affecting the
remaining correct results
Reusability: easily adaptable for a different domain
• The development platform must enable layers of abstractions to be built and easily reused
in a different domain
Expressivity
• Rule language with a rich set of operators available to enable complex extraction tasks
23. BigInsights Text Analytics Development
24. AQL editor with content assist
25. Understanding the lineage of results
Click to drill down and see
the rules that triggered
inclusion of results
Explain and search
through the results
26. IBM Text Analytics for Big Data
High Performance Information Extraction Engine
Analysis can be applied to data at-rest and in-motion
• Build extractor once and use with BigInsights or Streams
Parallel execution scales to Big Data volumes
• Linearly scalable to extremely high volumes
Highly customizable to a variety of domains and languages
• Pre-built extractors available out of the box
Sophisticated tooling enables ease of development and refinement of results