Rohit Chatter is a principal architect at inMobi who has 17 years of experience working for companies like Yahoo!, Tivo, and Alcatel Lucent. He specializes in designing big data solutions using technologies like Hadoop, Hive, and HBase. In this presentation, he discusses the opportunities and challenges of big data, including issues around data growth, access, and timely insights. He then describes the features a big data BI product should have, such as custom reports, dashboards, and the ability to ingest, define relationships, and visualize large amounts of data quickly and easily. Finally, he provides examples of how big data BI can help industries like media, e-commerce, and telecommunications.
Big Data BI Simplified
1. Big Data BI Simplified
Rohit Chatter
rohitchattar@gmail.com
Twitter: @Rohitchatter
2. agenda
Who I am – Rohit Chatter
Big Data - Preview
Big Data – Opportunities & Challenges
The Big Data Product
What’s Inside – 10,000 Feet
Use Cases
3. Rohit Chatter was Senior Architect at Yahoo! in the Advertiser and Data Platform group. He is now Principal Architect - Analytics at inMobi.
He is a thought leader specializing in designing solutions involving huge amounts of data. He architected the Paid Search BI stack for the Microsoft-Yahoo alliance, which uses Hadoop, Hive, GraphDB & HBase.
He has deep knowledge and understanding of various usage models involving traditional databases and newer Big Data platforms to provide customer-centric and cost-effective solutions.
He has spent 17 years in the industry. Before joining inMobi, he worked for companies like Yahoo!, Tivo, Alcatel Lucent, and TCS. Some of his recent projects include BI solutions for Paid Search Advertiser Analytics, Partner Analytics and Web Analytics.
rohitchattar@gmail.com
Speaker @ TDWI Bangalore Chapter
Panel Member @ Hadoop The Fifth Elephant
Business Domain:
Web Analytics
Search Advertising Analytics
Publisher Analytics
Technology:
Hadoop, Hive, HBase, RDBMS, BI tools & technology, Data Modeling
6. “Information is the oil of the 21st century,
...and analytics is the combustion engine.”
“Unfortunately, we spend 80% of the time
collecting data and 20% analyzing it.”
“With increasing importance of precise and timely insights, analysts
want to be able to deliver accurate data reports quickly.”
10. Business Problems
Scale
IT/BI Business
• Data growth with time
• Granularity needed for right business decisions
Data Reach
• Ease of data access
• Distance between data and business
• One-time reports for investigation or validation of analysis
Reprocessing
• Data reprocessing becomes a nightmare
• IT always in catch-up mode
Timely Insights
• Data acquisition to Insight – In Time
Low Flexibility for New Reports & Dashboards
• Add new dimensions and metrics with complex business rules
• Modify reports
• New dashboards
Engineering Involvement
• Huge dependency on IT/BI team on a day-to-day basis
12. What should Big Data BI have?
BI Framework on Hadoop
• Custom Reports & Dashboard
• Canned & Schedule based reports
• Cubes (Yes!! On Hadoop)
• Pivot interface for Visualization & Dashboard
STAR Model on Hadoop
• Define Entities & Relationship
• Define complex metrics
• Define dimensions
Data to Analytics - Improved SLAs
• Significantly reduces time to analytics from the time data is acquired
Single Sign On
Analytics, Dashboards & Reports
• Business grouping of reports
• Report Designer
• Dashboard Builder
• Adhoc Analysis
Scalable & Pluggable architecture
• Any Source
• HBase, Solr, GraphDB, Pig, Shark, Impala, Hive, Oracle, MySQL
Data Re-processing – Simplified
• All data processing happens on grid and stays on grid
Security
• Report & Data access are managed via roles
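The STAR model items above (entities and relationships, dimensions, complex metrics) can be illustrated with a small in-memory sketch. This is not the product's implementation; the table names, columns, sample rows, and the click-through-rate metric are all hypothetical and only show how a fact table is rolled up along a dimension attribute.

```python
from collections import defaultdict

# Hypothetical dimension table: campaign_id -> descriptive attributes.
campaign_dim = {
    1: {"name": "Spring Sale", "channel": "search"},
    2: {"name": "Brand Push", "channel": "display"},
    3: {"name": "Retargeting", "channel": "display"},
}

# Hypothetical fact table: additive measures keyed by the dimension's id.
fact_rows = [
    {"campaign_id": 1, "impressions": 1000, "clicks": 50},
    {"campaign_id": 2, "impressions": 4000, "clicks": 40},
    {"campaign_id": 3, "impressions": 1000, "clicks": 60},
    {"campaign_id": 1, "impressions": 1000, "clicks": 30},
]


def rollup(facts, dim, attribute):
    """Join facts to the dimension and aggregate measures by one attribute."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in facts:
        key = dim[row["campaign_id"]][attribute]  # the entity relationship
        totals[key]["impressions"] += row["impressions"]
        totals[key]["clicks"] += row["clicks"]
    # Derived metric: computed after aggregation, not summed per row.
    return {
        key: dict(measures, ctr=measures["clicks"] / measures["impressions"])
        for key, measures in totals.items()
    }


by_channel = rollup(fact_rows, campaign_dim, "channel")
for channel, measures in sorted(by_channel.items()):
    print(channel, measures)
```

The ratio metric is derived from the aggregated measures rather than averaged row by row, which is the usual reason to define metrics separately from dimensions.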
14. Simplify?
INGEST DEFINE RELATIONSHIP VISUALIZE
[Chart: BigData BI vs. Others, compared on New Dashboard, Self Serve, Data Accessibility, and Data to Insights; scale 0 to 25; labels: In Hours, Days]
18. Where can BigData BI help?
Media Industry
• Audience Engagement, User Value life cycle, User Behavior
• Ad Network – Campaign optimization, Better ROI, Brand Performance
• Exchange
E-commerce
• Recommendation engine
• Sentiment Analysis & Brand loyalty
19. CHURN PREDICTION FOR A TELECOM OPERATOR
Identify the risky customers and develop focused strategies to retain them.
SOLUTION APPROACH
► Dependent variable to define attritors: a customer was defined as an attritor if they had made fewer than 2 calls over a period of 3 months.
► Logistic regression was used to develop a model equation to calculate an attrition propensity score for all customers.
► Customer scores were developed to rank them into high, medium, and low attritors.
Predicted value vs. observed value:
Observed value               Likelihood for attrition   Likelihood for no attrition   Total
Customers on Attrition       8,422                      1,824                         10,246
Customers on No attrition    1,708                      14,012                        15,720
Total                        10,130                     15,836                        25,966
The statistical model performed 2.67% better than random prediction.
BUSINESS IMPACT
Based on the model, customers were proactively targeted with a marketing offer, which reduced attrition and resulted in $2.3 MM inc. volume.
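As a rough illustration of the churn slide, the sketch below does two things: it derives accuracy, recall, and precision from the predicted-vs-observed counts shown on the slide, and it shows how a logistic propensity score could be ranked into high, medium, and low tiers. The counts come from the slide; the predictor (`days_since_last_recharge`), the coefficients, and the tier cut-offs are hypothetical and are not the fitted model from the case study.

```python
import math

# Counts taken from the slide's predicted-vs-observed table.
tp = 8422    # observed attrition, predicted attrition
fn = 1824    # observed attrition, predicted no attrition
fp = 1708    # observed no attrition, predicted attrition
tn = 14012   # observed no attrition, predicted no attrition

total = tp + fn + fp + tn
accuracy = (tp + tn) / total   # share of customers classified correctly
recall = tp / (tp + fn)        # share of actual attritors that were flagged
precision = tp / (tp + fp)     # share of flagged customers who did attrite


def propensity(days_since_last_recharge, intercept=-3.0, slope=0.1):
    """Logistic attrition score in (0, 1); the coefficients are hypothetical."""
    z = intercept + slope * days_since_last_recharge
    return 1.0 / (1.0 + math.exp(-z))


def tier(score, high=0.6, low=0.3):
    """Rank a propensity score into a tier; the cut-offs are hypothetical."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


print(f"total={total} accuracy={accuracy:.3f} "
      f"recall={recall:.3f} precision={precision:.3f}")
for days in (0, 30, 60):
    score = propensity(days)
    print(days, round(score, 3), tier(score))
```

In practice the intercept and slope would come from fitting the regression on labelled history, and the tier cut-offs would be chosen against the cost of the retention offer.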
20. Customer Life Time Value
Develop targeted marketing programs for high potential/high value clients
SOLUTION APPROACH
Segmentation:
The natural segmentation conducted through K-Means clustering showed 4 distinct segments:
S1 – Utility Customers
S2 – Premium and Loyal Customers
S3 – Premium and Careful
S4 – Service Shy
The final cluster consisted of low-value customers, though the number of customers in that segment was high.
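For readers unfamiliar with K-Means, the following is a minimal sketch of the clustering step (Lloyd's algorithm) in plain Python. The two features (annual spend, service visits), the sample customers, and the starting centroids are hypothetical; the case study does not say which variables were used, and real data would normally be scaled before clustering.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = [min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels


# Hypothetical customers: (annual spend, service visits per year).
customers = [(100, 2), (110, 3), (900, 20), (920, 22),
             (500, 10), (520, 11), (300, 30), (310, 32)]
start = [(100, 2), (900, 20), (500, 10), (300, 30)]

centroids, labels = kmeans(customers, start)
print(centroids)
print(labels)
```

Starting centroids are passed in explicitly so the run is repeatable; a production run would typically try several random initializations and keep the best one.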
[Chart: SAMPLE OUTPUT: LABOR REVENUE FROM 4 SEGMENTS; axes: Labor Revenue (-500 to 2500) and Total Revenue Share (-10% to 50%)]
BUSINESS IMPACT
► Behavioral change among the customers falling in the two groups of interest represented over $12 million of revenue to be gained annually.
► The CLTV was compared with marketing investment per customer to assess the viability of customer acquisition. The organization reduced marketing investment by 35% and increased revenue by 43%.
Customer Life Time Value
The NPV method was employed for calculating the Customer Life Time Value (CLTV).
A CLTV model was built for each segment, and the CLTV of each customer was calculated.
Based on the CLTV values, customers were further segmented into High value, Moderate value, and Low value.
SAMPLE OUTPUT: Top 10 customers of S2 Segment CLTV
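The NPV approach to CLTV can be sketched as discounting a customer's expected future cash flows back to today, then comparing the result with the marketing investment per customer. The cash flows, discount rate, band thresholds, and acquisition cost below are hypothetical; the slides do not give the actual model inputs.

```python
def cltv_npv(cash_flows, discount_rate):
    """Net present value of expected per-period cash flows.

    cash_flows[0] is the cash flow at the end of period 1.
    """
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


def value_band(cltv, high=1000.0, low=300.0):
    """Band a customer by CLTV; the thresholds are hypothetical."""
    if cltv >= high:
        return "High value"
    if cltv >= low:
        return "Moderate value"
    return "Low value"


def is_viable(cltv, acquisition_cost):
    """Acquisition pays off only if CLTV exceeds the marketing spend."""
    return cltv > acquisition_cost


# Hypothetical customer: 110 after year 1 and 121 after year 2, at a 10% rate.
example = cltv_npv([110.0, 121.0], 0.10)
print(round(example, 2), value_band(example), is_viable(example, 150.0))
```

With these inputs each year's cash flow discounts to about 100, so the CLTV is about 200.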