2. Data-Driven, a Survival Must?
BI: Goal, Mindset
Industry Winner Loser
Online Retail
DVD Rental
Social Network
The same story will play out in other industries soon, if it has not already.
BI is a major tool for turning a company into a data-driven one.
3. What Can a BI System Help With?
Business Environment
• Globalization
• Customer Demand
• Market Conditions
• Competition
• Technology Advance
• Regulations
• …
Pressures, Opportunities
Organization Responses
• Strategic Planning
• New Business Models
• Restructure Business Processes
• Supply Chain Optimization
• Improve Partnership Relationships
• Improve Information Systems
• Encourage Innovation
• Improve Customer Service
• Improve Communication
• Improve Data Access
• Automate tasks
• Real-time Response
• …
Decision and Support
• Analysis
• Predictions
• Decisions
Business Intelligence Support
Turban, E., Sharda, R., Delen, D., and King, D. (2010). Business Intelligence: a managerial approach (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
4. Without a Good BI System – Information Everywhere, but Hard to Access
Executives
Managers: Why can't I have the right data at the right time?
Analysts: Why should I waste most of my time getting the data?
Data Engineers: Ad hoc queries all day? The data job is so boring.
Operators: Can I get all the relevant data in one place?
5. Without a Good BI System – Inefficiency Due to Information Silos
O9 Solutions, Inc. Funny Business: Sales and Operations, accessed January 28, 2016, https://www.linkedin.com/hp/update/6098306831355047936
6. How Can a BI System Help?
• Better, Quicker Business Decision-Making
• Views of Business Operations and Performance: Historical, Current, Predictive
• Benchmarking, Performance Management, Reporting, Analytics, Data Mining, Predictive Analytics
• Internal Data: Finance, Sales, Returns, Supply, Production, Web, Email, User Usage
• External Data: Industry Analysis, Competitive Analysis, Social Analysis, Product Ranking, Technology Analysis
7. How BI Delivers Core Value to Customers
• Business Intelligence: Tableau, QlikView, Amazon QuickSight
• Web/Mobile Analysis: Google Analytics, Adobe Analytics, SEO
• Social Analysis: Facebook, Twitter, Pinterest, WeChat
• Cloud Big Data Warehouse: Amazon Redshift
• Machine Learning: R, Python, Amazon ML
8. Web/Mobile/App Analysis – Audience (demographics, interests, user type)
Davis, J. (2015). Google Analytics Demystified: A Hands-On Approach (2nd edition). CreateSpace.
BI: Web/Mobile Analysis
11. Adobe Discover: Navigation Flow Among Top Pages/Content
Adobe training video. Retrieved from https://outv.omniture.com/.
12. Adobe Discover: Navigation Flow from a Page
Adobe training video. Retrieved from https://outv.omniture.com/.
15. Visitor Segmentation Study via Adobe SiteCatalyst
Visitor segments: New Visitors, Casual Visitors, Loyal Visitors, Lapsed Visitors
• What are they viewing, and how?
• Why did they leave?
• How can we engage them more?
• How can we connect with them?
• Growing the loyal visitors is essential to keep the site thriving.
• So it is important to understand their navigation patterns and what they like and dislike.
16. Visits from Social Channels
Facebook
Pinterest
Twitter
BI: Social Analysis
17. Top Facebook Posts – Facebook Insights
Talking about this:
Engaged Users:
Reach:
Engagement Rate:
19. Trace Pin in Pinterest – Curalate.com
• ~1000 visits from this pin.
• The pinboard YUM, owned by another account, has 655 Pins and 168,699 Followers.
• Keywords used to find the pin: should the pin be tagged this way?
20. Tao of Social Media
BI: Data Visualization
Schaefer, M. (2012). The Tao of Twitter: Changing Your Life and Business 140 Characters at a Time. NY: McGraw-Hill Education.
• Tao 1: Making Targeted Connections
• Tao 2: Providing Meaningful Content
• Tao 3: Offering Authentic Helpfulness
21. Tao in Chinese
圣人无常心,以百姓心为心。
-- 老子 道德经 四十九章
The sage has no mind of his own
He is aware of the needs of others.
-- Lao Tsu, Tao Te Ching
Lao Tsu (1997). Tao Te Ching - 25th Anniversary Edition. Translated by Gia-fu Feng and Jane English, Chapter 49. NYC: Vintage Books / Random House.
22. What is Big Data
• Forrester: Big Data is the frontier of a firm's ability to store,
process, and access (SPA) all the data it needs to operate
effectively, make decisions, reduce risks, and serve customers
• IBM: Big data is the data characterized by 3 attributes:
volume, variety, and velocity
Walker, R. (2015). From Big Data to Big Profits: Success with Data and Analytics, chapter 1. NYC: Oxford.
BI: Data Flow Architecture
23. Big Data Market is Growing Fast
Kelly, J. (2015). Executive Summary: Big Data Vendor Revenue and Market Forecast, 2011-2026, accessed January 21, 2016,
http://wikibon.com/executive-summary-big-data-vendor-revenue-and-market-forecast-2011-2026/
24. Big Data in Cloud
Wikipedia. Cloud Computing, accessed January 28, 2016, https://en.wikipedia.org/wiki/Cloud_computing
• Big data is moving to the cloud fast.
• Applications in the cloud are generating more big data in the cloud.
25. BI Data Flow Architecture With ETL
Data sources: Relational Database, NoSQL Store, Excel File, Text File, Web, Social
Extract → Transform → Load → BI Data Warehouse → BI System
Operations shown: Standardize Primary Keys, Cleaning, Transform Format, Translate Embedded Logic, Referential Integrity Check, Indexing, Summarization, Derivation, Merge, Sort, Integration, Aggregation
Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.
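The extract, transform, and load flow on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two CSV extracts, the table names (store, sale), and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the BI data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts; a real pipeline would read files or query databases.
STORES_CSV = "store_id,region\n S01 ,East\ns02,West\n"
SALES_CSV = "store_id,amount\ns01,100.50\nS02, 20\nS02,n/a\nS99,5\n"


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def standardize_key(value):
    """Transform: standardize primary keys (trim whitespace, upper-case)."""
    return value.strip().upper()


def transform_sales(rows, valid_store_ids):
    """Transform: clean amounts and check referential integrity against stores."""
    clean, rejected = [], []
    for row in rows:
        store_id = standardize_key(row["store_id"])
        try:
            amount = float(row["amount"])  # format conversion: text to number
        except ValueError:
            rejected.append(row)           # cleaning: unparseable amount
            continue
        if store_id not in valid_store_ids:
            rejected.append(row)           # referential integrity check failed
            continue
        clean.append((store_id, amount))
    return clean, rejected


def load(conn, stores, sales):
    """Load: write the cleaned rows into the warehouse tables and index them."""
    conn.execute("CREATE TABLE store (store_id TEXT PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE sale (store_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO store VALUES (?, ?)", stores)
    conn.executemany("INSERT INTO sale VALUES (?, ?)", sales)
    conn.execute("CREATE INDEX idx_sale_store ON sale (store_id)")
    conn.commit()


stores = [(standardize_key(r["store_id"]), r["region"].strip())
          for r in extract(STORES_CSV)]
sales, rejected = transform_sales(extract(SALES_CSV), {s[0] for s in stores})

conn = sqlite3.connect(":memory:")
load(conn, stores, sales)

# Aggregation: total sales per region, as a BI report might request.
totals = conn.execute(
    "SELECT s.region, SUM(x.amount) FROM sale x "
    "JOIN store s ON s.store_id = x.store_id "
    "GROUP BY s.region ORDER BY s.region"
).fetchall()
print(totals)
print(len(rejected), "rows rejected")
```

Rejected rows are kept rather than silently dropped, so that data quality problems can be reported back to the data owners.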
26. Facebook’s Data Space Management with Open Source Tools
Data sources: Transactional Databases, Application Logs, Web Crawls (Post)
Data types: Structured Data, Unstructured Data
Hadoop Distributed File System (HDFS)
Parallelized Data Processing at Massive Scale
Hive: Data Warehousing Framework, with Query Language and Query UI (HiPal)
15 terabytes of new data per day in 2009
Tools
• Argus: Portal for Sharing Charts and Graphs
• Databee: Workflow Management System
• PyHive: Python Script Framework for MapReduce
• Cassandra: Storage System for Serving Data to End Users
Hammerbacher, J. (2009). Information platforms and the rise of the data scientist. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 5. Sebastopol, CA: O’Reilly Media.
27. Teradata Unified Data Architecture
“Teradata Unified Data Architecture in Action,” Teradata Corporation, accessed April 19, 2014, http://www.teradata.com/white-papers/Teradata-Unified-Data-Architecture-in-Action/
28. Amazon Big Data Portfolio
“Introduction to Amazon Redshift,” Pavan Pothukuchi, accessed January 15, 2016,
http://www.slideshare.net/AmazonWebServices/dat201-introduction-to-amazon-redshift
29. Amazon Redshift Benefits
“Introduction to Amazon Redshift,” Pavan Pothukuchi, accessed January 15, 2016,
http://www.slideshare.net/AmazonWebServices/dat201-introduction-to-amazon-redshift
30. Amazon Redshift Architecture
“Introduction to Amazon Redshift,” Pavan Pothukuchi, accessed January 15, 2016,
http://www.slideshare.net/AmazonWebServices/dat201-introduction-to-amazon-redshift
31. Machine Learning
BI: Machine Learning
“Machine Learning,” Andrew Ng, accessed January 20, 2016, https://www.coursera.org/learn/machine-learning
• Definition
– Field of study that gives computers the ability to learn without being explicitly
programmed. -- Arthur Samuel (1959).
• Examples:
– Database mining
• Large datasets from growth of automation/web.
• E.g., Web click data, medical records, biology, engineering
– Applications that can’t be programmed by hand.
• E.g., Autonomous helicopter, handwriting recognition, most of Natural Language Processing (NLP), Computer Vision.
– Self-customizing programs
• E.g., Amazon, Netflix product recommendations
– Understanding human learning (brain, real AI).
32. Neuron & Neural Networks
“Machine Learning,” Andrew Ng, accessed January 20, 2016, https://www.coursera.org/learn/machine-learning
Figure: neural network for multi-class image classification. Classes: Pedestrian, Car, Motorcycle, Truck. Input: image pixels. Output: judgement, with one target output each for pedestrian, car, motorcycle, etc.
33. Use Amazon ML for Filtering Actionable Tweets
Alex Ingerman (2015). “Real-World Smart Applications with Amazon Machine Learning,” accessed January 29, 2016,
https://www.youtube.com/watch?v=sHJx1KJf8p0
Tweets mentioning AWS → Human Label (Customer Service Actionable / Customer Service Not Actionable) → Training ML Model
• A tweet analysis model developed and trained by Amazon automatically finds the tweets that are actionable for customer service.
34. Beautiful Data Visualization
Iliinsky, N. (2010). On beauty. In Steele, J. & Iliinsky, N. (Eds.). Beautiful visualization, Chapter 1. Sebastopol, CA: O’Reilly Media.
• Informative
– Reveal intended message clearly with enough data
– With different perspectives to facilitate discovery
• Efficient
– Visually emphasize what matters and reveal relationship
– Use axes, color and size to convey meaning
• Novel
– Break the limit of default format, choose best format to suit data
– A fresh look at the data
– A new level of understanding
• Aesthetic
– Appropriate usage of graphical construction to offer visual appeal.
35. Napoleon’s March to Russia in 1812 - 1813
Tufte, E. (2001). The Visual Display of Quantitative Information (2nd ed.). (Original by Charles Joseph Minard.) Connecticut, US: Graphics Press.
• Army size
• Geo location
• Move direction
• Temperature
• Date
• Event
36. When Relevant Information Is Put Together…
Tufte, E. (2001). The Visual Display of Quantitative Information (2nd ed.). Connecticut, US: Graphics Press.
A cholera epidemic took the lives of 600 Londoners in September 1854. Nobody knew the cause. Dr. John Snow mapped the locations of the cases and linked them to a particular pump site. It was later verified that the Broad Street pump was the source of the epidemic. Sanitation measures followed, and the epidemic was stopped.
37. When We Do Literally What the User Asked…
• Could I have the top 10 stores in BI?
• No problem. Here you are!
• I see. Thanks.
• I thought you would be very interested.
• I was… but only for 10 seconds…
38. If We Add a Cyclic Group of Category and Brand …
Change Dimension
Drill-down to Phone
Drill-down to Apple
39. Can Business Intelligence Match Human Intelligence?
How were the six tech companies organized? (Manu Cornet, 2011) http://www.bonkersworld.net/organizational-charts/
Can a BI system deliver insights this straightforward and drive users to think deeply?
40. Information in Well-designed Dashboard
BI: Dashboard Design
• Exceptionally well organized
– All important data in one page
• Condensed, primarily in the form of summaries and exceptions
– Single numbers from sums or averages.
– Something falls outside the realm of normality, which needs attention.
• Specific to and customized for the dashboard’s audience and objectives
– Information should be narrowed to address the objective(s).
– Use audience’s vocabulary.
• Displayed using concise and often small media that communicate the data
and its message in the clearest and most direct way possible.
– Reduce the non-data pixels.
– Enhance the data pixels.
Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
41. Define Key Performance Indicators (KPIs)
Category | Measures
Sales | Bookings, Billings, Sales pipeline, Number of orders, Order amounts, Selling prices
Marketing | Market share, Campaign success, Customer demographics
Finance | Revenues, Expenses, Profits
Web Services | Number of visitors, Number of page hits, Visit durations

Comparative Measure | Example
The same measure at the same point in time in the past | The same day last year
The same measure at some other point in time in the past | The end of last year
The current target for the measure | A budgeted amount for the current period
A prior prediction of the measure | Forecast of where we expected to be today
An extrapolation of the current measure | Projection out into the future, e.g. year end
Some measure of the norm for this measure | Average, normal range, or a benchmark
Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
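As a rough illustration of how a KPI can be set against the comparative measures above, the sketch below compares a single measure with the same day last year, the end of last year, a target, and a prior forecast. All figures, dates, and names (revenue, TARGET, FORECAST) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from datetime import date

# Hypothetical daily revenue figures keyed by date (illustrative numbers only).
revenue = {
    date(2015, 1, 28): 1000.0,   # the same day last year
    date(2015, 12, 31): 1500.0,  # the end of last year
    date(2016, 1, 28): 1200.0,   # today
}
TARGET = 1100.0    # assumed budgeted amount for the current period
FORECAST = 1250.0  # assumed prior prediction of today's value


def pct_change(current, reference):
    """Percentage difference of the current value relative to a reference."""
    return round((current - reference) / reference * 100, 1)


def compare(today):
    """Compare today's measure against several comparative measures."""
    current = revenue[today]
    # Note: replace(year=...) would fail for Feb 29; fine for this sketch.
    same_day_last_year = revenue[today.replace(year=today.year - 1)]
    end_of_last_year = revenue[date(today.year - 1, 12, 31)]
    return {
        "vs_same_day_last_year_pct": pct_change(current, same_day_last_year),
        "vs_end_of_last_year_pct": pct_change(current, end_of_last_year),
        "vs_target_pct": pct_change(current, TARGET),
        "vs_forecast_pct": pct_change(current, FORECAST),
    }


result = compare(date(2016, 1, 28))
for name, value in result.items():
    print(f"{name}: {value:+.1f}%")
```

Showing the measure as a percentage against each reference point, rather than as a bare number, is what lets a dashboard reader judge at a glance whether the value is good or bad.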
42. Effective Dashboard Display Media
Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
• Easier to spot trend with line chart
• Clean display of related data
• Simple symbol or number
43. Utilize Short-Term Memory
Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
• Memory comes in three fundamental types:
– Iconic memory (a.k.a. the visual sensory register)
– Short-term memory (a.k.a. working memory)
– Long-term memory
• Only 3-9 chunks of information can be stored in short-term memory.
• Graphs over text.
– Individual numbers are stored in discrete chunks.
– One or more lines in a line graph can represent a great deal of information as a single chunk.
• Relevant information on the same screen.
– Once the information is no longer visible, unless it is one of the few chunks of information stored in short-term memory, it is no longer available.
– If everything remains within eye span, users can exchange information in and out of short-term memory at lightning speed.
44. Sample Sales Dashboard
Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.
45. When a Dashboard Is Not Enough -> Self-Service BI
• As soon as a dashboard shows abnormalities, users will want to know more details.
• The responsible person is called, and either queries the database or asks IT staff to run the query… The process is long and resource-consuming.
• Layered reports in self-service BI put top-down views at users' fingertips:
– Layer 1: One-page overview
– Layer 2: Categorical reports such as regional/product reports
– Layer 3: Data tables down to the most granular level.
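The three report layers can be sketched as three views over the same granular rows. This is a minimal sketch, assuming a hypothetical sales table of (region, product, amount) tuples; the function names are illustrative and do not come from any BI product.

```python
from collections import defaultdict

# Hypothetical granular sales rows: (region, product, amount).
rows = [
    ("East", "Phone", 120.0),
    ("East", "Tablet", 80.0),
    ("West", "Phone", 200.0),
    ("West", "Phone", 50.0),
]


def layer1_overview(rows):
    """Layer 1: a single headline number for the one-page overview."""
    return sum(amount for _, _, amount in rows)


def layer2_by(rows, index):
    """Layer 2: totals by category (index 0 = region, index 1 = product)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[index]] += row[2]
    return dict(totals)


def layer3_detail(rows, region=None, product=None):
    """Layer 3: the granular rows behind a selected region and/or product."""
    return [
        row for row in rows
        if (region is None or row[0] == region)
        and (product is None or row[1] == product)
    ]


overview = layer1_overview(rows)
by_region = layer2_by(rows, 0)
by_product = layer2_by(rows, 1)
west_detail = layer3_detail(rows, region="West")

print(overview)
print(by_region)
print(by_product)
print(west_detail)
```

Because every layer is derived from the same rows, a user who spots an abnormal total in Layer 1 can drill into Layer 2 and Layer 3 without asking anyone to run a query.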
46. Data Modeling – Understand Data Connection
BI: Data Modeling
• Given a system, first study how the data are linked, then model the linkage in the BI system.
Figure: Data Linkage on STM Publishing
• Entities: R. Arlen Price; Ding Li; An obesity-related locus in chromosome region 12q23-24; Diabetes; American Diabetes Association; National Institutes of Health
• Link labels: Faculty, Author, Student, Subscribe, Read, Publication, Funding, Attend Events, Proposal Review, Profile, Research Strength
• Faculty Profile, Research Interest: Genetics of Complex Traits, Genetics of Obesity, Behavioral Genetics, Genetic Epidemiology
• Faculty Profile, Research Techniques: Linkage mapping, linkage disequilibrium association analyses, and gene expression profiling
47. Data Modeling – Natural Linear & Star Structure
• Data connection is the key to revealing the insights hidden in data.
• In simple situations, a central table or a central key field can link the tables together.
48. Data Modeling – Construct Star Structure
• A link table can be constructed to link tables on multiple
common fields.
• In this example, Sale, Return and Target tables need to be
linked on (Item, Store, Date).
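One way to picture the link table is as the set of distinct (Item, Store, Date) combinations found in any of the three tables, with each fact table then referring to it through a single composite key. The sketch below follows that idea in plain Python; the field names, the key format, and the sample rows are assumptions made for illustration, not the schema behind the slide.

```python
# Hypothetical fact tables; field names and values are illustrative assumptions.
KEY_FIELDS = ("item", "store", "date")

sale = [
    {"item": "A", "store": "S1", "date": "2014-04-01", "sale_qty": 10},
    {"item": "B", "store": "S1", "date": "2014-04-01", "sale_qty": 5},
]
returns = [
    {"item": "A", "store": "S1", "date": "2014-04-01", "return_qty": 1},
]
target = [
    {"item": "A", "store": "S1", "date": "2014-04-01", "target_qty": 12},
    {"item": "B", "store": "S2", "date": "2014-04-01", "target_qty": 8},
]


def composite_key(row):
    """Build one key string from the common fields (item, store, date)."""
    return "|".join(row[field] for field in KEY_FIELDS)


def build_link_table(*tables):
    """Collect every distinct (item, store, date) combination across tables."""
    link = {}
    for table in tables:
        for row in table:
            link.setdefault(composite_key(row), {f: row[f] for f in KEY_FIELDS})
    return link


def to_fact_rows(table):
    """Replace the common fields in a fact table with the composite key."""
    return [
        {"key": composite_key(row),
         **{k: v for k, v in row.items() if k not in KEY_FIELDS}}
        for row in table
    ]


def combine(link, *fact_tables):
    """Join each fact table to the link table through the composite key.

    Assumes at most one row per key in each fact table.
    """
    combined = {key: dict(fields) for key, fields in link.items()}
    for table in fact_tables:
        for row in table:
            measures = {k: v for k, v in row.items() if k != "key"}
            combined[row["key"]].update(measures)
    return combined


link = build_link_table(sale, returns, target)
combined = combine(link, to_fact_rows(sale), to_fact_rows(returns),
                   to_fact_rows(target))
for key in sorted(combined):
    print(key, combined[key])
```

The fact tables no longer share three separate fields with each other; each connects only to the link table, which gives the star shape named in the slide title.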
49. Data Modeling – Time Series
• In the time series model, each event keeps its own timestamp, so it is easy to track the time gap between steps.
• Typical questions:
– For all the articles submitted in Jan. 2013, how long did it take to get reviewed, receive a final decision, and be published online if accepted? Compare with articles submitted in Jan. 2012.
– For all the articles published in Apr. 2014, when were they submitted, reviewed, and given a final decision? Compare with the articles published in Apr. 2013.
Submit Date → Review Date → Decision Date → Online Date → Download Date
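The first typical question can be sketched as follows: keep one date per step on each article, compute the gap in days between consecutive steps, and average the gaps over the cohort submitted in a given month. The article records and dates below are hypothetical, and the download step is left out to keep the example short.

```python
from datetime import date
from statistics import mean

# Hypothetical article records; each step keeps its own date (None = not reached).
articles = [
    {"id": 1, "submit": date(2013, 1, 5), "review": date(2013, 1, 25),
     "decision": date(2013, 2, 14), "online": date(2013, 3, 6)},
    {"id": 2, "submit": date(2013, 1, 20), "review": date(2013, 2, 19),
     "decision": date(2013, 3, 11), "online": None},  # not accepted
    {"id": 3, "submit": date(2012, 1, 10), "review": date(2012, 2, 19),
     "decision": date(2012, 3, 10), "online": date(2012, 4, 9)},
]
STEPS = ["submit", "review", "decision", "online"]


def step_gaps(article):
    """Days between consecutive steps, skipping steps that never happened."""
    gaps = {}
    for start, end in zip(STEPS, STEPS[1:]):
        if article[start] is not None and article[end] is not None:
            gaps[f"{start}_to_{end}"] = (article[end] - article[start]).days
    return gaps


def average_gaps(articles, year, month):
    """Average step gaps for the cohort submitted in the given month."""
    collected = {}
    for article in articles:
        submitted = article["submit"]
        if submitted.year == year and submitted.month == month:
            for name, days in step_gaps(article).items():
                collected.setdefault(name, []).append(days)
    return {name: mean(values) for name, values in collected.items()}


jan_2013 = average_gaps(articles, 2013, 1)
jan_2012 = average_gaps(articles, 2012, 1)
print("Submitted Jan. 2013:", jan_2013)
print("Submitted Jan. 2012:", jan_2012)
```

The second typical question works the same way with the cohort chosen by the online date instead of the submit date.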
50. Data Modeling – Universal Time
• In this model, users want to view all activities within the same period.
• Typical questions:
– In Apr. 2014, how many articles were submitted, reviewed, and published? If a user changes to another period, all the numbers change simultaneously to reflect the new period.
Event date: Submit, Editor Review, Peer Review, Production, Online, Usage
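In contrast to the time series model, the universal time model can be pictured as one event table with a single shared date column, so that one period selection filters every count at once. A minimal sketch, assuming a hypothetical list of (event date, event type, article id) tuples:

```python
from collections import Counter
from datetime import date

# Hypothetical event table: every activity shares one common event-date column.
events = [
    (date(2014, 4, 2), "Submit", "a1"),
    (date(2014, 4, 15), "Peer Review", "a1"),
    (date(2014, 4, 20), "Submit", "a2"),
    (date(2014, 4, 28), "Online", "a0"),
    (date(2014, 5, 3), "Online", "a1"),
]


def counts_for_period(events, year, month):
    """Count every event type within one selected month."""
    selected = Counter(
        event_type
        for event_date, event_type, _ in events
        if event_date.year == year and event_date.month == month
    )
    return dict(selected)


april = counts_for_period(events, 2014, 4)
may = counts_for_period(events, 2014, 5)
print("Apr. 2014:", april)
print("May 2014:", may)
```

Changing the period argument changes all the counts together, which is the behavior the slide describes.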
51. Challenges of BI Development Management
BI: Development Management
• A BI project involves cross-talk among multiple departments. Winning their cooperation is the key to its success.
• BI development often encounters unexpected issues in data availability, data quality, data linkage, and business logic transfer. Forcing a deadline may produce low-quality reports; over-relaxing the due date may stall the project. An agile process is pivotal to moving the project forward.
• A BI system is very efficient at exposing data abnormalities. A cleaner data system is only possible if source data problems are addressed jointly by BI developers and data owners/suppliers.
52. Heavyweight Development Process – Thorough but High Risk
Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.
53. Agile Development Process
• Plan: Business Goals, KPIs
• Analysis: Data Sources, Calculation Logics
• Data ETL: Extraction, Transform, Loading
• Design: Report Layout, Data Visualization
• Validation: Data, Logics
• Feedback: New Requirements
Phased Release.
◦ Important KPIs first.
◦ Well connected data first.
Fast Development
Quick Feedback
◦ Design
◦ Data
◦ Logic
54. BI Platforms – 2015 Gartner Magic Quadrant
BI: Platform, Tool
Rita L. Sallam, Joao Tapadinhas, Josh Parenteau, Daniel Yuen, Bill Hostmann (2014). Magic Quadrant for Business Intelligence and Analytics Platforms, last accessed on Apr. 22, 2014, http://www.gartner.com/technology/reprints.do?id=1-1QLGACN&ct=140210&st=sb
Agile Platform
◦ Tableau.
◦ QlikView.
◦ Tibco Spotfire
Large Platform
◦ Microsoft
◦ IBM (Cognos)
◦ SAS
◦ SAP (BusinessObjects)
◦ Oracle (OBIEE)
◦ MicroStrategy
◦ Information Builders
55. BI Platform Example – QlikView
• Pros
– Click-driven, visually interactive interface is simple to learn and use.
– Based on in-memory associative technology, which is fast.
– Flexible data sources (Oracle, SQL, Excel, text files).
– Quicker to build compared with traditional BI systems.
• Cons
– Needs straightforward relationships among tables, which requires clean data to link them.
– Its underlying calculation logic, set analysis, is hard to use for complicated logic.
– Its script language is not complete enough to accomplish comprehensive tasks.
– Most data need to be in memory.
56. BI Platform Example – Tableau vs QlikView
• Pros
– More innovative visualization, including geo mapping.
– Uses the UI to select data sets instead of expressions in code.
– Free Tableau Public makes it very popular.
• Cons
– Weak ETL capability.
• Sample Projects
– Payment difference to medical providers for 100 common inpatient services
57. Tableau Public – Free Hosting of Data Visualization
https://public.tableau.com/s/gallery/new-yorks-citi-bikes
58. Thank You
Analyzing data is worth the cost…
The price of light is less than the cost of darkness.
--Arthur C. Nielsen, Founder of ACNielsen Company
Please send your comments or suggestions to ding.li@smartdatanet.com
59. Appendix: Services from Smart Data Net Inc.
• Data: Web Clicks, Social Posts, User Demographics, Sale, Supply, Competitors
• Business Client and Smart Data Net: 1. Provide, 2. Analyze, 3. Develop, 4. Get Insights (communicate all the time)
• BI Solutions: Forecast, Demand, Return, Profit, Cost, Marketing, User Feedback, Email Open/Click, User Referral, R & D
60. Appendix: How BI Can Help Small Business
• Web Analysis
– What do users want to see?
– Can users find the right content?
– Is the website search-engine friendly?
– Tool: Google Analytics
• Social Analysis
– What content engages users?
– How can content reach further?
– How to foster a supportive social group?
– Tools: Facebook Insights, Twitter Analytics, HootSuite, Curalate
• Sale Analysis
– Near real-time revenue/cost analysis
– Find problems/opportunities quickly
– Service level analysis
– Tools: Tableau, QlikView
• Marketing/User Analysis
– Which marketing method brings the most valuable users?
– How to target the right users based on their previous behavior?
– User segmentation analysis
– Tools: Tableau, QlikView, R, Python
61. Appendix: How to Become a BI Developer/Data Scientist
• Visualization Track
(programming experience not required)
– Proficiency in a Visualization Tool
• Tableau, QlikView
– Study Visualization Best Practices
• Books by Edward Tufte, Stephen Few
– Understand Business Analysis Flow
• Discuss with business users
• Data Management Track
– Data Warehouse & BI Platform
• Amazon Redshift, Cognos, SAS, SAP, SQL, Oracle
– Big Data Store
• Hadoop, Teradata, AWS, Azure
– No-SQL Store
• MongoDB
• Data Mining Track
(for programmer or statistician)
– Data Manipulation
• Python
– Statistics
• R
– Machine Learning
• Octave, Java, R, Python
• Resources
– Free Online Classes
• Coursera.org
– Seminars
• Meetup.com
– Tool Online Training
• www.tableausoftware.com/learn/training
62. Appendix: How to Become a Tableau Developer from Scratch
• Tableau has Powerful Visualization, Great Usability and a Short Learning Curve
– Efficient for geo and trend analysis
– Takes a couple of weeks to learn and a few months to master
– Can be the first step into the data science world
• Step 1: Using Free Tableau Public
– Download Tableau Public
• http://www.tableausoftware.com/public/
– Take Online Training
• http://www.tableausoftware.com/public/training
– Apply It to Open Public Data
• https://www.data.gov/open-gov/
• https://data.ny.gov/
• https://nycopendata.socrata.com/
– Save and Publish Your Work Online
• With the free version, users cannot save results on a local machine
• This is all you need if you can publish all your work publicly (the server is hosted by Tableau for free)
• Step 2: Using Tableau Desktop
– Download Tableau Desktop
• http://www.tableausoftware.com/products/desktop
– Use the 14-day free trial to do as much training and development as possible
• http://www.tableausoftware.com/learn/training
– Purchase the product if it is the right tool for you
• Personal edition: $1000, no database connection
• Professional edition: $2000, can connect to databases
• Next Steps: Enjoy Data Visualization and Analysis; Learn More Theory, Best Practices and Tools.