4. 4
You will learn a few data analysis topics
Posing a question
Wrangling your data into a format you can use and fixing
any problems with it
Exploring the data, finding patterns in it, and building
your intuition about it
Drawing conclusions and/or making predictions
Communicating your findings
5. 5
What is Big Data Analytics?
Data analytics is an emerging technique that dives into a
data set without a prior set of hypotheses
Raw data accumulated from various sources
(e.g. discussion boards, emails, exam logs, chat logs in
e-learning systems) can be used to identify fruitful
patterns and relationships
Examining large amounts of data
8. 8
Applications of Data analytics
Understanding and targeting Customers
Understanding and optimizing Business Processes
Improving Healthcare and Public Health
Optimizing Machine and Device Performance
Financial Trading
Improving and Optimizing Cities and Countries
Can you think of any more?
How?
18. 18
Data Classification
Some Examples:
Separating customers based on gender
Sorting data based on content type, file type, size, etc.
Classifying data into restricted, public, or private data
types
"Among all the customers of Zalando, which are likely to respond to a new
offer?"
Classes: will respond / will not respond
19. 19
Classification Methods
Decision trees (DT)
Build classification or regression models in the form of a tree
structure
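The slide does not say how the tree is grown. A minimal sketch, assuming an ID3-style learner that splits each node on the categorical feature with the highest information gain (entropy reduction). The Zalando-style data, the feature names `bought_before` and `newsletter`, and the helper names are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Entropy reduction obtained by splitting on one categorical feature
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, features):
    # Leaf: the node is pure, or no features are left to split on (majority vote)
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, branches)

def predict(tree, row):
    # Walk from the root to a leaf; assumes every feature value was seen in training
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree

# Hypothetical customers and whether they responded to a past offer
rows = [
    {"bought_before": "yes", "newsletter": "yes"},
    {"bought_before": "yes", "newsletter": "no"},
    {"bought_before": "no", "newsletter": "yes"},
    {"bought_before": "no", "newsletter": "no"},
]
labels = ["will respond", "will respond", "will not respond", "will not respond"]

tree = build_tree(rows, labels, ["bought_before", "newsletter"])
print(predict(tree, {"bought_before": "yes", "newsletter": "no"}))
```

On this toy data the root splits on `bought_before`, because it separates the two classes perfectly while `newsletter` gives no information gain.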
21. 21
Classification Methods
Support Vector Machines (SVM)
Each data item is plotted as a point in n-dimensional space (n = number
of features)
Find the hyperplane that differentiates the two classes
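To make "find the hyperplane" concrete, here is a minimal sketch of a soft-margin linear SVM. It assumes training by subgradient descent on the hinge loss, a simple stand-in chosen for readability rather than the quadratic-programming solvers that SVM libraries typically use. The 2-D points, the function names, and the values of `lam`, `lr`, and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Soft-margin linear SVM: minimize hinge loss + (lam/2)*||w||^2 by subgradient descent
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Point is inside the margin or misclassified: move the hyperplane away from it
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularizer acts: shrinking w widens the margin 2/||w||
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    # Which side of the hyperplane w.x + b = 0 the point lies on
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2-D data with labels +1 / -1
X = [(2, 2), (3, 1), (1, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```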
23. 23
Classification Methods
Select the hyperplane that segregates the two classes better.
Answer: B
Maximize the distance between the hyperplane and the
nearest data point (the margin). Answer: C
Select the hyperplane that classifies accurately before
maximizing the margin. Answer: A
SVM ignores outliers
Introduce a new feature: z = x² + y²
In the original input space, the
hyperplane looks like a circle
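The z = x² + y² trick can be checked directly. The sketch below uses hypothetical points: one class near the origin, the other on a ring around it, so no straight line in (x, y) separates them. After adding z, a single threshold on z (a hyperplane in the lifted space) separates the classes, and that threshold maps back to a circle in the original space. The helper name `lift` and the threshold value are illustrative assumptions.

```python
def lift(point):
    # New feature z = x^2 + y^2 (the squared distance from the origin)
    x, y = point
    return x * x + y * y

# Hypothetical data: one class near the origin, the other class surrounding it
inner = [(0.5, 0.0), (0.0, -0.5), (0.3, 0.3), (-0.4, 0.2)]
outer = [(2.0, 0.0), (0.0, -2.0), (1.5, 1.5), (-1.8, 0.9)]

# z = 1 is a flat decision boundary in the lifted space,
# and corresponds to the circle x^2 + y^2 = 1 in the original (x, y) space
threshold = 1.0
print([lift(p) < threshold for p in inner])
print([lift(p) > threshold for p in outer])
```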
24. 24
Classification Methods
Bayesian Networks
Figure legend: dotted lines show potential links; the blue box marks
additional nodes and links between input and output.
Based on probability theory.
Can mix expert opinion and data to build models.
Backwards reasoning: in addition to predicting outputs given inputs,
we can use output values to infer inputs.
Support for missing data during learning and classification.
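Forward prediction and backwards reasoning can be shown on the smallest possible network. This sketch assumes a hypothetical two-node network Rain → WetGrass with made-up probabilities (they could come from expert opinion or from data). It computes the output distribution by summing over the parent, then uses Bayes' rule to infer the input from an observed output.

```python
# Hypothetical network Rain -> WetGrass; all probabilities are invented for illustration
p_rain = {True: 0.2, False: 0.8}            # prior P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass = True | Rain)

def joint(rain, wet):
    # P(Rain = rain, WetGrass = wet) = P(rain) * P(wet | rain)
    pw = p_wet_given_rain[rain]
    return p_rain[rain] * (pw if wet else 1 - pw)

# Forward: predict the output by summing out the input
p_wet = sum(joint(r, True) for r in (True, False))

# Backwards: infer the input from an observed output (Bayes' rule)
p_rain_given_wet = joint(True, True) / p_wet

print(round(p_wet, 2), round(p_rain_given_wet, 3))
```

Observing wet grass raises the probability of rain from the prior 0.2 to about 0.69 under these assumed numbers.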
26. 26
Association Rules
Discovering interesting relations between variables in
large databases
Example Problems
Which products are frequently bought together by
customers? (Basket Analysis)
● DataTable = Receipts x Products
● Results could be used to change the placements of products in the market
Which courses tend to be attended together?
● DataTable = Students x Courses
● Results could be used to avoid scheduling conflicts...
27. 27
Association Rules
Examples
Bread, Cheese → Red Wine
Customers who buy bread and cheese also tend to buy red
wine
Machine Learning → Web Mining, ML Praktikum
Students who take 'Machine Learning' also tend to take 'Web Mining'
and the 'Machine Learning Praktikum' (lab course)
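A rule such as Bread, Cheese → Red Wine is usually scored with support and confidence. The slides do not define these measures, so this sketch assumes their standard definitions: support is the fraction of receipts containing an itemset, and confidence is support(lhs ∪ rhs) / support(lhs). The receipts below are hypothetical rows of the "Receipts x Products" table, each stored as a set of products.

```python
# Hypothetical receipts: each receipt is the set of products it contains
receipts = [
    {"bread", "cheese", "red wine"},
    {"bread", "cheese", "red wine", "grapes"},
    {"bread", "cheese"},
    {"bread", "butter"},
    {"cheese", "red wine"},
]

def support(itemset):
    # Fraction of receipts containing every item in the itemset
    return sum(itemset <= r for r in receipts) / len(receipts)

def confidence(lhs, rhs):
    # Estimated P(rhs | lhs): how often the rule lhs -> rhs holds when lhs is present
    return support(lhs | rhs) / support(lhs)

lhs, rhs = {"bread", "cheese"}, {"red wine"}
print(support(lhs | rhs), confidence(lhs, rhs))
```

Here 2 of 5 receipts contain all three products (support 0.4), and 2 of the 3 receipts with bread and cheese also contain red wine (confidence about 0.67).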
28. 28
Association Rules
Apriori principle illustration
If {c,d,e} is frequent, then all
subsets of this itemset are
frequent
Support-based pruning illustration
If {a,b} is infrequent, then all
supersets of this itemset are
infrequent
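The two illustrations above are what make level-wise frequent-itemset mining efficient: candidates of size k are built only from frequent itemsets of size k-1, and a candidate is pruned without counting if any of its (k-1)-subsets is infrequent. A minimal sketch of that Apriori loop, assuming a relative `min_support` threshold and hypothetical transactions:

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Returns {frozenset(itemset): support} for every itemset with support >= min_support
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): a candidate with an infrequent (k-1)-subset is dropped
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions
transactions = [
    {"bread", "cheese", "red wine"},
    {"bread", "cheese"},
    {"bread", "red wine"},
    {"cheese", "red wine"},
    {"bread", "cheese", "red wine"},
]
result = apriori(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this data every single item and every pair reaches the threshold, but the triple appears in only 2 of 5 transactions and is not reported.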
30. 30
Cluster analysis
Task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more
similar (in some sense) to each other than to
those in other groups (clusters).
Examples
Biology: What is the taxonomy of the species?
Education: Which student groups need special
attention?
Business: What are the customer segments?
33. 33
K-means clustering
k-means clustering aims to partition n observations into k
clusters in which each observation belongs to the cluster
with the nearest mean, which serves as a prototype of the
cluster
Unsupervised learning algorithm
Define k initial centroids, one for each cluster
Take each point in the data set and associate it with the
nearest centroid
Recalculate each centroid as the mean of the points assigned to it
Repeat until the centroids no longer move
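The k-means steps above translate almost line by line into code. This is a minimal sketch, assuming Euclidean distance, initial centroids passed in explicitly (in practice they are often chosen at random), and a hypothetical `max_iter` cap; an empty cluster simply keeps its previous centroid.

```python
import math

def kmeans(points, centroids, max_iter=100):
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        # Stop once the centroids no longer move
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups; k = 2
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```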
34. 34
Hierarchical clustering
Groups data over a variety of scales by creating a cluster
tree or dendrogram.
Find the similarity or dissimilarity between every pair of
objects in the data set.
Group the objects into a binary, hierarchical cluster
tree.
Determine where to cut the hierarchical tree into
clusters
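One way to carry out these steps is bottom-up (agglomerative) clustering. The sketch below assumes Euclidean distance and single linkage (cluster distance = distance between their closest members), both choices the slide leaves open. It records each merge as the dendrogram is built, and "cutting the tree" is modelled as stopping once a hypothetical target number of clusters `k` remains.

```python
import math
from itertools import combinations

def agglomerative(points, k):
    # Start with every point in its own cluster
    clusters = [[p] for p in points]
    merges = []  # dendrogram as (cluster_a, cluster_b, merge distance)

    def single_link(a, b):
        # Dissimilarity between two clusters: distance of their closest pair of points
        return min(math.dist(p, q) for p in a for q in b)

    # Repeatedly merge the two closest clusters; stopping at k clusters = cutting the tree
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], single_link(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters, merges

# Hypothetical 2-D data
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, k=3)
print(clusters)
```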
39. 39
Predictive Analytics
Make predictions about unknown future events based on
past events
Why now?
Growing volumes and types of data, and more interest in
using data to produce valuable insights.
Faster, cheaper computers.
Easier-to-use software.
Tougher economic conditions and a need for competitive
differentiation.
40. 40
Predictive Analytics
Improve pattern detection and prevent criminal
behavior.
Determine customer responses or purchases, and
promote cross-sell opportunities.
Forecast inventory, manage resources, and set ticket
prices.
Credit scores are used to assess a buyer’s likelihood of
defaulting on purchases.
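As a very small example of predicting a future value from past observations (for instance, forecasting inventory demand), the sketch below fits a least-squares straight line to hypothetical monthly sales and extrapolates one month ahead. Simple linear regression is a swapped-in illustrative technique; the slides do not name a specific predictive model.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical past sales for months 1..5
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(slope, intercept, forecast)
```

Real forecasting would also account for seasonality and uncertainty; this only shows the core idea of learning from past data and extrapolating.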
41. 41
Data Visualization
Data visualization is the process of converting raw data
into easily understood pictures of information that
enable fast and effective decisions.
Visualization plays a key role in the efficient
communication of information (especially with large
amounts of information).
Visualization is used as a "check" to verify / falsify
results of automatic data analysis.
42. 42
Why Data Visualization?
Identify areas that need attention or improvement.
Clarify which factors influence customer behavior.
Help you understand which products to place where.
Predict sales volumes.
Data visualization is a quick, easy way to convey concepts in a
universal manner
44. 44
Visual Analytics Loop
Visual Analytics will foster the constructive evaluation, correction, and rapid
improvement of our processes and models, and ultimately the improvement of our
knowledge and our decisions
46. 46
Visual Analytics vs Information Visualization
Visual analytics is more than just visualization. It can rather be seen as an
integral approach to decision-making, combining visualization, human
factors and data analysis.
Room number: C04-0.01
Starting
LMS registration >> BD2016
Groups
Who we are (repeat in brief)
What we are doing
Interactive session
Why are you here? Why do you want to do data analysis? What data do you have, or what data are you familiar with? (for business people)
Convert data into a preferred data format
Make others, especially business people, understand what you have found
Vini
We do this in day-to-day life
Examining raw data with the purpose of drawing conclusions about that information
Allows companies to make better decisions
3 types: Exploratory – new features in the data are discovered
Confirmatory – existing hypotheses are validated
Qualitative – conclusions are drawn from non-numerical data such as words
Why would you use big data analytics?
Banks and credit card companies: analyze withdrawal and spending patterns to prevent fraud or identity theft.
E-commerce companies: examine website visitor behavior to predict whether a visitor will buy a product or service, based on prior purchases or viewing trends.
Predictive maintenance
Virus signature
Profit
Digital advertising (targeted advertising)
Recommender systems
Image recognition
Speech recognition
Gaming (motion gaming)
Price comparison websites – pricerunner, pricegrabber, junglee
Airline route planning
Delivery logistics – find the best shipping routes
Self-driving cars
Robots
Improving science and research
Improving sports performance
Cities – traffic monitoring
Danny
Determine business objectives
Assess situation
Determine data mining goals
Produce project plan
Danny
Collect initial data
Describe data
Explore data
Verify data
Danny
Select
Clean
Construct
Integrate
Format data
Danny
Select modeling techniques
Generate test design
Build model
Assess model
Danny
Evaluate results
Review process
Determine next steps
Danny
Plan deployment
Monitoring and maintenance
Review project
Classification: given a set of predefined classes, determine which class a new object belongs to.
Clustering: group a set of objects and find whether there is some relationship between them.
Classification - supervised learning
Clustering - unsupervised learning
Association: discovering interesting relations between variables
Learns a method for predicting the class of an instance from pre-labelled (already classified) instances
Sorting data within a database or repository
Decision trees
Support vector machines
Bayesian networks
DT: Clearly lay out the problem so that all options can be challenged.
Allow us to fully analyze the possible consequences of a decision.
Provide a framework to quantify the values of outcomes and the probabilities of achieving them.
Help us to make the best decisions on the basis of existing information and best guesses.
Apriori principle: any subset of a frequent itemset must be frequent
Medicine : What are the diagnostic clusters?
Business: common needs, attitudes, behaviours, demographics
Student groups: what issues keep them from excelling in exams, e.g. psychological, environmental, aptitudinal, affective, and attitudinal factors