Data Mining Algorithms for Regression Analysis
1. STUDIEREN UND DURCHSTARTEN (Study and take off)
Author I: Dipl.-Inf. (FH) Johannes Hoppe
Author II: M.Sc. Johannes Hofmeister
Author III: Prof. Dr. Dieter Homeister
Date: 13.05.2011
2. Further Data Mining Algorithms
4. DM Algorithms - Regression Analysis
Regression analysis (also known as function approximation):
- Includes any technique for modeling and analyzing several variables
- Models the relationship between one or more variables you are trying to predict (dependent variables) and the predictive variables (independent variables)
5. DM Algorithms - Regression Analysis
Built into SSAS (SQL Server Analysis Services):
- MS Linear Regression Analysis
- MS Logistic Regression Analysis
- MS Time Series Algorithm
http://msdn.microsoft.com/en-us/library/ms170993(SQL.90).aspx
6. DM Algorithms - Regression / Linear Regression
Linear regression:
- Analyzes two continuous columns
- The relationship between them is expressed as an equation
- The equation describes a line (linear equation): f(x) = m*x + b
- Error = distance of a data point from the regression line
http://msdn.microsoft.com/en-us/library/ms174824(SQL.90).aspx
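The line f(x) = m*x + b can be estimated with ordinary least squares. The following is a minimal sketch, not the SSAS implementation: the function names and the sample values are hypothetical, and the error is taken as the vertical distance from the line, which is what ordinary least squares minimizes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (m, b) of f(x) = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def residuals(xs, ys, m, b):
    """Vertical distance of each observed point from the fitted line."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical sample data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
errors = residuals(xs, ys, m, b)
```

For real data the residuals will not be zero; their size indicates how well a straight line describes the two columns.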
8. DM Algorithms - Regression / Linear Regression
Explanation: The diagram shows the relationship between sales and advertising along with the regression equation. The goal is to predict sales based on the amount spent on advertising. The graph shows a strongly linear relationship between sales and advertising. A key measure of the strength of the relationship is the R-square. The R-square measures the share of the overall variation in the data that is explained by the model. This regression analysis results in an R-square of 70%, which implies that 70% of the variation in sales can be explained by the variation in advertising.
[Source: Olivia Parr Rud et al., Data Mining Cookbook]
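The R-square can be computed as 1 minus the ratio of unexplained variation to total variation. The sketch below illustrates that definition; the sales figures and predictions are hypothetical and are not the data behind the 70% figure quoted above.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` that the predictions explain."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the data around its mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values are constant; R-square is undefined")
    # Variation left unexplained by the model (squared residuals).
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical sales values and the predictions of a fitted line 0.6*x + 2.2.
sales = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted_sales = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(sales, predicted_sales)
```

A value close to 1 means the line accounts for most of the variation; a value close to 0 means it explains very little.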
9. DM Algorithms - Regression / Logistic Regression
Logistic regression:
- The dependent variable takes values between 0 and 1
- The function describes the probability of a given event
- Instead of a straight line, logistic regression analysis creates an "S"-shaped curve bounded by a maximum and a minimum
- Note: the algorithm described on Wikipedia is not the same as the algorithm described on MSDN
http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
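The "S"-shaped curve is the logistic (sigmoid) function applied to a linear expression. The sketch below shows the textbook form only; it makes no claim about how the MSDN algorithm is implemented, and `weight` and `bias` are hypothetical parameter names.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x, weight, bias):
    """Probability of the event for input x under a one-variable logistic model."""
    return logistic(weight * x + bias)


# The curve passes through 0.5 at z = 0 and flattens towards 0 and 1.
midpoint = logistic(0.0)
low = predict_probability(-5.0, weight=1.0, bias=0.0)
high = predict_probability(5.0, weight=1.0, bias=0.0)
```

The outputs stay within the interval from 0 to 1 however large the input becomes, which is the minimum and maximum constraint mentioned on the slide.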
11. DM Algorithms - Regression / Time-Series
MS Time-Series Algorithm:
- Trend analysis
- Optimized for analyzing continuous values, e.g. product sales over time
- Two steps: train, then predict
- Cross-predictions are possible (cool!)
http://msdn.microsoft.com/en-us/library/ms174923(SQL.90).aspx
12. DM Algorithms - Regression / Time-Series MS Time-Series Algorithm
13. DM Algorithms - Regression / Time-Series
Combination of two algorithms whose results are blended:
- ARTxp (Auto Regressive Tree method): developed by Microsoft Research, based on the Microsoft Decision Trees algorithm, for short-term predictions
- ARIMA (Auto Regressive Integrated Moving Average): developed by Box and Jenkins, for long-term predictions
http://msdn.microsoft.com/en-us/library/ms174828(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/bb677216.aspx
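Both algorithms build on autoregression, i.e. predicting the next value of a series from its own earlier values. The sketch below fits the simplest such model, a first-order autoregression x[t] = c + phi * x[t-1], by least squares. It is neither ARTxp nor ARIMA, only an illustration of the shared idea; the function names and the sales series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    spp = sum((p - mean_prev) ** 2 for p in previous)
    if spp == 0:
        raise ValueError("series is constant; phi is undefined")
    spc = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(previous, current))
    phi = spc / spp
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Predict `steps` future values by feeding each prediction back in."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Hypothetical sales series that follows x[t] = 2 + 0.5 * x[t-1] exactly.
sales = [16.0, 10.0, 7.0, 5.5, 4.75]
c, phi = fit_ar1(sales)          # train
next_two = forecast(sales, c, phi, steps=2)  # predict
```

Because each forecast is fed back in as input, errors accumulate with the horizon, which is one reason short-term and long-term predictions may call for different methods.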
18. DM Algorithms - Neural Networks
Definition: A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.
[Source: Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan.]
19. DM Algorithms - Neural Networks
- Most neural networks are composed of several layers of neurons
- Most connections run from input to output
- Back-propagation networks are often used
- A single neuron has several inputs, each with an individual weight, and one output
- In the basic form, the output is activated if the sum of inputs*weights exceeds a given threshold
- Learning is done with a target value at an additional training input plus a training mode signal
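The basic threshold neuron described above can be sketched in a few lines. The weights and threshold below are hypothetical values picked by hand so that the neuron behaves like a logical AND; no learning takes place in this example.

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hand-picked (hypothetical) parameters: only the input (1, 1) reaches
# a weighted sum of 2.0, which exceeds the threshold of 1.5.
and_weights = [1.0, 1.0]
and_threshold = 1.5
truth_table = {
    (a, b): neuron_output([a, b], and_weights, and_threshold)
    for a in (0, 1)
    for b in (0, 1)
}
```

In a trained network the weights are not chosen by hand but adjusted step by step until the outputs match the target values.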