This document provides an overview of the Microsoft Neural Network and Microsoft Logistic Regression algorithms. It describes how neural networks can detect nonlinear relationships in data and how they are composed of input, hidden, and output layers. The Microsoft Neural Network algorithm uses backpropagation to update weights and minimize errors. Parameters such as the maximum number of input and output attributes, the sample size, and the hidden node ratio can be configured. Examples of DMX queries are provided to create models that predict employee attributes from demographic and technology usage data.
2. Overview
Microsoft Neural Network and Logistic Regression overview
DMX Queries
Model Content
Principles of the Microsoft Neural Network Algorithm
Algorithm Parameters
3. Microsoft Neural Network overview The analysis performed by Microsoft Neural Network rests on two observations. First, any and all of the inputs may be related in some way to any or all of the outputs, and the network must take this into account during training. Second, different combinations of inputs may be related differently to the outputs.
4. Microsoft Neural Network overview The relationships detected by the Microsoft Neural Network algorithm may span up to two levels. In the single-level case, input facts are connected directly to the outputs. In the two-level case, input combinations effectively become new inputs, which are then connected to the outputs. The level that transforms certain input combinations into new inputs is referred to as a hidden layer.
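To make the two-level case concrete, the following Python sketch hand-wires a tiny network. The weights, the threshold activation, and the function names are illustrative assumptions; the Microsoft algorithm learns its weights from data. The output, "exactly one of the two inputs is 1", cannot be produced by any single weighted sum of the raw inputs, but it becomes simple once two hidden nodes have turned input combinations ("a OR b", "a AND b") into new inputs.

```python
# Illustrative only: weights are hand-picked, not learned.
def step(x: float) -> int:
    """Threshold activation: fires (1) when the weighted sum is positive."""
    return 1 if x > 0 else 0


def two_level_predict(a: int, b: int) -> int:
    # Hidden layer: each node turns a combination of inputs into a new input.
    h_or = step(a + b - 0.5)   # 1 when a OR b
    h_and = step(a + b - 1.5)  # 1 when a AND b
    # Output layer: connects the new inputs (not the raw facts) to the output.
    return step(h_or - h_and - 0.5)  # 1 when exactly one input is 1


for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_level_predict(a, b))
```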
5. Microsoft Logistic Regression overview The Microsoft Logistic Regression algorithm covers the single-level case: it uses a single level of relationships to predict the probability of events based on the inputs. It is implemented by forcing the hidden layer of a neural network to have zero nodes, a detail that is visible only in the internal structure of the algorithm.
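With no hidden nodes, the network reduces to the familiar logistic function applied to a weighted sum of the inputs. The sketch below assumes a sigmoid output neuron and uses made-up weights; `logistic_regression_predict` is a hypothetical helper, not part of any Microsoft API.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_predict(inputs: list[float], weights: list[float], bias: float) -> float:
    """Single-level network: inputs connect straight to one sigmoid output."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)


# Hypothetical example: two binary input facts with illustrative weights.
p = logistic_regression_predict([1.0, 0.0], weights=[0.8, -1.2], bias=-0.3)
print(round(p, 4))
```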
6. DMX Queries The Microsoft Neural Network algorithm supports most of the tasks that Microsoft Decision Trees can perform, including classification and regression. The next slides show queries that create and train a mining structure and mining models for employee data.
7. DMX Queries
CREATE MINING STRUCTURE EmployeeStructure (
    EmployeeID LONG KEY,
    Gender TEXT DISCRETE,
    [Marital Status] TEXT DISCRETE,
    Age LONG CONTINUOUS,
    [Education Level] TEXT DISCRETE,
    [Home Ownership] TEXT DISCRETE,
    TechnologyUsage TABLE (
        [Technology] TEXT KEY
    )
)
GO
A mining structure holding employee data and technology usage information.
8. DMX Queries
INSERT INTO MINING STRUCTURE [EmployeeStructure] (
    [EmployeeID], [Gender], [Marital Status], [Age],
    [Education Level], [Home Ownership],
    [TechnologyUsage] ( SKIP, [Technology] )
)
SHAPE {
    OPENQUERY ([Chapter 12],
        'SELECT [EmployeeID], [Gender], [Marital Status], [Age], [Education Level], [Home Ownership] FROM [Customers] ORDER BY [EmployeeID]')
}
APPEND (
    { OPENQUERY ([Chapter 12],
        'SELECT [EmployeeID], [Technology] FROM [Technology] ORDER BY [EmployeeID]') }
    RELATE [EmployeeID] TO [EmployeeID]
) AS [TechUsage]
GO
Query to populate the mining structure with employee data and technology usage information. SKIP ignores the EmployeeID column of the nested rowset, which is used only to relate the two queries.
9. DMX Queries
ALTER MINING STRUCTURE EmployeeStructure
ADD MINING MODEL VariousPredictions (
    EmployeeID,
    Gender,
    [Marital Status],
    [Age] PREDICT,
    [Education Level] PREDICT,
    [Home Ownership] PREDICT
)
USING Microsoft_Neural_Network
GO
INSERT INTO VariousPredictions
GO
Query to build a neural network mining model that predicts a continuous target (Age) and two discrete targets (Education Level and Home Ownership).
10. DMX Queries
ALTER MINING STRUCTURE EmployeeStructure
ADD MINING MODEL NestedTableInput (
    EmployeeID,
    Gender,
    [Marital Status],
    [Age] PREDICT,
    [Education Level],
    [Home Ownership],
    TechnologyUsage (
        Technology
    )
)
USING Microsoft_Neural_Network
GO
INSERT INTO NestedTableInput
GO
You can also include a nested table in a neural network model, as long as it is not marked as predictable. This query predicts Age based on the employee's demographic data as well as the technology items that the employee is currently using.
11. Model Content A neural network model has one or more subnetworks. The model content describes the topology of each subnetwork and also stores the weight of each edge in the neural network.
13. Understanding the Structure of a Neural Network Model Each neural network model has a single parent node that represents the model and its metadata, and a marginal statistics node that provides descriptive statistics about the input attributes. Underneath these two nodes are at least two more nodes, and possibly many more, depending on how many predictable attributes the model has. The first of these always represents the top node of the input layer. Beneath this top node are the input nodes, which contain the actual input attributes and their values. Each successive node contains a different subnetwork, and each subnetwork always contains a hidden layer and an output layer for that subnetwork.
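The node hierarchy described above can be modeled as a simple tree. In this sketch the class and helper names are hypothetical, the marginal statistics, input layer, and subnetwork nodes are placed side by side under the model node, and one subnetwork per predictable attribute is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    """One node in a simplified (hypothetical) view of neural network model content."""
    caption: str
    children: list["ContentNode"] = field(default_factory=list)


def build_content_tree(input_attributes, predictable_attributes):
    """Hypothetical helper: lay out the node hierarchy described on the slide."""
    input_layer = ContentNode(
        "Input layer", [ContentNode(f"Input: {a}") for a in input_attributes]
    )
    # Assumption: one subnetwork per predictable attribute.
    subnetworks = [
        ContentNode(
            f"Subnetwork: {p}",
            [ContentNode("Hidden layer"),
             ContentNode("Output layer", [ContentNode(f"Output: {p}")])],
        )
        for p in predictable_attributes
    ]
    return ContentNode(
        "Model (metadata)",
        [ContentNode("Marginal statistics"), input_layer, *subnetworks],
    )


def show(node: ContentNode, depth: int = 0) -> None:
    """Print the tree with two spaces of indentation per level."""
    print("  " * depth + node.caption)
    for child in node.children:
        show(child, depth + 1)


show(build_content_tree(["Gender", "Marital Status"], ["Age"]))
```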
14. Principles of the Microsoft Neural Network Algorithm The origin of the neural network algorithm can be traced to the 1940s, when two researchers, Warren McCulloch and Walter Pitts, tried to build a model to simulate how biological neurons work. Like decision trees, neural networks mainly address the classification and regression tasks of data mining. Neural networks can find nonlinear relationships between input attributes and predictable attributes, and they support both discrete and continuous outputs.
15. How does the algorithm work? The Microsoft Neural Network algorithm creates a network that is composed of up to three layers of neurons:
Input layer: Input neurons define all the input attribute values for the data mining model, and their probabilities.
Hidden layer: Hidden neurons receive inputs from input neurons and provide outputs to output neurons. The hidden layer is where the various probabilities of the inputs are assigned weights. The greater the weight assigned to an input, the more important the value of that input is.
Output layer: Output neurons represent predictable attribute values for the data mining model.
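A forward pass through the three layers can be sketched as follows. The activation functions (tanh for hidden neurons, sigmoid for output neurons), the omission of bias terms, and the nested-list weight layout are assumptions made for illustration; `forward` is a hypothetical name.

```python
import math


def forward(inputs, w_hidden, w_output):
    """One forward pass: input layer -> hidden layer -> output layer.

    w_hidden[j][i] weights input neuron i into hidden neuron j.
    w_output[k][j] weights hidden neuron j into output neuron k.
    Bias terms are omitted to keep the sketch short.
    """
    # Hidden layer: weighted sum of the inputs, squashed by tanh.
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    # Output layer: weighted sum of hidden outputs, squashed by a sigmoid.
    outputs = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_output
    ]
    return hidden, outputs


# Hypothetical example: 3 input neurons, 2 hidden neurons, 1 output neuron.
hidden, outputs = forward(
    [1.0, 0.0, 1.0],
    w_hidden=[[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]],
    w_output=[[0.7, -0.6]],
)
print(hidden, outputs)
```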
17. Backpropagation
Backpropagation, which is considered the core process of the algorithm, involves the following steps:
1. Randomly assign values to all the weights in the network at the initial stage (usually ranging from -1.0 to 1.0).
2. For each training example, calculate the outputs based on the current weights in the network.
3. Calculate the error for each output and hidden neuron in the network, then update the weights.
4. Repeat from step 2 until the stopping condition is satisfied.
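The four steps above can be sketched in Python for a network with one hidden layer and one output neuron. This is a minimal sketch under stated assumptions, not the Microsoft implementation: tanh hidden neurons, a sigmoid output, squared-error loss, weight updates after every example, a constant 1.0 input acting as a bias, and a stopping condition of "error below a tolerance or a maximum number of epochs". The function `train` and its parameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, n_hidden=2, learning_rate=0.5, max_epochs=2000, tolerance=1e-3, seed=0):
    """Backpropagation sketch; examples are (inputs, target) pairs, target in [0, 1].

    Returns the hidden weights, output weights, and the error of every epoch.
    """
    rng = random.Random(seed)
    n_in = len(examples[0][0]) + 1  # +1 for the constant bias input
    # Step 1: random initial weights in [-1.0, 1.0].
    w_hid = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(max_epochs):
        total_error = 0.0
        for inputs, target in examples:
            x = list(inputs) + [1.0]
            # Step 2: compute outputs with the current weights.
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hid] + [1.0]
            o = sigmoid(sum(w * hj for w, hj in zip(w_out, h)))
            total_error += 0.5 * (target - o) ** 2
            # Step 3: error terms for the output neuron and each hidden neuron...
            delta_out = (o - target) * o * (1.0 - o)
            delta_hid = [(1.0 - h[j] ** 2) * w_out[j] * delta_out for j in range(n_hidden)]
            # ...then update the weights by gradient descent.
            for j in range(n_hidden + 1):
                w_out[j] -= learning_rate * delta_out * h[j]
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hid[j][i] -= learning_rate * delta_hid[j] * x[i]
        history.append(total_error)
        # Step 4: repeat until the stopping condition is satisfied.
        if total_error < tolerance:
            break
    return w_hid, w_out, history


# Hypothetical training set: the logical OR of two binary inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
_, _, errors = train(data)
print(len(errors), errors[0], errors[-1])
```

Updating after every example (rather than once per pass over the data) keeps the loop short; the stopping rule and loss function here are illustrative choices.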
21. HIDDEN_NODE_RATIO specifies the ratio of hidden neurons to input and output neurons. The following formula determines the initial number of neurons in the hidden layer:
HIDDEN_NODE_RATIO * SQRT(Total input neurons * Total output neurons)
The default value is 4.0.
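The formula can be evaluated directly. In this sketch the neuron counts are made up, `initial_hidden_neurons` is a hypothetical helper, and no rounding is applied because the slide does not say how the result is turned into a whole number of neurons.

```python
import math


def initial_hidden_neurons(total_input_neurons: int, total_output_neurons: int,
                           hidden_node_ratio: float = 4.0) -> float:
    """Evaluate HIDDEN_NODE_RATIO * SQRT(inputs * outputs), using the default ratio of 4.0."""
    return hidden_node_ratio * math.sqrt(total_input_neurons * total_output_neurons)


print(initial_hidden_neurons(9, 4))       # 4.0 * sqrt(36) = 24.0
print(initial_hidden_neurons(9, 4, 0.0))  # 0.0: no hidden neurons
```

With HIDDEN_NODE_RATIO set to 0 the formula yields no hidden neurons, which corresponds to the zero-hidden-node network described for logistic regression.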
22. SUMMARY
Microsoft Neural Network and Logistic Regression overview
DMX Queries
Model Content
Principles of the Microsoft Neural Network Algorithm
Algorithm Parameters
23. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net