Data shapes the management opinions that steer tactics and strategies, which in turn affect business operations; those effects can be seen and evidenced by measuring the business.
- Execution vs. search
- Balancing the "knowns" & "unknowns"
- Data here, there, everywhere …
- Machine learning as foundation to analytics
- Visualization as action to analytics
- Imminent opportunities
Most business systems and processes are geared towards two main needs: (a) knowing what you already know and (b) knowing what you don't yet know. However, not knowing what you don't know remains the greatest challenge businesses face and serves as a catalyst for business failures. Therefore, all businesses need to learn more, faster and better. This is where analytics becomes especially valuable.
Businesses need timely, detailed data transformed into information, information transformed into knowledge, and knowledge transformed into actionable decisions, in order to drive the business forward via (a) competitive advantage, (b) compliance, and (c) productivity/cost savings.
Analytics design and implementation are best based upon "embrace and extend," not "rip out and replace." Therefore, all businesses can benefit, one step at a time.
Source: http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Suppose you have an application that you think machine learning might be good for. The first problem facing you is the bewildering variety of learning algorithms available. Which one to use? There are literally thousands available, and hundreds more are published each year. The key to not getting lost in this huge space is to realize that it consists of combinations of just three components:

- Representation. A classifier must be represented in some formal language that the computer can handle. Conversely, choosing a representation for a learner is tantamount to choosing the set of classifiers that it can possibly learn. This set is called the hypothesis space of the learner. If a classifier is not in the hypothesis space, it cannot be learned. A related question, which we will address in a later section, is how to represent the input, i.e., what features to use.
- Evaluation. An evaluation function (also called objective function or scoring function) is needed to distinguish good classifiers from bad ones. The evaluation function used internally by the algorithm may differ from the external one that we want the classifier to optimize, for ease of optimization (see below) and due to the issues discussed in the next section.
- Optimization. Finally, we need a method to search among the classifiers in the language for the highest-scoring one. The choice of optimization technique is key to the efficiency of the learner, and also helps determine the classifier produced if the evaluation function has more than one optimum. It is common for new learners to start out using off-the-shelf optimizers, which are later replaced by custom-designed ones.
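A minimal sketch of these three components in Python (illustrative, not from the paper): the representation is a linear classifier, the evaluation function is log loss, and the optimization is plain gradient descent. All function names here are hypothetical.

```python
import numpy as np

# Representation: a linear classifier, parameterized by weights w and bias b.
# The hypothesis space is the set of all such linear decision boundaries.
def predict_proba(w, b, X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of a linear score

# Evaluation: log loss (cross-entropy) scores candidate classifiers;
# lower is better.
def log_loss(w, b, X, y):
    p = np.clip(predict_proba(w, b, X), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Optimization: gradient descent searches the hypothesis space
# for a low-loss classifier.
def fit(X, y, lr=0.1, steps=1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = predict_proba(w, b, X)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Tiny example: learn to separate points on a line.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = fit(X, y)
print(log_loss(w, b, X, y))          # evaluation of the learned classifier
print(predict_proba(w, b, X) > 0.5)  # predicted labels
```

Swapping any one component (say, a tree-based representation, or hinge loss as the evaluation) yields a different learner, which is exactly the point of the decomposition.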
Generalization is defined as the ability of an ML algorithm to assign correct labels to inputs beyond the examples in the training set.
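One way to see generalization concretely (a sketch assuming scikit-learn and its bundled iris dataset): compare accuracy on the training examples with accuracy on held-out examples; only the latter measures labeling correctness beyond the training set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
# Generalization is measured on data the learner never saw during training;
# a large train/test gap suggests memorization rather than generalization.
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
```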
If supervised learning, what's your target value?
- Classification: the target value is a discrete value, e.g., True/False, Yes/No, 1/2/3, A/B/C, Red/Green/Blue
- Regression: the target value can take on a number or range of values, e.g., 0.00 to 100.00, -999 to 999, or -∞ to +∞
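In scikit-learn terms, the discrete-vs-continuous distinction maps onto classifier vs. regressor estimators; a small sketch with made-up toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Classification: target is a discrete label (here True/False encoded as 0/1).
clf = LogisticRegression().fit(X, np.array([0, 0, 1, 1]))
print(clf.predict([[2.5]]))   # -> a discrete class

# Regression: target is a continuous value anywhere on the real line.
reg = LinearRegression().fit(X, np.array([10.0, 19.5, 30.2, 39.9]))
print(reg.predict([[2.5]]))   # -> a real number
```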
If you're NOT looking to predict a target value, use unsupervised learning:
- Clustering: fit data into some discrete groups, i.e., discover groups of similar examples within the data
- Density estimation: determine the distribution of the data within the input space (all features considered), giving a numerical estimate of how strongly each example fits
- Visualization by projection: project the data from high-dimensional space down to 2 or 3 dimensions for the purpose of visualization
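A sketch of all three unsupervised tasks on the same synthetic data, assuming scikit-learn (KMeans for clustering, KernelDensity for density estimation, PCA for projection); the data and parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KernelDensity
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two well-separated blobs in 5-D, no labels given.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(5, 1, (50, 5))])

# Clustering: discover discrete groups of similar examples.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Density estimation: model the distribution of the data in input space.
kde = KernelDensity(bandwidth=1.0).fit(X)
log_density = kde.score_samples(X[:5])  # log-likelihood of the first few points

# Visualization by projection: map 5-D data down to 2-D for plotting.
X_2d = PCA(n_components=2).fit_transform(X)

print(labels[:5], log_density, X_2d.shape)
```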
Source: scikit-learn user guide (http://scikit-learn.org/stable/supervised_learning.html)

To perform classification with generalized linear models (e.g., linear regression), see Logistic regression.

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.

The advantages of support vector machines are:
- Effective in high-dimensional spaces.
- Still effective in cases where the number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory-efficient.
- Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:
- If the number of features is much greater than the number of samples, the method is likely to give poor performance.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see "Scores and probabilities" in the scikit-learn documentation).

K Nearest Neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels.

The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning) or can vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric measure: standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods, since they simply "remember" all of their training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree).

Despite its simplicity, nearest neighbors has been successful in a large number of classification and regression problems, including handwritten digits and satellite image scenes. Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of independence between every pair of features.

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

Ensemble Methods aim to combine the predictions of several models built with a given learning algorithm in order to improve generalizability/robustness over a single model. Two families of ensemble methods are usually distinguished:
- In averaging methods, the driving principle is to build several models independently and then to average their predictions. On average, the combined model is usually better than any single model because its variance is reduced. Examples: bagging methods, forests of randomized trees.
- By contrast, in boosting methods, models are built sequentially and one tries to reduce the bias of the combined model. The motivation is to combine several weak models to produce a powerful ensemble.
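All of the estimators above share scikit-learn's fit/predict interface, so they can be compared on the same data with a few lines; the dataset and hyperparameters below are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# A synthetic classification problem to exercise each model family.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest (ensemble)": RandomForestClassifier(
        n_estimators=100, random_state=0),
}

# The shared interface makes a uniform cross-validated comparison trivial.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>26}: mean CV accuracy = {scores.mean():.3f}")
```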
This and similar analyses reveal that each performance measure conveys some information and hides other information. Therefore, there is an information trade-off carried through the different metrics. A practitioner has to choose, quite carefully, the quantity s/he is interested in monitoring, while keeping in mind that other values matter as well. Classifiers may rank differently depending on the metrics along which they are being compared.
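A sketch of monitoring several metrics at once with scikit-learn, on a deliberately imbalanced synthetic problem where accuracy, F1, and AUC can order the same two classifiers differently (whether they actually swap ranks depends on the data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Class imbalance (90% / 10%) makes the metric trade-offs visible:
# accuracy rewards predicting the majority class; F1 and AUC do not.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (GaussianNB(), DecisionTreeClassifier(random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__,
          "accuracy=%.3f" % accuracy_score(y_te, pred),
          "F1=%.3f" % f1_score(y_te, pred),
          "AUC=%.3f" % roc_auc_score(y_te, pred))
```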