Marketing Analytics using R/Python


Capstone Project - IS 6596

Project Supervisor: Dr. Rohit Aggarwal
Project Contributors: Mayank Badjatya (u1085897), Sagar Singh (u1088202)

MARKETING ANALYTICS USING R/PYTHON
Contents

Executive Summary
Book Description
Why Data Science?
Skill Sets Required for a Data Scientist
7 Steps to Effective Predictive Modelling
Marketing Analysis
    Fraud Detection
    Market Segmentation
    Advertising
Lessons Learned
Next Steps
Executive Summary

The objective of this project is to discuss the importance of Machine Learning in different sectors and how it solves problems in the Marketing Analytics field. We cover market segmentation, advertising, and fraud detection in our project. We applied different Machine Learning algorithms, using R and Python libraries, to predict and solve these problems. After building models and running test data through them, we obtained the following results:
• We trained Decision Tree and Random Forest classifier models that predict with 73% accuracy whether a person will default, based on credit history, income, job type, dependents, etc.
• We segmented social networking profiles based on the likes and dislikes of each person using k-means clustering.
• We built a predictive model of the messages a customer receives and determined whether a message is spam or not with an accuracy of 97%, using a Naïve Bayes classifier.
• We created several other models using different algorithms, but these are beyond the scope of this report.
Book Description

An Introduction to Statistical Learning: This book provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. It presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open-source statistical software platform. An Introduction to Statistical Learning covers many of the same topics as the more advanced The Elements of Statistical Learning, but at a level accessible to a much broader audience. The book is targeted at statisticians and non-statisticians alike who wish to use innovative statistical learning techniques to analyze their data, and it assumes only a previous course in linear regression and no knowledge of matrix algebra.

Machine Learning with R: This book is intended for anybody hoping to use data for action. Perhaps you already know a bit about machine learning but have never used R, or perhaps you know a little about R but are new to machine learning. In either case, this book will get you up and running quickly. It would be helpful to have a bit of familiarity with basic math and programming concepts, but no prior experience is required; all you need is curiosity. Machine learning, at its core, is concerned with the algorithms that transform information into actionable intelligence, which makes it well suited to the present-day era of big data. Without machine learning, it would be nearly impossible to keep up with the massive stream of information. Given the growing prominence of R (a cross-platform, zero-cost statistical programming environment), there has never been a better time to start using machine learning. R offers a powerful but easy-to-learn set of tools that can assist you with finding data insights. By combining hands-on case studies with the essential theory you need to understand how things work under the hood, this book provides all the knowledge required to start applying machine learning to your own projects.

Marketing Analytics: Data-Driven Techniques: This book helps tech-savvy marketers and data analysts solve real-world business problems with Excel. Using data-driven business analytics to understand customers and improve results is a great idea in theory, but in today's busy offices, marketers and analysts need simple, low-cost ways to process and make the most of all that data. Written by data analysis expert Wayne L. Winston, this practical resource shows how to tap a simple and cost-effective tool, Microsoft Excel, to solve specific business problems using powerful analytic techniques and achieve optimum results. Practical exercises in each chapter helped us apply and reinforce techniques as we learned. The book:
• Shows how to perform sophisticated business analyses using the cost-effective and widely available Microsoft Excel instead of expensive, proprietary analytical tools
• Reveals how to target and retain profitable customers and avoid high-risk customers
• Helps you forecast sales and improve response rates for marketing campaigns
• Explores how to optimize price points for products and services, optimize store layouts, and improve online advertising
• Covers social media, viral marketing, and how to exploit both effectively
Why Data Science?

Data Science is a field that can be applied anywhere. Here are some examples of people who use data science as a tool in their field without coming from an IT background.

• Politics: We may have heard how statistical wizard Nate Silver predicted the electoral votes for each state in the 2012 presidential election, showing that raw data crunching of polls is much more reliable than traditional punditry.

• Healthcare: The role of big data in medicine is to build better health profiles and better predictive models around individual patients so that we can better diagnose and treat disease. Big data comes into play by aggregating ever more information across the multiple scales that constitute a disease: from DNA, proteins, and metabolites to cells, tissues, organs, organisms, and ecosystems.

• Automotive Industry: Areas in the automotive industry impacted by big data include:
  a. Conceptual Design: Real-world data collected from billions of miles driven will undoubtedly influence safety, aerodynamics, power algorithms, and other fundamental elements of the vehicle.
  b. Drawing Boards: Efficiency gained in design, production volumes, and manufacturing through big data in the auto industry will make it economically feasible to make today's options tomorrow's standard equipment.
  c. Procurement: Supply chain management optimized by big data will help manufacturers continue to wring new efficiency from the procurement process.
  d. Manufacturing: On the assembly line, data gathered throughout the building process will be used in predictive analytics to improve manufacturing simulations and watch machine performance, making the next assembly line even more efficient and flexible.

• Marketing: Big data is already having a major influence on vehicle marketing. Social sentiment will play a growing role in manufacturers' plans to design new vehicles, and customer feedback on current models helps marketing experts identify key themes and messages for new campaigns.

• Finance: Understanding consumer habits, preferences, and buying power across market segments gives manufacturers the insights needed to develop more effective financing programs. But that is just the first step: new insights from big data analyses of sales and in-field use data will help captive financing companies develop new services and new revenue streams.

• Services: Like performance, service will benefit as both a contributor and a user of big data in the automotive industry. Information gathered through millions of service events will provide feedback to designers.
Skill Sets Required for a Data Scientist

Technical Skills:

Python Coding – Python is the most common coding language required in data science roles, along with Java, Perl, or C/C++.

Hadoop Platform – Although this isn't always a requirement, it is heavily preferred in many cases. Experience with Hive or Pig is also a strong selling point, and familiarity with cloud tools such as Amazon S3 can be beneficial.

SQL Database/Coding – Even though NoSQL and Hadoop have become a large component of data science, a candidate is still expected to be able to write and execute complex queries in SQL.

Unstructured Data – It is critical that a data scientist be able to work with unstructured data, whether it comes from social media, video feeds, or audio.

Non-Technical Skills:

Intellectual Curiosity – This phrase appears everywhere lately, especially as it relates to data scientists. Frank Lo describes what it means, and discusses other necessary "soft skills," in a guest blog post.

Business Acumen – To be a data scientist, you need a solid understanding of the industry you are working in and must know what business problems your company is trying to solve. Being able to discern which problems are important for the business to solve is critical, as is identifying new ways the business should be leveraging its data.

Communication Skills – Companies searching for a strong data scientist are looking for someone who can clearly and fluently translate technical findings to a non-technical team, such as the Marketing or Sales departments. A data scientist must enable the business to make decisions by arming it with quantified insights, in addition to understanding the needs of non-technical colleagues in order to wrangle the data appropriately.
7 Steps to Effective Predictive Modelling

Step 1: Defining the Objective
The first step in any modeling process is defining the objective. We determine which field the problem falls into. There are many such fields: target marketing, risk and fraud management, strategy implementation and change management, operational efficiency, customer experience, marketing campaign management, revenue or loss forecasting, workforce management, financial modeling, churn management, and social media influencers.

Step 2: Gathering the Data
Accurate, actionable, accessible data is the lifeblood of any successful model, so we collect enough data to build a predictive model on it.

Step 3: Preparing the Data for Modeling
The average modeler spends 70% of his or her time preparing data. In this step we get the data into the right format for analysis and for the tool we want to use:
1. Do initial cleaning up
2. Define variables and create a data dictionary
3. Join/append multiple datasets
4. Validate for correctness
5. Produce basic summary reports

Step 4: Selecting and Transforming the Variables
Determining the best fit is essential to good model performance. The underlying structure of the independent variables in relation to the dependent variable determines the power and longevity of a model. Special consideration is given to the fact that marketing data can have hundreds or even thousands of variables, so we apply methods for identifying the best candidate variables, and programs are introduced that automatically segment and transform the most powerful variables to ensure the best fit.

Step 5: Processing and Evaluating the Model
All the preparation work up to this point makes this next step run smoothly. Weights of Evidence and Information Values are calculated. For our main case study, we used various options within PROC LOGISTIC to determine the model with the best fit. Validation data are scored, tabulated, and compared using both SAS® and MS Excel®.

Step 6: Validating the Model
Models should perform well on the development data, and if the hold-out sample is randomly selected, the model should score the validation data with similar results. A true test of model performance is how well it performs on data from a different time or market area, so we used three powerful methods for ensuring model fit: 1) scoring alternate data is the best way to tell whether our model will perform in a real campaign; 2) bootstrapping uses simple resampling techniques to find confidence intervals around our estimates (see the sketch following Figure 1); 3) key variable analysis calculates important market factors as they are affected by the model, ensuring reasonable results.

Step 7: Implementing and Maintaining the Model
Effective implementation is a combination of business intelligence and well-designed procedures, so we score a new data set with the new model. Several auditing procedures are performed, and tracking and model maintenance are emphasized as best practices.

Figure 1: 7 Steps of Predictive Modelling
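To make the bootstrapping idea in Step 6 concrete, here is a minimal Python sketch that resamples hold-out predictions to put a confidence interval around an accuracy estimate. It is illustrative only: the placeholder y_test and pred arrays stand in for a real model's hold-out labels and predictions (the report performed this kind of validation in SAS).

import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical bootstrapping sketch; the placeholder arrays stand in for a
# fitted model's hold-out labels and predictions.
rng = np.random.default_rng(42)
y_test = rng.integers(0, 2, size=250)                        # placeholder labels
pred = np.where(rng.random(250) < 0.8, y_test, 1 - y_test)   # ~80% accurate

scores = []
n = len(y_test)
for _ in range(1000):
    idx = rng.integers(0, n, size=n)   # resample indices with replacement
    scores.append(accuracy_score(y_test[idx], pred[idx]))

lo, hi = np.percentile(scores, [2.5, 97.5])  # 95% confidence interval
print(f"accuracy 95% CI: [{lo:.3f}, {hi:.3f}]")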
Marketing Analysis

Figure 2: Facets of Marketing Analysis

An accurate customer risk assessment helps us acquire the most profitable consumers while minimizing risk. For business-to-consumer companies, Experian offers consumer credit information, advanced scoring software, prescreening systems, and application decisioning tools. For companies looking to acquire business customers, its business reports and public records, portfolio data, and risk modeling tools allow clients to create comprehensive profiles of business prospects and determine which businesses are well capitalized and financially suited for customer acquisition.
Fraud Detection

Fraud is a billion-dollar business, and it is increasing every year. The PwC global economic crime survey of 2016 found that more than one in three organizations (36%) experienced economic crime. Traditional methods of data analysis have long been used to detect fraud; they require complex and time-consuming investigations spanning different domains of knowledge such as finance, economics, business practices, and law.

To learn how Machine Learning algorithms solve the fraud detection problem, we took the credit dataset from Machine Learning with R. The idea behind our credit model is to identify factors that put an applicant at higher risk of default, so we need data on many past bank loans, whether each loan went into default, and information about the applicant. The credit dataset includes 1,000 examples of loans, plus a combination of numeric and nominal features indicating characteristics of the loan and the loan applicant. A class variable indicates whether the loan went into default.

We can see that "job", "phone", "checking_balance", "credit_history", "purpose", "savings_balance", "employment_duration", "other_credit", and "housing" are categorical features, so in Python we used OneHotEncoder() to convert the categorical data into 0s and 1s. After applying one-hot encoding to all categorical columns, we got 36 columns.

Figure 3: Conversion of categorical data into 0s and 1s
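As a minimal sketch of this encoding step, the snippet below expands the categorical columns named above into 0/1 indicator columns. The file name credit.csv is an assumption, and pd.get_dummies is used as a convenient stand-in for scikit-learn's OneHotEncoder mentioned in the text.

import pandas as pd

# Hypothetical sketch: one-hot encode the categorical columns of the
# credit dataset (column names from the report; the file name is assumed).
credit = pd.read_csv("credit.csv")

categorical = ["job", "phone", "checking_balance", "credit_history",
               "purpose", "savings_balance", "employment_duration",
               "other_credit", "housing"]

# pd.get_dummies performs the same 0/1 expansion as sklearn's OneHotEncoder.
encoded = pd.get_dummies(credit, columns=categorical)
print(encoded.shape)  # the report arrived at 36 columns after encoding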
We did initial data exploration and plotted the results using the matplotlib library.

Figure 4: Exploratory Data Analysis

We used a decision tree to determine whether a person is a defaulter or not, based on the features. The core algorithm for building decision trees is called ID3. Decision tree classifiers use a greedy approach: an attribute chosen at an early step cannot be used again, even if it would give a better classification at a later step. Decision trees also tend to overfit the training data, which can give poor results on unseen data. The algorithm uses two concepts to determine which feature to split the dataset on:

Information Gain: The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

Entropy: A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar (homogeneous) values. The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided, it has an entropy of one. A minimal decision tree sketch follows.
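The sketch below shows the corresponding scikit-learn workflow. Note that scikit-learn's DecisionTreeClassifier implements an optimized CART rather than ID3, but criterion="entropy" reproduces the information-gain splitting rule described above. It reuses encoded from the previous sketch, and the target column name default is an assumption.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Hypothetical sketch: train a decision tree on the encoded credit data.
# 'encoded' comes from the previous sketch; the 'default' target column
# name is assumed.
X = encoded.drop(columns=["default"])
y = encoded["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# criterion="entropy" selects splits by information gain, as described above.
tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
tree.fit(X_train, y_train)
print(classification_report(y_test, tree.predict(X_test)))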
After applying the decision tree model, we got the following classification report.

Figure 5: F1 Score for Decision Tree

The F1 score is a measure of a test's accuracy. It is the harmonic mean of precision and recall, F1 = 2 * (precision * recall) / (precision + recall), reaching its best value at 1 (perfect precision and recall) and its worst at 0.

A single decision tree yields a high-variance model; to overcome this drawback we use bagging. Bagging decreases the variance of our prediction by generating additional training data from the original dataset, using combinations with repetition to produce multisets of the same cardinality/size as the original data. Random Forest is an ensemble classifier that uses many decision tree models to predict the result: a different subset of the training data is selected, with replacement, to train each tree. A collection of trees is a forest, and the trees are trained on subsets selected at random, hence "random forest". After applying the Random Forest classifier, we got the following result.

Figure 6: F1 Score for Random Forest

We can clearly see the increase in the F1 score. The next step in building the model, as discussed earlier, is to fine-tune it. For this we used the grid search cross-validation technique. After applying GridSearchCV we got the following classification report.

Figure 7: F1 Score after GridSearchCV

From this model we conclude that it predicts correctly 73% of the time whether a person will be a defaulter or not. A combined sketch of the random forest and grid search steps appears below.
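This sketch continues from the decision tree example above; the hyperparameter grid is illustrative, not the one the report actually searched.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

# Hypothetical sketch: bagged trees via a random forest, tuned with
# grid-search cross-validation. The grid values are illustrative.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)
print(search.best_params_)
print(classification_report(y_test, search.best_estimator_.predict(X_test)))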
Market Segmentation

One of the most fundamental marketing activities is market segmentation. Because companies cannot connect with all their potential customers, they must divide markets into groups (segments) of consumers, customers, or clients with similar needs and wants. Firms can then target each of these segments by positioning themselves in a unique segment (such as Ferrari in the high-end sports car market). While market researchers often form market segments based on practical grounds, industry practice, and wisdom, cluster analysis allows segments to be formed from the data itself, making them less dependent on subjectivity.

Cluster analysis is a convenient method for identifying homogeneous groups of objects called clusters. Objects (or cases, observations) in a specific cluster share many characteristics but are very dissimilar to objects not belonging to that cluster. Below we try this process from start to finish.

For this analysis, we used a dataset representing a random sample of 30,000 U.S. high school students who had profiles on a well-known social networking service (SNS) in 2006. To protect the users' anonymity, the SNS will remain unnamed. However, at the time the data was collected, it was a popular web destination for US teenagers, so it is reasonable to assume that the profiles represent a wide cross section of American adolescents in 2006. Let's take a quick look at the specifics of the data.

Figure 8: Description of the data set
Figure 9: Min-Max of the Age
Figure 10: Gender and Age anomaly

There is something strange around the gender row: on looking carefully, we noticed NA values. 2,724 records (9 percent) have missing gender data. Besides gender, only age has missing values: a total of 5,086 records (17 percent) have missing ages. Also concerning is the fact that the minimum and maximum age values seem unreasonable; it is unlikely that a 3-year-old or a 106-year-old is attending high school. To ensure that these extreme values don't cause problems for the analysis, we cleaned them up before moving on.

Figure 11: Box Plot for the age distribution

A more reasonable range of ages for high school students includes those who are at least 13 years old and not yet 20 years old, so any age value falling outside this range was treated the same as missing data. An easy solution for handling missing values would be to exclude every record with a missing value, but that would discard a large share of the data. Instead, we created dummy variables for female and unknown gender: teens$female is assigned the value 1 if gender equals F and is not NA, and 0 otherwise. For the 5,523 now-missing age values, we used a strategy known as data imputation, which fills in the missing data with a best guess as to the true value. Since most people in a graduation cohort were born within a single calendar year, identifying the typical age for each cohort gave us a reasonable estimate of the age of any student in that graduation year. A pandas sketch of these cleaning steps follows.
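This is a minimal pandas sketch of the cleaning steps just described; the report worked in R, and the file name snsdata.csv and column names gender, age, and gradyear are assumptions.

import pandas as pd

# Hypothetical sketch of the cleaning steps described above.
teens = pd.read_csv("snsdata.csv")

# Treat ages outside the plausible high-school range [13, 20) as missing.
teens["age"] = teens["age"].where((teens["age"] >= 13) & (teens["age"] < 20))

# Dummy-code gender: 1 for female, 0 otherwise; flag unknown gender separately.
teens["female"] = (teens["gender"] == "F").astype(int)
teens["no_gender"] = teens["gender"].isna().astype(int)

# Impute missing ages with the mean age of each graduation-year cohort.
avg_age = teens.groupby("gradyear")["age"].transform("mean")
teens["age"] = teens["age"].fillna(avg_age)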
To cluster the teenagers into marketing segments, we used an implementation of k-means clustering. We started our cluster analysis by considering only the 36 features that represent the number of times various interests appeared on the teen SNS profiles.

Evaluating clustering results can be somewhat subjective; ultimately, the success or failure of the model hinges on whether the clusters are useful for their intended purpose. As the goal of this analysis was to identify clusters of teenagers with similar interests for marketing purposes, we largely measured our success in qualitative terms. For other clustering applications, more quantitative measures of success may be needed.

By examining whether the clusters fall above or below the mean level for each interest category, we can notice patterns that distinguish the clusters from each other. Cluster 3 is substantially above the mean interest level on all the sports, suggesting a group of Athletes fitting The Breakfast Club stereotype. Cluster 0 includes the most mentions of "cheerleading" and the word "hot" and is above the average level of football interest; these are the so-called Princesses. Clustering the remaining groups in the same way, this is what we found:

Figure 12: Cluster segmentation
Cluster 0 (N = 872): Princesses – cute, hair, shopping, clothes, dance
Cluster 1 (N = 21,308): Basket Cases – ??? (no single interest stood out)
Cluster 2 (N = 1,041): Criminals – drunk, deaths, drugs, die, music
Cluster 3 (N = 5,971): Athletes – basketball, soccer, football, volleyball
Cluster 4 (N = 808): Brains – band, marching, music, rock

We then focused our effort on turning these insights into action. We applied the clusters back onto the full dataset and looked at the demographic characteristics of each cluster. The mean age does not vary much by cluster, which is not too surprising, as these teen identities are often determined before high school. On the other hand, there are some substantial differences in the proportion of females by cluster. This is a very interesting finding: we didn't use gender data to create the clusters, yet the clusters are still predictive of gender. A k-means sketch in Python follows.
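The sketch below mirrors the clustering step in Python (the report used R's kmeans). The assumption that the 36 interest columns occupy positions 4 through 39 of the data frame is illustrative.

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical sketch: cluster the 36 interest-count features with k-means.
interest_cols = list(teens.columns[4:40])   # assumed positions of the features
X_interests = StandardScaler().fit_transform(teens[interest_cols])

# k = 5 matches the five segments reported above.
km = KMeans(n_clusters=5, n_init=10, random_state=42)
teens["cluster"] = km.fit_predict(X_interests)

# Compare each cluster's mean interest level to spot distinguishing patterns.
print(teens.groupby("cluster")[interest_cols].mean())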
Given our success in predicting gender, we also suspected that the clusters are predictive of the number of friends the users have, and this hypothesis is supported by the data. Our findings back the popular adage that "birds of a feather flock together": by using machine learning methods to cluster teenagers with others who have similar interests, we were able to develop a typology of teen identities that was predictive of personal characteristics, such as gender and the number of friends. These same methods can be applied to other contexts with similar results.

Advertising

Compared to all other marketing techniques, email marketing is the cheapest way of sending a marketing message to millions of people. Being so cheap, it is the tool of choice for marketing teams with a small budget trying to sell cheap products. Most of the time, such products do not deliver what they promise. Unfortunately, with email marketing we run the risk of being exposed to malware and fraudulent emails: worms and viruses often make use of email and spam techniques to propagate, and phishing emails and Nigerian 419 scams try to harvest either our money or our personal information, including credit card details. So, while email marketing is the tool of choice for most marketing teams, it requires stringent regulation to ensure that it does not get abused.

Below we build a model that predicts whether a composed message is spam or not. The dataset includes the text of SMS messages along with a label indicating whether the message is unwanted: junk messages are labeled spam, while legitimate messages are labeled ham. Since Naive Bayes has been used successfully for e-mail spam filtering, it seems likely that it could also be applied to SMS spam. However, relative to e-mail spam, SMS spam poses additional challenges for automated filters: SMS messages are often limited to 160 characters, reducing the amount of text that can be used to identify whether a message is junk.

Figure 13: Description of the data set

The first step towards constructing our classifier involves processing the raw data for analysis. SMS messages are strings of text composed of words, spaces, numbers, and punctuation. Handling this type of complex data takes a lot of thought and effort: one needs to consider how to remove numbers and
punctuation; how to handle uninteresting words such as and, but, and or; and how to break sentences apart into individual words.

Figure 14: Description of length of the Ham messages
Figure 15: Description of length of the Spam messages

Our first order of business was to standardize the messages to use only lowercase characters. To this end, we used the tolower() function, which returns a lowercase version of a text string. Continuing with our cleanup, we also eliminated all punctuation from the text messages. Our next task was to remove filler words such as to, and, but, and or from the SMS messages. These terms are known as stop words and are typically removed prior to text mining, because although they appear very frequently, they do not provide much useful information for machine learning. Another common standardization for text data is reducing words to their root form, a process called stemming: stemming takes words like learned, learning, and learns and strips the suffix to transform them into the base form, learn. After these steps, the messages are left with the blank spaces that previously separated the now-missing pieces, so the final step in our text cleanup was to remove that additional whitespace. A Python sketch of an equivalent cleanup pipeline follows.
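Here is a minimal Python equivalent of that cleanup pipeline (the report used R's tolower() and related tools). The stop-word set is a tiny illustrative sample, and stemming is omitted; in practice one would use a full stop-word list and a stemmer such as NLTK's SnowballStemmer.

import re

# Hypothetical Python equivalent of the R text-cleanup steps described above.
STOP_WORDS = {"to", "and", "but", "or", "a", "the", "of", "in", "is", "your"}

def clean_sms(text: str) -> list[str]:
    text = text.lower()                    # standardize to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)  # drop numbers and punctuation
    tokens = text.split()                  # split() also collapses whitespace
    return [t for t in tokens if t not in STOP_WORDS]

print(clean_sms("WINNER!! Claim your 1000 prize now, call 09061701461."))
# -> ['winner', 'claim', 'prize', 'now', 'call']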
A word cloud is a way to visually depict the frequency with which words appear in text data. The cloud is composed of words scattered somewhat randomly around the figure. The resulting word clouds are shown in the following diagrams:

Figure 16: Spam Word cloud
Figure 17: Ham Word cloud

Now that the data are processed to our liking, the next step is to split the messages into individual components through a process called vectorization. We took the corpus and created a data structure in which rows indicate documents (SMS messages) and columns indicate terms (words). The final step in data preparation was to transform this sparse matrix into a structure that can be used to train a Naive Bayes classifier. The sparse matrix included over 6,500 features, one for every word that appears in at least one SMS message. It is unlikely that all of these are useful for classification, so to reduce the number of features we eliminated any word that appears in fewer than five SMS messages, that is, in less than about 0.1 percent of the records in the training data.

Figure 18: Vectorization

To evaluate the SMS classifier, we tested its predictions on unseen messages in the test data. The process of evaluating machine learning algorithms is very similar to the process of evaluating students: since algorithms have varying strengths and weaknesses, tests should distinguish among the learners. A sketch of the vectorization and training steps appears after the figure.

Figure 19: Classification report
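This sketch shows the vectorization and training steps in Python (the report used R); the file name sms_spam.csv and the column names text and type are assumptions.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical sketch: vectorize the SMS corpus and train Naive Bayes.
sms = pd.read_csv("sms_spam.csv")      # assumed columns: 'text' and 'type'

# min_df=5 drops any word seen in fewer than five messages, as described
# above; lowercasing and stop-word removal mirror the earlier cleanup.
vectorizer = CountVectorizer(min_df=5, stop_words="english")
X = vectorizer.fit_transform(sms["text"])   # sparse document-term matrix
y = sms["type"]                             # 'ham' or 'spam'

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

nb = MultinomialNB().fit(X_train, y_train)
pred = nb.predict(X_test)
print(confusion_matrix(y_test, pred))       # rows: actual; columns: predicted
print(classification_report(y_test, pred))  # the report reached ~97% accuracy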
A confusion matrix is a table that categorizes predictions according to whether they match the actual value. One of the table's dimensions indicates the possible categories of predicted values, while the other dimension indicates the same for actual values. Although we have only seen 2 x 2 confusion matrices so far, a matrix can be created for models that predict any number of class values.

Lessons Learned

Lesson 1: Marketing research is fun – We got to work with a wide variety of datasets, dive in and learn all about the market they operate in, and relay valuable insights back to stakeholders. We dug up everything from why consumers make certain purchase decisions to what they are passionate about and what makes them tick.

Lesson 2: Collaboration is key – While doing this project, we found that no matter how tremendous an innovator someone might be, collaboration is very important.

Lesson 3: Check, re-check, and then check again – Projects move quickly, which means we don't have time to go back, re-collect data, or make corrections to a report. Questionnaires, surveys, and reports must be checked, checked again by a coworker, and then checked once more.

Next Steps

The next step would be to explore the other facets of marketing analysis, such as upsell and cross-sell and recommendation systems. We can use algorithms like Principal Component Analysis (PCA), QDA, and LDA to reduce the number of features (a PCA sketch follows), and we can analyze time series data using the ARIMA algorithm.
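As a hypothetical sketch of the PCA next step, the snippet below keeps just enough principal components to explain 90% of the variance in a 36-feature matrix; random counts stand in for real data.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical sketch: placeholder random counts stand in for the real
# 36 interest features used earlier.
rng = np.random.default_rng(42)
X = rng.poisson(lam=1.0, size=(30000, 36)).astype(float)

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.90)   # a float in (0, 1) targets explained variance
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())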
