Data Warehousing and
• Business-user-friendly stories about past events (including near real time)
• Designed to support decision making
• Serves a digest of answers in grouped and aggregated ways
• More meaningful and therefore more important to the business
• Ingests data from disparate sources which need to be merged to
enable business friendly queries
Data Warehouse Definition
• A consolidating bolt-on to existing operational systems
• Structured data associated with a specific user base and a specific set of
predefined business queries
• The data schema is predefined and structured to facilitate regular and ad-
• Populating the data warehouse requires multiple ETL processes designed in
• Halts the proliferation of reports
Data Warehouse Basic Architecture
[Diagram labels: Operational Data Sources; ETL Staging Area; Data Preparation; Business Queries]
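The flow named in the diagram labels (operational sources, an ETL staging area, then business queries) can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a description of any particular warehouse product. The two source layouts, the table names `staging` and `fact_sales`, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical rows from two operational systems with different layouts.
pos_sales = [("2024-01-05", "north", 120.0), ("2024-01-06", "south", 80.0)]
web_orders = [
    {"day": "2024-01-05", "area": "North", "total": "45.50"},
    {"day": "2024-01-07", "area": "South", "total": "30.00"},
]

def run_etl(conn):
    """Extract both sources into staging, then transform and load a fact table."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE staging (sale_date TEXT, region TEXT, amount TEXT, source TEXT)")
    cur.execute("CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL, source TEXT)")
    # Extract: land the raw values in the staging area without cleaning them.
    cur.executemany(
        "INSERT INTO staging VALUES (?, ?, ?, 'pos')",
        [(d, r, str(a)) for d, r, a in pos_sales],
    )
    cur.executemany(
        "INSERT INTO staging VALUES (?, ?, ?, 'web')",
        [(o["day"], o["area"], o["total"]) for o in web_orders],
    )
    # Transform and load: normalise region names and cast amounts to numbers.
    cur.execute(
        "INSERT INTO fact_sales "
        "SELECT sale_date, lower(region), CAST(amount AS REAL), source FROM staging"
    )
    conn.commit()

def sales_by_region(conn):
    """A business query: total sales per region across all sources."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
run_etl(conn)
totals = sales_by_region(conn)
print(totals)
```

Keeping the raw rows in a staging table separate from the cleaned fact table means the transformation can be rerun or corrected without going back to the operational systems.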
Data Warehouse Requirements
• Organisational Data is easy to access
• Information is presented consistently
• Adaptive and resilient to change
• Serves as a base for improved decision making
• Accepted by the business community
• A data warehouse provides historic information for decision making
• Machine Learning uses algorithms to process features in the data to
learn patterns, make predictions and propose solutions
• Image recognition, Classification, Forecasting, Anomaly detection
• Learning is Supervised (labelled with the desired outcome) or
Unsupervised (unlabelled, the model learns unaided)
Machine Learning - Supervised
• A predictive model is trained using a labelled training data set and
evaluated on its performance
• The model is tweaked to improve performance
• The model is then run against a test data set whose labels are withheld and
evaluated on how well it identifies the correct label
• k-Nearest Neighbours
• Linear and Logistic Regression
• Decision Trees
• Support Vector Machines
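As one illustration of the supervised workflow, the sketch below implements k-Nearest Neighbours in plain Python: it learns from labelled training rows and is scored against test rows whose labels are only used for checking. The data points, the label names and the helper names `knn_predict` and `accuracy` are hypothetical, chosen only for this example.

```python
from collections import Counter
import math

def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest labelled points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Share of test points whose predicted label matches the withheld label."""
    hits = sum(knn_predict(train, features, k) == label for features, label in test)
    return hits / len(test)

# Hypothetical labelled data: (features, label).
train = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"), ((7.0, 6.0), "high"), ((6.5, 7.5), "high"),
]
# Test labels are never shown to knn_predict; they are used only for scoring.
test = [((1.2, 1.4), "low"), ((6.8, 6.9), "high")]

print(accuracy(train, test, k=3))
```

Tweaking the model here would mean changing `k` or the distance measure and comparing the resulting accuracy.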
Machine Learning - Unsupervised
• The training data set is unlabelled
• The descriptive model is trained and evaluated on its performance
• Clustering - k-Means
• Association Rules
• Natural Language Processing
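For the clustering case, a minimal k-Means sketch in plain Python is shown below. It follows the usual assign-then-update loop on unlabelled points. The sample points are hypothetical, and the starting centroids are passed in explicitly to keep the example deterministic; this is a simplification, since practical implementations usually pick starting centroids randomly or with a smarter seeding scheme.

```python
import math

def k_means(points, seeds, iterations=10):
    """Cluster unlabelled points, starting from the given seed centroids."""
    centroids = list(seeds)
    k = len(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave a centroid in place if it has no members
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment

# Hypothetical unlabelled data forming two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, assignment = k_means(points, seeds=[(0.0, 0.0), (6.0, 6.0)])
print(assignment)
```

No labels are supplied at any point; the cluster numbers in `assignment` only say which points were grouped together, and interpreting the groups is left to the analyst.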
Machine Learning an Extension to Data
• Much of the hard work to cleanse and transform data has been
• Ask the Business Question – what is the objective? Is it descriptive or
• Does the data contain the desired features?
• Is further data transformation required?
• Which ML algorithm is optimal for answering the question?
• Iterative approach assessing and evaluating model(s) performance
• Present the Solution
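The iterative step of assessing and evaluating candidate models could look something like the sketch below. It compares several settings of a k-Nearest Neighbours model using leave-one-out evaluation and keeps the best scorer. The data, the candidate values and the helper names are hypothetical; leave-one-out is just one evaluation scheme among several and is used here because it suits very small data sets.

```python
from collections import Counter
import math

def knn_predict(train, point, k):
    """Majority vote among the k nearest labelled points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each row in turn, train on the rest, and score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

def pick_best_k(data, candidates):
    """Evaluate every candidate k and return the best one with all scores."""
    scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
    best = max(scores, key=scores.get)  # on ties, the earliest candidate wins
    return best, scores

# Hypothetical labelled data: (features, label).
data = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"),
    ((6.0, 6.5), "high"), ((7.0, 6.0), "high"), ((6.5, 7.5), "high"),
]
best_k, scores = pick_best_k(data, candidates=[1, 3, 5])
print(best_k, scores)
```

With only three rows per class, k = 5 always outvotes the held-out row's own class, which shows why each candidate needs to be measured rather than assumed.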