ACKNOWLEDGEMENT
We present our ISAS project on BUSINESS INTELLIGENCE. We would
like to thank our teacher for suggesting the idea for the
project, and our parents for their physical and financial help
and support. We would also like to acknowledge our sources of
information, the internet and various books, without which we
could never have completed this project.
TABLE OF CONTENTS:
Definition of Business Intelligence
Terms that enable Business Intelligence:
Data Mining
    Definition
    What is a data warehouse?
    What can data mining do?
    Importance of data mining
Data Warehouse
    Definition
    Data warehouse models
    Advantages and disadvantages
Data Analysis
    Definition
    The process of data analysis
        Data Cleaning
        Running Analysis
        Presenting Results
        Protecting Files
    Types of data analysis
        Reductive Analysis
        Mathematical Analysis
        Visual Analysis
Conclusion
BUSINESS INTELLIGENCE
Definition:
Business intelligence (BI) comprises the processes, technologies and
tools that help us turn data into information, information into
knowledge, and knowledge into plans that guide the organization. It
includes technologies for gathering, storing, analyzing and providing
access to data to help enterprise users make better business decisions.
BI technologies provide historical, current, and predictive views of
business operations. Common functions of business intelligence
technologies are data reporting, data mining, data warehousing, data
analysis, business performance management, benchmarking, text
mining, and predictive analytics.
By extension, “business intelligence” may refer to the collected
information itself or to the explicit knowledge developed from that
information.
DATA MINING
Definition:
Data mining is “the extraction of hidden predictive information
from large databases”. Data mining software is one of a number of
analytical tools for analyzing data. It allows users to analyze data from
many different dimensions or angles, categorize it, and summarize the
relationships identified.
Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases. Many
companies and organizations use data mining services to increase their
profitability.
Data mining techniques are the result of a long process of research
and product development. This evolution began when business data was
first stored on computers and continued with improvements in data
access. Data mining is a powerful new technology with great potential to
help companies focus on the most important information in their data
warehouses.
What is a data warehouse?
A data warehouse focuses on data storage. However, the means to
retrieve and analyze data, to extract, transform and load data, and to
manage the data dictionary are also considered essential components of
a data warehouse system.
What can data mining do?
Data mining is primarily used today by companies with a strong
consumer focus: retail, financial, communication, and marketing
organizations. It enables these companies to determine relationships
among "internal" factors such as price, product positioning, or staff
skills, and "external" factors such as economic indicators, competition,
and customer demographics. It also enables them to determine the
impact of these factors on sales, customer satisfaction, and corporate
profits.
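As a small illustration of relating an internal factor to sales, the sketch below computes the Pearson correlation between price and units sold. The figures and the function name `pearson` are made up for this example; a real data mining tool would examine many fields at once.

```python
from math import sqrt

# Hypothetical monthly figures (illustrative only, not real data).
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
units_sold = [200, 180, 150, 130, 100]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(prices, units_sold)
print(f"correlation between price and units sold: {r:.2f}")
```

A value close to -1 suggests that higher prices go with lower sales in this toy data set; it shows a relationship, not a cause.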
Importance of data mining:
Data mining is important because of its wide applicability. It is
increasingly used in business applications to understand and then
predict valuable information such as consumer buying actions and
tendencies, customer profiles, and industry analysis. Data mining is
used in applications such as market research, consumer behavior,
direct marketing, bioinformatics, genetics, text analysis, e-commerce,
customer relationship management and financial services.
DATA WAREHOUSE
Definition:
A data warehouse is a place where data is stored for archival,
analysis and security purposes. Usually a data warehouse is either a
single computer or many computers (servers) tied together to create one
giant computer system.
The data can be raw or formatted. It can cover many topics, including
an organization's sales, salaries, operational data, summaries of data
such as reports, copies of data, human resource data, inventory data,
and external data used for simulations and analysis.
Data warehouse models:
Data systems are usually described in terms of two processing models.
Online Transaction Processing (OLTP), strictly a model for operational
systems rather than warehouses, is built for speed and ease of use.
Online Analytical Processing (OLAP), the model typically used for data
warehouses, is more difficult to use and adds an extra step of analysis
over the data. It usually involves more steps, which slows the process
down, and it needs much more data in order to
analyze certain queries. One of the more common data warehouse
models include a data warehouse
Advantages and disadvantages of data warehouse:
Advantages:
The main reason to implement a data warehouse is so that employees
can access it and use the data for reports, analysis and decision
making. Using the data in a warehouse can help you locate trends, focus
on relationships, and understand more about the environment that your
business operates in.
Data warehouses also increase the consistency of the data and
allow it to be checked repeatedly to determine how relevant it is.
Because most data warehouses are integrated, you can pull data from
many different areas of the business, for instance human resources,
finance, IT, and accounting.
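The idea of pulling data from different business areas can be sketched with Python's built-in `sqlite3` module. The table names (`hr`, `finance`), columns, and figures below are hypothetical and exist only for this illustration; a real warehouse would be far larger and loaded from source systems.

```python
import sqlite3

# An in-memory database standing in for an integrated warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hr (department TEXT, headcount INTEGER);
    CREATE TABLE finance (department TEXT, budget REAL);
    INSERT INTO hr VALUES ('Sales', 12), ('IT', 5);
    INSERT INTO finance VALUES ('Sales', 240000.0), ('IT', 150000.0);
    """
)


def budget_per_head(connection):
    """Combine HR and finance data into one report, keyed by department."""
    rows = connection.execute(
        """
        SELECT hr.department, finance.budget / hr.headcount
        FROM hr
        JOIN finance ON hr.department = finance.department
        ORDER BY hr.department
        """
    ).fetchall()
    return dict(rows)


report = budget_per_head(conn)
for department, value in report.items():
    print(f"{department}: {value:.0f} per employee")
```

Because both tables live in one place and share the `department` key, a single query can answer a question that spans two areas of the business.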
Disadvantages:
While there are plenty of reasons to have a data warehouse, there are
also a few drawbacks. A data warehouse is time consuming to create and
to keep operating.
Users may also find that their current systems are incompatible with
their data.
DATA ANALYSIS
Definition:
Data analysis is a process of inspecting, cleaning, transforming,
and modeling data with the goal of highlighting useful information,
suggesting conclusions, and supporting decision making. Data analysis
has multiple facets and approaches, encompassing diverse techniques
under a variety of names, in different business, science, and social
science domains.
Data mining is a particular data analysis technique that focuses on
modeling and knowledge discovery for predictive rather than purely
descriptive purposes. Business intelligence covers data analysis that
relies heavily on aggregation, focusing on business information. In
statistical applications, some people divide data analysis into
descriptive statistics, exploratory data analysis (EDA), and
confirmatory data analysis (CDA).
EDA focuses on discovering new features in the data, and CDA on
confirming or falsifying existing hypotheses.
Predictive analytics focuses on the application of statistical or
structural models for predictive forecasting or classification, while
text analytics applies statistical, linguistic, and structural
techniques to extract and classify information from textual sources, a
species of unstructured data. All are varieties of data analysis.
Data integration is a precursor to data analysis, and data analysis is
closely linked to data visualization, which lies outside the main
subject of this project.
The Process of Data Analysis:
Data analysis is a process, within which several phases can be
distinguished: -
1. Data cleaning.
2. Running Analysis.
3. Presenting Results.
4. Protecting Files.
Data Cleaning
Before substantive analysis begins, we need to verify that our data
are accurate and that the variables are well named and properly labeled.
That is, we clean the data. First we must bring our data into Stata. If
we received the data in Stata format, this is as simple as a single use
command. If the data arrived in another format, we need to verify that
they were imported correctly into Stata. We should also evaluate the
variable names and labels. Awkward names make it more difficult to
analyze the data and can lead to mistakes. Likewise, incomplete or
poorly designed labels make the output difficult to read and lead to
mistakes. Next we verify that the sample and variables are what they
should be.
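The same checks apply whatever statistics package is used. As a rough sketch in Python, the code below tidies awkward variable names and flags values that cannot be right. The sample data, the 0 to 120 age range, and the function names are assumptions made for this example.

```python
import csv
import io
import re

# A tiny hypothetical data set with awkward column names and two bad values.
RAW_CSV = """id,Age (years),Income ($)
1,34,52000
2,-5,48000
3,41,
"""


def clean_name(name):
    """Turn an awkward column name into a short lower-case identifier."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def load_rows(text):
    """Read CSV text and rename every column with clean_name."""
    reader = csv.DictReader(io.StringIO(text))
    return [{clean_name(key): value for key, value in row.items()} for row in reader]


def find_problems(rows):
    """Return (id, message) pairs for values that fail basic checks."""
    problems = []
    for row in rows:
        if not 0 <= int(row["age_years"]) <= 120:
            problems.append((row["id"], "age out of range"))
        if row["income"] == "":
            problems.append((row["id"], "income missing"))
    return problems


rows = load_rows(RAW_CSV)
for record_id, message in find_problems(rows):
    print(f"record {record_id}: {message}")
```

Here the cleaned names are `id`, `age_years` and `income`, and the check reports the negative age and the missing income before any analysis is run.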
Running Analysis
Once the data are cleaned, fitting our models and computing the
graphs and tables for our paper or book are often the simplest part of
the workflow. Indeed, this part of the book is relatively short. Although I
do not discuss specific types of analysis, later I talk about ways to ensure
the accuracy of our results, to facilitate later replication, and to keep
track of our do-files, data files, and log files regardless of the
statistical methods we are using.
Presenting Results
Once the analyses are complete, we want to present them. I
consider several issues in the workflow of the presentation. First, the
results have to be moved from the statistical output into the paper or
presentation; an efficient workflow can automate much of this work.
Second, we need to document the provenance of all findings that we
present. If our presentation does not preserve the source of our
results, it can be very difficult to track them down later. Finally,
there are a number of simple things that we can do to make our
presentation more effective.
Protecting Files
When we are cleaning our data, running analyses and writing, we
need to protect our files to prevent loss due to hardware failure, file
corruption, or unintentional deletion. There are a number of simple
things we can do to make it easier to routinely save our work. With
backup software readily available and disk storage so cheap, the
hardest part of making backups is keeping track of what we have.
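One simple way to keep track of backups is to record every copy in a log. The sketch below is only an illustration of that idea: the function name `backup_file`, the log file name, and the time-stamped naming scheme are all assumptions, not a recommended product or standard.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(source, backup_dir, log_name="backup_log.txt"):
    """Copy source into backup_dir under a time-stamped name and log the copy."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps

    # A checksum lets us confirm later that the copy has not been corrupted.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    with open(backup_dir / log_name, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{source.name}\t{target.name}\t{digest}\n")
    return target


if __name__ == "__main__":
    # Demonstration in a temporary folder so that no real files are touched.
    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "analysis.txt"
        original.write_text("results so far", encoding="utf-8")
        copy = backup_file(original, Path(folder) / "backups")
        print("backed up to", copy.name)
```

The log answers the question raised above: which files were saved, when, and under what name.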
Types of Data Analysis
Reductive Analysis
It is a methodology in which individual facts, or aggregations of
those facts, are used as the basis for analysis. It can be argued that
this methodology is not analysis at all. When used alone, it results in
a simple reduction of the data set to one or more statistics, which
often do not adequately represent the underlying phenomena.
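A short example shows why a single statistic can mislead. The two hypothetical sales series below have the same mean, yet they behave very differently; the numbers are invented for illustration.

```python
from statistics import mean, pstdev

# Two made-up series of daily sales with identical averages.
steady = [50, 50, 50, 50, 50, 50]
volatile = [0, 100, 0, 100, 0, 100]

for name, series in (("steady", steady), ("volatile", volatile)):
    print(f"{name}: mean = {mean(series)}, spread = {pstdev(series)}")
```

Reported only by their means, the two series look the same. The spread (population standard deviation) reveals that one is stable and the other swings between extremes.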
Mathematical Analysis
Mathematical analysis, sometimes referred to as classical data
analysis, is a methodology in which mathematical models are applied to
the data and used as the basis for analysis. Mathematical modeling is
an important technique in the study of data because it lets us reduce
unmanageable masses of data to models that can be used to make
predictions about the underlying phenomena and to understand such
attributes of the data as normality and linearity.
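As a minimal sketch of applying a mathematical model, the code below fits a straight line to a few points by ordinary least squares and uses it to make a prediction. The data points and the function name `fit_line` are assumptions for this example; other models could be used in the same way.

```python
# Hypothetical observations: advertising spend (x) and sales (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [12.1, 13.9, 16.2, 17.8, 20.0]


def fit_line(x_values, y_values):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # predicted y for a new x of 6
print(f"y = {slope:.2f} * x + {intercept:.2f}; prediction at x = 6: {prediction:.2f}")
```

Five data points are reduced to two numbers, the slope and the intercept, which is exactly the kind of reduction to a usable model described above.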
Visual Analysis
It is a methodology in which the data, as a whole, are used as the
basis for analysis. The data are presented visually, and any modeling
that occurs is done as a result of analyzing those visuals. It is
especially powerful because it matches our natural ability to interpret
data holistically and exposes attributes of the data that can be hidden
in models.