Editing, coding and tabulation of data

 The data collected from respondents are generally not in a form
that can be analyzed directly. After the responses are recorded or
received, the next stage is data preparation, i.e. making
the data amenable to appropriate analysis. Data preparation
includes editing, coding, and data entry; it is the activity that
ensures the accuracy of the data and their conversion from raw
form to reduced and classified forms that are more appropriate for
analysis. Preparing a descriptive statistical summary is another
preliminary step leading to an understanding of the collected data.

 Editing
 Coding
 Validation of data
 Data entry
 Classification
 Tabulation

EDITING: The process of
checking and adjusting
responses in the completed
questionnaires for omissions,
legibility, and consistency and
readying them for coding and
storage.

The first step in analysis is to edit the raw data.
Editing detects errors and omissions,
corrects them when possible, and certifies
that maximum data quality standards are
achieved. The editor's purpose is to
guarantee that data are:
Accurate.
Consistent with the intent of the question
and other information in the survey.
Uniformly entered.
Complete.
Arranged to simplify coding and tabulation.

Purpose of Editing
For consistency between and among responses.
For completeness in responses.
To reduce effects of item non-response.
To better utilize questions answered out of order.
To facilitate the coding process.

1. Checking the number of schedules / questionnaires.
2. Completeness (all questions filled in).
3. Legibility.
4. To avoid inconsistencies in answers.
5. To maintain a degree of uniformity.
6. To eliminate irrelevant responses.
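
Once responses have been keyed in, some of the editing checks above (completeness, consistency) can be partly automated. The following is a minimal Python sketch, not a prescribed procedure; the field names (`age`, `years_employed`, `rating`) and the consistency rule are hypothetical and chosen only for illustration.

```python
# Hypothetical required fields for one questionnaire (illustrative only).
REQUIRED_FIELDS = ("age", "years_employed", "rating")

def edit_check(response):
    """Return a list of problems found in one response (a dict)."""
    problems = []
    # Completeness: every required question must have an answer.
    for field in REQUIRED_FIELDS:
        if response.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    # Consistency: a respondent cannot have worked longer than they have lived.
    age = response.get("age")
    years = response.get("years_employed")
    if isinstance(age, int) and isinstance(years, int) and years > age:
        problems.append("inconsistent: years_employed exceeds age")
    return problems

responses = [
    {"age": 34, "years_employed": 10, "rating": 5},
    {"age": 25, "years_employed": 40, "rating": 3},
    {"age": 41, "years_employed": 15, "rating": ""},
]

# Keep only the responses that need the editor's attention.
flagged = {i: edit_check(r) for i, r in enumerate(responses) if edit_check(r)}
```

Flagged responses would still go to a human editor, who decides whether to correct the entry, consult the rest of the schedule, or contact the respondent.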

Field Editing
Preliminary editing by a field supervisor on the same day as the
interview to catch technical omissions, check legibility of
handwriting, and clarify responses that are logically or
conceptually inconsistent.
In large projects, field editing review is a responsibility of the field
supervisor. It should be done soon after the data have been
gathered. During the stress of data collection in a personal
interview and paper-and-pencil recording in an observation, the
researcher often uses ad hoc abbreviations and special symbols. Soon
after the interview, experiment, or observation, the investigator
should review the reporting forms.

Office Editing
Editing performed by a central office staff; often done more
rigorously than field editing.
It should take place when all forms or schedules have been
completed and returned to the office. This type of editing implies
that all forms should get a thorough editing by a single editor in a
small study and by a team of editors in case of a large inquiry.
Editor(s) may correct obvious errors such as an entry in the
wrong place, an entry recorded in months when it should have been
recorded in weeks, and the like. In case of inappropriate or
missing replies, the editor can sometimes determine the proper
answer by reviewing the other information in the schedule. At
times, the respondent can be contacted for clarification.

CODING: The process of identifying and classifying each answer with
a numerical score or other character symbol. The numerical score or
symbol is called a code, and serves as a rule for interpreting,
classifying, and recording data. Identifying responses with codes is
necessary if data is to be processed by computer.

 Coding refers to the process of assigning numerals or other
symbols to answers so that responses can be put into a limited
number of categories or classes. Numeric coding simplifies the
researcher's task in converting a nominal variable, like gender, to
a "dummy variable" (coded 0 or 1). Statistical software can also use
alphanumeric codes, as when we use M and F, or other letters, in
combination with numbers and symbols for gender.
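
To illustrate the conversion of alphanumeric codes into a dummy variable, here is a small Python sketch. The sample values and the choice of "F" as the category coded 1 are assumptions made only for the example.

```python
# Alphanumeric codes as recorded on the questionnaire (hypothetical sample).
gender = ["M", "F", "F", "M", "F"]

# Dummy (0/1) coding; the choice of "F" as 1 is arbitrary here.
is_female = [1 if g == "F" else 0 for g in gender]

# A convenient property of 0/1 coding: the mean is the proportion coded 1.
share_female = sum(is_female) / len(is_female)
```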

Coded data is often stored electronically in the form of a data matrix:
a rectangular arrangement of the data into rows (representing cases)
and columns (representing variables). The data matrix is organized
into fields, records, and files:
Field: A collection of characters that represents a single type of data.
Record: A collection of related fields, i.e., fields related to the same case (or respondent).
File: A collection of related records, i.e., records related to the same sample.
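
As a rough illustration of fields, records, and files, the sketch below reads a small comma-separated file with Python's standard `csv` module. The variable names and values are invented for the example.

```python
import csv
import io

# A small "file": each line is a record (one respondent),
# each comma-separated value is a field (one variable).
raw = """respondent_id,gender,age,rating
1,M,34,5
2,F,28,6
3,F,45,4
"""

records = list(csv.DictReader(io.StringIO(raw)))

# Rows of the data matrix are cases; columns are variables.
n_cases = len(records)
variables = list(records[0].keys())
ages = [int(r["age"]) for r in records]  # one column of the matrix
```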

 Coding involves assigning numbers or other symbols to answers so
that the responses can be grouped into a limited number of
categories.
 In coding, categories are the partitions of a data set of a given
variable (e.g., if the variable is gender, the partitions are male and
female).
 Both closed- and open-response questions must be coded.

Code design / Coding Frame
It describes the locations of variables and lists the code assignments
for the attributes composing those variables.
It serves two essential functions:
 It is the primary guide used in the coding process.
 It is the guide for locating variables and interpreting the columns
in the data file during analysis.

 EXAMPLE
Q: Which magazines do you read?
1. Hindustan Times
2. Business Standard
3. Economic Times
4. The Hindu
5. The Times of India
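
A coding frame for the example question above could be held as a simple lookup table. This is only a sketch: the code `9` for answers outside the frame and the helper name `code_answer` are hypothetical additions, not part of the original example.

```python
# Code assignments taken from the example question above.
CODE_FRAME = {
    "hindustan times": 1,
    "business standard": 2,
    "economic times": 3,
    "the hindu": 4,
    "the times of india": 5,
}
OTHER = 9  # hypothetical code for answers outside the frame

def code_answer(answer):
    """Map one verbatim answer to its numeric code."""
    return CODE_FRAME.get(answer.strip().lower(), OTHER)

# A respondent may name several titles, so each response is a list.
response = ["The Hindu", "economic times ", "Mint"]
coded = [code_answer(a) for a in response]
```

Normalizing case and surrounding spaces before the lookup keeps trivially different spellings of the same answer under one code.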

 After the data is coded, it is validated for data entry errors. The
data is then used for further analysis. The purpose of validation
is to confirm that the data has been collected as per the specifications in
the prescribed format or questionnaire.
 For example, if the respondent is asked to rate a particular aspect
on a scale of 1 to 7, then the only valid responses are 1, 2, …, 7.
Any other number entered is not considered valid.
 In validation, entries for such a question are restricted to the
integers between 1 and 7. This minimizes errors. Other
validations include checking that age is within a limit such as 100 and
that dates such as birth dates and joining dates are not in the future.
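
The validation rules described above can be sketched in Python as follows. The record layout (`rating`, `age`, `birth_date`) and the lower age bound of 0 are assumptions for the example; the 1 to 7 range, the limit of 100, and the no-future-dates rule come from the text.

```python
from datetime import date

def validate(record, today):
    """Return the names of fields that fail the validation rules."""
    errors = []
    # Rating must be an integer from 1 to 7.
    rating = record["rating"]
    if not (isinstance(rating, int) and 1 <= rating <= 7):
        errors.append("rating")
    # Age must fall within a plausible range (0 to 100 assumed here).
    if not 0 <= record["age"] <= 100:
        errors.append("age")
    # Dates such as date of birth cannot lie in the future.
    if record["birth_date"] > today:
        errors.append("birth_date")
    return errors

today = date(2024, 1, 1)  # fixed so the example is reproducible
ok = {"rating": 4, "age": 30, "birth_date": date(1993, 6, 15)}
bad = {"rating": 8, "age": 130, "birth_date": date(2030, 1, 1)}
```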

 Data having a common characteristic are placed in one class
and in this way the entire data get divided into a number of
groups or classes. Classification can be one of the following two
types, depending upon the nature of the phenomenon involved:

 Classification according to attributes: Data are classified on the
basis of common characteristics which
can either be descriptive (such as literacy, sex, honesty, etc.) or
numerical (such as weight, height, income, etc.).
 Classification according to class-intervals: Data relating to income,
production, age, weight, etc. come under this category.
Such data are known as statistics of variables and are classified on
the basis of class intervals. For instance, persons whose incomes,
say, are within Rs 201 to Rs 400 can form one group, those whose
incomes are within Rs 401 to Rs 600 can form another group, and so
on.
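
Classification by class intervals can be sketched in Python as below. The interval boundaries (Rs 201 to 400, Rs 401 to 600, and so on) follow the example in the text; the sample incomes are invented, and incomes are assumed to be whole rupees.

```python
from collections import Counter

def income_class(income, start=201, width=200):
    """Return the (lower, upper) class interval containing an income."""
    if income < start:
        raise ValueError("income below the first class interval")
    index = (income - start) // width
    lower = start + index * width
    return (lower, lower + width - 1)

# Hypothetical incomes in whole rupees.
incomes = [250, 399, 401, 560, 600, 610]

# Number of persons falling in each class interval.
frequency = Counter(income_class(x) for x in incomes)
```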

 Tabulation is the process of summarizing raw data and
displaying the same in compact form (i.e., in the form of
statistical tables) for further analysis. In a broader sense,
tabulation is an orderly arrangement of data in columns and
rows.

To carry out investigation
To do comparison
To locate omissions and errors in the data.
To use space economically
To simplify data
To use it as future reference

 It conserves space and reduces explanatory and descriptive
statement to a minimum.
 It facilitates the process of comparison.
 It facilitates the summation of items and the detection of
errors and omissions.
 It provides a basis for various statistical computations.

1. It simplifies complex data.
2. It facilitates comparison.
3. It facilitates computation.
4. It presents facts in minimum possible space.
5. Tabulated data are good for reference, and they make it easier to
present the information in the form of graphs and
diagrams.

Every table should have a clear, concise and adequate title so as to
make the table intelligible without reference to the text and this
title should always be placed just above the body of the table.
Every table should be given a distinct number to facilitate easy
reference. The column headings (captions) and the row headings
(stubs) of the table should be clear and brief. The units of
measurement under each heading or sub-heading must always be
indicated. Explanatory footnotes, if any, concerning the table
should be placed directly beneath the table, along with the
reference symbols used in the table.

 Table number.
 Title of the table.
 Captions or column headings.
 Stubs or row designation.
 Body of the table.
 Footnotes.
 Sources of data.

 statistical enquiry.
 easily understandable.
 suit the size of the paper.
 Rows and columns in a table should be numbered.
 The arrangement of rows and columns should be in a logical order.
 The rows and columns should be clearly separated by lines.

There are three bases for classifying tables:
Purpose of a table
Originality of a table
Construction of a table.

Purpose:
 General Purpose Table
 Special Purpose Table
Originality:
 Original Table
 Derived Table
Construction:
 Simple or One-Way Table
 Complex Table
  Double or Two-Way Table
  Treble (Three-Way) Table
  Manifold Table

General Purpose Table: A general
purpose table is a table that is
of general use. It does not serve
any specific purpose or address any
specific problem under consideration.
Special Purpose Table: A special
purpose table is a table that is
prepared with some specific purpose
in mind.

Original Table: An original table is that
in which data are presented in the same
form and manner in which they are
collected.
Derived Table: A derived table is that in
which data are not presented in the form
or manner in which these are collected.
Instead, the data are first converted into
ratios or percentages and then presented.

SUBJECTS      STUDENTS
MARKETING     30
FINANCE       25
HR            25
OPERATIONS    20
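
To show the difference between an original and a derived table, the sketch below takes the figures from the table above and converts them into percentages. It is only one possible way to lay out the computation.

```python
# Original table: number of students per subject (figures from the table above).
students = {"MARKETING": 30, "FINANCE": 25, "HR": 25, "OPERATIONS": 20}

total = sum(students.values())

# Derived table: each count expressed as a percentage of the total.
percentages = {subject: 100 * n / total for subject, n in students.items()}

for subject, pct in percentages.items():
    print(f"{subject:<12}{pct:>6.1f}%")
```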

In a complex table (also known as a manifold table) data are
presented according to two or more characteristics simultaneously.
The complex tables are two-way or three-way tables according to
whether two or three characteristics are presented simultaneously.
a. Double or Two-Way Table
b. Three-Way Table
c. Manifold (or Higher Order) Table

 Double or Two-Way Table: In such a table, the variable under study is
subdivided according to two interrelated
characteristics.

 Three-Way Table: In such a table, the variable under study is divided
according to three interrelated characteristics.

 Manifold (or Higher Order) Table: Such tables provide information about
a large number of interrelated characteristics in the data set.
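
A two-way table can be built by counting cases on two characteristics at once. The sketch below uses Python's `collections.Counter`; the cases (subject and gender of six students) are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (subject, gender) for each student.
cases = [
    ("MARKETING", "M"), ("MARKETING", "F"), ("FINANCE", "F"),
    ("FINANCE", "F"), ("HR", "M"), ("MARKETING", "F"),
]

cell = Counter(cases)                 # joint frequencies
rows = sorted({s for s, _ in cases})  # stubs (row headings)
cols = sorted({g for _, g in cases})  # captions (column headings)

# Print the table with row totals.
print(f"{'':<12}" + "".join(f"{c:>5}" for c in cols) + f"{'Total':>7}")
for r in rows:
    counts = [cell[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{n:>5}" for n in counts) + f"{sum(counts):>7}")
```

Adding a third element to each tuple (for example, year of study) would extend the same counting approach to a three-way table.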

 Graphic representation is another way of analysing numerical data. A
graph is a chart in which statistical data are represented
by lines or curves drawn through coordinate points
plotted on its surface.
 Graphs enable us to study the cause and effect relationship between
two variables. They help to measure the extent of change in one
variable when another variable changes by a certain amount.
 Graphs also enable us to study both time series and frequency
distributions, as they give a clear and precise picture of the problem.
Graphs are also easy to understand and eye-catching.

It is used to make the data understandable to the common man.
It helps in easy and quick understanding of data.
Data displayed by graphical representation can be remembered
for a long time.
Data can be compared at a glance.

 A line diagram is suitable for representing data
supplied chronologically in an ascending or descending order.
 Usually, it shows the behavior of a variable over time.
Successive values of a variable at different periods or places
are plotted as separate points on a two-dimensional plane, and
joining those points in order forms a continuous
line, called a line diagram.

 While drawing such a diagram, the usual convention is to show
the successive values of the variable under study along the vertical
axis in increasing order and the time dimension along the
horizontal axis. Care should be taken that neither
axis is too long or too short relative to the other.

In bar graphs data is represented by bars.
The bars can be drawn in either direction, i.e. vertical or
horizontal.
The bars are of equal width and start from a common
horizontal or vertical line, and their length indicates the
corresponding values of the statistical data.
When do we use a bar diagram?
When the data are given in whole numbers.
When the data are to be compared easily.

 A bar graph is another well-known statistical tool for representing
raw data clearly. It is applied especially in situations
where the given data can be classified on the basis of a non-measurable
criterion, e.g., standards of college education in
different states of India at the present time.
 Such data are very often called cross-section data. More precisely, a bar
graph is formed as a collection of rectangles of the same
width placed successively at equal distances. In a vertical bar chart,
the height of each bar represents the value of the
variable for the corresponding class shown horizontally.

 Usually, these bars are placed either vertically on the
horizontal axis or horizontally on the vertical axis and they are
thus known as vertical bar chart or horizontal bar chart.

[Figure: bar chart with vertical axis from 0% to 100%, horizontal axis Category 1 to Category 4, and Series 1 to Series 3]

A pie diagram is a circle in which different components are represented
by sectors of the circle.
To draw a pie diagram, first the value of each category is
expressed as a percentage of the total, and then the angle of 360° is
divided in the same percentages.
Then, at the centre of the circle, these angles are drawn
one after another, starting from a particular radius.
In this way we get a set of sectorial areas proportional to the
values of the items.

 When do we use a pie diagram?
When the data are given in percentages.
When different aspects of a variable are to be displayed.
When the data are to be compared normally.

 The aggregate value of the variable is expressed as the total area of a
circle with a reasonable radius. The area of the circle is
subdivided into sectors by several radii; each sector's area bears the
same proportion to the total area of the circle as its
angle at the centre bears to 360°.
 To draw it correctly, we convert the given values of the
variable into percentages of the total value of the variable. As the full
angle at the centre is 360°, it represents 100 percent of the
variable, so 1 percent of the variable is equivalent to an angle of
3.6° at the centre.
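
The conversion from values to central angles can be sketched in Python as follows. The budget figures are hypothetical and were chosen so that the total is 100, which makes the 3.6° per percent rule easy to see.

```python
def pie_angles(values):
    """Return the central angle, in degrees, for each category."""
    total = sum(values.values())
    # Each category's share of the total, scaled to the 360 degrees of a circle.
    return {name: 360 * v / total for name, v in values.items()}

# Hypothetical data, chosen so that the total is 100.
budget = {"Food": 40, "Rent": 30, "Transport": 20, "Other": 10}
angles = pie_angles(budget)
```

Here a category worth 40 percent of the total receives 40 × 3.6° = 144°, and the angles of all categories add up to 360°.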

 A histogram is essentially a bar graph of a frequency
distribution. It can be constructed for equal as well as
unequal class intervals. Area of any rectangle of a histogram
is proportional to the frequency of that class.
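
For unequal class intervals, the area rule above means that the height of each rectangle is the frequency divided by the class width (often called the frequency density). A minimal Python sketch, using an invented frequency distribution:

```python
# Hypothetical frequency distribution with unequal class intervals:
# (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

# Height of each rectangle = frequency / class width (frequency density),
# so that area (height x width) is proportional to the frequency.
heights = [f / (upper - lower) for lower, upper, f in classes]

# Check: multiplying each height by its class width recovers the frequency.
areas = [h * (upper - lower) for h, (lower, upper, _) in zip(heights, classes)]
```

Although the class 20 to 40 has the largest frequency, its rectangle is lower than that of the class 10 to 20 because it is twice as wide.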

When data are given in the form of frequencies.
When class interval has to be displayed by a diagram.
When we need to calculate the Mode of a distribution graphically.

A frequency polygon is essentially a line graph.
We can get it from a histogram if the midpoints of the upper
bases of the rectangles are connected by straight lines.
But it is not essential to plot a histogram first to draw it.
We can construct it directly from a given frequency
distribution.
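
Constructing a frequency polygon directly from a frequency distribution amounts to plotting each class frequency against the class midpoint. The sketch below computes those points in Python; the distribution is hypothetical, and closing the polygon on the horizontal axis is a common convention rather than something stated in the text.

```python
# Hypothetical frequency distribution with equal class intervals:
# (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6)]

# One vertex per class: (class midpoint, frequency).
points = [((lower + upper) / 2, f) for lower, upper, f in classes]

# By convention the polygon is often closed on the horizontal axis by adding
# the midpoints of an empty class on either side.
width = classes[0][1] - classes[0][0]
closed = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]
```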

When data are given in the form of
frequencies.
When two or more groups have to be
displayed in one diagram.
When two or more groups are to be
compared.

 A frequency curve is another type of graphical representation of
data.
 It is obtained when the top points of a frequency polygon are joined not by
straight lines but by smooth curves.
 A frequency polygon is drawn using a scale, while a frequency
curve is drawn freehand.
