2. 2
The Need for Data Structures
Data structures organize data
⇒ more efficient programs.
More powerful computers ⇒ more complex
applications.
More complex applications demand more
calculations.
Complex computing tasks are unlike our
everyday experience.
3. 3
What is a data structure?
In a general sense, any data
representation is a data structure.
Example: An integer
More typically, a data structure is meant
to be an organization for a collection of
data items.
4. 4
Organizing Data
Any organization for a collection of records
can be searched, processed in any order,
or modified.
The choice of data structure and algorithm
can make the difference between a
program running in a few seconds or
many days.
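To make the gap concrete, here is a minimal sketch that counts key comparisons for two ways of searching the same sorted array. The function names and the helper makeSortedData are hypothetical, chosen only for illustration, and comparison counts stand in for running time.

```cpp
#include <cstddef>
#include <vector>

// Counts the key comparisons made by a front-to-back linear scan.
std::size_t linearSearchComparisons(const std::vector<int>& data, int key) {
    std::size_t comparisons = 0;
    for (int value : data) {
        ++comparisons;
        if (value == key) break;
    }
    return comparisons;
}

// Counts the probes made by binary search; requires sorted input.
std::size_t binarySearchComparisons(const std::vector<int>& sorted, int key) {
    std::size_t comparisons = 0;
    std::size_t lo = 0, hi = sorted.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++comparisons;
        if (sorted[mid] == key) break;
        if (sorted[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return comparisons;
}

// Hypothetical helper: builds the sorted array 0, 1, ..., n-1.
std::vector<int> makeSortedData(int n) {
    std::vector<int> data(n);
    for (int i = 0; i < n; ++i) data[i] = i;
    return data;
}
```

On one million records, searching for the last key costs the linear scan one million comparisons, while binary search needs at most 20 probes. The price is that the array must be kept sorted, which makes insertion more expensive.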
5. 5
Efficiency
A solution is said to be efficient if it solves
the problem within its resource
constraints.
Space
Time
The cost of a solution is the amount of
resources that the solution consumes.
6. 6
Costs and Benefits
A data structure requires a certain
amount of:
space for each data item it stores
time to perform a single basic operation
programming effort.
7. 7
Example: Banking Application
Operations are (typically):
Open accounts
Close accounts
Access account to Add money
Access account to Withdraw money
8. 8
Example: Banking Application
Teller and ATM transactions are
expected to take little time.
Opening or closing an account can
take much longer (perhaps up to an
hour).
9. 9
Example: Banking Application
When considering the choice of data
structure to use in the database system
that manages the accounts, we are
looking for a data structure that:
Need not be efficient for deletion
Is highly efficient for search
Is moderately efficient for insertion
10. 10
Example: Banking Application
One data structure that meets these requirements is
the hash table (chapter 9).
Records are accessible by account number (called
an exact-match query)
Hash tables allow for extremely fast exact-match
search.
Hash tables also support efficient insertion of new
records.
Deletions can also be supported efficiently (but too
many deletions lead to some degradation in
performance – requiring the hash table to be
reorganized).
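As a rough sketch of the idea, the four banking operations can be written on top of the C++ standard hash table, std::unordered_map. The class name AccountTable, its method names, and the choice to store balances in cents are assumptions made for this illustration, not part of the slides.

```cpp
#include <unordered_map>

// Hypothetical account store: account number -> balance in cents.
class AccountTable {
public:
    // Opens an account; returns false if the number is already in use.
    bool open(long accountNumber, long initialBalanceCents = 0) {
        return balances_.emplace(accountNumber, initialBalanceCents).second;
    }

    // Closes an account; returns false if it does not exist.
    bool close(long accountNumber) {
        return balances_.erase(accountNumber) == 1;
    }

    // Adds money; fails on an unknown account or a negative amount.
    bool deposit(long accountNumber, long cents) {
        auto it = balances_.find(accountNumber);  // exact-match query
        if (it == balances_.end() || cents < 0) return false;
        it->second += cents;
        return true;
    }

    // Withdraws money; also fails if the balance is too low.
    bool withdraw(long accountNumber, long cents) {
        auto it = balances_.find(accountNumber);  // exact-match query
        if (it == balances_.end() || cents < 0 || it->second < cents) return false;
        it->second -= cents;
        return true;
    }

    // Returns the balance in cents, or -1 if there is no such account.
    long balance(long accountNumber) const {
        auto it = balances_.find(accountNumber);
        return it == balances_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<long, long> balances_;
};
```

Every operation begins with a lookup by account number, which is the exact-match query that a hash table answers in expected constant time.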
11. 11
Example: City Database
Database system for cities and towns.
Users find information about a particular
place by name (exact-match query)
Users also find all places that match a
particular value (or range of values),
such as location or population size
(called a range query).
12. 12
Example: City Database
The database must answer queries quickly
enough to satisfy the patience of a typical
user.
For an exact-match query, a few seconds is
satisfactory.
For a range query, the entire operation
may be allowed to take longer, perhaps on
the order of a minute.
13. 13
Example: City Database
The hash table is inappropriate for
implementing the city database because:
It cannot perform efficient range queries
The B+ tree (section 10) supports large
databases:
Insertion
Deletion
Range queries
If the database is created once and then
never changed, a simple linear index would
be more appropriate.
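The sketch below shows why an ordered structure suits range queries. It uses std::multimap, which is an in-memory balanced search tree and not a B+ tree, as a stand-in; the function name and the idea of indexing cities by population are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Range query: names of all cities whose population lies in [low, high].
// The index maps population -> city name and is kept in key order.
std::vector<std::string> citiesWithPopulationBetween(
        const std::multimap<long, std::string>& byPopulation,
        long low, long high) {
    std::vector<std::string> result;
    if (low > high) return result;                 // empty range
    auto first = byPopulation.lower_bound(low);    // first key >= low
    auto last = byPopulation.upper_bound(high);    // first key > high
    for (auto it = first; it != last; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

Because the keys are stored in order, the query locates the two ends of the range and walks the records in between. A hash table scatters keys and would have to examine every record to answer the same query.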
14. 14
Selecting a Data Structure
Select a data structure as follows:
1. Analyze the problem to determine the resource
constraints a solution must meet.
2. Determine the basic operations that must be
supported. Quantify the resource constraints
for each operation.
3. Select the data structure that best meets these
requirements.
15. 15
Some Questions to Ask
Are all data inserted into the data structure
at the beginning, or are insertions
interspersed with other operations?
Can data be deleted?
Are all data processed in some well-
defined order, or is random access
allowed?
16. 16
Data Structure Philosophy
Each data structure has costs and benefits.
Rarely is one data structure better than
another in all situations.
A data structure requires:
space for each data item it stores,
time to perform each basic operation,
programming effort.
17. 17
Data Structure Philosophy
Each problem has constraints on available space
and time.
Only after a careful analysis of problem
characteristics can we know the best data
structure for the task.
Bank example:
Start account: a few minutes
Transactions: a few seconds
Close account: overnight
18. 18
Goals of this Course
1. Reinforce the concept that costs and benefits
exist for every data structure.
2. Learn the commonly used data structures.
These form a programmer's basic data structure
"toolkit."
3. Understand how to measure the cost of a data
structure or program.
These techniques also allow you to judge the merits
of new data structures that you or others might
invent.
19. 19
Abstract Data Types
Abstract Data Type (ADT): a definition for a
data type solely in terms of a set of values
and a set of operations on that data type.
Each ADT operation is defined by its inputs
and outputs.
Encapsulation: Hide implementation details.
20. 20
Data Structure
A data structure is the physical implementation of an
ADT.
Each operation associated with the ADT is implemented by one
or more subroutines in the implementation.
In an OO language such as C++, an ADT and its
implementation together make up a class.
Data structure usually refers to an organization for data
in main memory.
File structure: an organization for data on peripheral
storage, such as a disk drive.
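One possible way to show the split in C++ is an abstract class for the ADT and a derived class for the data structure. The names IntStack and ArrayIntStack are hypothetical, and separating the two into different classes is a design choice of this sketch, not something the slides prescribe.

```cpp
#include <vector>

// ADT: a set of values (sequences of ints) and the operations on them.
// Nothing here says how the values are stored.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int value) = 0;
    virtual int pop() = 0;              // precondition: !empty()
    virtual bool empty() const = 0;
};

// Data structure: one physical implementation of the ADT,
// here backed by a contiguous array in main memory.
class ArrayIntStack : public IntStack {
public:
    void push(int value) override { items_.push_back(value); }

    int pop() override {
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const override { return items_.empty(); }

private:
    std::vector<int> items_;            // hidden implementation detail
};
```

Code written against IntStack keeps working if ArrayIntStack is later swapped for, say, a linked implementation, which is the practical benefit of encapsulation.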
21. 21
Labeling collections of objects
Humans deal with complexity by assigning a
label to an assembly of objects. An ADT
manages complexity through abstraction.
Hierarchies of labels
Ex: transistors ⇒ gates ⇒ CPU.
In a program, implement an ADT, then think only
about the ADT, not its implementation.
22. 22
Logical vs. Physical Form
Data items have both a logical and a physical form.
Logical form: definition of the data item within an
ADT.
Ex: Integers in mathematical sense: +, -
Physical form: implementation of the data item
within a data structure.
Ex: 16/32 bit integers, overflow.
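A small sketch of the difference, using unsigned fixed-width types so that the wraparound is well defined in C++. The function names add16 and add32 are hypothetical.

```cpp
#include <cstdint>

// Physical form with 16 bits: results wrap modulo 65536.
std::uint16_t add16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(a + b);
}

// Physical form with 32 bits: results wrap modulo 4294967296.
std::uint32_t add32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}
```

In the logical form, 65535 + 1 is 65536. Here add32(65535, 1) agrees with the mathematics, while add16(65535, 1) returns 0 because the result does not fit in 16 bits.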
25. 25
Problems
Problem: a task to be performed.
Best thought of as inputs and matching
outputs.
Problem definition should include constraints
on the resources that may be consumed by
any acceptable solution.
26. 26
Problems (cont)
Problems ⇔ mathematical functions
A function is a matching between inputs (the domain)
and outputs (the range).
An input to a function may be a single number, or a
collection of information.
The values making up an input are called the
parameters of the function.
A particular input must always result in the same
output every time the function is computed.
27. 27
Algorithms and Programs
Algorithm: a method or a process followed to solve
a problem.
A recipe: The algorithm gives us a “recipe” for solving
the problem by performing a series of steps, where
each step is completely understood and doable.
An algorithm takes the input to a problem (function)
and transforms it to the output.
A mapping of input to output.
A problem can be solved by many algorithms.
28. 28
A problem can have many
algorithms
For example, the problem of sorting can be
solved by the following algorithms:
Insertion sort
Bubble sort
Selection sort
Shellsort
Mergesort
Others
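As an illustration, here are minimal sketches of two of the listed algorithms. Both solve the same problem (same input, same sorted output) by different steps; the function signatures are assumptions of this sketch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort: grow a sorted prefix by inserting each new element
// into its proper place within that prefix.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];            // shift larger elements right
            --j;
        }
        a[j] = key;
    }
}

// Selection sort: repeatedly select the smallest remaining element
// and swap it to the front of the unsorted part.
void selectionSort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < a.size(); ++j) {
            if (a[j] < a[smallest]) smallest = j;
        }
        std::swap(a[i], a[smallest]);
    }
}
```

Given the same vector, both functions leave it in ascending order; they differ only in the work they do to get there.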
29. 29
Algorithm Properties
An algorithm possesses the following properties:
It must be correct.
It must be composed of a series of concrete steps.
There can be no ambiguity as to which step will be performed
next.
It must be composed of a finite number of steps.
It must terminate.
A computer program is an instance, or concrete
representation, for an algorithm in some
programming language.
30. 30
Programs
A computer program is a concrete
representation of an algorithm in some
programming language.
Naturally, there are many programs that
are instances of the same algorithms,
since any modern programming
language can be used to implement any
algorithm.
31. 31
To Summarize:
A problem is a function or a mapping of
inputs to outputs.
An algorithm is a recipe for solving a
problem whose steps are concrete and
unambiguous.
A program is an instantiation of an
algorithm in a computer programming
language.
32. 32
Example
Problem: find y = x to the power of 2
Algorithm1: Multiply X by X
Algorithm2: Add X to a running total X times, starting from 0
Program1 (implements Algorithm2): int y = 0;
for (int i = 0; i<x; i++)
y +=x;
33. 33
Example (cont.)
Program2: (Assembly Intel 8086)
mov bl,x // read x (assumes x >= 1)
mov al,0 // running sum starts at 0
mov cl,bl // init counter to x
again: add al,bl // al = al + x
dec cl // decrement loop ctr
jnz again // repeat until ctr reaches 0
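For comparison, both algorithms for this problem can be sketched as C++ functions. The function names are hypothetical, and the repeated-addition version assumes x is non-negative.

```cpp
// Algorithm 1: a single multiplication.
long squareByMultiplication(long x) {
    return x * x;
}

// Algorithm 2: add x to a running total x times, starting from 0.
// Assumes x >= 0; for negative x the loop body never runs.
long squareByRepeatedAddition(long x) {
    long y = 0;
    for (long i = 0; i < x; ++i) {
        y += x;
    }
    return y;
}
```

Both functions compute the same mapping from input to output, but the first takes one step while the second takes a number of steps proportional to x.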
34. 34
In class exercises
Think of a program you have used that is
unacceptably slow. Identify the specific operations
that make the program slow. Identify other basic operations
that the program performs quickly enough.
Imagine that you are a shipping clerk for a large
company. You have just been handed about 1000
invoices, each of which is a single sheet of paper with
a large number in the upper right corner. The
invoices must be sorted by this number, in order from
lowest to highest. Write down as many different
approaches to sorting the invoices as you can think
of.
Editor's Notes
A primary concern for this course is efficiency.
You might believe that faster computers make it unnecessary to be concerned with efficiency. However…
So we need special training.
If you are willing to pay enough in time delay. Example: Simple unordered array of records.
Alternate definition: Better than known alternatives (“relatively efficient”).
Space and time are typical constraints for programs.
This does not mean always strive for the most efficient program. If the program operates well within resource constraints, there is no benefit to making it faster or smaller.
Typically want the “simplest” data structure that will meet the requirements.
These questions often help to narrow the possibilities.
If data can be deleted, a more complex representation is typically required.
The space required includes data and overhead.
Some data structures/algorithms are more complicated than others.
The first goal is a worldview to adopt.
The second goal is the “nuts and bolts” of the course.
The third goal prepares a student for the future.
The concept of an ADT is one instance of an important principle that must be understood by any successful computer specialist: managing complexity through abstraction.
In this class, we frequently move above and below “the line” separating logical and physical forms.
But there are NO constraints on HOW the problem is solved.
“Correct” means computes the proper function.
“Concrete steps” are executable by the machine in question.
We frequently interchange use of “algorithm” and “program” though they are actually different concepts.