1. Abstract Data Types
• Data abstraction, or abstract data types, is a programming
methodology where one defines not only the data
structure to be used, but the processes to manipulate the
structure
– like process abstraction, ADTs can be supported directly by
programming languages
• To support it, there need to be mechanisms for
– defining data structures
– encapsulating data structures and the routines that manipulate
them into one unit
• by placing all definitions in one unit, it can be compiled at one time
– information hiding to protect the data structure from outside
interference or manipulation
• the data structure should only be accessible from code encapsulated with
it so that the structure is hidden and protected from the outside
• objects are one way to implement ADTs, but because objects have
additional properties, we defer discussion of them until the next chapter
2. ADT Design Issues
• Encapsulation: it must be possible to define a unit that
contains a data structure and the subprograms that
access (manipulate) it
– design issues:
• will ADT access be restricted through pointers?
• can ADTs be parameterized (size and/or type)?
• Information hiding: controlling access to the data
structure through some form of interface so that it
cannot be directly manipulated by external code
– this is often done by using two sections of an ADT definition
• public part (interface) constitutes those elements that can be accessed
externally (often the interface permits only access to subprograms and
constants)
• the private part, which remains secure because it is only accessible by
subprograms of the ADT itself
3. Modula-2 ADTs
• Unit for encapsulation called a module
– modules can be combined to form libraries of ADTs
• To define a module:
– definition module: the interface containing a partial or
complete type definition (data structure) and the subprogram
headers and parameters
– implementation module: the portion of the data structure that
is to be hidden, along with all operation subprograms
• If the complete type declaration is given in the definition
module, the type is “transparent”; otherwise it is
“opaque”
– opaque types represent true ADTs and must be accessed
through pointers
• this restriction allows the ADT to be entirely hidden from user
programs since the user program need only define a pointer
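The opaque-type idea can be approximated in C++ (used here in place of Modula-2) with an incomplete type that clients handle only through a pointer. This is a minimal sketch in a single file; the names `Stack`, `stack_create`, `stack_push` and the rest are hypothetical, and the capacity of 100 is an assumption.

```cpp
#include <cassert>

// ---- plays the role of the definition module: all that clients see ----
struct Stack;                               // opaque: the layout is not revealed
Stack* stack_create();
void   stack_destroy(Stack* s);
bool   stack_empty(const Stack* s);
void   stack_push(Stack* s, int value);
int    stack_top(const Stack* s);
void   stack_pop(Stack* s);

// ---- plays the role of the implementation module: hidden details ----
struct Stack {
    static constexpr int kMaxSize = 100;    // assumed capacity
    int items[kMaxSize];
    int count = 0;
};

Stack* stack_create()              { return new Stack(); }
void   stack_destroy(Stack* s)     { delete s; }
bool   stack_empty(const Stack* s) { return s->count == 0; }
void   stack_push(Stack* s, int value) {
    if (s->count < Stack::kMaxSize) s->items[s->count++] = value;  // ignore overflow
}
int    stack_top(const Stack* s)   { assert(s->count > 0); return s->items[s->count - 1]; }
void   stack_pop(Stack* s)         { if (s->count > 0) --s->count; }
```

In a real program the first part would sit in a header and the second in a separately compiled file, so client code could declare only a `Stack*` and would not depend on the record layout.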
4. ADTs in Ada
• The encapsulation construct is the package
• Packages consist of two parts:
– specification package (the public or interface part)
– body package (the hidden or private part)
• The two packages can be compiled separately
– but only if specification package is compiled first
• The specification package must include details of the data
structure itself
– to preserve information hiding, the data structure’s definition can follow
the word private denoting that the following is hidden
• Ada offers three forms of ADTs
– those without information hiding, and thus are not true ADTs
– those that preserve information hiding by specifying that the data structure
is private
– those that specify that the data structure is limited private
• all ADTs have built-in operations for assignment and equality except for
limited private ADTs which have no built-in operations at all
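The difference between private and limited private types has a loose analogue in C++, sketched below. The analogy is not exact: C++ supplies assignment for a class automatically but equality has to be written by hand, and the class names `Counter` and `LimitedCounter` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Loosely like an Ada private type: clients may copy and compare values.
class Counter {
public:
    void increment() { ++count_; }
    int  value() const { return count_; }
    // C++ has no built-in == for classes, so it is written out here.
    bool operator==(const Counter& other) const { return count_ == other.count_; }
private:
    int count_ = 0;
};

// Loosely like an Ada limited private type: no assignment and no comparison.
class LimitedCounter {
public:
    LimitedCounter() = default;
    LimitedCounter(const LimitedCounter&) = delete;
    LimitedCounter& operator=(const LimitedCounter&) = delete;
    void increment() { ++count_; }
    int  value() const { return count_; }
private:
    int count_ = 0;
};

// Checked by the compiler: only Counter can be assigned.
static_assert(std::is_copy_assignable<Counter>::value, "Counter can be assigned");
static_assert(!std::is_copy_assignable<LimitedCounter>::value, "LimitedCounter cannot");
```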
5. Example Part I
package Stack_Pack is
   type Stack_Type is limited private;
   Max_Size : constant := 100;
   function Empty(Stk : in Stack_Type) return Boolean;
   procedure Push(Stk : in out Stack_Type; Element : in Integer);
   procedure Pop(Stk : in out Stack_Type);
   function Top(Stk : in Stack_Type) return Integer;
private
   type List_Type is array (1..Max_Size) of Integer;
   type Stack_Type is
      record
         List : List_Type;
         Topsub : Integer range 0..Max_Size := 0;
      end record;
end Stack_Pack;
The specification package for a stack ADT – see the next
slide for the body package
The actual ADT definition must appear either in the open
section (i.e., the public part) or in the private section
An alternative implementation to this approach is to
define a pointer in the private section of this package and
define the actual Stack_Type ADT in the body package.
This is discussed in more detail in the notes section of this
slide.
6. Example Part II
with Ada.Text_IO; use Ada.Text_IO;
package body Stack_Pack is
   function Empty(Stk : in Stack_Type) return Boolean is
   begin
      return Stk.Topsub = 0;
   end Empty;
   procedure Push(Stk : in out Stack_Type; Element : in Integer) is
   begin
      if Stk.Topsub >= Max_Size then
         Put_Line("ERROR - Stack overflow");
      else
         -- Topsub is a record component, so it must be qualified with Stk
         Stk.Topsub := Stk.Topsub + 1;
         Stk.List(Stk.Topsub) := Element;
      end if;
   end Push;
   procedure Pop(Stk : in out Stack_Type) is
   begin … end Pop;
   function Top(Stk : in Stack_Type) return Integer is
   begin … end Top;
end Stack_Pack;
The rest of the implementation
can be found on page 481
7. C++ ADTs
• C++ offers two mechanisms for building data structures:
the struct and the class
– because a struct’s members are public by default, a struct used
in the C style offers encapsulation without information hiding,
so for a true ADT we use a C++ class
– C++ classes contain both visible (public) and hidden (private)
components (as well as protected)
– C++ instances can be static, heap-dynamic and stack-dynamic
• the lifetime of a stack-dynamic instance ends when execution reaches the
end of the scope in which it was declared
• a stack-dynamic object may have heap-dynamic data so that parts of the
object may continue to exist even though the instance is deallocated
– we defer most of our discussion of objects in C++ to the next
chapter, but we will see an example next
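The three kinds of instance lifetime listed above can be observed with a small sketch. The class `Tracked` and its counter `live` are hypothetical and exist only to count how many instances are alive at a given moment.

```cpp
#include <cassert>

// Hypothetical class that counts how many of its instances currently exist.
class Tracked {
public:
    Tracked()  { ++live; }
    ~Tracked() { --live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
    static int live;
};
int Tracked::live = 0;

Tracked global_instance;            // static: exists until the program ends

// Returns the number of live instances while a local one is in scope.
int live_inside_scope() {
    Tracked local;                  // stack-dynamic: destroyed at the closing brace
    return Tracked::live;
}

// Returns how many instances were released by the delete.
int released_by_delete() {
    Tracked* p = new Tracked;       // heap-dynamic: exists until delete
    int during = Tracked::live;
    delete p;
    return during - Tracked::live;
}
```

Only the heap-dynamic instance needs an explicit `delete`; the other two are released automatically.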
8. C++ Example
#include <iostream>
class stack {
private:
    int *stackPtr;
    int max;
    int topPtr;
public:
    stack( ) { // constructor
        stackPtr = new int [100];
        max = 99;
        topPtr = -1;
    }
    ~stack( ) {delete [ ] stackPtr;} // destructor
    void push(int number) {…} // details omitted
    void pop( ) {…}
    int top( ) {…}
    int empty( ) {…}
};
Unlike the Ada example, in C++, the
entire definition is encapsulated in one
location
Information hiding is preserved through
the use of a private part with the interface
being defined in the public part
Any methods that are to be defined in this
class but not accessible outside of the
class would also be defined in the private
section
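The member functions whose bodies are omitted in the class above could be filled in along the following lines. This is only a sketch and not the textbook's implementation: the error messages, the use of `assert` in `top`, the `bool` return type of `empty` and the disabled copy operations are all assumptions, and the class is named `Stack` here to mark it as a separate illustration.

```cpp
#include <cassert>
#include <iostream>

class Stack {
private:
    int* stackPtr;   // heap-dynamic storage for the elements
    int  max;        // largest valid index
    int  topPtr;     // index of the top element, -1 when empty
public:
    Stack() : stackPtr(new int[100]), max(99), topPtr(-1) {}
    ~Stack() { delete[] stackPtr; }
    // Copying is disabled so two objects never share and double-delete one array.
    Stack(const Stack&) = delete;
    Stack& operator=(const Stack&) = delete;

    void push(int number) {
        if (topPtr == max) std::cerr << "Error in push: stack is full\n";
        else stackPtr[++topPtr] = number;
    }
    void pop() {
        if (topPtr == -1) std::cerr << "Error in pop: stack is empty\n";
        else --topPtr;
    }
    // Assumption: calling top() on an empty stack is a caller error.
    int top() const { assert(topPtr >= 0); return stackPtr[topPtr]; }
    bool empty() const { return topPtr == -1; }
};
```

Client code can reach the array only through `push`, `pop`, `top` and `empty`, which is the information hiding the private part is meant to provide.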
9. Java, C# and Ruby ADTs
• All three languages support ADTs through classes
– Java permits no stand-alone functions, only methods defined
in class definitions, and unlike C++, objects are referenced only
through reference variables (pointers); therefore, in Java, every
data structure is an ADT
• it is up to the programmer whether information hiding is enforced
or not
– C# borrows from both C++ and Java but primarily from Java:
all objects are heap dynamic and the modifiers are private,
public and protected, but C# also offers
• internal and protected internal modifiers, which are used for assemblies
(cross-platform objects), and properties, which can serve as both accessors
and mutators (see the example on pages 500-501)
– Ruby requires that all instance variables be private, and all
methods default to being public (but the programmer can
change this)
• instance variables do not have to be explicitly declared in Ruby; see the
example on pages 502-504
• we look at Ruby in more detail in chapter 12
10. Parameterized ADTs
• The ability to define an ADT
where the type and/or size is
specified generically so that a
specific version can be
generated later
– a stack defined without
specifying the element type
(integer vs. string vs. float, etc)
– a stack defined without a
restriction on the size of the stack
– Ada, C++, Java and C# all have
this capability
• The approach is to replace the
type definition with a place
holder that is filled in later
In Ada:
generic
Max_Size : positive;
type Element_Type is private;
… rest of ADT as before except that
Element_Type replaces Integer
and Max_Size as a constant is
removed
now we instantiate our ADT:
package Integer_Stack is new
Generic_Stack(100, Integer);
11. Parameterized ADTs Continued
• In C++, parameterized ADTs are implemented as
templated classes
– to change the stack class’ size, only change the constructor to
receive the size as a parameter, which is used to establish the
size of the array
– to change the stack’s type, the class is now defined as a
template using template <class Type> where Type is the
place-holder to be filled in by a specific type at compile time
• In both Ada and C++, the parameterized ADT
definition is generated at compile-time
– in Ada, the word new in the instantiation signals that a new
package should be generated by the compiler
• in C++, if two definitions ask for the same type of ADT, only one copy of
the code is generated; in Ada, the same code is generated
twice!
• In Java and C#, parameterized ADTs are implemented
as generic classes (you should have covered this in 360
for Java, so we skip it here)
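A templated stack in C++ might look like the sketch below. Unlike the approach described above, where the size is passed to the constructor, this version makes the size a second template parameter so that it mirrors the Ada generic with both Max_Size and Element_Type; that choice and the names `GenericStack`, `IntStack` and `StringStack` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Type is the element type and MaxSize the capacity; both are fixed at compile time.
template <class Type, std::size_t MaxSize>
class GenericStack {
public:
    bool empty() const { return count_ == 0; }
    bool full()  const { return count_ == MaxSize; }
    // Returns false instead of overflowing when the stack is full.
    bool push(const Type& element) {
        if (full()) return false;
        items_[count_++] = element;
        return true;
    }
    void pop() { if (!empty()) --count_; }
    // Assumption: the caller checks empty() before calling top().
    const Type& top() const { assert(!empty()); return items_[count_ - 1]; }
private:
    Type items_[MaxSize];
    std::size_t count_ = 0;
};

// Each distinct argument list names a separate class generated by the compiler.
using IntStack    = GenericStack<int, 100>;
using StringStack = GenericStack<std::string, 2>;
```

Declaring `IntStack s;` plays roughly the role of the Ada instantiation `package Integer_Stack is new Generic_Stack(100, Integer);`.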
12. Encapsulation Constructs
• For large programs, to avoid having to recompile all code when
one section changes
– code can be grouped into logically related chunks called encapsulations
• one approach is the nested subprogram: place logically related subprograms
inside of the subprograms that commonly call them, although this approach is
not available in C-based languages since subprograms cannot be nested
• use a header file (C, C++) and place logically related functions in the same
file, distributing the program across multiple files
– C++ goes beyond simple header files and includes the notion of a friend, which
has access to private definitions
• Ada packages (which can be compiled separately) can include any number of
data and subprogram declarations so that they can contain interfaces for
multiple ADTs
• C# assemblies that can share code with other software written in the .NET
environment
• Each language has some technique for then using the named
encapsulation, sometimes called a namespace
– see the notes section of this slide for details in various languages
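The friend mechanism mentioned for C++ can be sketched as follows. The class `SmallStack`, the function `depth_of` and the capacity of 8 are hypothetical names and values chosen for the example.

```cpp
#include <cassert>

class SmallStack {
public:
    void push(int value) {
        assert(count_ < kMax && "stack overflow");   // assumption: overflow is a caller error
        items_[count_++] = value;
    }
    // depth_of is an ordinary function, not a member, but is granted access.
    friend int depth_of(const SmallStack& s);
private:
    static constexpr int kMax = 8;   // assumed capacity
    int items_[kMax];
    int count_ = 0;
};

// Reads a private member; this compiles only because of the friend declaration.
int depth_of(const SmallStack& s) { return s.count_; }
```

Any other non-member function that tried to read `count_` would be rejected by the compiler, so the friend declaration is a deliberate, named exception to information hiding.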
Editor's Notes
Programmers and computer science in general greatly improved on programming methodologies over time as they learned about good and bad programming habits, and what programming language constructs were needed to help support good habits. One major change in programming emphasis arose in the late 70s/early 80s as a response to programmers creating data structures as needed, rather than in a principled manner. The solution is known as abstract data types, to support the need of data abstraction.
Since you have already studied abstract data types in CSC 360, we will skip over the example illustrated in the textbook in section 11.2 and concentrate on how languages support them.
Early languages had no support for ADTs. Early FORTRAN had no structures other than arrays. COBOL allowed one to define a structure but had no mechanisms for encapsulation or information hiding. PL/I included all of the various data structures as part of the language so that, while you could declare a variable to be of a specific data structure type and access it through built-in processes, you could not define your own. Simula-67 was the first to offer the ability to define your own data structures, but this idea was not popularized until ALGOL-68. Its two primary successors, Pascal and C, popularized the notion of programmer defined structures, which has been provided in nearly every language since. However, neither of these languages has mechanisms for information hiding, and they only have weak support for encapsulation (encapsulation is not mandatory).
By restricting an ADT’s access to be via pointer, one can modify the ADT code and recompile that definition without having to recompile any user code. For instance, if I define an ADT in a file and compile it, and then you write a program to use my ADT, fine. Later, I modify the structure of my ADT (without modifying the interface) and recompile it and make the new object file available to you. You will not have to recompile your code if your code accesses the ADT through a pointer. However, if your code declares a variable of the ADT structure itself, then your code MUST be recompiled because I have changed the storage requirements for the ADT.
Simula 67 was the first language to provide a facility for user-defined classes. Objects were dynamically allocated, which was fairly unique at the time, and the class construct combined both data structure definition and the subprograms to operate on the variables, making Simula-67 the first language to offer encapsulation. However, Simula 67 did not offer a mechanism for information hiding, so the Simula 67 class fails as an ADT. ALGOL-68 would also permit the user to define data structures but had no encapsulation or information hiding.
The form of a class in Simula67 is
class class_name;
begin
variable declarations here
subprogram definitions of operators here
code section here
end class_name;
For most languages, ADTs are implemented by pointers and so assignment merely copies pointer values so that one pointer now points at the other ADT, rather than copying the data structure itself. In Ada, assignment means “copy the data structure into a new memory location”. Similarly, equality in most languages tests the two pointers to see if they point at the same memory location, but in Ada, equality tests to see if two data structures have the same data values. While assignment and equality may be less efficient in Ada than other languages, it provides more flexibility in that these operations are more meaningful.
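The contrast between the two meanings of assignment and equality can be shown in C++, which supports both pointers and value types. In this sketch the struct `Point` and the function `demo_copy_semantics` are hypothetical; the value case stands in for the Ada behavior described above and the pointer case for the behavior of pointer-based ADTs.

```cpp
#include <cassert>

struct Point {
    int x;
    int y;
};

// Value equality, in the spirit of Ada's "=" on records: compare the contents.
bool operator==(const Point& a, const Point& b) { return a.x == b.x && a.y == b.y; }

// Returns true when every observation described in the comments holds.
bool demo_copy_semantics() {
    Point a{1, 2};
    Point b = a;            // value assignment: b is a separate copy
    b.x = 10;               // does not affect a
    bool value_copy_is_independent = (a.x == 1);

    Point* p = &a;
    Point* q = p;           // pointer assignment: q is an alias for the same object
    q->x = 5;               // the change is visible through p and a
    bool pointer_copy_is_alias = (a.x == 5) && (p == q);

    Point c{5, 2};
    Point* r = &c;          // a and c hold equal values but are different objects
    bool equal_values_distinct_objects = (a == c) && (p != r);

    return value_copy_is_independent && pointer_copy_is_alias
        && equal_values_distinct_objects;
}
```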
NOTE: because we are using List_Type in our Stack_Type definition, List_Type must be defined first, and therefore it is defined prior to our actual Stack ADT, which is a record with a List and a Topsub. The List_Type does not have to be defined in either the private section or in this package as long as it is defined somewhere, but it makes the most sense to define it where it is because it should also be hidden.
By making the ADT definition above a pointer to a stack record, and defining the actual data structure in the body package, it makes our definition a little cleaner – that is, we are not defining the data structure in one place and the code elsewhere, we are defining the interface in one place and the structure and code in the body package. The specification package would look like this instead:
private
type Stack_Type;
type Stack_Ptr is access Stack_Type;
And then define Stack_Type itself (along with List_Type) in the body package.
While this is cleaner, there is a drawback to this approach in that a user program can declare a Stack_Ptr and manipulate it without having it point at a Stack_Type and therefore lead to run-time errors. In addition, equality and assignment are now of pointers and therefore do not copy the data structure or compare the data structure as is planned in Ada.
Notice that the actual ADT definition is omitted in the body package because it had already been defined in the specification package
The author expresses concern regarding the use of pointers to ADTs. In this Ada example, the data structure is a record (like a struct), and not a pointer to a record (struct). There are advantages and disadvantages to this approach. The primary advantages are that we do not have to perform pointer dereferencing every time we want to access the data structure and that we don’t have typical pointer concerns (aliases, dangling pointers, lost objects). However, we also have disadvantages – assignment requires copying items between two data structures, and equality means testing items between two data structures.
The main advantage of using a pointer to a data structure however is to get away from the need to recompile code as discussed on the first slide’s notes for this chapter. The Modula-2 approach where all data structures are pointed to is cleaner than the Ada approach where the programmer has a choice. The author really seems to like the Ada approach better.
Note that C-style structs (structs without member functions) do not support encapsulation at all; however, you can use structs and build your own encapsulation through the use of a header file.
We explore classes in chapter 12, so we won’t cover the above example in any detail here.
Note about Java and generics:
Prior to Java 1.5, generics were not available. You could, however, simulate them through polymorphism. Recall that if you had some class Parent and subclass Child, and a method called foo in Parent, then an object of either class could call foo, so foo acts as a generic method. Extending this concept, an ADT can store data of type Object: since all object types descend from Object, the ADT can implement methods that operate on the type Object but still permit you to store a specific type in the ADT (for instance, if the ADT is a stack, you could store Strings or Colors there). To store a primitive type, you would use the appropriate wrapper class (such as Integer to store int values). So you could make a generic ADT in Java through this technique. The only problem is that obtaining a specific item from the ADT requires casting the Object to its right type. You can see a brief example of this on pages 507-508, where the ADT is used to store Integers.
Java 1.5 cleans this up by permitting true generic objects for parameterized ADTs.
The idea of a namespace is that it allows units of a program (e.g., function names) to have the same name but be different sets of code – a namespace is a container (encapsulation) in which a given name can be recognized. Here, we look at how namespaces are specified in some of the more common languages.
C++ uses namespace to specify a namespace. To access elements of a namespace, you use the :: operator. The :: is known as the scope resolution operator. The :: notation is needed when two libraries have different definitions but share the same named item. An example might be having a namespace for MyStack and referencing a variable of that encapsulation using MyStack::topPtr.
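A short sketch of two namespaces that define the same names, with the scope resolution operator selecting between them. The namespace `OtherStack` and the members `max_size` and `describe` are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace MyStack {
    const int max_size = 100;
    std::string describe() { return "array-based stack"; }
}

namespace OtherStack {
    const int max_size = 10;   // same name as in MyStack, different definition
    std::string describe() { return "linked stack"; }
}

// Adds the two capacities, selecting each definition with ::.
int combined_capacity() {
    return MyStack::max_size + OtherStack::max_size;
}
```

Without the `MyStack::` or `OtherStack::` prefix, a reference to `max_size` here would not compile, because no unqualified `max_size` is in scope.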
Java uses packages, which combine one or more class definitions. Within a package, special access can be granted between classes (we cover this in chapter 12). We use import to import an entire package or a specific class from a package:
import java.io.*; imports all classes in the package
import javax.swing.JOptionPane; imports only the selected class
Since Java is OO, the use of the namespace is governed by interaction with objects. Therefore, Java does not permit imported classes to share names, but the names of different classes’ methods, variables and constants can be shared. You would address a shared item by referencing the object, as in aStack.topPtr or aStack.pop();
Ada also uses packages. In Ada, the with clause is used to include a package, and the use clause makes the names declared in that package directly visible, so that they need not be qualified with the package name. For example:
with Ada.Text_IO;
use Ada.Text_IO;
Without the use clause, an item is referenced by its full name, as in Ada.Text_IO.Put.
Ruby uses modules, which are collections of methods and constants. Unlike a class, though, a module is not a definition that can be instantiated or extended.