
ITCS332:
Organization of
Programming Languages

Chapter 6
Data Types
ISBN 0-321-33025-0

Chapter 6 Topics
• Introduction
• Primitive Data Types
• Character String Types
• User-Defined Ordinal Types
• Array Types
• Associative Arrays
• Record Types
• Union Types
• Pointer and Reference Types

.ITCS332 by Dr. Abdel Fattah Salman 6-2

Introduction
• A data type defines a collection of data objects and a set of predefined
operations on those objects
• Programs are easier to write and more reliable when the data types of a language match the objects of the real-world problem space, so it is crucial that a language support an appropriate variety of data types and structures.
• PL/I included many data types, with the intent of supporting a large range of
applications.
• ALGOL 68 took a better approach: provide a few basic types and a few flexible
structure-defining operators that allow a user to design a data structure for each
need.
• User-defined types improve readability through the use of meaningful names
for types.
• User-defined types aid modifiability: a user can change the type of a category
of variables in a program by changing only a type declaration statement.
• The fundamental idea of an abstract data type is that the use of a type is
separated from the representation and set of operations on values of that type, which are hidden from the user.
• The two most common structured (nonscalar) data types are arrays and records.


Introduction
• These data types are specified by type operators, or constructors. In C-based
languages, brackets, parentheses, and asterisks are the type operators used to
specify arrays, functions, and pointers, respectively.
• A descriptor is the collection of the attributes of a variable. In an
implementation, a descriptor is a collection of memory cells that store
variable attributes
• If all attributes are static, descriptors are needed only at compile time. They
are built by the compiler and stored in the symbol table.
• For dynamic attributes, part or all of the descriptor must be maintained during
execution.
• Descriptors are used for type checking and to build the code for allocation
and deallocation operations.
• The word object is often associated with the value of a variable and the space it occupies.
• An object represents an instance of a user-defined (abstract data) type.
• In object-oriented languages, every instance of every class, predefined or user-defined, is
called an object.
• One design issue for all data types: What operations are defined for variables of
the type, and how are they specified?

Primitive Data Types
• Primitive data types: Those not defined in terms of other
data types
• Almost all programming languages provide a set of
primitive data types
– Some primitive data types are merely reflections of the
hardware
– Others require little non-hardware support
– Primitive data types are used along with one or more type
constructors to build the structured types.
– Primitive data types include: integer, real, decimal, character,
boolean


Primitive Data Types: Integer
• Almost always an exact reflection of the hardware so the mapping
is trivial
• There may be as many as eight different (in size) integer types in
a language
• Java’s signed integer types: byte, short, int, and long (8, 16, 32, and 64 bits).
• C# and C++ include unsigned integer types.
• Signed integers are usually stored in two's complement representation.
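To see what two's complement storage means in practice, the C++ sketch below inspects the bit pattern of 8-bit signed values and negates a value by inverting its bits and adding one. It assumes a C++20 compiler (or any common two's complement platform), where converting between std::int8_t and std::uint8_t preserves the bits; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Raw bit pattern of an 8-bit signed value. Conversion to an unsigned type
// is defined modulo 2^8, which yields the two's complement encoding.
std::uint8_t bit_pattern(std::int8_t value) {
    return static_cast<std::uint8_t>(value);
}

// Two's complement negation: invert every bit, then add one.
// (-128 has no positive counterpart in 8 bits, so it maps to itself.)
std::int8_t negate_twos_complement(std::int8_t value) {
    std::uint8_t inverted = static_cast<std::uint8_t>(~bit_pattern(value));
    std::uint8_t result = static_cast<std::uint8_t>(inverted + 1u);
    return static_cast<std::int8_t>(result);  // well defined since C++20
}
```

For example, bit_pattern(-1) is 0xFF (all ones), and negating 5 this way yields -5, so the hardware can use one adder for both signed and unsigned integers.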


Primitive Data Types: Floating Point
• Floating-point types model real numbers, but only as approximations; even well-known
constants such as π and e cannot be represented exactly.
• Problems: approximate representation and loss of accuracy
through arithmetic operations
• Languages for scientific use support at least two floating-point
types (e.g., float (4 bytes) and double (8 bytes)),
sometimes more
• Usually exactly like the hardware, but not always
• IEEE Floating-Point Standard 754
• Precision is the accuracy of the
fractional part of a value, measured as the number of bits.
• Range is a combination of the
exponent and fraction ranges.
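The approximation problem is easy to observe. This C++ sketch, assuming IEEE 754 float and double as on common hardware, shows that 0.1 + 0.2 does not compare equal to 0.3 and reads the precision of each type from std::numeric_limits. The nearly_equal helper and its default tolerance are illustrative choices, not a standard function.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// 0.1, 0.2 and 0.3 have no exact binary representation, so the rounded
// sum 0.1 + 0.2 is not the same double as the rounded constant 0.3.
bool sum_is_exact() {
    return 0.1 + 0.2 == 0.3;
}

// Illustrative alternative to ==: equality within a relative tolerance.
bool nearly_equal(double a, double b, double rel_tol = 1e-9) {
    return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}

// Bits of significand precision (including the implicit leading bit):
// 24 for IEEE 754 single precision, 53 for double precision.
constexpr int float_precision_bits = std::numeric_limits<float>::digits;
constexpr int double_precision_bits = std::numeric_limits<double>::digits;
```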


Primitive Data Types: Decimal
• Computers designed for business applications have hardware
support for decimal data types.
• Decimal types store a fixed number of decimal digits, with the
decimal point at a fixed position in the value.
• For business applications (money)
– Essential to COBOL
– C# offers a decimal data type
• Advantage: accuracy
• Disadvantages: limited range, wastes memory.
• Decimal values are stored in binary-coded decimal (BCD): unpacked BCD uses
one digit per byte; packed BCD stores two digits per byte.
• Operations on decimal values are done in hardware or by
simulation.
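A minimal sketch of packed BCD in C++, assuming the input is a string containing only decimal digits: each byte holds two digits, one in each 4-bit half. The function names are hypothetical, and real decimal formats usually also store a sign, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packed BCD: two decimal digits per byte, first digit in the high 4 bits.
// Assumes `digits` contains only '0'..'9'; an odd count gets a leading 0.
std::vector<std::uint8_t> to_packed_bcd(const std::string& digits) {
    const std::string padded = (digits.size() % 2 == 1) ? "0" + digits : digits;
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < padded.size(); i += 2) {
        const int high = padded[i] - '0';
        const int low = padded[i + 1] - '0';
        bytes.push_back(static_cast<std::uint8_t>((high << 4) | low));
    }
    return bytes;
}

// Decode packed BCD back into a digit string (any padding digit is kept).
std::string from_packed_bcd(const std::vector<std::uint8_t>& bytes) {
    std::string digits;
    for (const std::uint8_t b : bytes) {
        digits.push_back(static_cast<char>('0' + (b >> 4)));
        digits.push_back(static_cast<char>('0' + (b & 0x0F)));
    }
    return digits;
}
```

Encoding "1234" gives the two bytes 0x12 and 0x34, half the four bytes that unpacked BCD would need; the "wastes memory" disadvantage comes from using 4 bits for a digit that needs only about 3.3 bits of information.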


Primitive Data Types: Boolean
• Simplest of all
• Range of values: two elements, one for “true” and one
for “false”.
• Boolean types are used to represent switches or flags in
programs
• Could be implemented as a single bit, but is often implemented as a byte,
the smallest efficiently addressable memory cell
• Advantage: readability


Primitive Data Types: Character
• Stored as numeric codings
• Most commonly used coding: ASCII
• An alternative, 16-bit coding: Unicode
– Globalization of business
– Computers need to communicate with other computers
– Includes characters from most natural languages
– The first 128 characters of Unicode are identical to those of
ASCII
– Java was the first widely used language to use Unicode
– C# and JavaScript also support Unicode


Character String Types
• Values are sequences of characters
• Design issues:
– Is it a primitive type or just a special kind of array?
– Should the length of strings be static or dynamic?
• C and C++ represent strings as arrays of char and provide string operations as
library functions declared in string.h. Strings are terminated by a null character (ASCIIZ).
• Problem: the library functions that move string data (e.g., strcpy) do not guard against
overflowing the destination. C++ programmers should use the string class from the standard
library rather than char arrays.
• In Java, strings are supported as a primitive type by the String class (constant strings)
and the StringBuffer class (changeable strings, more like arrays of single characters).
C# provides similar classes (string and StringBuilder).
• Typical operations on strings:
– Assignment, comparison (=, >, etc.), and copying, which are complicated if the
operands have different lengths
– Catenation
– Substring reference: a reference to a substring of a given string
– Perl, JavaScript, and PHP include built-in pattern-matching operations
based on regular expressions.
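In C++, the standard string class manages its own storage, which avoids the overflow problem of the char-array functions. The sketch below shows assignment, catenation, substring reference, and comparison with std::string; the helper names are illustrative. Note that substr returns a copy, whereas std::string_view can refer to part of a string without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assignment and catenation: std::string grows its storage as needed,
// so it cannot overflow a destination the way strcpy/strcat can.
std::string greet(const std::string& name) {
    std::string message = "Hello, ";  // assignment
    message += name;                  // catenation
    return message;
}

// Substring reference: substr(pos, count) returns count characters
// starting at position pos (0-based). It returns a copy.
std::string middle(const std::string& s, std::size_t pos, std::size_t count) {
    return s.substr(pos, count);
}

// Comparison is lexicographic, character by character.
bool comes_before(const std::string& a, const std::string& b) {
    return a < b;
}
```

For example, greet("Ada") yields "Hello, Ada", and middle("programming", 3, 4) yields "gram".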


Character String Types
• The pattern expression /[A-Za-z][A-Za-z\d]+/
matches typical names in programming languages.
• Brackets enclose character classes; \d stands for any decimal digit.
• The first class specifies all letters; the second specifies all letters and
digits.
• The plus specifies that there must be one or more of what is in the
category.
• So, the whole pattern matches strings that begin with a letter followed
by one or more letters or digits.
• The pattern expression /\d+\.?\d*|\.\d+/ matches numeric
literals.
• The \. specifies a literal decimal point (an unescaped . would match any character);
the ? quantifies what it follows to have zero or one appearance; the | separates two
alternatives in the whole pattern:
• The first alternative matches strings of one or more digits, possibly
followed by a decimal point, followed by zero or more digits;
• The second alternative matches strings that begin with a decimal point
followed by one or more digits.
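Both patterns can be tried with C++'s std::regex, which uses ECMAScript regular-expression syntax by default. This sketch assumes whole-string matching (std::regex_match), whereas Perl's matching operator finds the pattern anywhere in a string unless it is anchored. The class [A-Za-z0-9] is written out in place of the equivalent [A-Za-z\d]; the function names are illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Names: a letter followed by one or more letters or digits.
bool is_name(const std::string& s) {
    static const std::regex name_pattern("[A-Za-z][A-Za-z0-9]+");
    return std::regex_match(s, name_pattern);
}

// Numeric literals. In a raw string literal, \d and \. reach the
// regex engine unchanged.
bool is_numeric_literal(const std::string& s) {
    static const std::regex number_pattern(R"(\d+\.?\d*|\.\d+)");
    return std::regex_match(s, number_pattern);
}
```

With these definitions, "3.14", "5." and ".5" are numeric literals but "." is not, and is_name("x") is false because the + requires at least one more character after the first letter.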

Character String Type in Certain Languages

• C and C++
– Not primitive
– Use char arrays and a library of functions that provide
operations
• SNOBOL4 (a string manipulation language)
– Primitive
– Many operations, including elaborate pattern matching
• Java
– Primitive via the String class


Character String Length Options
There are several design options regarding the string length:
• Static length strings: the length is set when the string is created, as in COBOL and Java’s
String class
• Limited dynamic length strings: strings of varying length up to a fixed maximum set by the
variable's definition, as in C and C++
– In C-based languages, a special character is used to indicate the end of a string’s
characters, rather than maintaining the length
• Dynamic length strings: strings of varying length with no maximum, as in SNOBOL4,
Perl, and JavaScript
• Ada supports all three string length options:
– Type String from the Standard package (static length)
– Type Bounded_String from the Ada.Strings.Bounded package (limited dynamic length)
– Type Unbounded_String from the Ada.Strings.Unbounded package (dynamic length)
• Character String Type Evaluation
– Aid to writability
– Dealing with strings as arrays can be more cumbersome than dealing with a primitive
string type. A primitive string type with static length is inexpensive to provide, so
why not have one? (Providing strings through a standard library is nearly as convenient
as having primitive strings.)
– Dynamic length is nice and flexible, but is it worth the expense?

Character String Implementation
• String types could be supported in hardware but in most cases
software is used to implement string storage, retrieval, and
manipulation.
• Static length: compile-time descriptor with three fields:
– Name of the type
– Type’s length in characters
– Address of the first character
• Limited dynamic length: may need a run-time descriptor to store
both the maximum and the current lengths (C and C++ do not
require one, because the end of a string is marked with the
null character).
• Dynamic length: needs a run-time descriptor that stores only the
current length
• Compile-time descriptors are kept in the symbol table; run-time descriptors
must be kept in memory during execution.
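To make the run-time descriptor for limited dynamic length strings concrete, here is a hypothetical C++ class (LimitedString is an illustrative name, not a library type) whose data members play the role of the descriptor fields: the maximum length, the current length, and the storage. Every assignment is checked against the recorded maximum, which is the run-time work the descriptor exists to support.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical run-time descriptor for a limited dynamic length string.
class LimitedString {
public:
    explicit LimitedString(std::size_t max_length)
        : max_length_(max_length), current_length_(0), chars_(max_length) {}

    // Assignment is checked against the maximum recorded in the descriptor.
    void assign(const std::string& value) {
        if (value.size() > max_length_) {
            throw std::length_error("value exceeds maximum length");
        }
        for (std::size_t i = 0; i < value.size(); ++i) {
            chars_[i] = value[i];
        }
        current_length_ = value.size();
    }

    std::size_t max_length() const { return max_length_; }
    std::size_t length() const { return current_length_; }
    std::string str() const { return std::string(chars_.data(), current_length_); }

private:
    std::size_t max_length_;      // fixed when the variable is created
    std::size_t current_length_;  // changes with every assignment
    std::vector<char> chars_;     // storage for max_length_ characters
};
```

A C string needs no such descriptor because the null terminator replaces the current-length field, but then nothing records the maximum, which is exactly why strcpy cannot detect overflow.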

Compile- and Run-Time Descriptors
• Allocation and deallocation are the biggest implementation problem for dynamic length
strings: the storage must grow and shrink as needed. There are two approaches:
– Store the string in a linked list: allocation and deallocation are simple, but the
links occupy extra storage and string operations become more complex.
– Store the complete string in adjacent memory cells: this requires less storage and
gives faster string operations, but allocation and deallocation are slower.

Figure: Compile-time descriptor for static strings; run-time descriptor for limited dynamic strings

User-Defined Ordinal Types
• An ordinal type is one in which the range of possible
values can be easily associated with the set of positive
integers
• Examples of primitive ordinal types in Java
– integer
– char
– boolean


Enumeration Types
• An enumeration type is one in which all possible values, which
are named constants, are provided in the definition.
• Enumeration types provide a way of defining and grouping
collections of named constants.
• Enumeration constants are implicitly assigned integers: 0,1, …
and can be explicitly assigned any integer in the definition.
• An example in C# :
enum days {mon, tue, wed, thu, fri, sat, sun};
• Design issues: All are related to type checking
– Is an enumeration constant allowed to appear in more than one
type definition, and if so, how is the type of an occurrence of
that constant checked?
– Are enumeration values coerced to integer?
– Is any other type coerced to an enumeration type?

Enumeration Types
• In languages that do not have enumeration types, programmers simulate them
with integer values, as in FORTRAN 77, where 0 might represent red, 1
blue, and so on:
integer red, blue
data red, blue /0, 1/
• The problem with this approach is that because we have not defined a
type for our colors, there is no type checking when they are used.
• C++ provides enumeration types:
enum colors {red, blue, green, yellow, black};
colors myColor = blue, yourColor = red;
• The enumeration values are coerced to int when they are put in an
integer context. For example, if the current value of myColor is blue
(internal value 1), then myColor + 1 evaluates to 2, the internal value of
green; assigning that result back to myColor requires an explicit cast.
• C++ enumeration constants can appear in only one enumeration type
in the same referencing environment.
• C# enumeration types are like those of C++, except that they are never coerced to
integers.
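The difference between coerced and non-coerced enumerations can be seen within C++ itself: the original (unscoped) enum converts to int in arithmetic, while a C++11 enum class does not, which is closer to the C# behavior described above. A minimal sketch; Shade and the helper functions are hypothetical names.

```cpp
#include <cassert>

// Unscoped (C-style) enumeration: values are coerced to int
// in integer contexts.
enum colors { red, blue, green, yellow, black };

int next_color_code(colors c) {
    return c + 1;  // c is implicitly converted to int
}

colors next_color(colors c) {
    // Converting an int back to the enumeration requires an explicit cast.
    return static_cast<colors>(c + 1);
}

// Scoped enumeration (C++11): no implicit conversion to int, so
// "s + 1" would not compile; any conversion must be explicit.
enum class Shade { light, medium, dark };

int shade_code(Shade s) {
    return static_cast<int>(s);
}
```

Here next_color_code(blue) is 2 and next_color(blue) is green; the cast in next_color is where C++ moves responsibility for staying in range onto the programmer.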

Evaluation of Enumerated Type
• Aid to readability: Named values are easily recognized, whereas coded
values are not - e.g., no need to code a color as a number
• Aid to reliability, e.g., compiler can check:
–No arithmetic operations are legal on enumeration values (e.g., colors
cannot be added)
–No enumeration variable can be assigned a value outside its defined
range.
–In C++: numeric values can be assigned to enumeration type
variables only if they are cast to the type of the assigned variable.
Numeric values assigned to enumeration type variables can be
checked to determine whether they are in the range of the internal
values of the enumeration type.
–Ada, C#, and Java 5.0 provide better support for enumeration than
C++ because enumeration type variables in these languages are not
coerced into integer types.

Subrange Types
• A subrange type is an ordered contiguous subsequence of an
ordinal type. For example, 12..18 is a subrange of the integer type
• Ada’s design:
– Subranges are part of subtypes. Subtypes are not new types,
but only new names for restricted versions of existing types.
type Days is (mon, tue, wed, thu, fri, sat, sun);
subtype Weekdays is Days range mon..fri;
subtype Index is Integer range 1..100;
– The restriction on the existing type is in the range of possible
values. All operations defined for parent type are also defined
for the subtype.
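Day1 : Days;
Day2 : Weekdays;
Day2 := Day1;
– The assignment is legal unless the value of Day1 is sat or sun, which
are outside the range of Weekdays.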


Subrange Types
– The compiler must generate range-checking code for every
assignment to a subrange variable.
– Types are checked for compatibility at compile time and range
checking is done at run time.
– Common uses of user-defined ordinal types: array indexes and
loop variables.
– Subrange types are different from Ada’s derived types:

type derived_small_int is new Integer range 1..100;
subtype subrange_small_int is Integer range 1..100;

– Variables of both types inherit the value range and operations of Integer.
– Variables of derived_small_int are not compatible with any
integer type.
– Variables of type subrange_small_int are compatible with
variables and constants of integer type and any subtype of integer.

Subrange Evaluation
• Aid to readability: makes it clear to readers that variables of a
subrange type can store only a certain range of values
• Reliability: Assigning a value to a subrange variable that is
outside the specified range is detected as an error by the compiler
or by the run-time system.

Implementation of User-Defined Ordinal Types
• Enumeration types are implemented as integers
• Subrange types are implemented like the parent types with
code inserted (by the compiler) to restrict assignments to
subrange variables.
– This step increases code size and execution time but is usually
considered well worth the cost.
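A subrange is implemented as its parent type plus range checks inserted on every assignment. The C++ template below imitates that under stated assumptions: Subrange is a hypothetical class, the parent type is fixed as int, and a failed check throws std::out_of_range, standing in for Ada's Constraint_Error.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch of "subtype Index is Integer range 1..100":
// stored like the parent type (int), checked on every assignment.
template <int Low, int High>
class Subrange {
    static_assert(Low <= High, "empty subrange");

public:
    Subrange(int value) : value_(check(value)) {}

    // The inserted range-checking code runs on each assignment.
    Subrange& operator=(int value) {
        value_ = check(value);
        return *this;
    }

    // All operations of the parent type remain available through int.
    operator int() const { return value_; }

private:
    static int check(int value) {
        if (value < Low || value > High) {
            throw std::out_of_range("value outside subrange");
        }
        return value;
    }

    int value_;
};

using Index = Subrange<1, 100>;
```

With Index i = 5, the statement i = i + 10 computes in int and then checks the result, while i = 101 fails the check and leaves i unchanged; this extra comparison on each assignment is the code-size and time cost mentioned above.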


Array Types
• An array is an aggregate of homogeneous data elements in which an
individual element is identified by its position in the aggregate,
relative to the first element.
• A reference to an array element includes one or more subscripts, which
may require a run-time calculation to determine the memory location being
referenced.
• Array Design Issues:
– What types are legal for subscripts?
– Are subscripting expressions in element references range
checked?
– When are subscript ranges bound?
– When does allocation take place?
– What is the maximum number of subscripts?
– Can array objects be initialized?
– Are any kind of slices allowed?

Array Indexing
• Specific elements are determined by means of a two-level mechanism:
the first part is the aggregate name, the second part is a dynamic
selector consisting of one or more items known as subscripts or indexes.
• If all subscripts in a reference are constants – the selector is static,
otherwise it is dynamic.
• Indexing (or subscripting) is a mapping from indices to elements
array_name(index_value_list)→ an element
• Index Syntax
– FORTRAN, PL/I, Ada use parentheses
• Ada explicitly uses parentheses to show uniformity between
array references and function calls because both are mappings
– Most other languages use brackets
– In languages that provide multidimensional arrays as arrays of arrays, each
subscript appears in its own pair of brackets.

Arrays Index (Subscript) Types
• Using parentheses to enclose both subscripts and subprogram parameters
creates a problem; context information is used to solve it.
• Array element references map subscripts to specific array element
• Function calls map parameters to functional values.
• List(59) may be a reference to array element or a call to a
function named list.
• FORTRAN, C: integer only
• Pascal: any ordinal type (integer, Boolean, char, enumeration)
• Ada: integer or enumeration (includes Boolean and char)
• Java: integer types only
• C, C++, Perl, and Fortran do not specify range checking
• Java, ML, C# specify range checking

Subscript Binding and Array Categories
• There are five categories of arrays, defined by the binding of
subscript ranges and the binding to storage
• The category names indicate where and when storage is allocated.
• A Static array is one in which the subscript ranges are statically
bound and storage allocation is static (before run-time)
– Advantage: efficiency (no dynamic allocation)
• A fixed stack-dynamic array is one in which subscript ranges are
statically bound, but the allocation is done at declaration
elaboration time during execution.
– Advantage: space efficiency; a large array in one subprogram
can use the same space as a large array in another subprogram, as long
as the two subprograms are not active at the same time.


Subscript Binding and Array Categories (continued)
• A Stack-dynamic array is one in which subscript ranges are
dynamically bound and the storage allocation is dynamic (done at
run-time). Once the subscript ranges are bound and storage is
allocated, they remain fixed during the lifetime of the var.
– Advantage: flexibility (the size of an array need not be known
until the array is to be used).
• A fixed heap-dynamic array is similar to a fixed stack-dynamic array:
the storage binding is dynamic but fixed after allocation (i.e., binding
is done when the program requests it, and storage is allocated from the
heap, not the stack).
• A Heap-dynamic array is one in which binding of subscript
ranges and storage allocation is dynamic and can change any
number of times during the array's lifetime.
– Advantage: flexibility (arrays can grow or shrink during
program execution).

Subscript Binding and Array Categories (continued)
• C and C++ arrays that include the static modifier are static
• C and C++ arrays declared in functions without the static modifier are fixed stack-dynamic
• C and C++ provide fixed heap-dynamic arrays (C uses the library
functions malloc and free; C++ uses the operators new and delete).
• Ada arrays can be stack-dynamic
• In Java, all arrays are fixed heap-dynamic arrays.
• C# provides fixed heap-dynamic arrays and also includes a second
array class, ArrayList, that provides heap-dynamic arrays: objects of this
class are created without any elements, and elements are added with the
Add method.
• Perl and JavaScript support heap-dynamic arrays, which can
grow and shrink. In Perl, an array of five elements is created with
@list = (1, 2, 3, 5, 7);
• It can be lengthened with the push function, as in push(@list, 11, 19);
• It can be emptied with @list = ();
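C++'s std::vector is a heap-dynamic array in this sense: its elements live on the heap, and it can grow and shrink any number of times during execution, much like C#'s ArrayList or a Perl array. This sketch mirrors the Perl statements above; the function names are illustrative.

```cpp
#include <cassert>
#include <vector>

// @list = (1, 2, 3, 5, 7); then push(@list, 11, 19);
std::vector<int> build_list() {
    std::vector<int> list = {1, 2, 3, 5, 7};
    list.push_back(11);  // the vector reallocates on the heap if needed
    list.push_back(19);
    return list;
}

// @list = ();
void empty_list(std::vector<int>& list) {
    list.clear();
}
```

After build_list() the vector holds seven elements ending in 19; after empty_list it is empty, yet it is still the same array object.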


Array Initialization
• Some languages allow initialization at the time of storage allocation
– C, C++, Java, C# example: int list [] = {4, 5, 7, 83};
– Character strings in C and C++: char name [] = "freddie";
– Arrays of strings in C and C++:
const char *names [] = {"Bob", "Jake", "Joe"};
– Java initialization of String objects:
String[] names = {"Bob", "Jake", "Joe"};
– Ada:

List : array (1..5) of Integer := (1, 3, 5, 7, 9); -- initializes all elements
Bunch : array (1..5) of Integer := (1 => 17, 3 => 35, others => 0);

– In Bunch, the first and third elements are initialized using direct assignment, and the
others clause initializes the remaining elements.


Arrays Operations
• An array operation is one that operates on an array as a unit.
• Ada allows array assignment and also catenation (&).
Catenation is defined between two single-dimensioned arrays and
between a single-dimensioned array and a scalar.
• Fortran 95 provides elemental array operations, so called because they are between
pairs of array elements
– For example, the + operator between two arrays results in an array
of the sums of the element pairs of the two arrays
– Library functions for matrix multiplication, transpose, dot
product,…
• APL provides the most powerful array-processing operations for
vectors and matrices, as well as unary operators (for example, to
reverse column elements). See examples of APL array operations
on page 272.
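C++ has no built-in elemental operators for ordinary arrays, but the standard std::valarray class template overloads arithmetic operators element by element, which gives a rough counterpart of Fortran's elemental +. A minimal sketch; the operands are assumed to have equal sizes (mixing sizes is undefined behavior for valarray), and the function names are illustrative.

```cpp
#include <cassert>
#include <valarray>

// Fortran-style elemental +: the result holds the sums of the element pairs.
std::valarray<int> add_elementwise(const std::valarray<int>& a,
                                   const std::valarray<int>& b) {
    std::valarray<int> sums = a + b;
    return sums;
}

// Dot product: elemental *, then sum of the products.
int dot_product(const std::valarray<int>& a, const std::valarray<int>& b) {
    std::valarray<int> products = a * b;
    return products.sum();
}
```

For a = {1, 2, 3} and b = {10, 20, 30}, add_elementwise gives {11, 22, 33} and dot_product gives 140, with no explicit loop in the calling code.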


Rectangular and Jagged Arrays
• A rectangular array is a multi-dimensioned array in
which all of the rows have the same number of elements
and all columns have the same number of elements. All
subscripts are placed in a single pair of brackets.
• A jagged array has rows (or columns) with varying numbers
of elements. A separate pair of brackets is used for each
dimension, as in a[6][5].
– Possible when multidimensioned arrays actually appear as
arrays of arrays
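As a sketch of an array of arrays, a C++ std::vector whose elements are std::vector rows can hold rows of different lengths, and each subscript goes in its own brackets. make_triangle is an illustrative name; row i is given i + 1 elements, each holding the value i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A jagged array: each row is a separate vector, so rows may differ in
// length, and an element is referenced as jagged[i][j].
std::vector<std::vector<int>> make_triangle(std::size_t rows) {
    std::vector<std::vector<int>> jagged(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        jagged[i].assign(i + 1, static_cast<int>(i));  // row i has i + 1 elements
    }
    return jagged;
}
```

Because each row is allocated separately, the rows are not guaranteed to be contiguous in memory, unlike a rectangular array stored as one block.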


Slices
• A slice is some substructure of an array; nothing more
than a referencing mechanism
• Slices are only useful in languages that have array
operations
• Slice Examples:
– Fortran 95
Integer, Dimension (10) :: Vector
Integer, Dimension (3, 3) :: Mat
Integer, Dimension (3, 3, 4) :: Cube

Vector (3:6) is a four-element array
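std::valarray also supports slices: std::slice(start, size, stride) refers to size elements beginning at the 0-based index start, stepping by stride. As a sketch, the hypothetical helper fortran_section converts Fortran-style 1-based, inclusive bounds, so the counterpart of Vector(3:6) is std::slice(2, 4, 1); column_of shows a non-unit stride, assuming a matrix stored by rows.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Elements first..last (1-based, inclusive), like Fortran's v(first:last).
std::valarray<int> fortran_section(const std::valarray<int>& v,
                                   std::size_t first, std::size_t last) {
    return v[std::slice(first - 1, last - first + 1, 1)];
}

// Column j (0-based) of a rows x cols matrix stored row by row:
// elements j, j + cols, j + 2 * cols, ...
std::valarray<int> column_of(const std::valarray<int>& m, std::size_t rows,
                             std::size_t cols, std::size_t j) {
    return m[std::slice(j, rows, cols)];
}
```

Because valarray also has elemental operators, a slice can take part in whole-array arithmetic, which is why slices matter most in languages with array operations.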


Figure: Slice examples in Fortran 95


Implementation of Arrays
• Implementing arrays requires more compile time effort than does
implementing simple types (int).
• The code to access an array element must be generated at compile
time.
• Access function maps subscript expressions to an address in the
array
• Access function for single-dimensioned arrays:
address(list[k])= address(list[lower_bound]) +
((k-lower_bound)* element_size)

• The compile-time descriptor for single-dimensioned arrays
includes information needed to construct access function.
• If all attributes are static and index range checking is not done at
run time, no descriptor is needed.
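The single-dimensioned access function translates directly into code. This C++ sketch treats addresses as integers (std::uintptr_t) so that the formula can be compared with the addresses of a real array; the function and parameter names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Address = std::uintptr_t;

// address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
Address element_address(Address first_element, std::ptrdiff_t lower_bound,
                        std::ptrdiff_t k, std::ptrdiff_t element_size) {
    return first_element + static_cast<Address>((k - lower_bound) * element_size);
}
```

With a C++ array the lower bound is 0; with an Ada-style lower bound of 1, the same element list[7] is reached as k = 8, which is the only difference the lower bound makes.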

Accessing Multi-dimensioned Arrays

• Multidimensional arrays are more complex to implement than single-
dimensioned arrays .
• Memory is linear – a simple sequence of bytes.
• Values of data types that have 2 or more dimensions must be mapped
onto the single-dimensioned memory.
• Two common ways to store a multidimensional array :
– Row major order (by rows) – used in most languages
– Column major order (by columns) – used in Fortran
• Sequential access to matrix elements will be faster if they are accessed
in the order in which they are stored – minimizing paging.
• The access function for a multidimensional array is the mapping of its
base address and a set of index values to the address in memory of the
element specified by the index values.
• The access function for a 2-dimensional array stored in row-major order
is shown below:

Row / Column major ordering

11 12 13 14 15
21 22 23 24 25
31 32 33 34 35

Row major order (second subscript increases faster)
11 12 13 14 15 21 22 23 24 25 31 32 33 34 35

Column major order (first subscript increases faster)
11 21 31 12 22 32 13 23 33 14 24 34 15 25 35
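The two orders differ only in which subscript contributes the larger stride. The hypothetical helpers below compute the 0-based position of element a[i, j] (1-based subscripts, as in the 3 x 5 example above) in each storage sequence.

```cpp
#include <cassert>
#include <cstddef>

// Row major: (i - 1) complete rows of `cols` elements precede row i.
std::size_t row_major_position(std::size_t i, std::size_t j, std::size_t cols) {
    return (i - 1) * cols + (j - 1);
}

// Column major: (j - 1) complete columns of `rows` elements precede column j.
std::size_t column_major_position(std::size_t i, std::size_t j, std::size_t rows) {
    return (j - 1) * rows + (i - 1);
}
```

For element 34 (i = 3, j = 4), row_major_position gives 13 and column_major_position gives 11, matching its place in the two sequences listed above.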


Locating an Element in a Multi-dimensioned Array
• The address of an element is the base address of the array plus the
element size times the number of elements preceding it in the array.
With n elements per row and element size el_size:

$$
\begin{aligned}
\mathrm{Loc}(a[i,j]) &= \mathrm{addr}(a[1,1]) + (\text{number of elements preceding } a[i,j]) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) + \big((\text{rows above row } i) \cdot n + (\text{elements left of column } j)\big) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) + \big((i-1)\,n + (j-1)\big) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) + (i n - n + j - 1) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) + \big((i n + j) - (n + 1)\big) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) + (i n + j) \cdot \mathit{el\_size} - (n + 1) \cdot \mathit{el\_size} \\
&= \mathrm{addr}(a[1,1]) - (n + 1) \cdot \mathit{el\_size} + (i n + j) \cdot \mathit{el\_size}
\end{aligned}
$$

• General format, for lower bounds row_lb and col_lb and n elements per row:
Location(a[i,j]) = address of a[row_lb, col_lb]
- (((row_lb * n) + col_lb) * element_size)
+ (((i * n) + j) * element_size)

• The first two terms are the constant part, which can be computed
once when the array's bounds are known; the last term is the variable part.
• For each dimension of an array, one add
and one multiply instruction are required
for the access function.
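The split into a constant part and a variable part can be sketched as a small C++ descriptor: make_access computes the constant part once, and element_address adds the variable part for each reference. The names are hypothetical, addresses are treated as integers (std::uintptr_t), and the general format above is assumed for arbitrary lower bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Address = std::uintptr_t;

// Hypothetical descriptor holding what the access function needs.
struct RowMajorAccess {
    Address constant_part;        // address of a[row_lb, col_lb] - ((row_lb * n) + col_lb) * element_size
    std::ptrdiff_t n;             // number of elements per row
    std::ptrdiff_t element_size;  // bytes per element
};

// Computed once, when the bounds are known.
RowMajorAccess make_access(Address first_element, std::ptrdiff_t row_lb,
                           std::ptrdiff_t col_lb, std::ptrdiff_t n,
                           std::ptrdiff_t element_size) {
    Address offset = static_cast<Address>((row_lb * n + col_lb) * element_size);
    return RowMajorAccess{first_element - offset, n, element_size};
}

// Computed for every reference: constant part + ((i * n) + j) * element_size.
// Unsigned wraparound keeps the final sum exact even if the constant part
// lies "before" the array.
Address element_address(const RowMajorAccess& acc, std::ptrdiff_t i, std::ptrdiff_t j) {
    return acc.constant_part + static_cast<Address>((i * acc.n + j) * acc.element_size);
}
```

With lower bounds of 1 and n = 5, element (3, 4) of a 3 x 5 int array lands at the same address as the C++ element a[2][3]: one multiply and one add per subscript, plus the precomputed constant.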

Compile-Time Descriptors

Figure: Compile-time descriptors for a single-dimensioned array and a multidimensional array

Associative Arrays
• An associative array is an unordered collection of data elements
that are indexed by an equal number of values called keys
– User-defined keys must be stored in the structure
– In nonassociative arrays, the indices never need to be stored
– Each element of an associative array is a pair of entities: a
key and a value.
• Design issues: What is the form of references to elements?
• In Perl, associative arrays are called hashes, because their elements are
stored and retrieved with hash functions. Every hash variable name must
begin with %, and every scalar variable name begins with $. To reference
an element, the key is placed in braces and the hash name's % is replaced
by $, because the element itself is a scalar.


Associative Arrays in Perl
• Names begin with %; literals are delimited by parentheses
%hi_temps = ("Mon" => 77, "Tue" => 79, "Wed" => 65, …);
• Subscripting is done using braces and keys
$hi_temps{"Wed"} = 83;
– Elements can be removed with delete
delete $hi_temps{"Tue"};
– The entire hash can be emptied by assigning an empty literal to
it: %hi_temps = ();


Perl’s Associative Arrays
• Perl has a primitive datatype for hash tables aka “associative arrays”.
• Elements indexed not by consecutive integers but by arbitrary keys
• %ages refers to an associative array and @people to a regular array
• Note the use of { } for associative arrays and [ ] for regular arrays

%ages = ("Bill Clinton" => 53, "Hillary" => 51,
         "Socks" => "27 in cat years");
$ages{"Hillary"} = 52;
@people = ("Bill Clinton", "Hillary", "Socks");
$ages{"Bill Clinton"};   # Returns 53
$people[1];              # Returns "Hillary"
• keys(X), values(X) and each(X)
foreach $person (keys(%ages)) { print "I know the age of $person\n"; }
foreach $age (values(%ages)) { print "Somebody is $age\n"; }
while (($person, $age) = each(%ages)) { print "$person is $age\n"; }
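For comparison, C++'s std::unordered_map is a hash-table-based associative array. This sketch repeats the %hi_temps operations; TempTable, make_hi_temps, and total are illustrative names. As with a Perl hash, the order in which elements are visited is unspecified.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using TempTable = std::unordered_map<std::string, int>;

TempTable make_hi_temps() {
    TempTable hi_temps = {{"Mon", 77}, {"Tue", 79}, {"Wed", 65}};  // %hi_temps = (...);
    hi_temps["Wed"] = 83;    // $hi_temps{"Wed"} = 83;
    hi_temps.erase("Tue");   // delete $hi_temps{"Tue"};
    return hi_temps;
}

// Sum of all values, like iterating over values(%hi_temps) in Perl.
int total(const TempTable& table) {
    int sum = 0;
    for (const auto& entry : table) {
        sum += entry.second;  // entry.first is the key
    }
    return sum;
}
```

After make_hi_temps() the table holds Mon => 77 and Wed => 83, so total returns 160; calling clear() on the map corresponds to %hi_temps = ().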

Record Types
• A record is a possibly heterogeneous aggregate of
data elements in which the individual elements are
identified by names
• Design issues:
– What is the syntactic form of references to the field?
– Are elliptical references allowed?


Definition of Records
• COBOL uses level numbers to show nested records; other languages use recursive
definition
• Record Field References
– COBOL
field_name OF record_name_1 OF ... OF record_name_n
– Others (dot notation)
record_name_1.record_name_2. ... .record_name_n.field_name
• A COBOL record definition with nested records:
01 EMP-REC.
   02 EMP-NAME.
      05 FIRST PIC X(20).
      05 MID PIC X(10).
      05 LAST PIC X(20).
   02 HOURLY-RATE PIC 99V99.

Definition of Records in Ada
• Record structures are indicated in an orthogonal way
type Emp_Rec_Type is record
   First : String (1..20);
   Mid : String (1..10);
   Last : String (1..20);
   Hourly_Rate : Float;
end record;
Emp_Rec : Emp_Rec_Type;


References to Records
• Most languages use dot notation, e.g., Emp_Rec.Hourly_Rate
• Fully qualified references must include all record names
• Elliptical references allow leaving out record names as long as the
reference is unambiguous; for example, in COBOL
FIRST, FIRST OF EMP-NAME, and FIRST OF EMP-REC
are elliptical references to the employee’s first name.

Operations on Records
• Assignment is very common if the types are identical
• Ada allows record comparison
• Ada records can be initialized with aggregate literals
• COBOL provides MOVE CORRESPONDING
– Copies each field of the source record to the field of the target
record that has the same name

Evaluation and Comparison to Arrays
• Straightforward and safe design
• Records are used when collection of data values is
heterogeneous
• Access to array elements is much slower than access to
record fields, because subscripts are dynamic (field
names are static)
• Dynamic subscripts could be used with record field
access, but it would disallow type checking and it would
be much slower


Implementation of Record Type

• An offset address, relative to the beginning of the record,
is associated with each field
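Field offsets are computed by the compiler and can be inspected in C++ with the standard offsetof macro. This sketch renders the Ada Emp_Rec_Type as a C++ struct, an assumption for illustration (Ada String(1..20) becomes char[20]); the offset of hourly_rate may include alignment padding, so its exact value is platform-dependent.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C++ counterpart of Emp_Rec_Type.
struct EmpRec {
    char first[20];
    char mid[10];
    char last[20];
    float hourly_rate;
};

// Each field's offset is a compile-time constant, so a field reference
// is just the record's address plus a constant.
constexpr std::size_t kFirstOffset = offsetof(EmpRec, first);
constexpr std::size_t kMidOffset = offsetof(EmpRec, mid);
constexpr std::size_t kLastOffset = offsetof(EmpRec, last);
constexpr std::size_t kRateOffset = offsetof(EmpRec, hourly_rate);

// The address the compiler computes for rec.hourly_rate.
float* rate_field(EmpRec& rec) {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(&rec) + kRateOffset);
}
```

This is why record field access is fast: unlike an array subscript, a field name never needs a run-time multiplication.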


Union Types
• A union is a type whose variables are allowed to store
different type values at different times during execution
• Design issues
– Should type checking be required?
– Should unions be embedded in records?

Discriminated vs. Free Unions
• Fortran, C, and C++ provide union constructs in which there
is no language support for type checking; the union in these
languages is called a free union
• Type checking of unions requires that each union include a
type indicator called a discriminant
– Supported by Ada

Ada Union Types
type Shape is (Circle, Triangle, Rectangle);
type Colors is (Red, Green, Blue);
type Figure (Form : Shape) is record
   Filled : Boolean;
   Color : Colors;
   case Form is
      when Circle => Diameter : Float;
      when Triangle =>
         Leftside, Rightside : Integer;
         Angle : Float;
      when Rectangle => Side1, Side2 : Integer;
   end case;
end record;
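C++17's std::variant is a discriminated union in the sense used here: it records which alternative is currently stored and checks accesses against it. This sketch is a rough counterpart of the Ada Figure type; the struct names mirror the Ada declarations but are otherwise assumptions, and unlike Ada, the discriminant is kept inside the variant rather than as a named field.

```cpp
#include <cassert>
#include <variant>

struct Circle { float diameter; };
struct Triangle { int leftside, rightside; float angle; };
struct Rectangle { int side1, side2; };

enum class Colors { Red, Green, Blue };

struct Figure {
    bool filled;
    Colors color;
    std::variant<Circle, Triangle, Rectangle> form;  // variant part plus hidden discriminant
};

// Test the discriminant without touching the stored value.
bool is_circle(const Figure& f) {
    return std::holds_alternative<Circle>(f.form);
}
```

Calling std::get with the wrong alternative throws std::bad_variant_access, whereas reading the wrong member of a C or C++ free union is not checked at all.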

Ada Union Type Illustrated

Figure: A discriminated union of three shape variables

Evaluation of Unions
• Free unions are potentially unsafe: they do not allow type checking
• Java and C# do not support unions, reflecting growing concerns
for safety in programming languages

Pointer and Reference Types
• A pointer type variable has a range of values consisting of
memory addresses and a special value, NULL (nil)
– Provide the power of indirect addressing
– Provide a way to manage dynamic memory
– A pointer can be used to access a location in the area where
storage is dynamically created (usually called a heap)
• Design Issues of Pointers
– What are the scope of and lifetime of a pointer variable?
– What is the lifetime of a heap-dynamic variable?
– Are pointers restricted as to the type of value to which they can
point?
– Are pointers used for dynamic storage management, indirect
addressing, or both?
– Should the language support pointer types, reference types, or both?


Pointer Operations
• Two fundamental operations: assignment and
dereferencing
• Assignment is used to set a pointer variable’s value to
some useful address
• Dereferencing yields the value stored at the location
represented by the pointer’s value
– Dereferencing can be explicit or implicit
– C++ uses an explicit operation via *
j = *ptr
sets j to the value located at ptr


Pointer Assignment Illustrated

Figure: The assignment operation j = *ptr


Problems with Pointers

• Dangling pointers (dangerous)
– A dangling pointer is a pointer that still contains the address of a
heap-dynamic variable that has been deallocated (deleted)
– Creating one:
• Allocate a heap-dynamic variable and set a pointer to point at it
• Set a second pointer to the value of the first pointer
• Deallocate the heap-dynamic variable, using the first pointer
– Example:
int *myPtr, *urPtr;
myPtr = new int(10);
cout << "The value of myPtr is " << *myPtr << endl;
urPtr = myPtr;
delete myPtr;   // myPtr and urPtr are now both dangling pointers
*myPtr = 5;     // error: writes to deallocated memory (undefined behavior)
cout << "The value of myPtr is " << *myPtr << endl;
• It is an error to dereference a pointer after the variable it points to has
been deleted through it or through any of its aliases.

Problems with Pointers
• Lost heap-dynamic variable
– An allocated heap-dynamic variable that is no longer accessible to
the user program (often called garbage)
– Creating one:
• Pointer p1 is set to point to a newly created heap-dynamic
variable
• Pointer p1 is later set to point to another newly created heap-
dynamic variable. This causes losing the first heap-dynamic
variable, i.e. that variable cannot be accessed or deallocated.
• Example:
void *p1, *p2;
p1 = new int(10);
p1 = new float(7.4);   // the int variable (value 10) is now lost

• The process of losing heap-dynamic variables is called memory
leakage


Pointers in Ada
• Some dangling pointers are disallowed because dynamic
objects can be automatically deallocated at the end of
the pointer's type scope
• All pointers are initialized to null
• The lost heap-dynamic variable problem is not
eliminated by Ada


Pointers in C and C++
• Extremely flexible but must be used with care
• Pointers can point at any variable regardless of when it was allocated
• Used for dynamic storage management and addressing
• Pointer arithmetic is possible
• Explicit dereferencing and address-of operators
• Domain type need not be fixed (void *)
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]

• void * can point to any type and can be type checked (it cannot be
dereferenced)


Pointers in Fortran 95
• Pointers point to heap and non-heap variables
• Implicit dereferencing
• Pointers can only point to variables that have the
TARGET attribute
• Pointers are declared with the POINTER attribute:
REAL, POINTER :: ptr
• Pointer assignment uses =>:
ptr => target (where target is either a pointer or a non-pointer
with the TARGET attribute)
• The TARGET attribute is assigned in the declaration, e.g.

INTEGER, TARGET :: NODE

Reference Types

• C++ includes a special kind of pointer type called a
reference type that is used primarily for formal
parameters
– Advantages of both pass-by-reference and pass-by-value
• Java extends C++’s reference variables and allows them
to replace pointers entirely
– References refer to class instances
• C# includes both the references of Java and the pointers
of C++


Evaluation of Pointers
• Dangling pointers and dangling objects are problems, as is heap
management
• Pointers are like gotos: they widen the range of cells
that can be accessed by a variable
• Pointers or references are necessary for dynamic data
structures, so we can't design a language without them

Representations of Pointers
• Large computers use single values
• Intel microprocessors use segment and offset


Solving Dangling Pointer Problem
• Tombstone: extra heap cell that is a pointer to the heap-
dynamic variable
– The actual pointer variable points only at tombstones
– When the heap-dynamic variable is deallocated, the tombstone
remains but is set to nil
– Costly in time and space
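• Locks-and-keys: pointer values are represented as (key, address)
pairs
– Heap-dynamic variables are represented as the variable plus a cell
for an integer lock value
– When a heap-dynamic variable is allocated, a lock value is created
and placed both in the lock cell and in the key cell of the pointer
– Copying a pointer copies its key; every dereference compares the key
with the lock, and access is legal only if they match
– When the variable is deallocated, its lock is cleared to an illegal
value, so the key of any remaining pointer no longer matches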


Heap Management
• Memory management: identify unused, dynamically allocated memory cells
and return them to the heap
• Approaches
– Manual: explicit allocation and deallocation (C, C++)
– Automatic:
• Reference counters (Modula-2, Adobe Photoshop)
• Garbage collection (Lisp, Java)
• Problems with manual approach:
– Requires programmer effort
– Programmer errors lead to space leaks and dangling references/sharing
– Proper explicit memory management is difficult and has been estimated to
account for up to 40% of development time!
• Heap management is a very complex run-time process
• Single-size cells vs. variable-size cells
• Two approaches to reclaim garbage
– Reference counters (eager approach): reclamation is gradual
– Garbage collection (lazy approach): reclamation occurs when the list of
available space becomes empty

Reference Counter
• Idea: keep track of how many references there are to a cell in memory. If
this number drops to 0, the cell is garbage.
• Reference counters: maintain a counter in every cell that stores the
number of pointers currently pointing at the cell
• Garbage cells are placed on a free list; new cells are allocated from this list
• Advantages
– resources can be freed directly
– immediate reuse of memory possible
• Disadvantages
– Can’t handle cyclic data structures (cells connected circularly)
– Space required for a counter in every cell
– Execution time required to update counters: large overhead for pointer manipulation
– Bad locality properties


Garbage Collection
• GC is a process by which dynamically allocated storage is reclaimed during
the execution of a program.
• Usually refers to automatic periodic storage reclamation by the garbage
collector (part of the run-time system), as opposed to explicit code to free
specific blocks of memory.
• Usually triggered during memory allocation when available free memory falls
below a threshold. Normal execution is suspended and GC is run.
• The run-time system allocates storage cells as requested and disconnects
pointers from cells as necessary; garbage collection then begins
– Every heap cell has an extra bit used by the collection algorithm
– All cells are initially set to garbage
– All pointers are traced into the heap, and reachable cells are marked as not garbage
– All garbage cells are returned to the list of available cells
– Disadvantages: when you need it most, it works worst (takes most time
when program needs most of cells in heap)
• Major GC algorithms:
– Mark and sweep
– Copying
– Incremental garbage collection algorithms

Marking Algorithm


Variable-Size Cells
• All the difficulties of single-size cells plus more
• Required by most programming languages
• If garbage collection is used, additional problems occur
– The initial setting of the indicators of all cells in the heap is
difficult
– The marking process is nontrivial
– Maintaining the list of available space is another source of
overhead


Summary
• The data types of a language are a large part of what determines
that language’s style and usefulness
• The primitive data types of most imperative languages include
numeric, character, and Boolean types
• The user-defined enumeration and subrange types are convenient
and add to the readability and reliability of programs
• Arrays and records are included in most languages
• Pointers are used for addressing flexibility and to control dynamic
storage management

