The document discusses various data structures used in programming, including arrays, lists, linked lists, stacks, queues, and dictionaries. It provides definitions and summaries of each data structure, including their common operations and time complexities. For example, it notes that arrays provide O(1) direct access by index but fixed size, while lists are dynamically sized but insertion/deletion at non-end positions is O(n).
2. Asymptotic notation
Before writing any program, we write a blueprint, which is called an
algorithm.
A problem can have many candidate algorithms, like A1, A2, A3 … etc.
We analyze each algorithm in terms of time and space complexity. Based on that
we select the best algorithm.
Asymptotic notation is a set of notations used to denote these
complexities in simple terms.
Types:
Big O notation (O notation) – Denotes an upper bound on how the cost of the
algorithm grows; most often quoted for the worst case. We are usually most interested in this.
Omega notation (Ω notation) – Denotes a lower bound on how the cost of the
algorithm grows; often quoted for the best case
Theta notation (Θ notation) – Denotes a tight bound (both upper and lower); often quoted for the average case
Ex: searching the array 5, 4, 2, 6, 8, 9 element by element: best case Ω(1), worst case O(n), average case
Θ(n/2) = Θ(n)
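The array example can be checked by counting comparisons. The following is a minimal sketch; the helper name CountComparisons is hypothetical and only serves the illustration.

```csharp
using System;

// Hypothetical helper: number of comparisons a linear search makes
// before it finds the target (or runs out of elements).
int CountComparisons(int[] data, int target)
{
    int comparisons = 0;
    foreach (int value in data)
    {
        comparisons++;
        if (value == target) break;
    }
    return comparisons;
}

int[] numbers = { 5, 4, 2, 6, 8, 9 };
int best = CountComparisons(numbers, 5);    // target is the first element
int worst = CountComparisons(numbers, 9);   // target is the last element
Console.WriteLine($"best = {best}, worst = {worst}");
```

The best case needs a single comparison regardless of the array length, while the worst case grows with the number of elements.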
3. Mostly used Asymptotic notations
constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n^2)
cubic − Ο(n^3)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
4. What is ADT ?
To manage the complexity of problems and the problem-solving process,
computer scientists use abstractions to allow them to focus on the “big
picture” without getting lost in the details.
An abstract data type, sometimes abbreviated ADT, is a logical description of
how we view the data and the operations that are allowed without regard to
how they will be implemented.
Example : List, Map
One ADT can have several implementations
5. Example of ADT
Let's consider the interface System.Collections.IList
The basic operations, which it defines, are:
int Add(object) – adds element in the end of the list
void Insert(int, object) – inserts element at a specified position
in the list
void Clear() – removes all elements in the list
bool Contains(object) – checks whether the list contains the element
void Remove(object) – removes the element from the list
void RemoveAt(int) – removes the element on a given position
int IndexOf(object) – returns the position of the element
this[int] – indexer, allows access to the elements on a set position
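As a small illustration of "one ADT, several implementations", the sketch below writes a helper against IList only and then runs it on two different classes. The helper name Fill is hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: it relies only on the operations IList defines,
// not on any particular implementation.
void Fill(IList list)
{
    list.Add("a");         // append at the end
    list.Add("c");
    list.Insert(1, "b");   // insert at position 1
    list.Remove("c");      // remove by value
}

IList first = new ArrayList();       // one implementation of IList
IList second = new List<object>();   // another implementation of IList
Fill(first);
Fill(second);

bool hasB = first.Contains("b");
int indexOfB = first.IndexOf("b");
Console.WriteLine($"{first.Count} and {second.Count} items, b at {indexOfB}");
```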
6. What is a data structure and why do we need it?
A data structure is a systematic way of organizing data in order to use it efficiently.
Choosing the right data structure makes a program much more efficient – we can
save memory and execution time, and sometimes even the amount of code that we
write.
Need:
As applications get more complex, the amount of data also grows. Because of this,
there are three common problems that applications face today.
Data Search
Processing Speed
Multiple requests
7. Basic data structures in programming.
Linear – these include arrays (Array), lists (ArrayList, List<T>), stacks (Stack<T>),
queues (Queue<T>) and linked lists (LinkedList<T>)
Non-Linear:
Dictionaries – key-value pairs organized in hash tables (Hashtable and
Dictionary<TKey, TValue>)
Tree-like – Tree, Binary tree, AVL tree, Spanning tree and Heap
Sets – unordered collections of unique elements
Others – multi-sets, bags, multi-bags, priority queues, graphs…
8. Motivation behind inventing the array
Let’s say you have a requirement to store 100 values in memory. How can
we store that many values without using arrays?
What is the basic thing required to store a value in memory in high-level
languages?
A variable, which refers to a memory location.
To store 100 values in memory, do we need to create 100 variables
in the program?
100 variables may be manageable, but what if you want to store/access 10000 elements?
9. Array
Arrays are one of the simplest and most commonly used data
structures in computer programming.
All the elements of an array must be of the same type. Hence arrays are
homogeneous (Why?)
The contents of the array are stored in a contiguous memory
block. (Why?)
All the elements can be directly accessed by index. (How?)
Let’s take an example to understand how an array is stored on the
heap.
Ex: bool[] booleanArray;
FileInfo[] files;
booleanArray = new bool[10];
files = new FileInfo[10];
11. Two dimensional arrays
For example, if we create a two-dimensional array with m x n values, the elements are
stored in memory one row after another (row-major order).
A 3D array is laid out the same way, with the last index varying fastest.
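The row-by-row layout of a rectangular .NET array can be observed by flattening it. This is a minimal sketch; the variable names are illustrative only.

```csharp
using System;
using System.Linq;

int[,] grid =
{
    { 1, 2, 3 },
    { 4, 5, 6 },
};
int columns = grid.GetLength(1);

// Enumerating a rectangular array visits it row by row
// (the rightmost index changes fastest).
int[] flat = grid.Cast<int>().ToArray();

// Element [i, j] of an m x n array sits at offset i * n + j.
int i = 1, j = 2;
bool sameElement = flat[i * columns + j] == grid[i, j];
Console.WriteLine(string.Join(", ", flat));
```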
12. Basic operation on Array
Read elements by index O(1)
Ex: int valueAtIndexTwo = array[2];
Write element by specifying the index
Ex: array[10] = 12; O(1)
Search for an element by value O(n)
Search for an element by value using binary search O(log n), only
when the array is sorted
Memory layout of multi-dimensional arrays: http://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/
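The operations listed on this slide map onto the following calls. A short sketch, assuming a small sorted int array chosen for illustration:

```csharp
using System;

int[] array = { 2, 4, 5, 6, 8, 9 };    // already sorted

int valueAtIndexTwo = array[2];         // read by index, O(1)
array[5] = 12;                          // write by index, O(1); the array stays sorted

int linearIndex = Array.IndexOf(array, 8);        // scans from the start, O(n)
int binaryIndex = Array.BinarySearch(array, 8);   // halves the range each step, O(log n)

Console.WriteLine($"{valueAtIndexTwo}, {linearIndex}, {binaryIndex}");
```

Array.BinarySearch only gives meaningful results when the array is sorted, which is why the sketch keeps the values in ascending order.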
13. Array analysis
Ordering – Guaranteed
Contiguous –Yes
Direct access –Yes via index O(1)
Look up efficiency – O(1)
ArrayList has O(n) time complexity for arbitrary indices of add/remove, but O(1) for
the operation at the end of the list.
The running time of an array access is denoted O(1) because it is constant. That is,
regardless of how many elements are stored in the array, it takes the same amount
of time to look up an element.
This constant running time is possible solely because an array's elements are stored
contiguously, hence a lookup only requires knowledge of the array's starting
location in memory, the size of each array element, and the element to be
indexed.
The .NET Framework does an automatic check on each element access attempt,
whether the index is valid or it is out of the range of the array.
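The two points above can be sketched as follows: the position of an element is plain arithmetic on the index, and an invalid index is rejected at run time. The helper OffsetOf is hypothetical; the runtime performs this calculation internally.

```csharp
using System;

// Hypothetical helper: byte offset of an element from the start of the
// array's data. It does not depend on how many elements the array holds.
long OffsetOf(int index, int elementSize) => (long)index * elementSize;

long offset = OffsetOf(3, sizeof(int));   // 3 * 4 bytes

int[] numbers = new int[10];
bool rejected = false;
try
{
    numbers[10] = 1;   // valid indices are 0..9
}
catch (IndexOutOfRangeException)
{
    rejected = true;   // the runtime checked the index before writing
}
Console.WriteLine($"offset = {offset}, rejected = {rejected}");
```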
14. Limitations of Array
The size of the array is fixed when it is created.
It can store only items of the same type.
15. Array List
The ArrayList maintains an internal object array and provides
automatic resizing of the array as the number of elements added to
the ArrayList grows.
Because the ArrayList uses an object array, developers can add any
type—strings, integers, FileInfo objects, Form instances, anything.
Therefore, even if you have an ArrayList that stores nothing but value
types, each ArrayList element is a reference to a boxed value type.
The boxing and unboxing, along with the extra level of indirection,
that comes with using value types in an ArrayList can hamper the
performance of your application when using large ArrayLists with
many reads and writes.
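A short sketch of what boxing means in practice: the value goes in as object and has to be cast back on the way out.

```csharp
using System;
using System.Collections;

ArrayList list = new ArrayList();
list.Add(42);        // the int is boxed into an object
list.Add("text");    // any type is accepted

object stored = list[0];       // the element comes back as object
int number = (int)list[0];     // unboxing needs an explicit cast

bool failed = false;
try
{
    _ = (int)list[1];          // a string cannot be unboxed to int
}
catch (InvalidCastException)
{
    failed = true;             // the mistake only shows up at run time
}
Console.WriteLine($"{number}, cast failed = {failed}");
```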
17. Basic operation on ArrayList
Add(object) – adding a new element
Insert(int, object) – adding a new element at a specified position
(index)
Count – returns the count of elements in the list
Remove(object) – removes a specified element
RemoveAt(int) – removes the element at a specified position
Clear() – removes all elements from the list
this[int] – an indexer, allows accessing the elements by a given
position (index)
18. ArrayList.Insert():
// Grow the internal array if it is already full.
if (_size == _items.Length)
{
    EnsureCapacity(_size + 1);
}
// Shift the elements from index onward one slot to the right.
if (index < _size)
{
    Array.Copy(_items, index, _items, index + 1, _size - index);
}
_items[index] = value;
_size++;
Array.Copy copies a range of elements from a System.Array starting at the specified source index and pastes them to
another System.Array starting at the specified destination index. The length and the indexes are specified as 32-bit integers.
19. ArrayList.RemoveAt():
_size--;
// Shift the elements after index one slot to the left.
if (index < _size)
{
    Array.Copy(_items, index + 1, _items, index, _size - index);
}
Array.Copy in turn calls an internal overload:
Copy(sourceArray, sourceIndex, destinationArray, destinationIndex, length, false);
It copies a range of elements from a System.Array starting at the
specified source index and pastes them to another System.Array
starting at the specified destination index. The length and the indexes
are specified as 32-bit integers.
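The shifting done by Insert and RemoveAt can be tried out on a plain array. This is a simplified sketch, not the ArrayList source: the names items, size, InsertAt and RemoveAt are chosen for the illustration, and growth is done with Array.Resize as an assumption.

```csharp
using System;

int[] items = new int[2];
int size = 0;

void InsertAt(int index, int value)
{
    if (index < 0 || index > size) throw new ArgumentOutOfRangeException(nameof(index));
    if (size == items.Length) Array.Resize(ref items, items.Length * 2);   // grow when full
    if (index < size)
    {
        // Shift the tail one slot to the right to open a gap at index.
        Array.Copy(items, index, items, index + 1, size - index);
    }
    items[index] = value;
    size++;
}

void RemoveAt(int index)
{
    if (index < 0 || index >= size) throw new ArgumentOutOfRangeException(nameof(index));
    size--;
    if (index < size)
    {
        // Shift the tail one slot to the left to close the gap.
        Array.Copy(items, index + 1, items, index, size - index);
    }
    items[size] = 0;   // clear the slot that is no longer used
}

InsertAt(0, 1);
InsertAt(1, 3);
InsertAt(1, 2);   // shifts 3 to the right
RemoveAt(0);      // shifts 2 and 3 to the left
Console.WriteLine(string.Join(", ", items, 0, size));
```

Every insert or removal away from the end moves all later elements, which is where the O(n) cost comes from.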
20. Analysis of ArrayList
Ordering – Guaranteed
Contiguous –Yes
Direct access –Yes via index O(1)
Look up efficiency – O(1)
ArrayList has O(n) time complexity for arbitrary indices of
add/remove, but O(1) for the operation at the end of the list
21. Limitations of ArrayList
The main problem with ArrayList is that it uses object, which means you
have to cast to and from whatever type you are storing.
Implicit boxing happens whenever you use a value type: it is
boxed when put into the ArrayList and unboxed when
read back.
Since generics were introduced, this class has been superseded by List<T> and
is mainly needed for .NET 1.0/1.1 code.
22. List<T>
The List<T> class was introduced in the .NET Framework 2.0 as part of
the new set of generic collections.
The List<T> class is the generic equivalent of ArrayList.
It implements the IList<T> generic interface by using an array whose size is
dynamically increased as required.
It keeps its elements in the memory as an array.
It can be extremely efficient data structure when it is necessary to add elements
fast, extract elements and access the elements by index. Still, it is pretty slow in
inserting and removing elements unless these elements are at the last position.
Represents a strongly typed list of objects that can be accessed by index.
Provides methods to search, sort, and manipulate lists.
Elements in this collection can be accessed using an integer index. Indexes in
this collection are zero-based.
23. Operations on List<T>
We already explained that the List<T> class uses an inner array for keeping
the elements and the array doubles its size when it gets overfilled. Such
implementation causes the following good and bad sides:
- The search by index is very fast – we can access with equal speed each
of the elements, regardless of the count of elements.
- The search for an element by value works with as many comparisons as
the count of elements (in the worst case), i.e. it is slow.
- Inserting and removing elements is a slow operation – when we add or
remove elements, especially if they are not in the end of the array, we
have to shift the rest of the elements and this is a slow operation.
- When adding a new element, sometimes we have to increase the
capacity of the array, which is a slow operation, but it happens seldom
and the average speed of insertion to List does not depend on the count
of elements, i.e. it works very fast.
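How seldom the inner array is resized can be observed through the Capacity property. The sketch below records every distinct capacity while adding 100 elements; the exact values depend on the runtime, so none are assumed here.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
var capacities = new List<int>();

for (int i = 0; i < 100; i++)
{
    list.Add(i);
    // Record the capacity each time the inner array has been replaced.
    if (capacities.Count == 0 || capacities[capacities.Count - 1] != list.Capacity)
    {
        capacities.Add(list.Capacity);
    }
}

Console.WriteLine("Capacities seen: " + string.Join(", ", capacities));
```

Only a handful of resizes happen for 100 additions, which is why adding at the end is fast on average.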
24. Analysis of List
Ordering – Guaranteed
Contiguous –Yes
Direct access –Yes via index O(1)
Look up efficiency – O(1)
Best for small list where direct access is required
25. Linked List
A linked list is a sequence of data structures which are connected together via
links.
A linked list is a sequence of links which contain items.
Each link contains a connection to another link. The linked list is the second most used
data structure after the array.
Following are important terms to understand the concepts of Linked List.
Link − Each Link of a linked list can store a piece of data called an element.
Next − Each Link of a linked list contains a link to the next Link, called Next.
LinkedList − A LinkedList contains the connection link to the first Link, called First.
26. Advantages of LinkedList<T>
The append operation is very fast, because the list always knows its
last element (tail).
Inserting a new element at a random position in the list is very fast
(unlike List<T>) if we have a pointer to this position, e.g. if we insert at
the list start or at the list end.
On the other hand, searching for elements by index or by value in LinkedList is a slow
operation, as we have to scan all elements consecutively,
beginning from the start of the list.
Removing an element by value is a slow operation, because it includes
searching for it.
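These trade-offs can be seen with the LinkedList<T> class itself. A brief sketch with illustrative values:

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<string>();
list.AddLast("b");     // append at the tail, O(1)
list.AddLast("d");
list.AddFirst("a");    // insert at the head, O(1)

// Finding a node by value scans from the start, O(n).
LinkedListNode<string> nodeB = list.Find("b");
if (nodeB != null)
{
    list.AddAfter(nodeB, "c");   // O(1) once we hold the node
}

string order = string.Join(",", list);
list.RemoveFirst();              // O(1), no search needed
Console.WriteLine(order);
```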
27. Analysis of LinkedList
Ordering – User has precise control over element
ordering
Contiguous – No
Direct access – No
Look up efficiency – O(n)
Best for lists where inserting/deleting in middle is common and no
direct access required
28. Queue
Queue is an abstract data type in which new elements are inserted
at one end, called the REAR (also called tail), and existing elements
are removed from the other end, called
the FRONT (also called head).
This makes the queue a FIFO data structure, which means that the element
inserted first will also be removed first.
The process to add an element into queue is called Enqueue
The process of removal of an element from queue is
called Dequeue.
The process of reading the element at head node is called Peek.
29. The Queue – Basic Operations
Queue<T> class provides the basic operations, specific for the data
structure queue. Here are some of the most frequently used:
- Enqueue(T) – inserts an element at the end of the queue
- Dequeue() – retrieves the element from the beginning of the
queue and removes it
- Peek() – returns the element from the beginning of the queue
without removing it
- Clear() – removes all elements from the queue
- Contains(T) – checks if the queue contains the element
- Count – returns the number of elements in the queue
30. .NET implementation of the Queue
In .NET, Queue<T> is implemented using a circular buffer.
Circular buffer: A circular buffer is a memory allocation scheme where memory is
reused (reclaimed) when an index, incremented modulo the buffer size, writes over
a previously used location.
Internally it uses an array to store the elements.
31. .NET implementation of the Queue
Enqueue advances the tail index: _tail = (_tail + 1) % _array.Length;
Dequeue advances the head index: _head = (_head + 1) % _array.Length;
Is full: the number of stored elements equals the array length.
Is empty: the number of stored elements is zero.
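The wrap-around behaviour can be sketched with a fixed-size buffer. This is a simplified illustration, not the Queue<T> source: the names are hypothetical, and the sketch throws when full, whereas Queue<T> grows its array instead.

```csharp
using System;

int[] buffer = new int[4];
int head = 0, tail = 0, count = 0;

void Enqueue(int value)
{
    if (count == buffer.Length) throw new InvalidOperationException("Queue is full");
    buffer[tail] = value;
    tail = (tail + 1) % buffer.Length;   // wrap around to slot 0 after the last slot
    count++;
}

int Dequeue()
{
    if (count == 0) throw new InvalidOperationException("Queue is empty");
    int value = buffer[head];
    head = (head + 1) % buffer.Length;   // the freed slot can be reused later
    count--;
    return value;
}

Enqueue(1); Enqueue(2); Enqueue(3); Enqueue(4);   // buffer is now full
int first = Dequeue();    // frees slot 0
Enqueue(5);               // reuses slot 0
int second = Dequeue();
Console.WriteLine($"{first}, {second}, slot 0 holds {buffer[0]}");
```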
33. Stack
Stack is an abstract data type or a linear data structure, in which
last element will be removed first.
This makes Stack as LIFO data structure, which means that element
inserted last will be removed first.
The process to add an element into stack is called Push
The process of removal of an element from stack is called Pop.
34. Stack<T> – Basic Operations
Push(T) – adds a new element on the top of the stack
Pop() – returns the topmost element and removes it from the stack
Peek() – returns the topmost element without removing it
Count – returns the count of elements in the stack
Clear() – removes all elements from the stack
Contains(T) – check whether the stack contains the element
ToArray() – returns an array, containing all elements of the stack
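A short usage sketch of the operations listed above, with illustrative values:

```csharp
using System;
using System.Collections.Generic;

var stack = new Stack<int>();
stack.Push(1);
stack.Push(2);
stack.Push(3);

int top = stack.Peek();              // looks at the top, nothing is removed
int popped = stack.Pop();            // removes and returns the top
int[] remaining = stack.ToArray();   // copied top first
bool hasOne = stack.Contains(1);
stack.Clear();                       // removes everything

Console.WriteLine($"{top}, {popped}, [{string.Join(", ", remaining)}]");
```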
35. .NET implementation of Stack
Push :
// Pushes an item to the top of the stack.
//
public virtual void Push(Object obj) {
    //Contract.Ensures(Count == Contract.OldValue(Count) + 1);
    if (_size == _array.Length) {
        Object[] newArray = new Object[2*_array.Length];
        Array.Copy(_array, 0, newArray, 0, _size);
        _array = newArray;
    }
    _array[_size++] = obj;
    _version++;
}
36. .NET implementation of Stack
Pop :
// Pops an item from the top of the stack. If the stack is empty, Pop
// throws an InvalidOperationException.
public virtual Object Pop() {
    if (_size == 0)
        throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EmptyStack"));
    //Contract.Ensures(Count == Contract.OldValue(Count) - 1);
    Contract.EndContractBlock();
    _version++;
    Object obj = _array[--_size];
    _array[_size] = null; // Free memory quicker.
    return obj;
}
38. What is a hash table?
What is the problem with ordinal indexing?
39. Hash table combines the random access ability of array with the dynamism of
linked list.
i.e. Insertion/Deletion and Lookup can be done with O(1)
complexity if it is implemented correctly
To achieve this we can create a data structure where while inserting data, the
data itself gives us some clue about where we can store the data.
A Hash table is a combination of two things
First, a hash function, which returns a non-negative integer called the hash code.
Second, an array capable of storing the data that we want to place into the structure.
The idea is that we run our data through the hash function and then store the
data in the element of an array represented by the returned hashcode.
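The idea can be sketched in a few lines. The hash function below (sum of character codes) and the bucket count are assumptions made for the illustration; they are not what .NET uses, and collisions are ignored here.

```csharp
using System;

const int BucketCount = 8;
string[] buckets = new string[BucketCount];

// Hypothetical hash function: the sum of the character codes.
// It is deterministic and never negative.
int Hash(string key)
{
    int sum = 0;
    foreach (char c in key) sum += c;
    return sum;
}

// The hash code is reduced to a valid array index.
int IndexFor(string key) => Hash(key) % BucketCount;

buckets[IndexFor("John")] = "John";                  // insert: the data tells us where to store it
bool found = buckets[IndexFor("John")] == "John";    // lookup: same hash, same slot, no scan
Console.WriteLine($"found = {found}");
```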
40. As elements are added to a Hashtable, the actual load factor of
the Hashtable increases. When the actual load factor reaches the specified
load factor, the number of buckets in the Hashtable is automatically increased
to the smallest prime number that is larger than twice the current number
of Hashtable buckets.
For very large Hashtable objects, you can increase the maximum capacity to 2
billion elements on a 64-bit system by setting the enabled attribute of the
gcAllowVeryLargeObjects configuration element to true in the run-time environment.
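The growth rule ("smallest prime larger than twice the current number of buckets") can be sketched as below. The helper names are hypothetical and the primality test is a simple trial division chosen for clarity, not the method the framework uses.

```csharp
using System;

bool IsPrime(int n)
{
    if (n < 2) return false;
    for (int d = 2; (long)d * d <= n; d++)
    {
        if (n % d == 0) return false;
    }
    return true;
}

// Smallest prime strictly larger than twice the current bucket count.
int NextBucketCount(int current)
{
    int candidate = 2 * current + 1;
    while (!IsPrime(candidate)) candidate++;
    return candidate;
}

int grown = NextBucketCount(11);   // twice 11 is 22, the next prime is 23
Console.WriteLine(grown);
```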
41. How insertion happens in Hashtable
How lookup works in hash table
Ex: to search for “John” in the hash table, we pass the key; the table hashes that key and gets
the same hash code that was generated when “John” was inserted, say 4.
It then looks at index 4 of the hash table and returns true, because “John” is stored at that
index.
Each element is a key/value pair stored in a DictionaryEntry object.
private struct DictionaryEntry{
public TKey key;
public TValue value;
public int hashCode;
public int next;
}
42. How to define the Hash function?
There is no limit to the number of possible hash functions.
However, some characteristics are expected for it to qualify as an
efficient hash function.
Deterministic – Every time we pass exactly the same piece of data into the
hash function, we always get the same hash code.
Uniformly distributed – Different values should rarely produce the same hash code;
the hash codes should be spread evenly over the available range.
Ex of hash function
43. What if we came across this situation
Do you see any problem in the following hash table?
We call this a collision.
A collision occurs when two pieces of data run through the hash function and
get the same hash code.
We want to store both pieces of data and don’t want to overwrite the existing
one with the new one.
44. Collision resolution techniques
Linear probing: in this method, if a collision occurs we try to place the data in the
next consecutive index until we find a vacancy. It suffers from clustering.
Quadratic probing: If slot s is taken, rather than checking slot s + 1, then s + 2,
and so on as in linear probing, quadratic probing checks slot s + 1^2 first, then s –
1^2, then s + 2^2, then s – 2^2, then s + 3^2, and so on. However, even quadratic
probing can lead to clustering.
Chaining (used in Dictionary<TKey, TValue>): Here the linked list comes into the picture. Instead of
storing one value in each element of the hash table, each element contains a pointer to a
linked list. So each element of the array is a pointer to the head of a linked list.
Rehashing (used in Hashtable): It uses a family of hash functions (H1, H2, ..., Hn); when a
collision occurs, the next function in the family is tried.
Ex: Hk(key) =
[GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize – 1)))] % hashsize
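The formula above can be evaluated directly to see the probe sequence it produces. In this sketch the hash value 100 and the table size 11 are made-up inputs, the parameter hash stands in for GetHash(key), and a non-negative hash is assumed.

```csharp
using System;

// k-th probe position for a given hash code, following the formula above.
int ProbeIndex(int hash, int k, int size)
{
    long increment = 1 + (((hash >> 5) + 1) % (size - 1));
    return (int)((hash + k * increment) % size);
}

int tableSize = 11;      // hypothetical number of buckets
int sampleHash = 100;    // hypothetical value of GetHash(key)

int[] probes = new int[4];
for (int k = 0; k < probes.Length; k++)
{
    probes[k] = ProbeIndex(sampleHash, k, tableSize);
}
Console.WriteLine(string.Join(", ", probes));
```

Each collision moves the search by a step that depends on the key itself, so two keys that collide at k = 0 usually follow different paths afterwards.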
46. When to use what?
Do you need a sequential list where the element is typically discarded after its
value is retrieved?
If yes, consider using the Queue class or the Queue<T> generic class if you need first-in,
first-out (FIFO) behavior. Consider using the Stack class or the Stack<T> generic class if
you need last-in, first-out (LIFO) behavior. For safe access from multiple threads, use the
concurrent versions ConcurrentQueue<T> and ConcurrentStack<T>.
If not, consider using the other collections.
Do you need to access the elements in a certain order, such as FIFO, LIFO, or
random?
The Queue class and the Queue<T> or ConcurrentQueue<T> generic class offer FIFO
access. For more information, see When to Use a Thread-Safe Collection.
The Stack class and the Stack<T> or ConcurrentStack<T> generic class offer LIFO
access. For more information, see When to Use a Thread-Safe Collection.
The LinkedList<T> generic class allows sequential access either from the head to the tail,
or from the tail to the head.
47. Do you need to access each element by index?
The ArrayList and StringCollection classes and the List<T> generic class offer access
to their elements by the zero-based index of the element.
The Hashtable, SortedList, ListDictionary, and StringDictionary classes, and
the Dictionary<TKey, TValue> and SortedDictionary<TKey, TValue> generic classes
offer access to their elements by the key of the element.
The NameObjectCollectionBase and NameValueCollection classes, and
the KeyedCollection<TKey, TItem> and SortedList<TKey, TValue> generic classes
offer access to their elements by either the zero-based index or the key of the
element.
Will each element contain one value, a combination of one key and one
value, or a combination of one key and multiple values?
One value: Use any of the collections based on the IList interface or
the IList<T> generic interface.
One key and one value: Use any of the collections based on
the IDictionary interface or the IDictionary<TKey, TValue> generic interface.
One value with embedded key: Use the KeyedCollection<TKey, TItem> generic
class.
One key and multiple values: Use the NameValueCollection class.
48. Do you need to sort the elements differently from how they were entered?
The Hashtable class sorts its elements by their hash codes.
The SortedList class and the SortedDictionary<TKey, TValue> and SortedList<TKey,
TValue> generic classes sort their elements by the key, based on implementations
of the IComparer interface and the IComparer<T> generic interface.
ArrayList provides a Sort method that takes an IComparer implementation as a
parameter. Its generic counterpart, the List<T> generic class, provides
a Sort method that takes an implementation of the IComparer<T> generic
interface as a parameter.
Do you need fast searches and retrieval of information?
ListDictionary is faster than Hashtable for small collections (10 items or fewer).
The Dictionary<TKey, TValue> generic class provides faster lookup than
the SortedDictionary<TKey, TValue> generic class. The multi-threaded
implementation is ConcurrentDictionary<TKey,
TValue>. ConcurrentBag<T> provides fast multi-threaded insertion for unordered
data. For more information about both multi-threaded types, see When to Use a
Thread-Safe Collection.