SlideShare uma empresa Scribd logo
1 de 30
Index Structures
Dr M Mahesh Kumar
Index Structures
 It is not sufficient simply to scatter the records that represent tuples of a relation among various
blocks. (SELECT * FROM R)
 An index is any data structure that takes the value of one or more fields and finds the records with
that value “quickly.” In particular, an index lets us find a record without having to look at more than
a small fraction of all possible records.
 The field(s) on whose values the index is based is called the search key, or just “key” if the index is
understood.
 Indexes help to speed up queries that specify values for one or more attributes.
Basics of Index Structures
 Storage structures consist of files, which are similar to the files used by operating systems. A data
file may be used to store a relation, for example. The data file may have one or more index files.
 Each index file associates values of the search key with pointers to data-file records that have that
value for the attribute(s) of the search key.
 Indexes are of two types:
 Dense: There is an entry in the index file for every record of the data file
 Sparse: Only some of the data records are represented in the index, often one index entry per block
of the data file.
 Indexes can also be
 Primary: Can determine the location of the records of the data file.
 Secondary: Cannot determine the location of the records of the data file.
 For example, it is common to create a primary index on the primary key of a relation and to create
secondary indexes on some of the other attributes.
Sequential files and Dense index
 A sequential file is created by sorting the tuples of a relation by their primary key. The tuples are
then distributed among blocks, in this order.
 If records are sorted, we can build on them a dense index, which is a sequence of blocks holding
only the keys of the records and pointers to the records themselves
Sequential files and Dense index
 The dense index supports queries that ask for records with a given search key value. Given key value
K , we search the index blocks for K , and when we find it, we follow the associated pointer to the
record with key K .
 It might appear that we need to examine every block of the index, or half the blocks of the index, on
average, before we find K . However, there are several factors that make the index-based search
more efficient than it seems.
1. The number of index blocks is usually small compared with the number of data blocks.
2. Since keys are sorted, we can use binary search to find K . If there are n blocks of the index, we only
look at log2 n of them.
3. The index may be small enough to be kept permanently in main memory buffers. If so, the search for key
K involves only main-memory accesses, and there are no expensive disk I/O ’s to be performed.
Sparse index
 A sparse index typically has only one key-pointer pair per block of the data file. It thus uses less
space than a dense index, at the expense of somewhat more time to find a record given its key.
 You can only use a sparse index if the data file is sorted by the search key, while a dense index can
be used for any search key.
Searching using Sparse index
 To find the record with search-key value K , we search the sparse index for the largest key less than
or equal to K .
 Since the index file is sorted by key, a binary search can locate this entry. We follow the associated
pointer to a data block.
 Now, we must search this block for the record with key K . Of course the block must have enough
format information that the records and their contents can be identified.
Multiple Levels of Index
 An index file can cover many blocks. Even if we use binary search to find the desired index entry,
we still may need to do many disk I/O ’s to get to the record we want. By putting an index on the
index, we can make the use of the first level of index more efficient.
Secondary Indexes
 A secondary index serves the purpose of any index: it is a data structure that facilitates finding
records given a value for one or more fields.
 However, the secondary index is distinguished from the primary index in that a secondary index
does not determine the placement of records in the data file.
 Rather, the secondary index tells us the current locations of records; that location may have been
decided by a primary index on some other field.
 An important consequence of the distinction between primary and secondary indexes is that:
• Secondary indexes are always dense. It makes no sense to talk of a sparse, secondary index. Since the
secondary index does not influence location, we could not use it to predict the location of any record whose
key was not mentioned in the index file explicitly.
Secondary Indexes
 The keys in the index file are sorted. The result is that the pointers in one index block can go to
many different data blocks, instead of one or a few consecutive blocks.
 For example, to retrieve all the records with search key 20, we not only have to look at two index
blocks, but we are sent by their pointers to three different data blocks. Thus, using a secondary index
may result in many more disk I/O ’s than if we get the same number of records via a primary index.
 However, there is no help for this problem; we cannot control the order of tuples in the data block,
because they are presumably ordered according to some other attribute(s).
Indirection in Secondary Indexes
 There is some wasted space, perhaps a significant amount of wastage, in the structure. If a search-
key value appears n times in the data file, then the value is written n times in the index file. It would
be better if we could write the key value once for all the pointers to data records with that value.
 A convenient way to avoid repeating values is to use a level of indirection, called buckets, between
the secondary index file and the data file.
Document Retrieval and Inverted Indexes
 With the advent of the World-Wide Web and the feasibility of keeping all documents on-line, the
retrieval of documents given keywords has become one of the largest database problems. While
there are many kinds of queries that one can use to find relevant documents, the simplest and most
common form can be seen in relational terms as follows:
 A document may be thought of as a tuple in a relation Doc. This relation has very many attributes, one
corresponding to each possible word in a document. Each attribute is boolean — either the word is present
in the document, or it is not. Thus, the relation schema may be thought of as
Doc(hasCat, hasDog, ... )
where hasCat is true if and only if the document has the word “cat” atleast once.
 There is a secondary index on each of the attributes of Doc. However, we save the trouble of indexing
those tuples for which the value of the attribute is FALSE; instead, the index leads us to only the
documents for which the word is present. That is, the index has entries only for the search-key value
TRUE.
 Instead of creating a separate index for each attribute (i.e., for each word), the indexes are combined into
one, called an inverted index. This index uses indirect buckets for space efficiency, as was discussed in
Section
Document Retrieval and Inverted Indexes
Document Retrieval and Inverted Indexes
B Trees
 While one or two levels of index are often very helpful in speeding up queries, there is a more
general structure that is commonly used in commercial systems.
 This family of data structures is called B-trees, and the particular variant that is most often used is
known as a B+ tree.
 B-trees automatically maintain as many levels of index as is appropriate for the size of the file being
indexed.
 B-trees manage the space on the blocks they use so that every block is between half used and completely
full.
The Structure of B-Trees
 A B-tree organizes its blocks into a tree that is balanced, meaning that all paths from the root to a
leaf have the same length.
 Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any
number of layers is possible.
The Structure of B-Trees
 There is a parameter n associated with each B-tree index, and this parameter determines the layout of
all blocks of the B-tree. Each block will have space for n search-key values and n + 1 pointers.
Example: Finding the number of data values stored on B-Tree
 Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If
there is no header information kept on the blocks, then we want to find the largest integer value of n
such that 4n + 8(n + 1) < 4096. That value is n = 340.
The Structure of B-Trees
 There are several important rules about what can appear in the blocks of a B-tree:
1. The keys in leaf nodes are copies of keys from the data file. These keys are distributed among the
leaves in sorted order, from left to right.
2. At the root, there are at least two used pointers. All pointers point to B-tree blocks at the level
below.
3. At a leaf, the last pointer points to the next leaf block to the right, i.e., to the block with the next
higher keys. Among the other n pointers in a leaf block, at least ceil((n + 1)/2) of these pointers are
used and point to data records; unused pointers are null and do not point anywhere. The ith pointer,
if it is used, points to a record with the ith key.
The Structure of B-Trees
Properties of B-Trees
 B-Tree of Order m has the following properties...
 Property #1 - All leaf nodes must be at same level.
 Property #2 - All nodes except root must have at least [m/2]-1 keys and maximum of m-1 keys.
 Property #3 - All non leaf nodes except root (i.e. all internal nodes) must have at least m/2 children.
 Property #4 - If the root node is a non leaf node, then it must have atleast 2 children.
 Property #5 - A non leaf node with n-1 keys must have n number of children.
 Property #6 - All the key values in a node must be in Ascending Order.
Insertion Operation in B-Tree
 In a B-Tree, a new element must be added only at the leaf node. That means, the new keyValue is
always attached to the leaf node only. The insertion operation is performed as follows...
 Step 1 - Check whether tree is Empty.
 Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree as a root
node.
 Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is added using
Binary Search Tree logic.
 Step 4 - If that leaf node has empty position, add the new key value to that leaf node in ascending order of
key value within the node.
 Step 5 - If that leaf node is already full, split that leaf node by sending middle value to its parent node.
Repeat the same until the sending value is fixed into a node.
 Step 6 - If the splitting is performed at root node then the middle value becomes new root node for the tree
and the height of the tree is increased by one.
Delete: 64, 23, 72, 65, 20, 70, 95, 77
Index Structures.pptx

Mais conteúdo relacionado

Mais procurados

Data structure & its types
Data structure & its typesData structure & its types
Data structure & its typesRameesha Sadaqat
 
Data structure power point presentation
Data structure power point presentation Data structure power point presentation
Data structure power point presentation Anil Kumar Prajapati
 
Binary search tree operations
Binary search tree operationsBinary search tree operations
Binary search tree operationsKamran Zafar
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systemsUsman Tariq
 
Binary search tree in data structures
Binary search tree in  data structuresBinary search tree in  data structures
Binary search tree in data structureschauhankapil
 
Application of Data structure
Application of Data structureApplication of Data structure
Application of Data structureDeepika051991
 
Linked stacks and queues
Linked stacks and queuesLinked stacks and queues
Linked stacks and queuesRamzi Alqrainy
 
Linear Data Structures - List, Stack and Queue
Linear Data Structures - List, Stack and QueueLinear Data Structures - List, Stack and Queue
Linear Data Structures - List, Stack and QueueSelvaraj Seerangan
 
Lecture 14 splay tree
Lecture 14 splay treeLecture 14 splay tree
Lecture 14 splay treeAbirami A
 
Introduction to data structure
Introduction to data structure Introduction to data structure
Introduction to data structure NUPOORAWSARMOL
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMSkoolkampus
 

Mais procurados (20)

Data structure & its types
Data structure & its typesData structure & its types
Data structure & its types
 
Binary search tree(bst)
Binary search tree(bst)Binary search tree(bst)
Binary search tree(bst)
 
Data structure power point presentation
Data structure power point presentation Data structure power point presentation
Data structure power point presentation
 
Graph data structure and algorithms
Graph data structure and algorithmsGraph data structure and algorithms
Graph data structure and algorithms
 
Binary search trees
Binary search treesBinary search trees
Binary search trees
 
Binary search tree operations
Binary search tree operationsBinary search tree operations
Binary search tree operations
 
Distributed database management systems
Distributed database management systemsDistributed database management systems
Distributed database management systems
 
Indexing Data Structure
Indexing Data StructureIndexing Data Structure
Indexing Data Structure
 
Binary search tree in data structures
Binary search tree in  data structuresBinary search tree in  data structures
Binary search tree in data structures
 
Application of Data structure
Application of Data structureApplication of Data structure
Application of Data structure
 
Tree and graph
Tree and graphTree and graph
Tree and graph
 
Extensible hashing
Extensible hashingExtensible hashing
Extensible hashing
 
Linked stacks and queues
Linked stacks and queuesLinked stacks and queues
Linked stacks and queues
 
Linked lists
Linked listsLinked lists
Linked lists
 
Trees
Trees Trees
Trees
 
Linear Data Structures - List, Stack and Queue
Linear Data Structures - List, Stack and QueueLinear Data Structures - List, Stack and Queue
Linear Data Structures - List, Stack and Queue
 
Lecture 14 splay tree
Lecture 14 splay treeLecture 14 splay tree
Lecture 14 splay tree
 
Tree - Data Structure
Tree - Data StructureTree - Data Structure
Tree - Data Structure
 
Introduction to data structure
Introduction to data structure Introduction to data structure
Introduction to data structure
 
12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS12. Indexing and Hashing in DBMS
12. Indexing and Hashing in DBMS
 

Semelhante a Index Structures.pptx

Indexing techniques
Indexing techniquesIndexing techniques
Indexing techniquesHuda Alameen
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxpeter1097
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.pptSheejamolMathew
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptxRahulRoshan37
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management Systemsweetysweety8
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for filesZainab Almugbel
 
indexingstructureforfiles-160728120658.pdf
indexingstructureforfiles-160728120658.pdfindexingstructureforfiles-160728120658.pdf
indexingstructureforfiles-160728120658.pdfFraolUmeta
 

Semelhante a Index Structures.pptx (20)

Indexing techniques
Indexing techniquesIndexing techniques
Indexing techniques
 
Unit08 dbms
Unit08 dbmsUnit08 dbms
Unit08 dbms
 
Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
 
Ardbms
ArdbmsArdbms
Ardbms
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
 
A41001011
A41001011A41001011
A41001011
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Relational Database Management System
Relational Database Management SystemRelational Database Management System
Relational Database Management System
 
Indexing structure for files
Indexing structure for filesIndexing structure for files
Indexing structure for files
 
indexingstructureforfiles-160728120658.pdf
indexingstructureforfiles-160728120658.pdfindexingstructureforfiles-160728120658.pdf
indexingstructureforfiles-160728120658.pdf
 
indexing and hashing
indexing and hashingindexing and hashing
indexing and hashing
 
Makalah if2091-2011-020
Makalah if2091-2011-020Makalah if2091-2011-020
Makalah if2091-2011-020
 
B+ trees and height balance tree
B+ trees and height balance treeB+ trees and height balance tree
B+ trees and height balance tree
 
DMBS Indexes.pptx
DMBS Indexes.pptxDMBS Indexes.pptx
DMBS Indexes.pptx
 
Data storage and indexing
Data storage and indexingData storage and indexing
Data storage and indexing
 
Lucene
LuceneLucene
Lucene
 
Unit 08 dbms
Unit 08 dbmsUnit 08 dbms
Unit 08 dbms
 

Último

The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 

Último (20)

The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 

Index Structures.pptx

  • 1. Index Structures Dr M Mahesh Kumar
  • 2. Index Structures  It is not sufficient simply to scatter the records that represent tuples of a relation among various blocks. (SELECT * FROM R)  An index is any data structure that takes the value of one or more fields and finds the records with that value “quickly.” In particular, an index lets us find a record without having to look at more than a small fraction of all possible records.  The field(s) on whose values the index is based is called the search key, or just “key” if the index is understood.  Indexes help to speed up queries that specify values for one or more attributes.
  • 3. Basics of Index Structures  Storage structures consist of files, which are similar to the files used by operating systems. A data file may be used to store a relation, for example. The data file may have one or more index files.  Each index file associates values of the search key with pointers to data-file records that have that value for the attribute(s) of the search key.  Indexes are of two types:  Dense: There is an entry in the index file for every record of the data file  Sparse: Only some of the data records are represented in the index, often one index entry per block of the data file.  Indexes can also be  Primary: Can determine the location of the records of the data file.  Secondary: Cannot determine the location of the records of the data file.  For example, it is common to create a primary index on the primary key of a relation and to create secondary indexes on some of the other attributes.
  • 4. Sequential files and Dense index  A sequential file is created by sorting the tuples of a relation by their primary key. The tuples are then distributed among blocks, in this order.  If records are sorted, we can build on them a dense index, which is a sequence of blocks holding only the keys of the records and pointers to the records themselves
  • 5. Sequential files and Dense index  The dense index supports queries that ask for records with a given search key value. Given key value K , we search the index blocks for K , and when we find it, we follow the associated pointer to the record with key K .  It might appear that we need to examine every block of the index, or half the blocks of the index, on average, before we find K . However, there are several factors that make the index-based search more efficient than it seems. 1. The number of index blocks is usually small compared with the number of data blocks. 2. Since keys are sorted, we can use binary search to find K . If there are n blocks of the index, we only look at log2 n of them. 3. The index may be small enough to be kept permanently in main memory buffers. If so, the search for key K involves only main-memory accesses, and there are no expensive disk I/O ’s to be performed.
  • 6. Sparse index  A sparse index typically has only one key-pointer pair per block of the data file. It thus uses less space than a dense index, at the expense of somewhat more time to find a record given its key.  You can only use a sparse index if the data file is sorted by the search key, while a dense index can be used for any search key.
  • 7. Searching using Sparse index  To find the record with search-key value K , we search the sparse index for the largest key less than or equal to K .  Since the index file is sorted by key, a binary search can locate this entry. We follow the associated pointer to a data block.  Now, we must search this block for the record with key K . Of course the block must have enough format information that the records and their contents can be identified.
  • 8. Multiple Levels of Index  An index file can cover many blocks. Even if we use binary search to find the desired index entry, we still may need to do many disk I/O ’s to get to the record we want. By putting an index on the index, we can make the use of the first level of index more efficient.
  • 9. Secondary Indexes  A secondary index serves the purpose of any index: it is a data structure that facilitates finding records given a value for one or more fields.  However, the secondary index is distinguished from the primary index in that a secondary index does not determine the placement of records in the data file.  Rather, the secondary index tells us the current locations of records; that location may have been decided by a primary index on some other field.  An important consequence of the distinction between primary and secondary indexes is that: • Secondary indexes are always dense. It makes no sense to talk of a sparse, secondary index. Since the secondary index does not influence location, we could not use it to predict the location of any record whose key was not mentioned in the index file explicitly.
  • 10. Secondary Indexes  The keys in the index file are sorted. The result is that the pointers in one index block can go to many different data blocks, instead of one or a few consecutive blocks.  For example, to retrieve all the records with search key 20, we not only have to look at two index blocks, but we are sent by their pointers to three different data blocks. Thus, using a secondary index may result in many more disk I/O ’s than if we get the same number of records via a primary index.  However, there is no help for this problem; we cannot control the order of tuples in the data block, because they are presumably ordered according to some other attribute(s).
  • 11. Indirection in Secondary Indexes  There is some wasted space, perhaps a significant amount of wastage, in the structure. If a search- key value appears n times in the data file, then the value is written n times in the index file. It would be better if we could write the key value once for all the pointers to data records with that value.  A convenient way to avoid repeating values is to use a level of indirection, called buckets, between the secondary index file and the data file.
  • 12. Document Retrieval and Inverted Indexes  With the advent of the World-Wide Web and the feasibility of keeping all documents on-line, the retrieval of documents given keywords has become one of the largest database problems. While there are many kinds of queries that one can use to find relevant documents, the simplest and most common form can be seen in relational terms as follows:  A document may be thought of as a tuple in a relation Doc. This relation has very many attributes, one corresponding to each possible word in a document. Each attribute is boolean — either the word is present in the document, or it is not. Thus, the relation schema may be thought of as Doc(hasCat, hasDog, ... ) where hasCat is true if and only if the document has the word “cat” atleast once.  There is a secondary index on each of the attributes of Doc. However, we save the trouble of indexing those tuples for which the value of the attribute is FALSE; instead, the index leads us to only the documents for which the word is present. That is, the index has entries only for the search-key value TRUE.  Instead of creating a separate index for each attribute (i.e., for each word), the indexes are combined into one, called an inverted index. This index uses indirect buckets for space efficiency, as was discussed in Section
  • 13. Document Retrieval and Inverted Indexes
  • 14. Document Retrieval and Inverted Indexes
  • 15. B Trees  While one or two levels of index are often very helpful in speeding up queries, there is a more general structure that is commonly used in commercial systems.  This family of data structures is called B-trees, and the particular variant that is most often used is known as a B+ tree.  B-trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.  B-trees manage the space on the blocks they use so that every block is between half used and completely full.
  • 16. The Structure of B-Trees  A B-tree organizes its blocks into a tree that is balanced, meaning that all paths from the root to a leaf have the same length.  Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.
  • 17. The Structure of B-Trees  There is a parameter n associated with each B-tree index, and this parameter determines the layout of all blocks of the B-tree. Each block will have space for n search-key values and n + 1 pointers. Example: Finding the number of data values stored on B-Tree  Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that 4n + 8(n + 1) < 4096. That value is n = 340.
  • 18. The Structure of B-Trees  There are several important rules about what can appear in the blocks of a B-tree: 1. The keys in leaf nodes are copies of keys from the data file. These keys are distributed among the leaves in sorted order, from left to right. 2. At the root, there are at least two used pointers. All pointers point to B-tree blocks at the level below. 3. At a leaf, the last pointer points to the next leaf block to the right, i.e., to the block with the next higher keys. Among the other n pointers in a leaf block, at least ceil((n + 1)/2) of these pointers are used and point to data records; unused pointers are null and do not point anywhere. The ith pointer, if it is used, points to a record with the ith key.
  • 19. The Structure of B-Trees
  • 20. Properties of B-Trees  B-Tree of Order m has the following properties...  Property #1 - All leaf nodes must be at same level.  Property #2 - All nodes except root must have at least [m/2]-1 keys and maximum of m-1 keys.  Property #3 - All non leaf nodes except root (i.e. all internal nodes) must have at least m/2 children.  Property #4 - If the root node is a non leaf node, then it must have atleast 2 children.  Property #5 - A non leaf node with n-1 keys must have n number of children.  Property #6 - All the key values in a node must be in Ascending Order.
  • 21.
  • 22. Insertion Operation in B-Tree  In a B-Tree, a new element must be added only at the leaf node. That means, the new keyValue is always attached to the leaf node only. The insertion operation is performed as follows...  Step 1 - Check whether tree is Empty.  Step 2 - If tree is Empty, then create a new node with new key value and insert it into the tree as a root node.  Step 3 - If tree is Not Empty, then find the suitable leaf node to which the new key value is added using Binary Search Tree logic.  Step 4 - If that leaf node has empty position, add the new key value to that leaf node in ascending order of key value within the node.  Step 5 - If that leaf node is already full, split that leaf node by sending middle value to its parent node. Repeat the same until the sending value is fixed into a node.  Step 6 - If the splitting is performed at root node then the middle value becomes new root node for the tree and the height of the tree is increased by one.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. Delete: 64, 23, 72, 65, 20, 70, 95, 77

Notas do Editor

  1. 1