1. Rushdi Shams, Dept of CSE, KUET 1
Operating Systems
File Systems
Version 1.0
Introduction
• A process can store only a limited amount of information within its own address space.
• For some applications this is adequate, but for others (airline reservations, banking) the space is far too small.
• Information must survive the termination of the process using it. Values kept in a process's variables are lost when the process terminates and releases its memory.
• This is unacceptable for applications such as database systems.
• Multiple processes must be able to access the information concurrently. Information kept inside one process's address space is accessible only to that process.
• So, we have three essential requirements for long-term information storage:
1. It must be possible to store a large amount of information
2. The information must be stored for a long time, surviving the termination of the process using it
3. Multiple processes must be able to access the information concurrently
• These requirements lead us to keep the information in files
• Information in files must be persistent: process creation and termination must not affect it
• Only the owner should be able to modify or delete the information, and a file disappears only when its owner explicitly removes it.
File Naming
When a process creates a file, it gives the file a name.
When the process terminates, the file continues to exist and can be accessed by other processes using its name.
Some file systems distinguish between upper and lower case letters, whereas others do not.
Are the file names cat.doc, cAt.doc, caT.doc and CAT.doc the same in Windows? What about in UNIX?
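The question above can be explored with a small sketch. It assumes the usual defaults: UNIX file systems compare names case-sensitively, while Windows file systems compare them case-insensitively. The helper `same_name` is hypothetical, and `str.casefold` only stands in for the case-folding rules a real file system applies.

```python
def same_name(a: str, b: str, case_sensitive: bool) -> bool:
    """Compare two file names under a case-sensitive or case-insensitive policy."""
    if case_sensitive:
        return a == b
    # casefold() is a stand-in for a file system's own case-folding table.
    return a.casefold() == b.casefold()


names = ["cat.doc", "cAt.doc", "caT.doc", "CAT.doc"]

# How many distinct files do the four names denote under each policy?
distinct_sensitive = len(set(names))                       # UNIX-style comparison
distinct_insensitive = len({n.casefold() for n in names})  # Windows-style comparison

print(distinct_sensitive, distinct_insensitive)
```

Under the case-sensitive policy the four names are four different files; under the case-insensitive policy they all denote one file.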
Many operating systems support two-part file names, with the two parts separated by a period (.).
The part after the period is called the file extension.
File extensions can be considered a type of metadata.
They are commonly used to infer how the data is stored in the file.
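As a brief illustration of two-part names, Python's standard `os.path.splitext` splits a name at its last period. The sample file names are made up for the example.

```python
import os.path

# Split a few sample names into base name and extension.
results = {}
for name in ["cat.doc", "prog.c", "archive.tar.gz", ".login"]:
    base, ext = os.path.splitext(name)
    results[name] = (base, ext)
    print(f"{name!r}: base={base!r}, extension={ext!r}")
```

Only the last period counts, so `archive.tar.gz` has the extension `.gz`, and a leading period (as in `.login`) is not treated as the start of an extension.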
File Structure
Generally, a file can be structured in one of three ways:
1. Byte sequence
2. Record sequence
3. Tree
File Types
Generally, there are two types of files:
1. ASCII files
2. Binary files
ASCII files can be read and edited with any text editor, whereas binary files look like complete gibberish if you print them
File Access Strategies
Sequential access
read all bytes/records in order, from the beginning
cannot jump around, but can rewind or back up
Random access
bytes/records can be read in any order
essential for database systems
a read can either move the file marker (seek) and then read, or read and then move the file marker
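The two strategies can be sketched with an in-memory file. This is only an illustration: `io.BytesIO` stands in for a disk file, and the 4-byte record size and the helper `read_record` are assumptions made for the example.

```python
import io

RECORD_SIZE = 4  # assumed fixed record length, for illustration only
f = io.BytesIO(b"AAAABBBBCCCCDDDD")  # stand-in for a file of four records

# Sequential access: read every record in order from the beginning.
sequential = []
while True:
    record = f.read(RECORD_SIZE)
    if not record:
        break
    sequential.append(record)

f.seek(0)  # "rewind" back to the start


def read_record(fileobj, index):
    """Random access: move the file marker (seek), then read."""
    fileobj.seek(index * RECORD_SIZE)
    return fileobj.read(RECORD_SIZE)


third = read_record(f, 2)  # jump straight to the third record
first = read_record(f, 0)  # then back to the first
print(sequential, third, first)
```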
There are also several common file usage patterns.
Most files are small (e.g., .login and .c files), and most file references are to small files.
Large files use up most of the disk space (e.g., mp3 files).
Large files account for most of the bytes transferred between memory and disk.
These usage patterns are bad news for file system
designers.
To achieve high performance, a designer needs to
make sure that small files are accessed efficiently,
since there are many of them, and they are used
frequently.
A designer also needs to make sure that large files are
accessed efficiently, since they consume most of the
disk space, and account for most of the data
movement.
Directories
To keep track of files, file systems normally have
directories or folders
In many systems directories themselves are files!
Simply put, there are three types of directory
systems-
1. Single-level directory systems
2. Two-level directory systems
3. Hierarchical directory systems
Single-level directory system
Simplest form of directory system
One directory contains all the files
Sometimes called the root directory
A good choice when there is only one user
The directory in the figure holds four files belonging to two owners, A and B
Problem: different users may accidentally use the same names for their files
Two-level directory system
Each user has a separate directory containing their own files
The naming problem is resolved.
A login procedure is required
The system may allow one user to access other users' files
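The naming problem and its fix can be sketched with plain dictionaries. This is a toy model, not how a real file system stores directories; the users `A` and `B` and the file name `notes.txt` are illustrative.

```python
# Single-level: one directory maps file name -> contents for everybody.
single_level = {}
single_level["notes.txt"] = "written by user A"
single_level["notes.txt"] = "written by user B"  # same name: A's file is overwritten

# Two-level: one directory per user, each mapping file name -> contents.
two_level = {"A": {}, "B": {}}
two_level["A"]["notes.txt"] = "written by user A"
two_level["B"]["notes.txt"] = "written by user B"  # no clash: separate directories

print(single_level["notes.txt"])
print(two_level["A"]["notes.txt"], "|", two_level["B"]["notes.txt"])
```

The outer key in `two_level` is the reason a login procedure is needed: the system has to know which user's directory to search.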
Hierarchical Directory Systems
The problem with a two-level directory system is the absence of a grouping facility
You may want to group your work, e.g. games in one place, movies in another.
Hierarchical directory systems offer everything two-level directory systems do, plus this ability to organize files into groups
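A hierarchical directory can be modelled as nested dictionaries, with a path resolved one component at a time. The tree contents and the helper `lookup` below are hypothetical and only sketch the idea.

```python
# Directories are dicts; anything else is treated as file contents.
root = {
    "games": {"chess.exe": "chess program"},
    "movies": {"2020": {"trip.mp4": "holiday video"}},
}


def lookup(tree, path):
    """Walk the tree along a /-separated path and return what is found there."""
    node = tree
    for part in (p for p in path.split("/") if p):
        if not isinstance(node, dict) or part not in node:
            raise FileNotFoundError(path)
        node = node[part]
    return node


print(lookup(root, "/movies/2020/trip.mp4"))
print(sorted(lookup(root, "/")))
```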
File System Layout
File systems are stored on disks
Disks are divided up into one or more partitions
Each partition can hold an independent file system
Sector 0 of the disk is called the Master Boot Record (MBR)
The computer uses the MBR to boot itself
The end of the MBR contains the partition table, which gives the start and end address of each partition
One of the partitions in the partition table is marked as active
When the computer is booted, the BIOS reads in and executes the MBR
The first thing the MBR program does is locate the active partition
The first block of the active partition is the boot block. The MBR program reads it in and executes it.
The program in the boot block loads the OS contained in that partition
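The search for the active partition can be sketched as follows. The sketch assumes the classic PC MBR layout, which the slides do not spell out: a 512-byte sector, four 16-byte partition entries starting at byte 446, a status byte of 0x80 for the active partition, and the signature bytes 0x55 0xAA at the end. Each entry in that layout stores a starting sector and a sector count, from which the end address follows. The helper names and the synthetic MBR are made up for the example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446         # the partition table sits near the end of the MBR
ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3sB3sII" # status, CHS start, type, CHS end, first sector, sector count
ACTIVE = 0x80


def parse_partitions(mbr):
    """Return the non-empty partition entries of an MBR sector."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, _, ptype, _, first, count = struct.unpack(
            ENTRY_FORMAT, mbr[start:start + ENTRY_SIZE])
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append({"index": i, "active": status == ACTIVE,
                               "first_sector": first,
                               "last_sector": first + count - 1})
    return partitions


def active_partition(mbr):
    """What the MBR program does first: locate the active partition."""
    for partition in parse_partitions(mbr):
        if partition["active"]:
            return partition
    return None


def make_entry(active, ptype, first, count):
    """Build one synthetic table entry (CHS fields left as zeros)."""
    return struct.pack(ENTRY_FORMAT, ACTIVE if active else 0,
                       b"\0" * 3, ptype, b"\0" * 3, first, count)


# Synthetic MBR with two partitions; the second is marked active.
mbr = bytearray(SECTOR_SIZE)
mbr[446:462] = make_entry(False, 0x83, 2048, 1000)  # 0x83: arbitrary non-zero type
mbr[462:478] = make_entry(True, 0x83, 4096, 2000)
mbr[510:512] = b"\x55\xaa"

boot = active_partition(bytes(mbr))
print(boot)
```

The boot block to load would then be the first sector of the partition returned, `boot["first_sector"]`.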
Every partition starts with a boot block, even if it does not contain a bootable OS
Only the partition holding the OS to be booted is marked as active
The other partitions still have a boot block because you may change your mind and install an OS in one of them later
The superblock contains all the key parameters of the file system
It is read into memory when the computer is booted or when the file system is first touched
It typically holds a magic number that identifies the file system type (not the type of an individual file such as .exe or .bmp), the number of blocks in the file system, etc.
Free space management information identifies the free blocks in the file system
The i-nodes are an array of data structures, one per file, each describing its file
The root directory contains the top of the file system tree
The remainder of the disk contains all the other directories and files of that partition
Implementing Files:
Contiguous Allocation
The simplest allocation scheme is to store each file as a contiguous run of disk blocks
On a disk with 1 KB blocks, a 50 KB file needs 50 consecutive blocks; on a disk with 2 KB blocks, it needs 25 consecutive blocks
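The block counts above come from a ceiling division, sketched below. The helper `blocks_needed` is a hypothetical name, and sizes are given in KB as on the slide.

```python
def blocks_needed(file_size_kb, block_size_kb):
    """Number of consecutive blocks a file occupies (rounded up to whole blocks)."""
    return -(-file_size_kb // block_size_kb)  # ceiling division on integers


print(blocks_needed(50, 1))  # 1 KB blocks
print(blocks_needed(50, 2))  # 2 KB blocks
print(blocks_needed(51, 2))  # a partly filled last block still takes a whole block
```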
Advantages
Simple to implement
You only need to know the disk address of the first block and the number of blocks in the file; any other block is found by adding its offset to the first address
Read performance is excellent: only one seek (to the first block) is needed, after which the whole file can be read in one operation (one contiguous stream of bytes)
Disadvantages
The disk becomes fragmented as files are removed
Storage is wasted when the free holes are too small to be reused
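The fragmentation problem can be seen in a toy allocator. This is a sketch under stated assumptions: a 10-block disk, a free map held as a list of booleans, and a first-fit search; none of these details come from the slides.

```python
disk = [True] * 10  # True means the block is free


def find_run(free_map, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run_start, run_len = None, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                return run_start
        else:
            run_len = 0
    return None


def allocate(n):
    """Mark n consecutive free blocks as used and return the first block number."""
    start = find_run(disk, n)
    if start is not None:
        for i in range(start, start + n):
            disk[i] = False
    return start


def release(start, n):
    """Give the n blocks starting at `start` back to the free map."""
    for i in range(start, start + n):
        disk[i] = True


a = allocate(3)  # blocks 0-2
b = allocate(3)  # blocks 3-5
c = allocate(3)  # blocks 6-8
release(a, 3)
release(c, 3)

free_blocks = sum(disk)  # 7 blocks are free in total ...
d = allocate(5)          # ... but no hole is 5 blocks long
print(free_blocks, d)
```

Seven blocks are free, yet a 5-block file cannot be stored because the free space is split into holes of 3 and 4 blocks.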
Implementing Files:
Linked List Allocation
Every block has two sections: a pointer to the next block (the first word of the block) and data (the rest of the words)
Unlike contiguous allocation, every disk block can be used in this method
No space is lost to disk fragmentation (except for internal fragmentation in the last block of a file)
Reading a file sequentially is straightforward, but random access is slow.
To get to block n, the OS has to start at the beginning and read the n-1 blocks prior to it
The pointer takes up space in every block, so slightly less than a full block is available for data
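The cost of random access can be sketched by counting block reads. The disk contents below are invented: each block number maps to a pair of (next block, data), and the file occupies blocks 4, 7, 2, 10 and 12 in that order.

```python
# block number -> (pointer to next block or None, data stored in the block)
disk = {4: (7, "A1"), 7: (2, "A2"), 2: (10, "A3"), 10: (12, "A4"), 12: (None, "A5")}


def read_block(first_block, n):
    """Return (data, disk_reads) for the n-th block of a file, counting from 1."""
    current = first_block
    reads = 0
    for _step in range(n - 1):
        # Each earlier block must be read just to learn where the next one is.
        next_block, _unused = disk[current]
        reads += 1
        if next_block is None:
            raise IndexError("file has fewer blocks than requested")
        current = next_block
    _pointer, data = disk[current]
    reads += 1
    return data, reads


print(read_block(4, 1))
print(read_block(4, 5))
```

Reaching block n costs n reads in total: the n-1 blocks before it, plus the block itself.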
Linked List Allocation:
using Table in Memory
Both disadvantages are eliminated by taking the pointer word from each block and putting it into a table in memory
Such a table in main memory is called a FAT (File Allocation Table)
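A minimal sketch of such a table, assuming a 16-block disk and a file occupying blocks 4, 7, 2, 10 and 12 (both invented for illustration). The markers `END` and `UNUSED` are hypothetical values, not those of any real FAT format.

```python
END = -1       # marks the last block of a file
UNUSED = None  # entry for a block that belongs to no file

# One table entry per disk block: fat[b] is the block that follows block b.
fat = [UNUSED] * 16
for block, next_block in [(4, 7), (7, 2), (2, 10), (10, 12), (12, END)]:
    fat[block] = next_block


def file_blocks(first_block):
    """Follow the chain in the in-memory table; no disk reads are needed."""
    chain = []
    block = first_block
    while block != END:
        chain.append(block)
        block = fat[block]
    return chain


print(file_blocks(4))
```

The chain is still a linked list, but it is walked entirely in memory, and the disk blocks themselves hold nothing but data.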
The entire block is available for data (no pointer information)
Random access is faster: the chain is followed in memory, without any disk reads
The primary disadvantage:
the table must be in memory all the time to make it work
with a 20 GB disk and a 1 KB block size, the table needs 20 million entries, one for each of the 20 million blocks
if each entry is 4 bytes long, the FAT occupies 80 MB of memory all the time
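The arithmetic can be checked in a few lines. The sketch uses decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), which is the rounding the round figures on the slide imply.

```python
DISK_BYTES = 20 * 10**9  # 20 GB disk
BLOCK_BYTES = 10**3      # 1 KB blocks
ENTRY_BYTES = 4          # size of one table entry

entries = DISK_BYTES // BLOCK_BYTES  # one entry per disk block
table_bytes = entries * ENTRY_BYTES  # memory held by the table at all times

print(f"{entries:,} entries, {table_bytes / 10**6:.0f} MB")
```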
i-nodes
An i-node is a data structure that lists the attributes of a file and the disk addresses of the file's blocks
Given the i-node, it is possible to find all the blocks of the file
An i-node need only be in memory when the corresponding file is open
If each i-node occupies n bytes and at most k files are open at a time, only kn bytes need to be reserved in memory
The array of in-memory i-nodes is usually far smaller than the space occupied by the FAT
The table holding the linked list is proportional in size to the disk itself.
If the disk has n blocks, the table needs n entries
As disks grow, the table grows linearly with them
The i-node scheme requires an array whose size is proportional to the maximum number of files that may be open at once (it does not matter if you have a 1 petabyte HDD)
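The difference in growth can be made concrete with a small comparison. The figures of 128 bytes per i-node and 1,000 open files are assumptions chosen only for illustration; the 4-byte table entry and 20 million blocks match the earlier FAT example.

```python
def fat_bytes(disk_blocks, entry_bytes=4):
    """Memory for a FAT: one entry per disk block."""
    return disk_blocks * entry_bytes


def inode_table_bytes(open_files, inode_bytes):
    """Memory for in-core i-nodes: k open files times n bytes each."""
    return open_files * inode_bytes


small_disk = 20_000_000       # blocks on the 20 GB disk with 1 KB blocks
large_disk = 2 * small_disk   # a disk twice as large

print(fat_bytes(small_disk), fat_bytes(large_disk))  # doubles with the disk
print(inode_table_bytes(1000, 128))                  # independent of disk size
```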
The disadvantage: an i-node of fixed size has room for only a fixed number of disk addresses, which becomes a problem when a file grows beyond that limit (you cannot assume a file will always stay at, say, 6 MB)
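The limit can be sketched with a toy i-node. The class `FixedInode`, its 10 address slots and the 1 KB block size are all hypothetical; the sketch shows only the problem, not any particular remedy.

```python
class FixedInode:
    """Toy i-node with room for a fixed number of disk block addresses."""

    def __init__(self, slots):
        self.slots = slots
        self.addresses = []

    def add_block(self, disk_address):
        """Record one more block of the file, if there is still room."""
        if len(self.addresses) >= self.slots:
            raise OverflowError("i-node has no room for another block address")
        self.addresses.append(disk_address)

    def max_file_bytes(self, block_bytes):
        """Largest file this i-node can describe."""
        return self.slots * block_bytes


inode = FixedInode(slots=10)
for address in range(100, 110):  # the file grows to 10 blocks
    inode.add_block(address)

print(inode.max_file_bytes(1024))  # limit with 1 KB blocks
try:
    inode.add_block(110)           # the 11th block does not fit
except OverflowError as error:
    print(error)
```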