Thursday, September 1, 2016

File Organization



A file organization is essentially the way records are arranged in a file. The efficiency and utility of one file organization over another depend very much on the intended use of the file. Some of the basic file organization techniques are:

Serial File

In a serial file, records are stored in the order they arrive. When a new record is to be inserted, it is placed in the last block if there is space. If the last block cannot accommodate the record, a new block is allocated and the record is placed there. Deletion is performed by setting a flag, called the deletion bit, in the deleted record. Reusing the space of deleted records to store newly inserted records is difficult; however, "garbage collection" routines can be used to keep track of the deleted records and the reusable space.

Since the records in a serial file are collected as they arrive, without any prior analysis, categorization or normalization, searching for a specific record in a serial file often requires an exhaustive search of the entire file.

Thus the time required to locate a record in a serial file is very high if the file is spread over more than a few blocks. However, insertion of a new record into a serial file is less time consuming: the address of the end of the file is known, so the data is simply stored at the end and the end-of-file marker is shifted appropriately. Since updating a record in a serial file requires locating the desired record first, the updating cost is also quite high for such a file. In fact, the high retrieval cost of serial files is responsible for their restricted use. The serial file organization is generally used for small files, or when the data is collected prior to processing and is difficult to organize.
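
The behaviour described above can be captured in a short sketch. The following Python fragment is purely illustrative: the class name SerialFile, the block size and the record layout are assumptions, with an in-memory list of lists standing in for disk blocks.

```python
BLOCK_SIZE = 4  # assumed records per block, kept small for illustration

class SerialFile:
    def __init__(self):
        self.blocks = [[]]  # each inner list stands in for one disk block

    def insert(self, record):
        # Place the record in the last block if there is space;
        # otherwise allocate a new block and place it there.
        if len(self.blocks[-1]) >= BLOCK_SIZE:
            self.blocks.append([])
        self.blocks[-1].append({"data": record, "deleted": False})

    def delete(self, key):
        # Deletion only sets the "deletion bit"; the space is
        # reclaimed later by a garbage-collection pass.
        for block in self.blocks:
            for rec in block:
                if not rec["deleted"] and rec["data"]["key"] == key:
                    rec["deleted"] = True
                    return True
        return False

    def find(self, key):
        # Searching is an exhaustive scan of every block.
        for block in self.blocks:
            for rec in block:
                if not rec["deleted"] and rec["data"]["key"] == key:
                    return rec["data"]
        return None

f = SerialFile()
f.insert({"key": 42, "name": "widget"})
print(f.find(42))   # {'key': 42, 'name': 'widget'}
```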



Sequential File

In a sequential file, records are placed sequentially onto the storage medium. In addition, the records in the file are usually ordered according to the values of key attributes. The key is used to identify each record uniquely. In a sequential search, since the records are examined in key sequence, in the worst case one may have to access all n blocks of the file. In a sequential file the successor record is immediately available and may even be present in the same block.

The insertion of a record in a sequential file is a complicated process. Normally the key sequence would not be maintained if the new record were simply inserted at the end. For small files, records beyond the point of insertion can be moved further down to accommodate the new record. Insertions are usually done in batches, where all the insertion requests are collected in a separate file and a batch update is executed at a later time.
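
A rough illustration of such a batch update in Python: the sketch below merges a sorted batch of insertion requests into the sorted main file in a single pass. The record layout and the name batch_update are assumptions.

```python
import heapq

def batch_update(main_file, insertions):
    # Sort the pending insertions, then merge them with the already
    # sorted main file in one sequential pass (the file is rewritten).
    insertions = sorted(insertions, key=lambda r: r["key"])
    return list(heapq.merge(main_file, insertions, key=lambda r: r["key"]))

main_file = [{"key": 10}, {"key": 30}, {"key": 50}]
pending = [{"key": 40}, {"key": 20}]
main_file = batch_update(main_file, pending)
print([r["key"] for r in main_file])   # [10, 20, 30, 40, 50]
```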

Deletion of records can be performed by setting the deletion flag in the record to be deleted. When several records have been deleted, the file is reorganized to eliminate the gaps created by the deletions.
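
Such a reorganization pass can be sketched in a line of Python, assuming each record carries a deleted flag as in the serial-file sketch above:

```python
def reorganize(records):
    # Rewrite the file, dropping every record whose deletion flag is set.
    return [r for r in records if not r.get("deleted")]
```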

Finally, updating a record in a sequential file may involve all the complications of insertion. Note that the sequential file does not allow an updated record to be larger than the original record stored. If the key value is not changed and the updated record length does not exceed the original record length, the record may be rewritten in place in the original file. When the key value is changed or the record length increases, the update is handled like the insertion of a new record.

Sequential files are frequently used in batch-oriented commercial processing, where insertions, deletions and updates are generally carried out in batched mode. In this environment, a get-next type of command is frequently used to retrieve or process successive records. For instance, the accounts department of a company, while preparing salary statements, would normally process the pay slips of all the employees of a particular department in sequence. Hence, if the employee records are stored with Dept-no as the primary key, then by specifying the Dept-no, the first employee record in the department is retrieved. The remaining employee records in the same department can then be readily accessed, without much effort, using repeated get-next commands.
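
The following Python fragment sketches this get-next style of processing; the record layout and the field name dept_no are assumptions, and each loop iteration models one get-next call.

```python
from bisect import bisect_left

def employees_in_dept(records, dept_no):
    # records are ordered by dept_no; locate the first record of the
    # department, then read successive records until the key changes.
    keys = [r["dept_no"] for r in records]
    i = bisect_left(keys, dept_no)
    while i < len(records) and records[i]["dept_no"] == dept_no:
        yield records[i]   # one get-next call per iteration
        i += 1

employees = [
    {"dept_no": 1, "name": "Asha"},
    {"dept_no": 2, "name": "Bimal"},
    {"dept_no": 2, "name": "Chandra"},
    {"dept_no": 3, "name": "Dilan"},
]
for emp in employees_in_dept(employees, 2):
    print(emp["name"])   # Bimal, Chandra
```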

In spite of this simplicity and the advantages mentioned above, sequential files are not suitable in an environment where data has to be retrieved or updated in a more or less random order.



Indexed Sequential File

The indexed sequential method of file organization attempts to reduce the access problem inherent in sequential files, without losing the benefits of a sequential file organization.

This type of organization enables the user to process records either serially or in a random manner. The indexed sequential file, as the name suggests, is provided with indexes associated with the file. The indexes can be used to quickly locate any given record for random processing. Another important feature of an indexed sequential file is the use of an overflow area to handle insertions into the file. Unlike a sequential file, it is no longer necessary to rewrite the entire file when records are inserted or updated. An indexed sequential file has three major components, namely


  • Prime area (sequential file)
  • Overflow area
  • Index


The Prime Area
The prime area can be defined as the area in which the records are written when an indexed sequential file is originally created. The records in the prime area follow a strict key sequence; hence the prime area is essentially a sequential file. In order to insert a record into the file, a free area called an overflow area is associated with each file. It is to be noted that at the time an indexed sequential file is created, no records are placed in the overflow area.

The Overflow Area
It has already been mentioned that the overflow area is essentially used to store new records which cannot otherwise be inserted into the prime area without rewriting the sequential file. There are two types of overflow areas, called the cylinder overflow area and the independent overflow area. In a cylinder overflow area, spare space in every cylinder is reserved for accommodating the overflow records; thus the overflow area is on the same cylinder as the prime area. This arrangement has the advantage that locating an overflow record may incur rotational delay but no seek time. In an independent overflow area, overflow records from anywhere in the prime area may be placed. The size and address of the independent overflow area may be specified by the programmer. In this case additional seek time may be required.
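
A minimal in-memory sketch of these three components in Python follows. The class and field names are illustrative assumptions; real indexed sequential (ISAM) implementations differ in detail.

```python
from bisect import bisect_left

class IndexedSequentialFile:
    def __init__(self, sorted_records, block_size=4):
        # Prime area: records written in strict key sequence at load time.
        self.prime = [sorted_records[i:i + block_size]
                      for i in range(0, len(sorted_records), block_size)]
        # Index: the highest key in each prime block (a sparse index).
        self.index = [block[-1]["key"] for block in self.prime]
        # Overflow area: empty when the file is created.
        self.overflow = []

    def insert(self, record):
        # Instead of rewriting the prime area, new records go to overflow.
        self.overflow.append(record)

    def find(self, key):
        # The index identifies the one prime block that could hold the key.
        b = bisect_left(self.index, key)
        if b < len(self.prime):
            for rec in self.prime[b]:
                if rec["key"] == key:
                    return rec
        # Not in the prime area: search the overflow area.
        for rec in self.overflow:
            if rec["key"] == key:
                return rec
        return None

isf = IndexedSequentialFile([{"key": k} for k in (10, 20, 30, 40, 50)],
                            block_size=2)
isf.insert({"key": 35})   # placed in the overflow area
print(isf.find(35))       # found via the overflow search
```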



Hashed File

Hashing is a technique for providing fast direct access to a specific stored record on the basis of a given value of some field. The field in question is usually, but not necessarily, the primary key. In outline, the technique works as follows.


  • Each stored record is placed in the database at a location whose address (RID or page number) is computed as some function (the hash function) of some field of that record. The computed address is called the hash address.
  • To store a record initially, the DBMS computes the hash address for the new record and instructs the file manager to place the record at that position.
  • To retrieve the record subsequently, given the hash field value, the DBMS performs the same computation as before and instructs the file manager to fetch the record at the computed position.
  • When two or more keys of a file are mapped to the same address by the hash function, the event is referred to as a collision (a sketch of hash addressing follows this list).
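
As a minimal illustration of hash addressing, the sketch below uses a simple division-remainder hash; the number of blocks and the function name are assumptions, not taken from any particular DBMS.

```python
NUM_BLOCKS = 97  # assumed file size in blocks; a prime helps spread keys

def hash_address(key):
    # The hash function maps a key value to a relative block address.
    return key % NUM_BLOCKS

print(hash_address(2034))  # block where the record is stored and fetched
```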


Hashed files must have a mechanism for handling collisions. A collision occurs when two logical key values are mapped to the same physical address. Suppose we have a logical key value 2034, and we find that feeding this key value through the hash function yields the relative block address 12. A little later we try to insert a record with the key value 5678, and find that this also translates to relative block address 12. There is no problem if there is space in the block for the record, but as the size of the file grows, blocks are likely to fill up.

If the file manager cannot insert a record into the computed block, then a collision is said to have occurred. One of the simplest schemes for handling collisions is to declare an overflow area into which collided records are placed. As the hashed file grows, however, the number of records placed in the overflow area increases, and this can cause degradation in access performance. Various types of hashed files are now also available which dynamically resize themselves in response to update activity.
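
The overflow scheme just described might be sketched as follows, assuming a division-remainder hash, a small block capacity and a single shared overflow list; all parameters are illustrative.

```python
NUM_BLOCKS = 7
BLOCK_CAPACITY = 2

blocks = [[] for _ in range(NUM_BLOCKS)]
overflow = []  # shared overflow area for records whose home block is full

def insert(record):
    home = record["key"] % NUM_BLOCKS  # hash address of the home block
    if len(blocks[home]) < BLOCK_CAPACITY:
        blocks[home].append(record)
    else:
        overflow.append(record)  # collision: the home block is already full

def find(key):
    home = key % NUM_BLOCKS
    for rec in blocks[home]:
        if rec["key"] == key:
            return rec
    # The growth of this linear scan is what degrades access performance.
    for rec in overflow:
        if rec["key"] == key:
            return rec
    return None

for k in (3, 10, 17):   # 3, 10 and 17 all hash to block 3
    insert({"key": k})
print(find(17))          # the third record collided and sits in overflow
```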
