Tuesday, August 30, 2016

Multimedia and Hypermedia


What is Multimedia

Multimedia is the name given to the use of a combination of media types including still images, audio, video and motion graphics. Multimedia has moved from being an advanced technology to an essential feature of almost any PC. The main benefit is that it makes the whole user experience more dynamic, colorful, interactive and interesting.

Components of multimedia


  • Images - clip art, photos, user-designed graphics
  • Audio - sound clips (.wav), audio tracks (.mp3)
  • Video - standard video (.avi and .dat), Internet video (RealVideo, video conferencing files)
  • Motion clips - animated GIFs, QuickTime animations, MPEG clips


Advantages of multimedia applications


  • Tends to be more interactive
  • Preferred by inexperienced users
  • Preferred by children
  • Gives a whole new dimension to computing
  • Reduces boredom since multimedia applications are more interesting to use

Disadvantages of multimedia applications


  • Requires a large amount of system resources (memory, disk space, etc)
  • Users may lose focus of their actual intended tasks
  • Multimedia applications and systems take longer to load


What is Hypermedia

Hypermedia is a form of media that came about in support of the internet. It is based on a linked collection of web pages which in combination form a web resource or web site. Simple web pages are created by placing content descriptions within specific tags. Hypermedia includes the use of textual web pages, a link-based navigation system and other web-based content such as web graphics.

Web page
A web page is a single html (hypertext markup language) page.

Web site
A web site is a collection of interlinked web pages which contain content relevant to a particular subject or topic.

Hyperlink
A hyperlink is a tag based link which enables users to access other web pages or other web resources by simply clicking on it.
Eg: Clicking on a link at the right side of this website

Web address (URL)
A URL (uniform resource locator) is a simplified web address which enables users to access web resources such as web sites by simply entering their name with the appropriate extension, rather than a numeric address.



IP address
Internet protocol address is the assigned address of a machine which is connected to the internet.
Type "what is my ip" on the google search to see your IP address.

Download
The term can be used as either a noun or a verb. As a noun, a download is a program or file which is saved to one's machine over the net; as a verb, it describes that process.

Mp3
A groundbreaking innovation which enables the compression of audio into a web-enabled format with marginal loss of quality. The sound quality of an mp3 file is almost the same as that of a digital CD audio track, yet its file size is a fraction of its counterpart's, making it ideal for transferring audio over the internet.

Search engine
A type of web site that enables users to search for other web sites by simply entering a keyword. The search engine site typically has a back-end database, and using query tools it checks for matches to the user's request. If matches are found, they are listed along with links to and descriptions of the corresponding web sites.

Friday, August 26, 2016

Data Protection - Lecture Note VII (Concurrency)



  • Database concurrency and recovery are two inter-related subjects, both being parts of the more general topic of transaction processing.
  • Database systems typically provide multi-user access to a shared database. As such concurrency control is critical in ensuring that concurrent operations are carried out correctly and efficiently.


Concurrency Problems
There are three classic concurrency problems, that is, three types of potential error which can occur if concurrency control is not properly enforced in the database system.

The Lost Update Problem
This relates to a situation where two concurrent transactions, say A and B, both read and then update the same data item, say X, based on the value they originally read. The second update, by transaction B, overwrites the value written by the first update, by transaction A; consequently, A's update of X is lost following the second update by B.
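As a rough illustration, here is how the lost update can arise in SQL, assuming a hypothetical Account table with columns Id and Balance (Balance = 100 for Id = 1 to start); the two transactions run in separate sessions and time runs top to bottom:

-- Time T1, Transaction A:
SELECT Balance FROM Account WHERE Id = 1;        -- A reads 100

-- Time T2, Transaction B:
SELECT Balance FROM Account WHERE Id = 1;        -- B also reads 100

-- Time T3, Transaction A:
UPDATE Account SET Balance = 110 WHERE Id = 1;   -- A adds 10 to the value it read
COMMIT;

-- Time T4, Transaction B:
UPDATE Account SET Balance = 120 WHERE Id = 1;   -- B adds 20 to the stale value it read
COMMIT;                                          -- final Balance is 120; A's update (+10) is lost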



Uncommitted Dependency

  • The uncommitted dependency problem arises if a transaction is allowed to retrieve - or, worse, update - a tuple that has been updated by another transaction but has not yet been committed by that other transaction.
  • Because it has not yet been committed, there is always a possibility that it never will be committed but will be rolled back instead - in which case the first transaction will have seen data that now no longer exists.
  • According to the above diagram, transaction A sees an uncommitted update at time T2. That update is then undone at time T3. Transaction A is therefore operating on a false assumption - namely, the assumption that tuple P has the value seen at time T2, whereas in fact it has whatever value it had prior to time T1. As a result, transaction A might well produce incorrect output.
  • Here the rollback of transaction B might be due to no fault of B's; it might, for example, be the result of a system crash. (Transaction A might already have terminated by that time, in which case the crash would not cause a rollback to be issued for A also.)
  • The second example is even worse. Not only does transaction A become dependent on an uncommitted change at time T2, but it actually loses an update at time T3 - because the rollback at time T3 causes tuple P to be restored to its value prior to time T1. This is another version of the lost update problem.



Locking
Locking is the most common mechanism of concurrency control. The idea of locking is quite simple: when a transaction needs to be sure that a data item will not change during the course of its processing, it acquires a lock, which locks other transactions out of the item and prevents them from changing it. The first transaction thus knows that the data item will remain stable for its duration.

Locking granularity:- The size of the object to be locked is referred to as the degree of granularity (locking granularity). It could be:


  • The entire database
  • A disk block
  • A database relation
  • A tuple in a relation, or field of a tuple.


The larger the data item locked, the lower the degree of concurrency, but the fewer the locks that need to be set and the lower the processing overhead.

Two types of locks

There are two types of locks available


  • Exclusive locks. (X locks, or write locks):
    • Grant read/write access to the transaction which holds the lock.
    • Prevent any other transactions reading or writing the data.
  • Shared locks (S locks, or read locks):
    • Give read-only access to the transaction which holds the lock.
    • Prevent any transaction writing the data.
    • Several transactions may hold a shared lock on an item.


Locking protocol:

  • A transaction must acquire an S lock on any data item it wishes to retrieve.
  • A transaction must acquire an X lock on any data item it wishes to update.
  • If a lock request is denied, the transaction goes into a wait state.
  • A transaction in a wait state resumes operation only when the requested lock becomes available.
  • X locks are held until commit or rollback; S locks are normally held until then as well. (A rough SQL sketch of the protocol follows below.)
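As a rough sketch of the protocol in SQL: many products (e.g. PostgreSQL, MySQL/InnoDB) expose shared and exclusive row locks through SELECT ... FOR SHARE and SELECT ... FOR UPDATE, though the exact syntax varies by product. Using the hypothetical Account table from the earlier example:

BEGIN TRANSACTION;
-- Acquire a shared (S) lock before reading: other readers are allowed,
-- but writers of this row must wait until we commit or roll back.
SELECT Balance FROM Account WHERE Id = 1 FOR SHARE;
COMMIT;

BEGIN TRANSACTION;
-- Acquire an exclusive (X) lock before updating: other readers and
-- writers of this row must wait.
SELECT Balance FROM Account WHERE Id = 1 FOR UPDATE;
UPDATE Account SET Balance = Balance + 10 WHERE Id = 1;
COMMIT;   -- locks are released at COMMIT (or ROLLBACK)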


Dead lock problem

A system is in a state of deadlock if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set.

In other words, there exists a set of waiting transactions (T0, T1, T2, ..., Tn) such that T0 is waiting for a data item held by T1, T1 is waiting for a data item held by T2, and so on, and Tn is waiting for a data item held by T0.
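A hedged sketch of how such a cycle can arise between two sessions, again using the hypothetical Account table; each transaction locks one row and then waits for the row the other holds, at which point most DBMSs abort one of them with a deadlock error:

-- Session 1:
BEGIN TRANSACTION;
UPDATE Account SET Balance = Balance - 10 WHERE Id = 1;  -- locks row 1

-- Session 2:
BEGIN TRANSACTION;
UPDATE Account SET Balance = Balance - 20 WHERE Id = 2;  -- locks row 2

-- Session 1 (now waits for the lock held by session 2):
UPDATE Account SET Balance = Balance + 10 WHERE Id = 2;

-- Session 2 (now waits for the lock held by session 1 -> deadlock):
UPDATE Account SET Balance = Balance + 20 WHERE Id = 1;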

Wednesday, August 24, 2016

Data Protection - Lecture Note VI (System and Media Recovery)


SYSTEM RECOVERY


  • The system must be prepared to recover not only from purely local failures, such as the occurrence of an overflow condition within an individual transaction, but also from "global" failures such as a power outage.
  • A local failure, by definition, affects only the transaction in which the failure has actually occurred; such failures have already been discussed. A global failure, by contrast, affects all of the transactions in progress at the time of the failure, and hence has significant system-wide implications.
  • Such failures fall into two broad categories:
    • System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
    • Media failures (e.g., a head crash on the disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.

System Failures


  • The key point regarding a system failure is that the contents of main memory are lost (in particular, the database buffers are lost).
  • The precise state of any transaction that was in progress at the time of the failure is therefore no longer known; such a transaction can therefore never be successfully completed, and so must be undone - i.e., rolled back - when the system restarts.
  • Furthermore, it might also be necessary at restart time to redo certain transactions that did successfully complete prior to the crash but did not manage to get their updates transferred from the database buffers to the physical database.
  • The obvious question therefore arises: How does the system know at restart time which transactions to undo and which to redo? The answer is as follows. At certain prescribed intervals - typically whenever some prescribed number of entries has been written to the log - the system automatically takes a checkpoint.
  • Taking a checkpoint involves (a) physically writing ("force writing") the contents of the database buffers out to the physical database, and (b) physically writing a special checkpoint record out to the physical log. The checkpoint record gives a list of all transactions that were in progress at the time the checkpoint was taken. To see how this information is used, consider the figure, which is read as follows:



    • A system failure has occurred at time TF.
    • The most recent checkpoint prior to time TF was taken at time TC.
    • Transactions of type T1 completed (successfully) prior to time TC.
    • Transactions of type T2 started prior to time TC and completed successfully after time TC and before time TF.
    • Transactions of type T3 also started prior to time TC but did not complete by time TF.
    • Transactions of type T4 started after time TC and completed (successfully) before time TF.
    • Finally, transactions of type T5 also started after time TC but did not complete by time TF.





  • It should be clear that, when the system is restarted, transactions of types T3 and T5 must be undone, and transactions of types T2 and T4 must be redone.
  • Note, however, that transactions of type T1 do not enter into the restart process at all, because their updates were forced to the database at time TC as part of the checkpoint process.
  • Note too that transactions that completed unsuccessfully (i.e., with a rollback) before time TF also do not enter into the restart process at all.
  • At restart time, therefore, the system first goes through the following procedure in order to identify all transactions of types T2-T5.

Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO list equal to the list of all transactions given in the most recent checkpoint record; set the REDO list to empty.

Search forward through the log, starting from the checkpoint record:

If a BEGIN TRANSACTION log entry is found for a transaction T, add T to the UNDO list.
If a COMMIT log entry is found for a transaction T, move T from the UNDO list to the REDO list.
When the end of the log is reached, the UNDO and REDO lists identify, respectively, transactions of types T3 and T5 and transactions of types T2 and T4.

The system now works backward through the log, undoing the transactions in the UNDO list; then it works forward again, redoing the transactions in the REDO list.

Note: Restoring the database to a consistent state by undoing work is sometimes called backward recovery. Similarly, restoring it to a consistent state by redoing work is sometimes called forward recovery.

Finally when all such recovery activity is complete, then (and only then) the system is ready to accept new work.

MEDIA RECOVERY


  • The topic of media recovery is somewhat different in kind from the topics of transaction and system recovery.
  • A media failure is a failure such as a disk head crash, or a disk controller failure, in which some portion of the database has been physically destroyed.
  • Recovery from such a failure basically involves reloading (or restoring) the database from a backup copy (or dump), and then using the log - both its active and archive portions - to redo the work done since that backup copy was taken.
  • The need to be able to perform media recovery implies the need for a dump/restore (or unload/reload) utility. The dump portion of that utility is used to make backup copies of the database on demand. (Such copies can be kept on tape or other archival storage; it is not necessary that they be on direct-access media such as hard disk.) After a media failure, the restore portion of the utility is used to recreate the database from a specified backup copy.

Data Protection - Lecture Note V (The ACID Properties)


A transaction has four important properties (the ACID properties):


  • Atomicity - Transactions are atomic (all or nothing).
  • Consistency - Transactions preserve database consistency. That is, a transaction transforms a consistent state of the database into another consistent state, without necessarily preserving consistency at all intermediate points.
  • Isolation - Transactions are isolated from one another. That is, even though in general there will be many transactions running concurrently, any given transaction's updates are concealed from all the rest until that transaction commits.
  • Durability - Once a transaction commits, its updates survive in the database, even if there is a subsequent system crash.

Data Protection - Lecture Note IV (Database Recovery)


Recovery in a database system means, primarily, recovering the database itself: that is, restoring the database to a state that is known to be correct after some failure has rendered the current state incorrect, or at least suspect.

The underlying principles on which such recovery is based are quite simple, and can be summarized in a single word - redundancy (redundancy, that is, at the physical level; any such redundancy should of course be hidden from the user and thus not be visible at the logical level).

Transaction


  • A transaction is a logical unit of work.
  • A logical unit of work (i.e. a transaction) is not necessarily just a single database operation; rather, it is a sequence of several such operations.
  • In general, it transforms a consistent state of the database into another consistent state.


A system that supports transaction management does provide the next best thing to such a guarantee. Specifically, it guarantees that if the transaction executes some updates and then a failure occurs before the transaction reaches its planned termination, then those updates will be undone. Thus the transaction either executes in its entirety or is totally canceled.

The system component that provides this atomicity is known as the transaction manager and COMMIT and ROLLBACK are the key to the way it works:


  • The COMMIT operation signals successful end of transaction. It tells the transaction manager that a logical unit of work has been successfully completed, the database is in a consistent state again, and all of the updates made by that logical unit of work can now be committed, or made permanent.
  • By contrast, the ROLLBACK operation signals unsuccessful end of transaction: it tells the transaction manager that something has gone wrong, the database might be in an inconsistent state, and all of the updates made by the logical unit of work so far must be "rolled back" or undone.



In the example, therefore, we issue a COMMIT if we get through the two updates successfully, which will commit the changes in the database and make them permanent. If anything goes wrong, however - i.e., if either update raises an error condition - then we issue a ROLLBACK instead, to undo any changes made so far.
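The figure itself is not reproduced here; a minimal sketch of the kind of two-update transaction it describes, assuming a hypothetical Account table and a transfer of funds between two accounts, might look like this:

BEGIN TRANSACTION;

UPDATE Account SET Balance = Balance - 100 WHERE Id = 1;   -- first update
UPDATE Account SET Balance = Balance + 100 WHERE Id = 2;   -- second update

-- If both updates succeed, make the changes permanent:
COMMIT;

-- If either update raises an error condition, undo all changes made so far:
-- ROLLBACK;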

Transaction Recovery


  • A transaction begins with the successful execution of a BEGIN TRANSACTION statement, and it ends with the successful execution of either a COMMIT or a ROLLBACK statement.
  • A COMMIT point thus corresponds to the end of a logical unit of work.
  • ROLLBACK, by contrast, rolls the database back to the state it was in at BEGIN TRANSACTION, which effectively means back to the previous commit point.
  • In the figure 1 example, we included explicit tests for errors and issued an explicit ROLLBACK if any error was detected. But of course, the system cannot assume that application programs will always include explicit tests for all possible errors.
  • Therefore the system will issue an implicit ROLLBACK for any transaction that fails, for any reason, to reach its planned termination.
  • We can see, therefore, that transactions are not only the unit of work but also the unit of recovery. For if a transaction successfully commits, then the system will guarantee that its updates will be permanently installed in the database, even if the system crashes the very next moment.
  • It is quite possible, for instance, that the system might crash after the COMMIT has been honored but before the updates have been physically written to the database - they might still be waiting in a main memory buffer and so be lost at the time of the crash. Even if that happens, the system's restart procedure will still install those updates in the database; it is able to discover the values to be written by examining the relevant entries in the log. (It follows that the log must be physically written before COMMIT processing can complete - the write-ahead log rule/protocol.) Thus the restart procedure will recover any transactions that completed successfully but did not manage to get their updates physically written prior to the crash; hence, as stated earlier, transactions are indeed the unit of recovery.


What is a backup file?

A backup file is a copy of the original file which is kept away from the original, to replace the original file in case it is lost or damaged.

Transaction log file

A list of all changes made to a database since the last backup. It contains transaction details which can be used to restore the database in case of damage to the main file.

How should backup files be stored?


  • Away from the originals (physical distance)
  • In secure storage locations (eg: safe boxes)
  • In secure 3rd party storage locations
  • In encrypted form
  • Away from potential environmental hazards (eg: water)



Tuesday, August 23, 2016

Data Protection - Part III (Database Integrity)



Data integrity deals with the integrity part of the relational model. It should not be confused with the security aspect of databases, although there are some similarities between them. The relational model specifies three main types of integrity:


  1. Domain Integrity
  2. Entity Integrity
  3. Referential Integrity


Each of the above integrity rules applies to a particular set of potential problems in a relational database system. This session will describe these rules and explain when and how they are applied, as well as the adverse effects if these integrity constraints are not rigorously enforced in the system.

Relational Integrity
An integrity constraint can be regarded as a condition that all correct states of the database are required to satisfy. Integrity is enforced via integrity rules.

There are three main types of integrity specified in the relational model:

Domain integrity: Domain integrity is concerned with the values that may be contained within a particular column of a relation.

Entity integrity: Entity integrity is concerned with the primary keys of relations.

Referential integrity: Referential integrity is concerned with the foreign keys of relations


Domain Integrity

Domain integrity is concerned with the values that may be contained within a particular column. The domain integrity rule states that every attribute is required to specify the constraint that its values are drawn from the relevant domain. All columns have an implicit domain derived from their data types (for example, a telephone number is made up of 10 numeric digits). However, more explicit domain integrity can also be defined (for example, each telephone number is preceded by the digits 071).

Ideally, commands should be available within the database system to allow the user to define domain specifications when the database is created - not all systems currently allow this. It would also be helpful to have some type of domain constraint. For example, a CHECK clause could be used to specify the gender column as CHECK (gender IN ('M','F')). This would be more useful than just being able to specify the column as type character - which would allow any character.
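A minimal sketch of such constraints in SQL, assuming a hypothetical Person table; the CHECK clauses express the explicit domain rules described above:

CREATE TABLE Person (
    PersonId   INTEGER PRIMARY KEY,
    Gender     CHAR(1)  CHECK (Gender IN ('M', 'F')),      -- restrict values to the domain {M, F}
    Telephone  CHAR(10) CHECK (Telephone LIKE '071%')      -- 10 digits, preceded by the digits 071
);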

Entity Integrity

Entity integrity is concerned with primary keys.

The entity integrity rule states that every base relation must have a primary key and no component of the primary key is allowed to accept null values.

Null values - Null values here mean that information is missing for some reason (for example, the property is non-applicable), or the value is unknown.

The function of primary keys - Primary keys perform the unique identification functions in the relational model.

Primary keys provide the basic tuple-level addressing mechanism because the only system guaranteed way of pinpointing a specific tuple is by its primary key value. Therefore, each primary key value must identify one and only one tuple.

Implications of primary keys accepting nulls - Suppose a Customer relation included a tuple for which the name value was null. That would be like saying that there was a customer who had no identity.

If that null means the property does not apply, then the tuple makes no sense. If it means the value is unknown, then confusion will arise and queries using that attribute may get a "don't know" as their reply.

Therefore, the entity integrity rule is sometimes stated as: in a relational database, we never record information about something we cannot identify.
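In SQL, the entity integrity rule is enforced by declaring a primary key; a minimal sketch, assuming a hypothetical Customer table:

CREATE TABLE Customer (
    CustomerNo  INTEGER NOT NULL,    -- no component of the primary key may be null
    Name        VARCHAR(50),
    City        VARCHAR(30),
    PRIMARY KEY (CustomerNo)         -- every tuple is uniquely identified by CustomerNo
);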


Referential Integrity

Referential integrity is concerned with foreign keys.

The referential integrity rule states that the database must not contain any unmatched foreign key values. In other words, the referential integrity rule simply says that if B references A, then A must exist.

Referencing vs. referenced - A foreign key is an attribute of one relation R2 whose values are required to match those of the primary key of some relation R1. We refer to the relation that contains the foreign key as the referencing relation and the relation that contains the corresponding primary key as the referenced or target relation. A given relation can, of course, be both a referenced relation and a referencing relation.

Domain requirement - A foreign key value represents a reference to the tuple containing the matching primary key value (the referred tuple or target tuple). Therefore, a given foreign key and the corresponding primary key should be defined on the same underlying domain.

Possible cases for null values - Unlike primary keys, foreign keys do sometimes have to accept null values. In most cases, the reasons for using nulls are likely to fall into the "value does not exist" category, rather than the "value unknown" category. For example, in the case of the department-and-employees database, it might be possible for some employee to be currently assigned to no department at all.

Maintaining Referential Integrity
Any state of the database which does not satisfy the referential integrity rule is by definition incorrect, but the rule itself does not specify how such incorrect states can be avoided!


  • Two Possibilities - There are usually two possibilities available:
    • The system could simply reject any operation that, if executed, would result in an illegal state.
    • In many cases, a preferable alternative would be for the system to accept the operation but to perform certain additional compensating operations (if necessary) in order to guarantee that the overall result is still a legal state.


There are two basic questions which will arise when providing such an alternative:

Deletion - What should happen on an attempt to delete the target of a foreign key reference? For example, an attempt to delete a customer for which there exists at least one matching Order?

In general there are at least three possibilities:

  • Restricted - the delete operation is restricted to the case where there are no such matching orders (it is rejected otherwise). For example, the deletion of a parent row will be disallowed if there are any dependent rows.
  • Cascades - the delete operation cascades to delete those matching orders also. For example, if a parent row is deleted, then the system will automatically delete the dependent rows.
  • Nullifies - the foreign key is set to null in all such matching orders and the customer is then deleted. For example, when a parent row is deleted, all dependent foreign keys are set to null.

Updating - What should happen on an attempt to update the primary key of the target of a foreign key reference? For example, an attempt to update the primary key of a customer for which there exists at least one matching order.

Again there are three possibilities:

  • Restricted - the update operation is restricted to the case where there are no matching orders (it is rejected otherwise). For example, if an update command changes the primary key of a parent row and dependent rows exist for that row, the update is disallowed.
  • Cascades - the update operation cascades to update those matching orders also. For example, if an update command changes the primary key of a parent row, all of its dependent rows will have their foreign key updated to the same value.
  • Nullifies - the foreign key is set to null in all such matching orders and the customer is then updated. For example, when an update command changes the primary key of a parent row, all of its dependent rows will have their foreign key set to null.
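These possibilities map onto the referential actions of SQL foreign key definitions; a hedged sketch using the hypothetical Customer and Orders tables from the discussion above (only one action can be chosen per foreign key, so the alternatives are shown as comments):

CREATE TABLE Orders (
    OrderNo     INTEGER PRIMARY KEY,
    CustomerNo  INTEGER,
    FOREIGN KEY (CustomerNo) REFERENCES Customer (CustomerNo)
        ON DELETE CASCADE        -- "cascades": deleting a customer deletes its matching orders
        ON UPDATE CASCADE        -- "cascades": changing the customer number updates the orders too
    -- ON DELETE RESTRICT        -- "restricted": the delete is disallowed while matching orders exist
    -- ON DELETE SET NULL        -- "nullifies": matching orders get a null foreign key first
);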

Saturday, August 20, 2016

Data Protection - Part II (Data Integrity)

Integrity in a database is concerned with the database having correct and consistent data. Integrity constraints ensure that changes made to the database by users do not result in data inconsistency. They guard against accidental damage to the database. Specifically, data integrity in a relational database is concerned with the following aspects of the data in a database:


  • Accuracy
  • Correctness
  • Validity
  • Availability


Validation checks
A set of condition checks performed on data elements to check if they fit into the given criteria.


  • Correct data elements which meet the criteria should pass the checks, whereas
  • Incorrect data elements which do not meet the given criteria should fail the checks and be rejected by the system.

Types of validation checks

  • Range check - values within a given range.
  • Length check - values of particular length.
  • Type check - data type.
  • Format check - data format and order.
  • Value checks - checks for presence of values.


Range check
Checks if data items fit within a specified range. If an item is to be accepted it should be greater than the minimum cut-off point but smaller than the maximum cut-off point.

Example:  Range of values for the field age should only take values between 1 and 65.
Specified range = 1-65

Length check
Checks if data items fit within a specified character length.

Example: The field telephone number should accept only values which are 7 digits.
A valid telephone number would thus be a number like "2502856"

Type check
Checks if data items are of a specific data type such as character, numerical, integer, real, etc.

Example: A telephone number field could have a type check which ensures that only whole numbers are entered. If an integer check is used, a number like "2502856" will be accepted whereas one like "CB34433" will be rejected.

Format check
Checks if data items are of a specified format. In such a situation there is a specific order of values and characters which would be accepted by the system.

Example: For a format check of dd/mm/yyyy on a date field:
the date 12/08/2004 will be accepted whereas the date 12/4/98 will be rejected.

Value check
Checks for the presence of at least some value. Such fields cannot be left blank and do not allow null values.

Example: A field with the value [14343] will be accepted
whereas one with the value [ ] (blank) will not be accepted.
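In a relational database, checks of this kind can be expressed as column constraints; a minimal sketch in SQL, assuming a hypothetical Employee table (the exact functions, e.g. LENGTH, vary between products):

CREATE TABLE Employee (
    Age        INTEGER     NOT NULL CHECK (Age BETWEEN 1 AND 65),   -- range check
    Telephone  VARCHAR(7)  CHECK (LENGTH(Telephone) = 7),           -- length check
    Salary     DECIMAL(8,2),                                        -- type check (numeric values only)
    BirthDate  DATE,                                                -- format check (must parse as a date)
    Name       VARCHAR(50) NOT NULL                                 -- value (presence) check
);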

Verification
Verification is a process of checking the correctness of data by comparing it with its initial source, or with a replicated copy, to check whether the two are identical. If they are not, some form of mistake is likely to have occurred and either the data element or the source document will need to be corrected.

Verification process

If Data Produce Expected Output
Then : data is verified.
Else : data contains errors.


Friday, August 19, 2016

Data Protection - Part I (Database Security)

The terms security and integrity are frequently heard together in the database context, though the two concepts are actually quite distinct. Security refers to the protection of data against unauthorized disclosure, alteration, or destruction; integrity refers to the accuracy or validity of data. In other words:

Security involves ensuring that users are allowed to do the things they are trying to do; integrity involves ensuring that the things they are trying to do are correct.

Two main approaches
As pointed out, the purpose of database security is to make sure that only authorized persons can access authorized data objects through a set of authorized operations. There are many access control methods available to implement database security. They are generally classified into two main approaches, depending on the type of object (user or data) through which the access control is exercised. The two approaches are:

Discretionary access control
Here a given user will typically have different access rights (also known as privileges or authorities) on different objects; further different users will typically have different rights on the same object. There are various methods in this group, different in styles but the same in their approach.

Security Rules
This deals with setting up security rules for the system. In this approach, the components for defining access control generally include the following.


  • Name: The name under which the rule is registered in the system catalog.
  • Privilege: This specifies which operations are permitted, using a GRANT clause. Typical operations are INSERT, UPDATE, DELETE, and RETRIEVE.
  • Scope: This specifies where the rule applies, using an ON clause (for example, some subset of a relation, or some tuples).
  • User: This specifies who is to be granted the specified access right, using a TO clause.
  • Violation Response: This specifies the action to be taken in the event of an access violation.


Eg: CREATE SECURITY RULE SR3
GRANT RETRIEVE (Sno#, Sname, SCity), DELETE
ON Supplier WHERE Supplier.SCity <> 'London'
TO Fred, Anne
ON ATTEMPTED VIOLATION REJECT
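The example above uses the textbook's pseudo-syntax. Standard SQL has no row condition in GRANT, so a rough equivalent (a sketch, assuming a Supplier table with columns Sno, Sname and SCity) is usually built with a view:

CREATE VIEW NonLondonSupplier AS
    SELECT Sno, Sname, SCity
    FROM   Supplier
    WHERE  SCity <> 'London';

GRANT SELECT, DELETE ON NonLondonSupplier TO Fred, Anne;
-- Standard SQL has no "on attempted violation" clause; a denied request simply fails.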

Mandatory Access Control
Mandatory controls are applicable to databases in which the data has a static and rigid classification structure, as tends to be the case in military or government environments. Here the basic idea is that each data object has a classification level j (top secret, secret, confidential, etc.) and each user has a clearance level i. The simple rules then imposed are:

User i can see object j only if the clearance level of i is greater than or equal to the classification level of j. User i can modify object j only if the clearance level of i is equal to the classification level of j.

Threats of loss or abuse of data


  • Viruses that destroy or modify data items.
  • Hacking into database systems.
  • Cracking of encrypted files.
  • Intercepting of data over networks (during transmission)
  • Attempted blackmail or misuse of stolen data.


Computer Virus
A computer virus is a hidden computer application that causes the loss of data or unexpected behavior of other applications.

Problems caused by computer viruses:

  • loss of data
  • file duplication
  • transfer of data
  • modifying file structures
  • filling memory blocks


Hacking
This is where an external individual gains access to an internal computer system illegally through a network connection.

Problems caused by hackers:

  • Illegal access (viewing) of confidential data.
  • Copying, modifying or transferring data files.
  • Leaving behind hidden applications.


Cracking
A type of illegal access to encrypted files. A cracker will usually attempt to identify the encryption pattern and then decrypt it using a cracking software application or code written by the cracker.

Intercepting data during transmission (sniffing)
A type of illegal access to data where an unauthorized 3rd party gains access to a data conversation between 2 or more other parties. The illegal user could then view and copy the conversation or join in and provide dummy data.



Methods of data security

  • Passwords
  • User access levels
  • Encryption
  • Authentication during data communication
  • Multiple backups
  • Antivirus software
  • Physical measures
  • Firewall



Passwords
A password is a confidential code which is considered to be known only by authorized users. Entering the correct code enables the user to access the system/application/data file, whereas an incorrect code results in the user being denied access.

Example
P.W :john is hidden and shown as P.W : ****

Password types

  • Operating System - to access the system.
  • Software Application - to access the program.
  • Database file - to access the database file
  • User interface - to access the users view.
  • Network - to access network resources.



User access levels
A situation where different users have different access/modify privileges to the system. Each user is identified by a particular user name, a corresponding password and a user access type. If a correct combination is entered, the user is granted access to the specific view.

Example:
User Name : JUSTIN Password : GodBlessYou
Access type : Administrator

Encryption ( Cryptography)
A technique used to protect data files. It involves taking a data file and using a special encryption algorithm to convert the file into a meaningless set of data. The file could only be restored to its original meaningful form by the use of the corresponding decryption algorithm. Encryption is usually used to protect data files during file transmission.

Firewall
A firewall is a piece of software (in some instances supported by hardware devices) which filters the flow of information entering and leaving a network. A firewall can be used to detect and block illegal access from external individuals such as computer hackers.

Antivirus Software
Antivirus software are applications which check data and application files for virus code and, if any is detected, help to clean the infected files.

Examples:

  • Norton Antivirus
  • Kaspersky
  • ESET NOD32
  • AVG


Authentication
A technique used to confirm the identity of an unknown party prior to the transmission of data between them. The technique is usually associated with the use of digital signatures and involves confirmation from a certification authority.

Multiple backups
A technique used to facilitate retrieval in the case of accidental or intentional loss of data. It involves storing more than a single copy of data in separate locations. If one file is lost another copy could be restored and used. Backups are usually stored in separate locations or on private data house networks.

Physical measures
Keeping original and copies of data files in safe boxes or using lock and key measures or lockable doors in data storage areas.

Thursday, August 18, 2016

Logical Database Design

The Normalization Process


Normalization is a mechanism used to remove anomalies (errors or problems that can disturb the handling of data in a table) from the tables of a relational database. In general, normalization has three main levels:


  • 1st Normal Form
  • 2nd Normal Form
  • 3rd Normal Form

Anomalies

There are three kinds of anomalies that can be found in an un-normalized table. They are:

Insertion anomaly: when inserting a record forces us either to repeat data that is already stored or to leave the table in an inconsistent state, this is called an insertion anomaly.

Update anomaly: when the same fact is stored in several records, so that updating it in one record but not in the others leaves the table inconsistent, this is called an update anomaly.

Deletion anomaly: when deleting a record unintentionally removes other facts that are still needed, leaving the table inconsistent, this is called a deletion anomaly.

Advantages of Normalization
  1. Reduces data duplication (redundancies) 
  2. Ensures consistency and certain form of integrity
  3. Prevents update anomalies 
  4. Provides for faster and more efficient access to data

Example:

Let's look at a real example of a student database.

The un-normalized form of the database contains the following fields:
  • Student Number
  • Student Name
  • Student Date of Birth
  • Course Number
  • Course Name
  • Lecture Number
  • Lecture Name
1st Normal Form (we have to remove many-to-many relationships by removing repeating attributes). In this database a student can enroll in one or more courses and a course can have zero or more students. Also, a course can have one or more lecturers and a lecturer can conduct one or more courses.

Student
  • Student ID
  • Student Name
  • Date of Birth
  • Course Number
Course
  • Course Number
  • Course Name
  • Lecture Number
Lecture
  • Lecture Number
  • Lecture Name
  • Course Number
2nd Normal Form (After removing partial dependencies). We have to add intermediate entities called Student-Course and Course-Lecture.

 Student
  • Student ID
  • Student Name
  • Date of Birth
  • Course Number
Course
  • Course Number
  • Course Name
Student-Course
  • Student ID
  • Course Number
Lecture
  • Lecture Number
  • Lecture Name
  • Course Number
Course-Lecture
  • Course Number
  • Lecture Number

3rd Normal Form (After removing transitive dependencies)

Student
  • Student ID
  • Student Name
  • Date of Birth
Course
  • Course Number
  • Course Name
Student-Course
  • Student ID
  • Course Number
Lecture
  • Lecture Number
  • Lecture Name
Course-Lecture
  • Course Number
  • Lecture Number

  • A table is said to be in 1st normal form if it contains no repeating attributes or groups of repeating attributes.
  • A table is said to be in 2nd normal form if it is in 1st normal form and no column that is not part of the primary key is dependent on only a portion of the primary key.
  • A table is said to be in 3rd normal form if it is in 2nd normal form and all attributes depend on the key, the whole key and nothing but the key.
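A minimal sketch of this 3rd normal form design in SQL DDL; the column names and data types are assumptions made for illustration:

CREATE TABLE Student (
    StudentId     INTEGER PRIMARY KEY,
    StudentName   VARCHAR(50),
    DateOfBirth   DATE
);

CREATE TABLE Course (
    CourseNumber  INTEGER PRIMARY KEY,
    CourseName    VARCHAR(50)
);

CREATE TABLE Lecture (
    LectureNumber INTEGER PRIMARY KEY,
    LectureName   VARCHAR(50)
);

CREATE TABLE StudentCourse (
    StudentId     INTEGER REFERENCES Student (StudentId),
    CourseNumber  INTEGER REFERENCES Course (CourseNumber),
    PRIMARY KEY (StudentId, CourseNumber)        -- composite key of the intermediate entity
);

CREATE TABLE CourseLecture (
    CourseNumber  INTEGER REFERENCES Course (CourseNumber),
    LectureNumber INTEGER REFERENCES Lecture (LectureNumber),
    PRIMARY KEY (CourseNumber, LectureNumber)
);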

Introduction to Relational Database

The relational model, defined by E. F. Codd in the early seventies, is a theoretical model of a database. The model appealed to the computer science community because of its mathematical basis, and to the computing industry at large because of the simple way in which it represented information by the well-understood convention of tables of values.

Three parts of the Relational Model

So what exactly is the relational model? A good way to characterize it is as follows: the relational model is a way of looking at data - that is, it is a prescription for a way of representing data (namely, by means of tables), and a prescription for a way of manipulating such a representation (namely, by means of operators such as JOIN). More precisely, the relational model is concerned with three aspects of data: data structure, data manipulation, and data integrity.

A structural part - the type of structure from which the database is constructed.

A manipulative part - the operations which are used for retrieving and updating data in the database.

An integrity part - the rules that all valid databases must obey.

Relational Data structure

We can view the table as the basic object of the relational model. Relational model tables conform to the intuitive notion of tables with columns of values and a header name for each column, with which we are all familiar. The following are the components of the relational data structure:

Relation:

A set of tuples defined on a number of attributes.

Attribute: Attributes are like the columns of a conventional table. Each attribute has an attribute name (like the column heading) and attribute values (like the column entries).

Tuple: A tuple can be likened to a row in a conventional table. Each tuple is a set of attribute values one for each attribute of the relation. Furthermore, each tuple has the same number of attributes.

Header and Body: A table has two distinct parts, a header part and a body part. The header part of the table is a simple mathematical set, consisting of an attribute name for each column. Each attribute name must be unique within a table. The body part consists of a number of tuples. The intersection of a tuple with each column of the table holds an attribute value. Each column in the body part of the table holds attribute values corresponding to only one type of attribute. The set of values which the attributes in one column can take is referred to as the domain of that attribute.


Domains and Keys

Domains - This is a set of values of a given type. For example, the domain of an attribute named Supplier Number is the set of all possible supplier numbers; the domain of shipment quantities is the set of all integers greater than zero and less than 10,000. Thus domains are pools of values, from which the actual values appearing in attributes are drawn. Every attribute must be defined on exactly one underlying domain, meaning that the values of that attribute must be drawn from that domain.

One aspect of the significance of domains is that domains constrain comparisons. For example, it makes sense to compare two part numbers which appear as attributes in two separate relations, but it does not make sense to compare a weight with a quantity. Both of these are numbers, but they are different kinds of numbers. The weight and quantity domains would therefore be distinct. We can state that if two attributes draw their values from the same domain, then comparisons - and hence JOINs, UNIONs, and many other operations - involving those two attributes make sense because they compare like with like.

Keys:
A set of attributes whose values uniquely identify each tuple of a relation.

Candidate Key:
Any attribute (or set of attributes) which satisfies the above definition. A relation may have many candidate keys.

Primary Key and Alternate key:
Among all the candidate keys of a given relation, one will be chosen to be the primary key, and the others are called alternate keys. Remember that primary key values are used to identify each tuple within a table uniquely. Therefore, the values of a primary key attribute must each be unique within the domain of that attribute.

Composite Key:
When more than one attribute column is needed to establish unique identification for tuples within a table, the resulting primary key is referred to as a composite (or concatenated) primary key.

Foreign Key:
These are attributes used to cross-reference tuples using the tuples' primary key values. In other words, the primary key of one table is known as a foreign key in the table into which it is embedded for the purpose of identifying relationship occurrences.

Foreign keys are used to represent relationships. There are no links or pointers connecting one table to another. In non-relational systems, by contrast, such information is typically represented by some kind of physical link or pointer that is explicitly visible to the user.

Properties of Relations

Relations possess certain properties, all of them immediate consequences of the definition of a relation. Within any given relation:


  • There are no duplicate tuples: This property follows from the fact that the body of the relation is a mathematical set, and sets in mathematics by definition do not include duplicate elements.


  • Tuples are unordered: This property also follows from the fact that the body of the relation is a mathematical set. Sets in mathematics are not ordered; thus, the tuples of a relation have no particular order.


  • Attributes are unordered: This property follows from the fact that the heading of a relation is also defined as a set. Each attribute is always referenced by its name not by its relative positioning within the header.


  • All attribute values are atomic: At every row-and-column position within the table there always exists precisely one value, never a list of values.


The Procedure for Designing Relational Databases

Having discussed the relational data structure, it is useful to review the basic procedure for designing a relational database.

  1. Identify data and user needs (systems analysis)
  2. Construct Entity-Relationship (E-R) models
  3. Normalize the data
  4. Transform the normalized E-R models into a relational model

Data Management - Part VII (Database Management System)

The database management system is the software that handles all access to the database. Let us now examine the functions of the DBMS in a little more detail. Those functions will include support for at least all of the following.

Data definitions:

The DBMS must be able to accept data definitions in source form and convert them to the appropriate object form. In other words, the DBMS must include language processor components for each of the various data definition languages (DDLs). The DBMS must also "understand" the DDL definitions in the sense that, for example, it "understands" that employee external records include a salary field; it must then be able to use this knowledge in interpreting and responding to user requests.

Data Manipulation :

The DBMS must be able to handle requests from the user to retrieve, update, or delete existing data in the database, or to add new data to the database. In other words, the DBMS must include a data manipulation language (DML) processor component. In general, database requests may be planned or unplanned:


  • A planned request is one for which the need was foreseen well in advance of the time at which the request is actually to be executed. The DBA will probably have tuned the physical database design in such a way as to guarantee good performance for such requests.
  • An unplanned request, by contrast, is an ad-hoc query, i.e., a request for which the need was not seen in advance. The physical database design might or might not be ideally suited for the specific request under consideration. In general, obtaining the best possible performance for unplanned request represents a significant challenge for the DBMS.

Planned requests are characteristic of operational or production applications, while unplanned requests are characteristic of decision support applications. Furthermore, planned requests will typically be issued from pre-written application programs, whereas unplanned requests, by definition, will be issued interactively.

Data security and integrity :

The DBMS must monitor user requests and reject any attempts to violate the security and integrity rules defined by the DBA.

Data Recovery and Concurrency :

The DBMS, or some other related software component usually called the transaction manager, must enforce certain recovery and concurrency controls.

Data dictionary :

The DBMS must provide a data dictionary function. The data dictionary can be regarded as a database in its own right (but a system database, rather than a user database). The dictionary contains "data about the data" (metadata). The dictionary might even be integrated into the database it defines, and thus include its own definition. It should certainly be possible to query the dictionary just like any other database.
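For example, in DBMSs that implement the standard INFORMATION_SCHEMA (e.g. MySQL, PostgreSQL, SQL Server), the dictionary can be queried like any other database:

-- List the columns and data types of every table visible in the catalog.
SELECT table_name, column_name, data_type
FROM   information_schema.columns
ORDER BY table_name, ordinal_position;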

Performance:

It goes without saying that the DBMS should perform all of the functions identified above as efficiently as possible.

Data Management - Part VI (Database Administration)

The Database Administrator

The data administrator is the person who makes the strategic and policy decisions regarding the data of the enterprise, and the database administrator (DBA) is the person who provides the necessary technical support for implementing those decisions. Thus the DBA is responsible for the overall control of the system at a technical level. We can now describe some of the functions of the DBA in a little more detail. In general those functions will include the following.


Defining the conceptual schema:

It is the data administrator's job to decide exactly what is to be held in the database - in other words, to identify the entities of interest to the enterprise and the information to be recorded about those entities. This process is usually referred to as logical (sometimes conceptual) database design. Once the data administrator has decided the content of the database at an abstract level, the DBA will then create the corresponding conceptual schema, using the conceptual DDL.


Defining the internal schema :

The DBA must also decide how the data is to be represented in the stored database; this is known as physical database design. Having done the physical design, the DBA must then create the corresponding storage structure definition, using the internal DDL. In addition, he/she must also define the associated mapping between the internal and conceptual schemas. In practice, either the conceptual DDL or the internal DDL - most likely the former - will probably include the means for defining that mapping, but the two functions (creating the schema, defining the mapping) should be clearly separable. Like the conceptual schema, the internal schema and the corresponding mapping will exist in both source and object form.

Liaising with users:

It is the business of the DBA to liaise with the users, to ensure that the data they require is available, and to write the necessary external schemas, using the applicable external DDL. In addition, the mapping between any given external schema and the conceptual schema must also be defined. Other aspects of the user liaison function include consulting on application design, providing technical education, assisting with problem determination and resolution, and similar system-related professional services.

Defining security and integrity rules: The DBA should use the conceptual DDL for specifying such rules

Defining Back up and recovery procedures:

Once an enterprise is committed to a database system, it becomes critically dependent on the successful operation of that system. In the event of damage to any portion of the database - caused by human error, say, or a failure in the hardware or supporting operating system - it is essential to be able to repair the data concerned with a minimum of delay and with as little effect as possible on the rest of the system. For example, the availability of data that has not been damaged should ideally not be affected. The DBA must define and implement an appropriate recovery scheme involving, e.g., periodic unloading or dumping of the database to backup storage, and procedures for reloading the database when necessary from the most recent dump.

Monitoring performance and responding to changing requirements :

The DBA is responsible for organizing the system so as to get the performance that is "best for the enterprise," and for making the appropriate adjustments as requirements change. For example, it might be necessary to reorganize the stored database on a periodic basis to ensure that performance levels remain acceptable.

Data Management - Part V (Database Architecture)

We require a framework on which we can discuss architectural issues for databases. The ANSI/SPARC study group defined a three-level architecture for database systems.

Internal level


The internal level is the one closest to physical storage structure of the database. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.

External level


The external level is the one closest to the users - i.e. it is the one concerned with the way the data is viewed by individual users. The external level includes a number of external schemas or user views. Each external schema describes the database view of one group of the database users. Each view typically describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

Conceptual Level


The conceptual level is a level of indirection between the two. The conceptual level has a conceptual schema, which describes the structure of the whole database for community of users. The conceptual schema is a global description of the database that hides the details of physical storage structure and concentrates on describing entities, data types, relationships, constraints.





Three schema architecture


In a DBMS based on the three-schema architecture, each user group refers only to its own external schema. A request against an external schema is converted into a request on the conceptual schema, and then into a request on the internal schema for processing on the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user's external view before it is presented to the user.

The processes of transforming requests and results between levels are called mappings. There are two levels of mapping in the architecture: one between the external and conceptual levels of the system and one between the conceptual and internal levels.

The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields are represented at the internal level. If the structure of the stored database is changed - i.e. if a change is made to the storage structure definition - then the conceptual/internal mapping must also be changed accordingly, so that the conceptual schema may remain invariant.

An external/conceptual mapping defines the correspondence between a particular external view and the conceptual view. The differences that may exist between these two levels are similar to those that may exist between the conceptual view and the stored database. For example, fields can have different data types, field and record names can be changed, multiple conceptual fields can be combined into a single (virtual) external field, and so on. Any number of external views can exist at the same time; any number of users can share a given external view; different external views can overlap.

Client Server Database Architecture


Client-server architecture allows end users to work on a client computer that is connected to a server over a network. The basic idea of a client-server database is that the database is kept on a centralized server and clients keep their data on that server. To retrieve data, clients request it from the server, and to store data, clients send it to the server over the network. Many clients may connect to the server, and the server controls and maintains all the clients' data, applying concepts such as data integrity and security.

One tier architecture:

Monolithic, "all-in-one" model - e.g. clients are "dumb" terminals connected directly to the mainframe.


  • Any change affects the entire system
  • Single point of failure


Two tier architecture:

Typically composed of multiple clients and a single server, the clients connect to the server over a network.

Client:

The client is fat - e.g. it implements the GUI, retrieves data from the server(s), and performs business logic based on that data.

Server:

Typically a database. Provides a shared data-store for the client. The server doesn't typically implement business logic. Later revisions to the model implement some of the business logic on the server as stored procedures (a.k.a. fat servers).

Three tier database architecture:

In a three tier database architecture, three layers are located separately in three different locations: client, server and database. Clients are fully separated from the server as well as from the database, and function independently. Clients, servers and databases are connected by a network. Whenever a client wants to communicate with the database, it must do so via the server computer. The server controls the client-server environment; for data insertion and retrieval, clients communicate with the server and the server communicates with the database.

Data Management - Part III (The Database Approach)

The Database Approach

The database approach emphasizes the integration and sharing of data throughout the organization (or at least major segments of the organization). This approach requires a fundamental reorientation or shift in thought process, starting with top management. Such a reorientation is difficult for most organizations; still, many are making this shift today and learning that information can be used as a competitive weapon.

Benefits of the Database Approach

Redundancy can be reduced:
In non-database systems, each application has its own private files. This can often lead to considerable redundancy in stored data, with a resultant waste of storage space.

Inconsistency can be avoided:
This is really a corollary of the previous point. Suppose that a given fact about the real world, say the fact that employee E3 works in department D8, is represented by two distinct entries in the stored database. Suppose also that the DBMS is not aware of this duplication. Then there will necessarily be occasions on which the two entries do not agree, namely when one of the two has been updated and the other has not. At such times the database is said to be inconsistent.

The data can be shared:
An individual piece of data in the database can be shared among several different users, in the sense that each of these users can have access to the same piece of data.

Standards can be enforced:
With central control of the database, the DBA can ensure that all applicable standards are observed in the representation of the data. This is important when data needs to be interchanged or migrated between systems.

Security restrictions can be applied:
The DBA can define security rules that are checked whenever access to sensitive data is attempted. Different rules can be established for each type of access (read, insert, delete) to each piece of information in the database.
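
For example, such rules are commonly expressed in SQL with GRANT statements; the table and user/role names below are assumptions made purely for illustration.

-- Clerks may read and insert employee data but not delete it
GRANT SELECT, INSERT ON employee TO clerk_role;

-- Only the payroll manager may update the salary field
GRANT UPDATE (salary) ON employee TO payroll_manager;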


Integrity can be maintained:
The problem of integrity is the problem of ensuring that the data in the database is accurate. Inconsistency between two entries that are intended to represent the same fact is an example of a lack of integrity. Even if there is no redundancy, however, the database might still contain incorrect information. For example, an employee might be shown as having worked 400 hours in a week instead of 40, or as belonging to department D9 when no such department exists. Adding integrity rules can eliminate these situations.
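
Both faulty situations above can be prevented with declarative integrity rules, as in this sketch (the table and column names are invented for illustration):

CREATE TABLE department (
    dept_no   CHAR(3) PRIMARY KEY,
    dept_name CHAR(30)
);

CREATE TABLE employee (
    emp_no       INTEGER PRIMARY KEY,
    dept_no      CHAR(3) REFERENCES department (dept_no),       -- no employee can belong to a non-existent department such as D9
    hours_worked INTEGER CHECK (hours_worked BETWEEN 0 AND 60)  -- a value of 400 hours would be rejected
);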

Data independence:
One immediate consequence of abstraction is the idea of buffering (insulating) data from the processes that use it. The idea is to achieve a situation where the data organization is transparent to the users or application programs that use the data. If, for instance, a change is made to some part of the underlying database, no application program using the affected data should need to be changed. Likewise, if changes are made to some part of an application system, this should not affect the data.
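
A small sketch of this idea in SQL terms (table and field names are assumed for illustration): the storage-level and structural changes below leave the application's query untouched.

-- An application program issues only this query
SELECT emp_name, salary FROM employee WHERE dept_no = 'D8';

-- The DBA later changes the physical level ...
CREATE INDEX emp_name_idx ON employee (emp_name);

-- ... or extends the logical structure with a new field
ALTER TABLE employee ADD COLUMN date_joined DATE;

-- The original SELECT continues to run unchanged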

Data Management - Part IV (Data Model)

Data Model


Every database, and indeed every DBMS, must adhere to the principles of some data model. However, the term "data model" is somewhat ambiguous. Here we consider a number of alternative data models. They are:


  • Hierarchical model
  • Network model
  • Relational model


The first two data models fall under the term navigational, since they require the use of pointer technology in order for the user to navigate the database. Although these models were extremely popular in their time, they have several drawbacks - mainly that making changes to the inherent structure of the database is difficult.

Hierarchical Model


The hierarchical model depends on every entity being subject to a higher one. A simple example is one where the parent can be immediately identified from the child and vice versa. Generally speaking, retrievals up and down the hierarchy are easily managed (though not always).

Example

This example shows a hierarchy based around the invoicing of customers and their payments. We assume that one customer can have several invoices and that an invoice can be paid in installments. This particular hierarchy would have been designed with a specific set of processing requirements in mind. For example, a query such as "which invoices were sent to a particular customer?" would be processed easily, since this is what the structure was designed to handle. However, the inverse query, "to which customer was this particular invoice sent?", would not perform well because it was not anticipated. Effectively, the second query requires a traversal of pointers in an upward direction, and this is not straightforward.


The most important drawback is that the hierarchical model was used in such a way that only pre-specified structures were implemented, and making any changes to them meant that the data structures would have to be amended. Another major problem is that in many cases the data is simply not in the form of a hierarchy. If we modify the above example to allow a single payment to cover several invoices, we effectively destroy the hierarchy by adding this new element.

Drawbacks of the hierarchical model


Difficult to change the structure:
Once the database designer has formulated a particular hierarchy in response to a set of current needs, it becomes very difficult to make amendments to the structure. Many organizations are dynamic, and this drawback has serious implications for the ability to cater to changing requirements.

Unplanned queries are difficult to support:
The beauty of database systems is that a user is able to perform ad-hoc queries. Unfortunately, this is not possible within the hierarchical model, since the structure of the data is only able to reflect one particular type of processing.

The Network model


The network model represents a complex structure. Within a network any record may have many immediate parents as well as many dependents. In other words, the restriction to one hierarchical parent does not apply. The network model is thus able to reproduce a greater number of real-world situations.


The network model can now successfully manage the scenario of the previous example, namely the ability for a payment to serve more than one invoice.

The main difference as far as this example is concerned is that in the hierarchical model invoice records were stored physically next to the customer record to which they belonged. As far as the network model is concerned this does not have to be the case.

Instead, pointers are used heavily to make a chain of invoice records which can be traced back to a particular customer. This is what makes answering the query "to which customer does a particular invoice belong?" much easier. In the same example, the aspect that we can now model successfully is a payment that covers more than one invoice.

Drawbacks of the network approach


Pointer technology is used to implement relationships:

  • The network model relies heavily on pointer technology. This causes a considerable amount of overhead, and performance is an issue.
  • Unplanned queries are difficult to support
  • No assumptions can be made
  • As soon as the structure changes the programmer must be aware of it.


The Relational Model


Most people are familiar with the relational approach since it is the most popular choice for database development currently. A brief list of its advantages is given here only in order to compare with previous models.

Data stored in relations:
The relation (which is the only data construct in the model) is based on sound mathematical theory and has several very useful properties.

Easy to understand concepts:
Even naive users are able to interact with a relational database, since the idea of a table with rows and columns is familiar to everyone. Also, the relational database comes with a fully defined query language which contains just a handful of commands that can be easily understood.

No Physical pointers:
The main advantage over earlier approaches is the fact that there are no physical pointers for the user to worry about. Indeed, at the lowest level pointers may well be used, but this is never made evident to the user. The advantage of this is that the user is no longer burdened with having to navigate the database.

Logical/Physical clearly separated:
One of the consequences of not using physical pointers is that the logical and physical aspects of the system can now be separated; this ensures a high degree of data independence.

Easy to set up and change:
The query language includes DDL commands which are relatively easy to use. It is straightforward to add a new relation to an existing set of relations (and indeed to discard it later). Changing needs can be more readily reflected in a relational database.
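
A sketch of the DDL commands involved (the relation and field names are invented for illustration):

-- Add a new relation to the existing set
CREATE TABLE project (
    proj_no   INTEGER PRIMARY KEY,
    proj_name CHAR(40)
);

-- Reflect a changing need by adding a field
ALTER TABLE project ADD COLUMN budget DECIMAL(12,2);

-- Discard the relation later when it is no longer needed
DROP TABLE project;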

Data Management - Part II (File Processing - Traditional Method)

File Processing

Traditional File Processing Systems

In the beginning of computer-based data processing, there were no databases. Computers that were considerably less powerful than today's personal computers filled large rooms and were used almost exclusively for scientific and engineering calculations. Gradually, computers were introduced into the business world. To be useful for business applications, computers must be able to store, manipulate, and retrieve large files of data; computer file processing systems were developed for this purpose. As business applications became more complex, it became evident that traditional file processing systems had a number of shortcomings and limitations. As a result, these systems have been replaced by database processing systems in most critical business applications today. Nevertheless, we should have at least some familiarity with file processing systems for the following reasons.


  • File processing systems are still widely used today, especially for backing up database systems.
  • Understanding the problems and limitations inherent in file processing systems can help us to avoid these same problems when designing database.


Disadvantages of file processing systems

Program data dependence:
File descriptions are stored within each application program that accesses a given file. For example, in an invoicing system one program will access both the inventory pricing file and the customer master file; therefore, that program will contain a detailed description of both files. As a consequence, any change to a file structure requires changes to the file descriptions in all programs that access the file.

Duplication of data:
Since applications are often developed independently in file processing systems, unplanned duplicate data files are the rule rather than the exception. For example, an order filling system will contain an inventory master file, while an invoicing system will contain an inventory pricing file; these files undoubtedly both contain data describing the company's products, such as product description, unit price, and quantity on hand. This duplication is wasteful, since it requires additional storage space and increased effort to keep all the files up to date.

Limited data sharing
With traditional file processing systems, each application has its own private files and users have little opportunity to share data outside their own applications. It is often frustrating for managers to find that a requested report will require a major programming effort to obtain data from several incompatible files in separate systems.

Lengthy development time
With traditional file processing systems, there is little opportunity to leverage previous development efforts. Each new application requires that the developer essentially start from scratch by designing new file formats and descriptions, and then writing the file access logic for each new program. The lengthy development times required are often inconsistent with today's fast-paced business environment, in which time to market is a key factor.

Excessive program maintenance
The preceding factors all combined to create a heavy program maintenance load in organizations that rely on traditional file processing systems. In fact, as much as 80 percent of the total information systems development budget may be devoted to program maintenance in such organizations. This of course leaves little opportunity for developing new applications.

Data Management - Part I

Introduction

The past two decades have witnessed enormous growth in the number and importance of database applications. Databases are used to store, manipulate, and retrieve data in nearly every type of organization, including business, healthcare, education, government, and libraries. Database technology is routinely used by individuals on personal computers, by work groups accessing databases on network servers, and by employees using enterprise-wide distributed applications.

Basic Concepts and Definitions

We define a database as a single location where all the related data are stored in an organized manner. A database may be of any size and complexity. For example, a salesperson may maintain a small database of customer contacts on his laptop computer that consists of a few megabytes of data, while a large corporation may build a very large database, consisting of several terabytes of data, on a large mainframe computer that is used for decision support applications.

Definitions of Data

Historically, the term data referred to known facts that could be recorded and stored on computer media. For example, in the salesperson's database, the data would include facts such as customer name, address and telephone number. This definition now needs to be expanded to reflect a new reality. Databases today are used to store objects such as documents, photographs, sound and even video segments, in addition to conventional textual and numeric data. To reflect this reality, we use the following broadened definition: data consists of facts, text, graphics, images, sound and video segments that have meaning in the user's environment.

We have defined a database as an organized collection of related data. By organized we
mean that the data are structured so as to be easily stored, manipulated, and retrieved by
users and that the users can use the data to answer questions concerning that domain.

Metadata

As we have indicated, data only become useful when placed in context. The primary mechanism for providing context for data is metadata: metadata are data that describe the properties or characteristics of other data. Some of these properties include data definitions, data structures, and rules or constraints.
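
For example, the metadata for a Customer_Name field might record that its type is Char, its length is 30 and that it may not be left blank. Such metadata can itself be pictured as a data dictionary table; the sketch below is purely illustrative and all names in it are invented.

CREATE TABLE data_dictionary (
    table_name   CHAR(30),
    field_name   CHAR(30),
    data_type    CHAR(15),
    field_length INTEGER,
    allows_null  CHAR(3)
);

-- One row of metadata describing one field of the Customer table
INSERT INTO data_dictionary
VALUES ('Customer', 'Customer_Name', 'Char', 30, 'No');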


File

The term has two common definitions. Firstly, at the operating system level, a file may be either an executable application or just a document (e.g. word.exe or word.doc). The second definition considers a file to be a collection of interrelated records (a table).

Components of a Database


  • Tables (Entities, Relations)
  • Queries
  • Forms
  • Reports
  • Macros


Table: A table is a collection of records about a specific subject. Eg : Customer table, Supplier table,
Product table, etc.

A table contains:

Records (Instances, Rows)
Fields (Columns)



A record is an instance of an entity. It is basically a collection of field values which combine
to make up a single item description within a database table.

A field (column) is a property or characteristic of an entity.

Query: A query is a set of conditions which are placed upon a given data source in order to extract specific information as desired by the user.

eg: using Structured Query Language (SQL)

SELECT ename
FROM empfile
WHERE Sal < 5000 AND Age > 50
ORDER BY ename ASC
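
This query lists, in alphabetical order, the names (ename) of employees in the empfile table who earn a salary below 5000 and are older than 50.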

Form: A form is a construct which is used to design the front end (user interface) for a database.

Report: A report is a tool for generating presentable output from a database system. Report generators usually accept either tables or query results as input and then arrange these in a presentable manner. The resulting file can be printed, stored or exported.

Macro: A macro is a pre-coded (built-in) set of functions which can be called from an application.

It is basically an ordered set of functions to be performed on data items.

Example:

Open data1.table      'select the input source
Sort ASC              'sort function of the macro
Store, Close macro1   'store the result and close

Field types (Data types)
A field in a database table needs to be defined prior to storing any data. Each field must have a field name, a field type and a field length. Additionally, field descriptions, integrity constraints and default values can also be set (a sample table definition follows the list below).


  • Char/Text (Stores alphanumeric characters)
  • Integer/Number (Stores whole numbers)
  • Real (Stores numbers with decimal places)
  • Binary/Boolean/Yes-No (Stores 1 or 0)
  • Date/Time (stored according to a predefined date format such as dd/mm/yyyy)
  • Auto number (Stores an automatically generated primary key)
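
A sample table definition using these field types is sketched below. The field names, lengths and default values are invented for illustration, and the exact type names (for example BOOLEAN, or the auto-number mechanism) vary between database products.

CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,    -- Auto number / primary key (generated automatically in many DBMSs)
    CustomerName CHAR(30) NOT NULL,      -- Char/Text, field length 30, may not be blank
    OrdersPlaced INTEGER,                -- Integer/Number (whole numbers)
    CreditLimit  REAL DEFAULT 1000.00,   -- Real (decimal places), with a default value
    IsActive     BOOLEAN DEFAULT TRUE,   -- Binary/Boolean (1 or 0)
    DateJoined   DATE                    -- Date/Time (stored to a predefined date format)
);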

Methods of data capture (inputting data)

  • Manually Keying in (Key-2-Disk)
  • Scanning (OCR, OMR)
  • Importing from external sources
  • Turnaround (feedback documents)

Manually keying in (Key-2-Disk System)

The traditional and to this day most common form of data input is based on the key-2-disk
system where data items from physical sources are read by a data entry operator and then
typed into the computer using a standard keyboard.

Scanning supported by character recognition

A newer system which automates the data entry process. It involves first scanning in
images of physical documents and then using text recognition software to convert the
image into digital text / characters.

Types of scanning-recognition devices

OMR - identifies marks and shadings. Example: you shade a box or circle on an application form (such as an answer sheet), and the software recognizes which box or circle has been filled.

OCR - identifies textual characters. Example: You scan a document and the software automatically converts its letters to editable format such as in a text editor.

MICR - identifies magnetic ink symbols.

Sources for importing data

  • Floppy disks (diskettes)
  • CD Roms, CD-R, CD-RW
  • Zip disks
  • Internal Databases
  • Local computer networks
  • The Internet

Turnaround documents

A turnaround document is a document which is used to provide automated entry of user responses. It involves first printing out a sheet of questions asking users to either mark or enter details in appropriate blank spaces. Once completed the document is returned to the company which in turn feeds it into the computer system using OCR/OMR scanning devices. The data obtained is then stored within the system.

Processing of data

Processing is the act of taking in raw, unprocessed facts and figures, performing the required procedures on them, and then presenting them in a manner which is meaningful to the user. Several of the processing types listed below are combined in a short SQL sketch after the list.

Types of processing that could be done on data

  • Querying (extracting information)
  • Updating (making changes)
  • Mathematical and Statistical functions
  • Summarizing
  • Grouping
  • Sorting
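
Several of the processing types above map directly onto SQL statements. The sketch below reuses the empfile table from the earlier query example and assumes a dept_no field for grouping; both are illustrative assumptions.

-- Updating (making changes)
UPDATE empfile SET Sal = Sal * 1.05 WHERE Age > 50;

-- Querying, mathematical/statistical functions, summarizing, grouping and sorting
SELECT dept_no, COUNT(*) AS staff_count, AVG(Sal) AS average_salary
FROM empfile
GROUP BY dept_no
ORDER BY average_salary DESC;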

Important Notice!

Dear students and friends, when commenting please do not mention your email address, because it will be publicly available and visible to all. It will soon start receiving tons of spam, because email crawlers can extract your address from the feed text.

To contact me directly regarding any inquiry you may send an email to info@bcslectures.website and I will reply accordingly.