RelationalDatabase1

Wednesday, July 12, 2006

07 - Functional Dependencies and Normalization

(making sure the appropriate columns are in a table)

Objectives

  • Deciding on a logical structure for a given body of data
  • The concept of functional dependencies
  • Determine the purpose of Normalization
  • Problems associated with redundant data.
  • Identification of anomalies
  • Benifits of normalization
  • Understand the process of normalization
  • Identify 1st 2nd and 3rd normal form

Preamble

  • Logical not physical design
  • relational design (map to other models)
  • an "art" not a science (some scientific priniples)
  • data strucures and data integrity
  • application-independent design (no matter what form, the design of the db is the same)
  • deciding base relations and their attributes

Functional Dependencies (FD)

  • A functional depenency is a relationship between two attributes.
  • for any relation R, attribute B is functionally dependent on attribute A, if for every valid instance of A, that value of A uniquely determines the value of B

Examples---------------------------------


Functional Dependency Article



Transitivity: Is there a nonkey attribute dependent on another nonkey attribute. Eliminate this and you are in 3rd normal form. (for the above example you'd want to put program id and program into a new table(program) ProgramID would stay in the student table, but would now be a foriegn key joining to program table.


Examples


Given R(A,B,C,D) A --> B, B --> C
A-->BC | Union
A-->BCD | Cant tell. don't know what determines D
AD-->B | Augmentation (D is redundant only need A)
BC-->A | Cant tell. could be the same person + birth(?) not necessarily A
AB-->B | Reflexive

Note: I don't really understand this... it could be wrong as it doesn't make any sense to me.

Principle of Seperation

  • Each tuple(row) of a relation contains an atomic, indivisible, unit of information.
  • atomic: can't be broken down further
  • indivisible: a distinct piece of information

Normalization

  • reduce complex users' views to a set of small, stable, data structures (tables)
  • normalized tables are flexible, stable, and easier to maintian
  • reduces data redundance (procedes controlled data redundancy)
  • eliminates data anomalies

Benefits of Data Normalization

  • the development of a strategy for constructing relations and selecting keys.
  • improved end-user computing activites (eg. accommodate ad-hoc queries, easier to create queries)
  • reduced problems with adding, updating, and deleting data
  • produces a hardware-independant, operating system system independent, DBMS independent and user-independent logical model

Anomalies

  • anomalies occur when relations are not in the proper form
  • insertion anomalies: attempting to insert a tuple with an unknown primary key
  • deletion anomalies: deleting a tuple that contains important information
  • update anomalies: occur when there is unnecessary redundancy in the data

Example


Problem: Making changes, adding/deleting/updating this table is going to be problematic.

Add Tuple: {S85, 35, P1, 9}
Problem: Two tuples with conflicting budgets. (P1 can't be both the old 32 AND the new 35)

Delete Tuple: {S79,27,P3,1}
Problem: Deletes the budget of project P3.

Update Tuple: {S75, 32, P1, 7} to {S75, 35, P1, 7}
Problem: Two tuples with different values for project P1's budget.

Partial dependencies (redundancies) screw things up.

Solution: Normalize the table.

A: EMPID
B: BUDGET
C: PROJECT
D: HOURS

Given R(A,B,C,D) AC-->D, C-->B
AC-->BD = augmentation because C determines B, you don't need A,
however, you do need AC to determine hours.
AC-->D, C-->B = 3NF

empID, project --> budget, hours should be changed to:

empID, project --> hours;
project --> budget.




Note the following:
  1. No anomalies will be created if budget is changed
  2. No dummy values are needed for projects that have not employees assigned
  3. If an employee's contribution is deleted, no important data is lost
  4. No anomalies are created if an employee's contribution is added.


First Normal Form (1NF) - Repeating Groups Removed

Remove any multivalued attributes or repeating groups
  • In the first normal form only single values are permitted at the intersection of each row and column hence, there are no repeating groups

  • To normalize a relation that contains a repeating group, remove the repeating group and form two new relations

  • The primary key of the new relation is a combination of the primary key of the original relation plus an attribute from the newly created relation for unique identification. (The PK determines all attributes)

Example: An Unnormalized Table

Student_Grade_Report(#Studentno, student_name, major, course_no, course_name, instructor_no, instructor_name, instructor_location, grade)

The course info is a repeating group. Take it out.

Student(#Studentno, student_name, major)
Student_Grade_Report(#Studentno, #course_no, course_name, instructor_no, instructor_name, instructor_location, grade)

Second Normal Form (2NF) - Partial Dependencies Removed

Q: Why upgrade to 2NF?
A: We still have update anomalies in 1NF

To illustrate...
  • In the above table to add a new course we sould need a student.
  • When Course information needs to be updated, we may have inconsistencies
  • In deleting a student, we may also delete critical information about a course
Step forward to 2NF:
  • The relation must first be in 1st normal form.
  • The relation is automatically in 2NF if, and only if, the primary key comprises a single attribute.
  • If the relation has a composite primary key, then each non-key attribute must be fully dependent on the entire primary key and not on a subset of the primary key (i.e. there must be not partial dependency).
Example: Removing the partial depedencies

Student(#Studentno, student_name, major)
Course_Grade(#Course_no, #Studentno, grade)
Course_Instructor(#Course_no, course_name, instructor_no, instructor_name, instructor_location, )

Third Normal Form (3NF) - Transitive Dependencies Removed


Q: Why upgrade from 2NF?
A:We still have update anomalies because of leftover transative dependencies.

To illustrate...
  • When adding a new instructor, we need to have a course.
  • Updating course information could lead to inconsistencies for instructor information.
  • Deleting a course may also delete instructor information
Moving on to 3NF:
  • The Relation must be in 2nd normal form
  • Remove all transative dependencies - a non-key attribute may not be functionally dependent on another non-key attribute.
Student(#student_no, student_name, major)
Course_Grade(#student_no, #course_no, grade)
Course(#course_no, course_name, instructor_no)
Instructor(#instructor_no, instructor_name, instructor_location)


Now you should be anomaly free. You are good to update with out losing info, or creating inaccurate info.


0 Comments:

Post a Comment

<< Home