RelationalDatabase1

Saturday, July 15, 2006

10 - Security

Security: the protection of data againsts unauthorized disclosure, alteration, or destruction.

General Considertions

  • Legal, social, and ethical aspects
  • Physical controls
  • Company policies
  • Operational concerns
  • Hardware controls
  • Operating system security
  • Data Ownership

Data Security


Threat: Any situation or event, whether intentional or unintentional that will adversly affect a system and consequently an organization.

Theft and fraud
  • unauthorized amendment of data
  • program alteration
  • wire tapping
  • illegal entry by hacker
Loss of confidentiality (secrecy)
  • Wire tapping
  • blackmail
  • illegal entry by hacker
  • inadequate, or ill thought out procedurers that allow confidential output to be mixed with normal output
  • staff shortage or strikes
  • inadequate staff training
Loss of privacy
  • Inadequate staff training
  • viewing unauthorized data and disclosing it
Loss of integrity
  • electronic interference and radiation
  • fire (electrical fault/lightning strike)
Loss of availability
  • fire (electrical fault/lightning strike)
  • flood
The extent of loss depends upon factors such as the impact of threat balanced against countermeasures and contingency plans.

E.g. Hardware failure resulting in a corrupt disk.
  • Does alternative hardware exist that can be used?
  • Is this alternative hardware secure?
  • Can we legally run our software on this hardware?
  • If no alternative hardware exists, how quickly can the problem be fixed?
  • When were the last backups taken of the database and log files?
  • Are the backups in a fireproof safe or offsite?
  • If the most current database needs to be recreated by restoring the backup with the log files, how long will it take?
  • Will there be any immediate effects on clients?
  • If we restore the system will the same or similar breach of security occur again unless we do something to prevent it from happening?
  • could our contingency planning be improved?

Computer Based Controls

The security of a DBMS is only as good as that of the operating system.

Authorization:
the granting of a right or privilege which enables a subject to legitimately have acess to a system or object. (user ID)
Authentication:
A mechanism by which a subject is determined to be the genuine subject that they claim to be. (passwords, login)


Privileges
  • closed system (log in to the network + log into the db, more secure)
  • open system
The types of privileges that an authorized subject may be given include:
  • Use of specific named databases
  • selection or retrieval of data
  • creation of tables and other objects
  • update of data (may be restricted to certain columns)
  • deletion of data (may be restricted to certain columns)
  • insertion of data (may be restricted to certain columns)
  • unlimited result set from a query (that is, a user is not restricted to a specified number of rows)
  • execution of specific procedures and utility programs (stored procedures, access to views -- not tables)
  • Creation of databases
  • Creation (and modification) of DBMS user identifiers and other types of authorized identifiers
  • Membership of a group of users, and consequent inheritance of the groups privileges
Ownership and privileges
  • Different ownership for different objects
  • ownership of objects gives the owner all appropriate privileges on the objects owned
  • newly created objects are automatically owned by their creator who gains the appropriate privileges for the object.
  • privileges can be passed on to other authorized users
  • DBMS maintains different types of authorization identifiers (users and groups)
    • Dont generally give users access to objects (fils/folders) in a network.
    • instead give access to a group

DBMS Support

Discretionary Control

  • privileges or authorities on different objects
  • policy decisions
  • flexible
Examples:
GRANT SELECT ON supplier TO charley
GRANT SELECT, UPDATE (status, city) ON supplier TO judy
GRANT ALL PRIVILEGES ON supplier TO ted
GRANT SELECT ON promotions TO public
GRANT dba TO phil
REVOKE SELECT ON supplier FROM charley
REVOKE UPDATE ON supplier FROM judy
  • The above policy decision are inforced by the DBMS
  • The systems catalog contains:
    • sysuserlist
    • sysauthlist
  • note: checks slow the db down.

Mandatory Control

  • Each object has a classification level (top secret, confidential, etc.)
  • each user has a clearance level
  • not supported by any current DBMS
  • very rigid

Views: can be used to represent only the data which is relevent to the user by effectively hiding other data (fields)

Backing up: The process of periodically taking a copy of the database and journal (and possibly programs) onto offline storage media.
  • Frequency depends on how fast data changes.
  • Various depending on size of data
Journaling: The process of keeping and maintaining a journal or log of all changes made to the database to enable recovery to be undertaken effectively in the event of a failure.
  • Must be enabled
  • copies of journal on other disks
  • record all security violations
  • done in transaction log
  • client server will record up the the failure
  • DB like access will only record rom last backup
Checkpointing:
The point of synchronization between the database and the transaction log file. All buffers are force-written to secondary storage.

Encryption:
The encoding of the data by special algorithm that renders the data unreadable by any program without the decryption key.
  • encrypt/decrypt
  • degretation of performance
Associated procedures:

To ensure controls are effective:

Authorization and Authentication
  • Password lengths
  • Password duration
Backup
  • Determine Procedures
  • frequency
Recovery
  • Determine procedures
Audit: Check that all proper controls are in place:
  • Ensuring accuracy of input data
  • Ensuring accuracy of data processing
  • Prevention and detection of errors during program execution
  • properly testing and documenting program development and maintenance
  • avoiding unauthorized program alteration
  • granting and monitoring access to data
  • ensuring documentation is up to date
Non-Computer Based Controls

Establishment of a security policy:
  • the area of the business it covers
  • responsibilities and obligations of employees
  • procedures that must be followed
Establishment of a Contingency plan:
  • Who the key personnel are and how they can be contacted
  • If key personnel are unavailable, a list of alternative personnel and how they can be contacted.
  • Who decides that a contingency exists and how that is decided
  • The technical requirements of transferring operations elsewhere
    • Additional equipment needed
    • will communication lines need to be installed
  • operational requirements of transferring operation elsewhere
    • staff needed to work away from home
    • staff needed to work unusual hours
    • staff will need to be compensated
  • any outside contacts who may be help, for example:
    • equipment manufacturer
  • Whether any insurance exists to cover the situation
Secure positioning of equipment
  • Restrict access to printers especially if used for sensitive info. (confidential memo etc)
  • Locate computer terminals sensibly if likely to display sensitive info. (screens visible from windows)
  • site cabling to avoid damages
Secure data and software
  • Ensure a secure storage area is available on-site
  • have an off-site secure storage area
  • index all the material (list it)
Physical Access controls:
  • Internal controls: govern acces to areas within a building
  • External controls: Govern access to the site
Emergency arrangement:
  • cold site (empty rooms)
  • warm site (rooms with computers)
  • hot site (rooms with working and connected computers)
PC security
  • Viruses (anti virus, patches etc.)
Statistical Databases: A collection of confidential info on indidviduals
  • Programs that pull random not consecutive data (eg census)
    • get rid of the specific info, keep data that generalizes
  • Preventing queries from operating on only a few database entries
  • Randomly adding in additional entries to the original query result set, which produces a certain error but approximates to the true response
  • using only a random sample of the database to answer the query
  • maintaining a history of the query results and rejecting queries that use a high number of records identical to those used in previous queries.
Risk Analysis
  • Establish a security team
  • Define the scope of the analysis and obtain system details
  • identify all existing countermeasures
  • identify and evaluate all assets
  • identify all assess all threats and risks (what can go wrong)
  • Select countermeasures, undertake a cost/benefit analysis and compare with existing countermeasures
  • make recomendations
  • test security system
Data protection and privacy laws

privacy: Concerns the right of an individual to not have personal info collected, stored and disclosed either willfully or indiscriminately.

data protection: The protection of personal data from unlawfull acquisition, storage and disclosure, and the provision of the necessary safeguards to avoid the destruction or corruption of the legitimage data held.

Since the 70's different countries have instituted different laws that deal with these issues
  • post 9/11 privacy = less
Security is very important for anyone who has clients (documentation with security info)



09 - Redundancy & Denormalization

Whoa. The actual title is far too long: "Methodology - Monitoring and Tuning the Operational System Transperancies."

Denormalization: Putting the Anomalies Back

Objectives

  • Meaning of denormalization

  • When to denormalize to improve performance

  • Importance of monitoring and tuning the operational system. (note: look@ index first for performance issues)

Step 8 - Consider the introduction of Controlled Redundancy

To determine whether introducing redundancy in a controlled manner by relaxing the normalization rules will improve the performance of the system.
  • Result of normalization is a logical database design that is structurally consistant and has minimal redundancy.

  • However, sometimes a normalized database design does not provide maximum processing efficiently.

  • It may be necessary to accept the loss of some of the benefits of a fully normalized design in favor of performance.

  • Also consider that denormalization:

    • Makes implementation more comlplex

    • Often sacrifices flexibility

    • May speed up retrievals, but it slows down updates

  • Consider denormalization in the following situations, specifically to speed up frequent or critical transactions:
    1. Combining 1:1 relationships

    2. Duplicating non key attributes in 1:* relationships to reduce joins

    3. Duplicating foreign key attributes in 1:* relationships to reduce joins

    4. Duplicating attributes in *:* relationships to reduce joins

    5. Introducing repeating groups

    6. Merging lookup tables with base relations

    7. Creating extract tables
      • Reports can access derived data and perform multi-relation joins on the same set of base relations. However, data the report is based on may be relatively static or may not have to be current.

      • Possible to create single, highly denormalized extract table based on the relations required by reports, and allow the users to access extract table directly insted of using base relations.

Monitor and Tune the Operational System

To monitor operational system and improve performance of the system to correct inappropriate design decisions or reflect changing requirements.

Thursday, July 13, 2006

08 - Methodology Conceptual Database Design


Objectives

  • Purpose of design methodology
  • Database design has 3 main phases: conceptual, logical, and physical design
  • How to decompose the scope of the design into specific users' views of the enterprise.
  • How to use ER modeling to build a local conceptual data model based on information given in a view of the enterprise
  • How to validate resultant conceptual model to ensure it is a true and accurate representation of a view of the enterprise
  • How to document the process of conceptual database design

Design Methodology

  • Structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design

  • Database design methodology has 3 main phases
    • Conceptual database design
    • Logical database design
    • Physical database design

Conceptual

  • Process of constructing a model of information used in an enterprise, independent of all physical considerations.

Logical

  • Process of constructing a model of information used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.

Physical

  • Process of producing a description of the implementation of the database on secondary storage; it describes the base relations, file organizations, and indexes

  • Design used to achieve efficicient access to the data, and any associated integrity constraints and security measures

  • Where the database is physicaly "built"

Critical Sucess Factors in DB Design

  • Work interactively with users as much as possible

  • Follow a structured methodology throughout the data modelling process

  • Employ a data driven approach.

  • Incorporate structural and integrity considerations into the data models.

  • Combine conceptualization, normalization, and transaction validation techniques into the data modelling methodology.

  • Use diagrams to represent as much of the data models as possible.

  • Use a Database Design Language (DBDL) to represent additional data semantics.

  • Build a data dictionary to supplement the data model diagrams.

  • Be willing to repeat steps.

Conceptual Design


Step 1: Build a local conceptual data model for each user view

  1. Identify entity types
    - Identify the main entity types that are required by the view

  2. Identify relationship types
    -Identify the important relationships that exist between the entity types that have been identified

  3. identify and associate attributes with entity or relationship types
    -Identify and associate attributes with the appropriate entity or relationship types and document the details of each domain

  4. Determine attribute domains
    -Determine domains for the attributes with the appropriate entity or relationship types and document the details of each attribute

  5. Determine candidate and primary key attributes
    -Identify the candidate key(s) for each entity and if there is more than one candidate key, to choose one to be the primary key.

  6. Consider use of enhanced modelling concepts (optional step)
    -Consider the use of enhanced modeling concepts, such as specialization/generalization, aggregation, and composition

  7. Check model for redundancy
    -Check for the presence of any redundancy in the model

  8. Validate local conceptual model against user transactions
    -Ensure that the local conceptual model supports the transactions required by the view

  9. Review local conceptual data model with user
    -Review the local conceptual data model with the user to ensure that the model is a "true" representation fo the user's view of the enterprise

Logical Database Design


Step 2: Build and validate a local logical data model for each view

  1. Remove features not compatible with the relational model (optional step)

  2. Derive relations for local logical data model

  3. Validate relations using normalization

  4. Validate relations against user transactions

  5. Define integrity constraints

  6. Review local logical data model with user

Step 3: Build and validate global logical data model

  1. Merge local logical data models into global model

  2. Validate global logical data model

  3. Check for future growth

  4. Review global logical data model with users

Physical Database Design


Step 4: Translate global logical data mdel for target DBMS

  1. Design Base relations

  2. Design representation of derived data

  3. Design Enterprise constraints

Step 5: Design physical representation

  1. Analyze transactions

  2. Choose file organization

  3. Choose indexes

  4. Estimate disk space requirements

Step 6: Design user views

Step 7: Design security mechanisms

Step 8: Consider the introduction of contorolled redundancy

Step 9: Monitor and tune the operational system


Example first cut conceptual ERD


Wednesday, July 12, 2006

ERD Example: Car Dealership

Create an E-R diagram for a car dealership. The dealership sells both new and used cars, and it operates a service facility. Base your design on the following business rules:

  • A salesperson may sell many cars, but each car is sold by only one salesperson.
  • A customer may buy many cars, but each car is sold to only one customer.
  • A salesperson writes a single invoice for each car he or she buys.
  • A customer gets an invoice for each car he or she buys.
  • A customer may come in just to have their car serviced, that is, one need not buy a car to be classified as a customer.
  • When a customer takes one or more cars in for repair or service, one service ticket is written for each car.
  • The car dealership maintains a service history for each of the cars serviced. The service records are referenced by the cars serial number.
  • A car brought in for service can be worked on by many mechanics, and each mechanic may work on many cars.
  • A car that is serviced may or may not need parts. (For example, adjusting a carburetor or cleaning a fuel injector nozzle does not require the use of parts).

  • Salespeople sell cars to customers thru sales invoices.
  • Customers buy cars from Sales people thru sales invoices.
  • A sales person writes one invoice to one customer per car sold.
  • A customer gets one invoice per car bought.
  • A customer initiates a car service thru a service ticket.
  • A customer gets one service ticket per car serviced.
  • The dealership uses the serial number of a car to look up service ticket history.
  • Mechanics are associated with cars and the services that they perform on cars thru the service/mechanic table. The service/mechanic table associates information about the service done with the service ticket.
  • Cars serviced often times use parts. The Parts Used table associates the the parts with the service ticket.

07 - Functional Dependencies and Normalization

(making sure the appropriate columns are in a table)

Objectives

  • Deciding on a logical structure for a given body of data
  • The concept of functional dependencies
  • Determine the purpose of Normalization
  • Problems associated with redundant data.
  • Identification of anomalies
  • Benifits of normalization
  • Understand the process of normalization
  • Identify 1st 2nd and 3rd normal form

Preamble

  • Logical not physical design
  • relational design (map to other models)
  • an "art" not a science (some scientific priniples)
  • data strucures and data integrity
  • application-independent design (no matter what form, the design of the db is the same)
  • deciding base relations and their attributes

Functional Dependencies (FD)

  • A functional depenency is a relationship between two attributes.
  • for any relation R, attribute B is functionally dependent on attribute A, if for every valid instance of A, that value of A uniquely determines the value of B

Examples---------------------------------


Functional Dependency Article



Transitivity: Is there a nonkey attribute dependent on another nonkey attribute. Eliminate this and you are in 3rd normal form. (for the above example you'd want to put program id and program into a new table(program) ProgramID would stay in the student table, but would now be a foriegn key joining to program table.


Examples


Given R(A,B,C,D) A --> B, B --> C
A-->BC | Union
A-->BCD | Cant tell. don't know what determines D
AD-->B | Augmentation (D is redundant only need A)
BC-->A | Cant tell. could be the same person + birth(?) not necessarily A
AB-->B | Reflexive

Note: I don't really understand this... it could be wrong as it doesn't make any sense to me.

Principle of Seperation

  • Each tuple(row) of a relation contains an atomic, indivisible, unit of information.
  • atomic: can't be broken down further
  • indivisible: a distinct piece of information

Normalization

  • reduce complex users' views to a set of small, stable, data structures (tables)
  • normalized tables are flexible, stable, and easier to maintian
  • reduces data redundance (procedes controlled data redundancy)
  • eliminates data anomalies

Benefits of Data Normalization

  • the development of a strategy for constructing relations and selecting keys.
  • improved end-user computing activites (eg. accommodate ad-hoc queries, easier to create queries)
  • reduced problems with adding, updating, and deleting data
  • produces a hardware-independant, operating system system independent, DBMS independent and user-independent logical model

Anomalies

  • anomalies occur when relations are not in the proper form
  • insertion anomalies: attempting to insert a tuple with an unknown primary key
  • deletion anomalies: deleting a tuple that contains important information
  • update anomalies: occur when there is unnecessary redundancy in the data

Example


Problem: Making changes, adding/deleting/updating this table is going to be problematic.

Add Tuple: {S85, 35, P1, 9}
Problem: Two tuples with conflicting budgets. (P1 can't be both the old 32 AND the new 35)

Delete Tuple: {S79,27,P3,1}
Problem: Deletes the budget of project P3.

Update Tuple: {S75, 32, P1, 7} to {S75, 35, P1, 7}
Problem: Two tuples with different values for project P1's budget.

Partial dependencies (redundancies) screw things up.

Solution: Normalize the table.

A: EMPID
B: BUDGET
C: PROJECT
D: HOURS

Given R(A,B,C,D) AC-->D, C-->B
AC-->BD = augmentation because C determines B, you don't need A,
however, you do need AC to determine hours.
AC-->D, C-->B = 3NF

empID, project --> budget, hours should be changed to:

empID, project --> hours;
project --> budget.




Note the following:
  1. No anomalies will be created if budget is changed
  2. No dummy values are needed for projects that have not employees assigned
  3. If an employee's contribution is deleted, no important data is lost
  4. No anomalies are created if an employee's contribution is added.


First Normal Form (1NF) - Repeating Groups Removed

Remove any multivalued attributes or repeating groups
  • In the first normal form only single values are permitted at the intersection of each row and column hence, there are no repeating groups

  • To normalize a relation that contains a repeating group, remove the repeating group and form two new relations

  • The primary key of the new relation is a combination of the primary key of the original relation plus an attribute from the newly created relation for unique identification. (The PK determines all attributes)

Example: An Unnormalized Table

Student_Grade_Report(#Studentno, student_name, major, course_no, course_name, instructor_no, instructor_name, instructor_location, grade)

The course info is a repeating group. Take it out.

Student(#Studentno, student_name, major)
Student_Grade_Report(#Studentno, #course_no, course_name, instructor_no, instructor_name, instructor_location, grade)

Second Normal Form (2NF) - Partial Dependencies Removed

Q: Why upgrade to 2NF?
A: We still have update anomalies in 1NF

To illustrate...
  • In the above table to add a new course we sould need a student.
  • When Course information needs to be updated, we may have inconsistencies
  • In deleting a student, we may also delete critical information about a course
Step forward to 2NF:
  • The relation must first be in 1st normal form.
  • The relation is automatically in 2NF if, and only if, the primary key comprises a single attribute.
  • If the relation has a composite primary key, then each non-key attribute must be fully dependent on the entire primary key and not on a subset of the primary key (i.e. there must be not partial dependency).
Example: Removing the partial depedencies

Student(#Studentno, student_name, major)
Course_Grade(#Course_no, #Studentno, grade)
Course_Instructor(#Course_no, course_name, instructor_no, instructor_name, instructor_location, )

Third Normal Form (3NF) - Transitive Dependencies Removed


Q: Why upgrade from 2NF?
A:We still have update anomalies because of leftover transative dependencies.

To illustrate...
  • When adding a new instructor, we need to have a course.
  • Updating course information could lead to inconsistencies for instructor information.
  • Deleting a course may also delete instructor information
Moving on to 3NF:
  • The Relation must be in 2nd normal form
  • Remove all transative dependencies - a non-key attribute may not be functionally dependent on another non-key attribute.
Student(#student_no, student_name, major)
Course_Grade(#student_no, #course_no, grade)
Course(#course_no, course_name, instructor_no)
Instructor(#instructor_no, instructor_name, instructor_location)


Now you should be anomaly free. You are good to update with out losing info, or creating inaccurate info.


Saturday, July 01, 2006

Terms for Project 1

Conceptual Data model

Features of conceptual data model include:

  • Includes the important entities and the relationships among them.
  • No attribute is specified.
  • No primary key is specified.

At this level, the data modeler attempts to identify the highest-level relationships among the different entities.

--1keydata.com



Logical Data Model

Features of logical data model include:

  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.

At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.

In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).

The steps for designing the logical data model are as follows:

  1. Identify all entities.
  2. Specify primary keys for all entities.
  3. Find the relationships between different entities.
  4. Find all attributes for each entity.
  5. Resolve many-to-many relationships.
  6. Normalization.
--1keydata.com


UML Notation


Business Rules

The descriptive statements of how a business is run. These statements describe how the various entities relate to each other. One example would be the rule that states an Accounts Payable vendor could be one of three types: supplier, transportation provider, or governmental agency.

--olenick.com

Strong vs Weak Entities

An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set.

One that does have a primary key is called a strong entity set.

For example,
  • The entity set transaction has attributes transaction-number, date and amount.
  • Different transactions on different accounts could share the same number.
  • These are not sufficient to form a primary key (uniquely identify a transaction).
  • Thus transaction is a weak ent
For a weak entity set to be meaningful, it must be part of a one-to-many relationship set. This relationship set should have no descriptive attributes. (Why?)

The idea of strong and weak entity sets is related to the existence dependencies seen earlier.
  • Member of a strong entity set is a dominant entity.
  • Member of a weak entity set is a subordinate entity.
A weak entity set does not have a primary key, but we need a means of distinguishing among the entities.

The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. The primary key of a weak entity set is formed by taking the primary key of the strong entity set on which its existence depends (see Mapping Constraints) plus its discriminator.

To illustrate:
  • Transaction is a weak entity. It is existence-dependent on account.
  • The primary key of account is account-number.
  • Transaction-number distinguishes transaction entities within the same account (and is thus the discriminator).
  • So the primary key for transaction would be (account-number, transaction-number).
Just remember: The primary key of a weak entity is found by taking the primary key of the strong entity on which it is existence-dependent, plus the discriminator of the weak entity set.

--CMPT-354-Han-95.3 Lecture Notes



Cardinality & Participation

Cardinality: business rules that specify how many times (min and max) an entity can be related to another entity in a given relationship.

Participation: The participation constraint represents whether all entity occurences are involved in a particular relationship (mandatory participation) or only some (optional participation).





Generalization/Specification

specification
is the process of maximizing the differences between member of and entity by identifying their distinguishing characteristics.
generalization is the process of minimizing the differences bewteen entities by identifying their common chartacteristics.




Referential Integrity

Referential Integrity is an integrity constraint that specifies that the value (or existence) of an attribute in one relation depends on the value (or existence) of an attribute in the same or another relation.

If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in it's home relation or the foreign key value must be wholly null.


For example, branchNo in the Staff relation is a foreign key targeting the branchNo attribute in the home relation, Branch. It should not be possible to create a staff record with branch number B025. However we should be able to create a new staff record with a null branch number, to cater for the situation where a new member of staff has joined the company but has not yet been assigned to a particular branch office.