Introduction
So far, we have examined the relational model, discussed relational algebra, and looked at SQL. In each case we assumed that relations/tables were defined and available. However, except for simple problems, the process of determining suitable relations/tables for a given problem is a non-trivial one. The phrase database design is often used to describe this process. We usually consider this process to involve four steps:
Creating a logical representation of the structure of a database. This step results in two primary artifacts:
Several data modeling techniques are available. We will use the ER modeling technique proposed by Peter Chen in 1976.
Converting the logical representation into a schema. For a relational database design this involves determining table structures.
Refining the database schema to address weaknesses that may result in modification anomalies (see page 126). These weaknesses usually arise from dependencies between attributes.
Creating the physical database structures to support the logical representation. For a relational database design this involves determining the SQL DDL statements needed for each table.
Note: The following material should be read in conjunction with the readings outlined for Chapter 3 below.
The ER Data Model
The ER data model represents the things of interest to an organization as a set of basic objects, referred to as entities, and relationships between the objects. It is a graphical model. That is, these constructs are represented as annotated graphical primitives.
Following is an overview of the fundamental constructs of the ER model, the corresponding notation (graphical primitives), and heuristics/rules for determining the annotation.
Entities
As originally defined by Peter Chen, an entity is an object that exists and is distinguishable from other objects. For instance, John Smith with SSN 111-12-3456 is an entity, since he can be uniquely identified as a particular person. An entity may be concrete (i.e a person, a book, etc.) or abstract (i.e a project, a holiday, etc.). An entity class (or entity set) is a set of entities of the same type (e.g all employees of a bank).
Given a problem statement, or a set of requirements, we can often determine candidate entities by identifying noun phrases. However, the actual list of entities is a function of the needs of the organization.
Entities are denoted by rectangles with the name of the entity written inside the rectangle.
Note: We will use the term entity to refer to an "entity set" and the phrase entity instance to refer to occurrences of a particular entity (i.e such as John smith above).
Attributes
The characteristics, or properties, of an entity that describe it are known as attributes. For example, we may determine that Name, SSN, Address, Age, PhoneNumber are attributes of the entity EMPLOYEE. We are usually interested in the domain of an attribute (i.e the set of permitted values). In the case of Age for example, it must be a positive integer.
Some attributes may not be atomic. They may represent groups of attributes. For example Address may represent the attributes Street, City, State, Zip. Such attributes are said to be composite attributes.
Attributes may be divided into two groups:
Attributes are denoted by ovals and are connected to the corresponding entity by lines. The name of the attribute is written within the oval. Identifier names are underlined.
Relationships
A relationship is an association between several entities. For example, consider the entities DEPARTMENT and EMPLOYEE. We may define the relationship CONSISTS-OF to denote the association between DEPARTMENTS and their EMPLOYEES.
A relationship is denoted by a diamond and is connected to the associated entities by lines. The name of the relationship is written beneath the diamond.
Given a problem statement, or a set of requirements, we can often determine candidate relationships by identifying verbs.
Relationships are described in terms of:
Degree refers to the number of entities associated in the relationship. We use the following terms to refer to the degree of a relationship:
For example, the CONSISTS-OF relationship above is a binary relationship.
Cardinality refers to the maximum number of instances of an entity associated with each instance of another. Given two related entities A and B, the cardinality must be one of:
Heuristic:
Given two related entities, say A and B,
you may determine the
cardinality of the relationship in two steps.
First, select one of the entities, say A, and ask:
For each instance of A, what is the maximum number of instances of B that it may be associated with.
The answer will either be 1 or more than 1 in which case we say that it is N. Second, select the other entity, and repeat the question.
The cardinality of a relationship is written inside the diamond used to denote the relationship.
The appropriate cardinality for a particular relationship depends on the problem being modeled. Consider the CONSISTS-OF relationship above. It may be the case that each department consists of many employees but each employee is associated with a single department. If so, then we say the CONSISTS-OF relationship is 1:N.
A relationship may also have descriptive attributes. For example, we may have a M:N relationship, WORKS-ON, between the EMPLOYEE entity and a PROJECTS entity. The attribute DateStarted may be associated with the relationship to indicate when a particular employee started on a particular project. In such cases, we use the oval notation and connect the oval to the diamond.
Note: See Fig 3-3 for additional examples.
A relationship may exist among entities of the same type (see degree 1 relationships above). Such entities are referred to as recursive relationships. See Fig 3-5 for an example. Note however that the cardinality for that example should be M:N not 1:N.
In determining the cardinality of a relationship, we were concerned about the maximum number of instances of an entity that could be involved in a relationship. We may also consider the minimum number. If the number is zero, we indicate this by superimposing a zero on the line representing the relationship, and adjacent to the entity. If the number is one, we indicate this by superimposing a dash perpendicular to the line representing the relationship, and adjacent to the entity. This quantity is referred to as the minimum cardinality of the relationship. The example presented in Fig 3-4 indicates that each dormitory is occupied by a minimum of one and a maximum of many students but each student is associated with a minimum of zero and a maximum of one dormitory.
Weak Entities
A weak entity is an entity whose presence in the database depends logically on another entity. We denote such entities by a rectangle with rounded corners. Hence, for a weak entity to be meaningful, it must be part of a 1:N relationship. See the example presented in Fig 3-7.
Recall, that attributes that may contain multiple values (i.e. repeating groups like arrays) are not allowed in the ER model (see page 52). We may use weak entities to represent such attributes. That is, for any attribute known to be multi-valued:
See page 58 and Fig 3-9.
Readings: Chapter 3, pg 51-58