Introduction to Database Design Video
This video is a one hour presentation that helps people quickly understand the fundamentals of relational tables and the normalization design process. The presentation explains the normalization design process. It then walks through an example of using the process to design a set of tables for an order form.
Video Information
Applies to: SQL Server Database Engine, all versions
Author: Alan Brewer, Microsoft Corporation
Length: 00:58:32
Size: 45 MB
Type: WMV file
Video Summary
This list summarizes the material covered in the presentation.
- Introduction (timestamp: 00:00)
- Problem Statement (00:30)
- Many senior DBAs have to correct poorly designed databases others have built.
- Most of the problems comes from Application area. Database design is one such area where most of the problems arise with a poorly designed database.
- Many times the database tables are poorly structured, and it is expensive to change tables.
- Many new people cannot find information that quickly explains the fundamentals of database design.
- This presentation covers the fundamentals in an hour.
- What You Will Learn (timestamp 02:33)
- The basic table design process and relational concepts.
- Gain the knowledge to more easily understand the more extensive database design material in books and on the Web.
- Relational Databases (03:28)
- The fundamental units of storage are called relations. Relations look like simple 2-dimensional arrays, but relational theory defines rules about how to structure the relations to store data efficiently.
- Set theorists originally defined and mathematically proved the logic of the relational rules in the 1800s. The rules were applied to database systems by Dr. E. F. Codd at IBM in the 1960s.
- SQL databases are the most commonly used relational databases in commercial systems. They comply with the relational rules, but typically use a different terminology, such as using the term table instead of relation.
- There are two table design processes proven to result in efficient structures for storing data; normalization and entity framework modeling.
- Main Table Components (08:23)
- Well-designed SQL databases have two main types of tables. Entity tables store the data about the things the database models, such as courses, students, teachers, and classrooms. Relationship tables store the data specific to the relationships between entities, such as an OrderProducts table that records which products were requested on each order.
- The vertical dimension of a table is called a column, and represents a characteristic of the thing the table models. An example is a part table that has columns for PartNumber, PartName, PartColor, and PartWeight.
- Each column has a data type defining the format of the data that can be entered in the column. Each column also has a domain that specifies the data values that have meaning for use in the column. An example is a domain that limits the values in a PartColor column to the names of colors like red, blue, or green.
- The horizontal dimension of a table is called a row, and represents an individual in the class of things the table models. An example is a part table that has one row for part #123, and another row for part #124.
- Table Keys (11:45)
- Candidate keys are a column or group of columns whose values are unique for each row in the table. The values returned for the other columns in the table are said to depend on the value specified for the candidate key.
- One candidate key is chosen as the primary key, which uniquely identifies the rows in the table. The primary key should also be relatively small, and have values that do not change frequently.
- A foreign key is a column in one table that has the same domain as the primary key in another table. The foreign key can be used to join to the other table. For example, an OrderID foreign key column in an OrderProducts table refers to the OrderID primary key in an Order table. The two keys establish the relationship between products and orders.
- An Example Table (17:12)
- The slide shows a simple graphic illustrating how you can conceive of a table.
- Designing a Set of Tables (17:38)
- The process of designing database tables is one part of the larger process for designing a database.
- First interview your users to establish the data requirements. For the database, consider the application designers and developers as your users. Interview them for requirements that the internals of the application drives, and are not known by the end users. Also, do not use the user forms and reports to establish the formats of the tables. Design the tables to support all the features of the application, not specific forms or reports.
- The key step is to use either normalization or entity-relationship modeling to design a set of tables that efficiently meet all of the data requirements. This results in a logical model that can be implemented on any relational system.
- Validate your logical model against the data requirements, and get sign-off from the application designer and developers.
- Use the logical model to define a physical model. The physical model has the SQL statements used to create the set of tables on the relational database used for the project.
- In iterative development processes, such as sprint-based projects, a starting database design is done for an early sprint. In later sprints, columns or tables are added to support new functionality developed in those sprints. You can minimize the chance of needing significant design changes late in the project by normalizing or ER modeling each iteration of the database design.
- Normalization (23:04)
- The normalization design process starts with a set of prototype tables called data views. Normalization then applies a series of rules called normal forms; such as applying the rules for 1st, 2nd, 3rd, 4th, and 5th normal forms. Each normal form assumes that you have applied the rules from the previous normal forms, and specifies a new rule. The result of applying the rules is a set of tables that are efficient at storing data.
- Boyce-Codd normal form is a redefinition of the rules embodied in the 2nd and 3rd normal forms.
- You should at least go to third normal form or Boyce-Codd normal form. It is a good practice to apply the rules for 4th and 5th normal form.
- Normal Forms Explained (25:09)
- Outlines the rules of the normal forms and the steps to take during the normalization process.
- Start with data views. Ensure that the data views have all of the columns required to store all of the data identified when gathering the business requirements. Establish the domains for all the columns, and identify the candidate keys in the data views.
- To go to first normal form, ensure that all columns represent atomic units of data. Then move any repeating groups of keys or columns to new tables.
- To go to second normal form, move any columns that only depend on part of a composite key to a new table.
- To go to third normal form, move any columns that depend on a key that is not in the primary key to a new table.
- To go to fourth normal form, ensure that none of your tables have multiple, independent multivalued facts.
- Exercise Start: Normalizing a Form (31:00)
- Shows the order form used in the normalization exercise.
- Steps to Build Data Views (31:18)
- List the data views.
- Specify the domains for each column, identify the candidate keys, and bracket or otherwise mark any repeating keys or groups of columns.
- Example Data View (31:47)
- Shows an example of a single data view defined for the order form. Discusses the parts of the data view that do not comply with the rules for the first normal form.
- Data Views to first Normal Form (33:37)
- Split columns that are not atomic into separate columns for each part of the original.
- Move any repeating groups to a new table. The key of the new table is a composite key. It combines the key from the source table with whichever key column the repeating group depends on.
- First Normal Form Example (34:17)
- Shows the set of tables that result from applying the rules of the first normal form to the example data view. Discusses how the rules are applied, and the parts of the tables that do not comply with the rule for second normal form.
- First Normal Form to second Normal Form (36:07)
- Place any columns that depend on only some of the columns in a composite primary key into a new table. The key for the new table is that column or columns in the original key that the moved columns depend on.
- Second Normal Form Example (36:20)
- Shows the set of tables that result from applying the second normal form rule to the example of the first normal form. Discusses how the new tables created by applying the 1st and 2nd normal form rules now represent a typical relationship table between two entity tables. Discusses the parts of the tables that do not comply with the rule for third normal form.
- Second Normal Form to third Normal Form (38:22)
- Move columns that do not depend on a primary key to a new table. Leave the key column in the source table as a foreign key to the new table.
- Third Normal Form Example (38:38)
- Shows the set of tables that result from applying the rule for the third normal form to the example of the second normal form. Discusses the design elements that meet the requirements specified during the exercise, but are likely to need further changes to meet real world requirements.
- Normalization: Final Steps (43:40)
- Discusses the need to apply the normalization process on all forms, reports, and web pages that use the database. When normalizing multiple items, you typically end up with multiple forms or reports that generate tables with the same primary key domain. Merge those tables in the final database design. For example, if normalizing an order form and a warehouse inventory form generates two Product tables with ProductID primary keys, merge them into one table.
- Example: Completed Order (44:49)
- Shows the original order form filled in with the data for one order.
- Example: Tables With Order Data (45:01)
- Shows how the data from the completed order would be inserted as rows in the tables designed in the exercise.
- Fourth Normal Form (46:23)
- Illustrates applying the rule to break apart a table that has multiple, independent, multivalued columns.
- Implement a Physical Design (50:05)
- A discussion of a common mistake made when creating the physical model from a logical model; using simplistic rules to define indexes. An example would be defaulting to having a clustered index on the primary key, which is often not the best use of a clustered index.
- Design indexes based on an analysis of how the table columns are referenced in the WHERE clauses of the SQL statements the applications run. Tools such as the SQL Server Database Tuning Advisor can help determine the best set of indexes to support the applications that use the database.
- What to Learn Next (52:50)
- Practice a lot to gain experience in making the practical decisions that result in a good design. Don’t wait for the next project where you are the database designer. Pick different reports, forms, or web pages and go through an exercise to design a set of tables for that data.
- Read books on database design or data modeling. A good book specific to SQL Server is Pro SQL Server Relational Database Design and implementation. To find other books, search a book store site for either “database design” Or “data modeling”. Pick a book that has high user ratings.
- Learn more about the internals and specific behaviors of the database system you typically use. For SQL Server, books such as the “Inside SQL Server” series are excellent resources. Also, find websites that cover your database system, such as blog sites or third-party sites that specialize in information about the database you use.
- Find forums that other customers use to exchange information and answer questions about the database system. For example, MSDN provides a forum specific to SQL Server database design questions: SQL Server Database Design Forum.