Lesson 5: Supertypes and Subtypes
Estimated lesson time: 20 minutes
In this final lesson, you learn about supertypes and subtypes. Many times, you have a hierarchy in your data, but you do not know the number of hierarchy levels in advance. Let’s look at how to model this kind of problem.
Supertypes and Subtypes
Two entities are of distinct types if they have no attributes in common. It is possible for entities to have both common and distinct attributes. If they have a common identifier (that is, a common primary key), they have a special supertype-subtype relationship: they are neither distinct nor the same. You use supertypes and subtypes to represent different levels of entity generalization. Normalization and denormalization are about breaking down and assembling. Supertyping and subtyping are about generalization and specialization. Think of normalization as widening and subtyping as deepening. When you analyze your business problem, the verb is usually leads to a supertype/subtype relationship. For example, a company is a customer, and a person is a customer as well. Obviously, companies and persons have something in common.
In the companies/persons example, you started with a bottom-up approach. You can continue this generalization further. For example, both customers and suppliers are partners. You could also start from the top and discover specializations. The question is where to stop with this process. It is easy to find where to stop when you use the top-down approach: specialization makes sense only if subtypes have additional attributes. With the bottom-up approach, you could finish with just a few entities—for example, with subjects, objects, and events. Theoretically, you should stop when you reach abstract objects, objects that do not exist in the real world. For example, a computer table is a table, and a table is furniture; however, when you come to a store, you do not order “a piece of furniture.” Still, sometimes it is good to have a supertype just to share common identification (that is, a common primary key). This enables you to gather all the information for a supertype for advanced analysis. A practical approach is to stop when you have a problem naming the supertype. If you cannot name it quickly, probably nobody would need to perform analysis on that supertype.
If you overlooked supertypes and subtypes when you analyzed your business problem or if you are working on refining an existing model, you can still identify them from the existing model. A huge table with sparse known values and many NULL values is a candidate for specialization. Check whether those unknown values are really unknown or whether they are simply meaningless for some rows. If they are meaningless, you can get rid of them when you introduce subtypes. Using the bottom-up approach from an existing model is possible only if the model stays with a naming convention; otherwise, you have to re-analyze the business problem. Tables that have many columns with similar or even the same names probably need a supertype table.
Modeling Hierarchies
A business problem can introduce something that, from a business perspective, is a called a hierarchy. For example, an employee organizational chart is a hierarchy. A bill of materials is another hierarchy. From a mathematical point of view, hierarchies are actually graphs and trees. In graph theory, a graph is a set of nodes (also called points or vertices) connected by links (called lines or edges). In a graph, you can arrive at a node through different paths. You can travel in any direction. Links can have different weights. Paths can make cycles. A tree is a special kind of graph in which any two nodes are connected by exactly one path. A forest is a graph in which any two nodes are connected by at most one path; a forest is, of course, a set of trees. An employee hierarchy is usually a tree, and a bill of materials is a directed, acyclic graph.
You can model a tree or a forest by using a single table that contains two columns connected by a foreign key. Figure 2-5 shows the Employees table with a hierarchy modeled. A manager is also an employee, so there is a row for each manager in the table. Each employee has a unique EmployeeId. Employees also have managers; this structure is modeled through the ManagerId column. A foreign key connecting the Employees table to itself uses EmployeeId as the parent and ManagerId as the child column. A single manager, who is an employee, can manage multiple employees. You can denote the highest-ranking employee by using a ManagerId value of unknown (NULL) or one that is the same as the EmployeeId value.
Figure 2-5 Employees table with hierarchy—a tree
For modeling a graph, you need two tables. Figure 2-6 shows a model for bill of materials. You need a table for parts (materials, semi-products, products) and a separate one for bill of materials, which joins parts into assemblies. The BillOfMaterials table has a primary key made up of the PartId and AssemblyId columns; both are parts, so you have two foreign keys from the BillOfMaterials table to the Parts table.
Figure 2-6 Parts and BillOfMaterials tables with hierarchy—a graph
To present the data to end users, you have to resolve the hierarchies. In previous versions of SQL Server, this was a tedious job. You needed a loop to resolve a hierarchy, implemented either through a WHILE construct or with recursion. In SQL Server 2005, you can use recursive common table expressions (CTEs) to resolve hierarchies. CTEs also perform better than loops.
Practice: Supertypes, Subtypes, and Hierarchies
You have a fully normalized model (as before the denormalization practice) for your database that supports project management applications. You discover that the business has expanded to offer support and training services, and you need to include these services in your database design. Support and training are services similar to projects, and each service can have its own hierarchy. For example, a project can be part of a bigger project, which can include training and support. Your current design is:
Projects(ProjectId, ProjectName, CustomerId, StartDate, CustomerName, TotalTimeSpent)
ProjectDetails(ProjectId, ItemId, ActivityId, EmployeeId,
WorkDate, TimeSpent)
Customers(CustomerId, CustomerName)
Activities(ActivityId, ActivityName)
Employees(EmployeeId, EmployeeName)
Exercise 1: Find Supertypes and Subtypes
In this exercise, you need to find supertypes and subtypes.
Find supertypes and subtypes. Services are the supertypes of projects.
You have to change all references to the projects to services. Change the name of the Projects and ProjectDetails tables to Services and ServiceDetails. Change the names of the ProjectId and ProjectName attributes to ServiceId and ServiceName.
Add the service type to distinguish between services. You should add a lookup table for different service types.
Your improved design should look like the following.
Services(ServiceId, ServiceName, ServiceTypeId, CustomerId, StartDate, CustomerName, TotalTimeSpent) ServiceTypes(ServiceTypeId, TypeName) ServiceDetails(ServiceId, ItemId, ActivityId, EmployeeId, WorkDate, TimeSpent)
This design might surprise you; you might have expected explicit subtype tables for projects, training, and support. However, in the business problem description, there is no information about specific attributes for each service type. Introducing subtype tables would just complicate the design; all you need is a column that shows service type. This is an example of subtypes hidden, or stored implicitly, in a supertype; if there is no additional information about subtypes, they collapse to a single status attribute of a super-type.
IMPORTANT Find supertypes in the conceptual design stage
If your database was already in production, then you could not change the names of the tables and columns so easily. Changing the names would probably lead to changes in your application as well. Therefore, it is very important to find supertypes in advance, in the conceptual design stage.
Exercise 2: Find Hierarchies
In this exercise, you need to find hierarchies.
You need to model the hierarchy (tree) of the services. What you need is a column for the identification of the parent service.
In addition, add a foreign key between the ServiceId and ParentServiceId columns. Your improved design should look similar to this:
Services(ServiceId, ServiceName, ServiceTypeId, ParentServiceId, CustomerId, StartDate, CustomerName, TotalTimeSpent)
IMPORTANT Remember foreign keys
In this brief notation, the foreign key is not explicitly shown. However, you should not forget about it.
Quick Check
- How do supertypes and subtypes differ from graphs and trees?
- How would you model a road system?
Quick Check Answers
- Supertypes and subtypes show hierarchy in the structure; graphs and trees show hierarchy in the data.
- You should model a road system as a graph with cities as nodes.
© Microsoft. All Rights Reserved.