SQL Database Design Tips
Introduction
As a developer you may always use ORMs for CRUD operations, but every developer should have at least a basic knowledge of databases and queries, because ORMs can run into limitations once you get into complicated queries.
Anyhow, ORMs versus raw queries is not the topic of this post. If you are a DBA, or you are interested in using SQL Server, a relational database management system, for your back end, there are a few things to consider in order to get a well-designed database and good performance. Likewise, if you manipulate data as a data scientist or data analyst, it's important to know how to write efficient queries.
This article briefly covers some tips to keep in mind when designing databases in SQL Server.
Normalize & Denormalize
Normalization is a way of organizing data in a database to reduce redundancy, improve data integrity, and avoid maintenance problems.
It is done by following some simple rules, the so-called normal forms: 1NF, 2NF, and so on.
The first 3 applicable forms are as follows:
Rule No. 1 (1NF): do not store multiple values in a single cell.
For example, in the following table the column "employee_phone" is multi-valued: it holds more than one value.
| employee_no | employee_name | employee_phone |
| --- | --- | --- |
| 452 | Simin Maleki | 00989199999999, 00989181111111 |
You can apply First Normal Form by making the "employee_phone" attribute single-valued, as in the table below:
| employee_no | employee_name | employee_phone |
| --- | --- | --- |
| 452 | Simin Maleki | 00989199999999 |
| 452 | Simin Maleki | 00989181111111 |
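As a sketch, the 1NF version of the table above could be declared like this (data types and the constraint name are assumptions; the column names come from the example):

```sql
-- Hypothetical DDL for the 1NF version: one phone number per row,
-- with (employee_no, employee_phone) as a composite primary key.
CREATE TABLE employee (
    employee_no    INT           NOT NULL,
    employee_name  NVARCHAR(100) NOT NULL,
    employee_phone VARCHAR(20)   NOT NULL,
    CONSTRAINT pk_employee PRIMARY KEY (employee_no, employee_phone)
);
```

Note that "employee_name" is still repeated per phone number; that kind of redundancy is what the later normal forms address.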
Rule No. 2 (2NF): remove partial dependencies by splitting the table in two. When the primary key consists of more than one column (a composite primary key), every non-key value must depend on the whole key.
In the example table below, the attributes "employee_no" and "project_no" form a composite primary key. The attribute "project_budget", however, depends only on part of the key ("project_no") and does not describe "employee_no"; therefore, this table is not in Second Normal Form.
| employee_no | project_no | project_budget |
| --- | --- | --- |
| 452 | 19852 | 10000000 |
| 826 | 19852 | 10000000 |
| 624 | 84256 | 2500000 |
| 452 | 84256 | 2500000 |
| 750 | 65428 | 10000000 |
In order to correct this, we split the table into two tables, in which "project_no" is a foreign key in the employee table.
| employee_no | project_no |
| --- | --- |
| 452 | 19852 |
| 826 | 19852 |
| 624 | 84256 |
| 452 | 84256 |
| 750 | 65428 |
| project_no | project_budget |
| --- | --- |
| 19852 | 10000000 |
| 84256 | 2500000 |
| 65428 | 10000000 |
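A minimal DDL sketch of this split (data types and constraint names are assumptions) could look like:

```sql
-- Project table: project_budget now depends on the whole key, project_no.
CREATE TABLE project (
    project_no     INT    NOT NULL PRIMARY KEY,
    project_budget BIGINT NOT NULL
);

-- Assignment table: composite key, with project_no referencing project.
CREATE TABLE employee_project (
    employee_no INT NOT NULL,
    project_no  INT NOT NULL,
    CONSTRAINT pk_employee_project PRIMARY KEY (employee_no, project_no),
    CONSTRAINT fk_employee_project_project
        FOREIGN KEY (project_no) REFERENCES project (project_no)
);
```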
Rule No. 3 (3NF): every non-prime attribute must depend directly on the primary key, not on another non-prime attribute. If a column depends on a non-key column, move the pair into a separate table.
In the example below, "department_name" depends on "department_id", which is a non-prime attribute, and does not describe "employee_no", the primary key of this table.
| employee_no | department_name | department_id |
| --- | --- | --- |
| 452 | Financial | 52 |
| 826 | HR | 10 |
In order to fix the issue and satisfy 3NF, we drop "department_name" from the employee table and create another table called department. "department_id" then becomes a foreign key in the employee table.
| employee_no | department_id |
| --- | --- |
| 452 | 52 |
| 826 | 10 |
| department_name | department_id |
| --- | --- |
| Financial | 52 |
| HR | 10 |
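In DDL form, the fix could be sketched like this (data types and constraint names are assumptions):

```sql
-- Department table holds the transitively dependent attribute.
CREATE TABLE department (
    department_id   INT          NOT NULL PRIMARY KEY,
    department_name NVARCHAR(50) NOT NULL
);

-- Employee table keeps only a foreign key to department.
CREATE TABLE employee (
    employee_no   INT NOT NULL PRIMARY KEY,
    department_id INT NOT NULL,
    CONSTRAINT fk_employee_department
        FOREIGN KEY (department_id) REFERENCES department (department_id)
);
```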
Sometimes you are allowed to disobey !!
It's recommended to apply 3NF to data that changes frequently; for data that rarely changes, normalization can hurt read performance, and it would seem unusual to move every column that does not describe the primary key into a new table.
There are other normal forms beyond these, but they are rarely practical or used in the real world.
Normalization is a must if you want to look professional. Do it, even though you may later need to denormalize some already-normalized tables due to performance issues.
For Denormalization, Normalize First
If, and only if, you run into performance issues that cannot be solved by query optimization, indexing and the like, you may consider some denormalization.
For example, if a simple query requires joining 10 tables, denormalization is worth considering as a solution.
This action has pros and cons: the upside is better performance when querying the tables; the downside is duplicate data and worse performance for other operations such as deletes, inserts and updates.
varchar vs nvarchar
N stands for National language character set and denotes a Unicode string. When an application grows, supporting clients from different locales becomes important.
The nvarchar type takes twice the space of varchar: 2 bytes per character versus 1.
Because of that, it was long recommended to use varchar instead, to keep tables smaller and use less space.
But given that storage and memory are much cheaper nowadays, that may no longer be the right choice. Modern development platforms also use Unicode internally, so with nvarchar you avoid encoding conversions when writing to or reading from the database.
Using nvarchar may make a query take longer when reading large columns, but take into account that conversions also take time and are prone to errors.
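For instance (table, column names and sizes here are purely illustrative), Unicode columns and literals look like this in T-SQL:

```sql
CREATE TABLE customer (
    id_customer  INT           NOT NULL PRIMARY KEY,
    full_name    NVARCHAR(100) NOT NULL,  -- 2 bytes/char, any locale
    country_code VARCHAR(2)    NOT NULL   -- 1 byte/char, ASCII is enough
);

-- Note the N prefix on the literal: without it, a Unicode string may be
-- silently converted to the database's default code page and lose data.
INSERT INTO customer (id_customer, full_name, country_code)
VALUES (1, N'سیمین ملکی', 'IR');
```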
Naming Conventions
It's a good approach to follow some rules for naming everything in your database, especially tables and columns. It makes life easier for you, your team, and future colleagues working on the project. Keep in mind that your database may never die and remain forever, so it's important to define proper names for objects from the beginning.
And these are some suggestions:
- Name your tables in the singular e.g. city, employee, student
- Use lower case
- Use underscore for separating the words e.g. student_course, employee_role
- Don't use underscore at the beginning
- Don't use reserved keywords
- Use the same naming pattern for all primary keys unless you have a composite primary key. Preferably do not use a bare "id"; use [table name] + [id], e.g. id_student
- For foreign key columns, use the same combination of id and table name as the referenced primary key, e.g. id_customer, id_employee, id_student
- Don't use long names for tables and columns; as a standard, up to 26 characters is recommended for table and column names. Use abbreviations if a name would be long, preferably standard abbreviations for database object naming
- Don't use data types in naming columns
- Preferably, if the same column name would appear in different tables, differentiate them, e.g. student_name, employee_fullname
- Name boolean columns as "is_name" e.g. is_deleted
- Define your own schemas to group tables or other objects by concept rather than putting everything under dbo, e.g. hr., fin.
- In views, use aliases (with AS) for computed columns
- Use @ for defining a local variable or parameter.
Since naming conventions are a standard defined by organizations or projects, you may simply have to follow theirs.
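Putting several of the suggestions above together (singular table name, lower case, underscores, id_ prefix for keys, is_ prefix for booleans; the schema and columns here are hypothetical):

```sql
CREATE TABLE hr.employee (
    id_employee   INT           NOT NULL PRIMARY KEY,  -- [table name] + id
    id_department INT           NOT NULL,              -- foreign key, same pattern
    full_name     NVARCHAR(100) NOT NULL,              -- no data type in the name
    is_deleted    BIT           NOT NULL DEFAULT 0     -- boolean named is_...
);
```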
Do Not rush into defining Indexes
There are many different types of indexes you can create. The main purpose of an index is to improve query performance, alongside the many other actions that can be taken to improve the performance of a database.
BUT ... the main point is: do not create indexes up front when you are creating tables. Create them once you have queries that need to run faster and an indexing strategy may help. That is the time to create indexes on the columns used in WHERE clauses, ORDER BY, GROUP BY, etc.

Another consideration you must take into account: do not create an index on every column. Changing indexes is a bit complicated and may require downtime, dropping all other indexes on the table, and recreating them. So do not hurry to create them at the early stages. Of course, as the application is upgraded you will write new queries; that is when review and index tuning come into action.
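For example, once a query that filters on a particular column turns out to be slow, a non-clustered index can be added for it (all names here are hypothetical):

```sql
-- Suppose this query turned out to be slow:
--   SELECT id_employee, full_name FROM employee WHERE id_department = 52;
-- A non-clustered index on the filtered column can speed it up:
CREATE NONCLUSTERED INDEX ix_employee_id_department
    ON employee (id_department)
    INCLUDE (full_name);  -- included column lets the index cover the query
```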
Recovery Model and Backup Plan
Choose your desired recovery model during the database design process. Remember that the full recovery model lets you restore data up to any transaction, but it also slows down the system.
Selecting the proper recovery model depends on your business requirements.
After selecting your recovery model, design a backup strategy for your databases which defines the type and frequency of backups.
Schedule your regular backups in a database maintenance plan or, alternatively, schedule a job for that.
After designing your backup strategy, test your backups by restoring them as a copy database in a test environment.
As Microsoft recommends, schedule full database backups for an off-peak period. It's common to take a full backup every weekend, with differential backups and perhaps transaction log backups between the two full backups.
As a Best Practice, choose a different physical storage for database backups from the database files.
Important: if you take a full backup in a location other than where your plan/job normally saves backups, you will break the backup chain. To avoid this, use copy-only backups.
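An ad-hoc full backup that does not break the chain could be taken like this (the database name and path are placeholders):

```sql
-- COPY_ONLY keeps the differential base and the log backup chain intact,
-- so the scheduled backup plan is unaffected by this ad-hoc backup.
BACKUP DATABASE MyDatabase
    TO DISK = 'D:\AdHocBackups\MyDatabase_copyonly.bak'
    WITH COPY_ONLY, COMPRESSION;
```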
Control the Transaction Log file
The engine writes a log record for every single operation in the database, including starting/ending a SQL transaction when data is modified, creating/dropping a table, page allocation/deallocation, and so on. As a result, the transaction log file grows rapidly until it reaches its maximum size or, if it is set to auto-grow, until you run out of free space on the disk drive. Therefore, you must monitor and manage SQL Server transaction log growth.
Some situations prevent the transaction log file from being truncated automatically to free up space for reuse, such as running only full backups under the full recovery model, uncommitted transactions, a stopped SQL Server Agent service, long-running full and differential backups, etc.
When you run into free-space problems with the transaction log file, first check the file size setting and, if possible, extend it. If that's not possible, change your recovery model to simple (if it isn't already), which forces log truncation. If that's not applicable either, investigate what is preventing the SQL transaction log file from being truncated; the log_reuse_wait_desc column of the sys.databases system catalog view can help.
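The check mentioned above is a simple query:

```sql
-- Shows, per database, what (if anything) is preventing log truncation,
-- e.g. LOG_BACKUP, ACTIVE_TRANSACTION, or NOTHING.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases;
```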
When the recovery model is anything other than simple, the transaction log file is not truncated automatically, so you have to take transaction log backups or the file will grow continuously. If you have a backup plan, add transaction log backups to it, taken between the full backups.
To monitor the growth of the SQL transaction log file, tools such as System Center Operations Manager (SCOM), Performance Monitor counters, or an alert that reads from one of the system catalog views are good choices.
Truncating the transaction log file frees up space for reuse, but it does not reduce the size of the log file. For that, you can perform a shrink operation, which is especially useful after an operation that generates a large number of transaction log records.
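A shrink of the log file could be sketched like this (the database name, logical file name, and target size are placeholders):

```sql
USE MyDatabase;
-- Find the logical name of the log file first:
SELECT name, type_desc FROM sys.database_files;
-- Then shrink the log file down to a target size in MB:
DBCC SHRINKFILE (MyDatabase_log, 512);
```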
Documentation
Always keep a document with the important details of your product's resources. Include the design specification, e.g. the recovery model, the backup and restore plan, file locations, and a checklist of what should be done before restoring the databases.
Reference
- Microsoft Documentation on SQL Server
- Standard Abbreviations
- SQL Transaction Log File Growth Monitoring and Management
I hope you enjoyed this article and can use it in your real-life database design.
Any suggestion, objection or question? Comment below :)