SQL Server: Representing List of Values Using a Single Value
Introduction
In some cases we have to use single value, in order to represent a list of elements (values). This is a very common needs, in hardware and software developing. This might be an external demand (applications for example) or even a result of bad database design. Yet, our needs clear, we want to store multiple values, in one value (in column for example). Those elements can be properties list, options, security permissions, dates, or any other data.
This article is not about the best solution using best database structure, but on the logics behind this single value. We will not discuss how we are using our "single value", in practice. This can be done using parsing text function, or directly, this can be done in the database side, or in the application side, using any programming language. This article only discuss the logics behind the solution. All the logics that we discuss in this article, can be implemented regardless of database use. In this article we will focus on using SQL Server to implement our logics, only as an example.
We will go over several solutions, using different logics. Our challenge is to find a good logic, which give us a one-to-one correspondence, between each available combination of our elements, to a single value which represent this combination.
Note!
This article exists, to bring all the solutions into one place. Solutions 1-5 are common for advanced users, Solutions #6.1 and #6.2, which is harder to find, based on Ronen Ariely blog, which is translating from his old post in Hebrew. Solutions #7 and #8, are new alternative.
* If you have another interesting solution, you are welcome to add it and improve the article.
Our Case Study
For the purpose of this article we will talk in general about permission's rules, as example. A user can have any combination of security rules which together describe the user's permissions. In our theoretical case, we need a way, to store all the user's permissions, using a single value (for example in the Users table).
Permission Description |
Read |
Write |
Execute |
Important!
This is not Relational Data Base Management (RDBMS) best solution! The obvious solution using RDBMS will base on something like creating Users table, Permissions table, and a junction table for Many-to-Many relationship. This solution might involve with using Table-valued parameters (TVP) to send multiple rows of data to a Transact-SQL statement. This is probably the best solution for most cases, as far as we discuss RDBMS database server side.
Note!
A column cannot be of a user-defined table type, we cannot use this for our case (store single element which represent multiple values). Moreover, using TVP is probably the best solution for most cases on the database side, but it is not so easy to implement sometimes with all developing technologies and languages (for example using JAVA).
Solution 1: XML & JSON
XML and JSON provides a structure to data, so that it is richer in information. it can be used as an exchange format between applications. They both have the same interoperability potential. JSON is optimized for data and does not have a <[CDATA[]]> feature. It is not well suited to used directly as markup language. JSON has a much smaller grammar and it maps more directly onto the data structures used in modern programming languages. JSON is much faster for machines to read, write, and parse (serializing and deserializing).
The XML and JSON provides a way to store flexible data in as a single value (in column if needed). We can use simple text string formatted as JSON or XML in order to represent any number of values.
<userrights>
<userright>Read</userright>
<userright>Write</userright>
<userright>Contribute</userright>
</userrights>
userright:['Read','Write','Contribute']
Note! Using JSON and XML types, in SQL Server
* The XML datatype, and related XPath functions, were introduced in SQL Server 2005. We can use the XML type if we need to use XPath functions. Code for parsing and generating JSON data is readily available in a large variety of programming languages, including any .Net language like C#. This can be used in SQL Server as a new CLR User-Defined type. Both those option are not related to this article.
Reminder: This article do not discuss parsing the single value, but only the logics behind the value, and using this value as it is directly (for example using simple "select <column name> from <table name>")
Those options (XML, JSON, etc) are good when we need a dynamic structure, or we need to navigate directly to specific node, for example, but in this article we do not deal with complex structure, or hierarchy structure, but only with simple values from the same level (a list of values). As we can see in the example above, the JSON format, includes lot of overhead, to the data, which we really DO NOT need, and the XML format is even worse in this issue, and it uses a longer string in order to store the same data.
Solution 2: Comma Separated List
The simple's way, will be using a single string, separated by comma, with all our user permissions. For example the user A with permissions to Read and Write will get the value "Read, Write". The description of the permission might be very long string. We can represent the SET of those descriptions, using integer primary Key (or in most cases better a surrogate primary key). For example same user A have permissions "1, 2".
PermissionID | Permission Description |
1 | Read |
2 | Write |
3 | Execute |
Note!
If you did chose this option, then please remember that T-SQL do not deal with text parsing well, and it is highly recommended to work with CLR function!
In most case we will prefer to use a single integer value, and not to parse in every use, this string of Comma-Separated list.
Solution 3: Explicitly all Available Combinations
Unique value | Permissions Combination |
1 | Read |
2 | Write |
3 | Execute |
4 | Read + Write |
5 | Read + Execute |
6 | Read + Write + Execute |
Now we know that if a user A have permission 6, it mean that he have Read, write, and Execute permissions, while a user B with permission 3 have only Execute permission, and so on.
This solution can be very useful using database, especially for a small SET of options, but it need much more managing when permissions changed. The big disadvantage of this option, is that the amount of row growth exponentially, when we add more permissions. The table growing rapidly. For amount of n elements, we will need to hold factorial (N!) rows. Moreover, this solution fit databases which work well with SETs, but might not work well for our application (we don't want to store all the information in memory, and we don't want to go back and forward to the database, each time we use it).
Solution 4: Groups of Three
The basic logic behind this solution, is to represent our elements list, in ordered list, and represent each element with a different number, using base 2.
For example we can use this table below, and use the SUM of the permissions in order to represent the user's permissions. A user that have permission 1 can only read, permission 3 is the SUM of 1+2 therefore the user have permissions to read and write, permission 4 is only Execute, and 5 is Execute + Read. Permission 6 is write + Execute and 7 is full permission Read, Write, and Execute.
Note!
You can notice that the unique values that we use are base 2 exponent (or power) n. Using T-SQL we can use the function POWER(2,n-1)
Unique Number | Permission Description |
1 | Read |
2 | Write |
4 | Execute |
The Linux File Permissions is a good example of this case. Each file and directory has three user based permission groups: Owner (u), Group (g), All Users (a). Each file or directory, has three permissions rules: Read (r), Write (w), Execute (x). We can represent the permissions groups and permissions types, by using this small table, using bit type (1 if the group and permission rule applied, and 0 if not).
Owner | Group | All Users | |||||||
Rule type | r | w | x | r | w | x | r | w | x |
Is Rule active on File or folder? | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
This table include the permissions for a single file or folder. Each row represent a file or a folder permissions. In our example Owner have READ, WRITE, and EXECUTE permission, while the group have only READ AND WRITE permissions, and the use have only READ permission.
As we mentioned above each three permissions (each group of permissions) can be represent in a single number. In our example the first number is 1+2+3 = 7, the second number is 1+2 = 3, and the third number is 1. Therefore, we can represent all the permission using a number with 3 characters (one character for each group of permissions): 731
The big advantage in this solution is the visualization of the value. It is very fast and easy for us, as humans, without any calculation, to convert the number 731 to a SET of options. Since each characters represent a group of 3 options.
Why do we use groups of three and not groups of two or four for example?
The answer simply lies in the visualization of the value. The SUM of all the values should be the smaller number that we can, in order to store the maximum amount, of permissions. (1) We will choose the biggest group that we can represent. Therefore if three work well there is no reason to work only with two permissions in a group. (2) At the same time we want each group to be displayed separately, in a single character (for the visualization). Therefore we cannot choose four permissions, since the next value using our logic POWER(2,n-1), will give us POWER(2,3)= 8, and in order to get all permissions we get the value 1+2+4+8 = 15 which will use two characters.
Solution 5: Numbers Base Two
The basic logic is the same as above, but in this case we are not limiting our-self to groups of three permission, but only one group with infinite number of permissions. As explain above we are losing the virtualization advantage, but we gain smaller value in order to represent the same amount of permissions.
Uniqe Number | Permission Description |
1 | Rule1 |
2 | Rule2 |
4 | Rule3 |
8 | Rule4 |
16 | Rule5 |
32 | Rule6 |
64 | Rule7 |
128 | Rule8 |
256 | Rule9 |
Example: User A with permissions Rule1+Rule2+Rule6 will get the value 1+2+32 = 35. As we can see, getting the permissions from the value is not intuitive as in previous solution. But in previous solution we need to use the number 777 in order to represent all first 9 permissions, while in this case we only need to use the number 511.
select SUM(P)
from (
select top 9 n, POWER(2,n-1) P
from Ari_Numbers_Tbl
) t
Clarification on our format:
We can represent our permissions as a list of binary data, like this:
Rule Number | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
Is Rule active on File or folder? | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
For each rule that the user get, we use 1 and for each rule that is not implied on the user, will get the value 0. We can notice that the user's permission is actually a number on base 2. In our case 100011. In order to get the value which represent those permission, we calculate the SUM of (2 power n). This is just a simple converting from base 2, to a number on base 10. And 100011 in base two is equivalent to 35 in base 10 (same as we got above).
Note: In order to use this logic we will need a function to convert from binary to base 10 and vice versa. We will create a table with all the permission rules. And each user will get one value using base 10 which represent all his permissions, according to the permission's table.
Solution 6: Fibonacci Base
What is Fibonacci Base?
Fibonacci numbers or Fibonacci sequence are the numbers in integer sequence, which by definition, the first two numbers are 1 and 1, or 0 and 1, depending on the chosen starting point of the sequence, and each subsequent number is the sum of the previous two.
We can use this query in order to create a Fibonacci sequence:
-- select all numbers in Fibonacci sequence
-- which are smaller then 1000000000:
WITH Fibonacci (PrevN, N) AS (
SELECT 0, 1 -- starting point
UNION ALL
SELECT N, PrevN + N
FROM Fibonacci
WHERE N < 1000000000
)
SELECT PrevN as Fibo
FROM Fibonacci
OPTION (MAXRECURSION 0);
GO
Every integer number is the sum of set of distinct (no number used more than once) Fibonacci Numbers. This forms the idea behind representing numbers using Fibonacci numbers.
Converting between different number bases
In order to convert from base 2 to decimal, we represent each number as column of POWER(2,n-1), and we uses 1 or 0 to indicate that this column should be calculate. This is actually the base 2 number. For example the base 2 number 000100011 was represent like this:
Value in base 10 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Is this place in base 2 should be calculate? | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 |
The equivalent decimal number got by calculating the SUM of the Values in base 10. Therefore the decimal number in our case is 1+2+32 = 35.
Can we do the same with any sequence numbers? The answer is no. A number representation system (a number base) is most useful if it has a unique representation of every integer (one-to-one correspondence).
6.1 Implementing Fibonacci base
Using the same displayed as above for Fibonacci base we can see that the number 000110011 in Fibonacci base is equal to 1+2+8+13 = 24, but this is not the only option to get the value 24! We could use the SUM of anyone of those options: {3,21} {1,2,21} {3,8,13} {1,2,8,13} {1,2,3,5,13}. It is not enough that we can represent each integer in base 10 as SUM of Fibonacci numbers, but we need it to be a unique option.
Value in base 10 | 55 | 34 | 21 | 13 | 8 | 5 | 3 | 2 | 1 |
Fibonacci base | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
The reason that in our example we got multiple options, lies in the use of two consecutive Fibonacci numbers. After all the definition of Fibonacci sequence is that the next number is the SUM of the last two numbers. Therefore we can replace two consecutive Fibonacci numbers with the next number (1+2 = 3 which is Fibonacci number by itself, same with 8+13).
We can find a single way to write every number as a sum of Fibonacci numbers if we also have the rule that no two consecutive Fibonacci numbers can be used in the same sum. In terms of the base system with Fibonacci numbers as the headings, this means that no two ones can occur next to each other.
It is very simple to see, that if we have a SET of values (permission's rules in our case), and those permissions sorted in a way that there is no way that we will use two consecutive rules, then we can use the Fibonacci base as it is, in the same way that we used Base 2 in previous solutions. For example if we have a system that serve two (or more) companies (or other group of users), and each company has its own SET of rules, then we can use the even places to keep the rules of one company, and odd places to keep the rules of the second one.
6.2 Implementing Fibonacci half base
Another option where we can implement the Fibonacci base, is by only the numbers in the odd places or only the numbers that are in the even places. For example using those bases (those are not full number bases and do not enable us to represent every integer from base 10, but it is fit our needs):
Value in base 10 | 55 | 21 | 8 | 3 | 1 |
Fibonacci half base | 1 | 1 | 1 | 1 | 1 |
The number 11111 which is full permissions is equal to 1+3+8+21+55 = 88
Why to use Fibonacci base?
While in base 2 in order to use the 9th rule, we needed to SUM the POWER(2,9-1)= 256 , in Fibonacci Base the 9th rule is only getting the number 55. Therefore the value that represent our total permissions can be much smaller.
Solution 7: INSTEAD OF trigger
We mentioned in the beginning of the article that using RDBMS approach will give us much better solution for most cases, while implementing in the database, but there are some cases that we need to use one column which store multiple values, which might be an external needs (like external application). There is a simple workaround solution which we must mention here, and that is hiding the real database structure and given the external client to get exactly what he want, without the need of actually store several values in a single column. We can use VIEW with INSTEAD of trigger, as our solution.
The external client will only contact the view, which will use anyone of the logic that we show here in order to use dynamic calculate virtual column. This column will give us the one value that represent all the permissions, and using INSTEAD of trigger the use could handle this view as it was a real table (INSERT, DELETE, and UPDATE). You can see, a simple and basic implementation, of this idea in the next script.
-- using VIEW with INSTEAD of trigger
/*************************************** DDL Original tables */
CREATE TABLE Ari_Users_Tbl(
UserID int identity(1,1) primary key,
UserName NVARCHAR(100)
)
CREATE TABLE Ari_SecurityRules_Tbl(
RuleID int primary key,
RuleName NVARCHAR(100)
)
CREATE TABLE Ari_UserSecurityRule_Tbl(
RuleID int ,
UserID int,
CONSTRAINT FK_UsersID FOREIGN KEY (UserID) REFERENCES Ari_Users_Tbl (UserID) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT FK_RuleID FOREIGN KEY (RuleID) REFERENCES Ari_SecurityRules_Tbl (RuleID) ON DELETE CASCADE ON UPDATE CASCADE
)
GO
insert Ari_Users_Tbl (UserName) values ('Ronen'), ('Ariely')
insert Ari_SecurityRules_Tbl (RuleID,RuleName) values (1,'Read'), (2,'Write'), (4,'Execute') -- We use base 2 for the RuleID
insert Ari_UserSecurityRule_Tbl (RuleID,UserID) values (1,1), (2,1), (4,1), (1, 2)
GO
-- Helper table:
-- We will create table with all options (we can so this dynamiclly with a function)
-- This is a relation table from the SUM number to the list in base 2
-- We can popelate this table dynamicly using binary to decimal function
Create table Ari_BaseTenToList_Tbl (
SumBaseTen INT,
ValueBaseTwo INT
)
insert Ari_BaseTenToList_Tbl (SumBaseTen, ValueBaseTwo)
values (1,1), (2,2), (3,1), (3,2), (4, 4), (5, 1), (5, 4), (6, 2), (6, 4), (7, 1),(7, 2),(7, 4)
GO
select * from Ari_Users_Tbl
select * from Ari_SecurityRules_Tbl
select * from Ari_UserSecurityRule_Tbl
select * from Ari_BaseTenToList_Tbl
GO
/*************************************** Starting our solution using view */
-- Create view
CREATE VIEW Ari_UsersWithRules_v as
select U.UserID, U.UserName, SUM(S.RuleID) User_Rules
from Ari_Users_Tbl U
left join Ari_UserSecurityRule_Tbl US on U.UserID = US.UserID
left join Ari_SecurityRules_Tbl S on S.RuleID = US.RuleID
group by U.UserID, U.UserName
GO
select * from Ari_UsersWithRules_v
GO
-- Create INSTEAD of trigger for insert (we need to create INSTEAD of triger for update as well!)
CREATE TRIGGER Ari_UsersWithRules_ioTrig ON Ari_UsersWithRules_v
INSTEAD OF INSERT AS BEGIN
SET NOCOUNT ON
declare @UserName nvarchar(100), @User_Rules int
DECLARE Ari_Cursor CURSOR FOR
SELECT I.UserName, I.User_Rules
FROM inserted I
OPEN Ari_Cursor
FETCH NEXT FROM Ari_Cursor INTO @UserName,@User_Rules
WHILE @@FETCH_STATUS = 0
BEGIN
insert Ari_Users_Tbl (UserName) values (@UserName)
insert Ari_UserSecurityRule_Tbl(UserID, RuleID)
select SCOPE_IDENTITY(), B2L.ValueBaseTwo
from Ari_BaseTenToList_Tbl B2L
where B2L.SumBaseTen = @User_Rules
FETCH NEXT FROM Ari_Cursor INTO @UserName,@User_Rules
END
END
GO
/*************************************** Let's see what the client see and how he insert new user with rules */
insert Ari_UsersWithRules_v (UserName, User_Rules) values ('New User', 1),('New Admin', 7)
GO
select * from Ari_UsersWithRules_v
select * from Ari_Users_Tbl
select * from Ari_UserSecurityRule_Tbl
GO
/*************************************** Time to clean out database ;-) */
drop VIEW Ari_UsersWithRules_v
drop table Ari_UserSecurityRule_Tbl
drop table Ari_Users_Tbl
drop table Ari_SecurityRules_Tbl
drop table Ari_BaseTenToList_Tbl
GO
For more information about using INSTEAD of trigger, you can check this article: http: //social.technet.microsoft.com/wiki/contents/articles/28152.instead-of-triggers.aspx
Solution 8: CLR User-Defined Types
Using CLR User-Defined Types, we can create a new type which is based on fixed or dynamic amount of variables. Not like a TVP, this type will not treat as a table but as a single element. We can use this idea in order to store our values in a single column.
Note!
A very common case, where we need to create a new type which use two values, is when we need to use and store complex numbers. Same Logic can be implemented using more then 2 values.
Conclusions
In this article we showed several logics to represent a list of values, using a single value. We demonstrate those options using T-SQL language and SQL Server, but you can use the same logics in programing, using any language that you want.
Our challenge is to find a good logic, which give us a one-one correspondence, between each available combination of our elements, to the single value which represent this combination.
Source reference
This article is based on Ronen Ariely's blog, at:
More information
- Using the Fibonacci numbers to represent whole numbers: http: //www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fibrep.html
- Understanding Linux File Permissions: http: //www.linux.com/learn/tutorials/309527-understanding-linux-file-permissions
- INSTEAD OF Triggers: http: //social.technet.microsoft.com/wiki/contents/articles/28152.instead-of-triggers.aspx
- Creating User-Defined Data Types in SQL Server 2005: http: //www.codeproject.com/Articles/15651/Creating-User-Defined-Data-Types-in-SQL-Server