Share via


Secret sharing

One of the most difficult problems in cryptographic key management is
keeping a secret key safe from both compromise and loss. If you don't make
enough backups, the key might be destroyed in a hardware failure or
natural disaster. But if any backup is compromised, the key is
compromised.

Rather than invent new tools, one might try to solve the problem
with encryption. Just encrypting the key normally is not helpful, since
then you have the
same problem with the key needed to decrypt it. Instead, if you encrypt
your key using multiple keys, then
store these keys in different locations, you need all the keys to
decrypt it. This protects against compromise, but if any key is lost,
so is the original key. To fix this,
you'd have to encrypt the original key many times, each time using just
some of
the keys. For example, suppose you had 5 keys and you wanted to be able
to decrypt using any 3 of them. Then you'd need to perform all the
following encryptions:

  • E1(E2(E3(K)))
  • E1(E2(E4(K)))
  • E1(E2(E5(K)))
  • E1(E3(E4(K)))
  • E1(E3(E5(K)))
  • E1(E4(E5(K)))
  • E2(E3(E4(K)))
  • E2(E3(E5(K)))
  • E2(E4(E5(K)))
  • E3(E4(E5(K)))

The number of encryptions needed grows very quickly. It also takes
quite a bit of time and space to store all these multiply-encrypted
keys. Let's look at some other solutions.

One solution is to simply break the key file into two parts,
make a copy of each, and store the four parts in four locations. Then,
even if any one is compromised, the key is not revealed. Unfortunately,
this scheme is not that helpful, because just knowing half of someone's
private key is enough to drastically decrease the time needed for a
brute-force search to crack the remainder of the key.

Here's another simple solution: create a one-time pad (a file of random
data) the same size as the private key, then XOR them together. Put the
result in one place and the pad in another, and destroy the original
key. We call each file a "share" or "shadow". Later, you can XOR the two shares
together to retrieve the original key. Since the bytes of each file are
random, you can be assured that neither piece will reveal any
information about the key by itself. We can extend this scheme to n pieces by just creating n - 1 one-time pads, adding them together, and subtracting this from the key data to obtain the nth share. Unfortunately, if even one piece is lost, so is the key. This is called an (n,n) secret sharing scheme, because we have n shares and need n of them to reconstruct the secret.

Here's another more useful scheme: let the private key be represented as the number S. Choose a random line in the plane passing through the point (0, S). Next, choose n random points on that line as your n shares. Now, if we know any two of the shares, we know the line, and so can find S. If we only know one share, then we just know one point, and for every T there is a line passing through (0, T) and that point; hence, the secret could be anything at all. This is called an (n,2) secret sharing scheme, because we have n shares and need 2 of them to reconstruct the secret.

Particularly useful for individuals is the (3,2) scheme - you can keep
one share on your person on a floppy or USB pen drive at all times, one
share on your hard drive, and one share backed up at a secure remote
location. When you need to use the key, you recreate it briefly using
the shares on your hard drive and your person, then destroy it. If you
lose the floppy or your hard disk crashes, you can recreate the key
from the remaining shares (and make yourself 3 new shares). If any one
share is compromised, the key is not compromised. If you know of a
compromise, you can recreate the key, create 3 new shares, and destroy
the old ones, preventing the attacker from getting a second share. The
new shares are unrelated to the old ones because a different random
line through (0,S) is used.

In 1979, Shamir generalized these ideas even further to create a general (n,k) secret sharing scheme for any n, k. The idea is to choose a random degree-k polynomial passing through (0,S), where S is the secret. Then choose n random points on this curve. Because there is exactly one degree-k polynomial passing through any given set of k points, any k of these points define the curve and allow us to retrieve the secret. If we have only k - 1 points, then we don't learn anything about S, because there is a polynomial passing through (0,T) and those k - 1 points for every T.
To make the notion of choosing random polynomials and random points
precise, Shamir works with polynomials modulo a prime number p, choosing polynomial coefficients and point coordinates randomly from the integers in [0, p - 1].

Secret sharing has more applications than just protecting private keys.
It's also useful in situations where we want to entrust a secret to a
group of people but want to ensure that several of them must cooperate
to access the secret. We don't need to trust them all individually, nor
do they all need to cooperate, so the secret is still available if
members go missing or refuse to cooperate.

Finally, I wouldn't just tell you all this without providing some
working code you can use. You can download an excellent implementation
of the scheme for UNIX called secsplit (direct download).
I don't know of any free .NET implementation, but I'm sure that porting
this 600-line program into any environment wouldn't be difficult.

Some more resources on secret sharing:

Comments

  • Anonymous
    June 23, 2006
    The simple RAID 5 algorithm will do (n-1) restore trick (http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks#RAID_5)


    You may encrypt key and then split it on n-pieces (n >= 3) using “parity blocks”. You always can recover key using n-1 pieces.
  • Anonymous
    June 23, 2006
    Hi Alexander. You're right that RAID 5 does split data in such a way that any n-1 of n bits allows recovery, and such that each set of k parallel bits stores k-1 bits of information. However, this is not a secure scheme; for example, if you know the contents of k-2 of the data disks, you know a (k-2)/(k-1) proportion of the file data, which even for 4 disks is 2/3 of it. The even distribution of the missing data in striping can make certain objects such as Unicode strings easy to extract from limited data. If the k disks were based on Shamir's scheme, you would only have the same storage capacity as one disk, but any k-2 of the disks would be completely useless in determining the original data (which isn't generally very nice, but might be handy for reusing disks that store confidential data).