If you are like me and you work with a company that provides services to banks and other financial institutions, then you have probably been asked to design/develop databases that securely store all client sensitive data. Data is the client’s most valuable asset. So it is not surprising that clients hire special auditors just to make sure that their data is in good hands
The main challenge here is not just to convince the client that the data is secure enough. It is also necessary to prove that the data can be ready for reporting in a reasonable amount of time, and that it’s just one click away: “push the button, here is your report.” Security and performance are two contradictory goals to achieve. The more your data is secured using the strongest encryption algorithms and the longest keys, the worse your queries and reports will run against the data.
We usually use encryption to secure sensitive data, such as SSNs, phone numbers, etc. The choice between using symmetric/asymmetric encryption, key length, and algorithm usually depends on many factors. One of these factors could be client preference if the client is educated enough in cryptography.
Imagine that you have a table containing one million records with one encrypted column. And let’s say the encrypted column is driver license numbers. The requirement is to check if a person exists in the table before you add/update their info according to their driver license number. But here’s the dilemma: one million records will have to be decrypted to compare against the inputted value. To check if the inputted driver license number matches what’s in the table you will have to go through all one million entries. So what happens if you have 100,000 other records waiting in the queue to do the same check against this table? 100,000 index scan operations will occur, and this is a performance killer.
When I first learned about encryption in SQL Server 2005, I thought that I could just create an index for the encrypted column and compare the two encrypted texts using the equity operator. I assumed that in doing this I would avoid the index scan: “Table.EncryptedDriverLicenseNumber = Input.EncryptedValue.” But, unfortunately, this doesn’t work because the encryption functions in SQL Server 2005 and in 2008 are non-deterministic functions. This means that for the same input text you will most likely receive different output values in different calls. It is not a one-to-one relation, so the index will not work.
One possible solution for this problem is to use hashing, which can be thought of as one-way encryption. One-way encryption means that a hashed text cannot be decrypted back again. The good thing about hashing is that it’s deterministic, which means for the same input text you will receive the same hashed output. So indexes work well in this case. In addition, the hash function output is an integer value, which is even better for indexing.
Here is an example scenario: we have a table that contains individuals’ information, and the requirement is to secure each individual’s SSN so that the SSN column will be encrypted and saved in a varbinary column. We then add another integer column for the hashed SSN value. Every insert into this table will have both the encrypted and hashed versions of the SSN. Then the best part, we build an index on the hashed SSN column. To make the best use of this index, we include the clause: Table.HashedSSN = CHECKSIM(input.SSN). This will result in an index seek, and will boost the performance. We must also be sure to keep the original “where” condition as-is by adding the mentioned clause to it. This is because the mentioned clause is not guaranteed to return a unique result, because more than one SSN might have the same hashed Value. The “where” clause will look something like: Convert(varchar(9), DecryptByKey(Table.SSN)) = Input.SSN and Table.HashedSSN = CHECKSUM(Input.SSN). CHECKSUM is the SQL Server function used to generate the hashed value of any string. DecryptByKey is used as a sample for decryption functions. Others can also be used. For more information about CHECKSUM and Decryption function, check books online.
Now, is this solution perfect? Unfortunately, no it is not. This solution is better in terms of performance, but we had to compromise security to achieve it. The comprise occurs because having a hashed value in any table degrades the level of security applied, and a hacker can compare a hashed value to a hash lookup table previously built and try to find what the original value was.
As I said in the beginning, security and performance are two contradictory goals, and it’s necessary to balance between them.