Data masking tools have become essential now that confidential data can be copied and spread so easily. Keeping sensitive information accessible only to those for whom it is intended has become a high-stakes game of staying one step ahead of the pack, and the consequences of failing to keep your confidential data safe from prying eyes can be devastating.
Imagine your customer details becoming freely available over the Internet because you had not implemented state-of-the-art data masking tools. The reputation that you have worked so hard to nurture, and the company that you have invested your life in creating, can be destroyed with the click of a mouse. Similarly, your business may operate in an industry with mandated data security measures, any breach of which can have devastating financial consequences. Consequently, you really are left with no choice but to implement data masking tools to protect your sensitive data.
Data masking, also known as data scrambling, data obfuscation, data cleansing, or data anonymization, involves hiding sensitive data in non-production databases from those not intended to see it. Many techniques can be used to mask data, each with its own merits and drawbacks. Following are some of the more popular techniques.
Substitution involves randomly replacing the contents of a column with data that looks similar but is completely unrelated. This method preserves the look and feel of the existing data; however, it can be problematic with vast amounts of data because it may be difficult to source enough relevant replacement values.
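As a rough illustration of Substitution, the sketch below replaces every value in one column with a random pick from a pool of lookalike values. The function name `substitute_column`, the `FAKE_SURNAMES` pool, and the sample rows are all hypothetical, invented here for the example; a real data set would need a far larger pool, which is exactly the sourcing problem described above.

```python
import random

# Hypothetical pool of replacement values (an assumption for this sketch).
# A production data set would need a much larger, realistic pool.
FAKE_SURNAMES = ["Archer", "Bennett", "Castillo", "Dunmore", "Ellison", "Fairweather"]


def substitute_column(rows, column, replacements, seed=None):
    """Return copies of rows with `column` replaced by random picks from `replacements`."""
    rng = random.Random(seed)  # a seed makes the masking repeatable in tests
    return [{**row, column: rng.choice(replacements)} for row in rows]


# Made-up sample rows standing in for a customer table.
customers = [
    {"id": 1, "surname": "Smith", "balance": 120.50},
    {"id": 2, "surname": "Jones", "balance": 87.25},
    {"id": 3, "surname": "Patel", "balance": 1430.00},
]

masked_customers = substitute_column(customers, "surname", FAKE_SURNAMES, seed=42)
for row in masked_customers:
    print(row)
```

The masked rows still look like customer records, but the surnames no longer have any connection to the originals. The other columns are left untouched.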
Shuffling is similar to Substitution, except that the substitute data comes from the column itself: the values in a column are shuffled randomly between rows until they no longer correlate with the remaining information in each row. This technique also preserves the look and feel of existing data, and it deals with large amounts of data quickly and efficiently. Shuffling is ineffective on small amounts of data, though, and requires a sophisticated algorithm to ensure the data cannot be “unshuffled.”
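A minimal sketch of Shuffling might look like the following. The helper `shuffle_column` and the sample rows are hypothetical. Note that a plain random shuffle, as used here, can leave some values in their original rows, and with only a handful of rows the original pairing is easy to guess, which is the small-data weakness mentioned above.

```python
import random


def shuffle_column(rows, column, seed=None):
    """Return copies of rows with the values of `column` permuted across rows."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)  # in-place random permutation of the column's own values
    return [{**row, column: value} for row, value in zip(rows, values)]


# Made-up sample rows standing in for an employee table.
employees = [
    {"id": 1, "name": "Alice", "salary": 52000},
    {"id": 2, "name": "Bob", "salary": 61000},
    {"id": 3, "name": "Carol", "salary": 48000},
    {"id": 4, "name": "Dave", "salary": 75000},
]

shuffled_employees = shuffle_column(employees, "salary", seed=7)
for row in shuffled_employees:
    print(row)
```

Because the values come from the column itself, totals, ranges, and distributions are unchanged; only the link between a salary and a particular person is broken.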
Number and Date Variance involves algorithmically modifying each number or date value in a column by some random percentage of its real value. It masks the data reasonably well while still keeping the range and distribution of values within existing limits, yet it is only applicable to numeric and date data.
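The idea can be sketched as below. The function names and the limits (plus or minus 10 percent for numbers, plus or minus 30 days for dates) are assumptions chosen for illustration. For dates, this sketch applies a random offset in days rather than a percentage, which is one common way to interpret variance on date values.

```python
import random
from datetime import date, timedelta


def vary_number(value, max_pct=0.10, rng=random):
    """Shift a number by a random percentage within +/- max_pct of its real value."""
    factor = 1 + rng.uniform(-max_pct, max_pct)
    return round(value * factor, 2)


def vary_date(value, max_days=30, rng=random):
    """Shift a date by a random whole number of days within +/- max_days (an assumed rule)."""
    return value + timedelta(days=rng.randint(-max_days, max_days))


rng = random.Random(2024)  # seeded generator so the sketch is repeatable
print(vary_number(1500.00, rng=rng))
print(vary_date(date(1985, 6, 15), rng=rng))
```

Each masked value stays close to the real one, so ranges and distributions remain plausible, but no individual value can be trusted as genuine.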
Encryption involves algorithmically scrambling data so that only those with access to the appropriate key can view it. Encryption may mask data, but it also destroys the formatting as well as the look and feel of the data. Also, weak or poorly implemented encryption can be broken, and anyone who holds the key can view the data.
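The toy sketch below is meant only to make these trade-offs visible; it is not a recommendation for how to encrypt real data. It uses a one-time-pad style XOR with a random key as long as the value, built from Python's standard library, and the function names are hypothetical. A real system would rely on a vetted encryption library and proper key management instead.

```python
import secrets


def encrypt(plaintext):
    """Toy one-time-pad XOR: returns (ciphertext, key). Illustration only."""
    data = plaintext.encode("utf-8")
    key = secrets.token_bytes(len(data))  # fresh random key, same length as the data
    ciphertext = bytes(b ^ k for b, k in zip(data, key))
    return ciphertext, key


def decrypt(ciphertext, key):
    """Anyone who holds the key can recover the original value."""
    return bytes(c ^ k for c, k in zip(ciphertext, key)).decode("utf-8")


card_number = "4111 1111 1111 1111"  # made-up sample value
ciphertext, key = encrypt(card_number)

print(ciphertext.hex())          # no longer looks anything like a card number
print(decrypt(ciphertext, key))  # the key holder sees the real data again
```

The encrypted value has lost the format of the original, so it is of little use as realistic test data, while the key restores the real value in full for whoever holds it.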
Nulling Out/Truncating/Deletion involves simply removing the sensitive data. It is useful in circumstances where the data is not required, but it is not appropriate for test database environments where the data, or at least a realistic approximation of it, is required by the test teams.
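The three variants can be sketched as follows. The helper names, the `keep` parameter, and the sample rows are hypothetical and exist only for this illustration.

```python
def null_out(rows, column):
    """Replace every value in `column` with None (a NULL in database terms)."""
    return [{**row, column: None} for row in rows]


def truncate(rows, column, keep=4):
    """Keep only the first `keep` characters of each value in `column`."""
    return [{**row, column: str(row[column])[:keep]} for row in rows]


def delete_column(rows, column):
    """Drop `column` from every row entirely."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


# Made-up sample rows standing in for an accounts table.
accounts = [
    {"id": 1, "card": "4111111111111111"},
    {"id": 2, "card": "5500005555555559"},
]

print(null_out(accounts, "card"))
print(truncate(accounts, "card", keep=4))
print(delete_column(accounts, "card"))
```

In every case the sensitive value is gone or cut down, which is safe but leaves testers with nothing realistic to work with in that column.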