Security Notes
Let’s begin with a statement that contradicts most texts on the subject: Tokenization is a form of encryption.
Encryption is a process wherein a piece of data is replaced with a different piece of data so that only the intended reader will be able to reverse the process and extract the original data.
In standard encryption, one uses a formula, usually called a cipher, which operates with a second piece of data called a “key,” and with it computes the replacement data. The intended reader uses the same formula and a matching key to reverse the replacement effect and extract the original data.
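To make the formula-plus-key idea concrete, here is a minimal sketch in Python. It is a toy built only from the standard library, not a cipher anyone should deploy: the keystream construction, the key value, and the function names are all assumptions made for illustration, and a real system would use a vetted cipher such as AES from an audited library.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` bytes from the key by hashing key + counter (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Replace the data by XOR-ing it with the key-derived stream."""
    ks = keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

# XOR is its own inverse, so the reader runs the same formula with the same key.
decrypt = encrypt

key = b"merchant-secret-key"      # hypothetical key, for illustration only
card = b"4111111111111111"        # a widely used test card number
ciphertext = encrypt(key, card)
recovered = decrypt(key, ciphertext)
wrong_guess = decrypt(b"some-other-key", ciphertext)
```

The formula is public; only the key is secret. Whoever holds the key recovers the card number, and whoever does not gets noise.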
In tokenization, the replacement data (called a token) is theoretically a randomly selected data string. So how would it be possible to reverse the replacement data and extract the original data? Simple. Somebody keeps a log, a table, that associates the original data with the replacement data. And by consulting that log, one can go back and forth between the original data and the replacement data.
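The log described above can be sketched in a few lines of Python. The class name `TokenVault`, its methods, and the 16-character token length are assumptions chosen for illustration; a production vault would also need access control, persistence, and auditing, none of which appear here.

```python
import secrets

class TokenVault:
    """A toy token log: two tables that map data to token and back."""

    def __init__(self):
        self._token_to_data = {}
        self._data_to_token = {}

    def tokenize(self, data: str) -> str:
        # Reuse the existing token so the same data always maps to one token.
        if data in self._data_to_token:
            return self._data_to_token[data]
        # Draw a random token from the operating system's randomness source,
        # retrying in the unlikely event of a collision.
        token = secrets.token_hex(8)
        while token in self._token_to_data:
            token = secrets.token_hex(8)
        self._token_to_data[token] = data
        self._data_to_token[data] = token
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is nothing more than a lookup in the log.
        return self._token_to_data[token]

vault = TokenVault()
token = vault.tokenize("4111111111111111")   # a widely used test card number
original = vault.detokenize(token)
```

Nothing in the token itself leads back to the card number. Everything depends on who can read the two tables.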
In fact, in most instances the so-called random string is generated by something called a pseudo-random number generator (PRNG), which is, you guessed it, a formula.
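A short Python sketch shows why a PRNG counts as a formula: feed it the same seed and it produces the same "random" tokens. The seed value and the helper name `prng_token` are assumptions for illustration only.

```python
import random
import secrets

def prng_token(rng: random.Random) -> str:
    """Build a 16-digit token from a seeded pseudo-random generator."""
    return "".join(str(rng.randrange(10)) for _ in range(16))

# Two generators started from the same (hypothetical) seed.
first = random.Random(2024)
second = random.Random(2024)

tokens_first = [prng_token(first) for _ in range(3)]
tokens_second = [prng_token(second) for _ in range(3)]

# The two lists are identical: the "random" tokens are a function of the seed.
same_output = tokens_first == tokens_second

# By contrast, the secrets module draws from the operating system's
# randomness source and takes no seed that a caller could replay.
os_token = secrets.token_hex(8)
```

Anyone who learns the seed and the generator can regenerate every token it issued, which is why the seed deserves the same care as an encryption key.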
The encryption formula is not secret, so if a hacker compromises the key, she compromises the data (she can generate the original data from its replacement data). Similarly, anyone compromising the token-log compromises the original data. There are plenty of ways to do either.
The extent to which the token is purely random is the extent to which it cannot be traced back to its original data without access to the token log. An encryption algorithm, by contrast, may in principle be cracked mathematically. This advantage, however, is not very consequential. Hackers rarely crack the cipher itself. Instead, merchants lose data through flaws in how they implement their protocols.
On the other hand, managing tokenization is a bigger burden than managing a cipher. The tokenization database, with full details, must be secured at all times and ready to respond to queries. In formula encryption, by contrast, there is no managed database of the original data (the plaintext) and its replacement (the ciphertext). Each is computed from the other on demand using a key, which is the only (small) piece of data that needs to be secured.
Because of the burden of managing the tokenization database (the “vault”), many merchants outsource the task, losing some control of their own data. The tokenization provider is then the only place where data can be tokenized or a token can be converted back to the original data.
Because tokenization and formula encryption have symmetric functionality, they share the same complexities. In both cases, the same issues need to be resolved: expiration, visibility, sharing, variance. And as a result, there are about as many implementation schemes for tokenization as there are for formula encryption.
Most merchants are neither interested in nor equipped to sort out these technical features, where convenience, efficiency, and security must be balanced against one another.
As a result, tokenization has largely been used as a wedge for third parties to insinuate themselves into the $7 trillion payments food chain. The merchant usually pays someone to guard the tokenization table and to manage the traffic to and from it for every single query.
But none of this is the big issue. The big issue is that too many merchants use neither encryption nor tokenization. Since card data compromised at one merchant works with all other merchants, this negligence is more consequential than the differences between formula encryption and tokenization encryption.
Formula encryption is more elegant than tokenization, and more flexible. It is also a good basis for the new crypto capabilities coming down the pike. One example: homomorphic encryption, where transaction data stays encrypted against hacking yet can still be processed by analytics engines without being decrypted. Another example: equivocation-based cryptography, where a hacker who cracks the ciphertext would find a large number of possible card numbers that match the same ciphertext, without a way to decide which is the right one.
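The equivocation idea can be illustrated with the oldest example of it, a one-time pad, here applied digit by digit to a card number. This is a sketch under stated assumptions, not the equivocation-based schemes referred to above: the function names, the decoy number, and the mod-10 arithmetic are all chosen for illustration.

```python
import secrets

def encrypt_digits(card: str, key: str) -> str:
    """Add each key digit to the matching card digit, modulo 10."""
    return "".join(str((int(c) + int(k)) % 10) for c, k in zip(card, key))

def decrypt_digits(cipher: str, key: str) -> str:
    """Subtract each key digit from the matching cipher digit, modulo 10."""
    return "".join(str((int(c) - int(k)) % 10) for c, k in zip(cipher, key))

def key_for(cipher: str, candidate: str) -> str:
    """Find the key under which `candidate` would be the plaintext of `cipher`."""
    return "".join(str((int(c) - int(p)) % 10) for c, p in zip(cipher, candidate))

real_card = "4111111111111111"    # a widely used test card number
key = "".join(str(secrets.randbelow(10)) for _ in range(16))
cipher = encrypt_digits(real_card, key)

# Any other 16-digit number is an equally valid "decryption" under some key.
decoy = "5555555555554444"        # hypothetical decoy, also a test number
decoy_key = key_for(cipher, decoy)
decoy_plain = decrypt_digits(cipher, decoy_key)
real_plain = decrypt_digits(cipher, key)
```

Because every candidate card number has a key that fits the ciphertext, an attacker trying keys has no way to tell the real number from the decoys. The price is a random key as long as the data, used only once.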
Fast end-to-end encryption, where the merchant keeps control of his data and where ciphers and keys are readily upgradeable and replaceable, is the clean way to build secure payment protocols.
—Gideon Samid • Gideon@BitMint.com