Gavin Douch

AnonKey:

What is this?

An AnonKey is a string of text that represents a participant in your research. Type the participant’s name and their unique participant number into the text box above, and you get an AnonKey to store in your database in place of their name.

The purpose of this is to protect participants’ privacy. If your data is stolen or leaked, nobody could use the AnonKey to work out who the participants were. But if a participant withdraws their consent from the study, you can ask them for their name and participant number (which you gave them when they agreed to join the study), generate the AnonKey again, and look for a match in your database so you can delete their data.

Requiring the name as well as the participant number does two things. It makes us more confident that the person making the request has given the right participant number, and it prevents us from acting on fraudulent deletion requests, since a fraudster would not know which name corresponds to each participant number.

A Small Warning

If you spell the participant’s name incorrectly when you save their data into your database, your AnonKey will not match their real name. If you then need to delete their data later on, you will not be able to match your saved AnonKey with the new one you generate using the correct spelling of their name.

Some variation is allowed (differences in spaces, diacritics, and capitalisation), but it’s much easier to just type in their name correctly the first time.

Details for Nerds

The AnonKey begins with a prefix: the version number in hexadecimal, terminated by “v”, followed by the participant number in hexadecimal, terminated by “n”. Because the participant number is part of the key, AnonKeys never clash for different individuals, whether or not their names are identical.
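As a rough illustration of the prefix layout, here is a minimal Python sketch. The function name `anonkey_prefix` is made up for this example and is not taken from the AnonKey source code. It relies on the fact that “v” and “n” are not hexadecimal digits, so each number has an unambiguous end.

```python
def anonkey_prefix(version: int, participant_number: int) -> str:
    """Build '<hex version>v<hex participant number>n'.

    Hypothetical helper for illustration only. Uses lowercase hex with no
    leading zeros, which is what the EBNF on this page requires.
    """
    if version < 0 or participant_number < 0:
        raise ValueError("version and participant number must be non-negative")
    return f"{version:x}v{participant_number:x}n"


print(anonkey_prefix(1, 255))  # 1vffn
```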

The participant’s name is preprocessed by removing spaces, removing diacritics, replacing text in a non-Latin script with its Punycode encoding, removing non-printable US-ASCII characters, and then converting all characters to lowercase (in that order).
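The preprocessing steps could look something like the Python sketch below. It is an approximation and not the real implementation: it assumes “spaces” means all whitespace, that diacritics are removed by Unicode decomposition, that Punycode is applied to the whole string whenever any non-ASCII character remains, and that the fourth step keeps only printable US-ASCII characters. The real tool may differ on each of these points, and `preprocess_name` is a made-up name.

```python
import unicodedata


def preprocess_name(name: str) -> str:
    """Approximate the name preprocessing described above (illustrative only)."""
    # 1. Remove spaces (ASSUMPTION: every kind of whitespace is removed).
    s = "".join(name.split())
    # 2. Remove diacritics: decompose, then drop the combining marks.
    s = unicodedata.normalize("NFD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # 3. ASSUMPTION: if anything non-ASCII is left, Punycode-encode the string.
    if not s.isascii():
        s = s.encode("punycode").decode("ascii")
    # 4. ASSUMPTION: keep only printable US-ASCII characters (0x21 to 0x7E).
    s = "".join(ch for ch in s if 0x21 <= ord(ch) <= 0x7E)
    # 5. Convert to lowercase.
    return s.lower()


print(preprocess_name("José  García"))  # josegarcia
print(preprocess_name("jose garcia"))   # josegarcia
```

Both spellings give the same result, which is why small differences in spaces, diacritics, and capitalisation still produce a matching AnonKey.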

The base of the AnonKey is the SHA-256 hash of the preprocessed name, salted with the participant number, the string “AnonKey”, and the AnonKey version number. This is then encoded in Base64, but with “+” and “/” replaced with “-” and “_” respectively, and all trailing equals signs removed.
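Here is a hedged Python sketch of the hashing and encoding step. The description above does not say how the salt pieces are joined to the name, so the concatenation order and number formatting used here are guesses, and keys produced by this sketch will probably not match the real tool. The point is the shape of the result: a SHA-256 digest is 32 bytes, which Base64 encodes to 44 characters including one “=”, leaving 43 characters once the padding is stripped.

```python
import base64
import hashlib


def anonkey_base(preprocessed_name: str, participant_number: int, version: int) -> str:
    """Hash a preprocessed name into a 43-character URL-safe Base64 string.

    Hypothetical helper. ASSUMPTION: the salt layout below (name, then hex
    participant number, then "AnonKey", then hex version) is invented for
    illustration and is not confirmed by the description.
    """
    salted = f"{preprocessed_name}{participant_number:x}AnonKey{version:x}"
    digest = hashlib.sha256(salted.encode("utf-8")).digest()
    # urlsafe_b64encode already swaps "+" and "/" for "-" and "_".
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


base = anonkey_base("josegarcia", 255, 1)
print(len(base))  # 43
```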

The AnonKey ends with a three-digit hexadecimal checksum. It is calculated as the sum of the US-ASCII values of every character that precedes the checksum (that is, everything before the last three characters), modulo \(16^3\).
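The checksum arithmetic is small enough to show in full. This Python sketch uses made-up function names and assumes the checksum is zero-padded lowercase hexadecimal, which is what “three-digit” and the grammar on this page suggest.

```python
def anonkey_checksum(body: str) -> str:
    """Sum the US-ASCII values of `body` modulo 16**3, as three hex digits."""
    total = sum(body.encode("ascii")) % 16**3
    return f"{total:03x}"


def checksum_matches(key: str) -> bool:
    """Check that the last three characters are the checksum of the rest."""
    return len(key) > 3 and key[-3:] == anonkey_checksum(key[:-3])


# Worked example on a short string: '1'=49, 'v'=118, '1'=49, 'n'=110,
# so the sum is 326, which is 0x146.
print(anonkey_checksum("1v1n"))  # 146
```

A checksum like this catches most copy-and-paste or typing mistakes, but it is not a security feature: anyone can recompute it.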

View the source code.

EBNF

AnonKey = prefix, base, checksum;
prefix = version, participant number;
version = hex number, "v";
hex number = (hex char - "0", {hex char}) | "0";
hex char = [0-9a-f];
participant number = hex number, "n";
base = 43 * base64 char;
base64 char = [0-9a-zA-Z-_];
checksum = 3 * hex char;

Regex

/(([1-9a-f][0-9a-f]*)|0)v(([1-9a-f][0-9a-f]*)|0)n[0-9a-zA-Z-_]{43}[0-9a-f]{3}/
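If you want to check whether a stored value looks like an AnonKey, the regex can be used roughly as in the Python sketch below. The pattern is copied from above; the function name `parse_anonkey` is made up. The pattern has no start or end anchors, so the sketch uses `fullmatch` to require that the whole string is an AnonKey. It checks the format only and does not verify the checksum.

```python
import re

# Pattern copied from the Regex section, without the surrounding slashes.
ANONKEY_RE = re.compile(
    r"(([1-9a-f][0-9a-f]*)|0)v(([1-9a-f][0-9a-f]*)|0)n[0-9a-zA-Z-_]{43}[0-9a-f]{3}"
)


def parse_anonkey(text: str):
    """Return (version, participant_number) if `text` is well formed, else None."""
    match = ANONKEY_RE.fullmatch(text)
    if match is None:
        return None
    # Group 1 is the hex version, group 3 is the hex participant number.
    return int(match.group(1), 16), int(match.group(3), 16)


# A made-up, syntactically valid key (its checksum is not meaningful).
example = "1v" + "ffn" + "A" * 43 + "abc"
print(parse_anonkey(example))        # (1, 255)
print(parse_anonkey("01v1n" + "A" * 43 + "abc"))  # None
```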