Privacy Amplification

As previously, we have our protagonists Alice and Bob, trying to communicate in a secure manner, and an eavesdropper Eve, trying to intercept the communication.

As we have seen in Session 1, Alice and Bob need a shared private key to be able to communicate.

If Alice and Bob know each other well, they might attempt to build a private key from common shared knowledge, such as the flavor of the first cone of ice cream they shared, or the name of Bob's cats. Doing so will yield a somewhat "fairly" secret key.

Info

This key will only be fairly secret, as other individuals might also know such shared information, and Eve might be able to gather some through various means.

Naturally, Alice and Bob want to find a scheme to create an actually secret key from this fairly secret starting key.

Introducing privacy amplification :

Alice and Bob share a "weak secret" which comes from some distribution , represented as a random variable called the source.
While the distribution of may not be known, is available to both parties.

Eve possesses side information which might be correlated to , such as its first bit, its parity, or even some quantum state .

The goal for Alice and Bob is to generate some such that from the point of view of Eve, (chosen from a distribution ) is (close to) uniformally random.

Note

The size of will depend on how secret the original secret is. If there is not much that can be done.
Intuitively, this will be quantified by the min-entropy .

Randomness Extraction

Let's first start with a simpler problem, Alice has some source from which she gathers a string , which is possibly correlated with , with the promise that (with this condition, is called a -source), and is tasked to generate such that the distribution is close to uniform even conditionned on . Similar setup as before, but Alice is on her own and does not have to coordinate with bob.

Note

While Alice could just sample from an actually uniform distribution, or measure in the Hadamard basis a bunch of times, assume here that randomness is costly, and Alice does not have ready access to large amounts of randomness for free. The goal is to extract randomness from X.

A simple example

If has a distribution consisting of i.i.d. variables , one can extract a single bit nearly uniformly by taking the parity of all the bits

Example

One can use the same approach for variables which aren't identically distributed, and get a single bit close to a uniform distribution. However this is not very efficient.

Seeded extractors

Using a deterministic function to extract randomness (such as ) can only get you so far however, since for any , even if , there is no function that can always extract bits of randomness from .

Instead, we need to expand the scope to seeded extractor functions, which use a seed as a source of randomness. However, since this randomness may be costly, the seed should be of relatively small size compared to and .

Definition

A weak seeded extractor is a function Ext : such that for any -source ,
With uniformally distributed, independent from and .

One might think that one could simply use the seed as randomness, but Alice and Bob could not choose the same seed as a private key without communicating it publicly.

This induces the definition for a strong seeded extractor, where the extractor's input is independent from the seed

Definition

A strong seeded extractor is a function Ext : such that for any -source ,
With uniformally distributed, independent from and .

Why the min-entropy?

Intuitively, the min-entropy gives an estimate of how much randomness there is to extract from , but it is a valid question to seek if a strong seeded extractor really can extract bits from if it has (meaning it is a -source).

If , it is impossible to extract more than bits from while still being uncorrelated to , because you can guess the value of the extractor function by guessing first then applying the function, so , in particular .

Conversely, we will show strong seeded extractors which achieve the extraction of a (mostly) uniform bits given being a -source.

Hashing

To build a strong seeded extractor, we will make use of hashing functions, which essentially take a long string and map them to shorter strings somewhat uniformally.

We need to introduce classes of hashing functions families, namely 1-universal and 2-universal hash function families :

1-universal family

A family of hashing functions , is 1-universal if for every and ,

Notice that the definition has and fixed, and the probability is over the family of functions.

While a 1-universal family is enough to build a weak extractor, the following definition will let us build a strong one :

2-universal family

A family of hashing functions , is 1-universal if for every and ,

Once given such a family, the extractor will use the seed to choose a function in the family, and apply it to to obtain . While the set of all functions is a 2-universal family, it has too many functions, and the seed would need to be too large to be practical.

Instead, we will take functions in the field with elements, of the form where multiplication and addition are done in . This yields a family of functions which so happens to be a 2-universal family of hashing functions.

Proof

Notice how the functions map bit strings to bit strings, but we can simply toss the extra bits we don't want such that we are given bit strings, which still retains the 2-universality property.

Our extractor will now be defined as

How good is Hashing?

To analyze the quality of the hashing functions, we will make use of the leftover hash lemma, which has different formulations depending on whether we consider quantum side information or not.

No side information case

In this case, we don't take into account from Eve, in which case the lemma is stated as follows:

Lemma

Let integers, , , and a 2-universal family of hash functions. Then is a strong seeded extractor.

With our construction, the seed needs to be of length to be able to choose the hashing function, which is longer than the optimal seed length for the general case, but in the case of privacy amplification it is satisfactory. In the case without side information, we have .

To prove this lemma, we will proceed in two steps. First we want to show that the error can be bounded by first showing a bound on the collision probability, and then we bound this collision probability and achieve the result of the bound on the error.

From error to collision probability

As mentionned, we want to bound the error by something related to the collision probability where , and

First part of the proof

Bounding the collision probability

Here we make use of the 2-universal family property to bound the collision probability .

Second part of the proof

Side information case

To model side information, we will take to be a cq-state such that .

Before proving the statement with side information, we need to dive into the measurements Eve can make. Indeed, finding the optimal measurement to guess can be solved via semi-definite programming, but there is no formula whenever . Instead, we will have to consider pretty good measurements.

Pretty good measurements

Choosing the measurement is positive semidefinite, but not necessarily normalized such that , except when the are perfectly distinguishable. We can instead define where and we use the Moore-Penrose pseudo-inverse, and is a POVM. One can show that the probability of guessing using and this POVM is the square of the optimal probability given any POVM.

The proof with this pretty good measurement of the quality of our strong seeded extractor follows the same structure as the previous one, however it is quite more involved. If you do want to check it out, it starts page 13 of Lecture 4.

Back to cryptography

Using the results we have just shown, we can now devise a relatively simple protocol for privacy amplification:

Privacy Amplification Protocol

Alice and Bob build their shared weak secret , which might be correlated to Eve's .
Alice chooses a random seed for the extractor, and sends it to Bob through the public channel.
Alice and Bob each compute using the extractor built with hashing functions over the finite field with elements.

This protocol is always correct since Alice and Bob perform the same computation (assuming they have the correct shared to start with).

It is also secure, as our seeded extractor gives us the guarantee that is close to uniform, even conditionned on and . By using the 2-universal family of hash functions to build the strong seeded extractor, the protocol is now complete.

This will be useful for next session, which will discuss key distribution protocols.

Session 2 - Improving Security

Privacy Amplification

Randomness Extraction

A simple example

Seeded extractors

Why the min-entropy?

Hashing

How good is Hashing?

No side information case

From error to collision probability

Bounding the collision probability

Side information case

Pretty good measurements

Back to cryptography