(Warning: These materials may be subject to lots of typos and errors. We are grateful if you could spot errors and leave suggestions in the comments, or contact the author at yjhan@stanford.edu.)
1. Overview
Most scientific research is devoted to developing new tools or findings for challenging problems arising in practice, which help to provide explicit solutions to real problems. These works are typically called constructive, and they characterize tasks which are possible to accomplish. However, there is also another line of work which focuses on the other side, i.e., provides fundamental limits on certain tasks. In other words, these works are mostly non-constructive and illustrate targets which are impossible to achieve. Such results are also useful for a number of reasons: we may certify the fundamental limits and the optimality of some procedures so that it becomes unnecessary to look for better ones, we may understand the limitations of the current framework, and we may compute the amount of resources necessary to complete certain tasks.
Among different types of fundamental limits (called the converse in information theory, and lower bounds broadly in a number of other fields), throughout these lectures we will be interested in the so-called information-theoretic ones. When it comes to information-theoretic lower bounds, typically one makes observations (either passively or actively) subject to some prescribed rules, and argues that certain tasks cannot be accomplished by any tool based solely on these observations. To show such impossibility results, one argues along the lines that the amount of information contained in these observations is insufficient to perform these tasks. Therefore, the limited-information structure is of utmost importance in the above arguments, and this structure occurs in a number of fields including statistics, machine learning, optimization, and reinforcement learning, to name a few; e.g., a given number of samples to perform inference, a given amount of training data to learn a statistical model, a given number of gradient evaluations to optimize a convex function, a given number of rounds of adaptivity to search for a good policy, etc.
These lectures are devoted to providing an overview of (classical and modern) tools for establishing information-theoretic lower bounds. We will neither be restricted to specific problems (e.g., statistical inference, optimization, bandit problems), nor restrict ourselves to tools from information theory. Instead, we will try to present an extensive set of tools/ideas suitable for different problem structures, followed by numerous interdisciplinary examples. We will see that certain tools/ideas can be applied to many seemingly unrelated problems.
2. Usefulness of Lower Bounds
We ask the following question: why do we care about lower bounds? We remark that the usefulness of lower bounds is not restricted to establishing fundamental limits and telling people what is impossible. In fact, the power of understanding lower bounds lies more on the upper bound side, in the sense that it helps us to understand the problem structure better, including figuring out the most difficult part of the problem and the most essential pieces of information, of which full use should be made. In other words, lower bounds are intertwined with upper bounds, and should by no means be treated as an independent component. We elaborate on this point via the following examples from puzzle games.
2.1. Example I: Card Guessing Game
There is a magic show as follows: there is a $52$-card deck thoroughly shuffled by the audience. Alice draws $5$ cards from the top of the deck and reveals $4$ of them to Bob one after another, and then Bob can always correctly guess the remaining card in Alice's hand. How can Alice and Bob achieve that?
Instead of proposing an explicit strategy for Alice and Bob, let us look at the information-theoretic limit of this game first. Suppose the deck consists of $n$ cards; what are the possible values of $n$ such that such a strategy still exists? To prove such a bound, we need to understand what the possible strategies are. From Alice's side, her strategy is simply a mapping $f$ from unordered $5$-tuples to ordered $4$-tuples, with the additional restriction that $f(A)\subseteq A$ for any unordered $5$-tuple $A$. From Bob's side, his strategy is another mapping $g$ from ordered $4$-tuples to a specific card in $[n] := \{1,2,\cdots,n\}$. Finally, the correctness of Bob's guessing corresponds to

$$\{g(f(A))\} = A\backslash f(A), \qquad \text{for any unordered } 5\text{-tuple } A, \qquad (1)$$

where we slightly abuse notation and treat the ordered tuple $f(A)$ as a set.
An equivalent way to state (1) is that Bob can recover all $5$ cards after observing the first $4$ ordered cards; write $g': f(A)\mapsto A$ for this strategy. Now we come to our information-theoretic observation: do ordered $4$-tuples contain enough information to recover any unordered $5$-tuple? Here we will quantify the information as cardinality: mathematically speaking, the mapping $g'$ must be surjective, so the number of ordered $4$-tuples must be at least the number of unordered $5$-tuples. In other words,

$$n(n-1)(n-2)(n-3) \ge \binom{n}{5} = \frac{n(n-1)(n-2)(n-3)(n-4)}{120}, \qquad \text{i.e., } n \le 124. \qquad (2)$$
Hence, this magic show will fail for any deck with more than $124$ cards. The next question is: is $n = 124$ achievable? Now we are seeking an upper bound, but the previous lower bound still helps: equality in (2) implies that $g'$ must be a bijection!
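As a quick numerical sanity check of (2), the following snippet (our own, not part of the original notes) verifies that the counting inequality holds exactly when $n\le 124$, with equality at $n = 124$:

```python
# Check the counting bound (2): the number of ordered 4-tuples,
# n(n-1)(n-2)(n-3), must be at least the number of unordered 5-subsets, C(n,5).
from math import comb

feasible = [n for n in range(5, 200) if n*(n-1)*(n-2)*(n-3) >= comb(n, 5)]
assert max(feasible) == 124             # the bound (2): n <= 124
assert 124*123*122*121 == comb(124, 5)  # equality at n = 124: g' is a bijection
```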
Keeping the bijective nature of $g'$ in mind, it then becomes not too hard to propose the following strategy which works. Label all cards by $0,1,\cdots,123$, and suppose the chosen cards are $a_0 < a_1 < \cdots < a_4$. Alice computes the sum $s := (a_0 + a_1 + \cdots + a_4) \bmod 5$, keeps $a_s$, and reveals all the others $(a_j)_{j\neq s}$ to Bob in a suitable order. Let $x = a_s$ and let $b_1, b_2, b_3, b_4$ denote the revealed cards; then for Bob to decode $x$, he only needs to solve the following equation:

$$x + b_1 + b_2 + b_3 + b_4 \equiv i(x) \pmod 5, \qquad (3)$$

where we define $i(x) := |\{j: b_j < x\}|$ to be the rank of $x$ among the revealed cards (note that the left-hand side of (3) is the sum of all five cards, and the rank of $x = a_s$ among the revealed cards is exactly $s$). It is easy to show that (3) always admits exactly $24$ solutions in $\{0,1,\cdots,123\}\backslash\{b_1,\cdots,b_4\}$: the map $x\mapsto x - i(x)$ is a bijection from this set onto $\{0,1,\cdots,119\}$, and (3) fixes the value of $x - i(x)$ modulo $5$. Hence, any solution can be encoded using the $4! = 24$ different permutations of the revealed cards $b_1, b_2, b_3, b_4$.
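To make the construction fully concrete, here is a minimal Python sketch of the complete strategy; it is our own illustration (the names `alice` and `bob`, and the choice of encoding the solution index of (3) by the lexicographic rank of the permutation of revealed cards, are ours), not code taken from a reference:

```python
# A sketch of the 124-card trick: cards are labeled 0..123.
from itertools import permutations

def solutions(revealed):
    """All x satisfying (3): x + sum(revealed) = #{b < x} (mod 5)."""
    return [x for x in range(124) if x not in revealed
            and (x + sum(revealed)) % 5 == sum(b < x for b in revealed) % 5]

def alice(hand):
    """Given 5 cards (a set), return the 4 revealed cards in order."""
    a = sorted(hand)
    s = sum(a) % 5
    hidden, revealed = a[s], a[:s] + a[s+1:]
    sols = solutions(revealed)
    assert len(sols) == 24 and hidden in sols
    # Encode which of the 24 solutions is hidden via the permutation order.
    return sorted(permutations(revealed))[sols.index(hidden)]

def bob(revealed):
    """Given the ordered 4 revealed cards, recover the hidden card."""
    rset = sorted(revealed)
    return solutions(rset)[sorted(permutations(rset)).index(tuple(revealed))]

# Spot checks (an exhaustive check over all C(124,5) hands also passes, slowly).
for hand in [{0, 1, 2, 3, 4}, {10, 37, 55, 98, 123}, {5, 17, 60, 61, 119}]:
    revealed = alice(hand)
    assert {bob(revealed)} == hand - set(revealed)
```

Note that Bob rebuilds the same solution list and the same ordering of permutations from the unordered set of revealed cards, so every ordered $4$-tuple decodes to a unique $5$-set; this is exactly the bijectivity of $g'$ at work.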
2.2. Example II: Coin Flipping Game
Now Alice and Bob play another game cooperatively: consider $n$ coins on the table, each of which may be head or tail, and the initial state is unknown to both Alice and Bob. The audience tells a number in $[n] = \{1,\cdots,n\}$ to Alice, and she comes to the table, looks at all coins, and flips one of them. Bob then comes to the table and must correctly tell the number given by the audience. The question is: for which values of $n$ can they come up with such a strategy?
To answer this question, again we need to understand the structure of all strategies. Let $x\in\{0,1\}^n$ be the initial state of the coins, which may be arbitrary. We may also identify the states as vertices of an $n$-dimensional hypercube. Let $m\in[n]$ be the number given by the audience. Alice's strategy is then a mapping $(x,m)\mapsto x\oplus e_{a(x,m)}$ for some $a: \{0,1\}^n\times[n]\to[n]$, choosing a coin to flip (where $e_i$ denotes the $i$-th canonical vector) based on her knowledge. Then Bob's strategy is a mapping $\mathsf{B}: \{0,1\}^n\to[n]$, and the correctness condition implies that

$$\mathsf{B}(x\oplus e_{a(x,m)}) = m, \qquad \text{for all } x\in\{0,1\}^n,\ m\in[n]. \qquad (4)$$
It is clear from (4) that the map $m\mapsto a(x,m)$ for any fixed $x$ must be injective. By cardinality arguments, this map is further bijective. Now the problem structure becomes more transparent: for each $m\in[n]$, let $\mathsf{B}^{-1}(m)\subseteq\{0,1\}^n$ be the set of states where Bob will claim $m$. We say that these vertices of the hypercube have the $m$-th color. Then (4) states that, for each vertex $x$, its $n$ different neighbors $x\oplus e_1,\cdots,x\oplus e_n$ must have $n$ different colors. The converse is also true: if there exists such a coloring scheme, then we may find $(a,\mathsf{B})$ such that (4) holds.
Now when does such a coloring scheme exist? A simple idea is that the number of vertices of each color should be the same by a double counting argument: each of the $2^n$ vertices has exactly one neighbor of color $m$, while each vertex of color $m$ has exactly $n$ neighbors, so there are exactly $2^n/n$ vertices of color $m$. Hence, we must have $n \mid 2^n$, which implies that $n$ must be a power of $2$. Based on the coloring intuition given by the lower bound, the strategy also becomes simple: consider any finite Abelian group $G = \{g_1,\cdots,g_n\}$ with $n$ elements; we identify colors with the elements of $G$ and let $x\in\{0,1\}^n$ have color

$$c(x) = \sum_{i: x_i = 1} g_i.$$

It is straightforward to verify that (4) holds if and only if $G$ has characteristic $2$, i.e., $g + g = 0$ for any $g\in G$ (so that flipping coin $i$ always changes the color by $g_i$, and the $n$ neighbors of any vertex receive the $n$ distinct colors $c(x)+g_1,\cdots,c(x)+g_n$). By algebra, such an Abelian group exists if and only if $n$ is a power of $2$, e.g., $G = \mathbb{F}_2^k$ when $n = 2^k$ (necessity can be shown by taking quotients $G\to G/\{0,g\}$ repeatedly). Hence, such a strategy exists if and only if $n$ is a power of $2$.
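This construction is easy to verify by brute force. Below is a minimal Python sketch of our own, where coins and audience numbers are labeled $0,\cdots,n-1$ (rather than $1,\cdots,n$) so that the group operation of $G = \mathbb{F}_2^k$ becomes the bitwise XOR of labels:

```python
# The coin-flipping strategy over F_2^k, with n = 2^k coins labeled 0..n-1.
from itertools import product

n = 8  # must be a power of 2

def color(x):
    """c(x): the sum (bitwise XOR) of the labels of head-up coins."""
    c = 0
    for i, bit in enumerate(x):
        if bit:
            c ^= i
    return c

def alice(x, m):
    """Flip the unique coin that moves the color of the state to m."""
    return color(x) ^ m  # flipping coin j changes the color by j

def bob(y):
    """Announce the color of the observed state."""
    return color(y)

# Exhaustive verification of the correctness condition (4):
for x in product([0, 1], repeat=n):
    for m in range(n):
        y = list(x)
        y[alice(x, m)] ^= 1  # Alice flips one coin
        assert bob(y) == m   # Bob recovers the audience's number
```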
2.3. Example III: Hat Guessing Game
Now we look at a hat-guessing game with a more complicated information structure. There are $n$ people sitting together, each of whom wears a hat colored red or blue independently with probability $\frac 12$ each. They cannot talk to each other, and each of them can see the colors of all hats except for his own. Now each of them simultaneously chooses to guess the color of his own hat, or chooses to pass. They win if at least one person guesses, and all guesses are correct. What is their optimal winning probability in this game?
The answer to this question is $\frac{n}{n+1}$ (when $n = 2^k - 1$ for some integer $k$), which is shocking at first appearance because it greatly outperforms the $\frac 12$ achieved by the naive guessing scheme where only the first person guesses. To understand this improvement, we first think about the fundamental limit of this problem. Let $x\in\{{\rm red},{\rm blue}\}^n$ be the sequence of hat colors; $x$ is called a success state if they win under state $x$, and a failure state if they fail. Clearly, at each success state $x$, there must be some person $i$ who makes a correct guess. However, since he cannot see $x_i$, he will make the same guess even if $x_i$ is flipped, and this guess becomes wrong in the new state. This argument seems to suggest that the $\frac 12$ winning probability is not improvable, for each success state corresponds to a failure state and therefore the success states constitute at most half of all states. However, this intuition is wrong, since multiple success states may correspond to the same failure state.
Mathematically speaking, let $S\subseteq\{{\rm red},{\rm blue}\}^n$ be the set of success states and $F$ be the set of failure states. By the previous argument, there exists a map $\varphi: S\to F$ which only flips one coordinate. Since there are at most $n$ coordinates which can be flipped, we have $|\varphi^{-1}(y)|\le n$ for each $y\in F$. Consequently, $|S|\le n|F| = n(2^n - |S|)$, and the winning probability is

$$\mathbb{P}(\text{win}) = \frac{|S|}{2^n} \le \frac{n}{n+1}.$$
The above lower bound argument shows that, in order to achieve the optimal winning probability, we must have $|\varphi^{-1}(y)| = n$ for any $y\in F$. Consequently, in any failure state, it must be the case that everyone makes a wrong guess. This crucial observation motivates the following strategy: identify the people with the non-zero elements of $\mathbb{F}_2^k$, where $n = 2^k - 1$. Each person $i$ computes the sums (in $\mathbb{F}_2^k$) $s_{\rm red}$ and $s_{\rm blue}$ of the labels of the people he sees with red and blue hats, respectively. If $s_{\rm red} = i$, he claims blue; if $s_{\rm blue} = i$, he claims red; otherwise he passes (these two cases cannot occur simultaneously, since $s_{\rm red} + s_{\rm blue} = i \neq 0$). Then it becomes clear that a failure occurs if and only if the labels of the people with red hats sum to $0$, whose probability is $2^{n-k}/2^n = \frac{1}{n+1}$.
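This strategy can also be checked exhaustively. The following Python sketch is our own illustration for $k = 3$, i.e., $n = 7$ people labeled by $1,\cdots,7$ (the non-zero elements of $\mathbb{F}_2^3$, with XOR as the group operation; the helper name `guess` is ours); it confirms that the team wins in exactly $112 = \frac 78\cdot 2^7$ of the $2^7$ equally likely hat configurations:

```python
# The Hamming-code hat strategy for n = 2^k - 1 = 7 people.
from itertools import product

n = 7

def guess(i, hats):
    """Person i's guess (1 = red, 0 = blue) from the hats he sees; None = pass."""
    s_red = s_blue = 0
    for j in range(1, n + 1):
        if j != i:
            if hats[j - 1]: s_red  ^= j  # XOR of labels with red hats seen
            else:           s_blue ^= j  # XOR of labels with blue hats seen
    if s_red == i:  return 0  # claim blue
    if s_blue == i: return 1  # claim red
    return None               # pass

wins = 0
for hats in product([0, 1], repeat=n):  # hats[j-1] is person j's hat color
    guesses = {j: guess(j, hats) for j in range(1, n + 1)}
    made = {j: g for j, g in guesses.items() if g is not None}
    if made and all(g == hats[j - 1] for j, g in made.items()):
        wins += 1
assert wins == 112  # winning probability 112/128 = 7/8 = n/(n+1)
```

In fact, in every losing configuration all seven people guess and all of them are wrong, exactly as the tightness condition $|\varphi^{-1}(y)| = n$ in the lower bound demands.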
3. Bibliographic Notes
There are tons of puzzles that can be solved via information-theoretic analysis. We recommend two books if the reader would like to learn more about such fun puzzles:
- Peter Winkler, Mathematical Puzzles: A Connoisseur's Collection. AK Peters/CRC Press, 2003.
- Jiří Matoušek, Thirty-three Miniatures: Mathematical and Algorithmic Applications of Linear Algebra. Providence, RI: American Mathematical Society, 2010.