# Compressed Sensing and Neural Network for Quantum State Tomography

Information Theory (Winter 2020)

By: Beicheng Lou and Evan Laksono

Hey, this blog is about how to apply compressed sensing to (quantum) state tomography. We’ll start with an extremely friendly introduction:

## Motivating game 1: Blank filling

Consider a blank filling game:

1. he_lo
2. d_g
3. s_pe_cal_fr_g_li_t_cex_i_lid_cio_s

You might fill in the first one (hello) and the last one (supercalifragilisticexpialidocious) with confidence, while the middle one may be a bit ambiguous (dog? dig? dag?).

Before this game, one might think the more blanks we have, the more possibilities it allows and therefore, the less confidence we have. However, in this demonstration, we see the opposite side: the more blanks we have (with the total length also longer correspondingly), the more constrained the problem is.

In some sense, this is the essence of compressed sensing: we have high-dimensional data with low-dimensional structure, thus with few observations we can tell what the data is.
In the 3rd example, we have a 34-alphabet word (high-dimensional data), but not every alphabet is arbitrary: the word should be one of the very few valid 34-alphabet words (low-dimensional structure). As a result, knowing a few alphabets in the word (few observations) is sufficient for us to tell what word it is.

## Motivating game 2: Pokemon identification

In some sense, humanity has known tomography since antiquity. Haven’t we all been intrigued by our shadows cast by the sun shine, pondering what it is? At least, in Plato’s allegory of the cave, common men are chained in a cave while watching the shadows of objects which are projected on a blank wall by the fire behind them.

We’d better not dwell too much onto this grim picture of humanity as Plato saw it. Instead, we can always take a lighter approach by contemplating on Pokemon’s silhouettes, which I would call “Pokemo-graphy”. I hope this sounds cute enough, but I know no better ways to pay homage to the cuteness of Pokemon. Can you help me to figure out the following Pokemon? I bet you should not call yourself a Pokemon fan if you cannot answer it perfectly without the aid of Professor Google. Let us pick the easiest example: Pikachu. Do you learn anything from the practice? Can you decipher your thought process, to conclude that it is indeed Pikachu? Firstly, I think telling you in advance that it is a Pokemon helps to narrow down your wild imagination as you are searching for an answer. There are also some unmistakable marks of Pikachu in the silhouette, such as its thunder-like tail or its chubby cheeks. We are pretty sure that you must have started to imagine where all the body parts are as soon as you see its shadow.

At this point, you probably have already gleaned some intuition regarding what tomography is about. We wish to reconstruct what the actual object looks like by simply taking its cross-sections. The idea has been explored widely for application in many areas, the most notable one is in medical imaging. Interestingly, a similar procedure is essential in the context of quantum computation. In the strange quantum world, we can only see a shadow of the quantum object so that we need to be intelligent if we want to know what it actually is.

## Quantum State Tomography (QST)

We are now getting into more serious stuffs. We really want to reconstruct a certain quantum state, both accurately and efficiently. Without getting too much into physics, this QST problem can be formulated as: we have a bunch of possible measurements that can be applied on a quantum state, and from the measurement results, we shall infer what the state is.

There are a number of viable alternatives to perform such a task, but in this work we explore the use of compressed sensing (CS) and neural network (NN) to perform QST.

### Compressed Sensing-QST (CS-QST)

“Motivating Game 1” gives some flavor of compressed sensing in a handwavy fashion. Here we present it more concretely:

In the classical problem of signal reconstruction, if we know a signal is sparse (i.e. very few nonzero entries) in frequency domain while our measurements are just a few samples from the time domain, we can reconstruct the signal in the form: where y is the measurement we have in time domain, x is the signal in frequency domain and A is the Fourier transform matrix. The $L_1$ norm is used instead of $L_0$ to encourage sparsity (for numerical tractibility).

This procedure is captured in a nice animation copied from Wikipedia:

## On the left is the signal in e.g. time domain which we want to reconstruct and are able to sample from. The orange line is our reconstruction and the black dots are the sampled data. On the right is the signal’s sparse representation in e.g. frequency domain.

Of course, the trick of compressed sensing is not guaranteed to work for any reconstruction problem with low dimensional structure: the reconstruction matrix (‘A’ in our example) has to satistfy the ‘Restricted Isometry Property’ (RIP). Some examples other than signal reconstruction would be image reconstruction, where we have the prior knowledge that images are typically sparse in wavelet domain. Luckily, RIP is satisfied by random matrices, thus making compressed sensing a widely applicable technique.

Moving from classical signal reconstruction to quantum state tomography, the recipe is still quite simple, but here instead of a vector, we want to reconstruct a matrix, and instead of knowing the signal is sparse, we know e.g. the matrix is low-rank. Physically, if a quantum state is described by a density matrix of low rank, then it is almost a pure state (which usually means we have a good quantum resource). Mathematically, what we are doing here is effectively matrix completion and “quantum state tomography” just gives it some physical meanings and constraints.

Consider m arbitrary measurement operators (yes, we have relaxed the typical constraint in literature that measurement operator be a composition of spin projections) $A_1$, $A_2$, …, $A_m$ and a quantum state represented by a density matrix $\rho$ of a low rank r. Through the m different measurements represented by the operators $A_1$, $A_2$, …, $A_m$, we can reconstruct the state (target matrix) by a convex optimization: Using a target matrix of $d = 10$ ( $\sim$3 qubits), with varying ranks ( $1\leq r \leq 10$), we show the minimum error one can achieve with m random measurements, as in Fig. 1. As a sanity check, we first note the obvious fact that minimum error should decrease with respect to the number of measurements. Moreover, when the target matrix is not full-rank, the error can be reduced to 0 without involving 100 measurements (corresponding to the total degrees of freedom for a $10\times 10$ matrix). Fig. 1 Reconstruction error vs number of measurements, for different target  matrix ranks

Interestingly, we also observed that when the number of measurements is below the minimum requirement, lower rank matrix does not lead to a better reconstruction. This feature was not reported in Ref. 1, and the result is shown in Fig. 2, which is intuitive in retrospect, because with very few measurements, low-rank matrix means higher likelihood for the measurement operators to act on bases in which the low-rank properties of the density matrix are not manifest, thus reducing the quality of information we obtained from the few measurements. On the other hand, a matrix of lower-rank contains less information so that it takes less measurements to perfectly recover such a matrix. This explains the observed crossover in Fig. 1.

Next, we look at the trend of number of required measurements (M) for perfect reconstruction for different ranks (which should be linear according to CS literature), as in Fig. 2. Fig. 2 Required number of measurements for different target matrix ranks, fitted by different functions}

Here we notice that the trend can be very well approximated by an exponentially rising function $A(1-e^{-r/B})$, which is indeed linear in r at low-rank limit and thus consistent with the literature.

We repeat the aforementioned procedure (first plot accuracy against # of measurements and then extract the threshold for perfect reconstruction), but on target matrices with fixed rank (r=2) and various dimensions d. The trend of number of required measurements with different target dimension is shown in Fig. 3. Fig. 3 Required number of measurements for different target matrix dimensions, fitted by different functions

Here we found the relation of M to d is $M\propto d\log{d}$, as predicted in classical CS  and different from the bounds for quantum case . This is the effect of our relaxation of the constraint on $A_m$ (i.e. more freedom to choose measurement operator -> more random measurement matrices -> better scaling of # of measurements required for accurate reconstruction).

### Neural Network-QST (NN-QST)

We also explored how neural network can be used to perform quantum state tomography. One central question of the problem is the how neural network can be used to encode a certain quantum state. Here, we try to reproduce the state-of-art methods to represent quantum wavefunctions or density matrices using Restricted Boltzmann Machine (RBM) . The modification comes in as we apply the methods on our own simulated measurements on N-qubit system. The basic idea is that we can use two separate RBMs to encode the probability amplitude ( $C$) and phase ( $\phi$) respectively in a certain basis set $\bm{\sigma} = \{\sigma_{1}, ..., \sigma_{N}\}$, where $\sigma_{i}$ can stand for any Pauli basis defined according to the measurement axis of qubit $i$. With index $\kappa \in \{C,\phi\}$, the Boltzmann weight for a particular configuration is given by: $p_{\kappa}(\bm{\sigma},\bm{h}) = \exp\bigg[\sum_{ij}W^{\kappa}_{ij}h_{i}\sigma_{j} + \sum_{j} b^{\kappa}_{j}\sigma_{j} +\sum_{i} c^{\kappa}_{i}h_{i}\bigg],$

where $\bm{h} = \{h_{1}, ..., h_{M}\}$ denotes the states of the hidden nodes, with $h_{j}\in\{-1,1\}$. We subsequently obtain the distribution over $\bm{\sigma}$, i.e. $p_{\kappa}(\bm{\sigma})$ by marginalizing the dependence on $\bm{h}$. The resulting wavefunction is given by: $\psi(\bm{\sigma}) = C(\bm{\sigma})\exp(i\phi(\bm{\sigma})),$ $C(\bm{\sigma}) = \sqrt{\frac{p_{C}(\bm{\sigma})}{\sum_{\bm{\sigma}}p_{C}(\bm{\sigma})}}, \hspace{10pt}\phi(\bm{\sigma}) = \frac{1}{2}\log(p_{\phi}(\bm{\sigma})).$

In terms of density matrix formulation, the scheme in Ref.  only works for rank-1 matrices, i.e. pure states. In our current implementation, we extend it to capture the diagonal terms of any density matrices based on simulated measurements on $N$-qubit system. The RBM is therefore trained to minimize the Kullback-Leibler (KL) divergence in the measurement basis: $\text{KL}_{C} = \sum_{\bm{\sigma}}P(\bm{\sigma})\log\frac{P(\bm{\sigma})}{\rho(\bm{\sigma,\sigma})}$

whereas $P(\bm{\sigma})$ denotes the measurement statistics in the $\bm{\sigma}$ basis, i.e. the probability of attaining state $\bm{\sigma}$ upon measurement, while $\rho(\bm{\sigma},\bm{\sigma})$ is the diagonal element of the density matrix in the $\bm{\sigma}$ basis. Our results for 3-qubit system are shown in Fig.  . Fig. 4: The original and reconstructed probability density of 3-qubit system under 10,000 measurements using RBM.

### What’s more

The next step is to extend the scheme to general low-rank matrices, which can in principle be implemented using auxiliary nodes, which can later be integrated out to produce mixed states . The reconstruction of the whole density matrices can in principle be done by choosing a set of measurement bases, and then allowing the RBM to learn their most faithful representations by minimizing the KL-divergence over all the measurement bases.

You may wonder in what sense can this NN-based QST be compressed? Well, one obvious way is to use regularization. The enormous body of literature on regularization in machine learning will allow you to have a way more general framework for compressed QST, where the prior knowledge is not limited to sparsity of the parameters. One can imagine having a generative model producing the parameters of the neural network, which can encode many different types of prior knowledge and sparsity is just one of them. For example, if our lab has some systematic problem that bias the quantum states we can produce, that can be captured by such a generative model. Well, that is a different story and will take up another blog post.

Look forward to that!

## References

 S. T. Flammia, D. Gross, Y.-K. Liu, and J. Eisert. Quantumtomography via compressed sensing: error bounds, samplecomplexity and efficient estimators. New Journal of Physics,14(9):095022, Sep 2012

 E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty prin-ciples: exact signal reconstruction from highly incomplete fre-quency information. IEEE Transactions on Information Theory, 52(2):489–509, Feb 2006

 G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko,and G. Carleo. Neural-network quantum state tomography. Nature Physics, 14:447–450, May 2018

 G. Torlai and R. G. Melko. Latent space purification via neural density operators. Physical Review Letters, 120:240503, Jun 2018.