Low-Complexity Integer-Forcing Methods for Block Fading MIMO Multiple-Access Channels

Integer forcing is an alternative approach to conventional linear receivers for multiple-antenna systems. In an integer-forcing receiver, integer linear combinations of messages are extracted from the received matrix before each individual message is recovered. Recently, the integer-forcing approach was generalized to a block fading scenario. Among the existing variations of the scheme, the ones with the highest achievable rates have the drawback that no efficient algorithm is known to find the best choice of integer linear combination coefficients. In this paper, we propose several sub-optimal methods to find these coefficients with low complexity, covering both parallel and successive interference cancellation versions of the receiver. Simulation results show that the proposed methods attain performance close to optimal in terms of achievable rates for a given outage probability. Moreover, a low-complexity implementation using root LDPC codes is developed, showing that the benefits of the proposed methods also carry over to practice.


Ricardo Bohaczuk Venturelli and Danilo Silva

I. INTRODUCTION
Integer-forcing (IF) receivers are an alternative to conventional methods of equalization, such as zero-forcing (ZF) and minimum-mean-squared-error (MMSE) equalization [2] for multiple-input and multiple-output (MIMO) channels. The IF approach follows from the compute-and-forward framework [3], [4] for relay networks, where the receivers attempt to extract integer linear combinations of the transmitted messages from the received signals, before recovering the messages themselves.
Although joint maximum likelihood (ML) receivers achieve the best performance among all methods, by searching over all possible transmitted codewords [5], their complexity is prohibitively high, increasing exponentially with the number of users. In contrast, IF receivers have a much lower complexity and can approach the ML performance in many situations [2]. Moreover, the performance of an IF receiver can be further improved in some situations by successive computation, leading to the so-called successive IF (SIF) receiver, analogously to the successive interference cancellation (SIC) technique for conventional linear receivers.
Recent works on IF include the design of practical channel codes compatible with IF [6], [7], efficient methods to select the coefficients of the integer linear combinations [8], [9], as well as its application to relay networks [10]. Moreover, the IF principle has been used not only as a receive method in the MIMO uplink scenario, but also for precoding in the MIMO downlink scenario [11], [12] and even for source coding [13].
A preliminary version of this paper was presented at the XXXIV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT'16), Santarém, PA, Brazil, August 30-September 4, 2016 [1]. Digital Object Identifier: 10.14209/jcis.2017.14
The main results about IF receivers consider static fading, where all symbols of a codeword are subject to the same channel fading. However, in a practical situation where a powerful code with large blocklength is used, it may not be realistic to assume that all symbols of the codeword are subject to the same channel fading. Therefore, channels that allow block fading [14], where the channel fading can vary during the transmission of a codeword, seem to be a more realistic model.
In a recent work, El Bakoury and Nazer [15] generalized the IF approach to a block fading scenario. They described two decoding methods for block fading, called AM (arithmetic mean) and GM (geometric mean) decoding. The AM decoding method approximates the effective noise seen on all blocks (after equalization) as having the same variance, as in the case of static fading. On the other hand, the GM decoding method optimally exploits the diversity inherent in the channel variation, allowing higher rates to be achieved than with AM decoding. Both decoding methods are applicable to both (non-successive) IF and SIF receivers.
The rates achievable by all these receivers depend on the choice of an integer matrix A specifying the coefficients of the linear combinations that should be decoded. However, finding the optimal choice of A for GM-IF, AM-SIF and GM-SIF appears to be a hard problem for which no efficient approximation algorithm is known, making these receivers currently infeasible to implement in practice. Nevertheless, results obtained in [15] by exhaustive search demonstrate that optimal GM-IF and AM-SIF significantly outperform AM-IF in terms of achievable rates, while GM-SIF outperforms all of them.
Our main contribution in this paper is to develop low-complexity optimization methods for these receivers (GM-IF, AM-SIF and GM-SIF) which can closely approach the performance obtained by an optimal search. Each proposed method is based on optimizing a more tractable lower bound on the achievable rate, which can be performed very efficiently. Simulation results show that the proposed method for GM-IF has a small gap compared to optimal performance, while the remaining two methods, at least for the scenarios tested, have performance almost indistinguishable from the optimal one.
Another contribution of this paper is to develop a low-complexity implementation of these receivers using finite-length codes over a low-order constellation, in order to validate the information-theoretic results under practical constraints. Attaining the GM performance requires the use of full-diversity codes, from which we have adopted root low-density parity-check (LDPC) codes. Simulation results are shown to be consistent with the theoretical ones, with an expected gap due to the finite codeword length.
The remainder of the paper is organized as follows. The system model is described in Section II. Section III reviews integer forcing for static fading as well as block fading, while Section IV reviews successive integer forcing, again for both fading types. In Section V, we present our proposed methods for selecting A. Section VI describes our low-complexity implementation under practical constraints. Simulation results are shown in Section VII, covering the information-theoretic performance as well as the performance with practical codes. Lastly, our conclusions are presented in Section VIII.
arXiv:1711.07977v1 [cs.IT] 21 Nov 2017

A. Notation
For any $x > 0$, define $\log^+(x) \triangleq \max(\log(x), 0)$. We denote row vectors as lowercase bold letters (e.g., $\mathbf{x}$) and matrices as uppercase bold letters (e.g., $\mathbf{X}$). The $\ell_2$-norm of a vector $\mathbf{x}$ is denoted by $\|\mathbf{x}\|$. The matrix $\mathbf{X}^T$ denotes the transpose of $\mathbf{X}$. We use $\mathbf{I}$ and $\mathbf{0}$, respectively, to denote an identity and an all-zero matrix of appropriate size, which should always be clear from the context. The set of all $m \times n$ matrices with entries from the set $\mathcal{A}$ is denoted $\mathcal{A}^{m \times n}$.
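
II. SYSTEM MODEL
Consider a discrete-time, real Gaussian MIMO multiple-access channel (MAC) with $N_T$ single-antenna transmitters and one $N_R$-antenna receiver, subject to block fading with $F$ independent fading realizations per codeword.
Specifically, let $n$ be the codeword length and assume, for simplicity, that $F$ divides $n$. For $\ell = 1, \ldots, N_T$, let $\mathbf{x}_\ell \in \mathbb{R}^n$ denote the vector transmitted by the $\ell$th transmitter, which can be represented by the $\ell$th row of a matrix $\mathbf{X} \in \mathbb{R}^{N_T \times n}$. Similarly, let $\mathbf{Y} \in \mathbb{R}^{N_R \times n}$ be a matrix whose $j$th row represents the vector received by the $j$th receive antenna. Assuming fading realizations of equal length, the received matrix can be expressed as
$$\mathbf{Y} = \begin{bmatrix} \mathbf{Y}^{(1)} & \cdots & \mathbf{Y}^{(F)} \end{bmatrix} \tag{1}$$
where, for $i = 1, \ldots, F$,
$$\mathbf{Y}^{(i)} = \mathbf{H}^{(i)} \mathbf{X}^{(i)} + \mathbf{Z}^{(i)}, \tag{2}$$
$\mathbf{X}^{(i)} \in \mathbb{R}^{N_T \times (n/F)}$ is the $i$th block of $\mathbf{X}$, $\mathbf{H}^{(i)} \in \mathbb{R}^{N_R \times N_T}$ is the channel matrix of the $i$th block, and $\mathbf{Z}^{(i)} \in \mathbb{R}^{N_R \times (n/F)}$ is a Gaussian noise matrix with i.i.d. entries of zero mean and variance $\sigma^2$. Note that $\mathbf{H}^{(i)}$ remains constant throughout the transmission of a block of $n/F$ symbols, but can vary independently between realizations. For convenience, let
$$\mathbf{H}^{(1:F)} \triangleq \left(\mathbf{H}^{(1)}, \ldots, \mathbf{H}^{(F)}\right).$$
We assume that the receiver has perfect knowledge of the channel realization $\mathbf{H}^{(1:F)}$, while the transmitters do not have this knowledge and are only aware of the channel statistics.
Each transmitted vector $\mathbf{x}_\ell$ is assumed to be the encoding of a message $w_\ell \in \mathcal{W}$ produced by the $\ell$th transmitter, where $\mathcal{W}$ is the message space. The encoder rate, which is the same for all transmitters, is defined as
$$R \triangleq \frac{1}{n} \log |\mathcal{W}|.$$
Moreover, the transmitted vectors must satisfy a (symmetric) power constraint $\frac{1}{n} \|\mathbf{x}_\ell\|^2 \leq P$.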
These assumptions on equal power and equal rate are reasonable since the transmitters do not have knowledge of the channel matrix. For convenience, we denote $\text{SNR} = P/\sigma^2$. At the receiver, the decoder attempts to recover all messages, producing estimates $\hat{w}_1, \ldots, \hat{w}_{N_T}$. An error is said to occur if $\hat{w}_\ell \neq w_\ell$ for any $\ell$. The error probability of the scheme (encoder/decoder pair) is $P_e = E[P_e(\mathbf{H}^{(1:F)})]$, where $P_e(\mathbf{H}^{(1:F)})$ denotes the error probability for a fixed channel realization.
For any fixed $\mathbf{H}^{(1:F)}$, a rate $R$ is said to be achievable if, for any $\epsilon, \delta > 0$ and sufficiently large $n$, there exists a scheme of rate at least $R - \delta$ such that $P_e(\mathbf{H}^{(1:F)}) \leq \epsilon$.
For a given family of schemes (indexed by $n$), let $R_{\text{scheme}}(\mathbf{H}^{(1:F)})$ denote its maximum achievable rate under a fixed channel realization $\mathbf{H}^{(1:F)}$. For a target rate $R$, the outage probability is defined as $p_{\text{out}}(R) \triangleq P[R_{\text{scheme}}(\mathbf{H}^{(1:F)}) < R]$, and for a fixed probability $\rho \in (0, 1]$, the outage rate is defined as $R_{\text{out}}(\rho) \triangleq \sup\{R : p_{\text{out}}(R) \leq \rho\}$.
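The outage definitions above can be estimated from Monte Carlo samples of the achievable rate. The following is a minimal sketch, not taken from the paper: the function names and the sample values are hypothetical, and the empirical distribution of the samples stands in for the true distribution of $R_{\text{scheme}}$.

```python
import math

def outage_probability(rates, target):
    """Empirical p_out(R): fraction of channel draws whose rate falls below R."""
    return sum(r < target for r in rates) / len(rates)

def outage_rate(rates, rho):
    """Empirical R_out(rho) = sup{R : p_out(R) <= rho} from sampled rates."""
    ordered = sorted(rates)
    # Number of draws that are allowed to be in outage (small epsilon guards
    # against floating-point error in rho * N).
    k = math.floor(rho * len(ordered) + 1e-12)
    if k >= len(ordered):
        return math.inf  # every target rate satisfies the outage constraint
    # Any R up to the (k+1)-th smallest sample leaves at most k draws below it.
    return ordered[k]

samples = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of R_scheme, one per channel draw
print(outage_rate(samples, 0.25), outage_probability(samples, 2.0))
```

With these four samples, a target rate of 2.0 leaves exactly one draw in outage, which is the largest rate compatible with $\rho = 0.25$.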

Remark:
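We have adopted a real-valued channel in order to facilitate comparison with optimal IF methods that require exhaustive search (whose complexity becomes prohibitively large over a complex-valued channel even for low dimension), as well as with the existing literature, which mostly considers real-valued channels. There is no loss of generality, since it is always possible to express a complex-valued channel $\mathbf{Y}_c = \mathbf{H}_c \mathbf{X}_c + \mathbf{Z}_c$ as a real-valued channel
$$\begin{bmatrix} \operatorname{Re}(\mathbf{Y}_c) \\ \operatorname{Im}(\mathbf{Y}_c) \end{bmatrix} = \begin{bmatrix} \operatorname{Re}(\mathbf{H}_c) & -\operatorname{Im}(\mathbf{H}_c) \\ \operatorname{Im}(\mathbf{H}_c) & \operatorname{Re}(\mathbf{H}_c) \end{bmatrix} \begin{bmatrix} \operatorname{Re}(\mathbf{X}_c) \\ \operatorname{Im}(\mathbf{X}_c) \end{bmatrix} + \begin{bmatrix} \operatorname{Re}(\mathbf{Z}_c) \\ \operatorname{Im}(\mathbf{Z}_c) \end{bmatrix}.$$
Alternatively, all expressions presented here can be straightforwardly generalized to the complex case by replacing transpose with conjugate transpose.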
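As a quick numerical check of the real-valued representation in the remark, the sketch below builds the stacked real channel with NumPy and compares it against the complex product. The helper names `real_channel` and `real_stack` are illustrative and do not come from the paper.

```python
import numpy as np

def real_channel(Hc):
    """Real-valued equivalent [[Re H, -Im H], [Im H, Re H]] of a complex channel."""
    return np.block([[Hc.real, -Hc.imag],
                     [Hc.imag, Hc.real]])

def real_stack(Mc):
    """Stack real and imaginary parts of a complex matrix vertically."""
    return np.vstack([Mc.real, Mc.imag])

rng = np.random.default_rng(0)
Hc = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
Xc = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))

lhs = real_stack(Hc @ Xc)                  # complex product, then stacked
rhs = real_channel(Hc) @ real_stack(Xc)    # real-valued model (noise omitted)
print(np.allclose(lhs, rhs))
```

Note that the real-valued model doubles both the number of transmit streams and the number of receive dimensions.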

III. INTEGER FORCING
To aid the understanding, we first review integer forcing for static fading (F = 1), before describing its extension to block fading (F ≥ 2). For simplicity, the superscript indicating the block index is omitted when F = 1.

A. Static Fading
An integer-forcing receiver [2] shares the same basic structure of a conventional linear receiver, as illustrated in Fig. 1: first, the received matrix is linearly transformed by an equalization matrix; then, each resulting stream is individually processed by a channel decoder. The difference, in the case of the integer-forcing receiver, is that the equalizer attempts to estimate not the transmitted signals directly but rather an integer linear transformation of the transmitted signals, which is then inverted after noise is removed.
Crucial to this noise removal step at the channel decoders is the use of a lattice code common to all transmitters [2], [3]. A lattice $\Lambda \subseteq \mathbb{R}^n$ is a discrete subgroup of $\mathbb{R}^n$, i.e., it is closed under integer linear combinations [16]. In particular, a lattice can be expressed as $\Lambda = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} = \mathbf{u}\mathbf{G}, \mathbf{u} \in \mathbb{Z}^n\}$, where $\mathbf{G} \in \mathbb{R}^{n \times n}$ is the generator matrix of $\Lambda$. It follows that if every $\mathbf{x}_\ell \in \Lambda$, then $\mathbf{a}\mathbf{X} \in \Lambda$ for all $\mathbf{a} \in \mathbb{Z}^{1 \times N_T}$ [16]. In other words, since $\mathbf{a}\mathbf{X}$ is also a codeword from the same code $\Lambda$, a decoder for $\Lambda$ is able to decode it (i.e., "denoise" it), regardless of the value chosen for $\mathbf{a}$ [3].
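In more detail, consider the received matrix in (2), which for static fading (block index omitted) reads
$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z}.$$
Let $\mathbf{A} \in \mathbb{Z}^{N_T \times N_T}$ be a full-rank integer matrix. The receiver applies the equalization matrix $\mathbf{B} \in \mathbb{R}^{N_T \times N_R}$ to create an effective channel output
$$\mathbf{Y}_{\text{eff}} = \mathbf{B}\mathbf{Y} = \mathbf{V} + \mathbf{Z}_{\text{eff}} \tag{11}$$
where
$$\mathbf{Z}_{\text{eff}} = (\mathbf{B}\mathbf{H} - \mathbf{A})\mathbf{X} + \mathbf{B}\mathbf{Z}$$
is the so-called effective noise [2] and
$$\mathbf{V} = \mathbf{A}\mathbf{X}$$
is an integer linear transformation of $\mathbf{X}$.
Assuming that each $\mathbf{x}_\ell$ is chosen from the same lattice $\Lambda \subseteq \mathbb{R}^n$, each row of $\mathbf{V}$ is also a lattice point from $\Lambda$ [3]. Thus, if decoding is successful and $\mathbf{V}$ is recovered, then $\mathbf{X}$ can also be recovered, provided $\mathbf{A}$ is full-rank (see Footnote 1). As a consequence, it is shown in [2], [3] that the following rate is achievable:
$$R_{\text{IF}}(\mathbf{H}, \mathbf{A}) = \min_{m = 1, \ldots, N_T} \frac{1}{2} \log^+\!\left(\frac{P}{\sigma^2_{\text{eff},m}}\right)$$
where
$$\sigma^2_{\text{eff},m} = \frac{1}{n} E\!\left[\|\mathbf{z}_{\text{eff},m}\|^2\right] = \|\mathbf{b}_m \mathbf{H} - \mathbf{a}_m\|^2 P + \|\mathbf{b}_m\|^2 \sigma^2$$
is the per-component variance of the vector $\mathbf{z}_{\text{eff},m}$, and $\mathbf{a}_m$, $\mathbf{b}_m$ and $\mathbf{z}_{\text{eff},m}$ are the $m$th rows of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Z}_{\text{eff}}$, respectively.
Footnote 1: One might think that a stricter condition is required, namely, that $\mathbf{A}$ is invertible. This would be true if any element of $\Lambda$ could be chosen for transmission without a power constraint. However, when nested lattice shaping is used to satisfy the power constraint, it is possible to show that requiring $\mathbf{A}$ to be full-rank is sufficient.
The optimal equalization matrix $\mathbf{B}$, for a given integer matrix $\mathbf{A}$, can be found using MMSE estimation [2] as
$$\mathbf{B} = \mathbf{A}\mathbf{H}^T \left(\mathbf{H}\mathbf{H}^T + \text{SNR}^{-1}\mathbf{I}\right)^{-1}.$$
In this case, we have [4]
$$\sigma^2_{\text{eff},m} = \sigma^2\, \mathbf{a}_m \mathbf{M} \mathbf{a}_m^T \tag{17}$$
where
$$\mathbf{M} = \left(\text{SNR}^{-1}\mathbf{I} + \mathbf{H}^T\mathbf{H}\right)^{-1} \tag{18}$$
and the achievable rate becomes
$$R_{\text{IF}}(\mathbf{H}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{\text{SNR}}{\mathbf{a}_m \mathbf{M} \mathbf{a}_m^T}\right). \tag{19}$$
Optimizing the choice of $\mathbf{A}$, we have the achievable rate
$$R_{\text{IF}}(\mathbf{H}) = \max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{IF}}(\mathbf{H}, \mathbf{A}). \tag{20}$$
As can be seen, the optimal matrix $\mathbf{A}$ solving (20) consists of a set of linearly independent integer vectors minimizing (17). Since $\mathbf{M}$ is symmetric and positive definite [4], it admits a Cholesky decomposition $\mathbf{M} = \mathbf{G}\mathbf{G}^T$, where $\mathbf{G} \in \mathbb{R}^{N_T \times N_T}$, leading to
$$\mathbf{a}_m \mathbf{M} \mathbf{a}_m^T = \|\mathbf{a}_m \mathbf{G}\|^2.$$
Thus, the optimal solution can be described as the integer coefficients (under the basis $\mathbf{G}$) of a set of $N_T$ shortest linearly independent vectors in the lattice generated by $\mathbf{G}$. This is known as the Shortest Independent Vector Problem (SIVP) [18], which is believed to be NP-hard [19]. However, suboptimal algorithms for basis reduction exist that can find an approximation in polynomial time, such as the Lenstra-Lenstra-Lovász (LLL) algorithm [21].
Note that, if $\mathbf{A} = \mathbf{I}$ is chosen, then the scheme reduces to MMSE equalization [2]. Thus, integer forcing generalizes linear equalization, providing potentially higher achievable rates.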
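The static-fading quantities above can be evaluated numerically for a given pair of matrices. The sketch below is illustrative only: `if_static` is a hypothetical helper, the channel and integer matrix are arbitrary, and rates are computed with base-2 logarithms (an assumption, since the text writes only log). It also checks that the closed form $\sigma^2 \mathbf{a}_m \mathbf{M} \mathbf{a}_m^T$ agrees with the direct expression $\|\mathbf{b}_m\mathbf{H} - \mathbf{a}_m\|^2 P + \|\mathbf{b}_m\|^2\sigma^2$ when the MMSE equalizer is used.

```python
import numpy as np

def if_static(H, A, P, sigma2):
    """Return (B, var_eff, rate) for a static channel H and integer matrix A."""
    n_r, n_t = H.shape
    snr = P / sigma2
    # MMSE equalization matrix for the chosen A.
    B = A @ H.T @ np.linalg.inv(H @ H.T + np.eye(n_r) / snr)
    # M = (SNR^{-1} I + H^T H)^{-1}
    M = np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
    # Effective noise variance per row: sigma^2 * a_m M a_m^T.
    var_eff = sigma2 * np.einsum("mi,ij,mj->m", A, M, A)
    # Worst stream limits the common rate; log base 2 is an assumption.
    rate = np.min(np.maximum(0.5 * np.log2(P / var_eff), 0.0))
    return B, var_eff, rate

rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2))            # arbitrary 2x2 channel
A = np.array([[1.0, 0.0], [1.0, 1.0]])     # arbitrary full-rank integer matrix
P, sigma2 = 10.0, 1.0

B, var_eff, rate = if_static(H, A, P, sigma2)
# Direct evaluation of the effective noise variance from its definition.
direct = P * np.sum((B @ H - A) ** 2, axis=1) + sigma2 * np.sum(B ** 2, axis=1)
print(np.allclose(var_eff, direct), rate)
```

Setting `A = np.eye(2)` in this sketch gives the rate of conventional MMSE equalization for the same channel.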

B. Block Fading
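In the case of block fading, a complication arises, since now each block of the transmitted matrix experiences a different channel realization [14]. While it is possible to independently equalize each block of the received matrix, the resulting integer linear transformation $\mathbf{A}$ must be the same for all blocks, in order for the rows of $\mathbf{V}$ to remain lattice codewords [15].
More precisely, for each $i$th block, let $\mathbf{A}^{(i)} \in \mathbb{Z}^{N_T \times N_T}$ and $\mathbf{B}^{(i)} \in \mathbb{R}^{N_T \times N_R}$ be its corresponding integer and equalization matrices, respectively, and compute the effective channel output for the block as
$$\mathbf{Y}_{\text{eff},(i)} = \mathbf{B}^{(i)}\mathbf{Y}^{(i)} = \mathbf{V}^{(i)} + \mathbf{Z}_{\text{eff},(i)}$$
where
$$\mathbf{V}^{(i)} = \mathbf{A}^{(i)}\mathbf{X}^{(i)}, \qquad \mathbf{Z}_{\text{eff},(i)} = (\mathbf{B}^{(i)}\mathbf{H}^{(i)} - \mathbf{A}^{(i)})\mathbf{X}^{(i)} + \mathbf{B}^{(i)}\mathbf{Z}^{(i)}.$$
It follows that
$$\mathbf{Y}_{\text{eff}} = \mathbf{V} + \mathbf{Z}_{\text{eff}}$$
where $\mathbf{Y}_{\text{eff}}$, $\mathbf{V}$ and $\mathbf{Z}_{\text{eff}}$ denote the horizontal concatenation of $\mathbf{Y}_{\text{eff},(i)}$, $\mathbf{V}^{(i)}$ and $\mathbf{Z}_{\text{eff},(i)}$, respectively, for $i = 1, \ldots, F$. However, we can see that, in general, the rows of
$$\mathbf{V} = \begin{bmatrix} \mathbf{A}^{(1)}\mathbf{X}^{(1)} & \cdots & \mathbf{A}^{(F)}\mathbf{X}^{(F)} \end{bmatrix}$$
are not guaranteed to be lattice points, since we cannot generally express $\mathbf{V}$ as an integer linear transformation of $\mathbf{X}$.
For instance, the first half of a codeword concatenated to the second half of another codeword is not guaranteed to form a codeword in a general code. Thus, for integer forcing to work over a block fading channel, we should require all $\mathbf{A}^{(i)}$ to be equal [15], so that $\mathbf{V} = \mathbf{A}\mathbf{X}$.
For a given $\mathbf{A}$, the optimal equalization matrix for each $i$th block can be obtained as
$$\mathbf{B}^{(i)} = \mathbf{A}\mathbf{H}^{(i)T} \left(\mathbf{H}^{(i)}\mathbf{H}^{(i)T} + \text{SNR}^{-1}\mathbf{I}\right)^{-1} \tag{31}$$
and the per-component variance of the vector $\mathbf{z}_{\text{eff},m,(i)}$, the $m$th row of $\mathbf{Z}_{\text{eff},(i)}$, is given as
$$\sigma^2_{\text{eff},m,(i)} = \sigma^2\, \mathbf{a}_m \mathbf{M}^{(i)} \mathbf{a}_m^T \tag{32}$$
where
$$\mathbf{M}^{(i)} = \left(\text{SNR}^{-1}\mathbf{I} + \mathbf{H}^{(i)T}\mathbf{H}^{(i)}\right)^{-1} \tag{33}$$
and $\mathbf{a}_m$ is the $m$th row of $\mathbf{A}$.
Let $\mathbf{z}_{\text{eff},m} = \begin{bmatrix} \mathbf{z}_{\text{eff},m,(1)} & \cdots & \mathbf{z}_{\text{eff},m,(F)} \end{bmatrix}$ denote the $m$th row of $\mathbf{Z}_{\text{eff}}$. Since the noise variance (32) may now be different for each block, an achievable rate expression is not immediate to obtain and may, in fact, depend on the type of decoder used [15]. Specifically, it depends on whether the decoder properly exploits the diversity inherent in the block fading channel.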
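Two decoding methods are proposed and analyzed in [15]: the Arithmetic Mean (AM) and the Geometric Mean (GM) decoders discussed below.
1) AM decoder: This decoder does not attempt to exploit the channel variation and instead treats each component of the effective noise vector $\mathbf{z}_{\text{eff},m}$ as having the same variance [15], denoted by $\sigma^2_{\text{AM},m}$. This variance is given by the arithmetic mean (hence the decoder name) of the variance of the effective noise on each block,
$$\sigma^2_{\text{AM},m} = \frac{1}{F} \sum_{i=1}^{F} \sigma^2_{\text{eff},m,(i)}.$$
It is shown in [15] that the AM-IF receiver achieves the following rate
$$R_{\text{AM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\sigma^2_{\text{AM},m}}\right) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{\text{SNR}}{\mathbf{a}_m \mathbf{M}_{\text{AM}} \mathbf{a}_m^T}\right) \tag{35}$$
where $\mathbf{M}_{\text{AM}} = \frac{1}{F} \sum_{i=1}^{F} \mathbf{M}^{(i)}$. Thus, the maximum rate achievable by this method is
$$R_{\text{AM-IF}}(\mathbf{H}^{(1:F)}) = \max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{AM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}). \tag{37}$$
Note that (35) is identical to (19) with $\mathbf{M}$ replaced by $\mathbf{M}_{\text{AM}}$. Thus, (37) can be solved in the same way as in the case of static fading [15]. In particular, the LLL algorithm (or similar) may be used to find an approximately optimal solution in polynomial time.
A special case of AM-IF consists of choosing $\mathbf{A} = \mathbf{I}$, which corresponds to conventional MMSE equalization followed by an AM decoder [15]. This scheme is referred to as AM-MMSE and its achievable rate denoted as
$$R_{\text{AM-MMSE}}(\mathbf{H}^{(1:F)}) = R_{\text{AM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{I}).$$
2) GM decoder: This decoder optimally exploits the fact that the effective noise variance is not constant across the blocks [15]. The rate achievable by this method can be understood as the average achievable rate among all the individual blocks, as if they could be treated as parallel channels (which is clearly an upper bound). More precisely, the rate achievable by GM-IF is proven in [15] to be
$$R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}) = \min_{m} \max\!\left(\frac{1}{F} \sum_{i=1}^{F} \frac{1}{2} \log\!\left(\frac{P}{\sigma^2_{\text{eff},m,(i)}}\right),\, 0\right) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\sigma^2_{\text{GM},m}}\right) \tag{39}$$
where
$$\sigma^2_{\text{GM},m} = \left(\prod_{i=1}^{F} \sigma^2_{\text{eff},m,(i)}\right)^{1/F}. \tag{41}$$
Note that (41) is the geometric mean (hence the decoder name) of the variance of the effective noise in each block. Therefore, the maximum achievable rate with GM-IF is
$$R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}) = \max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}). \tag{43}$$
It is worth pointing out that, for the same matrix $\mathbf{A}$, the GM decoder always achieves a rate at least as high as that of the AM decoder, since $\sigma^2_{\text{GM},m} \leq \sigma^2_{\text{AM},m}$ due to the AM-GM inequality [22]. However, there is currently no known efficient method to find an optimal (or approximately optimal) solution for $\mathbf{A}$ in (43) [15], making optimal GM-IF currently infeasible to implement in practice, especially as $N_T$ grows.
As before, a special case of GM-IF consists of choosing $\mathbf{A} = \mathbf{I}$, which corresponds to conventional MMSE equalization followed by a GM decoder [15]. This scheme is referred to as GM-MMSE and its achievable rate denoted as
$$R_{\text{GM-MMSE}}(\mathbf{H}^{(1:F)}) = R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{I}). \tag{44}$$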
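To make the difference between the two decoders concrete, the following sketch evaluates the AM and GM rates for a fixed integer matrix over a few randomly drawn blocks. It is only an illustration under stated assumptions: the helper names are hypothetical, the channel draws are arbitrary, and base-2 logarithms are assumed. With $\mathbf{A} = \mathbf{I}$, as used here, the two values correspond to AM-MMSE and GM-MMSE.

```python
import numpy as np

def block_variances(H_blocks, A, P, sigma2):
    """Array of shape (F, N_T): effective noise variance per block and per row of A."""
    snr = P / sigma2
    n_t = A.shape[0]
    out = []
    for H in H_blocks:
        M = np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
        out.append(sigma2 * np.einsum("mi,ij,mj->m", A, M, A))
    return np.array(out)

def log_plus(x):
    """log^+ in base 2 (the base is an assumption)."""
    return np.maximum(np.log2(x), 0.0)

def am_if_rate(var, P):
    # Arithmetic mean of the per-block variances for each row.
    return np.min(0.5 * log_plus(P / var.mean(axis=0)))

def gm_if_rate(var, P):
    # Geometric mean of the per-block variances for each row.
    var_gm = np.exp(np.log(var).mean(axis=0))
    return np.min(0.5 * log_plus(P / var_gm))

rng = np.random.default_rng(3)
H_blocks = [rng.standard_normal((2, 2)) for _ in range(4)]   # F = 4 arbitrary blocks
A = np.eye(2)
P, sigma2 = 10.0, 1.0

var = block_variances(H_blocks, A, P, sigma2)
r_am, r_gm = am_if_rate(var, P), gm_if_rate(var, P)
print(r_am, r_gm)
```

By the AM-GM inequality, the second printed value is never smaller than the first for the same `A`.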

IV. SUCCESSIVE INTEGER-FORCING
One way to improve the performance of a conventional linear receiver is to apply successive interference cancellation (SIC) [23]: after a codeword is successfully decoded, the receiver can use it as side information in order to cancel part of the interference, reducing the variance of the effective noise. This principle can be applied to integer-forcing as well [24]. Specifically, in a successive integer-forcing (SIF) receiver, each integer linear combination of codewords that is successfully decoded is used to cancel its contribution to the effective noise affecting the remaining linear combinations, potentially enabling a higher achievable rate.
Note that SIF decoding must be done sequentially, in contrast to conventional IF, which may be done in parallel. Thus, the decoding order is relevant. We assume that decoding follows the index of a m , m = 1, . . . , N T . Thus, in contrast to conventional IF, the ordering of the rows of A may have an impact on the achievable rates for SIF.
We start by reviewing SIF for static fading, followed by its extension to block fading.

A. Static Fading
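Recall the effective channel (11) described in Section III. The SIF receiver exploits the fact that the rows of $\mathbf{Z}_{\text{eff}}$ are correlated in general and starts by performing a whitening transformation.
Consider the generalized covariance matrix of $\mathbf{Z}_{\text{eff}}$, defined as
$$\mathbf{K}_{Z_{\text{eff}}} \triangleq \frac{1}{n} E\!\left[\mathbf{Z}_{\text{eff}} \mathbf{Z}_{\text{eff}}^T\right].$$
Assuming that the optimal equalization matrix $\mathbf{B}$ is used, it is possible to show that [2], [24] $\mathbf{K}_{Z_{\text{eff}}} = \sigma^2 \mathbf{A}\mathbf{M}\mathbf{A}^T$, where $\mathbf{M}$ is defined in (18).
Since $\mathbf{K}_{Z_{\text{eff}}}$ is a symmetric positive definite matrix [24], it admits a Cholesky decomposition $\mathbf{K}_{Z_{\text{eff}}} = \mathbf{L}\mathbf{L}^T$, where $\mathbf{L}$ is a lower triangular matrix with strictly positive diagonal entries.
Let $\mathbf{N} \triangleq \mathbf{L}^{-1}\mathbf{Z}_{\text{eff}}$ denote the whitened noise, with rows $\mathbf{n}_1, \ldots, \mathbf{n}_{N_T}$. Note that the generalized covariance matrix of $\mathbf{N}$ is the identity matrix [24]. The effective channel output (11) can be rewritten as
$$\mathbf{Y}_{\text{eff}} = \mathbf{V} + \mathbf{L}\mathbf{N} \tag{48}$$
where $\mathbf{V} = \mathbf{A}\mathbf{X}$. Since $\mathbf{L}$ is a lower triangular matrix, we have that
$$\mathbf{y}_{\text{eff},m} = \mathbf{v}_m + \sum_{j=1}^{m} \ell_{m,j}\, \mathbf{n}_j$$
where $\ell_{m,j}$ denotes the $(m, j)$ entry of $\mathbf{L}$. Note that $\mathbf{y}_{\text{eff},m}$ is not affected by $\mathbf{n}_{m'}$ for any $m' > m$.
The decoder acts on each row of $\mathbf{Y}_{\text{eff}}$ in a successive way. Suppose that $\mathbf{v}_1$ is successfully recovered. Then we can compute
$$\mathbf{n}_1 = \frac{1}{\ell_{1,1}} \left(\mathbf{y}_{\text{eff},1} - \mathbf{v}_1\right)$$
and remove its influence on the second row of $\mathbf{Y}_{\text{eff}}$, so that $\mathbf{v}_2$ can be decoded in the presence of less noise. Generalizing, suppose that $\mathbf{v}_1, \ldots, \mathbf{v}_{m-1}$ have been successfully recovered, providing the estimates $\mathbf{n}_1, \ldots, \mathbf{n}_{m-1}$. We can remove the influence of this noise to obtain
$$\mathbf{y}_m = \mathbf{y}_{\text{eff},m} - \sum_{j=1}^{m-1} \ell_{m,j}\, \mathbf{n}_j = \mathbf{v}_m + \ell_{m,m}\, \mathbf{n}_m$$
so that $\mathbf{v}_m$ can be decoded under noise of variance $\ell^2_{m,m}$. Then, the corresponding noise vector can be estimated as
$$\mathbf{n}_m = \frac{1}{\ell_{m,m}} \left(\mathbf{y}_m - \mathbf{v}_m\right).$$
Proceeding this way, it can be shown that the following rate is achievable [24]
$$R_{\text{SIF}}(\mathbf{H}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\ell^2_{m,m}}\right).$$
By choosing the optimal $\mathbf{A}$, the maximum achievable rate is
$$R_{\text{SIF}}(\mathbf{H}) = \max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{SIF}}(\mathbf{H}, \mathbf{A}). \tag{54}$$
However, finding the optimal matrix for the SIF decoder is a different (and harder) problem than that for the IF decoder. In particular, each row permutation of $\mathbf{A}$ may give a different achievable rate. As shown in [24], it is possible to restrict the choice of $\mathbf{A}$ to the class of unimodular matrices and the optimal solution is obtained by finding a Korkin-Zolotarev (KZ) basis for the lattice generated by $\mathbf{G} \in \mathbb{R}^{N_T \times N_T}$, obtained from the Cholesky decomposition of $\mathbf{M} = \mathbf{G}\mathbf{G}^T$.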
Finding a KZ basis for a lattice involves finding a shortest lattice vector and is therefore an NP-hard problem [20]. Suboptimal algorithms can be used, for example, applying the LLL algorithm N T successive times, where in each iteration the dimension of the underlying lattice decreases [20].
Note that it is possible to choose $\mathbf{A} = \pi(\mathbf{I})$, where $\pi(\mathbf{I})$ denotes a row permutation of the identity matrix. In this case, the method reduces to conventional SIC decoding, which is referred to as MMSE-SIC. In principle, all possible permutations could be tested; however, this quickly becomes unattractive as the number of users increases. Heuristic methods [23] can be applied to find a good decoding order, for instance, decoding first the user with the highest SNR (i.e., lowest $\sigma^2_{\text{eff},m}$), at the expense of some performance degradation.
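The whitening and successive cancellation steps of this subsection can be sketched numerically as follows. This is an illustrative sketch rather than the authors' implementation: the helper names are hypothetical, the rows of `V` are a toy integer stand-in for decoded lattice codewords, every decoding step is assumed to succeed, and base-2 logarithms are assumed.

```python
import numpy as np

def sif_static(H, A, P, sigma2):
    """Return (K, L, rate): noise covariance, its Cholesky factor and the SIF rate."""
    snr = P / sigma2
    n_t = H.shape[1]
    M = np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
    K = sigma2 * A @ M @ A.T
    L = np.linalg.cholesky(K)   # lower triangular with positive diagonal
    rate = np.min(np.maximum(0.5 * np.log2(P / np.diag(L) ** 2), 0.0))
    return K, L, rate

def cancel_noise(Y_eff, V, L):
    """Row-by-row noise cancellation, assuming each v_m is decoded correctly."""
    N_hat = np.zeros_like(Y_eff)
    Y_red = np.zeros_like(Y_eff)
    for m in range(Y_eff.shape[0]):
        # Remove the contribution of the previously estimated noise rows.
        Y_red[m] = Y_eff[m] - L[m, :m] @ N_hat[:m]
        # After decoding v_m, estimate the whitened noise row n_m.
        N_hat[m] = (Y_red[m] - V[m]) / L[m, m]
    return Y_red, N_hat

rng = np.random.default_rng(2)
H = rng.standard_normal((2, 2))
A = np.array([[1.0, 1.0], [0.0, 1.0]])
K, L, rate = sif_static(H, A, P=10.0, sigma2=1.0)

V = rng.integers(-3, 4, size=(2, 6)).astype(float)   # toy stand-in for A X
N = rng.standard_normal((2, 6))                      # whitened noise
Y_eff = V + L @ N
Y_red, N_hat = cancel_noise(Y_eff, V, L)
print(np.allclose(N_hat, N), rate)
```

Since each squared diagonal entry of `L` is at most the corresponding diagonal entry of `K`, the SIF rate for a given `A` is never below the parallel IF rate for the same `A`.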

B. Block Fading
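As in the case of parallel IF, for successive IF in the block fading scenario it is required that $\mathbf{A}$ remain the same for all blocks so that the rows of $\mathbf{V}$ in (48) are still lattice points and lattice decoding and subsequent inversion is possible. All the other steps are the same as for static fading applied separately to each $i$th block, namely: equalization by $\mathbf{B}^{(i)}$ given in (31), Cholesky decomposition of
$$\mathbf{K}_{Z_{\text{eff},(i)}} = \sigma^2 \mathbf{A}\mathbf{M}^{(i)}\mathbf{A}^T = \mathbf{L}^{(i)}\mathbf{L}^{(i)T}$$
where $\mathbf{M}^{(i)}$ is defined in (33), and successive noise cancellation and estimation from
$$\mathbf{y}_{\text{eff},m,(i)} = \mathbf{v}_{m,(i)} + \sum_{j=1}^{m} \ell_{m,j,(i)}\, \mathbf{n}_{j,(i)}.$$
Note that the effective channel after cancellation can be expressed more simply as
$$\mathbf{y}_m = \mathbf{v}_m + \mathbf{z}_m$$
where $\mathbf{y}_m$ and $\mathbf{z}_m$ are the horizontal concatenation of $\mathbf{y}_{m,(i)}$ and $\mathbf{z}_{m,(i)} = \ell_{m,m,(i)}\, \mathbf{n}_{m,(i)}$, respectively, for $i = 1, \ldots, F$. In particular, each $i$th block of the reduced effective noise $\mathbf{z}_m$ has a possibly different variance $\ell^2_{m,m,(i)}$. For the lattice decoding step, either of the two decoding methods discussed before, namely AM and GM decoding, may be used [15]. Their generalization to the case of block fading is straightforward.
1) AM decoder: This decoder treats $\mathbf{z}_m$ as white noise of variance
$$\sigma^2_{\text{AM-SIF},m} = \frac{1}{F} \sum_{i=1}^{F} \ell^2_{m,m,(i)}. \tag{60}$$
It follows that the achievable rate of AM-SIF is given by
$$R_{\text{AM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\sigma^2_{\text{AM-SIF},m}}\right).$$
However, there is currently no known efficient method to find an optimal or approximately optimal choice of $\mathbf{A}$, which should be chosen to minimize (60). Note that, if we choose $\mathbf{A} = \pi(\mathbf{I})$, then the method reduces to conventional SIC with an AM decoder. This method is referred to as AM-SIC and its achievable rate is given by
$$R_{\text{AM-SIC}}(\mathbf{H}^{(1:F)}) = \max_{\pi} R_{\text{AM-SIF}}(\mathbf{H}^{(1:F)}, \pi(\mathbf{I})).$$
2) GM decoder: As before, this decoder attempts to optimally exploit the variation of the noise statistics across blocks.
The achievable rate for GM-SIF can be computed as the average achievable rate across all blocks, given by [15]
$$R_{\text{GM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\sigma^2_{\text{GM-SIF},m}}\right), \qquad \sigma^2_{\text{GM-SIF},m} = \left(\prod_{i=1}^{F} \ell^2_{m,m,(i)}\right)^{1/F}.$$
Note that, due to the AM-GM inequality, for the same matrix $\mathbf{A}$, the GM decoder achieves a rate at least as high as that of the AM decoder. The maximum achievable rate for GM-SIF is then given by
$$R_{\text{GM-SIF}}(\mathbf{H}^{(1:F)}) = \max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{GM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A}).$$
However, there is currently no known efficient method to find an optimal or approximately optimal solution for $\mathbf{A}$, making optimal GM-SIF infeasible to implement in practice. Even an exhaustive search is more costly for GM-SIF than for GM-IF, since now all row permutations of the same $\mathbf{A}$ must be considered.
As a special case, it is always possible to choose $\mathbf{A} = \pi(\mathbf{I})$, which corresponds to conventional SIC with a GM decoder. This method is referred to as GM-SIC and its achievable rate is given by
$$R_{\text{GM-SIC}}(\mathbf{H}^{(1:F)}) = \max_{\pi} R_{\text{GM-SIF}}(\mathbf{H}^{(1:F)}, \pi(\mathbf{I})). \tag{68}$$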
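The block-fading SIF rates can be evaluated in the same spirit as in the parallel case, replacing the per-block effective noise variances by the squared Cholesky diagonals. The sketch below is illustrative: helper names and channel draws are hypothetical, and base-2 logarithms are assumed. It also evaluates a row-permuted copy of the same matrix, since the decoding order can affect the result.

```python
import numpy as np

def sif_block_variances(H_blocks, A, P, sigma2):
    """(F, N_T) array of reduced noise variances l_{m,m,(i)}^2, one row per block."""
    snr = P / sigma2
    n_t = A.shape[0]
    out = []
    for H in H_blocks:
        M = np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
        L = np.linalg.cholesky(sigma2 * A @ M @ A.T)
        out.append(np.diag(L) ** 2)
    return np.array(out)

def rate_from_variance(var_per_row, P):
    """min over rows of 0.5 * log2^+(P / variance); base 2 is an assumption."""
    return np.min(np.maximum(0.5 * np.log2(P / var_per_row), 0.0))

def am_sif_rate(var, P):
    return rate_from_variance(var.mean(axis=0), P)

def gm_sif_rate(var, P):
    return rate_from_variance(np.exp(np.log(var).mean(axis=0)), P)

rng = np.random.default_rng(6)
H_blocks = [rng.standard_normal((2, 2)) for _ in range(3)]   # F = 3 arbitrary blocks
A = np.array([[1.0, 1.0], [0.0, 1.0]])
P, sigma2 = 10.0, 1.0

var = sif_block_variances(H_blocks, A, P, sigma2)
r_am, r_gm = am_sif_rate(var, P), gm_sif_rate(var, P)

# Same rows in the opposite decoding order.
var_perm = sif_block_variances(H_blocks, A[::-1], P, sigma2)
print(r_am, r_gm, gm_sif_rate(var_perm, P))
```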

V. PROPOSED METHODS
Although AM-SIF, GM-IF and GM-SIF all have higher achievable rates than AM-IF, the fact that no efficient algorithm is known to find even an approximately optimal A can undermine the practical applicability of these methods.
In this section, we propose four suboptimal, low-complexity methods for choosing the integer matrix A. The first two are applicable to GM-IF, the third is applicable to AM-SIF, while the fourth applies to GM-SIF.

A. Proposed Method 1 (GM-IF)
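Let
$$\mathbf{A}_{\text{AM-IF}} = \arg\max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{AM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A})$$
be the optimal matrix $\mathbf{A}$ for AM-IF. We propose to use either this matrix or the identity matrix, depending on which one gives the higher rate under GM-IF decoding. Let $\mathcal{A}_1 = \{\mathbf{I}, \mathbf{A}_{\text{AM-IF}}\}$. The rate achievable by this method is given by
$$\max_{\mathbf{A} \in \mathcal{A}_1} R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}).$$
Note that, if matrix $\mathbf{A}_{\text{AM-IF}}$ is chosen, then a rate at least as high as that of AM-IF is achieved, since for the same choice of $\mathbf{A}$, the GM decoder always performs at least as well as the AM decoder. On the other hand, if the identity matrix is chosen, then the proposed method becomes the same as GM-MMSE and, therefore, achieves the same rate. Thus, the proposed method achieves rates at least as high as those of both GM-MMSE and AM-IF.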
The complexity of this method is dominated by that of finding an optimal matrix for AM-IF, which can be approximated in polynomial time with the LLL algorithm. Therefore, the complexity is the same as that of AM-IF.

B. Proposed Method 2 (GM-IF)
In addition to the choices discussed above, we propose to test also the optimal matrix $\mathbf{A}^{(i)}$ for each $i$th block that would be obtained with IF under static fading, namely
$$\mathbf{A}_{\text{IF},(i)} = \arg\max_{\mathbf{A}^{(i)}} R_{\text{IF}}(\mathbf{H}^{(i)}, \mathbf{A}^{(i)}), \quad i = 1, \ldots, F. \tag{71}$$
Let $\mathcal{A}_2 = \{\mathbf{I}, \mathbf{A}_{\text{AM-IF}}, \mathbf{A}_{\text{IF},(1)}, \ldots, \mathbf{A}_{\text{IF},(F)}\}$. The rate achievable by this method is given by
$$\max_{\mathbf{A} \in \mathcal{A}_2} R_{\text{GM-IF}}(\mathbf{H}^{(1:F)}, \mathbf{A}). \tag{72}$$
Note that Proposed Method 1 chooses a matrix which may be "reasonably good" for all blocks simultaneously but which is not necessarily optimal for any block. Proposed Method 2 expands this choice by including matrices which are optimal for at least one block, even if they are worse for the other blocks. A reasoning behind this approach is that, in contrast to the AM decoder, the GM decoder is not limited by the performance of the worst block, since individual rates are added in (39).
The complexity of Proposed Method 2 is higher than that of Proposed Method 1 since it is necessary to run the LLL algorithm $F + 1$ times. Since the LLL algorithm runs in polynomial time, this proposed method is still viable in practice, especially for small $F$.
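The candidate-set idea behind Proposed Methods 1 and 2 can be sketched as below. One important assumption: instead of the LLL algorithm, the sketch uses a brute-force search over a small box of integer vectors (the hypothetical helper `shortest_independent`), which is only practical for very small $N_T$ and serves purely as a stand-in. All function names are illustrative, and base-2 logarithms are assumed.

```python
import itertools
import numpy as np

def log_plus(x):
    return np.maximum(np.log2(x), 0.0)

def quad(A, M):
    """Row-wise quadratic forms a_m M a_m^T."""
    return np.einsum("mi,ij,mj->m", A, M, A)

def am_rate(Ms, A, snr):
    return np.min(0.5 * log_plus(snr / quad(A, np.mean(Ms, axis=0))))

def gm_rate(Ms, A, snr):
    logs = np.mean([np.log(quad(A, M)) for M in Ms], axis=0)
    return np.min(0.5 * log_plus(snr / np.exp(logs)))

def shortest_independent(M, radius=2):
    """Brute-force stand-in for LLL: greedily pick the shortest linearly
    independent integer vectors (w.r.t. a M a^T) with entries in [-radius, radius]."""
    n_t = M.shape[0]
    box = itertools.product(range(-radius, radius + 1), repeat=n_t)
    cands = sorted((np.array(a, dtype=float) for a in box if any(a)),
                   key=lambda a: a @ M @ a)
    rows = []
    for a in cands:
        if np.linalg.matrix_rank(np.array(rows + [a])) == len(rows) + 1:
            rows.append(a)
        if len(rows) == n_t:
            break
    return np.array(rows)

def proposed_method_1(Ms, snr):
    n_t = Ms[0].shape[0]
    cands = [np.eye(n_t), shortest_independent(np.mean(Ms, axis=0))]
    return max(cands, key=lambda A: gm_rate(Ms, A, snr))

def proposed_method_2(Ms, snr):
    n_t = Ms[0].shape[0]
    cands = [np.eye(n_t), shortest_independent(np.mean(Ms, axis=0))]
    cands += [shortest_independent(M) for M in Ms]   # per-block candidates
    return max(cands, key=lambda A: gm_rate(Ms, A, snr))

rng = np.random.default_rng(7)
snr, n_t, F = 10.0, 2, 2
Ms = [np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
      for H in (rng.standard_normal((n_t, n_t)) for _ in range(F))]

A_am = shortest_independent(np.mean(Ms, axis=0))
r1 = gm_rate(Ms, proposed_method_1(Ms, snr), snr)
r2 = gm_rate(Ms, proposed_method_2(Ms, snr), snr)
print(r1, r2)
```

Because the candidate set of Method 2 contains that of Method 1, the second value is never smaller than the first; how close either comes to the optimum depends on the channel and is not established by this sketch.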

C. Proposed Method 3 (AM-SIF)
Recall that the AM-SIF receiver correctly takes into account the fact that the effective noise matrix has a different generalized covariance matrix $\mathbf{K}_{Z_{\text{eff},(i)}}$ for each block, using each corresponding $\mathbf{L}^{(i)}$ for noise cancellation; only the lattice decoding step treats the reduced effective noise $\mathbf{z}_m$ as having equal variance $\sigma^2_{\text{AM-SIF},m}$ across blocks. An upper bound on this variance can be obtained by treating each block of the effective noise matrix $\mathbf{Z}_{\text{eff}}$ as having the same generalized covariance matrix, given by
$$\mathbf{K}_{Z_{\text{eff}}} = \frac{1}{F} \sum_{i=1}^{F} \mathbf{K}_{Z_{\text{eff},(i)}} = \sigma^2 \mathbf{A}\mathbf{M}_{\text{AM}}\mathbf{A}^T.$$
From this point on, we can proceed similarly to the case of static fading (Section IV-A), obtaining a reduced effective noise $\mathbf{z}_m$ with variance $\ell^2_{m,m}$, where $\mathbf{L}$ is a lower triangular matrix with positive diagonal entries given by the Cholesky decomposition $\mathbf{K}_{Z_{\text{eff}}} = \mathbf{L}\mathbf{L}^T$. It follows that $\sigma^2_{\text{AM-SIF},m} \leq \ell^2_{m,m}$, since a suboptimal noise cancellation scheme is used.
To be clear, the scheme described above uses block IF equalization, followed by static noise cancellation (SNC) and AM decoding, in contrast to AM-SIF, which uses block noise cancellation. To distinguish it from AM-SIF, we refer to this scheme as AM-SIF-SNC. Its achievable rate is then given by
$$R_{\text{AM-SIF-SNC}}(\mathbf{H}^{(1:F)}, \mathbf{A}) = \min_{m} \frac{1}{2} \log^+\!\left(\frac{P}{\ell^2_{m,m}}\right)$$
which is a lower bound on $R_{\text{AM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A})$. Let
$$\mathbf{A}_{\text{AM-SIF-SNC}} = \arg\max_{\mathbf{A} :\, \operatorname{rank}(\mathbf{A}) = N_T} R_{\text{AM-SIF-SNC}}(\mathbf{H}^{(1:F)}, \mathbf{A}) \tag{76}$$
be the optimal matrix $\mathbf{A}$ for AM-SIF-SNC. We propose to use this matrix for AM-SIF. The rate achievable by this method is given by
$$R_{\text{AM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A}_{\text{AM-SIF-SNC}}).$$
Note that optimizing (76) is exactly the same problem as optimizing (54). Thus, as discussed in Section IV-A, $\mathbf{A}_{\text{AM-SIF-SNC}}$ can be computed by KZ reduction of the lattice generated by $\mathbf{G} \in \mathbb{R}^{N_T \times N_T}$, obtained from the Cholesky decomposition of $\mathbf{M}_{\text{AM}} = \mathbf{G}\mathbf{G}^T$, and this procedure can be approximated by applying the LLL algorithm $N_T$ times.
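The inequality $\sigma^2_{\text{AM-SIF},m} \leq \ell^2_{m,m}$ that underlies Proposed Method 3 can be checked numerically. The sketch below, with arbitrary channel draws and an arbitrary integer matrix (all values are illustrative assumptions), compares the average of the per-block squared Cholesky diagonals with the squared Cholesky diagonal of the averaged covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
P, sigma2, F, n_t = 10.0, 1.0, 3, 2
snr = P / sigma2
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # arbitrary full-rank integer matrix

# Per-block generalized covariance matrices K_(i) = sigma^2 A M^(i) A^T.
Ks = []
for _ in range(F):
    H = rng.standard_normal((n_t, n_t))
    M = np.linalg.inv(np.eye(n_t) / snr + H.T @ H)
    Ks.append(sigma2 * A @ M @ A.T)

# AM-SIF: block noise cancellation, then average of the reduced variances.
var_block = np.mean([np.diag(np.linalg.cholesky(K)) ** 2 for K in Ks], axis=0)
# AM-SIF-SNC: static noise cancellation based on the averaged covariance.
var_snc = np.diag(np.linalg.cholesky(np.mean(Ks, axis=0))) ** 2

rate_am_sif = np.min(np.maximum(0.5 * np.log2(P / var_block), 0.0))
rate_snc = np.min(np.maximum(0.5 * np.log2(P / var_snc), 0.0))
print(var_block, var_snc, rate_snc <= rate_am_sif + 1e-12)
```

The first entries of the two variance vectors coincide, since no cancellation takes place for the first decoded row.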

D. Proposed Method 4 (GM-SIF)
We can extend the same ideas of Proposed Method 2 to the case of successive decoding, simply by redefining (71) and (72) with their SIF counterparts, while using $\mathbf{A}_{\text{AM-SIF-SNC}}$ instead of $\mathbf{A}_{\text{AM-SIF}}$, as in Proposed Method 3. However, since the decoding order is important, in principle we would have to test all permutations of the identity matrix, but the number of permutations grows factorially with the number of users. To avoid this complexity, we simply exclude the identity matrix (and all its permutations) from the set of possible choices for $\mathbf{A}$. Let
$$\mathcal{A}_4 = \{\mathbf{A}_{\text{AM-SIF-SNC}}, \mathbf{A}_{\text{SIF},(1)}, \ldots, \mathbf{A}_{\text{SIF},(F)}\}$$
where
$$\mathbf{A}_{\text{SIF},(i)} = \arg\max_{\mathbf{A}} R_{\text{SIF}}(\mathbf{H}^{(i)}, \mathbf{A})$$
is the optimal matrix for the $i$th block that would be obtained with SIF under static fading. The rate achievable by this method is given by
$$\max_{\mathbf{A} \in \mathcal{A}_4} R_{\text{GM-SIF}}(\mathbf{H}^{(1:F)}, \mathbf{A}).$$
Similarly to Proposed Method 2, the complexity of this method grows linearly with the number of blocks. However, finding $\mathbf{A}_{\text{AM-SIF-SNC}}$ as well as each $\mathbf{A}_{\text{SIF},(i)}$ requires running the LLL algorithm $N_T$ times, for a total of $N_T(F + 1)$ runs.

E. Summary of Complexity
Table I shows the complexity of finding $\mathbf{A}$ for each method discussed. For AM/GM-MMSE methods, since $\mathbf{A} = \mathbf{I}$ is known a priori, the complexity is negligible. The AM-IF and Proposed Method 1 have the same complexity, corresponding to a single run of the LLL algorithm [9]. Note that, for fixed $F$, Proposed Method 2 approaches the complexity of AM-IF and Proposed Method 1. For the successive scenario, the optimal choice of $\mathbf{A}$ in AM/GM-SIC is found by testing all permutations of the identity matrix. Proposed Method 3 and Proposed Method 4 have complexity similar to that of Proposed Methods 1 and 2, respectively. Finally, the complexity of optimal GM/AM-SIF by exhaustive search follows from the bound in [2] (see also [2], [9]).
Note that, besides finding $\mathbf{A}$, the receiver operation includes also the tasks of equalization and channel decoding, whose complexity scales with the blocklength $n$ and is identical for each method in the same category (parallel or successive). Thus, the task of finding $\mathbf{A}$ tends to take a smaller fraction of the overall decoding time as $n$ grows.

VI. IMPLEMENTATION WITH PRACTICAL CODES
All the achievable rate results discussed above assume the use of lattice codes of asymptotically high dimension over an asymptotically large constellation. However, in practice, a finite-length code and a finite-order modulation must be used. In this section, we discuss how practical lattice encoders and decoders for an IF receiver may be implemented with low complexity. We focus on the use of binary LDPC codes with 2-PAM modulation.

A. Encoding and Decoding
We start with conventional IF; the extension to successive IF is straightforward. To simplify the description, with a slight abuse of notation, we consider the finite field of size 2, denoted $\mathbb{Z}_2$, as a subset of the integers, $\mathbb{Z}_2 = \{0, 1\} \subseteq \mathbb{Z}$. Suppose the $\ell$th user encodes its message as a codeword $\mathbf{c}_\ell \in \mathcal{C}$ from a linear $(n, k)$ block code $\mathcal{C}$ over $\mathbb{Z}_2$. The codeword $\mathbf{c}_\ell$ is then modulated into a vector $\mathbf{x}_\ell \in \{-\frac{1}{2}, \frac{1}{2}\}^n$ from a 2-PAM constellation, computed as
$$\mathbf{x}_\ell = (\mathbf{c}_\ell + \mathbf{d}_\ell) \bmod 2$$
where $\mathbf{d}_\ell \in \{-\frac{1}{2}, \frac{1}{2}\}^n$ is a (discrete) dither vector independent from $\mathbf{x}_\ell$ and known to the receiver, and the $x \bmod 2$ operation is applied element-wise and assumed to return a real-valued number in the interval $(-1, 1]$. Note that the dither vector must be used in order to reduce the transmit power from the $\{0, 1\}$ constellation to the $\{-\frac{1}{2}, \frac{1}{2}\}$ constellation, and it could be interpreted more simply as a modulation map. In this case, a simple choice would be $\mathbf{d}_\ell = (-\frac{1}{2}, \ldots, -\frac{1}{2})$. However, using a random dither uniformly distributed over $\{-\frac{1}{2}, \frac{1}{2}\}^n$ is more convenient for our purposes since, as we shall see, it makes the error probability independent from the transmitted codeword.
As a consequence of the use of dithers, the transmitted vector $\mathbf{x}_\ell$ is not a lattice point anymore, in contrast to the description in Sections III and IV. However, dithers can be easily removed at the receiver with a simple modification, re-enabling the results of those sections. More precisely, let $\mathbf{C}$ and $\mathbf{D}$ be matrices whose $\ell$th row is $\mathbf{c}_\ell$ and $\mathbf{d}_\ell$, respectively. The receiver computes the effective channel output as
$$\mathbf{Y}_{\text{eff}} = (\mathbf{B}\mathbf{Y} - \mathbf{A}\mathbf{D}) \bmod 2 = (\mathbf{V} + \mathbf{Z}_{\text{eff}}) \bmod 2$$
where $\mathbf{Z}_{\text{eff}}$ is defined as in Section III and $\mathbf{V} = \mathbf{A}\mathbf{C} \bmod 2$.
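The dithered modulation and the dither removal at the receiver can be illustrated with a small noiseless example. This is a sketch under stated assumptions: the rows of `C` are random bits rather than codewords of an actual code, the equalizer output is replaced by its ideal value $\mathbf{A}\mathbf{X}$ (zero effective noise), and `mod2` is a hypothetical helper implementing reduction into $(-1, 1]$.

```python
import numpy as np

def mod2(x):
    """Element-wise reduction modulo 2 into the interval (-1, 1]."""
    return 1.0 - np.mod(1.0 - x, 2.0)

rng = np.random.default_rng(5)
n_t, n = 2, 8
C = rng.integers(0, 2, size=(n_t, n))         # stand-in bits (not from a real code)
D = rng.choice([-0.5, 0.5], size=(n_t, n))    # random dither, known to the receiver
X = mod2(C + D)                               # transmitted 2-PAM symbols

A = np.array([[1, 1], [0, 1]])                # arbitrary full-rank integer matrix
BY = A @ X                                    # ideal equalizer output (no effective noise)
Y_eff = mod2(BY - A @ D)                      # dither removal followed by mod 2
V = (A @ C) % 2                               # binary linear combinations of the rows of C

print(np.all(np.abs(X) == 0.5), np.allclose(Y_eff, V))
```

In the noiseless case the effective output coincides with $\mathbf{A}\mathbf{C} \bmod 2$, whose rows are binary linear combinations of the original rows and therefore codewords whenever the rows of $\mathbf{C}$ come from a linear code.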
Note that each row $v_m$ of $V$ is a codeword from $\mathcal{C}$. Thus, we recover the same effective channel (11), except for the mod-2 operation.
Fig. 2 shows the exact LLR in comparison with its approximation for a channel with SNR = 5 dB. As we can see, for most of the input range the approximation is indistinguishable from the exact value. The approximation is slightly less accurate when the input is close to an integer, but since these inputs correspond to the peak LLR values, i.e., when there is a high degree of certainty about the value of $v_m[j]$, this loss of accuracy should not degrade the performance of belief propagation.
The above decoding procedure can be easily extended to successive IF by replacing $y_{\mathrm{eff},m}$ with $y_m$, $z_{\mathrm{eff},m}$ with $z_m$, and $\sigma^2_{\mathrm{eff},m}[j]$ with the expression in (90).
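To illustrate how an LLR for a mod-2 channel and a cheaper approximation can compare, the sketch below assumes a scalar model $y = (v + z) \bmod 2$ with $v \in \{0, 1\}$ and Gaussian noise of variance `sigma2`. The exact LLR sums over all integer shifts (truncated at a hypothetical `k_max`), while the approximation is a max-log rule that keeps only the nearest even and nearest odd integer. The max-log rule is an assumption chosen for illustration and is not necessarily the approximation used in the paper. The value of `sigma2` is likewise hypothetical.

```python
import numpy as np

def llr_exact(y, sigma2, k_max=10):
    """log P(v=0 | y) / P(v=1 | y) for y = (v + z) mod 2, z ~ N(0, sigma2)."""
    y = np.asarray(y, dtype=float)
    # Integer shifts k, shaped so that they broadcast against y.
    k = np.arange(-k_max, k_max + 1).reshape(-1, *([1] * y.ndim))
    log_p0 = np.logaddexp.reduce(-(y - 2 * k) ** 2 / (2 * sigma2), axis=0)
    log_p1 = np.logaddexp.reduce(-(y - 1 - 2 * k) ** 2 / (2 * sigma2), axis=0)
    return log_p0 - log_p1

def llr_maxlog(y, sigma2):
    """Max-log approximation: keep only the nearest even and nearest odd integer."""
    y = np.asarray(y, dtype=float)
    # Distance from y to the nearest even integer, a value in [0, 1].
    r = np.abs(1.0 - np.mod(1.0 - y, 2.0))
    # The nearest odd integer is at distance 1 - r, so (1-r)^2 - r^2 = 1 - 2r.
    return (1.0 - 2.0 * r) / (2.0 * sigma2)

sigma2 = 0.1                      # hypothetical noise variance
y = np.linspace(-1.0, 1.0, 201)
gap = np.max(np.abs(llr_exact(y, sigma2) - llr_maxlog(y, sigma2)))
```

Under these assumptions the largest gap between the two curves occurs at integer inputs and is close to $\ln 2$, in line with the observation that accuracy is lost only near the peak LLR values.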

B. Code Construction
In practice, approaching the GM performance (be it for MMSE, SIC, IF or SIF reception) requires not only a suitable decoder but also well-designed codes that allow the decoder to exploit diversity. This issue is not apparent in [15] since their achievable rate results (even for AM) are based on asymptotically good lattices that are already optimal for exploiting diversity. Designing such codes under finite-length and low-complexity constraints, however, is far from trivial.
It is well known that an important parameter characterizing the performance of a code for a fading channel is its diversity order, defined as [25]
$$d \triangleq -\lim_{\mathrm{SNR} \to \infty} \frac{\log P_e}{\log \mathrm{SNR}}$$
where $P_e$ is the error probability of the decoder. For a block Rayleigh fading channel, the diversity order of a $q$-ary code is known to satisfy a Singleton-like bound [25]
$$d \le 1 + \left\lfloor F \left( 1 - \frac{R}{\log_2 q} \right) \right\rfloor$$
where $\lfloor \cdot \rfloor$ is the floor function and $R$ is the code rate in bits per channel use. Thus, codes that achieve full diversity ($d = F$) are limited by $R \le (\log_2 q)/F$, or $R \le 1/F$ for binary codes. A family of rate-$1/F$ binary LDPC codes that achieve full diversity under belief propagation and have performance close to theoretical limits are the so-called root LDPC codes [26]. These codes are systematic, with the information bits corresponding to the first $n/F^2$ positions of each block, and have a parity-check matrix $H \in \mathbb{Z}_2^{(F-1)n/F \times n}$ with a block structure, specified for all $1 \le i, j \le F$ and all $1 \le k \le F - 1$, which implies that, for each information bit from each $i$th block, there is one parity-check equation relating it to the bits from the $j$th block (and no other blocks), for all $j \ne i$. It is worth mentioning that root LDPC codes under belief propagation guarantee full diversity only over the information bits. Thus, if one is interested in recovering the entire codeword, it must be regenerated by re-encoding the information bits after decoding.
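The rate limitation implied by the Singleton-like bound can be tabulated with a few lines of code. The sketch below assumes the bound has the standard form $d \le 1 + \lfloor F(1 - R/\log_2 q) \rfloor$ and, to keep the arithmetic exact, restricts $q$ to powers of two and takes the rate as a rational number. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def _bits_per_symbol(q):
    """Return log2(q) for q a power of two (restriction made for exact arithmetic)."""
    bits = q.bit_length() - 1
    if q < 2 or (1 << bits) != q:
        raise ValueError("q must be a power of two in this sketch")
    return bits

def singleton_diversity_bound(F, R, q=2):
    """Upper bound on the diversity order of a q-ary code of rate R over F blocks."""
    R = Fraction(R)
    bound = 1 + floor(F * (1 - R / _bits_per_symbol(q)))
    # The diversity order can never exceed the number of fading blocks.
    return min(F, bound)

def max_full_diversity_rate(F, q=2):
    """Largest rate (bits per channel use) compatible with full diversity d = F."""
    return Fraction(_bits_per_symbol(q), F)

# Example: a binary rate-1/2 code can have full diversity for F = 2 but not for F = 4.
d_F2 = singleton_diversity_bound(2, Fraction(1, 2))
d_F4 = singleton_diversity_bound(4, Fraction(1, 2))
```

For instance, with $F = 4$ a binary code must have rate at most $1/4$ to reach $d = 4$, which is the rate of the root LDPC codes considered here.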

VII. SIMULATION RESULTS
In this section we present simulation results comparing the outage rate performance of our proposed methods with the optimal performance obtained by exhaustive search. For comparison, we include the performance of AM-IF (previously the only low-complexity IF receiver), as well as that of ML and conventional linear receivers. In our simulations, we specify an outage probability $\rho = 0.01$, estimated over $10^4$ channel realizations. In each realization, the channel fading coefficients are drawn independently from a real-valued Gaussian distribution with zero mean and unit variance.
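The outage rate estimation described above can be sketched as follows. The sketch takes the $\rho$-outage rate to be the largest rate $R$ such that the fraction of realizations with achievable rate below $R$ does not exceed $\rho$, which is an assumed but common definition. The rate expression `sum_rate_per_user` is only a stand-in used to produce numbers. It is not one of the receivers compared in this paper, and the SNR value is hypothetical.

```python
import math
import numpy as np

def outage_rate(rates, rho):
    """Empirical rho-outage rate: largest R with P(rate < R) <= rho."""
    r = np.sort(np.asarray(rates, dtype=float))
    # At most floor(rho * trials) realizations may fall strictly below R.
    k = int(math.floor(rho * r.size + 1e-9))
    return float(r[min(k, r.size - 1)])

def sample_channels(trials, F, n_r, n_t, rng):
    """Real-valued i.i.d. N(0, 1) fading coefficients for F blocks per realization."""
    return rng.standard_normal((trials, F, n_r, n_t))

def sum_rate_per_user(H, snr):
    """Stand-in rate (bits/dim): block-averaged (1/2) log2 det(I + snr H^T H) / n_t."""
    n_t = H.shape[-1]
    G = np.eye(n_t) + snr * np.swapaxes(H, -1, -2) @ H
    _, logdet = np.linalg.slogdet(G)          # natural log, shape (trials, F)
    return logdet.mean(axis=-1) / (2 * n_t * math.log(2))

rng = np.random.default_rng(0)
H = sample_channels(trials=10**4, F=2, n_r=2, n_t=2, rng=rng)
snr = 10 ** (10 / 10)                         # hypothetical SNR of 10 dB
rates = sum_rate_per_user(H, snr)
r_out = outage_rate(rates, rho=0.01)
```

Replacing `sum_rate_per_user` with the achievable rate of a given receiver, evaluated per realization, would yield one point of an outage rate curve.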
For the optimal GM-IF, AM-SIF and GM-SIF, the matrix $A$ was obtained by an exhaustive search over all matrices in which the $\ell_1$-norm of each row does not exceed 15. Fig. 3 shows the outage rate for all these receivers on a 2 × 2 channel with $F = 2$ blocks. As can be seen, Proposed Methods 1 and 2 achieve performance close to optimal GM-IF and strictly higher than the maximum between AM-IF and GM-MMSE. In particular, for an outage rate of 1.5 bits/dim, the performance of Proposed Method 2 is within 1.8 dB of optimal GM-IF and outperforms Proposed Method 1 by 2.4 dB. On the other hand, Proposed Methods 3 and 4 appear to have performance indistinguishable from optimal AM-SIF and optimal GM-SIF, respectively. Note that Proposed Method 4 outperforms GM-SIC by approximately 3.2 dB for an outage rate of 2 bits/dim, while AM-SIF has a much lower performance in this case.
Fig. 4 shows the outage rate for a scenario with $F = 2$ blocks and SNR = 25 dB, varying the number of users $N_T$, while assuming the same number of receive antennas, $N_R = N_T$. Due to the complexity of exhaustive search, which grows exponentially with $N_T$, the performance of optimal GM-IF is shown only for $N_T = 2$ and $N_T = 3$, while that of optimal AM-SIF and GM-SIF is shown only for $N_T = 2$. As can be seen, as $N_T$ increases, the performance of both Proposed Methods 1 and 2 appears to converge and is approached by that of AM-IF. Similarly, the performance of Proposed Method 3 significantly improves as $N_T$ increases, outperforming GM-SIC for $N_T \ge 4$ and approaching that of Proposed Method 4. Nevertheless, Proposed Method 4 still outperforms all other methods by a visible margin.
Fig. 5 considers the same scenario as Fig. 4, but with $F = 4$ blocks. Similar observations can be made, except that, relative to Proposed Methods 1 and 2, the performance of AM-IF has worsened and that of GM-MMSE has improved, while still being significantly outperformed by the proposed methods.
For the successive methods, a behavior similar to the $F = 2$ case is observed, except that now GM-SIC outperforms Proposed Method 3 for all $N_T \le 6$ and achieves a smaller gap to Proposed Method 4. While one might expect this gap to vanish for large $F$, it should be noted that, due to rate limitations, constructions of full-diversity codes are typically restricted to the small $F$ case.
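To give a sense of why the exhaustive search used above for the optimal receivers scales poorly with the number of users, the sketch below enumerates the candidate rows of $A$, assuming that the constraint is an $\ell_1$-norm bound of 15 on each row. Only the enumeration is shown. The rate objective that the search would maximize, and the full-rank check on $A$, are omitted, and the function name is hypothetical.

```python
from itertools import product

def bounded_l1_vectors(dim, bound):
    """All nonzero integer vectors a in Z^dim with ||a||_1 <= bound."""
    entries = range(-bound, bound + 1)
    return [a for a in product(entries, repeat=dim)
            if 0 < sum(abs(x) for x in a) <= bound]

# Candidate rows for a 2-user system with the bound of 15 used in the simulations.
rows_2 = bounded_l1_vectors(2, 15)

# An N_T x N_T matrix picks one candidate row per position, so the number of
# matrices to examine (before discarding rank-deficient ones) is at most
# len(rows) ** N_T, which grows exponentially with N_T.
num_matrices_2 = len(rows_2) ** 2
```

For two users this gives 480 candidate rows and at most 480² candidate matrices. Both the number of rows and the exponent increase with $N_T$, which is consistent with the optimal curves being reported only for very small systems.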
Lastly, Figs. 6 and 7 show the frame-error rate (FER) on a 2 × 2 channel with F = 2 and F = 4, respectively, using 2-PAM modulation. A regular, rate-1/F root-LDPC code [26] of length n = 208, constructed using a PEG-based technique [27], is used in each simulation. As can be seen, the simulations are consistent with the theoretical results, with a performance gap due to the small constellation size and the non-optimality of the channel code. In particular, for a FER of 1%, both proposed methods are within 3.7 dB of their theoretical FER for F = 2 and within 3.4 dB for F = 4.
The FER for successive decoding is not shown, since for R ≤ 1/2 all methods have performance similar to their non-IF counterpart. For the benefits of SIF to become salient, lattice codes with higher spectral efficiency are needed. The design of such codes, however, is outside the scope of this paper.

VIII. CONCLUSIONS
In this paper, we propose four suboptimal methods for selecting an integer matrix $A$ for IF reception in a block fading scenario, two of them applicable to GM-IF and the other two applicable to AM-SIF and GM-SIF, respectively. The main idea behind these methods is to use a matrix $A$ optimized for a lower-performance scheme with a simpler objective function, for which an approximately optimal solution can be found in polynomial time. For AM-SIF, the corresponding simpler scheme is the proposed AM-SIF-SNC scheme, while, for GM-IF and GM-SIF, it is the best choice among their respective AM counterparts and certain static fading solutions, all of which can be found very efficiently.
As shown by simulations, the proposed methods for GM-IF achieve outage rates strictly higher than both GM-MMSE and AM-IF (until now the best low-complexity methods), regardless of the number of blocks and users, while being only slightly more complex than AM-IF.
We also show that AM-(S)IF and GM-(S)IF schemes can be realized in practice with low complexity, under finite codeword length and constellation constraints. Simulation results using full-diversity root LDPC codes are found to agree with theoretical ones, confirming the superiority of GM-IF in comparison with GM-MMSE and AM-IF.
An interesting avenue for future work is the development of low-complexity, full-diversity lattice codes with higher spectral efficiency (for instance, full-diversity q-ary linear codes with q > 2 for use in Construction A [2]). Such codes would be directly applicable to the GM-IF and GM-SIF schemes, allowing a wider and more interesting operating range, in particular at outage rates for which these schemes are much superior to their AM or non-IF counterparts.