Low-Complexity Tree-Based Iterative Decoding for Coded SCMA

Abstract: Sparse Code Multiple Access (SCMA) is a powerful multiple access technique for future generations of wireless communication where users are allowed to transmit through pre-defined channel resources with a controlled degree of collision. The base station then recovers all the users' data through some iterative method. The well-known Message-Passing Algorithm (MPA) has excellent performance but has exponential decoding complexity. Alternative decoding algorithms, such as MPA in the log-domain (Log-MPA), have been proposed in the literature aiming to reduce the decoding complexity while not significantly decreasing performance. In recent work, the authors proposed a modification of the conventional Log-MPA by exploring a tree structure associated with the decoding equations. By properly avoiding symbols with low reliability, a pruned tree is obtained, yielding an arbitrary trade-off between performance and complexity in the joint detection. In the present work, we extend this contribution by showing that the advantages of the tree-based decoding algorithm are magnified when SCMA is coupled to an error-correcting code, in particular, a Low-Density Parity-Check (LDPC) code. Through computer simulations, we show that an improved performance-decoding complexity trade-off is obtained.


I. INTRODUCTION
Sparse Code Multiple Access is a non-orthogonal multiple access technique proposed for future-generation wireless networks, which provides high efficiency and good performance [1]. In SCMA, the users' data are mapped into codewords that are allocated in a non-orthogonal way to resources such as the subcarriers of an Orthogonal Frequency Division Multiplexing (OFDM) system [2].
SCMA is a generalization of Low-Density Spreading (LDS) [3], which, in turn, is a sparse version of Code Division Multiple Access (CDMA) with low-density spreading sequences [4]. In the LDS technique, the information bits of each user are mapped into a complex symbol of a signal constellation, and this symbol is repeated in a small number of resources or slots (subcarriers, in the case of OFDM), while the other resources are not used (the power allocated to these resources is null). The choice of resources to be used by each user is made so that each resource is used by a small number of users. The low number of collisions per resource allows for a representation of LDS by a sparse graph, so that the Message-Passing Algorithm (MPA) [3] can be used to recover the symbols transmitted by all users. In the SCMA technique, each user's input bits are mapped into complex codewords belonging to a fixed codebook [4]. As in LDS, in SCMA the complex symbols (coordinates) of the codeword are spread over different resources, the only difference being that, in SCMA, the symbols are not mere repetitions. Thanks to this feature, SCMA provides better performance [1]. Also, unlike CDMA, LDS and SCMA meet the massive connectivity demand of future wireless communication systems, that is, they allow a massive number of users, due to their inherent design for overloaded systems.
One of the challenges of SCMA is to establish a good relationship between performance and detection complexity [5]. In general, the better the performance, the more complex the decoder becomes. There are several decoding algorithms that try to balance this relationship. The Maximum Likelihood (ML) detector has excellent performance; however, its complexity is prohibitively high. A widely used decoder is the MPA [3], which offers lower detection complexity with little performance loss compared to ML. However, the complexity of MPA is still high (exponential).
In the conventional MPA [3], the decoding equations involve exponential functions, resulting in large storage and complexity [6]. The problem is reduced with the use of the Log-MPA algorithm [7], whose calculations are realized in the logarithmic domain, in a way similar to turbo codes [8].
The purpose of this article is to extend the study of our previous work [9], whose objective was to present a possible solution to reduce the detection complexity of Log-MPA. The idea was to introduce a tree structure associated with the decoding equations of Log-MPA, and then prune some terms (branches in the tree) considered of little relevance. This results in a detection complexity reduction, with a complexity-performance trade-off that is adjustable through a single parameter: the number of terms (branches) left unpruned.
In [9], it was claimed that a performance close to that of the original Log-MPA could be obtained with fewer unpruned branches if error-correcting codes were used in the SCMA system. In this work, we consider coded SCMA where a Low-Density Parity-Check (LDPC) code is used in each user unit. The iteration process between the LDPC decoder and the tree-based Log-MPA is presented. Through computer simulations, we show that, indeed, as claimed in [9], the performance of coded SCMA with the tree-based Log-MPA approaches that of coded SCMA with the conventional Log-MPA for a smaller number of unpruned branches (higher degree of pruning), thus adding value to the decoder proposed in [9]. The contributions of this work are:
• We introduce error-correcting codes, in particular LDPC codes, to the SCMA system proposed in [9], and show that the performance of coded SCMA with the proposed decoding algorithm approaches the one with the conventional Log-MPA with an even larger number of pruned nodes, thus with an even lower decoding complexity;
• We revisit the Log-MPA detector in [9], in terms of a tree structure based on the conventional Log-MPA equations, to allow for the introduction of the LDPC decoder;
• We present simulation results for an additional scenario, besides the standard 6-user, 4-resource scenario considered in [9];
• We show that the complexity order of the proposed detector under a linear pruning grows exponentially but at a much lower rate than the conventional detector.
Finally, we would like to mention that most works proposing ways to reduce the complexity of SCMA decoding do not provide enough information for reproducibility. For example, if the system adopts turbo codes, nothing but the code rate is specified and, in some cases, not even this information is provided. In some other cases, only the codebook size is reported. For this reason, in this paper, the proposed algorithm is directly compared with the original Log-MPA. We herein provide, at the end of the simulation results section, a GitHub link through which all SCMA allocation matrices and codebooks used in the simulations can be downloaded.
This article is organized as follows. In Section II, the SCMA system is described. In Section III, the conventional Log-MPA detector for SCMA is briefly presented. The proposed coded SCMA and the associated decoder are presented in Section IV. In Section V, simulation results are presented. Finally, in Section VI, we conclude the paper.
Notation: bold lowercase and uppercase letters indicate vectors and matrices, respectively. $X_{i,j}$ represents the element in the $i$-th row and $j$-th column of the matrix $\mathbf{X}$. The superscript $T$ denotes the transpose. The Euclidean norm of a matrix or a vector is represented by $\|\cdot\|$; an uppercase calligraphic variable represents a set. Finally, $\mathbf{z} \sim \mathcal{CN}(\boldsymbol{\mu}, \mathbf{Z})$ is a complex random vector with Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{Z}$; $\mathbf{I}_n$ denotes the $n \times n$ identity matrix.

II. SCMA TRANSMISSION SYSTEM
The SCMA system is made up of $J$ independent users, which are multiplexed into $K$ orthogonal resources. In this system the number of users is greater than the number of resources; therefore, the load factor $J/K$ is always greater than 1 ($J > K$), configuring an overloaded system.
In the transmitter, the $N$ (non-null) complex values of the codeword selected by the $j$-th user's information index are spread over the $K$ resources, $N < K$, according to a resource allocation matrix, $\mathbf{F}$, of dimension $K \times J$, that specifies which resources are used by each user. If the $(k, j)$ element of the matrix, $f_{k,j}$, is equal to 1, user $j$ makes use of the $k$-th resource and, therefore, one of the symbols of the user's codeword is allocated to this resource. Otherwise, $f_{k,j}$ is 0. As an example, consider the matrix in (1), with $K = 4$ (resources) and $J = 6$ (users). A characteristic of the resource allocation matrix is its sparsity, that is, it must have few non-null elements. Define the two sets: the set $\{k : f_{k,j} \neq 0\}$, which specifies the resources used by the $j$-th user, and the set $\{j : f_{k,j} \neq 0\}$, which informs which users are allocated to the $k$-th resource. The cardinality of the former is $N$, and the cardinality of the latter will be denoted by $d_f$. In the resource allocation matrix in (1), we have $d_f = 3$ and $N = 2$.
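As an illustration of the two sets just defined, the following minimal sketch extracts them from an allocation matrix. The 4 × 6 matrix used here is only an assumed example with the same dimensions and regularity as the one discussed in the text (it is not necessarily the matrix in (1)), the function names are hypothetical, and indices are zero-based.

```python
# Assumed example allocation matrix: 4 resources (rows) x 6 users (columns).
# It is NOT claimed to be the matrix in (1); it only has the same shape.
F = [
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
]


def resources_of_user(matrix, user):
    """Return the (zero-based) resources whose entry for this user is non-null."""
    return [k for k in range(len(matrix)) if matrix[k][user] != 0]


def users_of_resource(matrix, resource):
    """Return the (zero-based) users whose entry for this resource is non-null."""
    return [j for j in range(len(matrix[0])) if matrix[resource][j] != 0]


num_resources, num_users = len(F), len(F[0])
user_sets = [resources_of_user(F, j) for j in range(num_users)]
resource_sets = [users_of_resource(F, k) for k in range(num_resources)]

# Cardinalities: resources per user and users per resource.
resources_per_user = {len(s) for s in user_sets}
users_per_resource = {len(s) for s in resource_sets}
```

For this assumed matrix, every user occupies 2 resources and every resource is shared by 3 users, which matches the cardinalities quoted in the text.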
With the application of $\mathbf{F}$, the codebook $\mathcal{C}_j$ is resized to a sparse version, $\mathcal{X}_j \subset \mathbb{C}^{K}$, according to each user's signature, $\mathbf{f}_j$. In the resized codeword, $\mathbf{x}_{j,m} \in \mathcal{X}_j$, $K - N$ components are the zero symbol, and the other components receive the original values of the symbols of $\mathbf{c}_{j,m}$.
Thus, the received signal is given by (2). Figure 1 exemplifies a possible SCMA transmission scenario, showing the users' sparse codebooks, which respect the resource allocation matrix $\mathbf{F}$. The colored squares are the non-null complex values of the codewords, which have been allocated to the corresponding resources. Each user selects a codeword, represented by a bold rectangle. The receiver observes $\mathbf{y}$, which is the sum of the users' codewords in the resources, together with the Gaussian noise.
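The superposition described above can be sketched numerically. The snippet below is a minimal sketch, assuming that each user's sparse codeword is scaled by one complex channel gain per resource and that circularly symmetric complex Gaussian noise is added; the function names, the per-resource gain layout, and the toy codewords are assumptions made for illustration, not the exact model of eq. (2).

```python
import random


def complex_gaussian(variance, rng):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    sigma = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def received_signal(codewords, channels, noise_variance, rng):
    """Superimpose the users' sparse codewords on each resource and add noise.

    codewords[j][k]: symbol of user j on resource k (0 where the user is silent).
    channels[j][k]:  assumed channel gain of user j on resource k.
    """
    num_resources = len(codewords[0])
    y = []
    for k in range(num_resources):
        superposition = sum(h[k] * x[k] for h, x in zip(channels, codewords))
        y.append(superposition + complex_gaussian(noise_variance, rng))
    return y


# Toy example: two users, four resources, unit channel gains.
rng = random.Random(0)
toy_codewords = [[1 + 1j, 0, 2, 0], [0, 1j, -1, 0]]
toy_channels = [[1, 1, 1, 1], [1, 1, 1, 1]]
noiseless = received_signal(toy_codewords, toy_channels, 0.0, rng)
noisy = received_signal(toy_codewords, toy_channels, 0.1, rng)
```

With zero noise variance the output is simply the per-resource sum of the users' symbols, which makes the collision on the third resource visible.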

III. CONVENTIONAL LOG-MPA DETECTOR
The sparsity of SCMA allows the application of the Log-MPA [7] iterative decoding algorithm. For this, a bipartite graph is constructed whose nodes represent, on one side, the user nodes (UN), and on the other side, the resource nodes (RN) [10]. The edges between the UNs and RNs are allocated based on the two sets defined in Section II.
Decoding in Log-MPA for the SCMA system is carried out by exchanging messages between UNs and RNs. The message from RN to UN is described in [7]; it involves a normalization factor of messages from RN to UN and the function $\max^{*}$. The function $\max^{*}$ is the Jacobian logarithm, whose use in the Log-MPA algorithm was considered in [8]. It is defined as follows. Consider initially this maximization for two values, namely $a$ and $b$. In this case, the function $\max^{*}$ returns:

$$\max{}^{*}(a, b) = \max(a, b) + \log\left(1 + e^{-|a - b|}\right) \qquad (6)$$

Note that the result is not the maximum value, since the log part adds an adjustment to the final value. The extension of $\max^{*}$ to a list with more than two values, as it appears in (5), is obtained by first applying (6) to any two values in the list and then repeating the operation, each time between a new value in the list and the result of the previous step.
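The pairwise operation and its extension to a list can be sketched as follows. This is a minimal sketch assuming the usual form of the Jacobian logarithm, max(a, b) + log(1 + exp(-|a - b|)); the function names `max_star` and `max_star_list` are hypothetical.

```python
import math
from functools import reduce


def max_star(a, b):
    """Jacobian logarithm of two values: the maximum plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_list(values):
    """Extend max_star to a non-empty list by folding it over the values."""
    return reduce(max_star, values)


values = [0.5, -1.0, 2.0]
folded = max_star_list(values)
# Reference value: log of the sum of exponentials, computed directly.
reference = math.log(sum(math.exp(v) for v in values))
```

Folding the pairwise operation over the list gives the logarithm of the sum of exponentials, so `folded` and `reference` agree up to rounding, and the result is always slightly larger than the plain maximum.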
Based on (3) and (5), and considering $M_j = M$ for all $j \in \{1, \ldots, J\}$, the complexity of conventional Log-MPA has order $O(M^{d_f})$. It is important to note that, for the maximization in (5), the algorithm needs to consider all the codewords of all users that use the $k$-th resource except user $j$. For more details on the Log-MPA algorithm, see [7].

IV. PROPOSED CODED SCMA AND ASSOCIATED DECODING ALGORITHM
In this section, we will detail the fundamental aspects of implementing the SCMA system with the LDPC code using a Log-MPA detector based on a pruned-tree structure. First, we will show how the LDPC code was added to an SCMA system. Subsequently, we present the process of implementing the pruned tree structure over the Log-MPA detection algorithm, detailing each of its equations. Later, we will explain the calculation of the complexity order of the proposed algorithm and how it is directly linked to the number of pruned branches.

A. SCMA System With LDPC
For better understanding, Figure 2 presents an outline of how the LDPC code for the $j$-th user interacts with the SCMA system.
In the transmitter of the $j$-th user, a binary message ($\mathbf{m}_j$) is LDPC encoded and the codeword is bit-interleaved to avoid error bursts. Subsequently, the bit-interleaved codeword ($\mathbf{d}_j$) is divided into groups ($\mathbf{d}_j^{(1)}, \mathbf{d}_j^{(2)}, \ldots$) of $\log_2 M$ bits each. Each group is processed by the SCMA system, generating a specific codeword of complex symbols from the SCMA codebook. To perform the transmission, a parallel-to-serial conversion takes place, which finally forms the compound vector ($\mathbf{X}_j$) made up of the codewords ($\mathbf{x}_j^{(1)}, \mathbf{x}_j^{(2)}, \ldots$). It is worth mentioning that each SCMA codeword uses $K$ resources, $K - N$ of which receive the zero symbol. The received signal $\mathbf{Y}$ is the superposition (over the fading channels) of all compound vectors from the $J$ users, as in eq. (2).
At the receiver, the received vector $\mathbf{Y}$ is processed by a serial-to-parallel block, producing vectors ($\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \ldots$), which are jointly processed by the SCMA multiuser detector. Herein, the detector is the Log-MPA based on the pruned tree structure. The detector outputs soft bits in the form of Log-Likelihood Ratios (LLRs), which are then serialized into $L(\mathbf{d}_j)$. Subsequently, $L(\mathbf{d}_j)$ is deinterleaved and, finally, delivered to the LDPC decoder as a priori information. These steps are repeated iteratively, until the $j$-th LDPC decoder performs a hard decision and outputs the decoded message (an estimate of $\mathbf{m}_j$). The multiuser detection takes all users into account, while an independent LDPC decoder is used for each user. Note that we are free to choose the LDPC codes independently. Herein, we adopt the same LDPC code for all users, but a more general code design is possible.
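The interleaving and grouping steps of the transmitter can be sketched as below. This is a minimal sketch under stated assumptions: the interleaver is modeled as a fixed permutation, each group of bits is read most-significant bit first to obtain a codeword index, and the codebook size is a power of two. The function names and the toy permutation are hypothetical; the LDPC encoder and the SCMA codebook lookup are left out.

```python
def interleave(bits, permutation):
    """Output position i receives the input bit at index permutation[i]."""
    return [bits[source] for source in permutation]


def deinterleave(values, permutation):
    """Undo interleave(): put each value back at its original index."""
    restored = [None] * len(values)
    for position, source in enumerate(permutation):
        restored[source] = values[position]
    return restored


def bits_to_codeword_indices(bits, codebook_size):
    """Split bits into groups of log2(codebook_size) bits and read each as an index."""
    bits_per_group = codebook_size.bit_length() - 1
    if codebook_size < 2 or 2 ** bits_per_group != codebook_size:
        raise ValueError("codebook_size must be a power of two, at least 2")
    if len(bits) % bits_per_group != 0:
        raise ValueError("number of bits must be a multiple of the group size")
    indices = []
    for start in range(0, len(bits), bits_per_group):
        index = 0
        for bit in bits[start:start + bits_per_group]:
            index = (index << 1) | bit  # most-significant bit first (assumed)
        indices.append(index)
    return indices


# Toy example: 8 coded bits, an assumed permutation, codebook of size 4.
coded_bits = [1, 0, 1, 1, 0, 0, 0, 1]
toy_permutation = [3, 0, 6, 1, 7, 2, 5, 4]
interleaved = interleave(coded_bits, toy_permutation)
codeword_indices = bits_to_codeword_indices(interleaved, 4)
```

Each resulting index would then select one codeword from the user's codebook; at the receiver, `deinterleave` applies the inverse permutation to the serialized LLRs before they reach the LDPC decoder.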

B. Tree-Based Log-MPA
The central idea of the proposed algorithm is to consider a tree structure in which each level is associated with a certain user that makes use of the resource $k$, and each branch of this level is associated with a codeword of that user. The tree has $d_f$ levels. To get the message in (5), level 0 of the tree is associated with user $j$. The lower $d_f - 1$ levels are associated with the other users allocated to resource $k$. In the proposed algorithm, with the exception of level zero, only a reduced number of branches (fewer than $M$) at a given level, descendants of a node at the previous level, are considered. Considering the resource allocation matrix in (1) and Figure 1, Figure 3 shows a tree for resource $k = 2$, user $j = 1$, and the first codeword of that user. For each set of $M$ new branches, the remaining descending nodes are pruned; these are marked with a dedicated symbol in the figure. Pruned nodes do not generate new branches, that is, they have no descending nodes.
The computation of the message is broken into steps, performed while traversing the tree from top to bottom, but considering only the sub-tree that remains after pruning. The derivation starts at level 0. Before moving to the next level, some definitions are needed. In the derivations that follow, the users using the $k$-th resource (except user $j$) are sorted in decreasing order of the absolute value of the channel gain, which defines a set of ordered user indices. At level 1 and at the subsequent levels, the indices that maximize the corresponding expression, as in (9), are identified, and the derivation proceeds level by level up to (14). Then, the other steps of the conventional Log-MPA algorithm are continued. It is worth mentioning that the normalization factor in (4) is applied only to the surviving nodes at the last level, since they are the ones that carry relevant information.

C. Complexity Order
As the objective of the proposed algorithm is to reduce the complexity order, this section presents the calculations needed to determine the complexity order of the proposed Log-MPA. We do not consider the number of additions and multiplications involved in the calculation of the message in (5). Instead, we count the number of times the message is calculated. The complexity order of the conventional Log-MPA is then given by (15). For the detector proposed in this work, and considering $M_j = M$ for all $j \in \{1, \ldots, J\}$, the complexity order is given, in closed form, by (16). A simple analysis of eq. (16) shows that the decoding complexity of the proposed algorithm is lowest when the number of unpruned branches is constant and highest when it is a fixed fraction of $M$. Even so, in the latter case, there is a significant reduction in complexity in comparison to Log-MPA, as can be seen in Table I. From the table, it can be seen that, except for one of the listed cases, the complexity order of the proposed approach is considerably lower. Moreover, as $d_f$ increases, the complexity reduction obtained with the proposed detector becomes more significant, since the more levels the tree has, that is, the larger $d_f$ is, the higher the fraction of pruned nodes.
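To make the effect of pruning on the tree size concrete, the sketch below counts nodes under an assumed pruning model: level 0 keeps all codewords of the target user, every surviving node spawns one child per codeword of the next user, and only a fixed number of children per parent survive. This model and the function name `pruned_tree_counts` are assumptions made for illustration; the sketch does not reproduce eqs. (15) and (16) or Table I.

```python
def pruned_tree_counts(codebook_size, levels, kept_per_parent):
    """Count nodes in a pruned tree under the assumed model described above.

    Returns (evaluated, surviving_leaves):
      evaluated        - nodes whose metric is computed, over all levels;
      surviving_leaves - nodes left at the last level after pruning.
    """
    if not 1 <= kept_per_parent <= codebook_size:
        raise ValueError("kept_per_parent must be between 1 and codebook_size")
    survivors = codebook_size      # level 0: no pruning
    evaluated = codebook_size
    for _ in range(1, levels):
        evaluated += survivors * codebook_size   # all children are evaluated
        survivors *= kept_per_parent             # only some children survive
    return evaluated, survivors


# Example: codebook size 4 and a tree with 3 levels.
full_tree = pruned_tree_counts(4, 3, 4)    # nothing pruned
half_pruned = pruned_tree_counts(4, 3, 2)  # half of the children kept
```

Under this assumed model, keeping every child reproduces the full tree, whose number of leaves is the codebook size raised to the number of levels, while keeping half of the children already reduces the number of surviving leaves by a factor of four in this three-level example.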

V. SIMULATION RESULTS
This section presents Monte Carlo simulation results for the bit error rate (BER), as a function of the ratio of the average bit energy to the noise power spectral density ($E_b/N_0$).
In our coded SCMA system, the rate 1/2 LDPC code of length 128 designed in [11] is adopted in all scenarios simulated (except for the simulation results shown in Fig. 8, which considers uncoded SCMA). We consider 10 iterations in the LDPC decoding in our simulations. The bit-interleaving strategy was also applied, to avoid error bursts. The stopping criterion is 50 frame errors for all scenarios. The flat Rayleigh fading channel model is adopted for all users.
In all simulations, the conventional Log-MPA [7] detector and the proposed one perform 5 iterations. Although the multiuser detector and the LDPC decoder in Fig. 2 can exchange messages repeatedly before the LDPC decoder makes a final hard decision, we considered the open-loop (i.e., one outer iteration) in all of our simulations.
For the proposed decoder, we show results for all possible values of the pruning parameter, that is, the number of unpruned branches. The case in which this number equals the codebook size corresponds to the conventional Log-MPA itself; it appears in all simulations and is adopted as a reference. To reproduce our simulation results, the resource allocation matrices and the codebooks used herein can be found in [12].
The simulation results for the uncoded SCMA system are presented in Figs. 4, 6, and 8, and for the (LDPC) coded SCMA system, in Figs. 5, 7, and 9, respectively, for the following scenarios:
• SCMA codebook designed in [13] with parameters $M = 4$, $J = 6$, $K = 4$ and $d_f = 3$;
• LDS codebook with parameters $M = 8$, $J = 6$, $K = 4$ and $d_f = 3$;
• LDS codebook with parameters $M = 8$, $J = 12$, $K = 9$ and $d_f = 4$.
As can be seen from Figs. 4-9, in all three scenarios the gaps between the BER curves of the proposed tree-based Log-MPA and the corresponding BER curve of the conventional Log-MPA are smaller in the coded SCMA scenarios than in the uncoded SCMA scenarios, for all values of the pruning parameter. This shows that the proposed tree-based Log-MPA not only works well in the coded SCMA system, but also magnifies the advantages of the method with regard to the uncoded case, first presented in [9].
It is worth mentioning that, if the outer loop in the Rx part of Fig. 2 is closed, and some iterations are performed between the multiuser detector and the LDPC decoders, the aforementioned gaps can be further decreased.
As a final result, we observe that, for all coded SCMA scenarios, a ratio between the number of unpruned branches and the codebook size $M$ close to 0.5 yields a good performance (near the conventional Log-MPA performance). Although the number of scenarios considered in our simulations is small for this to become a rule of thumb, it is worth investigating how the complexity order of the tree-based Log-MPA grows with $M$ when this ratio is fixed at 0.5. The result is shown in Figure 10, for values of $M$ up to 300. It can be seen that, although the complexity order of the proposed detector is still exponential under this condition, it grows much more slowly than that of the conventional Log-MPA.

VI. CONCLUSIONS
In this work, a modified Log-MPA detector for SCMA, which presents good performance with a significant reduction in detection complexity, was proposed. The detector is based on a tree structure that, when properly pruned, provides a reduction in the number of codewords processed by the iterative detector. Simulation results for the bit error rate were presented for both uncoded and LDPC coded cases, for three different system scenarios. It was found that the proposed detector performs very well, close to the conventional Log-MPA, with reduced complexity. Moreover, it was verified that the advantages of the proposed detector are even more pronounced in the coded system. We also showed that the complexity order of the proposed detector under a linear pruning, although still exponential in the SCMA codebook size, grows at a much smaller rate when compared to the conventional Log-MPA. Therefore, the contribution of this work is of great interest in practical scenarios.