Tensor-Based Multiuser Detection in Cooperative Multirelay Uplink

In this paper, four tensor-based receivers for a multiuser multirelay cooperative uplink are proposed, with the relays employ the amplify-and-forward (AF) protocol and a time-spread coding. Two different scenarios are considered regarding the multiuser interference at the relays. When multiuser interference at the relays is ignored, a quadrilinear PARAFAC model is adopted for the received signals. Otherwise, a new tensor model called Nested PARAFAC-Tucker decomposition (NPT1D) is used to represent the received signals. The proposed receivers jointly estimate the transmitted symbols, channel gains and spatial signatures, two of them being based on the Alternating Least Squares (ALS) algorithm and two of them using the non-iterative Least Squares Khatri-Rao Factorization (LS-KRF) method. Uniqueness is discussed and simulation results are provided to illustrate the performance of the proposed techniques.


Tensor-Based Multiuser Detection in Cooperative Multirelay Uplink
A. Augusto T. Peixoto and C. Alexandre R. Fernandes Abstract-In this paper, four tensor-based receivers for a multiuser multirelay cooperative uplink are proposed, with the relays employ the amplify-and-forward (AF) protocol and a time-spread coding.Two different scenarios are considered regarding the multiuser interference at the relays.When multiuser interference at the relays is ignored, a quadrilinear PARAFAC model is adopted for the received signals.Otherwise, a new tensor model called Nested PARAFAC-Tucker decomposition (NPT1D) is used to represent the received signals.The proposed receivers jointly estimate the transmitted symbols, channel gains and spatial signatures, two of them being based on the Alternating Least Squares (ALS) algorithm and two of them using the non-iterative Least Squares Khatri-Rao Factorization (LS-KRF) method.Uniqueness is discussed and simulation results are provided to illustrate the performance of the proposed techniques.

I. INTRODUCTION
T HE use of multilinear algebra concepts, such as tensor decompositions, has found applications in several areas [1], [2] and allowed the development of new receivers for telecommunications systems.An advantage of using tensors in comparison to matrices is the fact that they allow the direct use of multidimensional data, providing a better understanding and processing from a multidimensional perspective.One of the motivations for using tensor models in wireless communications systems comes from the fact that they allow multiuser signal separation and channel estimation under uniqueness conditions more relaxed than the ones of conventional matrixbased approaches [1], [2].
The most known tensor decomposition is the Parallel Factor Analysis (PARAFAC), proposed by Harshman [3] and Caroll & Chang [4] (Carroll & Chang's work presented the same decomposition as Canonical Decomposition).The PARAFAC has been used as a component analysis tool in many fields as, for instance, psychometrics [4], chemometrics [5], speech processing [6], blind signal separation [7] and many others [1].One of the main motivations for using the PARAFAC decomposition comes from its intrinsic uniqueness.In comparison to matrix decompositions, where we often have the problem of rotational freedom, the PARARAC decomposition of high order tensors is unique up to scaling and permutation ambiguity.This uniqueness property makes the PARAFAC decomposition a good solution to many signal processing problems [8], [9].
After the work of Sidiropoulos et al. [9], the use of tensor decompositions in wireless communications proved to be valid and effective.In the referred work, the authors showed that a set of DS-CDMA (direct-sequence code-division multiple access) signals received at an uniform linear array of antennas can be viewed as a 3 rd order tensor, admitting a PARAFAC decomposition that allowed the proposition of a blind receiver.Many other works in the literature have applied the PARAFAC in different wireless communications systems [10].More general tensor decompositions were also proposed for modeling wireless systems, as, for instance, the nested PARAFAC decomposition (NPD) [11], the nested Tucker decomposition (NTD) [12] and the PARATUCK [13], [14].One of the main features of the receivers proposed in these works is the fact that they do not require the use of training sequences, nor the channel knowledge, with weak identifiability conditions.Moreover, these tensor decompositions do not rely on statistical independence between the transmitted signals.Instead, the receiver algorithms are purely deterministic and explore the multilinear algebraic structure of the received signal.
Tensor decompositions have also been successfully employed in wireless cooperative communications, as in [15], where a receiver was proposed for a two-way relaying system.In [16], a blind receiver for an AF relaying uplink was proposed for multiuser and multirelay scenarios.[17] extends the work of [16] by proposing an unified multiuser receiver with a trilinear tensor model for different relaying schemes.In [18], receivers based on a trilinear tensor model for a cooperative scenario exploiting spreading diversity at the relays are proposed.
More recent works include [19], where a semi-blind receiver was proposed for a two-hop MIMO relaying system, adopting two tensor decompositions (PARAFAC and PARATUCK).The work [20] considers a similar system model than the one used in [19], but adopting the NPD and employing space-time (ST) coding at the transmitter and relay nodes.In [12], a oneway two-hop MIMO AF cooperative scheme was employed with a NTD, allowing the development of two semi-blind receivers to jointly estimate the information symbols and the relay channels.In [21], a three-hop one-way AF cooperative system was considered and a semi-blind receiver based on the PARATUCK-3 is proposed.The works [22] and [23] present tensor-based approaches for channel estimation and multiuser detection in cooperative MIMO systems, respectively.
In the present paper, we propose four tensor-based receivers for a multiuser multirelay cooperative uplink.In the considered system model, the relays employ the AF protocol and a timespread coding [24], and an antenna array is employed at the destination, taking advantage of cooperative and spatial diversities.Two different scenarios are considered regarding the multiuser interference at the relays.In the first scenario, interference between users is not considered at the relays, while the second scenario takes this interference into account, representing a more realistic scenario than the first one.
Four receivers that jointly estimate the transmitted symbols, channel gains and spatial signatures are then proposed, two for each scenario.The proposed techniques are based on the iterative ALS algorithm and on the non-iterative LS-KRF method [25], [26].A quadrilinear PARAFAC model is adopted for the received signals of the first scenario, with two semiblind receivers being developed.For the second scenario, a new tensor decomposition called nested PARAFAC-Tucker (NPTD) is used to model the received signals, with two supervised receivers being proposed.The NPTD can be viewed as a special case of the NTD and a generalization of the NPD.
Some results of the present work have been preliminarily presented in [27], where a semi-blind receiver for a cooperative multiuser DS-CDMA system was proposed.In this work, we extend the work [27] by generalizing the system model, proposing other receivers and presenting a new tensor decomposition.
The present work can also be viewed as a generalization of some works in the literature [9], [16]- [18].Indeed, the present paper extends [9] by considering a cooperative link with relays instead of the direct link.Moreover, in contrast to [18], the adopted system considers the relays transmitting in different time-slots instead of all relays transmitting simultaneously to the destination.In comparison to [16] and [17], our work admits time-spread coding at the relays.In addition, [16]- [18] do not consider the multirelay interference at the relay, contrarily to one of the cases here presented.
The uniqueness conditions of the tensor decompositions are discussed for both scenarios and simulation results that illustrate the performance of the proposed techniques are provided.
The present paper is structured as follows.Section II lays out the adopted notation and tensor decompositions.Section III presents the communication system models.Section IV shows the tensor modeling of the communication systems and their uniqueness properties.Section V presents the proposed receivers algorithms.Section VI shows the simulations results and Section VII summarizes the conclusions.
Notation: The notation used in this paper is presented here.Scalars are denoted by Roman lowercase letters (a,b,...), vectors as lower-case boldface letters (a,b,...), matrices as upper-case boldface letters (A,B,...) and tensors as calligraphic letters (A,B,...).The element (i,j) of the matrix A is denoted by [A] i,j or a i,j , and the element (i 1 , ..., i N ) of the N th order tensor A is denoted by [A] i1,...,i N or a i1,...,i N .
A T and A † stand for the transpose and the pseudo-inverse of A respectively.â, â, Â and Â represent the estimations of a, a, A and A, respectively.A(:,i) ∈ C R×1 is the i-th column of A ∈ C R×I .The operator diag j [A] is the diagonal matrix formed by the j-th row of A. The operator • denotes the outer product of two vectors, denotes the Khatri-Rao product between A ∈ C I×R and B ∈ C J×R , resulting in A B ∈ C IJ×R , and the operator ⊗ denotes the Kronecker product between A ∈ C I×K and B ∈ C J×L , resulting in A ⊗ B ∈ C IJ×KL .I R ∈ C R×R and I N,R ∈ C R×...×R denote, respectively, the identity matrix of dimension R and the N th order identity tensor of dimensions R × ... × R.
A N th order tensor X ∈ C I1×...×I N can be reexpressed as a (N − 1) th order tensor by concatenating two of its indices.For instance, by concatenating the indices i N −1 and i N in the following way: j The mode-n product between a N th order tensor X ∈ C I1×...×I N and a matrix A ∈ C Jn×In yields a N th order tensor Y = X × n A ∈ C I1×...×I N −1 ×J N ×I N +1 ×...×I N , defined as [1]: Let X ∈ C I1×...×I N and W ∈ C J1×...×J M be two tensors sharing a common dimension, i.e.I p = J q = K, with 1 ≤ p ≤ N and 1 ≤ q ≤ M .The contraction over the common mode (i p = j q = k) between X and W yields a II. NESTED TENSOR DECOMPOSITIONS In this section, we present the so called nested decompositions, a group of tensor decompositions that can be viewed as the nesting of two simpler tensor models.In particular, we describe the Nested PARAFAC and Nested Tucker decompositions.At the end of the section, a new nested decomposition called Nested PARAFAC-Tucker decomposition (NPTD) is presented.

A. Nested PARAFAC
The Nested PARAFAC decomposition (NPD) of a 4 th order tensor Y ∈ C I1×I2×I3×I4 was defined in [11] as: where ∈ C I3×Q2 and A (4) ∈ C I4×Q2 .Fig. 1 shows a block diagram of the NPD.By concatenating the indices i 3 and i 4 in the following way: can be rewritten as:  where w j1,q1 = [W] j1,q1 , W ∈ C J1×Q1 being an unfolded matrix of the tensor W ∈ C Q1×I3×I4 , defined by: It can be concluded from ( 4) and ( 5) that Y can be viewed as the nesting of two PARAFAC decompositions.Indeed, Y follows a PARAFAC model with factor matrices A (1) , A (2)  and W, with W being an unfolding of a PARAFAC model with factors B, A (3) and A (4) .
The NPD can also be written as the contraction over the mode q 1 between two PARAFAC tensors, as follows: where and In a similar manner, Y can be expressed as a PARAFAC decomposition with factor matrices U, A (3) and A (4) , with U ∈ C J2×Q2 being an unfolding of a PARAFAC model with factors A (1) , A (2) and B, where J 2 = I 1 I 2 .In this case, Y could be expressed as a contraction between two PARAFAC tensors, similarly as in ( 6)-( 8), however, the matrix B would be a factor of the tensor R, instead of being a factor of W, and the contraction would be over the mode q 2 .The matrix B can then be viewed as a factor that can be shared by the two PARAFAC tensors and the indices q 1 and q 2 correspond modes of B that interact with the tensors R and W by means of the contraction operation.
The NTD can be expressed as the contraction over the mode q 2 between two Tucker tensors, in the following way: where is a Tucker-1 tensor and is a Tucker-2 tensor.
Similarly, Y can be expressed as a Tucker-2 decomposition with core tensor D (2) and factors C (3) and R, where R ∈ C J2×R3 is an unfolding of a Tucker-2 tensor with core tensor D (1) and factor matrices C (1) and C (2) , where J 2 = I 1 I 2 .In this case, Y could be expressed as a contraction between two Tucker tensors, similarly as in ( 12)-( 14), however, the matrix C (2) would be a factor of T , instead of being a factor of G, and the contraction would be over the mode q 3 .The matrix C (2) can then be viewed as a factor that can be shared by the two Tucker tensors and the indices q 2 and q 3 correspond modes of C (2) that interact with the tensors T and G by means of the contraction operation.

C. Nested PARAFAC-Tucker
In this subsection, we present a new nested decomposition called Nested PARAFAC-Tucker decomposition (NPTD), which can be viewed as a special case of the NTD and a generalization of the NPD.Let us consider the case of a NTD with one of the core tensors having diagonal slices.Without loss of generality, let us consider that Q 1 = Q 2 and that [D (1) ] •,i2,• , for 1 ≤ i 2 ≤ I 2 , are diagonal matrices.( 9) can then be re-expressed as: where D (1) ∈ C I2×Q1 is a matrix formed from the diagonal elements of [D (1) ] •,i2,• , for 1 ≤ i 2 ≤ I 2 .Fig. 3 shows a block diagram of the NPTD.By concatenating the indices i 3 and i 4 in the following way: 15) becomes: where g j1,q1 = [G] j1,q1 , the matrix G ∈ C J1×Q1 being an unfolding matrix of the tensor G ∈ C Q1×I3×I4 , given by: i4,q3 .
(16) and (17) shows that Y is the nesting of a PARAFAC and a Tucker-2 decomposition.Indeed, Y can be viewed as a PARAFAC model with factor matrices C (1) , D (1) and G, with G being an unfolded matrix of a Tucker-2 model with core tensor D (2) and factors C (2) and C (3) .
The NPTD can be viewed as the contraction over the mode q 1 between a PARAFAC and a Tucker tensor, as follows: where is a PARAFAC tensor and is a Tucker-2 tensor.In a similar way, Y can be expressed as a Tucker-2 decomposition with core tensor D (2) and factors C (3) and U, where U ∈ C J2×R1 is an unfolding of a PARAFAC model with factors C (1) , D (1) and C (2) , where J 2 = I 1 I 2 .In this case, Y could be expressed as a contraction between a PARAFAC and a Tucker tensor, similarly as in ( 18)-( 20), however, the matrix C (2) would be a factor of R, instead of being a factor of G, and the contraction would be over the mode q 2 .The matrix C (2) can then be viewed as a factor that can be shared by the two tensors and the indices q 1 and q 2 correspond modes of C (2) that interact with the tensors R and G by means of the contraction operation.
A particular case of the NPTD can be obtained by making Q 1 = Q 2 and C (2) = I Q1 .In this case, (15) leads to: which can be expressed as a PARAFAC tensor with factor matrices C (1) , D (1) and P, where P ∈ C I3I4×Q1 is the unfolding of a Tucker-1 tensor with core tensor D (2) and factor C (3) .Due to this property, the tensor decomposition (21) will be denoted Nested PARAFAC-Tucker-1 decomposition (NPT1D).Similarly, ( 21) can be viewed as a Tucker-2 tensor, with core tensor D (2) and factors C (3) and , with B being a PARAFAC tensor with factors C (1) , D (1) and I Q1 .

III. SYSTEM MODEL
Let us consider a cooperative uplink communication system with M users transmitting towards a base station with the help of relay-aided links and no direct link between the users and the base station, as illustrated in Fig. 1.Each user is assisted by a group of R relays.The communication links between a user and a relay are called source-relay (SR) links and the links between a relay and the base station are called relaydestination (RD) links.The base station employs a linear array of K equally spaced antennas and each of the M users transmits to its R associated relays, which use the AF protocol and perform a time-spread operation on the users signals.All the relays and users are single antenna devices operating in halfduplex mode.Synchronization at the symbol level is assumed, a quasi-static frequency-flat fading environment is considered and all channels are independent to each other.
The transmission occurs in (R + 1) time-multiplexed stages, in the following way.In the first stage, the users transmits the symbols to the relays.In the second stage (first cooperative slot), the first relay of each user performs a time-spread operation with a code of length P and forwards the received signal to the destination.The transmission continues so that, in the (R + 1) th stage (R th cooperative slot), the R th relay of each user performs the time-spread operation and transmit the signal to the destination.The total transmission rate of the system is M/(PR+1).Regarding the interference at the relays, two different assumptions will be considered in this work: one scenario where multiuser interference is not considered at the relay nodes and one scenario where multiuser interference is considered at the relays.We assume that the number of relays does not change with time.For the readers interested in tensor-based receivers considering that the number of relays may change with time, we suggest the works [17], [28].

A. No multiuser interference at the relays
In this subsection, we make the assumption that each user and its associated relays are located inside a cluster, i.e. they are closer to each other so the signal received by a relay located inside a cluster contains no interference from the other users of the system.The same assumption was made in [16], [17].One possible interpretation for this assumption is that a user and its associated relays are all located inside a cell and the other users and their associated relays are located in other cells, being treated as co-channel interferers.Fig. 4 illustrates the adopted scenario for M users and 3 relays per user.The signal received by the r-th relay of the m-th user is given by: r,m is the channel coefficient between the m-th user and its r-th relay, s n,m is the n-th symbol of the m-th user and v (SR) r,m,n is the additive white Gaussian noise (AWGN) component.All the data symbols s n,m are independent and identically distributed and uniformly distributed over a Quadrature Amplitude Modulation (QAM) or a Phase-Shift Keying (PSK) alphabet.
Each relay performs a time-spreading operation on the received signals, so that the signal transmitted by the r-th relay of the m-th user during the n-th symbol period and p-th timespread slot is given by: for 1 ≤ p ≤ P, where c p,m is time-spread code of the relays of m-th cluster.Note that the same code is used by all the relays of the same user.The signal received by k-th antenna of the base station is then given by: for 1 ≤ k ≤ K, where h k,r,m is the channel coefficient between the k-th receive antenna and the r-th relay associated with the m-th user, v (RD) k,r,n,p is the corresponding noise of the RD link and g r,m is the amplification factor applied by the r-th relay of the m-th user.Substituting ( 22) into (24), we get: where The term v k,r,n,p denotes the noise component through the source-relay-destination (SRD) link.
We assume that all links are subject to multipath propagation and that the scatters are located far away from the destination, such that the signals transmitted by the relays of a given cluster arrive at the base station with the same angle of arrival.Hence, for the signals transmitted from a given cluster of relays, the angle spread is small compared to the spatial resolution of the antenna array at the destination.This assumption is valid when the user and its associated relays are close to each other and there is no scattering around the antennas of the base station, which is common in suburban areas, where the base station is on the top of a tall building or a tower [29].The channel coefficient h k,r,m may then be expressed as: where is the number of multipaths, θ m is the mean angle of arrival of the m-th scattering cluster, a k (θ m ) is the response of the k-th antenna of the m-th scattering cluster, defined as a k (θ m ) = exp(j(k − 1) sin(θ m )), k = 1,...,K, with θ m being an uniform random variable with zero mean and variance of 2π and β (RD) l,r,m is the complex fading envelope of the l-th path between the r-th relay of the m-th user and the base station.( 27) can be rewritten as follows: where γ l,r,m .Now, by substituting ( 28) into (25), we get: and, substituting (28) into (26), we get:

B. Multiuser interference at the relays
In this subsection, we assume that the signal sent by a user to its associated relays will act as interference to the relays of other users, corresponding a more challenging scenario than the one of the previous subsection.The signal received by the r-th relay of the m-th user is then given by: where h m,r, m is the channel coefficient between the m-th user and the r-th relay of the m-th user.For the RD link, the signal received at the k-th antenna of the base station, through the r-th cooperative slot, on the n-th symbol period and p-th timespread slot is given by: which, by substituting ( 31) into (32), gives us: The same assumptions about the propagation environment made in section III.A are considered in this scenario.Hence, from (28): where v k,r,n,p is given by (30).

IV. TENSOR MODELING
In this section, we present tensor modeling for the two systems described in Section III.From now on, to simplify presentation, we omit the AWGN terms and assume that the channels are constant during the whole transmission block.

A. Quadrilinear PARAFAC Model
For the scenario presented in subsection III.A, where no multiuser interference at the relays is considered, the baseband signal received on the RD link can be viewed as a four-way array with its dimensions related to space, cooperative slot, symbol and time spreading.Let Y ∈ C K×R×N ×P be a 4 th order tensor representing the baseband RD data signals at the base station: [Y] k,r,n,p = y (RD) k,r,n,p , for k = 1,...,K, r = 1,...,R, n = 1,...,N and p = 1,...,P.From (29), an element of Y is given as follows: where h r,m is defined as: The tensor Y can be expressed using the mode-n product as follows: The tensor Y can be reorganized in unfolded matrices.In this work, we use the following unfolded matrices [27]: An important feature of the tensor model of ( 35) is that it is essentially unique if the following conditions is verified [30], [31]: where k A is the Kruskal rank of the matrix A, (similarly to H, S and C).The concept of Kruskal rank can be found fairly explained in [32].( Given that a matrix whose elements are drawn independently from an continuous distribution has full k-rank with probability one [9], then H has full k-rank with probability one.Such assumption is valid when the user signals undergo independent fading channels.The matrix A is assumed to be full k-rank because we modeled it as a Vandermonde matrix with distinct generators.The symbols matrix S is full k-rank with a high probability if N is sufficiently large (in comparison to the modulation cardinality and the number of users).For last, the matrix C can be designed to be full k-rank.Thus, based on condition (43), we have great flexibility for choosing K, R, N and P, which is the one of the reasons for considering the tensor approach: it provides different trade-offs for the system model parameters.For instance, we have: • If min(R, M) + min(N, M) + min(P, M) ≥ 2M + 3, then, we may set K = 1, which means that 1 antenna at the base station is sufficient for M users.Thus, the system supports more users than relays and sensors.• If min(K, M)+min(R, M)+min(P, M) ≥ 2M + 3, then N = 1 satisfies (43), which means that a short block length is sufficient for detection.
then, we may choose P = 1, which corresponds to a scenario with no time-spreading coding, leading to the tensor models presented in [16] and [17].43), which means one relay per user is used, leading to the tensor model presented in [18].The tensor modelings of [16], [17], [18] can then be viewed as particular cases of the present work.It is also worth mentioning that, if K, N ≥ M, then (43) turns into min(R, M)+ min(P, M) ≥ 3.This means that we may set R = 2 and P = 1 (or R = 1 and P = 2), leading to a maximum system transmission rate of M/3.

B. Nested PARAFAC-Tucker-1 Model
Let us now consider the system model of Section III.B, i.e. with multiuser interference considered at the relays.In this case, we construct the tensor of received signals in a way slightly different from that used in Subsection IV-A, with a change on the order of the dimensions being carried out.Let Y ∈ C K×P ×R×N be the 4 th order tensor that contains the received signals in (34): [Y] k,p,r,n = y (RD) k,r,n,p , for k = 1,...,K, p = 1,...,P, r = 1,...,R and n = 1,...,N.By omitting the AWGN term, a typical element of Y can be expressed by: for k = 1,...,K, p = 1,...,P, r = 1,...,R and n = 1,...,N, where hm,r, m = h Note that (44) follows the NPT1D ( 21) with the following correspondences: and (C (1) , D (1) , D (2) , where H ∈ C M ×R×M is a 3 rd order tensor with [ H] m,r, m = hm,r, m.From ( 18)-( 21), the tensor Y can also be represented in following way: where is a PARAFAC tensor and is a Tucker-1 tensor.Moreover, the tensor Y can be expressed as a PARAFAC tensor, with factor matrices A, C and Z (1) , in the following way: where z l,m = [Z (1) ] l,m , with Z (1) ∈ C N R×M being the mode-1 unfolded matrix of the 3 rd order Tucker-1 tensor Z ∈ C M ×R×N , with core tensor H and factor matrix S, as follows: In ( 51) and (52), we have merged the indices r and n into the index, l, where l = (n-1)R + r and l = 1,...,L, with L = NR.The mode-1 and mode-3 unfoldings of Z are given by: and where I R ∈ C R×R is the identity matrix and H(1) ∈ C M R×M and H(3) ∈ C RM ×M are the mode-1 and mode-3 unfoldings of H.The 3 rd order tensor Y ∈ C K×P ×L , defined as [Y] k,p,l = y k,p,l , admits the following unfoldings: In this case, the trilinear PARAFAC decomposition (51) is essentially unique if the condition below is satisfied [32]: If condition (58) is satisfied, thus, any other set of matrices (A', C' and Z' (1) ) that satisfies (51) is related to the original matrix set (A, C and Z (1) ) by A' = AΠ∆ A , C' = CΠ∆ C and Z' (1) = Z (1) Π∆ Z (1) , where ∆ A ∆ C ∆ Z (1) = I M .If we assume that A, C and Z (1) are all full k-rank, we have: As mentioned in Section III.A, A and C can be assumed to be full k-rank.From (53), assuming that S is full rank, as explained in Section III.A, and that H( 1) is full column rank ( hm,r, m is drawn from a continuous distribution), then Z (1) is full k-rank.
From (59), we can then determine some parameters of the adopted system, as the number of users the proposed receiver can handle.For instance, we have: It means that RN ≥ 2, which could give us, for example, R = 1 and N = 2, thus a short block length and a single relay simultaneously would be sufficient to handle M users.• If L ≥ M and K ≥ M, then we may choose P = 2.
• If L ≥ M and P ≥ M, then K = 2 is enough to verify (59).It is important to notice that H and S are not unique, as the Tucker-1 model (52) is not unique.To overcome this problem, in the next section, we propose a supervised receiver where pilot symbols are used to estimate H during a training period.After that, in a non-supervised period, the unknown symbols are estimated using the estimation of H obtained during the training period.

V. RECEIVER ALGORITHMS
In this section, four estimation algorithms are proposed for the scenarios presented in Section III.Two of the presented receivers are iterative algorithms based on the ALS method and the other two receivers are non-iterative techniques.For the first scenario, two semi-blind receivers are proposed, while two supervised estimation techniques are derived for the second scenario.

A. Semi-Blind ALS receiver for the first scenario
Let us assume that there is no channel information at the receiver neither at the transmitter.The iterative algorithm presented below is based on the ALS method, which will fit the quadrilinear model to the received data tensor [33].The unfoldings (38)-( 40) are used to estimate A, H and S, where we assume knowledge of the matrix C. Algorithm 1 Semi-Blind ALS Receiver 1) Set i = 0; Initialize Â(i=0) and Ĥ(i=0) randomly; 2) i = i + 1; 1) T ; 6) Repeat steps 2-5 until convergence; 7) Remove scaling ambiguities; The quadrilinear ALS algorithm is shown in Algorithm 1, where Ĥ(i) , Ŝ(i) and Â(i) are the estimations of H, S and A at the i-th iteration, respectively.The tensor error at the end of the i-th iteration is given by: The convergence of the algorithm is obtained when |e(i) − e(i − 1)| < 10 −6 .
As we assume perfect knowledge of C at the receiver, the PARAFAC decomposition has no permutation ambiguity [9], i.e.Π = I M .After obtaining the estimations of A, H and S with Algorithm 1, it is necessary to remove scaling ambiguity.The scaling ambiguity of Â is removed by considering that the first row of A is known (the first row of A is composed of 1's).To remove scaling ambiguity from Ŝ, the first row of S is assumed as known (one pilot symbol per user).After obtaining the scaling matrices of Â and Ŝ, we estimate and cancel the scaling matrix ∆ H of Ĥ using the following relationship: , where I is the number of iterations.
The Semi-Blind ALS Receiver must assure the existence of the pseudo-inverses in Steps 2, 3 and 4 of Algorithm 1, which requires the following necessary identifiability conditions P KR ≥ M, P KN ≥ M and N RP ≥ M , respectively.

B. LS-KRF receiver for the first scenario
A non-iterative solution for semi-blind joint detection and channel estimation denoted LS-KRF is presented in the sequel.Assuming knowledge of matrix C and using (41), the following LS estimation K = (A H S) ∈ C KRN ×M can be obtained as follows: The second step of the algorithm consists in obtaining a 3 rd order tensor Km ∈ C K×R×N from the m-th column of the matrix K, in the following way: for m = 1, ..., M , r = 1, ..., R, n = 1, ..., N and k = 1, ..., K.
As K is a double-Khatri-Rao product, we have:  A(:,m), H(:,m) and S(:,m) can then be estimated from a low rank estimation method.Let us consider the following unfoldings of Km : for m = 1,...,M.A(:,m), H(:,m) and S(:,m) are optimally estimated as the dominant left-singular vectors of m and K(3) m , respectively.After obtaining the estimations of A, H and S, the scaling ambiguities can be removed in the same way as for the ALS receiver of Subsection V.A, with no permutation ambiguity.The Semi-Blind LS-KRF Receiver is detailed in Algorithm 2 and its computational complexity is given by O The only identifiability condition of Semi-Blind LS-KRF Receiver is the existence of the pseudo-inverse of C T in (61), which leads to r (C) = M , where r (C) denotes the rank of C, implying P ≥ M .By comparing this identifiability condition with the ones of the Algorithm 2 shown in Subsection V.A, we conclude that the Semi-Blind ALS works with weaker conditions than Semi-Blind LS-KRF.

C. ALS supervised receiver for the second scenario
In this subsection, a supervised ALS-based receiver is presented for the system model of Subsection III.B.The supervised receiver estimates the parameters of the system in two phases.The first phase is a supervised stage where a short training sequence is transmitted by the users to help the estimations of H(3) and A. On the second phase, the users' data symbols are transmitted (non-supervised stage) and the receiver estimates the symbols using the estimations of H(3) and A obtained during the first phase.It is assumed previous knowledge of the coding matrix C.
The unfolded matrices of the received signals during the training period (first phase) are given by: Algorithm 3 Supervised ALS Receiver First Phase (supervised stage) 1) Set i = 0; Initialize Â(i=0) randomly; 2) i = i + 1; 3) Ẑ( 1) t ; 5) Repeat steps 2-4 until convergence; 6) Remove scaling ambiguities of Â and Ẑ(1) t ; 7) Reorganize Ẑ(1) T ; and Y (3) where L t = N t R, S t ∈ C Nt×M is a matrix that contains the pilot symbols and N t is the number of pilot symbols per user.
The supervised ALS Receiver is detailed in Algorithm 3.During the first phase, the ALS algorithm is used to estimate the factor matrices A and Z (1) t using (69) and (67), as shown in Steps 1 to 5 of Algorithm 3, where Ẑ(1) t(i) and Â(i) are estimations of Z (1) t and A at the i-th iteration, respectively, while Â and Ẑ t are the estimations of A and Z (1) t after convergence, respectively.The stop criterion is similar to the one of Algorithm 1.As in Subsection V.A, there is no permutation ambiguity as the matrix C is assumed known at the receiver.The scaling ambiguity of the matrix Â is removed using the fact that its first row of is known.To remove the scaling ambiguity of Ẑ t we use the following identity: A .For estimating the global channel tensor H, we reorganize Ẑ(1) for m = 1,...,M, n t = 1,...,N t and r = 1,...,R.From (54), we have: can be estimated in the the following way: as shown in Steps 7 and 8 of Algorithm 3.
After the estimation of H(3) , the supervised stage is over and the second phase starts.During the second phase, the matrix Z (1) , which corresponds to the matrix Z (1) t during the nonsupervised stage, is expressed by: where the matrix Â is the estimation of A obtained during the first phase of the algorithm and Y (3) is the unfolding matrix of the received signal during the non-supervised stage.After obtaining Ẑ(1) , we reorganize it into Ẑ(3) , as follows: for m = 1, ..., M , r = 1, ..., R and n = 1, ..., N .The matrix S is then estimated from Ẑ(3) and Ĥ(3) (obtained during the supervised stage), in the following way: T .
The computational complexity of Algorithm 3 is ).The Supervised ALS Receiver must assure the existence of the pseudo-inverses in Steps 3, 4, 8, 9 and 11 of Algorithm 3, which requires the following necessary identifiability conditions P K ≥ M, P N t R ≥ M and N t ≥ M .

D. LS-KRF supervised receiver for the second scenario
In this subsection, a closed-form LS-KRF algorithm for the system model of Subsection III.B is presented.Similarly as the ALS algorithm of Subsection V.C, the estimation here is done in two phases: a supervised stage where pilot symbols are transmitted by the users in order to the receiver estimate H (3) and A, and a non-supervised phase, where the users' data symbols are transmitted and the receiver estimates the symbols using the estimations of H(3) and A previously obtained.The supervised LS-KRF Receiver is shown in Algorithm 4.During the supervised stage, the matrix G = (Z (1) t A) ∈ C LtK×M can be estimated from (68) as: In the second step of the algorithm, a matrix Ĝm ∈ C Lt×K is constructed from the m-th column of Ĝ, as follows: for m = 1, ..., M , l = 1, ..., L t and k = 1, ..., K, leading to: which corresponds to a rank-1 matrix.A(:,m) and Z t (:,m) are then estimated, respectively, as the dominant left and right singular vectors of Ĝm .The remaining steps of the proposed supervised LS-KRF Receiver are identical to steps 6-11 of Algorithm 3. The computational complexity of the Supervised LS-KRF Receiver is given by O(R 2 M 3 N t ).
The Supervised LS-KRF Receiver must assure the existence of the pseudo-inverses in Steps 1 and 5 of Algorithm 4, which requires the following necessary identifiability condition P ≥ M .By comparing this identifiability condition with the ones of the Algorithm 3 shown in Subsection V.C, we conclude that the Supervised ALS works under weaker conditions than Supervised LS-KRF.

VI. SIMULATION RESULTS
This section presents simulations results in order to evaluate the performance of the proposed receivers.The following scenario was adopted.The wireless channels have quasi-static frequency-flat Rayleigh fading, the complex envelope coefficients β (RD) l,r,m being independent and identically distributed (i.i.d.), and drawn from a complex Gaussion distribution with zero-mean and variance following an exponent law with path loss exponent equal to 4. The time-spread coding matrix C ∈ C P ×M is a truncated Hadamard matrix, i.e. if P ≥ M , C is the transpose of the matrix that contains the first M rows of the Hadamard matrix of dimension P and, if P < M , C is formed from the first P rows of the Hadamard matrix of dimension M .Moreover, 16-QAM modulation is used and the number of multipaths L (RD) r,m is equal to 20.The results were averaged over 20.000 independent Monte Carlo samples, the relays use a variable AF gain and P s = P r = 1, where P s and P r are the source and relay transmission powers, respectively.When not state otherwise, we used K = 3, R = 3, P = 8, N = 16, M = 4 and N t = 8.The symbol error rate (SER), channel normalized mean square error (NMSE) and average processing time are shown in function of the mean signal-to-noise ratio (SNR) of the RD link.The NMSE of A is defined as: where M C is the number of Monte Carlo runs, A (l) is the matrix A during the l-th Monte Carlo run and Â(l) is the estimation of A (l) .A similar expression is used for the NMSE of H.The simulation results corresponding to the scenarios with and without multiuser interference at the relay are presented in Subsections VI-A and VI-B, respectively A. No multiuser interference at the relays Fig. 5 shows the SER of the proposed semi-blind receivers (Algorithms 1 and 2), as well as the SER provided by the Zero Forcing (ZF) receiver and by two tensor-based receivers of previous works in the literature ( [17] and [18]).For the proposed receivers, we used M = 4, R = P = 2, and for the receivers of [17] and [18], we used, respectively M = R = 2 and M = P = 2.These parameters were set to provide similar transmission rate for all the receivers.The ZF receiver corresponds to the Step 3 of Algorithm 1, but assuming perfect

SER
Receiver of [18] Receiver of [17] Proposed Sem-Blind ALS Proposed Semi-Blind LS-KRF Zero-Forcing knowledge of the factor matrices A, C and H.As we can see from Fig. 5, the ZF receiver performed better in comparison to all the other techniques, as expected.However, by the slope of the SER curves, it can be viewed that the proposed techniques achieve the same order of diversity than the ZF method.Moreover, the proposed semi-blind receivers showed better SER performances in comparison to the ones of [17] and [18].This is due to the fact that the proposed semiblind receivers exploit both the time-spreading and cooperative diversity, contrarily to the other receivers.Indeed, [17] does not use time-spreading and [18] does not exploit cooperative diversity.The proposed Semi-Blind ALS Receiver unifies and generalizes the approaches of [17] and [18], taking advantage of their characteristics.One can also note from Fig. 5 that the proposed receivers provided very similar SERs.Fig. 6 shows the SER of the proposed semi-blind receivers (Algorithms 1 and 2) for several values of the number R of relays per user and the time-spread coding length P .It can be viewed from this figure that, when the number of relays is increased, the SER is significantly decreased.Moreover, by the slope of the SER curves, we see that increasing R leads to a higher order of diversity.This result shows that the proposed receivers exploit efficiently the cooperative diversity provided by the relays.Note also that the two proposed semi-blind receivers have similar SER performances for all the tested cases.Besides, when the value of P is decreased, the SER suffers a small decrease, with the same slope.This is expected, since the system does not exploit time diversity.However, it is important to note that, although the parameter P does not have a great impact on the SER, it plays an important role on the uniqueness conditions, as pointed out in Section IV.
Figs. 7 and 8 shows the NMSEs of the matrices A and H provided by the proposed semi-blind receivers (Algorithms 1 and 2), for several values of the number R of relays per user.These figures show that an increment in the number of relays enhances the channel estimation quality, due to the higher number of received signals obtained when R is increased.Moreover, the NMSE performance of the two proposed semiblind receivers are very close, similarly to the SER behavior in Figs. 5 and 6.In Figure 7, the NMSE provided by the tensor-based receivers of [17] and [18] is also showed (with the same parameters of Fig. 5).It is also possible to see that the proposed techniques with R = 2 (similar transmission rate) perform better than the ones of [17] and [18]. of the training sequence, the symbol estimations become more accurate.In particular, when N t = 8, the SER of proposed techniques is very close to that of the ZF receiver.One can also note from Fig. 9 that the proposed supervised receivers provided very similar SERs.Fig. 10 shows the SER versus SNR for the proposed supervised receivers (Algorithms 3 and 4), with several values of R and P .From Fig. 10, we can observe that the two supervised receivers provided very similar SERs.Moreover, a better SER performance is obtained when the number of relays is increased, due to the higher degree of cooperative diversity provided by the relays.As well as for the semi-blind techniques, the proposed supervised techniques are able to increase the order of diversity when R is augmented, even with multiuser interference at the relays.As well as for the semiblind techniques, we can observe that decreasing the value of P leads to a small SER loss.  .This is due to the fact that the dimensions of the matrix A does not depend on R, while the number of rows of H(3) are proportional to R. Thus, when R is augmented, the number of received signal is increased, however, the number of parameters to be estimated of H( 3) is proportionally increased. .

C. Comparison Between Scenarios
Fig. 12 compares the SER provided by the Semi-Blind and Supervised ALS receivers (Algorithms 1 and 3, respectively), with N t = 4 and 8 in the supervised technique.We can see from this figure that the performance of the supervised ALS highly depends of N t .Indeed, for N t = 4, the supervised ALS provides a SER higher than the semi-blind technique.However, the supervised ALS outperforms the semi-blind scenario when N t = 8.
Aiming to provide a fairer comparison between Algorithms 1 and 3 in terms of transmission rate, we simulated a supervised version of Algorithm 1.This technique, denoted by Supervised Alg. 1, can be summarized as follows.In the first part (supervised stage), the ALS algorithm is carried out with the knowledge of the matrix S (Step 3 of Algorithm 1 is not performed).In the second part (non-supervised stage), the matrix is estimated by means of a ZF receiver using the estimations of A and H obtained in the first stage.Fig. 12 shows the SER provided by the Supervised Alg. 1 technique with N t = 8.It can be viewed that this technique significantly outperforms the Supervised ALS method, due to the use of pilot symbols.However, the Supervised Alg. 1 provides a SER performance much worst than the Supervised ALS (Algorithm 3).This is due to the fact that the Supervised ALS does not consider multiuser interference at the relays.
The last two results evaluate the computational complexity of the proposed algorithms.Figure 13 shows the number of iterations for convergence of the ALS algorithms versus the SNR, whereas Figure 14 shows the FLOPS (floating operations per second) count versus the number of users M, for a fixed SNR of 15 dB.As expected, in Figure 13, the number of iterations decreases when the SNR is augmented.Moreover, by increasing M, more iterations are needed for convergence, due to the higher number of data to be estimated.We can also see from Fig. 13 that the Supervised ALS needs less iterations to converge than the Semi-blind ALS, due to the fact the Supervised ALS assumes the knowledge of the symbol matrix.
In Figure 14, it can be viewed that the number of floating operations per second required for the Semi-Blind ALS is higher than the ones of the other algorithms, due to its higher number of iterations, as shown in Fig. 13.On the other hand, due to the smaller number of iterations, also showed in  Fig. 13, the Supervised ALS requires an intermediary amount of operations.The Supervised LS-KRF showed the second higher number of floating operations per second because it has to compute the SVD of a large dimension matrix ( Ĝ ∈ C NtKR×M ), whereas the Semi-Blind LS-KRF performs SVDs of lower dimension matrices, being the algorithm that requires less operations.

VII. CONCLUSION
In this paper, four tensor-based receivers that jointly estimate the transmitted symbols, channel gains and spatial signatures are proposed for a cooperative multirelay communication system.The presented system model takes advantage of both cooperative and spatial diversities, as well as of a timespread coding used by the relays.Two propagation scenarios were considered.In the first scenario, multiuser interference is not considered at the relays, whereas in the second scenario multiuser interference at the relays is assumed.The proposed algorithms are based on the iterative ALS algorithm and on the non-iterative LS-KRF method.A quadrilinear PARAFAC model was adopted for the first scenario whereas a new tensor model called NPTD was used in the second scenario.
The LS-KRF receivers performed equally to their ALS counterparts, with less complexity.On the other hand, the proposed ALS receivers have weaker identifiability conditions than their LS-KRF counterparts.The results showed that, increasing the number of relays or the training sequence length, we can compensate the negative effects that a high number of users add to the system performance.Perspectives of this work include the proposition of new algorithms, the use of other coding schemes and other propagation scenarios.The development of another receiver, based on the approach of [34], is also a perspective of this work.

Fig. 4 .
Fig. 4. Cooperative Uplink with M users and 3 relays for each user.

(
35) corresponds to a PARAFAC model with factor matrices given by A ∈ C K×M , H ∈ C R×M , S ∈ C N ×M and C ∈ C P ×M , where A is the antenna array response matrix with [A] k,m = a k (θ m ), H is the channel coefficient matrix with [H] r,m = h r,m , S is symbol matrix with [S] n,m = s n,m and C is the time-spreading coding matrix with [C] p,m = c p,m .
If condition (42) is satisfied, any set of matrices (A', H', C' and S') that satisfies (35) are related with the original factor matrices (A, H, C and S) by A' = AΠ∆ A , H' = HΠ∆ H , C' = CΠ∆ C and S' = SΠ∆ S , with Π ∈ C M ×M being a permutation matrix and ∆ A , ∆ H , ∆ C and ∆ S are diagonal matrices that meet ∆ A ∆ H ∆ C ∆ S = I M .Assuming that A, H, C and S are all full k-rank, then, condition (42) turns into: min(K, M ) + min(R, M ) + min(N, M ) + min(P, M ) ≥ 2M + 3.

Fig. 9 3 Fig. 8 .Fig. 9 .
Fig. 9 shows the SER of the proposed supervised receivers (Algorithms 3 and 4), as well as the SER provided by the ZF receiver, for several values of the training sequence length N t (N t = 4, 6 and 8).The ZF receiver corresponds to the Step 11 of Algorithm 3, but assuming perfect knowledge of all factor matrices.It is possible to see from this figure that the SER is decreased when the length of the training sequence is augmented.As expected, by increasing the length

Fig. 11
Fig.11shows the NMSE of A and H(3) versus the SNR for the proposed supervised receivers (Algorithms 3 and 4), with several values of R. We can conclude from this figure that the NMSE of the two proposed supervised receivers are very close.Moreover, we can observe that an increase in the value of R improves the estimation accuracy of A, but it does not improve the estimation of H(3) .This is due to the fact that the dimensions of the matrix A does not depend on R, while the number of rows of H(3) are proportional to R. Thus, when R is augmented, the number of received signal is increased, however, the number of parameters to be estimated of H(3) is proportionally increased..

8 Fig. 13 .
Fig. 13.Number of iterations versus SNR -ALS receivers for several values of M.

Fig. 14 .
Fig. 14.FLOPS count versus number of users (M) of all proposed algorithms.