Performance Evaluation of Low-Complexity Algorithms for Orthogonal Time-Frequency Space Modulation

Data transmission in wireless systems brings numerous challenges, especially when the signal propagates in a multipath scenario over rapidly time-varying channels. In this context, Orthogonal Time-Frequency Space (OTFS) modulation has recently been proposed for time-frequency selective channels with high Doppler spread. In this modulation, the symbols are first multiplexed in the delay-Doppler domain rather than in the time-frequency domain used by Orthogonal Frequency Division Multiplexing (OFDM). Studies point out advantages of OTFS over OFDM in several respects, such as a higher data rate under high mobility. Another advantage is the sparsity of the channel seen by OTFS, which allows low-complexity algorithms to be used for data detection. In this paper, the performance of OTFS modulation in a doubly dispersive channel is evaluated with several low-complexity variants of the message passing algorithm (MPA) in terms of complexity and Bit Error Rate (BER). The results show that the MPA and Approximate Message Passing simplified by Expectation Propagation (AMP-EP) algorithms achieve the best BER performance. However, when both complexity and BER are taken into account, AMP simplified by First-Order (AMP-FO) achieves the best performance-complexity tradeoff.


I. INTRODUCTION
The growing demand for data rate and the growing number of User Equipments (UE), both confined to a limited electromagnetic spectrum for wireless communications, make the design requirements more challenging with each new generation. Fourth generation (4G) networks have been a great success, owing to their ability to provide high data rates to a large number of users with Orthogonal Frequency Division Multiplexing (OFDM) modulation [1], [2].
OFDM is a special form of multicarrier modulation in which the subcarriers are mutually orthogonal, and it is particularly suited to transmission over dispersive channels. It is a wideband modulation scheme designed to cope with multipath channels: the wideband frequency-selective fading channel is divided into many narrowband subchannels. If the number of subchannels is large enough, each subchannel may be considered flat.
Although OFDM has been adopted in 4G mobile systems, it is not robust to time-varying channels with high Doppler spread (such as high-speed rail communications) [3]. The Orthogonal Time-Frequency Space (OTFS) modulation proposed by Hadani et al. [4] appears as a solution, targeting for example applications in the fifth generation (5G) of mobile systems, which require a higher data rate and a higher UE speed.
OTFS shows significant advantages over OFDM in doubly dispersive channels [5], [6], [7]. The delay-Doppler domain provides an alternative representation of a Linear Time-Varying (LTV) channel, in which the variation is caused by moving objects in the multipath scenario. The OTFS modulator spreads each information symbol (e.g., QAM) over a set of two-dimensional orthogonal basis functions that span the time-frequency resources required to transmit a burst. The set of basis functions is specifically designed to cope with the dynamics of the time-varying multipath channel.
Using the technique of transformations in two dimensions, OTFS converts a doubly-dispersive channel into an almost non-fading channel in the delay-Doppler domain [8]. Hence, each symbol in a frame suffers an almost constant fade, thus achieving significant performance gains over existing modulation schemes that are not robust to strong Doppler such as OFDM. Moreover, since there are typically a small number of physical reflectors with Dopplers and associated delays in a multipath channel, few parameters are required for channel modeling and estimation in the delay-Doppler domain.
The impulse response of the channel in the delay-Doppler domain is a sparse matrix [9], which allows the use of low-complexity detection algorithms such as the message passing algorithm (MPA). This sparsity also has important implications for channel estimation, prediction and tracking [4].
The aim of this paper is to study, evaluate and compare different low-complexity MPA-based detectors for OTFS systems over time-frequency selective channels with high Doppler. Both the bit error rate (BER) performance and the complexity are evaluated. The considered algorithms are the following: i) Factor Graph with Gaussian Approximation of Interference (FG-GAI), proposed in [10], whose linear complexity is attractive for detection in large-dimension channels; ii) Approximate Message Passing using Gaussian Approximation (AMP-GA), in which the probability messages are replaced by means and variances exchanged between the nodes; iii) AMP simplified by Expectation Propagation (AMP-EP); and iv) AMP simplified by First-Order (AMP-FO), proposed in [11].
The paper is organized as follows. Section II presents the characteristics of the OTFS system and the adopted channel model. Section III details the different detection algorithms based on message passing. Section IV presents the complexity analysis of the algorithms. Section V considers the coupling of OTFS with MIMO systems. Section VI presents and discusses the simulation results, and Section VII is dedicated to the conclusions and perspectives.

II. GENERAL DESCRIPTION OF OTFS
Traditional OFDM modulation operates in the time-frequency domain. An OFDM resource element (RE) occupies one subcarrier on one particular OFDM symbol. In contrast, OTFS modulation operates in the delay-Doppler domain, which is related to frequency and time by the Symplectic Finite Fourier Transform (SFFT), a two-dimensional Discrete Fourier Transform (DFT) [4], [12]. The OTFS modulation framework can be understood as a time-frequency multicarrier modulation with an additional pre-processing step that transforms the information symbols from the delay-Doppler domain to the time-frequency domain by the Inverse Symplectic Finite Fourier Transform (ISFFT). Hence, OTFS can be implemented as a pre-processing step on top of an underlying OFDM signal [13].
In OTFS, the quadrature amplitude modulation (QAM) symbols are indexed by points on a grid in the delay-Doppler domain. Through the ISFFT, each QAM symbol weights a 2D basis function defined in the time-frequency domain. The size of the delay-Doppler resource grid is related to the size of the time-frequency plane by the signal properties, i.e. the bandwidth ($B$), the Transmission Time Interval ($TTI$), the pulse duration ($T$), the subcarrier spacing ($\Delta f$), the number of subcarriers ($M$) and the symbol block length ($N$).
The delay-Doppler grid then consists of $M$ points (number of subcarriers) along the delay axis with spacing $\Delta\tau = \frac{1}{M\Delta f}$ and $N$ points (number of symbols) along the Doppler axis with spacing $\Delta\nu = \frac{1}{NT}$. The reciprocal time-frequency grid consists of $M$ points along frequency with spacing $\Delta f = \frac{B}{M}$ and $N$ points along time with spacing $T = \frac{TTI}{N}$ [3]. Hence, the time-frequency grid can be interpreted as a sequence of $N$ multicarrier symbols, each consisting of $M$ subcarriers; the transmission bandwidth $B$ is the inverse of the delay resolution $\Delta\tau$, and the transmission duration $TTI$ is the inverse of the Doppler resolution $\Delta\nu$. The two grids are shown in Figure 1.
In summary, the time-frequency plane is discretized by sampling the time and frequency axes at intervals of $T$ (seconds) and $\Delta f$ (Hz), which gives the grid:

$$\Lambda = \left\{ (nT,\ m\Delta f),\ n = 0, \dots, N-1,\ m = 0, \dots, M-1 \right\}. \tag{1}$$

Consequently, the delay-Doppler plane is discretized as:

$$\Gamma = \left\{ \left(\frac{k}{NT},\ \frac{l}{M\Delta f}\right),\ k = 0, \dots, N-1,\ l = 0, \dots, M-1 \right\}. \tag{2}$$

A 2D ISFFT maps the information symbols $x[k, l]$ of the grid $\Gamma$ in (2), in the delay-Doppler domain, onto a sequence of complex numbers $X[m, n]$ on the grid $\Lambda$ in (1), in the time-frequency domain, as follows:

$$X[m, n] = \frac{1}{\sqrt{MN}} \sum_{k=0}^{N-1} \sum_{l=0}^{M-1} x[k, l]\, e^{j2\pi\left(\frac{nk}{N} - \frac{ml}{M}\right)}. \tag{3}$$

With these pre- and post-processing blocks in place, conventional OFDM modulation and demodulation can be used. Figure 2 shows the OTFS system diagram. The OFDM modulator is applied to the time-frequency symbols $X[m, n]$ to convert them into the time-domain signal $s(t)$ for transmission over the channel. Hence, the output signal of the OTFS transmitter is:

$$s(t) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} X[m, n]\, g_{tx}(t - nT)\, e^{j2\pi m\Delta f (t - nT)}, \tag{4}$$

where $g_{tx}(t)$ is the pulse-shaping waveform used at the transmitter. In the following, we consider rectangular pulses of unit amplitude and duration $T$.
From equation (4), it is observed that every OTFS QAM symbol is spread over the full time-frequency grid and hence it is possible to exploit all the channel diversity.
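The spreading property can be checked numerically with a minimal sketch. It assumes the array layout `x_dd[l, k]` (delay index along rows, Doppler index along columns), unitary scaling, and the hypothetical helper names `isfft` and `sfft`; none of these choices come from the paper itself.

```python
import numpy as np

def isfft(x_dd):
    """Delay-Doppler -> time-frequency (assumed layout: x_dd[l, k])."""
    # Unitary DFT along delay (gives frequency), unitary inverse DFT
    # along Doppler (gives time).
    freq = np.fft.fft(x_dd, axis=0, norm="ortho")
    return np.fft.ifft(freq, axis=1, norm="ortho")

def sfft(X_tf):
    """Time-frequency -> delay-Doppler, the inverse of isfft."""
    delay = np.fft.ifft(X_tf, axis=0, norm="ortho")
    return np.fft.fft(delay, axis=1, norm="ortho")

M, N = 8, 4                      # toy grid size, chosen for illustration
x_dd = np.zeros((M, N), dtype=complex)
x_dd[2, 3] = 1.0                 # a single symbol in the delay-Doppler grid
X_tf = isfft(x_dd)

# The single symbol occupies every time-frequency bin with equal magnitude.
spread_ok = np.allclose(np.abs(X_tf), 1 / np.sqrt(M * N))

# The two transforms are inverses of each other.
rng = np.random.default_rng(0)
x_rand = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
roundtrip_ok = np.allclose(sfft(isfft(x_rand)), x_rand)
```

Because one delay-Doppler symbol has constant magnitude over all $MN$ time-frequency bins, it experiences every fade of the time-frequency channel, which is the diversity argument made above.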
The transmitted signal $s(t)$ propagates through a time-varying channel with complex baseband impulse response $h(\tau, \nu)$ and noise $w(t)$. After passing through the channel, the received signal $r(t)$ is given by:

$$r(t) = \int\!\!\int h(\tau, \nu)\, s(t - \tau)\, e^{j2\pi\nu(t - \tau)}\, d\tau\, d\nu + w(t). \tag{5}$$

The channel can be characterized by its Delay-Doppler Profile (DDP) [14], which contains the delay and Doppler values associated with each multipath reflector. Given the sparsity of the channel representation, it is convenient to express the response $h(\tau, \nu)$ in the form:

$$h(\tau, \nu) = \sum_{i=1}^{P} h_i\, \delta(\tau - \tau_i)\, \delta(\nu - \nu_i), \tag{6}$$

where $P$ is the number of propagation paths, $h_i$, $\tau_i$ and $\nu_i$ represent the path gain, delay and Doppler shift (or frequency) associated with the $i$-th path, respectively, and $\delta(\cdot)$ denotes the Dirac delta function. The delay and Doppler taps of the $i$-th path are equal to:

$$\tau_i = \frac{l_{\tau_i} + \tilde{l}_{\tau_i}}{M\Delta f}, \qquad \nu_i = \frac{k_{\nu_i} + \tilde{k}_{\nu_i}}{NT},$$

where $l_{\tau_i}$ and $k_{\nu_i}$ are the integer delay and Doppler indices, $\tilde{l}_{\tau_i}$ denotes the fractional delay, $\tilde{k}_{\nu_i}$ denotes the fractional Doppler shift, and $NT$ and $M\Delta f$ denote the total duration and bandwidth of the transmitted signal frame, respectively, with $T\Delta f = 1$. The interference caused by fractional delay and fractional Doppler shift is effectively suppressed when $M$ and $N$ are large enough to approximately achieve the ideal OTFS resolution, so we can consider $\tilde{l}_{\tau_i} = \tilde{k}_{\nu_i} = 0$ [15]. The received signal $r(t)$ is sampled at the rate $f_s = M\Delta f = \frac{M}{T}$ to form the sequence $r[n]$, whose entries, from eq. (5) and eq. (6), are equal to:

$$r[n] = \sum_{i=1}^{P} h_i\, e^{j2\pi \frac{k_{\nu_i}(n - l_{\tau_i})}{MN}}\, s\big([n - l_{\tau_i}]_{MN}\big) + w[n], \qquad n = 0, \dots, MN-1,$$

where $[\cdot]_{MN}$ denotes the modulo-$MN$ operation. Then, at the receiver, the time-domain received signal is mapped to the time-frequency domain by an OFDM demodulator, and then to the delay-Doppler domain by the SFFT.
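Based on this mathematical description, the authors of [16] use vector and matrix identities to describe the OTFS system in the discrete domain. Following the same notation, with $\mathbf{a}$ for a vector, $\mathbf{A}$ for a matrix and $\mathbf{A}^H$ for the Hermitian transpose, the transmitted signal can be written as:

$$\mathbf{s} = \mathrm{vec}\big(\mathbf{G}_{tx}\mathbf{F}_M^H(\mathbf{F}_M \mathbf{X} \mathbf{F}_N^H)\big) = (\mathbf{F}_N^H \otimes \mathbf{G}_{tx})\,\mathbf{x},$$

where $\mathbf{X} \in \mathbb{C}^{M \times N}$ denotes the two-dimensional information symbols transmitted in the delay-Doppler domain, $\mathbf{x} = \mathrm{vec}(\mathbf{X})$, $\mathbf{F}_n$ is the normalized $n$-point DFT matrix and $\mathbf{G}_{tx} \in \mathbb{C}^{M \times M}$ is the diagonal matrix of the transmitter pulse. The received signal can be written as:

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{w}, \tag{10}$$

where $\mathbf{w}$ is the noise vector and $\mathbf{H}$ is the following $MN \times MN$ matrix:

$$\mathbf{H} = \sum_{i=1}^{P} h_i\, \mathbf{\Pi}^{l_{\tau_i}} \mathbf{\Delta}^{k_{\nu_i}},$$

with $\mathbf{\Pi}$ the permutation matrix (forward cyclic shift) and $\mathbf{\Delta}$ the $MN \times MN$ diagonal matrix:

$$\mathbf{\Delta} = \mathrm{diag}\big[z^0, z^1, \dots, z^{MN-1}\big],$$

where $z = e^{\frac{j2\pi}{MN}}$. The matrices $\mathbf{\Pi}$ and $\mathbf{\Delta}$ model the delays and the Doppler shifts in eq. (5), respectively.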
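At the receiver, the received signal samples $\mathbf{r}$ are arranged into the matrix $\mathbf{R} = \mathrm{vec}^{-1}(\mathbf{r})$ and then transformed into the delay-Doppler domain symbols $\mathbf{Y} = \mathbf{F}_M^H(\mathbf{F}_M \mathbf{G}_{rx} \mathbf{R})\mathbf{F}_N$, i.e. an $M$-point FFT followed by an SFFT is applied. Here, $\mathbf{G}_{rx} \in \mathbb{C}^{M \times M}$ is the diagonal matrix of the receiver pulse. In vectorized form, the received signal in the delay-Doppler domain can be written as:

$$\mathbf{y} = \mathrm{vec}(\mathbf{Y}) = (\mathbf{F}_N \otimes \mathbf{G}_{rx})\,\mathbf{r}.$$

After substituting the transmitted signal vector $\mathbf{s}$ in eq. (10), we obtain:

$$\mathbf{y} = \mathbf{H}_{\text{eff}}\,\mathbf{x} + \tilde{\mathbf{w}}, \tag{15}$$

where $\mathbf{H}_{\text{eff}} = (\mathbf{F}_N \otimes \mathbf{G}_{rx})\,\mathbf{H}\,(\mathbf{F}_N^H \otimes \mathbf{G}_{tx})$ is a sparse matrix that denotes the effective channel matrix, and $\tilde{\mathbf{w}} = (\mathbf{F}_N \otimes \mathbf{G}_{rx})\mathbf{w}$ is the noise vector with variance $\sigma_0^2$. Owing to the sparsity of $\mathbf{H}_{\text{eff}}$, low-complexity detection algorithms can be used to obtain the estimated symbols [17]. Algorithms based on message passing are well suited to this task. They rely on a factor graph representation of the matrix $\mathbf{H}_{\text{eff}}$. The next section explains the different variants of message passing algorithms used in this paper.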
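The sparsity of the effective channel can be illustrated with a small numerical sketch. It assumes rectangular pulses (so both pulse matrices reduce to the identity), integer delay and Doppler taps, and a toy $4 \times 4$ grid; the function names and the tap values are hypothetical, not taken from the paper.

```python
import numpy as np

def dft_matrix(n):
    """Normalized n-point DFT matrix F_n."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def effective_channel(M, N, gains, delay_taps, doppler_taps):
    """H_eff = (F_N kron G_rx) H (F_N^H kron G_tx) with G_rx = G_tx = I_M."""
    MN = M * N
    Pi = np.roll(np.eye(MN), 1, axis=0)        # forward cyclic shift
    z = np.exp(2j * np.pi / MN)
    H = np.zeros((MN, MN), dtype=complex)
    for h, l, k in zip(gains, delay_taps, doppler_taps):
        Delta_k = np.diag(z ** (k * np.arange(MN)))   # Delta raised to k
        H += h * np.linalg.matrix_power(Pi, l) @ Delta_k
    F_N = dft_matrix(N)
    G = np.eye(M)                              # assumption: rectangular pulses
    return np.kron(F_N, G) @ H @ np.kron(F_N.conj().T, G)

# Hypothetical 3-path channel with integer (delay, Doppler) taps.
H_eff = effective_channel(M=4, N=4,
                          gains=[1.0, 0.5, 0.25j],
                          delay_taps=[0, 1, 2],
                          doppler_taps=[0, 1, -1])
nonzeros = np.abs(H_eff) > 1e-9                # ignore rounding noise
row_degree = nonzeros.sum(axis=1)
col_degree = nonzeros.sum(axis=0)
```

In this setting every row and every column of `H_eff` holds one non-zero entry per path, which is the regular structure that the factor graph of the next section relies on.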

III. DETECTOR ALGORITHMS BASED ON MESSAGE PASSING
Most estimation and inference problems in digital communications can be described with a graphical representation such as Bayesian networks or factor graphs [18]. A factor graph is a bipartite graph that specifies the joint distribution of random variables $x_i$ taking values in a given domain. It is composed of two sets of vertices (nodes) and a set of branches (edges). The two sets of nodes are:
• the variable nodes $x_i$, represented by circles in Fig. 3;
• the function nodes $f_j$, represented by squares in Fig. 3.
An example of a factor graph is given in Figure 3.
The number of branches $d_f(j) = |m(j)|$ that converge on a given function node is called the degree of the function node $f_j$, where $m(j)$ denotes the set of variable nodes connected to $f_j$. Similarly, the number of branches $d_x(i) = |n(i)|$ that converge on a given variable node is called the degree of the variable node $x_i$, where $n(i)$ denotes the set of function nodes connected to $x_i$. In our case, since the number of non-zero elements in each row and each column of $\mathbf{H}_{\text{eff}}$ is equal to the number of propagation paths $P$, the factor graph is regular.
In the next subsections, we detail the following algorithms based on MPA: the original MPA, the Approximate Message Passing (AMP) in the Factor Graph using Gaussian Approximation of Interference (FG-GAI) version [10], the AMP using Gaussian Approximation (AMP-GA), the AMP using Expectation Propagation (AMP-EP) and the AMP using First-Order (AMP-FO), introduced in [11].
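As an illustration of how the factor graph follows from the channel matrix, the sketch below reads the neighbour sets from the sparsity pattern. The helper name `factor_graph` and the toy circulant matrix are assumptions made for the example.

```python
import numpy as np

def factor_graph(H, tol=1e-12):
    """Return m(j) for every function node and n(i) for every variable node."""
    support = np.abs(H) > tol
    m = [np.flatnonzero(support[j]).tolist() for j in range(H.shape[0])]
    n = [np.flatnonzero(support[:, i]).tolist() for i in range(H.shape[1])]
    return m, n

# Toy matrix with P = 3 cyclic diagonals, standing in for a sparse H_eff.
O, shifts = 6, [0, 1, 3]
H_toy = sum(np.roll(np.eye(O), s, axis=0) for s in shifts)

m_sets, n_sets = factor_graph(H_toy)
d_f = [len(s) for s in m_sets]     # degrees of the function nodes
d_x = [len(s) for s in n_sets]     # degrees of the variable nodes
```

Here all degrees equal the number of diagonals, so the toy graph is regular in the same sense as the graph of the effective channel matrix.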

A. Message Passing Algorithm (MPA)
The aim of the message passing algorithm is to estimate the marginal probabilities $\mu_{x_i}$ of all variables $x_i$.
In the MPA, at each iteration, the algorithm computes messages (or beliefs) from the variable nodes to the factor nodes and then messages from the factor nodes to the variable nodes. The messages are usually propagated in parallel. This ordering, expressed in the factor graph context of the Sum-Product Algorithm (SPA), defines the message update schedule.
The notation used to describe the MPA-based algorithms is the following: $\mu_{f_j \to x_i}$ are the messages from the factor node $f_j$ to the variable node $x_i$, and $\mu_{x_i \to f_j}$ are the messages from the variable node $x_i$ to the factor node $f_j$.
These messages, or beliefs, are functions of the variable $x_i$ in both directions. The message from $x_i$ to $f_j$ represents the probability that $x_i$ takes a certain value, given the observed value of this variable and the values received from the other factor nodes linked to $x_i$, excluding $f_j$. With a $Z$-QAM constellation ($Z$ being the constellation size), each message takes $Z$ distinct values, one for each possible value of $x_i$.
To initialize the algorithm, the probability mass function at each factor node is first computed for each connected variable node and each candidate symbol $\alpha_s$, where $\alpha_s$ belongs to the alphabet $\mathcal{A}$ ($|\mathcal{A}| = Z$), $h_{j,l}$ is the element in the $j$-th row and $l$-th column of the matrix $\mathbf{H}_{\text{eff}}$ (channel transfer matrix), $\sigma_0^2$ is the noise variance, and $y_j$ is the corresponding received signal term.
The computation of the messages from the factor node $f_j$ to the connected variable nodes $x_i$ starts from the assumption that the symbols of the alphabet $\mathcal{A}$ are equally probable. Each message is then computed according to the sum-product rule [18] of eq. (17), in which the product of the messages received from the other variable nodes is summed out at each factor node $f_j$. Then, in eq. (18), the messages from variable node to factor node are updated as the product of the messages received from the other factor nodes. During the message exchange, a normalization step is applied, followed by a damping factor ($\Delta$). In this step, the messages $\mu_{x_i \to f_j}$ from the variable nodes to the factor nodes are normalized by their sum over the QAM symbols. The damping is then applied at each iteration $t$ by weighting the normalized messages, as shown by eq. (19).
Applying a damping factor is a technique that helps to minimize the BER when determining the best number of iterations of the decoding algorithm, depending on the density of the matrix $\mathbf{H}_{\text{eff}}$.
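A minimal sketch of the normalization and damping step is given below. It assumes the commonly used convex-combination form, in which the new normalized message is weighted by $\Delta$ and the previous message by $1 - \Delta$; the exact weighting of eq. (19) may differ, and the function names are hypothetical.

```python
import numpy as np

def normalize(msg):
    """Scale a length-Z message so that its entries sum to one."""
    return msg / msg.sum()

def damp(new_msg, old_msg, delta):
    """Assumed damping rule: delta * normalized new + (1 - delta) * old."""
    return delta * normalize(new_msg) + (1.0 - delta) * old_msg

old = np.full(4, 0.25)                   # equiprobable start, Z = 4
raw = np.array([8.0, 1.0, 0.5, 0.5])     # unnormalized new message
damped = damp(raw, old, delta=0.6)
```

With this form, a damping factor of one keeps the new message unchanged, while smaller values slow down the update, which tends to stabilize message passing on graphs with short cycles.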
At iteration $t$, the marginal distribution $\mu^t_{x_i}(x_i)$ can be estimated. Next, the Log Likelihood Ratio (LLR) is calculated to perform a test based on the ratio of probabilities and thus decide on the received bit sequence. The LLR $\Lambda_{b \to l}$, obtained from the messages transferred from the $b$-th factor node to the $l$-th variable node (with $b$ and $l$ playing the roles of $f_j$ and $x_i$), is based on the fundamentals of [20] and considers QAM symbols. The MPA is detailed in pseudo-code in Algorithm 1.
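The bit decision can be sketched as follows, assuming that the LLR of a bit is the logarithm of the ratio between the total probability of the symbols labelled 0 and the total probability of the symbols labelled 1. The sign convention, the Gray labelling and the helper name `bit_llrs` are assumptions for this example; the paper follows [20].

```python
import numpy as np

def bit_llrs(pmf, labels, eps=1e-12):
    """LLR per bit position: log P(bit = 0) / P(bit = 1) from a symbol pmf."""
    pmf = np.asarray(pmf, dtype=float)
    labels = np.asarray(labels)
    p1 = labels.T @ pmf            # probability mass of symbols with bit 1
    p0 = (1 - labels).T @ pmf      # probability mass of symbols with bit 0
    return np.log((p0 + eps) / (p1 + eps))

labels = [[0, 0], [0, 1], [1, 1], [1, 0]]   # assumed Gray labels, Z = 4
pmf = [0.7, 0.1, 0.1, 0.1]                  # marginal of one variable node
llr = bit_llrs(pmf, labels)
bits = (llr < 0).astype(int)                # positive LLR decides bit 0
```

The small constant `eps` only guards against taking the logarithm of zero when a marginal becomes fully concentrated on one symbol.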

B. Gaussian Approximation of Interference (FG-GAI)
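In FG-GAI [10], the messages $\mu^t_{f_j \to x_i}(x_i)$ are replaced by a Gaussian approximation of the interference. The received signal $y_j$ is given by:

$$y_j = h_{j,i}\, x_i + \underbrace{\sum_{l \in m(j),\, l \neq i} h_{j,l}\, x_l + \tilde{w}_j}_{w_{f_j \to x_i}},$$

and the interference term $w_{f_j \to x_i}$ is modeled as a Gaussian variable with mean $z_{f_j \to x_i}$ and variance $\nu_{f_j \to x_i}$. As in MPA, the iteration starts with the calculation of the messages from the factor node $f_j$ to the variable nodes $x_i$. The means $z_{f_j \to x_i}$ and variances $\nu_{f_j \to x_i}$ are calculated as follows:

$$z^t_{f_j \to x_i} = \sum_{l \in m(j),\, l \neq i} h_{j,l}\, E(x_l), \qquad E(x_l) = \sum_{s=1}^{Z} \alpha_s\, \mu^{t-1}_{x_l \to f_j}(x_l = \alpha_s),$$

$$\nu^t_{f_j \to x_i} = \sum_{l \in m(j),\, l \neq i} |h_{j,l}|^2\, \sigma^2(x_l) + \sigma_0^2,$$

where $\alpha_s \in \mathcal{A}$, $h_{j,l}$ is the element in the $j$-th row and $l$-th column of the matrix $\mathbf{H}_{\text{eff}}$, $E(x)$ is the expectation of $x$ and $\sigma^2(x_l)$ is the variance of $x_l$, defined as:

$$\sigma^2(x_l) = \sum_{s=1}^{Z} |\alpha_s|^2\, \mu^{t-1}_{x_l \to f_j}(x_l = \alpha_s) - |E(x_l)|^2.$$

Each variable node $x_i$ then updates its probability function, conditioned on the entries $y_b$ of the received vector at the connected factor nodes, for each valid symbol of the constellation (alphabet $\mathcal{A}$), and sends it to $f_j$, which responds with the mean and variance of the other variable nodes. The messages from the variable nodes to the factor nodes are updated accordingly. The probability of each possible symbol, $\mu^t_{x_i \to f_j}(x_i = \alpha_s)$, is calculated from the means and variances of the factor nodes linked to $x_i$ as follows:

$$\mu^t_{x_i \to f_j}(x_i = \alpha_s) \propto \prod_{b \in n(i),\, b \neq j} \exp\left(-\frac{|y_b - z^t_{f_b \to x_i} - h_{b,i}\,\alpha_s|^2}{\nu^t_{f_b \to x_i}}\right).$$

The marginal distribution $\mu^t_{x_i}(x_i)$ is calculated by taking into account all the incoming messages:

$$\mu^t_{x_i}(x_i = \alpha_s) \propto \prod_{b \in n(i)} \exp\left(-\frac{|y_b - z^t_{f_b \to x_i} - h_{b,i}\,\alpha_s|^2}{\nu^t_{f_b \to x_i}}\right).$$

C. Approximate Message Passing using Gaussian Approximation (AMP-GA)
Another simplification of MPA using a Gaussian approximation is presented in [11]. It is called Approximate Message Passing using Gaussian Approximation (AMP-GA). In AMP-GA, the mean and variance messages of the variable nodes are updated from the messages of the factor nodes through the calculation of a complex Gaussian function.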
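Returning to the FG-GAI updates of Section III-B, the following is a rough sketch of one possible implementation: interference means and variances at the factor nodes, symbol probabilities at the variable nodes, then damping. It assumes the convex-combination damping form, works in the log domain to avoid underflow, and expects exact zeros outside the support of the channel matrix. The function name, the toy channel and all parameter values are hypothetical, and this is not the authors' code.

```python
import numpy as np

def fg_gai_detect(H, y, alphabet, noise_var, n_iter=10, damping=0.6):
    """Gaussian-approximation message passing on a sparse matrix H."""
    alphabet = np.asarray(alphabet, dtype=complex)
    J, I = H.shape
    m = [np.flatnonzero(H[j]).tolist() for j in range(J)]     # VNs of FN j
    n = [np.flatnonzero(H[:, i]).tolist() for i in range(I)]  # FNs of VN i
    # pmf messages x_i -> f_j, initialized with equiprobable symbols
    mu = {(j, i): np.full(alphabet.size, 1.0 / alphabet.size)
          for j in range(J) for i in m[j]}
    belief = np.zeros((I, alphabet.size))
    for _ in range(n_iter):
        # FN -> VN: mean z and variance nu of interference plus noise
        z, nu = {}, {}
        for j in range(J):
            mean = {l: alphabet @ mu[j, l] for l in m[j]}
            var = {l: (np.abs(alphabet) ** 2) @ mu[j, l] - abs(mean[l]) ** 2
                   for l in m[j]}
            for i in m[j]:
                z[j, i] = sum(H[j, l] * mean[l] for l in m[j] if l != i)
                nu[j, i] = noise_var + sum(abs(H[j, l]) ** 2 * var[l]
                                           for l in m[j] if l != i)
        # VN -> FN: product of Gaussian likelihoods, excluding the target FN
        for i in range(I):
            loglik = {b: -np.abs(y[b] - z[b, i] - H[b, i] * alphabet) ** 2
                      / nu[b, i] for b in n[i]}
            belief[i] = sum(loglik.values())      # log of the marginal
            for j in n[i]:
                ext = belief[i] - loglik[j]
                p = np.exp(ext - ext.max())
                mu[j, i] = damping * p / p.sum() + (1 - damping) * mu[j, i]
    return alphabet[np.argmax(belief, axis=1)]

# Toy usage: 3 cyclic diagonals standing in for a sparse effective channel.
rng = np.random.default_rng(1)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
O, sigma2 = 16, 0.01
H_toy = sum(g * np.roll(np.eye(O), s, axis=0)
            for g, s in [(1.0, 0), (0.3, 1), (0.2j, 3)])
x = rng.choice(qpsk, size=O)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(O) + 1j * rng.standard_normal(O))
x_hat = fg_gai_detect(H_toy, H_toy @ x + w, qpsk, sigma2)
n_errors = int(np.sum(x_hat != x))
```

The per-edge dictionaries keep the sketch close to the notation of the text; a practical implementation for $O = 4096$ nodes would store the messages in arrays indexed by edge.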
Let us denote by $\mu^t_{x_i \to f_j}(x_i)$ the message sent from the variable node $x_i$ to the factor node $f_j$ in the $t$-th iteration, and by $\mu^t_{f_j \to x_i}(x_i)$ the message from the factor node $f_j$ to the variable node $x_i$. Then, the message update rules are given by eq. (17) and eq. (18).
Since the symbols belong to a discrete set of QAM symbols ($\alpha_s \in \mathcal{A}$), the calculation of the messages requires considerable complexity to marginalize over the random vector $\mathbf{x} \backslash x_i$. To deal with this complexity, as in [11], AMP-GA applies the minimum Kullback-Leibler divergence criterion to calculate the parameters $\hat{x}^t_{x_i \to f_j}$ (mean of the projection distribution) and $\hat{\tau}^t_{x_i \to f_j}$ (variance of the projection distribution), given by eqs. (27) and (28). Considering $x_i$ as a continuous random variable and approximating the message by a complex Gaussian function, $\mu^t_{f_j \to x_i}(x_i)$ can be calculated by integration in (17), where $\mathcal{N}_C(x; \hat{x}, \tau) \triangleq (\pi\tau)^{-1}\exp(-|x - \hat{x}|^2/\tau)$ denotes a complex Gaussian function. The parameters $z^t_{f_j \to x_i}$ (mean messages from factor nodes to variable nodes) and $\nu^t_{f_j \to x_i}$ (variance messages from factor nodes to variable nodes) are given by eqs. (30) and (31). In (18), the messages $\mu^t_{x_i \to f_j}(x_i)$ can then be normalized as in eq. (32), where $\gamma^{t-1}_{x_i \to f_j}$ (variance messages from variable nodes to factor nodes) and $\zeta^{t-1}_{x_i \to f_j}$ (mean messages from variable nodes to factor nodes) are given by eqs. (33) and (34). Finally, the marginal distribution $\mu^t_{x_i}(x_i)$ is calculated from $\gamma^t_{x_i}$ and $\zeta^t_{x_i}$, the estimated variance and mean of $x_i$. The AMP-GA iterations are summarized in the following pseudo-code:

    for each iteration t do
        for i = 1 to O do
            Compute $\hat{x}^t_{x_i \to f_j}$ using (27), ∀j ∈ n(i)
            Compute $\hat{\tau}^t_{x_i \to f_j}$ using (28), ∀j ∈ n(i)
        end for
        {Computation of messages from FN to VN}
        for j = 1 to O do
            Compute $z^t_{f_j \to x_i}$ using (30), ∀i ∈ m(j)
            Compute $\nu^t_{f_j \to x_i}$ using (31), ∀i ∈ m(j)
        end for
        {Computation of messages from VN to FN}
        for i = 1 to O do
            Compute $\gamma^t_{x_i \to f_j}$ using (33), ∀j ∈ n(i)
            Compute $\zeta^t_{x_i \to f_j}$ using (34), ∀j ∈ n(i)
            Damping calculation by (19)
        end for
    end for
    Computation of LLR
    Decision calculation
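The two Gaussian operations that AMP-GA relies on can be sketched with standard identities. Minimizing the Kullback-Leibler divergence from a discrete distribution to a Gaussian amounts to matching the mean and variance, and a product of Gaussian likelihoods in the same variable is again Gaussian. The function names are hypothetical and the expressions are textbook forms, which are not guaranteed to match eqs. (27) to (34) term by term.

```python
import numpy as np

def project_to_gaussian(pmf, alphabet):
    """Moment matching: mean and variance of a discrete symbol pmf."""
    mean = np.sum(pmf * alphabet)
    var = np.sum(pmf * np.abs(alphabet) ** 2) - np.abs(mean) ** 2
    return mean, var

def combine_gaussian_messages(h, r, nu):
    """Product over b of N_C(h_b * x; r_b, nu_b), viewed as a Gaussian in x.

    Returns the (mean, variance) of the resulting Gaussian.
    """
    h, r, nu = (np.asarray(v) for v in (h, r, nu))
    var = 1.0 / np.sum(np.abs(h) ** 2 / nu)
    mean = var * np.sum(np.conj(h) * r / nu)
    return mean, var

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x_hat, tau_hat = project_to_gaussian(np.array([0.7, 0.1, 0.1, 0.1]), qpsk)
zeta, gamma = combine_gaussian_messages(h=[1.0, 1.0], r=[2.0, 4.0], nu=[1.0, 1.0])
```

Exchanging only a mean and a variance per edge, instead of $Z$ probabilities, is what removes the dependence of the message size on the constellation size.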

D. AMP simplified by Expectation Propagation (AMP-EP)
The AMP-EP algorithm proposed in [11] is an alternative that reduces the computational complexity of AMP-GA in the calculation of the messages from the variable nodes to the factor nodes, $\mu^t_{x_i \to f_j}(x_i)$ in (32). These messages are replaced by the so-called symbol belief $\beta^t(x_i)$, which is approximated by a Gaussian probability density function (PDF). In this approach, the approximate message $\mu^t_{x_i \to f_j}(x_i)$ is thus calculated from an approximate belief $\beta^t(x_i)$ of the symbol. Once the symbol belief has been calculated for each variable node, the parameters $\hat{x}^t_{x_i}$ (mean messages) and $\hat{\tau}^t_{x_i}$ (variance messages) are updated from the beliefs, and the mean and variance messages from the variable nodes to the factor nodes, $\hat{x}^t_{x_i \to f_j}$ and $\hat{\tau}^t_{x_i \to f_j}$, are then obtained from them. Next, the messages from the factor nodes to the variable nodes are updated with the values of $\hat{\tau}^t_{x_i \to f_j}$ and $\hat{x}^t_{x_i \to f_j}$ previously computed. As a result, the variance messages ($\nu^t_{f_j \to x_i}$) and mean messages ($z^t_{f_j \to x_i}$) are used as input parameters of the Gaussian PDF and thus update the symbol belief $\beta^t(x_i)$, which is the basis of the calculation in the next iteration. The messages $z^t_{f_j \to x_i}$ and $\nu^t_{f_j \to x_i}$ are computed by equations (30) and (31).
Then, the marginal distribution $\mu^t_{x_i}(x_i)$ is obtained directly from the symbol belief $\beta^t(x_i)$, and the LLR can be obtained as in the MPA.

E. AMP simplified by First-Order (AMP-FO)
As a further simplification of AMP-EP, the last reduced-complexity alternative to MPA proposed in [11] is AMP-FO. In this algorithm, the messages are rewritten after recursive updates, and the terms that become negligible in the large system limit are omitted.
To adapt the algorithm to OTFS detection, the standard messages in (32) are first rewritten in terms of $\gamma^{t-1}_{x_i}$ (variance messages of the variable nodes) and $\zeta^{t-1}_{x_i}$ (mean messages of the variable nodes), which are the messages exchanged from the variable nodes to the factor nodes and which depend on $z^t_{f_b}$ and $\nu^t_{f_b}$, the mean and variance messages from the factor nodes to the variable nodes, respectively. To initiate the exchange of messages from the factor nodes to the variable nodes, eq. (43) is updated for all variable nodes, and the mean and variance of the projection distribution are then calculated for each symbol of the QAM alphabet. Finally, all the means and variances of the messages exchanged from the factor nodes to the variable nodes are calculated.

IV. COMPLEXITY ANALYSIS
In this section, we analyze the complexity of the considered algorithms by counting the number of floating-point operations (FLOPs) they require. FLOP counts are obtained by adding the arithmetic operations associated with the most deeply nested statements in an algorithm [19]. In the previous section, we presented the simplifications introduced by each algorithm to reduce the complexity, starting from the MPA and followed by FG-GAI, AMP-GA, AMP-EP and AMP-FO (from the most complex to the least complex algorithm).
All the message passing algorithms have a preprocessing step to compute the squared norms $|h_{j,l}|^2$, which requires $3PO$ FLOPs.
The complexity of MPA and FG-GAI is mainly due to the message exchange from VN to FN and from FN to VN. However, MPA requires an additional preprocessing step. Table I shows the total number of FLOPs per iteration of each algorithm as a function of the size $Z$ of the Z-QAM modulation, the number of paths $P$ and the number $O$ of VNs and FNs.
Based on Table I, Figure 4 presents the complexity in terms of FLOPs as a function of the constellation size $Z$ for each algorithm. We have considered the case of $P = 4$ paths and 64 subcarriers with 64 symbols ($O = 4096$). The same set of parameters is used to evaluate the bit error rate performance in this study. As expected, Figure 4 shows that MPA is the most complex algorithm and that its complexity increases considerably with $Z$, while AMP-FO is the least complex and its complexity increases more slowly with the constellation size. In FG-GAI, replacing $\mu_{f_j \to x_i}$ with means and variances reduces the complexity by a factor of 75 at $Z = 16$ compared to MPA. By replacing the messages with means and variances, the complexity of the AMP algorithms is further reduced. A complexity reduction factor of 10 is obtained between FG-GAI and AMP-EP. Finally, the AMP-FO algorithm achieves the lowest complexity, about 30% of that of the AMP-EP algorithm, by computing only the mean and variance information at the VNs and FNs.
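For the parameter set above, the preprocessing cost can be worked out directly. The short sketch below assumes that one squared modulus $|a + jb|^2 = a^2 + b^2$ is counted as two multiplications and one addition, which is consistent with the $3PO$ figure; the variable names are chosen for the example.

```python
P = 4                  # number of propagation paths
M, N = 64, 64          # subcarriers and symbols
O = M * N              # number of variable nodes and of factor nodes

# Assumed accounting: a*a + b*b costs 2 multiplications + 1 addition.
FLOPS_PER_SQUARED_MODULUS = 3

# One squared modulus per non-zero entry of H_eff (P entries per row).
preprocessing_flops = FLOPS_PER_SQUARED_MODULUS * P * O
```

This gives 49152 FLOPs, a one-time cost that is shared by all the compared algorithms and is therefore not what separates them in Figure 4.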

V. EXTENSION TO MIMO SYSTEMS
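The input-output relation of the SISO-OTFS system given in equation (15) is replaced by a MIMO input-output relation that takes into account the multiple transmit and receive antennas [14]. We assume a MIMO system composed of $n_t$ transmit antennas and $n_r$ receive antennas. Let $\mathbf{y}_j$ be the vectorized received signal at the $j$-th receive antenna and $\mathbf{x}_i$ the vectorized signal at the $i$-th transmit antenna. Then, from equation (15), we have the following set of input-output equations:

$$\mathbf{y}_j = \sum_{i=1}^{n_t} \mathbf{H}_{ji}\,\mathbf{x}_i + \tilde{\mathbf{w}}_j, \qquad j = 1, \dots, n_r, \tag{50}$$

where $\mathbf{H}_{ji}$ is the effective channel matrix between the $i$-th transmit antenna and the $j$-th receive antenna. Equation (50) can be rewritten in a compact form as $\mathbf{y}_{MIMO} = \mathbf{H}_{MIMO}\,\mathbf{x}_{MIMO} + \tilde{\mathbf{w}}_{MIMO}$, where $\mathbf{y}_{MIMO}$ and $\mathbf{x}_{MIMO}$ stack the vectors $\mathbf{y}_j$ and $\mathbf{x}_i$, $\mathbf{H}_{MIMO}$ is the MIMO effective channel matrix formed by the blocks $\mathbf{H}_{ji}$, and $\tilde{\mathbf{w}}_{MIMO} = [\tilde{\mathbf{w}}_1^T, \dots, \tilde{\mathbf{w}}_{n_r}^T]^T$ is the noise vector of the MIMO-OTFS system.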
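The block structure of the MIMO effective channel matrix can be illustrated as follows. The sketch assumes that every antenna pair sees the same delay-Doppler support with independent gains, and it uses hypothetical cyclic-diagonal blocks in place of real effective channel matrices; the function names are chosen for the example.

```python
import numpy as np

def stack_mimo_channel(blocks):
    """blocks[j][i] is H_ji (receive antenna j, transmit antenna i)."""
    return np.block(blocks)

def toy_block(O, shifts, rng):
    """Hypothetical sparse SISO block: one random gain per cyclic diagonal."""
    return sum((rng.standard_normal() + 1j * rng.standard_normal())
               * np.roll(np.eye(O), s, axis=0) for s in shifts)

rng = np.random.default_rng(0)
O, shifts = 16, [0, 1, 3, 6]          # P = 4 paths on a toy grid
n_t, n_r = 2, 2
blocks = [[toy_block(O, shifts, rng) for _ in range(n_t)] for _ in range(n_r)]
H_mimo = stack_mimo_channel(blocks)

row_degree = (np.abs(H_mimo) > 0).sum(axis=1)
```

Each row then contains $n_t P$ non-zero entries, so the degree of every factor node grows linearly with the number of transmit antennas, which is the source of the complexity increase discussed next.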
In this context, extending the study to MIMO-OTFS systems implies a substantial increase in the complexity of the algorithms of Section III, mainly of the MPA, which takes into account each edge between a variable node and a factor node, and the number of edges depends on the number of paths. In addition, in a multipath scenario, each antenna of the MIMO configuration suffers from the fading of each path.

VI. SIMULATION RESULTS
In this section, we evaluate the Bit Error Rate (BER) performance of the OTFS system with the different low-complexity algorithms presented in Section III (MPA, FG-GAI, AMP-GA, AMP-EP and AMP-FO) over a delay-Doppler channel model in a multipath scenario. The simulation parameters are given in Table II. The channel model is the Delay-Doppler Profile multipath model [14]. Based on the parameters given in Table II, we have considered two different scenarios, both with four paths, where each reflector has a delay that is a multiple of 1 µs and a Doppler shift that is a multiple of 234 Hz. The delay and Doppler shifts of each path in the two scenarios are provided in Table III and Table IV, respectively. In Scenario 1, all reflectors have different Doppler shifts but they are all in the same direction, while in Scenario 2 the reflectors are in different directions (two positive and two negative Doppler shifts).
We have first studied the BER performance as a function of the number of iterations for each algorithm, in order to establish the most appropriate value for the presented system. The results for Scenario 2 at SNR = 12 dB are shown in Figure 5. According to Figure 5, for the considered DDP model, the required number of iterations is 10 for FG-GAI, 15 for AMP-GA, 20 for AMP-EP and 15 for AMP-FO. The MPA algorithm converges faster and consequently requires only 5 iterations.
Once the required number of iterations has been determined, we have studied the impact of the damping factor of eq. (19) on the BER performance for Scenario 2, in the range of 0.45 to 0.75. The influence of the damping factor is shown in Figure 6. Figure 6 shows that the damping factor does not bring significant gains in BER. The best damping factor is 0.55 for FG-GAI, 0.5 for AMP-GA, 0.65 for AMP-EP and 0.6 for AMP-FO. The MPA algorithm showed no variation in BER over the considered range.
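The delay and Doppler steps quoted for the two scenarios follow from the grid spacings of Section II. The sketch below assumes a subcarrier spacing of 15 kHz, which is not stated in the text but reproduces the quoted steps for $M = N = 64$; the variable names are chosen for the example.

```python
M, N = 64, 64
delta_f = 15e3                  # ASSUMPTION: subcarrier spacing in Hz
T = 1.0 / delta_f               # symbol duration, using T * delta_f = 1

delay_res = 1.0 / (M * delta_f)   # delay resolution in seconds
doppler_res = 1.0 / (N * T)       # Doppler resolution in Hz
```

Under this assumption the Doppler resolution is 234.375 Hz and the delay resolution is about 1.04 µs, so the paths of both scenarios fall on integer taps of the delay-Doppler grid.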
The same numbers of iterations and damping factors have been obtained for Scenario 1. Table V summarizes the damping factors and the numbers of iterations used in the following simulations. With these parameters, we have simulated the BER performance of the OTFS system for each low-complexity detection algorithm over an SNR range of -13 to 17 dB and for both scenarios presented in Tables III and IV. Figure 7 and Figure 8 present the BER performance as a function of the SNR for Scenario 1 and Scenario 2, respectively. We used the channel model defined by equation (6), so all curves of a given scenario are based on the same $\mathbf{H}_{\text{eff}}$ matrix.
As shown in Figures 7 and 8, all algorithms have similar performance at low SNR, but differences appear from 12 dB onwards. In Scenario 1, the MPA algorithm, which is the most complex algorithm, achieves the best BER performance. At BER $= 10^{-3}$, the performance losses of AMP-EP and AMP-FO are 1.25 dB and 2.25 dB, respectively, with respect to MPA. On the other hand, in Scenario 2, AMP-EP achieves the same performance as MPA. At BER $= 10^{-3}$, the performance loss of AMP-FO is 0.8 dB with respect to MPA. In both scenarios, AMP-GA has the worst performance, followed by FG-GAI and AMP-FO.
In both scenarios, the BER performance of the AMP-FO algorithm is relatively close to that of AMP-EP, while AMP-FO is significantly less complex. We can conclude that, in both studied scenarios, AMP-FO provides a good compromise between complexity and BER performance.
For MIMO-OTFS, because of the considerable increase in the complexity of the algorithms, we chose the three least complex AMP-based algorithms (AMP-GA, AMP-EP and AMP-FO) to analyze the performance in the second scenario, with two transmit antennas and two receive antennas. Thus, according to (52), $\mathbf{H}_{MIMO}$ has a dimension of $2MN \times 2MN$ ($8192 \times 8192$), and the effective channel matrix now has 8 non-zero elements in each row and in each column, since each antenna suffers from the fading of each of the 4 paths of the considered multipath scenario. Figure 9 shows the result for the MIMO-OTFS case. As expected, the MIMO-OTFS scheme performs better than SISO-OTFS, with a gain of 5 dB observed at BER $= 10^{-1}$. Owing to the spatial diversity, the detection algorithms show similar performance, diverging slightly at 10 dB. In other words, the least complex algorithm, AMP-FO, also has the best performance-complexity tradeoff for MIMO-OTFS.

VII. CONCLUSIONS AND PERSPECTIVES
In this paper, we have studied, evaluated and compared four low-complexity MPA-based detectors for OTFS systems over time-frequency selective channels with high Doppler, in terms of BER performance and complexity. As expected, the MPA and AMP-EP algorithms achieve the best BER performance. However, the complexity of the AMP-FO algorithm is significantly lower, since it is only about 30% of the complexity of the AMP-EP algorithm. The AMP-FO algorithm is the least complex of the studied algorithms, and it incurs a BER performance degradation of less than 1 dB and 2.25 dB at BER $= 10^{-3}$ compared to the AMP-EP and the MPA algorithms, respectively, in SISO-OTFS. Indeed, the AMP-FO algorithm gives the best performance-complexity tradeoff in both the SISO-OTFS system and the MIMO-OTFS system. As future work, we will study the application of channel estimation techniques to OTFS and consider the extension to massive MIMO systems.