MUSA Grant-Free Access Framework and Blind Detection Receiver

Abstract: Recently, a non-orthogonal multiple access scheme called multi-user shared access (MUSA) was proposed to provide massive connection capability for low-complexity devices in 5G networks. MUSA achieves higher spectral efficiency by allowing independent devices to transmit data on the same physical layer time-frequency resources. Furthermore, MUSA introduces grant-free transmission and blind multi-user detection at the receiver, reducing the complexity on the transmit side. This approach is interesting for Internet of Things applications over mobile communication networks, where the devices have limited power and processing capacity. The references available in the literature about this multiple access scheme do not provide sufficient details about the MUSA multi-user detector. This limitation makes it difficult to evaluate the MUSA performance and to propose improvements for this new technique. The main goal of this paper is to provide a framework describing the entire communication chain using MUSA as the multiple access scheme. This paper also proposes a blind multi-user detector, where the MUSA parameters and the channel state information are unknown at the receiver side. The performance of the MUSA multi-user detector is improved by deep learning based processing that enhances the quality of the channel estimation provided by an initial minimum mean square error estimator. The proposed deep neural network architecture employed to improve the channel estimation allows more users to share the same time-frequency resources for a given target block error rate, increasing the overall spectrum efficiency of the system.


I. INTRODUCTION
The Fifth Generation of Mobile Network (5G) has four different application scenarios, each one with specific key requirements. The enhanced Mobile Broadband (eMBB) [1] provides high throughput connectivity, aiming for 10 times the data rate provided by Long-Term Evolution (LTE) networks. The Ultra Reliable Low Latency Communications (URLLC) [2] supports robust low latency connectivity with a target Round Trip Time (RTT) of 1 ms. The enhanced Remote Area Communications (eRAC) [3] provides connectivity in long-range and remote areas, achieving at least 100 Mbps at 50 km from the Base Station (BS) while exploiting the TV White Space (TVWS). Finally, the massive Machine Type Communications (mMTC) [4] promises connectivity to a plethora of power-limited Internet of Things (IoT) devices.
Although all application scenarios impose challenging requirements on the 5G network, accommodating a large number of IoT devices, which cannot afford to spend power on pilot signalling, in a limited number of Physical Layer (PHY) time-frequency resources deserves special attention. The main challenge for the mMTC scenario is to allow a massive number of power-limited and complexity-restricted devices to connect to the network. These IoT devices are typically sensors that transmit small amounts of data and cannot afford the cumbersome synchronization process or the pilot signaling required by Orthogonal Multiple Access (OMA). The data traffic generated by these devices has a specific profile that can be exploited by innovative multiple access techniques. Usually, the traffic generated by IoT devices is quite different from that of human-driven communication scenarios. Typically, in mMTC applications, there is no severe latency restriction, the required data rate is very low and the number of transmissions per device per day is small [5]. However, a high number of devices are expected to be connected to the network, which means that the 5G BS must be able to handle several simultaneous transmissions from different limited devices [6].
Conventional OMA schemes would require a large bandwidth to accommodate all transmissions, contradicting the premise of spectral efficiency for 5G systems [7]. Different Non-Orthogonal Multiple Access (NOMA) schemes have been considered as potential solutions for multiple access in IoT applications over 5G networks. NOMA techniques allow for massive connectivity without a bandwidth increase due to the overloading concept [8]. In the overloaded Radio Access Network (RAN) [9], the data symbols of different users can be superimposed within the time-frequency resources, increasing the capacity and spectral efficiency of the system. NOMA schemes can control the mutual interference among the users at the cost of increased complexity at the BS [10]. There are two categories of NOMA schemes in the literature: power-domain NOMA [11] and code-domain NOMA [12]. Among the code-domain NOMA schemes, Sparse Code Multiple Access (SCMA) [13], Pattern Division Multiple Access (PDMA) [14], Interleaved Division Multiple Access (IDMA) [15], Lattice Partition Multiple Access (LPMA) [16] and Multi-User Shared Access (MUSA) [17] are the most promising solutions.
For IoT applications, MUSA is an attractive solution because it is a grant-free multiple access scheme conceived for the mMTC scenario [17]. In this multiple access scheme, the data of a given user is spread with a short-length code that belongs to a family of non-orthogonal complex spreading sequences. The data symbols from different users are superposed in a set of time-frequency resources. As a grant-free access scheme, each user can choose its spreading code autonomously, eliminating the signaling for access scheduling. Furthermore, no pilots or training sequences are used in the frame structure, resulting in a higher spectral efficiency. This approach means that the receiver has to blindly detect the users' data symbols, without explicit Channel State Information (CSI). A blind detection based on Successive Interference Cancellation (SIC) can be used to implement the Multi-user Detector (MUD). Since no pilot signaling is transmitted, MUSA can reduce the overhead and the energy consumption of IoT devices, while providing high spectrum efficiency at the cost of higher complexity at the BS.
MUSA is receiving attention from the research community and several contributions have been presented after the introduction of this multiple access scheme, where the structures for transmitter and receiver have been presented, as well as the design of the complex spreading code [17]. Numerical results have shown that MUSA can achieve acceptable Block Error Rate (BLER) performance in very high overloading conditions [18]. However, in [17] and [18], an ideal receiver is used for the performance evaluation. It is assumed that the BS knows the CSI and the spreading sequences used by the IoT devices. These premises are incompatible with a grant-free multiple access with blind detection. Under this unrealistic assumption, the data of different users can be successfully decoded even in highly overloaded situations. This same unrealistic assumption is used by [10] for comparisons between MUSA and other NOMA schemes. In [12] and [19], the authors superficially describe the MUSA scheme without presenting implementation details. In [20], a more realistic receiver implementation was presented and the MUSA BLER performance was analyzed assuming a flat fading channel and a blind detection receiver. In [21], the BLER performance of a realistic blind detection receiver for grant-free MUSA is analyzed, assuming a frequency selective channel where neither pilots nor preambles are used for channel estimation. The numerical results have shown that realistic grant-free MUSA with blind detection achieves an acceptable BLER performance even for a very large number of users colliding in the same transmission resource. Finally, [22] brings a performance analysis similar to the ones presented in [20] and [21]. However, [20], [21] and [22] do not describe how the receivers were implemented and how the SIC removes the interference.
In other words, [20], [21] and [22] do not show how the receiver can decode the data of a given device without knowing the CSI and the spreading sequence used in transmission. Without these details, the results cannot be properly reproduced and improvements for MUSA cannot be proposed.
The aim of this paper is to provide a complete framework of how to implement a grant-free MUSA scheme with details of the transmitter and receiver with blind MUD based on SIC. The channel estimation on the receiver side will be improved by an Artificial Intelligence (AI) algorithm based on deep neural network (DNN) [23]. The DNN is previously trained with a set of Least Square (LS) estimated channel gains at high Signal-to-Noise Ratio (SNR) and low overloading factor. The trained DNN is then used to improve the channel estimation for a range of overloading factors.
The remainder of this paper is organized as follows: Section II presents the MUSA transmitter, while Section III shows the details of the blind MUD based on SIC. Section III also describes how the DNN has been used to improve the channel estimation. Section IV brings the MUSA performance evaluation based on numerical simulation for different overload values. Finally, Section V concludes this paper.

II. PRINCIPLES OF THE MUSA TRANSMITTER
MUSA can accommodate a large number of devices in a limited number of time-frequency resources. The ratio between the number of devices and the number of time-frequency resources in the PHY layer is defined as the overloading factor. In order to achieve high overloading factors, the data symbol of a given device is spread over a set of time-frequency resources by a complex sequence from a codebook, which is known at the receiver side. Other devices spread their data symbols using spreading sequences from this codebook and transmit them using the same time-frequency resources. This procedure is similar to Direct Sequence Spread Spectrum (DS-SS) employed in Code Division Multiple Access (CDMA) [24]. However, instead of using orthogonal binary codes, MUSA employs non-orthogonal short-length complex spreading sequences. The authors in [17] have proposed a tri-level complex spreading code with real and imaginary parts in the set {−1, 0, 1}, leading to the Cartesian representation depicted in Fig. 1. The number of different codes that can be obtained from this alphabet is $9^L$, where $L$ is the length of the codeword. From this total of spreading sequences, a group of $M$ pre-selected sequences can be used by the devices and this set is known by the receiver at the BS. As $M$ decreases, the probability of two different devices choosing the same spreading sequence increases. This situation is called access collision and it can happen in a grant-free multiple access scheme, such as the Carrier Sense Multiple Access (CSMA) employed in Wi-Fi networks [25].
However, while a collision in CSMA results in a packet drop, in MUSA it is not always fully destructive, and data from two devices using the same spreading sequence can still be successfully recovered. Nevertheless, it is desirable to reduce the probability of collisions by increasing the codebook size $M$, at the cost of higher complexity at the BS receiver. Eq. (1) shows one MUSA codebook for $M = 15$ sequences of length $L = 4$.
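The counting argument and the collision trade-off discussed above can be illustrated with a short Python sketch. It enumerates the tri-level alphabet and evaluates the probability that at least two devices select the same sequence, assuming that every device chooses independently and uniformly from the codebook (a birthday-problem model that is an assumption of this sketch, not a result from the paper). The function names are hypothetical.

```python
from itertools import product
from math import prod

# Tri-level alphabet: real and imaginary parts drawn from {-1, 0, 1}.
ALPHABET = [complex(re, im) for re in (-1, 0, 1) for im in (-1, 0, 1)]


def all_sequences(length):
    """Enumerate every tri-level complex sequence of the given length."""
    return list(product(ALPHABET, repeat=length))


def collision_probability(num_devices, codebook_size):
    """Probability that at least two devices pick the same sequence.

    Assumes independent, uniform choices (birthday-problem formula).
    """
    if num_devices > codebook_size:
        return 1.0
    no_collision = prod((codebook_size - i) / codebook_size
                        for i in range(num_devices))
    return 1.0 - no_collision


print(len(ALPHABET))                 # 9 constellation points
print(len(all_sequences(4)))         # 9**4 = 6561 candidate sequences
print(collision_probability(6, 15))  # six devices, 15-sequence codebook
```

Under this model, enlarging the codebook lowers the collision probability for a fixed number of devices, which matches the trade-off against receiver complexity described in the text.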
Since MUSA is a grant-free access scheme, a given device can transmit without previous scheduling from the BS. Each device randomly chooses a normalized code sequence $\mathbf{s}_m$, given by
$$\mathbf{s}_m = \frac{\mathbf{g}_m}{\lVert \mathbf{g}_m \rVert}, \tag{2}$$
to spread its data symbol before transmitting, where $\mathbf{g}_m$ is the $m$th column of $\mathbf{G}$. After the data symbol is spread, the number of resources necessary to transmit it is equal to the length of the code, $L$. Different modulation schemes can be used in this scenario. For instance, integration with Orthogonal Frequency Division Multiplexing (OFDM) is straightforward, since each time-frequency resource necessary to carry the $L$-long spreading sequence can be seen as a subcarrier of one specific OFDM block. It is worth mentioning that no pilot or training sequences are transmitted for channel estimation at the receiver side. Also, no information about the sequences chosen by the devices is transmitted. Fig. 2 shows the block diagram of the MUSA transmitter.
Initially, a block of data bits $\mathbf{b}_{N_b \times 1}$ is generated by the $k$th device. This bit vector is encoded by a channel encoder with code rate $R = N_b/N_c$, resulting in the encoded block of bits $\mathbf{c}_{N_c \times 1}$. The encoded bits are mapped into in-phase and quadrature symbols and, in this paper, Differential Binary Phase Shift Keying (DBPSK) has been chosen because it allows for phase tracking at the receiver side without previous knowledge of the CSI. Since mMTC devices transmit a low amount of data per channel use, the low modulation order is adequate. The modulator introduces one reference symbol for phase tracking, resulting in a modulated block $\mathbf{d}_{N_d \times 1}$, where $N_d = N_c + 1$. The modulated block is spread by a tri-level complex sequence $\mathbf{s}_{L \times 1}$, arbitrarily chosen from the codebook, resulting in the spread block $\mathbf{m}_{N_m \times 1}$, where $L$ represents the length of the complex spreading sequence and $N_m = N_d L$. Next, the spread block is allocated in a set of subcarriers of an OFDM symbol [18], which is generated by the Inverse Fast Fourier Transform (IFFT). It is assumed that each device receives $N_s < N_m$ subcarriers in each OFDM symbol. Therefore, $N_m/N_s$ different OFDM symbols are needed to transmit the $\mathbf{m}$ block. The frame necessary to transmit the $\mathbf{m}$ block is represented by $\mathbf{X}_{N_s \times N_m/N_s}$.
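The transmit chain described above can be sketched in a few lines of Python. The sketch covers only the DBPSK mapping with its reference symbol, the spreading by a unit-norm sequence, and the split of the spread block into OFDM symbols; the channel encoder and the IFFT are omitted. The bit-to-phase convention (a 1 flips the phase), the order in which chips fill the subcarriers, and all function names are assumptions made for illustration.

```python
import math


def dbpsk_modulate(bits):
    """Differentially encode bits, prepending one reference symbol."""
    symbols = [1 + 0j]  # reference symbol used for phase tracking
    for bit in bits:
        # Assumed convention: bit 1 flips the previous phase, bit 0 keeps it.
        symbols.append(-symbols[-1] if bit else symbols[-1])
    return symbols


def spread(symbols, sequence):
    """Spread each symbol by the sequence after scaling it to unit norm."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in sequence))
    unit = [c / norm for c in sequence]
    return [d * c for d in symbols for c in unit]


def to_ofdm_columns(chips, num_subcarriers):
    """Group consecutive chips into columns, one column per OFDM symbol."""
    if len(chips) % num_subcarriers:
        raise ValueError("spread block does not fill an integer number of symbols")
    return [chips[i:i + num_subcarriers]
            for i in range(0, len(chips), num_subcarriers)]


coded_bits = [i % 2 for i in range(191)]              # stand-in for a codeword
symbols = dbpsk_modulate(coded_bits)                  # 192 symbols
chips = spread(symbols, (1, 1j, -1, 0))               # 768 chips
columns = to_ofdm_columns(chips, num_subcarriers=12)  # 64 OFDM symbols
print(len(symbols), len(chips), len(columns))
```

With a 191-bit codeword, a length-4 sequence and 12 subcarriers, the sketch reproduces the 192, 768 and 64 figures used in the numerical results.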

III. PRINCIPLES OF THE MUSA RECEIVER
In the MUSA scheme, the $K$ devices share the OFDM time-frequency resources and the signal received in the $n$th subcarrier of the $t$th OFDM symbol is given by
$$r_{n,t} = \sum_{k=1}^{K} h^{(k)}_{n,t}\, x^{(k)}_{n,t} + w_{n,t}, \tag{3}$$
where $n = 1, 2, \cdots, N_s$ and $t = 1, 2, \cdots, N_m/N_s$. The notation $x^{(k)}_{n,t}$ represents the value in the $n$th row and $t$th column of $\mathbf{X}^{(k)}$, the frame transmitted by the $k$th device. The channel gain between the $k$th device and the BS at the $n$th subcarrier of the $t$th OFDM block is represented by $h^{(k)}_{n,t}$, while $w_{n,t}$ represents the Additive White Gaussian Noise (AWGN) samples with zero mean and variance $\sigma^2$. The received samples can be organized in a matrix $\mathbf{R}_{N_s \times N_m/N_s}$ that contains the received MUSA frame.
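As an illustration of the superposition model described above, the following Python sketch builds a received grid from per-device frames and channel gains. The nested-list layout (device, then subcarrier, then OFDM symbol), the function name and the noise parameterization (standard deviation per real component) are assumptions of this sketch.

```python
import random


def received_grid(frames, gains, noise_std=0.0, seed=0):
    """Superimpose all devices on the same time-frequency grid.

    frames[k][n][t] and gains[k][n][t] hold the transmitted value and the
    channel gain of device k at subcarrier n of OFDM symbol t.
    noise_std is the standard deviation of each real noise component.
    """
    rng = random.Random(seed)
    rows, cols = len(frames[0]), len(frames[0][0])
    received = []
    for n in range(rows):
        row = []
        for t in range(cols):
            noise = complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
            signal = sum(h[n][t] * x[n][t] for h, x in zip(gains, frames))
            row.append(signal + noise)
        received.append(row)
    return received


# Two devices, one subcarrier, two OFDM symbols, no noise.
frames = [[[1, -1]], [[1j, 1]]]
gains = [[[2, 2]], [[0.5, 0.5]]]
print(received_grid(frames, gains))
```

Because no power control is applied, the stronger device (gain 2) dominates each received sample, which is the situation that the SIC receiver exploits.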
Due to the low complexity required by mMTC devices, closed-loop power control should be avoided, since this feature increases the communication overhead and requires extra control processing at the IoT devices. Hence, the Signal-to-Interference-and-Noise Ratio (SINR) of the signal received from each device at the BS can vary significantly. This behavior is known as the near-far effect, in which the device closest to the base station has, on average, the highest SINR, while the farthest one has, on average, the lowest SINR among all devices. This characteristic can be exploited by the SIC-based receiver. Furthermore, the difference among the SINRs of different devices can allow for successful data recovery when a code collision between two devices occurs.
The block diagram of a MUSA blind SIC-based MUD receiver is shown in Fig. 3. The SIC uses the information recovered from a reliable link to remove the interference that this link introduces into the signals transmitted by the other devices. This procedure is performed for all received data, starting with the signal with the highest SINR and ending with the signal with the lowest SINR.
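The ordering principle of SIC can be shown with a deliberately simplified Python sketch. It assumes real-valued BPSK symbols, known positive amplitudes per device, no noise and no spreading, so it is a toy instance of successive cancellation rather than the blind receiver of Fig. 3. The function and variable names are hypothetical.

```python
def sic_detect(received, gains):
    """Detect BPSK symbols device by device, strongest first.

    received: list of real samples containing the sum of all devices.
    gains: dict mapping each device name to its known positive amplitude.
    Returns a dict mapping each device to its detected +1/-1 symbols.
    """
    residual = list(received)
    detected = {}
    for device in sorted(gains, key=gains.get, reverse=True):
        # Treat the weaker devices as noise and take a hard decision.
        symbols = [1 if sample >= 0 else -1 for sample in residual]
        detected[device] = symbols
        # Rebuild the contribution of this device and cancel it.
        residual = [r - gains[device] * s for r, s in zip(residual, symbols)]
    return detected


sent = {"near": [1, -1, 1], "mid": [-1, -1, 1], "far": [1, 1, -1]}
gains = {"near": 4.0, "mid": 2.0, "far": 1.0}
received = [sum(gains[d] * sent[d][i] for d in sent) for i in range(3)]
print(sic_detect(received, gains) == sent)  # True
```

The example succeeds because each amplitude exceeds the sum of the weaker ones; when two devices have similar strength, the first hard decision becomes unreliable and the error propagates to the later stages.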
The matrix $\mathbf{R}_{N_s \times N_m/N_s}$ is defined after all $N_m/N_s$ OFDM symbols are received. Assuming, without loss of generality, that $x^{(1)}_{n,t}$ is the signal with the highest SINR in the $n$th subcarrier of the $t$th OFDM symbol, then (3) can be rewritten as
$$r_{n,t} = h^{(1)}_{n,t}\, x^{(1)}_{n,t} + \sum_{k=2}^{K} h^{(k)}_{n,t}\, x^{(k)}_{n,t} + w_{n,t}, \tag{4}$$
where the first term is the signal of the strongest device and the sum collects the interference from the remaining $K - 1$ devices. It is expected that the high SINR of the first signal to be decoded and the powerful Forward Error Correction (FEC) guarantee that the information is received without error. In order to proceed with the SIC algorithm, the column-wise Fast Fourier Transform (FFT) is performed on $\mathbf{R}_{N_s \times N_m/N_s}$, decoupling the subcarriers. The resulting matrix is organized in a column vector $\hat{\mathbf{m}}^{(1)}_{N_m \times 1}$. The despread block employs cross-correlation between the received vector and the known spreading codes to identify the code sequence employed by the device with the highest SINR.
In [20], [21], [22], the authors omit the procedure to blindly estimate the complex spreading sequence. In [22], the authors mention that there is a metric that can be used to estimate the spreading sequence; however, no further details are presented. Since the spreading sequence estimation is essential for recovering the data sent by the devices, this paper presents a detailed description of how these sequences can be estimated by the BS. In the procedure proposed in this paper, the BS receiver uses all known spreading sequences from $\mathbf{G}$ to recover the data from a given device. Each sequence leads to one version of the despread signal. All resulting sequences are detected using the DBPSK receiver, which does not require the CSI. After the DBPSK detection, each sequence is decoded by the FEC and the Cyclic Redundancy Check (CRC) of each recovered sequence is verified. The spreading sequence $\mathbf{s}^{(1)}$ that results in error-free data is assumed to be the sequence employed by the device during transmission. Notice that this procedure requires a received SINR that allows for the correct FEC decoding of the transmitted data. Hence, it is essential to select the signal from the device with the highest SINR as the first one to be received. This approach has good performance when one device has a higher SINR than the others, as presented in [17]. If two or more devices have similar SINR, the system performance decreases significantly. At the end of the described procedure, the information data block of the device with the highest SINR can be canceled from the received signal $\mathbf{R}$ in order to proceed with the detection of the information data block of the device with the second highest SINR among the remaining users. Therefore, the vector $\hat{\mathbf{b}}^{(1)}_{N_b \times 1}$, recovered by using $\mathbf{s}^{(1)}$, is encoded, modulated and spread. This signal is assumed to be equal to the one produced by the corresponding device and it must be subtracted from the received signal to cancel its interference on the other devices.
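The blind sequence search described above can be sketched in Python for a single device. The sketch makes several simplifying assumptions that are not taken from the paper: the FEC stage is omitted, a 32-bit CRC from `zlib.crc32` stands in for the unspecified CRC, the channel is one complex gain, and the candidates are tested in decreasing order of despread energy so that the best-matched sequence is checked first. All function names are hypothetical.

```python
import math
import zlib


def normalize(seq):
    norm = math.sqrt(sum(abs(c) ** 2 for c in seq))
    return [c / norm for c in seq]


def attach_crc(payload):
    """Append a 32-bit CRC (stand-in for the CRC of the real system)."""
    crc = zlib.crc32(bytes(payload))
    return payload + [(crc >> i) & 1 for i in range(32)]


def crc_ok(bits):
    return attach_crc(bits[:-32])[-32:] == bits[-32:]


def dbpsk_modulate(bits):
    symbols = [1 + 0j]  # reference symbol
    for bit in bits:
        symbols.append(-symbols[-1] if bit else symbols[-1])
    return symbols


def dbpsk_detect(symbols):
    """Differential detection: no channel knowledge is needed."""
    return [1 if (cur * prev.conjugate()).real < 0 else 0
            for prev, cur in zip(symbols, symbols[1:])]


def spread(symbols, seq):
    unit = normalize(seq)
    return [d * c for d in symbols for c in unit]


def despread(chips, seq):
    unit = normalize(seq)
    size = len(unit)
    return [sum(chips[i + j] * unit[j].conjugate() for j in range(size))
            for i in range(0, len(chips), size)]


def identify_sequence(chips, codebook):
    """Return (index, bits) of the first candidate whose CRC checks."""
    scored = []
    for index, seq in enumerate(codebook):
        symbols = despread(chips, seq)
        energy = sum(abs(v) ** 2 for v in symbols)
        scored.append((energy, index, symbols))
    for _, index, symbols in sorted(scored, key=lambda item: item[0],
                                    reverse=True):
        bits = dbpsk_detect(symbols)
        if crc_ok(bits):
            return index, bits
    return None, None


codebook = [(1, 1j, -1, 0), (1, -1, 1j, 1), (1 + 1j, 0, -1, 1j)]
payload = [1, 0, 1, 1, 0, 0, 1, 0]
chips = spread(dbpsk_modulate(attach_crc(payload)), codebook[1])
received = [complex(0.6, 0.5) * c for c in chips]  # unknown channel gain
index, bits = identify_sequence(received, codebook)
print(index, bits[:-32] == payload)
```

In this noiseless single-device setting, any candidate that is not orthogonal to the true sequence would also pass the CRC, because despreading only rescales the DBPSK symbols. This is why the sketch ranks the candidates by energy before checking the CRC.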
The IFFT is applied to the feedback information block to generate the matrix $\hat{\mathbf{X}}^{(1)}_{N_s \times N_m/N_s}$. This signal must be weighted by the channel response in order to be subtracted from $\mathbf{R}_{N_s \times N_m/N_s}$. Although CSI knowledge is not necessary to demodulate the received sequence, this information is necessary for the SIC algorithm. $\hat{\mathbf{X}}^{(1)}_{N_s \times N_m/N_s}$ and $\mathbf{R}_{N_s \times N_m/N_s}$ are used to estimate the channel gain between this device and the BS.
As suggested in [21], the LS estimator is used to estimate the channel gain as expressed in (5), where $t = 1, 2, \cdots, N_m/N_s$, and $\hat{\mathbf{x}}_t$ and $\mathbf{r}_t$ denote the vectors containing the $N_s$ points formed by the $t$th column of $\hat{\mathbf{X}}$ and $\mathbf{R}$, respectively. LS estimation provides a poor estimate of the channel gains, since the presence of noise is ignored in the estimation process. Hence, the LS estimation error propagates through the SIC receiver and compromises the performance of the MUD. According to [26], AI algorithms can be used to improve the overall channel estimation accuracy by reducing the LS estimation error and the impact of noise, which is dominant in the low SNR region. Motivated by these advantages, an optimized DNN architecture is employed on top of the LS estimation, where the improved version of the LS channel estimate is used to remove the interference of the detected device from the received sequence. The details of the DNN used to improve the LS estimation are presented next.

A. DNN-based channel estimator
DNNs are currently used to mitigate or solve several problems in telecommunications [27], [28], [29], including problems in NOMA schemes [30], [31], [32], [33]. In general, a DNN maps input features to predicted output values through a set of mathematical operations performed by multiple connected layers, each containing multiple processing units called neurons. The DNN input-output mapping is obtained by minimizing a loss function that measures the difference between the predicted and the true outputs in the training phase, where the DNN is trained on known input-output data pairs. During the training phase, the loss function is minimized by iteratively updating the DNN parameters, denoted as weights, to achieve the best possible performance. After that, the performance of the trained DNN is evaluated by its ability to predict the true outputs for new data inputs.
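Let $D$ be the number of DNN layers, with $J_\ell$ neurons in layer $\ell$, where $1 \le \ell \le D$ and $1 \le j \le J_\ell$. The output $\tilde{y}_{\ell,j}$ of neuron $(\ell, j)$ can be expressed as follows:
$$\tilde{y}_{\ell,j} = f_{\ell,j}\left(\tilde{\boldsymbol{\omega}}_{\ell,j}^{T}\, \tilde{\mathbf{i}}_{\ell} + \tilde{b}_{\ell,j}\right), \tag{6}$$
where $\tilde{\mathbf{i}}_{\ell} \in \mathbb{R}^{J_{\ell-1} \times 1}$, $\tilde{\boldsymbol{\omega}}_{\ell,j} \in \mathbb{R}^{J_{\ell-1} \times 1}$, and $\tilde{b}_{\ell,j}$ represent the neuron input, weight vector, and bias, respectively. Each neuron $(\ell, j)$ applies the activation function $f_{\ell,j}$ to a linear combination of its inputs plus the bias, as shown in (6). Similarly, the DNN layer output can be expressed as follows:
$$\tilde{\mathbf{y}}_{\ell} = \mathbf{f}_{\ell}\left(\tilde{\mathbf{W}}_{\ell}\, \tilde{\mathbf{i}}_{\ell} + \tilde{\mathbf{b}}_{\ell}\right), \tag{7}$$
where the rows of $\tilde{\mathbf{W}}_{\ell}$ are the weight vectors $\tilde{\boldsymbol{\omega}}_{\ell,j}^{T}$ and $\tilde{\mathbf{b}}_{\ell}$ stacks the biases of the layer. After defining the DNN architecture, the DNN should be trained on a known dataset in order to minimize the loss function that measures how far the predicted DNN outputs $\tilde{\mathbf{y}}$ are from the true outputs $\tilde{\mathbf{y}}_{\mathrm{T}}$. The most commonly used loss function in regression problems is the mean squared error (MSE), which can be defined as
$$\mathrm{MSE} = \frac{1}{N_{\mathrm{train}}} \sum_{i=1}^{N_{\mathrm{train}}} \left\lVert \tilde{\mathbf{y}}^{(i)} - \tilde{\mathbf{y}}_{\mathrm{T}}^{(i)} \right\rVert^{2}, \tag{8}$$
where $N_{\mathrm{train}}$ denotes the number of training samples. The main objective of the training is to update the total weight matrix $\tilde{\mathbf{W}}$ so that the MSE between the predicted and true DNN outputs is minimized. To do so, several DNN optimizers can be used, such as stochastic gradient descent and adaptive moment estimation (ADAM) [34].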
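The neuron, layer and loss computations described above can be made concrete with a dependency-free Python sketch. The ReLU activation used in the example is an assumption, since the activation of the proposed network is not stated in this section, and the function names are hypothetical. Training (the weight update itself) is not shown.

```python
def relu(value):
    """Example activation function (an assumption of this sketch)."""
    return max(0.0, value)


def layer_forward(inputs, weights, biases, activation):
    """Each neuron applies the activation to bias + dot(weights_row, inputs)."""
    return [activation(bias + sum(w * x for w, x in zip(row, inputs)))
            for row, bias in zip(weights, biases)]


def dnn_forward(inputs, layers):
    """Chain the layers; each layer is (weights, biases, activation)."""
    outputs = inputs
    for weights, biases, activation in layers:
        outputs = layer_forward(outputs, weights, biases, activation)
    return outputs


def mse(predicted, target):
    """Squared error summed per output vector, averaged over the samples."""
    total = sum(sum((p - t) ** 2 for p, t in zip(pred_vec, true_vec))
                for pred_vec, true_vec in zip(predicted, target))
    return total / len(predicted)


hidden = ([[1.0, 0.0], [0.0, -1.0]], [0.5, 0.0], relu)
output = ([[2.0, 1.0]], [0.0], lambda v: v)  # linear output layer
print(dnn_forward([1.0, 2.0], [hidden, output]))  # [3.0]
print(mse([[1.0, 2.0]], [[1.0, 0.0]]))            # 4.0
```

An optimizer such as stochastic gradient descent or ADAM would adjust the weight and bias values so that the `mse` value over the training pairs decreases.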
The proposed DNN-based channel estimation depends mainly on the LS channel estimates $\hat{\mathbf{h}}$ obtained from (5), where the DNN is employed as an additional non-linear processing unit on top of the LS estimation to correct the overall estimation error by learning the channel frequency domain characteristics, resulting in improved channel estimates.
The proposed DNN-based channel estimation proceeds in the following steps:
1) Initial LS estimation, as shown in (5).
2) The obtained $\hat{\mathbf{h}}$ is transformed from the complex-valued to the real-valued domain by stacking the real and imaginary parts vertically, such that $\hat{\mathbf{h}}_{\mathrm{R}} \in \mathbb{R}^{2N_s \times 1}$.
3) $\hat{\mathbf{h}}_{\mathrm{R}}$ is fed as an input to the proposed DNN.
4) The output of the DNN is transformed back to the complex-valued domain, such that $\hat{\mathbf{h}}_{\mathrm{DNN}} \in \mathbb{C}^{N_s \times 1}$.
The proposed DNN is trained using $N_{\mathrm{train}} = 8000$ training samples $(\hat{\mathbf{h}}_t, \mathbf{h}_t)$, where $\mathbf{h}_t$ denotes the perfect channel for the $t$th received symbol. The training dataset is generated at high SNR (40 dB), since the analysis provided in [26] shows that training at high SNR values leads to better DNN generalization at lower SNRs. At high SNR the impact of noise is low, so the DNN is able to learn more of the channel frequency domain characteristics. The mean squared error is chosen as the loss function, which is optimized using the ADAM optimizer. The testing datasets are generated using different SNR values, SNR = [4, 11.2, 18.4, 25.6, 32.8] dB. Experiments were performed on several DNN architectures in order to select the most efficient one in terms of performance and complexity. The simulations show that employing a DNN with one hidden layer is sufficient. Table I provides the parameters of the proposed DNN-based estimator.
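The complex-to-real conversion used around the DNN in the steps above can be sketched as follows. The trained network is represented by an arbitrary callable, here the identity, because its weights are not part of this text; the function names are hypothetical.

```python
def complex_to_real(h):
    """Stack real parts on top of imaginary parts (length doubles)."""
    return [v.real for v in h] + [v.imag for v in h]


def real_to_complex(h_real):
    """Inverse of complex_to_real."""
    half = len(h_real) // 2
    return [complex(h_real[i], h_real[half + i]) for i in range(half)]


def refine_estimate(h_ls, dnn):
    """Steps 2 to 4: stack, apply the DNN, and return to the complex domain."""
    return real_to_complex(dnn(complex_to_real(h_ls)))


h_ls = [1 + 2j, -3 + 0.5j]                   # toy LS estimate, two subcarriers
print(complex_to_real(h_ls))                 # [1.0, -3.0, 2.0, 0.5]
print(refine_estimate(h_ls, lambda v: v))    # identity "DNN" returns h_ls
```

With this layout, a network that receives a vector of 2n real values and returns 2n real values can be dropped in as `dnn` without any other change.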

IV. NUMERICAL RESULTS
The BLER is the main key performance indicator (KPI) for MUSA performance evaluation. In this paper, each device encodes $N_b$ data bits using a Polar Code, creating a codeword with length $N_c$. A reference bit for the DBPSK modulation is added to the codewords, resulting in a block of $N_d = N_c + 1$ bits, which is transmitted by the devices using OFDM symbols with $N_s$ subcarriers, as described in Section II. Each device can spread its data using one of the $M$ complex spreading sequences, each one with length $L$. The sequences employed in this paper have been chosen to provide acceptable autocorrelation properties, allowing the BS to blindly estimate the sequences employed by the devices. On the receiver side, the blocks are recovered by the BS as described in Section III. After the MUSA MUD, the codewords are processed by the Polar decoder. The codewords that cannot be successfully corrected are counted in the BLER. Table II brings the parameters used for the simulations presented in this section. Since the length of the spreading sequence is $L = 4$, the spread block has $N_m = 192 \cdot 4 = 768$ elements. These elements are mapped into $N_m/N_s = 64$ OFDM symbols with $N_s = 12$ subcarriers each. Therefore, one MUSA frame occupies the same Physical Resource Block (PRB) during 9 time-slots in a time-frequency grid, following the LTE structure. A time-varying frequency-selective channel based on the Tapped Delay Line (TDL)-D model [35] was used in the simulations, assuming independent block fading for each PRB. Finally, it is assumed that all users transmit a data block in every MUSA frame. The device with the highest SNR achieves 40 dB, while the lowest SNR is 4 dB. The SNR of the devices is uniformly distributed between these upper and lower limits. Fig. 4 shows the BLER performance of the proposed MUSA framework under different overload values, defined as $\Omega = K/L \cdot 100\%$, where $K$ is the number of devices. Both AWGN and TDL-D channels have been considered.
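The frame arithmetic and the overload definition used above can be checked with a short Python sketch. The function names are hypothetical, and the overload is computed as the number of devices divided by the spreading length, which agrees with the figure of 10 users at 250% quoted for a length-4 sequence.

```python
def frame_dimensions(codeword_len, spread_len, num_subcarriers):
    """Return (modulated block length, spread block length, OFDM symbols)."""
    modulated = codeword_len + 1          # one DBPSK reference symbol
    spread_block = modulated * spread_len
    if spread_block % num_subcarriers:
        raise ValueError("spread block does not fill whole OFDM symbols")
    return modulated, spread_block, spread_block // num_subcarriers


def overload_percent(num_devices, spread_len):
    """Overload factor in percent."""
    return 100.0 * num_devices / spread_len


print(frame_dimensions(191, 4, 12))   # (192, 768, 64)
print(overload_percent(10, 4))        # 250.0
```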
The perfect channel estimation is denoted MUSA-ideal estimation, while MUSA-LS and MUSA-DNN denote the cases where LS and DNN-based estimation were employed, respectively. The acceptable BLER target is 10%. Fig. 4 shows that MUSA has good BLER performance over the AWGN channel, since the channel response is flat and no frequency-response estimation is required. Hence, the SIC algorithm does not suffer from error propagation and high overloading factors, up to 450%, are possible at the target BLER.
When the TDL-D channel is considered, Fig. 4 shows that the MUSA BLER performance decreases, even when CSI is assumed to be available at the BS. In this case, the maximum overloading factor that can be achieved at the target BLER is 350%. If the LS estimator is employed, the MUSA BLER performance becomes unacceptable and the system is considered to be inoperative even for overloading factors below 100%. The DNN-based estimation significantly improves the performance of the MUSA scheme over TDL-D channels. The improved channel estimation provided by the DNN reduces the error propagation and allows the SIC algorithm to properly cancel the interference among the devices. Applying the DNN as a post-processing unit on top of the LS estimation leads to a significant normalized mean squared error (NMSE) improvement, as shown in Fig. 5, allowing MUSA to achieve an overload factor of 250% over the TDL-D channel, which means that up to 10 users can share the 4 PRBs of the MUSA frame at the target BLER. This means that the DNN-based estimation brought the MUSA system from the inoperative state to a situation where a considerable spectrum efficiency gain can be achieved. The use of DNNs in other processes in the receiver chain can further improve the overall MUSA system performance, bringing it closer to the performance achieved with perfect channel estimation, and this will be the topic of future research efforts. Fig. 6 shows the performance of MUSA-DNN for different SNR conditions. In this case, SNR $= \alpha E_b/N_0$ is used in the simulations. Notice that, as expected, the MUSA-DNN performance decreases as the SNR decreases, resulting in a smaller overloading factor at the target BLER. For $\alpha = 0.715$ and $\alpha = 0.550$, the achieved overload factors are 200% and 150%, respectively.
In other words, when the SNR decreases, the system needs to allocate fewer users in the same transmission resource in order to achieve the BLER target, decreasing the system throughput. On the other hand, increasing the SNR improves the BLER performance curves, but it does not improve the overload factor at the target BLER. For $\alpha > 1$, the maximum overload factor is 250% for a BLER of $10^{-1}$.
The impact of the code rate on the MUSA performance was also evaluated. In this scenario, the codeword length is $N_c = 191$ and different numbers of data bits $N_b$ are used, changing the code rate. Fig. 7 depicts the MUSA performance in terms of BLER for $N_b \in [64, 32, 16]$. As expected, when the code rate is decreased, better BLER performance is achieved because of the higher error correction capability of the Polar Code, at the cost of lower throughput per device. With a code rate of $R = 16/191 = 0.08377$, it is possible to allocate 12 users transmitting their data in the same resource elements and still achieve a BLER below 10%, at the cost of a data rate reduction of 4 times when compared with the scenario presented in Fig. 4. Finally, a comparison between the performance of MUSA and Orthogonal Frequency Division Multiple Access (OFDMA) is presented below. It is worth mentioning that OFDMA is an orthogonal multiple access scheme. Fig. 8 illustrates the main difference between the time-frequency resource allocation for MUSA and OFDMA. While MUSA uses the overload factor and allows all users to transmit their data in all time-frequency resources, OFDMA divides the time-frequency resources among the users in an orthogonal manner. In this paper, a block containing 64 OFDM symbols, each one with $N_s = 12$ subcarriers, was defined to compare the performance of the two schemes.
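The code rate figures quoted above can be reproduced with a few lines of Python. The comparison baseline of 64 information bits is an assumption of this sketch: it is the largest of the three block sizes and the value implied by the stated four-fold data rate reduction, but it is not given explicitly in this section. The function name is hypothetical.

```python
def code_rate(info_bits, codeword_len):
    """Ratio between information bits and codeword length."""
    return info_bits / codeword_len


CODEWORD_LEN = 191
for info_bits in (64, 32, 16):
    print(info_bits, round(code_rate(info_bits, CODEWORD_LEN), 5))

# Data rate reduction of the lowest rate relative to the assumed baseline.
reduction = code_rate(64, CODEWORD_LEN) / code_rate(16, CODEWORD_LEN)
print(round(reduction, 6))  # 4.0
```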
In MUSA, the devices spread their data over the entire block. In OFDMA, a device must transmit its data on specific positions of the time-frequency grid. Furthermore, the devices in the OFDMA scheme must employ pilot subcarriers to estimate the channel response and must follow the synchronization procedure and resource allocation coordinated by the BS. These processes require high processing capabilities and also demand power for synchronization and channel estimation signalling, reducing the power and spectrum efficiencies of the system. Fig. 9 shows the BLER for MUSA-DNN and OFDMA with 6 and 8 users for different values of SNR. Because OFDMA does not suffer from multi-user interference, its performance improves as the SNR increases. MUSA, on the other hand, suffers from an error floor due to the multi-user interference. However, it is interesting to observe that for the

V. CONCLUSION
The advent of IoT applications will require a massive number of power-limited devices to be connected to the mobile network. NOMA techniques can provide the flexibility to increase the spectrum efficiency without increasing the complexity of these devices, and MUSA is an interesting candidate. Previous papers have shown that MUSA can achieve a very high overloading factor without including any information in the transmitted data that aids the channel estimation or the estimation of the spreading sequences employed by the devices. However, the procedures performed by the receiver at the BS were not detailed. In this paper, a complete framework for MUSA is described. It has been shown that the MUSA performance can be severely penalized by doubly dispersive channels. This paper has also shown that the use of a DNN benefits the quality of the channel estimation, improving the overall system performance and allowing MUSA to achieve a reasonable overloading factor. AI algorithms can be used in other processes in the receiving chain, which might further improve the BLER performance or reduce the system complexity on the BS side, making MUSA an interesting candidate for future IoT applications over mobile networks.