SDMA Grouping Based on Unsupervised Learning for Multi-User MIMO Systems

Abstract—In this study, we investigate a spatial division multiple access (SDMA) grouping scheme to maximize the total data rate of a multi-user multiple input multiple output (MU-MIMO) system. Initially, we partition the set of mobile stations (MSs) into subsets according to their spatial compatibility. We explore different clustering algorithms, comparing them in terms of computational complexity and capability to partition MSs properly. Then, we schedule MSs from the different subsets to build the SDMA group. Since we consider a scenario with a massive array of antenna elements operating in the mmWave band, we employ a hybrid beamforming scheme and analyze its behavior in terms of the total data rate. The analog and digital precoders exploit the channel information obtained from clustering and scheduling, respectively. The simulation results indicate that a proper partition of MSs into clusters can exploit the spatial compatibility effectively and reduce the multi-user (MU) interference. The hierarchical clustering (HC) improves the total data rate by 25% compared with the baseline approach, while the density-based spatial clustering of applications with noise (DBSCAN) increases the total data rate by 20%.

Index Terms—…, multiple input multiple output (MIMO), and hybrid beamforming.

I. Introduction

Moreover, we evaluate the role of multi-antenna techniques in 5G. We exploit space-division multiple access (SDMA) to allow spatial multiplexing, i.e., the division of user equipments (UEs) into groups that share radio resources. The performance of an SDMA group depends on how efficiently the MU interference is mitigated, which is directly determined by the channel characteristics of the selected UEs. Notably, the selection of UEs into spatially compatible SDMA groups can improve the total data rate; otherwise, the signals sent to the UEs may interfere with each other and degrade the system performance. Therefore, the SDMA group composition impacts the system operation efficiency [3].
SDMA-orthogonal frequency division multiple access (OFDMA) systems can allocate resources in the time, frequency, and space dimensions to different UEs. In this context, the large number of degrees of freedom leads to highly complex radio resource allocation (RRA) problems [3], [4]. In particular, SDMA grouping can be classified as an integer optimization problem, since it involves integer variables, such as the number of UEs or of time-frequency resource blocks. Usually, these problems have combinatorial behavior, which implies high complexity and calls for an exhaustive search in order to obtain an optimal solution [3]. The RRA problems can be divided into subproblems (e.g., frequency assignment, power allocation, SDMA grouping) to reduce the overall complexity. In this case, each subproblem considers a different dimension of the RRA problem. However, the resulting subproblems still hold a considerably high level of complexity. Thus, suboptimal strategies are often proposed in the literature to solve them [4].
In particular, for the SDMA grouping problem, two methodologies stand out: i) iterative SDMA grouping; and ii) clustering followed by UE scheduling. In [3], [5], there is a direct iterative formation of groups, i.e., each UE is included sequentially in the group according to the spatial compatibility between the candidate UE and the UEs already admitted to the SDMA group. Another approach, used in [6], [7], first partitions all UEs of the system into subsets, also called clusters, and afterwards schedules UEs from different clusters to compose the SDMA group.
In [8], the authors explore the K-means clustering (KMAC), a classical unsupervised machine learning algorithm, to split UEs according to second-order channel statistics computed from a given mathematical expression. Moreover, a scheduling scheme that combines load balancing and precoding design under proportional fairness is also considered. In our work, we also consider the KMAC algorithm, but in a more realistic system-level simulation to determine the channel matrices and calculate the covariance matrices. In [9], the authors evaluate other unsupervised machine learning algorithms to partition UEs. They evaluate the agglomerative hierarchical clustering (AHC) and K-medoids clustering (KMDC) considering different similarity measures, such as weighted likelihood, subspace projection, and Fubini-Study. In our work, we evaluate the AHC algorithm in a much more challenging scenario, since we consider a larger number of UEs in a random arrangement within the cell. In [10], the authors investigate an SDMA grouping problem in MU-MIMO in the context of 5G networks. They evaluate a partitioning process based on KMAC and a scheduling algorithm based on branch and bound to select UEs from the clusters to compose an SDMA group that supports multiple spatial streams per cluster.
We consider a MU-MIMO system based on joint spatial division and multiplexing (JSDM). Initially proposed in [11], the main idea of the JSDM scheme is to partition UEs into clusters according to second-order channel statistics and to serve these groups using a downlink hybrid beamforming scheme. It is a two-stage beamforming that combines an analog precoder, defined according to the partitioning process, with a digital precoder, which is a function of the effective channels of the UEs that compose the SDMA group. The JSDM framework has been extended in [6], where practical issues are addressed, such as evaluation in a realistic setting where UEs have different angles of arrival and angular spreads.
In [12], we investigated the maximization of the total sum rate of an MU-MIMO system based on an SDMA scheme combined with hybrid beamforming. The simulation results indicated that the combination of a proper partition of UEs into clusters based on KMAC and suitable scheduling based on best-fit provided a technique able to exploit spatial compatibility more effectively and reduce the MU interference. The present work takes further steps in this research topic. We explore different clustering algorithms, comparing them in terms of computational complexity and capability to partition UEs properly.
Hybrid beamforming is a cost-effective alternative to implement massive MIMO in the millimeter-wave spectrum. The hybrid beamforming design requires the optimization of the system performance under several hardware constraints, such as the number of radio frequency (RF) chains and the dimension of the phase shifter network [13]. As mentioned previously, hybrid beamforming has a digital and an analog component. On the one hand, the digital component can be performed for each UE on each sub-carrier. On the other hand, several UEs and sub-carriers share the analog component, since it is a post-inverse fast Fourier transform (IFFT) operation. The structure of the analog component influences the hardware efficiency and has a notable influence on the spectral efficiency [13].
In the proposed hybrid beamforming scheme, we define the analog precoder according to the characteristics of each cluster. Therefore, the design of the clustering algorithm has a critical impact on the achievable performance of the MU-MIMO system. Based on that, we evaluate different unsupervised learning algorithms, analyzing their operating principles, and their main advantages and disadvantages.
Motivated by the above discussion, we investigate an SDMA grouping algorithm in a millimeter-wave MU-MIMO scenario. The main contributions of this work can be summarized as follows: 1) evaluation of different unsupervised machine learning algorithms to partition UEs into proper subsets; 2) evaluation of a hybrid precoding scheme based on JSDM and its impact on the total data rate.
The remainder of this work is organized as follows. Section II describes the assumed system model. Section III discusses the main design aspects of the evaluated SDMA grouping solution. In Section IV, we discuss the proposed hybrid precoding scheme. Finally, performance results are shown in Section V, and the main conclusions are drawn in Section VI.

II. System Model
We consider the downlink of a multi-user multiple input multiple output (MU-MIMO) system based on OFDMA. The system is composed of one base station (BS) and a set J of UEs. The transmitter uses multiple antennas to send data streams to the antennas at each receiver. Before transmission, for a given resource block (RB) and transmission time interval (TTI), the symbol vector x ∈ C ×1 is filtered by the precoding matrix F ∈ C × . The filtered symbols are then transmitted through the channel associated with the RB, whose response is represented by H ∈ C × . Thus, the prior-filtering receive vector y ∈ C ×1 at the receiver is given by (1), where P ∈ R × is the power matrix, given by a scaled identity I, where the scaling is the transmit power allocated to each stream associated with the UE; the second term on the right-hand side of (1) represents the MU interference, also known as intracell interference, caused by any UEs sharing the same RB; z ∈ C ×1 is the additive Gaussian noise vector, whose elements are independent and identically distributed (IID) as CN(0, σ_z²). The input symbol vector is normalized so that E{x x^H} = I. The channel coefficient of a given RB corresponds to that associated with the middle sub-carrier and the first transmitted orthogonal frequency division multiplexing (OFDM) symbol in a TTI or a sub-frame. Thus, we consider that the channel remains constant during resource allocation in a TTI and over an RB. Moreover, we assume that the required channel state information (CSI) is available at the transmitter and receivers.
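The receive-signal relation above can be sketched numerically. The following minimal NumPy example is an illustration only: the dimensions (8 BS antennas, 2 UE antennas, 2 streams), the unit-norm precoder, and the noise level are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M_T, M_R, S = 8, 2, 2  # assumed: BS antennas, UE antennas, streams

# Rayleigh channel for one RB, random precoder, equal-power diagonal P
H = (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
F = rng.standard_normal((M_T, S)) + 1j * rng.standard_normal((M_T, S))
F /= np.linalg.norm(F)                       # normalize total transmit power
P = np.sqrt(0.5) * np.eye(S)                 # equal power per stream (assumed)
x = (rng.standard_normal(S) + 1j * rng.standard_normal(S)) / np.sqrt(2)
z = 0.1 * (rng.standard_normal(M_R) + 1j * rng.standard_normal(M_R))

# prior-filtering receive vector (single-UE term; MU interference omitted here)
y = H @ F @ P @ x + z
```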
At the receiver, the vector y is filtered by the decoding matrix G ∈ C × . Therefore, the post-filtering receive vector ŷ ∈ C ×1 is given by (2). We consider a multipath wireless channel model [14] to describe the channel response matrix H. It can be expressed in terms of the underlying physical paths as in (3), where, for each propagation path: one term is the path gain; another is the shadowing, modeled as a log-normal random variable with a given standard deviation; the model also includes the angle of arrival (AoA) at the receiver, the angle of departure (AoD) at the transmitter, the relative delay, the frequency of the central sub-carrier of the RB, the Doppler shift, the receive response vector v ∈ C ×1 , and the transmit steering vector v ∈ C ×1 . The multipath wireless channel model is represented in Fig. 1.

III. SDMA Grouping Design
In our study, the SDMA grouping problem is performed in two steps. In the first step, we partition the set J of all UEs of the system into a set of clusters C according to the spatial compatibility of their channels. In the second step, we select one UE from each cluster to compose the SDMA group G. The following subsections detail the most relevant design aspects of these steps.

A. Clustering in MU-MIMO Systems
Clustering is a class of unsupervised machine learning algorithms that aims to partition entities according to the similarity among them, i.e., to determine subsets of entities with similar characteristics without any external intervention [15]–[17]. Therefore, we employ clustering algorithms to divide the population J of UEs into a set of clusters C = {C 1 , · · · , C }. The UEs belonging to the same cluster are quite similar, while UEs from distinct clusters are somewhat different.
In our work, we characterize each UE according to its channel covariance eigenspace. Given the channel matrix H ∈ C × of the th UE, the sample transmit covariance matrix H̄ ∈ C × is given by (4), where the TTI window size indicates the number of channel matrix samples considered in the averaging process and H^H is the conjugate transpose of the channel matrix. The eigendecomposition of H̄ can be written as (5), where D ∈ C × defines the matrix composed of eigenvectors and the second factor is the diagonal matrix of eigenvalues. Hence, the similarity among entities is defined according to the dominant eigenvector d ∈ C ×1 of each UE, i.e., the eigenvector associated with the highest eigenvalue of H̄.
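This characterization can be sketched with NumPy: average the per-sample covariances over a TTI window and extract the dominant eigenvector. The dimensions (16 transmit antennas, 2 receive antennas, window of 50 samples) and the random channel draws are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M_T, M_R, T = 16, 2, 50  # assumed: transmit antennas, receive antennas, TTI window

# T channel samples for one UE, each of shape (M_R, M_T)
samples = [rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))
           for _ in range(T)]

# sample transmit covariance: average of H^H H over the window
R = sum(H.conj().T @ H for H in samples) / T

# R is Hermitian, so eigh applies; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(R)
d = eigvecs[:, -1]  # dominant eigenvector (largest eigenvalue)
```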
According to [18], the number of possible ways of partitioning UEs into a set of clusters (non-empty and disjoint subsets of UEs) is given by the Stirling number of the second kind, which can be written as (6). The exhaustive search for suitable clusters consists of evaluating all possibilities of partitioning the population of UEs into the set of clusters. In practice, this brute-force approach is unfeasible, with a prohibitive computational complexity O( / !) [18]. Consequently, we evaluate algorithms that represent the main practical paradigms of cluster analysis, namely K-means clustering (KMAC), agglomerative hierarchical clustering (AHC), and density-based spatial clustering of applications with noise (DBSCAN) [19]–[21].
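The growth of the Stirling number of the second kind illustrates why exhaustive search is impractical even for small populations. A short recursive sketch, using the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind: ways to partition n items
    into k non-empty, disjoint subsets."""
    if k == 0:
        return 1 if n == 0 else 0
    if k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(10, 3))  # 9330 partitions for just 10 UEs and 3 clusters
```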
1) K-Means Clustering: initially proposed in [22], it employs a greedy iterative approach to find a partition that minimizes the distance between the entities that belong to each cluster and the cluster average value, called the centroid. In our model, the centroid is a vector v ∈ C ×1 that describes the central characteristic of the th cluster C , which is given by (7), considering the UEs that currently belong to cluster C . The pseudo-code of the KMAC is presented in Algorithm 1. The first step of the algorithm is the initialization of the centroids. In our study, we randomly select the dominant eigenvectors of a subset of the UEs as initial cluster centroids. For more details on the impact of centroid initialization on the KMAC algorithm, please refer to [23]. Each iteration of the algorithm consists of a cluster assignment followed by a centroid update. Given the group of centroids provided at the previous iteration, in the assignment step each UE ∈ J is assigned to the cluster C★ with the closest mean, according to Eq. (8).

Algorithm 1 K-Means Clustering
1: ← 0
2: for all ∈ {1, · · · , } do
3:   initialize centroid v ∈ C ×1
4: repeat
5:   ← + 1
6:   for all ∈ {1, · · · , } do
7:     initialize cluster C ← ∅
8:   for all ∈ {1, · · · , } do
9:     for all ∈ {1, · · · , } do
10:      assign the th UE to the closest cluster C★ according to Eq. (8)
11:  for all ∈ {1, · · · , } do
12:    update centroid v ∈ C ×1 according to Eq. (7)
13: until the convergence condition Eq. (9) holds or the maximum number of iterations is reached
We define a threshold > 0 and test at every iteration whether there is any significant change of the centroids in comparison with the previous iteration, as expressed in Eq. (9). In the centroid update step, new centroid values v are computed for each cluster from the UEs in C using Eq. (7). The assignment and centroid update steps are carried out until convergence is reached. The output of the algorithm is a clustering of the UEs into disjoint clusters C and the set of vectors obtained as the centroids of the clusters. Based on these outputs, we determine the analog precoder, as described in Section IV. The computational complexity of the KMAC algorithm based on the pseudo-code in Algorithm 1 is linear in the number of iterations. More details are provided in Appendix A.
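The assignment and update steps of Algorithm 1 can be sketched as below. The feature matrix D stacks one dominant eigenvector per UE; here the eigenvectors are randomly generated placeholders, and Euclidean distance on the complex vectors is an assumption of this sketch.

```python
import numpy as np

def kmeans(D, C, iters=100, tol=1e-6, seed=0):
    """Minimal K-means sketch over the rows of D (one eigenvector per UE).
    Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    J = D.shape[0]
    # initialize centroids as randomly chosen UE eigenvectors
    centroids = D[rng.choice(J, size=C, replace=False)]
    for _ in range(iters):
        # assignment step: each UE joins the cluster with the closest centroid
        dist = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # centroid update step: mean of the members of each cluster
        new = np.array([D[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(C)])
        if np.linalg.norm(new - centroids) < tol:  # convergence test
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(2)
D = rng.standard_normal((100, 16)) + 1j * rng.standard_normal((100, 16))
labels, centroids = kmeans(D, C=3)
```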
2) Agglomerative Hierarchical Clustering: it performs a hierarchical clustering employing a bottom-up strategy, i.e., each UE starts as a cluster C = { }, and the most similar pairs of clusters are successively merged until the target number of clusters is reached. The merging of clusters represents the creation of a new cluster in a higher hierarchical level. The linkage rule determines the merge strategy. In our work, we evaluate the complete, average, and Ward linkage rules.
The complete linkage defines the distance between the clusters C and C as the maximum distance between a UE in cluster C and a UE in cluster C , which can be written as (10). The average linkage is defined as the average pairwise distance between the UEs from different clusters, as written in (11). The Ward linkage defines the distance between the clusters C and C as the increase in the sum of squared errors (SSE) when the two clusters are merged. The SSE of the cluster C is defined as (12), where v represents the centroid of the th cluster, calculated according to Eq. (7). The SSE for a set of clusters C = {C 1 , · · · , C } can be written as the sum of the per-cluster SSE values. Thus, the Ward linkage distance is the net change in the SSE value when the clusters are merged, as written in (13).

Algorithm 2 Agglomerative Hierarchical Clustering
1: ← 0
2: initialize each UE as a cluster C = { }
3: compute the distance between clusters {C , C }
4: repeat
5:   ← + 1
6:   determine the closest pair of clusters {C , C }
7:   merge the clusters C ← C ∪ C
8:   update the set of clusters C ( ) = C ( −1) \{C ∪ C } ∪ C
9:   update the distance among clusters
10: until |C| equals the target number of clusters

Algorithm 2 describes the pseudo-code of the AHC. Given a set of clusters C ( ) = {C 1 , · · · , C }, at each iteration we determine the most similar pair of clusters C and C according to the linkage rule. These clusters are merged into a new cluster C . We then update the set of clusters, C ( ) = C ( −1) \{C ∪ C } ∪ C . The process is repeated until the set of clusters C reaches the desired size.
Based on the pseudo-code in Algorithm 2, the computational complexity of the AHC algorithm is cubic in the number of UEs. More details are provided in Appendix B.
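A naive implementation of the merge loop in Algorithm 2 can be sketched as follows (complete and average linkage only; Ward is omitted for brevity). The nested scan over cluster pairs mirrors the cubic complexity discussed above; the two-blob demo data are an assumption for illustration.

```python
import numpy as np

def ahc(X, n_clusters, linkage="complete"):
    """Minimal agglomerative clustering sketch: repeatedly merge the
    closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]          # each point starts alone
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])]
                # complete linkage: max pairwise distance; average: mean
                d = d.max() if linkage == "complete" else d.mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)               # merge the closest pair
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels = ahc(X, n_clusters=2)
```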
3) Density-Based Spatial Clustering of Applications with Noise: initially proposed in [24], it is a clustering algorithm that exploits the local density of entities to determine clusters, instead of using the distance among them.
The -neighborhood of the th UE is defined as the set of UEs whose Euclidean distance from it is smaller than the radius parameter, as written in (14). The th UE is classified as a core point if there is at least a minimum number of UEs in its -neighborhood, i.e., if |N ( )| meets the threshold. This parameter defines the local density, i.e., the frequency threshold that allows the classification of a UE as a core point. If a UE does not meet the frequency threshold but still belongs to the neighborhood of a core point, it is classified as a border point. If the th UE does not meet any of the previously described criteria, it is called a noise point [25].
The th UE is directly density reachable from a core point UE if it belongs to that core point's -neighborhood. The th UE is called density reachable from another UE if there is a chain of core points leading from one to the other. Furthermore, two UEs are defined as density connected if they are simultaneously density reachable from the same core point. A density-based cluster is defined as the maximal set of density connected points [18].
The pseudo-code for the DBSCAN is shown in Algorithm 3. Initially, we compute the -neighborhood of each UE and check whether it is a core point. Then, we update the set of core points C core . In the following, the commands between lines 7 and 15 perform the assignment of UEs to clusters. Some border UEs may be reachable from core UEs in more than one cluster; we arbitrarily assign each of them to one of the clusters, since overlapping clusters are not allowed. Those UEs that do not belong to any cluster are treated as outliers or noise. The computational complexity of the DBSCAN algorithm is quadratic in the number of UEs. More details are provided in Appendix C.

Algorithm 3 DBSCAN
…
5: if |N ( , )| meets the density threshold then
6:   update the set of core points C core ← C core ∪ { }
…
12: for all ∈ N ( , ) do
13:   assign the th UE to the th cluster
14:   if ∈ C core then
15:     return to line 12
16: update the set of clusters C ← {C 1 ∪ C 2 ∪ · · · ∪ C }
17: update the set of noise points C noise ← { ∈ J | unassigned}
18: update the set of border points C border ← J \{C core ∪ C noise }
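The core/border/noise logic described above can be sketched compactly; `eps` and `min_pts` stand in for the two unlabeled density parameters in the text, and the two-blob-plus-outlier demo data are an assumption for illustration.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: labels[i] is a cluster id, -1 marks noise."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]   # density test
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cid                     # grow a new cluster from a core point
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cid             # core or border point joins the cluster
                if core[j]:
                    stack.extend(neighbors[j])  # expand only through core points
        cid += 1
    return labels

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),   # dense blob -> cluster 0
               rng.normal(5, 0.1, (20, 2)),   # dense blob -> cluster 1
               [[10.0, 10.0]]])               # isolated point -> noise
labels = dbscan(X, eps=0.5, min_pts=5)
```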

B. Scheduling Algorithm
In the following, given the partition of UEs into a set of clusters C, we select one UE ★ ∈ C from each cluster to create an SDMA group G containing one UE (or stream) per cluster. We employ a random scheduling algorithm. That is, the BS randomly selects one UE from each cluster C to compose G. Therefore, the selection of the components of the SDMA group occurs without the assessment of any compatibility or suitability criteria.
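The random scheduling step amounts to one draw per cluster. A toy sketch, where the cluster contents are hypothetical UE indices:

```python
import numpy as np

rng = np.random.default_rng(5)
# assumed cluster -> UE-id mapping, produced by any of the clustering algorithms
clusters = {0: [3, 7, 12], 1: [1, 5], 2: [2, 8, 9, 11]}

# random scheduling: pick one UE per cluster, with no compatibility check
group = [int(rng.choice(ues)) for ues in clusters.values()]
```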

IV. Hybrid Beamforming Design
In our study, we adopt a system with a hybrid precoding scheme based on JSDM. Thus, we consider a two-stage precoder composed of an analog and a digital component. In the following, we detail our main design assumptions.

A. Analog Precoder Design
The analog precoder F RF ∈ C × is defined from the centroids of the clusters C = {C 1 , · · · , C } and can be written as (15), where v ∈ C ×1 is calculated according to Eq. (7). The analog precoder is assumed to have elements of equal magnitude, i.e., only phase shifting is performed in the analog domain.
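The equal-magnitude constraint can be imposed by keeping only the phase of each centroid entry, one column per cluster. The dimensions (16 antennas, 3 clusters) and the random centroids below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
M_T, C = 16, 3  # assumed: BS antennas, clusters (= RF chains)
centroids = rng.standard_normal((C, M_T)) + 1j * rng.standard_normal((C, M_T))

# analog precoder: one column per cluster centroid, phase-only entries
F_RF = np.exp(1j * np.angle(centroids.T)) / np.sqrt(M_T)
```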

B. Digital Precoder Design
For each selected UE ∈ G in the SDMA group, we consider that a matched filter on the dominant receive eigenmode of the UE's channel is employed at the receiver side, obtained from the singular value decomposition (SVD) of the channel H ∈ C × of the th UE in the SDMA group G, where u 1 is its dominant left-singular vector. Given the total number G of receive antennas in the SDMA group G, we define the group channel matrix H G ∈ C G × and the decoder matrix G ∈ C × G based on the channel matrices of the selected UEs. Therefore, given the block diagonal decoder matrix G, the group channel matrix H G , and the analog precoder F RF , the equivalent channel matrix H eq ∈ C × is given by (16). There are different precoding techniques that either totally or partially suppress the spatial interference or ignore it. We evaluate the zero-forcing (ZF) filter as the digital precoder. The ZF precoding is conceived to decorrelate the transmit signals so that the signal at every receiver output is free of interference. The precoding matrix is defined as (17). The total power constraint is enforced by normalizing the digital and analog filters, such that ‖F RF F BB P G ‖² equals the power for a given RB, where P G ∈ R × is the block diagonal power matrix resulting from the combination of the power matrices of each UE belonging to the SDMA group. We consider that the number of clusters is equal to the number of RF chains and streams. Therefore, the dimensions of F RF and F BB are compatible with the dimension of F, so that F = F RF F BB ∈ C × .
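The ZF baseband precoder can be sketched as the right pseudo-inverse of the equivalent channel, which makes each receiver output interference-free. The equivalent channel below is a random stand-in; dimensions (3 streams, 3 RF chains) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
M_T, C, S = 16, 3, 3  # assumed: BS antennas, RF chains, streams

# phase-only analog precoder and a random equivalent channel H_eq = G^H H_G F_RF
F_RF = np.exp(1j * rng.uniform(0, 2 * np.pi, (M_T, C))) / np.sqrt(M_T)
H_eq = rng.standard_normal((S, C)) + 1j * rng.standard_normal((S, C))

# zero-forcing baseband precoder: right pseudo-inverse of the equivalent channel
F_BB = H_eq.conj().T @ np.linalg.inv(H_eq @ H_eq.conj().T)
```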
The post-filtering receive vector of the group ŷ G ∈ C ×1 is given by (18), where x G ∈ C ×1 is the group symbol vector and z G ∈ C G ×1 is the group noise vector. Defining Q = GH G F P G ∈ C × , the average signal-to-interference-plus-noise ratio (SINR) perceived by a stream can be calculated as (19), where σ² is the average noise power. The data rate of a stream is calculated according to the Shannon capacity formula [26] and is given by (20), as a function of the bandwidth of the RB.
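The per-stream SINR and Shannon rate can be sketched from the effective matrix Q: the diagonal carries the desired signal, the off-diagonal entries the residual MU interference. The matrix Q, the bandwidth, and the noise power below are assumed values.

```python
import numpy as np

rng = np.random.default_rng(8)
S, B, noise_power = 3, 1.44e6, 1e-3  # assumed: streams, RB bandwidth (Hz), noise

Q = rng.standard_normal((S, S)) + 1j * rng.standard_normal((S, S))
Q2 = np.abs(Q) ** 2

# per-stream SINR: desired power over residual interference plus noise
sinr = np.diag(Q2) / (Q2.sum(axis=1) - np.diag(Q2) + noise_power)
rate = B * np.log2(1 + sinr)          # Shannon rate per stream (bit/s)
```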

V. Performance Evaluation
We consider a single cell system with a carrier frequency of 28 GHz and a bandwidth of 100 MHz. Consequently, according to the 5G numerology [27, Table 2], this carrier frequency implies a set of 125 RBs, each one composed of 12 sub-carriers spaced 60 kHz apart. Furthermore, the number of sub-frames per frame is 10, each sub-frame has 14 symbols, and the TTI duration is 0.25 ms. We assume the one-ring scattering channel [28], with the propagation effects modeled according to the urban micro (UMi) street canyon deployment proposed in [29]. Therefore, the path loss parameters are listed in Table I.
Initially, we define the appropriate number of clusters to partition the set of UEs in our simulation scenario. The determination of the optimal number of clusters is out of the scope of this work. For more details regarding this issue, the reader may refer to [30]- [32] and references therein.
In our study, we adopt the Calinski-Harabasz index (CHI) as the evaluation metric of clustering performance. Also called the variance ratio criterion, it measures the cluster validity according to the ratio of the between-cluster to the within-cluster dispersion [33]. The CHI Ω of the partition of UEs into clusters can be written as (21), as a function of the inter-cluster variance and the intra-cluster variance. We consider a MU-MIMO scenario where the BS is equipped with a uniform linear array (ULA) with 64 antennas separated by half a wavelength, and each UE is equipped with a ULA with 2 antennas, also separated by half a wavelength. We assume a set of 100 UEs arranged inside a 120° sector of the cell according to a random uniform distribution. Figure 2 indicates the average value of Ω considering partitions defined by the clustering algorithms KMAC and AHC with the Ward linkage, for a number of clusters varying from 2 to 10. On the one hand, the KMAC algorithm presents an almost constant Ω for all numbers of clusters. On the other hand, the AHC presents a maximum at 3 clusters and remains constant at a lower level for all other values. Moreover, the AHC achieves higher absolute values of Ω than the KMAC algorithm. Hence, the AHC can establish a partition of UEs into denser and better separated clusters than KMAC.
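The CHI can be computed directly from a labeled data set; the sketch below follows the variance-ratio definition, with synthetic, well-separated features as an assumed example.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """CHI = (between-cluster dispersion / (C-1)) / (within-cluster dispersion / (J-C))."""
    J, C = len(X), len(np.unique(labels))
    mean = X.mean(axis=0)
    between = within = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        between += len(Xc) * np.sum(np.abs(mc - mean) ** 2)   # inter-cluster variance
        within += np.sum(np.abs(Xc - mc) ** 2)                # intra-cluster variance
    return (between / (C - 1)) / (within / (J - C))

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 0.2, (30, 4)), rng.normal(3, 0.2, (30, 4))])
labels = np.array([0] * 30 + [1] * 30)
chi = calinski_harabasz(X, labels)  # well-separated clusters score high
```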
The evaluation of the Ω values defines a range of the most appropriate numbers of clusters for our system: at least 3 according to the AHC, or at least 8 according to the KMAC. Since the number of clusters is directly related to the number of RF chains, a design variable that impacts the energy efficiency and the hardware cost of the network, we set the number of clusters in our scenario to 3.
In the following, we compare the clustering algorithms described in Section III considering a set of 100 UEs to be partitioned into 3 clusters. We define the random clustering (RC) algorithm as the baseline. It produces a random distribution of UEs into subsets without any evaluation of the similarity among the elements of the same cluster. The UEs are arranged inside a 120° sector of the cell according to a random uniform distribution. After the definition of the components of the clusters, we randomly select one UE from each subset to compose the SDMA group. Figure 3 shows the cumulative distribution function (CDF) of the total data rate of our system considering the effect of the different SDMA algorithms. Tables II, III, and IV summarize the main advantages of the evaluated algorithms in comparison with RC. To simplify the notation, we abbreviate AHC with the Ward linkage rule to AHC-W, AHC with the complete linkage rule to AHC-C, and AHC with the average linkage rule to AHC-A. The DBSCAN algorithm does not require the number of clusters, but it requires the setting of two parameters. We establish an iterative algorithm that updates these parameters until the number of clusters equals the target. Even with this additional setting step, the DBSCAN algorithm still has lower computational complexity than AHC. Moreover, it achieves total data rates close to those obtained by AHC-W. Based on the description of the clustering algorithms, an increase in the number of UEs impacts the quality of the partitioning, since the UEs tend to be closer to each other and their channel characteristics become more similar. Therefore, the partitioning of UEs into distinct clusters becomes more difficult. In this case, it is mandatory for the scheduling algorithm to consider the correlation among UEs when composing the SDMA group, since the distinct clusters no longer ensure nearly orthogonal channel matrices. As we consider random scheduling in our work, increasing the system load raises the MU interference, reducing the total data rate.
For more details, please refer to [34], where the authors provide a detailed evaluation of the impact of the number of UEs on the total data rate achieved by an SDMA grouping solution combined with JSDM.

VI. Conclusions
In this study, we investigated an SDMA grouping scheme to maximize the total data rate of a MU-MIMO system. The main step is the partition of UEs into subsets according to their spatial compatibility. We explored different clustering algorithms, namely KMAC, AHC, and DBSCAN, comparing them in terms of computational complexity and capability to partition UEs properly. Since we consider a scenario with a massive array of antenna elements operating in the mmWave band, we employed a hybrid beamforming scheme and analyzed its behavior in terms of the total data rate. The analog and digital precoders exploit the channel information obtained from clustering and scheduling, respectively. The simulation results indicate that a proper partition of UEs into clusters can exploit the spatial compatibility effectively and reduce the MU interference. On the one hand, the hierarchical clustering (HC) enhances the total data rate by 25% compared with the baseline approach, but has significant computational complexity. On the other hand, the density-based spatial clustering of applications with noise (DBSCAN) has lower computational complexity and increases the total data rate by 20%; as a major disadvantage, it requires an additional parameter determination process.
In our work, we consider a conventional CSI feedback. That is, each UE reports the channel covariance matrix to the BS. In scenarios with a massive number of antenna elements at the BS and a high number of UEs, this feedback scheme implies a large signaling overhead. Practical communication systems must consider the shortage of resources and prioritize carrying data instead of signaling. Thus, the impact of CSI acquisition strategies with reduced signaling on the proposed SDMA grouping solution is a relevant research question, but is out of the scope of this work and is left for future studies.
Appendix A

According to the pseudo-code in Algorithm 1, the initialization of the th centroid on line 3 copies the dominant eigenvector d ∈ C ×1 to the centroid vector v , with cost linear in the vector dimension. This process occurs inside the loop defined on line 2, so the initialization of the centroids has cost linear in the number of clusters times the vector dimension. The assignment step determines the association of each UE with a specific cluster: line 10 uses Eq. (8), which involves the minimization of a difference of the vectors d and v , with cost linear in the vector dimension. This step is performed inside two nested loops over the UEs and clusters. The centroid update is defined according to Eq. (7); this operation involves vectors of dimension × 1 inside a loop over the clusters. Therefore, summing the initialization, assignment, and update costs over all iterations, the overall computational complexity of the KMAC is linear in the number of iterations, UEs, clusters, and the vector dimension.
Appendix B

According to the pseudo-code in Algorithm 2, the computation of the distances between the clusters has quadratic cost in the number of UEs, since it requires the computation of Euclidean distances between vectors of dimension × 1 inside two loops over the UEs. The determination of the closest pair of clusters has cost O(|C|²), since it requires the evaluation of a data structure with |C|² elements. Initially, the set of clusters has one element per UE, decreasing by 1 at each iteration; therefore, this operation has quadratic cost in the number of UEs. After the merge step, the distances from the merged cluster to the other clusters have to be recomputed, whereas the distances between the other clusters remain the same; thus, the number of recomputed distances decreases with the iteration index. The merging process is repeated until the desired number of clusters is reached. Summing these costs over all iterations yields the cubic overall computational complexity of the AHC in the number of UEs.
Appendix C

The determination of the -neighborhood of a single UE has cost linear in the number of UEs times the vector dimension, since it requires the Euclidean distance between vectors of dimension × 1 with respect to every other UE. This operation is repeated for each UE, so the cost of determining the -neighborhoods of all UEs is quadratic in the number of UEs. Since this operation represents the main computational effort of the algorithm, the overall computational complexity of the DBSCAN is quadratic in the number of UEs. For more details regarding the computational complexity of the DBSCAN algorithm, see [25], [35].