On the Darmois-Skitovich Theorem and Spatial Independence in Blind Source Separation

In many signal processing applications, one may come across the need to individually recover unobserved signals that are combined in an unknown manner. This problem is widely known as blind source separation (BSS). One of the most prominent sets of BSS techniques, known as independent component analysis (ICA), owes much of its development and theoretical understanding to the Darmois-Skitovich theorem. Although this theorem is implicitly used in BSS to establish source separability conditions in ICA, little emphasis is given in the literature to its derivation and to the interpretation of its consequences. The goal of this paper is to revisit, in a more intuitive manner, the Darmois-Skitovich theorem and its derivation in the BSS context.


I. INTRODUCTION
THE need to individually recover unobserved signals that are combined in an unknown manner arises in several practical contexts, such as biomedical signal processing [1], audio signal processing [2] and communications [3]. This problem, widely known as blind source separation (BSS), can be summarized with the simplified scheme shown in Figure 1. In this scheme, $N_s$ sources indicated by the vector $\mathbf{s}(n)$ are applied to a mixing system $\mathcal{H}$, resulting in $N_x$ mixture signals indicated by the vector $\mathbf{x}(n)$. The main goal in BSS is to obtain a separating system $\mathcal{W}$ such that its outputs, indicated by the vector $\mathbf{y}(n)$, are good estimates of $N_y \leq N_s$ sources [4], [5].
[Figure 1: Simplified BSS scheme. Mixtures: $\mathbf{x}(n)$.]
Research in the BSS area is relatively recent and virtually started in the beginning of the 1980s, when linear instantaneous mixing systems were primarily considered [6]. In 1985, B. Ans, J. Hérault and C. Jutten showed that it was possible to solve the BSS problem by resorting to the use of nonlinear structures to determine the separating system coefficients [7]. During that time, little was known about the theoretical limitations of the existing separating techniques and the reason why they did in fact work [6].
F. R. M. Pavan (e-mail: frmp@lcs.poli.usp.br) and M. D. Miranda (e-mail: maria@lcs.poli.usp.br) are with Escola Politécnica of the University of São Paulo (EPUSP). This work was partly supported by the Coordination for the Improvement of Higher Education Personnel (CAPES).
Digital Object Identifier: 10.14209/jcis.2018.16
It was only in the beginning of the 1990s that P. Comon introduced a set of techniques extensively known as independent component analysis (ICA), based on the recovery of mutual spatial independence of the sources at the output of the separating system [8], [9]. On this occasion, source separability conditions were also established, consisting of theoretical requirements that the mixing system and source distributions should meet so that the sources can be adequately separated. In essence, as the separating system imposes mutual spatial independence at its output, adequate BSS is guaranteed if at most one source is Gaussian. This rather obscure relation between source independence and non-Gaussianity was highlighted by P. Comon in [9], who drew inspiration from the Darmois-Skitovich theorem, published in 1953 independently by G. Darmois [10] and V. Skitovich [11], in the statistics field known as factor analysis.
The derivation of source separability conditions was fundamental in the development of BSS techniques and in the subsequent comprehension of their operating limits. Among many things, they provided theoretical reasoning to the principle of separation based on spatial independence imposition. This, in turn, gave logical support to the use of high-order statistics in order to separate sources, a fruitful idea which inspired several works such as [4], [5], [12] and [13].
In this way, the Darmois-Skitovich theorem had an undeniably important role in the development of ICA techniques. Despite this, little emphasis is given in the literature to the derivation of this theorem, which is not evident, and to the interpretation of its consequences in the BSS problem. The goal of this paper is to revisit and interpret, in a more intuitive manner, the Darmois-Skitovich theorem applied to BSS.

A. Paper organization
Next, the notation used throughout the paper is introduced. In Section II, the BSS problem is generically formulated and subsequently considered for linear instantaneous mixtures. In Section III, a statistical model for the sources that allows for their adequate blind separation is presented. In Section IV, some preliminary theorems are presented in order to better understand the Darmois-Skitovich theorem in Section V. In Section VI, this theorem is applied to the BSS problem. Finally, the conclusions of this paper are presented in Section VII.

B. Notation
Throughout the text, vectors are denoted by lowercase boldface letters and matrices are denoted by uppercase boldface letters. For example, $\mathbf{x}$ denotes a vector and $\mathbf{H}$ denotes a matrix.
Random quantities are always represented by underlined letters. For instance, $\underline{x}$ denotes a random variable (rv) and $\underline{\mathbf{x}} = [\,\underline{x}_1 \ \ \underline{x}_2\,]^T$ denotes a random vector composed of rvs $\underline{x}_1$ and $\underline{x}_2$, where $(\cdot)^T$ denotes the transposition operation. Realizations or drawings of a random quantity are not underlined because they are not random [14]-[16].
Stochastic processes are always denoted enclosed in curly braces and are represented as random quantities followed by a time index enclosed in parentheses. Throughout the paper, only discrete-time processes with time index $n \in \mathbb{Z}$ are considered. For instance, $\{\underline{y}(n)\}$ denotes a scalar stochastic process and $\{\underline{\mathbf{y}}(n)\}$ denotes a vector stochastic process. When evaluated at a fixed time instant $n = n_0$, the scalar process $\{\underline{y}(n)\}$ becomes an rv $\underline{y}(n_0)$ and the vector process $\{\underline{\mathbf{y}}(n)\}$ becomes a random vector $\underline{\mathbf{y}}(n_0)$.

II. BSS PROBLEM FORMULATION
In the following, the simplified BSS scheme shown in Figure 1 is considered in the particular case of real signals. The $i$th source signal is denoted by $s_i(n)$ for $i = 1, 2, \ldots, N_s$. The source vector is defined as
$$\mathbf{s}(n) = [\,s_1(n) \ \ s_2(n) \ \ \cdots \ \ s_{N_s}(n)\,]^T. \tag{II.1}$$
The $\ell$th mixture signal is denoted by $x_\ell(n)$ for $\ell = 1, 2, \ldots, N_x$. The mixture signals are collected into the mixture vector given by
$$\mathbf{x}(n) = [\,x_1(n) \ \ x_2(n) \ \ \cdots \ \ x_{N_x}(n)\,]^T. \tag{II.2}$$
The mathematical relation between the source vector $\mathbf{s}(n)$ and the mixture vector $\mathbf{x}(n)$ can be generically given by $\mathbf{x}(n) = \mathcal{H}\{\mathbf{s}(\cdot)\}$, where $\mathcal{H}$ is an unknown mixture mapping from $\mathbb{R}^{N_s}$ to $\mathbb{R}^{N_x}$. This notation takes into account the causality and possible memory of the mixture mapping. The $k$th estimated or reconstructed source is denoted by $y_k(n)$ for $k = 1, 2, \ldots, N_y$. The estimated source vector can be conveniently defined as
$$\mathbf{y}(n) = [\,y_1(n) \ \ y_2(n) \ \ \cdots \ \ y_{N_y}(n)\,]^T \tag{II.3}$$
and must satisfy $\mathbf{y}(n) = \mathcal{W}\{\mathbf{x}(\cdot)\}$, where $\mathcal{W}$ is a separation mapping from $\mathbb{R}^{N_x}$ to $\mathbb{R}^{N_y}$. In the general case, the mixing and separating systems involved are evidently multiple-input and multiple-output (MIMO).
Since the 1980s, several solutions have been gradually proposed for the BSS problem. Various types of mixture models based on hypothetical (or possibly known) aspects of the mixing system and the unobserved sources have been considered. The first BSS solutions were proposed for a very particular type of mixing system model, whose study was fundamental in the subsequent development of solutions for more complicated models [5], [7], [8], [17].
One of the simplest mixing system models that can be considered is the linear instantaneous mixing system [4], [5], which is also usually assumed to be time invariant. In this case, the mixture signals $x_\ell(n)$ can be expressed as a function of the source signals $s_i(n)$ according to
$$x_\ell(n) = \sum_{i=1}^{N_s} h_{\ell,i}\, s_i(n) \tag{II.4}$$
for $\ell = 1, 2, \ldots, N_x$, where $h_{\ell,i}$ are the constant mixing system coefficients. The equations stemming from (II.4) can be compactly rewritten as
$$\mathbf{x}(n) = \mathbf{H}\,\mathbf{s}(n), \tag{II.5}$$
where $\mathbf{x}(n)$ is the mixture vector defined in Equation (II.2), $\mathbf{H} \in \mathbb{R}^{N_x \times N_s}$ is the mixing system coefficient matrix (or simply mixing matrix) given by
$$\mathbf{H} = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,N_s} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,N_s} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N_x,1} & h_{N_x,2} & \cdots & h_{N_x,N_s} \end{bmatrix} \tag{II.6}$$
and $\mathbf{s}(n)$ is the source vector defined in Equation (II.1). For simplicity, no additive noise is considered in the mixture observations; the influence of noise in the separation process can be further analyzed with the adoption of more complete models [8], [17].
When seeking solutions for the particular type of mixing system of Equation (II.5), it is also common to consider a linear instantaneous separating system. In fact, if the mixture mapping is linear, instantaneous and bijective, for example, its inverse mapping must also be linear and instantaneous. Therefore, the relation between the estimated source vector $\mathbf{y}(n)$ and the mixture vector $\mathbf{x}(n)$ is given by
$$\mathbf{y}(n) = \mathbf{W}\,\mathbf{x}(n), \tag{II.7}$$
where $\mathbf{y}(n)$ is defined in Equation (II.3) and $\mathbf{W} \in \mathbb{R}^{N_y \times N_x}$ is the separating system coefficient matrix (or simply separating matrix). Substituting (II.5) into (II.7), the following useful relation between the estimated source vector $\mathbf{y}(n)$ and the source vector $\mathbf{s}(n)$ can be obtained:
$$\mathbf{y}(n) = \mathbf{M}\,\mathbf{s}(n), \tag{II.8}$$
where $\mathbf{M} \in \mathbb{R}^{N_y \times N_s}$ is the so-called combined response matrix of the mixing and separating systems, given by
$$\mathbf{M} = \mathbf{W}\,\mathbf{H}. \tag{II.9}$$
The mixture model addressed in this section does not provide enough information to allow for the resolution of the BSS problem in a deterministic manner. Since the start of the 1990s, the path followed by the scientific community to address this issue has consisted in (i) adopting a set of hypotheses concerning statistical properties of the sources and (ii) recovering certain properties at the output of the separating system in order to adequately separate sources [8], [9], [17]. In the following section, a prevalent statistical model for the sources that effectively allows for their blind separation is presented.
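The relations $\mathbf{x}(n) = \mathbf{H}\mathbf{s}(n)$, $\mathbf{y}(n) = \mathbf{W}\mathbf{x}(n)$ and $\mathbf{M} = \mathbf{W}\mathbf{H}$ can be illustrated with a minimal sketch for two sources and two mixtures. The numerical values of the mixing matrix and of the source realization are hypothetical, and the separating matrix is taken as the exact inverse of the mixing matrix only to show the ideal case $\mathbf{M} = \mathbf{I}$; in an actual blind setting $\mathbf{H}$ is unknown and $\mathbf{W}$ must be estimated.

```python
# Minimal sketch of the linear instantaneous model for N_s = N_x = N_y = 2.
# All numerical values are hypothetical and chosen only for illustration.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inverse_2x2(A):
    """Invert a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

H = [[1.0, 0.5], [0.25, 1.0]]  # hypothetical mixing matrix
W = inverse_2x2(H)             # ideal separating matrix (unknown in practice)
M = matmul(W, H)               # combined response matrix, M = W H

s = [0.3, -1.2]                # one realization of the source vector s(n)
x = matvec(H, s)               # mixtures, x(n) = H s(n)
y = matvec(W, x)               # estimated sources, y(n) = W x(n) = M s(n)

print("M =", M)
print("y =", y)
```

In this idealized case the estimated sources coincide with the sources. Any $\mathbf{W}$ for which $\mathbf{M}$ is a scaled permutation matrix would separate the sources equally well, up to ordering and scale.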

III. STATISTICAL MODEL FOR THE SOURCES
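Considering that the source signals are functions of time, they can be preliminarily modeled as a vector stochastic process $\{\underline{\mathbf{s}}(n)\}$ such that
$$\underline{\mathbf{s}}(n) = [\,\underline{s}_1(n) \ \ \underline{s}_2(n) \ \ \cdots \ \ \underline{s}_{N_s}(n)\,]^T. \tag{III.1}$$
In this case, the linear instantaneous mixtures and estimated sources can be represented by stochastic processes $\{\underline{\mathbf{x}}(n)\}$ and $\{\underline{\mathbf{y}}(n)\}$, respectively, such that
$$\underline{\mathbf{x}}(n) = \mathbf{H}\,\underline{\mathbf{s}}(n) \tag{III.2}$$
and
$$\underline{\mathbf{y}}(n) = \mathbf{W}\,\underline{\mathbf{x}}(n) = \mathbf{M}\,\underline{\mathbf{s}}(n), \tag{III.3}$$
where
$$\underline{\mathbf{x}}(n) = [\,\underline{x}_1(n) \ \ \underline{x}_2(n) \ \ \cdots \ \ \underline{x}_{N_x}(n)\,]^T \tag{III.4}$$
and
$$\underline{\mathbf{y}}(n) = [\,\underline{y}_1(n) \ \ \underline{y}_2(n) \ \ \cdots \ \ \underline{y}_{N_y}(n)\,]^T. \tag{III.5}$$
The adoption of a particular stochastic model for the sources additionally involves the consideration of assumptions about their statistical behavior along time and space. Such behavior may vary considerably among sources of distinct origins (e.g., speech and biomedical signals) and may not be precisely known in practical situations so as to compose a reasonable model [18].
In the context of linear instantaneous mixtures, the statistical time structure of the sources is usually ignored for simplicity or lack of a priori information. In other words, the distributions of the sources are assumed to be constant in time, implying that only their marginal distributions along time are considered [4], [5]. Next, the implications of this source model assumption in the BSS problem formulation are dealt with in more detail.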

A. Temporally independent and identically distributed sources
Broadly speaking, a stochastic process is called independent and identically distributed (iid) if it does not have any kind of statistical time structure, i.e., distribution variations and temporal interdependencies [14], [16]. In the scalar case, the consideration of an iid assumption in the stochastic model conveniently allows for
• the interpretation of the scalar iid stochastic process realization as successive independent drawings of a single rv; in this case, the process can be replaced in the model by its corresponding rv;
• the estimation of the corresponding rv distribution parameters, such as expected value and variance, through time averages of the scalar iid stochastic process, since an iid process is ergodic [16].
In the BSS problem for linear instantaneous mixtures, disregarding the statistical time structure of the sources translates into an iid condition on the vector stochastic process $\{\underline{\mathbf{s}}(n)\}$. In the vector case, the iid condition is considerably stronger than in the scalar case, since the independence of different sources at different times, e.g., $\underline{s}_1(0)$ and $\underline{s}_2(1)$, is also guaranteed. This differs slightly from only assuming that the scalar processes $\{\underline{s}_i(n)\}$, for $i = 1, 2, \ldots, N_s$, are individually iid [14]-[16].
It is important to point out that the terms "independence" and "identically distributed" in the acronym iid are implicitly considered along time and not space. Even if the vector process $\{\underline{\mathbf{s}}(n)\}$ is iid, this does not imply the spatial independence of the rvs $\underline{s}_1(n_0), \underline{s}_2(n_0), \ldots, \underline{s}_{N_s}(n_0)$, for a fixed time $n_0 \in \mathbb{Z}$.
Similarly to the scalar case, the adoption of an iid model for $\{\underline{\mathbf{s}}(n)\}$ allows for the sources to be equivalently represented by a simple random vector $\underline{\mathbf{s}}$ such that
$$\underline{\mathbf{s}} = [\,\underline{s}_1 \ \ \underline{s}_2 \ \ \cdots \ \ \underline{s}_{N_s}\,]^T. \tag{III.6}$$
This representation is valid in the sense that a realization of $\{\underline{\mathbf{s}}(n)\}$ can be interpreted as successive independent drawings of $\underline{\mathbf{s}}$. Furthermore, if $\{\underline{\mathbf{s}}(n)\}$ is iid and the mixing and separating systems are instantaneous and time invariant, it can be shown that both $\{\underline{\mathbf{x}}(n)\}$ and $\{\underline{\mathbf{y}}(n)\}$ are also iid. In this case, these two processes can be represented by
$$\underline{\mathbf{x}} = [\,\underline{x}_1 \ \ \underline{x}_2 \ \ \cdots \ \ \underline{x}_{N_x}\,]^T \tag{III.7}$$
and
$$\underline{\mathbf{y}} = [\,\underline{y}_1 \ \ \underline{y}_2 \ \ \cdots \ \ \underline{y}_{N_y}\,]^T, \tag{III.8}$$
respectively. In consequence, the input-output relations of the mixing and separating systems, given by Equations (III.2) and (III.3), can be rewritten as
$$\underline{\mathbf{x}} = \mathbf{H}\,\underline{\mathbf{s}} \tag{III.9}$$
and
$$\underline{\mathbf{y}} = \mathbf{W}\,\underline{\mathbf{x}} = \mathbf{M}\,\underline{\mathbf{s}}, \tag{III.10}$$
respectively. In this particular case, it follows that the mixing and separation procedures can be conveniently interpreted as simple linear transformations applied to random vectors [5].
Finally, it should be emphasized that more precise models that consider the statistical time structure of the sources can be formulated, but at the expense of a loss of generality and more demanding mathematical treatment [18]. On the other hand, although some model simplifications stem from the iid assumption on the sources, this hypothesis does not contribute to the formulation of linearly independent equations that allow for the adequate blind estimation of the sources, i.e., based only on observations of the mixtures. In order to make the separation possible, an alternative consists in adding another statistical assumption to the source model, namely the spatial independence of the sources [9]. Next, this assumption is further detailed.

B. Spatially independent sources
If the rvs $\underline{s}_1, \underline{s}_2, \ldots, \underline{s}_{N_s}$ are independent in some sense, then the iid sources are called spatially independent. It is worth noting that there are several ways in which the statistical independence among more than two rvs can be defined. Henceforth, three main rv properties related to independence are considered: uncorrelatedness (i.e., second-order "independence"), pairwise independence (i.e., independence of any pair of rvs) and mutual independence (i.e., independence of any combination of two or more rvs). Rigorous definitions can be found, for instance, in [14], [16]. These properties are related as follows:
$$\text{mutual independence} \implies \text{pairwise independence} \implies \text{uncorrelatedness},$$
where the converse implications do not hold in general. From these listed relations, it is possible to note that the assumption of mutual independence of the sources is generally more restrictive (i.e., stronger) than their pairwise independence and uncorrelatedness. In fact, while mutual spatial independence restricts the joint probability distributions of any combination of two or more sources, pairwise spatial independence restricts the joint probability distributions of only pairs of sources. Spatial uncorrelatedness is weaker than both kinds of independence, because it only consists in restrictions on second-order cross-moments of the sources, while probability distribution restrictions imply conditions on moments of any order.
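That the converse implications fail can be checked on two standard textbook examples, which are not taken from this paper and are used here only as an illustrative sketch. In the first, two fair bits and their exclusive-or are pairwise independent but not mutually independent. In the second, an rv uniformly distributed on $\{-1, 0, 1\}$ and its square are uncorrelated but dependent. Exact probabilities are computed with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Example A: two fair bits b1, b2 and b3 = b1 XOR b2 (four equiprobable outcomes).
outcomes = [(b1, b2, b1 ^ b2) for b1, b2 in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))

def p(event):
    """Probability of the set of outcomes for which event(outcome) is true."""
    return sum(prob for o in outcomes if event(o))

def pairwise_independent():
    """Check P(A and B) = P(A) P(B) for every pair of bits and every value pair."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = p(lambda o: o[i] == a and o[j] == b)
            if joint != p(lambda o: o[i] == a) * p(lambda o: o[j] == b):
                return False
    return True

def mutually_independent():
    """Additionally require the joint law of all three bits to factorize."""
    for a, b, c in product((0, 1), repeat=3):
        joint = p(lambda o: o == (a, b, c))
        marginals = (p(lambda o: o[0] == a) * p(lambda o: o[1] == b)
                     * p(lambda o: o[2] == c))
        if joint != marginals:
            return False
    return pairwise_independent()

# Example B: u uniform on {-1, 0, 1} and w = u**2.
third = Fraction(1, 3)
u_values = (-1, 0, 1)
mean_u = sum(third * u for u in u_values)
mean_w = sum(third * u * u for u in u_values)
cov_uw = sum(third * u * (u * u) for u in u_values) - mean_u * mean_w
p_joint = third          # P(u = 0 and w = 0)
p_product = third * third  # P(u = 0) * P(w = 0)

print(pairwise_independent(), mutually_independent())
print(cov_uw, p_joint == p_product)
```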

C. BSS based on spatial independence
Historically, the first solutions proposed in BSS for linear instantaneous mixtures and iid sources were based on the assumption of mutual spatial independence of the sources [7], [8], [17]. In practical situations, this assumption is usually valid or at least approximately accurate [5]. The blind separation strategy behind these solutions was to recover the mutual spatial independence property at the output of the separating system. Initially, there was no rigorous explanation available of why this strategy seemed to work.
In fact, adding some kind of spatial independence assumption to the iid source model may or may not allow for the blind separation of the sources. On the one hand, it can be shown that both spatial uncorrelatedness and pairwise spatial independence are not sufficient assumptions for separability, in the sense that the recovery of such properties at the output of the separating system does not imply adequate source separation in the general case [19]. On the other hand, it is possible to prove that, under certain conditions, the assumption of mutual spatial independence of the sources is sufficient for source separability.
A proof of the sufficiency of the mutual independence assumption for blind separation was initially presented by P. Comon in [9], who resorted to applying the Darmois-Skitovich theorem [10] to the BSS problem. This result was very important in that (i) the validity of the blind separation strategy based on the mutual spatial independence assumption was confirmed and (ii) the separability conditions that the source model should satisfy for the adequate blind separation of sources, through the recovery of independence, were established.
In order to better comprehend these separability conditions in the BSS problem for linear instantaneous mixtures with iid and mutually independent sources in space, it is convenient to revisit the Darmois-Skitovich theorem. However, preceding its presentation, some required preliminary theorems are briefly introduced, along with examples, in the following section.

IV. PRELIMINARY THEOREMS
Theorem 1 (Cramér, 1936). Let $\underline{u}_1, \underline{u}_2, \ldots, \underline{u}_N$ denote real and mutually independent rvs. If, for real constants $a_1, a_2, \ldots, a_N$, the sum
$$\underline{v} = \sum_{i=1}^{N} a_i\, \underline{u}_i \tag{IV.1}$$
is Gaussian, then all rvs $\underline{u}_i$ for which $a_i \neq 0$ are also Gaussian.
This theorem was conjectured at first by P. Lévy; it was proved for $N = 2$ in [20] and extended to $N > 2$ in [21]. Curiously, it is the converse of a well-known result, namely that a linear combination of independent Gaussian rvs is also Gaussian [16]. The proof of Theorem 1, on the other hand, is much more involved than the proof of its converse.
It is interesting to compare Theorem 1 to the central limit theorem (CLT), widely used in statistics. In general terms, the CLT states that if $\underline{u}_1, \underline{u}_2, \ldots, \underline{u}_N$ are independent and identically distributed rvs, then as $N$ tends to infinity, the distribution of $\underline{v}$ in Equation (IV.1) tends, in a probabilistic sense, to a Gaussian distribution [16]. Due to this fact, it could be argued that Theorem 1 is not correct, since in the CLT the rvs being summed are not required to have Gaussian distributions. However, this is not the case, since the CLT is a result on the limit of a distribution as the number of independent rvs being summed tends to infinity, while in Theorem 1 the exact distribution of the sum of a finite number of independent rvs is considered.
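The distinction between the two results can be made concrete with a small exact computation, given here as a sketch under an assumed choice of distribution: the summed rvs are taken as iid uniform on $(-1/2, 1/2)$, whose variance is $1/12$ and whose fourth cumulant is $-1/120$. Since cumulants of independent rvs add, the excess kurtosis of the sum of $N$ such rvs can be written in closed form. It is nonzero for every finite $N$, so the sum is never exactly Gaussian (in agreement with Theorem 1), yet it vanishes as $N$ grows (in agreement with the CLT).

```python
from fractions import Fraction

# Assumed summands: iid uniform rvs on (-1/2, 1/2).
VAR_U = Fraction(1, 12)     # variance (second cumulant)
K4_U = Fraction(-1, 120)    # fourth cumulant

def excess_kurtosis_of_sum(n_terms):
    """Excess kurtosis of the sum of n_terms iid uniform rvs.

    Cumulants of independent rvs add, so the sum has variance n_terms * VAR_U
    and fourth cumulant n_terms * K4_U. A Gaussian rv has excess kurtosis zero.
    """
    var_sum = n_terms * VAR_U
    k4_sum = n_terms * K4_U
    return k4_sum / var_sum ** 2

for n in (1, 2, 10, 100):
    print(n, excess_kurtosis_of_sum(n))
```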
Theorem 2 (Marcinkiewicz-Dugué, 1951). Let $\underline{u}$ denote a real rv with characteristic function $\Phi_{\underline{u}} : \mathbb{R} \to \mathbb{C}$ defined as [16]
$$\Phi_{\underline{u}}(\omega) = E\left[e^{j\omega \underline{u}}\right], \tag{IV.2}$$
where $E[\cdot]$ denotes the expected value and $j$ denotes the imaginary unit such that $j^2 = -1$. The only rvs that have a characteristic function of the form
$$\Phi_{\underline{u}}(\omega) = e^{p(\omega)}, \tag{IV.3}$$
where $p : \mathbb{R} \to \mathbb{C}$ is a polynomial, are the constant rvs and the Gaussian rvs.
Some particular cases of this theorem were examined in the thesis of M. G. Kunetz in 1937 and its general proof was presented by M. Marcinkiewicz in 1940. Additionally, D. Dugué also mentioned this theorem in 1939 and presented a shorter proof of it in 1951 [22].
In order to prove Theorem 2, some rather sophisticated concepts of complex analysis and meromorphic functions (i.e., complex analytic functions except for a set of isolated points in the domain known as poles) are needed. Basically, the proof consists in determining conditions on the polynomial $p$ such that $\Phi_{\underline{u}}(\omega) = e^{p(\omega)}$ results in a valid characteristic function.
In the following example, presented in order to clarify some aspects of Theorem 2, all possible characteristic functions and probability density functions are determined for polynomials with maximum degree equal to two.
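Example 1. Let $\underline{u}$ denote a real rv with probability density function $f_{\underline{u}}$ and characteristic function $\Phi_{\underline{u}}(\omega) = e^{p(\omega)}$, with $\omega \in \mathbb{R}$ and $p$ a polynomial such that $\mathrm{degree}(p) \leq 2$. Initially, it is useful to remember that the characteristic function of an rv $\underline{u}$ with probability density function $f_{\underline{u}}$ is equal to the Fourier transform of $f_{\underline{u}}$ with a sign reversal in the complex exponential. In fact, by expanding the expected value in Equation (IV.2), $\Phi_{\underline{u}}$ can be written as
$$\Phi_{\underline{u}}(\omega) = \int_{-\infty}^{\infty} f_{\underline{u}}(u)\, e^{j\omega u}\, du. \tag{IV.4}$$
With that in mind, the following three collectively exhaustive situations for $\mathrm{degree}(p) \leq 2$ are considered:
• $p$ constant: The area under $f_{\underline{u}}$ must be equal to one, which implies that
$$\Phi_{\underline{u}}(0) = \int_{-\infty}^{\infty} f_{\underline{u}}(u)\, du = 1. \tag{IV.5}$$
Since $p$ is constant, the only valid choice is $p(\omega) = 0$ for all $\omega \in \mathbb{R}$. Thus, $\Phi_{\underline{u}}(\omega) = 1$ for all $\omega \in \mathbb{R}$ and $\underline{u}$ is an rv with probability density function
$$f_{\underline{u}}(u) = \delta(u), \tag{IV.6}$$
where $\delta(\cdot)$ denotes the Dirac delta. Therefore, $\underline{u}$ is a constant rv that is equal to zero with probability one.
• $\mathrm{degree}(p) = 1$: Since $f_{\underline{u}}$ must be real, Fourier transform properties result in
$$\Phi_{\underline{u}}(-\omega) = \Phi_{\underline{u}}^{*}(\omega) \tag{IV.7}$$
for all $\omega \in \mathbb{R}$, where $(\cdot)^{*}$ denotes the complex conjugate. Assuming that $p(\omega) = \alpha\omega + \beta$, with $\alpha, \beta \in \mathbb{C}$ and $\alpha \neq 0$, the conditions given by Equations (IV.5) and (IV.7) imply that $\beta = 0$ and $\alpha$ is purely imaginary. Thus, $p$ must be of the form $p(\omega) = j\omega u_0$, with $u_0 \in \mathbb{R}$ and $u_0 \neq 0$, which yields
$$f_{\underline{u}}(u) = \delta(u - u_0). \tag{IV.8}$$
In this case, $\underline{u}$ is a constant rv that is equal to $u_0 \neq 0$ with probability one.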
• $\mathrm{degree}(p) = 2$: The conditions given by Equations (IV.5) and (IV.7), together with the fact that the magnitude of a characteristic function never exceeds one, imply that $p$ must be of the form $p(\omega) = j\omega\mu - \sigma^2\omega^2/2$, with $\mu \in \mathbb{R}$ and $\sigma^2 > 0$. This is the exponent of the characteristic function of a Gaussian rv with mean $\mu$ and variance $\sigma^2$.
The uniqueness of the Fourier transform pairs [23] implies that the characteristic functions considered in this example are the only functions associated with either constant or Gaussian rvs. Therefore, according to Theorem 2, there is no rv $\underline{u}$ such that $\Phi_{\underline{u}}(\omega) = e^{p(\omega)}$ with $\mathrm{degree}(p) > 2$.
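The two conditions used in Example 1, namely unit area, $\Phi_{\underline{u}}(0) = 1$, and real density, $\Phi_{\underline{u}}(-\omega) = \Phi_{\underline{u}}^{*}(\omega)$, can be checked numerically for a few candidate polynomials. The sketch below is only illustrative: the function name, the test frequencies and the parameter values `u0`, `mu` and `sigma2` are hypothetical. These conditions are necessary but not sufficient, so passing the check does not by itself make $e^{p(\omega)}$ a valid characteristic function; Theorem 2 is what excludes polynomials of degree greater than two.

```python
import cmath

def satisfies_necessary_conditions(p, omegas=(0.5, 1.0, 2.0), tol=1e-12):
    """Check Phi(0) = 1 and Phi(-w) = conj(Phi(w)) for Phi(w) = exp(p(w))."""
    def phi(w):
        return cmath.exp(p(w))
    unit_area = abs(phi(0.0) - 1.0) < tol
    real_density = all(abs(phi(-w) - phi(w).conjugate()) < tol for w in omegas)
    return unit_area and real_density

u0, mu, sigma2 = 1.5, 0.3, 2.0  # hypothetical parameter values

candidates = {
    "p(w) = 0": lambda w: 0.0,                       # constant rv equal to zero
    "p(w) = 0.2": lambda w: 0.2,                     # violates unit area
    "p(w) = 0.7 w": lambda w: 0.7 * w,               # real slope: density not real
    "p(w) = j w u0": lambda w: 1j * w * u0,          # constant rv equal to u0
    "p(w) = j w mu - sigma2 w^2 / 2":                # Gaussian rv
        lambda w: 1j * w * mu - 0.5 * sigma2 * w * w,
}

results = {name: satisfies_necessary_conditions(p) for name, p in candidates.items()}
for name, ok in results.items():
    print(name, ok)
```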
Lemma 1 (Darmois, 1953; Linnik and Rao, 1964). Let $f_1, f_2, \ldots, f_N : \mathbb{R} \to \mathbb{C}$ and $g_1, g_2 : \mathbb{R} \to \mathbb{C}$ all denote continuous functions in an open set $V$ around the origin. Let $T : \mathbb{R}^2 \to \mathbb{C}$ satisfy the following decomposition:
$$T(x, y) = \sum_{i=1}^{N} f_i(a_i x + b_i y) = g_1(x) + g_2(y) \quad \text{for all } x, y \in V, \tag{IV.10}$$
where $a_i$ and $b_i$, for $i = 1, 2, \ldots, N$, are nonzero constants such that
$$a_i b_k - a_k b_i \neq 0 \quad \text{for all } i \neq k. \tag{IV.11}$$
Then, all functions $f_i$, for $i = 1, 2, \ldots, N$, are polynomials with maximum degree equal to $N$.
A proof of this lemma for $N = 2$, based on the use of finite differences, is presented in [10]. An alternative proof is presented in [21], along with some extensions of the original result. In general terms, this lemma establishes that the only functions for which the decomposition in two terms of Equation (IV.10) holds, under the condition (IV.11), are polynomials. Furthermore, the degrees of these polynomials must be, at most, equal to the number $N$ of terms being originally summed.
An example in which the consistency of Lemma 1 is verified for different types of functions is presented next.
Example 2. Considering the particular case of $N = 2$ and $f_1, f_2 : \mathbb{R} \to \mathbb{R}$, the following pairs of functions are examined:
• $f_1(x) = e^x$ and $f_2(x) = |x|$: The functions $f_1$ and $f_2$ are not polynomials. According to the contrapositive of Lemma 1, the function $T$ does not admit a decomposition according to (IV.10) under the condition (IV.11) with nonzero constants.
• $f_1(x) = (x + 1)^2$ and $f_2(x) = 2x$: Considering these functions, straightforward manipulations on the expression of $T$ yield
$$T(x, y) = g_1(x) + g_2(y) + 2 a_1 b_1 x y \quad \text{for all } x, y \in \mathbb{R}, \tag{IV.12}$$
where $g_1, g_2 : \mathbb{R} \to \mathbb{R}$ are polynomials. Therefore, a decomposition of $T$ according to (IV.10) is possible if, and only if, $a_1 = 0$ or $b_1 = 0$. In both situations, however, Lemma 1 is inconclusive since nonzero coefficients are required. In this case, the converse of the lemma is not true; i.e., $f_1$ and $f_2$ are polynomials with maximum degree $N = 2$, but a decomposition under the conditions of Lemma 1 does not exist.
• $f_1(x) = x^2$ and $f_2(x) = -2x^2$: In this case, it follows from direct manipulations on the expression of $T$ that
$$T(x, y) = g_1(x) + g_2(y) + 2 (a_1 b_1 - 2 a_2 b_2)\, x y \quad \text{for all } x, y \in \mathbb{R}, \tag{IV.13}$$
where $g_1, g_2 : \mathbb{R} \to \mathbb{R}$ are polynomials. It follows that a decomposition of $T$ according to (IV.10) is possible if, and only if, $a_1 b_1 - 2 a_2 b_2 = 0$. Such a condition can be met with nonzero coefficients while also satisfying (IV.11), which, in this case, simplifies to $a_1 b_2 - a_2 b_1 \neq 0$. For example, if [...]. Therefore, by applying Lemma 1, it follows that $f_1$ and $f_2$ are polynomials of maximum degree $N = 2$, which, in fact, is true.
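The last case of Example 2 can be verified numerically. A function $T$ admits the additive decomposition $g_1(x) + g_2(y)$ if, and only if, the mixed difference $T(x, y) - T(x, 0) - T(0, y) + T(0, 0)$ vanishes for all $x$ and $y$. The sketch below evaluates this difference for $f_1(x) = x^2$ and $f_2(x) = -2x^2$. The two coefficient sets are hypothetical choices made only for illustration and are not necessarily those used in the original example; both satisfy $a_1 b_2 - a_2 b_1 \neq 0$, but only the first satisfies $a_1 b_1 - 2 a_2 b_2 = 0$.

```python
def T(x, y, a1, b1, a2, b2):
    """T(x, y) = f1(a1 x + b1 y) + f2(a2 x + b2 y), f1(t) = t^2, f2(t) = -2 t^2."""
    f1 = lambda t: t ** 2
    f2 = lambda t: -2.0 * t ** 2
    return f1(a1 * x + b1 * y) + f2(a2 * x + b2 * y)

def mixed_difference(x, y, coeffs):
    """T(x, y) - T(x, 0) - T(0, y) + T(0, 0).

    This quantity is zero for all x, y if and only if T(x, y) = g1(x) + g2(y).
    """
    return (T(x, y, *coeffs) - T(x, 0.0, *coeffs)
            - T(0.0, y, *coeffs) + T(0.0, 0.0, *coeffs))

# Hypothetical coefficient sets (a1, b1, a2, b2), all nonzero.
separable = (2.0, 1.0, 1.0, 1.0)      # a1 b1 - 2 a2 b2 = 0,  a1 b2 - a2 b1 = 1
not_separable = (1.0, 1.0, 1.0, 2.0)  # a1 b1 - 2 a2 b2 = -3, a1 b2 - a2 b1 = 1

for x, y in ((1.0, 1.0), (1.5, -0.7), (-2.0, 3.0)):
    print((x, y),
          mixed_difference(x, y, separable),
          mixed_difference(x, y, not_separable))
```

For the second coefficient set, the mixed difference equals the cross term $2(a_1 b_1 - 2 a_2 b_2)xy = -6xy$ of Equation (IV.13).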
Based on Theorems 1 and 2 and Lemma 1, in the following section the Darmois-Skitovich theorem is enunciated and an outline of its proof, inspired by [10], [21], [24], is also presented. Afterwards, the application of this theorem in the context of BSS is addressed.
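
V. DARMOIS-SKITOVICH THEOREM
Theorem 3 (Darmois, 1953; Skitovich, 1953). Let $\underline{u}_1, \underline{u}_2, \ldots, \underline{u}_N$ denote real and mutually independent rvs and define the linear forms
$$\underline{v}_1 = \sum_{i=1}^{N} a_i\, \underline{u}_i \quad \text{and} \quad \underline{v}_2 = \sum_{i=1}^{N} b_i\, \underline{u}_i, \tag{V.1}$$
where $a_1, a_2, \ldots, a_N$ and $b_1, b_2, \ldots, b_N$ are real constants. If $\underline{v}_1$ and $\underline{v}_2$ are independent, then for each index $i$ (with $i = 1, 2, \ldots, N$) such that $a_i b_i \neq 0$, it follows that $\underline{u}_i$ is either a constant rv or a Gaussian rv.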
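Proof. The joint characteristic function of $\underline{v}_1$ and $\underline{v}_2$ is defined as [16]
$$\Phi_{\underline{v}_1, \underline{v}_2}(\omega_1, \omega_2) = E\left[e^{j(\omega_1 \underline{v}_1 + \omega_2 \underline{v}_2)}\right]. \tag{V.2}$$
Substituting into (V.2) the expressions of $\underline{v}_1$ and $\underline{v}_2$ given by Equation (V.1), it follows that
$$\Phi_{\underline{v}_1, \underline{v}_2}(\omega_1, \omega_2) = E\left[\exp\left(j \sum_{i=1}^{N} (a_i \omega_1 + b_i \omega_2)\, \underline{u}_i\right)\right]. \tag{V.3}$$
Due to the mutual independence of $\underline{u}_i$ for $i = 1, 2, \ldots, N$, the following factorization of $\Phi_{\underline{v}_1, \underline{v}_2}$ is possible [16]:
$$\Phi_{\underline{v}_1, \underline{v}_2}(\omega_1, \omega_2) = \prod_{i=1}^{N} \Phi_{\underline{u}_i}(a_i \omega_1 + b_i \omega_2). \tag{V.4}$$
In addition, the independence of $\underline{v}_1$ and $\underline{v}_2$ yields
$$\Phi_{\underline{v}_1, \underline{v}_2}(\omega_1, \omega_2) = \Phi_{\underline{v}_1}(\omega_1)\, \Phi_{\underline{v}_2}(\omega_2). \tag{V.5}$$
Equating the right-hand sides of Equations (V.4) and (V.5) results in
$$\Phi_{\underline{v}_1}(\omega_1)\, \Phi_{\underline{v}_2}(\omega_2) = \prod_{i=1}^{N} \Phi_{\underline{u}_i}(a_i \omega_1 + b_i \omega_2). \tag{V.6}$$
Applying the natural logarithm to both sides of this equation, and denoting $\Psi(\cdot) = \ln(\Phi(\cdot))$, the following identity between exponents of characteristic functions holds:
$$\Psi_{\underline{v}_1}(\omega_1) + \Psi_{\underline{v}_2}(\omega_2) = \sum_{i=1}^{N} \Psi_{\underline{u}_i}(a_i \omega_1 + b_i \omega_2). \tag{V.7}$$
For convenience, the set of all indices $i$ for which $a_i b_i \neq 0$ is now defined as
$$I = \{\, i \in \{1, 2, \ldots, N\} : a_i b_i \neq 0 \,\}. \tag{V.8}$$
For indices $i \notin I$, it is possible to incorporate the corresponding function $\Psi_{\underline{u}_i}$ in Equation (V.7) into one of the two terms on its left-hand side. Repeating this procedure on (V.7) for all the indices $i \notin I$, the following generic expression can be obtained:
$$\tilde{\Psi}_{\underline{v}_1}(\omega_1) + \tilde{\Psi}_{\underline{v}_2}(\omega_2) = \sum_{i \in I} \Psi_{\underline{u}_i}(a_i \omega_1 + b_i \omega_2), \tag{V.9}$$
where $\tilde{\Psi}_{\underline{v}_1}$ denotes the function $\Psi_{\underline{v}_1}$ after the incorporation of all the functions $\Psi_{\underline{u}_i}$ for which $a_i \neq 0$ and $b_i = 0$, and $\tilde{\Psi}_{\underline{v}_2}$ represents the function $\Psi_{\underline{v}_2}$ after the incorporation of all the functions $\Psi_{\underline{u}_i}$ for which $a_i = 0$ and $b_i \neq 0$. If, in addition, $a_i = b_i = 0$ for some index $i$, the corresponding term on the right-hand side of (V.7) is identically zero since $\Psi_{\underline{u}_i}(0) = 0$. It can be noted that there remain $n(I)$ terms being summed on the right-hand side of Equation (V.9), where $n(\cdot)$ denotes the cardinality of a set.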
In order to finish this proof, Equation (V.9) must be examined under the four collectively exhaustive cases listed next:
(i) $n(I) = 0$: In this case, $I = \emptyset$ and $a_i b_i = 0$ for all $i = 1, 2, \ldots, N$. Nothing can be said of the rvs $\underline{u}_i$ for $i = 1, 2, \ldots, N$, because $\underline{v}_1$ and $\underline{v}_2$ are always independent since they are sums of rvs which belong to disjoint and, therefore, independent sets.
(ii) $n(I) = 1$: Applying Lemma 1 to Equation (V.9), it follows that the function $\Psi_{\underline{u}_i}$ for $i \in I$ is a polynomial of maximum degree equal to one. Now, it follows from Theorem 2 that $\underline{u}_i$ is a constant rv. In fact, it can be shown that the only rv which is independent of itself is the constant rv, and this is exactly the case here.
(iii) $n(I) > 1$ and $a_i b_k - a_k b_i \neq 0$ for all $i, k \in I$ with $i \neq k$: Initially, Lemma 1 is applied to Equation (V.9) and it follows that the functions $\Psi_{\underline{u}_i}$ for $i \in I$ are all polynomials of maximum degree equal to $n(I) > 1$. Now, it follows from Theorem 2 that the rvs $\underline{u}_i$ for $i \in I$ are either constant or Gaussian.
(iv) $n(I) > 1$ and $a_i b_k - a_k b_i = 0$ for some $i, k \in I$ with $i \neq k$: In this case, a combined rv $\underline{u}_{i,k}$ can always be defined such that
$$\underline{u}_{i,k} = \alpha_{i,k}\,(b_i \underline{u}_i + b_k \underline{u}_k) = \beta_{i,k}\,(a_i \underline{u}_i + a_k \underline{u}_k),$$
with $\alpha_{i,k} = 1/b_i$ and $\beta_{i,k} = 1/a_i$. After combining exhaustively all possible rvs in Equation (V.1) for indices belonging to $I$, steps similar to case (iii) can be carried out for the resulting rvs. Specifically, Lemma 1 and Theorem 2 are applied to each resulting rv. For the individual rvs, it follows that they are either constant or Gaussian, analogously to case (iii). For the combined rvs, however, it only follows from the procedure of case (iii) that the combined rvs are either constant or Gaussian. If a given combined rv is constant, the individual rvs being summed are also constant because they are mutually independent. This is a particular case of Theorem 1 for Gaussian rvs with zero variance, i.e., constant rvs. On the other hand, if a given combined rv is Gaussian, then Theorem 1 is applied to conclude that at least one of the individual rvs is Gaussian, and the other ones are either constant or Gaussian.
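A simple instance of Theorem 3 can be examined with exact moments. The sketch below assumes $N = 2$, $a_1 = a_2 = b_1 = 1$ and $b_2 = -1$, so that $\underline{v}_1 = \underline{u}_1 + \underline{u}_2$ and $\underline{v}_2 = \underline{u}_1 - \underline{u}_2$, with $\underline{u}_1$ and $\underline{u}_2$ iid and zero mean. These forms are always uncorrelated, since $E[\underline{v}_1 \underline{v}_2] = E[\underline{u}_1^2] - E[\underline{u}_2^2] = 0$. The covariance between $\underline{v}_1^2$ and $\underline{v}_2^2$ is used here as a hypothetical dependence indicator: if it is nonzero, the two forms cannot be independent. The function name and the two distributions are choices made only for illustration.

```python
from fractions import Fraction

def cov_of_squares(var, m4):
    """Cov(v1^2, v2^2) for v1 = u1 + u2 and v2 = u1 - u2.

    u1 and u2 are iid, zero mean, with variance `var` and fourth moment `m4`.
    Since v1 * v2 = u1^2 - u2^2:
        E[v1^2 v2^2]    = E[(u1^2 - u2^2)^2] = 2 * m4 - 2 * var^2
        E[v1^2] E[v2^2] = (2 * var)^2
    """
    return (2 * m4 - 2 * var ** 2) - (2 * var) ** 2

# Standard Gaussian: variance 1, fourth moment 3.
gaussian = cov_of_squares(Fraction(1), Fraction(3))
# Uniform on (-1, 1): variance 1/3, fourth moment 1/5.
uniform = cov_of_squares(Fraction(1, 3), Fraction(1, 5))

print("Gaussian:", gaussian)
print("Uniform: ", uniform)
```

For the uniform rvs the indicator is nonzero, so $\underline{v}_1$ and $\underline{v}_2$ are uncorrelated but dependent, as Theorem 3 requires for non-Gaussian, nonconstant rvs with $a_i b_i \neq 0$. For the Gaussian rvs the indicator vanishes, which is consistent with (although not by itself a proof of) the independence of the two forms.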
This theorem was fully demonstrated in 1953 [10], [11]. In spite of this, the relation between non-Gaussianity and independence of rvs had been long studied by several people; for instance, J. C. Maxwell, in the 19th century, investigated this topic when studying molecule velocity distributions in the three-dimensional space [25].
Eventually, the Darmois-Skitovich theorem was generalized for various cases, such as linear combinations of random vectors and random linear forms-i.e., linear forms, such as in Equation (V.1), with random coefficients instead of constant coefficients [21].
In several areas of statistics, the Darmois-Skitovich theorem is considerably relevant, especially in the fields of factor analysis and rv decompositions. Its relevance comes mainly from the fact that it consists in a characterization of the Gaussian distribution through the independence of two linear forms [21]. This means that, apart from constant rvs, only Gaussian rvs are not necessarily isolated when $\underline{v}_1$ and $\underline{v}_2$ are independent. This strong property is very interesting by itself in the theoretical sense, but it also has important practical applications, such as in the BSS problem. Such applications are further discussed in the next section.

VI. APPLICATIONS TO BSS
The goal of this section is to better understand, with clarifying examples, the separability conditions in BSS derived from the Darmois-Skitovich theorem.
Throughout this section, the real BSS problem for linear instantaneous mixtures with iid and mutually independent sources in space is considered. For simplicity, the BSS problem is assumed to be even-determined, i.e., with the same number of sources and mixtures such that
$$N_s = N_x = N_y = N.$$
Firstly, a condition on the combined response matrix to ensure uncorrelatedness at the output of the separating system is introduced. Next, separability conditions are presented, followed by a discussion on the consequences of a spatial whitening procedure applied to the mixtures.
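A. Condition on the combined response matrix
Lemma 2. Let $\underline{\mathbf{s}}$ be the source vector with nonconstant and mutually independent rvs, and let $\underline{\mathbf{y}}$ be the estimated source vector, both with $N$ elements, such that $\underline{\mathbf{y}} = \mathbf{M}\,\underline{\mathbf{s}}$, where $\mathbf{M} \in \mathbb{R}^{N \times N}$ is an invertible combined response matrix. If the rvs in $\underline{\mathbf{y}}$ are uncorrelated, then
$$\mathbf{M} = \mathbf{\Lambda}_y^{1/2}\, \mathbf{M}_1\, \mathbf{\Lambda}_s^{-1/2}, \tag{VI.1}$$
where $\mathbf{\Lambda}_y \in \mathbb{R}^{N \times N}$ and $\mathbf{\Lambda}_s \in \mathbb{R}^{N \times N}$ are positive-definite diagonal matrices and $\mathbf{M}_1 \in \mathbb{R}^{N \times N}$ is an orthogonal matrix, i.e., $\mathbf{M}_1^T \mathbf{M}_1 = \mathbf{I}_N$, where $\mathbf{I}_N$ denotes an $N \times N$ identity matrix.
Proof. The covariance matrix of $\underline{\mathbf{y}}$ is defined as [14], [16]
$$\mathbf{C}_y = E\left[(\underline{\mathbf{y}} - \mathbf{m}_y)(\underline{\mathbf{y}} - \mathbf{m}_y)^T\right], \tag{VI.2}$$
where
$$\mathbf{m}_y = E[\underline{\mathbf{y}}] \tag{VI.3}$$
is the mean vector of $\underline{\mathbf{y}}$. Substituting $\underline{\mathbf{y}} = \mathbf{M}\,\underline{\mathbf{s}}$ into (VI.3) yields
$$\mathbf{m}_y = \mathbf{M}\, E[\underline{\mathbf{s}}] = \mathbf{M}\, \mathbf{m}_s, \tag{VI.4}$$
where $\mathbf{m}_s$ is the mean vector of $\underline{\mathbf{s}}$. Now, substituting $\underline{\mathbf{y}} = \mathbf{M}\,\underline{\mathbf{s}}$ and (VI.4) into (VI.2) yields
$$\mathbf{C}_y = \mathbf{M}\, E\left[(\underline{\mathbf{s}} - \mathbf{m}_s)(\underline{\mathbf{s}} - \mathbf{m}_s)^T\right] \mathbf{M}^T = \mathbf{M}\, \mathbf{C}_s\, \mathbf{M}^T, \tag{VI.7}$$
where $\mathbf{C}_s$ is the covariance matrix of $\underline{\mathbf{s}}$. Since the rvs in $\underline{\mathbf{s}}$ are nonconstant and mutually independent, they are also uncorrelated. This implies that
$$\mathbf{C}_s = \mathbf{\Lambda}_s, \tag{VI.8}$$
where $\mathbf{\Lambda}_s \in \mathbb{R}^{N \times N}$ is a positive-definite diagonal matrix. In addition, the rvs in $\underline{\mathbf{y}}$ are also nonconstant because $\mathbf{M}$ is invertible. Thus, if the rvs in $\underline{\mathbf{y}}$ are uncorrelated, it follows that
$$\mathbf{C}_y = \mathbf{\Lambda}_y, \tag{VI.9}$$
where $\mathbf{\Lambda}_y \in \mathbb{R}^{N \times N}$ is a positive-definite diagonal matrix. Finally, substituting (VI.8) and (VI.9) into (VI.7) yields
$$\mathbf{\Lambda}_y = \mathbf{M}\, \mathbf{\Lambda}_s\, \mathbf{M}^T. \tag{VI.10}$$
By definition, it follows that
$$\mathbf{M}_1 = \mathbf{\Lambda}_y^{-1/2}\, \mathbf{M}\, \mathbf{\Lambda}_s^{1/2} \tag{VI.11}$$
is orthogonal and $\mathbf{M}$ is of the form (VI.1).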
In short, this lemma establishes that the condition of uncorrelated estimated sources imposes a "special format", given by Equation (VI.1), on the combined response matrix.
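The "special format" of Lemma 2 can be checked numerically for $N = 2$. In the sketch below, the rotation angle and the diagonal entries of $\mathbf{\Lambda}_s$ and $\mathbf{\Lambda}_y$ are hypothetical values. The matrix $\mathbf{M}$ is built according to Equation (VI.1) with a rotation as the orthogonal factor, and the output covariance $\mathbf{M}\mathbf{\Lambda}_s\mathbf{M}^T$ is then verified to be diagonal.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def diag(values):
    """Build a diagonal matrix from a list of diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

theta = 0.7          # hypothetical rotation angle (radians)
lam_s = [2.0, 0.5]   # hypothetical source variances (diagonal of Lambda_s)
lam_y = [1.0, 3.0]   # hypothetical output variances (diagonal of Lambda_y)

# Orthogonal factor M1 (a rotation) and M = Lambda_y^{1/2} M1 Lambda_s^{-1/2}.
M1 = [[math.cos(theta), math.sin(theta)],
      [-math.sin(theta), math.cos(theta)]]
M = matmul(matmul(diag([math.sqrt(v) for v in lam_y]), M1),
           diag([1.0 / math.sqrt(v) for v in lam_s]))

# Output covariance C_y = M Lambda_s M^T, expected to equal Lambda_y.
C_y = matmul(matmul(M, diag(lam_s)), transpose(M))

print("M   =", M)
print("C_y =", C_y)
```

All four elements of this $\mathbf{M}$ are nonzero, so each output still contains both sources even though the outputs are uncorrelated. This is in line with the earlier remark that recovering uncorrelatedness alone does not imply separation.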
In the following example, it is shown how Theorem 3 and Lemma 2 can be jointly applied in order to preliminarily infer separability conditions in a simple BSS problem.
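Example 3 (The Darmois-Skitovich theorem and the combined response matrix). The BSS problem for $N = 2$, as shown in Figure 2, is considered in this example. Let the independent source rvs $\underline{s}_1$ and $\underline{s}_2$ be nonconstant. Additionally, let matrices $\mathbf{H}, \mathbf{W} \in \mathbb{R}^{2 \times 2}$ be invertible. The goal of this example is to verify under which conditions the imposition of spatial independence at the output of the separating system is able to adequately separate the given sources. Initially, the relation between the source vector and the estimated source vector, namely $\underline{\mathbf{y}} = \mathbf{M}\,\underline{\mathbf{s}}$, can be conveniently expanded as
$$\underline{y}_1 = a_1 \underline{s}_1 + a_2 \underline{s}_2, \qquad \underline{y}_2 = b_1 \underline{s}_1 + b_2 \underline{s}_2, \tag{VI.12}$$
where $a_i$ and $b_i$ denote the elements of the first and second rows of $\mathbf{M}$, respectively. By varying the elements of the separating matrix $\mathbf{W}$, the elements of the combined response matrix $\mathbf{M} = \mathbf{W}\mathbf{H}$ are also altered because $\mathbf{H}$ is invertible. Ensuring that $\mathbf{M}$ is chosen such that $\underline{y}_1$ is independent of $\underline{y}_2$, Theorem 3 can be applied to Equation (VI.12), which yields
$$a_i b_i \neq 0 \implies \underline{s}_i \text{ is Gaussian}, \quad i = 1, 2.$$
In other words, if both elements of the $i$th column of $\mathbf{M}$ are nonzero, then the rv $\underline{s}_i$ is Gaussian. Since $\mathbf{M}$ is a $2 \times 2$ matrix and satisfies a factorization of the form (VI.1) according to Lemma 2, it follows that
$$a_1 b_1 \neq 0 \iff a_2 b_2 \neq 0.$$
As a result, the following cases can be considered:
• All the elements of $\mathbf{M}$ are nonzero: From Theorem 3, it follows that the rvs $\underline{s}_1$ and $\underline{s}_2$ are Gaussian. In this case, neither of the sources has been adequately separated, since both have nonzero contributions to both $\underline{y}_1$ and $\underline{y}_2$.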
• $M$ is either diagonal or antidiagonal: Nothing can be said about $s_1$ and $s_2$. However, in this case both sources are separated, up to possible permutation and scale ambiguities.
Although the structure of $M$ has been related to the Gaussianity of the sources in the previous analysis, this relation can be better understood if the contrapositive of Theorem 3 is considered, namely
$$s_i \text{ is not Gaussian} \implies a_i b_i = 0, \quad i = 1, 2.$$
In other words, if the rv $s_i$ is not Gaussian, then at least one of the elements of the $i$th column of $M$ is equal to zero. Once again, it follows from Lemma 2 that
$$a_1 b_1 = 0 \iff a_2 b_2 = 0.$$
Now, the following cases are considered:
• $s_1$ and $s_2$ non-Gaussian: The contrapositive implies that $M$ is either diagonal or antidiagonal. Therefore, the sources are adequately separated by imposition of spatial independence, except for permutation or scale ambiguities.
• $s_1$ non-Gaussian and $s_2$ Gaussian: It is possible to conclude from the contrapositive only that $a_1 b_1 = 0$. On the other hand, the condition on $M$ implies that $a_2 b_2 = 0$. Therefore, once again $M$ must be either diagonal or antidiagonal and the sources are adequately separated. By symmetry, the case in which $s_1$ is Gaussian and $s_2$ is non-Gaussian is analogous.
• $s_1$ and $s_2$ Gaussian: Nothing can be said about the elements of $M$ from the application of the contrapositive.
To conclude this example, it should be noted that the application of the Darmois-Skitovich theorem to this BSS problem allowed for the verification of situations in which source separability is possible by imposition of spatial independence at the output of the separating system. Namely, separation is guaranteed up to ambiguities if at most one source is Gaussian. Additionally, nothing can be said if both sources are Gaussian, since in this case there is no additional restriction on the structure of the combined response matrix. In particular, if $M$ is either diagonal or antidiagonal, then the sources are adequately separated. On the other hand, if all the elements of $M$ are nonzero, then separation is not attained.
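The cases of Example 3 can be illustrated with a small Monte Carlo experiment. The sketch below rests on assumptions that are not in the paper: unit-variance sources, a combined response matrix $M$ that is a plane rotation by an angle `theta` (so the outputs are always uncorrelated), and the sample covariance between $y_1^2$ and $y_2^2$ as a crude dependence indicator. That covariance vanishes for independent rvs, but a zero value does not by itself prove independence.

```python
import math
import random

def cov_of_squares(y1, y2):
    """Sample covariance between y1**2 and y2**2 (zero in theory if y1, y2 are independent)."""
    n = len(y1)
    q1 = [v * v for v in y1]
    q2 = [v * v for v in y2]
    m1 = sum(q1) / n
    m2 = sum(q2) / n
    return sum((a - m1) * (b - m2) for a, b in zip(q1, q2)) / n

def rotate(s1, s2, theta):
    """Apply M = [[cos, -sin], [sin, cos]] to every sample pair (s1, s2)."""
    c, d = math.cos(theta), math.sin(theta)
    y1 = [c * a - d * b for a, b in zip(s1, s2)]
    y2 = [d * a + c * b for a, b in zip(s1, s2)]
    return y1, y2

rng = random.Random(0)   # fixed seed so that the run is reproducible
n = 20000
half_width = math.sqrt(3.0)   # uniform on [-sqrt(3), sqrt(3)] has unit variance
uniform = [[rng.uniform(-half_width, half_width) for _ in range(n)] for _ in range(2)]
gauss = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]

# Non-Gaussian sources, all elements of M nonzero: outputs uncorrelated but dependent.
dep_uniform_mixed = cov_of_squares(*rotate(uniform[0], uniform[1], math.pi / 4))
# Non-Gaussian sources, antidiagonal M (up to rounding): outputs are the sources, permuted.
dep_uniform_antidiag = cov_of_squares(*rotate(uniform[0], uniform[1], math.pi / 2))
# Gaussian sources, all elements of M nonzero: outputs remain independent.
dep_gauss_mixed = cov_of_squares(*rotate(gauss[0], gauss[1], math.pi / 4))

print(dep_uniform_mixed, dep_uniform_antidiag, dep_gauss_mixed)
```

For zero-mean, unit-variance independent sources sharing the same fourth moment $\kappa = E[s_i^4]$, a direct expansion gives $\mathrm{cov}(y_1^2, y_2^2) = (\kappa - 3)\sin^2(2\theta)/2$ under this rotation model. With uniform sources ($\kappa = 9/5$) and $\theta = \pi/4$ the theoretical value is $-0.6$, whereas it is zero for Gaussian sources ($\kappa = 3$) at any angle, in line with the cases listed above.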
The results for $N = 2$ obtained in the considered example can be extended to the case of $N \geq 2$ in what is known as the source separability theorem, proved by P. Comon in [8], [9]. The theoretical conditions that the mixture model must satisfy so that the sources are adequately separated by imposition of spatial independence are presented next.

B. Source separability theorem
Starting from the application of Theorem 3 to the particular BSS mixture model considered herein, the following intermediate lemma is obtained in [9].
Lemma 3 (Comon, 1992). Let $s$ be the source vector with mutually independent rvs, and let $y$ be the estimated source vector, both with $N$ elements, such that $y = Ms$, where $M \in \mathbb{R}^{N \times N}$ is the combined response matrix. In addition, the rvs in $y$ are assumed to be pairwise independent. If the $i$th column of $M$ has at least two nonzero elements, then $s_i$ is either a constant rv or a Gaussian rv.
Proof. Let $M = [m_{k,i}]$ with $k, i = 1, 2, \ldots, N$. Without loss of generality, it is assumed that the $i$th column of $M$ has two nonzero elements in rows $k_1$ and $k_2$ such that $k_1 \neq k_2$, i.e., $m_{k_1,i}\, m_{k_2,i} \neq 0$. Theorem 3 can be applied considering only rows $k_1$ and $k_2$, since $y_{k_1}$ and $y_{k_2}$ are independent by hypothesis. It follows that $s_i$ is either a constant rv or a Gaussian rv.
Now, Lemmas 2 and 3 can be used to determine separability conditions for the general case of $N \geq 2$ and $M$ invertible. These conditions are given by the following theorem, whose proof, based on [9], is also presented.
Theorem 4 (Source separability theorem; Comon, 1992). Let $s$ be the source vector with mutually independent rvs of which at most one is Gaussian and none are constant, and let $y$ be the estimated source vector, both with $N$ elements, such that $y = Ms$, where $M \in \mathbb{R}^{N \times N}$ is an invertible combined response matrix. The following propositions are equivalent:
(i) The estimated sources in $y$ are pairwise independent.
(ii) The estimated sources in $y$ are mutually independent.
(iii) $M = \Lambda P$, where $\Lambda \in \mathbb{R}^{N \times N}$ is an invertible diagonal matrix and $P \in \mathbb{R}^{N \times N}$ is a permutation matrix.
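Proof. (iii) ⇒ (ii): If $M = \Lambda P$, then each element of $y$ is equal to only one element of $s$ multiplied by a real nonzero constant. In addition, no two distinct elements of $y$ are scaled versions of the same element of $s$. Since the rvs in $s$ are mutually independent, it follows that the estimated sources in $y$ are also mutually independent.
(ii) ⇒ (i): Since the estimated sources in $y$ are mutually independent, they are also pairwise independent [14].
(i) ⇒ (iii): Let the rvs in $y$ be pairwise independent and suppose, by contradiction, that $M \neq \Lambda P$ for $N \geq 2$. Since pairwise independence implies uncorrelatedness, Lemma 2 applies and $M$ has a factorization of the form (VI.1). It follows that at least two different columns of $M$ have at least two nonzero elements each. Indeed, because $M$ is invertible and not of the form $\Lambda P$, some column has at least two nonzero elements; if it were the only such column, the remaining $N - 1$ columns would each have a single nonzero element, located in distinct rows, and the orthogonality of the columns of $M_1$ would force that column to vanish in those $N - 1$ rows, leaving it with at most one nonzero element. Applying Lemma 3 to the sources corresponding to each of the two aforementioned columns of $M$, and recalling that no source is constant, it follows that $s$ has at least two Gaussian sources. This is in contradiction with the original hypothesis that at most one of the elements of $s$ is Gaussian.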
This theorem ensures that, under certain conditions, the separation principle based on imposition of spatial independence at the output of the separating system is able to adequately separate the sources.In particular, this strategy is always valid if M is invertible and at most one of the sources is Gaussian and none of them are constant.This is in consonance with the conclusions obtained in Example 3 for N = 2.
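Condition (iii) of Theorem 4 is easy to test for a given combined response matrix. The helper below is a minimal sketch: the function name `is_scaled_permutation` and the tolerance `tol` are choices made here for illustration, and the matrix is assumed to be square and stored as a list of rows. It checks that every row and every column of $M$ contains exactly one entry that is nonzero up to `tol`, which for a square matrix is equivalent to $M = \Lambda P$ with $\Lambda$ invertible.

```python
def is_scaled_permutation(M, tol=1e-9):
    """Return True if the square matrix M has exactly one entry with |entry| > tol
    in every row and in every column, i.e., M = Lambda P as in condition (iii)."""
    n = len(M)
    mask = [[abs(M[i][j]) > tol for j in range(n)] for i in range(n)]
    rows_ok = all(sum(row) == 1 for row in mask)
    cols_ok = all(sum(mask[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok

# Antidiagonal matrix: sources recovered with a swap, a sign change and a scaling.
print(is_scaled_permutation([[0.0, 2.0], [-0.5, 0.0]]))   # True
# All elements nonzero: each output still mixes both sources.
print(is_scaled_permutation([[1.0, 1.0], [1.0, -1.0]]))   # False
```

In a simulation where the mixing matrix $H$ is known, this check can be applied to $M = WH$ to decide whether a candidate separating matrix $W$ has separated the sources up to the permutation and scale ambiguities.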
Also, Theorem 4 establishes that the sources may be separated up to possible permutation and scale ambiguities, represented by matrices $P$ and $\Lambda$, respectively. These are inherent indeterminacies of any solution to the BSS problem for instantaneous linear mixtures with iid and mutually independent sources in space, except when additional hypotheses are considered in the statistical model of the sources [4], [5], [8].
Finally, according to Theorem 4, although mutual independence of the sources is required in the statistical model, imposing either mutual independence or pairwise independence at the output of the separating system implies adequate source separation under the separability conditions. This is an interesting result, especially for envisioning solutions to the BSS problem.
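The equivalence between pairwise and mutual independence in Theorem 4 is noteworthy because it does not hold for arbitrary random vectors. The sketch below uses the classical XOR construction, which is not taken from the paper: two independent fair bits and their exclusive-or are pairwise independent but not mutually independent. Probabilities are computed exactly by enumeration.

```python
from fractions import Fraction
from itertools import product

# Sample space: s1 and s2 are independent fair bits, and s3 = s1 XOR s2.
outcomes = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
p = Fraction(1, len(outcomes))   # the four outcomes are equally likely

def prob(event):
    """Exact probability of the set of outcomes for which event(outcome) is True."""
    return sum(p for o in outcomes if event(o))

def pairwise_independent():
    """Check P(si=u, sj=v) = P(si=u) P(sj=v) for every pair of rvs and every value pair."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for u, v in product((0, 1), repeat=2):
            joint = prob(lambda o: o[i] == u and o[j] == v)
            if joint != prob(lambda o: o[i] == u) * prob(lambda o: o[j] == v):
                return False
    return True

def mutually_independent():
    """Check that the joint pmf of (s1, s2, s3) factors into the three marginals."""
    for u, v, w in product((0, 1), repeat=3):
        joint = prob(lambda o: o == (u, v, w))
        marginals = (prob(lambda o: o[0] == u) * prob(lambda o: o[1] == v)
                     * prob(lambda o: o[2] == w))
        if joint != marginals:
            return False
    return True

print(pairwise_independent())   # True
print(mutually_independent())   # False
```

These three rvs are not obtained from independent sources through an invertible linear model $y = Ms$, so the example does not contradict Theorem 4; it only shows that the linear structure and the separability conditions are what make the two notions coincide.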
As a way to further interpret and understand the separability conditions given by Theorem 4, the effect of spatial whitening of the mixtures in the BSS problem is discussed next.

C. Spatial whitening and separability
The procedure known as whitening consists in making a set of rvs uncorrelated and also normalized, such that all of them have the same variance [4], [8].For convenience, the following definition is adopted.
In BSS, a usual approach for separation consists in applying a spatial prewhitening procedure to the mixture vector, resulting in a whitened mixture vector on which the separation is subsequently performed. This spatial prewhitening step is usually done by means of a linear transformation.
A simplified scheme of the BSS problem with a prewhitening procedure is illustrated in Figure 3. The zero-mean mixture vector $x$ is obtained according to (III.9), where $H \in \mathbb{R}^{N \times N}$ is an invertible mixing matrix and $s$ is a zero-mean source vector. A prewhitening linear transformation $A \in \mathbb{R}^{N \times N}$ is applied to the mixture vector, resulting in a whitened mixture vector
$$\tilde{x} = Ax. \tag{VI.15}$$
The whitening matrix $A$ is chosen such that $C_{\tilde{x}} = I_N$, where $C_{\tilde{x}} = E[\tilde{x}\tilde{x}^T]$ is the covariance matrix of $\tilde{x}$. Finally, an invertible separating transform $W \in \mathbb{R}^{N \times N}$ is applied to the whitened mixture vector, resulting in an estimated source vector
$$y = W\tilde{x}. \tag{VI.16}$$
In order to better understand the consequences that a prewhitening procedure has in the global separating problem, it is convenient to consider the following proposition.
Proposition 1. If the source rvs in $s$ are mutually independent and nonconstant, then an adequate orthogonal transformation $W$ applied to the prewhitened mixture vector is enough to adequately separate the sources.
Proof. Substituting (III.9) into (VI.15) yields
$$\tilde{x} = AHs. \tag{VI.17}$$
Since the source rvs in $s$ are mutually independent and nonconstant, and $\tilde{x}$ is a random vector of uncorrelated rvs with unit variance, it follows from Lemma 2 that
$$AH = \Lambda_{\tilde{x}}^{1/2} M_1 \Lambda_s^{-1/2} = M_1 \Lambda_s^{-1/2}, \tag{VI.18}$$
where $\Lambda_{\tilde{x}} = I_N$, $M_1 \in \mathbb{R}^{N \times N}$ is an orthogonal matrix and $\Lambda_s$ is a positive-definite diagonal matrix. Now, substituting (VI.17) and (VI.18) into (VI.16) yields
$$y = W\tilde{x} = WAHs = WM_1\Lambda_s^{-1/2}s. \tag{VI.21}$$
Figure 3. Simplified scheme of the main elements involved in BSS for linear instantaneous mixtures with iid and mutually independent sources in space when a prewhitening procedure is applied to the mixtures. Block labels: independent sources $s$, mixtures $x$, whitened mixtures $\tilde{x}$, estimated sources $y$.
It follows from Equation (VI.21) that, by choosing an adequate orthogonal matrix $W$, it is possible to adequately separate the sources.
From this proposition, it follows that a prewhitening step can be understood as solving part of the BSS problem. This means that the search for an adequate separating matrix $W$ can be restricted to orthogonal matrices when prewhitening is carried out [4], [5]. Indeed, since independence implies uncorrelatedness, prewhitening is in general a preliminary step toward attaining independence.
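The argument behind Proposition 1 can be traced numerically for $N = 2$ at the level of exact (population) covariances. In the sketch below, the mixing matrix `H` and the source variances `var_s` are hypothetical, and the whitening matrix is taken as the inverse of the Cholesky factor of the mixture covariance. This is one valid choice among many, adopted here for convenience, and not necessarily the construction used in [5].

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def whitening_from_cholesky(c):
    """Return A = L^{-1}, where c = L L^T is a 2x2 symmetric positive-definite matrix."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 * l21)
    return [[1.0 / l11, 0.0],
            [-l21 / (l11 * l22), 1.0 / l22]]

# Hypothetical mixing matrix and source variances, chosen only for illustration.
H = [[1.0, 0.6], [0.4, 1.0]]
var_s = [1.0, 2.0]
Lambda_s = [[var_s[0], 0.0], [0.0, var_s[1]]]
sqrt_Lambda_s = [[math.sqrt(var_s[0]), 0.0], [0.0, math.sqrt(var_s[1])]]

# Covariance of the zero-mean mixtures x = H s.
C_x = matmul(matmul(H, Lambda_s), transpose(H))

# Whitening matrix and covariance of the whitened mixtures (close to the identity).
A = whitening_from_cholesky(C_x)
C_white = matmul(matmul(A, C_x), transpose(A))

# Orthogonal factor left after whitening: M_1 = A H Lambda_s^{1/2}.
M1 = matmul(matmul(A, H), sqrt_Lambda_s)
gram = matmul(transpose(M1), M1)

# The orthogonal choice W = M_1^T leaves M = W A H = Lambda_s^{-1/2}, a diagonal matrix.
W = transpose(M1)
M = matmul(matmul(W, A), H)
print("M =", M)
```

Here $W = M_1^T$ is computed from $H$ and $\Lambda_s$, which are unknown in a blind setting. The sketch only confirms that an orthogonal separating matrix exists after whitening, not how to find it from the mixtures alone.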
It should be noted, however, that the condition given by Equation (VI.21) is equivalent to the imposition of spatial independence at the output of the separating system only if the separability conditions of Theorem 4 are satisfied. Specifically, at most one source must be a Gaussian rv. Therefore, although Proposition 1 guarantees that an orthogonal transformation separating two or more independent Gaussian sources still exists, such separation cannot be carried out blindly, i.e., by imposing spatial independence of the estimated sources. In fact, it is possible to show that for two or more Gaussian sources, and after prewhitening, any choice of an orthogonal matrix $W$ ensures independence of the estimated sources, but this does not imply that the sources are always separated.
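The last claim follows from a one-line computation, sketched here under the assumptions that all sources are zero-mean independent Gaussian rvs, that the whitened mixtures satisfy $E[\tilde{x}\tilde{x}^T] = I_N$, and that $W$ is an arbitrary orthogonal matrix applied as $y = W\tilde{x}$:

```latex
\begin{align*}
C_y &= E\left[ y y^T \right]
     = W \, E\left[ \tilde{x} \tilde{x}^T \right] W^T
     = W I_N W^T
     = W W^T
     = I_N .
\end{align*}
```

Since $y$ is a linear transformation of independent Gaussian sources, it is jointly Gaussian, and for jointly Gaussian rvs uncorrelatedness implies independence. The estimated sources are therefore independent for every orthogonal $W$, including those for which each output still mixes several sources.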
In practice, spatial prewhitening can be done using similar procedures to what is carried out by principal component analysis (PCA) algorithms [5], [9].Next, a numerical example of prewhitening applied to observations of the mixture vector is presented.The goal is to verify what happens after the prewhitening step in terms of separability conditions and to illustrate the implications of Theorem 4 and Proposition 1.
Example 4 (Source distributions and spatial prewhitening of the mixtures). From now on, the scenario described in Example 3 with $N = 2$ is considered along with the scheme of Figure 3 and the following additional remarks:
• The zero-mean source vector, given by $s = [\, s_1 \;\; s_2 \,]^T$, is such that $s_1$ is independent of $s_2$.
• The mixture vector, given by $x = [\, x_1 \;\; x_2 \,]^T$, is obtained according to $x = Hs$, where the following mixing matrix, with columns $h_1$ and $h_2$, is considered:
$$H = [\, h_1 \;\; h_2 \,]. \tag{VI.22}$$
• A prewhitening transformation $A$, obtained according to [5], is applied to the mixture vector, resulting in a whitened mixture vector $\tilde{x} = [\, \tilde{x}_1 \;\; \tilde{x}_2 \,]^T$ such that $\tilde{x} = Ax$.
Hereafter, the effect of the mixture and prewhitening procedures on the distributions of the involved signals is considered for three distinct joint distributions of the source rvs. Scatter plots for the independent sources, mixtures and whitened mixtures are shown in Figure 4 for $10^3$ independent drawings of each random vector. The following source distributions are considered: (a) uniform rvs, (b) rvs with distinct distributions, one of them being bimodal, and (c) Gaussian rvs.
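A setup of this kind can be imitated with a short simulation. The sketch below covers only a case like (a), with unit-variance uniform sources, and uses a hypothetical mixing matrix `H` because the numerical entries of the matrix used in the example are not available here. Whitening is done in the PCA style, $A = D^{-1/2}E^T$, from a closed-form eigendecomposition of the $2 \times 2$ sample covariance; this is an assumption about the procedure and not necessarily the exact method of [5].

```python
import math
import random

def sample_mean_cov(data):
    """Sample mean and (biased) sample covariance matrix of a list of 2-D points."""
    n = len(data)
    m0 = sum(p[0] for p in data) / n
    m1 = sum(p[1] for p in data) / n
    c00 = sum((p[0] - m0) ** 2 for p in data) / n
    c01 = sum((p[0] - m0) * (p[1] - m1) for p in data) / n
    c11 = sum((p[1] - m1) ** 2 for p in data) / n
    return (m0, m1), [[c00, c01], [c01, c11]]

rng = random.Random(1)            # fixed seed so that the run is reproducible
n = 1000                          # 10^3 drawings, as in the example
half_width = math.sqrt(3.0)       # uniform on [-sqrt(3), sqrt(3)] has unit variance
s = [(rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))
     for _ in range(n)]

H = [[1.0, 0.6], [0.4, 1.0]]      # hypothetical mixing matrix (not the paper's)
x = [(H[0][0] * a + H[0][1] * b, H[1][0] * a + H[1][1] * b) for a, b in s]

mean_x, C_x = sample_mean_cov(x)

# Closed-form eigendecomposition of the symmetric 2x2 matrix C_x = E D E^T.
a, b, c = C_x[0][0], C_x[0][1], C_x[1][1]
phi = 0.5 * math.atan2(2.0 * b, a - c)          # angle of the first eigenvector
cp, sp = math.cos(phi), math.sin(phi)
lam1 = a * cp * cp + 2.0 * b * sp * cp + c * sp * sp   # eigenvalue along (cp, sp)
lam2 = a * sp * sp - 2.0 * b * sp * cp + c * cp * cp   # eigenvalue along (-sp, cp)

# PCA-style whitening matrix A = D^{-1/2} E^T.
A = [[cp / math.sqrt(lam1), sp / math.sqrt(lam1)],
     [-sp / math.sqrt(lam2), cp / math.sqrt(lam2)]]

# Whitened mixtures: remove the sample mean, then apply A.
x_white = [(A[0][0] * (p[0] - mean_x[0]) + A[0][1] * (p[1] - mean_x[1]),
            A[1][0] * (p[0] - mean_x[0]) + A[1][1] * (p[1] - mean_x[1]))
           for p in x]

_, C_white = sample_mean_cov(x_white)
print("C_white =", C_white)       # close to the identity by construction
```

Because `A` is computed from the sample covariance itself, the whitened samples have an identity sample covariance up to rounding. What remains between the whitened mixtures and the sources is, approximately, an orthogonal transformation.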
In particular, regarding Figure 4, it should be noted that:
• The orthogonal axes shown in red and blue in the source scatter plots (first row of Fig. 4) are also represented after the mixing and prewhitening transforms (second and third rows of Fig. 4, respectively). The axes directions change due to the linear transformations being applied. Specifically, after mixing, the axes point in the directions of $h_1$ (red) and $h_2$ (blue), which are not orthogonal. After the prewhitening transform, however, the orthogonality of the axes is recovered.
• By comparing the scatter plots of the unobserved sources with those of the whitened mixtures, it can be seen that the whitening procedure is capable of recovering the "general structure" of the source distribution support up to an orthogonal transformation (e.g., a rotation), which is consistent with Proposition 1.
Therefore, the following step in source separation with prewhitening consists in determining an orthogonal separation matrix $W$ to be applied to the whitened mixtures such that the sources are adequately separated. If, for example, a rotation is applied to the scatter plots in the third row of Figure 4, it can be noted that:
• There is more than one rotation that adequately separates the sources. All of them yield correct separation, but possibly with a change in the order of the sources. This reflects the permutation ambiguity considered in Theorem 4.
• For the source distribution considered in case (b), the scatter plot of the whitened mixtures is elongated in the direction of $h_2$ when compared to the independent source scatter plot. In this case, when the correct rotation is applied to the scatter plot of the whitened mixtures, the sources are recovered with a scale ambiguity, as also considered in Theorem 4. Such behavior, however, is not observed in cases (a) and (c), where both source rvs have the same variance.
Finally, according to Theorem 4, spatial independence imposition at the output of the separating system implies adequate blind source separation for cases (a) and (b). Source separation is not ensured, however, for case (c), where both sources are Gaussian. This can also be understood in an alternative and complementary way: any rotation applied to the scatter plot of the whitened mixtures results in independent estimated sources because (i) uncorrelatedness implies independence in the Gaussian case and (ii) the covariance matrix of the whitened mixtures is the identity. Therefore, spatial independence imposition is not enough for blind source separation in this case.

VII. CONCLUSIONS
P. Comon in [9] resorted to the important relation between independence and non-Gaussianity of rvs evidenced by the Darmois-Skitovich theorem to establish sufficient conditions of source separability for iid and mutually independent sources in space when applied to a linear instantaneous mixing system. Namely, the sources can be recovered via independence imposition at the output of the separating system if at most one source is Gaussian, none are constant and the mixing system is invertible.
Finally, it should be noted that the Darmois-Skitovich theorem has been extended in many different ways, for example, in the derivation of separability conditions for complex sources [26], [27] and nonlinear mixtures [28]. In addition, it is still a subject of research to the present day, especially in the field of mathematical statistics (see, for example, [29], [30]). Motivated by the fact that the relation between independence and non-Gaussianity of rvs is not evident, the goal of this paper was to aid in the understanding of fundamental theoretical concepts of BSS by presenting and interpreting the Darmois-Skitovich theorem in this context. In future work, the understanding of current, more sophisticated BSS techniques is envisaged.

Figure 1. Simplified scheme of the main elements involved in the BSS of $N_y$ sources through the observation of $N_x$ mixtures.

Figure 2. Simplified scheme of the main elements involved in the real and even-determined BSS problem with N = 2, for linear instantaneous mixing and separating systems with iid and mutually independent sources in space.

Figure 4. Scatter plots of independent sources, mixtures and whitened mixtures for $N = 2$ and (a) uniform sources, (b) sources with distinct distributions and (c) Gaussian sources. Independent drawings of each source, and of the associated mixture and whitened mixture, are indicated in the axis labels by a superscript index $i$ enclosed in parentheses, with $i = 1, 2, \ldots, 10^3$.
• For $N_s = 2$, mutual independence is equivalent to pairwise independence; i.e., mutual indep. ⇔ pairwise indep. ⇒ uncorrelatedness.
• A full equivalence of these three properties occurs in the particular case of $N_s \geq 2$ jointly Gaussian rvs; in this case, mutual indep. ⇔ pairwise indep. ⇔ uncorrelatedness.