Homophonic Coding and Random Number Generation

Abstract: Efficient generation of a discrete probability distribution is of current interest in areas like cryptography and random number generation. This paper revisits some known homophonic coding techniques and discusses their application in random number generation. Both standard and constrained homophonic coding techniques are considered. Three algorithms are given for generating a discrete probability distribution using one or more biased coins. This approach contributes an alternative solution to the classical problem of generating a discrete probability distribution using biased coins.


I. INTRODUCTION
The homophonic coding technique, when applied to a sequence of symbols $(u_1, u_2, \ldots)$ at the output of an information source, produces at the homophonic encoder output a sequence of symbols, called homophones, drawn from a larger alphabet, i.e., more than one homophone can be associated with a given source symbol. Each homophone is usually represented one-to-one by a block $x_i$, called a homophonic codeword, containing $W_i$ $D$-ary symbols, where $D$ is an integer greater than or equal to 2.
Homophonic coding is employed with the objective of decomposing each source symbol probability in such a way that the resulting sequence of homophones (or labels) appears to be randomly generated, i.e., to be a sequence of independent and identically distributed (i.i.d.) random variables. For simplicity and practical interest this paper focuses on binary systems, although the procedures described are applicable to $D$-ary coding alphabets. In a perfect standard binary homophonic coding scheme the symbols in each homophonic codeword are i.i.d. and equally likely binary random variables, while in a binary-constrained homophonic coding scheme they are i.i.d. but not equally likely binary random variables.
The generation of a string of random variables drawn from a discrete probability distribution using flips of a biased coin is an old problem of great importance in cryptography and random number generation. Random numbers find many applications in practice; in particular, they are used in the testing and simulation of communication systems as well as in many other computational applications [1]. Von Neumann [2] introduced a simple algorithm to generate a string of i.i.d. bits from flips of a coin with unknown bias. Since then several researchers have studied the generation of uniform random variables under a variety of assumptions [3]-[13].
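As a brief illustration of the von Neumann procedure [2] mentioned above, the following Python sketch reads flips of a coin with unknown bias in non-overlapping pairs and keeps one bit from each unequal pair. The function names `von_neumann_extract` and `biased_flips`, the bias 0.8, and the seed are assumptions made for this illustration only and do not come from [2].

```python
import random


def von_neumann_extract(flips):
    """Map a sequence of biased coin flips (0/1) to unbiased bits."""
    out = []
    # Consume the flips in non-overlapping pairs.
    for a, b in zip(flips[0::2], flips[1::2]):
        if a != b:
            # (1, 0) -> 1 and (0, 1) -> 0; both pairs have probability p(1 - p).
            out.append(a)
        # Equal pairs (0, 0) and (1, 1) are discarded.
    return out


def biased_flips(p, n, seed=0):
    """Hypothetical helper: n flips of a coin with P(1) = p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


bits = von_neumann_extract(biased_flips(0.8, 100_000))
print(len(bits), sum(bits) / len(bits))
```

The fraction of ones in the output should be close to 1/2 regardless of the bias, at the cost of discarding all equal pairs.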
This paper revisits some known homophonic coding techniques and discusses their application in random number generation. Two binary-constrained algorithms are discussed for the representation of discrete probability distributions by tossing a single biased coin. For a given discrete probability distribution $\{p_1, p_2, \ldots, p_K\}$ each entry $p_i$, $1 \le i \le K$, is represented by a sum (finite or infinite) of fractions, where each fraction is labeled by a sequence of i.i.d. random variables. Each fraction is a rational number $b_i/n_i$, where $b_i$ is a number between 1 and $D$, and $n_i$ is a power of $D$. A third algorithm, the Λ-algorithm, is presented to generate a sequence of random variables drawn from a discrete probability distribution by tossing two or more distinct biased coins.
The optimality question for the algorithms considered here remains open [11], [14]. However, it is shown by some simple examples that with the Λ-algorithm it is possible to obtain equivalent and in some cases even better results than those in [15]. It is important to notice that the biased coins employed in the Λ-algorithm are arbitrary, i.e., they do not depend on the probability distribution $\{p_1, p_2, \ldots, p_K\}$, as is the case with the algorithm in [15]. This approach contributes an alternative solution to the classical problem of generating a sequence of i.i.d. random variables using two or more coins where some of the coins are biased.
We call the reader's attention to the fact that, when the probability distribution of the source of random numbers is known, it was shown that it is impossible to construct a recursive tree algorithm to achieve a specific target probability distribution.
In Section II we describe the maximum entropy per step algorithm [16], or MAX-ENT algorithm for short, in the context of the generation of a discrete probability distribution using a biased coin. In Section III we describe the minimum entropy per step algorithm [17], also in the context of random number generation, which is later modified and employed in Section IV for the generation of a discrete probability distribution using two or more biased coins. In Section IV the application of the proposed algorithm is illustrated through some examples in which a uniform probability distribution is generated using two coins, one fair and one biased. Finally, Section V presents some conclusions as well as some suggestions for future research.

A. Related Works
Feldman et al. [18] proved, among several results, that the outcomes of an $n$-sided fair die, that is, the outcomes of a random variable which takes $n$ equiprobable values, can be simulated in bounded time by using flips of just one type of coin of appropriate rational bias if and only if $n$ is a power of 2 ([18], Theorem 2). It also followed from [18] that a general $n$-sided fair die can always be simulated by using two coins of appropriate bias and at most $2\lceil \log n \rceil + 1$ coin flips, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. Gargano and Vaccaro [15] published an improvement on this bound and considered several algorithmic questions related to the classical problem of simulating the outcomes of a uniform random variable by using a small number of biased coins. They also gave an algorithm to generate an $n$-sided fair die using only a fair coin and a biased coin.
In [16] the algorithm later called the MAX-ENT algorithm was introduced to perform homophonic coding in which the symbols in each homophonic codeword are independent and identically distributed binary random variables obeying an arbitrary probability distribution $\Pi_2 = \{p, 1-p\}$, where $p \ge 1/2$. The MAX-ENT algorithm provides a solution to the problem posed by Knuth and Yao [11, page 427] on the generation of probability distributions using a biased coin, and advanced one step further the solution originally proposed by Julia Abrahams [3]. In [17] the minimum entropy per step algorithm was introduced to perform homophonic coding, motivated by the fact that in many situations it is more efficient than the MAX-ENT algorithm.
In [19] the authors provided the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time, and achieves the information-theoretic upper bound on efficiency. In [20] the authors considered the problem of generating random bits from a loaded die as a natural generalization of generating random bits from a biased coin, thereby enabling the application of existing algorithms to general sources. Furthermore, the authors also investigated new approaches for efficiently generating a prescribed number of random bits from an arbitrary biased coin. In [21] the problem of extracting a prescribed number of random bits was addressed by reading the smallest possible number of symbols from a source whose statistical behaviour is not fully specified. The related interval algorithm proposed by Han and Hoshi [9] has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. It was noticed that, in practice, sources of randomness have inherent correlations and are affected by measurement noise. In other words, it is difficult to obtain an accurate estimate of the probability distribution. The authors' main contribution is the design of extractors that have a variable input length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes, and approach the information-theoretic upper bound on efficiency. In [22], the authors' main contribution is an algorithm that generates random bit streams from biased coins, uses bounded space, and runs in expected linear time. The algorithm approaches the information-theoretic upper bound on efficiency as the size of the allotted space increases. Finally, in [23] the author reports a computation of the exact output rate of a recently discovered generalization of the Peres algorithm [24] for generating random bits from loaded dice. Instead of resorting to a brute-force computation over all possible inputs, which quickly becomes impractical as the input size increases, the author computes the total output length on equiprobable sets of inputs by dynamic programming using a recursive formula.

II. THE MAXIMUM-ENTROPY PER STEP ALGORITHM
In [11, page 427] a question was posed concerning the generation of discrete probability distributions from a biased coin: "What if the source of independent random bits is biased towards 1 with probability p?" This question was answered in part by Julia Abrahams [3], for the case where the biased binary random variable has the special form $(t^i, t^j)$, for integers $i$ and $j$, where $t$ is a positive root of the equation $t^i + t^j = 1$.
We consider in the sequel discrete probability distributions $Q = \{q_1, q_2, \ldots, q_K\}$. We assume with no loss of essential generality that all $K$ probabilities $q_i$, $1 \le i \le K$, have non-zero values and that $K \ge 2$. The set of labels $V_i = \{v(i,1), v(i,2), \ldots, v(i,j), \ldots\}$, either finite or countably infinite, associated with $q_i$, $1 \le i \le K$, is characterized by the fact that for each entry $v(i,j)$ we have $P_{V_i|Q}(v(i,j) \mid q_l) \neq 0$ if and only if $l = i$. For binary variable-length coding of each $v(i,j)$ a sequence $X(1,j), X(2,j), \ldots, X(W_j,j)$ is defined whose entries are binary random variables taking values in the alphabet $\{0, 1\}$, and where $W_j$, the length of the sequence representing $v(i,j)$, is in general also a random variable. It is required that $x(1,j), x(2,j), \ldots, x(W_j,j)$ be a prefix-free encoding of $v(i,j)$, i.e., such sequences are all distinct and none is a prefix of another. Hereafter all entropies are in bits and all logarithms are to base 2. In order to simplify the notation, we represent $X(1,j), X(2,j), \ldots, X(W_j,j)$ by $X_1 X_2 \ldots X_W$ whenever no ambiguity results.
Definition 1: We define a biased coin tossing coding scheme to be perfect if the symbols of any sequence $X_1 X_2 \ldots X_W$ are i.i.d. discrete random variables.
Definition 2: We define a biased coin tossing coding scheme to be optimum if it is perfect and minimizes the average sequence length $E(W)$ over all perfect biased coin tossing schemes, for a given discrete probability distribution.

A. Biased Coin Tossing
In standard unbiased $D$-ary coin tossing schemes, $D \ge 2$, the designer benefits from the fact that a given probability $q_i \in Q$, $0 < q_i < 1$, has an essentially unique base-$D$ decomposition. This follows because $q_i$ either has a unique decomposition as an infinite sum of negative powers of $D$, or it has both a decomposition as a finite sum of distinct negative powers of $D$ and a decomposition as an infinite sum of distinct negative powers of $D$ in which the smallest term in the finite decomposition is expanded as an infinite sum of successive negative powers of $D$. For example, for $D = 3$, $q_i = 4/9$ can be decomposed as either $q_i = 1/3 + 1/9$ or as $q_i = 1/3 + \sum_{k=3}^{\infty} 2/3^k$.
Biased coin tossing schemes unfortunately do not inherit the essentially unique probability decomposition property described above. This means that in order to split each probability in a discrete probability distribution into labels we need to work with the set of probabilities for all symbols $i$, $1 \le i \le K$, instead of working with only one symbol probability at a time as in the $D$-ary case. We handle this situation with the biased coin coding algorithm introduced in [16] in the context of homophonic coding and described next in terms of random number generation.
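The base-$D$ decomposition discussed above can be checked with exact rational arithmetic. The following Python sketch is an illustration only: the helper name `base_d_digits` and the truncation point of the infinite sum are assumptions. It computes the finite base-3 expansion of 4/9 and verifies that the alternative expansion, in which the final term 1/9 is replaced by 2/27 + 2/81 + ..., converges to the same value.

```python
from fractions import Fraction
import math


def base_d_digits(q, D, n_digits):
    """Return the first n_digits base-D digits of q, for 0 <= q < 1."""
    digits = []
    for _ in range(n_digits):
        q *= D
        d = math.floor(q)      # next digit of the expansion
        digits.append(d)
        q -= d
    return digits


q = Fraction(4, 9)
print(base_d_digits(q, 3, 6))  # finite expansion: 4/9 = 1/3 + 1/9

# Infinite alternative, truncated at k = 39: 1/3 + sum_{k >= 3} 2 / 3**k.
partial = Fraction(1, 3) + sum(Fraction(2, 3**k) for k in range(3, 40))
print(float(q - partial))      # remaining gap shrinks geometrically
```

With an unbiased $D$-ary coin each probability can therefore be treated on its own; the biased case below lacks this property, which is why all $K$ probabilities are processed jointly.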

B. Biased Coin Tossing Algorithm
Let $\Pi_2 = \{p, 1-p\}$, $p \ge 1/2$, be the biased coin probability distribution. For a given discrete probability distribution the biased coin MAX-ENT algorithm simultaneously finds the decomposition of each probability as a sum (finite or infinite) of terms $p^{\lambda}(1-p)^{l-\lambda}$, and the corresponding prefix-free sequence, where $\lambda$ is the number of heads (1's) and $l - \lambda$ is the number of tails (0's) in a sequence of length $l$. The labels $v(i,j)$ are selected as terminal nodes in $T$, the binary rooted tree with nodes labeled by probabilities, such that from any non-terminal node two branches emanate with probabilities $p$ and $1 - p = \bar{p}$, respectively. Let $\alpha(i,j)$ denote the probability of $v(i,j)$.
Definition 3: We define the component running sum $\gamma_m(i,j)$ for $q_i$, $1 \le i \le K$, at the $m$th iteration of the MAX-ENT algorithm as $\gamma_m(i,j) = q_i - \sum_{k=1}^{j} \alpha(i,k)$, with $\gamma_m(i,j) = q_i$ for $j = 0$, where $j$ denotes the number of labels allocated to $q_i$ up to the $m$th iteration.
Definition 4: We define the running sum set $\Gamma_m$ at the $m$th iteration of the MAX-ENT algorithm as the set of current component running sums $\gamma_m(i,j)$, $1 \le i \le K$, and we denote by $\gamma_{\max}$ its largest element.
We expand each unused (not yet labeled) terminal node in $T$ whose probability exceeds $\gamma_{\max}$ by the least number of branches sufficient to make the resulting extended terminal node probability less than or equal to $\gamma_{\max}$. We call the resulting tree the processed binary rooted tree with probabilities and denote it by $T_p$. At each iteration labels are assigned to terminal nodes of the corresponding processed binary rooted tree with probabilities, in such a manner that the unused terminal node with largest probability is assigned as a label to the symbol with largest running sum $\gamma_{\max}$. The MAX-ENT algorithm consists of the following steps.
1) Let $m = 1$. Let $\Gamma_1$ be the set whose elements are the probabilities $q_i$, $1 \le i \le K$, arranged in decreasing order, and construct the corresponding processed binary rooted tree with probabilities $T_p$.
2) Without loss of essential generality, assume that $q_r = \gamma_{\max} \in \Gamma_1$. If there are two or more probabilities with the same largest value, pick any one of them at random to start. Let $(i,j) = (i,1)$ and let $\gamma_1(i,1) = q_i$, $1 \le i \le K$.
3) Find the unused path $E_l$ of length $l$ in $T_p$ whose probability $P(l)$ is largest among unused paths.
4) Associate to $q_r$ the label (terminal node) $v(r,j)$ and the binary sequence of length $l_{r,j} = l$ whose digits constitute the labeling of $E_l$ in $T_p$. This implies $\alpha(r,j) = P(l)$.
5) Let $j \leftarrow j + 1$. For the updated value of $j$ compute the component running sum $\gamma_m(r,j)$ and let $\Gamma_m = \Gamma_m - \{\gamma_{\max}\}$.
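The labeling rule of the MAX-ENT algorithm (the largest unused terminal node goes to the symbol with the largest running sum) can be sketched in Python as follows. This is one possible reading of the description above, not the implementation of [16]: the function name `max_ent`, the iteration cap `max_iter`, the heap of unused leaves, and the deterministic tie-breaking (first maximal running sum instead of a random pick) are all assumptions of this sketch. Exact arithmetic with `fractions.Fraction` is used so that running sums reach zero exactly when a decomposition is finite.

```python
from fractions import Fraction
import heapq


def max_ent(q, p, max_iter=50):
    """Sketch of the MAX-ENT labeling rule for one coin with P(1) = p.

    q is a list of Fractions summing to 1 and p a Fraction, 1/2 <= p < 1.
    Returns (labels, gamma): labels[i] lists the (codeword, probability)
    pairs assigned to q[i]; gamma[i] is the part of q[i] still unassigned.
    """
    gamma = list(q)                    # component running sums
    labels = [[] for _ in q]
    unused = [(-Fraction(1), "")]      # max-heap of unused leaves (negated probability)
    for _ in range(max_iter):
        g_max = max(gamma)
        if g_max == 0:                 # every q[i] is fully decomposed
            break
        # Expand unused leaves until none has probability above gamma_max.
        while -unused[0][0] > g_max:
            neg, word = heapq.heappop(unused)
            heapq.heappush(unused, (neg * p, word + "1"))
            heapq.heappush(unused, (neg * (1 - p), word + "0"))
        # Largest unused leaf is assigned to the largest running sum.
        neg, word = heapq.heappop(unused)
        prob = -neg
        r = gamma.index(g_max)         # assumed tie-break: first maximum
        labels[r].append((word, prob))
        gamma[r] -= prob
    return labels, gamma


# Uniform target on two symbols with an assumed coin p = 2/3; the
# decomposition is infinite here, so only the first six labels are produced.
labels, gamma = max_ent([Fraction(1, 2), Fraction(1, 2)], Fraction(2, 3), max_iter=6)
for i, lab in enumerate(labels):
    print(i, lab, "remaining:", gamma[i])
```

Since a labeled leaf is never expanded again, the codewords produced are prefix-free by construction, and each assigned probability never exceeds the running sum it is subtracted from.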

III. THE MINIMUM ENTROPY PER STEP ALGORITHM
We now describe the minimum entropy per step (MIN-ENT) algorithm. Many of the terms used here have already been introduced in Section II. At the $m$th iteration, $m > 1$, a label is assigned to a terminal node of the corresponding tree $T_p$ in such a manner that the unused terminal node with largest probability $P_m$ is assigned as a label to the probability $q_r$ with minimum nonnegative value for the difference between its component running sum $\gamma_m(r,j)$ and $P_m$, i.e., such that $\min_i \{\gamma_m(i,j) - P_m \mid \gamma_m(i,j) - P_m \ge 0\} = \gamma_m(r,j) - P_m$. The MIN-ENT algorithm consists of the following steps.
1) Let $m = 1$ and let $\Gamma_1 = \{q_1, q_2, \ldots, q_K\}$.
2) Determine $\gamma_{\max}$ and produce the tree $T_p$ for the $m$th iteration by expanding each terminal node in the tree from the $(m-1)$th iteration, $m > 1$, whose probability exceeds $\gamma_{\max}$, by the least number of branches sufficient to make the resulting extended terminal node probability less than or equal to $\gamma_{\max}$.
3) Find the unused path $E_l$ of length $l$ in $T_p$ whose probability is largest among not yet labeled paths, and denote this largest probability by $P_m$.
4) If $\min_i \{\gamma_m(i,j) - P_m \mid \gamma_m(i,j) - P_m \ge 0\} = \gamma_m(r,j) - P_m \ge 0$, then associate to $q_r$ the label (terminal node) $v(r,j)$ and the binary sequence of length $l$ whose digits constitute the labeling of $E_l$ in $T_p$. This implies $\alpha(r,j) = P_m$. Compute the component running sum $\gamma_m(r,j)$ after this decomposition and let $\Gamma_m = \Gamma_m - \{\gamma_m(r,j)\}$. If $\gamma_m(r,j) = 0$ then let $\Gamma_{m+1} = \Gamma_m$; the decomposition of $q_r$ has been obtained and contains $j$ labels, and if $\Gamma_{m+1} = \emptyset$ then END. Otherwise, i.e., if $\gamma_m(r,j) > 0$, then let $\Gamma_{m+1} = \Gamma_m \cup \{\gamma_m(r,j)\}$.
5) Let $m \leftarrow m + 1$ and go to step 2.
Example 4: Let $Q$ be the $K = 2$ discrete probability distribution with $q_1 = 1 - (1-p)^n = 1 - \bar{p}^n$ and $q_2 = (1-p)^n = \bar{p}^n$. We consider the generation of $Q$ when $\Pi_2 = \{p, 1-p\}$ is the biased coin probability distribution. Applying the MIN-ENT algorithm we obtain the following probabilities for the labels representing $q_1$: $\alpha(1,1) = p$, $\alpha(1,2) = \bar{p}p$, $\alpha(1,3) = \bar{p}^2 p$, ..., $\alpha(1,j) = \bar{p}^{j-1} p$, ..., $\alpha(1,n) = \bar{p}^{n-1} p$, and for $q_2$ we obtain a single label $v(2,1)$ whose probability is $\alpha(2,1) = \bar{p}^n$. It follows that H where $h(p) = -p \log p - (1-p) \log(1-p)$ is the binary entropy function [5]. However, since because $\lim_{n \to \infty} H(Q) = 0$. We remark that both the MAX-ENT algorithm and the MIN-ENT algorithm produce identical results in this example because at each step of either algorithm there is only one possibility for performing the probability expansion, i.e., $\gamma_m(1) - P_m > 0$ and $\gamma_m(2) - P_m < 0$, for $1 \le m \le n$. For $m = n + 1$ we have $\gamma_{n+1}(1) = 0$ and $\gamma_{n+1}(2) - P_{n+1} = 0$.
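The MIN-ENT selection rule and Example 4 can be checked with a short Python sketch. It is a hedged reading of the steps above rather than the implementation of [17]: the function name `min_ent`, the iteration cap `max_iter`, the heap of unused leaves, and the numerical values $p = 2/3$ and $n = 3$ are assumptions made for illustration. The closed form for the expected label length used in the last line, $(1 - \bar{p}^n)/p$, is derived here by summing $j\,\bar{p}^{j-1}p$ over $j = 1, \ldots, n$ and adding $n\,\bar{p}^n$; it is not quoted from the text.

```python
from fractions import Fraction
import heapq


def min_ent(q, p, max_iter=50):
    """Sketch of the MIN-ENT labeling rule for one coin with P(1) = p.

    q is a list of Fractions summing to 1 and p a Fraction, 1/2 <= p < 1.
    Returns (labels, gamma): labels[i] lists the (codeword, probability)
    pairs assigned to q[i]; gamma[i] is the part of q[i] still unassigned.
    """
    gamma = list(q)                    # component running sums
    labels = [[] for _ in q]
    unused = [(-Fraction(1), "")]      # max-heap of unused leaves (negated probability)
    for _ in range(max_iter):
        g_max = max(gamma)
        if g_max == 0:                 # every q[i] is fully decomposed
            break
        # Step 2: expand unused leaves until none exceeds gamma_max.
        while -unused[0][0] > g_max:
            neg, word = heapq.heappop(unused)
            heapq.heappush(unused, (neg * p, word + "1"))
            heapq.heappush(unused, (neg * (1 - p), word + "0"))
        # Step 3: largest unused leaf probability P_m.
        neg, word = heapq.heappop(unused)
        p_m = -neg
        # Step 4: symbol with the least nonnegative difference gamma - P_m.
        r = min((i for i, g in enumerate(gamma) if g >= p_m),
                key=lambda i: gamma[i] - p_m)
        labels[r].append((word, p_m))
        gamma[r] -= p_m
    return labels, gamma


# Example 4 with the assumed values p = 2/3 and n = 3.
p, n = Fraction(2, 3), 3
p_bar = 1 - p
labels, gamma = min_ent([1 - p_bar**n, p_bar**n], p)
expected_len = sum(len(w) * a for lab in labels for w, a in lab)
print(labels)
print(expected_len, (1 - p_bar**n) / p)
```

For these values the sketch reproduces the label probabilities $p$, $\bar{p}p$, $\bar{p}^2 p$ for $q_1$ and $\bar{p}^3$ for $q_2$ stated in Example 4.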

IV. USING TWO OR MORE BIASED COINS
In this section we introduce a generalization of the MIN-ENT algorithm [17] that generates a discrete probability distribution using two or more biased coins, obtaining results similar to those of Gargano and Vaccaro [15], with the distinction that the probability distribution of heads and tails need not depend on $n$. We next introduce some notation that will be used in the sequel.

A. Basic Terminology
A tree $T$ is used to indicate the choice of distinct coins by the algorithm in order to produce the desired probability distribution. A distinct labeled leaf in $T$ is associated one-to-one with each of the possible outcomes. Given an algorithm to generate an $n$-sided die and its associated tree $T$, the worst-case time $L_{\max}$ to produce an outcome is given by
$L_{\max} = \max_{x} l_T(x)$, (4)
and the average time is given by
$E[T] = \sum_{x} p(x)\, l_T(x)$, (5)
where the maximum and the sum are taken over the leaves $x$ of $T$, $l_T(x)$ denotes the depth of $x$ in $T$, i.e., the length of the path from the root of $T$ to the leaf $x$, and $p(x)$ denotes the probability of the leaf $x$ being reached.
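The two performance measures just defined are straightforward to evaluate once the leaves of a tree are listed with their depths and probabilities. The following Python sketch shows the computation; the function name `tree_costs` and the small three-leaf tree used as input are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def tree_costs(leaves):
    """Return (L_max, E[T]) for a tree given as (depth, probability) pairs."""
    leaves = list(leaves)
    if sum(prob for _, prob in leaves) != 1:
        raise ValueError("leaf probabilities must sum to 1")
    l_max = max(depth for depth, _ in leaves)          # worst-case number of flips
    e_t = sum(depth * prob for depth, prob in leaves)  # average number of flips
    return l_max, e_t


# Hypothetical fair-coin tree: one leaf at depth 1 and two leaves at depth 2.
toy = [(1, Fraction(1, 2)), (2, Fraction(1, 4)), (2, Fraction(1, 4))]
print(tree_costs(toy))
```

Exact fractions are used so that the check on the total probability is not affected by rounding; with floating-point inputs that check would need a tolerance.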
In what follows we assume that, for each tree considered, two branches emanate from each node. Each node is labeled with a coin distribution. If a node carries no indication, the coin used is unbiased; if the node is marked $(p, 1-p)$, a biased coin is used with probability $p$ of heads and probability $1-p$ of tails. Each branch is labeled with a probability. Following [15], for any positive integer $n$ let $p(n)$ be defined as in (6) and let r(n) = n − 2 log n /2 p(n) . It follows from (6) that $p(n) = 0$ if $n$ is odd. The biased coin, as suggested in [15], has probability distribution $(P_H, P_T)$ for heads and tails given by (7), where $m$ is the largest odd factor of $n$. In order to generate a fair die with $n$ faces, the number of coin flips required in the worst case [15, Theorem 1] and the number required on average are both given in [15].

B. Description of the Algorithm
The algorithm proposed here follows essentially the same steps as the one introduced in [17], with the important difference that instead of using only one coin we use two or more coins, and at each step we decide which coin must be flipped in order to minimize the entropy in that step. Let $m_1 = \{p_1, 1-p_1\}$, $m_2 = \{p_2, 1-p_2\}$, ..., $m_r = \{p_r, 1-p_r\}$ denote the probability distributions of coins 1, 2, ..., $r$, respectively, and let $Q = \{q_1, q_2, \ldots, q_K\}$ denote the probability distribution to be generated, i.e., the target probability distribution, where $K$ is a positive integer, $K \ge 2$. For a given source, our algorithm, henceforth referred to as the Λ-algorithm, finds the decomposition of each probability in $Q$ as a sum (finite or infinite) of terms of the form $p_1^{\lambda_1} p_2^{\lambda_2} \cdots p_r^{\lambda_r} (1-p_1)^{l_1-\lambda_1} (1-p_2)^{l_2-\lambda_2} \cdots (1-p_r)^{l_r-\lambda_r}$, where $\lambda_i$ denotes the number of heads and $l_i - \lambda_i$ denotes the number of tails, $1 \le i \le r$, for the coin with probability distribution $m_i$, and $\sum_{i=1}^{r} l_i = l$ for a sequence of length $l$.
Definition 5: For a given finite set of biased coins with probability distributions $m_1, m_2, \ldots, m_r$ we define a hybrid binary tree as a binary tree in which each node is associated with one of the probability distributions $m_i$, $1 \le i \le r$.
The Λ-algorithm, when applied to the target probability distribution, generates a hybrid tree $T$ where each leaf in $T$ is associated with a probability in the target probability distribution. The probability of a path of length $l$ in $T$, containing $\lambda_1 + \lambda_2 + \cdots + \lambda_r$ heads and $(l_1 - \lambda_1) + (l_2 - \lambda_2) + \cdots + (l_r - \lambda_r)$ tails, is $p_1^{\lambda_1} p_2^{\lambda_2} \cdots p_r^{\lambda_r} (1-p_1)^{l_1-\lambda_1} (1-p_2)^{l_2-\lambda_2} \cdots (1-p_r)^{l_r-\lambda_r}$. In particular, for computing the probability of a leaf (terminal node), the path extending from the root node to that terminal node is considered. For $m = 1$ grow trees $T_1, T_2, \ldots, T_r$ from the root, generated from the flip of the coins associated with the probability distributions $m_1, m_2, \ldots, m_r$, respectively. Expand by one branch each terminal node in those trees whose probability exceeds $\gamma_{\max}$. Keep only those trees for which at least one resulting extended terminal node probability is less than or equal to $\gamma_{\max}$. The resulting $s$ trees are called processed binary rooted trees with probabilities, $T_p^{\ell}$, $1 \le \ell \le s$, $s \le r$.
At the $m$th iteration, $m > 1$, the minimum nonnegative value is computed for the difference between the running sums in the running sum set $\Gamma_m$ and $P_m$, i.e., $\min_i \{\gamma_m(i,j) - P_m \mid \gamma_m(i,j) - P_m \ge 0\} = \gamma_m(t,j) - P_m \ge 0$, where $P_m$ denotes the largest probability of a not yet labeled terminal node among all $T_p^{\ell}$, $1 \le \ell \le s$. Such a terminal node is assigned to $q_t$. Notice that at each iteration for $m > 1$ there is a $\Gamma_{m,p}^{\ell}$ associated with each of the $T_p^{\ell}$ trees. The norm of $\Gamma_{m,p}^{\ell}$ is given by (10), and its minimum value is used to establish a criterion for choosing the trees that will be kept for the following iteration.
Since the size of the surviving trees grows exponentially with $m$, a rule is desired to eliminate those surviving trees whose running sums at the $m$th step do not satisfy some convergence criterion. Therefore, a value for $L_{\max}$ is chosen and denoted by $M$.
The Λ-algorithm consists of the following steps.
1) Let $m = 1$ and let $\Gamma_1 = \{q_1, q_2, \ldots, q_K\}$. Grow each tree $T_1, T_2, \ldots, T_r$ from the root node by a depth of one according to the coin probability distributions $m_1, m_2, \ldots, m_r$, respectively.
2) Determine $\gamma_{\max}$ and produce the trees $T_p^{\ell}$, $1 \le \ell \le s$, for the $m$th iteration by expanding, by a depth of one, each terminal node in each $T_p^{\ell}$ tree from the $(m-1)$th iteration, $m > 1$, whose probability exceeds $\gamma_{\max}$, and by keeping those expanded trees for which at least one extended terminal node probability (leaf probability) is less than or equal to $\gamma_{\max}$.
3) Calculate the norm for each tree $T_p^{\ell}$ using (10). Keep the tree(s) with smallest norm of $\Gamma_{m,p}^{\ell}$.
4) For each $\ell$, $1 \le \ell \le s$, find the not yet labeled path $E_l^{\ell}$ of length $l$ in $T_p^{\ell}$ whose probability $P_m^{\ell}$ is largest among unused paths. Denote by $P_m$ the largest probability among all $P_m^{\ell}$ that do not exceed $\gamma_{\max}$, and denote by $E_l$ the respective path.
5) If $\min_i \{\gamma_m(i,j) - P_m \mid \gamma_m(i,j) - P_m \ge 0\} = \gamma_m(t,j) - P_m \ge 0$, $1 \le i \le K$, then associate to $q_t$ the terminal node $v(t,j)$. This implies $\alpha(t,j) = P_m$. Compute the running sum $\gamma_m(t,j)$ after this decomposition and let $\Gamma_m = \Gamma_m - \{\gamma_m(t,j)\}$. If $\gamma_m(t,j) = 0$ then let $\Gamma_{m+1} = \Gamma_m$; the decomposition of $q_t$ is now complete and contains $j$ terms, and if $\Gamma_{m+1} = \emptyset$ then END. Otherwise, i.e., if $\gamma_m(t,j) > 0$, then let $\Gamma_{m+1} = \Gamma_m \cup \{\gamma_m(t,j)\}$.
6) Let $m \leftarrow m + 1$.
7) If $m = M$ stop, otherwise go to step 2.

C. Performance Comparison
In this section we provide examples showing smaller values for the average time $E[T]$ than those in [15]. Using the Λ-algorithm, there are cases where the maximal length $L_{\max}$ is not bounded, but even in these cases a shorter $E[T]$ results. The Λ-algorithm is a generalization to more than one biased coin of the algorithm introduced in [17], obtaining equivalent and in some cases even better results than those in [15]. In order to compare the performance of the Λ-algorithm with that of Gargano and Vaccaro [15], we use the same coins that would be used in [15] for the generation of a uniform probability distribution using two coins, one fair and the other biased with probability distribution $(P_H, P_T)$ for heads and tails given by (7). We call attention once more to the fact that the biased coins employed in the Λ-algorithm are arbitrary, i.e., they do not depend on $n$, as is the case for the algorithm in [15].
In the next step only the surviving trees are considered and the branches with probability greater than 1/6 must be expanded, remembering always to consider all the possible expansions using coins m 1 and m 2 .Two of the trees that are obtained by applying the Λ-algorithm to depth three from the root node are shown in Figures 4 and 5.
The trees shown in Figures 4 and 5 provide distinct solutions to the problem of generating a uniform probability distribution for $n = 6$, and both are bounded trees. We notice that the tree in Figure 4 is the same as that which results from using the algorithm in [15]. Applying (4) and (5) to the tree in Figure 4, the values $L_{\max} = 3$ and $E[T] = 2.67$ result for the worst-case time and for the average time, respectively. Identical results for $L_{\max}$ and $E[T]$ are obtained for the tree in Figure 5.
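The reported values can be reproduced on a small hybrid tree. The Python sketch below builds one hypothetical tree for $n = 6$ from a fair coin and a biased coin assumed here to have head probability 1/3; this tree is constructed only to be consistent with $L_{\max} = 3$ and $E[T] = 2.67$ and is not claimed to be the tree of Figure 4 or Figure 5. The names `node`, `leaves`, `roll` and `TREE` are likewise assumptions of this illustration.

```python
from fractions import Fraction
import random

FAIR = Fraction(1, 2)
BIASED = Fraction(1, 3)    # assumed head probability of the biased coin


def node(p_heads, heads, tails):
    """Internal node: the coin flipped here and the two subtrees."""
    return (p_heads, heads, tails)


# Hypothetical hybrid tree; integer leaves are the faces of the die.
TREE = node(BIASED,
            node(FAIR, 1, 2),
            node(FAIR, node(FAIR, 3, 4), node(FAIR, 5, 6)))


def leaves(t, depth=0, prob=Fraction(1)):
    """Yield (face, depth, probability) for every leaf of the tree."""
    if not isinstance(t, tuple):
        yield t, depth, prob
        return
    p, heads, tails = t
    yield from leaves(heads, depth + 1, prob * p)
    yield from leaves(tails, depth + 1, prob * (1 - p))


def roll(t, rng):
    """Simulate one die roll by flipping the coin attached to each node."""
    while isinstance(t, tuple):
        p, heads, tails = t
        t = heads if rng.random() < p else tails
    return t


info = list(leaves(TREE))
l_max = max(d for _, d, _ in info)
e_t = sum(d * pr for _, d, pr in info)
print(l_max, e_t, float(e_t))
print([roll(TREE, random.Random(s)) for s in range(5)])
```

Every leaf of this tree has probability 1/6, two leaves lie at depth 2 and four at depth 3, so the average time evaluates to 8/3, which rounds to the value 2.67 quoted above.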
Example 6: Consider generating a uniform probability distribution for a random variable with $n = 7$ possible outcomes, using the coins with probability distributions $m_1 = (1/2, 1/2)$ and $m_2 = (3/7, 4/7)$, respectively. One of the trees obtained using the Λ-algorithm for $n = 7$ is illustrated in Figure 6, and the tree obtained by the use of the algorithm introduced in [15] is shown in Figure 7.
It should be noticed that the parameter $L_{\max}$ for the tree in Figure 7 is bounded. On the other hand, using the Λ-algorithm, $L_{\max}$ in this example is unbounded (Figure 6), but the resulting average time is $E[T] = 3.1902$, which is a better result than $E[T] = 3.29$, obtained when the algorithm in [15] is used.

V. CONCLUSIONS
A new algorithm for the generation of a string of random variables drawn from a discrete probability distribution using the flips of two or more coins, some of them biased, was introduced. In particular, this approach contributes an alternative solution to the classical problem of generating a discrete uniform probability distribution using two or more coins, some of which are biased. It was shown by some simple examples that with the Λ-algorithm it is possible to obtain equivalent and in some cases even better results than those in [15]. In principle, the choice of coins that must be employed to generate a discrete probability distribution using the Λ-algorithm does not depend on $n$. However, some examples have shown that the performance of the Λ-algorithm varies with the biased coins that are chosen, i.e., the expected length of the labels can vary. For this reason, it is important to investigate a criterion for specifying coin probability distributions, aiming at the optimization of the Λ-algorithm. Another point for future research is the investigation of ways to limit $L_{\max}$ in those cases where the tree produced by the Λ-algorithm is unbounded.
Fig. 7. Tree obtained for n = 7 using the algorithm in [15].

ACKNOWLEDGMENT
The authors are grateful to the Editor and to the reviewers for providing constructive comments that have improved the quality of this paper. Danielle P. B. de Arruda Camara acknowledges partial support from the Pernambuco State Foundation to Support Science and Technology - FACEPE, Project APQ-0055-3.04/09. Valdemar C. da Rocha Jr. and Cecilio Pimentel acknowledge partial support from the Brazilian

Fig. 1. Trees $T_1$ and $T_2$ generated by the first flip of coins with probability distributions $m_1$ and $m_2$, respectively.

Fig. 3. All possibilities of growth for tree A2 in two steps of the Λ-algorithm.