Adaptive Universal Codes for Integer Representation

Abstract: For a given arbitrary list of integers, no single universal code is known to be optimal overall, in the sense of representing the whole list with the shortest average codeword length. This motivates the introduction in this paper of a class of adaptive universal codes for integer representation, based on pattern codes. A construction of adaptive universal codes based on Fibonacci codes is given. This construction is shown to perform well over a wider range of integer values than known universal code constructions.

For a given arbitrary list of integers, no single universal code is known to be optimal overall, in the sense of representing the whole list with the shortest average codeword length [1], [8], [9], [17], where by the length of a codeword we mean the number of digits that it contains. This motivated us to try an alternative approach that partially circumvents the difficulty, namely adaptive universal (AU) codes [18] based on pattern codes, originally treated as prefix codes by Gilbert [3]. By an adaptive code we mean a code represented by a set of two or more codes together with a code selection rule. The code selection rule specifies which code in the set is to be used, as a function of the range of integers to be represented. Pattern codes have been studied in detail by Lakshmanan [23], and they are related to a conjecture of Gilbert [3], studied later by Guibas and Odlyzko [19], concerning the maximum number of codewords allowed in a class of block codes known as prefix-synchronized codes.
In the following, we consider a generalization of the universal codes introduced by Capocelli [14], of the kind related to the Zeckendorf [20] representation of integers in terms of generalized Fibonacci numbers, which have the capability of locally confining errors. According to Capocelli [14], for a given positive integer r, universal codes can be constructed by considering the set of all binary strings of length greater than or equal to r in which the sequence formed by a 0 followed by r − 1 1's, denoted as $01^{r-1}$, occurs only once, as a suffix.
These binary strings form a countably infinite set of prefix-free codewords, which we denote by $S = S(r, 01^{r-1})$. It was also shown that the S codes are prefix-free and complete [21], and universal in the sense of Elias [22], [23]. Furthermore, S codes can be recognized by a finite-state automaton (i.e., they are regular codes) and thus cannot be asymptotically optimal [24]. Moreover, S codes are synchronizable with synchronization delay equal to one, i.e., if errors occur and synchronization is lost, the decoder recovers synchronization after a single codeword, which makes them more robust than most other variable-length codes [1], [12].
In the following, we introduce a construction for a class of codes which enjoy properties similar to those of the S codes. However, the codes in our construction perform well over a wider range of integers. These codes are denoted by A and result from a combination of at least two universal codes. In Section II, we present a basic introduction to classical universal codes and show that the generalized Fibonacci representation can be exploited to construct binary uniquely decodable codes. In Section III, we define Adaptive Universal codes. In Section IV, we construct a class of codes, called Fibonacci Adaptive Universal (FAU) codes, and compare their performance with that of other universal schemes. In Section V, we close the paper with some comments and conclusions.

II. CLASSICAL UNIVERSAL CODES
After examining various universal codes [5]-[7], [11]-[14], [22]-[25], we found that the shortest average codeword lengths were obtained with the Fibonacci codes, known as pattern codes [3]. This fact is illustrated in Table I, where $L_\Omega$ denotes the set of codeword lengths for code Ω and N denotes a positive integer. The three rightmost columns in Table I contain results from some of the proposed codes, to be described later. A prefix-free code is any code with the property that no codeword is a prefix of any other codeword. Let $p = p_1 p_2 \ldots p_m$ denote an arbitrary binary string, or pattern, of length m. We denote by l(p) the length of the pattern p, i.e., $l(p) = m$.
Definition 2.1: For a given binary pattern p of length m, a pattern code (p-code) is a set T of binary strings of variable length l + m, l ≥ 0, such that for any $x_1 x_2 \ldots x_l p_1 p_2 \ldots p_m \in T$, the pattern $p = p_1 p_2 \ldots p_m$ occurs only once, as a suffix. For example, the pattern p = 0111 of length four is a suffix of the codeword 11000010000111. It is clear that every p-code is a prefix-free code, and is therefore uniquely decodable.
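As a rough illustration of Definition 2.1, and not as part of the original construction, the following Python sketch enumerates by brute force the codewords of a given length in a p-code, and splits a concatenation of codewords back into its parts by watching for the pattern. The function names `pattern_codewords` and `split_stream` are hypothetical, chosen only for this example.

```python
from itertools import product

def pattern_codewords(pattern, length):
    """All binary strings of the given length in which `pattern`
    occurs exactly once, as a suffix (Definition 2.1)."""
    free = length - len(pattern)
    if free < 0:
        return []
    words = []
    for bits in product("01", repeat=free):
        word = "".join(bits) + pattern
        # str.find returns the first occurrence; it must be the suffix itself.
        if word.find(pattern) == free:
            words.append(word)
    return words

def split_stream(stream, pattern):
    """Cut a concatenation of codewords each time the pattern first appears."""
    codewords, buffer = [], ""
    for bit in stream:
        buffer += bit
        if buffer.endswith(pattern):
            codewords.append(buffer)
            buffer = ""
    return codewords

print(pattern_codewords("011", 5))
print(split_stream("0011" + "011" + "11011", "011"))
```

Because the pattern appears in a codeword only at its end, the decoder needs no lookahead: the first appearance of the pattern marks the codeword boundary, which is the prefix-free property in operational form.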
Definition 2.2: For given integers $N_1$ and $N_2$, $N_1 \le N_2$, let A and B denote codes for representing an integer i, $N_1 \le i \le N_2$. Code A is defined as uniformly better than code B in the interval $[N_1, N_2]$ if, for every integer i in this interval, the codeword of A representing i has either the same length or a shorter length than the codeword of B representing i.
It is well known [1], [8], [14] that the pattern codes known as Fibonacci codes are so far the best for representing a list of integers. However, when considering a sufficiently large range of integers, the integer representation provided by a single Fibonacci code is not uniformly better than that given by another Fibonacci code (see Table I). This is also the case for other families of universal codes.
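The comparison in Definition 2.2 can be sketched as a pointwise test on codeword lengths. The Python fragment below is only an illustration under one stated assumption: integer i is mapped to the i-th shortest codeword of each pattern code, which may differ from the assignment used in the tables. The names `codeword_lengths` and `uniformly_better` are hypothetical.

```python
from itertools import product

def codeword_lengths(pattern, count):
    """Lengths of the `count` shortest codewords of the p-code,
    in nondecreasing order (assumed integer-to-codeword ordering)."""
    lengths, t = [], len(pattern)
    while len(lengths) < count:
        free = t - len(pattern)
        n_words = sum(
            1
            for bits in product("01", repeat=free)
            if ("".join(bits) + pattern).find(pattern) == free
        )
        lengths.extend([t] * n_words)
        t += 1
    return lengths[:count]

def uniformly_better(len_a, len_b, n1, n2):
    """True if code A is never longer than code B for every i in [n1, n2].
    len_a[i - 1] and len_b[i - 1] hold the codeword lengths for integer i."""
    return all(len_a[i - 1] <= len_b[i - 1] for i in range(n1, n2 + 1))

c3 = codeword_lengths("011", 400)   # pattern of C(3)
c4 = codeword_lengths("0111", 400)  # pattern of C(4)
print(uniformly_better(c3, c4, 1, 100))
print(uniformly_better(c3, c4, 1, 400))
```

Under this assumed ordering, the shorter pattern wins on the short interval but not on the longer one, which is in line with the remark that no single Fibonacci code is uniformly better over a sufficiently large range.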
Among other properties, we notice that Fibonacci codes have an easy-to-understand structure, a simple indication of codeword length, and, as indicated in Table I, a compact representation for integers over large intervals. By C(m) we denote a Fibonacci code employing a pattern of length m. For example, in Table II, C(3), C(4) and C(5) denote Fibonacci codes with patterns p = 011, p = 0111 and p = 01111, of length 3, 4 and 5, respectively.

A. Fibonacci codes
Fibonacci codes are binary codes constructed from the Fibonacci enumeration system [13], [20], [24]. The Fibonacci enumeration system of order m, m ≥ 2 [11], denoted as $F^{(m)}$, can be used to represent any non-negative integer, employing the following: a) a binary alphabet, e.g., {0, 1}, and b) the fact that there is a unique way of writing a given integer which does not contain m consecutive digits equal to 1 [13], [26]. The representation $F^{(2)}$ is known as Zeckendorf's representation [20]. The generalized sequence of Fibonacci numbers of order r, r ≥ 2, is defined by the linear recursion
$$F^{(r)}_n = F^{(r)}_{n-1} + F^{(r)}_{n-2} + \cdots + F^{(r)}_{n-r},$$
where the initial values are chosen so that, for r = 2, the sequence begins 1, 2, 3, 5, 8, ... (see Example 2.1).
Example 2.1: The enumeration system $F^{(2)}$ uses the Fibonacci sequence of order two, consisting of the numbers 1, 2, 3, 5, 8, 13, 21, 34, ..., as a basis. The $F^{(2)}$ representation for the numbers 11 and 17 is 00101 and 101001, respectively, written with the least significant digit first, as follows from 11 = 3 + 8 and 17 = 1 + 3 + 13.
Examples of Fibonacci codes are presented in Table II, specifically, codes C(3), C(4) and C(5), which belong to the family of Fibonacci codes presented by Capocelli et al. [14], [24], [25], following the common practice in the literature of showing only some initial part of the integers. In general, for a given integer m, m > 2, C(m) denotes a pattern code, as introduced by Capocelli, consisting of the set of all binary strings of length greater than or equal to m in which the pattern $01^{m-1}$ occurs only once, as a suffix, and whose codewords are described in terms of generalized Fibonacci numbers.
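The representation of Example 2.1 can be reproduced with a greedy procedure. The Python sketch below is an illustration under stated assumptions: the basis of order m is taken to start with 1, 2, ..., $2^{m-1}$ and then follow the linear recursion, the digits are written least significant first as in Example 2.1, and the names `fibonacci_basis`, `fibonacci_repr` and `fibonacci_value` are hypothetical.

```python
def fibonacci_basis(order, limit):
    """Generalized Fibonacci numbers of the given order not exceeding `limit`.
    Assumed initial values: powers of two until `order` terms exist,
    so that order 2 gives 1, 2, 3, 5, 8, ..."""
    basis = []
    while True:
        nxt = 2 ** len(basis) if len(basis) < order else sum(basis[-order:])
        if nxt > limit:
            return basis
        basis.append(nxt)

def fibonacci_repr(n, order=2):
    """Greedy representation of n >= 1, least significant digit first."""
    digits = []
    for b in reversed(fibonacci_basis(order, n)):
        if b <= n:
            digits.append("1")
            n -= b
        else:
            digits.append("0")
    return "".join(reversed(digits))

def fibonacci_value(digits, order=2):
    """Inverse of fibonacci_repr: sum the basis elements marked by a 1."""
    basis = []
    for _ in digits:
        basis.append(2 ** len(basis) if len(basis) < order else sum(basis[-order:]))
    return sum(b for b, d in zip(basis, digits) if d == "1")

print(fibonacci_repr(11), fibonacci_repr(17))
```

Taking the largest basis element first is what keeps m consecutive 1's from appearing: any such run would sum to the next basis element, which the greedy step would have chosen instead.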

III. ADAPTIVE UNIVERSAL CODES
Our proposal for constructing Adaptive Universal (AU) codes consists of combining two or more pattern codes so that the resulting adaptive code benefits from good properties of the component codes and pays a small penalty in efficiency.We call the reader's attention to the fact that, as explained in the following, each codeword representing a positive integer in an adaptive universal code begins with an identification pattern of 0's and 1's of fixed length, which uniquely specifies the suffix of the Fibonacci code being used to construct that codeword.
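Given an arbitrary list of positive integers, for coding purposes this list can be subdivided into nonoverlapping intervals. The purpose of an adaptive code is to represent all intervals with the same accuracy as the best pattern code for each interval. Let $C_1, C_2, \ldots, C_s$ denote s pattern codes with disjoint codebooks. A specific codebook construction for the adaptive code $A(C_1, C_2, \ldots, C_s)$ combining them is described in subsection III-A.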

A. Codebook construction rule for $A(C_1, C_2, \ldots, C_s)$
Without loss of generality, let $\{C_1, C_2, \ldots, C_s\}$ denote a set of s disjoint pattern codes, for which $l(p_1)$ is the shortest pattern length. The idea behind the rule for constructing the codebook for $A(C_1, C_2, \ldots, C_s)$, s ≥ 2, is to select codewords as short as possible. This rule is described as follows:
• The first codeword is represented by the binary sequence $p_1 = p_1 p_2 \ldots p_{m_1}$ (the shortest pattern), $p_1 \in C_1$.
• Except for the first codeword, other codewords are produced as follows:
1) Let i = 1.
2) Use the binary sequence $q_1 q_2 \ldots q_a$, of length $a = \lceil \log_2 s \rceil$, that specifies in $C_i$ those codewords beginning with $q_1 q_2 \ldots q_a$.
3) Append to $q_1 q_2 \ldots q_a$ an unused shortest binary sequence $x_1 x_2 \ldots x_l$, l ≥ 0, where l = 0 means a void sequence.
4) Append to $q_1 q_2 \ldots q_a x_1 x_2 \ldots x_l$ the binary sequence representing the pattern $p_i$, thus forming a codeword of length $T = a + l + m_i$ from $C_i$.
5) Go to step 2 and continue in this manner, selecting codewords from code $C_i$, as long as there are codewords of length T in $C_i$. Otherwise, go to step 6.
6) Make i + 1 → i. If i ≤ s, go to step 2. Otherwise, go to step 7.
7) Make l + 1 → l.
8) Go to step 1.
Summarizing, the codebook construction rule for $A(C_1, \ldots, C_s)$ produces codewords X that are binary sequences, such that one codeword has the form $p_1 p_2 \ldots p_{m_i}$ and the remaining codewords have the general form $q_1 q_2 \ldots q_a x_1 x_2 \ldots x_l p_1 p_2 \ldots p_{m_i}$.
Example 3.1: Let s = 2 and let $A(C_1, C_2) = A(C(3), C(4))$, where C(3) and C(4) denote Fibonacci codes, as defined by Capocelli [14], with patterns $p_1 = 011$ and $p_2 = 0111$, respectively. Since s = 2, it follows that $a = \log_2 2 = 1$, and the first digit in each codeword of A(C(3), C(4)) identifies whether it belongs to C(3) or C(4). Specifically, codewords in A(C(3), C(4)) that start with $q_1 = 0$ come from C(3) and codewords that start with $q_1 = 1$ come from C(4). The first 16 codewords of codes C(3), C(4) and A(C(3), C(4)) are shown in Table III, where the codewords in A(C(3), C(4)) that come from code C(4) are written in boldface type. By the codebook construction rule, the first codeword in A(C(3), C(4)) is the pattern 011, and the second, third and fourth codewords are 0011, 00011 and 01011, all from C(3), since C(4) contains no codeword beginning with 1 of length 4 or less. The next codeword beginning with 0 in C(3) is 000011, of length 6, whereas C(4) offers 10111, which begins with 1 and has length 5, and is thus selected for A(C(3), C(4)), and so on.
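The selection described in Example 3.1 can be imitated by brute force. The Python sketch below is not the codebook rule itself: it simply keeps, from each component pattern code, the codewords that begin with that code's identification digit, and orders the result by length and then lexicographically, which is an assumed tie-breaking rule that happens to reproduce the first five codewords quoted in the example. The names `pattern_codewords` and `adaptive_codebook` are hypothetical.

```python
from itertools import product

def pattern_codewords(pattern, length):
    """Binary strings of the given length with `pattern` only as a suffix."""
    free = length - len(pattern)
    if free < 0:
        return []
    words = ("".join(bits) + pattern for bits in product("01", repeat=free))
    return [w for w in words if w.find(pattern) == free]

def adaptive_codebook(patterns, prefixes, max_length):
    """Keep, for each component code, the codewords starting with its
    identification prefix; sort by length, then lexicographically
    (assumed ordering, not taken from the codebook rule)."""
    selected = []
    for pattern, prefix in zip(patterns, prefixes):
        for length in range(len(pattern), max_length + 1):
            selected += [w for w in pattern_codewords(pattern, length)
                         if w.startswith(prefix)]
    return sorted(selected, key=lambda w: (len(w), w))

# A(C(3), C(4)): codewords starting with 0 come from C(3), with 1 from C(4).
book = adaptive_codebook(["011", "0111"], ["0", "1"], max_length=7)
print(book[:5])
```

Since the two selected subsets differ in their first digit and each component code is prefix-free, the merged codebook is prefix-free as well.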
Example 3.1 serves the purpose of illustrating our construction, which is by no means restricted to the Fibonacci codes introduced by Capocelli [14]. In fact, it is important to point out that our construction of universal adaptive codes allows the freedom to choose any binary number to represent a pattern.
In the next section, we present a specific algorithm to construct an adaptive universal code by selecting codewords from two Fibonacci codes, without having to construct the codebook for the Fibonacci component codes.

IV. CONSTRUCTING FIBONACCI ADAPTIVE UNIVERSAL CODES
Although Definition 3.1 indicates how to combine two or more pattern codes to produce an adaptive code, in general this is not practical because it would imply first constructing the pattern codes, each possibly containing a long list of codewords, and then combining them. For that reason we now present a procedure that produces an adaptive code without the intermediate step of constructing the component pattern codes. As illustrated in Table I, the integer representation provided by a single Fibonacci code is not uniformly better than that given by another Fibonacci code over a sufficiently large range of integers. This is also the case for other families of universal codes, when used for integer representation. Following the same reasoning adopted in Section III, we propose the construction of Fibonacci Adaptive Universal (FAU) codes by combining two Fibonacci codes so that the resulting adaptive code benefits from good properties of the component codes. The possibility of combining more than two codes still needs further investigation before a practical algorithm is devised.

A. Construction
For given positive integers u and v, u < v, let C(u) and C(v) denote Fibonacci codes [14]. We now show how to combine C(u) and C(v) to produce the FAU code A(u, v). We denote as $B_{A(u,v)}(N)$ the codeword of A(u, v) associated with a positive integer N. The codeword takes one of two forms, selected according to the range in which N falls.
• For a codeword drawn from C(u), $B_{A(u,v)}(N)$ consists of the symbol 0 followed by the binary sequence represented by $F^{(u-1)}(Q)$, having length k − (u − 2). If necessary, fill $F^{(u-1)}(Q)$ with zeroes on the left in order to occupy all k − (u − 2) places. Append the suffix for codewords from C(u).
• For a codeword drawn from C(v), $B_{A(u,v)}(N)$ consists of the symbol 1 followed by a binary sequence in the representation $F^{(v-1)}$, filled if necessary with zeroes on the left in order to occupy all k − (v − 2) places. Append the suffix for codewords from C(v).
Example 4.1: We describe next how to combine the Fibonacci codes C(3) and C(4) to produce the FAU code A(3, 4). For a given positive integer N, the associated codeword of the adaptive code A(3, 4) is constructed as follows.
• For a codeword drawn from C(3), $B_{A(3,4)}(N)$ consists of the symbol 0 followed by the binary sequence represented by $F^{(2)}(Q)$, having length k − 1. If necessary, fill $F^{(2)}(Q)$ with zeroes on the left in order to occupy all k − 1 places. Append the suffix 011, of code C(3).
• For a codeword drawn from C(4), $B_{A(3,4)}(N)$ consists of the symbol 1 followed by a binary sequence in the representation $F^{(3)}$, filled if necessary with zeroes on the left in order to occupy all k − 2 places. Append the suffix 0111, of code C(4).
Example 4.3: The numbers required for the construction of codes A(3, 5) and A(4, 5) are presented in Tables VI and VII, respectively, for values k ≤ 8. Codes A(3, 5) and A(4, 5) are constructed from the combination of Fibonacci codes C(3) and C(5), and codes C(4) and C(5), respectively, using the general construction described in Section IV-A. Figure 2 illustrates the fact that code A(3, 5) performs better than code C(5) for codeword lengths t, t ≤ 12, and performs better than code C(3) for t ≥ 15.
[Figure axes: codeword length (t); number of distinct codewords ($N_X(t)$).]

V. COMMENTS AND CONCLUSIONS
In this paper we introduced Adaptive Universal codes to represent integers by combining two or more pattern-like codes, in particular, Fibonacci codes. This method represents a new approach to the classic problem of the compact representation of an arbitrary list of integers. As an application of our construction technique, we chose Fibonacci codes as the component codes since they showed the best individual performance among the codes examined (see Table I). As is the case so far with all known coding constructions for the integers, our construction is also sub-optimal in the sense that adaptive codes are asymptotically longer than the logarithmic representation [11], [24]. However, adaptive codes are an interesting alternative in the many practical applications that use universal codes for the compact representation of arbitrary symbols represented by integers.

Example 4.3: The numbers required for the construction of codes A(3, 5) and A(4, 5) are presented in Tables VI and VII, respectively, for values k ≤ 8. Codes A(3, 5) and A(4, 5) are constructed from the combination of Fibonacci codes C(3) and C(5), and codes C(4) and C(

Fig. 1. Comparison of the number of length-t codewords for the binary representation with Fibonacci codes C(3), C(4) and Adaptive code A(3, 4).

Fig. 2. Comparison of the number of length-t codewords for the binary representation with Fibonacci codes C(3), C(5) and Adaptive code A(3, 5).

TABLE II. FIRST 16 CODEWORDS OF CODES C(3), C(4) AND C(5).

with the same accuracy of the best pattern code for each interval. Let $C_1, C_2, \ldots, C_s$ denote s pattern codes with disjoint codebooks, i.e., such that C

TABLE IV. NUMBERS USED FOR CONSTRUCTING THE ADAPTIVE CODE A(3, 4) FOR k ≤ 8.

Table IV shows the numbers involved in the construction of A(3, 4) for k ≤ 8. Figure 1 illustrates a typical behavior of our construction of FAU codes, in comparison with the respective component codes. Notice that code A(3, 4) performs better than code C(4) for codeword lengths t, t < 11, and performs better than code C(3) for t ≥ 11, where $N_\Omega(t)$ denotes the number of distinct codewords with length t in scheme Ω.

TABLE VI. NUMBERS USED FOR CONSTRUCTING THE ADAPTIVE CODE A(3, 5) FOR k ≤ 8.

TABLE VII. NUMBERS USED FOR CONSTRUCTING THE ADAPTIVE CODE A(4, 5) FOR k ≤ 8.