Design and Evaluation of Chase Decoder Architecture for Medium Capacity TDMA Satellite Systems

This paper describes a digital machine architecture suitable for soft-decision decoding of binary linear block error-correcting codes, based on Chase's algorithm II. Specifically, it presents the hardware implementation for the extended Golay (24, 12, 8) block code on a printed circuit board used in a satellite multi-frequency TDMA data transmission system, called SAMSAT, which was developed at CPqD-TELEBRÁS. This architecture may be implemented using off-the-shelf digital ICs and still reach transmission rates of up to 1 Mbit/s, in a variety of digital satellite transmission applications. Cost and chip count of the final circuitry are highly competitive within the class of performance to which this forward-error-correction (FEC) technique pertains.


Introduction
It is well known that, for any (n, k, d) binary linear block error-correcting code, conventional algebraic decoding techniques have an error-correction capability of INT[(d-1)/2] digits, with n being the number of digits of the output block, k the number of bits of the input block or message, d the minimum Hamming distance of the code, and INT[x] denoting the greatest integer less than or equal to x. In 1972, Chase [1] proposed a class of decoding algorithms using channel measurement information (soft decision) in conjunction with any conventional algebraic (i.e., hard-decision) decoder, which can correct many patterns of up to (d-1) errors, i.e., practically a two-fold increase over hard-decision decoders. Chase also showed that his algorithms are equivalent (or an approximation) to correlation block decoders, thus being maximum-likelihood and hence optimum for binary linear block codes in an additive white Gaussian noise (AWGN) channel. He also simulated his algorithm II for the extended Golay code, denoted here as EGC (24, 12, 8), in an AWGN channel and found an asymptotic coding gain of 6 dB over uncoded transmission and a 4 dB improvement at an error probability Pe = 10^-5. This is significant when compared to the same numbers for a binary hard-decision decoder (3 dB and 2 dB, respectively) and even when compared to typical Viterbi soft-decision decoders for short constraint length convolutional codes of the same rate (1/2) (e.g., Shenoy and Johnson [2]).
Clark and Cain [3] and Viterbi and Omura [4] obtained, by simulation and calculation respectively, that the degradation introduced by any quantization with 8 levels (3 bits) or more, with respect to the non-quantized case, is less than or equal to 0.22 dB, regardless of the quantization scheme being used (uniform, optimum metric spacing, etc.). Therefore, an 8-level quantization scheme can be used for soft-decision decoders with negligible degradation of the final performance. This 8-level quantization will be assumed in the following sections. Finally, Pessoa [5] simulated the sensitivity of Chase's algorithm II, associated with the EGC (24, 12, 8) and an 8-level uniform quantizer, with respect to automatic gain control effects. It has been found that, for ±6 dB variations at the quantizer input with respect to the optimum level, the corresponding degradation in the decoder performance is only ±0.3 dB.
Other important aspects related to block decoders are: they do not have any source of degradation other than the number of quantization levels, have no error propagation over subsequent blocks, have a fixed amount of delay from input to output, and are very suitable for burst communications. On the contrary, soft-decision decoders for tree codes have degradations associated with the path memory length (both for Viterbi and sequential decoding), variable decoding delays (only for sequential decoders) and variable error propagation over the decoded bits.
Although Chase decoders have been well known for several years, system and circuit designers have given them a secondary role in the error-control-coding field, driving all their efforts to Viterbi decoders. One exception to this trend can be found in Hackett [6], where a variation of Chase's algorithm, for low-speed applications, is implemented in software using a general-purpose microprocessor.
The architecture proposed in the present paper may be implemented using standard digital ICs available in the market (HCMOS series, TTL compatible; LS-TTL; S-TTL PROMs; CMOS EPROMs and NMOS RAMs). The resulting hardware can still reach transmission rates as high as 1 Mbit/s, with the maximum operation speed being limited only by the RAM access time. Cost, complexity and chip count of the final circuitry are highly competitive within the class of performance to which this FEC technique pertains. In order to achieve such results, we had to deal with some recent digital design techniques and concepts such as pipeline or parallel processing, multi-processing hardware synchronization, self-timed circuits, etc. We start by presenting, in Section 2, the division of Chase's algorithm II into several parallel processes suitable for relatively simple and economical digital circuit synthesis. Then we thoroughly discuss the implementation of each section, detailing the necessary clocks, the synchronization problems and the interrelationship among the several processing units associated with those parallel processes.
In Section 3 we show, as the main application of the derived method and its associated hardware, the use of this architecture to decode the EGC (24, 12, 8). The resulting circuitry is part of the TDMA/QPSK modem, operating at a 1068 kbit/s transmission rate, for a satellite multi-frequency TDMA data transmission system, the SAMSAT, which will be briefly presented. Other possible satellite communications applications are also discussed.
Then we present the results obtained with the circuit, which has been built on a double-sided printed circuit board (PCB) with 124 standard ICs. We show how it complies with the system engineering specifications: for an input bit-error rate (BER) of 10^-2, the measured output BER was 2 × 10^-7, and for an input BER of 4 × 10^-3 it was 1.4 × 10^-9, leading to a net gain of approximately 4.7 dB. The final goals of circuit design engineering are also achieved: a single double-sided PCB of 115 square inches, powered by a single +5 V supply, with a current consumption of 1.3 A and a decoding delay of 26 symbols, operating at rates of 1 Mbit/s. This was implemented with low power needs by using a relatively simple, modern and cost-effective architecture that still gives high performance at low signal-to-noise ratios (SNRs), i.e., low values of Eb/No (the ratio between the energy per information bit and the one-sided noise power spectral density).
2. The Digital Architecture for Chase's Algorithm II Decoders

2.1 Chase's Algorithm II Review

Let y(t) be an antipodal binary signaling waveform received by a demodulator, corresponding to a noise-corrupted version of the waveform x(t) originally transmitted by a modulator in response to a codeword of n digits, denoted by X = (x1, x2, ..., xn), produced at the output of a binary linear block encoder whenever an information sequence of k bits, denoted by A = (a1, a2, ..., ak), is presented at its input. Let us assume that this block encoder is based on a binary linear block code C(n, k, d). Then, if y(t) is quantized into Q levels prior to decoding, the Chase algorithm [1] for soft-decision decoding of binary linear block codes using channel measurement information can be described by a procedure consisting of the following steps:

(1) a quantized version of y(t) is available, sampled at the optimum sampling instants by a G-bit quantizer, such that each set of G bits is associated with a single hard-decision bit yi of the n-bit sequence Y = (y1, y2, ..., yn). Then there is a reliability vector $R = (r_1, r_2, \ldots, r_n)$ (1);
(2) make, on an a priori basis, polarity inversions on an arbitrary number F of digits in the sequence Y (usually the least reliable ones), based on the generation of several test patterns Tj that are modulo-2 added to the received sequence Y, thus obtaining a new word Yj as $Y_j = Y \oplus T_j$ (2), where $\oplus$ represents modulo-2 bit-by-bit addition;

(3) decode the sequence Yj with the aid of a conventional binary (hard-decision) decoder, and then find an associated error pattern Ej;

(4) add Ej to Tj, obtaining Ztj as the modulo-2 addition $Z_{tj} = E_j \oplus T_j$ (3), where Ztj is an error pattern that carries both the binary decoded errors and the possible a priori corrected errors introduced via the polarity inversions;

(5) find the analog weight of Ztj, $W_a(Z_{tj}) = \sum_{i=1}^{n} r_i Z_{tji}$ (4), where ri is taken in its decimal representation (note that Wa(Ztj) is the sum of the reliabilities associated with the digits that are 1's in Ztj);

(6) choose the pattern Ztj with the minimum analog weight;

(7) if there are more test patterns, do steps 2 to 6 again; if there is none left, decode C, the codeword estimator, as $C = Y \oplus Z_t$ (5), with Zt the minimum analog weight pattern selected in step 6;

(8) transfer to the output the corresponding information bits associated with C.

Algorithms I, II, and III vary according to the test pattern set used to modify the received sequence. For Chase's algorithm II, the chosen test pattern set is the set of all test patterns that have any combination of 1's located in the INT[d/2] positions of lowest confidence values ri in the word R, including the all-zero pattern, which leads to the conventional hard-decision result.
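Steps (2) to (7) can be illustrated with a minimal Python sketch. It is not the hardware described in this paper: to stay short it uses the Hamming (7, 4, 3) code as a stand-in for the EGC (24, 12, 8), so F = INT[3/2] = 1 and there are only two test patterns. The function names, the received word and the reliability values are all hypothetical.

```python
from itertools import product

N, D = 7, 3  # Hamming (7, 4, 3), a small stand-in for the EGC (24, 12, 8)

def syndrome(word):
    """Syndrome as an integer: XOR of the 1-based positions holding a 1."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s

def hard_decode_error(word):
    """Error pattern found by the hard-decision decoder (corrects 1 error)."""
    e = [0] * N
    s = syndrome(word)
    if s:
        e[s - 1] = 1
    return e

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def chase2(y, r):
    """Return (codeword estimate, minimum analog weight) for word y, reliabilities r."""
    f = D // 2                                   # F = INT[d/2]
    weakest = sorted(range(N), key=lambda i: r[i])[:f]
    best = None
    for bits in product((0, 1), repeat=f):       # all 2**F test patterns
        t = [0] * N
        for pos, bit in zip(weakest, bits):
            t[pos] = bit
        e = hard_decode_error(xor(y, t))         # step (3)
        zt = xor(e, t)                           # step (4)
        w = sum(ri for ri, z in zip(r, zt) if z) # step (5)
        if best is None or w < best[0]:          # step (6)
            best = (w, zt)
    return xor(y, best[1]), best[0]              # step (7), equation (5)

# All-zero codeword sent; digits 3 and 5 received in error, both with low reliability.
y = [0, 0, 1, 0, 1, 0, 0]
r = [3, 3, 1, 3, 0, 3, 3]
estimate, weight = chase2(y, r)
hard_only = xor(y, hard_decode_error(y))
```

In this example the hard-decision decoder alone sees two errors, which exceeds t = 1, and moves to a wrong codeword, while the soft-decision procedure recovers the all-zero codeword because the sum of the two low reliabilities is smaller than the reliability of the digit the hard decoder would flip.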
Therefore, with Chase's algorithm II, we try to solve, via the test patterns, up to the F = INT[d/2] most probable errors, presumed to be the F least reliable digits of Y, letting the binary decoder solve up to t = INT[(d-1)/2] errors that are more difficult to detect by inspection of the confidence values. Operating in this way, the Chase's algorithm II decoder can handle up to $F + t = \mathrm{INT}[d/2] + \mathrm{INT}[(d-1)/2] = d - 1$ (6) error corrections, increasing considerably the error-correction capability of a code with minimum Hamming distance d. It should be observed that d - 1 is not the guaranteed error-correction capability of algorithm II, because the decoder does not correct all the error patterns whose weight is less than or equal to (d-1). However, in a probabilistic way, it can correct the most likely error patterns whose weight is less than or equal to (d-1), and it surely corrects all error patterns whose weight is less than or equal to t, the error-correction capability of the inner hard-decision binary decoder.
Finally, the number Ne of elements Tj of the test pattern set is the number of possible subsequences for the INT[d/2] least reliable positions in the sequence. Since the sequence is binary, then $N_e = 2^{\mathrm{INT}[d/2]}$ (7). For the EGC (24, 12, 8), there are Ne = 16 test patterns.

A Novel Division of Chase's Algorithm II in Parallel Processes
The Chase algorithms, including algorithm II, have been proposed within a theoretical framework, as can be inferred from the review presented above, and they may be implemented by several different digital techniques. However, only a few techniques can be considered attractive from a practical point of view, for operation at relatively high data rates, with small power consumption and reduced price.
To show this, we shall analyze some alternatives for decoding the EGC (24, 12, 8) at the desired rate (1068 kbit/s). The stationary decoding interval for a continuous transmission is one block, and therefore the available processing time to make a decision over any test pattern will be $T(\text{processing any } T_j) = (24/16) \cdot 1/(1068 \times 10^{3}) \approx 1.4\ \mu\text{s}$ (8). This value is too low to accomplish all the necessary operations in order to find the analog weight of Tj with standard available microprocessors (even with an 80386 running at 16 MHz). Bit-slice processors and RISC (Reduced Instruction Set Computer) architectures are discarded, mainly due to high power needs, EMC (electromagnetic compatibility) problems in a radio environment and economic reasons. On the same basis, a serial processing architecture using standard ICs must be discarded. A totally parallel architecture using standard ICs must be discarded too, due to its enormous chip count. Three alternatives remain to be analyzed. The first one is to use a DSP (Digital Signal Processor) based architecture. However, this solution must use several DSPs, due to the operating speed limitations of the DSPs currently available in the market. This approach therefore becomes expensive, although it seems to be a promising one for the near future, when DSPs are expected to become faster and cheaper than today. The second alternative is to make a custom IC design (ASIC), which is the way Viterbi decoders are currently implemented, but this is attractive only if a huge market is envisaged, so that large-scale production reduces cost, which is not our forecast for the potential Brazilian market.
Finally, the third alternative, which seems to be the most attractive one, is to find an architecture suitable for PCB design which is still applicable to ASIC design and which respects the acceptable power and packaging limitations imposed on the FEC function. This third alternative was the one chosen.
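The time budget of equation (8) can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative, and the cycle count assumes one operation per clock cycle of the 16 MHz processor mentioned above, which is optimistic.

```python
n = 24                  # digits per codeword
num_patterns = 16       # test patterns per codeword
digit_rate = 1068e3     # digits per second

block_time = n / digit_rate                    # one block period, in seconds
time_per_pattern = block_time / num_patterns   # equation (8)

cpu_clock = 16e6                               # 16 MHz processor clock
cycles_per_pattern = time_per_pattern * cpu_clock

print(f"{time_per_pattern * 1e6:.3f} us per test pattern")
print(f"{cycles_per_pattern:.1f} clock cycles at 16 MHz")
```

About 22 clock cycles per test pattern is clearly insufficient to decode a word and accumulate its analog weight in software, which is the point made in the text.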
In order to explore such an alternative, the architecture presented in this work transforms Chase's algorithm II and treats it as a decoding macro-process which is divided into several distinct processes, each one suitable for hardware implementation. By careful selection, we can arrive at the following processes:

(1) store the received hard-decision sequence Y and the reliability vector R;

(2) find a "primary" syndrome S = Y H', where H' is the transpose of the parity-check matrix H;

(5) generate all associated "test-syndromes" STj using $S_{Tj} = T_j H'$ (9);

(6) add the "primary" syndrome to all "test-syndromes" to find the "modified syndromes" SMj, $S_{Mj} = S \oplus S_{Tj}$ (10);

(7) find all the binary error patterns, represented here by E'j, with the aid of SMj via a binary decoder, such as a look-up table, error trapping, etc., giving the addresses of the digits that are 1 in E'j;

(11) store both the min[Wa(Ej)] and the corresponding Ej;

(12) add this stored E to Y to obtain the codeword estimator C, correcting all the correctable errors in Y;

(13) send the information bits associated with C; if the code is systematic, this is merely the transfer of the first k bits of C to the output.

Now, in order to validate our idea, we shall prove that these processes together are equivalent to Chase's algorithm II and that they are very efficient for practical applications. This will be done by checking the following statements.

Statement 1
To work with "primary" and "test" syndrome addition is equivalent to working with received sequence and test pattern addition, and the former approach is preferable to the latter.

Proof
We need to decode, with binary decoding, the sequence Yj given in (2). All practical binary decoders need a syndrome to work with. We then need to calculate $Y_j H' = (Y \oplus T_j) H' = Y H' \oplus T_j H' = S \oplus S_{Tj} = S_{Mj}$ (12). The syndrome addition needs only (n-k) adders, while the addition of Y and Tj needs n adders.
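The linearity used in this proof can be verified numerically. The sketch below is a hedged illustration in Python: it draws an arbitrary random binary matrix in place of the actual parity-check matrix of the EGC (24, 12, 8), since the identity holds for any H, and the helper names are hypothetical.

```python
import random

def syndrome(word, h_rows):
    """Compute word * H' over GF(2); h_rows are the rows of H."""
    return [sum(w & h for w, h in zip(word, row)) % 2 for row in h_rows]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

random.seed(1)
n, k = 24, 12
# Arbitrary (n - k) x n binary matrix standing in for H (NOT the Golay matrix).
h = [[random.randint(0, 1) for _ in range(n)] for _ in range(n - k)]
y = [random.randint(0, 1) for _ in range(n)]   # received hard decisions
t = [random.randint(0, 1) for _ in range(n)]   # a test pattern

direct = syndrome(xor(y, t), h)                # n XORs on the words first
split = xor(syndrome(y, h), syndrome(t, h))    # (n - k) XORs on the syndromes
```

Both routes give the same modified syndrome, but the second one lets the primary syndrome be computed once per block and reused for every test pattern.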

Statement 2
To work with the addresses of the digits that are 1 in a binary vector with n digits is much more convenient than to work with the vector itself, provided n is big enough and the 1's are sparse.
Proof

(i) binary decoding: consider a binary linear block code C(n, k, d). The hard-decision decoder can correct up to t = INT[(d-1)/2] errors. The conventional decoder finds an error vector E'j with n digits; E'j must then be added to the received vector Y to produce the codeword estimator C. The number of necessary signal tracks Li, which is also the number of circuit elements for each linear operation, is therefore $L_i = n$ (13). However, if this decoder handles only the addresses of the error positions, since there are at most t correctable errors and since addressing, in binary form, a number ranging from 1 to n involves a number of bits Na which is $N_a = \log_2 n$ for $n = 2^g$ and $N_a = \mathrm{INT}[\log_2 n] + 1$ for any other n (14), there will be at most $t \cdot N_a$ (15) necessary signal tracks. Then, if d << n, it is clearly wiser to work with addresses rather than with the vector itself. For example, if t = 3, for any integer n greater than 9, we should prefer the addressed form of the vector.
(ii) analog weight calculation: clearly there are at most (d-1) terms to add up when addresses are being used.
On the contrary, if we use the error vector itself, we must add all the products ri·Eji, since we do not know which digits will be 0's and which will be 1's.
(iii) valid error-pattern calculation: note that n modulo-2 additions should be performed if a direct representation of vectors is used. If an addressed representation is available, it is merely a coincident address masking that is required.
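Items (ii) and (iii) can be compared side by side in a short Python sketch. The reliability vector, the decoded error pattern and the test pattern below are made-up values, and the helper names are hypothetical; the address form is modeled with Python sets, where the symmetric difference plays the role of the coincident address masking.

```python
def to_addresses(vector):
    """1-based positions of the digits that are 1."""
    return {i for i, bit in enumerate(vector, start=1) if bit}

def analog_weight_from_vector(vector, r):
    """Vector form: n products r_i * e_i must be formed and summed."""
    return sum(ri * ei for ri, ei in zip(r, vector))

def analog_weight_from_addresses(addresses, r):
    """Address form: only the addressed reliabilities are summed."""
    return sum(r[a - 1] for a in addresses)

n = 24
r = [3, 0, 2, 3, 1, 3, 3, 0, 2, 3, 3, 1] + [3] * 12   # made-up reliabilities
decoded = [0] * n    # hypothetical hard-decoder error pattern
test = [0] * n       # hypothetical test pattern
for a in (2, 9, 20):
    decoded[a - 1] = 1
for a in (2, 5):
    test[a - 1] = 1

valid_vector = [e ^ t for e, t in zip(decoded, test)]         # n XOR operations
valid_addresses = to_addresses(decoded) ^ to_addresses(test)  # address 2 cancels

w_vector = analog_weight_from_vector(valid_vector, r)
w_addresses = analog_weight_from_addresses(valid_addresses, r)
```

Both forms yield the same valid error pattern and the same analog weight, but the address form touches only three reliabilities instead of 24.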

Statement 3
The easiest generation of Tj is accomplished if addressing is used, because we can eliminate vector representation conversions. Since algorithm II uses all possible subsequences for the least reliable positions in the sequence in order to generate the set {Tj}, we will always need to point out these positions by means of addressing. This is so even if we use a bit set-reset procedure to build a "true least reliable positions indicator vector", in case a direct representation form is being used.

Statement 4
To calculate the "test-syndrome" is simply to add the corresponding columns of the transpose of the parity-check matrix, addressed by the digits that are 1 in Tj.

Based on this analysis, we can establish the decoding macro-process flowgraph shown in Fig. 1, where each process can be handled by simple and economical hardware. A complete architecture is thus defined, which may be used to decode, using Chase's algorithm II, any binary linear and preferably systematic code C(n, k, d).
If transmission is continuous, the decoder enabling signal ENDEC will always be 1; if burst transmission is used, ENDEC will be pulsed and generated by the appropriate synchronization circuitry. If a simple coherent BPSK modem is used, the directly demodulated and quantized bit stream feeds the decoder. If coherent QPSK is used and a single encoder is provided, the demodulated and quantized symbols must be recombined into a single quantized digit stream using the same commutation rule employed at the transmitter. From Fig. 1, it is evident that the stationary decoding delay is one block, and so we do not need duplication of decoders to handle any message time distribution. Also, a function that is very handy for system evaluation, namely BER estimation, may be provided by the decoder.
At this point, instead of going further and detailing the generalized circuits that carry out each process, it will be easier to understand the principles of the architecture by showing the general block diagram of the hardware. We then analyze the block implementation for the specific application and finally generalize the requirements.
The general decoder block diagram is shown in Fig. 2. The input signals are power (VCC and GND), decoder enabling (ENDEC), clocks (which will be detailed in the next section), and the quantized input (data and reliability) signals. Its output signals are the decoded data and the BER alarm. All the presented findings allow us to draw a conclusion: combined pipeline and parallel processing, as used in the division of the macro-process shown in Fig. 1, can effectively provide an optimum architecture to perform Chase's algorithm II decoding.

Application to Digital Satellite Transmission Systems
3.1 Use of EGC (24, 12, 8) with Chase's Algorithm II in the SAMSAT System

As a concrete application example, we shall analyze the composition of the Chase decoder architecture in the SAMSAT environment. SAMSAT is the Brazilian narrow-band multi-frequency TDMA satellite data transmission system for medium capacity domestic applications. Each SAMSAT system operating at its full capacity occupies one third of a 36 MHz transponder of the Brazilian domestic telecommunications satellite (BRASILSAT), so there may be up to three SAMSATs per transponder. Each SAMSAT can handle up to 126 terminal stations (ETS) and has 2 reference stations: a master for primary reference (ER0) and a secondary reference (ER1). Neither reference station carries traffic. The total system capacity is 8.192 Mbit/s of information, allocated in 60 ms frames working at a 512 kbit/s net information rate along 16 distinct frequencies.
Each ETS may support up to 16 ports (terrestrial interface modules) working at speeds ranging from 1.2 kbit/s up to 384 kbit/s in a variety of different interfaces such as CCITT V.24/V.28, V.36/V.11 and G.703 (both synchronous and plesiochronous). Non-switched voice services at 32 and 64 kbit/s are also supported. Burst assignment is pre-allocated (on a reservation basis) and may change once every 5 minutes, the minimum database change interval. The ETS multi-frequency TDMA/QPSK modem operates at a digit rate DR = 1068 kbit/s. The gross information rate (information plus overhead such as unique words, carrier recovery sequence, etc.) is IR = 534 kbit/s, and an extended Golay code EGC (24, 12, 8) is applied to protect the already scrambled net information bits.
To convert the coded digit stream into the in-phase and quadrature streams that feed the modulator, decommutation is performed such that the odd positions of an encoded block are sent to the in-phase channel and the even ones to the quadrature channel. At reception, coherent demodulation is used, with phase ambiguity removal and block synchronization achieved via unique word detection, which also provides frame alignment and master station reference clock "windowing" for ETS clock recovery. The reference clock of any ETS is 16 times the digit rate, i.e., 17.088 MHz, and so it is possible to generate any clock which is a rational multiple or submultiple of the digit rate (DR) up to 17.088 MHz.
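The commutation rule can be written down in a few lines of Python. This is a sketch of the rule as stated in the text (odd positions to the in-phase channel, even positions to the quadrature channel); the function names are hypothetical and the digits are replaced by their position numbers so that the ordering is visible.

```python
def decommutate(block):
    """Split a coded block: positions 1, 3, 5, ... go to I; 2, 4, 6, ... go to Q."""
    return block[0::2], block[1::2]

def recombine(i_stream, q_stream):
    """Receiver side: interleave I and Q digits back into block order."""
    block = []
    for i_digit, q_digit in zip(i_stream, q_stream):
        block.extend((i_digit, q_digit))
    return block

block = list(range(1, 25))          # positions 1..24 of one EGC codeword
i_channel, q_channel = decommutate(block)
restored = recombine(i_channel, q_channel)
```

Each channel carries 12 digits per codeword, so each runs at half the digit rate, which is consistent with the 534 kHz clock used later to multiplex the two demodulated streams.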
SAMSAT system specifications call for a BER < 10^-7 at an operating Eb/No = 9.5 dB, including all allowable degradations, which led to an Eb/No = 8.5 dB at an IF loop of the modem with multi-frequency and burst operation. This requirement demands the use of FEC with soft-decision decoding. The adopted FEC scheme is the EGC (24, 12, 8) with soft-decision 3-bit quantized decoding using Chase's algorithm II, mainly because block coding at rate 1/2 is easy to perform at the transmitter side, does not require any burst extension period as convolutional codes do, has simple clock generation, and achieves the required performance, being as good as short-length rate 1/2 convolutional codes with Viterbi decoders.
We shall now detail how to design the blocks of the decoding architecture, showing their interrelationship. We begin by recalling that for 3-bit quantized EGC (24, 12, 8) sequences, using algorithm II, we shall have 4 least reliable positions per block, 16 possible test patterns, and a maximum value of 3 for each ri. Therefore, an upper bound for the analog weight is 3 × 8 = 24.
Furthermore, we assume a binary decoder that corrects all the error patterns of up to 3 errors and some (1/6) of the 4-error patterns (for the EGC is a quasi-perfect code), with 2^(n-k) = 4096 correctable error patterns. Also, the EGC may be characterized by a generator polynomial, since it is derived from a cyclic code, and we will assume that it is in its systematic format [3], [4]. Block alignment and any rational multiple or submultiple of the basic clock rate are also assumed to be available. Then, if we state that any position within any block will be addressed by a number in the range 1 to 24, we may declare the value 0 as a dummy address, which will be very handy for hardware processing, as we shall see soon. Finally, we assume the 3-bit quantizer output encoded as given in Table 1. With these input constraints in mind, we may now start the block implementation description. The block numbers referred to are always those shown in Fig. 2.
The input interface (block 1) is simply composed of line receivers at all input signals and a multiplexer of the in-phase and quadrature demodulated signals, controlled by the information clock IR = 534 kHz, to reconstruct the block sequence.
The primary syndrome generator (block 3) may be implemented as a conventional syndrome register with (n-k) = 12 flip-flops, as described in [3] and [7], and an (n-k)-bit store register.
The least reliable digits selector (block 4) performs the process denoted as # in Fig. 1. It must search for the addresses of the 4 least reliable positions and may be implemented as sketched in Fig. 3. The master counter plays a major role not only in this block but also in the whole architecture, since it is responsible for all the address generation.
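The function of block 4 can be mimicked in software as a single sweep over the block, with the loop index playing the role of the master counter. This is only a behavioral sketch in Python under assumed tie-breaking (the lower address wins when reliabilities are equal); it does not reproduce the circuit of Fig. 3, and the reliability values are made up.

```python
def least_reliable_addresses(reliabilities, count=4):
    """Return the 1-based addresses of the `count` least reliable digits."""
    selected = []   # (reliability, address) pairs, kept sorted, at most `count`
    for address, r in enumerate(reliabilities, start=1):  # master counter sweep
        selected.append((r, address))
        selected.sort()
        del selected[count:]     # drop the most reliable candidate, if any
    return [address for _, address in selected]

# Made-up 2-bit reliabilities (0..3) for one 24-digit block.
r = [3, 0, 2, 3, 1, 3, 3, 0, 2, 3, 3, 1] + [3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3]
dt = least_reliable_addresses(r)
```

The four addresses returned correspond to DT1 to DT4, the position addresses from which all test patterns are later derived.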
Table 1. Three-bit quantization correspondence.
The test pattern generator (block 5), responsible for process # 2 shown in Fig. 1, is extremely simplified by using position addresses and the dummy address 0. Fig. 4 shows the necessary hardware to generate the 16 vectors Tj, represented by addresses DT1 to DT4. As only one Tj has 4 non-zero DTi's, all the others will have at least one DT that is null, which means that this DT points to "nowhere" in the block. The process in block 5 can be seen as a position address masking function. The masking clock must guarantee the generation of all test patterns in one block period, in order neither to increase the decoding delay nor to duplicate functions. It must therefore have a frequency f_m equal to (16/24)·DR, that is, 712 kHz. In general, we have $f_m = (N_e / n) \cdot DR$.

The test syndrome generator (block 6), responsible for process # 3 in Fig. 1, and the syndrome adder (block 7) may be implemented together as shown in Fig. 5.

The look-up table decoder delivers the addresses of the digits that are 1 in the decoded error pattern. This is a set of four 5-bit addresses, namely PE1, PE2, PE3 and PE4. Generally, look-up table decoding needs an (n-k)-bit syndrome input to respond with a set of t addresses of the digit positions valued 1 in the error pattern. There shall be an enabling signal, such as VALID, to indicate a valid syndrome. Other hard-decision decoding methods might be used, but this may not be practical, for today's EPROMs are quite cheap.
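The address masking performed by block 5 can be sketched as follows. This is a behavioral Python illustration, not the circuit of Fig. 4: a 4-bit mask counter is assumed, each mask bit either passes the corresponding DT address or replaces it by the dummy address 0, and the DT values are made up.

```python
def masked_test_patterns(dt):
    """Yield the 2**len(dt) test patterns as address tuples; 0 is the dummy address."""
    for mask in range(1 << len(dt)):
        yield tuple(a if (mask >> i) & 1 else 0 for i, a in enumerate(dt))

dt = (2, 8, 5, 12)     # made-up addresses of the 4 least reliable digits
patterns = list(masked_test_patterns(dt))
full = [p for p in patterns if 0 not in p]
```

The mask value 0 gives the all-dummy pattern, which corresponds to the conventional hard-decision result, and only the last mask value leaves all four addresses non-zero.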
The coincident address analyzer (block 9) carries out process # 4 in Fig. 1, and its structure is depicted in Fig. 6. Again, a bus structure is used to multiplex (do the union of) the signals PEi and DTi by enabling 3-state devices at the appropriate instants. However, to get the proper valid error pattern Ej, we need to erase the coincident positions between the sets {PEi} and {DTi}. The multiplexed positions of the digits valued 1 in Ej will be used in the next blocks, the metric analyzer and the error-position memory. Since there are 4 PEi's and 4 DTi's, the frequency f_aw of the AWCK ("Analog Weight Clock") must be 8·f_m, that is, 5696 kHz. In general, we must have $f_{aw} = (\mathrm{INT}[d/2] + t'') \cdot f_m$, where t'' is the number of error-position addresses delivered by the binary decoder; thus f_aw = d·f_m whenever the code being used is quasi-perfect. Also, although Chase decoders correct up to (d-1) errors, the hardware will always need to work with d addresses, due to the random nature of the ordering of error positions in the sets {PEi} and {DTi}.

The error correction and output interface unit (block 12) receives the error positions, stored in the error-position memory, via a read sweeping operation over this memory during an information bit period. If the information bit address agrees with any stored error position, an address comparator pulses, and this pulse is fed into an exclusive-or gate to invert the current information bit, providing the correction. This bit stream is regenerated and fed into a line driver, which is the output source. The information bits come from a rate converter (1068 kbit/s to 534 kbit/s) that reads them from the reliability and data memory.

The BER meter (block 13) just counts the number of correction pulses during a certain number of bits. If the counter of correction pulses overflows, an alarm of excessive input BER (prior to correction) is activated. The BER must be measured within a given tolerance and with a high confidence level [8].
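The behavior of blocks 12 and 13 can be sketched together in Python. This is a simplified model with hypothetical names: the correction pulses are counted over a single block rather than over a longer measurement window, and the alarm threshold is an arbitrary value chosen for the example.

```python
def correct_information_bits(info_bits, error_addresses, alarm_threshold):
    """Invert the information bits whose 1-based address is a stored error position.

    Returns (corrected bits, number of correction pulses, alarm flag).
    """
    corrected = []
    corrections = 0
    for address, bit in enumerate(info_bits, start=1):
        hit = address in error_addresses     # address comparator pulse
        corrected.append(bit ^ int(hit))     # exclusive-or gate inverts the bit
        corrections += int(hit)              # BER meter counts the pulse
    return corrected, corrections, corrections > alarm_threshold

info_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # first k = 12 digits of a block
stored_errors = (3, 17, 0, 0)    # made-up error positions; 0 is the dummy address
corrected, pulses, alarm = correct_information_bits(info_bits, stored_errors, 2)
```

Only the error at address 3 falls inside the information part of the systematic block, so a single bit is inverted; the error at address 17 lies in the parity digits and the dummy addresses never match.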
The last two blocks, the memories, should be carefully analyzed due to their paging schemes. The reliability and data memory (block 2) will always be a 2·(n+1) long and G-bit wide RAM, that is, a 50x3 RAM in our example. This memory performs the following functions: (i) receive and store the reliability and data sequences, with 24 write cycles per codeword, at 1068 kbit/s; (ii) store 0 at relative address 0 of the currently selected memory page, with 24 write cycles per codeword, at 1068 kbit/s; (iii) read the information bits to be sent to the output, with 12 read cycles per codeword at 1068 kbit/s; (iv) read the reliability values to feed the metric analyzer, with 16 × 8 = 128 read cycles per codeword at 5696 kHz. This memory has two pages, which avoids duplication of other decoding functions, because while a block read from one page is being decoded, the subsequent block may be stored in the other page.
The error-position memory (block 11) also has two pages, each one divided into two segments. In general, it will always be a 4·{INT[d/2] + t''} long and Na-bit wide RAM, with Na as given by (14). In our case, it is a 32x5 RAM. The two pages are provided for multiplexed write cycles for the codeword currently being processed and read cycles for the codeword currently being output (that is, being corrected). Segmentation is needed because a decision on any Wa(Ej) is possible only at the end of the current Ej time interval. Therefore, we first need to write Ej in a segment of the current page and then, based on the metric comparator output, we can either declare this segment good and change segments, or declare this segment as scratch memory and continue to use it. At the end of decoding, the minimum analog weight error pattern will be stored in the segment which is not being pointed to as "able to write" by the write address buffer.
Finally, we shall derive the required RAM access time. Since the maximum write/read frequency is 5696 kHz, which is (16/3)·DR, and since a divide-by-3 counter has an output duty cycle of 1/3, it follows that we need an access time Ta such that Ta < 58.5 ns.
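The segment switching of one page of the error-position memory can be modeled as follows. This Python sketch is a behavioral illustration with hypothetical names and made-up candidates; ties are assumed to keep the earlier pattern, which the text does not specify.

```python
def select_minimum_weight(candidates):
    """Keep the minimum-weight error pattern using two memory segments.

    candidates: iterable of (error addresses, analog weight), one per test pattern.
    """
    segments = [None, None]
    write_segment = 0            # segment currently "able to write"
    best_weight = None
    for addresses, weight in candidates:
        segments[write_segment] = addresses      # written before the decision
        if best_weight is None or weight < best_weight:
            best_weight = weight
            write_segment ^= 1   # segment declared good: switch to the other one
        # otherwise the segment is scratch and is simply overwritten next time
    return segments[write_segment ^ 1], best_weight

candidates = [((3,), 5), ((1, 2), 2), ((4,), 7), ((6,), 2)]
best_pattern, best_weight = select_minimum_weight(candidates)
```

At the end of the sweep the best pattern sits in the segment that is not selected for writing, as described in the text, so no copy operation is ever needed.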
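The clock chain and the access time bound can be reproduced with a short calculation. The Python sketch below uses only figures quoted in the text; the interpretation that the RAM must respond within the active third of one AWCK period (one period of the 17.088 MHz reference clock) is an assumption made here to arrive at the 58.5 ns bound.

```python
DR = 1_068_000               # digit rate in Hz
N, NE = 24, 16               # block length and number of test patterns
ADDRESSES_PER_PATTERN = 8    # 4 DT addresses plus 4 PE addresses

f_ref = 16 * DR                      # ETS reference clock
f_m = NE * DR // N                   # masking clock
f_aw = ADDRESSES_PER_PATTERN * f_m   # analog weight clock (AWCK)

# Assumption: access must complete within 1/3 of one AWCK period.
t_access_ns = 1e9 / (3 * f_aw)

print(f_m, f_aw, f_ref, round(t_access_ns, 1))
```

The AWCK frequency equals the reference clock divided by 3, which is why a divide-by-3 counter appears in the derivation.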

Other Application Examples
There are several possible applications of this architecture in digital satellite communications systems. Since all blocks are completely defined by the code parameters (n, k, d), the digit rate DR and the number G of quantization bits, it is very easy to assess the viability or fitness of this architecture for any Chase algorithm II decoder. TDMA systems or any "RF framed" system may be directly implemented. Application to continuous "unframed" transmission systems would need some extensions, in order to design the auxiliary codeword alignment circuits.
This work shows clearly that this architecture is very suitable for implementing a standard PCB, using standard ICs, or even an ASIC to decode binary linear, and preferably systematic and cyclic, block codes that have moderately large minimum distance (d < 12), operating at medium data rates.
An extension of the specific EGC (24, 12, 8) decoder considered here to continuous "unframed" transmission applications is not complicated, since the EGC (24, 12, 8) is transparent to phase ambiguities that lead to word inversion. This is so because the all-ones word is a codeword, i.e., the code has a symmetrical weight distribution. DBPSK (Differential BPSK) applications do not need any additional circuitry. DQPSK applications must have a block alignment circuit that may use the BER alarms as thresholds for changing a channel commutator and inverter. A recursive polynomial division of Y by the code generator polynomial might be done by additional hardware, in order to ensure a small alignment search time. Coherent demodulation for "unframed" transmission is impossible in this case, due to the specific code transparency to phase rotation.

Measurement Results
The hardware presented for the EGC (24, 12, 8) used in SAMSAT has been Revista da Sociedade B,'asileira de Telecomunlcaç15es 1"15 Volume 5, N ~ 1, junho 1990.I designed using 124 standard ICs on a 13.05 x 8.87 square inch double-sided PCB whose density is 0.63 square inches/equivalent 14-pin IC.A single +5V power supply is required and in the worst case the measured operating current was 1300 mA at the nominal digit rate of 1068 kbitls.Ten PCBs have been built to check repeatability and to provide enough hardware for SAMSAT field trial.
Baseband continuous non-scrambled transmission tests in an AWGN (Additive White Gaussian Noise) channel have given the typical results listed in Table 2. Finally, measured results for IF loop with TDMA multi-frequency operation of the modem have shown that at Eb/No = 8.5 dB the BER is better than 10^-7, i.e., it complies with the specification. SAMSAT system tests with 3 ETSs and the reference stations have demonstrated that the whole system works very well through BRASILSAT and no BER-associated problems have been detected so far at the user's interfaces.

Conclusions
A novel approach to Chase's Algorithm II Decoding Method for binary linear block codes was derived. This approach was based on the division of the several steps of the algorithm into pipelined parallel processes suitable for hardware implementation. It was shown that this hardware is optimum for logical circuitry minimization. Then the implementation for the specific case of an Extended Golay (24, 12, 8) code applied to SAMSAT, a narrow-band multi-frequency TDMA satellite data transmission system developed at CPqD, was presented and discussed. Possible extensions of the derived circuits for the general case of a binary linear block code (n, k, d) have also been considered. It was also shown that the resulting hardware is well suited to decoding binary linear (preferably systematic and cyclic) block codes with moderately large minimum distance (d<12) operating at medium data rates.
Finally, practical measured results for the above-mentioned case were presented, showing the BER performance at several system operating points. From these results, the improvement that the coding system brings to the overall system performance can be obtained.
a) there is 1 bit (the MSB or most significant bit) that represents the polarity (0 or 1) of the digit Yi; b) there are (G-1) bits that represent a non-negative integer in the range [0, 2^(G-1) - 1], carrying reliability information on the received Yi and denoted by r_i.
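The G-bit word format described above can be sketched as follows. This is a minimal illustration, not the quantizer used in the hardware: the uniform step, the `full_scale` parameter, the mapping of positive samples to polarity 1 and the function name `quantize` are all assumptions.

```python
def quantize(y, g_bits, full_scale=1.0):
    """Map an analog sample y to (polarity, reliability, packed G-bit word).

    Assumptions (not from the paper): uniform quantization of |y| up to
    full_scale, and positive samples mapped to polarity 1.
    """
    levels = 1 << (g_bits - 1)               # number of reliability levels
    polarity = 1 if y >= 0 else 0            # MSB: hard decision on the digit
    magnitude = min(abs(y) / full_scale, 1.0)
    reliability = min(int(magnitude * levels), levels - 1)
    word = (polarity << (g_bits - 1)) | reliability
    return polarity, reliability, word
```

With G = 4, for instance, the reliability takes one of 8 values (0 to 7) and the polarity occupies bit 3 of the packed word.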

(8) find the valid error pattern E_j as in (11) by making the union of the sets of addresses, expurgating those appearing in both sets {E'_j} and {T_j};
(9) calculate all the 2^INT[d/2] analog weights Wa(E_j) from the sum of all r_i addressed by those positions "i" that are 1 in E_j;
(10) compare all Wa(E_j) and select the E_j with the lowest analog weight;
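Steps (8) to (10) can be sketched in software as below. This is an illustration under stated assumptions, not the pipelined hardware: positions are held as Python sets, the binary decoder output for test pattern j is called `decoder_positions[j]` (with `None` standing for a decoding failure), and the function name is hypothetical.

```python
def select_error_pattern(test_positions, decoder_positions, reliabilities):
    """Return (analog weight, error-pattern positions) of the best candidate.

    test_positions[j]    : set of positions set to 1 in test pattern T_j
    decoder_positions[j] : set of positions flagged by the binary decoder for
                           test pattern j, or None if decoding failed (assumed)
    reliabilities[i]     : reliability r_i of received digit i
    """
    best = None
    for t_j, d_j in zip(test_positions, decoder_positions):
        if d_j is None:
            continue
        # Step (8): union of both address sets minus the common addresses,
        # i.e. the symmetric difference.
        e_j = t_j ^ d_j
        # Step (9): analog weight = sum of r_i over the positions set in E_j.
        weight = sum(reliabilities[i] for i in e_j)
        # Step (10): keep the candidate with the lowest analog weight.
        if best is None or weight < best[0]:
            best = (weight, e_j)
    return best
```

A position that appears both in the test pattern and in the decoder output has been flipped twice, so it is not an error in the received word; this is why the common addresses are removed.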

Figure 5. Test syndrome generation and syndrome adder.
delays, parasitic capacitances, jitter and skew, we have chosen Ta = 45 ns. To evaluate the maximum operating speed of this architecture, we note that the fastest NMOS RAM available today has Ta = 25 ns. Then the maximum theoretical speed would be DRmax = 2
because both blocks use the same number of adders, and we may eliminate a lot of circuitry by selecting three-state devices for hardware implementation, which give us a high-impedance state.
The test pattern is multiplied by H' via 4 additions of lines stored in PROMs, addressed by the multiplexed DTi's. If an extra addition with the primary syndrome is done, the result is the modified syndrome SMj that will be fed into the binary decoder. It should be noticed that the PROM contents are not only the lines of the H matrix but also an all-zero line at address 0, in order not to change the modified syndrome when any DTi is 0. Also, it is important to point out that output SMj is ready only when the VALID signal goes to 1. Finally, note that fsum is the maximum necessary clock frequency to be supplied to the decoder's hardware.
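The PROM-based syndrome adder can be illustrated with a small software model. The sketch uses a (7, 4) Hamming code as a stand-in, because the H matrix of the actual decoder is not reproduced in this section; the table contents and the names `PROM`, `syndrome` and `modified_syndrome` are assumptions made for illustration only.

```python
# Hypothetical stand-in for the PROM contents: (7, 4) Hamming code whose
# parity-check line for position i is the binary value i. Address 0 holds the
# all-zero line, so an unused DT_i leaves the syndrome unchanged.
PROM = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]

def syndrome(bits):
    """Primary syndrome of a hard-decision word (bits[0] is position 1)."""
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= PROM[position]
    return s

def modified_syndrome(primary, dt_addresses):
    """Add (XOR) the PROM lines addressed by the DT_i to the primary syndrome."""
    s = primary
    for address in dt_addresses:     # address 0 selects the all-zero line
        s ^= PROM[address]
    return s
```

The result equals the syndrome of the received word with the test pattern already added, so the word itself never has to be modified before the table look-up.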
The binary decoder, block 8, is simply an EPROM array that performs a table look-up operation. Given an address, which is the 12-bit SMj, it gives as output

Table 2
(by-pass correction and hard/soft-decision selection jumpers are available on the PCB). Numbers in Table 2 are ensemble average figures, where a minimum of 5 consecutive measurements were made with 100 accumulated bit errors per measurement. Log-linear regression was made to find a function relating the output BER (BERout) and the input BER (BERin). We have obtained for soft and hard-decision, respectively: log BERout = 5.05 log BERin + 3.19
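As a worked illustration, the regression line printed above can be evaluated directly. The sketch assumes that the logarithms are base 10 and that the printed coefficients belong to the soft-decision case (the first of the two cases named); the function name `ber_out_soft` is hypothetical.

```python
import math

def ber_out_soft(ber_in, slope=5.05, intercept=3.19):
    """Output BER predicted by log BERout = slope * log BERin + intercept.

    Assumption: base-10 logarithms; coefficients as printed in the text.
    """
    return 10 ** (slope * math.log10(ber_in) + intercept)
```

For BERin = 10^-2 this gives log BERout = 5.05 x (-2) + 3.19 = -6.91, i.e. an output BER of about 1.2 x 10^-7.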