Embedded Signal Processing Module for Online Filtering in High-Event Rate Conditions

Abstract: Online event detection (filtering) is required in communications, industry and electronic instrumentation systems. Such systems may comprise sequential decision levels, and the final decision to accept or reject an event may involve the fusion of different measured data. This work describes an embedded solution in a field-programmable gate array (FPGA) that combines information from two different sources to reach a decision. A matched filter discriminator is used to identify typical signatures of interest. The main focus is on the design and implementation of the digital data control and packaging module, and on the optimization of the matched filter module in terms of FPGA occupation. The proposed digital electronic system is presented and simulation results are used to validate the design.


I. INTRODUCTION
Efficient online detection of events of interest is required for several applications in telecommunications, industry and electronic instrumentation. For example, a high-speed and high-precision smoothness detection system is required for efficient operation in the ceramic production industry [1]. An online computer-vision-based structural problem detection system is proposed in [2] for automotive industry applications, in order to increase reliability and safety. In telecommunications, online trigger systems are required, for example, to detect anomalies and non-authorized access [3]. Online event detection is also important in applications such as motor fault diagnosis [4], voice activity detection [5] and experimental high-energy physics [6].
For some applications, the use of information from different instrumentation systems may increase filtering efficiency. For example, thermal and imaging systems were combined in [7] to produce an instrument to study advanced functional materials. A combination of electrostatic and digital imaging sensors is proposed in [8] for online concurrent measurement of mass flow rate and size distribution of particles in a pneumatic suspension for industry applications.
Online filtering is mandatory in modern high-energy physics (HEP) experiments, which often produce a large amount of information as the collision rates have been considerably increased [9], [10]. In such experiments, different instrumentation systems (detectors) are required for proper physics characterization.
As most of the data in those experiments often corresponds to background noise, the detection procedure must be performed online within a short latency time interval. As the interesting events are very rare, high efficiency is required for the relevant processes.
The online filtering (trigger) system operation may be split into sequential decision levels. In this case, event selection is refined at each filtering stage, reducing the acceptance of false signatures. Operating at very high event rates, the first level is frequently implemented in dedicated hardware to achieve fast response [11], [12]. The following trigger levels may make use of software-based solutions and more complex particle selection algorithms [6].
In HEP experiments, online detection is usually based on information from different instrumentation systems. For example, energy measurement (calorimetry) and particle (muon) detection readouts are combined in [13], [14] to reduce the acceptance of false signatures.
This work presents an online filtering application that targets high-rate event detection using information from different readout systems. The decision from a primary instrumentation system is validated using the outputs of a secondary measurement system. In order to cope with stringent time latencies, a digital implementation using field-programmable gate arrays (FPGAs) [15] is proposed. Typical signatures of the events of interest are used to design a matched filter detector. A case study addressing online triggering in an experimental high-energy physics application is presented.
The paper is organized as follows. Section II addresses general online detection in high event rate environments. The proposed digital system is presented in Section III. Results obtained from digital circuit design and signal processing with simulated data are presented in Section IV. Conclusions are drawn in Section V.

II. ONLINE DETECTION IN HIGH EVENT RATE PROBLEMS
Modern online filtering (trigger) applications usually share common requirements such as high detection efficiency for the relevant signatures, low acceptance of false events and short latencies for the decision-making procedure. Considering specifically online particle detection (trigger) in high-energy physics experiments, typical solutions usually comprise a sequential information processing system [6]. Figure 1 illustrates a two-level trigger system. The first level (L1) is usually implemented in dedicated electronics, in order to deal with severe temporal constraints (very short latencies and high event rate). Pipeline memories and event buffers are used to temporarily retain detector data until a trigger decision is released. In the case of a final acceptance, data are recorded in permanent media for further offline analysis. Data fusion may be applied to increase detection efficiency. This kind of solution has been extensively applied in high-energy physics experiments. For instance, a combination of calorimetry and muon system information was proposed for muon identification in [16] for the International Linear Collider (ILC), and also in [14] for the ATLAS detector.
The proposed electronic system addresses the situation in which a trigger decision obtained in a primary measurement system needs to be confirmed by analysing the signal profile from a second instrumentation system.
A classical method for detecting an a priori known signature s[k] immersed in additive background noise n[k] is the matched filter (MF) [17]. The MF is an optimal linear detector in the presence of white Gaussian noise. The detection problem considers two possible hypotheses for the observed signal y[k]. Under $H_1$, the signal is corrupted by additive noise:
$$ H_1: \; y[k] = s[k] + n[k] \qquad (1) $$
Under the alternative hypothesis ($H_0$), there is only noise (no signal is acquired):
$$ H_0: \; y[k] = n[k] \qquad (2) $$
The likelihood ratio Λ(y) is the best measure of how likely it is that a received signal y was produced under $H_1$ rather than $H_0$:
$$ \Lambda(y) = \frac{f_{y|H_1}(y)}{f_{y|H_0}(y)} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma \qquad (3) $$
where $f_{y|H_r}(y)$ is the probability density function (pdf) of y under hypothesis $H_r$ (r = 0 or 1) and γ is the decision threshold. Whenever Λ(y) > γ, hypothesis $H_1$ is accepted (an event is accepted for final registration); otherwise, a decision is taken for $H_0$ (the event is discarded). As the pdfs are usually unknown in practical problems, the optimum decision threshold may be chosen experimentally to meet system operational requirements (i.e., maximize detection efficiency).
When s[k] has a sufficiently fixed pulse shape, an approximation may be applied to the MF design, as the signal of interest s[k] may be considered a deterministic finite-length waveform s[n]. If the noise n[k] is assumed to be white, zero-mean and Gaussian, then Equation (3) can be simplified to:
$$ \sum_{n=1}^{N} y[n]\, s[n] \underset{H_0}{\overset{H_1}{\gtrless}} \gamma' \qquad (5) $$
where N is the waveform length and γ′ is the correspondingly rescaled decision threshold. If n[k] is not white, a pre-whitening transformation may be applied. It is worth noting that Equation (5) can be easily implemented in a digital system as a linear correlation between the observed signal y[n] and the deterministic waveform s[n].
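The step from the likelihood ratio of Equation (3) to the correlation of Equation (5) can be sketched as follows. This is a standard textbook derivation, not reproduced from the original work; it assumes independent zero-mean Gaussian noise samples with variance $\sigma^2$ (a symbol introduced here), a deterministic waveform s[n] of N samples indexed from 1, and it writes γ′ for the resulting threshold.

```latex
\begin{align*}
\ln \Lambda(y)
  &= -\frac{1}{2\sigma^2}\sum_{n=1}^{N}\bigl(y[n]-s[n]\bigr)^2
     + \frac{1}{2\sigma^2}\sum_{n=1}^{N} y[n]^2 \\
  &= \frac{1}{\sigma^2}\sum_{n=1}^{N} y[n]\,s[n]
     - \frac{1}{2\sigma^2}\sum_{n=1}^{N} s[n]^2 , \\[4pt]
\ln \Lambda(y) > \ln\gamma
  \;&\Longleftrightarrow\;
  \sum_{n=1}^{N} y[n]\,s[n] \;>\; \sigma^2 \ln\gamma
     + \frac{1}{2}\sum_{n=1}^{N} s[n]^2 \;=\; \gamma' .
\end{align*}
```

Under these assumptions the noise variance and the signal energy only shift the threshold, which is consistent with choosing the threshold experimentally.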
In the following section, a dedicated online electronic system is proposed and implemented in FPGA technology to combine information from different instrumentation systems for optimal online event identification. For this, the MF classifier was implemented to process instrumentation readout signals, and its decision was used to validate events pointed out as good candidates by a different concurrent measurement system.

III. PROPOSED SYSTEM ARCHITECTURE
A hardware implementation in FPGA is proposed to comply with short latencies in high event rate environments. Taking that into account, the system shall be able to temporarily store and process information from different instrumentation systems.
When a trigger event occurs in the primary detection system, information from the secondary instrumentation system shall be used for trigger validation. As there may be a delay in primary trigger generation, a temporary memory is required for secondary system data. Trigger operation over secondary system information is validated through the MF processing block.
When the MF points to a candidate event, the required information must be forwarded to the next selection level. This information is encapsulated into an output data fragment, which is divided into three sections: (i) raw analog-to-digital conversion data; (ii) matched filter output; (iii) trigger results. Information for identification and control purposes is also inserted in the data fragment.
In the following subsections, the main project requirements will be presented along with the proposed architecture.

A. System Requirements
Considering that the information of interest in the secondary instrumentation system is generated before the primary triggering process is finished, it is necessary to create an internal temporary memory structure (buffer) that stores the required information until the previous trigger analysis is completed. Since the data from each sensor output (readout channel) are processed independently, a separate structure per channel is required.
For proper online operation, a synchronized clock signal is required in both primary and secondary systems. An average primary trigger rate shall also be defined a priori, together with the maximum accepted latency.
As illustrated in Figure 2, the proposed system receives as inputs both the primary trigger decision and the secondary system readout signals. As the MF operates over a window of N samples, the adjacent samples (nearest neighbours) of the secondary signal sample synchronized with the trigger information must also be recorded, making it possible to properly represent a typical pulse of interest.
Since the sensors may be physically scattered, the generated signals follow different paths, which produces slight variations in the propagation time of the signals coming from the analog-to-digital converters (ADCs). Therefore, a synchronization process may be necessary to compensate for the existing delays. When Equation (5) applies (deterministic pulse detection approximation), the expected system operation comprises the following steps:
1) Retain in local buffers a number of secondary system readout samples and their corresponding clock information k.
2) Wait until a primary trigger acceptance indication is received for k = n.
3) Select the k = n − $\delta^-$, ..., n, ..., n + $\delta^+$ secondary system readout samples and forward them to the MF, where Equation (5) is evaluated. Here $\delta^-$ and $\delta^+$ are, respectively, the lower and upper limits of the typical detector pulse, such that N = $\delta^+$ + $\delta^-$ + 1 is the pulse length of the typical event of interest.
4) Compute the MF decision.
5) Build the data fragment containing information from both the primary trigger and MF modules.
6) Send the data fragment as the final system output.
It is worth mentioning that step 1 is executed in free-running mode during the entire time interval in which the system is operational.
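The six steps above can be illustrated with a short software model. This is only a behavioural sketch, not the FPGA design: the class and function names, the pulse template, the threshold value and the symmetric window of three samples on each side are all assumptions chosen for illustration.

```python
from collections import deque


def matched_filter(window, template):
    """Linear correlation between the observed window and the expected pulse."""
    return sum(y * s for y, s in zip(window, template))


class SecondaryBuffer:
    """Free-running buffer of (clock index, sample) pairs (step 1)."""

    def __init__(self, depth=256):
        self.samples = deque(maxlen=depth)  # oldest entries are overwritten

    def push(self, k, value):
        self.samples.append((k, value))

    def window(self, n, delta_minus, delta_plus):
        """Samples k = n - delta_minus, ..., n + delta_plus, or None if any is missing."""
        by_clock = dict(self.samples)
        wanted = range(n - delta_minus, n + delta_plus + 1)
        if any(k not in by_clock for k in wanted):
            return None
        return [by_clock[k] for k in wanted]


def validate_trigger(buffer, n, template, threshold, delta_minus=3, delta_plus=3):
    """Steps 2 to 5: select the window, run the MF and build a fragment."""
    window = buffer.window(n, delta_minus, delta_plus)
    if window is None:
        return None  # samples not (or no longer) available
    mf_output = matched_filter(window, template)
    return {
        "clock": n,
        "samples": window,
        "mf_output": mf_output,
        "accepted": mf_output > threshold,
    }


# Hypothetical 7-sample pulse shape, centred at clock index k = 10.
template = [0, 1, 3, 5, 3, 1, 0]
buf = SecondaryBuffer()
for k in range(20):
    buf.push(k, template[k - 7] if 7 <= k < 14 else 0)

fragment = validate_trigger(buf, n=10, template=template, threshold=20)
print(fragment)  # step 6 would send this fragment to the output link
```

In this model a primary trigger at a clock index whose window contains no pulse yields an MF output below the threshold and is therefore not confirmed.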

B. Proposed Architecture
In order to meet the specifications presented in the previous subsection, a digital architecture comprising four main blocks was developed (see Figure 3). The packet controller modules are responsible for receiving and storing, in free-running mode, information of interest from both the primary trigger signal and the secondary system readouts, allowing access to it when a trigger occurs. As illustrated in Figure 4 for the ADC packet controller, these modules internally comprise two memory levels and a complementary control logic block. The first level, like the other blocks, operates at the system clock rate (the same frequency as the analog-to-digital converters) and stores the received samples in a circular memory with 256 positions. One of the specific functions of the ADC packet controller is to compensate for delays between the ADC interfaces. Once identified, the delay value is received through a configuration register, which can be accessed via a dedicated data bus. When a trigger is generated, the ADC packet controller receives a signal indicating its occurrence along with the address of the Level 1 memory related to the event. The final read address is found by adding the previously configured delay value to the received address. The value read at the resulting address is then stored in a second memory level (l2_mem).
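The addressing scheme of the ADC packet controller can be sketched in software as below. This is a minimal behavioural model under stated assumptions: the class and method names are hypothetical, the delay is treated as a signed offset, and the addition wraps around the 256-position circular memory; only the memory depth and the name l2_mem come from the text.

```python
class AdcPacketController:
    """Behavioural model of the two memory levels of one ADC channel."""

    L1_DEPTH = 256  # circular Level 1 memory size given in the text

    def __init__(self, delay=0):
        self.l1_mem = [0] * self.L1_DEPTH
        self.write_addr = 0
        self.delay = delay  # signed offset from the configuration register
        self.l2_mem = []    # second memory level, read out in FIFO order

    def write_sample(self, value):
        """Free-running write: store one ADC sample and advance the pointer."""
        addr = self.write_addr
        self.l1_mem[addr] = value
        self.write_addr = (addr + 1) % self.L1_DEPTH
        return addr

    def on_trigger(self, trigger_addr):
        """Copy the delay-compensated sample from Level 1 to Level 2."""
        read_addr = (trigger_addr + self.delay) % self.L1_DEPTH
        self.l2_mem.append(self.l1_mem[read_addr])
        return read_addr


# A delay of -3 selects the sample stored three positions before the
# address received with the trigger, wrapping around the circular memory.
ctrl = AdcPacketController(delay=-3)
for value in range(260):  # more than 256 writes, so the pointer wraps
    ctrl.write_sample(value)
read_addr = ctrl.on_trigger(trigger_addr=1)
print(read_addr, ctrl.l2_mem)
```

The modulo operation stands in for the natural wrap-around of an 8-bit address counter in hardware.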
The generic packet controller works similarly to the ADC packet controller. However, the signals received by the former do not have synchronization problems, since they are generated internally in the proposed system. The level two memories were designed to store secondary system samples related to up to seven different primary trigger signals and operate as First-In-First-Out (FIFO) memories.
The control unit is the module responsible for receiving the trigger signal from the selection algorithm and passing it on to the packet controllers. Since the trigger signal does not occur at a constant rate, a considerable number of events may appear within short time intervals. Besides, as the transfer of data from level 1 to level 2 memories in the packet controllers is performed sequentially, an overlap problem would occur whenever a primary trigger signal is passed on before a data transfer is completed. In order to avoid these two issues, the control unit comprises a waiting system that stores the received triggers in a dedicated FIFO memory. Thus, the next trigger signal is passed to the packet controllers only after a previously fixed number of pulse samples (the detector pulse width), which is defined considering the maximum time required to perform a transfer between level one and level two memories. Whenever the trigger queue is full, a signal is generated to indicate that the next incoming events are to be discarded.
When data are passed to the second buffer level of the packet controllers, an interrupt is generated in the generic packet controller block stating that data are ready to be sent. At this point, the data are read and sent sequentially to the output links following a specific fragment format. Each sample read from the ADC packet controllers generates a request by the generic packet controller to make the next sample of the level 2 memory available.
The generated data fragment format consists of three basic fields: fragment header, sub-fragment and fragment end. As illustrated in Figure 5, data package identification information is stored in the fragment header, and the secondary system readout data are then sent in the sub-fragment. Finally, a fragment end is sent to mark the conclusion of a data package.
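The trigger waiting system of the control unit can be modelled cycle by cycle as in the sketch below. The class name, the queue depth of eight and the spacing of seven clock cycles are assumptions made for illustration; the text only states that triggers are queued in a FIFO, released with a fixed minimum spacing, and discarded when the queue is full.

```python
from collections import deque


class TriggerControlUnit:
    """Queues incoming triggers and releases them with a minimum spacing."""

    def __init__(self, queue_depth=8, min_spacing=7):
        self.queue = deque()
        self.queue_depth = queue_depth  # hypothetical FIFO depth
        self.min_spacing = min_spacing  # cycles reserved for one L1-to-L2 transfer
        self.cooldown = 0               # cycles left before the next release
        self.discarded = 0              # triggers lost because the queue was full

    def clock(self, trigger_id=None):
        """Advance one clock cycle; return the trigger released now, if any."""
        if trigger_id is not None:
            if len(self.queue) >= self.queue_depth:
                self.discarded += 1  # queue full: the event is dropped
            else:
                self.queue.append(trigger_id)
        if self.cooldown > 0:
            self.cooldown -= 1       # a transfer is still in progress
            return None
        if self.queue:
            self.cooldown = self.min_spacing - 1
            return self.queue.popleft()
        return None


# Three back-to-back triggers are released seven cycles apart.
unit = TriggerControlUnit()
forwarded = []
for cycle in range(21):
    released = unit.clock(cycle if cycle < 3 else None)
    if released is not None:
        forwarded.append((cycle, released))
print(forwarded)
```

Releasing triggers no faster than one per transfer time is what prevents the overlap between consecutive level 1 to level 2 transfers.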
For information package control within the Packer block, a finite state machine (FSM) was used. As illustrated in Figure 6, the following five states were defined: idle, header, data_header, data and trailer, which are described below:
• the FSM is initialized in the idle state and waits until an interrupt is received, indicating that there is information ready to be processed. In this case, the FSM switches to the header state;
• the header state is responsible for sending the package header to the output link, according to the defined data fragment format;
• next, the FSM moves to the data_header state, and the subfragment header information, comprising the subfragment start bit, size and type, is sent to the output link. After this stage, the FSM moves to the data state;
• in the data state, the FIFO readout data memories are accessed and their contents forwarded to the output data link. If there is an additional sub-fragment to be sent, the FSM returns to data_header;
• after the subfragment information transmission is finished, the FSM state changes to trailer, in which the data fragment end is sent, and the FSM returns to idle.
In addition to the system functional blocks, a debug structure was inserted, allowing external access to information such as: number of sent packages; number of matched filter trigger occurrences; trigger FIFO overflow events; number of primary triggers received; and trigger identification data.
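The state sequence of the Packer FSM described above can be traced with a small software model. This is a sketch only: the state names follow the text, while the function name and the tuple-based word layout of the header, sub-fragment header, data and trailer are hypothetical placeholders for the actual fragment format.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    HEADER = "header"
    DATA_HEADER = "data_header"
    DATA = "data"
    TRAILER = "trailer"


def pack_fragment(sub_fragments):
    """Run the FSM once; return (emitted words, visited states).

    sub_fragments is a list of (kind, payload) pairs. A non-empty list
    plays the role of the interrupt that takes the FSM out of idle.
    """
    out = []
    trace = [State.IDLE]
    if not sub_fragments:
        return out, trace  # no interrupt: remain in idle
    state = State.HEADER
    index = 0
    while state is not State.IDLE:
        trace.append(state)
        if state is State.HEADER:
            out.append(("HEADER", len(sub_fragments)))
            state = State.DATA_HEADER
        elif state is State.DATA_HEADER:
            kind, payload = sub_fragments[index]
            out.append(("SUB_HEADER", kind, len(payload)))
            state = State.DATA
        elif state is State.DATA:
            kind, payload = sub_fragments[index]
            out.extend(("DATA", word) for word in payload)
            index += 1
            more = index < len(sub_fragments)
            state = State.DATA_HEADER if more else State.TRAILER
        elif state is State.TRAILER:
            out.append(("TRAILER",))
            state = State.IDLE
    trace.append(State.IDLE)
    return out, trace


words, trace = pack_fragment([("adc", [1, 2, 3]), ("mf", [45])])
print([s.value for s in trace])
print(words)
```

With two sub-fragments the model visits data_header and data twice before reaching trailer, matching the loop between those two states in the description.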

IV. RESULTS
From the specifications and the architecture of the proposed design, critical functionalities were identified. Thus, specific test scenarios were considered to verify the following aspects:
• Correct detection of a trigger;
• Selection of the data related to the received trigger, which comprises adjusting the delay between different readout channels;
• Correct packaging and sending of data;
• Trigger wait system;
• MF error due to a finite word-length (limited resolution) implementation.
As described in previous sections, the proposed embedded system is intended for online trigger systems that require data fusion from different measurement systems. In order to demonstrate practical applicability, a case study in experimental high-energy physics was considered.
The specification of some parameters was required for simulation and testing of the proposed digital circuit. These values were chosen to match a real experimental high-energy physics problem, which could appear in one of the LHC detectors [10]:
• System operational frequency: 40 MHz (the LHC nominal bunch-crossing rate);
• Expected average primary trigger rate: 100 kHz;
• Number of detector samples required to properly characterize the signal of interest: 7;
• Capacity to handle up to 32 readout channels in parallel.
Figure 7 shows the control block behaviour during the occurrence of a trigger signal. It can be observed that a primary trigger acceptance input was received at k = 1 (see the trigger (in) signal). After this, the proposed solution was able to detect it, store the identification value and inform the other system blocks that there is a trigger to be processed. At k = 5, the corresponding data samples are made available for reading, as indicated by the has_data (out) signal.
Next, the process of storing and sending the data corresponding to the generated trigger signal was tested. For this scenario, known values were applied to the inputs corresponding to the ADCs while trigger signals were injected at a known frequency. The output values were then compared with the expected ones. No error was observed, demonstrating proper module operation. Figure 8 shows the process of transferring ADC samples between levels one and two of the memories. As a delay of three units was configured, the desired sample is located three samples before the received address.
Another test concerned the validation of the trigger waiting system. For this, triggers were generated with different configurations: continuously; with random intervals; and in sequences of one to eight triggers without intervals. No errors were observed in the operation of this module.
In order to verify that the system can support the secondary system event rate, the time required to send a packet was computed. For this, the period was considered to start with a trigger event detection and end when the last related information is sent. From the collected information, a maximum data transfer rate (throughput) of 360 kHz was obtained. Since primary trigger events were considered to occur, on average, at 100 kHz, packets can be sent more than three times faster than triggers arrive on average. This feature makes it possible to compensate for any overloads in trigger occurrences. From the tests carried out, it was observed that Level 2 memories with a depth of eight would be enough to avoid packet losses.
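The margin between the sending and arrival rates follows from simple arithmetic on the figures quoted above. The short calculation below only restates those numbers in clock cycles; the variable names are illustrative.

```python
CLOCK_HZ = 40_000_000       # system clock (LHC nominal bunch-crossing rate)
PACKET_RATE_HZ = 360_000    # maximum packet throughput obtained in simulation
TRIGGER_RATE_HZ = 100_000   # expected average primary trigger rate

# Clock cycles needed to send one packet, and the average number of
# clock cycles between two consecutive primary triggers.
cycles_per_packet = CLOCK_HZ / PACKET_RATE_HZ
cycles_between_triggers = CLOCK_HZ / TRIGGER_RATE_HZ

# Ratio between sending capacity and average demand, and its inverse,
# the average fraction of time the output link is busy.
headroom = PACKET_RATE_HZ / TRIGGER_RATE_HZ
utilization = TRIGGER_RATE_HZ / PACKET_RATE_HZ

print(f"cycles per packet:       {cycles_per_packet:.1f}")
print(f"cycles between triggers: {cycles_between_triggers:.1f}")
print(f"headroom factor:         {headroom:.2f}")
print(f"average link occupancy:  {utilization:.2%}")
```

A packet therefore takes about 111 clock cycles to send, against an average of 400 cycles between triggers, which is why short bursts can be absorbed by the Level 2 memories.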
After completing the simulation process, the data fragment builder design was synthesized into an XC6SLX150T device from Xilinx's Spartan 6 family [18], which is currently among the most powerful devices. From the synthesis results, data such as power consumption, timing and used FPGA resources were extracted (Table I). From these results, it is possible to observe that, for the worst-case data path, the setup and hold slacks are positive, indicating that the proposed design does not exhibit timing problems. Moreover, despite the quite high utilization of the FPGA memories, other resources such as look-up tables (LUTs) and registers were not significantly used. Likewise, the power consumption is proportional to the resource utilization.
Once the proposed implementation of the packet controller module was validated, a study of the finite word-length effects on the FPGA MF module implementation was performed. For this, a reference filter model was developed using 32 bits for filter coefficient quantization. This was considered to be the target model. Different implementations using from 7 up to 31 bits were generated and compared to the reference model, considering aspects such as the relative mean error of the filter outputs and the MF detection disagreement.
The filter output relative mean error (RME, in %) was defined as:
$$ \mathrm{RME}(x) = \frac{100}{N} \sum_{n=1}^{N} \left| \frac{y_{\mathrm{Ref}}[n] - y_x[n]}{y_{\mathrm{Ref}}[n]} \right| \qquad (6) $$
in which x is the number of quantization bits (word-length), $y_{\mathrm{Ref}}$ and $y_x$ are the reference filter and the x-bit filter outputs, respectively, and N is the number of considered readout samples. In this analysis, N = 70,000 simulated readout samples were used. As illustrated in Figure 9, the RME increases considerably for x ≤ 11 bits. For 12 ≤ x ≤ 15 the error is very small (below 0.01%), and it tends to zero for x > 15.
As the purpose of the MF is to perform signal detection, the effects of finite word-length on the classification results were also evaluated. For this, a detection disagreement (mismatch) index was defined as:
$$ \mathrm{Dis}(x) = \frac{NDis_x}{N} \qquad (7) $$
where $NDis_x$ is the number of disagreements between the decisions of the x-bit MF implementation and of the 32-bit implementation (considered here as the reference model).
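A word-length study of this kind can be reproduced in software along the lines of the sketch below. Everything specific in it is an assumption made for illustration: the coefficient values, the test windows, the threshold, and the quantization scheme (signed fixed point with rounding and saturation, coefficients of magnitude below one); the source does not state which scheme was used.

```python
def quantize(coeffs, bits):
    """Round coefficients to signed fixed point with `bits` total bits.

    Assumes |c| < 1, so one bit is the sign and the rest are fractional.
    """
    scale = 2 ** (bits - 1)
    return [max(-scale, min(scale - 1, round(c * scale))) / scale for c in coeffs]


def mf_output(window, coeffs):
    """Matched filter output: correlation of a window with the coefficients."""
    return sum(y * c for y, c in zip(window, coeffs))


def relative_mean_error(reference, test):
    """Mean of |ref - test| / |ref| in percent, skipping zero references."""
    terms = [abs(r - t) / abs(r) for r, t in zip(reference, test) if r != 0]
    return 100.0 * sum(terms) / len(terms) if terms else 0.0


def disagreement(reference, test, threshold):
    """Number of windows on which the two filters decide differently."""
    return sum((r > threshold) != (t > threshold) for r, t in zip(reference, test))


# Hypothetical 7-tap pulse shape and three scaled copies as test windows.
coeffs = [0.05, 0.2, 0.6, 0.9, 0.6, 0.2, 0.05]
windows = [[a * c for c in coeffs] for a in (1.0, 2.0, 3.0)]

# The 32-bit quantized filter plays the role of the reference model.
reference = [mf_output(w, quantize(coeffs, 32)) for w in windows]

for bits in (8, 12, 16):
    outputs = [mf_output(w, quantize(coeffs, bits)) for w in windows]
    rme = relative_mean_error(reference, outputs)
    mismatches = disagreement(reference, outputs, threshold=2.0)
    print(bits, rme, mismatches)
```

Dividing the mismatch count by the number of windows gives a disagreement index that can be compared across word-lengths and thresholds.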
As shown in Figure 10, the detection disagreement varies with both the number of bits and the decision threshold used (γ). It was observed that the mismatch increases for x ≤ 12, especially for γ around 0.54 and 1.11. For x > 15 the disagreement tends to zero, which means that, in terms of the classification decision, such implementations are approximately equivalent.
The effect of varying the word-length was also observed after FPGA synthesis of the digital circuit, considering LUT and register utilization. As shown in Figure 11, device utilization increases with the number of bits (x), as the number of logical and arithmetic operations is directly related to that parameter. A considerably different behavior was observed for an 18-bit implementation. This may have occurred due to some optimization in the computer-aided hardware design software, which was able to better divide the required logic into registers and LUTs.
From the finite word-length analysis, using between 16 and 18 bits is enough for MF operation when the 32-bit implementation is used as a reference. Thus, device occupation may be considerably reduced without affecting system performance.

V. CONCLUSION
Efficient online event detection is particularly important in different applications. This work presented an embedded digital solution to improve the efficiency of online detection when information from different instrumentation systems is required to reduce the occurrence of false triggers. A matched filter discriminator was proposed to validate primary trigger decisions by analyzing signatures produced in a secondary instrumentation system. For system validation, a case study considering an experimental high-energy physics problem was used. An FPGA implementation was designed for the data control module and validated through simulation in computer-aided hardware design software. The matched filter module was analyzed with respect to finite word-length effects, showing that device occupation may be reduced without affecting system discrimination efficiency. The proposed solution may be adapted for use in different applications such as modern particle detectors, telecommunications and industry.

Fig. 3. Top view diagram of the proposed data fragment builder module.

Fig. 7. Digital signals temporal diagram in the occurrence of a trigger.

Fig. 8. Data transfer from the ADC to the Level two memory.

Fig. 9. Relative mean error (RME, in %) of the MF outputs considering different numbers of quantization bits (word-length). The reference model refers to a 32-bit implementation.

Fig. 10. Detection disagreement as a function of the word-length and the decision threshold.

Fig. 11. FPGA resource utilization as a function of the number of quantization bits (word-length).
The four main blocks shown in Fig. 3 are: a Synchronization Trigger Control Unit; a 32-bit Generic Packet Controller; an 8-bit ADC packet controller; and a Packer module. The proposed module is referred to here as the data fragment builder.