# The New Readout Architecture for the CMS ECAL

The CMS ECAL Front-End Electronics group M. Hansen

> CERN, 1211 Geneva 23, Switzerland Magnus.Hansen@cern.ch

# Abstract

The relatively high cost of optical data transmission, combined with the decreasing cost of radiation-tolerant electronics, has led to a change in the architecture of the readout electronics for the CMS ECAL sub-detector. The generation of trigger primitives and the primary event buffers have been moved into the experiment, allowing the optical fibre count to be reduced by an order of magnitude.

A new module, the Front End card (FE), serves one trigger tower in the CMS ECAL barrel or one super-crystal in the CMS ECAL end-cap. The main functions are implemented in a new Application Specific Integrated Circuit (ASIC), named FENIX (Front End New Intermediate data eXtractor). The ASIC contains the DSP for trigger primitive generation and RAM for the digital pipeline and the primary event buffers.

The ASIC is implemented in a radiation-tolerant process and is expected to be radiation hard, but it will be subject to Single Event Upsets (SEU). In order to cope with SEU, configuration registers are implemented as triple-redundant registers, data paths are protected with a single parity bit, and the data in the RAM blocks are protected with an error correction code (ECC).

The development of the FENIX ASIC functionality is accelerated by the parallel development of an FPGA, using identical source code.

## I. INTRODUCTION

Even though the architecture of the CMS ECAL trigger and data acquisition system has changed, the functional requirements have remained unchanged for a number of years. The main functionality to be provided is listed under system requirements below.

### A. System Requirements

- a) Trigger primitive generation. The trigger primitive generation requires an absolute calibration of each channel. The algorithm is well developed, described, and understood. A strict latency budget is imposed in order not to waste expensive pipeline memory space, especially in sub-detectors using analogue temporary storage.
- b) Readout of data corresponding to a positive level 1 trigger decision. In particular, the data acquisition system has to be dead-time free for a sustained trigger rate of up to 100 kHz. The system also has to be able to cope with a modified level 1 trigger latency.
- c) Monitoring. The system has to support monitoring of crystal deterioration due to irradiation, as well as high-precision crystal temperature measurement; a precision of a tenth of a degree is required.

### B. History

In 1996 a decision was taken to implement a strict minimum of the readout system functionality inside the experiment in order to achieve maximum flexibility in the trigger and readout algorithms. Also, because the modularity is that of a single readout channel, a single data link failure affects only one channel.

This architecture is here called the ROSE (Read-Out System ECAL) architecture.



Figure 1: The old ROSE readout block diagram

The large number of data links makes the system expensive to implement. Despite certain advantages, it was considered to be too expensive.

### C. The New Architecture

During the electronics review in February 2002, it was observed that the large number of data links in the ROSE system architecture carry almost only zeros.

Following this review, a study of a modification to the readout system architecture was launched.



Figure 2: The new ECAL readout system block diagram

The outcome of the study suggests that implementing more functionality in the front end, using modern radiation-hard or radiation-tolerant technology, opens the way to decreasing the data link count by an order of magnitude. In particular, the trigger primitive generation, the digital pipeline, and the primary event buffers need to be moved onto the ECAL detector. In turn, this requires a well-developed clock and control link to the front end in order to transmit the level 1 trigger decision as well as slow control for configuration. The development cost of the radiation-tolerant electronics needed in the front end could possibly be kept within limits, so as to achieve a lower cost to completion for the detector.

## II. A NEW TOWER SUBUNIT



Figure 3: Front End system drawing with Motherboard, VFE card, LVR card, FE card, and cooling system.

### A. Motherboard

The entirely passive motherboard provides a "flat" surface to install the electronics on. The analogue connection to the detector is made through kapton cables. The analogue power for the VFE cards is distributed through the motherboard.

### B. Very Front End card

Two ASICs together implement analogue dynamic range compression by digital gain selection, i.e. the gain selection is performed after digitization. Both circuits are new. The MGPA [1] is a three-gain preamplifier, and the Multi-ADC [2] is a four-channel 12-bit ADC with logic that selects the converted value of the highest non-saturated gain as the output of the ADC.
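The selection rule can be illustrated with a minimal Python sketch. It is not the Multi-ADC logic itself: the function name `select_gain`, the ordering of the input samples, and the use of the 12-bit full-scale code as the saturation criterion are assumptions made for illustration.

```python
from typing import Sequence, Tuple

ADC_BITS = 12
ADC_FULL_SCALE = (1 << ADC_BITS) - 1  # 4095 for a 12-bit converter


def select_gain(samples_high_to_low: Sequence[int],
                saturation_threshold: int = ADC_FULL_SCALE) -> Tuple[int, int]:
    """Return (gain_index, value) for the highest gain that is not saturated.

    samples_high_to_low holds the digitised values of one pulse sample,
    ordered from the highest gain (index 0) to the lowest gain.
    If every range is saturated, the lowest gain is returned.
    The saturation criterion is an assumption of this sketch.
    """
    if not samples_high_to_low:
        raise ValueError("at least one gain range is required")
    for gain_index, value in enumerate(samples_high_to_low):
        if value < saturation_threshold:
            return gain_index, value
    last = len(samples_high_to_low) - 1
    return last, samples_high_to_low[last]
```

For a small signal the highest gain is kept, for example `select_gain([1200, 200, 100])` yields index 0. Once the highest range reaches full scale, the next lower range is chosen instead.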

### C. Low Voltage Regulator card

The low voltage regulator card houses the linear voltage regulators needed to supply power to the complete front end system.

### D. Front End card

The majority of the functions moved from the upper level readout system onto the detector are implemented in the front end card (FE). The following sections mainly address the FE card and its sub-components.

### E. The Gigabit optical Hybrid

The Gigabit optical Hybrid (GOH) implements a data link transmitter, including a serializer (GOL) and a laser diode.

### F. Upper level readout system

The upper level readout system is implemented using four components, namely CCS (clock and control system), TCC (Trigger Concentrator Card), DCC (Data Concentrator Card), and the SRP (Selective Readout Processor). The upper level readout system is placed in the counting room. A global block diagram of the complete readout system can be found in figure 4.



Figure 4: System block diagram

## III. THE FRONT END CARD

The Front End Card is the heart of the front end system. Data is received at 40 Mwords per second from five VFE cards housing five electronics channels each. Services, such as the sampling clock and slow control, are provided to each of the VFE cards. The FE card receives regulated power from the LVR card for its internal operation and distributes power to the digital parts of the VFE cards.

Hence, all connections to the VFE cards and to the LVR card are made on the back side of the FE card (see figure 5).

On the front side of the FE card, all main functions are implemented in about 12 ASICs, and the connections to the data links and to the flat cables for the token ring are made (see figure 6).



Figure 5: (left) Back side of the FE card; connectors to the VFE cards and to the LVR card.

Figure 6: (right) Front side of the FE card; ASICs providing the main functionality of the FE card and connectors to data links and to token ring.

As mentioned above, two main tasks are performed in the FE card, namely trigger primitive generation for the level 1 trigger and readout of the raw data time frames corresponding to a level 1 trigger accept. The clock and control system is identical to the one for the CMS tracker system [7], and will not be further discussed but referred to as CCS.

### A. Trigger primitive generation

The trigger primitive generator is implemented in two steps described below.

First, filtered strip sums are created, adding five absolutely calibrated channels and applying the result to a five or six tap Finite Impulse Response (FIR) filter. A simple peak finder is applied to the result in order to assign the filtered result to a single bunch crossing.
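A minimal Python sketch of this first step is given below, assuming integer samples and zero history before the first sample. The function names are hypothetical and the filter weights are left as a parameter, since the actual weights and the exact peak criterion used in the FENIX are not given here.

```python
from typing import List, Sequence


def strip_sums(channels: Sequence[Sequence[int]]) -> List[int]:
    """Add the calibrated samples of the channels in one strip, sample by sample."""
    return [sum(samples) for samples in zip(*channels)]


def fir_filter(samples: Sequence[int], taps: Sequence[int]) -> List[int]:
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n < 0] = 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out


def peak_finder(filtered: Sequence[int]) -> List[int]:
    """Keep a value only where it is a local maximum; output zero elsewhere.

    The local-maximum test is an assumed, simple criterion.
    """
    out = [0] * len(filtered)
    for n in range(1, len(filtered) - 1):
        if filtered[n - 1] < filtered[n] >= filtered[n + 1]:
            out[n] = filtered[n]
    return out
```

With five identical channels and illustrative taps `[1, 2, 3, 2, 1]`, a single pulse sample spreads over five clocks after filtering, and the peak finder assigns the energy to one of them only.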

Second, the five filtered strip sums are added together in adjacent pairs. Each pair sum is compared to the total sum of the five strip sums. If the total energy is contained in any pair of strips, this information is sent together with the estimated energy to the level 1 trigger system. The total latency of the trigger primitive generation process is 11 clocks, as seen in figure 7.
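The pair comparison of this second step can be sketched as follows. The function name and the `min_fraction` parameter are assumptions: the text only states that the total energy must be contained in one pair, which corresponds to the default value of 1.0.

```python
from typing import Sequence, Tuple


def trigger_primitive(strip_energies: Sequence[int],
                      min_fraction: float = 1.0) -> Tuple[int, bool]:
    """Return (total_energy, contained_in_one_pair) for five strip sums.

    min_fraction is a hypothetical knob: the fraction of the total that
    an adjacent pair must hold for the energy to count as contained.
    """
    if len(strip_energies) != 5:
        raise ValueError("a trigger tower is built from five strips")
    total = sum(strip_energies)
    # Four adjacent pairs: (0,1), (1,2), (2,3), (3,4).
    pair_sums = [strip_energies[i] + strip_energies[i + 1] for i in range(4)]
    contained = total > 0 and any(p >= min_fraction * total for p in pair_sums)
    return total, contained
```

A narrow deposit such as `[0, 30, 20, 0, 0]` sets the flag, while a deposit spread evenly over all five strips does not.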

Figure 7: FE card trigger path delay simulation (waveforms shown: clock, channel_1_1_in, strip1_sum, trigger_primitive)

### B. Readout of Triggered data

The readout of the data accepted by the level 1 trigger is done in three steps described below.

First, while calculating the trigger primitive, all data words are temporarily stored in a digital pipeline. The length of the pipeline is programmable in order to correspond exactly to the latency in the global level 1 trigger process.
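The behaviour of such a programmable-length pipeline can be sketched in Python as a fixed-length delay line. The class name and interface are hypothetical, and the hardware uses RAM rather than a software queue.

```python
from collections import deque


class DigitalPipeline:
    """Fixed-latency delay line: a word written now is read back
    exactly `latency` clocks later."""

    def __init__(self, latency: int, fill: int = 0) -> None:
        if latency < 1:
            raise ValueError("latency must be at least one clock")
        # Pre-filled so that the first `latency` reads return `fill`.
        self._cells = deque([fill] * latency, maxlen=latency)

    def clock(self, word: int) -> int:
        """Advance one clock: store `word`, return the oldest stored word."""
        oldest = self._cells[0]
        self._cells.append(word)  # maxlen silently drops the oldest cell
        return oldest
```

Setting `latency` to the level 1 trigger latency means that the word leaving the pipeline on a given clock is the one the trigger decision arriving on that clock refers to.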

Second, when the level 1 trigger decision comes back through the TTC system and through the CCS, a time frame for every channel is transferred from the pipeline to the primary event buffer. In case of a level 1 trigger reject, the data corresponding to the rejected bunch crossing is discarded. The primary event buffer has a capacity of 25 events with the nominal time frame length of 10 samples. The readout service time of 7.2 µs yields an overflow probability of the order of $10^{-8}$. In case of overflow, valid events with dropped payload are issued.
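As a rough consistency check on these figures (a back-of-the-envelope estimate, not taken from the original text), the average load on the readout link at the maximum sustained trigger rate can be written with the assumed symbols $f_{\mathrm{L1}}$ for the trigger rate and $t_{\mathrm{service}}$ for the readout service time:

$$\rho = f_{\mathrm{L1}} \cdot t_{\mathrm{service}} = 100\,\mathrm{kHz} \times 7.2\,\mu\mathrm{s} = 0.72$$

Since $\rho < 1$, the buffer is emptied faster on average than it is filled, so an overflow requires a fluctuation in trigger arrival times large enough to occupy all 25 event slots at once.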

Third, as soon as the readout data link is available for the current event, the data from each of the 25 channels together with an event encapsulation consisting of event identification and a trailer word is sent to the upper level readout system in the counting room.

Throughout the complete chain of operation, including the data link, each data word carries a single odd parity bit in order to detect a single bit error. The event header contains the sender ID, the local event ID, and the bunch crossing number. The trailer consists of a vertical even parity covering all active data bits (14 bits). All header words, with the exception of the CRC, have a unique identity code.
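The two parity schemes can be sketched in Python as follows. The function names are hypothetical, and the bit ordering inside a word is not specified in the text, so the sketch only illustrates the arithmetic.

```python
from typing import Iterable

DATA_BITS = 14
DATA_MASK = (1 << DATA_BITS) - 1


def odd_parity_bit(word: int) -> int:
    """Parity bit that makes the number of ones (data + parity) odd."""
    ones = bin(word & DATA_MASK).count("1")
    return 0 if ones % 2 == 1 else 1


def check_odd_parity(word: int, parity_bit: int) -> bool:
    """True if the word and its parity bit together hold an odd number of ones."""
    ones = bin(word & DATA_MASK).count("1") + (parity_bit & 1)
    return ones % 2 == 1


def vertical_even_parity(words: Iterable[int]) -> int:
    """Column-wise even parity: bit i of the result makes the number of
    ones in column i (over all words plus the result) even."""
    acc = 0
    for word in words:
        acc ^= word & DATA_MASK
    return acc
```

A receiver can check a whole event by XOR-ing all data words with the trailer word; with no errors, the result is zero.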

## IV. THE DEVELOPMENT OF THE FENIX ASIC

In order to perform all the tasks required on the FE card, it was decided to make a single chip with three, or actually four, operation modes. The operation mode is selected using two external pins.

### A. FENIX chip operation modes

#### 1) Strip

The FENIX chip running in Strip mode calculates filtered strip sums for the trigger primitive generation, and contains the pipeline as well as the primary event buffers for five channels.

#### 2) TCP

The FENIX chip running in TCP mode finalises the trigger primitive for one trigger tower in the ECAL barrel.

#### 3) DAQ

The FENIX chip running in DAQ mode controls the readout of five FENIX chips running in Strip mode. It encapsulates the event by adding a tower ID, an event ID, a bunch crossing ID, as well as a CRC in the trailer word.

#### 4) MEM

The MEM operation mode is a mixture between Strip mode and DAQ mode. An encapsulated event is created for five channels. The MEM operation mode is meant to be used for reading out the ADCs monitoring the laser injection system.

### B. Control

The front end system uses the CMS tracker clock and control system for fast and slow control.

#### 1) Fast control

The fast control is distributed by suppressing clock edges in the global clock distributed by the CCS. The missing clock edges are detected in the tracker PLL ASIC. For every missing clock edge, a signal called T1 is set high for one clock cycle. Every command is encoded using three consecutive clocks. The codes used in the ECAL are listed in table 1.

| Code    | Use          | Meaning                             |
|---------|--------------|-------------------------------------|
| 100     | Lvl1 trigger |                                     |
| 101     | BC0          | Sets local BC counter               |
| 110     | Re-synch     | Reset all state machines            |
| 111     | Force VFE    | Force gain and test pulse injection |
| 110110  | Pwup_reset   | Re-initialises FENIX ASIC           |
| 1100110 | Pwup_reset   | Re-initialises FENIX ASIC           |

Table 1: Fast control encoding
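Decoding the three-clock commands of table 1 from the T1 signal can be sketched in Python as below. This is an illustrative model, not the FENIX decoder: it assumes that T1 is sampled once per clock, that a command starts at the first high sample after idle, and it does not treat the longer Pwup_reset sequences separately (110110 would be reported as two consecutive Re-synch commands).

```python
from typing import List, Sequence, Tuple

# Three-clock codes from table 1, keyed by the T1 value on each clock.
FAST_COMMANDS = {
    (1, 0, 0): "Lvl1 trigger",
    (1, 0, 1): "BC0",
    (1, 1, 0): "Re-synch",
    (1, 1, 1): "Force VFE",
}


def decode_t1(stream: Sequence[int]) -> List[Tuple[int, str]]:
    """Return (start_clock, command) for every complete command in `stream`.

    `stream` holds one T1 sample (0 or 1) per clock cycle.
    """
    commands = []
    i = 0
    while i < len(stream):
        if stream[i] == 1 and i + 3 <= len(stream):
            code = tuple(stream[i:i + 3])
            commands.append((i, FAST_COMMANDS[code]))
            i += 3  # a command occupies three consecutive clocks
        else:
            i += 1  # idle clock, or an incomplete command at the end
    return commands
```

Because every code starts with a high T1 sample, a lone missing clock edge followed by two normal clocks decodes as a level 1 trigger, which keeps the most frequent command the shortest disturbance of the clock.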

#### 2) Slow control

I2C with extended 10-bit addressing [5] is used for the FENIX slow control interface. The interface provides direct access to all set-up registers, of which there are of the order of 150. All set-up registers are given a reasonable value at power-up reset, so that the system is fully functional without any register having to be written. The interface, which is fully synchronous to the main 40 MHz clock, is compatible with the CCU I2C master ports.

### C. HDL description

The description is written in VHDL. A generic coding style makes it possible to synthesize the same source both to a standard-cell ASIC and to a functionally identical FPGA. The RAM blocks have been implemented with special attention. All functional simulations have been done using a generic standard RAM, specified to be compatible with the corresponding Xilinx RAM block and with the ASIC RAM cell [4]. For the FPGA, the Xilinx RAM has been wrapped and instantiated, and for the ASIC the modular static RAM cell has been treated in the same manner. Post-layout simulation has been performed in order to verify conformity to the functional simulation model.

### D. FPGA emulator

A Xilinx Virtex 2 device, the XC2V1000, was chosen as the device for the emulator. The observable functionality of the emulator is identical to that of the ASIC, but in order to save resources a few internal features have been left out. To be precise, the triple-redundant registers, the ECC in the RAM blocks, and the RAM BIST functions have not been implemented. It should be mentioned that the FPGA is 60% full. If all features were to be implemented, it would require 150% of the device, which is not feasible.

### E. ASIC implementation

The ASIC has been implemented using Synopsys for synthesis and Silicon Ensemble for place and route. The design flow provides a very short design turnaround of 2 weeks, and in theory even less than a week.

#### 1) Strategy to cope with Radiation

Implemented in a 0.25 µm radiation-tolerant process [3], the ASIC is estimated to be radiation hard but subject to Single Event Upsets (SEU) [6]. Common strategies have been used to prevent an SEU from disturbing the correct functioning of the system: triple-redundant registers have been used in set-up registers and in state machines, and an error correction code (ECC) has been added to the data in all RAM blocks.

Some coding tricks have been used in order to implement the structures in figure 8 and figure 9.

i) Set-up registers



Figure 8: Triple-redundant set-up register

The structure shown in figure 8, with the three redundant registers set up as a shift register, is immune to an SEU in any one of the three registers, as well as to single spikes on the write enable signal or on the data to be written. However, almost any hardware failure is fatal. Adding the testability flag makes it possible to test the structure even without internal scan chains.
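The principle of triple redundancy can be illustrated with a generic majority-voted register in Python. This is a textbook sketch and not the shift-register arrangement of figure 8; the class, its `upset` and `scrub` methods, and the way `seu_flag` is derived are assumptions made for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote."""
    return (a & b) | (a & c) | (b & c)


class TripleRedundantRegister:
    """Register stored three times; reads return the voted value."""

    def __init__(self, value: int = 0, width: int = 8) -> None:
        self._mask = (1 << width) - 1
        self._copies = [value & self._mask] * 3

    def write(self, value: int) -> None:
        self._copies = [value & self._mask] * 3

    def read(self) -> int:
        a, b, c = self._copies
        return majority(a, b, c)

    @property
    def seu_flag(self) -> bool:
        """True while the three copies disagree (assumed flag behaviour)."""
        a, b, c = self._copies
        return not (a == b == c)

    def upset(self, copy_index: int, bit: int) -> None:
        """Model an SEU by flipping one bit in one copy."""
        self._copies[copy_index] ^= (1 << bit) & self._mask

    def scrub(self) -> None:
        """Rewrite all copies with the voted value to clear the upset."""
        self.write(self.read())
```

After a single upset the voted value is unchanged and the flag is raised; rewriting the register with the voted value restores all three copies.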

ii) State machine and counters



Figure 9: SEU safe state machine register

The structure shown in figure 9, with the three redundant registers set up as a shift register in case of non-changing data, is immune to an SEU in any one of the three registers. As is the case for the set-up register, almost any hardware failure is fatal. Adding the testability flag makes it possible to test the structure even without internal scan chains, and the structure remains synthesizable.

iii) ECC in RAM [6]

RAM is protected with an Error Correction Code (ECC) allowing for single bit error correction. In all RAM blocks, a Hamming code is added to the data before writing. Decoding and correction are performed after reading.

#### 2) FENIX ASIC testability

The target test time for the FENIX ASIC has been set to the order of 1 second, excluding chip handling.

- i) Test of set-up registers. To test the triple-redundant set-up registers, each register can be written and read back while observing the seu\_flag output pin. In case of a permanent discrepancy, the seu\_flag output pin stays permanently high. During the write, the seu\_flag pin is active for two clock cycles.
- ii) Triple-redundant state machines. During normal operation the seu\_flag output pin provides a signature of operation, hence the internal operation is somewhat observable.
- iii) RAM BIST. The RAM BIST can be launched either by activating an input pin or by writing a set-up register. During the BIST cycle the functioning can be observed on a dedicated output pin. As soon as the BIST cycle has terminated, the same pin shows the result of the BIST by staying high (error) or low (no error).

## V. SUMMARY

A new readout system architecture for the CMS ECAL has been developed. The change in architecture consists of moving functionality from the former upper level readout system to the front end in order to decrease the number of optical data links, and thereby achieve a lower total cost of the readout system. The feasibility of the system has been demonstrated by a successful beam test of a prototype system implemented using FPGAs.

## VI. REFERENCES

[1] "The MGPA electromagnetic calorimeter readout for CMS", M. Raymond, Proceedings of the 9<sup>th</sup> Workshop on Electronics for the LHC Experiments, Amsterdam, Sept., 2003.

[2] "A CMOS low power, quad channel, 12 bit, 40Ms/s pipelined ADC for applications in particle physics calorimetry", A. Marchioro et al., Proceedings of the 9<sup>th</sup> Workshop on Electronics for the LHC Experiments, Amsterdam, Sept., 2003.

[3] "Radiation Tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: Practical Design Aspects", G. Anelli et al., IEEE Transactions on Nuclear Science, Vol. 46, No. 6, pt1, pp 1690-1696, Dec. 1999.

[4] "A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25 µm CMOS technology for applications in the LHC environment", Proceedings of the 8<sup>th</sup> Workshop on Electronics for the LHC Experiments, Colmar, France, Sept. 2002.

[5] "The I2C-BUS specification", Philips Semiconductors, Version 2.1, January 2000.

[6] "SEU effects in registers and in a Dual-Ported Static RAM designed in a 0.25 µm CMOS technology for applications in the LHC", F. Faccio et al., Proceedings of the 4th Workshop on Electronics for the LHC Experiments, Rome, Sept. 1998.

[7] "A system for timing distribution and control of front end electronics for the CMS tracker", A Marchioro, Proceedings of the 3rd Workshop on Electronics for the LHC Experiments, London, Sept. 1997.