# Kchip: A Radiation Tolerant Digital Data Concentrator chip for the CMS Preshower Detector.

K. Kloukinas, P. Aspell, D. Barney, S. Bonacini, S. Reynaud

CERN EP division, 1211 Geneva 23, Switzerland Kostas.Kloukinas@cern.ch

### Abstract

The Kchip is a digital chip for the CMS Preshower detector front-end readout electronics. Its primary function is to merge up to four digital data streams into one and format them in a way suitable to be sent over an optical high speed link. The Kchip has been developed in a commercial 0.25  $\mu$ m CMOS technology using radiation tolerant layout techniques. Special design considerations were also taken in order to protect the operation of the chip against Single Event Upsets. This paper introduces briefly the front end readout architecture of the CMS Preshower silicon strip detector and presents the design of the Kchip. The experimental test results from the first prototype chip are reported.

### I. INTRODUCTION

The front-end readout electronics of the CMS Preshower detector are based on a hybrid architecture consisting of an analog pipeline memory followed by an ADC and non-zero-suppressed digital transmission to the off-detector electronics. Figure 1 shows a general block diagram of this architecture.

The sensitive elements of the CMS Preshower detector are ~4300 silicon sensors, each measuring $6.3 \times 6.3 \text{ cm}^2$ and divided into 32 strips. The signals from the silicon strips are amplified by a 32-channel preamplifier chip, named "DELTA", and are then sampled at 40 MHz and stored in a 32-channel analog pipeline memory chip, named "PACE-AM". The DELTA and PACE-AM chips are ASICs, also based on 0.25 µm CMOS technology, and together compose the PACE chipset. Upon the arrival of a trigger signal, 3 consecutive columns in the PACE-AM memory are blocked and their column addresses are written to a FIFO. A readout cycle then follows that multiplexes the analog data out of the PACE-AM at a rate of 20 MHz. The data are digitized by a 0.25 µm CMOS, quad-channel, 12-bit, 40 Msps ADC chip [1]. The digitized data are transmitted without any zero suppression to the off-detector electronics through a unidirectional Gigabit Optical Link (GOL) [2]. To use the link bandwidth more efficiently and reduce the overall number of links needed to read out the entire detector, it is necessary to merge the data streams generated by up to 4 PACE chipsets. This data concentration operation is performed by a custom-made chip named the "Kchip".

The Kchip incorporates all the functionality necessary to read out the PACE chips and to assemble a time-stamped Event Packet in a format suitable for transmission through the gigabit optical links. It monitors the operation of the PACE pipeline memories and the occupancy of the front-end buffers and identifies possible erroneous conditions. The Kchip is designed in 0.25  $\mu$ m CMOS technology with special layout techniques to ensure radiation tolerance. These techniques are described extensively in references [3] and [4].

Section II introduces the design of the Kchip whilst section III presents experimental test results from the first prototype of the chip.



Figure 1 The Preshower Front End Readout Architecture.

### II. KCHIP ARCHITECTURE

#### A. Data Path Architecture

The block diagram of the Kchip is shown in Figure 2. Upon reception of a level 1 trigger accept command (LV1A), an 8-bit Event Counter (EC) is incremented and a trigger event entry is inserted in the Trigger FIFO. The trigger entry is complemented with a Bunch Count identifier that comes from a 12-bit Bunch Counter (BC), which is incremented on every clock cycle. The EC and BC are reset upon a resynchronization command "ReSync" or a "BC0" command, which indicates the beginning of the orbit. These commands, as well as the LV1A and the PACE calibration command (CalPulse), are received from the control system and decoded by the Trigger Decoder logic on the Kchip.
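The counter and Trigger FIFO behaviour described above can be illustrated with a minimal Python sketch. It is only a behavioural toy, not the Kchip HDL: the class and method names are hypothetical, and the ordering of the counter updates within a clock cycle is an assumption that the text does not specify.

```python
from collections import deque

EC_BITS = 8    # Event Counter width, as stated in the text
BC_BITS = 12   # Bunch Counter width, as stated in the text


class TriggerLogic:
    """Toy model of the EC/BC counters and the Trigger FIFO (hypothetical names)."""

    def __init__(self):
        self.ec = 0
        self.bc = 0
        self.trigger_fifo = deque()

    def clock(self, command=None):
        """Advance one clock cycle, optionally applying a decoded command."""
        if command in ("ReSync", "BC0"):
            # Both commands reset both counters, following the text.
            self.ec = 0
            self.bc = 0
            return
        if command == "LV1A":
            # Assumption: the entry stores the EC after increment and the current BC.
            self.ec = (self.ec + 1) % (1 << EC_BITS)
            self.trigger_fifo.append((self.ec, self.bc))
        # The BC counts every clock cycle and wraps at 12 bits.
        self.bc = (self.bc + 1) % (1 << BC_BITS)


logic = TriggerLogic()
for _ in range(5):
    logic.clock()
logic.clock("LV1A")
print(logic.trigger_fifo[0])  # (event count, bunch count) of the first trigger
```

Each Trigger FIFO entry here is a plain `(EC, BC)` tuple; the real entry also carries flags, which are omitted.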

The LV1A command is distributed to the PACE chips and a readout sequence of the event samples stored in the analog memories starts. The Kchip receives the digitized samples from the ADCs through two 12-bit multiplexed data busses using Double Data Rate and LVDS signalling. The data are passed through a de-multiplexing circuit that uses both edges of the system clock to separate the two data streams to four data channels corresponding to the four PACE chips. The data are pushed into four independent Data FIFOs.
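As a rough illustration of the de-multiplexing step, the sketch below splits two double-data-rate word streams into four per-PACE channels. The assignment of clock edges and buses to channels is an assumption made for the example (words captured on the rising edge are taken to sit at even positions in each list); the paper does not give the actual mapping, and `demux_ddr` is a hypothetical name.

```python
def demux_ddr(bus_a, bus_b):
    """Split two DDR streams of 12-bit words into four channel lists.

    Assumed mapping (not from the paper): on each bus, even-indexed words
    were captured on the rising clock edge and odd-indexed words on the
    falling edge; bus A feeds channels 0-1 and bus B feeds channels 2-3.
    """
    fifos = [[], [], [], []]
    for bus_index, bus in enumerate((bus_a, bus_b)):
        for i, word in enumerate(bus):
            if not 0 <= word < (1 << 12):
                raise ValueError("ADC words must fit in 12 bits")
            channel = 2 * bus_index + (i % 2)
            fifos[channel].append(word)
    return fifos


channels = demux_ddr([1, 2, 3, 4], [5, 6, 7, 8])
print(channels)
```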







Figure 3 PACE Readout cycle.

Concurrently with the analog data the PACE chips transmit serially on a separate line (ColAddr) the address of the memory column that is currently being read out. The Kchip de-serializes this information and stores it into the "Column Address FIFO". The PACE readout sequence is shown in Figure 3. "DataValid" is a synchronization signal indicating the beginning and the end of the readout of an analog memory column.

The PACE readout sequence is replicated on the Kchip by the PACE Controller logic. This logic emulates the readout state machine of the PACE chip and generates a copy of the control signals that the Kchip receives from the PACE chips. The Error Logger logic compares these sets of signals and identifies possible loss of readout synchronization among PACE chips. Error conditions are properly flagged and associated with the corresponding received event data.

#### B. Data Rates and Buffer Sizes

On every LV1A trigger there are 96 Event Samples that are digitized at 12-bit resolution and pushed into the Kchip. At an average trigger rate of 100 kHz the traffic generated by one PACE chip is 14.4 Mbyte/sec. The traffic generated by four PACE chips is then 57.6 Mbyte/sec, which makes transmission over the 80 Mbyte/sec GOL link feasible.
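The arithmetic behind these figures can be checked in a few lines of Python. The constants are taken from the text; the variable names are illustrative, and the calculation covers the raw sample payload only, ignoring header and trailer overhead.

```python
SAMPLES_PER_EVENT = 96       # event samples per trigger, per PACE chip
BITS_PER_SAMPLE = 12         # ADC resolution
TRIGGER_RATE_HZ = 100_000    # average LV1A rate
PACE_CHIPS = 4               # data streams merged by one Kchip
GOL_MBYTE_PER_S = 80         # link capacity quoted in the text

bytes_per_event = SAMPLES_PER_EVENT * BITS_PER_SAMPLE // 8        # 144 bytes
per_pace_mbyte_s = bytes_per_event * TRIGGER_RATE_HZ / 1e6        # 14.4 Mbyte/s
total_mbyte_s = per_pace_mbyte_s * PACE_CHIPS                     # 57.6 Mbyte/s
link_utilisation = total_mbyte_s / GOL_MBYTE_PER_S                # about 0.72

print(bytes_per_event, per_pace_mbyte_s, total_mbyte_s, round(link_utilisation, 2))
```

On these numbers the payload occupies roughly 72% of the link capacity at the average trigger rate.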

The readout time of an event from the PACE chip is 6.9 µs while the readout time of an event from the Kchip is 7.8 µs. The difference in the readout times and the stochastic nature of the trigger arrivals mandate the use of data buffering on the Kchip. The size of the buffers in the Kchip determines the probability of losing an event because of momentary congestion. The sizing of the FIFOs is therefore of great importance.

The fact that the inter-arrival times of the triggers follow an exponential distribution while the Kchip service times follow a uniform distribution makes an analytic queuing model difficult to develop. Instead, a software emulation model of the full front-end readout chipset (PACE, ADC, Kchip) was developed in order to determine the size of the data FIFOs as a function of the probability of rejecting an event. Figure 4 presents a plot of the occupancy of the three different FIFOs found on the Kchip as a function of time. The simulation was run for 1.5 million events and the results of the analysis are summarized in Table 1.
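To give a feel for this kind of study, here is a deliberately simplified single-queue simulation in Python. It is not the authors' emulation model: it ignores the PACE and ADC stages and the word-level FIFO granularity, and it assumes a fixed Kchip service time of 7.8 µs and a mean trigger spacing of 10 µs. It therefore should not be expected to reproduce the numbers in Table 1. The function name and parameters are hypothetical.

```python
import random
from collections import deque


def simulate_queue(n_events, mean_interarrival_us=10.0, service_us=7.8, seed=1):
    """Return (max, mean) number of events in the system, sampled at arrivals.

    Arrivals have exponentially distributed spacing; each event needs a
    fixed service time and events are served in order (single server).
    """
    rng = random.Random(seed)
    now = 0.0
    last_departure = 0.0
    in_system = deque()          # departure times of events not yet read out
    max_occupancy = 0
    total_occupancy = 0
    for _ in range(n_events):
        now += rng.expovariate(1.0 / mean_interarrival_us)
        # Drop events whose readout finished before this arrival.
        while in_system and in_system[0] <= now:
            in_system.popleft()
        # The new event starts when the server is free.
        start = max(now, last_departure)
        last_departure = start + service_us
        in_system.append(last_departure)
        max_occupancy = max(max_occupancy, len(in_system))
        total_occupancy += len(in_system)
    return max_occupancy, total_occupancy / n_events


max_events, mean_events = simulate_queue(100_000)
print(max_events, round(mean_events, 2))
```

The maximum occupancy over a long run is the quantity that drives buffer sizing: a FIFO that can hold at least that many events would not have overflowed during the simulated period.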



Figure 4 Simulation results presenting the occupancy of the Kchip Data FIFO, Column\_Address FIFO and Trigger FIFO.

| Parameter                        | Value               | Unit  |
|----------------------------------|---------------------|-------|
| Time examined                    | $15.10^2$           | s     |
| Number of events                 | $1.5 \times 10^{6}$ |       |
| Mean interarrival time of events | 10.059              | μs    |
| PACE rejected events             | 7                   |       |
| Kchip rejected events            | 0                   |       |
| Maximum Trigger FIFO occupancy   | 26                  | words |
| Maximum Column FIFO occupancy    | 52                  | words |
| Maximum Data FIFO occupancy      | 863                 | words |
| Average Trigger FIFO occupancy   | 3                   | words |
| Average Column FIFO occupancy    | 2                   | words |
| Average Data FIFO occupancy      | 36                  | words |

Table 1 Simulation results of the front-end system indicating buffer sizes.

Taking into consideration the maximum FIFO sizes estimated from the previous analysis, two different FIFO modules were assembled using the Configurable Radiation Tolerant Dual-Ported SRAM macro cell [5]. One SRAM macro of 1024x18 bits is used to build the Data FIFOs and another of 128x27 bits is used to build the Column and Trigger FIFOs.

#### C. Packet Formatter

The main function of the "Packet Formatter" is to associate the trigger events in the Trigger FIFO with the data waiting in the Data FIFO and Column FIFO and to compose the data packet for transmission. The block diagram of the Packet Formatter is shown in Figure 5. As the input data arrive in 12-bit format, the logic re-aligns them into contiguous 16-bit blocks so as to make maximal use of the link bandwidth. The format of the assembled packets is shown in Figure 6. They are composed of the "Header", the "Data" and the "Trailer" fields. The Header comprises the "Bunch Counter", the "Event Counter", the "KID" (Kchip programmable IDentification number) and a set of flags indicating the type of event and the error conditions.
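The 12-bit to 16-bit re-alignment can be sketched as a simple bit accumulator. The sketch below is in Python and is not taken from the Kchip design: the MSB-first bit order, the zero padding of a final partial word and the function name `repack_12_to_16` are all assumptions made for illustration.

```python
def repack_12_to_16(samples):
    """Pack 12-bit samples into contiguous 16-bit words.

    Assumptions (not from the paper): samples are packed MSB-first and a
    final partial word is padded with zeros on the right.
    """
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits currently held in acc
    words = []
    for sample in samples:
        if not 0 <= sample < (1 << 12):
            raise ValueError("samples must fit in 12 bits")
        acc = (acc << 12) | sample
        nbits += 12
        while nbits >= 16:
            nbits -= 16
            words.append((acc >> nbits) & 0xFFFF)
            acc &= (1 << nbits) - 1   # keep only the bits not yet emitted
    if nbits:
        words.append((acc << (16 - nbits)) & 0xFFFF)
    return words


print([hex(w) for w in repack_12_to_16([0xABC, 0xDEF, 0x123, 0x456])])
# ['0xabcd', '0xef12', '0x3456']
```

With this packing, the 96 samples of one PACE event occupy 72 sixteen-bit words instead of 96, which is where the bandwidth saving comes from.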



Figure 5 Packet Formatter block diagram.



Figure 6 Data Packet Format.

The "Null Event" logic identifies situations where data could not be stored for a given trigger event because either the PACE pipeline or the Kchip Data FIFO was full, and assembles a special packet called "NULL". This type of packet contains no data and carries only the "Header" information with properly incremented "Bunch Counter" and "Event Counter". This helps to maintain the synchronization of the readout chain.

#### D. Gigabit Optical Link Interface

The Kchip implements a packet oriented data transmission protocol employing an error detection mechanism to identify errors that can occur on the data links. Figure 7 shows the block diagram of the GOL interface circuitry.



Figure 7 Gigabit Optical Link (GOL) Interface block diagram.



Figure 8 Gigabit Link structure (serializer-deserializer).



Figure 9 Packet format in CIMT protocol.



Control symbols: IDLE = <K28.5, D5.6> or <K28.5, D16.2> (Idle); CXT = <K23.7, K23.7> (Carrier Extend).

Figure 10 Packet format in 8b/10b protocol.

The chip is specially designed to work seamlessly with both the CIMT and the 8b/10b encoding schemes that are supported by the GOL chip. The flexibility of using both encoding schemes is achieved by choosing transmission protocol control symbols (SOF, IDLE, CXT) that are supported in both encoding schemes. Figure 9 and Figure 10 show the link protocol for the two schemes.

To protect the data transmitted through the link, a 16-bit CRC character that covers the Header and the Data fields is appended to every packet. The generator polynomial in use is the CRC-CCITT ( $x^{16} + x^{12} + x^5 + 1$ ). A special "Link Test" mode is also implemented, in which a fixed-format packet is continuously transmitted through the link upon request, in order to facilitate testing and debugging during the system installation period.
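A bitwise software version of a CRC with this generator polynomial is sketched below in Python. The polynomial $x^{16} + x^{12} + x^5 + 1$ corresponds to the constant `0x1021`. The initial register value, the MSB-first bit order and the byte-wise input are assumptions, since the paper does not state which variant the Kchip uses. The function name is hypothetical.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC using the CCITT polynomial x^16 + x^12 + x^5 + 1 (0x1021).

    Assumptions (not from the paper): MSB-first processing, no final XOR,
    and an initial value chosen by the caller (0xFFFF by default).
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Standard check values for the two common variants of this polynomial.
print(hex(crc16_ccitt(b"123456789")))             # 0x29b1 (init 0xFFFF)
print(hex(crc16_ccitt(b"123456789", init=0x0000)))  # 0x31c3 (init 0x0000)
```

A receiver would recompute the CRC over the Header and Data fields and compare it with the received value; a mismatch flags a transmission error.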

#### E. Calibration Circuit

To facilitate the calibration of the PACE chips, the Kchip provides a test pulse generator that has a programmable delay with respect to the sampling clock and a programmable duration. The calibration pulses can be stepped in increments of 25 ns and of 3.2 ns using an on-chip DLL circuit [6]. The calibration pulses are generated upon reception of a "CalPulse" trigger command. After a time equal to the (programmable) trigger latency has elapsed, a trigger signal is automatically generated and distributed to the PACE chips in order to read out the calibration event. A specially flagged trigger entry is also inserted in the Kchip Trigger FIFO. The automatic generation of the trigger event can be inhibited, leaving it to the off-detector electronics to generate this special trigger event.

#### F. Internal Registers and I<sup>2</sup>C Interface

There is a set of 25 internal registers that control and monitor the operation of the chip. There is also a 16-bit set of factory-blown fuses that gives a unique identification number to each Kchip. This identifier will be used for component tracing as well as for verifying correct cabling during system installation.

To gain access to the Kchip internal registers and to the on-chip FIFOs, a serial slow control interface has been implemented. The interface follows the I<sup>2</sup>C bus standard [7], supporting 7-bit addressing and single-byte transfers. The interface is based on a synchronous state machine design, which facilitates the implementation of the triple module redundancy scheme described in the following section. To alleviate possible metastability problems when interfacing with the I<sup>2</sup>C bus signals, synchronizer circuitry is used that double-buffers the two I<sup>2</sup>C bus lines.

#### G. SEU Tolerant Technique

The Kchip will have to operate in a radiation environment where the flux of energetic particles that can cause Single Event Upsets (SEUs) is high [8]. SEUs in the data path of the chip are not particularly harmful, since they affect the integrity of only the small amount of information that was being processed at the time of the SEU and do not lead to a loss of synchronization in the operation of the chip. However, SEUs in the control logic of the chip would cause a sustained malfunction that can only be recovered by a reset and re-initialisation operation. To avoid the need for frequent reset cycles, we have decided to protect the entire control logic of the chip against SEUs and leave the data path unprotected.

The scheme that we have employed is called Triple Module Redundancy and is shown in Figure 11. It consists of three instances of each state machine feeding a majority voting circuit. This way, if one state machine goes to a wrong state because of an SEU, the other two outvote it and the correct output is propagated to the rest of the circuit, while the correct state is fed back to the state machines. Within the next 3 clock cycles, and in the absence of another SEU, all state machines will be brought back to the correct state. All the state machines on the chip and the configuration registers are individually triplicated. The triplication logic is coded into the HDL description of the chip and synthesized.
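The majority-voting idea can be illustrated with a small behavioural model in Python. This is a sketch only, not the Kchip HDL: the class and method names are hypothetical, and in this simplified model the corrupted copy is refreshed on the very next clock edge, whereas the text quotes up to 3 clock cycles for the real circuit.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote of three state words."""
    return (a & b) | (a & c) | (b & c)


class TmrStateMachine:
    """Three copies of a state register with voted feedback (toy model)."""

    def __init__(self, next_state, initial=0):
        self.next_state = next_state          # function: state -> next state
        self.copies = [initial, initial, initial]

    def voted(self):
        a, b, c = self.copies
        return majority(a, b, c)

    def clock(self):
        # Every copy loads the next state computed from the voted state,
        # so a single corrupted copy is overwritten with the correct value.
        nxt = self.next_state(self.voted())
        self.copies = [nxt, nxt, nxt]

    def upset(self, copy_index, bit):
        """Inject a single-bit SEU into one of the three copies."""
        self.copies[copy_index] ^= 1 << bit


fsm = TmrStateMachine(lambda s: (s + 1) % 8)   # a 3-bit counter as example FSM
fsm.clock()
fsm.clock()
fsm.upset(1, 2)                 # flip bit 2 in the second copy
print(fsm.copies, fsm.voted())  # [2, 6, 2] 2
fsm.clock()
print(fsm.copies)               # [3, 3, 3]
```

The vote is taken bit by bit, so a single upset in any one copy never reaches the output, whichever bit it hits.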



Figure 11 Triplicated State Machine.

### III. FABRICATION AND EXPERIMENTAL RESULTS

#### A. Kchip Design and Fabrication

The chip was designed using a combination of standard cells and full custom design blocks. The full custom cells are the SRAM modules and the DLL circuit. The entire chip was synthesized from Verilog code and implemented using a radiation tolerant standard cell library. The entire design flow was completely scripted. The core logic of the chip contains 13,300 logic gates and 1,400 registers. The generated clock tree had a total of 189 buffers on 6 levels giving a max. delay of 685ns and max. skew equal to 65ps at the leaf cell.

The layout of the chip measures  $6 \times 5 \text{ mm}^2$ . The design is pad limited owing to the need for the two ADC parallel busses, the four sets of PACE control signals and the use of differential signalling.

A first prototype of the Kchip was developed at CERN and submitted for fabrication in a Multi Project Wafer run in February 2003. Figure 12 shows a microphotograph of the first prototype of the Kchip.

#### B. Functionality Tests

Samples of the Kchip arrived at CERN in the summer of 2003 and were bonded in ceramic PGA packages for testing. The functionality tests were performed on a digital tester in the Microelectronics Group at CERN. Two types of test vectors were generated from Verilog simulations and uploaded to the digital tester for design verification. One set contained simulation results from various operating conditions and data traffic loadings of the chip, and the other set contained automatically generated test vectors that use the scan path to test the chip. All implemented functionalities were confirmed. A minor bug in the layout design of one of the SRAM modules was identified; a fix is planned for the next submission of the chip. Testability features such as the I<sup>2</sup>C direct access mechanism to the memories and the scan path implementation on all state machines proved to be extremely beneficial.

The maximum operating frequency that we attained was 60 MHz at 2.5V. The total power consumption of the chip was measured to be 625 mW at 40MHz and at 2.5V. The power consumption of the core logic was only 170 mW while the power consumption on the periphery was 455 mW.



Figure 12 Microphotograph of the first prototype of the Kchip.

#### C. Radiation Tests

Total ionizing dose tests were performed at CERN using an X-ray source. A step irradiation was performed in 5 steps at 1 Mrad, 3 Mrad, 5 Mrad, 10 Mrad and 20 Mrad (SiO<sub>2</sub>), at a constant dose rate of 2.04 Mrad/h. The irradiation was followed by an annealing period of 24 hours at room temperature. The chips were powered and operational during the irradiation. A full set of functional tests and power dissipation measurements was carried out after each step. The results showed no measurable performance degradation in terms of maximum operating frequency and no increase in the power dissipation up to the maximum total dose of 20 Mrad. A slight drop in the power dissipation was observed instead.

### IV. CONCLUSIONS

The first full-scale prototype of the Kchip for the CMS Preshower detector has been designed and fabricated in 0.25  $\mu$ m CMOS technology with special layout techniques to ensure radiation tolerance. The design features triple module redundancy on all state machines and internal status and control registers in order to protect against SEUs.

Extensive tests performed on the first prototype verified the correct behaviour of the chip. A bug was identified in the design of one SRAM module, causing infrequent errors to appear on the output data stream of the chip. The problem is understood and a design fix will be implemented in the next submission of the chip. In-system tests of the chip, as well as SEU tests, are also under preparation.

### V. ACKNOWLEDGEMENTS

We would like to thank the microelectronics group of Rutherford Appleton Laboratory, particularly Quentin Morrisey, for helping in the adaptation of the APV25 DLL circuit in order to be integrated in the Kchip.

### VI. REFERENCES

[1] "A CMOS low power, quad channel, 12 bit, 40Ms/s pipelined ADC for applications in particle physics calorimetry", C. Fachada et al., Proceedings of the 9th Workshop on Electronics for the LHC Experiments, Amsterdam, Sept., 2003.

[2] "G-Link and Gigabit Ethernet Compliant Serializer for LHC Data Transmission", P. Moreira et al, Proceedings of the IEEE Nuclear Science Symposium and Medical Imaging Conference, Lyon, France, Oct. 2000.

[3] "Development of a Radiation Tolerant 2.0V standard cell library using a commercial deep submicron CMOS technology for the LHC experiments", K. Kloukinas, F. Faccio, A. Marchioro, P. Moreira, Proceedings of the 4th Workshop on Electronics for the LHC Experiments, Rome, Sept., 1998.

[4] "Radiation Tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: Practical Design Aspects", G. Anelli et al., IEEE Transactions on Nuclear Science, Vol. 46, No. 6, pt1, pp 1690-1696, Dec. 1999.

[5] "A Configurable Radiation Tolerant Dual-Ported Static RAM macro, designed in a 0.25µm CMOS technology for applications in the LHC environment", Proceedings of the 8th Workshop on Electronics for the LHC Experiments, Colmar, France, Sept. 2002.

[6] "The CMS Tracker APV25 0.25  $\mu$ m CMOS Readout Chip", M. Raymond et al., Proceedings of the 6th Workshop on Electronics for the LHC Experiments, Krakow, Poland, Sept. 2000.