### The Readout System of the ATLAS Liquid Argon Calorimeters

Imma Riu, University of Geneva, on behalf of the LARG ROD group

8<sup>th</sup> Workshop on Electronics for LHC Experiments, Colmar, France

10 September 2002

#### Outline:

- Introduction
- Readout requirements
- The LArgon Readout architecture
- ROD system description
- DSP test results
- Status and plans
- Summary

# Introduction

- The ATLAS detector
- The ATLAS Calorimetry
- The Liquid Argon Readout

#### The ATLAS detector

#### ATLAS: A Toroidal LHC ApparatuS



# The ATLAS calorimetry

- The ATLAS Liquid Argon calorimeter is divided into:
  - Barrel calorimeter (EMB)
  - Electromagnetic endcaps (EMEC)
  - Hadronic endcaps (HEC)
  - Forward calorimeter (FCAL)
- In total, around 190 000 channels are to be read out.





#### The challenging LArgon Readout

- Large dynamic energy range: [50 MeV, 3 TeV] → 16 bits!
- The bunch crossing (BX) rate at the LHC is 40 MHz (one BX every 25 ns): for a signal of 600 ns, the pile-up extends over up to 24 BXs.

- Pile-up and electronic noise should be minimized.
- Required relative energy resolution: ~ 10% / $\sqrt{E}$.
- Good calibration of the electronics response.

[Figure: signal after shaping; amplitude versus time (ns), with the peaking time $t_p$ and the different BXs marked.]
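The two numbers quoted on this slide (16 bits and 24 BXs) follow from simple arithmetic. The sketch below reproduces them, assuming that the lowest energy of the range (50 MeV) sets the size of one digitization step; the function names are illustrative only.

```python
import math

E_MIN_MEV = 50.0          # lower edge of the energy range
E_MAX_MEV = 3.0e6         # 3 TeV expressed in MeV
BX_PERIOD_NS = 25.0       # one bunch crossing every 25 ns (40 MHz)
SIGNAL_LENGTH_NS = 600.0  # signal length quoted on the slide


def bits_for_dynamic_range(e_min, e_max):
    """Smallest number of bits such that 2**bits steps of size e_min reach e_max."""
    return math.ceil(math.log2(e_max / e_min))


def bunch_crossings_in_signal(signal_ns, bx_period_ns):
    """Number of bunch crossings that occur during one signal."""
    return round(signal_ns / bx_period_ns)


print(bits_for_dynamic_range(E_MIN_MEV, E_MAX_MEV))               # 16
print(bunch_crossings_in_signal(SIGNAL_LENGTH_NS, BX_PERIOD_NS))  # 24
```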

# The functionality and requirements of the Readout electronics

#### Functionality and requirements

- Functionality:
  - Derive the precise energy and arrival time of calorimeter signals from discrete samplings of the pulse.
  - Perform monitoring and format the digital stream for the DAQ system.
  - Generate a 'busy' signal in case the trigger rate is too high.
- Requirements:
  - High channel density.
  - Modular design: basic components should be easily changed/upgraded.
  - Event processing time including monitoring and histogramming tasks $\leq$ 10 $\mu$s.
  - Low power consumption.



A collaboration among:

- Academia Sinica, Taiwan
- LAL Orsay
- LAPP Annecy
- LPNHE Université Paris VI
- MPI Munich
- Nevis Laboratories
- Southern Methodist University
- University of Geneva

# The Liquid Argon Readout Architecture

#### The LArgon readout architecture (I)



#### The LArgon readout architecture (II)






# FEB and ROD boards functionality



- FEB (Front End Board):
  - Radiation tolerant board.
  - 128 channels / FEB.
  - Fast signal shaping (~ 50 ns).
  - Five digitized points using three gains in the ratio 1/10/100.
  - Gain selection.
  - LArgon needs ~1600 FEBs.
- ROD (Read Out Driver):
  - Computes time, energy and shape quality flag ($\chi^2$) in $\leq$ 10 $\mu$s.
  - Use of optimal filtering algorithm.
  - Use of Digital Signal Processors (DSP).
  - Generates the 'busy' signal.
  - LArgon needs ~200 RODs.
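The board counts above can be cross-checked with a short calculation. This is a sketch assuming that each ROD serves 8 FEBs (one per input optical fiber, as given in the ROD system description); the constant names are illustrative.

```python
CHANNELS_PER_FEB = 128
N_FEBS = 1600        # approximate number of FEBs
FEBS_PER_ROD = 8     # assumption: one FEB per ROD input fiber


def total_feb_channels(n_febs, channels_per_feb):
    """Upper bound on the channel count if every FEB input were connected."""
    return n_febs * channels_per_feb


def rods_needed(n_febs, febs_per_rod):
    """Number of RODs needed to read out all FEBs (rounded up)."""
    return -(-n_febs // febs_per_rod)


print(total_feb_channels(N_FEBS, CHANNELS_PER_FEB))  # 204800
print(rods_needed(N_FEBS, FEBS_PER_ROD))             # 200
```

The 204 800 FEB inputs are an upper bound that sits slightly above the roughly 190 000 channels to be read out, and the ROD count reproduces the ~200 RODs quoted above.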

#### **Optimal filtering algorithm**

- The technique minimizes the error on E and $E \cdot t$.
- Computation of signal arrival time and energy from a set of measurements (5) using some constraints.



$$E = \sum_{i=1}^{5} a_i \cdot (S_i - Ped)$$
$$E \cdot t = \sum_{i=1}^{5} b_i \cdot (S_i - Ped)$$
$$a_i, b_i : \text{weights}; \quad S_i : \text{samples}; \quad Ped : \text{pedestal}$$
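The two sums can be illustrated with a minimal Python sketch. The weights here are not the ones used in the ROD: they are derived under stated assumptions, namely uncorrelated (white) noise, a linearised pulse $S_i \approx Ped + E\,(g_i - t\,g'_i)$, and the constraints $\sum a_i g_i = 1$, $\sum a_i g'_i = 0$, $\sum b_i g_i = 0$, $\sum b_i g'_i = -1$. The pulse shape `G` and its derivative `DG` are hypothetical values chosen only for illustration.

```python
def optimal_filter_weights(g, dg):
    """Minimum-variance weights for white noise (assumption) under the
    constraints sum(a*g)=1, sum(a*dg)=0, sum(b*g)=0, sum(b*dg)=-1."""
    q1 = sum(x * x for x in g)
    q2 = sum(x * y for x, y in zip(g, dg))
    q3 = sum(y * y for y in dg)
    det = q1 * q3 - q2 * q2
    a = [(q3 * x - q2 * y) / det for x, y in zip(g, dg)]
    b = [(q2 * x - q1 * y) / det for x, y in zip(g, dg)]
    return a, b


def reconstruct(samples, pedestal, a, b):
    """Apply E = sum a_i (S_i - Ped) and E*t = sum b_i (S_i - Ped)."""
    energy = sum(ai * (s - pedestal) for ai, s in zip(a, samples))
    energy_time = sum(bi * (s - pedestal) for bi, s in zip(b, samples))
    return energy, energy_time / energy


# Hypothetical normalised pulse shape at the 5 sampling points and its
# time derivative (per ns); not taken from the real LArgon pulse.
G = [0.0, 0.6, 1.0, 0.8, 0.4]
DG = [0.030, 0.025, 0.0, -0.012, -0.015]

a, b = optimal_filter_weights(G, DG)

# Build noise-free samples for a known amplitude and time shift.
true_e, true_t, ped = 200.0, 1.5, 1000.0
samples = [ped + true_e * (g - true_t * dg) for g, dg in zip(G, DG)]

energy, time = reconstruct(samples, ped, a, b)
print(energy, time)  # close to 200.0 and 1.5
```

In a real system the weights would be computed offline from the measured pulse shape and noise autocorrelation, so that the online work reduces to the two five-term sums.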

#### ROD system description

- Input: 8 optical fibers with FEB raw data (16 bits @ 80 MHz)
- Output: 4 optical fibers with ROD calculations (32 bits @ 40 MHz)
- Modules:
  - 9U VME64x board: ROD module (14 RODs / crate at maximum)
  - 9U VME64x board: Transition Module (TM)
  - Custom-made back plane called P3 (for TTC and busy signals)
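The link figures above translate directly into payload data rates. The sketch below only multiplies word width by clock frequency; it ignores any encoding or protocol overhead on the fibers, and the helper name is illustrative.

```python
def link_rate_mbit_s(word_bits, clock_mhz):
    """Payload rate of one link in Mbit/s (word width x word clock)."""
    return word_bits * clock_mhz


input_per_fiber = link_rate_mbit_s(16, 80)    # FEB raw data
output_per_fiber = link_rate_mbit_s(32, 40)   # ROD calculations

print(input_per_fiber)        # 1280 Mbit/s, i.e. 1.28 Gbit/s
print(output_per_fiber)       # 1280 Mbit/s, i.e. 1.28 Gbit/s
print(8 * input_per_fiber)    # 10240 Mbit/s into one ROD
print(4 * output_per_fiber)   # 5120 Mbit/s out of one ROD
```

Each output fiber therefore carries 1.28 Gbit/s, which matches the bandwidth requirement stated for the Transition Module.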



### **ROD** physical description

- ROD module:
  - ROD Mother Board (MB):
    - Implements the VME interface and the TTC, and deals with the busy signal.
    - Routes the input data to the PU boards.
    - Routes the PUs' output data to the TM after serialization.
  - 4 Processing Unit (PU) boards mounted on top of the ROD MB:
    - Perform the optimal filtering algorithm calculations.
- Transition module:
  - Transition Module board (TM):
    - De-serializes the ROD output data.
    - Sends to the ROD the Link Down and Link Full signals from the DAQ.
  - 4 S-link interface cards mounted on top of the TM:
    - Send the ROD output to the DAQ.

#### ROD module scheme



#### Processing Unit Board scheme



- Input FPGA:
  - FEB data serial to parallel conversion.
  - Data rearrangement.
  - Error checking.
- Output FPGA:
  - VME and TTC interface to the DSP.

- Digital Signal Processor (DSP):
  - Performs the optimal filtering algorithm calculations.
- FIFO:
  - Contains the DSP output data to be read by the MB.

#### Transition Module board scheme



- Requirements:
  - Data bandwidth should be 1.28 Gbit/s.
- The information coming back from the DAQ is:
  - Link down (LD)
  - Link full flag (LFF)
- FIFO:
  - Used for storing data while LD or LFF is asserted.
- DeSer (de-serializer):
  - De-serializes the data from the ROD module.
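The role of the FIFO can be pictured with a toy software model. This is only a behavioural sketch, not the TM firmware: the class name, the FIFO depth and the word-by-word interface are all hypothetical.

```python
from collections import deque


class TransitionFifo:
    """Toy model of a TM FIFO: words wait while the DAQ link is not ready."""

    def __init__(self, depth):
        self.depth = depth      # hypothetical FIFO depth in words
        self.words = deque()

    def push(self, word):
        """Store one de-serialized ROD word; return False if the FIFO is full."""
        if len(self.words) >= self.depth:
            return False
        self.words.append(word)
        return True

    def drain(self, link_down, link_full):
        """Send all stored words to the DAQ unless LD or LFF is asserted."""
        if link_down or link_full:
            return []
        sent = list(self.words)
        self.words.clear()
        return sent


fifo = TransitionFifo(depth=4)
fifo.push(0x1234)
fifo.push(0x5678)
held = fifo.drain(link_down=False, link_full=True)       # LFF asserted: nothing sent
released = fifo.drain(link_down=False, link_full=False)  # link ready: words sent
```

A `push` that returns `False` corresponds to the situation in which back-pressure has to be propagated further upstream.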

#### Data path in the ROD





## ROD requirements

- High channel density.
- Modular design: basic components should be easily changed/upgraded.
- Event processing time including histogramming tasks  $\leq$  10  $\mu$ s.
- Low power consumption.

#### **Boards comparison**

#### ROD demonstrator

(the past)

- Built in 2000.
- Board frequency: 40 MHz.
- 2 optical receivers as mezzanine in TM.
- 1 Output Slink in the Transition Module.
- 4 PUs: 1 DSP/PU, 64 channels/DSP.
- Used in Test Beams and for tests of PU.



#### ROD prototype

(the future)

- To be built in 2002.
- Parts of the board at 80 MHz.
- 8 optical links integrated in the ROD.
- 4 Slink Outputs in the TM.
- 4 PUs: 2 DSP/PU, 128 channels/DSP.



- Sending of data serialized in LVDS at 280 MHz to the TM.
- Addition of the staging FPGAs.
- Use of BGA chips.



#### Tests with the DSP

- Two PU boards equipped with two different DSPs were tested:
  - Texas Instruments DSP 6203
  - Texas Instruments DSP 6414



#### **DSP** characteristics

- TI 6203:
  - 300 MHz; 3.33 ns core cycle
  - Fixed-point arithmetic
  - Based on VelociTI<sup>™</sup>, an advanced Very Long Instruction Word (VLIW) architecture.
  - Eight 32-bit instructions/cycle
  - 875 kbytes internal memory:
    - 375 kbytes Program RAM
    - 500 kbytes Data RAM
  - 32-bit External Memory Interface (EMIF)
  - 4 DMA channels
  - 384-pin BGA package
  - 3.3V I/O, 1.5V core

- TI 6414:
  - 600 MHz; 1.67 ns core cycle
  - Fixed-point arithmetic
  - Based on VelociTI.2<sup>™</sup> VLIW architecture:
    - Includes special-purpose instructions to accelerate performance in key applications like imaging, for example support for packed data processing.
  - Eight 32-bit instructions/cycle
  - L1/L2 Memory Architecture:
    - 16 kbytes L1 Program Cache (L1P)
    - 16 kbytes L1 Data Cache (L1D)
    - 1 Mbyte L2 RAM/Cache
  - Two EMIF: 64-bit and 16-bit
  - 64 EDMA channels
  - 532-pin BGA package
  - 3.3V I/O, 1.4V core
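The clock frequencies above fix how many core cycles fit into the 10 µs processing budget. The sketch below works this out per event and per channel, assuming 128 channels per DSP and per event (the configuration used in the time measurements); the function names are illustrative.

```python
def cycles_in_budget(clock_mhz, budget_us):
    """Core cycles available in the time budget (MHz x microseconds)."""
    return clock_mhz * budget_us


def cycles_per_channel(clock_mhz, budget_us, n_channels):
    """Average number of core cycles available per channel."""
    return cycles_in_budget(clock_mhz, budget_us) / n_channels


for name, clock_mhz in (("TI 6203", 300), ("TI 6414", 600)):
    print(name,
          cycles_in_budget(clock_mhz, 10),
          cycles_per_channel(clock_mhz, 10, 128))
# TI 6203 3000 23.4375
# TI 6414 6000 46.875
```

A budget of 10 µs per event also corresponds to sustaining an event rate of up to 100 kHz.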

#### DSP time measurement

- Comparison of measured DSP processing time:
  - 128 channels per event.
  - Optimal filtering code computing E, t and $\chi^2$.
  - Histogramming of channels having E > E<sub>T</sub> (threshold E<sub>T</sub> = 1.9 ADC counts).
  - Code optimized for each DSP.
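The histogramming step in the list above can be sketched as follows. This is an illustration in Python rather than the optimized DSP code; the bin width, the number of bins and the overflow handling are hypothetical choices, and only the 1.9 ADC count threshold comes from the slide.

```python
def fill_histogram(energies, threshold, bin_width, n_bins):
    """Count channels with E > threshold; the last bin collects overflows."""
    counts = [0] * n_bins
    for energy in energies:
        if energy > threshold:
            index = min(int((energy - threshold) / bin_width), n_bins - 1)
            counts[index] += 1
    return counts


# Energies in ADC counts for a handful of channels (made-up values).
energies = [0.5, 1.9, 2.0, 5.0, 100.0]
counts = fill_histogram(energies, threshold=1.9, bin_width=1.0, n_bins=4)
print(counts)  # [1, 0, 0, 2]
```

Only channels above threshold touch the histogram memory, so the cost of this step depends on the occupancy of the event.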



Average DSP processing time versus

**Conclusion:** Both DSPs fulfill the condition $t < 10 \ \mu s$.

The DSP 6414 is faster than the DSP 6203, but not twice as fast, as would be expected from its doubled clock frequency.




#### **DSP** memory organization

- Memory contents in the DSP of the PU:
  - Input data and output data
  - Optimal algorithm weights
  - Histogram contents
- Good organization of the memory is needed:
  - The weights (heavily used) must always stay in the L1D cache of the DSP 6414.



#### Histogramming in DSP 6414



# ROD requirements

- High channel density.
- Modular design: basic components should be easily changed/upgraded.
- Event processing time including histogramming tasks  $\leq$  10  $\mu$ s.
- Low power consumption: estimated to be ~80 W per ROD.

# Status and plans

#### Status and plans

- Decision of the DSP chip: Done (DSP TI 6414)
- ROD preliminary design review: September 2002
- Prototype production: Nov/Dec 2002
- Pre-series production: June 2003
- PRR (Production Readiness Review): Oct/Nov 2003
- Series production: January 2004

### Delicate points of the ROD

- Cooling of G-link chips:
  - 35 °C at maximum for 80 MHz clock frequency.
  - Cooling with water or air is being studied.
- Staging mode:
  - Half of the PUs will be used at the beginning of LHC.
  - Each DSP then processes 2 × 128 channels.
- The ROD output goes through serializer/de-serializer at 280 MHz.
- The DSP power consumption:
  - Histogramming impacts the power consumption, as it accesses memory which is not mapped in the cache.
  - Alternating between data reads and data writes is costly for the DSP: it causes dirty cache lines and higher power consumption.
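The channel load per DSP in staging mode follows from the board figures given earlier. The sketch below assumes a prototype ROD with 8 FEB inputs of 128 channels each and 2 DSPs per PU; the function name is illustrative.

```python
CHANNELS_PER_ROD = 8 * 128   # 8 FEB inputs x 128 channels per FEB
DSPS_PER_PU = 2              # ROD prototype configuration


def channels_per_dsp(n_pus, dsps_per_pu, channels_per_rod):
    """Channels each DSP must process when the load is shared evenly."""
    return channels_per_rod // (n_pus * dsps_per_pu)


print(channels_per_dsp(4, DSPS_PER_PU, CHANNELS_PER_ROD))  # 128 (all 4 PUs)
print(channels_per_dsp(2, DSPS_PER_PU, CHANNELS_PER_ROD))  # 256 (staging: 2 PUs)
```

In staging mode each DSP has to handle twice as many channels within the same event processing budget.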



#### **Summary**

- The ROD project is progressing well.
- The DSP 6414 has recently been chosen. It needs careful memory management.
- The first prototypes of the ROD, the TM and the P3 back plane are expected by the end of 2002.
- The ROD mass production is expected to be finished in 2004.