Dissertation

submitted to the
Combined Faculties of the Natural Sciences and Mathematics
of the Ruperto-Carola-University of Heidelberg, Germany
for the degree of
Doctor of Natural Sciences

Put forward by
Sebastian Dittmeier
born in Aschaffenburg
Oral examination: 22.11.2018

Fast data acquisition
for silicon tracking detectors
at high rates

Referees:
Prof. Dr. André Schöning
Prof. Dr. Norbert Herrmann

FAST DATA ACQUISITION FOR SILICON TRACKING DETECTORS AT HIGH RATES

Silicon tracking detectors play a key role in many current high energy physics experiments. To enhance experimental sensitivities for searches for new physics, beam energies and event rates are constantly being increased, which leads to growing volumes of detector data that have to be processed. This thesis covers high-speed data acquisition for silicon tracking detectors in the context of the Mu3e experiment and future hadron collider experiments. For the Mu3e experiment, a vertical slice of the trigger-less readout system is realized as a beam telescope consisting of 8 layers of pixel sensors that are read out using a prototype of the Mu3e front-end board. The performance of the full readout system is studied during beam tests. Sensor hit rates of up to 5 MHz can be handled without significant losses. Hence, the system fulfils the requirements for the first phase of the experiment.

To fully exploit the potential of silicon tracking detectors at future hadron collider experiments, the implementation of high-speed data links is mandatory. Wireless links operating at frequencies of 60 GHz and above present an attractive alternative to electrical and optical links, as they offer high bandwidth, small form factor and low power consumption. This thesis describes readout concepts for tracking detectors applying wireless data transfer and presents studies of wireless data transmission.

SCHNELLE DATENERFASSUNG FÜR SILIZIUM-SPURDETEKTOREN BEI HOHEN RATEN

In vielen aktuellen Experimenten der Hochenergiephysik spielen Silizium-Spurdetektoren eine Schlüsselrolle. Zur Verbesserung der Sensitivität von Experimenten, die nach neuer Physik suchen, werden Strahlenenergien und Ereignisraten ständig erhöht, was zu wachsenden Mengen an Detektordaten führt, die verarbeitet werden müssen.

Damit das Potenzial der Silizium-Spurdetektoren bei zukünftigen Hadronencollider-Experimenten voll ausgeschöpft werden kann, ist die Verwendung von Hochgeschwindigkeits-Datenverbindungen unabdingbar. Drahtlose Datenverbindungen, die im Frequenzbereich von 60 GHz und mehr betrieben werden, stellen eine attraktive Alternative zu elektrischen und optischen Verbindungen dar, denn sie bieten eine hohe Bandbreite bei kleinem Formfaktor und geringem Stromverbrauch. In dieser Arbeit werden drahtlose Auslesekonzepte für Spurdetektoren beschrieben und Studien zur drahtlosen Datenübertragung vorgestellt.

CONTENTS

I  INTRODUCTION

1  ELEMENTARY PARTICLE PHYSICS
   1.1  The Standard Model of particle physics
   1.2  Physics beyond the Standard Model
   1.3  High energy physics experiments

2  SILICON TRACKING DETECTORS
   2.1  Principle of silicon detectors
   2.2  Principle of tracking detectors
   2.3  Silicon pixel detector technologies

3  DATA TRANSMISSION BASICS
   3.1  Classical electrodynamics
   3.2  Plane waves in non-conducting materials
   3.3  Plane waves in conducting materials
   3.4  Plane waves in real dielectrics

4  HIGH-SPEED DATA TRANSMISSION
   4.1  Electrical data transmission
   4.2  High-speed serial links
      4.2.1  Physical layer
      4.2.2  Data link layer
   4.3  Implementation of high-speed links in FPGAs
      4.3.1  LVDS receivers in Stratix IV devices
      4.3.2  Gigabit transceivers in Stratix IV devices
      4.3.3  Pre-emphasis in Stratix IV transmitters
   4.4  Optical data transmission
   4.5  Wireless data transmission
   4.6  Digital modulation schemes
      4.6.1  Amplitude-shift keying
      4.6.2  Phase-shift keying
      4.6.3  Frequency-shift keying
      4.6.4  Multi-level modulation schemes
   4.7  Testing high-speed links

II  DATA ACQUISITION FOR THE MU3E EXPERIMENT

5  THE READOUT SYSTEM OF THE MU3E EXPERIMENT
   5.1  The Mu3e experiment
   5.2  Readout architecture of the Mu3e experiment
   5.3  Expected data rates
      5.3.1  Data rates at the front-end
      5.3.2  Data rates between front-end and switching boards
      5.3.3  Data rates between switching boards and filter farm
      5.3.4  Phase II - data rates of the pixel detector
      5.3.5  Requirements for the data acquisition system
   5.4  The MuPix pixel sensors
      5.4.1  MuPixX
      5.4.2  The latest prototype: MuPix8
   5.5  The front-end board
      5.5.1  The front-end board in development based on the Arria V FPGA
      5.5.2  The first front-end board prototype based on the Stratix IV FPGA
   5.6  Switching board
   5.7  Filter farm
   5.8  Clock and reset distribution system
   5.9  Readout links
      5.9.1  Electrical links
      5.9.2  Optical links

6  LINK STUDIES
   6.1  Studies of the serializer of the MuPix sensors
      6.1.1  Data rate studies with MuPix7
      6.1.2  Eye diagram studies with MuPix8
      6.1.3  Bit error rate studies with MuPix8
   6.2  LVDS receiver tests of the front-end FPGA
   6.3  Optical link studies
      6.3.1  MiniPods and QSFP on the front-end board
      6.3.2  SFP+ transceivers

7  A VERTICAL SLICE OF THE MU3E READOUT SYSTEM
   7.1  The MuPix Telescope
   7.2  Hardware description of the vertical slice
      7.2.1  MuPix8
      7.2.2  QTH-SCSI-Adapter-PCB
      7.2.3  Front-end board
      7.2.4  Clock distribution
      7.2.5  Back-end PC and receiver card
   7.3  Firm- and software at the front-end
      7.3.1  Front-end firmware
      7.3.2  Front-end software on the FPGA
      7.3.3  Front-end software on the PC
   7.4  Firm- and software at the back-end
      7.4.1  Back-end firmware
      7.4.2  Back-end software on the FPGA
      7.4.3  Back-end software on the PC: MuPix Telescope
   7.5  Commissioning
   7.6  Test beam studies
      7.6.1  Periods of active data taking for link studies
      7.6.2  Temperature monitoring
      7.6.3  Optical links
      7.6.4  Electrical sensor data links
      7.6.5  MuPix readout: hit rates, load and multiplicity
      7.6.6  Performance of the commissioning readout mode
      7.6.7  Readout latency
      7.6.8  Performance of the hit sorter readout mode

III  WIRELESS DATA ACQUISITION FOR FUTURE HEP EXPERIMENTS

8  READOUT FOR FUTURE TRACKING DETECTORS
   8.1  Readout challenges
   8.2  New readout technologies for HEP
   8.3  Wireless data transmission
   8.4  Readout concepts and potential benefits
   8.5  A wireless demonstrator at 60 GHz

9  STUDIES OF WIRELESS DATA TRANSMISSION
   9.1  Transmission through detector modules
   9.2  Noise pickup studies
   9.3  Prospects of carrier frequencies above 200 GHz

10  SUMMARY, CONCLUSION AND OUTLOOK

Appendices
A  LIST OF ABBREVIATIONS
B  EYE DIAGRAM ANALYSIS OF THE SERIAL LINKS OF MUPIX8
C  LIST OF MY OWN PUBLICATIONS

Bibliography
Acknowledgements

OUTLINE OF THESIS

The work described in this thesis is focused on high-speed data acquisition for the Mu3e experiment and envisioned future particle physics experiments. The first part of the thesis gives an introduction to elementary particle physics (chapter 1), silicon tracking detectors (chapter 2) and the foundations of high-speed data transmission (chapter 3). Different data transmission technologies and the implementation of high-speed links in FPGAs are described (chapter 4). The second part of the thesis starts with an introduction to the Mu3e experiment, which is followed by a description of the experiment’s readout system (chapter 5). Studies of high-speed data links, which are conducted in the context of the readout system’s development, are reported (chapter 6). A vertical slice of the Mu3e readout system is implemented and its performance is studied during test beam measurements (chapter 7). The last part is dedicated to the readout challenges of future experiments and how they can be overcome using wireless data transfer technologies (chapter 8). Studies of wireless signal transmission are conducted and effects of data transmission using mm-waves on silicon detectors are investigated (chapter 9). Lastly, the thesis is concluded and an outlook is given (chapter 10).

CONTRIBUTIONS FROM THE AUTHOR

The development of a data acquisition system for a particle physics experiment is a complex task which involves contributions from many people. The implementation of the vertical slice of the Mu3e readout system is based on the development of the MuPix Telescope and the readout architecture which was first developed for MuPix3. The author developed large parts of the readout and control firmware for the MuPix7 and MuPix8 sensors and also contributed to the DAQ software, relying on the framework that has been developed within the group over many years.

The front-end board prototype was developed by the electronics workshop of the Physics Institute. The author brought the board into operation, conducted the tests of the on-board components, and contributed to the development of the next front-end board version.

The author developed the firmware for the FPGAs in the vertical slice setup. In addition, he developed the software to control the front-end FPGA and contributed to the development of the back-end software.

The performance of the vertical slice setup was measured during three test beam campaigns which were conducted in a collaborative effort by many members of the group. At the first test beam, the author commissioned the setup. The results presented in this thesis originate from the second and third test beam campaigns, which were performed in the absence of the author. The analysis of the performance of the readout system was carried out by the author.

The author designed the PCBs to characterize the MiniPod transceivers and to interface the front-end board with the MuPix8 PCBs, which host the pixel sensors.

The characterizations of the data links of the MuPix sensors and of the optical transceivers described within this thesis were conducted by the author. The author developed the firmware used for the bit error rate tests of the optical and wireless transceivers based on design examples by the FPGA’s manufacturer.

The author performed the wireless signal transmission studies described in the last part of this thesis. The noise pickup study was conducted in collaboration with the ATLAS group at the University of Freiburg. The data transmission studies with the 240 GHz transceiver were carried out by the author and one of the designers from IHCT Wuppertal.

I  INTRODUCTION

The Standard Model of particle physics is one of the great triumphs of modern physics. Despite its huge success in describing the fundamental particles and their interactions up to the electroweak scale, many questions remain open. For instance, the Standard Model does not explain the origin of neutrino masses or the nature of dark matter. Various new physics models try to explain the phenomena that are missing in the formulation of the Standard Model.

Similar to the Higgs mechanism, which predicted the existence of a scalar boson, many of these models predict the existence of new, heavy particles. Accelerator experiments therefore aim to increase beam energies to be able to directly produce more and more massive particles. But new physics could also manifest itself in the form of small deviations from Standard Model expectations, which can be probed best in rare decays. Hence, experiments try to raise beam rates and luminosities to be able to observe highly suppressed processes.

Silicon tracking detectors are key tools for all large modern particle physics experiments, enabling measurements of momenta and vertices of charged particles with the highest precision, even at high particle rates. Rising particle fluxes, caused for instance by increasing beam energies and luminosities at hadron colliders, lead to growing data volumes produced by silicon detectors, which have to be handled by their readout systems. This requires the use of high-speed data links.

In this part, an introduction to the Standard Model, silicon tracking detectors and high-speed data links is given. The working principles and the shortcomings of different data transmission techniques are explained on the basis of classical electrodynamics.

ELEMENTARY PARTICLE PHYSICS

Searches for the fundamental constituents of matter and their interactions have a long history. Until the 20th century, scientists disputed whether matter was a continuum or composed of fundamental, indivisible particles. Today, thanks to the discoveries by generations of scientists, we have a theory that describes the known fundamental particles and interactions with tremendous accuracy: the Standard Model of particle physics.

1.1 THE STANDARD MODEL OF PARTICLE PHYSICS

The Standard Model (SM) of particle physics describes the fundamental particles and their interactions based on quantum field theories. Its particle content (Figure 1.1) can be divided into two main classes: fermions and bosons.

Fermions are particles with half-integer spin. All SM fermions possess spin-$\frac{1}{2}$. They are the basic building blocks of matter. Bosons are particles with integer spin. The bosons of the SM are vector bosons with spin-1, except the Higgs particle, which is a scalar with spin-0. The vector bosons are the mediators of the Standard Model’s interactions.

The Standard Model contains 12 fermions and their corresponding antiparticles. The fermions are categorized into quarks and leptons. All quarks take part in the strong, weak and electromagnetic interactions, as they possess color charge, weak isospin and electric charge. All leptons take part in the weak interaction. In addition, charged leptons interact electromagnetically.

Quarks and leptons are divided into three generations. Each generation of quarks consists of a positively charged particle with electric charge \( q = +\frac{2}{3} \), and a negatively charged particle with \( q = -\frac{1}{3} \). The first generation contains the up- and down-quark, which are the basic constituents of protons and neutrons, the components of atomic nuclei. The quarks of the second and third generations, the charm-, strange-, top- and bottom-quark, have higher masses and are unstable. Each generation of leptons contains an electrically charged particle with charge \( q = -1 \) and a neutral particle with charge \( q = 0 \). The electron, the charged lepton of the first generation, is stable. Electrons and atomic nuclei form neutral bound states, called atoms. The charged leptons of the second and third generations, the muon and the tauon, possess higher masses and are unstable. All three neutral leptons, called neutrinos, are massless within the SM.

The Standard Model contains 12 vector boson fields: 8 gluons mediating the strong interaction, 2 charged (\( W^\pm \)) and a neutral boson (\( Z \)) mediating the weak interaction, and the photon \( \gamma \), mediating the electromagnetic interaction. Gluons and photons are massless, while the \( W \)- and \( Z \)-bosons are massive.

The boson fields arise due to the local gauge invariance of the Standard Model’s particle fields under the \( SU(3) \times SU(2) \times U(1) \) symmetry groups. The bosons are therefore also referred to as gauge bosons. The strong interaction arises due to the symmetry of quarks under local \( SU(3) \) transformations. The weak and electromagnetic interactions are represented by the \( SU(2) \) and \( U(1) \) groups. If these two symmetry groups were to be considered independently, however, all gauge bosons would be massless, which is in conflict with observations.

With both the electromagnetic and the weak interaction possessing a neutral gauge boson, the two interactions can be unified into the electroweak interaction. The arising electroweak symmetry, represented by the group product \( SU(2) \times U(1) \), however, is spontaneously broken through the Higgs mechanism, giving the \( W \)- and \( Z \)-bosons a non-zero mass. The electroweak symmetry breaking further implies the existence of a massive, scalar boson, the Higgs particle, which is the latest addition to the list of particles of the SM, discovered at the LHC in 2012 [3, 4]. Through Yukawa interactions, the fermion fields also acquire masses from the Higgs field.

1.2 PHYSICS BEYOND THE STANDARD MODEL

The Standard Model describes the underlying physics up to the electroweak scale with tremendous precision. However, it possesses 18 free parameters, which can only be determined experimentally and cannot be derived from theory. In addition, it does not provide answers to many open questions of physics.

What is the nature of neutrinos? Up to now, it is unknown whether neutrinos are Dirac or Majorana particles. Neutrinos are experimentally observed to mix and thus do not conserve lepton flavour, although lepton flavour conservation is an accidental symmetry of the SM. This observation led to the conclusion that neutrinos are not massless. The origin and the actual values of these masses, however, are still unclear. In addition, the observation of neutral lepton flavour violation naturally raises the question of whether charged lepton flavour is also violated in nature.

Why are there three generations of matter? The Standard Model contains three generations of quarks and leptons, but there is no underlying symmetry. Are there more generations at even higher energy scales?

What is the nature of dark matter? Since the 1930s, it has been well established that the universe must contain some form of non-luminous matter in order to explain the measured velocity distributions of stars orbiting around the galactic centre [5].

These are just some examples of questions the Standard Model cannot answer so far. Various high energy physics experiments try to address these questions.

1.3 HIGH ENERGY PHYSICS EXPERIMENTS

High energy physics experiments can be divided into two categories: specialized precision experiments, which are dedicated to measuring specific decay channels and particle properties with the highest accuracy, and general purpose experiments, which allow a multitude of measurements to be performed. The Mu3e experiment, which is described in chapter 5, is a prime example of a specialized experiment searching for an ultra-rare decay mode of the muon. The ATLAS experiment at the Large Hadron Collider (LHC) is an example of a general purpose experiment with a rich physics programme. With the ATLAS detector, searches for heavy resonances and dark matter particles are performed. In addition, precision tests of the Standard Model and tests of models beyond it, like supersymmetry (SUSY), are carried out.

Collider experiments like ATLAS investigate individual particle interactions originating from two colliding beams. The experiments rely on a wide range of technologies to detect and measure the properties of the particles produced in these interactions [6]. Figure 1.2 shows the structure of a general purpose particle physics detector. It consists of four parts. Closest to the interaction region, a tracking detector is used to measure the momentum of charged particles in the presence of a magnetic field. Tracking detectors are discussed in more detail in chapter 2. The tracking system is followed by an electromagnetic calorimeter (ECAL), measuring the energy of photons and electrons. The hadronic calorimeter (HCAL), surrounding the ECAL, is used to measure the energy of hadrons like pions and neutrons. The only particles escaping all these detector layers are muons and neutrinos. Muons are detected in the muon detectors, located at the outer shell of the experiment. Neutrinos leave the detector undetected. Combining the measurements of all particles produced in an interaction, one attempts to reconstruct the original primary interaction and to identify the underlying physics.

SILICON TRACKING DETECTORS

Tracking detectors are a key component of all modern, large particle physics experiments, as they enable the momentum and vertex measurements of charged particles [6]. Modern tracking detectors are typically based on semiconductor technologies, as these allow thin detectors with high granularity to be built that can still be operated at high particle rates. This, in turn, allows tracking detectors to be placed very close to interaction regions to study short-lived particles [7].

2.1 PRINCIPLE OF SILICON DETECTORS

The general working principle of a silicon detector is illustrated in Figure 2.1. A charged particle passing through a silicon detector creates free electron-hole pairs. For detection, the silicon contains implants of different p- and n-doping concentrations. With a reverse-bias voltage applied, depletion zones with large electric fields form in the silicon, starting from the p-n junctions. The charge carriers drift in the direction of the electric field, which generates an induction signal that can be picked up by sensing electrodes and further processed using amplifiers and comparators. High segmentation of the silicon detector yields precise spatial information. Sensors segmented in one dimension are called strip detectors, while sensors finely segmented in two dimensions are called pixel detectors.

Figure 2.1: Principle of operation of a semiconductor detector.

Figure 2.2: A charged particle leaves hits, indicated as red dots, in a barrel shaped tracking detector with multiple layers. The solenoidal magnetic field $B$ forces the particle on a helical trajectory with radius $R$. In the longitudinal plane, the trajectory is unaffected. The inclination angle of the track in this plane is $\lambda$.

2.2 PRINCIPLE OF TRACKING DETECTORS

The general principle of a barrel-shaped tracking detector operated in a solenoidal magnetic field is depicted in Figure 2.2. Charged particles with a priori unknown momentum are exposed to the magnetic field, which forces them onto helical trajectories. The particles pass through the tracking detector, which consists of several detector layers. A particle passing through such a layer creates a "hit". Hits from several layers can be combined to reconstruct the track of the particle. In the transverse plane, the track forms a circle, which can be reconstructed and yields the bending radius $R$, see Figure 2.2a. In the longitudinal plane, the trajectory is not affected by the magnetic field, thus the inclination angle $\lambda$ of the track can be reconstructed from a straight line. For a singly charged particle, the momentum $p$ can be reconstructed using these two values through

$$p \cdot \cos \lambda = 0.3 \cdot BR,$$

where $p$ is given in GeV/$c$, $B$ is the magnetic field in tesla and $R$ is the radius in metres [6].
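As a rough numerical illustration of the relation above, the following sketch estimates the bending radius from three hits in the transverse plane and converts it into a momentum. It is a simplified sketch and not the track reconstruction used in this thesis: the function names `circumradius` and `momentum_gev` and the example hit coordinates are assumptions made for illustration, the particle is taken to be singly charged, and multiple scattering and hit resolution are ignored.

```python
import math


def circumradius(p1, p2, p3):
    """Radius of the circle through three (x, y) points, in the units of the input."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = math.hypot(x2 - x3, y2 - y3)
    b = math.hypot(x1 - x3, y1 - y3)
    c = math.hypot(x1 - x2, y1 - y2)
    # Twice the triangle area, from the 2D cross product of two edge vectors.
    twice_area = abs((x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1))
    if twice_area == 0.0:
        raise ValueError("hits are collinear: the radius is infinite")
    # R = a*b*c / (4 * area) = a*b*c / (2 * twice_area)
    return a * b * c / (2.0 * twice_area)


def momentum_gev(b_tesla, radius_m, dip_angle_rad):
    """Momentum in GeV/c from p * cos(lambda) = 0.3 * B * R (singly charged particle)."""
    return 0.3 * b_tesla * radius_m / math.cos(dip_angle_rad)


# Hypothetical example: three hits (in metres) on a circle of radius 1 m.
hits = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
radius = circumradius(*hits)
p = momentum_gev(b_tesla=1.0, radius_m=radius, dip_angle_rad=0.0)
print(f"R = {radius:.3f} m, p = {p:.3f} GeV/c")
```

With more than three layers, a least-squares circle fit over all hits would normally replace the three-point construction; the three-point version is chosen here only because it keeps the geometry explicit.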

2.3 SILICON PIXEL DETECTOR TECHNOLOGIES

Silicon pixel detectors are highly segmented devices that offer high spatial resolution in two dimensions. Different implementations of silicon pixel detectors exist. So-called hybrid pixel sensors consist of two devices: a silicon detector chip, which implements a large depletion zone to obtain a large signal, and a readout chip, which processes the analogue signals, digitizes the information and provides a physical link for data transmission. The connection between sensor and readout chip is realized using solder bumps, as illustrated in Figure 2.3a. This scheme is used for instance in the ATLAS [8] and CMS [9] experiments at the LHC.

Monolithic active pixel sensors (MAPS), as illustrated in Figure 2.3b, are produced in complementary metal-oxide semiconductor (CMOS) processes. This allows sensing diodes and readout electronics to be integrated in a single device. No external readout chip is required. MAPS have been used for instance in the STAR experiment [10]. Compared to hybrid detectors, MAPS have suffered from being slower, as they collect charge mainly via diffusion. With the development of high-voltage monolithic active pixel sensors (HV-MAPS) [11], which rely on charge collection via drift, as the hybrid sensors do, this drawback is eliminated.

DATA TRANSMISSION BASICS

The high segmentation of silicon pixel detectors in combination with high particle rates results in large data volumes. Readout of silicon pixel tracking detectors therefore requires the utilization of high-speed data transmission technologies. All important features of high-speed data transmission and shortcomings of different transmission media can be derived from a set of four equations with appropriately chosen boundary conditions. These four equations, the Maxwell equations, are the heart of classical electrodynamics. In the following, the Maxwell equations and their solutions in dielectric and conducting media are described. Implications for high-speed data transmission are given.

3.1 CLASSICAL ELECTRODYNAMICS

Classical electrodynamics is based on four fundamental, linear equations [13], the Maxwell equations:

\[ \nabla \cdot \vec{D} = \rho, \quad (3.1) \]
\[ \nabla \times \vec{H} - \frac{\partial \vec{D}}{\partial t} = \vec{j}, \quad (3.2) \]
\[ \nabla \times \vec{E} + \frac{\partial \vec{B}}{\partial t} = 0, \quad (3.3) \]
\[ \nabla \cdot \vec{B} = 0. \quad (3.4) \]

Within these equations, \( \vec{E} \) and \( \vec{B} \) represent the electric and magnetic fields, \( \vec{D} \) the electric displacement and \( \vec{H} \) the magnetising field. Charge and current densities are denoted by \( \rho \) and \( \vec{j} \). The following relations exist between the fields:

\[ \vec{D} = \varepsilon_r \varepsilon_0 \vec{E}, \quad (3.5) \]
\[ \vec{B} = \mu_r \mu_0 \vec{H}, \quad (3.6) \]

with the vacuum permittivity \( \varepsilon_0 \) and permeability \( \mu_0 \), and the medium dependent relative permittivity \( \varepsilon_r \) and permeability \( \mu_r \). In general, both are complex numbers and frequency dependent. The conductivity \( \sigma \) of a material is introduced through Ohm’s law,

\[ \vec{j} = \sigma \vec{E}, \quad (3.7) \]

connecting current density and electric field.

3.2 PLANE WAVES IN NON-CONDUCTING MATERIALS

For a linear and homogeneous non-conducting medium without sources, i.e. $\rho = 0$ and $\sigma = 0$, hence $\vec{j} = 0$, the wave equation [14],

$$\Delta \vec{E} = \varepsilon\mu \frac{\partial^2 \vec{E}}{\partial t^2}, \quad (3.8)$$

for the electric field can be derived from Equations 3.2 and 3.3. Here, $\varepsilon = \varepsilon_r \varepsilon_0$ and $\mu = \mu_r \mu_0$. This equation is solved by a plane wave,

$$\vec{E}(\vec{x}, t) = \vec{E}_0 e^{i\omega t - i\vec{k} \cdot \vec{x}}, \quad (3.9)$$

which is a harmonic function in time and space with angular frequency $\omega$ and wave vector $\vec{k}$. Equation 3.1 furthermore requires the wave to be of transverse nature, fulfilling $\vec{k} \cdot \vec{E} = 0$. Wave vector and frequency are connected through the dispersion relation

$$\vec{k}^2 = \omega^2 \mu \varepsilon. \quad (3.10)$$

The wave’s phase velocity $v_p$ turns out to be

$$v_p = \frac{1}{\sqrt{\mu \varepsilon}}. \quad (3.11)$$

It is equal to the group velocity $v_g$ if permittivity and permeability do not depend on the frequency. For $\varepsilon(\omega) \neq const.$, the phase velocity depends on the frequency which leads to chromatic dispersion.
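To give a feeling for the scales involved, the sketch below evaluates Equations 3.10 and 3.11 for a lossless dielectric. The relative permittivity of 4.0 and the carrier frequency of 60 GHz are illustrative assumptions rather than values taken from a measurement, and the function names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
MU0 = 4.0e-7 * math.pi    # vacuum permeability in H/m


def phase_velocity(eps_r, mu_r=1.0):
    """Phase velocity v_p = 1 / sqrt(mu * eps) in m/s (Equation 3.11)."""
    return 1.0 / math.sqrt(mu_r * MU0 * eps_r * EPS0)


def wavelength(freq_hz, eps_r, mu_r=1.0):
    """Wavelength 2*pi/k with k = omega * sqrt(mu * eps) (Equation 3.10)."""
    omega = 2.0 * math.pi * freq_hz
    k = omega * math.sqrt(mu_r * MU0 * eps_r * EPS0)
    return 2.0 * math.pi / k


c_vacuum = phase_velocity(eps_r=1.0)
v_dielectric = phase_velocity(eps_r=4.0)      # assumed relative permittivity
lam_60ghz = wavelength(60e9, eps_r=1.0)       # free-space wavelength at 60 GHz

print(f"c = {c_vacuum:.4e} m/s, v_p(eps_r = 4) = {v_dielectric:.4e} m/s")
print(f"free-space wavelength at 60 GHz = {lam_60ghz * 1e3:.2f} mm")
```

A frequency-independent permittivity is assumed throughout, so phase and group velocity coincide in this sketch.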

3.3 PLANE WAVES IN CONDUCTING MATERIALS

In a conducting medium, $\sigma \neq 0$, hence $\vec{j} \neq 0$, the wave equation takes the form

$$\Delta \vec{E} = \varepsilon\mu \frac{\partial^2 \vec{E}}{\partial t^2} + \sigma\mu \frac{\partial \vec{E}}{\partial t}. \quad (3.12)$$

Applying the general plane wave ansatz from Equation 3.9, the dispersion relation becomes complex,

$$\vec{k}^2 = -i\omega \mu \sigma + \omega^2 \mu \varepsilon. \quad (3.13)$$

Dividing the wave vector into a real and an imaginary part, $\vec{k} = \vec{\alpha} - i\vec{\beta}$, the plane wave solution reads

$$\vec{E}(\vec{x}, t) = \vec{E}_0 e^{i\omega t - i\vec{\alpha} \cdot \vec{x}} e^{-\vec{\beta} \cdot \vec{x}}, \quad (3.14)$$

with the wavelength given by $2\pi/\alpha$ and an attenuation factor depending on $\beta$. Applying $\vec{k} = \vec{\alpha} - i\vec{\beta}$ to the dispersion relation, $\alpha$ and $\beta$ turn out to be

$$\alpha = \omega \sqrt{\mu \varepsilon} \left[ \frac{1}{2} + \frac{1}{2} \sqrt{1 + \frac{\sigma^2}{\omega^2 \varepsilon^2}} \right]^{1/2}, \quad (3.15)$$

$$\beta = \frac{\omega \mu \sigma}{2\alpha}. \quad (3.16)$$

Both depend on the frequency and the material properties.
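The expressions for $\alpha$ and $\beta$ can be checked numerically against the dispersion relation of Equation 3.13. The following sketch does this for one example; the conductivity of copper used here (5.8e7 S/m) and the frequency of 1 GHz are assumed illustrative values, and the function name `alpha_beta` is hypothetical.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
MU0 = 4.0e-7 * math.pi    # vacuum permeability in H/m


def alpha_beta(omega, sigma, eps_r=1.0, mu_r=1.0):
    """Return (alpha, beta) of k = alpha - i*beta from Equations 3.15 and 3.16."""
    eps = eps_r * EPS0
    mu = mu_r * MU0
    ratio = sigma / (omega * eps)
    alpha = omega * math.sqrt(mu * eps) * math.sqrt(0.5 + 0.5 * math.sqrt(1.0 + ratio**2))
    beta = omega * mu * sigma / (2.0 * alpha)
    return alpha, beta


omega = 2.0 * math.pi * 1.0e9   # angular frequency for 1 GHz
sigma_cu = 5.8e7                 # S/m, assumed conductivity of copper
alpha, beta = alpha_beta(omega, sigma_cu)

# Cross-check: (alpha - i*beta)^2 should equal omega^2*mu*eps - i*omega*mu*sigma.
k = complex(alpha, -beta)
k_squared = k * k
expected = complex(omega**2 * MU0 * EPS0, -omega * MU0 * sigma_cu)
rel_dev = abs(k_squared - expected) / abs(expected)

print(f"alpha = {alpha:.4e} 1/m, beta = {beta:.4e} 1/m, relative deviation = {rel_dev:.1e}")
```

For a good conductor such as the one assumed here, $\alpha$ and $\beta$ come out nearly equal, which is the starting point for the approximations that follow.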

The distance a wave travels inside a conductor depends on the attenuation coefficient $\beta$. The inverse of $\beta$, the distance over which the amplitude is reduced by a factor of $1/e$, is called the skin depth $\delta$. This effect, which confines a high-frequency wave travelling inside a conductor to the region near the conductor’s outer surface, is referred to as the skin effect. For a good conductor, $\sigma \gg \omega \varepsilon$, the skin depth is approximately

$$\delta = \frac{1}{\beta} \approx \sqrt{\frac{2}{\omega \mu \sigma}}, \quad (3.17)$$

and is therefore inversely proportional to the square root of the frequency. Under the same assumption, the phase velocity $v_p$ is a function of frequency and the material properties,

$$v_p = \frac{\omega}{\alpha} \approx \frac{1}{\sqrt{\mu \varepsilon}} \sqrt{\frac{2\omega \varepsilon}{\sigma}}, \quad (3.18)$$

which leads to dispersion. In contrast to the perfect dielectric case, the group velocity does not equal the phase velocity. Instead,

$$v_g = \frac{d\omega}{d\alpha} \approx 2v_p. \quad (3.19)$$
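The factor of two can be made explicit with a short worked step. It uses only the good-conductor limit of Equation 3.15 and assumes that $\mu$ and $\sigma$ do not themselves depend on the frequency:

$$\alpha \approx \sqrt{\frac{\omega \mu \sigma}{2}} \quad \Rightarrow \quad \frac{d\alpha}{d\omega} \approx \frac{1}{2} \sqrt{\frac{\mu \sigma}{2\omega}} = \frac{\alpha}{2\omega} \quad \Rightarrow \quad v_g = \left( \frac{d\alpha}{d\omega} \right)^{-1} \approx \frac{2\omega}{\alpha} = 2 v_p.$$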

The wave impedance $Z$ follows from the ratio of the amplitudes of the electric and magnetic fields:

$$Z = \frac{E_0}{H_0} = \frac{\omega \mu}{\alpha - i\beta}. \quad (3.20)$$

For a good conductor, it can be approximated by

$$Z \approx (1 + i) \sqrt{\frac{\omega \mu}{2\sigma}}. \quad (3.21)$$

The wave impedance is therefore proportional to the square root of the wave’s frequency.
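The frequency scaling of the skin depth and the wave impedance in the good-conductor limit (Equations 3.17 and 3.21) can be illustrated with a few numbers. This is a minimal sketch: the conductivity of copper (5.8e7 S/m) and the chosen frequencies are assumptions for illustration, and the function names are hypothetical.

```python
import math

MU0 = 4.0e-7 * math.pi   # vacuum permeability in H/m


def skin_depth(freq_hz, sigma, mu_r=1.0):
    """Skin depth delta = sqrt(2 / (omega * mu * sigma)) in metres (Equation 3.17)."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 / (omega * mu_r * MU0 * sigma))


def wave_impedance(freq_hz, sigma, mu_r=1.0):
    """Wave impedance Z = (1 + i) * sqrt(omega * mu / (2 * sigma)) in ohms (Equation 3.21)."""
    omega = 2.0 * math.pi * freq_hz
    return (1.0 + 1.0j) * math.sqrt(omega * mu_r * MU0 / (2.0 * sigma))


SIGMA_CU = 5.8e7  # S/m, assumed conductivity of copper

for f in (1.0e8, 1.0e9, 1.0e10):
    d = skin_depth(f, SIGMA_CU)
    z = wave_impedance(f, SIGMA_CU)
    print(f"f = {f:.0e} Hz: delta = {d * 1e6:.2f} um, |Z| = {abs(z) * 1e3:.2f} mOhm")
```

Each factor of 100 in frequency reduces the skin depth by a factor of 10 and raises the magnitude of the impedance by the same factor, which is why conductor losses grow with the signalling rate.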
3.4 PLANE WAVES IN REAL DIELECTRICS

In real dielectrics, the permittivity \( \varepsilon \) is a complex number,

\[
\varepsilon = \varepsilon' - i\varepsilon'',
\]

which, similar to a non-zero conductivity, also leads to dispersion and attenuation. Considering the general case, including \( \sigma \neq 0 \), the wave equation translates to

\[
\Delta \vec{E} = (\varepsilon' - i\varepsilon'')\mu \frac{\partial^2 \vec{E}}{\partial t^2} + \sigma \mu \frac{\partial \vec{E}}{\partial t}.
\]

Real and imaginary parts of the wave vector, \( \vec{k} = \vec{\alpha} - i\vec{\beta} \), translate accordingly to

\[
\alpha = \omega \sqrt{\mu \varepsilon'} \left[ \frac{1}{2} + \frac{1}{2} \sqrt{1 + \frac{(\sigma + \omega \varepsilon'')^2}{\omega^2 \varepsilon'^2}} \right]^{1/2},
\]

\[
\beta = \frac{\omega \mu (\sigma + \omega \varepsilon'')}{2\alpha}.
\]

The exponent of the attenuation factor \( \beta \) depends on both the conductivity \( \sigma \) and the imaginary part of the permittivity \( \varepsilon'' \). A commonly defined quantity is the loss tangent \( \tan \delta \),

\[
\tan \delta = \frac{\sigma + \omega \varepsilon''}{\omega \varepsilon'},
\]

which relates the losses due to conductivity and the imaginary part of the permittivity to the real part of the permittivity of a material. For many good insulators, above the critical frequency \( \omega_c = \sigma / \varepsilon \), which marks the boundary between conducting and insulating behaviour of a material, the conductivity is to first order directly proportional to the frequency [15], so that the loss tangent is constant over a wide range of frequencies.

For typical high-speed electrical data links, conductors are surrounded and separated by dielectrics in a configuration that can be modelled as a parallel plate capacitor. In this configuration, the power loss of a wave within the dielectric can be derived to be directly proportional to the frequency and the loss tangent [16],

\[
P_{\text{loss}} \propto \omega \tan \delta,
\]

thus, power losses in real dielectrics rise with increasing frequency. Dielectric losses therefore affect high speed data transmission in electrical links.

A variety of technologies exist to implement high-speed data links. In this chapter, three types of data transmission are described: electrical, optical and wireless transmission. First, electrical links and their implementation in general and in field programmable gate arrays (FPGAs) in particular are described. Secondly, basics of optical and wireless data transmission are presented. The chapter is concluded with a description of digital modulation schemes and procedures to test high-speed links.

### 4.1 Electrical Data Transmission

Electrical links are based on a conductive connection of a signal transmitter and receiver, whose implementations are further described in the subsequent section. The physical link can be realized in many different ways: as a single conductor or as a differential pair, in form of coaxial cables, twisted pair cables, traces on a printed circuit board, etc. High-speed links are typically realized in differential configuration, which has the advantage that the return current is closely coupled to the signal current. Transmission lines are typically surrounded by dielectric materials to fix the distance between conductors in order to control the impedance.

![Figure 4.1: Infinitely cascaded model used to derive the telegrapher’s equations, describing the properties of a general transmission line. Adapted from [15].](image-url)

An electrical transmission line can be modelled as an infinitely cascaded two-port system [15], see Figure 4.1, consisting of segments with a characteristic series impedance $z$ and a parallel admittance $y$ to ground. The impedance is composed of a resistance $R$ and an inductance $L$; the admittance is composed of a capacitance $C$ and a shunt conductance $G$. The general wave impedance $Z_0$ of such a transmission line follows from the telegrapher’s equations [15] to be

$$Z_0 = \sqrt{\frac{R + i\omega L}{G + i\omega C}}. \quad (4.1)$$

For a lossless transmission line, $R = G = 0$, the impedance is independent of the frequency $\omega$,

$$Z_0 = \sqrt{\left(\frac{L}{C}\right)}, \quad (4.2)$$

thus there is no attenuation and dispersion in lossless transmission lines.

For a real transmission line, the effects described in the previous chapter, skin effect and dielectric losses, lead to non-zero resistance $R$ and conductance $G$. In addition, capacitance changes with frequency as the dielectric permittivity is in general frequency dependent. All these effects lead to attenuation and dispersion of high-speed data signals, which can cause inter-symbol interference. High-speed serial links therefore have to correct for these effects to ensure proper data transmission.
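
The frequency dependence of equation (4.1) can be illustrated with a minimal numerical sketch. The per-unit-length values below are hypothetical and chosen only such that the lossless line has an impedance of 50 Ω; they do not describe any link used in this thesis.

```python
import cmath
import math


def characteristic_impedance(f_hz, R, L, G, C):
    """Wave impedance Z_0 following eq. (4.1); all parameters per unit length."""
    omega = 2 * math.pi * f_hz
    return cmath.sqrt((R + 1j * omega * L) / (G + 1j * omega * C))


# Hypothetical per-metre parameters of a transmission line
L_PER_M = 250e-9   # inductance in H/m
C_PER_M = 100e-12  # capacitance in F/m

for f in (1e6, 1e8, 1e10):
    z_lossless = characteristic_impedance(f, 0.0, L_PER_M, 0.0, C_PER_M)
    z_lossy = characteristic_impedance(f, 5.0, L_PER_M, 1e-3, C_PER_M)
    print(f"{f:.0e} Hz: lossless {abs(z_lossless):.2f} Ohm, lossy {z_lossy:.2f} Ohm")
```

In the lossless case, the result is real and the same at every frequency, as in equation (4.2). With non-zero $R$ and $G$, the impedance becomes complex and frequency dependent.
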

### 4.2 High-speed serial links

Serial links allow high data volumes to be transferred over single electrical connections instead of large parallel buses, reducing the number of required interconnections between transmitter and receiver devices. Especially for data transmission between devices that are not placed on the same printed circuit board, the reduction of connections is a major advantage. However, serial links have to be operated at high speeds in order to transfer the same number of bits in the same time as relatively slow, but wide parallel buses.

Figure 4.2: High-speed serial links can be divided into different layers, fulfilling different tasks.

High-speed serial links are implemented using different layers to perform different tasks, see Figure 4.2. In the minimal configuration, two layers are required: the physical layer and the data link layer. The physical layer maps data to a physical observable and assures electrical compatibility between devices [17]. It interfaces to the data link layer. The two layers exchange synchronously clocked bits. The data link layer manipulates the data to improve signal integrity and ensure successful communication [17]. This involves usage of encoding schemes and protocol dependent control characters that allow for proper data alignment and clock recovery. Above the data link layer, further protocol specific upper layers may be used. These layers add protocol and application specific features to the data stream, like error checking and error correction, or they introduce additional header or link status information [17]. In the subsequent sections, physical and data link layers are further elaborated.

#### 4.2.1 Physical layer

The physical layer implements the electrical interface between the device and the transmission medium. The data is modulated onto a physical observable. Most common electrical links use non-return-to-zero (NRZ) binary coding, which uses two voltage levels to represent a logical ‘1’ and ‘0’, see Figure 4.3a. High-speed links are typically implemented using differential signalling with low common mode and differential voltages [17], realized in standards like low-voltage differential signaling (LVDS), current-mode logic (CML) or proprietary standards like pseudo current mode logic (PCML), used in Intel FPGAs. Most recently, to increase data rates even beyond 50 Gb/s and overcome bandwidth limitations, high-speed links make use of more voltage levels to encode information of more than a single bit. In 4-level pulse-amplitude modulation (PAM4) coding, illustrated in Figure 4.3b, four voltage levels are used to encode two bits [15]. Such coding schemes require dedicated input and output buffer circuitry to detect the multiple levels.
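
The difference between the two coding schemes can be sketched in a few lines. The level values and the assignment of bit pairs to levels below are hypothetical; real transmitters may use a different mapping.

```python
def nrz_levels(bits):
    """Map every bit to one of two levels, here -1 and +1."""
    return [1 if b else -1 for b in bits]


# Hypothetical assignment of bit pairs to four equally spaced levels
PAM4_MAP = {(0, 0): -3, (0, 1): -1, (1, 0): 1, (1, 1): 3}


def pam4_levels(bits):
    """Map every pair of bits to one of four levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_MAP[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


bits = [1, 0, 0, 1, 1, 1, 0, 0]
print(nrz_levels(bits))   # eight symbols for eight bits
print(pam4_levels(bits))  # four symbols for the same eight bits
```

For the same symbol rate, PAM4 thus carries twice as many bits as NRZ, at the cost of a smaller spacing between adjacent levels.
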

At the transmitter, high-speed serial links utilize a fast clock for serialization. At the receiver, this transmitting clock has to be recreated in order to recover the bits correctly from the serial data stream. This is realized using clock and data recovery circuitry consisting of phase-locked loops (PLLs) and phase interpolators.

Figure 4.3: Electrical modulation schemes. (a) Binary non-return-to-zero coding. (b) 4-level pulse-amplitude modulation.

For typical electrical links, signal integrity issues arise at data rates of a few Gb/s and above. In electrical conductors, frequency dependent attenuation due to skin effect and dielectric losses as well as distortions due to frequency dependent delays lead to inter-symbol interference (ISI). ISI is caused by the voltage signal of a symbol that does not reach its full strength within its symbol time [18], and therefore affects the subsequent symbols. Dedicated equalization circuitry in the receiver and transmitter is used to counteract the electrical behaviour of the physical links.

**Equalization at the transmitter: pre-emphasis**

Equalization is the cancellation of analogue signal distortion that a data link suffers along its transmission channel. It can be implemented at the transmitter and at the receiver. Equalization at a transmitter is typically called pre-emphasis [17]. For high frequencies, transmission lines typically represent a low-pass filter. Pre-emphasis boosts high frequency components in the transmitted data signal [21], to compensate losses along the link and minimize ISI effects. The effect of pre-emphasis on the frequency spectrum of a 6.25 Gb/s data stream is shown in Figure 4.4. The spectral power around the fundamental frequency (3.125 GHz) and its third harmonic (9.375 GHz) is increased, while lower frequency components are consequently attenuated.

Figure 4.4: Effect of pre-emphasis on a link with a data rate of 6.25 Gb/s. Black curve: no pre-emphasis. Blue curve: pre-emphasis of about 7 dB enhances the spectrum at high frequencies around 3.125 GHz, and higher harmonics, and diminishes the proportion of lower frequencies. The data is generated using a Stratix V FPGA [19] and recorded with a spectrum analyzer [20].

A common approach of implementing pre-emphasis is by attenuating constant bit levels instead of actively amplifying bit transitions, which can be realized using high-pass filtering [17]. The implementation of pre-emphasis in a Stratix IV transceiver is described in section 4.3.3.

Pre-emphasis has limitations. It enhances crosstalk on neighbouring channels due to the increased signal edge rate [22]. Reflections at channel discontinuities get more complicated. In addition, serial links with pre-emphasis consume more power to boost the high frequencies.

**Equalization at the receiver**

Losses along the transmission media can also be counteracted in the receiver [17]. Using frequency dependent gain or attenuation stages before the bit is sampled, signal integrity can be restored.

The effect of equalization can be visualised with the On-Chip Signal Quality Monitoring Circuitry (EyeQ) feature within the gigabit transceivers of a Stratix IV FPGA. Figure 4.5 shows the bathtub curves, i.e. the bit error rate as a function of the sampling phase, of a receiver channel with and without equalization. In this example, 6 dB of equalization is applied to a 3.125 Gb/s serial data stream. The quality of the recorded signal is improved, as the width of the bathtub increases for any target bit error rate.

Equalization comes with the general disadvantage that all high frequency components of the signal, including noise contributions, are amplified [18].

![Figure 4.5: Bathtub curves recorded with the EyeQ feature at 3.125 Gb/s illustrate the effect of equalization on the signal quality of a high-speed serial link. With equalization enabled (black curve), the width between the two edges is larger as the signal quality is enhanced.](image-url)

#### 4.2.2 Data link layer

In the data link layer, encoding schemes and data protocols are implemented to improve signal integrity and allow for successful communication [17]. Encoding schemes are used to guarantee bit transitions that are required for the receiver to recover the clock from the data stream. Commonly used encoding schemes are 8b/10b [23], 64b/66b, 64b/67b or 128b/130b encoding. Out of these, 8b/10b is the only coding scheme that guarantees DC balancing, which prevents voltage signals from drifting away from the ideal logical high or low values and makes it possible to compensate large amounts of attenuation along the transmission path [24].

**8b/10b encoding**

In 8b/10b encoding, 8 bits of data are represented as a 10 bit word, resulting in an overhead of 25%. The 10 bit words contain between 4 and 6 binary ‘1’s. For each encoded word that consists of 4 ‘1’s, there also exists the negated word with 6 ‘1’s. The difference between the number of ‘1’s and ‘0’s in a word is called disparity. In this coding scheme, the disparity of all words is either 0 or ±2. The disparity of a data stream is continuously tracked and called running disparity (RD). The running disparity may only have values of ±1. Following the rules in Table 4.1, the disparity of the encoded words is chosen to maintain the running disparity within these limits.

The limited number of allowed 10b data patterns as well as the running disparity allow single bit errors to be detected efficiently. In addition to the codes used by the data words, 8b/10b encoding implements control symbols. The control symbols, also referred to as K-symbols or K-characters, are compiled in Table 4.2. These are special characters used for control functions. A subset of these symbols, K.28.1, K.28.5 and K.28.7, are used as synchronization characters and are called comma symbols. They possess a unique bit sequence of 5 consecutive ‘1’s or ‘0’s that allows word boundaries to be recovered unambiguously after serialization. The control characters are also used as idle characters, guaranteeing bit transitions when no other data is to be transmitted.

<table>
<thead>
<tr>
<th>Previous RD</th>
<th>Disparity choices</th>
<th>Disparity chosen</th>
<th>Next RD</th>
</tr>
</thead>
<tbody>
<tr>
<td>−1</td>
<td>0</td>
<td>0</td>
<td>−1</td>
</tr>
<tr>
<td>−1</td>
<td>+2, −2</td>
<td>+2</td>
<td>+1</td>
</tr>
<tr>
<td>+1</td>
<td>0</td>
<td>0</td>
<td>+1</td>
</tr>
<tr>
<td>+1</td>
<td>+2, −2</td>
<td>−2</td>
<td>−1</td>
</tr>
</tbody>
</table>

Table 4.1: Rules for running disparity ensure a DC-balanced data stream. Adapted from [25].
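
The rules of Table 4.1 can be expressed as a small sketch that chooses the disparity of the next word and tracks the running disparity of a stream. The function names and the initial running disparity of −1 are assumptions made for illustration; the two example words are the two encodings of the K.28.5 comma symbol.

```python
def word_disparity(word):
    """Number of '1's minus number of '0's of a 10 bit word given as a string."""
    return word.count("1") - word.count("0")


def choose_disparity(previous_rd, balanced):
    """Disparity chosen by the encoder following Table 4.1."""
    if balanced:
        return 0
    return +2 if previous_rd == -1 else -2


def track_running_disparity(words, initial_rd=-1):
    """Follow the RD through a stream; raise ValueError on a disparity error."""
    rd = initial_rd
    for word in words:
        d = word_disparity(word)
        if d not in (-2, 0, 2) or rd + d not in (-1, 1):
            raise ValueError(f"disparity error in word {word}")
        rd += d
    return rd


# K.28.5 with disparity +2, followed by its negated form with disparity -2
stream = ["0011111010", "1100000101"]
print(track_running_disparity(stream))
```

A receiver can use the same bookkeeping to flag transmission errors: any word that would drive the running disparity outside ±1 cannot have been produced by a valid encoder.
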
### 4.3 Implementation of High-Speed Links in FPGAs

FPGAs are integrated circuits whose hardware can be reconfigured after manufacturing, enabling custom hardware developments in small volumes with short modification turn-around times. FPGAs consist of programmable logic blocks with reconfigurable interconnects and memory blocks. In addition, hard intellectual property (IP) cores are implemented as fixed building blocks that are optimized for their individual tasks. These cores typically include PLLs, external memory interfaces and high-speed serializer and deserializer (SerDes) circuitry.

In the following, the high-speed input and output circuitry of an Intel Stratix IV FPGA, which is used in the context of this thesis, is described. Two types of serial IPs are presented: LVDS receivers, which can be operated at data rates up to 1.25 Gb/s, and fast, multi-gigabit transceivers that allow for data rates exceeding 6 Gb/s.

#### 4.3.1 LVDS receivers in Stratix IV devices

LVDS receivers are relatively compact IPs. Many channels can be clocked from a single PLL. However, the maximum achievable data rates are limited to 1.25 Gb/s.\(^1\) The circuitry of an LVDS receiver channel of an Intel Stratix IV FPGA [21] is shown in Figure 4.6. It features a differential input buffer that is connected to the dynamic phase alignment (DPA) circuitry. The input buffer, shown in Figure 4.7, features a differential on-chip termination of 100 Ω to comply with LVDS standards. The DPA circuitry uses 8 fast clocks running at the frequency of the data stream, e.g. 1.25 GHz for 1.25 Gb/s. The clocks are phase-shifted in steps of 45° with respect to each other. Out of these clocks, the DPA circuitry selects the one that matches the incoming data best. The incoming data stream is then retimed to this clock. The phase between the incoming serial data and the selected clock is continuously monitored by the DPA circuitry [21]. If the phase drifts, the DPA block reacts and selects another clock to sample the data during runtime.

The serial data is passed to a 6 bit deep first in, first out buffer (FIFO) to compensate for phase differences between the clock selected by the DPA circuitry and the clock used for deserialization. Before the data is finally deserialized, it passes the bit slip block. Here, single bit latencies can be inserted into the serial data stream to restore proper word alignment. In the deserializer, the serial data stream is written to a 10 bit deep shift register, that is presented to the FPGA fabric as a parallel, synchronous bus.
#### 4.3.2 Gigabit transceivers in Stratix IV devices

Serial links on an Intel Stratix IV FPGA with data rates exceeding 1.25 Gb/s are implemented using dedicated and more complex transceiver circuitry. The gigabit transceiver blocks (GXBs) implement the two layers described before in sections 4.2.1 and 4.2.2, the physical layer, herein called physical media attachment (PMA) sublayer, and the data link layer, herein called physical coding sub-layer (PCS). The data path within a gigabit transmitter channel [21] is shown in Figure 4.8 and described in the following paragraphs.

**Transmitter**

The transmitter (TX) presents a parallel data input interface to the FPGA fabric and a serial data interface to the physical output. At the input of the TX, phase differences between the clocks of the FPGA fabric and the transmitter channel’s PCS are compensated by the TX Phase Compensation FIFO.

The transmitter can be operated in two configurations: single-width and double-width configuration. In single-width configuration, the Byte Serializer is simply bypassed. In double-width configuration, the Byte Serializer halves the bus width of the incoming data by doubling the output clock frequency. This allows the transceiver to be run at higher data rates while the clock frequency at the FPGA fabric remains within the specifications of the device [21]. The data is passed to the 8b/10b encoder block, which implements the encoding scheme described in section 4.2.2.

![Figure 4.9: Serializer operated with a 10 bit wide parallel interface. Adapted from [21].](image)

The PMA-layer contains the serializer which is implemented as a shift register, as shown in Figure 4.9. The shift register is loaded with parallel data synchronously to the low-speed parallel clock of the PCS block, and shifted using the high-speed serial clock of the PMA block. The LSB of the shift register is connected to the output buffer. The output buffer, see Figure 4.10, offers circuitry to improve signal integrity, such as programmable differential output voltage $V_{OD}$ to handle different trace lengths, three-tap pre-emphasis to account for frequency dependent attenuation along the transmission channel, and a programmable on-chip termination to match the impedance of the transmission channel [21]. The implementation of pre-emphasis is further described in section 4.3.3.

**Receiver**

The receiver (RX) presents a serial data interface to the physical input and a parallel data output interface to the FPGA fabric. The data path within a fast receiver channel [21] is shown in Figure 4.11.

The receiving data path starts with the input buffer, see Figure 4.12. The input buffer features programmable common mode voltage to match the common mode voltage of the transmitter for DC coupled links, linear
equalization to compensate for frequency dependent attenuation along the
transmission channel, DC gain to compensate for losses equally across the
frequency spectrum and on-chip termination to match the impedance of
the transmission channel [21]. DC gain is implemented as an amplifier with
uniform frequency response. Its gain can be set in steps of 3 dB from 0 dB
to 12 dB [21]. Equalization is implemented as a high frequency amplifier. In
Stratix IV devices, it is optimized for data rates of 6.5 Gb/s, which reduces
the actual gain for significantly lower data rates. The high frequency gain
can be set in a range from 2.6 dB to 17.8 dB [22].

After equalization, the data is presented to the clock and data recovery
(CDR) block, see Figure 4.13. The CDR block is used to recover the clock
from the incoming serial data stream [21]. It consists of a PLL steering a
voltage controlled oscillator (VCO) to recover the high-speed and low-speed
clocks for the PMA and PCS layers. The PLL is trained with a reference
clock using the phase frequency detector (PFD) after reset or when no
valid data signal is present at the input. This operational mode is called
lock-to-reference (LTR). With a valid signal at the input, the CDR switches
to lock-to-data (LTD) mode. The phase detector (PD) tracks the incoming serial data and tunes the VCO through the charge pump to match the incoming data stream.

![Figure 4.12: Receiver input buffer. Taken from [21].](image)

![Figure 4.13: Clock and Data Recovery (CDR) circuitry [21].](image)

The CDR automatically selects the best clock phase to sample the incoming data. Using the EyeQ feature available in Stratix IV transceivers [21], the CDR can be forced to sample the incoming data across any of 32 available phases within one unit interval (UI)$^2$. This feature is useful to evaluate the quality of the incoming data signal at the physical receiver pin after passing through the transmission channel and the equalization circuitry.

The recovered high-speed serial clock is used in the deserializer to load the incoming bit into the MSB of a shift register and shift the register, as shown in Figure 4.14. The parallel data in the shift register is forwarded to the PCS layer using the low-speed parallel clock.
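
The interplay of the two shift registers can be sketched as follows. This is a simplified behavioural model under the assumption of a 10 bit wide parallel interface; it does not model clock domains or any other detail of the actual transceiver hardware.

```python
WIDTH = 10  # assumed width of the parallel interface in bits


def serialize(word, width=WIDTH):
    """Shift a parallel word out of the register, LSB first."""
    bits = []
    for _ in range(width):
        bits.append(word & 1)  # the LSB drives the output buffer
        word >>= 1
    return bits


def deserialize(bits, width=WIDTH):
    """Load every incoming bit into the MSB while shifting towards the LSB."""
    register = 0
    words = []
    for count, bit in enumerate(bits, start=1):
        register = (register >> 1) | (bit << (width - 1))
        if count % width == 0:  # low-speed parallel clock edge
            words.append(register)
    return words


stream = serialize(0x17C) + serialize(0x283)
print([hex(w) for w in deserialize(stream)])
```

The words are only recovered correctly if the receiver starts counting at a word boundary. If the stream is offset by a single bit, the deserializer still produces 10 bit words, but with shifted content.
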

After serialization and deserialization, word boundaries are lost. In the PCS block, the correct bit alignment is restored in the Word Aligner. The Word Aligner searches for a pre-defined data pattern in the data stream and aligns the word boundaries accordingly. In 8b/10b encoding, such patterns are referred to as comma symbols, see section 4.2.2.
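
A comma search of this kind can be sketched on a bit stream given as a string in transmission order. The sketch assumes that the comma sequence (two identical bits followed by five bits of the opposite value) occupies the first seven bits of a comma symbol, as is the case for K.28.5. The function name is hypothetical, and the bit ordering inside the actual Word Aligner may differ.

```python
# Comma sequences in transmission order (assumed position: start of the word)
COMMAS = ("0011111", "1100000")


def find_word_boundary(bitstream, word_len=10):
    """Return the bit offset of the word boundary, or None without a comma."""
    positions = [p for p in (bitstream.find(c) for c in COMMAS) if p >= 0]
    if not positions:
        return None
    return min(positions) % word_len


# Three stray bits, then K.28.5 followed by a balanced data word
stream = "101" + "0011111010" + "1010101010"
offset = find_word_boundary(stream)
print(offset)
print([stream[i:i + 10] for i in range(offset, len(stream) - 9, 10)])
```

Once the offset is known, the stream can be cut into 10 bit words starting at that position, which is the software analogue of the alignment performed in hardware.
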

Deskew FIFOs are only important for transceivers using the 10 Gigabit Attachment Unit Interface (XAUI) standard. Rate Match FIFOs are required in asynchronous systems. In the context of this thesis, both FIFOs are bypassed, as all transceivers are operated synchronously and without the XAUI protocol.

8b/10b Decoder and Byte Deserializer revert the effects of the 8b/10b Encoder and Byte Serializer in the transmitter. In the decoder, the original data is restored. In addition, bit and disparity errors are detected. The Byte Deserializer doubles the bus width at the output by halving the clock frequency.

$^2$ One unit interval equals the bit period, so for a data rate of 1 Gb/s, the unit interval is 1 ns.

Similar to the bit alignment, which has to be restored after deserialization, the order of the bytes within the parallel data bus can be mixed up in the receiver. The correct order of the bytes is restored in the Byte Ordering block using a pre-defined pattern. The data is then ready to be processed in the FPGA fabric, after passing through another FIFO to compensate for phase differences between the clocks of the PCS block and the FPGA fabric.

#### 4.3.3 Pre-emphasis in Stratix IV transmitters

The purpose of pre-emphasis is to compensate inter-symbol interference from nearby data symbols, as described in section 4.2.1. It is realized by delaying and inverting the signal and subsequently adding it to the original signal using a properly chosen weight. The Stratix IV transmitter incorporates a finite impulse response filter with three "taps", that refer to signals after different unit intervals [22]. The transmitter contains four programmable drivers, depicted in Figure 4.15. The main driver, $V_{OD}$, controls the signal’s base amplitude. The other three drivers belong to the pre-emphasis taps: pre-tap, 1st post-tap, and 2nd post-tap.

The 1st post-tap emphasizes the bit period immediately after a transition and de-emphasizes the remaining bits [22], see Figure 4.16. It is typically the most effective tap. The 2nd post-tap de-emphasizes the first two bits after a transition and emphasizes the remaining bits [22]. The opposite can be achieved by inverting the polarity, see Figure 4.17a. The pre-tap

![Figure 4.15: Pre-emphasis circuitry using three taps in Stratix IV transmitters. Taken from [21].](image1)

![Figure 4.16: Effect of the 1st post-tap of the pre-emphasis module in a Stratix IV transmitter.](image2)
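
The principle of delaying, inverting and adding the signal can be sketched as a discrete-time filter acting on NRZ symbols. The tap weights and the sign convention below are hypothetical: a positive weight subtracts the shifted signal, a negative weight corresponds to inverted polarity. The sketch does not reproduce the actual settings of the Stratix IV output buffer.

```python
def pre_emphasis(bits, pre=0.0, post1=0.25, post2=0.0):
    """Three-tap FIR filter on NRZ symbols with hypothetical tap weights."""
    x = [1.0 if b else -1.0 for b in bits]
    last = len(x) - 1

    def at(i):
        # Outside the sequence, the line is assumed to hold the nearest level
        return x[min(max(i, 0), last)]

    return [
        x[k] - pre * at(k + 1) - post1 * at(k - 1) - post2 * at(k - 2)
        for k in range(len(x))
    ]


bits = [0, 0, 1, 1, 1, 0]
print(pre_emphasis(bits))
```

With only the 1st post-tap enabled, the first bit after each transition is transmitted with an increased amplitude, while all following bits of the same value are transmitted with a reduced amplitude.
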
### 4.4 Optical Data Transmission

The bandwidth of optical communication systems exceeds that of copper based architectures [15]. In addition, fibre optical communication is free from interference due to radio waves or any other electromagnetic effects. However, the implementation of high-speed optical transceivers is more costly.

Optical transmitters are typically based on light-emitting diodes (LEDs) or laser diodes. LEDs are implemented in direct band-gap semiconductors as forward biased p-n junctions, that generate light through spontaneous emission by recombination of excess charge carriers [26]. High-speed transmitters rely on laser diodes, which allow for faster switching frequencies, faster signal rise times and higher optical output power. In addition, they emit light with a narrower spectral width [15]. Most commonly, these laser diodes are implemented as vertical-cavity surface-emitting lasers (VCSELs).

Optical receivers are based on semiconductor photodiodes that are typically realized as shallow, reverse-biased p-n or p-i-n junctions. The reverse-bias leads to the formation of a depletion zone, where free electron-hole pairs, created through photon absorption, are separated [26]. The motion of the free charge carriers leads to a sharp increase in the reverse bias current.

In optical links, transmitters and receivers are coupled using optical fibres, which are made of non-conducting and non-magnetic materials. Thus, signal propagation can be derived from Maxwell’s equations [27], following section 3.2. High quality optical fibres are made of glass with graded refractive index [15]. Two types of fibres exist: multi-mode and single-mode fibres. The difference arises due to the different fibre core size [15]. Multi-mode fibres most commonly have core diameters of 50 $\mu$m or 62.5 $\mu$m. The wavelengths used for optical communication are within 700 nm to 1600 nm. The large core leads to multi-mode signal propagation with hundreds of pathways. Each mode has a unique attenuation along the fibre and a different delay, which leads to modal dispersion [15]. In addition, multi-mode fibres suffer from chromatic dispersion, which is due to variations in the refractive index with respect to the wavelength. Chromatic dispersion can be mitigated using lasers with narrow spectral width. Impurities in the optical fibre can lead to attenuation of the signals due to absorption or Rayleigh scattering [28]. Single-mode fibres have much smaller core diameters of 9 $\mu$m to 10 $\mu$m and are operated with wavelengths around 1300 nm [15]. In contrast to multi-mode fibres, only a single mode of propagation develops, which eliminates modal dispersion. Thus, they are better suited for high speed links over long distances. However, due to the small core diameter, single-mode fibres are harder to match between optical devices, which leads to higher cost.

To transfer data over an optical link, the light source, also referred to as carrier, is modulated. Most commonly, binary direct modulation is used [28]. For instance, lasers can be simply switched on and off to transfer a ’1’ or a ’0’, implementing on-off-keying (OOK). In general, higher order modulation schemes, as shown in section 4.6, can be applied. Implementations of phase-shift keying (PSK), frequency-shift keying (FSK) and multilevel modulation schemes in optical transceivers are described in [29].

### 4.5 Wireless Data Transmission

Both types of data transmission discussed so far, electrical and optical, use a direct connection between transmitter and receiver in form of an electrical cable or an optical fibre. For many decades, wireless data transmission has been used in different frequency regimes, for instance for radio broadcasts or mobile communications. New developments of wireless links operated in the mm-wave band (from 30 GHz to 300 GHz) open up possibilities for high-speed wireless data links.

Similar to optical transceivers, data is transferred over a wireless link by modulating a carrier signal. The carrier, however, is created through purely electrical circuitry and not using opto-electronics. Fast voltage controlled oscillators are used to create the carrier frequencies in wireless transceivers [30].

To transmit and receive the signals, antennas are used. Antennas convert guided electromagnetic signals into electromagnetic waves and vice versa [30]. Different types of antennas exist, ranging from dipole antennas as already used in the 19th century by Hertz to patch antennas on printed circuit boards. In the mm-wave band it is even possible to integrate antennas into the transceiver chips.

A wireless channel is not guided through a fibre or a cable and is thus affected by path loss, multi-path propagation, inter-symbol interference and external interference from other wireless communication systems [31]. The free-space path loss (FSPL) is highly dependent on the wavelength $\lambda$ of the wireless link. From the Friis transmission equation [32], it follows to be

$$FSPL = \left(\frac{4\pi r}{\lambda}\right)^2, \quad (4.3)$$

for distances $r \gg \lambda$. The path loss limits the application of wireless data links with high frequencies, and corresponding small wavelengths $\lambda$, to short ranges.
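
As a numerical illustration of equation (4.3), the following minimal sketch expresses the free-space path loss in decibels. The chosen frequencies and distances are examples only and do not refer to a specific measurement in this thesis.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def fspl_db(distance_m, f_hz):
    """Free-space path loss of eq. (4.3) in dB, valid for r >> lambda."""
    wavelength = C_LIGHT / f_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength)


for f in (2.4e9, 60e9):
    for r in (0.1, 1.0, 10.0):
        print(f"f = {f / 1e9:5.1f} GHz, r = {r:5.1f} m: {fspl_db(r, f):5.1f} dB")
```

At 60 GHz, the path loss over a distance of 1 m already amounts to about 68 dB, and every doubling of the distance adds roughly 6 dB.
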

### 4.6 Digital Modulation Schemes

Optical and wireless data transmission use modulation of a carrier wave to convey information. In general, a carrier sinusoid wave $c(t)$, of the form

$$c(t) = A \cdot \cos \left(2\pi f_c t + \theta\right), \quad (4.4)$$

has three parameters that can be modulated: the amplitude $A$, the frequency $f_c$ and the phase $\theta$, following [29]. Individual modulation of these parameters is implemented in the subsequently described schemes.

#### 4.6.1 Amplitude-shift keying

In amplitude-shift keying (ASK), the message is encoded within the amplitude, such that

$$A(t) = k_\text{a}m(t), \quad (4.5)$$

with $k_\text{a}$ defining the maximum amplitude and $m(t)$ taking discrete values for digital modulation. In the binary case, for $m(t) = 0, 1$, the modulation scheme is known as on-off-keying, see Figure 4.18a.

#### 4.6.2 Phase-shift keying

Similarly, the information can be encoded in the phase $\theta$ of the carrier signal, see Figure 4.18b, such that

$$\theta(t) = k_\text{p}m(t), \quad (4.6)$$

with $k_\text{p}$ being the phase sensitivity [29]. In the binary, digital case, the modulation scheme is referred to as binary phase-shift keying (BPSK).
#### 4.6.3 Frequency-shift keying

In FSK, the message is conveyed in the frequency $f$ of the carrier. To prevent phase discontinuities in the carrier, it is typically implemented as continuous phase frequency-shift keying (CPFSK), see Figure 4.18c. The carrier wave is translated to

$$c(t) = A \cdot \cos(\Phi(t)), \quad (4.7)$$

where the argument of the cosine is given by

$$\Phi(t) = 2\pi f_i t. \quad (4.8)$$

The frequencies $f_i$ used for modulation are given by

$$f_i = f_c + k_f m_i, \quad (4.9)$$

where $k_f$ is the frequency modulation index and $m_i$ the modulation step. Phase discontinuities are avoided by choosing the frequencies such that the phases of all frequencies accumulated over a bit interval are an integer multiple of $2\pi$ [29].
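
The three binary schemes can be compared in a minimal sketch that samples the modulated carrier for a given bit sequence. The bit duration is normalised to one, and the carrier frequency, the frequency step and the phase sensitivity of $\pi$ are hypothetical values chosen such that every frequency completes an integer number of cycles per bit.

```python
import math


def modulate(bits, scheme, f_c=4.0, k_f=1.0, samples_per_bit=100, amplitude=1.0):
    """Sample the modulated carrier; time is measured in units of the bit period."""
    samples = []
    for i, bit in enumerate(bits):
        for s in range(samples_per_bit):
            t = i + s / samples_per_bit
            if scheme == "ook":      # eq. (4.5) with m(t) = 0, 1
                value = amplitude * bit * math.cos(2 * math.pi * f_c * t)
            elif scheme == "bpsk":   # eq. (4.6) with k_p = pi
                value = amplitude * math.cos(2 * math.pi * f_c * t + math.pi * bit)
            elif scheme == "cpfsk":  # eqs. (4.7) to (4.9)
                f_i = f_c + k_f * bit
                value = amplitude * math.cos(2 * math.pi * f_i * t)
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            samples.append(value)
    return samples


bits = [0, 1, 1, 0]
for scheme in ("ook", "bpsk", "cpfsk"):
    wave = modulate(bits, scheme)
    print(scheme, [round(wave[i * 100], 3) for i in range(len(bits))])
```

Because both frequencies in the CPFSK case are integer multiples of the inverse bit period, the carrier passes through the same phase at every bit boundary, so the waveform is continuous when the frequency changes.
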
#### 4.6.4 Multi-level modulation schemes

All previously discussed modulation schemes have in common that the spectral efficiency, the number of bits that are transferred per second and unit of bandwidth, is relatively low. Higher order modulation schemes have intrinsically higher spectral efficiencies, which allows for higher data rates over the same bandwidth. According to the Shannon-Hartley theorem [33], the channel capacity is limited by the signal-to-noise ratio. Consequently, for higher order modulation schemes, higher signal-to-noise ratios are required.
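
The relation between spectral efficiency and signal-to-noise ratio can be made explicit with the Shannon-Hartley theorem, $C = B \log_2(1 + S/N)$. The following sketch evaluates it for a few example values; the function names are hypothetical, and the result is a theoretical lower bound on the required signal-to-noise ratio, not a statement about a particular modulation scheme.

```python
import math


def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley channel capacity in bit/s."""
    return bandwidth_hz * math.log2(1 + snr_linear)


def min_snr_db(spectral_efficiency):
    """Minimum SNR in dB needed for a spectral efficiency in bit/s/Hz."""
    return 10 * math.log10(2 ** spectral_efficiency - 1)


for eta in (1, 2, 4, 6):
    print(f"{eta} bit/s/Hz requires an SNR of at least {min_snr_db(eta):.2f} dB")
```

Each additional bit per second and hertz roughly doubles the required linear signal-to-noise ratio, which is why higher order modulation schemes are more demanding in terms of link quality.
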

Spectral efficiency can be increased by using multiple levels instead of the binary two, for instance in PAM4 modulation, as described in section 4.2.1. Quadrature amplitude modulation (QAM) is an example of a higher order modulation scheme, implementing both amplitude and phase modulation. If both phase and amplitude can take 4 different values, 4 bits can be transferred per symbol. Higher orders are implemented accordingly. QAM is often implemented in wireless transceivers using IQ-mixers, where the carrier is split into two orthogonal waves with a phase shift of 90°, spanning a 2-dimensional space, called IQ-space, see Figure 4.19a. Symbols are represented as discrete points within this space.

A special case of frequency-shift keying, that can be implemented using IQ-mixers, is minimum-shift keying (MSK). In MSK, a logical '0' or '1' is represented in IQ-space by a 90° rotation either clock- or counter-clockwise, see Figure 4.19b. MSK has a higher spectral efficiency than typical CPFSK.

4.7 Testing high-speed links

The implementation of high-speed links requires thorough testing in order to guarantee proper data transfer, especially when no means of error correction are available. The quality of high-speed links can be tested in two ways. Using fast oscilloscopes, the analogue electrical signal can be observed. Overlaying many waveforms with different patterns generates so-called eye diagrams, illustrated in Figure 4.20. The quality of data transmission can be deduced from different parameters. The $jitter_{RMS}$ quantifies the standard deviation of the signal at the signal transition. The $noise_{RMS}$ quantifies the standard deviation of the signal at its high or low level. The amplitude is the difference between the average high and low level. The signal-to-noise ratio $SNR$ is computed by dividing the amplitude by the noise RMS.

Additional insight can be gained by using the eye parameters. The eye height is a measure of how noise affects the vertical opening between the high and low voltage levels. In the DSA8300 serial analyzer by Tektronix [34], which is used to perform eye diagram analyses within this thesis, the eye height $EH$ is defined as

$$EH = (High - 3 \cdot noise_{RMS,high}) - (Low + 3 \cdot noise_{RMS,low}), \quad (4.10)$$

where $High$ and $Low$ are the average voltage levels at the logical ‘1’ and ‘0’ states and $noise_{RMS,high}$ and $noise_{RMS,low}$ are their standard deviations.

The eye width $EW$ is a measure of how jitter affects the horizontal eye opening between two consecutive crossings [34], defined as

$$EW = (TCross2 - 3 \cdot jitter_{RMS,2}) - (TCross1 + 3 \cdot jitter_{RMS,1}), \quad (4.11)$$

where $TCross1$ and $TCross2$ are the mean of the histograms of the two transitions and $jitter_{RMS,1}$ and $jitter_{RMS,2}$ are their standard deviations.
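
Equations 4.10 and 4.11 can be evaluated directly from the measured levels and standard deviations. The following is a minimal sketch; the function names and the numerical example values are assumptions made for illustration and are not measurements from this thesis.

```python
def eye_height(high, low, noise_rms_high, noise_rms_low):
    """Eye height following equation (4.10): 3-sigma margins on both levels."""
    return (high - 3 * noise_rms_high) - (low + 3 * noise_rms_low)


def eye_width(t_cross1, t_cross2, jitter_rms1, jitter_rms2):
    """Eye width following equation (4.11): 3-sigma margins on both crossings."""
    return (t_cross2 - 3 * jitter_rms2) - (t_cross1 + 3 * jitter_rms1)


# Hypothetical example: 400 mV swing with 20 mV noise RMS on each level,
# 800 ps unit interval with 15 ps jitter RMS at each crossing
print(eye_height(high=0.4, low=0.0, noise_rms_high=0.02, noise_rms_low=0.02))
print(eye_width(t_cross1=0.0, t_cross2=800e-12,
                jitter_rms1=15e-12, jitter_rms2=15e-12))
```

In this picture, a negative result means that the 3-sigma bands overlap, so the eye is considered closed.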

Additionally, the link quality can be tested by measuring bit error rates. At the transmitter, typically a pseudo-random data pattern is generated and sent over the link. At the receiver, due to the deterministic nature of the pseudo-random data, the incoming data can be predicted, which allows bit errors to be identified. For a data rate $r$ and a time interval $t$, the number of transferred bits $N_{bits}$ is

$$N_{bits} = r \cdot t. \quad (4.12)$$

From the number of observed bit errors $N_{err}$, the bit error rate (BER), the probability that a bit error occurs, can be deduced:

$$BER = \frac{N_{err}}{N_{bits}}. \quad (4.13)$$

In case of the absence of bit errors, only upper limits for the bit error rate can be stated. To that end, a Bayesian approach with flat prior is used for a Poissonian distribution, according to [35]. For a confidence level (CL) of 95%, one arrives at an upper limit for the bit error rate of

$$BER \leq \frac{-\ln(1 - CL)}{N_{bits}} \approx \frac{3}{N_{bits}}. \quad (4.14)$$
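
The upper limit for an error-free measurement can be evaluated numerically as a small sketch. It uses $-\ln(1 - CL)$, which is positive and close to 3 for a confidence level of 95%. The function names and the example link parameters are illustrative assumptions.

```python
import math


def transferred_bits(rate_bps, time_s):
    """Number of transferred bits, N_bits = r * t (equation 4.12)."""
    return rate_bps * time_s


def ber_upper_limit(n_bits, confidence_level=0.95):
    """Upper limit on the BER when no bit errors were observed.

    Assumes zero observed errors, Poisson statistics and a flat prior.
    """
    return -math.log(1.0 - confidence_level) / n_bits


# Hypothetical example: a 1.25 Gb/s link running error-free for one hour
n_bits = transferred_bits(1.25e9, 3600)
print(n_bits)
print(ber_upper_limit(n_bits))
```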

The hunt for new physics is performed at the particle energy and intensity frontiers. At the energy frontier, experiments search for resonances in the spectra of secondary particles that originate from the direct production of new heavy particles. At the intensity frontier, experiments search for the smallest deviations from Standard Model expectations that could lead the way to new physics. Rare decays of the Standard Model are an ideal probe as their branching ratios can be highly affected by contributions from new physics.

The Mu3e experiment is a precision experiment at the intensity frontier. An ultra-thin silicon tracking detector is the experiment’s key component in the search for the ultra-rare decay $\mu \rightarrow eee$. Track rates of more than $10^9$ per second, required to reach an unprecedented sensitivity for the decay’s branching ratio down to $10^{-16}$, pose challenges not only to the sensors, but also to the data acquisition system.

It is well established that lepton flavour is not conserved in nature. Experiments like Super-Kamiokande [36], SNO [37] and KamLAND [38] have observed lepton flavour violation through neutrino mixing. Searches for lepton flavour violation of charged leptons, however, have not succeeded so far. Most prominently, charged lepton flavour violation is searched for in decays of the muon. The most sensitive experiments conducted in the muon sector up to now, like MEG, searching for the decay $\mu^+ \rightarrow e^+ \gamma$, or SINDRUM, searching for the decay $\mu^+ \rightarrow e^+ e^- e^+$, could only set limits on the respective branching ratios: $B(\mu \rightarrow e\gamma) < 4.2 \times 10^{-13}$ [39] and $B(\mu \rightarrow eee) < 1.0 \times 10^{-12}$ [40], both at 90% confidence level.

5.1 The Mu3e Experiment

The Mu3e experiment [41–43] at Paul Scherrer Institut (PSI) is a precision experiment at the intensity frontier dedicated to the search for charged lepton flavour violation in the channel $\mu^+ \rightarrow e^+e^-e^+$ with an unprecedented sensitivity. In the absence of a signal, the limit on the branching ratio is targeted to be improved by four orders of magnitude down to $B < 1.0 \times 10^{-16}$ at a confidence level of 90%.

Figure 5.1: Feynman diagrams of signal and background processes. (a) The decay $\mu \rightarrow eee$ through neutrino mixing. (b) Irreducible background $\mu \rightarrow eee\nu\nu$ through photon conversion. (c) The decay $\mu \rightarrow eee$ through SUSY particles at loop-level. (d) The decay $\mu \rightarrow eee$ through a flavour violating $Z'$ boson at tree-level.

The decay channel $\mu \rightarrow e^+e^-e^+$ allows precision measurements to be performed in a very clean environment. Including neutrino mixing into Standard Model physics (Figure 5.1a), the expected branching ratio is well below experimental reach ($B(\mu \rightarrow eee) < 10^{-50}$) [44]. Potential signal candidates are motivated by a variety of physics theories. Supersymmetric (SUSY) particles (Figure 5.1c) could enhance the branching ratio. New bosons could mediate a lepton flavour violating interaction [45] (Figure 5.1d) that could raise the branching ratio to a measurable level.

Several challenges have to be overcome for the experiment to reach its ultimate sensitivity. The required separation power between signal and background events dictates the detector requirements. To distinguish the signal from the irreducible background due to radiative muon decays with internal photon conversion, $\mu^+ \rightarrow e^+e^-e^+\nu_e\bar{\nu}_\mu$ (Figure 5.1b), with such a high precision, a detector with an average momentum resolution better than 1.0 MeV is required, see Figure 5.2. To suppress accidental background composed of combinations of Michel decays, radiative decays and Bhabha scattering, good vertex and time resolutions are required.

The experimental concept, depicted in Figure 5.3, is to stop a beam of muons on a Mylar target and to measure the trajectories of the decay electrons in a solenoidal magnetic field of 1 T using a silicon pixel tracking detector. In the momentum regime of these electrons, momentum and vertex resolution are predominantly deteriorated by multiple Coulomb scattering in the detector material. Thus, the detector’s material has to be minimized. Ultra-thin pixel layers relying on high-voltage monolithic active pixel sensors are used. The pixel sensors, called MuPix, are further described in section 5.4.

The central pixel detector consists of four radial layers around the target. These four layers are subdivided into inner and outer pixel layers. The two inner pixel layers consist of 8 and 10 detector ladders, see Figure 5.4, with 6 pixel sensors each. Each layer is split into two modules, which combine 4 and 5 ladders, respectively. The two outer pixel layers consist of 24 and 28 ladders with 17 and 18 sensors, respectively. The layers are split into 6 and 7 modules, respectively, with each module combining 4 ladders. Up- and downstream, so-called recurl pixel layers, which are copies of the outer pixel layers of the central detector, are installed to enhance the momentum resolution by increasing the lever arm of the tracks.
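
The sensor counts implied by this geometry can be tallied with a short sketch. The ladder and sensor numbers are those quoted above; the assumption that there are exactly two recurl stations (one upstream, one downstream), each a copy of layers 3 and 4, and all names in the code are illustrative.

```python
# (ladders, sensors per ladder) for the four central layers, as quoted in the text
CENTRAL_LAYERS = {1: (8, 6), 2: (10, 6), 3: (24, 17), 4: (28, 18)}

# Assumption: one upstream and one downstream recurl station,
# each a copy of the outer layers 3 and 4
RECURL_STATIONS = 2


def sensors_per_layer(layers=CENTRAL_LAYERS):
    """Number of sensors in each central layer."""
    return {layer: ladders * per_ladder
            for layer, (ladders, per_ladder) in layers.items()}


def total_sensors():
    """Sensors in the central detector plus all recurl stations."""
    per_layer = sensors_per_layer()
    central = sum(per_layer.values())
    recurl = RECURL_STATIONS * (per_layer[3] + per_layer[4])
    return central + recurl


print(sensors_per_layer())
print(total_sensors())
```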

In the central detector region, a scintillating fibre detector is used for timing measurements. A time resolution of 500 ps per track is targeted. At the recurl stations, timing detectors in form of scintillating tiles are used, adding the most precise time information to the tracks that reach the recurl stations. Both timing detectors utilize silicon photomultipliers (SiPMs) and a common readout application specific integrated circuit (ASIC), the MuTRiG. Details on the fibre detector can be found in [43, 46, 47]. Details on the tile detector can be found in [43, 48]. The MuTRiG is described in detail in [49–51].

The experiment is planned to be conducted in two phases. In the first phase, the πe5 beam line at PSI provides muon rates of the order of $10^8$ muons per second. In the second phase, beam rates of more than $10^9$ muons per second are required. These beam rates will be provided by a future high intensity muon beam line that is currently under study at PSI [52]. The high beam rates pose an additional challenge to the sensors and to the readout system.

5.2 Readout Architecture of the Mu3e Experiment

The sensitivity goal of the Mu3e experiment demands the observation of more than $10^{16}$ muon decays. Due to the large number of decays, storage of the full event information is not feasible for all events. As most events originate from background processes, anyway, it is also not necessary. Thus, for detailed offline analysis, the complete detector data is stored only for interesting signal and signal-like events that are selected online.

The readout architecture of the Mu3e experiment is optimized for an efficient event selection in real time. The detectors provide un-triggered, zero-suppressed hit position and timing data. This data is processed in three subsequent stages, see Figure 5.5. At the front-end, the sensors are continuously read out and provided with a clock and their configuration. Furthermore, the chronological order of the incoming detector information is restored at the front-end. On the switching boards, the data of several front-ends is collected and merged into data packets containing the complete, chronologically ordered detector information. These packets are distributed to the online event filter farm, where the tracks of all events are reconstructed. Events are finally selected for storage based on momentum and vertex constraints applied to the reconstructed tracks.

Figure 5.5: General readout architecture for the Mu3e experiment. The number of arrows symbolizes the number of links between the different stages.

5.3 Expected Data Rates

The Mu3e experiment utilizes two types of sensor and readout ASICs, the MuPix for the tracking detector and the MuTRiG for the timing detector. Both ASICs generate un-triggered, zero-suppressed data. The requirements on the readout system’s bandwidth are directly linked to the expected detector occupancy. For Phase I of the experiment, full detector simulations have been performed assuming a rate of stopped muons of $1 \cdot 10^8$ Hz [43].

### 5.3.1 Data Rates at the Front-end

For the pixel detector, simulated hit and corresponding data rates are compiled in Table 5.1, excluding noise contributions. The complete pixel detector is expected to deliver a total hit rate of $1.045 \cdot 10^3$ MHz. A single pixel hit on the sensor is expressed in a 32 b word, following the implementation of the current prototype MuPix8. It is composed of a 16 b address (8 b column and 8 b row position), a 10 b time stamp and a 6 b amplitude information. Thus, the expected data rate of the whole pixel detector is about $33.4 \text{ Gb/s}$. This does not yet include the link protocol. The pixel sensors utilize 8b/10b encoding, which produces an overhead of 25%. This yields a total readout bandwidth from the sensors to the front-end FPGAs of 41.8 Gb/s.

Peak occupancies are equally important as average occupancies. The sensors closest to the target are exposed to a maximum hit rate of 5.2 MHz, resulting in a maximum data rate of 166.4 Mb/s per sensor.
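
The conversion from hit rates to data rates used above is simple arithmetic and can be written as a short sketch. The 32 b hit word and the 25% overhead of 8b/10b encoding are taken from the text; the function and constant names are illustrative assumptions.

```python
HIT_WORD_BITS = 32           # one pixel hit, following the MuPix8 implementation
ENCODING_OVERHEAD = 10 / 8   # 8b/10b encoding sends 10 line bits per 8 data bits


def data_rate_gbps(hit_rate_mhz, bits_per_hit=HIT_WORD_BITS):
    """Raw data rate in Gb/s for a given hit rate in MHz."""
    return hit_rate_mhz * 1e6 * bits_per_hit / 1e9


def with_8b10b(rate_gbps):
    """Link bandwidth required once 8b/10b encoding is applied."""
    return rate_gbps * ENCODING_OVERHEAD


total = data_rate_gbps(1045)     # full pixel detector, Phase I
busiest = data_rate_gbps(5.2)    # busiest sensor, Phase I
print(total, with_8b10b(total))  # compare with 33.4 Gb/s and 41.8 Gb/s
print(busiest * 1e3)             # in Mb/s, compare with 166.4 Mb/s
```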

Contributions from electronics noise are left out in the considerations above. Depending on the noise rate per pixel, $R_{\text{noise,pix}}$, a more or less significant amount of data has to be transferred in addition. The resulting data rates for average noise rates of 0.1 Hz, 1.0 Hz and 10.0 Hz are compiled in Table 5.2. For noise rates above 1.0 Hz, noise data starts to contribute significantly to the total data rate. For noise rates of 10 Hz and above, noise data dominates. For the following considerations, noise rates are expected to be below 1 Hz per pixel and, thus, subsequently neglected.

Table 5.1 shows the number of front-end FPGAs connected to each pixel layer. Two FPGAs are used per pixel module\(^1\). Thus, for layer 1 and 2, each FPGA is connected to 12 and 15 sensors, respectively. For the outer layers and the recurl station layers, each FPGA is connected to either 32 or 36 sensors.

The timing detectors use a common readout ASIC, the MuTRiG [49], that can be operated in two configurations. In standard configuration an event is represented using 48 b. In short configuration the energy information is dropped, which reduces the size of an event to 27 b.

For the scintillating fibre detector, the MuTRiG is operated in short configuration. A total data rate of about 61.9 Gb/s is expected from simulations [46]. This includes the raw event rate as well as the dark count rate, with the latter constituting a significant part of the data as the SiPMs are foreseen to be run with a threshold of 0.5 photoelectrons. Including 8b/10b encoding, the fibre detector requires around 77.4 Gb/s of bandwidth.

For the scintillating tile detector, the MuTRiG is operated in standard configuration. The SiPMs deployed can be operated with higher thresholds. Thus, dark counts are not expected to contribute as substantially to the data rate as for the fibre detector. From simulations, a hit rate for the whole detector of about 145 MHz is expected. Leaving a significant margin for noise, a total rate of about 200 to 250 MHz is assumed [53]. The latter would result in a data rate of 12.0 Gb/s, with about 7.0 Gb/s stemming from actual particle interactions. Including 8b/10b encoding, the tile detector requires around 15.0 Gb/s of bandwidth.

### 5.3.2 Data rates between front-end and switching boards

The front-end FPGAs send packets of chronologically sorted hit information to the switching boards via optical links. Table 5.3 shows a potential structure for the packets for the pixel detector. In this scheme, a packet covers 16 timestamps with up to 63 hits per timestamp. For each timestamp, a single bit is used as overflow flag to indicate if more than 63 hits were present and, hence, hits had to be discarded. As hits are ordered by their timestamp, and the number of hits per timestamp is indicated in the header, the timestamp itself can be removed from the hit representation. Instead, a 6 b chip address is added to identify each of the up to 45 connected links unambiguously. Thus, a single hit can be expressed in a 28 b word at this stage of the readout.

Applying this data structure, the data rates between the front-end FPGAs and the switching boards can be inferred. The average number of hits per timestamp at the front-end FPGA is derived from Table 5.1. Hits are represented in 28 b, as described above, and the overhead due to the packet structure is 208 b, see Table 5.3. Packets containing zero hits are forwarded to the switching board, in order to ease packet merging. This yields a base data rate of 1.625 Gb/s due to packet overhead. For the two innermost layers, the resulting average data rate per FPGA is around 3 Gb/s, having a payload\(^2\) of 45.6 %. The average data rates of the FPGAs connected to the outer two central layers are 2.12 Gb/s to 2.25 Gb/s with an average payload of 23.4 % to 27.6 %. The FPGAs connected to the recurl layers mainly send overhead, as the average payload is below 6 %. Data rates, average payload and the average number of hits per timestamp are compiled in Table 5.4.

---

\(^1\) See section 5.1 for the definition of the pixel modules.

\(^2\) In this context, payload is defined as the number of bits containing hit data versus the total number of bits per packet.

Table 5.3: Packet structure for hit data at the front-end. A packet contains up to 63 hits per timestamp. The highlighted rows indicate the hit information, referred to as payload of the packet.

<table>
<thead>
<tr><th>Bits</th><th>Value</th></tr>
</thead>
<tbody>
<tr><td>48</td><td>Start marker</td></tr>
<tr><td></td><td>FPGA timestamp</td></tr>
<tr><td></td><td>Hit block counter</td></tr>
<tr><td>$16 \times 6$</td><td>Number of hits per timestamp</td></tr>
<tr><td>16</td><td>Timestamp overflow flags</td></tr>
<tr><td>$N \times 28$</td><td>N hits, composed of:</td></tr>
<tr><td></td><td>8b row address</td></tr>
<tr><td></td><td>8b column address</td></tr>
<tr><td></td><td>6b time over threshold</td></tr>
<tr><td></td><td>6b chip address</td></tr>
<tr><td>48</td><td>End marker + CRC</td></tr>
<tr><td>$208 + N \times 28$</td><td>Total packet size</td></tr>
</tbody>
</table>

Table 5.4: Readout requirements for the Mu3e pixel detector operated at Phase I, from the front-end FPGAs to the switching boards (SBs). The front-end FPGAs connected to the layers of the recurl stations are labelled according to their location inner/outer (I/O) and up- and downstream (U/D). Noise contributions are neglected. No further protocol is implied. In the last column, the number of connected switching boards is specified.

<table>
<thead>
<tr><th>layer</th><th>avg hits /8ns</th><th>avg size hit block /FPGA 208+ [b]</th><th>data rate /FPGA [Gb/s]</th><th>payload [%]</th><th>data rate /layer [Gb/s]</th><th>SBs</th></tr>
</thead>
<tbody>
<tr><td>1 + 2</td><td>0.389</td><td>174.3</td><td>2.99</td><td>45.6</td><td>23.9</td><td>1</td></tr>
<tr><td>3</td><td>0.177</td><td>79.4</td><td>2.25</td><td>27.6</td><td>26.9</td><td>1</td></tr>
<tr><td>4</td><td>0.142</td><td>63.5</td><td>2.12</td><td>23.4</td><td>29.7</td><td>1</td></tr>
<tr><td>Recurl IU</td><td>0.027</td><td>12.2</td><td>1.72</td><td>5.6</td><td>20.6</td><td>1</td></tr>
<tr><td>Recurl OU</td><td>0.025</td><td>11.3</td><td>1.71</td><td>5.1</td><td>24.0</td><td>1</td></tr>
<tr><td>Recurl ID</td><td>0.019</td><td>8.4</td><td>1.69</td><td>3.9</td><td>20.3</td><td>1</td></tr>
<tr><td>Recurl OD</td><td>0.017</td><td>7.4</td><td>1.68</td><td>3.4</td><td>23.6</td><td>1</td></tr>
<tr><td>Avg</td><td>0.097</td><td>43.5</td><td>1.97</td><td></td><td></td><td></td></tr>
<tr><td>Total</td><td>8.360</td><td></td><td></td><td></td><td>169</td><td></td></tr>
</tbody>
</table>
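
The per-FPGA data rates and payload fractions of Table 5.4 follow from the packet structure of Table 5.3. The sketch below reproduces this arithmetic under the assumption that one timestamp corresponds to 8 ns, as suggested by the column header "avg hits /8ns"; the function and constant names are illustrative.

```python
HEADER_BITS = 208            # packet overhead, see Table 5.3
HIT_BITS = 28                # one hit at this stage of the readout
TIMESTAMPS_PER_PACKET = 16
TIMESTAMP_NS = 8.0           # assumption: one timestamp spans 8 ns


def packet_rate(avg_hits_per_timestamp):
    """Return (data rate in Gb/s, payload fraction) for one front-end FPGA."""
    payload_bits = avg_hits_per_timestamp * TIMESTAMPS_PER_PACKET * HIT_BITS
    total_bits = HEADER_BITS + payload_bits
    duration_ns = TIMESTAMPS_PER_PACKET * TIMESTAMP_NS
    rate_gbps = total_bits / duration_ns  # bits per nanosecond equals Gb/s
    return rate_gbps, payload_bits / total_bits


print(packet_rate(0.0))    # empty packets: overhead only
print(packet_rate(0.389))  # layers 1 + 2 in Phase I
```

With zero hits only the overhead of 1.625 Gb/s remains, and the occupancy of layers 1 and 2 gives values that round to the 2.99 Gb/s and 45.6 % quoted in the table.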

As for the pixel detector, hits in the scintillating fibre and tile detectors have to be ordered chronologically to facilitate an efficient online reconstruction. As a large part of the fibre data is due to dark counts, a cut on the hit multiplicity or a clustering algorithm is being considered for implementation on the front-end FPGA. This would reduce the data rate to the switching board by at least a factor of 3 [46, 47].

For the tile detector, time walk compensation might be implemented on the front-end, which could reduce the data rate significantly by a factor of 1.78 by dropping the energy information. If noise hits can be completely cancelled by applying an energy cut on the front-end, the total data reduction factor could be larger than two. Therefore, the tile detector does not pose a challenge to the bandwidth towards the switching boards.

### 5.3.3 Data rates between switching boards and filter farm

The FPGAs on the switching boards receive the data from the front-end FPGAs and merge it. For Phase I of the experiment, one switching board is connected to the central detector, one to the upstream and one to the downstream recurl stations, each. The same data structure is applied as before on the front-end, see Table 5.3. The average hits per timestamp, the average hit block size and the resulting data rate from the switching boards towards the filter farm are summarized in Table 5.5.

The switching board connected to the central detector is expected to deliver the highest data rate, about 32.3 Gb/s and thus about 80% of the total pixel detector data.

The bandwidth of the switching boards connected to the upstream and downstream recurl stations has to be shared with the tile detector, which does not pose a challenge as both detectors do not require more than a few Gb/s. For the fibre detector, information from both ends of the ribbons is combined on a separate switching board to reduce the detector’s data rate even further. It is expected to end up well below 40 Gb/s [46, 47].

<table>
<thead>
<tr><th>layer</th><th>avg hits /8ns</th><th>avg size hit block /FPGA</th><th>data rate /SB to farm [Gb/s]</th><th>payload [%]</th></tr>
</thead>
<tbody>
<tr><td>Inner 1-4</td><td>7.22</td><td>3930</td><td>32.3</td><td>95.0</td></tr>
<tr><td>Recurl U</td><td>0.68</td><td>370</td><td>4.5</td><td>64.0</td></tr>
<tr><td>Recurl D</td><td>0.46</td><td>248</td><td>3.6</td><td>54.4</td></tr>
<tr><td>Total</td><td>8.36</td><td>4548</td><td>40.4</td><td></td></tr>
</tbody>
</table>

Table 5.5: Readout requirements for the Mu3e pixel detector operated at Phase I, from the switching boards to the filter farm. The switching FPGAs connected to the layers of the recurl stations are labelled according to their location up- and downstream (U/D). Noise contributions are neglected. No further protocol is implied.

### 5.3.4 Phase II - data rates of the pixel detector

For Phase II of the Mu3e experiment, a muon stopping rate of \(2 \cdot 10^9\) Hz is targeted. Hereinafter, only the pixel detector is considered. The same detector configuration as for Phase I is assumed. To arrive at the data rates expected for Phase II, the hit rates of Phase I are scaled up by a factor of 20.

The total data rate of the full pixel detector scales to 668 Gb/s, see Table 5.6. Including 8b/10b encoding, the required readout bandwidth amounts to 835 Gb/s. The maximum hit rate of the busiest sensors increases to 104 MHz\(^3\).

To transfer the data from the front-end FPGAs to the switching boards, the same packet structure can be applied as for Phase I. The resulting data rates are summarized in Table 5.7. The maximum optical bandwidth increases for the FPGAs connected to the central layers to 28.9 Gb/s. Applying 8b/10b encoding, it rises to 36.1 Gb/s. To account for the increased total data rate of 725 Gb/s, three more switching boards have to be used for the readout of the central detector.

\(^3\) The peak occupancy will be reduced due to geometrical changes of the detector. The target is planned to be elongated in order to increase the spatial spread of the vertices. This, however, requires a re-design of the inner pixel layers and a separate, dedicated study, which is not within the scope of this thesis. The maximum hit rate is estimated to be of the order of 80 MHz in the detector configuration for Phase II.

<table>
<thead>
<tr><th>layer</th><th>sensors (links)</th><th>max hit rate /chip [MHz]</th><th>hit rate /layer [MHz]</th><th>max data rate /chip [Mb/s]</th><th>data rate /layer [Gb/s]</th><th>front-end FPGAs</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>48 (144)</td><td>104</td><td>3880</td><td>3328</td><td>124</td><td></td></tr>
<tr><td>2</td><td>60 (180)</td><td>104</td><td>3900</td><td>3328</td><td>124</td><td></td></tr>
<tr><td>3</td><td>408 (408)</td><td>24</td><td>5320</td><td>768</td><td>170</td><td></td></tr>
<tr><td>4</td><td>504 (504)</td><td>24</td><td>4960</td><td>768</td><td>158</td><td></td></tr>
<tr><td>Recurl IU</td><td>408 (408)</td><td>3.0</td><td>820</td><td>96</td><td>26</td><td></td></tr>
<tr><td>Recurl OU</td><td>504 (504)</td><td>2.8</td><td>880</td><td>90</td><td>28</td><td></td></tr>
<tr><td>Recurl ID</td><td>408 (408)</td><td>2.2</td><td>560</td><td>70</td><td>18</td><td></td></tr>
<tr><td>Recurl OD</td><td>504 (504)</td><td>2.0</td><td>580</td><td>64</td><td>18</td><td></td></tr>
<tr><td>Total</td><td>2844 (3060)</td><td></td><td>20900</td><td></td><td>668</td><td>86</td></tr>
</tbody>
</table>

Table 5.6: Readout requirements for the Mu3e pixel detector operated at Phase II, from the sensors to the front-end FPGAs. The recurl station layers are labelled according to their location inner/outer (I/O) and up- and downstream (U/D). A hit is represented using 32 b. No further protocol is implied. In the last column, the number of connected front-end FPGAs is specified.

The structure of the packets sent from the switching boards to the filter farm should be adapted to accommodate on average 36.1 hits per timestamp and to leave some margin for peaking hit rates. Increasing the header to account for up to 255 hits per timestamp seems appropriate. So for every timestamp, the hit counter is enlarged from 6 b to 8 b, increasing the width of the header to 240 b. The resulting data rates that are expected between switching boards and filter farm are compiled in Table 5.8. The data rates for each of the four switching boards of the central detector amount to 155 Gb/s.

Table 5.7: Readout requirements for the Mu3e pixel detector operated at Phase II, from the front-end FPGAs to the switching boards. The front-end FPGAs connected to the recurl station layers are labelled by inner/outer (I/O) and up- and downstream (U/D).

<table>
<thead>
<tr><th>layer</th><th>avg hits /8ns</th><th>avg size hit block /FPGA 208+ [b]</th><th>data rate /FPGA [Gb/s]</th><th>payload [%]</th></tr>
</thead>
<tbody>
<tr><td>1 + 2</td><td>7.78</td><td>3485</td><td>28.9</td><td>94.4</td></tr>
<tr><td>3</td><td>3.55</td><td>1589</td><td>14.0</td><td>88.5</td></tr>
<tr><td>4</td><td>2.83</td><td>1270</td><td>11.5</td><td>86.0</td></tr>
<tr><td>Recurl IU</td><td>0.55</td><td>245</td><td>3.5</td><td>55.8</td></tr>
<tr><td>Recurl OU</td><td>0.50</td><td>225</td><td>3.4</td><td>53.1</td></tr>
<tr><td>Recurl ID</td><td>0.37</td><td>167</td><td>2.9</td><td>47.1</td></tr>
<tr><td>Recurl OD</td><td>0.33</td><td>148</td><td>2.8</td><td>42.5</td></tr>
<tr><td>Avg</td><td>1.94</td><td>871</td><td>8.43</td><td></td></tr>
<tr><td>Total</td><td colspan="4">725 Gb/s, 6 switching boards</td></tr>
</tbody>
</table>

Table 5.8: Readout requirements for the Mu3e pixel detector operated at Phase II, from the switching boards to the filter farm. The switching FPGAs connected to the recurl station layers are labelled by up- and downstream (U/D).

<table>
<thead>
<tr><th>layer</th><th>avg hits /8ns</th><th>avg size hit block [b]</th><th>data rate /SB to farm [Gb/s]</th></tr>
</thead>
<tbody>
<tr><td>Inner 1-4</td><td>36.1</td><td>19649</td><td>155</td></tr>
<tr><td>Recurl U</td><td>13.6</td><td>7398</td><td>59.7</td></tr>
<tr><td>Recurl D</td><td>9.1</td><td>4961</td><td>40.6</td></tr>
<tr><td>Total</td><td>167</td><td></td><td>722</td></tr>
</tbody>
</table>

### 5.3.5 Requirements for the data acquisition system

The requirements for the different stages for Phase I and Phase II can be inferred from the previous sections and are summarized in Table 5.9.

The MuPix sensors are read out with serial links operated at 1.25 Gb/s. This bandwidth is partially used by 8b/10b encoding and the readout block structure, see Table 5.11 for the implementation in the current prototype MuPix8. Assuming a maximum of 64 hits per readout block, 927.5 Mb/s are available for hit data. Further details on the readout structure of the pixel sensors are given in section 5.4.

Already with a single link, a significant bandwidth margin is available for the first phase of the experiment. For the second phase, three links per sensor are required in the inner detector, increasing the available bandwidth to 2.78 Gb/s. However, the detector geometry is planned to be adapted to reduce the maximum occupancy per sensor to about 80 MHz. The resulting maximum data rate is about 2.6 Gb/s, matching the sensor’s bandwidth.

The maximum optical bandwidth from the front-end FPGAs to the switching boards is around 3.0 Gb/s in the first phase, and 29 Gb/s in the second phase. This bandwidth can be easily accommodated in links running at a speed of 6.25 Gb/s, which most of the commercially available FPGAs are capable of. For Phase I, a single optical link per front-end FPGA is sufficient. Moreover, there is still enough margin to accommodate slow control information in the data stream to the switching board. For Phase II, up to 5 optical links are necessary per FPGA. Applying 8b/10b encoding, this number increases to 6.
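
The number of optical links per front-end FPGA follows from dividing the required bandwidth by the link speed and rounding up. A minimal sketch, with the function name and its parameters chosen for illustration:

```python
import math


def links_required(data_rate_gbps, link_speed_gbps, encoding_overhead=1.0):
    """Smallest number of links that carries the given data rate.

    encoding_overhead is 1.0 without line encoding and 1.25 for 8b/10b.
    """
    return math.ceil(data_rate_gbps * encoding_overhead / link_speed_gbps)


LINK_SPEED = 6.25  # Gb/s, as quoted in the text

print(links_required(2.99, LINK_SPEED))        # Phase I, busiest FPGA
print(links_required(28.9, LINK_SPEED))        # Phase II, without encoding
print(links_required(28.9, LINK_SPEED, 1.25))  # Phase II, with 8b/10b
```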

From the switching boards to the filter farm, the maximum optical bandwidth required is 32.3 Gb/s in the first phase. This can be accommodated into 4 links running at 10 Gb/s, which is possible with high-end FPGAs available on the market. However, a more efficient transmission protocol than 8b/10b is required. In the second phase, to transfer all data from the switching boards, up to 16 high-speed links are required per switching board.

<table>
<thead>
<tr><th>data rate</th><th>Phase I</th><th>Phase II</th></tr>
</thead>
<tbody>
<tr><td>maximum per sensor to FEB</td><td>166.4 Mb/s</td><td>3.3 Gb/s</td></tr>
<tr><td>total pixel detector to FEB</td><td>33.4 Gb/s</td><td>668 Gb/s</td></tr>
<tr><td>maximum per FEB to SB</td><td>2.99 Gb/s</td><td>28.9 Gb/s</td></tr>
<tr><td>total FEB to SB</td><td>169 Gb/s</td><td>725 Gb/s</td></tr>
<tr><td>maximum per SB to FF</td><td>32.3 Gb/s</td><td>155 Gb/s</td></tr>
<tr><td>total SB to FF</td><td>40.4 Gb/s</td><td>722 Gb/s</td></tr>
</tbody>
</table>

Table 5.9: Summary of maximum and total data rates at the different stages in the readout system for both phases of the experiment. FEB denotes front-end board, SB stands for switching board, FF represents filter farm. Data rates per front-end and switching boards rely on the packet structures described in the previous section. No further protocol is implied. * As mentioned before, for Phase II, the maximum detector occupancy will be reduced by changing the detector geometry. The maximum hit rate is estimated to be about 80 MHz for Phase II. The resulting data rate is given.

In the following, the hardware components of the pixel detector and its readout system are described.

5.4 THE MUPIX PIXEL SENSORS

The MuPix is a pixel sensor specifically designed for the Mu3e experiment. It is produced in a commercial high-voltage CMOS (HV-CMOS) process that allows sensing diodes and readout electronics to be integrated into a single die. Hence, these types of sensors are referred to as HV-MAPS [11]. The sensors can be thinned to 50 µm, which allows ultra-low material pixel layers to be built for the experiment.

Sensing diodes are created by placing deep n-wells inside a p-substrate. At the p-n junction, a small depletion zone forms. By applying a high voltage between the well and the substrate, the depletion zone and thus the sensitive volume are enlarged. Free electron-hole pairs that are created within the depletion zone by an ionizing particle are accelerated by the strong local electric field, which leads to fast charge collection. The induced signal is picked up by the electronics placed inside the deep n-well.

Various prototypes of the MuPix sensor family have been produced and characterized [54–64], with more and more readout electronics integrated from version to version. The final MuPix sensor for the Mu3e experiment, hereinafter referred to as MuPixX, relies heavily on the developments of the last two prototypes, MuPix7 and MuPix8. MuPix7 is the first HV-CMOS pixel sensor with the complete readout electronics integrated [55]. MuPix8 [65] is the first large prototype and the latest available at the time of writing. The implementation of MuPix8 is described further in section 5.4.2, as it is used for numerous tests of the readout system in the context of this thesis. First, an outline of MuPixX is given, describing the requirements it has to fulfil.

5.4.1 MuPixX

The final MuPix sensor has an active area of $2 \times 2\,\text{cm}^2$, divided into $250 \times 250$ pixels of $80 \times 80\,\text{µm}^2$. The sensor has to cope with the highest hit rates expected for Phase II, which are of the order of 80 MHz. To that end, three serial links running at 1.25 Gb/s provide enough bandwidth, as discussed in section 5.3.5. The sensor has a minimal set of input and output pads, to ease the assembly of detector modules. All input and output signals are implemented differentially.

5.4.2 The latest prototype: MuPix8

MuPix8 is the first large HV-MAPS prototype for Mu3e. It is produced by austriamicrosystems AG (AMS) in the aH18 process. The layout of MuPix8 is displayed in Figure 5.7.

MuPix8 consists of 128 × 200 pixels with a size of 81 × 80 µm² [65], which yields a total active sensor area of 1.659 cm². The full chip covers an area of 2.106 cm². The insensitive area is used for digital readout electronics, bias blocks and pads to connect the chip. Inside each pixel, an amplifier and a signal driver are implemented to boost the signal induced by the moving charge carriers and transfer it to the periphery, where it is digitized and subsequently read out by the state machine. For a schematic overview of the pixel electronics, see Figure 5.8.

The pixel matrix of MuPix8 is divided into three submatrices, in the following referred to as matrices A, B and C, that are read out individually. Matrices A and B consist of 48 columns each, while matrix C holds the remaining 32 columns. Pixels in matrix A feature a voltage-based signal transmission to the periphery, while transmission from pixels in matrices B and C is current-based.

Figure 5.7: Layout of MuPix8.

The MuPix8 features several on-chip bias digital-to-analog converters (DACs), pixel tune DACs and configurable readout settings that allow the sensor to be operated under different conditions. The configuration of the MuPix8 is implemented as a shift register with a length of 2998 bit. Each bit is implemented with three latches, see Figure 5.9. The external configuration interface comprises five input signals and one output signal. The input signals are: a data input, which is connected to the first bit of the shift register; the read back signal, which acts as the select bit of a multiplexer that allows the current sensor configuration stored in the third latch to be read back; two clocks that update the first two latches, respectively; and the load signal that updates the third latch, which contains the actual configuration of the chip. The bits are daisy-chained, such that the second latch of a bit drives the first latch of the subsequent bit. The output of the last bit’s second latch is connected to the data output of the sensor.

The shift register holds the pixel configuration for a single, addressable row, such that rows are configured one after the other. Therefore, the configuration register has to be written 200 times, once for each row, until the complete pixel configuration is updated. For efficient data taking, a fast configuration scheme is required, which has been successfully implemented on an FPGA. The total chip configuration time could be reduced to about 100 ms. The firmware implementation is described in section 7.4.1.
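The write path of the three-latch shift register described above can be illustrated with a small behavioural model. This is a sketch only: the class and method names are hypothetical, the read back multiplexer is omitted, and it is assumed that the load signal copies the second latch into the third latch. The last lines estimate the average shift rate implied by 200 full register writes within about 100 ms.

```python
class ConfigShiftRegister:
    """Behavioural toy model of a daisy-chained three-latch shift register."""

    def __init__(self, length):
        self.latch1 = [0] * length
        self.latch2 = [0] * length
        self.latch3 = [0] * length  # holds the active configuration

    def clock1(self, data_in):
        # Bit 0 samples the data input, bit i samples latch 2 of bit i-1.
        self.latch1 = [data_in] + self.latch2[:-1]

    def clock2(self):
        # Each second latch takes over the value of its first latch.
        self.latch2 = list(self.latch1)

    def load(self):
        # Assumption: load copies latch 2 into the configuration latch.
        self.latch3 = list(self.latch2)

    @property
    def data_out(self):
        # Output of the last bit's second latch.
        return self.latch2[-1]

    def write(self, bits):
        for bit in bits:
            self.clock1(bit)
            self.clock2()
        self.load()


ROWS = 200           # register writes per full configuration
REGISTER_BITS = 2998
CONFIG_TIME_S = 0.1  # approximate total configuration time

total_bits = ROWS * REGISTER_BITS
average_shift_rate_mbps = total_bits / CONFIG_TIME_S / 1e6
print(total_bits, round(average_shift_rate_mbps, 2))
```

Because the first bit shifted in travels furthest along the chain, the configuration ends up in reversed order with respect to the input sequence in this model.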

Digital periphery

In the MuPix8, each pixel has a signal driver integrated that transmits the analogue signal to the pixel’s dedicated readout cell in the chip periphery [65, 66]. Each readout cell hosts two comparators, see Figure 5.8, with individually tunable thresholds. The first comparator is used to detect a particle hit and sets the corresponding hit latch. The second comparator is used to mitigate time walk effects. When a hit is detected, its 10 bit and 6 bit timestamps are stored; the latter is used to derive amplitude-equivalent information off-chip. Each readout cell features an 8 bit read-only memory (ROM) that stores the pixel’s row address.

Each column features an end of column (EOC) block. It hosts an 8 bit ROM that stores the column address [65]. Here, the full 32 bit hit information is assembled during readout. All readout cells inside a column are connected to the EOC block using a priority chain based on the pixel’s position, illustrated in Figure 5.10. During readout, the first pixel that contains a hit within a column is selected to copy its data to its corresponding EOC block. A similar priority logic is implemented for the EOC blocks towards the state machine.
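The priority chain and the assembly of the 32 bit hit word can be sketched as follows. The field widths (8 bit column address, 8 bit row address, 10 bit and 6 bit timestamps) are taken from the text, but the bit order inside the word is a hypothetical choice made here for illustration, as are all function names.

```python
def first_hit(hit_flags):
    """Return the index of the first set hit flag, or None if there is none."""
    for index, flag in enumerate(hit_flags):
        if flag:
            return index
    return None


def drain_column(hit_flags):
    """Read out all hits of a column in priority order, one per readout cycle."""
    flags = list(hit_flags)
    order = []
    while (index := first_hit(flags)) is not None:
        order.append(index)
        flags[index] = 0  # the hit is cleared once it has been read
    return order


def pack_hit(column, row, ts10, ts6):
    """Assemble a 32 bit word; the field order is an assumption of this sketch."""
    if not (0 <= column < 256 and 0 <= row < 256):
        raise ValueError("addresses must fit into 8 bit")
    if not (0 <= ts10 < 1024 and 0 <= ts6 < 64):
        raise ValueError("timestamps must fit into 10 bit and 6 bit")
    return (column << 24) | (row << 16) | (ts10 << 6) | ts6


def unpack_hit(word):
    """Inverse of pack_hit."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 6) & 0x3FF, word & 0x3F


print(drain_column([0, 1, 0, 1, 1]))
print(hex(pack_hit(0x12, 0x34, 0x3FF, 0x3F)))
```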

**Readout state machine**

An on-chip integrated readout state machine makes the MuPix8 an autonomous particle detection system. Once the MuPix8 is supplied with its configuration and system clock, it outputs zero-suppressed hit information on a serial data link.

The state diagram of the readout state machine is shown in Figure 5.11. After a reset, the state machine starts in the synchronization state, where it continuously outputs the alignment character K28.5⁴. From there, it enters the sendcounter mode or the actual pixel readout mode, depending on the user configuration.

**Figure 5.10:** Simplified illustration of the priority logic for a subset of the readout cells and end of column blocks. The select signal for the columns’ priority chain is called ReadPixel. The select signal for the EOC blocks’ priority chain is called ReadColumn. The availability of data to the state machine is indicated by the PriFromDet signal.

In the sendcounter mode, the state machine outputs the value of the continuously updated 24 bit on-chip counter, see Table 5.10. This is useful for studies of the serial links as the generated data patterns are more diverse than those during regular readout operation. In addition, the patterns do not depend on the configuration of the analog and digital pixel cells, and they do not depend on the hit rate.

⁴ For a definition of the control characters of 8b/10b encoding, see Table 4.2.

Table 5.10: Data format of MuPix8 in sendcounter mode at full readout speed (top, timerend = 0) and at half the readout speed (bottom, timerend = 1). Counter is the on-chip generated binary counter, TimestampToDet corresponds to the Gray-encoded timestamp that is distributed to the readout cells.

<table>
<thead>
<tr>
<th>cycle (16 ns)</th>
<th>state</th>
<th>data</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>sendcounter1</td>
<td>Counter[23:0] TimestampToDet[7:0]</td>
</tr>
<tr>
<td>2</td>
<td>sendcounter2</td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>cycle (16 ns)</th>
<th>state</th>
<th>data</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>sendcounter1</td>
<td>Counter[23:0] TimestampToDet[7:0]</td>
</tr>
<tr>
<td>2</td>
<td>sendcounter1</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>sendcounter2</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>4</td>
<td>sendcounter2</td>
<td></td>
</tr>
</tbody>
</table>

In the pixel readout mode, the state machine periodically generates the signals *PullDown* (PD), *LoadColumn* (LdCol), *LoadPixel* (LdPix) and *ReadColumn* (RdCol) that are sent to the EOC blocks [65, 67]. There are two states per signal, e.g. *PD1* and *PD2*. The signal is generated in the first of these two. In the second, new data is assigned to the output. On PD, the column data buses, carrying the data from the readout cells to the EOC blocks, are cleared. On LdCol, *ReadPixel* (see Figure 5.10) is triggered in the EOC blocks, which causes the first hit in each column’s priority logic to be transferred to the EOC blocks.

Table 5.11: Data format of MuPix8 [67] at full readout speed (*timerend* set to 0). ID represents the link identifier word. For matrix A, B and C, ID = 0xAA, 0xBB and 0xCC, respectively. Counter is the on-chip generated binary counter, TimestampToDet corresponds to the Gray-encoded timestamp that is distributed to the readout cells. The hit data contains the hit address and the timestamps.

<table>
<thead>
<tr>
<th>cycle (16 ns)</th>
<th>state</th>
<th>data</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>PD1</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>PD2</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>3</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>5</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>7</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>8</td>
<td>LdCol2</td>
<td>K28.0 ID K28.0 ID</td>
</tr>
<tr>
<td>9</td>
<td>LdPix1</td>
<td></td>
</tr>
<tr>
<td>10</td>
<td>LdPix2</td>
<td>if hits: Counter[23:0] TimestampToDet[7:0], next state: RdCol1; else: K28.5 K28.5 K28.5 K28.5, next state: PD1</td>
</tr>
<tr>
<td>11</td>
<td>RdCol1</td>
<td></td>
</tr>
<tr>
<td>...</td>
<td>RdCol1/2</td>
<td>repeat for $N_{hits}$</td>
</tr>
</tbody>
</table>
When a hit occurs within a pixel, the corresponding hit latch is set and its timestamps are stored, as described in the previous section. However, the hit is not yet visible to the column’s priority logic. To allow all bits in the readout cell to settle, the hit latch is synchronized by the state machine on LdPix before the data can be accessed by the EOC block in the subsequent readout cycle on LdCol. On RdCol, data from the first EOC with a hit is sent to the state machine. Availability of data to the state machine is indicated using the *PriFromDet* signal, see Figure 5.10.

<table>
<thead>
<tr>
<th>cycle (16 ns)</th>
<th>state</th>
<th>data</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>PD1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>2</td>
<td>PD1</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>PD2</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>4</td>
<td>PD2</td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>6</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>7</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>8</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>9</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>10</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>11</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>12</td>
<td>LdCol1</td>
<td></td>
</tr>
<tr>
<td>13</td>
<td>LdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>14</td>
<td>LdCol2</td>
<td>K28.0 ID K28.0 ID</td>
</tr>
<tr>
<td>15</td>
<td>LdCol2</td>
<td></td>
</tr>
<tr>
<td>16</td>
<td>LdCol2</td>
<td></td>
</tr>
<tr>
<td>17</td>
<td>LdPix1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>18</td>
<td>LdPix1</td>
<td></td>
</tr>
<tr>
<td>19</td>
<td>LdPix2</td>
<td>if hits: Counter[23:0] TimestampToDet[7:0] [else: K28.5 K28.5 K28.5 K28.5]</td>
</tr>
<tr>
<td>20</td>
<td>LdPix2</td>
<td>if hits, next state: RdCol1 [else next state = PD1]</td>
</tr>
<tr>
<td>21</td>
<td>RdCol1</td>
<td>K28.5 K28.5 K28.5 K28.5</td>
</tr>
<tr>
<td>22</td>
<td>RdCol1</td>
<td></td>
</tr>
<tr>
<td>24</td>
<td>RdCol2</td>
<td></td>
</tr>
<tr>
<td>...</td>
<td>RdCol1/2</td>
<td>repeat for $N_{hits}$</td>
</tr>
</tbody>
</table>

Table 5.12: Data format of MuPix8 at half the maximum readout speed (*timerend* set to 1). This corresponds to the typical operation of MuPix8 at 1.25 Gb/s as used in the context of this thesis. ID represents the link identifier word. For matrix A, B and C, ID = 0xAA, 0xBB and 0xCC, respectively. Counter is the on-chip generated binary counter, TimestampToDet corresponds to the Gray-encoded timestamp that is distributed to the readout cells. The hit data contains the hit address and the timestamps.

All clocks that are used for readout and timestamp generation are derived from the fast on-chip VCO, described in the following section. The state machine provides the timestamps for all readout cells, which can be reset synchronously by applying an external signal. The timestamps are Gray-coded, which reduces the power consumption and the possibility of metastable states, as only a single bit flips per clock cycle. The timestamp’s frequency can be divided using on-chip registers.
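The single-bit-flip property of Gray-coded timestamps can be checked with a few lines of code. The sketch below assumes the standard reflected binary Gray code; the text does not state which Gray code variant is implemented on the chip, and the function names are chosen here for illustration.

```python
def binary_to_gray(value):
    """Convert a binary counter value to reflected binary Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Convert a reflected binary Gray code back to its binary value."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


N_BITS = 10  # width of the longer hit timestamp
codes = [binary_to_gray(n) for n in range(1 << N_BITS)]

# Count the bits that change between consecutive codes, including wrap-around.
flips = [
    bin(codes[i] ^ codes[(i + 1) % len(codes)]).count("1")
    for i in range(len(codes))
]
print(set(flips))
```

A receiver that needs chronological ordering, such as the front-end firmware, has to convert the timestamps back to binary, for which `gray_to_binary` gives one possible software equivalent.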

The readout clock can be divided using a 6 bit register, called *timerend*. The state machine formats the data from the EOC logic and creates packets following the data format illustrated in Table 5.11 for a *timerend* value of 0, and in Table 5.12 for *timerend* = 1. In the context of this thesis, a *timerend* value of 1 is chosen, as power distribution or timing issues do not allow for operation of the sensor with *timerend* = 0.

The state machine sends blocks of 32 b at a frequency of 31.25 MHz to the serializer, see Figure 5.12a, where the bytes are serialized in the input register and subsequently 8b/10b encoded at a clock frequency of 125 MHz. The output register formats the 10 b data again into bytes, which are clocked at a frequency of 156.25 MHz. In the serializer tree, shown in Figure 5.12b, the bytes are serialized into 2 bits. These two bits are forwarded to the last stage of serialization, which is described in the following section.
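The clock frequencies quoted for the individual serializer stages are consistent with a constant data throughput, as the following short calculation shows. The 625 MHz rate for the 2 bit words is not stated in the text; it is derived here from the encoded bit rate.

```python
WORD_BITS = 32
WORD_RATE_MHZ = 31.25

payload_mbps = WORD_BITS * WORD_RATE_MHZ   # data rate before encoding
byte_rate_mhz = payload_mbps / 8           # byte clock at the 8b/10b encoder
encoded_mbps = byte_rate_mhz * 10          # each byte becomes a 10 bit symbol
output_byte_rate_mhz = encoded_mbps / 8    # encoded stream regrouped into bytes
two_bit_rate_mhz = encoded_mbps / 2        # derived rate at the serializer tree output

print(payload_mbps, byte_rate_mhz, encoded_mbps)
print(output_byte_rate_mhz, two_bit_rate_mhz)
```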

Each of the three matrices of MuPix8 is read out by an individual state machine, which is connected to an individual serial link. In addition, MuPix8 features a fourth serial link, which can be operated in different modes: it can either provide a copy of one of the other three links, or it can be used with data multiplexed from the three state machines. The latter allows the complete sensor to be read out using a single serial link.

![Diagram](image)

(a) Implementation of serializer block. (b) Implementation of the serializer tree.

Figure 5.12: The 32 bit parallel data from the state machine is byte-serialized, 8b/10b encoded, and subsequently serialized into 2 bits in the serializer tree. Adapted from [65].

Last stage of serialization

The last serialization stage of MuPix8 is implemented in a fully custom-designed block. This block further consists of a VCO combined with a PLL and the clock generation tree for the serializer’s last stage. It is implemented in differential current mode logic (DCL) [65].

The VCO is implemented as a ring oscillator with 8 differential buffer stages and an inverted feedback loop. The PLL consists of a phase detector, that compares the VCO clock to an external reference clock, and a charge pump, that adjusts the externally configurable VCO’s bias currents $VPVCO$ and $VNVCO$, to lock the phases of both clocks.

In the clock generation tree, see Figure 5.13, the following clocks are derived: the master clock provided to the CMOS digital part; the clock that latches the two bits of data originating from the serializer tree (Figure 5.12b) into the last stage, called $DCLclk$; and the clock that is used last for multiplexing these two bits into the single bit serial data stream, called $delayed\ DCLclk$. The delay between these clocks is adjustable through the on-chip registers, called $VNDelDCL$ and $VNDelDCLMux$⁵. Furthermore, the phase of the latching and multiplexing clocks can be inverted with respect to the clock of the digital part.

⁵ For the delay bias DACs, there exists a differential counterpart, called $VPDelDCL$ and $VPDelDCLMux$.

![Figure 5.13: Clock generation tree of MuPix8.](image)

![Figure 5.14: Last stage of the DCL serializer of MuPix8.](image)

![Figure 5.15: Pre-emphasis stage of MuPix8.](image)

The last stage of the serializer, see Figure 5.14, consists of a pipeline of two and three latches for the two incoming bits, respectively, to ensure that the data is not read by the multiplexer while it changes [65]. The data is multiplexed using the delayed DCLclk.

To compensate for losses over transmission media with low bandwidth, the serializer features pre-emphasis to boost the high frequency content of the data stream. Its implementation is similar to the first post-tap of Stratix IV transmitters, see section 4.3.3. The signal is split into two streams, with one being delayed and inverted, see Figure 5.15. The delay is adjustable using the VNDelPreEmp register. The driving strength of the signals is adjustable as well. The base differential output voltage is set through the VNLVDS register. The strength of the inverted signal is adjustable through VNLVDSDel. The sum of both signals is present at the serial data output.

The settings of the delays and signal drivers are global settings for all links of the chip. They cannot be fine-tuned for each link individually.
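The effect of summing the signal with a delayed and inverted copy can be illustrated with a simple discrete-time model. This is a sketch under strong assumptions: one sample per bit, a delay of exactly one bit period (on the chip the delay is an analog quantity set through VNDelPreEmp), and normalized amplitudes `main` and `post` standing in for the driver strengths set through VNLVDS and VNLVDSDel.

```python
def pre_emphasis(bits, main=1.0, post=0.25):
    """Sum of the signal and its delayed, inverted copy (one sample per bit)."""
    levels = [1.0 if bit else -1.0 for bit in bits]
    output = []
    # Assumption: before the first bit the line already sits at the first level.
    previous = levels[0] if levels else 0.0
    for level in levels:
        output.append(main * level - post * previous)
        previous = level
    return output


waveform = pre_emphasis([0, 0, 1, 1, 1, 0])
print(waveform)
```

In this model, the amplitude is increased at every bit transition and reduced for repeated bits, which boosts the high-frequency content of the data stream relative to its low-frequency content.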

5.5 THE FRONT-END BOARD

The front-end boards of the Mu3e experiment are used for sensor readout, configuration and clocking, as well as for optical data transfer between the detector and the counting house. They are placed inside the magnet and electrically connected to the sensors. Each front-end board hosts a powerful FPGA. The front-end FPGAs receive the serial hit data from the sensors, restore the chronological hit order, which is necessary for the online event reconstruction, and prepare the data for the optical transmission.

The conceptual firmware design that implements the aforementioned tasks for the pixel detector on the front-end is illustrated in Figure 5.16. The serial sensor data is deserialized using LVDS receiver IPs, which are described in greater detail in section 4.3.1. The data decoder performs bit alignment by identifying the control character K28.5 within the incoming data⁶. Furthermore, it performs 8b/10b decoding, error detection as well as hit extraction from the link protocol. In the data protocol of the MuPix sensor, it takes 4 clock cycles to transfer a pixel hit. Thus, the hit data of 4 links can be "serialized" into a single data stream with hits potentially arriving every clock cycle.

Subsequently, the hit data enters the hit sorter [2], where the hits are stored in a random-access memory (RAM) at addresses according to their timestamps. From this RAM, the hits are read in chronological order and packed into data packets that contain all hits belonging to a set of timestamps. The packets are temporarily stored in a FIFO before they are serialized in the FPGA gigabit transmitter blocks and sent to the counting house via the optical transmitters. This allows the hit data to be interleaved with slow control information.

⁶ K28.5 is used as idle character in the MuPix and the MuTRiG, which allows the same decoder implementation to be used for both ASICs.
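The principle of the hit sorter, writing hits to memory locations given by their timestamps and reading the memory in address order, can be sketched in software. The model below is illustrative only and does not represent the firmware implementation: the class and method names are hypothetical, the default sizes are the values assumed for the RAM estimate in Table 5.13, and dropping hits on overflow is an assumption of this sketch.

```python
class HitSorter:
    """Toy model of a timestamp-addressed hit memory."""

    def __init__(self, n_timestamps=1024, max_hits_per_timestamp=63):
        self.n_timestamps = n_timestamps
        self.max_hits = max_hits_per_timestamp
        self.ram = [[] for _ in range(n_timestamps)]
        self.dropped = 0

    def write(self, timestamp, hit):
        """Store a hit at the address given by its timestamp."""
        slot = self.ram[timestamp % self.n_timestamps]
        if len(slot) >= self.max_hits:
            self.dropped += 1  # assumption: overflowing hits are discarded
            return False
        slot.append(hit)
        return True

    def read_packet(self, first_timestamp, n_stamps):
        """Read and clear a block of consecutive timestamps in order."""
        packet = []
        for offset in range(n_stamps):
            ts = (first_timestamp + offset) % self.n_timestamps
            packet.extend((ts, hit) for hit in self.ram[ts])
            self.ram[ts] = []
        return packet


sorter = HitSorter()
for ts, hit in [(5, "c"), (3, "a"), (4, "b"), (3, "d")]:
    sorter.write(ts, hit)
print(sorter.read_packet(0, 8))
```

Hits that arrive out of order are returned sorted by timestamp, while the arrival order is preserved among hits sharing a timestamp.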

The firmware implementation is not yet finalized; thus, additional features are under evaluation. For instance, a 2-to-1 hit serializer in front of the hit sorter could reduce the sorter’s logic area usage by a factor of 2, but would require doubling the clock frequency.

For system commissioning, different parts of the firmware can be bypassed, for instance the hit sorter and subsequent entities. This makes it possible to check the correct operation of the sensors and of the different firmware entities as well as to identify issues while the system is running.

The front-end FPGA is placed on a custom-designed board. The experimental requirements on the board and its FPGA are compiled in Table 5.13. All components on the front-end boards have to withstand magnetic fields up to 2 T. This requires, for instance, the use of power regulators with external air coils. As space is limited, a eurocard design with a size of $17.5 \times 10 \text{ cm}^2$ is chosen, which fits into matching custom-designed mini crates. A high multi-purpose input/output (I/O) count is required to connect up to 36 MuPix sensors at the outer pixel layers, see Table 5.14, or several MuTRiG ASICs for the tile and fibre detectors. To reduce elec-

Table 5.13: Requirements on the front-end board and its FPGA. The size of the RAM for the hit sorter follows from the packet structure described in section 5.3.2. The RAM should store up to 63 hits per timestamp, with a size of 28 bit per hit. The number of timestamps is assumed to be 1024.

<table>
<thead>
<tr>
<th>requirement</th>
<th>value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Size</td>
<td>$17.5 \times 10 \text{cm}^2$</td>
</tr>
<tr>
<td>Tolerance for magnetic fields</td>
<td>$\geq 2 \text{T}$</td>
</tr>
<tr>
<td>LVDS receiver channels (1.25 Gb/s)</td>
<td>$\geq 45$</td>
</tr>
<tr>
<td>Optical output bandwidth</td>
<td>$\geq 28.9 \text{Gb/s}$</td>
</tr>
<tr>
<td>Hit sorter RAM</td>
<td>$\geq 1806 \text{kb}$</td>
</tr>
</tbody>
</table>
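The RAM requirement in Table 5.13 follows directly from the numbers given in its caption. The short calculation below reproduces it, assuming 1 kb = 1000 bit; the comparison with the block memory of the Arria V A7 uses the value listed in Table 5.15 and is added here only for orientation.

```python
HITS_PER_TIMESTAMP = 63
BITS_PER_HIT = 28
N_TIMESTAMPS = 1024

ram_bits = HITS_PER_TIMESTAMP * BITS_PER_HIT * N_TIMESTAMPS
ram_kb = ram_bits / 1000  # assuming 1 kb = 1000 bit

ARRIA_V_A7_BLOCK_MEMORY_KB = 13660  # value from Table 5.15
memory_fraction = ram_kb / ARRIA_V_A7_BLOCK_MEMORY_KB

print(ram_bits, int(ram_kb), round(memory_fraction, 3))
```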

Table 5.14: Number of MuPix sensors and serial data links per FPGA for the pixel detector layers. The numbers of FPGAs for layer 3 and layer 4 include the central detector as well as the recurl stations. Due to the odd number of sensors per ladder in layer 3, half of the FPGAs are connected to 32 sensors, the other half to 36 sensors.

<table>
<thead>
<tr>
<th>layer</th>
<th>FPGAs per layer</th>
<th>sensors per FPGA</th>
<th>sensor links per FPGA</th>
</tr>
</thead>
<tbody>
<tr>
<td>1 &amp; 2</td>
<td>8</td>
<td>12/15</td>
<td>36/45</td>
</tr>
<tr>
<td>3</td>
<td>36</td>
<td>32/36</td>
<td>32/36</td>
</tr>
<tr>
<td>4</td>
<td>42</td>
<td>36</td>
<td>36</td>
</tr>
</tbody>
</table>

The choice of the FPGA is based on the following considerations. The FPGA has to have at least 45 dedicated LVDS receiver channels, which is the maximum number of sensor links per FPGA at the innermost pixel layers, see Table 5.14. It should offer a reasonably high logic count, enough RAM for the implementation of the sorting algorithm and more than 28.9 Gb/s of optical bandwidth. Furthermore, low power consumption and low costs are advantageous. An Intel Arria V FPGA has been chosen as it matches all requirements well, see Table 5.15 for the FPGA's features. The front-end board based on this FPGA is presented in the following section.

5.5.1 The front-end board in development based on the Arria V FPGA

The front-end board based on the Arria V FPGA 5AGXBA7D4F31C5N [68] is being designed at the time of writing. Within the Arria V device family, device migration is possible without requiring any changes to the layout. For the chosen device of type Arria V A7, migration is possible to types A5, B1 and B3. Table 5.16 shows the differences in logic elements, adaptive logic modules and block memory. If the logic count of the A7 turns out not to be sufficient for use in the experiment, it could be increased by about 50% by migrating to the Arria V B3.

Table 5.15: List of features of the Arria V FPGA (5AGXBA7D4F31C5) [68, 69] for the front-end board that is currently in development. The maximum data rate for the LVDS channels is 1.25 Gbps. The GXB channels have a maximum data rate of 6.5536 Gbps. ALM = adaptive logic module, LE = logic element, DSP = digital signal processor.

<table>
<thead>
<tr>
<th>Feature</th>
<th>Arria V A7</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALMs</td>
<td>92 k</td>
</tr>
<tr>
<td>LEs</td>
<td>242 k</td>
</tr>
<tr>
<td>User I/Os</td>
<td>384</td>
</tr>
<tr>
<td>Total block memory bits</td>
<td>13 660 k</td>
</tr>
<tr>
<td>DSP block 18-bit elements</td>
<td>800</td>
</tr>
<tr>
<td>GXB channels PCS</td>
<td>9</td>
</tr>
<tr>
<td>GXB channels PMA</td>
<td>9</td>
</tr>
<tr>
<td>LVDS channels (receiver/transmitter)</td>
<td>96/84</td>
</tr>
<tr>
<td>PLLs</td>
<td>12</td>
</tr>
</tbody>
</table>

Table 5.16: Counts of logic elements, adaptive logic modules, and memory bits for the four Arria V FPGAs, among which device migration is feasible without changing the board layout. Values taken from [68]. The chosen device is highlighted.

<table>
<thead>
<tr>
<th>Device</th>
<th>LEs</th>
<th>ALMs</th>
<th>Memory [kb]</th>
</tr>
</thead>
<tbody>
<tr>
<td>A5</td>
<td>190 k</td>
<td>71 698</td>
<td>12 973</td>
</tr>
<tr>
<td>A7</td>
<td>242 k</td>
<td>91 680</td>
<td>15 108</td>
</tr>
<tr>
<td>B1</td>
<td>300 k</td>
<td>113 208</td>
<td>16 952</td>
</tr>
<tr>
<td>B3</td>
<td>362 k</td>
<td>136 880</td>
<td>17 260</td>
</tr>
</tbody>
</table>

A block diagram of the front-end board is shown in Figure 5.17. The front-end board has a significant demand for low-voltage power. Since it is operated in a magnetic field of 1 T, standard DC-DC converters with ferrite core coils cannot be used. Instead, external air coils are used as inductors to allow for operation in the magnetic field.

Two clock chips (SI5345 [70] by Silicon Labs) provide the FPGA with all its clocks. In addition, they provide fast clocks with minimum jitter for the MuTRiG at the detector I/Os. The clock chips receive a reference clock via the optical receivers coming from the clock distribution system, which is described in section 5.8. The clock chips additionally provide a reference clock for one another, such that only one optical clock line is required. For optical data transmission, two four-fold bidirectional Samtec Firefly transceivers [71] are used. They offer 8 optical links towards the switching boards, driven by the FPGA’s gigabit transmitters at 6.25 Gb/s, thus fulfilling the bandwidth requirement. Among the receiving channels, two are used as clock lines, two are used as reset lines and the four remaining channels are available to transfer configuration data.

Figure 5.17: Schematic drawing of the front-end board prototype with the Arria V FPGA.

The sensors are connected to the front-end board via high density QStrip high-speed ground plane sockets (QSHs) [72]. For programming, a Joint Test Action Group (JTAG) interface and an active serial configuration device with 256 Mb of flash memory are used. The latter allows an FPGA hardware image to be stored on-board, which is automatically loaded when the board is powered. An interface to a crate controller is foreseen for additional slow control and power supervision of the board. Furthermore, various debug and test I/Os are used.

5.5.2 The first front-end board prototype based on the Stratix IV FPGA

Prior to the current front-end board development, a first prototype version of the front-end board was designed based on a Stratix IV FPGA. The purpose of this prototype was to gain experience in the board’s development and also in its operation. At the time the specifications for this first prototype were formulated, the Stratix IV EP4SGX70HF35C4 [73] fulfilled the design requirements and was chosen due to its availability and moderate costs. The Arria V FPGA was not yet available on the market.

The Stratix series is the high-end FPGA series of Intel. This specific device offers many I/Os, LVDS channels and multi-gigabit links. On the downside, it has a rather low logic count. The FPGA’s features are summarized in Table 5.17.

Figure 5.18 shows a schematic of the front-end board prototype. The FPGA can be programmed via a JTAG interface and additionally via an active serial configuration device (EPCS128SI16N) [74] containing 128 Mb of flash memory.

The front-end board offers plenty of optical bandwidth. It hosts sockets for a Quad Small Form-factor Pluggable (QSFP) transceiver module and a MiniPod [75] transmitter and receiver pair. This makes it possible to test two optical components that will be used in the Mu3e experiment at later stages of the DAQ system. The MiniPod is used on the switching board, which is described in section 5.6. QSFP modules are used at the filter farm, discussed in section 5.7. In total, 16 multi-gigabit transceivers are connected, which can be operated at 6.25 Gb/s. A photo of the front-end board prototype equipped with the optical transceivers is shown in Figure 5.19.

<table>
<thead>
<tr>
<th>Feature</th>
<th>Stratix IV GX70</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALMs</td>
<td>29 k</td>
</tr>
<tr>
<td>LEs</td>
<td>73 k</td>
</tr>
<tr>
<td>User I/Os</td>
<td>488</td>
</tr>
<tr>
<td>Total block memory bits</td>
<td>6617 k</td>
</tr>
<tr>
<td>DSP block 18-bit elements</td>
<td>384</td>
</tr>
<tr>
<td>GXB channels PCS</td>
<td>16</td>
</tr>
<tr>
<td>GXB channels PMA</td>
<td>24</td>
</tr>
<tr>
<td>LVDS channels (receiver/transmitter)</td>
<td>56/56</td>
</tr>
<tr>
<td>PLLs</td>
<td>4</td>
</tr>
</tbody>
</table>

Table 5.17: List of features of the Stratix IV FPGA (EP4SGX70HF35C4) [21, 73] used on the first front-end board prototype. The maximum data rate for the LVDS channels is 1.25 Gbps. The GXB channels have a maximum data rate of 6.375 Gbps.

Figure 5.18: Schematic drawing of the front-end board prototype with the Stratix IV FPGA.

The front-end board features two clock ASICs, SI5342 and SI5345 by Silicon Labs [70], that provide the FPGA with all required clocks. Both ASICs are connected to individual 50 MHz oscillators [76] and to SubMiniature version A (SMA) connectors to supply external reference clocks for the ASICs. This allows the front-end board to be operated within a larger, synchronous system. The clock chips additionally reduce the jitter of the incoming reference clocks. An implemented feedback loop ensures phase stability between the outputs and the input of the SI5345. This feature is necessary for the Mu3e experiment to guarantee that all front-end boards and, thus, all sensors are operated in phase. The SI5342 does not have a feedback loop; hence, its output clocks are only used for the FPGA’s slow control. Both ASICs are configured by the FPGA via a 4-wire Serial Peripheral Interface (SPI). The clock chips have a non-volatile memory (NVM) to store the configuration permanently.

The board hosts five high density QSH-sockets [72], hereinafter also referred to as QSH-banks, to connect the sensors to the FPGA. Each QSH-bank has 60 user pins, with 9 differential input pairs connected to dedicated LVDS receivers of the FPGA. Moreover, each bank provides a differential clock and reset signal for the sensors. On the prototype board, configuration pins are routed as single-ended signals. Test pulse circuitry as well as three differential analog-to-digital converter (ADC) channels are connected. The remaining pins are used for general purpose signals, grounding and supply voltages.

The front-end board hosts test pulse circuitry used to inject charge into the pixel sensor. The test pulse circuitry consists of two AD5664 16-bit quad DACs. The DACs are controlled by the FPGA via a shared SPI interface. Pulsing is established with one analogue switch per QSH-bank.

The front-end board hosts 5 multi-purpose ADCs of type LTC2991 [77] that can be configured to measure diode temperatures, voltages or currents. Configuration and readout of the ADCs is implemented via an Inter-Integrated Circuit (I²C) interface. In addition to the channels connected to the QSH-banks, the ADCs are used to measure the temperature of the FPGA and their own internal temperature. To determine the temperature of a connected diode, the ADC sources a current $I_C$. It measures the corresponding diode base-emitter voltage $U_{BE}$ and converts it to a temperature $T$ following

$$ T = \frac{U_{BE} \cdot e}{\eta \cdot k_B \cdot \ln \left( \frac{I_C}{I_S} \right)} , \quad (5.1) $$

where $e$ is the elementary charge, $\eta$ is the diode ideality factor, $k_B$ is Boltzmann’s constant and $I_S$ is a process dependent factor [77]. The temperature $T$ is given in Kelvin. The ADC uses an ideality factor $\eta$ of 1.004 [77] for its on-chip conversion. For temperature calibration of devices with deviating or unknown ideality factor, the measured voltages are also recorded.
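The conversion of equation (5.1) and the recalibration for a deviating ideality factor can be illustrated with a minimal sketch. It takes equation (5.1) at face value; the function names and the example values for $U_{BE}$, $I_C$ and $I_S$ are assumptions chosen for illustration and are not taken from the ADC documentation.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K
ETA_ADC = 1.004              # ideality factor used by the ADC (from the text)


def diode_temperature(u_be, i_c, i_s, eta=ETA_ADC):
    """Temperature in kelvin following equation (5.1)."""
    return u_be * E_CHARGE / (eta * K_BOLTZMANN * math.log(i_c / i_s))


def rescale_temperature(t_adc, eta_true, eta_adc=ETA_ADC):
    """Rescale an on-chip result to a diode with a different ideality factor.

    For fixed U_BE, I_C and I_S, equation (5.1) gives T proportional to 1/eta.
    """
    return t_adc * eta_adc / eta_true


# Hypothetical example values, not measured data.
t_chip = diode_temperature(u_be=0.6, i_c=100e-6, i_s=1e-14)
t_corrected = rescale_temperature(t_chip, eta_true=1.010)
print(f"on-chip conversion: {t_chip:.1f} K, rescaled: {t_corrected:.1f} K")
```

Recording the measured voltages in addition to the converted temperatures, as described above, keeps such a correction possible offline once the ideality factor of a device is known.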

For debugging purposes, the front-end board further contains LEDs, test pins and a liquid-crystal display (LCD).

Reasons for device migration to the Arria V

The first version of the front-end board was primarily designed for lab studies as well as to gain experience in the development of such a complex Printed Circuit Board (PCB). To ease the accessibility of all on-board components, the board was produced with a size of $18.7 \times 18.0\,\text{cm}^2$. To meet the spatial requirements of the detector, a reduction of the board size was foreseen for the board’s second version.

In the context of this thesis, the first prototype of the front-end board was brought into operation and successfully integrated in a vertical slice of the Mu3e readout system, described in chapter 7. During that process, some issues in the design of the board were found that led to the decision to develop the new prototype based on the Arria V FPGA instead of another version of the Stratix IV board. One QSH-bank was found to be missing dedicated deserializer IPs at the FPGA, which reduces the number of connectable sensor data links to 36. In addition, it was found that the power regulators used on the prototype board do not tolerate strong magnetic fields, as they incorporate coils with ferrite cores. Implementation
of both changes in combination with the shrinkage of the board would have required a completely new board design. This was used as an opportunity to introduce newer components like the Samtec Firefly and to reconsider the choice of FPGA using best current knowledge of the experimental needs.

5.6 Switching board

The switching boards aggregate the data from the sub-detectors and distribute it to the filter farm. Four switching boards are used for the first phase of Mu3e: one for the central pixel detector, two for the recurl pixel stations up- and downstream, which are shared with the tile detector, and one for the fibre detector. For an illustration of this architecture, see Figure 5.6 in section 5.2.

The PCIe40 board [78, 79], developed for the upgrades of the LHCb and ALICE experiments, serves as switching board for Mu3e. It hosts a powerful FPGA of Intel’s latest FPGA generation: the Arria 10 10AX115S3F45E2SG. The PCIe40 board is optimized for high data throughput featuring 48 full duplex optical links at up to 10 Gb/s. In addition, it features a 16 lane PCIe 3.0 interface with a bandwidth of 112 Gb/s [80].

The conceptual firmware implementation for the FPGA on the switching board is illustrated in Figure 5.20. The switching board receives the data from the front-end boards over optical links running at 6.25 Gb/s. The optical receivers are connected to the gigabit receiver blocks of the FPGA,

![Figure 5.20: Firmware implementation for the FPGA on the switching board.](image-url)
Table 5.18: Input and output links of the switching boards (SB) for the central pixel detector at Phase I and Phase II. For Phase II, for the inner two layers, 6 links per front-end board are assumed. For the outer two layers, 3 links are assumed.

<table>
<thead>
<tr>
<th></th>
<th>input links (total)</th>
<th>output links (total)</th>
<th>switching boards</th>
<th>input links per SB</th>
<th>output links per SB</th>
</tr>
</thead>
<tbody>
<tr>
<td>Phase I</td>
<td>34</td>
<td>4</td>
<td>1</td>
<td>34</td>
<td>4</td>
</tr>
<tr>
<td>Phase II</td>
<td>126</td>
<td>64</td>
<td>4</td>
<td>32</td>
<td>16</td>
</tr>
</tbody>
</table>

that perform deserialization and bit alignment. The data decoder recovers the packets of hit data from the link protocol. Afterwards, the packets are buffered in FIFOs to account for arrival delays between different front-end boards due to payload variations. The packet merger accesses these FIFOs and combines the packets of all connected front-end boards which contain the same timestamps. The resulting packets are buffered in a FIFO and subsequently serialized in the FPGA’s gigabit transmitters, which drive the optical transmitters to the filter farm at 10 Gb/s. Through the high-speed Peripheral Component Interconnect Express (PCIe) interface, the data can be accessed at different firmware entities for system commissioning and verification. Additionally, the hosting PC can exchange pixel configuration data with the switching board. The configuration data is then transferred to the front-end FPGAs via optical links [43].
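The buffering and merging steps described above can be illustrated with a minimal software sketch. It is not the firmware implementation: the function name `merge_packets` and the representation of packets as `(timestamp, hits)` tuples are assumptions made for illustration.

```python
from collections import deque


def merge_packets(fifos):
    """Merge the head packets of all FIFOs that share the lowest timestamp.

    `fifos` is a list of deques, one per front-end board, holding
    (timestamp, hits) tuples in ascending timestamp order.
    Returns (timestamp, merged_hits), or None if any FIFO is empty,
    in which case the merger has to wait for late-arriving data.
    """
    if any(len(fifo) == 0 for fifo in fifos):
        return None
    timestamp = min(fifo[0][0] for fifo in fifos)
    merged = []
    for fifo in fifos:
        if fifo[0][0] == timestamp:
            merged.extend(fifo.popleft()[1])
    return timestamp, merged


# Two hypothetical front-end boards with different payloads per timestamp.
board_a = deque([(0, ["hit_a0"]), (1, ["hit_a1", "hit_a2"])])
board_b = deque([(0, ["hit_b0"]), (1, [])])
print(merge_packets([board_a, board_b]))
```

Waiting until every FIFO holds at least one packet is the software analogue of buffering against arrival delays between boards.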

At the switching board, the uplinks to the filter farm and the downlinks from the front-end boards are operated at different speeds. But FPGAs do not support asymmetric data rates within shared transceiver channels. Thus, the sum of these up- and downlinks is limited by the total number of transceiver channels of 48.\(^7\) For the first phase of the experiment, four links at 10 Gb/s are used to transfer the data to the filter farm. This leaves 44 links for data reception from the front-end boards, which is sufficient as only 34 are necessary, see Table 5.18.

For the second phase of the experiment, three more switching boards are required for the readout of the central detector. The total number of links from the front-end boards increases from 34 to 126. Also, in total 64 high-speed links are necessary for the data transfer to the filter farm. With four switching boards, the bandwidth is most efficiently used having 32 input links and 16 output links per board.
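The link counts for both phases follow from simple arithmetic, which the sketch below reproduces. The function name `links_per_board` is hypothetical; the numbers are those given in the text and in Table 5.18.

```python
import math

TRANSCEIVER_CHANNELS = 48  # transceiver channels per switching board (from the text)


def links_per_board(total_inputs, total_outputs, boards):
    """Return (inputs, outputs) per board, checking the channel limit."""
    inputs = math.ceil(total_inputs / boards)
    outputs = math.ceil(total_outputs / boards)
    if inputs + outputs > TRANSCEIVER_CHANNELS:
        raise ValueError("not enough transceiver channels per board")
    return inputs, outputs


print(links_per_board(34, 4, 1))    # Phase I:  (34, 4)
print(links_per_board(126, 64, 4))  # Phase II: (32, 16)
```

For Phase II, the 32 input and 16 output links per board use all 48 transceiver channels.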

\(^7\) For every downlink from the front-end board, an uplink can be implemented at the same speed. The same holds for the links to the filter farm.

### 5.7 Filter Farm

The purpose of the filter farm is to reduce the data rate that is written to disk to a reasonable level ($\leq 100$ MB/s) by filtering signal-like events from the dominating background. The event selection is based on online track reconstruction. Selected events are sent via Ethernet to a single data collection server that transfers the data to the PSI computing centre for storage and offline analysis.

Twelve PCs are planned to be used for the first phase of the Mu3e experiment. Each PC hosts a powerful graphics processing unit (GPU) for track reconstruction and event selection as well as an FPGA board in a PCIe slot to receive the data from the switching boards. A commercially available FPGA board has been chosen for this purpose: the Terasic DE5a-Net Arria 10 Development Kit [81]. It hosts a powerful FPGA from the Intel Arria 10 family (10AX115N2F45E1SG). The board features 4 QSFP+ sockets, so that 16 optical multi-gigabit transceiver channels are available for data transmission and reception. The FPGA cards have an 8 lane PCIe 3.0 interface, allowing for fast data transfer via direct memory access (DMA) to the RAM of the PC [63]. The FPGA cards host 2 independent DDR4-RAM sockets, which allow up to 16 GB of high-speed on-board RAM to be added to buffer data during event selection.

To share the workload among the PCs, they are connected in a daisy-chain, see Figure 5.21, with the first PC receiving the data from the four switching boards and forwarding it to the next PC. Each PC receives the full detector data, but processes only a subset of it. For the global architecture of the DAQ system, see Figure 5.6 in section 5.2.

On the FPGA cards, data from all connected switching boards is merged into packets containing the complete detector information, see Figure 5.21. Hits of the central pixel detector are pre-selected for track reconstruction based on simple geometrical criteria [63] before a coordinate transformation prepares them for the GPU event selection. The data is sent from the FPGA using direct memory access to the PC’s RAM where it can be accessed by the GPU.

On the GPU, a multiple scattering track fit [82] is applied in combination with a vertex search based on geometrical constraints. As the reconstruction
algorithm is linearized and can hence be parallelized, GPUs are suited best to perform the event reconstruction. Studies with currently available high-end GPUs (GTX1080Ti) have proven the concept to work [63]. Applying full event selection, the overall background reduction factor is estimated with simulated data to be about 140 [63].

For the second phase of the experiment, the filter farm has to handle much higher data rates, especially from the central pixel detector. The implementation of the filter farm for Phase II is still under study.

5.8 Clock and reset distribution system

In a high rate experiment like Mu3e, precise time information for each particle observed in the detector is key to suppress accidental background. Thus, the complete detector has to be operated synchronously. For Mu3e, synchronisation is achieved by using a common 125 MHz clock paired with a synchronous reset signal for the on-chip timestamps.

The clock distribution system is designed to minimize the clock jitter and the clock skew between different sensors in order to fully exploit the precision of the timing detectors, which is of the order of 500 ps for the fibre detector and 50 ps for the tile detector.

![Clock distribution scheme](image.png)

**Figure 5.22:** Clock distribution scheme for the Mu3e experiment. A master clock and a master reset are electrically split and optically transmitted to the detector.
A custom-designed board generates the 125 MHz master clock and the 1.25 Gb/s reset link for distribution. An electrical fanout is used to generate 256 copies of each of them, which are optically distributed to the subsystems, see Figure 5.22. The protocol of the reset link utilizes 8b/10b encoding. It allows a global synchronous reset to be issued on all connected front-end boards, data taking to be started and stopped, and global parameters like run numbers to be distributed [83].

5.9 Readout links

The readout system for the Mu3e experiment relies heavily on the application of high-speed data links to cope with the expected data rates. Within the detector, pixel sensors and timing ASICs transfer their data electrically at a speed of 1.25 Gb/s over distances up to a few metres. Electrical links at speeds beyond 1.25 Gb/s are used for data transmission between FPGAs and optical transceivers within PCBs. Long range data transmission from the detector to the counting house is realized using high-speed optical links.

5.9.1 Electrical links

The LVDS drivers of the detector ASICs’ serial data outputs are electrically connected to the LVDS receivers of the front-end FPGAs. In the experiment’s baseline design, the physical link consists of differential pairs of aluminium traces on high density interconnects (HDIs), interposers, rigid-flex PCBs and micro twisted pair cables [43], illustrated in Figure 5.23. Most of these components have been tested separately. Studies of the pixel sensor’s transmitters and of the front-end FPGA’s receivers are discussed in chapter 6. Studies of the HDIs, interposers and micro twisted pair cables can be found in [84–86].

On the various PCBs hosting optical transceivers, electrical multi-gigabit transceivers are used for on-board data transmission at data rates of 6.25 Gb/s and 10 Gb/s between the FPGAs and the optical transceivers. The physical links are realized as matched differential pairs of traces on the PCBs. As described in chapter 4, signal quality degradation along the physical link can be mitigated by applying pre-emphasis at the transmitter or equalization at the receiver. In chapter 6, multi-gigabit transceivers of FPGAs are studied in combination with the optical transceivers.

![Figure 5.23: Electrical links between the sensors and the front-end FPGA contain different transmission media. Simplified illustration, lengths of the different transmission media and connectors are not to scale.](image-url)

5.9.2 Optical links

The FPGA boards of the Mu3e readout system utilize three different high-speed opto-electrical components. The commercial FPGA boards chosen for the filter farm have four sockets for QSFP+ transceivers, which are commonly used in industrial high-speed network applications. The switching boards use four twelvefold unidirectional MiniPod [75] transmitter and receiver pairs. The MiniPods have a smaller form factor than the QSFP+ modules and were state-of-the-art at the time the development of the PCIe40 board was started. The front-end board, which is currently under development for the Mu3e experiment, implements two fourfold bidirectional Samtec Firefly [71] transceivers, which are state-of-the-art at the time of writing, offering high bandwidth in combination with low power consumption and an even smaller form factor than the MiniPod.

All three devices allow for data rates exceeding 10 Gb/s per channel and operate at a common laser wavelength of 850 nm. Various studies of these transceivers have been carried out. Tests of the MiniPod and QSFP+ transceivers, conducted with the first front-end board prototype, are described in section 6.3.1. Further studies of the QSFP+ and Samtec Firefly transceivers can be found in [87–89].

Connections between transmitters and receivers are realized using 50/125 multi-mode OM3 fibres [43]. In the Mu3e readout system, the link budget of the optical data links is not limited by their length, as the distance between detector and counting house is less than 100 m and the fibres allow for distances up to 300 m [90]. Instead, the link budget is dominated by the various connectors along the transmission path, see Figure 5.24. Especially between detector and counting house, several connectors are required for the links to pass the magnetic door feedthrough and several patch panels.

Figure 5.24: Illustration of the optical links between the front-end board and the switching board and between the switching board and the filter farm.
High-speed data links are an integral part of the Mu3e data acquisition system. Between the sensors and the front-end FPGAs, fast electrical data links are operated at 1.25 Gb/s. The front-end FPGAs send the data to the switching boards via optical links running at a speed of 6.25 Gb/s. The switching boards distribute the data to the filter farm PCs using optical links at 10 Gb/s. The implementation of these high-speed data links within the Mu3e readout system requires extensive testing in advance.

This chapter examines the active components of the electrical links within the detector. The error-free operation of the pixel sensors’ serial links is of the utmost importance, since the link protocol does not provide any proper means of error correction. Therefore, the serializer of the MuPix sensors and the LVDS receivers of the front-end FPGA are subjects of this study.

The first front-end board prototype, described in section 5.5.2, is being integrated into a vertical slice of the Mu3e readout system in the context of this thesis. It features two different opto-electrical components: a QSFP+ transceiver and a MiniPod transmitter and receiver pair. This chapter studies the performance of the optical links realized with both devices. Furthermore, the rate capabilities of Small Form-factor Pluggable (SFP)+ transceivers used within the vertical slice arrangement are investigated.

### 6.1 Studies of the Serializer of the MuPix Sensors

The serializer of the MuPix sensors consists of two parts, as described in section 5.4.2. The sensor data is first serialized into a stream of two bits within the serializer tree, which is fully implemented in CMOS logic. Then the final serial data output is generated within the last serializer stage implemented in DCL. The transition from the serializer tree to the DCL serializer is associated with a clock domain change. The DCL serializer can be tuned using various DACs that control signal strengths and timing delays.

The serializer, the accompanying VCO and the on-chip readout state machine are implemented for the first time in the MuPix7 prototype. For the Mu3e experiment, the serial sensor links are operated at a data rate of 1.25 Gb/s. The fast serializing clock is derived from an external reference clock of 125 MHz using an on-chip PLL. The range of data rates at which the links can operate is studied in order to quantify bandwidth margins and identify potential timing issues.

The MuPix8 features three independent state machines and four serial data links. It is used for integration studies for the Mu3e experiment. Therefore, the signal quality of the serial links is studied in detail at the
Mu3e base data rate of 1.25 Gb/s. The links are tested for on-chip variations as well as for sensor-to-sensor variations.

6.1.1 Data rate studies with MuPix7

The operating range of the MuPix7’s serializer is studied with the setup illustrated in Figure 6.1. The MuPix7 is glued and wire bonded to its motherboard, the MuPix7 PCB [55]. The sensor is clocked, configured and read out by a Stratix IV FPGA on a development board [91] located in the back-end computer. The computer runs the MuPix Single Setup data acquisition and control software [58, 92] that communicates with the FPGA via PCIe using a dedicated device driver [63, 93].

A mezzanine card interfaces the High Speed Mezzanine Card (HSMC) connector of the FPGA board with the Small Computer System Interface (SCSI) connector of the MuPix7 PCB. A 1.8 m long SCSI2-cable with 25 twisted pairs is used to connect both cards. The serial link of the MuPix7 can be sent over this cable to the FPGA, where it is sampled by a GXB receiver, or it can be probed at dedicated SMA connectors on the MuPix7 PCB by setting the PCB’s solder jumpers accordingly. In the latter case, an oscilloscope (Tektronix DPO7254C) [94] and a digital serial analyzer (Tektronix DSA8300) [34] are used to sample the waveforms of the serial link.

A reconfigurable PLL on the FPGA generates the sensor’s reference clock using an input frequency of 125 MHz provided by a clock source on the FPGA board. The clock multiplication factor $M$ and the clock division factor $N$ can be set through the control software. This allows the sensor’s reference frequency $f = M/N \cdot 125\, \text{MHz}$ to be changed while the sensor is being operated.

Figure 6.1: Illustration of the setup used for the measurement of the serial data link of MuPix7.
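As an illustration of the relation $f = M/N \cdot 125\,\text{MHz}$, the following sketch determines the smallest integer factors for a given target reference frequency. The function names are hypothetical, and the limits on $M$, $N$ and the internal oscillator range that a real PLL imposes are not modelled.

```python
from fractions import Fraction

F_IN_MHZ = 125  # PLL input frequency on the FPGA board (from the text)


def reference_frequency_mhz(m, n):
    """Reference frequency f = M/N * 125 MHz as an exact fraction."""
    return Fraction(m, n) * F_IN_MHZ


def pll_factors(target_mhz):
    """Smallest integers (M, N) that yield the target reference frequency."""
    ratio = Fraction(target_mhz) / F_IN_MHZ
    return ratio.numerator, ratio.denominator


for target in (15, 85, 95, 125, 160):
    m, n = pll_factors(target)
    print(f"{target} MHz: M = {m}, N = {n}")
```

Using exact fractions avoids rounding errors when checking that a pair of factors reproduces the requested frequency.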

The sensor is operated under regular laboratory conditions at room temperature of about 25°C and a reverse bias voltage of $-85\, \text{V}$. Previous studies have shown that the MuPix7 sensor heats up to about 50°C to 60°C under these conditions.

The PLL of the MuPix7 can be successfully locked within a reference frequency range from 15 MHz to 160 MHz. The sensor’s VCO is tuned for each frequency individually in order to minimize the jitter of the sensor’s clock, using the bias DACs $VPVCO$ and $VNVCO$. A fixed $VNVCO = 10$ yields the least jitter for all tested frequencies [61]. The optimal $VPVCO$ settings are compiled in Table 6.1.

The serial link is tested using the FPGA’s GXB receiver by checking the data for 8b/10b errors. Furthermore, the data format is checked to comply with the sensor’s link protocol. The serial link of the MuPix7 is found to be fully functional over the entire range from 150 Mb/s to 1600 Mb/s.

![Serial bit stream of the MuPix7 at 900 Mb/s recorded with the oscilloscope illustrating bit errors due to timing violations in the serializer.](image)

(a) Bit error in K28.5 word because of timing violations. The erroneous bit is highlighted.

(b) Correctly transmitted K28.5 word.

Figure 6.2: Serial bit stream of the MuPix7 at 900 Mb/s recorded with the oscilloscope illustrating bit errors due to timing violations in the serializer.

<table>
<thead>
<tr>
<th>frequency range [MHz]</th>
<th>$VPVCO$ (dec)</th>
</tr>
</thead>
<tbody>
<tr>
<td>15 to 20</td>
<td>1</td>
</tr>
<tr>
<td>25 to 60</td>
<td>2</td>
</tr>
<tr>
<td>65 to 85</td>
<td>3</td>
</tr>
<tr>
<td>90 to 105</td>
<td>4</td>
</tr>
<tr>
<td>110 to 135</td>
<td>5</td>
</tr>
<tr>
<td>140 to 160</td>
<td>6</td>
</tr>
</tbody>
</table>

Table 6.1: Optimal $VPVCO$ settings for different frequency ranges. Taken from [61]
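In control software, the settings of Table 6.1 could be applied through a simple lookup, sketched below. The function name and the dictionary layout are assumptions; frequencies that fall between the tabulated ranges are deliberately left undefined rather than interpolated.

```python
# (low, high) reference frequency range in MHz -> VPVCO, from Table 6.1
VPVCO_TABLE = [
    ((15, 20), 1),
    ((25, 60), 2),
    ((65, 85), 3),
    ((90, 105), 4),
    ((110, 135), 5),
    ((140, 160), 6),
]
VNVCO_FIXED = 10  # fixed value that yields the least jitter (from the text)


def vco_settings(freq_mhz):
    """Return the VCO bias DAC settings, or None outside the tabulated ranges."""
    for (low, high), vpvco in VPVCO_TABLE:
        if low <= freq_mhz <= high:
            return {"VPVCO": vpvco, "VNVCO": VNVCO_FIXED}
    return None


print(vco_settings(125))  # nominal Mu3e reference frequency
```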
For data rates from 150 Mb/s to 750 Mb/s, as well as from 1000 Mb/s to 1600 Mb/s, the serializer runs error-free with the default parameters that control the delay between the 625 MHz clocks of the serializer tree and the DCL serializer ($VPDelDCL = 12$, $VNDelDCL = 6$, DCL clock not inverted). For data rates ranging from 800 Mb/s to 950 Mb/s, the default configuration leads to bit errors due to a timing mismatch between the two clock domains, so the delay has to be adjusted. The effect of the timing mismatch is qualitatively studied with the oscilloscope, see Figure 6.2. With the sensor’s readout state machine held in reset, the sensor is expected to constantly output its synchronization character K28.5 in alternating disparities. Timing violations between the two clock domains lead to a bit flip of the most significant bit (Figure 6.2a). After adjusting the delay by inverting the DCL clock, the correct bit pattern is restored (Figure 6.2b).
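A check of the idle pattern of this kind can be sketched in software as follows. The sketch assumes a bit string that is already aligned to 10-bit word boundaries and uses the two standard 8b/10b encodings of K28.5 in transmission order; the function names are hypothetical, and bit positions are counted in transmission order, which need not match the bit significance convention used in Figure 6.2.

```python
# Standard 8b/10b encodings of K28.5 for negative and positive running disparity
K28_5 = ("0011111010", "1100000101")


def flipped_bits(word, reference):
    """Positions at which a 10-bit word differs from the reference."""
    return [i for i, (a, b) in enumerate(zip(word, reference)) if a != b]


def check_idle_stream(bits):
    """Compare each 10-bit word with the K28.5 expected for alternating disparity.

    Returns a list of (word_index, flipped_positions) for corrupted words.
    """
    words = [bits[i:i + 10] for i in range(0, len(bits) - 9, 10)]
    if not words:
        return []
    # Choose the starting disparity that matches the first word best.
    start = min((0, 1), key=lambda s: len(flipped_bits(words[0], K28_5[s])))
    errors = []
    for index, word in enumerate(words):
        flips = flipped_bits(word, K28_5[(start + index) % 2])
        if flips:
            errors.append((index, flips))
    return errors


clean = K28_5[0] + K28_5[1] + K28_5[0]
print(check_idle_stream(clean))
```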

A quantitative investigation of the timing margin is done by varying the delay parameters of the DCL clock, $VNDelDCL$ and $VPDelDCL$, within their 6-bit range. For all combinations of these two parameters, the link is tested for error-free data transmission and for compliance with the data protocol of the sensor.
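The structure of such a two-dimensional scan is sketched below. The callback `link_ok`, which would configure both DACs and test the link, is a hypothetical placeholder for the actual control software; the toy callback in the example does not represent measured behaviour.

```python
import itertools

DAC_VALUES = range(64)  # 6-bit DACs cover the values 0 to 63


def scan_delay_parameters(link_ok):
    """Test all 64 x 64 settings; grid[vn][vp] is True for error-free operation."""
    grid = [[False] * len(DAC_VALUES) for _ in DAC_VALUES]
    for vn, vp in itertools.product(DAC_VALUES, DAC_VALUES):
        grid[vn][vp] = bool(link_ok(vn, vp))
    return grid


def good_fraction(grid):
    """Fraction of the parameter space that allows error-free operation."""
    total = sum(len(row) for row in grid)
    return sum(cell for row in grid for cell in row) / total


# Toy stand-in for a real link test, for illustration only.
grid = scan_delay_parameters(lambda vn, vp: vn + vp >= 64)
print(f"error-free fraction: {good_fraction(grid):.2f}")
```

The fraction of error-free settings gives a single number for the timing margin that can be compared between data rates.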

![Parameter scan](image)

(a) 1250 Mb/s, invert = 0  
(b) 950 Mb/s, invert = 0  
(c) 850 Mb/s, invert = 0  
(d) 850 Mb/s, invert = 1

Figure 6.3: Parameter scan of the delay of the clock of the DCL serializer. Error-free operation of the link is color coded in blue, erroneous operation in red.
For the nominal data rate of the Mu3e experiment, 1.25 Gb/s, there is a considerable timing margin, since almost the entire 2-dimensional delay parameter space allows error-free data transmission. At a reference frequency of 95 MHz (950 Mb/s), timing is violated for a significant number of settings when the clock of the DCL serializer is not inverted, see Figure 6.3b. At 85 MHz (850 Mb/s), only a tiny parameter space is available for error-free data transmission, as can be seen in Figure 6.3c. Inverting the DCL clock adds half a clock period of delay and reopens the complete parameter space for error-free operation, see Figure 6.3d.

The serial link of the MuPix sensor is fully operational over a wide range of frequencies. The Mu3e base data rate is significantly lower than the maximum possible sensor data rate, leaving a considerable bandwidth margin. The clock delay parameters within the DCL serializer make it possible to mitigate any timing violations at the clock transitions. This is important for the operation of the Mu3e experiment, where thousands of links have to be operated simultaneously. Temperature differences, which are expected to be significant within the detector, can have a similar effect on the timing within the serializer. The implementation of the delay parameters makes it possible to compensate for these effects completely.

6.1.2 Eye diagram studies with MuPix8

Eye diagram analyses make it possible to evaluate the performance of high-speed serial data links. Moreover, they can reveal even slight analogue variations between different data links, which may remain hidden when studying bit error rates alone. Hence, they are the perfect tool to examine the data links of the MuPix8 sensor and optimize the settings of its DCL serializers for stable data transmission. In addition, they are used to check the links for on-chip variations as well as for sensor-to-sensor variations.

The MuPix8 is the first large HV-MAPS with four integrated serial data links, which potentially provide enough bandwidth that the sensors could be used for both phases of the Mu3e experiment. Three links, hereinafter referred to as links 0 to 2, transmit the data from individually running readout state machines. The fourth link, referred to as link 3, can be set to duplicate the data of another link or to multiplex the data of the other three and merge them into a single data stream.

The setup used for the study of MuPix8 is illustrated in Figure 6.4. The MuPix sensor is glued and bonded to a PCB [95], which is inserted into the mating connector on its motherboard, the MuPix8 PCB [96]. The motherboard provides the supply voltages for the sensor and an LVDS repeater to boost the serial links over the SCSI3-cable that connects it to the readout FPGA within the back-end PC. A custom designed HSMC-SCSI adapter card [95] interfaces the SCSI-cable with the FPGA.

Similar to the MuPix7 PCB, the MuPix8 PCB allows the serial links of the sensor to be probed at dedicated SMA connectors. Eye diagrams are recorded using the digital serial analyzer DSA8300 by Tektronix [34] by continuously overlaying waveforms of the serial data stream. The serial analyzer is triggered using an integrated clock recovery unit, which reduces the measured amplitude by 6.6 dB on average [97].

The sensors are operated at the nominal Mu3e data rate of 1.25 Gb/s. The sendcounter mode is chosen in order to generate a diverse data stream containing many different patterns. The speed of the state machine is halved by setting timerend to 1, in order to interleave the data stream with the synchronization character of the link protocol (Table 5.10). This allows bit error rate tests to be conducted with the same settings. Link 3 is chosen to copy the data of link 1.

**DAC optimization**

The DCL serializer has a total of 13 bias DACs that can be used to tune the serial data links. These parameters influence all four links at the same time. As the jitter of the on-chip clock affects the signal quality of the serial link, the DACs to steer the VCO and PLL are included in the parameter set. A default set of the parameters (Table 6.2), derived from MuPix7, has been used to operate the links of MuPix8 with the GXB receivers of the Stratix IV FPGA without issues. However, these settings do not provide a satisfactory link performance when the LVDS receivers of the FPGA are used, which is mandatory for the operation with the front-end board. The parameters are therefore optimized based on the study of eye diagrams.

The requirements on the optimization are derived from the components that are used along the data link. On the MuPix8 PCB, the serial data is sent to an LVDS repeater of type DS15BR400 [98] that boosts the signal in front of the SCSI-cable. It requires a minimum differential input voltage

<table>
<thead>
<tr>
<th>DAC</th>
<th>default</th>
<th>optimized</th>
<th>task</th>
</tr>
</thead>
<tbody>
<tr>
<td>VPVCO</td>
<td>7</td>
<td>12</td>
<td>oscillator bias</td>
</tr>
<tr>
<td>VNVCO</td>
<td>10</td>
<td>13</td>
<td>phase locked loop bias</td>
</tr>
<tr>
<td>VPPUMP</td>
<td>5</td>
<td>63</td>
<td>charge pump bias</td>
</tr>
<tr>
<td>VPDCL</td>
<td>12</td>
<td>24</td>
<td>global DCL bias</td>
</tr>
<tr>
<td>VNDCL</td>
<td>6</td>
<td>16</td>
<td>global DCL bias</td>
</tr>
<tr>
<td>VPDelDCL</td>
<td>12</td>
<td>40</td>
<td>delay for DCL clock</td>
</tr>
<tr>
<td>VNDelDCL</td>
<td>6</td>
<td>40</td>
<td>delay for DCL clock</td>
</tr>
<tr>
<td>VPDelDCLMux</td>
<td>12</td>
<td>24</td>
<td>delay for DCL mux clock</td>
</tr>
<tr>
<td>VNDelDCLMux</td>
<td>6</td>
<td>24</td>
<td>delay for DCL mux clock</td>
</tr>
<tr>
<td>VNLVDS</td>
<td>63</td>
<td>24</td>
<td>strength of LVDS buffer</td>
</tr>
<tr>
<td>VNLVDSDel</td>
<td>0</td>
<td>0</td>
<td>strength of pre-emphasis buffer</td>
</tr>
<tr>
<td>VPDelPreEmp</td>
<td>12</td>
<td>24</td>
<td>delay of pre-emphasis</td>
</tr>
<tr>
<td>VNDelPreEmp</td>
<td>6</td>
<td>24</td>
<td>delay of pre-emphasis</td>
</tr>
</tbody>
</table>

Table 6.2: Complete list of DAC settings of the DCL serializer. Values are given in decimal numbers. All DACs have a dynamic range from 0 to 63.

Figure 6.5: Eye diagram of sensor 84-01-10 link 0 for a supply voltage \( V_{High} \) of 1.8 V using the default settings before optimization.

of 100 mV that all MuPix links have to comply with. At the FPGA, the data is sampled using an LVDS receiver IP that has a minimum differential input voltage of 247 mV [21]. Within this setup, the latter is provided by the LVDS repeater that has a typical differential output voltage of 360 mV. Losses over the SCSI-cable are mitigated by enabling pre-emphasis at the repeater. The LVDS receiver further requires sinusoidal jitter of less than 0.35 UI = 280 ps for frequencies in a range from 1.493 MHz to 50 MHz [21]. This translates to a minimum eye width of 0.65 UI = 520 ps.

Figure 6.5 shows the initial eye diagram of an exemplary MuPix8 link using the default parameters. The link suffers from large jitter of 65 ps RMS. This results in an eye width of only 411 ps, which is approximately half a bit period (0.5 UI). As the link fails the eye width criterion, the goal of the optimization is to reduce the jitter and to increase the eye width. The signal-to-noise ratio, which is 15.7 for this exemplary link, as well as the eye height are also subject to optimization.

All bias DACs are tuned individually to enhance the quality of the serial signal. The DAC settings derived from the optimization study are compiled in Table 6.2. Most of the bias currents are increased. Only the driver strength for the differential output voltage, $VNLVDS$, is reduced in order to decrease the noise. Pre-emphasis is turned off as it does not enhance signal integrity within this setup. Figure 6.6a shows the effect of the optimized settings on the link used before. The jitter is significantly reduced (33 ps RMS), corresponding to a larger eye width of 600 ps = 0.75 UI. The signal-to-noise ratio is also increased to 22.2. It is further noteworthy that the sensor’s power consumption with the optimized settings agrees with that of the previous settings to within 1 %.
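The eye width criterion applied here can be reproduced with a few lines of arithmetic. The sketch below derives the unit interval and the minimum eye width from the data rate and the jitter tolerance quoted in the text and applies the criterion to the two eye widths measured before and after optimization; the function names are hypothetical.

```python
DATA_RATE_GBPS = 1.25   # nominal Mu3e data rate
MAX_JITTER_UI = 0.35    # jitter tolerance of the LVDS receiver in unit intervals


def unit_interval_ps(rate_gbps):
    """Duration of one bit in picoseconds."""
    return 1000.0 / rate_gbps


def min_eye_width_ps(rate_gbps=DATA_RATE_GBPS, max_jitter_ui=MAX_JITTER_UI):
    """Minimum eye width that remains after subtracting the tolerated jitter."""
    return (1.0 - max_jitter_ui) * unit_interval_ps(rate_gbps)


def passes_eye_width(eye_width_ps):
    """True if the measured eye width meets the receiver requirement."""
    return eye_width_ps >= min_eye_width_ps()


for label, width in (("default", 411.0), ("optimized", 600.0)):
    ui = width / unit_interval_ps(DATA_RATE_GBPS)
    print(f"{label}: {width:.0f} ps = {ui:.2f} UI, passes: {passes_eye_width(width)}")
```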

Studies of on-chip and sensor-to-sensor variations

Figure 6.6 shows eye diagrams of all four links of an exemplary MuPix8 sensor. A degradation of the eye opening from link 0 (width = 600 ps, height = 117 mV) to link 3 (width = 565 ps, height = 78 mV) is clearly visible. Moreover, the jitter of link 2 is more than 25% larger than that of the other links. The degrading signal quality from link 0 to link 3 could have two causes. The length of the signal lines from the LVDS output buffers to the pads differs among the links (Figure 6.7), hence, there is an intrinsic capacitance difference. This could explain that the high and low voltage levels are comparable among the links, but over- and undershoots are pronounced differently. Also, an issue of the on-chip power distribution within the DCL serializers and LVDS buffers cannot be fully excluded, as those parts have the largest power consumption density of the entire chip. To study this further, the global supply voltage for the serializer, called \( V_{\text{High}} \), is increased from its nominal value of 1.8 V to 1.9 V, 2.0 V and 2.045 V.

![Eye diagrams of sensor 84-01-10 for a supply voltage $V_{High}$ of 1.8 V.](image)

(a) link 0  
(b) link 1  
(c) link 2  
(d) link 3

Figure 6.6: Eye diagrams of sensor 84-01-10 for a supply voltage $V_{High}$ of 1.8 V.

(a) \( V_{\text{High}} = 1.8 \) V  
(b) \( V_{\text{High}} = 1.9 \) V  
(c) \( V_{\text{High}} = 2.0 \) V  
(d) \( V_{\text{High}} = 2.045 \) V

Figure 6.8: Eye diagrams of link 3 of sensor 84-01-10 for different values of \( V_{\text{High}} \).
Raising the supply voltage leads to an overall increase of all bias currents within the DCL serializer. Figure 6.8 shows the effect of the supply voltage on the eye diagram of link 3. For higher voltages, the jitter is reduced, which widens the eye, and the differential amplitude rises considerably. The noise grows as well, but mostly because over- and undershoots become more pronounced. This points to the on-chip capacitance being the major contributor to the signal differences among the serial links. The eye height improves as well. Hence, the increased supply voltage generally enhances the signal quality.

A sample of 10 MuPix sensors is used to study the on-chip variations as well as possible sensor-to-sensor variations quantitatively. Table 6.3 lists the samples with their approximate substrate resistivity given by the foundry. The sensors originate from a preproduction of the MuPix8. There is no further information available about the wafers from which the individual sensors originate. Eye diagrams are examined for all four links of each individual sensor operated at supply voltages of $V_{\text{High}} = 1.8 \text{ V}, 1.9 \text{ V}, 2.0 \text{ V}$ and $2.045 \text{ V}$. The serial analyzer is set to 10,000 acquisitions with automated measurements of the noise RMS, jitter RMS, eye width, eye height and amplitude. The signal-to-noise ratio is computed from the measured values of amplitude and noise.

The measured quantities are found to differ strongly among the four links of a sensor. However, a given link shows similar characteristics on all sensors. Link 3 always has the largest noise and smallest eye height, whereas link 2 possesses the largest jitter and smallest eye width. The influence of the supply voltage is comparable for all sensors. Since no influence of the sensor’s resistivity is observed, the measured quantities for the links 0 to 3 are averaged across all sensors. The detailed plots showing the measured quantities for all 40 individual serial links can be found in Appendix B.

Jitter is significantly reduced for higher supply voltages, see Figure 6.9a. Especially the jitter of link 2 improves by more than 20%. No significant enhancement is observed by increasing the supply voltage beyond 2.0 V.

<table>
<thead>
<tr>
<th>Sensor ID</th>
<th>Resistivity [Ωcm]</th>
</tr>
</thead>
<tbody>
<tr>
<td>0232-0001-000001</td>
<td>200</td>
</tr>
<tr>
<td>0232-0001-000003</td>
<td>200</td>
</tr>
<tr>
<td>0232-0001-000005</td>
<td>200</td>
</tr>
<tr>
<td>0084-0001-000010</td>
<td>80</td>
</tr>
<tr>
<td>0084-0001-000003</td>
<td>80</td>
</tr>
<tr>
<td>0084-0001-000008</td>
<td>80</td>
</tr>
<tr>
<td>0084-0002-000010</td>
<td>80</td>
</tr>
<tr>
<td>0084-0002-000004</td>
<td>80</td>
</tr>
<tr>
<td>0084-0003-000002</td>
<td>200</td>
</tr>
<tr>
<td>0084-0003-000029</td>
<td>200</td>
</tr>
</tbody>
</table>

Table 6.3: Sensors used for eye diagram study.
6.1 Studies of the Serializer of the Mupix Sensors

The sensor-to-sensor variations decrease for higher supply voltages as well. At 1.8 V, variations of more than 20% are observed, which are reduced to less than 10% at 2.0 V.

The same behaviour is reflected in the average eye width, see Figure 6.9b. The eye width reaches saturation at about 540 ps for link 2, between 580 ps and 590 ps for links 0 and 3, and around 610 ps for link 1. Sensor-to-sensor variations decrease accordingly. At 2.0 V, all links have an eye width larger than 520 ps, which is required by the LVDS receivers. It is therefore expected that the links can be operated without bit errors.

(a) Jitter RMS.  
(b) Eye width.  
(c) Noise RMS.  
(d) Signal-to-noise ratio.  
(e) Eye height.

Figure 6.9: Parameters of the serial data links 0 to 3, averaged over all 10 samples, as a function of the supply voltage. The RMS of the averaged values is used as uncertainty, hence the uncertainties represent the sensor-to-sensor variations. Voltages are not corrected for the attenuation of 6.6 dB by the clock-recovery unit.

Figure 6.9c shows the average noise RMS values for links 0 to 3 as a function of the supply voltage. On average, link 1 has the lowest and link 3 the highest noise value. All four links show a different dependence on the supply voltage. Sensor-to-sensor variations of up to 15% are observed.

The signal-to-noise ratio of the four links is influenced differently by the supply voltage, see Figure 6.9d. The SNR of link 3 shows a very small spread among the sensors and is hardly affected by the supply voltage. Each of the other links has a different optimal supply voltage which maximizes the SNR.

The eye height of all links improves significantly with increasing supply voltage, see Figure 6.9e. All links gain between 15 and 25 mV.

The eye width and eye height parameters undoubtedly show an improvement in the signal quality of the MuPix8’s serial data links for higher supply voltages. Sensor-to-sensor variations, which are most pronounced in the jitter of the serial links, are also reduced for increased supply voltages. However, the significant differences among the links do not vanish with higher supply voltage. These variations are therefore most likely caused by the capacitance differences of the transmission lines on the chip.

6.1.3 Bit error rate studies with MuPix8

The serializer settings found in the previous section are checked for error-free data transmission using bit error rate tests. The same sample of sensors (Table 6.3) and the same hardware setup (Figure 6.4) are used. The serial links are sampled by the FPGA that is connected to the MuPix8 PCB via a SCSI cable. The sensors are tested one at a time, with all four sensor links being investigated in parallel.

In the FPGA, LVDS receiver IPs are instantiated to deserialize the data stream. The FPGA performs bit alignment of the serial data and checks the data stream for errors of the running disparity as well as for obvious 8b/10b errors, i.e. any pattern that consists of more than 6 '1's or '0's. The implementation is further described in the context of the vertical slice setup in section 7.3.1.

Each bit error rate test is conducted over a period of 100 seconds, which allows an upper limit of $2.4 \cdot 10^{-11}$ to be set on the bit error rate if no error is found. The measured bit error rates are correlated with the previously measured eye diagram parameters. Figure 6.10 shows the bit error rates of all tested links in a two-dimensional parameter space spanned by the eye width and eye height of the links. If no error is found, the upper limit of $2.4 \cdot 10^{-11}$ is plotted.
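The quoted limit can be reproduced with a short calculation. The following is a minimal sketch, assuming that the limit is a 95% confidence level bound for zero observed errors (as stated for the later tests in this chapter) and that each link runs at the 1.25 Gb/s serial rate of the sensor links. The function name `ber_upper_limit` is chosen for illustration only.

```python
import math


def ber_upper_limit(n_bits, confidence=0.95):
    """Upper limit on the bit error rate if zero errors are seen in n_bits.

    The probability of observing no error is (1 - p)^n, approximately
    exp(-p * n). Requiring this probability to be at most 1 - confidence
    gives p <= -ln(1 - confidence) / n.
    """
    return -math.log(1.0 - confidence) / n_bits


LINK_RATE_BPS = 1.25e9  # assumed serial data rate per link
TEST_SECONDS = 100      # duration of one test as given in the text

limit = ber_upper_limit(LINK_RATE_BPS * TEST_SECONDS)
print(f"{limit:.1e}")
```

Under these assumptions the result rounds to the $2.4 \cdot 10^{-11}$ given above. The limit scales inversely with the number of tested bits, so longer run times or higher data rates lower it accordingly.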

For a supply voltage of 1.8 V, 8 sensors pass the test without any errors, but two sensors fail, with a significant number of errors counted in all 8 of their channels, see Figure 6.10a. Five of these channels fail because one of

Figure 6.10: Measured bit error rates and upper limits for the bit error rate versus the eye parameters for all tested links at different supply voltages. The upper limit for links without errors is $2.4 \cdot 10^{-11}$.

Table 6.4: Bit error rates for three links with potentially good eye diagram parameters, that fail because of cross-talk. The bit error rates (BER) are measured with (parallel) and without (individual) the other links at the same time.
Hence, there is a potential cross-talk source in the sensor and on the first centimetres of traces within the PCB up to the solder jumpers that could lead to this non-zero error rate.

For supply voltages of 1.9 V and higher, the signal quality of the serial links is improved. This is reflected in the measurements of the bit error rates. All channels are found to run error-free and pass the test (Figures 6.10b-d). For a supply voltage of 2.0 V and above, all but one link fulfil the eye width requirement posed by the LVDS receiver. However, error-free data transmission is also found for links with an eye width smaller than 500 ps, but only if the eye height is larger than 115 mV. Hence, the two parameters cannot be fully disentangled.

For $V_{\text{High}} = 2.0 \text{ V}$, the sensor-to-sensor variations are significantly smaller than for lower voltages, which is advantageous for the stable operation of multiple sensors at a time.

6.2 LVDS Receiver Tests of the Front-end FPGA

The bit error rate test of the MuPix sensor in the previous section has been conducted with a Stratix IV FPGA on a development board. For the vertical slice setup, described in the following chapter, the first front-end board prototype is integrated into the data acquisition system. In preparation for this integration, the implementation of the front-end board is tested. In that context, the proper operation of all 36 LVDS receiver channels that are connected to the QSH-banks is verified using a bit error rate test.

The setup is schematically depicted in Figure 6.11a. An adapter PCB routes the LVDS receiver channels to SMA connectors. On a Stratix V GS
The firmware implementation of this setup is illustrated in Figure 6.11b. The data format uses 8b/10b encoding in order to be comparable to the protocol of the MuPix sensors. An 8 bit counter is used as test data. To allow the receiver to recover the correct word boundary, the control character K28.5 is inserted regularly in the data stream. On the front-end FPGA, the data stream is deserialized within the LVDS receiver IP. The word boundary is restored in the alignment process searching for the unique pattern of the control character. Bit errors are detected using an exclusive or (XOR)-comparison of the incoming and the expected data.
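The XOR comparison described above can be sketched in software. This is only an illustration of the principle, assuming decoded 8 bit counter words on both sides. The function name `count_bit_errors` and the injected error positions are hypothetical and do not reflect the actual firmware.

```python
def count_bit_errors(received, expected):
    """Count differing bits between two equally long sequences of 8 bit words."""
    if len(received) != len(expected):
        raise ValueError("sequences must have the same length")
    # XOR leaves a 1 at every bit position where the two words differ.
    return sum(bin((r ^ e) & 0xFF).count("1") for r, e in zip(received, expected))


# Expected data: an 8 bit counter that wraps around at 256.
expected = [i & 0xFF for i in range(1000)]

# Received data with three artificially flipped bits (illustrative only).
received = list(expected)
received[10] ^= 0b00000101   # two bit errors in one word
received[500] ^= 0b10000000  # one bit error

errors = count_bit_errors(received, expected)
total_bits = 8 * len(expected)
print(errors, errors / total_bits)
```

In hardware the expected counter value has to be derived from the received stream after alignment. The sketch sidesteps this by generating both sequences from the same starting value.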

As the LVDS transmitter of the Stratix V FPGA acts as an “ideal” transmitter with high signal quality, the implementation of the LVDS links on the front-end board can be properly tested. All receiver channels are measured to be running error-free over run times of at least 15.5 hours per channel. The resulting upper limit for the bit error rate is $BER < 4 \cdot 10^{-14}$ at a confidence level of 95%. This test verifies the correct physical implementation of the LVDS receiver channels on the board as well as the correct implementation of the receiving firmware.

6.3 Optical Link Studies

In preparation for the vertical slice test, all opto-electrical components used within this setup are tested for proper operation. Therefore, bit error rate tests are carried out for the QSFP+ and MiniPod transceivers on the front-end board, as well as for SFP+ transceivers in a stand-alone setup.

6.3.1 MiniPods and QSFP on the front-end board

The setup for the tests of the transceivers on the front-end board is depicted in Figure 6.12. The MiniPods are connected in an optical loopback, using the AFBR-811FH1Z transmitter and AFBR-821FH1Z receiver devices by Avago [75]. The 12-fold transmitter and receiver are potentially capable of data rates exceeding 10 Gb/s per channel. For the tests, the link is established using a 1 m long, 12-fold multi-mode fibre with PRIZM connector. The QSFP link is established using a Molex 106410-A-02 device that contains four 3 m long single-mode fibres connecting the active transceiver modules.

A block diagram of the firmware implementation is shown in Figure 6.13. The data generator provides 32 bit words that consist of two consecutive states of a 16 bit counter. Every time the counter wraps around, a unique sequence of four 32 bit words is sent that includes a control character to perform bit and byte alignment at the receiver. The data is 8b/10b en- and decoded within the gigabit transceiver blocks. A serial loopback within the FPGA is used to verify the firmware implementation without the optical devices.

Figure 6.12: Setup of the optical link studies performed with the Stratix IV front-end board. The MiniPod transceivers are tested using an optical loopback on the front-end board. The QSFP transceivers are tested using two front-end boards.

Figure 6.13: Block diagram of the firmware implementation for the optical links studies.

Bit error rate tests are performed at 3.125 Gb/s, 5.0 Gb/s and 6.25 Gb/s. The latter is the target data rate for the optical links between the front-end and the switching board and also in the vertical slice test. It is close to the maximum capabilities of the FPGA of 6.375 Gb/s.

All 16 optical links are found to run error-free at all tested data rates. The optical modules are operated in their default configuration. Table 6.5 shows the upper limits on the bit error rates achieved with the MiniPod channels. Up to 5.0 Gb/s, the FPGA’s GXB transceivers are operated with

<table>
<thead>
<tr>
<th>data rate (Gb/s)</th>
<th>tested bits</th>
<th>bit errors</th>
<th>bit error rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.125</td>
<td>$3.0 \cdot 10^{12}$</td>
<td>0</td>
<td>$\leq 1.0 \cdot 10^{-12}$ @ 95% CL</td>
</tr>
<tr>
<td>5.0</td>
<td>$5.3 \cdot 10^{14}$</td>
<td>0</td>
<td>$\leq 5.6 \cdot 10^{-15}$ @ 95% CL</td>
</tr>
<tr>
<td>6.25</td>
<td>$8.6 \cdot 10^{16}$</td>
<td>0</td>
<td>$\leq 3.5 \cdot 10^{-16}$ @ 95% CL</td>
</tr>
</tbody>
</table>

Table 6.5: Bit error rates per channel for the MiniPod transceivers.

Table 6.6: Bit error rates per channel for the QSFP+ transceivers.

<table>
<thead>
<tr>
<th>data rate (Gb/s)</th>
<th>tested bits</th>
<th>bit errors</th>
<th>bit error rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>3.125</td>
<td>$3 \cdot 10^{12}$</td>
<td>0</td>
<td>$\leq 1.0 \cdot 10^{-12}$ @ 95%CL</td>
</tr>
<tr>
<td>6.25</td>
<td>$3.8 \cdot 10^{14}$</td>
<td>0</td>
<td>$\leq 8.0 \cdot 10^{-15}$ @ 95%CL</td>
</tr>
</tbody>
</table>

6.3.2 SFP+ transceivers

In the vertical slice setup, the back-end FPGA receives the data optically using SFP+ transceivers of type AFBR-709 by Avago [99]. The transceivers are therefore tested at different data rates to verify their suitability.

The transceivers are operated using a Stratix V FPGA development kit [19] that connects eight gigabit transceivers via the Santa Luz HSMC-SFP card [100] to the SFP+ modules. The modules are connected using 1 m long multi-mode fibres in a pairwise loopback, illustrated in Figure 6.14.

![Figure 6.14: Setup of the optical link studies performed with SFP+ transceivers.](image-url)

Bit error rate tests are conducted with an adapted firmware implementation of the transceiver toolkit design example by Intel [101]. A pseudorandom binary sequence-7 (PRBS7) is used as test pattern, which is comparable to an 8b/10b encoded data stream as the maximum run length of consecutive 1’s is limited. All channels are operated error-free at 4 Gb/s using the standard configuration of the FPGA’s gigabit transceivers. The upper limit for the bit error rate is $1.2 \cdot 10^{-14}$ at 95% CL. At 8 Gb/s, all channels are operated without errors as well by enabling the first post-tap of the pre-emphasis of the FPGA’s gigabit transmitters. An upper limit for the bit error rate of $4.8 \cdot 10^{-15}$ at 95% CL is measured per channel. As error-free operation of the SFP+ transceivers is possible at data rates of 8.0 Gb/s, they can be safely used within the vertical slice setup where the data rate is limited to 6.25 Gb/s.
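The bounded run length of a PRBS7 pattern can be illustrated with a small software model. The sketch below assumes a 7 bit linear feedback shift register with feedback from stages 7 and 6, which is the commonly used PRBS7 polynomial $x^7 + x^6 + 1$. It is not taken from the transceiver toolkit design, and the function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence from a 7 bit shift register."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the register locks up")
    bits = []
    for _ in range(n_bits):
        # Feedback is the XOR of the two oldest register stages (7 and 6).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def longest_run(bits, value):
    """Length of the longest run of consecutive bits equal to value."""
    best = run = 0
    for b in bits:
        run = run + 1 if b == value else 0
        best = max(best, run)
    return best


# Two full periods, so that runs crossing the period boundary are included.
seq = prbs7(2 * 127)
print("repeats after 127 bits:", seq[:127] == seq[127:])
print("longest run of 1s:", longest_run(seq, 1))
print("longest run of 0s:", longest_run(seq, 0))
```

A maximal-length sequence of this register repeats after 127 bits, and its runs of identical bits cannot exceed the register length. This is somewhat longer than the run length limit of 8b/10b encoded data, but still bounded.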
The components of the Mu3e readout system are subject to individual studies. Among other things, the implementations of the sorting algorithm for the front-end [2] and the online event selection for the filter farm [63] have previously been examined. The integration of the readout components into a system represents an important milestone in the experiment’s development. The modularity of the designs of the detector and the readout system allows integration studies to be performed in the form of vertical slice tests.

The front-end board is a custom development for the Mu3e experiment in order to fulfil the specific experimental requirements. It has to process the data of up to 45 serial sensor links running at 1.25 Gb/s each and provide optical bandwidth of at least 30 Gb/s for the second phase of the experiment. In addition, the board has to fulfil spatial constraints and tolerate magnetic fields up to 2 T. Since the development of this board is a time-consuming process, the production of a first prototype has been started at an early stage of the experiment’s development. This prototype neither fulfils the spatial requirements nor tolerates strong magnetic fields. It does, however, provide all means of electrical and optical communication, which enables its integration into a vertical slice of the Mu3e pixel detector using the current pixel sensor prototype MuPix8.

Developments for the readout of the Mu3e pixel detector are gathered in the framework of the MuPix Telescope [102]. During test beam measurements, not only the pixel sensors can be characterized using this particle tracking telescope, but also hard-, firm- and software developments can be checked for their operation under conditions that come as close to the final experimental conditions as possible. The MuPix Telescope therefore presents an ideal testbed for this integration study.

The subsequent sections describe the hard-, firm- and software implementations of the vertical slice setup. The setup is successfully commissioned and operated as a particle tracking telescope at test beam measurements at the Deutsches Elektronen-Synchrotron (DESY) in Hamburg and at the Mainz Microtron (MAMI). The performance of the full readout system is discussed in detail.

7.1 The MuPix Telescope

The MuPix Telescope is a particle tracking telescope with high rate capabilities [102]. The principle of a tracking telescope based on pixel detectors is illustrated in Figure 7.1. A particle penetrates several layers of pixel sensors. The hit positions on the sensors are used to reconstruct the particle’s track, which is extrapolated to a device under test that is placed between the layers of the beam telescope. This allows, for instance, the detection efficiency of the device under test to be measured.

The MuPix Telescope is used to characterize the MuPix sensor prototypes at test beam measurements. The tracking telescope typically consists of four MuPix sensors that are controlled and read out by a Stratix IV FPGA development board.

The FPGA board communicates with its host PC via PCIe. A device driver handles the memory mapping between PC and FPGA [63, 93]. Four different base address registers (BARs) are used for control and status registers as well as for two 256 kB memory regions that can be written either by the PC or by the FPGA. Data transfer from the FPGA to the PC can be performed via polling or DMA. Polling limits the data throughput to a few MB/s [102]. Using DMA, several GB/s can be transferred [63], which enables high-speed data acquisition.

The MuPixTelescope data acquisition software is a multi-threaded application implementing asynchronous Qt threads for readout, file writing and monitoring [64, 92]. A graphical user interface (GUI) allows the setup to be controlled. Separate dialogues are used to display and modify the sensor configuration, to monitor the incoming sensor hit data and to check the status of the data links.

7.2 HARDWARE DESCRIPTION OF THE VERTICAL SLICE

The vertical slice integrates the Mu3e front-end board into the continually progressing development of the Mu3e readout system within the MuPix Telescope. Figure 7.2 shows the hardware components and their interconnections within this integration study, which are described in the following sections. The setup comprises eight MuPix8 sensors that are read out and controlled by the Stratix IV front-end board prototype. The data acquired by the front-end FPGA is transferred optically to an FPGA within the back-end PC. To operate the front-end and the back-end FPGAs synchronously, a clock distribution system is implemented. A photograph of the setup in the laboratory is shown in Figure 7.3.

Figure 7.2: Schematic drawing of the vertical slice setup.

Figure 7.3: Photograph of the vertical slice setup in the laboratory. The most important components are highlighted.
7.2.1 MuPix8

The MuPix8 pixel sensors are described in detail in section 5.4. The sensors are glued and bonded to custom designed PCBs, the MuPix8 Inserts [95], which in turn are mounted on their motherboards, the MuPix8 PCBs [96], as described in section 6.1.2. A photograph of the sensor on its PCB is shown in Figure 7.4.

7.2.2 QTH-SCSI-Adapter-PCB

The QTH-SCSI-Adapter-PCB is used to connect two MuPix8 PCBs via SCSI-68 sockets to one QSH-bank of the front-end board. The adapter card is shown in Figure 7.5. It features two LVDS fan-out chips of type DS10BR254 to distribute the differential clock and reset signals from the FPGA to both sensors. These chips possess a typical channel-to-channel skew of 40 ps [103]. To guarantee a total skew of less than 100 ps between the clocks of the two sensors, the clock traces on the PCB are matched within 40 ps.
The fan-out chips can be tested individually without having to connect the PCB to the front-end board, as extra SMA connectors are implemented to drive a secondary input as well as to probe a spare output of the chips. The spare output additionally allows the clock phases between different QSH-banks to be examined when the PCB is connected to the front-end board. This enables the clock skew among all connected sensors to be minimized.

Signal transmission over the SCSI cables is fully differential. To accommodate all signals on the banks of the front-end board, conversion from single-ended to differential signals and vice versa is implemented using the DS90LV031A/032A line drivers and receivers [104, 105]. The signals used for sensor configuration are connected in parallel, except for the data input which is daisy-chained between the sensors, see Figure 7.6.

7.2.3 Front-end board

A Stratix IV front-end board prototype is used, which is described in detail in section 5.5.2. Four QTH-SCSI-Adapter-PCBs are connected to operate eight MuPix8 sensors in parallel. A USB-Blaster cable [106] connects the back-end PC with the JTAG header of the front-end board, which allows the FPGA to be controlled from the back-end. A MiniPod transmitter and receiver pair is used for the optical communication between the front-end and the back-end. As the MiniPods provide enough bandwidth for the readout system, no QSFP module is installed. The FPGA’s auxiliary clock as well as the reference clocks for the on-board clock chips are provided by the clock distribution system.

7.2.4 Clock distribution

To operate the front-end and back-end FPGAs synchronously, a clock distribution system is implemented which is based on an SI5338 evaluation board [107] as clock source. It generates four differential output clocks, indicated in Figure 7.2. Two 125 MHz clocks are used as reference inputs for the clock chips on the front-end board. In addition, a system clock of 50 MHz is provided for the front-end FPGA. The connections are realized electrically using coaxial cables with SMA connectors.

A 125 MHz clock is provided for the back-end FPGA. To prevent a ground loop in the system, the clock is optically coupled via a MiniPod transmitter and receiver pair. The MiniPods are operated on custom designed PCBs, where each electrical channel can be connected individually.

7.2.5 Back-end PC and receiver card

The back-end PC hosts a Stratix IV GX PCIe development board [91] as receiver card. This board is chosen because device drivers, PCIe interfaces and other firmware components were already available and extensively tested from the development of the MuPix Telescope. The board features two HSMC connectors, referred to as HSMC ports A and B. Each port hosts up to 8 multi-gigabit transceiver channels and 68 multi-purpose pins. At port A, a Santa Luz SFP+ transceiver card [100] is installed via a 15 cm long Samtec HSMC high-speed cable (HDR-131992-01-HQDP [108]). The Santa Luz card is equipped with 8 SFP+ transceivers of type AFBR-709 by Avago [99]. At HSMC port B, an HSMC-SCSI adapter PCB [95] is mounted, which is used to receive the clock signal from the distribution system.

The SFP+ transceivers are connected to 8 MiniPod transceiver channels of the front-end board via 5 m long PRIZM-to-LC fan-out cables [109]. The 4 remaining MiniPod transceiver channels are connected in an optical loopback that allows the operational stability of the front-end board and its transceivers to be monitored and helps to disentangle issues within the setup. Figure 7.7 illustrates the channel assignments of the optical transceivers.

7.3 Firm- and software at the front-end

The tasks of the front-end FPGA for the readout of the Mu3e detector are described in section 5.5. In this vertical slice test, they are split between the front-end and the back-end FPGA because the currently available implementation of the sorting algorithm is too resource-intensive to fit onto the FPGA of the first front-end board prototype. The front-end therefore only performs data deserialization, 8b/10b decoding and error detection. The link protocol unpacking as well as the packet building are moved to the back-end. The subsequent sections describe the firmware and software implementations on the front-end FPGA in the vertical slice setup.

7.3.1 Front-end firmware

The top level of the firmware implemented on the front-end FPGA is shown in Figure 7.8. It is composed of two major blocks: the data path and the slow control. The data path receives the serial data from the sensors and prepares it for the optical transmission to the back-end. Furthermore, it receives the sensor configuration from the back-end and transfers it to the pixel chips. The slow control block adjusts the delays between the sensor clocks and communicates with the peripheral components on the front-end board.

Slow control

Most of the on-board slow control is handled by an embedded system using a NIOS II [110] soft processor within the FPGA, see Figure 7.8. The processor is implemented in the economy version NIOS II/e to minimize the resource utilization. The system is clocked by the FPGA’s 50 MHz auxiliary clock and can be accessed via a JTAG universal asynchronous receiver-transmitter (UART), which allows interaction with the software running on the processor and access to all memory-mapped components. A 160 kB RAM stores the software executed by the processor and acts as stack and heap. In addition, 4 kB of RAM are reserved for storage of slow control data with a fixed address map.

Peripheral firmware components, such as the transceiver reconfiguration controller and the bit error counters, are accessed through 32 bit wide parallel I/Os (PIOs) or Avalon memory interfaces [111]. Communication with the test pulse DACs and the clock chips is realized over SPI. The settings of the clock chips, which are initially created using the ClockBuilder Pro software by Silicon Labs [112], are stored in an on-chip RAM and transferred to the clock chips after the FPGA has been configured. Communication with the optical transceivers and the ADCs is realized over I2C, using an individual I2C master entity [113] per device.

![Diagram of Front-end board slow control](image_url)

The firmware features various debug components, such as interfaces to LEDs and push-buttons as well as an LCD controller. To keep track of the firmware version running on the FPGA, the hash value of the version control revision at the time of compilation is stored in a 28 bit register. In addition, a serial flash loader IP allows the hardware image stored on the connected flash memory to be updated via JTAG.

At the front-end board, each QSH-bank provides a differential 125 MHz clock for the attached detectors that originates from the FPGA. Routing delays within the FPGA and the PCB lead to phase differences between the output clocks, which would deteriorate the detector system’s time resolution. As a countermeasure, phase compensation is realized in the FPGA for each output clock. To cover the full dynamic range, it is implemented in two stages. Firstly, a rough clock alignment with an accuracy of 1 ns is realized by sampling the state of the 125 MHz master clock with the rising and falling edges of a 500 MHz clock, which is derived from the master clock using a PLL. Secondly, output buffers with two reconfigurable delay chains are implemented, which allow a maximum delay of 1.1 ns to be added in steps of 50 ps. All clocks can therefore be aligned within 25 ps or less. The configuration of the delay chains in the output buffer is simulated and verified using the ModelSim [114] simulation software. Figure 7.9 shows the effect of increasing the delay between a buffer’s input and output clock by 50 ps within the simulation.
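The arithmetic behind the two-stage compensation can be sketched as follows. This is a minimal model that assumes the required delay is known in picoseconds and simply rounds to the nearest fine step. The function `split_delay` and its return format are hypothetical and do not describe the actual firmware registers.

```python
COARSE_STEP_PS = 1000    # coarse stage: both edges of a 500 MHz clock, 1 ns
FINE_STEP_PS = 50        # fine stage: delay chain step size
FINE_MAX_PS = 1100       # maximum delay of the fine stage
CLOCK_PERIOD_PS = 8000   # period of the 125 MHz clock


def split_delay(target_ps):
    """Split a target delay into coarse steps, fine steps and the residual.

    Returns (coarse_steps, fine_steps, residual_ps), where residual_ps is the
    part of the target delay that the two stages cannot reproduce.
    """
    if not 0 <= target_ps < CLOCK_PERIOD_PS:
        raise ValueError("target delay must lie within one clock period")
    coarse_steps = target_ps // COARSE_STEP_PS
    remainder = target_ps - coarse_steps * COARSE_STEP_PS
    # Round the remainder to the nearest multiple of the fine step.
    fine_steps = (remainder + FINE_STEP_PS // 2) // FINE_STEP_PS
    if fine_steps * FINE_STEP_PS > FINE_MAX_PS:
        raise ValueError("fine delay exceeds the range of the delay chains")
    applied = coarse_steps * COARSE_STEP_PS + fine_steps * FINE_STEP_PS
    return coarse_steps, fine_steps, target_ps - applied


print(split_delay(3270))
worst = max(abs(split_delay(t)[2]) for t in range(CLOCK_PERIOD_PS))
print("worst-case residual in ps:", worst)
```

Because the fine stage covers more than one coarse step, every delay within a clock period can be reached, and rounding to the nearest 50 ps step leaves a residual of at most half a step.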

Data path

The data path implements a readout scheme of the Mu3e front-end firmware that can be used for detector commissioning. The deserialized and decoded raw detector data is transferred to the back-end without further processing, which allows the general functionality of the attached sensors to be checked. The data path, depicted in Figure 7.10, consists of two major blocks, the receiver block and the transceiver block, which are described in the subsequent paragraphs.

**Receiver Block** The receiver block has 32 serial inputs for the data links of the 8 connected pixel sensors. Each input is connected to an LVDS receiver IP that deserializes the data stream of 1.25 Gb/s into parallel words of 10 bit clocked at 125 MHz. The dedicated deserializer IPs are located in two row banks of the FPGA. All receivers within one bank share a common, external PLL that provides the receivers’ clocks as described in [21].

The parallel detector data is 8b/10b decoded and the word boundary is restored in the alignment process. Figure 7.11 displays the state machine of the alignment process. After reset, the state machine waits for the receiver’s PLL to lock to the reference clock and for the dynamic phase alignment (DPA) circuitry to lock its phase to the data stream. Afterwards, the 8b/10b decoded data stream is checked for the occurrence of the control character K28.5 within 512 consecutive clock cycles. The sensor protocol ensures that the control character is sent more than once within this time window. If the control character is not identified, the alignment process triggers a bit shift in the LVDS receiver, resets the receiver’s FIFO, and restarts the process. If the control character is not identified within 10 bit shifts, the LVDS receiver channel and the DPA circuitry are reset and the alignment starts over. Once the control character is found, the channel is flagged ready to the downstream logic. The data stream is continuously monitored for the occurrence of the control character in order to take action and re-align the data in the event of a link failure.

Figure 7.11: Diagram of the state machine that performs word alignment for the LVDS links.
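The search for the word boundary can be illustrated with a simplified software model. The sketch below makes several assumptions that are not part of the firmware description: it matches the two 10 bit codes of K28.5 directly instead of checking the decoded stream, it treats the initial attempt as the first of 10 trial positions, and it uses a hypothetical test stream of K28.5 followed by disparity-neutral D21.5 words. PLL and DPA locking as well as the FIFO reset are not modelled.

```python
# 10 bit codes of K28.5 for negative and positive running disparity.
K28_5_CODES = {"0011111010", "1100000101"}
D21_5 = "1010101010"  # disparity-neutral data word used as filler


def make_stream(n_frames, words_per_frame, offset):
    """Build a serial bit string whose first word boundary lies at `offset`."""
    k_codes = ["0011111010", "1100000101"]
    bits = D21_5[len(D21_5) - offset:]  # tail of a data word, empty for offset 0
    for frame in range(n_frames):
        # The K28.5 code alternates because each one flips the running disparity.
        bits += k_codes[frame % 2] + D21_5 * words_per_frame
    return bits


def slice_words(bits, shift, n_words):
    """Deserialize n_words 10 bit words, starting `shift` bits into the stream."""
    return [bits[shift + 10 * i: shift + 10 * (i + 1)] for i in range(n_words)]


def align(bits, window=512, max_shifts=10):
    """Return the bit shift at which K28.5 is seen within the window.

    Returns None if no position works, in which case the caller would reset
    the receiver channel and start over.
    """
    for shift in range(max_shifts):
        if any(word in K28_5_CODES for word in slice_words(bits, shift, window)):
            return shift
    return None


stream = make_stream(n_frames=6, words_per_frame=100, offset=3)
print(align(stream))
```

In this model the control character recurs every 101 words, so it appears several times in each 512 word window once the correct bit position is reached. At all other positions the comma pattern cannot be matched.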

In parallel to the alignment process, a disparity checker calculates the running disparity and raises a flag if one of the following two error conditions is met: either the running disparity (section 4.2.2) is violated, or the received word does not comply with the 8b/10b rule that any 10 bit word may only contain between 4 and 6 ’1’s. The output of the disparity checker is connected to a generic error counter that accumulates the uptime of the link and its errors. Both values are accessible through the embedded system.

<table>
<thead>
<tr>
<th>error condition</th>
<th>error code</th>
</tr>
</thead>
<tbody>
<tr>
<td>FIFO empty</td>
<td>K27.7</td>
</tr>
<tr>
<td>link not aligned</td>
<td>K30.7</td>
</tr>
<tr>
<td>wrong control character</td>
<td>K29.7</td>
</tr>
</tbody>
</table>

Table 7.1: Error conditions that are caught at the front-end’s receiver block to prevent corrupted sensor links from causing 8b/10b errors on the optical links. The erroneous data words are replaced by the indicated control characters which are not used in the protocol of the MuPix sensors.
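The two error conditions of the disparity checker can be expressed in a few lines. The following sketch is a simplified word-level model: it evaluates the disparity of the complete 10 bit word, whereas the 8b/10b code defines the running disparity per 6 bit and 4 bit sub-block. The function names and the choice to keep the running disparity unchanged after an error are assumptions made for illustration.

```python
def check_word(word, running_disparity):
    """Check one 10 bit word given as a bit string such as '0011111010'.

    running_disparity is -1 or +1. Returns (ok, new_running_disparity).
    """
    ones = word.count("1")
    # Condition 1: a valid word contains between 4 and 6 ones.
    if ones < 4 or ones > 6:
        return False, running_disparity
    disparity = ones - (len(word) - ones)  # -2, 0 or +2 at this point
    if disparity == 0:
        return True, running_disparity
    # Condition 2: an unbalanced word must flip the running disparity.
    if (disparity > 0) == (running_disparity > 0):
        return False, running_disparity
    return True, -running_disparity


def count_errors(words, running_disparity=-1):
    """Count flagged words in a sequence, tracking the running disparity."""
    errors = 0
    for word in words:
        ok, running_disparity = check_word(word, running_disparity)
        errors += 0 if ok else 1
    return errors


# K28.5 with alternating disparity, separated by neutral D21.5 words.
good = ["0011111010", "1010101010", "1100000101", "1010101010"]
# The same K28.5 code twice in a row violates the running disparity.
bad = ["0011111010", "0011111010"]
print(count_errors(good), count_errors(bad))
```

Such a check catches many single bit errors without knowledge of the payload, which is why it is useful for monitoring link quality during normal data taking.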

The two LVDS receiver banks are operated in separate clock domains. FIFOs are used to synchronize all data streams to a common 125 MHz clock without data losses. Afterwards, the data is prepared for the optical transmission. To guarantee that corrupted data from the LVDS links does not cause bit errors on the optical links, erroneous words are replaced with control symbols that are not part of the link protocol for the conditions compiled in Table 7.1. In addition, the erroneous words can be easily detected in the back-end where the data is checked for compliance with the sensor protocol.
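
As an illustration of the substitution listed in Table 7.1, the following Python sketch maps the three error conditions to their control characters. The byte values follow from the usual Kx.y naming convention of 8b/10b; the function names, the condition labels and the `(byte, is_k)` representation of a decoded word are hypothetical.

```python
def k_code(x, y):
    """Byte value of control character Kx.y (y in the upper 3 bits, x in the lower 5)."""
    return (y << 5) | x


# Error conditions and replacement characters as compiled in Table 7.1.
ERROR_CODES = {
    "fifo_empty": k_code(27, 7),
    "link_not_aligned": k_code(30, 7),
    "wrong_control_character": k_code(29, 7),
}


def sanitize(byte, is_k, condition=None):
    """Return the (byte, is_k) pair to forward to the optical link.

    If an error condition is flagged, the word is replaced by the
    corresponding control character; otherwise it is passed on unchanged.
    """
    if condition is not None:
        return ERROR_CODES[condition], True
    return byte, is_k
```

Because none of these characters occurs in the MuPix protocol, a downstream check only needs to compare the received control characters against this small set to identify the error condition.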

**Transceiver Block** At the interface between LVDS receiver and GXB transceiver block, the data is grouped into eight busses of five bytes, each combining four bytes of detector data and an extra byte containing either slow control information or an idle character (K23.7). The transceivers are operated in double-width mode at a data rate of 6.25 Gb/s with internal 8b/10b en- and decoders. The parallel data interface has a width of four bytes at a clock frequency of 156.25 MHz per channel.

The loss-free transition from a 5 byte wide bus at 125 MHz to a 4 byte wide bus at 156.25 MHz is realized with a FIFO. The implementation is illustrated in Figure 7.12. At the faster clock domain, data exceeding the output bus width is stored in an intermediate buffer and combined with the data that is subsequently read, until the buffer is completely filled up. At that point, no read request is issued and the buffer is emptied.

![Figure 7.12: Clock transition at the gigabit transmitter using a FIFO to reduce the bus width from 5 bytes at 125 MHz to 4 bytes at 156.25 MHz without data losses. An X means that no data is requested to be read.](image-url)
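
The bus width reduction can be modelled in a few lines. The following Python sketch reproduces the read pattern described above under the assumption that the FIFO always holds data when a read is requested; the function name `gearbox_5_to_4` is hypothetical and the model ignores all timing details of the hardware.

```python
from collections import deque


def gearbox_5_to_4(words5):
    """Convert a sequence of 5-byte words into 4-byte words.

    Models the read side of the FIFO: one 4-byte word is emitted per fast
    clock cycle, and the read request is skipped whenever the intermediate
    buffer alone already holds a full output word.
    Returns (out_words, read_pattern), where read_pattern holds True for
    cycles with a read request and False for cycles without one.
    Trailing bytes that do not fill an output word remain in the buffer.
    """
    fifo = deque(words5)
    buffer = b""
    out_words, read_pattern = [], []
    while fifo or len(buffer) >= 4:
        if len(buffer) < 4:
            buffer += fifo.popleft()     # read request: fetch the next 5 bytes
            read_pattern.append(True)
        else:
            read_pattern.append(False)   # buffer is full: no read this cycle
        out_words.append(buffer[:4])
        buffer = buffer[4:]
    return out_words, read_pattern
```

Four input words yield five output words with one idle read cycle, which matches the ratio of the clock frequencies: 5 bytes at 125 MHz and 4 bytes at 156.25 MHz both correspond to 625 MB/s.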

After reset, writing of the data busses to the FIFOs is disabled by the link control units. Instead, the link control units generate a fixed and repetitive data pattern to facilitate synchronization and byte alignment at the receivers. The selected sequence is shown in Table 7.2. The links can be individually enabled via JTAG, which requests the incoming data to be written to the FIFOs. When the data becomes available for the link, a unique data pattern is sent to indicate the start of the subsequent data flow.

The transceivers are realized as hard IPs on the FPGA and are described in detail in section 4.3.2. The FPGA’s physical differential transceiver input and output pins are connected to the MiniPod receiver and transmitter channels, respectively. The reset of the transceivers is implemented according to the specifications in [21]. The transceivers are connected to a reconfiguration controller that is controlled through the NIOS. Pre-emphasis and equalization of the transmitter and receiver channels are individually adjustable per channel. Furthermore, the EyeQ feature is implemented to probe the signal quality of the high-speed links directly at the physical receiver input.

Word boundaries are lost after data serialization and have to be restored at the receiver. Figure 7.13 shows the state machine of the word alignment process for the GXB receivers. After reset, the state machine waits for the CDR circuitry to lock to the incoming data stream. The receiver subsequently searches for the alignment pattern K28.5 within the data stream. If the pattern is detected, the receiver aligns the bits accordingly and raises a flag which is monitored by the word alignment process. If the synchronization pattern is not found, the receiver is forced to re-align. After bit alignment is done, the correct byte alignment is restored by shifting the incoming bytes to match the expected data patterns (Table 7.2).

The 8b/10b decoders within the GXB receivers offer full 8b/10b error detection and provide error flags that are synchronous to the data. Error and uptime counters are instantiated per channel to monitor the quality of data transmission. These values are accessible via JTAG.
At the interface of the transceiver block, the received data is transferred from the transceiver’s 156.25 MHz clock domain to the global 125 MHz clock domain using a FIFO. No data is lost at this transition as the bus width is increased from 4 bytes to 5 bytes. The implementation is illustrated in Figure 7.14. At the faster clock domain, the bus width is increased to 5 bytes using an intermediate buffer to store excess data. Writing to the FIFO is enabled after the unique indicator pattern (Table 7.2) is received. This guarantees that the correct byte order is restored after optical transmission. The data received by the gigabit receivers contains the pixel sensor and PCB slow control signals as well as the synchronous sensor resets, which are directly mapped to the corresponding output pins of the FPGA.

**Resource utilization**

The firmware design has been implemented using the Quartus II software version 17.1.0. It utilizes 54% of the FPGA’s logic, with 69% of the ALMs being partially or completely used. Table 7.3 also lists the number of adaptive look-up tables (ALUTs), dedicated logic registers and total block memory bits being used.

In Table 7.4, the resource utilization in the form of adaptive logic modules and memory bits is broken down for the three largest components: the receiver block, the transceiver block and the embedded system. These three components are essential for the operation of the front-end board and already consume more than half of the FPGA’s logic on the first prototype, which motivates the device migration to the larger Arria V FPGA for the Mu3e experiment. However, optimizations are still possible to reduce the logic usage, for instance in the LVDS receiver block, where about 40% of the logic elements are used for the error and uptime counters.

### 7.3.2 Front-end software on the FPGA

The NIOS processor runs a program to communicate with the clock chips, the optical transceivers and the ADCs, using the implemented SPI and I2C protocols. The software is also capable of reconfiguring the FPGA’s fast transceivers. For monitoring purposes, continuously called alarm functions are implemented. These are used to regularly read the temperature and voltage values of the ADCs and store them in memory where they can be

<table>
<thead>
<tr>
<th>Resource</th>
<th>Used</th>
<th>Percentage Used</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALUTs</td>
<td>23 351</td>
<td>40%</td>
</tr>
<tr>
<td>Dedicated logic registers</td>
<td>26 741</td>
<td>46%</td>
</tr>
<tr>
<td>Logic utilization</td>
<td>31 539</td>
<td>54%</td>
</tr>
<tr>
<td>ALMs partially or completely used</td>
<td>20 113</td>
<td>69%</td>
</tr>
<tr>
<td>Total block memory implementation bits</td>
<td>2 267 136</td>
<td>34%</td>
</tr>
</tbody>
</table>

Table 7.3: Resource utilization of the firmware implementation for the front-end FPGA in the vertical slice setup.

<table>
<thead>
<tr>
<th>Entity</th>
<th>ALMs</th>
<th>block memory bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>LVDS receiver block</td>
<td>8731</td>
<td>5120</td>
</tr>
<tr>
<td>GX transceiver block</td>
<td>7129</td>
<td>14 064</td>
</tr>
<tr>
<td>Embedded system (NIOS)</td>
<td>2695</td>
<td>1 354 752</td>
</tr>
</tbody>
</table>

Table 7.4: Resource utilization for the three largest entities within the front-end firmware.
retrieved via JTAG. Likewise, the transceivers’ configurations are read out, stored and updated as required. Access to the program is realized through JTAG using the nios2-terminal [115].

The program is implemented in C using the NIOS II Software Build Tools for Eclipse. To minimize the size of the program, it is implemented with reduced device drivers and without support of C++. The code and initialized data of the program amount to 20 kB, leaving 138 kB free for stack and heap.

### 7.3.3 Front-end software on the PC

A Tool Command Language (TCL)-based graphical user interface (GUI) is used to control and monitor the front-end board. It is executed via the System Console [116] provided by the Quartus II software. The program accesses the FPGA via the JTAG connection and translates all actions into memory read and write operations.

The GUI consists of five dashboards. The "Utilities" dashboard is used for tests of the QSH-banks and displays the currently running firmware version. The "Temperature" dashboard monitors the ADCs and shows the latest measured temperatures and voltages, see Figure 7.15a. The "LVDS BERT" dashboard monitors uptime and bit errors of the LVDS links. The "GX Control" dashboard monitors the gigabit transceiver links and allows each channel to be reconfigured individually, as is shown in Figure 7.15b for a single link. The "Chipclk Control" dashboard is used to adjust the phases of the clocks between the QSH-banks of the front-end board.

During operation of the front-end board, the measured temperatures, voltages, uptimes and bit errors of all links as well as the configuration of the gigabit transceivers are continuously written to disk for offline evaluation.
7.4 Firmware and Software at the Back-end

A Stratix IV GX development board [91] is used as back-end FPGA board within the host computer. It receives the detector data from the front-end board via optical links and performs the remaining tasks of the Mu3e front-end firmware, meaning that it retrieves the hit information from the raw detector data and creates data packets. These are subsequently sent to the computer via PCIe.

### 7.4.1 Back-end firmware

The top level of the back-end firmware is schematically depicted in Figure 7.16. It consists of four major blocks: the data path, the DDR3-RAM interface, the PCIe interface and the slow control block. The data path processes the detector data and transfers it via the DDR3 to the PCIe interface. If DMA is used to transfer the data to the computer, the DDR3 can be bypassed. The slow control block controls the back-end FPGA as well as the MuPix sensors and PCBs that are attached to the front-end.

**Slow Control**

A set of registers forms the central element of the slow control block. These registers can either be read or written by the computer via the PCIe interface, in order to reset components or set parameters within the running firmware as well as to read off status information. All components of the slow control block are shown in Figure 7.16. Only the most important components are further explained in the following paragraphs.

An on-chip PLL generates a global master clock of 125 MHz, a fast clock of 500 MHz that is used to sample externally applied timing reference signals and a 50 MHz system clock. All clocks are derived from the external 125 MHz clock that is provided by the clock distribution system.

Figure 7.16: Top level of the back-end firmware.
Asynchronous reset signals that are globally and locally used within the firmware design are provided by the reset logic. For the reset of the pixel sensors’ timestamps, specific reset signals are synchronized to the 125 MHz master clock. The reset logic is clocked by an on-board clock source to be independent of PLLs and external clock inputs.

The back-end firmware hosts the slow control for the MuPix8 PCBs. It contains a test pulse entity that controls the charge injection circuitry and two SPI interfaces for the DACs and ADC on the MuPix8 PCB.

In addition, the configuration of the MuPix sensors is implemented on the back-end. The pixel configuration data is stored in an on-chip RAM which can be written via the PCIe interface. A state machine interprets the content of the memory following the protocol that is illustrated in Figure 7.17. Data words of 32 bits are serialized and clocked into the sensor’s configuration shift register. The slow control block is operated synchronously to the sensors at 125 MHz. The actual configuration clock can be divided down to 2 kHz. Within this setup, a maximum frequency of 31.25 MHz is found up to which the MuPix8 sensor can be properly configured, which corresponds to a single bit transmission rate of about 6 Mb/s. The complete sensor can therefore be configured within 100 ms. Figure 7.18 shows a simulated waveform of the MuPix8 sensor slow control process transferring a bit at a clock frequency of 31.25 MHz.

![Figure 7.17: Memory map for the MuPix8 sensor slow control. The control word (CTRL) contains the number of 32-bit words \( N \) that are to be transferred and defines which sensors are to be configured as well as if their load and readback signals are to be raised. The transmission is framed by defined start and end markers.](image1)

![Figure 7.18: Simulated waveform of writing the sensor’s configuration using the MuPix8 sensor slow control block. Zoom into the transmission of a single bit.](image2)
**Data path**

The data path processes the detector data coming from the front-end and creates packets of hit information. A block diagram of the data path is shown in Figure 7.19.

The transceiver block at the back-end uses the same implementation as the front-end. The detector data coming from the front-end arrives at the gigabit receivers of the back-end FPGA where it is deserialized and synchronized to the global 125 MHz clock. The transmitters send the MuPix sensor and PCB slow control data as well as the synchronous sensor resets to the front-end board. Error counters and status signals of the transceivers are accessible via PCIe registers.

The data unpacker checks the raw detector data streams for compliance with the MuPix link protocol and recovers pixel hit and chip counter

Table 7.5: Packet structure for the commissioning readout. Markers can be distinguished from data using the MSB, which is ‘1’ for marker words only. The 32 b hit information is split into two words, with the first containing the timestamps and the second the hit address.

<table>
<thead>
<tr>
<th>word</th>
<th>content</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Marker: begin of packet</td>
</tr>
<tr>
<td>2</td>
<td>Packet counter: equals MuPix readout cycles</td>
</tr>
<tr>
<td>3</td>
<td>FPGA timestamp [62:31]</td>
</tr>
<tr>
<td>4</td>
<td>FPGA timestamp [31:0]</td>
</tr>
<tr>
<td>5</td>
<td>Packet counter after zero suppression</td>
</tr>
<tr>
<td>6</td>
<td>MuPix chip counter</td>
</tr>
<tr>
<td>7</td>
<td>MuPix hit (time)</td>
</tr>
<tr>
<td>8</td>
<td>MuPix hit (address)</td>
</tr>
<tr>
<td>...</td>
<td>MuPix hit states repeat for a total of N hits</td>
</tr>
<tr>
<td>7 + 2N</td>
<td>Marker: end of packet</td>
</tr>
</tbody>
</table>

information as well as the link identifier from the incoming data. As the timestamps of the MuPix hits are Gray-encoded, but binary timestamps are required for the operation of the sorting algorithm [2], conversion of the timestamps is implemented on the FPGA.
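
The Gray-to-binary conversion can be illustrated with the standard prefix-XOR algorithm. The Python sketch below only demonstrates the technique; the function names are hypothetical, and the firmware implementation as well as the timestamp width are not specified here.

```python
def gray_to_binary(gray):
    """Convert a reflected-binary Gray code value to its binary value.

    Each binary bit is the XOR of all Gray bits at the same or a higher
    position, which is accumulated here by repeated shifting.
    """
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


def binary_to_gray(value):
    """Inverse operation, used here only to generate test patterns."""
    return value ^ (value >> 1)
```

In hardware, the same operation is typically realized as a chain of XOR gates, one per timestamp bit.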

Four readout modes are implemented within the back-end firmware that fulfil different purposes. A multiplexer is used to select one of the readout modes whose data is transferred to the DDR3 interface.

**Readout mode: single link** In the single link readout, the data of one of the 32 MuPix links is processed. For every readout cycle of the MuPix’s state machine, a packet is created that contains the complete hit and counter information. The general packet structure is outlined in Table 7.5. This readout mode is used to check the general functionality of the pixel sensors.
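
To illustrate the packet layout of Table 7.5, the following Python sketch extracts the hit words from a packet given as a list of 32 b words. It relies only on the word positions and on the marker MSB stated in the table caption; the function names are hypothetical and the marker values used in the usage note are placeholders.

```python
def is_marker(word):
    """Marker words are identified by a set MSB of the 32-bit word."""
    return ((word >> 31) & 1) == 1


def split_hits(packet):
    """Return the list of (time_word, address_word) pairs of one packet.

    The packet is expected to follow Table 7.5: six header words including
    the begin marker, 2N hit words and the end marker, 7 + 2N words in total.
    """
    if len(packet) < 7 or not (is_marker(packet[0]) and is_marker(packet[-1])):
        raise ValueError("packet is not framed by begin and end markers")
    payload = packet[6:-1]          # words 7 to 6 + 2N in the table's numbering
    if len(payload) % 2 != 0:
        raise ValueError("hit words must come in (time, address) pairs")
    return list(zip(payload[0::2], payload[1::2]))
```

With placeholder markers `0x80000001` and `0x80000002`, a packet with two hits has 11 words, and `split_hits` returns the two (time, address) pairs in readout order.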

**Readout mode: single link with zero suppression** In this readout mode, packets that do not contain any hits are detected and dropped in order to reduce the data rate at the output. Optionally, a reduced rate of empty packets, which can be set through a register, can be forwarded. The number of packets that pass the zero suppression is counted and written into word position 5 within the packet, see Table 7.5. This readout mode is used to check for hits within a single sensor link.
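
A minimal behavioural sketch of the zero suppression is given below. It assumes that the reduced rate of empty packets is realized by forwarding every n-th empty packet; this prescaling scheme, the class name `ZeroSuppressor` and the parameter `empty_prescale` are assumptions made for illustration and are not taken from the firmware.

```python
class ZeroSuppressor:
    """Drop empty packets, optionally forwarding every n-th empty one."""

    def __init__(self, empty_prescale=0):
        self.empty_prescale = empty_prescale  # 0 means: drop all empty packets
        self.empty_seen = 0
        self.passed = 0                       # counter written to word 5

    def accept(self, n_hits):
        """Return True if a packet with n_hits hits is forwarded."""
        if n_hits == 0:
            self.empty_seen += 1
            if self.empty_prescale == 0 or self.empty_seen % self.empty_prescale:
                return False
        self.passed += 1
        return True
```

Comparing the `passed` counter with the packet counter in word 2 of Table 7.5 indicates how many readout cycles were suppressed.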

**Readout mode: multi-link with zero suppression** For all 32 links, this readout mode creates zero suppressed packets that are buffered in FIFOs, see Figure 7.20. The number of packets that are currently buffered is tracked for each channel. Overflow protection is implemented such that complete packets are dropped before a FIFO could be entirely filled. The number of packets that have been dropped can be recovered from the packet counters.

The FIFOs are read in a quick round-robin arbitration scheme. Within one clock cycle, all 32 links are checked in ascending order if their FIFOs contain a packet. The first packet found, at link position $X$, is transferred from the FIFO to the output. Afterwards, the arbiter starts again at link position $X + 1$. The sensor data can be interleaved with timing reference information, which is also stored in a FIFO. If timing data is available, a dedicated packet is built and sent to the output.
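
The arbitration scheme can be sketched as follows. The Python model below assumes that the search wraps around from the last link to link 0; the function names `next_link` and `drain` are hypothetical, and the interleaving with timing reference packets is omitted.

```python
def next_link(has_packet, start):
    """Return the first link at or after start (wrapping) that holds a packet.

    has_packet is a list of booleans, one per link. Returns None if all
    FIFOs are empty. In firmware this search completes within one clock cycle.
    """
    n = len(has_packet)
    for offset in range(n):
        link = (start + offset) % n
        if has_packet[link]:
            return link
    return None


def drain(fifos):
    """Read out all buffered packets in round-robin order.

    fifos is a list of lists of packets, one list per link; the lists are
    emptied in place. Returns the sequence of (link, packet) pairs.
    """
    order = []
    start = 0
    while True:
        link = next_link([bool(f) for f in fifos], start)
        if link is None:
            return order
        order.append((link, fifos[link].pop(0)))
        start = (link + 1) % len(fifos)   # continue at link position X + 1
```

Starting the next search at position X + 1 prevents a single busy link from monopolizing the output, since every other link with a buffered packet is served before link X is visited again.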

This readout mode allows the complete readout structure of all attached links to be recovered. It is therefore useful for the commissioning of the readout system. Its disadvantage is the large overhead, especially for packets containing only single hits, which limits the maximum throughput. Studies of the rate capabilities have been performed during test beam measurements and are described in section 7.6.6.

**Readout mode: Hit sorter**  In the sorted readout mode, incoming hits of 4 links are multiplexed into one data stream and chronologically sorted within a RAM. The sorting algorithm is described in detail in [2]. This readout mode is closest to the final implementation for the Mu3e experiment. However, in its currently available design, it requires too many resources to read out all 32 attached sensor links. Hence, the sorter accepts data from only 8 links within this vertical slice test. The implementation of the sorting algorithm differs from the description in section 5.5 with respect to the packet parameters. The sorter creates packets of 128 timestamps instead of 16. Up to 15 pixel hits can be stored per timestamp instead of 63. Within a packet, each hit is represented using 64 bit instead of 28 bit. An internal timestamp, which can be delayed through a register, defines the boundary between blocks of timestamps that can be transferred towards the PC and blocks to which new hits can be written. The efficiency of the sorter depends on the correct setting of this delay, which requires knowledge of the hits’ readout latency.

**FPGA histograms**  On the FPGA, a set of RAM-based histograms is implemented to monitor the quality of the detector data. These include histograms of the hit addresses and timestamps as well as of the occurrence of data words, control characters complying with the MuPix protocol and erroneous control characters. The latter are injected into the data stream on the front-end FPGA to identify link errors. Also, information about the sensors’ readout structure is stored in histograms. These include the number of hits per readout frame, called hit multiplicity, and the time difference between two consecutively read out hits. Especially when using the sorted readout mode, information about the sensors’ readout structure cannot be recovered from the offline data anymore. The histograms also allow bottlenecks within the readout system to be identified, as the online hit rates of the sensors can be recovered from the histograms and compared to the offline hit rates from the data written to disk.

**DDR3-RAM interface**

The back-end FPGA board features a 512 MB DDR3-RAM that can be used to buffer data temporarily before it is transferred to the PC. The back-end firmware implements an interface to this RAM that allows the rate at which data is read from the buffer to be reduced. This is required if the data transfer over PCIe relies on polling, which limits the data rate to the computer to a few MB/s, while the readout modes described in the previous section can provide up to 500 MB/s. When using direct memory access, this RAM can be bypassed. The general interface can be implemented on other FPGA boards as well, for instance on the receiving boards in the filter farm, where the data is buffered in an on-board memory until event selection takes place.

The DDR3 is operated with a 64 b interface at its maximum frequency of 533 MHz. It features burst mode, which allows up to 64 words to be transferred in consecutive clock cycles to increase data transmission efficiency. In the case that the reading data rate is intentionally reduced, the RAM is protected from overflow. If data that has not been read yet were to be overwritten, further writing to the buffer is prohibited. Instead, the RAM is set to be read only until all data is read. This does not pose an issue within

<table>
<thead>
<tr>
<th>Frequency [MHz]</th>
<th>Combined bandwidth [GB/s]</th>
<th>Bit error rate limit at 95% CL</th>
</tr>
</thead>
<tbody>
<tr>
<td>125</td>
<td>1.0</td>
<td>$8 \cdot 10^{-13}$</td>
</tr>
<tr>
<td>250</td>
<td>2.0</td>
<td>$6 \cdot 10^{-15}$</td>
</tr>
<tr>
<td>375</td>
<td>3.0</td>
<td>$1 \cdot 10^{-13}$</td>
</tr>
<tr>
<td>500</td>
<td>4.0</td>
<td>$2 \cdot 10^{-15}$</td>
</tr>
</tbody>
</table>

Table 7.6: Bit error rate tests of the external DDR3-RAM interface. The combined bandwidth is the sum of data written and read from the RAM. All bit error rates are given as upper limits and differ due to varying run time.
the vertical slice test, as the readout software typically resets the readout system after 500 MB of data have been acquired.

The hardware implementation of the DDR3-interface is tested with a bit error rate test. A 32 b counter pattern is generated as test data at 125 MHz, 250 MHz, 375 MHz and 500 MHz and written to the external RAM. The data is read from the external RAM and compared to the expected values to identify bit errors. Up to 500 MHz, corresponding to a combined read and write bandwidth of 4 GB/s, data transfer is found to work without any errors, see Table 7.6 for the resulting upper limits on the bit error rate.
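
An upper limit on the bit error rate from an error-free test follows from the number of bits that were compared. The Python sketch below uses the standard zero-error relation, limit = -ln(1 - CL) / N, which gives about 3/N at 95% confidence level. Whether the limits quoted in this thesis were computed with exactly this expression is an assumption; the function names are hypothetical.

```python
import math


def ber_upper_limit(n_bits, confidence=0.95):
    """Upper limit on the bit error rate after n_bits without any error.

    Uses P(0 errors) = exp(-ber * n_bits), valid for small bit error rates.
    """
    return -math.log(1.0 - confidence) / n_bits


def bits_needed(ber_limit, confidence=0.95):
    """Number of error-free bits required to reach a given upper limit."""
    return -math.log(1.0 - confidence) / ber_limit
```

Since the limit scales inversely with the number of tested bits, the different limits for different test runs directly reflect their different run times.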

**PCIe interface**

The PCIe interface handles communication and data transfer with the computer. It contains four on-chip memory regions that are either writeable by the PC or by the FPGA. For the control and status registers, two times 256 B of memory are used. For data transfer between the PC and the FPGA, two times 256 kB of memory are provided. In addition, direct memory access is implemented, as described in [63].

**Resource utilization**

The firmware design has been implemented using the Quartus II software version 15.1.0. Two versions of the firmware are used during the test beam studies, as the implementations of the sorted readout mode and the zero suppressed multi-link readout mode do not fit into the FPGA in parallel. The commissioning readout firmware, which contains the multi-link readout mode, utilizes 78 % of the FPGA’s logic, with 87 % of the ALMs

<table>
<thead>
<tr>
<th>Resource</th>
<th>Used</th>
<th>Percentage Used</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALUTs</td>
<td>101 723</td>
<td>56 %</td>
</tr>
<tr>
<td>Dedicated logic registers</td>
<td>106 106</td>
<td>58 %</td>
</tr>
<tr>
<td>Logic utilization</td>
<td>142 141</td>
<td>78 %</td>
</tr>
<tr>
<td>ALMs partially or completely used</td>
<td>79 449</td>
<td>87 %</td>
</tr>
<tr>
<td>Total block memory implementation bits</td>
<td>12 229 632</td>
<td>84 %</td>
</tr>
</tbody>
</table>

Table 7.7: Resources used by the implementation containing the multi-link readout mode.

<table>
<thead>
<tr>
<th>Resource</th>
<th>Used</th>
<th>Percentage Used</th>
</tr>
</thead>
<tbody>
<tr>
<td>ALUTs</td>
<td>106 975</td>
<td>59 %</td>
</tr>
<tr>
<td>Dedicated logic registers</td>
<td>88 916</td>
<td>49 %</td>
</tr>
<tr>
<td>Logic utilization</td>
<td>151 699</td>
<td>83 %</td>
</tr>
<tr>
<td>ALMs partially or completely used</td>
<td>82 448</td>
<td>90 %</td>
</tr>
<tr>
<td>Total block memory implementation bits</td>
<td>11 280 384</td>
<td>77 %</td>
</tr>
</tbody>
</table>

Table 7.8: Resources used by the implementation containing the sorted readout mode.
being partially or completely used. Table 7.7 also lists the number of ALUTs, dedicated logic registers and total block memory bits being used. The firmware containing the hit sorter utilizes 83% of the FPGA’s logic, with 90% of the adaptive logic modules (ALMs) being partially or completely used, see Table 7.8.

In Table 7.9, the resource utilization in the form of adaptive logic modules and memory bits is broken down for the largest components. The PCIe block utilizes most block memory bits. The histograms and the two readout modes constitute the largest parts of the utilized logic modules.

The hit sorted readout mode requires a more resource efficient implementation to be used on the Mu3e front-end board. For the readout of 45 sensor links, the current design is estimated to require more than 130 k ALMs, while the Arria V offers only 92 k. The multi-link readout mode could be implemented for the commissioning of the detector in its current design, as an up-scaling to 45 links yields an estimated resource usage of about 28 k ALMs. A subset of the histograms can potentially be implemented on the front-end board, depending on the remaining number of available logic elements.
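
The quoted estimates are consistent with a linear scaling of the ALM counts in Table 7.9 with the number of links. The short Python calculation below illustrates this; that the estimates were obtained exactly in this way is an assumption, and the function name `scale_alms` is hypothetical.

```python
ARRIA_V_ALMS = 92_000   # available ALMs as quoted in the text


def scale_alms(alms, links_now, links_target):
    """Scale an ALM count linearly with the number of sensor links."""
    return alms * links_target / links_now


# Hit sorter: 23 710 ALMs for the 8 links served in the vertical slice test.
sorter_45 = scale_alms(23_710, 8, 45)
# Multi-link readout: 20 220 ALMs for 32 links.
multi_45 = scale_alms(20_220, 32, 45)

print(f"hit sorter, 45 links: {sorter_45 / 1e3:.0f} k ALMs")
print(f"multi-link, 45 links: {multi_45 / 1e3:.0f} k ALMs")
```

A linear extrapolation is a rough guide only, as shared logic such as multiplexers and control registers does not necessarily grow in proportion to the number of links.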

### 7.4.2 Back-end software on the FPGA

The software running on the back-end’s NIOS processor has reduced functionality compared to the front-end, but uses a similar implementation. It supports transceiver reconfiguration as well as the EyeQ feature. It can be accessed through a JTAG UART and a dual-port RAM that is exposed to the PCIe registers. The program requires 15 kB of memory for code and initialized data.

### 7.4.3 Back-end software on the PC: MuPix Telescope

The back-end FPGA is controlled and read out using the MuPix Telescope data acquisition software, described in section 7.1. The software is adapted for the

<table>
<thead>
<tr>
<th>Entity</th>
<th>ALMs</th>
<th>block memory bits</th>
</tr>
</thead>
<tbody>
<tr>
<td>Histograms</td>
<td>28446</td>
<td>3 213 568</td>
</tr>
<tr>
<td>Hit sorter readout</td>
<td>23710</td>
<td>491 520</td>
</tr>
<tr>
<td>Multi-link readout</td>
<td>20220</td>
<td>950 784</td>
</tr>
<tr>
<td>PCIe block</td>
<td>8005</td>
<td>5 069 472</td>
</tr>
<tr>
<td>Transceiver block</td>
<td>6003</td>
<td>578 864</td>
</tr>
<tr>
<td>DDR3 interface</td>
<td>5749</td>
<td>200 800</td>
</tr>
<tr>
<td>Data unpacker</td>
<td>3200</td>
<td>0</td>
</tr>
<tr>
<td>Timestamp conversion</td>
<td>1600</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 7.9: Resource utilization for the largest entities in the back-end firmware.

A screenshot of the GUI is shown in Figure 7.21.

Data taking periods are divided by the software into runs, whose duration is defined by the maximum size of the accumulated data file of 500 MB. In between runs, the pixel sensors are configured and the readout system is reset. The uptimes and bit error counts of the optical transceivers are constantly monitored. For each run, additional status information of the hit sorter and the FPGA register configuration are logged. In addition, the FPGA histograms are read out and stored at the end of each run.

7.5 Commissioning

In the following, the commissioning of the vertical slice setup is described. Before the setup is used as an 8-layer beam telescope at a test beam facility, it is assembled and tested in the laboratory.

With the master clocks provided by the clock distribution system, communication between all devices can be established following a dedicated reset procedure. At power-on-reset, the firmware images of the FPGAs are loaded from the on-board flash memory devices. After successful booting of the FPGAs, all optical channels transmit the synchronization pattern. Bit alignment is automatically performed at the receivers such that the links are synchronized and ready to be enabled for data transfer. If the links are disconnected during operation, they can be restored by resetting the corresponding transmitters into the synchronization state and forcing the receivers to re-align.

The optical data transmission is tested using bit error rate tests at 6.25 Gb/s as described in section 6.3. First, the FPGA’s fast transceivers on the front-end and on the back-end are tested in serial loopback to verify that the firmware implementations on the FPGAs are correct. All 12 transceiver channels at the front-end as well as all 8 transceiver channels at the back-end are found to work error-free during several hours of operation, which corresponds to an upper limit on the bit error rate of $5 \cdot 10^{-14}$ at a confidence level of 95% per channel. Secondly, the front-end and back-end transceivers are tested in optical loopback. For that purpose, the MiniPod transmitter and receiver at the front-end are connected with a 1 m long 12-fold multi-mode fibre cable. At the back-end, the SFP+ transceivers are connected in a pairwise loop using 1 m long multi-mode fibres. Both optical loopback tests are successful with no error detected during transmission, resulting in upper limits on the bit error rate of $3 \cdot 10^{-13}$. Lastly, the MiniPod transmitters and receivers are connected with the corresponding SFP+ receivers and transmitters to establish the links between front-end and back-end. For proper operation, equalization at the front-end’s receiving channels is adjusted to a default of 9 dB of DC gain and 5 dB to 6 dB of high frequency gain. For two receiving channels, a high frequency gain of 9 dB to 10 dB is applied. The links have been extensively tested over several days of operation, resulting in upper limits on the bit error rate of $6 \cdot 10^{-16}$ per channel.

The clock signals for the pixel sensors are aligned using the reconfigurable delays of the front-end FPGA’s output buffers, as described in section 7.3.1. The clock signals are probed with the Tektronix DPO7254C oscilloscope [94] at the QTH-SCSI-Adapter-PCBs and the buffer delays are adjusted using the front-end board control software. The clocks of the different QSH-banks are phase-aligned to within $13.5 \pm 9.1$ ps. The measured delays between the clocks of the different banks are compiled in Table 7.10.

<table>
<thead>
<tr>
<th>Clock</th>
<th>Delay [ps]</th>
</tr>
</thead>
<tbody>
<tr>
<td>B</td>
<td>8.7 ± 9.2</td>
</tr>
<tr>
<td>C</td>
<td>10.8 ± 11.0</td>
</tr>
<tr>
<td>E</td>
<td>13.5 ± 9.1</td>
</tr>
</tbody>
</table>

Table 7.10: Delay between the rising edges of clocks on QSH-banks B, C and E with respect to QSH-bank A.
to a radioactive source, which is monitored using the MuPix Telescope software running on the back-end.

7.6 Test Beam Studies

The vertical slice setup was used as an 8-layer particle tracking telescope during two beam times at DESY in Hamburg and at MAMI in Mainz. The purpose of these beam tests was to commission the setup under experimental conditions with all sensors being exposed to real particles. At DESY, the setup was taken into operation with moderate particle rates. The performance of the sensors’ LVDS links and the fast optical links is studied. Furthermore, the reliability and capabilities of the readout system are investigated. As high sensor hit rates can only be emulated at DESY by intentionally detuning the sensors to high noise rates, the setup was operated at a second beam time at MAMI in Mainz where particle rates of several MHz can be provided. The serial links are studied for potential impairments caused by the high hit rates. The rate capabilities of the readout system are investigated and bottlenecks are identified.

In addition to the studies of the readout system, these beam times are primarily used to characterize the analogue performance of the MuPix sensors, which is not part of this thesis. Details can be found in [64]. For all sensors, only hit data originating from the pixel matrices A, corresponding to link 0, is further processed in the readout chain, as the current based signal transmission in matrices B and C could not be operated efficiently.

7.6.1 Periods of active data taking for link studies

The vertical slice based particle tracking telescope has been set up in the test beam area 24 of the test beam facility at DESY for two weeks in June and July 2018. A beam of positrons with an energy of 2.5 GeV is used for these studies. A list of the eight MuPix8 sensors in operation is compiled in Table 7.11. All sensors are operated at a supply voltage of around 1.9 V

<table>
<thead>
<tr>
<th>Layer</th>
<th>Sensor</th>
<th>VHigh [V]</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0084-0003-000021</td>
<td>1.92</td>
</tr>
<tr>
<td>1</td>
<td>0084-0003-000018</td>
<td>1.92</td>
</tr>
<tr>
<td>2</td>
<td>0084-0001-000005</td>
<td>1.91</td>
</tr>
<tr>
<td>3</td>
<td>0232-0001-000004</td>
<td>1.92</td>
</tr>
<tr>
<td>4</td>
<td>0084-0002-000006</td>
<td>1.93</td>
</tr>
<tr>
<td>5</td>
<td>0084-0001-000010</td>
<td>1.93</td>
</tr>
<tr>
<td>6</td>
<td>0084-0003-000004</td>
<td>1.91</td>
</tr>
<tr>
<td>7</td>
<td>0084-0003-000030</td>
<td>1.93</td>
</tr>
</tbody>
</table>

Table 7.11: Sensors in operation at the DESY test beam

<table>
<thead>
<tr>
<th>Layer</th>
<th>Sensor</th>
<th>VHigh [V]</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0084-0003-000021</td>
<td>1.92</td>
</tr>
<tr>
<td>1</td>
<td>0084-0003-000018</td>
<td>1.92</td>
</tr>
<tr>
<td>2</td>
<td>0084-0001-000005</td>
<td>1.91</td>
</tr>
<tr>
<td>3</td>
<td>0084-0003-000030</td>
<td>1.93</td>
</tr>
<tr>
<td>4</td>
<td>0084-0002-000006</td>
<td>1.93</td>
</tr>
<tr>
<td>5</td>
<td>0084-0001-000010</td>
<td>1.93</td>
</tr>
<tr>
<td>6</td>
<td>0084-0003-000004</td>
<td>1.91</td>
</tr>
<tr>
<td>7</td>
<td>0232-0001-000004</td>
<td>1.92</td>
</tr>
</tbody>
</table>

Table 7.12: Sensors in operation at the MAMI test beam

Figure 7.22: Active data taking periods during the DESY test beam.

Figure 7.23: Regular sensor operation at the DESY testbeam yields about 30 kHz of total hit rate.

as no efficient settings for the pixel sensors have been available for higher voltages.

Figure 7.22a shows the active data taking periods during the whole test beam period at DESY. Short interruptions of data taking are due to run changes. During a run change, the sensors are reconfigured, the FPGA histograms are read out and the software prepares the start of the next run. Longer interruptions are mainly caused by accesses to the beam area in order to maintain the setups. Figure 7.22b shows a zoom into the morning of July 2nd, when a threshold scan of the two central sensors, layers 3 and 4, is performed. This period is selected for studies of the electrical and optical links. The average hit rate per sensor during this operation is between 3 kHz and 4 kHz, see Figure 7.23.

High rate studies of the vertical slice setup are performed at the X1 beamline of the MAMI accelerator in Mainz in August 2018. A beam of electrons with an energy of 855 MeV is used. The beam rate can be varied over a wide range from a few kHz to several MHz. The same set of sensors is used, with layers 3 and 7 swapped, see Table 7.12. A beam rate scan is chosen to study effects of varying hit rate on the electrical and optical links. During the chosen beam period, see Figure 7.24a, the sensor hit rates are varied between a few kHz and a few MHz, see Figure 7.24b. Due to the higher particle rate, the average run times are much shorter compared to the DESY test beam.

7.6.2 Temperature monitoring

The ADCs on the front-end board measure the temperature of the FPGA during data acquisition. In addition, the temperatures of the ADCs themselves are recorded, see Figure 7.25a for the temperature monitoring during the DESY test beam. The software used to log the temperatures, see section 7.3.3, has not been running constantly; therefore, measurement points are missing. The FPGA’s temperature is about $55^\circ$C, with a time-of-day-dependent modulation of about 2 °C. The different temperatures of the ADCs can be related to their distance to the FPGA, which is the largest power consumer on the front-end board. Figure 7.25b shows that the temperature conditions are stable during the period, shown in Figure 7.22b, that is chosen to study the high-speed optical and electrical links.

Figure 7.25: Temperatures of the FPGA and the ADCs on the front-end board during the DESY test beam.

7.6.3 Optical links

The quality of the optical data transmission is studied using the link uptime
and bit error counts that are constantly monitored for all channels. Three
different channel configurations, as described in section 7.2.5, are operated
and studied: transmitters at the front-end connected to receivers at the
back-end and vice versa, as well as optical loopback between transmitter
and receiver channels at the front-end.

Figure 7.26: Counts and rates of a front-end receiver connected in optical loopback to a front-end transmitter.

Figure 7.26a shows the performance of one of the links operated in optical loopback between a MiniPod transmitter and receiver channel on the front-end during the DESY test beam. A steadily increasing word count is an indicator for stable operation. The error count is stable at 0 with only a few interruptions that can be related to the front-end FPGA being reset or losing its configuration. The latter can be caused, for instance, by other parts of the setup being power cycled. The word and error counts are reset to zero whenever the link between the front- and the back-end has to be re-established, which is mostly due to crashes of the back-end PC caused by the back-end’s data acquisition software.

Data and bit error rates can be computed from these counts, see Figure 7.26b. The mean data rate during stable operation is 6.25 Gb/s. The spread of the data points is due to fluctuations in the time of access of
the JTAG interface, which are not taken into account in the measurement interval.
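As a minimal sketch of how data and bit error rates can be derived from two successive counter readouts: the names `Snapshot` and `link_rates`, the snapshot layout and the default of 40 line bits per word are assumptions made for illustration and are not taken from the monitoring software.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One readout of the link counters (hypothetical layout)."""
    time_s: float     # time of the readout in seconds
    words: int        # received word count
    bit_errors: int   # accumulated bit error count


def link_rates(prev: Snapshot, curr: Snapshot, bits_per_word: int = 40):
    """Return (data rate in bit/s, bit error rate) between two snapshots.

    Returns None if the counters were reset in between, for example
    after the link had to be re-established.
    """
    dt = curr.time_s - prev.time_s
    d_words = curr.words - prev.words
    d_errors = curr.bit_errors - prev.bit_errors
    if dt <= 0 or d_words <= 0 or d_errors < 0:
        return None
    bits = d_words * bits_per_word  # assumed number of line bits per word
    return bits / dt, d_errors / bits
```

With these assumptions, 156 250 000 words received within one second correspond to a data rate of 6.25 Gb/s. Any difference between the recorded and the actual readout time enters through `dt`, which is one way to picture the spread of the data points mentioned above.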

Figure 7.27a shows the data and error rate of a receiver channel on the front-end board that is connected to a back-end transmitter. Errors can be caused by the configuration loss of the front-end FPGA or by a power-cycle of the back-end PC due to a crash of the data acquisition software. Figure 7.27b shows a zoom into the chosen stable data taking period, during which the link runs free of errors.

The data and error rates of a back-end receiver channel are displayed in Figure 7.28a. Errors are caused by the same aforementioned reasons. During the chosen stable period of data acquisition, the link is operated without any bit errors, see Figure 7.28b.

During stable data taking conditions, all 12 receiving channels at the front-end run error-free. The same holds for the 8 receivers at the back-end, see Figures 7.29a and 7.29c. The corresponding upper limits on the bit error rates per run, all at a level of $10^{-11}$ to $10^{-12}$ at 95% CL, are limited by the run duration and shown in Figures 7.29b and 7.29d.

(a) All front-end receiver channels run error-free.

(b) Upper limits on the bit error rate for the front-end receiver channels per run are limited by the run duration.

(c) All back-end receiver channels run error-free.

(d) Upper limits on the bit error rate for the back-end receiver channels per run are limited by the run duration.

Figure 7.29: Summary of the errors and bit error rates of the optical receiver channels at the DESY test beam during a period of stable data acquisition. Front-end: Channels 0 to 3 are the QSFP channels and not connected. Channels 4 to 7 are connected in loopback, channels 8 to 15 are connected to the back-end. Back-end: all 8 channels are connected to the front-end.

(a) All front-end receiver channels run error-free.

(b) Upper limits on the bit error rate for the front-end receiver channels per run are limited by the run duration.

(c) All back-end receiver channels run error-free.

(d) Upper limits on the bit error rate for the back-end receiver channels per run are limited by the run duration.

Figure 7.30: Summary of the errors and bit error rates of the optical receiver channels at the MAMI test beam during a period of stable data acquisition. Front-end: Channels 0 to 3 are the QSFP channels and not connected. Channels 4 to 7 are connected in loopback, channels 8 to 15 are connected to the back-end. Back-end: all 8 channels are connected to the front-end.

The same study is performed during the beam rate scan at MAMI, described in section 7.6.1. The performance of the optical links is found to be independent of the sensors’ hit rates. All links run without any errors, see Figures 7.30a and 7.30c. The upper limits on the bit error rate per run are shown in Figures 7.30b and 7.30d. The limits for the individual runs are higher compared to the DESY data, due to the shorter average runtime of about 15 seconds per run.
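The dependence of the upper limit on the run duration can be illustrated with a short calculation. The formula is not spelled out at this point in the text; the sketch assumes the standard Poisson limit for zero observed errors and counts all transmitted line bits. The function name `ber_upper_limit` is hypothetical.

```python
import math


def ber_upper_limit(line_rate_bps: float, runtime_s: float, cl: float = 0.95) -> float:
    """Upper limit on the bit error rate for a run without any observed error.

    For zero observed errors in N transmitted bits, Poisson statistics give
    an upper limit of -ln(1 - cl) / N, which is about 3 / N at 95% CL.
    """
    n_bits = line_rate_bps * runtime_s
    if n_bits <= 0:
        raise ValueError("number of transmitted bits must be positive")
    return -math.log(1.0 - cl) / n_bits


# Error-free run of 15 s on a 6.25 Gb/s link (values quoted in the text).
limit_15s = ber_upper_limit(6.25e9, 15.0)
```

Under these assumptions, a 15 s run at 6.25 Gb/s yields a limit of roughly $3 \cdot 10^{-11}$, and the limit scales inversely with the runtime.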

7.6.4 Electrical sensor data links

The 32 sensor LVDS links are continuously monitored during test beam operation. The data and error rates of an exemplary link, link 0 of sensor 0, are shown in Figure 7.31 for the operation at the DESY test beam. The average data rate is measured to be 1.25 Gb/s and errors are found to occur mainly in bursts.

On the electrical sensor links, errors occur more frequently compared to the optical links. The reasons for this are manifold. First of all, the sensors are being reconfigured between runs. During that process, the serializer
might enter an undefined state for a short period of time, which results in bit errors on the link. Moreover, the FPGA’s LVDS receivers are less robust than the fast transceivers, as there is no equalization or dedicated clock recovery circuitry. Also, the signal quality of some sensor links is just not sufficient for an error-free data transmission, as the on-chip variations as well as the sensor-to-sensor variations are still rather large for a supply voltage of 1.9 V.

A zoom into a period of stable data acquisition is shown in Figure 7.32a. Over a run time of more than two hours, this specific link is running stable without producing any bit errors at all. The same is observed for link 0 of sensor 5, see Figure 7.32b. For link 0 of sensor 4, see Figure 7.32c, bit errors are found during seven runs. Figure 7.32d shows an example of a link with bad signal quality, whose bit error rate changes over time in discrete steps over several orders of magnitude.

(a) All LVDS channels with at least one bit error during a run are marked red, all error-free channels are marked in green.

(b) Bit error rates for channels with errors and upper limits for error-free channels.

Figure 7.33: Summary of the errors and the bit error rates of the sensors’ LVDS links at the DESY test beam.

The appearance and disappearance of bit errors as well as the discrete bit error rate steps can be attributed to a change in the dynamic phase alignment (DPA) of the LVDS receiver. After configuration of the MuPix sensor, the link can be interrupted briefly. This causes the alignment process to re-align the LVDS receiver. During that process, the dynamic phase alignment circuitry is reset and aligns the data to one of eight clock phases. It can happen that the receiver locks to an unfavourable phase that causes bit errors in the deserialized data stream. Moreover, the receiver, once it is phase-locked, does not indicate whether the phase is kept constant or changed during runtime. Therefore, discrete steps in the bit error rate can occur whenever the phase is changed.

This issue can be addressed in future, improved versions of the firmware. To find a proper phase alignment, feedback from the bit error rate counter to the alignment process is required. A change of the sampling phase during runtime can be prevented by enabling the hold signal of the DPA circuitry after a stable phase has been found.
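The proposed feedback can be sketched as a simple selection rule. This is only an illustration of the idea, not the firmware: the function name `choose_dpa_phase` is hypothetical, the error counts per phase are assumed to have been measured beforehand, and picking the centre of the widest error-free window is an assumption that the text does not prescribe.

```python
def choose_dpa_phase(error_counts):
    """Pick a sampling phase from the bit error counts measured per phase.

    The phases are treated as circular. The centre of the longest run of
    error-free phases is returned; if no phase is error-free, the phase
    with the fewest errors is returned instead.
    """
    n = len(error_counts)
    good = [count == 0 for count in error_counts]
    if not any(good):
        return min(range(n), key=lambda i: error_counts[i])
    if all(good):
        return 0  # every phase works, any choice is fine
    best_start, best_len = 0, 0
    for start in range(n):
        # A run starts where a good phase follows a bad one (circularly).
        if good[start] and not good[start - 1]:
            length = 0
            while good[(start + length) % n]:
                length += 1
            if length > best_len:
                best_start, best_len = start, length
    return (best_start + (best_len - 1) // 2) % n


# Illustrative error counts for the eight clock phases (not measured data).
best_phase = choose_dpa_phase([5, 0, 0, 0, 12, 40, 33, 9])
```

After such a choice, the hold signal of the DPA circuitry would be asserted so that the selected phase is kept during the run.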

The performance of the electrical links is summarized in Figure 7.33a, distinguishing the links that are operated without bit errors from those that produced at least one bit error per run. In Figure 7.33b, the corresponding bit error rates are shown. For the channels free of errors, the upper limit on the bit error rate is given per run, which is only limited by the runtime. Only one link, link 2 of sensor 6, cannot be operated error-free at all and has varying bit error rates as discussed before. The remaining 31 links can potentially be operated error-free. Depending on the runtime, the upper limits on the bit error rate are of the order of $10^{-10}$ to $10^{-12}$ per channel and run. However, in only a few runs all of these links are operating completely stable. The signal quality of some of the links leads to non-zero bit error rates during some runs, which is most likely due to an unfavourable phase in the DPA circuitry. Future alignment procedures should therefore use feedback from the bit error counters directly on the FPGA and force the DPA circuitry to hold the best phase.

![Figure 7.34: Data and error rates of an exemplary LVDS link, sensor 4 link 0, during a beam rate scan at MAMI. The data rate is averaged over intervals of 3 minutes.](image-url)

At the MAMI test beam, it is studied whether the link quality is affected by high particle rates. Figure 7.34 shows the recorded data rate and bit error rate of an exemplary LVDS link that is running stable without producing any bit errors, independent of the particle rate. Figure 7.35a shows that, similar to the operation at DESY, not all links are running error-free. Three channels constantly produce errors, while the majority of the links works flawlessly. Single runs with non-zero error counts can most likely be attributed to sensor misconfiguration, as seems to be the case for run 547, for instance, where all channels of sensor 1 produce bit errors. Consecutive runs with a constant, non-zero error rate, as is the case for sensor 6 link 3 during runs 488 to 496, can be attributed to an unfavourable phase alignment. The measured bit error rates and the upper limits calculated according to the runtimes for channels free of errors are shown in Figure 7.35b. As no dependence of the stability of the data links on the hit rate is observable, compare Figure 7.24b for the hit rate during the data acquisition period, it is concluded that the quality of data transmission of the MuPix8 sensor is independent of the particle rate.

(a) All LVDS channels with at least one bit error during a run are marked red, all error-free channels are marked in green.

(b) Bit error rates for channels with errors and upper limits on error-free channels.

Figure 7.35: Summary of the errors and bit error rates of the sensors’ LVDS links at the MAMI test beam.

7.6.5 MuPix readout: hit rates, load and multiplicity

For Phase I of the Mu3e experiment, the readout system has to withstand hit rates up to several MHz per sensor for the innermost layers. The DESY test beam facility provides particle rates of a few kHz. To exceed these rates, the sensors can be intentionally detuned to produce high noise rates. Figure 7.36 shows the hit rate per sensor and the total system hit rate during the test beam at DESY. The hit rates are true online hit rates, measured on the FPGA in order to be independent of any bottlenecks in the readout system that are investigated in the subsequent sections. Total rates up to 30 MHz are achieved during noise tests, with a single sensor delivering about 10 MHz of noise hits. Under normal operation of the sensors at DESY, the total hit rate is about 30 kHz, with each sensor contributing between 3 and 4 kHz, see Figure 7.23. As the readout related time structure of noise hits is different from hits induced by actual particles, dedicated tests with high particle rates are performed at MAMI. Figure 7.37 shows a beam rate scan performed at MAMI. The total system hit rate is varied from a few tens of kHz up to a maximum hit rate of about 80 MHz, with single sensors delivering up to 10 MHz of hits.

All sensors are operated with a timerend value of 1, reducing the clock of the readout state machine to 31.25 MHz, as the MuPix8 prototype cannot be properly operated at full speed. This limits the maximum theoretical hit rate per link to 14.15 MHz. As for each sensor only the data originating from one link is further processed, this limit equals the maximum hit rate per sensor within these studies.

Figure 7.36: Hit rates during the complete beam time at DESY. High rates are achieved by detuning the sensors to high noise rates.

For every sensor link, the number of hits per readout cycle is stored in a histogram on the FPGA. This gives access to the readout structure for different configurations and hit rates, independent of the readout mode chosen downstream within the back-end firmware. Figure 7.38 shows an exemplary multiplicity distribution for a hit rate of about 4 kHz at DESY. The maximum number of hits that can be read out per link in one readout cycle is 48, which corresponds to one hit per column.

The majority of the readout frames contain zero hits and are thus empty. Towards hit multiplicities of 10, the distribution drops sharply and only a small number of frames contains more hits than that. These high multiplicities can be induced by secondary particles like delta electrons that create hits within several columns. Also during sensor configuration, high
hit multiplicities can occasionally occur which may stem from cross-talk of the toggling digital configuration signals into the chip periphery.

The number of hits per readout frame is broken down into two variables, which are referred to as readout load and multiplicity. Load is the ratio of readout frames that contain at least one hit versus the total number of frames, so

$$\text{load} = \frac{\sum_{i=1}^{48} N_i}{\sum_{k=0}^{48} N_k}, \quad (7.1)$$

with $N_i$ being the number of frames containing $i$ hits. Multiplicity is defined as the average number of hits per non-empty frame, so

$$\text{multiplicity} = \frac{\sum_{i=1}^{48} i \cdot N_i}{\sum_{k=1}^{48} N_k}. \quad (7.2)$$
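Equations (7.1) and (7.2) translate directly into a few lines of code. The following sketch assumes that the FPGA histogram is available as a list whose entry $i$ holds $N_i$; the function name `load_and_multiplicity` and the example numbers are illustrative and not measured data.

```python
def load_and_multiplicity(frames_with_n_hits):
    """Compute readout load (7.1) and multiplicity (7.2).

    frames_with_n_hits[i] is N_i, the number of readout frames that
    contain exactly i hits, for i = 0 ... 48.
    """
    total_frames = sum(frames_with_n_hits)
    non_empty = sum(frames_with_n_hits[1:])
    hits = sum(i * n for i, n in enumerate(frames_with_n_hits))
    load = non_empty / total_frames if total_frames else 0.0
    multiplicity = hits / non_empty if non_empty else 0.0
    return load, multiplicity


# Illustrative histogram: 9990 empty frames, 8 frames with one hit,
# 2 frames with two hits.
histogram = [0] * 49
histogram[0], histogram[1], histogram[2] = 9990, 8, 2
load, multiplicity = load_and_multiplicity(histogram)
```

For this example, 10 of 10 000 frames are non-empty and contain 12 hits in total, giving a load of $10^{-3}$ and a multiplicity of 1.2.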

Figure 7.39: Readout load and multiplicities during the DESY test beam. For high rates, the sensors have been intentionally detuned to add noise hits.

Figure 7.40: Readout load and multiplicities during the MAMI test beam.

Figure 7.39a shows the average load versus hit rate for all sensors operated at the DESY test beam. Load values are averaged over several runs with similar hit rates. For rates exceeding 1 MHz, the readout load value approaches 1, as the MuPix readout starts to saturate. For regular operation at the DESY test beam facility, with rates of a few kHz, the average readout load is of the order of $10^{-3}$. The average hit multiplicity is close to 1 for hit rates up to 1 MHz, and rises sharply as the readout reaches saturation, see Figure 7.39b. Close to full readout saturation is achieved at a sensor hit rate of about 11.6 MHz, which corresponds to 82% of the theoretical maximum hit rate.

Comparing the DESY noise results with actual high rate beam data taken at MAMI, see Figure 7.40a for readout load and Figure 7.40b for readout multiplicity, only minor differences in the readout structure are observed between high noise and high particle rates. At MAMI, the average readout multiplicity rises already at sensor hit rates below 1 MHz, which is not the case for the noise data from DESY. This is due to the uncorrelated nature of the noise hits. Particles, however, can create clusters with more than one pixel being hit due to charge sharing. Hence, the average multiplicity is expected to be higher. As charge sharing is small in the MuPix sensors, the effect is not very pronounced. System load studies can therefore be performed at both facilities. At MAMI, the sensor’s readout does not fully saturate multiplicity-wise. The hit multiplicity is limited to 20 because the particle beam is not uniformly distributed over the entire pixel matrix.

7.6.6 Performance of the commissioning readout mode

The back-end is operated using the zero suppressed multi-link and the hit sorting readout modes, which are described in section 7.4.1, in order to compare their performance and check whether they meet the requirements of the first phase of the Mu3e experiment. The requirements are derived from the expected average hit rates, which originate from the innermost layers of the pixel detector: per front-end FPGA, an average system hit rate of 48.6 MHz is expected, with an average hit rate of 1.2 MHz per link, following from section 5.3.1.

During the test beam studies, the vertical slice setup processes the data from eight MuPix8 sensors with only one link each. The following two requirements are therefore placed upon the system. The system should be able to process the average hit rate per link, resulting in a total system hit rate of $8 \times 1.2\,\text{MHz} = 9.6\,\text{MHz}$ within this study (requirement I). Moreover, the readout system should be capable of processing the total system hit rate of 48.6 MHz (requirement II). The latter can be tested using average hit rates of more than 6 MHz per link. At these rates, however, readout load and multiplicity start to saturate, which can affect the general system behaviour. Hence, system and sensor performance have to be disentangled.

The multi-link readout mode is used as a commissioning readout, maintaining the full MuPix readout block structure for offline analysis. However, at high rates, the round-robin arbitration scheme poses a bottleneck and the link packet buffers fill up. This leads to full packets being dropped and their complete hit information being lost. The number of packets that have been dropped, however, can be recovered from the header of the packets.

The performance of the implemented readout architecture is studied by comparing the online hit rate deduced from the FPGA histograms with the offline hit rate retrieved from the data that has been written to disk. As can be seen in Figures 7.41a and 7.42a, offline and online hit rates are comparable up to total system hit rates of about 13 MHz, hence fulfilling requirement I. A straight line fit with offset fixed at 0 yields a proportionality factor of $1.02 \pm 0.05$ between offline and online hit rate for the DESY data, which is in good agreement with a factor of 1. For the MAMI data, the proportionality factor is slightly larger than 1, $1.06 \pm 0.01$, which could be due to a systematic underestimation of the runtime in the offline data.

(a) Total offline versus online hit rate. Linear fit is performed in the range from 10 kHz to 5000 kHz.

(b) Ratio of packets dropped by the multi-link readout versus online hit rate.

Figure 7.41: Performance of the zero suppressed multi-link readout at DESY. Hit rates are total system hit rates, summed over all 8 sensors/links. The green line at 9.6 MHz represents the maximum average rate for 8 links for Mu3e phase I. The purple line at 48.6 MHz represents the maximum average rate per front-end FPGA for Mu3e phase I.

(a) Total offline versus online hit rate. Linear fit is performed in the range from 10 kHz to 5000 kHz.

(b) Ratio of packets dropped by the multi-link readout versus online hit rate.

Figure 7.42: Performance of the zero suppressed multi-link readout at MAMI. Hit rates are total system hit rates, summed over all 8 sensors/links. The green line at 9.6 MHz represents the maximum average rate for 8 links for Mu3e phase I. The purple line at 48.6 MHz represents the maximum average rate per front-end FPGA for Mu3e phase I.
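A straight line fit with the offset fixed at 0, as used for the comparison of offline and online hit rates, has a closed-form solution. The sketch below assumes an unweighted least-squares fit; whether the fit in this study uses weights is not stated, and the function name `fit_through_origin` and the example values are illustrative only.

```python
import math


def fit_through_origin(x, y):
    """Least-squares fit of y = a * x with the offset fixed at zero.

    Returns the slope a and its standard error, estimated from the
    scatter of the residuals (unweighted fit, n - 1 degrees of freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all x values are zero")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    rss = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    slope_err = math.sqrt(rss / ((len(x) - 1) * sxx))
    return slope, slope_err


# Synthetic online/offline rates in kHz within the fit range (not measured data).
online = [10.0, 100.0, 1000.0, 5000.0]
offline = [1.02 * r for r in online]
slope, slope_err = fit_through_origin(online, offline)
```

For the synthetic data, the slope reproduces the factor of 1.02 that was used to generate it, with a vanishing uncertainty because no scatter was added.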

Independent of the high rates being induced by noise, as for DESY, or by particles, as for MAMI, offline hit rates deviate significantly from online hit rates above 13 MHz. This is due to the bottleneck posed by the round-robin arbitration. Above 10 MHz, the ratio of packets being dropped rises significantly, see Figures 7.41b and 7.42b. Up to a hit rate of 10 MHz, the average fraction of packets being dropped is below $2 \cdot 10^{-4}$. More than 20% of the packets are lost at system hit rates above 20 MHz. Close to the expected total system hit rate of 48.6 MHz (requirement II), losses are at a level of $45 \pm 2$%.

For system hit rates above 30 MHz, offline hit rates are approaching the online hit rates again and the ratio of dropped packets flattens, because the average readout multiplicity per frame increases as discussed in section 7.6.5. This enhances the packet’s efficiency as the ratio of overhead being sent compared to hit data is reduced. The overall rate capabilities of this readout architecture could be enhanced by operating the round-robin arbiter at a higher clock frequency or reducing the packet overhead as well as the width of the data word used per hit.

7.6.7 Readout latency

For the sorting readout mode, described in section 7.4.1 and in detail in [2], to work properly, a delay parameter has to be set that accounts for the latency of the hits reaching the sorter [2]. To that end, the latency of the sensor hits within the vertical slice setup is investigated.

![Figure 7.43: Latency between hit timestamp and chip counter and between hit timestamp and FPGA counter at the back-end.](attachment:image.png)

Figure 7.43 shows the latency distribution of hits of one MuPix8 sensor under regular conditions at DESY at a hit rate of about 4 kHz, resulting in a readout load of $10^{-3}$ with an average multiplicity close to 1. Latency to the chip counter is induced by the readout state machine, which is described in section 5.4. With the readout parameter `timerend` of 1, the minimum latency between the hit timestamp and the counter sampled at state LdPix2 is 44 time stamp cycles, or 352 ns. The latency distribution flattens over a plateau of 40 time stamp cycles, or 320 ns, which corresponds to another almost complete readout cycle with a hit being registered while the state machine enters the LdPix2 state. These latencies are the same for all sensors attached.

Figure 7.43 also shows the latency between the sensor timestamps and a timestamp that is generated and sampled on the back-end FPGA. The latency distribution is sampled using the multi-link readout which preserves

<table>
<thead>
<tr>
<th>Component</th>
<th>Latency in clock cycles (8 ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Minimum MuPix latency</strong></td>
<td>60</td>
</tr>
<tr>
<td>- Minimum hit delay to chip counter</td>
<td>44</td>
</tr>
<tr>
<td>(LdPix1 → LdPix2 + 1 cycle)</td>
<td></td>
</tr>
<tr>
<td>- Minimum hit delay state machine to serializer</td>
<td>8</td>
</tr>
<tr>
<td>(LdPix2 → RdCol2)</td>
<td></td>
</tr>
<tr>
<td>- Data serialization</td>
<td>est. 8</td>
</tr>
<tr>
<td><strong>Firmware: hit data to FPGA counter</strong></td>
<td>45</td>
</tr>
<tr>
<td>- LVDS receiver block (front-end)</td>
<td>15</td>
</tr>
<tr>
<td>- GXB FIFOs 40b/32b ↔ 32b/40b (front- + back-end)</td>
<td>25</td>
</tr>
<tr>
<td>- GXB transceiver (front- + back-end)</td>
<td>est. 15</td>
</tr>
<tr>
<td>- Data path (back-end)</td>
<td>6</td>
</tr>
<tr>
<td>- FPGA counter sampling at link ID (back-end)</td>
<td>-16</td>
</tr>
<tr>
<td><strong>Firmware: chip counter reset to FPGA counter reset</strong></td>
<td>44</td>
</tr>
<tr>
<td>- Output clocking (front- + back-end)</td>
<td>3</td>
</tr>
<tr>
<td>- GXB FIFOs 40b/32b ↔ 32b/40b (front- + back-end)</td>
<td>25</td>
</tr>
<tr>
<td>- GXB transceiver (front- + back-end)</td>
<td>est. 15</td>
</tr>
<tr>
<td>- MuPix synchronous reset input clocking</td>
<td>1</td>
</tr>
<tr>
<td><strong>Transmission: hit data to FPGA, reset to MuPix</strong></td>
<td>est. 15</td>
</tr>
<tr>
<td>Optical transmission (2 × 5 m fibres)</td>
<td>est. 8</td>
</tr>
<tr>
<td>Electrical transmission (2 × 3 m SCSI + PCBs)</td>
<td>est. 7</td>
</tr>
<tr>
<td><strong>Sum: minimum hit delay to back-end FPGA counter</strong></td>
<td>164</td>
</tr>
</tbody>
</table>

Table 7.13: Contributions of the MuPix readout and serialization, of the firmware components at the front-end and back-end, as well as the contributions from the optical and electrical transmissions to the measured latency between hit timestamp and FPGA counter at the back-end. Estimated numbers are marked with "est."
the time structure of the sensor readout blocks. The minimum latency between hit and FPGA counter is between 164 and 169 clock cycles, or 1312 ns and 1352 ns, for the different sensors connected. This latency has two origins: Firstly, the global timestamp reset is created at the back-end FPGA. It resets both the FPGA’s and also the sensors’ timestamps through the synchronous reset signal. The latter, however, has to be transferred through the full readout system before reaching the sensor. Upon release of the reset, the sensors’ timestamps are delayed with respect to the FPGA. Secondly, the hits have to be transferred through the front-end to the back-end, which causes additional delay. Contributions of the different firmware components, of the data transmission as well as of the MuPix readout to the total delay are broken down in Table 7.13. From the hit data entering the front-end FPGA serially through the LVDS receivers, it takes 61 cycles, or 488 ns, through the firmware of both the front-end and back-end FPGA until the hits are packed into packets. The largest contribution of 25 cycles, or 200 ns, originates from the FIFOs within the transceiver blocks.
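The latency budget of Table 7.13 can be cross-checked with a short calculation. The numbers below are copied from the table; the dictionary layout and the variable names are chosen for illustration only.

```python
CLOCK_PERIOD_NS = 8  # one clock cycle, as stated in Table 7.13

# Contributions in clock cycles, copied from Table 7.13.
latency_budget = {
    "Minimum MuPix latency": {
        "hit delay to chip counter": 44,
        "state machine to serializer": 8,
        "data serialization (est.)": 8,
    },
    "Firmware: hit data to FPGA counter": {
        "LVDS receiver block": 15,
        "GXB FIFOs": 25,
        "GXB transceiver (est.)": 15,
        "data path": 6,
        "FPGA counter sampling at link ID": -16,
    },
    "Firmware: chip counter reset to FPGA counter reset": {
        "output clocking": 3,
        "GXB FIFOs": 25,
        "GXB transceiver (est.)": 15,
        "MuPix synchronous reset input clocking": 1,
    },
    "Transmission (est.)": {
        "optical transmission": 8,
        "electrical transmission": 7,
    },
}

group_sums = {name: sum(parts.values()) for name, parts in latency_budget.items()}
total_cycles = sum(group_sums.values())
total_ns = total_cycles * CLOCK_PERIOD_NS

# Firmware path of the hit data without the negative sampling offset.
firmware_path = sum(
    v for v in latency_budget["Firmware: hit data to FPGA counter"].values() if v > 0
)
```

The group sums evaluate to 60, 45, 44 and 15 cycles, which add up to the 164 cycles, or 1312 ns, quoted as the minimum latency. The positive firmware contributions of the hit data path add up to the 61 cycles mentioned in the text.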

Three distinct readout latency plateaus are visible in Figure 7.43. These originate from correlated hits, due to clusters or crosstalk within a pixel column, that can only be read out in consecutive readout cycles and, thus, with higher latencies. In Figure 7.44, the readout latency of the hits is broken down into the number of consecutive, non-empty readout frames that are preceding each hit. To increase the statistics, the histograms of all eight sensors are summed. Hits within readout frames that are preceded by an empty frame have the smallest latencies. Correlated hits, which can only be read out after a hit in the same column has been read out before, generate the plateaus shifted to higher latencies. Uncorrelated hits that are read out with preceding non-empty frames can have lower latencies. However, the minimum latency of hits in frames preceded by a non-empty frame is 8 clock cycles larger, which is the amount of time spent to read out a hit in the preceding frame.

![Figure 7.44: Stacked histogram of the latency between hit and FPGA timestamp, broken down into the number of non-empty frames preceding the hit. The gray coloured area defines the time window of the sorter, where hits are considered to arrive out-of-time. This will be further discussed in section 7.6.8. Data from DESY testbeam, with an average hit rate of 3.75 kHz per sensor.](image)

Operating the sensors at high noise rates, as done at DESY, changes the readout timing structure significantly: as shown in Figure 7.45a, the number of consecutive non-empty frames and the number of uncorrelated hits increase. Increasing the beam rate, however, alters the readout timing structure differently, see Figures 7.45b, 7.45c and 7.45d for average sensor hit rates of 130 kHz, 750 kHz and 6000 kHz. With increasing hit rate, the number of consecutive non-empty frames increases as well, but the distribution is broader due to the larger fraction of correlated hits. At sensor hit rates of 6 MHz, the readout is close to saturation with a readout load close to 1, which leads to a drastic increase of the average readout latency.

Figure 7.45: Comparison of the readout timing structure at high rates due to noise or particle rates.

7.6.8 Performance of the hit sorter readout mode

The hit sorter algorithm writes incoming hits into a RAM at an address according to the hits’ timestamps [2]. It uses a block structure of 128 timestamps, with four blocks always being available for hits to be written to. The two timestamp boundaries between a hit being allowed to be written or being dropped are defined through a timestamp on the FPGA that can be delayed. In the following, hits outside of the allowed timestamp boundaries are referred to as being out-of-time.

The boundaries shift between blocks when the timestamp has increased by the size of a block. For timestamps within a block, this discrete boundary shift causes different maximum latencies for the hits to be allowed. If the boundary has just shifted, a maximum hit latency of 4 full blocks (512 timestamps) is allowed. If the boundary is just about to shift, the maximum hit latency is reduced to only 3 blocks (384 timestamps). Averaged over all timestamps within a block, an average time window of 3.5 blocks (448 timestamps) is available for hits to be written to the RAM. This allowed time window is illustrated as the unshaded area in Figures 7.44 and 7.45.
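The window arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch, not the FPGA implementation: the function name `max_allowed_latency` and the linear model, in which the window shrinks by one timestamp for every timestamp elapsed since the last boundary shift, are assumptions made for this example.

```python
BLOCK_SIZE = 128   # timestamps per block (from the text)
OPEN_BLOCKS = 4    # blocks available for writing at any time (from the text)


def max_allowed_latency(offset_in_block: int) -> int:
    """Largest accepted hit latency, in timestamps.

    offset_in_block: timestamps elapsed since the last boundary shift.
    Assumption: the window shrinks linearly until the next shift.
    """
    if not 0 <= offset_in_block <= BLOCK_SIZE:
        raise ValueError("offset must lie within one block")
    return OPEN_BLOCKS * BLOCK_SIZE - offset_in_block


just_shifted = max_allowed_latency(0)             # boundary has just shifted
about_to_shift = max_allowed_latency(BLOCK_SIZE)  # limit just before the next shift
average_window = (just_shifted + about_to_shift) / 2

print(just_shifted, about_to_shift, average_window)
```

Under this linear model, the two extremes are 4 and 3 blocks, and their mean gives the average window of 3.5 blocks quoted in the text.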

The delay parameter is scanned at DESY operating the sensors under regular conditions to find the optimum delay value for this setup. The data can be compared to offline data acquired under similar conditions using the multi-link readout. The latency distribution of the hits is shown

![Figure 7.46: Ratio of hits that are considered to be out-of-time by the sorting algorithm versus the timestamp delay setting of the sorter. The measured online ratio (Sorted data) is compared to an offline data set that is recorded with the multi-link readout (Unsorted data). For the unsorted data, a timestamp window of 3.5 blocks (block size = 128 timestamps) is shifted across the readout latency data (Figure 7.44) which corresponds to different delay values. Hits within this window are considered to be in-time, the remaining hits are out-of-time.](image-url)
in Figure 7.44. Changing the delay parameter is equivalent to shifting the time window along the latency distribution. Figure 7.46 compares the fraction of hits considered out-of-time in the offline data with the fraction considered out-of-time by the FPGA’s hit sorting algorithm. The data sets are in good agreement. The optimum FPGA timestamp delay value for the offline analysed data is around 816, which can be interpreted as a negative delay of 208 clock cycles. For further operations, the sorter timestamp delay is fixed to a value of 880. This adds some safety margin, as the number of out-of-time hits rises sharply between 800 and 816 and is still below $10^{-4}$ for delay values up to 928.
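The interpretation of a delay setting as a negative delay can be illustrated as follows. The sketch assumes that the delay wraps around at 1024, i.e. a 10-bit timestamp counter. This modulus is inferred here from the quoted pair of values (816 and 208 add up to 1024) and is not stated explicitly in the text. The helper name `signed_delay` is hypothetical.

```python
TIMESTAMP_MODULUS = 1024  # assumption: 10-bit counter, inferred from 816 <-> -208


def signed_delay(setting: int, modulus: int = TIMESTAMP_MODULUS) -> int:
    """Map an unsigned delay setting onto the signed range [-modulus/2, modulus/2)."""
    setting %= modulus
    return setting - modulus if setting >= modulus // 2 else setting


for value in (816, 880, 928):
    print(value, signed_delay(value))
```

With this convention, the chosen operating point of 880 corresponds to a negative delay of 144 clock cycles.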

![Graph showing online vs. offline hit rate](image)

(a) Total offline versus online hit rate. Linear fit is performed in the range from 0.1 MHz to 50 MHz.

![Graph showing ratio of hit drops](image)

(b) Ratio of hits being lost due to different conditions versus the online hit rate.

Figure 7.47: Rate performance of the hit sorter readout mode on the back-end. Hit rates are total system hit rates, summed over all 8 sensors/links. The green line at 9.6 MHz represents the maximum average rate for 8 links for Mu3e phase I. The purple line at 48.6 MHz represents the maximum average rate per front-end FPGA for Mu3e phase I.
With this parameter fixed, the rate capabilities of the sorter are studied at the MAMI testbeam. As for the commissioning readout, the hit rate computed from the offline data is compared to the online hit rate acquired on the FPGA, see Figure 7.47a. For total system hit rates up to 50 MHz, there are no significant deviations visible between the offline and online hit rates. A linear fit yields a proportionality of offline to online hit rate of $1.05 \pm 0.01$. As for the commissioning readout data at MAMI, the runtime of the offline data might be systematically underestimated.

At the highest system hit rate of almost 80 MHz, the offline rate no longer rises with the online rate. This has two reasons: the sensor readout reaches saturation and the average hit latency rises; thus, the number of out-of-time hits increases. At 80 MHz, about 20% of the hits are considered to be out-of-time, see Figure 7.47b. This could be mitigated by operating the readout of the MuPix sensor at full speed with a timerend value of 0. In addition, at rates of 50 MHz and above, the reading state machine of the sorter cannot cope with the high data rates anymore. Therefore, packets are not read out completely, which is referred to as packet boundaries broken in Figure 7.47b. This can be mitigated by operating the reading state machine at a higher clock frequency and by decreasing the width of the data word per pixel hit from 64 bit to 28 bit. The number of hits being lost due to an overflow within a timestamp bin is below $10^{-7}$ for all rates and therefore negligible.

For rates below 10 MHz, the fraction of hits that are dropped seems to increase when the hit rate is lowered. This can be explained by the size of the data sets. A fixed file size is used for all runs. At lower hit rates, the packets contain less hit data and more overhead, hence the data files also contain fewer pixel hits. The total number of dropped hits also includes hits that have been written to the memory but have not yet been read out. As there are fewer hits in the data set at lower hit rates, the ratio of hits that are still within the sorter’s RAM increases.

The sorter readout mode fulfils requirement I posed by the average hit rate per link. At a system hit rate of 9.6 MHz, the total losses are about $0.003 \pm 0.001\%$ and fully dominated by late hits. At the required total system hit rate of 48.6 MHz (requirement II) losses are interpolated to be about $3 \pm 1\%$, with a large contribution from out-of-time hits due to the saturation of the sensor readout. As the onset of losses due to incomplete packets being read by the hit sorter is close to this data rate, the implementation of the sorter should be revised.
Part III

WIRELESS DATA ACQUISITION FOR FUTURE HEP EXPERIMENTS

Track information from silicon detectors can significantly enhance first level trigger decisions at future hadron colliders. With the detectors being exposed to ever-increasing particle fluxes, their readout systems have to handle unprecedented data volumes. This requires the massive use of high-speed data links. Silicon tracking detectors place high demands on their associated readout systems. Inside the detectors, readout electronics are exposed to extreme radiation levels. Moreover, the power and material budgets of high-speed data links are severely constrained. High-speed wireless data transmission in the 60 GHz band and above has the potential to facilitate fast readout of the detectors, to enable new readout topologies and to enhance trigger capabilities.
Future high energy physics experiments push forward into experimental conditions of unprecedented particle energies and interaction rates. The planned high luminosity upgrade of the Large Hadron Collider, called HL-LHC, provides more events to perform precision measurements in the Higgs sector and to extend searches for signatures beyond the Standard Model [117], by increasing the luminosity by a factor of 10 with respect to the LHC’s design value [118]. Future accelerators, like the envisioned Future Circular Collider (FCC), could push the centre of mass energies even further to 100 TeV with luminosities potentially exceeding HL-LHC [119]. Experiments conducted at these accelerators have to face many challenges regarding sensor and readout developments.

8.1 Readout challenges

Silicon tracking detectors will continue to play a key role in collider experiments. At HL-LHC they serve as a crucial tool to disentangle events within a pile-up of 140 (200) per bunch crossing for the nominal (ultimate) luminosity of \( \mathcal{L} = 5 \cdot 10^{34} \text{cm}^{-2}\text{s}^{-1} \) (\( \mathcal{L} = 7 \cdot 10^{34} \text{cm}^{-2}\text{s}^{-1} \)) [120]. Being the most granular detectors, the trackers can significantly enhance the level-1 trigger capabilities of the upgraded ATLAS and CMS experiments, whose muon and calorimeter systems alone are not sufficient to maintain the current physics sensitivity [121, 122]. Realizations of track trigger architectures, however, are limited by the capabilities of the readout system.

Up to now, a triggerless readout of complete silicon tracking detectors does not seem feasible. With pixel detectors consisting of several hundred million channels and particle fluxes up to 2 GHz/cm\(^2\), a triggerless system would need to handle data rates of 22 Gb/s/cm\(^2\), resulting in 1800 Tb/s for an 8 m\(^2\) large detector [122]. The big challenge is to transfer these data volumes out of the hostile radiation environment with as little material and as little power consumption as possible. To circumvent this problem, the ATLAS and CMS experiments only use the tracking information of the outer detector layers for the trigger decision.
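The quoted total data rate follows directly from the rate per unit area and the detector area. The short calculation below reproduces it. The implied number of bits per particle is a derived quantity that does not appear in the text and is shown only as a plausibility check.

```python
RATE_PER_CM2_GBPS = 22.0  # data rate per cm^2 in Gb/s (from the text)
FLUX_PER_CM2_GHZ = 2.0    # maximum particle flux per cm^2 in GHz (from the text)
AREA_M2 = 8.0             # detector area in m^2 (from the text)
CM2_PER_M2 = 10_000

# Total rate over the full detector area, converted from Gb/s to Tb/s.
total_tbps = RATE_PER_CM2_GBPS * AREA_M2 * CM2_PER_M2 / 1000.0

# Derived quantity: bits transferred per particle at the maximum flux.
bits_per_particle = RATE_PER_CM2_GBPS / FLUX_PER_CM2_GHZ

print(f"total rate: {total_tbps:.0f} Tb/s, {bits_per_particle:.0f} bit per particle")
```

The result of 1760 Tb/s is consistent with the rounded value of 1800 Tb/s given above.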

For the implementation of the track triggers, the experiments follow different approaches. ATLAS utilizes track information that is read out in regions of interest (RoI) defined by a level-0 trigger based on the muon and calorimeter systems. In CMS, data from closely-spaced sensors is correlated to discriminate tracks with high transverse momentum \( p_T \) from tracks with low \( p_T \) without requiring a level-0 trigger [123]. Both experiments only use a subset of their tracking information for the track
trigger algorithms. The remaining data is read out when the level-1 trigger’s decision is positive.

Being closest to the interaction points, the tracking detectors have to withstand highest radiation levels within the experiments. At HL-LHC, the ATLAS pixel detector will have to cope with radiation levels exceeding 17 MGy of total ionizing dose and $2 \cdot 10^{16} \text{n}_{\text{eq}} \text{cm}^{-2}$ [120]. Simulations for a potential detector at FCC yield radiation levels up to 600 MGy and $8 \cdot 10^{17} \text{n}_{\text{eq}} \text{cm}^{-2}$ for the innermost pixel layer in a central barrel tracking detector [124]. This does not only pose a challenge to the sensors, but also to the readout system.

Readout of silicon tracking detectors under these conditions is extremely challenging. To deal with the high rates, optical links would typically be employed. However, radiation hardness of lasers, photo-diodes and optical fibres is an issue [125–127]. For the phase-I upgrade of the LHC, radiation hardness of the active optical components is addressed in the Versatile Link (VL) project [128], implementing a bi-directional 4.8 Gb/s optical link in combination with the radiation-hard electrical serializer and deserializer GigaBit Transceiver (GBT) chipset [129]. In the innermost pixel detector, the doses are still too high for the optical components, so the data has to be transferred electrically to regions further outside. Follow-up efforts in the context of HL-LHC are concentrated in the Versatile Link PLUS (VL+) project targeting 1 MGy and $2 \cdot 10^{15} \text{n}_{\text{eq}} \text{cm}^{-2}$ [130]. First tests of a radiation-hard laser driver operated at 10 Gb/s, which was irradiated up to 5 MGy, are promising [131]. The versatile links will be operated with asymmetric up-links (2.5 Gb/s) and downlinks (5 Gb/s to 10 Gb/s) provided by the successor of the GBT, the low power GigaBit Transceiver (LpGBT) [132].

For HL-LHC, a common radiation-hard pixel readout ASIC is developed within the RD53 project [133]. It features four serial links at a nominal data rate of 1.28 Gb/s [134]. These links have to be driven electrically to the LpGBT and the optical link over distances up to seven metres in the case of the ATLAS detector [120]. In order to not add too much material, low mass cabling is required, which suffers from inherent bandwidth limitations, as discussed in section 4.1. To account for losses along the transmission lines, pre-emphasis and equalization have to be integrated in the transmitters and receivers. To overcome the bandwidth limitations in the future, higher order modulation schemes like PAM4 might be used, which in turn require better signal to noise ratios to achieve error-free data transmission.

8.2 NEW READOUT TECHNOLOGIES FOR HEP

The radiation hardness issue of optical links and the bandwidth limitations of electrical links call for new technologies to facilitate high-speed data transfer under the extreme conditions present in silicon trackers. And new technologies are being developed. Optical line of sight transmission [135, 136], which could enable inter-layer communication and avoids use of optical fibres, suffers from requiring lasers and PIN diodes in the detector.
Silicon Photonics [137, 138] presents an elegant way to move lasers and PIN diodes out of the detector. Through optical fibres the laser light could be brought to and out of a photonic chip at the end of a detector stave. These fibres would have to fulfil the radiation hardness requirements, of course. A complete integration with the detector would require monolithic photonic detector layers to distribute the light to all front-ends. Radio frequency (RF) data transmission combines the benefits of both technologies, removing cabling as well as active optical components. Additional advantages are discussed in the subsequent sections.

8.3 Wireless data transmission

Wireless data transmission has evolved significantly over the last decades. It has become an integral part of everyday life enabling connectivity all around the globe. Steadily rising demand has led to the development of wireless systems that enable ever higher data throughput. A wide range of technological advancements has made these developments possible. Higher order modulation techniques allow to utilize bandwidth with increased spectral efficiency. Multiple-input and multiple-output (MIMO) methods allow to increase link capacity by exploiting multipath propagation [139]. Progress in the semi-conductor industry enables the usage of higher base frequencies with larger associated bandwidths.

In the following, two frequency bands are presented that open up new opportunities for fast data readout for tracking detectors. These frequency bands are located at 60 GHz and 240 GHz. In the license-free 60 GHz band, a large bandwidth of up to 9 GHz is available that facilitates data rates exceeding 10 Gb/s [140]. In the 240 GHz band, devices with 13 GHz intermediate frequency bandwidth [141] and beyond [142] have been implemented that allow for data rates exceeding 50 Gb/s. The transceivers themselves are produced in various processes, from standard CMOS to SiGe HBT BiCMOS processes. The latter offer higher cut-off and maximum oscillation frequencies $f_T$ and $f_{\text{max}}$ [143].

Data rate is not the only criterion that a link has to fulfil to qualify for use in a silicon tracker. The data also has to be transferred efficiently, which requires a low power consumption. With large intermediate frequency bandwidths available, spectral efficiency can be traded for power consumption. Modulation schemes with reduced complexity that can be decoded non-coherently and do not require a dedicated baseband circuitry, e.g. on-off-keying, can be implemented consuming little power. Data rates of several Gb/s can be achieved, as demonstrated in [140, 144].

With increasing carrier frequency, the form factor of antennas is drastically reduced. Antennas can be realized in form of small patch antennas on PCBs or even included into the transceiver chip [145]. This allows to implement high-speed links with low material budget.

With increasing carrier frequency, the free space path loss increases. This limits the application of low power and high-speed wireless links to short
ranges. The maximum link data rate and the maximum distance depend on the choice of antenna and can in turn be traded for power consumption.
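The frequency dependence of the free space path loss can be made concrete with the standard Friis expression for isotropic antennas, $20 \log_{10}(4 \pi d f / c)$. The sketch below evaluates it for the two bands discussed here. The distance of 1 m is chosen purely for illustration, and antenna gains, which would reduce the effective loss, are not included.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_path_loss_db(distance_m: float, frequency_hz: float) -> float:
    """Free space path loss in dB between isotropic antennas (Friis formula)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT)


loss_60 = free_space_path_loss_db(1.0, 60e9)
loss_240 = free_space_path_loss_db(1.0, 240e9)

print(f"60 GHz:  {loss_60:.1f} dB")
print(f"240 GHz: {loss_240:.1f} dB")
print(f"difference: {loss_240 - loss_60:.1f} dB")
```

Quadrupling the carrier frequency adds about 12 dB of path loss at a fixed distance, which has to be compensated by antenna gain, transmit power or a shorter link distance.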

8.4 Readout Concepts and Potential Benefits

Wireless data transmission enables new readout and distribution topologies that no longer require the use of long, high-speed electrical links. As depicted in Figure 8.1, readout of a barrel shaped silicon tracker could be realized radially [146]. This differs from implementations based on cables or fibres, where the data has to be transferred axially in order to maintain the detector’s modularity. To transmit the data from the innermost layers to the outside of the detector, the data would need to be repeated at every layer, increasing the required bandwidth from layer to layer. At the outer enclosure, after the data has escaped the most hostile radiation environment, it could be transferred using regular optical links to the counting house.

Radial readout allows to correlate inter-layer information on the detector, which could facilitate track trigger applications [146]. An algorithm similar to the one used by CMS at HL-LHC could be applied. But instead of using closely-spaced, electrically connected sensors, data from sensors

![Figure 8.1: Conceptual sketch of a wireless radial readout system implemented in a barrel shaped tracking detector. Taken from [147], and adapted from [146].](image1)

![Figure 8.2: Conceptual sketch of a triplet detector layer with wireless inter-layer communication for track trigger applications.](image2)
8.5 A wireless demonstrator at 60 GHz

Similar to the Versatile Link project, a wireless transceiver usable for high energy physics (HEP) applications has to comply with certain requirements on radiation hardness and power consumption, but does not need to follow specific industrial protocols. Instead, the transceiver can be optimized for continuous operation in line-of-sight within the static tracking
detector environment. The link margin, for instance, can be chosen such that multi-path interferences can be neglected. Within the Wireless Allowing Data And Power Transmission (WADAPT) [148, 149] project, a wireless demonstrator operating in the 60 GHz band is developed. It uses a radiation-hard design of a transmitter and receiver using OOK modulation. The transceiver will be produced in the IBM 130 nm SiGe Bi-CMOS HBT 8HP process [150, 151]. The link power consumption is estimated to be less than 150 mW to transfer data rates of 4.5 Gb/s over distances up to 1 m [152]. Depending on the actual data rate, link distance and choice of antenna, the power consumption is subject to change.
The feasibility of the concepts described in section 8.4 depends on several conditions that have to be fulfilled. The wireless transceivers have to be radiation hard. To prevent wireless links between different layers from interfering, the signals must not penetrate the detectors. Multi-path propagation evoked by reflections needs to be suppressed in order to avoid cross-talk. The performance of the detectors shall not be compromised under the influence of wireless data transmission.

This chapter covers studies that have been conducted to test the feasibility of wireless data transmission for the application in HEP experiments. The signal transmission properties of two detector modules used in the current ATLAS Semiconductor Tracker (SCT) are investigated in section 9.1. Influence of wireless data transmission on the performance of a silicon strip detector module is tested in section 9.2. Both studies are performed in the 60 GHz band. The prospects of carrier frequencies above 200 GHz are studied in section 9.3. Studies of cross-talk and its mitigation using highly directive antennas and absorbing materials can be found in [147, 154]. A first irradiation study of a wireless transceiver prototype is described in [155].

9.1 TRANSMISSION THROUGH DETECTOR MODULES

A first transmission study of electromagnetic waves in the 60 GHz band through an ATLAS SCT barrel module [156] shows that the waves cannot penetrate the detector module [146]. This measurement is repeated using a more sensitive setup and an additional spare module of the ATLAS SCT endcap detector in order to quantify transmission losses and identify critical module elements.
The setup used for this measurement is illustrated in Figure 9.1. Spare modules of the ATLAS SCT barrel and endcap detectors, depicted in Figure 9.2, are individually mounted on a 2D-movable stage. The modules are placed in line-of-sight (LOS) between a transmitting and a receiving horn antenna. Aluminium plates and graphite foam serve as shielding in order to reduce impairments of the measurements due to reflections and diffraction. The modules are exposed to linearly polarized waves in the range from 57.3 GHz to 61.3 GHz using the HMC6000/6001 transmitter and receiver chips [158]. The intensity of the signal transmitted through the modules is measured with an R&S FSU67 spectrum analyzer [20], which operates up to 67 GHz. Thus, the intensity is measured in the radio frequency band without down-conversion, making this setup more sensitive than the one used in [146].

The transmission loss $T$ through the module is obtained by normalizing the transmitted intensity $P_T$ to the intensity measured without the module $P_0$,

$$T = \frac{P_T}{P_0}. \qquad (9.1)$$

The setup allows to resolve transmission losses down to $-55$ dB at a minimum power of approximately $-90$ dBm over the aforementioned frequency range.
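Since the transmission losses are quoted in dB and the powers in dBm, the conversion of equation 9.1 to logarithmic units may be helpful. In the sketch below, the reference power of $-35$ dBm is not taken from the text. It is an inferred value, obtained from the quoted sensitivity of $-55$ dB at a minimum power of $-90$ dBm, and the function names are hypothetical.

```python
import math


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level from dBm to mW."""
    return 10.0 ** (power_dbm / 10.0)


def transmission_db(p_t_mw: float, p_0_mw: float) -> float:
    """Transmission T = P_T / P_0 expressed in dB."""
    return 10.0 * math.log10(p_t_mw / p_0_mw)


P_MIN_DBM = -90.0  # minimum measurable power (from the text)
P_REF_DBM = -35.0  # assumption: reference power without module, inferred

sensitivity_db = transmission_db(dbm_to_mw(P_MIN_DBM), dbm_to_mw(P_REF_DBM))
print(f"resolvable transmission loss: {sensitivity_db:.0f} dB")
```

In logarithmic units the ratio reduces to a difference, so the resolvable loss is simply the minimum power in dBm minus the reference power in dBm.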

Transmission loss spectra of the barrel module at positions A, B and C are shown in Figure 9.3a together with the noise limited sensitivity of the spectrum analyzer. At none of the tested positions can a transmitted signal be measured, which corresponds to an average transmission loss of more than $-50$ dB over the entire frequency range, see Figure 9.4a.

Figure 9.3b shows the transmission loss spectra for the endcap module at positions A, B and C, and the noise limited sensitivity of the measurement. In the electronics region at position A, the assembly hole leads to large, frequency dependent variations of the transmitted intensity. In the region of the bonding wires at position B, the transmitted signal has the largest
Figure 9.3: Transmission loss spectra as function of the frequency at positions A, B and C, and the noise limited sensitivity of the spectrum analyzer. Uncertainties are due to intensity variations observed in the spectrum analyzer.

Figure 9.4: Transmission loss averaged over the frequency band for a position scan along the modules. Uncertainties represent the RMS of the averaged measurements.

intensity. However, with $-25\,\text{dB}$, the transmission loss is still large. At position C, no transmitted signal is measured through the silicon. The frequency averaged transmission loss as function of the position on the module is shown in Figure 9.4b. A clear position dependence is visible. The average transmission loss ranges from $-25\,\text{dB}$ to $-40\,\text{dB}$ in the electronics region.

This study demonstrates that electromagnetic waves in the 60 GHz frequency band cannot penetrate silicon strip sensors. Transmission of signals through gaps and not fully metallized layers within the modules may occur. However, for the modules under test it is found to not reach a critical level. The smallest measured transmission loss through the endcap module of $-25\,\text{dB}$ is still large enough to pose no problem for the implementation of a wireless readout system, as link margins can be set such that the intensity of signals penetrating detector layers is well below the receiver’s threshold.
9.2 Noise Pickup Studies

The analogue performance of the detectors shall not be affected by the readout system. Since electromagnetic waves can be picked up by the detector electronics, this poses a potential concern for the implementation of a wireless readout system. However, for wireless links operating at frequencies of 60 GHz and above, no interference between the wireless links and typical detector components is to be expected, as the cut-off frequencies of sensors and readout ASICs are typically at a few GHz.

In order to demonstrate that detectors and wireless links can be operated side by side without interfering, the performance of two silicon strip detector module prototypes is tested under the influence of a wireless link at 60 GHz. This study is conducted using two ABCN endcap electronics hybrid prototypes for the phase-2 upgrade of the ATLAS silicon tracking detector [159, 160]. Both samples contain 12 fully operational ABCN readout chips [161]. One of the samples is connected to short silicon strip sensors with a strip length of 2.4 cm, the other is a bare electronics hybrid.

The noise level of each readout channel is measured in units of equivalent noise charges (ENC) by performing a threshold scan with calibrated injected charges. Threshold scans are performed with and without the wireless transmitter being active. The transmitter generates a 60 GHz carrier signal that is modulated with a 1.76 Gb/s 8b/10b encoded pseudo random data pattern using minimum-shift keying. Using a 20 dBi horn antenna at a distance of 1 cm, the hybrids are exposed to a transmitted power of about \(-1.0 \pm 1.0 \text{ dBm}\). A block diagram and a photograph of the test setup are shown in Figure 9.5a and 9.5b, respectively. The antenna is positioned at four different spots over the modules, as indicated in Figure 9.5a: (A) over the power converter, (B) and (C) over the readout chips, and (D) over the

![Block diagram and photo of test setup](image)

Figure 9.5: Setup to test the influence of a wireless link on the noise of a silicon strip detector module.
bonding wires. The wireless link is operated with carrier frequencies of 57 GHz, 60 GHz and 63 GHz.

The noise level distributions of an exemplary measurement with and without wireless transmission are shown in Figure 9.6 for the bare electronics hybrid. The noise distributions are compatible with each other. No significant differences are found in the mean value and in the width of the distributions. Similarly, for the second prototype, which is connected to a silicon strip sensor, there is no significant increase. The average reference noise level of $570 \pm 1$ ENC is compatible with the average noise level of $571 \pm 1$ ENC when the wireless transmitter is active.
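The compatibility of the two average noise levels can be quantified with a simple significance estimate. The sketch assumes that the quoted uncertainties are independent, Gaussian uncertainties on the mean values. This assumption is made here for illustration and is not stated in the text.

```python
import math


def difference_in_sigma(a: float, sigma_a: float, b: float, sigma_b: float) -> float:
    """Absolute difference of two values in units of their combined uncertainty."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)


# Average noise in ENC without and with the wireless transmitter (from the text).
n_sigma = difference_in_sigma(570.0, 1.0, 571.0, 1.0)
print(f"difference: {n_sigma:.2f} standard deviations")
```

A difference well below one standard deviation supports the statement that no significant noise increase is observed.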

No significant influence on the noise level is observed regardless of the carrier frequency or the position of the wireless transmitter. Therefore, it is concluded that wireless data transmission in the 60 GHz frequency band is safe with respect to the simultaneous operation of currently used silicon sensors.

9.3 Prospects of Carrier Frequencies above 200 GHz

As discussed in section 8.3, wireless transceivers operating at carrier frequencies above 200 GHz are emerging. The benefit of an increasing carrier frequency is the accompanying increasing bandwidth, which allows for even higher data rates. Examples of transceivers operating at frequencies ranging from 210 GHz to 283 GHz can be found in literature [162–164].

The rate capabilities of the 240 GHz band are demonstrated using a transmitter and receiver chipset [163] with integrated ring antennas. The antenna’s directivity is increased using silicon lenses with a gain of 25 dBi. A wireless link over a transmission distance of about 40 cm is set up in
Figure 9.7: A 240 GHz wireless link at a transmission distance of 40 cm. Taken from [153].

Figure 9.8: The link’s intermediate frequency spectrum measured with a spectrum analyser at the output of the 240 GHz receiver. The transmitter’s intermediate frequency input is varied from 0 GHz to 20 GHz. The green line highlights the 3 dB bandwidth. Taken from [153].

the laboratory, see Figure 9.7. At a carrier frequency of 235 GHz, the intermediate frequency (IF) bandwidth is characterized using a frequency sweep in a range of 0 GHz to 20 GHz. At the receiver, the IF output is measured using the R&S FSU67 spectrum analyser. The 3/6-dB IF bandwidth is measured to be 12/13 GHz, respectively, see Figure 9.8.

This large bandwidth can be exploited to transfer data at high rates without applying complex modulation schemes. Bit error rate tests are performed using a Stratix V FPGA development kit and the adapted transceiver tool-kit, as already described in section 6.3.2. A pseudo random binary sequence (PRBS-7) signal is modulated onto the carrier using BPSK, by simply connecting the digital data signal to the in-phase input (I) of the IQ-mixer. Up to 8 Gb/s, not a single bit error is measured. According to the measurement time, an upper limit on the bit error rate of $10^{-13}$ is stated. At 10 Gb/s, a bit error rate of $10^{-14}$ is measured during an over-night measurement. Figure 9.9 shows the corresponding eye diagram at the output.
Figure 9.9: Eye diagram of a 10 Gb/s PRBS-7 data stream sent over the wireless link, recorded with the DSA8300.

of the wireless receiver, recorded with the DSA8300 serial analyzer. With a signal-to-noise ratio (SNR) of 18, the link still works properly, but reaches the rate limits of modulation scheme and bandwidth, which reflects in the measured bit error rate.

The technology of transceivers operating at frequencies of 200 GHz and above is still in a prototyping phase. The transceiver used for this study consumes about 2.8 W, corresponding to 280 pJ/bit. To be fully competitive with other technologies, the energy efficiency has to be improved.
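The energy per bit is the ratio of power consumption to data rate. The following sketch reproduces the quoted value for the 240 GHz transceiver, assuming that it refers to the demonstrated data rate of 10 Gb/s. For comparison, it applies the same calculation to the estimate for the 60 GHz demonstrator given in section 8.5 (less than 150 mW at 4.5 Gb/s). The resulting figure for the 60 GHz link is derived here and is not quoted in the text.

```python
def energy_per_bit_pj(power_w: float, data_rate_gbps: float) -> float:
    """Energy per transferred bit in pJ for a given power and data rate."""
    return power_w / (data_rate_gbps * 1e9) * 1e12


e_240ghz = energy_per_bit_pj(2.8, 10.0)   # assumption: evaluated at 10 Gb/s
e_60ghz = energy_per_bit_pj(0.150, 4.5)   # upper estimate for the 60 GHz demonstrator

print(f"240 GHz prototype:    {e_240ghz:.0f} pJ/bit")
print(f"60 GHz demonstrator: <{e_60ghz:.0f} pJ/bit")
```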
SUMMARY, CONCLUSION AND OUTLOOK

Silicon tracking detectors enable precise measurements of momenta and vertices of charged particles. However, the high granularity of the detectors is associated with large amounts of data that have to be read out and processed. Experiments searching for new physics require steadily increasing beam energies, luminosities or intensities. As a consequence, the event rates within the detectors and thus also their data rates rise. Data acquisition for today’s and future silicon detectors is only possible with the utilization of high-speed data links and massive parallel logic implemented in hardware.

In the context of this thesis, a vertical slice of the Mu3e readout system is developed and wireless data transmission is studied complementing electrical and optical links for the use in future experiments.

The Mu3e experiment aims to search for the ultra-rare decay $\mu^+ \rightarrow e^+ e^- e^+$ with an unprecedented sensitivity of one in $10^{16}$ decays. Any observation of this charged lepton flavour violating decay would be a clear indication for new physics. The experiment is conducted in two phases. In the first phase, a sensitivity of the order of $10^{-15}$ is targeted using the existing compact muon beam line at PSI that provides $10^8$ muons per second. In the second phase, to achieve the ultimate sensitivity within a reasonable time, more than $10^9$ muon decays have to be observed per second.
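The link between target sensitivity, muon rate and running time can be estimated with a back-of-the-envelope calculation. The sketch assumes an idealized experiment in which a sensitivity of one in $N$ decays requires observing about $N$ muon decays, with no losses due to acceptance, efficiency or downtime. The real running time is therefore longer than these numbers suggest.

```python
SECONDS_PER_DAY = 86_400


def ideal_run_time_days(n_decays: float, decay_rate_hz: float) -> float:
    """Days of continuous data taking needed to observe n_decays at a given rate."""
    return n_decays / decay_rate_hz / SECONDS_PER_DAY


phase1_days = ideal_run_time_days(1e15, 1e8)         # phase I target at 1e8 muons/s
phase2_days = ideal_run_time_days(1e16, 1e9)         # phase II target at 1e9 muons/s
phase2_at_low_rate = ideal_run_time_days(1e16, 1e8)  # phase II target at phase I rate

print(f"{phase1_days:.0f} d, {phase2_days:.0f} d, {phase2_at_low_rate:.0f} d")
```

Under these idealized assumptions, both phases correspond to roughly four months of continuous beam, whereas reaching the ultimate sensitivity at the phase I rate would take about ten times longer.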

An ultra-thin silicon pixel detector is used to measure the trajectories of the electrons that originate from the muon decays with high precision. The expected data rates from the pixel detector are about 33 Gb/s in the first phase and 668 Gb/s in the second phase, excluding any overhead due to the link protocol. As it is not feasible to store the full detector information for all events, online event selection is used to filter signal-like events from the majority of background events. The amount of data that is written to disk is thereby reduced to a maximum of 100 MB/s. The event selection is based on online track reconstruction using the full detector information. This requires high-speed readout of the entire Mu3e detector.

The Mu3e readout system runs without a hardware trigger. The zero-suppressed detector data is prepared in several stages for the online event selection. In the context of this thesis, the first stages of the data acquisition system are investigated and brought together in form of a vertical slice test.

The serial data outputs of two versions of the Mu3e pixel sensors, called MuPix, are studied. The MuPix7 is the first prototype with integrated readout state machine and serial data output. Its serial link is found to be fully operational over a broad range from 150 Mb/s to 1.6 Gb/s. MuPix8 is the first large prototype with three independent readout state machines and four serial data outputs. The analogue signal of the serial links can be tuned using a set of global DACs that influence all four links. The signal quality of the serial links, operated at the Mu3e design data rate of 1.25 Gb/s, is studied in detail. Using eye diagram analyses, on-chip variations of the analogue signal quality among the four links are observed. These are most likely caused by the differences of the capacitance of the transmission lines on the chip. Sensor-to-sensor variations of the order of 20% are observed for the serial links operated at the nominal supply voltage of 1.8 V. These variations are largely reduced by increasing the supply voltage to 2.0 V.

Using bit error rate tests, the quality of the data transmission is found to be satisfactory for a sensor supply voltage of 1.9 V and above. A sample of ten MuPix8 sensors is used for this study. All sensors pass the bit error rate test, running error-free for a period of 100 s, which corresponds to an upper limit of $2.4 \cdot 10^{-11}$ on the bit error rate. For a supply voltage of 1.8 V, two of ten sensors fail the test.
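
The relation between an error-free test and the quoted upper limit can be sketched as follows. The confidence level is not stated in this summary; the sketch assumes 95%, which reproduces the quoted value, and uses the large-sample approximation $p \le -\ln(1 - \mathrm{CL})/N$ for $N$ transmitted bits without error. The function name is illustrative.

```python
import math


def ber_upper_limit(bit_rate_bps: float, duration_s: float, confidence: float = 0.95) -> float:
    """Upper limit on the bit error rate after an error-free test.

    With zero observed errors in N bits, (1 - p)^N >= 1 - CL gives
    p <= -ln(1 - CL) / N for large N.
    """
    n_bits = bit_rate_bps * duration_s
    return -math.log(1.0 - confidence) / n_bits


# MuPix8 serial link: 1.25 Gb/s, error-free for 100 s (95% CL is an assumption)
limit = ber_upper_limit(1.25e9, 100.0)
print(f"BER < {limit:.1e}")
```

Since the limit scales inversely with the number of transmitted bits, tighter limits require correspondingly longer test durations or higher data rates.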

For the final pixel sensor, a revision of the transmission line design should be considered to reduce on-chip variations. The signal quality variations could also be mitigated by implementing additional DACs, which would allow each link to be tuned individually. As the signal quality of all links is significantly improved by operating the sensor at supply voltages above the nominal one, the powering scheme of the serializer should be revised.

In the Mu3e experiment, the pixel sensors are clocked, configured and read out by FPGAs located on the front-end boards close to the tracking modules within the magnet. A first front-end board prototype is brought into operation and integrated into a vertical slice of the data acquisition system, based on the framework of the MuPix Telescope. The setup implements configuration of the pixel sensors, clock distribution and the data path from the front-end board to a back-end PC.

For this integration study, the tasks of the Mu3e front-end FPGA are split between front-end and back-end, due to the limited number of logic elements available on the FPGA of the first front-end board version. The front-end FPGA performs data deserialization, bit alignment and bit error monitoring of the sensors’ serial links. The chronological sorting of the data and the data packing are done by the back-end FPGA.

The data is sent using 8 optical links running at 6.25 Gb/s from the front-end to the back-end FPGA. The optical links are tested extensively and found to run error-free over multiple days without any interruption, resulting in an upper limit on the bit error rate of $6 \cdot 10^{-16}$ per channel.

On the back-end FPGA, two readout modes are implemented. In the commissioning readout mode, the sensor’s readout structure is conserved, which allows parameters like load, multiplicity and latency to be deduced from the data. In the sorted readout mode, packets of chronologically sorted data are generated that contain the hits from all sensors for a set of timestamps. The back-end FPGA subsequently transfers the data packets via PCIe to the computer, where they are written to disk.
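
The idea behind the sorted readout mode can be illustrated in software. The sketch below is not the firmware implementation: the `Hit` fields, the number of timestamps per packet and the function name are hypothetical, and effects such as timestamp overflow or late-arriving hits are ignored.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    sensor: int
    timestamp: int
    column: int
    row: int


def build_sorted_packets(hits, timestamps_per_packet=4):
    """Group hits from all sensors into packets of consecutive timestamps."""
    packets = defaultdict(list)
    for hit in hits:
        # All hits whose timestamps fall into the same block share a packet.
        packets[hit.timestamp // timestamps_per_packet].append(hit)
    # Emit packets in chronological order, with hits sorted inside each packet.
    return [
        sorted(packets[key], key=lambda h: (h.timestamp, h.sensor))
        for key in sorted(packets)
    ]


# Hits arrive unordered, as they would from independent sensor links.
hits = [Hit(2, 5, 10, 3), Hit(0, 1, 7, 7), Hit(1, 4, 0, 9), Hit(3, 2, 5, 5)]
packets = build_sorted_packets(hits)
for packet in packets:
    print([(h.timestamp, h.sensor) for h in packet])
```

In contrast, the commissioning readout mode would correspond to forwarding the hits in the order in which each sensor delivers them.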

The vertical slice setup is operated as an 8-layer beam telescope during test beam campaigns at DESY and MAMI. For all sensors, bit error rates are measured for all four serial links. However, only the data of one link per sensor is processed further, because only one of the three pixel matrices of MuPix8 can be operated efficiently. The setup is successfully commissioned and tested under high particle rates.

Most sensors can be operated with error-free data transmission on all links during both beam tests, with few exceptions that have one or two failing links. The performance of the serial links is found to be independent of the particle rate. However, discrete variations of the measured bit error rates are observed within the failing links, which are most likely due to changes of the sampling phases within the FPGA’s LVDS receivers. Future designs of the front-end firmware should therefore include direct feedback from the bit error rate monitor to the LVDS receiver to force a re-alignment in the dynamic phase alignment circuitry if the chosen phase leads to bit errors. The phase should then be locked once an optimal phase is found. Long-term stability of the sampling phase should be investigated under the influence of temperature variations to ensure proper operation in the detector, where a significant temperature gradient is expected.
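
The proposed feedback can be sketched as a small control loop. This is a behavioural model in software, not firmware and not the interface of any vendor IP core: stepping to the next phase stands in for a forced re-alignment, and the number of phases, the length of the observation windows and the lock criterion are all assumptions chosen for illustration.

```python
class PhaseAlignmentController:
    """Toy model: re-align on bit errors, lock after several clean windows."""

    def __init__(self, num_phases=8, clean_windows_to_lock=3):
        self.num_phases = num_phases                  # assumed number of sampling phases
        self.clean_windows_to_lock = clean_windows_to_lock
        self.phase = 0
        self.clean_windows = 0
        self.locked = False

    def update(self, bit_errors_in_window):
        """Process the error count of one monitoring window; return the phase."""
        if self.locked:
            return self.phase                         # phase is frozen once locked
        if bit_errors_in_window > 0:
            # Feedback from the bit error monitor: force a re-alignment.
            self.phase = (self.phase + 1) % self.num_phases
            self.clean_windows = 0
        else:
            self.clean_windows += 1
            if self.clean_windows >= self.clean_windows_to_lock:
                self.locked = True                    # keep the phase found to be error-free
        return self.phase


ctrl = PhaseAlignmentController()
for errors in [12, 3, 0, 0, 0, 7]:
    ctrl.update(errors)
print(ctrl.phase, ctrl.locked)
```

Whether and when a locked phase should be released again, for example after temperature-induced drifts, is deliberately left open in this sketch.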

The readout system on the FPGA is capable of handling total hit rates up to 10 MHz in the commissioning readout mode, corresponding to an average hit rate of 1.25 MHz per link, with data losses below 0.1%. For higher rates, the round-robin process that forwards the packets to the PCIe interface is identified as the bottleneck, which could be mitigated by operating this process at higher clock speeds. In the sorted readout mode, up to 40 MHz of total hit rate, corresponding to an average hit rate of 5 MHz per link, can be processed without significant losses. The losses of 0.3% are completely dominated by pixel hits with large readout latency. This is due to the readout state machine of the MuPix8, which cannot be operated at full speed. This should be fixed for future sensors, which would reduce the average readout latency and increase the maximum hit rate by a factor of 2.

For Phase I of the Mu3e experiment, an average hit rate of 1.2 MHz per link is expected for the front-end FPGAs of the innermost pixel layers. At these hit rates, the readout system works highly efficiently. Only $3 \times 10^{-5}$ of all hits are lost when operating the system in the sorted readout mode, which qualifies the implementation for deployment during Phase I. However, the system has to be scaled up further. Proper operation with up to 45 data links has to be verified. Close to the targeted total system hit rate of 48.6 MHz, hits are lost as the reading process of the sorter poses a bottleneck and truncates packets. This can be mitigated either by reducing the data width of the hits from 64 bit to 28 bit, which would be sufficient for the experiment, or by operating the reading process at a higher clock speed.
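
The effect of the narrower hit format can be estimated with a short calculation. The sketch considers the hit payload only; packet headers and any protocol overhead are not included, and the function name is illustrative.

```python
def hit_payload_gbps(hit_rate_mhz: float, bits_per_hit: int) -> float:
    """Payload bandwidth in Gb/s needed to move hits of a given width."""
    return hit_rate_mhz * 1e6 * bits_per_hit / 1e9


wide = hit_payload_gbps(48.6, 64)     # current 64 bit hit format
narrow = hit_payload_gbps(48.6, 28)   # reduced 28 bit hit format
print(f"64 bit: {wide:.2f} Gb/s, 28 bit: {narrow:.2f} Gb/s")
```

At the targeted total hit rate, the reduced hit width lowers the payload that the reading process has to move to less than half of its current value.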

In the experiment, not only the pixel detector but also the timing detectors will be read out and controlled by the front-end boards. Both timing detectors utilize a common readout ASIC, called MuTRiG. Integration of the MuTRiG into the data acquisition chain is foreseen for the near future. Developments have already started using a front-end board prototype. The inclusion of the switching board into the readout system, which enables data packets from several front-end boards to be merged into a single data stream, is the next step in the development of the full data acquisition chain.

**Wireless data transmission** is an attractive alternative to wired electrical and optical data transmission for the readout of future experiments.

Silicon tracking detectors are a key component of all large modern particle physics experiments, as they allow single events to be resolved within a pile-up of tens or even hundreds of events. At the HL-LHC, for instance, an average pile-up above 100 is expected. The data from the tracking detector is foreseen to be used in early trigger stages of the upgraded ATLAS and CMS experiments in order to enhance the physics trigger performance under these extreme pile-up conditions. However, the high granularity of the tracking detectors in combination with particle fluxes exceeding 1 GHz/cm² leads to data volumes that cannot be fully transferred without being reduced by an external trigger. Therefore, track triggers, as envisioned for the ATLAS and CMS experiments at the HL-LHC, do not rely on the full detector information but only on a subset of the data.

Silicon tracking detectors at hadron colliders place enormous demands on their readout links’ bandwidth, as well as their power and material budgets. Additionally, the links have to be operated within an extremely hostile radiation environment. Wireless links operating at frequencies of 60 GHz and above offer high bandwidth, low power consumption and small form factor. Regarding transmission speeds, they can compete with electrical and optical links as used in today’s HEP experiments.

The application of wireless links could enable the implementation of new and flexible readout topologies. The data would not be forced to follow the electrical connections axially along the layers, but could be sent radially from layer to layer, following the particle trajectories, without impairing the detector’s modularity. This form of inter-layer communication could enhance track trigger capabilities.

Within this context, studies of wireless signal transmission in the 60 GHz frequency band are conducted. Silicon strip detector modules are found to be opaque to electromagnetic waves in this frequency range. Transmission losses through the detectors are measured to be larger than 50 dB. This would allow wireless links between different detector layers to be operated without interference.
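
For orientation, the quoted loss can be converted into a power ratio. The relation below is the standard decibel definition for power and assumes that the measured loss $L$ is quoted as a power ratio:

$$
\frac{P_\text{out}}{P_\text{in}} = 10^{-L/(10\,\text{dB})}, \qquad L > 50\,\text{dB} \;\Rightarrow\; \frac{P_\text{out}}{P_\text{in}} < 10^{-5}.
$$

In other words, less than one part in $10^{5}$ of the incident power passes through a detector module.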

The immunity of current detector technologies to wireless links operating at 60 GHz is demonstrated. The noise level of the readout ASICs of a silicon strip detector module prototype for the Phase-II upgrade of the ATLAS experiment is measured under the influence of electromagnetic waves in the 60 GHz band. The noise is found to be independent of the presence of the wireless link; hence, the performance of the readout ASICs is not impaired.

High data rate capabilities are demonstrated using a wireless link that is operated at a carrier frequency of 240 GHz. A bit error rate test is conducted proving stable data transmission up to 10 Gb/s with a bit error rate of $10^{-14}$.

In order to fully qualify wireless technologies for use within silicon tracking detectors, the radiation tolerance of wireless transceivers has to be studied in great detail in the future.

The aforementioned studies and integration tests for the Mu3e experiment have led to the design of a production prototype of the Mu3e front-end board. The current front-end board version is used for readout of the MuPix telescope during beam tests and for further integration studies. The broad investigations of high-speed wireless data links in the context of HEP detectors have demonstrated the feasibility of employing this technology for upcoming tracking detectors and track trigger systems.
Appendices
<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADC</td>
<td>analog-to-digital converter</td>
</tr>
<tr>
<td>ALM</td>
<td>adaptive logic module</td>
</tr>
<tr>
<td>ALUT</td>
<td>adaptive look-up table</td>
</tr>
<tr>
<td>AMS</td>
<td>austria microsystems AG</td>
</tr>
<tr>
<td>ASIC</td>
<td>application specific integrated circuit</td>
</tr>
<tr>
<td>ASK</td>
<td>amplitude-shift keying</td>
</tr>
<tr>
<td>BAR</td>
<td>base address register</td>
</tr>
<tr>
<td>BER</td>
<td>bit error rate</td>
</tr>
<tr>
<td>BPSK</td>
<td>binary phase-shift keying</td>
</tr>
<tr>
<td>CDR</td>
<td>clock and data recovery</td>
</tr>
<tr>
<td>CL</td>
<td>confidence level</td>
</tr>
<tr>
<td>CML</td>
<td>current-mode logic</td>
</tr>
<tr>
<td>CMOS</td>
<td>complementary metal-oxide semiconductor</td>
</tr>
<tr>
<td>CPFSK</td>
<td>continuous phase frequency-shift keying</td>
</tr>
<tr>
<td>DAC</td>
<td>digital-to-analog converter</td>
</tr>
<tr>
<td>DAQ</td>
<td>data acquisition</td>
</tr>
<tr>
<td>DC</td>
<td>direct current</td>
</tr>
<tr>
<td>DCL</td>
<td>differential current mode logic</td>
</tr>
<tr>
<td>DESY</td>
<td>Deutsches Elektronen-Synchrotron</td>
</tr>
<tr>
<td>Diff</td>
<td>differential signaling</td>
</tr>
<tr>
<td>DMA</td>
<td>direct memory access</td>
</tr>
<tr>
<td>DPA</td>
<td>dynamic phase alignment</td>
</tr>
<tr>
<td>DSP</td>
<td>digital signal processor</td>
</tr>
<tr>
<td>ECAL</td>
<td>electromagnetic calorimeter</td>
</tr>
<tr>
<td>ENC</td>
<td>equivalent noise charges</td>
</tr>
<tr>
<td>EOC</td>
<td>end of column</td>
</tr>
<tr>
<td>EyeQ</td>
<td>On-Chip Signal Quality Monitoring Circuitry</td>
</tr>
<tr>
<td>FCC</td>
<td>Future Circular Collider</td>
</tr>
<tr>
<td>FIFO</td>
<td>first in, first out buffer</td>
</tr>
<tr>
<td>FPGA</td>
<td>field programmable gate array</td>
</tr>
<tr>
<td>FSK</td>
<td>frequency-shift keying</td>
</tr>
<tr>
<td>FSPL</td>
<td>free-space path loss</td>
</tr>
<tr>
<td>GBT</td>
<td>GigaBit Transceiver</td>
</tr>
<tr>
<td>GPU</td>
<td>graphics processing unit</td>
</tr>
<tr>
<td>GUI</td>
<td>graphical user interface</td>
</tr>
<tr>
<td>GXB</td>
<td>gigabit transceiver block</td>
</tr>
<tr>
<td>HCAL</td>
<td>hadronic calorimeter</td>
</tr>
<tr>
<td>HDI</td>
<td>high density interconnect</td>
</tr>
<tr>
<td>HEP</td>
<td>high energy physics</td>
</tr>
<tr>
<td>HL-LHC</td>
<td>High-Luminosity LHC</td>
</tr>
<tr>
<td>HSMC</td>
<td>High Speed Mezzanine Card</td>
</tr>
<tr>
<td>HV-CMOS</td>
<td>high-voltage-CMOS</td>
</tr>
<tr>
<td>HV-MAPS</td>
<td>high-voltage monolithic active pixel sensors</td>
</tr>
<tr>
<td>I²C</td>
<td>Inter-Integrated Circuit</td>
</tr>
<tr>
<td>I/O</td>
<td>input/output</td>
</tr>
<tr>
<td>IF</td>
<td>intermediate frequency</td>
</tr>
<tr>
<td>IP</td>
<td>intellectual property</td>
</tr>
<tr>
<td>ISI</td>
<td>inter-symbol interference</td>
</tr>
<tr>
<td>JTAG</td>
<td>Joint Test Action Group</td>
</tr>
<tr>
<td>LCD</td>
<td>liquid-crystal display</td>
</tr>
<tr>
<td>LE</td>
<td>logic element</td>
</tr>
<tr>
<td>LED</td>
<td>light-emitting diode</td>
</tr>
<tr>
<td>LHC</td>
<td>Large Hadron Collider</td>
</tr>
<tr>
<td>LpGBT</td>
<td>low power GigaBit Transceiver</td>
</tr>
<tr>
<td>LSB</td>
<td>least significant bit</td>
</tr>
<tr>
<td>LTD</td>
<td>lock-to-data</td>
</tr>
<tr>
<td>LTR</td>
<td>lock-to-reference</td>
</tr>
</tbody>
</table>
# List of Abbreviations

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>LVDS</td>
<td>low-voltage differential signaling</td>
</tr>
<tr>
<td>MAMI</td>
<td>Mainz Microtron</td>
</tr>
<tr>
<td>MAPS</td>
<td>monolithic active pixel sensors</td>
</tr>
<tr>
<td>MIMO</td>
<td>multiple-input multiple-output</td>
</tr>
<tr>
<td>MISO</td>
<td>multiple-input single-output</td>
</tr>
<tr>
<td>MSB</td>
<td>most significant bit</td>
</tr>
<tr>
<td>MSK</td>
<td>minimum-shift keying</td>
</tr>
<tr>
<td>NRZ</td>
<td>non-return-to-zero</td>
</tr>
<tr>
<td>NVM</td>
<td>non-volatile memory</td>
</tr>
<tr>
<td>OOK</td>
<td>on-off-keying</td>
</tr>
<tr>
<td>PAM4</td>
<td>4-level pulse-amplitude modulation</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed Circuit Board</td>
</tr>
<tr>
<td>PCIe</td>
<td>Peripheral Component Interconnect Express</td>
</tr>
<tr>
<td>PCML</td>
<td>pseudo current mode logic</td>
</tr>
<tr>
<td>PCS</td>
<td>physical coding sub-layer</td>
</tr>
<tr>
<td>PFD</td>
<td>phase frequency detector</td>
</tr>
<tr>
<td>PD</td>
<td>phase detector</td>
</tr>
<tr>
<td>PIO</td>
<td>parallel I/O</td>
</tr>
<tr>
<td>PLL</td>
<td>phase-locked loop</td>
</tr>
<tr>
<td>PMA</td>
<td>physical media attachment</td>
</tr>
<tr>
<td>PRBS7</td>
<td>pseudorandom binary sequence-7</td>
</tr>
<tr>
<td>PSI</td>
<td>Paul Scherrer Institut</td>
</tr>
<tr>
<td>PSK</td>
<td>phase-shift keying</td>
</tr>
<tr>
<td>QAM</td>
<td>quadrature amplitude modulation</td>
</tr>
<tr>
<td>QSFP</td>
<td>Quad Small Form-factor Pluggable</td>
</tr>
<tr>
<td>QSH</td>
<td>QStrip high-speed ground plane socket</td>
</tr>
<tr>
<td>QTH</td>
<td>QStrip high-speed ground plane terminal strip</td>
</tr>
<tr>
<td>RAM</td>
<td>random-access memory</td>
</tr>
<tr>
<td>RD</td>
<td>running disparity</td>
</tr>
<tr>
<td>RF</td>
<td>radio frequency</td>
</tr>
<tr>
<td>RoI</td>
<td>regions of interest</td>
</tr>
<tr>
<td>ROM</td>
<td>read-only memory</td>
</tr>
<tr>
<td>RX</td>
<td>receiver</td>
</tr>
<tr>
<td>SCSI</td>
<td>Small Computer System Interface</td>
</tr>
<tr>
<td>SCT</td>
<td>Semiconductor Tracker</td>
</tr>
<tr>
<td>SE</td>
<td>single-ended signaling</td>
</tr>
<tr>
<td>SerDes</td>
<td>serializer and deserializer</td>
</tr>
<tr>
<td>SFP</td>
<td>Small Form-factor Pluggable</td>
</tr>
<tr>
<td>SiPM</td>
<td>silicon photomultiplier</td>
</tr>
<tr>
<td>SM</td>
<td>Standard Model</td>
</tr>
<tr>
<td>SMA</td>
<td>SubMiniature version A</td>
</tr>
<tr>
<td>SPI</td>
<td>Serial Peripheral Interface</td>
</tr>
<tr>
<td>SUSY</td>
<td>supersymmetry</td>
</tr>
<tr>
<td>TCL</td>
<td>Tool Command Language</td>
</tr>
<tr>
<td>TX</td>
<td>transmitter</td>
</tr>
<tr>
<td>UART</td>
<td>universal asynchronous receiver-transmitter</td>
</tr>
<tr>
<td>UI</td>
<td>unit interval</td>
</tr>
<tr>
<td>VCO</td>
<td>voltage controlled oscillator</td>
</tr>
<tr>
<td>VCSEL</td>
<td>vertical-cavity surface-emitting laser</td>
</tr>
<tr>
<td>WADAPT</td>
<td>Wireless Allowing Data And Power Transmission</td>
</tr>
<tr>
<td>XAUI</td>
<td>10 Gigabit Attachment Unit Interface</td>
</tr>
<tr>
<td>XOR</td>
<td>exclusive or</td>
</tr>
</tbody>
</table>
EYE DIAGRAM ANALYSIS OF THE SERIAL LINKS OF MUPIX8

Figure B.1 shows the noise RMS for all 40 serial links of the MuPix8 sensors studied in section 6.1.2 for supply voltages $V_{High} = 1.8\, \text{V}, 1.9\, \text{V}, 2.0\, \text{V}$ and $2.045\, \text{V}$. For all sensors, link 3 has the highest noise value for all voltages, with the only exception being sensor 84-2-4, where at $V_{High} = 1.9\, \text{V}$, link 0 has a slightly higher noise value. For most of the sensors, link 0 has the second highest noise value. These are attributed to the signal overshoots. The majority of the links show a noise minimum at $1.9\, \text{V}$ (18 links) or $1.8\, \text{V}$ (12 links). Only 5 links benefit most from $2.0\, \text{V}$, another 5 from increasing the supply voltage even up to $2.045\, \text{V}$.

Only about half of the links gain in the signal-to-noise ratio by increasing the supply voltage, see Figure B.2. For the remaining half, the $SNR$ stays either more or less constant, or even gets worse. For all sensors, link 1 has the best and link 3 has the worst $SNR$.

Figure B.3 shows the eye height for all links versus the supply voltage. All links benefit from increasing the supply voltage. With only a few exceptions, the highest supply voltage yields the largest eye heights.

---

Figure B.1: RMS noise of the serial links depending on the supply voltage $V_{High}$ for 10 sensors. Amplitudes are not corrected for losses in the clock recovery unit.

Figure B.2: Signal-to-noise ratio of the serial links depending on the supply voltage $V_{High}$ for 10 sensors.

Figure B.3: Eye height of the serial links depending on the supply voltage $V_{High}$ for 10 sensors. Amplitudes are not corrected for losses in the clock recovery unit.

Figure B.4 shows the measured jitter RMS values. For 8 out of 10 sensors, link 2 has by far the highest jitter value among the links. Most of the links benefit from increased supply voltages through reduced jitter. Similarly, the eye width increases with higher supply voltage, see Figure B.5.
LIST OF MY OWN PUBLICATIONS

Some of the ideas and figures presented in this thesis have appeared previously or are expected to be published in the following journal articles and conference proceedings:

- **Technical design of the Phase I Mu3e Experiment**  
  A. Blondel et al.  
  to be published

- **Irradiation study of a fully monolithic HV-CMOS pixel sensor design in AMS 180 nm**  
  H. Augustin et al.  

- **Efficiency and timing performance of the MuPix7 high-voltage monolithic active pixel sensor**  
  H. Augustin et al.  

- **Readout Electronics for the First Large HV-MAPS Chip for Mu3e**  
  D. Wiedner et al.  
  PoS TWEPP-17 (2018) 099

- **The MuPix System-on-Chip for the Mu3e Experiment**  
  H. Augustin et al.  

- **The MuPix Telescope: A Thin, high Rate Tracking Telescope**  
  H. Augustin et al.  
  JINST 12 (2017) no.01, C01087

- **Ultra-low material pixel layers for the Mu3e experiment**  
  N. Berger et al.  
  JINST 11 (2016) no.12, C12006

- **MuPix7 – A fast monolithic HV-CMOS pixel chip for Mu3e**  
  H. Augustin et al.  
  JINST 11 (2016) no.11, C11029

- **Readout via Flexprints for the Mu3e Experiment**  
  N. Berger et al.  
  PoS CORFU2015 (2016) 071

- **Overview of HVCMOS pixel sensors**  
  I. Perić et al.  
  JINST 10 (2015) no.05, C05021

- **Wireless data transmission for high energy physics applications**  
  S. Dittmeier et al.  
  EPJ Web Conf. 150 (2017) 00002

- **Feasibility studies for a wireless 60 GHz tracking detector readout**  
  S. Dittmeier et al.

The content of the following articles is related to previous research conducted during a research stay prior to my doctoral studies as well as during my master thesis and is not further used within this thesis:

- **First results from the DEAP-3600 dark matter search with argon at SNOLAB**  
  DEAP-3600 collaboration  
  Phys. Rev. Lett. 121 (2018) no.7, 071801

- **Design and Construction of the DEAP-3600 Dark Matter Detector**  
  DEAP-3600 collaboration  
  Submitted to: Astropart. Phys.

- **In-situ characterization of the Hamamatsu R5912-HQE photomultiplier tubes used in the DEAP-3600 experiment**  
  DEAP-3600 collaboration

- **60 GHz wireless data transfer for tracker readout systems – first studies and results**  
  S. Dittmeier et al.  
  JINST 9 (2014) no.11, C11002

- **Towards Multi-Gigabit readout at 60 GHz for the ATLAS silicon microstrip detector**  
  H.K. Soltveit et al.  
  DOI: 10.1109/NSSMIC.2013.6829448


ACKNOWLEDGEMENTS

I would like to dedicate the last page to all those who supported me in carrying out this doctoral thesis.

My sincere thanks go to my doctoral supervisor André Schöning for the opportunity to work on this exciting topic, for the excellent supervision, and for the many discussions that contributed to the success of this work.

Many thanks also to Norbert Herrmann for his willingness to act as second referee.

I would like to thank the members of the Mu3e working group for the great atmosphere at work in the institute, during the countless test beam campaigns, and also during the many leisure activities.

I would especially like to thank my long-standing fellow doctoral students and friends Heiko Augustin, Adrian Herkert, Lennart Huth and Ann-Kathrin Perrevoort, as well as Moritz Kiehn and Dorothea vom Bruch. Thank you for the intensive collaboration, for the many conversations about serious and not so serious topics, for the many rounds of table football, and for the many experiences I was able to share with you over the past years.

I would also like to thank the staff of the electronics workshop of the Physikalisches Institut, whose work made many of the measurements possible in the first place.

I would like to thank my parents and my brothers for their unwavering support. Thank you for always being there for me.

Thanks also to all my friends who have accompanied me through my studies, or for even longer.
