## DISSERTATION

submitted to the Combined Faculties of the Natural Sciences and Mathematics

of the

Ruperto-Carola-University of Heidelberg, Germany for the degree of Doctor of Natural Sciences

Put forward by

HEIKO AUGUSTIN

born in Saarlouis

Oral examination: 1<sup>st</sup> of December, 2021

# Development of a novel slow control interface and suppression of signal line crosstalk enabling HV-MAPS as sensor technology for Mu3e

Referees: Prof. Dr. André Schöning, Prof. Dr. Peter Fischer

The Mu3e experiment searches for the charged lepton flavour violating decay $\mu^+ \rightarrow e^+ e^- e^+$ with the goal of finding the decay or excluding it for branching ratios above $1 \times 10^{-16}$. In a first phase of the experiment a reduced muon decay rate of $1 \times 10^8$ Hz is used, which allows a single event sensitivity of $2 \times 10^{-15}$ to be reached. With high particle rates and the low energy of the decay particles, the experiment imposes strong limitations on its material budget. For this reason the pixel tracker modules are designed with a thickness of 0.1% of a radiation length. The only pixel sensor technology that provides high rate capability while allowing the sensor to be thinned down to 50 µm is the high voltage monolithic active pixel sensor (HV-MAPS).

In this thesis the first full-scale pixel sensor for the Mu3e experiment, MuPix10, is described and first results from its characterisation are presented. The focus of this work is on two specific developments. The first is a novel slow control interface, tailored to the MuPix chip, with the goal of minimising the number of differential pairs required for the sensor configuration. The resulting interface requires no additional differential lines, as it incorporates the synchronous reset signal and takes its place. The new interface allows a 9-chip pixel tracker module to be configured with chip-individual settings in less than 400 ms.

The second topic is the optimisation of the signal line routing scheme of the sensor, with the goal of suppressing crosstalk effects and reducing the row address dependent time delays that have been observed in MuPix8. Models are developed to gain a deeper understanding of these effects. The new routing scheme reduces the crosstalk probability from an expected average value of 18% to 2.5%. Furthermore, the time spread of the row dependent pixel delay is halved from 40 ns to 20 ns.


The Mu3e experiment searches for the charged lepton flavour violating decay $\mu^+ \rightarrow e^+e^-e^+$ with the goal of observing the decay or excluding it for branching ratios larger than $1 \times 10^{-16}$. In a first phase the experiment is operated with a reduced muon decay rate, which limits the single event sensitivity to $2 \times 10^{-15}$. Because of the high particle rates and the low energy of the decay particles, the material budget of the detector has to be strictly reduced, with the goal of providing tracking detector layers with a thickness of 0.1% of a radiation length. The only pixel sensor technology that remains high-rate capable when thinned to 50 µm is the high voltage monolithic active pixel sensor (HV-MAPS).

In this thesis MuPix10, the first pixel sensor in the final size for the Mu3e experiment, is introduced and first results are presented. The focus of this work lies on two specific developments. One focus is the development of a novel, tailor-made configuration interface for the MuPix sensor, with the goal of reducing the number of differential signal pairs needed for configuration. The resulting interface requires no additional differential pairs, because the signal for the synchronous reset is integrated into the interface. It allows the complete individual configuration of a 9-sensor tracking detector module in under 400 ms. The second topic deals with the optimisation of the routing of the pixel signal lines, with the goal of reducing crosstalk and the row address dependent latency that were observed for the MuPix8 sensor. Models were developed that enable a deeper understanding of these effects. With the new routing scheme for the pixel signal lines, the crosstalk probability was reduced from an expected 18% to 2.5%. Furthermore, the spread of the pixel latencies was halved from 40 ns to 20 ns.

- I INTRODUCTION
  - 1 INTRODUCTION
  - 2 THE MU3E EXPERIMENT
    - 2.1 The $\mu^+ \rightarrow e^+ e^- e^+$ signal decay
    - 2.2 Backgrounds
    - 2.3 The experimental concept
    - 2.4 The Mu3e pixel tracker module
- II HV-MAPS
  - 3 AN HV-MAPS FOR THE MU3E TRACKING DETECTOR
    - 3.1 Particle detection with silicon pixel sensors
      - 3.1.1 Energy deposition
      - 3.1.2 Charge collecting diode
      - 3.1.3 Charge sharing
      - 3.1.4 Transistors
    - 3.2 Pixel sensor technologies
      - 3.2.1 Hybrid pixel detectors
      - 3.2.2 Monolithic active pixel sensors (MAPS)
      - 3.2.3 High voltage monolithic active pixel sensors (HV-MAPS)
    - 3.3 MuPix10
      - 3.3.1 The HV-CMOS process
      - 3.3.2 Requirements
      - 3.3.3 The MuPix10 chip
      - 3.3.4 Towards MuPix11
    - 3.4 HV-MAPS beyond Mu3e
- III SLOW CONTROL & CROSSTALK OPTIMISATION
  - 4 MUPIX SLOW CONTROL
    - 4.1 The chip configuration infrastructure
    - 4.2 Development of the Mu3e Protocol
      - 4.2.1 Logic implementation & Verification
    - 4.3 MuPix9
      - 4.3.1 Implementation
      - 4.3.2 Commissioning
      - 4.3.3 Conclusion
    - 4.4 MuPix10
      - 4.4.1 MuPix10 Configuration Infrastructure
      - 4.4.2 Implementation
      - 4.4.3 The ADC
      - 4.4.4 First Commissioning
      - 4.4.5 Module Configuration Strategies & Performance Projection
      - 4.4.6 Conclusion
  - 5 ANALOGUE SIGNAL TRANSMISSION IN LARGE SCALE MONOLITHIC CHIPS
    - 5.1 Introduction
    - 5.2
      - 5.2.1 Source Follower
      - 5.2.2 RC-line
      - 5.2.3 Crosstalk
    - 5.3 Routing optimisation for MuPix10
    - 5.4 MuPix10 results
      - 5.4.1 Data & Analysis
      - 5.4.2 Crosstalk
      - 5.4.3 Delay
    - 5.5 Optimisation considerations towards MuPix11
    - 5.6 ATLASPix-like sensors
    - 5.7 Conclusion
- IV SUMMARY AND OUTLOOK
  - 6 SUMMARY AND OUTLOOK
- V APPENDIX
  - A PUBLICATIONS
  - B BIBLIOGRAPHY
  - C DANKSAGUNG

In this work the development of a high voltage monolithic active pixel sensor (HV-MAPS) for use in the Mu3e pixel tracker is described, with the focus on two specific developments. After an introduction to the Mu3e experiment and an explanation of its need for an ultra-thin and fast pixel tracker, an overview of silicon pixel detector basics and the available pixel detector types is given. Subsequently MuPix10, the latest in a series of HV-MAPS prototypes for Mu3e, is introduced and first results are presented. In chapter 4 the development of a new configuration interface for the MuPix chips is discussed. In the last part of the thesis (chapter 5) the issue of signal line crosstalk and, more generally, the influence of the signal lines on the sensor performance is discussed, and the results for a novel routing scheme are presented and explained. Finally the work is summarised and conclusions are drawn for the Mu3e experiment and the future of the MuPix development.

The complex data acquisition (DAQ) system was developed in collaboration with many colleagues. The author's focus and main solo development was the chip configuration interface. The developed software allows newly arriving prototypes with a similar configuration architecture to be implemented quickly. In this way 16 different sensor prototypes have been implemented, typically with a commissioning time below one week. The DAQ supports single pixel investigations as well as the MuPix-Telescope.

An important application of the configuration is the tuning of the sensor. The author initially implemented a fast tuning procedure utilising FPGA functionality for MuPix8, which could not make use of it. The procedure was finally adapted successfully by a student for use with MuPix10.

Since MuPix7 the author has acquainted himself with the chip design side of sensor development, with the goal of providing a fast feedback loop for the laboratory investigation of the devices and enabling a precise targeting of issues, as well as the specification of problems and wishes together with the designers. Since then the author has provided major input for all PCB designs which house an HV-MAPS pixel sensor in Heidelberg, as well as for all subsequent MuPix submissions.

Starting with MuPix8, the author also performed design work with the colleagues at KIT. In MuPix8 he designed a new type of shift register which subsequently became the baseline for all further HV-MAPS chips. In addition, a temperature stable current reference was ported from another project.

In MuPix9 the first version of a new slow control concept for the Mu3e pixel sensors was implemented by the author. The project was finalised in MuPix10 and is presented in this thesis. Additionally MuPix10 features a novel routing scheme for pixel-to-pixel connections, which was developed on the basis of experimental data with the goal of reducing the couplings between these connections, which lead to additional fake hits on the sensor. This development and its impact, worked out by the author, are also presented in the following. The author makes use of the MuPix-Telescope and its analysis framework, which was in large parts developed by a colleague. The author has, however, maintained and debugged the framework since the beginning, as it overlaps strongly with the DAQ for individual sensor investigations.

Besides these activities the author participated in or contributed to countless testbeam campaigns and decisively guided the laboratory efforts of the HV-MAPS characterisation in Heidelberg, resulting in numerous Bachelor and Master theses ranging from basic studies of the analogue circuits to involved studies of sensor efficiency and time resolution with testbeam data.

Part I

### INTRODUCTION

With the discovery of the Higgs particle at the Large Hadron Collider (LHC) [1, 2], all particles predicted by the Standard Model of particle physics (SM) have been found. The SM describes the elementary particles and their interactions, and it performs splendidly: so far all SM predictions of the particle interactions and their generation hold up.

However, it is the common understanding that the SM cannot be entirely correct, as it fails to integrate gravity as an additional force into its framework. Furthermore, the observed rotation and collisions of galaxies [3] and many other galactic structure formations cannot be explained by the amount of visible (luminous) matter. The existence of a dark matter particle, which interacts not at all or only weakly with luminous matter, can explain all these observations, but the SM does not offer a particle which fits this description. Finally, the SM also cannot explain the large abundance of matter compared to antimatter, the matter-antimatter asymmetry. The asymmetry present in the SM through CP violation is not large enough, suggesting that the nature of particle physics was different in the hotter, early universe and allowed for this large asymmetry.

Many beyond the SM theories naturally offer candidate particles and mechanisms to account for the effects mentioned above and always predict new particles at higher energy scales. This motivates the precision testing of SM predictions and the search for New Physics. Two different strategies are applied in these searches.

The first is the search for new particles at high energies, either with increased interaction rates at the High Luminosity upgrade of the LHC [4] or with the construction of an even more powerful ring accelerator, the Future Circular Collider (FCC) [5], to access centre of mass energies of up to 100 TeV.

The second is the precision investigation of SM parameters and of rare processes which can be enhanced by these heavy particles, e.g. entering at loop level. An effect not predicted in the Standard Model is neutrino oscillation, which is lepton flavour violating (LFV) and has by now been confirmed in many different experiments [6–8]. The existence of neutrino oscillations proves that neutrinos are massive particles. These masses are added ad hoc in the extended SM without providing a generation mechanism.

There is a large abundance of beyond the SM theories which predict LFV not only in the neutrino sector, but also for the charged leptons, e.g. supersymmetric (SUSY) models or models with new heavy gauge bosons [9]. In the muon sector in particular a large experimental effort is under way, searching for charged lepton flavour violation (cLFV). The three investigated processes are $\mu \rightarrow eee$, $\mu \rightarrow e\gamma$ and $\mu N \rightarrow eN$, the latter describing the muon to electron conversion in the vicinity of a nucleus. Figure 1.1 shows possible Feynman diagrams for the cLFV decay $\mu \rightarrow eee$. A comparison with the diagrams in figure 1.2 shows that both the $\mu \rightarrow eee$ and the $\mu \rightarrow e\gamma$ process can be enhanced by the same loop contributions. This favours the $\mu \rightarrow e\gamma$ channel for a cLFV search, as the branching ratio of the $\mu \rightarrow eee$ decay is suppressed by an additional factor $\alpha$ due to the photon conversion process. However, the $\mu \rightarrow eee$ channel gives an additional sensitivity to tree level processes, as depicted in figure 1.1c, which is not present in the $\mu \rightarrow e\gamma$ channel.

Figure 1.1: Charged lepton flavour violating $\mu \rightarrow eee$ decays.



Figure 1.2: Charged lepton flavour violating  $\mu \rightarrow e\gamma$  decays.

Table 1.1 shows the current experimental limits on the muon decay channels, and figure 1.3 shows a timeline of past, present and future searches in these channels.

| Channel                      | Experiment      | Branching ratio limit   |
|------------------------------|-----------------|-------------------------|
| $\mu^+ \to e^+ e^- e^+$      | SINDRUM [10]    | $< 1 \times 10^{-12}$   |
| $\mu^+ \to e^+ \gamma$       | MEG [11]        | $< 4.2 \times 10^{-13}$ |
| $\mu^- \mathrm{Au} \rightarrow e^- \mathrm{Au}$ | SINDRUM II [12] | $< 7 \times 10^{-13}$   |

Table 1.1: Experimental limits for the three muon decay channels with 90% confidence level (CL). SINDRUM II used a gold target.



Figure 1.3: History of searches for lepton flavour violating decays. Taken from [13] and modified with [14].

The SINDRUM experiment performed the last measurement of the $\mu \rightarrow eee$ channel in the 1980s. Thirty years later the Mu3e experiment aims to improve the current branching ratio limit, ultimately by four orders of magnitude. This is achieved by using silicon instead of gaseous detectors, which gives access to higher rates of muon decays. However, even in 2010 no pixel sensor technology was ready for use in a low-energy, high-rate experiment. This triggered the investigation of a newly proposed technology which promises to fulfil all experimental requirements. In this thesis essential work performed on the first full-scale prototype of the pixel sensor for the Mu3e experiment is presented.

### THE MU3E EXPERIMENT

The Mu3e experiment [15–17] searches for the charged lepton flavour violating decay $\mu^+ \rightarrow e^+e^-e^+$ with the ultimate goal of either observing the decay or excluding branching fractions $> 10^{-16}$ at 90% confidence level. To achieve this goal in a reasonable amount of time (one year of measurement time), continuous muon stopping rates of $2 \times 10^9$ Hz need to be provided, which currently no facility is capable of delivering. Therefore the experiment is conducted in a two-phase approach. In Mu3e phase 1 the $\pi E5$ beam line at the Paul Scherrer Institute (PSI) provides muon rates of up to $1 \times 10^8$ Hz, which allows a single event sensitivity of $2 \times 10^{-15}$ to be reached within a year of run time. The beam line that can provide the final rate required for the Mu3e experiment (phase 2) is currently under development at PSI in the high intensity muon beam (HIMB) project [18].
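The relation between stopping rate, run time and sensitivity quoted above can be illustrated with a rough estimate. The sketch below assumes that a "year" means a full calendar year of continuous beam and that the single event sensitivity is simply the inverse of the number of stopped muons times a combined efficiency factor. Both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed: full calendar year of beam

def stopped_muons(rate_hz, run_time_s=SECONDS_PER_YEAR):
    """Number of muons stopped for a constant stopping rate."""
    return rate_hz * run_time_s

def single_event_sensitivity(rate_hz, efficiency, run_time_s=SECONDS_PER_YEAR):
    """Branching ratio at which one signal event is expected on average."""
    return 1.0 / (stopped_muons(rate_hz, run_time_s) * efficiency)

def implied_acceptance(rate_hz, ses, run_time_s=SECONDS_PER_YEAR):
    """Combined efficiency and live-time factor implied by a quoted sensitivity."""
    return 1.0 / (stopped_muons(rate_hz, run_time_s) * ses)

phase1_muons = stopped_muons(1e8)   # phase 1 rate from the text
phase2_muons = stopped_muons(2e9)   # phase 2 rate from the text
phase1_factor = implied_acceptance(1e8, 2e-15)

print(f"phase 1: {phase1_muons:.2e} stopped muons per year")
print(f"phase 2: {phase2_muons:.2e} stopped muons per year")
print(f"implied combined efficiency factor in phase 1: {phase1_factor:.2f}")
```

Under these assumptions the phase 2 rate delivers more than $10^{16}$ stopped muons per year, which is the scale needed to probe branching ratios of $10^{-16}$. The implied efficiency factor is only a consistency check of the quoted numbers and not a value taken from the text.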

#### 2.1 THE $\mu^+ \rightarrow e^+ e^- e^+$ SIGNAL DECAY

In the extended SM, with massive neutrinos, the decay  $\mu \rightarrow \text{eee}$  is possible via loop processes with a neutrino oscillation as depicted in figure 2.1. The branching ratio of this decay channel is however suppressed to levels  $< 10^{-54}$  [19, 20], which makes it practically unmeasurable and therefore leaves the channel with no irreducible physics background. Any detected  $\mu \rightarrow \text{eee}$  event is therefore a clear sign for cLFV and physics beyond the SM.



Figure 2.1: The SM  $\mu \rightarrow$  eee decay via a loop with neutrino oscillation.

The kinematics of this three-body decay are used to identify $\mu \rightarrow eee$ candidates and to discriminate signal from background. The muons decay at rest and give the unique signature of three charged particles, two positrons and one electron, emerging from a common point at the same time. The sum of the decay particles' momenta $\vec{p}_i$ should therefore vanish, while the invariant mass is equal to the muon mass.

$$\left|\vec{p}_{tot}\right| = \left|\sum \vec{p}_i\right| = 0 \tag{2.1}$$



Figure 2.2: Signal topologies for a) the  $\mu \rightarrow$  eee decay, b) the internal conversion decay and c) combinatorial background from two Michel decays with a Bhabha scattering event which generates the electron.

$$m_{inv} = \left|\sum p_i\right| = \sum E_i = m_\mu = 105.66\,\mathrm{MeV} \tag{2.2}$$

The energy of the decay particles can reach a maximum of 53 MeV in a back-to-back topology.
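The two signal criteria, vanishing total momentum (2.1) and an invariant mass equal to the muon mass (2.2), can be checked numerically on a toy event. The following is a minimal sketch and not part of the Mu3e reconstruction. It assumes natural units, an electron mass of 0.511 MeV, and a hypothetical symmetric configuration in which the three particles are emitted in one plane at 120° to each other.

```python
import math

M_MU = 105.66  # muon mass in MeV, as quoted in equation (2.2)
M_E = 0.511    # electron mass in MeV (literature value, not from the text)

def four_vector(p_abs, phi, mass=M_E):
    """Return (E, px, py, pz) for a particle emitted in the x-y plane."""
    energy = math.sqrt(p_abs**2 + mass**2)
    return (energy, p_abs * math.cos(phi), p_abs * math.sin(phi), 0.0)

def total_momentum(particles):
    """Magnitude of the summed three-momenta, equation (2.1)."""
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    return math.sqrt(px**2 + py**2 + pz**2)

def invariant_mass(particles):
    """Invariant mass of the summed four-momenta, equation (2.2)."""
    energy = sum(p[0] for p in particles)
    p_tot = total_momentum(particles)
    return math.sqrt(max(energy**2 - p_tot**2, 0.0))

def symmetric_decay(p_abs):
    """Three particles with equal momentum at 120 degrees to each other."""
    return [four_vector(p_abs, k * 2.0 * math.pi / 3.0) for k in range(3)]

# Signal-like event: each particle carries one third of the muon rest energy.
p_signal = math.sqrt((M_MU / 3.0) ** 2 - M_E**2)
signal = symmetric_decay(p_signal)

# Event with missing energy: the visible particles carry only 25 MeV/c each.
lossy = symmetric_decay(25.0)

print(total_momentum(signal), invariant_mass(signal))
print(total_momentum(lossy), invariant_mass(lossy))
```

The second event is balanced in momentum but reconstructs to an invariant mass below the muon mass, which is the handle used against decays with undetected particles. In a real decay of that kind the undetected particles generally carry momentum as well, so the momentum sum would not vanish either.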

The mechanism which may create the lepton flavour violation is unknown. Therefore a description through a generalised Lagrangian [21] is used, which allows the influence of the interaction type on the decay kinematics to be studied.

#### 2.2 BACKGROUNDS

Although there is no irreducible physics background, two processes can generate fake signatures closely resembling the  $\mu \rightarrow$  eee kinematics. Figure 2.2 highlights the three different scenarios.

Combinatorial background is fuelled by high rates. The higher the rates, the more likely are accidental overlaps of different Michel-decays, pictured in figure 2.3a, with additional electrons from photon conversions or Bhabha scattering. To suppress this background an excellent vertex and time resolution is required, as the overlapping processes will not share a common vertex and decay time. For the phase 2 experiment a time resolution below 100 ps is required to disentangle the decay times of the individual decays.

The internal conversion decay $\mu^+ \rightarrow e^+ e^- e^+ \nu_e \bar{\nu}_\mu$ (figure 2.3b) gives the same signature of three charged particles with a common vertex, but a fraction of the energy is carried away by the undetected neutrinos. A measurement of the invariant mass of the decay particles will therefore result in a value smaller than the muon mass.

To achieve the goal of a $2 \times 10^{-15}$ sensitivity with a $2\sigma$ cut on the signal region, a mass resolution better than 1 MeV is required, as can be read off figure 2.4.



Figure 2.3: The SM muon decays relevant for the largest background contributions.



Figure 2.4: The reconstructed mass resolution which is required to suppress the internal conversion decays to certain fractions.

Both vertex and momentum resolution depend heavily on the material budget of the pixel tracker layers and, in the case of the vertex resolution, also on the stopping target. Due to their low momenta the decay particles are subject to strong multiple Coulomb scattering (MS), as described by the Highland formula [22], with $p$ the particle momentum, $\beta c$ its velocity, $z$ its charge number, $x$ the material thickness and $X_0$ the radiation length of the material.

$$\Theta_{rms} = \frac{13.6\,\mathrm{MeV}}{\beta c p}\, z \sqrt{\frac{x}{X_0}} \left[ 1 + 0.038 \ln\left(\frac{x}{X_0}\right) \right] \tag{2.3}$$

Figure 2.5 highlights the deflection of the trajectories of low energy particles, which is the main uncertainty in their reconstruction. To reduce MS effects, minimising the material budget remains the only option to achieve the required background suppression. The material budget goal is one per mille of a radiation length.
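To give a feeling for the size of the effect, equation (2.3) can be evaluated for a single tracker layer. The sketch below assumes an electron with a momentum of 30 MeV/c, a value chosen purely for illustration within the kinematic range of the decay, and a layer thickness equal to the material budget goal of 0.1% of a radiation length. The function name is hypothetical.

```python
import math

M_E = 0.511  # electron mass in MeV/c^2 (literature value, not from the text)

def highland_theta_rms(p_mev, x_over_x0, mass_mev=M_E, z=1):
    """RMS multiple scattering angle in rad following equation (2.3).

    p_mev is the momentum in MeV/c, x_over_x0 the thickness in radiation lengths.
    """
    energy = math.sqrt(p_mev**2 + mass_mev**2)
    beta = p_mev / energy
    log_term = 1.0 + 0.038 * math.log(x_over_x0)  # natural logarithm
    return 13.6 / (beta * p_mev) * z * math.sqrt(x_over_x0) * log_term

# Illustrative values: 30 MeV/c electron, one layer of 0.1% of a radiation length.
theta = highland_theta_rms(30.0, 1e-3)
print(f"theta_rms = {theta * 1e3:.1f} mrad")
```

For these assumed values the scattering angle per layer is roughly 10 mrad. The angle scales inversely with momentum and approximately with the square root of the thickness, which is why reducing the material is the decisive lever for low momentum tracks.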



Figure 2.5: Particle tracking in the multiple scattering regime.

#### 2.3 THE EXPERIMENTAL CONCEPT

Figure 2.6 sketches the experimental concept of the Mu3e experiment [17]. A surface muon beam hits a hollow double-cone target fabricated from Mylar, which stops the muons. The target is surrounded by four radial layers of HV-MAPS pixel sensors which form the pixel tracker and observe the muon decays at rest in a 1 T solenoidal field aligned along the beam axis. The two layers close to the target act as the vertex detector; the outer layers enable the determination of the bending radius of the charged decay particles and thereby their momentum. The momentum resolution improves for particles which "recurl" in the magnetic field and intersect the outer layers again. To increase the number of observed recurlers, the pixel tracker layers are extended upstream and downstream. The pixel detector is cooled by flows of helium gas.

The Mu3e detector is complemented by two timing detector systems utilising scintillating fibres and scintillating tiles respectively, which provide an excellent time resolution to guarantee the suppression of combinatorial background and cope with the high particle rates.

The data generated by the detectors are streamed out to their respective front-end boards, where they are time sorted and merged. In the next step the so-called switching boards distribute time slices of sorted data to a farm of graphics processing units (GPUs), which run highly parallel tracking and triple vertex finding algorithms on the data. Only data that correspond to a triple vertex originating from the target region are kept for further analysis [23]. The data flow is depicted in figure 2.7, from the detectors, which produce approximately 100 Gbit/s, to the filter farm, which reduces the data to (50-100) MB/s that are sent to permanent storage. These estimates do not include any noise contributions, but represent the pure decay particle induced data in Mu3e phase 1. The sensors closest to the target need to handle hit rates of 5 MHits/s.
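The data reduction achieved by the filter farm follows directly from the quoted rates. The short calculation below is a sketch that assumes decimal units (1 Gbit = $10^9$ bit, 1 MB = $10^6$ byte); the function names are hypothetical.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a data rate from Gbit/s to MB/s using decimal prefixes."""
    return gbit_per_s * 1e9 / 8 / 1e6

def reduction_factor(input_mb_per_s, output_mb_per_s):
    """Ratio between the incoming and the stored data rate."""
    return input_mb_per_s / output_mb_per_s

detector_rate = gbit_to_mbyte_per_s(100)              # detector output from the text
factor_high = reduction_factor(detector_rate, 50.0)   # lower storage estimate
factor_low = reduction_factor(detector_rate, 100.0)   # upper storage estimate

print(f"detector output: {detector_rate:.0f} MB/s")
print(f"reduction factor: {factor_low:.0f} to {factor_high:.0f}")
```

With these conventions the online selection has to reduce the data volume by a factor of roughly 125 to 250.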



(a) Cut along the beam axis.





Figure 2.6: Schematic view of the Mu3e detector concept.

#### 2.4 THE MU3E PIXEL TRACKER MODULE

The most stringent requirement on the pixel tracker is the allowed material budget, which directly limits the physics performance of the Mu3e experiment. A ladder consists of up to 18 sensors glued and bonded to a two-layer aluminium high density interconnect (HDI); multiple ladders form a module, as depicted in figure 2.8a. The cross section of the HDI is sketched in figure 2.8b; it accounts for a thickness of 0.05% of a radiation length. With the 50 µm thick sensors glued to it, the thickness increases to 0.115% of a radiation length, which satisfies the Mu3e requirements.
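The contribution of the thinned sensor to this budget can be estimated from the radiation length of silicon. The sketch below assumes a literature value of about 9.37 cm for silicon, which is not quoted in the text, and treats the sensor as pure silicon; the function name is hypothetical.

```python
X0_SILICON_MM = 93.7  # radiation length of silicon in mm (assumed literature value)

def thickness_in_x0(thickness_um, x0_mm):
    """Thickness expressed as a fraction of a radiation length."""
    return thickness_um * 1e-3 / x0_mm

sensor = thickness_in_x0(50.0, X0_SILICON_MM)  # 50 um sensor from the text
hdi = 0.05e-2                                  # HDI contribution from the text
total_quoted = 0.115e-2                        # total per layer from the text
remainder = total_quoted - sensor - hdi        # not itemised in the text

print(f"sensor:    {sensor * 100:.3f} % X0")
print(f"HDI:       {hdi * 100:.3f} % X0")
print(f"remainder: {remainder * 100:.3f} % X0")
```

Under this assumption the sensor and the HDI contribute about equally, with roughly 0.05% of a radiation length each, and a small remainder is left for the other materials in the stack.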

The HDI needs to provide power for up to 9 chips, which in a worst case scenario amounts to 30 W of electrical power. This corresponds to more than 15 A, which need to be distributed among the chips without large ohmic losses on the HDI itself. To guarantee a power routing design with minimal ohmic losses, the largest part of the aluminium routing space is needed for the power lines, requiring a rigorous reduction of the remaining signal lines to the bare minimum. This comprises 9 differential data output lines and a two-signal differential bus system for clocking and configuration. A configuration interface with a single differential input line is not a standard solution. It was specifically developed and tailored to the needs of the MuPix sensors and tracker modules. The development of this interface is described in chapter 4.

Figure 2.7: The MuPix readout system.

Figure 2.8: CAD drawing of an Mu3e tracker module and the material stack of the HDI.

Part II

HV-MAPS

#### AN HV-MAPS FOR THE MU3E TRACKING DETECTOR

The requirements on vertex tracking detectors in particle physics are constantly increasing towards higher vertex resolutions, but more importantly towards higher particle rates. In the 1980s this led to the emergence of the first silicon strip detectors used for vertex detection [24], as the commonly used gaseous detectors could no longer handle the particle rates. Since then silicon detectors have formed the centrepiece of many high energy particle physics experiments, including the current experiments at the highest particle energies, ATLAS and CMS [25, 26], and many more.

With the Mu3e experiment a new challenge arises for silicon tracking detectors. The detector system needs to handle the high particle rates originating from 10<sup>8</sup> muon decays per second, while maintaining an ultra-low material budget to minimize the degradation of the tracking performance through multiple Coulomb scattering. At the time of the proposal [15] no common silicon pixel detector technology could satisfy the experimental requirements, which led to the pursuit and advancement of a newly proposed pixel detector concept, the high voltage monolithic active pixel sensors (HV-MAPS) [27].

In a series of pixel sensor prototypes [28–32] the technology was investigated and developed, culminating in the first large scale sensor design. Starting from this the goal is to mould a full scale pixel sensor which fulfils all experimental requirements concerning operation, performance and form factor to build the Mu3e tracking detector.

The objective of this thesis is to create a pixel sensor that eases the achievement of the material budget requirement for the pixel modules (section 4) by optimizing its configuration interface and to optimize the chip internal routing of the individual pixels to combat the "signal line crosstalk" which was observed for the predecessor MuPix8 (section 5).

#### 3.1 PARTICLE DETECTION WITH SILICON PIXEL SENSORS

A charged particle which traverses a volume of matter will inevitably interact with its electrons and nuclei. On the one hand this leads to a deflection of the particle from its original path through multiple Coulomb scattering; on the other hand the interactions deposit energy in the material. In silicon this deposited energy creates electron-hole pairs which can be collected and measured, e.g. to pinpoint the particle's crossing location. In the following only the interaction of electrons and positrons with matter is discussed in detail. A comprehensive summary of all interaction types can be found in [22].

#### 3.1.1 Energy deposition

The interaction of a charged particle with matter is a stochastic process. The number as well as the strength of the interactions can vary widely, but the evolution of the mean energy loss of the particle is well understood and described by the "Bethe formula" [33]. For electrons and positrons, this formula has to be modified to account for the small mass of the traversing particle as well as for the indistinguishability of incident and scattered electrons, which does not apply to positrons and leads to different energy losses for the two species. The result is the Berger-Seltzer formula [34].

$$-\left\langle \frac{dE}{dx}\right\rangle = \rho \frac{0.153536}{\beta^2} \frac{Z}{A} \cdot \left(B_0(T) - 2\log\left(\frac{I}{m_e c^2}\right) - \delta\right) \tag{3.1}$$

The energy loss depends on the material and momentum dependent stopping power  $B_0(T)$ , the material dependent mean excitation energy *I*, the density correction  $\delta$ , the atomic number *Z*, the mass number *A* and the material density  $\rho$ . The energy loss for silicon is plotted in figure 3.1 as a function of the momentum p. In the case of thin detector layers this formula overestimates the



Figure 3.1: Mean energy loss of electrons and positrons in silicon from 50 keV to 10 GeV with values from [34]. Electrons in red, positrons in blue.

amount of deposited charge, as especially single interactions with a large energy transfer are possibly not contained in the small detector volume. It describes the mean energy loss of the particle, but not the amount of energy deposited inside the detector volume. This is discussed in [22], which advocates the most probable value as a better suited parameter, as depicted in figure 3.2.

(a) Straggling functions normalized at the most probable value.

(b) Most probable energy loss scaled to the mean energy loss.

Figure 3.2: Energy loss of a 500 MeV pion in thin silicon layers.

In addition to the ionization loss there are also bremsstrahlung effects, which can be described by equation 3.2 for relativistic particles ( $\beta^2 \approx 1$ ) [35]. The energy loss is proportional to the particle energy *E* and inversely proportional to the material dependent radiation length  $X_0$ . As in this work deliberately only very thin material layers are used, bremsstrahlung is not a relevant effect.

$$-\frac{dE}{dx} = \frac{E}{X_0} \tag{3.2}$$

The radiation length is given by the approximation in equation 3.3, which also takes into account Coulomb screening effects of the nuclear potential, with *A* the atomic mass number and *Z* the atomic charge number.

$$X_0 = \frac{716.4 \text{ g/cm}^2 \cdot A}{Z(Z+1) \cdot \log(287/\sqrt{Z})} \tag{3.3}$$
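As a rough numerical cross-check of equation 3.3, the following sketch evaluates the radiation length of silicon and expresses a 50 µm thick sensor in units of $X_0$. The material constants (Z = 14, A = 28.0855, density 2.329 g/cm³) are standard values that are not quoted in the text above, and the function name is chosen for illustration only.

```python
import math

def radiation_length_g_cm2(Z: float, A: float) -> float:
    """Approximate radiation length in g/cm^2 following equation 3.3."""
    return 716.4 * A / (Z * (Z + 1) * math.log(287.0 / math.sqrt(Z)))

# Assumed material constants for silicon (not taken from the text).
Z_SI = 14          # atomic charge number
A_SI = 28.0855     # atomic mass number in g/mol
RHO_SI = 2.329     # density in g/cm^3

x0_mass = radiation_length_g_cm2(Z_SI, A_SI)  # radiation length in g/cm^2
x0_cm = x0_mass / RHO_SI                      # radiation length in cm

thickness_cm = 50e-4                          # 50 um thick sensor
fraction = thickness_cm / x0_cm               # thickness in units of X0

print(f"X0(Si) = {x0_mass:.1f} g/cm^2 = {x0_cm:.2f} cm")
print(f"50 um of silicon = {100 * fraction:.3f} % of X0")
```

With these inputs the approximation gives roughly 22 g/cm², i.e. about 9.5 cm, so that a 50 µm thick silicon sensor corresponds to about 0.05% of a radiation length.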

#### 3.1.2 Charge collecting diode

Silicon (Si) is a semiconductor with a diamond-like crystal structure, in which each Si atom (4 valence electrons) forms 4 covalent bonds with its 4 neighbouring atoms. At a temperature of 300 K about 10<sup>10</sup>/cm<sup>3</sup> of these bonds are broken due to thermal excitation, which releases the electrons into the crystal lattice and leaves a "hole" in the covalent bond structure. Both the electron and the hole can move freely, however with different mobilities,  $\mu_e = 1400 \text{ cm}^2/(\text{Vs})$ and  $\mu_h = 450 \text{ cm}^2/(\text{Vs})$ , which leads to different drift velocities in an electric field,  $v = \mu E$ . If a free electron encounters a hole in the lattice, they can combine to restore the covalent bond, the so-called recombination.

With  $10^{10}$ /cm<sup>3</sup> free electrons available as charge carriers, silicon is not a good conductor. To enhance its conductivity, atoms with a different number of valence electrons can be added to the lattice structure (doping). By adding boron, with 3 valence electrons, a missing covalent bond is forced and an additional hole is created (p-doping). With phosphorus, with 5 valence electrons, the additional electron is not bound in the crystal bond structure and is thermally excited into the crystal lattice, adding a free electron (n-doping). Typical doping concentrations add  $10^{15}$ /cm<sup>3</sup> "impurities" to the lattice and thereby the same amount of free charge carriers. The doped silicon is a better conductor, but its most prominent use is the combination of P- and N-doped silicon, forming a PN-junction, also called a diode. Brought into contact, electrons from the N-doped silicon diffuse into the P-doped material, leaving behind positively charged impurities in the lattice. Vice versa, the holes diffuse into the N-doped material and leave behind a negative charge. Due to this charge-up an electric field develops at the junction between the materials, which opposes the diffusion of the free charges. Charges caught in the electric field region drift along the field, effectively keeping the region clear of free charge carriers. The region is therefore referred to as depletion region. The potential which is created by the electric field is called built-in potential.

If an external voltage is applied to the diode which opposes the built-in potential (forward bias), a notable current starts to flow once the applied voltage surpasses the built-in potential, which is 0.7 V for silicon. An external potential which aligns with the built-in potential (reverse bias) leads to an enlarged depletion zone and a stronger electric field. Only a small current flows, generated by electron-hole (e-h) pairs that are created stochastically through thermal excitation. The charges are separated by the field: electrons drift to the N-doped area and holes to the P-doped area. The current-voltage behaviour of the diode is described by the Shockley diode equation [36]. For large electric field strengths ( $\approx 3 \times 10^5$ V/cm) the drifting charge carriers can generate new e-h pairs in collisions with the lattice. A self-sustained avalanche of charge carriers is generated, causing a large current. The externally applied voltage at which this occurs is referred to as breakdown voltage.

The depletion zone of a reverse-biased diode can be used for particle detection. A particle which passes through the depletion zone deposits energy along its path as described before. The deposited energy is transformed into e-h pairs with an average generation energy of 3.6 eV per pair. These pairs are separated by the electric field, which causes a current flow that is measured. The wider the depletion zone, the larger the number of generated electron-hole pairs.
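The conversion from deposited energy to signal charge can be illustrated with a few lines of code. This is only a sketch: the 3.6 eV per pair is the value quoted above, whereas the deposited energy of 10 keV is an arbitrary example value and the function names are hypothetical.

```python
E_PAIR_EV = 3.6          # mean energy per e-h pair in silicon (from the text)
E_CHARGE_C = 1.602e-19   # elementary charge in coulomb

def pairs_from_deposit(e_dep_ev: float) -> float:
    """Mean number of e-h pairs created by a deposited energy given in eV."""
    return e_dep_ev / E_PAIR_EV

def signal_charge_fc(e_dep_ev: float) -> float:
    """Charge of one carrier type in femtocoulomb for a deposited energy in eV."""
    return pairs_from_deposit(e_dep_ev) * E_CHARGE_C * 1e15

e_dep = 10e3  # assumed example deposit of 10 keV
print(f"{pairs_from_deposit(e_dep):.0f} e-h pairs, {signal_charge_fc(e_dep):.2f} fC")
```

For the assumed 10 keV deposit this corresponds to somewhat less than 3000 e-h pairs, i.e. a signal charge below 0.5 fC, which illustrates why a low-noise amplifier directly at the diode is needed.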

Figure 3.3 shows a typical scenario for the use of a diode in a particle physics detector. The diode is created by a strongly doped implant inside the weakly doped bulk material, which results in an asymmetric formation of the depletion zone. The development of the potential and the electric field based on the doping concentration and the external potential is described by the Poisson equation.

$$\frac{d^2 U(x)}{dx^2} = -\frac{\rho(x)}{\epsilon_0 \epsilon_r} \tag{3.4}$$

In the case of a strongly asymmetric doping strength the width of the depletion zone *w* is approximated by

$$w = \sqrt{\frac{2\epsilon_0\epsilon_r}{e} \cdot \frac{1}{N_A} \cdot U} \tag{3.5}$$



Figure 3.3: A reverse biased diode. The depletion zone width scales with the applied voltage.

and the maximum field strength is

$$E_{max} = \sqrt{\frac{2eN_A}{\epsilon_0\epsilon_r} \cdot U} \tag{3.6}$$

with  $\epsilon_r$  the relative permittivity of silicon, *e* the elementary charge,  $N_A$  the acceptor (p-)doping concentration and *U* the applied reverse bias voltage. The doping of silicon wafers is typically given in the form of the resistivity  $\rho$  [ $\Omega$ cm], which is connected to the doping concentration via the mobility  $\mu$ ,

$$\rho = \frac{1}{eN\mu} \tag{3.7}$$

which, using  $1/(eN_A) = \rho\mu_p$  for the p-doped substrate, allows *w* to be reformulated as

$$w = \sqrt{2\epsilon_0 \epsilon_r \mu_p \rho U} \tag{3.8}$$

The diode forms a capacitance with the depletion zone as dielectric, which represents the detector capacitance  $C_d$ . For the simple case shown above it can be calculated as

$$C_d = \epsilon_0 \epsilon_r \frac{A}{w} \tag{3.9}$$

with *A* the area of the junction.
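To get a feeling for the orders of magnitude, the following sketch combines equations 3.5 and 3.7 to evaluate the depletion width from the substrate resistivity and then applies equation 3.9. It is a planar, one-dimensional estimate only. The relative permittivity of 11.7, the example operating point (200 Ω cm, 60 V) and the use of the full 80 × 80 µm² pixel as junction area are assumptions made for illustration; the hole mobility is the value quoted earlier in this section.

```python
import math

EPS0 = 8.854e-14   # vacuum permittivity in F/cm
EPS_R_SI = 11.7    # relative permittivity of silicon (assumed value)
MU_P = 450.0       # hole mobility in cm^2/(Vs), as quoted in the text

def depletion_width_cm(rho_ohm_cm: float, bias_v: float) -> float:
    """Depletion width w = sqrt(2 eps0 eps_r mu_p rho U) for a one-sided junction."""
    return math.sqrt(2.0 * EPS0 * EPS_R_SI * MU_P * rho_ohm_cm * bias_v)

def diode_capacitance_f(area_cm2: float, width_cm: float) -> float:
    """Parallel-plate estimate of the detector capacitance (equation 3.9)."""
    return EPS0 * EPS_R_SI * area_cm2 / width_cm

# Assumed example operating point: 200 Ohm cm substrate at 60 V reverse bias.
w = depletion_width_cm(200.0, 60.0)
# Assumed junction area: a full 80 um x 80 um pixel.
c_d = diode_capacitance_f((80e-4) ** 2, w)

print(f"w   = {w * 1e4:.1f} um")
print(f"C_d = {c_d * 1e15:.1f} fF")
```

Under these assumptions the depletion width comes out at a few tens of micrometres and the capacitance at a few tens of femtofarad. In a real pixel the implant geometry and sidewalls change both numbers, which is why the simulations mentioned in the following are needed for quantitative statements.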

These are good approximations for many simple considerations. In reality, however, the structures are much more complex and require a computer-aided approach to get the full picture, as shown in figure 3.4, e.g. to investigate the exact depletion behaviour, determine the expected breakdown voltage or simulate the passage of a charged particle. The charge collection is described by the Shockley-Ramo theorem, which describes the induction of a current on the electrode as soon as the e-h pairs start to separate in the electric field. The charges generated in the depletion zone of a MuPix chip ( $w \approx 30 \mu$ m) are typically collected within 500 ps.



Figure 3.4: The simulation of three pixel diodes (MuPix10) with a substrate resistivity of  $200 \Omega$  cm at a reverse bias of 100 V [37].

#### 3.1.3 Charge sharing

The surface of a silicon wafer can be pixelated by creating a grid of shallow, strongly doped implants, each of them forming a diode with the substrate. Figure 3.5 shows a cross section of three neighbouring diodes, with three cases of particles crossing the depletion zone, which is the active detector volume. Each particle creates e-h pairs along its path. If it crosses perpendicularly and centrally inside the pixel, the charges are collected by a single diode. If it traverses at the border of two pixels, charges can be created in both. The generated charges are transported by the electric field of the diode in which they are generated. As the charge is split between multiple pixels, this is referred to as charge sharing. An inclined particle trajectory can penetrate multiple pixels and create e-h pairs in all of them.
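The purely geometric part of charge sharing for inclined tracks can be sketched in one dimension as follows. The model assumes a straight track, a depleted depth of 30 µm and a pixel pitch of 80 µm (both taken as example values matching the numbers used elsewhere in this chapter) and ignores diffusion and the field shape at the pixel border, so it does not describe the sharing for perpendicular tracks at the pixel edge. The function name is hypothetical.

```python
import math

def pixels_crossed(x_entry_um: float, angle_deg: float,
                   depth_um: float = 30.0, pitch_um: float = 80.0) -> list:
    """Indices of the pixels (1D) whose depleted volume a straight track crosses.

    x_entry_um: entry position at the sensor surface
    angle_deg:  track inclination with respect to the surface normal
    """
    # Lateral position at which the track leaves the depleted volume.
    x_exit_um = x_entry_um + depth_um * math.tan(math.radians(angle_deg))
    first = math.floor(min(x_entry_um, x_exit_um) / pitch_um)
    last = math.floor(max(x_entry_um, x_exit_um) / pitch_um)
    return list(range(first, last + 1))

print(pixels_crossed(40.0, 0.0))   # perpendicular, central track
print(pixels_crossed(70.0, 45.0))  # inclined track close to the pixel border
```

In this simple picture a track only shares charge if its lateral displacement inside the depleted volume, $w \tan\theta$, carries it across a pixel boundary, which for a thin depletion zone and a large pitch happens only for entry points near the border or for strongly inclined tracks.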



Figure 3.5: A typical case of a diode structure of HV-MAPS with the corresponding depletion zone. From left to right particles cross the detector volume inclined, centered and at the border between two pixels. For some electrons the approximate drift path to the n-implant is shown.

#### 3.1.4 Transistors

To make use of the signal that is generated by the charge collection of the diode, it needs to be further processed. For this purpose circuitry is needed that amplifies, digitizes and reads out the signal, the typical front-end electronics. It is implemented as integrated circuits utilizing the same micro-fabrication technologies as the diode itself. The transistor technology used in this work is the metal-oxide-semiconductor field-effect transistor (MOSFET), which is available in two flavours, NMOS and PMOS. Figure 3.6 sketches the two types. Each transistor has four terminals: Source (S), Gate (G), Drain (D) and Bulk (B). The NMOS transistor consists of n-doped implants in a p-substrate and the PMOS transistor vice versa. The current flowing between the source and drain contacts is steered via the gate-source voltage ( $U_{GS}$ ) and the drain-source voltage  $(U_{DS})$ . A MOS transistor is characterised by its threshold voltage  $U_{th}$ , above which a current can flow between source and drain. Besides that, the performance of a transistor is determined by its form factor, given by the gate length *L*, the distance between source and drain, and the gate width *W*. The current flowing through the transistor is proportional to  $\frac{W}{L}$ . Typical design choices are long transistors used as current sources and wide transistors used as input transistors and cascodes.

Depending on the applied voltages, the transistor can be operated in different states:



Figure 3.6: MOSFET transistors.

CUT-OFF/WEAK INVERSION In the case  $U_{GS} < U_{th}$  the transistor can be considered as "switched-off", only a very small current can flow, often perceived as leakage current.

LINEAR REGION  $U_{GS} > U_{th}$  but  $U_{DS} < U_{GS} - U_{th}$ . Here the gate voltage is large enough that a conducting channel is formed below the gate and connects the source and drain contacts.  $U_{GS}$  controls the charge density in the channel and thereby its "resistance". The transistor behaves like an adjustable resistor.

SATURATION  $U_{GS} > U_{th}$  and  $U_{DS} \geq U_{GS} - U_{th}$ . The channel is "pinched-off", which leaves only a small dependence of the current on  $U_{DS}$  due to channel length modulation, which is minimal for long transistors. The transistor has a very large output resistance  $R_{DS} = U_{DS}/I_{DS}$  and behaves as a voltage controlled current source,  $I_{DS} = g_m \cdot U_{GS}$ . Almost all transistors in analogue circuitry are operated in this state.

SUBSTRATE/BODY EFFECT Typically the source and bulk contacts are shorted ( $U_{SB} = 0$ ). In some cases, e.g. for a source follower circuit as relevant later, this is not the case. As the conducting channel in the transistor is formed between bulk and gate, an increase of  $U_{SB}$  leads to an increase of the threshold voltage,  $U_{th} \propto \sqrt{U_{SB}}$ , referred to as the bulk effect. For a source follower this leads to a reduction of the gain (< 1), which makes it a non-ideal buffer element.
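The operating states listed above can be summarised in a small helper, written here for an NMOS transistor with all voltages counted positive and the body effect neglected. The function name and the example voltages are purely illustrative.

```python
def nmos_region(u_gs: float, u_ds: float, u_th: float) -> str:
    """Classify the operating region of an idealised NMOS transistor."""
    if u_gs < u_th:
        # No channel is formed, only a small leakage current flows.
        return "cut-off"
    if u_ds < u_gs - u_th:
        # Channel connects source and drain: adjustable resistor.
        return "linear"
    # Channel is pinched off: voltage controlled current source.
    return "saturation"

# Example with an assumed threshold voltage of 0.5 V.
for u_gs, u_ds in [(0.3, 1.0), (1.2, 0.2), (1.2, 1.0)]:
    print(u_gs, u_ds, nmos_region(u_gs, u_ds, u_th=0.5))
```

For a PMOS transistor the same classification applies with the signs of all voltages inverted.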

#### 3.2 PIXEL SENSOR TECHNOLOGIES

In today's world, pixel sensors are everywhere. Almost everyone carries camera sensors in their pocket, either a charge-coupled device (CCD) or an active-pixel sensor (APS/CMOS sensor). The world's demand for integrated circuits greatly drives the advancement of microelectronics technology in the commercial sector, which in turn feeds back into science, offering new, smaller technologies and chip handling techniques, e.g. bump bonding and thinning processes down to  $10 \,\mu$ m [38]. In particle physics mostly two types of silicon devices are used: single-photon avalanche diodes (SPAD), which allow photo-multipliers with an extremely small form factor to be built, and, as discussed before, sensors that just collect charge with a diode structure. In the following only the latter is relevant, which forms the baseline for the most common tracking detector technologies.

#### 3.2.1 Hybrid pixel detectors

The charge collection diode structure and the front-end readout are produced as separate chips and hybridised in an additional production step, for pixel tracking detectors commonly by flip-chip bump bonding of the two chips. While the front-end chip can be produced in commercially available processes, the diode structures need to be specifically developed, which together with the hybridisation step leads to high production costs. The possibility to combine one type of readout chip with different charge collection structures, e.g. planar or 3D diodes, but also electrodes in diamond, gives this detector type a very high level of versatility. Figure 3.7 shows a typical cross section of a hybrid stack. The material budget of pixel modules with hybrid designs is typically larger than  $2\% X_0$  [39] (300 µm thick sensor). Hybrid detectors are an established technology used to construct the largest tracking detectors in high energy physics and also form the baseline for the upgrade of the ATLAS and CMS pixel trackers. Their new front-end chip is developed jointly by ATLAS and CMS in the RD53 collaboration [40].



Figure 3.7: A typical cross section of a hybrid sensor design. The diode is bump bonded to the front-end chip which processes the incoming current signal from the diode. Taken from [41]

#### 3.2.2 Monolithic active pixel sensors (MAPS)

In contrast to the hybrid approach, monolithic designs combine detection and front-end electronics on the same chip. The use of commercially available CMOS processes and the avoidance of a hybridisation step greatly reduce the production cost and allow large detector surfaces to be equipped at much lower cost. The term active pixel describes the combination of detection diode and pixel electronics inside the pixel structure. The detection diode of typical MAPS chips creates only a small depletion zone and makes use of a process with an epitaxial layer (8 k $\Omega$  cm) to confine the charges of crossing particles to a 20 µm thick layer. The charge collection via diffusion and the use of low power front-end electronics limit the time resolution to the order of µs. At the same time the small charge collection region allows large portions of the initial wafer thickness to be thinned away, reducing the sensor thickness down to 50 µm, which reduces the sensor material budget below 0.1% of a radiation length.

The chips of the Mimosa series have pioneered this detector approach, with the Mimosa26 sensor equipping the EUDET tracking telescopes [42] and the Ultimate-2 (Mimosa28) sensor used in the first large scale application as vertex detector in the STAR experiment [43]. Based on this development the ALPIDE chip was developed in the 180 nm TowerJazz (TJ) process. The ALPIDE sensor [44] is used for the upgrade of the inner tracking system (ITS) of the ALICE experiment at CERN with a total detector surface of about  $10 \text{ m}^2$ .



Figure 3.8: Cross section view of the well structure of the ALPIDE MAPS chip [44]. The circuitry is implemented around the tiny n-well electrode in a deep-well separating it from the epitaxial layer.

#### 3.2.3 High voltage monolithic active pixel sensors (HV-MAPS)

The use of a depleted diode as detection element enables charge collection via drift, which allows the HV-MAPS principle [27] to combine the advantages of the monolithic approach with a fast charge collection, giving time resolutions in the order of ns. To produce the sensors, commercially available HV-CMOS processes are used, which are common in the automotive industry and guarantee low production costs. The use of a deep n-well implant allows a negative bias to be applied to the p-substrate, which depletes the junction between n-well and p-substrate, representing the active detector volume, while at the same time CMOS circuitry can be implemented inside the n-well. A further advantage of the n-well pixelation is that a complete decoupling of analogue circuitry and clocked logic is possible when they are placed in different n-wells, reducing the probability of crosstalk from the clocked domain into the analogue electronics. HV-MAPS typically use substrate resistivities of (10-1000)  $\Omega$  cm and reverse biases of (60-120) V. In this way the depletion zone thickness can be controlled and the remaining non-depleted silicon can be removed, conserving the material budget advantage of MAPS.

With its combination of fast readout, good time resolution and controllable material budget, this technology is a perfect fit for the Mu3e experiment. It was consequently chosen and further developed to enable the construction of the Mu3e pixel tracker.



Figure 3.9: Sketch of the HV-MAPS principle [27]. The pixel electronics is implemented inside the n-well electrode.

# 3.3 MUPIX10

Previous MuPix prototypes solved critical challenges in the maturing of the HV-MAPS technology, namely the high integration level of the readout [32] and the technology scaling to produce 2 cm long sensors while maintaining the excellent sensor performance [45]. For MuPix10 the design goal is to provide a full-scale HV-MAPS chip which fulfils all requirements in its dimensions and connection interface to enable the production of Mu3e pixel tracker modules.

## 3.3.1 The HV-CMOS process

The MuPix10 is produced in a 180 nm HV-CMOS process provided by TSI Semiconductors<sup>1</sup>. Previous prototypes had been produced in the same process node by ams<sup>2</sup>, which was discontinued. Both processes are based on the original IBM 180 nm process, which allowed a smooth transition. The TSI process was validated by resubmitting the MuPix7 design, which showed no notable differences [46].

A cross section of the process is shown in figure 3.10. CMOS electronics are implemented in deep n-wells, which are decoupled from each other as they form a depletion zone with the p-doped substrate. On the one hand this allows analogue and clocked digital electronics to be separated from each other, preventing crosstalk between them. On the other hand the depletion can also be used for particle detection.

Figure 3.10: The well structure of the used HV-CMOS process. The deep n-well forms a depleted diode with the substrate, which can be used as active detector and allows electronics in different wells to be decoupled.

Although the foundry typically uses a standard substrate ( $20 \Omega$  cm) for chip manufacturing, they have been convinced to use higher-ohmic substrates in the HV-MAPS engineering production runs, as this allows larger depletion widths to be achieved at the same high voltage,  $w \propto \sqrt{\rho U}$ . The commonly used substrates are listed in table 3.1. For the production of MuPix10 the 200  $\Omega$  cm substrate was chosen to allow for a full depletion of the sensor after it is thinned down to 50 µm, in case the breakdown voltage is similar to MuPix8 (-60 V). Figure 3.11 illustrates the depletion depth development for different resistivities.

<sup>1</sup> TSI Semiconductors, USA, http://www.tsisemi.com

<sup>2</sup> ams AG, Austria, http://www.ams.com

| Name     | standard            | $80\Omega\mathrm{cm}$ | $200\Omega\mathrm{cm}$        |
|----------|---------------------|-----------------------|-------------------------------|
| Range    | (10-20) $\Omega$ cm | (50-100) $\Omega$ cm  | $(200-400) \Omega\mathrm{cm}$ |
| Measured | -                   | -                     | $369 \pm \Omega \mathrm{cm}$  |

Table 3.1: Substrate resistivity ranges used in HV-MAPS production.

# 3.3.2 Requirements

Table 3.2 lists the specifications that the MuPix10 design needs to fulfil to enable the construction of sensor modules for the Mu3e pixel tracker. Apart from the performance and overall dimensions, the sensor is required to offer a connection interface which allows the chip to be operated and read out via a two-layer aluminium high density interconnect (HDI) [47] that provides power, high voltage, configuration and readout. The interconnection with the chip is achieved by SpTa-bonding [48] of the aluminium traces to the chip, which requires large  $200 \times 100 \mu m^2$  bond pads.

# 3.3.3 The MuPix10 chip

The chip layout is shown in figure 3.12, which shows the typical separation of the chip into the active pixel matrix and the periphery housing the clocked circuitry to digitize and read out the pixel hits. Additionally the



Figure 3.11: The depletion depth depending on the applied reverse bias for different substrate resistivities. In rising order  $20 \Omega$  cm,  $80 \Omega$  cm,  $200 \Omega$  cm and  $400 \Omega$  cm resistivity.

periphery contains digital-to-analogue converters (DAC) to steer the analogue circuitry as well as the bond pads. The periphery makes up 13% of the total chip surface of which 2.2% is the bond layout by itself.

Table 3.3 summarises the design dimensions, which follow the Mu3e specifications. The total sensor size is here equivalent to the dicing size, which leaves an  $11 \,\mu m$  silicon seal ring around the chip's guard ring and minimises the non-active material in between two sensors on a module.
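The active dimensions quoted in table 3.3 follow directly from the pixel pitch and the matrix size given there. The short sketch below reproduces this arithmetic; the variable names are chosen for illustration only.

```python
PITCH_UM = 80              # pixel pitch in um (table 3.3)
COLUMNS, ROWS = 256, 250   # matrix size (table 3.3)

# Active matrix dimensions in mm along the column and row direction.
active_mm = (COLUMNS * PITCH_UM / 1000, ROWS * PITCH_UM / 1000)
n_pixels = COLUMNS * ROWS

print(f"active matrix: {active_mm[0]:.2f} mm x {active_mm[1]:.2f} mm")
print(f"number of pixels: {n_pixels}")
```

The result, 20.48 mm × 20.00 mm with 64000 pixels, matches the active dimensions listed in the table.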

|                                         | Requirements    |
|-----------------------------------------|-----------------|
| pixel size [µm <sup>2</sup> ]           | $80 \times 80$  |
| sensor size [mm <sup>2</sup> ]          | $20 \times 23$  |
| active area [mm <sup>2</sup> ]          | $20 \times 20$  |
| active area [mm <sup>2</sup> ]          | 400             |
| sensor thinned to thickness [µm]        | 50              |
| LVDS links                              | 3 + 1           |
| maximum bandwidth [Gbit/s]              | $3 \times 1.25$ |
| RMS of spatial resolution [µm]          | ≤ 30            |
| power consumption [mW/cm <sup>2</sup> ] | ≤ 350           |
| time resolution per pixel [ns]          | ≤ 20            |
| efficiency at 20 Hz/pix noise [%]       | ≥ 99            |
| noise rate at 99% efficiency [Hz/pix]   | ≤ 20            |

Table 3.2: The Mu3e pixel sensor specifications [17].





The sensors are available in various thicknesses and two resistivities, which allows for detailed studies of possible influences of the thinning process in the future. In this work only a  $100 \,\mu\text{m}$ ,  $200 \,\Omega$  cm sensor will be investigated.

## 3.3.3.1 The Chip Design

The sensor architecture follows the same principle as previous prototypes, sketched in figure 3.13. The active pixel consists of a deep n-well, which forms a detection diode with the substrate material and houses a charge sensitive amplifier (CSA) for signal amplification and a source follower (SF) as buffering driver element, which drives the long metal line forming the point-to-point connection of the active pixel to the corresponding digital cell in the periphery. Here the arriving analogue pulse of the active pixel is digitised and the data are subsequently read out in a column-drain fashion. The data are serialised by the on-chip state machine and sent out via 1.25 Gbit/s links. Each pixel sub-matrix is read out individually with its own

|                                  | MuPix10                  |
|----------------------------------|--------------------------|
| pixel size [µm <sup>2</sup> ]    | $80 \times 80$           |
| sensor size [mm <sup>2</sup> ]   | $20.66 \times 23.18$     |
| columns × rows                   | $256 \times 250$         |
| active area [mm <sup>2</sup> ]   | $20.48 \times 20$        |
| active area [mm <sup>2</sup> ]   | 400                      |
| sensor thinned to thickness [µm] | 50, 60, 70, 80, 100, 625 |
| resistivity [ $\Omega$ cm]       | 20, 200                  |

Table 3.3: The MuPix10 sensor dimensions.

1.25 Gbit/s link. A fourth additional link is provided which offers a data stream merged from the three sub-matrices.



Figure 3.13: The MuPix electronic structure.

THE PIXEL ELECTRONICS The in-pixel electronics are sketched in figure 3.14. The charge sensitive amplifier is built from a folded-cascode gain stage with a PMOS input transistor, pictured in figure 3.15, with a feedback circuit providing a constant current that creates a linearly falling edge, which leads to a linear relation between pulse height and pulse duration. The drain contact of the PMOS cascode transistor forms a parasitic capacitance with the deep n-well, which is used as the feedback capacitance of the CSA  $(C_f = 1.6 \text{ fF})$ . As a result the pulse height is proportional to the deposited charge  $(Q_{dep})$ . To mimic a charge deposition, the input of the amplifier also allows charges to be injected via a capacitance.

$$U_{pulse} \approx \frac{Q_{dep}}{C_f} \tag{3.10}$$
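Equation 3.10 can be evaluated numerically to get a feeling for the pulse heights involved. The sketch below treats the amplifier as ideal, i.e. it ignores the finite open-loop gain, the discharge through the feedback during the rise of the pulse and the gain of the subsequent source follower. The signal of 2500 electrons is an arbitrary example value and the function name is hypothetical.

```python
E_CHARGE_C = 1.602e-19   # elementary charge in coulomb
C_F = 1.6e-15            # feedback capacitance in farad (from the text)

def csa_pulse_height_v(n_electrons: float, c_f: float = C_F) -> float:
    """Ideal CSA output amplitude U = Q / C_f for a collected charge of n electrons."""
    return n_electrons * E_CHARGE_C / c_f

# Assumed example signal of 2500 collected electrons.
print(f"U_pulse = {1e3 * csa_pulse_height_v(2500):.0f} mV")
```

For the assumed signal the ideal amplitude is about 250 mV; the amplitude of a real amplifier will be lower for the reasons given above.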

A detailed discussion of the amplifier can be found in [49]. The amplifier is not meant to drive a large output capacitance, therefore it is connected to an NMOS source follower which drives the long point-to-point connection into the periphery. As shown in figure 3.15, the gain stage requires an additional "amplifier voltage", called VSSA ((1.0-1.2) V). As only one supply voltage is available for the Mu3e pixel tracker modules, this voltage needs to be generated on chip.

The architecture of the regulator is sketched in figure 3.16. It is a linear series regulator which uses a differential amplifier to adjust the VSSA voltage level to a configurable reference value (vss\_ref). As the regulator acts as a self adjusting voltage divider the current drawn by the (1.0-1.2) V network will dissipate additional heat in the regulating transistor leading to a power penalty.

THE DIGITAL CELL The analogue pulse arriving on the signal line is AC coupled to a system of two comparators with individual thresholds. As depicted in figure 3.17, two modes are selectable: either only one of the comparators is used to sample the pulse, or both are used with shared tasks. In 1-comparator mode, in addition to a hit flag, a time of arrival (ToA)



Figure 3.14: The in-pixel electronics. vss=VSSA



Figure 3.15: Folded cascode amplifier with a PMOS input transistor. vdd=vdda

timestamp is stored when the pulse surpasses the comparator threshold. The timestamp is 11 bit wide with 8 ns binning. When the pulse drops below the threshold again, a second, 5 bit timestamp (TS2) is stored, typically with 128 ns binning to cover a pulse duration range of 4  $\mu$ s. Subtracting the time of arrival yields the time-over-threshold (ToT), which is a measure of the deposited energy.

$$ToT = TS2 - ToA \tag{3.11}$$
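Because the two timestamps have different widths and binnings, they have to be brought to a common time base before the subtraction. The following sketch shows one way to do this. It assumes that both counters are derived from the same clock and start at zero simultaneously, which is an assumption made for this example and not a statement about the actual chip logic; the function name is hypothetical.

```python
TOA_BITS, TOA_BIN_NS = 11, 8     # time of arrival: 11 bit, 8 ns binning
TS2_BITS, TS2_BIN_NS = 5, 128    # second timestamp: 5 bit, 128 ns binning

# Range covered by TS2 before it wraps around: 32 * 128 ns = 4096 ns.
TS2_RANGE_NS = (1 << TS2_BITS) * TS2_BIN_NS

def tot_ns(toa: int, ts2: int) -> int:
    """Time-over-threshold in ns from the raw ToA and TS2 counter values."""
    toa_ns = (toa % (1 << TOA_BITS)) * TOA_BIN_NS
    ts2_ns = (ts2 % (1 << TS2_BITS)) * TS2_BIN_NS
    # The modulo takes care of a TS2 counter that has wrapped around.
    return (ts2_ns - toa_ns) % TS2_RANGE_NS

print(tot_ns(toa=100, ts2=14))  # pulse from 800 ns to about 1800 ns
print(tot_ns(toa=500, ts2=7))   # same pulse length, TS2 counter wrapped around
```

The TS2 range of 4096 ns corresponds to the pulse duration range of about 4 µs quoted above. The resulting ToT is quantised by the coarse TS2 binning, so pulses of the same length can differ by up to one TS2 bin.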

As also highlighted in figure 3.17, the stored time of arrival depends on the pulse height. Larger pulses cross the threshold earlier, which is referred to as timewalk. The effect can be reduced either by lowering the threshold, which reduces the timewalk spread, or by measuring the ToT and correcting for the effect later. Lowering the threshold, however, increases the number of noise induced hits. To overcome this the two-threshold



Figure 3.16: Functional sketch of the linear regulator.



Figure 3.17: The digitisation of the analogue pulse by the comparators.

method was implemented which allows to sample a less timewalk affected ToA on the lower threshold and the hit flag and second timestamp on a noise free, higher threshold.
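The dependence of the timewalk on the threshold can be illustrated with a toy model in which the pulse rises linearly to its amplitude within a fixed rise time. The pulse shape, the rise time of 50 ns and all amplitudes and thresholds below are assumptions chosen for illustration and do not describe the measured MuPix10 pulse shape.

```python
from typing import Optional

def crossing_time_ns(amplitude_mv: float, threshold_mv: float,
                     rise_time_ns: float = 50.0) -> Optional[float]:
    """Threshold crossing time of a linearly rising pulse (toy model)."""
    if amplitude_mv <= threshold_mv:
        return None  # pulse never crosses the threshold
    return rise_time_ns * threshold_mv / amplitude_mv

def timewalk_spread_ns(amp_small_mv: float, amp_large_mv: float,
                       threshold_mv: float) -> float:
    """Difference in crossing time between a small and a large pulse."""
    return (crossing_time_ns(amp_small_mv, threshold_mv)
            - crossing_time_ns(amp_large_mv, threshold_mv))

# Assumed example: pulses of 100 mV and 300 mV at two different thresholds.
print(timewalk_spread_ns(100.0, 300.0, threshold_mv=50.0))
print(timewalk_spread_ns(100.0, 300.0, threshold_mv=25.0))
```

In this model halving the threshold halves the timewalk spread, which is the motivation for sampling the ToA on the lower of the two thresholds.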

Due to variations of process parameters during the production of the chip, all comparators behave slightly differently. This effect is commonly referred to as threshold dispersion, as individual comparators require different threshold voltage levels to exhibit the same performance. To reduce this effect, the comparators are equipped with a 3 bit trimming circuit which allows an individual adjustment of each comparator, also referred to as tuning.

In MuPix8 it was discovered that the sampling of the second timestamp is jeopardized [50]. As depicted in figure 3.18a, the hit is possibly read out before TS2 is sampled, which invalidates the ToT information. To guarantee a correct ToT measurement, and thereby enable timewalk correction, the scheme pictured in figure 3.18b is implemented in MuPix10. When the hit flag is set by the comparator, a delay period starts which is the same for all pixels and inhibits the readout until TS2 has been sampled. As the delay is the same for all pixels, the time ordering of the read-out hits remains unchanged. This is a desired effect, as it simplifies the time sorting process which is part of the Mu3e readout chain.





Figure 3.19 shows a sketch of the new analogue delay circuit. The registered hit enables a current source which charges up a capacitor. The increasing voltage of the capacitor is measured by a discriminating element which switches when its threshold is crossed. This output enables the readout of the pixel cell. The strength of the current source, and thus the time delay, is adjustable. A detailed description of the digital cell and the circuitry is presented in [51].
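To first order, and assuming an ideal constant current source $I$ that charges a capacitance $C$ from zero up to the switching threshold $V_{thr}$ of the discriminating element (these symbols are introduced here for illustration only), the delay can be estimated as

$$t_{delay} \approx \frac{C \cdot V_{thr}}{I}$$

so that, in this idealised picture, a stronger current source shortens the delay proportionally.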



Figure 3.19: The delay circuit logic. (The reset circuit is not shown.)

THE READOUT The registered hits are read out in a column-drain approach. Each column forms an OR-chain of the hit flags, the priority signal, which indicates whether a hit is to be read out in the respective column. If the OR-chain signals a hit, the information of the hit lowest in the column is copied to the end-of-column (EoC) buffer and from there further to the data serialiser, as depicted in figure 3.20. The signals required to copy data, clear buses and reset hit flags are orchestrated by a synthesised finite-state



machine, shown in figure 3.21. A full description of the readout procedure can be found in [52].
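The priority logic described above can be sketched in a few lines of Python. This is a simplified behavioural model, not the synthesised state machine: the end-of-column buffers are omitted, and both the convention that row 0 is the lowest pixel and the order in which columns are served are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def read_next_hit(hit_flags: List[List[bool]]) -> Optional[Tuple[int, int]]:
    """Pop one hit from a matrix of hit flags, indexed as hit_flags[col][row].

    Returns (column, row) of the hit that was read, or None if no flag is set.
    """
    for col, column in enumerate(hit_flags):
        # The OR-chain of all hit flags in the column is the priority signal.
        if any(column):
            # Assumption: row 0 is the lowest pixel in the column.
            row = column.index(True)
            column[row] = False      # reset the hit flag after copying
            return col, row
    return None


def drain(hit_flags: List[List[bool]]) -> List[Tuple[int, int]]:
    """Read hits until the matrix is empty (zero-suppressed output)."""
    hits = []
    while (hit := read_next_hit(hit_flags)) is not None:
        hits.append(hit)
    return hits
```

Only pixels with a set hit flag ever appear in the output, which is the zero-suppression property of the scheme.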

Figure 3.20: The column-drain readout scheme.

The readout state machine nominally transitions between states with a frequency of 62.5 MHz. On every second state a data word is sampled and serialised for readout, as illustrated in table 3.4. With the priority logic acting as zero-suppression and the state machine running independently of any external signals, e.g. trigger signals, it constantly sends out data over a 1.25 Gbit/s link, which is referred to as streaming readout. A sub-matrix of MuPix10 has a theoretical maximum readout capacity of 30 MHits/s; with three sub-matrices read out individually this amounts to 90 MHits/s. For MuPix8 the readout was tested for hit rates larger than 10 MHits/s [52]. As the data format contains a sizeable amount of fill words, these free data slots can be used for a different purpose (chapter 4).
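The quoted frequencies are consistent with each other, as the following arithmetic sketch suggests. It assumes 8b/10b line encoding (suggested by the K28.5 Komma words) and 32 bit data words (four bytes per word, as in table 3.4); the variable names are illustrative.

```python
LINK_RATE_BIT_S = 1.25e9      # serial link rate
BITS_PER_SYMBOL = 10          # 8b/10b: 10 line bits carry one byte (assumption)
BYTES_PER_WORD = 4            # 32 bit data words (assumption)
STATE_MACHINE_HZ = 62.5e6     # nominal state transition frequency

words_per_second = LINK_RATE_BIT_S / BITS_PER_SYMBOL / BYTES_PER_WORD
states_per_word = STATE_MACHINE_HZ / words_per_second

print(f"{words_per_second / 1e6:.2f} Mwords/s")   # 31.25 Mwords/s
print(f"{states_per_word:.0f} states per word")   # 2 states per word
```

Since some of the words carry Komma characters, identifiers and chip timestamps rather than hits, the achievable hit rate has to stay below this word rate, which is compatible with the quoted 30 MHits/s per sub-matrix.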

FURTHER OPTIMISATIONS Aside from the already mentioned changes, namely the regulator and the readout delay, two more major adaptations have been implemented and are the main subject of this work. In chapter 4 the development and implementation of the Mu3e configuration interface is described, aiming for a reduction of the signals required for the sensor configuration, which is driven by the very limited routing space available on the HDI. Chapter 5 describes the optimisation of the routing scheme of the pixel point-to-point connections to reduce crosstalk between signal lines and the position dependent delay. Both effects have been prominently observed in MuPix8.

# 3.3.3.2 Results

The MuPix10 chip was successfully commissioned and is currently being routinely operated and intensively tested in laboratory measurements and testbeam campaigns. For the testing of single chips, the sensor is mounted onto a specifically designed PCB-card as depicted in figure 3.22a and is configured and read out via the DAQ system presented in [53]. In the following the main findings are highlighted, as they guide the final polishing process for a resubmission of MuPix10 with minimal changes as the chip for the Mu3e pixel tracker, MuPix11.

Figure 3.21: State diagram of the readout state machine. Taken from [52].

BREAKDOWN VOLTAGE It was found that the breakdown voltage improved drastically in comparison to MuPix8, for which it was approximately -60 V. For MuPix10 the breakdown was found to occur at -115 V, and the sensor is operated by default at -100 V, which corresponds to a depletion depth of around  $60 \mu$ m. With the improved breakdown voltage a reduction of the resistivity is possible, as a depletion depth of  $30 \mu$ m is achievable even at  $80 \Omega$  cm.

| readout state | data                   | data type      |
|---------------|------------------------|----------------|
| PD1           |                        |                |
| PD2           | bc bc bc bc            | komma          |
| LdCol1<1>     |                        |                |
| LdCol1<2>     | bc bc bc bc            | komma          |
| LdCol1<3>     |                        |                |
| LdCol1<4>     | bc bc bc bc            | komma          |
| LdCol1<5>     |                        |                |
| LdCol1<6>     | bc bc bc bc            | komma          |
| LdCol1<7>     |                        |                |
| LdCol2        | lc aa lc aa            | hit identifier |
| LdPix1        |                        |                |
| LdPix2        | chip time              | chip timestamp |
| RdCol1        |                        |                |
| RdCol2        | address and timestamps | hit data       |

Table 3.4: The data structure of the MuPix readout state machine. Every second state samples and serialises a data word. Some states send the Komma word K28.5 (0xbc) as filler. The hit identifier shown is 0xaa for sub-matrix A, with Komma flag 0x1c (K28.0).

THE VSSA REGULATOR & POWER CONSUMPTION So far no in-depth study of the regulator has been performed. However, the ad-hoc usage of the regulated voltage as VSSA supply was successful and is the default mode of operation. Figure 3.23 shows a scan of the regulator reference voltage. The VSSA level shows a linear behaviour over a range of (0-1.4) V, with the nominal working point of 1.2 V well covered. Further, the corresponding changes in the supply current are plotted. As the amplifier is the only circuit using the VSSA voltage, the change in current represents the turn-on of the amplifier. Only for voltage levels above 1 V can the amplifier be considered fully working.

As the VSSA regulator is working, the default sensor operation uses only a single supply voltage, shorting the analogue and digital domains, while providing VSSA via the regulator. With the help of test points inside the chip's power grid it was found that there is a considerable voltage drop between the supply voltage connection pads and the power grid, which was confirmed in the layout to originate from narrow wires. To compensate for these voltage drops a larger supply voltage of 2.25 V needs to be applied. Only with this over-voltage does the analogue domain (VDDA), which suffers the largest drop, reach its nominal working voltage of 1.8 V. The digital domain (VDD) is then operated at an over-voltage. This is easily fixed in MuPix11.

Table 3.5 shows the resulting voltages and currents in the individual domains if a single supply voltage of 2.25 V is applied. Assuming the currents remain almost unchanged when the domains are operated at their nominal working point of 1.8 V, the contribution of the voltage drop can be subtracted, leaving only the bare power consumption of the sensor.

$$P_{MuPix10} = V_{Supply} \cdot I_{Supply} \tag{3.12}$$



(a) A MuPix10 chip mounted on a test PCB.



(b) MuPix10 hit map observed with the DESY electron beam.

Figure 3.22: The MuPix10 is routinely operated, glued and wire bonded to a specifically designed test PCB.

| Domain              | V <sub>Supply</sub> | VSSA | VDDA | VDD  |
|---------------------|---------------------|------|------|------|
| In-chip voltage [V] | 2.25                | 1.0  | 1.8  | 2.1  |
| Current [A]         | 0.58                | 0.22 | 0.23 | 0.13 |

Table 3.5: Voltages and currents in the individual power domains for a single supply voltage of 2.25 V.

The total power consumption in the mode of a single supply voltage amounts to  $P_{MuPix10} = 1.305$  W. Normalised to the total chip surface of  $2 \times 2.3$  cm<sup>2</sup> this gives 280 mW/cm<sup>2</sup>. Corrected for the power dissipation in the undesired wire resistances, this yields  $P_{corrected} = 0.858$  W or 186 mW/cm<sup>2</sup>. This number misses the power penalty of the VSSA regulator, which is  $P_{Penalty} = (1.8\,\mathrm{V} - 1\,\mathrm{V}) \cdot I_{VSSA} = 0.168$  W. This gives a total power consumption estimate for a MuPix10 with no voltage drops connected to a single supply



Figure 3.23: VSSA voltage and vdda current plotted for a scan of the regulator reference voltage.

voltage of 222 mW/cm<sup>2</sup>. Both values, with and without voltage drop, satisfy the Mu3e requirement, but the voltage drop increases the power consumption by 25%.
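The power estimate can be retraced with a short calculation. The sketch below uses the rounded values of table 3.5 and assumes that the corrected power is the sum of the per-domain currents multiplied by their nominal voltages; this is one reading of the procedure, not a confirmed reproduction of it. Because of the rounding, the results agree with the quoted numbers only to within a few percent.

```python
V_SUPPLY = 2.25            # applied single supply voltage [V]
V_NOMINAL = 1.8            # nominal VDDA/VDD voltage [V]
V_VSSA = 1.0               # regulated VSSA level [V]
I_SUPPLY = 0.58            # total supply current [A]
I_VSSA, I_VDDA, I_VDD = 0.22, 0.23, 0.13   # per-domain currents [A]
AREA_CM2 = 2.0 * 2.3       # total chip surface [cm^2]

p_total = V_SUPPLY * I_SUPPLY                                   # equation 3.12
# Assumption: corrected power = nominal domain voltage times domain current.
p_corrected = V_VSSA * I_VSSA + V_NOMINAL * (I_VDDA + I_VDD)
p_penalty = (V_NOMINAL - V_VSSA) * I_VSSA                       # regulator loss

for name, p in [("measured", p_total),
                ("corrected", p_corrected),
                ("corrected + regulator", p_corrected + p_penalty)]:
    print(f"{name}: {p:.3f} W, {1e3 * p / AREA_CM2:.0f} mW/cm^2")
```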



Figure 3.24: ToT spectra obtained for 3 different delay times controlled by the VPTimerDel DAC value.

THE DELAY CIRCUIT Figure 3.24 shows three measured ToT distributions for different chosen delays. The peak at the end of each spectrum corresponds to the delay time. If the delay time has elapsed while the pulse is still above threshold, the expiry time of the delay is sampled instead, so all pulses longer than the delay are contained in this peak. The dispersion of the peak is caused by variations of the delay time over the chip and is at an acceptable level for the usage in Mu3e. As long pulses correspond to a large signal amplitude, they require no timewalk correction. Hence, the delay time can be reduced below the duration of the longest pulses without losing precision in the correction process.

EFFICIENCY & NOISE Figure 3.25 shows an exemplary result of a scan of the discriminator threshold voltage determining crosstalk and noise for each point with the help of the MuPix-telescope [53].





For low thresholds the efficiency reaches a large plateau with values well above 99.9% while maintaining an average noise rate per pixel below 2 Hz.

An in-depth study of the HV-dependence [54] and of the influence of thinning the sensor on the efficiency is currently ongoing.

TIME RESOLUTION The degrading effect of pixel position dependent delays and the timewalk on the sensor time resolution is well understood [50] and was immediately investigated for MuPix10 [55] with the result highlighted in figure 3.26.



(b) Exemplary MuPix10 time resolution for raw and corrected data.

Figure 3.26: The MuPix10 time resolution with the compensation of the on-chip voltage drops and further optimised settings.

With the help of the analytical delay and timewalk compensations developed in [55], time resolutions of 7 ns are achievable. The investigation of the time resolution on a pixel-to-pixel basis revealed an intrinsic pixel time resolution of  $(6 \pm 1)\,\text{ns}$ . The time resolution goal of the Mu3e specifications is very well met with and without correction.

SENSOR TUNING All pixels are connected to two comparators with globally applied thresholds. In MuPix10, both comparators are equipped with a 3 bit DAC which allows setting individual thresholds to compensate for the variations, the so-called trimming or tuning. Additionally, the pixels have a switch bit which allows muting pixels that are uncontrollably noisy. This feature was tested and a tuning was performed successfully [56].

In this study all pixels have been stimulated with charge pulses corresponding to  $3000 \,\mathrm{e^-}$ , using the injection infrastructure (see figure 3.13). A scan of the global threshold is performed and the number of recognised injection pulses is measured. The resulting distribution is fitted with an s-curve function. The mean value extracted from this fit varies for different pixels. The RMS of the mean-distribution plotted in figure 3.27b is a measure of the threshold dispersion. With the application of individual threshold shifts, shown in figure 3.27a, the differences of the mean values can be minimised, reducing the dispersion over all pixels.
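The idea of the tuning step can be illustrated with a toy model. The sketch below is not the procedure of [56]: it assumes a perfectly linear 3 bit trim DAC that shifts the threshold downwards by a fixed step per code, draws hypothetical s-curve mean values from a Gaussian distribution, and uses an arbitrary step size of 8 mV.

```python
import random
import statistics


def tune(thresholds_mv, step_mv, n_bits=3):
    """Pick, per pixel, the trim code that moves its threshold closest to target.

    Assumes a linear trim DAC that shifts the threshold down by code * step_mv.
    """
    codes = range(2**n_bits)
    # Centre the DAC range on the mean of the untuned distribution.
    target = statistics.mean(thresholds_mv) - (2**n_bits - 1) * step_mv / 2
    tuned, chosen = [], []
    for thr in thresholds_mv:
        code = min(codes, key=lambda c: abs(thr - c * step_mv - target))
        chosen.append(code)
        tuned.append(thr - code * step_mv)
    return tuned, chosen


random.seed(1)
# Hypothetical s-curve mean values with 11 mV dispersion around 100 mV.
untuned = [random.gauss(100.0, 11.0) for _ in range(1000)]
tuned, codes = tune(untuned, step_mv=8.0)
print(f"RMS before: {statistics.pstdev(untuned):.1f} mV")
print(f"RMS after:  {statistics.pstdev(tuned):.1f} mV")
```

In such a model the residual dispersion is bounded from below by the DAC step size, and pixels outside the DAC range remain as outliers.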



(a) Commissioning measurement to check the linearity of the 3 bit tuning DAC. Linear fit in red.



(b) The threshold dispersion before (red) and after (blue) tuning for the full MuPix10 matrix with an equivalent signal of  $3000 \, e^-$ .

Figure 3.27: Results obtained with the threshold tuning method [56].

With this method the threshold dispersion was reduced from 11 mV to 4.8 mV, or in electron equivalent from  $240 \text{ e}^-$  to  $75 \text{ e}^-$ , see figure 3.27b. The effect of the tuning on the efficiency still needs to be investigated in testbeam campaigns.

ISSUES Like its predecessor MuPix8, MuPix10 cannot be reliably operated with the readout state machine at full speed, which halves the maximum readout speed to 45 MHits/s. This is still enough to handle the high rates of Mu3e phase 1. The origin of this problem is under investigation in order to find a fix for the MuPix11 chip.

Further, a reset issue was discovered in one of the chip's registers, which leads to an immediate loss of the chip's configuration after the configuration is written (loaded). Although a work-around was found which allows the sensor to be fully used, this bug affects the testability of the Mu3e configuration interface. A fix of this issue for the MuPix11 submission is already prepared.

# 3.3.4 Towards MuPix11

Table 3.6 summarises the MuPix10 properties given by design or measured as presented above. A comparison with the Mu3e specifications shows that all requirements are met. However, these results represent a  $100 \,\mu\text{m}$  thick sensor. They need to be confirmed for a MuPix10 with 50  $\mu\text{m}$  thickness.

|                                         | Requirements    | MuPix10                 |
|-----------------------------------------|-----------------|-------------------------|
| pixel size [µm <sup>2</sup> ]           | $80 \times 80$  | $80 \times 80$          |
| sensor size [mm <sup>2</sup> ]          | $20 \times 23$  | $20.66 \times 23.18$    |
| active area [mm <sup>2</sup> ]          | $20 \times 20$  | $20.48 \times 20$       |
| active area [mm <sup>2</sup> ]          | 400             | 410                     |
| sensor thinned to thickness [µm]        | 50              | 100 (50)                |
| LVDS links                              | 3 + 1           | 3 + 1                   |
| maximum bandwidth [Gbit/s]              | $3 \times 1.25$ | $3 \times 1.25$         |
| RMS of spatial resolution [µm]          | ≤ 30            | ≤ 30                    |
| power consumption [mW/cm <sup>2</sup> ] | ≤ 350           | 280 (222)               |
| time resolution per pixel [ns]          | ≤ 20            | $6 \pm 1$               |
| efficiency at 20 Hz/pix noise [%]       | ≥99             | ≥99                     |
| noise rate at 99% efficiency [Hz/pix]   | ≤ 20            | ≤2                      |

Table 3.6: Side-by-side comparison of Mu3e specifications and the MuPix10 benchmark values.

As the first full-scale prototype, MuPix10 was used to build prototype modules with printed circuit boards instead of HDIs, pictured in figure 3.28a. Several modules have been combined to form a two-layer prototype vertex detector, shown in figure 3.28b, which allowed a DAQ integration testbeam to be performed with actual muon decays as signal source. The construction of MuPix tracker modules and the results of this integration testbeam are presented in [57].



(a) A 6-chip prototype module.



(b) The two-layer prototype vertex detector during assembly.

Figure 3.28: Prototype Mu3e detectors built with the MuPix10 chip.

For most issues discovered for MuPix10 the problem is understood and the solution requires only minimal changes to transition from the MuPix10 to the MuPix11 design, which will enable the start of Mu3e pixel module construction in spring of 2022.

### 3.4 HV-MAPS BEYOND MU3E

Besides Mu3e, other experiments also plan to use HV-MAPS in their detector systems [58–60]. For some this will require a further development of sensors with differently optimised electronics.

An already existing example of such a development are the ATLASPix chips [61–63]. Their architecture is highlighted in figure 3.29. The comparator is moved inside the pixel, so that not the analogue pulse of the amplifier but a normalised pulse travels across the signal line. This has several advantages, as discussed later.



Figure 3.29: The architecture of an HV-MAPS sensor with in-pixel comparator.

A new process modification (figure 3.30), which introduces a deep p-well inside the deep n-well, was already successfully tested in [51]. The introduction of the deep p-well allows the PMOS transistors to be fully decoupled from the n-well. This permits the implementation of digital CMOS electronics inside the active detection n-well without a possible parasitic coupling of the PMOS transistors into the signal collection electrode, the deep n-well. This opens the door to ever more complex in-pixel logic and to moving circuits from the periphery into the pixel, which slims down the periphery and thereby the amount of non-active detector surface.



Figure 3.30: The well structure of the HV-CMOS process with an additional deep p-isolation well. The deep p-well decouples the PMOS transistors from the deep n-well.

The ATLASPix design was discussed as an option for the outer layer of the inner tracker of the ATLAS detector [61]. The possibility of a monolithic pixel detector in ATLAS spawned an arms race in the development of MAPS detectors utilising depletion, which created several interesting competing design approaches. The developments by Ivan Perić are typically denoted as HV-MAPS. Other developments are also named high-resistivity MAPS (HR-MAPS) and depleted MAPS (DMAPS).



Figure 3.31: The modified TowerJazz180 process [64, 65].

One major development was the modification of the CMOS process which was used to produce the ALPIDE MAPS for the ALICE ITS upgrade. With the introduction of a weakly n-doped layer into the epitaxial layer, as depicted in figure 3.31, a diode was created which is easily fully depleted, turning the former MAPS technology into a depleted MAPS technology. There is one huge difference setting this design apart from the HV-MAPS.

In the TowerJazz process the diode is very small compared to the pixel size, while the deep n-well of the HV-MAPS fills large areas of the pixel. The two approaches are therefore referred to as small and large fill factor designs, as illustrated in figure 3.32.



Figure 3.32: On top a large fill factor design, below a small fill factor design.

Both types have their strengths and are still being further developed. The small fill factor design profits from a very small detector capacitance of approximately 4 fF, which allows the implementation of fast and low-power amplifiers. It is, however, weaker in terms of radiation hardness, as the electric fields are small and the drift paths long. This, in contrast, is the strong suit of the large fill factor design, as the diode structure is very simple and the electric field can be very strong for lower substrate resistivities.

There are currently many different developments ongoing, many of which apply the well understood column-drain readout scheme. A deviation from this, with a more intricate readout design to facilitate smaller pixel sizes, can be found in the MALTA chips [66].

A completely different path is chosen in [67]. With the usage of BiCMOS processes, which offer extremely fast bipolar transistors alongside a CMOS process, it is possible to build monolithic pixels with sub-ns time resolution.

The diversity of available devices is rapidly increasing, making the field of depleted monolithic pixel detectors more interesting than ever.

Part III

SLOW CONTROL & CROSSTALK OPTIMISATION

# 4

In this chapter the development of the Mu3e pixel sensor configuration interface is described. The evolution and structure of the on-chip configuration architecture and the requirements imposed onto the sensor from module and detector integration are highlighted. The goal is to provide a fast configuration interface with a minimal pin count, to not further stretch the on-edge design of the module HDI and maximise the routing space available for the power distribution (section 2.4).

# 4.1 THE CHIP CONFIGURATION INFRASTRUCTURE

The basic building blocks of the configuration infrastructure of the MuPix and ATLASPix chips are custom-designed linear shift registers in a serial-in-parallel-out architecture connected to a storage interface. The structure is depicted in figure 4.1. In its simplest incarnation it features a clock (CK) and a data input to the shift register and a load signal to copy the data from the shift register cells to the memory cells.



Figure 4.1: Original design of the shift register cells: D-Flip Flop forming a cell in the register chain, in- and output in blue, and a gated latch as storage element.

This basic implementation with a chain of D-flip flops was used for all MuPix chips up to MuPix7. The main challenge for MuPix8 was the upscaling of the technology and the increasing length of the shift register. A change to the shift register implementation is required, as the D-flip flops used are implemented in pass-transistor logic, which employs transmission gates to realize the logic. This uses fewer transistors and therefore less space for the same logic. However, a chain of two oppositely clocked transmission gates, as found in this implementation, is partially transparent during switching, effectively shorting input and output, which makes this structure prone to failure. In a long shift register implemented with this technology this may cause the failure of the chain and thereby of the entire chip. Therefore a redesign of the shift register cell becomes necessary. Additionally, a new feature is desired which allows reading back the stored values for diagnostic and monitoring purposes, as in the past a loss of the configuration was observed without an apparent reason and thus



(a) The new register cell with 2 gated latches replacing the D-flip flop. Shift register path marked in blue, readback path in red.





(c) Reading back the value in the storage latch into the register cell.

Figure 4.2: Implementation and usage of the new register.

The solution is illustrated in figure 4.2a. The D-flip flop is replaced by a chain of two gated latches, eliminating the transmission gates. The structure uses two independent clocks. A third gated latch acts as the configuration memory, which is written with the load signal. The readback capability is achieved within this unit cell through a feedback of the stored value via a 2-to-1 multiplexer, which switches between the shift register input and the stored memory value, steered by the readback signal RB. The waveforms in figure 4.2b show an example of the standard clocking and loading procedure for this register cell. With the clear separation of CK1 and CK2, a possible shorting of the input and output of the register cell is entirely circumvented. In comparison to the D-flip flop implemented in pass-transistor logic, this logic requires more transistors, but guarantees a safe operation.
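A behavioural model may help to follow the interplay of the two clocks, the load signal and the readback multiplexer. The Python sketch below is a simplification under stated assumptions: the latches are modelled as being updated once per clock pulse, the storage latch is assumed to copy the output of the second latch, and all class and method names are hypothetical.

```python
class RegisterCell:
    """Behavioural model of one shift register cell with storage and readback."""

    def __init__(self):
        self.latch1 = 0    # first gated latch, updated on a CK1 pulse
        self.latch2 = 0    # second gated latch, updated on a CK2 pulse
        self.storage = 0   # configuration memory, written with the load signal

    @property
    def out(self):
        return self.latch2  # output to the next cell in the chain

    def ck1(self, data_in, rb=0):
        # 2-to-1 multiplexer: RB selects the stored value instead of the input.
        self.latch1 = self.storage if rb else data_in

    def ck2(self):
        self.latch2 = self.latch1

    def load(self):
        # Assumption: the storage latch copies the cell output.
        self.storage = self.latch2


def shift(chain, bit, rb=0):
    """Apply one non-overlapping CK1/CK2 pulse pair to the whole chain."""
    inputs = [bit] + [cell.out for cell in chain[:-1]]
    for cell, data_in in zip(chain, inputs):
        cell.ck1(data_in, rb)
    for cell in chain:
        cell.ck2()
    return chain[-1].out
```

Because CK1 and CK2 never act at the same time in this model, a bit advances by exactly one cell per clock pair, which is the property the two-latch design is meant to guarantee.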

A snapshot of the layout of an individual cell and a combination of 4 cells is shown in figure 4.3. As standard cells<sup>1</sup> have been used for the implementation, the individual components are stacked very space-efficiently and use common power and ground grids. The routing between the standard cells is done by hand. Further, the cell's input and output wires are designed symmetrically, which allows the shift register chain to be created very easily without losing space.

<sup>1</sup> Standard cells are transistor implementations of logic gates (NAND, NOR, INVERTER etc.) with standardised layout parameters.

This cell forms the standard implementation for all chips since MuPix8. However, it was reworked for ATLASPix3 by replacing the storage latch with a triple-redundant logic with a majority and error correction logic, which is not covered here. Furthermore, a reset mechanism was added which allows loading a default, hardwired setting for each register cell. This triple-redundant register cell is also used in MuPix10.



Figure 4.3: Layout of a single register cell and of four merged register cells.



Figure 4.4: Implementation of the pixel RAM.

For configuration bits and DAC values the storage latch of the register cell is used for permanent storage. In the case of the pixel values this is not a suitable solution, as it would take up a lot of additional space per pixel. The solution here is to make use of the natural matrix structure of the pixels, with each cell containing its own memory in the form of SRAM cells. This matrix of memory is controlled as illustrated in figure 4.4. One register is used as a write enable for a column and the data lines are applied per row. To configure the full matrix in this way the register has to be written and loaded at least once for every column. In MuPix10 (first in ATLASPix3) this principle was further expanded with the introduction of a pre-charge mechanism which allows reading back the pixel configuration. This new method is implicitly used to read back the pixel RAM in the following, but the mechanism itself is not discussed as part of this thesis.
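The column-wise write procedure can be sketched as follows. The model stores a single configuration value per pixel and uses a one-hot column write enable; both the register layout and the function name are hypothetical simplifications of the scheme in figure 4.4.

```python
def write_pixel_ram(config, n_cols, n_rows):
    """Write config[col][row] into a pixel RAM, one register load per column.

    Returns the RAM content and the number of register write/load cycles.
    """
    ram = [[0] * n_rows for _ in range(n_cols)]
    cycles = 0
    for col in range(n_cols):
        # Register content for this cycle (hypothetical layout):
        # a one-hot column write enable plus one data value per row.
        write_enable = [int(c == col) for c in range(n_cols)]
        row_data = list(config[col])
        cycles += 1                       # shift the register in and load it
        for c, enabled in enumerate(write_enable):
            if enabled:                   # only enabled columns take the data
                ram[c] = row_data[:]
    return ram, cycles
```

The cycle count equals the number of columns, which illustrates why the pixel configuration dominates the total configuration time.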



Figure 4.5: Serial and parallel implementation of shift registers.

Depending on the available functionality of the chip, the shift register controls different types of configurable behaviour. In the case of the MuPix chips, 3 main types are distinguished: control of adjustable voltages and currents, state machine configuration and pixel configuration. In general these bits are grouped into connected blocks according to their function. These blocks are then either chained into a single long shift register (e.g. MuPix8) or arranged in parallel, sharing as many control signals as possible, as depicted in figure 4.5. The parallel architecture clearly has advantages: the shift registers responsible for the pixel configuration have to be loaded several times to configure the full matrix, which means that in a single-register architecture the non-pixel registers also have to be written in each cycle. This represents a sizable overhead, limiting the total configuration speed.
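The overhead argument can be made quantitative with a simple bit count. The register lengths and the number of pixel write cycles used below are hypothetical values chosen for illustration only, not MuPix design parameters.

```python
def bits_serial(n_cycles, pixel_bits, other_bits):
    # Single chain: every pixel cycle rewrites the full register.
    return n_cycles * (pixel_bits + other_bits)


def bits_parallel(n_cycles, pixel_bits, other_bits):
    # Separate registers: non-pixel blocks are written only once.
    return n_cycles * pixel_bits + other_bits


# Hypothetical numbers, chosen for illustration only.
N_CYCLES, PIXEL_BITS, OTHER_BITS = 256, 900, 2000
serial = bits_serial(N_CYCLES, PIXEL_BITS, OTHER_BITS)
parallel = bits_parallel(N_CYCLES, PIXEL_BITS, OTHER_BITS)
print(serial, parallel, f"{serial / parallel:.2f}x")
```

The saving grows with the number of pixel cycles and with the length of the non-pixel blocks relative to the pixel register.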



Figure 4.6: Module configuration architectures.

The same architecture types, serial or parallel, also apply to the combination of several chips into a detector module, as illustrated in figure 4.6. The obvious solution is to control every chip individually; this however requires individual routing for every chip and is therefore very impractical and not feasible for the Mu3e approach. The second possibility is a daisy chain of the individual chips into one very long shift register, which comes at the cost of a large overhead if changes in one sensor are required, as well as an increased chance of corrupting the entire module if the chain fails at any point. The third approach is the parallel organisation of the chip configuration interface, which minimizes the number of required slow control signals (bus), but requires a possibility to address the correct chip. This is achieved either via a chip select line, which requires an additional signal for each chip, or by providing each chip with a unique address on the module. Only the latter fits the design goals for the Mu3e pixel modules as described in section 2.4. An additional desired feature is the feedback of slow control information from each chip. To achieve this, the same architectures as for the configuration input could be considered. However, an entirely different approach is used, as described below.

## 4.2 DEVELOPMENT OF THE MU3E PROTOCOL

The experimental design goals require a configuration interface with minimal pin count that connects to the existing chip-internal slow control infrastructure described above. Further, it should provide a readback capability to read out stored configuration values and additional observables, e.g. on-chip voltage measurements via an ADC, and other chip-specific functions.

| Name | Wires | Peculiarity         |
|------|-------|---------------------|
| SPI  | 4     | Chip select         |
| JTAG | 4(2)  | Daisy chain         |
| I2C  | 2     | Chip Address        |
| RD53 | 2     | Data Clock recovery |

Table 4.1: Characteristics of established communication interfaces and the approach of the RD53B readout chip [68].

As a starting point, established standards are compared to the Mu3e requirements in table 4.1. None of the protocols fits the requirements out of the box. In particular, the interface to the internal shift registers requires modifications, as even the signal types do not match, e.g. for  $I^2C$ , which uses open-drain or push-pull drivers. The minimal pin count requirement only leaves an  $I^2C$ -like approach or the ATLAS readout chip standard as possibilities. The latter is discarded on the grounds of its complexity and the required changes, especially the clock generation via data-clock recovery, which would mean a rework of the PLL and clock trees of the MuPix, which were already silicon-proven and working flawlessly at the time of development. From the designer's point of view a drop-in, minimally invasive solution is desired, which is tested, debugged and proven within a chip submission before its application in the final chip. Therefore a dedicated interface was developed for Mu3e which fulfils the Mu3e requirements and does not jeopardize existing circuitry. As for the readout, the logic required for the configuration interface is realised as a state machine implemented in Verilog and synthesised onto the chip, possibly in conjunction with the readout state machine.



Figure 4.7: The embedding of the slow control interface. All incoming and outgoing signals are connected to the Mu3e front-end board or controlling FPGA. The dashed red lines indicate possibilities to make use of redundancies.

Figure 4.7 shows the envisioned interface for the final chip. All signals are unidirectional differential lines which are transmitted and received by the front-end FPGA. For the configuration interface a data input line and a clock line are foreseen, similar to  $I^2C$ , but with an additional output for slow control data. Further, the obligatory signals required by the readout state machine are shown. It is obvious that some of these signals are functionally redundant. As indicated by the dashed lines, this redundancy is used to further reduce the number of differential signals. The clock for the configuration interface is derived from the clock of the readout state machine; if a different clock speed is required, it is easily provided by a clock divider. This double use of the reference clock already breaks the similarity with the  $I^2C$  protocol and demands an entirely different approach, as the continuously running clock makes the data transmission style of  $I^2C$  inapplicable.

Further, the data output of the slow control interface is fed into the readout state machine to be sent out via the chip's data link. The data link does not use the full bandwidth for data transmission, but sends Komma words for synchronisation in every readout cycle. Although these Komma words are required to check and guarantee the correct word alignment on the receiving side, they occur often enough that some of them can be replaced with data from the configuration interface without jeopardizing the data integrity. By doing so, two signals are dropped and the interface is slimmed down further to an effective 1-wire configuration interface.

Figure 4.8 shows the generalised idea of the state machine structure for the configuration interface and the input data structure. In the idle state, the state machine is waiting for a start signal which enables the reading process to sample the data on the input line. As this data is broadcast into



Figure 4.8: Baseline of the slow control state machine and composition of the input data.

a bus system, all chips receive it and need to identify the data intended for them. Therefore the input data contains the chip's address. Further, the configuration interface has to perform several different tasks, foremost sending and loading data to shift registers. To define the action to execute, a command word is transmitted. The remaining bits of the input data are the payload, which is further processed during the task execution. Once the data is sampled, it is interpreted, meaning the chip address is checked and the task to perform is determined. If the data is not meant for this chip or the task has been performed, the state machine returns to the idle state, waiting for a new data word.

The main purpose of the interface is the configuration of the chip via shift registers as described above. It needs to provide logic to write and load the registers. For a given data word this is achieved with a small state machine as depicted in figure 4.9, which represents the protocol necessary to shift the payload bits into the shift register. When the shift register is filled as desired, an additional command is sent, which invokes the load signal.

As found for MuPix8 in [52], the shift register cannot be written arbitrarily fast, but has a limit of about 6 Mbit/s for the single bit transmission rate. With the five-state loop in figure 4.9, which is identical to the approach used in [52], the state transition speed is therefore limited to 30 MHz. As the reference clock provided to the readout state machine nominally runs at 125 MHz, a clock divider is required to provide the slow control clock. This is the base clock of the configuration interface.
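The relation between these numbers can be written compactly. The following is a sketch, assuming that each written bit takes exactly one pass through the five-state loop and that the slow control clock is derived from the reference clock by an integer division factor $d$. The symbols $f_\text{bit}$, $f_\text{SC}$, $f_\text{ref}$ and $N_\text{states}$ are introduced here for illustration only.

$$ f_\text{bit} = \frac{f_\text{SC}}{N_\text{states}} = \frac{f_\text{ref}}{d \cdot N_\text{states}}, \qquad N_\text{states} = 5 $$

$$ f_\text{bit} \lesssim 6\,\text{Mbit/s} \quad\Rightarrow\quad f_\text{SC} \lesssim 5 \cdot 6\,\text{MHz} = 30\,\text{MHz} $$

With $f_\text{ref} = 125\,\text{MHz}$, the undivided reference clock would exceed this limit by more than a factor of four.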

With the introduction of a new clock domain the synchronisation between external reference system, the FPGA, and the internal clock of the slow control interface is no longer ensured, as on power-up the state of



Figure 4.9: Register control state machine, which produces the waveform shown in figure 4.2b.

the divider element is random and therefore introduces an undefined phase shift of the internal clock which is a priori unknown to the FPGA. It is therefore necessary to provide a reset procedure, applied from outside, which allows the phase to be defined. This is essential, as the sampling of the input data is based on the slow control clock, which is given by the zero crossings of the clock divider. Only if the FPGA knows the phase exactly is a sampling without data corruption guaranteed.





Figure 4.10 describes the reset procedure after power-up. With an unknown phase shift, there is only one type of signal which is sampled without corruption: a constant signal, for which the sampled value is independent of the sampling time. When the signal input is pulled to the high level, all sampled values are '1'. This special data word is therefore used to achieve synchronisation. When it is recognized, the configuration state machine goes into the "synchronous reset" state, which resets the clock divider, and holds this state until the input line is pulled back to the low level. Then the divider is released and the phase of the internal clock is well defined. The precision of this synchronous reset procedure and its usage for the readout state machine are discussed later. It offers the possibility to remove the differential synchronous reset signal at the interface.
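The argument that only a constant signal survives an unknown sampling phase can be illustrated with a small toy model. This is a sketch under stated assumptions, not the on-chip logic: the input line is described in ticks of the fast clock, the tick on which the level changes is treated as undefined (`"X"`), and the division factor of 8 as well as the names `waveform` and `sample` are chosen for illustration.

```python
DIVISION = 8  # assumed division factor between fast clock and slow control clock


def waveform(bits, division=DIVISION, idle_level=0):
    """Expand slow-clock bits into fast-clock ticks.

    The tick on which the level changes is marked "X" (undefined), a crude
    stand-in for sampling too close to a signal edge.
    """
    ticks, previous = [], idle_level
    for bit in bits:
        first = bit if bit == previous else "X"
        ticks += [first] + [bit] * (division - 1)
        previous = bit
    return ticks


def sample(ticks, phase, division=DIVISION):
    """Sample the line whenever a divider started with the given phase wraps."""
    return [level for tick, level in enumerate(ticks) if (tick + phase) % division == 0]


data_bits = [1, 0, 1, 1, 0, 0, 1, 0]
data_line = waveform(data_bits)
high_line = [1] * (8 * DIVISION)  # input held high, no level change in the window

for phase in range(DIVISION):
    print(phase, sample(high_line, phase), sample(data_line, phase))
```

In this model the constant high line yields only ones for every phase, while an arbitrary data pattern is read correctly for some phases and picks up undefined samples for others.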

Contrary to I<sup>2</sup>C which uses an interplay of data and clock line to indicate the start and stop of a transmission, this has to be achieved differently

| Signal | Sequence                             |
|--------|--------------------------------------|
| SIn    | INIT, then input data                |
| Chip0  | Idle, Read & Interpret, Execute Task |
| Chip1  | Idle, Read & Interpret, Idle         |

Figure 4.11: Reading input data. Input data is sampled on the divider's zero crossings, here indicated by horizontal lines.

in this implementation, as the clocks are running continuously. The approach chosen here defines the default state of the input line as the low level (idle). While the configuration interface is in the idle state, it samples the input line and waits for a level change. Once a change to the high level is detected (INIT), the state changes to the reading mode, in which the input level is sampled on each zero crossing of the divider and the value is stored in an internal shift register, see figure 4.11. The state machine counts the number of sampled bits and changes into the interpretation state after a defined number of bits, which makes the stop condition implicit.
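The reading scheme can be sketched as a small software model. The following assumes a 32 bit word with a 4 bit address, a 4 bit command and a 24 bit payload, transmitted most significant bit first with the address leading, and an INIT bit that precedes the word without being part of it. The bit order and the helper names `frame` and `receive` are assumptions made for illustration; the text does not specify them.

```python
ADDRESS_BITS, COMMAND_BITS, PAYLOAD_BITS = 4, 4, 24
WORD_BITS = ADDRESS_BITS + COMMAND_BITS + PAYLOAD_BITS


def frame(address, command, payload):
    """Build the sampled line levels for one word: INIT bit, then 32 data bits."""
    word = (address << (COMMAND_BITS + PAYLOAD_BITS)) | (command << PAYLOAD_BITS) | payload
    return [1] + [(word >> i) & 1 for i in reversed(range(WORD_BITS))]


def receive(samples, chip_address):
    """Model of idle -> read -> interpret; returns the (command, payload) pairs accepted."""
    accepted = []
    state, word, count = "idle", 0, 0
    for level in samples:
        if state == "idle":
            if level == 1:  # INIT: line leaves the low idle level
                state, word, count = "read", 0, 0
        else:
            word = (word << 1) | level
            count += 1
            if count == WORD_BITS:  # implicit stop condition: fixed word length
                address = word >> (COMMAND_BITS + PAYLOAD_BITS)
                command = (word >> PAYLOAD_BITS) & ((1 << COMMAND_BITS) - 1)
                payload = word & ((1 << PAYLOAD_BITS) - 1)
                if address == chip_address:
                    accepted.append((command, payload))
                state = "idle"
    return accepted


line = [0, 0] + frame(address=0, command=0b0011, payload=0xABCDEF) + [0, 0, 0]
print(receive(line, chip_address=0))  # chip 0 executes the task
print(receive(line, chip_address=1))  # chip 1 returns to idle
```

Both chips read and interpret the full word, but only the addressed chip proceeds to the task execution, as in figure 4.11.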

The readout of configuration data was one of the most disputed parts, as it needed to tap into the sensitive, fast readout logic which takes care of the sampling of the hit information as well as the serialisation of data. This logic uses various clock domains, which require a very careful synthesis to guarantee a correct alignment of all clocks and the correct transmission of data. Therefore an implementation with minimal changes and without any conditions that bind the readout to the configuration interface was selected. The readout state machine is unchanged, but the data transmitted in one of the states is either a filling Komma word, as before, or the data provided by the configuration state machine. To prevent any dependence of the readout on the configuration state machine, this data transfer is implemented without a hand-shake and without any matching conditions between the configuration and readout clock domains during synthesis. No hand-shake means that the configuration interface provides the data to be sent to the readout, but does not receive an acknowledgement that the data was sent, making it possible that slow control data is lost. As the speed of the configuration interface is reduced compared to the readout, the data is available for some time before a new slow control data word is provided to the readout, which increases the probability that the data is read out. To make missing words detectable, the configuration interface adds a counter value to the data; a skipped counter value indicates a missed word.
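On the receiving side, the counter check could look like the following sketch. The counter width is not stated in the text, so the 8 bit default is a hypothetical choice, as is the function name `missed_words`. Repeated counter values are treated as the same word being seen again, not as a loss.

```python
def missed_words(counters, counter_bits=8):
    """Count slow control words lost between consecutive received counter values.

    counter_bits is an assumed width; the counter is taken to wrap around.
    """
    modulus = 1 << counter_bits
    missed = 0
    for previous, current in zip(counters, counters[1:]):
        if current == previous:
            continue  # same word received again, nothing lost
        missed += (current - previous - 1) % modulus
    return missed


print(missed_words([1, 2, 3]))         # no gap
print(missed_words([1, 2, 4]))         # word 3 is missing
print(missed_words([254, 255, 0, 1]))  # wrap-around without loss
```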

## 4.2.1 Logic implementation & Verification

To realise this complex logic, the same solution as already used for the readout state machine is chosen. The interface is implemented in the hardware description language Verilog, and logic synthesis is used to transform it into a circuit design. This also allows for a co-synthesis of the slow control and readout state machines. The synthesis is performed with the framework already in place for the readout state machine. The framework consists of a series of Tcl scripts which invoke the Cadence® tools Innovus and Genus to synthesize the code, and a Verilog testbench running at a reference clock of 160 MHz for verification. The code developed to test the slow control interface is directly transferred to the FPGA firmware to run it in the test setup.

As a first step the base floorplan is prepared by defining the size of the area which is used for synthesis, placing I/O pins in the periphery of this cell and providing a broad power grid to allow for an even distribution of power for the synthesis area. In the next step the code is translated into an implementation in logic gates, before performing an automated place-and-route routine to place and connect the logic gates on the prepared floorplan.

The verification of the slow control state machine with the testbench is performed after each step: for the Verilog code (register-transfer level), for the gate-level representation and finally after the place-and-route procedure, taking into account all signal line delays. As the slow control is running on a slower base clock and no additional timing constraints are introduced, no problems are encountered during the timing analysis. The introduction of slow control data words into the data stream also does not jeopardise the data integrity.

## 4.3 MUPIX9

The first version of this interface was implemented on MuPix9, shown in figure 4.12. Here it is available in two flavours: once synthesized as a standalone entity and once integrated into the readout state machine, as intended for the final chip. In the following, the MuPix9 implementation, data format and commissioning results are presented.

## 4.3.1 Implementation

Both implementations are depicted in figure 4.13. The standalone synthesis has no connection to the readout state machine or any further logic. The two slow control state machines share the same input pads. The goal of this version is to test the developed protocol with a focus on the data I/O and the register communication. The state machine is implemented as shown in figure 4.8.

At the time of design the exact register structure of the final MuPix chip was undefined. It was assumed to feature a parallel register structure, therefore this version is designed to service two parallel registers. Aside from the readback of register data, an additional interface is provided to read out a 12 bit value from an Analogue to Digital Converter (ADC). The ADC is not available on MuPix9, so only a dummy interface is foreseen. As it is the very first implementation, additional in- and outputs are added to provide adjustability and debug information. The clock division factor is adjustable



Figure 4.12: The MuPix9 chip with the standalone and co-synthesized state machines highlighted with a green and violet overlay, respectively. The pads on the left edge of the chip (blue overlay) provide the in- and output signals for the slow control implementations.

and the slow clock generated that way is sent out to a pad, which allows to check whether the clock division works properly.

In contrast to the desired final implementation, all signals are single-ended in- and outputs. Further, the clock for the configuration interface is separated from the readout reference clock. The input data word is chosen to be 32 bits long, consisting of a 4 bit address for chip identification, a 4 bit command word and a 24 bit payload. The size is chosen ad hoc, as 32 bit is a natural size for the existing FPGA interface and corresponds to the data word size of the readout. The 4 bit address is required, as up to 9 chips are connected to one bus in the Mu3e experiment. The commands, summarized in table 4.2, could be encoded in fewer bits. However, only a subset of the bit patterns available with 4 bit was selected, with the property that no command is transformed into another by a single bit flip. This protects them against errors during the reading of the input data as well as single event upsets.
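The single-bit-flip property of the command set can be checked directly. The short sketch below takes the six bit-identifiers from table 4.2 and computes their pairwise Hamming distances; the dictionary and function names are chosen here for illustration.

```python
from itertools import combinations

# Bit-identifiers as listed in table 4.2
COMMANDS = {
    "WriteDacRegister": 0b0011,
    "LoadDacRegister": 0b0101,
    "WritePixRegister": 0b0110,
    "LoadPixRegister": 0b1010,
    "ReadbackRegister": 0b1100,
    "ReadbackADC": 0b1001,
}


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


min_distance = min(hamming(a, b) for a, b in combinations(COMMANDS.values(), 2))
print("minimum Hamming distance:", min_distance)

# Every single-bit corruption of a valid command lands outside the command set.
valid = set(COMMANDS.values())
corrupted = {code ^ (1 << bit) for code in valid for bit in range(4)}
print("single bit flip can produce a valid command:", bool(corrupted & valid))
```

All six identifiers contain exactly two set bits, so any two of them differ in at least two positions and a single flipped bit always produces an unknown command.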

A similar approach is used for the implementation of the state identifier of the state machine. Here an 11 bit wide identifier is used with one bit for each state. This has the same protection effect as the choice of the command words, and it further allows the state changes to be realized in a shift register fashion, by shifting one bit instead of rewriting the complete state identifier every time.
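A one-hot state identifier of this kind can be modelled in a few lines. This is a minimal sketch, assuming that consecutive states occupy neighbouring bits and that the last state wraps around to the first; the actual state ordering on the chip is not given in the text, and the names `is_valid_state` and `next_state` are hypothetical.

```python
N_STATES = 11
MASK = (1 << N_STATES) - 1


def is_valid_state(state):
    """A one-hot identifier has exactly one of its 11 bits set."""
    return 0 < state <= MASK and state & (state - 1) == 0


def next_state(state):
    """Advance to the neighbouring state by rotating the single set bit."""
    return ((state << 1) | (state >> (N_STATES - 1))) & MASK


state = 1  # first state
for _ in range(N_STATES):
    state = next_state(state)
print(bin(state))  # back at the first state after 11 shifts

# Any single bit flip turns a valid identifier into an invalid one.
print(any(is_valid_state((1 << 3) ^ (1 << bit)) for bit in range(N_STATES)))
```

Flipping the set bit gives the all-zero pattern and flipping any other bit gives two set bits, so a single upset is always detectable as an undefined state.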

## 4.3.2 Commissioning

As the MuPix9 implementation offers various test points for integral functionalities such as the slow clock and the synchronous reset, the integrity of



Figure 4.13: MuPix9 slow control implementation. Green signals are only present for the standalone implementation, blue only for the co-synthesized version.

| Action Name      | Bit-identifier |
|------------------|----------------|
| WriteDacRegister | 0011           |
| LoadDacRegister  | 0101           |
| WritePixRegister | 0110           |
| LoadPixRegister  | 1010           |
| ReadbackRegister | 1100           |
| ReadbackADC      | 1001           |
Table 4.2: The command set available for the MuPix9 slow control implementation, serving 2 registers and the ADC dummy interface.

these features is checked with the oscilloscope and no issues have been found. With these two features tested, the sending of input data is the next step. Here as well, no difficulties have been observed, provided the division counters used on chip and FPGA are identical and synchronized as intended. The chip only recognizes data words with the correct chip address, and unknown command words do not have an effect. The write and load commands produce the desired signal patterns to control the shift registers. Both the write-register and read-ADC commands submit slow control data words into the data stream of the readout state machine, from which they are extracted.

The ultimate test of the configuration capability was performed as depicted in figure 4.14b. To configure the full chip via the slow control interface, both implementations need to be utilized. The co-synthesised slow control state machine is used to configure the first part of the register, while the stand-alone implementation needs to provide the clock and load signals for the second register part. As this operation mode was not foreseen on the chip carrier PCB, the signals had to be routed with single-ended cables as shown in figure 4.15. With this implementation the chip could be fully configured and used with an input clock of 125 MHz (setup limit) and the internal slow control clock running at 62.5 MHz (clock division factor 2), leading to a writing speed of 12.5 Mbit/s to the shift registers, more than double the speed that has been deduced in [52]. The correct configuration was inferred qualitatively by observing the power consumption of the sensor, the response to a radioactive source and the measurement of configurable on-chip voltages for several configuration cycles. The reason for the difference in register speed is most likely rooted in the length and distribution of the shift registers on MuPix8 and MuPix9, respectively. While MuPix8 features a register of almost 3000 bits with the shift register cells distributed over a length of more than 40 mm, MuPix9 has a much shorter register with less than 700 bits and a register path of less than 10 mm. Delay times created by these routing distances are the most likely cause of the smaller writing speed limit observed for MuPix8.

(a) Default MuPix9 register structure.

(b) Utilizing internal and external register access.

Figure 4.14: Shift register architecture for MuPix9.

## 4.3.3 Conclusion

On the MuPix9 chip both the stand-alone and the co-synthesized implementation have been tested successfully. All parts of the developed protocol work and allow for register writing speeds of up to 12.5 Mbit/s. This, however, has to be carefully tested for future register infrastructures, as length and distribution seem to play an important role in the achievable speed, which is a further argument for the parallel organisation of the registers.

With this proven, the development for the next chip builds on this baseline, fitting it to the chip's register interface and providing additional functionalities such as an injection mechanism. Further, a few shortcomings of this still simplistic implementation should be overcome in the next iteration. One is the length of the input data word, which is rather inefficient with 24 bit payload and 8 bit overhead. An expansion of the data word from 32 to 64 bit is envisioned here. Further, an error reporting system needs to be added, as right now undefined states or wrong command words do not have



Figure 4.15: Improvised signal routing to achieve the co-usage of the implemented slow control state machines. The various jumpers allow the chip to be run in different powering and slow control modes. The blue jumpers allow the chip address and slow control divider to be set.

a system-breaking effect, but go unnoticed and can therefore still corrupt the configuration process.

#### 4.4 MUPIX10

With the proof of principle achieved in MuPix9, the slow control implementation is now expanded and moulded to fit the requirements of the final chip. Although MuPix10 is still denoted an R&D chip, the requirements on the slow control interface are the same as for the final chip, as it is used for the production of the first modules. As no ADC implementation had become available, an ADC has to be implemented as well; it is partially integrated into and controlled by the configuration interface.

#### 4.4.1 MuPix10 Configuration Infrastructure

The MuPix10 register infrastructure is highlighted in figure 4.16. It uses 6 parallel registers. Three take care of the global chip configuration: BiasBlock (steering of analogue circuits), Voltage DACs (baseline and thresholds) and Digital-Configuration (clock speeds, readout state machine configuration). One register steers the access to test infrastructure such as Injection, Hitbus and AmpOut, and the remaining two registers are used to write and read the pixel RAM. Each pixel contains 7 individual bits implemented as SRAM cells: 3 bits for tuning per comparator and 1 bit which allows to mask the pixel in case it is too noisy. The readout of these RAM cells is possible via a bidirectional read-write infrastructure which utilizes a pre-charge

mechanism for the readout, which is not further discussed here. The registers are steered via 3 different interfaces - all external, SPI or the Mu3e slow control interface - which are selected via a multiplexer system implemented in the synthesized code.



Figure 4.16: The MuPix10 parallel register structure. The individual configuration approaches are selectable via external enable signals(EnSC or EnSPI).

#### 4.4.2 Implementation



Figure 4.17: The MuPix10 slow control implementation. Via a configuration bit the internal synchronous reset signal(green) is selected instead of the external signal. The slow control division factor is hard wired and the Reset signal is connected to the asynchronous chip reset (red).

The MuPix10 implementation of the slow control interface is depicted in figure 4.17. It is jointly synthesized with the readout state machine. All input signals are differential and the reference clock is shared between readout and slow control state machine. As on this chip the availability of pads

is very limited, no pads could be sacrificed for additional outputs of the slow control, which makes the debugging much more challenging. Ideally the division factor could also be chosen freely to find the optimum during commissioning, but here as well no pads are available, so the division factor is hard-wired on the chip to 8. Only the 4 bit address is variable, as required.

The state machine is expanded as described in figure 4.18. The input data word is extended to 64 bit to improve the payload-to-overhead ratio and to enable more data transfer with a single write-to-register command. Six individual command sets are supplied for the parallel registers. For the registers responsible for the pixel configuration, additional commands are introduced which enable a faster configuration process. Further, special commands have been introduced which allow to create an injection pulse, reset the registers globally and steer the newly added ADC. With these additions the number of required command words increases considerably and requires the expansion from 4 bit to 6 bit commands, leaving the data word with 10 bit overhead and 54 bit payload. The complete command list is found in table 4.4. Many of these commands, especially the injection and register reset, are intended to be used simultaneously on all chips of a module. To enable this, a broadcast address is implemented, which is recognized by all chips independent of their chip address.
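The 64 bit word layout and the broadcast mechanism can be sketched as follows. Several details are assumptions made for illustration only: the field order (address, command, payload, from the most significant bit), the broadcast address value `0xF` and the helper names `pack`, `unpack` and `accepts`. Only the field widths and the Inject identifier are taken from the text and table 4.4.

```python
ADDRESS_BITS, COMMAND_BITS, PAYLOAD_BITS = 4, 6, 54
BROADCAST_ADDRESS = 0xF  # hypothetical value, the real one is not given in the text
INJECT = 0b001000        # "Inject" bit-identifier from table 4.4


def pack(address, command, payload):
    """Assemble a 64 bit slow control word (assumed field order)."""
    if not (0 <= address < 1 << ADDRESS_BITS and 0 <= command < 1 << COMMAND_BITS
            and 0 <= payload < 1 << PAYLOAD_BITS):
        raise ValueError("field out of range")
    return (address << (COMMAND_BITS + PAYLOAD_BITS)) | (command << PAYLOAD_BITS) | payload


def unpack(word):
    """Split a 64 bit word into (address, command, payload)."""
    address = word >> (COMMAND_BITS + PAYLOAD_BITS)
    command = (word >> PAYLOAD_BITS) & ((1 << COMMAND_BITS) - 1)
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    return address, command, payload


def accepts(chip_address, word):
    """A chip reacts to its own address and to the broadcast address."""
    return unpack(word)[0] in (chip_address, BROADCAST_ADDRESS)


word = pack(BROADCAST_ADDRESS, INJECT, 0)
print([accepts(chip, word) for chip in range(9)])  # all nine chips of a module react

# Overhead fraction of the MuPix9 and MuPix10 word formats
print(8 / 32, 10 / 64)
```

The overhead fraction drops from 25% for the 32 bit word to about 16% for the 64 bit word, even though the command field grows by two bits.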



Figure 4.18: The MuPix10 slow control state machine, with error reports and full control of the ADC state machine.

Although the overall structure of the state machine remains unchanged two additions have been made, the ADC and error reports. While already in MuPix9 the state machine and the command words have been secured against single bit flips, there was no feedback of information in case something went wrong. With MuPix10 error reports are introduced, which highlight the type of error and supply some additional slow control information which helps to identify the possible issue. 3 types of errors are possible: Undefined state, undefined command, undefined state of the ADC. Undefined states could be generated either by external effects (power instability or SEUs) or by a bad transition between different states, which ideally

| Readout State | No SC Data  | With SC Data | Data Type       |
|---------------|-------------|--------------|-----------------|
| LdCol2        | 1c aa 1c aa | 1c aa 1c aa  | Identifier      |
| LdPix         | 2e 27 d3 3b | 2e 27 d3 3b  | Chip Time       |
| PD            | bc bc bc bc | 5c bc 5c bc  | SC Identifier   |
| LdCol1<1>     | bc bc bc bc | ff ff ff ff  | SC Data [31:0]  |
| LdCol1<3>     | bc bc bc bc | 60 7f ff ff  | SC Data [63:32] |
| LdCol1<5>     | bc bc bc bc | bc bc bc bc  | Komma Filler    |
| LdCol1<7>     | bc bc bc bc | bc bc bc bc  | Komma Filler    |
| LdCol2        | 1c aa 1c aa | 1c aa 1c aa  | Identifier      |

Table 4.3: Side-by-side data example for a readout cycle without hit information in hexadecimal representation. Slow control (SC) data is marked by a Komma word series containing K28.2 (0x5C) as identifier.

should never happen. Wrong commands are either a mistake of the user or the result of a wrongly sampled bit during the reading of the data word. The latter hints at a non-optimal phase relation between input data and sampling clock, which can be corrected by the controlling device. The error reports, as well as readback data and data provided by the ADC, are now 64 bit long words, which requires more bandwidth. The principle of the data readout remains unchanged: the data is provided to the readout state machine without a hand-shake and is sent out instead of Komma words. The larger payload requires two Komma words to be replaced. The slow control information is sent directly after the hits. Without further adjustments the readout FPGA is not able to distinguish between hits and slow control data. Therefore the 64 bit of slow control data are preceded by a newly introduced word containing the Komma word K28.2 (0x5C), which marks the beginning of the slow control data. Table 4.3 shows an exemplary data stream with and without the slow control data.
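How a receiver might pick the slow control word out of the stream can be sketched with the example data of table 4.3. This is an illustration only: it matches on the decoded byte values of the identifier word and ignores the 8b/10b control-character flag that a real receiver would use, it assumes the most significant byte is written first within each 32 bit word, and the function name `extract_sc_words` is hypothetical.

```python
SC_IDENTIFIER = bytes.fromhex("5c bc 5c bc")  # word containing K28.2 (0x5C)


def extract_sc_words(words):
    """Return the 64 bit slow control words found after an SC identifier word."""
    results = []
    i = 0
    while i + 2 < len(words):
        if words[i] == SC_IDENTIFIER:
            low = int.from_bytes(words[i + 1], "big")   # SC Data [31:0]
            high = int.from_bytes(words[i + 2], "big")  # SC Data [63:32]
            results.append((high << 32) | low)
            i += 3
        else:
            i += 1
    return results


# Readout cycle of table 4.3 (first word transcribed as 1c aa 1c aa)
with_sc = [bytes.fromhex(w) for w in (
    "1c aa 1c aa", "2e 27 d3 3b", "5c bc 5c bc", "ff ff ff ff",
    "60 7f ff ff", "bc bc bc bc", "bc bc bc bc", "1c aa 1c aa")]
without_sc = [bytes.fromhex(w) for w in (
    "1c aa 1c aa", "2e 27 d3 3b", "bc bc bc bc", "bc bc bc bc",
    "bc bc bc bc", "bc bc bc bc", "bc bc bc bc", "1c aa 1c aa")]

print([hex(w) for w in extract_sc_words(with_sc)])
print(extract_sc_words(without_sc))
```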

#### 4.4.3 *The ADC*

The design of the ADC is shown in figure 4.19. It is a hybrid design using synthesized elements of the slow control state machine as well as custom designed parts. Control and readout are provided by the slow control interface. The ADC is implemented as a counter-ramp ADC. A counter generated in the synthesized part is connected to a voltage DAC, thereby creating a voltage ladder while counting up. With the help of a rail-to-rail comparator a selectable voltage is discriminated against the voltage ramp. When the ramp voltage surpasses the voltage level to be measured, the comparator switches and thereby stops the counter. The counter value corresponds to the digitized voltage level. The 8 bit voltage DAC used covers a range of 1.8 V, therefore providing voltage steps of $1.8\,\text{V}/256 \approx 7\,\text{mV}$. The counter speed is adjustable through the slow control interface and is

slowed down by an additional factor of 1024 compared to the slow control state machine's base clock. The voltages to be measured are selected via a multiplexer. The measurement is repeated in an endless loop until it is stopped by the slow control interface. 3 different measurement modes are available: the measurement of a single voltage, of a sequence of 4 selected voltages or of all 32 possible voltages. When the digitized value is determined, a 64 bit word is sent out containing the measured value as well as the settings and multiplexer value used in the measurement.
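The counter-ramp principle can be illustrated with an idealised model. This sketch assumes a noise-free, perfectly linear voltage ladder starting at 0 V with 256 equal steps over 1.8 V, and a comparator without offset; the function name `counter_ramp_adc` and the treatment of out-of-range inputs are choices made here for illustration.

```python
V_RANGE = 1.8   # volts covered by the 8 bit voltage DAC
N_STEPS = 256   # number of DAC codes
LSB = V_RANGE / N_STEPS


def counter_ramp_adc(v_in):
    """Count up until the ramp surpasses v_in and return the counter value.

    Returns None if the ramp never surpasses the input within the DAC range.
    """
    for counter in range(N_STEPS):
        if counter * LSB > v_in:  # comparator switches and stops the counter
            return counter
    return None


print(f"step size: {LSB * 1e3:.2f} mV")
print(counter_ramp_adc(0.5))  # first code whose ladder voltage lies above 0.5 V
print(counter_ramp_adc(2.0))  # outside the 1.8 V range
```

Because the comparison runs from low to high codes, the conversion time in this scheme grows with the voltage being measured.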



Figure 4.19: Schematic depiction of the ADC implementation. The signals in blue are provided by the slow control state machine.



Figure 4.20: The layout of the ADC. The VDAC cell which creates the voltage ladder extends even further and takes up the most space.

#### 4.4.4 First Commissioning

When MuPix10 was first commissioned, a bug was found in the configuration register, which is unable to permanently store loaded values. To overcome this, the load signal needs to be enabled at all times, constantly loading the value from the register cell. The reason for this failure is not fully understood yet. As this register is an integral part of the reset infrastructure of the entire chip, it is impossible to run the chip either with all-external register control or with the newly developed slow control interface, which is designed to apply the load signal only for a defined amount of time. Only the SPI interface is able to work around this issue and use the chip to its full extent. When using the new MuPix10 slow control interface the chip is in reset state, which means the chip is not configured and therefore the readout of slow control data via the data link is not possible. As the slow control state machine itself does not depend on any chip settings, it is nevertheless fully operational and performs its tasks.

| Action Name           | Bit-identifier | Effect                        |
|-----------------------|----------------|-------------------------------|
| WriteDacRegister      | 111000         | Write Bias DAC register       |
| LoadDacRegister       | 000111         | Load Bias DAC register        |
| WriteConfRegister     | 110100         | Write Config register         |
| LoadConfRegister      | 001011         | Load Config register          |
| WriteVDACRegister     | 110010         | Write VDAC register           |
| LoadVDACRegister      | 010011         | Load VDAC register            |
| WriteColRegister      | 110001         | Write PixelRam Write register |
| LoadColRegister       | 100011         | Load PixelRam Write register  |
| WriteTestRegister     | 101100         | Write Test register           |
| LoadTestRegister      | 001101         | Load Test register            |
| WriteTDACRegister     | 101010         | Write TDAC register           |
| LoadTDACRegister      | 010101         | Load TDAC register            |
| ReadbackDACs          | 100110         | Readback for all registers    |
| ReadbackTDACs         | 100101         | Readback for the pixel bits   |
| SteerADC              | 100000         | Steer the ADC                 |
| Inject                | 001000         | Trigger an injection          |
| ResetBiasBlocks       | 000010         | Reset the registers           |
| ShiftColRegisterbyOne | 111101         | SHIFT-BY-ONE                  |
| SyncReset [64bit]     |                | Synchronous Reset             |

Table 4.4: MuPix10 slow control state machine commands. Untestable commands are marked in red, partially testable ones in orange and fully testable ones in green.

Due to this, only parts of the slow control functionality can be tested properly. Especially the testing of the register control is severely handicapped, but it is still investigated by switching between SPI and Mu3e slow control mode. Table 4.4 highlights the commands affected. The newest addition to the slow control, the ADC, is not affected and can be tested. A further bug was found in the sending of slow control data. Due to a missing reset, the slow control data word is sent in every cycle, not only once as intended. This does not jeopardize any sensor or readout functionality. On the contrary, it even helps, as it allows the most recent slow control data word to be read directly from the data stream. Furthermore, it implicitly overcomes the issue of a lossy data transmission and can be considered a permanent solution.

SYNCHRONOUS RESET The synchronous reset is tested in SPI-configured mode. With the help of a configuration bit the chip allows to choose between the externally supplied synchronous reset signal and the synchronous reset generated by the slow control state machine. Here a first qualitative test was performed. When the synchronous reset is applied, the timestamp counters are halted and every hit is read out with a timestamp value of zero. On release the timestamps resume counting. As the release is synchronized to the fast reference clock, a high precision is expected. For the module use with up to 9 chips in parallel this synchronisation mechanism has to be tested in detail.

WRITING AND LOADING Due to the bug in the configuration register the writing and loading infrastructure cannot be fully tested. However, by switching between SPI and Mu3e slow control mode at least the register writing and readback capability can be tested. When a write command is sent, the value of the last register cell is read back for each clocking cycle. If the readback command was sent beforehand, the stored values are read out. Due to the reset issue these values correspond to the reset values of the register. The readback word is stored in a slow control register. By switching back to SPI mode the chip is configured and, due to the bug in the slow control data handling, the last readback word can be extracted from the data stream, proving the functionality of the register write and readback capability. With each writing command the word counter, introduced to be able to detect missing words, is increased by one. This works for all registers. The readback of the pixel RAM values is not tested.

The writing and loading procedure is tested with the configuration register. It contains one bit which is responsible for the reset of all the other registers and is the reason for the large impact of the configuration register malfunction. During commissioning it was found that when this bit is set by applying the load signal, the current consumption of the digital voltage supply increases significantly and a visible spike appears on the supply voltage. It is assumed that this effect is also caused by the malfunction. But as it is a clear indicator that this bit was set successfully, these observables are used to check the writing and loading functionality of the configuration register.

To perform this test, two different bit strings are sent into the configuration register. The first contains only zeros. In this case no change in current and no voltage spike are expected when the load signal is applied. For the second string the reset bit is set to one and the rest to zero. When the load signal is sent, a clear voltage spike and current increase are visible, confirming the correct configuration. A one in a different bit position does not have an effect. This test confirms that the correct bit pattern is written to the register and the loading command works. The test has been performed at the nominal reference frequency of 125 MHz. With the division factor of 8 hard-wired on the chip for the slow control, this results in a writing speed of 3.125 Mbit/s.

To test whether the writing and the slow control can operate at higher speed, the reference clock was increased up to 250 MHz (setup limit), corresponding to 6.25 Mbit/s. In this test writing and loading also work properly: no bits are skipped in the register, and enabling the loading signal causes the spike and power change.
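The writing speeds quoted in this chapter all follow from the reference clock, the clock division factor and the five-state loop of figure 4.9. The sketch below reproduces them under the assumption that each written bit takes exactly one pass through the five states; the function name and the configuration labels are chosen here for illustration.

```python
STATES_PER_BIT = 5  # five-state loop of figure 4.9, one pass per written bit (assumed)


def write_speed_mbit(reference_clock_mhz, division_factor, states_per_bit=STATES_PER_BIT):
    """Register writing speed in Mbit/s for a given reference clock and divider."""
    return reference_clock_mhz / division_factor / states_per_bit


configurations = [
    ("MuPix9, division factor 2", 125.0, 2),
    ("MuPix10, nominal clock", 125.0, 8),
    ("MuPix10, setup limit", 250.0, 8),
]
for label, clock_mhz, division in configurations:
    print(f"{label}: {write_speed_mbit(clock_mhz, division)} Mbit/s")
```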

ERROR REPORTS The newly introduced error reporting system is tested in SPI-configured mode, which allows the readout of slow control data words. The error report on invalid commands is tested by deliberately sending wrong commands. This works as expected, and an error message containing the input data word as debugging information is sent out via the data stream. The error reports from invalid state values of the state machine cannot be provoked deliberately. However, after power-up the first transmitted slow control word is the error message for an invalid state, thereby confirming the functionality. An error message from a wrong ADC state has not been observed so far.

ADC CONTROL & READOUT The ADC control is found to be fully functional: the ADC can be configured and measurements can be performed. The multiplexer is used to select the desired chip-internal voltages. This is checked at an external test point of the multiplexer output, which also allows an external voltage level to be supplied for a calibration measurement. For a first test the ADC was configured to measure the baseline voltage with the slowest possible sampling speed. The result of the ADC measurement is written into the data stream and is read out directly. A scan of the baseline voltage, provided by the chip-internal VDAC, is performed and measured with the ADC. The result is plotted in figure 4.21. The first test with the chip in default configuration shows a slight non-linearity of the ADC measurement, as the difference between set and measured values increases. As voltage drops have been observed on the sensor, these might create a difference between the supply voltage levels applied to the baseline VDAC and the VDAC of the ADC and thereby create the non-linearity. To test this, the measurement is repeated with a reduced current on the sensor, which also reduces the voltage drops. Here a systematic but constant offset between applied and measured values is observed, showing that the non-linearity is created by the voltage drops. The measured offset can be caused by various effects: a supply voltage difference between the two VDACs, an offset caused by the comparator inputs or a systematic effect of the measurement. As the voltage ladder approaches the to-be-measured voltage from below, a noise component on the two voltages creates a tendency to measure smaller values. A detailed calibration measurement would give more insight.

(a) DAC value plotted against the measured ADC value for a scan performed with the baseline VDAC shows a good linearity.

the difference of set and measured value.

Figure 4.21: Commissioning measurement of the ADC with a scan of the baseline voltage.

#### 4.4.5 Module Configuration Strategies & Performance Projection

#### 4.4.5.1 Bus Utilisation

The developed slow control interface offers 3 different possibilities to configure the chips connected to the same bus: one chip at a time, all chips at the same time using the broadcast command, or in parallel. The parallel approach is illustrated in figure 4.22. While one chip is executing a task, in most cases a write register command, it reads no new input data until the task is done. This period is used to send further commands to the other chips and thereby create an interleaved configuration structure. However, care has to be taken that a chip that is currently executing a task does not return into the wait state in the middle of another data transfer. If this happens, a partial data word is read and could accidentally match the chip address and a valid command word. For the MuPix10 implementation a data transmission, from the initialization until the chip is either back in the wait state or executing the task, takes 67 slow control cycles. The execution of a write command needs 272 cycles (5x54+2). These numbers match up quite well, as 272 is 4x68. This means that by sending data words in 68 cycle intervals, 4 more data words are sent while one chip is performing a write task. This helps to highly parallelise the configuration process.

| Sin   |      | Input data       |              | Input data       | Input data         |
|-------|------|------------------|--------------|------------------|--------------------|
| Chip0 | Idle | Read & Interpret | Execute Task |                  |                    |
| Chip1 | Idle | Read & Interpret | Idle         | Read & Interpret | Execute Task       |
| Chip3 | Idle | Read & Interpret | Idle         | Read & Interpret | Idle, Execute Task |

Figure 4.22: Multi chip configuration using an interleaving command structure.
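The cycle arithmetic behind the interleaving can be sketched as follows. The constant and function names are illustrative only; the numbers (54 bits per write, 5 cycles per bit plus 2, and a word interval of 68 cycles) are the ones quoted in the text.

```python
BITS_PER_WRITE = 54    # bits transferred in one write task
CYCLES_PER_BIT = 5     # from the quoted write duration 5 x 54 + 2
WORD_INTERVAL = 68     # spacing between data words on the bus, in cycles


def write_execution_cycles(bits=BITS_PER_WRITE):
    """Slow control cycles a chip spends executing one write command."""
    return CYCLES_PER_BIT * bits + 2


def words_during_execution(interval=WORD_INTERVAL):
    """Data words that fit on the bus while one chip executes a write."""
    return write_execution_cycles() // interval


print(write_execution_cycles())        # cycles for one write execution
print(words_during_execution())        # further words sent in that time
print(1 + words_during_execution())    # chips that can be kept busy
```

With these inputs the execution takes 272 cycles and leaves room for 4 further data words, so up to 5 chips can be served without idle time on the bus.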

#### 4.4.5.2 Module Configuration Strategies

For the module configuration 3 types are covered: the base configuration, the zeroing of all pixel RAM and the full configuration of the pixel RAM. To our current best knowledge, all chips behave very similarly with the same settings, which enables the definition of a base configuration. This configuration is a default setting for all sensors based on the Bias, Configuration and VDAC registers. After supplying the base configuration to a chip it is fully functional, independent of the configuration of the pixel RAM. This means that by broadcasting the base configuration to a module, all chips run in default mode. The time required to perform this base configuration is independent of the module size. If individual adjustments are necessary, e.g. of the threshold levels, these are transmitted later. Small adjustments have only a minor impact on the overall power consumption of a sensor. The base configuration therefore establishes an almost final state concerning power consumption, which fixes the voltage drops on the flex print and the dissipated heat and is an integral step of the detector start-up.

The zeroing of all pixel RAM brings the sensor into an absolutely defined state, although with the correct base configuration the pixel RAM information has no effect anyway. The zeroing is appended to the broadcast of the base configuration. It is performed much faster than the full configuration of the pixel RAM matrix, as it requires writing the TDAC register only once with all zeros. For the full configuration the TDAC register has to be filled with the correct bit values for each RAM column.

Tables 4.5 and 4.6 summarize the duration in slow control cycles of the commands and the command combinations needed to write and load the different registers. The SHIFT-BY-ONE command is specifically designed to optimize the configuration of the pixel RAM cells. The idea is based on the fact that only one column of RAM cells is written at a time. This means there is only a single bit set to 1 in the write-enable register. Instead of performing a full write and load cycle with this register to enable the next column for writing, a function was implemented which performs one clocking cycle, shifting the 1 to the next register cell, and executes the loading process for the pixel RAM cells. As the pixel RAM configuration is an iterative process and the loading process has to be performed 896 times to fully configure the pixel matrix, the SHIFT-BY-ONE command heavily decreases the number of slow control cycles required for the configuration, while enabling the simplest form of column by column RAM writing.

| Command        | Duration [Number of Slow Control cycles]  |
|----------------|-------------------------------------------|
| READ DATA WORD | 67 (68)                                   |
| WRITE REGISTER | $67 + (5 \times 54) + 2$                  |
| LOAD REGISTER  | 67 + WAIT + 2                             |
| SHIFT BY ONE   | $67 + (5 \times 1) + 2 \times (WAIT + 2)$ |

Table 4.5: Duration of command transmission and execution.

| Register               | Length [bits] | Command Combination               |
|------------------------|---------------|-----------------------------------|
| Bias                   | 210           | $4 \times WRITE + 1 \times LOAD$  |
| Configuration          | 90            | $2 \times WRITE + 1 \times LOAD$  |
| Voltage DAC            | 80            | $2 \times WRITE + 1 \times LOAD$  |
| Test                   | 896           | $17 \times WRITE + 1 \times LOAD$ |
| PixelRAM: write enable | 896           | $17 \times WRITE + 1 \times LOAD$ |
| PixelRAM: TDAC         | 512           | $10 \times WRITE + 1 \times LOAD$ |

Table 4.6: Command combinations for the configuration of individual registers.
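The number of WRITE commands per register in table 4.6 follows from the register length and the 54 bits transferred per write task. A minimal sketch of that rounding, with illustrative names:

```python
import math

BITS_PER_WRITE = 54  # bits transferred in one write task

# Register lengths in bits as listed in table 4.6.
REGISTER_BITS = {
    "Bias": 210,
    "Configuration": 90,
    "Voltage DAC": 80,
    "Test": 896,
    "PixelRAM: write enable": 896,
    "PixelRAM: TDAC": 512,
}


def write_commands(length_bits, bits_per_write=BITS_PER_WRITE):
    """WRITE commands needed to fill a shift register of the given length."""
    return math.ceil(length_bits / bits_per_write)


for name, bits in REGISTER_BITS.items():
    print(f"{name:24s} {write_commands(bits):2d} x WRITE + 1 x LOAD")
```

Each register additionally needs one LOAD command, independent of its length.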

#### 4.4.5.3 Performance projection

Based on these numbers, the knowledge of the shift register lengths and the number of bits transferred in one write task (54), the time required to configure MuPix10 is determined. This calculation was performed for 3 different scenarios: single chip, 3 chip bus and 9 chip bus, utilising the different approaches discussed above.

Table 4.7 summarizes the command combinations required to perform the pixel RAM configuration in different styles. The strategy for the interleaving approach and the assumptions made are outlined in the following. As discussed above, up to 5 write commands can be strung together before the first chip finishes the execution of its task. For a 5 chip module this would be perfect, as no waiting times would be generated. In the case of the 3 chip module the input bus has to wait until the task of the first chip is finished before sending the next command. For the 9 chip module it means that the first chip has to wait for its next command after finishing its task, as the 6th, 7th, 8th and 9th chip have not received their WRITE task yet. As the LOAD or SHIFT-BY-ONE commands have to be sent to each sensor, they are broadcast after all writing tasks are finished. The interleaving method only makes sense when the sensors need different information, so mainly for the pixel RAM configuration.

| Type         | Command Combination                                                                               |
|--------------|---------------------------------------------------------------------------------------------------|
| Brute Force  | $896 \times ((10+17) \times \text{WRITE} + 2 \times \text{LOAD})$                                 |
| Smart        | $896 \times ((10+1) \times \text{WRITE} + 2 \times \text{LOAD})$                                  |
| Shift-by-one | $896 \times ((10 \times \text{WRITE} + 1 \times \text{LOAD}) + 1 \times \text{SHIFT-BY-ONE})$     |
| Zero all     | $(10 \times \text{WRITE} + 1 \times \text{LOAD}) + 896 \times \text{SHIFT-BY-ONE}$                |

Table 4.7: Command combinations to perform the pixel RAM configuration. "Brute force" writes both registers fully every time, while "smart" uses a similar approach to shift-by-one, but with one write command which shifts the enable bit by 54 instead of 1.

For the time calculation a reference clock speed of 125 MHz and a slow control division factor of 8 are used, which gives a 64 ns long slow control cycle. The WAIT cycles, which define the duration of the load signal, are set generously to 10. The value required to guarantee a correct loading process has to be determined experimentally. During the first tests of the slow control state machine a WAIT of 1 sufficed as well, but this needs to be tested for all registers, not only the configuration register. The results of the time calculation for different approaches and chip numbers are listed in table 4.8.

| Type      | Approach     | Single Chip | 3 Chips | 9 Chips |
|-----------|--------------|-------------|---------|---------|
| Base      | chip by chip | 190 µs      | 570 µs  | 1710 µs |
| Base      | broadcast    | 190 µs      | 190 µs  | 190 µs  |
| Base      | interleaving | 190 µs      | 260 µs  | 346 µs  |
| PixelRAM  | Brute Force  | 535 ms      | 746 ms  | 972 ms  |
| PixelRAM  | Smart        | 224 ms      | 309 ms  | 411 ms  |
| PixelRAM  | Shift-by-one | 205 ms      | 283 ms  | 377 ms  |
| Base&Zero | broadcast    | 5.9 ms      | 5.9 ms  | 5.9 ms  |

Table 4.8: Calculated time durations for different scenarios. The PixelRAM configuration makes use of the interleaving command structure.
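The single-chip and broadcast entries of table 4.8 can be approximated directly from tables 4.5 to 4.7. The sketch below is not the calculation used for the table: it assumes a data word transmission of 68 cycles (table 4.5 lists 67 (68)) and WAIT = 10, and it does not model the interleaving schedule, so the 3 chip and 9 chip PixelRAM columns are not reproduced. All names are illustrative.

```python
CYCLE_NS = 64    # 125 MHz reference clock with division factor 8
WAIT = 10        # duration of the load signal used in the text
TRANSMIT = 68    # assumption: table 4.5 lists 67 (68) cycles per data word

# Command durations in slow control cycles, following table 4.5.
WRITE = TRANSMIT + 5 * 54 + 2
LOAD = TRANSMIT + WAIT + 2
SHIFT_BY_ONE = TRANSMIT + 5 * 1 + 2 * (WAIT + 2)


def cycles(n_write=0, n_load=0, n_shift=0):
    """Total slow control cycles for a combination of commands."""
    return n_write * WRITE + n_load * LOAD + n_shift * SHIFT_BY_ONE


def to_ms(n_cycles):
    """Convert slow control cycles to milliseconds."""
    return n_cycles * CYCLE_NS * 1e-6


# Base configuration: Bias (4 WRITE), Configuration (2), Voltage DAC (2),
# each followed by one LOAD (table 4.6).
base = cycles(n_write=4 + 2 + 2, n_load=3)

scenarios = {
    "Base": base,
    "Brute Force": 896 * cycles(n_write=10 + 17, n_load=2),
    "Smart": 896 * cycles(n_write=10 + 1, n_load=2),
    "Shift-by-one": 896 * cycles(n_write=10, n_load=1, n_shift=1),
    "Base&Zero": base + cycles(n_write=10, n_load=1, n_shift=896),
}

for name, n in scenarios.items():
    print(f"{name:13s} {to_ms(n):9.3f} ms")
```

Under these assumptions the results land within roughly one percent of the single-chip column of table 4.8; the small residual differences depend on whether 67 or 68 cycles are used for the transmission.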

#### 4.4.6 Conclusion

The MuPix10 implementation of the slow control state machine was successfully commissioned within the limits set by the malfunctioning configuration register. Newly added functionality such as the error reports and the ADC is working. The ADC is used for the measurement of on-chip voltages. A detailed characterisation of the ADC needs to be performed to investigate the possible non-linearity of the circuit and to calibrate it.

The functionality of the register interface could only be investigated qualitatively, using a workaround and the damaged configuration register. The readback function of the register works and the information is transmitted correctly to the data stream. At the nominal setting with 125 MHz reference clock speed and a slow control clock division factor of 8 the registers are written with a bit rate of 3.125 Mbit/s. With the proposed highly optimised schemes a nine chip module is fully configured with individual chip pixel RAM configuration in less than 400 ms, compared to about 4.5 s in an unoptimized chip by chip approach. Furthermore, a module is configured into a fully functioning state in less than 6 ms. A test with doubled reference clock speed suggests that even higher register speeds could be feasible. Therefore the third bit of the slow control division factor should be made adjustable via a pad, which allows a choice between a division factor of 8 or 4, thereby adding the option to double the speed. This speed-up would halve the time required for sensor configuration, but at the same time undermine the current implementation of the handshake-free output data handling of the slow control: as the speed increases, the probability of data losses increases drastically. Two options are available to work around this issue. The first is a slight rework of the data handling using an acknowledge of new slow control data on the side of the readout state machine. The second is to embrace the current "bug" of sending the same data word multiple times and leave it to the FPGA to detect the individual words, which is feasible, as the words contain a counter element. Both solutions fix the issue of possible data losses entirely.

| Type     | Data length | Single Chip | 3 Chips | 9 Chips |
|----------|-------------|-------------|---------|---------|
| PixelRAM | 10+54       | 205 ms      | 283 ms  | 377 ms  |
| PixelRAM | 10+86       | 198 ms      | 272 ms  | 370 ms  |
| PixelRAM | 10+128      | 228 ms      | 312 ms  | 430 ms  |

Table 4.9: Calculated configuration times for the 3 different input data lengths.

Another option that could further increase the configuration speed is an expansion of the input data word. However, as shown in table 4.9, for a data word of 96 bit length with 10 bit overhead this gives only a minor improvement, while the performance even degrades for a 128 bit long data word. Very long on-chip registers could profit from this expansion. However, for the relatively short registers in MuPix10, and thereby also MuPix11, the input data word for the load command is also longer, counterbalancing the gain through the reduced overhead of the write commands. Additionally, the interleaving of data words does not work as effectively anymore.

Overall, the implementation of the slow control interface was very successful. Although not every functionality could be tested with MuPix10, the interface is, with a minor adaptation of the data output handling, usable as the chip's slow control interface for the Mu3e experiment. Concerning the module requirement of a minimal pin-out, a perfect solution was found. With the incorporation of the synchronous reset into the slow control no additional differential signal is required, leaving the pin-out with 2 differential input lines as before.

# 5

## ANALOGUE SIGNAL TRANSMISSION IN LARGE SCALE MONOLITHIC CHIPS

A peculiarity of MuPix and ATLASPix chips, as well as many other large fill-factor designs, is the transmission of undigitised pixel hit pulses over long point-to-point connections from the active pixel matrix into the periphery. Especially in large scale chips the long transmission lines need to be routed very densely and form parasitic capacitances with their neighbours, which lead to crosstalk and measurable delays for the transmitted signal pulse. In this chapter the current knowledge obtained from crosstalk and delay measurements is presented and models are created to describe the crosstalk and delay effects. Based on the experimental results the routing of the transmission lines on MuPix10 is optimized and the results obtained with the MuPix10 chip are presented. Finally the prospects for further improvement of the routing on MuPix11 are investigated. Part of the presented matter was already covered in [69]; similar wording may occur.

#### 5.1 INTRODUCTION

Current large fill-factor HV-MAPS designs such as the MuPix chips make use of point-to-point connections between the in-pixel electronics and their peripheral cells. This straightforward approach comes with some limitations, as for large scale sensors several metal layers are required as routing space. Even at the highest routing density allowed by the process, the realisation of MuPix10 with 250 pixels of $80 \times 80 \mu m^2$ in one column is barely possible with two metal layers. This poses a limitation on the achievable pixel size for large scale designs of this type and also affects their performance, as observed for MuPix8, with only 200 pixels per column but the identical routing density.

MuPix8 utilised the simplest possible routing of exact point-to-point connections, which leads to an increasing signal line length along the column, as shown in figure 5.1a. Due to the extreme routing density, neighbouring signal lines are in very close proximity and form parasitic capacitances with each other. This allows a particle induced amplifier pulse traversing a signal line to couple to its neighbours and induce small pulses, referred to as crosstalk. If these pulses are large enough to cross the threshold of the victim's discriminator, a hit is created, as depicted in figure 5.2.

Figure 5.1b highlights the case of a crosstalk event in both neighbouring lines. These artificial hits, which are not induced by a particle, will be read out and form an additional load for the readout and all subsequent systems in the Mu3e experiment. Investigations of the crosstalk for MuPix8 showed two alarming properties of this routing scheme.



(a) Routing with increasing line length.

(b) Crosstalk to both line neighbours (red).

Figure 5.1: MuPix8 routing scheme and triple crosstalk topology.

The first is the probability of crosstalk occurrence, as measured in figure 5.3. The longer the signal line (the higher the row address), the more crosstalk is observed. This is explained by the scaling of the parasitic coupling with the line length. For the longest signal lines of MuPix8 with 16 mm the probability to cause a crosstalk hit in both neighbours was found to be 35%, which linearly extrapolated to MuPix10 with a maximum line length of 20 mm would give rise to a 48% probability to cause two additional crosstalk hits. Averaged over the full sensor this gives a crosstalk probability per charged particle hit of 18%, which can be interpreted as 1.36 hits per crossing particle, a 36% overhead. In phase 1 of the Mu3e experiment this would account for more than 2 MHz of additional hits on the sensors with the highest occupancies (5 MHz).
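As a worked check of the quoted overhead, and assuming that every crosstalk event counted in the 18% adds exactly two hits (one in each neighbouring line, as in the triple topology of figure 5.1b), the mean number of hits per crossing particle follows as:

$$\langle N_{\mathrm{hits}} \rangle = 1 + 2 \times 0.18 = 1.36$$

Crosstalk to only one neighbour is not included in this number.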

It has to be remarked that this is a conservative estimate, as crosstalk to only one neighbour was ignored and the data only reflect the case of an $80 \Omega$ cm sensor with -50 V applied high voltage, approximately $20 \mu$m depletion depth, for perpendicularly crossing particles. In the final application in Mu3e a depletion depth of $30 \mu$m is desired and the traversing particles will not cross perpendicularly, both of which increase the trajectory length inside the detection volume and thereby the amount of deposited charge. This is expected to increase the number of large pulses sent over the signal line, subsequently causing more crosstalk.

A further disadvantage of this routing scheme is connected to the crosstalk topology. As neighbouring lines are connected to neighbouring pixels, a crosstalk hit will form a cluster with the pixel that was hit by the particle. Without further investigation, clusters created by crosstalk and actual clusters created by charge sharing are indistinguishable. Therefore they cannot simply be used in a cluster-based reconstruction without introducing biases, especially for inclined tracks, where the cluster shape also contains directional information.

Figure 5.2: Crosstalk pulse induction in MuPix signal lines.

Figure 5.3: The row dependence of the crosstalk probability measured for MuPix8 [53].

While the correlation of crosstalk with the signal line routing is indisputably proven, the observed position dependent delay, which also correlates with the pixel's signal line length, so far lacks an explanation. It is natural to assume a correlation with the line's capacitance, but as far as the MuPix8 results are concerned, a voltage drop across the sensor cannot be excluded as an explanation either, especially for the column dependence. For MuPix8 with a line length up to 16 mm the observed delay difference between shortest and longest line is about 40 ns. With even longer lines in MuPix10 (20 mm) this difference would approach 50 ns. Together with the timewalk effect this delay limits the uncorrected raw time resolution of the sensor. While in characterisation measurements offline corrections can be applied to compensate these effects and access the chip's intrinsic time resolution, this is not necessarily possible in an online system such as the Mu3e experiment. Therefore a control and reduction of the delay is desirable.

Figure 5.4: The position dependent delay as measured for MuPix8 [50].

With the extrapolated numbers for crosstalk and sensor delay at hand, a transfer of the simple routing scheme to the full scale MuPix10 is not advisable. Therefore an alternative, more involved routing scheme was explored for MuPix10, based on the empirical observations from the MuPix8 design. Furthermore, both the crosstalk and the delay phenomena are studied on a fundamental but more detailed level than done so far, to gain a deeper understanding of the effects and to extract valuable input for layout considerations without the necessity of a parasitic extraction of the signal lines.

#### 5.2 THE SIGNAL LINE IN MUPIX-LIKE CHIPS

In the MuPix approach the signal line is driven by a source follower in the active pixel, which sends the analogue pulse from the amplifier into the periphery. There the line is coupled to the comparator via an RC high pass filter, as depicted in figure 5.5. In this picture the signal line is idealised as a pure electrical interconnect without any contribution to the circuitry. It is, however, an extended object with sizeable capacitance and resistance, which needs to be treated as part of the signal path.

Figure 5.6 shows a typical signal routing scenario using two metal layers, as found in MuPix8. The dominant capacitances are highlighted and are estimated using a parallel plate capacitance approach. Possible contributions from fringe capacitances are ignored in the following.

The parallel plate capacitance is estimated using Equation 5.1 with the vacuum permittivity  $\epsilon_0$ , relative permittivity  $\epsilon_r = 4.1$ , the capacitor area *A* and the plate distance *d*.



Figure 5.5: Sketch of the MuPix signal line.

$$C = \epsilon_0 * \epsilon_r * \frac{A}{d} \tag{5.1}$$

For the capacitance between two metal lines within a metal layer this gives Equation 5.2

$$C = \epsilon_0 * \epsilon_r * \frac{H * L}{d_1} \tag{5.2}$$

and for lines in different metal layers Equation 5.3, with sizes from figure 5.6c, *H* the height of the metal line, *W* the width and  $d_1$  and  $d_2$  the distance between the lines within and in between the metal layers respectively.

$$C = \epsilon_0 * \epsilon_r * \frac{W * L}{d_2} \tag{5.3}$$

The height of the metal lines as well as the layer distances are defined by process parameters and cannot be altered. For the routing layouts used for MuPix8 and MuPix10 the inter-metal-layer capacitance is only a third of the line-to-line capacitance. Table 5.1 shows typical values for the coupling capacitances for neighbouring 20 mm long lines and the typical resistance of these lines. The metal layers above and below the routing layers are indicated here as planes. In reality, however, these are also segmented and the exact geometry is different for each line. As a general approach the inter-metal-layer capacitance is assumed here as well in the following. It is expected that the change from aH18 to the TSI process increases the parasitic capacitances, as the height of the metal lines increases and the metal layer distance decreases, therefore further increasing the crosstalk probability for a MuPix8-like routing structure. The resistance of a 20 mm long line is calculated from the process's sheet resistance to be $6.43 \, \mathrm{k}\Omega$.
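Equations 5.2 and 5.3 translate directly into code. The sketch below uses illustrative function names, and the line height and spacing in the example are placeholders, not process parameters: their ratio of 1.6 was chosen only because it reproduces the 1.16 pF line-to-line value quoted for a 20 mm line.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m
EPSILON_R = 4.1               # relative permittivity quoted in the text


def line_to_line_capacitance(height_m, length_m, spacing_m):
    """Parallel plate estimate between two lines in one layer (Eq. 5.2)."""
    return EPSILON_0 * EPSILON_R * height_m * length_m / spacing_m


def inter_layer_capacitance(width_m, length_m, layer_distance_m):
    """Parallel plate estimate between lines in two layers (Eq. 5.3)."""
    return EPSILON_0 * EPSILON_R * width_m * length_m / layer_distance_m


# Placeholder geometry (assumption): H / d1 = 1.6, line length 20 mm.
c_ll = line_to_line_capacitance(height_m=1.6e-6, length_m=20e-3,
                                spacing_m=1.0e-6)
print(f"line-to-line: {c_ll * 1e12:.2f} pF")
```

Because only the ratio of height to spacing enters, any pair of values with the same ratio gives the same estimate; fringe capacitances are ignored, as in the text.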

A further property that needs to be considered to describe the signal line exactly is its self-inductance (L). Alongside the line's resistance R it forms an additional frequency dependent contribution to the impedance, $Z = R + i\omega L$. The self-inductance of a 20 mm line is estimated to be 50 nH and is therefore negligible, as a considerable contribution only occurs for frequencies larger than 10 GHz, which are not relevant for our signals. It will not be considered further.

Figure 5.6: Metal layer usage and parasitic capacitance in the MuPix8 routing scheme.
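The size of the inductive term can be put into numbers with the values quoted for a 20 mm line. This is a minimal sketch with illustrative names; the sample frequencies are chosen for illustration only.

```python
import math

R_LINE = 6.43e3   # resistance of a 20 mm line in ohm (from the text)
L_LINE = 50e-9    # estimated self-inductance in henry (from the text)


def inductive_reactance(frequency_hz, inductance_h=L_LINE):
    """Magnitude of the inductive term omega * L in ohm."""
    return 2 * math.pi * frequency_hz * inductance_h


def crossover_frequency(resistance_ohm=R_LINE, inductance_h=L_LINE):
    """Frequency at which omega * L equals the line resistance."""
    return resistance_ohm / (2 * math.pi * inductance_h)


for f in (1e6, 1e7, 1e10):
    ratio = inductive_reactance(f) / R_LINE
    print(f"{f:8.0e} Hz: omega*L = {inductive_reactance(f):10.3f} ohm "
          f"({ratio:.2%} of R)")
print(f"omega*L = R at {crossover_frequency() / 1e9:.1f} GHz")
```

In the MHz range the reactance stays at the level of a few ohm, several orders of magnitude below the line resistance, and only reaches the same order as R in the 10 GHz range.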

In the following, the properties of the source follower as the driving element and its modelling are discussed. Furthermore, the description of the signal line as an RC element is investigated, with a focus on the crosstalk and delay effects that are created.

#### 5.2.1 Source Follower

A source follower is a type of common-drain amplifier, which is typically built from two NMOS or PMOS transistors. Our case, consisting of two NMOS transistors, is depicted in figure 5.7. The lower transistor acts as current source, while the upper one is the input device. The name "source follower" originates from the behaviour of its output node, which corresponds to the source of the upper transistor. The current through this transistor is controlled by the gate-source voltage difference ($V_{GS}$). To achieve a steady state the upper transistor needs to match the current of the current source, so the output voltage adjusts itself such that the correct $V_{GS}$ is applied. Therefore, if the gate voltage changes, the output level adjusts such that $V_{GS}$ again produces the correct current. In the ideal case $V_{GS}$ stays constant for all input voltages, tying the output voltage to the input voltage, hence the name "source follower". This circuit is commonly used as a voltage buffer with a low output impedance, e.g. to decouple an amplifier from a large output load as in our case.

| Coupling          | Capacitance [pF] |
|-------------------|------------------|
| line-to-line      | 1.16             |
| next-to-neighbour | 0.4              |
| inter-metal-lines | 0.37             |

Table 5.1: Typical capacitance values for 20 mm long lines in the MuPix8 routing scheme with the addition of next-to-neighbour lines in the case the line in between was removed.



Figure 5.7: The modeling of the NMOS source follower.

In the ideal case discussed above the amplifier has an amplification factor of 1. In a real implementation, however, this factor is reduced to 0.7-0.8, as the upper transistor is affected by the body effect, which alters the transistor's threshold voltage depending on the source voltage, in this case the output voltage level. That is, the pulse transmitted over the signal line is damped to about $\frac{3}{4}$ of the amplifier pulse height. Figure 5.7b shows a small signal model of the source follower: the input transistor is represented as a voltage controlled current source with an additional drain-source resistance (basically the ohmic resistance of the conductive channel of the transistor), which is very large (> 100 kΩ) for both the current source and the input device and does not contribute to the following derivations. The body effect is represented by a second current source which is controlled by the source-bulk voltage difference $V_{BS}$. The output resistance of this circuit is given by $R_{Out} = 1/(gm + gmb)$. To get an estimate of gm + gmb, a simulation was performed scanning the input voltage around the working point, which allows $\frac{\Delta I}{\Delta U} = gm + gmb$ to be derived at the working point and gives an output resistance of $R_{Out} \approx 8\,\mathrm{k}\Omega$. As this is very likely not an exact value, a range of (5-10) kΩ will be investigated.

In the following considerations the source follower is represented as a voltage source with a series resistance of $R_{Out}$.

#### 5.2.2 RC-line

The modelling of an extended RC line is an important topic in digital electronics and logic synthesis, as delay and crosstalk effects influence the timing behaviour of digital signals and can therefore create setup and hold time violations. In the following several concepts from the digital realm are examined and applied to the use case of the MuPix signal line.



Figure 5.8: Different models for the RC line approximation, summing the resistance and capacitance of the line.

The simplest models of the RC line are the so-called lumped models, highlighted in figure 5.8a-c, which represent the line as RC circuits with the summed resistance and capacitance of the line or line segments. Possible RC arrangements are shown in figure 5.8. To achieve a higher precision the line is segmented and each segment is represented by a lumped model. The more segments are used, the more precisely the model describes the RC line. In the limit of an infinitesimally segmented line the signal propagation along the line is described by a diffusion equation, which describes the time evolution for any position on the line, with r = R/L and c = C/L the resistance and capacitance per unit length.

$$rc\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2} \tag{5.4}$$

This model is called the distributed RC line. Solutions of the differential equation are known for the scenarios of a step and a ramp input, which can be used to investigate the time development along the line. In practice, however, a full transient analysis of a distributed model for each line is impractical and time consuming. Therefore simplified models and metrics are used to predict its crosstalk [70] and timing behaviour [71].
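How a segmented lumped model approaches the distributed line can be illustrated with the Elmore delay, a common first-order timing metric for RC ladders. It is used here as a stand-in example and is not necessarily the metric applied in this work. In the sketch each segment is a series resistor followed by a capacitor to ground, and the total capacitance of two line-to-line couplings (2 x 1.16 pF) is an assumption made for illustration only.

```python
def elmore_delay_ladder(r_total, c_total, n_segments):
    """Elmore delay at the far end of an n-segment RC ladder.

    Each segment is a series resistor followed by a capacitor to ground.
    Capacitor k is charged through the k resistors upstream of it.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    delay = 0.0
    for k in range(1, n_segments + 1):
        delay += (k * r_seg) * c_seg
    return delay


R_LINE = 6.43e3          # resistance of a 20 mm line (from the text)
C_LINE = 2 * 1.16e-12    # assumption: two line-to-line couplings only

for n in (1, 2, 10, 100, 1000):
    d = elmore_delay_ladder(R_LINE, C_LINE, n)
    print(f"{n:5d} segments: {d * 1e9:6.2f} ns")
```

For this ladder the sum evaluates to RC(n+1)/(2n): a single lumped segment gives RC, and with increasing segmentation the value converges to RC/2, the Elmore delay of the distributed line.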

#### 5.2.3 Crosstalk

$$I = C * \frac{\partial U}{\partial t} \tag{5.5}$$

The basic principle of capacitive crosstalk is described by Equation 5.5: a changing voltage generates a current through a capacitance. In the case of two capacitively coupled signal lines this means that a pulse on one line injects a current into its neighbour. In the case of a long line, the coupling is described as a chain of capacitances (resistance ignored for simplicity), sketched in figure 5.9. A signal pulse that propagates on the line injects a current into each of these capacitances when passing. The current splits up and flows both up and down the line. The current flowing in the same direction as the signal pulse superimposes and forms a crosstalk pulse. This crosstalk is typically referred to as far-end or forward crosstalk, which is the type of crosstalk that is observed in the MuPix chips. In the opposite direction the crosstalk is called near-end or reverse crosstalk [72]. Following this picture and Equation 5.5, the forward crosstalk resembles a short pulse with the duration of the signal pulse's rising edge, approximately 200 ns.



Figure 5.9: Current injection model.
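To get a feeling for the magnitude of the injected current, Equation 5.5 can be evaluated for a linear rising edge. In the sketch below the function name is illustrative and the pulse amplitude of 200 mV is a hypothetical value; the coupling capacitance and the rise time are taken from the text.

```python
def injected_current(c_coupling_f, delta_u_v, rise_time_s):
    """Current through a coupling capacitance for a linear voltage ramp.

    Approximates I = C * dU/dt with dU/dt = delta_u / rise_time.
    """
    return c_coupling_f * delta_u_v / rise_time_s


C_COUPLING = 1.16e-12   # line-to-line capacitance of a 20 mm line
RISE_TIME = 200e-9      # approximate rising edge of the signal pulse
AMPLITUDE = 0.2         # hypothetical pulse height in volt (assumption)

current = injected_current(C_COUPLING, AMPLITUDE, RISE_TIME)
print(f"{current * 1e6:.2f} uA during the rising edge")
```

The current scales linearly with the pulse height and inversely with the rise time, which is why large, fast pulses are the main source of crosstalk hits.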

Although Equation 5.5 gives an intuitive picture of crosstalk on a fundamental level, it simplifies the situation considerably. To get a more realistic and complete picture, the case depicted in figure 5.10a is investigated: a signal line coupled to two neighbouring signal lines. The 3 lines are assumed to be identical. The electrical model is shown in figure 5.10b, with the source follower modelled as described above. Each line is described by a lumped model, with the lines being coupled by the mutual capacitance. The end of the line is decoupled via an RC high pass element with a 100 fF capacitance and a resistive element larger than $10 M\Omega$. In the chip the resistive element is adjustable, but generally chosen to have a large resistance, giving this element a large time constant of > $10 \mu$s. It therefore does not participate in the pulse shaping process, but just acts as a filter against low frequency noise.



Figure 5.10: Lumped RC model for three coupled lines. The investigated output nodes are marked in blue.

To get an intuitive picture for this more complex model as well, the so-called transfer function  $H(\omega)$  is utilized, which describes the frequency behaviour at a chosen point in an arbitrary impedance network. A plot of  $|H(\omega)|$  is referred to as a Bode plot and shows the frequency dependent damping of the signal.

The transfer function can be derived for any impedance network by calculating  $H(\omega) = \frac{V_{out}(\omega)}{V_{in}(\omega)}$ , using Kirchhoff's laws. For the proposed RC network the three marked output nodes are investigated; due to the symmetry, however, the response is the same for both victim lines. To simplify the calculation, the RC elements on the output of the victim lines are dropped, as they do not affect the transmission of the short and fast crosstalk pulses. The circuit is now a current divider with three branches, each of them a high pass filter.

The series circuit of the high pass is described by Equation 5.6

$$HP[\omega, R, C] = R + \frac{1}{2\pi * i\omega C}$$
(5.6)

Equation 5.7 shows a general solution which covers both cases, with  $HP_{out}$  describing the high pass on the output of the main signal line, while  $HP_{victim}$  describes the high pass generated by the coupling capacitance and resistances on the victim line.  $HP_{inv}$  as well as  $R_{inv}$  and  $C_{inv}$  represent the investigated branch, so either  $HP_{out}$  or  $HP_{victim}$  and the corresponding resistances and capacitances on these branches. The factor *A* is 2 in this example, as it represents the two victim lines.

$$H(\omega) = \frac{R_{inv} * HP_{out}(\omega, R_{out}, C_{AC}) * HP_{victim}(\omega, R_{line} + R_i, C_{coup})}{\left[A * (R_{line} + R_i) * HP_{out} + (R_{line} + R_i) * HP_{victim} + HP_{out} * HP_{victim}\right] * HP_{inv}(\omega, R_{inv}, C_{inv})} \tag{5.7}$$

Figure 5.11 shows the resulting Bode plots together with the frequency band that corresponds to a typical RC-shaped pulse. In the time domain<sup>1</sup> the pulse is described by a rise time of 100 ns and a fall time of 4 µs, approximating a large pulse in a MuPix chip. In the frequency domain it is represented by a combination of a high pass and a low pass with the corresponding time constants, see Equation 5.8. The used parameters are  $C_{AC} = 100$  fF and  $R_{out} = 100 M\Omega$ , which gives a time constant of 10 µs for the decoupling high pass,  $R_i = 8$ k$\Omega$  for the source follower output resistance, as well as  $R_{line} = 2.1$ k$\Omega$  and  $C_{coup} = 0.3$  pF, corresponding to the characteristic values of a 5 mm long line.

$$Pulse\_Bandwidth(\omega) = \frac{i\omega * \tau_{fall}}{1 + i\omega * \tau_{fall}} \times \frac{1}{1 + i\omega * \tau_{rise}}$$
(5.8)
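To get a feeling for the frequency band described by Equation 5.8, the following Python sketch evaluates its magnitude for the quoted rise and fall times. It is a minimal illustration, not the code used for the figures; the function name and the choice of evaluation points are assumptions.

```python
import math


def pulse_bandwidth(omega, tau_rise=100e-9, tau_fall=4e-6):
    """Equation 5.8: high pass (tau_fall) multiplied by low pass (tau_rise)."""
    high_pass = 1j * omega * tau_fall / (1 + 1j * omega * tau_fall)
    low_pass = 1 / (1 + 1j * omega * tau_rise)
    return high_pass * low_pass


# Corner frequencies f = 1 / (2 * pi * tau) of the two filters.
f_low = 1 / (2 * math.pi * 4e-6)
f_high = 1 / (2 * math.pi * 100e-9)
print(f"pass band from about {f_low / 1e3:.0f} kHz to {f_high / 1e6:.2f} MHz")

# Magnitude at the geometric mean of the two corner frequencies.
omega_mid = 2 * math.pi * math.sqrt(f_low * f_high)
print(f"|H| at band centre: {abs(pulse_bandwidth(omega_mid)):.3f}")
```

Between the two corner frequencies the magnitude stays close to one, which is the band in which the signal pulse carries most of its content.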





For the lower frequencies of the pulse bandwidth no damping is observed on the signal line and no signal is transmitted to the neighbouring lines. For higher frequencies the damping, and thereby the transmission to the neighbours, increases until a steady state is reached. Here the RC network acts as a current divider and the current is distributed equally between the signal line and the neighbours.
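The behaviour described above can be reproduced with a short Python sketch of Equations 5.6 and 5.7, using the parameters quoted for the 5 mm line. Two assumptions are made: the symbol $\omega$ in Equation 5.6 is interpreted as a frequency in Hz because of the explicit factor $2\pi$, and $HP_{inv}$ is taken to be part of the denominator of Equation 5.7. The function and argument names are hypothetical.

```python
import math


def hp(f, r, c):
    """Series RC impedance of Equation 5.6; f is the frequency in Hz."""
    return r + 1.0 / (2j * math.pi * f * c)


def transfer(f, branch, a=2, r_out=100e6, c_ac=100e-15,
             r_line=2.1e3, r_i=8e3, c_coup=0.3e-12):
    """Equation 5.7 for the branch 'out' (signal line) or 'victim'."""
    r_s = r_line + r_i  # shorthand, not part of the original notation
    hp_out = hp(f, r_out, c_ac)
    hp_victim = hp(f, r_s, c_coup)
    if branch == "out":
        r_inv, hp_inv = r_out, hp_out
    elif branch == "victim":
        r_inv, hp_inv = r_s, hp_victim
    else:
        raise ValueError("branch must be 'out' or 'victim'")
    # ASSUMPTION: HP_inv multiplies the bracket in the denominator.
    denominator = (a * r_s * hp_out + r_s * hp_victim
                   + hp_out * hp_victim) * hp_inv
    return r_inv * hp_out * hp_victim / denominator


for f in (1e5, 1e6, 1e8, 1e10):
    print(f"{f:8.0e} Hz  |H_out| = {abs(transfer(f, 'out')):.3f}"
          f"  |H_victim| = {abs(transfer(f, 'victim')):.3f}")
```

Setting `a=1` in the call models a signal line with only one close neighbour.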

In figure 5.12 the influence of the line length is highlighted. For increasing line length, here 20 mm with two close neighbours, the signal damping sets in at lower frequencies and thereby cuts into the pulse's frequency band, which increases the amount of crosstalk as more of the signal pulse is transmitted. Furthermore, figure 5.13 shows the effect of the overall coupling capacitance of the signal line. By removing one of the neighbouring lines the current is distributed differently for high frequencies. More signal is transmitted to the remaining neighbour, so this line experiences larger crosstalk pulses and therefore overall more crosstalk hits. This also explains an effect observed for MuPix8, where the overall crosstalk probability was increased for pixels with only one close neighbour.

<sup>1</sup> Frequency and time domain are connected by a Laplace transformation.

Figure 5.12: Comparison of a 5 mm and a 20 mm line. The bandwidth of a typical MuPix signal pulse is shown in black. Blue is the bandwidth on the output of the center line and red represents one of the neighbours for the 5 mm case; green and orange show the same for the 20 mm case.



Figure 5.13: Comparison of a 5 mm line with two and with one close neighbouring line. The bandwidth of a typical MuPix signal pulse is shown in black. Blue is the bandwidth on the output of the center line and red represents one of the neighbours for the case of two close neighbours; green and orange show the same for the case of only one victim line.

The observations made via the Bode plots give an intuitive picture of the crosstalk behaviour and show that the absence of victim lines, or rather coupling partners, leads to an increase of crosstalk in the remaining lines.
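This statement can be made explicit with the high-frequency limit of Equation 5.7. The following short derivation is a sketch which assumes that $HP_{inv}$ belongs to the denominator of Equation 5.7 and uses the shorthand $R_s = R_{line} + R_i$, which is not part of the original notation. For $\omega \rightarrow \infty$ the capacitive terms vanish, so that $HP_{out} \rightarrow R_{out}$ and $HP_{victim} \rightarrow R_s$, and for both the signal line and the victim lines

$$\lim_{\omega \rightarrow \infty} H(\omega) = \frac{R_{out}}{(A + 1) * R_{out} + R_s} \approx \frac{1}{A + 1} \quad \text{for } R_{out} \gg R_s$$

With two victim lines (A = 2) each line carries about 1/3 of the signal, whereas with a single victim line (A = 1) the remaining neighbour receives about 1/2.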

#### 5.2.3.1 Delay

As mentioned in section 5.2.2, the signal behaviour of the RC line is described by a diffusion equation. There is no general closed-form solution for this differential equation, but solutions are known for specific input scenarios, such as a step input or a voltage ramp. In [73] a solution is provided for the scenario of a voltage ramp as input signal, Equation 5.9, with  $V_0$  the voltage level of the ramp after time  $T_R$  and  $T_{Delay}$  the delay time constant of the RC-line.

$$U_{Out}(t) = \frac{V_0}{T_R} * \left[ -T_{Delay} + t + T_{Delay} * \exp\left(-\frac{t}{T_{Delay}}\right) \right] \tag{5.9}$$



Figure 5.14: Illustration of the transient response (blue) of an RC-line to ramp input pulse (red).

Figure 5.14 shows a plot of the solution for a ramp input comparable to the rising edge of the MuPix signal pulse, with  $V_0 = 400 \text{ mV}$ ,  $T_R = 200 \text{ ns}$  and a delay time constant of 20 ns for the RC-line. The delay effect of the RC line is clearly visible, with the delay time at a given threshold of 50 mV corresponding to the used time constant of the RC-line.
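The delay at a given threshold can be extracted numerically from Equation 5.9. The following Python sketch does this by bisection for the parameters quoted above; the function names are hypothetical and the delay is defined here, as an assumption, as the difference between the threshold crossing times of the output and of the ideal input ramp.

```python
import math


def ramp_response(t, v0=0.4, t_r=200e-9, t_delay=20e-9):
    """Equation 5.9: RC-line output for a ramp input, for 0 <= t <= t_r."""
    return v0 / t_r * (t - t_delay + t_delay * math.exp(-t / t_delay))


def crossing_time(threshold, v0=0.4, t_r=200e-9, t_delay=20e-9):
    """Time at which the output crosses the threshold, found by bisection."""
    lo, hi = 0.0, t_r
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ramp_response(mid, v0, t_r, t_delay) < threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay(threshold, v0=0.4, t_r=200e-9, t_delay=20e-9):
    """Output crossing time minus the crossing time of the input ramp."""
    return crossing_time(threshold, v0, t_r, t_delay) - threshold / v0 * t_r


for threshold in (0.05, 0.1, 0.2):
    print(f"{threshold * 1e3:.0f} mV: delay = {delay(threshold) * 1e9:.1f} ns")
```

For thresholds well above the start of the ramp the extracted delay approaches the delay time constant, while for low thresholds it is somewhat smaller.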

In digital electronics, especially logic synthesis, it is impractical to perform a transient analysis for each RC-line to investigate its delay behaviour. The figure of merit is the delay, which needs to be approximated based on the line resistance and parasitic capacitances. The most successful first-order approximation of the delay constant of a wire is the so-called Elmore delay [72, 74].

The Elmore delay gives a first-order approximation of the delay time constant for an arbitrary RC tree structure with the capacitances connected to ground, typically valid for the 50% crossing point of the output signal. As can be seen in figure 5.14, it is valid for a much larger range of output amplitudes. Only for low output amplitudes is the delay overestimated. The Elmore delay estimate  $T_{Elmore}$  is calculated as the sum of intermediate time constants from the input node to the output node, Equation 5.10.

$$T_{Elmore} = \sum_{i=1}^{N} R_i \sum_{k=i}^{N} C_k$$
(5.10)

 $R_i$  are all resistive elements on the path from the input to the output node and  $C_k$  all capacitances downstream of the resistive element towards the output node. For the case of an N-segmented RC-line the Elmore delay formula can be expressed as in Equation 5.11, with line length L and total resistance and capacitance R and C, giving r = R/L and c = C/L.

$$T_{Elmore} = \sum_{i=1}^{N} r * \frac{L}{N} * \sum_{k=i}^{N} c * \frac{L}{N} = \frac{r * c * L^{2} * N * (N+1)}{N^{2} * 2}$$
(5.11)

In the limit of infinitesimal segments,  $N \rightarrow \infty$ , it reduces to the very simple expression in Equation 5.12, which shows a quadratic dependence on the line length and only half the delay that a lumped model predicts.

$$T_{Elmore} = \frac{r * c * L^{2}}{2} \tag{5.12}$$
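The convergence towards half of the lumped value can be checked numerically. The following Python sketch implements the double sum of Equation 5.10 for an RC ladder and compares it to the closed form of Equation 5.11. The normalised values R = C = 1 are purely illustrative and the function names are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Equation 5.10: each R_i times the sum of all capacitances downstream."""
    total = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        total += r * downstream
        downstream -= c  # C_i is no longer downstream of R_(i+1)
    return total


def segmented_line_delay(r_total, c_total, n):
    """Elmore delay of a line split into n identical RC segments."""
    return elmore_delay([r_total / n] * n, [c_total / n] * n)


r_total, c_total = 1.0, 1.0  # normalised, illustrative values
for n in (1, 2, 10, 1000):
    closed_form = r_total * c_total * (n + 1) / (2 * n)  # Equation 5.11
    print(n, segmented_line_delay(r_total, c_total, n), closed_form)
```

A single segment corresponds to the lumped model with a delay of R * C, while a fine segmentation approaches R * C / 2.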



(a) Increasing line length. (b) Fixed line length.

Figure 5.15: Routing schemes with different line length evolution

Figure 5.15 shows the two types of signal line architectures that are realised in the current MuPix prototypes: the exact point-to-point connection as in MuPix8 in figure 5.15a and the intermediate connected signal lines as for MuPix10, but also MuPix7 in the past, in figure 5.15b. Applying the Elmore formula to these structures, including the source follower as signal source ( $R_i$ ) and the decoupling capacitance  $C_{AC}$ , gives the following expressions.

For increasing line length, with  $C_{tot}$  representing the total capacitance of the line and  $R_{tot}$  the total resistance from the input to the output:

$$\begin{aligned} T_{inc}(l) &= R_i * C_{tot} + \frac{r * c * l^2}{2} + (R_i + R_{tot}) * C_{AC} \\ &= R_i * c * l + \frac{r * c * l^2}{2} + (R_i + r * l) * C_{AC} \end{aligned} \tag{5.13}$$



Figure 5.16: RC ladder for the exact point-to-point connection.



Figure 5.17: RC ladder for the intermediate connection on same length lines.

For constant line length but varying interconnection points, with  $l_{tot}$  the total line length:

$$\begin{aligned} T_{const}(l) &= R_i * C_{tot} + \frac{r * c * l^2}{2} + (R_i + R_{tot}) * C_{AC} \\ &= R_i * c * l_{tot} + \frac{r * c * l^2}{2} + (R_i + r * l) * C_{AC} \end{aligned} \tag{5.14}$$

While for the point-to-point connection all terms depend on the line length, for the intermediate connected lines of constant total length one of the dependences is eliminated: the time constant created by the RC charge-up of the whole line capacitance through the input resistance is fixed for a design utilizing the same line length for all lines.
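The difference between the two routing schemes can be illustrated by evaluating Equations 5.13 and 5.14 side by side. In the following Python sketch the source follower resistance, the decoupling capacitance and the line resistance per length are taken from the values quoted in this chapter. The line capacitance per length is an assumption made only for this illustration (1.73 pF spread over 20 mm), so the resulting numbers should not be compared to measured delays. All names are hypothetical.

```python
R_I = 8e3                   # source follower output resistance [Ohm]
C_AC = 100e-15              # decoupling capacitance [F]
R_PER_M = 2.1e3 / 5e-3      # line resistance per length, 2.1 kOhm per 5 mm
C_PER_M = 1.73e-12 / 20e-3  # ASSUMPTION: illustrative capacitance per length
L_TOT = 20e-3               # maximal line length [m]


def t_increasing(l):
    """Equation 5.13: point-to-point line of length l."""
    return (R_I * C_PER_M * l + R_PER_M * C_PER_M * l**2 / 2
            + (R_I + R_PER_M * l) * C_AC)


def t_constant(l, l_tot=L_TOT):
    """Equation 5.14: line of fixed length l_tot, tapped at distance l."""
    return (R_I * C_PER_M * l_tot + R_PER_M * C_PER_M * l**2 / 2
            + (R_I + R_PER_M * l) * C_AC)


spread_increasing = t_increasing(L_TOT) - t_increasing(0.0)
spread_constant = t_constant(L_TOT) - t_constant(0.0)
print(f"delay spread, increasing length: {spread_increasing * 1e9:.1f} ns")
print(f"delay spread, constant length:   {spread_constant * 1e9:.1f} ns")
```

The two expressions differ only by the term R_i * c * (l_tot - l), which is why the spread over the row position is smaller for the constant line length.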

#### 5.3 ROUTING OPTIMISATION FOR MUPIX10

Following the crosstalk results obtained for MuPix8, two goals are formulated for a crosstalk optimisation of the routing: crosstalk reduction and crosstalk detection. The optimisation is driven solely by the conclusions of the MuPix8 investigation. Firstly, the crosstalk probability increases proportionally to the length over which two signal lines are adjacent, i.e. their coupling capacitance. Furthermore, the turn-on of the crosstalk is observed at a row address of around 60 for this measurement, performed with a 50 mV threshold. This makes it possible to approximate the critical length over which two lines can be routed in close proximity before the coupling capacitance enables crosstalk pulses to cross the discriminator threshold. The critical length for a threshold of 50 mV is approximately  $60 \times 80 \mu m = 4.8 mm$ .

For the MuPix10 design, two metal layers were available for pixel routing. With the usage of the full column width  $(80\mu m)$  and 125 lines per metal layer, the smallest distance between adjacent lines is the same as in MuPix8. Two strategies were applied in the MuPix10 design to reduce crosstalk as much as possible and make it easily distinguishable from "real" sensor phenomena such as charge sharing.

For the latter, neighbouring pixels must not be routed on adjacent signal lines. By choosing an easily recognisable pattern, crosstalk can be identified and removed from the data. In case of one-sided crosstalk, the ToT information helps to identify the crosstalk hit, as the crosstalk pulses will be very short, while the initial signal is large.

The second strategy aims to reduce crosstalk by minimising the length that two different signal lines are routed in close proximity. To achieve this two routing techniques have been exploited. Within one metal layer a reorganisation of the line positions is used. By separating long lines from each other in an alternating scheme, the maximum length that adjacent neighbours share is halved in comparison to the simple MuPix8 scheme. The second technique makes use of the multiple routing layers. If a signal line ends in one layer, it frees up space, which can be utilized for a layer change of a line routed above. In doing so it allows for a signal line to change its close proximity neighbours. This is used to distribute the coupling capacitance of the line to more than two neighbours, thereby reducing the maximum coupling capacitance between any two lines.

With the two metal layers available and the usage of the two routing techniques, the scheme in figure 5.18 is implemented in MuPix10. It limits the direct neighbour length to  $\frac{1}{4}$  of the maximal length. Metal lines in the two layers are routed on top of each other, forming a stack labelled i. Two adjacent stacks form the smallest symmetry unit, which is repeated until all 250 pixels within one column are connected. The lines with a length of  $\frac{1}{4}$  connect the lower quarter of the pixels, also dubbed sector 1, the lines with length  $\frac{1}{2}$  the next quarter (sector 2) and so forth. Following this simple picture and a maximal line length of 20 mm in MuPix10, the length over which two signal lines can be adjacent is limited to 5 mm, slightly larger than the critical length extracted from MuPix8. A drastic reduction of the crosstalk probability is expected.

Table 5.2 summarizes the expected capacitance for each of the four line types, comprising the sum of the parallel plate capacitances to the neighbouring lines within the metal layer as well as the couplings to the neighbouring metal layer, and compares it to the maximum capacitance for the simple routing scheme of MuPix8. As an example, Equation 5.15 gives the capacitance for the shortest line length with two neighbours within the metal layer and two in between metal layers, using the values calculated for 20 mm lines from table 5.1. The maximally possible capacitance of the lines is reduced by 40%.

$$C_{tot} = 2 * \frac{1.16\,\mathrm{pF}}{4} + 2 * \frac{0.37\,\mathrm{pF}}{4} = 0.76\,\mathrm{pF}$$
(5.15)




Figure 5.18: The MuPix10 routing scheme using two metal layers (red and green) stacked on top of each other. Prominent couplings are highlighted.

| Line type | Sector 1 $C_{tot}$ [pF] | Sector 2 $C_{tot}$ [pF] | Sector 3 $C_{tot}$ [pF] | Sector 4 $C_{tot}$ [pF] | $\sum$ [pF] |
|-----------|-------------------------|-------------------------|-------------------------|-------------------------|-------------|
| 1         | 0.76                    | -                       | -                       | -                       | 0.76        |
| 2         | 0.76                    | 0.76                    | -                       | -                       | 1.52        |
| 3         | 0.76                    | 0.67                    | 0.29                    | -                       | 1.72        |
| 4         | 0.76                    | 0.39                    | 0.29                    | 0.29                    | 1.73        |
| MuPix8    | 0.76                    | 0.76                    | 0.67                    | 0.67                    | 2.86        |

Table 5.2: The sector composition of total coupling capacitance for the different line types.
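The sums in Table 5.2 and the quoted reduction of about 40% can be cross-checked with a few lines of Python. The sketch below only uses the numbers from Equation 5.15 and Table 5.2; the variable names are hypothetical.

```python
# Sector contributions in pF, copied from Table 5.2.
sector_capacitance = {
    "1": [0.76],
    "2": [0.76, 0.76],
    "3": [0.76, 0.67, 0.29],
    "4": [0.76, 0.39, 0.29, 0.29],
    "MuPix8": [0.76, 0.76, 0.67, 0.67],
}

# Capacitance of the shortest line type as in Equation 5.15
# (the text quotes this value as 0.76 pF).
c_sector = 2 * 1.16 / 4 + 2 * 0.37 / 4

totals = {name: round(sum(values), 2)
          for name, values in sector_capacitance.items()}
worst_mupix10 = max(total for name, total in totals.items()
                    if name != "MuPix8")
reduction = 1 - worst_mupix10 / totals["MuPix8"]

print(f"Equation 5.15: {c_sector:.3f} pF")
print(f"totals: {totals}")
print(f"reduction of the maximal capacitance: {reduction:.1%}")
```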

#### 5.4 MUPIX10 RESULTS

In the following a MuPix10 data sample obtained at a testbeam campaign will be investigated with the focus on crosstalk and position dependent delay effects by the novel line routing scheme. The MuPix-Telescope and its tracking capability will be utilized as a measurement tool. Only the important aspects of the tool will be discussed in the following. A detailed description of the MuPix-telescope and tracking can be found in [53].



#### 5.4.1 Data & Analysis

Figure 5.19: The MuPix10-telescope setup.

The data used in the following was obtained with the setup depicted in figure 5.19, utilizing a 4 GeV electron beam provided by the DESY II testbeam facility (April 2021). The setup consists of a MuPix-telescope with four layers of MuPix10 sensors, complemented by scintillating tiles read out by silicon photomultipliers, placed before and after the telescope, which provide a precise time reference measurement. The beam traverses the telescope planes perpendicularly with particle rates of (30-40) kHz. The data is organised in packages, so-called frames, which contain the hit and time reference data of a 16 µs interval. A comparison with the particle rate shows that approximately every second frame contains the information of a traversing particle. This is a very moderate occupancy, making accidental coincidences between two different particles highly unlikely.
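The occupancy estimate follows directly from the product of particle rate and frame length, as the following minimal Python sketch shows. The function name is hypothetical.

```python
def mean_particles_per_frame(rate_hz, frame_length_s=16e-6):
    """Average number of beam particles within one frame."""
    return rate_hz * frame_length_s


for rate in (30e3, 40e3):
    mean = mean_particles_per_frame(rate)
    print(f"{rate / 1e3:.0f} kHz: {mean:.2f} particles per frame on average")
```

An average of roughly one half corresponds to the statement that about every second frame contains a traversing particle.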

The usage of the MuPix-telescope makes its tracking capabilities available. Three layers of the telescope are used as so-called reference layers to identify and reconstruct particle trajectories (tracks), while the fourth is used as the device-under-test (DUT), which is a 100 µm thin MuPix10 sensor with a 200  $\Omega$  cm substrate. The sensor is operated at –100 V substrate voltage, which is expected to create a depletion zone larger than 50 µm. The sensor is powered with a single supply voltage (2.25 V) and the VSSA voltage is generated on-chip, as it would be for the detector usage. To compensate for the on-chip voltage drops, the supply voltage is distinctly higher than the nominal operating voltage (1.8 V). The chip is operated at the best settings known at the time, based on the optimisation performed in [55]. If not noted otherwise, data was obtained at an effective threshold of 40 mV above the baseline, which shows no signs of a heavy noise contribution.



Figure 5.20: The hitmap of the used data sample. No dominating hot pixels are visible.

Figure 5.20 shows the hitmap of the DUT layer with no heavy noise contribution from individual pixels. The data set contains  $8 \times 10^6$  pixel hits on the DUT with a total of  $5 \times 10^6$  reconstructed tracks.



Figure 5.21: A particle traversing four sensor layers. The 2D hit information can be used to reconstruct a 3D straight line, approximating the particles path. Taken from [52].

TRACKING The telescope layers are stacked behind each other with a gap of approximately 7 cm between neighbouring layers and aligned such that the beam traverses all layers perpendicularly and hits roughly the same area of each sensor plane (mechanical alignment). A particle traversing this stack deposits energy in each layer and creates hits which are correlated. For the three reference layers a straight line fit is performed for all combinations of one hit from each layer (within one 16 µs interval). The fit is performed using a  $\chi^2$-method and is validated with a cut of  $\chi^2 < 10$, the straight line representing an approximation of the particle trajectory. As the DUT layer is in between the reference layers, the intersection of the straight line with the DUT plane can be calculated, which predicts the expected hit position of the traversing particle. Figure 5.22b shows the residual histogram for this prediction, which is the distance between the predicted hit position and the closest hit on the DUT. The distribution is centered around zero with a Gaussian width of  $\sigma = 31 \,\mu\text{m}$ , validating the alignment and showing a reasonable resolution close to the single pixel resolution of  $\sigma_{pixel} = \frac{80 \,\mu\text{m}}{\sqrt{12}} \approx 23 \,\mu\text{m}$ , only slightly degraded by scattering effects. Together with the  $\chi^2$  distribution in figure 5.22a, this validates the usage of the straight line fit.
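The track reconstruction step can be sketched in Python for one projection. This is a simplified illustration and not the telescope software described in [53]: it assumes equal uncertainties of $80 \,\mu\text{m}/\sqrt{12}$ on all reference hits, layer positions of 0, 70, 140 and 210 mm with the DUT as the second layer, and made-up hit coordinates. All names are hypothetical.

```python
import math


def fit_line(z, x, sigma):
    """Least squares fit of x = a + b * z with equal uncertainties sigma.

    Returns the intercept a, the slope b and the chi2 of the fit.
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_x = sum(x) / n
    s_zz = sum((zi - mean_z) ** 2 for zi in z)
    s_zx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    b = s_zx / s_zz
    a = mean_x - b * mean_z
    chi2 = sum(((xi - (a + b * zi)) / sigma) ** 2 for zi, xi in zip(z, x))
    return a, b, chi2


Z_REFERENCE = [0.0, 140.0, 210.0]  # ASSUMPTION: reference layers [mm]
Z_DUT = 70.0                       # ASSUMPTION: DUT position [mm]
SIGMA = 0.080 / math.sqrt(12)      # single pixel resolution [mm]

x_reference = [0.10, 0.27, 0.34]   # made-up hit positions [mm]
x_dut_hit = 0.20                   # made-up closest DUT hit [mm]

a, b, chi2 = fit_line(Z_REFERENCE, x_reference, SIGMA)
if chi2 < 10:  # track quality cut from the text
    residual = x_dut_hit - (a + b * Z_DUT)
    print(f"chi2 = {chi2:.2f}, residual = {residual * 1e3:.0f} um")
```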



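Figure 5.22: (a)  $\chi^2$  distribution of the straight line fit and (b) residuals between the predicted hit position and the closest hit on the DUT.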

MATCHING The combination of a reconstructed hit position with the closest hit on the DUT layer is called matching. The matching procedure is the basic mechanism for the determination of the DUT sensor's efficiency and crosstalk probability. As seen above, the spread between reconstructed hit position and matched hit is well under control, which allows a distance-matching cut to be applied, as depicted in figure 5.23. Especially in scenarios with very high particle rates or many noise hits on the DUT, the distance cut reduces the probability of accidental hit matches tremendously and thereby prevents a systematic overestimation of e.g. the efficiency. In the following analysis a rather wide matching radius of 500 µm is used, as neither the particle rate nor the noise conditions demand a more restrictive cut.



Figure 5.23: Sketch of matching radius (green). The closest hit to the reconstructed hit position (green dot) is selected. Hits outside the search radius are not considered.
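The matching step can be expressed compactly in code. The following Python sketch selects the closest DUT hit within the matching radius; it is a minimal illustration with positions given in mm, and the function name and data layout are assumptions.

```python
import math


def match_hit(predicted, hits, radius=0.5):
    """Return the hit closest to the predicted position, or None.

    predicted and hits are (x, y) positions in mm; hits further away
    than radius (default 500 um) are not considered.
    """
    best, best_distance = None, radius
    for hit in hits:
        distance = math.hypot(hit[0] - predicted[0], hit[1] - predicted[1])
        if distance <= best_distance:
            best, best_distance = hit, distance
    return best


print(match_hit((0.0, 0.0), [(0.3, 0.0), (0.1, 0.1), (2.0, 2.0)]))
print(match_hit((0.0, 0.0), [(1.0, 1.0)]))
```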

CLUSTERING Although the particles penetrate the sensor perpendicularly to a good approximation, hits close to the borders and corners in between pixels can create hits in two or more adjacent pixels through charge sharing. These hits are therefore highly correlated and can be combined into a cluster. The condition for the clustering is that the hits are directly connected via a pixel edge or corner, as illustrated in figure 5.24.



Figure 5.24: Sketch of the clustering condition. Hits that are directly connected via a pixel edge or corner are combined into a cluster.
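The clustering condition corresponds to grouping hits that are connected via an edge or a corner. A minimal Python sketch of such a grouping is given below; hits are represented as (column, row) tuples, and the function name and data layout are assumptions.

```python
def cluster_hits(hits):
    """Group (column, row) hits that touch via a pixel edge or corner."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = [seed], [seed]
        while stack:
            col, row = stack.pop()
            # Check all eight neighbouring pixel addresses.
            for d_col in (-1, 0, 1):
                for d_row in (-1, 0, 1):
                    neighbour = (col + d_col, row + d_row)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        cluster.append(neighbour)
                        stack.append(neighbour)
        clusters.append(sorted(cluster))
    return sorted(clusters)


print(cluster_hits([(10, 10), (11, 11), (50, 3)]))
```

In this example the two diagonally adjacent hits form one cluster, while the isolated hit forms a cluster of size one.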

A further process that can create very large clusters is the generation of so-called  $\delta$ -electrons. Although most interactions of the beam particle with the sensor material only involve small energy transfers, a large energy transfer creates a highly energetic secondary electron, which can travel longer distances through the silicon, generating electron-hole pairs along its path. However, especially for electrons as primary particles, the occurrence of  $\delta$ -electrons is negligible and does not contribute to the overall clustering characteristic.

TIME MEASUREMENT To perform a time resolution measurement it is advantageous to use a time reference system which itself provides a more precise time measurement than the investigated device. In this setup this precise time measurement is provided by two plastic scintillators read out by SiPMs. They are connected in coincidence, which means that only a particle crossing both scintillators creates a time reference measurement, thereby heavily reducing possible noise contributions. The system provides a time resolution of  $\sigma \approx 2 \text{ ns}$  [50] and is sampled by the data acquisition system with a 500 MHz clock (2 ns bin size). The time reference measurements can be correlated with the timestamps of sensor hits. A correlation means that the reference measurement and the hit have been triggered by the same particle. In that case the time difference, calculated as shown in equation 5.16, should be constant.

$$\Delta t = time_{hit} - time_{reference} \tag{5.16}$$

As shown in figure 5.25, a peak is visible for the calculated time difference, confirming the correlation. The Gaussian width of this peak is referred to as the time resolution. It is broadened by effects such as pixel-to-pixel differences, e.g. a position dependent delay, and timewalk. In these measurements the absolute peak position does not contain any meaningful information, as e.g. cable delays between the two systems have not been considered. To investigate the delay behaviour, the time difference is calculated depending on the pixel's row address, thereby unravelling the relative delays within the pixel matrix.
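The row dependent evaluation of Equation 5.16 can be sketched as follows. The Python example averages the time difference per row address and subtracts the smallest mean, so that only relative delays remain. The data layout, a list of (row, hit time, reference time) tuples, and the function name are assumptions; the numbers are made up.

```python
from collections import defaultdict


def relative_delay_per_row(hits):
    """Mean of Equation 5.16 per row, relative to the fastest row.

    hits is an iterable of (row, time_hit, time_reference) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, time_hit, time_reference in hits:
        sums[row] += time_hit - time_reference
        counts[row] += 1
    means = {row: sums[row] / counts[row] for row in sums}
    offset = min(means.values())  # absolute offset carries no information
    return {row: mean - offset for row, mean in means.items()}


# Made-up example values in ns.
example_hits = [(0, 110.0, 100.0), (0, 112.0, 100.0), (200, 130.0, 100.0)]
print(relative_delay_per_row(example_hits))
```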



Figure 5.25: The histogram of the calculated time differences forms a Gaussian distribution, broadened by pixel latency variations and timewalk.

#### 5.4.2 Crosstalk

#### 5.4.2.1 Identification

The MuPix10 routing scheme was designed to allow for a simple identification of crosstalk, as well as a clear distinction between crosstalk and charge sharing events. As discussed in section 5.3, the routing features four types of signal lines with distinct lengths, each connecting one quarter of the sensor's row domain, referred to as sectors. The situation is summarized in figure 5.26a, which shows the alternating routing scheme for the double column design of MuPix10. Table 5.3 summarizes the row address ranges of the sectors, which are slightly asymmetric between even and odd columns and will therefore be distinguished in some cases in the following.



Figure 5.26: Routing scheme of the MuPix10 double column structure. The green metal lines are routed exactly below the red metal lines. A layer change for type 3 lines is indicated with a black box representing the via. A particle induced hit (blue) can create crosstalk in its neighbouring lines (purple), which induces additional hits (green).

Figure 5.26b illustrates the pattern crosstalk generates, using the example of the two shortest line types. A hit in sector 2 generates crosstalk on its neighbouring lines, which are assigned to sector 1. While in the digital periphery this creates a connected cluster, the crosstalk hits appear in a different sector in the active matrix, with a row address difference of about 62 (corresponding to  $\approx 5 \,\text{mm}$ ), therefore clearly separating particle hits and crosstalk induced hits. These digital address correlations can be used to easily identify crosstalk.

| Sector | Odd columns | Even columns |
|--------|-------------|--------------|
| 1      | 0-59        | 0-61         |
| 2      | 60-118      | 62-124       |
| 3      | 119-184     | 125-186      |
| 4      | 185-249     | 187-249      |

Table 5.3: The address ranges of sectors as created by the line routing.
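The address ranges of Table 5.3 translate directly into a lookup of the sector for a given pixel address. The following Python sketch implements this lookup together with a deliberately simplified crosstalk candidate criterion (two hits in the same column in neighbouring sectors). This criterion is an assumption made for illustration and is not the tagging used in the analysis; all names are hypothetical.

```python
# First and last row of sectors 1 to 4, copied from Table 5.3.
SECTOR_RANGES = {
    "odd": [(0, 59), (60, 118), (119, 184), (185, 249)],
    "even": [(0, 61), (62, 124), (125, 186), (187, 249)],
}


def sector(column, row):
    """Sector number (1 to 4) of a pixel address."""
    parity = "odd" if column % 2 else "even"
    for number, (first, last) in enumerate(SECTOR_RANGES[parity], start=1):
        if first <= row <= last:
            return number
    raise ValueError(f"row {row} is outside the pixel matrix")


def is_crosstalk_candidate(hit_a, hit_b):
    """ASSUMPTION: same column and directly neighbouring sectors."""
    (col_a, row_a), (col_b, row_b) = hit_a, hit_b
    if col_a != col_b:
        return False
    return abs(sector(col_a, row_a) - sector(col_b, row_b)) == 1


print(sector(1, 60), sector(2, 60))
print(is_crosstalk_candidate((3, 70), (3, 8)))
```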



Figure 5.27: The self-correlation of the column and row addresses of hits in a frame.

As the possible correlations introduced through crosstalk are well separated, they can be checked for qualitatively by investigating the sensor's self-correlations plotted in figure 5.27. The plots show the column-column and row-row correlations for all hits within one frame. The column correlation shows the expected default picture. There is a prominent diagonal due to the correlation of each hit with itself, which is slightly washed out due to charge sharing, which correlates directly neighbouring pixels. For the row correlation, on the other hand, several additional off-center diagonal correlations are visible. Figure 5.28 explains their origin with the corresponding capacitive couplings between adjacent lines. The double line structures are caused by the slight asymmetry of even and odd columns, which are not disentangled here.



Figure 5.28: The origin of different crosstalk correlated address sectors with the help of the most prominent couplings.

Overall this shows that there is still crosstalk in MuPix10, but even more it makes clear that there is not only a coupling within the same metal layer, but also between the metal layers. This was not observed for MuPix8. Furthermore, it gives an idea of the relative strength of crosstalk for different couplings. The crosstalk within the metal layers is clearly still more pronounced than in between metal layers. But the crosstalk within the same metal layer also differs in strength. The coupling between sector 1 and 2 is strongest, followed by sectors 3 and 4, while the correlation between 2 and 3 is clearly suppressed.

An explanation for the different weights can be found with a look at figure 5.26a. While in the optimisation process the neighbouring line length was limited to 5 mm, this is only true within the 20 mm long active matrix. However, the signals still have to be routed to the designated digital cell, which in this design can add up to 2 mm of line length. An additional (1-2) mm of line length is added to the lines of sector 1 and 2, and up to 1 mm to the lines of sector 3 and 4. The neighbouring length between the lines of sector 2 and 3 is 5 mm. This explains the dominance of the crosstalk between sector 1 and 2. The further suppression of crosstalk between sector 2 and 3 is explained by the location of the crosstalk coupling, in the middle of the matrix, in contrast to the other sectors, which couple only shortly before the digital cell is reached. Therefore the crosstalk pulse induced between sector 2 and 3 still has to travel to the end of the line, itself coupling to other neighbouring lines, which causes a reduction in pulse height and thereby crosstalk rate.

#### 5.4.2.2 Characterisation

With the knowledge of the hit patterns generated by crosstalk, these events are selected on purpose and their behaviour is investigated and compared to charge sharing events. In addition to events with crosstalk to one neighbour, pure charge sharing clusters are selected. Both event groups contain two pixel hits per event and are completely disjoint, as only charge sharing between pixels in neighbouring columns is considered, which cannot be generated by crosstalk. In figure 5.29 their ToT behaviour is plotted, showing the distributions for the larger and the smaller ToT of the two hits. The difference between charge sharing and crosstalk events is striking. While for charge sharing both distributions span almost the full range, in case of crosstalk the two distributions are very well separated, which is exactly what is expected. As discussed before, only large signal pulses create detectable crosstalk, resulting in large ToTs, while the crosstalk pulses are very short and only produce small ToT values. This separation makes it possible to identify, based on the ToT information, which hit is crosstalk induced and therefore invalid.

#### 5.4.2.3 Crosstalk probability

The figure of merit to compare the new routing scheme to MuPix8 is the crosstalk probability. For this measurement, the tracking and matching capability of the telescope is used. In a first step the crosstalk events are tagged with the help of the address identification as discussed above. Crosstalk in between metal layers is not identified individually, as it is expected to happen only in coincidence with the tagged crosstalk, due to the weaker coupling. By applying the matching to the data set containing the tagged hits, two numbers are measured:  $N_{matched}$, the number of all matched events, and  $N_{matched\&tagged}$, the number of matched events which have been tagged as crosstalk. The crosstalk probability is therefore:

$$P_{crosstalk} = \frac{N_{matched\&tagged}}{N_{matched}}$$
(5.17)
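Equation 5.17 is a simple ratio of two counts. The following Python sketch evaluates it together with a binomial standard error; the error model is an assumption added for illustration and is not taken from the analysis, and the counts in the example are made up.

```python
import math


def crosstalk_probability(n_matched_tagged, n_matched):
    """Equation 5.17 with a binomial standard error (assumed error model)."""
    if n_matched <= 0:
        raise ValueError("n_matched must be positive")
    probability = n_matched_tagged / n_matched
    error = math.sqrt(probability * (1 - probability) / n_matched)
    return probability, error


# Made-up counts for a single row address.
p, p_error = crosstalk_probability(25, 1000)
print(f"P_crosstalk = ({p * 100:.1f} +- {p_error * 100:.1f}) %")
```

Applied per row address and separately for odd and even columns, this yields a row dependent crosstalk probability.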

Figure 5.30 shows  $P_{crosstalk}$  depending on the row position of the matched hit, distinguished for odd and even columns. The routing scheme together with the matching guarantees here that the matched hit is particle induced, as the matching radius is  $500 \,\mu\text{m}$  while crosstalk only appears at a distance of 5 mm from its origin.

Figure 5.29: ToT distributions for charge sharing and crosstalk events with two pixel hits. The ToT is plotted in red for the larger of the two ToT values and in blue for the smaller one.

The relative error of this observable is below 2%. This is rooted in the fact, discussed later, that a particle creates clusters through charge sharing with a probability of only about 4%, most of them double clusters. The probability of matching the hit in a double cluster which possibly did not create the crosstalk is naively 50%; taking into account the precise reconstruction demonstrated above, the probability of confusing the two hits is even lower.

All structures and variations of the crosstalk probability can be traced back to irregularities in the routing scheme and its surrounding metal layers. In the following only the two major structures will be discussed; all other smaller features can be explained by the same effects.

While for most rows the crosstalk probability is below 2%, there are blocks of row addresses with values around 5%. The blocks which appear both for even and odd columns correspond to their respective sector 2, while for odd columns an additional block of 6 rows is enhanced at the beginning of sector 4.

A comparison of figure 5.30 with the self-correlation plot in figure 5.27b shows which correlations cause these structures. The large blocks are caused by crosstalk between sector 1 and 2, the additional small block is generated between sector 3 and 4. However, the row dependence of the crosstalk probability also shows that there is mostly crosstalk from sector 2 to sector 1 and, for the small block, from sector 4 to sector 3, but not in the other direction. The crosstalk probability is asymmetric for the same coupling; it has a directionality.

Figure 5.30: The crosstalk probability depending on the row position of the matched hit. In red for even and blue for odd columns.

A look into the layout reveals why these row ranges are more susceptible to crosstalk. In addition to the longer neighbouring line length, which leads to a larger coupling as discussed above, the surrounding metal structure is also different here. As depicted in figure 5.26a, the rows which are enhanced in the coupling between sector 3 and 4 (right edge of the column) do not have metal lines of the other routing layer routed above them. In section 5.2.3 it was shown that the crosstalk strength increases if a line has fewer neighbours. The absence of these additional lines, which would otherwise share the crosstalk current, increases the crosstalk amplitude in the remaining victim lines. This explanation also applies to the coupling between sector 1 and 2. As these are the longest lines, they are also missing the signal lines routed below them on the last 1 mm, which further boosts the crosstalk amplitude alongside the increased neighbouring length.

An explanation for the crosstalk directionality is depicted in figure 5.31. One factor is that the longer lines of sector 2 use the full length of the sector 1 line to induce crosstalk, which also leads to a rather constant crosstalk probability within the blocks. For crosstalk from sector 1 to 2, the effective length available for crosstalk induction depends on the row position, which should introduce a position dependent crosstalk probability. However, by this logic the highest sector 1 row address should



Figure 5.31: The crosstalk directionality explained based on the relative origin of the signal pulse and the charge up of the line. The yellow-tinted gap indicates the area in which forward crosstalk can be generated.

induce the same amount of crosstalk as sector 2, which is clearly not the case. The reason is the division of the induced current up and down the line, which creates forward and reverse crosstalk, as the induced current needs to charge up the complete line capacitance both upstream and downstream. As the total capacitance of the sector 2 lines is twice the capacitance of the sector 1 lines, the amplitude of the induced pulses is halved,  $U_{xtalk} \propto \frac{1}{C_{line}} \int I(t) dt$ , thereby reducing the crosstalk probability. The same argument holds for the block in sector 4.

Averaged over the full chip, the crosstalk probability amounts to 2.4% in even and 2.6% in odd columns, giving a total occurrence of 2.5% for all crossing particles. The distribution is however not flat, as discussed above, which locally leads to higher crosstalk probabilities of about 6%.

# 5.4.2.4 Efficiency reduction

Besides the crosstalk, an efficiency reduction due to the signal line coupling was reported for MuPix8 [53], explained by an amplitude reduction as part of the pulse is transferred to its neighbours. Figure 5.32 shows the same investigation for MuPix10. The efficiency is plotted as a function of the row position for different thresholds. It is calculated as  $Eff = \frac{N_{matched}}{N_{tracks}}$ , with  $N_{tracks}$  the number of tracks intersecting the DUT or a smaller region of it, the so-called region of interest (ROI), and  $N_{matched}$  the number of tracks that could be matched to a hit on the DUT. The shown asymmetric errors are calculated automatically by the ROOT TEfficiency class [75], which uses a binomial estimator incorporating the calculated efficiency and the used statistics.
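The efficiency definition above can be illustrated with a few lines of Python. This is only a minimal sketch and not the analysis code: the function name and the example counts are hypothetical, and the Wilson score interval is used here as a simple closed-form stand-in for an asymmetric binomial interval (ROOT's TEfficiency uses the Clopper-Pearson interval by default).

```python
import math

def efficiency_with_interval(n_matched, n_tracks, z=1.0):
    """Return (eff, err_low, err_up) for a binomial efficiency.

    Uses the Wilson score interval (an assumption made for brevity);
    z = 1.0 corresponds to roughly 68% coverage.
    """
    if n_tracks <= 0:
        raise ValueError("n_tracks must be positive")
    eff = n_matched / n_tracks
    denom = 1.0 + z * z / n_tracks
    centre = (eff + z * z / (2.0 * n_tracks)) / denom
    half = z * math.sqrt(eff * (1.0 - eff) / n_tracks
                         + z * z / (4.0 * n_tracks ** 2)) / denom
    lower = max(0.0, centre - half)
    upper = min(1.0, centre + half)
    return eff, eff - lower, upper - eff

# Hypothetical counts for a single row bin
eff, err_low, err_up = efficiency_with_interval(n_matched=9950, n_tracks=10000)
print(f"Eff = {eff:.4f} -{err_low:.4f} +{err_up:.4f}")
```

Close to an efficiency of one the lower error is larger than the upper one, which is the kind of asymmetry visible in the error bars of an efficiency plot.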



Figure 5.32: The efficiency depending on the row address plotted for thresholds of 42 mV (red), 56 mV (violet), 77 mV (blue) and 105 mV (green). The efficiency decreases with increasing threshold, and a row dependence becomes more pronounced.

While no effect is visible for low thresholds, a clear slope and structure emerges for higher thresholds. The plateau at low row addresses corresponds to sector 1, the lines with the smallest capacitive coupling. As this structure evidently matches the routing scheme, other possible reasons for the efficiency drop, such as a voltage drop across the matrix, are unlikely, which also confirms the MuPix8 interpretation.

# 5.4.2.5 Crosstalk free data

As already seen in section 5.4.2.3, the matching capability of the telescope allows crosstalk hits to be excluded from the data set due to their large displacement. For the first time this allows a direct investigation of pure particle data of a large scale HV-MAPS sensor, untainted by any effect of the crosstalk on the clustering or the ToT data. Using only matched hits and their corresponding clusters, the cluster size and the ToT behaviour are investigated.

Figure 5.33 shows a comparison of the cluster size distribution for raw data and for matched data. The difference is especially visible for cluster size two, as in an event which caused crosstalk in both neighbouring lines



Figure 5.33: Comparison of the cluster size distribution between all data and matched data, which is not affected by crosstalk.

the crosstalk hits will appear as a two hit cluster in the neighbouring sector. For matched data a 96% single hit probability is achieved for perpendicularly crossing particles. This highlights the extremely binary nature of the HV-MAPS particle detection with charge collection via drift, where charge sharing only appears very close to the pixel edge.

The ToT data is plotted in figure 5.34. While the matched data shows a clean Landau-like distribution with an enhancement at higher ToT values due to the ToT cap caused by the delay circuit, the raw ToT distribution shows a pronounced difference in the form of an additional peak at low ToT values. This peak contains mostly crosstalk, which causes short pulses as shown before in figure 5.29b, and possible noise contributions. As the distribution shows a good separation between the crosstalk peak and the main distribution, the question arises whether a ToT cut can be used to remove crosstalk efficiently from the raw data without tracking information being available. For an online system such as Mu3e this simple cut could be introduced very early in the readout chain, while a more complex crosstalk removal based on the possible line correlations would be an offline application.

Figure 5.36 shows the effect of this cut for the consecutive removal of ToT bins, starting at zero. As expected, the crosstalk probability decreases with an increasing cut window. For the maximum cut value the probability is reduced below the per mille level. At this point the cut has removed 6% additional hit data, consisting mostly of crosstalk but also of good hits



Figure 5.34: Comparison of the ToT distribution between all data and matched data, which is not affected by crosstalk.



Figure 5.35: An overlay of the ToT spectra of raw and matched data, showing the effect of the ToT cut.

as the efficiency shows. For the removal of the four lowest ToT bins an efficiency reduction of about 0.12% is observed. An explanation is given in figure 5.35, showing an overlay of raw and matched data, which indicates that the larger cuts also exclude matched data.
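The trade-off of the ToT cut described above, removing crosstalk at the price of losing some matched hits, can be sketched as follows. The data layout (a list of `(tot_bin, is_matched)` tuples), the function names and the toy data are assumptions made purely for illustration; they do not represent the actual hit format or the measured distributions.

```python
def apply_tot_cut(hits, min_tot_bin):
    """Keep only hits whose ToT bin is >= min_tot_bin.

    `hits` is a list of (tot_bin, is_matched) tuples, a hypothetical
    stand-in for the real hit data format.
    """
    return [h for h in hits if h[0] >= min_tot_bin]

def cut_summary(hits, min_tot_bin):
    """Return (fraction of all hits removed, fraction of matched hits lost)."""
    kept = apply_tot_cut(hits, min_tot_bin)
    n_matched = sum(1 for _, matched in hits if matched)
    n_matched_kept = sum(1 for _, matched in kept if matched)
    removed_fraction = 1.0 - len(kept) / len(hits)
    matched_loss = 1.0 - n_matched_kept / n_matched
    return removed_fraction, matched_loss

# Toy data: unmatched (crosstalk-like) hits at low ToT, matched hits above
toy_hits = [(0, False), (1, False), (2, False), (3, True)]
toy_hits += [(t, True) for t in range(4, 20)]
print(cut_summary(toy_hits, min_tot_bin=4))
```

Scanning `min_tot_bin` over the lowest bins gives the two curves of interest: the removed hit fraction and the efficiency loss among matched hits.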



Figure 5.36: The effects of the ToT cut, applied in steps of the bin size of the ToT measurement, 128 ns.

#### 5.4.3 Delay

To investigate the chip internal delay behaviour, the time difference measurement is performed as a function of the row address. For each row the time difference distribution is measured and its mean value is extracted. Figure 5.37 shows the mean values depending on the row address. Four separate segments are clearly visible which again match the row sectors, each representing one of the four discrete line lengths.
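The per-row extraction described above can be sketched in a few lines. The input format, an iterable of `(row, time_difference_ns)` pairs, and the function name are assumptions for this illustration; the sketch uses a plain arithmetic mean, while the actual analysis may extract the mean differently (e.g. from a fit).

```python
from collections import defaultdict

def mean_delay_per_row(hits):
    """Map each row address to the mean of its time differences.

    `hits` is an iterable of (row, time_difference_ns) pairs; the names
    and the tuple layout are assumptions made for this illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row, dt in hits:
        sums[row] += dt
        counts[row] += 1
    return {row: sums[row] / counts[row] for row in sorted(sums)}

# Toy input with three populated rows
toy = [(0, 10.0), (0, 12.0), (1, 11.0), (249, 30.0), (249, 28.0)]
print(mean_delay_per_row(toy))
```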



Figure 5.37: The mean of the time difference distribution broken down for the row address.

Using the description of the signal line as an RC-line, as worked out in section 5.2.2, the Elmore approach is used to create a delay formula for each line type, utilizing the capacitance estimates shown in table 5.2 and the internal resistance of the source follower (8 k $\Omega$ ). Figure 5.18 recalls the chosen routing and shows that the total coupling capacitance changes depending on the sector as lines end or change the layer. Table 5.2 shows the estimated total capacitances for the different line types and sectors. As each line type is connected to the pixel in its last sector, a length dependent Elmore time constant is expected only there; the sectors before all contribute with the Elmore delay for the full 5 mm length of the sector.

Equation 5.18 shows the generalised Elmore expression for the different line types, with  $n$  the number of sectors,  $R_{int}$  the output resistance estimate of the source follower and  $C_{total}$  the total capacitance of the corresponding line type.

$$T = R_{int} * C_{total} + \sum_{i=1}^{n} T_{sector,i} + R_{total} * C_{AC}$$
(5.18)

The first term represents the source follower having to charge the full line, and the last term the charging of the decoupling capacitance in front of the pixel discriminator. For the longest and most complex line type (4) this gives equation 5.19, with the line length *l* between 15 mm and 20 mm,  $c_i$  the sector specific capacitance per length ( $\mu$ F/mm) and *r* the resistance per line length ( $\Omega$ /mm).

$$T_{type4}(l) = R_{int} * C_{total} + \frac{r * c_1 * (5 \text{ mm})^2}{2} + \frac{r * c_2 * (5 \text{ mm})^2}{2} + \frac{r * c_3 * (5 \text{ mm})^2}{2} + \frac{r * c_4 * (l - 15 \text{ mm})^2}{2} + (R_{int} + r * l) * C_{AC}$$
(5.19)
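The structure of equation 5.18 can be evaluated numerically with a short function: every fully crossed sector contributes with its full 5 mm length, the last sector with the remaining length. The sketch below is an illustration under stated assumptions only. All parameter values in the example call are hypothetical placeholders and not the estimates of table 5.2, and for simplicity the same `c_total` is used for all lengths, although it differs per line type.

```python
SECTOR_LENGTH_MM = 5.0  # sector length quoted in the text

def elmore_delay(l_mm, r, c_per_sector, r_int, c_total, c_ac):
    """Delay expression in the form of equation 5.18 for one signal line.

    l_mm         : line length in mm
    r            : line resistance per length (Ohm/mm)
    c_per_sector : capacitance per length for each sector (F/mm), sector 1 first
    r_int        : output resistance of the source follower (Ohm)
    c_total      : total capacitance of the line type (F)
    c_ac         : decoupling capacitance before the discriminator (F)
    """
    delay = r_int * c_total  # source follower charging the full line
    for i, c_i in enumerate(c_per_sector):
        # Length of the line inside sector i: 5 mm if fully crossed,
        # the remainder in the last sector, 0 if the line ends earlier.
        seg = min(max(l_mm - i * SECTOR_LENGTH_MM, 0.0), SECTOR_LENGTH_MM)
        delay += r * c_i * seg ** 2 / 2.0
    delay += (r_int + r * l_mm) * c_ac  # charging the decoupling capacitance
    return delay

# Hypothetical parameters, for illustration only (not the values of table 5.2)
params = dict(r=100.0, c_per_sector=[0.1e-12] * 4, r_int=8e3,
              c_total=1.5e-12, c_ac=0.1e-12)
for length in (5.0, 10.0, 15.0, 20.0):
    t_ns = elmore_delay(length, **params) * 1e9
    print(f"l = {length:4.1f} mm -> T = {t_ns:.2f} ns")
```

Passing a separate `c_total` and sector capacitance list per line type would reproduce the stepwise structure between the sectors.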



Figure 5.38: The delay behaviour predicted with the Elmore description of the RC-delay for 8 and  $10 \, k\Omega$  output resistance of the source follower. The prediction for MuPix10 in green and for comparison a MuPix8 like routing in blue.

In this way an equation for each line type is created and plotted against the line length in figure 5.38. Here, too, the differences between the sectors are clearly visible and explain the structure of the data very well. The predicted time spread from the smallest to the largest time constant is 12 ns, while the data tends towards a slightly larger spread approaching 20 ns; both show a huge compression of the time spread compared to a MuPix8-like routing scheme. The difference between prediction and data can be fully attributed to the many approximations, especially the capacitance estimate ignoring fringe effects. This influences in particular the last sector (4), where the long lines do not have any close routing neighbours and fringing effects contribute most. In addition, the estimate of the source follower's resistance might be too small, as the transistor concerned uses an enclosed design for which the process does not offer a good model. A simulation with 10 k $\Omega$  is depicted in figure 5.38b, which increases the time spread further.



Figure 5.39: The experimental data overlaid with the calculated Elmore delay predictions for  $8 \text{ k}\Omega$  source follower resistance.

The influence of the capacitance approximation is further highlighted in figure 5.39, which shows the experimental data overlaid with the Elmore prediction. While the short segments are well represented, the predictive power degrades more and more for sector 3 and 4, as the slope of the delay increase is no longer described correctly, which also hints at an underestimation of the capacitance. The time difference between data and calculation at l = 20 mm is about 10 ns. A correction term for the Elmore calculation is defined which accounts for the "missing" capacitance, see equation 5.20.

$$t_{data}(20\,\text{mm}) - t_{Elmore}(20\,\text{mm}) = R_{int} * C_{missing} + R_{total} * C_{missing}$$
(5.20)

$C_{missing}$  is calculated to be 0.7 pF, a 40% mark-up on the value estimated with the parallel plate approach.
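Equation 5.20 is linear in  $C_{missing}$  and can be solved directly. The sketch below only illustrates this arithmetic: the time difference of 10 ns and the source follower resistance of 8 k $\Omega$  are taken from the text, while the value used for `r_total` is a hypothetical placeholder, as the total line resistance is not quoted in this section.

```python
def missing_capacitance(delta_t, r_int, r_total):
    """Solve equation 5.20 for C_missing: delta_t = (R_int + R_total) * C_missing."""
    return delta_t / (r_int + r_total)

# delta_t and r_int as quoted in the text; r_total is a hypothetical placeholder
c_missing = missing_capacitance(delta_t=10e-9, r_int=8e3, r_total=6e3)
print(f"C_missing = {c_missing * 1e12:.2f} pF")
```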

# 5.5 OPTIMISATION CONSIDERATIONS TOWARDS MUPIX11

As seen in section 5.4, the optimisation of the routing scheme with the aim of crosstalk control was very successful. However, as even rather minor routing differences turn out to have a major impact on the crosstalk occurrence, it is clear that the design is still close to the critical coupling capacitance and should be improved further if possible. While in MuPix10 only two metal layers were used for the signal line routing, due to technical reasons, it is possible to also use the additional metal layer available in the TSI process for routing. This was already successfully implemented for the ATLASPix3 chip [62].

If a third metal layer is used, the routing density relaxes greatly. Instead of 125 signal lines squeezed into an  $80 \mu m$  wide corridor, only 83 (84) lines need to be placed. Keeping the line width fixed, the distance between the signal lines more than doubles, which reduces the coupling capacitances inside a metal layer by the same factor, resulting in less crosstalk and a reduced signal line delay.
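The line counts and the change in spacing can be checked with a short calculation. The sketch assumes that the 125 lines per layer correspond to 250 lines shared between two layers, that the lines are placed at a uniform pitch, and that the gap is simply pitch minus line width. The line width of 0.4 µm is a hypothetical value chosen for illustration; under these assumptions the gap more than doubles whenever the line width exceeds roughly 0.32 µm to 0.33 µm.

```python
def lines_per_layer(total_lines, n_layers):
    """Distribute signal lines as evenly as possible over the routing layers."""
    base, extra = divmod(total_lines, n_layers)
    return [base + 1 if i < extra else base for i in range(n_layers)]

def spacing_um(corridor_um, n_lines, line_width_um):
    """Gap between neighbouring lines for a uniform pitch and fixed line width."""
    return corridor_um / n_lines - line_width_um

TOTAL_LINES = 250     # 2 layers x 125 lines (inferred from the text)
CORRIDOR_UM = 80.0    # corridor width, from the text
LINE_WIDTH_UM = 0.4   # hypothetical line width, for illustration only

print(lines_per_layer(TOTAL_LINES, 3))
ratio = (spacing_um(CORRIDOR_UM, 84, LINE_WIDTH_UM)
         / spacing_um(CORRIDOR_UM, 125, LINE_WIDTH_UM))
print(f"spacing ratio: {ratio:.2f}")
```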



Figure 5.40: The cross section of routing schemes utilizing three metal layers.

Figure 5.40a shows a cross section of a straightforward approach using three metal layers, as can be found in ATLASPix3, with the lines stacked on top of each other. An investigation of the coupling capacitances within a metal layer and between two metal layers reveals that they are very comparable. While in the MuPix10 routing the coupling between metal layers could be neglected, here it has the same weight as the coupling of two neighbouring lines within a metal layer. An optimized routing scheme therefore needs to take both into account to minimize the coupling capacitances, which complicates the optimisation considerably. A way around this issue is proposed in figure 5.40b. Instead of stacking the metal lines, a staggered approach is used. Within a metal layer the capacitances remain unchanged, but between metal layers the lines are separated diagonally. This increases the distance, but the geometry no longer corresponds to the description of a parallel plate capacitor, which means that a parasitic extraction is required to determine the exact coupling capacitance. Overall this is expected to reduce the coupling between lines in different metal layers, which again allows for an optimisation using the MuPix10 techniques.

Figure 5.41: The cross section of routing schemes utilizing three metal layers.

A routing proposal for MuPix11 is shown in figure 5.41. The routing lengths are thirds of the total length, which together with the doubled line distance reduces the coupling of two neighbouring lines to less than  $\frac{1}{6}$  compared to the maximum MuPix8 coupling (MuPix10: $\frac{1}{4}$ ). This possibly eradicates signal line crosstalk and reduces the routing delay by a similar factor.
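The quoted factors follow from a simple scaling argument, sketched below under the assumption that the coupling between two neighbouring lines scales with their common length divided by their distance. Reading the MuPix10 value of 1/4 as a quarter of the neighbouring length at unchanged distance is an interpretation made here, not a statement from the text.

```python
from fractions import Fraction

def relative_coupling(length_fraction, distance_factor):
    """Coupling relative to the reference, assuming C ~ length / distance."""
    return Fraction(length_fraction) / Fraction(distance_factor)

# Three-layer proposal: one third of the length, doubled distance
print(relative_coupling(Fraction(1, 3), 2))
# MuPix10-like: a quarter of the length, unchanged distance (interpretation)
print(relative_coupling(Fraction(1, 4), 1))
```

As the distance more than doubles for a fixed line width, the first value is an upper bound in this simple picture.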

With the routing density no longer being at the process limit, a further parameter becomes available for optimisation: the line width. While for crosstalk considerations an increase of the line width has a negative impact due to the increase of the coupling capacitances, it allows the signal line resistance, which scales  $\propto \frac{1}{width}$ , to be reduced. As the delay time constant is the product of resistance and capacitance, the optimum for the delay may differ from the optimum for crosstalk. For the Mu3e experiment, however, no reduction of the delay beyond what the routing scheme already achieves is required.

#### 5.6 ATLASPIX-LIKE SENSORS

As shown in [61], ATLASPix-like sensors with an in-pixel comparator are not prone to crosstalk. Although the line driving mechanism is different from the MuPix source follower approach, overall the same thoughts and mechanisms for crosstalk and signal line delay apply.



Figure 5.42: Crosstalk mechanism for ATLASPix like signal transmission.

Figure 5.42 illustrates the crosstalk effect for the case of an ATLASPix chip. Instead of an analogue pulse, the signal is a normalized pulse of adjustable height with a steep rising edge. Just as in the MuPix case, crosstalk appears on the signal's rising edge and scales with the signal line's coupling capacitance. However, while for the analogue pulse of the MuPix a low threshold is applied on the receiving end to achieve a high efficiency and a good time resolution, the receiver circuit of the ATLASPix type chips does not have an adjustable threshold. The switching point is given by the circuit's design and is typically 500 mV or more above the baseline level, a much higher "threshold" than in the MuPix case, which allows much larger crosstalk pulses to pass undetected. At the same time, due to the normalized nature of the transmitted pulse, the crosstalk behaviour is binary, as the crosstalk pulse always has the same amplitude: if it surpasses the threshold, every crosstalk pulse crosses the threshold. Thanks to the adjustable height of the normalized pulse, this is easily managed by reducing the pulse height, which consequently reduces the crosstalk amplitude.
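The binary behaviour described above can be made explicit with a toy model. It assumes that the crosstalk amplitude is a fixed fraction of the transmitted pulse height; the coupling fraction of 0.3 and the pulse heights are hypothetical values, only the switching point of 500 mV is taken from the text.

```python
def crosstalk_detected(pulse_height_mv, coupling_fraction, switching_point_mv=500.0):
    """True if the induced pulse exceeds the receiver switching point.

    Assumes the crosstalk amplitude is coupling_fraction * pulse_height,
    a simplification made for illustration.
    """
    return pulse_height_mv * coupling_fraction > switching_point_mv

COUPLING = 0.3  # hypothetical coupling fraction

# Every pulse has the same height, so crosstalk occurs either always or never
print(crosstalk_detected(1800.0, COUPLING))  # induced pulse above the switching point
print(crosstalk_detected(1500.0, COUPLING))  # reduced pulse height, induced pulse below it
```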

Without looking further than the crosstalk effect, the digital nature of the pixel pulse transmission has clear advantages compared to the MuPix analogue approach. While ATLASPix\_Simple had the same dense routing conditions and scheme as MuPix8, the adjustability of the normalized pulse height allowed for a complete suppression of crosstalk. At the same time, due to the uniformity of the transmitted pulses, it is not subject to the efficiency reductions observed for the MuPix approach.

However, a look at the dependencies of the ATLASPix time resolution shows that it profits from larger pulse heights. Therefore a more involved routing scheme is also desirable for the ATLASPix designs, to reduce the coupling capacitances, thereby avoid crosstalk and allow for a good time resolution. This is already partially realised in ATLASPix3, which makes use of three metal layers for the pixel routing, enlarging the spacing between the signal lines by a factor of 1.6 compared to ATLASPix\_Simple. Furthermore, ATLASPix3 features a different routing scheme in which all signal lines have identical length, with each metal layer connecting one third of the pixels. This scheme can also clearly be observed in the chip's signal line delay in figure 5.43, which fits the description worked out before for the MuPix signal line.



Figure 5.43: Row address dependent delay measured for ATLASPix3[62]. The blocks are routed via metal 5(M5), metal 4(M4) and metal top(MT).

Although our prediction would be a completely linear behaviour, the three metal layers (M4, M5, MT) are clearly distinguishable. For the outer layers the structure of the neighbouring non-routing layers is variable and therefore not easy to approximate. This also shows in the pronounced differences between the two outer layers (M4 & MT). A study of the layout shows that there is one main difference in the structure of the non-routing layers. For the layer MT the neighbouring structures have the same routing orientation as the signal lines, from the periphery to the top of the pixel matrix, while for the metal layer M4 the neighbour features wide metal bars routed perpendicular to the signal lines. In both cases these structures are connected to ground or the supply voltage. The perpendicular routing approach seems to cause an overall increase of the delay time relative to the approach with the same routing direction. This observation is further supported by the outermost signal lines in all layers, as they are neighboured by broken lines, depicted in figure 5.44, which are connected to the perpendicular metal bars. These border lines can easily be spotted in the delay plot as spikes of more than 2 ns delay difference, e.g. around row addresses 50 and 110.

Figure 5.44: The broken line neighbours of the ATLASPix3 edge lines.

A possible explanation of this effect is the breakdown of the validity of the signal line model. To approximate its time constant with the Elmore delay approach, an infinite continuous ladder model is assumed. In the case of the M4 routing, however, the capacitance distribution is not continuous but locally broken. The factor 1/2 of the Elmore approximation is therefore probably not realised, and the time constant increases towards the "pessimistic" lumped approximation  $T_{Delay} = RC$ .
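The origin of the factor 1/2 can be checked numerically. For a uniform ladder with total resistance R and capacitance C split into n equal segments, the Elmore delay at the far end is the sum over all nodes of the upstream resistance times the node capacitance, which evaluates to RC(n+1)/(2n). The sketch below, with normalised values R = C = 1 chosen for illustration, shows the transition from the lumped value RC for a single segment to RC/2 for a finely distributed line.

```python
def elmore_ladder_delay(r_total, c_total, n_segments):
    """Elmore delay at the far end of a uniform RC ladder.

    The total resistance and capacitance are split into n equal segments;
    node k sees the upstream resistance k * R / n and carries C / n.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    return sum(k * r_seg * c_seg for k in range(1, n_segments + 1))

# Normalised R = C = 1: the delay falls from RC towards RC / 2
for n in (1, 2, 10, 1000):
    print(n, elmore_ladder_delay(1.0, 1.0, n))
```

A coarse or locally interrupted capacitance distribution corresponds to a small effective number of segments and thus to a time constant between RC/2 and RC.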

#### 5.7 CONCLUSION

The routing optimisation of the MuPix10 chip was very successful. Without relaxing the routing density, the crosstalk probability was confined to 2.5% by rearranging the signal lines, a factor of 7 less than expected from the conservative MuPix8 extrapolation. The spread of the row dependent signal line delay is measured to be 20 ns, about a factor of two less than expected for a MuPix8-like implementation. Table 5.4 summarizes the results. The numbers have to be taken with care, as the investigation and optimisation of MuPix10 is ongoing, which may also influence the crosstalk behaviour. However, the depletion zone is expected to be two times as wide as for the final experiment with 50  $\mu$ m thin sensors (30  $\mu$ m depletion depth), which corresponds to double the amount of collected charge and thereby on average higher signals leading to more crosstalk. Therefore a further reduction of the crosstalk probability is expected for the final use case. A detailed study of the high voltage and sensor thickness dependence is currently ongoing.

|                                | MuPix8-like | MuPix10 |
|--------------------------------|-------------|---------|
| Crosstalk Probability [%]      | 18          | 2.5     |
| Resistivity [ $\Omega$ cm]     | 80          | 200     |
| HV [V]                         | 50          | 100     |
| Estimated depletion depth [µm] | 20          | 60      |
| Row delay [ns]                 | 40          | 20      |

Table 5.4: The key results of the MuPix10 measurement compared to the expectation for a MuPix8-like routing.

The results further confirm the developed intuitive picture of the crosstalk effect, which explains the relative strength as well as the newly discovered directionality of the crosstalk. Additionally, a theoretical description of the signal line as an RC-line was established, which allows for a prediction of the position dependent signal delay with the help of the Elmore approximation. The calculated delay describes the dynamics of the signal line delay very well, but also shows the difficulty of estimating the line capacitance based on a pure parallel plate approach. To get an exact prediction of the delay times, a parasitic extraction of the routing layout is required.

It was shown that with the help of this new routing scheme crosstalk is easily detectable and can be dismissed on the basis of its ToT information, which by itself is a powerful tool for further crosstalk suppression (ToT cut) at the cost of a minimal efficiency loss at the 0.1% level. While the use of this cut in Mu3e needs to be discussed, it can be applied without hesitation e.g. to clean up the data of the reference layers of the MuPix telescope and remove 6% overhead data.

In the routing optimisation of MuPix10 only two metal layers have been utilized. The TSI process, as proven by the ATLASPix3 implementation, allows the use of three metal layers. A three layer routing scheme was proposed for MuPix11 which reduces the line coupling even further and thereby possibly eradicates the MuPix crosstalk while further reducing the range of the signal line delay. However, this change will most likely not be implemented in MuPix11, as it would require larger changes to the overall layout, while the design goal is mostly focussed on small fixes and updates. Furthermore, it is foreseeable that MuPix11 will only serve as the pixel sensor for the phase 1 Mu3e experiment, which needs to handle 20 times fewer muon decays per second. The additional hit rate caused by crosstalk (6%) will therefore be only 300 kHz for the sensor with the highest occupancy, which is easily handled by a data acquisition system designed to handle MHz hit rates.

The comparison of the MuPix performance with the ATLASPix experience shows the advantage of the ATLASPix technology, as crosstalk is easily averted and the analogue performance of the pixel is not disturbed by the signal line. However, an optimisation of the line routing is also desirable for ATLASPix-like pixel sensors to extract the best possible timing behaviour. A further reduction of the delay range is possible by improving the line routing and, especially for larger pixel sizes, by making use of the routing space to reduce the coupling capacitance. One example is the MightyPix development [76], which aims for an application at the LHC and therefore requires a hit registration within a 25 ns window. By reducing the signal line capacitance, the delay contribution to the instantaneous time resolution can also be reduced, leaving only timewalk as a dispersive effect.

With the implementation feasibility of the isolating p-well in the TSI process proven in [51], an ATLASPix-like approach will be used for the implementation of the pixel sensor for Mu3e phase 2, utilizing an optimized routing design.

Part IV

# SUMMARY AND OUTLOOK

# 6

# SUMMARY AND OUTLOOK

Since the discovery of the Higgs particle, and given the absence of further new particles at high energies so far, the search for physics beyond the standard model is more important than ever, as hints of New Physics would guide the experimental and theoretical effort in the particle physics community. Currently, searches are performed at the high energy frontier at the LHC ( $\sqrt{s} = 14$  TeV), and in the future the proposed FCC is to reach energies of  $\sqrt{s} = 100$  TeV. These searches for new particle resonances at high energy are complementary to the searches for New Physics in muon decays, as the current muon experiments probe branching ratio regions which could be enhanced e.g. by heavy new particles entering through loop contributions. The masses of such particles can be larger than  $10^3$  TeV.

One of these experiments is the Mu3e experiment, which searches for the charged lepton flavour violating decay  $\mu^+ \rightarrow e^+ e^- e^+$ , aiming to ultimately improve the existing limit of  $1 \times 10^{-12}$ , set by the SINDRUM experiment, by four orders of magnitude. To achieve this goal in a reasonable amount of time,  $2 \times 10^9$  muon decays per second have to be observed. As currently no beam line in the world can provide these high rates, the experiment begins with a muon decay rate of  $1 \times 10^8$  Hz. The low energy of the decay particles subjects them to severe multiple Coulomb scattering when crossing material, which limits the achievable precision of the particle reconstruction. As the precise determination of the particle's vertex and momentum is the most important tool for background suppression, the material budget of the pixel tracker layers needs to be reduced drastically to 0.1% of a radiation length. The high particle rates hitting the pixel sensors, 5 MHz for the sensors closest to the target, together with the thinning of the sensors to 50 µm, are unique requirements for a pixel sensor. The only pixel sensor technology that can fulfil these radical requirements was found in the HV-MAPS technology. At the same time the limited material budget also affects the sensor interconnection. The sensors are glued and bonded to a two layer aluminium HDI which needs to provide power, high voltage, configuration and readout capability. As the sensors draw their power directly from the HDI power line, most of the available routing space is required by the power lines to prevent voltage drops on the line due to ohmic losses. To maximise the power routing space, the remaining chip interfaces need to be minimised.

The minimisation of the MuPix configuration interface was very successful. While the chip internal shift registers are controlled by more than 5 different input signals, a solution employing a single differential line was found. By making use of the chip's existing clocking scheme and by incorporating the synchronous reset signal, no additional differential pair connections are required for the module configuration. All chips on one module are connected to one differential clock and configuration bus line. Slow control data is sent out via a regular 1.25 Gbit/s data link. Commercially available slow control solutions such as I2C cannot compete, as they require at least two input signals, which are typically implemented in a push-pull or open-drain signal connection scheme that is not easily compatible with the required differential signal scheme. At the nominal reference clock speed of 125 MHz, the configuration data is sent to the chip at 15.625 Mbit/s. The chip internal shift register is then written at a speed of 3.125 Mbit/s. These bit rates can compete with the fastest I2C transmission rates of 3.4 Mbit/s to 5 Mbit/s.
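The relation between the quoted clock and bit rates can be made explicit with a short calculation. The three rates are taken from the text; the ratios are simply derived from them and are not a statement about how the dividers are implemented on the chip. The register length of 1000 bits is a hypothetical number used only to illustrate the resulting write time.

```python
REF_CLOCK_HZ = 125e6         # nominal reference clock (from the text)
CONFIG_RATE_BPS = 15.625e6   # rate of configuration data sent to the chip (from the text)
REGISTER_RATE_BPS = 3.125e6  # write rate of the internal shift register (from the text)

def transfer_time_s(n_bits, rate_bps):
    """Time needed to shift n_bits at the given bit rate."""
    return n_bits / rate_bps

print(REF_CLOCK_HZ / CONFIG_RATE_BPS)       # clock cycles per transmitted bit
print(CONFIG_RATE_BPS / REGISTER_RATE_BPS)  # ratio of link rate to register write rate

N_BITS = 1000  # hypothetical register length, for illustration only
print(f"{transfer_time_s(N_BITS, REGISTER_RATE_BPS) * 1e6:.0f} us")
```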

With the presented optimised configuration schemes, a 9-chip MuPix10 module can be configured with individual configurations for each chip in less than 400 ms, while an unoptimised approach requires 4.5 s. Any module can be configured to a default state within 6 ms. For the MuPix10 register structure, no further improvement is possible besides decreasing the slow control clock divider.

Although not all features of the implementation could be tested due to a bug in the configuration shift register, the newly implemented ADC and the error reporting system work fine. The supposed bug which leads to a constant addition of slow control data into the readout data stream turns out to be very practical and will be implemented in the next iteration.

The developed slow control interface is well optimised for the current HV-MAPS sensors, but is also easily adaptable and expandable to other uses and functions, which gives this implementation the potential to be a baseline design for many applications. The choice of a differential scheme will also allow it to be used in AC coupled scenarios, e.g. serially powered sensor modules.

As a second project the nature of the MuPix signal lines was investigated in depth. The observation of signal line crosstalk and signal line delay in the MuPix8 chip led to the development of an improved routing scheme which reduces the crosstalk as well as the observed row address dependent delay. A conservative extrapolation of the MuPix8 results estimates the expected crosstalk and delay effects for the even larger MuPix10 chip at an average crosstalk probability of 18% and a position dependent delay of 40 ns across the row address space. The improved routing scheme successfully reduced the crosstalk probability to 2.5% and the delay below 20 ns. It was further shown that the remaining crosstalk is well understood and can be rejected on the basis of its ToT information. A simple ToT cut, without taking into account the specifically engineered address correlations, suffices to remove crosstalk further, at the cost of a minimal efficiency reduction of only 0.1%.

The results are well explained by the presented crosstalk and delay models. In particular, the delay calculation based on the Elmore approximation shows very good agreement with the data, although many parameters have been estimated with rather simple models. In MuPix10 the prediction degrades for longer line lengths, most likely due to the less dense routing of the signal lines: the simplified capacitance models underestimate the coupling capacitance of the long signal lines.

To eradicate the remaining crosstalk, a new routing scheme was proposed which makes use of three metal layers for the routing. This leads to a reduction of the routing density and subsequently of the capacitive coupling to neighbouring signal lines.

The developed models for crosstalk are also transferable to the ATLASPix type sensors. Although crosstalk affects this technology significantly less, which gives it a clear advantage, the possibility of crosstalk is still present and needs to be kept in check. More importantly, the delay behaviour is described by the same model. As current developments of ATLASPix-like chips aim for applications e.g. at LHCb (MightyPix), which requires an excellent in-time resolution within a 25 ns time window, a reduction of the signal line delay spread will remove one of the contributions that degrade the sensor's raw time resolution.

The MuPix10 chip is so far performing splendidly and fulfils all requirements of the Mu3e pixel tracker. Currently, only 100 µm thick sensors have been investigated thoroughly; studies of thinned sensors are under way. Most of the issues found are easily fixed, but they affect the testability of the MuPix10 sensor. For one, they prevent the complete testing of the novel slow control interface, as described earlier. At the same time, the observed in-chip voltage drops affect the analogue performance of the sensor, so changes of the analogue performance are expected once the voltage drops are removed. Therefore, a study is ongoing which aims to match the in-pixel voltage domains by compensating the shift of the chip-internal voltage domains with shifts of the external voltage levels.

Soon the Mu3e pixel tracker will be the first large-scale detector project constructed with HV-MAPS sensors. It is expected that MuPix11 will be the last large-scale HV-MAPS sensor featuring a source follower as the line driving element. The advantages of a design with an in-pixel comparator are obvious: no analogue pulse, which experiences the signal line as a shaping element, is sent down the signal line. The technology is now also available with a CMOS in-pixel comparator, which makes it suitable for use in Mu3e, which will need to develop a new sensor that copes with the even higher muon decay rates of phase 2.

Part V

APPENDIX

# PUBLICATIONS

Some of the ideas and figures presented in this thesis have appeared previously or are expected to be published in the following journal articles and conference proceedings:

# Technical design of the phase I Mu3e experiment

K. Arndt et al. In: *Nucl. Instrum. Meth.* A1014 (2021) DOI: 10.1016/j.nima.2021.165679

# The Mu3e experiment: Toward the construction of an HV-MAPS vertex detector

T. Rudzki et al. (2021). arXiv: 2106.03534 [physics.ins-det]

# MuPix10: First Results from the Final Design

H. Augustin et al. In: *JPSCP* PP (2021) DOI: 10.7566/JPSCP.34.010012

# The Mu3e Data Acquisition

H. Augustin et al. In: *IEEE TNS* PP (2021) DOI: 10.1109/TNS.2021.3084060

# High-Voltage CMOS Active Pixel Sensor

I. Perić et al. In: *IEEE JSSC* PP (2021) DOI: 10.1109/JSSC.2021.3061760

# Test results of ATLASPIX3 — A reticle size HVCMOS pixel sensor designed for construction of multi chip modules

R. Schimassek et al. In: *Nucl. Instrum. Meth.* A986 (2021) DOI: 10.1016/j.nima.2020.164812

# The MuPix sensor for the Mu3e experiment

H. Augustin et al. In: *Nucl. Instrum. Meth.* A979 (2020) DOI: 10.1016/j.nima.2020.164441

# MuPix & ATLASpix: Architectures and Results

A. Schöning et al. In: *PoS* Vertex2019 DOI: 10.22323/1.373.0024

# EUDAQ—a data acquisition software framework for common beam telescopes

P. Ahlburg et al. In: *JINST* 15 P01038 DOI: 10.1088/1748-0221/15/01/P01038

# Performance of the ATLASPix1 pixel sensor prototype in ams aH18 CMOS technology for the ATLAS ITk upgrade

M. Kiehn et al. In: *JINST* 14 C08013 DOI: 10.1088/1748-0221/14/08/C08013

# Performance of the large scale HV-CMOS pixel sensor MuPix8

H. Augustin et al. In: *JINST* 14 C10011 DOI: 10.1088/1748-0221/14/10/C10011

# Electrical characterization of AMS aH18 HV-CMOS after neutrons and protons irradiation

D. Sultan et al. In: *JINST* 14 C05003 DOI: 10.1088/1748-0221/14/05/C05003

# MuPix8 — Large area monolithic HVCMOS pixel detector for the Mu3e experiment

H. Augustin et al. In: *Nucl. Instrum. Meth.* A936 (2018) DOI: 10.1016/j.nima.2018.09.095

# A high-voltage pixel sensor for the ATLAS upgrade

I. Perić et al. In: *Nucl. Instrum. Meth.* A924 (2018) DOI: 10.1016/j.nima.2018.06.060

# Irradiation Study of a Fully Monolithic HV-CMOS Pixel Sensor Design in AMS 180 nm

H. Augustin et al. In: *Nucl. Instrum. Meth.* A905 (2018) DOI: 10.1016/j.nima.2018.07.044

# Efficiency and Timing Performance of the MuPix7 High Voltage Monolithic Active Pixel Sensor

H. Augustin et al. In: *Nucl. Instrum. Meth.* A902 (2018) DOI: 10.1016/j.nima.2018.06.049

# The MuPix Telescope: A Thin, High Rate Tracking Telescope

H. Augustin et al. In: *JINST* 12.01 (2017), C01087. DOI: 10.1088/1748-0221/12/01/C01087.

# MuPix7—A Fast Monolithic HV-CMOS Pixel Chip for Mu3e

H. Augustin et al. In: *JINST* 11.11 (2016), C11029. DOI: 10.1088/1748-0221/11/11/C11029

# The MuPix System-on-Chip for the Mu3e Experiment

H. Augustin et al. In: *Nucl. Instrum. Meth.* A845 (2017), 194-198. DOI: 10.1016/j.nima.2016.06.095.

# Overview of HVCMOS Pixel Sensors

I. Perić et al. In: *JINST* 10.05 (2015), C05021. DOI: 10.1088/1748-0221/10/05/C05021.

# B

- G. Aad et al. "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC". In: *Physics Letters B* 716.1 (2012), pp. 1–29. DOI: https://doi.org/10.1016/j.physletb.2012.08.020.URL: https://www.sciencedirect.com/science/article/pii/S037026931200857X.
- S. Chatrchyan et al. "Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC". In: *Physics Letters B* 716.1 (2012), pp. 30–61. DOI: https://doi.org/10.1016/j.physletb. 2012.08.021. URL: https://www.sciencedirect.com/science/ article/pii/S0370269312008581.
- [3] E. Corbelli and P. Salucci. "The extended rotation curve and the dark matter halo of M33". In: *Monthly Notices of the Royal Astronomical Society* 311.2 (2000), 441–447. DOI: 10.1046/j.1365-8711.2000.03075.x. URL: http://dx.doi.org/10.1046/j.1365-8711.2000.03075.x.
- [4] O. Aberle et al. High-Luminosity Large Hadron Collider (HL-LHC): Technical design report. CERN Yellow Reports: Monographs. Geneva: CERN, 2020. DOI: 10.23731/CYRM-2020-0010. URL: https://cds. cern.ch/record/2749422.
- [5] Michelangelo Mangano et al. FCC Physics Opportunities: Future Circular Collider Conceptual Design Report Volume 1. Future Circular Collider. Tech. rep. Geneva: CERN, 2018. DOI: 10.1140 / epjc / s10052 019 6904 3. URL: https://cds.cern.ch/record/2651294.
- [6] Q. R. Ahmad et al. "Measurement of the charged current interactions produced by B-8 solar neutrinos at the Sudbury Neutrino Observatory". In: *Phys. Rev. Lett.* 87 (2001), p. 071301. eprint: nucl - ex/ 0106015.
- [7] F.P. An et al. "Observation of electron-antineutrino disappearance at Daya Bay". In: *Phys.Rev.Lett.* 108 (2012), p. 171803. DOI: 10.1103/ PhysRevLett.108.171803. arXiv: 1203.1669 [hep-ex].
- [8] Y. Fukuda et al. "Evidence for oscillation of atmospheric neutrinos". In: *Phys. Rev. Lett.* 81 (1998), pp. 1562–1567. eprint: hep - ex / 9807003.

- [9] J. Bernabéu, E. Nardi, and D. Tommasini. "mu-e conversion in nuclei and Z-prime physics". In: *Nuclear Physics B* 409.1 (1993), pp. 69–86. DOI: https://doi.org/10.1016/0550-3213(93)90446-V. URL: https://www.sciencedirect.com/science/article/pii/055032139390446V.
- [10] U. Bellgardt et al. "Search for the Decay  $\mu^+ \rightarrow e^+ e^+ e^-$ ". In: *Nucl.Phys.* B299 (1988), p. 1. DOI: 10.1016/0550-3213(88)90462-2.
- [11] A. M. Baldini et al. "Search for the lepton flavour violating decay  $\mu^+ \rightarrow e^+ \gamma$  with the full dataset of the MEG experiment". In: *Eur. Phys. J. C* 76.8 (2016), p. 434. DOI: 10.1140/epjc/s10052-016-4271-x. arXiv: 1605.05081 [hep-ex].
- [12] J. Kaulard et al. "Improved limit on the branching ratio of  $\mu^- \rightarrow e^+$  conversion on titanium". In: *Phys.Lett.* B422 (1998), pp. 334–338. DOI: 10.1016/S0370-2693(97)01423-8.
- [13] William J. Marciano, Toshinori Mori, and J. Michael Roney. "Charged Lepton Flavor Violation Experiments". In: Annual Review of Nuclear and Particle Science 58.1 (2008), pp. 315–341. DOI: 10.1146/ annurev.nucl.58.110707.171126. eprint: https://doi.org/10. 1146/annurev.nucl.58.110707.171126. URL: https://doi.org/ 10.1146/annurev.nucl.58.110707.171126.
- [14] A. Baldini et al. *A submission to the 2020 update of the European Strategy for Particle Physics on behalf of the COMET, MEG, Mu2e and Mu3e collaborations.* 2018. arXiv: 1812.06540 [hep-ex].
- [15] A. Blondel et al. *Letter of Intent for an Experiment to Search for the*  $Decay \mu \rightarrow eee. 2012.$
- [16] A. Blondel et al. "Research Proposal for an Experiment to Search for the Decay  $\mu \rightarrow eee$ ". In: *ArXiv:1301.6113* (Jan. 2013). arXiv: 1301.6113 [physics.ins-det].
- [17] K. Arndt et al. "Technical design of the phase I Mu3e experiment". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 1014 (2021), p. 165679. DOI: https://doi.org/10.1016/j.nima. 2021.165679.URL: https://www.sciencedirect.com/science/ article/pii/S0168900221006641.
- [18] HiMB. "The high intensity muon beam project." URL: https://www. psi.ch/en/ltp-muon-physics/the-high-intensity-muonbeam-himb-project.
- [19] G. Hernández-Tomé, G. López Castro, and P. Roig. "Flavor violating leptonic decays of  $\tau$  and  $\mu$  leptons in the Standard Model with massive neutrinos". In: *Eur. Phys. J. C* 79.1 (2019). [Erratum: Eur.Phys.J.C 80, 438 (2020)], p. 84. DOI: 10.1140/epjc/s10052-019-6563-4. arXiv: 1807.06050 [hep-ph].

- [20] P. Blackstone, M. Fael, and E. Passemar. " $\tau \rightarrow \mu\mu\mu$  at a rate of one out of 10<sup>14</sup> tau decays?" In: *Eur. Phys. J. C* 80.6 (2020), p. 506. DOI: 10.1140/epjc/s10052-020-8059-7. arXiv: 1912.09862 [hep-ph].
- [21] Y. Kuno and Y. Okada. "Muon decay and physics beyond the standard model". In: *Rev. Mod. Phys.* 73 (2001), pp. 151–202. eprint: hepph/9909265.
- [22] Particle Data Group. "Review of Particle Physics". In: Progress of Theoretical and Experimental Physics 2020.8 (Aug. 2020). 083C01. DOI: 10.1093/ptep/ptaa104. eprint: https://academic.oup.com/ ptep/article-pdf/2020/8/083C01/34673722/ptaa104.pdf. URL: https://doi.org/10.1093/ptep/ptaa104.
- [23] Heiko Augustin et al. "The Mu3e Data Acquisition". In: *IEEE Transactions on Nuclear Science* 68.8 (2021), pp. 1833–1840. DOI: 10.1109/ TNS.2021.3084060.
- [24] B. Hyams, U. Koetz, E. Belau, R. Klanner, G. Lutz, E. Neugebauer, A. Wylie, and J. Kemmer. "A silicon counter telescope to study short-lived particles in high-energy hadronic interactions". In: *Nuclear Instruments and Methods in Physics Research* 205.1 (1983), pp. 99–105. DOI: https://doi.org/10.1016/0167-5087(83)90177-1.URL: https://www.sciencedirect.com/science/article/pii/0167508783901771.
- [25] The ATLAS Collaboration. "The ATLAS Experiment at the CERN Large Hadron Collider". In: *Journal of Instrumentation* 3.08 (2008), S08003–S08003. DOI: 10.1088/1748-0221/3/08/s08003. URL: https://doi.org/10.1088/1748-0221/3/08/s08003.
- [26] The CMS Collaboration. "The CMS experiment at the CERN LHC". In: *Journal of Instrumentation* 3.08 (2008), S08004–S08004. DOI: 10. 1088/1748-0221/3/08/s08004. URL: https://doi.org/10.1088/ 1748-0221/3/08/s08004.
- [27] I. Perić. "A novel monolithic pixelated particle detector implemented in high-voltage CMOS technology". In: *Nucl.Instrum.Meth.* A582 (2007), p. 876. DOI: 10.1016/j.nima.2007.07.115.
- [28] H. Augustin et al. "The MuPix System-on-Chip for the Mu3e Experiment". In: *Nucl. Instrum. Meth.* A845 (2017), pp. 194–198. DOI: 10.1016 / j.nima.2016.06.095. arXiv: 1603.08751 [physics.ins-det].
- [29] H. Augustin et al. "MuPix7-A fast monolithic HV-CMOS pixel chip for Mu3e". In: *Journal of Instrumentation* 11.11 (2016), p. C11029. URL: http://stacks.iop.org/1748-0221/11/i=11/a=C11029.
- [30] H. Augustin et al. "The MuPix Telescope: A Thin, high Rate Tracking Telescope". In: *JINST* 12.01 (2017), p. C01087. DOI: 10.1088/1748-0221/12/01/C01087. arXiv: 1611.03102 [physics.ins-det].

- [31] H. Augustin et al. "Irradiation study of a fully monolithic HV-CMOS pixel sensor design in AMS 180 nm". In: *Nucl. Instrum. Meth.* A905 (2018), pp. 53–60. DOI: 10.1016/j.nima.2018.07.044. arXiv: 1712.03921 [physics.ins-det].
- [32] H. Augustin et al. "Efficiency and timing performance of the MuPix7 high-voltage monolithic active pixel sensor". In: *Nucl. Instrum. Meth.* A902 (2018), pp. 158–163. DOI: 10 . 1016 / j . nima . 2018 . 06 . 039 , 10 . 1016 / j . nima . 2018 . 06 . 049. arXiv: 1803 . 01581 [physics.ins-det].
- [33] H. Bethe, W. Heitler, and Paul Adrien Maurice Dirac. "On the stopping of fast particles and on the creation of positive electrons". In: *Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character* 146.856 (1934), pp. 83–112. DOI: 10.1098/rspa.1934.0140. eprint: https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.1934.0140. URL: https://royalsocietypublishing.org/doi/abs/10.1098/rspa.1934.0140.
- [34] Stephen M. Seltzer and Martin J. Berger. "Improved procedure for calculating the collision stopping power of elements and compounds for electrons and positrons". In: *The International Journal* of Applied Radiation and Isotopes 35.7 (1984). URL: http://www. sciencedirect.com/science/article/pii/0020708X84901133.
- [35] William R. Leo. *Techniques for nuclear and particle physics experiments.* Springer, 1987.
- [36] W. Shockley. "The theory of p-n junctions in semiconductors and p-n junction transistors". In: *The Bell System Technical Journal* 28.3 (1949), pp. 435–489. DOI: 10.1002/j.1538-7305.1949.tb03645.x.
- [37] A. Meneses. "Thesis in Preparation". PhD thesis. Heidelberg University, 2022. URL: https://www.psi.ch/mu3e/theses.
- [38] OPTIM Waferservices. Private Communication. URL: https://www. optimwaferservices.com.
- [39] Heinz Pernegger. *The Pixel Detector of the ATLAS Experiment for LHC Run-2*. Tech. rep. Geneva: CERN, 2015. DOI: 10.1088/1748-0221/10/06/C06012. URL: https://cds.cern.ch/record/1985432.
- [40] N. Demaria et al. "Recent progress of RD53 Collaboration towards next generation Pixel Read-Out Chip for HL-LHC". In: *Journal of Instrumentation* 11.12 (2016), pp. C12058–C12058. DOI: 10.1088 / 1748 - 0221/11/12/c12058. URL: https://doi.org/10.1088 / 1748-0221/11/12/c12058.
- [41] Ricardo Marco-Hernández. "Overview of CMOS Sensors for Future Tracking Detectors". In: *Instruments* 4.4 (2020). DOI: 10.3390 / instruments4040036. URL: https://www.mdpi.com/2410-390X/ 4/4/36.

- [42] R. Diener et al. "The DESY II Test Beam Facility". In: Nucl. Instrum. Meth. A922 (2019), pp. 265–286. DOI: 10.1016/j.nima.2018.11.
   133. arXiv: 1807.09328 [physics.ins-det].
- [43] Giacomo Contin et al. "The STAR MAPS-based PiXeL detector". In: *Nucl. Instrum. Meth. A* 907 (2018), pp. 60–80. DOI: 10.1016/j.nima. 2018.03.003. arXiv: 1710.02176 [physics.ins-det].
- [44] M. Mager. "ALPIDE, the Monolithic Active Pixel Sensor for the AL-ICE ITS upgrade". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 824 (2016). Frontier Detectors for Frontier Physics: Proceedings of the 13th Pisa Meeting on Advanced Detectors, pp. 434–438. DOI: https://doi.org/10.1016/j.nima. 2015.09.057.URL: https://www.sciencedirect.com/science/ article/pii/S0168900215011122.
- [45] H. Augustin et al. "Performance of the large scale HV-CMOS pixel sensor MuPix8". In: *JINST* 14.10 (2019), p. C10011. DOI: 10 . 1088 / 1748 0221 / 14 / 10 / C10011. arXiv: 1905 . 09309 [physics.ins-det].
- [46] C. Blattgerste. "Study of different 180 nm HV-CMOS Implementations of the MuPix7 Pixel Sensor". Bachelor thesis. Heidelberg University, 2019. URL: https://www.psi.ch/mu3e/theses.
- [47] LTU. "LED Technologies of Ukraine". URL: http://ltu.ua/en/ index/.
- [48] M. Oinonen et al. "ALICE Silicon Strip Detector module assembly with single-point TAB interconnections". In: *Proceedings, eleventh Workshop on Electronics for LHC and Future Experiments, Heidelberg, Germany, 12-16 September 2005.* 2005, pp. 92–98. DOI: 10.5170 / CERN 2005 011. URL: http://lhc-workshop-2005.web.cern.ch/lhc%2Dworkshop%2D2005/PlenarySessions/15-M0inonen.pdf.
- [49] Ivan Perić et al. "High-Voltage CMOS Active Pixel Sensor". In: *IEEE Journal of Solid-State Circuits* 56.8 (2021), pp. 2488–2502. DOI: 10. 1109/JSSC.2021.3061760.
- [50] J. Hammerich. "Analog Characterization and Time Resolution of a large scale HV-MAPS Prototype". Master thesis. Heidelberg University, 2018. URL: https://www.psi.ch/mu3e/theses.
- [51] A. L. Weber. "Development of Integrated Circuits and Smart Sensors for Particle Detection in Physics Experiments and Particle Therapy". PhD thesis. Heidelberg University, 2021. URL: https://www.psi. ch/mu3e/theses.
- [52] S. Dittmeier. "Fast data acquisition for silicon tracking detectors at high rates". PhD thesis. Heidelberg University, 2018. URL: https:// www.psi.ch/mu3e/theses.

- [53] L. Huth. "A High Rate Testbeam Data Acquisition System and Characterization of High Voltage Monolithic Active Pixel Sensors". PhD thesis. Heidelberg University, 2018. URL: https://www.psi.ch/ mu3e/theses.
- [54] J. Stricker. "Testing of a Method for the Sensor Thickness Determination and a Cluster Size Study for the MuPix10". Bachelor thesis. Heidelberg University, 2021. URL: https://www.psi.ch/mu3e/theses.
- [55] F. Frauen. "Characterisation of the time resolution of the MuPix10 pixel sensor". Bachelor thesis. Heidelberg University, 2021. URL: https://www.psi.ch/mu3e/theses.
- [56] M. Menzel. "Calibration of the MuPix10 pixel sensor for the Mu3e experiment". Bachelor thesis. Heidelberg University, 2020. URL: https: //www.psi.ch/mu3e/theses.
- [57] T. Rudzki. "Thesis in Preparation". PhD thesis. Heidelberg University, 2022. URL: https://www.psi.ch/mu3e/theses.
- [58] Dominik Becker et al. "The P2 experiment". In: *The European Physical Journal A* 54.11 (2018). DOI: 10.1140/epja/i2018-12611-6.
   URL: http://dx.doi.org/10.1140/epja/i2018-12611-6.
- [59] The PANDA collaboration. TDR for the PANDA Luminosity Detector. Tech. rep. GSI, 2015. URL: https://panda.gsi.de/publication/ re-tdr-2015-001.
- [60] L. Huth et al. "TelePix 1 A HV-CMOS Pixel sensor with region of interest trigger and timing capabilites for beam telescopes". In preparation.
- [61] A. Herkert. "Characterisation of a Monolithic Pixel Sensor Prototype in HV-CMOS Technology for the High-Luminosity LHC". PhD thesis. Heidelberg University, 2020. URL: https://www.psi.ch/mu3e/ theses.
- [62] D. Kim. "Timing Study and Optimization of ATLASPix3 a full-scale HV-MAPS Prototype". Bachelor thesis. Heidelberg University, 2020. URL: https://www.psi.ch/mu3e/theses.
- [63] R. Schimassek et al. "Test results of ATLASPIX3 A reticle size HVC-MOS pixel sensor designed for construction of multi chip modules". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 986 (2021), p. 164812. DOI: https://doi.org/10.1016/j.nima.2020.164812.URL: https://www.sciencedirect.com/science/article/pii/S0168900220312092.
- [64] W. Snoeys et al. "A process modification for CMOS monolithic active pixel sensors for enhanced depletion, timing performance and radiation tolerance". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 871 (2017), pp. 90–96. DOI: https://doi.org/10.

1016/j.nima.2017.07.046.URL: https://www.sciencedirect. com/science/article/pii/S016890021730791X.

- [65] H. Pernegger et al. "Radiation hard monolithic CMOS sensors with small electrodes for High Luminosity LHC". In: Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 986 (2021), p. 164381. DOI: https://doi.org/10.1016/j.nima.2020.164381.URL: https://www.sciencedirect.com/science/article/pii/S0168900220307786.
- [66] R. Cardella et al. "MALTA: an asynchronous readout CMOS monolithic pixel detector for the ATLAS High-Luminosity upgrade". In: *Journal of Instrumentation* 14.06 (2019), pp. C06019–C06019. DOI: 10.1088/1748-0221/14/06/c06019. URL: https://doi.org/ 10.1088/1748-0221/14/06/c06019.
- [67] G. Iacobucci et al. "A 50 ps resolution monolithic active pixel sensor without internal gain in SiGe BiCMOS technology". In: *Journal of Instrumentation* 14.11 (2019), P11008–P11008. DOI: 10.1088/1748-0221/14/11/p11008. URL: https://doi.org/10.1088/1748-0221/14/11/p11008.
- [68] Maurice Garcia-Sciveres, Flavio Loddo, and Jorgen Christiansen. RD53B Manual. Tech. rep. CERN-RD53-PUB-19-002. Geneva: CERN, 2019. URL: https://cds.cern.ch/record/2665301.
- [69] Heiko Augustin et al. "MuPix10: First Results from the Final Design". In: Proceedings of the 29th International Workshop on Vertex Detectors (VERTEX2020). DOI: 10.7566/JPSCP.34.010012. eprint: https: //journals.jps.jp/doi/pdf/10.7566/JPSCP.34.010012. URL: https://journals.jps.jp/doi/abs/10.7566/JPSCP.34.010012.
- [70] Yungseon Eo, W.R. Eisenstadt, Ju Young Jeong, and Oh-Kyong Kwon.
   "A new on-chip interconnect crosstalk model and experimental verification for CMOS VLSI circuit design". In: *IEEE Transactions on Electron Devices* 47.1 (2000), pp. 129–140. DOI: 10.1109/16.817578.
- [71] R. Gupta, B. Tutuianu, and L.T. Pileggi. "The Elmore delay as a bound for RC trees with generalized input signals". In: *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 16.1 (1997), pp. 95–104. DOI: 10.1109/43.559334.
- [72] Howard W. Johnson and Martin Graham. *High-speed signal propagation. advanced black magic.* eng. 10. printing. Prentice Hall modern semiconductor design series. Includes bibliographical references and index. Upper Saddle River, NJ: Prentice Hall PTR, 2008, XXX, 766 S.

- [73] A.B. Kahng, K. Masuko, and S. Muddu. "Analytical delay models for VLSI interconnects under ramp input". In: *Proceedings of International Conference on Computer Aided Design*. 1996, pp. 30–36. DOI: 10.1109/ICCAD.1996.568907.
- [74] W. C. Elmore. "The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers". In: *Journal of Applied Physics* 19.1 (1948), pp. 55–63. DOI: 10.1063/1.1697872. eprint: https://doi.org/10.1063/1.1697872. URL: https://doi.org/10.1063/1.1697872.
- [75] R. Brun and F. Rademakers. "ROOT: An object oriented data analysis framework". In: *Nucl. Instrum. Meth. A* 389 (1997). Ed. by M. Werlen and D. Perret-Gallix, pp. 81–86. DOI: 10.1016/S0168-9002(97)00048-X.
- [76] LHCb. "Preliminary Specification of aHV-CMOS Pixel Chip for LHCbUpgrade II". Internal note.



I dedicate the last page to everyone who supported me in carrying out this doctoral thesis.

I thank my doctoral supervisor André Schöning for his support and the trust he placed in me, as well as for the stimulating discussions, which time and again led me to look beyond my own horizon and contributed to the success of this work.

My thanks go to Peter Fischer for his patience and his willingness to act as second referee.

A huge thank you goes to all members of the Mu3e working group who accompanied me through the years and created an incredibly good atmosphere, be it at the institute, during a testbeam campaign or at the evening barbecue. My special thanks go to my long-time colleagues Ann-Kathrin Perrevoort, Moritz Kiehn, Lennart Huth, Sebastian Dittmeier, Adrian Herkert, Dorothea vom Bruch, Jan Hammerich, David Immig, Benjamin Weinläder and Dohun Kim, who are always up for entertaining conversations, also beyond work. What I have learned with you in years of intensive collaboration will stay with me for the rest of my life. Thank you for your friendship.

I would also like to thank Nik Berger and Dirk Wiedner for all the lessons over the years. I will never forget these fundamentals.

A heartfelt thank you of course also goes to my proofreaders, who were available even at short notice: Lennart Huth, Sebastian Dittmeier, Adrian Herkert, Jan Hammerich, David Immig, Thomas Rudzki and Luigi Vigani.

I would like to thank Ivan Perić, Alena Weber and Benjamin Weinläder for the in-depth discussions on chip design, which certainly contributed to the success of this work. Special thanks go to Ivan for the intensive support ahead of the sensor submissions.

A big thank you also goes to Lutz Hofmann for the interdisciplinary breakfast meetings, which got me out of bed earlier than usual.

I would like to thank my parents and my sister for their enormous support throughout all the years of my studies. It is good to know that there is a safe haven.

In this spirit, a big thank you also goes to all friends near and far, in particular the Urpils clique, who welcome me with open arms even after a longer absence.

Finally, I would like to thank Julia Riese, who always stands by my side and who, especially during this time of writing up, put up with me and kept me fed. Thank you, my darling.