Let’s stick with the voltmeter example from yesterday, but use a parabolic model:
y = a + δx (b + δx c),   δx = x − x₀
The parameters a, b and c need to be determined during the calibration procedure.
measurement number | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
reference x (true) [V] | 0 | 1 | 2 | 3 | 4 | 5 |
measured y [V] | -0.10 | 0.84 | 1.91 | 2.88 | 4.06 | 4.83 |
The uncertainty for all measured voltages is 0.1 V.
Code up a visualisation of the data.
import ROOT

Vtrue = [0, 1, 2, 3, 4, 5]
V = [-0.10, 0.84, 1.91, 2.88, 4.06, 4.83]
sigmas = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

ROOT.gROOT.SetBatch()
c = ROOT.TCanvas("c", "c", 800, 600)
# build graph with data
gdata = ROOT.TGraphErrors(len(Vtrue))
gdata.SetNameTitle("gdata", "data")
for i in range(len(Vtrue)):
    gdata.SetPoint(i, Vtrue[i], V[i])
    gdata.SetPointError(i, 0., sigmas[i])
gdata.SetLineWidth(2)
gdata.SetMarkerStyle(7)
gdata.SetMarkerSize(2)
gdata.Draw("AP")  # draw data
c.SaveAs("day2-ex1.pdf")
Given the model from the last slide,
y = a + δx (b + δx c),   δx = x − x₀,
the χ² function comes out as
χ² = ∑ᵢ (yᵢ − a − b δxᵢ − c δxᵢ²)² / σᵢ²
We need to minimize that function with respect to the fit parameters a, b and c: calculate the derivatives with respect to a, b and c, and set them to zero.
Putting the derivatives to zero yields:
0 = ∂χ²/∂a = −2 ∑ᵢ (yᵢ − a − b δxᵢ − c δxᵢ²) ⋅ 1    / σᵢ²
0 = ∂χ²/∂b = −2 ∑ᵢ (yᵢ − a − b δxᵢ − c δxᵢ²) ⋅ δxᵢ  / σᵢ²
0 = ∂χ²/∂c = −2 ∑ᵢ (yᵢ − a − b δxᵢ − c δxᵢ²) ⋅ δxᵢ² / σᵢ²
In terms of our “shorthand” for weighted sums, ⟨f⟩ ≡ ∑ᵢ fᵢ / σᵢ², this becomes
a ⟨1⟩   + b ⟨δx⟩  + c ⟨δx²⟩ = ⟨y⟩
a ⟨δx⟩  + b ⟨δx²⟩ + c ⟨δx³⟩ = ⟨y δx⟩
a ⟨δx²⟩ + b ⟨δx³⟩ + c ⟨δx⁴⟩ = ⟨y δx²⟩
Or, in matrix form (let’s call the matrix below M):

/ ⟨1⟩   ⟨δx⟩  ⟨δx²⟩ \ / a \   / ⟨y⟩     \
| ⟨δx⟩  ⟨δx²⟩ ⟨δx³⟩ | | b | = | ⟨y δx⟩  |
\ ⟨δx²⟩ ⟨δx³⟩ ⟨δx⁴⟩ / \ c /   \ ⟨y δx²⟩ /
measurement number | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
reference x (true) [V] | 0 | 1 | 2 | 3 | 4 | 5 |
measured y [V] | -0.10 | 0.84 | 1.91 | 2.88 | 4.06 | 4.83 |
The uncertainty for all measured voltages is 0.1 V. Using the model
y = a + δx (b + δx c),   δx = x − x₀,
leads to the matrix equation

/ ⟨1⟩   ⟨δx⟩  ⟨δx²⟩ \ / a \   / ⟨y⟩     \
| ⟨δx⟩  ⟨δx²⟩ ⟨δx³⟩ | | b | = | ⟨y δx⟩  |
\ ⟨δx²⟩ ⟨δx³⟩ ⟨δx⁴⟩ / \ c /   \ ⟨y δx²⟩ /

Write a piece of code to calculate the matrix and right-hand side from the data.
def buildLinSystemForParabolaFit(xs, ys, sigmays, x0):
    rhs = [0, 0, 0]
    # packed storage for the symmetric 3 x 3 matrix:
    # mat = [M00, M10, M11, M20, M21, M22]
    mat = [0, 0, 0, 0, 0, 0]
    for x, y, sigma in zip(xs, ys, sigmays):
        w = 1 / (sigma * sigma)
        dx = x - x0
        dx2 = dx * dx
        rhs[0] += w * y
        rhs[1] += w * y * dx
        rhs[2] += w * y * dx2
        mat[0] += w              # <1>
        mat[1] += w * dx         # <dx>
        mat[2] += w * dx2        # <dx^2>
        mat[4] += w * dx * dx2   # <dx^3>
        mat[5] += w * dx2 * dx2  # <dx^4>
    mat[3] = mat[2]  # M20 = <dx^2> as well
    return rhs, mat
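A quick usage sketch (an assumption on my part, not from the original exercise: it reuses the data lists Vtrue, V and sigmas from the visualisation code, and x₀ = 2.5, the midpoint of the reference voltages, which is also the value used in the χ² calculation at the end of this section):

# normal equations for the calibration data, expanded around x0 = 2.5
rhs, mat = buildLinSystemForParabolaFit(Vtrue, V, sigmas, 2.5)
print("rhs =", rhs)  # [<y>, <y dx>, <y dx^2>]
print("mat =", mat)  # packed: [<1>, <dx>, <dx^2>, <dx^2>, <dx^3>, <dx^4>]
# the reference points lie symmetrically around x0 = 2.5, so the odd
# moments <dx> and <dx^3> vanish and the matrix M is almost diagonal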
A word on the routine on the last page is in order: it uses packed matrix storage. The matrix is symmetric, so it’s attractive to save only half of it.
Consider a symmetric 100 × 100 matrix (100 fit parameters can happen in practice). With normal matrix storage, that means updating around 78 kiB of matrix elements for every measurement; with packed storage, it’s only about 40 kiB. For large matrices and/or large data sets, this translates into a nice speed gain.
Technically, one saves only the diagonal and the elements below it, and finds element Mᵢⱼ in a flat array at index i (i + 1) / 2 + j (for j ≤ i):
M₀₀, M₁₀, M₁₁, M₂₀, M₂₁, M₂₂, …
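To make the indexing concrete, here is a minimal Python sketch of the scheme (the class name PackedMatrix is made up for this illustration; the C++ solution below relies on a similar helper called PackedMatrixAdapter):

class PackedMatrix:
    """Symmetric n x n matrix, storing only the lower triangle row by row."""
    def __init__(self, n):
        self.n = n
        self.data = [0.0] * (n * (n + 1) // 2)

    def _index(self, i, j):
        # map (i, j) to the packed index i (i + 1) / 2 + j for j <= i;
        # exploit symmetry when the upper half is requested
        if j > i:
            i, j = j, i
        return i * (i + 1) // 2 + j

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self._index(i, j)]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[self._index(i, j)] = value

For example, m = PackedMatrix(3); m[2, 0] = 5.0 can be read back as m[0, 2], and m.data holds only 6 numbers instead of 9.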
A linear system of equations, M x = b, is usually solved by using some matrix decomposition method. The strategy is usually to write M = K R, where
K is a matrix that is easily invertible,
R is usually an upper triangular matrix, i.e. the system R x = K⁻¹ b can be solved by back substitution:

        / R₁₁ R₁₂ ⋯ R₁ₙ \
M = K ⋅ |  0  R₂₂ ⋯ R₂ₙ |
        |  ⋮   ⋱   ⋱  ⋮ |
        \  0   ⋯  0  Rₙₙ /

K⁻¹ is thus the matrix that annihilates the lower left half of M (as K⁻¹ M = R).
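As a small numpy illustration of this M = K R strategy (not part of the original slides), one can use QR decomposition, where K = Q is orthogonal; the matrix values here are just an example, the same one that reappears in the Cholesky example below:

import numpy as np

M = np.array([[  4.0,  12.0, -16.0],
              [ 12.0,  37.0, -43.0],
              [-16.0, -43.0,  98.0]])
b = np.array([1.0, 2.0, 3.0])
Q, R = np.linalg.qr(M)     # K = Q is orthogonal: "inverting" it is just transposing
y = Q.T @ b                # y = K^-1 b
x = np.linalg.solve(R, y)  # R x = y; R is triangular, so back substitution suffices
print(M @ x - b)           # should be (numerically) zero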
There are a number of common matrix decomposition strategies: LU decomposition, QR decomposition, singular value decomposition (SVD), and, for symmetric positive-definite matrices, Cholesky decomposition. These methods have different properties regarding their numerical stability (more on that later).
Given a symmetric positive-definite matrix M = Mᵀ > 0, Cholesky decomposition forms a lower triangular matrix L with M = L Lᵀ. Example:

/  4  12 -16 \   /  2 0 0 \   / 2 6 -8 \
| 12  37 -43 | = |  6 1 0 | ⋅ | 0 1  5 |
\ -16 -43 98 /   \ -8 5 3 /   \ 0 0  3 /
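This example can be checked with numpy; np.linalg.cholesky returns exactly the lower triangular factor L:

import numpy as np

M = np.array([[  4.0,  12.0, -16.0],
              [ 12.0,  37.0, -43.0],
              [-16.0, -43.0,  98.0]])
L = np.linalg.cholesky(M)  # lower triangular factor
print(L)            # [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]
print(L @ L.T - M)  # zero matrix (up to rounding)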
For a symmetric positive-definite n × n real matrix M, the elements of L can be computed as follows:

Lⱼⱼ = √( Mⱼⱼ − ∑ₖ₌₁ʲ⁻¹ Lⱼₖ² )
Lᵢⱼ = ( Mᵢⱼ − ∑ₖ₌₁ʲ⁻¹ Lᵢₖ Lⱼₖ ) / Lⱼⱼ    for i > j
A matrix M is (almost) never exact: floating point numbers have a finite precision, and often there are measurement uncertainties.
The matrix M itself is usually given by the problem, and we cannot do anything about it. But the choice of matrix K in the decomposition M = K R can make matters worse or better.
Let’s look at the eigenvalues (EVs) of K, ordered by absolute value:
|λₘᵢₙ| = |λ₁| ≤ |λ₂| ≤ … ≤ |λₙ| = |λₘₐₓ|
The overall scale of the EVs does not matter, because an overall scaling does not change the relative numerical error in the calculation of R = K⁻¹ M and b′ = K⁻¹ b.
The “eigenvalue contrast” or condition number κ = |λₘₐₓ| / |λₘᵢₙ| is what drives numerical stability, because it determines an upper bound for the “amplification” of the numerical error along the directions of the eigenvectors.
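To get a feeling for what a large condition number means in practice, here is a small numpy illustration (the Hilbert matrix is a textbook example of an ill-conditioned matrix; it is not part of the lecture material):

import numpy as np

# 6 x 6 Hilbert matrix, H[i][j] = 1 / (i + j + 1), notoriously ill-conditioned
H = np.array([[1.0 / (i + j + 1) for j in range(6)] for i in range(6)])
print(np.linalg.cond(H))  # about 1.5e7: roughly 7 significant digits are lost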
What are typical values of the condition number κ?
LU decomposition: depends on the matrix M, but 10⁴…10⁶ is not uncommon (a single precision float has only about 7 decimal digits of precision!)
QR decomposition: Q is orthogonal, so all its eigenvalues have modulus 1 – numerically stable
SVD: uses two rotation/mirror matrices like QR – numerically stable
Cholesky decomposition: writing M = L Lᵀ in the equivalent LDLᵀ form with unit-diagonal triangular factors,

/  1   0  ⋯ 0 \   / D₁ 0 ⋯  0 \   / 1 L₂₁ ⋯ Lₙ₁ \
| L₂₁  1  ⋱ 0 |   | 0 D₂ ⋱  0 |   | 0  1  ⋯ Lₙ₂ |
|  ⋮   ⋱  ⋱ 0 | ⋅ | 0  ⋱ ⋱  0 | ⋅ | 0  ⋱  ⋱  ⋮ |
\ Lₙ₁ Lₙ₂ ⋯ 1 /   \ 0  ⋯ 0 Dₙ /   \ 0  ⋯  0  1 /

the triangular factors have all eigenvalues equal to 1, and the eigenvalue spread is confined to the diagonal matrix D – numerically stable
Recall the formulas for the elements of L:

Lⱼⱼ = √( Mⱼⱼ − ∑ₖ₌₁ʲ⁻¹ Lⱼₖ² )
Lᵢⱼ = ( Mᵢⱼ − ∑ₖ₌₁ʲ⁻¹ Lᵢₖ Lⱼₖ ) / Lⱼⱼ    for i > j
Implement Cholesky decomposition in code, and write a routine that allows one to solve M x = b with two substitution passes.
Test the routine: write a piece of code that uses the solver from above to invert a symmetric positive-definite matrix, and test it.
#include <cmath>

// Cholesky decomposition in place: on success, m holds the packed lower
// triangular factor L. PackedMatrixAdapter (see separate file) maps M[i][j]
// to the packed array index i * (i + 1) / 2 + j.
bool choleskyDecomp(unsigned n, double* m)
{
    PackedMatrixAdapter<double> M(m); // make packed storage accessible, see file
    for (unsigned j = 0; n != j; ++j) {
        double diag = M[j][j];
        for (unsigned k = j; k--; )
            diag -= M[j][k] * M[j][k];
        if (diag <= 0) return false; // matrix not positive-definite
        M[j][j] = diag = std::sqrt(diag);
        diag = 1 / diag;
        for (unsigned i = j + 1; n != i; ++i) {
            double tmp = M[i][j];
            for (unsigned k = 0; j != k; ++k)
                tmp -= M[i][k] * M[j][k];
            M[i][j] = tmp * diag;
        }
    }
    return true;
}
// solve M x = b, given the packed Cholesky factor l of M; b is overwritten
// with the solution x
void solve(unsigned n, const double* l, double* b)
{
    PackedMatrixAdapter<const double> L(l);
    // forward substitution: solve L y = b, save y in b
    for (unsigned i = 0; n != i; ++i) {
        double tmp = b[i];
        for (unsigned j = 0; i != j; ++j)
            tmp -= L[i][j] * b[j];
        b[i] = tmp / L[i][i];
    }
    // back substitution: solve L^T x = y, save x in b
    for (unsigned i = n; i--; ) {
        double tmp = b[i];
        for (unsigned j = n; --j != i; )
            tmp -= L[j][i] * b[j]; // (L^T)[i][j] = L[j][i]
        b[i] = tmp / L[i][i];
    }
}
#include <vector>

// invert a symmetric positive-definite matrix, given its packed Cholesky
// factor l; overwrites l with the packed inverse
void invert(unsigned n, double* l)
{
    std::vector<double> mat(l, l + n * (n + 1) / 2); // copy of the factor
    PackedMatrixAdapter<double> Li(l);
    std::vector<double> tmp;
    tmp.reserve(n);
    for (unsigned i = 0; n != i; ++i) {
        // solve M x = e_i for the i-th unit vector e_i
        tmp.assign(n, 0);
        tmp[i] = 1;
        solve(n, mat.data(), tmp.data());
        // the inverse is symmetric, so saving the lower half is enough
        for (unsigned j = 0; j <= i; ++j)
            Li[i][j] = tmp[j];
    }
}
Take the code that calculates the matrix and right-hand side from earlier, use the Cholesky code to solve the matrix equation, and work out the covariance matrix.
If your Cholesky decomposition code does not work yet, look into numpy, ROOT, GSL or similar packages, and use their routines, as sketched below. (You have to know what these routines do, but in real life, you should likely use a library routine that’s well debugged…)
Visualize the results (like yesterday).
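For the library-routine fallback, a minimal numpy sketch could look like this (assumptions: it reuses buildLinSystemForParabolaFit and the data lists from above, again with x₀ = 2.5):

import numpy as np

rhs, mat = buildLinSystemForParabolaFit(Vtrue, V, sigmas, 2.5)
# expand the packed storage [M00, M10, M11, M20, M21, M22] into a full matrix
M = np.array([[mat[0], mat[1], mat[3]],
              [mat[1], mat[2], mat[4]],
              [mat[3], mat[4], mat[5]]])
params = np.linalg.solve(M, rhs)  # best fit values (a, b, c)
cov = np.linalg.inv(M)            # covariance matrix of the fit parameters
print("a, b, c =", params)
print("uncertainties:", np.sqrt(np.diag(cov)))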
Ideally, one would like to assess how well one is doing with the fits. There are a couple of things we can look at:
Once the best fit point is known, we can calculate the χ² value for the data.
If things behave well, χ² should be about the number of degrees of freedom (ndf), i.e. the number of data points minus the number of fit parameters.
In other words, χ² / ndf should be around 1 for a good fit to more than a handful of data points.
Calculate χ² and χ² / ndf for your parabolic fit.
If the measurements are normally distributed, one can calculate from first principles how likely it is to get a certain value of χ² (given ndf).
The p-value measures how likely it is to get a χ² value at least as bad as the one observed; it is the complement of the cumulative distribution function (CDF) of the χ² distribution:

p-value = ∫ f(χ′²; ndf) dχ′²   (integrated from the observed χ² to ∞)

In ROOT, you can access this function with ROOT.TMath.Prob(chi2, ndf).
What is the p-value for your fit?
def calcChi2(xs, ys, sigmas, x0, param):
    retVal = 0
    for x, y, s in zip(xs, ys, sigmas):
        w = 1 / (s * s)
        dx = x - x0
        # model prediction in Horner form: a + dx * (b + dx * c)
        ypred = param[0] + dx * (param[1] + dx * param[2])
        retVal += (y - ypred) * (y - ypred) * w
    return retVal

# crhs holds the fitted parameters (a, b, c) obtained from the solver;
# ndf = 6 data points - 3 fit parameters = 3
chi2 = calcChi2(Vtrue, V, sigmas, 2.5, crhs)
prob = ROOT.TMath.Prob(chi2, 3)
print("chi2={} prob={}".format(chi2, prob))