calibration package

Submodules

calibration.dataprocessing module

calibration.errormetrics module

calibration.errormetrics.MAE(x, y): Mean Absolute Error

calibration.errormetrics.MSE(x, y): Mean Squared Error
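A minimal NumPy sketch of the two standard definitions, assuming x and y are equal-length arrays of true and estimated values. The lowercase helper names are hypothetical, and how the package's own MAE and MSE treat NaNs or multi-dimensional input is not documented here.

```python
import numpy as np


def mae(x, y):
    """Mean absolute error: the average of |x - y|."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs(x - y)))


def mse(x, y):
    """Mean squared error: the average of (x - y)**2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))


print(mae([1, 2, 3], [1, 2, 5]))  # 0.6666666666666666
print(mse([1, 2, 3], [1, 2, 5]))  # 1.3333333333333333
```

MSE squares each error, so a single large miss counts for more than under MAE.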

calibration.errormetrics.NLPD(x, y, ystd)

(normalised) Negative Log Predictive Density

Definition of Negative Log Predictive Density (NLPD):

$$L = -\frac{1}{n} \sum_{i=1}^n \log p(y_i = t_i \mid \mathbf{x}_i)$$

See http://mlg.eng.cam.ac.uk/pub/pdf/QuiRasSinetal06.pdf, page 13.

“This loss penalizes both over and under-confident predictions.” but “The NLPD loss favours conservative models, that is models that tend to be under-confident rather than over-confident. This is illustrated in Fig. 7, and can be deduced from the fact that logarithms are being used. An interesting way of using the NLPD is to give it relative to the NLPD of a predictor that ignores the inputs and always predicts the same Gaussian predictive distribution, with mean and variance the empirical mean and variance of the training data. This relative NLPD translates into a gain of information with respect to the simple Gaussian predictor described.”
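A minimal sketch, assuming Gaussian predictive distributions N(mu_i, sigma_i^2), so that each term of the sum is -log p(y_i = t_i | x_i) = 0.5 log(2 pi sigma_i^2) + (t_i - mu_i)^2 / (2 sigma_i^2). The helper names and argument order are hypothetical and need not match NLPD(x, y, ystd). The relative version follows the quoted suggestion, taken here as a difference against a baseline that always predicts the training data's empirical mean and variance. Negative values mean the model beats that baseline.

```python
import numpy as np


def gaussian_nlpd(t, mu, sigma):
    """Average negative log density of targets t under N(mu, sigma**2)."""
    t, mu, sigma = (np.asarray(a, dtype=float) for a in (t, mu, sigma))
    per_point = 0.5 * np.log(2 * np.pi * sigma**2) + (t - mu) ** 2 / (2 * sigma**2)
    return float(np.mean(per_point))


def relative_nlpd(t, mu, sigma, ytrain):
    """NLPD minus the NLPD of an input-ignoring Gaussian fitted to ytrain."""
    ytrain = np.asarray(ytrain, dtype=float)
    baseline = gaussian_nlpd(t, ytrain.mean(), ytrain.std())
    return gaussian_nlpd(t, mu, sigma) - baseline


# An over-confident wrong prediction is penalised far more than a cautious one.
print(gaussian_nlpd([0.0], [1.0], [0.1]) > gaussian_nlpd([0.0], [1.0], [1.0]))  # True
```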

calibration.errormetrics.NMSE(x, y): Normalised Mean Squared Error, where x = correct values and y = estimates. See https://math.stackexchange.com/questions/488964/the-definition-of-nmse-normalized-mean-square-error
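Several normalisations are in use (the linked question discusses this). The sketch below assumes the one that divides the squared error by the energy of the correct signal, ||x - y||^2 / ||x||^2. Check it against the package before relying on it; the helper name is hypothetical.

```python
import numpy as np


def nmse(x, y):
    """Assumed NMSE: ||x - y||^2 / ||x||^2, with x = correct and y = estimate."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2) / np.sum(x**2))


print(nmse([2.0, 0.0], [1.0, 0.0]))  # 0.25
```

Unlike MSE, this ratio is unchanged if x and y are rescaled by the same factor.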

calibration.errormetrics.compute_test_data(X, Y, trueY, refsensor)

This method produces test data for evaluating models. It returns three matrices: testX, testY and testtrueY.

The trueY parameter contains a list of known measurements, with ‘nan’s for all the other observations. For example, for the Kampala data I’ve picked out all the data which was observed by a low cost sensor next to sensor #47:

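```python
import numpy as np
from calibration.errormetrics import compute_test_data

# X, Y and refsensor are assumed to be loaded already (see Parameters below).
trueY = np.full_like(Y[:, 0], np.nan)  # unknown (nan) unless sensor #47 was involved
# Rows where sensor B is #47: its reading (column 1) is the true value.
refkeeps = np.isin(X[:, 2], [47])
trueY[refkeeps] = Y[refkeeps, 1]
# Rows where sensor A is #47: its reading (column 0) is the true value.
refkeeps = np.isin(X[:, 1], [47])
trueY[refkeeps] = Y[refkeeps, 0]
testX, testY, testtrueY = compute_test_data(X, Y, trueY, refsensor)
```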

It constructs the output by considering all the colocations with reference instruments, then uses the non-reference instrument's measurements as the testY data and the reference measured values as the testtrueY values.

Parameters:

X = An Nx3 matrix of [time, sensoridA, sensoridB].
Y = An Nx2 matrix of measured values at sensorA and sensorB.
trueY = The true pollution at these measurements, if known, otherwise nan.
refsensor = A binary vector indicating whether each sensor is a reference sensor.

Returns:

testX = An Mx3 matrix of [time, sensorid, 0]; the last column is left in but unused.
testY = The measured values at the colocated low-cost sensors.
testtrueY = The true values at the colocations.
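The colocation step described above can be sketched in vectorised NumPy. The sketch assumes that refsensor is indexed by sensor id and that only pairs with exactly one reference instrument are kept. It does not use trueY, whose exact role inside compute_test_data is not spelled out here. Treat it as an illustration, not the package's implementation; colocations_with_references is a hypothetical name.

```python
import numpy as np


def colocations_with_references(X, Y, refsensor):
    """Keep pairs with exactly one reference sensor; return (testX, testY, testtrueY)."""
    X = np.asarray(X)
    Y = np.asarray(Y, dtype=float)
    isref = np.asarray(refsensor, dtype=bool)
    refA = isref[X[:, 1].astype(int)]  # is sensor A a reference instrument?
    refB = isref[X[:, 2].astype(int)]  # is sensor B a reference instrument?
    a_is_ref = refA & ~refB  # reference in A, low-cost sensor in B
    b_is_ref = refB & ~refA  # reference in B, low-cost sensor in A
    times = np.concatenate([X[a_is_ref, 0], X[b_is_ref, 0]])
    sensors = np.concatenate([X[a_is_ref, 2], X[b_is_ref, 1]])  # low-cost sensor ids
    testX = np.column_stack([times, sensors, np.zeros_like(times)])  # last column unused
    testY = np.concatenate([Y[a_is_ref, 1], Y[b_is_ref, 0]])  # low-cost readings
    testtrueY = np.concatenate([Y[a_is_ref, 0], Y[b_is_ref, 1]])  # reference readings
    order = np.argsort(times, kind="stable")  # return rows in time order
    return testX[order], testY[order], testtrueY[order]


X = np.array([[0, 0, 1], [1, 1, 2], [2, 2, 0]])  # sensor 0 is the only reference
Y = np.array([[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]])
testX, testY, testtrueY = colocations_with_references(X, Y, refsensor=[1, 0, 0])
```

The middle row is dropped because neither sensor 1 nor sensor 2 is a reference.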

calibration.hasenfratz module

calibration.simple module

calibration.synthetic module

calibration.synthetic.generate_synthetic_dataset(Nstatic, Nmobile, Ttotal, Nrefs, Nvisitsperdayref, Nvisitsperday, staticsensornoise, mobilesensornoise, Nsamps)

Nvisitsperdayref = Expected number of times per day each mobile sensor visits a static reference sensor.
Nvisitsperday = Expected number of times per day each mobile sensor visits a static sensor.
Ttotal = Number of time steps (hours).
Nsamps = Number of samples per colocation event.

In reality, the boda-boda drivers have their own ‘patches’, which means they are more likely to visit sensors in their own patch. We build our synthetic data by building a temporary simulation of the locations of the static sensors and of the centres of the mobile (boda-boda) motorbike taxi activity. The boda-bodas are organised around particular waiting areas, known as stages, across the city. Typically a boda-boda will have one stage at which it is allowed to wait, so it is also more likely to visit sensors in its part of the city.

We simulate this by randomly selecting locations for these stages and then assigning each static sensor a probability of being visited that is proportional to the inverse of its distance from the stage.
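A minimal sketch of the inverse-distance step. It assumes 2-D coordinates, Euclidean distance, and that each stage's probabilities are normalised to sum to one over the static sensors. The function name and the normalisation are illustrative assumptions, not taken from calibration.synthetic.

```python
import numpy as np


def visit_probabilities(stages, sensors, eps=1e-9):
    """Row i: probability that the boda-boda at stage i visits each static sensor.

    Probabilities are proportional to 1 / distance, and each row sums to one.
    """
    stages = np.asarray(stages, dtype=float)  # (Nmobile, 2) stage coordinates
    sensors = np.asarray(sensors, dtype=float)  # (Nstatic, 2) sensor coordinates
    # Pairwise Euclidean distances, shape (Nmobile, Nstatic).
    d = np.linalg.norm(stages[:, None, :] - sensors[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, eps)  # guard against a stage sitting exactly on a sensor
    return w / w.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
stages = rng.uniform(0, 10, size=(3, 2))  # random stage locations (arbitrary units)
sensors = rng.uniform(0, 10, size=(5, 2))  # random static sensor locations
probs = visit_probabilities(stages, sensors)  # shape (3, 5); each row sums to 1
```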

With only two or three reference sensors arranged across Kampala, we pay the boda-boda drivers to visit them once a week to recalibrate the mobile sensors. Future papers will explore the optimum sequence of visits.

Ignoring the difference between night and day, we simply note that each hour has a probability of approximately Nvisitsperday/24 of containing a visit. We multiply this by the probability assigned to each static sensor (of being visited by each mobile sensor).
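The hourly rule above can be sketched as independent Bernoulli draws, one per hour and per (mobile, static) pair. The independence is an assumption for illustration. Here probs is any (Nmobile, Nstatic) matrix of per-sensor visit probabilities, such as the inverse-distance probabilities described above, and simulate_visits is a hypothetical name.

```python
import numpy as np


def simulate_visits(probs, Nvisitsperday, Ttotal, seed=0):
    """Boolean array (Ttotal, Nmobile, Nstatic): True where a visit happens.

    Each hour, mobile sensor i visits static sensor j with probability
    (Nvisitsperday / 24) * probs[i, j], ignoring night and day.
    """
    probs = np.asarray(probs, dtype=float)
    rng = np.random.default_rng(seed)
    p_hour = (Nvisitsperday / 24.0) * probs
    return rng.random((Ttotal,) + probs.shape) < p_hour


probs = np.array([[0.75, 0.25]])  # one mobile sensor, two static sensors
visits = simulate_visits(probs, Nvisitsperday=2, Ttotal=24 * 7)
counts = visits.sum(axis=0)  # visits per (mobile, static) pair over one week
```

Over Ttotal hours the expected number of visits to sensor j is about Ttotal / 24 * Nvisitsperday * probs[i, j].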

calibration.synthetic.getmobilesensortranform(t, pol, sensor, noisescale)

calibration.synthetic.getrealpolution(t, loc)

calibration.synthetic.getstaticsensortranform(t, pol, Nrefs, sensor, noisescale)

Module contents

class calibration.BrokenMultiKernel(gpflowkernels, indices)

Bases: object

matrix(X1, X2)

class calibration.CalibrationSystem(X, Y, Z, refsensor, C, transform_fn, gpflowkernels, kernelindices, likemodel='fixed', gpflowkernellike=None, likelihoodstd=1.0, jitter=0.0001, lr=0.02, likelr=None, minibatchsize=100, sideY=None)

Bases: object

computeforminibatch(justuserefs=False)

likelihoodfn(scaledA, scaledB, ref)

likelihoodfn_nonstationary(scaledA, scaledB, varparamA, varparamB)

precompute()

run(its=None, samples=100, threshold=0.001)

Run the VI optimisation.

its: Number of iterations. Set its to None to stop automatically once the ELBO changes by less than threshold (a fraction, not a percentage) between the rolling average of the last 50 calculations and the rolling average of the 50 before that.
samples: Number of samples used for the stochastic estimate of the gradient.
threshold: If its is None, the relative change between the two rolling averages (each over 50 iterations) below which optimisation stops. Default: 0.001 (0.1%).
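A minimal sketch of the stopping rule as described, assuming the change is measured relative to the earlier rolling average. The helper is hypothetical; run() may compute it differently (for example on the negative ELBO).

```python
import numpy as np


def has_converged(elbo_history, threshold=0.001, window=50):
    """True once the rolling-average ELBO changes by less than threshold (a fraction).

    Compares the mean of the last `window` values with the mean of the
    `window` values before that.
    """
    if len(elbo_history) < 2 * window:
        return False  # not enough history yet
    h = np.asarray(elbo_history, dtype=float)
    recent = h[-window:].mean()
    previous = h[-2 * window:-window].mean()
    rel_change = abs(recent - previous) / max(abs(previous), 1e-12)  # avoid dividing by 0
    return rel_change < threshold
```

In a training loop, each ELBO estimate would be appended to elbo_history, and the loop would stop when has_converged(elbo_history) returns True. Averaging over 50 iterations smooths out the noise from the stochastic gradient estimates.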

class calibration.CalibrationSystemNoMiniBatch(X, Y, Z, refsensor, C, transform_fn, gpflowkernel, likemodel='fixed', gpflowkernellike=None, likelihoodstd=1.0, jitter=0.0001, lr=0.02, likelr=None, sideY=None)

Bases: object

likelihoodfn(scaledA, scaledB)

likelihoodfn_nonstationary(scaledA, scaledB, varparamA, varparamB)

precompute()

run(its=None, samples=100, threshold=0.001)

Run the VI optimisation.

its: Number of iterations. Set its to None to stop automatically once the ELBO changes by less than threshold (a fraction, not a percentage) between the rolling average of the last 50 calculations and the rolling average of the 50 before that.
samples: Number of samples used for the stochastic estimate of the gradient.
threshold: If its is None, the relative change between the two rolling averages (each over 50 iterations) below which optimisation stops. Default: 0.001 (0.1%).

class calibration.Kernel(gpflowkernel)

Bases: object

matrix(X1, X2)

class calibration.MultiKernel(gpflowkernels, indices)

Bases: object

matrix(X1, X2)

oldmatrix(X1, X2)

class calibration.SparseModel(X, Z, C, k, jitter=0.0001)

Bases: object

get_qf(mu, scale)

get_samples(mu, scale, num=100)

Get samples of the function components for every observation pair in X. Returns a num x N x (C*2) matrix, where:

num = number of samples
N = number of observation pairs
C = number of components

So, in the returned tensor, the last dimension holds a pair of sensor values (one for each sensor in the observation pair) for each of the C components.

If scale is set to None, then we return a single sample: the posterior mean (i.e. we assume a Dirac q(f)). This returns a 1 x N x (C*2) matrix.
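The Dirac special case can be illustrated with a generic reparameterised Gaussian sampler. This sketch assumes q is a Gaussian over a flat vector with mean mu and a square scale matrix (for example a lower-triangular Cholesky factor), so that the covariance is scale @ scale.T. It does not reproduce how get_samples maps samples to the num x N x (C*2) layout, and sample_q is a hypothetical name.

```python
import numpy as np


def sample_q(mu, scale, num=100, seed=0):
    """Draw num samples from N(mu, scale @ scale.T) via the reparameterisation trick.

    If scale is None, q is treated as a Dirac delta at mu, and a single
    "sample" (the mean itself) is returned with shape (1, D).
    """
    mu = np.asarray(mu, dtype=float)
    if scale is None:
        return mu[None, :]
    scale = np.asarray(scale, dtype=float)  # (D, D) scale matrix
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((num, mu.shape[0]))  # standard normal noise
    return mu[None, :] + eps @ scale.T  # each row is mu + scale @ eps_i
```

Because each sample is a deterministic function of mu, scale and eps, gradients can flow through mu and scale. This is why the trick suits stochastic VI.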

get_samples_one_sensor(mu, scale, num=100)

Get samples of the function components for a sensor. Returns a num x N x C matrix, where:

num = number of samples
N = number of observation pairs
C = number of components

So, in the returned tensor, the last dimension holds the C components for that sensor.

class calibration.SparseModelNoMiniBatch(X, Z, C, k)

Bases: object

get_qf(mu, scale)

get_samples(mu, scale, num=100)

Get samples of the function components for every observation pair in X. Returns a num x N x (C*2) matrix, where:

num = number of samples
N = number of observation pairs
C = number of components

So, in the returned tensor, the last dimension holds a pair of sensor values (one for each sensor in the observation pair) for each of the C components.

If scale is set to None, then we return a single sample: the posterior mean (i.e. we assume a Dirac q(f)). This returns a 1 x N x (C*2) matrix.

get_samples_one_sensor(mu, scale, num=100)

Get samples of the function components for a sensor. Returns a num x N x C matrix, where:

num = number of samples
N = number of observation pairs
C = number of components

So, in the returned tensor, the last dimension holds the C components for that sensor.

class calibration.TemporaryMultiKernel(gpflowkernels, indices)

Bases: object

matrix(X1, X2)

calibration.getcov(scale)

calibration.placeinducingpoints(X, S, C, M=16): Set up inducing point locations, evenly spaced.
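For intuition, evenly spaced inducing points over a single input dimension (assumed here to be time) can be sketched with np.linspace. How placeinducingpoints uses S and C, and which columns of X it spans, is not documented here, so the helper below is hypothetical.

```python
import numpy as np


def evenly_spaced_inducing_times(times, M=16):
    """Return M evenly spaced locations spanning the observed time range."""
    times = np.asarray(times, dtype=float)
    return np.linspace(times.min(), times.max(), M)


Z = evenly_spaced_inducing_times(np.arange(0, 168), M=16)  # one week of hourly data
```

Even spacing keeps every part of the time range within reach of an inducing point, regardless of where the observations cluster.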