# Optimal number of fitting points for MSD measurements

Nanoparticle tracking analysis (NTA) data is normally reduced to a fit of the mean squared displacement (MSD) of a particle, which can be quickly written as:

$$\left<\Delta x^2\right>(\tau) = 2D\tau$$

In the equation above, $$\tau$$ is the time step between measurements. For a normal track, there's going to be a $$\tau_1$$ which corresponds to the inverse of the frame rate, and a $$\tau_N$$ which corresponds to the total time of the acquisition.

The uncertainty with which $$\left<\Delta x^2\right>(\tau_i)$$ is measured depends on the number of values being averaged. For a track of $$N$$ frames, the smallest time step is averaged over $$N-1$$ displacements, the second over $$N-2$$, and so forth, until the last one, which is a single displacement and is not averaged at all.
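The averaging described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a 1D track sampled at a constant frame rate, with the hypothetical helper `msd_1d` returning, for each lag, both the time-averaged MSD and the number of displacements that went into the average.

```python
import numpy as np


def msd_1d(positions, max_lag=None):
    """Time-averaged MSD of a 1D track (hypothetical helper for illustration).

    Returns the lags (in frames), the MSD at each lag, and the number of
    displacements averaged at each lag.
    """
    x = np.asarray(positions, dtype=float)
    n_frames = x.size
    if max_lag is None:
        max_lag = n_frames - 1
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(lags.size)
    counts = np.empty(lags.size, dtype=int)
    for i, lag in enumerate(lags):
        # All (overlapping) displacements separated by `lag` frames
        displacements = x[lag:] - x[:-lag]
        msd[i] = np.mean(displacements ** 2)
        counts[i] = displacements.size
    return lags, msd, counts
```

The `counts` array makes the point explicit: it decreases by one for every additional lag, so the longest lag is estimated from a single displacement.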

Taking into account the Sources of error in measuring diffusion coefficient through nanoparticle tracking analysis, one can assume that averaging many steps yields more accurate results. Although this is understood theoretically, it may be harder to realize experimentally [@ernst2013Measuring a diffusion coefficient by single-particle tracking: statistical analysis of experimental mean squared displacement curves].

The question is therefore how many $$\tau$$ points must be included in the fit to obtain the best measurement of the diffusion coefficient $$D$$. In the figure below, there are experimental results showing what happens when the number of points considered for the fit changes:

The experiment in [@ernst2013Measuring a diffusion coefficient by single-particle tracking: statistical analysis of experimental mean squared displacement curves] is based on a very long track ($$10^5$$ data points) of a single particle, split into subsets of different lengths. This makes it possible to build distributions (and measure the uncertainty) under the exact same experimental conditions and with the same object under study.

Hollow and filled squares represent different numbers of total frames, while the horizontal axis shows how many points were included in the MSD fit (as in the inset). Surprisingly, the optimum (lowest variance) occurs at around 4 or 5 points. Taking more data points into account only worsens the results. The difference between $$N=100$$ and $$N=1000$$ is also worth exploring.
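The trend can be explored with a small simulation. What follows is a minimal sketch, not the analysis of the cited paper: it assumes ideal 1D Brownian motion with no localization error or motion blur, and an unweighted straight-line fit with a free offset. All function names and default parameters are hypothetical. Under these idealized assumptions the position of the optimum does not have to coincide with the experimental one, but the sketch does show how the spread of the fitted $$D$$ grows once many lag points are included.

```python
import numpy as np


def simulate_track(n_frames, d_coeff, dt, rng):
    """1D Brownian track; each step has variance 2 * D * dt."""
    steps = rng.normal(0.0, np.sqrt(2.0 * d_coeff * dt), size=n_frames - 1)
    return np.concatenate(([0.0], np.cumsum(steps)))


def fit_diffusion(track, n_fit, dt):
    """Fit a straight line to the first n_fit MSD points and return D = slope / 2."""
    lags = np.arange(1, n_fit + 1)
    msd = np.array([np.mean((track[lag:] - track[:-lag]) ** 2) for lag in lags])
    slope, _offset = np.polyfit(lags * dt, msd, 1)
    return slope / 2.0


def spread_vs_fit_points(n_frames=100, fit_points=(2, 4, 10, 50),
                         n_tracks=500, d_coeff=1.0, dt=1.0, seed=0):
    """Mean and standard deviation of the fitted D for each number of fit points."""
    rng = np.random.default_rng(seed)
    tracks = [simulate_track(n_frames, d_coeff, dt, rng) for _ in range(n_tracks)]
    result = {}
    for n_fit in fit_points:
        estimates = np.array([fit_diffusion(t, n_fit, dt) for t in tracks])
        result[n_fit] = (estimates.mean(), estimates.std(ddof=1))
    return result


if __name__ == "__main__":
    for n_fit, (mean_d, std_d) in spread_vs_fit_points().items():
        print(f"n_fit={n_fit:3d}  mean D={mean_d:.3f}  std D={std_d:.3f}")
```

Changing `n_frames` from 100 to 1000 gives a rough feeling for how the distributions narrow with longer tracks.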

In the figure above, each distribution was generated at the optimal number of time-steps for the MSD fit, but with a different number of total frames. The most important message is:

For trajectories with a length in the order of 100 data points, the actual outcome of an experiment for the diffusion coefficient can vary by more than a factor of 2.

The standard deviation for the distributions can be described by:

$$\sigma=\pm \bar{D}\left(\frac{2n}{3(N_\textrm{seg}-n)^{1/2}}\right)$$

Here, $$n$$ is the number of points used for the fit. One of the results that I like the most is that, when many diffusion coefficients are averaged, the overall result is very close to the *ground truth*.

**To summarize**, if we use track lengths of around 100 points, the diffusion coefficient can be determined with an accuracy of around $$\pm25\%$$. With a track length of 1000 frames, the accuracy improves to around $$\pm10\%$$.
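As a quick sanity check, the expression for $$\sigma$$ can be evaluated numerically. The sketch below transcribes the formula exactly as it is written in this note (with the square root applied only to $$N_\textrm{seg}-n$$); if the original paper groups the exponent differently, the numbers will change. The function name `relative_sigma` is hypothetical.

```python
import math


def relative_sigma(n_fit, n_seg):
    """sigma / D-bar, evaluated as the expression is written in this note."""
    if not 0 < n_fit < n_seg:
        raise ValueError("need 0 < n_fit < n_seg")
    return 2.0 * n_fit / (3.0 * math.sqrt(n_seg - n_fit))


if __name__ == "__main__":
    for n_seg in (100, 1000):
        print(n_seg, round(relative_sigma(4, n_seg), 3))
```

With $$n=4$$ this gives roughly 0.27 for 100 frames and roughly 0.08 for 1000 frames, in the same range as the values quoted above.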

This definitely has an impact on the quality of the data generated by nanoparticle tracking analysis, and is probably one of the limitations of the technique. Moreover, a factor of 10 increase in the track length only yields an improvement of about a factor of 2.5 in the accuracy of the diffusion coefficient (from $$\pm25\%$$ to $$\pm10\%$$). The question I have is what would be the Optimal track length for MSD measurements, considering not only the accuracy, but also the data volume and the time it takes to generate the data.

Also, what is the impact of this uncertainty on the Stokes-Einstein relationship?
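A first-order answer can be sketched by pushing the uncertainty in $$D$$ through the Stokes-Einstein relation, $$d = k_B T / (3\pi\eta D)$$. The temperature (298.15 K), the viscosity (0.89 mPa·s, roughly water at 25 °C), the example diffusion coefficient, and the function names below are assumptions chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_diameter(d_coeff, temperature=298.15, viscosity=0.89e-3):
    """Stokes-Einstein diameter in metres, for D in m^2/s.

    The default temperature (K) and viscosity (Pa*s) are assumed values.
    """
    return K_B * temperature / (3.0 * math.pi * viscosity * d_coeff)


def diameter_bounds(d_coeff, rel_uncertainty, **kwargs):
    """Diameters obtained when D is shifted by +/- rel_uncertainty."""
    low = hydrodynamic_diameter(d_coeff * (1.0 + rel_uncertainty), **kwargs)
    high = hydrodynamic_diameter(d_coeff * (1.0 - rel_uncertainty), **kwargs)
    return low, high


if __name__ == "__main__":
    d_example = 4.9e-12  # m^2/s, assumed value giving a diameter near 100 nm
    nominal = hydrodynamic_diameter(d_example)
    low, high = diameter_bounds(d_example, 0.25)
    print(f"{nominal * 1e9:.0f} nm, range {low * 1e9:.0f} to {high * 1e9:.0f} nm")
```

Because the diameter is inversely proportional to $$D$$, a symmetric $$\pm25\%$$ on $$D$$ becomes an asymmetric interval on the diameter, from 0.8 to about 1.33 times the nominal value.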
