This post describes the solution to the
Quiz #1, based on the files found
there. The point is to produce both the average and the standard deviation of a series of spectra. Below is how the final averaged spectra shoud look like:
I will present here two different solutions.
Solution 1: using the definition of standard deviation
There is a simple solution using the definition of the standard deviation:
$$\sigma_y = \sqrt{<y^2> - {<y>}^2}$$
in which \(<y^2>\) is the average of \(y^2\) (and so on). So the simplest solution is to construct datasets with an additional column that would contain \(y^2\), average these columns, and replace the average with the above formula. For that, we need first a companion script that loads a single data file and adds a column with \(y^2\). Let's call this script
load-one.cmds
:
load ${1}
apply-formula y2=y**2 /extra-columns=1
flag /flags=processed
When this script is run with the name of a spectrum file as argument, it loads it (replaces
${1}
by the first argument, the file name), adds a column
y2
containing the square of the
y
column, and flag it with the
processed
flag. This is not absolutely necessary, but it makes it much easier to refer to all the spectra when they are processed. Then to process all the spectra, one just has to run the following commands:
run-for-each load-one.cmds Spectrum-1.dat Spectrum-2.dat Spectrum-3.dat
average flagged:processed
apply-formula y2=(y2-y**2)**0.5
dataset-options /yerrors=y2
The
run-for-each
command runs the
load-one.cmds
script for all the spectra (one could also have used
Spectra-*.dat
to not have to give all the file names). Then, the
average
averages the values of the columns over all the datasets. To be clear, it finds all the values that have the same X (or very close X values) and average them, column by column. The result of this command is therefore a dataset with the average of the original \(y\) data as
y
column and the average of the original \(y^2\) data as
y2
column. So now, the only thing left to do is to use the above equation, which is done by the
apply-formula
code. The last command,
dataset-options
, is not absolutely necessary but it signals to QSoas that the standard error of the
y
column should be found in the
y2
column. This is now available as script
method-one.cmds
in the
git repository.
Solution 2: use QSoas's knowledge of standard deviation
The other method is a little more involved but it demonstrates a good approach to problem solving with QSoas. The starting point is that, in
apply-formula
, the value
$stats.y_stddev
corresponds to the standard deviation of the whole
y
column... Loading the spectra yields just a series of
x,y
datasets. We can
contract
them into a single dataset with one
x
column and several
y
columns:
load Spectrum-*.dat /flags=spectra
contract flagged:spectra
After these commands, the current dataset contains data in the form of:
lambda1 a1_1 a1_2 a1_3
lambda2 a2_1 a2_2 a2_3
...
in which the
ai_1
come from the first file,
ai_2
the second and so on. We need to use
transpose
to transform that dataset into:
0 a1_1 a2_1 ...
1 a1_2 a2_2 ...
2 a1_3 a2_3 ...
In this dataset, values of the absorbance for the same wavelength for each dataset is now stored in columns. The next step is just to use
expand
to obtain a series of datasets with the same x column and a single y column (each corresponding to a different wavelength in the original data). The game is now to replace these datasets with something that looks like:
0 a_average
1 a_stddev
For that, one takes advantage of the
$stats.y_average
and
$stats.y_stddev
values in
apply-formula
, together with the
i
special variable that represents the index of the point:
apply-formula "if i == 0; then y=$stats.y_average; end; if i == 1; then y=$stats.y_stddev; end"
strip-if i>1
Then, all that is left is to apply this to all the datasets created by
expand
, which can be just made using
run-for-datasets
, and then, we reverse the splitting by using
contract
and
transpose
! In summary, this looks like this. We need two files. The first,
process-one.cmds
contains the following code:
apply-formula "if i == 0; then y=$stats.y_average; end; if i == 1; then y=$stats.y_stddev; end"
strip-if i>1
flag /flags=processed
The main file,
method-two.cmds
looks like this:
load Spectrum-*.dat /flags=spectra
contract flagged:spectra
transpose
expand /flags=tmp
run-for-datasets process-one.cmds flagged:tmp
contract flagged:processed
transpose
dataset-options /yerrors=y2
Note some of the code above can be greatly simplified using new features present in the upcoming 3.0 version, but that is the topic for another post.
About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the
GNU General Public License. It is described in
Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is
2.2. You can download its source code and compile it yourself or buy precompiled versions for MacOS and Windows
there.
No comments:
Post a Comment