This post describes two similar solutions to the Quiz #2, using the data files found there. Both solutions rely on split-on-values. The first solution is the one that came naturally to me, and is by far the most general and extensible; the second one is shorter and does not require external script files.
Solution #1
The key to both solutions is to split the original data into a series of datasets that each contain only the data at a fixed value of x (which here corresponds to a fixed pH), and then to process these datasets one by one to extract the average and the standard deviation. The first step is done this way:
QSoas> load kcat-vs-ph.dat
QSoas> split-on-values pH x /flags=data
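For readers who want to see what this splitting step amounts to outside QSoas, here is a minimal Python sketch. It assumes the data is already available as (pH, kcat) pairs; the function name split_on_values and the toy values are hypothetical illustrations, not QSoas code and not the quiz data. Each distinct pH becomes one single-column "dataset", keyed by its pH, which plays the role of the pH meta-data.

```python
from collections import defaultdict


def split_on_values(rows):
    """Group (pH, kcat) rows by their pH value.

    Returns a dict mapping each distinct pH to the list of kcat values
    measured at that pH, i.e. one single-column "dataset" per pH.
    """
    groups = defaultdict(list)
    for ph, kcat in rows:
        groups[ph].append(kcat)
    return dict(groups)


# Toy values for illustration only (not the quiz data).
rows = [(7.0, 10.0), (7.5, 12.0), (7.0, 11.0), (7.5, 13.0), (8.0, 9.0)]
subsets = split_on_values(rows)
print(subsets)  # {7.0: [10.0, 11.0], 7.5: [12.0, 13.0], 8.0: [9.0]}
```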
After these commands, the stack contains a series of datasets bearing the data flag, each containing a single column of data, as can be seen from the beginning of the output of the show-stack command (here invoked as k):
QSoas> k
Normal stack:
F C Rows Segs Name
#0 (*) 1 43 1 'kcat-vs-ph_subset_22.dat'
#1 (*) 1 44 1 'kcat-vs-ph_subset_21.dat'
#2 (*) 1 43 1 'kcat-vs-ph_subset_20.dat'
...
Each of these datasets has a meta-data named pH whose value is the original x value from kcat-vs-ph.dat. The idea is now to run a stats command on each of the resulting datasets, extracting the average value of x and its standard deviation, together with the value of the pH meta-data. The most natural and general way to do this is to use run-for-datasets, with the following script file (named process-one.cmds):
stats /meta=pH /output=true /stats=x_average,x_stddev
So the command looks like:
QSoas> run-for-datasets process-one.cmds flagged:data
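The loop that run-for-datasets performs can be pictured with the Python sketch below: the same "script" is applied to every subset, and each application yields one output line of (x_average, x_stddev, pH). This is only an illustration under stated assumptions: the function name stats_for_dataset and the toy subsets are hypothetical, the sketch uses the sample standard deviation (n - 1 denominator) without claiming that this matches QSoas, and it reports 0 for single-point subsets by choice.

```python
import statistics


def stats_for_dataset(values, ph):
    """Return (x_average, x_stddev, pH) for one single-column dataset.

    Uses the sample standard deviation (n - 1 denominator); whether QSoas
    uses n or n - 1 is an assumption not checked here.
    """
    avg = statistics.mean(values)
    # A single point has no spread; report 0.0 instead of raising an error.
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return (avg, sd, ph)


# Toy subsets, as produced by a pH-based split (not the quiz data).
subsets = {7.0: [10.0, 11.0], 7.5: [12.0, 13.0], 8.0: [9.0]}

# Counterpart of run-for-datasets: apply the same processing to each
# subset and collect one output line per dataset.
result = [stats_for_dataset(vals, ph) for ph, vals in subsets.items()]
for line in result:
    print(*line)
```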
This command produces an output file containing, for each flagged dataset, a line with x_average, x_stddev and pH. Then, it is just a matter of loading the output file and shuffling the columns into the right order to get the data in the requested form. Overall, it looks like this:
l kcat-vs-ph.dat
split-on-values pH x /flags=data
output result.dat /overwrite=true
run-for-datasets process-one.cmds flagged:data
l result.dat
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2
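The apply-formula line is a three-way column swap using tmp as a temporary. The Python sketch below mirrors it step by step on (x, y, y2) rows, so that (average, stddev, pH) becomes (pH, average, stddev); after that, dataset-options /yerrors=y2 marks the last column as the error bar on y. The function name reorder_columns and the loaded values are hypothetical, chosen only to show the reordering.

```python
def reorder_columns(rows):
    """Mimic apply-formula tmp=y2;y2=y;y=x;x=tmp on (x, y, y2) rows."""
    out = []
    for x, y, y2 in rows:
        tmp = y2  # keep pH
        y2 = y    # stddev moves to y2
        y = x     # average moves to y
        x = tmp   # pH becomes x
        out.append((x, y, y2))
    return out


# Toy rows as they would be loaded from the output file:
# (x_average, x_stddev, pH). Not the quiz results.
loaded = [(10.5, 0.7071, 7.0), (12.5, 0.7071, 7.5), (9.0, 0.0, 8.0)]
final = reorder_columns(loaded)

# x = pH, y = average kcat, y2 = standard deviation (used as y error).
for ph, avg, err in final:
    print(f"{ph}\t{avg}\t{err}")
```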
The one slight improvement over what is described above is the use of the output command to write the results to a dedicated file (here result.dat) instead of out.dat, with /overwrite=true ensuring the file is overwritten so that no data remains from previous runs.
Solution #2
The second solution is almost the same as the first one, with two improvements:
- the stats command can work on datasets other than the current one, by passing them to the /buffers= option, so that it is not necessary to use run-for-datasets;
- the output file can be replaced by the accumulator.
This yields the following, shorter, solution:
l kcat-vs-ph.dat
split-on-values pH x /flags=data
stats /meta=pH /accumulate=* /stats=x_average,x_stddev /buffers=flagged:data
pop
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2
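Conceptually, this version does everything in a single pass: the statistics of all the flagged datasets are appended to one accumulator, which pop then turns into a dataset whose columns are reordered. The Python sketch below models that flow with a plain list standing in for the accumulator. It is an illustration only: summarize and the toy rows are hypothetical, and the sample standard deviation (n - 1) is an assumption, not necessarily what QSoas computes.

```python
import statistics
from collections import defaultdict


def summarize(rows):
    """Group (pH, kcat) rows by pH and return (pH, average, stddev) rows."""
    groups = defaultdict(list)
    for ph, kcat in rows:
        groups[ph].append(kcat)

    # A plain list stands in for the QSoas accumulator.
    accumulator = []
    for ph, values in groups.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        accumulator.append((statistics.mean(values), sd, ph))

    # "pop" the accumulated rows, then reorder (average, stddev, pH)
    # into (pH, average, stddev), as apply-formula does.
    return [(ph, avg, sd) for avg, sd, ph in accumulator]


# Toy values for illustration only (not the quiz data).
rows = [(7.0, 10.0), (7.5, 12.0), (7.0, 11.0), (7.5, 13.0), (8.0, 9.0)]
for ph, avg, sd in summarize(rows):
    print(ph, avg, sd)
```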
About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. The current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.