YANUB: yet another (nearly) useless blog: 2021

Wednesday, July 7, 2021

Upcoming features of QSoas and github repository

For the past few years, most of the development has happened behind the scenes in a private repository, and the code has appeared in the public repository only a couple of months before each release, in the release branch. I have now decided to publish the current code of QSoas in the github repository (in the public branch). This way, you can follow and use all the good things that were developed since the last release, and also verify whether any bug you have is still present in the currently developed version !

Upcoming features

This is the occasion to write a bit about some of the features that have been added since the publication of the 3.0 release. Not all of them are polished or documented yet, but here are a few teasers. The current version in github has:

a comprehensive handling of column/row names, which makes it much easier to work with files with named columns (like the output files QSoas produces !);
better handling of lists of meta-data, when there is one value of the meta for each segment or each Y column;
handling of complex numbers in apply-formula;
defining fits using external python code;
a command for linear least squares (which has the huge advantage of being exact and not needing any initial parameters);
commands to pause in a script or sort datasets in the stack;
improvements over previous commands, in particular with eval;
... and more...

Check out the github repository if you want to know more about the new features !

As of now, no official date is planned for the 3.1 release, but it could happen in the fall.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Sunday, June 13, 2021

Solution for QSoas quiz #2: averaging several Y values for the same X value

This post describes two similar solutions to the Quiz #2, using the data files found there. The two solutions described here rely on split-on-values. The first solution is the one that came naturally to me, and is by far the most general and extensible, but the second one is shorter, and doesn't require external script files.

Solution #1

The key to both solutions is to separate the original data into a series of datasets that only contain data at a fixed value of x (which corresponds here to a fixed pH), and then process each dataset one by one to extract the average and standard deviation. This first step is done thus:

QSoas> load kcat-vs-ph.dat
QSoas> split-on-values pH x /flags=data

After these commands, the stack contains a series of datasets bearing the data flag, each containing a single column of data, as can be seen from the beginning of the output of a show-stack command:

QSoas> k
Normal stack:
	 F  C	Rows	Segs	Name	
#0	(*) 1	43	1	'kcat-vs-ph_subset_22.dat'
#1	(*) 1	44	1	'kcat-vs-ph_subset_21.dat'
#2	(*) 1	43	1	'kcat-vs-ph_subset_20.dat'
...

Each of these datasets has a meta-data named pH whose value is the original x value from kcat-vs-ph.dat. Now, the idea is to run a stats command on the resulting datasets, extracting the average value of x and its standard deviation, together with the value of the meta pH. The most natural and general way to do this is to use run-for-datasets, using the following script file (named process-one.cmds):

stats /meta=pH /output=true /stats=x_average,x_stddev

So the command looks like:

QSoas> run-for-datasets process-one.cmds flagged:data

This command produces an output file containing, for each flagged dataset, a line containing x_average, x_stddev, and pH. Then, it is just a matter of loading the output file and shuffling the columns in the right order to get the data in the form asked. Overall, this looks like this:

l kcat-vs-ph.dat
split-on-values pH x /flags=data
output result.dat /overwrite=true
run-for-datasets process-one.cmds flagged:data
l result.dat
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2

The slight improvement over what is described above is the use of the output command to write the output to a dedicated file (here result.dat), instead of out.dat and ensuring it is overwritten, so that no data remains from previous runs.
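For readers who want to see the underlying logic spelled out, here is a minimal sketch in plain Ruby, outside of QSoas, of what the split/stats combination computes: group the points by x, then take the average and standard deviation of each group. The data points are made up, the function names are only illustrative, and the n - 1 denominator is an assumption that may not match the convention used by the stats command.

```ruby
# Average and sample standard deviation of y for each distinct x.
# This is a sketch of the idea only, not of how QSoas implements it.
def mean(values)
  values.sum / values.size.to_f
end

def sample_stddev(values)
  m = mean(values)
  # n - 1 denominator: an assumption, QSoas may use another convention
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

def average_by_x(points)
  points.group_by { |x, _y| x }      # one group per distinct x (pH)
        .sort_by { |x, _rows| x }    # increasing x
        .map do |x, rows|
          ys = rows.map { |_x, y| y }
          [x, mean(ys), sample_stddev(ys)]
        end
end

# Made-up data: [pH, measured rate]
points = [[7.0, 1.0], [6.0, 4.0], [7.0, 3.0], [6.0, 6.0], [7.0, 2.0]]
average_by_x(points).each do |x, avg, sd|
  puts format("%.1f\t%.3f\t%.3f", x, avg, sd)
end
```

Each printed row holds the pH, the average and the standard deviation, which is the column order obtained in QSoas after the apply-formula shuffle.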

Solution #2

The second solution is almost the same as the first one, with two improvements:

the stats command can work with datasets other than the current one, by supplying them to the /buffers= option, so that it is not necessary to use run-for-datasets;
the use of the output file can be replaced by the use of the accumulator.

This yields the following, smaller, solution:

l kcat-vs-ph.dat
split-on-values pH x /flags=data
stats /meta=pH /accumulate=* /stats=x_average,x_stddev /buffers=flagged:data
pop
apply-formula tmp=y2;y2=y;y=x;x=tmp
dataset-options /yerrors=y2

Sunday, May 30, 2021

QSoas quiz #2: averaging several Y values for the same X value

This second quiz may sound like the first one, but in fact, the approach used is completely different. The point is to gather some elementary statistics from a series of experiments performed under different conditions, but with several repeats at the same conditions.

Quiz

You are given a file (which you can download there) that contains a series of pH value data: the X column is the pH, the Y column the result of the experiment at the given pH (let's say the measure of the catalytic rate of an enzyme). Your task is to take this data and produce a single dataset which contains, for each pH value, the pH, the average of the results at that pH and the standard deviation. The result should be identical to the following file, and should look like that:

There are several ways to do this, but all of them must rely on stats, and the most natural way in QSoas is to take advantage of split-on-values, which is a very powerful command but somewhat hard to master, which is the point of this Quiz.
By the way, the data file is purely synthetic; if you look in the GitHub repository, you'll see how it was generated.

Sunday, May 16, 2021

Tutorial: analyze redox inactivations/reactivations

Redox-dependent inactivations are actually rather common in the field of metalloenzymes, and electrochemistry can be an extremely powerful tool to study them, provided one can analyze the data quantitatively. The point of this post is to teach the reader how to do so using QSoas. For more general information about redox inactivations and how to study them using electrochemical techniques, the reader is invited to read the review del Barrio and Fourmond, ChemElectroChem 2019.

This post is a tutorial on the analysis of data coming from the study of the redox-dependent substrate inhibition of periplasmic nitrate reductase NapAB, which has the advantage of being relatively simple. The whole process is discussed in Jacques et al, BBA, 2014. What you need to know in order to follow this tutorial is the following:

the whole inactivation/reactivation process can be modelled by a simple reversible reaction: $$ \mathrm{A} \rightleftharpoons \mathrm{I} $$ A is the active form, I the inactive form;
the forward rate constant is $k_i(E)$ (dependent on potential) and the backward rate constant is $k_a(E)$, also dependent on potential;
the experiment is done in a series of 5 steps at 3 different potentials: $E_0$ then $E_1$ then $E_2$ then $E_1$ then, finally, $E_0$;
the enzyme is assumed to be fully active at the beginning of the first step;
a single experiment is used to obtain the values of $k_i$ and $k_a$ for the three potentials (although not reliably for the value at $E_0$);
the current given by the active species depends on potential (and it is negative because the enzyme catalyzes a reduction), and the inactive species gives no current;
in addition to the reversible reaction above, there is an irreversible, potential-dependent loss.
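To make the model above more concrete, here is a minimal sketch in plain Ruby of how the concentrations of A and I evolve during one step at fixed potential. It is not the code QSoas uses. It assumes that the irreversible loss acts equally on both forms with a rate constant called k_loss here, and all the numerical values (rate constants, step durations) are hypothetical. Under these assumptions, the fraction of active enzyme relaxes exponentially towards $k_a/(k_i + k_a)$ with the rate constant $k_i + k_a$, while the total amount decays as $\exp(-k_\mathrm{loss} t)$.

```ruby
# Concentrations of A (active) and I (inactive) after a time t at fixed
# potential, starting from a0 and i0. Assumes k_i + k_a > 0 and a loss
# that affects both forms equally (an assumption of this sketch).
def step(a0, i0, k_i, k_a, k_loss, t)
  total0 = a0 + i0
  return [0.0, 0.0] if total0.zero?

  f0 = a0.to_f / total0                # initial active fraction
  f_inf = k_a.to_f / (k_i + k_a)       # steady-state active fraction
  f = f_inf + (f0 - f_inf) * Math.exp(-(k_i + k_a) * t)
  total = total0 * Math.exp(-k_loss * t)
  [f * total, (1 - f) * total]
end

# Hypothetical rate constants (in s^-1) for the potentials E0, E1 and E2
rates = {
  0 => { k_i: 0.001, k_a: 0.1, k_loss: 0.0005 },
  1 => { k_i: 0.02, k_a: 0.03, k_loss: 0.001 },
  2 => { k_i: 0.08, k_a: 0.005, k_loss: 0.002 }
}

# Five steps E0, E1, E2, E1, E0 with hypothetical durations (in s);
# the enzyme is fully active at the beginning of the first step.
a = 1.0
i = 0.0
[[0, 60.0], [1, 60.0], [2, 90.0], [1, 60.0], [0, 60.0]].each do |cond, duration|
  r = rates[cond]
  a, i = step(a, i, r[:k_i], r[:k_a], r[:k_loss], duration)
  puts format("end of step at E%d: A = %.4f, I = %.4f", cond, a, i)
end
```

The current at any time is then the concentration of A multiplied by the (negative) current of the fully active enzyme at that potential, since the inactive form gives no current.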

You can download the data files from the GitHub repository. Before fitting the data to determine the values of the rate constants at the potentials of the experiment, we will first subtract the background current, assuming that the respective contributions of faradaic and non-faradaic currents are additive. Start QSoas, go to the directory where you saved the files, and load both the data file and the blank file thus:

QSoas> cd
QSoas> load 27.oxw
QSoas> load 27-blanc.oxw
QSoas> S 1 0

(after the first command, you have to manually select the directory in which you downloaded the data files). The S 1 0 command just subtracts the dataset 1 (the first loaded) from the dataset 0 (the last loaded), see more there. blanc is the French for blank...

Then, we remove a bit of the beginning and the end of the data, corresponding to one half of the steps at $E_0$, which we don't exploit much here (they are essentially only used to make sure that the irreversible loss is taken care of properly). This is done using strip-if:

QSoas> strip-if x<30||x>300
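The condition given to strip-if is a Ruby expression evaluated for each point, and the points for which it is true are removed. As a rough illustration in plain Ruby (outside of QSoas, with made-up data and a hypothetical helper named strip_if), this amounts to:

```ruby
# Remove the rows for which the block returns true, keep the others.
def strip_if(rows)
  rows.reject { |x, y| yield(x, y) }
end

# Made-up [time, current] rows
rows = [[10.0, -1.2], [30.0, -1.1], [150.0, -0.8], [300.0, -0.9], [310.0, -1.0]]
kept = strip_if(rows) { |x, _y| x < 30 || x > 300 }
p kept.map(&:first)   # the times that survive the filter
```

Note that the boundaries themselves (x equal to 30 or 300) are kept, since the comparisons are strict.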

Then, we can fit ! The fit used is called fit-linear-kinetic-system, which is used to fit kinetic models with only linear reactions (like here) and steps which change the values of the rate constants but do not instantly change the concentrations. The specific command to fit the data is:

QSoas> fit-linear-kinetic-system /species=2 /steps=0,1,2,1,0

The /species=2 indicates that there are two species (A and I). The /steps=0,1,2,1,0 indicates that there are 5 steps, with three different conditions (0 to 2) in order 0,1,2,1,0. This fit needs a bit of setup before getting started. The species are numbered 1 and 2, and the conditions (potentials) are indicated by #0, #1 and #2 suffixes.

The I_1 and I_2 are the currents for the species 1 and 2, so something for 1 (active form) and 0 for 2 (inactive form). Moreover, the parameters I_2_#0 (and _#1, _#2) should be fixed and not free (since we don't need to adjust a current for the inactive form).
The k_11 and k_22 correspond to species-specific irreversible loss. It is generally best to leave them fixed to 0.
k_12 is the formation of 2 (I) from 1 (A), and k_21 is the formation of A from I. Their values will be determined for the three conditions. The default values should work here.
The k_loss parameters are the rates of irreversible loss that apply indiscriminately to all species (unlike k_11 and k_22). They are adjusted and their default values should work too.
alpha_1_0 and alpha_2_0 are the initial concentrations of species 1 and 2, so they should be fixed to 1 and 0.
Last, xstart_a (and _b, _c, _d and _e) correspond to the starting times for the steps, here 0, 60, 120, 210 and 270.

For the sake of simplicity, you can also simply load the starting-parameters.params parameter file to have everything set up the correct way. Then, just hit Fit, and enjoy this moment when QSoas works and you don't have to... The screen should now look like this:

Now, it's done ! The fit is actually pretty good, and you can read the values of the inactivation and reactivation rate constants from the fit parameters.

You can also practice on the 21.oxw and 21-blanc.oxw files. Usually, re-loading the best fit parameters from other potentials as starting parameters works really well. Gathering the results of several fits into a real curve of rate constants as a function of potential is left as an exercise for the reader (or maybe a later post), although you may find these series of posts useful in this context !

Thursday, March 11, 2021

All tips and tricks about QSoas

I've decided to post regular summaries of all the articles written here about QSoas; this is the first post of this kind. All the articles related to QSoas can be found here also.

The articles written here can be separated into several categories.

Tutorials to analyze real data

These are posts about how to reproduce the data analysis of published articles, including links to the original data so you can fully reproduce our results.

how to determine the $K_m$ of enzymes working with gaseous substrates with electrochemistry, as in Domnik et al, Angewandte Chemie 2017.

These posts all have the label tutorial.

All about fits

QSoas has a particularly powerful interface for non-linear least squares minimisations (fits):

See here to learn how you can take advantage of the fit interface to explore easily how a parameter influences the shape of a function.
See here how you can produce smooth curves for a fit on jaggy data.

Meta-data

Meta data describe the conditions in which experiments were performed.

You can see here how to use them in data analysis, to plot for instance peak position as a function of meta-data;
You can see here how to permanently store meta-data for files.

Quiz and their solutions

Quizzes are small problems that take some skill to solve; they can teach you a lot about how to work with QSoas.

Quiz #1: computing the standard deviation of spectra, along with the solution. This quiz can teach you a lot about combining data from different datasets and manipulating data row-by-row.

Other tips and tricks

See here how to find the 0s of experimental data.
See here how one can save just the selected points in a baseline, and reuse them.
See here to learn how to generate many datasets in one go from a mathematical formula.
See here how to define a custom mathematical function using Ruby.
See here how one can take advantage of Ruby, the underlying programming language to sum columns, extend the values of columns with lots of missing values and rename datasets using a pattern.
See here how you can use QSoas without starting the graphical interface !

Release announcements

These generally contain a lot of general information about the possibilities in QSoas:

initial release;
version 2.0;
version 2.1;
version 2.2;
and, finally, version 3.0.

Tuesday, February 16, 2021

QSoas tips and tricks: permanently storing meta-data

It is one thing to acquire and process data, but the data themselves are most often useless without the context, that is, the conditions in which the experiments were made. This additional information can be called meta-data. In a previous post, we have already described how one can set meta-data to data that are already loaded, and how one can make use of them.

QSoas is already able to figure out some meta-data in the case of electrochemical data, most notably in the case of files acquired by GPES, ECLab or CHI potentiostats. However, only a small number of manufacturers are supported as of now [1], and there are a number of experimental details that the software is never going to be able to figure out for you, such as the pH, the sample, what you were doing...

The new version of QSoas provides a means to permanently store meta-data for experimental data files:

QSoas> record-meta pH 7 file.dat

This command uses record-meta to permanently store the information pH = 7 for the file file.dat. Any time QSoas loads the file again, either today or in one year, the meta-data will contain the value 7 for the field pH. Behind the scenes, QSoas creates a single small file, file.dat.qsm, in which the meta-data are stored (in the form of a JSON dictionary).
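The idea of a small companion file sitting next to the data file can be sketched in a few lines of plain Ruby. This is only an illustration of the principle: the exact layout of the .qsm file is not described in this post, so the sketch simply assumes a flat JSON dictionary mapping names to values, and the helper names record_meta and read_meta are hypothetical. The sample name is made up too.

```ruby
require "json"
require "tmpdir"

# Name of the companion file for a data file (same name plus .qsm)
def sidecar_path(data_file)
  "#{data_file}.qsm"
end

# Add or update one meta-data in the companion file.
# Assumption: the file holds a flat JSON dictionary.
def record_meta(data_file, name, value)
  path = sidecar_path(data_file)
  meta = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  meta[name] = value
  File.write(path, JSON.pretty_generate(meta))
  meta
end

# Read back all the stored meta-data (empty if nothing was recorded)
def read_meta(data_file)
  path = sidecar_path(data_file)
  File.exist?(path) ? JSON.parse(File.read(path)) : {}
end

# Work in a temporary directory so that no real file is touched
Dir.mktmpdir do |dir|
  file = File.join(dir, "file.dat")
  record_meta(file, "pH", 7)
  record_meta(file, "sample", "made-up sample")
  p read_meta(file)
end
```

As in QSoas, each call records a single meta-data, and the earlier ones are preserved.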

You can set the same meta-data to many files in one go, using wildcards (see load for more information). For instance, to set the pH=7 meta-data to all the .dat files in the current directory, you can use:

QSoas> record-meta pH 7 *.dat

You can only set one meta-data for each call to record-meta, but you can use it as many times as you like.

Finally, you can use the /for-which option of load or browse to select only the files which have the meta you need:

QSoas> browse /for-which=$meta.pH<=7

This command browses the files in the current directory, showing only the ones that have a pH meta-data which is 7 or below.

[1] I'm always ready to implement the parsing of other file formats that could be useful for you. If you need parsing of special files, please contact me, sending the given files and the meta-data you'd expect to find in those.

Wednesday, January 13, 2021

Taking advantage of Ruby in QSoas

First of all, let me wish you all a happy new year, with all my wishes of health and success. I sincerely hope this year will be simpler for most people than last year !

For the first post of the year, I wanted to show you how to take advantage of Ruby, the programming language embedded in QSoas, to make various things, like:

creating a column with the sum of Y values;
extending values that are present only in a few lines;
renaming datasets using a pattern.

Summing the values in a column

When using commands that take formulas (Ruby code), like apply-formula, the code is run for every single point, for which all the values are updated. In particular, the state of the previous point is not known. However, it is possible to store values in what are called global variables, whose names start with a $ sign. Using this, we can keep track of the previous values. For instance, to create a new column with the sum of the y values, one can use the following approach:

QSoas> eval $sum=0
QSoas> apply-formula /extra-columns=1 $sum+=y;y2=$sum

The first line initializes the variable to 0, before we start summing, and the code in the second line is run for each dataset row, in order. For the first row, for instance, $sum is initially 0 (from the eval line); after the execution of the code, it is now the first value of y. After the second row, the second value of y is added, and so on. The image below shows the resulting y2 when used on:

QSoas> generate-dataset -1 1 x
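The same mechanism can be tried in plain Ruby, outside of QSoas. The sketch below uses a made-up list of y values and a hypothetical running_sum helper: the global variable is reset once, then updated for each value in order, exactly as the formula does for each row.

```ruby
# Running sum of a list of values, kept in a global variable
def running_sum(ys)
  $sum = 0.0                 # plays the role of "eval $sum=0"
  ys.map do |y|
    $sum += y                # same as in the apply-formula code
    $sum                     # the value stored in the new column
  end
end

p running_sum([1.0, 2.0, 3.0, 4.0])   # => [1.0, 3.0, 6.0, 10.0]
```

Forgetting the reset would make the sum start from whatever was left in $sum by a previous run, which is why the eval line comes first.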

Extending values in a column

Another use of the global variables is to add "missing" data. For instance, let's imagine that a file gives the variation of current over time as the potential is changed, but the potential is only changed stepwise and only indicated when it changes:

## time	current	potential
0	0.1	0.5
1	0.2
2	0.3
3	0.2
4	1.2	0.6
5	1.3
...

If you need to have the values everywhere, for instance if you need to split on their values, you could also use a global variable, taking advantage of the fact that missing values are represented by QSoas using "Not A Number" values, which can be detected using the Ruby function nan?:

QSoas> apply-formula "if y2.nan?; then y2=$value; else $value=y2;end"

Note the need for quotes because there are spaces in the Ruby code. If the value of y2 is NaN, that is, if it is missing, then it is taken from the global variable $value; otherwise $value is set to the current value of y2. Hence, the values are propagated down:

## time	current	potential
0	0.1	0.5
1	0.2	0.5
2	0.3	0.5
3	0.2	0.5
4	1.2	0.6
5	1.3	0.6
...

Of course, this doesn't work if the first value of y2 is missing.
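Here is the same propagation written as a small standalone Ruby function, as a sketch only. The name fill_down is hypothetical, and a local variable plays the role that $value plays in the formula. It also shows the limitation just mentioned: a missing first value stays missing.

```ruby
# Replace each NaN by the last non-NaN value seen before it
def fill_down(values)
  last = Float::NAN          # nothing seen yet
  values.map do |v|
    if v.nan?
      last                   # missing: reuse the last known value
    else
      last = v               # present: remember it and keep it
    end
  end
end

nan = Float::NAN
p fill_down([0.5, nan, nan, nan, 0.6, nan])
p fill_down([nan, 0.5])      # the first value cannot be filled
```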

Renaming using a pattern

The command save-datasets can be used to save a whole series of datasets to the disk. It can also rename them on the fly, and, using the /mode=rename option, does only the renaming part, without saving. You can make full use of meta-data (see also a first post here) for renaming. The full power is unlocked using the /expression= option. For instance, for renaming the last 5 datasets (so numbers 0 to 4) using a scheme based on the value of their pH meta-data, you can use the following code:

QSoas> save-datasets /mode=rename /expression='"dataset-#{$meta.pH}"' 0..4

The double quotes are cumbersome but necessary, since the outer quotes (') prevent the inner ones (") from being removed and the inner quotes are here to indicate to Ruby that we are dealing with text. The bit inside #{...} is interpreted by Ruby as Ruby code; here it is $meta.pH, the value of the "pH" meta-data. Finally the 0..4 specifies the datasets to work with. So these datasets will change name to become dataset-7 for pH 7, etc...
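The #{...} construct is ordinary Ruby string interpolation, and it can be tried outside of QSoas. In the sketch below, $meta is a stand-in built with a Struct, since the real $meta object only exists inside QSoas. The way a number is formatted in the resulting name depends on its type, so the stand-in uses an integer.

```ruby
# Stand-in for the $meta object provided by QSoas (an assumption of
# this sketch: here it is just a Struct with a pH field)
meta_class = Struct.new(:pH)
$meta = meta_class.new(7)

# Double quotes: the code inside #{...} is evaluated and inserted
name = "dataset-#{$meta.pH}"
puts name

# Single quotes: no interpolation, the text is kept as it is
literal = 'dataset-#{$meta.pH}'
puts literal
```

This is why the inner quotes in the save-datasets command have to be double quotes.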