Thursday, March 11, 2021

All tips and tricks about QSoas

I've decided to post regular summaries of all the articles written here about QSoas; this is the first post of this kind. All the articles related to QSoas can be found here also.

The articles written here can be separated into several categories.

Tutorials to analyze real data

These are posts about how to reproduce the data analysis of published articles, including links to the original data so you can fully reproduce our results. These posts all have the label tutorial.

All about fits

QSoas has a particularly powerful interface for non-linear least square minimisations (fits):
  • See here to learn how you can take advantage of the fit interface to explore easily how a parameter influences the shape of a function.
  • See here how you can produce smooth curves for a fit on jaggy data.

Meta-data

Meta data describe the conditions in which experiments were performed.
  • You can see here how to use them in data analysis, to plot for instance peak position as a function of meta-data;
  • You can see here how to permanently store meta-data for files.

Quiz and their solutions

Quiz are small problems that take some skill to solve; they can teach you a lot about how to work with QSoas.
  • Quiz #1: computing the standard deviation of spectra, along with the solution. This quiz can teach you a lot about combining data from different datasets and manipulating data row-by-row.

Other tips and tricks

  • See here how to find the 0s of experimental data.
  • See here how one can save just the selected points in a baseline, and reuse them.
  • See here to learn how to generate many datasets in one go from a mathematical formula.
  • See here how to define a custom mathematical function using Ruby.
  • See here how one can take advantage of Ruby, the underlying programming language to sum columns, extend the values of columns with lots of missing values and rename datasets using a pattern.
  • See here how you can use QSoas without starting the graphical interface !

Release annoucements

These have generally lot of general information about the possibilities in QSoas:

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Tuesday, February 16, 2021

QSoas tips and tricks: permanently storing meta-data

It is one thing to acquire and process data, but the data themselves are most often useless without the context, the conditions in which the experiments were made. These additional informations can be called meta-data. In a previous post, we have already described how one can set meta-data to data that are already loaded, and how one can make use of them.

QSoas is already able to figure out some meta-data in the case of electrochemical data, most notably in the case of files acquired by GPES, ECLab or CHI potentiostats. However, only a small number of constructors are supported as of now[1], and there are a number of experimental details that the software is never going to be able to figure out for you, such as the pH, the sample, what you were doing...

The new version of QSoas provides a means to permanently store meta-data for experimental data files:

QSoas> record-meta pH 7 file.dat
This command uses record-meta to permanently store the information pH = 7 for the file file.dat. Any time QSoas loads the file again, either today or in one year, the meta-data will contain the value 7 for the field pH. Behind the scenes, QSoas creates a single small file, file.dat.qsm, in which the meta-data are stored (in the form of a JSON dictionnary).

You can set the same meta-data to many files in one go, using wildcards (see load for more information). For instance, to set the pH=7 meta-data to all the .dat files in the current directory, you can use:

QSoas> record-meta pH 7 *.dat
You can only set one meta-data for each call to record-meta, but you can use it as many times as you like.

Finally, you can use the /for-which option to load or browse to select only the files which have the meta you need:

QSoas> browse /for-which=$meta.pH<=7
This command browses the files in the current directory, showing only the ones that have a pH meta-data which is 7 or below.

[1] I'm always ready to implement the parsing of other file formats that could be useful for you. If you need parsing of special files, please contact me, sending the given files and the meta-data you'd expect to find in those.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Wednesday, January 13, 2021

Taking advantage of Ruby in QSoas

First of all, let me all wish you a happy new year, with all my wishes of health and succes. I sincerely hope this year will be simpler for most people as last year !

For the first post of the year, I wanted to show you how to take advantage of Ruby, the programming language embedded in QSoas, to make various things, like:

  • creating a column with the sum of Y values;
  • extending values that are present only in a few lines;
  • renaming datasets using a pattern.

Summing the values in a column

When using commands that take formulas (Ruby code), like apply-formula, the code is run for every single point, for which all the values are updated. In particulier, the state of the previous point is not known. However, it is possible to store values in what is called global variables, whose name start with an $ sign. Using this, we can keep track of the previous values. For instance, to create a new column with the sum of the y values, one can use the following approach:
QSoas> eval $sum=0
QSoas> apply-formula /extra-columns=1 $sum+=y;y2=$sum
The first line initializes the variable to 0, before we start summing, and the code in the second line is run for each dataset row, in order. For the first row, for instance, $sum is initially 0 (from the eval line); after the execution of the code, it is now the first value of y. After the second row, the second value of y is added, and so on. The image below shows the resulting y2 when used on:
QSoas> generate-dataset -1 1 x


Extending values in a column

Another use of the global variables is to add "missing" data. For instance, let's imagine that a files given the variation of current over time as the potential is changed, but the potential is only changed stepwise and only indicated when it changes:
## time	current	potential
0	0.1	0.5
1	0.2
2	0.3
3	0.2
4	1.2	0.6
5	1.3
...
If you need to have the values everywhere, for instance if you need to split on their values, you could also use a global variable, taking advantage of the fact that missing values are represented by QSoas using "Not A Number" values, which can be detected using the Ruby function nan?:
QSoas> apply-formula "if y2.nan?; then y2=$value; else $value=y2;end"
Note the need of quotes because there are spaces in the ruby code. If the value of y2 is NaN, that is it is missing, then it is taken from the global variable $value else $value is set the current value of y2. Hence, the values are propagated down:
## time	current	potential
0	0.1	0.5
1	0.2	0.5
2	0.3	0.5
3	0.2	0.5
4	1.2	0.6
5	1.3	0.6
...
Of course, this doesn't work if the first value of y2 is missing.

Renaming using a pattern

The command save-datasets can be used to save a whole series of datasets to the disk. It can also rename them on the fly, and, using the /mode=rename option, does only the renaming part, without saving. You can make full use of meta-data (see also a first post here)for renaming. The full power is unlocked using the /expression= option. For instance, for renaming the last 5 datasets (so numbers 0 to 4) using a scheme based on the value of their pH meta-data, you can use the following code:
QSoas> save-datasets /mode=rename /expression='"dataset-#{$meta.pH}"' 0..4
The double quotes are cumbersome but necessary, since the outer quotes (') prevent the inner ones (") to be removed and the inner quotes are here to indicate to Ruby that we are dealing with text. The bit inside #{...} is interpreted by Ruby as Ruby code; here it is $meta.pH, the value of the "pH" meta-data. Finally the 0..4 specifies the datasets to work with. So theses datasets will change name to become dataset-7 for pH 7, etc...

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Monday, December 14, 2020

Version 3.0 of QSoas is out

After almost two years of development, version 3.0 of QSoas is finally out ! It brings in a number of new features.

An expert mode for fitting

Undoubtedly the most important feature in the new version is a complete upgrade of the fit system, which now features an expert mode, turned on by using the /expert=true option with the fit commands. The expert mode features a command prompt that looks like the normal command prompt, in which it is possible:
  • to fix/unfix, set, reset and load ans save parameters;
  • export parameters or computed datasets;
  • launch the fit;
  • select fit engines and configure them;
  • list of all the fits that were run in the session with their starting and finishing parameters (see picture), and reuse them;
  • run scripts;
  • fit data completely non-interactively;
  • and, last but not least, perform parameter space exploration, which consists in running (automatically) a number of fits with different (random) starting parameters to maximize the chances of finding out the best parameters for the fit, avoiding local minima.
The latter feature is very important when running fits with many parameters. In that case, there are a number of local minima, and it is necessary to try a number of different starting parameters to really find the best parameters. The new parameter space exploration feature makes it much easier than before, with an interface that allows easily finding the best parameters tried so far and reuse them.

A new documentation system

Another very important update is the inclusion of a new, offline, documentation system. The documentation features browsing via table of contents, a command index and text search. It also features the possibility to copy commands from the help to the command prompt, or even run them directly. To top it all, it comes with a series of startup tips that might teach you a thing or two about QSoas (try hitting Show random to learn new tricks !).

Many other features

For the full list of changes, please see the changelog. Apart from the changes described above, these are my favorites:
  • running the command from the menu now gives a dialog box for selecting all the arguments and options, which is very useful to discover the possibilities of the various commands;
  • features that make the statistics and meta-data much easier to use with commands like apply-formula, like automatic code completion;
  • a new system for permanently storing meta-data using record-meta;
  • a new fit for fitting implicit equations, in which one cannot express \(y = f(x)\), but only \(f(x,y)=0\);
  • and, of course, a brand shiny new logo by Marta Meneghello !

To get the new version, you can just download the source code from the downloads page, where you can also purchase precompiled versions for Windows and MacOS. You can also clone the source from the GitHub repository.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 3.0. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Monday, November 23, 2020

QSoas tips and tricks: using meta-data, first level

By essence, QSoas works with \(y = f(x)\) datasets. However, in practice, when working with experimental data (or data generated from simulations), one has often more than one experimental parameter (\(x\)). For instance, one could record series of spectra (\(A = f(\lambda)\)) for different pH values, so that the absorbance is in fact a function of both the pH and \(\lambda\). QSoas has different ways to deal with such situations, and we'll describe one today, using meta-data.

Setting meta-data

Meta-data are simply series of name/values attached to a dataset. It can be numbers, dates or just text. Some of these are automatically detected from certain type of data files (but that is the topic for another day). The simplest way to set meta-data is to use the set-meta command:
QSoas> set-meta pH 7.5
This command sets the meta-data pH to the value 7.5. Keep in mind that QSoas does not know anything about the meaning of the meta-data[1]. It can keep track of the meta-data you give, and manipulate them, but it will not interpret them for you. You can set several meta-data by repeating calls to set-meta, and you can display the meta-data attached to a dataset using the command show. Here is an example:
QSoas> generate-buffer 0 10
QSoas> set-meta pH 7.5
QSoas> set-meta sample "My sample"
QSoas> show 0
Dataset generated.dat: 2 cols, 1000 rows, 1 segments, #0
Flags: 
Meta-data:	pH =	 7.5	sample =	 My sample
Note here the use of quotes around My sample since there is a space inside the value.

Using meta-data

There are many ways to use meta-data in QSoas. In this post, we will discuss just one: using meta-data in the output file. The output file can collect data from several commands, like peak data, statistics and so on. For instance, each time the command 1 is run, a line with the information about the largest peak of the current dataset is written to the output file. It is possible to automatically add meta-data to those lines by using the /meta= option of the output command. Just listing the names of the meta-data will add them to each line of the output file.

As a full example, we'll see how one can take advantage of meta-data to determine the position of the peak of the function \(x^2 \exp (-a\,x)\) depends on \(a\). For that, we first create a script that generates the function for a certain value of \(a\), sets the meta-data a to the corresponding value, and find the peak. Let's call this file do-one.cmds (all the script files can be found in the GitHub repository):

generate-buffer 0 20 x**2*exp(-x*${1})
set-meta a ${1}
1 
This script takes a single argument, the value of \(a\), generates the appropriate dataset, sets the meta-data a and writes the data about the largest (and only in this case) peak to the output file. Let's now run this script with 1 as an argument:
QSoas> @ do-one.cmds 1
This command generates a file out.dat containing the following data:
## buffer       what    x       y       index   width   left_width      right_width     area
generated.dat   max     2.002002002     0.541340590883  100     3.4034034034    1.24124124124   2.162162162161.99999908761
This gives various information about the peak found: the name of the dataset it was found in, whether it's a maximum or minimum, the x and y positions of the peak, the index in the file, the widths of the peak and its area. We are interested here mainly in the x position.

Then, we just run this script for several values of \(a\) using run-for-each, and in particular the option /range-type=lin that makes it interpret values like 0.5..5:80 as 80 values evenly spread between 0.5 and 5. The script is called run-all.cmds:

output peaks.dat /overwrite=true /meta=a
run-for-each do-one.cmds /range-type=lin 0.5..5:80
V all /style=red-to-blue
The first line sets up the output to the output file peaks.dat. The option /meta=a makes sure the meta a is added to each line of the output file, and /overwrite=true make sure the file is overwritten just before the first data is written to it, in order to avoid accumulating the results of different runs of the script. The last line just displays all the curves with a color gradient. It looks like this:
Running this script (with @ run-all.cmds) creates a new file peaks.dat, whose first line looks like this:
## buffer       what    x       y       index   width   left_width      right_width     area    a
The column x (the 3rd) contains the position of the peaks, and the column a (the 10th) contains the meta a (this column wasn't present in the output we described above, because we had not used yet the output /meta=a command). Therefore, to load the peak position as a function of a, one has just to run:
QSoas> load peaks.dat /columns=10,3
This looks like this:
Et voilĂ  !

To train further, you can:

  • improve the resolution in x;
  • improve the resolution in y;
  • plot the magnitude of the peak;
  • extend the range;
  • derive the analytical formula for the position of the peak and verify it !

[1] this is not exactly true. For instance, some commands like unwrap interpret the sr meta-data as a voltammetric scan rate if it is present. But this is the exception.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 2.2. You can download its source code there (or clone from the GitHub repository) and compile it yourself, or buy precompiled versions for MacOS and Windows there.

Wednesday, November 11, 2020

Solution for QSoas quiz #1: averaging spectra

This post describes the solution to the Quiz #1, based on the files found there. The point is to produce both the average and the standard deviation of a series of spectra. Below is how the final averaged spectra shoud look like:
I will present here two different solutions.

Solution 1: using the definition of standard deviation

There is a simple solution using the definition of the standard deviation: $$\sigma_y = \sqrt{<y^2> - {<y>}^2}$$ in which \(<y^2>\) is the average of \(y^2\) (and so on). So the simplest solution is to construct datasets with an additional column that would contain \(y^2\), average these columns, and replace the average with the above formula. For that, we need first a companion script that loads a single data file and adds a column with \(y^2\). Let's call this script load-one.cmds:
load ${1}
apply-formula y2=y**2 /extra-columns=1
flag /flags=processed
When this script is run with the name of a spectrum file as argument, it loads it (replaces ${1} by the first argument, the file name), adds a column y2 containing the square of the y column, and flag it with the processed flag. This is not absolutely necessary, but it makes it much easier to refer to all the spectra when they are processed. Then to process all the spectra, one just has to run the following commands:
run-for-each load-one.cmds Spectrum-1.dat Spectrum-2.dat Spectrum-3.dat
average flagged:processed
apply-formula y2=(y2-y**2)**0.5
dataset-options /yerrors=y2
The run-for-each command runs the load-one.cmds script for all the spectra (one could also have used Spectra-*.dat to not have to give all the file names). Then, the average averages the values of the columns over all the datasets. To be clear, it finds all the values that have the same X (or very close X values) and average them, column by column. The result of this command is therefore a dataset with the average of the original \(y\) data as y column and the average of the original \(y^2\) data as y2 column. So now, the only thing left to do is to use the above equation, which is done by the apply-formula code. The last command, dataset-options, is not absolutely necessary but it signals to QSoas that the standard error of the y column should be found in the y2 column. This is now available as script method-one.cmds in the git repository.

Solution 2: use QSoas's knowledge of standard deviation

The other method is a little more involved but it demonstrates a good approach to problem solving with QSoas. The starting point is that, in apply-formula, the value $stats.y_stddev corresponds to the standard deviation of the whole y column... Loading the spectra yields just a series of x,y datasets. We can contract them into a single dataset with one x column and several y columns:
load Spectrum-*.dat /flags=spectra
contract flagged:spectra
After these commands, the current dataset contains data in the form of:
lambda1	a1_1	a1_2	a1_3
lambda2	a2_1	a2_2	a2_3
...
in which the ai_1 come from the first file, ai_2 the second and so on. We need to use transpose to transform that dataset into:
0	a1_1	a2_1	...
1	a1_2	a2_2	...
2	a1_3	a2_3	...
In this dataset, values of the absorbance for the same wavelength for each dataset is now stored in columns. The next step is just to use expand to obtain a series of datasets with the same x column and a single y column (each corresponding to a different wavelength in the original data). The game is now to replace these datasets with something that looks like:
0	a_average
1	a_stddev
For that, one takes advantage of the $stats.y_average and $stats.y_stddev values in apply-formula, together with the i special variable that represents the index of the point:
apply-formula "if i == 0; then y=$stats.y_average; end; if i == 1; then y=$stats.y_stddev; end"
strip-if i>1
Then, all that is left is to apply this to all the datasets created by expand, which can be just made using run-for-datasets, and then, we reverse the splitting by using contract and transpose ! In summary, this looks like this. We need two files. The first, process-one.cmds contains the following code:
apply-formula "if i == 0; then y=$stats.y_average; end; if i == 1; then y=$stats.y_stddev; end"
strip-if i>1
flag /flags=processed
The main file, method-two.cmds looks like this:
load Spectrum-*.dat /flags=spectra
contract flagged:spectra
transpose
expand /flags=tmp
run-for-datasets process-one.cmds flagged:tmp
contract flagged:processed
transpose
dataset-options /yerrors=y2
Note some of the code above can be greatly simplified using new features present in the upcoming 3.0 version, but that is the topic for another post.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 2.2. You can download its source code and compile it yourself or buy precompiled versions for MacOS and Windows there.

Thursday, October 22, 2020

QSoas tips and tricks: generating smooth curves from a fit

Often, one would want to generate smooth data from a fit over a small number of data points. For an example, take the data in the following file. It contains (fake) experimental data points that obey to Michaelis-Menten kinetics: $$v = \frac{v_m}{1 + K_m/s}$$ in which \(v\) is the measured rate (the y values of the data), \(s\) the concentration of substrate (the x values of the data), \(v_m\) the maximal rate and \(K_m\) the Michaelis constant. To fit this equation to the data, just use the fit-arb fit:
QSoas> l michaelis.dat
QSoas> fit-arb vm/(1+km/x)
After running the fit, the window should look like this:
Now, with the fit, we have reasonable values for \(v_m\) (vm) and \(K_m\) (km). But, for publication, one would want to generate "smooth" curve going through the lines... Saving the curve from "Data.../Save all" doesn't help, since the data has as many points as the original data and looks very "jaggy" (like on the screenshot above)... So one needs a curve with more data points.

Maybe the most natural solution is simply to use generate-buffer together with apply-formula using the formula and the values of km and vm obtained from the fit, like:

QSoas> generate-buffer 0 20
QSoas> apply-formula y=3.51742/(1+3.69767/x)
By default, generate-buffer generate 1000 evenly spaced x values, but you can change their number using the /samples option. The two above commands can be combined to just one call to generate-buffer:
QSoas> generate-buffer 0 20 3.51742/(1+3.69767/x)
This works, but it is quite cumbersome and it is not going to work well for complex formulas or the results of differential equations or kinetic systems...

This is why to each fit- command corresponds a sim- command that computes the result of the fit using a "saved parameters" file (here, michaelis.params, but you can also save it yourself) and buffers as "models" for X values:

QSoas> generate-buffer 0 20
QSoas> sim-arb vm/(1+km/x) michaelis.params 0
This strategy works with every single fit ! As an added benefit, you even get the fit parameters as meta-data, which are displayed by the show command:
QSoas> show 0
Dataset generated_fit_arb.dat: 2 cols, 1000 rows, 1 segments, #0
Flags: 
Meta-data:	commands =	 sim-arb vm/(1+km/x) michaelis.params 0	fit =	 arb (formula: vm/(1+km/x))	km =	 3.69767
	vm =	 3.5174
They also get saved as comments if you save the data.

Important note: the sim-arb command will be available only in the 3.0 release, although you can already enjoy it if you use the github version.

About QSoas

QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050–5052. Current version is 2.2. You can download its source code and compile it yourself or buy precompiled versions for MacOS and Windows there.