Structured data fusion: advanced concepts¶
A few more advanced modeling concepts are explained in this section. These
concepts allow for more dynamic and/or more compact models. Three concepts are
discussed: indexed references, the transform field, which applies
transformations to multiple variables in a single line, and the implicit factors
option, which tries to create missing factors from variables automatically. The
details of these concepts can be found in the full language specification
(Structured data fusion: specification of domain specific language).
Using indexed notations¶
In the basic introduction to SDF (Structured data fusion) we always used named
variables, factors and factorizations, e.g., model.variables.a
, and named
references to variables and factors, e.g., model.factors.A = {'a'}
. While
this practice makes the model easy to read, it requires many keystrokes and
complicates the adaptation of models afterwards, e.g., when a third-order tensor
model is expanded into a fourth-order one.
Internally, all structs are converted to cells, which means that indices are used instead of names. This cell format is also available to the user. For example:
model = struct;
model.variables.a = a;
model.variables.b = b;
model.factors.A = 'a';
model.factors.B = 'b';
model.factorizations.mat.data = T;
model.factorizations.mat.cpd = {'A','B'};
can be rewritten using indexed references as
model = struct;
model.variables.a = a;
model.variables.b = b;
model.factors.A = 1; % refers to variable a
model.factors.B = 2; % refers to variable b
model.factorizations.mat.data = T;
model.factorizations.mat.cpd = 1:2; % refers to factors A (1) and B (2)
Both the names and the indices of variables and factors can thus be used to refer to them. The index of a variable or factor is determined by the order in which the different variables or factors are defined for the first time:
model = struct;
model.variables.a = a; % has index 1
model.variables.c = c; % has index 2
model.variables.a = a2; % still has index 1 (defined a first time before)
model.variables.b = b; % has index 3
The names of the variables, factors and factorizations can even be omitted. In that case, indexed variables, factors or factorizations are used instead:
model = struct;
model.variables{1} = a;
model.variables{2} = b;
model.factors{1} = 1;
model.factors{2} = 2;
model.factorizations{1}.data = T;
model.factorizations{1}.cpd = 1:2;
Named and indexed referencing can even be mixed, e.g.,
model = struct;
model.variables{1} = a;
model.variables{2} = b;
model.factors.A = 1;
model.factors.B = 2;
model.factorizations{1}.data = T; % a third-order tensor
model.factorizations{1}.cpd = {1, 'A', 'B'}; % == {1,1,2} or {'A','A','B'}
Named references can only be used if the variable or factor being referred to
has been named. Within the same field (variables
, factors
or
factorizations
) all definitions are either named or indexed. The following
block throws a MATLAB error:
model = struct;
model.variables.a = a;
model.variables{2} = b;
while the following block throws an error or gives a warning depending on the MATLAB version:
model = struct;
model.variables{1} = a;
model.variables.b = b;
If no error is thrown, the model above is the same as
model = struct;
model.variables.b = b;
hence the definition of variable 1 is ignored.
The fact that all fields can be cells can be used to shorten the code. Consider a regularized CPD of a third-order tensor \(\ten{T}\) of rank 5. In the basic SDF syntax, we have
Uinit = cpd_rnd(size(T), 5);
model = struct;
model.variables.a = Uinit{1};
model.variables.b = Uinit{2};
model.variables.c = Uinit{3};
model.factors.A = 'a';
model.factors.B = 'b';
model.factors.C = 'c';
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A', 'B', 'C'};
model.factorizations.regul.regL2 = {'A', 'B', 'C'};
By exploiting the fact that all fields can be cells, the model above can be rewritten as:
model = struct;
model.variables = cpd_rnd(getsize(T), 5);
model.factors = 1:getorder(T);
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:getorder(T);
model.factorizations.regul.regL2 = 1:getorder(T);
Note that we used getorder(T)
instead of 3
. If \(\ten{T}\) is a
fourth-order tensor, the code block above still works without the need to alter
the code. Instead of getorder(T)
the MATLAB command ndims(T)
can also be
used. The advantage of the former is that it works on full, sparse, incomplete
and structured tensors, while the latter only works for full tensors.
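As a minimal sketch of the difference, for a full tensor in MATLAB's native array format both commands agree:

```matlab
% For a full (dense) tensor, getorder and ndims give the same result;
% getorder, however, also accepts sparse, incomplete and structured tensors,
% for which ndims is not meaningful.
T = randn(4, 5, 6);
getorder(T)   % 3
ndims(T)      % 3
```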
For the factorizations
field, there exists an additional shortcut. Apart
from a struct or cell, the factorizations
field can also be a struct array.
As an example, consider a coupled CPD of two third-order tensors \(\ten{S}\)
and \(\ten{T}\) which are coupled in the first mode. The usual way of
constructing this model is (only relevant fields are shown):
model.factorizations{1}.data = S;
model.factorizations{1}.cpd = 1:3;
model.factorizations{2}.data = T;
model.factorizations{2}.cpd = [1 4 5];
Using a struct array, the following equivalent specification can be used:
datasets = {S, T};
coupling = {[1 2 3], [1 4 5]};
model.factorizations = struct('data', datasets, 'cpd', coupling);
This syntax can be useful for coupled factorizations that involve many datasets.
The transform field¶
In the case of structured factorizations, it often occurs that some factors have the same or a highly similar structure, i.e., the same transformation is used for multiple factors. This transformation can have the same or different parameters for each factor. Consider the following nonnegative CPD of a third-order rank-5 tensor \(\ten{T}\):
model = struct;
model.variables.a = rand(size(T, 1), 5);
model.variables.b = rand(size(T, 2), 5);
model.variables.c = rand(size(T, 3), 5);
model.factors.A = {'a', @struct_nonneg};
model.factors.B = {'b', @struct_nonneg};
model.factors.C = {'c', @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A', 'B', 'C'};
This model can be written more compactly using a transform
field:
model = struct;
model.variables.a = rand(size(T, 1), 5);
model.variables.b = rand(size(T, 2), 5);
model.variables.c = rand(size(T, 3), 5);
model.transform.nonneg = {{'a', 'b', 'c'}, {'A', 'B', 'C'}, @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A', 'B', 'C'};
The nonneg
transform transforms the variables {'a', 'b', 'c'}
into the
factors {'A', 'B', 'C'}
using the transformation @struct_nonneg
. As can
be seen using sdf_check(model, 'print')
, three factors A
, B
and
C
are created. The factor names are often not needed as such: if they are
omitted, the variable names are also used as factor names. Note that if {'A',
'B', 'C'}
is omitted, the newly created factors have names a
, b
and
c
, and lowercase names should also be used in the last line of the previous
model. This can again be checked using sdf_check(model, 'print')
. Using
indexing instead of explicit naming, the model can be simplified further and
made more dynamic:
model = struct;
model.variables = cpd_rnd(getsize(T), 5, 'Real', @rand);
model.transform{1} = {1:getorder(T), @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:getorder(T);
Similar to variables, factors and factorizations, multiple transforms can be
defined by using the struct (model.transform.t1 = ...
) or the cell
(model.transform{1} = ..
) notation. Also, the factors
and the
transform
field can be used together in a model, as shown in the following
example in which the second and third factor matrix have polynomials as
columns, while the first factor matrix is unconstrained. Without transform
the code is:
model = struct;
model.variables.A = randn(size(T,1), 5); % R = 5
model.variables.coefB = randn(5, d); % d-1 is the degree of the polynomial
model.variables.coefC = randn(5, d); % d-1 is the degree of the polynomial
model.factors{1} = 1; % unconstrained factor A
model.factors{2} = {2, @(z,task) struct_poly(z,task,tb)}; % poly in points tb
model.factors{3} = {3, @(z,task) struct_poly(z,task,tc)}; % poly in points tc
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;
Using transform
, the code becomes:
model = struct;
model.variables.A = randn(size(T,1), 5); % R = 5
model.variables.coefB = randn(5, d); % d-1 is the degree of the polynomial
model.variables.coefC = randn(5, d); % d-1 is the degree of the polynomial
model.factors{1} = 1; % unconstrained factor A
model.transform{1} = {2:3, @struct_poly, {tb, tc}}; % tb, tc are
% evaluation points
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;
Note that there is no need to define an anonymous function @(z,task)
struct_poly(...)
. The second factor still has evaluation points tb
as
parameters, while the third factor has evaluation points tc
. This way many
parameters can easily be passed to the transformations. Consider the
following transform (see help struct_poly
for more information on the
different parameters):
model.transform{1} = {1:3, @struct_poly, {t1,t2,t3}, 'chebyshev', {true}};
% which is equivalent to
model.factors{1} = {1, @(z,task) struct_poly(z,task,t1,'chebyshev',true)};
model.factors{2} = {2, @(z,task) struct_poly(z,task,t2,'chebyshev',true)};
model.factors{3} = {3, @(z,task) struct_poly(z,task,t3,'chebyshev',true)};
Hence, cell parameters are distributed if the length of the cell is equal to the number of factors to be created. If the cell parameter has length one, each factor gets the content of the cell as argument. (Cells with a different length are not allowed.) All other parameter types are copied for each transformation. It is also possible to chain transformations:
model.transform{1} = {1:3, @struct_vander, @struct_transpose};
Each transformation can have its own set of parameters.
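Returning to the parameter distribution rule: a cell parameter of length one provides the same argument to every factor. A sketch, assuming t is a single vector of evaluation points shared by all three polynomial factors:

```matlab
% A cell parameter of length one is not distributed: every factor receives
% its contents. Here all three factors use the same evaluation points t.
model.transform{1} = {1:3, @struct_poly, {t}, 'chebyshev', {true}};
% which is equivalent to
model.factors{1} = {1, @(z,task) struct_poly(z,task,t,'chebyshev',true)};
model.factors{2} = {2, @(z,task) struct_poly(z,task,t,'chebyshev',true)};
model.factors{3} = {3, @(z,task) struct_poly(z,task,t,'chebyshev',true)};
```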
As shown, multiple variables can easily be transformed into multiple factors
using the transform
field if the transformation is the same for all factors.
There are some (minor) restrictions regarding naming, however. The
transform
field cannot overwrite factors with the same name or index
(remember that if no factor names/indices are specified, the variable names/indices
become factor names/indices); multiple transform
entries cannot define the
same factor names or indices; and if named (indexed) factors are used, named
(indexed) factor references in the transform
field should be used. More
details on the syntax of the transform
field can be found in the full
language specification (Section The transform field).
Using implicit factors¶
It is quite common to have lines like
model.factors.A = 'a';
In these lines the variable is used directly as a factor as no transformation or
concatenation of subfactors is performed. Such redundant lines can be avoided by
allowing factors to be defined implicitly, which is done by setting
model.options.ImplicitFactors = true
. When implicit factors are allowed,
variables can be used directly in factorizations, as a factor with the same name
is implicitly defined. Consider the following example:
model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors.A = 'A';
model.factors.B = 'B';
model.factors.C = 'C';
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A','B','C'};
This model can also be defined by
model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A','B','C'};
model.options.ImplicitFactors = true;
No factors with names A
, B
or C
can be found, but variables with
these names exist. Therefore, the factors A
, B
and C
are
automatically created. If a factor named A does exist, however,
the latter is used. The result can be checked using
sdf_check(model,'print')
. Of course the model can also be defined as
model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors = 1:3;
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;
If some factors are structured, however, the simple line
model.factors = 1:3;
cannot be used. Implicit factors are more convenient
in this case. As an example, consider a model that involves two unstructured factors
A
and C
and a nonnegative factor B
.
model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors{2} = {'B', @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;
model.options.ImplicitFactors = true;
As implicit factors can be confusing for inexperienced users, this option is
turned off by default. The full language specification (Section
Implicit factors) explains a method to turn the option on by
default. This way the line model.options.ImplicitFactors = true
is no longer needed in every model definition.
High accuracy for NLS algorithms¶
Sometimes accuracy up to machine precision is required. The nonlinear least
squares (NLS) type algorithms like cpd_nls
, ll1_nls
, lmlra_nls
and
btd_nls
are often a good choice to achieve high accuracy thanks to their
fast convergence in the neighborhood of optima. This means that once the
algorithm has found an iterate close to an optimum, the optimum is found up to
machine precision a few iterations later. To get high accuracy results, the
relative function tolerance TolFun
and the step size tolerance TolX
should be small enough, e.g., TolFun = eps^2
and TolX = eps
.
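As an illustration, the tolerances can be passed via an options structure. The sketch below assumes a noiseless rank-5 tensor, so that machine precision is attainable:

```matlab
% Generate an exact rank-5 tensor and compute its CPD to high accuracy by
% tightening the stopping tolerances of the NLS algorithm.
R = 5;
Uexact = cpd_rnd([10 11 12], R);   % exact factor matrices
T = cpdgen(Uexact);                % rank-5 tensor without noise
U0 = cpd_rnd([10 11 12], R);       % random initialization
options = struct;
options.TolFun = eps^2;            % relative function tolerance
options.TolX   = eps;              % step size tolerance
[U, output] = cpd_nls(T, U0, options);
```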
Number of CG iterations¶
If structure is imposed on one or more factors in a decomposition, it may seem
that sdf_nls
does not always have this fast convergence property. The reason
behind this is rather technical, but the main idea is that underlying linear
systems are not solved accurately enough by the Conjugate Gradients (CG)
algorithm, as not enough CG iterations are performed. By default the maximum
number of iterations is limited by CGMaxIter = 15
. Often the number of
iterations for the optimization routine can be significantly reduced if the
maximum number of CG iterations is increased, e.g., CGMaxIter = 50
or
CGMaxIter = 100
. In exceptional cases, CGMaxIter needs to be increased up to the
number of underlying variables. Increasing CGMaxIter is likely to increase
the cost per iteration, but often the number of iterations of the optimization
routine decreases, resulting in a net gain.
Instead of increasing CGMaxIter
it is also possible to use the sdf_minf
solver, as sdf_minf
avoids solving this linear system. When to use
sdf_minf
or sdf_nls
with increased CGMaxIter
is problem specific.
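As a sketch, the two approaches only differ in the solver call (model is any SDF model defined as before):

```matlab
% Option 1: sdf_nls with a larger CG budget for the inner linear systems.
sol = sdf_nls(model, 'CGMaxIter', 100, 'TolFun', eps^2, 'TolX', eps);
% Option 2: sdf_minf, which avoids the inner linear systems altogether.
sol = sdf_minf(model, 'TolFun', eps^2, 'TolX', eps);
```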
As an example, consider the following nonnegative tensor factorization:
R = 5;
T = ful(cpd_rnd([40 50 60], R, 'Real', @rand));
model = struct;
model.variables = cpd_rnd([40 50 60], R);
model.transform{1} = {1:3, @struct_nonneg};
model.factorizations{1}.data = T;
model.factorizations{1}.cpd = 1:3;
[sol,output] = sdf_nls(model, 'Display', 100, 'TolX', eps, 'TolFun', eps^2);
The algorithm prints its progress every 100 iterations:
fval relfval relstep delta rho
=1/2*norm(F)^2 TolFun = 5e-32 TolX = 2e-16
0: 1.05508511e+02 |
1: 1.15611717e+01 | 8.90424274e-01 | 1.52025153e-01 | 8.0584e+00 | 8.9202e-01
100: 4.61577625e-11 | 2.11901351e-14 | 8.20225374e-06 | 1.7762e-01 | 1.0063e+00
200: 1.05870120e-12 | 3.18735863e-16 | 1.07081681e-06 | 1.7762e-01 | 9.9611e-01
300: 4.59820639e-14 | 1.39850518e-17 | 2.29709351e-07 | 1.7762e-01 | 1.0072e+00
400: 2.37198537e-15 | 6.68979657e-19 | 4.45819524e-08 | 1.7762e-01 | 9.8808e-01
500: 1.07873831e-16 | 3.13699148e-20 | 1.09605852e-08 | 1.7762e-01 | 9.9295e-01
As can be seen, machine precision is not attained within the maximum number of
iterations. When looking at the number of CG iterations
(output.cgiterations
) it is clear that they are clipped against the maximum
number of CG iterations:
[min(output.cgiterations) max(output.cgiterations)]
ans =
15 15
When the maximum number of CG iterations is increased to 150, the number of iterations decreases significantly:
sol = sdf_nls(model, 'Display', 10, 'TolX', eps, 'TolFun', eps^2,...
'CGMaxIter', 150);
fval relfval relstep delta rho
=1/2*norm(F)^2 TolFun = 5e-32 TolX = 2e-16
0: 1.05508511e+02 |
1: 1.18695988e+01 | 8.87501030e-01 | 2.14514488e-01 | 1.1371e+01 | 8.8881e-01
10: 8.24322930e-03 | 2.17476982e-05 | 4.20034477e-02 | 1.6564e+00 | 1.0810e+00
20: 1.82692225e-13 | 7.25666009e-13 | 1.03495202e-04 | 1.3658e+00 | 9.9992e-01
30: 1.75577946e-26 | 2.92600487e-27 | 1.38043361e-11 | 1.3658e+00 | 9.9998e-01
34: 1.69764270e-31 | 2.60423058e-32 | 4.11640579e-14 | 1.3658e+00 | 9.9753e-01
Now only 34 iterations are needed to obtain a solution that is accurate up to
machine precision, which can be validated by computing cpderr
.
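A sketch of such a validation; Uexact is assumed to hold the exact factor matrices used to generate T (they are not stored in the example above):

```matlab
% cpderr computes, per factor matrix, the relative error between the exact
% and the estimated factors, resolving the permutation and scaling
% indeterminacies of the CPD.
relerr = cpderr(Uexact, sol.factors);
max(relerr)   % close to machine precision for a converged run
```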
Incomplete tensors¶
By default, sdf_nls
uses an approximation for the Gramian of the Jacobian in
the case of incomplete tensors. This approximation is fast and works well if the
number of missing entries is low and the missing entries are spread out
uniformly across the tensor. When very few entries are known or when the missing
entries follow a pattern, this approximation is no longer valid. This can result
in very slow convergence. An exact implementation of the Gramian of the Jacobian
is available that allows fast convergence regardless of the number of missing
entries. This implementation can be selected by using the cpdi
factorization
type, instead of cpd
[15]:
model.factorizations{1}.data = T;
model.factorizations{1}.cpdi = {'A','B','C'};
While the number of iterations in the optimization algorithm usually decreases, each iteration is more expensive. When very few entries are known, however, there is often a net gain.
The remarks about the CG iterations in the previous subsection also hold for the
cpdi
factorization type. sdf_minf
does not use the Gramian of the
Jacobian, and therefore does not suffer from this problem for incomplete tensors.
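A sketch combining both remarks (only relevant fields shown; T is an incomplete tensor):

```matlab
model.factorizations{1}.data = T;
model.factorizations{1}.cpdi = {'A','B','C'};
% Exact Gramian with a larger CG budget:
sol = sdf_nls(model, 'CGMaxIter', 100);
% Or avoid the Gramian of the Jacobian altogether:
sol = sdf_minf(model);
```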