Structured data fusion: advanced concepts

A few more advanced modeling concepts are explained in this section. These concepts allow for more dynamic and/or more compact models. Three concepts are discussed: indexed references, the transform field, which applies transformations to multiple variables in a single line, and the implicit factors option, which tries to automatically create missing factors from variables. The details of these concepts can be found in the full language specification (Structured data fusion: specification of domain specific language).

Using indexed notations

In the basic introduction to SDF (Structured data fusion) we always used named variables, factors and factorizations, e.g., model.variables.a, and named references to variables and factors, e.g., model.factors.A = {'a'}. While this practice makes the model easy to read, it requires many keystrokes and complicates the adaptation of models afterwards, e.g., when a third-order tensor model is expanded into a fourth-order tensor model.

Internally, all structs are converted to cells, which means that indices are used instead of names. This cell format is also available to the user. For example:

model = struct;
model.variables.a = a;
model.variables.b = b;
model.factors.A = 'a';
model.factors.B = 'b';
model.factorizations.mat.data = T;
model.factorizations.mat.cpd = {'A','B'};

can be rewritten using indexed references as

model = struct;
model.variables.a = a;
model.variables.b = b;
model.factors.A = 1; % refers to variable a
model.factors.B = 2; % refers to variable b
model.factorizations.mat.data = T;
model.factorizations.mat.cpd = 1:2; % refers to factors A (1) and B (2)

We see that both the names and the indices of variables and factors can be used to refer to them. The index of a variable or factor is determined by the order in which the different variables or factors are defined for the first time:

model = struct;
model.variables.a = a;  % has index 1
model.variables.c = c;  % has index 2
model.variables.a = a2; % still has index 1 (a was already defined above)
model.variables.b = b;  % has index 3

The names of the variables, factors and factorizations can even be omitted. In this situation we use indexed variables, factors or factorizations instead.

model = struct;
model.variables{1} = a;
model.variables{2} = b;
model.factors{1} = 1;
model.factors{2} = 2;
model.factorizations{1}.data = T;
model.factorizations{1}.cpd = 1:2;

Named and indexed referencing can even be mixed, e.g.,

model = struct;
model.variables{1} = a;
model.variables{2} = b;
model.factors.A = 1;
model.factors.B = 2;
model.factorizations{1}.data = T; % a third-order tensor
model.factorizations{1}.cpd = {1, 'A', 'B'}; % == {1,1,2} or {'A','A','B'}

Named references can only be used if the variable or factor being referred to has been named. Within the same field (variables, factors or factorizations) all definitions are either named or indexed. The following block throws a MATLAB error:

model = struct;
model.variables.a = a;
model.variables{2} = b;

while the following block throws an error or gives a warning depending on the MATLAB version:

model = struct;
model.variables{1} = a;
model.variables.b  = b;

If no error is thrown, the model above is the same as

model = struct;
model.variables.b = b;

hence the definition of variable 1 is ignored.

The fact that all fields can be cells can be used to shorten the code. Consider a regularized CPD of a third-order tensor \(\ten{T}\) of rank 5. In the basic SDF syntax, we have

Uinit = cpd_rnd(size(T), 5);
model = struct;
model.variables.a = Uinit{1};
model.variables.b = Uinit{2};
model.variables.c = Uinit{3};
model.factors.A = 'a';
model.factors.B = 'b';
model.factors.C = 'c';
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A', 'B', 'C'};
model.factorizations.regul.regL2 = {'A', 'B', 'C'};

By exploiting the fact that all fields can be cells, the model above can be rewritten as:

model = struct;
model.variables = cpd_rnd(getsize(T), 5);
model.factors = 1:getorder(T);
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:getorder(T);
model.factorizations.regul.regL2 = 1:getorder(T);

Note that we used getorder(T) instead of 3. If \(\ten{T}\) is a fourth-order tensor, the code block above still works without any changes. Instead of getorder(T), the MATLAB command ndims(T) can also be used. The advantage of the former is that it works on full, sparse, incomplete and structured tensors, while the latter only works for full tensors.

For the factorizations field, there exists an additional shortcut. Apart from a struct or cell, the factorizations field can also be a struct array. As an example, consider a coupled CPD of two third-order tensors \(\ten{S}\) and \(\ten{T}\) which are coupled in the first mode. The usual way of constructing this model is (only relevant fields are shown):

model.factorizations{1}.data = S;
model.factorizations{1}.cpd  = 1:3;
model.factorizations{2}.data = T;
model.factorizations{2}.cpd  = [1 4 5];

Using a struct array, the following equivalent specification can be used:

datasets = {S, T};
coupling = {[1 2 3], [1 4 5]};
model.factorizations = struct('data', datasets, 'cpd', coupling);

This syntax can be useful for coupled factorizations that involve many datasets.
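For instance, the coupling pattern can be generated programmatically. The sketch below is hypothetical (the four third-order datasets T1, ..., T4, coupled in their first mode, are assumed to be given; dataset n uses the shared factor 1 plus its own factors 2n and 2n+1):

% Hypothetical example: four third-order datasets coupled in the first mode.
datasets = {T1, T2, T3, T4};
coupling = arrayfun(@(n) [1 2*n 2*n+1], 1:4, 'UniformOutput', false);
model.factorizations = struct('data', datasets, 'cpd', coupling);

Because struct distributes cell arguments, this single line creates a 1-by-4 struct array, i.e., four factorizations at once.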

The transform field

In the case of structured factorizations, it often occurs that some factors have the same or a highly similar structure, i.e., the same transformation is used for multiple factors. This transformation can have the same or different parameters for each factor. Consider the following nonnegative CPD of a third-order rank-5 tensor \(\ten{T}\):

model = struct;
model.variables.a = rand(size(T, 1), 5);
model.variables.b = rand(size(T, 2), 5);
model.variables.c = rand(size(T, 3), 5);
model.factors.A = {'a', @struct_nonneg};
model.factors.B = {'b', @struct_nonneg};
model.factors.C = {'c', @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = {'A', 'B', 'C'};

This model can be written more compactly using a transform field:

model = struct;
model.variables.a = rand(size(T, 1), 5);
model.variables.b = rand(size(T, 2), 5);
model.variables.c = rand(size(T, 3), 5);
model.transform.nonneg = {{'a', 'b', 'c'}, {'A', 'B', 'C'}, @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = {'A', 'B', 'C'};

The nonneg transform transforms the variables {'a', 'b', 'c'} into the factors {'A', 'B', 'C'} using the transformation @struct_nonneg. As can be seen using sdf_check(model, 'print'), three factors A, B and C are created. The factor names are often not needed as such: if they are omitted, the variable names are also used as factor names. Hence, if {'A', 'B', 'C'} is omitted, the newly created factors are named a, b and c, and these lowercase names should then also be used in the last line of the previous model. This can again be checked using sdf_check(model, 'print'). Using indexing instead of explicit naming, the model can be simplified further and made more dynamic:

model = struct;
model.variables = cpd_rnd(getsize(T), 5, 'Real', @rand);
model.transform{1} = {1:getorder(T), @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = 1:getorder(T);
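To illustrate the naming rule for omitted factor names, the named nonnegative model can also be written without the {'A', 'B', 'C'} entry, in which case the variable names a, b and c double as factor names:

model = struct;
model.variables.a = rand(size(T, 1), 5);
model.variables.b = rand(size(T, 2), 5);
model.variables.c = rand(size(T, 3), 5);
model.transform.nonneg = {{'a', 'b', 'c'}, @struct_nonneg}; % factors named a, b, c
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = {'a', 'b', 'c'}; % note the lowercase names

As before, sdf_check(model, 'print') can be used to verify which factors are created.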

Similar to variables, factors and factorizations, multiple transforms can be defined using the struct (model.transform.t1 = ...) or the cell (model.transform{1} = ...) notation. The factors and transform fields can also be used together in a model, as shown in the following example in which the second and third factor matrices have polynomials as columns, while the first factor matrix is unconstrained. Without transform the code is:

model = struct;
model.variables.A     = randn(size(T,1), 5); % R = 5
model.variables.coefB = randn(5, d); % d-1 is the degree of the polynomial
model.variables.coefC = randn(5, d); % d-1 is the degree of the polynomial
model.factors{1}      = 1; % unconstrained factor A
model.factors{2}      = {2, @(z,task) struct_poly(z,task,tb)}; % poly in points tb
model.factors{3}      = {3, @(z,task) struct_poly(z,task,tc)}; % poly in points tc
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = 1:3;

Using transform, the code becomes:

model = struct;
model.variables.A     = randn(size(T,1), 5); % R = 5
model.variables.coefB = randn(5, d); % d-1 is the degree of the polynomial
model.variables.coefC = randn(5, d); % d-1 is the degree of the polynomial
model.factors{1}      = 1; % unconstrained factor A
model.transform{1}    = {2:3, @struct_poly, {tb, tc}}; % tb, tc are
                                                       % evaluation points
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd  = 1:3;

Note that there is no need to define an anonymous function @(z,task) struct_poly(...). The second factor still has evaluation points tb as parameters, while the third factor has evaluation points tc. This way many parameters can easily be passed to the transformations. Consider the following transform (see help struct_poly for more information on the different parameters):

model.transform{1} = {1:3, @struct_poly, {t1,t2,t3}, 'chebyshev', {true}};
% which is equivalent to
model.factors{1} = {1, @(z,task) struct_poly(z,task,t1,'chebyshev',true)};
model.factors{2} = {2, @(z,task) struct_poly(z,task,t2,'chebyshev',true)};
model.factors{3} = {3, @(z,task) struct_poly(z,task,t3,'chebyshev',true)};

Hence, cell parameters are distributed if the length of the cell is equal to the number of factors to be created. If the cell parameter has length one, each factor gets the content of the cell as argument. (Cells with a different length are not allowed.) All other parameter types are copied for each transformation. It is also possible to chain transformations:

model.transform{1} = {1:3, @struct_vander, @struct_transpose};

Each transformation can have its own set of parameters.
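To illustrate the length-one case of the distribution rule: each created factor then receives the cell's single entry. The sketch below reuses the struct_poly example, with a single (hypothetical) set of evaluation points t shared by all three factors:

model.transform{1} = {1:3, @struct_poly, {t}, 'chebyshev', {true}};
% which is equivalent to
model.factors{1} = {1, @(z,task) struct_poly(z,task,t,'chebyshev',true)};
model.factors{2} = {2, @(z,task) struct_poly(z,task,t,'chebyshev',true)};
model.factors{3} = {3, @(z,task) struct_poly(z,task,t,'chebyshev',true)};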

As shown, multiple variables can easily be transformed into multiple factors using the transform field if the transformation is the same for all factors. There are some (minor) restrictions regarding naming, however. The transform field cannot overwrite factors with the same name or index (remember that if no factor names/indices are specified, the variable names/indices become factor names/indices); multiple transform entries cannot define the same factor names or indices; and if named (indexed) factors are used, named (indexed) factor references in the transform field should be used. More details on the syntax of the transform field can be found in the full language specification (Section The transform field).

Using implicit factors

It is quite common to have lines like

model.factors.A = 'a';

In these lines the variable is used directly as a factor, as no transformation or concatenation of subfactors is performed. These superfluous lines can be avoided by allowing factors to be defined implicitly, which is done by setting model.options.ImplicitFactors = true. When implicit factors are allowed, variables can be used directly in factorizations, as a factor with the same name is implicitly defined. Consider the following example:

model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors.A = 'A';
model.factors.B = 'B';
model.factors.C = 'C';
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A','B','C'};

This model can also be defined by

model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = {'A','B','C'};
model.options.ImplicitFactors = true;

No factors with names A, B or C can be found, but variables with these names exist. Therefore, the factors A, B and C are automatically created. If, for example, a factor named A does exist, that factor is used instead. The result can be checked using sdf_check(model,'print'). Of course, the model can also be defined as

model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors = 1:3;
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;

If some factors are structured, however, the simple line model.factors = 1:3; cannot be used. Implicit factors are more convenient in this case. As an example, consider a model that involves two unstructured factors A and C and a nonnegative factor B:

model = struct;
model.variables.A = rand(10,5);
model.variables.B = rand(11,5);
model.variables.C = rand(12,5);
model.factors{2} = {'B', @struct_nonneg};
model.factorizations.tensor.data = T;
model.factorizations.tensor.cpd = 1:3;
model.options.ImplicitFactors = true;

As implicit factors can be confusing for inexperienced users, this option is turned off by default. The full language specification (Section Implicit factors) explains a method to turn the option on by default. This way the line model.options.ImplicitFactors = true is no longer needed in every model definition.

High accuracy for NLS algorithms

Sometimes accuracy up to machine precision is required. The nonlinear least squares (NLS) type algorithms like cpd_nls, ll1_nls, lmlra_nls and btd_nls are often a good choice to achieve high accuracy thanks to their fast convergence in the neighborhood of optima. This means that once the algorithm has found an iterate close to an optimum, the optimum is found up to machine precision a few iterations later. To get high accuracy results, the relative function tolerance TolFun and the step size tolerance TolX should be small enough, e.g., TolFun = eps^2 and TolX = eps.
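As a sketch (assuming a tensor T and a rank R are given), these tolerances can be passed as options to the solver:

% Compute a rank-R CPD of T up to (near) machine precision by tightening
% the stopping tolerances of the NLS algorithm.
U0   = cpd_rnd(size(T), R); % random initialization
Uhat = cpd_nls(T, U0, 'TolFun', eps^2, 'TolX', eps);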

Number of CG iterations

If structure is imposed on one or more factors in a decomposition, it may seem that sdf_nls does not always have this fast convergence property. The reason behind this is rather technical, but the main idea is that underlying linear systems are not solved accurately enough by the Conjugate Gradients (CG) algorithm, as not enough CG iterations are performed. By default the maximum number of iterations is limited by CGMaxIter = 15. Often the number of iterations for the optimization routine can be significantly reduced if the maximum number of CG iterations is increased, e.g., CGMaxIter = 50 or CGMaxIter = 100. Exceptionally, CGMaxIter needs to be increased to the number of underlying variables. Increasing CGMaxIter is likely to increase the cost per iteration, but often the number of iterations of the optimization routines decreases resulting in a net gain.

Instead of increasing CGMaxIter, it is also possible to use the sdf_minf solver, as sdf_minf avoids solving this linear system. Whether to use sdf_minf or sdf_nls with an increased CGMaxIter is problem specific.

As an example, consider the following nonnegative tensor factorization:

R = 5;
T = ful(cpd_rnd([40 50 60], R, 'Real', @rand));
model = struct;
model.variables = cpd_rnd([40 50 60], R);
model.transform{1} = {1:3, @struct_nonneg};
model.factorizations{1}.data = T;
model.factorizations{1}.cpd  = 1:3;

[sol,output] = sdf_nls(model, 'Display', 100, 'TolX', eps, 'TolFun', eps^2);

The algorithm prints its progress every 100 iterations:

      fval             relfval          relstep          delta        rho
      =1/2*norm(F)^2   TolFun = 5e-32   TolX = 2e-16

  0:  1.05508511e+02 |
  1:  1.15611717e+01 | 8.90424274e-01 | 1.52025153e-01 | 8.0584e+00 | 8.9202e-01
100:  4.61577625e-11 | 2.11901351e-14 | 8.20225374e-06 | 1.7762e-01 | 1.0063e+00
200:  1.05870120e-12 | 3.18735863e-16 | 1.07081681e-06 | 1.7762e-01 | 9.9611e-01
300:  4.59820639e-14 | 1.39850518e-17 | 2.29709351e-07 | 1.7762e-01 | 1.0072e+00
400:  2.37198537e-15 | 6.68979657e-19 | 4.45819524e-08 | 1.7762e-01 | 9.8808e-01
500:  1.07873831e-16 | 3.13699148e-20 | 1.09605852e-08 | 1.7762e-01 | 9.9295e-01

As can be seen, machine precision is not attained within the maximum number of iterations. When looking at the number of CG iterations (output.cgiterations) it is clear that they are clipped against the maximum number of CG iterations:

[min(output.cgiterations) max(output.cgiterations)]
ans =
 15    15

When the maximum number of CG iterations is increased to 150, the number of iterations decreases significantly:

sol = sdf_nls(model, 'Display', 10, 'TolX', eps, 'TolFun', eps^2,...
              'CGMaxIter', 150);
     fval             relfval          relstep          delta        rho
     =1/2*norm(F)^2   TolFun = 5e-32   TolX = 2e-16

 0:  1.05508511e+02 |
 1:  1.18695988e+01 | 8.87501030e-01 | 2.14514488e-01 | 1.1371e+01 | 8.8881e-01
10:  8.24322930e-03 | 2.17476982e-05 | 4.20034477e-02 | 1.6564e+00 | 1.0810e+00
20:  1.82692225e-13 | 7.25666009e-13 | 1.03495202e-04 | 1.3658e+00 | 9.9992e-01
30:  1.75577946e-26 | 2.92600487e-27 | 1.38043361e-11 | 1.3658e+00 | 9.9998e-01
34:  1.69764270e-31 | 2.60423058e-32 | 4.11640579e-14 | 1.3658e+00 | 9.9753e-01

Now only 34 iterations are needed to obtain a solution that is accurate up to machine precision, which can be validated by computing cpderr.

Incomplete tensors

By default, sdf_nls uses an approximation for the Gramian of the Jacobian in the case of incomplete tensors. This approximation is fast and works well if the number of missing entries is low and the missing entries are spread out uniformly across the tensor. When very few entries are known or when the missing entries follow a pattern, this approximation is no longer valid. This can result in very slow convergence. An exact implementation of the Gramian of the Jacobian is available that allows fast convergence regardless of the number of missing entries. This implementation can be selected by using the cpdi factorization type, instead of cpd [15]:

model.factorizations{1}.data = T;
model.factorizations{1}.cpdi = {'A','B','C'};

While the number of iterations in the optimization algorithm usually decreases, each iteration is more expensive. However, when very few entries are known, there is often a gain.
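A complete model using cpdi might look as follows (a sketch, assuming an incomplete tensor Tinc and a rank R are given):

% Rank-R CPD of an incomplete tensor Tinc using the exact Gramian of the
% Jacobian, selected via the cpdi factorization type.
model = struct;
model.variables = cpd_rnd(getsize(Tinc), R);
model.factors = 1:getorder(Tinc);
model.factorizations.tensor.data = Tinc;
model.factorizations.tensor.cpdi = 1:getorder(Tinc);
sol = sdf_nls(model);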

The remarks about the CG iterations in the previous subsection also hold for the cpdi factorization type. sdf_minf does not use the Gramian of the Jacobian and therefore does not suffer from this problem for incomplete tensors.