Structured data fusion: implementing a transformation

In SDF, structure can be imposed on a factor by transforming variables \(z\) into a factor \(x\), i.e., \(x\) is a function \(x : z \rightarrow x(z)\). An example is struct_nonneg in which nonnegativity is imposed by using the transformation \(x(z) = z^2\) for each entry. This chapter discusses the different elements needed to implement a transformation and illustrates how the user can implement his or her own transformations. This way new types of structure can be imposed on factors. All concepts are illustrated by two examples: the implementation of struct_sqrt which implements a square root constraint, and the implementation of a (simplified) version of struct_poly which imposes that a factor has polynomial factor vectors.

Elements in a transformation

Each transformation implements at least three operations or tasks: the function evaluation, the right Jacobian-vector product and the left Jacobian-vector product. In this section, the global structure of a transformation is explained, as well as the different tasks. As this is a rather technical section, it can be useful to first read the examples in Examples.

Each transformation accepts at least two input arguments: the underlying variables z and the task to perform. Two output arguments are needed: the computed result x which depends on the task, and state (which should be set to [] if not used). The function definition of a transformation looks like:

function [x,state] = struct_name(z, task)

There are three tasks to be implemented depending on the fields in the task struct:

  • ~isempty(task.r): x is the the right Jacobian-vector product \(\frac{\partial x}{\partial z}\cdot r\) in which \(r\) is task.r.
  • ~isempty(task.l): x is the left Jacobian-vector product \((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial x}{\partial \bar{z}})^{\T}\cdot l\) in which \(l\) is task.l.
  • Otherwise (task can be []): x is the function evaluation \(x(z)\).

The different tasks are now discussed separately.

Function evaluation

The transformation returns the function evaluation \(x(z)\) if task is empty, or both task.r and task.l are empty. To increase the readability, we introduce the variables right and left:

if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left  = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);

if ~right && ~left
    % function evaluation

The input argument z is the variable, in the same format as defined in the model. For example, if a variable a defined as

model.variables.a = randn(10,3);

and is transformed, then z = a is a matrix of size \(10\times3\). z can also be the output of a previous transformation, e.g.,

t = 1:10;
model.variables.c = rand(5,3);
model.factors.A = {'c', @(z,task) struct_poly(z,task,t), @struct_nonneg};

For struct_poly, z is the variable c (a \(5\times3\) matrix), while for struct_nonneg, z is the output of struct_poly (a \(10\times5\) matrix).

Depending on the transformation, the output x can be a scalar, vector, matrix or tensor. x can also be a cell, which can be processed by subsequent transformations.

Right Jacobian-vector product

The transformation returns the right Jacobian-vector product if task.r is not empty. task.l is always empty in this case. This right Jacobian-vector product is only computed in NLS type algorithms. In this task, the derivative \(\frac{\partial x}{\partial z}\cdot r\) is computed, in which \(r\) is given in task.r. Mathematically, \(x\), \(z\) and \(r\) are vectors of length \(F\), \(V\) and \(V\) respectively. Suppose \(D = \frac{\partial x}{\partial z}\) is an \(F\times V\) matrix, then we have \(d_{ij} = \frac{\partial x_i}{\partial z_j}\). To simplify the code, task.r is not a vector, but has the same format as the input variable z, while the result x has the same format as the resulting factor would have if \(x(z)\) was evaluated. In the code of the transformation, the following lines are added:

elseif right
    % compute right Jacobian-vector product

The right Jacobian-vector product can only be computed if the function \(x\) depends solely on \(z\), and hence not on its conjugate \(\bar{z}\). If \(x\) is a function of both \(z\) and \(\bar{z}\), i.e., \(x(z,\bar{z})\), an error should be thrown if the input data is complex:

elseif right
    if ~isreal(z) || isreal(task.r)
        error('struct_name:nonanalytic', ['Nonanalytic objective ' ...
              'functions are currently not supported in sdf_nls, please ' ...
              'use sdf_minf instead.']);
    end

Left Jacobian-vector product

The transformation returns the left Jacobian-vector product if task.l is not empty (task.r is always empty in this case). This left Jacobian-vector product is computed in all algorithms. In this task, the derivative \((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial x}{\partial \bar{z}})^{\T}\cdot l\) is computed, in which \(l\) is given. Mathematically, \(x\), \(z\) and \(l\) are vectors of length \(F\), \(V\) and \(F\) respectively. Suppose \(D = \frac{\partial x}{\partial z}\) is an \(F\times V\) matrix, then each entry is given by \(d_{ij} = \frac{\partial x_i}{\partial z_j}\). To simplify the code, task.l is not a vector, but has the same format as as the output x would have if \(x(z)\) was evaluated. The result of the derivative x should have the same format as the input variable z. In the code of the transformation, the following lines are added:

elseif left
    % compute left Jacobian-vector product
end

To compute \(\frac{\partial x}{\partial z}\), \(\bar{z}\) is assumed to be constant, while \(z\) is kept constant when \(\frac{\partial x}{\partial \bar{z}}\) is computed. Hence, if \(x\) is only a function of \(z\), i.e., \(x(z)\), the derivative w.r.t. \(\bar{z}\) is zero. If we have \(x(z,\bar{z})\) both derivatives need to be computed.

Examples

Implementing struct_sqrt

The transformation struct_sqrt implements the function \(f : \R^{I\times R} \rightarrow \R^{I\times R}: x = \sqrt{z}\) in which the square root is applied entry-wise. Three elements are required for this transformation: the function evaluation, the right Jacobian-vector product and the left Jacobian-vector product.

The function evaluation is simply x = sqrt(z). The right and left Jacobian-vector products are given by \(\frac{\partial x}{\partial z}\cdot r\) and \((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial x}{\partial \bar{z}})^{\T}\cdot l\), respectively. The derivative \(\frac{\partial x}{\partial z}\) is given by \(\frac{1}{2\sqrt{z}}\). When computing the derivative w.r.t. \(z\), \(\bar{z}\) is kept constant. As \(x\) does not depend on \(\conj{z}\), the derivative \(\frac{\partial x}{\partial \bar{z}} = 0\). Hence, the complete code for the transformation is given by:

function [x,state] = struct_sqrt(z,task)
state = [];
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left  = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~right && ~left
     x = sqrt(z);
elseif right
     x = 1./(2*sqrt(z)).*task.r;
elseif left
     x = conj(1./(2*sqrt(z))).*task.l;
end

Implementing struct_poly

The transformation struct_poly imposes polynomial structure on each of the columns of a factor matrix, i.e., each entry of the factor \(x\) (an \(I\times R\) matrix) is given by \(x_{ir} = c_{r0} + c_{r1} t_i + c_{r2} t_i^2 + \cdots + c_{rd} t_i^d\). The points \(t_i\), \(i=1,\ldots,I\) are fixed. The coefficients \(c\) (an \(R\times (d+1)\) matrix) are the variables \(z\) (the highest-degree coefficients \(c_{rd}\) form the first column).

Again, the different tasks are implemented. The first lines of the transformation are similar to struct_sqrt:

function [x, state] = struct_poly(z, task, t)

   if nargin < 2, task = []; end
   right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
   left  = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
   state = [];

The function evaluation can be written as a matrix multiplication by using a basis matrix B in which each column is a power of the point vector t, i.e., x = B*z:

if ~right && ~left
    B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
    x = B*z;

The derivative \(\frac{\partial x}{\partial z}\) is simply given by B, hence the right and left Jacobian-vector products are given by

elseif right
    B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
    x = B*task.r.';
elseif left
    B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
    x = task.l.'*conj(B);
end

Storing intermediate results

Some transformations require expensive computations, which may depend on the current value of the variables, but also on extra input parameters that stay constant over all iterations. In the latter case, persistent storage can be used to store precomputed results, while the state field can be used in the former case. First, the use of the state field is illustrated for a transformation that implements the square root constraint: struct_sqrt. Next, the use of the persistent field is illustrated for the polynomial transformation struct_poly.

Using the state field

In the struct_sqrt example, the quantity sqrt(z) and the division 1./(2*sqrt(z)) are computed many times in each iteration, as the three tasks are performed multiple times per iteration depending on the chosen optimization algorithm. A property of the optimization-based algorithms used in Tensorlab is that every time a variable is changed, the function value is computed. In other words, the variable z does not change between function value computations. Therefore, the variable z seen by the right and left Jacobian-vector products, is the same as in the last function evaluation. The computation time can hence be reduced by performing computations depending on z in the function evaluation and by reusing these results in the right and left Jacobian-vector product computations. The state output field allows the temporary storage of such intermediary results.

The state output is a struct. Each field set during the function evaluation is available under the same name in the task struct. In this example, a field dz is constructed to store the result 1./(2*sqrt(z)). To compute the right and left Jacobian-vector product, this precomputed value is used:

function [x,state] = struct_sqrt(z,task)
state = [];
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left  = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~left && ~right
    x = sqrt(z);
    state.dz = 1./(2*x);
elseif right
    x = task.dz.*task.r;
elseif left
    x = conj(task.dz).*task.l;
end

As can be seen, the value of state.dz is only computed when the function is evaluated. As the function evaluation is always called before the left and right Jacobian-vector products, state should be set in the function evaluation task. The value of state is ignored in the other calls, hence state can be left empty ([]) for the right and left Jacobian-vector products.

Persistent storage

In the previous example, the precomputed value depends on the variable \(z\). As this variable changes every iteration, the state needs to be recomputed every iteration. In the struct_poly transformation, the matrix basis B is computed every function evaluation, but B is independent of the variables. Hence, setting state.B = B every iteration reduces the computational cost a little, but this still wastes a lot of computation time. Therefore, a special field is provided: the persistent field. B can be computed in the first call to struct_poly and stored in state.persistent.B = B. In all subsequent calls, the struct task will contain the field persistent which contains B as a field. Hence, the transformation can be rewritten as:

function [x, state] = struct_poly(z, task, t)

   if nargin < 2, task = []; end
   right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
   left  = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);

   if ~isempty(task) && isfield(task, 'persistent')
       B = task.persistent.B;
       state = [];
   else % not present, compute
       B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
       state.persistent.B = B;
   end

   if ~left && ~right
       x = B*z;
   elseif right
       x = B*task.r.';
   elseif left
       x = task.l.'*conj(B);
   end
end

B is now computed only once. In subsequent iterations B can be loaded from the task.persistent struct to be used in further computations. Note that state = [] if B has already been computed: there is no need to pass persistent data as this is done in the background.

Comparing state and persistent

Setting the state field in the function value call is useful to store intermediate results needed for the right and left Jacobian-vector products. The state field can be used if the intermediate results depend on the variables z and possibly on extra input parameters.

The state.persistent field can be used to store precomputed data that depend solely on extra input parameters.

Non-numerical variables

In the previous examples, only numerical values are used as variable \(z\) and output \(x\). Cells can be used as well, which is useful if several variables need to be combined. As an example, consider the struct_poly example again. Now, instead of taking fixed points t, the evaluation points are considered variables as well, i.e.:

model.variables.z = {rand(5,3), rand(10,1)}; % note this is a cell
model.factors.x = {'z', @struct_poly2};

We reimplement the three tasks. Note that this example is given for illustration purposes only, and that the code below may not be the best way to implement this kind of structure.

The function evaluation for struct_poly2 is very similar as for struct_poly, only now z is a cell with the coefficient c = z{1} and the evaluation points t = z{2}:

function [x, state] = struct_poly(z, task, t)
   % trivial parts omitted
   c = z{1}; t = z{2};
   if ~left && ~right
       B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
       x = B*c;

Note that the persistent field cannot be used here, as t changes every iteration.

For the right Jacobian-vector product, task.r has the same format as the input variable z, i.e., task.r is a cell of length two. The output x has the same format as the factor. The derivative \(\frac{\partial x}{\partial z} = [\frac{\partial x}{\partial c}, \frac{\partial x}{\partial t}]\) is multiplied by \([\vectorize(r_c)^{\T}, \vectorize(r_t)^{\T}]^{\T}\) in which \(r_c\) and \(r_t\) are the parts of \(r\) corresponding to the variables \(c\) and \(t\), respectively. The multiplication can be written as \(\frac{\partial x}{\partial c}\vectorize(r_c)+\frac{\partial x}{\partial t}\vectorize(r_t)\). Therefore, the right Jacobian-vector product can be computed as

elseif right
    % derivative w.r.t. c
    B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
    x = B*task.r{1}.';
    % derivative w.r.t. t
    cder = [zeros(size(c,1),1),
            bsxfun(@times, c(:,1:end-1),size(c,2)-1:-1:1)];
    x = x + (B*cder).*task.r{2};

Finally, the left Jacobian-vector product is computed. task.l has the same format as the resulting factor. The output x should have the same format as the variable z in this case. Following the same reasoning as for the right Jacobian-vector product, the following result is obtained:

elseif left
    % derivative w.r.t. c
    B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
    cl = B'*task.l;
    % derivative w.r.t. t
    cder = [zeros(size(c,1),1),
            bsxfun(@times, c(:,1:end-1),size(c,2)-1:-1:1)];
    tl = conj(B*cder).*task.l;
    x = {cl.', tl};
end

As before, the repeated computations of B and cder can be avoided by computing both during the function evaluation and using state.B and state.cder to store the results.

The transformations struct_poly and struct_poly2 can be merged. The merged transformation executes the code for struct_poly2 if z is a cell. Otherwise, the results for struct_poly are returned.