Structured data fusion: implementing a transformation¶
In SDF, structure can be imposed on a factor by transforming variables \(z\)
into a factor \(x\), i.e., \(x\) is a function \(x : z \rightarrow
x(z)\). An example is struct_nonneg
in which nonnegativity is imposed by
using the transformation \(x(z) = z^2\) for each entry. This chapter
discusses the different elements needed to implement a transformation and
illustrates how the user can implement his or her own transformations. This way
new types of structure can be imposed on factors. All concepts are illustrated
by two examples: the implementation of struct_sqrt
which implements a square
root constraint, and the implementation of a (simplified) version of
struct_poly
which imposes that a factor has polynomial factor vectors.
Elements in a transformation¶
Each transformation implements at least three operations or tasks: the function evaluation, the right Jacobian-vector product and the left Jacobian-vector product. In this section, the global structure of a transformation is explained, as well as the different tasks. As this is a rather technical section, it can be useful to first read the examples in Examples.
Each transformation accepts at least two input arguments: the underlying
variables z
and the task
to perform. Two output arguments are needed:
the computed result x
which depends on the task, and state
(which should
be set to []
if not used). The function definition of a transformation looks
like:
function [x,state] = struct_name(z, task)
There are three tasks to be implemented depending on the fields in the task
struct:
~isempty(task.r)
:x
is the the right Jacobian-vector product \(\frac{\partial x}{\partial z}\cdot r\) in which \(r\) istask.r
.~isempty(task.l)
:x
is the left Jacobian-vector product \((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial x}{\partial \bar{z}})^{\T}\cdot l\) in which \(l\) istask.l
.- Otherwise (
task
can be[]
):x
is the function evaluation \(x(z)\).
The different tasks are now discussed separately.
Function evaluation¶
The transformation returns the function evaluation \(x(z)\) if task
is
empty, or both task.r
and task.l
are empty. To increase the
readability, we introduce the variables right
and left
:
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~right && ~left
% function evaluation
The input argument z
is the variable, in the same format as defined in the
model. For example, if a variable a
defined as
model.variables.a = randn(10,3);
and is transformed, then z = a
is a matrix of size \(10\times3\). z
can
also be the output of a previous transformation, e.g.,
t = 1:10;
model.variables.c = rand(5,3);
model.factors.A = {'c', @(z,task) struct_poly(z,task,t), @struct_nonneg};
For struct_poly
, z
is the variable c
(a \(5\times3\) matrix),
while for struct_nonneg
, z
is the output of struct_poly
(a
\(10\times5\) matrix).
Depending on the transformation, the output x
can be a scalar, vector,
matrix or tensor. x
can also be a cell, which can be processed by subsequent
transformations.
Right Jacobian-vector product¶
The transformation returns the right Jacobian-vector product if task.r
is
not empty. task.l
is always empty in this case. This right Jacobian-vector
product is only computed in NLS type algorithms. In this task, the derivative
\(\frac{\partial x}{\partial z}\cdot r\) is computed, in which \(r\) is
given in task.r
. Mathematically, \(x\), \(z\) and \(r\) are
vectors of length \(F\), \(V\) and \(V\) respectively. Suppose
\(D = \frac{\partial x}{\partial z}\) is an \(F\times V\) matrix, then we
have \(d_{ij} = \frac{\partial x_i}{\partial z_j}\). To simplify the code,
task.r
is not a vector, but has the same format as the input variable z
,
while the result x
has the same format as the resulting factor would have if
\(x(z)\) was evaluated. In the code of the transformation, the following
lines are added:
elseif right
% compute right Jacobian-vector product
The right Jacobian-vector product can only be computed if the function \(x\) depends solely on \(z\), and hence not on its conjugate \(\bar{z}\). If \(x\) is a function of both \(z\) and \(\bar{z}\), i.e., \(x(z,\bar{z})\), an error should be thrown if the input data is complex:
elseif right
if ~isreal(z) || isreal(task.r)
error('struct_name:nonanalytic', ['Nonanalytic objective ' ...
'functions are currently not supported in sdf_nls, please ' ...
'use sdf_minf instead.']);
end
Left Jacobian-vector product¶
The transformation returns the left Jacobian-vector product if task.l
is not
empty (task.r
is always empty in this case). This left Jacobian-vector
product is computed in all algorithms. In this task, the derivative
\((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial
x}{\partial \bar{z}})^{\T}\cdot l\) is computed, in which \(l\) is given.
Mathematically, \(x\), \(z\) and \(l\) are vectors of length
\(F\), \(V\) and \(F\) respectively. Suppose \(D =
\frac{\partial x}{\partial z}\) is an \(F\times V\) matrix, then each entry is
given by \(d_{ij} = \frac{\partial x_i}{\partial z_j}\). To simplify the
code, task.l
is not a vector, but has the same format as as the output x
would have if \(x(z)\) was evaluated. The result of the derivative x
should have the same format as the input variable z
. In the code of the
transformation, the following lines are added:
elseif left
% compute left Jacobian-vector product
end
To compute \(\frac{\partial x}{\partial z}\), \(\bar{z}\) is assumed to be constant, while \(z\) is kept constant when \(\frac{\partial x}{\partial \bar{z}}\) is computed. Hence, if \(x\) is only a function of \(z\), i.e., \(x(z)\), the derivative w.r.t. \(\bar{z}\) is zero. If we have \(x(z,\bar{z})\) both derivatives need to be computed.
Examples¶
Implementing struct_sqrt
¶
The transformation struct_sqrt
implements the function \(f :
\R^{I\times R} \rightarrow \R^{I\times R}: x = \sqrt{z}\) in which the square
root is applied entry-wise. Three elements are required for this transformation:
the function evaluation, the right Jacobian-vector product and the left
Jacobian-vector product.
The function evaluation is simply x = sqrt(z)
. The right and left
Jacobian-vector products are given by \(\frac{\partial x}{\partial z}\cdot
r\) and \((\frac{\partial x}{\partial z})^{\TH}\cdot l + (\frac{\partial
x}{\partial \bar{z}})^{\T}\cdot l\), respectively. The derivative
\(\frac{\partial x}{\partial z}\) is given by \(\frac{1}{2\sqrt{z}}\).
When computing the derivative w.r.t. \(z\), \(\bar{z}\) is kept
constant. As \(x\) does not depend on \(\conj{z}\), the derivative
\(\frac{\partial x}{\partial \bar{z}} = 0\). Hence, the complete code for
the transformation is given by:
function [x,state] = struct_sqrt(z,task)
state = [];
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~right && ~left
x = sqrt(z);
elseif right
x = 1./(2*sqrt(z)).*task.r;
elseif left
x = conj(1./(2*sqrt(z))).*task.l;
end
Implementing struct_poly
¶
The transformation struct_poly
imposes polynomial structure on each of the
columns of a factor matrix, i.e., each entry of the factor \(x\) (an
\(I\times R\) matrix) is given by \(x_{ir} = c_{r0} + c_{r1} t_i +
c_{r2} t_i^2 + \cdots + c_{rd} t_i^d\). The points \(t_i\),
\(i=1,\ldots,I\) are fixed. The coefficients \(c\) (an \(R\times
(d+1)\) matrix) are the variables \(z\) (the highest-degree coefficients
\(c_{rd}\) form the first column).
Again, the different tasks are implemented. The first lines of the
transformation are similar to struct_sqrt
:
function [x, state] = struct_poly(z, task, t)
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
state = [];
The function evaluation can be written as a matrix multiplication by using a
basis matrix B
in which each column is a power of the point vector t
,
i.e., x = B*z
:
if ~right && ~left
B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
x = B*z;
The derivative \(\frac{\partial x}{\partial z}\) is simply given by B
,
hence the right and left Jacobian-vector products are given by
elseif right
B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
x = B*task.r.';
elseif left
B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
x = task.l.'*conj(B);
end
Storing intermediate results¶
Some transformations require expensive computations, which may depend on the current
value of the variables, but also on extra input parameters that stay
constant over all iterations. In the latter case, persistent storage can be used
to store precomputed results, while the state
field can be used in the
former case. First, the use of the state
field is illustrated for a
transformation that implements the square root constraint: struct_sqrt
.
Next, the use of the persistent
field is illustrated for the polynomial
transformation struct_poly
.
Using the state
field¶
In the struct_sqrt
example, the quantity sqrt(z)
and the division
1./(2*sqrt(z))
are computed many times in each iteration, as the three tasks
are performed multiple times per iteration depending on the chosen optimization
algorithm. A property of the optimization-based algorithms used in Tensorlab is
that every time a variable is changed, the function value is computed. In other
words, the variable z
does not change between function value computations.
Therefore, the variable z
seen by the right and left Jacobian-vector
products, is the same as in the last function evaluation. The computation time
can hence be reduced by performing computations depending on z
in the
function evaluation and by reusing these results in the right and left
Jacobian-vector product computations. The state
output field allows the
temporary storage of such intermediary results.
The state
output is a struct. Each field set during the function evaluation
is available under the same name in the task
struct. In this example, a
field dz
is constructed to store the result 1./(2*sqrt(z))
. To compute
the right and left Jacobian-vector product, this precomputed value is used:
function [x,state] = struct_sqrt(z,task)
state = [];
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~left && ~right
x = sqrt(z);
state.dz = 1./(2*x);
elseif right
x = task.dz.*task.r;
elseif left
x = conj(task.dz).*task.l;
end
As can be seen, the value of state.dz
is only computed when the function is
evaluated. As the function evaluation is always called before the left and right
Jacobian-vector products, state
should be set in the function evaluation
task. The value of state
is ignored in the other calls, hence state
can
be left empty ([]
) for the right and left Jacobian-vector products.
Persistent storage¶
In the previous example, the precomputed value depends on the variable
\(z\). As this variable changes every iteration, the state
needs to be
recomputed every iteration. In the struct_poly
transformation, the matrix
basis B
is computed every function evaluation, but B
is independent of
the variables. Hence, setting state.B = B
every iteration reduces the
computational cost a little, but this still wastes a lot of computation time.
Therefore, a special field is provided: the persistent
field. B
can be
computed in the first call to struct_poly
and stored in
state.persistent.B = B
. In all subsequent calls, the struct task
will contain the field persistent
which contains B
as a field. Hence,
the transformation can be rewritten as:
function [x, state] = struct_poly(z, task, t)
if nargin < 2, task = []; end
right = ~isempty(task) && isfield(task, 'r') && ~isempty(task.r);
left = ~isempty(task) && isfield(task, 'l') && ~isempty(task.l);
if ~isempty(task) && isfield(task, 'persistent')
B = task.persistent.B;
state = [];
else % not present, compute
B = bsxfun(@power, t(:), size(z, 2)-1:-1:0);
state.persistent.B = B;
end
if ~left && ~right
x = B*z;
elseif right
x = B*task.r.';
elseif left
x = task.l.'*conj(B);
end
end
B
is now computed only once. In subsequent iterations B
can be loaded
from the task.persistent
struct to be used in further computations. Note
that state = []
if B
has already been computed: there is no need to pass
persistent data as this is done in the background.
Comparing state
and persistent
¶
Setting the state
field in the function value call is useful to store
intermediate results needed for the right and left Jacobian-vector products. The
state
field can be used if the intermediate results depend on the variables
z
and possibly on extra input parameters.
The state.persistent
field can be used to store precomputed data that depend
solely on extra input parameters.
Non-numerical variables¶
In the previous examples, only numerical values are used as variable \(z\)
and output \(x\). Cells can be used as well, which is useful if several
variables need to be combined. As an example, consider the struct_poly
example again. Now, instead of taking fixed points t
, the evaluation points
are considered variables as well, i.e.:
model.variables.z = {rand(5,3), rand(10,1)}; % note this is a cell
model.factors.x = {'z', @struct_poly2};
We reimplement the three tasks. Note that this example is given for illustration purposes only, and that the code below may not be the best way to implement this kind of structure.
The function evaluation for struct_poly2
is very similar as for
struct_poly
, only now z
is a cell with the coefficient c = z{1}
and
the evaluation points t = z{2}
:
function [x, state] = struct_poly(z, task, t)
% trivial parts omitted
c = z{1}; t = z{2};
if ~left && ~right
B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
x = B*c;
Note that the persistent
field cannot be used here, as t
changes every
iteration.
For the right Jacobian-vector product, task.r
has the same format as the
input variable z
, i.e., task.r
is a cell of length two. The output x
has the same format as the factor. The derivative \(\frac{\partial
x}{\partial z} = [\frac{\partial x}{\partial c}, \frac{\partial x}{\partial t}]\)
is multiplied by \([\vectorize(r_c)^{\T}, \vectorize(r_t)^{\T}]^{\T}\) in
which \(r_c\) and \(r_t\) are the parts of \(r\) corresponding to
the variables \(c\) and \(t\), respectively. The multiplication can be
written as \(\frac{\partial x}{\partial c}\vectorize(r_c)+\frac{\partial
x}{\partial t}\vectorize(r_t)\). Therefore, the right Jacobian-vector product can
be computed as
elseif right
% derivative w.r.t. c
B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
x = B*task.r{1}.';
% derivative w.r.t. t
cder = [zeros(size(c,1),1),
bsxfun(@times, c(:,1:end-1),size(c,2)-1:-1:1)];
x = x + (B*cder).*task.r{2};
Finally, the left Jacobian-vector product is computed. task.l
has the same
format as the resulting factor. The output x
should have the same format as
the variable z
in this case. Following the same reasoning as for the right
Jacobian-vector product, the following result is obtained:
elseif left
% derivative w.r.t. c
B = bsxfun(@power, t(:), size(c,2)-1:-1:0);
cl = B'*task.l;
% derivative w.r.t. t
cder = [zeros(size(c,1),1),
bsxfun(@times, c(:,1:end-1),size(c,2)-1:-1:1)];
tl = conj(B*cder).*task.l;
x = {cl.', tl};
end
As before, the repeated computations of B
and cder
can be avoided by
computing both during the function evaluation and using state.B
and
state.cder
to store the results.
The transformations struct_poly
and struct_poly2
can be merged. The
merged transformation executes the code for struct_poly2
if z
is a
cell. Otherwise, the results for struct_poly
are returned.