Datasets: dense, incomplete, sparse and structured¶
A scalar is a tensor of order zero. Vectors and matrices are first- and second-order tensors, respectively. Arrays with three or more dimensions are called higher-order tensors.
Datasets can be dense, sparse, incomplete or structured. Tensorlab provides different representations for each of these cases, discussed in Tensor representations. These representations are developed in such a way that a number of operations can be computed efficiently, which is discussed in Tensor operations. In Visualizing higher-order datasets, different visualization techniques are illustrated.
Tensor representations¶
Dense tensors¶
A dense (or full) tensor is simply a MATLAB array, e.g., A = randn(10,10) or T = randn(5,5,5). MATLAB supports a few basic operations on tensors, for example:
T = randn(10,20,30);
S = randn(10,20,30);
V = T + S; % Elementwise addition
W = T.*S; % Elementwise multiplication
Incomplete tensors¶
An incomplete tensor is a dataset in which some (or most) of the entries are unknown. An incomplete tensor is represented efficiently in Tensorlab by a structure that contains the necessary information. The efficient representation of an incomplete tensor in which the unknown entries are padded with NaN can be obtained using the format command fmt. In the following example, a MATLAB array of size \(9\times 9 \times 9\) with only three known elements is replaced by its efficient representation:
T = NaN(9,9,9);
T(1,2,3) = -1;
T(4,5,6) = -2;
T(7,8,9) = -3;
T = fmt(T)
T =
matrix: [9x81 double]
size: [9 9 9]
ind: [3x1 int64]
val: [3x1 double]
incomplete: 1
sparse: 0
sub: {[3x1 int32] [3x1 int32] [3x1 int32]}
Note that if there are no entries equal to NaN, then fmt(T) returns T itself. Conversely, the efficient representation can be constructed directly from the set of known entries, their positions, the tensor's size and an incomplete flag, as in the following example:
T = struct;
T.val = [-1 -2 -3]; % The known elements
T.sub = [1 2 3; 4 5 6; 7 8 9]; % Their positions
T.size = [9 9 9]; % The size of the tensor
T.incomplete = true; % The incomplete flag
T = fmt(T); % Complete the representation
Note that to obtain a valid efficient representation, fmt must be called so that additional fields needed for computational purposes, e.g., T.matrix, are inserted. Alternatively, the user may supply linear indices with T.ind instead of subindices with T.sub. To convert linear indices to subindices and vice versa, see the MATLAB functions ind2sub and sub2ind, respectively.
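As an illustration, the subscripts from the example above can be converted to linear indices and back; a minimal sketch (the index values follow from MATLAB's column-major ordering):

sz  = [9 9 9];
sub = [1 2 3; 4 5 6; 7 8 9];
ind = sub2ind(sz,sub(:,1),sub(:,2),sub(:,3)); % linear indices 172, 445 and 718
[i,j,k] = ind2sub(sz,ind);                    % recovers the columns of sub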
Given the representation of an incomplete tensor, the MATLAB array with NaN values for the unknown entries can be obtained using the command ful:
T = struct;
T.val = [1 2 3];
T.ind = [1 5 9];
T.size = [3 3];
T.incomplete = true;
T = fmt(T);
S = ful(T)
S =
1 NaN NaN
NaN 2 NaN
NaN NaN 3
Sparse tensors¶
A sparse tensor is a dataset in which most of the entries are zero, e.g., a large diagonal matrix. Tensorlab supports efficient representations of sparse tensors in which at least 95% of the entries are zero. The efficient representation can be obtained using the format command fmt:
T = zeros(5,5,5);
T(1:31:end) = 1;
T = fmt(T)
T =
matrix: [5x25 double]
size: [5 5 5]
ind: [5x1 int64]
val: [5x1 double]
incomplete: 0
sparse: 1
sub: {[5x1 int32] [5x1 int32] [5x1 int32]}
If fewer than 95% of the entries are zero, then fmt returns T itself. To construct an efficient representation of a sparse tensor directly, define a structure containing the tensor's nonzero elements in the same way as for incomplete tensors and set the sparse flag. Afterwards, fmt should be called to insert the additional fields needed for computational purposes and to obtain a valid efficient representation:
T = struct;
T.val = [-1 -2 -3]; % The nonzero elements
T.sub = [1 2 3; 4 5 6; 7 8 9]; % Their positions
T.size = [9 9 9]; % The size of the tensor
T.sparse = true; % The sparse flag
T = fmt(T);
The ful command can be used to convert the representation to the corresponding MATLAB array:
T = struct;
T.val = [1 2 3];
T.ind = [1 5 9];
T.size = [3 3];
T.sparse = true;
T = fmt(T);
S = ful(T)
S =
1 0 0
0 2 0
0 0 3
Structured tensors¶
Tensorlab 3.0 introduces additional support for structured tensors. Two groups of structured tensors can be handled: structured tensors resulting from the tensorization of data, and tensors that are given in the form of an explicit factorization.
Structured tensors resulting from the tensorization of data
Consider a Hankel matrix, constructed from an arbitrary vector. Such a matrix has a special structure: each anti-diagonal (or skew-diagonal) is constant. For example, from the generating vector \(\vec{v} = [1,2,\ldots,7]\), the following Hankel matrix \(\mat{H}\) can be obtained:

\[\mat{H} = \begin{bmatrix} 1 & 2 & 3 & 4\\ 2 & 3 & 4 & 5\\ 3 & 4 & 5 & 6\\ 4 & 5 & 6 & 7 \end{bmatrix}\]
One can exploit the structure of \(\mat{H}\) when storing \(\mat{H}\) and even when performing operations on \(\mat{H}\). The former is done in this case by only storing the small generating vector \(\vec{v}\), instead of the 16 elements of \(\mat{H}\).
This Hankel-based mapping is a specific example of tensorization and the Hankel matrix is a specific example of a tensor obtained from a tensorization of some lower-order data. The concept of tensorization is covered in more detail in Tensorization techniques. Tensorlab supports efficient representations of Hankel matrices and (block-)Hankel tensors of arbitrary order, as well as Löwner, segmentation-based and decimation-based structures obtained by tensorization.
Each efficient representation is a MATLAB structure array, analogous to the efficient representation for incomplete and sparse tensors. For example, the following is an efficient representation of the previously mentioned Hankel matrix \(\mat{H}\):
H = hankelize(1:7,'full',false)
H =
type: 'hankel'
val: [7x1 double]
dim: 1
order: 2
ind: 4
ispermuted: 0
repermorder: [1 2]
size: [4 4]
subsize: [1x1 struct]
By calling ful on H, the dense matrix is obtained:
H = ful(H)
H =
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
Tensors given in the form of an explicit factorization
Consider a rank-1 tensor \(\ten{T} \in\C^{100\times 100 \times 100}\), which can be written as the outer product of three vectors \(\vec{a}\in\C^{100}\), \(\vec{b}\in\C^{100}\) and \(\vec{c}\in\C^{100}\). Each element of \(\ten{T}\) satisfies \(t_{ijk} = a_ib_jc_k\). The 300 elements from \(\vec{a}\), \(\vec{b}\) and \(\vec{c}\) exactly determine all of the 1,000,000 elements of \(\ten{T}\). Hence, instead of storing \(\ten{T}\) and performing operations on \(\ten{T}\), only the three vectors could be stored and used for calculations.
Tensorlab supports efficient operations on tensors represented by a factorization in rank-1 terms (CPD, see Canonical polyadic decomposition), with arbitrary rank, order and size. Tensorlab also supports efficient operations on tensors that are given in the form of a tensor train (TT) decomposition [23], a low multilinear rank approximation (LMLRA, see Multilinear singular value decomposition and low multilinear rank approximation) or a block term decomposition (BTD, see Block term decomposition, with as special case the decomposition in multilinear rank-\((L_r,L_r,1)\) terms).
The efficient representations corresponding to each structure are a collection of factor matrices and core tensor(s); a short example follows the list below. Consider an \(N\)th-order tensor \(\ten{T}\) of size \(I_1\times I_2 \times \cdots \times I_N\):
- For a CPD, the representation is a cell array containing the factor matrices of sizes \(I_n \times R\) with \(R\) the number of rank-1 terms.
- For an LMLRA, the representation is a cell array containing (1) a cell array with the factor matrices of sizes \(I_n \times R_n\), and (2) a core tensor of size \(R_1\times R_2\times \cdots \times R_N\).
- For a BTD, the representation is a cell array with \(R\) entries, each representing a tensor with multilinear rank \((L_{r1},L_{r2},\ldots,L_{rN})\). The \(r\)th cell entry has \(N+1\) elements: the first \(N\) elements represent the factor matrices of size \(I_n \times L_{rn}\), while element \(N+1\) represents the core tensor of size \(L_{r1}\times L_{r2}\times \cdots \times L_{rN}\). Note that this is different from the LMLRA format as the factor matrices are not encapsulated in an additional cell.
- For a TT, the representation is a cell array with \(N\) entries. The first and last entry represent two factor matrices of sizes \(I_1 \times R_1\) and \(R_{N-1}\times I_N\). The other entries represent the cores of sizes \(R_n\times I_{n+1}\times R_{n+1}\) for \(n = 1,\ldots,N-2\).
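As an illustration, the following minimal sketch constructs the CPD and LMLRA representations of a small tensor and expands them with ful; the sizes and ranks are chosen arbitrarily, and the equivalent generator functions are noted in the comments:

U = {randn(10,3),randn(11,3),randn(12,3)}; % CPD factor matrices, R = 3
Tcpd = ful(U);                             % same result as cpdgen(U)

F = {randn(10,2),randn(11,3),randn(12,4)}; % LMLRA factor matrices
S = randn(2,3,4);                          % LMLRA core tensor
Tlmlra = ful({F,S});                       % same result as lmlragen(F,S)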
While Tensorlab supports the efficient representation of TT decompositions, we refer to other toolboxes for the computation of such decompositions.
Useful commands for efficient representations¶
Besides generating the entire dense tensor given an efficient representation of an incomplete, sparse or structured tensor, the command ful allows the user to pass indices so that only specific elements of the tensor are calculated or extracted. Given the efficient representation of the Hankel matrix \(\mat{H}\) from before, the following returns the diagonal of \(\mat{H}\) using linear indices:
ful(H,[1 6 11 16])
ans =
1 3 5 7
Alternatively, subindexing can be used to extract fibers or slices. For example:
ful(H,2,':')
ful(H,2:3,':')
ans =
2 3 4 5
ans =
2 3 4 5
3 4 5 6
Further, getstructure returns the type of efficient representation. The functions getorder and getsize return the order and the size of the tensor. Given a dense tensor, the function detectstructure tries to detect structure present in the dataset; currently, only symmetry and Hankel structure are detected. If the detected structure corresponds to an efficient representation supported in Tensorlab, the representation is also returned. Finally, the function isvalidtensor determines if the given efficient representation is a valid tensor representation. In the following example, the tensor representation is not valid as the field size is not present:
T = struct;
T.val = [1 2 3];
T.ind = [1 2 3];
T.incomplete = true;
isvalidtensor(T)
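A brief sketch of the inspection commands, applied to the Hankel matrix from before; the values in the comments are what one would expect for this example:

Heff = hankelize(1:7,'full',false); % Efficient Hankel representation
getstructure(Heff)                  % returns 'hankel'
getorder(Heff)                      % returns 2
getsize(Heff)                       % returns [4 4]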
Tensor operations¶
Matricization and tensorization¶
A dense tensor T can be flattened or unfolded into a matrix with M = tens2mat(T,mode_row,mode_col). Let size_tens = size(T); then the resulting matrix M is of size prod(size_tens(mode_row)) by prod(size_tens(mode_col)). For example,
T = randn(3,5,7,9);
M = tens2mat(T,[1 3],[4 2]);
size(M)
outputs [21 45]. In tens2mat, a given column (row) of M is generated by fixing the indices corresponding to mode_col (mode_row) and then looping over the remaining indices in the order mode_row (mode_col). To transform a matricized tensor M back into its original size size_tens, use mat2tens(M,size_tens,mode_row,mode_col).
The most common use case is to matricize a tensor by placing its mode-\(n\) vectors as columns in a matrix, also called a mode-\(n\) matricization. This can be achieved by
T = randn(3,5,7);
n = 2;
M = tens2mat(T,n);
where the optional argument mode_col is implicitly equal to [1:n-1 n+1:ndims(T)].
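The columns of a mode-\(n\) matricization are the mode-\(n\) fibers of the tensor, and mat2tens undoes the unfolding. A minimal sketch:

T = randn(3,5,7);
M = tens2mat(T,2);             % 5x21 mode-2 matricization
isequal(M(:,1),T(1,:,1).')     % first column is the mode-2 fiber T(1,:,1)
Tback = mat2tens(M,size(T),2); % recovers T exactly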
The commands tens2mat
and mat2tens
currently only support dense tensors. For other matricization and detensorization methods, we refer to Tensorization techniques.
Tensor-matrix product¶
In a mode-\(n\) tensor-matrix product, the tensor's mode-\(n\) fibers are premultiplied by a given matrix. In other words, U*tens2mat(T,n) is a mode-\(n\) matricization of the mode-\(n\) tensor-matrix product \(\texttt{T} \tmprod_n \texttt{U}\). The function tmprod(T,U,mode) computes the tensor-matrix product \(\texttt{T} \tmprod_{\texttt{mode(1)}} \texttt{U\{1\}} \tmprod_{\texttt{mode(2)}} \texttt{U\{2\}}\, \cdots\). For example,
T = randn(3,5,7);
U = {randn(11,3),randn(13,5),randn(15,7)};
S = tmprod(T,U,1:3);
size(S)
outputs [11 13 15]. To compute a single tensor-matrix product, a cell array is not necessary. The following is an example of a single mode-2 tensor-matrix product:
T = randn(3,5,7);
S = tmprod(T,randn(13,5),2);
Note that tmprod currently only supports dense tensors.

The contract(T,v,n) command computes contractions of the tensor T with the vectors v in the modes given by n.
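A minimal sketch of contract, assuming it accepts a cell array of vectors and a corresponding vector of modes, as the description above suggests:

T = randn(3,5,7);
v = {randn(3,1),randn(7,1)};
s = contract(T,v,[1 3]); % contracts modes 1 and 3; result has length 5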
Kronecker and Khatri–Rao product¶
Tensorlab includes a fast implementation of the Kronecker product \(A \otimes B\) with the function kron(A,B), which overrides MATLAB's built-in implementation. Let \(A\) and \(B\) be matrices of size \(I\times J\) and \(K\times L\), respectively; then the Kronecker product of \(A\) and \(B\) is the \(IK \times JL\) matrix

\[A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1J}B\\ \vdots & \ddots & \vdots\\ a_{I1}B & \cdots & a_{IJ}B \end{bmatrix}.\]
More generally, kron(A,B,C,...) and kron(U) compute the Kronecker products \(((\texttt{A} \otimes \texttt{B}) \otimes \texttt{C}) \otimes \cdots\) and \(((\texttt{U\{1\}} \otimes \texttt{U\{2\}}) \otimes \texttt{U\{3\}}) \otimes \cdots\), respectively.
The Khatri–Rao product \(A\odot B\) can be computed by kr(A,B). Let \(A\) and \(B\) both be matrices with \(N\) columns; then the Khatri–Rao product of \(A\) and \(B\) is the column-wise Kronecker product

\[A \odot B = \begin{bmatrix} \vec{a}_1 \otimes \vec{b}_1 & \vec{a}_2 \otimes \vec{b}_2 & \cdots & \vec{a}_N \otimes \vec{b}_N \end{bmatrix}.\]
More generally, kr(A,B,C,...) and kr(U) compute the Khatri–Rao products \(((\texttt{A} \odot \texttt{B}) \odot \texttt{C}) \odot \cdots\) and \(((\texttt{U\{1\}} \odot \texttt{U\{2\}}) \odot \texttt{U\{3\}}) \odot \cdots\), respectively.
Note that kr and kron currently only support dense tensors.
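A minimal sketch checking the column-wise relation between kr and kron:

A = randn(4,3);
B = randn(5,3);
C = kr(A,B);                       % size 20x3
norm(C(:,2)-kron(A(:,2),B(:,2)))   % zero: column n of kr(A,B) is kron(A(:,n),B(:,n))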
Matricized-tensor times Kronecker and Khatri–Rao product¶
Matricized-tensor times Kronecker product
Often, algorithms for tensor decompositions do not explicitly need matricized tensors or Kronecker products, but rather need the following matricized tensor Kronecker product:
tens2mat(T,n)*conj(kron(U([end:-1:n+1 n-1:-1:1])))
where kron(...) represents the Kronecker product U{end} \(\otimes \cdots\otimes\) U{n+1} \(\otimes\) U{n-1} \(\otimes\cdots\otimes\) U{1}. The function mtkronprod(T,U,n) computes the result of this operation without explicitly computing either of the operands and without permuting the elements of the tensor T in memory.
Note that for n=0, the efficiently implemented product gives the same result as the following:
T(:).'*conj(kron(U(end:-1:1)))
For improved computational performance, it is recommended that the two largest dimensions of T are the first and last modes of T. The command mtkronprod supports dense, sparse, incomplete and structured tensors.
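A minimal sketch verifying mtkronprod against its explicit definition; the sizes and the choice n = 1 are arbitrary:

T = randn(4,5,6);
U = {randn(4,2),randn(5,2),randn(6,2)};
M1 = mtkronprod(T,U,1);
M2 = tens2mat(T,1)*conj(kron(U([3 2]))); % U{3} kron U{2}
frob(M1-M2)                              % close to machine precision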
Matricized-tensor times Khatri–Rao product
Similarly, algorithms for tensor decompositions usually do not explicitly need matricized tensors or Khatri–Rao products, but rather the following matricized tensor Khatri–Rao product:
tens2mat(T,n)*conj(kr(U([end:-1:n+1 n-1:-1:1])))
The function mtkrprod(T,U,n) computes the result of this operation without explicitly computing either of the operands and without permuting the elements of the tensor T in memory.
For n=0, the efficiently implemented product gives the same result as the following:
T(:).'*conj(kr(U(end:-1:1)))
For improved computational performance, it is recommended that the two largest dimensions of T are the first and last modes of T. The command mtkrprod supports dense, sparse, incomplete and structured tensors.
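An analogous sketch for mtkrprod; the sizes and the choice n = 2 are arbitrary:

T = randn(4,5,6);
U = {randn(4,3),randn(5,3),randn(6,3)};
M1 = mtkrprod(T,U,2);
M2 = tens2mat(T,2)*conj(kr(U([3 1]))); % U{3} kr U{1}
frob(M1-M2)                            % close to machine precision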
Frobenius norm¶
The Frobenius norm of a tensor is the square root of the sum of the squared moduli of its (known) elements. Given a tensor T, its Frobenius norm can be computed with frob(T). If the tensor is dense, this is equivalent to norm(T(:)), i.e., the two-norm of the vectorized tensor. The squared Frobenius norm can be computed with frob(T,'squared').
The command supports dense, sparse, incomplete and structured tensors. For structured tensors, the norm is computed in a very efficient way. Consider a third-order Hankel tensor \(H\) of size \(334\times 334\times 334\), constructed using the hankelize method (see Tensorization techniques). We compare the time needed to compute the Frobenius norm from the dense tensor H with the time needed using its efficient representation Heff:
[H,Heff] = hankelize(linspace(0,1,1000),'order',3);
tic; disp(frob(H)); toc; % Using the dense tensor H
tic; disp(frob(Heff)); toc; % Using the efficient representation of H
3.2181e+03
Elapsed time is 0.026401 seconds.
3.2181e+03
Elapsed time is 0.000832 seconds.
Inner and outer product¶
Given two tensors T1 and T2, the command inprod(T1,T2) computes the inner product T2(:)'*T1(:). The result is a scalar. The command supports dense, sparse, incomplete and structured tensors for both T1 and T2.
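For dense tensors, inprod can be checked directly against the vectorized inner product; a minimal sketch:

T1 = randn(3,4,5);
T2 = randn(3,4,5);
abs(inprod(T1,T2)-T2(:)'*T1(:)) % zero up to rounding errors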
The command outprod(T1,T2) computes the outer product of two tensors T1 and T2. If T1 has size \(I_1\times \cdots \times I_M\) and T2 has size \(J_1\times \cdots \times J_N\), then the resulting tensor has size \(I_1\times \cdots \times I_M \times J_1 \times \cdots \times J_N\). If T1, respectively T2, is a row or column vector, the singleton dimension is discarded. Currently, the command outprod supports only dense tensors.
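A minimal sketch: each element of the outer product is the product of the corresponding elements of T1 and T2.

T1 = randn(3,4);
T2 = randn(5,6,7);
S = outprod(T1,T2);                   % size(S) is [3 4 5 6 7]
abs(S(2,3,4,5,6)-T1(2,3)*T2(4,5,6))   % zero up to rounding errors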
Noisy versions of tensors and representations¶
The noisy command can be used to create a noisy version of dense, incomplete, sparse and structured tensors, given a signal-to-noise ratio in dB; the resulting relative error is \(10^{-\text{SNR}/20}\). For example,
T = randn(10,10,10);
SNR = 20;
That = noisy(T,SNR);
frob(That-T)/frob(T)
ans =
0.1000
Note that for efficient representations of incomplete, sparse and structured tensors, noise is added to the variables of the representation rather than to the full tensor. For example, given the efficient representation Heff of a \(3\times 3\) Hankel matrix H, a Hankel matrix is returned with constant perturbations along the anti-diagonals:
[H,Heff] = hankelize(1:5);
SNR = 20;
Heffhat = noisy(Heff,SNR);
H
ful(Heffhat)
H =
1 2 3
2 3 4
3 4 5
ans =
1.2691 2.2507 3.3571
2.2507 3.3571 3.5825
3.3571 3.5825 4.9737
The noisy command can be used for cell arrays as well. Noise is then added to every entry of the cell. For example:
U = {randn(10,2),randn(15,2),randn(20,2)};
SNR = 20;
Uhat = noisy(U,SNR);
creates a cell array U and a noisy version of that cell array called Uhat. This is equivalent to adding noise to the efficient representation of factorized tensors as discussed in the previous paragraph.
Visualizing higher-order datasets¶
Tensorlab offers four methods to visualize higher-order datasets. The methods slice3, surf3 and voxel3 accept third-order datasets, while the visualize method accepts tensors of any order. The following example demonstrates the first three methods on the amino acids dataset [16]:
url = 'http://www.models.life.ku.dk/go-engine?filename=amino.mat';
urlwrite(url,'amino.mat'); % Download amino.mat in this directory.
load amino X;
figure(1); voxel3(X);
figure(2); surf3(X);
figure(3); slice3(X);
The resulting plots are shown in Figure 30, Figure 31 and Figure 32, respectively.
visualize is a more advanced method for visual exploration of tensors of arbitrary order. It extracts and plots slices of order one, two or three. The other dimensions can then be varied to ‘walk through’ the data. The following example illustrates visualize for a tensor with smooth slices:
R = 4; degree = 3;
sz = [10 15 20 25];
U = cell(1, 4); % Factor matrices
for n = 1:4
% Use struct_poly to generate polynomial factor vectors
U{n} = struct_poly(rand(R,degree+1), [], 1:sz(n));
end
T = cpdgen(U);
visualize(T)
The result is shown in Figure 33. The check boxes can be used to select the dimensions that are plotted. By checking a single dimension, a mode-\(n\) vector is shown; checking two dimensions results in a surface plot, while checking three dimensions shows the slice3 plot of the data. In Figure 33 the slice \(\ten{T}(:,:,1,1)\) is plotted, since \(i_3=i_4=1\). Changing the sliders or the text field for dimensions \(i_3\) or \(i_4\) allows the user to view other slices. When hovering over a check box, label, slider or text field, a tooltip with extra information is shown.
The power of visualize lies in the ability to customize the plots and to view both the data tensor and its decomposition at the same time, as the following example demonstrates:
% Create dataset
R = 4; degree = 3;
sz = [10 15 20 25];
U = cell(1, 4); % Factor matrices
t = cell(1, 4); % Evaluation points
for n = 1:4
% Use struct_poly to generate polynomial factor vectors in points t{n}
t{n} = linspace(-0.5, 0.5, sz(n));
U{n} = struct_poly(rand(R,degree+1), [], t{n});
end
T = noisy(cpdgen(U), 30);
% Decompose tensor
Ures = cpd(T, R);
% Visualization
options = struct;
options.Original = T; % Original data = T
options.Trace = true; % Do not erase previous slices (only first-order)
options.DimensionTransform = t; % Use points t for the axes
options.DimensionLabels = 't_%d'; % Use t_1, t_2 etc to label the dimensions
visualize(Ures, options)
The result after clicking a few sliders is shown in Figure 34. Note that the dimensions have different labels and have the appropriate range as determined by the evaluation points. Going from a first-order to a second-order slice erases all traces again. By plotting the result and the data together, it is easier to see where the data is fitted well and where it is not. For an overview of the different options, see help visualize.

Figure 34: Visualization of a fourth-order tensor using visualize with original data, tracing and dimension transforms after changing some sliders.
As a final example, the support option is illustrated. Suppose we are only interested in the part of the data inside the cylinder \(t_2^2+t_4^2 \leq 0.5^2\); then we can prevent everything outside of this cylinder from being plotted. To do this, an additional function indicating the region of interest is created:
function region = support(dim, ind)
% Compute support region for selected dimension and transformed indices
% dim is a logical array with a 1 if that dimension is plotted
% ind is a cell of (possibly transformed) indices
% region is a vector or a matrix, depending on whether sum(dim) == 1 or 2, respectively.
[t2,t4] = ndgrid(ind{2},ind{4});
region = t2.^2 + t4.^2 <= 0.5^2;
The following code block generates Figure 35 for the same dataset as before:
options = struct;
options.DimensionTransform = t; % Use points t for the axes
options.DimensionLabels = 't_%d'; % Use t_1, t_2 etc to label the dimensions
options.Support = @support; % Use support region
visualize(Ures, options)