Plover Tutorials

This part of the documentation gives worked examples of using Plover.

Table of Contents

Matrix Multiplications
Computing Correlation Vectors
Programming a QR Least Squares Solver
Citations

Matrix Multiplications

A quick application of using Plover is to create bespoke functions which operate on matrices. A very basic example is a matrix times a vector:

mul_mat_vec {m,n} (A :: double[m,n]) (v :: double[n]) :: double[m]
  := A * v;

This function takes two implicit arguments (m and n) which give the dimensions of the \(m\times n\) matrix A and an \(n\)-dimensional vector v. Putting this into a file called mats.plv, we compile it with Plover using

$ plover mats.plv

which creates two files, mats.c and mats.h. The resulting C definition is

void mul_mat_vec (const s32 m, const s32 n,
                  const double * A, const double * v,
                  double * result)
{
    for (s32 idx = 0; idx < m; idx++) {
        double sum = 0;

        for (s32 idx2 = 0; idx2 < n; idx2++) {
            sum += A[n * idx + idx2] * v[idx2];
        }
        result[idx] = sum;
    }
}

with a corresponding definition in the header file. The rules for creating the C function type from a Plover type are simple: the parameters for the C function appear in the same order as they appear in the Plover definition, and if the result is a vector type (i.e., non-scalar), it is appended as the last argument. The caller is responsible for allocating the memory of the returned vector.

Another example is computing a quadratic form \(v^TAv\):

mat_quad_form {n} (A :: double[n,n]) (v :: double[n]) :: double
  := (v^T * A * v)[0];

Multiplication is left-associative, so the inner expression is the same as (v^T * A) * v. The rule for taking the transpose of an \(n\)-dimensional vector is that it becomes a \(1\times n\)-dimensional matrix, and so the product with the \(n\times n\) matrix A, giving a \(1\times n\) matrix. This is then multiplied by the vector, and a matrix times a vector results in a vector, in this case \(1\)-dimensional. This is indexed to get the sole entry. Alternatively, the body of the function could equivalently be (A * v) * v, without indexing, where the second * is a dot product between vectors.

Compiling this, the resulting C is

double mat_quad_form (const s32 n,
                      const double * A, const double * v)
{
    double sum = 0;

    for (s32 idx = 0; idx < n; idx++) {
        double sum2 = 0;

        for (s32 idx2 = 0; idx2 < n; idx2++) {
            sum2 += v[idx2] * A[n * idx2 + idx];
        }
        sum += sum2 * v[idx];
    }
    return sum;
}

Notice that the Plover compiler does not generate the intermediate matrix product. This is because the later multiplication informs the first that it will only be evaluating each element of the first at most once, so the elements may be computed on demand.

Suppose that the vector were instead a matrix. That is,

mat_quad_prod {n,m} (A :: double[n,n]) (B :: double[n,m]) :: double[m,m]
  := B^T * A * B;

Plover generates the following code:

void mat_quad_prod (const s32 n, const s32 m,
                    const double * A, const double * B,
                    double * result)
{
    double tmp [m * n];

    for (s32 idx = 0; idx < m; idx++) {
        for (s32 idx2 = 0; idx2 < n; idx2++) {
            double sum = 0;

            for (s32 idx3 = 0; idx3 < n; idx3++) {
                sum += B[m * idx3 + idx] * A[n * idx3 + idx2];
            }
            tmp[n * idx + idx2] = sum;
        }
    }
    for (s32 idx = 0; idx < m; idx++) {
        for (s32 idx2 = 0; idx2 < m; idx2++) {
            double sum = 0;

            for (s32 idx3 = 0; idx3 < n; idx3++) {
                sum += tmp[n * idx + idx3] * B[m * idx3 + idx2];
            }
            result[m * idx + idx2] = sum;
        }
    }
}

Since the second product will use the elements of the first product multiple times, the compiler memoizes (or spills) the result onto the stack. If n is large relative to m this might be unacceptable behavior on an embedded system, and we may be willing to trade stack space for computation time. Plover gives the nomemo operator to control whether a request to memoize an intermediate value will be acknowledged. This does not change the result of a computation, but it might change whether the computation is computable on a given system. Concretely:

mat_quad_prod_safe {n,m} (A :: double[n,n]) (B :: double[n,m]) :: double[m,m]
  := nomemo (B^T * A) * B;

gives

void mat_quad_safe (const s32 n, const s32 m,
                    const double * A, const double * B,
                    double * result)
{
    for (s32 idx = 0; idx < m; idx++) {
        for (s32 idx2 = 0; idx2 < m; idx2++) {
            double sum = 0;

            for (s32 idx3 = 0; idx3 < n; idx3++) {
                double sum2 = 0;

                for (s32 idx4 = 0; idx4 < n; idx4++) {
                    sum2 += B[m * idx4 + idx] * A[n * idx4 + idx3];
                }
                sum += sum2 * B[m * idx3 + idx2];
            }
            result[m * idx + idx2] = sum;
        }
    }
}

Computing Correlation Vectors

The cross-correlation of two (real-valued) signals \(f\) and \(g\) is defined to be

\begin{equation*} (f \star g)[i] = \sum_{j=-\infty}^\infty f[j] g[j+i] \end{equation*}

For periodic signals with period \(N\), the bounds of the sum can be restricted to having length \(N\), and, if we model \(f\) and \(g\) as being vectors of length \(N\), this amounts to a dot product of \(f\) by a cyclic shift of \(g\).

In Plover, the right-hand side can be written as f * (g[i:] # g[:i]). The * operator computes the dot product when given vector operands, and the # operator concatenates two vectors. Like in Python or Matlab, vectors can be sliced by indexing them with a range of values. The lower- or upper- bounds may be omitted on the range operator :, and they default to the bounds of the sliced vector. Here, i: and :i are equivalent to the vectors vec(i,i+1,...,N-1) and vec(0,1,...,i-1), respectively.

With this, we can make a function which computes all of the cross-correlations of two vectors of length N:

cross_cor {N} (f :: double[N]) (g :: double[N]) :: double[N] :=
  vec i in N -> f * (g[i:] # g[:i]);

The declaration for the function gives N as an implicit parameter, f and g as vectors of length N with double-precision floating-point numbers as values, and a return type which is also a vector of length N with doubles as values.

The body of the function is given after the := definition operator. The body creates a new vector of length N with i iterating over that range, computing the cross-correlation for each offset i.

Note

The cross-correlation function can also be written as:

cross_cor {N} (f :: double[N]) (g :: double[N]) :: double[N] :=
  vec i in N -> f * g[(i:i+N) % N];

since arithmetic operators coerce their operands into vectors of compatible size and then vectorize the operation elementwise.

Auto-correlation is the cross-correlation of a vector with itself. Given the above definition, we may write

auto_cor {N} (f :: double[N]) :: double[N] :=
  cross_cor f f;

Since the first argument to cross_cor is implicit, Plover will try to determine a valid argument to place there, which it will determine using f and the return type for auto_cor. If we wish to be explicit, we may instead write cross_cor {N} f f to pass the implicit argument N.

Putting these into a file called cor.plv, we may compile them to C by entering the directory and running

$ plover cor.plv

This creates a files called cor.h and cor.c, with cor.c containing the following definitions:

void cross_cor (const s32 N, const double * f, const double * g, double * result)
{
    for (s32 idx = 0; idx < N; idx++) {
        s32 i = idx;
        double sum = 0;

        for (s32 idx2 = 0; idx2 < N; idx2++) {
            double tmp;

            if (idx2 < -i + N) {
                tmp = g[i + idx2];
            } else {
                tmp = g[idx2 - (-i + N)];
            }
            sum += f[idx2] * tmp;
        }
        result[idx] = sum;
    }
}
void auto_cor (const s32 N, const double * f, double * result)
{
    cross_cor(N, f, f, result);
}

The auto-correlation of a random vector approximates a delta function as the length of the vector goes to infinity. We will make a test which demonstrates this.

import prelude;

main () :: int :=
  ( v := normalize (vec i in 1000 -> rand_normal());
    print_vec $ auto_cor v;
    return 0;
  );

This imports the standard Plover library, and defines a main function. The function creates a vector of 1000 random doubles, normally distributed, and normalizes the vector so that its Euclidean length is 1. After this, the auto-correlation vector is printed. In Plover, the dollar sign acts like an open parenthesis which extends to the end of an expression.

Compiling and running this file with:

$ plover cor.plv
$ gcc cor.c prelude.c -o cor
$ ./cor

shows that the 0th element is 1.0 (since this entry represents the dot product of the vector with itself), and the rest are relatively small. We can measure this with the following code:

vec_mean {n} (v :: double[n]) :: double :=
  sum v / n;

main () :: int :=
  ( w := vec N in 2:2000 -> (
           v := normalize $ vec i in N -> rand_normal();
           av := auto_corr v;
           vec_mean $ av[1:] .* av[1:]
         );
    print_vec w;
    return 0;
  );

The .* operator is the point-wise product ("Hadamard" product) and effectively squares each element in the array. So w is a vector of the mean of the squares of the non-zero-offset auto-correlations for various sizes of random vectors. Plotting this vector in a graphing application, one can see the errors decrease with the inverse of the size of the vectors.

Programming a QR Least Squares Solver

We're going to step through the implementation of a textbook algorithm: a QR least squares solver for overdetermined systems. That is, given \(A\in \R^{m\times n}\) and \(b\in \R^m\) with \(m \geq n\), we want to find a vector \(x\in\R^n\) such that the squared error \(\norm{Ax-b}\) is minimized. We give a brief explanation of the math first, following [MC2013].

Suppose we can compute an orthogonal matrix \(Q\in\R^{m\times m}\) such that

\begin{equation*} Q^T A = R = \begin{bmatrix}R_1 \\ 0 \end{bmatrix} \begin{matrix} \scriptstyle n \\ \scriptstyle m-n \end{matrix} \end{equation*}

where \(R_1\in \R^{n\times n}\) is upper triangular. Then we can write

\begin{equation*} Q^T b = \begin{bmatrix}c \\ d \end{bmatrix} \begin{matrix} \scriptstyle n \\ \scriptstyle m-n \end{matrix} \end{equation*}

and we note that

\begin{equation*} \norm{Ax-b} = \norm{Q^TAx-Q^Tb} = \norm{R_1x-c}+\norm{d} \end{equation*}

If \(A\) is full rank, we can solve the system \(R_1x=c\) exactly, and the remaining error is \(d\).

The essential steps are:

Apply orthogonal transformations to \(A\) and \(b\) until \(A\)'s first \(n\) rows are upper triangular.
Backsolve the triangular system.
Return the solution and the error.

We will attempt to mindlessly follow the implementation given in [MC2013].

QR Factorization

First we apply a sequence of Givens rotations to introduce zeroes into our matrix \(A\). A Givens rotation is a pair \(c=\cos(\theta)\) and \(s=\sin(\theta)\) such that

\begin{equation*} \begin{bmatrix} c & s \\ -s & c \end{bmatrix}^T \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix} . \end{equation*}

Pseudocode to calculate such a pair follows:

givens (a, b) returns [\(c, s\)]

if \(b = 0\)

\(c = 1; s = 0\)

else if \(\abs{b} > \abs{a}\)

\(\tau = -a/b; s = 1/\sqrt{1+\tau^2}; c = s\tau\)

else

\(\tau = -b/a; c = 1/\sqrt{1+\tau^2}; s = c\tau\)

The plover version is below. We will step through it line by line.

static givens (a :: double) (b :: double) :: double[2,2] := (
  c :: double;
  s :: double;
  if b == 0 then (
    c <- 1; s <- 0
  ) else if fabs b > fabs a then (
    tau := -a/b; s <- 1/sqrt(1+tau^2); c <- s*tau
  ) else (
    tau := -b/a; c <- 1/sqrt(1+tau^2); s <- c*tau;
  );

  mat( c, s ;
      -s, c );
);

static givens (a :: double) (b :: double) :: double[2,2] := (
  ...
);

Plover is statically typed, and all functions require a type signature, although in many cases types can be inferred (see the language reference section on type holes). This function signature indicates that the function givens is static (declared static in the generated C file, with no prototype in the header file), takes two arguments a and b of type double, and returns a 2-by-2 matrix of doubles. Function declarations and variable initializations use the := operator and must be terminated by a semicolon. Blocks are enclosed by parentheses, and the expressions within a block are separated by semicolons.

c :: double;
s :: double;

A new local variable may either be declared with a type (var :: type;) or with an initial value (var :: optional_type := value;), in which case the type is optional and can be inferred. In this case, c and s are set by each branch of the if expression below, so we declare them with only their types. Declaring a variable does not initialize it.

if b == 0 then (
  c <- 1; s <- 0
)

The condition of an if expression does not need enclosing parentheses. The condition must be followed by the keyword then and an expression. In this case, we have a block which updates the values of c and s. Updating variables must be done with <- . An if should be terminated by a semicolon when used as a statement.

Note

In Plover, everything is an expression. Sometimes it is convenient to call an expression appearing in a block a statement.

else if fabs b > fabs a then (
  tau := -a/b; s <- 1/sqrt(1+tau^2); c <- s*tau
) else (
  tau := -b/a; c <- 1/sqrt(1+tau^2); s <- c*tau
);

if statements are optionally followed by an else clause and another expression or block. Here, the fabs function is called on b and on a, and these values are compared to choose the branch. The local tau is initialized, and c and s are updated as shown in the pseudocode above. The fabs and sqrt functions are included in Plover's prelude module.

mat( c, s ;
    -s, c );

The final expression in a block is treated as the value for that block. Here, the function returns a 2-by-2 matrix literal containing the values we've just computed. The output code below shows how this is translated into C.

Let's take a look at the C code generated so far:

// excerpt, qr.c
static void givens (const double a, const double b, double * result);
void givens (const double a, const double b, double * result)
{
    double c;
    double s;

    if (b == 0) {
        c = 1;
        s = 0;
    } else {
        if (fabs(a) < fabs(b)) {
            double tau;

            tau = -(a / b);
            s = 1 / sqrt(1 + tau * tau);
            c = s * tau;
        } else {
            double tau;

            tau = -(b / a);
            c = 1 / sqrt(1 + tau * tau);
            s = c * tau;
        }
    }
    result[2 * 0] = c;
    result[2 * 0 + 1] = s;
    result[2 * 1] = -s;
    result[2 * 1 + 1] = c;
}

We can see that Plover passes the result matrix as an extra argument, and stores the dense matrix in a flat array in row-major order. Arguments are by default passed const, but modifications are allowed with the out and inout parameter options; see the language reference. The rest of the code matches the input closely.

Now we will use this routine to factor our matrix. Starting at the lower left corner, we go up and then right, introducing zeros with one Givens rotation at a time. Pseudocode from [MC2013]:

qr_factor (m, n, A)

for \(j = 1:n\)

for \(i=m:-1:j+1\)

\(R\) = givens(\(A(i-1,j),A(i,j)\))

\(A(i-1:i,j:n) = R^T A(i-1:i, j:n)\)

We pick a rotation that introduces a zero at location \((i,j)\) and apply it to rows i and i-1 of \(A\), updating them in-place. Note that the second loop counts down from \(m\) to \(j+1\), and the arrays are one-indexed.

The Plover code:

qr_update {m, n}
  (inout b :: double[m])
  (inout A :: double[m, n])
  :: Void := (

    for j in 1 .. n,
        i in m .. j+1 : -1 -> (

      -- Givens rotation
      rot := givens A[i-2,j-1] A[i-1,j-1];
      -- Rotate one column at a time
      for k in j..n -> (
        v := A[i-2 .. i-1, k-1];
        A[i-2 .. i-1, k-1] <- rot^T * v;
      );

      -- Rotate b vector
      v := b[i-2 .. i-1];
      b[i-2 .. i-1] <- rot^T * v;

  );
);

qr_update {m, n}
  (inout b :: double[m])
  (inout A :: double[m, n])

We use inout variables, mutating b and A as we go along. This way, we never store the Q matrix and simply return the upper triangular rotation of A.

{m, n} denotes that qr_update takes two implicit int parameters. The function qr_update can be called simply with the (explicit) b and A arguments, and m and n will be inferred. If the dimensions of the explicit arguments don't match, Plover will report a type error. See the language reference for more details.

for j in 1 .. n,
    i in m .. j+1 : -1 -> (

  -- Givens rotation
  rot := givens A[i-2,j-1] A[i-1,j-1];
  ...
);

Plover uses zero-indexing, but we keep the same loop bounds to avoid too much confusion in the translation. The expression a..b denotes the sequence of integers from a to b, inclusive, whereas a:b excludes the upper bound. The expression (a..b : -1) means: count from a to b with step size -1.

-- Rotate one column at a time
for k in j..n -> (
  v := A[i-2 .. i-1, k-1];
  A[i-2 .. i-1, k-1] <- rot^T * v;
);

We rotate one column at a time so that we can use a two element temporary vector v to avoid overwiting elements of A while they are still needed by the product computation. Currently, Plover will not warn you and will not automatically make a copy of the right hand side if one is needed to properly compute an update statment a <- b.

These lines also demonstrate the submatrix indexing facilities of Plover. We often use the notation v[a:b] to take the subvector of v at indices from a to b-1. These expressions can be used as l-values and as r-values, as above. They can also be passed as out arguments to a function, and the proper subvector will be updated. We can take subranges of objects with multiple indices as well: taking a row of a matrix is accomplished with M[i] or M[i, :], and taking a column is simply M[:, i]. A colon without upper or lower bounds is filled in appropriately.

Backsolving

Now we have a square upper triangular constraint matrix and a target vector; we can solve this one row at a time, starting with the last.

For an upper-triangular system \(Rx=b\), the value of \(x_i\) is given by

\begin{equation*} x_i = \left. \left(b_i - \sum_{j=i+1}^n R_{ij}x_j\right)\middle/ R_{ii} \right. . \end{equation*}

The algorithm will overwrite b[i] with this value, since it is not needed by later steps.

-- Back substitution for upper triangular U
static backsolve {n}
  (U :: double[n,n])
  (inout b :: double[n])
  :: s8 := (
    for i in 0:n ->
      if U[i,i] == 0 then
        return -1;

    b[n-1] <- b[n-1]/U[n-1, n-1];

    for i in n-1 .. 1 : -1 -> (
      b[i-1] <- (b[i-1] - U[i-1, i : n] * b[i : n]) / U[i-1, i-1];
    );

    return 0;
);

The * inside the for loop is shorthand for a dot product. We add a check to see if any of the diagonal entries are 0 and return an error code as a signed byte.

Complete Solver

Finally, the completed algorithm:

-- Assumes m >= n
-- See "Matrix Computations" 4th ed. Golub and Van Loan
qr_solve {m, n}
  (inout A :: double[m, n])
  (inout b :: double[m])

  (out solution :: double[n])
  (out residual :: double)

  :: s8 := (

  qr_update (inout b) (inout A);

  -- A is now upper triangular; backsolve it into b
  code := backsolve A[0:n, 0:n] (inout b[0:n]);

  -- Solution stored in first n elements
  solution <- b[0:n];

  -- Norm of error = norm of last m-n elements
  residual <- norm b[n:m];

  return code;
);

Note the way implicit arguments are resolved.

The generated C:

s8 qr_solve (const s32 m, const s32 n, double * A, double * b, double * solution, double * const residual)
{
    qr_update(m, n, b, A);

    s8 code;
    double arg [n * n];
    double arg2 [n];

    for (s32 idx = 0; idx < n; idx++) {
        for (s32 idx2 = 0; idx2 < n; idx2++) {
            arg[n * idx + idx2] = A[n * idx + idx2];
        }
    }
    for (s32 idx = 0; idx < n; idx++) {
        arg2[idx] = b[idx];
    }
    code = backsolve(n, arg, arg2);
    for (s32 idx = 0; idx < n; idx++) {
        b[idx] = arg2[idx];
    }
    for (s32 idx = 0; idx < n; idx++) {
        solution[idx] = b[idx];
    }

    double arg3 [m - n];

    for (s32 idx = 0; idx < m - n; idx++) {
        arg3[idx] = b[n + idx];
    }
    *residual = norm(m - n, arg3);
    return code;
}

The copying around inout b[0:n] is a bit inefficient in this case, but similar logic is needed for more complex matrix storage types.

// qr.h
#ifndef PLOVER_GENERATED_qr
#define PLOVER_GENERATED_qr

#include "prelude.h"

s8 qr_solve (const s32 m, const s32 n, double * A, double * b, double * solution, double * const residual);
void qr_update (const s32 m, const s32 n, double * b, double * A);
s32 main (void);


#endif /* PLOVER_GENERATED_qr */

Citations

[MC2013]

(1, 2, 3) G.H. Golub and C.F. Van Loan (2013). Matrix Computations, 4th ed. The Johns Hopkins University Press, Baltimore, MD.

Authors:	Scott Kovach Kyle Miller
Modified:	August 2015