Optimization on manifold

In this post I would like to summarize my view of the pose optimization problem. Such problems often occur in robotics and related fields. Usually we want to jointly optimize poses, pose increments and various other measurements. What we are looking for is the set of parameters that minimizes the sum of residuals, i.e. the differences between the real measurements and the measurements we predict from our state.

Assume that we have a prior on the pose of our body origin $b$ w.r.t. the world origin $w$ at time $i$, $\mathbf{T}_{wb_i}$, a prior on the pose at time ${i+1}$, $\mathbf{T}_{wb_{i+1}}$, and the relative pose between them, $\mathbf{T}_{b_i b_{i+1}}$:

$$ \mathbf{T}_{w b_{i+1}} = \mathbf{T}_{w b_{i}}\mathbf{T}_{b_{i}b_{i+1}}. $$

In such a case we have only two poses and one pose difference between them. Real-life examples are usually much harder and include hundreds of parameters. Nevertheless, even in such a simple case, how can we formulate the problem in a way that we can optimize?

If we have two poses $\mathbf{T}_{wb_i}$ and $\mathbf{T}_{wb_{i+1}}$, we can define the measurement function $\mathbf{h}$ that predicts the relative pose between them:

$$ \mathbf{h}\left(\mathbf{T}_{wb_i}, \mathbf{T}_{wb_{i+1}}\right) = \hat{\mathbf{T}}_{b_{i}b_{i+1}} = \mathbf{T}_{wb_{i}}^{-1}\mathbf{T}_{wb_{i+1}} $$

Now, having the estimated and the real measurement, we can express their difference as:

$$ \hat{\mathbf{T}}_{b_{i}b_{i+1}}^{-1}\mathbf{T}_{b_{i}b_{i+1}}. $$

To find the minimum we should take the derivative w.r.t. the minimization parameters (our matrices!), set it to zero and solve for the parameters. Of course, we could try to solve such a problem directly, but imagine the beast that would come out of such an attempt. There should be a better way of doing this, and it relies on the following mathematical concepts:

  1. Define residuals on the tangent space of the manifold.
  2. Use a first-order approximation: it describes how the non-linear function behaves near the linearization point and allows us to deduce standard Jacobian blocks.
  3. Use this first-order approximation to rewrite the problem in the standard linear form, with Jacobians that depend only on the measurement function.

Residuals minimization on Manifold

Step 1

We want to minimize the following cost function:

$$ \left|\left|\log\left(\hat{\mathbf{T}}_{b_{i}b_{i+1}}^{-1}\mathbf{T}_{b_{i}b_{i+1}}\right)\right|\right|^2. $$

We can interpret this function intuitively (a small numerical sketch follows the list):

  • by the group definition, if the elements $\hat{\mathbf{T}}$ and $\mathbf{T}$ are equal, their composition $\hat{\mathbf{T}}^{-1}\mathbf{T}$ is the identity; if they differ, the composition captures this difference;
  • the $\log$ operation maps the difference from the manifold to the tangent space, $\log: \mathcal{M} \to \mathbb{R}^3$, which is a vector space;
  • the norm operator measures the size of the difference; since we use the $\ell^2$ norm, we can easily find the derivative.
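
To make this concrete, here is a minimal sketch for $SE(2)$ (the pose values are arbitrary and purely illustrative; the general-purpose matrix logarithm plays the role of $\log$ here, while the Implementation section below defines a closed-form Log for $SE(2)$):

import numpy as np
import scipy.linalg

def pose(x, y, theta):
    # throwaway helper: homogeneous SE(2) matrix from x, y, theta (same role as T() defined later)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0., 0., 1.]])

T_hat = pose(1.0, 2.0, 0.30)    # estimated relative pose (arbitrary values)
T_meas = pose(1.1, 1.9, 0.35)   # "measured" relative pose (arbitrary values)

# difference on the manifold, mapped to the tangent space by the matrix logarithm
D = np.linalg.inv(T_hat) @ T_meas
log_D = np.real(scipy.linalg.logm(D))                    # se(2) element in matrix form
tau = np.array([log_D[0, 2], log_D[1, 2], log_D[1, 0]])  # tangent coordinates [rho_x, rho_y, theta]
cost = tau @ tau                                         # squared l2 norm of the residual
print(tau, cost)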

Step 2

The solution of the problem $\mathbf{L} = \arg\underset{x}{\min}||\mathbf{A}\mathbf{x} - \mathbf{b}||^2$ is well-known and can be derived quite easily:

$$ \mathbf{x} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{b}. $$
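
For reference, a tiny numpy sketch of this solution with arbitrary illustration values (in practice np.linalg.lstsq is preferable to forming the normal equations explicitly):

import numpy as np

A = np.array([[1., 0.], [1., 1.], [1., 2.]])    # arbitrary overdetermined system
b = np.array([0.9, 2.1, 2.9])

x_normal = np.linalg.inv(A.T @ A) @ A.T @ b      # normal-equations solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # library solver, numerically preferable
print(np.allclose(x_normal, x_lstsq))            # expected: True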

The aforementioned cost function, however, has a different form. But let's use a first-order approximation of the measurement function $\mathbf{h}$, where $\delta\mathbf{x}$ is a state perturbation around our current estimate $\mathbf{\hat{x}}$:

$$\mathbf{h}(\mathbf{\hat{x}} + \delta\mathbf{x}) \approx \mathbf{h}(\mathbf{\hat{x}}) + \mathbf{J}^{\mathbf{h}}_{\mathbf{x}}(\mathbf{\hat{x}}) \delta \mathbf{x}$$

After a trivial rearrangement we can identify the standard form:

$$ \mathbf{L} = \arg\underset{\delta\mathbf{x}}{\min}\left|\left| \mathbf{J}^{\mathbf{h}}_{\mathbf{x}}(\mathbf{\hat{x}}) \delta \mathbf{x} - \left(\mathbf{h}(\mathbf{\hat{x}} + \delta\mathbf{x}) - \mathbf{h}(\mathbf{\hat{x}}) \right)\right|\right|^2 , $$

Noticing that $\mathbf{\hat{x}} + \delta\mathbf{x} = \mathbf{x}$, where $\mathbf{x}$ is the true state, we can substitute $\mathbf{h}(\mathbf{x})$ with our measurement $\mathbf{z}$ and simplify the expression one step further:

$$ \mathbf{L} = \arg\underset{\delta\mathbf{x}}{\min}\left|\left| \underbrace{\mathbf{J}^{\mathbf{h}}_{\mathbf{x}}(\mathbf{\hat{x}})}_{\vphantom{f}\mathbf{A}} \delta \mathbf{x} - \underbrace{\left(\mathbf{z} - \mathbf{h}(\mathbf{\hat{x}}) \right)}_{\vphantom{f}\mathbf{b}}\right|\right|^2. $$

In this form we clearly recognize the standard linear residual form! Since $\mathbf{b} = \log\left(\hat{\mathbf{T}}_{b_{i}b_{i+1}}^{-1}\mathbf{T}_{b_{i}b_{i+1}}\right)$ is already a vector, the only thing we still need is the Jacobian $\mathbf{J}^{\mathbf{h}}$.

Step 3

To find the Jacobian we can use the elementary blocks and the chain rule, as stated in [1]. Note that after this reformulation we need the Jacobian of the measurement function $\mathbf{h}$ w.r.t. the state, not w.r.t. the squared residual:

$$ \mathbf{J}^{\mathbf{h}}_{\mathbf{x}} = \frac{D \mathbf{h}(\mathbf{x})}{D \mathbf{x}}. $$

Having the measurement function

$$\mathbf{h}\left(\mathbf{T}_{wb_i}, \mathbf{T}_{wb_{i+1}}\right) = \hat{\mathbf{T}}_{b_{i}b_{i+1}} = \mathbf{T}_{wb_{i}}^{-1}\mathbf{T}_{wb_{i+1}}$$

we can use the Jacobians that have already been derived for us in the literature; for example, the Jacobians w.r.t. $\mathbf{T}_{wb_{i}}$ and $\mathbf{T}_{wb_{i+1}}$ are given by eq. (82) and (83) in [1]:

$$ \begin{gather} \mathbf{J}^\mathbf{h}_{\mathbf{x}_{\mathbf{T}_{wb_{i}}}} = -\mathbf{J}_l^{-1}\left(\mathbf{e}\right), \\\\ \mathbf{J}^\mathbf{h}_{\mathbf{x}_{\mathbf{T}_{wb_{i+1}}}} =\mathbf{J}_r^{-1}\left(\mathbf{e}\right), \end{gather} $$

where $\mathbf{e}=\log\left(\mathbf{T}_{wb_{i}}^{-1}\mathbf{T}_{wb_{i+1}}\right) \in \mathcal{T}_{b_i}\mathcal{M}$.

Equipped with the Jacobians, we can write down our toy problem. Assume that we know the prior robot poses $\{\mathbf{T}_{wb_{1}}, \mathbf{T}_{wb_{2}}, \mathbf{T}_{wb_{3}}\} \in SE(2)$ and the measured pose increments between them $\{\mathbf{\tau}_{b_{1}b_{2}}, \mathbf{\tau}_{b_{2}b_{3}} \} \in \mathcal{T}\mathcal{M}$. Our state consists of three poses, and we have two increments. Then we can define our problem using matrices:

$$ \underbrace{ \begin{bmatrix} -\mathbf{J}_l^{-1}\left(\log\left(\mathbf{T}_{wb_{1}}^{-1}\mathbf{T}_{wb_{2}}\right)\right) & \mathbf{J}_r^{-1}\left(\log\left(\mathbf{T}_{wb_{1}}^{-1}\mathbf{T}_{wb_{2}}\right)\right) & \mathbf{0}_{3\times3}\\\\ \mathbf{0}_{3\times3} & -\mathbf{J}_l^{-1}\left(\log\left(\mathbf{T}_{wb_{2}}^{-1}\mathbf{T}_{wb_{3}}\right)\right) & \mathbf{J}_r^{-1}\left(\log\left(\mathbf{T}_{wb_{2}}^{-1}\mathbf{T}_{wb_{3}}\right)\right) \end{bmatrix}}_{\mathbf{J}_{6\times 9}} \delta\mathbf{x} = \underbrace{ \begin{bmatrix} \mathbf{\tau}_{b_{1}b_{2}} - \log\left(\mathbf{T}_{wb_{1}}^{-1}\mathbf{T}_{wb_{2}}\right) \\\\ \mathbf{\tau}_{b_{2}b_{3}} - \log\left(\mathbf{T}_{wb_{2}}^{-1}\mathbf{T}_{wb_{3}}\right) \end{bmatrix}}_{\mathbf{r}_{6\times 1}} $$

This can be solved using least squares; the errors, or perturbations, are found as:

$$ \delta\mathbf{x} = \left(\mathbf{J}^T\mathbf{J}\right)^{-1}\mathbf{J}^T\mathbf{r} $$

then these errors are injected into the current, or nominal, state via the retraction: $\mathbf{x} \leftarrow \mathbf{x} \oplus \delta\mathbf{x}$, which for a pose means $\mathbf{T} \leftarrow \mathbf{T}\operatorname{Exp}(\delta\mathbf{x})$. The procedure is repeated until convergence. Doing so we distribute the total error evenly across all elements of the state vector. If we trust some residuals more than others, we can instead use weighted least squares and assign a weight to each residual based on our confidence in it.

Note that this problem can also be described using a factor graph [2]; this uncovers the close relationship between graphs and linear algebra.

Arbitrary measurement functions

The last thing I would like to tackle in this post is arbitrary measurement functions. Assume that we have a sensor that gives us range measurements to a landmark, and the landmark position $\mathbf{p}_{wl}$ is known. How can we add such a sensor to our setup? As usual, we have to define the measurement function $\mathbf{h}$:

$$ \mathbf{h}_{r}\left(\mathbf{T}_{wb}, \mathbf{p}_{wl}\right) = \sqrt{\mathbf{t}^T \mathbf{t}}, $$

where $\mathbf{t} = \begin{bmatrix}\mathbf{T}_{wb}^x - \mathbf{p}_{wl}^x \\ \mathbf{T}_{wb}^y - \mathbf{p}_{wl}^y\end{bmatrix}$ is the translation between the origin of the body frame and the landmark.

The only thing we need now is the derivative of $\mathbf{h}_r$ w.r.t. our state. First, let's resolve the landmark position in the body frame (doing so we don't need to compute the difference before the dot product):

$$ \mathbf{p}_{bl} = \mathbf{T}_{bw}\mathbf{p}_{wl} = \mathbf{T}_{wb}^{-1}\mathbf{p}_{wl} $$

substituting into $\mathbf{h}_{r}$ and extracting only the pose component of the state vector, we get:

$$ \mathbf{h}_{r}\left(\mathbf{T}_{wb}, \mathbf{p}_{wl}\right) = \sqrt{\left(\mathbf{T}_{wb}^{-1}\mathbf{p}_{wl}\right)^T \left(\mathbf{T}_{wb}^{-1}\mathbf{p}_{wl}\right)}, $$

We can then write our measurement function $\mathbf{h}_r$ as the composition $\mathbf{f} \circ \mathbf{g} \circ \mathbf{n}$ (a small numerical check follows the list), where

  • $\mathbf{n}\left(\mathbf{T}\right) = \mathbf{T}^{-1}$;
  • $\mathbf{g}\left(\mathbf{T}, \mathbf{p}\right) = \mathbf{T} \mathbf{p}$;
  • $\mathbf{f}\left(\mathbf{p}'\right) = \sqrt{\mathbf{p'}_x^2 + \mathbf{p'}_y^2}$.
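
A quick numerical check of this decomposition, with an arbitrary pose and landmark: resolving the landmark in the body frame and taking the norm gives the same range as differencing in the world frame.

import numpy as np

def pose(x, y, theta):
    # throwaway helper: homogeneous SE(2) matrix (same role as T() in the Implementation section)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0., 0., 1.]])

T_wb = pose(2., 4., 0.6)           # arbitrary body pose
p_wl = np.array([1.5, 3.75, 1.])   # landmark in homogeneous world coordinates

p_bl = np.linalg.inv(T_wb) @ p_wl                  # n then g: landmark in the body frame
h_world = np.hypot(*(T_wb[0:2, 2] - p_wl[0:2]))    # range from world-frame differences
h_body = np.hypot(p_bl[0], p_bl[1])                # f: range from body-frame coordinates
print(np.isclose(h_world, h_body))                 # expected: True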

The derivative w.r.t. the state can be found using the chain rule:

$$ \mathbf{J}^{\mathbf{f}}_{\mathbf{T}} = \mathbf{J}^{\mathbf{f}}_{\mathbf{p}'} \mathbf{J}^{\mathbf{T}^{-1}\mathbf{p}}_{\mathbf{T}^{-1}} \mathbf{J}^{\mathbf{T}^{-1}}_{\mathbf{T}}. $$

For $SE(2)$ the latter two are derived in [1] (eq. (166) and (62)). The derivation of $\mathbf{J}^{\mathbf{f}}_{\mathbf{p}'}$ is straightforward:

$$ \mathbf{J}^{\mathbf{f}}_{\mathbf{p}'} = \begin{bmatrix}\frac{\mathbf{p'}_x}{r} & \frac{\mathbf{p'}_y}{r}\end{bmatrix}, ~r = \sqrt{\mathbf{p'}_x^2 + \mathbf{p'}_y^2}. $$
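
As a sanity check (again with arbitrary values), this analytic Jacobian can be compared against a forward finite difference:

import numpy as np

def f(p):
    return np.hypot(p[0], p[1])     # range as a function of body-frame coordinates

p = np.array([0.3, -1.2])           # arbitrary body-frame point
r = f(p)
J_analytic = np.array([p[0] / r, p[1] / r])

eps = 1e-6
J_numeric = np.array([(f(p + eps * e) - f(p)) / eps for e in np.eye(2)])
print(np.allclose(J_analytic, J_numeric, atol=1e-5))    # expected: True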

Now, if we have two landmarks $\mathbf{l}_1$ and $\mathbf{l}_2$, and our robot sees the first landmark only at $\mathbf{T}_{wb_1}$ and the second landmark only at $\mathbf{T}_{wb_3}$, we have to add two extra rows to our matrix $\mathbf{J}$ to account for this data:

$$ \underbrace{ \begin{bmatrix} &\mathbf{J}_{6\times 9}& \\\\ \mathbf{J}^{\mathbf{f}}_{\mathbf{T}_{wb_1}} & \mathbf{0}_{1\times 3} & \mathbf{0}_{1\times 3} \\\\ \mathbf{0}_{1\times 3} & \mathbf{0}_{1\times 3} & \mathbf{J}^{\mathbf{f}}_{\mathbf{T}_{wb_3}} \end{bmatrix}}_{\mathbf{J}_{8\times 9}} \delta\mathbf{x} = \underbrace{ \begin{bmatrix} \mathbf{r}_{6\times 1} \\\\ r_1 - \mathbf{h}_r\left( \mathbf{T}_{wb_1}, \mathbf{p}_{wl_1} \right) \\\\ r_2 - \mathbf{h}_r\left( \mathbf{T}_{wb_3}, \mathbf{p}_{wl_2} \right) \end{bmatrix}}_{\mathbf{r}_{8\times 1}}, $$

where $r_1$ and $r_2$ are the range measurements.

Implementation

Now let's give a numerical example of all the aforementioned concepts and functions.

First, define all the needed operations for the Lie group and its associated algebra.

import numpy as np
import scipy.linalg

def Jr(tau):
    '''Right jacobian for SE2, eq. (163) from [1]. Assumes theta != 0.'''
    x, y, theta = tau.flatten()
    s, c = np.sin(theta), np.cos(theta)
    return np.array([[s / theta, (1 - c) / theta, (theta * x - y + y * c - x * s) / theta**2],
                     [(c - 1) / theta, s / theta, (x + theta * y - x * c - y * s) / theta**2],
                     [0., 0., 1.]])

def Jl(tau):
    '''Left jacobian for SE2, eq. (163) from [1]. Assumes theta != 0.'''
    x, y, theta = tau.flatten()
    s, c = np.sin(theta), np.cos(theta)
    return np.array([[s / theta, (c - 1) / theta, (theta * x + y - y * c - x * s) / theta**2],
                     [(1 - c) / theta, s / theta, (-x + theta * y + x * c - y * s) / theta**2],
                     [0., 0., 1.]])

def T(x, y, theta):
    '''Returns transformation matrix for given x, y and theta parameters.'''
    return np.array([[np.cos(theta), -np.sin(theta), x], [np.sin(theta), np.cos(theta), y], [0., 0., 1.]])

def Exp(tau):
    '''Exponential map: maps a tangent-space (Lie algebra) element to the group; also known as retraction. Assumes theta != 0.'''
    tau = tau.flatten()
    x, y, theta = tau[0], tau[1], tau[2]
    T = np.zeros((3, 3))
    T[0:2, 0:2] = scipy.linalg.expm(np.array([[0, -theta], [theta, 0]]))
    V_theta = np.sin(theta) / theta * np.eye(2) + (1. - np.cos(theta)) / theta * np.array([[0, -1], [1, 0]])
    p = V_theta @ np.array([[x], [y]])
    T[0:2, 2] = p.flatten()
    T[2, 2] = 1.0
    return T

def Log(T):
    '''Logarithmic map, the inverse of Exp: maps a group element back to the Lie algebra. Assumes theta != 0.'''
    x, y, theta = T[0, 2], T[1, 2], np.arctan2(T[1, 0], T[0, 0])
    V_theta = np.sin(theta) / theta * np.eye(2) + (1. - np.cos(theta)) / theta * np.array([[0, -1], [1, 0]])
    v = np.linalg.inv(V_theta) @ np.array([[x], [y]])
    return np.array([[v[0, 0]], [v[1, 0]], [theta]])

def Ad(T):
    '''Adjoint, linear transform that transforms tangent space at X to tangent space at identity.'''
    Ad = np.copy(T[:, :])
    Ad[0:2, 2] = -np.array([[0., -1.], [1., 0]]) @ T[0:2, 2]
    return Ad[:, :]

def J_df_dp(p):
    '''Jacobian of the range f(p) = sqrt(px^2 + py^2) w.r.t. p, padded with a zero for the homogeneous coordinate.'''
    px, py = p[0], p[1]
    r = np.sqrt(px**2 + py**2)
    return np.array([px / r, py / r, 0.])

def J_dTp_dT(T, p):
    '''Jacobian of the group action T @ p w.r.t. T (eq. (166) in [1]), padded to 3x3 for the homogeneous row.'''
    J = np.eye(3)
    J[0:2, 0:2] = T[0:2, 0:2]
    J[0:2, 2] = T[0:2, 0:2] @ np.array([[0, -1.], [1., 0]]) @ p
    
    return J
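
As a quick sanity check of these helpers (this snippet is not part of the original listing), Exp and Log should be inverse to each other for a non-zero rotation:

tau = np.array([[0.3], [-0.2], [0.5]])                         # arbitrary non-zero twist
print(np.allclose(Log(Exp(tau)), tau))                         # expected: True
print(np.allclose(Exp(Log(T(1., 2., 0.7))), T(1., 2., 0.7)))   # expected: True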

Now we select the true values, generate the measurements and add noise to them.

# true values
trueT_1 = T(2., 4., np.deg2rad(35))
trueT_2 = T(2., 3., np.deg2rad(50))
trueT_3 = T(4., 3., np.deg2rad(70))

# landmark positions (homogeneous coordinates)
L1 = np.array([1.5, 3.75, 1.])
L2 = np.array([4.5, 3., 1.])

# pose increments measured + noise
dT_noise = np.array([0.01, 0.01, 0.01])
T_12 = Log(np.linalg.inv(trueT_1) @ trueT_2) + np.random.normal(scale = dT_noise).reshape((3, 1))
T_23 = Log(np.linalg.inv(trueT_2) @ trueT_3) + np.random.normal(scale = dT_noise).reshape((3, 1))

# ranges measured + noise
dR_noise = 0.01
R1 = np.sqrt(np.sum((trueT_1[:, 2] - L1) ** 2)) + np.random.normal(scale = dR_noise)
R2 = np.sqrt(np.sum((trueT_3[:, 2] - L2) ** 2)) + np.random.normal(scale = dR_noise)

# noisy prior estimates
T_1 = trueT_1 @ Exp(np.random.normal(scale = 20 * dT_noise))
T_2 = trueT_2 @ Exp(np.random.normal(scale = 20 * dT_noise))
T_3 = trueT_3 @ Exp(np.random.normal(scale = 20 * dT_noise))
pT_1, pT_2, pT_3 = np.copy(T_1), np.copy(T_2), np.copy(T_3)

Fill in the Jacobians and solve the problem iteratively:

u_sigmas = np.array([0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.00001, 0.00001])
W = np.diagflat(1. / u_sigmas**2)  # information matrix Q^(-1), i.e. the inverse of the measurement covariance

for i in range(0, 1000):
    H = np.zeros((8, 9))
    est_T_12 = Log(np.linalg.inv(T_1) @ T_2)
    H[0:3, 0:3] = -np.linalg.inv(Jl(est_T_12))
    H[0:3, 3:6] =  np.linalg.inv(Jr(est_T_12))
    
    est_T_23 = Log(np.linalg.inv(T_2) @ T_3)
    H[3:6, 3:6] = -np.linalg.inv(Jl(est_T_23))
    H[3:6, 6:9] =  np.linalg.inv(Jr(est_T_23))
    
    est_range_R1 = np.sqrt(np.sum((T_1[:, 2] - L1) ** 2))
    # chain rule
    T1_inv = np.linalg.inv(T_1)
    L1_transformed = (T1_inv @ L1)[0:2]
    H[6, 0:3] = J_df_dp(L1_transformed) @ J_dTp_dT(T1_inv, L1[0:2]) @ -Ad(T_1)
    
    est_range_R2 = np.sqrt(np.sum((T_3[:, 2] - L2) ** 2))
    # chain rule
    T3_inv = np.linalg.inv(T_3)
    L2_transformed = (T3_inv @ L2)[0:2]
    H[7, 6:9] = J_df_dp(L2_transformed) @ J_dTp_dT(T3_inv, L2[0:2]) @ -Ad(T_3)
    
    residual = np.vstack((T_12 - est_T_12, 
                          T_23 - est_T_23, 
                          R1 - est_range_R1,
                          R2 - est_range_R2))
    
    delta = np.linalg.pinv(H.T @ W @ H) @ H.T @ W @ residual
    
    T_1 = T_1 @ Exp(delta[0:3, :])
    T_2 = T_2 @ Exp(delta[3:6, :])
    T_3 = T_3 @ Exp(delta[6:9, :])
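    
    # Optional refinement (not in the original post): stop early once the
    # Gauss-Newton update becomes negligible.
    if np.linalg.norm(delta) < 1e-9:
        break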

Finally, plot the true, initial and optimized trajectories together with the landmarks and the measured ranges:

import matplotlib.pyplot as plt
from matplotlib.patches import Circle
%config InlineBackend.figure_format='retina'

fig, ax = plt.subplots(1, 1, figsize = (7, 5))
ax.plot([trueT_1[0, 2], trueT_2[0, 2], trueT_3[0, 2]], [trueT_1[1, 2], trueT_2[1, 2], trueT_3[1, 2]], 'o-', label = 'True')
ax.plot([T_1[0, 2], T_2[0, 2], T_3[0, 2]], [T_1[1, 2], T_2[1, 2], T_3[1, 2]], 'o-', label = 'Optimized')
ax.plot([pT_1[0, 2], pT_2[0, 2], pT_3[0, 2]], [pT_1[1, 2], pT_2[1, 2], pT_3[1, 2]], 'o-', label = 'Initial')
ax.scatter([L1[0], L2[0]], [L1[1], L2[1]], marker='x', s = 120, label = 'Landmarks')
for l, r in zip((L1, L2), (R1, R2)):
    circle = Circle((l[0], l[1]), r, facecolor='none', edgecolor='tab:blue', linewidth=3, alpha=0.5)
    ax.add_patch(circle)
ax.axis("equal")
ax.legend()
ax.grid()
ax.set_title("True, initial and estimated trajectories")
ax.text(1, 2.25, "Circles - measured distances to the landmarks", c = "tab:blue", alpha = 0.8);

[Figure: true, initial and optimized trajectories; the circles show the measured distances to the landmarks]

References