Spatial Transformer Layer

Why it is needed

A CNN is not invariant to rotation or scaling, and it is invariant to translation only over small shifts (through pooling).

What the STL does

It rotates and scales the image and makes the parameters of this process learnable. In effect, it applies an affine transformation.
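To make the affine transformation concrete, here is a minimal sketch in plain Python. The helper names `make_theta` and `apply_affine` are made up for illustration, and the sketch assumes the common 2x3 parameterisation in which a point (x, y) is extended to (x, y, 1) before being multiplied by the matrix.

```python
import math

def make_theta(angle_rad, scale, tx, ty):
    """Build a 2x3 affine matrix: rotation, isotropic scale, then translation.

    Hypothetical helper; in an STL these six numbers would be learned.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[scale * c, -scale * s, tx],
            [scale * s,  scale * c, ty]]

def apply_affine(theta, x, y):
    """Map the point (x, y) through theta using homogeneous coordinates (x, y, 1)."""
    xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
    ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
    return xs, ys

# Rotate by 90 degrees, scale by 2, shift by 1 along x.
theta = make_theta(math.pi / 2, 2.0, 1.0, 0.0)
xs, ys = apply_affine(theta, 1.0, 0.0)
```

Because every output coordinate is a linear function of the six entries of `theta`, the transformation itself is easy to differentiate with respect to its parameters.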

Problem

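If the coordinates produced by the learned transformation are rounded to the nearest pixel, a small change in the parameters leaves the selected pixel position unchanged, so the gradient with respect to those parameters is 0.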
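The zero gradient can be checked numerically. The sketch below is a 1D toy example, not taken from the paper: `sample_nearest` and `finite_diff` are hypothetical helpers, and the signal values are arbitrary.

```python
def sample_nearest(signal, x):
    """Read the signal at the integer position nearest to x (rounding)."""
    return signal[int(round(x))]

def finite_diff(f, x, eps=1e-3):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

signal = [0.0, 10.0, 20.0, 40.0]

# Nudging x around 1.2 still rounds to index 1, so the output does not move.
grad_nearest = finite_diff(lambda x: sample_nearest(signal, x), 1.2)
```

Here `grad_nearest` comes out as 0.0, so gradient descent receives no signal about which direction to move the sampling position.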

Solution: Interpolation

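Instead of rounding, use bilinear interpolation. The basic idea is to estimate the gray value at a point from the gray values of the four pixels surrounding it.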
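A minimal sketch of bilinear interpolation on a grayscale image stored as a nested list follows. The function name `bilinear` and the border handling (clamping coordinates to the image) are assumptions of this sketch; x is taken as the column coordinate and y as the row coordinate.

```python
import math

def bilinear(img, x, y):
    """Estimate the gray value at real-valued (x, y) from its four neighbours.

    img is indexed as img[row][col]; coordinates outside the image are clamped.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom

img = [[0.0, 10.0],
       [20.0, 30.0]]
center = bilinear(img, 0.5, 0.5)  # equal weight on all four pixels
```

At the exact center of the four pixels each neighbour gets weight 0.25, so `center` is their average, 15.0.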

Detailed structure

3.1 Localisation Network

The localisation network function f_loc() can take any form, such as a fully-connected network or a convolutional network, but should include a final regression layer to produce the transformation parameters θ.
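As a rough sketch of that final regression layer, the code below maps a feature vector to the six affine parameters. Everything here is illustrative: the helper names are hypothetical, the feature vector stands in for whatever the earlier layers produce, and starting from zero weights with an identity bias is a common initialisation choice assumed for this example, so that the layer initially outputs the identity transform.

```python
def init_regression_layer(n_features):
    """Zero weights plus an identity bias: theta starts as the identity transform."""
    weights = [[0.0] * n_features for _ in range(6)]
    bias = [1.0, 0.0, 0.0,
            0.0, 1.0, 0.0]
    return weights, bias

def localisation_head(features, weights, bias):
    """Linear regression from features to theta, reshaped to a 2x3 matrix."""
    flat = [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]
    return [flat[0:3], flat[3:6]]

weights, bias = init_regression_layer(n_features=4)
theta = localisation_head([0.3, -1.2, 0.7, 2.0], weights, bias)
```

With this initialisation the transformer begins as a no-op, and training moves `theta` away from the identity only where that helps the task.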

3.2 Parameterised Sampling Grid

Use the parameters θ output by the previous stage to apply the affine transformation.
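One way to sketch the sampling grid: for every pixel of the output, compute the source coordinate it should be read from. The function `affine_grid` below is a hypothetical pure-Python helper, and it assumes coordinates are normalised to the range [-1, 1] along each axis.

```python
def affine_grid(theta, height, width):
    """For each output pixel, return the source coordinate (xs, ys) to sample.

    theta is a 2x3 affine matrix; coordinates are normalised to [-1, 1].
    """
    grid = []
    for i in range(height):
        yt = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            xt = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
zoom_in = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
grid_id = affine_grid(identity, 3, 3)
grid_zoom = affine_grid(zoom_in, 3, 3)
```

With `zoom_in`, the output corners read from the points at plus or minus 0.5 in the source, so the output shows the central region of the input enlarged.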

3.3 Differentiable Image Sampling

Then apply bilinear interpolation, which makes the sampling step differentiable.
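To see why this is differentiable, the sketch below returns the bilinearly sampled value together with its partial derivatives with respect to the sampling coordinates. The function name is hypothetical, and the sketch assumes (x, y) lies strictly inside the image so that all four neighbours exist.

```python
import math

def bilinear_with_grad(img, x, y):
    """Return (value, dV/dx, dV/dy) for bilinear sampling at (x, y).

    img is indexed as img[row][col]; (x, y) is assumed to lie inside the image.
    """
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    a, b = img[y0][x0], img[y0][x1]  # top-left, top-right
    c, d = img[y1][x0], img[y1][x1]  # bottom-left, bottom-right
    value = (1 - dy) * ((1 - dx) * a + dx * b) + dy * ((1 - dx) * c + dx * d)
    # The value is linear in dx and in dy, so the derivatives are pixel differences.
    dv_dx = (1 - dy) * (b - a) + dy * (d - c)
    dv_dy = (1 - dx) * (c - a) + dx * (d - b)
    return value, dv_dx, dv_dy

img = [[0.0, 10.0],
       [20.0, 30.0]]
value, dv_dx, dv_dy = bilinear_with_grad(img, 0.25, 0.75)
```

The derivatives are nonzero whenever neighbouring pixels differ, so by the chain rule the loss gradient can flow from the sampled value back through the coordinates to θ.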

Practical use

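This kind of layer can be applied to the input as well as to feature maps. Several units can also be placed in the same layer (treating each one as a neuron) to produce different Spatial Transformer outputs.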


Post author: fly理
Post link: https://flyleeee.github.io/2021/10/01/STL/
Copyright Notice: All articles in this blog are licensed under unless otherwise stated.