R(2+1)D Network


In this architecture, a 3D convolution is factorized into a 2D convolution followed by a 1D convolution: the 2D convolution operates over the spatial dimensions and the 1D convolution over the temporal dimension (see the sketch after the list below).
There are two benefits to this decomposition:
  1. The non-linearity increases, because each factorized convolution adds an extra activation between the spatial and temporal steps.
  2. Factorizing the 3D kernels makes optimization easier.
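To make the decomposition concrete, below is a minimal PyTorch sketch of a single (2+1)D block. The class name `Conv2Plus1D` and the channel sizes are illustrative, not taken from any particular implementation: a 2D spatial convolution with a (1, k, k) kernel, a BatchNorm + ReLU (the extra non-linearity mentioned above), then a 1D temporal convolution with a (k, 1, 1) kernel.

```python
import torch
import torch.nn as nn

class Conv2Plus1D(nn.Module):
    """A (2+1)D convolution: a 2D spatial conv followed by a 1D temporal conv.

    Tensors have shape (N, C, T, H, W). `mid_channels` is a free choice here;
    the paper picks it so the parameter count matches a full 3D convolution.
    """

    def __init__(self, in_channels, out_channels, mid_channels,
                 spatial_kernel=3, temporal_kernel=3):
        super().__init__()
        # 2D spatial convolution: kernel (1, k, k) touches only H and W.
        self.spatial = nn.Conv3d(
            in_channels, mid_channels,
            kernel_size=(1, spatial_kernel, spatial_kernel),
            padding=(0, spatial_kernel // 2, spatial_kernel // 2),
            bias=False,
        )
        self.bn = nn.BatchNorm3d(mid_channels)
        self.relu = nn.ReLU(inplace=True)
        # 1D temporal convolution: kernel (k, 1, 1) touches only T.
        self.temporal = nn.Conv3d(
            mid_channels, out_channels,
            kernel_size=(temporal_kernel, 1, 1),
            padding=(temporal_kernel // 2, 0, 0),
            bias=False,
        )

    def forward(self, x):
        # The ReLU between the two convolutions is the added non-linearity.
        return self.temporal(self.relu(self.bn(self.spatial(x))))


# Example: a clip of 8 RGB frames at 112x112 resolution.
clip = torch.randn(1, 3, 8, 112, 112)
block = Conv2Plus1D(in_channels=3, out_channels=64, mid_channels=45)
print(block(clip).shape)  # torch.Size([1, 64, 8, 112, 112])
```

Stacking such blocks in place of the 3D convolutions of a ResNet yields the R(2+1)D architecture described below.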


More details can be found here.


Figure: Residual network architecture. R(2+1)D networks are ResNets with (2+1)D convolutions; residual connections are omitted for interpretability.