torchvision
How is the C++ code compiled, and how is it loaded in PyTorch?
from torchvision.io import read_file reads a binary file and returns its contents as
a 1-d tensor of dtype torch.uint8.
There is also write_file, which writes a 1-d tensor of dtype torch.uint8 to a file.
from torchvision.io import read_image reads and decodes an image file. Supported formats are
jpg, png, gif.
Color images are represented as a uint8 tensor of shape (channels, height, width).
transforms.v2._utils contains functions for checking and parsing sizes, paddings, fill.
Use v2.ToDtype(torch.float32, scale=True) to convert dtype; with scale=True the values are also rescaled (uint8 in [0, 255] becomes float in [0, 1]).
RandomAffine is in transforms/v2/_geometry.py and calls F.affine.
Affine has 4 parameters: angle for rotation, translate, scale, shear.
RandomAffine._get_params() just returns the 4 parameters.
Note that we can also call the static method RandomAffine.get_params(), which is defined in transforms/transforms.py.
F.affine() is defined in transforms/v2/functional/_geometry.py; the image version is named affine_image.
_get_inverse_affine_matrix() is used to get the matrix for the affine transform.
Inside F.affine(), it calls _affine_grid to get a sampling grid from the affine transform matrix and then uses _apply_grid_transform to transform the input.
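The grid-then-sample idea can be tried with the public torch.nn.functional calls. This is only a sketch of the same two-step pattern, not torchvision's private helpers; the 4x4 input and both matrices are made up for illustration. The matrix theta maps output coordinates to input coordinates, which is presumably why torchvision works with the inverse matrix:

```python
import torch
import torch.nn.functional as F

# Batch of one single-channel 4x4 image; grid_sample expects a float NCHW tensor.
img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# 2x3 affine matrix in normalized [-1, 1] coordinates (output -> input).
# This one is the identity.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# Step 1: build the sampling grid, shape (N, H, W, 2).
grid = F.affine_grid(theta, size=list(img.shape), align_corners=False)

# Step 2: sample the input at the grid locations.
out = F.grid_sample(img, grid, mode="bilinear", align_corners=False)

# Negating the x row of the matrix mirrors the image horizontally.
theta_flip = torch.tensor([[[-1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0]]])
grid_flip = F.affine_grid(theta_flip, size=list(img.shape), align_corners=False)
flipped = F.grid_sample(img, grid_flip, mode="nearest", align_corners=False)

print(out[0, 0])
print(flipped[0, 0])
```

With the identity matrix the grid points fall on the pixel centers, so the output reproduces the input up to float error.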
TODO: look at torch.nn.functional.affine_grid() and grid_sample(). See also https://github.com/wuneng/WarpAffine2GridSample/blob/master/main.py#L56
Deep Learning Paper Implementations: Spatial Transformer Networks - Part I