Author: Christian M.M. Frey
E-Mail: christianmaxmike@gmail.com
PyTorch has been established as one of the most optimized high-performance tensor libraries for deep learning computations on GPUs and CPUs. It is a Python-based library built on the Torch framework and developed by Facebook's Artificial Intelligence Research group. As a scientific computing package it is targeted at two sets of audiences: users looking for a NumPy-like library that can exploit the power of GPUs, and researchers looking for a flexible and fast deep learning platform. PyTorch provides a large collection of modules for computation, three of them being particularly prominent: the tensor library itself (torch), automatic differentiation (torch.autograd), and neural network building blocks (torch.nn).
In this tutorial, we take our first steps in PyTorch and show how to create tensors and how to execute basic operations on them.
i) Open the Anaconda navigator and go to the ‘Environments’ page
ii) Select your preferred environment, search for 'pytorch' and install it. Alternatively, open the terminal, make sure your environment is activated (source activate my_env) and type the following:
conda install pytorch
iii) Launch jupyter and open a notebook of your choice
iv) Type the following command to check whether the PyTorch library is installed and check the version
import torch
torch.__version__
'1.9.0'
Create an empty tensor of size (5,5), i.e., a 2-dimensional matrix with 5 rows and 5 columns. Note that torch.empty() returns an uninitialized tensor, so the entries are whatever values happen to be in memory.
torch.empty(5,5)
tensor([[0.0000e+00, 0.0000e+00, 1.4013e-45, 0.0000e+00, 1.4013e-45],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 2.8026e-45, 0.0000e+00],
[9.4885e-39, 3.4438e-41, 1.4013e-45, 0.0000e+00, 1.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 2.8026e-45, 0.0000e+00],
[9.4886e-39, 3.4438e-41, 1.4013e-45, 0.0000e+00, 1.0000e+00]])
Create a tensor from the Python list [[20, 30, 40], [90, 60, 70]], i.e., a 2x3 matrix with the given entries.
torch.FloatTensor([[20, 30, 40], [90, 60, 70]])
tensor([[20., 30., 40.],
[90., 60., 70.]])
Create random tensors (torch.randn()) of various sizes and print them on the console.
# Construct an uninitialized matrix (torch.Tensor(3,3) allocates memory but does not initialize the entries)
a = torch.Tensor(3,3)
print(a)
# Construct a random matrix drawn from a standard normal distribution
print(torch.randn(4,4))
tensor([[2.0033e-24, 4.5873e-41, 2.0183e-24],
[4.5873e-41, 2.0183e-24, 4.5873e-41],
[1.6147e-24, 4.5873e-41, 2.0162e-24]])
tensor([[-1.0667, -2.2027, 0.9573, -1.3560],
[ 0.7074, 0.5609, -0.4915, -0.1674],
[-0.0229, -1.0087, 0.8788, 0.1121],
[-0.4559, -0.0781, 0.9201, 0.2624]])
Print the size of a matrix by using the function tensor.size()
print(a)
print(a.size())
tensor([[2.0033e-24, 4.5873e-41, 2.0183e-24],
[4.5873e-41, 2.0183e-24, 4.5873e-41],
[1.6147e-24, 4.5873e-41, 2.0162e-24]])
torch.Size([3, 3])
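As a side note, tensor.shape is an alias for tensor.size(), and size() also accepts a dimension index. A quick sketch:
print(a.shape)    # torch.Size([3, 3]) -- equivalent to a.size()
print(a.size(0))  # 3 -- the size along dimension 0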
Special matrices/tensors: ones tensor; zeros tensor; identity tensor
# Create a tensor with '0' entries of size (5,3)
torch.zeros(5,3)
# Create a tensor with '1' entries of size (2,2)
torch.ones(2,2)
# Create the identity matrix (matrix having '1' on the diagonal, else '0' on the off-diagonals)
torch.eye(6)
tensor([[1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1.]])
There are multiple syntaxes for operations, which we will learn in the following. First, let's create two random tensors of the same size.
x = torch.rand(3,3)
y = torch.rand(3,3)
There are three different ways to add two tensors:
# Alt 1: using the '+' operator
z = x + y
print ("z:", z, "\nx: ", x, "\ny:", y)
z: tensor([[1.3688, 0.7951, 1.4088],
[0.4504, 0.5946, 1.3274],
[1.7864, 1.1904, 1.1175]])
x: tensor([[0.4275, 0.4471, 0.9603],
[0.0266, 0.0864, 0.9078],
[0.8902, 0.9344, 0.8658]])
y: tensor([[0.9413, 0.3480, 0.4485],
[0.4239, 0.5082, 0.4195],
[0.8963, 0.2560, 0.2518]])
# Alt 2: using pytorch's built-in function torch.add(.,.)
torch.add(x,y)
tensor([[1.3688, 0.7951, 1.4088],
[0.4504, 0.5946, 1.3274],
[1.7864, 1.1904, 1.1175]])
In-place operations are postfixed with an “_”.
# Alt 3: calling the tensor's add_ function.
x.add_(y) # inplace add-operation
tensor([[1.3688, 0.7951, 1.4088],
[0.4504, 0.5946, 1.3274],
[1.7864, 1.1904, 1.1175]])
print(x)
tensor([[1.3688, 0.7951, 1.4088],
[0.4504, 0.5946, 1.3274],
[1.7864, 1.1904, 1.1175]])
For the multiplication of a tensor with a scalar, there are again the three different variants from above.
# scalar multiplication
# Alternative 1
print (x * 5)
# Alternative 2
print (torch.mul(x,5))
# Alternative 3
print (x.mul_(5))
tensor([[ 855.4833, 496.9533, 880.5090],
[ 281.5063, 371.6346, 829.6192],
[1116.5049, 744.0276, 698.4658]])
tensor([[ 855.4833, 496.9533, 880.5090],
[ 281.5063, 371.6346, 829.6192],
[1116.5049, 744.0276, 698.4658]])
tensor([[ 855.4833, 496.9533, 880.5090],
[ 281.5063, 371.6346, 829.6192],
[1116.5049, 744.0276, 698.4658]])
Transposing an (n,m)-matrix results in its (m,n) transposed version (i.e., rows and columns are switched).
# Alternative 1
x.t()
# Alternative 2
x.transpose(1,0)
tensor([[ 855.4833, 281.5063, 1116.5049],
[ 496.9533, 371.6346, 744.0276],
[ 880.5090, 829.6192, 698.4658]])
PyTorch already provides a function to concatenate two tensors along a specific dimension. Let's create two random tensors:
a = torch.rand(4,3)
b = torch.rand(4,3)
# Concatenating along the '0'-th dimension (appending as rows)
cat1 = torch.cat((a,b), 0)
print (cat1)
# Concatenating along the '1'-st dimension (appending as cols)
cat2 = torch.cat((a,b), 1)
print (cat2)
tensor([[0.7584, 0.3989, 0.1433],
[0.8016, 0.7306, 0.8424],
[0.1678, 0.7775, 0.0676],
[0.7486, 0.8901, 0.6153],
[0.7705, 0.9654, 0.1987],
[0.6932, 0.8022, 0.1134],
[0.1097, 0.6701, 0.1871],
[0.3566, 0.5516, 0.6091]])
tensor([[0.7584, 0.3989, 0.1433, 0.7705, 0.9654, 0.1987],
[0.8016, 0.7306, 0.8424, 0.6932, 0.8022, 0.1134],
[0.1678, 0.7775, 0.0676, 0.1097, 0.6701, 0.1871],
[0.7486, 0.8901, 0.6153, 0.3566, 0.5516, 0.6091]])
Slicing is similar to the slicing operations we know from NumPy:
# get the 3rd column
a[:, 2]
tensor([0.1433, 0.8424, 0.0676, 0.6153])
# get the 3rd row
a[2, :]
tensor([0.1678, 0.7775, 0.0676])
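Range-based slicing works just as in NumPy; a small additional sketch using the tensor a from above:
# take the first two rows and the last two columns of a (shape (4,3))
print(a[0:2, 1:3])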
Often it comes in handy to reshape the entries within a tensor. For that, PyTorch provides the .view() operation:
# Create a random tensor of size (4,4)
x = torch.rand(4,4)
# Resize the entries such that there is only one row with all the entries.
y = x.view(-1,16)
print (y)
print (x.flatten()) # <<-- Another way to flatten an array
# Resize the entries such that in each row there are 8 entries resulting in a (2,8) matrix
z1 = x.view(2, 8)
z2 = x.view(-1, 8) # <-- -1 lets PyTorch infer this dimension from the remaining entries
print(x.size(), y.size(), z1.size())
tensor([[0.2188, 0.3941, 0.7353, 0.5304, 0.1330, 0.0808, 0.2885, 0.9352, 0.2935,
0.0148, 0.5019, 0.8725, 0.9420, 0.8920, 0.3253, 0.5555]])
tensor([0.2188, 0.3941, 0.7353, 0.5304, 0.1330, 0.0808, 0.2885, 0.9352, 0.2935,
0.0148, 0.5019, 0.8725, 0.9420, 0.8920, 0.3253, 0.5555])
torch.Size([4, 4]) torch.Size([1, 16]) torch.Size([2, 8])
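Note that .view() does not copy the data; the returned tensor shares its underlying storage with the original tensor. A minimal sketch to illustrate this (the variable name v is ours):
v = x.view(16)
v[0] = 0.0        # writing through the view ...
print(x[0, 0])    # ... also changes the original tensor x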
The torch tensor and the NumPy array will share their underlying memory locations (as long as the tensor lives on the CPU). Therefore, changing one will change the other.
# Create a tensor with '1' entries of size (5,4)
a = torch.ones(5,4)
print(a)
print (type(a)) # <<-- let's check a's type
tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
<class 'torch.Tensor'>
a_np = a.numpy()
print (type(a_np)) # <<-- checking the type again
<class 'numpy.ndarray'>
After switching a tensor to its NumPy representation, we can of course apply the methods we know from NumPy to the data.
# Torch Tensor -> NumPy array
b = a.numpy()
b = b.reshape(4,-1)
print(b)
[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
Switching back from NumPy to PyTorch, we can use the function torch.from_numpy().
# NumPy Array -> Torch Tensor
c = torch.from_numpy(b)
print(c)
tensor([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
import numpy as np
a = np.random.rand(3,4)
b = torch.from_numpy(a)
print(b)
tensor([[0.6912, 0.7155, 0.3031, 0.4622],
[0.6994, 0.5682, 0.2444, 0.5003],
[0.2115, 0.5563, 0.7044, 0.1517]], dtype=torch.float64)
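Since torch.from_numpy() (and .numpy()) does not copy the data, an in-place change on either side is visible on the other. A minimal sketch with our own variable names:
arr = np.ones(3)
t = torch.from_numpy(arr)
np.add(arr, 1, out=arr)   # modify the NumPy array in place
print(t)                  # the tensor sees the change: tensor([2., 2., 2.], dtype=torch.float64)
t.add_(1)                 # modify the tensor in place
print(arr)                # the array sees it as well: [3. 3. 3.]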
The dynamic computational graph is composed of gradient-enabled tensors (variables) together with functions (operations). The flow of the data and the operations applied to it are defined at runtime.
Every variable object has several attributes: data, requires_grad, grad, grad_fn, and is_leaf:
# Creating the graph
x = torch.tensor(1.0, requires_grad = True)
y = torch.tensor(2.0)
z = x * y
# Displaying
for i, name in zip([x, y, z], "xyz"):
    print(f"{name}\n\
data: {i.data}\n\
requires_grad: {i.requires_grad}\n\
grad: {i.grad}\n\
grad_fn: {i.grad_fn}\n\
is_leaf: {i.is_leaf}\n")
x
data: 1.0
requires_grad: True
grad: None
grad_fn: None
is_leaf: True
y
data: 2.0
requires_grad: False
grad: None
grad_fn: None
is_leaf: True
z
data: 2.0
requires_grad: True
grad: None
grad_fn: <MulBackward0 object at 0x7fe01b8f1630>
is_leaf: False
/Users/ChrisMaxMike/anaconda3/envs/py36_torch/lib/python3.6/site-packages/ipykernel_launcher.py:13: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
del sys.path[0]
torch.Tensor is the central class of the package. If you set its attribute .requires_grad to True, it starts to track all operations on it. When you finish your computation, you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated in the .grad attribute.
from torch.autograd import Variable
x = Variable(torch.ones(4,3), requires_grad=True)
print (x)
tensor([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]], requires_grad=True)
y = x + 5
print(y)
tensor([[6., 6., 6.],
[6., 6., 6.],
[6., 6., 6.],
[6., 6., 6.]], grad_fn=<AddBackward0>)
print(y.grad_fn)
<AddBackward0 object at 0x7fe01b8ff898>
z = y * y * 5
out = z.mean()
print(z, out)
tensor([[180., 180., 180.],
[180., 180., 180.],
[180., 180., 180.],
[180., 180., 180.]], grad_fn=<MulBackward0>) tensor(180., grad_fn=<MeanBackward0>)
out.backward()
print(x.grad)
tensor([[5., 5., 5.],
[5., 5., 5.],
[5., 5., 5.],
[5., 5., 5.]])
Why 5? With ‘out’ we have that $out = \frac{1}{4\cdot3} \sum_{i} z_i$, where $z_i = 5 \cdot (x+5)^2$. Therefore, $\frac{\partial out}{\partial x_i} = \frac{10}{12} (x_i + 5)$. Hence, $\frac{\partial out}{\partial x_i} \Bigr\rvert_{x_i = 1} = \frac{60}{12} = 5$
.backward() is used to calculate the gradients by passing its argument (default: a 1x1 unit tensor) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in the .grad attribute of every leaf node.
Hence, for a scalar output the tensor torch.tensor(1.0) is automatically passed to .backward(). The dimension of the tensor passed into .backward() must be the same as the dimension of the tensor on which .backward() is called. For example:
%%time
x = torch.randn(2,2)
x = Variable(x, requires_grad = True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
print(y.data.norm())
print(y)
gradients = torch.FloatTensor(torch.rand(2, 2))
y.backward(gradients)
print(x.grad)
tensor(1875.4691)
tensor([[1350.1831, 986.6542],
[ 46.8405, -847.7674]], grad_fn=<MulBackward0>)
tensor([[ 831.6336, 324.4223],
[ 780.7774, 1003.5716]])
CPU times: user 5.8 ms, sys: 2.8 ms, total: 8.6 ms
Wall time: 12.9 ms
The tensor passed into the backward function acts like weights for a weighted sum of the gradients.
In the example, y is no longer a scalar. torch.autograd could not compute the full Jacobian directly, but if we just want the vector-Jacobian product, we can simply pass the vector to the backward() function as an argument.
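As a small illustration (a sketch with our own variable names): for a diagonal Jacobian, each entry of the passed vector simply scales the corresponding gradient entry.
x = torch.ones(2, requires_grad=True)
y = x * torch.tensor([2.0, 3.0])   # y is a vector; its Jacobian w.r.t. x is diag(2, 3)
v = torch.tensor([1.0, 10.0])      # the vector for the vector-Jacobian product
y.backward(v)                      # computes v^T * J
print(x.grad)                      # tensor([ 2., 30.]) -- each gradient entry weighted by v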