### 1. Autograd mechanics[1]

#### 1.1 Excluding subgraphs from backward

Every Tensor has a flag, requires_grad, that allows fine-grained exclusion of subgraphs from gradient computation and can increase efficiency.

>>> x = torch.randn(5, 5)  # requires_grad=False by default
>>> y = torch.randn(5, 5)  # requires_grad=False by default
>>> z = torch.randn((5, 5), requires_grad=True)
>>> a = x + y
>>> a.requires_grad
False
>>> b = a + z
>>> b.requires_grad
True


For example, if you want to finetune a pretrained CNN, it’s enough to switch the requires_grad flags in the frozen base, and no intermediate buffers will be saved, until the computation gets to the last layer, where the affine transform will use weights that require gradient, and the output of the network will also require them.

import torch.nn as nn
import torch.optim as optim
import torchvision

model = torchvision.models.resnet18(pretrained=True)
# Freeze the base: none of resnet18's original parameters take part in backprop
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully-connected layer
# Parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(512, 100)

# Optimize only the classifier
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)


#### 1.2 How autograd encodes the history

Autograd is a reverse automatic differentiation system. Conceptually, autograd records a graph of all the operations that created the data as you execute operations, giving you a directed acyclic graph (DAG) whose leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.
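
A minimal sketch of this idea (the variable names are illustrative, not from the text above): the leaves are x and w, the root is the scalar loss, and backward() traces the recorded graph from the root to the leaves via the chain rule.

import torch

x = torch.randn(3, requires_grad=True)   # leaf
w = torch.randn(3, requires_grad=True)   # leaf
loss = (w * x).sum()                     # root of the recorded DAG

loss.backward()        # trace the graph from root to leaves
print(x.grad)          # d(loss)/dx == w
print(w.grad)          # d(loss)/dw == x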

Internally, autograd represents this graph as a graph of Function objects (really expressions), which can be apply()-ed to compute the result of evaluating the graph. When computing the forward pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad_fn attribute of each torch.Tensor is an entry point into this graph). When the forward pass is completed, we evaluate this graph in the backward pass to compute the gradients.
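
A small illustration of the .grad_fn chain (the exact class names printed, e.g. AddBackward0, can differ between PyTorch versions):

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = (y * y).mean()

print(x.grad_fn)   # None: x is a user-created leaf
print(y.grad_fn)   # e.g. <AddBackward0 object at ...>
print(z.grad_fn)   # e.g. <MeanBackward0 object at ...>

z.backward()       # evaluate the recorded graph in the backward pass
print(x.grad)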

An important thing to note is that the graph is recreated from scratch at every iteration, and this is exactly what allows for using arbitrary Python control flow statements that can change the overall shape and size of the graph at every iteration. You don’t have to encode all possible paths before you launch the training - what you run is what you differentiate.
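
A sketch of what this allows in practice (the function below is made up for illustration): the branch taken, and therefore the shape of the recorded graph, can depend on the data in every iteration.

import torch

def forward(x):
    # ordinary Python control flow; the recorded graph differs per call
    if x.sum() > 0:
        return (x ** 2).sum()
    return x.abs().sum()

for _ in range(3):
    x = torch.randn(4, requires_grad=True)
    out = forward(x)   # a fresh graph is built for this run
    out.backward()     # only the path actually taken is differentiated
    print(x.grad)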

#### 1.3 In-place operations with autograd

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.

There are two main reasons that limit the applicability of in-place operations:

1. In-place operations can potentially overwrite values required to compute gradients.
2. Every in-place operation actually requires the implementation to rewrite the computational graph. Out-of-place versions simply allocate new objects and keep references to the old graph, while in-place operations require changing the creator of all inputs to the Function representing this operation. This can be tricky, especially if there are many Tensors that reference the same storage (e.g. created by indexing or transposing), and in-place functions will actually raise an error if the storage of modified inputs is referenced by any other Tensor.
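
A small illustration of the first point (a toy example, and the exact error wording varies across versions): sigmoid() saves its output for the backward pass, so overwriting that output in place makes backward() fail.

import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)   # sigmoid saves its output for the backward pass
y.add_(1)              # in-place update overwrites the saved value
y.sum().backward()     # RuntimeError: a variable needed for gradient
                       # computation has been modified by an inplace operation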

### 2. Computational graphs and Symbol2Symbol[2]

#### 2.2 Symbol2Symbol

Like Theano, TensorFlow adds extra nodes to the graph that provide a symbolic description of the desired derivatives. The main advantage of this approach is that the derivatives are described in the same language as the original expression. Because a derivative is just another computational graph (added to the main graph), backpropagation can be run on it again, differentiating the derivative to obtain higher-order derivatives.

$$\frac{dz}{dx} = \frac{dy}{dx} \cdot \frac{dz}{dy}$$
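
In PyTorch the same idea shows up when the backward pass itself is recorded as a graph: passing create_graph=True to torch.autograd.grad makes the derivative differentiable, so it can be differentiated again to obtain higher-order derivatives. A minimal sketch:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# first derivative dy/dx = 3x^2; keep its graph so it can be differentiated again
dy_dx, = torch.autograd.grad(y, x, create_graph=True)

# second derivative d2y/dx2 = 6x, obtained by differentiating the derivative graph
d2y_dx2, = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())   # 12.0 12.0 (3*2^2 and 6*2)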

### 3. Recap

• PyTorch uses a dynamic graph.
• The dynamic graph is rebuilt from scratch after every iteration.
• By default, PyTorch accumulates gradients[4], so before the next iteration (a new batch) they must be cleared through the optimizer (optimizer.zero_grad()). Usage is as follows:
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
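
The accumulation itself can be seen without a full training loop; this toy example (not from the original) shows .grad growing across backward() calls until it is zeroed, which is what optimizer.zero_grad() does for every parameter:

import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
print(w.grad)          # tensor([2., 2., 2.])

(w * 2).sum().backward()
print(w.grad)          # tensor([4., 4., 4.]) -- gradients accumulated

w.grad.zero_()         # what optimizer.zero_grad() does per parameter
(w * 2).sum().backward()
print(w.grad)          # tensor([2., 2., 2.])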

