Profiling Neural Networks to improve model training and inference speed
Part 2: Improving the Neural Network
In the previous part, we studied how to profile a neural network using the TensorBoard profiler. One of its suggestions was to use 16-bit floating point (FP16) arithmetic instead of 32-bit floating point (FP32) arithmetic. In this post, we will discuss the rationale behind this suggestion, how to make this improvement, and the tradeoffs involved.
FP16 vs FP32 arithmetic
Using FP16 buys us two significant computational advantages:
1. Operating on FP16 (say, multiplying two FP16 numbers) is many times faster than FP32 arithmetic. There are two reasons: (i) each operation involves roughly 4x less work, since the complexity of a hardware multiplier grows approximately with the square of the operand width, and (ii) many processors, such as NVIDIA GPUs with their Tensor Cores, have specialized units for FP16. A rough benchmark sketch follows this list.
2. FP16 values are half the size of FP32 values, so we consume half the memory bandwidth when accessing them, reducing the probability of running into a memory bandwidth bottleneck before a computational one.
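To get a feel for the difference on your own hardware, here is a minimal benchmark sketch (the matrix size and iteration count are arbitrary choices; the speedup you observe depends heavily on the device and is largest on GPUs with dedicated FP16 units):
import time
import tensorflow as tf

for dtype in (tf.float32, tf.float16):
    a = tf.random.uniform((4096, 4096), dtype=dtype)
    b = tf.random.uniform((4096, 4096), dtype=dtype)
    tf.matmul(a, b)  # warm-up run so one-time kernel selection is not timed
    start = time.perf_counter()
    for _ in range(10):
        c = tf.matmul(a, b)
    _ = c.numpy()  # block until the computation actually finishes
    print(f"{dtype.name}: {time.perf_counter() - start:.3f}s for 10 matmuls")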
However, the tradeoff of using FP16 is lower precision and a much narrower representable range. So, if we simply used FP16 to represent all the weights and gradients of a neural network, there is a high likelihood that training would not converge, because small gradient values underflow to zero in FP16.
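As a quick illustration of how easily FP16 underflows, consider multiplying two small numbers (the values here are arbitrary, but gradients of this magnitude are common in deep networks):
import numpy as np

# The true product, 1e-8, is below FP16's smallest representable
# subnormal (~6e-8), so it underflows to zero; FP32 handles it fine.
print(np.float16(1e-4) * np.float16(1e-4))  # 0.0
print(np.float32(1e-4) * np.float32(1e-4))  # ~1e-08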
This has led to the introduction of mixed precision training. In mixed precision training, we perform most operations in FP16 but preserve some core parts of the network in FP32 (usually model activations and gradients are stored in a 16-bit floating point format, while model weights and optimizer states are kept in 32-bit precision) so that information loss is minimized. In addition, the loss is typically scaled up before backpropagation so that small gradient values survive in FP16, and the resulting gradients are scaled back down before the weight update; this technique is known as loss scaling. NVIDIA's documentation shows that mixed precision training can achieve a 3x boost in speed while converging to the same level of accuracy.
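To make loss scaling concrete, here is a minimal sketch of one training step using TensorFlow's tf.keras.mixed_precision.LossScaleOptimizer (the API as in TensorFlow 2.x; the tiny model and random data are placeholders for illustration):
import tensorflow as tf

optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(learning_rate=0.01))
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
x = tf.random.uniform((8, 4))
y = tf.random.uniform((8, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(model(x) - y))
    scaled_loss = optimizer.get_scaled_loss(loss)  # scale up before backprop
scaled_grads = tape.gradient(scaled_loss, model.trainable_variables)
grads = optimizer.get_unscaled_gradients(scaled_grads)  # undo the scaling
optimizer.apply_gradients(zip(grads, model.trainable_variables))
When using Keras's built-in training loop, this wrapping and unscaling happens automatically, as we will see next.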
ML Libraries support Mixed Precision Training
Most ML libraries have built-in support for mixed precision. As an example, in TensorFlow, enabling mixed precision training just involves adding the following lines:
from tensorflow.keras import mixed_precision

# Layers created after this point use FP16 compute with FP32 variables
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)
In the above lines of code, we set a global policy to use mixed precision with FP16: every layer created afterwards performs its computations in FP16 while keeping its variables in FP32. A detailed document explaining the intricacies in TensorFlow is available at https://www.tensorflow.org/guide/mixed_precision.
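One subtlety is worth showing: TensorFlow's guide recommends keeping the model's final activation in FP32 for numeric stability, even under the mixed_float16 policy. Here is a minimal sketch of what that looks like (the architecture is an arbitrary example):
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

mixed_precision.set_global_policy(mixed_precision.Policy('mixed_float16'))

inputs = tf.keras.Input(shape=(784,))
x = layers.Dense(256, activation='relu')(inputs)  # FP16 compute, FP32 variables
x = layers.Dense(10)(x)
# Override the policy for the last layer so the softmax runs in FP32
outputs = layers.Activation('softmax', dtype='float32')(x)
model = tf.keras.Model(inputs, outputs)

print(model.layers[1].compute_dtype)  # float16
print(model.layers[1].dtype)          # float32 (variable dtype)
When such a model is compiled and trained with model.fit, Keras also wraps the optimizer in a LossScaleOptimizer automatically, so no manual loss scaling is needed.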
Thanks for reading so far. In the next part, we will study how the performance profile of the ResNet we trained to enable Anki Vector to recognize human sign language changes once mixed precision training is configured. Please subscribe for free to my newsletter to get updates on the next parts of this series.