Profiling Neural Networks to improve model training and inference speed
Part 2: Improving the Neural Network
In the previous part, we studied how to profile a neural network using Tensorflow Board profile. One of the suggestions was to use Floating Point 16 (FP16) arithmetic instead of Floating Point 32 (FP32) arithmetic. In this post, we will discuss the rationale behind this suggestion, and specifically how to make this improvement. tradeoff.
FP16 vs FP32 arithmetic
Using FP16 buys us two significant computational advantages
Computational speed of operating on FP16 (Say multiplying two FP16 numbers) is many times faster compared to FP32 arithmetic. There are two reasons: (i) You have 4x less work to do, and (ii) Many processors such as those from Nvidia have specialized units to handle FP16.
We consume much lower memory bandwidth accessing FP16 data, thus reducing the probability of running into a memory access bottleneck before a computational bottleneck.
However, the consequence of using FP16 is lower precision. So, if we simply used FP16 to represent the coefficients of a Neural Network, th…
Keep reading with a 7-day free trial
Subscribe to Learn With A Robot to keep reading this post and get 7 days of free access to the full post archives.