Update, March 25, 2019: The latest Volta and Turing GPUs incorporate Tensor Cores, which accelerate certain types of FP16 matrix math. This enables faster and easier mixed-precision computation within popular AI frameworks. Making use of Tensor Cores requires CUDA 9 or later. NVIDIA has also added automatic mixed precision capabilities to TensorFlow, PyTorch, and MXNet. Interested in learning more or trying it out for yourself? Get Tensor Core optimized examples for popular AI frameworks here.

In the practice of software development, programmers learn early and often the importance of using the right tool for the job. This is especially important in numerical computing, where tradeoffs between precision, accuracy, and performance make it essential to choose the best representation for the data. With the introduction of the Pascal GPU architecture and CUDA 8, NVIDIA is expanding the set of tools available for mixed-precision computing with new 16-bit floating point and 8/16-bit integer computing capabilities.

"As the relative costs and ease of computing at different precisions evolve, due to changing architectures and software, as well as the disruptive influence of accelerators such as GPUs, we will see an increasing development and use of mixed precision algorithms." - Nick Higham, Richardson Professor of Applied Mathematics, University of Manchester

Many technical and HPC applications require high-precision computation with 32-bit (single precision, or FP32) or 64-bit (double precision, or FP64) floating point, and some GPU-accelerated applications rely on higher precision still (128- or 256-bit floating point!). But there are many applications for which much lower precision arithmetic suffices. For example, researchers in the rapidly growing field of deep learning have found that deep neural network architectures have a natural resilience to errors, thanks to the backpropagation algorithm used to train them, and some have argued that 16-bit floating point (half precision, or FP16) is sufficient for training neural networks.

Storing FP16 data rather than higher-precision FP32 or FP64 reduces the memory footprint of a neural network, allowing larger networks to be trained and deployed, and FP16 data transfers take less time than FP32 or FP64 transfers. Moreover, for many networks, deep learning inference can be performed using 8-bit integer computations without significant impact on accuracy. Beyond deep learning, applications that use data from cameras or other real-world sensors often don't require high-precision floating-point computation, because the sensors generate low-precision or low-dynamic-range data; the data processed from radio telescopes is a good example. Minimal CUDA sketches of the FP16, 8-bit integer, and Tensor Core paths follow below.
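To make the FP16 path concrete, here is a minimal sketch, assuming a GPU of compute capability 5.3 or higher (compile with nvcc -arch=sm_53 or newer) and a recent CUDA toolkit whose FP16 conversion helpers are host-callable. The kernel name haxpy and the launch configuration are illustrative: an AXPY-style operation (y = a*x + y) that packs pairs of FP16 values into the half2 vector type and applies the __hfma2 fused multiply-add intrinsic.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// AXPY on FP16 data: each thread applies a fused multiply-add to a
// half2 pair. __hfma2 requires compute capability 5.3 or higher.
__global__ void haxpy(int n2, half2 a2, const half2 *x, half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2)
        y[i] = __hfma2(a2, x[i], y[i]);  // y = a*x + y on two halves at once
}

int main()
{
    const int n  = 1 << 20;  // number of FP16 elements (kept even)
    const int n2 = n / 2;    // number of half2 pairs

    half2 *x, *y;
    cudaMallocManaged(&x, n2 * sizeof(half2));
    cudaMallocManaged(&y, n2 * sizeof(half2));

    // __floats2half2_rn is host-callable in recent toolkits; on older
    // ones, initialize the FP16 data in a small device kernel instead.
    for (int i = 0; i < n2; ++i) {
        x[i] = __floats2half2_rn(1.0f, 1.0f);  // x = 1.0
        y[i] = __floats2half2_rn(2.0f, 2.0f);  // y = 2.0
    }
    half2 a2 = __floats2half2_rn(2.0f, 2.0f);  // scalar a = 2.0 in both lanes

    haxpy<<<(n2 + 255) / 256, 256>>>(n2, a2, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f (expected 4.0)\n", __low2float(y[0]));
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Packing two FP16 values into each 32-bit half2 halves the storage and transfer volume relative to FP32, which is exactly the memory benefit described above.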
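The 8-bit integer path can be sketched with the __dp4a intrinsic (compute capability 6.1 and up; compile with nvcc -arch=sm_61 or newer), which computes a 4-element dot product of packed 8-bit integers with 32-bit accumulation, the core operation of low-precision inference. The kernel name dot_int8 and the atomicAdd-based reduction are illustrative, not a tuned inference pipeline.

```cuda
#include <cstdio>

// Dot product of two vectors of packed signed 8-bit integers.
// Each int holds four int8 values; __dp4a performs four multiplies
// and adds per call, accumulating into a 32-bit integer.
__global__ void dot_int8(int n4, const int *a, const int *b, int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4)
        atomicAdd(out, __dp4a(a[i], b[i], 0));  // add this chunk's partial sum
}

int main()
{
    const int n4 = 1024;  // number of packed 4-element chunks
    int *a, *b, *out;
    cudaMallocManaged(&a, n4 * sizeof(int));
    cudaMallocManaged(&b, n4 * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));

    for (int i = 0; i < n4; ++i) {
        a[i] = 0x01010101;  // four int8 lanes, each = 1
        b[i] = 0x02020202;  // four int8 lanes, each = 2
    }
    *out = 0;

    dot_int8<<<(n4 + 255) / 256, 256>>>(n4, a, b, out);
    cudaDeviceSynchronize();

    printf("dot = %d (expected %d)\n", *out, n4 * 4 * 2);
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```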
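Finally, regarding the 2019 update at the top: CUDA 9 and later expose Tensor Cores to CUDA C++ through the WMMA API in mma.h, in which a full warp cooperatively multiplies 16x16 FP16 tiles with FP32 accumulation. The sketch below, with the illustrative kernel name wmma_gemm16, assumes a Volta-class GPU (compile with nvcc -arch=sm_70 or newer).

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for 16x16 FP16 tiles on Tensor Cores,
// accumulating in FP32. Requires compute capability 7.0 or higher.
__global__ void wmma_gemm16(const half *a, const half *b, float *c)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // start from C = 0
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // FP16 in, FP32 out
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main()
{
    half *a, *b;
    float *c;
    cudaMallocManaged(&a, 256 * sizeof(half));
    cudaMallocManaged(&b, 256 * sizeof(half));
    cudaMallocManaged(&c, 256 * sizeof(float));
    for (int i = 0; i < 256; ++i) {
        a[i] = __float2half(1.0f);  // all-ones inputs make the result easy
        b[i] = __float2half(1.0f);  // to check: every C entry should be 16
    }

    wmma_gemm16<<<1, 32>>>(a, b, c);  // exactly one warp drives the MMA
    cudaDeviceSynchronize();
    printf("c[0] = %f (expected 16.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The automatic mixed precision features mentioned in the update build on this same hardware path, casting operations to FP16 Tensor Core math where it is numerically safe and keeping FP32 elsewhere.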