TensorRT
TensorRT is a library developed by NVIDIA, built on CUDA, for faster inference on NVIDIA graphics processing units (GPUs); it only runs on NVIDIA hardware.
Is TensorRT specific to an individual GPU, or to a GPU architecture?
For example, if I built a TensorRT engine on an NVIDIA Jetson Orin Nano and you also have a Jetson Orin Nano, do you need to rebuild the engine from the ONNX weights, or can I just give you the serialized engine? Why or why not?
- It is architecture-specific: the build step auto-tunes kernels for the exact GPU it runs on, so a serialized engine is tied to that GPU model and to the TensorRT version used to build it. Two Jetson Orin Nanos running the same TensorRT version can share the engine file; a different GPU model (or TensorRT version) requires rebuilding from the ONNX weights.
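A minimal sketch of the build-and-serialize flow, assuming the TensorRT Python bindings (TensorRT 8.x-style API); the file names `model.onnx` and `model.engine` are placeholders:

```python
# Minimal sketch: build a TensorRT engine from an ONNX file and serialize it.
# Assumes the TensorRT Python bindings (8.x-style API); file names are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB

# The build step times candidate kernels on *this* GPU, which is why the
# resulting engine is tied to the GPU model and the TensorRT version.
serialized_engine = builder.build_serialized_network(network, config)

with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```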
Installation (source)
What does the optimization process look like?
The optimization process involves several key steps:
- Precision Calibration: Converts model weights and activations to lower-precision formats (e.g., FP32 to FP16 or INT8) to accelerate computation; INT8 additionally requires calibration on representative inputs to preserve accuracy (see the sketch after this list)
- Layer and Tensor Fusion: Combines multiple layers and operations into a single operation to reduce memory access and improve execution speed
- Kernel Auto-Tuning: Selects the most efficient algorithms and kernels based on the specific GPU architecture
- Dynamic Tensor Memory Management: Optimizes memory usage for the model’s intermediate tensors to reduce the memory footprint and increase throughput
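As a rough illustration of the precision step, here is a hedged sketch using the TensorRT Python API; the function name `configure_precision` is made up, and the INT8 calibrator argument is assumed to be a user-provided `IInt8EntropyCalibrator2` subclass instance:

```python
import tensorrt as trt

def configure_precision(builder: trt.Builder,
                        config: trt.IBuilderConfig,
                        int8_calibrator=None) -> None:
    """Enable lower-precision execution where the GPU supports it.

    `int8_calibrator` is assumed to be an instance of a user-written
    IInt8EntropyCalibrator2 subclass that feeds representative input
    batches; without it, INT8 is left disabled.
    """
    # FP16 needs no calibration, only a flag (on GPUs with fast FP16 units).
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # INT8 needs a calibrator so TensorRT can measure activation ranges.
    if int8_calibrator is not None and builder.platform_has_fast_int8:
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = int8_calibrator

    # Layer/tensor fusion and kernel auto-tuning happen automatically when the
    # engine is built; the workspace limit bounds the scratch memory the
    # auto-tuner may use while timing candidate kernels.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
```

In the build sketch above, this would be called after creating the builder config and before build_serialized_network; fusion, auto-tuning, and tensor memory management need no explicit configuration.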