Debugging guide for TensorRT #3489

Draft
wants to merge 4 commits into base: main

Binary file added docsrc/contributors/images/ci_whls.png
87 changes: 87 additions & 0 deletions docsrc/contributors/infra.rst
@@ -0,0 +1,87 @@
.. _dev_infra:

PyTorch CI
====================

Our main CI provider is the PyTorch CI, backed by `pytorch/test-infra <https://github.com/pytorch/test-infra>`_.


Debugging CI Failures
------------------------

Sometimes, you may observe errors in CI tests that you cannot reproduce on a local machine. There are a few possible reasons:

- Resource oversubscription: CI runs many jobs in parallel on the same runner. To reproduce this, reduce the number of parallel pytest workers by lowering ``-n``, e.g. ``python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 8 conversion/``
- Your environment may differ from CI (see the version-check sketch below):

  - GPU architecture: as of 11/20/2024, CI uses AWS ``linux.g5.4xlarge.nvidia.gpu`` runners, each with one A10 GPU
  - Torch-TensorRT version
  - Dependency versions (PyTorch, TensorRT, etc.)
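
A quick way to capture your local versions for comparison with CI (a minimal sketch, assuming ``torch``, ``torch_tensorrt``, and ``tensorrt`` are importable in the environment under test):

.. code-block:: python

    import torch
    import torch_tensorrt
    import tensorrt

    # Print the versions most likely to differ from CI
    print("torch:", torch.__version__)
    print("torch_tensorrt:", torch_tensorrt.__version__)
    print("tensorrt:", tensorrt.__version__)
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
        print("CUDA runtime:", torch.version.cuda)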


Create the same environment as CI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CI builds are slightly different from local builds, since they use PyTorch versions that are likely newer than the ones you have installed locally.
Therefore, when debugging, it may be helpful to replicate the CI environment.

We build all CI wheels using an AlmaLinux manylinux container customized by PyTorch: ``pytorch/manylinux2_28-builder:cudaXX.X``, e.g., ``pytorch/manylinux2_28-builder:cuda12.8``.
This container is available on Docker Hub and can be pulled using the following command:

.. code-block:: bash

    docker pull pytorch/manylinux2_28-builder:cuda12.4

You can then either download builds from CI for testing:

.. image:: /contributors/images/ci_whls.png
    :width: 512px
    :height: 512px
    :scale: 50 %
    :align: right


.. code-block:: bash

    /opt/python/cp311-cp311/bin/python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
    /opt/python/cp311-cp311/bin/python -m pip install docker_workspace/torch_tensorrt-2.6.0.dev20241119+cu124-cp311-cp311-linux_x86_64.whl  # Install your downloaded CI artifact
    # From the Torch-TensorRT source directory, also run: pip install -r requirements-dev.txt
    /opt/python/cp311-cp311/bin/python -m pip install timm pytest-xdist  # pytest-xdist lets pytest run tests in parallel



Alternatively, you can replicate the build inside the container by running the following commands:

.. code-block:: bash

    docker run --rm -it -v $(pwd):/workspace pytorch/manylinux2_28-builder:cuda12.8 bash
    # Inside the container
    cd /workspace
    export CUDA_HOME=/usr/local/cuda-12.8
    export CI_BUILD=1
    ./packaging/pre_build_script.sh
    /opt/python/cp311-cp311/bin/python setup.py bdist_wheel
    /opt/python/cp311-cp311/bin/python -m pip install timm pytest-xdist  # pytest-xdist lets pytest run tests in parallel



Run CI Tests in the same manner as PyTorch CI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    export RUNNER_TEST_RESULTS_DIR=/tmp/test_results
    export USE_HOST_DEPS=1
    export CI_BUILD=1
    cd tests/py/dynamo
    python -m pytest -ra --junitxml=${RUNNER_TEST_RESULTS_DIR}/dynamo_converters_test_results.xml -n 4 conversion/

Building Torch-TensorRT as Hermetically As Possible
---------------------------------------------------

Torch-TensorRT uses a combination of `Bazel <https://bazel.build/>`_ and `UV <https://docs.astral.sh/uv>`_ to build the project in a (near) hermetic manner.

C++ dependencies are declared in ``MODULE.bazel`` using ``http_archive`` and ``git_repository`` rules. Using a combination of ``pyproject.toml`` and ``uv``,
we lock Python dependencies as well. This ensures that the dependencies fetched are identical on each build. Building with
``uv pip install -e .`` or running ``uv run <script using torch_tensorrt>`` will use these locked dependencies. When providing a reproducer for a
locally identified bug, including your ``MODULE.bazel`` and ``pyproject.toml`` files will help us reproduce the issue, as in the sketch below.
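
A hypothetical minimal reproducer (``repro.py``, the model, and the compile settings are placeholders; substitute whatever triggers your bug), run with ``uv run`` so it resolves against your locked dependencies:

.. code-block:: python

    # repro.py -- run as: uv run repro.py
    import torch
    import torch_tensorrt

    # Placeholder model and inputs for illustration
    model = torch.nn.Linear(8, 8).eval().cuda()
    inputs = [torch.randn(1, 8).cuda()]

    # Placeholder compile settings; include the ones that trigger the bug you are reporting
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
    print(trt_model(*inputs))
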
2 changes: 2 additions & 0 deletions docsrc/index.rst
@@ -39,6 +39,7 @@ User Guide
* :ref:`runtime`
* :ref:`using_dla`
* :ref:`mixed_precision`
* :ref:`debugging`

.. toctree::
:caption: User Guide
@@ -51,6 +52,7 @@ User Guide
user_guide/runtime
user_guide/using_dla
user_guide/mixed_precision
user_guide/debugging_torch_tensorrt


Tutorials
30 changes: 30 additions & 0 deletions docsrc/user_guide/debugging_torch_tensorrt.rst
@@ -0,0 +1,30 @@
.. _debugging:

Debugging Torch-TensorRT Compilation
====================================


FX Graph Visualization
----------------------

Debug Mode
-------------


Profiling TensorRT Engines
--------------------------

There are profiling tools built into Torch-TensorRT to measure the performance of TensorRT sub-blocks in compiled modules.
These can be used in conjunction with PyTorch profiling tools to get a full picture of your model's performance.
Profiling for any particular sub-block can be enabled by the ``enable_profiling()`` method of any
``__torch__.classes.tensorrt.Engine`` attribute, or of any ``torch_tensorrt.runtime.TorchTensorRTModule``. The profiler will
dump trace files to ``/tmp`` by default, though this path can be customized either by setting the
``profile_path_prefix`` of ``__torch__.classes.tensorrt.Engine`` or as an argument to
``torch_tensorrt.runtime.TorchTensorRTModule.enable_profiling(profiling_results_dir="")``.
Traces can be visualized using `Perfetto <https://perfetto.dev>`_.
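
For example, a minimal sketch (the model, inputs, and output directory are placeholders, and it assumes the compiled module's TensorRT sub-blocks are ``torch_tensorrt.runtime.TorchTensorRTModule`` instances):

.. code-block:: python

    import torch
    import torch_tensorrt
    import torchvision.models as models

    # Placeholder model and inputs for illustration
    model = models.resnet18().eval().cuda()
    inputs = [torch.randn(1, 3, 224, 224).cuda()]
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)

    # Enable profiling on every TensorRT sub-block in the compiled module
    for _, mod in trt_model.named_modules():
        if isinstance(mod, torch_tensorrt.runtime.TorchTensorRTModule):
            mod.enable_profiling(profiling_results_dir="/tmp/trt_profiles")

    trt_model(*inputs)  # one inference pass writes trace files under /tmp/trt_profiles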

.. image:: /user_guide/images/perfetto.png
    :width: 512px
    :height: 512px
    :scale: 50 %
    :align: right
Binary file added docsrc/user_guide/images/perfetto.png