
Commit 99e0a38

TAO 5.2 Release - PyTorch
1 parent 1a94305 commit 99e0a38

315 files changed: +44187 -721 lines changed


README.md (+65 -9)
@@ -7,6 +7,8 @@
 * [Hardware Requirements](#HardwareRequirements)
 * [Software Requirements](#SoftwareRequirements)
 * [Instantiating the development container](#Instantiatingthedevelopmentcontainer)
+* [Command line options](#Commandlineoptions)
+* [Using the mounts file](#Usingthemountsfile)
 * [Updating the base docker](#Updatingthebasedocker)
 * [Build base docker](#Buildbasedocker)
 * [Test the newly built base docker](#Testthenewlybuiltbasedocker)
@@ -25,16 +27,16 @@
 
 TAO Toolkit is a Python package hosted on the NVIDIA Python Package Index. It interacts with lower-level TAO dockers available from the NVIDIA GPU Accelerated Container Registry (NGC). The TAO containers come pre-installed with all dependencies required for training. The output of the TAO workflow is a trained model that can be deployed for inference on NVIDIA devices using DeepStream, TensorRT and Triton.
 
-This repository contains the required implementation for the all the deep learning components and networks using the PyTorch backend. These routines are packaged as part of the TAO Toolkit PyTorch container in the Toolkit package.
+This repository contains the required implementation for all the deep learning components and networks using the PyTorch backend. These routines are packaged as part of the TAO Toolkit PyTorch container in the Toolkit package. The source code here is compatible with PyTorch versions > 2.0.0.
 
 ## <a name='GettingStarted'></a>Getting Started
 
 As soon as the repository is cloned, run the `envsetup.sh` file to check
-if the build enviroment has the necessary dependencies, and the required
+if the build environment has the necessary dependencies, and the required
 environment variables are set.
 
 ```sh
-source scripts/envsetup.sh
+source ${PATH_TO_REPO}/scripts/envsetup.sh
 ```
 
 We recommend adding this command to your local `~/.bashrc` file, so that every new terminal instance receives this.
@@ -64,23 +66,24 @@ We recommend adding this command to your local `~/.bashrc` file, so that every n
 | **Software** | **Version** |
 | :--- | :--- |
 | Ubuntu LTS | >=18.04 |
-| python | >=3.8.x |
+| python | >=3.10.x |
 | docker-ce | >19.03.5 |
 | docker-API | 1.40 |
 | `nvidia-container-toolkit` | >1.3.0-1 |
 | nvidia-container-runtime | 3.4.0-1 |
 | nvidia-docker2 | 2.5.0-1 |
-| nvidia-driver | >525.85 |
+| nvidia-driver | >535.85 |
 | python-pip | >21.06 |
 
 ### <a name='Instantiatingthedevelopmentcontainer'></a>Instantiating the development container
 
-Inorder to maintain a uniform development enviroment across all users, TAO Toolkit provides a base environment docker that has been built and uploaded to NGC for the developers. For instantiating the docker, simply run the `tao_pt` CLI. The usage for the command line launcher is mentioned below.
+In order to maintain a uniform development environment across all users, TAO Toolkit provides a base environment Dockerfile in `docker/Dockerfile` that contains all
+the required third-party dependencies for the developers. To instantiate the docker, simply run the `tao_pt` CLI. The usage for the command line launcher is shown below.
 
 ```sh
 usage: tao_pt [-h] [--gpus GPUS] [--volume VOLUME] [--env ENV]
               [--mounts_file MOUNTS_FILE] [--shm_size SHM_SIZE]
-              [--run_as_user] [--ulimit ULIMIT] [--port PORT]
+              [--run_as_user] [--tag TAG] [--ulimit ULIMIT] [--port PORT]
 
 Tool to run the pytorch container.
 
@@ -92,6 +95,7 @@ optional arguments:
   --mounts_file MOUNTS_FILE Path to the mounts file.
   --shm_size SHM_SIZE Shared memory size for docker
   --run_as_user Flag to run as user
+  --tag TAG The tag value for the local dev docker.
   --ulimit ULIMIT Docker ulimits for the host machine.
   --port PORT Port mapping (e.g. 8889:8889).
 
@@ -106,6 +110,55 @@ tao_pt --gpus all \
        --env PYTHONPATH=/tao-pt
 ```
 
+Running Deep Neural Networks implies working on large datasets. These datasets are usually stored on network share drives with significantly higher storage capacity. Since the `tao_pt` CLI wrapper uses docker containers under the hood, these drives/mount points need to be mapped to the docker.
+
+There are two ways to configure the `tao_pt` CLI wrapper:
+
+1. Via the command line options
+2. Via the mounts file, by default at `~/.tao_mounts.json`
+
+#### <a name='Commandlineoptions'></a>Command line options
+
+| **Option** | **Description** | **Default** |
+| :-- | :-- | :-- |
+| `gpus` | Comma-separated GPU indices to be exposed to the docker | 1 |
+| `volume` | Paths on the host machine to be exposed to the container. This is analogous to the `-v` option in the docker CLI. You may define multiple mount points by using the `--volume` option multiple times. | None |
+| `env` | Environment variables to define inside the interactive container. You may set them as `--env VAR=<value>`. Multiple environment variables can be set by repeating the `--env` option. | None |
+| `mounts_file` | Path to the mounts file, explained further in the next section. | `~/.tao_mounts.json` |
+| `shm_size` | Shared memory size for docker in bytes. | 16G |
+| `run_as_user` | Flag to run as the default user account on the host machine. This helps maintain permissions for all directories and artifacts created by the container. | |
+| `tag` | The tag value for the local dev docker. | None |
+| `ulimit` | Docker ulimits for the host machine. | |
+| `port` | Port mapping (e.g. 8889:8889). | None |
+
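As an illustration, several of the options above can be combined in a single invocation. A hedged sketch (the host paths and values below are placeholders, not part of this commit):

```sh
tao_pt --gpus all \
       --volume /raid/datasets:/workspace/tao-experiments/data \
       --volume $HOME/results:/workspace/tao-experiments/results \
       --env PYTHONPATH=/tao-pt \
       --shm_size 16G \
       --run_as_user
```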
+#### <a name='Usingthemountsfile'></a>Using the mounts file
+
+The `tao_pt` CLI wrapper instance can be configured by using a mounts file. By default, the wrapper expects the mounts file to be at
+`~/.tao_mounts.json`. However, you may point the wrapper at a different file via the `--mounts_file` command line option.
+
+The launcher config file consists of three sections:
+
+* `Mounts`
+
+The `Mounts` parameter defines the paths in the local machine that should be mapped to the docker. This is a list of `json` dictionaries containing the source path in the local machine and the destination path that is mapped for the CLI wrapper.
+
+A sample config file containing 2 mount points and no docker options is shown below.
+
+```json
+{
+    "Mounts": [
+        {
+            "source": "/path/to/your/experiments",
+            "destination": "/workspace/tao-experiments"
+        },
+        {
+            "source": "/path/to/config/files",
+            "destination": "/workspace/tao-experiments/specs"
+        }
+    ]
+}
+```
+
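With a mounts file like the one above in place, the wrapper picks it up automatically from `~/.tao_mounts.json`. A hedged sketch of pointing it at a non-default location and running a script through the container (the path and script name are placeholders):

```sh
tao_pt --gpus all \
       --mounts_file /shared/tao/team_mounts.json \
       -- python train.py --help
```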
 ### <a name='Updatingthebasedocker'></a>Updating the base docker
 
 There will be situations where developers would be required to update the third-party dependencies to newer versions, or upgrade CUDA, etc. In such a case, please follow the steps below:
@@ -120,10 +173,11 @@ cd $NV_TAO_PYTORCH_TOP/docker
 ```
 
 #### <a name='Testthenewlybuiltbasedocker'></a>Test the newly built base docker
-Developers may tests their new docker by using the `tao_pt` command.
+
+The build script tags the newly built base docker with the username of the account on the user's local machine. Therefore, developers may test their new docker by using the `tao_pt` command with the `--tag` option.
 
 ```sh
-tao_pt -- script args
+tao_pt --tag $USER -- script args
 ```
 
 #### <a name='Updatethenewdocker'></a>Update the new docker
@@ -151,6 +205,8 @@ bash $NV_TAO_PYTORCH_TOP/docker/build.sh --build --push --force
 The TAO docker is built on top of the TAO Pytorch base dev docker, by building a python wheel for the `nvidia_tao_pyt` module in this repository and installing the wheel in the Dockerfile defined in `release/docker/Dockerfile`. The whole build process is captured in a single shell script which may be run as follows:
 
 ```sh
+git lfs install
+git lfs pull
 source scripts/envsetup.sh
 cd $NV_TAO_PYTORCH_TOP/release/docker
 ./deploy.sh --build --wheel

docker/Dockerfile (+27 -7)
@@ -1,5 +1,5 @@
-ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:23.02-py3
-FROM ${BASE_IMAGE}
+ARG PYTORCH_BASE_IMAGE=nvcr.io/nvidia/pytorch:23.08-py3
+FROM ${PYTORCH_BASE_IMAGE}
 
 # Ensure apt-get won't prompt for selecting options
 ENV DEBIAN_FRONTEND=noninteractive
@@ -16,8 +16,8 @@ RUN pip install parametrized ninja
 WORKDIR /opt
 
 # Clone and checkout TensorRT OSS
-# Moving TensorRT to 8.5 branch.
-ENV TRT_TAG "release/8.5"
+# Moving TensorRT to 8.6 branch.
+ENV TRT_TAG "release/8.6"
 ENV TRT_INCLUDE_DIR="/usr/include/x86_64-linux-gnu"
 # Install TRT OSS
 RUN mkdir trt_oss_src && \
@@ -27,21 +27,41 @@ RUN mkdir trt_oss_src && \
     cd TensorRT && \
     git submodule update --init --recursive && \
     mkdir -p build && cd build && \
-    cmake .. -DGPU_ARCHS="53 60 61 70 75 80 86 90" -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu -DTRT_BIN_DIR=`pwd`/out -DCUDA_VERSION=11.8 -DCUDNN_VERSION=8.7 && \
+    cmake .. \
+        -DGPU_ARCHS="53;60;61;70;75;80;86;90" \
+        -DCMAKE_CUDA_ARCHITECTURES="53;60;61;70;75;80;86;90" \
+        -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu \
+        -DTRT_BIN_DIR=`pwd`/out \
+        -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.2/bin/nvcc \
+        -DCUDNN_VERSION=8.9 && \
     make -j16 nvinfer_plugin nvinfer_plugin_static && \
-    cp libnvinfer_plugin.so.8.5.3 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.3 && \
+    cp libnvinfer_plugin.so.8.6.1 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.6.1 && \
     cp libnvinfer_plugin_static.a /usr/lib/x86_64-linux-gnu/libnvinfer_plugin_static.a && \
     cd ../../../ && \
     rm -rf trt_oss_src
 
 COPY docker/requirements-pip.txt requirements-pip.txt
-RUN pip install --ignore-installed -r requirements-pip.txt \
+# Forcing cython==0.29.36 for pycocotools-fix with python3.10.
+RUN pip install Cython==0.29.36 \
+    && pip install --ignore-installed -r requirements-pip.txt \
     && rm requirements-pip.txt
 
 COPY docker/requirements-pip-pytorch.txt requirements-pip-pytorch.txt
 RUN pip install --ignore-installed --no-deps -r requirements-pip-pytorch.txt \
     && rm requirements-pip-pytorch.txt
 
+COPY docker/requirements-pip-odise.txt requirements-pip-odise.txt
+RUN pip install --ignore-installed --no-deps -r requirements-pip-odise.txt \
+    && rm requirements-pip-odise.txt
+
+# Install mmcv from source for our cuda versions.
+COPY third_party/mmcv/mmcv.patch mmcv.patch
+RUN git clone https://github.com/open-mmlab/mmcv.git \
+    && cd mmcv && git checkout v1.7.1 \
+    && git apply /opt/mmcv.patch \
+    && pip install -r requirements/optional.txt --ignore-installed \
+    && FORCE_CUDA=1 MMCV_WITH_OPS=1 python setup.py install
+
 # Setup user account
 ARG uid=1000
 ARG gid=1000
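A quick, hedged way to sanity-check the rebuilt image from inside the dev container (the expected plugin version follows the `cp` step above; the mmcv check assumes the from-source build succeeded):

```sh
# The rebuilt TensorRT OSS plugin should be the one on the linker path.
ldconfig -p | grep libnvinfer_plugin   # expect libnvinfer_plugin.so.8.6.1

# The from-source mmcv build should import with its CUDA ops available.
python -c "import mmcv; from mmcv.ops import nms; print(mmcv.__version__)"
```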

docker/requirements-pip-odise.txt (+19, new file)
@@ -0,0 +1,19 @@
+# ODISE
+huggingface-hub
+fvcore
+ftfy
+kornia==0.6
+diffdist==0.1
+nltk>=3.6.2
+taming-transformers-rom1504
+importlib-metadata==4.11.3
+flake8-comprehensions
+git+https://github.com/facebookresearch/detectron2.git
+git+https://github.com/openai/CLIP.git@main#egg=clip
+git+https://github.com/cocodataset/panopticapi.git
+yacs>=0.1.8
+iopath==0.1.9
+jmespath
+s3transfer
+pathspec
+black

docker/requirements-pip-pytorch.txt (+3 -2)
@@ -2,12 +2,13 @@ fairscale==0.4.12
 lpips==0.1.4
 lightning-utilities==0.8.0
 mmcls==0.25.0
-mmcv-full -f https://download.openmmlab.com/mmcv/dist/11.4/torch1.11.0/index.html
 pytorch-lightning==1.8.5
 pytorch_metric_learning==1.7.1
 pytorch-msssim
 thop
 timm>=0.9.6.dev0
 torchmetrics==0.10.3
-open-clip-torch[training]==2.20.0
+open-clip-torch[training]==2.23.0
+sentencepiece==0.1.99
 ftfy
+torch-pruning==1.2.2

docker/requirements-pip.txt (+23 -10)
@@ -1,18 +1,24 @@
 addict==2.4.0
 anyconfig==0.9.10
 astroid==2.5.2
+boto3
+botocore
 ccimport==0.4.2
+click==8.0.4
 colored==1.4.4
 cumm-cu114==0.2.8
+cutex==0.2.1
 easydict==1.10
+einops==0.3.2
 faiss-cpu==1.7.2 # TODO: faiss-gpu works better in some cases
 fire==0.5.0
 flake8==6.0.0
 gdown==4.6.4
+gradio==4.3.0
 hydra-core==1.2.0
 imgaug==0.4.0
 imageio==2.26.0
-isort==4.2.5
+isort==4.3.21
 lark==1.1.5
 lazy-import==0.2.2
 lazy_object_proxy==1.5.1
@@ -26,23 +32,25 @@ mypy-extensions==1.0.0
 natsort==8.3.1
 ninja==1.11.1
 nltk==3.8.1
-https://files.pythonhosted.org/packages/02/99/ca518644076d372509d9dff13e85072e65fba273c42da79a344f55bbad48/nvidia_eff-0.6.4-py38-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
-https://files.pythonhosted.org/packages/d1/c2/c14dd8884a5bc05ca07331b3d78a92812eb19e25a625a0b59af8b609a93f/nvidia_eff_tao_encryption-0.1.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+nvidia-eff==0.6.5
+nvidia-eff-tao-encryption==0.1.8
 numpy==1.22.2 # TODO: Update np.float to np.float64 in the coming numpy version
 omegaconf==2.2.2
 # Install onnx-graphsurgeon with extra-index-url
 --extra-index-url https://pypi.ngc.nvidia.com
 onnx-graphsurgeon
 onnx-simplifier==0.4.5
 onnxoptimizer==0.3.8
-onnxruntime>=1.7.0<=1.11.1
+onnxruntime==1.15.1
 onnxsim==0.4.17
-opencv-python==4.5.5.64
+opencv-python==4.8.0.74
 pccm==0.4.6
+pillow==9.5.0
 Polygon3==3.0.8
+protobuf>4.21.0,<5.0
 pyarmor==7.7.4
-pyclipper==1.1.0.post3
-pycocotools-fix==2.0.0.9
+pyclipper
+pycocotools
 pycodestyle==2.10.0
 pycuda==2022.2.2
 pycodestyle==2.10.0
@@ -52,6 +60,7 @@ pyflakes==3.0.1
 pylint==2.2.2
 pynini==2.1.5
 pyquaternion==0.9.9
+pyrr==0.10.3
 PyWavelets==1.4.1
 PyYAML==6.0
 rich==13.3.2
@@ -60,13 +69,17 @@ shapely==1.8.2
 soundfile==0.12.1
 spconv-cu114==2.1.21
 tabulate>=0.9.0
-tensorboardX==2.6
+tensorboardX==2.6.2.2
 terminaltables==3.1.0
 tifffile==2023.2.28
-transformers>=4.8.2
-tokenizers==0.10.3
+# Upgrading transformers due to an error with importlib version checks.
+transformers==4.33.3
+tokenizers==0.12.1
+# Same issue with tqdm.
+tqdm==4.65.0
 ujson==5.5.0
 unidecode==1.2.0
+wandb>=0.12.11
 wget==3.2
 wrapt>=1.11, <1.13.0
 yapf==0.32.0

nvidia_tao_pytorch/core/callbacks/loggers.py (+4 -1)
@@ -14,7 +14,10 @@
 
 """Status Logger callback."""
 
-from collections import Iterable
+try:
+    from collections.abc import Iterable
+except ImportError:
+    from collections import Iterable
 
 from datetime import timedelta
 
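For context: the `collections.Iterable` alias was removed in Python 3.10 (deprecated since 3.3), so the shim above keeps the callback importable on both old and new interpreters. A minimal sketch of the kind of check that depends on this import (the `flatten` helper is illustrative, not from this repo):

```python
try:
    from collections.abc import Iterable  # Python >= 3.3; the alias in `collections` was removed in 3.10
except ImportError:
    from collections import Iterable


def flatten(values):
    """Yield leaf items from arbitrarily nested iterables, treating strings as atomic."""
    for value in values:
        if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
            yield from flatten(value)
        else:
            yield value


print(list(flatten([1, [2, [3, "ab"]]])))  # -> [1, 2, 3, 'ab']
```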

nvidia_tao_pytorch/core/mmlab/mmclassification/model_params_mapping.py (+11)
@@ -32,4 +32,15 @@
     "gc_vit_base": 1024,
     "gc_vit_large": 1536,
     "gc_vit_large_384": 1536,
+    "faster_vit_0_224": 512,  # FasterViT
+    "faster_vit_1_224": 640,
+    "faster_vit_2_224": 768,
+    "faster_vit_3_224": 1024,
+    "faster_vit_4_224": 1568,
+    "faster_vit_5_224": 2560,
+    "faster_vit_6_224": 2560,
+    "faster_vit_4_21k_224": 1568,
+    "faster_vit_4_21k_384": 1568,
+    "faster_vit_4_21k_512": 1568,
+    "faster_vit_4_21k_768": 1568,
 }}}
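These values appear to be the backbone output feature widths that downstream classification heads consume. A hedged sketch of the lookup pattern (the dict literal copies a few entries from the mapping above; the helper name is hypothetical, not from this repo):

```python
# A few entries copied from the mapping above; leaves are feature dims.
backbone_feature_dims = {
    "gc_vit_large_384": 1536,
    "faster_vit_0_224": 512,
    "faster_vit_4_21k_768": 1568,
}


def head_in_channels(backbone: str) -> int:
    """Return the classifier-head input width for a supported backbone."""
    try:
        return backbone_feature_dims[backbone]
    except KeyError as err:
        raise ValueError(f"Unsupported backbone: {backbone!r}") from err


assert head_in_channels("faster_vit_0_224") == 512
```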

nvidia_tao_pytorch/core/mmlab/mmclassification/utils.py (+5 -1)
@@ -201,7 +201,11 @@ def load_model(model_path, mmcls_config=None, return_ckpt=False):
     Returns:
         Returns the loaded model instance.
     """
-    temp = tempfile.NamedTemporaryFile(suffix='.pth', delete=False)
+    # Forcing delete to close.
+    temp = tempfile.NamedTemporaryFile(
+        suffix='.pth',
+        delete=True
+    )
     tmp_model_path = temp.name
 
     # Remove EMA related items from the state_dict
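The switch to `delete=True` means the temporary `.pth` file is removed as soon as the handle is closed, so sanitized checkpoints no longer accumulate in the temp directory. A minimal sketch of the behavior, independent of this repo (the close-deletes semantics shown hold on POSIX systems):

```python
import os
import tempfile

temp = tempfile.NamedTemporaryFile(suffix=".pth", delete=True)
path = temp.name
assert os.path.exists(path)   # usable while the handle is open
temp.close()                  # with delete=True, closing removes the file
assert not os.path.exists(path)
```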

nvidia_tao_pytorch/cv/__init__.py (+12)
@@ -26,3 +26,15 @@
 from third_party.onnx.utils import _export
 # Monkey Patch ONNX Export to disable onnxscript
 torch.onnx.utils._export = _export
+# Monkey Patch SDPA location
+torch.nn.functional.scaled_dot_product_attention = torch._C._nn._scaled_dot_product_attention  # noqa: pylint: disable=I1101
+
+
+if major_version >= 2:
+    # From https://github.com/pytorch/pytorch/blob/2efe4d809fdc94501fc38bf429e9a8d4205b51b6/torch/utils/tensorboard/_pytorch_graph.py#L384
+    def _node_get(node: torch._C.Node, key: str):  # noqa: pylint: disable=I1101
+        """Gets attributes of a node which is polymorphic over return type."""
+        sel = node.kindOf(key)
+        return getattr(node, sel)(key)
+
+    torch._C.Node.__getitem__ = _node_get  # noqa: pylint: disable=I1101
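The `_node_get` patch restores subscript access on TorchScript graph nodes (`node["attr"]`), which TensorBoard-style graph tracing relies on. A hedged sketch of what the patch enables (the model and input shape are placeholders):

```python
import torch

model = torch.nn.Linear(4, 2)
traced = torch.jit.trace(model, torch.randn(1, 4))

# With the patch applied, node[name] dispatches through kindOf() to the
# correctly-typed accessor for each attribute of the graph node.
for node in traced.graph.nodes():
    for name in node.attributeNames():
        print(node.kind(), name, node[name])
```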

nvidia_tao_pytorch/cv/action_recognition/config/default_config.py (+1)
@@ -92,6 +92,7 @@ class ARTrainExpConfig:
 
     results_dir: Optional[str] = None
     gpu_ids: List[int] = field(default_factory=lambda: [0])
+    num_gpus: int = 1
     resume_training_checkpoint_path: Optional[str] = None
     optim: OptimConfig = OptimConfig()
     num_epochs: int = 10
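The new `num_gpus` field gives experiment specs an explicit GPU count alongside `gpu_ids`. A hedged sketch of setting both consistently through OmegaConf, which is in the container's requirements and commonly backs such dataclass configs (the import path comes from the diff; the override values are placeholders):

```python
from omegaconf import OmegaConf

from nvidia_tao_pytorch.cv.action_recognition.config.default_config import ARTrainExpConfig

cfg = OmegaConf.structured(ARTrainExpConfig)
cfg.num_gpus = 2          # new field introduced by this commit
cfg.gpu_ids = [0, 1]      # keep the index list in step with num_gpus
print(OmegaConf.to_yaml(cfg))
```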
