GPU/CUDA/NVIDIA experimentations for GPUs in the cloud
Experiments:
- Terraform: `raw1`: Simple Terraform experimentation. Later developed into [https://github.com/sohale/gpu-experimentations/tree/main/provisioning_scripts/terraform]
- LLVM: `raw-llvm`: Hands-on, home-made, hard-coded LLVM code
- TVM: `3tvm`: TVM (framework/DSL for neural network inference). (For TVM open-source tickets)
- Lean4: `4leannet1`: (moved to 5) Early Lean4 experiments
- Triton: `4triton`: (For OpenAI Triton open-source tickets)
- Lean4: `leannn5`: Lean4 experiments
- CUDA: `6_cuda_rggbuff`: Simple CUDA code for an RGBA buffer
- MLIR: `7_mlir`: MLIR experiment 1: full MLIR build, build scripts, and my own Docker build (Dockerfile) for containerised MLIR development (for MLIR open-source tickets)
- MLIR: `8_mlir_nn`: MLIR experiment 2: Neural network (cancelled)
- MLIR: `9_mlir_neo_refactor`: MLIR experiment 3: Neural network (with better build and container), as support for a compiler project. See [https://github.com/sohale/gpu-experimentations/tree/main/provisioning_scripts/mlir_env]. Also LLVM debugging using `lldb`
(Clang toolchain).
- PTX: `10_mcmc_ptx`: MCMC using PTX (hard-coded directly in NVIDIA's assembly language, on top of).
  - Low-level "Parallel-Thread Execution ISA Version 8.3" (almost architecture-independent, using an "as-if virtual machine")
  - See PTX (pdf) and PTX (html)
  - `PTX` is itself on top of `SASS` (proprietary): SASS' .yacc and SASS' .lex, `ptxas`
  - Also see cuda_api.h and cuda_runtime_api.cc, and the .lex file ptx.l, on `gpgpusim`
  - PTX opcodes: opcodes.def
  - Cool from GPGPUSIM: gpgpu_context.h. They even have an OpenCL runtime API: opencl_runtime_api.cc
  - CUDA-level: cuda_runtime_api.cc for CUDA-level and instructions.cc
  - CUDA memory model:
  - CUDA device runtime: cuda_device_runtime.cc
  - power_stat.h, taking into account POD, DRAM, interconnect.
- CUDA: `11_matrix_cuda`: Advanced CUDA optimisation techniques + profiling, for matrix multiplication
- FPGA: `12_fpga_aws`: FPGA in the cloud using AWS's F2, utilising Xilinx hardware and Amaranth HDL (open-source hardware description language), as part of heterogeneous computing
- CUDA: `13_cuda_sharedmem`: Advanced CUDA+PTX optimisation techniques + profiling (for experimentation with CUDA / CC architectures)