Some scripts and tools to help me manage my programs on the Palmetto Cluster
do this first to install the palmetto-scripts (takes ~5 minutes to run). You only have to do this once.
bash <(curl -s https://raw.githubusercontent.com/dougnd/palmetto-scripts/master/bin/basicSetup.sh)
getGPULikeNode # get a node for installation purposes
dinstall tensorflow
exit # leave the node
qsub -I -l select=1:ncpus=1:mem=10gb:ngpus=2:gpu_model=k40,walltime=0:30:00
cd $TMPDIR
wget https://github.com/tensorflow/tensorflow/tarball/master # may have to try this more than once
tar xf master
cd tensorflow-tensorflow-*/tensorflow/models/image/mnist
python convolutional.py
exit # leave the node
getGPULikeNode # get a node for installation purposes
dinstall caffe_cudnn
exit # leave the node
Note: the following was added to the palmetto MOTD: "Jobs that request gpus but don't use them may be terminated without notice." Make sure if you are request a gpu, you actually are using it. You may want to develop on a local machine and deploy to palmetto once you know it works.
Futhermore, every GPU on a node is accessible to your job regardless of whether you request any. Thus, something like tensorflow will detect 2 GPU's and make use of them both even if you only request one. This is clearly not good since you may interfere with another job. Make sure you are only using GPU's assigned to you. If you are requesting a single GPU and want to know what device you were assigned, you can used the following (provided by the palmetto support team):
export gpuDev=$(qstat -f $PBS_JOBID | awk '/exec_vnode/ {
match($0, /'`hostname`'\[([0-9]+)\]/, grp);
print grp[1]
}')
You can then use the gpuDev enviornment variable in your scripts. e.g. :
echo "Using GPU: $gpuDev"
caffe device_query -gpu $gpuDev
caffe train -solver vehicleDetectorSolver.prototxt -gpu $gpuDev
dinstall [<options>...] <command> [<packages>...]
Command can be one of the following:
update
- updates dinstall and palmetto-scripts (pulls from github).install
- installs packages and their dependencies. Multiple packages can be supplied.
e.g.:dinstall install caffe_cudnn tensorflow
(installs caffe and tensorflow as well as all thier dependencies)uninstall
- removes packages. Multiple packages can be supplied.upgrade
- upgrades packages. Multiple packages can be supplied.list
- lists available packages.list installed
- lists installed packages.
Options
--help
prints basic usage information.--version
prints version information.--ignore-binaries
installs from source, ignoring available binaries.
getCPUNode
gets a node (6 cores, 10GB, 6 hr)getGPUNode
gets a GPU node with a k40 (6 cores, 10GB, 6 hr)getGPULikeNode
gets a node without a GPU, but the same architecture (6 cores, 10GB, 6 hr)