Optimizing Millions of Hyperparameters by Implicit Differentiation

This repository is an implementation of Optimizing Millions of Hyperparameters by Implicit Differentiation (Lorraine, Vicol, and Duvenaud, AISTATS 2020).
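The method computes hypergradients via the implicit function theorem, approximating the inverse training Hessian with a truncated Neumann series so that only Hessian-vector products are needed. Below is a minimal PyTorch sketch of that computation; the function name and interface are illustrative, not this repository's actual API.

import torch

def neumann_hypergradient(train_loss, val_loss, params, hparams, steps=3, alpha=0.1):
    # dLval/dw: the vector whose inverse-Hessian product we need
    v = torch.autograd.grad(val_loss, params, retain_graph=True)
    # dLtrain/dw with a graph, so Hessian-vector products cost one more backward
    dtrain = torch.autograd.grad(train_loss, params, create_graph=True)
    p = [vi.clone() for vi in v]
    acc = [vi.clone() for vi in v]          # i = 0 term of the Neumann series
    for _ in range(steps):
        hvp = torch.autograd.grad(dtrain, params, grad_outputs=p, retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]   # (I - alpha*H) p
        acc = [ai + pi for ai, pi in zip(acc, p)]
    ihvp = [alpha * ai for ai in acc]       # approx H^{-1} v
    # Mixed partial d/dlambda of (dLtrain/dw . ihvp), negated per the IFT
    hyper = torch.autograd.grad(dtrain, hparams, grad_outputs=ihvp)
    return [-g for g in hyper]

The paper relates a k-term Neumann approximation to differentiating through k steps of unrolled optimization, which is why more terms buy a better hypergradient at the cost of extra Hessian-vector products.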

Running Experiments

Setup Environment

Create a Python 3.7 environment and install required packages:

conda create -n ift-env python=3.7
source activate ift-env
conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
pip install -r requirements.txt

Install JupyterLab:

conda install -c conda-forge jupyterlab

Simple test

Run the following test to verify the environment is set up correctly:

mnist_test.py

python mnist_test.py
  --datasize <training set size>
  --valsize <validation set size>
  --lrh <hyperparameter learning rate (must be negative)>
  --epochs <minimum number of epochs for training the model>
  --hepochs <number of hyperparameter update iterations>
  --l2 <initial log weight decay>
  --restart <whether to reinitialize model weights after each hyperparameter update>
  --model <cnn for a LeNet-like model, mlp for logistic regression and MLP>
  --dataset <CIFAR10 or MNIST>
  --num_layers <number of hidden layers for the mlp>
  --hessian <KFAC: K-FAC estimate; direct: true Hessian and its inverse>
  --jacobian <direct: true Jacobian; product: use d_L/d_theta * d_L/d_lambda>

After each hyperparameter update, the trained model is saved to the folder defined at line 627 of mnist_test.py. To use conjugate gradient (CG) to compute the inverse Hessian, change the hyperparameter updater at line 660.
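For reference, here is a minimal sketch of the conjugate-gradient solve such an updater performs: it approximates the inverse-Hessian-vector product H^{-1} v without ever forming H, assuming an hvp_fn that returns H @ x via double backprop (as in the sketch above; all names here are illustrative).

import torch

def conjugate_gradient(hvp_fn, v, iters=10, tol=1e-10):
    # Solve H x = v with plain CG; H is only accessed through hvp_fn
    x = torch.zeros_like(v)
    r = v.clone()              # residual v - H @ x (x starts at 0)
    p = v.clone()              # search direction
    rs = r.dot(r)
    for _ in range(iters):
        Hp = hvp_fn(p)
        step = rs / p.dot(Hp)
        x = x + step * p
        r = r - step * Hp
        rs_new = r.dot(r)
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

An example invocation of mnist_test.py: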

python mnist_test.py --datasize 40000 --valsize 10000 --lrh 0.01 --epochs=100 --hepochs=10 --l2=1e-5 --restart=10 --model=mlp --dataset=MNIST --num_layers=1 --hessian=KFAC --jacobian=direct
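Since --l2 is an initial log weight decay, the regularizer it parameterizes looks like the following sketch (the function name is mine; the log-space assumption follows the flag's description):

import torch

def l2_penalty(params, log_wd):
    # exp keeps the effective decay positive while log_wd ranges freely
    return torch.exp(log_wd) * sum((p ** 2).sum() for p in params)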

Deployment

First, make sure you are on the master node:

ssh <USERNAME>@q.vectorinstitute.ai

Submit a job to the Slurm scheduler:

srun --partition=gpu --gres=gpu:1 --mem=4GB python mnist_test.py

Or, submit a batch of jobs defined by srun_script.sh:

sbatch --array=0-2 srun_script.sh
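Each array task reads its index from the standard SLURM_ARRAY_TASK_ID environment variable and picks its own configuration. A hypothetical Python dispatcher showing the pattern srun_script.sh relies on (the configurations below are made up for illustration):

import os
import subprocess

# One entry per array index passed to sbatch --array=0-2
CONFIGS = [
    {"l2": "1e-5", "model": "mlp"},
    {"l2": "1e-4", "model": "mlp"},
    {"l2": "1e-5", "model": "cnn"},
]

task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
cfg = CONFIGS[task_id]
subprocess.run(
    ["python", "mnist_test.py", f"--l2={cfg['l2']}", f"--model={cfg['model']}"],
    check=True,
)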

View queued jobs for a user:

squeue -u $USERNAME

Cancel jobs for a user:

scancel -u $USERNAME

Cancel a specific job:

scancel $JOBID

Experiments

This section collects the commands for deploying experiments, both with and without Slurm.

To deploy data generation for all of the experiments:

sbatch run_all.sh

Train Data Augmentation Network and/or Loss Reweighting Network

Data Augmentation Network

python train_augment_net2.py --use_augment_net
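The idea is a network whose parameters are treated as hyperparameters: it perturbs training inputs and is tuned on the validation loss. A toy stand-in for the interface (models/unet.py suggests the real network is a U-Net; everything below is illustrative):

import torch
import torch.nn as nn

class AugmentNet(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels + 1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        # A noise channel drives stochastic augmentation of the input
        noise = torch.randn_like(x[:, :1])
        return x + self.body(torch.cat([x, noise], dim=1))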

Loss Reweighting Network

python train_augment_net2.py --use_reweighting_net --loss_weight_type=softmax
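Loss reweighting follows the same recipe: a small network produces per-example weights, normalized with a softmax when --loss_weight_type=softmax. A minimal sketch, assuming the network scores each example by its loss value (the repository's network may condition on other inputs):

import torch
import torch.nn as nn
import torch.nn.functional as F

class LossReweightNet(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, per_example_losses):
        # Detach so the weights depend on loss values, not their gradients
        scores = self.net(per_example_losses.detach().unsqueeze(1)).squeeze(1)
        return F.softmax(scores, dim=0)   # weights sum to 1 over the batch

# usage: weights = net(losses); weighted_loss = (weights * losses).sum()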

Regularization Experiments

LSTM Experiments

The LSTM code in this repository is built on the AWD-LSTM codebase. These commands should be run from inside the rnn folder.

First, download the PTB dataset by running:

./getdata.sh

Tune LSTM hyperparameters with 1-step unrolling

python train.py
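1-step unrolling approximates the hypergradient by differentiating the validation loss through a single differentiable training step. A self-contained sketch of that computation (function names are mine, not train.py's):

import torch

def one_step_unrolled_hypergrad(train_loss_fn, val_loss_fn, params, hparams, lr=0.1):
    # Differentiable SGD step on the training loss
    grads = torch.autograd.grad(train_loss_fn(params, hparams), params, create_graph=True)
    new_params = [p - lr * g for p, g in zip(params, grads)]
    # Backprop the validation loss through that step into the hyperparameters
    return torch.autograd.grad(val_loss_fn(new_params), hparams)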

STN Comparison

To train an STN, run the following command from inside the stn folder:

python hypertrain.py --tune_all --save
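STNs (Self-Tuning Networks) make layer weights a function of the current hyperparameters, so the validation loss is differentiable with respect to them; stn/hypermodels implements such layers. A schematic HyperLinear layer, illustrative rather than the repository's actual API:

import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    def __init__(self, in_dim, out_dim, n_hparams):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)            # ordinary weights
        self.delta = nn.Linear(in_dim, out_dim)           # response direction
        self.scale = nn.Linear(n_hparams, 1, bias=False)  # hyperparameter gate

    def forward(self, x, hparams):
        # The output shifts smoothly as the hyperparameters change
        return self.base(x) + self.scale(hparams) * self.delta(x)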

Train a baseline model to get a checkpoint

python train_checkpoint.py --dataset cifar10 --model resnet18 --data_augmentation

Finetune the trained checkpoint

python finetune_checkpoint.py --load_checkpoint=baseline_checkpoints/cifar10_resnet18_sgdm_lr0.1_wd0.0005_aug1.pt --num_finetune_epochs=10 --wdecay=1e-4

Experiment 1

Explain what the experiment does and which figure in the paper it corresponds to.

To run the Python script:

python script.py

To deploy with Slurm:

srun ...

Project Structure

.
├── HAM_dataset.py
├── README.md
├── cutout.py
├── data_loaders.py
├── finetune_checkpoint.py
├── finetune_ift_checkpoint.py
├── grid_search.py
├── images
├── inverse_comparison.py
├── isic_config.py
├── isic_loader.py
├── kfac.py
├── kfac_utils.py
├── minst_ref.py
├── mnist_test.py
├── models
│   ├── __init__.py
│   ├── resnet.py
│   ├── resnet_cifar.py
│   ├── simple_models.py
│   ├── unet.py
│   └── wide_resnet.py
├── papers
│   ├── haoping_project
│   │   ├── main.tex
│   │   ├── neurips2019.tex
│   │   ├── neurips_2019.sty
│   │   └── references.bib
│   └── nips
│       ├── main.tex
│       ├── neurips_2019.sty
│       └── references.bib
├── random_search.py
├── requirements.txt
├── rnn
│   ├── config_scripts
│   │   ├── dropoute_ift_no_lrdecay.yaml
│   │   ├── dropouto
│   │   │   ├── dropouto_2layer_lrdecay.yaml
│   │   │   ├── dropouto_2layer_no_lrdecay.yaml
│   │   │   ├── dropouto_ift_lrdecay.yaml
│   │   │   ├── dropouto_ift_neumann_1_lrdecay.yaml
│   │   │   ├── dropouto_ift_neumann_1_no_lrdecay.yaml
│   │   │   ├── dropouto_ift_no_lrdecay.yaml
│   │   │   ├── dropouto_lrdecay.yaml
│   │   │   ├── dropouto_no_lrdecay.yaml
│   │   │   └── dropouto_perparam_ift_no_lrdecay.yaml
│   │   └── wdecay
│   │       ├── ift_wdecay_per_param_no_lrdecay.yaml
│   │       ├── wdecay_ift_lrdecay.yaml
│   │       └── wdecay_ift_neumann_1_lrdecay.yaml
│   ├── create_command_script.py
│   ├── data.py
│   ├── embed_regularize.py
│   ├── getdata.sh
│   ├── locked_dropout.py
│   ├── logger.py
│   ├── model_basic.py
│   ├── plot_utils.py
│   ├── rnn_utils.py
│   ├── run_grid_search.py
│   ├── train.py
│   ├── train2.py
│   └── weight_drop.py
├── search_configs
│   ├── cifar100_wideresnet_bern_dropout_sep.yaml
│   ├── cifar100_wideresnet_gauss_dropout_sep.yaml
│   ├── cifar10_resnet32_data_aug.yaml
│   ├── cifar10_resnet32_grid.yaml
│   ├── cifar10_resnet32_random.yaml
│   ├── cifar10_resnet32_wdecay_per_layer.yaml
│   ├── cifar10_wideresnet_bern_dropout.yaml
│   ├── cifar10_wideresnet_bern_dropout_sep.yaml
│   ├── cifar10_wideresnet_gauss_dropout.yaml
│   ├── cifar10_wideresnet_gauss_dropout_sep.yaml
│   ├── isic_grid.yaml
│   └── isic_random.yaml
├── search_scripts
│   ├── cifar100_wideresnet_bern_dropout_sep
│   ├── cifar100_wideresnet_gauss_dropout_sep
│   ├── cifar100_wideresnet_random
│   ├── cifar10_wideresnet_bern_dropout
│   ├── cifar10_wideresnet_bern_dropout_sep
│   ├── cifar10_wideresnet_gauss_dropout
│   └── cifar10_wideresnet_gauss_dropout_sep
├── srun_script.sh
├── stn
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── cifar.py
│   │   └── loaders.py
│   ├── hypermodels
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   ├── hyperconv2d.py
│   │   ├── hyperlinear.py
│   │   └── small.py
│   ├── hypertrain.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── alexnet.py
│   │   └── small.py
│   └── util
│       ├── __init__.py
│       ├── cutout.py
│       ├── dropout.py
│       └── hyperparameter.py
├── train.py
├── train_augment_net2.py
├── train_augment_net_graph.py
├── train_augment_net_multiple.py
├── train_augment_net_slurm.py
├── train_baseline.py
├── train_checkpoint.py
└── utils
    ├── csv_logger.py
    ├── discrete_utils.py
    ├── logger.py
    ├── plot_utils.py
    └── util.py

17 directories, 103 files

Authors
