Today, while running the deep learning project adversarial-code-generation, I found that it uses docker, and then I ran into the following error:

[DBG]:   + Image built!
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: 1: unknown device: unknown.
Makefile:818: recipe for target 'train-model-seq2seq' failed
make: *** [train-model-seq2seq] Error 125
Command exited with non-zero status 2
0.94user 1.92system 0:13.92elapsed 20%CPU (0avgtext+0avgdata 61912maxresident)k
1439376inputs+1317112outputs (267major+24941minor)pagefaults 0swaps

This is the script I was running:

ARGS="--regular_training --epochs 10" \
GPU=1 \
MODELS_OUT=final-models/seq2seq/sri/py150/ \
DATASET_NAME=datasets/transformed/preprocessed/tokens/sri/py150/transforms.Identity \
time make train-model-seq2seq

Solution

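The fixes I found online that involve downgrading versions did not seem to work for me. I then noticed that nvidia-docker is what hands the GPU to the container, so I changed the GPU parameter. I only have a single 1080 Ti, so presumably the only valid device index on my machine is 0, and GPU=1 was asking for a device that does not exist (which would match the "unknown device" in the error). With GPU=0 the command becomes: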

ARGS="--regular_training --epochs 10" \
GPU=0 \
MODELS_OUT=final-models/seq2seq/sri/py150/ \
DATASET_NAME=datasets/transformed/preprocessed/tokens/sri/py150/transforms.Identity \
time make train-model-seq2seq
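
Before launching, it can help to check which GPU indices actually exist on the host. The following is only a rough sketch, under the assumption that the GPU variable in this Makefile is a zero-based device index (the fix above suggests it, but I have not confirmed it in the Makefile). The helper name gpu_index_ok is made up for illustration; nvidia-smi -L is the standard command that prints one "GPU <index>: ..." line per visible device.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): succeeds only when the
# requested index is an integer in the range [0, gpu_count).
gpu_index_ok() {
    # $1 = requested GPU index, $2 = number of GPUs on the host
    [ "$1" -ge 0 ] 2>/dev/null && [ "$1" -lt "$2" ] 2>/dev/null
}

# Assumption: GPU holds the index that will be passed to make.
REQUESTED_GPU="${GPU:-0}"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Count the "GPU N: ..." lines printed by nvidia-smi -L.
    gpu_count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)
    if gpu_index_ok "$REQUESTED_GPU" "$gpu_count"; then
        echo "GPU=$REQUESTED_GPU looks valid ($gpu_count device(s) found)"
    else
        echo "GPU=$REQUESTED_GPU is out of range ($gpu_count device(s) found)"
    fi
else
    echo "nvidia-smi not found; cannot check GPU indices"
fi
```

On a machine with one card, this should report GPU=0 as valid and GPU=1 as out of range, which is consistent with the change that fixed the error for me.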