Docker
======

Docker images are created for each release. They come with all the benchmarks
and the necessary datasets preinstalled, so no additional downloads are
necessary.

CUDA
----

Requirements
^^^^^^^^^^^^

* NVIDIA driver
* ``docker-ce``
* ``nvidia-docker``

Usage
^^^^^

The commands below will download the latest CUDA container and run milabench
right away, storing the results inside the ``results`` folder on the host
machine:

.. code-block:: bash

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host --gpus=all \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run

``--ipc=host`` removes shared memory restrictions, but you can also set
``--shm-size`` to a high value instead (at least ``8G``, possibly more).

Each run stores its results in a unique directory under ``results/`` on the
host machine. To generate a readable report of the results, run:

.. code-block:: bash

   # Show the performance report
   docker run -it --rm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench report --runs /milabench/envs/runs

ROCm
----

Requirements
^^^^^^^^^^^^

* rocm
* docker

Usage
^^^^^

For ROCm the usage is similar to CUDA, but you must use a different image and
the Docker options are a bit different:

.. code-block:: bash

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host \
         --device=/dev/kfd --device=/dev/dri \
         --security-opt seccomp=unconfined --group-add video \
         -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
         -v /opt/rocm:/opt/rocm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run

The performance report is generated with the same command:

.. code-block:: bash

   # Show the performance report
   docker run -it --rm \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench report --runs /milabench/envs/runs

Multi-node benchmark
^^^^^^^^^^^^^^^^^^^^

There are currently two multi-node benchmarks: ``opt-1_3b-multinode``
(data-parallel) and ``opt-6_7b-multinode`` (model-parallel; that model is too
large to fit on a single GPU). Here is how to run them:

1. Set up two or more machines that can see each other on the network. Suppose
   there are two and their addresses are:

   * ``manager-node`` ⬅ this is the node you will launch the job on
   * ``worker-node``

2. ``docker pull`` the image on both nodes.
3. Before running the benchmark, create an SSH key pair on ``manager-node``
   and set up public key authentication to the other nodes (in this case,
   ``worker-node``); a minimal example is shown after this list.
4. Write an override file that tells milabench about the network (see below).
   Note that you will need to copy/paste the same configuration for both
   multinode tests.
5. On ``manager-node``, execute ``milabench run`` via Docker.

   * Mount the private key at ``/milabench/id_milabench`` in the container.
   * Use ``--override "$(cat overrides.yaml)"`` to pass the overrides.
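A minimal sketch of the key setup for step 3, using standard OpenSSH tooling;
the key path and the ``username`` account are placeholders to adapt to your
environment:

.. code-block:: bash

   # On manager-node: generate a key pair (no passphrase, for unattended use)
   ssh-keygen -t rsa -f $HOME/.ssh/id_rsa -N ""

   # Copy the public key to each worker node to enable public key authentication
   ssh-copy-id -i $HOME/.ssh/id_rsa.pub username@worker-node

   # Check that the manager can log into the worker without a password prompt
   ssh -i $HOME/.ssh/id_rsa username@worker-node hostname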
Example YAML configuration (``overrides.yaml``):

.. code-block:: yaml

   # Name of the benchmark. You can also override values in other benchmarks.
   opt-6_7b-multinode:
     # Docker image to use on the worker nodes (should be the same as the manager's)
     docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"

     # The user on worker-node that public key auth is set up for
     worker_user: "username"

     # Address of the manager node from the worker nodes
     manager_addr: "manager-node"

     # Addresses of the worker nodes (do not include the manager node,
     # although it is also technically a worker node)
     worker_addrs:
       - "worker-node"

     # Make sure that this is equal to length(worker_addrs) + 1
     num_machines: 2

     capabilities:
       # Make sure that this is ALSO equal to length(worker_addrs) + 1
       nodes: 2

   opt-1_3b-multinode:
     # Copy the contents of the opt-6_7b-multinode section without any changes.
     docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
     worker_user: "username"
     manager_addr: "manager-node"
     worker_addrs:
       - "worker-node"
     num_machines: 2
     capabilities:
       nodes: 2

Then, the command should look like this:

.. code-block:: bash

   # On manager-node:

   # Change if needed
   export SSH_KEY_FILE=$HOME/.ssh/id_rsa

   docker run -it --rm --gpus all --network host --ipc=host --privileged \
         -v $SSH_KEY_FILE:/milabench/id_milabench \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run --override "$(cat overrides.yaml)" \
         --select multinode

The last line (``--select multinode``) specifically selects the multi-node
benchmarks. Omit it to run all benchmarks.

If you need to use more than two nodes, edit or copy ``overrides.yaml`` and
add the other nodes' addresses to ``worker_addrs``, adjusting ``num_machines``
and ``capabilities.nodes`` accordingly. For example, for 4 nodes:

.. code-block:: yaml

   opt-6_7b-multinode:
     docker_image: "ghcr.io/mila-iqia/milabench:cuda-nightly"
     worker_user: "username"
     manager_addr: "manager-node"
     worker_addrs:
       - "worker-node1"
       - "worker-node2"
       - "worker-node3"
     num_machines: 4
     capabilities:
       nodes: 4

.. note::
   The multi-node benchmark is sensitive to network performance. If the
   mono-node benchmark ``opt-6_7b`` is significantly faster than
   ``opt-6_7b-multinode`` (e.g. it processes more than twice as many items per
   second), this likely indicates that Infiniband is either not present or not
   used. (It is not abnormal for the multinode benchmark to perform *a bit*
   worse than the mono-node benchmark, since it has not been optimized to
   minimize the impact of communication costs.)

   Even if Infiniband is properly configured, the benchmark may fail to use it
   unless the ``--privileged`` flag is set when running the container.

Building images
---------------

Images can be built locally for prototyping and testing.

.. code-block:: bash

   docker build -f docker/Dockerfile-cuda -t milabench:cuda-nightly --build-arg CONFIG=standard.yaml .

Or for ROCm:

.. code-block:: bash

   docker build -f docker/Dockerfile-rocm -t milabench:rocm-nightly --build-arg CONFIG=standard.yaml .
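A locally built image is run the same way as the published ones; a minimal
sketch, assuming the ``milabench:cuda-nightly`` tag from the CUDA build
command above:

.. code-block:: bash

   # Point MILABENCH_IMAGE at the local tag instead of the ghcr.io image
   export MILABENCH_IMAGE=milabench:cuda-nightly

   # Run milabench exactly as described in the CUDA usage section above
   docker run -it --rm --ipc=host --gpus=all \
         -v $(pwd)/results:/milabench/envs/runs \
         $MILABENCH_IMAGE \
         milabench run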