{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NVSHMEM Device API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're finally ready to tackle the functionality we ultimately want from NVSHMEM: device-side memory operations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Returning to peer-to-peer operations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Earlier we saw that peer-to-peer GPU connections over NVLink support directly writing to or reading from peer GPU memory. (Indeed, this is what is done under the hood with a host-side call like `nvshmem_int_sum_reduce()`.) We used that to perform the CUDA operation\n", "```\n", " atomicAdd(hits, 1);\n", "```\n", "even when `hits` pointed to device memory that wasn't local to the GPU performing the atomic add. How do we do the analogous operation in device code using NVSHMEM? Specifically, we want to achieve something similar to what we did in [exercises/monte_carlo_nvshmem_cuda_peer.cpp](exercises/monte_carlo_nvshmem_cuda_peer.cpp).\n", "\n", "Fortunately, [NVSHMEM directly exposes atomic operations](https://docs.nvidia.com/hpc-sdk/nvshmem/api/docs/gen/api/amo.html#). The one we're looking for is called [nvshmem_int_atomic_add](https://docs.nvidia.com/hpc-sdk/nvshmem/api/docs/gen/api/amo.html#nvshmem-atomic-add) (although, for this use case, `nvshmem_int_atomic_inc` would also work). It looks a lot like CUDA's `atomicAdd`, but takes as a third argument the ID of the PE to do the atomic add to. Let's look at [exercises/monte_carlo_nvshmem_device.cpp](exercises/monte_carlo_nvshmem_device.cpp)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "NUM_DEVICES = !nvidia-smi -L | wc -l\n", "NUM_DEVICES = int(NUM_DEVICES[0])" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "src/comm/transports/ibrc/ibrc.cpp:1445: NULL value get_device_list failed \n", "src/comm/transports/ibrc/ibrc.cpp:1445: NULL value get_device_list failed \n", "src/comm/transports/ibrc/ibrc.cpp:1445: NULL value get_device_list failed \n", "src/comm/transports/ibrc/ibrc.cpp:1445: NULL value get_device_list failed \n", "Estimated value of pi = 3.14072\n", "Error = 0.000277734\n" ] } ], "source": [ "!nvcc -ccbin=mpicxx -x cu -arch=sm_70 -rdc=true -I $NVSHMEM_HOME/include -L $NVSHMEM_HOME/lib -lnvshmem -lcuda -o monte_carlo_nvshmem_device solutions/monte_carlo_nvshmem_device.cpp\n", "!mpirun -np $NUM_DEVICES ./monte_carlo_nvshmem_device" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Device-side collectives" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 4 }