<div align="center"><img src="./images/DLI_Header.png"></div>

# Monte Carlo Approximation of $\pi$ - NVSHMEM with Distributed Work

In this notebook you will again use NVSHMEM in the Monte Carlo approximation of $\pi$, this time distributing the work across multiple GPUs.

## Objectives

By the time you complete this notebook you will:

- Be able to distribute work across multiple GPUs using NVSHMEM.

## Exercise: Distribute Work Across PEs

In this exercise you will divide the $N$ sample points evenly among the $M$ PEs, so that each GPU processes $N/M$ points. The number of PEs is available through the API `nvshmem_n_pes()`:

```cpp
int n_pes = nvshmem_n_pes();
```

Dividing $N$ by `n_pes` then gives the number of sample points each PE handles. To make this more interesting, also have each GPU do *different* work by giving each PE a unique random seed:

```cpp
int seed = nvshmem_my_pe();
```
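The split and the per-PE seeding can be sketched on the host without any GPU. This is a minimal illustration, not the exercise solution: `samples_per_pe` and `estimate_pi` are hypothetical helper names, the remainder of $N$ divided by the PE count is simply dropped, and a standard-library generator stands in for whatever random number generator the exercise kernel uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: number of sample points one PE processes when N points
// are split evenly across n_pes PEs. Any remainder is dropped for simplicity.
inline std::int64_t samples_per_pe(std::int64_t N, int n_pes) {
    assert(n_pes > 0);
    return N / n_pes;
}

// Hypothetical host-side stand-in for the GPU kernel: estimate pi from n
// uniformly random points in the unit square, using a caller-supplied seed.
inline double estimate_pi(std::int64_t n, int seed) {
    std::mt19937_64 rng(static_cast<std::uint64_t>(seed));
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::int64_t hits = 0;
    for (std::int64_t i = 0; i < n; ++i) {
        const double x = dist(rng);
        const double y = dist(rng);
        if (x * x + y * y <= 1.0) {
            ++hits;  // point landed inside the quarter circle
        }
    }
    return 4.0 * static_cast<double>(hits) / static_cast<double>(n);
}
```

In the exercise, the second argument of `samples_per_pe` would be the value returned by `nvshmem_n_pes()` and the seed passed to the estimator would be the value returned by `nvshmem_my_pe()`, so each PE draws a different stream of points.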

Please work in [exercises/nvshmem_pi_step2.cpp](exercises/nvshmem_pi_step2.cpp). As before, deal with any `FIXME` locations, then come back here to compile and run. You should observe that each PE reports a slightly different answer, and that each answer is less accurate than before because each PE now uses fewer sample points.
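
To see why fewer sample points mean a less accurate answer, note that each point lands inside the quarter circle with probability $p = \pi/4$. Assuming the points are independent and uniformly distributed, the standard error of an estimate built from $n$ points is

$$
\sigma(n) = 4\sqrt{\frac{p(1-p)}{n}} \approx \frac{1.64}{\sqrt{n}}.
$$

With $n = N/M$ points per PE this gives

$$
\sigma\!\left(\frac{N}{M}\right) = \sqrt{M}\,\sigma(N),
$$

so, for example, with $M = 4$ PEs each individual estimate is expected to be about twice as noisy as one computed from all $N$ points.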

Check the [solution](solutions/nvshmem_pi_step2.cpp) if you're stuck.

!nvcc -x cu -arch=sm_70 -rdc=true -I $NVSHMEM_HOME/include -L $NVSHMEM_HOME/lib -lnvshmem -lcuda -o nvshmem_pi_step2 exercises/nvshmem_pi_step2.cpp
!nvshmrun -np $NUM_DEVICES ./nvshmem_pi_step2

## Next

In the next notebook you will learn about one of the key features of NVSHMEM, *symmetric memory*, which allows for fine-grained inter-GPU communication from device-side code.

Please open the next notebook: [_The NVSHMEM Memory Model_](09_MCπ-NVSHMEM-Sym.ipynb).