Hey @kevinr-
In this case, I would take the following approach:
- First, add some tests to `test_packages.py` that check whether `llama_cpp` can be imported and has the GPU access you expect. These will fail until the package is correctly installed. (A sketch of what such tests might look like follows this list.)
- Check to see whether there are any binaries on conda-forge for `llama-cpp-python` that are pre-compiled for GPU support, specifically for CUDA 11.8. We do this for other libraries in the environment file, e.g., `tensorflow=2.13.1=*cuda118*`. Different packages have different conventions for how they specify this (some include the `.` in `11.8`, others don't; some only use `cu` instead of `cuda`), so you'll have to do a bit of digging to figure that out (one way to dig is sketched after this list). If there are any such binaries, I would try to install those and test whether it works. This might look like modifying `environment-gpu.yml` to contain a line like `- llama-cpp-python=*=*cuda118*` and unpinning packages as necessary to get a solved environment. Once you have a solved environment, build the container (`make build`) and run the tests from step 1 (`make test-container`) to see whether it worked.
- If that doesn't work, I'd try adding a `- pip:` section to the environment file and modifying the `Dockerfile` so that the values of `CMAKE_ARGS` and `FORCE_CMAKE` are set appropriately (also sketched after this list). Then, similarly, try installing and see whether it works.
- If the above fails, try modifying the `Dockerfile` as necessary to get a functioning install (e.g., the `RUN` fallback below). Run the tests to ensure it's working.
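For step 1, here's a minimal sketch of the tests. The `llama_supports_gpu_offload()` check is an assumption on my part; whether the bindings expose it depends on the `llama-cpp-python` version, so swap in whatever GPU probe your installed version actually provides:

```python
# Hypothetical additions to test_packages.py.


def test_llama_cpp_importable():
    # The import alone fails if the package is missing or its shared
    # library was linked against an incompatible CUDA runtime.
    import llama_cpp  # noqa: F401


def test_llama_cpp_gpu_offload():
    import llama_cpp

    # Assumption: this mirrors llama.cpp's llama_supports_gpu_offload();
    # it returns True only when the library was compiled with GPU support.
    assert llama_cpp.llama_supports_gpu_offload()
```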
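For step 2, you can list the builds that conda-forge actually has and eyeball the build strings for a CUDA tag:

```sh
# Show available llama-cpp-python builds; the build string column is where
# a tag like cuda118 / cu118 would show up, if one exists.
conda search -c conda-forge llama-cpp-python
```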
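For step 3, the changes might look roughly like the fragments below. The `-DLLAMA_CUBLAS=on` flag is what llama.cpp builds of this vintage used to enable CUDA; newer releases renamed it, so check the `llama-cpp-python` docs for whatever version you pin:

```yaml
# environment-gpu.yml (fragment)
dependencies:
  # ... existing conda packages ...
  - pip
  - pip:
      - llama-cpp-python  # pin a version once you find one that solves
```

```dockerfile
# Dockerfile (fragment): tell pip to compile the wheel with CUDA support.
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=on" \
    FORCE_CMAKE=1
```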
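And for step 4, the bluntest fallback is to install straight from the `Dockerfile`, bypassing the environment file for this one package:

```dockerfile
# Compile llama-cpp-python from source at image-build time; flags as above.
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
    pip install --no-cache-dir llama-cpp-python
```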
Hope this helps get this sorted. Please note that per the rules (and as noted here), submissions may not use software that is not licensed under an Open Source Initiative license, which the Llama license is not, and so solutions that use Llama models would not qualify for prizes. It seems as though `llama-cpp-python` supports models beyond Llama, so it's not unreasonable to submit a PR that adds support for it.
-Chris