First, you need an NVIDIA RTX graphics card, for example a single 3090, or four of them.
watch -n 1 nvidia-smi
Every 1.0s: nvidia-smi ubuntu-cy: Sat Jul 6 09:06:15 2024
Sat Jul 6 09:06:15 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 29C P8 21W / 350W | 19113MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:02:00.0 Off | N/A |
| 30% 28C P8 29W / 350W | 18241MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce RTX 3090 Off | 00000000:03:00.0 Off | N/A |
| 30% 29C P8 17W / 350W | 18241MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce RTX 3090 Off | 00000000:04:00.0 Off | N/A |
| 30% 28C P8 23W / 350W | 18427MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 57721 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 58786 C python 19098MiB |
| 1 N/A N/A 57721 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 58786 C python 18226MiB |
| 2 N/A N/A 57721 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 58786 C python 18226MiB |
| 3 N/A N/A 57721 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 58786 C python 18412MiB |
+---------------------------------------------------------------------------------------+
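For scripted monitoring, `nvidia-smi` can also emit machine-readable CSV (e.g. `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`). A minimal Python sketch that parses such output and totals VRAM usage, using the per-GPU numbers from the table above as sample data:

```python
# Sample CSV as produced by:
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
# The values mirror the four RTX 3090s shown in the table above (MiB).
sample = """\
0, 19113, 24576
1, 18241, 24576
2, 18241, 24576
3, 18427, 24576
"""

def vram_summary(csv_text: str) -> tuple[int, int]:
    """Return (total used MiB, total capacity MiB) across all GPUs."""
    used = cap = 0
    for line in csv_text.strip().splitlines():
        _idx, u, c = (int(field.strip()) for field in line.split(","))
        used += u
        cap += c
    return used, cap

used, cap = vram_summary(sample)
print(f"{used} / {cap} MiB used ({used / cap:.0%})")  # → 74022 / 98304 MiB used (75%)
```

In practice you would feed the function the live output of the `nvidia-smi` command above instead of the hard-coded sample.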
Inference engines:
Ollama
llama.cpp (text-generation-webui-main)
Transformers
About Ollama:
It has a keep_alive parameter, which defaults to 5 minutes.
A negative value means the model stays loaded indefinitely and is never unloaded automatically;
a positive value such as 20m unloads the model after 20 minutes without a request.
OLLAMA_KEEP_ALIVE=1s ollama serve
ollama run qwen2:0.5b --keepalive 1s
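The keep_alive rule described above can be sketched as a small Python function. This is an illustrative model of the behavior, not Ollama's actual implementation:

```python
def should_unload(keep_alive_s: float, idle_s: float) -> bool:
    """Sketch of the keep_alive rule: a negative value keeps the model
    loaded forever; otherwise the model is unloaded once it has been
    idle for at least keep_alive_s seconds (default 5 min = 300 s)."""
    if keep_alive_s < 0:
        return False          # negative: never unload
    return idle_s >= keep_alive_s

print(should_unload(-1, 10_000))  # negative keep_alive: never unload → False
print(should_unload(1200, 1500))  # 20 min window, idle 25 min → True
print(should_unload(300, 60))     # default 5 min, idle only 1 min → False
```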