Installing AMD GPU Drivers, ROCm, and HIP on Ubuntu Linux

This guide describes how to install the GPU drivers together with the ROCm (Radeon Open Compute) and HIP stack. ROCm lets you run machine learning and AI workloads on modern AMD graphics cards, and HIP accelerates graphics processing, for example in Blender.

Warning

AMD graphics cards at HOSTKEY are guaranteed to work ONLY on Ubuntu 24.04 LTS!

Preparing the system

Before starting the installation, make sure the system meets the requirements:

  1. Check the OS: cat /etc/os-release; the output must contain VERSION_ID="24.04".

  2. Check the kernel: uname -r; Linux kernel 6.13 or newer is required. If necessary, install the latest available mainline kernel:

    sudo add-apt-repository ppa:cappelikan/ppa -y
    sudo apt update && sudo apt install -y mainline
    sudo mainline install-latest
    sudo reboot
    

  3. Update the system:

    sudo apt update && sudo apt upgrade -y
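The OS and kernel checks above can also be scripted. The following is a minimal sketch rather than part of the official procedure: the function names version_ge, kernel_mm, and preflight are illustrative, and the comparison assumes GNU sort with the -V (version sort) option, which Ubuntu ships.

```shell
#!/usr/bin/env bash
# Sketch: preflight check for the OS and kernel requirements listed above.

# version_ge A B: succeeds when version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_mm RELEASE: reduce a kernel release string to "major.minor".
kernel_mm() {
  printf '%s\n' "$1" | sed -nE 's/^([0-9]+)\.([0-9]+).*/\1.\2/p'
}

# preflight OS_VERSION KERNEL_RELEASE: print a verdict for each requirement.
# Returns non-zero if any requirement is not met.
preflight() {
  local os_version="$1" kernel_release="$2" rc=0
  if [ "$os_version" = "24.04" ]; then
    echo "OS: Ubuntu $os_version, OK"
  else
    echo "OS: VERSION_ID=$os_version, expected 24.04"
    rc=1
  fi
  if version_ge "$(kernel_mm "$kernel_release")" "6.13"; then
    echo "Kernel: $kernel_release, OK"
  else
    echo "Kernel: $kernel_release is older than 6.13"
    rc=1
  fi
  return "$rc"
}

# Run against the live system; VERSION_ID is read in a subshell so that
# sourcing /etc/os-release does not pollute the current environment.
preflight "$(. /etc/os-release && echo "${VERSION_ID:-unknown}")" "$(uname -r)" || true
```

A release string such as 6.8.0-45-generic (a hypothetical example) is reduced to 6.8 before the comparison, so the patch level and distribution suffix do not affect the result.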
    

Manual ROCm installation

  1. Install dependencies:

    sudo apt install -y wget gnupg2 build-essential dkms curl
    
  2. Remove old packages (recommended):

    sudo dpkg --configure -a
    sudo apt remove --purge -y rocminfo
    sudo apt purge -y 'rocm*' 'amdgpu*' 'graphics*' 'hip*'
    sudo apt autoremove -y
    sudo apt clean
    sudo rm -rf /etc/apt/sources.list.d/amdgpu* /etc/apt/sources.list.d/rocm* /etc/apt/sources.list.d/graphics*
    sudo apt update
    
  3. Add the ROCm "latest" repository:

    sudo install -d -m 0755 /usr/share/keyrings
    wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor | sudo tee /usr/share/keyrings/rocm-archive-keyring.gpg >/dev/null
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/rocm-archive-keyring.gpg] https://repo.radeon.com/rocm/apt/latest/ noble main" | sudo tee /etc/apt/sources.list.d/rocm.list >/dev/null
    sudo apt update
    
  4. Install the ROCm stack:

    sudo apt install -y rocm-dev rocm-libs rocm-hip-sdk rocm-smi-lib amd-smi-lib rocminfo
    
  5. Create the /opt/rocm symlink:

    ROCM_DIR=$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1)
    sudo ln -sfn "$ROCM_DIR" /opt/rocm
    echo "ROCm installed: $(basename "$ROCM_DIR")"
    
  6. Configure access permissions (add the user to the render and video groups):

    sudo usermod -aG render,video "$USER"
    
  7. Configure the paths in ~/.bashrc (run this in the same shell session as step 5, because it reuses ROCM_DIR):

    ROCM_VER=$(basename "$ROCM_DIR" | sed 's/rocm-//')
    cat >> ~/.bashrc << EOF
    
    # AMD ROCm Paths
    if [ -d "/opt/rocm-${ROCM_VER}" ]; then
    export PATH="/opt/rocm-${ROCM_VER}/bin:\$PATH"
    export LD_LIBRARY_PATH="/opt/rocm-${ROCM_VER}/hip/lib:/opt/rocm-${ROCM_VER}/lib:\$LD_LIBRARY_PATH"
    export ROCM_PATH="/opt/rocm-${ROCM_VER}"
    export HIP_CLANG_PATH="/opt/rocm-${ROCM_VER}/llvm/bin"
    fi
    EOF
    source ~/.bashrc
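Step 5 picks the directory with sort -V rather than a plain sort because version components must be compared numerically: with two installations side by side, say the hypothetical rocm-6.9.0 and rocm-6.10.0, a lexical sort would rank 6.9.0 last. Below is a small sketch of that selection. The helper names latest_rocm_dir and rocm_version are illustrative, and the optional base-directory parameter exists only so the logic can be tried outside /opt.

```shell
#!/usr/bin/env bash
# Sketch: how the newest /opt/rocm-X.Y.Z directory is selected and inspected.

# latest_rocm_dir [BASE]: print the highest-versioned BASE/rocm-X.Y.Z directory.
# BASE defaults to /opt. Prints nothing if no such directory exists.
latest_rocm_dir() {
  local base="${1:-/opt}"
  ls -d "$base"/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1
}

# rocm_version DIR: strip the "rocm-" prefix from the directory name.
rocm_version() {
  basename "$1" | sed 's/^rocm-//'
}

dir="$(latest_rocm_dir)"
if [ -n "$dir" ]; then
  echo "Newest ROCm directory: $dir (version $(rocm_version "$dir"))"
  # readlink without -f fails when /opt/rocm is missing or not a symlink.
  echo "/opt/rocm points to: $(readlink /opt/rocm 2>/dev/null || echo 'not a symlink')"
else
  echo "No /opt/rocm-X.Y.Z directory found"
fi
```

If the two printed paths differ, the symlink from step 5 is stale and can be recreated with the same ln -sfn command.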
    

Verifying the installation

After the installation is complete and the system has rebooted, check that the drivers work correctly. First, "wake up" the card with the following command:

echo on | sudo tee /sys/class/drm/card0/device/power/control
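
The command above addresses card0 only. On a server that also has an integrated GPU, the discrete card may be enumerated under a different index. As a hedged sketch, the AMD cards can be located by their PCI vendor ID 0x1002 in sysfs, the same ID the installer script later in this guide matches with lspci. The function names amd_cards and wake_amd_cards are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: find every AMD card in sysfs instead of assuming card0.

# amd_cards [ROOT]: list cardN directories under ROOT whose PCI vendor is AMD (0x1002).
# ROOT defaults to /sys/class/drm; connector entries such as card0-DP-1 are skipped.
amd_cards() {
  local root="${1:-/sys/class/drm}" card suffix
  for card in "$root"/card*; do
    suffix="$(basename "$card")"
    suffix="${suffix#card}"
    # Keep only names of the form card<digits>.
    case "$suffix" in
      ''|*[!0-9]*) continue ;;
    esac
    [ -r "$card/device/vendor" ] || continue
    if [ "$(cat "$card/device/vendor")" = "0x1002" ]; then
      echo "$card"
    fi
  done
}

# wake_amd_cards [ROOT]: write "on" to power/control of each AMD card (needs sudo).
wake_amd_cards() {
  amd_cards "${1:-/sys/class/drm}" | while IFS= read -r card; do
    [ -e "$card/device/power/control" ] || continue
    echo on | sudo tee "$card/device/power/control" >/dev/null
    echo "Set $card/device/power/control = on"
  done
}

# Listing only; nothing is modified until wake_amd_cards is called.
amd_cards
```

Running the block only lists the matching cards; call wake_amd_cards afterwards to write on to each of them.
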
  1. The rocminfo tool:

    rocminfo
    
    The command should list the available GPUs and their characteristics (HSA agents).

    Example rocminfo output after a successful driver and ROCm installation:
    ROCk module is loaded
    =====================
    HSA System Attributes
    =====================
    Runtime Version:         1.18
    Runtime Ext Version:     1.14
    System Timestamp Freq.:  1000.000000MHz
    Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
    Machine Model:           LARGE
    System Endianness:       LITTLE
    Mwaitx:                  DISABLED
    XNACK enabled:           NO
    DMAbuf Support:          YES
    VMM Support:             NO
    
    ==========
    HSA Agents
    ==========
    *******
    Agent 1
    *******
    Name:                    AMD Ryzen 9 7950X 16-Core Processor
    Uuid:                    CPU-XX
    Marketing Name:          AMD Ryzen 9 7950X 16-Core Processor
    Vendor Name:             CPU
    Feature:                 None specified
    Profile:                 FULL_PROFILE
    Float Round Mode:        NEAR
    Max Queue Number:        0(0x0)
    Queue Min Size:          0(0x0)
    Queue Max Size:          0(0x0)
    Queue Type:              MULTI
    Node:                    0
    Device Type:             CPU
    Cache Info:
        L1:                      32768(0x8000) KB
    Chip ID:                 0(0x0)
    ASIC Revision:           0(0x0)
    Cacheline Size:          64(0x40)
    Max Clock Freq. (MHz):   5881
    BDFID:                   0
    Internal Node ID:        0
    Compute Unit:            32
    SIMDs per CU:            0
    Shader Engines:          0
    Shader Arrs. per Eng.:   0
    WatchPts on Addr. Ranges:1
    Memory Properties:
    Features:                None
    Pool Info:
        Pool 1
        Segment:                 GLOBAL; FLAGS: FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 2
        Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 3
        Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
        Pool 4
        Segment:                 GLOBAL; FLAGS: COARSE GRAINED
        Size:                    130980620(0x7ce9b0c) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:4KB
        Alloc Alignment:         4KB
        Accessible by all:       TRUE
    ISA Info:
    *******
    Agent 2
    *******
    Name:                    gfx1036
    Uuid:                    GPU-XX
    Marketing Name:          AMD Radeon Graphics
    Vendor Name:             AMD
    Feature:                 KERNEL_DISPATCH
    Profile:                 BASE_PROFILE
    Float Round Mode:        NEAR
    Max Queue Number:        128(0x80)
    Queue Min Size:          64(0x40)
    Queue Max Size:          131072(0x20000)
    Queue Type:              MULTI
    Node:                    1
    Device Type:             GPU
    Cache Info:
        L1:                      16(0x10) KB
        L2:                      256(0x100) KB
    Chip ID:                 5710(0x164e)
    ASIC Revision:           1(0x1)
    Cacheline Size:          64(0x40)
    Max Clock Freq. (MHz):   2200
    BDFID:                   2560
    Internal Node ID:        1
    Compute Unit:            2
    SIMDs per CU:            2
    Shader Engines:          1
    Shader Arrs. per Eng.:   1
    WatchPts on Addr. Ranges:4
    Coherent Host Access:    FALSE
    Memory Properties:       APU
    Features:                KERNEL_DISPATCH
    Fast F16 Operation:      TRUE
    Wavefront Size:          32(0x20)
    Workgroup Max Size:      1024(0x400)
    Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
    Max Waves Per CU:        32(0x20)
    Max Work-item Per CU:    1024(0x400)
    Grid Max Size:           4294967295(0xffffffff)
    Grid Max Size per Dimension:
        x                        2147483647(0x7fffffff)
        y                        65535(0xffff)
        z                        65535(0xffff)
    Max fbarriers/Workgrp:   32
    Packet Processor uCode:: 18
    SDMA engine uCode::      1
    IOMMU Support::          None
    Pool Info:
        Pool 1
        Segment:                 GLOBAL; FLAGS: COARSE GRAINED
        Size:                    65490308(0x3e74d84) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:2048KB
        Alloc Alignment:         4KB
        Accessible by all:       FALSE
        Pool 2
        Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
        Size:                    65490308(0x3e74d84) KB
        Allocatable:             TRUE
        Alloc Granule:           4KB
        Alloc Recommended Granule:2048KB
        Alloc Alignment:         4KB
        Accessible by all:       FALSE
        Pool 3
        Segment:                 GROUP
        Size:                    64(0x40) KB
        Allocatable:             FALSE
        Alloc Granule:           0KB
        Alloc Recommended Granule:0KB
        Alloc Alignment:         0KB
        Accessible by all:       FALSE
    ISA Info:
        ISA 1
        Name:                    amdgcn-amd-amdhsa--gfx1036
        Machine Models:          HSA_MACHINE_MODEL_LARGE
        Profiles:                HSA_PROFILE_BASE
        Default Rounding Mode:   NEAR
        Default Rounding Mode:   NEAR
        Fast f16:                TRUE
        Workgroup Max Size:      1024(0x400)
        Workgroup Max Size per Dimension:
            x                        1024(0x400)
            y                        1024(0x400)
            z                        1024(0x400)
        Grid Max Size:           4294967295(0xffffffff)
        Grid Max Size per Dimension:
            x                        2147483647(0x7fffffff)
            y                        65535(0xffff)
            z                        65535(0xffff)
        FBarrier Max Size:       32
        ISA 2
        Name:                    amdgcn-amd-amdhsa--gfx10-3-generic
        Machine Models:          HSA_MACHINE_MODEL_LARGE
        Profiles:                HSA_PROFILE_BASE
        Default Rounding Mode:   NEAR
        Default Rounding Mode:   NEAR
        Fast f16:                TRUE
        Workgroup Max Size:      1024(0x400)
        Workgroup Max Size per Dimension:
            x                        1024(0x400)
            y                        1024(0x400)
            z                        1024(0x400)
        Grid Max Size:           4294967295(0xffffffff)
        Grid Max Size per Dimension:
            x                        2147483647(0x7fffffff)
            y                        65535(0xffff)
            z                        65535(0xffff)
        FBarrier Max Size:       32
    *** Done ***
    
  2. The rocm-smi tool:

    rocm-smi
    

    Command output:

    WARNING: AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status
    
    =========================================== ROCm System Management Interface ===========================================
    ===================================================== Concise Info =====================================================
    Device  Node  IDs              Temp    Power    Partitions          SCLK     MCLK     Fan    Perf  PwrCap  VRAM%  GPU%
          (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)
    ========================================================================================================================
    0       1     0x7551,   64106  67.0°C  184.0W   N/A, N/A, 0         3259Mhz  96Mhz    54.9%  auto  300.0W  31%    100%
    1       2     0x164e,   36957  46.0°C  35.194W  N/A, N/A, 0         N/A      1800Mhz  0%     auto  N/A     3%     0%
    ========================================================================================================================
    ================================================= End of ROCm SMI Log ==================================================
    
  3. The amd-smi tool:

    amd-smi
    

    Command output (note that a second GPU, the one integrated into the processor, is also listed):

    +------------------------------------------------------------------------------+
    | AMD-SMI 26.2.0+021c61fc    amdgpu version: 6.18.1-061801 ROCm version: 7.1.1 |
    | VBIOS version: 00158746                                                      |
    | Platform: Linux Baremetal                                                    |
    |-------------------------------------+----------------------------------------|
    | BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
    | GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
    |=====================================+========================================|
    | 0000:03:00.0    AMD Radeon Graphics | 68 %     82 °C   0           285/300 W |
    |   0       0     N/A             N/A | 95 %    52.94           10406/32624 MB |
    |-------------------------------------+----------------------------------------|
    | 0000:0a:00.0 ...X 16-Core Processor | N/A        N/A   0             N/A/0 W |
    |   1       1     N/A             N/A | N/A        N/A               15/512 MB |
    +-------------------------------------+----------------------------------------+
    +------------------------------------------------------------------------------+
    | Processes:                                                                   |
    |  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
    |==============================================================================|
    |    0      12335  ollama                 2.0 MB    9.7 GB    10.1 GB  N/A     |
    |    1      12335  ollama                 2.0 MB   35.2 KB      0.0 B  N/A     |
    +------------------------------------------------------------------------------+
    

Together, these tools show the current load, temperature, and power consumption of the graphics cards.
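
For scripting, the part of the rocminfo output that usually matters is the GPU target name (gfx1036 in the sample above). Here is a minimal sketch that extracts it, assuming the Name: field layout shown in that sample; gfx_targets is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the GPU targets (gfx names) reported by rocminfo.

# gfx_targets: read rocminfo output on stdin and print each distinct agent name
# that starts with "gfx". CPU agents, "Marketing Name" lines, and ISA names
# (which start with "amdgcn-") are ignored.
gfx_targets() {
  awk '$1 == "Name:" && $2 ~ /^gfx/ { print $2 }' | sort -u
}

if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | gfx_targets
else
  echo "rocminfo not found in PATH" >&2
fi
```

An empty result with rocminfo installed suggests that only the CPU agent is visible, which is worth checking against the group membership and kernel requirements.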

Note

amd-smi is gradually replacing rocm-smi as the primary monitoring utility in newer ROCm releases.
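
Because either tool may be present depending on the ROCm release, a monitoring script can prefer amd-smi and fall back to rocm-smi. A minimal sketch follows; first_available is an illustrative helper name, and both tools are invoked without arguments, as in the examples above.

```shell
#!/usr/bin/env bash
# Sketch: prefer amd-smi, fall back to rocm-smi.

# first_available CMD...: print the first command name that exists in PATH.
# Returns non-zero if none of them is found.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

if tool="$(first_available amd-smi rocm-smi)"; then
  echo "Using $tool"
  "$tool"
else
  echo "Neither amd-smi nor rocm-smi is installed" >&2
fi
```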

Working with Docker

If you use Docker, install the tooling for passing GPUs through to containers:

sudo apt install -y rocm-gdb rocm-container-toolkit
sudo systemctl restart docker
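
Whichever tooling is installed, a container can only use the GPU if the device nodes are present on the host. The sketch below is an illustration under stated assumptions rather than the toolkit's own workflow: it uses the plain docker run --device flags commonly used with ROCm containers, device_flags is an illustrative helper, and the image name rocm/rocm-terminal is an assumption to be replaced with the image you actually use. The command is printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch: build --device flags for `docker run` from the GPU device nodes that exist.

# device_flags PATH...: print "--device=PATH" for every PATH that exists.
device_flags() {
  local dev
  for dev in "$@"; do
    if [ -e "$dev" ]; then
      printf '%s\n' "--device=$dev"
    fi
  done
  return 0
}

# /dev/kfd is the ROCm compute interface; /dev/dri holds the render nodes.
flags="$(device_flags /dev/kfd /dev/dri | tr '\n' ' ')"
if [ -n "$flags" ]; then
  # Assumption: the image name is only an example; substitute your own.
  echo "docker run --rm -it ${flags}--group-add video rocm/rocm-terminal"
else
  echo "No /dev/kfd or /dev/dri found; is the amdgpu driver loaded?" >&2
fi
```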

Automatic "one-click" installation

A Bash script that fully automates the process. It detects the latest ROCm version, installs the drivers and the rocminfo utility, and configures the paths. Copy it, paste it into your server's command line, and run it.

#!/bin/bash
set -euo pipefail

# Universal AMD GPU + ROCm ("latest") installer for Ubuntu 24.04+

# FLAGS (enable/disable steps here)

DO_APT_UPGRADE=1

DO_OS_POLICY_CHECK=1                 # Enforce policy: only Ubuntu 24.04 LTS
ALLOWED_UBUNTU_VERSIONS=("24.04")

DO_KERNEL_POLICY_CHECK=1          # Enforce policy "kernel >= REQUIRED_KERNEL_MM"
DO_INSTALL_MAINLINE_KERNEL=1      # If kernel is lower, try installing a mainline kernel
REQUIRED_KERNEL_MM="6.13"

DO_GRUB_PARAMS=0                  # Add GRUB params (conservatively disabled by default)
GRUB_PARAMS=("amdgpu.gpu_recovery=1" "amdgpu.runpm=0" "amdgpu.ppfeaturemask=0xffffffff")

DO_PURGE_OLD_PACKAGES=1           # Remove old rocm/amdgpu/hip packages (best effort)
DO_SETUP_ROCM_REPO=1              # Add ROCm repository
DO_INSTALL_ROCM=1                 # Install rocm-dev/rocm-libs/...
DO_LINK_OPT_ROCM=1                # Make /opt/rocm -> /opt/rocm-X.Y.Z (if found)

DO_USER_GROUPS=1                  # Add user to render,video
DO_BASHRC_PATH=1                  # Add /opt/rocm/bin to PATH via ~/.bashrc

DO_OLLAMA_AMDGPU_IDS_WORKAROUND=1 # Create amdgpu.ids link for some Ollama builds
DO_GPU_POWER_CONTROL_ON=1         # Best effort: set power/control=on (if available)


# Start
echo "Starting AMD ROCm installation..."

# Dependency checks
for cmd in lspci wget gpg curl lsb_release; do
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "Missing dependency: $cmd"
    exit 1
  fi
done


# Check this is Ubuntu (robust: don't grep raw file)

. /etc/os-release
if [[ "${ID:-}" != "ubuntu" ]]; then
  echo "This script is intended for Ubuntu. Exiting."
  exit 1
fi


# Restrict script to specific Ubuntu releases (only 24.04)
# ALLOWED_UBUNTU_VERSIONS=("24.04")
if [[ "${DO_OS_POLICY_CHECK}" -eq 1 ]]; then
  UBUNTU_VERSION="$(lsb_release -rs)"  # e.g. 24.04

  ok=0
  for v in "${ALLOWED_UBUNTU_VERSIONS[@]}"; do
    if [[ "${UBUNTU_VERSION}" == "${v}" ]]; then
      ok=1
      break
    fi
  done

  if [[ "${ok}" -ne 1 ]]; then
    echo "Unsupported Ubuntu version: ${UBUNTU_VERSION}"
    echo "Allowed versions: ${ALLOWED_UBUNTU_VERSIONS[*]}"
    exit 1
  fi
fi

# Detect AMD GPU (vendor 1002)
AMD_GPU_LINES="$(lspci -nn | grep -iE 'vga|3d' | grep -i '1002:' || true)"
if [[ -z "${AMD_GPU_LINES}" ]]; then
  echo "No AMD GPU detected (vendor 1002)."
  exit 1
fi
echo "AMD GPUs detected:"
echo "${AMD_GPU_LINES}"

# Update/upgrade
if [[ "${DO_APT_UPGRADE}" -eq 1 ]]; then
  sudo apt update
  sudo apt upgrade -y
fi

# Kernel check/upgrade (script policy)
KERNEL_INSTALLED=0
echo "Current kernel: $(uname -r)"

if [[ "${DO_KERNEL_POLICY_CHECK}" -eq 1 ]]; then
  KERNEL_VERSION="$(uname -r)"
  KERNEL_MM="$(echo "${KERNEL_VERSION}" | sed -nE 's/^([0-9]+)\.([0-9]+).*/\1.\2/p')"

  req_major="${REQUIRED_KERNEL_MM%.*}"
  req_minor="${REQUIRED_KERNEL_MM#*.}"
  cur_major="${KERNEL_MM%.*}"
  cur_minor="${KERNEL_MM#*.}"

  KERNEL_OK=0
  if [[ "${cur_major}" -gt "${req_major}" ]] || \
     [[ "${cur_major}" -eq "${req_major}" && "${cur_minor}" -ge "${req_minor}" ]]; then
    KERNEL_OK=1
  fi

  if [[ "${KERNEL_OK}" -ne 1 ]]; then
    echo "Kernel is older than required by this script policy (>= ${REQUIRED_KERNEL_MM})."
    if [[ "${DO_INSTALL_MAINLINE_KERNEL}" -eq 1 ]]; then
      echo "Installing latest mainline kernel..."
      sudo add-apt-repository ppa:cappelikan/ppa -y 2>/dev/null || true
      sudo apt update
      sudo apt install -y mainline pkexec
      sudo mainline install-latest
      echo "Mainline kernel installed. Reboot required to activate it."
      KERNEL_INSTALLED=1
    else
      echo "Mainline kernel install is disabled by flag DO_INSTALL_MAINLINE_KERNEL=0. Continuing."
    fi
  fi
fi

# Optional: GRUB parameters (append-only)
if [[ "${DO_GRUB_PARAMS}" -eq 1 ]]; then
  GRUB_FILE="/etc/default/grub"
  GRUB_CHANGED=0

  for param in "${GRUB_PARAMS[@]}"; do
    if ! sudo grep -qE "GRUB_CMDLINE_LINUX_DEFAULT=.*\b${param}\b" "${GRUB_FILE}"; then
      sudo cp -a "${GRUB_FILE}" "${GRUB_FILE}.backup.$(date +%F-%H%M%S)"
      sudo sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\")([^\"]*)\"/\1\2 ${param}\"/" "${GRUB_FILE}"
      echo "Added GRUB param: ${param}"
      GRUB_CHANGED=1
    else
      echo "GRUB param already present: ${param}"
    fi
  done

  if [[ "${GRUB_CHANGED}" -eq 1 ]]; then
    sudo update-grub
    echo "GRUB updated."
  fi
else
  echo "Skipping GRUB parameters (DO_GRUB_PARAMS=0)."
fi

# Best effort: purge old packages/repos
if [[ "${DO_PURGE_OLD_PACKAGES}" -eq 1 ]]; then
  echo "Removing previous ROCm/AMDGPU packages (best effort)..."
  sudo dpkg --configure -a || true
  sudo apt remove --purge -y rocminfo || true
  sudo apt purge -y 'rocm*' 'amdgpu*' 'graphics*' 'hip*' || true
  sudo apt autoremove -y || true
  sudo apt clean || true
  sudo rm -rf /etc/apt/sources.list.d/amdgpu* /etc/apt/sources.list.d/rocm* /etc/apt/sources.list.d/graphics* || true
  sudo apt update || true
else
  echo "Skipping purge old packages (DO_PURGE_OLD_PACKAGES=0)."
fi

# Add ROCm "latest" repository
if [[ "${DO_SETUP_ROCM_REPO}" -eq 1 ]]; then
  echo "Setting up ROCm 'latest' repository..."

  . /etc/os-release

  UBUNTU_CODENAME="${UBUNTU_CODENAME:-${VERSION_CODENAME:-}}"
  if [[ -z "${UBUNTU_CODENAME}" ]]; then
    echo "Cannot detect Ubuntu codename (UBUNTU_CODENAME/VERSION_CODENAME)."
    exit 1
  fi

  sudo install -d -m 0755 /usr/share/keyrings
  wget -qO- https://repo.radeon.com/rocm/rocm.gpg.key \
    | gpg --dearmor \
    | sudo tee /usr/share/keyrings/rocm-archive-keyring.gpg >/dev/null

  echo "deb [arch=amd64 signed-by=/usr/share/keyrings/rocm-archive-keyring.gpg] https://repo.radeon.com/rocm/apt/latest/ ${UBUNTU_CODENAME} main" \
    | sudo tee /etc/apt/sources.list.d/rocm.list >/dev/null

  # Pin repo.radeon.com above Ubuntu
  sudo tee /etc/apt/preferences.d/rocm-pin-600 >/dev/null <<'EOF'
Package: *
Pin: origin repo.radeon.com
Pin-Priority: 600
EOF

else
  echo "Skipping ROCm repo setup (DO_SETUP_ROCM_REPO=0)."
fi

# Install ROCm packages
if [[ "${DO_INSTALL_ROCM}" -eq 1 ]]; then
  echo "Installing ROCm stack..."
  sudo apt update
  sudo apt install -y -o Dpkg::Options::="--force-overwrite" \
    rocm-dev rocm-libs rocm-hip-sdk rocm-smi-lib rocminfo
else
  echo "Skipping ROCm install (DO_INSTALL_ROCM=0)."
fi

# /opt/rocm -> /opt/rocm-X.Y.Z
if [[ "${DO_LINK_OPT_ROCM}" -eq 1 ]]; then
  INSTALLED_ROCM_DIR="$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1 || true)"
  if [[ -n "${INSTALLED_ROCM_DIR}" ]]; then
    REAL_VERSION="$(echo "${INSTALLED_ROCM_DIR}" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' || echo latest)"
    sudo ln -sfn "${INSTALLED_ROCM_DIR}" /opt/rocm
    echo "ROCm detected: ${REAL_VERSION} (${INSTALLED_ROCM_DIR}); linked /opt/rocm -> ${INSTALLED_ROCM_DIR}"
  else
    echo "No /opt/rocm-X.Y.Z directory found; leaving /opt/rocm as-is."
  fi
else
  echo "Skipping /opt/rocm symlink (DO_LINK_OPT_ROCM=0)."
fi

# User groups: render,video
if [[ "${DO_USER_GROUPS}" -eq 1 ]]; then
  TARGET_USER="${SUDO_USER:-$USER}"
  sudo usermod -aG render,video "${TARGET_USER}" || true
  echo "User added to groups: render, video (${TARGET_USER}). Re-login or reboot required."
else
  echo "Skipping user groups (DO_USER_GROUPS=0)."
fi

# PATH + LD_LIBRARY_PATH in ~/.bashrc
if [[ "${DO_BASHRC_PATH}" -eq 1 ]]; then
  TARGET_USER="${SUDO_USER:-$USER}"
  TARGET_HOME="$(getent passwd "${TARGET_USER}" | cut -d: -f6)"
  TARGET_BASHRC="${TARGET_HOME}/.bashrc"
  MARKER="AMD ROCm Paths"

  if [[ ! -f "${TARGET_BASHRC}" ]]; then
    sudo -u "${TARGET_USER}" touch "${TARGET_BASHRC}" || true
  fi

  # Determining the installed ROCm version
  ROCM_VERSION_DIR="$(ls -d /opt/rocm-[0-9]* 2>/dev/null | sort -V | tail -n 1 || true)"
  if [[ -n "${ROCM_VERSION_DIR}" ]]; then
    ROCM_VERSION="$(basename "${ROCM_VERSION_DIR}" | sed 's/rocm-//')"
    echo "Using ROCm version: ${ROCM_VERSION} (${ROCM_VERSION_DIR})"
  else
    ROCM_VERSION="unknown"
    echo "Warning: No /opt/rocm-X.Y.Z found; using generic paths"
  fi

  if ! grep -q "${MARKER}" "${TARGET_BASHRC}" 2>/dev/null; then
    cat >> "${TARGET_BASHRC}" <<EOF

# ${MARKER}
if [ -d "/opt/rocm-${ROCM_VERSION}" ]; then
  export PATH="/opt/rocm-${ROCM_VERSION}/bin:\$PATH"
  export LD_LIBRARY_PATH="/opt/rocm-${ROCM_VERSION}/hip/lib:/opt/rocm-${ROCM_VERSION}/lib:\$LD_LIBRARY_PATH"
  export ROCM_PATH="/opt/rocm-${ROCM_VERSION}"
  export HIP_CLANG_PATH="/opt/rocm-${ROCM_VERSION}/llvm/bin"
fi
EOF
    echo "Added full ROCm paths (PATH+LD_LIBRARY_PATH) to ${TARGET_BASHRC}"
  else
    echo "ROCm PATH block already present in ${TARGET_BASHRC}"
  fi

  # Apply to the current session
  if [[ -n "${ROCM_VERSION_DIR}" ]]; then
    export PATH="${ROCM_VERSION_DIR}/bin:${PATH}"
    export LD_LIBRARY_PATH="${ROCM_VERSION_DIR}/hip/lib:${ROCM_VERSION_DIR}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export ROCM_PATH="${ROCM_VERSION_DIR}"
    export HIP_CLANG_PATH="${ROCM_VERSION_DIR}/llvm/bin"
  fi
else
  echo "Skipping .bashrc PATH (DO_BASHRC_PATH=0)."
fi

# Workaround for amdgpu.ids (some Ollama builds)
if [[ "${DO_OLLAMA_AMDGPU_IDS_WORKAROUND}" -eq 1 ]]; then
  if [[ -f /usr/share/libdrm/amdgpu.ids ]]; then
    sudo mkdir -p /opt/amdgpu/share/libdrm
    sudo ln -sf /usr/share/libdrm/amdgpu.ids /opt/amdgpu/share/libdrm/amdgpu.ids
    echo "Created compatibility link: /opt/amdgpu/share/libdrm/amdgpu.ids -> /usr/share/libdrm/amdgpu.ids"
  else
    echo "amdgpu.ids not found at /usr/share/libdrm/amdgpu.ids; skipping workaround."
  fi
else
  echo "Skipping Ollama amdgpu.ids workaround (DO_OLLAMA_AMDGPU_IDS_WORKAROUND=0)."
fi

# Best effort: power/control=on
if [[ "${DO_GPU_POWER_CONTROL_ON}" -eq 1 ]]; then
  # Test for existence, not writability: the write itself goes through sudo,
  # so a -w test would always fail for a non-root user.
  if [[ -e /sys/class/drm/card0/device/power/control ]]; then
    echo on | sudo tee /sys/class/drm/card0/device/power/control >/dev/null
    echo "Set /sys/class/drm/card0/device/power/control = on"
  else
    echo "/sys/class/drm/card0/device/power/control not found; skipping."
  fi
else
  echo "Skipping GPU power control (DO_GPU_POWER_CONTROL_ON=0)."
fi

# Final
echo "Installation finished."
if [[ "${KERNEL_INSTALLED}" -eq 1 ]]; then
  echo "Reboot required to activate the new kernel."
else
  echo "Reboot recommended to apply group membership changes."
fi

echo "After reboot, verify:"
echo "  rocminfo"
echo "  amd-smi (if installed)"

Warning

After the script finishes, you must reboot the server with sudo reboot. This is required to activate the new user group memberships and kernel modules.
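
Group membership added by usermod only takes effect in sessions started after the change, which is one reason for the reboot. A minimal post-reboot check could look like this, with session_has_group as an illustrative helper name:

```shell
#!/usr/bin/env bash
# Sketch: confirm that the current session actually carries the GPU groups.

# session_has_group GROUP: succeed if the current process belongs to GROUP.
# `id -nG` without a user name reports the groups of the running session,
# not just what is recorded in /etc/group.
session_has_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

for group in render video; do
  if session_has_group "$group"; then
    echo "Session is in group $group"
  else
    echo "Session is NOT in group $group (log in again or reboot after usermod)"
  fi
done
```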
