KSPP for openRuyi

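0 Overview

KSPP stands for the Kernel Self Protection Project, a security project that aims to improve the Linux kernel's own ability to withstand exploitation.

This article compares the officially recommended config with the patch, config, and spec files in openRuyi to examine whether the riscv configuration is reasonable.

The KSPP project describes what a security-hardened kernel configuration should look like: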

Sometimes people ask the Kernel Self Protection Project what a secure set of build CONFIGs and runtime settings are. This is a brain-dump of the various options for a particularly paranoid system.

The author also notes that recommended settings can be obtained through the kernel-hardening-checker project:

Another place to find recommended kernel hardening settings is via the “kernel-hardening-checker” tool maintained by Alexander Popov.

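Section 3 below is devoted to kernel-hardening-checker.

PS: Note that these settings are deliberately paranoid: they are discussed purely in terms of kernel security, and the author apparently intends them for systems that put security above everything else.

1 Comparison method

1.1 Comparing against the official KSPP template

The first, most project-oriented approach is to filter out the configs under SPECS/linux that relate to the KSPP project.

Store the candidate keywords in the file kspp.keys.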

cat > kspp.keys <<'EOF'
CONFIG_BUG
CONFIG_DEBUG_KERNEL
CONFIG_DEBUG_RODATA
CONFIG_STRICT_KERNEL_RWX
CONFIG_DEBUG_WX
CONFIG_STACKPROTECTOR
CONFIG_STACKPROTECTOR_STRONG
CONFIG_STRICT_DEVMEM
CONFIG_IO_STRICT_DEVMEM
CONFIG_PROC_MEM_NO_FORCE
CONFIG_SYN_COOKIES
CONFIG_LIST_HARDENED
CONFIG_DEBUG_CREDENTIALS
CONFIG_DEBUG_NOTIFIERS
CONFIG_DEBUG_LIST
CONFIG_DEBUG_SG
CONFIG_DEBUG_VIRTUAL
CONFIG_BUG_ON_DATA_CORRUPTION
CONFIG_SCHED_STACK_END_CHECK
CONFIG_SECCOMP
CONFIG_SECCOMP_FILTER
CONFIG_SECURITY
CONFIG_SECURITY_YAMA
CONFIG_SECURITY_LANDLOCK
CONFIG_SECURITY_LOCKDOWN_LSM
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY
CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY
CONFIG_HARDENED_USERCOPY
CONFIG_SLAB_FREELIST_RANDOM
CONFIG_SLAB_FREELIST_HARDENED
CONFIG_SLAB_BUCKETS
CONFIG_RANDOM_KMALLOC_CACHES
CONFIG_SHUFFLE_PAGE_ALLOCATOR
CONFIG_PAGE_TABLE_CHECK
CONFIG_PAGE_TABLE_CHECK_ENFORCED
CONFIG_SLUB_DEBUG
CONFIG_PAGE_POISONING_ZERO
CONFIG_INIT_ON_ALLOC_DEFAULT_ON
CONFIG_INIT_ON_FREE_DEFAULT_ON
CONFIG_INIT_STACK_ALL_ZERO
CONFIG_VMAP_STACK
CONFIG_REFCOUNT_FULL
CONFIG_FORTIFY_SOURCE
CONFIG_SECURITY_DMESG_RESTRICT
CONFIG_UBSAN
CONFIG_UBSAN_TRAP
CONFIG_UBSAN_BOUNDS
CONFIG_UBSAN_SANITIZE_ALL
CONFIG_UBSAN_LOCAL_BOUNDS
CONFIG_KFENCE
CONFIG_KFENCE_SAMPLE_INTERVAL
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT
CONFIG_WERROR
CONFIG_EFI_DISABLE_PCI_DMA
CONFIG_IOMMU_SUPPORT
CONFIG_IOMMU_DEFAULT_DMA_STRICT
CONFIG_HW_RANDOM_TPM
CONFIG_RANDOM_TRUST_BOOTLOADER
CONFIG_RANDOM_TRUST_CPU
CONFIG_RANDSTRUCT
CONFIG_SCHED_CORE
CONFIG_ZERO_CALL_USED_REGS
CONFIG_RESET_ATTACK_MITIGATION
CONFIG_STATIC_USERMODEHELPER
CONFIG_PANIC_ON_OOPS
CONFIG_PANIC_TIMEOUT
CONFIG_MODULE_SIG
CONFIG_MODULE_SIG_FORCE
CONFIG_MODULE_SIG_ALL
CONFIG_MODULE_SIG_SHA512
CONFIG_MODULE_SIG_HASH
CONFIG_MODULE_SIG_KEY
CONFIG_STRICT_MODULE_RWX
# gcc
CONFIG_GCC
# x86_64
CONFIG_X86_64
CONFIG_DEFAULT_MMAP_MIN_ADDR
CONFIG_RANDOMIZE_BASE
CONFIG_RANDOMIZE_MEMORY
CONFIG_LEGACY_VSYSCALL_NONE
CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
CONFIG_X86_KERNEL_IBT
CONFIG_X86_USER_SHADOW_STACK
CONFIG_INTEL_IOMMU
CONFIG_INTEL_IOMMU_DEFAULT_ON
CONFIG_INTEL_IOMMU_SVM
CONFIG_AMD_IOMMU
CONFIG_AMD_IOMMU_V2
CONFIG_MITIGATION_SLS
CONFIG_CFI_CLANG
EOF
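# PS: if -w is dropped from the options below, unexpected lines may
# match: a key meant to match A exactly would also match A*.
# Enabled (this also prints any matching "is not set" lines)
grep -Fwf kspp.keys config.riscv64
# Not enabled
grep -Fwf kspp.keys config.riscv64 | grep 'is not set'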
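
To make the effect of -w concrete, here is a small sketch using throwaway files. The file names and the three-line sample config are made up for illustration and are not taken from the openRuyi configs. With -w a key only matches when it is delimited by non-word characters, and since _ counts as a word character, CONFIG_SECURITY should not match CONFIG_SECURITY_INFINIBAND=y.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the openRuyi configs.
tmp=$(mktemp -d)
printf '%s\n' 'CONFIG_SECURITY' > "$tmp/keys"
printf '%s\n' \
    'CONFIG_SECURITY=y' \
    'CONFIG_SECURITY_INFINIBAND=y' \
    '# CONFIG_SECURITY_DMESG_RESTRICT is not set' > "$tmp/config"

# Without -w: plain substring match, every line containing the key is printed.
loose=$(grep -Ff "$tmp/keys" "$tmp/config")

# With -w: the key must be a whole word ('_' is a word character, '=' is not).
strict=$(grep -Fwf "$tmp/keys" "$tmp/config")

printf 'without -w:\n%s\n\nwith -w:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

If a listing produced with -w still contains longer names such as CONFIG_SECURITY_INFINIBAND, it is worth checking whether the command was really run with that flag.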

This is what the search finally turned up.

Enabled.

$ grep -Fwf kspp.keys config.riscv64 > kspp.riscv64
CONFIG_RANDOMIZE_BASE=y
CONFIG_MODULE_SIG_SHA256=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_SECURITY=y
CONFIG_SECURITY_INFINIBAND=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_YAMA=y
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_SECURITY_LANDLOCK=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_KFENCE=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_DEBUG_LIST=y
CONFIG_IO_STRICT_DEVMEM=y
$ grep -Fwf kspp.keys config.x86_64 > kspp.x86_64
CONFIG_SCHED_CORE=y
CONFIG_LEGACY_VSYSCALL_NONE=y
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
CONFIG_RESET_ATTACK_MITIGATION=y
CONFIG_AMD_IOMMU=y
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
# CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON is not set
# CONFIG_INTEL_IOMMU_PERF_EVENTS is not set
CONFIG_SECURITY_DMESG_RESTRICT=y
CONFIG_SECURITY=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_DEBUG=y
CONFIG_SECURITY_SMACK=y
CONFIG_SECURITY_SMACK_NETFILTER=y
CONFIG_SECURITY_TOMOYO=y
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_YAMA=y
CONFIG_SECURITY_LANDLOCK=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_ZERO_CALL_USED_REGS=y
CONFIG_FORTIFY_SOURCE=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_KFENCE=y
CONFIG_KFENCE_SAMPLE_INTERVAL=0
CONFIG_IO_STRICT_DEVMEM=y

Not enabled.

$ grep -Fwf kspp.keys config.riscv64 | grep 'is not set'
# (none)
$ grep -Fwf kspp.keys config.x86_64 | grep 'is not set'
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set

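I expected the patch and spec files to contain corresponding definitions as well, but after checking, they do not.

From the above, the options present on both x86_64 and riscv64 are: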

$ comm -12 <(sort kspp.riscv64) <(sort kspp.x86_64) > common.kspp
CONFIG_FORTIFY_SOURCE=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_IO_STRICT_DEVMEM=y
CONFIG_KFENCE=y
CONFIG_SECURITY_LANDLOCK=y
CONFIG_SECURITY=y
CONFIG_SECURITY_YAMA=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_SLAB_FREELIST_RANDOM=y

Options present only on riscv64:

$ comm -23 <(sort kspp.riscv64) <(sort kspp.x86_64) > riscv64_only.kspp
CONFIG_DEBUG_LIST=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_RANDOMIZE_BASE=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_SECURITY_LOCKDOWN_LSM=y

Options present only on x86_64:

$ comm -13 <(sort kspp.riscv64) <(sort kspp.x86_64) > x86_64_only.kspp
CONFIG_AMD_IOMMU=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_SVM=y
CONFIG_INTEL_IOMMU=y
CONFIG_KFENCE_SAMPLE_INTERVAL=0
CONFIG_LEGACY_VSYSCALL_NONE=y
CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT=y
CONFIG_RESET_ATTACK_MITIGATION=y
CONFIG_SCHED_CORE=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_SECURITY_DMESG_RESTRICT=y
CONFIG_ZERO_CALL_USED_REGS=y
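
The three comm calls above amount to set intersection and set difference on sorted files. A minimal sketch with made-up lists follows; a.list and b.list are hypothetical names, and plain temporary files replace <(...) so that it runs in any POSIX shell.

```shell
#!/bin/sh
# comm requires both inputs to be sorted in the same collation order,
# so force the C locale for sort and comm alike.
LC_ALL=C
export LC_ALL

tmp=$(mktemp -d)
printf '%s\n' 'CONFIG_KFENCE=y' 'CONFIG_DEBUG_LIST=y' 'CONFIG_SECURITY=y' | sort > "$tmp/a.list"
printf '%s\n' 'CONFIG_SECURITY=y' 'CONFIG_SCHED_CORE=y' 'CONFIG_KFENCE=y' | sort > "$tmp/b.list"

both=$(comm -12 "$tmp/a.list" "$tmp/b.list")    # lines in both files
only_a=$(comm -23 "$tmp/a.list" "$tmp/b.list")  # lines only in the first file
only_b=$(comm -13 "$tmp/a.list" "$tmp/b.list")  # lines only in the second file

printf 'both:\n%s\nonly a:\n%s\nonly b:\n%s\n' "$both" "$only_a" "$only_b"
rm -rf "$tmp"
```

Because the comparison is on whole lines, an option set to different values on the two architectures ends up in both "only" lists rather than in the common one.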

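1.2 Comparing differences across distributions

The second approach is to compare the config files of different distributions. From each community's configuration we may be able to see where its kernel security configuration falls short and what attitude the community takes.

The data collected here is:

  1. openEuler

  2. Ubuntu Noble (reason: relatively recent kernel version): https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/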

  3. Arch Linux:

  4. Fedora:

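2 Reasonableness analysis

We discuss the common options and the architecture-specific options identified in section 1.

In addition, the configs not covered there deserve further discussion.

PS: All analysis here is static; no code was debugged or run.

2.1 Common options

2.1.1 CONFIG_SLAB_FREELIST_RANDOM

In the KSPP recommended settings: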

# Randomize allocator freelists, harden metadata.
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_SLAB_BUCKETS=y
CONFIG_RANDOM_KMALLOC_CACHES=y

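Conclusion: reasonable.

PS: config status: 1 means enabled, - means the architecture is not supported, 0 means not enabled.

Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

CONFIG_SLAB_FREELIST_RANDOM is a security feature for the kernel's slub/slab allocator, and all of the distributions above enable it.

When enabled, the freelist of each slab is initialized in random order, so the free objects in a slab are chained into the freelist in an unpredictable sequence.

It is often described as a defense against UAF, although some blogs argue that this is a misunderstanding and that the option mainly guards against heap overflows.

Without freelist randomization, objects are laid out sequentially in a slab: if an attacker allocates a and b back to back, the two are adjacent, and overflowing a overwrites b.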

2.1.2 CONFIG_SLAB_FREELIST_HARDENED

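CONFIG_SLAB_FREELIST_HARDENED is likewise a security hardening option for slab management.

Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

The purpose of CONFIG_SLAB_FREELIST_HARDENED is to prevent tampering with freelist pointers.

It works by storing freelist pointers in an obfuscated (encoded) form, so a pointer overwritten by an attacker decodes to an unexpected address and the tampering is likely to be caught.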
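
The idea can be sketched with shell arithmetic. This is a simplified model, not the kernel's code: the kernel additionally mixes in the byte-swapped address of the pointer slot, and the secret, the addresses, and the function names below are all made up for illustration.

```shell
#!/bin/sh
# Simplified model: stored = pointer XOR secret, and decoding repeats the XOR.
SECRET=$((0x5DEECE66D))   # hypothetical per-cache random value

encode_ptr() { echo $(( $1 ^ SECRET )); }
decode_ptr() { echo $(( $1 ^ SECRET )); }

real_next=$((0x7f0000001000))          # legitimate next-free address (made up)
stored=$(encode_ptr "$real_next")      # what actually sits in the freed object

# An attacker who cannot read SECRET overwrites the slot with a raw target.
attacker_target=$((0x7f00deadb000))
decoded_after_attack=$(decode_ptr "$attacker_target")

printf 'intact slot decodes to   0x%x\n' "$(decode_ptr "$stored")"
printf 'tampered slot decodes to 0x%x\n' "$decoded_after_attack"
```

The intact slot round-trips to the original address, while the tampered slot decodes to something other than the address the attacker wrote.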

2.1.3 CONFIG_SECURITY_YAMA

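Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

CONFIG_SECURITY_YAMA mainly restricts abuse of the ptrace system call.

Under Linux's DAC permission model, all processes of the same user trust each other: if process A is compromised, process B can be controlled by A through ptrace. Yama builds a barrier here, and the protection level is controlled at runtime through /proc/sys/kernel/yama/ptrace_scope:

  • Mode 0: no restriction; any process may debug (ptrace) other processes of the same user.

  • Mode 1: only a parent process may debug its child processes.

  • Mode 2: only processes with the CAP_SYS_PTRACE capability may use ptrace.

  • Mode 3: PTRACE_ATTACH is disabled entirely.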

https://www.kernel.org/doc/html/v5.3/admin-guide/LSM/Yama.html
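
As a rough illustration, the current level can be read and mapped to the modes listed above. The function name and the wording of the messages are my own; only the sysctl path comes from the text.

```shell
#!/bin/sh
# Map a ptrace_scope value to the modes listed above.
describe_ptrace_scope() {
    case "$1" in
        0) echo "classic: any process may ptrace same-user processes" ;;
        1) echo "restricted: only a parent may ptrace its children" ;;
        2) echo "admin-only: CAP_SYS_PTRACE required" ;;
        3) echo "no attach: PTRACE_ATTACH disabled" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# The sysctl file only exists when Yama is built in and active.
scope_file=/proc/sys/kernel/yama/ptrace_scope
if [ -r "$scope_file" ]; then
    describe_ptrace_scope "$(cat "$scope_file")"
else
    echo "Yama ptrace_scope not available on this system"
fi
```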

2.1.4 CONFIG_SECURITY_LANDLOCK

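Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

CONFIG_SECURITY_LANDLOCK is a Linux security module supported since Linux 5.13; it is a lightweight sandbox mechanism designed for unprivileged processes.

It lets an ordinary process set file system access control rules for itself at runtime (in short, even an unprivileged user can restrict its own file system access). Enabling it adds a set of new system calls.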

https://www.techug.com/post/landlock-ing-linux/

2.1.5 CONFIG_INIT_ON_ALLOC_DEFAULT_ON

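Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

It mainly prevents heap memory information leaks: whenever the kernel allocates heap memory, the block is automatically zero-initialized.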

2.1.6 CONFIG_FORTIFY_SOURCE

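Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       0
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable. This option depends on ARCH_HAS_FORTIFY_SOURCE, which is enabled by default on architectures that support it.

It is a hardening feature in the Linux kernel that detects buffer overflows at compile time and at run time.

First, compile-time checks: if the compiler can statically determine the size of the destination buffer and the programmer tries to copy more data than the buffer can hold, the compiler reports an error.

Second, run-time checks: if the size cannot be determined at compile time, the kernel checks at run time.

The following article focuses on gcc's _FORTIFY_SOURCE compile option, which is easy to confuse with the kernel's CONFIG_FORTIFY_SOURCE: https://maskray.me/blog/2022-11-06-fortify-source

For the kernel's CONFIG_FORTIFY_SOURCE:

Introduction to FORTIFY_SOURCE: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6974f0c4555e285ab217cee58b6e874f776ff409

The relevant kernel config: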

config FORTIFY_SOURCE
bool "Harden common str/mem functions against buffer overflows"
depends on ARCH_HAS_FORTIFY_SOURCE
# https://github.com/llvm/llvm-project/issues/53645
depends on !X86_32 || !CC_IS_CLANG || CLANG_VERSION >= 160000
help
Detect overflows of buffers in common string and memory functions
where the compiler can determine and validate the buffer sizes.

2.1.7 CONFIG_HARDENED_USERCOPY

Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

It hardens copies to and from user space. The cited article discusses two dangerous interfaces:

copy_from_user(kernel_buf, user_ptr, n);
copy_to_user(user_ptr, kernel_buf, n);

The option was therefore introduced to prevent out-of-bounds reads and writes when the kernel copies data to or from user space.

https://lwn.net/Articles/695991/

In KSPP:

# Perform usercopy bounds checking. (And disable fallback to gain full whitelist enforcement.)
CONFIG_HARDENED_USERCOPY=y
# CONFIG_HARDENED_USERCOPY_FALLBACK is not set
# CONFIG_HARDENED_USERCOPY_PAGESPAN is not set

2.1.8 CONFIG_KFENCE

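Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable (explicitly enabling the dependency HAVE_ARCH_KFENCE is recommended).

KFENCE detects memory safety problems on the heap. Its overhead is low, so it can safely be enabled in production.

How it works: a dedicated memory pool is created, each allocation served from it is surrounded by so-called fence pages that are marked inaccessible, and any out-of-bounds access triggers a fault.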

https://docs.kernel.org/dev-tools/kfence.html

2.1.9 CONFIG_IO_STRICT_DEVMEM

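Distribution   x86_64  riscv64
openEuler      1       0
Ubuntu Noble   0       0
Arch Linux     1       -
Fedora         1       1

Conclusion: not reasonable:

  • First, this setting is debatable. KSPP notes that some cases may need direct access to physical memory, and that in such cases at least CONFIG_STRICT_DEVMEM should be enabled.

  • Second, as of 2026-1-19 there is the following problem: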

    $ grep -Irn STRICT_DEVMEM .
    ./config.riscv64:1796:CONFIG_IO_STRICT_DEVMEM=y
    ./config.x86_64:5961:CONFIG_IO_STRICT_DEVMEM=y

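    From the kernel configuration (Kconfig) we know that IO_STRICT_DEVMEM depends on STRICT_DEVMEM, yet only IO_STRICT_DEVMEM is enabled here and STRICT_DEVMEM is not. This is clearly a configuration error.

CONFIG_IO_STRICT_DEVMEM tightens the range of physical memory reachable through /dev/mem: on top of CONFIG_STRICT_DEVMEM, which keeps user space from reading or writing ordinary RAM through /dev/mem, it also blocks access to I/O regions claimed by drivers.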
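
A dependency problem of this kind can be spotted mechanically. The sketch below uses two hypothetical helpers, config_enabled and check_dep, and a made-up two-line fragment. It only inspects what a config fragment states explicitly; options that Kconfig turns on through defaults or select are invisible to it, so the final .config remains the authority.

```shell
#!/bin/sh
# config_enabled FILE OPTION: succeed if OPTION is set to y or m in FILE.
config_enabled() {
    grep -Eq "^$2=(y|m)\$" "$1"
}

# check_dep FILE OPTION DEPENDENCY: report when OPTION is enabled
# in the fragment but DEPENDENCY is not.
check_dep() {
    if config_enabled "$1" "$2" && ! config_enabled "$1" "$3"; then
        echo "WARN: $2=y but $3 is not enabled"
    else
        echo "OK: $2 / $3"
    fi
}

# Hypothetical fragment mirroring the situation described above.
tmp=$(mktemp)
printf '%s\n' 'CONFIG_IO_STRICT_DEVMEM=y' 'CONFIG_SECURITY=y' > "$tmp"
result=$(check_dep "$tmp" CONFIG_IO_STRICT_DEVMEM CONFIG_STRICT_DEVMEM)
echo "$result"
rm -f "$tmp"
```

The same call works for the other pairs discussed in this article, for example CONFIG_DEBUG_LIST with CONFIG_DEBUG_KERNEL.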

2.2 riscv64

2.2.1 CONFIG_MODULE_SIG_SHA256

Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   0       0
Arch Linux     0       -
Fedora         0       0

Conclusion: not reasonable.

KSPP specifies:

# But if CONFIG_MODULE=y is needed, at least they must be signed with a per-build key.
# See also kernel.modules_disabled sysctl below.
CONFIG_DEBUG_SET_MODULE_RONX=y (prior to v4.11)
CONFIG_STRICT_MODULE_RWX=y (since v4.11)
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODULE_SIG_HASH="sha512"
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"

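riscv64 sets CONFIG_MODULE_SIG_SHA256, which depends on CONFIG_MODULE_SIG, yet openRuyi does not set CONFIG_MODULE_SIG explicitly for either architecture.

The x86_64 config clearly has no CONFIG_MODULE_SIG entry. Enabling it is recommended; it is a basic option that the other distributions all enable. (It probably still takes effect in the built kernel because the option in 2.2.2 force-enables CONFIG_MODULE_SIG.)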
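
To see at a glance which of the module-signing options from the KSPP list a fragment never mentions, something like the following could be used. The helper name missing_options and the one-line sample fragment are hypothetical; the option names are the ones quoted from KSPP above.

```shell
#!/bin/sh
# missing_options FILE OPTION...: print each OPTION that has no
# "OPTION=..." assignment in FILE.
missing_options() {
    file=$1
    shift
    for opt in "$@"; do
        grep -q "^$opt=" "$file" || echo "$opt"
    done
}

# Hypothetical fragment: only the hash selection is present.
tmp=$(mktemp)
printf '%s\n' 'CONFIG_MODULE_SIG_SHA256=y' > "$tmp"
missing=$(missing_options "$tmp" \
    CONFIG_MODULE_SIG CONFIG_MODULE_SIG_FORCE \
    CONFIG_MODULE_SIG_ALL CONFIG_MODULE_SIG_SHA512)
echo "$missing"
rm -f "$tmp"
```

Anchoring the pattern with ^ and a trailing = keeps CONFIG_MODULE_SIG from being satisfied by CONFIG_MODULE_SIG_SHA256.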

2.2.2 CONFIG_SECURITY_LOCKDOWN_LSM

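Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable. This option force-enables CONFIG_MODULE_SIG.

This option enables the kernel lockdown mechanism, whose purpose is to prevent unauthorized modification of the running kernel image.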

https://man7.org/linux/man-pages/man7/kernel_lockdown.7.html

2.2.3 CONFIG_SECURITY_LOCKDOWN_LSM_EARLY

Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   1       1
Arch Linux     0       -
Fedora         1       1

Conclusion: reasonable. This option depends on CONFIG_SECURITY_LOCKDOWN_LSM.

KSPP additionally recommends:

# Enable "lockdown" LSM for bright line between the root user and kernel memory.
CONFIG_SECURITY_LOCKDOWN_LSM=y
CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
CONFIG_LOCK_DOWN_KERNEL_FORCE_CONFIDENTIALITY=y

2.2.4 CONFIG_PANIC_ON_OOPS

Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   0       0
Arch Linux     0       -
Fedora         0       0

Conclusion: reasonable, although many distributions do not enable it. The combination recommended by KSPP is:

# Reboot devices immediately if kernel experiences an Oops.
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=-1

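With this option enabled, even a non-fatal error (an Oops) is escalated to a panic, so the system goes down right away instead of trying to continue. Many distributions apparently leave it off.

However, if CONFIG_PANIC_TIMEOUT is not set explicitly (the default is 0), the kernel simply halts after the panic and waits forever. A negative value makes it reboot immediately.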

https://wiki.noodlefighter.com/%E8%AE%A1%E7%AE%97%E6%9C%BA/linux/kernel/linux%E7%9A%84panic%E5%92%8Coops/
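
The timeout semantics can be summarized in a few lines. The function name is made up; the two /proc/sys/kernel entries are the runtime counterparts of the two options and are only printed when they are readable.

```shell
#!/bin/sh
# Interpret a CONFIG_PANIC_TIMEOUT / kernel.panic value.
describe_panic_timeout() {
    if [ "$1" -lt 0 ]; then
        echo "reboot immediately"
    elif [ "$1" -eq 0 ]; then
        echo "halt and wait forever"
    else
        echo "reboot after $1 seconds"
    fi
}

# The same settings are visible at runtime as sysctls.
for f in /proc/sys/kernel/panic_on_oops /proc/sys/kernel/panic; do
    if [ -r "$f" ]; then
        echo "$f = $(cat "$f")"
    fi
done

describe_panic_timeout -1   # the value recommended by KSPP
```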

2.2.5 CONFIG_DEBUG_LIST

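Distribution   x86_64  riscv64
openEuler      1       1
Ubuntu Noble   0       0
Arch Linux     0       -
Fedora         1       1

Conclusion: reasonable (explicitly enabling CONFIG_DEBUG_KERNEL is recommended).

CONFIG_DEBUG_LIST depends on CONFIG_DEBUG_KERNEL.

CONFIG_DEBUG_KERNEL is not enabled explicitly, but a look through the files shows that CONFIG_EXPERT is set, which force-enables CONFIG_DEBUG_KERNEL. Setting it explicitly is still recommended.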

2.2.6 CONFIG_RANDOMIZE_BASE

Distribution   x86_64  riscv64
openEuler      1       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: not reasonable.

config RANDOMIZE_BASE
bool "Randomize the address of the kernel image"
select RELOCATABLE
depends on MMU && 64BIT && !XIP_KERNEL
help
Randomizes the virtual address at which the kernel image is
loaded, as a security feature that deters exploit attempts
relying on knowledge of the location of kernel internals.

It is the bootloader's job to provide entropy, by passing a
random u64 value in /chosen/kaslr-seed at kernel entry.

When booting via the UEFI stub, it will invoke the firmware's
EFI_RNG_PROTOCOL implementation (if available) to supply entropy
to the kernel proper. In addition, it will randomise the physical
location of the kernel Image as well.

If unsure, say N.

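As the definition shows, this option selects RELOCATABLE (it does not merely depend on it), so Kconfig should turn RELOCATABLE on automatically once RANDOMIZE_BASE=y. Still, the openRuyi repo does not appear to set RELOCATABLE explicitly: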

grep -Irn RELOCATABLE .
./0034-UPSTREAM-riscv-boot-Always-make-Image-from-vmlinux-n.patch:11:Doing so fixes booting a RELOCATABLE=y Image with kexec. The problem is
./0034-UPSTREAM-riscv-boot-Always-make-Image-from-vmlinux-n.patch:48:-ifdef CONFIG_RELOCATABLE
./0036-UPSTREAM-riscv-trace-fix-snapshot-deadlock-with-sbi-.patch:63: ifdef CONFIG_RELOCATABLE

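This option is the well-known KASLR.

Without it, the kernel is always loaded at a fixed virtual address; with it, the kernel is loaded at a random virtual address on every boot.

https://cateee.net/lkddb/web-lkddb/RANDOMIZE_BASE.html

KSPP recommends the following: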

# x86_64
# Randomize position of kernel and memory.
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y

It is therefore recommended to enable it on x86_64 as well.

2.3 x86_64

2.3.1 CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT

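Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       0
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

This option adds a random offset to the kernel stack on every system call.

It sets the default, which can be switched on or off with the kernel command-line parameter randomize_kstack_offset.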

2.3.2 CONFIG_SECURITY_DMESG_RESTRICT

Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

This option prevents unprivileged users from reading the kernel log buffer.

config SECURITY_DMESG_RESTRICT
bool "Restrict unprivileged access to the kernel syslog"
default n
help
This enforces restrictions on unprivileged users reading the kernel
syslog via dmesg(8).

If this option is not selected, no restrictions will be enforced
unless the dmesg_restrict sysctl is explicitly set to (1).

If you are unsure how to answer this question, answer N.

2.3.3 CONFIG_ZERO_CALL_USED_REGS

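Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     0       -
Fedora         0       0

Conclusion: reasonable.

It makes the compiler zero every temporary register that was used before a function returns.

PS: This requires a compiler that supports -fzero-call-used-regs=used-gpr; with Clang, the version must be newer than 15.0.6.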

config CC_HAS_ZERO_CALL_USED_REGS
def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
# https://github.com/ClangBuiltLinux/linux/issues/1766
# https://github.com/llvm/llvm-project/issues/59242
depends on !CC_IS_CLANG || CLANG_VERSION > 150006

config ZERO_CALL_USED_REGS
bool "Enable register zeroing on function exit"
depends on CC_HAS_ZERO_CALL_USED_REGS
help
At the end of functions, always zero any caller-used register
contents. This helps ensure that temporary values are not
leaked beyond the function boundary. This means that register
contents are less likely to be available for side channels
and information exposures. Additionally, this helps reduce the
number of useful ROP gadgets by about 20% (and removes compiler
generated "write-what-where" gadgets) in the resulting kernel
image. This has a less than 1% performance impact on most
workloads. Image size growth depends on architecture, and should
be evaluated for suitability. For example, x86_64 grows by less
than 1%, and arm64 grows by about 5%.

2.3.4 CONFIG_SCHED_STACK_END_CHECK

Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

This option detects kernel stack overflow on calls to schedule().

config SCHED_STACK_END_CHECK
bool "Detect stack corruption on calls to schedule()"
depends on DEBUG_KERNEL
default n
help
This option checks for a stack overrun on calls to schedule().
If the stack end location is found to be over written always panic as
the content of the corrupted region can no longer be trusted.
This is to ensure no erroneous behaviour occurs which could result in
data corruption or a sporadic crash at a later stage once the region
is examined. The runtime overhead introduced is minimal.

2.3.5 CONFIG_KFENCE_SAMPLE_INTERVAL

Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   0       0
Arch Linux     100     -
Fedora         100     100

Conclusion: this option is used together with CONFIG_KFENCE, and the value set here is somewhat inappropriate.

In openRuyi this option is set to 0:

$ grep -Irn CONFIG_KFENCE_SAMPLE_INTERVAL .
./config.x86_64:5945:CONFIG_KFENCE_SAMPLE_INTERVAL=0

The riscv config, however, does not set it, so riscv64 falls back to the default of 100, whereas on x86_64 the value 0 effectively means that CONFIG_KFENCE is not active.

if KFENCE

config KFENCE_SAMPLE_INTERVAL
int "Default sample interval in milliseconds"
default 100
help
The KFENCE sample interval determines the frequency with which heap
allocations will be guarded by KFENCE. May be overridden via boot
parameter "kfence.sample_interval".

Set this to 0 to disable KFENCE by default, in which case only
setting "kfence.sample_interval" to a non-zero value enables KFENCE.

KFENCE can also be enabled another way:

# Another way to enable KFENCE (see CONFIG_KFENCE_SAMPLE_INTERVAL).
kfence.sample_interval=100
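
The difference between the two architectures can be reproduced with a small sketch. The function name kfence_interval and the two sample fragments are hypothetical; the fallback of 100 is the Kconfig default quoted above, and a kfence.sample_interval boot parameter would override whatever this reports.

```shell
#!/bin/sh
# kfence_interval FILE: print the sample interval a config fragment implies.
# Assumes the Kconfig default of 100 when KFENCE is on but no interval is set.
kfence_interval() {
    if ! grep -q '^CONFIG_KFENCE=y$' "$1"; then
        echo "KFENCE not built"
        return
    fi
    value=$(sed -n 's/^CONFIG_KFENCE_SAMPLE_INTERVAL=//p' "$1")
    value=${value:-100}
    if [ "$value" -eq 0 ]; then
        echo "built, but disabled by default (interval 0)"
    else
        echo "sampling every $value ms"
    fi
}

# Made-up fragments that mirror the two cases discussed above.
tmp=$(mktemp -d)
printf '%s\n' 'CONFIG_KFENCE=y' > "$tmp/riscv64"
printf '%s\n' 'CONFIG_KFENCE=y' 'CONFIG_KFENCE_SAMPLE_INTERVAL=0' > "$tmp/x86_64"
riscv_state=$(kfence_interval "$tmp/riscv64")
x86_state=$(kfence_interval "$tmp/x86_64")
echo "riscv64: $riscv_state"
echo "x86_64:  $x86_state"
rm -rf "$tmp"
```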

2.3.6 CONFIG_SCHED_CORE

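Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       0
Arch Linux     1       -
Fedora         1       1

Conclusion: reasonable.

According to its description, it addresses security problems under simultaneous multithreading (SMT). This is outside my area of knowledge.

It depends on SCHED_SMT, which in turn is switched on or off automatically by the architecture.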

config SCHED_CORE
bool "Core Scheduling for SMT"
depends on SCHED_SMT
help
This option permits Core Scheduling, a means of coordinated task
selection across SMT siblings. When enabled -- see
prctl(PR_SCHED_CORE) -- task selection ensures that all SMT siblings
will execute a task from the same 'core group', forcing idle when no
matching task is found.

Use of this feature includes:
- mitigation of some (not all) SMT side channels;
- limiting SMT interference to improve determinism and/or performance.

SCHED_CORE is default disabled. When it is enabled and unused,
which is the likely usage by Linux distributions, there should
be no measurable impact on performance.

https://docs.kernel.org/admin-guide/hw-vuln/core-scheduling.html

2.3.7 CONFIG_RESET_ATTACK_MITIGATION

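Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   1       1
Arch Linux     0       -
Fedora         0       0

Conclusion: reasonable.

This option keeps an attacker from stealing sensitive data left in memory by rebooting the machine. I have not looked into why so many distributions leave it off; my guess is that firmware compatibility issues are involved.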

https://github.com/a13xp0p0v/kernel-hardening-checker/issues/11

config RESET_ATTACK_MITIGATION
bool "Reset memory attack mitigation"
depends on EFI_STUB
help
Request that the firmware clear the contents of RAM after a reboot
using the TCG Platform Reset Attack Mitigation specification. This
protects against an attacker forcibly rebooting the system while it
still contains secrets in RAM, booting another OS and extracting the
secrets. This should only be enabled when userland is configured to
clear the MemoryOverwriteRequest flag on clean shutdown after secrets
have been evicted, since otherwise it will trigger even on clean
reboots.

2.3.8 CONFIG_AMD_IOMMU

Distribution   x86_64  riscv64
openEuler      1       0
Ubuntu Noble   1       0
Arch Linux     1       -
Fedora         1       0

Conclusion: reasonable.

Most distributions enable this option. I did not look into it further.

config AMD_IOMMU
bool "AMD IOMMU support"
select SWIOTLB
select PCI_MSI
select PCI_ATS
select PCI_PRI
select PCI_PASID
select IRQ_MSI_LIB
select MMU_NOTIFIER
select IOMMU_API
select IOMMU_IOVA
select IOMMU_IO_PGTABLE
select IOMMU_SVA
select IOMMU_IOPF
select IOMMUFD_DRIVER if IOMMUFD
depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
help
With this option you can enable support for AMD IOMMU hardware in
your system. An IOMMU is a hardware component which provides
remapping of DMA memory accesses from devices. With an AMD IOMMU you
can isolate the DMA memory of different devices and protect the
system from misbehaving device drivers or hardware.

You can find out if your system has an AMD IOMMU if you look into
your BIOS for an option to enable it or if you have an IVRS ACPI
table.

2.3.9 CONFIG_DEFAULT_MMAP_MIN_ADDR

Distribution   x86_64  riscv64
openEuler      4096    4096
Ubuntu Noble   65536   65536
Arch Linux     65536   -
Fedora         65536   4096

Conclusion: reasonable.

config DEFAULT_MMAP_MIN_ADDR
int "Low address space to protect from user allocation"
depends on MMU
default 4096
help
This is the portion of low virtual memory which should be protected
from userspace allocation. Keeping a user from writing to low pages
can help reduce the impact of kernel NULL pointer bugs.

For most arm64, ppc64 and x86 users with lots of address space
a value of 65536 is reasonable and should cause no problems.
On arm and other archs it should not be higher than 32768.
Programs which use vm86 functionality or have some need to map
this low address space will need CAP_SYS_RAWIO or disable this
protection by setting the value to 0.

This value can be changed after boot using the
/proc/sys/vm/mmap_min_addr tunable.

2.3.10 CONFIG_INTEL_IOMMU*

Distribution   x86_64  riscv64
openEuler      1       0
Ubuntu Noble   1       0
Arch Linux     1       -
Fedora         1       0

Conclusion: reasonable.

2.3.11 CONFIG_LEGACY_VSYSCALL_NONE

Distribution   x86_64  riscv64
openEuler      0       0
Ubuntu Noble   0       0
Arch Linux     0       -
Fedora         0       0

Conclusion: reasonable.

This option disables the legacy vsyscall page. Mainstream distributions all leave it off for compatibility reasons.

vsyscall= [X86-64,EARLY]
Controls the behavior of vsyscalls (i.e. calls to
fixed addresses of 0xffffffffff600x00 from legacy
code). Most statically-linked binaries and older
versions of glibc use these calls. Because these
functions are at fixed addresses, they make nice
targets for exploits that can control RIP.

               emulate     Vsyscalls turn into traps and are emulated
                           reasonably safely.  The vsyscall page is
                           readable.

               xonly       [default] Vsyscalls turn into traps and are
                           emulated reasonably safely.  The vsyscall
                           page is not readable.

               none        Vsyscalls don't work at all.  This makes
                           them quite hard to use for exploits but
                           might break your system.

https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html#vsyscall

KSPP:

# Modern libc no longer needs a fixed-position mapping in userspace, remove it as a possible target.
# CONFIG_X86_VSYSCALL_EMULATION is not set
CONFIG_LEGACY_VSYSCALL_NONE=y

https://docs.yoctoproject.org/pipermail/linux-yocto/2018-August/007229.html

3 kernel-hardening-checker

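This tool automatically checks the security hardening options of a Linux kernel. My suggestion is therefore that openRuyi adopt it as a reference for its kernel security configuration.

The cited article already explains it well, so there is no need to repeat it here.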

https://cloud.tencent.com/developer/article/1855250

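4 Summary

I observed three interesting things.

First, in the KSPP project the author's recommended configs only cover x86_*, arm*, and gcc_plugins. Other architectures such as riscv64 lack specific recommendations, which raises the question of whether a security template configuration for riscv64 exists at all. We need sustained investment to explore and define a set of best-practice security configuration standards for the riscv architecture.

Second, comparing the configs of different distributions, I found that each distribution also manages its kernel configuration differently. openEuler's approach feels relatively simple; its drawback is that integration and management become inconvenient as the number of architectures grows. Ubuntu's approach looks very efficient. openRuyi, however, is clearly focused on one architecture, so for the riscv configuration an openEuler-style approach both meets current needs and keeps maintenance costs down.

Third, some distributions carry many security configs while others have relatively few. Could openRuyi ship two editions, an aggressive one and a general one? The aggressive edition would unleash performance as an experimental release, while the general edition would enable the necessary security features and serve as the default choice.

The next steps should be:

Studying the KSPP configuration list in depth will be highly valuable. We need to weigh each option against the distribution's specific positioning. The point is not to copy blindly but to audit item by item, and finally to apply the results systematically to our code repository.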

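5 Limitations

Finally, a note on the limitations of this article, or rather its criteria for judgment:

  1. The analysis is static and has not been verified by running code.

  2. The current criteria for judgment:

    • If other distributions' kernels support an option, it is considered reasonable.

    • If an option is not architecture-specific, it is considered reasonable.

    • If the options do not conflict with each other, they are considered reasonable.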

6 References

  1. Official KSPP project repo: https://github.com/KSPP

  2. A first look at the development of KSPP: https://zhuanlan.zhihu.com/p/585398563

  3. Kernel Self-Protection: https://www.kernel.org/doc/html/latest/security/self-protection.html

  4. Recommended configs: https://github.com/KSPP/kspp.github.io/blob/main/Recommended_Settings.md, or https://kspp.github.io/Recommended_Settings

  5. Project introduction page: https://kspp.github.io/

  6. An article on Linux kernel security in which the author describes the Linux community's attitude toward kernel security; it also draws on the KSPP standards and reads somewhat like a Chinese localization of KSPP: https://zhuanlan.zhihu.com/p/548481948