The Linux Kernel/Debugging


⚲ API

linux/err.h inc helper macros for error pointer handling and propagation
linux/errno.h inc standard error codes used throughout the kernel.
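
The error pointer idiom behind linux/err.h can be sketched in user space. The block below is a re-creation for illustration, not the kernel header: the helper names mirror the kernel ones, while demo_alloc and demo_use are hypothetical functions invented for this example. It assumes, as the kernel does, that the top 4095 values of the address space never hold a valid pointer.

```c
/* User-space re-creation of the error pointer idiom from linux/err.h.
 * Illustration only; in kernel code include <linux/err.h> instead. */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a buffer or an encoded errno. */
static void *demo_alloc(size_t size)
{
	void *p;

	if (size == 0)
		return ERR_PTR(-EINVAL);
	p = malloc(size);
	if (!p)
		return ERR_PTR(-ENOMEM);
	return p;
}

/* Hypothetical caller: propagates the failure as a negative integer. */
static int demo_use(size_t size)
{
	void *buf = demo_alloc(size);

	if (IS_ERR(buf))
		return (int)PTR_ERR(buf);
	free(buf);
	return 0;
}
```

A single return value carries either a valid pointer or an error code from linux/errno.h, so the caller needs no separate status argument.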

Performance

Kernel performance depends on the hardware configuration, the software configuration, and the characteristics of the workload.

Optimizing kernel performance means identifying the bottlenecks in a system and addressing them: tuning kernel parameters, optimizing the use of system resources, and fixing bugs and other issues that degrade performance.

Because the kernel is complex and many factors interact, performance optimization can be challenging. With the right tools and techniques, however, the performance and reliability of Linux-based systems can be improved significantly.

Perf_events

Perf_events, short for performance events, is a kernel interface that exposes detailed data about the performance of software running on a system. By analyzing the data it collects, developers can identify performance bottlenecks and optimize software to run faster and use fewer resources. Perf_events is designed as a lightweight, low-overhead monitoring facility with minimal impact on system performance.


🔧 TODO


⚲ Interfaces

man 1 perf performance analysis tools
Basic commands:
man 1 perf-help display help information about perf
man 1 perf-top System profiling tool.
man 1 perf-record Run a command and record its profile into perf.data
man 1 perf-report Read perf.data (created by perf record) and display the profile
Other commands:
man 1 perf-annotate Read perf.data (created by perf record) and display annotated code
man 1 perf-archive Create archive with object files with build-ids found ...
man 1 perf-arm-spe Support for Arm Statistical Profiling Extension within...
man 1 perf-bench General framework for benchmark suites
man 1 perf-buildid-cache Manage build-id cache.
man 1 perf-buildid-list List the buildids in a perf.data file
man 1 perf-c2c Shared Data C2C/HITM Analyzer.
man 1 perf-config Get and set variables in a configuration file.
man 1 perf-daemon Run record sessions on background
man 1 perf-data Data file related processing
man 1 perf-diff Read perf.data files and display the differential profile
man 1 perf-dlfilter Filter sample events using a dynamically loaded shared...
man 1 perf-evlist List the event names in a perf.data file
man 1 perf-ftrace simple wrapper for kernel's ftrace functionality
man 1 perf-inject Filter to augment the events stream with additional in...
man 1 perf-intel-pt Support for Intel Processor Trace within perf tools
man 1 perf-iostat Show I/O performance metrics
man 1 perf-kallsyms Searches running kernel for symbols
man 1 perf-kmem Tool to trace/measure kernel memory properties
man 1 perf-kvm Tool to trace/measure kvm guest os
man 1 perf-kwork Tool to trace/measure kernel work properties (latencies)
man 1 perf-list List all symbolic event types
man 1 perf-lock Analyze lock events
man 1 perf-mem Profile memory accesses
man 1 perf-probe Define new dynamic tracepoints
man 1 perf-sched Tool to trace/measure scheduler properties (latencies)
man 1 perf-script Read perf.data (created by perf record) and display tr...
man 1 perf-script-perl Process trace data with a Perl script
man 1 perf-script-python Process trace data with a Python script
man 1 perf-stat Run a command and gather performance counter statistics
man 1 perf-test Runs sanity tests.
man 1 perf-timechart Tool to visualize total system behavior during a workload
man 1 perf-trace strace inspired tool
man 1 perf-version display the version of perf binary


⚙️ Internals

man 2 perf_event_open sets up performance monitoring
uapi/linux/perf_event.h inc
tools/perf src
linux/perf_event.h inc
kernel/events/core.c src
kernel/profile.c src simple profiling
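
The perf_event_open system call listed above can also be used directly, without the perf tool. The sketch below is a minimal example under several assumptions: it counts the software task-clock event rather than a hardware counter, so that it has a chance of working in virtual machines; the function name measure_task_clock is invented for this example; and the call may still fail, depending on /proc/sys/kernel/perf_event_paranoid or sandboxing, in which case the function returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Returns the task-clock time (in nanoseconds) charged to a busy loop,
 * or -1 if the counter could not be opened or read. */
static long long measure_task_clock(unsigned long iterations)
{
	struct perf_event_attr attr;
	volatile unsigned long sink = 0;
	uint64_t count = 0;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.disabled = 1;        /* start stopped, enable explicitly */
	attr.exclude_kernel = 1;  /* user-space time only */
	attr.exclude_hv = 1;

	/* pid 0, cpu -1: measure the calling thread on any CPU. */
	fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (unsigned long i = 0; i < iterations; i++)
		sink += i;        /* the workload being measured */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)count;
}
```

Replacing the type and config fields with PERF_TYPE_HARDWARE and PERF_COUNT_HW_INSTRUCTIONS gives the instruction counter used in the man page example, provided the machine exposes a hardware PMU.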


📖 References

perf instruments CPU performance counters, tracepoints, kprobes, and uprobes
https://perf.wiki.kernel.org/


📚 Further reading

perf Examples
The Unofficial Linux Perf Events Web-Page


🛠️ Utilities

Performance Co-Pilot, https://pcp.io/
Prometheus, https://prometheus.io/
https://github.com/redhat-nfvpe/container-perf-tools
https://github.com/brendangregg/perf-tools performance analysis tools based on Linux perf_events (aka perf) and ftrace
readprofile a tool to read kernel profiling information


📚 Further reading

stress-ng exercises various kernel interfaces
http://trac.gateworks.com/wiki/linux/profiling
Analyzing application performance in RHEL 9
Monitoring and managing system status and performance in RHEL 9
Real-time Linux

User space debug interfaces

⚲ Interfaces

man 1 dmesg prints or controls the kernel ring buffer
man 2 syslog system call, which is used to control the kernel printk() buffer
man 1 strace system calls and signals tracing tool
man 2 ptrace process trace system call
man 3 klogctl glibc wrapper for the syslog system call
man 5 core core dump file
/sys/kernel/debug/ debugfs
dmesg --console-level <level>
gdb /usr/src/linux/vmlinux /proc/kcore
/proc/self/stack
dynamic debug doc
⌨️ hands-on:
echo "module atkbd +pfl" | sudo tee /sys/kernel/debug/dynamic_debug/control
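
The syslog system call and its glibc wrapper klogctl, both listed above, are what dmesg relies on. The following is a minimal sketch of reading the whole ring buffer from C. The two action numbers are defined locally because glibc does not export names for them, and read_kernel_log is a name chosen for this example. On systems where /proc/sys/kernel/dmesg_restrict is set, the call needs privileges and the function returns NULL.

```c
#include <stdlib.h>
#include <sys/klog.h>

/* Action codes from syslog(2); glibc headers do not define them. */
#define SYSLOG_ACTION_READ_ALL    3
#define SYSLOG_ACTION_SIZE_BUFFER 10

/* Returns a malloc'd, NUL-terminated copy of the kernel ring buffer,
 * or NULL on failure. The caller frees the result. */
static char *read_kernel_log(int *len_out)
{
	int size = klogctl(SYSLOG_ACTION_SIZE_BUFFER, NULL, 0);
	char *buf;
	int len;

	if (size <= 0)
		return NULL;

	buf = malloc((size_t)size + 1);
	if (!buf)
		return NULL;

	/* Non-destructive read of everything currently in the buffer. */
	len = klogctl(SYSLOG_ACTION_READ_ALL, buf, size);
	if (len < 0) {
		free(buf);
		return NULL;
	}
	buf[len] = '\0';
	if (len_out)
		*len_out = len;
	return buf;
}
```

A caller would print the returned string and free it; a NULL result usually means a permission problem, and errno then holds the reason.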


⚙️ Internals

handle_sysrq id


📚 References

Development tools for the kernel doc
DebugFS doc, samples/qmi/qmi_sample_client.c src
Kprobe-based Event Tracing doc
Dynamic debug doc
Linux Magic System Request Key Hacks doc
Magic SysRq key

Tracing and logging

⚲ API:

User-space interface:

man 1 dmesg prints or controls the kernel ring buffer
man 2 syslog system call, which is used to control the kernel printk() buffer
/proc/kmsg
https://kernelshark.org/ front end reader of trace-cmd
https://trace-cmd.org/, man 1 trace-cmd – CLI for Ftrace doc, the Linux kernel internal tracer; /sys/kernel/debug/tracing/
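
Each line read from /proc/kmsg or through syslog(2) starts with a priority in angle brackets, for example <6>. The low three bits of that number are the log level and the remaining bits are the facility. A small sketch of decoding the prefix follows; decode_prefix and kmsg_level_names are names invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Level names in numeric order, 0 (most severe) to 7. */
static const char *const kmsg_level_names[8] = {
	"emerg", "alert", "crit", "err",
	"warning", "notice", "info", "debug",
};

/* Parses a leading "<N>" priority. On success stores the facility and
 * level and returns a pointer to the text after the prefix; returns
 * NULL if the line does not start with a valid prefix. */
static const char *decode_prefix(const char *line, int *facility, int *level)
{
	char *end;
	long prio;

	if (line[0] != '<' || line[1] < '0' || line[1] > '9')
		return NULL;

	prio = strtol(line + 1, &end, 10);
	if (*end != '>')
		return NULL;

	*facility = (int)(prio >> 3);  /* upper bits: facility */
	*level = (int)(prio & 7);      /* low three bits: log level */
	return end + 1;
}
```

For a line beginning with <6> the function reports facility 0 (kernel) and level 6, and kmsg_level_names[6] is "info", the level used by pr_info.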


The most commonly used functions

linux/printk.h inc
dump_stack id – prints the current kernel stack trace for debugging purposes
pr_alert id – logs an alert-level message, indicating a critical event that requires immediate attention
pr_cont id – continues printing the current message on the same line
pr_crit id – logs a critical-level message, indicating a severe condition that might require system halt
pr_debug id – logs a debug-level message; compiled in when DEBUG is defined, or switched on and off at run time when dynamic debug is enabled, see dynamic debug doc
pr_devel id – logs a debug-level message for developers; compiled in only when DEBUG is defined and not controlled by dynamic debug
pr_emerg id – logs an emergency-level message, indicating a serious error that could cause system crash
pr_err id – logs an error-level message, typically indicating an issue that requires attention
pr_err_ratelimited id – logs an error-level message with rate limiting to prevent excessive logging
pr_fmt id – macro that a source file can define to add a common prefix to the format string of its pr_* messages
pr_info id – logs an informational-level message, providing status updates or diagnostics
pr_info_ratelimited id – logs an informational-level message with rate limiting
pr_notice id – logs a notice-level message, typically used for events that aren't errors but should be noted
pr_warn id – logs a warning-level message, indicating a potential issue that doesn't immediately affect system functionality
pr_warn_once id – logs a warning message once, preventing repeated warnings for the same event
pr_warn_ratelimited id – logs a warning-level message with rate limiting
print_hex_dump id – prints a hexdump of data for debugging purposes
print_hex_dump_debug id – prints a hexdump at debug level, enabled in the same way as pr_debug
printk id – the primary function for printing kernel messages with varying severity levels
va_format id – structure that carries a format string and its variable argument list, printed with the %pV format specifier
⌨️ hands-on:
echo "module atkbd +pfl" | sudo tee /sys/kernel/debug/dynamic_debug/control; dmesg -w
and type on the built-in keyboard
include/linux/dev_printk.h inc – device-specific logging
dev_crit id – prints a critical-level message for a device
dev_dbg id – prints a debug-level message for a device if debugging is enabled
dev_dbg_ratelimited id – prints debug messages for a device with rate limiting
dev_err id – prints an error-level message for a device
dev_err_once id – prints an error message for a device only once
dev_err_probe id – prints an error message for a failed probe and returns the error code; -EPROBE_DEFER is logged at debug level instead
dev_err_ratelimited id – prints error messages for a device with rate limiting
dev_fmt id – defines a format string used by device-specific printk macros
dev_info id – prints an informational-level message for a device
dev_notice id – prints a notice-level message for a device
dev_printk id – generic function to print kernel messages with specified log level for a device
dev_vdbg id – prints verbose debug messages for a device if enabled at compile time
dev_warn id – prints a warning-level message for a device
dev_warn_once id – prints a warning message for a device only once
dev_warn_ratelimited id – prints warning messages for a device with rate limiting
asm-generic/bug.h inc
WARN_ON id
WARN id
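
The way pr_fmt from the list above adds a prefix relies only on string literal concatenation in the preprocessor, so it can be imitated in user space. The sketch below is an imitation, not kernel code: demo_pr_info and format_probe_message are names invented for this example, the output goes to a buffer through snprintf instead of printk, and KERN_INFO is replaced by the textual form <6> that appears in /proc/kmsg (the kernel uses a short binary header internally).

```c
#include <stdio.h>

/* A source file defines pr_fmt before the logging macros are expanded. */
#define pr_fmt(fmt) "demo: " fmt

/* Simplified stand-in for the kernel's log level header. */
#define KERN_INFO "<6>"

/* User-space imitation of pr_info: adjacent string literals are merged
 * by the compiler, so level, prefix and message form one format string. */
#define demo_pr_info(buf, size, fmt, ...) \
	snprintf(buf, size, KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical driver message. */
static int format_probe_message(char *buf, size_t size, int irq)
{
	return demo_pr_info(buf, size, "probed, irq=%d\n", irq);
}
```

Because the prefix is applied at compile time, every pr_* call in the file gets it at no run-time cost. Kernel sources commonly define pr_fmt as KBUILD_MODNAME ": " fmt.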


⚙️ Internals

printk id
kernel/printk/printk.c src
arch/x86/kernel/traps.c src
lib/dump_stack.c src
kernel/trace src
scripts/tracing/draw_functrace.py src
logging ltp, tracing ltp
samples/ftrace src
samples/trace_events src
samples/trace_printk src
linux/instrumentation.h inc


📚 References:

Debugging by printing
Message logging with printk doc
Dynamic debug doc
SystemTap
man 1 stap systemtap script translator/driver
strace
man 1 strace trace system calls and signals
LTTng
ftrace
Linux Tracing Technologies doc
Tracepoint Analysis doc
Function Tracer doc function, latency and event tracing
Event Tracing doc
Using ftrace to hook to functions doc
Fprobe - Function entry/exit probe doc
Kprobes doc
Kprobe-based Event Tracing doc
Uprobe-tracer: Uprobe-based Event Tracing doc
Using the Linux Kernel Tracepoints doc
Subsystem Trace Points: kmem doc
Subsystem Trace Points: power doc
NMI Trace Events doc
In-kernel memory-mapped I/O tracing doc
Event Histograms doc
Histogram Design Notes doc
Boot-time tracing doc
Hardware Latency Detector doc
Intel(R) Trace Hub (TH) doc
Lockless Ring Buffer Design doc
System Trace Module doc
CoreSight - ARM Hardware Trace doc

🔧 TODO. 🚀 advanced features

linux/kmemleak.h inc memory leak detector
pr_cont id – continues a previous log message on the same line
print_hex_dump_bytes id
print_hex_dump_debug id
dump_stack id
CONFIG_PRINTK_CALLER id
CONFIG_DEBUG_KERNEL id
CONFIG_DEBUG_INFO id
https://git.kernel.org/pub/scm/libs/libtrace/

kgdb and kdb

⚲ Interfaces

linux/kgdb.h inc
linux/kdb.h inc


⚙️ Internals

kernel/debug src


📚 References

Using kgdb, kdb and the kernel debugger internals doc
kdump
kdump doc
man 8 crash Analyze Linux crash dump data or a live system


eBPF

⚲ API:

man 2 bpf
kernel/bpf/syscall.c src


📖 References

eBPF and BPF doc


📚 Further reading

man 7 bpf-helpers
Linux Extended BPF (eBPF) Tracing Tools
bpftrace High-level tracing language for Linux eBPF
BCC Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Example of trace.py
man 8 stapbpf
eBPF Programming for Linux Kernel Tracing
lockdep - Runtime locking correctness validator doc


Watchdogs

The Linux Kernel/Softdog Driver

dev_watchdog id network device watchdog

The NMI watchdog lockup detectors:

⚲ API

/proc/sys/kernel/nmi_watchdog
/proc/sys/kernel/soft_watchdog
/proc/sys/kernel/watchdog
/proc/sys/kernel/watchdog_cpumask
/proc/sys/kernel/watchdog_thresh
/proc/sys/kernel/hardlockup_all_cpu_backtrace
/proc/sys/kernel/hardlockup_panic
/proc/sys/kernel/softlockup_all_cpu_backtrace
/proc/sys/kernel/softlockup_panic
linux/nmi.h inc
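
The sysctl files above hold plain decimal numbers, so they are easy to read from a program. The sketch below reads one such file and derives the soft lockup threshold, which the kernel documentation gives as twice watchdog_thresh. Both function names are invented for this example.

```c
#include <stdio.h>

/* Reads a single integer from a sysctl file such as
 * /proc/sys/kernel/watchdog_thresh. Returns 0 on success, -1 on error. */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Soft lockup threshold in seconds: 2 * watchdog_thresh.
 * A watchdog_thresh of 0 disables lockup detection altogether. */
static int soft_lockup_threshold(int watchdog_thresh)
{
	return 2 * watchdog_thresh;
}
```

With the default watchdog_thresh of 10 seconds, a soft lockup is reported after 20 seconds, while the hard lockup threshold stays at 10 seconds.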


👁️ Example

lib/test_lockup.c src test module to generate lockups

Provoke NMI watchdog without panic:

echo 0 > /proc/sys/kernel/hardlockup_panic
insmod test_lockup.ko disable_irq=1 time_secs=13

⚙️ Internals

kernel/watchdog.c src detects hard and soft lockups on a system
kernel/watchdog_perf.c src detects hard lockups on a system using perf
kernel/watchdog_buddy.c src

📚 References

Documentation for /proc/sys/kernel/ doc
Softlockup detector and hardlockup detector (aka nmi_watchdog) doc
kernel parameters:
nmi_watchdog param
nowatchdog param
nosoftlockup param
softlockup_panic param

...

⚙️ Internals

arch/x86/kernel/traps.c src


📖 References for debugging

Ramoops oops/panic logger doc
pstore block oops/panic logger doc
Fault injection doc
Bisecting a bug doc
Development tools for the kernel doc
Kernel Testing Guide doc
Checkpatch doc, scripts/checkpatch.pl src
Selftests doc, tools/testing/selftests src
linux/tracepoint.h inc


📚 Further reading

https://deepwiki.com/torvalds/linux/2.4-kernel-tracing-and-profiling
https://drgn.readthedocs.io/ programmable debugger
https://crash-utility.github.io/
https://wiki.ubuntu.com/Kernel/Debugging
Intel VTune Profiler
Linux Applications Debugging Techniques