The Linux Kernel/Debugging
⚲ API
- linux/err.h inc – helper macros for error pointer handling and propagation
- linux/errno.h inc – standard error codes used throughout the kernel.
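Many kernel functions that return a pointer report failure through the pointer itself. linux/err.h reserves the highest MAX_ERRNO (4095) addresses, which are never valid pointers, to carry a negative errno value. The user-space sketch below re-creates ERR_PTR(), PTR_ERR() and IS_ERR() to show that encoding. The kernel's own definitions add annotations such as __must_check and unlikely(), and lookup_item() is a hypothetical example function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space re-creation of the linux/err.h encoding, for illustration only. */
#define MAX_ERRNO 4095

/* Encode a negative errno value (for example -ENOMEM) as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

/* True if ptr lies in the top MAX_ERRNO addresses reserved for error codes. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup: a valid pointer on success, an error pointer otherwise. */
static int items[4] = {10, 20, 30, 40};

static int *lookup_item(int index)
{
    if (index < 0)
        return ERR_PTR(-EINVAL);
    if (index >= 4)
        return ERR_PTR(-ENOENT);
    return &items[index];
}
```

A caller checks IS_ERR() before dereferencing and propagates the code with PTR_ERR(). Because IS_ERR(NULL) is false, the kernel also provides IS_ERR_OR_NULL() for functions that may return either.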
Performance
There are many factors that can affect the performance of the Linux kernel, including hardware configuration, software configuration, and workload characteristics.
In this context, performance optimization of the Linux kernel means identifying and removing bottlenecks in the system. This can involve tuning kernel parameters (for example through sysctl), adjusting how resources such as CPUs, memory and I/O are allocated, and finding and fixing bugs that hurt performance.
Given the complexity of the kernel and the number of factors involved, this is rarely straightforward. The tools described below, such as perf, ftrace and eBPF, measure where time is actually spent, which is the basis for any tuning.
Perf_events
Perf_events, short for performance events, is the kernel subsystem behind the perf(1) tool and the perf_event_open(2) system call. It gives access to hardware performance counters, software events, tracepoints, kprobes and uprobes, which developers use to find performance bottlenecks and reduce resource usage. Counting events has very low overhead; the cost of sampling grows with the sampling frequency and the amount of data recorded.
🔧 TODO
⚲ Interfaces
- man 1 perf – performance analysis tools
- Basic commands:
- man 1 perf-help – display help information about perf
- man 1 perf-top – System profiling tool.
- man 1 perf-record – Run a command and record its profile into perf.data
- man 1 perf-report – Read perf.data (created by perf record) and display the profile
- Other commands:
- man 1 perf-annotate – Read perf.data (created by perf record) and display annotated code
- man 1 perf-archive – Create archive with object files with build-ids found ...
- man 1 perf-arm-spe – Support for Arm Statistical Profiling Extension within...
- man 1 perf-bench – General framework for benchmark suites
- man 1 perf-buildid-cache – Manage build-id cache.
- man 1 perf-buildid-list – List the buildids in a perf.data file
- man 1 perf-c2c – Shared Data C2C/HITM Analyzer.
- man 1 perf-config – Get and set variables in a configuration file.
- man 1 perf-daemon – Run record sessions on background
- man 1 perf-data – Data file related processing
- man 1 perf-diff – Read perf.data files and display the differential profile
- man 1 perf-dlfilter – Filter sample events using a dynamically loaded shared...
- man 1 perf-evlist – List the event names in a perf.data file
- man 1 perf-ftrace – simple wrapper for kernel's ftrace functionality
- man 1 perf-inject – Filter to augment the events stream with additional in...
- man 1 perf-intel-pt – Support for Intel Processor Trace within perf tools
- man 1 perf-iostat – Show I/O performance metrics
- man 1 perf-kallsyms – Searches running kernel for symbols
- man 1 perf-kmem – Tool to trace/measure kernel memory properties
- man 1 perf-kvm – Tool to trace/measure kvm guest os
- man 1 perf-kwork – Tool to trace/measure kernel work properties (latencies)
- man 1 perf-list – List all symbolic event types
- man 1 perf-lock – Analyze lock events
- man 1 perf-mem – Profile memory accesses
- man 1 perf-probe – Define new dynamic tracepoints
- man 1 perf-sched – Tool to trace/measure scheduler properties (latencies)
- man 1 perf-script – Read perf.data (created by perf record) and display tr...
- man 1 perf-script-perl – Process trace data with a Perl script
- man 1 perf-script-python – Process trace data with a Python script
- man 1 perf-stat – Run a command and gather performance counter statistics
- man 1 perf-test – Runs sanity tests.
- man 1 perf-timechart – Tool to visualize total system behavior during a workload
- man 1 perf-trace – strace inspired tool
- man 1 perf-version – display the version of perf binary
⚙️ Internals
- man 2 perf_event_open – sets up performance monitoring
- uapi/linux/perf_event.h inc
- tools/perf src
- linux/perf_event.h inc
- kernel/events/core.c src
- kernel/profile.c src – simple profiling
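perf(1) is built on the perf_event_open(2) system call listed above. The sketch below, modelled on the example in that man page, counts the user-space instructions executed by a piece of code in the calling thread. count_instructions() and busy_loop() are hypothetical names. glibc provides no wrapper, so the call goes through syscall(2), and it can fail in containers and virtual machines without hardware counters or when /proc/sys/kernel/perf_event_paranoid forbids it, so the error path matters.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(2), so invoke it via syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Example workload: a loop the compiler cannot remove (volatile counter). */
static void busy_loop(void *arg)
{
    volatile unsigned long n = *(unsigned long *)arg;
    while (n > 0)
        n--;
}

/*
 * Counts user-space instructions retired by the calling thread while fn(arg)
 * runs. Returns 0 and stores the count in *out, or a negative errno value
 * if the event cannot be opened or read.
 */
static int count_instructions(void (*fn)(void *), void *arg, uint64_t *out)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -errno;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count;
    ssize_t got = read(fd, &count, sizeof(count));
    int saved = errno;
    close(fd);
    if (got != (ssize_t)sizeof(count))
        return got < 0 ? -saved : -EIO;
    *out = count;
    return 0;
}
```

From the shell, `perf stat -e instructions <command>` reads the same hardware counter for a whole command.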
📖 References
- perf – instruments CPU performance counters, tracepoints, kprobes, and uprobes
- https://perf.wiki.kernel.org/
📚 Further reading
🛠️ Utilities
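- Performance Co-Pilot, https://pcp.io/ – toolkit for collecting, monitoring and analysing system performance metrics
- Prometheus, https://prometheus.io/ – monitoring system and time series database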
- https://github.com/redhat-nfvpe/container-perf-tools
- https://github.com/brendangregg/perf-tools – performance analysis tools based on Linux perf_events (aka perf) and ftrace
- readprofile – a tool to read kernel profiling information
User space debug interfaces
⚲ Interfaces
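- man 1 dmesg – print or control the kernel ring buffer
- man 2 syslog – system call to read and control the kernel printk() ring buffer
- man 1 strace – system calls and signals tracing tool
- man 2 ptrace – process trace system call
- man 3 klogctl – glibc interface to the syslog(2) system call
- man 5 core – core dump file
- /sys/kernel/debug/ – debugfs
- dmesg --console-level <level> – set the level at which messages are printed to the console
- gdb /usr/src/linux/vmlinux /proc/kcore – inspect the memory of the running kernel (requires root and CONFIG_PROC_KCORE)
- /proc/self/stack – kernel stack trace of the calling task
- dynamic debug doc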
- ⌨️ hands-on:
- echo "module atkbd +pfl" | sudo tee /sys/kernel/debug/dynamic_debug/control
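strace(1) relies on ptrace(2): the tracer resumes the traced child with PTRACE_SYSCALL, and the kernel stops the child at every system-call entry and exit. The sketch below uses that mechanism to count the system calls made by a program. count_syscalls() is a hypothetical helper; it runs the program without arguments and discards signals instead of re-delivering them, which a real tracer must not do.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs the program at `path` (no arguments) under ptrace and returns the
 * number of system calls it made, or -1 on error (for example when ptrace
 * is blocked by a seccomp policy or this process is already being traced).
 */
static long count_syscalls(const char *path)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: become a tracee of the parent, then exec the target.
           A successful execve stops the child with SIGTRAP. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execl(path, path, (char *)NULL);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;  /* child exited early: TRACEME or execl failed */

    /* Mark syscall stops as SIGTRAP | 0x80 to tell them from real SIGTRAPs. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }

    long stops = 0;
    for (;;) {
        /* Resume until the next syscall entry or exit; signals are discarded. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) == -1)
            break;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        /* Each syscall gives an entry and an exit stop, except exit_group,
           which never returns; rounding up counts it as well. */
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return (stops + 1) / 2;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```

`strace -c /bin/true` produces a per-system-call summary of the same run.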
⚙️ Internals
📚 References
Tracing and logging
⚲ API:
User-space interface:
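- man 1 dmesg – print or control the kernel ring buffer
- man 2 syslog – system call to read and control the kernel printk() ring buffer
- /proc/kmsg – kernel messages; reading requires privilege and consumes them, so only one process (usually the syslog daemon) should read it
- https://kernelshark.org/ – graphical front end for trace-cmd data
- https://trace-cmd.org/, man 1 trace-cmd – command-line interface for Ftrace doc, the Linux kernel's internal tracer, exposed under /sys/kernel/debug/tracing/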
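Records read from /proc/kmsg start with a `<N>` prefix in which N packs the syslog facility and the printk log level as facility * 8 + level. Kernel messages use facility 0, messages written from user space to /dev/kmsg default to facility 1 (user), and levels 0 to 7 correspond to KERN_EMERG through KERN_DEBUG. The sketch below decodes that prefix. parse_kmsg_line() and struct kmsg_record are hypothetical names; the sketch assumes one record per line and leaves any timestamp in the text untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Log levels 0..7, in the order of KERN_EMERG .. KERN_DEBUG in linux/printk.h. */
static const char *const level_names[8] = {
    "emerg", "alert", "crit", "err", "warning", "notice", "info", "debug",
};

struct kmsg_record {
    int facility;       /* 0 = kernel, 1 = user */
    int level;          /* 0 (emerg) .. 7 (debug) */
    const char *text;   /* points into the input line, after the prefix */
};

/* Parses the "<N>" prefix, where N = facility * 8 + level.
   Returns 0 on success, -1 if the line has no valid prefix. */
static int parse_kmsg_line(const char *line, struct kmsg_record *rec)
{
    char *end;
    if (line[0] != '<' || line[1] < '0' || line[1] > '9')
        return -1;
    long prio = strtol(line + 1, &end, 10);
    if (*end != '>')
        return -1;
    rec->facility = (int)(prio >> 3);
    rec->level = (int)(prio & 7);
    rec->text = end + 1;
    return 0;
}
```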
The most commonly used functions
- linux/printk.h inc
- dump_stack id – prints the current kernel stack trace for debugging purposes
- pr_alert id – logs an alert-level message, indicating a critical event that requires immediate attention
- pr_cont id – continues printing the current message on the same line
- pr_crit id – logs a critical-level message, indicating a severe condition that might require system halt
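- pr_debug id – logs a debug-level message; compiled out unless DEBUG is defined, or enabled at runtime through dynamic debug when CONFIG_DYNAMIC_DEBUG is set, see dynamic doc
- pr_devel id – logs a debug-level message for development only; compiled out unless DEBUG is defined and not controlled by dynamic debug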
- pr_emerg id – logs an emergency-level message, indicating a serious error that could cause system crash
- pr_err id – logs an error-level message, typically indicating an issue that requires attention
- pr_err_ratelimited id – logs an error-level message with rate limiting to prevent excessive logging
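- pr_fmt id – per-file macro applied to the format of every pr_*() call, commonly #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt; it must be defined before linux/printk.h is included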
- pr_info id – logs an informational-level message, providing status updates or diagnostics
- pr_info_ratelimited id – logs an informational-level message with rate limiting
- pr_notice id – logs a notice-level message, typically used for events that aren't errors but should be noted
- pr_warn id – logs a warning-level message, indicating a potential issue that doesn't immediately affect system functionality
- pr_warn_once id – logs a warning message once, preventing repeated warnings for the same event
- pr_warn_ratelimited id – logs a warning-level message with rate limiting
- print_hex_dump id – prints a hexdump of data for debugging purposes
- print_hex_dump_debug id – prints a hexdump at debug level; with CONFIG_DYNAMIC_DEBUG it can be enabled at runtime
- printk id – the primary function for printing kernel messages with varying severity levels
- va_format id – struct holding a format string and a va_list pointer, printed with the %pV specifier to pass a nested format through printk
- ⌨️ hands-on:
- echo "module atkbd +pfl" | sudo tee /sys/kernel/debug/dynamic_debug/control; dmesg -w
- then type on the built-in keyboard
- include/linux/dev_printk.h inc – device-specific logging
- dev_crit id – prints a critical-level message for a device
- dev_dbg id – prints a debug-level message for a device if debugging is enabled
- dev_dbg_ratelimited id – prints debug messages for a device with rate limiting
- dev_err id – prints an error-level message for a device
- dev_err_once id – prints an error message for a device only once
- dev_err_probe id – reports a probe failure and returns the error code; -EPROBE_DEFER is logged at debug level and recorded as the deferral reason instead of as an error
- dev_err_ratelimited id – prints error messages for a device with rate limiting
- dev_fmt id – defines a format string used by device-specific printk macros
- dev_info id – prints an informational-level message for a device
- dev_notice id – prints a notice-level message for a device
- dev_printk id – generic function to print kernel messages with specified log level for a device
- dev_vdbg id – prints verbose debug messages for a device; compiled out unless VERBOSE_DEBUG is defined
- dev_warn id – prints a warning-level message for a device
- dev_warn_once id – prints a warning message for a device only once
- dev_warn_ratelimited id – prints warning messages for a device with rate limiting
- asm-generic/bug.h inc – BUG(), BUG_ON(), WARN() and WARN_ON() macros
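The pr_*() macros above work at compile time: each one pastes a level marker (KERN_SOH, the byte 0x01, followed by a digit, from linux/kern_levels.h) and the file's pr_fmt() prefix in front of the format string, and printk() then strips the marker and uses it as the record's level. The user-space sketch below imitates that expansion so it can be compiled and inspected. demo_printk() and last_line are hypothetical stand-ins for printk() and the log buffer, printk_get_level() is simplified (the kernel version also recognises KERN_CONT), and the fallback level 4 assumes the common CONFIG_MESSAGE_LOGLEVEL_DEFAULT.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Level markers as in linux/kern_levels.h: an ASCII SOH byte plus a digit. */
#define KERN_SOH       "\001"
#define KERN_SOH_ASCII '\001'
#define KERN_ERR       KERN_SOH "3"
#define KERN_WARNING   KERN_SOH "4"
#define KERN_INFO      KERN_SOH "6"

/* Per-file prefix; in the kernel this must precede the #include lines. */
#define pr_fmt(fmt) "mydrv: " fmt

/* Same shape as the kernel macros: level and prefix are pasted in front. */
#define pr_err(fmt, ...)  demo_printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
#define pr_info(fmt, ...) demo_printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

static char last_line[256];  /* hypothetical "log buffer": last record only */

/* Simplified printk_get_level(): the level digit, or 0 if there is none. */
static int printk_get_level(const char *s)
{
    if (s[0] == KERN_SOH_ASCII && s[1] >= '0' && s[1] <= '7')
        return s[1];
    return 0;
}

/* Stand-in for printk(): strips the marker and stores "<level>text". */
static int demo_printk(const char *fmt, ...)
{
    const char *text = fmt;
    int level = '4';  /* assumed default, as with CONFIG_MESSAGE_LOGLEVEL_DEFAULT=4 */
    int kern_level = printk_get_level(text);
    if (kern_level) {
        level = kern_level;
        text += 2;    /* skip KERN_SOH and the level digit */
    }
    int n = snprintf(last_line, sizeof(last_line), "<%c>", level);
    va_list ap;
    va_start(ap, fmt);
    n += vsnprintf(last_line + n, sizeof(last_line) - (size_t)n, text, ap);
    va_end(ap);
    return n;
}
```

Because the level travels inside the format string, the same marker can be written by hand as printk(KERN_INFO "..."); only the pr_*() family applies pr_fmt(), plain printk() does not.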
⚙️ Internals
- printk id
- kernel/printk/printk.c src
- arch/x86/kernel/traps.c src
- lib/dump_stack.c src
- kernel/trace src
- scripts/tracing/draw_functrace.py src
- logging ltp, tracing ltp
- samples/ftrace src
- samples/trace_events src
- samples/trace_printk src
- linux/instrumentation.h inc
📚 References:
- Debugging by printing
- Message logging with printk doc
- Dynamic debug doc
- SystemTap
- man 1 stap – systemtap script translator/driver
- strace
- man 1 strace – trace system calls and signals
- LTTng
- ftrace
- Linux Tracing Technologies doc
- Tracepoint Analysis doc
- Function Tracer doc – function, latency and event tracing
- Using ftrace to hook to functions doc
- Fprobe - Function entry/exit probe doc
- Kprobes doc
- Kprobe-based Event Tracing doc
- Uprobe-tracer: Uprobe-based Event Tracing doc
- Using the Linux Kernel Tracepoints doc
- Subsystem Trace Points: kmem doc
- Subsystem Trace Points: power doc
- NMI Trace Events doc
- In-kernel memory-mapped I/O tracing doc
- Event Histograms doc
- Histogram Design Notes doc
- Boot-time tracing doc
- Hardware Latency Detector doc
- Intel(R) Trace Hub (TH) doc
- Lockless Ring Buffer Design doc
- System Trace Module doc
- CoreSight - ARM Hardware Trace doc
🔧 TODO. 🚀 advanced features
- linux/kmemleak.h inc – memory leak detector
- pr_cont id – continues a previous log message on the same line
- print_hex_dump_bytes id
- print_hex_dump_debug id
- dump_stack id
- CONFIG_PRINTK_CALLER id
- CONFIG_DEBUG_KERNEL id
- CONFIG_DEBUG_INFO id
- https://git.kernel.org/pub/scm/libs/libtrace/
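print_hex_dump_bytes() prints a buffer at debug level, 16 bytes per line; with DUMP_PREFIX_OFFSET each line shows the offset, the bytes in hex and their printable ASCII characters. The sketch below reproduces that layout in user space to show what the output encodes. It is simplified and not byte-for-byte identical to the kernel's hex_dump_to_buffer(), and hex_dump_row(), hex_dump() and HEX_ROW_MAX are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Longest row: 9 (offset) + 48 (hex) + 2 (gap) + 16 (ASCII) + NUL = 76. */
#define HEX_ROW_MAX 80

/* Formats up to 16 bytes as "OFFSET: hex bytes  ascii" into out.
   Non-printable bytes appear as '.' in the ASCII column.
   Returns the number of characters written. */
static int hex_dump_row(char out[HEX_ROW_MAX], size_t offset,
                        const unsigned char *buf, size_t len)
{
    if (len > 16)
        len = 16;
    int n = snprintf(out, HEX_ROW_MAX, "%08zx:", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i < len)
            n += snprintf(out + n, (size_t)(HEX_ROW_MAX - n), " %02x", buf[i]);
        else
            n += snprintf(out + n, (size_t)(HEX_ROW_MAX - n), "   ");
    }
    n += snprintf(out + n, (size_t)(HEX_ROW_MAX - n), "  ");
    for (size_t i = 0; i < len; i++)
        n += snprintf(out + n, (size_t)(HEX_ROW_MAX - n), "%c",
                      isprint(buf[i]) ? buf[i] : '.');
    return n;
}

/* Prints a whole buffer, one prefixed row per 16 bytes. */
static void hex_dump(FILE *f, const char *prefix, const void *data, size_t len)
{
    const unsigned char *p = data;
    char row[HEX_ROW_MAX];
    for (size_t off = 0; off < len; off += 16) {
        size_t chunk = len - off < 16 ? len - off : 16;
        hex_dump_row(row, off, p + off, chunk);
        fprintf(f, "%s%s\n", prefix, row);
    }
}
```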
kgdb and kdb
⚲ Interfaces
⚙️ Internals
📚 References
- Using kgdb, kdb and the kernel debugger internals doc
- kdump
- kdump doc
- man 8 crash – Analyze Linux crash dump data or a live system
eBPF
⚲ API:
📖 References
📚 Further reading
- man 7 bpf-helpers
- Linux Extended BPF (eBPF) Tracing Tools
- bpftrace – High-level tracing language for Linux eBPF
- BCC – Tools for BPF-based Linux IO analysis, networking, monitoring, and more
- man 8 stapbpf
- eBPF Programming for Linux Kernel Tracing
- lockdep - Runtime locking correctness validator doc
Watchdogs
The Linux Kernel/Softdog Driver
dev_watchdog id – network device watchdog
The NMI watchdog lockup detectors:
⚲ API
- /proc/sys/kernel/nmi_watchdog
- /proc/sys/kernel/soft_watchdog
- /proc/sys/kernel/watchdog
- /proc/sys/kernel/watchdog_cpumask
- /proc/sys/kernel/watchdog_thresh
- /proc/sys/kernel/hardlockup_all_cpu_backtrace
- /proc/sys/kernel/hardlockup_panic
- /proc/sys/kernel/softlockup_all_cpu_backtrace
- /proc/sys/kernel/softlockup_panic
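The thresholds of both detectors derive from /proc/sys/kernel/watchdog_thresh, which defaults to 10 seconds; according to the /proc/sys/kernel documentation the soft lockup threshold is twice that value, so 20 seconds by default. The sketch below reads the current setting from user space and derives the soft lockup threshold. read_sysctl_int() and softlockup_thresh() are hypothetical helpers, and the file may be absent on kernels built without the lockup detectors.

```c
#include <errno.h>
#include <stdio.h>

/* Reads one integer from a sysctl file such as
   /proc/sys/kernel/watchdog_thresh. Returns 0 or a negative errno value. */
static int read_sysctl_int(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;
    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -EINVAL;
}

/* Soft lockup threshold in seconds: twice watchdog_thresh. */
static int softlockup_thresh(int watchdog_thresh)
{
    return watchdog_thresh * 2;
}
```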
👁️ Example
- ./lib/test_lockup.c src – test module to generate lockups
Provoke NMI watchdog without panic:
echo 0 > /proc/sys/kernel/hardlockup_panic
insmod test_lockup.ko disable_irq=1 time_secs=13
⚙️ Internals
- kernel/watchdog.c src – detects hard and soft lockups on a system
- kernel/watchdog_perf.c src – detects hard lockups on a system using perf
- kernel/watchdog_buddy.c src – detects hard lockups by having each CPU check on another (buddy) CPU
📚 References
- Documentation for /proc/sys/kernel/ doc
- Softlockup detector and hardlockup detector (aka nmi_watchdog) doc
- kernel parameters:
...
⚙️ Internals
📖 References for debugging
- Ramoops oops/panic logger doc
- pstore block oops/panic logger doc
- Fault injection doc
- Bisecting a bug doc
- Development tools for the kernel doc
- linux/tracepoint.h inc
📚 Further reading
- https://deepwiki.com/torvalds/linux/2.4-kernel-tracing-and-profiling
- https://drgn.readthedocs.io/ – programmable debugger
- https://crash-utility.github.io/
- https://wiki.ubuntu.com/Kernel/Debugging
- Intel VTune Profiler
- Linux Applications Debugging Techniques