Collecting Crash Dumps Using Kdump Utility
When an Oracle Linux system experiences a kernel panic and crashes unexpectedly or hangs, information about the system state and kernel calls leading up to the crash can be useful for troubleshooting. The Kdump feature provides a dumping mechanism for kernel crash information. In Oracle Linux platform images, the OS is either fully configured or partially configured to generate a crash dump, depending on the image release date.
If you use your own Linux image or a marketplace image, you must install and configure Kdump from the command line.
Kdump includes a second kernel that resides in a reserved part of the system memory, so that it can capture information about a stopped kernel. Kdump uses the kexec system call to boot into the second kernel, called a capture kernel, without rebooting the system, and then captures the contents of the stopped kernel’s memory as a crash dump. For more information about the contents of a crash dump, see What's Inside a Linux Kernel Core Dump.
By default, Kdump saves crash dump files to the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory. A new <ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory is created for each crash dump, for example:
[opc@<instance_name> crash] ls -a
127.0.0.1-2025-02-07-15:18:07
127.0.0.1-2025-02-07-16:28:19
The dump directory contains the crash dump file (vmcore), a text file, and a log file, for example:
[opc@<instance_name> <127.0.0.1-2025-02-07-16:28:19>] ls -a
vmcore
vmcore-dmesg.txt
kexec-dmesg.log
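Because each crash produces its own timestamped directory, it can help to locate the newest one quickly. The following is a minimal sketch, not part of Kdump itself: the function name latest_dump_dir is hypothetical, and it assumes the directory names start with an IPv4 address (no hyphens) followed by the timestamp shown above.

```shell
#!/bin/sh
# Print the most recent crash dump directory under a base directory.
# Names follow <ip-address>-<YYYY-MM-DD>-<HH:MM:SS>, so sorting on everything
# after the first hyphen orders the directories chronologically.
# Assumption: the address part contains no hyphen (IPv4).
latest_dump_dir() {
    base=${1:-/var/oled/crash}
    newest=$(ls -1 "$base" 2>/dev/null | LC_ALL=C sort -t- -k2 | tail -n 1)
    if [ -z "$newest" ]; then
        echo "no crash dumps found in $base" >&2
        return 1
    fi
    printf '%s/%s\n' "$base" "$newest"
}
```

For example, after sourcing the function, latest_dump_dir prints the newest directory under /var/oled/crash, and latest_dump_dir /usr/local/cores does the same for a custom location.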
If you have an Oracle Linux instance that is unreachable or unresponsive, you can send a diagnostic interrupt to troubleshoot. A diagnostic interrupt causes the instance's OS to crash and reboot. To use the console or API to send a diagnostic interrupt, you must have Kdump configured to generate a crash dump. For more information, see Sending a diagnostic interrupt.
Setting the Memory Reserved for a Crash Dump
If you are using an Oracle Linux platform image, Kdump is installed and either fully configured or partially configured. You can change the amount of memory that is reserved for the capture kernel to save the crash dump, also called a crashkernel memory reservation. In Oracle Linux 8 and earlier, the default memory reservation is set to adjust automatically:
GRUB_CMDLINE_LINUX="crashkernel=auto"
However, crashkernel=auto is not supported for Oracle Linux 9, so you must set a specific amount of reserved memory using the crashkernel parameter.
To set the memory reservation for a crash dump:
- From a command line, use your administrative privileges and connect to the instance using SSH.
- Edit the /etc/default/grub file to set the reserved memory. For example:
  - Set the memory reserve to 64 MB:
    GRUB_CMDLINE_LINUX="crashkernel=64M"
  - Set the amount of reserved memory as a variable that depends on the total system memory, using the crashkernel=<range1>:<size1>,<range2>:<size2> syntax. For example:
    GRUB_CMDLINE_LINUX="crashkernel=512M-2G:64M,2G-:128M"
  - Define an offset value for the reserved memory. Because the crashkernel reservation occurs early in the boot process, some systems require that you reserve memory with a certain fixed offset. When a fixed offset is specified, the reserved memory begins at that point. For example, to reserve 128 MB of memory, starting at 16 MB:
    GRUB_CMDLINE_LINUX="crashkernel=128M@16M"
- Save the changes and refresh the grub configuration:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
- Reboot the instance to apply changes.
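To illustrate how the <range>:<size> syntax selects a reservation, the following shell sketch evaluates a range specification against a given amount of RAM. The function names to_mib and pick_reservation are hypothetical. The script only mimics the selection rule (range start inclusive, range end exclusive) and is not the kernel's own parser. It ignores any @offset suffix.

```shell
#!/bin/sh
# Convert a size such as 512M or 2G to MiB (K, M, and G suffixes, or bytes).
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *K) echo $(( ${1%K} / 1024 )) ;;
        *)  echo $(( $1 / 1048576 )) ;;
    esac
}

# Print the size that a range specification selects for a given amount of RAM.
# Usage: pick_reservation <ram> <range1>:<size1>,<range2>:<size2>
pick_reservation() {
    ram=$(to_mib "$1")
    old_ifs=$IFS
    IFS=,
    for entry in $2; do
        range=${entry%%:*}          # for example 512M-2G or 2G-
        size=${entry#*:}            # for example 64M
        start=$(to_mib "${range%%-*}")
        end=${range#*-}             # empty when the range is open-ended
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_mib "$end")" ]; }; then
            IFS=$old_ifs
            echo "$size"
            return 0
        fi
    done
    IFS=$old_ifs
    echo "none"
}
```

With the example value from this section, pick_reservation 1G '512M-2G:64M,2G-:128M' selects 64M, while a 16G system falls into the open-ended 2G- range and gets 128M. A system with less than 512 MB of RAM matches no range, so nothing is reserved.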
Changing Crash Dump Location
Using the /etc/kdump.conf file, you can change the location in which the crash dump files are saved, transfer them over SSH, or export them to a network share.
- From a command line, use your administrative privileges and connect to the instance using SSH.
- Edit the /etc/kdump.conf configuration file and remove the # comment character at the beginning of each line that you want to enable.
  - Change the default directory (/var/oled/crash) for the crash dump files, for example:
    path /usr/local/cores
  - Transfer crash dump files over a secure shell connection, for example:
    ssh <user@example.com>
    sshkey /root/.ssh/<mykey>
  - Set the crash dump files to be exported to a compatible network share, for example:
    nfs <example.com>:/<output>
  See the kdump.conf.5 file in the /usr/share/man/man5/kdump.conf.5.gz archive for more information.
- When you have finished, save the changes and restart the kdump service:
  sudo systemctl restart kdump.service
- Reboot the instance.
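Before restarting the service, it can be useful to confirm which lines in the configuration file are actually enabled. The following is a minimal sketch: the function name active_directives is hypothetical, and the filter simply hides comment lines and blank lines.

```shell
#!/bin/sh
# Print the directives that are active (neither commented out nor blank)
# in a kdump configuration file. Defaults to /etc/kdump.conf.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/kdump.conf}"
}
```

Running active_directives with no argument lists the enabled lines of /etc/kdump.conf, so a path, ssh, or nfs line that is still commented out does not appear in the output.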
Changing the Default Failure State
By default, if Kdump fails to send its result to the configured output locations, it reboots the server. This action deletes any data that has been collected for the dump. To prevent this outcome, change the Kdump configuration.
- From a command line, use your administrative privileges and connect to the instance using SSH.
- Edit /etc/kdump.conf to uncomment and change the default value in the file as follows:
  default dump_to_rootfs
  The dump_to_rootfs option tries to save the result to a local directory, which can be useful if a network share is unreachable. You can use shell instead to copy the data manually from the command line.
  Note
  The poweroff, reboot, and halt options are also valid for the default kdump failure state. However, these actions cause the collected data to be lost. See the kdump.conf.5 file in the /usr/share/man/man5/kdump.conf.5.gz archive for more information.
- When you have finished, save the changes and restart the kdump service:
  sudo systemctl restart kdump.service
- Reboot the instance.
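To check which failure action a configuration file currently selects, a short read-only helper can be enough. The following is a sketch under stated assumptions: the function name failure_action is hypothetical, and it reports reboot when no default line is enabled, in line with the default behavior described in this section. It also accepts a failure_action directive, which some kexec-tools releases appear to use in place of default. Verify this against the local kdump.conf man page.

```shell
#!/bin/sh
# Print the failure action selected by a kdump configuration file.
# Assumption: "reboot" applies when no active "default" line is present.
# Assumption: "failure_action" is treated as an equivalent directive name.
failure_action() {
    awk '$1 == "default" || $1 == "failure_action" { action = $2 }
         END { print (action == "" ? "reboot" : action) }' \
        "${1:-/etc/kdump.conf}"
}
```

For example, failure_action prints dump_to_rootfs after the edit shown in this procedure, and reboot for a file in which the line is still commented out.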
Triggering a Crash Dump
Test the Kdump configuration by crashing the kernel, which triggers the service to collect a crash dump. Then, review the crash dump.
- From the command line, connect to the instance using SSH.
- Make sure Kdump is running:
systemctl is-active kdump
- Initiate the crash from the console or command line, as the root user:
  echo 1 > /proc/sys/kernel/sysrq
  echo c > /proc/sysrq-trigger
  This forces the kernel to crash, and the dump files are copied into the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory, by default, or to the location you have selected in the configuration.
- Reboot the instance.
- Review the crash dump files in the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory:
  - The kernel message buffer includes the most essential information about the system crash, and it is always dumped first into the vmcore-dmesg.txt file. This is useful when an attempt to get the full vmcore file fails, for example because of a lack of space on the target location.
  - As the kexec tool boots into the second kernel and captures the contents of the crashed kernel’s memory, it also writes to the kexec-dmesg.log file so you can trace the process. For example, at the end of the file you can see the crash dump save process:
    ...
    Feb 07 16:28:19 linux9 systemd[1]: Starting Kdump Vmcore Save Service...
    Feb 07 16:28:19 linux9 kdump[504]: Kdump is using the default log level(3).
    Feb 07 16:28:19 linux9 kdump[541]: saving to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19/
    Feb 07 16:28:19 linux9 kdump[546]: saving vmcore-dmesg.txt to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19/
    Feb 07 16:28:19 linux9 kdump[552]: saving vmcore-dmesg.txt complete
    Feb 07 16:28:19 linux9 kdump[554]: saving vmcore
    Feb 07 16:28:21 linux9 kdump.sh[555]: Checking for memory holes : [ 0.0 %] / ... Copying data : [100.0 %] \ eta: 0s
    Feb 07 16:28:21 linux9 kdump.sh[555]: The dumpfile is saved to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19//vmcore-incomplete.
    Feb 07 16:28:21 linux9 kdump.sh[555]: makedumpfile Completed.
    Feb 07 16:28:21 linux9 kdump[559]: saving vmcore complete
    Feb 07 16:28:21 linux9 kdump[561]: saving the /run/initramfs/kexec-dmesg.log to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19//
  - The vmcore file contains the crash dump information. To analyze the crash dump, you need a utility that can read the vmcore file format. See Analyzing Crash Dumps for information on using the crash utility.
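The review step can be partly automated by checking that a dump directory holds the three files named in this section. The following is a minimal sketch: the function name check_dump_dir is hypothetical, and it assumes that a leftover vmcore-incomplete file (the temporary name visible in the log excerpt) means the save did not finish.

```shell
#!/bin/sh
# Report whether a crash dump directory holds the expected, non-empty files.
# Returns 0 when everything is present and 1 otherwise.
check_dump_dir() {
    dir=$1
    status=0
    for f in vmcore vmcore-dmesg.txt kexec-dmesg.log; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    # Assumption: a leftover vmcore-incomplete means the save was interrupted.
    if [ -e "$dir/vmcore-incomplete" ]; then
        echo "partial  vmcore-incomplete"
        status=1
    fi
    return "$status"
}
```

For example, check_dump_dir /var/oled/crash/127.0.0.1-2025-02-07-16:28:19 prints one status line per file and returns a nonzero exit code if the dump looks incomplete. In that case, vmcore-dmesg.txt is usually the first file to read.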
Analyzing Crash Dumps
You can use the crash utility to analyze the crash dumps collected by Kdump. In Oracle Linux platform images, crash is installed by default. For other Linux instances, use the command line to install it:
sudo dnf install crash
Configure an Oracle Linux Instance to Use the crash Utility
To analyze a crash dump with crash, complete the following configuration tasks:
- From a command line, use your administrative privileges and connect to the instance using SSH.
- Enable the Oracle Linux debuginfo repository by creating the /etc/yum.repos.d/debuginfo.repo file with root privileges and the following contents, for example:
  [debuginfo]
  name=Oracle Linux 8 Debuginfo Packages
  baseurl=https://oss.oracle.com/ol8/debuginfo/
  gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
  gpgcheck=1
  enabled=1
- Update the system:
sudo dnf update -y
- Install the kernel-uek-debuginfo package:
  sudo dnf install -y kernel-uek-debuginfo-$(uname -r)
  Important
  Run the install command each time the kernel is updated through the package manager. The debuginfo package is only functional when it matches the running kernel, and it's not replaced automatically when a newer kernel version is installed on the system.
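Because the debuginfo package must match the running kernel, a quick check for the matching vmlinux file can catch a mismatch before starting an analysis. The following is a minimal sketch: the function name debug_vmlinux and its optional root-prefix argument are hypothetical, and the path layout is the one used in the crash command later in this document.

```shell
#!/bin/sh
# Print the debuginfo vmlinux path for a kernel release, or fail if the file
# is not installed. Defaults to the running kernel.
# The optional second argument is a root prefix, intended only for testing.
debug_vmlinux() {
    release=${1:-$(uname -r)}
    path="${2:-}/usr/lib/debug/lib/modules/$release/vmlinux"
    if [ -f "$path" ]; then
        echo "$path"
    else
        echo "no debuginfo for kernel $release (expected $path)" >&2
        return 1
    fi
}
```

Running debug_vmlinux with no arguments after a kernel update shows whether the install command needs to be run again.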
Analyze a Crash Dump Using the crash Utility
To analyze a crash dump, provide the vmcore information to crash, and then use the crash shell options to retrieve crash dump information. For detailed information about using the crash utility, type man crash at a command prompt or see the crash documentation.
- From a command line, use your administrative privileges and connect to the instance using SSH.
- Provide the location of the kernel debuginfo module and the location of the core dump as parameters to the crash utility, for example:
  sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
    /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS>/vmcore
  $(uname -r) identifies the running kernel version within the command, <ip-address>-<YYYY-MM-DD>-<HH:MM:SS> represents the directory that gets created for the crash dump files, and the vmcore file contains the crash dump.
  The crash shell starts and displays some system crash information, such as:
        KERNEL: /usr/lib/debug/lib/modules/5.15.0-302.167.6.el9uek.x86_64/vmlinux
      DUMPFILE: /var/oled/crash/127.0.0.1-2025-02-07-16:28:19/vmcore [PARTIAL DUMP]
          CPUS: 2
          DATE: Fri Feb 7 16:28:15 GMT 2025
        UPTIME: 01:09:58
  LOAD AVERAGE: 0.00, 0.02, 0.00
         TASKS: 204
      NODENAME: oci-linux9
       RELEASE: 5.15.0-302.167.6.el9uek.x86_64
       VERSION: #2 SMP Mon Nov 4 23:41:59 PST 2024
       MACHINE: x86_64 (2445 Mhz)
        MEMORY: 16 GB
         PANIC: "Kernel panic - not syncing: NMI: Not continuing"
           PID: 0
       COMMAND: "swapper/0"
          TASK: ffffffffb761a980 (1 of 2) [THREAD_INFO: ffffffffb761a980]
           CPU: 0
         STATE: TASK_RUNNING (PANIC)
  crash>
- At the crash prompt, enter an option to get more information about the crash dump, for example:
  - bt -a displays a stack trace of the active tasks when the kernel panicked, for example:
    crash> bt -a
    PID: 286 TASK: c0b3a000 CPU: 0 COMMAND: "in.rlogind"
    #0 [c0b3be90] crash_save_current_state at c011aed0
    #1 [c0b3bea4] panic at c011367c
    #2 [c0b3bee8] tulip_interrupt at c01bc820
    #3 [c0b3bf08] handle_IRQ_event at c010a551
    #4 [c0b3bf2c] do_8259A_IRQ at c010a319
    #5 [c0b3bf3c] do_IRQ at c010a653
    #6 [c0b3bfbc] ret_from_intr at c0109634
       EAX: 00000000 EBX: c0e68280 ECX: 00000000 EDX: 00000004 EBP: c0b3bfbc
       DS: 0018 ESI: 00000004 ES: 0018 EDI: c0e68284
       CS: 0010 EIP: c012f803 ERR: ffffff09 EFLAGS: 00000246
    #7 [c0b3bfbc] sys_select at c012f803
    #8 [c0b3bfc0] system_call at c0109598
       EAX: 0000008e EBX: 00000004 ECX: bfffc9a0 EDX: 00000000
       DS: 002b ESI: bfffc8a0 ES: 002b EDI: 00000000
       SS: 002b ESP: bfffc82c EBP: bfffd224
       CS: 0023 EIP: 400d032e ERR: 0000008e EFLAGS: 00000246
  - ps -A displays only the active task on each CPU, for example:
    crash> ps -A
       PID  PPID  CPU        TASK         ST  %MEM       VSZ     RSS  COMM
    >    10     2    1  ffff880212969710  IN   0.0         0       0  [migration/1]
    >     0     0    3  ffff884026d43520  RU   0.0         0       0  [swapper]
    >  6582     1    2  ffff880f49c52040  RU   0.0  42202472   33368  oracle
    >  9497     1    0  ffff880549ec2ab0  RU   0.0  42314692  138664  oracle
  - vm displays basic virtual memory information of the current context, for example:
    crash> vm
    PID: 30986 TASK: c0440000 CPU: 0 COMMAND: "bash"
       MM        PGD      RSS   TOTAL_VM
    c303fe20  c4789000    88k     1728k
      VMA      START      END     FLAGS  FILE
    c0d1f540   8048000   80ad000   1875  /bin/bash
    c0d1f400   80ad000   80b3000   1873  /bin/bash
    c0d1f880   80b3000   80ec000     77
    c0d1f0c0  40000000  40012000    875  /lib/ld-2.1.1.so
    c0d1f700  40012000  40013000    873  /lib/ld-2.1.1.so
    c0d1fe00  40013000  40014000     77
    c0d1f580  40014000  40016000     73
    ...
  - files displays information about open files in the current context, for example:
    crash> files
    PID: 720 TASK: c67f2000 CPU: 1 COMMAND: "innd"
    ROOT: / CWD: /var/spool/news/articles
     FD    FILE     DENTRY    INODE    TYPE  PATH
      0  c6b9c740  c7cc45a0  c7c939e0  CHR   /dev/null
      1  c6b9c800  c537bb20  c54d0000  REG   /var/log/news/news
      2  c6df9600  c537b420  c5c36360  REG   /var/log/news/errlog
      3  c74182c0  c6ede260  c6da3d40  PIPE
      4  c6df9720  c696c620  c69398c0  SOCK
      5  c6b9cc20  c68e7000  c6938d80  SOCK
      6  c6b9c920  c7cc45a0  c7c939e0  CHR   /dev/null
      7  c6b9c680  c58fa5c0  c58a1200  REG   /var/lib/news/history
      8  c6df9f00  c6ede760  c6da3200  PIPE
  - kmem -i displays kernel memory usage information, for example:
    crash> kmem -i
                     PAGES       TOTAL      PERCENTAGE
        TOTAL MEM  1974231      7.5 GB         ----
             FREE   208962    816.3 MB   10% of TOTAL MEM
             USED  1765269      6.7 GB   89% of TOTAL MEM
           SHARED   365066      1.4 GB   18% of TOTAL MEM
          BUFFERS   111376    435.1 MB    5% of TOTAL MEM
           CACHED  1276196      4.9 GB   64% of TOTAL MEM
             SLAB   120410    470.4 MB    6% of TOTAL MEM
       TOTAL HUGE   524288        2 GB         ----
        HUGE FREE   524288        2 GB  100% of TOTAL HUGE
       TOTAL SWAP  2498559      9.5 GB         ----
        SWAP USED    81978    320.2 MB    3% of TOTAL SWAP
        SWAP FREE  2416581      9.2 GB   96% of TOTAL SWAP
     COMMIT LIMIT  3485674     13.3 GB         ----
        COMMITTED   850651      3.2 GB   24% of TOTAL LIMIT
- When you have finished analyzing the core dump, exit the shell by typing exit or q.
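The interactive steps above can also be collected into a command file so that the same report is produced for every dump. The following is a sketch under stated assumptions: the function names write_crash_commands and crash_report_command are hypothetical, and the sketch relies on the crash -i option (read commands from a file) and -s option (suppress the startup banner) as described in the crash man page. The second function only prints the command line and does not run crash.

```shell
#!/bin/sh
# Write a crash command file that collects a basic report and then exits.
write_crash_commands() {
    cat > "$1" <<'EOF'
bt -a
ps -A
kmem -i
exit
EOF
}

# Print (without running) the crash invocation that would use that file.
# Arguments: <command-file> <vmlinux-path> <vmcore-path>
crash_report_command() {
    printf 'sudo crash -s -i %s %s %s\n' "$1" "$2" "$3"
}
```

For example, after write_crash_commands /tmp/report.cmds, the line printed by crash_report_command /tmp/report.cmds with the vmlinux and vmcore paths from the earlier step can be run with its output redirected to a file, which keeps a text copy of the stack traces and memory summary.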