Collecting Crash Dumps Using Kdump Utility

When an Oracle Linux system experiences a kernel panic and crashes unexpectedly or hangs, information about the system state and kernel calls leading up to the crash can be useful for troubleshooting. The Kdump feature provides a dumping mechanism for kernel crash information. In Oracle Linux platform images, the OS is either fully configured or partially configured to generate a crash dump, depending on the image release date.

Note

If you use your own Linux image or a Marketplace image, you must install and configure Kdump from the command line.

Kdump includes a second kernel that resides in a reserved part of system memory so that it can capture information about a stopped kernel. Kdump uses the kexec system call to boot into the second kernel, called a capture kernel, without rebooting the system, and then captures the contents of the stopped kernel’s memory as a crash dump. For more information about the contents of a crash dump, see What's Inside a Linux Kernel Core Dump.

For Oracle Linux instances, crash dump information collected by Kdump is copied into the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory, by default. A new <ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory is created for each crash dump, for example:
[opc@<instance_name> crash] ls -a

127.0.0.1-2025-02-07-15:18:07
127.0.0.1-2025-02-07-16:28:19
The dump directory contains the crash dump file (vmcore), a text file, and a log file, for example:
[opc@<instance_name> <127.0.0.1-2025-02-07-16:28:19>] ls -a

vmcore
vmcore-dmesg.txt
kexec-dmesg.log
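
The directory layout described above can be checked with a short script. The following is a minimal sketch, not an Oracle-provided tool: list_dumps is a hypothetical helper name, and the default root is assumed to be the /var/oled/crash location named in this section.

```shell
# Print each dump directory under a crash root and whether it holds a vmcore.
# list_dumps is a hypothetical helper; the default root is the Oracle Linux
# location described in this section.
list_dumps() {
    root="${1:-/var/oled/crash}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # skip when the glob matched nothing
        name=$(basename "$dir")
        if [ -s "$dir/vmcore" ]; then      # -s: file exists and is not empty
            echo "$name vmcore-present"
        else
            echo "$name vmcore-missing"
        fi
    done
}
```

Calling list_dumps with no argument inspects the default location; pass another directory if the dump path has been changed in the Kdump configuration.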
Important

If you have an Oracle Linux instance that is unreachable or unresponsive, you can send a diagnostic interrupt to troubleshoot. A diagnostic interrupt causes the instance's OS to crash and reboot. To use the console or API to send a diagnostic interrupt, you must have Kdump configured to generate a crash dump. For more information, see Sending a diagnostic interrupt.

Setting the Memory Reserved for a Crash Dump

If you are using an Oracle Linux platform image, Kdump is installed and either fully or partially configured. You can change the amount of memory that the kernel reserves for saving the crash dump, also called the crashkernel memory reservation. In Oracle Linux 8 and earlier, the default memory reservation is set to adjust automatically: GRUB_CMDLINE_LINUX="crashkernel=auto". However, crashkernel=auto is not supported in Oracle Linux 9, so you must set a specific amount of reserved memory using the crashkernel parameter.

To set the memory reservation for a crash dump:

  1. From a command line, use your administrative privileges and connect to the instance using SSH.
  2. Edit the /etc/default/grub file to set the reserved memory in one of the following ways:
    • Set a fixed memory reservation of 64 MB:
      GRUB_CMDLINE_LINUX="crashkernel=64M"
    • Set the amount of reserved memory as a variable using the crashkernel=<range1>:<size1>,<range2>:<size2> syntax. For example:
      GRUB_CMDLINE_LINUX="crashkernel=512M-2G:64M,2G-:128M"
    • Define an offset value for the reserved memory. Because the crashkernel reservation occurs early in the boot process, some systems require that you reserve memory with a certain fixed offset. When a fixed offset is specified, the reserved memory begins at that point. For example, to reserve 128 MB of memory, starting at 16 MB:
      GRUB_CMDLINE_LINUX="crashkernel=128M@16M"
  3. Save the changes and refresh the GRUB configuration:
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
  4. Reboot the instance to apply changes.
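
To see which reservation a range-style value selects on a given instance, the range syntax can be evaluated by hand. The sketch below is illustrative only: to_bytes and crashkernel_size are hypothetical helper names, only the K, M, and G suffixes are handled, and the range start is assumed to be inclusive and the range end exclusive, as in the upstream kernel documentation.

```shell
# Convert a size such as 64M, 2G, or 512K to bytes (suffixes K, M, G only).
to_bytes() {
    num=${1%[KMGkmg]}
    case "$1" in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# Print the reservation that a crashkernel value selects for a given RAM size.
# $1: crashkernel value, for example 512M-2G:64M,2G-:128M
# $2: total RAM, for example 16G
crashkernel_size() {
    ram=$(to_bytes "$2")
    case "$1" in
        *:*) ;;                                # range syntax, handled below
        *)   echo "${1%@*}"; return 0 ;;       # fixed size, optional @offset
    esac
    old_ifs=$IFS; IFS=,
    for entry in $1; do                        # split the value on commas
        range=${entry%%:*}; size=${entry#*:}
        start=$(to_bytes "${range%%-*}")
        end=${range#*-}                        # empty means "no upper bound"
        if [ "$ram" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_bytes "$end")" ]; }; then
            IFS=$old_ifs; echo "${size%@*}"; return 0
        fi
    done
    IFS=$old_ifs
    return 1                                   # no range matched: nothing reserved
}
```

For a 16 GB instance, crashkernel_size 512M-2G:64M,2G-:128M 16G prints 128M. After the reboot, grep -o 'crashkernel=[^ ]*' /proc/cmdline shows the value the kernel booted with, and cat /sys/kernel/kexec_crash_size reports the number of bytes actually reserved.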

Changing Crash Dump Location

Using the /etc/kdump.conf file, you can change the location where the crash dump files are saved, transfer them over SSH, or export them to a network share.

  1. From a command line, use your administrative privileges and connect to the instance using SSH.
  2. Edit the /etc/kdump.conf configuration file and remove the # comment character at the beginning of each line that you want to enable.
    • Change the default directory (/var/oled/crash) for the crash dump files, for example:
      path /usr/local/cores
    • Transfer crash dump files over a secure shell connection, for example:
      ssh <user@example.com>
      sshkey /root/.ssh/<mykey>
    • Set the crash dump files to be exported to a compatible network share, for example:
      nfs <example.com>:/<output>

    See the kdump.conf.5 file in the /usr/share/man/man5/kdump.conf.5.gz archive for more information.

  3. When you have finished, save the changes and restart the kdump service.
    sudo systemctl restart kdump.service 
  4. Reboot the instance.
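
Because /etc/kdump.conf is mostly comments, it can help to print only the lines that are in effect before restarting the service. The following is a minimal sketch; active_directives and directive_value are hypothetical helper names, and the file is assumed to use the one-directive-per-line layout shown in the steps above.

```shell
# Print the active (uncommented, non-blank) lines of a kdump configuration file.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/kdump.conf}"
}

# Print the value of the first active directive with the given name.
# $1: configuration file   $2: directive name, for example path or nfs
directive_value() {
    active_directives "$1" |
        awk -v name="$2" '$1 == name { $1 = ""; sub(/^ /, ""); print; exit }'
}
```

For example, directive_value /etc/kdump.conf path prints the configured dump directory, and prints nothing if no path line is enabled.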

Changing the Default Failure State

By default, if Kdump fails to send its result to the configured output locations, it reboots the server. This action deletes any data that has been collected for the dump. To prevent this outcome, change the Kdump configuration.

  1. From a command line, use your administrative privileges and connect to the instance using SSH.
  2. Edit /etc/kdump.conf to uncomment and change the default value in the file as follows:
    default dump_to_rootfs

    The dump_to_rootfs option tries to save the result to a local directory, which can be useful if a network share is unreachable. Alternatively, set the value to shell to open a shell from which you can copy the data manually.

    Note

    The poweroff, reboot, and halt options are also valid for the default Kdump failure state. However, these actions discard the collected data. See the kdump.conf.5 file in the /usr/share/man/man5/kdump.conf.5.gz archive for more information.

  3. When you have finished, save the changes and restart the kdump service.
    sudo systemctl restart kdump.service
  4. Reboot the instance.
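
The edit in step 2 can also be scripted. The sketch below is one possible approach, not the documented procedure: set_failure_action is a hypothetical helper name, it only touches active default lines (commented lines are left alone), and the list of accepted values is assumed from the kdump.conf man page.

```shell
# Rewrite the "default" directive in a kdump.conf-style file.
# $1: file to edit
# $2: action (reboot, halt, poweroff, shell, or dump_to_rootfs)
set_failure_action() {
    case "$2" in
        reboot|halt|poweroff|shell|dump_to_rootfs) ;;
        *) echo "unknown action: $2" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Replace the first active "default" line, drop any later ones, and
    # append the directive if the file had none.
    awk -v action="$2" '
        /^[ \t]*default[ \t]/ { if (!done) print "default " action; done = 1; next }
        { print }
        END { if (!done) print "default " action }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"       # cat keeps the file's owner and mode
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Run it as root against a backed-up copy of /etc/kdump.conf, for example set_failure_action /etc/kdump.conf dump_to_rootfs, and then continue with the restart and reboot steps above.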

Triggering a Crash Dump

Test the Kdump configuration by crashing the kernel, which triggers the service to collect a crash dump. Then, review the crash dump.

  1. From the command line, connect to the instance using SSH.
  2. Make sure Kdump is running:
    systemctl is-active kdump
  3. Initiate the crash from the console or command line:
    echo 1 | sudo tee /proc/sys/kernel/sysrq
    echo c | sudo tee /proc/sysrq-trigger
    This forces the kernel to crash and the dump files are copied into the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory, by default, or to the location you have selected in the configuration.
  4. Reboot the instance.
  5. Review the crash dump files in the /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS> directory:
    • The kernel message buffer includes the most essential information about the system crash, and it is always dumped first into the vmcore-dmesg.txt file. This is useful when an attempt to get the full vmcore file fails, for example because of a lack of space at the target location.
    • As the kexec tool boots into the second kernel and captures the contents of the crashed kernel’s memory, it also writes to the kexec-dmesg.log file so you can trace the process. For example, at the end of the file you can see the crash dump save process:
      ...
      Feb 07 16:28:19 linux9 systemd[1]: Starting Kdump Vmcore Save Service...
      Feb 07 16:28:19 linux9 kdump[504]: Kdump is using the default log level(3).
      Feb 07 16:28:19 linux9 kdump[541]: saving to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19/
      Feb 07 16:28:19 linux9 kdump[546]: saving vmcore-dmesg.txt to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19/
      Feb 07 16:28:19 linux9 kdump[552]: saving vmcore-dmesg.txt complete
      Feb 07 16:28:19 linux9 kdump[554]: saving vmcore
      Feb 07 16:28:21 linux9 kdump.sh[555]: 
      Checking for memory holes                         : [  0.0 %] /                  
      ...
      Copying data                                      : [100.0 %] \           eta: 0s
      Feb 07 16:28:21 linux9 kdump.sh[555]: The dumpfile is saved to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19//vmcore-incomplete.
      Feb 07 16:28:21 linux9 kdump.sh[555]: makedumpfile Completed.
      Feb 07 16:28:21 linux9 kdump[559]: saving vmcore complete
      Feb 07 16:28:21 linux9 kdump[561]: saving the /run/initramfs/kexec-dmesg.log to /kdumproot/var/oled/crash/127.0.0.1-2025-02-07-16:28:19//
    • The vmcore file contains the crash dump information. To analyze the crash dump, you need a utility that can read the vmcore file format. See Analyzing Crash Dumps for information on using the crash utility.
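
The review in step 5 can be partly automated. The following sketch checks a dump directory for a non-empty vmcore and for the completion line that appears in the sample log above. vmcore_saved and dump_status are hypothetical helper names, and the marker string is taken from that sample log, so other Kdump versions may word it differently.

```shell
# Return success when a kexec-dmesg.log records a completed vmcore save.
# The marker text is copied from the sample log in this section.
vmcore_saved() {
    grep -q 'saving vmcore complete' "$1"
}

# Summarize the state of one dump directory.
# $1: dump directory, for example /var/oled/crash/<ip-address>-<timestamp>
dump_status() {
    dir=$1
    if [ ! -s "$dir/vmcore" ]; then
        echo "incomplete: no vmcore in $dir"
        return 1
    fi
    if [ -f "$dir/kexec-dmesg.log" ] && vmcore_saved "$dir/kexec-dmesg.log"; then
        echo "complete: vmcore saved"
    else
        echo "unverified: vmcore present, no completion marker in log"
    fi
}
```

If dump_status reports an incomplete dump, the vmcore-dmesg.txt file in the same directory is usually the next place to look, because it is written before the vmcore.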

Analyzing Crash Dumps

You can use the crash utility to analyze the crash dumps collected by Kdump. In Oracle Linux platform images, crash is installed by default. For other Linux instances, use the command line to install it: sudo dnf install crash.

Configure an Oracle Linux Instance to Use the crash Utility

To analyze a crash dump with crash, complete the following configuration tasks:

  1. From a command line, use your administrative privileges and connect to the instance using SSH.
  2. Enable the Oracle Linux debuginfo repository by creating the /etc/yum.repos.d/debuginfo.repo file with root privileges and the following contents, for example:
    [debuginfo]
    name=Oracle Linux 8 Debuginfo Packages
    baseurl=https://oss.oracle.com/ol8/debuginfo/
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
    gpgcheck=1
    enabled=1
  3. Update the system:
    sudo dnf update -y
  4. Install the kernel-uek-debuginfo package:
    sudo dnf install -y kernel-uek-debuginfo-$(uname -r)
    Important

    Run the install command each time the kernel is updated through the package manager. The debuginfo package is only functional when it matches the running kernel, and it's not replaced automatically when a newer kernel version is installed on the system.
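
Because the debuginfo package must match the running kernel, a quick check after each kernel update can save a failed crash session. This is a minimal sketch: vmlinux_path and debuginfo_ready are hypothetical helper names, and the /usr/lib/debug/lib/modules/<release>/vmlinux layout is assumed to be where the kernel-uek-debuginfo package installs the file.

```shell
# Print the path where the debuginfo vmlinux for a kernel release is expected.
# $1: kernel release (optional; defaults to the running kernel)
vmlinux_path() {
    release="${1:-$(uname -r)}"
    echo "/usr/lib/debug/lib/modules/$release/vmlinux"
}

# Report whether the debuginfo vmlinux for a kernel release is installed.
debuginfo_ready() {
    path=$(vmlinux_path "$1")
    if [ -f "$path" ]; then
        echo "found $path"
    else
        echo "missing $path"
        return 1
    fi
}
```

If debuginfo_ready reports a missing file for the running kernel, rerun the install command from step 4.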

Analyze a Crash Dump Using the crash Utility

To analyze a crash dump, provide the vmcore information to crash, and then use the crash shell options to retrieve crash dump information. For detailed information about using the crash utility, type man crash at a command prompt or see the crash documentation.

  1. From a command line, use your administrative privileges and connect to the instance using SSH.
  2. Provide the location of the kernel debuginfo module and the location of the core dump as parameters to the crash utility, for example:
    sudo crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
     /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS>/vmcore

    $(uname -r) identifies the running kernel version within the command, <ip-address>-<YYYY-MM-DD>-<HH:MM:SS> represents the directory that gets created for the crash dump files, and the vmcore file contains the crash dump.

    The crash shell starts and displays some system crash info, such as:

    KERNEL: /usr/lib/debug/lib/modules/5.15.0-302.167.6.el9uek.x86_64/vmlinux
        DUMPFILE: /var/oled/crash/127.0.0.1-2025-02-07-16:28:19/vmcore  [PARTIAL DUMP]
            CPUS: 2
            DATE: Fri Feb  7 16:28:15 GMT 2025
          UPTIME: 01:09:58
    LOAD AVERAGE: 0.00, 0.02, 0.00
           TASKS: 204
        NODENAME: oci-linux9
         RELEASE: 5.15.0-302.167.6.el9uek.x86_64
         VERSION: #2 SMP Mon Nov 4 23:41:59 PST 2024
         MACHINE: x86_64  (2445 Mhz)
          MEMORY: 16 GB
           PANIC: "Kernel panic - not syncing: NMI: Not continuing"
             PID: 0
         COMMAND: "swapper/0"
            TASK: ffffffffb761a980  (1 of 2)  [THREAD_INFO: ffffffffb761a980]
             CPU: 0
           STATE: TASK_RUNNING (PANIC)
    
    crash>
  3. At the crash prompt, enter an option to get more information about the crash dump, for example:
    • bt -a displays a stack trace of the active task(s) when the kernel panicked, for example:
      crash> bt -a
      
      PID: 286    TASK: c0b3a000  CPU: 0   COMMAND: "in.rlogind"
          #0 [c0b3be90] crash_save_current_state at c011aed0
          #1 [c0b3bea4] panic at c011367c
          #2 [c0b3bee8] tulip_interrupt at c01bc820
          #3 [c0b3bf08] handle_IRQ_event at c010a551
          #4 [c0b3bf2c] do_8259A_IRQ at c010a319
          #5 [c0b3bf3c] do_IRQ at c010a653
          #6 [c0b3bfbc] ret_from_intr at c0109634
             EAX: 00000000  EBX: c0e68280  ECX: 00000000  EDX: 00000004  EBP: c0b3bfbc
             DS:  0018      ESI: 00000004  ES:  0018      EDI: c0e68284 
             CS:  0010      EIP: c012f803  ERR: ffffff09  EFLAGS: 00000246 
          #7 [c0b3bfbc] sys_select at c012f803
          #8 [c0b3bfc0] system_call at c0109598
             EAX: 0000008e  EBX: 00000004  ECX: bfffc9a0  EDX: 00000000 
             DS:  002b      ESI: bfffc8a0  ES:  002b      EDI: 00000000 
             SS:  002b      ESP: bfffc82c  EBP: bfffd224 
             CS:  0023      EIP: 400d032e  ERR: 0000008e  EFLAGS: 00000246  
    • ps -A displays only the active task on each CPU, for example:
      crash> ps -A
      
      PID    PPID  CPU       TASK        ST  %MEM    VSZ    RSS  COMM
      >    10      2   1  ffff880212969710  IN   0.0      0      0   [migration/1]
      >     0      0   3  ffff884026d43520  RU   0.0      0      0   [swapper]
      >  6582      1   2  ffff880f49c52040  RU   0.0 42202472  33368  oracle
      >  9497      1   0  ffff880549ec2ab0  RU   0.0 42314692 138664  oracle
    • vm displays basic virtual memory information of the current context, for example:
      crash> vm
      
      PID: 30986  TASK: c0440000  CPU: 0   COMMAND: "bash"
             MM       PGD       RSS    TOTAL_VM
          c303fe20  c4789000    88k      1728k
            VMA      START      END     FLAGS  FILE
          c0d1f540   8048000   80ad000  1875   /bin/bash
          c0d1f400   80ad000   80b3000  1873   /bin/bash
          c0d1f880   80b3000   80ec000    77
          c0d1f0c0  40000000  40012000   875   /lib/ld-2.1.1.so
          c0d1f700  40012000  40013000   873   /lib/ld-2.1.1.so
          c0d1fe00  40013000  40014000    77
          c0d1f580  40014000  40016000    73
      ...
    • files displays information about open files in the current context, for example:
      crash> files
      
      PID: 720    TASK: c67f2000  CPU: 1   COMMAND: "innd"
          ROOT: /    CWD: /var/spool/news/articles
           FD    FILE     DENTRY    INODE    TYPE  PATH
            0  c6b9c740  c7cc45a0  c7c939e0  CHR   /dev/null
            1  c6b9c800  c537bb20  c54d0000  REG   /var/log/news/news
            2  c6df9600  c537b420  c5c36360  REG   /var/log/news/errlog
            3  c74182c0  c6ede260  c6da3d40  PIPE
            4  c6df9720  c696c620  c69398c0  SOCK
            5  c6b9cc20  c68e7000  c6938d80  SOCK
            6  c6b9c920  c7cc45a0  c7c939e0  CHR   /dev/null
            7  c6b9c680  c58fa5c0  c58a1200  REG   /var/lib/news/history
            8  c6df9f00  c6ede760  c6da3200  PIPE
    • kmem -i displays kernel memory usage information, for example:
      crash> kmem -i
      
                           PAGES        TOTAL      PERCENTAGE
              TOTAL MEM  1974231       7.5 GB         ----
                   FREE   208962     816.3 MB   10% of TOTAL MEM
                   USED  1765269       6.7 GB   89% of TOTAL MEM
                 SHARED   365066       1.4 GB   18% of TOTAL MEM
                BUFFERS   111376     435.1 MB    5% of TOTAL MEM
                 CACHED  1276196       4.9 GB   64% of TOTAL MEM
                   SLAB   120410     470.4 MB    6% of TOTAL MEM
          
             TOTAL HUGE   524288         2 GB         ----
              HUGE FREE   524288         2 GB  100% of TOTAL HUGE
          
             TOTAL SWAP  2498559       9.5 GB         ----
              SWAP USED    81978     320.2 MB    3% of TOTAL SWAP
              SWAP FREE  2416581       9.2 GB   96% of TOTAL SWAP
          
           COMMIT LIMIT  3485674      13.3 GB         ----
              COMMITTED   850651       3.2 GB   24% of TOTAL LIMIT
  4. When you have finished analyzing the core dump, exit the shell by typing exit or q.
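
The same commands can be collected into a file and run without an interactive session, which is convenient when attaching output to a support request. The sketch below assumes the -i option of crash, which reads commands from a file before accepting input. write_crash_commands is a hypothetical helper name, and the file and report names in the usage comment are placeholders.

```shell
# Write a crash command file with some of the commands shown above, ending
# with exit so that crash quits after the file has been processed.
# $1: path of the command file to create
write_crash_commands() {
    printf '%s\n' 'bt -a' 'ps -A' 'kmem -i' 'exit' > "$1"
}

# Usage sketch (not run here; needs the debuginfo vmlinux and a real vmcore):
#   write_crash_commands /tmp/crash.cmds
#   sudo crash -i /tmp/crash.cmds \
#       /usr/lib/debug/lib/modules/$(uname -r)/vmlinux \
#       /var/oled/crash/<ip-address>-<YYYY-MM-DD>-<HH:MM:SS>/vmcore > report.txt
```

Add or remove commands in the printf list as needed; any command that works at the crash prompt can be placed in the file.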