Building and Integrating Virtual Private Networks with Openswan (2006)

Chapter 12: Debugging and Troubleshooting

Userland Issues: Assertion Failed or Segmentation Faults

When you hit a serious bug, Pluto, Openswan's IKE daemon, terminates with either a segmentation fault or an 'assertion failed' error. When this happens, the plutorun script automatically restarts Pluto. All connections are then automatically reloaded or restarted, which can trigger the same crash again and result in a repeating crash-and-restart loop.

A segmentation fault always indicates a problem that needs to be addressed by the Openswan development team. The code is simply wrong and needs to be fixed.

An 'assertion failed' error means that Pluto has ended up in a state it should never be in. It also dies, although in a somewhat more controlled way, with an error message that usually pinpoints the single line of code that checks that particular state. The decision to have the daemon die on some issues that can seem fairly innocent is controversial, but it is vital to the security of Openswan.

Assertion failures normally happen while handling IKE packets, when the internal state of the loaded and active connections has somehow become corrupted. If Openswan merely logged a warning and tried to carry on regardless, the result could be a serious security breach, such as a flawed encryption state or a failure to process an IKE exchange correctly. An assertion failure can also be caused by a misbehaving remote IPsec endpoint. If we create a workaround for such an endpoint, we want the workaround to be documented and a message to be logged whenever it is used. Ideally, there should also be a compile-time define or a connection parameter to enable or disable the workaround.

If you experience segmentation faults or assertion failures, the first thing to do is upgrade to the latest version of Openswan. If that does not help, report the problem to the Openswan community, preferably through the bug tracker at http://bugs.Openswan.org or the developer mailing list (dev@openswan.org). If you are somewhat familiar with the GNU debugger (gdb), it would also help if you could provide information about Pluto's internal state at the moment of failure by running gdb on the Pluto core file. To enable core files, add the following options to the config setup section of ipsec.conf:

config setup
    dumpdir=/tmp
    plutorestartoncrash=no

With these settings, Pluto dumps a core file when it crashes (on an assertion failure or a segmentation fault) and is not restarted afterwards. Depending on a kernel tuning option (kernel.core_uses_pid, usually set in /etc/sysctl.conf), core files may have the process ID appended to their filename, for instance core.31337, so that old core files are not overwritten if Pluto does restart.
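Before starting gdb you need the right core file. The following is a minimal sketch, assuming the dumpdir is /tmp as in the example configuration and that core files are named core or core.<pid>; the helper names newest_core and core_uses_pid are hypothetical, not part of Openswan.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not Openswan tools.
# Assumes cores land in Pluto's dumpdir (default /tmp here) and are named
# "core" or "core.<pid>" (the latter when kernel.core_uses_pid=1).

# newest_core [DIR]: print the most recently modified core file in DIR.
newest_core() {
    nc_dir=${1:-/tmp}
    # -t sorts the operands by modification time, newest first;
    # -d lists matching directories themselves instead of their contents.
    # Errors for names that do not exist are discarded.
    ls -td "$nc_dir"/core "$nc_dir"/core.* 2>/dev/null | head -n 1
}

# core_uses_pid: print 1 if the kernel appends the PID to core file names,
# 0 if it does not, or "unknown" if the setting cannot be read.
core_uses_pid() {
    if [ -r /proc/sys/kernel/core_uses_pid ]; then
        cat /proc/sys/kernel/core_uses_pid
    else
        echo "unknown"
    fi
}
```

For example, newest_core /tmp prints a path such as /tmp/core.31337 that you can pass straight to gdb. On newer systems, kernel.core_pattern may send core files elsewhere (for instance to a crash-collection service), in which case they will not appear in the dumpdir.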

Change to Openswan's source directory, for example /usr/src/openswan-2, and start gdb on the Pluto binary and the core file. A session looks roughly like this:

# gdb programs/pluto/pluto /tmp/core.31337
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `/usr/local/libexec/ipsec/pluto --nofork --secretsfile /etc/ipsec.secrets -- ipse'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libgmp.so.3...done.
Loaded symbols for /usr/lib/libgmp.so.3
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x08076a59 in informational (md=0x8111740) at demux.c:1047
1047        if(st->st_connection->extra_debugging & IMPAIR_DIE_ONINFO) {
(gdb) bt
#0  0x08076a59 in informational (md=0x8111740) at demux.c:1047
#1  0x08078dd0 in process_packet (mdp=0x80e4a94) at demux.c:2247
#2  0x08076cfd in comm_handle (ifp=0x8104340) at demux.c:1167
#3  0x0805d72b in call_server () at server.c:1124
#4  0x0805a616 in main (argc=8, argv=0xbffff154) at plutomain.c:747
(gdb) bt full
#0  0x08076a59 in informational (md=0x8111740) at demux.c:1047
        disp_len = 135337912
        disp_buf = '\0' <repeats 12 times>, "\034\000\000\000\001", '\0' <repeats 23 times>, ...
        n_pbs = (pb_stream * const) 0x8111824
        n = (struct isakmp_notification * const) 0x8111844
        st = (struct state *) 0x0
        n_pld = (struct payload_digest * const) 0x8111824
. . .

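In this case, the backtrace shows that Pluto crashed in the function informational() in demux.c, at line 1047. The local variable st is a NULL pointer (0x0), and line 1047 dereferences it through st->st_connection, which causes the segmentation fault.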
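To attach the same information to a bug report without an interactive session, gdb can run the backtrace commands in batch mode. This is a minimal sketch: the function name pluto_backtrace is hypothetical, and it assumes a gdb that supports the -batch and -ex options (an old release such as the 5.3 build shown above may need -x with a command file instead).

```shell
#!/bin/sh
# Sketch only: pluto_backtrace is a hypothetical helper.
# Assumes gdb supports -batch and -ex.

# pluto_backtrace BINARY CORE: print "bt" and "bt full" for a Pluto core.
pluto_backtrace() {
    pb_bin=$1     # e.g. programs/pluto/pluto in the source tree
    pb_core=$2    # e.g. /tmp/core.31337
    if [ ! -f "$pb_bin" ]; then
        echo "pluto binary not found: $pb_bin" >&2
        return 1
    fi
    if [ ! -r "$pb_core" ]; then
        echo "core file not readable: $pb_core" >&2
        return 1
    fi
    # "bt" lists the call chain; "bt full" adds each frame's local variables,
    # which is where a NULL pointer such as st = 0x0 shows up.
    gdb -batch -ex "bt" -ex "bt full" "$pb_bin" "$pb_core"
}
```

Run it from the source directory, for example pluto_backtrace programs/pluto/pluto /tmp/core.31337 > pluto-backtrace.txt 2>&1, and attach the resulting file to your report. The binary must be the exact build that produced the core file, or the symbols and line numbers will not match.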

If possible, also describe the brand, model, and firmware version of the remote endpoint, and any relevant information from ipsec.conf. With this information, we should be able to pinpoint the problem and resolve things on our end, and perhaps contact the vendor of the remote endpoint about their bug. Please mail this information to dev@openswan.org.


Kernel Issues: Crashes and Oopses

If the problem lies in the kernel subsystem (KLIPS or NETKEY), it is much harder to debug and to get information about. Your system will probably hang, reboot, or be left in an unknown and potentially dangerous state. The dmesg command might still work and give you a hint as to what happened.

When reporting these bugs, it is vitally important that we know which kernel stack you are using. The output of ipsec --version tells us exactly what we need to know about the versions. Be aware that if you are using NETKEY and are not running the latest kernel, we will most likely ask you to try the latest kernel version first. Because the 2.6 kernel is such a rapidly moving target, we may even ask you to try the latest testing version, for example 2.6.12-rc3.

If possible, send us the kernel 'oops' message. If it shows only numbers and no function names, run it through the ksymoops utility. For this you might need to specify the exact location of your System.map file. Distributions commonly install it in /boot/; if you compiled the kernel yourself, it is in the root of your Linux kernel source tree. Also make sure you are using the latest kernel utilities (kernel-utils, modutils, or module-init-tools, depending on your kernel and distribution).
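Finding the System.map that matches the crashed kernel is the step that most often goes wrong. The sketch below checks a few likely locations in order; the function name, the search order, the default source path /usr/src/linux, and the BOOT_DIR override are all assumptions to adapt to your system.

```shell
#!/bin/sh
# Sketch only: find_system_map, its search order, the default source tree
# /usr/src/linux and the BOOT_DIR override are assumptions, not standard.

# find_system_map [RELEASE] [SRCDIR]: print the first readable System.map.
find_system_map() {
    fsm_release=${1:-$(uname -r)}
    fsm_srcdir=${2:-/usr/src/linux}
    fsm_boot=${BOOT_DIR:-/boot}
    # 1. distribution-style versioned file in /boot
    # 2. an unversioned /boot/System.map (may belong to another kernel!)
    # 3. the root of a kernel source tree you compiled yourself
    for fsm_map in "$fsm_boot/System.map-$fsm_release" \
                   "$fsm_boot/System.map" \
                   "$fsm_srcdir/System.map"; do
        if [ -r "$fsm_map" ]; then
            echo "$fsm_map"
            return 0
        fi
    done
    echo "no System.map found for $fsm_release" >&2
    return 1
}
```

With the oops text saved to a file, a typical invocation is ksymoops -m "$(find_system_map)" < oops.txt, since ksymoops reads the oops from standard input and -m names the System.map file. Run it on the same kernel that crashed, because a System.map from a different build gives misleading symbol names.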

You can also try to eliminate some of your kernel issues by compiling the kernel subsystem (KLIPS or NETKEY) directly into the kernel, instead of as a module.

As of Linux 2.6, module unloading is not encouraged, and the kernel developers suggest that module unloading should not be attempted by end users. Module unloading on 2.4 kernels should work without any problem. If you are experiencing a 99% CPU load upon module unloading, you need to upgrade your module tools.

A lot of IPsec-related fixes went into releases 2.6.8.1 and 2.6.11, so do not run anything older than 2.6.11. At the time of writing, some vendor kernels based on 2.6.13, as well as the official 2.6.14 kernel, have issues that have not yet been completely resolved in the latest version of Openswan, 2.4.0. This is due to massive code changes in the networking stack of those Linux kernels.
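The version advice above can be turned into a quick check. This is a minimal sketch with hypothetical helper names; it compares only the leading numeric part of the release string, so suffixes such as -rc3 or vendor build tags are ignored. Because the 2.6.11 minimum is NETKEY-oriented advice, a 2.4 kernel running KLIPS is also reported as too old by this check, so read its output in that context.

```shell
#!/bin/sh
# Sketch only: version_ge and check_kernel are hypothetical helpers.

# version_ge A B: exit status 0 if dotted version A >= B, 1 otherwise.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        # Drop everything from the first character that is not a digit or dot.
        sub(/[^0-9.].*/, "", a); sub(/[^0-9.].*/, "", b)
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing components count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # all components equal
    }'
}

# check_kernel [RELEASE]: defaults to the running kernel (uname -r).
check_kernel() {
    ck_rel=${1:-$(uname -r)}
    if ! version_ge "$ck_rel" 2.6.11; then
        echo "too old: $ck_rel (use 2.6.11 or later)"
        return 1
    fi
    case $ck_rel in
        2.6.13|2.6.13[.-]*|2.6.14|2.6.14[.-]*)
            # The text reports unresolved issues with these and Openswan 2.4.0.
            echo "warning: $ck_rel may have unresolved issues with Openswan 2.4.0" ;;
        *)
            echo "ok: $ck_rel" ;;
    esac
}
```

Running check_kernel with no argument tests the kernel you are on; passing a release string, such as check_kernel 2.6.12-rc3, lets you vet a kernel before installing it.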

Memory Issues

If you are experiencing memory problems, which are more likely on small embedded devices, enable the -DLEAK_DETECTIVE option in Pluto's Makefile and recompile Pluto. Restart Openswan and let it run long enough for the memory problem to appear. Before the entire system runs out of memory, shut down the IPsec subsystem gracefully, using the proper initscript. Upon shutdown, Pluto logs a large amount of memory debugging information that will help us find out which parts of the daemon are actually leaking memory. Report these issues to the developer mailing list at dev@openswan.org. If you are an experienced software developer, you can also help us by running Pluto under valgrind.
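To decide when to shut IPsec down, it helps to watch Pluto's memory use while the leak develops. This is a minimal sketch, assuming a Linux /proc filesystem, where /proc/<pid>/status contains a VmRSS line in kB; the helper names are hypothetical, and you supply Pluto's PID yourself, for example with pgrep -o -x pluto.

```shell
#!/bin/sh
# Sketch only: vmrss_kb and watch_pluto are hypothetical helpers.
# Assumes Linux /proc/<pid>/status with a "VmRSS: <n> kB" line.

# vmrss_kb STATUSFILE: print the resident set size in kB; fail if absent.
vmrss_kb() {
    awk '$1 == "VmRSS:" { print $2; found = 1 } END { exit !found }' "$1"
}

# watch_pluto PID [INTERVAL]: log a timestamped sample every INTERVAL seconds
# (default 60) until the process exits. Run as root to read any process.
watch_pluto() {
    wp_pid=$1
    wp_interval=${2:-60}
    while kill -0 "$wp_pid" 2>/dev/null; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$wp_pid VmRSS=$(vmrss_kb /proc/"$wp_pid"/status) kB"
        sleep "$wp_interval"
    done
    echo "pid $wp_pid has exited" >&2
}
```

For example, watch_pluto "$(pgrep -o -x pluto)" 300 >> /tmp/pluto-mem.log samples every five minutes. When VmRSS keeps rising and free memory runs low, stop IPsec through the initscript so that the leak-detective report is written.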

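Memory issues in the kernel stacks are a bit more difficult to trace. lsmod shows the size of each loaded module, but that figure covers only the module's own code and static data. Most memory that a module allocates at run time (through kmalloc and dedicated caches) is accounted for in the kernel's slab caches, which you can monitor through /proc/slabinfo or the slabtop utility. If usage there keeps growing while the tunnels are running, that signifies a kernel-level memory leak.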
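Spotting growth by eye in slab statistics is tedious, so comparing two snapshots helps. This is a minimal sketch, assuming the /proc/slabinfo layout in which fields 2 to 4 of each data line are active_objs, num_objs and objsize; memory per cache is estimated as num_objs × objsize, and the helper name slab_growth is hypothetical. Reading /proc/slabinfo usually requires root.

```shell
#!/bin/sh
# Sketch only: slab_growth is a hypothetical helper.
# Assumes slabinfo data lines of the form "name active_objs num_objs objsize ...",
# with a "slabinfo - version" line and "#" comment lines as headers.

# slab_growth OLD NEW: print "<cache> <bytes grown>" for caches whose
# estimated size (num_objs * objsize) increased, largest growth first.
slab_growth() {
    awk '$1 == "slabinfo" || $1 ~ /^#/ || NF < 4 { next }
         FILENAME == ARGV[1] { old[$1] = $3 * $4; next }
         {
             grew = $3 * $4 - old[$1]      # caches new in NEW count in full
             if (grew > 0) printf "%s %.0f\n", $1, grew
         }' "$1" "$2" | sort -k2,2nr
}
```

Take one snapshot with cat /proc/slabinfo > /tmp/slab.before, let the tunnels run and rekey for a while, take a second snapshot into /tmp/slab.after, and run slab_growth /tmp/slab.before /tmp/slab.after | head. A cache that grows in every round of this comparison is a good candidate to mention in your report.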
