Absolute BSD - The Ultimate Guide To FreeBSD (2002).pdf

Apparent Gdb Weirdness

You could try digging a little further into the data to see what's going on. The second variable in our panic (vp->v_rdev->si_hlist) actually goes on a bit; let's look a little deeper into it:

...............................................................................................

(kgdb) p vp->v_rdev

There is no member named v_rdev.

(kgdb)

...............................................................................................

Normally, this would work, and if you've used gdb before, you might think that gdb is wrong, but in this case it's correct. Here, v_rdev is a convenience macro, though only people who have read the kernel source code would know that. The preprocessor replaces macros before the code is compiled, so gdb never sees the name. Actually, v_rdev expands to v_un.vu_spec.vu_specinfo. You couldn't be expected to know that, but don't be surprised if a developer asks you to type something different from what actually appears in the trace.

To view vp->v_rdev, enter this command:

...............................................................................................

(kgdb) p vp->v_un.vu_spec.vu_specinfo

$5 = (struct specinfo *) 0x0

(kgdb)

...............................................................................................

If you've gotten this far, you should be able to recognize the null pointer here, but that's about it.
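The macro behavior described above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the structures below are cut-down stand-ins (the real kernel definitions have many more members), and the si_placeholder member and the vnode_has_rdev() helper are hypothetical names invented for the example. Only the v_rdev expansion itself comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. Only the nesting
 * matters here; the real definitions are much larger. */
struct specinfo {
    int si_placeholder;   /* hypothetical member, for illustration only */
};

struct vnode {
    union {
        struct {
            struct specinfo *vu_specinfo;
        } vu_spec;
    } v_un;
};

/* The convenience macro: source code can say vp->v_rdev, but the
 * preprocessor rewrites it before the compiler ever sees it. */
#define v_rdev v_un.vu_spec.vu_specinfo

/* Hypothetical helper: returns 1 if the vnode has device info
 * attached, 0 if the pointer is null (as in the dump above). */
int vnode_has_rdev(const struct vnode *vp)
{
    assert(vp != NULL);
    return vp->v_rdev != NULL;
}
```

Because the structure has no member that is literally called v_rdev, a debugger working from ordinary debugging symbols can only follow the full v_un.vu_spec.vu_specinfo path.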

Results

In this particular case, your extra digging would produce the answer for a developer very quickly. The tidbit in the contents of the vp structure identifies the problem almost immediately.

...............................................................................................

v_type = VBAD

...............................................................................................

This is a vnode that isn't currently used, and shouldn't even be in this part of the system. A developer would jump directly on that, and try to learn why the system is trying to set a new vnode to a bogus value.

I got this particular kernel dump from a kernel developer, who commented that while he "could fix vcount() to return 0 for invalid vnodes—it wouldn't, strictly speaking, be incorrect—but the *real* bug is somewhere else, and ‘fixing’ vcount() would just hide it." This is the correct attitude to have on this sort of problem—BSD users expect bugs to be found, not painted over. This means, however, that you can expect your developer to come back to you with requests for further information, and probably more things to type into gdb. He might even ask you to send the kernel.debug and vmcore file.
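To make the developer's point concrete, here is a rough sketch of what "painting over" the bug might look like. It is not the actual FreeBSD code: the structures are simplified, the list is a plain linked list instead of SLIST, locking is omitted, v_rdev is an ordinary member rather than a macro, and vcount_guarded() and si_first are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Abbreviated vnode types; only the two needed for the example. */
enum vtype { VCHR, VBAD };

struct vnode;

struct specinfo {
    struct vnode *si_first;      /* hypothetical stand-in for si_hlist */
};

struct vnode {
    enum vtype v_type;
    int v_usecount;
    struct specinfo *v_rdev;     /* a plain member here, not a macro */
    struct vnode *v_specnext;
};

/* Count references across all vnodes sharing this device, but quietly
 * return 0 for an invalid vnode instead of dereferencing a null pointer. */
int vcount_guarded(const struct vnode *vp)
{
    const struct vnode *vq;
    int count = 0;

    assert(vp != NULL);
    if (vp->v_type == VBAD || vp->v_rdev == NULL)
        return 0;                /* the panic disappears, the bug does not */
    for (vq = vp->v_rdev->si_first; vq != NULL; vq = vq->v_specnext)
        count += vq->v_usecount;
    return count;
}
```

A guard like this would stop the panic, but whatever code handed a VBAD vnode to this function would still be wrong, and nothing would report it any longer.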


Vmcore and Security

The vmcore file contains everything in your system's memory at the time of the panic, which may include all sorts of security-sensitive information. Someone could conceivably use this information to break into your system. A developer might write you and request a copy of the file for all sorts of legitimate reasons: It makes debugging easier and can save countless rounds of email. Still, consider very carefully the potential consequences of someone having this information. If you don't recognize the person who asks, or if you don't trust her, don't send the file!

If the panic is reproducible, however, you can cold-boot the system to single-user mode and trigger the panic immediately. That way, if the system never starts any programs that contain confidential information, and nobody types any passwords into the system, the dump cannot contain that information. Reproducing a panic in single-user mode hence generates a "clean" core file.

To prepare a clean core file, enter boot -s at the loader prompt to bring the system to a command prompt, then do the minimal setup necessary to prepare a dump and panic the system:

...............................................................................................

# dumpon /dev/ad0s4b
# mount -art ufs
# /usr/local/bin/command_that_panics_the_system

...............................................................................................

The first line tells the system where to put its dump (put your correct swap partition name here). The second line mounts the filesystems as read-only, so you won't have to fsck after your panic. (Since you know the crash is coming, why make yourself fsck?) Finally, you run the command that triggers the panic. You may need some additional commands, depending on your local setup, but this should get you up and running in most cases.

Symbols vs. No Symbols

As a final treat, here's a debugging session from the same panic and the same kernel, but without debugging symbols. Compare it to the initial output from the where command, discussed earlier in the chapter.

...............................................................................................

(kgdb) where

#0 0xc01c5982 in dumpsys ()
#1 0xc0143119 in db_fncall ()
#2 0xc0142f33 in db_command ()
#3 0xc0142fff in db_command_loop ()
#4 0xc0145393 in db_trap ()
#5 0xc02ad0f6 in kdb_trap ()
#6 0xc02ba004 in trap_fatal ()
#7 0xc02b9d71 in trap_pfault ()
#8 0xc02b9907 in trap ()
#9 0xc01ffb23 in vcount ()
#10 0xc01a5e58 in spec_close ()
#11 0xc01a55f1 in spec_vnoperate ()
#12 0xc0207454 in vn_close ()
#13 0xc0207fab in vn_closefile ()
#14 0xc01b1d50 in fdrop_locked ()
#15 0xc01b155a in fdrop ()
#16 0xc01b152d in closef ()
#17 0xc01b114e in fdfree ()
#18 0xc01b5173 in exit1 ()
#19 0xc01b4ec2 in sys_exit ()
#20 0xc02ba2b7 in syscall ()
#21 0xc02ae06d in syscall_with_err_pushed ()
#22 0x80503a5 in ?? ()
#23 0x807024a in ?? ()
#24 0xbfbfffb4 in ?? ()
#25 0x807daaf in ?? ()
#26 0x807d6eb in ?? ()
#27 0x80630c1 in ?? ()
#28 0x8062fed in ?? ()
#29 0x805ea4c in ?? ()
#30 0x8065949 in ?? ()
#31 0x806544d in ?? ()
#32 0x806dc17 in ?? ()
#33 0x80616b7 in ?? ()
#34 0x80613f0 in ?? ()
#35 0x8048135 in ?? ()

...............................................................................................

That's it. There are no hints here about where in the source the panic happened, just the names of the functions on the call stack. An extraordinarily experienced hacker might happen to recognize a place in the kernel where exactly these calls take place, in exactly this order. If the kernel developer is really, really interested in the problem, he could get some information out of it like this:

...............................................................................................

(kgdb) p vcount
$1 = {<text variable, no debug info>} 0xc01ffb00 <vcount>
(kgdb) up 9
#9 0xc01ffb23 in vcount ()
(kgdb) p/x 0xc01ffb23 - 0xc01ffb00
$2 = 0x23
(kgdb)

...............................................................................................

The p/x command means "print in hexadecimal." Here, we've learned roughly how far into vcount() the problem happened. If the developer has a similar kernel built with similar source code, he can do this:

...............................................................................................

(kgdb) l *(vcount + 0x23)
0xc01fb913 is in vcount (../../../kern/vfs_subr.c:2301).
2296            struct vnode *vq;
2297            int count;
2298
2299            count = 0;
2300            mtx_lock(&spechash_mtx);
2301            SLIST_FOREACH(vq, &vp->v_rdev->si_hlist, v_specnext)
2302                    count += vq->v_usecount;
2303            mtx_unlock(&spechash_mtx);
2304            return (count);
2305    }
(kgdb)

...............................................................................................
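The arithmetic behind these two gdb sessions is simple enough to sketch in C. The helper names offset_in_function() and address_at_offset() are hypothetical, and the sketch assumes 32-bit addresses like the ones in this trace. The first function is the subtraction done with p/x, and the second is what an expression such as vcount + 0x23 asks the debugger to compute in a kernel where vcount may start at a different address.

```c
#include <assert.h>
#include <stdint.h>

/* How far into a function does an address from the backtrace fall?
 * func_start is the address the debugger prints for the symbol. */
uint32_t offset_in_function(uint32_t func_start, uint32_t addr)
{
    assert(addr >= func_start);   /* addr must lie at or after the start */
    return addr - func_start;
}

/* Apply the same offset to the function's start address in another
 * build of the kernel. */
uint32_t address_at_offset(uint32_t func_start, uint32_t offset)
{
    return func_start + offset;
}
```

With the values from the session, offset_in_function(0xc01ffb00, 0xc01ffb23) yields 0x23. The offset only points at the same source line when the other kernel was built from closely matching source with matching options, which is why the text speaks of a similar kernel.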

That's it. There's no way to get the bad vnode information out. The developer is left on his own, poking through the code to see if he can figure out the problem via sheer dogged determination. And in any event, it's very unlikely that any developer capable of working on a problem will have the
