Linux Timesaving Techniques For Dummies.pdf


Finally, we show you a tool that can help somebody else track down a bug. valgrind watches running programs for memory usage errors and produces a report that can show a developer exactly where a program gets whacked. If you’re a developer, valgrind makes it easy to track down intermittent problems. If not, run valgrind and send the report to the developer of the program that you’re trying to use.

In this technique, we introduce you to a few diagnostic tools that you should keep in your troubleshooting toolbox. Each tool exposes different information, so each one is useful in tracking down a particular type of problem. Using these tools together, you can make quick work of troubleshooting difficult programs.

Using lsof to Find Out Which Files Are Open

Linux is pretty forgiving when it comes to file sharing. In most cases, two (or more) users can work on the same file at the same time (don’t try that with a text editor though). You can share devices, too. For example, mount a CD, and anyone with the proper privileges can access the content, even if you’re using the CD at the same time. Sometimes, however, a device (or a file) can’t be shared.

Say that you’re in the middle of burning a multisession CD using a program like k3b. You’ve just finished burning the first track, you’ve selected the files you want to burn to the second session, you click the Burn button, and you’re rewarded with the following error messages:

Could not retrieve multisession information from disk

The disk is either empty or not appendable

Now, you know that the CD is not empty, and you’re pretty sure that you selected Start Multisession when you burned the first track. Just to make sure the disc really has something already on it, you try to mount the CD at the command line:

# mount /dev/cdrom /mnt/cdrom
mount: /dev/cdrom already mounted

That looks pretty fishy — Linux thinks that someone has already mounted the CD. Try to unmount the CD:

# umount /dev/cdrom

umount: /mnt/cdrom: device is busy

That explains the original error message (well, sort of — that’s not a very helpful error message). How do you find the culprit? Use the lsof command. lsof provides information about open files. The lsof command operates in three modes:

If you run lsof without any arguments, you’re greeted with a (very long) list of all the open files, devices, and network connections on your system.

Give lsof the name of a file, directory, or device, and you see a list of all the processes currently using that file.

Finally, if you give lsof a process ID, it shows you all the files being used by that process.

Use the second form to find out who’s using your CD drive:

# /usr/sbin/lsof /dev/cdrom

COMMAND  PID   USER     FD   NAME
bash     1406  freddie  cwd  /mnt/cdrom

Aha! freddie has managed to mount the CD just after you finished burning the first session. The cwd (in the column labeled FD) tells you that the current working directory of freddie’s bash session is /mnt/cdrom.
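On Linux, the working directory that lsof reports as cwd is also visible as the symbolic link /proc/PID/cwd. The following is a minimal sketch, not a replacement for lsof, that lists processes whose working directory sits at or below a given directory. The function name procs_with_cwd_under is made up for illustration, and without root privileges it can only inspect processes that you own.

```shell
# Hypothetical helper: list processes whose current working directory
# is DIR or lies below it. Prints "PID COMMAND CWD", one process per line.
procs_with_cwd_under() {
    dir=${1%/}                      # drop a trailing slash, if any
    for p in /proc/[0-9]*; do
        # /proc/PID/cwd is a symlink to the working directory; reading it
        # fails for other users' processes unless you are root.
        cwd=$(readlink "$p/cwd" 2>/dev/null) || continue
        case "$cwd" in
            "$dir"|"$dir"/*)
                printf '%s %s %s\n' "${p#/proc/}" \
                    "$(cat "$p/comm" 2>/dev/null)" "$cwd"
                ;;
        esac
    done
}

# Example: procs_with_cwd_under /mnt/cdrom
```

Unlike lsof, this sketch looks only at working directories; a process that merely has a file open on the device would not show up.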

Check out Technique 36 for the complete lowdown on lsof.

Now that you know who’s using your CD drive, you can ask him to unmount the drive and let you continue. Drop an e-mail to the developers suggesting a more meaningful error message while you’re burning the next track.

Technique 59: Troubleshooting Persnickety Programs

Debugging Your Environment with strace

Software problems can be subtle. Every once in a while, you run into a program that worked yesterday but refuses to work today. When that happens, start out by assuming that the program didn’t suddenly break just because it happens to be Friday the 13th. Find out what’s changed in your environment.

In the preceding section, you saw that the lsof command can tell you which files are currently opened by a process, but it has one drawback: It won’t tell you about files that the process tried to open, but failed. Fortunately, Linux has another tool that can tell you much more about a process: strace.

The strace command reaches into a running program and exposes all the interaction between the program and the operating system. You can find out a lot from strace, but you can also get lost in the mass of data. Here’s how we recently used strace (and its cousin ltrace) to track down a problem. We wanted to download an RPM package using wget (see Technique 14) and found that wget was getting stuck somewhere: It simply refused to connect to the remote server.

Here’s the command that we used:

$ wget http://www.example.com/package.rpm

This command should connect to the www.example.com Web server and download the file named package.rpm. To use strace to track down the problem, simply run the command that you want to watch, but put the word strace at the start of the command line, like this:

$ strace wget http://www.example.com/package.rpm

strace runs the program (wget) for us and starts tracing all the system calls (calls to the kernel). Because we know that wget eventually hangs (it just stops), we let the strace messages scroll by and wait for the display to pause.

Eventually (after a few seconds), we notice that wget seems to get stuck in a loop, repeating the system calls shown in Listing 59-1.

We could read up on all the functions shown in Listing 59-1, trying to figure out what wget is doing, but we noticed that two functions seem to be delaying the loop. Because we’re looking for something that causes a delay, it makes sense to focus on those functions first: nanosleep() and connect(). The man page for nanosleep tells us that nanosleep “pauses execution for a specified time.” That doesn’t sound like a bug — wget would not include a call to nanosleep() unless it was required. The second function, connect(), looks more promising.

It’s difficult to tell from the strace output, but wget calls connect() with three arguments: 3, a socket address structure (port 9877 at IP address 192.168.0.22), and 16. A quick glance at the manual confirms that connect() does in fact expect three parameters: a socket descriptor (3), a server address (192.168.0.22, port 9877), and the length of the server address structure (16). If you look closely at the end of Listing 59-1, you’ll see that connect() returns an error:

EHOSTUNREACH (No route to host).
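Spotting a failed call by eye gets harder as the trace grows. One possible shortcut, sketched here under the assumption that the trace was saved to a file with strace's -o option, is to filter the log for calls that returned -1. The function name failed_calls and the log path are illustrative choices, not part of strace.

```shell
# Hypothetical helper: print only the system calls that failed in a
# saved strace log. strace reports a failed call as "= -1", followed by
# the errno name and its description.
failed_calls() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}

# Example (the log path is an arbitrary choice):
#   strace -o /tmp/wget.strace wget http://www.example.com/package.rpm
#   failed_calls /tmp/wget.strace
```

Expect some noise: programs routinely probe for files that do not exist, so a filter like this narrows the search rather than pinpointing the culprit.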

Next, we ping www.example.com and notice two interesting points (see Listing 59-2):

We can connect to www.example.com (that is, ping is not reporting an EHOSTUNREACH error like wget did).

The IP address of www.example.com is 192.0.34.166, but wget is trying to connect to a different host (192.168.0.22). Why?

To paraphrase Lewis Carroll, fishier and fishier. Time for a new tool: ltrace.

 


LISTING 59-1: STRACE DISPLAYS WGET SYSTEM CALLS

...
time(NULL)                                = 1079184678
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8)  = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8)  = 0
nanosleep({2, 0}, {2, 0})                 = 0
time(NULL)                                = 1079184680
access("-", F_OK)                         = -1 ENOENT (No such file or directory)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP)  = 3
connect(3, {sa_family=AF_INET, sin_port=htons(9877), \
    sin_addr=inet_addr("192.168.0.22")}, 16) = -1 EHOSTUNREACH (No route to host)
close(3)
...

LISTING 59-2: PINGING WWW.EXAMPLE.COM

$ ping www.example.com

PING www.example.com (192.0.34.166) 56(84) bytes of data.

64 bytes from www.example.com (192.0.34.166): icmp_seq=0 ttl=54 time=736 ms
64 bytes from www.example.com (192.0.34.166): icmp_seq=1 ttl=54 time=802 ms
64 bytes from www.example.com (192.0.34.166): icmp_seq=2 ttl=54 time=756 ms

...

Investigating Programs with ltrace

strace shows you the system calls that a program makes — calls into the Linux kernel. ltrace is another program that lets you peek under the hood of a running program. ltrace displays a running log of the shared-library calls made by a program. A shared library is a collection of functions that provide common functionality to many programs. The C runtime library is a shared library. The KDE graphical toolkit is a shared library (so is the GNOME library, GTK).

To run ltrace, use the same technique that you use to run strace. Just prefix the command that you want to watch with the word ltrace:

$ ltrace wget \
http://www.example.com/package.rpm

The output from ltrace is usually more voluminous than strace, but it’s often more interesting. If you run ltrace, you’ll probably notice that the display is whizzing by too fast to read. Press Ctrl-C to stop the program and then change the command line to this:

$ ltrace -o /tmp/wget.trc wget \
http://www.example.com/package.rpm

The -o filename option redirects ltrace output to the named file (in this case, /tmp/wget.trc). A nice side effect of the -o filename option is that normal output from wget is clearly displayed rather than buried in ltrace output.

After letting wget (and ltrace) run for a few moments, we cancel the program by pressing Ctrl-C and then start browsing through the log.

Before going much further, here’s a quick reminder of how we got here:


We’re trying to download a package from www.example.com, and the wget command is hanging.

strace shows that wget is spending a lot of time trying to reach host 192.168.0.22.

The call to connect (192.168.0.22) is failing with a “no route to host” error.

ping reveals that wget is trying to connect to the wrong host.

Because we’re interested in finding out why wget is trying to connect to the wrong host, we search for 192.168.0.22 in the ltrace log and find a section that looks like this:

...

memcpy(0x0866c985, "", 0)    = 0x0866c985
getenv("http_proxy")         = "192.168.0.22:9877"
strlen("192.168.0.22:9877")  = 17
malloc(25)                   = 0x0866c990

...

The first reference tells us that the getenv() function returned 192.168.0.22:9877. The man page for getenv() states that getenv() returns the value of an environment variable; in this case, the environment variable is named http_proxy. Now we’re getting somewhere. A quick check shows that we do in fact have an environment variable named http_proxy and its value is indeed 192.168.0.22:9877:

$ echo $http_proxy
192.168.0.22:9877
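Searching the log for one known address works when you already have a suspect. A broader sweep, sketched below, pulls every getenv() call that returned a value out of the ltrace log, which shows at a glance which environment variables influenced the run. The function name env_lookups is hypothetical, and the filter assumes that ltrace prints NULL as the return value for variables that are not set.

```shell
# Hypothetical helper: list getenv() calls in a saved ltrace log that
# returned a value. Assumes ltrace prints NULL for unset variables.
env_lookups() {
    grep 'getenv(' "$1" | grep -v '= NULL[[:space:]]*$'
}

# Example: env_lookups /tmp/wget.trc
```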

The documentation for wget explains that the http_proxy environment variable specifies a proxy server (a server that carries out network requests on behalf of another computer). Now we know where wget is getting the mystery IP address. The solution to this problem is simply to remove the environment variable and try again, as shown in the following example:

$ unset http_proxy
$ wget http://www.example.com/package.rpm
--10:48:38--  http://www.example.com/package.rpm
           => `package.rpm'
Resolving www.example.com... done.
Connecting to www.example.com[192.0.34.166]:80... connected.
HTTP request sent, awaiting response...
...

Problem solved — the host 192.168.0.22 is a computer on our local network that we were using to test out proxy server software.
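With hindsight, a quick look at the environment would have found the stray variable without any tracing. The sketch below lists the proxy-related variables that wget consults (http_proxy, https_proxy, ftp_proxy, and no_proxy), matching names case-insensitively because other tools may use uppercase spellings. The function name show_proxy_vars is an illustrative choice.

```shell
# Hypothetical helper: show proxy-related environment variables in the
# current shell, matching names case-insensitively.
show_proxy_vars() {
    env | grep -iE '^(http|https|ftp|no)_proxy=' \
        || echo "no proxy variables set"
}

# Example: show_proxy_vars
```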

Handy strace and ltrace Options

strace and ltrace are both powerful tools, but they can generate a ton of output. Table 59-1 lists a few command line options that make strace easier to use.

TABLE 59-1: HANDY STRACE OPTIONS

Option        What It Does

-o filename   Redirects the strace log to filename. You can browse through the strace log after the program completes, or you can follow along by opening a second terminal window and running the command tail -f filename.

-f            Forces strace to trace child (and grandchild, great grandchild, and so on) processes spawned by the program you’re tracing.

-v            Tells strace to print complete structure content rather than an abbreviated (space-saving) version.

-s size       Tells strace to print, at most, size characters when displaying string arguments. (The default value is 32, and that may be too short for some programs.)
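These options combine naturally. As a rough sketch, the wrapper below traces a command together with its children, raises the string limit, and writes the log to a file. The name trace_to_log and the limit of 256 characters are arbitrary choices for illustration.

```shell
# Hypothetical wrapper: trace a command and its children, keep string
# arguments up to 256 characters, and write the log to a file.
# Usage: trace_to_log LOGFILE COMMAND [ARGS...]
trace_to_log() {
    log=$1
    shift
    strace -f -s 256 -o "$log" "$@"
}

# Example:
#   trace_to_log /tmp/wget.strace wget http://www.example.com/package.rpm
# In a second terminal, follow along with:
#   tail -f /tmp/wget.strace
```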
