Such an excellent article! blogspot.com is blocked for me, so I'm pasting it here.

Original Link: http://0xfe.blogspot.com/2006/03/troubleshooting-unix-systems-with-lsof.html

One of the least-talked-about tools in a UNIX sysadmin’s toolkit is lsof. Lsof lists information about files opened by processes. But that’s really an understatement.

Most people forget that, in UNIX, (almost) everything is a file. The OS makes hardware available to applications by way of files in /dev. Kernel, system, memory, device etc. information is made available inside files in /proc. TCP/UDP sockets are sometimes represented internally as files. Even directories are really just files containing other filenames.

Lsof works by examining kernel data-structures and provides a variety of information related to files, pipes, sockets and more.

Lsof is installed by default on most Linux distributions, BSD distributions and OS X. Binary packages for Solaris, AIX, HP-UX, *cough*SCO OpenServer*cough* and many other UNIXes (Unices?) are available on the web.

So, just how useful is lsof?

Deciphering its Output

Switch to root and type lsof on the command line.

linux# lsof

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

init 1 root cwd DIR 3,65 4096 2 /

init 1 root rtd DIR 3,65 4096 2 /

init 1 root txt REG 3,65 29556 172317 /sbin/init

init 1 root mem REG 3,65 1166880 93908 /lib/libc-2.3.5.so

init 1 root mem REG 3,65 103053 93909 /lib/ld-2.3.5.so

init 1 root 10u FIFO 3,65 48438 /dev/initctl

ksoftirqd 2 root cwd DIR 3,65 4096 2 /

ksoftirqd 2 root rtd DIR 3,65 4096 2 /

ksoftirqd 2 root txt unknown /proc/2/exe

events/0 3 root cwd DIR 3,65 4096 2 /

events/0 3 root rtd DIR 3,65 4096 2 /

events/0 3 root txt unknown /proc/3/exe

…SNIP…

syslog-ng 6529 root txt REG 3,69 114132 84690 /usr/sbin/syslog-ng

syslog-ng 6529 root mem REG 3,65 1166880 93908 /lib/libc-2.3.5.so

syslog-ng 6529 root mem REG 3,65 64568 93943 /lib/libresolv-2.3.5.so

syslog-ng 6529 root mem REG 3,65 75176 93924 /lib/libnsl-2.3.5.so

syslog-ng 6529 root mem REG 3,65 103053 93909 /lib/ld-2.3.5.so

syslog-ng 6529 root 0u CHR 1,3 47320 /dev/null

syslog-ng 6529 root 1u CHR 1,3 47320 /dev/null

syslog-ng 6529 root 2u CHR 1,3 47320 /dev/null

syslog-ng 6529 root 3u unix 0xdea00e00 672127 /dev/log

…SNIP…

asterisk 7001 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7001 root 11r FIFO 3,70 306 /var/run/asterisk/autodial.ctl

asterisk 7001 root 12u IPv4 6834 UDP *:5060

asterisk 7001 root 13r FIFO 0,5 6019 pipe

asterisk 7001 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7001 root 15u IPv4 6835 UDP *:2727

asterisk 7001 root 16u IPv4 6861 UDP *:4569

asterisk 7001 root 17u REG 3,70 0 593222 /var/lib/asterisk/astdb

asterisk 7001 root 18r FIFO 0,5 6883 pipe

asterisk 7001 root 19u REG 3,70 39402 32066 /var/tmp/iaxy.bin-1909889093 (deleted)

asterisk 7001 root 20w FIFO 0,5 6883 pipe

…LOTS MORE SNIPPED…

What you will be presented with is a very long list of open files, which you might want to pipe through your favourite pager.

By default (on Linux), lsof displays the following information about each open file:

  • COMMAND: The name of the UNIX command associated with the process.

  • PID: The Process ID.

  • USER: The user ID or login name of the user to whom the process belongs.

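  • FD: The file descriptor number, followed by its access mode (r for read, w for write, u for read and write), or a code for a reference that is not a descriptor, such as cwd (current working directory), rtd (root directory), txt (program text) or mem (memory-mapped file). See the manual page for details.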

  • TYPE: The type of the node associated with the file. E.g. REG signifies a regular file, IPv4 or IPv6 signifies an IP socket, DIR a directory, “unix” a UNIX domain socket, etc.

  • DEVICE: Usually contains major and minor device numbers for the files, or addresses/references for other structures.

  • SIZE: The size of the file or the file offset, in bytes (if available). For files that don't have true sizes (e.g., sockets, pipes), lsof displays the amount of content it finds in their kernel buffer descriptors.

  • NODE: Node number / inode / Internet protocol type (TCP) etc.

  • NAME: The name of the file / mount point / device / Internet address / etc.

For a comprehensive description of these fields, refer to the lsof manual page.

Since lsof works by examining kernel memory, you will need root access to be able to fully utilize it. A non-root user will not have access to information that belongs to other users.
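Because the default listing is made of the whitespace-separated columns described above, ordinary text tools can summarise it. The following is only a sketch: open_files_per_process is a made-up helper name, and it assumes command names contain no spaces. It counts the rows belonging to each COMMAND/PID pair using just the first two columns, which are always filled in; later columns can be blank (see the ksoftirqd rows), so anything that needs them is better served by lsof's machine-readable -F output.

```sh
# open_files_per_process: read default lsof output on stdin and print
# "<count> <command> <pid>", busiest processes first.
# Hypothetical helper; assumes command names contain no spaces.
open_files_per_process() {
    # NR > 1 skips the header line; $1 is COMMAND, $2 is PID.
    awk 'NR > 1 { n[$1 " " $2]++ }
         END    { for (k in n) print n[k], k }' |
        sort -rn
}
```

Run it as, for example, lsof | open_files_per_process | head to see which processes hold the most files.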

Common Usage

Lsof is usually run with one or more of the following options:

  • /path/to/file: List processes, owners and open file descriptors that are currently using the specified file.

  • -i [46][protocol][@hostname|hostaddr][:service|port]: List Internet files / sockets.

  • -u name: List files opened by processes belonging to the named user.

  • -p pid: List files open by specified process.

  • -t: Terse output. No headers, only PIDs. Useful within scripts.

  • -n: Disable resolving of network names.

  • -N: List NFS files

These options are ORed by default.

Display all internet files OR files opened by user “foobar”.

# lsof -u foobar -i

To display all internet files that are opened by foobar, you need to apply the AND (-a) condition between the switches.

# lsof -u foobar -a -i

The following recipes demonstrate how lsof can be used to troubleshoot real-world problems.

Recipe #1: Finding Port Hogs

Your web-server is refusing to come up because port 80 is in use by another process. How do you track down the offending process?

# lsof -i

… SNIP …

asterisk 7554 root 16u IPv4 6861 UDP *:4569

postmaste 7688 postgres 5u IPv4 5955 UDP localhost:32768->localhost:32768

postmaste 7689 postgres 5u IPv4 5955 UDP localhost:32768->localhost:32768

sshd 27038 root 3u IPv4 677971 TCP reddwarf:ssh->CPE.xxxx.com:61702 (ESTABLISHED)

sshd 27043 mohit 3u IPv4 677971 TCP reddwarf:ssh->CPE.xxxx.com:61702 (ESTABLISHED)

… SNIP …

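Nice. A list of open Internet sockets, along with the processes, addresses and owners. Also note that, as with netstat, TCP states are displayed. Above, we can see an established ssh session in progress. It shows up twice because the root-owned sshd and its child running as mohit share the same socket (both lines show 677971 and the same remote port).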

Let’s add a port filter and find exactly what we’re looking for.

# lsof -i TCP:80

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

lighttpd 7356 lighttpd 3u IPv4 6409 TCP *:http (LISTEN)

Okay, so lighttpd is the reason why Apache won’t run. That’s probably a good thing.
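In a script you usually want just the PID of the listener. A minimal sketch, with a hypothetical helper name, assuming an lsof new enough to support the -s state filter (-sTCP:LISTEN). Older releases lack it; in that case drop the option and accept that established connections on the port will be listed too.

```sh
# port_owner PORT: print the PID of each process listening on TCP PORT.
# Hypothetical helper; requires an lsof that understands -sTCP:LISTEN.
port_owner() {
    if [ -z "$1" ]; then
        echo "usage: port_owner PORT" >&2
        return 2
    fi
    # -n and -P keep addresses and ports numeric (no DNS or service lookups),
    # -t prints bare PIDs, and -sTCP:LISTEN skips non-listening sockets.
    lsof -nP -t -iTCP:"$1" -sTCP:LISTEN
}
```

On the system above, port_owner 80 should print lighttpd's PID, 7356.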

Recipe #2: Finding Processes Within a Given Port Range

You need to find a range of free ports for your new multimedia application.

# lsof -i TCP:5000-5200

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

asterisk 7001 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7001 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7002 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7002 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7039 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7039 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7040 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7040 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7041 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7041 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7042 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7042 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7044 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7044 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

perl 7046 root 3u IPv4 6054 TCP *:5100 (LISTEN)

perl 7046 root 4u IPv4 6055 TCP *:5101 (LISTEN)

perl 7046 root 6u IPv4 6056 TCP localhost:32768->localhost:5038 (ESTABLISHED)

asterisk 7073 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7073 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)

asterisk 7504 root 10u IPv4 6015 TCP localhost:5038 (LISTEN)

asterisk 7504 root 14u IPv4 6016 TCP localhost:5038->localhost:32768 (ESTABLISHED)
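To turn a listing like this into the free ports you were actually after, the sketch below (hypothetical helper name) asks lsof for TCP listeners in the range using the -F n field output, where each socket address arrives on its own line prefixed with n, and prints every port in the range that nothing is listening on. It assumes -sTCP:LISTEN support, ignores UDP and the local ends of outgoing connections, and cannot stop another process from grabbing a port right after it is reported free.

```sh
# free_tcp_ports LOW HIGH: print each port in LOW..HIGH with no TCP listener.
# Hypothetical helper; requires an lsof that understands -sTCP:LISTEN.
free_tcp_ports() {
    command -v lsof >/dev/null 2>&1 || { echo "lsof not found" >&2; return 1; }
    # -Fn emits field lines; the "n" lines hold addresses such as "n*:5100".
    lsof -nP -iTCP:"$1-$2" -sTCP:LISTEN -Fn 2>/dev/null |
        awk -v low="$1" -v high="$2" '
            /^n/ { sub(/.*:/, ""); used[$0] = 1 }  # keep text after the last ":"
            END  { for (p = low + 0; p <= high + 0; p++)
                       if (!(p in used)) print p }'
}
```

For example, free_tcp_ports 5000 5200 | head -n 5 lists the first five candidates.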

Recipe #3: Listing User Files

What files do users “foobar” and “apache” have open?

# lsof -u foobar,apache

List UDP ports in use by user “mohit”.

# lsof -i UDP -a -u mohit

Who’s responding to “who”?

# lsof -i UDP:who

Recipe #4: Unmounting a Disk or Filesystem

Sometimes you need to track down the user or process that’s blocking you from unmounting a disk.

# umount /opt

umount: /opt: device is busy

umount: /opt: device is busy

# mount | grep /opt

/dev/hdb9 on /opt type ext3 (rw,noatime)

# lsof /dev/hdb9

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME

perl 7046 root 2w REG 3,73 111 1376386 /opt/local/paynacea/var/state/callmanager.pid.err

perl 7046 root 5w REG 3,73 6783 1376385 /opt/local/paynacea/var/log/callmanager.log

# kill 7046

# umount /opt

Or the simpler:

# kill `lsof -t /opt`
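One catch with the one-liner: if nothing is using the filesystem, lsof -t prints nothing and kill is run with no arguments, which only produces a usage error. The sketch below (hypothetical helper name) checks for that case first. Like the one-liner, it sends SIGTERM to every process it finds, so look at the lsof listing before running it.

```sh
# kill_fs_users MOUNTPOINT: send SIGTERM to every process with files open on
# the filesystem mounted at MOUNTPOINT; returns 1 if nothing is using it.
# Hypothetical helper.
kill_fs_users() {
    pids=$(lsof -t "$1" 2>/dev/null)
    if [ -z "$pids" ]; then
        echo "no processes are using $1" >&2
        return 1
    fi
    # $pids is unquoted on purpose so each PID becomes a separate argument.
    kill $pids
}
```

For example: kill_fs_users /opt; sleep 1; umount /opt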

Recipe #5: Finding Device Hogs

Who’s using the audio device?

# lsof /dev/audio

Why can’t I start my alternate logger?

# lsof /dev/log

Why doesn’t my CD eject?

# lsof /dev/cdrom
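When several devices are suspects, a short loop can check them in one go. This is a sketch with a hypothetical helper name; it prints each device followed by the PIDs that have it open, or "none".

```sh
# device_users DEVICE...: for each device node, print "DEVICE: PID..." or
# "DEVICE: none" when nothing has it open. Hypothetical helper.
device_users() {
    for dev in "$@"; do
        # paste -s joins lsof's one-PID-per-line output into a single line.
        pids=$(lsof -t "$dev" 2>/dev/null | paste -s -d ' ' -)
        echo "$dev: ${pids:-none}"
    done
}
```

For example: device_users /dev/audio /dev/log /dev/cdrom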

Recipe #6: Using Exclusions

The ‘^’ (negated) modifier can prefix the User or Process ID parameters to exclude them from the resulting list. Since they represent exclusions, they are applied without ORing or ANDing and take effect before any other selection criteria are applied.

List all Internet files/sockets open by non-root users.

# lsof -i -u^root

Recipe #7: Recursing Directories

The ‘+D’ option causes lsof to search for open files within a specified directory, recursing down to its complete depth.

List all processes that have files open in /tmp.

# lsof +D /tmp

The ‘+d’ option does the same thing, but does _not_ descend the directory tree.

Recipe #8: Matching by Process Name

List all files opened by processes whose command names begin with the letters mpg.

# lsof -c mpg

Using a regular expression:

# lsof -c '/post.*er/'

Recipe #9: Examining Suspicious Processes

Lsof can be used along with strace to examine and monitor the operation of viruses, worms or spyware.

What files are opened by PID 14554?

# lsof -p 14554

Who’s looking at the password file?

# lsof /etc/passwd

Recipe #10: Repeat Mode

The -r switch puts lsof in repeat mode: it displays a listing, waits 15 seconds (or the number of seconds given after -r), and then displays another, until interrupted.

Watching a user’s open files every 5 seconds:

# lsof -u badcop -r5

Monitoring the password file:

# lsof -r 2 /etc/passwd

Recipe #11: Finding Deleted Open Files

This recipe was added on 26/Mar/06 after an anonymous poster left a comment regarding deleted files.

One of the most annoying problems is a filesystem quickly running out of space without a hint of which file is responsible. This happens when a file (usually a log file) is deleted while it is still being written to. When you delete an open file, the kernel removes its directory entry, but it cannot free the inode or its data blocks while the file is still open.

This causes the file to continue to grow, with no trace of its existence anywhere. Well… almost anywhere.

Lsof provides the +L option to list the link count of each open file. When followed by a number, lsof displays only files whose link count is less than that number.

mohit@reddwarf ~ $ lsof +L3

COMMAND PID USER FD TYPE DEVICE SIZE NLINK NODE NAME

sshd 11540 mohit mem REG 3,69 303448 1 85869 /usr/sbin/sshd

sshd 11540 mohit mem REG 3,65 35404 1 94075 /lib/libnss_nis-2.3.5.so

sshd 11540 mohit mem REG 3,65 30928 1 94086 /lib/libnss_compat-2.3.5.so

sshd 11540 mohit mem REG 3,65 35236 1 93958 /lib/libnss_files-2.3.5.so

sshd 11540 mohit mem REG 3,65 28444 1 94094 /lib/libcrack.so.2.8.0

A deleted file has zero links. So the following command displays deleted-but-open files on a system.

$ lsof +L1

Display a list of deleted-but-open files within a specific filesystem.

$ lsof +aL1 /tmp
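Once you know deleted files are involved, the next question is which process is holding the space. The sketch below (hypothetical helper name) reads lsof +L1 output and totals the SIZE column of deleted regular files per process. It assumes the column layout shown above, with SIZE in column 7 and NLINK in column 8 (newer lsof releases label the size column SIZE/OFF, in the same position), and command names without spaces.

```sh
# deleted_space: read `lsof +L1` output on stdin and print
# "<bytes> <command> <pid>" for deleted regular files, largest first.
# Hypothetical helper; assumes SIZE is column 7 and NLINK column 8.
deleted_space() {
    # NR > 1 skips the header; only REG rows with a zero link count are summed.
    awk 'NR > 1 && $5 == "REG" && $8 == 0 { bytes[$1 " " $2] += $7 }
         END { for (k in bytes) print bytes[k], k }' |
        sort -rn
}
```

For example: lsof +L1 | deleted_space | head. The space is only returned once the listed process closes the file or exits.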

Finally

We barely scratched the surface with the above recipes, but as you can see, lsof is a powerful troubleshooting tool. I’d be interested in learning what other users do with lsof. Toy with it, tinker with it, use it and let me know how it has helped you.