Monday, July 6, 2009

File system full - what to look for : SUN

Generic info for SUN servers -
There are several reasons why a filesystem gets full. An important thing to consider is how you set up your filesystems during installation; you need to take care over how much space is allocated to each filesystem and think ahead.
Forward thinking makes it less likely that your filesystems will fill up, but it cannot prevent it entirely. This document shows the most common reasons why a filesystem may become full and how to handle them.
What follows is written mainly to deal with the OS filesystems (such as root, var and usr), but it can be used to troubleshoot other filesystems as well. There are many ways of finding what is filling up a filesystem, and it can sometimes be a difficult process. One problem is that a filesystem can be filled up by one or a few very large files (which are generally easy to find) or by thousands of smaller files (which can be difficult to find, making the cause hard to pinpoint).
First you need to figure out which files are filling up your filesystem.

A very useful way to list the size of files in a filesystem is with the du command.
The following example lists files from largest to smallest on the root filesystem:
$ du -akd / | sort -nr | more
or
$ du -akd / | sort -nr > /tmp/du.out

The latter will give you a file you can review at your convenience.
The -d option of the du command keeps du from crossing partition boundaries.
The “-a” option tells du to report file sizes (without this option du just reports the amount of space used in each directory). The “-k” option means that du will report in terms of kilobytes rather than 512-byte blocks. On Solaris 9 or later, replace “k” with “h” if you prefer “human-readable” output, that is, output in kilobytes, megabytes or gigabytes depending on the size reported. Note that sort -nr cannot order human-readable sizes correctly, so keep “-k” when sorting.
The -nr option of sort puts the files in reverse numerical order.
Of course, this can be used on filesystems other than root, just substitute the required path for “/” in the "du" command.
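As a minimal sketch, the du and sort commands above can be wrapped in a small shell function that prints only the largest entries. The function name largest_entries and the DU_FS_FLAG variable are hypothetical, introduced only for this illustration. DU_FS_FLAG defaults to the Solaris -d option described above; on systems with GNU du it would have to be set to -x instead.

```shell
# Sketch only: print the N largest files and directories under a path, in KB.
# DU_FS_FLAG is an assumption of this example: "-d" is the Solaris option that
# keeps du on one filesystem; GNU du uses "-x" for the same purpose.
DU_FS_FLAG=${DU_FS_FLAG:--d}

largest_entries() {
    dir=$1
    count=${2:-20}      # how many lines to show, 20 if not given
    du -ak "$DU_FS_FLAG" "$dir" 2>/dev/null | sort -nr | head -n "$count"
}
```

For example, "largest_entries /var 30" would show the 30 largest entries on the /var filesystem. The first line is normally the directory itself, since its total includes everything below it.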

The command “du -skd /” summarizes the number of kilobytes used in a filesystem, in this case the root filesystem. If this differs from what is reported by the df -k command, check InfoDocs 4083 and 17720 for further explanation and troubleshooting tips.
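The comparison between df and du can be sketched as a few shell functions. All function names here are hypothetical, and the parsing assumes the usual df -k layout in which the "used" column is the fourth field from the end of the line (filesystem, kbytes, used, avail, capacity, mount point). DU_FS_FLAG is again an assumed variable: -d on Solaris, -x with GNU du.

```shell
# Sketch only: compare the space df reports as used with what du can account for.
DU_FS_FLAG=${DU_FS_FLAG:--d}

df_used_kb() {
    # Counting from the end copes with df wrapping a long device name
    # onto a line of its own.
    df -k "$1" | awk 'NR > 1 && NF >= 5 { print $(NF - 3); exit }'
}

du_used_kb() {
    du -sk "$DU_FS_FLAG" "$1" 2>/dev/null | awk '{ print $1; exit }'
}

usage_gap_kb() {
    # Pass the mount point of the filesystem, for example /
    df_kb=`df_used_kb "$1"`
    du_kb=`du_used_kb "$1"`
    expr "$df_kb" - "$du_kb"
}
```

A large positive result from "usage_gap_kb /" suggests space that du cannot see, which is the situation the InfoDocs mentioned above deal with.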

One common problem with df showing more usage than du is existing data or files in directories that are used as mount points.
INFODOC 4083 covers this, but the basic solution is given again here for convenience.
Unmount any mounted filesystems and check the mount point directories for files. Remove the files, or move them if you think you need them, and mount the filesystems again.
For the /tmp filesystem, you will have to boot the system into single-user mode to access the /tmp directory without having swap mounted over it.
For /var and /usr, you will have to boot the system from cdrom, mount the root filesystem, and then check the /var and /usr directories under the mounted root filesystem. These should normally be empty when /var or /usr is not mounted.

Another good way to search for files is to use the command '/usr/bin/find'. There is a good document on how to use the find command; see InfoDoc 13678.
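As an illustration of the find approach, the following sketch lists regular files above a given size without leaving the starting filesystem. The function name find_large_files and its default threshold are assumptions made for this example, not something taken from the InfoDoc.

```shell
# Sketch only: list regular files larger than MIN_KB kilobytes.
find_large_files() {
    dir=$1
    min_kb=${2:-10240}                 # default threshold: 10 MB
    # find counts -size in 512-byte blocks, so double the kilobyte figure
    blocks=`expr "$min_kb" \* 2`
    # -xdev keeps find from descending into other mounted filesystems
    find "$dir" -xdev -type f -size +"$blocks" -exec ls -l {} \; 2>/dev/null
}
```

For example, "find_large_files / 51200" would list files over 50 MB on the root filesystem only.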
Standard filesystems to look at first will be:
Filesystem
Checks
/tmp
If /tmp is full or contains large files, a reboot will clean this directory. A default Solaris installation shares the diskspace for /tmp and swap as you can see in the output of the df command.
Note: /tmp is not cleaned at boot time if /tmp is configured as a separate filesystem.
/dev
Large files may appear here when someone writes to a device using an incorrect device name, for example /dev/rmt/o (letter 'o') instead of /dev/rmt/0 (digit zero) for a tape drive. This is a very common problem if the machine does not have a tape drive attached and someone uses a tape command like tar or ufsdump; the command then just creates a large file in /dev/rmt/. So check the /dev directory for regular files where there should only be links.
/
Look for core files. Check the /.wastebasket and /lost+found directories for large files. Check for a .CPR file in root; this is put there by the power suspend/resume software.
/var
Third party packages sometimes leave tar files in /var/sadm/pkg directory.
If /var is full (and is a separate filesystem) or /var directory is the one we determined is using up most space in root, check the following.
Clearing out (but NOT deleting; the files should be truncated to zero length) the following files might gain you some space. Use caution here because you will lose various log information. For example, the utmp[x] and wtmp[x] files contain user access and accounting information:
/var/cron/log
/var/spool/lp/logs
/var/adm/utmp
/var/adm/utmpx
/var/adm/wtmp
/var/adm/wtmpx
/var/log/syslog*
/var/adm/messages.*
NOTE: if you zero out the utmp, utmpx, wtmp or wtmpx files, you should reboot your machine.

To zero out a file:
# cat /dev/null > filename
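When several logs need to be cleared, the same command can be applied in a loop. This is only a sketch; the function name truncate_files is hypothetical. It truncates each file in place rather than removing it, so any process that still has the file open keeps writing to the same file.

```shell
# Sketch only: truncate each named file to zero length without deleting it.
truncate_files() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            cat /dev/null > "$f" && echo "truncated $f"
        else
            # Leave anything that is missing or not a regular file alone
            echo "skipped $f (not a regular file)" >&2
        fi
    done
}
```

For example, "truncate_files /var/cron/log" empties the cron log. The same caution applies as above: the log contents are lost.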
NOTE: For Solaris[TM] 9 or greater, see logadm(1M) for a useful tool to manage log files.
Check /var/saf for _log, and look in the tcp and zsmon directories, which will also contain _log files; you can zero them out with "cat /dev/null > filename".
If your system is being used as a printer host, check /var/lp/logs for files; they can be removed if they have already been printed or are left over from system crashes or printer problems.
Check /var/preserve.
Check the /var/spool/* directories; subdirectories like "lp" or "mqueue" are used for spooling.
Check /var/crash for any system cores.
Also check /var/tmp for files that are no longer needed; /var/tmp is not cleaned up by a reboot.
A word of caution regarding the /var/sadm directory. This directory contains package and patch information and generally should not be touched.
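Returning to the /dev and / checks in the list above, both can be sketched with find. The function names stray_dev_files and core_files are hypothetical. The first relies on the fact that entries under /dev on Solaris are normally symbolic links into /devices, so a regular file there deserves a closer look.

```shell
# Sketch only: regular files under /dev are suspicious, since the entries
# there are normally symbolic links rather than plain files.
stray_dev_files() {
    find "${1:-/dev}" -type f -exec ls -l {} \; 2>/dev/null
}

# Sketch only: list files named "core" on a single filesystem.
core_files() {
    find "${1:-/}" -xdev -type f -name core -exec ls -l {} \; 2>/dev/null
}
```

Run without arguments, the two functions look at /dev and the root filesystem respectively; a different starting directory can be passed as the first argument.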

It could also be that you are running out of inodes, which produces the same “file system full” message. In this case, recreate the partition with more inodes. The basic steps are:

Remove unneeded files.
Back up the partition.
Recreate it using newfs -i nbpi /dev/ where nbpi is chosen smaller than the default for the disk size, and rfsname is the raw filesystem, e.g. /dev/rdsk/cNtNdNsN. See man newfs(1M) for more information.
Restore the information back to the partition.
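For the first of these steps, it can help to see which directories hold the most files, since every file consumes an inode regardless of its size. The following is a minimal sketch; the function name files_per_dir is hypothetical, and file names containing newlines would be miscounted.

```shell
# Sketch only: count non-directory entries per directory, busiest first.
files_per_dir() {
    # Strip the last path component to get each entry's parent directory;
    # entries directly under / are left with an empty parent, mapped back to /.
    find "$1" -xdev ! -type d 2>/dev/null |
        sed -e 's|/[^/]*$||' -e 's|^$|/|' |
        sort | uniq -c | sort -nr | head -n "${2:-10}"
}
```

For example, "files_per_dir /var 5" would show the five directories on /var that contain the most files, with the count in the first column.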
In the course of normal system operation, the root and usr filesystems (or directories) are mostly static (do not grow over time). /var however, does grow over time (because it contains log files, package database, print and mail spoolers, etc.). The name “var” is in fact an abbreviation for “varying” or “variable” as the “/var” filesystem is intended for files which vary in size and content over time (see the filesystem(5) manual page for more details about this). It is good system administration practice to monitor log files to make sure they don't get too large.
If a filesystem suddenly fills up, that could have been caused by installing a new piece of software into a wrong directory.
Check any lost+found directory on any filesystem that is full.
Another approach would be to list files by their modification date (if the date of when the filesystem filled up is known).
# ls -lRt / | more
This lists all the files, sorted by modification time within each directory.
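When the approximate time at which the filesystem filled up is known, find can narrow the listing to files changed since then. This is a sketch under that assumption; the function name recently_modified is hypothetical.

```shell
# Sketch only: list regular files modified within the last N days
# (N defaults to 1) without leaving the starting filesystem.
recently_modified() {
    find "$1" -xdev -type f -mtime -"${2:-1}" -exec ls -l {} \; 2>/dev/null
}
```

For example, "recently_modified /var 2" lists files on /var that were modified less than two days ago.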

It could also be that none of these actions provides a solution and the real problem is that the filesystem is simply too small.

To check inode usage - df -F ufs -o i
To check how a filesystem was created - mkfs -m /dev/rdsk/cXtXdXsX

::ciao.