Deleting lost+found is a bad idea

I recently came across an Oracle Red Hat Linux server without lost+found. It had been deleted because Oracle needs an empty directory on installation. Insane as that rule is, deleting an important system directory is a really bad idea. I will tell you why.
First, let's focus on what lost+found does:

In case of catastrophic filesystem corruption, the utility fsck tries to repair the filesystem. When it finds inodes that are in use but have no filename pointing to them, it creates an entry in lost+found for each one, with the inode number as the filename. If lost+found does not exist, fsck will create it at this point.
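
To make the naming scheme concrete, here is a minimal sketch for looking at what a repair left behind. It assumes the e2fsck convention of naming each entry '#' followed by the inode number; the function name list_recovered is made up for this post.

```shell
#!/bin/sh
# Hypothetical helper: list the entries fsck placed in a lost+found directory.
# Assumes e2fsck-style names: '#' followed by the inode number.
list_recovered() {
    lpf="${1:-/lost+found}"
    for entry in "$lpf"/\#*; do
        # If nothing matched, the pattern is left unexpanded: skip it.
        [ -e "$entry" ] || continue
        name="${entry##*/}"      # strip the directory part
        inode="${name#\#}"       # strip the leading '#'
        if [ -d "$entry" ]; then
            kind="directory"
        else
            kind="file"
        fi
        printf 'inode %s: %s\n' "$inode" "$kind"
    done
}
```

Run it as root with no argument to inspect /lost+found, or pass the lost+found directory of another mount point. From there it is manual work: the original names are gone, so you have to look at the contents to work out what each entry was.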

This is the core of the problem: creating a new directory that needs preallocated disk blocks, with room for thousands of directory entries, on an already flaky filesystem. lost+found is unique in that respect. You can also create it by hand with the command mklost+found (part of e2fsprogs) or with mkdir. In the latter case you also need to preallocate the space yourself by creating a large number of files in the directory and then deleting them.
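
The mkdir route can be sketched roughly like this. It is a sketch under assumptions, not a replacement for mklost+found, which is the tool I would reach for first on an ext filesystem. The function name recreate_lost_found, the pad.N file names and the default count of 10000 are all made up for illustration; the count is not a documented requirement.

```shell
#!/bin/sh
# Hypothetical helper: recreate lost+found by hand in the root of a mounted
# filesystem. Usage: recreate_lost_found /mountpoint [number_of_padding_files]
recreate_lost_found() {
    lpf="$1/lost+found"
    count="${2:-10000}"          # assumption: illustrative default only

    # Nothing to do if the directory is already there.
    [ -d "$lpf" ] && return 0

    mkdir "$lpf" || return 1
    chmod 700 "$lpf"

    # Create all padding files first so the directory itself has to grow...
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$lpf/pad.$i" || return 1
        i=$((i + 1))
    done

    # ...then delete them again, leaving an empty directory behind.
    i=0
    while [ "$i" -lt "$count" ]; do
        rm -f "$lpf/pad.$i"
        i=$((i + 1))
    done
}
```

For example, recreate_lost_found / would rebuild the directory on the root filesystem. The two separate loops matter: the files have to exist at the same time for the directory to grow, so deleting each file right after creating it would defeat the purpose.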

Now let's get back to fsck. It is running and has to allocate space to recreate lost+found. However, the filesystem is damaged, so whatever it allocates may come out of data that is currently orphaned and still waiting to be rescued. If /etc is among the missing directories, it will be the first choice because it has a low inode number.

Consider the order in which the top-level directories are created on install:

host:/ # ls -l /
drwxr-xr-x 2 root root 4096 Feb 25 10:08 bin
drwxr-xr-x 4 root root 1024 Feb 25 10:09 boot
drwxr-xr-x 14 root root 4520 Feb 25 10:08 dev
drwxr-xr-x 100 root root 12288 Feb 25 10:09 etc

/bin is safe enough because it is hardly ever modified. The same goes for /boot. /dev and /etc, however, contain mutable data. /dev is usually mounted by udev and recreated on every boot. That leaves /etc as one of the first victims of a missing lost+found.

Understanding this behaviour lets a system administrator mitigate the risk. First, separate the system partition so that it holds mostly read-only files, and move all the really active files to another filesystem. This reduces the chance of a system-wide failure due to filesystem corruption. Second, recreate the directory as soon as it is found to be missing. Even on a healthy filesystem this can still be a problem because of the space lost+found has to reserve up front.
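
Finding out that the directory is missing is easy to automate. The sketch below assumes a Linux mount table in the /proc/mounts format and only looks at ext2, ext3 and ext4 filesystems; the function name missing_lost_found is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every ext2/ext3/ext4 mount point that has no
# lost+found directory. The mount table file can be overridden for testing.
# Note: mount points containing spaces are escaped in /proc/mounts and are
# not handled by this sketch.
missing_lost_found() {
    mounts="${1:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the filesystem type.
    awk '$3 ~ /^ext[234]$/ { print $2 }' "$mounts" |
    while read -r mp; do
        [ -d "$mp/lost+found" ] || printf '%s\n' "$mp"
    done
}
```

Calling missing_lost_found from a daily cron job and mailing any output would be one way to catch a deleted lost+found long before fsck has to deal with it.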

More about filesystem separation on system partitions in another blog post.