ext2 filesystem basics

maintained by rudy winnacker (last updated 25 October 2004)

This is a brief review of some of the internals of the ext2 filesystem, with focus on the practical implications of this architecture.

If you've heard that UNIX, unlike Windows, doesn't permit the recovery of deleted files, you may have the impression that there is no way to recover a file in Linux once you've run 'rm' on it and seen it disappear from the output of 'ls'. However, while it is true that Linux does not natively implement a 'Recycle Bin' or other pre-deletion repository for files, it is also true that you can (under the right conditions) recover a removed file in Linux.

The ext2 filesystem is commonly used on Red Hat Linux; when it includes journalling, it is known as an ext3 filesystem.

Inodes record almost all of the metadata for a file: where the data for the file is stored, and properties of the file such as owner and permissions. Indeed, an inode records just about everything interesting about a file except its name. The name is recorded, together with the inode number of the file, in the directory containing that file.

Since there is no restriction on the number of times an inode can be referenced in this way, it is misleading to speak of 'the' name of a file if this implies that a file can have only one 'real' name. Just as a person can have more than one name, an inode can be referenced more than once under different filenames (or 'hard links').
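
As a minimal sketch of this, assuming GNU coreutils (for 'stat -c') and a scratch directory from mktemp, the following shows two names resolving to the same inode. The file names 'first-name' and 'second-name' are made up for illustration.

```shell
# Illustration only: the scratch directory and file names are hypothetical.
dir=$(mktemp -d)
echo "some data" > "$dir/first-name"

# Give the same inode a second name (a hard link).
ln "$dir/first-name" "$dir/second-name"

# %i is the inode number; both names should report the same one.
inode_a=$(stat -c %i "$dir/first-name")
inode_b=$(stat -c %i "$dir/second-name")
echo "first-name:  inode $inode_a"
echo "second-name: inode $inode_b"

# Clean up the scratch directory.
rm -rf "$dir"
```

Neither name is more 'real' than the other; both are simply directory entries that carry the same inode number.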

Here is an easy way to see this at work.

See how much room you have left in /var/tmp/ with "df -h /var/tmp".

Assuming you have at least 1G free, you can then make a dummy file of roughly 1G using this command: "dd if=/dev/zero of=/var/tmp/bigfile bs=8128 count=128000".

That file now has one name, "bigfile", and you should see that the disk space available under /var/tmp/ has gone down by about 1G. Now, give the file another name by making a hard link to it with "ln /var/tmp/bigfile /var/tmp/bigfile-hardlink".

Next, remove the original filename with "rm /var/tmp/bigfile". Rerun your "df -h /var/tmp" command and you will see that the disk space available has not yet returned to its original value.

Finally, remove the file using the last name for it with "rm /var/tmp/bigfile-hardlink". The "df -h /var/tmp" command will now show you that you've gotten your disk space back.
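
The same sequence can be tried at a much smaller scale. The sketch below is an illustration under assumptions: it uses a scratch directory from mktemp rather than /var/tmp, and a tiny file, so it checks that the data stays readable instead of watching 'df'.

```shell
# Illustration only: a tiny stand-in for the 1G file, in a scratch directory.
dir=$(mktemp -d)
echo "file data" > "$dir/bigfile"
ln "$dir/bigfile" "$dir/bigfile-hardlink"

# Remove the original name; the inode still has one name left.
rm "$dir/bigfile"
remaining=$(cat "$dir/bigfile-hardlink")
echo "after first rm, data is still readable: $remaining"

# Remove the last name; only now can the data blocks be released.
rm "$dir/bigfile-hardlink"
left=$(ls -A "$dir" | wc -l)
echo "names left in the directory: $left"

rmdir "$dir"
```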

The fields of an inode include the following (where GNU 'stat' can report a field, the corresponding format sequences for 'stat -c' are given in parentheses):

mode: file type and permissions (%F and %A, %a, %f)
uid: uid of the file owner (%U, %u)
size: size in bytes (%s)
atime: last time the file was accessed (%X %x)
ctime: last time the inode information was changed (%Z %z)
mtime: last time the file content was modified (%Y %y)
dtime: time when this file was deleted
gid: gid of the file (%G %g)
links count: number of (hard) links pointing to the inode (%h)
blocks: number of blocks allocated to the file (default is 512 bytes/block) (%b)
flags: file flags
block: pointers to the data blocks of the file
version: file version (for NFS)
file acl: file access control list
dir acl: directory access control list
faddr: file fragment block address
frag: number of the fragment on the block
fsize: size of the fragment
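
A few of these fields can be pulled out one at a time with 'stat -c'. This is a sketch assuming GNU stat; the scratch directory and the file name 'sample' are made up for illustration.

```shell
# Illustration only: the scratch directory and file name are hypothetical.
dir=$(mktemp -d)
printf 'hello' > "$dir/sample"
chmod 640 "$dir/sample"

perms=$(stat -c %a "$dir/sample")   # mode: permissions in octal
ftype=$(stat -c %F "$dir/sample")   # mode: file type
owner=$(stat -c %u "$dir/sample")   # uid: numeric id of the owner
size=$(stat -c %s "$dir/sample")    # size: size in bytes

echo "perms=$perms type=$ftype uid=$owner size=$size"

rm -rf "$dir"
```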

What happens when a file is created? What happens when a hard link is created or removed?

When a file is created, an inode is allocated for it with a link count of 1, and an entry recording the file name and the inode number of the file is made in the directory.

When a hard link is created to a file, an entry recording the new name and the inode number of the file is made in the directory containing the hard link, and the link count of the inode is increased by 1.

When a hard link is removed, the name of the hard link is removed from the directory and the link count of the inode is decreased by 1.
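
These three steps can be watched from the shell. The following is a sketch assuming GNU stat, where 'stat -c %h' prints the hard link count; the scratch directory and file names are made up for illustration.

```shell
# Illustration only: the scratch directory and file names are hypothetical.
dir=$(mktemp -d)

# Creating a file allocates an inode with a link count of 1.
touch "$dir/testfile"
count_created=$(stat -c %h "$dir/testfile")

# Adding a hard link raises the link count by 1.
ln "$dir/testfile" "$dir/testfile-hardlink"
count_linked=$(stat -c %h "$dir/testfile")

# Removing a hard link lowers the link count by 1.
rm "$dir/testfile-hardlink"
count_unlinked=$(stat -c %h "$dir/testfile")

echo "created: $count_created, linked: $count_linked, unlinked: $count_unlinked"

rm -rf "$dir"
```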

The 'stat' command can be used to report some of the data from an inode of a file using one of its filenames:

mach:/var/tmp# stat /var/tmp/testfile
  File: `/var/tmp/testfile'
  Size: 0               Blocks: 0          IO Block: 4096   Regular File
Device: 3a02h/14850d    Inode: 97601       Links: 1    
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-10-26 18:14:01.000000000 -0700
Modify: 2004-10-26 18:14:01.000000000 -0700
Change: 2004-10-26 18:14:01.000000000 -0700

You can also stat a directory:

mach:/var/tmp/tmp# stat /var/tmp
  File: `/var/tmp'
  Size: 4096            Blocks: 8          IO Block: 4096   Directory
Device: 3a02h/14850d    Inode: 97537       Links: 6    
Access: (1777/drwxrwxrwt)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-10-26 18:06:20.000000000 -0700
Modify: 2004-10-26 15:07:40.000000000 -0700
Change: 2004-10-26 15:07:40.000000000 -0700
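
Running stat on a directory reports the directory's own inode. The names inside it are each paired with an inode number, and for live entries 'ls -i' prints that pairing. A small sketch, assuming GNU stat and a made-up scratch directory and file name:

```shell
# Illustration only: the scratch directory and file name are hypothetical.
dir=$(mktemp -d)
touch "$dir/testfile"

# ls -i prints the inode number in front of each name.
listed=$(ls -i "$dir/testfile" | awk '{print $1}')

# stat reads the same number from the inode itself.
from_stat=$(stat -c %i "$dir/testfile")

echo "ls -i says $listed, stat says $from_stat"

# -a also shows the '.' and '..' entries that every directory holds.
ls -ia "$dir"

rm -rf "$dir"
```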

You already know that ls will show you the contents of a directory, but stat will not: the files in a directory are recorded in the data blocks of the directory, not in its inode metadata, and stat only reports some of the metadata for an inode. However, you can use debugfs to see both the deleted and the undeleted contents of a directory:

mach:/var/tmp/tmp# touch deletedfile
mach:/var/tmp/tmp# ls -l deletedfile 
-rw-r--r--    1 root     root            0 Oct 26 18:16 deletedfile
mach:/var/tmp/tmp# rm deletedfile 
rm: remove regular empty file `deletedfile'? y
mach:/var/tmp/tmp# !deb
debugfs /dev/sysvg/var
debugfs 1.32 (09-Nov-2002)
debugfs:  ls -ld /tmp/tmp
  97538   41777 (2)      0      0    4096 26-Oct-2004 18:16 .
  97537   41777 (2)      0      0    4096 26-Oct-2004 18:14 ..
<     0>      0 (1)      0      0       0                   .rnd
 601475   41777 (2)     43     43    4096 19-Jul-2004 21:32 .font-unix
 211337   40700 (2)   4793   5001    4096 20-Oct-2004 14:59 ssh-CmI13108
  97539  100600 (1)   4793   5001     567 26-Oct-2004 18:06 krb5cc_4793
<     0>      0 (1)      0      0       0                   RsBeV65A
 162569   40700 (2)   4793   5001    4096 26-Aug-2004 14:14 ssh-VIJs8576
 227594   40700 (2)   4793   5001    4096 26-Oct-2004 18:06 ssh-kHc11280
<     0>      0 (1)      0      0       0                   deletedfile
<     0>      0 (1)      0      0       0                   rpm6478.gDCmaO

Debugfs is a very powerful general-purpose filesystem debugger. Among the interesting things it allows one to do is create a hard link without increasing the link count for the inode. So, if you delete one of the hard links after doing this, you can end up with a file with a link count of zero:

mach:/var/tmp/tmp# echo file data > tempfile
mach:/var/tmp/tmp# cat tempfile
file data

mach:/var/tmp/tmp# stat tempfile
  File: `tempfile'
  Size: 10              Blocks: 8          IO Block: 4096   Regular File
Device: 3a02h/14850d    Inode: 276364      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-10-27 14:43:11.000000000 -0700
Modify: 2004-10-27 14:43:09.000000000 -0700
Change: 2004-10-27 14:43:09.000000000 -0700

mach:/var/tmp/tmp# debugfs -w /dev/sysvg/var
debugfs 1.32 (09-Nov-2002)
debugfs:  ln /tmp/tmp/tempfile /tmp/tmp/templink
debugfs:  quit

mach:/var/tmp/tmp# ls -l
total 8
-rw-r--r--    1 root     root           10 Oct 27 14:43 tempfile
-rw-r--r--    1 root     root           10 Oct 27 14:43 templink

mach:/var/tmp/tmp# rm tempfile
rm: remove regular file `tempfile'? y

mach:/var/tmp/tmp# stat templink
  File: `templink'
  Size: 10              Blocks: 8          IO Block: 4096   Regular File
Device: 3a02h/14850d    Inode: 276364      Links: 0
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2004-10-27 14:43:11.000000000 -0700
Modify: 2004-10-27 14:43:09.000000000 -0700
Change: 2004-10-27 14:48:17.000000000 -0700

mach:/var/tmp/tmp# cat templink
file data

E2fsck is a filesystem checker. It looks for things like inodes that appear to have no hard links (hard link count of zero):

spe57:/var/tmp/tmp# e2fsck -n -f /dev/sysvg/var
e2fsck 1.32 (09-Nov-2002)
Warning!  /dev/sysvg/var is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes

Inodes that were part of a corrupted orphan linked list found.  Fix? no

Inode 276364 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure

Entry 'templink' in /tmp/tmp (276363) has deleted/unused inode 276364.  Clear? no

...

WARNING: Don't try this on a partition with data you want to keep! Debugfs also provides a way of working around the restriction against hard linking to directories using ln. However, keep in mind that if you use rm -rf on a hard link to a directory, it will remove all of the contents of that directory as well, even if there is more than one hard link to that directory. Also, using rm on a second hard link to a directory has, in my experience, almost always created I/O errors, ranging from mild to very bad, when trying to ls the remaining link. Again, it is really easy to break your filesystem this way. I would be interested to know why anyone would want to hard link to a directory in the first place, so this behavior is probably not something to try to work around:

kant:/var/tmp/tmp2$ ln tempdir tempdir-hl
ln: `tempdir': hard link not allowed for directory