I am learning the EXT2 file system and I am confused about how file removal works in EXT2. My understanding is that on deletion it doesn't actually delete the inode; instead, it marks some metadata as unused. My question is: what metadata does it modify on deletion, and how does the file system know that the file is deleted? Thanks.
In Linux this is implemented in the ext2_delete_inode() function in fs/ext2/inode.c (quoted here from kernel 2.6.32):
http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L59
    /*
     * Called at the last iput() if i_nlink is zero.
     */
    void ext2_delete_inode (struct inode * inode)
    {
        truncate_inode_pages(&inode->i_data, 0);
        ...
        EXT2_I(inode)->i_dtime = get_seconds();
        mark_inode_dirty(inode);
        ext2_write_inode(inode, inode_needs_sync(inode));

        inode->i_size = 0;
        if (inode->i_blocks)
            ext2_truncate (inode);
        ext2_free_inode (inode);

        return;
        ...
    }
So it drops the file's pages from the page cache (truncate_inode_pages()), sets i_dtime (the deletion time) and marks the inode dirty. mark_inode_dirty() sets I_DIRTY, which is the combination (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES); the flags are documented in include/linux/fs.h:
     * I_DIRTY_SYNC       Inode is dirty, but doesn't have to be written on
     *                    fdatasync(). i_atime is the usual cause.
     * I_DIRTY_DATASYNC   Data-related inode changes pending. We keep track of
     *                    these changes separately from I_DIRTY_SYNC so that we
     *                    don't have to write inode on fdatasync() when only
     *                    mtime has changed in it.
     * I_DIRTY_PAGES      Inode has dirty pages. Inode itself may be clean.
Then it writes the modified inode, sets its size to zero and, if the inode has any blocks, frees all blocks referenced by it with ext2_truncate(). This is where the data blocks are actually marked as free (their bits are cleared in the block bitmap): http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L1025
    void ext2_truncate(struct inode *inode)
    {
        ...
        n = ext2_block_to_path(inode, iblock, offsets, NULL);

ext2_block_to_path() is documented like this (around line 99 of the same file):

    /* ext2_block_to_path - parse the block number into array of offsets
     ...
     * To store the locations of file's data ext2 uses a data structure common
     * for UNIX filesystems - tree of pointers anchored in the inode, with
     * data blocks at leaves and indirect blocks in intermediate nodes.
     * This function translates the block number into path in that tree -
     * return value is the path length and @offsets[n] is the offset of
     * pointer to (n+1)th node in the nth one. If @block is out of range
     * (negative or too large) warning is printed and zero returned. */

Back in ext2_truncate():

        ...
        if (n == 1) {
            ext2_free_data(inode, i_data+offsets[0],
                           i_data + EXT2_NDIR_BLOCKS);
            goto do_indirects;
        }
        ...
            ext2_free_branches(inode, &nr, &nr+1, (chain+n-1) - partial);
        ...
        /* Clear the ends of indirect blocks on the shared branch */
        while (partial > chain) {
            ext2_free_branches(inode,
                               partial->p + 1,
                               (__le32*)partial->bh->b_data+addr_per_block,
                               (chain+n-1) - partial);
        ...
    do_indirects:
        /* Kill the remaining (whole) subtrees */
        switch (offsets[0]) {
        default:
            nr = i_data[EXT2_IND_BLOCK];
            if (nr) {
                i_data[EXT2_IND_BLOCK] = 0;
                mark_inode_dirty(inode);
                ext2_free_branches(inode, &nr, &nr+1, 1);
            }
        case EXT2_IND_BLOCK:
            nr = i_data[EXT2_DIND_BLOCK];
            if (nr) {
                i_data[EXT2_DIND_BLOCK] = 0;
                mark_inode_dirty(inode);
                ext2_free_branches(inode, &nr, &nr+1, 2);
            }
        case EXT2_DIND_BLOCK:
            nr = i_data[EXT2_TIND_BLOCK];
            if (nr) {
                i_data[EXT2_TIND_BLOCK] = 0;
                mark_inode_dirty(inode);
                ext2_free_branches(inode, &nr, &nr+1, 3);
            }
        case EXT2_TIND_BLOCK:
            ;
        }
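(Why is there no code under case EXT2_TIND_BLOCK? The switch relies on fall-through: each case frees one indirect tree and then falls into the next case, so the triple-indirect tree is freed in the case EXT2_DIND_BLOCK branch. When a whole file is deleted, i_size is 0, so the first block to free (iblock) is 0; ext2_block_to_path() then returns n == 1 with offsets[0] == 0, and the default branch frees the single-, double- and triple-indirect trees in turn. offsets[0] == EXT2_TIND_BLOCK only happens when the truncation point lies inside the triple-indirect tree, whose tail has already been freed by the partial-branch code above, so no whole subtree is left to kill.)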
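Finally, ext2_free_inode() releases the in-memory inode and frees the inode number itself: it clears the inode's bit in its block group's inode bitmap and updates the free-inode counters, so the number can be reused. The on-disk inode is not zeroed out (it keeps its dtime, for example).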
How does the file system know that the file is deleted?
The check is in the ext2_iget() function: http://lxr.free-electrons.com/source/fs/ext2/inode.c?v=2.6.32#L1251
    /* We now have enough fields to check if the inode was active or not.
     * This is needed because nfsd might try to access dead inodes
     * the test is that same one that e2fsck uses
     * NeilBrown 1999oct15
     */
    if (inode->i_nlink == 0 && (inode->i_mode == 0 || ei->i_dtime)) {
        /* this inode is deleted */
        brelse (bh);
        ret = -ESTALE;
        goto bad_inode;
    }
So a deleted inode is one that has no incoming links (i_nlink == 0, i.e. no directory entry refers to it) and has either a zero mode or a non-zero deletion time (i_dtime).