I know that my question has an answer here: QFile seek performance. But I am not completely satisfied with the answer. Even after looking at the following implementation of generic_file_llseek()
for ext4, I can't seem to understand how can the complexity be measured.
/**
* generic_file_llseek - generic llseek implementation for regular files
* @file: file structure to seek on
* @offset: file offset to seek to
* @origin: type of seek
*
* This is a generic implemenation of ->llseek useable for all normal local
* filesystems. It just updates the file offset to the value specified by
* @offset and @origin under i_mutex.
*/
loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
{
loff_t rval;
mutex_lock(&file->f_dentry->d_inode->i_mutex);
rval = generic_file_llseek_unlocked(file, offset, origin);
mutex_unlock(&file->f_dentry->d_inode->i_mutex);
return rval;
}
/**
* generic_file_llseek_unlocked - lockless generic llseek implementation
* @file: file structure to seek on
* @offset: file offset to seek to
* @origin: type of seek
*
* Updates the file offset to the value specified by @offset and @origin.
* Locking must be provided by the caller.
*/
loff_t
generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
{
struct inode *inode = file->f_mapping->host;
switch (origin) {
case SEEK_END:
offset += inode->i_size;
break;
case SEEK_CUR:
/*
* Here we special-case the lseek(fd, 0, SEEK_CUR)
* position-querying operation. Avoid rewriting the "same"
* f_pos value back to the file because a concurrent read(),
* write() or lseek() might have altered it
*/
if (offset == 0)
return file->f_pos;
break;
}
if (offset < 0 || offset > inode->i_sb->s_maxbytes)
return -EINVAL;
/* Special lock needed here? */
if (offset != file->f_pos) {
file->f_pos = offset;
file->f_version = 0;
}
return offset;
}
Say, for example, I have a 4GB file, and I know the offset for the middle portion in the file, how exactly does a lseek()
get me there without traversing the entire file? Does the OS already know where each byte of the file resides?
lseek()
as implemented in ext4
will just increment the file pointer and do some validation checks. It doesn't depend on the file size, meaning it is O(1)
.
Also you can see this in the code, there isn't any loop nor suspicious function calls in there.
However, while this is true on ext4
, it might be not true for other filesystems, as this behaviour isn't guaranteed by POSIX. But it is likely unless the filesystem is meant for a very special purpose.