(All of the code snippets are taken from: https://docs.huihoo.com/doxygen/linux/kernel/3.7/dir_97b3d2b63ac216821c2d7a22ee0ab2b0.html)
Hi! I have been reading the Linux fs code for almost a month now for research, and I am stuck. I am looking at include/linux/fs.h (which, if I am not wrong, contains the definitions of almost all the major structures and function pointers used by code such as read_write.c and open.c), and I see this code snippet:
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **);
    long (*fallocate)(struct file *file, int mode, loff_t offset,
                      loff_t len);
};
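To make the idea of this table concrete, here is a userspace model (not kernel code) of how a filesystem might fill in a trimmed-down struct file_operations. The demofs_* names and the kloff_t type are hypothetical stand-ins; the point is only that a filesystem assigns its own handlers with designated initializers and leaves operations it does not support as NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/* Stand-in for the kernel's loff_t (a signed 64-bit file offset). */
typedef long long kloff_t;

/* Opaque here: the handlers below only receive a pointer to it. */
struct file;

/* Trimmed userspace model of struct file_operations (two members only;
 * the kernel's __user annotation on buf is omitted). */
struct file_operations {
    ssize_t (*read)(struct file *, char *, size_t, kloff_t *);
    ssize_t (*write)(struct file *, const char *, size_t, kloff_t *);
};

/* Hypothetical read-only filesystem whose every file contains "hello\n". */
static const char demofs_data[] = "hello\n";

static ssize_t demofs_read(struct file *file, char *buf, size_t count,
                           kloff_t *pos)
{
    size_t len = sizeof(demofs_data) - 1;   /* exclude the trailing NUL */

    (void)file;                             /* unused in this model */
    if (*pos < 0)
        return -EINVAL;
    if (*pos >= (kloff_t)len)
        return 0;                           /* at or past end of file */
    if (count > len - (size_t)*pos)
        count = len - (size_t)*pos;         /* short read near EOF */
    memcpy(buf, demofs_data + *pos, count);
    *pos += (kloff_t)count;                 /* advance the file position */
    return (ssize_t)count;
}

/* The filesystem's operations table: unsupported operations stay NULL. */
const struct file_operations demofs_fops = {
    .read = demofs_read,
    /* no .write: this model filesystem does not support writing */
};
```

Calling demofs_fops.read(NULL, buf, 16, &pos) with pos == 0 copies the six bytes and leaves pos at 6; a second call returns 0 (end of file), and a caller can tell writes are unsupported because demofs_fops.write is NULL.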
As you can see, this struct declares entries that correspond to very specific syscalls, which are defined in their respective files. For example, read_write.c defines the read and write syscalls as SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) and SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count), respectively. For my research, I went inside these two definitions and hunted down every function call made inside each of them (at least those linked in the Doxygen documentation), and then the calls inside those calls, but I still could not answer a very simple question: how do these two syscalls call into the virtual filesystem, which in turn calls the drivers that read the actual blocks of data from the filesystem? (If this is filesystem-specific, please point me to the places in the code where it is handed off to the filesystem drivers.)
P.S. I did the same hunt for the open syscall, and there I was able to find where it invokes code in namei.c to perform that task, specifically struct file *do_filp_open(int dfd, struct filename *pathname, const struct open_flags *op, int flags). There, the nameidata structure holds the relevant information from the inode needed to open the file.
In Linux, in-kernel filesystems are implemented in a modular fashion. For example, each struct inode contains a pointer to a struct file_operations (its i_fop member), the same struct you copied in your question. This struct contains function pointers for the various file operations.
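The inode's table reaches the open file at open time: in recent kernels, do_dentry_open() in fs/open.c sets the file's f_op from inode->i_fop and then calls the table's open hook if there is one. The sketch below is a userspace model of just that hand-off; model_open(), demofs_open() and the trimmed structs are hypothetical simplifications, not the kernel's definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

typedef long long kloff_t;  /* stand-in for the kernel's loff_t */

struct inode;
struct file;

/* Trimmed model of struct file_operations. */
struct file_operations {
    int     (*open)(struct inode *, struct file *);
    ssize_t (*read)(struct file *, char *, size_t, kloff_t *);
};

/* Trimmed models: only the fields this sketch needs. */
struct inode {
    const struct file_operations *i_fop;  /* set by the filesystem */
};

struct file {
    struct inode *f_inode;
    const struct file_operations *f_op;   /* copied from i_fop at open */
    kloff_t f_pos;
};

/* Hypothetical simplification of the open-time hand-off. */
int model_open(struct inode *inode, struct file *file)
{
    file->f_inode = inode;
    file->f_pos = 0;
    file->f_op = inode->i_fop;  /* the kernel uses fops_get() here */
    if (file->f_op == NULL)
        return -ENODEV;         /* no operations table: cannot open */
    if (file->f_op->open)       /* optional per-filesystem open hook */
        return file->f_op->open(inode, file);
    return 0;
}

/* Hypothetical filesystem whose open hook just counts calls. */
int demofs_open_calls;

static int demofs_open(struct inode *inode, struct file *file)
{
    (void)inode;
    (void)file;
    demofs_open_calls++;
    return 0;
}

const struct file_operations demofs_fops = { .open = demofs_open };
```

After model_open(&ino, &f) with ino.i_fop == &demofs_fops, f.f_op points at the same table, so every later operation on f is dispatched to the filesystem that owns the inode.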
For example, the member ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); of struct file_operations is a pointer to a function that takes a struct file *, a char __user * (a pointer into user-space memory), a size_t, and a loff_t * as parameters, and returns a ssize_t.
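In plain C terms, such a member can hold any function with a matching signature, and the caller invokes it through the pointer without knowing which filesystem supplied it. A minimal userspace sketch of that idea follows; read_fn, fill_read() and call_through() are hypothetical names, and char * and kloff_t stand in for char __user * and loff_t.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

typedef long long kloff_t;  /* stand-in for the kernel's loff_t */
struct file;                /* incomplete type: only pointers are used */

/* Same shape as the read member: a pointer to a function taking
 * (struct file *, char *, size_t, kloff_t *) and returning ssize_t. */
typedef ssize_t (*read_fn)(struct file *, char *, size_t, kloff_t *);

/* Hypothetical handler: "reads" by filling buf with 'x' characters. */
ssize_t fill_read(struct file *file, char *buf, size_t count, kloff_t *pos)
{
    (void)file;
    for (size_t i = 0; i < count; i++)
        buf[i] = 'x';
    *pos += (kloff_t)count;  /* advance the caller's position */
    return (ssize_t)count;
}

/* Calls whatever handler it is given, much as VFS code calls f_op->read. */
ssize_t call_through(read_fn fn, char *buf, size_t count, kloff_t *pos)
{
    return fn(NULL, buf, count, pos);
}
```

call_through(fill_read, buf, 4, &pos) returns 4 and advances pos by 4; swapping in a different function with the same signature changes the behavior without touching call_through().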
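When the read system call occurs, the kernel VFS code finds the struct file that corresponds to the file descriptor (and, through it, the inode), and then calls the filesystem's read function specified in its struct file_operations. Here's a trace of the read system call (from a newer kernel than the 3.7 sources linked in the question; in 3.7 the syscall calls vfs_read() directly, and the fallback is do_sync_read(), which uses ->aio_read instead of ->read_iter):

1. read() is called from user space.
2. The syscall handler is invoked: ksys_read().
3. vfs_read().

This is where the magic happens, in vfs_read():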
if (file->f_op->read)
    ret = file->f_op->read(file, buf, count, pos);
else if (file->f_op->read_iter)
    ret = new_sync_read(file, buf, count, pos);
else
    ret = -EINVAL;
A related struct, struct file, also contains a pointer to a struct file_operations (its f_op member), which is initialized from the inode's i_fop when the file is opened. The if-condition above checks whether there is a read() handler for this file and calls it if it exists. If there is no read() handler, it falls back to the read_iter handler via new_sync_read(). If neither exists, it returns -EINVAL.
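The three-way choice can be modeled in userspace to see both paths run. This is a simplified sketch, not the kernel's code: model_vfs_read(), model_sync_read() (standing in for new_sync_read()), the demo handlers, and the cut-down kiocb_m and iov_iter_m structs are hypothetical, and the real struct kiocb and struct iov_iter carry much more state.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

typedef long long kloff_t;  /* stand-in for the kernel's loff_t */

struct file;

/* Simplified stand-ins for struct kiocb and struct iov_iter. */
struct kiocb_m    { struct file *ki_filp; kloff_t ki_pos; };
struct iov_iter_m { char *buf; size_t count; };

/* Trimmed model of struct file_operations: only the two read entry points. */
struct file_operations {
    ssize_t (*read)(struct file *, char *, size_t, kloff_t *);
    ssize_t (*read_iter)(struct kiocb_m *, struct iov_iter_m *);
};

struct file {
    const struct file_operations *f_op;
};

/* Adapter in the spirit of new_sync_read(): wraps the plain buffer and
 * position into the iterator-style arguments read_iter expects. */
static ssize_t model_sync_read(struct file *file, char *buf, size_t count,
                               kloff_t *pos)
{
    struct kiocb_m kiocb = { .ki_filp = file, .ki_pos = *pos };
    struct iov_iter_m iter = { .buf = buf, .count = count };
    ssize_t ret = file->f_op->read_iter(&kiocb, &iter);

    *pos = kiocb.ki_pos;  /* the handler advanced the position */
    return ret;
}

/* Same three-way choice as the vfs_read() excerpt above. */
ssize_t model_vfs_read(struct file *file, char *buf, size_t count, kloff_t *pos)
{
    if (file->f_op->read)
        return file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        return model_sync_read(file, buf, count, pos);
    else
        return -EINVAL;
}

/* Hypothetical handlers used to exercise both paths. */
static ssize_t plain_read(struct file *f, char *buf, size_t n, kloff_t *pos)
{
    (void)f;
    memset(buf, 'r', n);
    *pos += (kloff_t)n;
    return (ssize_t)n;
}

static ssize_t iter_read(struct kiocb_m *iocb, struct iov_iter_m *to)
{
    memset(to->buf, 'i', to->count);
    iocb->ki_pos += (kloff_t)to->count;
    return (ssize_t)to->count;
}

const struct file_operations with_read = { .read = plain_read };
const struct file_operations with_iter = { .read_iter = iter_read };
const struct file_operations no_ops = { 0 };
```

A filesystem that fills in only read_iter, such as ext4 in the example that follows, takes the second branch; a table with neither entry makes the call fail with -EINVAL and leaves the position unchanged.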
In ext4, the struct file_operations is defined here. It is used in several places, but it is associated with an inode here. ext4 defines a read_iter handler (i.e. ext4_file_read_iter) but not a read handler. So, when read(2) is called on an ext4 file, ext4_file_read_iter() is eventually called.
At this point, we've gotten to filesystem specific code. How ext4 manages blocks can be explored further from here.