
Char pointers in Rust-C interop


Recently I was working on a very simple Rust-C interop project. I have a function written in C; it is compiled into a dynamic library and linked with my Rust code.

This is the C code:

//print_file.c
#include<stdio.h>

void print_file(const char *file_contents, size_t file_size) {
    for (size_t i = 0; i < file_size; i++)
        printf("%c", *(file_contents + i));
}

and this is the Rust code:

//main.rs
use std::{fs::OpenOptions, io::Read};
use libc::size_t;

#[link(name = "printer")]
extern "C" {
    fn print_file(
        file_contents: *const u8,
        file_size: size_t
    );
}

fn main() {
    let mut content_buf: String = String::new();

    let mut file = OpenOptions::new().read(true).open("content.txt").unwrap();
    file.read_to_string(&mut content_buf).unwrap();

    let file_size: size_t = content_buf.len();

    let content_bytes = content_buf.as_bytes()[0] as *const u8;
    unsafe {
        print_file(content_bytes, file_size);
    }
}

The Rust code reads the text "As usual." from the text file "content.txt" and prints it through the C function.

Everything builds fine and the linking goes smoothly. When I run the Rust program however, I get:

Segmentation fault (core dumped)

Looking at this, I tried a few things. First I changed my C code to this (the Rust code is still the same as above):

//print_file.c
#include<stdio.h>

void print_file(const char *file_contents, size_t file_size) {
    for (size_t i = 0; i < file_size; i++)
        printf("%c", (file_contents + i)); //This line is changed
}

And this does not crash. The weird thing is that it prints the first character of the file correctly, but the rest of the output consists of the characters that follow that first letter in the ASCII table. So I get "ABCDEFGHIJ" in the terminal. If I change the first letter in content.txt, the console output changes accordingly, but the remaining characters are still the ones that follow it in ASCII. (Note: the output has the same length as the file contents.)

What I don't understand here is: how could that C function access the first character of the file without using the indirection/dereference operator, '*'?
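
A plausible reading of both symptoms, sketched in Rust: `content_buf.as_bytes()[0] as *const u8` does not take the address of the first byte. It takes the byte's value (65 for 'A') and turns that number into a pointer. Dereferencing address 65 in C would then fault, while printing `file_contents + i` with `%c` (formally undefined, but in practice the low byte of the address tends to get printed) would give 65, 66, 67, and so on. The helper names below are made up for illustration, and the sample text assumes content.txt ends with a newline, which would account for the ten output characters.

```rust
/// Address obtained by casting the first byte's *value* to a pointer,
/// which is what `as_bytes()[0] as *const u8` does in the question.
fn address_from_value_cast(bytes: &[u8]) -> usize {
    let p = bytes[0] as *const u8; // integer-to-pointer cast, not an address-of
    p as usize
}

/// Simulates C's `printf("%c", file_contents + i)`, assuming only the low
/// 8 bits of each (bogus) address end up being printed.
fn chars_from_bogus_pointer(bytes: &[u8]) -> String {
    let base = bytes[0] as *const u8;
    (0..bytes.len())
        .map(|i| (base.wrapping_add(i) as usize as u8) as char)
        .collect()
}

fn main() {
    // Assumed file contents: the text plus a trailing newline (10 bytes).
    let content = String::from("As usual.\n");
    let bytes = content.as_bytes();

    println!("address from value cast: {:#x}", address_from_value_cast(bytes));
    println!("real buffer address:     {:p}", bytes.as_ptr());
    println!("simulated %c output:     {}", chars_from_bogus_pointer(bytes));
}
```

Under this reading, passing `content_buf.as_ptr()` instead of the cast would hand C the real buffer address.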

After this, I made more modifications. I changed my C program back to the original:

//print_file.c
#include<stdio.h>

void print_file(const char *file_contents, size_t file_size) {
    for (size_t i = 0; i < file_size; i++)
        printf("%c", *(file_contents + i)); //Back to the old file...
}

And I changed my Rust code to this:

//main.rs
use std::{fs::OpenOptions, io::Read};
use libc::size_t;

#[link(name = "printer")]
extern "C" {
    fn print_file(
        file_contents: *const [u8], //Changed this...
        file_size: size_t
    );
}

fn main() {
    let mut content_buf: String = String::new();

    let mut file = OpenOptions::new().read(true).open("content.txt").unwrap();
    file.read_to_string(&mut content_buf).unwrap();

    let file_size: size_t = content_buf.len();

    let content_bytes = content_buf.as_bytes() as *const [u8]; //Changed this too...
    unsafe {
        print_file(content_bytes, file_size);
    }
}

And now I get this Rust compiler warning:

warning: `extern` block uses type `[u8]`, which is not FFI-safe  
 --> src/main.rs:7:24  
  |  
7 |         file_contents: *const [u8],  
  |                        ^^^^^^^^^^^ not FFI-safe  
  |  
  = help: consider using a raw pointer instead  
  = note: slices have no C equivalent  
  = note: `#[warn(improper_ctypes)]` on by default  

The code compiles, and Voila! I get "As usual." as my terminal output!

I still can't wrap my head around these weird things. So what is the difference between *const u8 and *const [u8] under the hood, as seen from C?
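
As a rough illustration of that difference: `*const u8` is a single address, like C's `const char *`, whereas `*const [u8]` is a "fat" pointer that carries an address plus a length, which is why the compiler says it has no C equivalent. The sketch below (helper names are hypothetical) only measures the two types and checks that the data half of the slice pointer is the same address `as_ptr()` returns. One plausible reason the `*const [u8]` version printed correctly is that the two halves happened to be passed as two separate arguments, so C saw the data address followed by the slice length. That would be an accident of this compiler and platform, not something the language guarantees.

```rust
use std::mem::size_of;

/// Sizes in bytes of a thin pointer and of a slice ("fat") pointer.
fn thin_and_fat_sizes() -> (usize, usize) {
    (size_of::<*const u8>(), size_of::<*const [u8]>())
}

/// True if the data half of the slice pointer equals what `as_ptr()` returns.
fn data_half_matches(text: &str) -> bool {
    let slice: &[u8] = text.as_bytes();
    let fat: *const [u8] = slice; // address + length
    let thin: *const u8 = slice.as_ptr(); // address only
    (fat as *const u8) == thin
}

fn main() {
    let (thin, fat) = thin_and_fat_sizes();
    println!("*const u8   occupies {thin} bytes");
    println!("*const [u8] occupies {fat} bytes");
    println!("same data address: {}", data_half_matches("As usual."));
}
```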

Why didn't pointer arithmetic work on *const u8? Why did it give a SEGV?

I thought it could be because of const, but no. Even changing all the variables to mut gives the same results.

I am using Ubuntu. My Rust compiler version is 1.83 (stable), my gcc version is 11.4.

And please give me info based on u8 or i8 and char in C. I don't want info on wchar or other types.

I just want some information about these weird behaviours. The source code is on GitHub, by the way.


Solution

  • Aside from the pointer issues, which appear to have been addressed already, here is a trivial read-and-display example. It uses CString from std::ffi (to maximise compatibility between Rust and C when exchanging strings) and the env module to collect the command-line arguments.

    cat main.rs 
    use std::env;
    use std::fs;
    use std::ffi::{c_char, CString};
    
    extern "C" {
        // c_char matches C's `char` on every target (it is i8 on x86-64 Linux).
        fn print_file(file_contents: *const c_char, size: usize);
    }
    
    fn main() {
        let args: Vec<String> = env::args().collect();
    
        if args.len() != 2
        {
            eprintln!("usage: {} FILE-TO-DISPLAY", &args[0]);
            return;
        }
    
        let file_contents = match fs::read_to_string(&args[1]) 
        {
            Ok(contents) => contents,
            Err(err) => {
                eprintln!("Failed to read {}, REASON:{}", &args[1], err);
                return;
            }
        }; 
    
        // CString::new fails if the contents contain an interior NUL byte.
        let c_string = match CString::new(file_contents)
        {
            Ok(c_string) => c_string,
            Err(err) => {
                eprintln!("unexpected CString error {}", err);
                return;
            }
        };
    
        unsafe {
            print_file(c_string.as_ptr(), c_string.to_bytes().len());
        }
    }
    
    
    #
    # test with
    #
    ./main someinputfile
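
Since `print_file` receives an explicit length, a CString is not strictly required. A minimal alternative sketch, which also tolerates interior NUL bytes, passes the String's own buffer with `as_ptr()` and `len()`. So that it runs without the C library, the `print_file` below is a hypothetical Rust stand-in with the same signature as the C function, and `bytes_seen_by_callee` is a made-up helper; in the real project the `extern "C"` declaration would be used instead.

```rust
use std::ffi::c_char;
use std::slice;

/// Hypothetical helper: copies out the bytes a callee would read through
/// the pointer/length pair.
unsafe fn bytes_seen_by_callee(file_contents: *const c_char, file_size: usize) -> Vec<u8> {
    // SAFETY: the caller guarantees `file_contents` points to `file_size` readable bytes.
    unsafe { slice::from_raw_parts(file_contents as *const u8, file_size).to_vec() }
}

/// Stand-in for the C `print_file`, written in Rust only to keep this
/// sketch self-contained. It is not the real library function.
unsafe extern "C" fn print_file(file_contents: *const c_char, file_size: usize) {
    // SAFETY: same contract as `bytes_seen_by_callee`.
    let bytes = unsafe { bytes_seen_by_callee(file_contents, file_size) };
    print!("{}", String::from_utf8_lossy(&bytes));
}

fn main() {
    // Assumed file contents; the original reads them from a file.
    let content_buf = String::from("As usual.\n");

    // as_ptr() yields the address of the first byte, not the byte's value.
    let ptr = content_buf.as_ptr() as *const c_char;
    let len = content_buf.len();

    // content_buf stays alive for the whole call, so the pointer is valid.
    unsafe { print_file(ptr, len) };
}
```

The buffer is not NUL-terminated here, so this only suits C functions that, like `print_file`, rely on the length argument.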