
Is there a command to write random garbage bytes into a file?


I am currently testing my application against corrupted files, but I've found it hard to find suitable test files.

So I'm wondering whether there are existing tools that can write random/garbage bytes into a file of some format.

Basically, I need this tool to:

  1. It writes random garbage bytes into the file.
  2. It does not need to know the format of the file; just writing random bytes is OK for me.
  3. It is best to write at random positions of the target file.
  4. Batch processing is also a bonus.

Thanks.


Solution

  • The /dev/urandom pseudo-device, along with dd, can do this for you:

    dd if=/dev/urandom of=newfile bs=1M count=10
    

    This will create a file newfile of 10 MiB (10 blocks of 1 MiB each).

    The /dev/random device can block if there is not sufficient randomness built up (this was common on older Linux kernels), whereas urandom does not block. Traditionally /dev/random was preferred for crypto-grade uses; for anything else, such as generating test data, urandom is sufficient and most likely faster.

    If you want to corrupt just parts of your file (not the whole file), you can simply use the C-style random functions. Use rand() to pick an offset and a length n, then call it n more times to get the random bytes to overwrite that part of the file with.
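    If you'd rather stay in the shell, the same idea (pick a random offset and length, then overwrite that run with random bytes) can be sketched with dd itself. This is a minimal sketch assuming GNU coreutils (`stat -c`, `shuf`); `corrupt_once` is a hypothetical helper name, not an existing tool. The important details are `conv=notrunc`, which stops dd from truncating the file after the overwritten run, and `bs=1`, so that `seek` and `count` are measured in bytes.

    ```shell
    # corrupt_once FILE MIN MAX  (hypothetical helper, assumes GNU coreutils)
    # Overwrites one randomly placed run of MIN..MAX bytes in FILE with
    # bytes from /dev/urandom, keeping the file's size unchanged.
    corrupt_once() {
        file=$1 min=$2 max=$3
        size=$(stat -c %s "$file")
        [ "$size" -gt 0 ] || return 0              # nothing to corrupt in an empty file
        count=$(shuf -i "$min-$max" -n 1)          # random corruption length n
        [ "$count" -gt "$size" ] && count=$size    # never longer than the file
        pos=$(shuf -i "0-$((size - count))" -n 1)  # random offset so the run fits
        # bs=1: seek/count are in bytes; conv=notrunc: keep the rest of the file.
        dd if=/dev/urandom of="$file" bs=1 seek="$pos" count="$count" conv=notrunc 2>/dev/null
        echo "'$file': $size bytes, corrupted $pos through $((pos + count - 1))"
    }
    ```

    For example, `corrupt_once testfile 8 16` overwrites between 8 and 16 bytes somewhere in testfile. Unlike the Perl version, the replacement bytes are arbitrary binary values rather than characters from a chosen set.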


    The following Perl script shows how this can be done (without having to worry about compiling C code):

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET);
    
    # corrupt(FILE, MIN, MAX, CHARSET): overwrite a randomly placed run of
    # MIN to MAX bytes in FILE with characters drawn from CHARSET.
    sub corrupt {
        # Get parameters, names should be self-explanatory.
    
        my ($filespec, $mincount, $maxcount, $charset) = @_;
    
        # Work out position and size of corruption.
    
        my @fstat = stat ($filespec) or die "Can't stat $filespec: $!";
        my $size = $fstat[7];
        my $count = $mincount + int (rand ($maxcount + 1 - $mincount));
        my $pos = 0;
        if ($count >= $size) {
            $count = $size;
        } else {
            # "+ 1" lets the corrupted run also end on the file's last byte.
            $pos = int (rand ($size - $count + 1));
        }
    
        # Output for debugging purposes.
    
        my $last = $pos + $count - 1;
        print "'$filespec', $size bytes, corrupting $pos through $last\n";
    
        # Open file (read/write, binary), seek to position, corrupt and close.
    
        open (my $fh, '+<', $filespec) or die "Can't open $filespec: $!";
        binmode ($fh);
        seek ($fh, $pos, SEEK_SET) or die "Can't seek in $filespec: $!";
        while ($count-- > 0) {
            # Index must be in 0 .. length-1; a "+ 1" here could pick past the end.
            my $newval = substr ($charset, int (rand (length ($charset))), 1);
            print $fh $newval;
        }
        close ($fh) or die "Can't close $filespec: $!";
    }
    
    # Test harness.
    
    system ("echo =========="); #DEBUG
    system ("cp base-testfile testfile"); #DEBUG
    system ("cat testfile"); #DEBUG
    system ("echo =========="); #DEBUG
    
    corrupt ("testfile", 8, 16, "ABCDEFGHIJKLMNOPQRSTUVWXYZ   ");
    
    system ("echo =========="); #DEBUG
    system ("cat testfile"); #DEBUG
    system ("echo =========="); #DEBUG
    

    It consists of the corrupt function, which you call with a file name, the minimum and maximum corruption size, and a character set to draw the corruption from. The bit at the bottom is just a simple test harness. Below is some sample output where you can see that a section of the file has been corrupted:

    ==========
    this is a file with nothing in it except for lowercase
    letters (and spaces and punctuation and newlines).
    that will make it easy to detect corruptions from the
    test program since the character range there is from
    uppercase a through z.
    i have to make it big enough so that the random stuff
    will work nicely, which is why i am waffling on a bit.
    ==========
    'testfile', 344 bytes, corrupting 122 through 135
    ==========
    this is a file with nothing in it except for lowercase
    letters (and spaces and punctuation and newlines).
    that will make iFHCGZF VJ GZDYct corruptions from the
    test program since the character range there is from
    uppercase a through z.
    i have to make it big enough so that the random stuff
    will work nicely, which is why i am waffling on a bit.
    ==========
    

    It's tested at a basic level, but you may find there are edge cases that need to be taken care of. Do with it what you will.
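    For the batch-processing bonus in the question, a similar shell sketch can produce several corrupted copies of each input file, each with a number of single-byte overwrites at random offsets. This again assumes GNU coreutils; `make_corrupt_copies` and the `.corruptN` file naming are hypothetical choices for illustration.

    ```shell
    # make_corrupt_copies COPIES HITS FILE...  (hypothetical helper, GNU coreutils)
    # For each FILE, writes FILE.corrupt1 .. FILE.corruptCOPIES, each with up to
    # HITS distinct random byte offsets overwritten from /dev/urandom.
    # The original files are left untouched.
    make_corrupt_copies() {
        copies=$1 hits=$2
        shift 2
        for src in "$@"; do
            size=$(stat -c %s "$src")
            i=1
            while [ "$i" -le "$copies" ]; do
                out="$src.corrupt$i"
                cp -- "$src" "$out"
                if [ "$size" -gt 0 ]; then
                    # shuf -n prints at most HITS distinct offsets in 0..size-1.
                    for pos in $(shuf -i "0-$((size - 1))" -n "$hits"); do
                        dd if=/dev/urandom of="$out" bs=1 seek="$pos" count=1 conv=notrunc 2>/dev/null
                    done
                fi
                i=$((i + 1))
            done
        done
    }
    ```

    For example, `make_corrupt_copies 5 10 samples/*` writes five copies of each file, each with up to ten bytes replaced. Since the originals are never modified, the run can simply be repeated to get fresh test cases. Note that a random byte can occasionally equal the one it replaces.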