java, input, arrays

Filter (search and replace) array of bytes in an InputStream


I have an InputStream that takes an HTML file as its input, and I need to get the bytes from that stream.

I have a string: "XYZ". I'd like to convert this string to bytes and check whether that byte sequence occurs in the bytes I obtained from the InputStream. If it does, I have to replace the match with the byte sequence for some other string.

Could anyone help me with this? I have used regex to find and replace in strings, but I don't know how to find and replace in a byte stream.

Previously, I used jsoup to parse the HTML and replace the string; however, due to some UTF encoding problems, the file appears corrupted when I do that.

TL;DR: My question is:

Is there a way to find and replace a string, in byte form, in a raw InputStream in Java?


Solution

  • I'm not sure you have chosen the best approach to solve your problem.

    That said, I don't like to (and have as policy not to) answer questions with "don't" so here goes...

    Have a look at FilterInputStream.

    From the documentation:

    A FilterInputStream contains some other input stream, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
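
To make the quoted description concrete, here is a minimal sketch of a FilterInputStream subclass that transforms data as it passes through. The names UpperCasingInputStream and filter are invented for this illustration and are not part of the JDK. The sketch assumes only ASCII letters should change, so all other bytes (including multi-byte UTF-8 sequences) pass through untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class: upper-cases ASCII letters as they pass through.
class UpperCasingInputStream extends FilterInputStream {

    UpperCasingInputStream(InputStream in) {
        super(in);
    }

    private static int toUpper(int b) {
        return (b >= 'a' && b <= 'z') ? b - ('a' - 'A') : b;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();                // 0..255, or -1 at end of stream
        return b == -1 ? -1 : toUpper(b);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);   // bytes read, or -1 at end of stream
        for (int i = off; i < off + n; i++)  // body is skipped when n <= 0
            buf[i] = (byte) toUpper(buf[i] & 0xFF);
        return n;
    }

    // Demonstration helper (hypothetical): run a string through the filter.
    static String filter(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new UpperCasingInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8];
            int n;
            while ((n = in.read(buf)) != -1)
                out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because this transformation is stateless, the bulk read can simply post-process whatever the wrapped stream delivered. A search-and-replace filter needs look-ahead, which is why the example that follows works byte by byte.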


    It was a fun exercise to write it up. Here's a complete example for you:

    import java.io.*;
    import java.util.*;
    
    class ReplacingInputStream extends FilterInputStream {
    
        LinkedList<Integer> inQueue = new LinkedList<Integer>();
        LinkedList<Integer> outQueue = new LinkedList<Integer>();
        final byte[] search, replacement;
    
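        protected ReplacingInputStream(InputStream in,
                                       byte[] search,
                                       byte[] replacement) {
            super(in);
            // An empty search pattern would match at every position and never consume input.
            if (search.length == 0)
                throw new IllegalArgumentException("search must not be empty");
            this.search = search;
            this.replacement = replacement;
        }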
    
        private boolean isMatchFound() {
            Iterator<Integer> inIter = inQueue.iterator();
            for (int i = 0; i < search.length; i++)
                // read() yields 0..255 (or -1 at EOF), so compare against the unsigned byte value.
                if (!inIter.hasNext() || (search[i] & 0xFF) != inIter.next())
                    return false;
            return true;
        }
    
        private void readAhead() throws IOException {
            // Work up some look-ahead.
            while (inQueue.size() < search.length) {
                int next = super.read();
                inQueue.offer(next);
                if (next == -1)
                    break;
            }
        }
    
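        @Override
        public int read() throws IOException {
            // Refill the output queue only when no bytes are pending.
            // Loop rather than test once: an empty replacement adds nothing to outQueue.
            while (outQueue.isEmpty()) {
                readAhead();
    
                if (isMatchFound()) {
                    for (int i = 0; i < search.length; i++)
                        inQueue.remove();
    
                    // Mask to 0..255 so a byte such as (byte) 0xFF is not mistaken for EOF (-1).
                    for (byte b : replacement)
                        outQueue.offer(b & 0xFF);
                } else
                    outQueue.add(inQueue.remove());
            }
    
            return outQueue.remove();
        }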
    
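        // TODO: Override the other read methods. The inherited read(byte[], int, int)
        // delegates straight to the wrapped stream and so bypasses the replacement.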
    }
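
One possible way to handle the remaining read methods, sketched under assumptions rather than offered as the definitive fix: route the bulk read through the single-byte read(), so that any look-ahead logic there is always applied. The names ByteWiseFilterInputStream, DropCarriageReturnInputStream and demo are hypothetical. The carriage-return filter is only a stand-in for a stream whose read() does the real work, such as the class above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: every bulk operation goes through the single-byte read().
abstract class ByteWiseFilterInputStream extends FilterInputStream {

    protected ByteWiseFilterInputStream(InputStream in) {
        super(in);
    }

    // Subclasses put their filtering logic here.
    @Override
    public abstract int read() throws IOException;

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > buf.length - off)
            throw new IndexOutOfBoundsException();
        if (len == 0)
            return 0;
        int first = read();
        if (first == -1)
            return -1;                       // end of stream before any byte was read
        buf[off] = (byte) first;
        int n = 1;
        while (n < len) {
            int b = read();
            if (b == -1)
                break;                       // report EOF on the next call instead
            buf[off + n++] = (byte) b;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1)
            skipped++;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false;                        // filtered position cannot be restored
    }
}

// Stand-in filter for the demonstration: drops every '\r' byte.
class DropCarriageReturnInputStream extends ByteWiseFilterInputStream {

    DropCarriageReturnInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b;
        do {
            b = in.read();
        } while (b == '\r');
        return b;
    }

    // Demonstration helper (hypothetical): reads through the bulk method with a tiny buffer.
    static String demo(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new DropCarriageReturnInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4];
            int n;
            while ((n = in.read(buf)) != -1)
                out.write(buf, 0, n);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off in this sketch is that the bulk read keeps calling read() until the buffer is full or the stream ends. On a slow source it may therefore block longer than a plain read(byte[], int, int) would.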
    

    Example Usage

    class Test {
        public static void main(String[] args) throws Exception {
    
            byte[] bytes = "hello xyz world.".getBytes("UTF-8");
    
            ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
    
            byte[] search = "xyz".getBytes("UTF-8");
            byte[] replacement = "abc".getBytes("UTF-8");
    
            InputStream ris = new ReplacingInputStream(bis, search, replacement);
    
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
    
            int b;
            while (-1 != (b = ris.read()))
                bos.write(b);
    
            // Decode with the same charset that was used to encode the input.
            System.out.println(new String(bos.toByteArray(), "UTF-8"));
    
        }
    }
    

    Given the bytes for the string "hello xyz world." it prints:

    hello abc world.
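
The usage example only exercises ASCII text. Since the question mentions UTF encoding problems, one detail worth illustrating is that Java's byte type is signed, while InputStream.read() returns values from 0 to 255 (or -1 at end of stream). The small sketch below, with the invented names SignedByteDemo and firstByteViews, shows the mismatch for the first UTF-8 byte of "é" and how masking with 0xFF reconciles the two.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares a raw byte with what read() reports for it.
class SignedByteDemo {

    // Returns { raw byte value, value from read(), masked byte value }
    // for the first UTF-8 byte of the given text.
    static int[] firstByteViews(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int fromStream = new ByteArrayInputStream(bytes).read();
        return new int[] { bytes[0], fromStream, bytes[0] & 0xFF };
    }

    public static void main(String[] args) {
        int[] v = firstByteViews("\u00e9");  // "é" is 0xC3 0xA9 in UTF-8
        System.out.println(v[0] + " " + v[1] + " " + v[2]);  // prints: -61 195 195
    }
}
```

A comparison between a search byte and a value obtained from read() therefore only behaves as expected for non-ASCII data when the byte is masked first.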