character-encoding ssis solaris dos2unix

\377\376 Appended to file (Windows -> Unix)


I have an SSIS package that performs the following.

  1. Run a SQL script
  2. Export the results to a flat file (UTF-8 encoded, ; delimited, and \n for new lines)
  3. FTP results to a Solaris machine (binary format)

The problem is that when the file shows up on my Solaris box, it has the following at the start of the file.

\377\376

I have tried dos2unix, but it has not corrected the issue. In fact, it changes the \377\376 to \227\226, which is not very helpful.

Is there a way to remove these characters from my file? When they are there, they mess with grep and other Unix tools, like head.


Solution

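  • \377\376 is the octal form of the bytes FF FE, the byte order mark (BOM) of a little-endian UTF-16/UCS-2 file. Files that SSIS and many Windows tools write as "Unicode" are UCS-2 little-endian, so the file that arrived is most likely not UTF-8 at all. The easiest fix is to convert the file on your Unix server with the following commands.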

    1. Convert to UTF-8 (or whatever encoding you need) with iconv. Encoding names vary between iconv implementations; UCS-2LE is a common spelling for UCS-2 little-endian:

      iconv -f UCS-2LE -t UTF-8 input > output
      
    2. Remove the carriage returns that Windows adds at the end of lines. Note that this is dos2unix, not unix2dos, which would add them instead:

      dos2unix -ascii utf-8-file outputfile
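
The two steps above can also be sketched as a single pipeline. This is only an illustration under a few assumptions: the file names are made up, the sample file is built by hand to stand in for the SSIS export, the encoding name UTF-16LE is assumed to be known to your iconv, and tr -d '\r' is used in place of dos2unix. The BOM is skipped with tail before conversion, because with an explicit little-endian source encoding iconv may otherwise carry the BOM over into the UTF-8 output.

```shell
# sample.txt and sample.utf8.txt are hypothetical names for illustration.
# Build a tiny stand-in for the exported file: BOM, "a;b", then CR LF, all UTF-16LE.
printf '\377\376a\000;\000b\000\r\000\n\000' > sample.txt

# Show the first two bytes in octal; a UTF-16LE BOM prints as 377 376.
od -An -t o1 -N2 sample.txt

# Skip the 2-byte BOM, convert to UTF-8, and delete carriage returns.
tail -c +3 sample.txt | iconv -f UTF-16LE -t UTF-8 | tr -d '\r' > sample.utf8.txt
```

On a real transfer, replace sample.txt with the file that arrived over FTP and drop the printf line. If the od check does not show 377 376, the file is probably not UTF-16LE and the tail step should be left out.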