talendtalend-mdm

Usage of Escape char and Text Enclosure


What is the use of Escape char and Text Enclosure in tFileOutputDelimited component and How can we use them??

enter image description here

Thanks in Advance...


Solution

  • To answer your question, consider the below example from CSV file

    bookId,bookname,description,authorname
    1,Grammer,book that has details about grammer,author1
    2,Special Characters, book that describes about some escape characters like \", punctuations and special characters ,etc.,author2
    3,Mathematics, book that has mathematical operations like addition +, subtraction -, multiplication *, division / etc, author3
    

    I have created a simple job like below

    enter image description here

    In the above sample, character comma "," is the delimiter. But there are some commas in between the data.

    The data that is written to CSV file will look like below,

    enter image description here

    Now When I read the data from that file I will get below data

    .------+------------------+-------------------------------------------------------+-------------------------------------.
    |                                                       tLogRow_3                                                       |
    |=-----+------------------+-------------------------------------------------------+------------------------------------=|
    |bookId|bookName          |description                                            |author                               |
    |=-----+------------------+-------------------------------------------------------+------------------------------------=|
    |1     |Grammer           |book that has details about grammer                    |author1                              |
    |2     |Special Characters|book that describes about some escape characters like "| punctuations and special characters |
    |3     |Mathematics       |book that has mathematical operations like addition +  | subtraction -                       |
    '------+------------------+-------------------------------------------------------+-------------------------------------'
    

    If you notice, some data are missing in the log for "author" column.

    This is because of the comma in between the data. To avoid it Text Enclosure option is used. Also there is a escape character in the data, which is \". In the file it will be printed as ". If Text Enclosure has value as """, then you need to escape the character " which is present inside the data. To do this, you have to use Escape char option, like below

    enter image description here

    Now the output that I got is

    enter image description here

    When I read this data, I will get data like below,

    .------+------------------+-------------------------------------------------------------------------------------------------------+-------.
    |                                                                tLogRow_3                                                                |
    |=-----+------------------+-------------------------------------------------------------------------------------------------------+------=|
    |bookId|bookName          |description                                                                                            |author |
    |=-----+------------------+-------------------------------------------------------------------------------------------------------+------=|
    |1     |Grammer           |book that has details about grammer                                                                    |author1|
    |2     |Special Characters|book that describes about some escape characters like ", punctuations and special characters ,etc.     |author2|
    |3     |Mathematics       |book that has mathematical operations like addition +, subtraction -, multiplication *, division / etc.|author3|
    '------+------------------+-------------------------------------------------------------------------------------------------------+-------'
    

    If you notice, no data is lost.

    Hope this would help you out.