firebirdexternal-tables

How do I create an external table in Firebird and relate it to a CSV file?


I created a CSV file using a comma to separate fields. There are two field values on each line separated by a comma. I created an external table using two fields and referencing the filespec for the CSV file. When I use the following command:

insert into realtable (field1, field2) select from externaltable (field1, field2);

all the data from the csv file is inserted in field1. It appears that the comma is not being recognized when the create table command is used. How do you indicate the separator value when creating an external table?


Solution

  • Firebird external tables use a binary format. It is basically the same format as Firebird uses when holding a row in memory (when a row is written to disk in a normal table it applies a RLE compression on the row, but not for external tables).

    In other words, you cannot read a CSV file directly using an external table.

    There is a trick you can use to make a Firebird external table work as a fixed-width text format (that is, a column has the same width in each row, and you need to use spaces to pad a value to the declared length of the column). For that, you need to use only CHAR columns, and add an extra column for the line break (either a CHAR(1) if you use only LF or a CHAR(2) if you use CR/LF). It is recommended to explicitly specify the character set of the columns. Note that using a multi-byte character set like UTF-8 is not recommended for this trick (as with UTF-8, it will no longer be fixed-width text (only fixed-width in bytes), and the length in bytes will be 4x the length in characters).

    Alternatively, you can use a tool to convert a CSV file to a Firebird external table. I have written such a tool, Firebird External Table Generator, or ext-table-gen (it's free and open source). The current version only supports CHAR columns as described above, and the next version will also support data types SMALLINT, INTEGER, BIGINT, INT128, DATE, TIME, TIMESTAMP, NUMERIC and DECIMAL.

    It comes with a manual describing its use in more detail, but paraphrasing an example from the manual, you can derive the definition of an external table from a CSV file:

    CSV file:

    ID,Lastname,Firstname,Email
    1,Doe,John,john@example.org
    2,Doe,Jane,jane@example.org
    3,Deer,Jason,jason@example.org
    4,Deer,Jillian,jillian@example.org
    

    Command to generate an external table file:

    ext-table-gen --csv-file=C:\path\to\persons.csv --table-file=C:\path\to\persons.dat --config-out=C:\path\to\persons.xml
    

    This generates an external table file, persons.dat, and also generates an XML config file, persons.xml. You can modify this file further. For example, if you want to reuse the configuration to repeatedly import new versions of the CSV, it may make sense to define wider columns than derived from the initial file.

    The config file also shows the DDL necessary to define the external table. NOTE: if you change the column definitions, then also make sure to modify the DDL, or regenerate the config file, and of course regenerate the persons.dat so it conforms to the wider format.

    For the example, it generates the following DDL for the external table:

    create table "DEFAULT_EXTERNAL_TABLE_NAME" external file 'C:\path\to\persons.dat' (
      "ID" char(1) character set ISO8859_1,
      "Lastname" char(4) character set ISO8859_1,
      "Firstname" char(7) character set ISO8859_1,
      "Email" char(19) character set ISO8859_1,
      "LF" char(1) character set ASCII default _ASCII x'0a'
    );
    

    Execute the DDL, and you can then select from the persons.dat file created by ext-table-gen.

    select * from DEFAULT_EXTERNAL_TABLE_NAME