My database is a postgres v14 database with en_US UTF-8 locale. I have to import many .csv files and they have usually a decimal point (3.14) for reals and sometimes a decimal comma (3,14). My question: is there a way to set postgres in such a way that the data are correctly read into postgres?
What I did: I know whether the data has a decimal point or comma. In pandas the solution is rather elegant by setting the locale, like:
# test for decimal point, set locale accordingly
decimaal_teken = meta_data.iloc[0].loc['bronbestand_decimaal']
if decimaal_teken == '.':
logger.warning(f'Floating point met decimale punt')
locale.setlocale(locale.LC_NUMERIC, 'en_EN.utf8')
elif decimaal_teken == ',':
logger.warning(f'Floating point met decimale komma')
locale.setlocale(locale.LC_NUMERIC, 'en_DK.utf8')
else:
logger.warning(f'Onbekend decimaal teken: "{decimaal_teken}", decimale punt wordt gebruikt')
locale.setlocale(locale.LC_NUMERIC, 'en_EN.utf8')
I am looking for a similar solution in postgres but I cannot find it. I have a test file that looks like:
SET lc_numeric = 'en_DK';
-- SET DECIMALSEPARATORCOMMA=ON;
INSERT INTO public.floats (naam, value)
VALUES
('Drente', '112,5');
I import the file in psql either by -f or \i . In both cases there is an error that 112,5 is not a valid real. 112.5 is accepted.
The DECIMALCOMMASEPARATOR I found in this article yields an error when importing the file in psql. I have set the linux locale to en_DK as well but no success.
I can replace the comma by a dot but I want to leave the data untouched. I cannot imagine that postgres has no solution for this. Can somebody help by pointing me to some information I overlooked?
Thank you in advance!
Number literals always have to use a decimal point, no matter how lc_numeric
is set. This parameter only influences how the format letter D
(among others) is interpreted by to_numeric()
:
SELECT to_number('112,5', '990D99');
to_number
═══════════
112.5
(1 row)
You can use that with INSERT
, but you won't be able to use it with COPY
.