I have an issue with an Oracle pipeline function, and I am getting crazy to understand what is happening. My Oracle Database is version 19c running over Red Hat 7.2 and configured in AL32UTF8
as CharacterSet.
Let me explain the scenario.
I have the following setup with two types and one pipeline function in order to generate files using parallel processes, thereby I can speed up incredibly the generation of big files.
Two types
--
-- DUMP_PARALLEL_OBJECT (Type)
--
CREATE OR REPLACE TYPE CPL_DATA_OUT.dump_parallel_object AS OBJECT
(file_name VARCHAR2 (128), no_records NUMBER, seq_id NUMBER);
/
--
-- DUMP_PARALLEL_OBJECT_NTT (Type)
--
CREATE OR REPLACE TYPE CPL_DATA_OUT.dump_parallel_object_ntt AS TABLE OF cpl_data_out.dump_parallel_object;
/
Pipelined Function
This is the pipeline function to get the output files in chunks that I can join then using cat
in Linux.
CREATE OR REPLACE function CPL_DATA_OUT.fn_generate_parallel_file
(
p_source IN SYS_REFCURSOR,
p_filename IN VARCHAR2,
p_directory IN VARCHAR2,
p_extension IN VARCHAR2 DEFAULT 'csv',
p_limit IN NUMBER DEFAULT 10000
) return dump_parallel_object_ntt
pipelined
parallel_enable (partition p_source by any)
as
type row_ntt is table of varchar2(32767);
v_rows row_ntt;
v_file UTL_FILE.FILE_TYPE;
v_buffer VARCHAR2(32767);
v_sid NUMBER;
v_name VARCHAR2(128);
v_lines PLS_INTEGER := 0;
c_eol CONSTANT VARCHAR2(1) := CHR(10);
c_eollen CONSTANT PLS_INTEGER := LENGTH(c_eol);
c_maxline CONSTANT PLS_INTEGER := 32767;
begin
SELECT generate_random_number.nextval INTO v_sid FROM dual;
v_name := p_filename || '_' || TO_CHAR(v_sid) || '.' || p_extension;
v_file := UTL_FILE.FOPEN(p_directory, v_name, 'w', 32767);
LOOP
FETCH p_source BULK COLLECT INTO v_rows LIMIT p_limit;
FOR i IN 1 .. v_rows.COUNT LOOP
IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline THEN
v_buffer := v_buffer || c_eol || v_rows(i);
ELSE
IF v_buffer IS NOT NULL THEN
UTL_FILE.PUT_LINE(v_file, v_buffer);
END IF;
v_buffer := v_rows(i);
END IF;
END LOOP;
v_lines := v_lines + v_rows.COUNT;
EXIT WHEN p_source%NOTFOUND;
END LOOP;
CLOSE p_source;
UTL_FILE.PUT_LINE(v_file, v_buffer);
UTL_FILE.FCLOSE(v_file);
PIPE ROW (dump_parallel_object(v_name, v_lines, v_sid));
RETURN;
END fn_generate_parallel_file;
/
I am using a sequence inside the function to assign unique numbers to those files. Let's test the scenario
Problematic Table
SQL> desc SRD_OUT.FCT_EMPROLE_TRANSFORM
Name Null? Type
----------------------------------------- -------- ----------------------------
DAT_MONTH DATE
PERSNR VARCHAR2(6 CHAR)
ARBEITS_STATUS VARCHAR2(50 CHAR)
NAME VARCHAR2(50 CHAR)
VORNAME VARCHAR2(50 CHAR)
FTE NUMBER
WOCHENSTUNDEN NUMBER
FUNKTION VARCHAR2(50 CHAR)
OE VARCHAR2(70 CHAR)
DIREKTION VARCHAR2(50 CHAR)
BEREICH VARCHAR2(50 CHAR)
N_NUMMER VARCHAR2(50 CHAR)
FTE_VALUE NUMBER
CENTERKEY VARCHAR2(200 CHAR)
ROLLE VARCHAR2(200 CHAR)
BEMESSUNGSFAKTOR VARCHAR2(50 CHAR)
COD_PROCESS VARCHAR2(30 CHAR)
DAT_EFFECTIVE DATE
SQL> select count(*) from SRD_OUT.FCT_EMPROLE_TRANSFORM ;
COUNT(*)
----------
20436
If I run the function against the dba_objects
or other similar table/views, everything works as it should.
SQL> COL FILE_NAME FOR A50
SQL> set lines 220
SQL> r
1 SELECT *
2 FROM TABLE(
3 cpl_data_out.fn_generate_parallel_file(
4 CURSOR(
5 SELECT /*+ PARALLEL(s,10) */
6 "OWNER" ||'~'||
7 "OBJECT_NAME" ||'~'||
8 "SUBOBJECT_NAME" ||'~'||
9 "OBJECT_ID" ||'~'||
10 "DATA_OBJECT_ID" ||'~'||
11 "OBJECT_TYPE" ||'~'||
12 "CREATED" ||'~'||
13 "LAST_DDL_TIME" as csv
14 FROM DBA_OBJECTS s)
15 , 'test_file'
16 , 'DIR_SRD_OUT'
17 , 'csv')
18 ) nt
19*
FILE_NAME NO_RECORDS SEQ_ID
-------------------------------------------------- ---------- ----------
test_file_459.csv 25496 459
test_file_449.csv 25496 449
test_file_453.csv 25496 453
test_file_461.csv 25496 461
test_file_455.csv 25499 455
test_file_451.csv 25496 451
test_file_447.csv 25496 447
test_file_443.csv 25496 443
test_file_457.csv 25496 457
test_file_445.csv 25497 445
10 rows selected.
As you may see, the pipelined function works as expected, it creates 10 csv files that I can join later on using cat
. However, if I try to run it against the table shown above, this happens ( for the purpose of the example, I am just using some columns of the table )
Working
SQL> SELECT *
2 FROM TABLE(
3 cpl_data_out.fn_generate_parallel_file(
4 CURSOR(
5 SELECT /*+ PARALLEL(s,10) */
6 "DAT_MONTH" ||'~'||
7 "PERSNR" ||'~'||
8 "COD_PROCESS" ||'~'||
9 "DAT_EFFECTIVE"
10 as csv
11 FROM SRD_OUT.FCT_EMPROLE_TRANSFORM s)
12 , 'test_file'
13 , 'DIR_SRD_OUT'
14 , 'csv')
15* ) nt
SQL> /
FILE_NAME NO_RECORDS SEQ_ID
-------------------------------------------------- ---------- ----------
test_file_569.csv 456 569
test_file_571.csv 489 571
test_file_575.csv 314 575
test_file_573.csv 483 573
test_file_577.csv 496 577
test_file_581.csv 487 581
test_file_579.csv 430 579
test_file_567.csv 3500 567
test_file_565.csv 3606 565
test_file_563.csv 10175 563
10 rows selected.
Not Working
SQL> SELECT *
2 FROM TABLE(
3 cpl_data_out.fn_generate_parallel_file(
4 CURSOR(
5 SELECT /*+ PARALLEL(s,10) */
6 "DAT_MONTH" ||'~'||
7 "PERSNR" ||'~'||
8 "COD_PROCESS" ||'~'||
9 "DAT_EFFECTIVE" ||'~'||
10 "ROLLE"
11 as csv
12 FROM SRD_OUT.FCT_EMPROLE_TRANSFORM s)
13 , 'test_file'
14 , 'DIR_SRD_OUT'
15 , 'csv')
16* ) nt
SQL> /
ERROR:
ORA-12801: error signaled in parallel query server P005
ORA-06502: PL/SQL: numeric or value error: character string buffer too small
ORA-06512: at "CPL_DATA_OUT.FN_GENERATE_PARALLEL_FILE", line 34
ORA-06512: at line 1
The only difference between the two queries is the column "ROLLE" which contains ASCII extended characters ( as letters in German such as "äüöß" ). It happens with every column that contains such characters.
Actually the error refers to this line: v_buffer := v_buffer || c_eol || v_rows(i);
, but I have no clue what is wrong there when such characters are involved.
SQL> set pages 200
SQL> r
1* select distinct rolle from SRD_OUT.FCT_EMPROLE_TRANSFORM
ROLLE
------------------------
Filialleiter (große Filiale)
Vertriebsdirektor Vermögensberatung
I don't really understand what relationship exists between those Extended ASCII characters and the function. What should I change in my function to make it work with such characters ?
Thank you all for your help.
When you do:
IF LENGTH(v_buffer) + c_eollen + LENGTH(v_rows(i)) <= c_maxline
you are counting the number of characters in the buffer and collection variables. When you only have single-byte characters that's OK, but with any multibyte characters you could get a situation where the number of characters totals less than 32767, but the number of bytes exceeds that. The check passes; but then you do:
v_buffer := v_buffer || c_eol || v_rows(i)
which exceeds the size of the buffer, and throws the error. If your buffer was declared smaller than the maximum and using character semantics you might still get away with that; but with the maximum size (and any semantics) it will fail.
If you count bytes instead of characters it won't exceed the byte limit:
IF LENGTHB(v_buffer) + c_eollen + LENGTHB(v_rows(i)) <= c_maxline