I got array of words roughly 10k in size. I need to check text column of rows if there is any of this words in the text. I decided to use multiSearchAnyCaseInsensitive:
AS WITH (select groupArray(word) from bad_words) AS patterns
SELECT
s.id AS id,
multiSearchAnyCaseInsensitive(s.text, patterns) AS has_bad_words
FROM source s
GROUP BY id;
but it has limitation on number of elements in array, so I got this error
Number of arguments for function multiSearchAnyCaseInsensitive doesn't match: passed 10002, should be at most 255
What other options do I have?
P.S. I can't split text by space, because some needles are couple of words.
try following approach
CREATE TABLE source
(
id UInt32,
text String
)
ENGINE = MergeTree
ORDER BY id;
CREATE TABLE bad_words
(
word String
)
ENGINE = MergeTree
ORDER BY word;
SELECT
s.id AS id,
count() > 0 AS has_bad_words
FROM source s
JOIN bad_words bw
ON positionCaseInsensitive(s.text, bw.word) > 0
GROUP BY s.id