I'm trying to remove stop words from a text in MarkLogic 8 using this function:
declare function rec:remove-stop-words($string, $stop_words) {
  (: This is a recursive function. :)
  if (not(empty($stop_words))) then
    rec:remove-stop-words(
      replace($string, $stop_words[1], '', 'i'),
      (: This passes along the stop words after
         the one just evaluated. :)
      $stop_words[position() > 1]
    )
  else normalize-space($string)
};
Here is where I call it:
for $r in /rec:Record
return
rec:remove-stop-words(data($r/rec:Abstract), $stop_words)
It gives me the following error:
XDMP-ARGTYPE: (err:XPTY0004) fn:replace((xs:untypedAtomic(" chapter utilized asymmetry of n..."), xs:untypedAtomic(" book interrelationship between ...")), "a", "", "i") -- arg1 is not of type xs:string?
The function expects a string type, but the actual type is untypedAtomic. I don't know what to do!
NOTE: The problem is not in the function itself, because I've tried it on a different text and it worked well.
I tried to fix the code by converting untypedAtomic to string:
return
<info>{rec:remove-stop-words(data(xs:string($r/rec:Abstract)), $stop_words)}</info>
but I got this error:
XDMP-ARGTYPE: (err:XPTY0004) fn:replace((" chapter utilized asymmetry of n...", " book interrelationship between ..."), "a", "", "i") -- arg1 is not of type xs:string
The problem is that when you iterate over /rec:Record and pass $r/rec:Abstract as input, at least one of your records is returning more than one rec:Abstract. The function signature for rec:remove-stop-words allows a sequence of values as input for $string, but the function body where you call fn:replace only handles a single value, so it throws an argument exception (given xs:string+ and expecting xs:string?).
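To check which records are affected, something like the following could be run first. This is only a sketch: the namespace URI bound to rec is a placeholder (the question does not show the real one), and it assumes the records are stored as documents so that xdmp:node-uri returns something useful.

```xquery
xquery version "1.0-ml";

(: Placeholder namespace URI; replace it with the one your records use. :)
declare namespace rec = "http://example.com/rec";

(: List every record that holds more than one rec:Abstract element. :)
for $r in /rec:Record
let $n := fn:count($r/rec:Abstract)
where $n gt 1
return fn:concat(xdmp:node-uri($r), ": ", $n, " rec:Abstract elements")
```

Any record listed here would pass a sequence of two or more values to fn:replace and trigger the XDMP-ARGTYPE error.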
You can handle the sequence by iterating over rec:Abstract before you call the function:
for $r in /rec:Record
for $a in $r/rec:Abstract
return
rec:remove-stop-words($a, $stop_words)
If you use stricter function signatures, it can help avoid problems like this, or at least make them easier to debug. For example, if you define your function to only allow a single input for the first parameter:
rec:remove-stop-words($string as xs:string, $stop_words as xs:string*)
...
This will throw a similar exception when $string is passed a sequence, but higher up the call stack, which can help make these types of errors a little more obvious.
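Put together, a typed version of the function might look like the sketch below. The body is the same logic as in the question; the as xs:string return type, the use of fn:subsequence, the placeholder namespace URI, and the inline sample record with its stop words are assumptions added for illustration.

```xquery
xquery version "1.0-ml";

(: Placeholder namespace URI; replace it with the one your records use. :)
declare namespace rec = "http://example.com/rec";

(: Exactly one string in, zero or more stop words, exactly one string out. :)
declare function rec:remove-stop-words(
  $string as xs:string,
  $stop_words as xs:string*
) as xs:string {
  if (fn:exists($stop_words)) then
    rec:remove-stop-words(
      fn:replace($string, $stop_words[1], '', 'i'),
      (: Pass along the stop words after the one just evaluated. :)
      fn:subsequence($stop_words, 2)
    )
  else
    fn:normalize-space($string)
};

(: Hypothetical sample data: one record with two abstracts. :)
let $stop_words := ("the ", "of ")
let $record :=
  <rec:Record>
    <rec:Abstract>The asymmetry of the data</rec:Abstract>
    <rec:Abstract>Interrelationship of the parts</rec:Abstract>
  </rec:Record>
(: Iterate so that each call receives a single rec:Abstract. :)
for $a in $record/rec:Abstract
return rec:remove-stop-words($a, $stop_words)
```

Each rec:Abstract element is atomized and cast to xs:string when it is passed in, so no explicit conversion is needed. Calling rec:remove-stop-words($record/rec:Abstract, $stop_words) instead would fail at the function call itself rather than inside fn:replace.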