I really need your help with a query in BaseX. The problem is that I do not understand the logic behind the language, which is XQuery. My first exercise asks:
"Find the first symptom(s) appearing after June 5, 2012. Report the result in a document having root SYMSAFTER, containing elements SYM."
The database looks like this:
<?xml version="1.0"?>
<PATIENT_SYMS>
<PATIENT>
<NAME>Bob</NAME>
<SYMOCC>
<SYM>
<INT>high</INT>
<DESC> edema </DESC>
</SYM>
</SYMOCC>
</PATIENT>
<PATIENT>
<NAME>Ann</NAME>
<SYMOCC>
<DATE>2015-08-03</DATE>
<SYM>
<INT>low</INT>
<DESC> asthma </DESC>
</SYM>
</SYMOCC>
<SYMOCC>
<DATE>2017-05-03</DATE>
<SYM>
<INT> high </INT>
<DESC> nausea </DESC>
</SYM>
</SYMOCC>
</PATIENT>
<PATIENT>
<NAME> Tom </NAME>
<SYMOCC>
<DATE>2011-01-01</DATE>
<SYM>
<INT>high</INT>
<DESC> headache </DESC>
</SYM>
<SYM>
<INT> low </INT>
<DESC> nausea </DESC>
</SYM>
</SYMOCC>
</PATIENT>
<PATIENT>
<NAME>Sue</NAME>
</PATIENT>
</PATIENT_SYMS>
The answer to the question is the following:
<SYMSAFTER> {
for $s in doc('Ps.xml')//SYMOCC
where $s/DATE > '2012-06-05' and (every $s1 in doc('Ps.xml')//SYMOCC satisfies not($s1/DATE > '2012-06-05') or $s1/DATE >= $s/DATE)
return $s
}
</SYMSAFTER>
The output will be:
<SYMSAFTER>
<SYMOCC>
<DATE>2015-08-03</DATE>
<SYM>
<INT>low</INT>
<DESC>asthma</DESC>
</SYM>
</SYMOCC>
</SYMSAFTER>
I honestly don't understand the logic behind this part:
satisfies not($s1/DATE > '2012-06-05')
Why does the following not work instead?
satisfies ($s1/DATE < '2012-06-05')
Isn't it exactly the same thing?
Why is the last part "OR" and not "AND"? I understand that we're checking whether the date is actually the first one by checking that no other date comes before it, but shouldn't it be "AND"?
Why, in this line,
$s1/DATE >= $s/DATE
do we use greater-than-or-equal (and not just greater-than)? Isn't it obvious that it will find the same date as the one in $s?
As you can imagine, I'm a bit confused about this, but the information online is really poor and I have no idea what I need to do. Thank you!
Learning any language from online resources alone can be very tough. There's so much information, but it is typically of very mixed quality, and most of it's written in an hour or two with very little design or review. Get yourself a good old-fashioned book, like Priscilla Walmsley's - you know that's written by an expert, who has spent months thinking carefully about how to present information in a logical sequence, and it will have been carefully reviewed by others.
Now let's look at this example query.
for $s in doc('Ps.xml')//SYMOCC
where $s/DATE > '2012-06-05'
  and (every $s1 in doc('Ps.xml')//SYMOCC
       satisfies not($s1/DATE > '2012-06-05')
                 or $s1/DATE >= $s/DATE)
return $s
I actually think this is a very poor answer to the question, but let's analyse what it means.
Firstly, you have to know the language pretty well to know the precedence of the operators, specifically, whether the "or xxxx" clause is part of the "satisfies" condition or not. In fact it is, as I have tried to show in my indentation - but it would be better to use parentheses to make it clear.
The query is looking for SYMOCC elements in doc('Ps.xml') whose date satisfies two conditions: (a) the date D must be after 2012-06-05, and (b) every date in the document must either be not after 2012-06-05, or >= D. Those two conditions correspond to the conditions in the requirement that (a) the date must be after 2012-06-05, and (b) it must be no later than any other date after 2012-06-05.
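To make the grouping visible, here is a sketch of the same query with explicit parentheses. It assumes a hypothetical inline $doc as a cut-down stand-in for doc('Ps.xml'), so that it can be pasted into BaseX without any file; the logic itself is unchanged.

```xquery
(: $doc is a hypothetical, cut-down stand-in for doc('Ps.xml') :)
let $doc :=
  <PATIENT_SYMS>
    <PATIENT><SYMOCC><SYM><DESC>edema</DESC></SYM></SYMOCC></PATIENT>
    <PATIENT>
      <SYMOCC><DATE>2015-08-03</DATE><SYM><DESC>asthma</DESC></SYM></SYMOCC>
      <SYMOCC><DATE>2017-05-03</DATE><SYM><DESC>nausea</DESC></SYM></SYMOCC>
    </PATIENT>
    <PATIENT><SYMOCC><DATE>2011-01-01</DATE><SYM><DESC>headache</DESC></SYM></SYMOCC></PATIENT>
  </PATIENT_SYMS>
return
  <SYMSAFTER>{
    for $s in $doc//SYMOCC
    (: condition (a): the date of $s is after the cutoff :)
    where ($s/DATE > '2012-06-05')
      (: condition (b): every occurrence is either not after the cutoff,
         or not earlier than $s; the "or" belongs to "satisfies" :)
      and (every $s1 in $doc//SYMOCC satisfies
             ( not($s1/DATE > '2012-06-05') or ($s1/DATE >= $s/DATE) ))
    return $s
  }</SYMSAFTER>
```

With this data, only the SYMOCC dated 2015-08-03 satisfies both conditions.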
Let's try and answer your questions:
XQuery is not an imperative, procedural language; it's a declarative one. It doesn't have instructions that are executed one after another. It's a logic-based language where you say what conditions the answer must satisfy, and the system works out how to get that answer. Different implementations will do it quite differently depending on their optimization strategy.
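The difference between DATE < XXX and not(DATE >= XXX) arises when there is no DATE (some of the SYMOCC elements do not have a DATE child). If there is no DATE, then DATE < XXX and DATE >= XXX are both false, so not(DATE >= XXX) is true. In your data, Bob's SYMOCC has no DATE: with not(...) it passes the "every" test, but with your rewritten condition it fails, so the "every" is false for every $s and the query returns nothing.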
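The behaviour for a missing DATE can be checked directly. This is a small sketch; $occ is a hypothetical inline element modelled on Bob's SYMOCC, which has no DATE child.

```xquery
(: $occ is a hypothetical stand-in for a SYMOCC without a DATE child :)
let $occ := <SYMOCC><SYM><DESC>edema</DESC></SYM></SYMOCC>
return (
  $occ/DATE <  '2012-06-05',        (: empty sequence on the left: false :)
  $occ/DATE >= '2012-06-05',        (: empty sequence on the left: false :)
  not($occ/DATE >= '2012-06-05')    (: negation of false: true :)
)
```

A general comparison against an empty sequence is always false, so the three results are false, false, true.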
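Why is it OR rather than AND? Well, I think the way the query is expressed is a little perverse, but given the approach taken, it's correct. The date D we're looking for is the first one after 2012-06-05 if every other date is either (a) not after 2012-06-05, or (b) not earlier than D. Put differently: if a date is after 2012-06-05, then it must not precede D. With AND, every date would have to be both not after 2012-06-05 and not earlier than D, which can never hold, because D itself is after 2012-06-05.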
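One way to read the OR is as an implication: "not(A) or B" means "if A then B". The sketch below simply tabulates that expression for all four combinations; $a and $b are illustrative variables standing for "the date is after the cutoff" and "the date is not earlier than D".

```xquery
(: truth table for "not($a) or $b", i.e. "if $a then $b" :)
for $a in (true(), false()),
    $b in (true(), false())
return concat($a, ' ', $b, ' ', not($a) or $b)
```

The only combination that yields false is $a true and $b false, that is, a date after the cutoff that precedes D.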
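Why is the final condition >= rather than >? Because $s1 ranges over every SYMOCC in the document, including $s itself, and because there can be multiple symptoms appearing on the same date. If you wrote >, the condition would fail when $s1 is $s (or any other occurrence with the same date), so you'd get no results.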
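The effect of >= versus > can be seen by counting the results of both variants. This is a sketch over hypothetical data in which two occurrences share the earliest date (the "cough" entry is invented for illustration); note that $s1 also ranges over $s itself.

```xquery
(: hypothetical data: two occurrences share the earliest date :)
let $doc :=
  <PATIENT_SYMS>
    <SYMOCC><DATE>2015-08-03</DATE><SYM><DESC>asthma</DESC></SYM></SYMOCC>
    <SYMOCC><DATE>2015-08-03</DATE><SYM><DESC>cough</DESC></SYM></SYMOCC>
    <SYMOCC><DATE>2017-05-03</DATE><SYM><DESC>nausea</DESC></SYM></SYMOCC>
  </PATIENT_SYMS>
return (
  (: variant with >= :)
  count(
    for $s in $doc/SYMOCC
    where $s/DATE > '2012-06-05'
      and (every $s1 in $doc/SYMOCC satisfies
             (not($s1/DATE > '2012-06-05') or $s1/DATE >= $s/DATE))
    return $s),
  (: variant with >: fails as soon as $s1 has the same date as $s :)
  count(
    for $s in $doc/SYMOCC
    where $s/DATE > '2012-06-05'
      and (every $s1 in $doc/SYMOCC satisfies
             (not($s1/DATE > '2012-06-05') or $s1/DATE > $s/DATE))
    return $s)
)
```

The first count is 2 (both occurrences dated 2015-08-03), the second is 0.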
Most of your questions seem to be less a problem with XQuery notation, and more a lack of understanding of how predicate logic works. But having said that, I would have produced a different solution to this problem. I would start by sorting all the events by date, then removing those before 2012-06-05, then removing those after the first date in the sequence. That would be something like
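let $selected :=
  (: keep only dated occurrences after the cutoff, earliest first :)
  for $s in doc('Ps.xml')//SYMOCC[DATE]
  where $s/DATE > '2012-06-05'
  order by $s/DATE
  return $s
(: return every occurrence that shares the earliest remaining date :)
return $selected[DATE = $selected[1]/DATE]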
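For completeness, here is a sketch of how that approach might be wrapped into the requested SYMSAFTER document. Several things here are assumptions on my part rather than part of the exercise's model answer: the inline $doc is a hypothetical stand-in for doc('Ps.xml'), the dates are cast with xs:date so they compare as dates rather than strings, and the result lists SYM elements because that is how I read "containing elements SYM".

```xquery
(: $doc is a hypothetical, cut-down stand-in for doc('Ps.xml') :)
let $doc :=
  <PATIENT_SYMS>
    <PATIENT><SYMOCC><SYM><DESC>edema</DESC></SYM></SYMOCC></PATIENT>
    <PATIENT>
      <SYMOCC><DATE>2015-08-03</DATE><SYM><DESC>asthma</DESC></SYM></SYMOCC>
      <SYMOCC><DATE>2017-05-03</DATE><SYM><DESC>nausea</DESC></SYM></SYMOCC>
    </PATIENT>
    <PATIENT><SYMOCC><DATE>2011-01-01</DATE><SYM><DESC>headache</DESC></SYM></SYMOCC></PATIENT>
  </PATIENT_SYMS>
let $selected :=
  (: [DATE] skips occurrences without a date, so xs:date() never sees an empty value :)
  for $s in $doc//SYMOCC[DATE]
  where xs:date($s/DATE) > xs:date('2012-06-05')
  order by xs:date($s/DATE)
  return $s
return
  (: all symptoms recorded on the earliest date after the cutoff :)
  <SYMSAFTER>{ $selected[DATE = $selected[1]/DATE]/SYM }</SYMSAFTER>
```

With this data the result contains a single SYM element, the asthma entry from 2015-08-03. If nothing is after the cutoff, $selected is empty and the result is an empty SYMSAFTER element.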