I built a parser for HL7 based on documentation I found and thought it was working well--until I got examples of test data. I built it with the following assumptions:
~ is a "repeat" character, meaning the value of the field is an array of the given values.
^ indicates the field is represented by an array, but the array items are expected to be combined into a final value.
& is similar to ^, but is a nested array inside of a ^.
These assumptions don't appear very accurate given the test data I have. Can someone help set me straight on the right way to interpret these characters?
Since you are building a parser, I will go into a little more detail.
Please refer to this reference:
(x0D) Segment separator
| Field separator, aka pipe
^ Component separator, aka hat
& Sub-component separator
~ Field repeat separator
\ Escape character
The segment separator is not negotiable. It is always a carriage return (ASCII 13 or HEX 0D). The others are suggested values only, but usually used as indicated above. The HL7 standard lets you choose your own as long as you show them in the MSH segment.
The MSH is the first segment of all HL7 messages (except HL7 batch messages). The field separator appears as the 4th character of the message, and it also counts as the first field of the MSH segment. Since the first field of the MSH is typically just a pipe (|), counting MSH fields becomes tricky. Field 2 of the MSH (MSH-2) contains the other separator characters in this order: component, field repeat, escape, and sub-component.
Thus, the following is an example of the beginning of an HL7 message: MSH|^~\&|…
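A minimal Python sketch of reading the separators from the header, assuming MSH-2 holds exactly the four encoding characters described above. The names Delimiters and read_delimiters are made up for illustration, and the sample header in the usage note is not real data.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    """Separator characters declared at the start of a message (hypothetical type)."""
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Delimiters:
    """Read MSH-1 and MSH-2 from the first eight characters of the message."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message does not start with a complete MSH header")
    field = message[3]        # MSH-1: the 4th character of the message
    encoding = message[4:8]   # MSH-2, assumed to be exactly four characters
    # Order in MSH-2: component, repetition, escape, sub-component
    component, repetition, escape, subcomponent = encoding
    return Delimiters(field, component, repetition, escape, subcomponent)
```

For example, read_delimiters("MSH|^~\\&|SENDING_APP|") returns the default set: | for fields, ^ for components, ~ for repetitions, \ for escapes, and & for sub-components. Segments themselves can then be obtained with message.split("\r"), since the segment separator is always a carriage return.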
As stated above:
~ indicates that multiple values are provided for this field. In programming terms, it is an array, a list, or a similar data structure. Your assumption is correct. Please refer to this answer for more details.
^ separates the component parts of a field. One field may have multiple components, and together they represent a single final value. I don't think this should be modeled as an array. Person Name is the usual example: the entire name is a single piece of data that is split into family name, given name, and so on. This is not multiple values; it is a single value split into multiple sub-values. So instead of an array, think of it as a class or struct, as in composition.
& is the sub-component separator. It is similar to the component separator, except that it further splits the data of a given component into sub-components. Again, I think this should map to a language-specific class or struct instead of an array.
Also, the characters listed above are the defaults and the most commonly used ones for the purposes stated, but they can be changed. They are defined in each message in MSH-2. Note that the first field of the MSH is always the field separator itself (usually |); that position is fixed. The next (second) field holds the Encoding Characters. As you are writing a parser, you should read the encoding characters from there and use them for the rest of the message.
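To make the array-versus-struct distinction concrete, here is a rough Python sketch. PersonName, split_field, and parse_names are hypothetical names, the struct only models the first two components (family and given name), and the default separators are assumed unless others are passed in.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    """Hypothetical struct: one value made up of several components."""
    family: str = ""
    given: str = ""


def split_field(raw, component="^", repetition="~", subcomponent="&"):
    """Return repetitions -> components -> sub-components as nested lists of strings."""
    return [
        [comp.split(subcomponent) for comp in rep.split(component)]
        for rep in raw.split(repetition)
    ]


def parse_names(raw, component="^", repetition="~"):
    """Return one PersonName per repetition of a name field."""
    names = []
    for rep in raw.split(repetition):
        parts = rep.split(component)
        parts += [""] * (2 - len(parts))  # pad missing trailing components
        names.append(PersonName(family=parts[0], given=parts[1]))
    return names
```

With made-up input, parse_names("Smith^John~Jones^Mary") gives a list of two PersonName values: the list comes from ~, while each struct comes from ^. The generic split_field("a&b^c") returns [[["a", "b"], ["c"]]], which is handy as an intermediate form before mapping components onto typed structures.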
The order of the characters is also defined, as mentioned here:
2.24.1.2 Encoding characters (ST) 00002
Definition: This field contains the four characters in the following order: the component separator, repetition separator, escape character, and subcomponent separator. Recommended values are ^~\& (ASCII 94, 126, 92, and 38, respectively).
Please also refer to these other answers, which discuss HL7 escape sequences, conventions, and the terms used.
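As a small illustration of why escape sequences matter to a parser, the sketch below decodes only the five delimiter escapes (F, S, T, R, E between two escape characters). The function name unescape is made up, and other sequences, such as hexadecimal data or formatting commands, are deliberately left untouched.

```python
import re


def unescape(value, field="|", component="^", repetition="~",
             escape="\\", subcomponent="&"):
    """Replace the basic delimiter escape sequences in a leaf value."""
    mapping = {
        "F": field,         # \F\ -> field separator
        "S": component,     # \S\ -> component separator
        "T": subcomponent,  # \T\ -> sub-component separator
        "R": repetition,    # \R\ -> repetition separator
        "E": escape,        # \E\ -> escape character
    }
    pattern = re.compile(re.escape(escape) + "([FSTRE])" + re.escape(escape))
    # A single left-to-right pass, so a decoded escape character is never re-read
    return pattern.sub(lambda m: mapping[m.group(1)], value)
```

For example, unescape("A\\T\\B") returns "A&B". Unescaping should happen after a field has been split into its smallest parts, because a decoded separator is data and must not be treated as a delimiter.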