hi2txt: XML description reference

(note that a DTD is available in <hi2txt>/db/hi2txt.dtd)
The XML file defines which data definition version matches the hi/nvram files [check/definition/size], how the data can be extracted [structure/elt/loop] and what/how to display into text format [output/field/table].
If more than one data structure is available, the tool selects the best one, based on the hiscore.dat location used [check/definition] and the resulting hi file size [check/size].
It means that one XML description can be compatible with different hiscore.dat locations for the same game, and also that choosing the best data extraction is automatic.
A data extraction structure defines which file it applies to [structure@file], and the different elements to extract.

structure > check > ?definition "hiscore.dat_location"

optional hiscore.dat location that must be matched for this structure to be selected

ex: ddonpach.xml
<structure>...
<check>...
<definition>0:1016ea:64:00:05
0:101626:4:00:06</definition>
</check>
</structure>

structure > check > ?size "<integer>"

optional integer specifying the expected input size to match this structure

ex: ddonpach.xml
<structure>...
<check>...
<size>104</size>
</check>
</structure>

Elements [elt] can be text (type=text), numbers (type=int) or raw data (type=raw).
Similar elements or groups of elements (high score 1, high score 2, etc.) can be described using a loop [loop] to keep the definition clear.

elt

common properties

structure > elt id="<string>"

mandatory identifier allowing to associate a specific input element to an output field or column

ex: ddonpach.xml
<structure>...<elt size="4" type="int" id="TOP SCORE" base="16"/></structure>
<output>...<field id="TOP SCORE" format="*10" display="extra"/></output>

structure > elt size="<integer>"

mandatory integer specifying how many bytes must be read to decode a specific input element

ex: ddonpach.xml
<structure>...<elt size="4" type="int" id="TOP SCORE" base="16"/></structure>

structure > elt type="text|int|raw"

mandatory type specifying how to decode a specific element: does it store text (text)? number (int)? unknown data (raw)?
using type = int automatically converts the input data into a decimal value in base 10 (when the decimal value is the one directly seen in the hex file, this explains why a conversion back to base 16 is needed). All common and number-related operations are supported.
using type = raw stores the input data as-is, as some sort of string, without any conversion at all (some explicit operations such as byte manipulations (byte-swap, byte-trim, endianness, nibble-skip, nibble-trim, bitmask) are still supported).
using type = text stores the input data as a string and allows all common, bytes-related and text-related decoding operations.

ex: ddonpach.xml
<structure>...
<elt size="4" type="int" id="TOP SCORE" base="16"/>
<elt size="6" type="text" id="NAME" byte-skip="odd" ascii-step="4" ascii-offset="32" charset="ddonpach"/>
</structure>

structure > elt ?format="<format_identifier>"
structure > elt ?format="<identifier1>;<identifier2>;..."

optional reference to a format identifier describing how to format the element to decode it.

its usage is rare, as it is used here only for native decoding purposes, i.e. if the conversion to the requested type (int, text, raw) is possible only through a specific formatting
it is recommended to use output field/column format to do assembly of elements (sum, etc.) or element transformation for display purpose (*10, etc.)
if the format identifier is simple enough, this format definition can be skipped as the program will automatically create it, computing its content from the identifier itself.
more than one format identifier can be specified, separated by ';'
see column format for more information

ex: gng.xml
<structure>
<elt size="2" type="int" id="RANKPOINTER" format="rp1;rp2" byte-skip="odd"/>
...
</structure>
<format id="rp1"><substract>44</substract></format>
<format id="rp2"><divide>7</divide></format>

structure > elt ?decoding-profile="base-40|base-32|bcd|bcd-le"

optional identifier selecting a pre-defined profile to decode an element
a profile is a pre-defined set of element properties
it means that this set of individual properties can be explicitly used instead of the corresponding profile
profiles are shortcuts to quickly decode an element using a set of properties frequently used together
base-40^[2]:
    src-unit-size="16"
    base="40"
    dst-unit-size="3"
    ascii-offset="64"
base-32:
    src-unit-size="16"
    base="32"
    dst-unit-size="3"
    ascii-offset="64"
bcd:
    nibble-skip="odd"
    base="16"
bcd-le:
    endianness="little_endian"
    nibble-skip="odd"
    base="16"

ex: trackfld.xml
<elt size="2" type="text" id="NAME" decoding-profile="base-32" charset="trackfld"/>
<elt size="4" type="int" id="HAMMER SCORE" decoding-profile="bcd"/>

ex: gauntlet.xml
<elt size="2" type="text" id="WARRIOR NAME" decoding-profile="base-40" charset="gauntlet"/>

ex: galaga.xml
<elt size="6" type="int" id="SCORE" decoding-profile="bcd-le" byte-trim="0x24"/>

byte manipulation properties

structure > elt ?endianness="big_endian|little_endian"

optional identifier defining the endianness of a group of bytes (default = big_endian).
"The term 'Endian' refers to the order of storing bytes in computer memory. In 'Big Endian' scheme, the most significant byte is stored in the lowest memory address (or big in first), while 'Little Endian' stores the least significant bytes in the lowest memory address. For example, the 32-bit integer 12345678H (305419896₁₀) is stored as 12H 34H 56H 78H in big endian; and 78H 56H 34H 12H in little endian. A 16-bit integer 00H 01H is interpreted as 0001H in big endian, and 0100H as little endian."^[1]

ex: astdelux.xml
<elt size="3" type="int" id="SCORE" endianness="little_endian" base="16"/>
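The effect of endianness="little_endian" can be sketched in Python. This is an illustration only, not hi2txt's own code: the helper name apply_endianness is hypothetical, and it assumes a little-endian element is handled by reversing the byte order of the whole element.

```python
def apply_endianness(data: bytes, endianness: str = "big_endian") -> bytes:
    """Return the bytes ordered most-significant first (hypothetical helper)."""
    if endianness == "little_endian":
        return data[::-1]          # assumption: the whole element is reversed
    if endianness == "big_endian":
        return data                # default: nothing to do
    raise ValueError(f"unknown endianness: {endianness}")


# The bytes 78 56 34 12 read as little endian give 12345678 (hex).
print(apply_endianness(bytes.fromhex("78563412"), "little_endian").hex())

# The 16-bit example from the quotation: 00 01 is 0x0001 or 0x0100.
print(hex(int.from_bytes(bytes.fromhex("0001"), "big")))
print(hex(int.from_bytes(bytes.fromhex("0001"), "little")))
```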

structure > elt ?byte-skip="0x<byte>|odd|even|1000|0100|0010|0001"

optional identifier allowing to remove all occurrences of a specific byte, or all odd bytes, or all even bytes, or 3 bytes out of every 4 bytes.
01AB01CD₁₆ -> byte-skip="odd" -> ABCD₁₆
01AB01CD₁₆ -> byte-skip="even" -> 0101₁₆
01AB01CD₁₆ -> byte-skip="0xAB" -> 0101CD₁₆
01AB01CD₁₆ -> byte-skip="1000" -> 01₁₆
01AB01CD₁₆ -> byte-skip="0100" -> AB₁₆
01AB01CD₁₆ -> byte-skip="0010" -> 01₁₆
01AB01CD₁₆ -> byte-skip="0001" -> CD₁₆
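The byte-skip results listed above can be reproduced with a short Python sketch. The function name byte_skip is hypothetical and this is not the tool's implementation; it assumes bytes are counted from 1 (so the first byte is "odd") and that in the 4-character flavors a "1" marks the byte kept in each group of 4.

```python
def byte_skip(data: bytes, rule: str) -> bytes:
    """Illustrative byte-skip: drop bytes according to the given rule."""
    if rule == "odd":                      # drop the 1st, 3rd, 5th... bytes
        return data[1::2]
    if rule == "even":                     # drop the 2nd, 4th, 6th... bytes
        return data[0::2]
    if rule.startswith("0x"):              # drop every occurrence of one byte
        unwanted = int(rule, 16)
        return bytes(b for b in data if b != unwanted)
    if len(rule) == 4 and set(rule) <= {"0", "1"}:
        # keep only the positions flagged with "1" in each group of 4 bytes
        return bytes(b for i, b in enumerate(data) if rule[i % 4] == "1")
    raise ValueError(f"unsupported byte-skip rule: {rule}")


sample = bytes.fromhex("01AB01CD")
for rule in ("odd", "even", "0xAB", "1000", "0100", "0010", "0001"):
    print(rule, "->", byte_skip(sample, rule).hex().upper())
```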

Flavors "1000", "0100", "0010" and "0001" act as shortcuts that avoid having to define the related bitmask.

Ex:
  <elt ... bitmask="bm"/>
  <bitmask id="bm">
  <character mask="00000000 00000000 00000000 11111111 00000000 00000000 00000000 11111111 00000000 00000000 00000000 11111111"/>
  </bitmask>
...can be replaced by...
  <elt ... byte-skip="0001"/>

ex: ddonpach.xml
<elt size="6" type="text" id="NAME" byte-skip="odd"/>

ex: espgal.xml

structure > elt ?byte-trim="0x<byte>"

optional byte allowing to remove all occurrences of this byte from the start of the data sequence.
01AB01CD₁₆ -> byte-trim="0x01" -> AB01CD₁₆
010101AB01CD₁₆ -> byte-trim="0x01" -> AB01CD₁₆
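As a rough Python equivalent of the two byte-trim examples above (byte_trim is a hypothetical name, not part of hi2txt), only the leading run of the given byte is removed and later occurrences are kept:

```python
def byte_trim(data: bytes, byte: str) -> bytes:
    """Illustrative byte-trim: strip leading occurrences of one byte."""
    return data.lstrip(bytes([int(byte, 16)]))


print(byte_trim(bytes.fromhex("01AB01CD"), "0x01").hex().upper())
print(byte_trim(bytes.fromhex("010101AB01CD"), "0x01").hex().upper())
```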

ex: galaga.xml
<elt size="6" type="int" id="SCORE" decoding-profile="bcd-le" byte-trim="0x24"/>

structure > elt ?byte-trunc="0x<byte>"

optional byte allowing to truncate the data at the first occurrence of this byte: this byte and all bytes after it are removed. [since version 1.10]
AB00CDEF₁₆ -> byte-trunc="0x00" -> AB₁₆
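The truncation can be pictured with this Python sketch (byte_trunc is a hypothetical helper; it assumes the cut happens at the first occurrence of the byte and that data without this byte is left unchanged):

```python
def byte_trunc(data: bytes, byte: str) -> bytes:
    """Illustrative byte-trunc: cut the data at the first matching byte."""
    position = data.find(bytes([int(byte, 16)]))
    return data if position < 0 else data[:position]


print(byte_trunc(bytes.fromhex("AB00CDEF"), "0x00").hex().upper())
```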

ex: xmultipl.xml
<elt size="9" type="text" id="NAME" byte-trunc="0x00" charset="charconv"/>

structure > elt ?byte-swap="<integer>"

optional integer specifying the number of bytes to swap together. this swapping is repeated as long as there are data available.
01020304₁₆ -> byte-swap="2" -> 02010403₁₆
01020304₁₆ -> byte-swap="4" -> 04030201₁₆
0102030405060708₁₆ -> byte-swap="2" -> 0201040306050807₁₆
0102030405060708₁₆ -> byte-swap="4" -> 0403020108070605₁₆
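The swapping above amounts to reversing each consecutive group of N bytes. A minimal Python sketch (byte_swap is a hypothetical name; how the tool treats a trailing incomplete group is not stated in this reference, so here it is simply reversed as well):

```python
def byte_swap(data: bytes, group: int) -> bytes:
    """Illustrative byte-swap: reverse each consecutive group of bytes."""
    swapped = bytearray()
    for start in range(0, len(data), group):
        swapped += data[start:start + group][::-1]
    return bytes(swapped)


print(byte_swap(bytes.fromhex("0102030405060708"), 2).hex())
print(byte_swap(bytes.fromhex("0102030405060708"), 4).hex())
```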

ex: rastan.xml
<elt size="12" type="text" id="TOP SCORE" endianness="little_endian" byte-swap="2" charset="topscore" src-unit-size="16"/>

structure > elt ?nibble-skip="odd|even"

optional identifier allowing to remove all odd nibbles, or all even nibbles.
01AB01CD₁₆ -> nibble-skip="odd" -> 1B1D₁₆
01AB01CD₁₆ -> nibble-skip="even" -> 0A0C₁₆
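Both nibble-skip flavors can be sketched in Python by working on the hexadecimal digits of the data. The helper nibble_skip is hypothetical; nibbles are counted from 1, so the high nibble of the first byte is "odd". The result is returned as a hex string because the remaining nibble count is not always a whole number of bytes.

```python
def nibble_skip(data: bytes, rule: str) -> str:
    """Illustrative nibble-skip: drop odd or even nibbles, return hex digits."""
    nibbles = data.hex().upper()
    if rule == "odd":                  # drop the 1st, 3rd, 5th... nibbles
        return nibbles[1::2]
    if rule == "even":                 # drop the 2nd, 4th, 6th... nibbles
        return nibbles[0::2]
    raise ValueError(f"unsupported nibble-skip rule: {rule}")


print(nibble_skip(bytes.fromhex("01AB01CD"), "odd"))
print(nibble_skip(bytes.fromhex("01AB01CD"), "even"))
```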

ex: robotron.xml
<elt size="6" type="text" id="NAME" nibble-skip="odd" charset="robotron"/>

structure > elt ?nibble-trim="0x<nibble>"

optional identifier allowing to remove all occurrences of a nibble from the start of the data sequence.
FFF123₁₆ -> nibble-trim="0xF" -> 0123₁₆
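A Python sketch of the example above (nibble_trim is a hypothetical helper). The leading zero in the documented result 0123 is assumed to come from padding the remaining nibbles back to whole bytes:

```python
def nibble_trim(data: bytes, nibble: str) -> str:
    """Illustrative nibble-trim: strip leading nibbles, return hex digits."""
    digit = nibble[2:].upper()                 # "0xF" -> "F"
    trimmed = data.hex().upper().lstrip(digit)
    if len(trimmed) % 2:                       # assumption: pad to whole bytes
        trimmed = "0" + trimmed
    return trimmed


print(nibble_trim(bytes.fromhex("FFF123"), "0xF"))
```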

ex: altair.xml
<elt size="5" type="int" id="SCORE" nibble-trim="0xF" base="16"/>

structure > elt ?bit-swap="yes|no"

optional boolean enabling or disabling the swapping of the bits of each input byte (the bit order of each byte is reversed)
1A₁₆ -> 00011010₂ -> bit-swap="yes" -> 01011000₂ -> 58₁₆
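The example above reverses the bit order inside each byte. In Python this could be sketched as follows (bit_swap is a hypothetical name, not the tool's code):

```python
def bit_swap(data: bytes) -> bytes:
    """Illustrative bit-swap: reverse the 8 bits of every byte."""
    return bytes(int(f"{byte:08b}"[::-1], 2) for byte in data)


print(bit_swap(bytes.fromhex("1A")).hex().upper())
```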

ex: -
-

structure > elt ?bitmask="<bitmask_identifier>"

optional identifier specifying a bitmask definition to be used to select some bits of the input data.
see bitmask object definition for more information.

ex: klax.xml
<elt size="7" type="int" id="SCORE" table-index="loop_index" table-index-format="*2" bitmask="score_odd"/>
...
<bitmask id="score_odd"> 
<character mask="11111111 00000000 00000000 00000000 11111111 00000000 11111111"/>
</bitmask>

structure > elt ?src-unit-size="<integer>"

optional integer specifying how many bits must be taken into account, from the input data, to represent each unit of the defined element (default is 8 bits)
a typical example is the decoding of a string of characters, where the element defines a size of 2 bytes (2 bytes = 16 bits => src-unit-size="16") from the input data, that can be translated into 3 characters (dst-unit-size="3").

ex: rastan.xml
<elt size="12" type="text" id="TOP SCORE" endianness="little_endian" byte-swap="2" charset="topscore" src-unit-size="16"/>

structure > elt ?dst-unit-size="<integer>"

optional integer specifying how many units of the defined element will be decoded, from the input data (default is 1 unit).
a typical example is the decoding of a string of characters, where the element defines a size of 2 bytes (2 bytes = 16 bits => src-unit-size="16") from the input data, that can be translated into 3 characters (dst-unit-size="3").
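The idea of turning one 16-bit source unit into 3 characters can be illustrated with positional base-40 decoding, using the property values of the base-40 profile (base="40", dst-unit-size="3", ascii-offset="64"). This is only a sketch under assumptions: the function names are hypothetical, the digit order (most significant first) and big-endian units are assumed, and the real tool may additionally apply a charset.

```python
def decode_unit(value: int, base: int = 40, dst_unit_size: int = 3,
                ascii_offset: int = 64) -> str:
    """Split one source unit into dst_unit_size digits of the given base."""
    digits = []
    for _ in range(dst_unit_size):
        value, digit = divmod(value, base)
        digits.append(digit)                 # least significant digit first
    # assumption: characters are stored most significant digit first
    return "".join(chr(d + ascii_offset) for d in reversed(digits))


def decode_text(data: bytes, src_unit_size: int = 16) -> str:
    """Decode consecutive source units of src_unit_size bits."""
    width = src_unit_size // 8
    return "".join(decode_unit(int.from_bytes(data[i:i + width], "big"))
                   for i in range(0, len(data), width))


# 1*40*40 + 2*40 + 3 = 1683 = 0x0693, i.e. the digits 1, 2, 3
print(decode_text(bytes.fromhex("0693")))
```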

ex: -
-

structure > elt ?swap-skip-order="<property1>;<property2>;..."

all skip and swap properties are done in a pre-defined sequence, matching 99% of the cases.
in rare cases, the sequence order must be altered to do a specific operation before/after another one.
this optional list must specify all the impacted properties, separated by ";": endian, byte-skip, nibble-skip, nibble-trim, byte-trim, byte-trunc, bitmask, byte-swap, bit-swap
default sequence: byte-swap -> bit-swap -> byte-skip -> endian -> byte-trunc -> byte-trim -> nibble-skip -> nibble-trim -> bitmask

ex: -
-

number/character related properties

structure > elt ?base="<integer>"

every integer element is automatically converted from hexadecimal (base 16) to decimal (base 10).
optional property specifying an additional conversion to get the real value from the decimal representation.
base="16" means that the value will be converted back into hexadecimal.
0123₁₆ -> 291₁₀ (auto) -> base="16" -> 123₁₆ -> element value = 123
currently known base values used: 10, 16, 32, 40 but more are supported.
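The base="16" round trip can be sketched in Python for the documented case. The helper apply_base16 is hypothetical and only covers values whose hexadecimal digits are all 0-9 (as when a score is stored digit by digit); how the tool handles other bases or digits A-F is not described here.

```python
def apply_base16(raw: bytes) -> int:
    """Illustrative base="16": read the hex digits as a decimal number."""
    automatic = int.from_bytes(raw, "big")   # 0x0123 -> 291 (automatic step)
    hex_digits = format(automatic, "x")      # 291 -> "123" (back to base 16)
    return int(hex_digits)                   # "123" -> element value 123


print(apply_base16(bytes.fromhex("0123")))
```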

ex: ddonpach.xml
<elt size="2" type="int" id="MAXHIT" base="16"/>

text related properties

structure > elt ?charset="<charset_identifier>"

optional identifier specifying which charset object to use to translate specific bytes or sequence of bytes into ascii characters or ascii strings.
used when a generic algorithm cannot be applied to every character.
see charset object for more information.

ex: ddonpach.xml
<elt size="6" type="text" id="NAME" byte-skip="odd" ascii-step="4" ascii-offset="32" charset="ddonpach"/>

a pre-defined charset can be called from the charset attribute, additionally to a custom charset. [since version 1.2]

a call to a pre-defined charset can override the start offset and the delta to add to the current step between each char.

supported pre-defined charsets:
CS_NUMBER, defining number from 0 to 9, using ASCII codes

ex: sbomber.xml
<elt ... charset="CS_NUMBER[-14];sbomber" ... >

structure > elt ?ascii-offset="<integer>"

optional integer specifying the decimal value to add to the value of each character of the string, allowing to match standard ascii code.
targeted character: E -> decoded value: 37 -> offset: +32 -> standard ascii code of letter E (decimal) = 37+32 = 69
see ASCII reference^[3].

ex: ddonpach.xml
<elt size="6" type="text" id="NAME" byte-skip="odd" ascii-step="4" ascii-offset="32" charset="ddonpach"/>

structure > elt ?ascii-step="<integer>"

optional integer specifying the decimal value separating two consecutive characters.
E -> decimal input value: 148 -> ascii standard code: 148 / 4 + offset = 148 / 4 + 32 = 69 -> ascii standard code = (decimal input value / step) + offset
F -> decimal input value: 152 = 148+4 -> ascii standard code: 152 / 4 + offset = 152 / 4 + 32 = 70
G -> decimal input value: 156 = 152+4 -> ascii standard code: 156 / 4 + offset = 156 / 4 + 32 = 71
see ASCII reference^[3].
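The formula given above, ascii standard code = (decimal input value / step) + offset, can be checked with a few lines of Python (decode_char is a hypothetical helper; integer division is assumed):

```python
def decode_char(value: int, ascii_step: int = 1, ascii_offset: int = 0) -> str:
    """Illustrative ascii-step / ascii-offset decoding of one character."""
    return chr(value // ascii_step + ascii_offset)


# ascii-step="4" ascii-offset="32": 148, 152, 156 decode to consecutive letters
print("".join(decode_char(v, 4, 32) for v in (148, 152, 156)))

# ascii-offset alone: decoded value 37 + 32 = 69
print(decode_char(37, ascii_offset=32))
```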

ex: ddonpach.xml
<elt size="6" type="text" id="NAME" byte-skip="odd" ascii-step="4" ascii-offset="32" charset="ddonpach"/>

elt loop related properties

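structure > loop > elt ?table-index="loop_index|loop_reverse_index|last|itself|<column>:index_from_value|<column>:value_from_index"

optional keyword used for an elt embedded in a loop, specifying its position in the resulting array in memory, and consequently in the output table (default: loop_index).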
possible values:
loop_index => the elt index is the loop iterator position
<loop count="5"><elt ... table-index="loop_index"/></loop>
loop step 0 -> elt index = 0
loop step 1 -> elt index = 1
loop step 2 -> elt index = 2
...
loop_reverse_index => the elt index is (loop max - 1 - loop iterator position), typically used when elt order is inverted
   <loop count="5"><elt ... table-index="loop_reverse_index"/></loop>
loop step 0 -> elt index = 4
loop step 1 -> elt index = 3
loop step 2 -> elt index = 2
loop step 3 -> elt index = 1
loop step 4 -> elt index = 0
...
last: the elt index is the value of the previous elt input data in the loop, typically used in a loop with multiple elts, the 1st one being the index or rank.
<loop count="5">
    <elt id="elt1" ... table-index="itself"/>
    <elt id="elt2" ... table-index="last"/>
</loop>
loop step 0 -> elt1 index = elt1 value = X
               elt2 index = index of last computed index = elt1 index = X
loop step 1 -> elt1 index = elt1 value = Y
               elt2 index = index of last computed index = elt1 index = Y
...
itself: the elt index is the value of the elt itself (see previous example)
<column>:index_from_value: the elt index is the value of another elt located at the same loop step
<loop count="10">
    <elt ... type="int" id="RANKPOINTER" .../>
</loop>
<loop count="10">
    <elt ... id="SCORE" table-index="RANKPOINTER:index_from_value" .../>
</loop>
loop step 0, RANKPOINTER[0].value = X -> SCORE index = X
loop step 1, RANKPOINTER[1].value = Y -> SCORE index = Y
...
<column>:value_from_index: the elt index is the index of another elt sharing the same value
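The first four table-index modes described above can be sketched in Python. This is an illustration of the index arithmetic only: elt_index and its parameters are hypothetical names, and the two <column>:... modes are left out.

```python
def elt_index(mode: str, step: int, count: int, value=None, last=None) -> int:
    """Illustrative table-index computation for one elt at one loop step."""
    if mode == "loop_index":
        return step
    if mode == "loop_reverse_index":
        return count - 1 - step
    if mode == "itself":
        return value              # the elt's own decoded value
    if mode == "last":
        return last               # the index computed for the previous elt
    raise ValueError(f"unsupported table-index mode: {mode}")


print([elt_index("loop_reverse_index", s, 5) for s in range(5)])

# A loop where the 1st elt is the rank ("itself") and the 2nd follows it ("last").
table = {}
entries = [(2, 500), (0, 900), (1, 700)]      # made-up (rank, score) pairs
for step, (rank, score) in enumerate(entries):
    rank_index = elt_index("itself", step, len(entries), value=rank)
    table[elt_index("last", step, len(entries), last=rank_index)] = score
print([table[i] for i in sorted(table)])
```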

ex: tempest.xml
<loop count="8">
<elt size="3" type="text" id="NAME" table-index="loop_reverse_index" endianness="little_endian" charset="tempest" ascii-offset="65"/>
</loop>

ex: 1942.xml
<loop count="25">
<elt size="1" type="int" id="RANK"    table-index="itself"/>
<elt size="4" type="int" id="SCORE"   table-index="last" base="16"/>
<elt size="8" type="text" id="NAME"    table-index="last" ascii-offset="55" charset="cs1942"/>
<elt size="1" type="int" id="STAGE"   table-index="last"/>
<elt size="2" type="raw" id="UNKNOWN" table-index="last"/> 
</loop>

ex: gng.xml
<loop count="10">
<elt size="2" type="int" id="RANKPOINTER" format="rp1;rp2" byte-skip="odd"/>
</loop>
<loop count="10">
<elt size="4" type="int" id="SCORE" table-index="RANKPOINTER:index_from_value" base="16"/>
<elt size="3" type="text" id="NAME" table-index="RANKPOINTER:index_from_value" charset="gng"/>
</loop>

structure > loop > elt > ?table-index-format="<format_identifier>|<direct_implicit_format>"
structure > loop > elt > ?table-index-format="<identifier1>;<identifier2>;..."

optional identifier specifying how to format the elt index.
the final index is the combination of the table-index and table-index-format properties.
for loop step 2, with elt table-index="loop_index", the elt preliminary index is the loop step = 2; table-index-format then transforms it using the specified format, meaning that the elt final index is 5 if the format is "*2;+1".
if the format identifier is simple enough, this format definition can be skipped as the program will automatically create it, computing its content from the identifier itself.
more than one format identifier can be specified, separated by ';'
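The chaining of implicit formats on an index can be sketched as follows. The parser below is hypothetical and handles only the "*<number>", "+<number>" and "-<number>" implicit formats with integer operands, applied from left to right:

```python
def apply_index_format(index: int, table_index_format: str) -> int:
    """Illustrative table-index-format: apply ';'-separated implicit formats."""
    for operation in table_index_format.split(";"):
        symbol, number = operation[0], int(operation[1:])
        if symbol == "*":
            index *= number
        elif symbol == "+":
            index += number
        elif symbol == "-":
            index -= number
        else:
            raise ValueError(f"unsupported implicit format: {operation}")
    return index


# Two elts interleaved in one table: "*2" fills the even rows, "*2;+1" the odd ones.
print([apply_index_format(step, "*2") for step in range(5)])
print([apply_index_format(step, "*2;+1") for step in range(5)])
```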

ex: klax.xml
<loop count="5">
<elt size="7" type="int" id="SCORE" table-index="loop_index" table-index-format="*2" bitmask="score_odd"/>
<elt size="6" type="text" id="NAME" table-index="loop_index" table-index-format="*2" bitmask="name_odd" decoding-profile="base-40" charset="klax" format="+dot"/>
<elt size="6" type="int" id="SCORE" table-index="loop_index" table-index-format="*2;+1" bitmask="score_even"/>
<elt size="4" type="text" id="NAME" table-index="loop_index" table-index-format="*2;+1" bitmask="name_even" decoding-profile="base-40" charset="klax" format="+dot"/>
</loop>

loop

structure > loop ?start="<integer>"

optional integer specifying the starting index value of the loop (default = 0).

ex: punchout.xml
<loop count="10" start="40">

structure > loop ?step="<integer>"

optional integer specifying the increment applied at each iteration of the loop (default = 1)

ex: -
-

structure > loop count="<integer>"

integer specifying the number of iterations of the loop.
final loop index is start+count*step, meaning by default 0+count*1 = count
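How start, step and count combine can be sketched in Python (loop_indexes is a hypothetical helper). It assumes that start+count*step is the exclusive end of the loop, i.e. the last index actually visited is start+(count-1)*step, and that step is positive.

```python
def loop_indexes(count: int, start: int = 0, step: int = 1) -> list:
    """Illustrative list of the index values taken by a loop."""
    return list(range(start, start + count * step, step))


print(loop_indexes(10, start=40))   # <loop count="10" start="40">
print(loop_indexes(3, step=2))
```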

ex: punchout.xml
<loop count="10" start="40">

structure > loop ?skip-last-bytes="<integer>"

[since version 1.2]
optional integer specifying the number of bytes at the end of the loop structure that must be ignored, because not dumped in the hi file.
it allows to define a loop of X entries, where the last entry is only partially dumped, instead of defining a loop of X-1 entries, and an additional loop of 1 entry, with only part of the previous loop columns.

ex: ket.xml
<loop count="10" skip-last-bytes="6">

structure > loop ?skip-first-bytes="<integer>"

[since version 1.2]
optional integer specifying the number of bytes at the beginning of the loop structure that must be ignored, because not dumped in the hi file.
it allows to define a loop of X entries, where the first entry is only partially dumped, instead of defining a loop of 1 entry, with only part of the next loop columns, and an additional loop of X-1 entries.

ex: ket.xml
<loop count="10" skip-first-bytes="2">

output
output fields are used to display an element as a single key/value pair.
output table and embedded columns are used to display all occurrences/indexes of a set of elements.

output id="<string>"

[since version 1.4]

output identifier allowing to link it with a structure. see 'structure' object for more information.

ex: fixeight.xml
<structure output="broken_topscore">...</structure>
<output id="broken_topscore">...</output>

field

output > field ?id="<field_identifier>"

optional identifier allowing to associate an output field to a specific input element sharing the same id, giving the value to be displayed
if no identifier is provided, the attribute 'src' or a non-empty content is expected to define the value to be displayed

ex: ddonpach.xml
<field id="TOP SCORE" format="*10" display="extra"/>

output > field ?src="<elt_id>|index|unsorted_index"

optional identifier specifying which element must be used as input to the field
if 'src' is specified, the input value of the field is taken from the related element, whether a field id is set or not, even using another element identifier.
'src' value can also be the keyword 'index', meaning that the current output table index is used as input to this field.
'src' value can also be the keyword 'unsorted_index', meaning that the current output table index, before any sorting, is used as input to this field.

ex: -

output > field ?format="<format_identifier>|<direct_implicit_format>"
output > field ?format="<identifier1>;<identifier2>;..."

optional reference to a format identifier describing how to format the field to display it.
if the format identifier is simple enough, this format definition can be skipped as the program will automatically create it, computing its content from the identifier itself.
(note: the special implicit format identifier "0x" means that "0x" will be added at the beginning of the value itself, to emphasize the hexadecimal representation of the output if needed)

see column format for more information
more than one format identifier can be specified, separated by ';'

ex: ddonpach.xml
<field id="TOP SCORE" format="*10" display="extra"/>

output > field ?display="always|extra|debug"

optional keyword specifying in which context this field must be displayed (default is always).
always: the field is always displayed
extra: the field is displayed only if extra information is requested, using -ra command-line argument.
debug: the field is displayed only if debug information is requested, using -rd command-line argument.

ex: ddonpach.xml
<field id="TOP SCORE" format="*10" display="extra"/>

output > field ?content

the field content can be non-empty to specify hard-coded string to be displayed.
see column content for more information.

ex: -
-

table

output > table @?id="<table_id>"

[since version 1.6]

optional attribute specifying the table identifier
it is used only when extracting data in xml

ex: sonicwi3.xml
<table id="2 PLAYERS" line-ignore="2P SCORE:0">
<column id="RANK" src="index" format="+1"/>
<column id="2P SCORE"/>
...

output > table @?line-ignore="<column_id>:<value>"

optional attribute specifying which lines must not be displayed
for example, line-ignore="SCORE:0" means that all lines of the output table that have a SCORE value equal to 0 must be skipped

ex: tempest.xml
<table line-ignore="SCORE:0">
<column id="RANK" format="+1" src="index"/>
<column id="NAME"/>
<column id="SCORE"/>
</table>

output > table ?line-ignore-operator="<operator>"

optional operator specifying the operator to be applied to know if a specific table line must be skipped, according to the line-ignore attribute (default is '==')
supported operators: <, ==, >, >=, <=, !=
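The interplay of line-ignore and line-ignore-operator can be sketched in Python. The function visible_lines and the row layout (one dict per table line) are hypothetical, and integer column values are assumed; a line is skipped when "<column value> <operator> <value>" is true.

```python
import operator

OPERATORS = {"<": operator.lt, "==": operator.eq, ">": operator.gt,
             ">=": operator.ge, "<=": operator.le, "!=": operator.ne}


def visible_lines(rows, line_ignore: str, line_ignore_operator: str = "=="):
    """Illustrative filter: drop the rows matched by the line-ignore test."""
    column, _, value = line_ignore.rpartition(":")
    must_skip = OPERATORS[line_ignore_operator]
    return [row for row in rows if not must_skip(row[column], int(value))]


rows = [{"SCORE": 500}, {"SCORE": 0}, {"SCORE": 99999999}]   # made-up data
print(visible_lines(rows, "SCORE:0"))
print(visible_lines(rows, "SCORE:99000000", ">"))
```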

ex: dino.xml
<table line-ignore-operator=">" line-ignore="SCORE:99000000">
<column id="RANK" format="+1" src="index"/>
<column id="SCORE"/>
<column id="NAME"/>
<column id="CHARACTER" format="character"/>
<column id="STAGE"/>
<column id="SPACE" display="debug"/>
<column id="UNKNOWN" format="0x" display="debug"/>
</table>

output > table ?sort="<column_id>"

optional identifier specifying that a column must be taken into account to sort the output table.
by default, the column values are converted to decimal or string (in this priority order) to do the sorting. it is rarely used, as sorting can be done in 99% of the cases during the input elements decoding, using the 'table-index' attribute, which is the recommended way to sort elements as it best fits the original display algorithm

ex: srumbler.xml
<table sort-order="asc" sort="SCORE RANK">
<column id="RANK" format="+1" src="index"/>
<column id="SCORE"/>
<column id="SCORE RANK" display="debug"/>
<column id="NAME"/>
<column id="NAME RANK" display="debug"/>
</table>

output > table @?sort-order="asc|desc"

optional keyword specifying what is the sort order to be used, if 'sort' attribute has been defined (default is 'asc')

output > table @?sort-format="<format_identifier>|<direct_implicit_format>"
output > table @?sort-format="<identifier1>;<identifier2>;..."

[since version 1.7]
optional identifier specifying how to format the sort column values.
if the format identifier is simple enough, this format definition can be skipped as the program will automatically create it, computing its content from the identifier itself.
more than one format identifier can be specified, separated by ';'

ex: kof2001.xml
<table id="WINS" sort="WIN %" sort-format="TrimR%" sort-order="desc">
  <column id="CHARACTER" src="unsorted_index" format="character"/>
  <column id="WIN %" src="WIN CHARACTER" format="win;Suffix%"/>
  <column id="TOTAL" display="debug"/>
</table>

output > table ?lines-max="<integer>"

optional integer specifying the maximum number of table lines to be displayed, in case 'line-ignore' attribute is not adapted, which is very rare

ex: carnival.xml
<table line-ignore="SCORE:0" lines-max="3">
<column id="RANK" format="+1" src="index"/>
<column id="SCORE" format="*10"/>
<column id="NAME"/>
</table>

output > table ?display="always|extra|debug"

[since version 1.6]

optional keyword specifying in which context this table must be displayed (default is always).
always: the table is always displayed
extra: the table is displayed only if extra information is requested, using -ra command-line argument.
debug: the table is displayed only if debug information is requested, using -rd command-line argument

column

output > table > column ?id="<column_identifier>"

optional identifier allowing to associate an output column to a specific input element sharing the same id, giving the value to be displayed
if no identifier is provided, the attribute 'src' or a non-empty content is expected to define the value to be displayed

ex: ddonpach.xml
<column id="SCORE" format="score"/>

output > table > column ?src="<elt_id>|index"

optional identifier specifying which element must be used as input to the column
if 'src' is specified, the input value of the column is taken from the related element, whether a column id is set or not, even using another element identifier.
'src' value can also be the keyword 'index', meaning that the current output table index is used as input to this column.
'src' value can also be the keyword 'unsorted_index', meaning that the current output table index, before any sorting, is used as input to this column.

ex: ddonpach.xml
<column id="RANK" src="index" format="+1"/>

output > table > column ?format="<format_identifier>|<direct_implicit_format>"
output > table > column ?format="<identifier1>;<identifier2>;..."

optional reference to a format identifier describing how to format the column to display it.
see format object definition below to understand the formatting possibilities.
if the format identifier is simple enough, this format definition can be skipped as the program will automatically create it, computing its content from the identifier itself.

operator	definition	example	introduction
*<number>	multiply
/<number>	divide	5800 / 60 => 96.66666
d<number>	divide and trunc as integer	5800 d 60 => 96
D<number>	divide and round to the nearest integer	5800 D 60 => 97	[since version 1.3]
-<number>	substract
+<number>	add
%<number>	remainder	5800 % 60 => 66666
><integer>	shift		[since version 1.2]
LC or Lowercase	lowercase		[since version 1.3]
UC or Uppercase	uppercase		[since version 1.2]
Capitalize	capitalize		[since version 1.3]
Round or R	round		[since version 1.3]
Trunc or T	trunc		[since version 1.3]
TrimL<character>	trim <character> from left side	TrimL0	[since version 1.3]
TrimR<character>	trim <character> from right side	TrimR0	[since version 1.3]
Trim<character>	trim <character> from both sides	Trim0	[since version 1.3]
PadL<integer><character>	pad <character> from left side, until a maximum of <integer> characters is reached	PadL60	[since version 1.3]
PadR<integer><character>	pad <character> from right side, until a maximum of <integer> characters is reached	PadR80	[since version 1.3]
Suffix<string>	concatenate <string> at the end, if not empty	Suffix%	[since version 1.3]
Prefix<string>	concatenate <string> at the beginning, if not empty	PrefixSTG	[since version 1.3]
LoopIndex	assign loop index to the current element		[since version 1.3]

more than one format identifier can be specified, separated by ';'.
(note: the special implicit format identifier "0x" means that "0x" will be added at the beginning of the value itself, to emphasize the hexadecimal representation of the output if needed).
see related format operations definition for more information.
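The three division flavors of the operator list ("/", "d" and "D") differ only in how the quotient is post-processed. A Python sketch with hypothetical function names, assuming non-negative values and "round half up" for "D":

```python
def divide(value: float, number: float) -> float:
    """'/<number>': plain division, the result can be a float."""
    return value / number


def divide_trunc(value: float, number: float) -> int:
    """'d<number>': division, keeping only the integer part."""
    return int(value / number)


def divide_round(value: float, number: float) -> int:
    """'D<number>': division, rounded to the nearest integer."""
    return int(value / number + 0.5)     # assumption: non-negative input


print(divide(5800, 60), divide_trunc(5800, 60), divide_round(5800, 60))
```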

ex: ddonpach.xml
<table>
<column id="RANK" src="index" format="+1"/>
...
</table>
<format id="+1"><add>1</add></format>

output > table > column ?display="always|extra|debug"

optional keyword specifying in which context this column must be displayed (default is always).
always: the column is always displayed
extra: the column is displayed only if extra information is requested, using -ra command-line argument.
debug: the column is displayed only if debug information is requested, using -rd command-line argument

output > table > column ?content

the column content can be non-empty to specify hard-coded string to be displayed.
in this case, column id is not necessary.

ex: spnchoutj.xml
<format id="percentage" input-as-subcolumns-input="true">
<concat>
    <column format="integer"/>
    <column>.</column>
    <column format="decimal"/>
    <column>%</column>
</concat>
</format>

format

format allows to transform a value or a set of values into another value, embedding a set of operations for this purpose.

format id="<identifier>"

identifier of the format, allowing it to be referenced, called by a field or column.

ex: spnchoutj.xml
<format id="percentage" input-as-subcolumns-input="true">
...
</format>

format ?formatter="<format_string>"

format string allowing to format the output string, after all embedded operations.
the formatter string syntax follows the Java format string syntax described in the Javadoc of the class 'Formatter' (it is not a regular expression).
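For a simple conversion such as "%.2f", Python's printf-style formatting behaves like the Java format string, so the effect of a formatter applied after a division by 100 can be previewed with the sketch below. It is an analogy only: the tool itself relies on Java's Formatter, and format_time is a hypothetical helper.

```python
def format_time(value: int, formatter: str = "%.2fsec") -> str:
    """Illustrative: divide by 100, then apply the formatter string."""
    return formatter % (value / 100)


print(format_time(1234))
print(format_time(950))
```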

ex: trackfld.xml
<format id="time" formatter="%.2fsec">
<divide>100</divide>
</format>

format ?apply-to="char|value"

keyword specifying if the format is applicable on the input value as a whole or on each 'character' of the input value (default is value).
it is targeting 'text' input elements.
supported keywords:
char => format is applied on each character of the input value
value => format is applied one time for the whole value
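The difference between the two keywords can be sketched in Python with a suffix operation (the function suffix and its parameters are hypothetical; an empty input is returned unchanged, in line with the suffix operation described in this reference):

```python
def suffix(value: str, text: str, apply_to: str = "value") -> str:
    """Illustrative suffix operation honoring apply-to="char|value"."""
    if not value:
        return value                                  # nothing is appended
    if apply_to == "char":
        return "".join(char + text for char in value)  # once per character
    return value + text                               # once for the whole value


print(suffix("ABC", ".", apply_to="char"))
print(suffix("ABC", ".", apply_to="value"))
```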

ex: klax.xml, where the format appends '.' after each character of the input value.
<format id="+dot" apply-to="char">
<suffix>.</suffix>
</format>

format ?input-as-subcolumns-input="yes|no"

boolean specifying that the input value of the format must not be used by the embedded operations directly, but as input value of the columns/fields contained inside these operations (default is no).
in this case, sub-columns/fields can skip the 'id' or 'src' attribute, as the input value for them is implicitly the value of the input calling the embedding format.
it allows to re-use multiple times the same input in different parts of a format, without having to define intermediate columns.
note that sub-columns/fields -embedded in an operation of this format- that are defining a non-empty content are not impacted as this content hardcodes itself the column input value.
note that operations of this format that do not use any sub-column/field also ignore this attribute, allowing to chain the result of operations inside a format, even if the format uses 'input-as-subcolumns-input' (see tgm2.xml format@id="medal"/trim). [since version 1.2]

supported format operations

format > ?add <integer>

integer added to the value of the column/field calling this format

ex: klax.xml
<format id="+1">
<add>1</add>
</format>

format > ?prefix <string> @empty <string> @consume <yes|no>

string to concatenate in front of the value of the column/field calling this format.
if the input value is empty, nothing is appended.
'empty' attribute allows to define what is considered as an empty input value (default is no value at all, but it can be set to "0" for example). [since version 1.2]
'consume' attribute indicates that an empty input value must be consumed by this operation, which will not return any output value (only useful when 'empty' attribute is set to something different than no value). [since version 1.2]
this operation can be implicitly defined when calling the format, using "Prefix<string>" (see column/field format attribute). [since version 1.3]

ex: turfmast.xml
<format id="+"><prefix>+</prefix></format>

ex: tgm2.xml => for medal value 0, nothing is appended and this '0' value is not returned/displayed
[since version 1.2]
<format id="medal_ac">
<prefix empty="0" consume="yes"> AC</prefix>
</format>
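
illustration only (a Python sketch, not hi2txt code): how 'empty' and 'consume' are described to interact for 'prefix'; the same logic would apply to 'suffix' with the string added at the end. The function name and parameter names are hypothetical.

```python
def prefix(value: str, text: str, empty: str = "", consume: bool = False) -> str:
    """Put `text` in front of `value`, unless `value` is considered empty."""
    if value == empty:
        # Empty input: nothing is added; with consume, the input itself is dropped too.
        return "" if consume else value
    return text + value


print(prefix("3", "+"))                                    # +3
print(repr(prefix("0", " AC", empty="0", consume=True)))   # ''
print(repr(prefix("0", " AC", empty="0", consume=False)))  # '0'
```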

format > ?suffix <string> @empty <string> @consume <yes|no>

string to concatenate at the end of the value of the column/field calling this format.
if the input value is empty, nothing is appended.
'empty' attribute allows to define what is considered as an empty input value (default is no value at all, but it can be set to "0" for example). [since version 1.2]
'consume' attribute indicates that an empty input value must be consumed by this operation, which will not return any output value (only useful when 'empty' attribute is set to something different than no value). [since version 1.2]
this operation can be implicitly defined when calling the format, using "Suffix<string>" (see column/field format attribute). [since version 1.3]

ex: klax.xml
<format id="+dot" apply-to="char">
<suffix>.</suffix>
</format>

format > ?multiply <integer>

integer by which the value of the column/field calling this format is multiplied.
this operation can be implicitly defined when calling the format, using "*<number>" (see column/field format attribute).

ex: turfmast.xml
<format id="-">
<multiply>-1</multiply>
<add>256</add>
<prefix>-</prefix>
</format>

format > ?divide <integer>

integer dividing the value of the column/field calling this format: the result can be a float.
if the result must keep only the decimal part, see the 'remainder' operation.
if the result must keep only the integer part, see the 'divide_trunc' operation.
if the result must be rounded to the nearest integer, see the 'divide_round' operation.
this operation can be implicitly defined when calling the format, using "/<number>" (see column/field format attribute).

ex: trackfld.xml
<format id="time" formatter="%.2fsec">
<divide>100</divide>
</format>
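
illustration only (a Python sketch, not hi2txt code): the trackfld.xml "time" format, assuming that the 'formatter' attribute is a printf-style pattern applied to the result of the operations. The function name format_time is hypothetical.

```python
def format_time(raw: int) -> str:
    """Divide by 100 (the result can be a float), then apply the "%.2fsec" pattern."""
    seconds = raw / 100          # <divide>100</divide>
    return "%.2fsec" % seconds   # formatter="%.2fsec" (assumed printf-style)


print(format_time(1234))  # 12.34sec
print(format_time(950))   # 9.50sec
```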

format > ?sum (<field>|<column>)+

list of columns/fields to sum all together.
these columns/fields must be numbers.
this operation can be implicitly defined when calling the format, using "+<number>" (see column/field format attribute).

ex: ddonpach.xml
<format id="score">
<sum>
<column id="SCORE1" format="*10"/>
<column id="SCORE2"/>
</sum>
</format>
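
illustration only (a Python sketch, not hi2txt code): the arithmetic performed by the ddonpach.xml "score" format, where SCORE1 is first formatted with "*10" and then summed with SCORE2. The row dictionary and the function name are hypothetical.

```python
def score(row: dict) -> int:
    """Sum the sub-columns after applying each sub-column's own format."""
    score1 = row["SCORE1"] * 10   # <column id="SCORE1" format="*10"/>
    score2 = row["SCORE2"]        # <column id="SCORE2"/>
    return score1 + score2


print(score({"SCORE1": 12345, "SCORE2": 6}))  # 123456
```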

format > ?concat (<field>|<column>|<txt>)+

list of columns/fields to be concatenated all together.
note: a specific 'txt' element can also be used to store hard-coded text with improved readability [since version 1.1.20140809].

ex: ddonpach.xml
<format id="area">
<concat>
    <column id="LOOP" format="default_loop;-"/>
    <column id="STAGE" format="default_stage"/>
</concat>
</format>

ex: kindmgp.xml
<format id="time" input-as-subcolumns-input="yes">
<concat>
    <column format="MN"/>
    <txt>'</txt>
    <column format="SEC"/>
    <txt>''</txt>
    <column format="CS"/>
</concat>
</format>

format > ?min (<field>|<column>)+

list of columns/fields from which the minimum value will be selected

ex: -
<format id="score">
<min>
    <field id="VALUE_1"/>
    <field id="VALUE_2"/>
    <field id="VALUE_3"/>
</min>
</format>

format > ?max (<field>|<column>)+

list of columns/fields from which the maximum value will be selected

ex: phoenix.xml
<format id="score">
<max>
    <field id="TOP SCORE ALT"/>
    <field id="SCORE 1 ALT"/>
    <field id="SCORE 2 ALT"/>
</max>
</format>

format > ?pad <string> @direction <left|right> @max <integer>

string added before or after ('direction' attribute) the value of the column/field calling this format, until the requested maximum number of characters ('max' attribute) is reached, but no more.
if the initial value already has at least the specified maximum number of characters, nothing is added.
this operation can be implicitly defined when calling the format, using "PadL<integer_for_max><character>", "PadR<integer_for_max><character>" (see column/field format attribute). [since version 1.3]

ex: spnchoutj.xml
<format id="ms">
<remainder>100</remainder>
<pad direction="left" max="2">0</pad>
</format>
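
illustration only (a Python sketch, not hi2txt code): the 'pad' behaviour as described above. The function name and parameter names are hypothetical, and the handling of a multi-character pad string (added only while it still fits) is an assumption.

```python
def pad(value: str, fill: str, direction: str, max_len: int) -> str:
    """Add `fill` on the left or right until `max_len` characters are reached, but no more."""
    if not fill:
        return value
    while len(value) + len(fill) <= max_len:
        value = fill + value if direction == "left" else value + fill
    return value


print(pad("7", "0", "left", 2))    # 07
print(pad("123", "0", "left", 2))  # 123
print(pad("AB", "-", "right", 5))  # AB---
```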

format > ?trim <string> @direction <left|right|both>

all consecutive occurrences of the specified string will be removed, starting from the start (direction="left"), the end (direction="right"), or both (direction="both" [since version 1.3]) of the value of the column/field calling this format.
this operation can be implicitly defined when calling the format, using "TrimL", "TrimR" or "Trim" (see column/field format attribute).

ex: twincobr.xml
<format id="trim">
<trim direction="left">0</trim>
</format>
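
illustration only (a Python sketch, not hi2txt code): the 'trim' behaviour as described above, removing consecutive occurrences of a string from one or both ends. The function name and parameter names are hypothetical.

```python
def trim(value: str, text: str, direction: str) -> str:
    """Remove consecutive occurrences of `text` from the left, right, or both ends."""
    if not text:
        return value
    if direction in ("left", "both"):
        while value.startswith(text):
            value = value[len(text):]
    if direction in ("right", "both"):
        while value.endswith(text):
            value = value[:-len(text)]
    return value


print(trim("000120", "0", "left"))  # 120
print(trim("00120", "0", "both"))   # 12
```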

format > ?substract <integer>

integer subtracted from the value of the column/field calling this format.
this operation can be implicitly defined when calling the format, using "-<number>" (see column/field format attribute).

ex: gng.xml
<format id="rp1">
<substract>44</substract>
</format>

format > ?remainder <integer>

integer dividing the value of the column/field calling this format: only the remainder part is kept.
if the result must keep integer and decimal parts, see the 'divide' operation.
if the result must keep only the integer part, see the 'divide_trunc' operation.
if the result must be rounded to the nearest integer, see the 'divide_round' operation.
this operation can be implicitly defined when calling the format, using "%<number>" (see column/field format attribute).

ex: spnchoutj.xml
<format id="ms">
<remainder>100</remainder>
<pad direction="left" max="2">0</pad>
</format>

format > ?trunc

convert a number into an integer, by skipping the decimal part.
this operation can be implicitly defined when calling the format, using "Trunc" (see column/field format attribute). [since version 1.3]

ex: -
<format id="integer">
<trunc/>
</format>

format > ?round

convert a number into the nearest rounded integer.
this operation can be implicitly defined when calling the format, using "Round" (see column/field format attribute). [since version 1.3]

ex: -
<format id="rounded">
<round/>
</format>

format > ?divide_trunc (<value>|field|column)*>

integer dividing the value of the column/field calling this format: only the integer part is kept.
if the result must keep integer and decimal parts, see the 'divide' operation.
if the result must keep only the decimal part, see the 'remainder' operation.
if the result must be rounded to the nearest integer, see the 'divide_round' operation.
this operation can be implicitly defined when calling the format, using "d<number>" (see column/field format attribute).

ex: spnchoutj.xml
<format id="sec">
<remainder>10000</remainder>
<divide_trunc>100</divide_trunc>
<pad direction="left" max="2">0</pad>
</format>
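
illustration only (a Python sketch, not hi2txt code): the spnchoutj.xml "sec" and "ms" formats applied in order to a non-negative raw value. The helper names are hypothetical, and the raw value used below is made up for the example.

```python
def pad_left(value: int, width: int, fill: str = "0") -> str:
    """Equivalent of <pad direction="left" max="width">fill</pad> for a single character."""
    return str(value).rjust(width, fill)


def sec(raw: int) -> str:
    value = raw % 10000       # <remainder>10000</remainder>
    value = value // 100      # <divide_trunc>100</divide_trunc> (non-negative input)
    return pad_left(value, 2)


def ms(raw: int) -> str:
    return pad_left(raw % 100, 2)   # <remainder>100</remainder>, then pad


print(sec(120507), ms(120507))  # 05 07
```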

format > ?divide_round (<value>|field|column)*>

[since version 1.3]
integer dividing the value of the column/field calling this format: the result is rounded to the nearest integer.
if the result must keep integer and decimal parts, see the 'divide' operation.
if the result must keep only the decimal part, see the 'remainder' operation.
if the result must keep only the integer part, see the 'divide_trunc' operation.
this operation can be implicitly defined when calling the format, using "D<number>" (see column/field format attribute).

ex: mushitam.xml
<format id="time" input-as-subcolumns-input="yes">
<concat>
<column format="d60;d60"/>
<txt>:</txt>
<column format="d60;%60;PadL20"/>
<txt>:</txt>
<column format="%60;*100;D60;PadL20"/>
</concat>
</format>
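
illustration only (a Python sketch, not hi2txt code): the three implicit format chains of the mushitam.xml "time" format, each one starting from the same input because of input-as-subcolumns-input="yes". It assumes the input is a non-negative counter at 60 units per second; the function names are hypothetical, and rounding half up in divide_round is an assumption.

```python
def divide_round(value: int, divisor: int) -> int:
    """Integer division rounded to the nearest integer (half up, non-negative input)."""
    return (2 * value + divisor) // (2 * divisor)


def pad_left_zero(value: int, width: int) -> str:
    return str(value).rjust(width, "0")


def time_format(frames: int) -> str:
    minutes = frames // 60 // 60                                        # "d60;d60"
    seconds = pad_left_zero(frames // 60 % 60, 2)                       # "d60;%60;PadL20"
    hundredths = pad_left_zero(divide_round(frames % 60 * 100, 60), 2)  # "%60;*100;D60;PadL20"
    return f"{minutes}:{seconds}:{hundredths}"


print(time_format(7530))  # 2:05:50
print(time_format(61))    # 0:01:02
```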

format > ?replace @src <string> @dst <string>

all occurrences of the 'src' string will be replaced by the 'dst' string

ex: suprmrio.xml
<format id="world">
<replace src="0" dst="W-"/>
</format>

format > ?shift

[since version 1.2]
input number is shifted [7] by the specified value.
for example, a logical shift of the bit sequence 0001 0111 (decimal 23) by one bit position to the left yields 0010 1110 (decimal 46).
this operation can be implicitly defined when calling the format, using "><number>" (see column/field format attribute).

ex:
<format id="medal">
<shift>2</shift>
</format>

format > ?lowercase

[since version 1.3]
all characters of input data are put in lower case.
this operation can be implicitly defined when calling the format, using "Lowercase" or "LC" (see column/field format attribute).

ex:
<format id="trial">
<lowercase/>
</format>

format > ?uppercase

[since version 1.2, behavior modified in 1.3]
all characters of input data are put in upper case.
this operation can be implicitly defined when calling the format, using "Uppercase" or "UC" (see column/field format attribute).

ex: guwange.xml
<column id="NAME H" format="transliteration;UC;parenthesis"/>

ex:
<format id="trial">
<uppercase/>
</format>

format > ?capitalize

[since version 1.3]
first character of input data is put in upper case.
this operation can be implicitly defined when calling the format, using "Capitalize" (see column/field format attribute).

ex:
<format id="trial">
<capitalize/>
</format>

format > ?loopindex

[since version 1.3]
the formatted element takes the value of the loop index, if it is inside a loop.
this operation is done after the "table-index" attribute is handled: the real data can first be used as the index inside the table (@table-index="itself"), and is then replaced by the loop index.
the main usage is to handle a list of pointers to the real data, so that they can be sorted.
this operation can be implicitly defined when calling the format, using "LoopIndex" (see column/field format attribute).

ex: inthehunt.xml
<structure file=".hi">
...
<loop count="12">
<elt size="2" type="int" id="POINTER" endianness="little_endian" table-index="itself" table-index-format="-384;/16" format="LoopIndex"/>
</loop>
</structure>
<output>
<table line-ignore="NAME:" sort="POINTER">
...
</table>
</output>

format > ?case*

operation converting an input value into a specific value.
multiple cases can be defined.
inside the same format, multiple groups of 'case' can be defined, separated by other operations, allowing to chain multiple operations, including 'case'. [since version 1.2]

format > ?case @src="?" @dst="?" @default="yes|no"

'src' attribute specifies which input value matches the case (the src value can be a value in base 10 (nn) or in base 16 (0xnn)).
'dst' attribute specifies the string into which the input will be converted, if the 'src' matches the input.
'default' attribute specifies which 'case' among the list of cases will be used if the input value doesn't match any case 'src'.

ex: 1941.xml
<format id="grade_mapping">
<case src="0" dst="SECOND LIEUTENANT" default="yes"/>
<case src="1" dst="FIRST LIEUTENANT"/>
<case src="2" dst="CAPTAIN"/>
<case src="3" dst="MAJOR"/>
<case src="4" dst="LIEUTENANT COLONEL"/>
<case src="5" dst="COLONEL"/>
<case src="6" dst="6"/>
<case src="7" dst="7"/>
</format>

ex: ad2083.xml
<format id="ad2083" apply-to="char">
<case src="0x40" dst=" "/>
<case src="0x5C" dst="."/>
</format>

ex: multiple and different groups of 'case'
<format id="medal">
<case src="A" dst="10"/>
<case src="B" dst="100"/>
<pad direction="left" max="3">0</pad>
<case src="000" dst=""/>
</format>
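
illustration only (a Python sketch, not hi2txt code): the "medal" example above, with two groups of 'case' separated by a 'pad'. It assumes that an input matching no 'case' of a group passes through unchanged; the function name is hypothetical.

```python
def medal(value: str) -> str:
    """Apply the first 'case' group, then 'pad', then the second 'case' group."""
    first_group = {"A": "10", "B": "100"}
    value = first_group.get(value, value)    # assumption: no match leaves the value unchanged
    value = value.rjust(3, "0")              # <pad direction="left" max="3">0</pad>
    second_group = {"000": ""}
    return second_group.get(value, value)


print(medal("A"))        # 010
print(medal("B"))        # 100
print(repr(medal("0")))  # ''
```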

format > ?case @operator <operator>

operator defining how to check if the input value matches the 'src' (default is '==')
supported operators: <, ==, >, >=, <=, !=

ex: turfmast.xml
<format id="score">
    <case src="0"                    dst="EVEN"/>
    <case src="240" operator="&lt;" format="+"/>
    <case src="240" operator=">=" format="-"/>
</format>
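
illustration only (a Python sketch, not hi2txt code): the turfmast.xml "score" format, using the "+" and "-" formats shown in the 'prefix' and 'multiply' sections. It assumes that cases are tested in order, that the first match wins, and that the comparison is "input operator src"; all names are hypothetical.

```python
import operator

# Supported operators, compared as: input <operator> src.
OPERATORS = {"<": operator.lt, "==": operator.eq, ">": operator.gt,
             ">=": operator.ge, "<=": operator.le, "!=": operator.ne}


def format_plus(value: int) -> str:
    return "+" + str(value)                # <prefix>+</prefix>


def format_minus(value: int) -> str:
    return "-" + str(value * -1 + 256)     # multiply -1, add 256, prefix "-"


CASES = [
    (0, "==", lambda value: "EVEN"),       # <case src="0" dst="EVEN"/>
    (240, "<", format_plus),               # operator "<",  format="+"
    (240, ">=", format_minus),             # operator ">=", format="-"
]


def score(value: int) -> str:
    for src, op, produce in CASES:
        if OPERATORS[op](value, src):      # assumption: first matching case wins
            return produce(value)
    return str(value)


print(score(0), score(3), score(250))  # EVEN +3 -6
```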

format > ?case @operator-format "<format_identifier>|<direct_implicit_format>"
format > ?case @operator-format "<identifier1>;<identifier2>;..."

identifier of the format applied to the input value before the match is evaluated.
input value + operator-format = formatted input value
formatted input value + operator + src value = 'case' matches or not
more than one format identifier can be specified, separated by ';'

ex: dariusg.xml
<case src="16" operator-format="-1792;%256" operator="&lt;" dst="ABDGKQW"/>
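
illustration only (a Python sketch, not hi2txt code): the matching step of the dariusg.xml case above, for inputs of at least 1792. Only the match is modelled here; the function name and the sample inputs are hypothetical.

```python
def case_matches(value: int) -> bool:
    """Format the input with "-1792;%256", then test: formatted input < 16."""
    formatted = (value - 1792) % 256   # operator-format="-1792;%256"
    return formatted < 16              # operator "<" with src="16"


print(case_matches(1800))  # True  (formatted input is 8)
print(case_matches(1892))  # False (formatted input is 100)
```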

format > ?case @format="<format_identifier>|<direct_implicit_format>"
format > ?case @format="<identifier1>;<identifier2>;..."

identifier allowing to format the input value to produce the result.
more than one format identifier can be specified, separated by ';'

ex: turfmast.xml
<format id="score">
    <case src="0"                    dst="EVEN"/>
    <case src="240" operator="<" format="+"/>
    <case src="240" operator=">=" format="-"/>
</format>

charset
charset is a set of translations from a raw value into a character (char).

charset id="<identifier>"

identifier of the charset, allowing it to be referenced, called by a field or column.

ex: ddonpach.j
<structure>
...
<loop count="5"><elt size="6" type="text" id="NAME" ... charset="ddonpach"/></loop>
...
</structure>
...
<charset id="ddonpach">
<char src="0x00" dst=" "/>
<char src="0x38" dst="."/>
</charset>

pre-defined charsets are available: see elt@charset section. [since version 1.2]

charset > char

operation converting an input character into an output character or string
multiple chars can be defined.

charset > char @src="" @dst="" ?@default="yes|no"

'src' attribute specifies which input value matches the char (the src value can be a value in base 10 (nn) or in base 16 (0xnn)).
'dst' attribute specifies the string into which the input will be converted, if the 'src' matches the input: it can be a single letter, an empty string, a string with multiple characters, including special characters.
'default' attribute specifies which 'char' among the list of chars will be used if the input value doesn't match any char 'src'.

ex: ddonpach.j
<charset id="ddonpach">
<char src="0x00" dst=" "/>
<char src="0x38" dst="."/>
</charset>
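
illustration only (a Python sketch, not hi2txt code): a charset seen as a lookup table from raw values to output strings, using the two 'char' entries above. Decoding unmapped values with chr() is an assumption made for this example only; the function name is hypothetical.

```python
DDONPACH = {0x00: " ", 0x38: "."}   # <char src="0x00" dst=" "/>, <char src="0x38" dst="."/>


def decode(raw: bytes, charset: dict) -> str:
    """Translate each raw byte through the charset; unmapped bytes fall back to chr()."""
    return "".join(charset.get(byte, chr(byte)) for byte in raw)


print(decode(bytes([0x41, 0x00, 0x42, 0x38]), DDONPACH))  # A B.
```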

ex: mooncrst.xml
<char src="0xFF" dst=""/> 

ex: gigawing.xml
<char src="0x063E" dst="Dr."/>
<char src="0x0642" dst=".Jr"/>
<char src="0x0646" dst="St."/>
<char src="0x07B6" dst="&black-heart;"/>
<char src="0x07BA" dst="&black-diamond;"/>
<char src="0x07BE" dst="&black-club;"/>

bitmask
a bitmask is a set of data selections at bit level (character), mainly targeting character extraction.

bitmask @id="<identifier>"

identifier of the bitmask, allowing it to be referenced, called by a field or column.

ex: rtype.xml
<elt size="3" type="int" id="SCORE" bitmask="score" base="16"/>
...
<bitmask id="score"> 
<character mask="00000000 11111111 00000000"/>
<character mask="11111111 00000000 00000000"/>
<character mask="00000000 00000000 11111111"/>
</bitmask>

bitmask > ?@byte-completion="yes|no"

default is yes.
enables or disables completion up to 8 bits (1 byte) for each individual 'character/mask' result.
when each character mask defines a single character and selects fewer than 8 bits, completion up to a full byte must be requested for each of these characters.
when a character mask selects fewer than 8 bits, but multiple characters are defined only to re-order bytes (ex: take 4 end bits, then 4 first bits, using 2 characters), this completion is inadequate.
note that the final output is always stored as a byte array, meaning autocompletion of the final result.

ex: slapfigh.xml
<bitmask id="score" byte-completion="no"> 
<character mask="00000000 00000000 00001111"/>
<character mask="00000000 11111111 00000000"/>
<character mask="11111111 00000000 00000000"/>
<character mask="00000000 00000000 11110000"/>
</bitmask>

bitmask > character

a character can be used to

skip unused bits
select individual bits to build each output character
re-order bits to be able to decode the full string
etc.

bitmask > character @mask="<mask>"

sequence of 0 and 1, allowing to skip/select specific bits from the full input value
each mask takes the full input data and creates an individual 'character' output (set of bits, with completion or not).
[IMPORTANT NOTE] each character works on the full input: it means that the mask must define all bits of the full entry. So, for a 3-character name (on 3 bytes), each character mask must define 3*8 bits and set enough bits to 1 to specify how to extract one character.
the outputs of all masks are concatenated together, at bit level.
note that the final output is always stored as a byte array, meaning autocompletion of the final result.

ex: rtype.xml
<bitmask id="score"> 
<character mask="00000000 11111111 00000000"/>
<character mask="11111111 00000000 00000000"/>
<character mask="00000000 00000000 11111111"/>
</bitmask>

ex: solarq.xml
<bitmask id="name">
<character mask="00000000 00111111 00000000 00000000"/>
<character mask="00111111 11000000 00000000 00000000"/>
<character mask="00000000 00000000 00000000 00111111"/>
</bitmask>

example: byte-completion = 'yes'
    input 00000000 11111111 10101010
    input + mask 1 "00001111 1111" => 00001111 (already 8 bits, so no need for completion)
    input + mask 2 "00000000 00000000 1111" => 1010 + completion => 00001010
    output "0000111100001010" already on 2 bytes

example: byte-completion = 'no'
    input 00000000 11111111 10101010
    input + mask 1 "00001111 1111" => 00001111
    input + mask 2 "00000000 00000000 1111" => 1010 (no completion)
    output "000011111010" meaning "00000000 11111010" on 2 bytes
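
illustration only (a Python sketch, not hi2txt code): the two examples above and the rtype.xml byte re-ordering, reproduced with a small function. The name apply_bitmask is hypothetical; completion is modelled as left-padding with zero bits, as in the examples.

```python
from typing import Sequence


def complete(bits: str) -> str:
    """Left-pad a bit string with zeros up to a multiple of 8 bits."""
    return bits.zfill((len(bits) + 7) // 8 * 8)


def apply_bitmask(data: bytes, masks: Sequence[str], byte_completion: bool = True) -> bytes:
    """Select bits of the full input with each mask and concatenate the results."""
    input_bits = "".join(f"{byte:08b}" for byte in data)
    output_bits = ""
    for mask in masks:
        mask = mask.replace(" ", "")
        selected = "".join(bit for bit, flag in zip(input_bits, mask) if flag == "1")
        output_bits += complete(selected) if byte_completion else selected
    output_bits = complete(output_bits)      # the final result is always whole bytes
    if not output_bits:
        return b""
    return int(output_bits, 2).to_bytes(len(output_bits) // 8, "big")


data = bytes([0b00000000, 0b11111111, 0b10101010])
masks = ["00001111 1111", "00000000 00000000 1111"]
print(apply_bitmask(data, masks, byte_completion=True).hex())   # 0f0a
print(apply_bitmask(data, masks, byte_completion=False).hex())  # 00fa
```

with the three rtype.xml masks, the same function returns the input bytes in the order 2, 1, 3.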

References

1. "A Tutorial on Data Representation", section 3.9 'Big Endian vs Little Endian', Chua Hock-Chuan
http://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html
2. "Slimming Strings With Custom Base-40 Packing", Al Williams
http://www.drdobbs.com/embedded-systems/slimming-strings-with-custom-base-40-pac/229400732
3. "ASCII", Wikipedia
http://en.wikipedia.org/wiki/US-ASCII
4. "Unicode entities", Wikipedia
http://en.wikibooks.org/wiki/Unicode
http://en.wikibooks.org/wiki/Unicode/Character_reference
http://en.wikibooks.org/wiki/Unicode/List_of_useful_symbols
http://unicode-table.com
5. "Binary-to-text encoding", Wikipedia
http://en.wikipedia.org/wiki/Binary-to-text_encoding
6. "Base 32", Wikipedia
http://en.wikipedia.org/wiki/Base32
7. "Shift operation", Wikipedia
http://en.wikipedia.org/wiki/Logical_shift

History

2020-03-13 v1.9: add decoding parameter 'nibble-trim', introduced in hi2txt v1.11
2019-09-01 v1.8: add decoding parameter 'byte-trunc', introduced in hi2txt v1.10
2017-01-23 more details about "bitmask character" behavior
2016-04-09 aligned with hi2txt v1.7
2015-11-28 aligned with hi2txt v1.6
...
2014-09-01 v1.2

XML description reference [version 1.9]