Daffodil / DAFFODIL-1516

dfdl:contentLength & dfdl:valueLength specifying lengthUnits 'characters' and variable-width encodings

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: DFDL Language
    • Labels: None

    Description

      Note that the DFDL workgroup is discussing the implications of asking for a length measured in units of 'characters' when the underlying item is not text, or is only partly text (complex types).

      There is no issue when the character set encoding is fixed width. One simply takes the data size in bytes/bits and does the math to convert to characters.
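
      As a minimal sketch of that arithmetic (Python; the width table and the function name are hypothetical illustrations, not Daffodil code), assuming the length in bytes is already known:

```python
# Hypothetical table: bytes per character for a few fixed-width encodings.
FIXED_WIDTH_BYTES = {"US-ASCII": 1, "ISO-8859-1": 1, "UTF-32BE": 4}


def length_in_characters(length_in_bytes: int, encoding: str) -> int:
    """Convert a byte length to a character length for a fixed-width encoding."""
    width = FIXED_WIDTH_BYTES[encoding]
    if length_in_bytes % width != 0:
        # The data does not end on a character boundary.
        raise ValueError(
            f"{length_in_bytes} bytes is not a whole number of {encoding} characters"
        )
    return length_in_bytes // width
```

      No inspection of the data itself is needed; the byte count and the encoding's width are enough.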

      The problem arises with a variable-width encoding such as UTF-8. Measuring length in characters essentially requires unparsing the data into those characters and counting them, or perhaps unparsing the data to bits/bytes, then parsing the result as characters and counting them.
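
      To illustrate why the byte length alone is not enough for UTF-8, here is a small Python sketch. The function name is hypothetical, and it counts Unicode code points, which is an assumption about what 'characters' would mean here:

```python
def utf8_length_in_characters(data: bytes) -> int:
    """Count characters by decoding the bytes; the byte count cannot tell us."""
    return len(data.decode("utf-8"))


# Three payloads, each exactly 6 bytes long in UTF-8.
ascii_text = "abcdef".encode("utf-8")             # 1 byte per character
umlauts = "\u00e4\u00f6\u00fc".encode("utf-8")    # 2 bytes per character
euros = "\u20ac\u20ac".encode("utf-8")            # 3 bytes per character

for payload in (ascii_text, umlauts, euros):
    print(len(payload), "bytes ->", utf8_length_in_characters(payload), "characters")
```

      The same 6 bytes can hold 6, 3, or 2 characters, so the data has to be decoded (or produced by unparsing) before it can be counted.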

      In either case, unless there is a uniform character encoding, the behavior is confusing. Other places in DFDL where data that is not necessarily text may be interpreted as text are lengthKind 'pattern' and the pattern asserts and pattern discriminators used in parsing.
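
      A small Python sketch of the confusion, assuming a 4-byte big-endian binary integer is reinterpreted as UTF-8 text (the function name is hypothetical and this is not how Daffodil behaves, only an illustration of the ambiguity):

```python
import struct
from typing import Optional


def try_count_characters(data: bytes) -> Optional[int]:
    """Return the UTF-8 character count, or None if the bytes are not valid UTF-8."""
    try:
        return len(data.decode("utf-8"))
    except UnicodeDecodeError:
        return None


# A binary integer whose bytes happen to be the ASCII letters 'ABCD'.
looks_like_text = struct.pack(">i", 0x41424344)
# A binary integer whose bytes (0xFF repeated) can never appear in UTF-8.
not_text = struct.pack(">i", -1)

print(try_count_characters(looks_like_text))
print(try_count_characters(not_text))
```

      Whether a non-text item has a length in characters at all depends on the values it happens to contain, which is the kind of behavior the discussion is concerned with.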

          People

            Assignee: Unassigned
            Reporter: Mike Beckerle (mbeckerle)
            Votes: 0
            Watchers: 1