Closed Bug 1835034 Opened 1 year ago Closed 28 days ago

Implement JIT Support for Float16Array

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
130 Branch
Tracking Status
firefox130 --- fixed

People

(Reporter: dminor, Assigned: anba)

References

(Blocks 1 open bug)

Details

Attachments

(14 files)

48 bytes, text/x-phabricator-request
Details | Review

From https://bugzilla.mozilla.org/show_bug.cgi?id=1833647#c12, a follow up to implement JIT support for Float16Array.

Severity: -- → S3
Priority: -- → P3

There's some discussion of optimizations here: https://github.com/tc39/proposal-float16array/issues/12

Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Depends on: 1905609

Add more conversion methods from upstream. Later patches will call the new
methods.

Add if constexpr to ElementSpecific::valueToNative to avoid compiler errors:
both js::float16::operator=(double) and js::float16::operator=(float) are
viable assignment operators when assigning from int64_t, so the assignment is
otherwise ambiguous.

Support vcvtph2ps and vcvtps2ph instructions from the F16C instruction set.

F16C requires AVX to be enabled; see the Intel developer manual, Vol. 1,
§14.4.1, "Detection of F16C Instructions".
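
On targets without F16C the same narrowing has to happen in software. The
following bit-level sketch models what vcvtps2ph computes under
round-to-nearest-even (illustrative Python, not SpiderMonkey code; the NaN
case is collapsed to a canonical quiet NaN, whereas the instruction preserves
payload bits):

```python
import struct

def f32_to_f16_bits(x):
    # Bit-level model of a float32 -> float16 narrowing with
    # round-to-nearest-even, i.e. what vcvtps2ph computes under the
    # default rounding mode.
    (u,) = struct.unpack('<I', struct.pack('<f', x))
    sign = (u >> 16) & 0x8000
    exp = (u >> 23) & 0xFF
    mant = u & 0x7FFFFF
    if exp == 0xFF:                      # infinity or NaN
        return (sign | 0x7C00) if mant == 0 else (sign | 0x7E00)
    e = exp - 127                        # unbiased exponent
    if e > 15:                           # too large for float16
        return sign | 0x7C00
    if e >= -14:                         # normal float16 range
        half = sign | ((e + 15) << 10) | (mant >> 13)
        rem = mant & 0x1FFF              # the 13 dropped bits
        if rem > 0x1000 or (rem == 0x1000 and (half & 1)):
            half += 1                    # a carry into the exponent is correct
        return half
    if e < -25:                          # below half the smallest subnormal
        return sign                      # signed zero
    sig = mant | 0x800000                # make the leading 1 explicit
    shift = -e - 1                       # round sig * 2**(e-23) to units of 2**-24
    half = sign | (sig >> shift)
    rem = sig & ((1 << shift) - 1)
    mid = 1 << (shift - 1)
    if rem > mid or (rem == mid and (half & 1)):
        half += 1
    return half
```

For example, 65520.0 lies exactly halfway between 65504 (the largest finite
float16) and 65536, and ties-to-even sends it to infinity.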

Depends on D215762

Upstream already has this fixed.

Depends on D215763

Support float in addition to double in storeCallFloatResult.

Depends on D215764

Hardware support for float16 conversions is limited:

  • ARM supports float32<>float16 conversions with Neon. This is not implemented yet.
  • ARM64 supports float32<>float16 and float64<>float16 conversions.
  • x86/x64 supports float32<>float16 conversions when F16C instructions are supported.

We use the following approach for this initial implementation:

  • Use native conversions when available; otherwise fall back to an ABI call.
  • Represent float16 as float32 throughout the JIT (so no MIRType::Float16 yet),
    because:
    1. float32 is supported on all targets, which minimizes cross-target
      differences,
    2. float32<>float16 conversions are natively supported on the main target
      platforms,
    3. actual float16 arithmetic has even more limited hardware support, so
      float16 values have to be converted to float32 or float64 at some point
      anyway.
  • This also lets us reuse the existing optimisations for MIRType::Float32.
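
Carrying float16 values as float32 is lossless, since every finite float16 is
exactly representable as a float32. A quick exhaustive check, using Python's
struct half-float support as a stand-in (a sketch, not engine code):

```python
import struct

def widening_is_lossless():
    # Every non-NaN float16 value survives a float16 -> float32 round trip
    # unchanged, so float32 can carry float16 values exactly.
    for bits in range(0x10000):
        (h,) = struct.unpack('<e', struct.pack('<H', bits))
        if h != h:
            continue  # skip NaNs; payloads aside, NaN compares unequal to itself
        (as_f32,) = struct.unpack('<f', struct.pack('<f', h))
        if as_f32 != h:
            return False
    return True
```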

The next part will start using the conversion methods from this patch.

Note 1: float64->float16 conversion can't be emulated by a
float64->float32->float16 conversion sequence, because rounding twice (first
to float32, then to float16) can produce a different result than the direct
float64->float16 conversion.
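
This double-rounding hazard can be demonstrated with Python's struct
half-float support; the constant below is one value where the two routes
disagree (a sketch, not SpiderMonkey code):

```python
import struct

def to_f16(x):
    # Round a float64 to the nearest float16 (ties to even), widen back.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_f32(x):
    # Round a float64 to the nearest float32.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# 1 + 2^-11 is exactly halfway between the float16 neighbours 1.0 and
# 1 + 2^-10; the tie-breaking 2^-40 is lost when narrowing to float32 first.
d = 1.0 + 2.0**-11 + 2.0**-40

direct   = to_f16(d)          # float64 -> float16: above the midpoint, rounds up
two_step = to_f16(to_f32(d))  # float64 -> float32 -> float16: exact tie, rounds to even
```

Here `direct` is 1 + 2^-10 while `two_step` is 1.0.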

Note 2: float f(int32_t) in "ABIFunctionType.yaml" requires an explicit General -> Float32
entry for the ARM simulator; just adding Int32 -> Float32 led to an error.

Depends on D215765

Inline Math.f16round similarly to how Math.fround is inlined:

  • CacheIRCompiler either calls the conversion methods from part 5
    or calls into the VM.
  • Warp transpiles to MToFloat16, whose implementation is similar to
    that of MToFloat32.
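
The semantics being inlined can be modelled in a few lines, again using
Python's struct half-float support (a hedged sketch of Math.f16round's
observable behaviour, not the engine implementation):

```python
import struct, math

def f16round(x):
    # Round a float64 to the nearest float16, then widen back to float64.
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        # Magnitudes beyond the float16 range round to infinity.
        return math.copysign(math.inf, x)
```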

Depends on D215766

Extend MacroAssembler::loadFromTypedArray to support loading from Float16Array.
This requires passing an additional temp register and a LiveRegisterSet when the
target doesn't natively support float32<>float16 conversions.

Codegen for LoadUnboxedScalar on x86/x64 looks like:

movzwl 0x0(%rdx,%rbx,2), %esi
vmovd %esi, %xmm0
vpmovzxwq %xmm0, %xmm0
vcvtph2ps %xmm0, %xmm0
vucomiss %xmm0, %xmm0
jnp .Lfrom120
movss .Lfrom128(%rip), %xmm0

And on ARM64:

ldr h0, [x2, x3, lsl #1]
fcvt s0, h0
fcmp s0, s0
b.vc -> 1015f
ldr s0, pc+24 (addr 0x70c2b0a96224) ; .const nan
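
Both sequences load a 16-bit element, widen it, and canonicalize NaNs. The
same semantics in Python (a sketch; the helper name is illustrative):

```python
import struct

def load_float16_element(buf, i):
    # Read element i (little-endian) from a Float16Array's backing buffer
    # and widen it, as the JIT widens float16 loads to float32.
    (value,) = struct.unpack_from('<e', buf, 2 * i)
    # Canonicalize NaN, mirroring the compare-and-branch tails in the
    # codegen sequences above.
    return float('nan') if value != value else value
```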

Depends on D215767

Extend the existing DataView code to also support Float16, using similar
changes as the previous part.
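
The DataView path additionally honours a byte offset and an endianness flag;
roughly (a sketch of getFloat16's observable semantics, helper name
illustrative):

```python
import struct

def get_float16(buffer, byte_offset, little_endian=False):
    # DataView.prototype.getFloat16: read two bytes at byte_offset and
    # interpret them as an IEEE 754 half; big-endian unless the flag is set.
    fmt = '<e' if little_endian else '>e'
    return struct.unpack_from(fmt, buffer, byte_offset)[0]
```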

Depends on D215768

Slightly larger changes compared to the previous two parts, because
MacroAssembler::storeToTypedFloatArray had to be changed to perform the
conversions itself instead of leaving them to its caller:

  • CacheIRCompiler::emitStoreTypedArrayElement used ScratchFloat32Scope to
    convert double -> float32, but the same approach won't work for float16,
    because ScratchFloat32Scope is also needed in MacroAssembler::storeFloat16
    to convert float32 -> float16.
  • Therefore the double -> float32 conversion moves into StoreToTypedFloatArray,
    and the double -> float16 conversion into MacroAssembler::storeFloat16.

Codegen for StoreUnboxedScalar on x64 looks like:

vcvtps2ph $0x4, %xmm0, %xmm15
vmovd %xmm15, %r11d
movw %r11w, 0x0(%rdx,%rbx,2)

And on ARM64:

fcvt h31, s0
str h31, [x2, x4, lsl #1]
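
The convert-and-store sequences above narrow the value and write two bytes;
in Python terms (helper name illustrative; out-of-range values narrow to
infinity under round-to-nearest):

```python
import struct, math

def store_float16_element(buf, i, value):
    # Narrow a float64 to float16 and write it little-endian at element i.
    try:
        struct.pack_into('<e', buf, 2 * i, value)
    except OverflowError:
        # Magnitudes beyond the float16 range narrow to same-signed infinity.
        struct.pack_into('<e', buf, 2 * i, math.copysign(math.inf, value))
```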

Depends on D215769

Transpiler and type policies add the following instructions when reading and then
storing a value from a Float16Array:

value = MLoadUnboxedScalar(f16array)
guarded_value = MToDouble(value) <-- Inserted by WarpCacheIRTranspiler
typed_value = MToFloat16(guarded_value) <-- Inserted by StoreUnboxedScalarPolicy
MStoreUnboxedScalar(f16array, typed_value)

Neither MToDouble nor MToFloat16 are needed, so let MToFloat16::foldsTo remove them.

This extra folding is needed because we don't yet have a MIRType::Float16 which we
can handle in MToFloat16::foldsTo.

The WarpCacheIRTranspiler change is an optimisation to avoid generating the following
instructions during transpiling and applying the type policy:

value = MLoadUnboxedScalar(f16array)
double_value = MToDouble(value) <-- Inserted by js::jit::AlwaysBoxAt
boxed_value = MBox(double_value) <-- Inserted by BoxPolicy
unboxed_value = MUnbox(boxed_value, Double) <-- Inserted by WarpCacheIRTranspiler

GVN will remove the MBox->MUnbox sequence, but it seems preferable to avoid generating it
in the first place.
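
A minimal sketch of the folding idea; the node class and opcode names here
are illustrative stand-ins, not SpiderMonkey's actual MIR API:

```python
class Node:
    def __init__(self, op, operand=None):
        self.op = op
        self.operand = operand

def to_float16_folds_to(node):
    # ToFloat16(ToDouble(x)) == x when x already carries a float16 value:
    # widening float16 -> double is exact, so narrowing back is a no-op.
    # Look through the ToDouble and drop both conversions.
    assert node.op == 'ToFloat16'
    input_node = node.operand
    if input_node is not None and input_node.op == 'ToDouble':
        input_node = input_node.operand
    if input_node is not None and input_node.op == 'LoadFloat16':
        return input_node
    return node
```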

Depends on D215771

Remove the TODO note about adding Float16Array JIT support by renaming
OutOfLineLoadTypedArrayOutOfBounds to OutOfLineAsmJSLoadHeapOutOfBounds,
which makes it clearer that Float16 support isn't needed here.

Depends on D215772

Before this change:

[Codegen] vucomiss   %xmm0, %xmm0
[Codegen] jnp        .Lfrom214
[Codegen] movss       .Lfrom222(%rip), %xmm0

After this change:

[Codegen] vucomiss   %xmm0, %xmm0
[Codegen] jnp        .Lfrom214
[Codegen] movss      .Lfrom222(%rip), %xmm0

Note how the label identifiers are now properly aligned.

Depends on D215773

ToFloat32(ToDouble(float32)) is exactly equal to float32, so MToDouble can
produce Float32 when its input can produce Float32. This change is necessary to
enable Float32 optimizations for various instructions, for example MSqrt.

Without this change Float32 optimizations are always disabled, which makes it
hard to verify that Float16 operations correctly handle Float32 inputs and
outputs.
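
The identity can be checked directly. In Python every float is a float64, so
a float32 value is modelled by narrowing first (a sketch, not engine code):

```python
import struct

def to_f32(x):
    # Model a float32 value as the nearest-float32 image of a float64.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# ToFloat32(ToDouble(f)) == f for every float32 f: widening to double is
# exact, so narrowing back returns the original value.
for f in (0.0, -0.0, 1.5, to_f32(0.1), 3.4028234663852886e+38, float('inf')):
    assert to_f32(float(f)) == f
```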

Depends on D215774

Mentor: dminor
Pushed by andre.bargull@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/410329e58599
Part 1: Add float/int conversions to js::float16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/9d01c98d3d64
Part 2: Support encoding vcvtph2ps and vcvtps2ph instructions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/ade51dbcc573
Part 3: Add missing to Float16 conversion when simulating FCVT_dh. r=jandem
https://hg.mozilla.org/integration/autoland/rev/3f2cb72c6348
Part 4: Support `float` results in storeCallFloatResult. r=jandem
https://hg.mozilla.org/integration/autoland/rev/c6ec3155a5f8
Part 5: Add {Double,Float32,Int32}ToFloat16 conversion methods to MacroAssemblers. r=jandem
https://hg.mozilla.org/integration/autoland/rev/f8cdec89d3cc
Part 6: Inline Math.f16round. r=jandem
https://hg.mozilla.org/integration/autoland/rev/d5ea08f74244
Part 7: Inline loading from Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/816e1e8497d5
Part 8: Inline DataView.prototype.getFloat16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/1543e18cfd43
Part 9: Inline storing into Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/b39efb191c8e
Part 10: Inline DataView.prototype.setFloat16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/10fb376db8cf
Part 11: Fold ToFloat16 when the input is guaranteed to be Float16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/1006c87386c1
Part 12: AsmJS codegen doesn't support Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/a67dc538eaee
Part 13: Fix indentation for codegen spew. r=jandem
https://hg.mozilla.org/integration/autoland/rev/f0bc3536dce7
Part 14: Enable float32 optimizations for MToDouble. r=jandem,nbp
https://hg.mozilla.org/integration/autoland/rev/e0c206023ab0
apply code formatting via Lando