Implement JIT Support for Float16Array
Categories
(Core :: JavaScript Engine, enhancement, P3)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox130 | --- | fixed |
People
(Reporter: dminor, Assigned: anba)
References
(Blocks 1 open bug)
Details
Attachments
(14 files)
From https://bugzilla.mozilla.org/show_bug.cgi?id=1833647#c12, a follow-up to implement JIT support for Float16Array.
Updated•1 year ago
|
Reporter | ||
Comment 1•4 months ago
|
||
There's some discussion of optimizations here: https://github.com/tc39/proposal-float16array/issues/12
Assignee | ||
Updated•1 month ago
|
Assignee | ||
Comment 2•1 month ago
|
||
Add more conversion methods from upstream. Later patches will call the new methods.
Add `if constexpr` to `ElementSpecific::valueToNative` to avoid compiler errors, because both `js::float16::operator=(double)` and `js::float16::operator=(float)` are applicable assignment operators when assigning from `int64_t`.
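The ambiguity can be reproduced outside the engine. The following is a minimal sketch, not the actual patch: `Half` is a hypothetical stand-in for `js::float16`, `ToHalf` is a hypothetical analogue of `ElementSpecific::valueToNative`, and converting integers through `double` is an assumption of this sketch.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for js::float16: it has assignment operators for
// double and float, but none for integer types.
struct Half {
  float value = 0.0f;  // stored as float only to keep the sketch short

  Half& operator=(double d) {
    value = static_cast<float>(d);
    return *this;
  }
  Half& operator=(float f) {
    value = f;
    return *this;
  }
};

// Hypothetical analogue of valueToNative.
template <typename T>
Half ToHalf(T input) {
  Half result;
  if constexpr (std::is_integral_v<T>) {
    // Writing `result = input` here would not compile for int64_t:
    // int64_t -> double and int64_t -> float are equally ranked conversions,
    // so overload resolution is ambiguous. An explicit cast picks one.
    result = static_cast<double>(input);
  } else {
    // float and double each match one operator exactly.
    result = input;
  }
  return result;
}
```

Because the branch that does not apply is discarded at compile time, the plain assignment is never instantiated for integer inputs, for example `ToHalf<int64_t>(3)`.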
Assignee | ||
Comment 3•1 month ago
|
||
Support `vcvtph2ps` and `vcvtps2ph` instructions from the F16C instruction set.
F16C requires AVX being enabled per "Intel developer manual, Vol 1, §14.4.1 Detection of F16C Instructions".
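A rough sketch of that detection logic, written as a pure function over already-read register values so it does not depend on any particular CPUID intrinsic. The function and constant names are hypothetical and not taken from the patch, and additionally requiring the AVX feature bit is a conservative assumption of this sketch.

```cpp
#include <cstdint>

// Feature bits in CPUID.(EAX=1):ECX.
constexpr uint32_t kOsxsaveBit = 1u << 27;
constexpr uint32_t kAvxBit = 1u << 28;
constexpr uint32_t kF16cBit = 1u << 29;

// XCR0 bits 1 (SSE state) and 2 (AVX state) must both be set by the OS.
constexpr uint64_t kXcr0SseAvxMask = 0x6;

// cpuid1Ecx: ECX after CPUID with EAX=1.
// xcr0: result of XGETBV(0); callers pass 0 when OSXSAVE is not set,
// because XGETBV must not be executed in that case.
constexpr bool CanUseF16C(uint32_t cpuid1Ecx, uint64_t xcr0) {
  if (!(cpuid1Ecx & kOsxsaveBit)) {
    return false;
  }
  if ((xcr0 & kXcr0SseAvxMask) != kXcr0SseAvxMask) {
    return false;
  }
  return (cpuid1Ecx & kAvxBit) != 0 && (cpuid1Ecx & kF16cBit) != 0;
}
```

With this shape, a CPU that reports the F16C bit but runs under an OS that has not enabled AVX state yields `false`.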
Depends on D215762
Assignee | ||
Comment 4•1 month ago
|
||
Upstream already has this fixed.
Depends on D215763
Assignee | ||
Comment 5•1 month ago
|
||
Support `float` in addition to `double` in `storeCallFloatResult`.
Depends on D215764
Assignee | ||
Comment 6•1 month ago
|
||
Hardware support for float16 conversions is limited:
- ARM supports float32<>float16 conversions with Neon. This is not implemented yet.
- ARM64 supports float32<>float16 and float64<>float16 conversions.
- x86/x64 supports float32<>float16 conversions when F16C instructions are supported.
We use the following approach for this initial implementation:
- Use supported conversions when available, otherwise fall back to an ABI call.
- Represent float16 as float32 throughout the JIT (so no `MIRType::Float16` yet), because:
  - float32 is supported for all targets, so we reduce cross-target differences when choosing the data type which is universally supported,
  - float32<>float16 conversions are natively supported for the main target platforms,
  - actual float16 math operations have even more limited hardware support, so we need to convert float16 to either float32 or float64 anyway at some point.
- And this also enables using the existing optimisations for `MIRType::Float32`.
The next part will start using the conversion methods from this patch.
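For targets without native conversions, the fallback has to do the widening in software. A minimal sketch of a float16 to float32 widening on raw bits follows; the function name is hypothetical and this is not the engine's `js::float16` code, only an illustration of what such a fallback computes.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Widen IEEE 754 binary16 bits to a float. Every float16 value is exactly
// representable as float32, so no rounding is involved.
float HalfBitsToFloat(uint16_t bits) {
  uint32_t sign = (bits >> 15) & 0x1;
  uint32_t exponent = (bits >> 10) & 0x1F;
  uint32_t mantissa = bits & 0x3FF;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // Infinity or NaN (the NaN payload is not preserved in this sketch).
    magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                         : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(mantissa | 0x400),
                           static_cast<int>(exponent) - 25);
  }
  return sign ? -magnitude : magnitude;
}
```

For example, `HalfBitsToFloat(0x3C00)` is `1.0f` and `HalfBitsToFloat(0x7BFF)` is `65504.0f`, the largest finite float16 value.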
Note 1: `float64->float16` conversion can't be emulated through a `float64->float32->float16` conversion sequence, because the sequence `float64->float32` and `float32->float16` can round differently than the direct `float64->float16` conversion.
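A small worked case of this double rounding, as a sketch: the helper below models float16 rounding only for inputs in [1, 2), where float16 values are spaced 2^-10 apart, and it relies on the default round-to-nearest-even floating-point mode. All names are hypothetical.

```cpp
#include <cmath>

// Round x to the nearest float16 value, ties to even. Valid only for x in
// [1, 2): scaling by 1024 is exact, and nearbyint honours the current
// rounding mode, which is round-to-nearest-even by default.
double RoundToHalfGrid(double x) { return std::nearbyint(x * 1024.0) / 1024.0; }

// Direct float64 -> float16.
double DirectToHalf(double x) { return RoundToHalfGrid(x); }

// float64 -> float32 -> float16.
double ViaFloat32ToHalf(double x) {
  float narrowed = static_cast<float>(x);
  return RoundToHalfGrid(static_cast<double>(narrowed));
}

// Slightly above the midpoint between 1.0 and the next float16 (1 + 2^-10).
double DoubleRoundingSample() {
  return 1.0 + std::ldexp(1.0, -11) + std::ldexp(1.0, -40);
}
```

The sample lies just above the midpoint, so the direct conversion rounds up to 1 + 2^-10. Narrowing to float32 first drops the 2^-40 term and lands exactly on the midpoint, and the tie then rounds to the even neighbour 1.0.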
Note 2: `float f(int32_t)` in "ABIFunctionType.yaml" requires an explicit `General -> Float32` entry for the ARM simulator, just adding `Int32 -> Float32` led to an error.
Depends on D215765
Assignee | ||
Comment 7•1 month ago
|
||
Inline `Math.f16round` similar to how `Math.fround` is inlined:
- CacheIRCompiler either calls the conversion methods from part 5 or calls into the VM.
- Warp transpiles to `MToFloat16`, which has a similar implementation as `MToFloat32`.
Depends on D215766
Assignee | ||
Comment 8•1 month ago
|
||
Extend `MacroAssembler::loadFromTypedArray` to support loading from Float16Array.
This requires passing an additional temp-register and `LiveRegisterSet` when the target doesn't natively support float32<>float16 conversions.
Codegen for `LoadUnboxedScalar` on x86/x64 looks like:
movzwl 0x0(%rdx,%rbx,2), %esi
vmovd %esi, %xmm0
vpmovzxwq %xmm0, %xmm0
vcvtph2ps %xmm0, %xmm0
vucomiss %xmm0, %xmm0
jnp .Lfrom120
movss .Lfrom128(%rip), %xmm0
And on ARM64:
ldr h0, [x2, x3, lsl #1]
fcvt s0, h0
fcmp s0, s0
b.vc -> 1015f
ldr s0, pc+24 (addr 0x70c2b0a96224) ; .const nan
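The trailing self-compare, branch and constant load in both listings canonicalize NaN values after the conversion. A sketch of the same step in C++, assuming the standard library's quiet NaN as a stand-in for the engine's canonical NaN constant (helper names are hypothetical):

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// A value compares unequal to itself only when it is NaN, which is what the
// unordered self-compare (vucomiss / fcmp) detects. Any NaN is replaced by
// one fixed NaN constant; other values pass through unchanged.
float CanonicalizeNaN(float value) {
  return value != value ? std::numeric_limits<float>::quiet_NaN() : value;
}

// Helpers to inspect and build raw bit patterns.
uint32_t BitsOf(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

float FromBits(uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

A NaN with an arbitrary payload, such as `FromBits(0x7FC12345)`, comes out with the same bits as the fixed constant.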
Depends on D215767
Assignee | ||
Comment 9•1 month ago
|
||
Extend the existing DataView code to also support Float16, using similar
changes as the previous part.
Depends on D215768
Assignee | ||
Comment 10•1 month ago
|
||
Slightly larger changes when compared to the previous two parts, because `MacroAssembler::storeToTypedFloatArray` had to be changed to support conversions instead of performing conversion in its caller:
- `CacheIRCompiler::emitStoreTypedArrayElement` used `ScratchFloat32Scope` to convert `double -> float32`, but using the same approach won't work for float16, because `ScratchFloat32Scope` is also needed in `MacroAssembler::storeFloat16` to convert `float32 -> float16`.
- Therefore move the conversion `double -> float32` into `StoreToTypedFloatArray`.
- And the conversions `double -> float16` into `MacroAssembler::storeFloat16`.
Codegen for `StoreUnboxedScalar` on x64 looks like:
vcvtps2ph $0x4, %xmm0, %xmm15
vmovd %xmm15, %r11d
movw %r11w, 0x0(%rdx,%rbx,2)
And on ARM64:
h31, s0
h31, [x2, x4, lsl #1]
Depends on D215769
Assignee | ||
Comment 11•1 month ago
|
||
Depends on D215770
Assignee | ||
Comment 12•1 month ago
|
||
Transpiler and type policies add the following instructions when reading and then
storing a value from a Float16Array:
value = MLoadUnboxedScalar(f16array)
guarded_value = MToDouble(value) <-- Inserted by WarpCacheIRTranspiler
typed_value = MToFloat16(guarded_value) <-- Inserted by StoreUnboxedScalarPolicy
MStoreUnboxedScalar(f16array, typed_value)
Neither `MToDouble` nor `MToFloat16` is needed, so let `MToFloat16::foldsTo` remove them.
This extra folding is needed because we don't yet have a `MIRType::Float16` which we can handle in `MToFloat16::foldsTo`.
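The folding rule can be sketched on a toy IR. This is an illustration only: the `Node` type, the `Op` enum and both functions are hypothetical and far simpler than the real MIR classes. The fold is valid because widening a float16 value to double and narrowing it back yields the same value.

```cpp
// Toy operation kinds, loosely mirroring the MIR instructions named above.
enum class Op { LoadUnboxedScalarFloat16, ToDouble, ToFloat16, Other };

struct Node {
  Op op;
  Node* input;  // nullptr for leaf nodes
};

// True when the node's result is already a float16 value (represented as
// float32), so converting it to float16 again changes nothing.
bool IsKnownFloat16(const Node* node) {
  return node->op == Op::LoadUnboxedScalarFloat16 || node->op == Op::ToFloat16;
}

// Returns the node that should replace `node`, or `node` itself when no
// folding applies.
Node* FoldToFloat16(Node* node) {
  if (node->op != Op::ToFloat16 || !node->input) {
    return node;
  }
  Node* source = node->input;
  // Look through one widening ToDouble; it cannot change a float16 value.
  if (source->op == Op::ToDouble && source->input) {
    source = source->input;
  }
  return IsKnownFloat16(source) ? source : node;
}
```

Applied to the load, `ToDouble`, `ToFloat16` chain above, the fold returns the load node directly, so the store can consume it without either conversion.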
The WarpCacheIRTranspiler change is an optimisation to avoid generating the following
instructions during transpiling and applying the type policy:
value = MLoadUnboxedScalar(f16array)
double_value = MToDouble(value) <-- Inserted by js::jit::AlwaysBoxAt
boxed_value = MBox(double_value) <-- Inserted by BoxPolicy
unboxed_value = MUnbox(boxed_value, Double) <-- Inserted by WarpCacheIRTranspiler
GVN will remove the MBox->MUnbox sequence, but it seems preferable to avoid generating it
in the first place.
Depends on D215771
Assignee | ||
Comment 13•1 month ago
|
||
Remove the TODO note about adding Float16Array JIT support by renaming `OutOfLineLoadTypedArrayOutOfBounds` to `OutOfLineAsmJSLoadHeapOutOfBounds`, which makes it clearer that Float16 support isn't needed here.
Depends on D215772
Assignee | ||
Comment 14•1 month ago
|
||
Before this change:
[Codegen] vucomiss %xmm0, %xmm0
[Codegen] jnp .Lfrom214
[Codegen] movss .Lfrom222(%rip), %xmm0
After this change:
[Codegen] vucomiss %xmm0, %xmm0
[Codegen] jnp .Lfrom214
[Codegen] movss .Lfrom222(%rip), %xmm0
Note how the label identifiers are now properly aligned.
Depends on D215773
Assignee | ||
Comment 15•1 month ago
|
||
`ToFloat32(ToDouble(float32))` is exactly equal to `float32`, so `MToDouble` can produce Float32 when its input can produce Float32. This change is necessary to enable Float32 optimizations for various instructions, for example `MSqrt`.
Without this change Float32 optimizations are always disabled, which makes it hard to verify that Float16 operations correctly handle Float32 inputs and outputs.
Depends on D215774
Reporter | ||
Updated•29 days ago
|
Comment 16•29 days ago
|
||
Pushed by andre.bargull@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/410329e58599 Part 1: Add float/int conversions to js::float16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/9d01c98d3d64 Part 2: Support encoding vcvtph2ps and vcvtps2ph instructions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/ade51dbcc573 Part 3: Add missing to Float16 conversion when simulating FCVT_dh. r=jandem
https://hg.mozilla.org/integration/autoland/rev/3f2cb72c6348 Part 4: Support `float` results in storeCallFloatResult. r=jandem
https://hg.mozilla.org/integration/autoland/rev/c6ec3155a5f8 Part 5: Add {Double,Float32,Int32}ToFloat16 conversion methods to MacroAssemblers. r=jandem
https://hg.mozilla.org/integration/autoland/rev/f8cdec89d3cc Part 6: Inline Math.f16round. r=jandem
https://hg.mozilla.org/integration/autoland/rev/d5ea08f74244 Part 7: Inline loading from Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/816e1e8497d5 Part 8: Inline DataView.prototype.getFloat16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/1543e18cfd43 Part 9: Inline storing into Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/b39efb191c8e Part 10: Inline DataView.prototype.setFloat16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/10fb376db8cf Part 11: Fold ToFloat16 when the input is guaranteed to be Float16. r=jandem
https://hg.mozilla.org/integration/autoland/rev/1006c87386c1 Part 12: AsmJS codegen doesn't support Float16Array. r=jandem
https://hg.mozilla.org/integration/autoland/rev/a67dc538eaee Part 13: Fix indentation for codegen spew. r=jandem
https://hg.mozilla.org/integration/autoland/rev/f0bc3536dce7 Part 14: Enable float32 optimizations for MToDouble. r=jandem,nbp
https://hg.mozilla.org/integration/autoland/rev/e0c206023ab0 apply code formatting via Lando
Comment 17•28 days ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/410329e58599
https://hg.mozilla.org/mozilla-central/rev/9d01c98d3d64
https://hg.mozilla.org/mozilla-central/rev/ade51dbcc573
https://hg.mozilla.org/mozilla-central/rev/3f2cb72c6348
https://hg.mozilla.org/mozilla-central/rev/c6ec3155a5f8
https://hg.mozilla.org/mozilla-central/rev/f8cdec89d3cc
https://hg.mozilla.org/mozilla-central/rev/d5ea08f74244
https://hg.mozilla.org/mozilla-central/rev/816e1e8497d5
https://hg.mozilla.org/mozilla-central/rev/1543e18cfd43
https://hg.mozilla.org/mozilla-central/rev/b39efb191c8e
https://hg.mozilla.org/mozilla-central/rev/10fb376db8cf
https://hg.mozilla.org/mozilla-central/rev/1006c87386c1
https://hg.mozilla.org/mozilla-central/rev/a67dc538eaee
https://hg.mozilla.org/mozilla-central/rev/f0bc3536dce7
https://hg.mozilla.org/mozilla-central/rev/e0c206023ab0