Host collectable GC references #55

Draft
andreaTP wants to merge 12 commits into bytecodealliance:main from andreaTP:gc-ref-interpreter-phase1

andreaTP commented Jun 5, 2026

Contributor

WIP to fix #36

andreaTP added 12 commits June 2, 2026 12:55
Replace GcRefStore integer-ID indirection with direct Java Object
references in the interpreter path. Java's GC now handles liveness
of Wasm GC structs, arrays, and i31 values naturally.

Core design: MStack gains a lazy Object[] refs array (null until
first pushRef). push()/pop() are unchanged — zero overhead for
non-GC workloads. GC refs use pushRef()/popRef() to store actual
WasmStruct/WasmArray/WasmI31Ref objects.
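
A minimal sketch of the lazy-refs idea described above, for illustration only. `LazyRefStack` and all of its members are hypothetical names, not the actual `MStack` code. The stale-ref clearing in `push()` reflects a fix from a later commit in this PR, not the initial design.

```java
import java.util.Arrays;

// Hypothetical sketch of a value stack with a lazily allocated refs array.
// This is NOT the MStack implementation from this PR.
final class LazyRefStack {
    private long[] values = new long[8];
    private Object[] refs; // stays null until the first pushRef
    private int size;

    void push(long v) {
        ensureCapacity();
        if (refs != null) {
            refs[size] = null; // clear a stale ref left behind in this slot
        }
        values[size++] = v;
    }

    void pushRef(Object ref) {
        ensureCapacity();
        if (refs == null) {
            refs = new Object[values.length]; // allocate on first use only
        }
        values[size] = 0L; // numeric side holds a placeholder
        refs[size++] = ref;
    }

    long pop() {
        return values[--size];
    }

    Object popRef() {
        size--;
        if (refs == null) {
            return null; // no ref was ever pushed
        }
        Object r = refs[size];
        refs[size] = null; // drop the reference so Java's GC can reclaim it
        return r;
    }

    boolean refsAllocated() {
        return refs != null;
    }

    private void ensureCapacity() {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
            if (refs != null) {
                refs = Arrays.copyOf(refs, size * 2);
            }
        }
    }
}
```

In this sketch a module that never pushes a ref pays only a null check in `push()`, and no `Object[]` is ever allocated.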

Key changes:
- MStack: lazy Object[] with pushRef/popRef/peekRef/clearRefsTo
- StackFrame: Object[] localRefs parallel to long[] locals
- WasmStruct/WasmArray: dual long[]+Object[] for fields/elements
- GlobalInstance: Object refValue for ref-typed globals
- TableInstance: Object[] objRefs for GC-typed tables
- ValType.isGcReference(): distinguishes any-hierarchy from func/extern
- StorageType.isReference()/isGcReference(): field type helpers
- Instance.heapTypeMatchRef(Object,...): type matching on Objects
- ConstantEvaluators: ConstantResult carries both long[] and Object
- InterpreterMachine: ~30 GC instructions updated
- Machine.call(int,long[],Object[]): overload for ref args
- WasmI31Ref: equals/hashCode for ref.eq value equality
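
The last item matters because two separately allocated Java objects can hold the same i31 value, and `ref.eq` has to treat them as equal. A rough sketch of such a value class follows. `I31Sketch` is a hypothetical name and its layout is an assumption, not the `WasmI31Ref` class from this PR.

```java
// Hypothetical value class for a 31-bit integer reference.
// Not the WasmI31Ref implementation from this PR.
final class I31Sketch {
    private final int value; // low 31 bits of the input, sign-extended

    I31Sketch(int raw) {
        // Shift out bit 31, then arithmetic-shift back to sign-extend from bit 30.
        this.value = (raw << 1) >> 1;
    }

    int signed() {
        return value;
    }

    int unsigned() {
        return value & 0x7FFFFFFF;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof I31Sketch && ((I31Sketch) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}
```

With value equality in place, a `ref.eq` on two i31 references can be answered with `equals` where other GC references would use identity.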

GcRefStore is NOT yet removed — the compiler still uses it (Phase 2).
Compiler GC tests are expected to fail until Phase 2.

Add callGc/applyGc methods to Machine, ExportFunction, and
WasmFunctionHandle for passing real Java Objects as Wasm GC refs.
Users can now receive WasmStruct/WasmArray/WasmI31Ref directly
from exported functions, and Java GC manages their lifecycle.

Key changes:
- Machine.callGc(int, Object[]) / ExportFunction.applyGc(Object...)
- InterpreterMachine overrides callGc with native Object path
- WasmFunctionHandle.applyGc for GC-aware host functions
- WasmExternRef can wrap either long or Object (for extern.convert_any)
- ValType.isGcReference() cached during resolve() — zero-cost check
- isGcReference correctly classifies concrete func types as non-GC
- Table init populates objRefs for GC-typed tables
- Instance.registerGcRef/gcRef/array deprecated
- OpcodeImpl.boxForTable/unboxFromTable deprecated
- Compiler: GC refs use Object.class, call bridge threads Object[]

Compiler Phase 2: GC refs flow as Objects in generated bytecode.
CompilerUtil maps GC ref types to Object.class/OBJECT_TYPE.
Shaded helpers take/return Object for GC refs.
Generated call_N bridges accept Object[] refArgs.
BR_ON_NULL/NON_NULL checks distinguish GC (ifnull) from funcref (if_icmpeq).

Non-GC code paths are completely unchanged — zero overhead.

Delete GcRefStore and its epoch-based mark-sweep collector. Wasm GC
references (structs, arrays, i31) are now managed entirely by Java's
garbage collector through the Object[] refs arrays in MStack,
StackFrame, WasmStruct, WasmArray, GlobalInstance, and TableInstance.

Changes:
- Delete GcRefStore.java and GcRefStoreTest.java
- Instance: remove gcRefs field, gcSafePoint()
- Instance.registerGcRef/gcRef/array: throw UnsupportedOperationException
- ExportFunction.apply(long...) throws on functions with GC params/returns
  directing users to applyGc(Object...) instead
- Remove gcSafePoint() calls from Instance.Exports and initialization

Users must migrate from apply(long...) to applyGc(Object...) for
functions that use GC reference types (structs, arrays, i31, anyref).
Non-GC functions (funcref, externref, numeric types) continue to
work with apply(long...) unchanged.

Fix all remaining interpreter GC reference bugs:
- MStack.popRef/peekRef: handle null refs array gracefully
- REF_TEST/CAST_TEST/BR_ON_CAST: dispatch based on source type
  (popRef for GC refs, pop for funcref/externref)
- ARRAY_GET/SET: use isGcReference() not isReference() for field
  type checks — funcref/externref elements stay in long[] path
- ARRAY_NEW_DEFAULT: fill with REF_NULL_VALUE for non-GC ref types
- ConstantEvaluators ARRAY_NEW_DEFAULT: same fix for global init
- callGc: convert REF_NULL_VALUE to null for all ref return types
- apply(long...): throw UnsupportedOperationException only after
  execution succeeds (traps propagate correctly)
- WasmExternRef: can wrap Object for extern.convert_any round-trips

Test generator (JavaTestGen):
- Emit applyGc() for functions with GC params/returns
- Null ref assertions use assertNull() for all ref types
- WasmValue: toGcArgsValue, toGcResultValue, toGcAssertion methods
- WasmValueType.isGcReference() helper

Fix all remaining compiler and interpreter GC reference bugs:

Compiler:
- Generate callGc override in compiled Machine class
- Add int-based variants for refTest/castTest/heapTypeMatch (non-GC refs)
- Add GC table operations: tableGetRef, tableSetRef, tableGrowRef, tableFillRef
- Add extern conversion helpers in Shaded
- Fix THROW to handle GC ref tag params (createWasmExceptionGc)
- Fix CATCH_UNBOX_PARAMS to load GC refs from refArgs
- Fix arrayNewElem/arrayInitElem to use computeConstant for GC elements
- Fix WasmAnalyzer to track source types for REF_TEST dispatch

Interpreter:
- BR_ON_NULL/BR_ON_NON_NULL: check Object side for GC refs, long side for funcref
- MStack.push(): clear stale refs (if refs != null) for correctness
- ConstantEvaluators: use isGcReference() not isReference() for field type checks
- STRUCT_NEW_DEFAULT/ARRAY_NEW_DEFAULT: fill REF_NULL_VALUE for non-GC ref fields
- WasmException: carry Object[] refArgs for GC ref exception payloads

Bridge:
- CompilerInterpreterMachine.CALL: use callGc for functions with GC returns
- Pass refArgs through compiled-to-interpreted boundary

Tests:
- BrOnNullTest: use applyGc for GC functions
- All approval snapshots updated

Compiler fixes:
- Shaded.structNewDefault/arrayNewDefault: fill REF_NULL_VALUE for
  funcref fields (non-GC refs need -1 for null, not 0)
- emitBoxValuesOnStack: handle Object refs (store 0L placeholder)
- Remove dead code: CompilerUtil.isGcRef, hasGcRefReturns, Context.isGcTable

Infrastructure:
- ValType.isObjectRef(): cached flag for "uses Object on JVM stack"
  (GC refs + externref, NOT funcref). Ready for future externref-as-Object.
- StorageType.isObjectRef() delegates to ValType

Tests:
- GcEdgeCasesTest: struct.new_default funcref null, extern round-trip
  (interpreter only — compiler externref-as-Object deferred), applyGc
- GcStressTest: allocates 100K struct chains 100 times and verifies
  that Java GC collects unreachable Wasm GC refs (no OOM with -Xmx64m)

Make externref use Object representation (same as GC refs). Only
funcref stays as int. extern.convert_any / any.convert_extern are
now identity operations, satisfying the Wasm spec requirement that
composing them yields the original value.

API:
- CallResult record: zero-boxing return type with long[] + Object[]
- Machine.callWithRefs(int, long[], Object[]) returns CallResult
- ExportFunction.applyWithRefs(long[], Object[]) returns CallResult
- applyGc(Object...) stays as convenience wrapper
- apply(long...) backward compatible for non-ref functions
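
To illustrate the dual-array calling convention listed above, here is a hedged sketch. `CallResultSketch`, `RefAwareFunction`, and `CallResultDemo` are hypothetical stand-ins. The positional layout (both arrays indexed by parameter position, with unused slots left as `0L` or `null`) is an assumption for this example, and a plain class is used instead of a record to stay compatible with older Java versions.

```java
// Hypothetical stand-in for the CallResult described above; not the PR's class.
final class CallResultSketch {
    final long[] values; // numeric and funcref results, by position
    final Object[] refs; // GC-ref and externref results, by position

    CallResultSketch(long[] values, Object[] refs) {
        this.values = values;
        this.refs = refs;
    }
}

// Hypothetical interface mirroring the shape of applyWithRefs(long[], Object[]).
interface RefAwareFunction {
    CallResultSketch applyWithRefs(long[] args, Object[] refArgs);
}

final class CallResultDemo {
    // Models a function of type (i32, externref) -> (i32, externref).
    static final RefAwareFunction DOUBLE_AND_ECHO = (args, refArgs) -> {
        long doubled = args[0] * 2;  // position 0 is numeric
        Object echoed = refArgs[1];  // position 1 is a reference
        return new CallResultSketch(
                new long[] {doubled, 0L},
                new Object[] {null, echoed});
    };

    public static void main(String[] argv) {
        Object hostObject = new StringBuilder("host-owned");
        CallResultSketch r = DOUBLE_AND_ECHO.applyWithRefs(
                new long[] {21L, 0L},
                new Object[] {null, hostObject});
        System.out.println(r.values[0]);
        System.out.println(r.refs[1] == hostObject);
    }
}
```

The point of the shape is that numeric values never get boxed and references never get squeezed into a `long`, so the host object comes back by identity.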

Internal:
- isObjectRef() (GC refs + externref) replaces isGcReference() at
  all representation-deciding sites (compiler + interpreter)
- isGcReference() kept only for Wasm type system checks
- Tables: externref tables use Object[] objRefs (isObjectRef)
- Shaded: extern conversions are identity (Object → Object)
- structNewDefault/arrayNewDefault: only funcref needs REF_NULL_VALUE

Zero overhead for non-GC non-externref modules.

- apply(long...) throws for isObjectRef (GC + externref), not just isGcReference
- Delete callGc from Machine and InterpreterMachine
- Delete boxReturnValue, all instanceof Number/Long boxing
- Delete Long.valueOf externref wrapping
- InterpreterMachine: zero boxing in Machine layer
- Host functions: applyWithRefs(Instance, long[], Object[]) returns CallResult
- WasmModuleTest: externref test uses applyWithRefs with real Objects
- Remaining boxing only in Instance.applyGc (caller convenience edge)

Breaking: annotation processor externref integration test fails
(generated code uses apply(long...) for externref — needs migration
to applyWithRefs in the annotation processor module).

Delete applyGc from ExportFunction, Instance, WasmFunctionHandle.
Zero boxing anywhere in the runtime.

API (final):
- apply(long...) — numeric + funcref only, throws for GC/externref
- applyWithRefs(long[], Object[]) → CallResult — everything, zero boxing

ArgsAdapter extended with addRef() + applyWithRefs() for clean
test code generation. Test generator migrated from applyGc to
ArgsAdapter.builder().add(n).addRef(obj).applyWithRefs(func).

Remaining: annotation processor needs migration to applyWithRefs
for externref functions (annotations-it test).

Fix 5 confirmed bugs from code review, document 3 known limitations:

MStack: NULL_REF sentinel distinguishes "null GC ref" from "no ref".
pushRef(null) stores NULL_REF in refs[], popRef/peekRef convert back.
topIsRef() checks the raw refs[] value. Fixes null GC refs lost in
doControlTransfer, REF_IS_NULL, BR_ON_NULL, and exception catch.
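
The sentinel approach can be sketched as follows. `RefSlots` is a hypothetical, fixed-capacity simplification for illustration and not the `MStack` code from this PR.

```java
// Hypothetical sketch of a NULL_REF sentinel; not the MStack from this PR.
// Capacity is fixed at 16 slots to keep the example short.
final class RefSlots {
    // Marks "this slot holds a null GC ref", as opposed to "this slot holds no ref".
    private static final Object NULL_REF = new Object();

    private final Object[] refs = new Object[16];
    private int size;

    void pushNumeric() {
        refs[size++] = null; // a raw null means the slot is not a reference
    }

    void pushRef(Object ref) {
        refs[size++] = (ref == null) ? NULL_REF : ref;
    }

    boolean topIsRef() {
        return refs[size - 1] != null; // inspects the raw stored value
    }

    Object popRef() {
        Object raw = refs[--size];
        refs[size] = null;
        return raw == NULL_REF ? null : raw; // convert the sentinel back to null
    }
}
```

Without the sentinel, a pushed null reference and a numeric slot would both read as `null`, which is the ambiguity the commit message describes.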

Instance: apply() guard uses isObjectRef() — externref functions
now correctly throw, directing users to applyWithRefs().

Shaded: tableFillRef uses long arithmetic for overflow-safe bounds.
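
As a rough illustration of why long arithmetic matters here, the sketch below compares a naive `int` bounds check with a widened one. `TableBounds` and both method names are hypothetical, and the PR's actual `tableFillRef` check may be written differently.

```java
// Hypothetical bounds-check helpers; not the Shaded.tableFillRef code from this PR.
final class TableBounds {
    // Widens both operands to long (as unsigned 32-bit values) before adding,
    // so the sum cannot wrap around.
    static boolean inBounds(int offset, int count, int tableSize) {
        long end = Integer.toUnsignedLong(offset) + Integer.toUnsignedLong(count);
        return end <= tableSize;
    }

    // Naive version: offset + count can overflow int and wrap to a negative
    // number, which then incorrectly passes the comparison.
    static boolean inBoundsNaive(int offset, int count, int tableSize) {
        return offset + count <= tableSize;
    }
}
```

For example, with `offset = Integer.MAX_VALUE`, `count = 2`, and a table of size 10, the naive check accepts the range because the `int` sum wraps negative, while the widened check rejects it.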

Compiler: null-check refArgs before loading in compileCallFunction.

InterpreterMachine: pushExceptionArgs uses tag type to determine
ref positions instead of ambiguous null check.

Known limitations (documented with TODO comments):
- Cross-module call_indirect GC ref returns discarded (Bug 4)
- Multi-value returns with Object[] vs LALOAD mismatch (Bug 5)
- Host/cross-module calls lose GC ref params in long[] (Bug 6)

Tests: GcReviewFixesTest with WAT module exercising null refs
through blocks, exceptions, table.fill bounds, and externref
via applyWithRefs.

Fix 6 bugs where GC ref Objects were lost at long[] serialization
boundaries in the compiler.

Root cause: the compiler used long[] to pass args/returns across
module boundaries, host calls, tail calls, and too-many-params.
Object refs can't fit in long[], so they were discarded (POP + 0L).

Fix: dual long[] + Object[] at every boundary, matching the
Machine.callWithRefs(int, long[], Object[]) → CallResult pattern.

Shaded: add WithRefs overloads for callHostFunction, callIndirect,
setTailCall, setTailCallIndirect, resolveTailCall. Old long[]-only
methods kept for backward compat with existing compiled code.

Instance.TailCallPending: add Object[] refArgs field.

Emitters: emitBoxValuesOnStackWithRefs creates dual arrays.
emitUnboxResult uses AALOAD for Object[] multi-value returns.
emitTailCallCheck uses resolveTailCallWithRefs → CallResult.
RETURN_CALL variants use WithRefs when params have Object refs.

Compiler: emitBoxArgumentsWithRefs, emitUnboxCallResult for
host/cross-module paths. compileMachineCallInvoke host path
uses callHostFunctionWithRefs.

Zero overhead for non-GC: all WithRefs paths gated on
isObjectRef() at emit time. Null Object[] when no refs.

Update ModuleInterfaceCodegen to generate applyWithRefs calls
for functions with externref params/returns. externref maps to
Object.class (not long.class).

Export codegen: builds positional long[] + Object[] arrays,
calls applyWithRefs, reads from CallResult. Non-ref functions
use the original apply() path unchanged.

Import codegen: generates WasmFunctionHandle with applyWithRefs
override for functions with Object ref types.

ExternRefExampleTest: host functions use Object for externref.

Development

Successfully merging this pull request may close these issues.

Enable host JVM GC to collect unreachable WasmGC references