# AGENTS.md

This file provides comprehensive guidance to AI coding agents when working with the Apache Fory codebase.

## Core Principles

While working on Fory, please remember:

- **Performance First**: Performance is the top priority. Never introduce code that reduces performance without explicit justification.
- **English Only**: Always use English in code, comments, and documentation.
- **Meaningful Comments**: Only add comments when the code's behavior is difficult to understand or when documenting complex algorithms.
- **Focused Testing**: Only add tests that verify internal behaviors or cover specific bug fixes; don't create unnecessary tests unless requested.
- **Git-Tracked Files**: When reading code, skip files not tracked by git by default, unless you generated them yourself.
- **Cross-Language Consistency**: Maintain consistency across language implementations while respecting language-specific idioms.
- **GraalVM Support via Fory Codegen**: When building a GraalVM native image, use `fory codegen` to generate serializers. Do not use GraalVM reflection-related configuration except for JDK `proxy`.

## Build and Development Commands

### Java Development

- All maven commands must be executed within the `java` directory.
- All changes to `java` must pass the code style check and tests.
- Fory java needs JDK `17+` installed.

```bash
# Clean the build
mvn -T16 clean

# Build
mvn -T16 package

# Install
mvn -T16 install -DskipTests

# Code format check
mvn -T16 spotless:check

# Code format
mvn -T16 spotless:apply

# Code style check
mvn -T16 checkstyle:check

# Run tests
mvn -T16 test

# Run specific tests
mvn -T16 test -Dtest=org.apache.fory.TestClass#testMethod
```

### C++ Development

- All commands must be executed within the `cpp` directory.

```bash
# Prepare for build
pip install pyarrow==15.0.0

# Build C++ library
bazel build //...

# Run tests
bazel test $(bazel query //...)

# Run specific test
bazel test //fory/util:buffer_test
```

### Python Development

- All commands must be executed within the `python` directory.
- All changes to `python` must pass the code style check and tests.
- When running tests, you can use the `ENABLE_FORY_CYTHON_SERIALIZATION` environment variable to enable or disable cython serialization.
- When debugging protocol related issues, you should use `ENABLE_FORY_CYTHON_SERIALIZATION=0` first to verify the behavior.
- Fory python needs cpython `3.8+` installed although some modules such as `fory-core` use `java8`.

```bash
# clean build
rm -rf build dist .pytest_cache
bazel clean --expunge

# Code format
ruff format .
ruff check --fix .

# Install
pip install -v -e .

# Build native extension when cython code changed
bazel build //:cp_fory_so --config=x86_64 # For x86_64
bazel build //:cp_fory_so --copt=-fsigned-char # For arm64 and aarch64

# Run tests without cython
ENABLE_FORY_CYTHON_SERIALIZATION=0 pytest -v -s .
# Run tests with cython
ENABLE_FORY_CYTHON_SERIALIZATION=1 pytest -v -s .
```

### Golang Development

- All commands must be executed within the `go/fory` directory.
- All changes to `go` must pass the format check and tests.
- Go implementation focuses on reflection-based and codegen-based serialization.

```bash
# Format code
go fmt ./...

# Run tests
go test -v

# Run tests with race detection
go test -race -v

# Build
go build

# Generate code (if using go:generate)
go generate ./...
```

### Rust Development

- All cargo commands must be executed within the `rust` directory.
- All changes to `rust` must pass the clippy check and tests.
- You must set `RUST_BACKTRACE=1 FORY_PANIC_ON_ERROR=1` when debugging Rust tests to get a backtrace.
- You must add `-- --nocapture` to the `cargo test` command when debugging tests.
- You must not set `FORY_PANIC_ON_ERROR=1` when running all Rust tests to check whether they pass: some tests check the content of an `Error`, and they fail if the error panics instead.

```bash
# Check code
cargo check

# Build
cargo build

# Run linter for all services.
cargo clippy --all-targets --all-features -- -D warnings

# Run tests (requires test features)
cargo test --features tests

# run specific test
cargo test -p tests --test $test_file $test_method

# run specific test under subdirectory
cargo test --test mod $dir$::$test_file::$test_method

# debug specific test under subdirectory and get backtrace
RUST_BACKTRACE=1 FORY_PANIC_ON_ERROR=1 ENABLE_FORY_DEBUG_OUTPUT=1 cargo test --test mod $dir$::$test_file::$test_method -- --nocapture

# inspect generated code by fory derive macro
cargo expand --test mod $mod$::$file$ > expanded.rs

# Format code
cargo fmt

# Check formatting
cargo fmt --check

# Build documentation
cargo doc --lib --no-deps --all-features

# Run benchmarks
cargo bench
```

### JavaScript/TypeScript Development

- All commands must be executed within the `javascript` directory.
- Uses npm/yarn for package management.

```bash
# Install dependencies
npm install

# Run tests
node ./node_modules/.bin/jest --ci --reporters=default --reporters=jest-junit

# Lint code
git ls-files -- '*.ts' | xargs -P 5 node ./node_modules/.bin/eslint
```

### Dart Development

- All commands must be executed within the `dart` directory.
- Uses pub for package management.

```bash
# First, generate necessary code
dart run build_runner build

# Run all tests
dart test

# Analyze code and apply fixes
dart analyze
dart fix --dry-run
dart fix --apply
```

### Kotlin Development

- All maven commands must be executed within the `kotlin` directory.
- Kotlin implementation provides extra serializers for kotlin types.
- Kotlin implementation is built on Fory Java. Install the Java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the Java code has not changed since the last install, you can skip this step.

```bash
# Build
mvn clean package

# Run tests
mvn test
```

### Scala Development

- All commands must be executed within the `scala` directory.
- Scala implementation provides extra serializers for Scala types.
- Scala implementation is built on Fory Java. Install the Java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the Java code has not changed since the last install, you can skip this step.

```bash
# Build with sbt
sbt compile

# Run tests
sbt test

# Format code
sbt scalafmt
```

### Integration Tests

- All commands must be executed within the `integration_tests` directory.
- For Java-related integration tests, install the Java libraries first with `cd ../java && mvn -T16 install -DskipTests`. If the Java code has not changed since the last install, you can skip this step.

```bash
it_dir=$(pwd)
# Run graalvm tests
cd $it_dir/graalvm_tests && mvn -T16 -DskipTests=true -Pnative package && target/main

# Run latest_jdk_tests
cd $it_dir/latest_jdk_tests && mvn -T16 test

# Run JDK compatibility tests
cd $it_dir/jdk_compatibility_tests && mvn -T16 test

# Run JPMS tests
cd $it_dir/jpms_tests && mvn -T16 test

# Run Python benchmarks
cd $it_dir/cpython_benchmark && pip install -r requirements.txt && python benchmark.py
```

### Documentation and Formatting

- **Markdown Formatting**: When updating markdown documentation, use `prettier --write $file` to format.
- **API Documentation**: When updating important public APIs, update documentation under `docs/`.
- **Protocol Specifications**: `docs/specification/**` contains Fory protocol specifications. Read these documents carefully before making protocol changes.
- **User Guides**: `docs/guide/**` contains user guides for different features and languages.

## Repository Structure Understanding

### Git Repository

Apache Fory is an open-source project hosted on GitHub.
The git repository for Apache Fory is https://github.com/apache/fory .
Contributors fork the repository and open a pull request to propose changes.
The `origin` remote points to the contributor's fork rather than the official repository.
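
Because `origin` is usually a fork, it can help to check which remote URL refers to the official repository before fetching or comparing branches. The sketch below is a hypothetical helper (`is_official_remote` is not part of the repository) that only inspects a URL string, assuming the common HTTPS and SSH URL shapes used by GitHub.

```bash
# Hypothetical helper: succeed when a remote URL refers to the official
# repository (https://github.com/apache/fory) rather than a fork.
is_official_remote() {
  local url="${1%/}"      # drop a trailing slash if present
  url="${url%.git}"       # drop a trailing .git if present
  case "$url" in
    https://github.com/apache/fory|git@github.com:apache/fory) return 0 ;;
    *) return 1 ;;
  esac
}

# Example calls with literal URLs (the fork owner "someone" is made up).
is_official_remote "https://github.com/apache/fory.git" && echo "official"
is_official_remote "git@github.com:someone/fory.git" || echo "fork"
```

Inside a clone, the same check can be applied to a real remote with `is_official_remote "$(git remote get-url origin)"`.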

### Key Directories

- **`docs/`**: Documentation, specifications, and guides
  - `docs/specification/`: Protocol specifications (critical for understanding)
  - `docs/guide/`: User guides and development guides
  - `docs/benchmarks/`: Performance benchmarks documentation

- **Language Implementations**:
  - `java/`: Java implementation (maven-based, multi-module)
  - `python/`: Python implementation (pip/setuptools + bazel)
  - `cpp/`: C++ implementation (bazel-based)
  - `go/`: Go implementation (go modules)
  - `rust/`: Rust implementation (cargo-based)
  - `javascript/`: JavaScript/TypeScript implementation (npm-based)
  - `dart/`: Dart implementation (pub-based)
  - `kotlin/`: Kotlin implementation (maven-based)
  - `scala/`: Scala implementation (sbt-based)

- **Testing and CI**:
  - `integration_tests/`: Cross-language integration tests
  - `.github/workflows/`: GitHub Actions CI/CD workflows
  - `ci/`: CI scripts and configurations

- **Build Configuration**:
  - `BUILD`, `WORKSPACE`: Bazel configuration
  - `.bazelrc`, `.bazelversion`: Bazel settings
  - Various `pom.xml`, `package.json`, `Cargo.toml`, etc.

### Important Files

- **`AGENTS.md`**: This file - AI coding guidance
- **`CLAUDE.md`**: Claude Code specific instructions
- **`CONTRIBUTING.md`**: Contribution guidelines
- **`README.md`**: Project overview and quick start
- **`.gitignore`**: Git ignore patterns (includes build dirs)
- **`licenserc.toml`**: License header configuration

## Architecture Overview

Apache Fory is a blazingly-fast multi-language serialization framework that revolutionizes data exchange between systems and languages. By leveraging JIT compilation, code generation and zero-copy techniques, Fory delivers up to 170x faster performance compared to other serialization frameworks while being extremely easy to use.

### Binary Protocols

Fory uses binary protocols for efficient serialization and deserialization. Fory designed and implemented multiple binary protocols for different scenarios:

- **[xlang serialization format](docs/specification/xlang_serialization_spec.md)**:
  - Serializes any object across languages automatically, with no IDL definition, schema compilation, or conversion between objects and protocol messages.
  - Supports optional shared references and circular references, with no duplicate data or recursion errors.
  - Supports object polymorphism.
- **[Row format](docs/specification/row_format_spec.md)**: A cache-friendly binary random access format, supports skipping serialization and partial serialization, and can convert to column-format automatically.
- **[Java serialization format](docs/specification/java_serialization_spec.md)**: Highly-optimized and drop-in replacement for Java serialization.
- **Python serialization format**: Highly-optimized and drop-in replacement for Python pickle, which is an extension built upon **[xlang serialization format](docs/specification/xlang_serialization_spec.md)**.

`docs/specification/**` contains the Fory protocol specifications. Read these documents carefully and make sure you understand them before changing code or documentation.

### Core Structure

Fory serialization is implemented independently in each language to minimize the cost of object memory layout interoperability, object allocation, and memory access, and thus to maximize performance. No code is shared between languages except for `fory python`, which reuses code from `fory c++`.

#### Java

- **fory-core**: Java library implementing the core object graph serialization
  - `java/fory-core/src/main/java/org/apache/fory/Fory.java`: main serialization entry point
  - `java/fory-core/src/main/java/org/apache/fory/resolver/TypeResolver.java`: type resolution and serializer dispatch
  - `java/fory-core/src/main/java/org/apache/fory/resolver/RefResolver.java`: class for resolving shared/circular references when ref tracking is enabled
  - `java/fory-core/src/main/java/org/apache/fory/serializer`: serializers for each supported type
  - `java/fory-core/src/main/java/org/apache/fory/codegen`: code generators, provide expression abstraction and compile expression tree to java code and byte code
  - `java/fory-core/src/main/java/org/apache/fory/builder`: build expression tree for serialization to generate serialization code
  - `java/fory-core/src/main/java/org/apache/fory/reflect`: reflection utilities
  - `java/fory-core/src/main/java/org/apache/fory/type`: java generics and type inference utilities
  - `java/fory-core/src/main/java/org/apache/fory/util`: utility classes

- **fory-format**: Java library implementing the core row format encoding and decoding
  - `java/fory-format/src/main/java/org/apache/fory/format/row`: row format data structures
  - `java/fory-format/src/main/java/org/apache/fory/format/encoder`: generate row format encoder and decoder to encode/decode objects to/from row format
  - `java/fory-format/src/main/java/org/apache/fory/format/type`: type inference for row format
  - `java/fory-format/src/main/java/org/apache/fory/format/vectorized`: interoperation with apache arrow columnar format

- **fory-extensions**: extension libraries for java, including:
  - Protobuf serializers for fory java native object graph protocol.
  - Meta compression based on zstd

- **fory-simd**: SIMD-accelerated serialization and deserialization based on java vector API
  - `java/fory-simd/src/main/java/org/apache/fory/util`: SIMD utilities
  - `java/fory-simd/src/main/java/org/apache/fory/serializer`: SIMD accelerated serializers

- **fory-test-core**: Core test utilities and data generators

- **testsuite**: Complex test suite for issues reported by users and hard to reproduce using simple test cases

- **benchmark**: Benchmark suite based on jmh

#### Bazel

The `bazel` directory provides build support for Fory C++ and Cython:

- `bazel/arrow`: build rules to get arrow shared libraries based on bazel template
- `grpc-cython-copts.patch/grpc-python.patch`: patch for grpc to add `pyx_library` for cython.

#### C++

- `cpp/fory/row`: Row format data structures
- `cpp/fory/meta`: Compile-time reflection utilities for extracting struct field information.
- `cpp/fory/encoder`: Row format encoder and decoder
- `cpp/fory/columnar`: Interoperation between fory row format and apache arrow columnar format
- `cpp/fory/util`: Common utilities
  - `cpp/fory/util/buffer.h`: Buffer for reading and writing data
  - `cpp/fory/util/bit_util.h`: utilities for bit manipulation
  - `cpp/fory/util/string_util.h`: String utilities
  - `cpp/fory/util/status.h`: Status code for error handling

#### Python

Fory python has two implementations for the protocol:

- **Python mode**: Pure python implementation based on `xlang serialization format`, used for debugging and testing only. This mode can be enabled by setting `ENABLE_FORY_CYTHON_SERIALIZATION=0` environment variable.
- **Cython mode**: Cython based implementation based on `xlang serialization format`, which is used by default and has better performance than pure python. This mode can be enabled by setting `ENABLE_FORY_CYTHON_SERIALIZATION=1` environment variable.
- **Python mode** and **Cython mode** share some code to reduce duplication.

Code structure:

- `python/pyfory/serialization.pyx`: Core serialization logic and entry point for cython mode based on `xlang serialization format`
- `python/pyfory/_fory.py`: Serialization entry point for pure python mode based on `xlang serialization format`
- `python/pyfory/_registry.py`: Type registry, resolution and serializer dispatch for pure python mode, which is also used by cython mode. Cython mode uses a cache to reduce invocations of this module.
- `python/pyfory/serializer.py`: Serializers for non-internal types
- `python/pyfory/includes`: Cython headers for `c++` functions and classes.
- `python/pyfory/resolver.py`: resolving shared/circular references when ref tracking is enabled in pure python mode
- `python/pyfory/format`: Fory row format encoding and decoding, arrow columnar format interoperation
- `python/pyfory/_util.pyx`: Buffer for reading/writing data, string utilities. Used by `serialization.pyx` and `python/pyfory/format` at the same time.

#### Go

Fory go provides reflection-based and codegen-based serialization and deserialization.

- `go/fory/fory.go`: serialization entry point
- `go/fory/resolver.go`: resolving shared/circular references when ref tracking is enabled
- `go/fory/type.go`: type system and type resolution, serializer dispatch
- `go/fory/slice.go`: serializers for `slice` type
- `go/fory/map.go`: serializers for `map` type
- `go/fory/set.go`: serializers for `set` type
- `go/fory/struct.go`: serializers for `struct` type
- `go/fory/string.go`: serializers for `string` type
- `go/fory/buffer.go`: Buffer for reading/writing data
- `go/fory/codegen`: code generator invoked by `go:generate` to generate serialization code that speeds up serialization.
- `go/fory/meta`: Meta string compression

#### Rust

Fory rust provides macro-based serialization and deserialization. Fory rust consists of:

- **fory**: Main library entry point
  - `rust/fory/src/lib.rs`: main library entry point to export API to users
- **fory-core**: Core library for serialization and deserialization
  - `rust/fory-core/src/fory.rs`: main serialization entry point
  - `rust/fory-core/src/resolver/type_resolver.rs`: type resolution and registration
  - `rust/fory-core/src/resolver/metastring_resolver.rs`: resolver for meta string
  - `rust/fory-core/src/resolver/context.rs`: context for reading/writing
  - `rust/fory-core/src/buffer.rs`: buffer for reading/writing data
  - `rust/fory-core/src/meta`: meta string compression, type meta encoding
  - `rust/fory-core/src/serializer`: serializers for each supported type
  - `rust/fory-core/src/row`: row format encoding and decoding
- **fory-derive**: Rust macro-based codegen for serialization and deserialization
  - `rust/fory-derive/src/object`: macro for serializing/deserializing structs
  - `rust/fory-derive/src/fory_row`: macro for encoding/decoding row format

#### Integration Tests

`integration_tests` contains integration tests with the following modules:

- **cpython_benchmark**: benchmark suite for fory python
- **graalvm_tests**: test suite for fory java on graalvm.
  - Note that Fory supports GraalVM through codegen instead of reflection and does not use `reflect-config.json` for
    serialization. This is its core advantage over GraalVM JDK serialization.
- **jdk_compatibility_tests**: test suite for fory serialization compatibility between multiple JDK versions
- **latest_jdk_tests**: test suite for `jdk17+` versions

## Key Development Guidelines

### Performance Guidelines

- **Performance First**: Never introduce code that reduces performance without explicit justification
- **Zero-Copy**: Leverage zero-copy techniques when possible
- **JIT Compilation**: Consider JIT compilation opportunities
- **Memory Layout**: Optimize for cache-friendly memory access patterns

### Code Quality

- **Public APIs**: Must be well-documented and easy to understand
- **Error Handling**: Implement comprehensive error handling with meaningful messages
- **Type Safety**: Use strong typing and generics appropriately
- **Null Safety**: Handle null values appropriately for each language

### Cross-Language Considerations

- **Protocol Compatibility**: Ensure serialization compatibility across languages
- **Type Mapping**: Understand type mapping between languages (see `docs/specification/xlang_type_mapping.md`)
- **Endianness**: Handle byte order correctly for cross-platform compatibility
- **Version Compatibility**: Maintain backward compatibility when possible

### Testing Strategy

- **Unit Tests**: Focus on internal behavior verification
- **Integration Tests**: Use `integration_tests/` for cross-language compatibility
- **Language Alignment and Protocol Compatibility**: Run `test_cross_language.py` to check language and protocol alignment
- **Performance Tests**: Include benchmarks for performance-critical changes

### Documentation Requirements

- **API Changes**: Update relevant documentation in `docs/`
- **Protocol Changes**: Update specifications in `docs/specification/`
- **Examples**: Provide working examples for new features
- **Migration Guides**: Document breaking changes and migration paths

## Development Workflow

### Before Making Changes

1. **Read Specifications**: Review relevant docs in `docs/specification/`
2. **Understand Architecture**: Study the language-specific implementation structure
3. **Check Existing Tests**: Look at existing test patterns and coverage
4. **Review Related Issues**: Check GitHub issues for context

### Making Changes

1. **Follow Language Conventions**: Respect each language's idioms and patterns
2. **Maintain Performance**: Profile performance-critical changes
3. **Add Tests**: Include appropriate tests for new functionality
4. **Update Documentation**: Update docs for API changes
5. **Format Code**: Use language-specific formatters before committing
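
As a quick reference for the formatting step, the sketch below maps a language directory to the formatter command listed earlier in this guide. `fory_format_cmd` is a hypothetical helper, not a script in the repository; it only prints the command, and it covers only the languages for which this guide names a formatter.

```bash
# Hypothetical helper: print the formatter command this guide lists for a
# language. It only echoes the command; run the command from the directory
# named in that language's section (for example `java` or `go/fory`).
fory_format_cmd() {
  case "$1" in
    java)   echo "mvn -T16 spotless:apply" ;;
    python) echo "ruff format . && ruff check --fix ." ;;
    go)     echo "go fmt ./..." ;;
    rust)   echo "cargo fmt" ;;
    scala)  echo "sbt scalafmt" ;;
    *)      echo "no formatter listed for: $1" >&2; return 1 ;;
  esac
}

fory_format_cmd rust
```

The lookup returns a non-zero status for languages it does not know, so a calling script can stop instead of silently skipping the formatting step.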

## Debugging Guidelines

### Protocol Issues

- **Use Python Mode**: Set `ENABLE_FORY_CYTHON_SERIALIZATION=0` for debugging
- **Check Specifications**: Refer to protocol specs in `docs/specification/`
- **Cross-Language Testing**: Use integration tests to verify compatibility
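
One way to apply the first point is to run the same command under both modes, pure Python first and Cython second, and compare the results. The sketch below assumes a hypothetical wrapper named `run_in_both_modes`; only the `ENABLE_FORY_CYTHON_SERIALIZATION` variable comes from this guide.

```bash
# Hypothetical wrapper: run a command once per serialization mode.
# Mode 0 (pure Python) runs first, matching the debugging advice above.
# Stops at the first failing mode and returns a non-zero status.
run_in_both_modes() {
  local mode
  for mode in 0 1; do
    echo "== ENABLE_FORY_CYTHON_SERIALIZATION=$mode =="
    ENABLE_FORY_CYTHON_SERIALIZATION="$mode" "$@" || return 1
  done
}

# Usage, from the `python` directory:
#   run_in_both_modes pytest -v -s .
```

If the pure Python run passes and the Cython run fails, the problem is more likely in the Cython implementation than in the protocol logic.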

### Performance Issues

- **Profile First**: Use appropriate profilers for each language
- **Memory Analysis**: Check for memory leaks and allocation patterns

### Build Issues

- **Clean Builds**: Use language-specific clean commands
- **Dependency Issues**: Check version compatibility
- **Bazel Issues**: Use `bazel clean --expunge` for deep cleaning

## CI/CD Understanding

### GitHub Actions Workflows

- **`ci.yml`**: Main CI workflow for all languages
- **`build-native-*.yml`**: macOS/Windows python wheel build workflows
- **`build-containerized-*.yml`**: Containerized python wheel build workflows for linux
- **`lint.yml`**: Code formatting and linting
- **`pr-lint.yml`**: PR-specific checks

## Commit Message Format

Use conventional commits with language scope:

```
feat(java): add codegen support for xlang serialization
fix(rust): fix collection header when collection is empty
docs(python): add docs for xlang serialization
refactor(java): unify serialization exceptions hierarchy
perf(cpp): optimize buffer allocation in encoder
test(integration): add cross-language reference cycle tests
ci: update build matrix for latest JDK versions
chore(deps): update arrow dependency to 15.0.0
```
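
A commit subject can be checked against this shape before committing. The sketch below is a hypothetical helper (`check_commit_subject` is not part of the repository), and its list of types is inferred from the examples above rather than taken from an official list.

```bash
# Hypothetical helper: check a commit subject against the shape used in the
# examples above: type(optional-scope): summary
# The accepted types are an assumption drawn from those examples.
check_commit_subject() {
  local subject="$1"
  local pattern='^(feat|fix|docs|refactor|perf|test|ci|chore)(\([a-z0-9_-]+\))?: .+'
  [[ "$subject" =~ $pattern ]]
}

check_commit_subject "fix(rust): fix collection header when collection is empty" && echo "ok"
check_commit_subject "Fixed a bug" || echo "rejected"
```

The scope is optional in the pattern because the `ci:` example above has none.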
