Polyglot Box-Passing Processing Pipeline Architecture

Overview

One function of the SeqWeb system is to convert OEIS sequence data (.seq files) into semantic web knowledge graphs (.ttl files). Such conversions lie in the domain of SeqWeb’s Webwright subsystem (see Systems Design for context).

The Webwright subsystem constructs the knowledge graph by orchestrating an ensemble of Fabricators. Each Fabricator is responsible for generating a specific portion of the knowledge graph and implements a polyglot processing pipeline composed of individually implemented software Modules designed to interface readily with related Modules. Each Module may be implemented in an arbitrary language (e.g., Python, Java, Lisp, Bash; see Rationale for Polyglot Implementation below).

For example, a Fabricator might extract entities from selected OEIS entry text, and then build RDF triples linking the subject sequence to those entities via specific relationships. Upstream Modules in this Fabricator would handle entity extraction; downstream Modules would generate the RDF; others may support intermediate transformation, filtering, or enrichment.

[Figure: Fabricator polyglot pipeline architecture]

Modules communicate via a shared abstract structure called a box — functionally a box is just a language-agnostic key-value map that flows through the pipeline.

[Figure: box glyph (a box)]
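For concreteness, here is a minimal sketch of a box at two adjacent pipeline stages. The keys are purely illustrative (normalized_prompt follows the examples later in this document); a real Fabricator box would carry OEIS- and RDF-related keys.

# A box entering a module (the inbox)...
inbox = {
    "prompt": "  Hello, SeqWeb!  ",
    "noisy": False,
}

# ...and the box leaving it (the outbox): the inbox plus zero or more
# updated or newly added key-values.
outbox = {
    "prompt": "  Hello, SeqWeb!  ",
    "noisy": False,
    "normalized_prompt": "hello, seqweb!",
}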

Each module is designed to function as a plug-and-play unit in one or more Fabricator pipelines, in which modules may be composed, replaced, tested, or reused independently, regardless of implementation language.

Modules may be composed through two mechanisms: native orchestration, when the modules are implemented in the same language, or shell execution, where each module runs in isolation behind a shell wrapper that exposes a standardized shell interface. This hybrid model allows clean decoupling, testability, and flexibility across execution boundaries.
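As a rough sketch of the two mechanisms (the program name normalize_prompt used in the shell case is hypothetical, and the normalize_prompt function is the Python core shown later in this document):

import json
import subprocess

box = {"prompt": "  Hello, SeqWeb!  ", "noisy": False}

# Native composition: both modules live in the same runtime, so the core
# function is called directly with the box.
outbox = normalize_prompt(box, **box)

# Shell execution: the module is invoked as a standalone program through its
# shell wrapper, with the box passed in as JSON on stdin and the resulting
# outbox read back as JSON from stdout.
proc = subprocess.run(
    ["normalize_prompt"],          # hypothetical wrapper program on the PATH
    input=json.dumps(box),
    capture_output=True,
    text=True,
    check=True,
)
outbox = json.loads(proc.stdout)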


Key Terminology

box: A dictionary-like structure of key-value pairs shared between modules. It serves both as the input/output of each module and as a shared common carrier of state across pipeline stages.
inbox: The input box to a module.
outbox: The output box from a module, typically the inbox plus zero or more updates to existing key-values and/or additional new key-values.
module: A processing unit that takes an inbox and returns an outbox.
program: A shell-invocable wrapper around a module exposing the inbox/outbox interface over shell+JSON.
shell wrapper: A thin entry point that translates shell input to a native-language box, invokes the core function, and emits JSON output.
core function: A native-language, pure function that maps a box (dict/map) to another box.
destructuring interface: A function signature that binds named keys from a box to function arguments (e.g., *, prompt, noisy=False, **_rest).
box-then-kwargs pattern: The pattern where a function accepts the full box and also unpacks it for destructuring.
native composition: Bypassing the shell wrapper by directly calling core functions within the same runtime environment.
shell execution: Cross-language or system-level execution where modules run as subprocesses using wrapper interfaces.

Wrapper Contract

The main() function in each module serves as a shell wrapper with a strict contract:

Input Contract:
The wrapper builds the inbox from a JSON box supplied on stdin and/or named CLI arguments, using the module's argument definitions and the shared get_inbox utility.

Output Contract:
The wrapper emits the resulting outbox as a single JSON object on stdout (via dump_outbox) so that downstream stages and callers can consume it.

Core Function Contract:
All processing logic lives in the core function, a pure native-language function that maps a box to a box; the wrapper only translates between the shell+JSON interface and that function.

Anti-Patterns to Avoid:
Putting processing logic, validation, or state in the wrapper itself; mutating the inbox in place instead of returning a new outbox; dropping unrecognized keys rather than passing them through.

The wrapper exists solely to enable shell-based composition across languages. All other functionality should be implemented separately.
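The examples in this document assume two shared utilities, get_inbox and dump_outbox, imported from libs.core.wrapper. Their actual implementation is not shown here; the following is only a minimal sketch of the behavior the wrapper contract expects from them.

import argparse
import json
import sys
from typing import Any, Dict, List, Tuple

def get_inbox(argument_definitions: List[Tuple[str, type, str, bool]]) -> Dict[str, Any]:
    """Build the inbox by merging a JSON box from stdin (if piped) with CLI arguments."""
    inbox: Dict[str, Any] = {}
    if not sys.stdin.isatty():
        text = sys.stdin.read().strip()
        if text:
            inbox.update(json.loads(text))
    parser = argparse.ArgumentParser()
    for name, arg_type, help_text, required in argument_definitions:
        if arg_type is bool:
            # Booleans become flags; omitting the flag keeps any value already provided via stdin.
            parser.add_argument(f"--{name}", action="store_true", help=help_text)
        else:
            parser.add_argument(f"--{name}", type=arg_type, help=help_text,
                                required=required and name not in inbox)
    cli_args = vars(parser.parse_args())
    # CLI values override stdin values; unset options are ignored.
    inbox.update({k: v for k, v in cli_args.items() if v not in (None, False)})
    return inbox

def dump_outbox(outbox: Dict[str, Any]) -> None:
    """Emit the outbox as JSON on stdout for the next pipeline stage or caller."""
    json.dump(outbox, sys.stdout)
    sys.stdout.write("\n")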


Implementation Patterns

Destructuring Core (Python)

from typing import Any, Dict

def normalize_prompt(box: Dict[str, Any], *, prompt: str, noisy: bool = False, **_rest) -> Dict[str, Any]:
    """
    Core function that normalizes a prompt and returns an outbox.
    
    Uses destructuring pattern to bind only needed parameters while preserving
    the full box and any extra keys for pass-through semantics.
    
    Args:
        box: Full input box dictionary
        prompt: The prompt to normalize
        noisy: Whether to enable verbose output (controls printing)
        **_rest: Any additional keys in the box (preserved for pass-through)
        
    Returns:
        outbox: The box plus any modifications made by this module
    """
    result = prompt.strip().lower()
    if noisy:
        print(f"\n\t🧹 Normalized Prompt:\n{result}")
    return {**box, "normalized_prompt": result}

Destructuring Core (Java)

/**
 * Core function that normalizes a prompt and returns an outbox.
 *
 * Uses explicit parameter binding while preserving the full box
 * and any extra keys for pass-through semantics.
 *
 * @param box Full input box dictionary
 * @param prompt The prompt to normalize
 * @param noisy Whether to enable verbose output (controls printing)
 * @return outbox The box plus any modifications made by this module
 */
public static Map<String, Object> normalizePrompt(Map<String, Object> box,
                                                  String prompt,
                                                  boolean noisy) {
    String result = prompt.strip().toLowerCase();
    if (noisy) {
        System.out.println("\n\t🧹 Normalized Prompt:\n" + result);
    }
    
    // Create new map with original box contents plus new key
    Map<String, Object> outbox = new HashMap<>(box);
    outbox.put("normalized_prompt", result);
    return outbox;
}

Key Differences from Python:
Java has no keyword-argument destructuring, so the caller binds the needed parameters (prompt, noisy) explicitly from the box instead of relying on a **kwargs signature; extra keys are preserved by copying the whole box into a new HashMap rather than via **_rest; and the outbox is built by mutating that copy instead of with a dict-spread expression.

Usage Pattern:

// Extract known parameters from box
String prompt = (String) box.get("prompt");
boolean noisy = (Boolean) box.getOrDefault("noisy", false);

// Call core function
Map<String, Object> outbox = normalizePrompt(box, prompt, noisy);

Shell Wrapper (Python)

def main():
    """Shell wrapper for normalize_prompt module."""
    from libs.core.wrapper import get_inbox, dump_outbox
    
    # Define argument specifications for this module
    argument_definitions = [
        ('prompt', str, 'The prompt to normalize', True),
        ('noisy', bool, 'Enable verbose output', False)
    ]
    
    # Build inbox from stdin + CLI args using shared utility
    inbox = get_inbox(argument_definitions)
    
    # Call core function with identical semantics
    outbox = normalize_prompt(inbox, **inbox)
    
    # Emit JSON output for pipeline consumption
    dump_outbox(outbox)

Key Points:
The wrapper contains no processing logic: it declares the module's arguments, builds the inbox with the shared get_inbox utility, calls the core function using the box-then-kwargs pattern (normalize_prompt(inbox, **inbox)), and emits the outbox as JSON with dump_outbox.


Shell Wrapper (Java)

public static void main(String[] args) {
    // Define argument specifications for this module
    List<ArgumentDefinition> argumentDefinitions = Arrays.asList(
        new ArgumentDefinition("prompt", String.class, "The prompt to normalize", true),
        new ArgumentDefinition("noisy", Boolean.class, "Enable verbose output", false)
    );
    
    // Build inbox from stdin + CLI args using shared utility
    Map<String, Object> inbox = Wrapper.getInbox(argumentDefinitions);
    
    // Extract parameters for core function call
    String prompt = (String) inbox.get("prompt");
    boolean noisy = (Boolean) inbox.getOrDefault("noisy", false);
    
    // Call core function with identical semantics
    Map<String, Object> outbox = normalizePrompt(inbox, prompt, noisy);
    
    // Emit JSON output for pipeline consumption
    Wrapper.dumpOutbox(outbox);
}

Key Points:
The flow mirrors the Python wrapper, except that, since Java has no kwargs, the known parameters (prompt, noisy) are extracted explicitly from the inbox before calling the core function; the full inbox is still passed along so that any extra keys are preserved in the outbox.


Fabricator Example: fabricate_response

Fabricator (Python)

def fabricate_response(box: Dict[str, Any], *, prompt: str, noisy: bool = False, **_rest) -> Dict[str, Any]:
    """Fabricator that processes a prompt through a pipeline: normalize, generate, present."""
    # Create initial box from destructured parameters
    initial_box = {
        'prompt': prompt,
        'noisy': noisy,
        **_rest  # Preserve any extra keys from the input box
    }
    
    # Define the pipeline modules
    modules = [
        normalize_prompt,   # see above example
        generate_response,  # ToDo
        present_response    # ToDo
    ]
    
    # Run the pipeline using the box-then-kwargs pattern
    result = run_pipeline(modules, initial_box)
    
    return result

def main():
    """Shell wrapper for fabricate_response fabricator."""
    from libs.core.wrapper import get_inbox, dump_outbox
    
    # Define argument specifications for this fabricator
    argument_definitions = [
        ('prompt', str, 'The prompt to process', True),
        ('noisy', bool, 'Enable verbose output', False)
    ]
    
    # Build inbox from stdin + CLI args using shared utility
    inbox = get_inbox(argument_definitions)
    
    # Call core function with identical semantics
    outbox = fabricate_response(inbox, **inbox)
    
    # Emit JSON output for pipeline consumption
    dump_outbox(outbox)

where the run_pipeline functional helper composes modules using the box-then-kwargs pattern:

def run_pipeline(modules: List[Callable], initial_box: Dict[str, Any]) -> Dict[str, Any]:
    """
    Run a pipeline of modules on an initial box.
    
    Each module receives the full box plus destructured arguments,
    enabling both box-based and parameter-based access patterns.
    
    Args:
        modules: List of module functions to execute in sequence
        initial_box: The initial box to process through the pipeline
        
    Returns:
        Final box after all modules have been applied
    """
    box = initial_box
    for module in modules:
        box = module(box, **box)  # Each module gets full box + destructured args
    return box

Fabricator (Java)

public static Map<String, Object> fabricateResponse(Map<String, Object> box, String prompt, boolean noisy) {
    // Fabricator that processes a prompt through a pipeline: normalize, generate, present
    // Create initial box from destructured parameters
    Map<String, Object> initialBox = new HashMap<>(box);
    initialBox.put("prompt", prompt);
    initialBox.put("noisy", noisy);
    
    // Define the pipeline modules. normalizePrompt takes extra parameters, so it is
    // adapted to the box-only Function signature with a lambda that destructures the
    // needed keys from the box.
    List<Function<Map<String, Object>, Map<String, Object>>> modules = Arrays.asList(
        b -> normalizePrompt(b, (String) b.get("prompt"),
                             (Boolean) b.getOrDefault("noisy", false)),   // see above example
        FabricateResponse::generateResponse,  // ToDo
        FabricateResponse::presentResponse    // ToDo
    );
    
    // Run the pipeline, passing the box through each module in turn
    return runPipeline(modules, initialBox);
}

public static void main(String[] args) {
    // Define argument specifications for this fabricator
    List<ArgumentDefinition> argumentDefinitions = Arrays.asList(
        new ArgumentDefinition("prompt", String.class, "The prompt to process", true),
        new ArgumentDefinition("noisy", Boolean.class, "Enable verbose output", false)
    );
    
    // Build inbox from stdin + CLI args using shared utility
    Map<String, Object> inbox = Wrapper.getInbox(argumentDefinitions);
    
    // Extract parameters for core function call
    String prompt = (String) inbox.get("prompt");
    boolean noisy = (Boolean) inbox.getOrDefault("noisy", false);
    
    // Call core function with identical semantics
    Map<String, Object> outbox = fabricateResponse(inbox, prompt, noisy);
    
    // Emit JSON output for pipeline consumption
    Wrapper.dumpOutbox(outbox);
}

where the runPipeline functional helper composes modules by passing the full box to each one in turn (Java's analogue of the box-then-kwargs pattern):

/**
 * Run a pipeline of modules on an initial box.
 *
 * Each module receives the full box, enabling both box-based
 * and parameter-based access patterns.
 *
 * @param modules List of module functions to execute in sequence
 * @param initialBox The initial box to process through the pipeline
 * @return Final box after all modules have been applied
 */
public static Map<String, Object> runPipeline(
    List<Function<Map<String, Object>, Map<String, Object>>> modules, 
    Map<String, Object> initialBox) {
    Map<String, Object> box = new HashMap<>(initialBox);
    for (Function<Map<String, Object>, Map<String, Object>> module : modules) {
        box = module.apply(box);  // Each module gets full box
    }
    return box;
}

These fabricators demonstrate that Fabricators can be Modules too: fabricate_response / fabricateResponse can be used both as a standalone program and as a composable module in larger pipelines, as sketched below. Note: the implementations of the last two modules (generateResponse and presentResponse) are left as exercises for the reader.
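For example, a minimal sketch of that reuse in Python (archive_response is a hypothetical downstream module; run_pipeline and fabricate_response are the helpers defined above):

# An outer pipeline that treats the fabricator as an ordinary module.
outer_modules = [
    fabricate_response,   # the fabricator from above, used here as a module
    archive_response,     # hypothetical module that stores the finished response
]

final_box = run_pipeline(outer_modules, {"prompt": "  Hello, SeqWeb!  ", "noisy": True})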


Cross-Language Execution Model

When modules are implemented in different languages and invoked as standalone programs, each module runs as a separate process: its shell wrapper builds the inbox from JSON on stdin plus CLI arguments, the core function transforms the box, and the outbox is written back to stdout as JSON. A driver, or the next program in a shell pipeline, passes that JSON box along to the following stage (as sketched below).

This unified design enables modules written in different languages to be composed freely, replaced or reordered without touching their neighbors, and tested in isolation, since every stage speaks the same box-over-JSON interface.
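For illustration only, a driver that chains wrapper programs at the shell level might look like the following minimal sketch (the program names are hypothetical, reusing the module names from this document; in SeqWeb the actual wiring is handled by the Fabricators and their wrappers):

import json
import subprocess
from typing import Any, Dict, List

def run_shell_pipeline(programs: List[str], initial_box: Dict[str, Any]) -> Dict[str, Any]:
    """Chain wrapper programs, passing the box between them as JSON via stdin/stdout."""
    box = initial_box
    for program in programs:
        proc = subprocess.run(
            [program],
            input=json.dumps(box),
            capture_output=True,
            text=True,
            check=True,
        )
        box = json.loads(proc.stdout)
    return box

# Hypothetical programs implemented in different languages but honoring the same contract.
final_box = run_shell_pipeline(
    ["normalize-prompt-java", "generate_response", "present_response"],
    {"prompt": "  Hello, SeqWeb!  ", "noisy": False},
)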

Multi-Language Module Example

A true polyglot pipeline can mix languages at the module level. Here’s a Python core module that calls a Java wrapper:

def polyglot_normalize_prompt(box: Dict[str, Any], *, prompt: str, noisy: bool = False, **_rest) -> Dict[str, Any]:
    """Python core that delegates to Java normalizePrompt wrapper."""
    import subprocess
    import json
    
    # Assumption: Java wrapper is compiled and available in PATH as 'normalize-prompt-java'
    # In practice, this would be configured via seqvar or build system
    java_wrapper_cmd = "normalize-prompt-java"
    
    # Prepare input for Java wrapper
    java_input = {
        "prompt": prompt,
        "noisy": noisy,
        **_rest
    }
    
    try:
        # Call Java wrapper with JSON input via stdin
        result = subprocess.run(
            [java_wrapper_cmd],
            input=json.dumps(java_input),
            text=True,
            capture_output=True,
            check=True
        )
        
        # Parse JSON output from Java wrapper
        java_output = json.loads(result.stdout)
        
        if noisy:
            print(f"Java wrapper returned: {len(java_output)} keys")
        
        return java_output
        
    except subprocess.CalledProcessError as e:
        if noisy:
            print(f"Java wrapper failed: {e.stderr}")
        raise RuntimeError(f"Java normalizePrompt failed: {e}") from e
    except json.JSONDecodeError as e:
        if noisy:
            print(f"Invalid JSON from Java wrapper: {result.stdout}")
        raise RuntimeError(f"Java normalizePrompt returned invalid JSON: {e}") from e

def main():
    """Shell wrapper for polyglot_normalize_prompt module."""
    from libs.core.wrapper import get_inbox, dump_outbox
    
    # Define argument specifications for this polyglot module
    argument_definitions = [
        ('prompt', str, 'The prompt to normalize', True),
        ('noisy', bool, 'Enable verbose output', False)
    ]
    
    # Build inbox from stdin + CLI args using shared utility
    inbox = get_inbox(argument_definitions)
    
    # Call core function with identical semantics
    outbox = polyglot_normalize_prompt(inbox, **inbox)
    
    # Emit JSON output for pipeline consumption
    dump_outbox(outbox)

This demonstrates how a Python module can seamlessly delegate to a Java implementation while maintaining the same wrapper contract. The Java wrapper (normalize-prompt-java) would be a compiled version of the Java normalizePrompt example shown earlier.

Key assumptions made:
The Java wrapper is compiled and available on the PATH as normalize-prompt-java (in practice its location would be configured via seqvar or the build system), and it honors the same wrapper contract: a JSON box in on stdin, a JSON outbox out on stdout.

Program Reusability

By structuring Module programs around the shared get_inbox / dump_outbox pattern, each Module becomes naturally reusable in multiple contexts: as a standalone shell program, as a stage composed natively inside a Fabricator pipeline, or as a subprocess invoked from a module written in another language.

Wrapper Contract Requirements:
To keep that reusability, the wrapper must remain a thin translation layer: argument definitions, get_inbox, a single call to the core function, and dump_outbox, with no module-specific logic of its own.

This separation ensures wrappers maintain their contract while enabling comprehensive testing and flexible usage patterns.
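For example, native reuse makes unit testing straightforward; a minimal pytest-style sketch against the normalize_prompt core shown earlier:

def test_normalize_prompt_core():
    # Call the core function directly; no shell wrapper or subprocess is needed.
    inbox = {"prompt": "  Hello, SeqWeb!  ", "noisy": False, "extra": "preserved"}
    outbox = normalize_prompt(inbox, **inbox)
    assert outbox["normalized_prompt"] == "hello, seqweb!"
    assert outbox["extra"] == "preserved"      # pass-through semantics
    assert "normalized_prompt" not in inbox    # the inbox itself is left unmodified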


Architectural Principles

The design rests on a few recurring principles, all visible in the patterns above: core functions are pure (box in, box out); wrappers are thin, contract-bound translation layers; unknown keys pass through modules unchanged; and every module is usable both natively and as a standalone program, so pipelines can mix languages freely.

Validation, error handling, testing, debugging & logging

Testing Strategy

Error Handling

Debugging

TODO: Add comprehensive section covering validation patterns, error handling strategies, testing approaches, debugging techniques, and logging standards for polyglot pipeline modules.


Rationale for Polyglot Implementation

While entire Fabricator pipelines could, in theory, be implemented in a single language, our design favors a polyglot approach. This allows each module to leverage the language best suited to its particular role, improving expressiveness, maintainability, and integration with existing tools. For example:

Java Strengths
Mature RDF and semantic-web tooling, strong typing, and solid performance for building and manipulating large knowledge graphs.

Python Strengths
Rapid prototyping and a rich ecosystem for text processing, entity extraction, and orchestration glue.

Lisp (Common Lisp) Strengths
Symbolic computation and highly interactive, incremental development, well suited to rule-based transformation and enrichment.

Bash Strengths
Lightweight coordination of programs, files, and system tools at the shell level.

The modular, mixed-language Webwright design pattern allows us to prototype rapidly, optimize when needed, and maintain clarity between different kinds of logic: transformation, coordination, parsing, and enrichment.