Transform¶

Bidirectional data transformation engine for round-trip workflows (decode, patch, re-encode) on nested binary blobs. Domain-agnostic: the engine knows nothing about SAS, Viya, or any specific platform.

TL;DR¶

from kstlib.transform import transform

# Decode a base64 + zlib + JSON blob, patch a hostname inside the
# inner XML payload, and re-encode it in the exact same format
patched_blob = transform(
    blob_b64_string,
    chain_name="patch_report",
)

The chain definition lives in kstlib.conf.yml:

transforms:
  chains:
    sas_report:
      forward:
        - base64
        - zlib:
            skip_bytes: 3
        - json:
            extract: "transferableContent.content"
      backward:
        - json:
            wrap: "transferableContent.content"
        - zlib:
            prepend_bytes: "4d1504"
        - base64

    patch_report:
      preset: sas_report
      patch:
        scope: blob
        replace:
          "https://old-host/": "https://new-host/"

Concept: preset vs usage pattern¶

Transform chains follow a two-tier convention:

Preset : a reusable decode/encode chain. Define it once, reference it from multiple usage chains. A preset has a forward: block (and optionally backward:) but no patch:.
Usage : a chain that inherits the preset’s encode/decode pipeline via preset: and adds its own patch: block (or composed_patch: for surgical workflows).

chains:
  # PRESET: define once, reuse many times
  sas_report:
    forward: [base64, {zlib: {skip_bytes: 3}}, {json: {extract: "..."}}]
    backward: [{json: {wrap: "..."}}, {zlib: {prepend_bytes: "4d1504"}}, base64]

  # USAGE: inherit the preset, add a specific patch
  patch_dev:
    preset: sas_report
    patch:
      replace: {"prod-host": "dev-host"}

  patch_prod:
    preset: sas_report
    patch:
      replace: {"dev-host": "prod-host"}

This separation lets you maintain one decode/encode pipeline definition across many environment-specific patches without duplication.

Note

Chained presets are not supported: a preset cannot itself reference another preset via preset:. Validation enforces this at config-load time.

The 5 primitives¶

Each primitive is bidirectional. The forward direction decodes / parses, the backward direction re-encodes / serializes.

base64¶

forward:
  - base64        # str (b64) -> bytes
backward:
  - base64        # bytes -> str (b64)

Pure RFC 4648 base64 encoding. Three options support proprietary wire formats like SAS Viya report blobs:

Option	Default	Purpose
`strict`	`true`	When `true`, reject any character outside the base64 alphabet (`A-Z`, `a-z`, `0-9`, `+`, `/`, `=`). When `false`, strip non-alphabet chars silently before decoding.
`strip_prefix`	`null`	Literal string removed from the start of the input before decoding. No-op if the input does not start with it (allows mixed blobs). Max 32 chars.
`prefix`	`null`	Literal string prepended to the base64 result on encode. Mirrors `strip_prefix` on the backward path. Max 32 chars.

# SAS Viya report blob: "TRUE###" prefix where ### is a non-alphabet
# separator that strict mode would reject.
forward:
  - base64:
      strict: false             # tolerate ### separator
      strip_prefix: "TRUE###"   # remove SAS marker before decode

backward:
  - base64:
      prefix: "TRUE###"         # re-add SAS marker after encode

The prefix "TRUE" is a clever trick: those 4 base64 chars decode to exactly M\x15\x04 (the SAS proprietary 3-byte header). The ### that follows is a separator that lenient base64 decoders skip. By stripping "TRUE###" first and decoding the rest, you get the raw zlib stream directly without needing skip_bytes on the next stage.
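
The claim about the marker can be checked with the standard library alone. The sketch below does not call kstlib: it builds a synthetic blob in the described layout (the payload is made up for illustration) and decodes it the way the paragraph above describes.

```python
import base64
import json
import zlib

# "TRUE" is four valid base64 characters; decoded, they give the 3-byte header.
header = base64.b64decode("TRUE")
print(header)        # b'M\x15\x04'
print(header.hex())  # 4d1504

# Synthetic blob: marker + base64(zlib(JSON)). The payload is invented.
payload = {"transferableContent": {"content": "<report host='old-host'/>"}}
raw = zlib.compress(json.dumps(payload).encode("utf-8"))
blob = "TRUE###" + base64.b64encode(raw).decode("ascii")

# Strip the marker first, then decode: what remains is a plain zlib stream,
# so no leading bytes have to be skipped before decompression.
body = blob.removeprefix("TRUE###")  # Python 3.9+
decoded = json.loads(zlib.decompress(base64.b64decode(body)))
```

The hex form `4d1504` is the same value used for `prepend_bytes` in the legacy chain, which is why the two approaches describe the same bytes on the wire.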

bytes¶

forward:
  - bytes         # bytes -> str (utf-8)
backward:
  - bytes         # str -> bytes (utf-8)

Used to bridge between binary and string-typed primitives. Accepts an encoding option (default utf-8).

zlib¶

forward:
  - zlib          # compressed -> bytes
backward:
  - zlib          # bytes -> compressed

Option	Default	Purpose
`skip_bytes`	`null`	Strip N leading bytes before decompression. Useful for legacy formats that prepend a header to the zlib stream.
`prepend_bytes`	`null`	Hex string prepended after compression. Mirror of `skip_bytes`.
`level`	`-1`	Compression level. `-1` means “Python default” (typically 6). Range `0` (no compression) to `9` (maximum).

# Maximum compression for the SAS Viya backward chain
backward:
  - zlib:
      level: 9        # smallest output, slowest

skip_bytes cannot be auto-reversed (the engine cannot guess what bytes to re-prepend), so any chain that uses it must declare an explicit backward: block with prepend_bytes.
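
To see why the reverse direction needs explicit input, here is a minimal stdlib sketch of the two directions. The helper names `forward_zlib` and `backward_zlib` are hypothetical and are not kstlib functions; they only mirror the option semantics described in the table.

```python
import zlib

HEADER_HEX = "4d1504"  # same value as prepend_bytes in the examples above


def forward_zlib(data: bytes, skip_bytes: int = 0) -> bytes:
    """Drop `skip_bytes` leading bytes, then decompress the rest."""
    return zlib.decompress(data[skip_bytes:])


def backward_zlib(data: bytes, prepend_hex: str = "", level: int = -1) -> bytes:
    """Compress, then put the header back in front of the stream."""
    return bytes.fromhex(prepend_hex) + zlib.compress(data, level)


original = b"hello " * 100
framed = backward_zlib(original, prepend_hex=HEADER_HEX, level=9)
restored = forward_zlib(framed, skip_bytes=3)
```

The forward step only knows how many bytes to discard (3), not what they were. The content of the header therefore has to come from the backward configuration.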

Tip

Modern SAS Viya workflows should prefer the prefix-based approach (see the base64 primitive’s strip_prefix / prefix options) over the legacy skip_bytes / prepend_bytes pair. The prefix approach is simpler, configurable from YAML alone, and avoids byte-counting math.

The decompressor enforces a hard ratio limit (max 100x expansion) and an absolute size limit (200 MB) to prevent zlib bombs.

json¶

forward:
  - json          # str -> dict
backward:
  - json          # dict -> str (UTF-8)

Standard JSON parsing and serialization. Four options, one on the forward (parse) side and three on the backward (serialize) side:

Option	Default	Purpose
`extract`	`null`	Forward-only. Drill into a nested envelope (dot-notation path).
`wrap`	`null`	Backward-only. Restore the value into the envelope captured during forward.
`minify`	`false`	When `true`, output uses compact `separators=(",", ":")` (no whitespace). Useful before zlib compression: less input to compress yields a smaller blob.
`ensure_ascii`	`false`	When `true`, escape non-ASCII chars to `\uXXXX`. kstlib default is `false`, which diverges from Python stdlib (`true`) to preserve Unicode content (French, Japanese, etc.) without bloating the output.

Warning

The kstlib ensure_ascii default is false, NOT true like Python stdlib. This is an intentional divergence: SAS Viya report blobs (and many other real-world payloads) contain accented characters (café, données) that would otherwise be escaped (caf\u00e9), turning each 2-byte UTF-8 character into a 6-byte escape sequence and inflating the JSON before compression.
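
The size effect is easy to measure with the standard `json` module. This sketch uses a made-up document and compares the stdlib default with the setting kstlib defaults to.

```python
import json

doc = {"title": "Données café"}

escaped = json.dumps(doc)                      # stdlib default: ensure_ascii=True
literal = json.dumps(doc, ensure_ascii=False)  # the kstlib default described above

print(escaped)  # {"title": "Donn\u00e9es caf\u00e9"}
print(literal)  # {"title": "Données café"}

# Each "é" takes 6 bytes when escaped and 2 bytes as UTF-8.
size_escaped = len(escaped.encode("utf-8"))
size_literal = len(literal.encode("utf-8"))
print(size_escaped - size_literal)  # 8 (two characters, 4 extra bytes each)
```

Both strings parse back to the same document, so the choice affects size and readability only.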

extract / wrap are useful for envelope-style payloads:

forward:
  - json:
      extract: "transferableContent.content"   # Drill into the envelope

backward:
  - json:
      wrap: "transferableContent.content"      # Restore the envelope

The forward path stores the original envelope internally (in _ChainContext.json_envelopes) so the backward path can rebuild the exact same structure even when only the inner payload was patched. This makes the round-trip lossless.
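
The envelope mechanics can be sketched in a few lines. The helpers `extract` and `wrap` below are hypothetical stand-ins written for this example. They show the idea of keeping the envelope around, not how `_ChainContext` stores it.

```python
import copy


def extract(envelope: dict, path: str):
    """Follow a dot-notation path and return the nested value."""
    node = envelope
    for key in path.split("."):
        node = node[key]
    return node


def wrap(envelope: dict, path: str, value) -> dict:
    """Return a copy of the envelope with the value at `path` replaced."""
    result = copy.deepcopy(envelope)
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node[key]
    node[leaf] = value
    return result


envelope = {
    "version": 2,
    "transferableContent": {"content": "<r host='old'/>", "kind": "report"},
}
inner = extract(envelope, "transferableContent.content")
patched = inner.replace("old", "new")
rebuilt = wrap(envelope, "transferableContent.content", patched)
```

Every sibling field (`version`, `kind`) survives untouched because the rebuilt document starts from the stored envelope rather than from the inner payload.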

xml¶

forward:
  - xml           # str -> ElementTree.Element
backward:
  - xml           # Element -> str

Uses defusedxml if available (recommended for security). DOCTYPE declarations are rejected by default to prevent XXE attacks and billion-laughs expansion.

The patch stage operates directly on the XML string before re-parsing, which is faster and more flexible than walking the Element tree.

String primitives (forward-only extractors)¶

Beyond the 5 bidirectional codecs above, the engine ships 4 string primitives that slice and clean already-decoded text. They are forward-only: the transformation is terminal (a value is extracted or rewritten and the original cannot be rebuilt), so they have no backward implementation.

In a YAML chain this means you declare only forward: (no backward:), and the chain is usable end-to-end via transform() / from_config(). The engine skips auto-reverse for such a chain; calling .backward() on it raises TransformConfigError with a clear message.

transforms:
  chains:
    last_path_segment:        # forward-only: no backward block
      forward:
        - split:
            sep: "/"
            index: -1

from kstlib.transform import transform

transform("/reports/reports/abc", "last_path_segment")   # -> "abc"

Every string primitive hardens its input before processing: non-string input, payloads over 100 MB, and embedded null bytes are rejected with a ParseError (a [SECURITY] warning is logged first). They operate on str only.

split¶

Split a string on a literal separator (no regex) and either return the full list of segments or extract one by index.

Option	Default	Purpose
`sep`	required	Literal separator (validated non-empty at config-load time).
`index`	`null`	Segment to return. `null` returns the full `list[str]`; an integer returns one segment (0-based, negatives count from the end, `-1` is the last). Out of range raises `ParseError`.
`maxsplit`	`-1`	Forwarded to `str.split` (`-1` means no limit).
`keep_empty`	`false`	When `false`, empty segments are dropped before indexing (path-friendly: a leading separator does not create an empty first segment).

# Extract the last non-empty path segment
- split:
    sep: "/"
    index: -1          # "/reports/reports/abc" -> "abc"

# Return every segment as a list
- split:
    sep: ","           # "a,b,c" -> ["a", "b", "c"]

Warning

keep_empty defaults to false, which diverges from str.split. Python’s "/a/b".split("/") returns ["", "a", "b"], whereas the kstlib split primitive returns ["a", "b"] by default. The divergence is intentional (path-friendly extraction: a leading separator should not yield a phantom empty field). Set keep_empty: true to match the stdlib behavior and reach a leading empty segment.
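
The documented option semantics can be approximated with plain `str.split`. The function `split_segments` is a hypothetical illustration, not the primitive's implementation. It raises `ValueError` and `IndexError` where the engine is documented to raise its own errors.

```python
def split_segments(text, sep, index=None, maxsplit=-1, keep_empty=False):
    """Split on a literal separator; optionally pick one segment by index."""
    if not sep:
        raise ValueError("sep must be a non-empty string")
    parts = text.split(sep, maxsplit)
    if not keep_empty:
        # Drop empty segments BEFORE indexing, as described in the table.
        parts = [p for p in parts if p]
    if index is None:
        return parts
    return parts[index]  # IndexError here; the engine documents ParseError


print(split_segments("/reports/reports/abc", "/", index=-1))  # abc
print(split_segments("/a/b", "/"))                            # ['a', 'b']
print(split_segments("/a/b", "/", keep_empty=True))           # ['', 'a', 'b']
```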

tr¶

Translate or delete characters, like the Unix tr command. This works character by character, which is distinct from the substring replace of the patch stage. Exactly one of delete or map must be set (mutually exclusive, enforced at config-load time).

Option	Purpose
`delete`	String of characters to remove from the input.
`map`	Single-character to single-character translation table.
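
Both modes map naturally onto `str.translate`. The sketch below is an assumption about how such a primitive could be written, not the engine's code. The parameter is named `mapping` here only to avoid shadowing Python's built-in `map`.

```python
def tr(text, delete=None, mapping=None):
    """Delete characters or translate them one-to-one (exactly one mode)."""
    if (delete is None) == (mapping is None):
        raise ValueError("set exactly one of delete or mapping")
    if delete is not None:
        table = str.maketrans("", "", delete)  # third argument: chars to drop
    else:
        table = str.maketrans(mapping)         # keys must be single characters
    return text.translate(table)


print(repr(tr("a\nb\n", delete="\n")))            # 'ab'
print(tr("abab", mapping={"a": "x", "b": "y"}))  # xyxy
```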

# Strip newlines from a value
- tr:
    delete: "\n"        # "a\nb\n" -> "ab"

# Translate characters one-to-one
- tr:
    map:
      "a": "x"
      "b": "y"          # "abab" -> "xyxy"

removeprefix / removesuffix¶

Strip a known literal affix from the start (removeprefix) or end (removesuffix) of a string. These are thin wrappers over str.removeprefix / str.removesuffix: if the input does not carry the affix, it is returned unchanged (no error), so the same chain handles mixed inputs that sometimes carry the affix and sometimes do not.

Option	Default	Purpose
`prefix` (removeprefix)	required	Literal prefix stripped from the start. An empty string is a no-op.
`suffix` (removesuffix)	required	Literal suffix stripped from the end. An empty string is a no-op.

- removeprefix:
    prefix: "reports/"   # "reports/abc" -> "abc"; "other/abc" unchanged

- removesuffix:
    suffix: ".json"      # "data.json" -> "data"; "data.yml" unchanged

Only the single leading or trailing occurrence is removed: removeprefix with prefix: "reports/" turns reports/reports/abc into reports/abc, not abc.
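
Since the primitives are described as thin wrappers, the underlying string methods (Python 3.9+) show the same behavior directly:

```python
once = "reports/reports/abc".removeprefix("reports/")
untouched = "other/abc".removeprefix("reports/")
stem = "data.json".removesuffix(".json")
same = "data.yml".removesuffix(".json")

print(once)       # reports/abc
print(untouched)  # other/abc
print(stem)       # data
print(same)       # data.yml
```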

When to use which¶

Need	Use
Pull one field out of a delimited string (path segment, CSV column)	`split` with `index`
Get every segment of a delimited string as a list	`split` without `index`
Remove or remap individual characters (strip `\n`, `\t`)	`tr`
Strip a known fixed prefix or suffix (`reports/`, `.json`)	`removeprefix` / `removesuffix`
Replace a substring anywhere in the value	the patch `replace:` stage (not a primitive)

Note

There is deliberately no regex primitive. A regex applied to attacker-influenced input is a ReDoS risk that an input-size cap alone does not neutralize, so regex is intentionally deferred until a concrete need justifies the extra hardening work. For substring replacement use the patch replace: stage; for character-level edits use tr.

SAS Viya blob formats¶

SAS Viya transfer packages use two distinct blob formats depending on the type of object being serialized. The transform engine ships two ready-to-use presets, one for each format.

Format A: report blobs (compressed binary)¶

Used for objects with a large textual payload (BIRD XML reports, data sources, …). The wire format is:

"TRUE###" + base64(zlib(JSON envelope))

"TRUE" is 4 base64 characters that decode to M\x15\x04, the 3-byte SAS proprietary header
"###" is a SAS proprietary separator (NOT in the base64 alphabet, must be stripped before strict decoding)
The remainder is base64-encoded zlib-compressed JSON envelope containing the inner payload (e.g. BIRD XML)

The clever bit: stripping "TRUE###" and then base64-decoding the rest yields a raw zlib stream directly, with no additional skip_bytes needed. This is Approach A and matches the actual SAS wire format bit-for-bit.

Format B: metadata blobs (plain JSON)¶

Used for small metadata objects (folders, files, ACLs, …). The wire format is much simpler:

base64(JSON document)

No prefix, no compression, no proprietary header. Just plain base64-encoded JSON.

The two presets¶

transforms:
  chains:

    # PRESET: sas_report (Format A - compressed binary)
    sas_report:
      forward:
        - base64:
            strict: false             # tolerate ### separator
            strip_prefix: "TRUE###"   # strip SAS marker
        - zlib                        # no skip_bytes - Approach A
        - json:
            extract: "transferableContent.content"
      backward:
        - json:
            wrap: "transferableContent.content"
            minify: true              # compact before compression
            ensure_ascii: false       # preserve French/Unicode
        - zlib:
            level: 9                  # max compression
        - base64:
            prefix: "TRUE###"         # re-add SAS marker

    # PRESET: sas_metadata (Format B - plain JSON)
    sas_metadata:
      forward:
        - base64:
            strict: true              # pure base64, no SAS noise
        - json
      backward:
        - json:
            minify: true              # smaller output
            ensure_ascii: false       # preserve Unicode
        - base64

Cross-format dispatch in user code¶

The transform engine itself does not auto-detect which format applies to a given blob. The caller (a pipeline step or a Python script that iterates the package) is responsible for picking the right preset based on transferObject.summary.type:

import json
from kstlib.transform import TransformChain, load_transform_config

config = load_transform_config()
report_chain = TransformChain.from_config("sas_report", config)
metadata_chain = TransformChain.from_config("sas_metadata", config)

with open("MyPackage.json", encoding="utf-8") as f:
    pkg = json.load(f)

for detail in pkg["transferDetails"]:
    to = detail["transferObject"]
    obj_type = to.get("summary", {}).get("type")
    chain = report_chain if obj_type == "report" else metadata_chain
    to["content"] = chain.transform(to["content"])

with open("MyPackage_patched.json", "w", encoding="utf-8") as f:
    json.dump(pkg, f, separators=(",", ":"), ensure_ascii=False)

Note

The "TRUE###" prefix is fully configurable. If SAS changes their proprietary marker tomorrow (e.g. to "TRUE|||" or anything else), you only need to update the YAML strings - no code change required. This is the whole point of having strip_prefix and prefix as data-driven YAML options instead of hardcoded constants.

Forward / patch / backward¶

Every transform chain follows the same three-stage pipeline:

INPUT  -> [forward primitives] -> decoded data -> [patch] -> patched data -> [backward primitives] -> OUTPUT

A chain that does not declare a patch simply round-trips the data through forward and backward (useful for verifying the integrity of the encode/decode pipeline itself). A chain that does not declare a backward block uses auto-reverse (see below).

Auto-reverse rules¶

When a chain declares only forward: (no backward:), the engine generates the backward chain by reversing the order of the forward primitives and swapping primitive options where needed:

Forward primitive	Auto-reverse
`base64`	`base64`
`bytes`	`bytes` (with same encoding)
`xml`	`xml`
`zlib` (no skip_bytes)	`zlib`
`zlib` with `skip_bytes`	error (cannot guess prepend_bytes)
`json` (no extract)	`json`
`json` with `extract: "a.b"`	`json` with `wrap: "a.b"`
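
The table can be read as a small function over the forward list. The sketch below models a chain as `(name, options)` tuples, which is a simplification chosen for this example and not kstlib's config model. It raises a plain `ValueError` where the engine reports a config error.

```python
def auto_reverse(forward):
    """Derive a backward chain from a list of (name, options) tuples."""
    backward = []
    for name, options in reversed(forward):
        options = dict(options)  # do not mutate the forward definition
        if name == "zlib" and "skip_bytes" in options:
            raise ValueError("zlib with skip_bytes needs an explicit backward block")
        if name == "json" and "extract" in options:
            options["wrap"] = options.pop("extract")
        backward.append((name, options))
    return backward


forward = [("base64", {}), ("zlib", {}), ("json", {"extract": "a.b"})]
print(auto_reverse(forward))
# [('json', {'wrap': 'a.b'}), ('zlib', {}), ('base64', {})]
```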

The zlib skip_bytes exception¶

# This chain WILL FAIL at config-load time
chains:
  bad_chain:
    forward:
      - base64
      - zlib:
          skip_bytes: 3      # Cannot auto-reverse: needs explicit prepend_bytes

Fix: declare an explicit backward: block with prepend_bytes:

chains:
  good_chain:
    forward:
      - base64
      - zlib:
          skip_bytes: 3
    backward:
      - zlib:
          prepend_bytes: "4d1504"
      - base64

Patches: replace vs callable¶

A patch: block applies between forward and backward. It supports two mutually exclusive modes:

Mode 1: replace (string substitution)¶

Simple key/value substitution applied to the decoded data (after the forward chain). Works on strings and on serialized XML.

patch:
  scope: blob                          # default - patch decoded data
  replace:
    "https://old-host/": "https://new-host/"
    'library="CASUSER"': 'library="PUBLIC"'

The replace map is applied in dict iteration order. There is no regex support (use callable for regex needs). The engine enforces a maximum of 100 entries per replace map, and 4096 chars per key/value.
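
Iteration order matters as soon as one replacement produces text that another key matches. The sketch assumes the entries are applied one after another with `str.replace`. `apply_replace` is a hypothetical helper, not an engine function.

```python
def apply_replace(text: str, replace: dict) -> str:
    """Apply each substitution in dict insertion order."""
    for old, new in replace.items():
        text = text.replace(old, new)
    return text


xml = '<conn uri="https://old-host/" library="CASUSER"/>'
result = apply_replace(
    xml,
    {
        "https://old-host/": "https://new-host/",
        'library="CASUSER"': 'library="PUBLIC"',
    },
)
print(result)  # <conn uri="https://new-host/" library="PUBLIC"/>

# Same entries, different order, different outcome.
print(apply_replace("a", {"a": "b", "b": "c"}))  # c
print(apply_replace("a", {"b": "c", "a": "b"}))  # b
```

Under that assumption, keeping keys specific enough that no replacement output matches a later key avoids order-dependent surprises.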

Note

Deprecated alias: the field name mapping: is still accepted as a deprecated alias for replace:. Setting mapping: emits a DeprecationWarning and its value is copied to replace:. Migrate existing configs to replace:.

Mode 2: callable¶

For complex patches (regex, lookup tables, conditional logic, external state), use a Python callable:

patch:
  scope: blob
  callable: myproject.transforms:patch_function
  args:
    target_host: "{{target_host}}"     # Resolved from chain context
    cas_mapping: "{{cas_mapping}}"     # Resolved from chain context

The callable target follows the module.path:function_name convention. The function is called as func(decoded_data, **resolved_args) and must return the patched data.

{{variable}} references in args are resolved against the chain’s context dict at execution time, allowing dynamic values to be injected from a pipeline step or any caller.
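
A callable matching the YAML above could look like the following. The function body is purely illustrative: the regex, the argument types, and the sample data are assumptions. Only the calling convention `func(decoded_data, **resolved_args)` comes from the text.

```python
import re


def patch_function(decoded_data: str, target_host: str, cas_mapping: dict) -> str:
    """Point every https URL at target_host and remap caslib names."""
    # A function replacement avoids backslash handling in the host string.
    patched = re.sub(
        r"https://[^/\"']+",
        lambda match: f"https://{target_host}",
        decoded_data,
    )
    for old, new in cas_mapping.items():
        patched = patched.replace(f'library="{old}"', f'library="{new}"')
    return patched


sample = '<c uri="https://old-host/x" library="CASUSER"/>'
print(patch_function(sample, target_host="new-host", cas_mapping={"CASUSER": "PUBLIC"}))
# <c uri="https://new-host/x" library="PUBLIC"/>
```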

Important

External callables must be whitelisted in transforms.security.allowed_callable_modules. A callable whose module is not in the whitelist raises TransformConfigError at config-load time, before any transformation runs.

Patch scope: blob | outer | all¶

The scope: field controls where a replace: patch applies.

Target	`scope: blob` (default)	`scope: outer`	`scope: all`
Decoded data (e.g. BIRD XML)	applied	not applied	applied
Outer wrapper (`metadata['outer']`)	not applied	applied	applied
`connectors[*].hints.xpath`	n/a	PROTECTED, never patched	PROTECTED, never patched

scope: blob is the default and preserves the historical behavior: the replace map is applied to the data decoded by the forward chain.

scope: outer and scope: all mutate a JSON wrapper passed by the caller via the existing metadata= kwarg on chain.transform() / chain.patch(). The wrapper lives outside the encoded blob: think of connectors[].uri and connectors[].hints.orig-uri in a SAS Viya transferObject document. Use these scopes when you need to patch fields that the forward chain never touches.

chains:
  patch_report_full:
    preset: sas_report
    patch:
      scope: all                       # patch BIRD XML AND outer wrapper
      replace:
        'library="CASUSER"': 'library="PUBLIC"'   # BIRD XML form
        'library=CASUSER':   'library=PUBLIC'      # connector URI form
        "https://old-host":  "https://new-host"

import json
from kstlib.transform import TransformChain, load_transform_config

config = load_transform_config()
chain = TransformChain.from_config("patch_report_full", config)

wrapper = json.loads(transfer_object_json)
blob = wrapper["content"]
new_blob = chain.transform(blob, metadata={"outer": wrapper})
wrapper["content"] = new_blob
# wrapper is mutated in place by replace_outer_uris

If scope: is outer or all and metadata['outer'] is missing, the patch raises PatchError at execution time.

Protected outer paths (xpath safety)¶

replace_outer_uris is the helper that powers scope: outer and scope: all. It walks the wrapper recursively and applies the replace map to every string value, except strings whose path matches a PROTECTED_OUTER_PATHS entry.

The default blacklist contains one path:

PROTECTED_OUTER_PATHS = frozenset({
    "connectors[*].hints.xpath",
})

Path syntax	Meaning
`dict.key`	Match a dict key literally.
`[*]`	Match any list index (wildcard).
`dict.key[*].sub.key`	Mix of literal keys and list wildcards.

The connectors[*].hints.xpath blacklist exists because SAS Viya stores BIRD XPath pointers there. Patching these strings would break the wrapper-to-content coherence and silently corrupt the report.

Warning

connectors[*].hints.xpath is ALWAYS protected. Even with scope: all, even with custom replace maps, it is never modified.

You can extend or replace the blacklist by passing a custom protected_paths to replace_outer_uris directly:

from kstlib.transform import replace_outer_uris

n = replace_outer_uris(
    wrapper,
    {"old": "new"},
    protected_paths=frozenset({"my.field", "items[*].immutable"}),
)

The function returns the number of strings that were modified.
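
The walk-and-skip behavior can be illustrated with a small recursive function. `replace_strings` is a hypothetical stand-in written for this page, not `replace_outer_uris` itself. It assumes paths are built as `key.sub` for dicts and `[*]` for any list index, matching the path syntax table above.

```python
PROTECTED = frozenset({"connectors[*].hints.xpath"})


def replace_strings(node, replace, protected=PROTECTED, path=""):
    """Patch string values in place; return how many strings changed."""
    if isinstance(node, dict):
        children = [(k, f"{path}.{k}" if path else str(k)) for k in node]
    elif isinstance(node, list):
        children = [(i, f"{path}[*]") for i in range(len(node))]
    else:
        return 0
    changed = 0
    for key, child_path in children:
        value = node[key]
        if isinstance(value, str):
            if child_path in protected:
                continue  # protected path: leave the string untouched
            new = value
            for old, repl in replace.items():
                new = new.replace(old, repl)
            if new != value:
                node[key] = new
                changed += 1
        else:
            changed += replace_strings(value, replace, protected, child_path)
    return changed


wrapper = {
    "name": "old report",
    "connectors": [
        {"uri": "https://old/x", "hints": {"xpath": "//old", "orig-uri": "https://old/y"}}
    ],
}
count = replace_strings(wrapper, {"old": "new"})
print(count)                                       # 3
print(wrapper["connectors"][0]["hints"]["xpath"])  # //old
```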

Composed patches: surgical multi-object workflows¶

Plain patch: applies to every object in a workflow. When a package contains many objects (e.g. 200 reports) and each needs a different replace map, use composed_patch: instead.

A composed patch references other chains by name and applies their patch block conditionally. Two layers exist:

global_patches: applied to every object regardless of metadata
targeted_patches: applied only when the object metadata matches a filter

chains:
  # "Patch-only" reusable building blocks (no forward/backward)
  remap_host:
    patch:
      replace:
        "https://source.res.private/": "https://target.res.private/"

  remap_caslib_global:
    patch:
      replace:
        'library="CASUSER"': 'library="PROD_GLOBAL_LIB"'

  remap_caslib_r220:
    patch:
      replace:
        'library="CASUSER"': 'library="R220_DEDICATED_LIB"'

  # The orchestrator: inherits forward/backward from sas_report,
  # composes the building blocks via global_patches + targeted_patches
  patch_report_composed:
    preset: sas_report

    global_patches:
      - remap_host             # Applied to EVERY object

    targeted_patches:
      - filter:
          content_type: report
          name: "R220_*"
        patches:
          - remap_caslib_r220

      - filter:
          content_type: report
          name: "*"            # Fallback for other reports
        patches:
          - remap_caslib_global

Cascade: last applied wins (inverse of CSS)¶

Warning

Cascade is by declaration order, NOT by filter specificity. This is the inverse of CSS.

Order your targeted_patches from most general to most specific. The last applied patch overwrites earlier ones on conflict.

Concrete example with the config above:

from kstlib.transform import transform

# Object 1: matches both R220_* and the "*" fallback
result_r220 = transform(
    blob_b64,
    "patch_report_composed",
    metadata={"content_type": "report", "name": "R220_SALES"},
)
# Apply order: remap_host -> remap_caslib_r220 -> remap_caslib_global
# Final caslib: PROD_GLOBAL_LIB (the wildcard fallback wins because
# it is declared LAST in targeted_patches)

# Object 2: matches only the "*" fallback
result_other = transform(
    blob_b64,
    "patch_report_composed",
    metadata={"content_type": "report", "name": "REPORT_FOO"},
)
# Apply order: remap_host -> remap_caslib_global
# Final caslib: PROD_GLOBAL_LIB

If you want R220 reports to keep their dedicated caslib, declare the specific filter LAST so it wins:

targeted_patches:
  - filter: {name: "*"}             # General first (will be overridden)
    patches: [remap_caslib_global]

  - filter: {name: "R220_*"}        # Specific last (final winner for R220)
    patches: [remap_caslib_r220]

This ordering convention is intentional: it mirrors the kstlib config cascade philosophy (kwargs > user config > preset > defaults) where the most explicit override always wins.
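
The outcomes listed above are consistent with the replace maps of all matching patches being merged in apply order, so that a later map overrides an earlier one on the same key. The sketch below assumes that merge model. It is an interpretation of the documented results, not a description of the engine's internals.

```python
PATCHES = {
    "remap_host": {"https://source.res.private/": "https://target.res.private/"},
    "remap_caslib_global": {'library="CASUSER"': 'library="PROD_GLOBAL_LIB"'},
    "remap_caslib_r220": {'library="CASUSER"': 'library="R220_DEDICATED_LIB"'},
}


def merge_in_order(patch_names):
    """Merge replace maps; on a shared key the last patch wins."""
    merged = {}
    for name in patch_names:
        merged.update(PATCHES[name])
    return merged


# Apply order for R220_SALES with the config as declared above.
as_declared = merge_in_order(["remap_host", "remap_caslib_r220", "remap_caslib_global"])
# Apply order after moving the specific filter to the end.
specific_last = merge_in_order(["remap_host", "remap_caslib_global", "remap_caslib_r220"])

print(as_declared['library="CASUSER"'])    # library="PROD_GLOBAL_LIB"
print(specific_last['library="CASUSER"'])  # library="R220_DEDICATED_LIB"
```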

Filter syntax¶

A filter combines two fields, ANDed together:

Field	Type	Default	Notes
`content_type`	string	`"*"` (any)	Exact match against `metadata["content_type"]`
`name`	string (glob)	`"*"` (any)	`fnmatch` glob against `metadata["name"]`

- filter:
    content_type: report      # Exact: must equal "report"
    name: "R220_*"            # Glob: prefix match
  patches: [remap_caslib_r220]

The metadata dict is provided by the caller via the metadata= keyword argument on transform() or chain.transform(). The transform engine never reads metadata from the data itself: the caller is responsible for extracting content_type and name from whatever schema they use (e.g. transferDetails[].transferObject.summary in the SAS Viya Transfer API).
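
Filter evaluation can be sketched with the standard `fnmatch` module. `filter_matches` is a hypothetical helper. It uses `fnmatchcase` so that results do not depend on the platform's case rules, which is a choice made for this example and may differ from the engine.

```python
from fnmatch import fnmatchcase


def filter_matches(flt: dict, metadata: dict) -> bool:
    """Both fields default to "*" and are ANDed together."""
    content_type = flt.get("content_type", "*")
    name_glob = flt.get("name", "*")
    type_ok = content_type == "*" or content_type == metadata.get("content_type")
    name_ok = fnmatchcase(str(metadata.get("name", "")), name_glob)
    return type_ok and name_ok


r220 = {"content_type": "report", "name": "R220_SALES"}
other = {"content_type": "report", "name": "REPORT_FOO"}
specific = {"content_type": "report", "name": "R220_*"}

print(filter_matches(specific, r220))   # True
print(filter_matches(specific, other))  # False
print(filter_matches({}, other))        # True (no filter fields: matches everything)
```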

Note

A targeted_patches entry with no filter (or with all wildcards) matches every object and behaves like an additional global_patches entry. The advantage of using targeted_patches with a "*" filter over global_patches is ordering control: targeted_patches always run AFTER global_patches.

Patch-only chains¶

A chain that has only patch (no forward, no preset) is a “patch-only” chain. It exists solely to be referenced from another chain’s composed_patch.global_patches or composed_patch.targeted_patches[*].patches.

chains:
  remap_host:
    # No forward, no preset, no backward. Just a patch building block.
    patch:
      replace:
        "https://source/": "https://target/"

Calling TransformChain.transform() directly on a patch-only chain works, with forward() and backward() acting as identity no-ops because the primitive list is empty, but it is not the intended use case.

Security¶

Callable whitelist¶

transforms:
  security:
    allowed_callable_modules:
      - myproject.transforms
      - myproject.viya.patches

Only callables whose module path matches an entry in this list (or is a sub-module of one) are allowed. The default is an empty whitelist, which means no external callables can be invoked.

A callable target whose module is not whitelisted raises TransformConfigError at config-load time, before any transformation runs. Replace-based patches are not affected by the whitelist.
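
The "entry or sub-module of an entry" rule is worth spelling out, because a naive prefix test would also accept look-alike module names. `module_allowed` below is a hypothetical sketch of such a check, not the engine's validator.

```python
def module_allowed(target: str, allowed_modules) -> bool:
    """Check a "module.path:function" target against the whitelist."""
    module = target.split(":", 1)[0]
    return any(
        module == allowed or module.startswith(allowed + ".")
        for allowed in allowed_modules
    )


allowed = ["myproject.transforms", "myproject.viya.patches"]

print(module_allowed("myproject.transforms:patch_function", allowed))  # True
print(module_allowed("myproject.transforms.hosts:remap", allowed))     # True
print(module_allowed("myproject.transforms_evil:run", allowed))        # False
print(module_allowed("myproject.transforms:patch_function", []))       # False
```

Appending the dot before the prefix test is what keeps `myproject.transforms_evil` out.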

Hard limits¶

Parameter	Default	Hard Max
Input data size	100 MB	100 MB
JSON payload size	50 MB	50 MB
XML payload size	50 MB	50 MB
Decompressed size	200 MB	200 MB
Decompression ratio	100x	100x
Replace entries per patch	100	100
Replace key/value length	4096 chars	4096 chars
Named chains	50	50
Forward / backward chain length	20 primitives	20 primitives
Global patches per composition	10	10
Targeted patches per composition	50	50
Patches per targeted entry	10	10
Glob pattern length	256 chars	256 chars

Zlib bomb protection¶

The zlib_decompress primitive enforces both an absolute decompressed size limit (200 MB) and a maximum decompression ratio (100x). A zlib stream that expands beyond either threshold raises DecompressError immediately, before allocating the full output buffer.
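
One common way to enforce such limits without materializing the full output is to decompress in bounded chunks. The sketch below uses `zlib.decompressobj` for that. The function name, the error class, and the chunk size are assumptions for illustration and are not taken from the engine.

```python
import zlib

MAX_SIZE = 200 * 1024 * 1024  # absolute cap quoted above
MAX_RATIO = 100               # output may be at most 100x the input


class DecompressLimitError(Exception):
    """Hypothetical error type for this sketch."""


def bounded_decompress(data, max_size=MAX_SIZE, max_ratio=MAX_RATIO, chunk=64 * 1024):
    """Decompress chunk by chunk and stop once a limit is crossed."""
    limit = min(max_size, len(data) * max_ratio)
    decomp = zlib.decompressobj()
    out = bytearray()
    pending = data
    while pending:
        out += decomp.decompress(pending, chunk)  # at most `chunk` bytes per call
        if len(out) > limit:
            raise DecompressLimitError(f"output exceeds {limit} bytes")
        pending = decomp.unconsumed_tail          # input not yet processed
    out += decomp.flush()
    if len(out) > limit:
        raise DecompressLimitError(f"output exceeds {limit} bytes")
    if not decomp.eof:
        raise zlib.error("incomplete or truncated zlib stream")
    return bytes(out)


small = bounded_decompress(zlib.compress(b"hello world" * 10))
bomb = zlib.compress(b"\0" * 5_000_000, 9)  # a few KB that expand to 5 MB
```

With this layout the buffer overshoots the limit by at most one chunk before the error is raised.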

XML security¶

The xml_parse primitive uses defusedxml if available (recommended). DOCTYPE declarations are rejected by default to prevent XXE attacks and billion-laughs entity expansion.

If defusedxml is not installed, the engine falls back to the stdlib xml.etree.ElementTree parser with explicit DOCTYPE rejection layered on top.
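
A DOCTYPE rejection layer over the stdlib parser can be sketched with an expat handler that raises as soon as a DOCTYPE declaration starts. This is one possible approach under stated assumptions. The function and exception names are hypothetical, and the engine's actual fallback may be implemented differently.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeRejected(ValueError):
    """Raised when the document carries a DOCTYPE declaration."""


def parse_xml_no_doctype(text: str) -> ET.Element:
    """Reject any DOCTYPE, then parse with the stdlib ElementTree."""

    def forbid(name, sysid, pubid, has_internal_subset):
        raise DoctypeRejected(f"DOCTYPE {name!r} is not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = forbid
    checker.Parse(text, True)  # first pass: raises before any entity is expanded
    return ET.fromstring(text)


root = parse_xml_no_doctype("<report><host>old</host></report>")
print(root.find("host").text)  # old

hostile = '<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'
```

The document is parsed twice here, which keeps the sketch short. A single-pass variant would attach the same check to the parser that builds the tree.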

YAML config reference¶

Complete schema with comments. All fields are optional unless marked required:

transforms:
  # Security: callable whitelist (default: empty = no callables allowed)
  security:
    allowed_callable_modules:
      - myproject.transforms       # Module path prefix

  # Named chain definitions
  chains:

    # Example 1: full chain with explicit forward + backward + patch
    my_chain:                      # **required** - chain name
      forward:                     # **required** unless preset is set
        - base64                   # Primitive name (string form)
        - zlib:                    # Primitive with options (dict form)
            skip_bytes: 3
        - json:
            extract: "path.to.field"
      backward:                    # Optional (auto-reversed if absent)
        - json:
            wrap: "path.to.field"
        - zlib:
            prepend_bytes: "4d1504"
        - base64
      patch:                       # Optional (no patching if absent)
        scope: blob                # blob | outer | all (default: blob)
        replace:                   # Mutually exclusive with callable
          "old": "new"
        callable: mod.path:fn      # Mutually exclusive with replace
        args:
          key: "{{var}}"           # Resolved from chain context

    # Example 2: chain that inherits a preset
    my_usage:
      preset: my_chain             # Mutually exclusive with forward
      patch:                       # Override the preset's patch
        scope: blob
        replace:
          "foo": "bar"

    # Example 3: chain with composed patch (mutually exclusive with patch)
    my_composed:
      preset: my_chain
      global_patches:              # List of chain names
        - other_chain_a

      targeted_patches:            # Conditional patches
        - filter:
            content_type: report
            name: "R220_*"
          patches:
            - other_chain_b

Python API¶

Convenience function¶

from kstlib.transform import transform

# Loads config from kstlib.conf.yml automatically
result = transform(blob_b64, "patch_report")

# With metadata for composed_patch filter matching
result = transform(
    blob_b64,
    "patch_report_composed",
    metadata={"content_type": "report", "name": "R220_SALES"},
)

Client instance¶

from kstlib.transform import TransformChain, load_transform_config

config = load_transform_config()
chain = TransformChain.from_config("patch_report", config)

# Forward only
decoded = chain.forward(blob_b64)

# Patch only (operates on the already-decoded data)
patched = chain.patch(decoded)

# Backward only (must be called after forward to restore envelopes)
re_encoded = chain.backward(patched)

# Full round-trip
result = chain.transform(blob_b64)

Programmatic construction¶

from kstlib.transform import (
    TransformChain,
    TransformChainConfig,
    PrimitiveConfig,
    PatchConfig,
)

chain = TransformChain(
    TransformChainConfig(
        name="my_chain",
        forward=(
            PrimitiveConfig(name="base64"),
            PrimitiveConfig(name="zlib"),
            PrimitiveConfig(name="json"),
        ),
        patch=PatchConfig(replace={"foo": "bar"}),
    )
)

result = chain.transform(blob_b64_string)

Integration with kstlib.pipeline¶

The transform engine integrates cleanly with kstlib.pipeline via the CallableStep. A pipeline step can invoke kstlib.transform.transform directly:

pipelines:
  patch-and-upload:
    steps:
      - name: load
        type: shell
        command: "kstlib rapi download --out blob.json"

      - name: patch
        type: callable
        callable: kstlib.transform:transform
        args:
          - "{{blob_b64}}"           # Loaded from previous step
          - "patch_report"           # Chain name

      - name: upload
        type: shell
        command: "kstlib rapi upload --body @result.json"

For composed patches, pass metadata= as a kwarg via the callable args (the syntax depends on your pipeline step setup).

Examples¶

See examples/transform/ for runnable demos:

01_round_trip.py : programmatic chain construction with synthetic data, full forward + patch + backward, integrity verification
02_config_driven.py : load a chain from kstlib.conf.yml and apply it via TransformChain.from_config()
03_composed_patch.py : 3 synthetic objects (R220_foo, REPORT_bar, OTHER_baz) demonstrating the global + targeted cascade with explicit before/after output, and scope: all mutating the outer wrapper for the R220 case
04_outer_patch.py : the three scope: values (blob, outer, all) exercised in isolation on a synthetic SAS Viya transferObject, plus the replace_outer_uris standalone helper and a custom protected_paths blacklist

Tip

For complete API documentation including all classes, functions, and exceptions, see Bidirectional Data Transformation.