Save gather to S3
Save gather to S3 writes a seismic gather dataset to Amazon S3 cloud storage in a tiled format optimised for fast parallel retrieval. The module groups traces by bin (inline/crossline), reads them in configurable bulk chunks, and uploads each bin gather as a separate S3 object using multiple write threads. A metadata file is also written to S3 to record the gather dimensions and grouping mode, enabling the Load gather from S3 module to reconstruct the dataset later.
This module is used in cloud-based processing workflows to archive processed seismic volumes to S3 buckets, making them available for subsequent processing steps or long-term storage. Authentication is managed through named credential profiles. After execution, the module reports the total number of traces and files written.
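The grouping step described above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: the trace records, header field names (`inline`, `xline`) and the `group_traces_by_bin` helper are all assumptions standing in for the module's internal header handling.

```python
from collections import defaultdict

def group_traces_by_bin(traces):
    """Group trace records by their (inline, crossline) bin key.

    Each trace is assumed to be a dict carrying 'inline' and 'xline'
    header values; the real module reads these from the trace headers
    input. Each resulting gather becomes one S3 object.
    """
    gathers = defaultdict(list)
    for trace in traces:
        gathers[(trace["inline"], trace["xline"])].append(trace)
    return dict(gathers)

# Three traces falling into two bins produce two per-bin gathers,
# which the module would upload as two separate S3 objects.
traces = [
    {"inline": 100, "xline": 200, "samples": [0.1, 0.2]},
    {"inline": 100, "xline": 200, "samples": [0.3, 0.4]},
    {"inline": 101, "xline": 200, "samples": [0.5, 0.6]},
]
gathers = group_traces_by_bin(traces)
```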
The primary seismic data container from the upstream module, carrying all associated seismic data items to be written to S3.
The seismic data reader handle used to access trace amplitude data for writing. This provides the module with the trace count, sample count, and sample interval needed to initialise the S3 service.
The collection of trace headers for the dataset. The module uses the bin (inline/crossline) information in these headers to group traces into per-bin gathers before uploading to S3.
The seismic gather to be saved. Connect this to the gather output of the upstream processing module.
The 2D stack line geometry associated with the dataset, if applicable.
The crooked 2D line geometry associated with the dataset, if applicable.
The 3D bin grid geometry associated with the dataset, used for inline/crossline address resolution, if applicable.
An indexed and sorted version of the trace headers from the upstream module, enabling ordered per-gather access.
The S3 key prefix (path) under which the gather dataset will be stored. All per-bin object files and the metadata file will be written under this prefix within the configured S3 bucket. Use the same name when loading the dataset back with Load gather from S3.
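To illustrate how the prefix scopes the dataset, the sketch below builds per-bin and metadata object keys under it. The exact key layout used by the module is internal; the `gather_<inline>_<xline>` and `metadata.json` names here are hypothetical placeholders.

```python
def bin_object_key(prefix, inline, xline):
    # Hypothetical naming scheme: one object per populated bin,
    # stored under the dataset's key prefix.
    return f"{prefix}/gather_{inline}_{xline}"

def metadata_key(prefix):
    # The metadata object records gather dimensions and grouping mode
    # so Load gather from S3 can reconstruct the dataset.
    return f"{prefix}/metadata.json"

keys = [bin_object_key("surveys/line01", 100, 200),
        metadata_key("surveys/line01")]
```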
When enabled, the module skips writing any S3 object that already exists at the target path, rather than overwriting it. This can speed up re-runs of partially completed jobs but may result in inconsistent data if previous writes were incomplete. The UNSAFE label indicates that partial datasets can occur; enable this option only when you are confident the existing files are valid. Default: disabled (files are always overwritten).
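The skip-if-exists behaviour amounts to an existence probe before each PUT. The sketch below shows the idea with a generic client; `object_exists` and `maybe_upload` are illustrative names, and a production check against real S3 (for example via boto3's `head_object`) should narrow the exception handling to the 404 case rather than catching everything.

```python
def object_exists(s3_client, bucket, key):
    """Return True if an object already exists at bucket/key.

    With a real S3 client, a missing key surfaces as a 404 error from
    head_object; here any exception is treated as "missing", which a
    production implementation should narrow to the 404 case.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False

def maybe_upload(s3_client, bucket, key, body, skip_existing):
    """Upload unless skipping is requested and the object is present."""
    if skip_existing and object_exists(s3_client, bucket, key):
        return False  # left untouched -- may hide a partial earlier write
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return True
```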
Determines how traces are grouped into S3 objects. Currently only Bin grouping (by inline/crossline bin) is implemented; the Source and Receiver grouping modes are reserved for future use, so the parameter is fixed at its default. Default: Bin.
The name of the AWS credentials profile used to authenticate with Amazon S3. Profiles are defined in the local S3 credentials configuration file (INI format). Select the profile that has write access to the destination S3 bucket.
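A credentials file in this INI format typically looks like the following. This mirrors the standard AWS shared-credentials layout; the profile names shown are examples, and the file's exact location and supported keys depend on your installation.

```ini
; Example S3 credentials file (standard AWS INI layout).
; The profile selected in the module must match a section name here.
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...

[seismic-writer]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```

The chosen profile must hold keys with write permission on the destination bucket; read-only credentials will cause the upload to fail at execution time.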
Controls whether this module automatically connects to the preceding module in the sequence flow. When enabled, the module inherits the data connection from the upstream step without requiring a manual link.
The number of parallel threads used to upload per-bin gather objects to S3. Default: 5. Increasing this value can reduce total write time for large datasets with many bins, subject to network bandwidth and S3 throughput limits. Valid range: 1 to 1000.
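The write-thread pool can be pictured with Python's standard `concurrent.futures`. The `upload_all` helper and caller-supplied `upload_fn` below are stand-ins for the module's internal uploader, which is not exposed.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(upload_fn, gathers, write_threads=5):
    """Upload per-bin gathers concurrently and return the object count.

    upload_fn performs a single object upload; gathers is the iterable
    of per-bin payloads produced by the grouping step. write_threads
    corresponds to the Write threads parameter (default 5).
    """
    with ThreadPoolExecutor(max_workers=write_threads) as pool:
        results = list(pool.map(upload_fn, gathers))
    return len(results)

# A stub uploader that just records what it was given.
uploaded = []
count = upload_all(uploaded.append, [("g1",), ("g2",), ("g3",)],
                   write_threads=2)
```

More threads help only while the network link and S3 request limits are not saturated, which is why the practical sweet spot is usually well below the allowed maximum of 1000.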
The number of traces read from the seismic source in each read batch before being grouped and uploaded to S3. Default: 10000 traces. Larger values increase memory usage but reduce the number of read operations, which can improve throughput on datasets with high per-read latency. Valid range: 1 to 1,000,000,000 traces.
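The batching behaviour is equivalent to walking the trace range in fixed-size windows. The generator below is a sketch of that loop; `read_in_chunks` is a hypothetical name, and the real module reads actual trace data for each window before grouping and uploading it.

```python
def read_in_chunks(total_traces, bulk_read_size=10000):
    """Yield (start, count) batches covering all traces in order.

    Each batch is read from the seismic source, grouped into per-bin
    gathers, and handed to the upload threads before the next batch
    is fetched, bounding memory use by the bulk read size.
    """
    start = 0
    while start < total_traces:
        count = min(bulk_read_size, total_traces - start)
        yield start, count
        start += count

# 25,000 traces at the default batch size of 10,000 gives three reads.
batches = list(read_in_chunks(25000, 10000))
```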
Selects whether processing runs on the CPU or GPU. For this module, which performs network I/O rather than heavy computation, CPU execution is standard.
Controls whether the module runs on a remote processing node in a distributed cluster environment. When enabled, the job is submitted to a remote node rather than executed locally.
The minimum number of gathers processed per execution chunk in distributed mode. Larger values reduce scheduling overhead but require more memory per node.
When distributed execution is active, this setting caps the number of threads that each remote node is allowed to use, preventing overloading of shared cluster resources.
An optional text label appended to the distributed job name. Use this to distinguish multiple simultaneous jobs running on the cluster.
When enabled, allows the user to manually specify which CPU cores or NUMA nodes this module is allowed to use. Leave disabled to let the system assign resources automatically.
The specific CPU core or NUMA node affinity mask, active only when Set custom affinity is enabled.
The number of CPU threads to use for local execution. Higher values can improve throughput when processing many gathers sequentially.
When enabled, this module is bypassed and execution continues with the next module in the flow. Use this setting to temporarily disable the module without removing it from the workflow.
The total number of seismic traces successfully written to S3 after execution completes. Use this value to verify that all expected traces were uploaded.
The total number of S3 objects (per-bin gather files) written during execution. Each populated inline/crossline bin results in one S3 object.