Save gather to S3
Save gather to S3 writes a seismic gather dataset to Amazon S3 cloud storage in a tiled format optimised for fast parallel retrieval. The module groups traces by bin (inline/crossline), reads them in configurable bulk chunks, and uploads each bin gather as a separate S3 object using multiple write threads. A metadata file is also written to S3 to record the gather dimensions and grouping mode, enabling the Load gather from S3 module to reconstruct the dataset later.
This module is used in cloud-based processing workflows to archive processed seismic volumes to S3 buckets, making them available for subsequent processing steps or long-term storage. Authentication is managed through named credential profiles. After execution, the module reports the total number of traces and files written.
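The grouping step described above can be sketched in a few lines. This is a minimal illustration, not the module's implementation: the trace records, header field names (`inline`, `xline`) and the `group_traces_by_bin` helper are all assumptions standing in for the module's internal header handling.

```python
from collections import defaultdict

def group_traces_by_bin(traces):
    """Group trace records by their (inline, crossline) bin key.

    Each trace is assumed to be a dict carrying 'inline' and 'xline'
    header values; the real module reads these from the trace headers
    input. Each resulting gather becomes one S3 object.
    """
    gathers = defaultdict(list)
    for trace in traces:
        gathers[(trace["inline"], trace["xline"])].append(trace)
    return dict(gathers)

# Three traces falling into two bins produce two per-bin gathers,
# which the module would upload as two separate S3 objects.
traces = [
    {"inline": 100, "xline": 200, "samples": [0.1, 0.2]},
    {"inline": 100, "xline": 200, "samples": [0.3, 0.4]},
    {"inline": 101, "xline": 200, "samples": [0.5, 0.6]},
]
gathers = group_traces_by_bin(traces)
```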
The primary seismic data container from the upstream module, carrying all associated seismic data items to be written to S3.
The seismic data reader handle used to access trace amplitude data for writing. This provides the module with the trace count, sample count, and sample interval needed to initialise the S3 service.
The collection of trace headers for the dataset. The module uses the bin (inline/crossline) information in these headers to group traces into per-bin gathers before uploading to S3.
The seismic gather to be saved. Connect this to the gather output of the upstream processing module.
The 2D stack line geometry associated with the dataset, if applicable.
The crooked 2D line geometry associated with the dataset, if applicable.
The 3D bin grid geometry associated with the dataset, used for inline/crossline address resolution, if applicable.
An indexed and sorted version of the trace headers from the upstream module, enabling ordered per-gather access.
The S3 key prefix (path) under which the gather dataset will be stored. All per-bin object files and the metadata file will be written under this prefix within the configured S3 bucket. Use the same name when loading the dataset back with Load gather from S3.
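To illustrate how the prefix scopes the dataset, the sketch below builds per-bin and metadata object keys under it. The exact key layout used by the module is internal; the `gather_<inline>_<xline>` and `metadata.json` names here are hypothetical placeholders.

```python
def bin_object_key(prefix, inline, xline):
    # Hypothetical naming scheme: one object per populated bin,
    # stored under the dataset's key prefix.
    return f"{prefix}/gather_{inline}_{xline}"

def metadata_key(prefix):
    # The metadata object records gather dimensions and grouping mode
    # so Load gather from S3 can reconstruct the dataset.
    return f"{prefix}/metadata.json"

keys = [bin_object_key("surveys/line01", 100, 200),
        metadata_key("surveys/line01")]
```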
When enabled, the module skips writing any S3 object that already exists at the target path, rather than overwriting it. This can speed up re-runs of partially completed jobs but may result in inconsistent data if previous writes were incomplete. The UNSAFE label indicates that partial datasets can occur; enable this option only when you are confident the existing files are valid. Default: disabled (files are always overwritten).
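The skip-if-exists behaviour amounts to an existence probe before each PUT. The sketch below shows the idea with a generic client; `object_exists` and `maybe_upload` are illustrative names, and a production check against real S3 (for example via boto3's `head_object`) should narrow the exception handling to the 404 case rather than catching everything.

```python
def object_exists(s3_client, bucket, key):
    """Return True if an object already exists at bucket/key.

    With a real S3 client, a missing key surfaces as a 404 error from
    head_object; here any exception is treated as "missing", which a
    production implementation should narrow to the 404 case.
    """
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False

def maybe_upload(s3_client, bucket, key, body, skip_existing):
    """Upload unless skipping is requested and the object is present."""
    if skip_existing and object_exists(s3_client, bucket, key):
        return False  # left untouched -- may hide a partial earlier write
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return True
```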
Determines how traces are grouped into S3 objects. Currently only Bin grouping (by inline/crossline bin) is implemented; the Source and Receiver grouping modes are reserved for future use, so the parameter is fixed at its default. Default: Bin.
The name of the AWS credentials profile used to authenticate with Amazon S3. Profiles are defined in the local S3 credentials configuration file (INI format). Select the profile that has write access to the destination S3 bucket.
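A credentials file in this INI format typically looks like the following. This mirrors the standard AWS shared-credentials layout; the profile names shown are examples, and the file's exact location and supported keys depend on your installation.

```ini
; Example S3 credentials file (standard AWS INI layout).
; The profile selected in the module must match a section name here.
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...

[seismic-writer]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
```

The chosen profile must hold keys with write permission on the destination bucket; read-only credentials will cause the upload to fail at execution time.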
Controls whether this module automatically connects to the preceding module in the sequence flow. When enabled, the module inherits the data connection from the upstream step without requiring a manual link.
The number of parallel threads used to upload per-bin gather objects to S3. Default: 5. Increasing this value can reduce total write time for large datasets with many bins, subject to network bandwidth and S3 throughput limits. Valid range: 1 to 1000.
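The write-thread pool can be pictured with Python's standard `concurrent.futures`. The `upload_all` helper and caller-supplied `upload_fn` below are stand-ins for the module's internal uploader, which is not exposed.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(upload_fn, gathers, write_threads=5):
    """Upload per-bin gathers concurrently and return the object count.

    upload_fn performs a single object upload; gathers is the iterable
    of per-bin payloads produced by the grouping step. write_threads
    corresponds to the Write threads parameter (default 5).
    """
    with ThreadPoolExecutor(max_workers=write_threads) as pool:
        results = list(pool.map(upload_fn, gathers))
    return len(results)

# A stub uploader that just records what it was given.
uploaded = []
count = upload_all(uploaded.append, [("g1",), ("g2",), ("g3",)],
                   write_threads=2)
```

More threads help only while the network link and S3 request limits are not saturated, which is why the practical sweet spot is usually well below the allowed maximum of 1000.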
The number of traces read from the seismic source in each read batch before being grouped and uploaded to S3. Default: 10000 traces. Larger values increase memory usage but reduce the number of read operations, which can improve throughput on datasets with high per-read latency. Valid range: 1 to 1,000,000,000 traces.
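The batching behaviour is equivalent to walking the trace range in fixed-size windows. The generator below is a sketch of that loop; `read_in_chunks` is a hypothetical name, and the real module reads actual trace data for each window before grouping and uploading it.

```python
def read_in_chunks(total_traces, bulk_read_size=10000):
    """Yield (start, count) batches covering all traces in order.

    Each batch is read from the seismic source, grouped into per-bin
    gathers, and handed to the upload threads before the next batch
    is fetched, bounding memory use by the bulk read size.
    """
    start = 0
    while start < total_traces:
        count = min(bulk_read_size, total_traces - start)
        yield start, count
        start += count

# 25,000 traces at the default batch size of 10,000 gives three reads.
batches = list(read_in_chunks(25000, 10000))
```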
Selects whether processing runs on the CPU or GPU. For this module, which performs network I/O rather than heavy computation, CPU execution is standard.
Controls whether the module runs on a remote processing node in a distributed cluster environment. When enabled, the job is submitted to a remote node rather than executed locally.
The minimum number of gathers processed per execution chunk in distributed mode. Larger values reduce scheduling overhead but require more memory per node.
When distributed execution is active, this setting caps the number of threads that each remote node is allowed to use, preventing overloading of shared cluster resources.
An optional text label appended to the distributed job name. Use this to distinguish multiple simultaneous jobs running on the cluster.
When enabled, allows the user to manually specify which CPU cores or NUMA nodes this module is allowed to use. Leave disabled to let the system assign resources automatically.
The specific CPU core or NUMA node affinity mask, active only when Set custom affinity is enabled.
The number of CPU threads to use for local execution. Higher values can improve throughput when processing many gathers sequentially.
When enabled, this module is bypassed and execution continues with the next module in the flow. Use this setting to temporarily disable the module without removing it from the workflow.
The total number of seismic traces successfully written to S3 after execution completes. Use this value to verify that all expected traces were uploaded.
The total number of S3 objects (per-bin gather files) written during execution. Each populated inline/crossline bin results in one S3 object.