Input/Output > Load gather from S3

Description

Load gather from S3 reads a seismic gather dataset previously saved to Amazon S3 cloud storage and brings it into the g-Platform processing flow. The module reads a metadata file stored alongside the data to determine the gather dimensions (trace count, sample count, sample interval), then downloads all gather tiles in parallel using multiple read threads. The reconstructed gather is assembled in memory and passed to downstream processing modules as a standard gather output.
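The read sequence described above (metadata first, then tiles fetched in parallel and assembled in memory) can be sketched as follows. This is an illustration only, not g-Platform's implementation: the metadata object name `metadata.json`, its fields (`trace_count`, `sample_count`, `sample_interval`, `tile_traces`), the tile key naming, the float32 tile layout, and the `fetch` callable are all assumptions made for the example.

```python
import json
import struct
from concurrent.futures import ThreadPoolExecutor


def load_gather(fetch, prefix, num_threads=5):
    """Read metadata, fetch all tiles in parallel, and assemble the gather.

    `fetch(key) -> bytes` is a hypothetical callable returning one stored object.
    """
    # Hypothetical metadata object stored alongside the tiles.
    meta = json.loads(fetch(f"{prefix}/metadata.json"))
    n_traces = meta["trace_count"]
    n_samples = meta["sample_count"]
    tile_traces = meta["tile_traces"]  # traces per tile (assumed layout)

    n_tiles = -(-n_traces // tile_traces)  # ceiling division
    keys = [f"{prefix}/tile_{i:05d}.bin" for i in range(n_tiles)]

    # map() preserves input order, so tiles come back in trace order
    # even though the downloads themselves run concurrently.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        blobs = list(pool.map(fetch, keys))

    # Each tile is assumed to hold little-endian float32 samples, trace by trace.
    traces = []
    for blob in blobs:
        values = struct.unpack(f"<{len(blob) // 4}f", blob)
        for start in range(0, len(values), n_samples):
            traces.append(list(values[start:start + n_samples]))

    if len(traces) != n_traces:
        raise ValueError(f"expected {n_traces} traces, got {len(traces)}")
    return meta, traces
```

Passing the download step in as `fetch` keeps the sketch independent of any particular S3 client. With boto3, for instance, `fetch` could wrap a `get_object` call and return the bytes of the response body.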

This module is the counterpart to Save gather to S3 and is designed for cloud-based processing workflows where large seismic datasets are stored in S3 buckets. Authentication is managed through named credential profiles defined in the local credentials configuration file.

Input data

This module does not require a seismic data input connection. All data is read directly from Amazon S3 using the path and credentials specified in the parameters.

Parameters

Gather name

The S3 path (key prefix) identifying the gather dataset to load. This should match the gather name used when the data was written by the Save gather to S3 module. The path must point to the root location of the gather in the S3 bucket, where the metadata file and data tiles are stored.
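As an illustration of how a gather root location maps to object keys, the sketch below splits a path into a bucket and a key prefix and derives the key of the metadata file. Whether g-Platform expects the full `s3://bucket/...` form or only the key prefix is not stated here, so the URI form, the example path, and the `metadata.json` file name are assumptions.

```python
from urllib.parse import urlparse


def split_gather_path(path):
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {path!r}")
    # Strip slashes so the prefix joins cleanly with object names.
    return parsed.netloc, parsed.path.strip("/")


def metadata_key(key_prefix, name="metadata.json"):
    """Key of the (hypothetically named) metadata object under the gather root."""
    return f"{key_prefix}/{name}" if key_prefix else name


# Hypothetical gather location used only for illustration.
bucket, prefix = split_gather_path("s3://my-bucket/surveys/line01/cdp_gathers/")
```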

Credentials profile { geomage }

The name of the AWS credentials profile to use for authenticating with Amazon S3. Profiles are defined in the local S3 credentials configuration file (INI format). The dropdown list is populated automatically from the available profiles in that file. Select the profile that provides access to the S3 bucket containing the gather data.
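A minimal sketch of how profile names can be read from an INI-format credentials file, which is roughly what populating the dropdown involves. The section names and the keys `aws_access_key_id` and `aws_secret_access_key` follow the AWS shared credentials convention. Whether the g-Platform credentials file uses exactly these keys is an assumption, and the values shown are placeholders.

```python
import configparser

# Placeholder file content; real files hold actual keys and must be kept private.
SAMPLE_CREDENTIALS = """
[geomage]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET

[archive]
aws_access_key_id = ANOTHER_KEY_ID
aws_secret_access_key = ANOTHER_SECRET
"""


def list_profiles(ini_text):
    """Return the profile (section) names found in the credentials text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.sections()


def get_profile(ini_text, name):
    """Return the key/value pairs of one profile, or raise if it is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(name):
        raise KeyError(f"profile not found: {name}")
    return dict(parser[name])


profiles = list_profiles(SAMPLE_CREDENTIALS)
```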

Settings

Number of read threads

The number of parallel threads used to download gather tiles from S3. Valid range: 1 to 1000. Default: 5. Increasing this value can significantly reduce loading time for large gathers with many tiles, provided the network connection and the S3 service can sustain the concurrent requests.
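The documented default and range can be expressed as a small validation helper, sketched below. The function name and the optional cap at the tile count are assumptions of this example: threads beyond the number of tiles would have nothing to download, but whether the module applies such a cap is not documented here.

```python
DEFAULT_READ_THREADS = 5
MIN_READ_THREADS = 1
MAX_READ_THREADS = 1000


def resolve_read_threads(requested=None, tile_count=None):
    """Validate the requested thread count against the documented range."""
    if requested is None:
        requested = DEFAULT_READ_THREADS
    if not MIN_READ_THREADS <= requested <= MAX_READ_THREADS:
        raise ValueError(
            f"read threads must be {MIN_READ_THREADS}..{MAX_READ_THREADS}, "
            f"got {requested}"
        )
    # Assumed refinement: never start more threads than there are tiles.
    if tile_count is not None:
        return max(MIN_READ_THREADS, min(requested, tile_count))
    return requested
```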

Execute on { CPU, GPU }

Selects whether processing runs on the CPU or GPU. Because this module performs network I/O rather than heavy computation, CPU execution is the usual choice.

Distributed execution

Controls whether the module runs on a remote processing node in a distributed cluster environment. When enabled, the job is submitted to a remote node rather than executed locally.

Bulk size

The minimum number of gathers processed per execution chunk in distributed mode. Larger values reduce scheduling overhead but require more memory per node.
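One way to picture a minimum chunk size is the sketch below, which splits a list of gather identifiers into bulks and folds a short trailing remainder into the previous bulk so that no bulk falls under the minimum. How the g-Platform scheduler actually forms its chunks is not described here. The function and its remainder handling are assumptions for illustration.

```python
def make_bulks(gather_ids, bulk_size):
    """Split gather ids into bulks holding at least `bulk_size` items each.

    If fewer than `bulk_size` gathers exist in total, a single smaller
    bulk is returned.
    """
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    ids = list(gather_ids)
    bulks = [ids[i:i + bulk_size] for i in range(0, len(ids), bulk_size)]
    # Fold a short trailing remainder into the previous bulk.
    if len(bulks) > 1 and len(bulks[-1]) < bulk_size:
        tail = bulks.pop()
        bulks[-1].extend(tail)
    return bulks
```

With ten gathers and a bulk size of 4, this yields one bulk of four and one of six. A larger bulk size produces fewer, larger bulks, which matches the trade-off between scheduling overhead and memory per node.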

Limit number of threads on nodes

When distributed execution is active, this setting caps the number of threads that each remote node is allowed to use, preventing overloading of shared cluster resources.

Job suffix

An optional text label appended to the distributed job name. Use this to distinguish multiple simultaneous jobs running on the cluster.

Set custom affinity

When enabled, allows the user to manually specify which CPU cores or NUMA nodes this module is allowed to use. Leave disabled to let the system assign resources automatically.

Affinity

The CPU core or NUMA node affinity mask to apply. This parameter is active only when Set custom affinity is enabled.

Number of threads

The number of CPU threads to use for local execution. Higher values can improve throughput when processing many gathers sequentially.

Skip

When enabled, this module is bypassed and execution continues with the next module in the flow. Use this setting to temporarily disable the module without removing it from the workflow.

Output data

Output DataItem

The primary data container passed to the next module in the sequence, carrying all associated seismic data items loaded from S3.

Output SEG-Y data handle

A handle to the seismic data reader, enabling downstream modules to access trace data on demand.

Output trace headers

The collection of trace headers describing all traces in the loaded gather, including geometry and sorting information.

Output gather

The seismic gather loaded from S3, assembled from all downloaded tiles. This gather is available for connection to any downstream processing or display module.

Output stack line

The 2D stack line geometry associated with the loaded dataset, if present.

Output crooked line

The crooked 2D line geometry associated with the loaded dataset, if present.

Output bin grid

The 3D bin grid geometry associated with the loaded dataset, used for inline/crossline address resolution, if present.

Output sorted headers

An indexed and sorted version of the trace headers, used by downstream modules that require ordered access to traces by gather key (e.g., by CDP or offset).
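The idea of an index sorted by gather key can be sketched as follows: the headers stay in their original order, and a separate list of trace indices gives ordered access. Representing headers as dictionaries with `cdp` and `offset` fields is an assumption of this example, not the g-Platform data structure.

```python
from itertools import groupby


def build_sorted_index(headers, keys=("cdp", "offset")):
    """Return trace indices ordered by the given header keys."""
    return sorted(
        range(len(headers)),
        key=lambda i: tuple(headers[i][k] for k in keys),
    )


def group_by_key(headers, order, key="cdp"):
    """Group already-sorted trace indices by one header key."""
    return {
        value: list(indices)
        for value, indices in groupby(order, key=lambda i: headers[i][key])
    }


# Hypothetical headers for four traces, in storage order.
headers = [
    {"cdp": 2, "offset": 100},
    {"cdp": 1, "offset": 300},
    {"cdp": 1, "offset": 100},
    {"cdp": 2, "offset": 50},
]
order = build_sorted_index(headers)
gathers = group_by_key(headers, order)
```

Here `order` lists trace positions by CDP and then by offset, and `gathers` maps each CDP to the traces that belong to it, already in offset order.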
