NAME

Paws::DMS::S3Settings

USAGE

This class represents one of two things:

Arguments in a call to a service

Use the attributes of this class as arguments to methods. You shouldn't make instances of this class. Each attribute should be used as a named argument in the calls that expect this type of object.

As an example, if Att1 is expected to be a Paws::DMS::S3Settings object:

  $service_obj->Method(Att1 => { BucketFolder => $value, ..., UseCsvNoSupValue => $value  });
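
As a slightly fuller sketch of the argument form, the settings can be collected in a plain hash first. The bucket name, folder, and role ARN below are hypothetical placeholders, and the commented-out call assumes Paws is installed and credentials are configured:

```perl
use strict;
use warnings;

# Hypothetical values; replace them with your own bucket and IAM role.
my %s3_settings = (
    BucketName           => 'my-target-bucket',
    BucketFolder         => 'exports',
    ServiceAccessRoleArn => 'arn:aws:iam::123456789012:role/my-dms-role',
    DataFormat           => 'csv',
);

# With Paws available, the hash would be passed as a named argument, e.g.:
#   my $dms = Paws->service('DMS', region => 'us-east-1');
#   $dms->CreateEndpoint(
#       EndpointIdentifier => 'my-endpoint',
#       EndpointType       => 'target',
#       EngineName         => 's3',
#       S3Settings         => \%s3_settings,
#   );

# Print the settings so the sketch runs without network access.
print join(', ', map { "$_ => $s3_settings{$_}" } sort keys %s3_settings), "\n";
```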

Results returned from an API call

Use accessors for each attribute. If Att1 is expected to be a Paws::DMS::S3Settings object:

  $result = $service_obj->Method(...);
  $result->Att1->BucketFolder

DESCRIPTION

Settings for exporting data to Amazon S3.

ATTRIBUTES

BucketFolder => Str

An optional parameter to set a folder name in the S3 bucket. If provided, tables are created in the path "bucketFolder/schema_name/table_name/". If this parameter isn't specified, then the path used is "schema_name/table_name/".
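
The path rule can be illustrated with a small helper. This is only a sketch of the rule as described here; "table_prefix" is a hypothetical name and is not part of Paws:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the table path prefix from the schema name,
# the table name, and an optional BucketFolder value.
sub table_prefix {
    my ($schema, $table, $bucket_folder) = @_;
    my @parts = ($schema, $table);
    unshift @parts, $bucket_folder
        if defined $bucket_folder && length $bucket_folder;
    return join('/', @parts) . '/';
}

print table_prefix('hr', 'employees', 'exports'), "\n";  # exports/hr/employees/
print table_prefix('hr', 'employees'), "\n";             # hr/employees/
```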

BucketName => Str

The name of the S3 bucket.

CdcInsertsAndUpdates => Bool

A value that enables a change data capture (CDC) load to write INSERT and UPDATE operations to .csv or .parquet (columnar storage) output files. The default setting is "false", but when "CdcInsertsAndUpdates" is set to "true" or "y", only INSERTs and UPDATEs from the source database are migrated to the .csv or .parquet file.

For .csv file format only, how these INSERTs and UPDATEs are recorded depends on the value of the "IncludeOpForFullLoad" parameter. If "IncludeOpForFullLoad" is set to "true", the first field of every CDC record is set to either "I" or "U" to indicate INSERT and UPDATE operations at the source. But if "IncludeOpForFullLoad" is set to "false", CDC records are written without an indication of INSERT or UPDATE operations at the source. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.Configuring.InsertOps) in the AWS Database Migration Service User Guide.

AWS DMS supports the use of the "CdcInsertsAndUpdates" parameter in versions 3.3.1 and later.

"CdcInsertsOnly" and "CdcInsertsAndUpdates" can't both be set to "true" for the same endpoint. Set either "CdcInsertsOnly" or "CdcInsertsAndUpdates" to "true" for the same endpoint, but not both.

CdcInsertsOnly => Bool

A value that enables a change data capture (CDC) load to write only INSERT operations to .csv or columnar storage (.parquet) output files. By default (the "false" setting), the first field in a .csv or .parquet record contains the letter I (INSERT), U (UPDATE), or D (DELETE). These values indicate whether the row was inserted, updated, or deleted at the source database for a CDC load to the target.

If "CdcInsertsOnly" is set to "true" or "y", only INSERTs from the source database are migrated to the .csv or .parquet file. For .csv format only, how these INSERTs are recorded depends on the value of "IncludeOpForFullLoad". If "IncludeOpForFullLoad" is set to "true", the first field of every CDC record is set to I to indicate the INSERT operation at the source. If "IncludeOpForFullLoad" is set to "false", every CDC record is written without a first field to indicate the INSERT operation at the source. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.Configuring.InsertOps) in the AWS Database Migration Service User Guide.

AWS DMS supports the interaction described above between the "CdcInsertsOnly" and "IncludeOpForFullLoad" parameters in versions 3.1.4 and later.
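
The way "CdcInsertsOnly", "CdcInsertsAndUpdates", and "IncludeOpForFullLoad" combine for .csv output can be modeled roughly as follows. This is an illustrative reading of the descriptions in this document, not DMS code; "cdc_csv_fields" is a hypothetical name, and the function also rejects setting both CDC flags to true for the same endpoint:

```perl
use strict;
use warnings;

# Hypothetical model of a CDC .csv record. $op is 'I', 'U', or 'D';
# $row is an array reference of column values. Returns an array reference
# of output fields, or undef when the operation is not migrated.
sub cdc_csv_fields {
    my ($op, $row, %s) = @_;
    die "CdcInsertsOnly and CdcInsertsAndUpdates can't both be true\n"
        if $s{CdcInsertsOnly} && $s{CdcInsertsAndUpdates};

    # Default: every operation is written, prefixed with its letter.
    return [ $op, @$row ]
        unless $s{CdcInsertsOnly} || $s{CdcInsertsAndUpdates};

    # With a filter active, only the selected operations are migrated.
    my %kept = $s{CdcInsertsOnly} ? (I => 1) : (I => 1, U => 1);
    return undef unless $kept{$op};

    # The letter is written only when IncludeOpForFullLoad is true.
    return $s{IncludeOpForFullLoad} ? [ $op, @$row ] : [ @$row ];
}

my $fields = cdc_csv_fields('I', [ 42, 'alice' ],
    CdcInsertsOnly => 1, IncludeOpForFullLoad => 1);
print join(',', @$fields), "\n";  # I,42,alice
```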

"CdcInsertsOnly" and "CdcInsertsAndUpdates" can't both be set to "true" for the same endpoint. Set either "CdcInsertsOnly" or "CdcInsertsAndUpdates" to "true" for the same endpoint, but not both.

CdcPath => Str

Specifies the folder path of CDC files. For an S3 source, this setting is required if a task captures change data; otherwise, it's optional. If "CdcPath" is set, AWS DMS reads CDC files from this path and replicates the data changes to the target endpoint. For an S3 target, if you set "PreserveTransactions" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-PreserveTransactions) to "true", AWS DMS verifies that you have set this parameter to a folder path on your S3 target where AWS DMS can save the transaction order for the CDC load. AWS DMS creates this CDC folder path in either your S3 target working directory or the S3 target location specified by "BucketFolder" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-BucketFolder) and "BucketName" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-BucketName).

For example, if you specify "CdcPath" as "MyChangedData", and you specify "BucketName" as "MyTargetBucket" but do not specify "BucketFolder", AWS DMS creates the following CDC folder path: "MyTargetBucket/MyChangedData".

If you specify the same "CdcPath", and you specify "BucketName" as "MyTargetBucket" and "BucketFolder" as "MyTargetData", AWS DMS creates the following CDC folder path: "MyTargetBucket/MyTargetData/MyChangedData".
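
The two examples can be reproduced with a short sketch. "cdc_folder_path" is a hypothetical helper that only joins the three settings in the order described; it is not part of Paws or DMS:

```perl
use strict;
use warnings;

# Hypothetical helper: joins BucketName, the optional BucketFolder, and
# CdcPath into the CDC folder path described in the examples.
sub cdc_folder_path {
    my (%s) = @_;
    return join '/',
        grep { defined($_) && length($_) }
        @s{qw(BucketName BucketFolder CdcPath)};
}

print cdc_folder_path(
    BucketName => 'MyTargetBucket',
    CdcPath    => 'MyChangedData',
), "\n";  # MyTargetBucket/MyChangedData

print cdc_folder_path(
    BucketName   => 'MyTargetBucket',
    BucketFolder => 'MyTargetData',
    CdcPath      => 'MyChangedData',
), "\n";  # MyTargetBucket/MyTargetData/MyChangedData
```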

For more information on CDC including transaction order on an S3 target, see Capturing data changes (CDC) including transaction order on the S3 target (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.EndpointSettings.CdcPath).

This setting is supported in AWS DMS versions 3.4.2 and later.

CompressionType => Str

An optional parameter that controls compression of the target files. Set it to GZIP to compress the files. To leave them uncompressed, either set it to NONE (the default) or omit it. This parameter applies to both .csv and .parquet file formats.

CsvDelimiter => Str

The delimiter used to separate columns in the .csv file for both source and target. The default is a comma.

CsvNoSupValue => Str

This setting only applies if your Amazon S3 output files during a change data capture (CDC) load are written in .csv format. If "UseCsvNoSupValue" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-UseCsvNoSupValue) is set to true, specify a string value that you want AWS DMS to use for all columns not included in the supplemental log. If you do not specify a string value, AWS DMS uses the null value for these columns regardless of the "UseCsvNoSupValue" setting.

This setting is supported in AWS DMS versions 3.4.1 and later.

CsvRowDelimiter => Str

The delimiter used to separate rows in the .csv file for both source and target. The default is a newline ("\n").

DataFormat => Str

The format of the data that you want to use for output. You can choose one of the following:

"csv" : This is a row-based file format with comma-separated values (.csv).
"parquet" : Apache Parquet (.parquet) is a columnar storage file format that features efficient compression and provides faster query response.

DataPageSize => Int

The size of one data page in bytes. This parameter defaults to 1024 * 1024 bytes (1 MiB). This number is used for .parquet file format only.

DatePartitionDelimiter => Str

Specifies a date-separating delimiter to use during folder partitioning. The default value is "SLASH". Use this parameter when "DatePartitionEnabled" is set to "true".

DatePartitionEnabled => Bool

When set to "true", this parameter partitions S3 bucket folders based on transaction commit dates. The default value is "false". For more information about date-based folder partitioning, see Using date-based folder partitioning (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.DatePartitioning).

DatePartitionSequence => Str

Identifies the sequence of the date format to use during folder partitioning. The default value is "YYYYMMDD". Use this parameter when "DatePartitionEnabled" is set to "true".
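
As a rough illustration of the defaults, the sketch below renders a commit date in the "YYYYMMDD" sequence. It assumes that "SLASH" maps to "/" and that the other delimiter names ("UNDERSCORE", "DASH", "NONE") exist as in the DMS API reference; neither the mapping nor the helper name comes from this document:

```perl
use strict;
use warnings;

# Assumed mapping from delimiter names to characters (not stated above).
my %DELIMITER = (SLASH => '/', UNDERSCORE => '_', DASH => '-', NONE => '');

# Hypothetical helper: formats a date in the default YYYYMMDD sequence.
sub date_partition_default {
    my ($year, $month, $day, $delimiter_name) = @_;
    $delimiter_name = 'SLASH' unless defined $delimiter_name;
    die "unknown delimiter $delimiter_name\n"
        unless exists $DELIMITER{$delimiter_name};
    return join $DELIMITER{$delimiter_name},
        sprintf('%04d', $year), sprintf('%02d', $month), sprintf('%02d', $day);
}

print date_partition_default(2021, 3, 5), "\n";          # 2021/03/05
print date_partition_default(2021, 3, 5, 'DASH'), "\n";  # 2021-03-05
```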

DictPageSizeLimit => Int

The maximum size of an encoded dictionary page of a column. If the dictionary page exceeds this size, the column is stored using an encoding type of "PLAIN". This parameter defaults to 1024 * 1024 bytes (1 MiB). This size is used for .parquet file format only.

EnableStatistics => Bool

A value that enables statistics for Parquet pages and row groups. Choose "true" to enable statistics, "false" to disable. Statistics include "NULL", "DISTINCT", "MAX", and "MIN" values. This parameter defaults to "true". This value is used for .parquet file format only.

EncodingType => Str

The type of encoding you are using:

"RLE_DICTIONARY" uses a combination of bit-packing and run-length encoding to store repeated values more efficiently. This is the default.
"PLAIN" doesn't use encoding at all. Values are stored as they are.
"PLAIN_DICTIONARY" builds a dictionary of the values encountered in a given column. The dictionary is stored in a dictionary page for each column chunk.

EncryptionMode => Str

The type of server-side encryption that you want to use for your data. This encryption type is part of the endpoint settings or the extra connections attributes for Amazon S3. You can choose either "SSE_S3" (the default) or "SSE_KMS".

For the "ModifyEndpoint" operation, you can change the existing value of the "EncryptionMode" parameter from "SSE_KMS" to "SSE_S3". But you can’t change the existing value from "SSE_S3" to "SSE_KMS".
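
The "ModifyEndpoint" restriction can be expressed as a small pre-flight check. This is a sketch only; the function name is hypothetical, and treating an unchanged value as allowed is an assumption:

```perl
use strict;
use warnings;

# Hypothetical check: returns 1 if changing EncryptionMode from $current
# to $requested is allowed by the rule above, otherwise 0.
sub encryption_mode_change_allowed {
    my ($current, $requested) = @_;
    return 1 if $current eq $requested;  # assumption: no change is fine
    return ($current eq 'SSE_KMS' && $requested eq 'SSE_S3') ? 1 : 0;
}

print encryption_mode_change_allowed('SSE_KMS', 'SSE_S3'), "\n";  # 1
print encryption_mode_change_allowed('SSE_S3', 'SSE_KMS'), "\n";  # 0
```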

To use "SSE_S3", you need an AWS Identity and Access Management (IAM) role with permission to allow "arn:aws:s3:::dms-*" to use the following actions:

"s3:CreateBucket"
"s3:ListBucket"
"s3:DeleteBucket"
"s3:GetBucketLocation"
"s3:GetObject"
"s3:PutObject"
"s3:DeleteObject"
"s3:GetObjectVersion"
"s3:GetBucketPolicy"
"s3:PutBucketPolicy"
"s3:DeleteBucketPolicy"

ExternalTableDefinition => Str

Specifies how tables are defined in the S3 source files only.

IncludeOpForFullLoad => Bool

A value that enables a full load to write INSERT operations to the comma-separated value (.csv) output files, indicating how the rows were added to the source database. It applies to .csv output only.

AWS DMS supports the "IncludeOpForFullLoad" parameter in versions 3.1.4 and later.

For full load, records can only be inserted. By default (the "false" setting), no information is recorded in these output files for a full load to indicate that the rows were inserted at the source database. If "IncludeOpForFullLoad" is set to "true" or "y", the INSERT is recorded as an I annotation in the first field of the .csv file. This allows the format of your target records from a full load to be consistent with the target records from a CDC load.

This setting works together with the "CdcInsertsOnly" and the "CdcInsertsAndUpdates" parameters for output to .csv files only. For more information about how these settings work together, see Indicating Source DB Operations in Migrated S3 Data (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.Configuring.InsertOps) in the AWS Database Migration Service User Guide.

ParquetTimestampInMillisecond => Bool

A value that specifies the precision of any "TIMESTAMP" column values that are written to an Amazon S3 object file in .parquet format.

AWS DMS supports the "ParquetTimestampInMillisecond" parameter in versions 3.1.4 and later.

When "ParquetTimestampInMillisecond" is set to "true" or "y", AWS DMS writes all "TIMESTAMP" columns in a .parquet formatted file with millisecond precision. Otherwise, DMS writes them with microsecond precision.

Currently, Amazon Athena and AWS Glue can handle only millisecond precision for "TIMESTAMP" values. Set this parameter to "true" for S3 endpoint object files that are .parquet formatted only if you plan to query or process the data with Athena or AWS Glue.

AWS DMS writes any "TIMESTAMP" column values written to an S3 file in .csv format with microsecond precision.

Setting "ParquetTimestampInMillisecond" has no effect on the string format of the timestamp column value that is inserted by setting the "TimestampColumnName" parameter.

ParquetVersion => Str

The version of the Apache Parquet format that you want to use: "parquet_1_0" (the default) or "parquet_2_0".

PreserveTransactions => Bool

If set to "true", AWS DMS saves the transaction order for a change data capture (CDC) load on the Amazon S3 target specified by "CdcPath" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-CdcPath). For more information, see Capturing data changes (CDC) including transaction order on the S3 target (https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html#CHAP_Target.S3.EndpointSettings.CdcPath).

This setting is supported in AWS DMS versions 3.4.2 and later.

RowGroupLength => Int

The number of rows in a row group. A smaller row group size provides faster reads, but writes become slower as the number of row groups grows. This parameter defaults to 10,000 rows. This number is used for .parquet file format only.

If you choose a value larger than the maximum, "RowGroupLength" is set to the max row group length in bytes (64 * 1024 * 1024).
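
The default and the cap can be sketched as a simple clamp. This models only what is stated above; the helper and constant names are hypothetical:

```perl
use strict;
use warnings;

# Maximum stated above: 64 * 1024 * 1024 = 67,108,864.
use constant MAX_ROW_GROUP_LENGTH => 64 * 1024 * 1024;

# Hypothetical helper: applies the documented default and maximum.
sub effective_row_group_length {
    my ($requested) = @_;
    $requested = 10_000 unless defined $requested;  # documented default
    return $requested > MAX_ROW_GROUP_LENGTH
        ? MAX_ROW_GROUP_LENGTH
        : $requested;
}

print effective_row_group_length(), "\n";             # 10000
print effective_row_group_length(100_000_000), "\n";  # 67108864
```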

ServerSideEncryptionKmsKeyId => Str

If you are using "SSE_KMS" for the "EncryptionMode", provide the AWS KMS key ID. The key that you use needs an attached policy that enables AWS Identity and Access Management (IAM) user permissions and allows use of the key.

Here is a CLI example: "aws dms create-endpoint --endpoint-identifier value --endpoint-type target --engine-name s3 --s3-settings ServiceAccessRoleArn=value,BucketFolder=value,BucketName=value,EncryptionMode=SSE_KMS,ServerSideEncryptionKmsKeyId=value"
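
If the settings already live in a Perl hash, the comma-separated "--s3-settings" argument in the CLI example can be assembled as sketched below. The helper name and the key ID are hypothetical, and values that contain commas or spaces would need quoting that this sketch does not handle:

```perl
use strict;
use warnings;

# Hypothetical helper: renders a settings hash as Key=value pairs joined
# by commas, sorted by key so the output is stable.
sub s3_settings_shorthand {
    my (%settings) = @_;
    return join ',', map { "$_=$settings{$_}" } sort keys %settings;
}

my $arg = s3_settings_shorthand(
    BucketName                   => 'my-bucket',
    EncryptionMode               => 'SSE_KMS',
    ServerSideEncryptionKmsKeyId => 'my-key-id',
);
print "--s3-settings $arg\n";
```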

ServiceAccessRoleArn => Str

The Amazon Resource Name (ARN) used by the service access IAM role. It is a required parameter that enables DMS to write and read objects from an S3 bucket.

TimestampColumnName => Str

A value that, when nonblank, causes AWS DMS to add a column with timestamp information to the endpoint data for an Amazon S3 target.

AWS DMS supports the "TimestampColumnName" parameter in versions 3.1.4 and later.

DMS includes an additional "STRING" column in the .csv or .parquet object files of your migrated data when you set "TimestampColumnName" to a nonblank value.

For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.

For a change data capture (CDC) load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.

The string format for this timestamp column value is "yyyy-MM-dd HH:mm:ss.SSSSSS". By default, the precision of this value is in microseconds. For a CDC load, the rounding of the precision depends on the commit timestamp supported by DMS for the source database.
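
To show what a value in this format looks like, the sketch below renders an epoch time plus a microsecond part. Using UTC is an assumption made for the example (the time zone is not stated here), and "format_dms_timestamp" is a hypothetical name:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: formats epoch seconds and a microsecond count in the
# "yyyy-MM-dd HH:mm:ss.SSSSSS" layout. UTC is assumed for illustration.
sub format_dms_timestamp {
    my ($epoch_seconds, $microseconds) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch_seconds))
         . sprintf('.%06d', $microseconds);
}

print format_dms_timestamp(0, 123456), "\n";  # 1970-01-01 00:00:00.123456
```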

When the "AddColumnName" parameter is set to "true", DMS also includes a name for the timestamp column that you set with "TimestampColumnName".

UseCsvNoSupValue => Bool

This setting applies if the S3 output files during a change data capture (CDC) load are written in .csv format. If set to "true" for columns not included in the supplemental log, AWS DMS uses the value specified by "CsvNoSupValue" (https://docs.aws.amazon.com/dms/latest/APIReference/API_S3Settings.html#DMS-Type-S3Settings-CsvNoSupValue). If not set or set to "false", AWS DMS uses the null value for these columns.

This setting is supported in AWS DMS versions 3.4.1 and later.
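
The interplay between "UseCsvNoSupValue" and "CsvNoSupValue" can be summarized in a short sketch, with Perl's undef standing in for the null value. The helper name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value written for columns not included
# in the supplemental log, or undef to represent the null value.
sub no_sup_column_value {
    my (%s) = @_;
    return undef unless $s{UseCsvNoSupValue};  # not set or false: null
    return $s{CsvNoSupValue};                  # undef if no string was given
}

my $value = no_sup_column_value(UseCsvNoSupValue => 1, CsvNoSupValue => 'N/A');
print defined $value ? $value : '(null)', "\n";  # N/A
```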

BUGS and CONTRIBUTIONS

The source code is located here: <https://github.com/pplu/aws-sdk-perl>

Please report bugs to: <https://github.com/pplu/aws-sdk-perl/issues>