S3 Simple Storage Service

Introduction

Infinite Scale can store blobs on S3 storage. This document gives general information and considerations for the use of S3. See the S3 section of the General Storage Considerations for which data is stored where.

S3 Overview

This section gives a brief description of S3 and how it compares to POSIX-based filesystems.

S3-based storage is well suited to customers who expect very high capacity demands for storing Infinite Scale blobs.

  • In S3, objects are stored by name in a collection called a bucket and have no hierarchical order.
    Objects in a bucket are indexed and can be listed with a prefix-based search. This provides a clear scaling advantage. Compared with POSIX, the consistency domain is constrained to a single bucket instead of the entire hierarchical tree.

  • The bucket is guaranteed to be eventually consistent.
    The application must accept that an object that was just updated may or may not appear instantly in a listing and may or may not carry the latest information. The platform is only required to make a best effort at being immediately consistent. In practice, the system shows that data is generally fully consistent over time. The architectural freedom to relax this constraint brings real performance and scalability advantages.

  • Object storage works best for static storage.
    This is especially true for unstructured data, where you write data once but may need to read it many times.

  • Requests are REST-based and self-contained.
    This method is very well adapted to WAN-based interactions.

  • Requests are signed with a shared secret and sent over an SSL/TLS connection.
    This protects against man-in-the-middle attacks and prevents passwords from being transmitted in requests.
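
The signing idea in the last point can be illustrated with a minimal sketch. This is not the actual AWS signature algorithm, only a simplified HMAC scheme: the function names sign_request and verify_request and the layout of the signed string are assumptions made for illustration. It shows that only a signature derived from the shared secret travels with the request, never the secret itself.

```python
import hashlib
import hmac


def sign_request(secret_key: str, method: str, path: str, timestamp: str) -> str:
    """Return a hex HMAC-SHA256 signature over a simplified string-to-sign.

    The string-to-sign layout here is hypothetical, not the real S3 format.
    """
    string_to_sign = "\n".join([method, path, timestamp])
    return hmac.new(
        secret_key.encode(), string_to_sign.encode(), hashlib.sha256
    ).hexdigest()


def verify_request(
    secret_key: str, method: str, path: str, timestamp: str, signature: str
) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key, method, path, timestamp)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    sig = sign_request("example-secret", "GET", "/bucket-name/object", "20240101T000000Z")
    # Only the signature is sent; the server holds its own copy of the secret.
    print(verify_request("example-secret", "GET", "/bucket-name/object", "20240101T000000Z", sig))
```

Because the method and path are part of the signed string, a request that is altered in transit no longer matches its signature.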

For more details on S3, read What is Object storage? from Google.
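
As a small illustration of the flat namespace described in this overview, the following sketch models a bucket as a plain key-to-data mapping with prefix-based listing. It is a toy in-memory model, not an S3 client; the class name FlatBucket and its methods are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


class FlatBucket:
    """Toy in-memory bucket: a flat mapping from object key to bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Keys are opaque names; "/" has no special meaning to the store.
        self._objects[key] = data

    def list_prefix(self, prefix: str) -> List[str]:
        """Return all keys starting with prefix, in sorted order."""
        keys = sorted(self._objects)
        # In sorted order, keys sharing a prefix form one contiguous run.
        start = bisect_left(keys, prefix)
        result = []
        for key in keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


if __name__ == "__main__":
    bucket = FlatBucket()
    bucket.put("users/alice/a.txt", b"1")
    bucket.put("users/bob/c.txt", b"3")
    print(bucket.list_prefix("users/alice/"))
```

What looks like a directory listing is only a search over key names, so no tree structure has to be kept consistent.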

S3 Comparison to POSIX

  • Performance:
    The POSIX filesystem interface is inherently IOPS-centric. It involves a lot of communication, which is expensive and hard to scale. The RESTful S3 API addresses this by turning the IOPS problem into a throughput problem. Throughput is easier and cheaper to scale. This is why object storage performs well at massive scale.

  • Semantics:
    Consistency cannot be guaranteed in the POSIX sense, because object operations are atomic and immutable. As a consequence, uncommitted data can be lost in case of a crash, and corruption can occur when buckets are shared.

  • Data Integrity:
    Writes or any other mutations to a file do not appear in the namespace until they are committed. Concurrent accessors of a shared bucket will not see uncommitted modifications. Shared access is therefore not the focus of S3.

  • Access Control:
    The way S3 APIs handle identity and access management (IAM) policies is incompatible with POSIX.
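
The commit behaviour described under Semantics and Data Integrity can be pictured with a toy model. The sketch below is an in-memory simulation under assumed names (ToyBucket, start_upload, upload_part, complete_upload, abort_upload), not the S3 API: staged parts stay invisible until the upload is completed, and discarding an upload loses its uncommitted data.

```python
class ToyBucket:
    """Toy model of commit-on-complete object writes (not a real S3 client)."""

    def __init__(self):
        self.objects = {}   # committed objects, visible to readers
        self._pending = {}  # upload_id -> list of staged, invisible parts

    def start_upload(self, upload_id):
        self._pending[upload_id] = []

    def upload_part(self, upload_id, data):
        # Staged data is not part of the namespace yet.
        self._pending[upload_id].append(data)

    def complete_upload(self, upload_id, key):
        # The whole object appears in one step; readers never see a partial object.
        self.objects[key] = b"".join(self._pending.pop(upload_id))

    def abort_upload(self, upload_id):
        # Models losing uncommitted data, for example after a client crash.
        self._pending.pop(upload_id, None)


if __name__ == "__main__":
    bucket = ToyBucket()
    bucket.start_upload("u1")
    bucket.upload_part("u1", b"he")
    print("report.txt" in bucket.objects)  # staged part is not visible
    bucket.upload_part("u1", b"llo")
    bucket.complete_upload("u1", "report.txt")
    print(bucket.objects["report.txt"])
```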

Configuration

Two configuration steps are required.

  • The Storage-Users service needs to be configured for the use of POSIX and S3 storage for Infinite Scale. This includes the following environment variables:

    # activate s3ng storage driver
    STORAGE_USERS_DRIVER: s3ng
    # Path to metadata stored on POSIX
    STORAGE_USERS_S3NG_ROOT:
    # keep system data on ocis storage
    STORAGE_SYSTEM_DRIVER: ocis
    
    # s3ng specific settings
    STORAGE_USERS_S3NG_ENDPOINT:
    STORAGE_USERS_S3NG_REGION:
    STORAGE_USERS_S3NG_ACCESS_KEY:
    STORAGE_USERS_S3NG_SECRET_KEY:
    STORAGE_USERS_S3NG_BUCKET:

  • An S3 bucket policy needs to be created.

Also see the Docker Compose Examples, in particular ocis_s3, and the Deploy the Chart section in the Container Orchestration documentation.
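
Before starting the service, it can help to check that the variables listed above are actually set. The following is a minimal sketch of such a check. The helper name check_s3ng_env is hypothetical, and treating every listed variable as mandatory is an assumption, since some of them may have defaults in a given release.

```python
import os
from typing import List, Mapping

# Variables taken from the configuration listing above.
REQUIRED_VARS = [
    "STORAGE_USERS_S3NG_ROOT",
    "STORAGE_USERS_S3NG_ENDPOINT",
    "STORAGE_USERS_S3NG_REGION",
    "STORAGE_USERS_S3NG_ACCESS_KEY",
    "STORAGE_USERS_S3NG_SECRET_KEY",
    "STORAGE_USERS_S3NG_BUCKET",
]


def check_s3ng_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means the check passed."""
    problems = []
    if env.get("STORAGE_USERS_DRIVER") != "s3ng":
        problems.append("STORAGE_USERS_DRIVER must be set to 's3ng'")
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is empty or not set")
    return problems


if __name__ == "__main__":
    for problem in check_s3ng_env(os.environ):
        print(problem)
```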

S3 Bucket Policy

Bucket policies are an Identity and Access Management (IAM) mechanism for controlling access to resources. They are a critical element in securing your S3 buckets against unauthorized access and attacks.

A bucket policy is a collection of statements defined on S3 that describe what a requester is allowed to do on a bucket and with each object contained in that bucket. The statements are evaluated one after another in order of appearance. A policy applies to a requester who has been authenticated to access the bucket in question. For more details, see AWS: Using bucket policies.

The following S3 bucket policy is required for Infinite Scale when connecting to an S3 bucket. Replace the bucket name (bucket-name) according to your environment.

# The S3NG driver needs an existing S3 bucket with the following permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name"
            ]
        },
        {
            "Sid": "ActionsInBucketContext",
            "Effect": "Allow",
            "Action": [
                "s3:*Object",
                "s3:*MultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
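
To avoid editing the bucket name by hand in two places, the policy above can be generated. The sketch below reproduces the same statements for a given bucket name. The function name build_bucket_policy and the example bucket name are assumptions for illustration. It only prints the JSON document; applying it to a bucket is left to the tooling of your S3 provider.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Return the policy shown above with the given bucket name filled in."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Listing applies to the bucket itself.
                "Resource": [bucket_arn],
            },
            {
                "Sid": "ActionsInBucketContext",
                "Effect": "Allow",
                "Action": [
                    "s3:*Object",
                    "s3:*MultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                # Object actions apply to everything inside the bucket.
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-ocis-bucket" is a placeholder name.
    print(json.dumps(build_bucket_policy("my-ocis-bucket"), indent=4))
```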