General Storage Considerations and Settings

Introduction

Infinite Scale can connect to several types of storage. This document gives general information, considerations and recommendations with regard to the supported storage types.

Note that depending on the storage used, special settings can be defined to optimize resources for memory and performance.

For a quick, rough overview, the following table maps storage types to an expected sizing. Note that the way Infinite Scale is set up, such as binary or orchestration with its variants, usually goes hand in hand with the expected sizing:


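Size             | Case                            | Storage Type
-----------------|---------------------------------|----------------------------------------------
Small or testing | Home                            | Local Storage
Medium           | Small and midmarket enterprises | NFS
Large            | Large enterprises and hosters   | S3 for blobs, NFS for metadata and other data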

  • This page intends to give an overview and does not go into configuration details.

  • The levels of space requirement described below indicate what to expect, but are never exact figures.

Write Consistency

Independent of the storage used and how many instances of the storage-users service are running, Infinite Scale has mechanisms implemented to guarantee consistent writing to the backend for distributed setups. In any case, a file ID based on UUIDv4 is created to guarantee the uniqueness of file names. Proper file locking is implemented for writing metadata.
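To illustrate why a UUIDv4-based ID avoids name collisions without any coordination between instances, here is a minimal sketch of how a random (version 4) UUID can be generated. It is not Infinite Scale's actual implementation; the function name newUUIDv4 is hypothetical and only the Go standard library is used.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 is a hypothetical helper that returns a random (version 4) UUID.
// It only illustrates the ID format; it is not taken from Infinite Scale.
func newUUIDv4() (string, error) {
	var b [16]byte
	// 122 of the 128 bits come from a cryptographically secure random source.
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version field to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because each ID is drawn independently at random, two storage-users instances can create IDs at the same time without consulting each other.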

Important Data Distribution Notes

Read the following note carefully, especially if all your data is stored on POSIX only:

Consider using a separate partition or an external filesystem like NFS for the base data path. If you only have one partition for your OS, Infinite Scale and your data, filling up the filesystem with user data can make your system unresponsive. This can easily happen under the following conditions:

  • The total storage space consumed by all spaces, even if there are individual quotas set for spaces, exceeds the available disk space.

  • Multiple users upload big files concurrently. Partly uploaded files do not count against the target space quota. They are held temporarily in the upload folder located in the base data path and are moved to the target space when finished, provided the target space quota is not exceeded.

  • Expired uploads that have not been cleaned up (see Manage Unfinished Uploads) consume storage unnecessarily and can be a hidden cause of exceeding the available storage space.

  • Index and thumbnail data stored for the search service and the thumbnails service is located on the same base data path and fills up the filesystem.
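The conditions above add up: quotas limit individual spaces, but not what ends up on the shared filesystem. The following sketch sums the contributors for a worst-case estimate. The type Usage, its fields and all numbers are hypothetical and serve only to illustrate the arithmetic.

```go
package main

import "fmt"

// Usage holds hypothetical byte counts for everything that shares
// the filesystem of the base data path.
type Usage struct {
	SpaceQuotas     []uint64 // quota of each space
	InFlightUploads uint64   // partly uploaded files in the upload folder
	ExpiredUploads  uint64   // expired uploads not yet cleaned up
	IndexThumbnails uint64   // search index and thumbnail data
}

// WorstCase returns the bytes needed if every space is filled up to its quota.
func (u Usage) WorstCase() uint64 {
	var total uint64
	for _, q := range u.SpaceQuotas {
		total += q
	}
	return total + u.InFlightUploads + u.ExpiredUploads + u.IndexThumbnails
}

// Overcommitted reports whether the worst case exceeds the filesystem capacity.
func (u Usage) Overcommitted(capacity uint64) bool {
	return u.WorstCase() > capacity
}

func main() {
	const gib = 1 << 30
	u := Usage{
		SpaceQuotas:     []uint64{40 * gib, 40 * gib, 40 * gib},
		InFlightUploads: 10 * gib,
		ExpiredUploads:  5 * gib,
		IndexThumbnails: 8 * gib,
	}
	capacity := uint64(128 * gib)
	fmt.Printf("worst case: %d GiB of %d GiB, overcommitted: %v\n",
		u.WorstCase()/gib, capacity/gib, u.Overcommitted(capacity))
}
```

In this example no single space exceeds its quota, yet the filesystem can still run full.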

We highly recommend the following for production instances. For details, including which environment variables are needed for these changes, see the Default Paths documentation:

  • To avoid filling up the filesystem the OS runs on, which would make the system unresponsive, separate the OS from the location where Infinite Scale data is stored. Options are:

    • Use independent partitions for data and the OS.

    • Use NFS for the data.

    • Use S3 for blobs.

  • If you expect a data volume that can lead to a high demand for search and/or thumbnail space - which would be stored by default in a subfolder using the OCIS_BASE_DATA_PATH as base path:

Local Storage

Using local storage is a good choice if you want to test or plan to have a small production instance. The drawbacks are reduced scalability and failover capability, because everything runs as one monolithic block. Note that you can switch selected mount points to NFS at any time, as described in the next section.

To decide which filesystems can be used for local storage, see Supported POSIX-compliant Filesystems.

The following table describes where data should be stored when using Infinite Scale in production:


Environment Variable | Target          | Space requirement
---------------------|-----------------|------------------
OCIS_CONFIG_DIR      | Default (local) | Very low
OCIS_BASE_DATA_PATH  | Partition       | Medium
NFS

Using NFS is beneficial because the instance components are separated from the storage and can scale independently. Such a setup is suited to a production environment with many users. The Infinite Scale config is also stored on NFS, using the OCIS_CONFIG_DIR environment variable, as this is a requirement for orchestrated environments. For details on NFS, see the Network File System documentation.


Environment Variable    | Target                                   | Space requirement
------------------------|------------------------------------------|----------------------------------------------------
OCIS_CONFIG_DIR         | NFS_1                                    | Very low
OCIS_BASE_DATA_PATH     | NFS_2                                    | High, to be monitored. Alternatively medium if NFS_4
Search and Thumbnails   | NFS_3                                    | Medium, to be monitored
STORAGE_USERS_OCIS_ROOT | OCIS_BASE_DATA_PATH. Alternatively NFS_4 | High, to be monitored

S3

Using S3 to store blobs is typical for large enterprises and hosters, though it can suit medium enterprises too. For details on S3, including configuration notes, see the S3 documentation. Data distribution and separation differ somewhat from a pure POSIX backend:

  • POSIX storage, usually NFS.

    • Metadata

    • Data for search and/or thumbnails

    • Other data

  • S3 for blobs

With S3, data will be distributed over different storages and mounts based on their use case. With such a setup, the system can scale according to the needs of large enterprises.


Environment Variable    | Target                                   | Space requirement
------------------------|------------------------------------------|---------------------------------------------------
OCIS_CONFIG_DIR         | NFS_1                                    | Very low
OCIS_BASE_DATA_PATH     | NFS_2                                    | Medium, to be monitored. Alternatively low if NFS_4
Search and Thumbnails   | NFS_3                                    | Medium, to be monitored
STORAGE_USERS_S3NG_ROOT | OCIS_BASE_DATA_PATH. Alternatively NFS_4 | Medium, to be monitored
S3 specific settings    | S3                                       | High

Server Resource Optimisation

Depending on the connected storage and the server's capabilities, Infinite Scale can be tuned to make better use of the server's resources. The relevant environment variable to configure this is:

STORAGE_USERS_OCIS_MAX_CONCURRENCY

The value to choose, as a rule of thumb only, depends on how many CPUs and how much memory are available on the server running the storage-users service instance, which kind of storage (POSIX or S3) is used for blobs, and what you want to achieve.

Background

In a nutshell, the value for STORAGE_USERS_OCIS_MAX_CONCURRENCY defines how many workers are assigned to storage-related tasks. Each worker not only does its job but also consumes CPU and memory, so the two need to be balanced. On the other hand, workers serving S3 spend more time waiting for responses than workers on POSIX connections. Because waiting workers consume fewer resources, the value can be set to overcommit CPU resources.

Value Considerations

As a rule of thumb and if using POSIX storage only:

  • Performance without worrying about memory
    runtime.NumCPU() * 2

  • Performance
    runtime.NumCPU()

  • Limited memory available
    A value of 4 or lower, assuming 4 is still lower than the number of CPUs available

If S3 is used for storing blobs, the resulting value can be increased.

It is essential to monitor your instance with respect to CPU, memory, network latency and the load pattern created by users. Only monitoring can tell you how to finally adjust the value.
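The rule of thumb for POSIX-only storage can be sketched as a small helper that derives a starting value from the CPU count. The names Profile and suggestMaxConcurrency are hypothetical, and treating the limited memory case as "at most 4, but never more than the CPU count" is one interpretation of the rule above. The increase possible with S3 is not modelled because no figure is given for it.

```go
package main

import (
	"fmt"
	"runtime"
)

// Profile is a hypothetical name for the three goals listed above.
type Profile int

const (
	PerformanceIgnoringMemory Profile = iota // performance without worrying about memory
	Performance                              // performance
	LimitedMemory                            // limited memory available
)

// suggestMaxConcurrency returns a starting value for
// STORAGE_USERS_OCIS_MAX_CONCURRENCY when only POSIX storage is used.
func suggestMaxConcurrency(numCPU int, p Profile) int {
	switch p {
	case PerformanceIgnoringMemory:
		return numCPU * 2
	case LimitedMemory:
		// Assumption: cap at 4, but do not exceed the number of CPUs.
		if numCPU > 4 {
			return 4
		}
		return numCPU
	default:
		return numCPU
	}
}

func main() {
	n := runtime.NumCPU()
	fmt.Printf("STORAGE_USERS_OCIS_MAX_CONCURRENCY=%d\n",
		suggestMaxConcurrency(n, Performance))
}
```

Treat the result as a first guess only and adjust it based on what monitoring shows.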