General Storage Considerations and Settings

Table of Contents

Introduction
Write Consistency
Important Data Distribution Notes
Local Storage
NFS
S3
PosixFS
Server Resource Optimisation
- Background
- Value Considerations

Introduction

Infinite Scale can connect to several types of storages. This document gives some general information, considerations and recommendations with regards to supported storages.

Note that depending on the storage used, special settings can be defined to optimize resources for memory and performance.

For a quick and rough overview, the following storages are mapped against an expected sizing. Note the way that Infinite Scale is setup like as binary or via an orchestration with its variants usually comes along with the expected sizing or usecase:

Size	Case	Storage Type
Small or testing	Home	Local Storage
Medium	Small and midmarket enterprises	NFS
Large	Large enterprises and hosters	S3 for blobs NFS for metadata and other data
—	Special usecase Shared access to the underlying filesystem by Infinite Scale and ordinary OS users or services	PosixFS (experimental)

Size

Case

Storage Type

Small or testing

Home

Local Storage

Medium

Small and midmarket enterprises

NFS

Large

Large enterprises and hosters

S3 for blobs
NFS for metadata and other data

—

Special usecase
Shared access to the underlying filesystem by Infinite Scale and ordinary OS users or services

PosixFS (experimental)

This page intends to give an overview and does not go into configuration details.
The levels of space requirement described below give an indication of what to expect, but never a number.

See the note for distinguishing posix vs PosixFS in the PosixFS documentation.

Write Consistency

Independent of the storage used and how many instances of the storage-users service are running, Infinite Scale has mechanisms implemented to guarantee consistent writing to the backend for distributed setups. For all storage types except PosixFS, a file ID based on UUIDv4 is created to guarantee the uniqueness of file names. Proper file locking is implemented for writing metadata.

Important Data Distribution Notes

Read the following note carefully, especially if all your data is stored on POSIX only:

Consider using a separate partition or an external filesystem like NFS for the base data path. If you only have one partition for your OS, Infinite Scale and your data, filling up the filesystem with user data can make your system unresponsive. This can easily happen under the following conditions:

The total storage space consumed by all spaces, even if there are individual quotas set for spaces, exceeds the available disk space.
When multiple users have concurrent uploads of big files, those big files - partly uploaded - will not count against the target space quota. These files are temporarily located in the upload folder located in the base data path and moved when finished to the target space if the target space quota is not exceeded.
Expired uploads that have not been cleaned up (see Manage Unfinished Uploads) can demand storage unnecessarily and can be a hidden cause of exceeding the available storage space.
Index and thumbnail data stored for the search service and the thumbnails service is located on the same base data path and filled up the filesystem.

We highly recommend the follwing topics for production instances. For details and which environment variables are necessary for the changes see the Default Paths documentation:

To avoid filling up the filesystem the OS is running on making the system unresponsive, separate the OS from where Infinite Scale data will be stored by:
- Use independent partitions for data and the OS.
- Use NFS for the data.
- Use S3 for blobs.
If you expect a data volume that can lead to a high demand for search and/or thumbnail space - which would be stored by default in a subfolder using the OCIS_BASE_DATA_PATH as base path:
- The data path for services mentioned in Services That Can Deviate from the Base Data Path can be individually configured. Using additional partitions/mount points are a possible way to achieve this.

Local Storage

Using local storage is a good choice if you want to test or plan to have a small production instance. Drawbacks are less scalability and failover, because of one monolithic block. Note that at any time you can switch selected mount points to NFS which is described in the next section.

To decide which filesystems can be used for local storage, see Supported POSIX-compliant Filesystems.

The following table describes where data should be stored when using Infinite Scale in production:

Environment Variable Target Space requirement

Environment Variable	Target	Space requirement
`OCIS_CONFIG_DIR`	Default (local)	Very low
`OCIS_BASE_DATA_PATH`	Partition	Medium

OCIS_CONFIG_DIR

Default (local)

Very low

OCIS_BASE_DATA_PATH

Partition

Medium

NFS

Using NFS is beneficial because the instance components are separated from the storage and can scale independently. Such a setup is used when you plan a production environment with many users. In addition, the Infinite Scale config is also stored on NFS using the OCIS_CONFIG_DIR environment variable as this is a requirement for orchestrated environments. For details on NFS see the Network File System documentation.

Environment Variable Target Space requirement

Environment Variable	Target	Space requirement
`OCIS_CONFIG_DIR`	NFS_1	Very low
`OCIS_BASE_DATA_PATH`	NFS_2	High, to be monitored Alternatively medium if NFS_4
Search and Thumbnails	NFS_3	Medium, to be monitored
`STORAGE_USERS_OCIS_ROOT`	`OCIS_BASE_DATA_PATH` Alternatively NFS_4	High, to be monitored

OCIS_CONFIG_DIR

NFS_1

Very low

OCIS_BASE_DATA_PATH

NFS_2

High, to be monitored
Alternatively medium if NFS_4

Search and Thumbnails

NFS_3

Medium, to be monitored

STORAGE_USERS_OCIS_ROOT

OCIS_BASE_DATA_PATH
Alternatively NFS_4

High, to be monitored

S3

S3 to store blobs is typically used by large enterprises and hosters, though it can fit for medium enterprises too. Data distribution and separation is a bit different compared to a pure POSIX backend. For details on S3 including configuration notes see the S3 documentation:

POSIX storage, usually NFS.
- Metadata
- Data for search and/or thumbnails
- Other data
S3 for blobs

With S3, data will be distributed over different storages and mounts based on their use case. With such a setup, the system can scale according to the needs of large enterprises.

Environment Variable Target Space requirement

Environment Variable	Target	Space requirement
`OCIS_CONFIG_DIR`	NFS_1	Very low
`OCIS_BASE_DATA_PATH`	NFS_2	Medium, to be monitored Alternatively low if NFS_4
Search and Thumbnails	NFS_3	Medium, to be monitored
`STORAGE_USERS_S3NG_ROOT`	`OCIS_BASE_DATA_PATH` Alternatively NFS_4	Medium, to be monitored
S3 specific settings	S3	High

OCIS_CONFIG_DIR

NFS_1

Very low

OCIS_BASE_DATA_PATH

NFS_2

Medium, to be monitored
Alternatively low if NFS_4

Search and Thumbnails

NFS_3

Medium, to be monitored

STORAGE_USERS_S3NG_ROOT

OCIS_BASE_DATA_PATH
Alternatively NFS_4

Medium, to be monitored

S3 specific settings

High

PosixFS

PosixFS is currently experimental only and should not be used in production environments.

For details on PosixFS including setup and configuration notes see the PosixFS documentation.

Server Resource Optimisation

Depending on the storage connected and the servers capabilities, Infinite Scale can be optimized using the servers resources. The relevant environment variable to configure this is:

STORAGE_USERS_OCIS_MAX_CONCURRENCY
(or OCIS_MAX_CONCURRENCY when defined on a global basis)

The value to consider and only as a rule of thumb is based on how much CPU’s and memory the server has the instance of the storage-users service is running on, which kind of storage, POSIX or S3 is used for blobs and what you want to achieve.

Background

In a nutshell, the value for STORAGE_USERS_OCIS_MAX_CONCURRENCY defines how many workers are assigned to storage related tasks. Any worker not only serves its job, but also consumes CPU and memory resources which needs to balance out. On the other hand side, when it comes to the connected storage, workers serving S3 will be more in response waiting time compared to POSIX connections. As workers which are in waiting state do consume less resources, the value can be considered to allow overcommitting CPU resources.

Value Considerations

As a rule of thumb and if using POSIX storage only:

Performance without worrying about memory
runtime.NumCPU() * 2
Performance
runtime.NumCPU()
Limited memory available
A value of 4 or lower, assuming 4 is still lower than the number of CPU available

If S3 is used storing blobs, the resulting value can be increased.

It is essential to monitor your instance with respect to CPU, memory, network latency and the load pattern created by users. Only this can give you a final view on adapting the value.