General Storage Considerations and Settings
Introduction
Infinite Scale can connect to several types of storages. This document gives some general information, considerations and recommendations with regards to supported storages.
Note that depending on the storage used, special settings can be defined to optimize resources for memory and performance.
For a quick and rough overview, the following storages are mapped against an expected sizing. Note the way that Infinite Scale is setup like as binary or via an orchestration with its variants usually comes along with the expected sizing or usecase:
Size | Case | Storage Type |
---|---|---|
Small or testing |
Home |
Local Storage |
Medium |
Small and midmarket enterprises |
NFS |
Large |
Large enterprises and hosters |
S3 for blobs |
— |
Special usecase |
PosixFS (experimental) |
|
See the note for distinguishing posix vs PosixFS in the PosixFS documentation. |
Write Consistency
Independent of the storage used and how many instances of the storage-users service are running, Infinite Scale has mechanisms implemented to guarantee consistent writing to the backend for distributed setups. For all storage types except PosixFS, a file ID based on UUIDv4 is created to guarantee the uniqueness of file names. Proper file locking is implemented for writing metadata.
Important Data Distribution Notes
Read the following note carefully, especially if all your data is stored on POSIX only:
Consider using a separate partition or an external filesystem like NFS for the base data path. If you only have one partition for your OS, Infinite Scale and your data, filling up the filesystem with user data can make your system unresponsive. This can easily happen under the following conditions:
|
We highly recommend the follwing topics for production instances. For details and which environment variables are necessary for the changes see the Default Paths documentation:
-
To avoid filling up the filesystem the OS is running on making the system unresponsive, separate the OS from where Infinite Scale data will be stored by:
-
Use independent partitions for data and the OS.
-
Use NFS for the data.
-
Use S3 for blobs.
-
-
If you expect a data volume that can lead to a high demand for search and/or thumbnail space - which would be stored by default in a subfolder using the
OCIS_BASE_DATA_PATH
as base path:-
The data path for services mentioned in Services That Can Deviate from the Base Data Path can be individually configured. Using additional partitions/mount points are a possible way to achieve this.
-
Local Storage
Using local storage is a good choice if you want to test or plan to have a small production instance. Drawbacks are less scalability and failover, because of one monolithic block. Note that at any time you can switch selected mount points to NFS which is described in the next section.
To decide which filesystems can be used for local storage, see Supported POSIX-compliant Filesystems.
The following table describes where data should be stored when using Infinite Scale in production:
Environment Variable | Target | Space requirement |
---|---|---|
|
Default (local) |
Very low |
|
Partition |
Medium |
NFS
Using NFS is beneficial because the instance components are separated from the storage and can scale independently. Such a setup is used when you plan a production environment with many users. In addition, the Infinite Scale config is also stored on NFS using the OCIS_CONFIG_DIR environment variable as this is a requirement for orchestrated environments. For details on NFS see the Network File System documentation.
Environment Variable | Target | Space requirement |
---|---|---|
|
NFS_1 |
Very low |
|
NFS_2 |
High, to be monitored |
Search and Thumbnails |
NFS_3 |
Medium, to be monitored |
|
|
High, to be monitored |
S3
S3 to store blobs is typically used by large enterprises and hosters, though it can fit for medium enterprises too. Data distribution and separation is a bit different compared to a pure POSIX backend. For details on S3 including configuration notes see the S3 documentation:
-
POSIX storage, usually NFS.
-
Metadata
-
Data for search and/or thumbnails
-
Other data
-
-
S3 for blobs
With S3, data will be distributed over different storages and mounts based on their use case. With such a setup, the system can scale according to the needs of large enterprises.
Environment Variable | Target | Space requirement |
---|---|---|
|
NFS_1 |
Very low |
|
NFS_2 |
Medium, to be monitored |
Search and Thumbnails |
NFS_3 |
Medium, to be monitored |
|
|
Medium, to be monitored |
S3 specific settings |
S3 |
High |
PosixFS
PosixFS is currently experimental only and should not be used in production environments.
For details on PosixFS including setup and configuration notes see the PosixFS documentation.
Server Resource Optimisation
Depending on the storage connected and the servers capabilities, Infinite Scale can be optimized using the servers resources. The relevant environment variable to configure this is:
STORAGE_USERS_OCIS_MAX_CONCURRENCY
(or OCIS_MAX_CONCURRENCY
when defined on a global basis)
The value to consider and only as a rule of thumb is based on how much CPU’s and memory the server has the instance of the storage-users service is running on, which kind of storage, POSIX or S3 is used for blobs and what you want to achieve.
Background
In a nutshell, the value for STORAGE_USERS_OCIS_MAX_CONCURRENCY
defines how many workers are assigned to storage related tasks. Any worker not only serves its job, but also consumes CPU and memory resources which needs to balance out. On the other hand side, when it comes to the connected storage, workers serving S3 will be more in response waiting time compared to POSIX connections. As workers which are in waiting state do consume less resources, the value can be considered to allow overcommitting CPU resources.
Value Considerations
As a rule of thumb and if using POSIX storage only:
-
Performance without worrying about memory
runtime.NumCPU() * 2
-
Performance
runtime.NumCPU()
-
Limited memory available
A value of 4 or lower, assuming 4 is still lower than the number of CPU available
If S3 is used storing blobs, the resulting value can be increased.
It is essential to monitor your instance with respect to CPU, memory, network latency and the load pattern created by users. Only this can give you a final view on adapting the value. |