Prerequisites

Introduction

The prerequisites section gives an overview and background about the minimum requirements with respect to hardware, virtualization, operating systems and optional software needed to operate Infinite Scale successfully.

Note that Infinite Scale is built on a highly dynamic, interoperable microservices concept, even though it is delivered as a single binary. The software naturally has to run on hardware, but its ability to adapt to varying and possibly dynamic load scenarios across physical boundaries means you can set up your physical environment flexibly, using modern technology.

The following sections are fundamental to understanding the general prerequisites, but their implementation depends heavily on your use case. To support your decision about the environment, we recommend reading the Architecture and Concepts and Availability and Scalability documents first.

Hardware

The minimum hardware requirements depend on the usage scenario you have in mind for your Infinite Scale instance. You need at least hardware that runs a Linux-based operating system, preferably 64-bit.

  • For simple testing purposes, you can even use a laptop or Raspberry Pi 4 (ARM64) with a minimum of 512MB of RAM.

  • For more intense testing purposes, the recommended starting hardware is a multi-core processor with at least 4GB of RAM.

  • For production environments, CPU and RAM are the main limiting factors. A clear guideline for CPUs can’t be provided at this time since there are too many influencing factors. See RAM Considerations for memory and Bandwidth Considerations for network-related factors.

The following table shows the tested and supported hardware architecture matrix:

OS / Docker                  Hardware Architecture
Linux on Bare Metal          386, AMD64, ARM, ARM64
Linux Hardware Virtualized   386, AMD64, ARM, ARM64
Linux Docker                 AMD64, ARM, ARM64
Darwin (macOS)               AMD64, ARM64, M1 (ARM64)

Note that when referencing software builds for a particular architecture, ARM stands for ARM32 (v6/v7) and ARM64 stands for ARM64 (v8).

RAM Considerations

With regard to RAM, the question of how much is needed can only be answered with: it depends.

RAM requirements depend heavily on the number of spaces, the tasks that run concurrently and whether they access the storage back end. You can split this into the following main parts:

OS-dependent
  • Needs for the OS

  • Classic inode/block caching for storage back ends

Infinite Scale-dependent
  • Keeping the stat info in memory (stat cache)

  • Keeping static assets in memory, such as those for the web frontend

  • The number of concurrent clients accessing Infinite Scale

  • Computation processes like calculating thumbnails from files or images

While memory for computational processes depends heavily on the use case, which can be very dynamic, the memory needed for keeping block and stat info is easier to predict. Therefore, it is essential to decide in which environment you will deploy Infinite Scale before you get started. Consider re-reading the Availability and Scalability section, with a focus on Scalability and Deployment Evolution.

Keeping Things in Memory

Infinite Scale uses the concept of spaces. A space contains files and folders, is not owned by a user by default, but gets assigned to users. Spaces can be a user's home, any mount created, or an unlimited quantity of shares and re-shares.

Each space provides a virtual root and occupies at least one metadata block at its root. A metadata block contains at least one inode and, depending on the inode size, can contain multiple inodes.

Whenever the content or metadata of a file or folder below the root of the tree changes, an inode changes and the change is propagated up to the space's root to indicate it. A new stat info is created and cached, and ETags are recalculated on request based on the stat info. To let a client detect changes, the discovery phase starts at the space's root, checking for ETags that have changed since the last discovery, and walks down the tree following only paths with changed ETags.

Infinite Scale also has an internal process that identifies when changes to files or folders happen and creates a new stat info, which is the basis for calculating ETags.

The number of spaces one Infinite Scale instance can handle without re-accessing the storage back end can be huge, limited only by the server's memory. If you choose a distributed deployment, there is no limit on the number of spaces at all.

Here are some numbers based on the ext4 filesystem (see Filesystems and Shared Storage for different values); you can verify the inode and block size on your own system with the commands shown after this list:

  • An inode has a size of 256 bytes.

  • The blocksize that is cached by the OS is 4K.

  • A block can therefore contain a maximum of 16 inodes.

  • A space's root needs at least one inode, which results in occupying at least one 4K block.

  • The stat info managed and cached by Infinite Scale per space is 1K.

  • The size of an ETag is 40 bytes.
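
If you want to verify these values for your own ext4 filesystem, the following commands are a common way to do so. The device name is a placeholder and tune2fs requires root privileges.

# inode size and block size of an ext4 filesystem (device name is a placeholder)
sudo tune2fs -l /dev/sda1 | grep -E 'Inode size|Block size'

# block size of the filesystem backing the current directory
stat -f . | grep 'Block size'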

The benefit of keeping the inode/stat/block information in memory is easily explained: all clients accessing Infinite Scale poll for ETag changes. If this base information is kept in memory, response times are minimized.

Calculation example:

Taking the numbers from above:

# of Spaces (# of Root Blocks)   Block Cache (4K)   Stat Cache (1K)   Total RAM
1,000                            4 MB               1 MB              5 MB
100,000                          400 MB             100 MB            500 MB
1,000,000                        4 GB               1 GB              5 GB
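
The table values can be reproduced with a quick back-of-the-envelope calculation. The sketch below uses the same rough rounding (1 MB taken as 1000 KB) as the table and takes the number of spaces as a parameter.

# rough cache memory estimate: one 4K block plus 1K stat info per space
SPACES=100000
awk -v n="$SPACES" 'BEGIN { kb = n * (4 + 1); printf "%d KB (~%d MB)\n", kb, kb / 1000 }'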

Summary (caching view only)

The above table can be interpreted as follows:

  • The memory needed for keeping the root inode is based on the number of spaces created (Infinite Scale relevant) and the blocksize (filesystem dependent).

  • The memory needed for caching stat info is directly related to the number of spaces (Infinite Scale relevant).

Consideration Summary
  • Using 4GB of RAM is a good starting point.

  • Regularly check the number of spaces a server must handle.

  • Getting a VFS cache hit/miss ratio is hard. Keep an eye on kernel I/O statistics (iostat), which measure raw I/O. When raw I/O starts increasing and all RAM is already used as buffer cache, you may need to increase the amount of available RAM or redistribute services; see the example commands after this list.

  • When using a distributed deployment, it is much easier to scale and re-distribute dynamic loads accordingly.
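
The following commands are one common way to keep an eye on buffer cache usage and raw I/O on a Linux server. iostat is part of the sysstat package, and the 5-second interval is just an example value.

# memory usage including the buff/cache column
free -h

# extended per-device I/O statistics, refreshed every 5 seconds;
# rising r/s and w/s while all RAM is already used as buffer cache indicates cache pressure
iostat -x 5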

Background

Storing metadata in memory is important with respect to access and synchronization performance.

Backend Check

Infinite Scale has a notification process that fires when a change occurs and manages the stat info accordingly. An ETag is computed on request, based on the stat info.

Client Check

Usually, every connected client polls the root ETag of its assigned spaces every 30 seconds and compares it to the ETag received previously to detect changes. Based on detected changes, actions take place.
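
For illustration, such a poll is essentially a Depth: 0 PROPFIND asking for the ETag of the space root. The following sketch shows the mechanism only; the host name, credentials and space ID are placeholders, and the exact WebDAV path depends on your deployment.

# poll the ETag of a space root via WebDAV (URL, user and space ID are placeholders)
curl -u user:password -X PROPFIND -H "Depth: 0" \
  "https://ocis.example.com/remote.php/dav/spaces/<space-id>" \
  --data '<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:">
  <d:prop><d:getetag/></d:prop>
</d:propfind>'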

This makes it clear why RAM can be an essential performance factor for client access and synchronization when more spaces are present.

Bandwidth Considerations

The bandwidth requirements and limitations are based on the following background. Note that this is a view on the internal network (LAN) only; considerations about access from the Internet are not part of this description but can be derived from the LAN point of view:

Clients accessing Infinite Scale request information about what has changed. Depending on the response and whether a file synchronization is required, different bandwidth needs result. Note that when using, for example, the Desktop Client with virtual files (VFS), only files that are set to be locally present get physically synced, preventing additional bandwidth consumption.

Request for changed elements

To get the information about changes, the request always starts at the space's root, looking for changed ETags, and follows only paths that contain changed elements. PROPFIND requests and responses are used for this. A request is about 500 bytes and a response roughly 800 bytes in size.

Number of maximum concurrent PROPFIND responses per second:

Network               Max. PROPFIND responses/s
100 Mbit (~10MB/s)    12,500
1 Gbit (~100MB/s)     125,000
10 Gbit (~1GB/s)      1,250,000
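
The table values follow directly from dividing the usable throughput by the roughly 800 byte response size. A quick check for the 1 Gbit row:

# 1 Gbit link, assumed ~100 MB/s usable throughput, divided by ~800 bytes per PROPFIND response
awk 'BEGIN { printf "%d responses/s\n", 100 * 1000 * 1000 / 800 }'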

Request syncing changed files

When a file has been identified for physical syncing, the bandwidth requirements depend on its size and the time in which the sync should finish. Note that syncing changed files can saturate a network more easily than the handling of changed ETags!

Calculation example

Consider 500 concurrent syncing users, syncing with the default interval of 30 seconds: they create about ~3K PROPFIND requests per second (500 x 712 / 60 / 2), which consumes about 2.4MB/s of bandwidth (3K x 800B), not counting the file syncs that may be necessary. The physical transfer of files creates additional bandwidth requirements.
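
The resulting bandwidth figure can be double-checked the same way, using the numbers from the example above:

# ~3,000 PROPFIND responses per second at ~800 bytes each
awk 'BEGIN { printf "%.1f MB/s\n", 3000 * 800 / 1000 / 1000 }'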

Summary

As you can see above, the bandwidth requirements depend on:

  • The number of concurrent clients accessing Infinite Scale

  • The number of spaces to be synced

  • The dynamics of changes

  • The relative location of a change

  • The need to download changed files locally

The total quantity of files and folders only has an impact on the first synchronization, not on recurring ones.

Virtualization

Depending on the use case, you can run Infinite Scale on:

  • No virtualization, bare metal

  • Virtualized hardware like VMware, KVM, Hyper-V, VirtualBox, etc.

  • Virtualized Linux operating system in Docker containers

Supported Operating Systems

For best performance, stability, support, and full functionality we officially support Infinite Scale running on the following Linux distributions:

  • Debian 10 and 11

  • Fedora 32 and 33

  • Red Hat Enterprise Linux 7.5 and 8 including all 100% compatible derivatives

  • SUSE Linux Enterprise Server 12 with SP4/5 and SLES 15

  • openSUSE Leap 15.2 and 15.3

  • Ubuntu 20.04 and 22.04

Additional Software

It is strongly recommended to use a reverse proxy for:

  1. security reasons,

  2. load balancing and

  3. high availability.

The Infinite Scale documentation uses Traefik for its examples, but you can also use NGINX, Apache, or others. All three products provide either a binary or a Docker image to download.

Traefik is a modern HTTP reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with your existing infrastructure components (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, …​) and configures itself automatically and dynamically. Pointing Traefik at your orchestrator should be the only configuration step you need.
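
As a rough illustration of the Docker-based approach, the following sketch starts Traefik with its Docker provider enabled. The image tag, entrypoints and all other values are example assumptions, not the configuration used elsewhere in this documentation.

# minimal Traefik sketch: discover backends via Docker labels (all values are examples)
docker run -d --name traefik \
  -p 80:80 -p 443:443 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  traefik:v2.9 \
  --providers.docker=true \
  --entrypoints.web.address=:80 \
  --entrypoints.websecure.address=:443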

NGINX: NGINX is open source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and a reverse proxy and load balancer for HTTP, TCP, and UDP servers.

Apache: In addition to being a "basic" web server and providing static and dynamic content to end-users, Apache httpd (as well as most other web servers) can also act as a reverse proxy server, also known as a "gateway" server.

Filesystems and Shared Storage

Infinite Scale currently supports two different internal filesystem drivers: ocisfs and s3ng.

  • When the ocisfs driver is used, data and metadata must be on a POSIX-compliant filesystem. This driver decomposes the metadata and persists it in the POSIX filesystem; blobs are stored on the filesystem as well. The layout makes extensive use of symlinks and extended attributes. A filesystem like XFS or ZFS without practical inode size limitations is recommended. Further integration with filesystems like CephFS or GPFS is under investigation.

    Ext4 limits the number of bytes that can be used for extended attribute names and their values to the size of a single block (4K by default). This reduces the number of shares for a single file or folder to roughly 20-30, as grants have to share the available space with other metadata; see the command sketch after this list.
  • When the s3ng driver is used, data resides in an S3 bucket and the metadata is stored on a POSIX-compliant filesystem, which needs to be provisioned separately. This is necessary for performance reasons. When listing the extended attributes of an object, the result is currently limited to 64kB. Assuming a 20-byte UUID, a grant takes ~40 bytes, which limits the number of extended attributes to ~1630 entries or ~1600 shares. With further development, this limitation may be removed.
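
To get a feeling for how much extended attribute space a node on an ocisfs storage already consumes, you can dump its attributes. The path below is a placeholder, and getfattr is part of the attr package.

# dump all extended attribute names and values of a storage node (path is a placeholder)
getfattr -d -m ".*" /path/to/ocis/storage/node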

Other drivers, for example for the Ceph or EOS filesystems, can be used too, but no support can be given for them because they are not developed or maintained by ownCloud.

The POSIX-compliant filesystems currently supported by Infinite Scale are listed below. Note that the default block size impacts the calculation example in RAM Considerations; it is definable on some filesystems and, where given, is for informational purposes only:

Local Filesystems

Name    Default Block Size
EXT4    4K
XFS     4K
BTRFS   16K
ZFS     128K

Remote Filesystems

Name    Default Block Size
NFS     Depends on the rsize parameter in the mount options; defaults to 4K, usually set to 32K.

Note that support for a Windows-compatible filesystem like Samba will be available in a later release and will be announced separately.

All supported POSIX-compliant filesystems, local or remote, must support extended attributes. You can check this with the following commands (the expected output is shown as comments); change to a location in the mounted filesystem you want to check before running them:

touch foo.txt && attr -s mix -V bar foo.txt
# Attribute "mix" set to a 3 byte value for foo.txt: bar
attr -g mix foo.txt
# Attribute "mix" had a 3 byte value for foo.txt: bar
rm foo.txt

NFS Notes

When using NFS, you have to make sure that both the NFS server AND the NFS client support extended attributes.

NFS Storage Based on Linux Servers

When using kernel version 5.9 or higher, extended attribute support for the NFS server and the NFS client is part of the system. To check, run the following command:

uname -r

Run it on the system providing the NFS server AND on the NFS client, and check the displayed version number.

NFS Client

If you have an NFS server capable of extended attributes but you are unsure whether the client accessing the server supports them, check the nfs-utils or nfs-common package version of your NFS client with the command:

mount.nfs -V

You need at least nfs-utils version 2.6.1. For more details, see the general NFS Utils Release History and the Ubuntu nfs-common Packages.

NFS Servers Provided from Storage Vendors

A certification matrix will be provided when available.

NFS Protocol Version

Note that even if the kernel or the storage system supports extended attributes, you have to use NFSv4 in order to use them.
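
To verify which NFS protocol version a mount has actually negotiated, the following commands are common ways to check. nfsstat is part of the nfs-utils / nfs-common package.

# show the mount options of all NFS mounts, including the negotiated version (vers=)
nfsstat -m

# alternatively, filter the list of mounted filesystems
mount | grep nfs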

Ceph Notes
  • Ceph is an open source, flexible, distributed storage system (multi-server, multi-rack, multi-site) with an object storage layer. Ceph-S3 presents that object storage with an API that emulates the AWS S3 API.

  • Ceph follows a different concept with regard to handling metadata, which impacts memory requirements. See the Ceph Hardware Recommendations for more details.

  • Note that you cannot access the same files in CephFS and Ceph-S3. Ceph allows exposing commodity hardware as either block storage (RBD), S3 or CephFS. It is not possible to write a file via S3 and then read it via CephFS.

Compatible Clients

When used in their current versions, the mobile clients and the desktop client connected to Infinite Scale work just like with ownCloud Server. This means that users can access and synchronize their home folder and received shares, but spaces are not available. Supported versions are:

  • Desktop app version 2.11

  • iOS app version 11.10

  • Android app version 2.21

At the moment, only the new ownCloud Web interface offers spaces. Spaces will be available in the following upcoming client versions:

  • Desktop app 3.0

  • iOS app 12.0

Pre-release versions for Desktop and iOS are already available for testing.