Prerequisites

Introduction

The prerequisites section provides an overview of and background on the minimum requirements for hardware, virtualization, the operating system and optional software needed to operate Infinite Scale successfully.

Note that Infinite Scale is based on a highly dynamic, interoperable microservices concept, although it is delivered as a single binary. Of course, this software has to run on hardware, but its ability to adapt to various and possibly dynamic load scenarios across physical boundaries means that you can set up your physical environment flexibly, using modern technology.

The following sections are fundamental to understanding the general prerequisites, but their implementation is very dependent on your use-case. To support your decision about the environment, we recommend reading the documents about Architecture and Concepts and Availability and Scalability first.

Hardware

The minimum hardware requirements depend on the usage scenario you have in mind for your Infinite Scale instance. You need at least hardware that runs a Linux-based operating system, preferably 64-bit.

  • For simple testing purposes, you can even use a laptop or Raspberry Pi 4 (ARM64) with a minimum of 512MB of RAM.

  • For more intense testing purposes, the recommended start hardware is a multi-core processor with at least 4GB of RAM.

  • For production environments, CPU and RAM are the main limiting factors. A clear guideline for CPUs can’t be provided at this time since there are too many influencing factors. See RAM Considerations for memory and Bandwidth Considerations for network-related factors.

The following table shows the tested and supported hardware architecture matrix:

OS / Docker                   Hardware Architecture
Linux on Bare Metal           386, AMD64, ARM, ARM64
Linux Hardware Virtualized    386, AMD64, ARM, ARM64
Linux Docker                  AMD64, ARM, ARM64
Darwin (MacOS)                AMD64, ARM64, M1 (ARM64)

Note that when referencing software builds for a particular architecture, ARM stands for ARM32 (v6/v7) and ARM64 stands for ARM64 (v8).

RAM Considerations

With regard to RAM, the question of how much is needed can only be answered with: it depends.

RAM requirements depend largely on the number of spaces, the tasks that run concurrently, and whether they access the storage back end. You can split this into the following main parts:

OS-dependent
  • Needs for the OS

  • Classic inode/block caching for storage back ends

Infinite Scale-dependent
  • Keeping the stat info in memory (stat cache)

  • Keeping static assets in memory, such as those for the web frontend

  • The number of concurrent clients accessing Infinite Scale

  • Computation processes like calculating thumbnails from files or images

While the memory for computational processes depends heavily on the use case, which can be very dynamic, the memory needed for keeping block and stat info is easier to predict. Therefore, it is essential to decide in which environment you will deploy Infinite Scale before you get started. Consider re-reading the Availability and Scalability section, with a focus on Scalability and Deployment Evolution.

Keeping Things in Memory

Infinite Scale uses the concept of spaces. A space contains files and folders, is not owned by a user by default, but gets assigned to users. A space can be a user's home, any created mount, or an unlimited number of shares and re-shares.

Each space provides a virtual root and occupies at least one metadata block at its root. A metadata block contains at least one inode and, depending on the inode size, can contain multiple inodes.

Whenever a file or folder below the root of the tree changes its content or metadata, an inode changes and the change is propagated to the space's root to indicate it. New stat info is created and cached, and ETags are recalculated on request based on that stat info. To let a client detect changes, the discovery phase starts at the space's root, checking for ETags that have changed since the last discovery and descending the tree only along paths with changed ETags.

Infinite Scale also has an internal process that identifies when files or folders change and creates new stat info, which is the basis for calculating ETags.
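To illustrate this propagation, the following minimal sketch models a stat cache in which a change to any node also updates the ETag of every ancestor up to the space root. The class, the change counter and the hashing scheme are illustrative assumptions, not the actual Infinite Scale implementation.

```python
# Minimal, illustrative model of ETag propagation to a space root.
# The class, counter and hashing scheme are assumptions for illustration
# only; this is not the actual Infinite Scale implementation.
import hashlib
from itertools import count

_change_seq = count()  # monotonically increasing change counter


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent            # None for the space root
        self.version = next(_change_seq)
        self.etag = self._compute_etag()

    def _compute_etag(self):
        # An ETag only needs to change whenever the node changes.
        data = f"{self.name}:{self.version}".encode()
        return hashlib.sha1(data).hexdigest()  # 40 characters

    def touch(self):
        """Simulate a content or metadata change of this node."""
        self.version = next(_change_seq)
        self.etag = self._compute_etag()
        # Propagate the change up to the space root, so a client that
        # only polls the root ETag can detect that something changed.
        if self.parent is not None:
            self.parent.touch()


root = Node("space-root")
folder = Node("photos", parent=root)
picture = Node("photos/cat.jpg", parent=folder)

old_root_etag = root.etag
picture.touch()                         # change deep inside the tree
assert root.etag != old_root_etag       # change is visible at the space root
```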

The number of spaces one Infinite Scale instance can handle without re-accessing the storage back end can be huge, limited only by the server's memory. If you choose a distributed deployment, there is no limit on the number of spaces at all.

Here are some numbers based on the ext4 filesystem; see Filesystems and Shared Storage for different values:

  • An inode has a size of 256 bytes.

  • The block size that is cached by the OS is 4K.

  • A block can therefore contain a maximum of 16 inodes (4K / 256 bytes).

  • A space's root needs at least one inode, which results in occupying at least one 4K block.

  • The stat info managed and cached by Infinite Scale per space is 1K.

  • The size of an ETag is 40 bytes.

The benefit of keeping the inode/stat/block information in memory is easily explained: all clients accessing Infinite Scale poll for ETag changes. If this base information is kept in memory, the response time is minimized.

Calculation example, taking the numbers from above:

# of Spaces (# of Root Blocks)    Block Cache (4K)    Stat Cache (1K)    Total RAM
1,000                             4 MB                1 MB               5 MB
100,000                           400 MB              100 MB             500 MB
1,000,000                         4 GB                1 GB               5 GB
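The table values can be reproduced with a short calculation; the following sketch uses the ext4 numbers from above (one 4K block plus 1K stat info cached per space root) and decimal units as in the table.

```python
# Reproduces the RAM figures from the table above: per space root, one 4K
# filesystem block is cached by the OS plus 1K of stat info is cached by
# Infinite Scale (decimal units, 1 MB = 1000 KB, as in the table).
BLOCK_KB = 4   # ext4 default block size per space root
STAT_KB = 1    # stat info cached by Infinite Scale per space

for spaces in (1_000, 100_000, 1_000_000):
    block_mb = spaces * BLOCK_KB / 1000
    stat_mb = spaces * STAT_KB / 1000
    total_mb = block_mb + stat_mb
    print(f"{spaces:>9,} spaces: {block_mb:>6,.0f} MB block cache"
          f" + {stat_mb:>6,.0f} MB stat cache = {total_mb:>6,.0f} MB total")
```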

Summary (caching view only)

The above table can be interpreted as follows:

  • The memory needed for keeping the root inode is based on the number of spaces created (Infinite Scale relevant) and the block size (filesystem dependent).

  • The memory needed for caching stat info is directly related to the number of spaces (Infinite Scale relevant).

Consideration Summary
  • Using 4GB of RAM is a good starting point.

  • Regularly check the number of spaces a server must handle.

  • Getting a VFS cache hit/miss ratio is hard. Keep an eye on the kernel's iostat, which measures raw I/O. When it starts increasing and all RAM has been used as buffer cache, you may need to increase the amount of available RAM or redistribute services.

  • When using a distributed deployment, it is much easier to scale and re-distribute dynamic loads accordingly.

Background

Storing metadata in memory is important with respect to access and synchronization performance.

Backend Check

Infinite Scale has a notification process that is triggered when a change occurs and manages the stat info accordingly. An ETag is computed on request, based on the stat info.

Client Check

Usually, every connected client polls the root ETag of its assigned spaces every 30 seconds and compares it with the previously received ETag to detect changes. Based on the detected changes, actions take place.

This makes it clear why RAM can be an essential performance factor for client access and synchronization when more spaces are present.
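As an illustration of this client check, here is a minimal polling sketch using a WebDAV PROPFIND with Depth: 0 against a space root. The URL, the credentials and the simplified ETag extraction are placeholders and assumptions; a real client parses the WebDAV XML properly and then walks the tree along changed ETags.

```python
# Minimal sketch of a client-side polling loop: fetch the root ETag of a
# space via WebDAV PROPFIND every 30 seconds and react when it changes.
# URL, credentials and the simplified ETag extraction are placeholders,
# not the actual Desktop Client implementation.
import re
import time

import requests

SPACE_ROOT_URL = "https://ocis.example.com/dav/spaces/<space-id>"  # placeholder
AUTH = ("<user>", "<password>")                                    # placeholder
PROPFIND_BODY = """<?xml version="1.0"?>
<d:propfind xmlns:d="DAV:">
  <d:prop><d:getetag/></d:prop>
</d:propfind>"""


def fetch_root_etag():
    response = requests.request(
        "PROPFIND",
        SPACE_ROOT_URL,
        data=PROPFIND_BODY,
        headers={"Depth": "0", "Content-Type": "application/xml"},
        auth=AUTH,
        timeout=10,
    )
    response.raise_for_status()
    # Simplified extraction; a real client parses the WebDAV XML.
    match = re.search(r"getetag>([^<]+)<", response.text, re.IGNORECASE)
    return match.group(1) if match else None


last_etag = fetch_root_etag()
while True:
    time.sleep(30)                      # default polling interval
    current_etag = fetch_root_etag()
    if current_etag != last_etag:
        # Something below the space root changed: descend the tree along
        # changed ETags and sync the affected files.
        last_etag = current_etag
```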

Bandwidth Considerations

The bandwidth requirements and limitations are based on the following background. Note that this is a view of the internal network (LAN) only. Considerations about access from the Internet are not part of this description, but can be derived from the LAN point of view:

Clients accessing Infinite Scale request information about what has changed. Depending on the response and whether a file synchronization is required, different bandwidth needs may result. Note that when using, for example, the Desktop Client with virtual files (VFS), only those files that are set to be locally present get physically synced, preventing additional bandwidth consumption.

Request for changed elements

To get the information about changes, the request always starts at the space's root, looking for changed ETags, and follows only paths that have changed elements. For this, PROPFIND requests and responses are used. A request is about 500 bytes and a response is roughly 800 bytes in size.

Number of maximum concurrent PROPFIND responses per second

Network                Max. PROPFIND responses/s
100 Mbit (~10 MB/s)    12,500
1 Gbit (~100 MB/s)     125,000
10 Gbit (~1 GB/s)      1,250,000
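The figures in this table follow directly from dividing the available bandwidth by the response size of roughly 800 bytes, as the following short sketch shows:

```python
# Maximum PROPFIND responses per second = usable bandwidth / response size.
RESPONSE_SIZE = 800  # bytes, rough size of one PROPFIND response

networks = {
    "100 Mbit (~10 MB/s)": 10 * 1000 * 1000,     # bytes/s
    "1 Gbit (~100 MB/s)": 100 * 1000 * 1000,
    "10 Gbit (~1 GB/s)": 1000 * 1000 * 1000,
}

for name, bytes_per_second in networks.items():
    print(f"{name}: {bytes_per_second // RESPONSE_SIZE:,} responses/s")
```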

Request syncing changed files

When a file has been identified to be physically synced, the bandwidth requirements depend on its size and the time in which the sync should finish. Note that syncing changed files can saturate a network more easily than the handling of changed ETags!

Calculation example

Consider 500 concurrent users syncing with the default interval of every 30 seconds: this will create about ~3K PROPFIND requests (500 x 712 / 60 / 2), consuming about 2.4 MB/s of bandwidth (3K x 800 B) - without the necessary file syncs. The physical transfer will create additional bandwidth requirements.
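A generic way to estimate this kind of load is sketched below. The number of PROPFIND requests a client issues per poll cycle is an assumed, illustrative parameter; it is set here so that the sketch roughly reproduces the ~3K requests and 2.4 MB/s from the example above, read as a per-second rate.

```python
# Generic sketch for estimating the PROPFIND load caused by polling clients.
# PROPFINDS_PER_POLL is an assumed, illustrative parameter (how many
# requests one client issues per poll cycle); it is chosen here so that
# the result matches the ~3K requests and ~2.4 MB/s of the example above.
USERS = 500
POLL_INTERVAL_S = 30          # default client polling interval
PROPFINDS_PER_POLL = 178      # assumption, depends on spaces and changes
RESPONSE_SIZE_B = 800         # rough size of one PROPFIND response

requests_per_second = USERS * PROPFINDS_PER_POLL / POLL_INTERVAL_S
bandwidth_mb_per_s = requests_per_second * RESPONSE_SIZE_B / 1000 / 1000

print(f"{requests_per_second:,.0f} PROPFIND responses/s "
      f"~ {bandwidth_mb_per_s:.1f} MB/s (file transfers not included)")
```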

Summary

As you can see above, the bandwidth requirements depend on:

  • The number of concurrent clients accessing Infinite Scale

  • The number of spaces to be synced

  • The dynamics of changes

  • The relative location of a change

  • The need to download changed files locally

The total number of files and folders only has an impact on the first synchronization, not on recurring ones.

Virtualization

Depending on the use case, you can run Infinite Scale on:

  • No virtualization, bare metal

  • Virtualized hardware like VMware, KVM, Hyper-V, VirtualBox, etc.

  • Virtualized Linux operating system in Docker containers

Supported Operating Systems

For best performance, stability, support, and full functionality we officially support Infinite Scale running on the following Linux distributions:

  • Debian 10 and 11

  • Fedora 32 and 33

  • Red Hat Enterprise Linux 7.5 and 8 including all 100% compatible derivatives

  • SUSE Linux Enterprise Server 12 with SP4/5 and SLES 15

  • openSUSE Leap 15.2 and 15.3

  • Ubuntu 20.04 and 22.04

Additional Software

It is strongly recommended to use a reverse proxy for:

  1. security reasons,

  2. load balancing and

  3. high availability.

The Infinite Scale documentation will use Traefik for examples, but you can use NGINX, Apache or others too. All three products provide either a binary or a Docker image for download.

Traefik is a modern HTTP reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with your existing infrastructure components (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, …​) and configures itself automatically and dynamically. Pointing Traefik at your orchestrator should be the only configuration step you need.

NGINX is open source software for web serving, reverse proxying, caching, load balancing, media streaming, and more. It started out as a web server designed for maximum performance and stability. In addition to its HTTP server capabilities, NGINX can also function as a proxy server for email (IMAP, POP3, and SMTP) and a reverse proxy and load balancer for HTTP, TCP, and UDP servers.

Apache httpd, in addition to being a "basic" web server that provides static and dynamic content to end-users, can (like most other web servers) also act as a reverse proxy server, also known as a "gateway" server.

Filesystems and Shared Storage

Infinite Scale currently supports two different internal filesystem drivers: ocisfs and s3ng.

  • When the ocisfs driver is used, data and metadata must be on a POSIX-compliant filesystem.

  • When the s3ng driver is used, data resides in an S3 bucket and the metadata is stored on a POSIX-compliant filesystem, which needs to be provisioned separately. This is necessary for performance reasons.

Other drivers, such as those for the Ceph or EOS filesystems, can be used too, but no support can be given because they are not developed or maintained by ownCloud.

The POSIX-compliant filesystems currently supported by Infinite Scale are listed below. Note that the default block size impacts the calculation example in RAM Considerations; it is definable on some filesystems and, where applicable, is given for informational purposes only:

Note that support for a Windows-compatible filesystem like Samba will be available in a later release and will be announced separately.

Local Filesystems

Name     Default Block Size
EXT4     4K
XFS      4K
BTRFS    16K
ZFS      128K

Shared Filesystems

Name     Default Block Size
NFS      The block size depends on the rsize parameter in the mount options. Defaults to 4K, usually set to 32K.
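To check which block size applies to your setup for the calculation example in RAM Considerations, you can query the mounted filesystem directly; a small sketch, where the path is a placeholder:

```python
# Query the block size of the filesystem backing a given path, for example
# the Infinite Scale data directory. The path is a placeholder.
import os

DATA_PATH = "/var/lib/ocis"   # placeholder: adjust to your data directory

stats = os.statvfs(DATA_PATH)
print(f"filesystem block size: {stats.f_bsize} bytes")
```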

NFS Notes

When using NFS, you have to make sure that the NFS server provides extended attributes.

NFS storage based on Linux servers

Extended attributes are supported by NFS starting with kernel version 5.9; therefore, the NFS server has to run on a system with kernel 5.9 or higher. To check, run the command uname -a on the NFS server and compare the displayed version number.
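Besides checking the kernel version, you can verify directly on the mount whether extended attributes work by writing and reading back a user attribute; a small sketch, where the mount path is a placeholder:

```python
# Verify that extended attributes work on the (NFS) mount intended to hold
# Infinite Scale data. The mount path is a placeholder; requires Linux.
import os
import tempfile

MOUNT_PATH = "/mnt/nfs"   # placeholder: path of the NFS mount to test

with tempfile.NamedTemporaryFile(dir=MOUNT_PATH) as probe:
    try:
        os.setxattr(probe.name, "user.ocis.test", b"1")
        value = os.getxattr(probe.name, "user.ocis.test")
        print("extended attributes supported:", value == b"1")
    except OSError as error:
        print("extended attributes NOT supported:", error)
```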

NFS servers provided from storage vendors

A certification matrix will be provided when available.

NFS version

Note that even if the kernel or the storage system supports extended attributes, you have to use NFSv4 in order to use them.

Ceph Notes
  • Ceph is an open source, flexible, distributed storage system (multi-server, multi-rack, multi-site) with an object storage layer. Ceph-S3 presents that object storage with an API that emulates the AWS S3 API.

  • Ceph follows a different concept with regard to handling metadata, which impacts memory requirements. See the Ceph Hardware Recommendations for more details.

  • Note that you cannot access the same files via CephFS and Ceph-S3. Ceph allows exposing commodity hardware as either block storage (RBD), S3 or CephFS. It is not possible to write a file via S3 and then read it via CephFS.