Postprocessing Service Configuration
Introduction
The Infinite Scale postprocessing service handles the coordination of asynchronous post-processing steps.
General Prerequisites
To use the postprocessing service, an event system needs to be configured for all services. By default, Infinite Scale ships with a preconfigured nats service.
Post-Processing Functionality
The storageprovider service (storage-users) can be configured to initiate asynchronous post-processing by setting the STORAGE_USERS_OCIS_ASYNC_UPLOADS environment variable to true. If this is the case, post-processing is initiated after a file has been uploaded and all bytes have been received.
The postprocessing service then coordinates the configured post-processing steps, such as scanning the file for viruses. During post-processing, the file is in a processing state in which only a limited set of actions is available.
While in the processing state, the file is not accessible to users.
When all post-processing steps have completed successfully, the file is made accessible to users.
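As a minimal sketch, asynchronous post-processing could be switched on by exporting the variable named above in the environment that starts Infinite Scale. How the environment is passed to the service (shell, container, or service unit) depends on the deployment and is assumed here to be a plain shell.

```shell
# Enable asynchronous post-processing for uploads handled by storage-users.
# Assumption: Infinite Scale is started from this shell afterwards.
export STORAGE_USERS_OCIS_ASYNC_UPLOADS=true
```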
Additional Prerequisites for the postprocessing Service
Once post-processing has been enabled, each configured post-processing step requires the corresponding service to be enabled and configured. For example, to use the virusscan step, the antivirus service must be enabled and configured.
Post-Processing Steps
The post-processing steps are individually configurable. The POSTPROCESSING_STEPS environment variable takes a comma-separated list of steps, which are executed in the order of their appearance. Steps currently known to the system are virusscan, policies, and delay. Custom steps can be added, but each needs an existing target service that processes it.
Virus Scanning
To enable virus scanning as a post-processing step after uploading a file, the environment variable POSTPROCESSING_STEPS needs to contain the word virusscan somewhere in the list of steps. As a result, each uploaded file gets scanned for viruses as part of the post-processing steps. Note that the antivirus service must be enabled and configured for this to work.
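A minimal configuration sketch for this case, assuming virus scanning is the only post-processing step and that the antivirus service has already been enabled and configured separately:

```shell
# Run the virus scan as the only post-processing step.
# Assumption: the antivirus service is already enabled and configured.
export POSTPROCESSING_STEPS="virusscan"
```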
Delay
Though this is for development purposes only and NOT RECOMMENDED on production systems, setting the environment variable POSTPROCESSING_DELAY to a duration not equal to zero will add a delay step with the configured amount of time. Infinite Scale will continue post-processing the file after the configured delay. Use the environment variable POSTPROCESSING_STEPS and the keyword delay if you have multiple post-processing steps and want to define their order. If POSTPROCESSING_DELAY is set but the keyword delay is not contained in POSTPROCESSING_STEPS, the delay step is executed as the last post-processing step even though it is not listed. In this case, a log entry will be written on service startup to notify the admin about the situation. That log entry can be avoided by adding the keyword delay to POSTPROCESSING_STEPS.
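The ordering rule described above can be illustrated with a small shell sketch. The function effective_steps is hypothetical and only mimics the behavior described in this section; it is not part of Infinite Scale. It also assumes that a zero delay is written as 0s or left empty, and it does not model what happens when delay is listed without a configured duration.

```shell
# Hypothetical helper: prints the step order that results from the given
# POSTPROCESSING_STEPS ($1) and POSTPROCESSING_DELAY ($2) values.
effective_steps() {
  steps="$1"   # comma-separated list, may be empty
  delay="$2"   # duration such as 30s; empty or 0s means no delay (assumption)

  # No delay configured: the list is used as given.
  if [ -z "$delay" ] || [ "$delay" = "0s" ]; then
    printf '%s\n' "$steps"
    return
  fi

  # Delay configured and explicitly listed: keep the order from the list.
  case ",$steps," in
    *,delay,*) printf '%s\n' "$steps"; return ;;
  esac

  # Delay configured but not listed: it runs as the last step.
  if [ -z "$steps" ]; then
    printf 'delay\n'
  else
    printf '%s,delay\n' "$steps"
  fi
}

effective_steps "delay,virusscan" "30s"   # prints: delay,virusscan
effective_steps "virusscan" "30s"         # prints: virusscan,delay
```

In the second call, the delay step is appended implicitly, which is the situation that triggers the startup log entry.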
Custom Post-Processing Steps
By using the envvar POSTPROCESSING_STEPS, custom post-processing steps can be added. Any word can be used as a step name, but be careful not to conflict with existing keywords like virusscan and delay. In addition, if a keyword is misspelled, or the corresponding service either does not exist or does not follow the necessary event communication, the postprocessing service will wait forever for the response it requires and therefore will not continue with any other processing.
Prerequisites
To use custom post-processing steps, you need a custom service listening to the configured event system. For more information, see General Prerequisites.
Workflow
When defining a custom postprocessing step (e.g. "customstep"), the postprocessing service will eventually send an event during postprocessing. The event will be of type StartPostprocessingStep with its field StepToStart set to "customstep". When the service defined as custom step receives this event, it can safely execute its actions. The postprocessing service will wait until it has finished its work. The event contains further information (filename, executing user, size, …) and also includes the tokens and URLs required to download the file in case byte inspection is necessary.
Once the service defined as custom step has finished its work, it should send an event of type PostprocessingFinished via the configured events system back to the postprocessing service. This event needs to contain a FinishedStep field set to "customstep". It also must contain the outcome of the step, which can be one of the following:
- delete: Abort postprocessing, delete the file.
- abort: Abort postprocessing, keep the file.
- retry: There was a problem that was most likely temporary and may be solved by trying again after some backoff duration. Retry runs automatically and is defined by the backoff behavior as described below.
- continue: Continue postprocessing, this is the success case.
The backoff behavior mentioned in the retry outcome can be configured using the POSTPROCESSING_RETRY_BACKOFF_DURATION and POSTPROCESSING_MAX_RETRIES environment variables. The backoff duration is calculated after each failure using the following formula: backoff_duration = POSTPROCESSING_RETRY_BACKOFF_DURATION * 2^(number of failures - 1). This means that the waiting time between attempts doubles with each failure, while the number of attempts is limited by the maximum number of retries. Steps that still don’t succeed after the maximum number of retries will be automatically moved to the abort state.
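To make the formula concrete, the following shell sketch computes the backoff for successive failures. The helper backoff_seconds is hypothetical, and the example values (a base duration of 5 seconds and 3 retries) are assumptions chosen for illustration, not defaults of the service.

```shell
# Hypothetical helper implementing the formula above with whole seconds:
#   backoff = base * 2^(failures - 1)
# $1 = base backoff in seconds (stands in for POSTPROCESSING_RETRY_BACKOFF_DURATION)
# $2 = number of failures so far (1 for the first failure)
backoff_seconds() {
  base="$1"
  failures="$2"
  factor=1
  i=1
  # Double the factor once for every failure after the first.
  while [ "$i" -lt "$failures" ]; do
    factor=$((factor * 2))
    i=$((i + 1))
  done
  echo $((base * factor))
}

# Assumed example values: base of 5 seconds, at most 3 retries.
for n in 1 2 3; do
  echo "failure $n: wait $(backoff_seconds 5 "$n")s"
done
```

With these assumed values, the waits are 5, 10, and 20 seconds; a step that still fails after the third retry would then move to the abort state.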
CLI Commands
Resume Post-Processing
If post-processing fails in one step due to an unforeseen error, current uploads will not be retried automatically. Starting with Infinite Scale release 4.0.0, a system administrator can instead run a CLI command to retry the failed upload, which is a two-step process:
- First, find the upload ID of the failed upload:
  ocis storage-users uploads list
- Then use the restart command to resume post-processing of the selected ID:
  ocis postprocessing restart -u <uploadID>
Storing
The postprocessing service needs to store some metadata about uploads to be able to orchestrate post-processing. When running in single binary mode, the default in-memory implementation will be just fine. In distributed deployments, it is recommended to use a persistent store; see below for more details.
The postprocessing service stores each consumed event via the store configured in POSTPROCESSING_STORE. Possible stores are:
Store Type | Description |
---|---|
memory | Basic in-memory store and the default. |
ocmem | Advanced in-memory store allowing max size. |
redis | Stores data in a configured Redis cluster. |
redis-sentinel | Stores data in a configured Redis Sentinel cluster. |
etcd | Stores data in a configured etcd cluster. |
nats-js | Stores data using the key-value-store feature of NATS JetStream. |
noop | Stores nothing. Useful for testing. Not recommended in production environments. |
- Note that in-memory stores are by nature not reboot-persistent.
- Though usually not necessary, a database name and a database table can be configured for event stores if the event store supports this. This is generally not applicable for stores of type in-memory. These settings are blank by default, which means that the standard settings of the configured store apply.
- The postprocessing service can be scaled if in-memory stores are not used and the stores are configured identically across all instances.
- When using redis-sentinel, the Redis master to use is configured via POSTPROCESSING_STORE_NODES in the form of <sentinel-host>:<sentinel-port>/<redis-master>, like 10.10.0.200:26379/mymaster.
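As a configuration sketch for the last point, a Redis Sentinel store could be selected as follows. The address and master name are the example values from the text above, not real endpoints.

```shell
# Use a Redis Sentinel cluster as the postprocessing store.
export POSTPROCESSING_STORE=redis-sentinel

# Form: <sentinel-host>:<sentinel-port>/<redis-master>
# Example values taken from the text; replace them with your own.
export POSTPROCESSING_STORE_NODES="10.10.0.200:26379/mymaster"
```

For scaled deployments, the same two values would need to be set identically on every postprocessing instance.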
Configuration
Environment Variables
The postprocessing service is configured via the following environment variables. Read the Environment Variable Types documentation for important details.
Name | Type | Default Value | Description |
---|---|---|---|
| | bool | false | Activates tracing. |
| | string | | The type of tracing. Defaults to '', which is the same as 'jaeger'. Allowed tracing types are 'jaeger' and '' as of now. |
| | string | | The endpoint of the tracing agent. |
| | string | | The HTTP endpoint for sending spans directly to a collector, e.g. http://jaeger-collector:14268/api/traces. Only used if the tracing endpoint is unset. |
| | string | | The log level. Valid values are: 'panic', 'fatal', 'error', 'warn', 'info', 'debug', 'trace'. |
| | bool | false | Activates pretty log output. |
| | bool | false | Activates colorized log output. |
| | string | | The path to the log file. Activates logging to this file if set. |
| | string | 127.0.0.1:9255 | Bind address of the debug server, where metrics, health, config and debug endpoints will be exposed. |
| | string | | Token to secure the metrics endpoint. |
| | bool | false | Enables pprof, which can be used for profiling. |
| | bool | false | Enables zpages, which can be used for collecting and viewing in-memory traces. |
POSTPROCESSING_STORE | string | memory | The type of the store. Supported values are: 'memory', 'ocmem', 'etcd', 'redis', 'redis-sentinel', 'nats-js', 'noop'. See the text description for details. |
POSTPROCESSING_STORE_NODES | []string | [] | A comma-separated list of nodes to access the configured store. This has no effect when 'memory' or 'ocmem' stores are configured. Note that how nodes are used depends on the library of the configured store. |
| | string | postprocessing | The database name the configured store should use. |
| | string | postprocessing | The database table the store should use. |
| | Duration | 0s | Time to live for events in the store. The duration can be set as a number followed by a unit identifier like s, m or h. Defaults to '336h' (2 weeks). |
| | int | 0 | The maximum quantity of items in the store. Only applies when store type 'ocmem' is configured. Defaults to 512. |
| | string | 127.0.0.1:9233 | The address of the event system. The event system is the message queuing service. It is used as message broker for the microservice architecture. |
| | string | ocis-cluster | The clusterID of the event system. The event system is the message queuing service. It is used as message broker for the microservice architecture. Mandatory when using NATS as event system. |
POSTPROCESSING_EVENTS_TLS_INSECURE | bool | false | Whether the ocis server should skip the client certificate verification during the TLS handshake. |
| | string | | The root CA certificate used to validate the server’s TLS certificate. If provided, POSTPROCESSING_EVENTS_TLS_INSECURE will be seen as false. |
| | bool | false | Enable TLS for the connection to the events broker. The events broker is the ocis service which receives and delivers events between the services. |
POSTPROCESSING_STEPS | []string | [] | A comma-separated list of postprocessing steps, processed in order of their appearance. Currently supported values by the system are: 'virusscan', 'policies' and 'delay'. Custom steps are allowed. See the documentation for instructions. |
POSTPROCESSING_DELAY | Duration | 0s | After uploading a file but before making it available for download, a delay step can be added. Intended for development purposes only. The duration can be set as a number followed by a unit identifier like s, m or h. If a duration is set but the keyword 'delay' is not explicitly added to 'POSTPROCESSING_STEPS', the delay step will be processed as the last step. In such a case, a log entry will be written on service startup to remind the admin about that situation. |
YAML Example
Note that the filename shown below has been chosen on purpose.
See the Configuration File Naming for details when setting up your own configuration.
# Autogenerated
# Filename: postprocessing-config-example.yaml
tracing:
  enabled: false
  type: ""
  endpoint: ""
  collector: ""
log:
  level: ""
  pretty: false
  color: false
  file: ""
debug:
  addr: 127.0.0.1:9255
  token: ""
  pprof: false
  zpages: false
store:
  store: memory
  nodes: []
  database: postprocessing
  table: postprocessing
  ttl: 0s
  size: 0
postprocessing:
  events:
    endpoint: 127.0.0.1:9233
    cluster: ocis-cluster
    tls_insecure: false
    tls_root_ca_certificate: ""
    enable_tls: false
  steps: []
  delayprocessing: 0s