Full Text Search

Introduction

The Full Text Search app integrates full text search into ownCloud by connecting to an Elasticsearch server. This allows users to search not only for file names but also for content within files stored in ownCloud.

This document describes how to set up the ownCloud part of the Full Text Search app.

Prerequisites

  1. A fully functioning Elasticsearch Server 7. Follow the Installation and Upgrade Guide for your environment.

    • Version 1.0.0 of the Full Text Search app only works with Elasticsearch version 5.6.

    • Version 2.0.0 and later of the app require Elasticsearch 7 or later. The app has been tested with, and is supported on, versions up to 8.6.2. Newer versions may work as well.

  2. The Ingest Attachment Processor Plugin lets Elasticsearch extract metadata and text from over a thousand different file types such as PPT, XLS, PDF and more. To install the processor, run the following command from your Elasticsearch installation directory:

    sudo bin/elasticsearch-plugin install ingest-attachment

    After installing the plugin, restart the Elasticsearch server:

    sudo service elasticsearch restart
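One way to confirm that the processor is available after the restart is to ask the node which ingest processors it has loaded. The following is a minimal sketch, assuming the Elasticsearch nodes info API (GET /_nodes/ingest) can be reached without authentication. The ES_URL value and the function name are illustrative and not part of the app.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the JSON read from stdin lists an
# ingest processor of type "attachment".
has_attachment_processor() {
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"attachment"'
}

# Usage (the URL is an assumption, adjust it to your server):
#   ES_URL="http://localhost:9200"
#   curl -fsS "$ES_URL/_nodes/ingest" | has_attachment_processor \
#       && echo "attachment processor available"
```

If the check fails, verify that the plugin was installed on every node and that the server was restarted afterwards.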

Installation

To install the app, use the Marketplace app on your ownCloud server or proceed manually:

  1. Download and extract the tarball of the Full Text Search app to the apps directory (or, preferably, the custom apps directory) of your ownCloud instance.

  2. Use the App Commands to enable the search_elastic application with:

    sudo -u www-data ./occ app:enable search_elastic

    or enable it via the GUI: Settings > Admin > Apps > Full Text Search > Enable.

Configuration

To configure the Full Text Search app, go to Settings > Admin > Search.

Authentication Methods

Regardless of the authentication method you select, you need to provide the URL of the Elasticsearch server. The server must already be configured for the authentication method you choose.

The URL can use either HTTP or HTTPS and includes the address and port.

The app provides several authentication methods. Select one and see the details for it below:

No Authentication

When using No Authentication, just fill in the URL of the ES server.

User / Password Authentication

When using User / Password Authentication, enter the credentials set up on the ES server. Note that the password will be stored encrypted in the ownCloud database.

API Key Authentication

When using API Key Authentication, enter an API key that was created on the ES server.

The API key must be the encoded one, not the api_key string. For details, see the Create API key API in the ES documentation.
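In the response of the Create API key API, the encoded value is the Base64 encoding of the id and api_key values joined by a colon. If only those two values are at hand, the encoded form can be rebuilt as in the sketch below. The function name and the placeholder values are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: prints the Base64 encoding of "<id>:<api_key>".
# tr removes the line breaks that some base64 tools add to long output.
encode_api_key() {
    printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Usage with placeholder values:
#   encode_api_key "my-key-id" "my-api-key-secret"
```

The printed string is the value to enter in the API Key field.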

Search External Storage

To include external storage in ES indexing, set the Scan external Storages checkbox. This enables search not only in external storages but also in federated shares. Note that changing this setting requires rebuilding the index.

Connector Setup

There are two connectors, each with its own index: Legacy, which is the existing one, and RelevanceV2, which is the new one. They store index data in different ways, which results in different search capabilities. Legacy behaves like the existing search, while RelevanceV2 adds new capabilities.

To populate the index of the new connector, an occ command needs to be run for each user, creating index data accordingly. See the occ command section Fill a Secondary Index for more details.

Migrating to the RelevanceV2 Connector:

  • If you haven’t indexed anything yet, you are encouraged to set up the connectors you want to use as part of the app configuration. The recommended one is RelevanceV2 for write and search.

  • If you have indexed data already, use the following steps to migrate to the new index scheme:

    1. Start from a setup in which the Legacy connector is configured for both write and search.

    2. Add the RelevanceV2 connector to the list of write connectors. The list should have both Legacy and RelevanceV2.

    3. Run the occ search:index:fillSecondary RelevanceV2 <user> command for every user, or at least for those who use the search app. Expect this step to take a long time.

    4. Once indexed data has been migrated for all users, you can switch the search connector to use the new RelevanceV2 search capabilities.

    5. After verifying that search works as expected, you can remove the old Legacy connector from the list of write connectors.

    6. Finally you can completely remove the old index from Elasticsearch.

From step 2 on, you write into both indexes at the same time, which is expected to be slower. Note that step 2 only takes care of new files. Files indexed previously won't be present in the new index, which is why step 3 is needed.

Step 4 is important, and you should stop at that point for a while. If something goes wrong, you can still revert. In particular, you can switch the search connector back to Legacy. In this case, use the occ command referenced above with the Legacy index.

From step 5 on, the actions are irreversible. If you want to go back, you'll have to start a new migration.

No downtime is expected while the migration happens. Until step 4, the Legacy connector keeps updating its index normally. When the search connector is switched, the RelevanceV2 connector accesses the new index, which should be fully up to date by then.
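Step 3 has to be repeated for every user, which lends itself to a small loop. The following is a minimal sketch, assuming the user IDs are available one per line, for example in a file users.txt. The file name, the function name and the OCC variable are illustrative. Only the occ command itself is taken from the steps above.

```shell
#!/bin/sh
# OCC can be overridden, for example OCC=echo for a dry run that only
# prints the commands instead of running them.
OCC="${OCC:-sudo -u www-data ./occ}"

# Reads one user ID per line from stdin and fills the RelevanceV2
# index for each of them. Blank lines are skipped. A failure is
# reported on stderr and the loop continues with the next user.
fill_secondary_for_users() {
    while IFS= read -r uid || [ -n "$uid" ]; do
        [ -n "$uid" ] || continue
        # </dev/null keeps the command from consuming the user list.
        $OCC search:index:fillSecondary RelevanceV2 "$uid" </dev/null \
            || echo "fillSecondary failed for user: $uid" >&2
    done
}

# Usage (users.txt is a hypothetical file with one user ID per line):
#   fill_secondary_for_users < users.txt
```

Running it once with OCC=echo first shows which commands would be executed before the long-running step is started.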

Enhanced Search Capabilities with RelevanceV2

The RelevanceV2 connector can boost the scores of recently modified files, so new files should appear first. This is not guaranteed: even with the boost, a file can still score lower than older but more relevant files.

Additional searches you can do with the "RelevanceV2" connector:

  • Search by extension
    ext:pdf, ext:docx, ext:gif, ext:mp4, ext:tar.gz, ext:gz, etc., any extension is possible

  • Search by size, only in bytes or megabytes

    • Search by byte size:
      size.b:<8092, size.b:>102400, size.b:[8092 TO 16184]

    • Search by megabyte size:
      size.mb:<3, size.mb:>9, size.mb:[3 TO 9]

  • Search by type: only "file" or "folder"
    type:file, type:folder

  • Search by modification time:

    • Search by timestamp:
      mtime:<1678960862, mtime:>1678960862, mtime:[1608111372 TO 1678960862]

    • Search by date:
      mtime:<2021-08-25, mtime:>2023-01-18, mtime:[2022-01-01 TO 2022-12-31]

  • Search by mimetype:
    mime:image, mime:gif, mime:text
    NOTE: To search for the whole mimetype such as "image/gif" use mime.key:image\/gif.

Each search term narrows the search. For example, brown ext:pdf will be interpreted as "name or content containing brown AND extension = pdf", so "brown.pdf" and "a brown paper.pdf" will appear, but not "brown.txt" or "blue.pdf".

Some examples of complex searches:

  • Files containing "confidential" updated since 2023 whose size is less than 10MB:
    confidential mtime:>2023-01-01 size.mb:<10

  • Folders containing more than 1GB:
    type:folder size.mb:>1024

  • Images between March and June 2020:
    mime:image mtime:[2020-03-01 TO 2020-06-30]

Note that matching by name is fairly lax, so expect some unrelated results. The best matches are expected to appear at the top.
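The date examples above use whole days, while the timestamp form also allows a specific time of day. The following sketch converts a UTC date and time into an mtime search term. It assumes GNU date (the -d option), and the function name is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: prints a search term matching files modified
# after the given UTC date and time ("YYYY-MM-DD hh:mm:ss").
# Assumes GNU date, which accepts such a string with -d.
mtime_after() {
    ts=$(date -u -d "$1" +%s) || return 1
    printf 'mtime:>%s\n' "$ts"
}

# Usage:
#   mtime_after "2023-03-16 10:01:02"
```

For this input, the helper prints mtime:>1678960862, which is the timestamp used in the examples above.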

Save the Configuration

Save the configuration with the Save configuration button.

Set up the ES Index

When everything is set up, click the Setup index button, which tells the ES server to create the plain empty index and other related internal settings.

This step is required. Once it succeeds, the red dot turns green, indicating that everything has been set up correctly.

Resetting the ES Index

You can at any time reset the index if required by clicking on Reset index or with an occ command. The index will be recreated afterwards.

sudo -u www-data ./occ search:index:reset

Using occ Commands

You can use the:

  • occ Full Text Search command set to manage the app. These commands let administrators create, rebuild, reset, and update the search index. For example, the following command resets and recreates the index for all users:

    sudo -u www-data ./occ search:index:reset

  • occ Config Commands command set to configure the app.

    Examples:

    List app settings:

    sudo -u www-data ./occ config:list search_elastic
    {
        "apps": {
            "search_elastic": {
                "enabled": "yes",
                "group": "content_searchers",
                "installed_version": "2.1.0",
                "nocontent": "false",
                "scanExternalStorages": "1",
                "servers": "elastic:xxxxxxx@172.17.0.3:9200",
                "types": "filesystem"
            }
        }
    }

    Set app options:

    sudo -u www-data ./occ config:app:set \
        search_elastic scanExternalStorages --value 0

    or

    sudo -u www-data ./occ config:app:set \
        search_elastic scanExternalStorages --value 1

App Modes

The Full Text Search app provides two modes, which are active and passive.

Active Mode

After the app is enabled, it is in active mode by default:

  • File changes will be indexed in background jobs.
    System cron is recommended, otherwise a lot of jobs might queue up.

  • Search results will be based on Elasticsearch.

  • Search functionality based on ownCloud core database queries will no longer be used.

    Active mode can cause a downtime for search when indexing starts on an already heavily used instance, because it takes a while until all files have been indexed.

Passive Mode

To do an initial full indexing without the app interfering, the app can be put in passive mode:

  • The administrator can run occ commands changing the search configuration without notice to the users.

  • The app will not index any changes by itself.

  • Search results will still be based on ownCloud core database queries.

Changing the App Mode

sudo -u www-data ./occ config:app:set \
    search_elastic mode --value passive

or

sudo -u www-data ./occ config:app:set \
    search_elastic mode --value active

Restrict Search Results

Index Metadata Only

If you only want to use the Full Text Search app as a more scalable search on filenames, you can disable content indexing by setting the option nocontent to true, which defaults to false:

sudo -u www-data ./occ config:app:set \
    search_elastic nocontent --value true

  • You have to reindex all files if you change this back to false. Setting it to true does not require reindexing.

  • Limiting full text search to certain groups with the option group.nocontent may be more flexible. See below for details.

Limit Metadata Search for Groups

If you only want some users to search shared files by filename, you can disable full text search for specific groups. Set the option group.nocontent to the groups whose users should only receive results based on filenames (not the full path), such as the group nofulltext in the example below:

sudo -u www-data ./occ config:app:set \
    search_elastic group.nocontent \
    --value nofulltext

You can also configure multiple groups by separating them with commas:

sudo -u www-data ./occ config:app:set \
    search_elastic group.nocontent \
    --value nofulltext,anothergroup,"group with blanks"

This allows a scalable search in shared files without cluttering the results with content-based hits.
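When the list of groups is maintained elsewhere, the comma-separated value can be assembled by a small wrapper. The following is a minimal sketch. The function name and the OCC variable are illustrative, while the occ call is the one shown above. Group names containing blanks must be quoted so that the shell passes each of them as a single argument.

```shell
#!/bin/sh
# OCC can be overridden, for example OCC=echo for a dry run.
OCC="${OCC:-sudo -u www-data ./occ}"

# Hypothetical helper: joins all arguments with commas and stores the
# result in the group.nocontent option.
set_nocontent_groups() {
    joined=$(IFS=,; printf '%s' "$*")
    $OCC config:app:set search_elastic group.nocontent --value "$joined"
}

# Usage:
#   set_nocontent_groups nofulltext anothergroup "group with blanks"
```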

Create the Index

When everything has been set up and configured, you can initiate creating the index. This must be done with an occ command. What happens next depends on the app mode:

  • active mode: wait until the job has finished and search is available to users, or

  • passive mode: users continue to search with the ownCloud embedded search, and you switch over to active mode when the occ command has finished indexing.

sudo -u www-data ./occ search:index:create
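The passive mode procedure can be chained so that the switch to active mode only happens after indexing has succeeded. The following is a minimal sketch using only the occ commands shown in this document. The function name and the OCC variable are illustrative.

```shell
#!/bin/sh
# OCC can be overridden, for example OCC=echo for a dry run.
OCC="${OCC:-sudo -u www-data ./occ}"

# Hypothetical helper: switches to passive mode, creates the index,
# and switches to active mode only if both earlier steps succeeded.
initial_index_in_passive_mode() {
    $OCC config:app:set search_elastic mode --value passive &&
    $OCC search:index:create &&
    $OCC config:app:set search_elastic mode --value active
}

# Usage:
#   initial_index_in_passive_mode || echo "indexing did not complete" >&2
```

If indexing fails, the app stays in passive mode and users keep searching with the ownCloud embedded search.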

Issues

When the Elasticsearch server is down or the index has not been set up, you may get a warning about a missing index or an unknown key. As a first step, check that the ES server is reachable and that the index has been set up properly.

User Manual

To find out more about the usage, check out the section in the User Manual: Search & Full Text Search.

Known Limitations

Currently, the app has the following known limitations:

  • If a shared file is renamed by the sharee (share receiver), the sharee cannot find the file using the new filename.

  • Search results are not updated when a text file is rolled back to an earlier version.

  • The app does not return results for received federated share files.

  • Search currently does not work for files encrypted via the Encryption app.