Configuring Search
Introduction
This document provides guidance and notes on configuring the search service. Search lets users find documents based on various criteria, and the search service returns results only for documents the searcher is permitted to access. Read the description of the search service for more details.
Basic search works without any configuration, because it is preconfigured in the default container deployment.
Index Data
With basic search, only metadata is indexed. To make file content searchable, configure Tika as the content extraction engine.
- Metadata: all data that describes the file, such as Name, Size, MimeType, Tags and Mtime.
- Content: all data that relates to the content of the file, such as words, geo data and exif data.
Space requirements therefore differ depending on the configuration, that is, on whether content is indexed in addition to metadata.
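The difference between the two kinds of indexed data can be pictured as a single index entry whose content part stays empty until a content extractor is configured. The following Go sketch is purely illustrative: the `Metadata` and `Document` types and the `hasContent` method are hypothetical and do not reflect how the search service actually stores its index. It also models content only as text, leaving out geo and exif data.

```go
package main

import (
	"fmt"
	"time"
)

// Metadata holds the file attributes indexed by basic search.
// Hypothetical type: the field set mirrors the metadata list above.
type Metadata struct {
	Name     string
	Size     int64
	MimeType string
	Tags     []string
	Mtime    time.Time
}

// Document is a hypothetical index entry. Content stays empty with basic
// search and is only filled when a content extractor such as Tika is configured.
type Document struct {
	Metadata
	Content string
}

// hasContent reports whether extracted content is present in the entry.
func (d Document) hasContent() bool {
	return d.Content != ""
}

func main() {
	// Metadata-only entry, as produced by basic search.
	basic := Document{Metadata: Metadata{
		Name:     "report.pdf",
		Size:     2048,
		MimeType: "application/pdf",
		Tags:     []string{"finance"},
		Mtime:    time.Now(),
	}}

	// The same entry after content extraction has added the file's text.
	withTika := basic
	withTika.Content = "Quarterly revenue grew by ten percent"

	fmt.Println(basic.hasContent(), withTika.hasContent())
}
```

Running it prints `false true`. The extra `Content` field is also why indexing content needs more space than indexing metadata alone.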
General Considerations
Space Requirements
See the Space Requirements documentation for more details.
Index Management
See the Index Management documentation for more details.
Content Extraction and Indexing
To search file content, a content extraction engine must be installed and configured. Infinite Scale currently supports Apache Tika, a content analysis toolkit, for extracting content.
Tika Extractor
You can build Tika manually by following the Getting Started with Apache Tika guide (newer Tika versions may be available), download a precompiled Tika server, or run Tika in a Tika container. The Local Production Setup deployment example is based on this container and is ready to use.
Configure Search using the Tika Extractor
As a prerequisite, Tika must be accessible at http://your-server:9998, whether it was installed manually or runs via Docker. Tika can run on the same server as the search service or on a separate one. The following configuration assumes that all Infinite Scale services, including the search service and Tika, run on the same hardware.
The environment variables that need to be set are described in the Tika Extractor documentation.
Configuring Tika
Although usually not necessary, Tika components can be configured by providing an XML configuration file. For more information, see Configuring Tika on the Apache Tika website.