A session index is a database that stores the locations of meaningful words and fields in each session. Because an index does not contain all text from each session, it can hold a large quantity of session information in a single file.
The Session Indexer service converts Long Term Archive sessions to XML format. It removes HTML tags and invalid XML characters. Indexes are named by day, based on the time of the last hit.
- Noise words such as "but" and "if" are not indexed.
- After a session has been closed by the Short Term Canister, the CX server automatically indexes any sessions selected for archiving.
- Index files locate occurrences of data or error codes for which you can configure derived events, and they enable faster and more effective searches of captured data.
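As a rough illustration of why an index can be much smaller than the sessions it describes, the following Python sketch builds a toy positional index that records where each word occurs and skips noise words. It is not the search engine's actual data structure; the function name `build_index`, the tokenizing rule, and the sample sessions are assumptions made for this example, and the noise-word list contains only the two words named above.

```python
import re
from collections import defaultdict

# Hypothetical noise-word list; the text above names only "but" and "if".
NOISE_WORDS = {"but", "if"}


def build_index(sessions):
    """Map each indexed word to a list of (session_id, word_position) pairs."""
    index = defaultdict(list)
    for session_id, text in sessions.items():
        # Assumed tokenizer: lowercase runs of letters and digits.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            if word not in NOISE_WORDS:
                index[word].append((session_id, position))
    return index


# Made-up sample data for the sketch.
sessions = {
    "s1": "Checkout failed but the cart was saved",
    "s2": "If checkout succeeds, show receipt",
}
index = build_index(sessions)
print(index["checkout"])
```

The index stores only locations, so a search for a word reads the short list of positions instead of scanning the text of every session.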
The XML files are then converted to session index files and saved in the Canister\Indexes directory. To save disk space, Tealeaf recommends deleting this XML (the default setting).
The frequency with which the indexer service checks for sessions to index depends on the Sleep Time setting.
Session index files can be used by both the Portal and RTV to search sessions stored in the Long Term Canister.
The underlying search engine supports many file types, including binary types such as .pdf, for indexing and search. When Tealeaf is configured to capture and process these file types, the search engine indexes the file for search, after which it can be searched through the Portal or RTV.
Documents in some formats are converted by the search engine to HTML for display. The original document is retained as part of the session record.
Note: The search engine does not generally rely on the file extension to identify file types. However, you must configure PCA to capture non-standard data types by using the filename extension.
For more information, see http://support.dtsearch.com/dts0103.htm.
UNC paths supported
You can enter UNC paths in any configuration field that requires a directory path.
IPv6 supported
IP addresses are indexed for search in IPv4 or IPv6 format. Depending on your deployment, the IPv6 versions of the address are inserted into the request, from which they are indexed for availability in search.
Index Processing
After a session is saved to the Long Term Canister, it gets indexed. During indexing, the session hits, Canister events, and Canister summary information are written in XML format and sent to the indexing engine.
The indexing operation consists of a series of sub-programs that are single-threaded and executed in a time-based manner. You can modify the number of processes that run at the same time.
Index Program runs in the background and monitors captured session data for indexing. It executes the following sub-programs to run at the appropriate times:
IndexCheck
- Checks to make sure indexes are synchronized with the library file and performs a verify operation that ensures indexes are in good condition.
IndexMerge
- Merges multiple indexes.
IndexMultiProcess
- Converts documents in multiple formats to indexes and saves them in the <Install_Directory>\Canister\Indexes directory.
IndexDelete
- Deletes sessions from the index when requested by other Tealeaf components.
Note: When Index Program is running, sub-processes such as IndexMerge or IndexCheck cannot be initiated from the command line. It is best to schedule these sub-processes through TMS.
Index Program loops continuously, looking for indexing to be done until a stop is requested. If it finds indexing to be done, it checks for disk space and then starts the process.
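The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Index Program's implementation: `run_index_loop`, `find_work`, `start_indexing`, `min_free_bytes`, and `sleep_seconds` are hypothetical names, and the free-space threshold is an arbitrary placeholder.

```python
import shutil
import threading


def run_index_loop(find_work, start_indexing, stop_event,
                   index_dir=".", min_free_bytes=1 << 30, sleep_seconds=5.0):
    """Poll for work until stop_event is set; index only if disk space allows."""
    while not stop_event.is_set():
        pending = find_work()
        if pending:
            # Check free space on the volume that holds the indexes.
            free = shutil.disk_usage(index_dir).free
            if free >= min_free_bytes:
                start_indexing(pending)
        # wait() returns early if a stop is requested during the pause.
        stop_event.wait(sleep_seconds)


# Demo with made-up data: one batch of work, then a stop request.
stop = threading.Event()
pending_batches = [["s1", "s2"]]
done = []


def find_work():
    return pending_batches.pop() if pending_batches else []


def start_indexing(batch):
    done.append(batch)
    stop.set()  # request a stop after the first batch so the demo ends


run_index_loop(find_work, start_indexing, stop,
               min_free_bytes=0, sleep_seconds=0.01)
print(done)
```

Using an event for the pause, rather than a plain sleep, lets a stop request interrupt the wait instead of being noticed only on the next pass.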
Index Program retrieves a list of non-indexed sessions. Work files are then generated, each containing a list of non-indexed sessions to be indexed in a single batch.
- The number of work files generated is based on the Batch work file parameter and the number of available processes.
The following types of files are valid for indexing:
TLA
- Captured Tealeaf Archive files
TLC
- Canister Tealeaf Archive files
  TLA and TLC files are not normally generated during the capture process, but the system can be configured to do so for troubleshooting purposes.
Filename.ano.yyy
- Annotation files
  - If the file is filename.ano.xxx, it is indexed using dtSearch's native indexing for files of type xxx.
  - If the file is
DOC
- Microsoft™ Word files
PDF
- Portable Document Format (Adobe™ Reader) files
  Annotation, PDF, and DOC files can be added as attachments to sessions through CX RealiTea Viewer.
XML
- XML files
The work filename is TeaLeafWork_nnn_mmm or WORKTeaLeafWork_1045006104_00000001, where:
- nnn is the UNIX™ time at the time of creation.
- mmm is a counter to ensure uniqueness.
The number of work files generated is calculated as:
(Number of Index Processes) x (Batch work files)
The maximum number of entries in each file is specified by the Workfile Batch setting.
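To make the work-file rules concrete, the following Python sketch splits a list of non-indexed sessions into named work files. It is an illustration only. The function `plan_work_files` and its parameter names are hypothetical stand-ins for the settings named in this section. Three details are assumptions rather than documented behavior: the counter is padded to eight digits and starts at 1 (inferred from the example filename), and sessions that do not fit are left for a later pass.

```python
import time


def plan_work_files(session_ids, index_processes, batch_work_files,
                    workfile_batch, now=None):
    """Split session IDs into named work files.

    At most index_processes * batch_work_files files are produced, each
    holding at most workfile_batch entries. Leftover sessions are returned
    so that a later pass can pick them up (an assumption).
    """
    created = int(time.time() if now is None else now)  # nnn: UNIX time
    max_files = index_processes * batch_work_files
    work_files = {}
    for n in range(max_files):
        chunk = session_ids[n * workfile_batch:(n + 1) * workfile_batch]
        if not chunk:
            break
        # mmm: counter for uniqueness, zero-padded as in the example filename.
        work_files[f"TeaLeafWork_{created}_{n + 1:08d}"] = chunk
    leftover = session_ids[max_files * workfile_batch:]
    return work_files, leftover


# Made-up sample: 10 sessions, 2 processes x 2 batch work files, 2 per file.
sample = [f"s{i}" for i in range(10)]
work_files, leftover = plan_work_files(sample, 2, 2, 2, now=1045006104)
for name, entries in work_files.items():
    print(name, entries)
print("leftover:", leftover)
```

With these sample values, four work files are planned and two sessions remain unassigned until the next pass.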
You may specify additional HTTP response content types for indexing.