Indexed word limits
By default, Tealeaf limits indexed words to 32 characters. Any word longer than 32 characters is truncated to 32 characters for purposes of indexing.
For example, when the maximum word length is 32 characters, the word ThisWordIsMyFavoriteWordOfAllTimeNoItsNot is indexed as ThisWordIsMyFavoriteWordOfAllTim, and so is any other word that begins with the same 32 characters.
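The truncation rule can be modeled in a few lines. This is a minimal Python sketch, assuming indexing simply keeps the first Maximum Word Size characters of each word. The helper `index_token` and the second sample word are hypothetical and are not part of Tealeaf.

```python
# Hypothetical model of index-time truncation (not Tealeaf code).
MAX_WORD_SIZE = 32  # default value of the Maximum Word Size setting


def index_token(word: str, max_word_size: int = MAX_WORD_SIZE) -> str:
    """Return the form of `word` that would be stored in the index."""
    return word[:max_word_size]


long_word = "ThisWordIsMyFavoriteWordOfAllTimeNoItsNot"
print(index_token(long_word))  # ThisWordIsMyFavoriteWordOfAllTim

# Hypothetical second word sharing the same first 32 characters: both collide.
other_word = "ThisWordIsMyFavoriteWordOfAllTimeReally"
print(index_token(long_word) == index_token(other_word))  # True
```

Because both words map to the same indexed token, a search on either one matches sessions that contain the other.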
You can change the value of the Maximum Word Size setting to accommodate longer words if they are common in your web application. The maximum accepted word length is 128.
- Changing this value can significantly alter the size of your indexes. Tealeaf recommends using the default setting.
- Changes to this setting apply only to indexes that are created after the change. Typically, those indexes are created the following day.
- The underlying search engine limits field names to 80 characters, even when the maximum word length is set higher than 80. Field names longer than 80 characters are not included in the index at all, so using them as field names or search terms produces no results.
If you are searching for words longer than the maximum word size:
- You can use the wildcard (*) to search.
- You can create a search field that applies an MD5 hash to the value. Users submit the full text version of the search term, which is converted to the 32-character MD5 hash value and submitted to the search engine for processing.
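The MD5 approach works because every MD5 digest, written in hexadecimal, is exactly 32 characters long, which fits within the default maximum word size. The following is a minimal Python sketch, assuming the search field hashes the UTF-8 bytes of the term and uses the lowercase hex digest; Tealeaf's actual encoding and letter case are not stated here, and `md5_search_value` is a hypothetical name.

```python
import hashlib


def md5_search_value(term: str) -> str:
    """Hash a full-length term to a 32-character hex MD5 digest.

    Assumption: UTF-8 encoding and lowercase hex output.
    """
    return hashlib.md5(term.encode("utf-8")).hexdigest()


digest = md5_search_value("ThisWordIsMyFavoriteWordOfAllTimeNoItsNot")
print(digest, len(digest))  # the length is always 32
```

The same function must be applied at indexing time and at search time; otherwise the submitted hash will not match the indexed one.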
The Tealeaf search engine indexes blocks of text but provides mechanisms that control how special characters are treated. For example, hyphens in session data can be handled in several ways.
For example, the term cross-reference might appear in indexed data as:
- crossreference
- cross-reference
- cross reference
Individual words within the hyphenated phrase are always indexed. In the above example, cross and reference are indexed in all methods.
You can configure the session indexer to index hyphenated text using any or all of the above methods. To specify the indexing style for hyphens, set Indexing Hyphen Style to one of the following values:
||Ignore hyphen (crossreference)|
||Treat hyphens as searchable text (cross-reference)|
||Treat hyphens as space (cross reference)|
||Index in all of the above styles.|
Note: Setting this value to
You should monitor changes in indexing rates after making this change.
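The effect of each hyphen style on the cross-reference example can be sketched as follows. This is an illustrative Python model, not the dtSearch implementation: the `HyphenStyle` names and the `hyphen_forms` function are hypothetical, and the "treat as space" form is represented as a two-word phrase string.

```python
from enum import Enum


class HyphenStyle(Enum):
    # Hypothetical names mirroring the four Indexing Hyphen Style options.
    IGNORE = "ignore"  # cross-reference -> crossreference
    HYPHEN = "hyphen"  # cross-reference -> cross-reference
    SPACE = "space"    # cross-reference -> cross reference
    ALL = "all"        # all three forms


def hyphen_forms(term: str, style: HyphenStyle) -> set[str]:
    """Return the index entries produced for a hyphenated term."""
    parts = term.split("-")
    forms = set(parts)  # individual words are always indexed
    if style in (HyphenStyle.IGNORE, HyphenStyle.ALL):
        forms.add("".join(parts))
    if style in (HyphenStyle.HYPHEN, HyphenStyle.ALL):
        forms.add(term)
    if style in (HyphenStyle.SPACE, HyphenStyle.ALL):
        forms.add(" ".join(parts))
    return forms


print(sorted(hyphen_forms("cross-reference", HyphenStyle.ALL)))
# ['cross', 'cross reference', 'cross-reference', 'crossreference', 'reference']
```

The "all styles" option produces the most index entries per hyphenated term, which is why indexing rates are worth monitoring after changing this setting.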
Index format and storage
Indexes consist of an index library file (IXLIB.ILB) and a corresponding group of index files. The IXLIB.ILB file is used only if dtSearch Desktop is enabled. Index libraries are essentially lists that keep track of the name and location of each index.
The IXLIB.TLL file contains the same information as the library file, in addition to information used exclusively by Tealeaf CX.
An index directory is a subdirectory of the TeaLeaf\Canister\Indexes directory. Index directories are named with the time and date of index creation in the following format:
xxx is three sequential uppercase letters. For example, an index created on December 12, 2018 may be stored in a directory named
An index file may represent several sessions, a single session, or a partial session, depending on the limits specified for your indexing options. The number of indexes created depends on the individual index size limit specified in the Indexing Options dialog box. For example, if the individual index size is limited to 50 MB, a new index directory is created after the files in the current index directory reach this limit.
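The size-based rollover can be pictured with a toy model. This Python sketch assumes that index data fills the current directory up to the size limit and that any remainder spills into a new directory, which is one way a single session can end up split across indexes. `IndexRoller` is hypothetical and does not reflect how Tealeaf tracks index sizes internally.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024


@dataclass
class IndexRoller:
    """Toy model of size-based index rollover (hypothetical, not Tealeaf code)."""

    size_limit_bytes: int
    # Bytes stored in each index directory; the last entry is the current one.
    directories: list[int] = field(default_factory=lambda: [0])

    def add(self, nbytes: int) -> list[int]:
        """Store nbytes of index data and return the directory numbers it landed in."""
        used: list[int] = []
        while nbytes > 0:
            room = self.size_limit_bytes - self.directories[-1]
            if room <= 0:
                # Current directory reached the limit: start a new one.
                self.directories.append(0)
                room = self.size_limit_bytes
            chunk = min(room, nbytes)
            self.directories[-1] += chunk
            nbytes -= chunk
            used.append(len(self.directories) - 1)
        return used


roller = IndexRoller(size_limit_bytes=50 * MB)
for session_mb in (20, 20, 20):
    print(roller.add(session_mb * MB))
# [0], [0], then [0, 1]: the third session is split across two directories
```

With a 50 MB limit, three 20 MB sessions fill the first directory and the third session spills 10 MB into a second one.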
After an index is created, it is added to the library file and listed by directory name.
Some rules that govern how specific characters are indexed can be applied through the alphabet.dat file. Additional special rules may apply to specific data structures.
Format of Index Control File (IXLIB.TLL)
The following table describes each tag in the IXLIB.TLL file, a Tealeaf-specific file that the CX RealiTea Viewer uses for searching. You may need to check this file for troubleshooting purposes.
||A text version of the date|
||A pseudo-julian date:
||UNIX™ time of the last time of the first session in the index|
||UNIX time of the last time of the last session in the index|
||The name of the index|
||Relative path of the index|
||Is this index valid? False under certain situations, primarily merging, while indexes are being created.|
||Is the index in use?|
||Canister session identifier of the first session in the index|
||Canister session identifier of the last session in the index|
||Should a verification be run on this index? This option is set only when the
||dtSearch determination of the size of the index|
||dtSearch internal value of the document count for this index|
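When you check IXLIB.TLL for troubleshooting, the UNIX time fields are easier to read after conversion. The following is a minimal Python sketch, assuming the values are whole seconds since 1970-01-01 UTC; `unix_to_text` is a hypothetical helper.

```python
from datetime import datetime, timezone


def unix_to_text(seconds: int) -> str:
    """Render a UNIX timestamp (seconds since 1970-01-01 UTC) as ISO 8601 text in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


print(unix_to_text(1544572800))  # 2018-12-12T00:00:00+00:00
```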
Indexed content types
The following content types, also called Internet media types and MIME types, are indexed by default.