About indexing

Indexed word limits

By default, Tealeaf limits indexed words to 32 characters. Any word that is longer than 32 characters is truncated to 32 characters for purposes of indexing.

For example, when the maximum word length is 32 characters, the words ThisWordIsMyFavoriteWordOfAllTime and ThisWordIsMyFavoriteWordOfAllTimeNoItsNot are both indexed as ThisWordIsMyFavoriteWordOfAllTim.
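The truncation described above can be illustrated with a minimal Python sketch. The function name `index_form` and its `max_word_size` parameter are hypothetical and are not part of Tealeaf; the sketch only models the effect of the limit on the two example words.

```python
def index_form(word, max_word_size=32):
    """Return the form of a word as it would be stored under a length limit.

    Hypothetical helper: models only the truncation, not the real indexer.
    """
    return word[:max_word_size]


first = index_form("ThisWordIsMyFavoriteWordOfAllTime")
second = index_form("ThisWordIsMyFavoriteWordOfAllTimeNoItsNot")

# Both words collapse to the same 32-character indexed form.
print(first)
print(first == second)
```

Because both words share the same first 32 characters, a search for either one matches sessions that contain the other.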

You can change the value of the Maximum Word Size setting to accommodate longer words if they are commonly in use on your web application. The maximum accepted word length is 128 characters.

Note:

Changing this value can significantly alter the size of your indexes. Tealeaf recommends using the default setting.
Changes to this setting apply only to indexes that are created after the change. Typically, those indexes are created the following day.
The underlying search engine limits field names to 80 characters, even when the maximum word length is set to a greater value. Field names that are longer than 80 characters are not included in the index at all. Using these words as search terms or field names produces no results.

If you are searching for words longer than the maximum word size:

You can use the wildcard (*) to search.
You can create a search field that applies an MD5 hash to the value. Users submit the full text version of the search term, which is converted to the 32-character MD5 hash value and submitted to the search engine for processing.
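As a rough illustration of the MD5 approach, the following Python sketch converts a search term of any length into a 32-character hexadecimal digest. The function name `md5_search_term` is hypothetical, and the UTF-8 encoding and lowercase hexadecimal output are assumptions; the normalization that Tealeaf applies before hashing is not described here.

```python
import hashlib


def md5_search_term(term):
    """Hash a full-text search term to a 32-character hex string.

    Assumption: the term is encoded as UTF-8 before hashing.
    """
    return hashlib.md5(term.encode("utf-8")).hexdigest()


digest = md5_search_term("ThisWordIsMyFavoriteWordOfAllTimeNoItsNot")
print(len(digest))  # an MD5 hex digest is always 32 characters long
```

Because the digest always has 32 characters, it fits within the default word limit regardless of the length of the original term.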

Indexing hyphens

The Tealeaf search engine indexes blocks of text and provides mechanisms to control how special characters are treated. Hyphens in session data can be treated in multiple ways.

For example, the term cross-reference might appear in indexed data as:


crossreference
cross-reference
cross reference

Individual words within the hyphenated phrase are always indexed. In the above example, cross and reference are indexed regardless of the method used.

You can configure the session indexer to index hyphenated text using any or all of the above methods. To specify the indexing style for hyphens, set Indexing Hyphen Style to one of the following values:

Value	Description
`Ignored`	Ignore hyphen (`crossreference`).
`Searchable Text`	Treat hyphens as searchable text (`cross-reference`).
`Spaces`	Treat hyphens as space (`cross reference`). This is the default value.
`All`	Index in all of the above styles. Note: Setting this value may bloat index sizes and produce unexpected results in searches that involve longer phrases or words with multiple hyphens.

You should monitor changes in indexing rates after making this change.

Note: To apply this change to sessions that have already been indexed, you must re-index those sessions.
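The four hyphen styles can be sketched in Python as follows. The function names `hyphen_variants` and `component_words` are hypothetical, and the sketch only shows which forms of a term each setting would produce; it does not model the indexer itself.

```python
def hyphen_variants(term, style="Spaces"):
    """Return the forms of a hyphenated term for a given Indexing Hyphen Style.

    Hypothetical helper: the style names mirror the setting's values.
    """
    forms = {
        "Ignored": term.replace("-", ""),       # crossreference
        "Searchable Text": term,                # cross-reference
        "Spaces": term.replace("-", " "),       # cross reference
    }
    if style == "All":
        return list(forms.values())
    return [forms[style]]


def component_words(term):
    """Individual words are indexed under every style."""
    return term.split("-")


print(hyphen_variants("cross-reference", "All"))
print(component_words("cross-reference"))
```

The `All` setting produces three forms for every hyphenated term, which is consistent with the index growth that the setting can cause.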

Index format and storage

Indexes consist of an index library file (IXLIB.ILB) and a corresponding group of index files (*.IX).

The ILB file is used only if dtSearch Desktop is enabled. Index libraries are essentially lists that keep track of the names and locations of each index.
The IXLIB.TLL file contains the same information as the library file, in addition to information used exclusively by Tealeaf CX.

Index directories

An index directory is a sub-directory below the TeaLeaf\Canister\Indexes directory. Index directories are named with the date of index creation in the following format:

YYYYMMDDxxx

where: xxx is three sequential uppercase letters. For example, an index created on December 12, 2018 may be stored in a directory named 20181212AAA.
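When you troubleshoot, it can help to split a directory name into its date and letter suffix. The following Python sketch does this for the format above; the function name `parse_index_dir` is hypothetical, and the sketch assumes that the name contains exactly eight digits followed by three uppercase letters.

```python
import re
from datetime import datetime

# Eight digits (YYYYMMDD) followed by three uppercase letters (xxx).
_INDEX_DIR = re.compile(r"([0-9]{8})([A-Z]{3})")


def parse_index_dir(name):
    """Split an index directory name into (creation date, letter suffix)."""
    match = _INDEX_DIR.fullmatch(name)
    if match is None:
        raise ValueError("not an index directory name: %r" % name)
    created = datetime.strptime(match.group(1), "%Y%m%d").date()
    return created, match.group(2)


created, suffix = parse_index_dir("20181212AAA")
print(created.isoformat(), suffix)
```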

An index file may represent several sessions, a single session, or a partial session depending on the limits specified for your indexing options. The number of created indexes depends on the individual index size limit specified in the Indexing Options dialog box. For example, if the individual index size is limited to 50 MB, a new index directory is created after the files in the current index directory reach this limit.

After an index is created, it is added to the library file and listed by directory name.

Character indexing

Some rules that govern how specific characters are indexed can be applied through the alphabet.dat file. Additional special rules may apply to specific data structures.

Format of Index Control File (`IXLIB.TLL`)

The following table provides a description of each tag that comprises the IXLIB.TLL file, a Tealeaf-specific file used by the CX RealiTea Viewer for searching. It may be necessary to check this file for troubleshooting purposes.

Tag	Description
`<Day>`	A text version of the date
`<Julian>`	A pseudo-Julian date: `((year - 2000) * 1000) + DayOfTheYear`
`<FirstUse>`	UNIX™ time of the last time of the first session in the index
`<LastUse>`	UNIX time of the last time of the last session in the index
`<IndexName>`	The name of the index
`<IndexPath>`	Relative path of the index
`<Valid>`	Is this index valid? False under certain situations, primarily merging, while indexes are being created.
`<InUse>`	Is the index in use?
`<FirstSession>`	Canister session identifier of the first session in the index
`<LastSession>`	Canister session identifier of the last session in the index
`<CheckRequired>`	Should a verification be run on this index? This option is set only when the `-F` flag is given to IndexCheck, or if something went wrong during normal operation.
`<IndexSize>`	dtSearch determination of the size of the index
`<DocCount>`	dtSearch internal value of the document count for this index
`<CheckCount>`	Is the `TLPIS.ix` file current for this index?
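To cross-check a `<Julian>` value against a calendar date, the pseudo-Julian formula can be computed as in the following Python sketch. It reads the formula as `((year - 2000) * 1000) + DayOfTheYear`, with January 1 counted as day 1; the function name `pseudo_julian` is hypothetical.

```python
from datetime import date


def pseudo_julian(d):
    """Compute ((year - 2000) * 1000) + day of the year for a date."""
    day_of_year = d.timetuple().tm_yday  # January 1 is day 1
    return (d.year - 2000) * 1000 + day_of_year


# December 12, 2018 is day 346 of the year.
print(pseudo_julian(date(2018, 12, 12)))  # 18346
```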

Indexed content types

The following content types, also called Internet media types and MIME types, are indexed by default.

Note: These configured content types apply to HTTP responses only. HTTP requests are indexed based on individual sections.

text/html
text/plain
text/xml
application/xhtml+xml
application/rdf+xml
application/vnd.mozilla.xul+xml
application/xml
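The following Python sketch checks whether a response `Content-Type` header value falls within the default list. The function name `is_indexed_by_default` is hypothetical, and ignoring parameters such as `charset` and comparing case-insensitively are assumptions; how Tealeaf matches content types internally is not stated here.

```python
DEFAULT_INDEXED_TYPES = frozenset({
    "text/html",
    "text/plain",
    "text/xml",
    "application/xhtml+xml",
    "application/rdf+xml",
    "application/vnd.mozilla.xul+xml",
    "application/xml",
})


def is_indexed_by_default(content_type_header):
    """Return True if the header's media type is in the default list.

    Assumption: parameters after ';' are ignored and case does not matter.
    """
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in DEFAULT_INDEXED_TYPES


print(is_indexed_by_default("text/html; charset=UTF-8"))
print(is_indexed_by_default("image/png"))
```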
