Search and indexing

To improve search performance, Tealeaf scans completed sessions and indexes a predefined set of data that is typically useful for locating sessions.

The Portal and RTV use these indexes to quickly locate sessions based on search criteria that you enter. For many customers, this data set is sufficient to enable effective search for sessions of interest.

Indexes are not used for search of active sessions.

In some cases, you may decide that session data that is not available for search by default should be made searchable. For example:

You created pipeline rules that insert application-specific data into the request.
You deployed CX UI Capture for AJAX, which enables the capture of user interface events from the visitor's browser.
You deployed a Tealeaf Logging Framework, which enables the capture of application and event data from mobile native applications.
Note: Tealeaf Logging Frameworks are components of the CX Mobile module. Additional configuration and deployment are required.

In the above circumstances, the data may not be automatically indexed for search. Using one of the provided methods, you can make this data available for search through indexed data or event values.

Important: Adding index data is considered an administrator task. Adding data to be indexed for search increases the size of the indexes stored in the Processing Server. Depending on the volume of increased data that is marked for indexing, indexes can grow considerably and may impact available disk space and performance of the Processing Server. Before you begin adding data, you should review your goals with IT staff.

Character indexing

Searching for sessions in the Long-Term Canister uses an embedded search engine called dtSearch, which relies on an index of words in captured sessions. When a search is submitted, dtSearch scans these indexes to quickly find sessions that match the search criteria.

For the dtSearch engine, the base unit of indexing is a word. For example, to search for the word apple, you must enter the full word apple in Portal search.

Note: To search for part of the word, you must use a wildcard character (*). To find ap, enter ap*.
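The difference between whole-word and wildcard matching can be illustrated with a small sketch. This is not dtSearch code; the function names and the sample word list are hypothetical, and matching is kept case-sensitive for simplicity.

```python
import re

def term_matches(term, word):
    """Return True if `word` matches `term`, where '*' stands for any run of characters.

    A term without '*' must equal the whole word.
    """
    # Escape the literal parts and join them with '.*' for each wildcard.
    pattern = ".*".join(re.escape(part) for part in term.split("*"))
    return re.fullmatch(pattern, word) is not None

def matching_words(term, indexed_words):
    """Return the indexed words that match the search term."""
    return [w for w in indexed_words if term_matches(term, w)]

# Hypothetical set of indexed words.
words = ["apple", "application", "grape"]

print(matching_words("apple", words))  # ['apple']
print(matching_words("ap", words))     # []
print(matching_words("ap*", words))    # ['apple', 'application']
```

Without the wildcard, the partial term ap matches nothing, because no indexed word is exactly ap.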

To index words, dtSearch must know which characters make up words, which act as separators (such as whitespace and punctuation), and which are noise. dtSearch indexes only words composed of characters that it does not consider to be noise.

Alphabet.dat

dtSearch defines what constitutes a word using an alphabet file, which designates the characteristics of each character that can be encountered in session data.

The alphabet file is located at <install_directory>\Alphabet.dat.

The Alphabet.dat file is a text file. Characters that cannot be displayed in a text file may be expressed as a backslash followed by the hexadecimal code for the character. For example, \0d is the code for ASCII character 13, the carriage return.
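As a rough illustration of the escape notation, the following sketch converts backslash-hex codes into the characters they represent. It assumes each escape is a backslash followed by exactly two hexadecimal digits; the function name is hypothetical and the real file syntax may allow other forms.

```python
import re

def decode_escapes(text):
    """Replace each backslash followed by two hex digits with the character it encodes."""
    return re.sub(
        r"\\([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# Hexadecimal 0d is decimal 13, the carriage return.
print(ord(decode_escapes(r"\0d")))  # 13
# Hexadecimal 09 is decimal 9, the horizontal tab.
print(ord(decode_escapes(r"\09")))  # 9
```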

When an index is created, the dtSearch engine stores a private copy of the current alphabet and hyphenation settings in the index_a.ix file in the index folder. These settings are used to index all files added to the index and to evaluate search requests for the index.

Character categories

dtSearch classifies characters into four categories: letter, space, hyphen, and ignore.

These categories are specified in the headings listed below:

Table 1. Character Categories
Heading	Category	Meaning
`[Letters]`	letter	A searchable character. All of the characters in the alphabet (a-z and A-Z) and all of the digits (0-9) should be classified as letters. The alphabet specifies whether a letter character is lowercase or uppercase, whether or not it has an accent, and the lowercase or unaccented equivalent.
`[Hyphens]`	hyphen	Hyphen characters can receive special processing in dtSearch. By default, only the `-` character is defined as a hyphen. You can configure how hyphens are interpreted for indexing purposes.
`[Spaces]`	space	A character that causes a word break. For example, if you classify the period as a space character, then dtSearch would process U.S.A. as three separate words: `U`, `S` and `A`.
`[Ignore]`	ignore	A character that is disregarded in processing text. For example, if you classify the period as ignore instead of space then dtSearch would process U.S.A. as one word: `USA`.

The file must end with the following heading:

[End]
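The effect of the space and ignore categories on word breaks can be sketched as follows. This is a simplified model, not dtSearch's tokenizer: the function name and its parameters are hypothetical, and letter and hyphen handling is omitted.

```python
def split_words(text, spaces=" ", ignore=""):
    """Split text into words.

    Characters in `spaces` end the current word; characters in `ignore`
    are dropped without breaking the word; everything else is kept.
    """
    words, current = [], []
    for ch in text:
        if ch in ignore:
            continue  # disregard the character entirely
        if ch in spaces:
            if current:  # close the word in progress, if any
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

# Period classified as a space character: three separate words.
print(split_words("U.S.A.", spaces=" ."))              # ['U', 'S', 'A']
# Period classified as ignore: one word.
print(split_words("U.S.A.", spaces=" ", ignore="."))   # ['USA']
```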

Replaced characters

During indexing, the following non-alphanumeric characters in field names are replaced with the underscore character (_).

[ ] . $ :

When field values or text outside fields are indexed, non-alphanumeric characters are treated according to their category in the Alphabet.dat file. When field names are indexed, the above characters are replaced with the underscore.

For example, if the request contains the following fields and values:


[urlfield]
paymentDetailsVO.title=foo.
ctl01$PlaceHolderMain$ctl00$=bar.

The text indexed for search is the following:


<urlfield>
  <paymentDetailsVO_title>foo.</paymentDetailsVO_title>
  <ctl01_PlaceHolderMain_ctl00_>bar.</ctl01_PlaceHolderMain_ctl00_>
</urlfield>
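The field-name replacement in this example can be reproduced with a short sketch. The function and constant names are hypothetical; only the set of replaced characters comes from the text above. Field values are left untouched here, since they are handled according to the alphabet file instead.

```python
# Characters replaced with an underscore in indexed field names.
FIELD_NAME_REPLACED = "[].$:"

def indexed_field_name(name):
    """Return the field name with each replaced character turned into '_'."""
    return "".join("_" if ch in FIELD_NAME_REPLACED else ch for ch in name)

print(indexed_field_name("paymentDetailsVO.title"))
# paymentDetailsVO_title
print(indexed_field_name("ctl01$PlaceHolderMain$ctl00$"))
# ctl01_PlaceHolderMain_ctl00_
```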

Treatment of special characters before submitting a search query

The following characters are stripped from Portal-based search criteria and replaced with spaces before the query is submitted to dtSearch.


\ / : & < > % ! ? . @ # $ ^ ( ) { } | ' ~ , [ ] =

The asterisk character (*) is not stripped from the search criteria, because it is used as a wildcard character.
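A minimal sketch of this clean-up step is shown below. It is an illustration only, not the Portal's implementation: the names are hypothetical, and the asterisk is deliberately left out of the stripped set, following the statement that it is preserved as a wildcard.

```python
# Characters replaced with a space before the query is submitted.
# The asterisk is intentionally absent so that wildcards survive.
STRIPPED = set("\\/:&<>%!?.@#$^(){}|'~,[]=")

def sanitize_criteria(text):
    """Replace each stripped character in the search criteria with a space."""
    return "".join(" " if ch in STRIPPED else ch for ch in text)

print(sanitize_criteria("user@example.com"))  # user example com
print(sanitize_criteria("ap*"))               # ap*
```

Because each stripped character becomes a space, a value such as an email address is searched as several separate words.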
