To improve search performance, Tealeaf scans completed sessions for a predefined set of data that is typically useful for locating sessions, and indexes that data.
The Portal and RTV use these indexes to quickly locate sessions based on search criteria that you enter. For many customers, this data set is sufficient to enable effective search for sessions of interest.
Indexes are not used for search of active sessions.
In some cases, you may want to search session data that is not available for search by default. For example:
- You created pipeline rules that insert application-specific data into the request.
- You deployed CX UI Capture for AJAX, which enables the capture of user interface events from the visitor's browser.
- You deployed a Tealeaf Logging Framework, which enables the capture of application and event data from mobile native applications.
Note: Tealeaf Logging Frameworks are components of the CX Mobile module. Additional configuration and deployment is required.
In the above circumstances, the data may not be automatically indexed for search. Using one of the provided methods, you can make this data available for search through indexed data or event values.
Character indexing
Searching for sessions in the Long-Term Canister uses an embedded search engine called dtSearch, which relies on an index of words in captured sessions. When a search is submitted, dtSearch scans these indexes to quickly find sessions that match the search criteria.
For the dtSearch engine, the base unit of indexing is a word. For example, to search for the word apple, you must enter the full word apple in Portal search. To match part of a word, use the asterisk wildcard (*). To find words that begin with ap, enter ap*.
To index words, dtSearch must understand which characters constitute words, which constitute punctuation such as whitespace, and which constitute noise. dtSearch indexes only words composed of characters that it does not consider to be noise.
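The difference between whole-word and wildcard matching can be illustrated with a minimal Python sketch. This is not how dtSearch is implemented; the `build_index` and `search` helpers and the sample session IDs are hypothetical, and words are split on whitespace only for simplicity.

```python
import re
from collections import defaultdict

def build_index(sessions):
    """Map each lowercase word to the set of session IDs that contain it."""
    index = defaultdict(set)
    for session_id, text in sessions.items():
        for word in text.lower().split():
            index[word].add(session_id)
    return index

def search(index, term):
    """Return session IDs with an indexed word matching term.

    A term without * must equal a whole indexed word.
    Each * in the term matches any run of characters.
    """
    pattern = re.compile(re.escape(term.lower()).replace(r"\*", ".*"))
    matches = set()
    for word, session_ids in index.items():
        if pattern.fullmatch(word):
            matches |= session_ids
    return matches

# Hypothetical sample data for illustration only.
sessions = {"s1": "apple pie", "s2": "application form", "s3": "grape"}
index = build_index(sessions)
```

In this sketch, `search(index, "ap")` finds nothing because no indexed word is exactly ap, while `search(index, "ap*")` finds the sessions containing apple and application.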
Alphabet.dat
dtSearch defines what constitutes a word using an alphabet file, which designates the characteristics of each character that can be encountered in session data.
The alphabet file is located at <install_directory>\Alphabet.dat.
The Alphabet.dat file is a text file. For characters that are not displayed in text files, ASCII equivalents may be expressed using a backslash followed by the hexadecimal code for the character. For example, \0d is the code for ASCII character 13, which is the carriage return.
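As a rough illustration of the escape notation, the following Python sketch converts a backslash followed by two hexadecimal digits into the corresponding character. It assumes every escape is exactly two hex digits long; the `decode_escapes` helper is hypothetical and is not part of Tealeaf or dtSearch.

```python
import re

# Assumption: an escape is a backslash followed by exactly two hex digits.
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{2})")

def decode_escapes(text):
    """Replace each backslash-hex escape in text with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

# Hexadecimal 0d is decimal 13, the carriage return.
carriage_return = decode_escapes(r"\0d")
# Hexadecimal 20 is decimal 32, the space character.
with_space = decode_escapes(r"a\20b")
```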
When an index is created, the dtSearch engine stores a private copy of the current alphabet and hyphenation settings in the index_a.ix file in the index folder. These settings are used to index all files added to the index and to evaluate search requests for the index.
Character categories
dtSearch classifies characters into four categories: letter, space, hyphen, and ignore.
These categories are specified in the headings listed below:
Heading | Category | Meaning |
---|---|---|
[Letters] | letter | A searchable character. All of the characters in the alphabet (a-z and A-Z) and all of the digits (0-9) should be classified as letters. The alphabet specifies whether a letter character is lowercase or uppercase, whether or not it has an accent, and the lowercase or unaccented equivalent. |
[Hyphens] | hyphen | Hyphen characters can receive special processing in dtSearch. By default, only the - character is defined as a hyphen. |
[Spaces] | space | A character that causes a word break. For example, if you classify the period as a space character, then dtSearch would process U.S.A. as three separate words: U, S, and A. |
[Ignore] | ignore | A character that is disregarded in processing text. For example, if you classify the period as ignore instead of space, then dtSearch would process U.S.A. as one word: USA. |
The file must be concluded with an ending heading:
[End]
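The effect of the space and ignore categories on word breaks can be sketched in Python as follows. This is an illustrative model only, not dtSearch's algorithm: the `tokenize` helper and `BASE` mapping are hypothetical, unclassified characters are assumed to break words, and hyphens are treated as word breaks because the special hyphen processing is not described here.

```python
import string

# Assumption: letters and digits are 'letter'; '-' is the only 'hyphen'.
BASE = {ch: "letter" for ch in string.ascii_letters + string.digits}
BASE["-"] = "hyphen"

def tokenize(text, categories):
    """Split text into words according to per-character categories.

    'letter' characters extend the current word, 'ignore' characters
    are dropped, and everything else ends the current word.
    """
    words, current = [], []
    for ch in text:
        category = categories.get(ch, "space")  # unclassified: word break
        if category == "letter":
            current.append(ch)
        elif category == "ignore":
            continue
        else:  # 'space', and in this sketch also 'hyphen'
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words

period_as_space = tokenize("U.S.A.", {**BASE, ".": "space"})
period_as_ignore = tokenize("U.S.A.", {**BASE, ".": "ignore"})
```

With the period classified as space, the sketch yields the three words U, S, and A; with the period classified as ignore, it yields the single word USA.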
Replaced characters
During indexing, the following non-alphanumeric characters are replaced with the underscore character (_):
[ ] . $ :
These replacements apply only to field names. When indexing field values or text outside fields, non-alphanumeric characters are treated according to their category in the Alphabet.dat file.
For example, if the request fields and values were the following:
[urlfield]
paymentDetailsVO.title=foo.
ctl01$PlaceHolderMain$ctl00$=bar.
The text indexed for search is the following:
<urlfield>
<paymentDetailsVO_title>foo.</paymentDetailsVO_title>
<ctl01_PlaceHolderMain_ctl00_>bar.</ctl01_PlaceHolderMain_ctl00_>
</urlfield>
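The field-name replacement shown in this example can be approximated with a short Python sketch. The `index_field_name` and `render_section` helpers are hypothetical names used for illustration; the sketch reproduces only the replacement rule described here, not the indexer itself.

```python
import re

# The five characters replaced in field names: [ ] . $ :
_REPLACED = re.compile(r"[\[\].$:]")

def index_field_name(name):
    """Replace each of [ ] . $ : in a field name with an underscore."""
    return _REPLACED.sub("_", name)

def render_section(section, fields):
    """Render (name, value) pairs as the tagged text that gets indexed.

    Only field names are rewritten; values are left unchanged.
    """
    lines = [f"<{section}>"]
    for name, value in fields:
        tag = index_field_name(name)
        lines.append(f"<{tag}>{value}</{tag}>")
    lines.append(f"</{section}>")
    return "\n".join(lines)

indexed = render_section(
    "urlfield",
    [
        ("paymentDetailsVO.title", "foo."),
        ("ctl01$PlaceHolderMain$ctl00$", "bar."),
    ],
)
```

Note that the trailing periods in the values foo. and bar. are preserved, because the replacement applies to field names only.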
Treatment of special characters before submitting a search query
The following characters are stripped from Portal-based search criteria and replaced with spaces before the query is submitted to dtSearch:
\ / : & < > % ! ? . @ # $ ^ ( ) ? { } | ' ~ , * [ ] =
The asterisk character (*) is the exception: it is not stripped from the search criteria, since it is used as a wildcard character.
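A minimal Python sketch of this clean-up step follows. The `clean_criteria` helper is a hypothetical name, and the sketch assumes that each stripped character is replaced by exactly one space and that the asterisk is always preserved; the Portal's actual implementation may differ.

```python
# Characters replaced with spaces; the asterisk is deliberately excluded
# because it is kept as the wildcard character.
STRIPPED = set("\\/:&<>%!?.@#$^(){}|'~,[]=")

def clean_criteria(text):
    """Replace each stripped character with a space, keeping * intact."""
    return "".join(" " if ch in STRIPPED else ch for ch in text)

email_term = clean_criteria("user@example.com")
wildcard_term = clean_criteria("ap*")
```

In this sketch, an email address becomes three separate words, which is consistent with dtSearch treating the word as its base unit of search.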