By default, Tealeaf tracks the URLs of your web application that are reached by your visitors. Depending on how your application is structured, the number of possible URLs generated by visitor behaviors can range from 1 to millions.
Tealeaf can capture each recorded instance of a URL and store it in the Reporting database as a separate record. If your web application generates millions of URLs, the volume of data can become prohibitively large and, if unbounded, can consume all available storage.
Note: You can also use the methods in this section to manage any dimension that generates a large number of values. Some of the content applies only to the URL dimension.
How URLs are tracked
Because the URLs that your visitors request are essential to identifying issues with and behaviors of your web application, Tealeaf captures them automatically if the Tealeaf Reference session agent is deployed in your pipeline.
Note: For the capture of URL, Server, Host, and Application information, the Tealeaf Reference session agent must be included in any Windows™ pipeline that processes hits for search and reporting purposes.
Available URL values
When the hit is passed through the Event Engine, the value that is contained in the TLT_URL request parameter is detected by the provided hit attribute URL (Normalized). This hit attribute checks the first value on the page or hit for the presence of the TLT_URL= request variable and returns the value after the equals sign.
The (Normalized) suffix is added to indicate that it is sourced from the output of the Tealeaf Reference session agent, which normalizes the values and inserts them into the request variable.
The value that is captured to the hit attribute can then be added as a new value in the URL (Normalized) dimension.
Depending on how the dimension is configured, this value may be recorded in the database. The following sections describe how to populate the URL (Normalized) dimension from the TLT_URL value that is inserted into the request.
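The lookup that the URL (Normalized) hit attribute performs can be pictured with a short Python sketch. This is an illustration only, not how the Event Engine is implemented: the function name is hypothetical, and the request is assumed to be available as plain text with one name=value pair per line, as in the request examples later on this page.

```python
from typing import Optional

MARKER = "TLT_URL="  # request variable named in the text above


def url_normalized_hit_attribute(request_text: str) -> Optional[str]:
    """Return the value after the first 'TLT_URL=' found, or None.

    Hypothetical stand-in for the provided hit attribute; it only mimics
    the "first match, value after the equals sign" behavior described above.
    """
    for line in request_text.splitlines():
        line = line.strip()
        if line.startswith(MARKER):
            return line[len(MARKER):]
    return None


# Sample request fragment, copied from the [appdata] example on this page.
sample = """[appdata]
TLT_URL=/news/news-releases/2011/tealeaf-acquires-overstat.php
TLT_SERVER=192.168.100.76
"""

print(url_normalized_hit_attribute(sample))
```

If the Tealeaf Reference session agent has not inserted TLT_URL into the request, the sketch returns None, which corresponds to the hit attribute finding no value to pass to the dimension.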
Raw URL
When a visitor to your web application requests a page, the raw request that is submitted to the web server is similar to the following:
[RawRequest]
GET /news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true HTTP/1.1
If-Modified-Since: Mon, 21 Sep 2009 17:44:54 GMT
If-None-Match: "2f50428-654-4741a0a159180"
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko Minefield/3.7a1pre
Host: www.tealeaf.com
Cache-Control: no-cache
TLTUID=DB7A473CFFAD10FF0246A7393024FD9A;
TLTHID=F0F495041470101401A1F9982C669B36;
TLTSID=A4F96798140610140066B956324A693B;
Captured URLs
In the example above, the value next to GET contains the URL information for the page that is requested from the domain that is defined by the Host value. The web server managing www.tealeaf.com returns the requested page as the response.
When this hit is later passed to Tealeaf, the CX Passive Capture Application scans the raw request to locate the URL value and stores the URL in the URL variable in the [env] section:
[env]
...
URL=/news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true
Query parameters are also written to the [urlfield] section of the request as name-value pairs.
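As a rough illustration of how a captured URL breaks down into name-value pairs, the following Python sketch splits the query string of the example URL with the standard library. The function name is hypothetical, and the exact layout that Tealeaf uses inside the [urlfield] section is not shown here; only the name-value pairing is illustrated.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl


def url_fields(captured_url: str) -> List[Tuple[str, str]]:
    """Split the query string of a captured URL into (name, value) pairs."""
    # Everything after the first '?' is the query string; may be empty.
    _, _, query = captured_url.partition("?")
    # keep_blank_values=True keeps parameters that have an empty value.
    return parse_qsl(query, keep_blank_values=True)


url = "/news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true"
for name, value in url_fields(url):
    print(f"{name}={value}")
```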
Normalized URLs
When the hit is passed through the Canister, a pipeline session agent normalizes this value and three other values and writes them into the [appdata] section:
[appdata]
TLT_URL=/news/news-releases/2011/tealeaf-acquires-overstat.php
TLT_SERVER=192.168.100.76
TLT_HOST_NAME=www.tealeaf.com
TLT_APPLICATION_NAME=news
This normalization and insertion is performed by the Tealeaf Reference session agent, which is required for proper parsing of the URL, Host, App, and Server dimensions.
Normalization includes removing the query parameters from the URL string and converting the string to all lowercase.
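The effect of normalization on the example URL can be approximated with the Python sketch below. It is not the Tealeaf Reference session agent's code, and the function name is hypothetical. The first two steps follow the description above. The third step, collapsing repeated slashes, is an assumption inferred only from the difference between the captured URL and the TLT_URL value in the examples on this page.

```python
import re


def normalize_url(captured_url: str) -> str:
    """Approximate the normalization described above (illustrative only)."""
    # 1. Remove the query parameters: keep everything before the first '?'.
    path, _, _ = captured_url.partition("?")
    # 2. Convert to all lowercase.
    path = path.lower()
    # 3. ASSUMPTION: collapse runs of '/' into one, as the sample values suggest.
    return re.sub(r"/{2,}", "/", path)


raw = "/news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true"
print(normalize_url(raw))
```

Because every combination of query parameters and letter case collapses to a single value, normalization on its own already reduces the number of distinct URL values that can reach the dimension.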
Note: Do not attempt to create hit attributes to scan for this value and capture those values to a dimension. Use the Tealeaf Reference session agent, which performs extra functions.
URLs for cxOverstat
cxOverstat enables the capture of usability data from the visited pages of each visitor session that is monitored by Tealeaf.
cxOverstat includes a dimension for tracking the URLs from which usability data was extracted. The ScreenView - URL dimension included with cxOverstat may be vulnerable to data explosion. Use the same dimension value logging and population strategies that are used to populate the URL (Normalized) dimension to manage the growth of the cxOverstat URL dimension.
Note: Failing to track a URL value in the ScreenView - URL dimension breaks the drill-down access to the cxOverstat reports available on the URL through Browser Based Replay.
Storage of dimension values
When values are detected, they may be stored based on the following configuration options, which operate independently of each other. These options are selectable in the dimension definition.
Log dimension values
If logging for the dimension is enabled, all observed values for the dimension are stored in the database. Each instance of each value is counted when detected.
Values stored in the dimension logs are removed after two weeks, so the logs can be kept to a manageable size.
You can use these logs to build whitelists, blacklists, and group lists while limiting data growth.
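The logging behavior, in which every instance is counted and entries age out after two weeks, can be modeled with a small Python sketch. The in-memory list of (timestamp, value) pairs and the function name are hypothetical stand-ins for the dimension logs, whose actual storage format is not described here.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple

RETENTION = timedelta(weeks=2)  # logged values are removed after two weeks


def count_logged_values(
    entries: Iterable[Tuple[datetime, str]], now: datetime
) -> Counter:
    """Count each instance of each value still inside the retention window."""
    cutoff = now - RETENTION
    return Counter(value for seen_at, value in entries if seen_at >= cutoff)


# Made-up log entries for illustration.
now = datetime(2024, 1, 31)
entries = [
    (datetime(2024, 1, 30), "/home"),
    (datetime(2024, 1, 29), "/home"),
    (datetime(2024, 1, 20), "/cart"),
    (datetime(2024, 1, 1), "/old-page"),  # older than two weeks, so dropped
]

counts = count_logged_values(entries, now)
print(counts.most_common())
```

Sorting the counts from most to least frequent gives a natural starting point for choosing which values belong in a whitelist.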
Whitelists versus observed values
For the Values to Record setting, you can choose one of the following options:
Whitelist Only - When this option is selected, only the detected values that match the whitelist that you defined and uploaded are recorded for the dimension. All other values are recorded with an [others] value or, if the maximum number of values per hour (Max Values Per Hour) was reached, with the [limit] value.
For the URL (Normalized) dimension, the goal is to build a representative whitelist of the URLs of the web application. The remainder of this page works towards building a useful whitelist for this dimension.
Whitelist + Observed Values - When this option is selected, values that match whitelisted values are recorded, as well as any other value detected in the capture stream. The net effect is that each URL value for each hit is captured by the URL (Normalized) hit attribute, which then populates the URL (Normalized) dimension. Each of these values is then recorded.
Note: For high-volume dimensions, the Whitelist + Observed Values setting can generate unbounded growth of the dimension tables in the database. This setting can be dangerous to the overall health of the Tealeaf system, as these observed values may not be purged for some time. Avoid this setting, particularly for high-volume dimensions.
Note: The maximum number of values that can be contained in a whitelist is 50,000. For a high-volume dimension, you may need to make some decisions about the sample of URLs of your web application that you want to track. Tealeaf provides some guidance, as discussed later.
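The recording rules and the whitelist size limit can be summarized in a Python sketch. All function names, mode strings, and the hour_limit_reached flag are hypothetical; they are not Tealeaf settings or APIs. The sketch assumes that value counts are already available, for example from the dimension logs, and it applies the [limit] value only in the Whitelist Only case, because that is the only case described above.

```python
from collections import Counter
from typing import List, Set

MAX_WHITELIST_VALUES = 50_000  # maximum whitelist size stated above
OTHERS = "[others]"
LIMIT = "[limit]"


def build_whitelist(counts: Counter, max_values: int = MAX_WHITELIST_VALUES) -> List[str]:
    """Keep the most frequently observed values, up to the whitelist limit."""
    return [value for value, _ in counts.most_common(max_values)]


def recorded_value(
    value: str, whitelist: Set[str], mode: str, hour_limit_reached: bool
) -> str:
    """Return what would be recorded for a detected value (illustrative)."""
    if value in whitelist:
        return value  # whitelisted values are recorded in both modes
    if mode == "whitelist_plus_observed":
        return value  # every observed value is recorded as-is
    if mode != "whitelist_only":
        raise ValueError(f"unknown mode: {mode}")
    # Whitelist Only: non-matching values become [others], or [limit]
    # once Max Values Per Hour has been reached.
    return LIMIT if hour_limit_reached else OTHERS


# Made-up counts; a cap of 2 stands in for the real limit of 50,000.
counts = Counter({"/home": 5, "/cart": 3, "/rare-page": 1})
whitelist = set(build_whitelist(counts, max_values=2))

print(recorded_value("/home", whitelist, "whitelist_only", False))
print(recorded_value("/rare-page", whitelist, "whitelist_only", False))
print(recorded_value("/rare-page", whitelist, "whitelist_only", True))
print(recorded_value("/rare-page", whitelist, "whitelist_plus_observed", False))
```

Ranking by frequency is only one possible way to choose the sample of URLs to track; the right selection depends on which pages matter most for your reporting.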