By default, Experience Analytics tracks the URLs of your web application that are reached by your visitors. Depending on how your application is structured, the number of possible URLs generated by visitor behaviors can range from 1 to millions.
Experience Analytics can capture each recorded instance of a URL and store it in the Reporting database as a separate record. If your web application generates millions of URLs, the volume of data can become prohibitively large and, if unbounded, can consume all available storage.
Note: You can also use the methods in this section to manage any dimension that generates a large number of values. Some of the content applies only to the URL dimension.
How URLs are tracked
Because the URLs that your visitors request are essential for identifying issues and behaviors in your web application, Experience Analytics captures them automatically if the Tealeaf Reference session agent is deployed in your pipeline.
Note: To capture URL, Server, Host, and Application information, the Tealeaf Reference session agent must be included in any Windows™ pipeline that processes hits for search and reporting purposes.
Available URL values
When the hit is passed through the Event Engine, the value that is contained in the TLT_URL request parameter is detected by the provided hit attribute URL (Normalized). This hit attribute checks the page or hit for the first occurrence of the TLT_URL= request variable and returns the value after the equals sign. The (Normalized) suffix indicates that the value is sourced from the output of the Tealeaf Reference session agent, which normalizes the values and inserts them into the request variable.
The value that is captured by the hit attribute can then be added as a new value in the URL (Normalized) dimension.
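As a rough illustration of the lookup described above, the following sketch scans request text for the first TLT_URL= entry and returns whatever follows the equals sign. It is not how the Event Engine is implemented. The function name extract_tlt_url and the assumption that each request variable sits on its own line are hypothetical.

```python
from typing import Optional


def extract_tlt_url(request_text: str) -> Optional[str]:
    """Return the value of the first TLT_URL= entry, or None if absent.

    Assumption: each request variable appears on its own line as NAME=value.
    """
    marker = "TLT_URL="
    for line in request_text.splitlines():
        line = line.strip()
        if line.startswith(marker):
            # Everything after the equals sign is the captured value.
            return line[len(marker):]
    return None


# Sample request fragment, using values from the example in this section.
sample = """[appdata]
TLT_URL=/news/news-releases/2011/tealeaf-acquires-overstat.php
TLT_SERVER=192.168.100.76
"""
captured = extract_tlt_url(sample)
```

A hit that lacks the TLT_URL= variable yields no value in this sketch, which mirrors why the Tealeaf Reference session agent must run before the value can be captured.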
Depending on how the dimension is configured, this value may be recorded in the database. The following sections describe how to populate the URL (Normalized) dimension from the TLT_URL value that is inserted into the request.
When a visitor to your web application requests a page, the raw request that is submitted to the web server is similar to the following:
    [RawRequest]
    GET /news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true HTTP/1.1
    If-Modified-Since: Mon, 21 Sep 2009 17:44:54 GMT
    If-None-Match: "2f50428-654-4741a0a159180"
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko Minefield/3.7a1pre
    Host: www.tealeaf.com
    Cache-Control: no-cache
    TLTUID=DB7A473CFFAD10FF0246A7393024FD9A; TLTHID=F0F495041470101401A1F9982C669B36; TLTSID=A4F96798140610140066B956324A693B;
In the request above, the value next to GET contains the URL information for the page that is requested from the domain that is defined for the Host value. The web server managing www.tealeaf.com returns the requested page as the response.
When this hit is later passed to Experience Analytics, the CX Passive Capture Application scans the raw request to locate the URL value and stores it in the URL variable in the [env] section of the request:
    [env]
    ...
    URL=/news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true
Query parameters are also written to the [urlfield] section of the request as name-value pairs.
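The split between the URL and its query parameters can be sketched as follows. This is only an illustration of the idea, not the CX Passive Capture Application's code. The function name split_request_line is hypothetical, and the sketch assumes a well-formed request line of the form "METHOD target VERSION".

```python
from typing import List, Tuple
from urllib.parse import parse_qsl


def split_request_line(request_line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return the requested URL and its query parameters as name-value pairs.

    Assumption: the request line looks like 'GET /path?name=value HTTP/1.1'.
    """
    _method, target, _version = request_line.split(" ", 2)
    # Everything after the first '?' is the query string.
    _path, _sep, query = target.partition("?")
    pairs = parse_qsl(query, keep_blank_values=True)
    return target, pairs


url, fields = split_request_line(
    "GET /news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true HTTP/1.1"
)
```

Here url corresponds to the URL value shown for the [env] section, and fields corresponds to the name-value pairs that would be written to the [urlfield] section.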
When the hit is passed through the Canister, a pipeline session agent normalizes this value and three other values and writes them into the [appdata] section of the request:
    [appdata]
    TLT_URL=/news/news-releases/2011/tealeaf-acquires-overstat.php
    TLT_SERVER=192.168.100.76
    TLT_HOST_NAME=www.tealeaf.com
    TLT_APPLICATION_NAME=news
This normalization and insertion is performed by the Tealeaf Reference session agent, which is required for proper parsing of the URL, Host, App, and Server dimensions.
Normalization includes removing query parameters from the URL string and converting the string to lowercase. In the example above, the doubled slash in the path is also collapsed to a single slash.
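The normalization described above can be approximated with a short sketch. This is not the Tealeaf Reference session agent's implementation, which performs extra functions. The function name normalize_url is hypothetical, and the collapsing of repeated slashes is inferred from the example values in this section rather than from a stated rule.

```python
import re


def normalize_url(raw_url: str) -> str:
    """Approximate the documented normalization of a requested URL."""
    # Remove query parameters: keep only the part before the first '?'.
    path = raw_url.split("?", 1)[0]
    # Assumption inferred from the example: collapse repeated slashes.
    path = re.sub(r"/{2,}", "/", path)
    # Convert to all lowercase.
    return path.lower()


normalized = normalize_url(
    "/news/news-releases/2011//Tealeaf-Acquires-Overstat.php?NewVisitor=true"
)
```

Applied to the URL from the example request, the sketch produces the same string as the TLT_URL value shown in the example. Stripping query parameters is what keeps one page from turning into many distinct dimension values.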
Note: Do not attempt to create hit attributes to scan for this value and capture those values to a dimension. Use the Tealeaf Reference session agent, which performs extra functions.
URLs for cxOverstat
cxOverstat enables the capture of usability data from the visited pages of each visitor session that is monitored by Experience Analytics.
cxOverstat includes a dimension for tracking the URLs from which usability data was extracted. The ScreenView - URL dimension included with cxOverstat may be vulnerable to data explosion. Use the same dimension value logging and population strategies that are used to populate the URL (Normalized) dimension to manage the growth of the cxOverstat URL dimension.
Note: Failing to track a URL value in the ScreenView - URL dimension breaks the drill-down access to the cxOverstat reports available on the URL through Browser Based Replay.
Storage of dimension values
When values are detected, they may be stored based on the following configuration options, which operate independently of each other. These options are selectable in the dimension definition.
Log dimension values
If logging for the dimension is enabled, all observed values for the dimension are stored in the database. Each instance of each value is counted when detected.
Values stored in the dimension logs are removed after two weeks, so the logs can be kept to a manageable size.
You can use these logs to build whitelists, blacklists, and group lists while limiting data growth.
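One way to turn logged values into a candidate whitelist is to rank them by how often they were observed and keep the most frequent ones. The sketch below assumes the logged values are available as a plain sequence of strings, one entry per detected instance. The export format and the function name candidate_whitelist are hypothetical and not part of the product.

```python
from collections import Counter
from typing import Iterable, List


def candidate_whitelist(logged_values: Iterable[str], max_size: int) -> List[str]:
    """Return up to max_size values, most frequently observed first."""
    counts = Counter(logged_values)
    return [value for value, _count in counts.most_common(max_size)]


# Hypothetical log extract: one entry per detected instance of a value.
logged = ["/home", "/cart", "/home", "/search", "/home", "/cart"]
top_two = candidate_whitelist(logged, 2)
```

Ranking by frequency is only one possible selection rule. Pages that are rare but business-critical may still deserve a place on the list.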
Whitelists versus observed values
In the Values to Record setting, you can choose one of the following options:
Whitelist Only
- When this option is selected, only detected values that match the whitelist that you defined and uploaded are recorded for the dimension. All other values are recorded with an [others] value or, if the maximum number of values per hour (Max Values Per Hour) was reached, with a value indicating that the limit was reached. For the URL (Normalized) dimension, the goal is to build a representative whitelist of the URLs of the web application. The remainder of this page works toward building a useful whitelist for this dimension.
Whitelist + Observed Values
- When this option is selected, values that match whitelisted values are recorded, as well as any other value detected in the capture stream. The net effect is that each URL value for each hit is captured by the URL (Normalized) hit attribute, which then populates the URL (Normalized) dimension. Each of these values is then recorded.
Note: For high-volume dimensions, the Whitelist + Observed Values setting can cause unbounded growth of the dimension tables in the database. This setting can endanger the overall health of the Experience Analytics system, because the observed values may not be purged for some time. Avoid this setting, particularly for high-volume dimensions.
Note: The maximum number of values that can be contained in a whitelist is 50,000. For a high-volume dimension, you may need to make some decisions about the sample of URLs of your web application that you want to track. Experience Analytics provides some guidance, as discussed later.
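The difference between the two Values to Record options can be summarized in a small sketch. It is a simplified model for illustration only. The mode strings and the function names load_whitelist and value_to_record are hypothetical, and the handling of Max Values Per Hour is deliberately left out.

```python
from typing import Iterable, Set

OTHERS = "[others]"           # value recorded for non-whitelisted entries
MAX_WHITELIST_SIZE = 50_000   # documented maximum number of whitelist values


def load_whitelist(values: Iterable[str]) -> Set[str]:
    """Build a whitelist, rejecting lists larger than the documented maximum."""
    unique = set(values)
    if len(unique) > MAX_WHITELIST_SIZE:
        raise ValueError("whitelist exceeds 50,000 values")
    return unique


def value_to_record(value: str, whitelist: Set[str], mode: str) -> str:
    """Return what would be recorded for a detected value under each option."""
    if value in whitelist:
        return value
    if mode == "whitelist_only":
        return OTHERS          # non-matching values collapse into one bucket
    if mode == "whitelist_plus_observed":
        return value           # every distinct value is recorded as-is
    raise ValueError(f"unknown mode: {mode}")


whitelist = load_whitelist(["/home", "/cart"])
```

Under the whitelist-only option, the number of distinct recorded values can never exceed the whitelist size plus the catch-all values. Under the combined option, it grows with every new URL that visitors generate.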