The topics below show how to populate the URL dimension immediately after Tealeaf begins receiving data from your web application.
Configure for URL (Normalized)
To review the initial configuration for the URL (Normalized) dimension, complete the following steps.
When Tealeaf is first installed or upgraded, the URL (Normalized) dimension is already configured. Professional Services might perform the initial configuration and population of the URL (Normalized) dimension for your Tealeaf solution.
Dimensions and other event-related objects such as events, hit attributes, session attributes, and alerts are configured through the Event Manager.
The following configuration is the recommended approach to limit database growth. If observed values are not captured, then values for the dimension are not recorded until the whitelist is populated.
As an alternative, you can enable the capture of observed values. If capture of observed values is enabled, it is important to disable it after the whitelist has been initially populated.
For dimensions whose values are constantly changing, you might need a different approach, in which you are capturing whitelist + observed values and trimming your whitelist regularly. This approach requires more regular management of the population of your whitelist. The remainder of the section provides an overview of the approach.
Note: Keep in mind that when observed values are enabled for capture, growth of the table containing the dimension values is not checked. Purging may be required from time to time.
- Log in to the Portal as an administrator.
- From the Portal menu, select Configure > Event Manager.
- Click the Dimensions tab.
- In the Report Group panel on the left, select the URL/Host/App/Server report group. The dimensions of the report group are displayed in the dimension list.
- Right-click the URL (Normalized) dimension and select Edit Dimension. The dimension definition is displayed for editing and review.
Table 1. Initial Configuration for URL (Normalized)
Setting | Default value | Description |
---|---|---|
Values to Record | Whitelist Only | This setting defines the values that are captured to this dimension. The default setting for the URL (Normalized) dimension captures values to the dimension from the whitelist, which contains no values until you add them. For other dimensions that you create or that are provided by Tealeaf, the default configuration sets this option to Whitelist + Observed Values. In that mode, in addition to the values from any configured whitelist, the dimension is populated by all values that are detected in the capture stream. Immediately after saving the configuration, any detected values are recorded to the dimension. Note: All observed values are written into the database as new entries for the dimension value. For high-volume dimensions such as URL (Normalized), Tealeaf recommends setting this option to Whitelist Only. Make the following configuration changes. |
Max Values Per Hour | 1000 | This setting defines the maximum number of unique values that can be captured for this dimension in a given hour. Each hour, the counter resets to zero, and the next 1000 detected values are available for capture or logging. Note: Depending on the number of normalized URLs that can be generated by visitor activities on your web application, this setting may be too low, and you may begin seeing dimension values that are recorded as the [limit] value. You may want to change this immediately. Note: This setting does not apply if the dimension is configured to use a whitelist only. For these dimensions, you should monitor the [others] value, which identifies values that are not included in the whitelist. For dimensions using whitelist + observed values, you may still reach the [limit] value, in which case your whitelist should be updated. |
Logging | OFF | When logging is ON, all observed values are written into database logs, even if they are on a whitelist. This feature is provided for the creation of whitelists, which is very important to do for the URL (Normalized) dimension. Note: Enable the logging of the URL (Normalized) dimension. |
Other configuration | | Dimension values do not appear and cannot be logged or recorded as observed values until the dimension is associated with a non-default report group. This limitation does not apply to the URL (Normalized) dimension, which is part of the URL/Host/App/Server report group. For other high-volume dimensions that you create, you must add them to a report group. |
Download logged values
After logging is enabled, all raw URLs in the capture stream are detected by the URL (Normalized) hit attribute and logged in the database table that is associated with the URL (Normalized) dimension.
Before you begin, you may want to create a directory structure. The following directories are recommended for storing your dimension log files:
- source: raw downloaded files from the Portal are stored here.
- whitelist: the edited whitelist that you uploaded to the Portal.
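If you prefer to script the setup, the two recommended directories can be created as in the following minimal sketch. The function name `ensure_dimension_dirs` and the base location are hypothetical; Tealeaf does not require any particular path.

```python
from pathlib import Path


def ensure_dimension_dirs(base_dir):
    """Create the 'source' and 'whitelist' directories under base_dir.

    base_dir is a working location of your choosing (an assumption of
    this sketch, not a Tealeaf requirement).
    """
    base = Path(base_dir)
    created = {}
    for name in ("source", "whitelist"):
        path = base / name
        # exist_ok=True makes the call safe to repeat on later updates.
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

The returned dictionary maps each directory name to its path, so later steps can save the raw download under `created["source"]` and the edited copy under `created["whitelist"]`.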
You should allow sufficient time to pass to create a representative sample of URLs in the dimension log. Depending on the volume of your site and any schedule-based variances in visitor activities, the length of this time period may vary. Typically, a day should gather a representative sample.
The maximum number of values that can be downloaded from the logs is 250,000.
After the time period has passed, download the logged values to a file and review them locally.
Note: The downloaded file contains all values that are currently stored in the database. The log values are not trimmed based on downloading values or on the presence of values in any list. It contains all values that were detected for the dimension over the preceding two weeks while logging was enabled.
Do the following:
- Edit the URL (Normalized) dimension.
- In the Edit Dimension dialog, click the Advanced Options caret.
- Click Edit Whitelist.
- Click Download Log Values. Save the file locally to your source directory.
- Create a copy of the file and save it in your whitelist directory. Edit this new file.
Tips for populating the URL dimension:
- In the text file, the logged values are displayed in column 1, and the count of occurrences of each value is displayed in column 2.
- If you are using a tabular editor such as Excel, you should sort by column 2 in descending order, which displays the detected values by number of occurrences. You can then determine a cutoff threshold for occurrences, above which values are included. For example, you may decide that any URL can be included in the whitelist if it was visited at least two times.
- The maximum number of values that can be contained in a whitelist is 50,000. For a high-volume dimension, you may need to make some decisions about the sample of URLs of your web application that you want to track.
- You may not want to populate all 50,000 values in the initial pass. Adding any additional values to a maxed whitelist requires removing an individual value, which must be performed through the Edit Whitelist dialog.
- Delete all rows that are not to be included in the whitelist.
- Delete column 2 that contains the number of occurrences of the whitelisted values.
- Save the file into the whitelist directory.
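The editing tips above can also be applied with a short script instead of a spreadsheet. The following is a sketch only: it assumes the downloaded file is tab-delimited text with the value in column 1 and the count in column 2 (the delimiter is an assumption; adjust it to match your download), and the function names are hypothetical.

```python
import csv

MAX_WHITELIST_VALUES = 50_000  # whitelist cap stated in the tips above


def read_log_values(lines, delimiter="\t"):
    """Yield (value, count) pairs from the lines of a downloaded log file.

    Assumes the value is in column 1 and its occurrence count is in
    column 2. Rows without a numeric count (blank lines, a header) are
    skipped.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        yield row[0], int(row[1])


def build_whitelist(pairs, min_count=2, max_values=MAX_WHITELIST_VALUES):
    """Return values seen at least min_count times, most frequent first.

    The result contains values only (the count column is dropped) and is
    truncated to max_values entries.
    """
    kept = [(value, count) for value, count in pairs if count >= min_count]
    # Sort by count descending; ties are ordered alphabetically.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [value for value, _ in kept[:max_values]]
```

To use it, pass an open text file (or any iterable of lines) from your source directory to `read_log_values`, then write the list returned by `build_whitelist` one value per line into your whitelist directory. Lowering `max_values` below 50,000 leaves room for later additions.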
Note: In the future, when you update your whitelist, you can use the saved versions of previous updates, such as this initial version, to compare occurrences in the whitelist and to determine the values that are newly detected.
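A comparison of a fresh download against a saved whitelist version could look like the following sketch. It assumes the log has already been parsed into (value, count) pairs and the saved whitelist into a list of values; both function names are hypothetical.

```python
def newly_detected(log_pairs, previous_whitelist, top_n=20):
    """Return up to top_n (value, count) pairs that are in the log but
    not on the previous whitelist, highest counts first."""
    known = set(previous_whitelist)
    missing = [(value, count) for value, count in log_pairs if value not in known]
    missing.sort(key=lambda pair: (-pair[1], pair[0]))
    return missing[:top_n]


def no_longer_seen(log_pairs, previous_whitelist):
    """Return whitelisted values that do not appear in the log at all."""
    seen = {value for value, _ in log_pairs}
    return sorted(value for value in previous_whitelist if value not in seen)
```

The first function points at candidates to add; the second points at whitelist entries that might be removed to make room, although a value can be absent from the log simply because it was not visited during the logging period.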
Upload the whitelist
After you edit the whitelist, upload the file as the whitelist for your dimension.
- Edit the URL (Normalized) dimension.
- In the Edit Dimension dialog, click the Advanced Options caret.
- Click Edit Whitelist.
- Click Import File. Select the file saved in your whitelist directory. The new values are added to the whitelist.
- If you want to track any values as Top Movers, click the Track Top Movers check box next to the value.
To track all displayed values, click Select All.
- To save your whitelist, click Done.
Note: Although you created your whitelist for the dimension, you might want to keep logging enabled for some time, as updates may be required.
- To save changes to the dimension, click Save Draft.
- To apply the whitelist to the dimension, save changes to the server. Click Save Changes.
All subsequent values that are detected for the URL (Normalized) dimension are processed based on the whitelist. If the Values to Record setting is configured as Whitelist Only, only URL values that are on the whitelist are permitted.
- If you are satisfied with the values on your dimension whitelist, disable logging.
Note: Logging is automatically disabled after a period. If the count of [others] begins to climb, you must re-enable logging to capture observed values before you can update your dimension whitelist.
Create a Dimension Value Tracking report
To track how accurately your whitelist maps to the observed values, track the occurrences of the values in a report.
The following dimension constants are important to review:
- [others]: By default, a dimension is configured to insert the default value "[others]" if a detected value is not on the whitelist in Whitelist Only mode. The appearance of a high number of instances of [others] indicates that the whitelist may need new values added to it.
- [limit]: If the number of values that are detected in an hour exceeds the defined maximum number of values per hour for the dimension, the [limit] dimension constant is inserted for the dimension value. This capping prevents runaway growth of dimension values in Whitelist + Observed Values mode.
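The behavior of the two constants can be illustrated with a simplified model. This is a sketch for reasoning about the settings, not Tealeaf's implementation: the class `DimensionRecorder` is hypothetical, and it assumes that whitelisted values are always recorded as-is and that only non-whitelisted observed values count against the hourly maximum.

```python
OTHERS = "[others]"
LIMIT = "[limit]"


class DimensionRecorder:
    """Simplified, hypothetical model of how a detected value is recorded."""

    def __init__(self, whitelist, whitelist_only=True, max_values_per_hour=1000):
        self.whitelist = set(whitelist)
        self.whitelist_only = whitelist_only
        self.max_values_per_hour = max_values_per_hour
        self.observed_this_hour = set()

    def start_new_hour(self):
        """Reset the per-hour counter of unique observed values."""
        self.observed_this_hour.clear()

    def record(self, value):
        """Return the value that would be stored for a detected value."""
        if value in self.whitelist:
            return value
        if self.whitelist_only:
            # Whitelist Only: anything off the whitelist becomes [others].
            return OTHERS
        # Whitelist + Observed Values: new unique values are accepted
        # until the hourly maximum is reached (assumption of this model).
        if value in self.observed_this_hour:
            return value
        if len(self.observed_this_hour) >= self.max_values_per_hour:
            return LIMIT
        self.observed_this_hour.add(value)
        return value
```

Under this model, a rising [others] count signals gaps in the whitelist, while a rising [limit] count signals that more unique values arrive per hour than the configured maximum allows.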
Performance reports are sorted by occurrence. The values for [others] and [limit] may not be displayed if an insufficient number of these values appear in the specified report.
If you create a whitelist for a dimension that is not a URL, the best approach to tracking the occurrences of these constants is to create a report in the Report Builder, applying the URL (Normalized) dimension to an event that fires on every hit.
Note: The use of a dimension that occurs on every hit with an event that occurs on every hit in a report causes a significant jump in the data that is saved in the reporting database. Use this approach for a limited duration. This duration should be no longer than the duration used to initially populate the dimension logs.
When you are satisfied with your dimension list, you should remove the dimension from the report.
Add a dimension to the event
Select an event that occurs on every hit (for example, Hit Count) or that occurs frequently enough (for example, Status Code 200) to provide a representative sample of hits passing through the pipeline.
- In the Portal, select Configure > Event Manager.
- In the Event Manager, click the Events tab.
- In the Event List, right-click the event and select Edit Event.
- In the Event Wizard, click the Report Group tab.
- In the left panel, click the Report Groups pane.
- In the Report Groups pane, click URL/Host/App/Server.
The URL/Host/App/Server report group, of which the URL (Normalized) dimension is a member, is now associated with the event.
- Click Save Draft.
- To post changes to the server, click Save Changes.
As soon as the event is saved, subsequent occurrences of the event are recorded with any applicable data for the dimensions in the report group. You should immediately create the report, as described in the next topic.
Create a report
After you associate the dimension with an event, you can create a report to track it.
- In the Portal, select Analyze > Report Builder.
- In the toolbar, click the New icon.
- In the left navigation panel, click Add Event.
- Select the event in the Event Selector. Click Select.
The event is added to the report.
- In the left panel, click the Dimensions tab.
All dimensions that are associated with the event are displayed for selection.
- Click and drag the URL (Normalized) dimension to the Add X-Axis box. The event data is now filtered by the URL (Normalized) dimension.
- In the detail table, look for the [others] and [limit] entries.
Review the report
From time to time, you should monitor the dimension constants in the report that you are using.
Constant | Values | Description |
---|---|---|
[others] | Number of values > 1% of total values | In a large enough sample size, if more than 1% of the values are recorded as [others], some useful data is not appearing in the whitelist. You should download the log values again and compare the downloaded values to the values that you stored in your whitelist directory to determine the missing values with the highest counts. |
[others] | Consistent counts of 0 | Your whitelist is tracking all recorded values. Although it is possible that your whitelist may contain values that are not detected in the capture stream, you should not revise it unless necessary. |
[limit] | Number of values > 3% of (Max Values Per Hour * 24) | If the number of values that are stored as [limit] is greater than 3% of the total possible values that are recorded in the dimension for a day, then the maximum number of dimension values that can be recorded per hour may be configured to be too low. If possible, you should raise the value so that more meaningful values can be captured. |
[limit] | Consistent counts of 0 | The maximum number of values that are captured per hour is not being exceeded, which means that you allocated sufficient storage space to track the dimension data on an hourly basis. |
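The two thresholds in the table reduce to simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that the [limit] count you pass in covers one day of data and that you read the counts from the report yourself.

```python
def others_needs_review(others_count, total_count, threshold_percent=1):
    """True when [others] is more than threshold_percent of all values."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return others_count * 100 > threshold_percent * total_count


def limit_needs_review(limit_count, max_values_per_hour, threshold_percent=3):
    """True when the daily [limit] count is more than threshold_percent
    of (Max Values Per Hour * 24)."""
    daily_capacity = max_values_per_hour * 24
    return limit_count * 100 > threshold_percent * daily_capacity
```

With the default Max Values Per Hour of 1000, the daily capacity is 24,000 values, so the [limit] threshold works out to 720 occurrences per day.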
Note: Changing one of the limits can affect the other. For example, if you raise the Max Values Per Hour setting, which governs when [limit] is recorded, then Tealeaf may begin detecting less frequently visited URLs, which may not appear in the whitelist. These values are recorded as [others], which may increase the count of that metric. If you expand your whitelist, the [others] count may go down, but you also raise the potential number of detected values, which increases the chances of triggering limit conditions.
Clean up
After you stabilize your URL (Normalized) whitelist, complete the following tasks.
- Consider disabling logging for the dimension.
  - Pro: Reduces the overall data footprint of the URL (Normalized) dimension, which can be significant.
  - Con: If you see a significant uptick in the [others] entries, you cannot immediately explore the cause of them through the dimension logs. You must re-enable dimension logging and then perform your comparison.
- If you did not do so already, switch the URL (Normalized) dimension to record in Whitelist Only mode. Note: For any high-volume dimension, you should disable capture of observed values after the whitelist for the dimension has stabilized.
- If you created a custom report using the URL (Normalized) dimension for purposes of tracking the dimension constants, you should remove the dimension from the report or delete the report, so that unnecessary data is not stored in the database. The dimension can always be added back later.
Maintenance
Over time, the values for URLs and other high-volume dimensions of your web application may change. For example, when a new release of the web application is completed, the published URLs probably need to be revisited.
Note: For significant changes to the web application, you might have to rebuild the whitelist from scratch.
Some tips for managing this process:
- Continue saving each version of the whitelists that you create. Over time, you can track changes that are based on your downloaded and edited lists.
- If possible, try to acquire the new set of URLs for the new release of the application in advance of its release. That way, you can build your whitelist in advance and deploy it as soon as the new application is online.
- Re-enable logging of the dimension before the release date.
- Continue to track [others] and [limit] on a periodic basis and especially before, during, and after the release date. Iteration on the whitelist may be required.
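When you obtain the URL list for an upcoming release in advance, a quick check like the following sketch shows which values are new and whether they fit under the 50,000-value whitelist cap. The function name is hypothetical, and the sketch assumes the release URLs are already in the same normalized form that the dimension records.

```python
MAX_WHITELIST_VALUES = 50_000  # whitelist cap stated earlier in this section


def plan_release_update(current_whitelist, release_urls,
                        max_values=MAX_WHITELIST_VALUES):
    """Return (to_add, overflow) for a planned whitelist update.

    to_add lists release URLs not yet on the whitelist, in input order
    without duplicates. overflow is the number of values that would
    have to be removed to stay within max_values.
    """
    current = set(current_whitelist)
    # dict.fromkeys drops duplicates while keeping the original order.
    to_add = [url for url in dict.fromkeys(release_urls) if url not in current]
    overflow = max(0, len(current) + len(to_add) - max_values)
    return to_add, overflow
```

A nonzero overflow means that existing entries must be removed, or the list of additions trimmed, before the update can be imported in full.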