You can use the following configuration methods to reduce the volume of data that is captured into dimensions.
Normalize the values of all high-volume dimensions by using one of the following methods:
- Use a session agent, such as the Tealeaf Reference session agent, to normalize values in the Windows™ pipeline.
Note: The Tealeaf Reference session agent is deployed in the default Windows pipeline, where it normalizes the raw values for the URL, Host, App, and Server dimensions. Values that are captured for these dimensions are therefore normalized by default.
- Create a whitelist that normalizes values to data that is useful for reporting.
- To build a whitelist for a dimension, enable logging so that its observed values are captured to the database. If you do not enable logging, you must enter the whitelist values manually.
Note: Keep a list of the dimensions for which you have enabled logging and the capture of observed values.
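The following Python sketch illustrates the idea behind normalizing raw values and then restricting them to a whitelist, so that many distinct raw values collapse into a few reportable ones. It is not Tealeaf configuration: session agents and whitelists are configured in the product itself. The whitelist contents, the URL normalization rules, and the `Other` label for non-whitelisted values are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of normalized values that are worth reporting on.
WHITELIST = {"/home", "/cart", "/checkout"}

# Hypothetical label for any value that is not in the whitelist.
OTHER = "Other"


def normalize_url(raw: str) -> str:
    """Reduce a raw URL to a lowercase path with no query string or trailing slash."""
    path = urlsplit(raw).path.lower().rstrip("/")
    return path or "/"


def to_dimension_value(raw: str) -> str:
    """Normalize a raw value, then keep it only if it appears in the whitelist."""
    value = normalize_url(raw)
    return value if value in WHITELIST else OTHER


if __name__ == "__main__":
    raw_values = [
        "https://shop.example.com/Cart/?id=1",
        "https://shop.example.com/cart?id=2",
        "/search?q=shoes",
    ]
    # Three distinct raw values reduce to two dimension values.
    print(sorted({to_dimension_value(v) for v in raw_values}))
```

In this sketch, normalization runs before the whitelist check so that trivially different raw values (case, query strings, trailing slashes) match the same whitelist entry.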
Depending on how you define your dimensions and manipulate captured data, you can significantly reduce the volume of data that is captured for individual dimensions.
Avoid Whitelist + Observed Values dimensions
Data growth issues are most prevalent in dimensions that are configured to record Whitelist + Observed Values. Because the observed values are renewed each hour on each Canister, the data volume can grow large.
Note: Define dimensions that record Whitelist + Observed Values only where necessary. Data for these dimensions can grow without bound.
If you can define a whitelist that captures all of the dimension values that interest you, switch the dimension to record only from that whitelist.
Note: To build the set of values for a dimension, configure the dimension to record Whitelist Only and enable logging of observed values. You can use these logged values to populate the whitelist; they are automatically purged later.
A Whitelist + Observed Values configuration may be used when populating a dimension that is not dynamic or that is populated by a limited set of dynamic data. For example, you might use observed values to capture a limited set of query strings associated with a small subset of hit or event occurrences.
Avoid creating dimensions that contain a high number of values
Avoid creating dimensions that contain a high number of unique values, as they can unnecessarily clutter the reporting database and your reports.
For example, it may be tempting to create a dimension to track shopping cart values. However, this dimension could contain thousands of different values. When the dimension is applied to a report, the report can become cluttered, and filtering it by Top-N values or by a specific set of values may not produce meaningful results.
Note: To capture data such as price, which could theoretically contain a limitless number of values, use numeric group lists to bucket the dimension values into groupings such as low, medium, and high.
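The bucketing that a numeric group list performs can be sketched in Python as a mapping from an unbounded numeric value to a small, fixed set of labels. This is only an illustration of the idea, not how Tealeaf implements numeric group lists; the boundary values of 50 and 200 are hypothetical.

```python
from bisect import bisect_right

# Hypothetical group boundaries in ascending order. Each boundary is the
# inclusive lower bound of the next group: below 50 is "low", 50 up to
# (but not including) 200 is "medium", and 200 or more is "high".
BOUNDARIES = [50.0, 200.0]
LABELS = ["low", "medium", "high"]


def price_group(price: float) -> str:
    """Map any price to one of a fixed number of group labels."""
    return LABELS[bisect_right(BOUNDARIES, price)]


if __name__ == "__main__":
    for p in (12.99, 50.0, 199.99, 4500.0):
        print(p, price_group(p))
```

However many distinct prices occur, the dimension can hold at most `len(LABELS)` values, which keeps reports readable.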
Download log files and populate whitelists every day
When logging is enabled for a dimension, download the log files every day for a few days. Using Excel, you may be able to compute the frequency of specific values over time and tune the whitelist and the maximum number of captured values accordingly.
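As an alternative to Excel, the same frequency analysis can be sketched in Python. The sketch assumes, hypothetically, that each day's downloaded log has already been reduced to a list of observed values, one per entry; the actual Tealeaf log format may differ. The 95% coverage threshold is an illustrative choice, not a Tealeaf setting.

```python
from collections import Counter
from typing import Iterable, List


def value_frequencies(daily_logs: Iterable[Iterable[str]]) -> Counter:
    """Count how often each observed value appears across several days of logs."""
    counts: Counter = Counter()
    for day in daily_logs:
        # Skip blank entries and ignore surrounding whitespace.
        counts.update(v.strip() for v in day if v.strip())
    return counts


def whitelist_candidates(counts: Counter, coverage: float = 0.95) -> List[str]:
    """Return the most frequent values that together cover `coverage` of all occurrences."""
    total = sum(counts.values())
    chosen: List[str] = []
    covered = 0
    for value, n in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(value)
        covered += n
    return chosen


if __name__ == "__main__":
    day1 = ["/cart", "/home", "/cart"]
    day2 = ["/cart", "/search?q=x", "/home"]
    counts = value_frequencies([day1, day2])
    print(counts.most_common())
    print(whitelist_candidates(counts, coverage=0.8))
```

The returned values are candidates for the whitelist, and the length of the list may help when choosing a maximum number of captured values; rare values that fall outside the coverage threshold are the ones most likely to inflate the dimension.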