The Tealeaf Alert Service provides real-time alerting on events and Top Movers. Each alert can be configured to send email messages when the threshold values defined for the event or Top Mover are crossed.
Tealeaf data is analyzed every minute, which allows the alerting engine to quickly inform users of issues that require attention. The Alert Service has the following features:
- Integrated into the event evaluation process on the Processing Server
- Aligned with the hourly or daily gathering of Top Mover data
- One-minute granularity in threshold calculations
- Summarizes alert information across multiple processing servers
- Each alert has independent configuration options: thresholds, warning, alert and black-out periods, and messaging options.
As part of its operations, the Alert Service retrieves Canister statistics for all Canisters identified in the Portal and inserts them into the TL_STATISTICS database.
- The writing of statistics is enabled via the Per Minute SQL Updates parameter.
- Some of these statistics appear in the Canister Status reports.
How alerts work
Event data that is inserted into the database includes a count of events that fired, the number of new sessions, and the number of new pages added on a per-minute basis.
Every minute, the Alert Service polls the database to identify the events that have fired in the last minute or the Top Movers that have been calculated.
- Events and Top Movers are compared to the set of defined and active alerts to determine whether any thresholds have been exceeded. If so, the corresponding alerts are fired.
- The timing of alerts is based upon the system clock of the PCA server.
Note: Sessions that are spooled by the Canister at the time of alert evaluation may not be included in threshold calculations.
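The polling step can be pictured as a small function that buckets raw event records into the one-minute interval ending at the current evaluation time. This is a minimal Python sketch under stated assumptions, not the Alert Service's implementation: the `(event_name, hit_time)` row shape and the `minute_counts` name are hypothetical, and `now` stands in for a reading of the PCA server's system clock. Every active event receives an entry, even when nothing fired, because alerting intervals are contiguous regardless of event activity.

```python
from datetime import datetime, timedelta


def minute_counts(rows, now, active_events):
    """Count each active alert's event over the one-minute interval ending at `now`.

    rows: iterable of (event_name, hit_time) pairs. This is a hypothetical
    stand-in for the per-minute records the Alert Service reads from the database.
    now: the evaluation time (in the real service, the PCA server's clock).
    """
    start = now - timedelta(minutes=1)
    # Start every active event at zero so quiet minutes are still evaluated.
    counts = {name: 0 for name in active_events}
    for name, hit_time in rows:
        # Only records for events with an active alert, inside (start, now], count.
        if name in counts and start < hit_time <= now:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    rows = [
        ("Order", now - timedelta(seconds=10)),
        ("Order", now - timedelta(seconds=50)),
        ("Login", now - timedelta(minutes=2)),  # older than one minute: ignored
    ]
    print(minute_counts(rows, now, ["Order", "Login"]))  # {'Order': 2, 'Login': 0}
```

The per-minute counts produced this way are what the threshold comparison then works against.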
- For events whose trigger is End of Session or whose reporting is set to Report last occurrence, the event time associated with the event is in the past because of the need to wait for the session timeout. These events are timed to the evaluation time of the NALT table so that their counts are included in threshold calculations.
- Each minute, two special records are updated with counts for new sessions and new pages, even if no new ones are added. These records serve as the heartbeat for the Alert Service. The hit time for a heartbeat record is taken to be the delivery time of the record.
Note: Since the Alert Service polls all Canisters on one-minute intervals, performance can be significantly impacted by network latency or other disruptions.
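The re-timing rules for deferred events and heartbeat records can be expressed as a function that chooses which timestamp a record is counted under. This Python sketch is illustrative only: the `EventRecord` fields (including the `deferred` flag standing for End of Session or Report last occurrence events) and the `effective_time` function are hypothetical names, and `evaluation_time` stands in for the evaluation time the document associates with the NALT table.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EventRecord:
    # Hypothetical record shape; field names are illustrative, not the Tealeaf schema.
    name: str
    hit_time: datetime          # time the event is associated with
    delivered_at: datetime      # time the record was delivered to the database
    deferred: bool = False      # End of Session trigger or "Report last occurrence"
    heartbeat: bool = False     # one of the two per-minute new-session/new-page records


def effective_time(rec: EventRecord, evaluation_time: datetime) -> datetime:
    """Return the timestamp the alert engine should bucket this record under."""
    if rec.heartbeat:
        # Heartbeat records use their delivery time as the hit time.
        return rec.delivered_at
    if rec.deferred:
        # The real hit time lies in the past (the session had to time out first),
        # so the record is moved to the current evaluation time; otherwise it
        # would fall outside the minute being evaluated and never be counted.
        return evaluation_time
    return rec.hit_time
```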
Alert service terminology
Understanding how the Alert Service works requires understanding a few key concepts.
- Intervals: Alerting intervals can be configured to the minute. The smallest unit of time for accumulating event counts is one minute. All intervals are contiguous regardless of event activity.
- Rolling window of time: Each alert has a configurable window of time in which event counts accumulate. The window does not have a fixed start time, such as the beginning of the hour or every half hour. Instead, its start time equals the current time minus the alert interval size. The window can be shorter than the alert interval if an alert has fired at any time during that interval.
The size of the window is controlled by the Alert Interval setting for each event. The counts from each new minute are added to the window's accumulated count, and the count of the oldest minute is dropped once it falls outside the alert interval.
- Thresholds: These values define when a warning or alert message is created. If the count surpasses one of these values and the alert is not in a reset interval, then a message is created.
- Alert thresholds take priority over warning thresholds.
- Negative thresholds are evaluated as "less than or equal to" the value rather than "greater than or equal to". A simple way to remember this is that, for a negative threshold, small counts are the "alertable" values. A typical use is detecting inactivity on an event, for example "We had only 3 orders in the past 30 minutes".
- Reset Intervals: Reset Intervals allow the Alert Service to suppress warnings and alerts for a configurable amount of time before it starts sending messages again. This feature prevents warning or alert messages from repeating every minute while an event is in a warning or alert status. At the end of the reset period, the event counter and the interval start time for the alert are reset. The acceptable range for alert Reset Intervals is between 1 and 24; any value outside this range results in an invalid alert.
Reset intervals do not apply to Top Mover alerts and Top Mover report alerts.
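The interplay of the rolling window, the warning and alert thresholds, and the reset interval can be simulated minute by minute. The following Python sketch models a single event alert only (Top Mover alerts have no reset interval) and is an illustration under stated assumptions, not Tealeaf's code: the `RollingAlert` class and its field names are hypothetical, a negative threshold is read as "accumulated count is at most the threshold's magnitude", and because this section does not state the unit of the reset interval, the sketch counts it in minutes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RollingAlert:
    interval_minutes: int             # Alert Interval: size of the rolling window
    reset_minutes: int                # assumption: unit not stated in the docs; minutes here
    warning: Optional[int] = None     # negative value means "count <= abs(value)"
    alert: Optional[int] = None       # checked before warning (alert takes priority)
    _window: deque = field(default_factory=deque, init=False)
    _suppressed: int = field(default=0, init=False)

    def __post_init__(self):
        # Mirrors the documented 1-24 range; outside it the alert is invalid.
        if not 1 <= self.reset_minutes <= 24:
            raise ValueError("reset interval must be between 1 and 24")

    @staticmethod
    def _crossed(total: int, threshold: Optional[int]) -> bool:
        if threshold is None:
            return False
        if threshold < 0:
            return total <= -threshold   # negative threshold: small counts alert
        return total >= threshold

    def add_minute(self, count: int) -> Optional[str]:
        """Feed one minute's event count; return 'alert', 'warning', or None."""
        if self._suppressed:
            # Inside the reset period: no messages are sent.
            self._suppressed -= 1
            if self._suppressed == 0:
                # End of reset period: counter and interval start are reset.
                self._window.clear()
            return None
        self._window.append(count)
        if len(self._window) > self.interval_minutes:
            self._window.popleft()       # drop the minute that left the window
        total = sum(self._window)
        for level, threshold in (("alert", self.alert), ("warning", self.warning)):
            if self._crossed(total, threshold):
                self._suppressed = self.reset_minutes
                return level
        return None
```

For example, with a 3-minute window, warning 10, alert 20 and a 2-minute reset, feeding the counts 4, 5, 3, 100, 100, 6, 15 yields None, None, warning, None, None, None, alert: the warning fires when the window total reaches 12, the two large counts fall inside the reset period and are discarded, and the alert fires on a shorter, fresh window of 6 + 15 = 21.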