Requirements for high availability

Before you install Tealeaf, consider the implications of making the system highly available.

In order to provide uninterrupted service, Tealeaf allows highly available configurations of its capture and processing platforms.

The main objective of making Tealeaf highly available is to protect against data loss or the inability to capture data, which results in data loss.

Note: For Tealeaf, protecting against the temporary unavailability of already-captured data is not a primary objective of a high-availability.

The high availability features of Tealeaf cxImpact may be considered from the perspective of the main cxImpact functional components.

The functional components will be considered in the order that captured data passes through them. Each section addresses one of the main functional units of a cxImpact installation. The order of data flow through these components is as follows: Passive Capture Application server > HBR server > Processing servers > Reporting server

The relative positions of these components can be described as being "upstream" or "downstream" of each other. In this sense the Passive Capture Application server is upstream of the HBR Server. The Reporting Server is downstream of the Processing Servers.

Typically, an upstream component is responsible for monitoring the health of the component immediately downstream from itself. This document assumes that the reader is already familiar with the architecture of a cxImpact installation.

Example: High availability configuration

A resilient Installation below depicts an Tealeaf installation with the following characteristics:

Two primary capture servers, each with a failover slave server
A primary and standby HBR server
Three processing servers
An active and a cold standby processing server

High availability: CX Passive Capture Application server

The Passive Capture Application Server (PCA server) is responsible for the extraction of HTTP requests and responses from raw TCP/IP network data.

An HTTP request/response pair is combined to form a hit, and hits are sequenced into sessions based on defined criteria, such as the value of a session cookie.

The PCA server can also decrypt encrypted data and obscure or destroy sensitive data such as credit card numbers.

In an Tealeaf cxImpact solution, there is at least one PCA server. In high-volume solutions, additional PCA servers may be deployed.

PCA device failover

The Passive Capture software supports failover and failback between primary and secondary Passive Capture devices. If a PCA server fails, the data being captured by that server is lost.

To protect against data loss resulting from such a failure, additional PCA server systems can be configured to run in a master/slave configuration. A heartbeat check from the secondary device published NIC polls the primary device published NIC at a pre-configured interval; if the polling is not successful, the secondary device begins writing data to the Processing Server.

The master server in a pair is the normally active server.
The slave server continually monitors the status of the master through a heartbeat check.
In the event of a failure in the master, the slave assumes responsibility for the capture of data from the master server. The slave server can be configured to fall back to the master server when the master becomes operational again.

Requirements - To enable PCA device failover, please verify the following requirements in your Tealeaf environment:

PCA configurations on each device must be identical except for the failover settings.
Changes to PCA configurations in one device must be applied to the other.
A second capture point (SPAN port or tap) must be active and connected to the secondary device.
Both PCAs must be receiving identical traffic feeds.

Delivery failover mode

The Passive Capture Application supports two methods of deliver failover management, depending on the version in use:

PCA Build 3500 or later:
In PCA 3500 or later, the default failover method is to use even distribution, which automatically redistributes traffic from a failed delivery peer evenly across the remaining delivery peers in the environment.

Set the Delivery Mode to Even Distribution
PCA Build 34xx or earlier:
The PCA can be configured to recognize failures in delivery targets and then to failover to secondary targets. Set the Delivery Mode to Failover.
Note: This method is supported for legacy purposes and may be deprecated in a future release.

For either method, each PCA must have at least two delivery targets, a primary and secondary. If the connection to the primary target is lost, the PCA begins to send traffic to the secondary peer in Failover mode or to all remaining peers in Even Distribution mode.

High availability: Processing servers

An Tealeaf cxImpact installation includes one or more processing servers, depending on the volume of data to be captured.

Among other functions, the processing server is responsible for the following:

Event processing
Alerting
Indexing of captured data
Storage of captured data

To protect against failure of one or more of these processing server functions, additional capacity should be configured. Additional capacity can be in the form of additional servers or additional resources within each server.

If one of the processing servers fails, the additional capacity configured allows the remaining servers to assume the load of the failed server.

For environments with multiple processing servers , an HBR Server can be deployed to monitor processing server health and load balancing. For more information, see HBR Server.

In a single processing server environment, excess capacity should be available within the processing server.

Each processing server monitors the health of its own canister, which stores session data for Active and Completed sessions. If the canister is temporarily unavailable or if the server is falling behind in processing, data may be spooled locally until the situation is resolved.

Depending on whether you are using Health-Based Routing, failover of the processing servers is handled in one of two ways.

HBR Server

Tealeaf supports Health-Based Routing within Tealeaf software for data transports to multiple processing servers. HBR enables load routing based on Canister health and failover in multi-processing server configurations.

The Health-Based Routing (HBR) server distributes incoming captured data across multiple processing servers. The HBR server monitors the health of the processing servers in the system, so it is recommended that an HBR server be configured in any Tealeaf installation with more than one processing server that must be highly available.

Smaller Tealeaf installations might not require an HBR Server. Smaller installations typically have low data volumes requiring a single processing server only.

The HBR Server manages the distribution of incoming session data among the available processing servers. HBR monitors the health of the processing servers and, if one of them becomes unavailable, stops sending data to that server and redistributes the incoming data across the remaining servers. When the unavailable server becomes operational again, the HBR server resumes sending data to it.

Functionally, the HBR Server is a Tealeaf processing server without a local canister. A Windows machine is dedicated to running the Tealeaf Transport Service. It polls each processing server for availability and spooling status. The HBR Server runs a pipeline that includes the HBR session agent. This agent performs the following functions:

Monitor the availability of downstream processing servers.
If a machine is unavailable or is spooling data, then HBR reallocates its traffic to other available processing servers.
Distribute the incoming captured data amongst the available processing servers.
Spool incoming data in the event that no processing servers are available or the available servers cannot deal with the volume of incoming data between them.

Since the HBR Server performs a central data management and distribution role, it is a potential single point of failure. To protect against this, a hot standby HBR Server may be configured. The standby server should be identical in capabilities and configuration to the primary HBR Server.

The PCA server can be configured with a primary and secondary delivery peer, where the primary peer is the active health-based router, and the secondary peer is the failover HBR machine.

Non-HBR configurations with multiple processing servers

You can achieve higher availability by having two PCA servers feeding two processing servers. For configurations with multiple processing servers and no Health-Based Routing that use the PCA failover feature, the supported solution is to use and "active/active" model for failover management. In this model, the PCA is configured to send half its data to each processing server. If the connection to a processing server is lost, the PCA sends all its data to the remaining server.

An alternative approach, using one active processing server with a standby processing server for failover, is not supported.

Delivery to multiple processing servers can be configured through the PCA Web Console.

High availability: Portal web application

In the event of a failure of the Portal web application, a failover switching script can be executed to resume functioning of the web application on a separate platform.

Typically, a secondary machine with identical configuration is deployed as a failover, with the suite of Tealeaf Data Services stopped. When this machine becomes active, a script starts the services, which then assume collection of session data from the processing servers.

To start the secondary machine, login to the machine and select the following shortcut from the Windows Start menu: Start > > All Programs > Tealeaf Technology > Start Tealeaf Services.

High availability: Reporting server

The reporting server hosts the Report server and Portal web application, which are supporting components of the web Portal application.

These components store data in a SQL Server database that can be installed locally on the Reporting server or installed remotely on a separate server.

Users access Tealeaf through the web Portal and the RealiTea Viewer. Both methods depend on the reporting server.

At any time, a Tealeaf installation should only have a single instance of the Tealeaf Data Service, a component of the reporting server, since this data is collected from processing servers, aggregated, and then removed from subsequent collection. As a result, high availability strategies for the reporting server require a cold standby system or rapid rebuild/replacement of an unavailable reporting server.

Note: When deploying a failover report server, you must also deploy a mirrored version of all Tealeaf databases. Tealeaf cannot natively update two sets of databases in real-time, so you must schedule with your database administrator daily or otherwise periodic updates of your mirrored set of databases.

Cold standby

In the cold standby scenario, a second identical reporting server is inactive until needed. If the active reporting server becomes unavailable, the standby server can be quickly activated to assume the responsibilities of the failed server.

This method has the benefit of minimizing the period of time during which Tealeaf reporting data is unavailable to users. However, it incurs the overhead of maintaining a second reporting server that is not being used most of the time.

Note: Ensure that changes to the configuration of the active server are also made to the standby server.

Rebuild/Replace

In the rebuild/replace scenario, a new server is provisioned to replace the failed reporting server. This scenario requires installation and configuration of the hardware, operating system, and Tealeaf components.

This method has the benefit of incurring the overhead of a second server only as needed. However, Tealeaf reporting data is unavailable to users for a longer period than in the cold standby method. If a new server can be made available within an acceptable period of time, this method may be appropriate.

Note: Statistical data is retained on the processing server for a period of 72 hours before it is discarded. As long as a replacement server is provisioned within this period, no data is lost.

Example: High availability configuration

High availability: CX Passive Capture Application server

High availability: Processing servers

High availability: Portal web application

High availability: Reporting server

Join the community

Academy

Planning the installation

Example: High availability configuration

High availability: CX Passive Capture Application server

High availability: Processing servers

High availability: Portal web application

High availability: Reporting server

Related articles

Join the community

Academy