Traffic Monitoring
Monitoring signaling traffic is the simplest method of revealing accidental (for example, because of misconfiguration) or intentional abuse of the SS7 network. Because signaling is the nervous system of the telecommunications network, if the SS7 network goes down, so does the entire telecommunications network it supports. Intentional or accidental acts that impair signaling performance can cause all kinds of critical failure scenarios, including incorrect billing, lack of cellular roaming functionality, failure of Short Messaging Service (SMS) transfer, unexpected cutoff during calls, poor line quality, poor cellular handovers, nonrecognition of prepay credits, multiple tries to set up calls, ghost calls, and the inability to contact subscribers on certain other networks.
The quality of service (QoS) of the SS7 network directly determines the QoS experienced by subscribers. It is therefore vital to monitor the SS7 network closely enough that impairments, whatever their origin, are detected as soon as possible. Monitoring is specified in ITU-T Recommendation Q.752 [71]. Further useful ITU-T references are provided in Q.753 [72].
Monitoring entails measuring the traffic in terms of messages, octets, or more detailed information, such as counts of certain message types or of global title translations (GTTs) requested. Monitoring can be applied to any set of links, but it is considered essential on links that interconnect with other networks (for example, those passing through an STP or certain switches). In practice, monitoring systems tend to connect to many links throughout the SS7 network, in effect producing an overlay monitoring network. The monitoring points consist simply of line cards that are tapped onto the links to gather and process real-time data unobtrusively. The information obtained from the multiple points is then aggregated and analyzed at a central point (a common computing platform). The power and complexity of the processing platform vary with the scale of the purchase: higher-end systems provide intelligent fraud and security monitoring, whereas lower-end systems simply provide statistics and alerts when performance thresholds are crossed.
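The central aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any particular monitoring product: the ProbeReport record, its field names, and the single per-link MSU threshold are hypothetical, and each direction of a link is assumed to be reported by its own line card.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProbeReport:
    """Counters for one interval from one tapped link direction (hypothetical format)."""
    probe_id: str
    link_id: str
    msus: int
    octets: int


def aggregate(reports):
    """Sum MSU and octet counts per link across every report received centrally."""
    totals = defaultdict(lambda: {"msus": 0, "octets": 0})
    for r in reports:
        totals[r.link_id]["msus"] += r.msus
        totals[r.link_id]["octets"] += r.octets
    return dict(totals)


def links_over_threshold(totals, max_msus):
    """Return, in sorted order, the links whose MSU count exceeds the threshold."""
    return sorted(link for link, t in totals.items() if t["msus"] > max_msus)
```

For example, two cards reporting 1200 and 900 MSUs for the two directions of the same link produce a total of 2100 MSUs for that link, which would be flagged against a threshold of 2000.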
The measured values are compared with predetermined thresholds for "regular traffic." When a value crosses its threshold, an alarm normally is generated, and a notification might be sent to maintenance personnel. In this way, SS7 network monitoring helps the network operator detect security breaches. Some examples of high-level measurements are the Answer Seizure Ratio (ASR), the Network Efficiency Ratio (NER), and the Number of Short Calls (NOSK). ASR is the number of calls cleared normally divided by all call attempts (all call-clearing scenarios). NER is the number of calls cleared normally plus those cleared because the called party was busy, divided by all call attempts. NOSK is simply the number of calls with a hold time less than a prespecified value. A high ASR and NER and a low NOSK reflect high QoS.
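The three indicators can be computed from per-call clearing records and checked against thresholds. The Python sketch below follows the simplified definitions given in this section; the CallRecord type, its cause labels ("normal", "busy", anything else), and the 10-second short-call limit are assumptions chosen for illustration. A real system would derive the clearing category from the ISUP release cause (for example, cause value 16, normal call clearing, or 17, user busy).

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    cause: str          # assumed labels: "normal", "busy", or any other clearing reason
    hold_time_s: float  # conversation time in seconds; 0 for unanswered calls


def asr(calls):
    """Answer Seizure Ratio: normally cleared calls divided by all call attempts."""
    return sum(c.cause == "normal" for c in calls) / len(calls)


def ner(calls):
    """Network Efficiency Ratio (simplified): normal clearing plus busy, over all attempts."""
    return sum(c.cause in ("normal", "busy") for c in calls) / len(calls)


def nosk(calls, short_call_s=10.0):
    """Number of Short Calls: answered calls whose hold time is below short_call_s."""
    return sum(c.cause == "normal" and c.hold_time_s < short_call_s for c in calls)


def qos_alarms(calls, min_asr, min_ner, max_nosk, short_call_s=10.0):
    """Return the names of the indicators that crossed their 'regular traffic' threshold."""
    if not calls:
        return []  # nothing measured in this interval
    alarms = []
    if asr(calls) < min_asr:
        alarms.append("ASR")
    if ner(calls) < min_ner:
        alarms.append("NER")
    if nosk(calls, short_call_s) > max_nosk:
        alarms.append("NOSK")
    return alarms
```

For five attempts of which three were cleared normally (one after only 4 seconds), one met a busy subscriber, and one went unanswered, ASR is 0.6, NER is 0.8, and NOSK is 1. With minimums of 0.7 for ASR and 0.75 for NER and a NOSK limit of 0, qos_alarms reports ASR and NOSK.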
SS7 monitoring systems are evolving to reflect the ongoing network convergence. Many can show both the portions of a call connected via SS7 and the portions connected via other means, such as the Session Initiation Protocol (SIP).
As convergence takes hold, a call can traverse multiple protocols, such as SIGTRAN, SIP, H.323, TALI, MGCP, MEGACO, and SCTP. Monitoring systems that support converged environments allow the operator to perform a call trace that captures the entire call. SIGTRAN is explained in Chapter 14, "SS7 in the Converged World."
It should also be mentioned that monitoring the signaling network has other advantages in addition to being a tool to tighten up network security:
Customer satisfaction—
Historically, information was collected at the switches, and operators tended to rely on subscriber complaints to learn that something was wrong. With monitoring, QoS can be measured in real time through statistics such as call completion rates, transaction success rates, database transaction analysis, telemarketing call completion (toll-free, for example), and customer-specific performance analysis. Because the captured data is stored in a central database, it can also be used for later evaluation, for example, by network planning.
Billing verification
Business-related opportunities—
Data mining for marketing data, producing statistics such as how many calls are placed to and from competitors.
Enforcing interconnect agreements—
Ensure correct revenue returns and validate revenue claims from other operators. Reciprocal compensation is steeply rising in complexity.
Presently, the most common security breach relates to fraud. The monitoring system may be connected to a fraud detection application, which builds customer profiles from each subscriber's typical calling patterns and uses them to detect roaming fraud, two simultaneous calls from the "same" mobile (for example, because of SIM cloning), subscription fraud, and so on. Because monitoring works in real time, suspicious calls can be released while they are still active, before the operator loses additional revenue.
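The "same mobile" check can be sketched as a search for overlapping calls. This minimal Python sketch is not the logic of any particular fraud product: it assumes each call record carries the subscriber's IMSI, start and end times, and the serving node, all hypothetical field names. Overlapping calls handled by the same node are ignored, because features such as call waiting legitimately produce them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    imsi: str          # subscriber identity seen in the signaling (assumed field)
    start: float       # call start, in seconds
    end: float         # call end, in seconds
    serving_node: str  # e.g., the serving MSC address (assumed field)


def suspected_clones(calls):
    """Return IMSIs with two calls that overlap in time but are served by different nodes."""
    by_imsi = defaultdict(list)
    for c in calls:
        by_imsi[c.imsi].append(c)

    suspects = set()
    for imsi, subscriber_calls in by_imsi.items():
        subscriber_calls.sort(key=lambda c: c.start)
        for i, a in enumerate(subscriber_calls):
            for b in subscriber_calls[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start, so no later call overlaps a either
                if a.serving_node != b.serving_node:
                    suspects.add(imsi)
    return sorted(suspects)
```

A production system would weigh such a hit against the subscriber's profile rather than act on it alone, but the overlap test is the core of the cloning indicator.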
Monitoring systems should be capable of most of the measurements defined in ITU-T recommendation Q.752 [71]. The rest of this section lists the bulk of these measurements for each level in the SS7 protocol stack.
Q.752 Monitoring Measurements
The number of measurements defined in Recommendation Q.752 [71] is very large; the bulk of them are presented in the following sections. Note that most of the measurements are not obligatory, and that many are not permanent but are activated only after a predefined threshold has been crossed. The obligatory measurements form the minimum set that should be used on the international network.
MTP: Link Failures
Measurements:
- Abnormal Forward Indicator Bit Received (FIBR)/Backward Sequence Number Received (BSNR)
- Excessive delay of acknowledgment
- Excessive error rate
- Excessive duration of congestion
- Signaling link restoration
MTP: Surveillance
Measurements:
- Local automatic changeover
- Local automatic changeback
- Start of remote processor outage
- Stop of remote processor outage
- SL congestion indications
- Number of congestion events resulting in loss of MSUs
- Start of linkset failure
- Stop of linkset failure
- Initiation of Broadcast TFP because of failure of measured linkset
- Initiation of Broadcast TFA for recovery of measured linkset
- Start of unavailability for a route set to a given destination
- Stop of unavailability for a route set to a given destination
- Adjacent signaling point inaccessible
- Stop of adjacent signaling point inaccessible
- Start and end of local inhibition
- Start and end of remote inhibition
Additional measurements may be provided to help the user determine the network's integrity.
Measurements:
- Local management inhibit
- Local management uninhibit
- Duration of local busy
- Number of SIF and SIO octets received
- Duration of adjacent signaling point inaccessible
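Several of the surveillance measurements above are reported as start/stop event pairs (remote processor outage, linkset failure, route set unavailability), while others are durations. The Python sketch below shows one way to derive a duration from such pairs within a measurement interval; the Event type and its "start"/"stop" encoding are assumptions for illustration, not a Q.752 data format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds
    kind: str    # "start" or "stop" (assumed encoding of the paired measurements)


def total_duration(events, interval_start, interval_end):
    """Sum the time a condition was active within one measurement interval.

    - A stop with no earlier event means the condition was already active
      when the interval began, so it is counted from interval_start.
    - A condition still active at interval_end is counted up to interval_end.
    - Repeated starts, and stops while inactive, are otherwise ignored.
    """
    total = 0.0
    active_since = None
    seen_event = False
    for e in sorted(events, key=lambda ev: ev.time):
        if e.kind == "start" and active_since is None:
            active_since = e.time
        elif e.kind == "stop":
            if active_since is not None:
                total += e.time - active_since
                active_since = None
            elif not seen_event:
                total += e.time - interval_start
        seen_event = True
    if active_since is not None:
        total += interval_end - active_since
    return total
```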
MTP: Detection of Routing and Distribution Table Errors
Measurements:
- Duration of unavailability of signaling linkset
- Start of linkset failure
- Stop of linkset failure
- Initiation of Broadcast TFP because of failure of measured linkset
- Initiation of Broadcast TFA for recovery of measured linkset
- Unavailability of route set to a given destination or set of destinations
- Duration of the route set unavailability
- Start of the route set unavailability
- Stop of the route set unavailability
- Adjacent SP inaccessible
- Duration of adjacent SP inaccessible
- Stop of adjacent SP inaccessible
- Number of MSUs discarded because of a routing data error
- User Part Unavailable MSUs transmitted and received
MTP: Detection of Increases in Link SU Error Rates
Measurements:
- Number of SIF and SIO octets transmitted
- Number of SIF and SIO octets received
- Number of SUs in error (monitors incoming performance)
- Number of negative acknowledgments (NACKs) received (monitors outgoing performance)
- Duration of link in the in-service state
- Duration of link unavailability (any reason)
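Detecting an increase implies comparing the counts with a reference. Q.752 supplies the raw counters; the normalization and comparison in the Python sketch below are assumptions for illustration: counts are scaled to a rate per hour of in-service time and flagged when they exceed a hypothetical multiple of a baseline rate for the same link.

```python
def per_hour(count, in_service_s):
    """Normalize an event count to a rate per hour of in-service time."""
    if in_service_s <= 0:
        raise ValueError("link was not in service during the interval")
    return count * 3600.0 / in_service_s


def error_rate_increases(su_errors, nacks_received, in_service_s,
                         baseline_su_errors_per_h, baseline_nacks_per_h, factor=2.0):
    """Flag directions whose error rate exceeds `factor` times its baseline (assumed rule).

    SUs in error reflect the incoming direction; NACKs received reflect the
    outgoing direction (the far end rejected what this end sent).
    """
    flagged = []
    if per_hour(su_errors, in_service_s) > factor * baseline_su_errors_per_h:
        flagged.append("incoming")
    if per_hour(nacks_received, in_service_s) > factor * baseline_nacks_per_h:
        flagged.append("outgoing")
    return flagged
```

Normalizing by in-service time rather than by wall-clock time keeps a link that was out of service for part of the interval from appearing artificially clean.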
MTP: Detection of Marginal Link Faults
Measurements:
- SL alignment or proving failure (this activity is concerned with detecting routing instabilities caused by marginal link faults)
- Local automatic changeover
- Local automatic changeback
- SL congestion indications
- Cumulative duration of SL congestions
- Number of congestion events resulting in loss of MSUs
MTP: Link, Linkset, Signaling Point, and Route Set Utilization
Measurements by link:
- Duration of link in the in-service state
- Duration of SL unavailability (for any reason)
- Duration of SL unavailability because of remote processor outage
- Duration of local busy
- Number of SIF and SIO octets transmitted
- Number of octets retransmitted
- Number of message signal units transmitted
- Number of SIF and SIO octets received
- Number of message signal units received
- SL congestion indications
- Cumulative duration of SL congestions
- MSUs discarded because of SL congestion
- Number of congestion events resulting in loss of MSUs
Measurements by linkset:
Measurements by signaling point:
- Number of SIF and SIO octets received:
  - With given OPC or set of OPCs
  - With given OPC or set of OPCs and SI or set of SIs
- Number of SIF and SIO octets transmitted:
  - With given DPC or set of DPCs
  - With given DPC or set of DPCs and SI or set of SIs
- Number of SIF and SIO octets handled:
  - With given SI or set of SIs
  - With given OPC or set of OPCs, DPC or set of DPCs, and SI or set of SIs
- Number of MSUs handled with given OPC set, DPC set, and SI set
Measurements by signaling route set:
- Unavailability of route set to a given destination or set of destinations
- Duration of the route set unavailability
- Duration of adjacent signaling point inaccessible
- MSUs discarded because of routing data error
- User Part Unavailable MSUs sent and received
- Transfer Controlled MSUs received
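The per-link octet and MSU counts can be turned into a link occupancy figure, which is often expressed in Erlangs. In the Python sketch below, the 64 kbit/s default (ANSI networks also use 56 kbit/s links) and the 6 octets of MTP2 framing added to each MSU are assumptions to adjust for the network being measured; retransmissions are ignored for simplicity.

```python
# Assumed MTP2 framing added to each MSU on the wire: one flag (shared between
# consecutive units), BSN/BIB, FSN/FIB, length indicator, and a 2-octet check field.
MSU_OVERHEAD_OCTETS = 6


def link_occupancy(sif_sio_octets, msus, interval_s,
                   bit_rate_bps=64_000, overhead_octets=MSU_OVERHEAD_OCTETS):
    """Fraction of one link direction's capacity spent carrying MSUs in the interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    octets_on_wire = sif_sio_octets + overhead_octets * msus
    return octets_on_wire * 8 / (bit_rate_bps * interval_s)
```

For instance, 90,000 SIF and SIO octets in 2,500 MSUs over one minute on a 64 kbit/s link give (90,000 + 15,000) x 8 / 3,840,000 = 0.21875, or roughly 0.22 Erlang.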
MTP: Component Reliability and Maintainability Studies
These studies are aimed at calculating the Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) for each type of component in the SS7 network.
Measurements:
- Number of link failures:
  - All reasons
  - Abnormal FIBR/BSNR
  - Excessive delay of acknowledgment
  - Excessive error rate
  - Excessive duration of congestion
- Duration of SL inhibition because of local management actions
- Duration of SL inhibition because of remote management actions
- Duration of SL unavailability because of link failure
- Duration of SL unavailability because of remote processor outage
- Start of remote processor outage
- Stop of remote processor outage
- Local management inhibit
- Local management uninhibit
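Given the number of link failures and the duration of unavailability because of link failure, MTBF and MTTR follow by simple division. The Python sketch below uses one common convention (MTBF as mean up time between failures) and also derives steady-state availability; pooling several links of the same type into one observation period is an assumption made to gather enough failures for a meaningful estimate.

```python
def reliability(observation_h, failures, total_repair_h):
    """Estimate MTBF, MTTR, and availability for one component type.

    observation_h:  total observed component-hours (e.g., number of links x hours)
    failures:       number of failures observed in that time
    total_repair_h: total time the components were unavailable because of failure
    """
    if failures <= 0:
        raise ValueError("at least one failure is needed for these estimates")
    uptime_h = observation_h - total_repair_h
    mtbf = uptime_h / failures
    mttr = total_repair_h / failures
    availability = mtbf / (mtbf + mttr)
    return mtbf, mttr, availability
```

With 1000 observed link-hours, 4 failures, and 8 hours of total unavailability, this gives an MTBF of 248 hours, an MTTR of 2 hours, and an availability of 0.992.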
SCCP: Routing Failures
Measurements:
In addition, the following measurements can be used as a consistency check or a network protection mechanism:
SCCP unavailability and congestion:
- Local SCCP unavailable because of:
  - Failure
  - Maintenance made busy
  - Congestion
A remote SCCP measurement is
SCCP: Configuration Management
Measurements:
SCCP: Utilization Performance
Measurements:
SCCP traffic received:
SCCP traffic sent:
General:
- Total messages handled (from local or remote subsystems)
- Total messages intended for local subsystems
- Total messages requiring global title translation
- Total messages sent to a backup subsystem
SCCP: Quality of Service
The SCCP quality of service can be estimated using the following measurements:
Connectionless outgoing traffic:
- UDT messages sent
- XUDT messages sent
- LUDT messages sent
- UDTS messages received
- XUDTS messages received
- LUDTS messages received
Connectionless incoming traffic:
- UDT messages received
- XUDT messages received
- LUDT messages received
- UDTS messages sent
- XUDTS messages sent
- LUDTS messages sent
Connection-oriented establishments:
- Outgoing:
  - CR messages sent
  - CREF messages received
- Incoming:
  - CR messages received
  - CREF messages sent
Connection-oriented syntax/protocol errors:
Congestion:
ISUP: Availability/Unavailability
Measurements:
- Start of ISDN-UP unavailable because of failure
- Start of ISDN-UP unavailable because of maintenance
- Start of ISDN-UP unavailable because of congestion
- Stop of ISDN-UP unavailable (all reasons)
- Total duration of ISDN-UP unavailable (all reasons)
- Stop of local ISDN-UP congestion
- Duration of local ISDN-UP congestion
- Start of remote ISDN-UP unavailable
- Stop of remote ISDN-UP unavailable
- Duration of remote ISDN-UP unavailable
- Start of remote ISDN-UP congestion
- Stop of remote ISDN-UP congestion
- Duration of remote ISDN-UP congestion
ISUP: Errors
Measurements:
- Missing blocking acknowledgment in CGBA message for blocking request in previous CGB message
- Missing unblocking acknowledgment in CGUA message for unblocking request in previous CGU message
- Abnormal blocking acknowledgment in CGBA message with respect to previous CGB message
- Abnormal unblocking acknowledgment in CGUA message with respect to previous CGU message
- Unexpected CGBA message received with an abnormal blocking acknowledgment
- Unexpected CGUA message received with an abnormal unblocking acknowledgment
- Unexpected BLA message received with an abnormal blocking acknowledgment
- Unexpected UBA message received with an abnormal unblocking acknowledgment
- No RLC message received for a previously sent RSC message within timer T17
- No GRA message received for a previously sent GRS message within timer T23
- No BLA message received for a previously sent BLO message within timer T13
- No UBA message received for a previously sent UBL message within timer T15
- No CGBA message received for a previously sent CGB message within timer T19
- No CGUA message received for a previously sent CGU message within timer T21
- Message format error
- Unexpected message received
- Released because of unrecognized information
- RLC not received for a previously sent REL message within timer T5
- Inability to release a circuit
- Abnormal release condition
- Circuit blocked because of excessive errors detected by CRC failure
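Many of these error measurements follow the same pattern: a request is sent on a circuit, a timer runs, and the measurement is pegged if the acknowledgment does not arrive in time. The Python sketch below applies that pattern to "RLC not received for a previously sent REL message within timer T5." The class, its method names, and passing the T5 value in as a parameter are assumptions for illustration; the same structure would serve the T13, T15, T17, T19, T21, and T23 measurements.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseSupervisor:
    """Counts 'RLC not received for a previously sent REL within T5' per circuit (sketch)."""
    t5_s: float                                  # T5 value configured at the exchange (assumed input)
    pending: dict = field(default_factory=dict)  # CIC -> time the first REL was sent
    expirations: int = 0                         # the measurement counter

    def rel_sent(self, cic, now):
        # setdefault keeps the first REL time, so in this sketch a repeated REL
        # does not restart T5.
        self.pending.setdefault(cic, now)

    def rlc_received(self, cic):
        self.pending.pop(cic, None)

    def check(self, now):
        """Return CICs whose T5 has expired, count them, and stop supervising them."""
        expired = sorted(cic for cic, sent in self.pending.items() if now - sent >= self.t5_s)
        for cic in expired:
            del self.pending[cic]
        self.expirations += len(expired)
        return expired
```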
ISUP: Performance
Measurements:
TCAP Fault Management
- Protocol error detected in transaction portion
- Protocol error detected in component portion
- TC user generated problems
TCAP Performance
Measurements:
- Total number of TC messages sent by the node (by message type)
- Total number of TC messages received by the node (by message type)
- Total number of components sent by the node
- Total number of components received by the node
- Number of new transactions during an interval
- Mean number of open transactions during an interval
- Cumulative mean duration of transactions
- Maximum number of open transactions during an interval
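The transaction-based measurements can be computed from the begin and end times of TCAP transactions. In the Python sketch below, the (begin, end) input format is hypothetical, and computing the mean number of open transactions as a time average (rather than, say, an average of periodic samples) is an assumption about how a node would implement this measurement.

```python
def transaction_stats(transactions, t0, t1):
    """Interval statistics from (begin, end) transaction times in seconds.

    `end` is None for a transaction still open at t1. Returns the number of new
    transactions, the time-averaged number of open transactions, and the peak.
    """
    if t1 <= t0:
        raise ValueError("interval must have positive length")
    new = sum(1 for begin, _ in transactions if t0 <= begin < t1)

    open_time = 0.0  # integral of the open-transaction count over the interval
    events = []
    for begin, end in transactions:
        end = t1 if end is None else end
        lo, hi = max(begin, t0), min(end, t1)
        if hi > lo:
            open_time += hi - lo
            events.append((lo, +1))
            events.append((hi, -1))

    # Sorting (time, delta) puts closes (-1) before opens (+1) at the same instant,
    # so back-to-back transactions are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)

    return {"new": new, "mean_open": open_time / (t1 - t0), "max_open": peak}
```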