CTC US Data pump - delayed message processing
Incident Report for CalAmp
Postmortem

CTC US Data pump – Message Processing Delayed 6/12/2024

Incident Start Date: 6/12/2024
Started: 6/12/2024 12:03 am PT
Corrective action: 6/12/2024 06:10 am PT (Update made to improve backlog processing)
Backlog Cleared: 6/12/2024 08:00 am PT
Event declared over: 6/12/2024 08:17 am PT

Problem Statement

Messages for the CTC US Data pump and CalAmp applications were delayed.

Root Cause Analysis

Customers using the CTC Data pump were not receiving current messages. A large backlog of messages built up after a very large spike in incoming device traffic. The spike was caused by an international cellular carrier re-enabling SIMs for devices that were supposed to be inactive/disabled. When the carrier re-enabled the SIMs, the devices flooded CTC with old messages stored on the devices. The influx of messages was far higher than the normal daily average. Although CTC is set up to auto-scale to handle up to 2x to 3x the normal daily traffic, this spike was far greater than that, which caused a backlog in the system.

As a corrective action, queued messages from the group of devices that should not have been enabled were selectively removed so that the remaining backlog could be processed faster.
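The selective removal described above can be illustrated with a minimal sketch. This is not CalAmp's recovery script, which this report does not describe in detail. It assumes the backlog can be read as a sequence of messages that each carry a device identifier, and that the list of devices that should not have been enabled is known. The names `Message`, `purge_backlog`, and the sample device IDs are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, NamedTuple, Tuple


class Message(NamedTuple):
    """Hypothetical queued message; real CTC messages are not described in this report."""
    device_id: str
    payload: str


def purge_backlog(
    queue: Iterable[Message], blocked_device_ids: Iterable[str]
) -> Tuple[Deque[Message], int]:
    """Return the messages to keep (in original order) and the count removed."""
    blocked = set(blocked_device_ids)  # set gives O(1) membership checks
    kept: Deque[Message] = deque()
    removed = 0
    for msg in queue:
        if msg.device_id in blocked:
            removed += 1  # drop messages from devices that should be disabled
        else:
            kept.append(msg)
    return kept, removed


# Illustrative data only: two messages from a device that should be disabled.
backlog = [
    Message("dev-001", "position report"),
    Message("dev-999", "stale stored message"),
    Message("dev-002", "position report"),
    Message("dev-999", "stale stored message"),
]
kept, removed = purge_backlog(backlog, ["dev-999"])
print(removed, [m.device_id for m in kept])
```

Filtering by device ID preserves the order of the remaining messages, so traffic from active devices is processed in the sequence it arrived.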

Corrective Action and Follow Up

  1. CalAmp created enhanced recovery scripts to inspect and selectively remove data from the backlog to improve processing in situations like this. These steps have been added to the basic support process to aid in faster recovery in the future.
  2. CalAmp will review the system architecture/environment to improve the performance of the message queuing system so that it can better handle spikes in volume and prevent a recurrence.
  3. CalAmp has reached out to the cellular carrier for an RCA on why disabled SIMs were re-enabled.
Posted Jun 14, 2024 - 15:59 PDT

Resolved
Backlog cleared.
Posted Jun 12, 2024 - 08:05 PDT
Update
We have implemented a patch to speed up processing of the backlog.
Posted Jun 12, 2024 - 07:29 PDT
Update
We are processing through the backlog of messages.
Posted Jun 12, 2024 - 05:19 PDT
Update
There may still be a delay. We continue to work the issue.
Posted Jun 12, 2024 - 01:16 PDT
Monitoring
The delay has cleared. We continue to monitor the system.
Posted Jun 12, 2024 - 00:53 PDT
Investigating
We are currently investigating this issue.
Posted Jun 12, 2024 - 00:27 PDT
This incident affected: US CalAmp Telematics Cloud (US CTC Core Services).