US CalAmp App - Degraded performance
Incident Report for CalAmp
Postmortem

CTC US Data pump, CalAmp App – Message Processing Delayed 6/13/2024

Incident Start Date: 6/13/2024
Started: 6/13/2024 09:30 am PT (Intermittent delays that cleared and recurred)
Server upgrade: 6/13/2024 05:40 pm PT (Servers in the cloud upgraded to improve processing)
Final Backlog Cleared: 6/13/2024 07:10 pm PT (All data current)
Event declared over: 6/13/2024 09:12 pm PT

Problem Statement

CTC Data pump messages were intermittently delayed. CalAmp App UI data was delayed.

Root Cause Analysis

Customers using CTC Data pump in the US intermittently did not receive current messages. Several times during the event, message processing fell behind, then caught up and became current. This also impacted the CalAmp Application, which intermittently fell behind.

Restarting the message processing service initially cleared the delays and allowed processing to become current. However, the pattern continued, and CTC experienced recurring delays that required a restart of the service before message processing could catch up. During the investigation, in collaboration with our cloud provider, we identified that the cloud servers used to store the device messages had reached their maximum allowable network bandwidth. The CalAmp team identified a server configuration that supported higher network bandwidth and initiated an upgrade of the affected servers. Once the server upgrade completed successfully, all backlog was processed and data became current.
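The fall-behind-and-catch-up pattern described above is what a throughput ceiling tends to produce. The following is a minimal sketch of that reasoning, not a model of the CalAmp systems: the function names, the arrival figures, and the capacity values are all hypothetical, and the effect of restarting the service is not modeled. It treats the bandwidth limit as a fixed number of messages that can be processed per interval.

```python
def simulate_backlog(arrivals, capacity, start_backlog=0.0):
    """Return the backlog after each interval when processing is capped.

    arrivals: messages arriving in each interval (hypothetical units)
    capacity: maximum messages processed per interval (the assumed ceiling)
    """
    backlog = start_backlog
    history = []
    for arrived in arrivals:
        # Backlog grows when arrivals exceed capacity and shrinks otherwise,
        # but it can never drop below zero.
        backlog = max(0.0, backlog + arrived - capacity)
        history.append(backlog)
    return history


def drain_time(backlog, arrival, capacity):
    """Intervals needed to clear a backlog at steady rates.

    Returns None when capacity does not exceed the arrival rate, because the
    backlog then never clears.
    """
    if capacity <= arrival:
        return None
    return backlog / (capacity - arrival)


# Hypothetical load that rises above and falls below a ceiling of 100.
load = [80, 120, 120, 90, 70]
capped = simulate_backlog(load, capacity=100)    # [0.0, 20.0, 40.0, 30.0, 0.0]
upgraded = simulate_backlog(load, capacity=150)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

With the lower ceiling, the backlog builds while load is above capacity and drains only after load drops. With the higher ceiling, the same load never produces a backlog, which is consistent with the backlog clearing after the server upgrade.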

Below is a snapshot of the timeline:
09:30 am PT – Initial delay
10:25 am PT – backlog cleared and queue current

10:55 am PT – delay
12:00 pm PT – backlog cleared and queue current

12:50 pm PT – delay
Team continued working on the issue; intermittent delay and catch up continued
05:40 pm PT – Upgrade of cloud servers
07:10 pm PT – all backlog cleared and queues current
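The length of each delay window can be worked out from the timeline above. The sketch below uses only the times listed in this report; the helper name `to_minutes` is illustrative. It treats 12:50 pm to 07:10 pm as a single window even though delays during that period were intermittent.

```python
def to_minutes(clock):
    """Convert a 12-hour clock string such as '09:30 am' to minutes after midnight."""
    time_part, meridiem = clock.split()
    hour, minute = map(int, time_part.split(":"))
    # 12 am maps to hour 0 and 12 pm maps to hour 12.
    hour = hour % 12 + (12 if meridiem.lower() == "pm" else 0)
    return hour * 60 + minute


# (delay start, backlog cleared) pairs taken from the timeline, all in PT.
windows = [
    ("09:30 am", "10:25 am"),
    ("10:55 am", "12:00 pm"),
    ("12:50 pm", "07:10 pm"),
]

durations = [to_minutes(end) - to_minutes(start) for start, end in windows]
total_delayed = sum(durations)
span = to_minutes("07:10 pm") - to_minutes("09:30 am")

print(durations)      # [55, 65, 380]
print(total_delayed)  # 500
print(span)           # 580
```

On these figures, the three windows add up to 500 minutes out of the 580 minutes between the initial delay and the final backlog clearing.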

Corrective Action and Follow Up

  1. CalAmp will review the system architecture/environment of the CTC cloud infrastructure to ensure optimized use of cloud provider servers.
  2. CalAmp will review the internal data flow to optimize processing, address any bottlenecks, and improve the performance of the message queuing system so that it handles backlog volume better and recovers faster.
Posted Jun 17, 2024 - 15:56 PDT

Resolved
Data has been current for the last 2 hours. We have been monitoring to ensure no new issues occurred. This incident has been resolved.
Posted Jun 13, 2024 - 21:11 PDT
Update
We are continuing to work on the issue.
Posted Jun 13, 2024 - 16:33 PDT
Update
Messages in UI may be delayed.
Posted Jun 13, 2024 - 13:24 PDT
Update
The backlog is being processed.
Posted Jun 13, 2024 - 12:29 PDT
Monitoring
Data may be delayed when viewing from the CalAmp App UI.
Posted Jun 13, 2024 - 11:41 PDT
This incident affected: US CalAmp App.