How to work with anomalies

The best place to start is by reading a support article we have here.

  • Focus on the top 10 deviations in an anomaly warning; the one flagged as "initial deviation" is always essential. If you want to dig deeper, you can scroll through all impacted parameters, focusing on the parameter type of interest. It is also a good idea to sort the deviations by "deviations from normal behavior".
  • The classification of an anomaly is currently determined by the percentage of parameters in a correlation group that have deviations (root cause + impact). This means that a "high" anomaly alert does not necessarily indicate immediate danger to the systems in the correlation group: in some cases a large number of parameters can deviate without posing any threat to your systems.
  • For Azure resource groups, the classification can be higher than for other connected agents if you only have 1-2 resources in a resource group. Azure resource groups are meant as a logical grouping of resources that depend on each other, for instance a web app relying on an event hub, some storage and a SQL database. Make sure you structure your resources in proper resource groups rather than keeping every resource in its own resource group.
  • A system connected to AIMS with only a few parameters will trigger "high" and "critical" anomaly alerts more often, because each deviating parameter then accounts for a larger share of the total.
  • A correlation group can consist of any number of connected systems. You can have AIMS correlate parameters only internally within a system, or put systems in so-called "correlation groups" so that AIMS can correlate a deviation across systems. For instance, an IIS server relying on a database is a natural correlation group, as we would like to see what impact an increase in requests to IIS has on the database.
  • We recommend that you do not chart more than 5-10 parameters at a time from an anomaly warning, as the charts become hard to read.
  • Keep in mind that if you don't resolve or ignore an anomaly warning, it will be auto-ignored after 6 days.
  • Ignoring or resolving an anomaly warning has a direct impact on how AIMS will handle similar future anomalies:
    • Resolve - means that the anomaly was verified to have a negative impact: it worsened performance, led to downtime, etc. AIMS will continue to alert on similar situations.
    • Ignore - means that it was a false positive. There were deviations, but they were acceptable or normal, and you do not want AIMS to continue alerting on similar situations. AIMS will then incorporate the deviation(s) into the normal behavior of the parameter(s).
  • An anomaly warning can grow to include more deviations. In this case, follow-up alerts can occur, so check the ID of the alert to see whether it is new or a follow-up.
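
To illustrate why small correlation groups and small Azure resource groups reach "high" and "critical" more easily, here is a minimal sketch of a percentage-based classification. It is not the AIMS implementation: the threshold values, the "medium" and "low" labels, and the function names are assumptions made purely for illustration. Only the idea that the classification follows the share of parameters with deviations comes from the text above.

```python
# Hypothetical thresholds (minimum percentage, label). These are NOT the
# values AIMS uses; they only illustrate a percentage-based classification.
HYPOTHETICAL_THRESHOLDS = [
    (75.0, "critical"),
    (50.0, "high"),
    (25.0, "medium"),
]


def deviation_share(deviating: int, total: int) -> float:
    """Return the percentage of parameters in a group that have deviations."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= deviating <= total:
        raise ValueError("deviating must be between 0 and total")
    return 100.0 * deviating / total


def classify(deviating: int, total: int) -> str:
    """Map the deviation share to a label using the hypothetical thresholds."""
    share = deviation_share(deviating, total)
    for minimum, label in HYPOTHETICAL_THRESHOLDS:
        if share >= minimum:
            return label
    return "low"


# The same three deviating parameters in a small and in a large group.
small_group = classify(deviating=3, total=4)
large_group = classify(deviating=3, total=200)
print(small_group, large_group)
```

With these assumed thresholds, three deviating parameters out of four give a share of 75% and the highest label, while the same three deviations in a group of 200 parameters give 1.5% and the lowest label. The number of deviations is identical; only the size of the group changes the classification.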