r28 - 18 May 2010 - 16:25:44 - BeckyH
Note: This is the AppLogic 2.9 documentation. The latest production release is AppLogic 3.0.30.

Dashboard Notification Messages

Whenever important events occur on the grid, the controller posts notification messages to the grid dashboard.

These messages can be obtained in one of the following ways:

  • view the grid dashboard in a browser
  • retrieve the messages from the shell using the message list command (the list can also be obtained programmatically via ssh)
  • receive notifications as e-mail messages
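
For automated collection, the shell route can be scripted. The following is a minimal Python sketch, assuming the grid shell is reachable over ssh with key-based authentication and that the command is invoked as `message list`. The host name, the function names, and the exact command string are placeholders for illustration; adapt them to how you normally reach the grid shell.

```python
import subprocess

def build_ssh_command(host, remote_command="message list"):
    """Return the argv list for running the message list command over ssh.

    `remote_command` is an assumption based on the command name given in
    this document; BatchMode=yes makes ssh fail instead of prompting.
    """
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]

def fetch_messages(host, remote_command="message list"):
    """Run the command and return its output as a list of non-empty lines."""
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        check=True,    # raise CalledProcessError on a non-zero exit status
        timeout=60,    # do not hang forever on an unreachable grid
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Keeping the command construction separate from the call makes it easy to check the invocation without contacting a grid.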

The following sections describe the messages.

Note: The message ID or text may vary slightly (for example, the number of applications that failed to restart is not listed at all if all applications restarted successfully). The messages are also revised with each AppLogic release and will continue to change. If you use an automated system to collect and parse the notification messages, verify after each upgrade that it still recognizes every case you rely on.
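
One way to notice message changes after an upgrade is to have the parser report anything it does not recognize instead of silently dropping it. The sketch below assumes each message is available as a single line in the same "message ID, severity, text" order as the tables on this page; the actual output layout of your collection method may differ, and the names used here are illustrative only.

```python
import re
from typing import NamedTuple, Optional

class Notification(NamedTuple):
    message_id: str
    severity: str
    text: str

# Assumed layout: three-digit prefix, underscore, rest of the ID, then the
# severity (only "info" and "alert" appear on this page), then the text.
LINE_RE = re.compile(
    r"^(?P<id>\d{3}_\S+)\s+(?P<severity>info|alert)\s+(?P<text>.+)$"
)

def parse_line(line: str) -> Optional[Notification]:
    """Split one line into its parts, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return Notification(match.group("id"), match.group("severity"), match.group("text"))

def unrecognized(lines, known_prefixes):
    """Return the lines that fail to parse or whose ID has no known prefix."""
    unknown = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or not parsed.message_id.startswith(tuple(known_prefixes)):
            unknown.append(line)
    return unknown
```

Running `unrecognized` against a sample of messages after each upgrade gives a short list of cases to review by hand.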

Grid Start and Recovery Messages

The following messages are displayed during and after grid startup and/or recovery:

Message ID Severity Message text
010_grid_boot info Grid has been restarted on date. Processing reboot instructions...
010_grid_boot info Grid has been restarted reason on date. Processing reboot instructions...
010_grid_boot info Grid has been restarted reason on date. Restarting applications...
010_grid_boot info Grid has been restarted reason on date. Application restart aborted by operator.
010_grid_boot info Grid has been restarted reason on date. No applications were started because reason.
010_grid_boot info Grid has been restarted reason on date. Grid restart completed on date.
010_grid_boot info Grid has been restarted reason on date. Attempting application restart with m of n servers unavailable.
010_grid_boot alert Grid has been restarted reason on date. Application restart was aborted due to internal error; see log for details.
010_grid_start info Starting grid recovery at time
020_grid_progress info Waiting for N more servers to rejoin grid (timeout in M seconds)
020_grid_progress info Grid recovery in progress...
020_grid_progress info Grid recovery in progress: There were N active application(s) when the controller went down. M application(s) have been recovered. The state of P application(s) has been reacquired. Recovering Q application(s).
020_grid_progress info Grid recovery completed on date: No applications were recovered because there were no previously running applications.
020_grid_progress info Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) have been recovered. The state of P application(s) has been reacquired.
020_grid_progress info Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) were not recovered because grid recovery was aborted by operator.
020_grid_progress alert Grid recovery in progress: There were N active application(s) when the controller went down. M application(s) failed to be recovered.
020_grid_progress alert Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) failed to be recovered.
030_grid_failure alert Grid recovery was aborted due to internal error: see controller system log for details.
030_grid_srv_failed alert Grid recovery may not be successful. Some servers failed to connect in time.
030_grid_vol_failed alert Grid recovery failure: Failed to recover volume state. See controller system log for details.
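
The recovery progress messages carry several counts, and some sentences may be absent from a given message. A tolerant approach is to search for each count separately rather than match the whole message. This is a minimal sketch based only on the wording shown in the table above; the key names and the sample date format used to exercise it are made up for illustration.

```python
import re

# One pattern per count; each is searched for independently so that a
# message lacking some sentences still yields the counts it does contain.
COUNT_PATTERNS = {
    "active": r"There were (\d+) active application\(s\)",
    "recovered": r"(\d+) application\(s\) have been recovered",
    "reacquired": r"The state of (\d+) application\(s\) has been reacquired",
    "recovering": r"Recovering (\d+) application\(s\)",
    "failed": r"(\d+) application\(s\) failed to be recovered",
    "not_recovered": r"(\d+) application\(s\) were not recovered",
}

def recovery_counts(text):
    """Return a dict of the counts found in a 020_grid_progress message."""
    counts = {}
    for key, pattern in COUNT_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            counts[key] = int(match.group(1))
    return counts
```

A non-zero `failed` or `not_recovered` count is the natural trigger for follow-up in a monitoring script.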

High Availability Messages

The following messages describe the current high availability state for the grid:

Message ID Severity Message text
060_ha_check alert HA check completed on date; the following problem(s) were found that may result in application downtime in the event of a server failure:
  • m running applications have degraded volumes [list].
  • There are not enough available resources to restart components running on n server(s) [list].
  • o catalog class(es) with shared volumes have degraded volumes [list].
060_ha_check alert HA check completed on date: HA is unavailable due to the grid being a single server grid.
060_ha_check alert HA check completed on date: HA is unavailable due to the grid having no enabled servers.
060_ha_check alert Grid does not have controller HA. X of Y controller servers are down. To restore controller HA, Z of the following controller servers have to be brought back online: list of servers
060_ctrl_ha_check alert The grid is not configured for controller HA: a secondary controller server needs to be assigned. Please assign one of the running servers as a secondary controller server in order to enable controller HA on the grid.
060_ctrl_ha_check alert The grid does not have controller HA: X of Y controller servers are down. To restore controller HA, Z of the following controller servers have to be brought back online: list of servers.
060_bkb_ha_check alert Network HA is unavailable on the backbone | external network. The following servers have lost HA on the backbone | external network: list of servers. Please contact technical support for assistance.
060_ext_ha_check alert Network HA is OK on the backbone | external network, but more than one switch is currently active on the network. This may result in degradation of grid performance. Please contact technical support for assistance.

Please see the AppLogic High-Availability reference for more information about AppLogic's High-Availability features.

Volume Maintenance Messages

The following messages are displayed to show the state of the volumes on the grid.

Message ID Severity Message text
070_vol_maint alert Failed to initiate volume check on date because another volume check was in progress. Please contact 3tera technical support for assistance.
070_vol_maint alert Volume check completed on date. Volume maintenance is required. Found x inaccessible volumes, y unused volume streams.
500_3tvolmaintd_history alert Volume auto-repair failed to repair number volume(s) in the last number hours: vol_name, vol_name...
500_3tvolmaintd_failure alert Volume auto-repair internal error: volume list did not complete timely. Please contact 3tera technical support.
500_3tvolmaintd_failure alert Volume auto-repair internal error: failed to list volumes. Please contact 3tera technical support.
500_3tvolmaintd_failure alert Volume auto-repair internal error: failed to process volume queue. Please contact 3tera technical support.
500_3tvolmaintd_failure alert AppLogic is unable to repair volumes -- not enough servers are up and enabled. Please enable or bring at least one more server online.
500_3tvolmaintd_failure alert Volume auto-repair internal error: repair of volume vol_name requires an unavailable server. Please contact 3tera technical support.
500_3tvolmaintd_failure alert Volume auto-repair internal error: failed to start repair of volume vol_name. Please contact 3tera technical support.

Storage Failure Detection Messages

The following messages are displayed when a problem is detected with the storage of a particular server in the grid:

Message ID Severity Message text
100_srv_server_name alert The following devices are not monitored on server server: list of devices
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, FAILED SMART self-check. BACK UP DATA NOW!. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, Failed SMART usage Attribute: attribute. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, Self-Test Log error count increased from M to N. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, not capable of SMART self-check. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, failed to read SMART Attribute Data. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, unable to open device. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, Read SMART Error Log Failed. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, Read SMART Self-Test Log Failed. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, Temperature num Celsius reached critical limit of num Celsius (Min/Max M/N). See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, ATA error count increased from M to N. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, num Currently unreadable (pending) sectors. See recommended actions.
100_srv_server_name alert Possible storage system failure on Server server. Error: Device: device, num Offline uncorrectable sectors. See recommended actions.
100_comp_server_name alert Server 'server name' was disabled on date because it has gone down too often within the specified time period.
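
The "Possible storage system failure" messages above share a fixed frame around the server, the device, and the error detail, so a single pattern can pull out all three. The sketch below assumes the frame is exactly as listed in the table; the server and device names in the usage example are invented.

```python
import re

STORAGE_RE = re.compile(
    r"Possible storage system failure on Server (?P<server>\S+)\. "
    r"Error: Device: (?P<device>[^,]+), "
    r"(?P<error>.+?)\.? See recommended actions\.?$"
)

def parse_storage_alert(text):
    """Return server, device, and error detail, or None if the frame differs."""
    match = STORAGE_RE.search(text.strip())
    if match is None:
        return None
    return match.groupdict()
```

For example, a message ending in "Device: /dev/sda, ATA error count increased from 3 to 7. See recommended actions." yields the device `/dev/sda` and the error text "ATA error count increased from 3 to 7", which can then be routed by device or by error type.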

Scheduler Messages

The following messages are displayed when AppLogic is unable to honor the configured server constraints for an application or appliance due to the server(s) not having enough available resources.

Message ID Severity Message text
500_sch_app_name info Failed to schedule application 'application_name' based on serverset constraints. Scheduling application without serverset constraints.
500_sch_comp_name info Pinned component 'component_name' could not be scheduled on server 'server_name', scheduled on server 'server_name' instead. Please see log for details.

Appliance Restart Messages

The following messages are displayed when an appliance is restarted due to either a server or appliance failure:

Message ID Severity Message text
300_comp_comp_name info Restarted appliance 'comp_name' on date due to server failure.
300_comp_comp_name info Restarted appliance 'comp_name' on date due to appliance failure.
300_comp_comp_name alert Failed to restart appliance 'comp_name' on date after server failure. Failed to allocate resources for appliance.
300_comp_comp_name alert Failed to restart appliance 'comp_name' on date after server failure. Appliance restart failed.
300_comp_comp_name alert Failed to restart appliance 'comp_name' on date after appliance failure.
300_comp_comp_name info Restarting appliance 'comp_name' on date due to server failure.
300_comp_comp_name info Restarting appliance 'comp_name' on date due to appliance failure.
300_comp_comp_name alert Appliance 'comp_name' restart failed at date. Appliance has failed too often within the specified time period.
300_comp_comp_name alert Appliance 'comp_name' failed due to server lost at date. Lost connection to server 'server name'.

Application Monitoring Messages

The following messages are displayed by the MON appliance for counter alerts and alarms:

Message ID Severity Message text
050_filer_status info Internal condition 'filer status' occurred. This condition should not affect the operation of your grid. Please notify support that this error has occurred and reference SCR2301.
400_alert_comp_name alert Counter 'counter_name' in component 'comp_name' was out of range on date. Counter value is counter_value.
400_alert_comp_name alert Multiple alarms have occurred for component comp_name. Last alarm was: Counter 'counter_name' was out of range on date. Counter value is counter_value.
400_alert_comp_name alert Alert from component 'comp_name' on date: msg
400_alert_comp_name alert Multiple alarms have occurred for component comp_name. Last alarm occurred on date: msg

Application Backup and SLA Messages

The following messages are displayed by the BCK and SLA appliances:

Message ID Severity Message text
200_app_app_name info Backup of application app_name successfully completed
200_app_app_name info Restore of application app_name successfully completed.
200_appA_app_name alert Backup failed to delete files from _impex volume.
200_appA_app_name alert Restore failed to delete files from _impex volume.
200_appB_app_name alert Prune of old app_name backups failed.
200_appSLAA_app_name alert app_name: SLA counter collection from MON failed - policy not enforced
200_appSLAB_app_name alert app_name: SLA cronjob cannot re-start the daemon - policy not enforced
200_appSLAC_app_name alert app_name: SLA - all appliances are running in the appliance group appliance_group but start condition is true
200_appSLAD_app_name alert app_name: SLA failed to start appliance appliance_name - marking as ignored
200_appSLAE_app_name alert app_name: SLA failed to stop appliance appliance_name - marking as ignored
200_appSLAF_app_name info app_name: SLA - no appliances are running in the appliance group appliance_group
200_appSLAG_app_name alert app_name: SLA policy does not match appliance group. Please re-define the policy. Until then, policy enforcement cannot occur.

Appliance Alert Messages

The following messages are displayed by appliances to alert the user:

Appliance Severity Message text
INSSLR alert Healthcheck method failed number times, trying to switch to backup node
INSSLR alert Healthcheck method failed number times
MYSQLR alert Free space on the data volume is running low, please check!
MYSQLR alert Replication of master server is not running, please check!
MYSQLR alert Replication of slave is too much behind master, please check!
MYSQLR alert Free space on the binlogs volume is running low, please check!
NASR alert Free space on the data volume is running low, please check!
NASR alert Rsync daemon is not running, starting!
NASR alert Replication process is not running, starting!
NASR alert Replication does not appear to be alive and could not be stopped! Manual intervention may be required!
NASR alert Replication does not appear to be alive, restarting!
SQUID alert Data storage has less than number% of free disk space.
SQUID alert Content volume has less than number% of free disk space.
TOMCAT, TOMCAT64 alert Data storage has less than number% of free disk space.
TOMCAT, TOMCAT64 alert Content volume has less than number% of free disk space.
TOMCAT, TOMCAT64 alert Boundary property perm_size was automatically adjusted to failsafe value of number.
TOMCAT, TOMCAT64 alert Boundary property heap_size was automatically adjusted to failsafe value of number.
PGSQL, PGSQL64 alert Data volume has less than number% of free disk space.
PGSQL, PGSQL64 alert ERROR: PGSQL started in maintenance mode due to unrecognized PostgreSQL database on supplied database volume.

Grid Monitoring Messages

The following messages are displayed when there is a failure or unexpected condition detected on a grid:

Message ID Severity Message text
050_metering_collect alert Grid metering data could not be collected due to internal error, please contact support for assistance.
050_metering_report alert Grid metering data could not be sent to 3tera due to internal error, please contact support for assistance.
800_oem_config info The OEM configuration file for this grid is missing or not readable; see log for details. Your support links and other provider-specific information displayed on the dashboard may be incorrect, as the OEM configuration has not been applied. This error does not affect the operation of your grid. Please contact your service provider for assistance.
060_monitoring_email_nfy alert Dashboard message change email notification could not be sent due to internal error - see log for details. Please contact technical support for assistance.
060_monitoring_email_nfy alert Dashboard message change email notification could not be sent due to missing applogic configuration. Please contact technical support for assistance.
060_monitoring_email_nfy alert Dashboard message change email notification could not be sent due to invalid severity specified within applogic configuration file. Please contact technical support for assistance.
060_monitoring_email_nfy alert Dashboard message change email notification could not be sent due to SMTP error. Please contact technical support for assistance.
060_monitoring_email_post alert Summary email notification could not be sent due to internal error - see log for details. Please contact technical support for assistance.
060_monitoring_email_post alert Summary email notification could not be sent due to SMTP error. Please contact technical support for assistance.

Server Monitoring Messages

The following messages are displayed when there is an issue detected with one of the servers within a grid:

Message ID Severity Message text
100_srv_server_name alert Lost connection to server 'server name' on date.
400_alert_server_name alert The NTP daemon was found not to be running on the server, but has been successfully restarted.
400_alert_server_name alert The NTP daemon was found not to be running on the server and could not be restarted. The time on the server and the time in the appliances running on the server will no longer be synchronized with the clock on the grid controller. Please contact technical support for assistance.
400_alert_server_name alert Failed to destroy mount 'mount name'. Unable to stop device mount device. Please contact technical support.
400_alert_server_name alert Failed to unshare volume stream 'volume name'. Unable to detach volume from loop device. Please contact technical support.
400_alert_server_name alert Grid resources are not configured correctly. This may lead to degradation in grid performance or grid instability. Please update the following grid resources on your grid or contact technical support: controller memory | controller CPU | server memory
400_alert_server_name alert The backbone network on server server name is degraded | down. Network device(s) list of down NICs is | are down. Please contact technical support for assistance.
400_alert_server_name alert The external network on server server name is degraded | down. Network device(s) list of down NICs is | are down. Please contact technical support for assistance.
400_alert_server_name alert Network topology change detected on server server name. Network device NIC name is now connected to switch switch id.
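
In the network messages above, " | " separates alternatives (backbone or external, degraded or down, is or are); only one alternative appears in an actual message. A parser can express each alternative as a regular-expression group. This is a sketch under that reading of the notation; how the list of down NICs is formatted is not specified here, so it is returned as an unparsed string.

```python
import re

NETWORK_RE = re.compile(
    r"The (?P<network>backbone|external) network on server (?P<server>\S+) "
    r"is (?P<state>degraded|down)\. "
    r"Network device\(s\) (?P<nics>.+?) (?:is|are) down\."
)

def parse_network_alert(text):
    """Return network, server, state, and the raw NIC list, or None."""
    match = NETWORK_RE.search(text)
    if match is None:
        return None
    return match.groupdict()
```

Because the same message ID (400_alert_server_name) covers several unrelated conditions, matching on the text like this is one way to tell them apart.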

Controller Monitoring Messages

The following messages are displayed to show the overall state of the grid controller (mostly related to failures and unexpected conditions):

Message ID Severity Message text
500_3tctlmon alert Controller is out of memory. Controller restart not initiated because number failures have been detected within the previous number hours. Please contact 3tera technical support.
500_3tctlmon alert Controller vol_name volume is read-only. Controller restart not initiated because number failures have been detected within the previous number hours. Please contact 3tera technical support.
500_3tctlmon alert Controller _impex volume is read-only. Please contact 3tera technical support.
500_3tctlmon alert Controller _impex volume was unexpectedly unmounted and successfully re-mounted. Please contact 3tera technical support.
500_3tctlmon alert Controller _impex volume is not mounted. Please contact 3tera technical support.
500_3tctlmon_report alert Controller restarted on date at time because it was out of memory. Please contact 3tera technical support.
500_3tctlmon_report alert Controller restarted on date at time because of read-only vol_name volume. Please contact 3tera technical support.
500_3tctlmon_report alert Controller restarted on date at time because it was unresponsive. Please contact 3tera technical support.
500_3tctlmon_report alert Controller restarted on date at time because of an unexpected shutdown. Please contact 3tera technical support.
500_3tctlmon_report alert Controller restarted for maintenance on date at time.
500_3tctlmon_report alert Grid restarted by operator on date at time.
500_3tctlmon_report alert Controller restarted due to hardware or software fault on date at time.
500_3tctlmon_report alert Controller restarted multiple times due to hardware or software fault on date at time.
500_3tctlmon_boot_diskspace alert Controller boot volume is nearly full. Please take immediate action or contact 3tera technical support.
500_3tctlmon_meta_diskspace alert Controller meta volume is nearly full. Please take immediate action or contact 3tera technical support.
500_3tctlmon_network alert A problem has been detected with the grid’s network: description. Please contact technical support for assistance.

Controller Recovery GUI Messages

The following recovery stages can be displayed in the Details section of the Controller Recovery GUI during the process of recovering from a controller failure:

  • Waiting for servers with controller volume streams to connect. list of connected servers have connected. Waiting on list of disconnected servers to connect. Timeout in X seconds.
  • Waiting for servers with remaining controller volume streams to connect. list of connected servers have connected. Waiting on list of disconnected servers to connect. Timeout in X seconds.
  • Mounting controller volumes with X of Y servers connected.
  • Performing file system check on controller boot volume.
  • Performing file system check on controller metadata volume.
  • Performing file system check on controller impex volume.
  • Repairing file system errors on controller boot volume.
  • Repairing file system errors on controller metadata volume.
  • Starting grid controller on server server name.

The following messages can be displayed in the Messages section of the Controller Recovery GUI during the process of recovering from a controller failure:

  • Unable to mount controller boot volume. Servers list of servers with boot volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
  • Unable to mount controller meta volume. Servers list of servers with metadata volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
  • Unable to mount controller impex volume. Servers list of servers with impex volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
  • File system errors detected on controller boot volume. Backing up the volume in preparation for a file system repair.
  • File system errors detected on controller metadata volume. Backing up the volume in preparation for a file system repair.
  • Aborting start of grid controller because file system errors were detected on the following controller volumes: list of bad controller volumes
  • Failed to start grid controller on server server name. Please see server's system log for details. If there is another secondary controller server, the grid controller may be started on that server. If this problem persists, please contact technical support.

-- PeterNic - 23 Mar 2007

 
Copyright © CA 2005-2011. All Rights Reserved.