AppLogic 2.7/2.8 Documentation The latest production release is AppLogic 3.0.30
Dashboard Notification Messages
Whenever important events occur on the grid, the controller posts notification messages to the grid dashboard.
These messages can be obtained in one of the following ways:
- viewing the grid dashboard in a browser
- retrieving the messages from the shell using the message list command (the list can also be obtained programmatically via ssh)
- receiving notifications in e-mail messages
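The shell route lends itself to scripting. The following is a minimal sketch, not a supported interface: it assumes the controller accepts message list as a remote command over ssh, and the host name, user name, and helper names (build_message_list_command, fetch_messages) are hypothetical. The output format of the command is not described here, so the sketch returns the raw lines without interpreting them.

```python
import subprocess


def build_message_list_command(controller, user):
    """Return the argv that runs `message list` on the controller via ssh.

    `controller` and `user` are placeholders for your own grid; passing the
    command as a remote ssh command is an assumption, not documented behavior.
    """
    return ["ssh", f"{user}@{controller}", "message list"]


def fetch_messages(controller, user, timeout=30):
    """Run the command and return its non-empty output lines, unparsed."""
    result = subprocess.run(
        build_message_list_command(controller, user),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if ssh or the command fails
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Keeping the command construction separate from the call makes it easy to adjust the invocation if your grid requires a different login or command form.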
The following sections describe the messages.
Note that the message ID or text may vary slightly (for example, the number of applications that failed to restart is not listed at all if every application restarted successfully). The messages are also refined with each AppLogic release and will continue to change. If you use an automated system to collect and parse the notification messages, verify after each upgrade that it still recognizes all the cases you require.
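Because the IDs and texts can vary, an automated collector is usually more robust if it matches on the stable leading part of the message ID and reports anything it does not recognize. The sketch below illustrates that idea; the prefixes are taken from the tables in this document, while the family labels and the function name family_of are hypothetical.

```python
# Stable ID prefixes as they appear in the tables of this document.
# The remainder of some IDs is a server, component, or application name.
# The labels on the right are illustrative, not AppLogic terminology.
FAMILIES = (
    ("010_grid_", "grid start"),
    ("020_grid_", "grid recovery progress"),
    ("030_grid_", "grid recovery failure"),
    ("060_ha_check", "high availability"),
    ("060_ctrl_ha_check", "high availability"),
    ("070_vol_maint", "volume maintenance"),
    ("100_srv_", "server"),
    ("300_comp_", "appliance restart"),
    ("400_alert_", "monitoring alert"),
    ("500_3tctlmon", "controller monitoring"),
)


def family_of(message_id):
    """Return the family label for a message ID, or "unknown"."""
    for prefix, label in FAMILIES:
        if message_id.startswith(prefix):
            return label
    return "unknown"
```

Logging every "unknown" result after an upgrade is one simple way to find message IDs that a new release has added or renamed.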
Grid Start and Recovery Messages
The following messages are displayed during and after grid startup and/or recovery:
| Message ID | Severity | Message text |
010_grid_boot | info | Grid has been restarted on date. Processing reboot instructions... |
010_grid_boot | info | Grid has been restarted reason on date. Processing reboot instructions... |
010_grid_boot | info | Grid has been restarted reason on date. Restarting applications... |
010_grid_boot | info | Grid has been restarted reason on date. Application restart aborted by operator. |
010_grid_boot | info | Grid has been restarted reason on date. No applications were started because reason. |
010_grid_boot | info | Grid has been restarted reason on date. Grid restart completed on date. |
010_grid_boot | info | Grid has been restarted reason on date. Attempting application restart with m of n servers unavailable. |
010_grid_boot | alert | Grid has been restarted reason on date. Application restart was aborted due to internal error; see log for details. |
010_grid_start | info | Starting grid recovery at time |
020_grid_progress | info | Waiting for N more servers to rejoin grid (timeout in M seconds) |
020_grid_progress | info | Grid recovery in progress... |
020_grid_progress | info | Grid recovery in progress: There were N active application(s) when the controller went down. M application(s) have been recovered. The state of P application(s) has been reacquired. Recovering Q application(s). |
020_grid_progress | info | Grid recovery completed on date: No applications were recovered because there were no previously running applications. |
020_grid_progress | info | Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) have been recovered. The state of P application(s) has been reacquired. |
020_grid_progress | info | Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) were not recovered because grid recovery was aborted by operator. |
020_grid_progress | alert | Grid recovery in progress: There were N active application(s) when the controller went down. M application(s) failed to be recovered. |
020_grid_progress | alert | Grid recovery completed on date: There were N active application(s) when the grid controller went down. M application(s) failed to be recovered. |
030_grid_failure | alert | Grid recovery was aborted due to internal error: see controller system log for details. |
030_grid_srv_failed | alert | Grid recovery may not be successful. Some servers failed to connect in time. |
030_grid_vol_failed | alert | Grid recovery failure: Failed to recover volume state. See controller system log for details. |
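The recovery progress messages carry several counts (N, M, P, Q in the table above), and not every count appears in every variant. A collector can pull out whichever counts are present instead of matching the whole sentence. This is a minimal sketch: it assumes the counts are rendered as decimal digits, and the key names and the function name recovery_counts are hypothetical.

```python
import re

# One pattern per count; each is searched independently so that a message
# containing only some of the sentences still yields the counts it has.
COUNT_PATTERNS = {
    "active": re.compile(r"There were (\d+) active application\(s\)"),
    "recovered": re.compile(r"(\d+) application\(s\) have been recovered"),
    "reacquired": re.compile(r"The state of (\d+) application\(s\) has been reacquired"),
    "recovering": re.compile(r"Recovering (\d+) application\(s\)"),
    "failed": re.compile(r"(\d+) application\(s\) failed to be recovered"),
    "not_recovered": re.compile(r"(\d+) application\(s\) were not recovered"),
}


def recovery_counts(text):
    """Return a dict of the counts found in a 020_grid_progress message."""
    counts = {}
    for name, pattern in COUNT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[name] = int(match.group(1))
    return counts
```

For example, a completed-recovery message with hypothetical values would be handled like this:

```python
sample = (
    "Grid recovery completed on 2010-01-01: There were 5 active application(s) "
    "when the grid controller went down. 3 application(s) have been recovered. "
    "The state of 2 application(s) has been reacquired."
)
print(recovery_counts(sample))
```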
High Availability Messages
The following messages describe the current high availability state for the grid:
| Message ID | Severity | Message text |
060_ha_check | alert | HA check completed on date; the following problem(s) were found that may result in application downtime in the event of a server failure: m running applications have degraded volumes [list]. There are not enough available resources to restart components running on n server(s) [list]. o catalog class(es) with shared volumes have degraded volumes [list]. |
060_ha_check | alert | HA check completed on date: HA is unavailable due to the grid being a single server grid. |
060_ha_check | alert | HA check completed on date: HA is unavailable due to the grid having no enabled servers. |
060_ha_check | alert | Grid does not have controller HA. X of Y controller servers are down. To restore controller HA, Z of the following controller servers have to be brought back online: list of servers |
060_ctrl_ha_check | alert | The grid is not configured for controller HA: a secondary controller server needs to be assigned. Please assign one of the running servers as a secondary controller server in order to enable controller HA on the grid. |
060_ctrl_ha_check | alert | The grid does not have controller HA: X of Y controller servers are down. To restore controller HA, Z of the following controller servers have to be brought back online: list of servers. |
Please see the AppLogic High-Availability reference for more information about AppLogic's High-Availability features.
Volume Maintenance Messages
The following messages are displayed to show the state of the volumes on the grid.
| Message ID | Severity | Message text |
070_vol_maint | alert | Failed to initiate volume check on date because another volume check was in progress. Please contact 3tera technical support for assistance. |
070_vol_maint | alert | Volume check completed on date. Volume maintenance is required. Found x inaccessible volumes, y unused volume streams. |
500_3tvolmaintd_history | alert | Volume auto-repair failed to repair number volume(s) in the last number hours: vol_name, vol_name... |
500_3tvolmaintd_failure | alert | Volume auto-repair internal error: volume list did not complete timely. Please contact 3tera technical support. |
500_3tvolmaintd_failure | alert | Volume auto-repair internal error: failed to list volumes. Please contact 3tera technical support. |
500_3tvolmaintd_failure | alert | Volume auto-repair internal error: failed to process volume queue. Please contact 3tera technical support. |
500_3tvolmaintd_failure | alert | AppLogic is unable to repair volumes -- not enough servers are up and enabled. Please enable or bring at least one more server online. |
500_3tvolmaintd_failure | alert | Volume auto-repair internal error: repair of volume vol_name requires an unavailable server. Please contact 3tera technical support. |
500_3tvolmaintd_failure | alert | Volume auto-repair internal error: failed to start repair of volume vol_name. Please contact 3tera technical support. |
Storage Failure Detection Messages
The following messages are displayed when a problem is detected with the storage of a particular server in the grid:
| Message ID | Severity | Message text |
100_srv_server_name | alert | The following devices are not monitored on server server: list of devices |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, FAILED SMART self-check. BACK UP DATA NOW!. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, Failed SMART usage Attribute: attribute. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, Self-Test Log error count increased from M to N. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, not capable of SMART self-check. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, failed to read SMART Attribute Data. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, unable to open device. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, Read SMART Error Log Failed. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, Read SMART Self-Test Log Failed. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, Temperature num Celsius reached critical limit of num Celsius (Min/Max M/N). See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, ATA error count increased from M to N. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, num Currently unreadable (pending) sectors. See recommended actions. |
100_srv_server_name | alert | Possible storage system failure on Server server. Error: Device: device, num Offline uncorrectable sectors. See recommended actions. |
100_comp_server_name | alert | Server 'server name' was disabled on date because it has gone down too often within the specified time period. |
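All of the storage failure messages above share one frame: the server, then the device, then a device-specific detail. A collector can split a message into those three parts without listing every detail variant. The sketch below assumes the message text follows the frame exactly as shown in the table; the sample values and the function name parse_storage_failure are hypothetical.

```python
import re

# Common frame of the "Possible storage system failure" messages.
STORAGE_FAILURE = re.compile(
    r"Possible storage system failure on Server (?P<server>.+?)\. "
    r"Error: Device: (?P<device>[^,]+), "
    r"(?P<detail>.+?)\.? See recommended actions\."
)


def parse_storage_failure(text):
    """Return server, device and detail as a dict, or None if no match."""
    match = STORAGE_FAILURE.search(text)
    return match.groupdict() if match else None
```

Returning None for text that does not fit the frame (such as the "devices are not monitored" message) lets the caller fall back to storing the message unparsed.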
Scheduler Messages
The following messages are displayed when AppLogic is unable to honor the configured server constraints for an application or appliance due to the server(s) not having enough available resources.
| Message ID | Severity | Message text |
500_sch_app_name | info | Failed to schedule application 'application_name' based on serverset constraints. Scheduling application without serverset constraints. |
500_sch_comp_name | info | Pinned component 'component_name' could not be scheduled on server 'server_name', scheduled on server 'server_name' instead. Please see log for details. |
Appliance Restart Messages
The following messages are displayed when an appliance is restarted due to either a server or appliance failure:
| Message ID | Severity | Message text |
300_comp_comp_name | info | Restarted appliance 'comp_name' on date due to server failure. |
300_comp_comp_name | info | Restarted appliance 'comp_name' on date due to appliance failure. |
300_comp_comp_name | alert | Failed to restart appliance 'comp_name' on date after server failure. Failed to allocate resources for appliance. |
300_comp_comp_name | alert | Failed to restart appliance 'comp_name' on date after server failure. Appliance restart failed. |
300_comp_comp_name | alert | Failed to restart appliance 'comp_name' on date after appliance failure. |
300_comp_comp_name | info | Restarting appliance 'comp_name' on date due to server failure. |
300_comp_comp_name | info | Restarting appliance 'comp_name' on date due to appliance failure. |
300_comp_comp_name | alert | Appliance 'comp_name' restart failed at date. Appliance has failed too often within the specified time period. |
300_comp_comp_name | alert | Appliance 'comp_name' failed due to server lost at date. Lost connection to server 'server name'. |
Application Monitoring Messages
The following messages are displayed by the MON appliance for counter alerts and alarms:
| Message ID | Severity | Message text |
050_filer_status | info | Internal condition 'filer status' occurred. This condition should not affect the operation of your grid. Please notify support that this error has occurred and reference SCR2301. |
400_alert_comp_name | alert | Counter 'counter_name' in component 'comp_name' was out of range on date. Counter value is counter_value. |
400_alert_comp_name | alert | Multiple alarms have occurred for component comp_name. Last alarm was: Counter 'counter_name' was out of range on date. Counter value is counter_value. |
400_alert_comp_name | alert | Alert from component 'comp_name' on date: msg |
400_alert_comp_name | alert | Multiple alarms have occurred for component comp_name. Last alarm occurred on date: msg |
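The counter alerts quote the counter and component names, which makes them straightforward to extract for forwarding to another monitoring system. The following sketch handles the single-alarm form shown in the first row of the table above. It assumes the quoting shown there, keeps the date and the value as strings because their formats are not specified here, and uses a hypothetical function name and sample values.

```python
import re

# Single-alarm counter alert as listed in the table above.
COUNTER_ALERT = re.compile(
    r"Counter '(?P<counter>[^']+)' in component '(?P<component>[^']+)' "
    r"was out of range on (?P<date>.+?)\. "
    r"Counter value is (?P<value>\S+?)\.?$"
)


def parse_counter_alert(text):
    """Return counter, component, date and value as strings, or None."""
    match = COUNTER_ALERT.search(text.strip())
    return match.groupdict() if match else None
```

The "Multiple alarms have occurred" variants use a different sentence order and would need their own pattern.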
Application Backup and SLA Messages
The following messages are displayed by the BCK and SLA appliances:
| Message ID | Severity | Message text |
200_app_app_name | info | Backup of application app_name successfully completed |
200_app_app_name | info | Restore of application app_name successfully completed. |
200_appA_app_name | alert | Backup failed to delete files from _impex volume. |
200_appA_app_name | alert | Restore failed to delete files from _impex volume. |
200_appB_app_name | alert | Prune of old app_name backups failed. |
200_appSLAA_app_name | alert | app_name: SLA counter collection from MON failed - policy not enforced |
200_appSLAB_app_name | alert | app_name: SLA cronjob cannot re-start the daemon - policy not enforced |
200_appSLAC_app_name | alert | app_name: SLA - all appliances are running in the appliance group appliance_group but start condition is true |
200_appSLAD_app_name | alert | app_name: SLA failed to start appliance appliance_name - marking as ignored |
200_appSLAE_app_name | alert | app_name: SLA failed to stop appliance appliance_name - marking as ignored |
200_appSLAF_app_name | info | app_name: SLA - no appliances are running in the appliance group appliance_group |
200_appSLAG_app_name | alert | app_name: SLA policy does not match appliance group. Please re-define the policy. Until then, policy enforcement cannot occur. |
Appliance Alert Messages
The following messages are displayed by appliances to alert the user:
| Appliance | Severity | Message text |
INSSLR | alert | Healthcheck method failed number times, trying to switch to backup node |
INSSLR | alert | Healthcheck method failed number times |
MYSQLR | alert | Free space on the data volume is running low, please check! |
MYSQLR | alert | Replication of master server is not running, please check! |
MYSQLR | alert | Replication of slave is too much behind master, please check! |
MYSQLR | alert | Free space on the binlogs volume is running low, please check! |
NASR | alert | Free space on the data volume is running low, please check! |
NASR | alert | Rsync daemon is not running, starting! |
NASR | alert | Replication process is not running, starting! |
NASR | alert | Replication does not appear to be alive and could not be stopped! Manual intervention may be required! |
NASR | alert | Replication does not appear to be alive, restarting! |
SQUID | alert | Data storage has less than number% of free disk space. |
SQUID | alert | Content volume has less than number% of free disk space. |
TOMCAT, TOMCAT64 | alert | Data storage has less than number% of free disk space. |
TOMCAT, TOMCAT64 | alert | Content volume has less than number% of free disk space. |
TOMCAT, TOMCAT64 | alert | Boundary property perm_size was automatically adjusted to failsafe value of number. |
TOMCAT, TOMCAT64 | alert | Boundary property heap_size was automatically adjusted to failsafe value of number. |
PGSQL, PGSQL64 | alert | Data volume has less than number% of free disk space. |
PGSQL, PGSQL64 | alert | ERROR: PGSQL started in maintenance mode due to unrecognized PostgreSQL database on supplied database volume. |
Grid Monitoring Messages
The following messages are displayed when there is a failure or unexpected condition detected on a grid:
| Message ID | Severity | Message text |
050_metering_collect | alert | Grid metering data could not be collected due to internal error, please contact support for assistance. |
050_metering_report | alert | Grid metering data could not be sent to 3tera due to internal error, please contact support for assistance. |
800_oem_config | info | The OEM configuration file for this grid is missing or not readable; see log for details. Your support links and other provider-specific information displayed on the dashboard may be incorrect, as the OEM configuration has not been applied. This error does not affect the operation of your grid. Please contact your service provider for assistance. |
060_monitoring_email_nfy | alert | Dashboard message change email notification could not be sent due to internal error - see log for details. Please contact technical support for assistance. |
060_monitoring_email_nfy | alert | Dashboard message change email notification could not be sent due to missing applogic configuration. Please contact technical support for assistance. |
060_monitoring_email_nfy | alert | Dashboard message change email notification could not be sent due to invalid severity specified within applogic configuration file. Please contact technical support for assistance. |
060_monitoring_email_nfy | alert | Dashboard message change email notification could not be sent due to SMTP error. Please contact technical support for assistance. |
060_monitoring_email_post | alert | Summary email notification could not be sent due to internal error - see log for details. Please contact technical support for assistance. |
060_monitoring_email_post | alert | Summary email notification could not be sent due to SMTP error. Please contact technical support for assistance. |
Server Monitoring Messages
The following messages are displayed when there is an issue detected with one of the servers within a grid:
| Message ID | Severity | Message text |
100_srv_server_name | alert | Lost connection to server 'server name' on date. |
400_alert_server_name | alert | The NTP daemon was found not to be running on the server, but has been successfully restarted. |
400_alert_server_name | alert | The NTP daemon was found not to be running on the server and could not be restarted. The time on the server and the time in the appliances running on the server will no longer be synchronized with the clock on the grid controller. Please contact technical support for assistance. |
400_alert_server_name | alert | Failed to destroy mount 'mount name'. Unable to stop device mount device. Please contact technical support. |
400_alert_server_name | alert | Failed to unshare volume stream 'volume name'. Unable to detach volume from loop device. Please contact technical support. |
400_alert_server_name | alert | Grid resources are not configured correctly. This may lead to degradation in grid performance or grid instability. Please update the following grid resources on your grid or contact technical support: controller memory, controller CPU, and/or server memory |
Controller Monitoring Messages
The following messages are displayed to show the overall state of the grid controller (mostly related to failures and unexpected conditions):
| Message ID | Severity | Message text |
500_3tctlmon | alert | Controller is out of memory. Controller restart not initiated because number failures have been detected within the previous number hours. Please contact 3tera technical support. |
500_3tctlmon | alert | Controller vol_name volume is read-only. Controller restart not initiated because number failures have been detected within the previous number hours. Please contact 3tera technical support. |
500_3tctlmon | alert | Controller _impex volume is read-only. Please contact 3tera technical support. |
500_3tctlmon | alert | Controller _impex volume was unexpectedly unmounted and successfully re-mounted. Please contact 3tera technical support. |
500_3tctlmon | alert | Controller _impex volume is not mounted. Please contact 3tera technical support. |
500_3tctlmon_report | alert | Controller restarted on date at time because it was out of memory. Please contact 3tera technical support. |
500_3tctlmon_report | alert | Controller restarted on date at time because of read-only vol_name volume. Please contact 3tera technical support. |
500_3tctlmon_report | alert | Controller restarted on date at time because it was unresponsive. Please contact 3tera technical support. |
500_3tctlmon_report | alert | Controller restarted on date at time because of an unexpected shutdown. Please contact 3tera technical support. |
500_3tctlmon_report | alert | Controller restarted for maintenance on date at time. |
500_3tctlmon_report | alert | Grid restarted by operator on date at time. |
500_3tctlmon_report | alert | Controller restarted due to hardware or software fault on date at time. |
500_3tctlmon_report | alert | Controller restarted multiple times due to hardware or software fault on date at time. |
500_3tctlmon_boot_diskspace | alert | Controller boot volume is nearly full. Please take immediate action or contact 3tera technical support. |
500_3tctlmon_meta_diskspace | alert | Controller meta volume is nearly full. Please take immediate action or contact 3tera technical support. |
Controller Recovery GUI Messages
The following recovery stages can be displayed in the Details section of the Controller Recovery GUI during the process of recovering from a controller failure:
- Waiting for servers with controller volume streams to connect. list of connected servers have connected. Waiting on list of disconnected servers to connect. Timeout in X seconds.
- Waiting for servers with remaining controller volume streams to connect. list of connected servers have connected. Waiting on list of disconnected servers to connect. Timeout in X seconds.
- Mounting controller volumes with X of Y servers connected.
- Performing file system check on controller boot volume.
- Performing file system check on controller metadata volume.
- Performing file system check on controller impex volume.
- Repairing file system errors on controller boot volume.
- Repairing file system errors on controller metadata volume.
- Starting grid controller on server server name.
The following messages can be displayed in the Messages section of the Controller Recovery GUI during the process of recovering from a controller failure:
- Unable to mount controller boot volume. Servers list of servers with boot volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
- Unable to mount controller meta volume. Servers list of servers with metadata volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
- Unable to mount controller impex volume. Servers list of servers with impex volume streams did not connect. Please bring these servers back online and the grid controller should recover on its own. If the grid controller does not recover, please contact technical support.
- File system errors detected on controller boot volume. Backing up the volume in preparation for a file system repair.
- File system errors detected on controller metadata volume. Backing up the volume in preparation for a file system repair.
- Aborting start of grid controller because file system errors were detected on the following controller volumes: list of bad controller volumes
- Failed to start grid controller on server server name. Please see server's system log for details. If there is another secondary controller server, the grid controller may be started on that server. If this problem persists, please contact technical support.
-- PeterNic - 23 Mar 2007
Copyright © CA 2005-2011. All Rights Reserved.