Troubleshooting
Appliance Start
Appliance won't start
If the appliance fails to start when you start the application, the most likely cause is that the appliance is not starting the VM agent. The symptom is that the appliance takes a long time to start and then fails, and the log reports that the appliance start timed out. To fix this, make sure the appliance runs the /appliance/vma_load.sh script when it finishes booting. All templates from the global catalog include this script and run it by default.
In most cases, even when the appliance fails to start (times out), you have a couple of minutes to log in to it. Here is how:
- open two ssh sessions to the controller: one from which you will start the application and one to access the appliance
- start the application from the first session
- while the application is starting, go to the second session and check the appliance status using the component list shell command
- when you see your appliance in Starting state, wait 10-20 seconds and try to ssh into it (ssh).
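The "watch for the Starting state, then connect" step above can be sketched as a small polling loop. This is only an illustration: get_state is a hypothetical callable that you would have to supply yourself (for example, by reading the output of component list), and the 120-second default simply mirrors the approximate 2-minute boot timeout.

```python
import time


def wait_for_state(get_state, target="Starting", timeout=120.0, interval=2.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns `target` or `timeout` seconds elapse.

    get_state is a hypothetical, caller-supplied function that returns the
    appliance's current state as a string. It is NOT part of AppLogic.
    Returns True if the target state was seen, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Once the function returns True, wait the 10-20 seconds mentioned above before trying to ssh into the appliance.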
Since the default timeout is quite short -- approximately 2 minutes -- you may want to temporarily override the boot timeout. See the General tab on the Instance Settings screen; set the timeout to 3600 seconds (1 hour). Once you troubleshoot the appliance and it begins to start OK, restore the default boot timeout.
If you cannot even ssh to the appliance, then most likely the volume or its boot configuration got corrupted. You can stop the application, mount the volume and inspect it, or start from the template again. If the problem recurs, contact Technical Support.
Appliance starts but property values are not propagated
Appliance starts OK, but all properties are left at their default values in the instrumented configuration files.
- Check that the configuration files are listed in the Config Files tab of the class in the Class Editor.
- Check that the configuration files are instrumented using the ADL property markup syntax.
Application Start
Inaccessible Application
- Try to log in as a grid operator (see above)
- Restart the application (app restart myapp)
- Try to log in again as a grid operator
- Restore the application to its last snapshot or to its initial state
Application errors due to data volume full
- Stop the application (app stop myapp)
- Resize the application's data volume
  - check current volume size (vol list myapp --all)
  - to resize:
    - grid server: vol resize myapp:GSC.boot size=20G
    - cPanel: vol resize myapp:cPanel.boot size=20G
- Start the application again (app start myapp)
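As a rough sketch, the stop, check, resize, and start sequence can be assembled as an ordered list of commands. The helper name resize_volume_commands is made up for this illustration; the command strings follow the examples in the steps above, and the function only builds the text of the commands without running anything on the grid.

```python
def resize_volume_commands(app, volume, size):
    """Return, in order, the shell commands for resizing one volume.

    Hypothetical helper: it only formats the command strings shown in the
    steps above and does not execute them.
    """
    return [
        f"app stop {app}",                          # stop the application first
        f"vol list {app} --all",                    # check the current volume size
        f"vol resize {app}:{volume} size={size}",   # resize the volume
        f"app start {app}",                         # start the application again
    ]


for command in resize_volume_commands("myapp", "GSC.boot", "20G"):
    print(command)
```

Keeping the sequence in one place makes it harder to forget the final start step after the resize.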
Low application performance
Applications that experience low performance most likely are not configured with enough resources (e.g., CPU, memory, or bandwidth). To attempt to resolve the problem, perform one or more of the following:
- Determine the current resource allocation for all application components by executing component list myapp.
- Without stopping the application, try assigning more resources to the offending components by restarting each component and specifying additional resources on the command line; see the component restart command for details.
- Once you have determined which components need more resources, you can permanently modify the resource requirements for the components via the Resources tab of the Instance Settings property sheet.
Application kernel crash
An application that experiences a kernel crash is restarted automatically. A message appears on the dashboard indicating appliance failure and restart.
Application reboot
An application whose appliances shut down or reboot by themselves will have those appliances restarted automatically. A message appears on the dashboard indicating appliance failure and restart.
Inaccessible Volume
Volumes may be inaccessible for one of the following reasons:
- volume does not exist
- volume is already mounted
- volume is in error state or needs repair/migration
To verify that the volume exists, execute vol info name. If the volume does not exist, create the volume; see the vol create command.
To verify whether the volume is mounted, execute vol info name and inspect the volume mount state. If the volume is mounted, unmount the volume; see the vol unmount command.
To verify whether the volume needs repair, execute vol check --all and see whether the volume is in the list.
If the volume is in error state, verify that all servers on which the volume mirrors reside are up and operational; execute vol info name to see on which servers the mirrors reside.
If the volume needs repair, attempt to repair it by executing vol repair name.
If the volume needs migration, you can attempt to migrate it using vol migrate name. However, a volume that needs migration should not be inaccessible for that reason alone.
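The checks above amount to a small decision procedure, sketched below. The function name and its boolean parameters are hypothetical; you would set them by hand after reading the output of vol info and vol check --all. The function only returns suggested next steps as text and does not talk to the grid.

```python
def volume_next_steps(name, exists=True, mounted=False, in_error=False,
                      needs_repair=False, needs_migration=False):
    """Map the observed volume conditions to the suggested follow-up steps.

    Hypothetical helper: the flags are filled in manually from the output
    of `vol info name` and `vol check --all`.
    """
    if not exists:
        # Nothing else applies until the volume exists.
        return ["create the volume (see the vol create command)"]
    steps = []
    if mounted:
        steps.append("unmount the volume (see the vol unmount command)")
    if in_error:
        # vol info shows which servers hold the mirrors; check that they are up.
        steps.append(f"vol info {name}")
    if needs_repair:
        steps.append(f"vol repair {name}")
    if needs_migration:
        steps.append(f"vol migrate {name}")
    return steps


print(volume_next_steps("myapp:GSC.boot", in_error=True, needs_repair=True))
```

An empty list means none of the listed conditions applies, and the cause of the problem lies elsewhere.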
Inaccessible Grid
If the grid is inaccessible, either the grid controller is down, the user you are trying to log in as has not been created on the grid controller, or (when connecting via SSH) the SSH key you are using is incorrect. If this occurs, contact your grid maintainer or Technical Support.
GUI Slowdown
When the grid controller is under heavy load, typically from a heavy I/O operation (e.g., copying volumes), the GUI may seem slow; it should begin responding normally once the operation is complete. If the GUI remains slow for an extended period of time, contact your grid maintainer or Technical Support.
--
BeckyH - 02 Nov 2006