r3 - 03 Mar 2008 - 07:21:08 - EricT

AppLogic Release Notes


Version 1.2.14c - May 18, 2007


ALERT! Hotfixes available for this release. Note that hotfix 4 must be applied to all grids.

These are the release notes for AppLogic 1.2.14c. Prior versions.

Overview

AppLogic is the first grid operating system that runs and scales existing web applications. It converts a set of commodity servers into a scalable grid that's easy to manage. With AppLogic, you can:

  • Deploy existing web applications on a grid without changing any code
  • Scale each application from a fraction of a server up to the whole grid
  • Manage whole racks of servers more easily than a single server today
  • Handle hardware failures automatically without losing data
  • Add or remove servers and storage without disrupting applications
  • Manage all applications, servers and storage with just a browser

AppLogic does not require a SAN or other expensive hardware, and is open and vendor-neutral. It supports Linux and all popular open-source middleware including Apache, MySQL, JBoss and Ruby on Rails, so there is no learning curve.

What's so special about AppLogic?

AppLogic does for web applications what Apache does for web content. By separating the application from the datacenter infrastructure required to run it, AppLogic makes it possible to:

  • Deploy the same web application on different grids
  • Run multiple different web applications on the same server
  • Scale a web application to multiple servers, up to the whole grid
  • Manage applications and hardware easily

Just as web servers made it possible to run web sites without owning a datacenter and servers, AppLogic makes it possible to run and scale web applications without the enormous expense of owning and operating scalable IT infrastructure. With AppLogic, you can host any web application on commodity servers rented on a month-to-month basis from your favorite hosting provider.

What is new in AppLogic 1.2

The AppLogic 1.2 release includes the following key features which were not available in the AppLogic 1.1 FCS release:

  • Basic High Availability: BHA provides automated grid recovery while minimizing grid downtime and significantly reducing the need for manual user intervention in case of hardware/software failures. BHA includes the following new features:
    • Automatic restart of appliances that fail in running applications: If an appliance of a running application crashes, the appliance is automatically restarted (unless the appliance can be ignored). This minimizes the downtime of a running application if there is an appliance failure. If an appliance fails too many times, the appliance is not restarted (configurable system parameter).
    • Automatic recovery from grid server failures: If a server in the grid fails, AppLogic restarts all of the appliances that were running on the failed server on a different server. The failed server is automatically cleaned up and rebooted. The failed server is automatically disabled if it fails too many times.
    • Automatic recovery from grid controller failures (i.e., controller software crash): The entire grid is automatically rebooted when a grid failure is detected. A list of running applications is maintained so the applications may be restarted when the grid reboots. If the grid fails too many times, the applications are not restarted and user intervention is required.
    • Log recovery progress/failure notifications to the grid's dashboard: All appliance, application and server failures/restart progress are logged to the grid's dashboard to keep the user informed on what is happening on the grid. The user may delete dashboard messages they no longer need.
  • Metering: The amount of memory used by running applications is monitored and reported to 3tera so we can bill our customers for grid usage. This is based on the amount of memory per hour that is used by all running applications on a single grid (known as "gigabyte hours" or GBh).
  • SSL support (SCR 700): All browser access to AppLogic is channeled through encrypted connections using SSL (Secure Sockets Layer). By default, AppLogic provides a self-signed SSL certificate that expires after 5 years and is not signed by a trusted certificate authority. See below on how to configure your own SSL certificate.
  • Visual Editor Enhancements: The following enhancements were made to the AppLogic Visual Editor:
    • Exchange the class of an appliance: A new ability in the application editor that allows the user to exchange the class of an appliance with another class from a catalog (SCR 658). This allows the user to change the version/flavor of an appliance without having to delete the appliance and then drag in and reconfigure the new appliance. Simply shift-drag the new appliance from the catalog over an appliance on the canvas and the classes are exchanged. Note that the appliance class boundaries must be identical in order to use this new feature.
    • Allow App Config to be invoked without opening the application (SCR 460)
    • Added a button to close the application (SCR 645)
    • Allow the user to select the value of an enumerated type property using a drop down menu (SCR 580)
  • Common volume support: Common volumes are now fully supported in AppLogic. Common volumes are read-only volumes that are shared between several different appliances - only one copy of the volume exists in the system. Care should be taken when using common volumes since they expose a single point of failure for an application.
  • 32 server support: The AppLogic grid OS has been certified to run on 32 servers per grid.
  • Universal Appliance Distro Support: AppLogic appliances can now run on any Linux distribution. Verified with RedHat, SuSE, CentOS 4.3, and Debian.
  • Certification on CentOS: AppLogic has been certified on a grid of servers based on CentOS 4.3 - the grid controller is also based on CentOS 4.3. As a result, AppLogic will work on a much larger variety of hardware. Fedora Core 3 is no longer supported except for AppLogic appliances.
  • Swap volume support: AppLogic now has full support for swap volumes. Swap volumes can be created and re-sized (just like any other type of volume).
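
As a rough illustration of the "gigabyte hours" (GBh) unit mentioned under Metering above, the following Python sketch sums memory-hours for a set of running applications. It is only an assumption about how such a figure could be computed: the function name, the sample data and the 1 GB = 1024 MB conversion are hypothetical, and the actual metering and billing rules are defined by 3tera, not by this example.

```python
def gigabyte_hours(samples):
    """Convert (memory_mb, hours) pairs into a total number of GBh.

    Assumption: 1 GB = 1024 MB. These release notes do not state the
    conversion used by the real metering service.
    """
    total_mb_hours = sum(memory_mb * hours for memory_mb, hours in samples)
    return total_mb_hours / 1024.0


# Hypothetical usage: two applications running on a single grid.
usage = [
    (2048, 24),  # an application holding 2048 MB of memory for 24 hours
    (512, 6),    # an application holding 512 MB of memory for 6 hours
]
print(gigabyte_hours(usage))
```

An application that holds 1 GB of memory for one hour contributes 1 GBh under this model, so the total grows with both the amount of memory and the running time.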

In addition, the following features are also now available:

  • Volume resizing: Volumes can now be resized using the new vol resize command.
    • Resize a volume to 100 megabytes: vol resize myapp:myvol size=100M
  • Volume file-level copy support: Volumes can now be copied using a file-level copy by using the new --fscpy option when copying a volume. This is useful for initializing a volume, for example when copying content to the content volume of a web server.
    • Copy a volume file by file: vol copy src dest --fscpy
  • New application migration command: This new command allows the user to execute a non-live migration of a running application from a remote grid to the current grid. Application configuration parameters may be specified for the new application. Note that in order to use application migration, SSH forwarding must be enabled on the client side.
    • app migrate myothergrid.3tera.net test usr_ip=64.4.47.21 out_ip=64.4.47.22 (move the application test from myothergrid.3tera.net to the current grid)

Hotfixes for AppLogic 1.2.14c

This section describes all of the available hotfixes for the AppLogic 1.2.14c release. Make sure that your AppLogic 1.2.14c grid is updated with the mandatory hotfixes to ensure a properly working AppLogic grid.

Mandatory Hotfixes

  • Hf2192 - Eliminates a vulnerability in AppLogic that allows properly authenticated grid users to obtain grid internal data
  • Hf1910 - Patches a potential security hole in AppLogic where it is possible for a logged in user to elevate his rights to maintainer

Optional Hotfixes

  • e1720 - Adds one hop application migration to a grid, makes application migration significantly faster

What is new in AppLogic 1.2.14c

This is a new bug fix release that addresses an issue during new grid installations. It is available for download or hosted access by 3tera customers.

ALERT! All new grid installs should use this release. Grids that are currently running versions of AppLogic prior to 1.2.14a should be upgraded to this release. Grids that are currently running AppLogic 1.2.14b do not need to be upgraded to the 1.2.14c release.

ALERT! When installing a new grid or upgrading to AppLogic 1.2.14c, there is a separate update that needs to be applied to the grid after it is installed/upgraded. Please see Installing a grid in these release notes for more information on how to install this extra update.

Key Enhancements and Bug Fixes

From 1.2.14b to 1.2.14c

Core System Bug Fixes: Stability Issues
  • SCR 1533: During the execution of volume operations on the AppLogic controller (import/export of applications/catalogs, volume copies/instantiations, etc), one or more servers may lose connection to the grid. This also affects new grid installations of AppLogic which can cause them to fail during the import of catalogs or applications.

What's Included

This release of the AppLogic grid operating system includes the following key components.

Distributed Kernel

The AppLogic distributed kernel provides a set of system services required to support the distributed infrastructure and application model of AppLogic. The four most important system services include:

  • Global volume store: a scalable, distributed volume store using the built-in hard disks of the grid servers. The volume store keeps volumes mirrored across two or more servers, ensuring high availability and improved read performance. The hierarchical volume space is structured along applications and catalogs, so volumes become an integral part of those entities.
  • Distributed virtual machine manager: a runtime component that virtualizes the hardware resources used by applications.
  • Logical connection manager: a runtime component that provides the virtual network bindings between components of an application without the need to configure any IP addresses and network settings for distributed applications
  • Application scheduler: a runtime component that selects and assigns hardware resources to applications, based on available grid resources, application constraints and user-provided configuration

Grid Dashboard

The grid dashboard provides:

  • At-a-glance summary of the grid state, including resource use, server states, messages, settings, etc.
  • List of currently installed applications, with the ability to create new apps, copy existing apps, etc.
  • Support page with important links to user documentation, bug tracking database, forums, etc.

Application Configurator

The application configurator is a control panel for configuring application parameters: setting their hardware resources, network resources, tuning and other parameters. It is a single property sheet that includes all configurable parameters.

The application configurator can also be accessed through the command line shell or scripts using the app configure command.

Infrastructure Editor

The infrastructure editor is a visual tool that makes it easy to create, assemble and troubleshoot disposable infrastructure for AppLogic applications.

The user interface of the editor is highly interactive and is modeled after popular drawing programs: you assemble infrastructure by dragging components onto the canvas, wiring them together and configuring each component using a property sheet.

Command Line Shell

The command line shell gives you control of all aspects of an AppLogic grid. The shell runs on the AppLogic controller and can be accessed over SSH using any suitable SSH client package.

The shell commands are designed with the following objectives in mind:

  • make the shell easy to use by human users
  • provide simple means for scripting automation

All commands have a "batch" form of their output that makes it easy to parse programmatically, while the command's default output is structured for convenient interactive operation.

Application Infrastructure Build System

The infrastructure build system compiles the application infrastructure, producing a single entity for the application. It verifies resource and configuration constraints for each appliance and for the application as a whole, builds instance images and enforces the integrity of the application infrastructure. The infrastructure linker binds the application instance to the grid hardware resources just in time for application start, producing a ready-to-run application from the portable application format.

The infrastructure build system is automatically invoked when starting applications and is transparent for the grid operator.

System Catalog

The system catalog contains 9 appliance classes, ready to use in applications.

  • WEB: Apache-based web server with plug-in content/scripts volume
  • MYSQL: MySQL-based database server
  • HLB: Session-aware http load balancer
  • NAS: Network attached storage / file server appliance (http and cifs file access)
  • IN, OUT, NET: Firewalled network gateways based on iptables
  • LUX, LINUX: A tiny and a minimal Linux appliance that can be used as the basis for new appliances

The system catalog is a global catalog, containing appliance classes that can be used by all applications on the grid. You can see the full documentation for each appliance in the catalog reference. The system catalog is read-only for AppLogic users and can be changed only by the grid maintainer.

AppLogic also includes an empty global catalog called the user catalog, for your own production-level appliances.

AppLogic also includes a proto global catalog for prototyping new appliances. Each AppLogic release may provide new appliances in this catalog.

The user and proto catalogs are freely modifiable by AppLogic users.

Sample Applications

The AppLogic release also includes the following 4 sample applications:

  • cPanel: shared web hosting
  • TWiki: web-based collaboration platform
  • SugarCRM: customer relationship management system
  • NEW GSC: grid server

The applications are ready to run, requiring only network settings to be configured. You can find details on each application in the Sample Applications reference.

In addition, AppLogic 1.2.6 and later releases include a new sample application called GSC (Grid Server). GSC gives hosting providers a quick and easy way to offer dedicated servers on the AppLogic grid. To install GSC or any other sample application on an existing grid, execute the following command from within your AppLogic distribution:

  • ./aldo ai grid=mygrid applications=GSC

AppLogic Grid Distribution System

The AppLogic Grid Distribution System (aldo) installs and configures the AppLogic grid operating system, the appliance catalogs and sample applications.

Aldo easily installs multiple grids from a single distribution server. It can also add and remove servers from existing grids, change grid configuration and upgrade AppLogic. In addition, aldo can "clean" servers, removing AppLogic and any user data stored on AppLogic volumes.

For more information on aldo, see Aldo Reference and Aldo Tutorial.

Installation

Pre-requisites

To install AppLogic, you need a set of servers (1-32) connected with a gigabit Ethernet network and a designated distribution server. See HardwareConfig, AldTutorial and RefAldSetup for more information.

ALERT! Please read at least AldTutorial before choosing and setting up your servers and resources. Not reading or not following this document will likely result in a trial-and-error process which can be long and expensive. We want to make sure your installation is successful the first time - please contact Technical Support if any of the requirements is unclear.

In addition, you will need an ssh keypair to be used to authenticate the grid maintainer. The public key must also be provided to 3tera, so that you can gain access to the 3tera download server. Please e-mail your public key to 3tera Technical Support.

IDEA! We support - but do not require - PGP signatures. If you sign your ssh public key (or the e-mail with which you sent it), we will verify your signature before installing the key.

For more information on ssh keys, please see the man page on ssh-keygen or the Appendix in RefAldSetup.

Downloading the release

As user root from your chosen distribution server, run the following command:

rsync -v --progress applogic@download.3tera.net:/home/applogic/1.2.14c/* /root/applogic-1.2.14c/

IDEA! Make sure you are running ssh agent with the key that you provided to 3tera for downloads. If you don't have a key or would like to use a different key, please contact Technical Support.

Installing a grid

See AldTutorial for a quick step-by-step guide to installing a grid in its default configuration. See RefAldSetup for details on the various options available when setting up a grid (e.g., installing your logo, setting up defaults for installing multiple grids, etc.).

NEW AppLogic 1.2.14b/c only: After installing a new grid, there is an additional update that needs to be installed on the grid. This update is contained in a single file in the AppLogic 1.2.14c release named applogic-1.2.14b-update1.tar.bz2 (intentionally named this way for the 1.2.14c release, same update as for 1.2.14b). Follow the steps below to install this update on your grid:

  • Copy the applogic-1.2.14b-update1.tar.bz2 file to the grid controller of the grid that needs to be updated (this update must be executed from within the grid controller):
    • scp applogic-1.2.14b-update1.tar.bz2 <grid controller IP>:/tmp/.
  • SSH into the grid controller where the bz2 file was copied. Make sure you are logged in as an AppLogic maintainer and that SSH agent forwarding is enabled (this update uses SSH to copy new files to the servers of your grid).
  • Unpack the file:
    • cd /tmp
    • tar -xjf applogic-1.2.14b-update1.tar.bz2
  • Change the directory to applogic-1.2.14b-update1:
    • cd applogic-1.2.14b-update1
  • Ensure that the following files are present in the applogic-1.2.14b-update1 folder:
    • readme.txt (these directions)
    • srv_reboot_main.sh
    • update.sh
    • vrm_ctl.sh
  • Execute ./update.sh. This script will update all of the servers of your grid. It takes about 5 seconds to apply the update per server (the first server usually takes a little bit longer). The script will report failures if any of the servers fail to be updated; if this occurs it is most likely due to servers being down and inaccessible. After the update is applied to the grid, there is no need to reboot the grid.
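
The file check in the steps above can be sketched in Python as follows. This is a minimal, hypothetical helper and is not part of the AppLogic distribution: the function name is an assumption, while the file names and the /tmp/applogic-1.2.14b-update1 path are taken from the steps above. It assumes Python is available on the machine where you run it.

```python
import os

# File names listed in the update instructions above.
EXPECTED_FILES = ("readme.txt", "srv_reboot_main.sh", "update.sh", "vrm_ctl.sh")


def missing_update_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not regular files in folder."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(folder, name))
    ]


# Hypothetical usage: check the unpacked update folder before running update.sh.
missing = missing_update_files("/tmp/applogic-1.2.14b-update1")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All update files are present.")
```

If any file is reported missing, unpack the archive again before executing ./update.sh.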

The update may be executed more than once on the same grid. This is useful if you add some new servers to the grid at a later time. After the new servers are added, you can re-run the update over the entire grid.

ALERT! May 15, 2007 -- Hotfix 4 must be installed immediately after update1. There is no need to reboot the grid. See Hotfix 4 for installation instructions.

Product Characteristics

Dimensions

Key System Dimensions

  • 32 servers per grid (3tera has tested only up to 23 servers)
  • 31 grids per back-end LAN
  • 1024 applications per grid, up to 128 applications running simultaneously
  • If you need different dimensions, give us a call

Other Dimensional Limits

  • Per application
    • 512 network interfaces per application

  • Per appliance
    • 800MB RAM (due to SCR 1212)
    • 1 CPU (100%)
    • 2000 Mbps bandwidth
    • 16 volumes
    • 64 network interfaces/terminals
      • exception: SCR 1347: only 31 terminals are displayed per side of the appliance in the AppLogic visual editor
    • 1 external network interface
    • 1 default network interface

  • Per server
    • 255 virtual volumes (counting each mirror as a separate virtual volume)
    • 255 shares (counting each mirror as a separate share)
    • 128 mounts (counting each mirror mounted as a separate mount; i.e., 64 if mirroring by two)
    • 42 appliances (AppLogic internal limit)
    • 100 combined count of appliances and network interfaces (Xen limit)

  • Per virtual volume
    • volumes larger than 4 GB are supported (430GB volume tested)

  • If you need different dimensions, give us a call
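
To show how the per-server limits above interact, here is a small Python sketch that checks a planned server load against three of them. The function and parameter names are hypothetical, and the way mounts are multiplied by the number of mirrors is one reading of "counting each mirror mounted as a separate mount", not a statement of how AppLogic itself schedules appliances.

```python
# Per-server limits quoted in the list above.
MAX_APPLIANCES = 42                    # AppLogic internal limit
MAX_APPLIANCES_PLUS_INTERFACES = 100   # Xen limit
MAX_MOUNTS = 128                       # each mounted mirror counts separately


def server_limit_violations(appliances, network_interfaces,
                            mounted_volumes, mirrors=2):
    """Return a list of human-readable limit violations (empty if none)."""
    violations = []
    if appliances > MAX_APPLIANCES:
        violations.append("more than 42 appliances")
    if appliances + network_interfaces > MAX_APPLIANCES_PLUS_INTERFACES:
        violations.append("appliances plus network interfaces exceed 100")
    if mounted_volumes * mirrors > MAX_MOUNTS:
        violations.append("mounts exceed 128 when each mirror is counted")
    return violations


# Hypothetical plan: 40 appliances, 60 interfaces, 64 volumes mirrored by two.
print(server_limit_violations(40, 60, 64))
```

With mirroring by two, 64 mounted volumes use all 128 mounts, which matches the "64 if mirroring by two" note in the list.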

Hardware Compatibility

  • Single CPU (additional CPUs ignored due to Xen bug, defect SCR 475)
    • Certified: Intel Pentium 4, Intel Xeon, AMD Opteron, AMD Athlon64
    • Supported: Intel Pentium 4 or better; AMD Athlon or better

  • Minimum 512 MB RAM (2 or 4 GB recommended)

  • 80 GB IDE/SATA HDD (120 GB or larger SATA drive recommended)
  • HDD controllers
    • Certified:
      • Intel Corp. 82801EB/ER (ICH5/ICH5R)
      • Silicon Integrated Systems [SiS] 5513
      • Advanced Micro Devices [AMD] AMD-8111 IDE
    • Supported: all IDE, SATA and SCSI devices supported by CentOS 4.3 (excl. Adaptec AHA-15xx). Details.

  • Dual Gigabit Ethernet adapter (Intel or Broadcom recommended)
    • Certified:
      • Intel Corp. 82541GI/PI Gigabit Ethernet Controller
      • Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet
      • Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (as external 10/100 NICs)
    • Supported: all Gigabit Ethernet network adapters supported by CentOS 4.3. Details.

  • Any single non-blocking gigabit Ethernet layer 2 switch (for private network; all ports must be on the same switch, no cascading)

  • If you have different network or storage devices, give us a call

Software Compatibility

  • Server
    • CentOS 4.3, 32-bit

  • Appliances
    • 32-bit Linux OS
    • Kernel 2.6.11 with Xen 2.0.7 support (included in AppLogic installation)
    • Tested: Fedora Core 3
    • Supported: all 32-bit RedHat-compatible Linux distros with Xen support, CentOS 4.3, Debian, and SuSE

  • Appliance volumes
    • File systems supported: ext2, ext3
    • Swap volumes are supported and optional for appliances
    • Integration services for other file systems are available

  • If you are interested in other Linux distros, file systems and software infrastructure, give us a call.

Important Notes

  1. Shell access requires ssh and uses public/private key authentication. For increased security, password-based logins are not supported except during grid installation.
  2. When accessing the grid over ssh, the login user name is always root, regardless of the AppLogic user name. For the purpose of ssh logins, users and their roles are uniquely identified by their public ssh keys.
  3. JavaScript and pop-ups must be enabled in the web browser to use the web-based graphical user interface (dashboard, editor, documentation).
  4. Users are responsible for allocating, assigning and using externally visible IP addresses for applications; AppLogic takes care of all internal network assignments.
  5. While the AppLogic distribution system sets up all grid servers and controllers with carefully pre-configured firewalls and disables unnecessary network services, users and maintainers are encouraged to verify the security settings of their systems.
  6. Network performance between servers on the private network used for volume and inter-appliance communication has been measured at approximately 900 Mbps. TCP network performance between appliances residing on different servers has been measured at 720 Mbps (750-800 with server firewalls disabled; all measured on AMD Opteron 1.8 GHz CPU servers). On certain Intel-based servers we see inter-appliance network performance as low as approximately 350 Mbps (while the speed between servers is still approx. 900 Mbps). We are researching the conditions/CPU specifics that may be causing this degradation.
  7. The grid user interface is designed for use by a single operator at a time. Installing grids, editing and starting applications, and other grid control operations by multiple operators or running scripts concurrently can cause (recoverable) grid malfunction, similar to the effects of reconfiguring a physical server by multiple administrators at the same time. Nevertheless, grids can have multiple maintainers and regular users (operators).
  8. Resource limits on appliance hardware resources are enforced differently for different types of resources (CPU, memory, bandwidth). CPU is "no less than", memory is "exactly that much" (includes VM overhead), bandwidth is enforced only to the degree of not scheduling components requiring more bandwidth than available (at appliance start time).
  9. Grids in which linear scalability of performance is important should be built using servers that are as uniform as possible in CPU type/speed, memory size and disk capacity. AppLogic will work correctly in grids assembled from servers with different amounts of hardware resources; however, on such grids you may experience sub-linear performance.

Known Problems and Limitations

Limitations

1. Only one CPU core per server can be used
The remaining cores/CPUs are disabled due to SCR 475, a critical bug in the Xen 2.0 virtual machine manager. This limitation will be removed with the upgrade to Xen 3.0, once Xen 3.0 proves to be stable for production use.
2. Grid size is limited to 32 servers per grid
This is a limitation of the current AppLogic release. This release has only been tested with 23 servers.
3. AppLogic's web based user interface requires Firefox 1.5 or Microsoft Internet Explorer 6.x browser to operate
Javascript, pop-ups and cookies should be enabled for the grid controller's host for proper operation of the user interface. Please use the latest available bugfix versions of these browsers, as they correct a number of browser defects needed for AJAX applications. Firefox 1.0 browsers are also supported, with some minor known problems (printing and image caching).
4. The private network Ethernet switch is a single point of failure in the grid
If the Ethernet switch dies or loses power, the grid will stop operating and will need to be restarted after the switch is restored to operation or replaced.
5. Protocols are not enforced on appliance terminals, only endpoints are enforced
This means that an appliance can only talk to appliances connected to it (plus its own server and the grid controller). Nevertheless, protocols on new appliances should be properly specified in order to ensure application design integrity and compatibility with future versions of AppLogic.
6. The total available disk space does not take volume mirroring into account
The total available disk space reported by the grid info command is a raw estimate and does not take volume mirroring into account. The true available disk space is the reported available amount divided by the number of mirrors (2 mirrors by default). For example, if there is 1000GB of available disk space and the grid was configured for mirroring of 2, the available disk space is 500GB. Also, in order to successfully mirror volumes, there must be enough disk space on at least X servers, where X is the number of mirrors. (AppLogic will not fail to create a volume if any one of its mirrors cannot be created; instead, it will display a warning that the volume could not be mirrored.)
7. A server failure during application start may cause the application start to fail
If an application is started and one of the grid's servers fails, the application start will fail if one or more of the application's appliances were scheduled to run on the failed server. If this situation occurs, simply restart the application.
8. Appliances are limited to a maximum of 840MB of memory due to SCR 1212
Due to a limitation in the XEN domU kernel used in all appliances in the system and proto catalogs (HIGHMEM support is disabled), these appliances are limited to a maximum of 840MB of memory. In AppLogic 1.2.8+, the appliances in all catalogs have been intentionally limited to 800MB - the maximum memory resource is set to 800MB for all appliances. If it is critical to have more memory in a specific appliance, contact Technical Support for assistance (3tera has an updated domU kernel which resolves this problem). This problem should be resolved in the next AppLogic release.
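
The disk space arithmetic from limitation 6 above can be written as a short Python sketch. The function names and the per-server free-space list are hypothetical. Only the "divide by the number of mirrors" rule and the "enough space on at least X servers" condition come from these notes, and the real scheduler may apply further checks.

```python
def usable_disk_gb(reported_available_gb, mirrors=2):
    """Divide the raw figure from grid info by the number of mirrors."""
    if mirrors < 1:
        raise ValueError("mirrors must be at least 1")
    return reported_available_gb / mirrors


def can_mirror(free_gb_per_server, volume_gb, mirrors=2):
    """True when at least `mirrors` servers each have room for one copy."""
    servers_with_room = sum(
        1 for free in free_gb_per_server if free >= volume_gb
    )
    return servers_with_room >= mirrors


# The example from the text: 1000GB reported, mirroring of 2.
print(usable_disk_gb(1000, mirrors=2))  # 500.0

# Hypothetical grid: only one server has room for a 300GB volume.
print(can_mirror([400, 100, 50], volume_gb=300))  # False
```

In the second case the volume could still be created, but per the note above AppLogic would warn that it could not be mirrored.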

Known Problems and Issues

The following are the key known problems in this release:

1. Defect SCR 746 - Appliances may stop working on a server after 80 appliance starts/stops on that server
The problem appears as a failure of an application to start on the server for no apparent reason, and currently running appliances on that server may lose network connection. If the problem occurs, reboot the server (or disable it to prevent further scheduling until it can be rebooted), and restart the applications that have appliances on it. This problem has been observed only on servers with hyperthreading enabled. The current release disables hyperthreading to avoid this and other bugs in Xen (SCR 475). We will re-test and update this entry accordingly.

2. Defect SCR 857 - Grid reboot may degrade one or more system volumes
If a grid is rebooted using the "3t grid reboot" command, one or more of the system volumes may become degraded when the grid comes back up. If your grid needs to be rebooted, wait until it is back online, then log in as a regular grid user and repair the system volumes by executing the commands below. This will ensure that the system volumes are always in a clean state. This bug will be fixed in the next AppLogic release.
  • repair vol _sys.boot
  • repair vol _sys.meta
  • repair vol _sys.impex

3. Defect SCR 886 - Unable to receive exceptions for VNA errors
If an invalid network packet is received by one of the terminals of an appliance, the network packet is rejected but the associated nfy_conn_unexpected.sh script on the controller is not invoked. This is a known problem and will be fixed in an upcoming AppLogic release.

4. Defect SCR 1042 - vol unmount app does not work
If all volumes for an application need to be unmounted, execute vol list app --all --mounted to get a list of all volumes for the application and then unmount each volume one by one. This is a known problem and will likely be fixed in an upcoming AppLogic release.

5. Defect SCR 1199 - Unable to migrate a volume whose streams are all on disabled servers
When migrating a volume, make sure that at least one of its streams is on an enabled server or else the migration command will fail. The volume can be completely migrated off of its original set of servers by migrating the volume twice.

6. Defect SCR 1233 - Grid automatic recovery (HA) may fail due to servers taking too long to reboot
Some physical servers may take a long time to reboot - this may cause AppLogic's automated grid recovery to fail. The end result of this is that applications may not be restarted after the grid recovers from a failure. This is due to the grid controller waiting for a maximum of 3 minutes for all servers to reboot and reconnect to the grid controller (which may not be enough time for all servers to reboot). The workaround is to manually restart applications after all servers have reconnected to the grid controller - execute "list srv" to ensure that all servers are connected to the grid controller - they should all be in the UP state.

7. Defect SCR 1234 - Grid flapping file is not always reset when the operator intentionally reboots the grid
When the operator reboots the grid, the grid flapping file /var/applogic/state/ha/grid.desc is supposed to be reset and a message should be displayed on the dashboard stating that the operator rebooted the grid intentionally ("Grid has been restarted by operator on ..."). Occasionally when rebooting the grid, the grid file is not reset and the dashboard message is not displayed. The only problem that this may cause is that, upon the next grid failure, the applications may not be automatically restarted (depending on how many times the grid has failed when this bug occurs). To work around this problem, if no dashboard message is displayed after an intentional grid reboot, manually delete the grid.desc file on the grid controller.

8. Defect SCR 1323 - Server may crash while stopping appliances under high I/O load when using "app stop --all"
This is a third-party bug in the XEN 2.0.7 block device driver, which runs in Domain-0 of each server. The bug does not appear if the user stops the applications one by one, so "app stop --all" should not be used. This bug will be fixed in a future release.
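As a sketch of stopping applications one by one, the Python helper below builds one stop command per application. It assumes that a single application is stopped with "app stop" followed by the application name; that per-application form is an assumption and is not spelled out in these notes. The application names are hypothetical.

```python
from typing import List


def stop_commands(app_names: List[str]) -> List[str]:
    """Return one 'app stop <name>' command per application, never '--all'."""
    cmds = []
    for name in app_names:
        name = name.strip()
        # Reject empty names and anything that looks like an option such as --all.
        if not name or name.startswith("-"):
            raise ValueError(f"not an application name: {name!r}")
        cmds.append(f"app stop {name}")  # assumed per-application syntax
    return cmds


if __name__ == "__main__":
    for cmd in stop_commands(["shop", "blog"]):  # hypothetical application names
        print(cmd)
```

Issuing the commands in sequence, and waiting for each to finish before sending the next, avoids the "--all" form described above.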

9. Defect SCR 1383 - AppLogic server became unresponsive - server crashed due to a Linux kernel oops
This problem has been seen only twice by 3tera over the past 4 weeks of extensive testing. On two different grids, a single server on each grid crashed (the server does not respond to ping or SSH). When this occurs, the server will not reboot on its own - the server must be manually rebooted. The appliances that were running on the failed server were restarted automatically by AppLogic and the grids continued normal operation. 3tera is working on reproducing this bug and will provide a fix for this in the near future.

10. Defect SCR 1219 - 3 to 15 minute system lockout on server failure
In most cases when a server fails, shell commands will hang for 3 minutes but the AppLogic controller will remain operational. After 3 minutes, the grid will return to normal operation. In rare cases where the failed server contains one of the mirrors of the AppLogic controller system volumes (boot, meta or impex) and the server fails to reboot, the user will be locked out of the controller for up to 15 minutes. After 15 minutes, the grid should return to normal operation. This bug will be fixed in a future release.

11. Defect SCR 1360 - Appliance inside shows slightly less memory and less disk size than allocated
The slightly reduced resources are the result of space set aside for service areas. For memory, the difference is likely due to XEN's memory map table for the virtual machine. For disk, it is due to normal file system service areas (this is the same as on regular Linux servers).

12. Defect SCR 1391 - Default appliance shutdown timeout is too short which may result in corrupted volumes
AppLogic does not allow the user to configure the shutdown timeout for appliances (only the boot timeout may be configured). The default shutdown timeout of 60 seconds is not enough for appliances that have many services installed and are under heavy load when being shut down (observed with heavily loaded cPanel appliances). This may result in an incomplete filesystem flush, similar to what happens when a physical server loses power. Most filesystems use journaling and recover automatically from this failure, but there is a slight chance of filesystem corruption and data loss. On grids that run such appliances, the shutdown timeout can be configured manually: on every server of the grid, set the shutdown timeout (vm_shutdown_tout) in the /usr/local/apl-srv/templates/vrmd-srv.conf.tmpl and /usr/local/apl-srv/templates/vrmd-ctl.conf.tmpl template files. The value is specified in milliseconds; the default is 60 seconds. Note that the entire grid must be rebooted for the change to take effect. An install-time update for this problem will be released shortly.
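Because the timeout is entered in milliseconds while the default is quoted in seconds, a small conversion helper can prevent unit mistakes. The Python sketch below only computes the value to enter and lists the two template files named above. The syntax of the vm_shutdown_tout entry inside those templates is not shown in these notes, so the sketch deliberately does not edit the files, and the 300-second example value is an arbitrary illustration.

```python
# Template files named in the note above; both must be updated on every server.
TEMPLATES = (
    "/usr/local/apl-srv/templates/vrmd-srv.conf.tmpl",
    "/usr/local/apl-srv/templates/vrmd-ctl.conf.tmpl",
)


def shutdown_timeout_ms(seconds: float) -> int:
    """Convert a shutdown timeout in seconds to the milliseconds used by vm_shutdown_tout."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return int(round(seconds * 1000))


if __name__ == "__main__":
    value = shutdown_timeout_ms(300)  # example only: 5 minutes
    for path in TEMPLATES:
        print(f"{path}: set vm_shutdown_tout to {value}")
```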

13. Defect SCR 1338 - Downgrade from 1.2.14c to previous releases does not restore glibc
AppLogic 1.2.14c installs new glibc packages suitable for operation with the hypervisor. When downgrading to a previous release of AppLogic, the downgrade does not restore the original glibc packages. The downgraded grid will work, but it will not exactly match the previous version because of the difference in the installed glibc packages. To work around this problem, re-install the original glibc packages from the CentOS 4.3 distribution on each server and on the grid controller.
  • The following 3 glibc packages need to be re-installed (using "rpm -U --force --nodeps pkg-name"):
    • glibc-2.3.4-2.19.i386.rpm
    • glibc-common-2.3.4-2.19.i386.rpm
    • nscd-2.3.4-2.19.i386.rpm
  • After re-installing the packages, rename the /lib/tls directory to /lib/tls.disabled.
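The re-installation steps above can be sketched as a short Python helper that builds the commands in order, so they can be reviewed before being run as root on each server and on the grid controller. The package names, rpm options and directory rename come from the list above; the pkg_dir argument (where the CentOS 4.3 packages were copied) is a hypothetical location.

```python
import posixpath
from typing import List

# Packages listed in the note above.
PACKAGES = [
    "glibc-2.3.4-2.19.i386.rpm",
    "glibc-common-2.3.4-2.19.i386.rpm",
    "nscd-2.3.4-2.19.i386.rpm",
]


def glibc_restore_commands(pkg_dir: str = ".") -> List[List[str]]:
    """Return the commands that restore the original glibc packages, in order."""
    cmds = [
        ["rpm", "-U", "--force", "--nodeps", posixpath.join(pkg_dir, pkg)]
        for pkg in PACKAGES
    ]
    # Final step from the note above: disable the TLS library directory.
    cmds.append(["mv", "/lib/tls", "/lib/tls.disabled"])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in glibc_restore_commands("/tmp/centos-4.3-rpms"):  # hypothetical path
        print(" ".join(cmd))
```

Printing the commands first keeps the sketch safe to run anywhere; each printed line can then be executed by hand on the target machine.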

Contact Information

For questions about this release and its operation, please contact Technical Support:


Self-help Resources

These links are also accessible through the Support Tab of your grid dashboard.


3Tera Partner Resources

3Tera partners and direct licensees can also obtain contract-based support and additional information resources.

On-line

Live Support

  • e-mail: support@3tera.com
  • phone: (888) 492-4738
  • fax: (949) 305-0160, ATTN: Technical Support

Support hours are Monday through Friday, 9:00am to 9:00pm Pacific Time (GMT-0800). We may be able to respond outside these hours. Please mark urgent messages as such.

Tip: When calling emergency phone support, please e-mail support first; this ensures that all support engineers have access to your information. Keep in mind that the phone support line rings several engineers in sequence, so don't hang up while it is ringing.

Interactive Sessions

We can set up interactive help sessions using WebEx. To reach our WebEx site, go to http://3tera.webex.com/. You should also receive a meeting number from us, which is needed to set up the session.

We have verified access with the following browsers/OS combinations:

  • Windows XP: MS Internet Explorer, Mozilla Firefox
  • Linux: Mozilla Firefox with Java plug-in

WebEx sessions require Java or ActiveX to work. For more information on system requirements and to test whether your browser can access WebEx, go to http://developers.webex.com/api/jointest/index.php.


-- EricT - 18 May 2007
 
Copyright © 2005-2008 3tera, Inc. All Rights Reserved.