/* Partykof: August 2010 - Managing information and Technology */
In this blog, I summarize some of my work so far and the issues I face every day as an IT professional.
You are welcome to follow, comment and share with others. If you want to drop me a private note, send me an e-mail.


Monday, August 23, 2010

Server management – In and out of band infrastructure – Part 1


One of an administrator's many tasks is to access and control IT assets across the entire organization, whether they sit in a local data center or in a remote office.
Based in the Middle East, I was responsible for a full production corporate data center in Santa Clara, CA, and also supported an engineering data center in Shenzhen, China. This drove me to find a solution that would give my group and me complete access, so that we could maintain service availability.
The effort spent on enabling access to a server is usually driven by the criticality of the application it runs or the service it provides. To deliver the highest possible level of availability, you need to minimize the downtime of a service: first you need to know that it is down, then you need a plan for repairing it. The time between the failure notification and the repair is called the Mean Time to Repair (MTTR). MTTR should be as short as possible, but in practice it is bounded by two objectives:
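  1. Recovery Time Objective (RTO) – the maximum acceptable time to restore the service after a failure.
  2. Recovery Point Objective (RPO) – the maximum acceptable data loss, measured as the time between the last recoverable copy of the data and the failure.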
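To make the relationship between MTTR and the RTO concrete, here is a minimal Python sketch that computes MTTR from an outage log and lists the incidents whose repair took longer than the RTO. The incident records and the one-hour RTO are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One outage: when we were notified and when service was restored."""
    notified: datetime
    repaired: datetime

    @property
    def time_to_repair(self) -> timedelta:
        return self.repaired - self.notified


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Average notification-to-repair time over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.time_to_repair for i in incidents), timedelta())
    return total / len(incidents)


def rto_breaches(incidents: list[Incident], rto: timedelta) -> list[Incident]:
    """Incidents whose repair took longer than the RTO."""
    return [i for i in incidents if i.time_to_repair > rto]


if __name__ == "__main__":
    # Hypothetical outage log, for illustration only.
    log = [
        Incident(datetime(2010, 8, 2, 9, 0), datetime(2010, 8, 2, 9, 40)),
        Incident(datetime(2010, 8, 9, 22, 15), datetime(2010, 8, 10, 1, 15)),
        Incident(datetime(2010, 8, 16, 14, 0), datetime(2010, 8, 16, 14, 20)),
    ]
    rto = timedelta(hours=1)  # hypothetical objective
    print("MTTR:", mean_time_to_repair(log))
    print("Incidents over RTO:", len(rto_breaches(log, rto)))
```

Note that an average alone can hide the one long outage that actually broke the objective, which is why the sketch checks each incident against the RTO separately.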

A full discussion of these objectives could span several posts; this post focuses on minimizing the RTO, and it applies to normal day-to-day operation, not only to crisis situations.
For the sake of discussion, I will present several options for connecting to a server in order to manage it. In practice, though, you should implement only the options that will actually let you bring the service back into operation. To decide which connections you need, list your failure scenarios and identify which connection you would use to overcome each one.
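One way to follow this advice is to write down, for each management path, the components it depends on, and then check every failure scenario against that table. The minimal Python sketch below assumes a hypothetical dependency table; the path and component names are illustrative only, and your own table will depend on your hardware.

```python
# Hypothetical dependency model: the components each management path needs
# in order to work. Names are illustrative, not a standard taxonomy.
PATH_DEPENDENCIES: dict[str, set[str]] = {
    "ssh": {"os", "data_network"},
    "modem": {"os", "phone_line"},
    "ipmi_lan": {"bmc", "mgmt_network"},
    "serial_console": {"console_server", "mgmt_network"},
    "pdu_outlet": {"pdu", "mgmt_network"},
}


def usable_paths(failed: set[str]) -> list[str]:
    """Management paths that still work when the given components have failed."""
    return sorted(
        path for path, deps in PATH_DEPENDENCIES.items() if not deps & failed
    )


def uncovered(scenarios: dict[str, set[str]]) -> list[str]:
    """Failure scenarios that leave no management path at all."""
    return [name for name, failed in scenarios.items() if not usable_paths(failed)]


if __name__ == "__main__":
    # Hypothetical failure scenarios.
    scenarios = {
        "os hang": {"os"},
        "management switch down": {"mgmt_network"},
        "os hang during management switch outage": {"os", "mgmt_network"},
    }
    for name, failed in scenarios.items():
        print(f"{name}: {usable_paths(failed)}")
    print("Uncovered:", uncovered(scenarios))
```

In this made-up table the combined scenario leaves no path at all, a hint that at least one route should depend on neither the OS nor the management network.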

The connections to a server fall into two major categories:
  1. Out of Band infrastructure (OOBI) – uses a management channel that is isolated from the data channels.
  2. In Band infrastructure – manages the device over its regular data channels, such as the Ethernet network.

Figure 1: Server interfaces for management

Out of Band interfaces

  • VGA – This port carries the server's display graphics output over a 15-pin VGA connector to a monitor, just as on a desktop. Together with a keyboard and mouse connected to the server, you get a full graphical interface for managing the local operating system (OS). Some newer systems offer a DVI or HDMI interface instead of the VGA port.
  • Modem – A modem connected to the server allows remote dial-in over a network separate from the standard data network: a standard telephone line, an ISDN connection or a GSM/UMTS/HSDPA wireless connection.
  • IPMI/BMC – The Intelligent Platform Management Interface (IPMI) is implemented by a Baseboard Management Controller (BMC), a System-on-Chip integrated into the server’s motherboard. It allows an IP-based connection to the system that is independent of the OS status, which means it works even if the OS is down. This interface provides access to the system platform, the BIOS settings and a remote screen view, in graphical or text mode.
  • Serial – An interface that allows a connection over a serial standard such as RS-232 and provides access to the system console. A legacy terminal such as Digital’s VT100 can serve as a basic, usually text-based, terminal for performing administrative tasks on the local system.
  • USB – The USB interface is used in several ways: to connect the keyboard and mouse that accompany the VGA display, to connect an external modem, or to connect a remotely managed UPS that can power down the system.
  • Power – “He who controls the power controls the device.” Power is the fundamental element of every electronic device. If you can power the device on or off, you have basic management over it, for example when a remote router stops responding and needs to be power-cycled.
  • Vendor Specific – Major vendors have developed dedicated interfaces to manage their appliances and devices. These implementations range from on-board solutions, such as Oracle’s (Sun) LOM and HP’s iLO, to dedicated add-on cards, such as IBM’s RSA card or Dell’s DRAC.
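As an illustration of the IPMI/BMC and Power interfaces together, the open-source `ipmitool` utility can query and control a server's power through its BMC over the network. The Python sketch below only builds the command line; the BMC address and user name in the example are hypothetical, it assumes the BMC supports IPMI v2.0 (`-I lanplus`), and actually running a command requires `ipmitool` to be installed. The `-E` flag tells `ipmitool` to read the password from the `IPMI_PASSWORD` environment variable, which keeps it off the command line.

```python
import os
import subprocess

# Actions accepted by `ipmitool chassis power`.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}


def ipmitool_cmd(bmc_host: str, user: str, *args: str) -> list[str]:
    """Build an ipmitool argv that talks to a remote BMC over IPMI v2.0 (lanplus).

    -E makes ipmitool read the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *args]


def power_cmd(bmc_host: str, user: str, action: str) -> list[str]:
    """Build a chassis power command, rejecting unknown actions early."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ipmitool_cmd(bmc_host, user, "chassis", "power", action)


def run(cmd: list[str], password: str) -> str:
    """Run the command with IPMI_PASSWORD set; requires ipmitool on the PATH."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical BMC address and user; prints the command instead of running it.
    print(" ".join(power_cmd("10.0.0.50", "admin", "cycle")))
```

The same helper can open a text console through serial-over-LAN with `ipmitool_cmd(host, user, "sol", "activate")`, or read the system event log with `ipmitool_cmd(host, user, "sel", "list")`.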
In band interfaces
  • Ethernet – Using the system’s network interface, the operating system can run applications that provide management capabilities for the OS and its hardware platform, ranging from the Simple Network Management Protocol (SNMP), Telnet and SSH, through a full graphical Remote Desktop (RDP) or VNC session, to, in some cases, a proprietary vendor agent such as IBM Tivoli or BMC PATROL.
  • IPMI over Ethernet – In some cases, hardware manufacturers use the server’s Ethernet interface to provide access to the hardware platform’s System-on-Chip. The management controller is assigned its own IP address, separate from the OS, so it remains reachable if the OS fails. This solution is very useful in dense environments, where it saves cables, switch ports and even ports on the hardware itself. What it gains in savings it loses in security, which is why some advise using a separate management network to limit access.
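Before escalating to an out-of-band path, it helps to know which in-band services still answer. The Python sketch below checks whether TCP connections to the well-known ports for SSH (22), Telnet (23), RDP (3389) and VNC (5900) succeed; it only shows that something is listening, not that you can log in. SNMP (UDP 161) and IPMI over LAN (UDP 623) are connectionless, so they are deliberately left out of this simple check.

```python
import socket

# Well-known TCP ports for common in-band management services.
IN_BAND_PORTS = {"ssh": 22, "telnet": 23, "rdp": 3389, "vnc": 5900}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def probe_in_band(host: str, ports: dict[str, int] = IN_BAND_PORTS) -> dict[str, bool]:
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Hypothetical target; replace with a server you manage.
    print(probe_in_band("127.0.0.1"))
```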

Connecting a single server to different kinds of management interfaces can contribute a lot to the cable sprawl in your data center.

A complete solution might look something like this (keep in mind that this is for a single server):

   Figure 2: A single server management architecture

Keeping track of every server and the different ways to connect to it quickly becomes a very difficult task. Just imagine it: for each connection you keep a table of the users and passwords, the IPMI IP addresses, the modem extensions, the power switches and the VGA monitors. This can become a real headache.
Fortunately, vendors such as Avocent, ATEN, MRV and many others offer great solutions that minimize this sprawl, such as multiplexed KVM switches, serial console servers and even IPMI portals.
In my next post, Server management – In and out of band infrastructure – Part 2, I will cover these solutions.
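Even a small script can replace that spreadsheet. The sketch below is a hypothetical Python data model, not a product or standard: each server keeps a list of management paths, and each path stores a reference into a password vault rather than the password itself (my assumption about good practice). The `missing` helper answers a useful audit question: which servers have no path of a given kind?

```python
from dataclasses import dataclass, field


@dataclass
class MgmtPath:
    kind: str            # hypothetical labels, e.g. "ipmi", "serial", "modem", "pdu", "kvm"
    address: str         # IP address, console-server port, modem extension, outlet...
    credential_ref: str  # pointer into a password vault, never the password itself


@dataclass
class ServerRecord:
    name: str
    paths: list[MgmtPath] = field(default_factory=list)

    def find(self, kind: str) -> list[MgmtPath]:
        return [p for p in self.paths if p.kind == kind]


class Inventory:
    def __init__(self) -> None:
        self._servers: dict[str, ServerRecord] = {}

    def add(self, server: ServerRecord) -> None:
        self._servers[server.name] = server

    def paths_for(self, name: str, kind: str) -> list[MgmtPath]:
        return self._servers[name].find(kind)

    def missing(self, kind: str) -> list[str]:
        """Servers with no management path of the given kind (coverage gaps)."""
        return sorted(n for n, s in self._servers.items() if not s.find(kind))


if __name__ == "__main__":
    # Hypothetical servers and addresses.
    inv = Inventory()
    inv.add(ServerRecord("web01", [MgmtPath("ipmi", "10.0.0.50", "vault:web01-bmc"),
                                   MgmtPath("serial", "console1:7", "vault:console1")]))
    inv.add(ServerRecord("db01", [MgmtPath("ipmi", "10.0.0.51", "vault:db01-bmc")]))
    print("No serial console:", inv.missing("serial"))
```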

-Partykof