What are the main considerations for server storage?
When configuring the main hard drives for your data capture server, there are many different options to consider.
The first and most crucial consideration is fault tolerance. As it is impractical in most cases to run full backups of TeleForm servers (see TeleForm backup FAQ), it is essential that the most common causes of data loss are planned for.
When a drive breaks, it usually suffers complete failure within minutes which means that stored data is lost. Even if you have an up-to-date backup, it will typically take hours or even days to repair the server, re-install the operating system and restore the backup. For this reason, we consider a fault tolerant configuration of hard drives an essential part of any server.
It is possible to configure hard drives in a server (or even a workstation) so that if a drive fails, the system is able to continue operation unhindered. This is commonly referred to as a RAID configuration.
The most basic type of RAID is RAID 1, which means two hard drives of equal capacity are mirrored so that each drive contains exactly the same information. In a RAID 1 configuration, if one drive fails, the other one will continue to work. The drawback to RAID 1 is that you have to purchase two drives and lose 50% of the total storage capacity.
When a drive fails, you source a replacement drive and the server will re-create the mirror on the new drive. At this point, the system is fault tolerant once more.
The much more complex RAID 5 requires three or more drives. It works by splitting the data across all but one of the drives and then writing parity information to the last drive. For example, if you have three drives, it will write the data to drives one and two and then write the parity information to drive three. The parity information is written to a different drive each time, so the next set of data will be split across drives two and three with the parity information sent to drive one.
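The rotation described above can be sketched as a small layout function. This is only an illustration that follows the order given in the text (parity on the last drive first, then drive one, and so on). Real controllers use a variety of rotation schemes, and the names `parity_drive` and `stripe_layout` are assumptions made for this example.

```python
def parity_drive(stripe: int, drive_count: int) -> int:
    """Return the 1-based drive number holding parity for a given stripe.

    Assumed rotation: stripe 0 puts parity on the last drive, stripe 1 on
    drive one, stripe 2 on drive two, and so on.
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs three or more drives")
    return (drive_count - 1 + stripe) % drive_count + 1


def stripe_layout(stripe: int, drive_count: int) -> list[str]:
    """Label each drive in a stripe as 'data' or 'parity'."""
    parity = parity_drive(stripe, drive_count)
    return ["parity" if d == parity else "data" for d in range(1, drive_count + 1)]


# Show the first three stripes of a three-drive array.
for s in range(3):
    print(s, stripe_layout(s, 3))
```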
The parity information is basically the key to calculating a missing part of the data. For example, with 1 + 3 = 4, it is possible to calculate any of the numbers if only one is missing. If you consider each number (or chunk of data) to be stored on one of the three drives in a RAID 5 array, the server can rebuild the data if any one of the three drives fails.
|Number of disks||Disk 1||Disk 2||Disk 3||Disk 4||Disk 5||Parity disk||Missing data|
|Three disk RAID 5||1||Failed||-||-||-||4||3|
|Six disk RAID 5||1||5||2||Failed||9||24||7|
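The addition example above is a simplification. In practice, RAID 5 parity is calculated with a bitwise XOR, which has the same property: any one missing chunk can be recovered from the others. The sketch below shows the idea on short byte strings. The function names and sample data are assumptions chosen for illustration only.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks together, byte by byte."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the one missing chunk from the survivors (data plus parity)."""
    return xor_chunks(surviving)


data = [b"FORM", b"SCAN"]      # chunks held on two data drives
parity = xor_chunks(data)      # chunk held on the parity drive

# Simulate losing the second data drive and rebuilding its chunk.
recovered = rebuild_missing([data[0], parity])
print(recovered == data[1])
```

Because XOR is its own inverse, the same function both creates the parity and rebuilds a lost chunk.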
One main advantage of RAID 5 is that you lose a much smaller amount of space to fault tolerance. RAID 1 loses 50%, whereas RAID 5 loses 33% or less, depending on the number of drives. You can also create RAID 5 arrays of much larger capacity than would be possible with one drive by simply adding more drives to the array.
The drawback to RAID 5 is the overhead incurred by the server in calculating the parity information every time data is written to the drives. Due to the way the data is split, there is also a performance loss when reading the data back from the array, especially for large files.
RAID 1 V RAID 5
A few years ago, the largest drives available were only 36GB in size. RAID 5 was ideal for creating large volumes, as you could use four 36GB drives to create a 108GB RAID 5 array. Now that 500GB drives are commonplace, RAID 5 is much less useful. Two 500GB drives in a RAID 1 array would be cheaper, simpler to manage and faster than three 250GB drives in a RAID 5 array.
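The capacity figures quoted above follow from simple arithmetic: a two-drive mirror keeps one drive's worth of space, and RAID 5 gives up one drive's worth to parity. A minimal sketch, assuming identical drives; the function names are hypothetical.

```python
def usable_capacity_gb(level: str, drive_count: int, drive_gb: float) -> float:
    """Usable capacity of an array of identical drives.

    RAID 1 here means a two-drive mirror; RAID 5 gives up one drive's worth
    of space to parity.
    """
    if level == "RAID 1":
        if drive_count != 2:
            raise ValueError("this sketch treats RAID 1 as a two-drive mirror")
        return drive_gb
    if level == "RAID 5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs three or more drives")
        return (drive_count - 1) * drive_gb
    raise ValueError(f"unsupported level: {level}")


def overhead_percent(level: str, drive_count: int) -> float:
    """Share of raw capacity given up to fault tolerance."""
    usable = usable_capacity_gb(level, drive_count, 1.0)
    return 100 * (1 - usable / drive_count)


print(usable_capacity_gb("RAID 5", 4, 36))    # four 36GB drives
print(usable_capacity_gb("RAID 1", 2, 500))   # two 500GB drives
print(usable_capacity_gb("RAID 5", 3, 250))   # three 250GB drives
```

Calling `overhead_percent("RAID 5", n)` for increasing `n` shows the RAID 5 overhead shrinking from roughly a third as drives are added.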
Pushing the Performance Boundaries with RAID 0
To increase performance, it is possible to use RAID 0, which requires two disks and splits the data across them. Half the data is written to disk one and the second half is written to disk two simultaneously. This is called striping. The theoretical speed is double that of the component drives and, although the increase in speed might not reach the theoretical maximum, it is still a significant performance boost. To test the theory, we performed a series of tests with some 10,000RPM Western Digital Raptor SATA drives and 8GB of data consisting of varying file sizes to simulate real world usage:
|RAID 1 MB/Sec||RAID 0 MB/Sec||Speed Increase|
While RAID 0 does not waste any space, unlike RAID 1 and 5, its huge drawback is that it fails to provide any fault tolerance and it doubles the chance of a fault occurring! It is like putting your eggs into two baskets: if either basket is dropped, you lose the whole lot!
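The "doubles the chance" remark can be checked with basic probability. The sketch below assumes each drive fails independently with the same probability over some period, and it ignores the time taken to replace a failed mirror drive. The 3% figure is purely illustrative, not a measured failure rate.

```python
def raid0_failure_probability(p_drive: float, drive_count: int = 2) -> float:
    """Chance the stripe is lost: any single drive failure is fatal."""
    return 1 - (1 - p_drive) ** drive_count


def raid1_failure_probability(p_drive: float) -> float:
    """Chance a two-drive mirror is lost: both drives must fail."""
    return p_drive ** 2


p = 0.03  # hypothetical per-drive failure probability over the period
print(f"single drive: {p:.4f}")
print(f"RAID 0:       {raid0_failure_probability(p):.4f}")
print(f"RAID 1:       {raid1_failure_probability(p):.4f}")
```

For small probabilities the RAID 0 figure is very close to twice the single-drive figure, which is where the rule of thumb comes from.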
RAID 10 to the Rescue
By coupling RAID 1 and RAID 0, you can create a volume that is fault tolerant and faster than its component drives. This is called RAID 10 (or RAID 1+0). You need four or more drives and you will lose 50% of your storage capacity as with RAID 1. You start by creating two RAID 1 mirrored volumes and then use RAID 0 to stripe them. This provides the optimum level of fault tolerance and speed.
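The fault tolerance of such an array can be sketched as a simple rule: the data survives as long as no mirror has lost both of its drives. The drive numbering and pairing below are assumptions made for illustration, as is the function name.

```python
def raid10_survives(failed: set[int], drive_count: int = 4) -> bool:
    """True if a RAID 10 array still holds all its data.

    Drives are numbered from 0 and mirrored in pairs: (0, 1), (2, 3), ...
    The array is lost only when both drives of one mirror have failed.
    """
    if drive_count < 4 or drive_count % 2:
        raise ValueError("this sketch expects an even number of drives, four or more")
    pairs = [(d, d + 1) for d in range(0, drive_count, 2)]
    return not any(a in failed and b in failed for a, b in pairs)


print(raid10_survives({0}))       # one drive lost
print(raid10_survives({0, 2}))    # one drive lost from each mirror
print(raid10_survives({0, 1}))    # both drives of the same mirror lost
```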
I am building a TeleForm Workgroup or Enterprise system. What are the main network configuration issues to consider?
When building a TeleForm Workgroup or Enterprise system, there are many considerations when designing the network. The network design will inevitably influence where the various TeleForm functions below can be physically located:
- eForms Submission via eMail
- eForms Submission via Web
- Scan via network scanner
- Web Capture
- Remote Capture Station
- Scan Station
- Auto Merge Publisher
eForms submission via e-mail and web
Users submitting data via PDF or HTML forms can be located on any type of connected remote network. Naturally, a working e-mail client and the full version of Acrobat are required for e-mail submission, and a local network or internet connection is required for on-line web submission. On-line web submission uses an HTTP POST to a CGI or ASP script, so the only networking requirement is either port 80 (HTTP) for unsecured or port 443 (HTTPS) for secured submission, although alternative ports can be configured if required.
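Since the only networking requirement for web submission is an open port, a quick reachability check from the client side can save time when troubleshooting. The following is a minimal sketch using Python's standard `socket` module. The function name is an assumption, and the check only confirms that a TCP connection can be opened, not that the submission script itself is working.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
```

It would typically be called once for port 80 and once for port 443 (or whichever alternative ports have been configured), passing the name of your own web server as `host`.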
Scan via network scanner
Whilst the TeleForm scanning modules are best used from a local device, we recommend you implement network scanners if there are many sites or branches wishing to scan forms to a central system. We have worked with several multi-functional devices with network scanners built-in including the HP Digital Sender range. These devices usually connect directly to the network via Ethernet and do not require a controlling PC. Images are usually saved to a network folder as TIFF or PDF files and TeleForm can periodically monitor this folder and create batches or non-batches from them.
The network requirements for such devices depend on the unit used. Most support UNC paths for local networks whilst some support FTP for sending to remote locations.
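The idea of periodically monitoring a network folder for new images can be illustrated with a short polling function. This is a generic sketch and not a description of how TeleForm implements its own monitoring. The function name, the set of file extensions and the in-memory record of seen files are all assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".tif", ".tiff", ".pdf"}


def new_scans(folder: Path, already_seen: set[str]) -> list[Path]:
    """Return image files in folder that have not been seen before.

    Names of returned files are added to already_seen so that the next
    poll skips them.
    """
    found = sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and p.name not in already_seen
    )
    already_seen.update(p.name for p in found)
    return found
```

A scheduler or simple loop would call `new_scans` at a fixed interval and hand each returned file to whatever creates the batch.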
Web capture
The Web Capture option is included with TeleForm Enterprise and is an option for TeleForm Workgroup. It allows a web interface to be deployed on a server to allow remote branches to connect via the internet or intranet and submit scanned images. This allows remote branches to utilise high speed scanners as opposed to the relatively slow and impractical network scanner type devices.
Initially, the client machine needs permissions to install the ActiveX scanner module, but subsequently only a web browser and internet connection are required. Due to the file sizes, it may take a few minutes to submit the scanned batch if the connection is slow.
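To get a feel for how long a submission might take, the transfer time can be estimated from the batch size and link speed. The sketch below gives an ideal figure that ignores protocol overhead and other traffic. The 20MB batch size is a hypothetical value, not a measured one.

```python
def transfer_seconds(batch_megabytes: float, link_kilobits_per_second: float) -> float:
    """Ideal time to send a batch, ignoring protocol overhead and contention."""
    bits = batch_megabytes * 8_000_000          # 1 MB taken as 1,000,000 bytes
    return bits / (link_kilobits_per_second * 1_000)


batch_mb = 20  # hypothetical size of one scanned batch
for label, kbps in (("256Kbit", 256), ("2Mbit", 2_000), ("100Mbit", 100_000)):
    print(f"{label}: {transfer_seconds(batch_mb, kbps):.1f} s")
```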
Remote capture station
Rather than being deployed via a web browser, the Remote Capture Station is installed from a CD. It is similar to the full TeleForm Scan Station in its appearance and functionality, but you can set up the submission to the TeleForm server to be via a UNC path or FTP.
Its main advantage over the Web Capture option is the ability to queue up batches before submitting them. Batches can even be configured to be submitted at off-peak times or out of hours, which is really useful if the connection is via an already busy corporate WAN.
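The logic behind an off-peak submission window can be sketched in a few lines. This is a generic illustration rather than the Remote Capture Station's actual scheduling settings. The function name and the default window of 19:00 to 07:00 are assumptions.

```python
from datetime import time


def in_off_peak_window(now: time, start: time = time(19, 0), end: time = time(7, 0)) -> bool:
    """True if now falls inside the submission window.

    The window may cross midnight (for example 19:00 to 07:00).
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


print(in_off_peak_window(time(23, 0)))   # late evening
print(in_off_peak_window(time(12, 0)))   # midday
```

Queued batches would be held while the function returns `False` and released once it returns `True`.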
Designer, scan station, reader, auto merge publisher and verifier
All the remaining TeleForm modules can only be run locally by default. Although TeleForm V9 onwards is much more tolerant of slow networks than older versions, a 100Mbit non-routed connection is still recommended. We have tested on links as slow as 256Kbit and, although not usable on a production basis, the software does start up and run adequately. This is largely due to efficient networking methods used by Microsoft SQL Server and the installation of TeleForm client executables on local workstations rather than the server.
However, you can run all but the Scan Station via Citrix Metaframe. This allows remote staff to design forms and verify data for a truly dispersed, centrally managed, enterprise solution.
What is a virtual server?
One of the big new technologies of the last few years has been virtualisation. Although the concept had been around for years, the combination of power and energy savings, server consolidation, multi-core processing and remote server centres has created the perfect environment for mass adoption.
A virtual machine is a complete PC or server with a virtual hard drive, memory and operating system operating inside an emulator on a host machine. This allows you to run multiple virtual machines on one physical host server or PC, installed with applications and connected to your network as if they were real machines.
Virtual machines offer:
- Better utilisation of server hardware for fewer physical servers
- More energy efficient server rooms
- Savings in new hardware budgets
- Faster deployment of new clean or standard build platforms
- Hardware independence - simply move from one physical server to another
- Simple whole server backup with quick restoration
Disadvantages of virtualisation:
- Licensing software can become complicated
- Virtual machine performance is slower than a physical server
- Some hardware is incompatible with a virtual environment
We have implemented both TeleForm and LiquidOffice into Microsoft Virtual Server 2005, Hyper-V and VMware ESXi for a variety of purposes. Most virtualisation solutions impact performance. The usual application of a virtualised system is for Disaster Recovery or testing purposes. Careful consideration should be given to implementing a live system on a virtual environment.
Hard drive performance is often much slower in a virtual environment and it will be necessary to invest in faster hardware or multiple physical servers to achieve the same level of performance as a physical server.
Microsoft limit the number of virtual servers that can be run on a physical server according to the physical server's license. For example, Server 2008 Enterprise allows four virtual servers while Server 2008 Standard allows one. Supporting software (SQL Server) will need to be licensed correctly in addition to applications such as LiquidOffice and TeleForm.
If software resides on a single physical server, please do not assume manufacturers allow a single license to be virtualised multiple times - even if only a backup.
As previously explained, a virtual machine runs in an emulator which duplicates the core hardware (CPU, memory, graphics card etc.) required to run an operating system. This means that the virtual server only sees the emulated hardware and none of the real physical hardware. This provides hardware independence and allows virtual servers with different hardware specifications to be moved from one physical server to another. In such scenarios, the virtual server will continue to run without the need to update drivers.
However, hardware such as fax cards, parallel ports and physical network cards cannot be accessed by a virtual server. This would make it impossible to run a ZetaFax fax server on a virtual server with fax cards or to install TeleForm V8 with a hardware key connected to a parallel port (Virtual Server 2005 and ESXi V3.5 do support parallel ports but Hyper-V does not).
It should be noted that TeleForm V10 requires machine locking during the activation process which requires a physical network card. Whilst a Hyper-V virtual server could be used as a TeleForm SQL or file server, this is not possible for the activation or license service. VMware's ESXi emulates network cards differently to enable use in the TeleForm activation process.
Please contact us with details of your specific requirements and we can advise further or arrange compatibility testing if required.
Note to editors: Please feel free to reproduce FAQ information in whole or part but we do request that you credit ePartner Consulting Ltd and place a link back to our website whenever information is reproduced.
The FAQs are compiled on the basis of questions received by ePC. If you have a question about data capture or workflow software, please contact us.