Mini-HOWTO: Multi Disk System Tuning
Stein Gjoen, sgjoen@nyx.net
v0.10c, 29 December 1996

This document describes how best to use multiple disks and partitions for a Linux system. Although some of this text is Linux specific, the general approach outlined here can be applied to many other multi-tasking operating systems.

1. Introduction

In commemoration of the "Linux Hacker V2.0 - The New Generation" this brand new release is code named the Pink Socks release. New code names will appear as per industry standard guidelines to emphasize the state-of-the-art-ness of this document.

This document was written for two reasons: mainly because I got hold of 3 old SCSI disks to set up my Linux system on and was pondering how best to utilise the inherent possibilities for parallelizing in a SCSI system. Secondly, I hear there is a prize for people who write documents...

This is intended to be read in conjunction with the Linux Filesystem Structure Standard (FSSTND). It does not in any way replace it but tries to suggest where physically to place the directories detailed in the FSSTND, in terms of drives, partitions, types, RAID, file system (fs), physical sizes and other parameters that should be considered and tuned in a Linux system, ranging from single home systems to large servers on the Internet.

Even though it is now more than a year since the last release of the FSSTND, work is still continuing, under a new name, and will encompass more than Linux, fill in a few blanks hinted at in FSSTND version 1.2 and make other general improvements. The development mailing list is currently private but a general release is hopefully in the near future.

It is also a good idea to read the Linux Installation guides thoroughly, and if you are using a PC system, which I guess the majority still do, you can find much relevant and useful information in the FAQs for the newsgroup comp.sys.ibm.pc.hardware, especially for storage media.

This is also a learning experience for myself and I hope I can start the ball rolling with this Mini-HOWTO and that it perhaps can evolve into a larger, more detailed and hopefully even more correct HOWTO.

Note that this is a guide on how to design and map logical partitions onto multiple disks and tune for performance and reliability, NOT how to actually partition the disks or format them - yet.

1.1. News

Hot news: I have finally upgraded my system to Debian 1.1.11 and have replaced the old Slackware values with the Debian values for the disk space requirements of the various directories. As it happens I installed version 1.1.11 just a few days before Debian 1.2 hit the streets. There are no points for guessing what will appear in the next release of this document.

Also new in this release is a Questions-and-Answers section. As you, the reader, have hardly given me any feedback I will try to anticipate the most common questions and give a quick summary. There is also now a section detailing a setup based on this document for a computer in an academic setting, using a combination of IDE and SCSI drives.

Recent news: finally I have found the time to add information on physical storage media, so I am not far off from releasing the whole shebang as a HOWTO, a big upgrade from the original mini-HOWTO.

More news: there has been a fair bit of interest in new kinds of file systems in the comp.os.linux newsgroups, in particular logging, journaling and inherited file systems. Watch out for updates. Projects on volume management are also under way.
The latest version number of this document can be gleaned from my plan entry if you do "finger sgjoen@nox.nyx.net". Also, from now on the latest version will be available on my web space on nyx: The Multiple Disk Layout mini-HOWTO Homepage. A text-only version as well as the SGML source can also be downloaded from my homepage. Also planned is a series of URLs to helpful software referred to in this document. A mirror in Europe will be announced soon.

1.2. Credits

In this version I have the pleasure of acknowledging even more people who have contributed in one way or another:

     ronnej@ucs.orst.edu
     cm@kukuruz.ping.at
     armbru@pond.sub.org
     R.P.Blake@open.ac.uk
     neuffer@goofy.zdv.Uni-Mainz.de
     sjmudd@phoenix.ea4els.ampr.org
     nat@nataa.fr.eu.org
     sundbyk@horten.geco-prakla.slb.com
     gjoen@sn.no

Special thanks go to nakano@apm.seikei.ac.jp for doing the Japanese translation, for general contributions, and for contributing an example of a computer in an academic setting, which is included at the end of this document.

Not many still, so please read through this document, make a contribution and join the elite. If I have forgotten anyone, please let me know.

New in this version is an appendix with a few tables you can fill in for your system in order to simplify the design process.

Any comments or suggestions can be mailed to my mail address on nyx: sgjoen@nyx.net.

So let's cut to the chase where swap and /tmp are racing along the hard drive...

2. Structure

As this type of document is supposed to be as much for learning as a technical reference I have rearranged the structure to this end. For the designer of a system it is more useful to have the information presented in terms of the goals of this exercise than from the point of view of the logical layer structure of the devices themselves. Nevertheless this document would not be complete without the kind of layer structure the computer field is so full of, so I will include one here as an introduction to how it all works.

2.1. Logical structure

This is based on how each layer accesses the others, traditionally with the application on top and the physical layer on the bottom. It is quite useful to show the interrelationship between each of the layers used in controlling drives.

     ___________________________________________________________
     |__     File structure ( /usr /tmp etc)                  __|
     |__     File system (ext2fs, vfat etc)                   __|
     |__     Volume management (AFS)                          __|
     |__     RAID, concatenation (md)                         __|
     |__     Device driver (SCSI, IDE etc)                    __|
     |__     Controller (chip, card)                          __|
     |__     Connection (cable, network)                      __|
     |__     Drive (magnetic, optical etc)                    __|
     -----------------------------------------------------------

In the above diagram both volume management and RAID/concatenation are optional layers. The 3 lower layers are in hardware. All parts are discussed at length later on in this document.

2.2. Document structure

Most users start out with a given set of hardware and some plans on what they wish to achieve and how big the system should be. This is the point of view I will adopt in this document in presenting the material, starting out with hardware, continuing with design constraints before detailing the design strategy that I have found to work well. I have used this both for my own personal computer at home and for a multi purpose server at work, and found it worked quite well. In addition my Japanese co-worker in this project has applied the same strategy on a server in an academic setting with similar success.
Finally at the end I have detailed some configuration tables for use in your own design. If you have any comments regarding this or notes from your own design work I would like to hear from you so this document can be upgraded.

3. Drive technologies

A far more complete discussion on drive technologies for IBM PCs can be found at the home page of The Enhanced IDE/Fast-ATA FAQ which is also regularly posted on Usenet News. Here I will just present what is needed to get an understanding of the technology and get you started on your setup.

3.1. Drives

This is the physical device where your data lives and although the operating system makes the various types seem rather similar they can in actual fact be very different. An understanding of how it works can be very useful in your design work. Floppy drives fall outside the scope of this document, though should there be a big demand I could perhaps be persuaded to add a little here.

3.2. Geometry

Physically disk drives consist of one or more platters containing data that is read in and out using sensors mounted on movable heads that are fixed with respect to each other. Data transfers therefore happen across all surfaces simultaneously, which defines a cylinder of tracks. The drive is also divided into sectors containing a number of data fields.

Drives are therefore often specified in terms of their geometry: the number of Cylinders, Heads and Sectors (CHS). For various reasons there are now a number of translations between

o the physical CHS of the drive itself

o the logical CHS the drive reports to the BIOS or OS

o the logical CHS used by the OS

Basically it is a mess and a source of much confusion. For more information you are strongly recommended to read the Large Disk mini-HOWTO.

3.3. Media

The media technology determines important parameters such as read/write rates, seek times and storage size, as well as whether it is read/write or read-only.

3.3.1. Magnetic Drives

This is the typical read-write mass storage medium, and, as everything else in the computer world, it comes in many flavours with different properties. Usually this is the fastest technology and offers read/write capability. The platter rotates with a constant angular velocity (CAV) with a variable physical sector density for more efficient magnetic media area utilisation. In other words, the number of bits per unit length is kept roughly constant by increasing the number of logical sectors for the outer tracks.

Seek times are around 10 ms, transfer rates quite variable from one type to another but typically 4-40 MB/s.

Drives are often described by their geometry or drive parameters, that is the number of heads, sectors and cylinders; this is confused by translation schemes between physical and various logical geometries. This is a mine field which is described in painful detail in many storage related FAQs. Read and weep.

3.3.2. Optical drives

Optical read/write drives exist but are slow and not so common. They were used in the NeXT machine but the low speed was a source for much of the complaints. The low speed is mainly due to the thermal nature of the phase change that represents the data storage. Even when using relatively powerful lasers to induce the phase changes the effects are still slower than the magnetic effect used in magnetic drives.

Today many people use CD-ROM drives which, as the name suggests, are read-only. Storage is about 650MB, transfer speeds are variable, depending on the drive, but can exceed 1.5MB/s.
Data is stored on a single spiraling track so it is not useful to talk about geometry for this. Data density is constant so the drive uses constant linear velocity (CLV). Seek is also slower, about 100 ms, partially due to the spiraling track. A new type (DVD) is on the horizon, offering up to about 18GB on a single disk.

3.3.3. Solid State Drives

This is a relatively recent addition to the available technology and has been made popular especially in portable computers as well as in embedded systems. Containing no movable parts they are very fast both in terms of access and transfer rates. The most popular type is flash RAM, but other types of RAM are also used. A few years ago many had great hopes for magnetic bubble memories but they turned out to be relatively expensive and are not that common.

In general the use of RAM disks is regarded as a bad idea as it is normally more sensible to add more RAM to the motherboard and let the operating system divide the memory pool into buffers, cache, program and data areas. Only in very special cases, such as real time systems with short time margins, can RAM disks be a sensible solution.

Flash RAM is today available in sizes of several tens of megabytes and one might be tempted to use it for fast, temporary storage in a computer. There is however a huge snag with this: flash RAM has a finite life time in terms of the number of times you can rewrite data, so putting swap, /tmp or /var/tmp on such a device will certainly shorten its lifetime dramatically. Instead, using flash RAM for directories that are read often but rarely written to will be a big performance win.

This example illustrates the advantages of splitting up your directory structure over several devices.

Solid state drives have no real cylinder/head/sector addressing but for compatibility reasons this is faked by the driver to give a uniform interface to the operating system.

3.4. Interfaces

There is a plethora of interfaces to choose from, widely ranging in price and performance. Most motherboards today include an IDE interface or better; Intel supports it through the Triton PCI chip set which is very popular these days. Many motherboards also include a SCSI interface chip made by NCR that is connected directly to the PCI bus. Check what you have and what BIOS support you have with it.

3.4.1. MFM and RLL

Once upon a time this was the established technology, a time when 20MB was awesome, which compared to today's sizes makes you think that dinosaurs roamed the Earth with these drives. Like the dinosaurs these are outdated and are slow and unreliable compared to what we have today. Linux does support this but you are well advised to think twice about what you would put on it. One might argue that an emergency partition with a suitable vintage of DOS might be fitting.

3.4.2. IDE and ATA

Progress made the drive electronics migrate from the ISA slot card over to the drive itself and Integrated Drive Electronics was born. It was simple, cheap and reasonably fast, so the BIOS designers provided the kind of snag that the computer industry is so full of. A combination of an IDE limitation of 16 heads together with the BIOS limitation of 1024 cylinders gave us the infamous 504MB limit. Following computer industry tradition again, the snag was patched with a kludge and we got all sorts of translation schemes and BIOS bodges.
This means that you need to read the installation documentation very carefully and check up on what BIOS you have and what date it has, as the BIOS has to tell Linux what size drive you have. Fortunately with Linux you can also tell the kernel directly what size drive you have with the drive parameters; check the documentation for LILO and Loadlin thoroughly. Note also that IDE is equivalent to ATA, AT Attachment. IDE uses CPU-intensive Programmed Input/Output (PIO) to transfer data to and from the drives and has no capability for the more efficient Direct Memory Access (DMA) technology. The highest transfer rate is 8.3MB/s.

3.4.3. EIDE, Fast-ATA and ATA-2

These 3 terms are roughly equivalent: Fast-ATA is ATA-2, but EIDE additionally includes ATAPI. ATA-2 is what most use these days; it is faster and supports DMA. The highest transfer rate is increased to 16.6MB/s.

3.4.4. ATAPI

The ATA Packet Interface was designed to support CD-ROM drives using the IDE port, and like IDE it is cheap and simple.

3.4.5. SCSI

The Small Computer System Interface is a multi purpose interface that can be used to connect everything from drives and disk arrays to printers, scanners and more. The name is a bit of a misnomer as it has traditionally been used by the higher end of the market as well as in workstations, since it is well suited for multi tasking environments.

The standard interface is 8 bits wide and can address 8 devices. There is a wide version with 16 bits that is twice as fast on the same clock and can address 16 devices. The host adapter always counts as a device and is usually number 7. The old standard was 5MB/s and the newer Fast-SCSI increased this to 10MB/s. Recently Ultra-SCSI, also known as Fast-20, arrived with 20MB/s transfer rates for an 8 bit wide bus.

The higher performance comes at a cost that is usually higher than for (E)IDE. The importance of correct termination and good quality cables cannot be overemphasized. SCSI drives also often tend to be of a higher quality than IDE drives. Also, adding SCSI devices tends to be easier than adding more IDE drives.

There are a number of useful documents you should read if you use SCSI: the SCSI HOWTO as well as the SCSI FAQ posted on Usenet News.

SCSI also has the advantage that you can easily connect tape drives for backing up your data, as well as some printers and scanners. It is even possible to use it as a very fast network between computers while simultaneously sharing SCSI devices on the same bus. Work is under way but due to problems with ensuring cache coherency between the different computers connected, this is a non trivial task.

3.5. Host Adapters

This is the other end of the interface from the drive, the part that is connected to a computer bus. The speed of the computer bus and that of the drives should be roughly similar, otherwise you have a bottleneck in your system. Connecting a RAID 0 disk-farm to an ISA card is pointless. These days most computers come with a 32 bit PCI bus capable of 132MB/s transfers which should not represent a bottleneck for most people in the near future.

As the drive electronics migrated to the drives, the remaining part that became the (E)IDE interface is so small that it can easily fit into the PCI chip set. The SCSI host adapter is more complex and often includes a small CPU of its own and is therefore more expensive and not integrated into the PCI chip sets available today. Technological evolution might change this.
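If you are unsure which interfaces and drives your kernel has actually detected, the proc file system offers a quick check. A minimal sketch, assuming a reasonably recent kernel with /proc mounted and SCSI support compiled in (the exact output naturally depends on your hardware):

     cat /proc/scsi/scsi        # lists the SCSI devices the kernel has found
     dmesg | grep hd            # shows the IDE drives and geometries reported at boot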
Some host adapters come with separate caching and intelligence but as this is basically second guessing the operating system the gains are heavily dependent on which operating system is used. Some of the more primitive ones, that shall remain nameless, experience great gains. Linux, on the other hand, has so much smarts of its own that the gains are much smaller.

3.6. Comparisons

SCSI offers more performance than EIDE but at a price. Termination is more complex but expansion is not too difficult. Having more than 4 (or in some cases 2) IDE drives can be complicated, while with wide SCSI you can have up to 15. Some SCSI host adapters have several channels, thereby multiplying the number of possible drives even further.

RLL and MFM are in general too old, slow and unreliable to be of much use.

3.7. Future Development

The general trend is for faster and faster devices for every update in the specifications. ATA-3 is just out but does not define faster transfers; that could happen in ATA-4 which is under way. Quantum has already released DMA/33.

SCSI-3 is under way and will hopefully be released soon. Faster devices are already being announced; most recently an 80MB/s monster specification has been proposed. Some manufacturers already announce SCSI-3 devices but this is currently rather premature as the standard is not yet firm.

As the transfer speeds increase the saturation point of the PCI bus is getting closer. Currently the 64 bit version has a limit of 264MB/s. The PCI transfer rate will in the future be increased from the current 33MHz to 66MHz, thereby increasing the limit to 528MB/s.

Another trend is for larger and larger drives. I hear it is possible to get 28GB on a single drive though this is rather expensive. Currently the optimum storage for your money is about 2GB, but this too is continuously increasing. The introduction of DVD will in the near future have a big impact; with nearly 20GB on a single disk you can have a complete copy of even major FTP sites from around the world. The only thing we can be reasonably sure about regarding the future is that even if it won't get any better, it will definitely be bigger.

3.8. Recommendations

My personal view is that EIDE is the best way to start out on your system, especially if you intend to use DOS as well on your machine. If you plan to expand your system over many years or use it as a server I would strongly recommend you get SCSI drives. Currently wide SCSI is a little more expensive; you are generally more likely to get more for your money with standard width SCSI. There are also differential versions of the SCSI bus which increase the maximum length of the cable. The price increase is even more substantial and they can therefore not be recommended for normal users.

Also keep in mind that as you expand your system you will draw ever more power, so make sure your power supply is rated for the job and that you have sufficient cooling. Many SCSI drives offer the option of sequential spin-up which is a good idea for large systems.

I do not want to say too much about low level hardware here but I have to make an exception for SCSI. Some people have a bit of trouble with this and in the majority of cases the cause is sub-standard cabling. Certain SCSI adapters are known to be very sensitive to the quality of the cables, see the SCSI HOWTO. The importance of correct cabling and termination cannot be overemphasized, read the manuals carefully.
Also, with the 20MHz Ultra standard you have to keep in mind that there is now a minimum distance of 30cm between devices.

4. Considerations

The starting point in this will be to consider where you are and what you want to do. The typical home system starts out with existing hardware and the newly converted Linux user will want to get the most out of that hardware. Someone setting up a new system for a specific purpose (such as an Internet provider) will instead have to consider what the goal is and buy accordingly. Being ambitious I will try to cover the entire range.

Various purposes will also have different requirements regarding file system placement on the drives; a large multiuser machine would probably be best off with the /home directory on a separate disk, just to give an example.

In general, for performance it is advantageous to split most things over as many disks as possible, but there is a limited number of devices that can live on a SCSI bus and cost is naturally also a factor. Equally important, file system maintenance becomes more complicated as the number of partitions and physical drives increases.

4.1. File system features

The various parts of the FSSTND have different requirements regarding speed, reliability and size; for instance, losing root is a pain but can easily be recovered. Losing /var/spool/mail is a rather different issue. Here is a quick summary of some essential parts and their properties and requirements. Note that this is just a guide; there can be binaries in etc and lib directories, libraries in bin directories and so on.

4.1.1. Swap

Speed
     Maximum! Though if you rely too much on swap you should consider buying some more RAM.

Size
     Similar to the amount of RAM. Quick and dirty algorithm, just as for tea: 16MB for the machine and 2MB for each user. The smallest kernel runs in 1MB but that is tight; use 4MB for general work and light applications, 8MB for X11 or GCC, or 16MB to be comfortable. (The author is known to brew a rather powerful cuppa tea...) Some suggest that swap space should be 1-2 times the size of the RAM, pointing out that the locality of the programs determines how effective your added swap space is. Note that using the same algorithm as for 4BSD is slightly incorrect as Linux does not allocate space for pages in core.

     Also remember to take into account the type of programs you use. Some programs that have large working sets, such as finite element modeling (FEM), have huge data structures loaded in RAM rather than working explicitly on disk files. Data and computing intensive programs like this will cause excessive swapping if you have less RAM than they require.

     Other types of programs can lock their pages into RAM. This can be for security reasons, preventing copies of data reaching a swap device, or for performance reasons such as in a real time module. Either way, locking pages reduces the remaining amount of swappable memory and can cause the system to swap earlier than otherwise expected.

Reliability
     Medium. When it fails you know it pretty quickly and failure will cost you some lost work. You save often, don't you?

Note 1
     Linux offers the possibility of interleaved swapping across multiple devices, a feature that can gain you much. Check out "man 8 swapon" for more details. However, software RAIDing swap across multiple devices adds more overhead than you gain.
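As a quick illustration (only a sketch; the device names are examples and your version of swapon must support priorities, see the man page), two swap partitions can be activated at equal priority from the command line:

     swapon -p 1 /dev/sda1
     swapon -p 1 /dev/sdc1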
To make this permanent, the fstab file might look like this:

     /dev/sda1       swap            swap    pri=1           0       0
     /dev/sdc1       swap            swap    pri=1           0       0

Remember that the fstab file is very sensitive to the formatting used; read the man page carefully and do not just cut and paste the lines above.

Note 2
     Some people use a RAM disk for swapping or for some other file systems. However, unless you have some very unusual requirements or setups you are unlikely to gain much from this, as it cuts into the memory available for caching and buffering.

4.1.2. Temporary storage (/tmp and /var/tmp)

Speed
     Very high. On a separate disk/partition this will reduce fragmentation generally, though ext2fs handles fragmentation rather well.

Size
     Hard to tell; small systems are easy to run with just a few MB, but these are notorious hiding places for stashing files away from prying eyes and quota enforcement and can grow without control on larger machines. Suggested: small home machine: 8MB, large home machine: 32MB, small server: 128MB, and large machines up to 500MB (the machine used by the author at work has 1100 users and a 300MB /tmp directory). Keep an eye on these directories, not only for hidden files but also for old files. Also be prepared that these partitions might be the first reason you have to resize your partitions.

Reliability
     Low. Often programs will warn or fail gracefully when these areas fail or are filled up. Random file errors will of course be more serious, no matter what file area this is.

Files
     Mostly short files but there can be a huge number of them. Normally programs delete their old tmp files but if somehow an interruption occurs they could survive.

Note
     In the FSSTND there is a note about putting /tmp on RAM disk. This, however, is not recommended for the same reasons as stated for swap. Also, as noted earlier, do not use flash RAM drives for these directories. One should also keep in mind that some systems are set to automatically clean tmp areas on rebooting.

(* That was 50 lines, I am home and dry! *)

4.1.3. Spool areas (/var/spool/news and /var/spool/mail)

Speed
     High, especially on large news servers. News transfer and expiring are disk intensive and will benefit from fast drives. Print spools: low. Consider RAID 0 for news.

Size
     For news/mail servers: whatever you can afford. For single user systems a few MB will be sufficient if you read continuously. Joining a list server and taking a holiday is, on the other hand, not a good idea. (Again, the machine I use at work has 100MB reserved for the entire /var/spool.)

Reliability
     Mail: very high, news: medium, print spool: low. If your mail is very important (isn't it always?) consider RAID for reliability.

Files
     Usually a huge number of files that are around a few kB in size. Files in the print spool can on the other hand be few but quite sizable.

Note
     Some of the news documentation suggests putting all the .overview files on a drive separate from the news files; check out all the news FAQs for more information.

4.1.4. Home directories (/home)

Speed
     Medium. Although many programs use /tmp for temporary storage, others such as some news readers frequently update files in the home directory, which can be noticeable on large multiuser systems. For small systems this is not a critical issue.

Size
     Tricky! On some systems people pay for storage so this is usually then a question of finance. Large systems such as nyx.net (which is a free Internet service with mail, news and WWW services) run successfully with a suggested limit of 100K per user and 300K as enforced maximum.
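To get a feel for what your own users actually consume, a quick check is (only an illustration; adjust the path to your own setup):

     du -sk /home/* | sort -n      # per-user disk usage in kB, smallest first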
Commercial ISPs typically offer about 5MB in their standard subscription packages. If, however, you are writing books or are doing design work the requirements balloon quickly.

Reliability
     Variable. Losing /home on a single user machine is annoying, but when 2000 users call you to tell you their home directories are gone it is more than just annoying. For some, their livelihood relies on what is here. You do regular backups, of course?

Files
     Equally tricky. The minimum setup for a single user tends to be a dozen files, 0.5 - 5 kB in size. Project related files can be huge though.

Note
     You might consider RAID for either speed or reliability. If you want extremely high speed and reliability you might be looking at other operating system and hardware platforms anyway (fault tolerance etc.).

4.1.5. Main binaries ( /usr/bin and /usr/local/bin)

Speed
     Low. Often data is bigger than the programs, which are demand loaded anyway, so this is not speed critical. Witness the successes of live file systems on CD-ROM.

Size
     The sky is the limit but 200MB should give you most of what you want for a comprehensive system. A big system, for software development or a multi purpose server, should perhaps reserve 500MB both for installation and for growth.

Reliability
     Low. This is usually mounted under root where all the essentials are collected. Nevertheless losing all the binaries is a pain...

Files
     Variable but usually of the order of 10 - 100 kB.

4.1.6. Libraries ( /usr/lib and /usr/local/lib)

Speed
     Medium. These are large chunks of data loaded often, ranging from object files to fonts, all susceptible to bloating. Often these are also loaded in their entirety and speed is of some use here.

Size
     Variable. This is for instance where word processors store their immense font files. The few that have given me feedback on this report about 70MB in their various lib directories. The following are some of the largest disk hogs: GCC, Emacs, TeX/LaTeX, X11 and perl.

Reliability
     Low. See the point ``Main binaries''.

Files
     Usually large, with many of the order of 100 kB in size.

Note
     For historical reasons some programs keep executables in the lib areas. One example is GCC, which has some huge binaries in the /usr/lib/gcc/lib hierarchy.

4.1.7. Root

Speed
     Quite low: only the bare minimum is here, much of which is only run at startup time.

Size
     Relatively small. However it is a good idea to keep some essential rescue files and utilities on the root partition, and some keep several kernel versions. Feedback suggests about 20MB would be sufficient.

Reliability
     High. A failure here will possibly cause a fair bit of grief and you might end up spending some time rescuing your boot partition. With some practice you can of course do this in an hour or so, but I would think that if you have had some practice doing this you are also doing something wrong.

     Naturally you do have a rescue disk? Of course it has been updated since you did your initial installation? There are many ready made rescue disks as well as rescue disk creation tools you might find valuable. Presumably investing some time in this saves you from becoming a root rescue expert.

Note 1
     If you have plenty of drives you might consider putting a spare emergency boot partition on a separate physical drive. It will cost you a little bit of space but if your setup is huge the time saved, should something fail, will be well worth the extra space.

Note 2
     For simplicity, and also in case of emergencies, it is not advisable to put the root partition on a RAID level 0 system.
Also, if you use RAID for your boot partition you have to remember to have the md option turned on for your emergency kernel.

4.1.8. DOS etc.

At the risk of sounding heretical I have included this little section about something many reading this document have strong feelings about. Unfortunately many hardware items come with setup and maintenance tools based around those systems, so here goes.

Speed
     Very low. The systems in question are not famed for speed so there is little point in using prime quality drives. Multitasking or multi-threading are not available so the command queueing facility found in SCSI drives will not be taken advantage of. If you have an old IDE drive it should be good enough. The exception is to some degree Win95 and more notably NT, which have multi-threading support and should theoretically be able to take advantage of the more advanced features offered by SCSI devices.

Size
     The company behind these operating systems is not famed for writing tight code so you have to be prepared to spend a few tens of MB depending on what version of the OS or Windows you install. With an old version of DOS or Windows you might fit it all in on 50MB.

Reliability
     Ha-ha. As the chain is no stronger than the weakest link you can use any old drive. Since the OS is more likely to scramble itself than the drive is likely to self destruct you will soon learn the importance of keeping backups here.

     Put another way: "Your mission, should you choose to accept it, is to keep this partition working. The warranty will self destruct in 10 seconds..."

Recently I was asked to justify my claims here. First of all I am not calling DOS and Windows sorry excuses for operating systems. Secondly there are various legal issues to be taken into account. Saying there is a connection between the last two sentences is merely the ravings of the paranoid. Surely. Instead I shall offer the esteemed reader a few key words: DOS 4.0, DOS 6.x and various drive compression tools that shall remain nameless.

4.2. Explanation of terms

Naturally the faster the better, but often the happy installer of Linux has several disks of varying speed and reliability, so even though this document describes performance as 'fast' and 'slow' it is just a rough guide since no finer granularity is feasible. Even so there are a few details that should be kept in mind:

4.2.1. Speed

This is really a rather woolly mix of several terms: CPU load, transfer setup overhead, disk seek time and transfer rate. It is in the very nature of tuning that there is no fixed optimum, and in most cases price is the dictating factor. CPU load is only significant for IDE systems where the CPU does the transfer itself but is generally low for SCSI; see the SCSI documentation for actual numbers. Disk seek time is also small, usually in the millisecond range. This however is not a problem if you use command queueing on SCSI where you then overlap commands, keeping the bus busy all the time. News spools are a special case consisting of a huge number of normally small files so in this case seek time can become more significant.

There are two main parameters that are of interest here:

Seek is usually specified as the average time taken for the read/write head to move from one track to another. This parameter is important when dealing with a large number of small files such as those found in spool areas. There is also the extra rotational delay before the desired sector rotates into position under the head.
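On average the platter has to turn half a revolution before the right sector comes around, so a rough back-of-the-envelope figure for this rotational delay (an illustration only, not a drive specification) is:

     60 s / 5400 rpm = 11.1 ms per revolution  ->  about 5.6 ms average rotational delay
     60 s / 7200 rpm =  8.3 ms per revolution  ->  about 4.2 ms average rotational delay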
As the figures above suggest, this delay is dependent on the angular velocity of the drive, which is why this parameter quite often is quoted for a drive. Common values are 4500, 5400 and 7200 rpm (rotations per minute). Higher rpm reduces this delay but at a substantial cost. Also, drives working at 7200 rpm have been known to be noisy and to generate a lot of heat, a factor that should be kept in mind if you are building a large array or "disk farm".

Transfer is usually specified in megabytes per second. This parameter is important when handling large files that have to be transferred. Library files, dictionaries and image files are examples of this. Drives featuring a high rotation speed also normally have fast transfers, as transfer speed is proportional to angular velocity for the same sector density.

It is therefore important to read the specifications for the drives very carefully, and note that the maximum transfer speed quite often is quoted for transfers out of the on-board cache and not directly from the platter.

4.2.2. Reliability

Naturally no-one would want low reliability disks, but one might be better off regarding old disks as unreliable. Also, for RAID purposes (see the relevant information) it is suggested to use a mixed set of disks so that simultaneous disk crashes become less likely.

So far I have had only one report of total file system failure, but in that case unstable hardware seemed to be the cause of the problems.

4.2.3. Files

The average file size is important in order to decide the most suitable drive parameters. A large number of small files makes the average seek time important whereas for big files the transfer speed is more important. The command queueing in SCSI devices is very handy for handling large numbers of small files, but for transfer IDE is not too far behind SCSI and normally much cheaper than SCSI.

4.3. Technologies

In order to decide how to get the most out of your devices you need to know what technologies are available and their implications. As always there can be some tradeoffs with respect to speed, reliability, power, flexibility, ease of use and complexity.

4.3.1. RAID

This is a method of increasing reliability, speed or both by using multiple disks in parallel, thereby decreasing access time and increasing transfer speed. A checksum or mirroring system can be used to increase reliability. Large servers can take advantage of such a setup but it might be overkill for a single user system unless you already have a large number of disks available. See other documents and FAQs for more information.

For Linux one can set up a RAID system using either software (the md module in the kernel) or hardware, using a Linux compatible controller. Check the documentation for what controllers can be used. A hardware solution is usually faster, and perhaps also safer, but comes at a significant cost.

Currently the only supported hardware SCSI RAID controllers are the SmartCache I/III/IV and SmartRAID I/III/IV controller families from DPT. These controllers are supported by the EATA-DMA driver in the standard kernel. This company also has an informative home page which describes various general aspects of RAID and SCSI in addition to the product related information.

RAID comes in many levels and flavours, of which I will give a brief overview here. Much has been written about it and the interested reader is recommended to read more about this in the RAID FAQ.

o RAID 0 is not redundant at all but offers the best throughput of all the levels here.
  Data is striped across a number of drives so read and write operations take place in parallel across all drives. On the other hand, if a single drive fails then everything is lost. Did I mention backups?

o RAID 1 is the most primitive method of obtaining redundancy, by duplicating data across all drives. Naturally this is massively wasteful but you get one substantial advantage, which is fast access. The drive that accesses the data first wins. Transfers are not any faster than for a single drive, even though you might get some faster read transfers by using one track reading per drive. Also, if you have only 2 drives this is the only method of achieving redundancy.

o RAID 2, 3 and 4 are not so common and are not covered here.

o RAID 5 offers excellent redundancy without wasteful duplication. It is fast in reading but not so fast for writing. It is normally recommended to use at least 3, preferably more than 5, drives for this level.

There are also hybrids available based on RAID 1 and one other level. Many combinations are possible but I have only seen a few referred to. These are more complex than the above mentioned RAID levels.

RAID 0/1 combines striping with duplication, which gives very high transfers combined with fast seeks as well as redundancy. The disadvantage is high disk consumption as well as the above mentioned complexity.

RAID 1/5 combines the speed and redundancy benefits of RAID 5 with the fast seek of RAID 1. Redundancy is improved compared to RAID 0/1 but disk consumption is still substantial. Implementing such a system would typically involve more than 6 drives, perhaps even several controllers or SCSI channels.

4.3.2. AFS, Veritas and Other Volume Management Systems

Although multiple partitions and disks have the advantage of making for more space and higher speed and reliability, there is a significant snag: if for instance the /tmp partition is full you are in trouble even if the news spool is empty, as it is not easy to retransfer quotas across partitions. Volume management is a system that does just this, and AFS and Veritas are two of the best known examples. Some also offer other file systems, like log file systems and others optimised for reliability or speed.

Note that Veritas is not available (yet) for Linux and it is not certain they can sell kernel modules without providing source for their proprietary code; this is just mentioned for information on what is out there. Still, you can check their home page to see how such systems function.

Derek Atkins, of MIT, ported AFS to Linux and has also set up the Linux AFS mailing list for this, which is open to the public. Requests to join the list should go to Request and finally bug reports should be directed to Bug Reports. Important: as AFS uses encryption it is restricted software and cannot easily be exported from the US.

AFS is now sold by Transarc and they have set up a www site. The directory structure there has been reorganized recently so I cannot give a more accurate URL than just the Transarc Home Page which lands you in the root of the web site. There you can also find much general information as well as a FAQ.

Volume management is for the time being an area where Linux is lacking. Hot news: someone has just started a virtual partition system project that will reimplement many of the volume management functions found in IBM's AIX system.

4.3.3. Linux md Kernel Patch

There is however one kernel project that attempts to do some of this, md, which has been part of the kernel distributions since 1.3.69.
Currently providing spanning and RAID it is still in early development and people are reporting varying degrees of success as well as total wipe-outs. Use with caution.

4.3.4. General File System Consideration

In the Linux world ext2fs is well established as a general purpose system. Still, for some purposes others can be a better choice. News spools lend themselves to a log file based system whereas high reliability data might need other formats. This is a hotly debated topic and there are currently few choices available, but work is underway. Log file systems also have the advantage of very fast file checking. Mail servers in the 100GB class can suffer file checks taking several days before becoming operational after rebooting.

Adam Richter from Yggdrasil posted recently that they have been working on a compressed log file based system but that this project is currently on hold. Nevertheless a non-working version is available on their FTP server. Check out the Yggdrasil FTP server where special patched versions of the kernel can be found. Hopefully this will be rolled into the mainstream kernel in the near future.

There is room for access control lists (ACL) and other unimplemented features in the existing ext2fs; stay tuned for future updates. There has been some talk about adding on-the-fly compression too.

There is also an encrypted file system available, but again as this is under export control from the US, make sure you get it from a legal place.

File systems are an active field of academic and industrial research and development, the results of which are quite often freely available. Linux has in many cases been a development tool in such activities so you can expect a lot of continuous work in this field; stay tuned for the latest developments.

4.3.5. Compression

Disk versus file compression is a hotly debated topic, especially regarding the added danger of file corruption. Nevertheless there are several options available for the adventurous administrator. These take on many forms, from kernel modules and patches to extra libraries, but note that most suffer various forms of limitations such as being read-only. As development takes place at breakneck speed the specs have undoubtedly changed by the time you read this. As always: check the latest updates yourself. Here only a few references are given.

o DouBle features file compression with some limitations.

o Zlibc adds transparent on-the-fly decompression of files as they load.

o There are many modules available for reading compressed files or partitions that are native to various other operating systems, though currently most of these are read-only.

Also there is the user file system (userfs) that allows FTP based file systems and some compression (arcfs) plus fast prototyping and many other features.

Recent kernels feature the loop or loopback device which can be used to put a complete file system within a file. There are some possibilities for using this for making new file systems with compression, tarring, encryption etc. Note that this device is unrelated to the network loopback device.

Very recently a compression package that extends ext2fs was announced. It is still under testing and will therefore mainly be of interest for kernel hackers but should soon gain stability for wider use.

4.3.6. Physical Track Positioning

This trick used to be very important when drives were slow and small, and some file systems used to take the varying characteristics into account when placing files.
Higher overall speed, on-board drive and controller caches and intelligence have reduced the effect of this. Nevertheless there is still a little to be gained even today. As we know, "world dominance" is soon within reach but to achieve this "fast" we need to employ all the tricks we can use.

To understand the strategy we need to recall this near ancient piece of knowledge and the properties of the various track locations. This is based on the fact that transfer speeds generally increase for tracks further away from the spindle, as well as the fact that it is faster to seek to or from the central tracks than to or from the inner or outer tracks.

Most drives use disks running at constant angular velocity but use (fairly) constant data density across all tracks. This means that you will get much higher transfer rates on the outer tracks than on the inner tracks; a characteristic which fits the requirements for large libraries well.

Newer disks use a logical geometry mapping which differs from the actual physical mapping and which is transparently handled by the drive itself. This makes the estimation of the "middle" tracks a little harder.

Inner tracks are usually slow in transfer and, lying at one end of the seek range, they are also slow to seek to. This is more suitable for the low end directories such as DOS, root and print spools.

Middle tracks are on average faster with respect to transfers than inner tracks and, being in the middle, also on average faster to seek to. This characteristic is ideal for the most demanding parts such as swap, /tmp and /var/tmp.

Outer tracks have on average even faster transfer characteristics but, like the inner tracks, are at the end of the seek range, so statistically they are equally slow to seek to as the inner tracks. Large files such as libraries would benefit from a place here.

Hence seek time reduction can be achieved by positioning frequently accessed tracks in the middle so that the average seek distance and therefore the seek time is short. This can be done either by using fdisk or cfdisk to make a partition on the middle tracks, or by first making a file (using dd) equal to half the size of the entire disk before creating the files that are frequently accessed, after which the dummy file can be deleted. Both cases assume starting from an empty disk.

The latter trick is suitable for news spools where the empty directory structure can be placed in the middle before putting in the data files. This also helps reduce fragmentation a little.

This little trick can be used both on ordinary drives as well as RAID systems. In the latter case the calculation for centring the tracks will be different, if possible. Consult the latest RAID manual.

5. Disk Layout

With all this in mind we are now ready to embark on the layout. I have based this on my own method developed when I got hold of 3 old SCSI disks and boggled over the possibilities.

At the end of this document there is an appendix with a few blank forms that you can fill in to help you decide and design your system. The following few paragraphs will refer to them.

5.1. Selection

Determine your needs and set up a list of all the parts of the file system you want to be on separate partitions, and sort them in descending order of speed requirement and how much space you want to give each partition.

The table in appendix A is a useful tool to select what directories you should put on different partitions.
It is sorted in a logical order with space for your own additions and notes about mounting points and additional systems. It is therefore NOT sorted in order of speed; instead the speed requirements are indicated by bullets ('o').

If you plan to use RAID make a note of the disks you want to use and what partitions you want to RAID. Remember that various RAID solutions offer different speeds and degrees of reliability.

(Just to make it simple I'll assume we have a set of identical SCSI disks and no RAID.)

5.2. Mapping

Then we want to place the partitions onto physical disks. The point of the following algorithm is to maximise parallelizing and bus capacity. In this example the drives are A, B and C and the partitions are 987654321 where 9 is the partition with the highest speed requirement. Starting at one drive we 'meander' the partition line over and over the drives in this way:

     A : 9 4 3
     B : 8 5 2
     C : 7 6 1

This makes the 'sum of speed requirements' the most equal across each drive.

The tables in the appendices are designed to simplify the mapping process. Note the speed characteristics of your drives and enter each directory under the appropriate column. Be prepared to shuffle directories, partitions and drives around a few times before you are satisfied.

After that it is recommended to sort this list according to partition numbers into the table in appendix C and to use this when running the partitioning program (fdisk or cfdisk) and when doing the installation.

5.3. Optimizing

After this there are usually a few partitions that have to be 'shuffled' over the drives, either to make them fit or if there are special considerations regarding speed, reliability, special file systems etc. Nevertheless this gives what this author believes is a good starting point for the complete setup of the drives and the partitions. In the end it is actual use that will determine the real needs after we have made so many assumptions. After commencing operations one should assume a time comes when a repartitioning will be beneficial.

For instance, if one of the 3 drives in the above mentioned example is very slow compared to the two others a better plan would be as follows:

     A : 9 6 5
     B : 8 7 4
     C : 3 2 1

5.3.1. Optimizing by characteristics

Often drives can be similar in apparent overall speed but some advantage can be gained by matching drives to the file size distribution and frequency of access. Thus binaries are suited to drives with fast access that offer command queueing, and libraries are better suited to drives with larger transfer speeds where IDE offers good performance for the money.

5.3.2. Optimizing by drive parallelising

Avoid drive contention by looking at tasks: for instance, if you are accessing /usr/local/bin chances are you will soon also need files from /usr/local/lib, so placing these on separate drives allows less seeking and possibly parallel operation and drive caching. It is quite possible that choosing what may appear less than ideal drive characteristics will still be advantageous if you can gain parallel operations. Identify common tasks, what partitions they use, and try to keep these on separate physical drives.

Just to illustrate my point I will give a few examples of task analysis here.

Office software such as editing, word processing and spreadsheets are typical examples of low intensity software, both in terms of CPU and disk intensity.
However, should you have a single server for a huge number of users you should not forget that most such software has auto save facilities which cause extra traffic, usually on the home directories. Splitting users over several drives would reduce contention.

News readers also feature auto save features on home directories, so ISPs should consider keeping home directories, the news spool and .overview files on separate drives.

Database applications can be demanding both in terms of drive usage and speed requirements. The details are naturally application specific; read the documentation carefully with disk requirements in mind. Also consider RAID both for performance and reliability.

E-mail reading and sending involves home directories as well as in- and outgoing spool files. If possible keep home directories and spool files on separate drives. If you are a mail server or a mail hub consider putting in- and outgoing spool directories on separate drives.

Software development can require a large number of directories for binaries, libraries, include files as well as source and project files. If possible split as much as possible across separate drives. On small systems you can place /usr/src and project files on the same drive as the home directories.

Web browsing is becoming more and more popular. Many browsers have a local cache which can expand to rather large volumes. As this is used when reloading pages or returning to the previous page, speed is quite important here. If however you are connected via a well configured proxy server you do not need more than typically a few megabytes per user for a session.

5.4. Usage requirements

When you get a box of 10 or so CD-ROMs with a Linux distribution and the entire contents of the big FTP sites it can be tempting to install as much as your drives can take. Soon, however, one would find that this leaves little room to grow and that it is easy to bite off more than can be chewed, at least in polite company. Therefore I will make a few comments on a few points to keep in mind when you plan out your system. Comments here are actively sought.

Testing Linux is simple and you don't even need a hard disk to try it out; if you can get the boot floppies to work you are likely to get it to work on your hardware.

Learning about operating systems is something Linux excels at; there is plenty of documentation and the source is available. A single drive with 50MB is enough to get you started with a shell, a few of the most frequently used commands and utilities.

Hobby use or more serious learning requires more commands and utilities but a single drive is still all it takes; 500MB should give you plenty of room, also for sources and documentation.

Serious software development or just serious hobby work requires even more space. At this stage you probably have a mail and news feed that requires spool files and plenty of space. Separate drives for various tasks will begin to show a benefit. At this stage you have probably already gotten hold of a few drives too. Drive requirements get harder to estimate but I would expect 2-4GB to be plenty, even for a small server.

Servers come in many flavours, ranging from mail servers to full sized ISP servers. A base of 2GB for the main system should be sufficient, then add space and perhaps also drives for the separate features you will offer. Cost is the main limiting factor here but be prepared to spend a bit if you wish to justify the "S" in ISP. Admittedly, not all do it.

5.5. Servers
Big tasks require big drives and a separate section here. If possible keep as much as possible on separate drives. Some of the appendices detail the setup of a small departmental server for 10-100 users. Here I will present a few considerations for the higher end servers.

In general you should not be afraid of using RAID, not only because it is fast and safe but also because it can make growth a little less painful. All the notes below come as additions to the points mentioned earlier.

Popular servers rarely just happen; rather they grow over time, and this demands both generous amounts of disk space as well as a good net connection. In many of these cases it might be a good idea to reserve entire SCSI drives, in singles or as arrays, for each task. This way you can move the data should the computer fail. Note that transferring drives across computers is not simple and might not always work, especially in the case of IDE drives. Drive arrays require careful setup in order to reconstruct the data correctly, so you might want to keep a paper copy of your fstab file as well as a note of the SCSI IDs.

5.5.1. Home directories

Estimate how many drives you will need; if this is more than 2 I would strongly recommend RAID. If not you should spread users across the drives dedicated to users, based on some kind of simple hashing algorithm. For instance you could use the first 2 letters in the user name, so jbloggs is put on /u/j/b/jbloggs, where /u/j is a symbolic link to a physical drive, so you can get a balanced load on your drives.

5.5.2. Anonymous FTP

This is an essential service if you are serious about service. Good servers are well maintained, documented, kept up to date, and immensely popular no matter where in the world they are located. The big server ftp.funet.fi is an excellent example of this.

In general this is not a question of CPU but of network bandwidth. Size is hard to estimate; mainly it is a question of ambition and service attitudes. I believe the big archive at ftp.cdrom.com is a *BSD machine with 50GB disk. Also memory is important for a dedicated FTP server; about 256MB RAM would be sufficient. Network connections would still be the most important factor.

5.5.3. WWW

For many this is the main reason to get onto the Internet; in fact many now seem to equate the two. In addition to being network intensive there is also a fair bit of drive activity related to this, mainly regarding the caches. Keeping the cache on a separate, fast drive would be beneficial. Even better would be installing a caching proxy server. This way you can reduce the cache size for each user and speed up the service while at the same time cutting down on the bandwidth requirements.

With a caching proxy server you need a fast set of drives; RAID 0 would be ideal as reliability is not important here. Higher capacity is better but about 2GB should be sufficient for most. Remember to match the cache period to the capacity and demand. Too long periods would on the other hand be a disadvantage; if possible try to adjust based on the URL. For more information check up on the most used servers such as Harvest, Squid and the one from Netscape.

5.5.4. Mail

Handling mail is something most machines do to some extent. The big mail servers, however, come into a class of their own. This is a demanding task and a big server can be slow even when connected to fast drives and a good net feed. In the Linux world the big server at vger.rutgers.edu is a well known example.
5.5.2. Anonymous FTP

This is an essential service if you are serious about service. Good servers are well maintained, documented, kept up to date, and immensely popular no matter where in the world they are located. The big server ftp.funet.fi is an excellent example of this.

In general this is not a question of CPU but of network bandwidth. Size is hard to estimate, mainly it is a question of ambition and service attitudes. I believe the big archive at ftp.cdrom.com is a *BSD machine with 50GB disk. Also memory is important for a dedicated FTP server; about 256MB RAM would be sufficient. Network connections would still be the most important factor.

5.5.3. WWW

For many this is the main reason to get onto the Internet, in fact many now seem to equate the two. In addition to being network intensive there is also a fair bit of drive activity related to this, mainly regarding the caches. Keeping the cache on a separate, fast drive would be beneficial.

Even better would be installing a caching proxy server. This way you can reduce the cache size for each user and speed up the service while at the same time cutting down on the bandwidth requirements. With a caching proxy server you need a fast set of drives; RAID0 would be ideal as reliability is not important here. Higher capacity is better but about 2GB should be sufficient for most. Remember to match the cache period to the capacity and demand. Too long periods would on the other hand be a disadvantage; if possible try to adjust based on the URL. For more information check up on the most used servers such as Harvest, Squid and the one from Netscape.

5.5.4. Mail

Handling mail is something most machines do to some extent. The big mail servers, however, come into a class of their own. This is a demanding task and a big server can be slow even when connected to fast drives and a good net feed. In the Linux world the big server at vger.rutgers.edu is a well known example.

Unlike a news service, which is distributed and which can partially reconstruct the spool using other machines as a feed, the mail servers are centralised. This makes safety much more important, so for a major server you should consider a RAID solution with emphasis on reliability. Size is hard to estimate, it all depends on how many lists you run as well as how many subscribers you have.

5.5.5. News

This is definitely a high volume task, and very dependent on what news groups you subscribe to. On nyx there is a fairly complete feed and the spool files consume about 17GB. The biggest groups are no doubt in the alt.binaries.* hierarchy, so if you for some reason decide not to get these you can get a good service with perhaps 12GB. Still others, that shall remain nameless, feel 2GB is sufficient to claim ISP status. In this case news expires so fast I feel the spelling IsP is barely justified.

5.5.6. Others

There are many other services available on the net, even if many of them have been put somewhat in the shadow of the web. Services like archie, gopher and wais, just to name a few, still exist and remain valuable tools on the net. If you are serious about starting a major server you should also consider these services. Determining the required volumes is hard, it all depends on popularity and demand. Providing good service inevitably has its costs, disk space is just one of them.

5.6. Pitfalls

The dangers of splitting up everything into separate partitions are briefly mentioned in the section about volume management. Still, several people have asked me to emphasize this point more strongly: when one partition fills up it cannot grow any further, no matter how much space there is in other partitions.

In particular look out for explosive growth in the news spool (/var/spool/news). For multi user machines with quotas keep an eye on /tmp and /var/tmp as some people try to hide their files there; just look out for filenames ending in gif or jpeg...

In fact, for single physical drives this scheme offers very little gain at all, other than making file growth monitoring easier (using 'df') and physical track positioning. Most importantly there is no scope for parallel disk access. A freely available volume management system would solve this but this is still some time in the future. However, when more specialised file systems become available even a single disk could benefit from being divided into several partitions.

5.7. Compromises

One way to avoid the aforementioned pitfalls is to set aside fixed partitions only for directories with a fairly well known size, such as swap, /tmp and /var/tmp, and group the remainders together onto the remaining partitions using symbolic links.

Example: a slow disk (slowdisk), a fast disk (fastdisk) and an assortment of files. Having set up swap and tmp on fastdisk, and /home and root on slowdisk, we have the (fictitious) directories /a/slow, /a/fast, /b/slow and /b/fast left to allocate on the partitions /mnt/slowdisk and /mnt/fastdisk which represent the remaining partitions of the two drives.

Putting /a or /b directly on either drive gives the same properties to the subdirectories. We could make all 4 directories separate partitions but would lose some flexibility in managing the size of each directory. A better solution is to make these 4 directories symbolic links to appropriate directories on the respective drives.

Thus we make

 /a/fast point to /mnt/fastdisk/a/fast   or   /mnt/fastdisk/a.fast
 /a/slow point to /mnt/slowdisk/a/slow   or   /mnt/slowdisk/a.slow
 /b/fast point to /mnt/fastdisk/b/fast   or   /mnt/fastdisk/b.fast
 /b/slow point to /mnt/slowdisk/b/slow   or   /mnt/slowdisk/b.slow

and we get all fast directories on the fast drive without having to set up a partition for all 4 directories. The second (right hand) alternative gives us a flatter file system which in this case can make it simpler to keep an overview of the structure.

The disadvantage is that it is a complicated scheme to set up and plan in the first place and that all mount points and partitions have to be defined before the system installation. A small command sketch of the first alternative follows below.
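A minimal command sketch of the first (left hand) alternative above, using hypothetical device names for the two remaining partitions:

        # Mount the remaining partitions of the two drives
        # (device names are assumptions, adjust to your own system).
        mkdir -p /mnt/fastdisk /mnt/slowdisk
        mount /dev/sdb3 /mnt/fastdisk
        mount /dev/sda3 /mnt/slowdisk

        # Create the target directories on the drives...
        mkdir -p /mnt/fastdisk/a/fast /mnt/fastdisk/b/fast
        mkdir -p /mnt/slowdisk/a/slow /mnt/slowdisk/b/slow

        # ...and make the directories in the file structure point to them.
        mkdir -p /a /b
        ln -s /mnt/fastdisk/a/fast /a/fast
        ln -s /mnt/slowdisk/a/slow /a/slow
        ln -s /mnt/fastdisk/b/fast /b/fast
        ln -s /mnt/slowdisk/b/slow /b/slow

For permanent use the two mount commands would of course be replaced by entries in /etc/fstab.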
5.8. Maintenance

It is the duty of the system manager to keep an eye on the drives and partitions. Should any of the partitions overflow, the system is likely to stop working properly, no matter how much space is available on other partitions, until space is reclaimed.

Partitions and disks are easily monitored using 'df' and this should be done frequently, perhaps using a cron job or some other general system management tool. Do not forget the swap partitions; these are best monitored using one of the memory statistics programs such as 'free'.

Drive usage monitoring is more difficult but it is important for the sake of performance to avoid contention - placing too much demand on a single drive if others are available and idle.

It is important when installing software packages to have a clear idea where the various files go. As previously mentioned GCC keeps binaries in a library directory and there are also other programs that for historical reasons are hard to figure out; X11 for instance has an unusually complex structure.
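The periodic checking mentioned above can, as a rough sketch, be done with an ordinary cron job; the threshold and schedule below are assumptions you would adjust to local taste:

        # Line for root's crontab (edit with 'crontab -e'): once an hour,
        # list partitions that are more than 90 percent full and show
        # memory/swap usage. Cron mails any output to the owner.
        0 * * * * df | awk '$5+0 > 90' ; free

The point is simply to notice a filling partition or shrinking swap before it becomes a problem.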
6. Further Information

There is a wealth of information one should go through when setting up a major system, for instance for a news or general Internet service provider. The FAQs in the following groups are useful:

News groups
  comp.arch.storage, comp.sys.ibm.pc.hardware.storage, alt.filesystems.afs, comp.periphs.scsi ...

Mailing lists
  raid, scsi, ext2fs ...

HOWTO
  Large disk, Installation, LILO, Multiple OS, ...

Many mailing lists are at vger.rutgers.edu but this is notoriously overloaded, so try to find a mirror. There are some lists mirrored at The Redhat Home Page. Remember you can also use the web search engines and that some, like Altavista and Excite, can also search usenet news.

7. Concluding Remarks

Disk tuning and partition decisions are difficult to make, and there are no hard rules here. Nevertheless it is a good idea to work more on this as the payoffs can be considerable. Maximizing usage on one drive only while the others are idle is unlikely to be optimal; watch the drive lights, they are not there just for decoration. For a properly set up system the lights should look like Christmas in a disco.

Linux offers software RAID but also support for some hardware based SCSI RAID controllers. Check what is available. As your system and experiences evolve you are likely to repartition and you might look on this document again. Additions are always welcome.

7.1. Coming Soon

There are a few more important things that are about to appear here. In particular I will add more example tables as I am about to set up two fairly large and general systems, one at work and one at home. These should give some general feeling on how a system can be set up for either of these two purposes. Examples of smooth running existing systems are also welcome.

There is also a fair bit of work left to do on the various kinds of file systems and utilities.

There will be a big addition on drive technologies coming soon as well as a more in depth description on using fdisk or cfdisk. The file systems will be beefed up as more features become available as well as more on RAID and what directories can benefit from what RAID level. Also I hope to get some information from DPT, who make the only RAID controller supported by Linux so far. I have contacted them but have yet to hear from them.

There is some minor overlapping with the Linux Filesystem Structure Standard that I hope to integrate better soon, which will probably mean a big reworking of all the tables at the end of this document. When the new version is released there will be a substantial rewrite of some of the sections in this HOWTO but no release date has been announced yet. When the new standard appears, various details such as directory names, sizes and file placements will be changed.

As more people start reading this I should get some more comments and feedback. I am also thinking of making a program that can automate a fair bit of this decision making process; although it is unlikely to be optimal it should provide a simpler, more complete starting point.

7.2. Request for Information

It has taken a fair bit of time to write this document and although most pieces are beginning to come together there is still some information needed before we are out of the beta stage.

o  More information on swap sizing policies is needed as well as information on the largest swap size possible under the various kernel versions.

o  How common is drive or file system corruption? So far I have only heard of problems caused by flaky hardware.

o  References regarding speed and drives are needed.

o  Are any other Linux compatible RAID controllers available?

o  Leads to file system, volume management and other related software are welcome.

o  What relevant monitoring, management and maintenance tools are available?

o  General references to information sources are needed, perhaps this should be a separate document?

o  Usage of /tmp and /var/tmp has been hard to determine, in fact what programs use which directory is not well defined and more information here is required. Still, it seems at least clear that these should reside on different physical drives in order to increase parallelism.

7.3. Suggested Project Work

Now and then people post on comp.os.linux.*, looking for good project ideas. Here I will list a few that come to mind that are relevant to this document. Plans about big projects such as new file systems should still be posted in order to either find co-workers or see if someone is already working on it.

Planning tools that can automate the design process outlined earlier would probably make a medium sized project, perhaps as an exercise in constraint based programming.

Partitioning tools that take the output of the previously mentioned program, format drives in parallel and apply the appropriate symbolic links to the directory structure. It would probably be best if this were integrated in existing system installation software.

Surveillance tools that keep an eye on the partition sizes and warn before a partition overflows.

Migration tools that safely let you move old structures to new (for instance RAID) systems. This could probably be done as a shell script controlling a back up program and would be rather simple. Still, be sure it is safe and that the changes can be undone.
8. Questions and Answers

This is just a collection of what I believe are the most common questions people might have. Give me more feedback and I will turn this section into a proper FAQ.

o  Q: I have a single drive, will this HOWTO help me?

   A: Yes, although only to a minor degree. Still, the section on ``Physical Track Positioning'' will give you some gains.

o  Q: Are there any disadvantages in this scheme?

   A: There is only a minor snag: if even a single partition overflows the system might stop working properly. The severity depends of course on what partition is affected. Still this is not hard to monitor; the command df gives you a good overview of the situation. Also check the swap partition(s) using free to make sure you are not about to run out of virtual memory.

o  Q: OK, so should I split the system into as many partitions as possible for a single drive?

   A: No, there are several disadvantages to that. First of all maintenance becomes needlessly complex and you gain very little from it. In fact if your partitions are too big you will seek across larger areas than needed. This is a balance and dependent on the number of physical drives you have.

o  Q: Does that mean more drives allow more partitions?

   A: To some degree, yes. Still, some directories should not be split off from root; check out the file system standard (soon released under the name File Hierarchy Standard) for more details.

o  Q: What if I have many drives I want to use?

   A: If you have more than 3-4 drives you should consider using RAID of some form. Still, it is a good idea to keep your root partition on a simple partition without RAID, see the section on ``RAID'' for more details.

9. Appendix A: Partitioning layout table: mounting and linking

The following table is designed to make layout a simpler paper and pencil exercise. It is probably best to print it out (using NON PROPORTIONAL fonts) and adjust the numbers until you are happy with them.

Mount point is what directory you wish to mount a partition on or the actual device. This is also a good place to note how you plan to use symbolic links.

The size given corresponds to a medium sized Debian 1.1.11 installation. Other examples are coming later.

Mainly you use this table to select what structure and drives you will use; the partition numbers and letters will come from the next two tables.

Directory         Mount point   speed  seek   transfer  size  SIZE

swap              __________    ooooo  ooooo  ooooo       32  ____
/                 __________    o      o      o           20  ____
/tmp              __________    oooo   oooo   oooo            ____
/var              __________    oo     oo     oo          25  ____
/var/tmp          __________    oooo   oooo   oooo            ____
/var/spool        __________                                  ____
/var/spool/mail   __________    o      o      o               ____
/var/spool/news   __________    ooo    ooo    oo              ____
/var/spool/____   __________    ____   ____   ____      ____  ____
/home             __________    oo     oo     oo              ____
/usr              __________                             200  ____
/usr/bin          __________    o      oo     o           25  ____
/usr/lib          __________    oo     oo     ooo         80  ____
/usr/local        __________                                  ____
/usr/local/bin    __________    o      oo     o               ____
/usr/local/lib    __________    oo     oo     ooo             ____
/usr/local/____   __________                                  ____
/usr/src          __________    o      oo     o           50  ____
DOS               __________    o      o      o               ____
Win               __________    oo     oo     oo              ____
NT                __________    ooo    ooo    ooo             ____
/mnt/___/_____    __________    ____   ____   ____      ____  ____
/mnt/___/_____    __________    ____   ____   ____      ____  ____
/mnt/___/_____    __________    ____   ____   ____      ____  ____
/___/___/_____    __________    ____   ____   ____      ____  ____
/___/___/_____    __________    ____   ____   ____      ____  ____
/___/___/_____    __________    ____   ____   ____      ____  ____

Total capacity:
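The swap line at the top of the table is also the simplest one to carry over to a running system once you have decided where it goes. A minimal sketch, assuming the chosen partition turned out to be the hypothetical /dev/sda2 and that its partition type has been set to Linux swap:

        # Initialise and activate the swap partition (device is an example).
        mkswap /dev/sda2
        swapon /dev/sda2

        # Corresponding /etc/fstab line so it is activated at every boot:
        # /dev/sda2   none   swap   sw   0   0

Afterwards, free should show the new swap space in its totals.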
10. Appendix B: Partitioning layout table: numbering and sizing

This table follows the same logical structure as the table above where you decided what disk to use. Here you select which physical drive each directory goes on, keeping in mind the effect of track positioning mentioned earlier in ``Physical Track Positioning''. The final partition numbers will come out of the table after this.

Directory         sda    sdb    sdc    hda    hdb    hdc    ___

swap              |      |      |      |      |      |      |
/                 |      |      |      |      |      |      |
/tmp              |      |      |      |      |      |      |
/var              :      :      :      :      :      :      :
/var/tmp          |      |      |      |      |      |      |
/var/spool        :      :      :      :      :      :      :
/var/spool/mail   |      |      |      |      |      |      |
/var/spool/news   :      :      :      :      :      :      :
/var/spool/____   |      |      |      |      |      |      |
/home             |      |      |      |      |      |      |
/usr              |      |      |      |      |      |      |
/usr/bin          :      :      :      :      :      :      :
/usr/lib          |      |      |      |      |      |      |
/usr/local        :      :      :      :      :      :      :
/usr/local/bin    |      |      |      |      |      |      |
/usr/local/lib    :      :      :      :      :      :      :
/usr/local/____   |      |      |      |      |      |      |
/usr/src          :      :      :      :      :      :      :
DOS               |      |      |      |      |      |      |
Win               :      :      :      :      :      :      :
NT                |      |      |      |      |      |      |
/mnt/___/_____    |      |      |      |      |      |      |
/mnt/___/_____    :      :      :      :      :      :      :
/mnt/___/_____    |      |      |      |      |      |      |
/___/___/_____    |      |      |      |      |      |      |
/___/___/_____    :      :      :      :      :      :      :
/___/___/_____    |      |      |      |      |      |      |

Total capacity:

11. Appendix C: Partitioning layout table: partition placement

This is just to sort the partition numbers in ascending order ready to input to fdisk or cfdisk. Here you take physical track positioning into account when finalizing your design. These numbers and letters are then used to update the previous tables, all of which you will find very useful in later maintenance.

Drive :           sda    sdb    sdc    hda    hdb    hdc    ___

Total capacity:   |      |      |      |      |      |      |

Partition

 1                |      |      |      |      |      |      |
 2                :      :      :      :      :      :      :
 3                |      |      |      |      |      |      |
 4                :      :      :      :      :      :      :
 5                |      |      |      |      |      |      |
 6                :      :      :      :      :      :      :
 7                |      |      |      |      |      |      |
 8                :      :      :      :      :      :      :
 9                |      |      |      |      |      |      |
10                :      :      :      :      :      :      :
11                |      |      |      |      |      |      |
12                :      :      :      :      :      :      :
13                |      |      |      |      |      |      |
14                :      :      :      :      :      :      :
15                |      |      |      |      |      |      |
16                :      :      :      :      :      :      :

12. Appendix D: Example: Multipurpose server

The following table is from the setup of a medium sized multipurpose server where I work. Aside from being a general Linux machine it will also be a network related server (DNS, mail, FTP, news, printers etc.), X server for various CAD programs, CD ROM burner and many other things. The files reside on 3 SCSI drives with a capacity of 600, 1000 and 1300 MB.

Some further speed could possibly be gained by splitting /usr/local from the rest of the /usr system but we deemed the added complexity would not be worth it. With another couple of drives this could be more worthwhile. In this setup drive sda is old and slow and could just as well be replaced by an IDE drive. The other two drives are both rather fast. Basically we split most of the load between these two. To reduce the danger of imbalance in partition sizing we have decided to keep /usr/bin and /usr/local/bin on one drive and /usr/lib and /usr/local/lib on another separate drive, which also affords us some drive parallelizing.

Even more could be gained by using RAID but we felt that as a server we needed more reliability than is currently afforded by the md patch, and a dedicated RAID controller was out of our reach.
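Once the placement tables are filled in, the partitioning itself is done with fdisk or cfdisk on each drive in turn, while the formatting can be run on several drives at once since separate drives work independently. A rough sketch, with purely hypothetical partition names:

        # Partition each drive interactively according to the tables above.
        cfdisk /dev/sda
        cfdisk /dev/sdb

        # Then create the file systems; separate drives can be formatted
        # in parallel to save time (partition names are examples only).
        mke2fs /dev/sda3 &
        mke2fs /dev/sdb3 &
        wait

Running several mke2fs jobs against partitions on the same drive gains nothing, as they would only compete for the same spindle.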
13. Appendix E: Example: mounting and linking

Directory         Mount point        speed  seek   transfer  size  SIZE

swap              sdb2, sdc2         ooooo  ooooo  ooooo       32  2x64
/                 sda2               o      o      o           20   100
/tmp              sdb3               oooo   oooo   oooo             300
/var              __________         oo     oo     oo              ____
/var/tmp          sdc3               oooo   oooo   oooo             300
/var/spool        sdb1                                              436
/var/spool/mail   __________         o      o      o                ____
/var/spool/news   __________         ooo    ooo    oo               ____
/var/spool/____   __________         ____   ____   ____      ____  ____
/home             sda3               oo     oo     oo                400
/usr              sdb4                                         230   200
/usr/bin          __________         o      oo     o            30  ____
/usr/lib          -> libdisk         oo     oo     ooo          70  ____
/usr/local        __________                                        ____
/usr/local/bin    __________         o      oo     o                ____
/usr/local/lib    -> libdisk         oo     oo     ooo              ____
/usr/local/____   __________                                        ____
/usr/src          -> /home/usr.src   o      oo     o            10  ____
DOS               sda1               o      o      o                 100
Win               __________         oo     oo     oo               ____
NT                __________         ooo    ooo    ooo              ____
/mnt/libdisk      sdc4               oo     oo     ooo               226
/mnt/cd           sdc1               o      o      oo                710
/mnt/___/_____    __________         ____   ____   ____      ____  ____
/___/___/_____    __________         ____   ____   ____      ____  ____
/___/___/_____    __________         ____   ____   ____      ____  ____
/___/___/_____    __________         ____   ____   ____      ____  ____

Total capacity: 2900 MB

14. Appendix F: Example: numbering and sizing

Here we do the adjustment of sizes and positioning.

Directory         sda    sdb    sdc

swap              |      |  64  |  64  |
/                 | 100  |      |      |
/tmp              |      | 300  |      |
/var              :      :      :      :
/var/tmp          |      |      | 300  |
/var/spool        :      : 436  :      :
/var/spool/mail   |      |      |      |
/var/spool/news   :      :      :      :
/var/spool/____   |      |      |      |
/home             | 400  |      |      |
/usr              |      | 200  |      |
/usr/bin          :      :      :      :
/usr/lib          |      |      |      |
/usr/local        :      :      :      :
/usr/local/bin    |      |      |      |
/usr/local/lib    :      :      :      :
/usr/local/____   |      |      |      |
/usr/src          :      :      :      :
DOS               | 100  |      |      |
Win               :      :      :      :
NT                |      |      |      |
/mnt/libdisk      |      |      | 226  |
/mnt/cd           :      :      : 710  :
/mnt/___/_____    |      |      |      |
/___/___/_____    |      |      |      |
/___/___/_____    :      :      :      :
/___/___/_____    |      |      |      |

Total capacity:   | 600  | 1000 | 1300 |

15. Appendix G: Example: partition placement

This is just to sort the partition numbers in ascending order ready to input to fdisk or cfdisk.

Drive :           sda    sdb    sdc

Total capacity:   | 600  | 1000 | 1300 |

Partition

 1                | 100  | 436  | 710  |
 2                : 100  :  64  :  64  :
 3                | 400  | 300  | 300  |
 4                :      : 200  : 226  :
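A hedged sketch of how the assignments in Appendix E could be expressed in /etc/fstab follows; only the device and mount point pairs are taken from the tables, the file system types and options (ext2, defaults) are assumptions, and the DOS partition and /mnt/cd are left out since their types depend on how they are used:

        # /etc/fstab sketch for the example above (options are assumptions)
        /dev/sda2   /              ext2   defaults   0 1
        /dev/sda3   /home          ext2   defaults   0 2
        /dev/sdb1   /var/spool     ext2   defaults   0 2
        /dev/sdb3   /tmp           ext2   defaults   0 2
        /dev/sdb4   /usr           ext2   defaults   0 2
        /dev/sdc3   /var/tmp       ext2   defaults   0 2
        /dev/sdc4   /mnt/libdisk   ext2   defaults   0 2
        /dev/sdb2   none           swap   sw         0 0
        /dev/sdc2   none           swap   sw         0 0

The symbolic links noted in the table (/usr/lib -> libdisk and so on) come in addition to this and are created once, as in the earlier compromise example.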
16. Appendix H: Example II

The following is an example of a server setup in an academic setting, and is contributed by nakano@apm.seikei.ac.jp. I have only done minor editing to this section.

/var/spool/delegate is a directory for storing logs and cache files of the WWW proxy server program "delegated". Since I have not advertised it widely there are 1000-1500 requests/day currently, and average disk usage is 15-30% with expiration of caches each day.

/mnt.archive is used for data files which are big and not frequently referenced, such as experimental data (especially graphic ones), various source archives, and Win95 backups (growing very fast...).

/mnt.root is a backup root file system containing rescue utilities. A boot floppy is also prepared to boot with this partition.

=================================================
Directory               sda     sdb     hda

swap                    |   64  |   64  |       |
/                       |       |       |   20  |
/tmp                    |       |       |  180  |
/var                    :  300  :       :       :
/var/tmp                |       |  300  |       |
/var/spool/delegate     |  300  |       |       |
/home                   |       |       |  850  |
/usr                    |  360  |       |       |
/usr/lib -> /mnt.lib/usr.lib
/usr/local/lib -> /mnt.lib/usr.local.lib
/mnt.lib                |       |  350  |       |
/mnt.archive            :       : 1300  :       :
/mnt.root               |       |   20  |       |

Total capacity:           1024    2034    1050
=================================================

Drive :                 sda     sdb     hda

Total capacity:         | 1024  | 2034  | 1050  |

Partition

 1                      |  300  |   20  |   20  |
 2                      :   64  : 1300  :  180  :
 3                      |  300  |   64  |  850  |
 4                      :  360  :  ext  :       :
 5                      |       |  300  |       |
 6                      :       :  350  :       :

Filesystem         1024-blocks   Used     Available  Capacity  Mounted on
/dev/hda1                19485    10534        7945       57%  /
/dev/hda2               178598       13      169362        0%  /tmp
/dev/hda3               826640   440814      343138       56%  /home
/dev/sda1               306088    33580      256700       12%  /var
/dev/sda3               297925    47730      234807       17%  /var/spool/delegate
/dev/sda4               363272   170872      173640       50%  /usr
/dev/sdb5               297598        2      282228        0%  /var/tmp
/dev/sdb2              1339248   302564      967520       24%  /mnt.archive
/dev/sdb6               323716    78792      228208       26%  /mnt.lib

Apparently /tmp and /var/tmp are too big. These directories will be packed together into one partition when a disk space shortage comes. /mnt.lib also seems to be too big, but I plan to install newer TeX and ghostscript archives, so /usr/local/lib may grow by about 100M or so (since we must use Japanese fonts!).

The whole system is backed up by a Seagate Tapestore 8000 (Travan TR-4, 4G/8G). It works fine when accessed through /dev/st0, but when done through /dev/nst0 or with the `mt' command the SCSI system occasionally panics. It is not critical, but it is the biggest problem remaining in our system...
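The two library links listed in the first table would be created once after /mnt.lib is mounted; a minimal sketch, where only the directory names are taken from the table and everything else is an assumption. This is best done while the system is quiet, preferably in single user mode:

        # Move the library trees onto the sdb partition and replace them
        # with symbolic links (or copy and remove if mv cannot move the
        # directories across file systems on your setup).
        mv /usr/lib /mnt.lib/usr.lib
        ln -s /mnt.lib/usr.lib /usr/lib
        mv /usr/local/lib /mnt.lib/usr.local.lib
        ln -s /mnt.lib/usr.local.lib /usr/local/lib

As with the earlier compromise example, this lets one partition carry several directory trees without each needing a partition of its own.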