Optimal Span

What is the most effective span for a hierarchical structure? For example, Management Span of Control is optimally between 6 and 7.



Most complex structures are compositional or control hierarchies. An example of a compositional hierarchy is written language. A word is composed of characters. A simple sentence is composed of words. A paragraph is composed of simple sentences, and so on. An example of a control hierarchy is a management structure, where a manager controls a number of foremen or team leaders, and they, in turn, control a number of workers.

Optimal Span Hypothesis:

Optimal Span is about the same, between five and nine, for virtually all complex structures that have been competitively selected.

That includes the products of Natural Selection (Darwinian evolution) and the products of Artificial Selection (Human inventions that competed for acceptance by human society).

The hypothesis is supported by empirical data from varied domains and a derivation from Shannon’s Information Theory and Smith and Morowitz’s concept of intricacy.

What is a Hierarchy?

Hierarchy (from Greek: ἱερός, hieros, ‘sacred’, and ἄρχω, arkho, ‘rule’) originally denoted the holy ranking of nine orders of angels, from God to Seraphim to Cherubim and so on down to the Archangels and plain old Angels at the lowest level. Kind of like the organization of God’s Corporation!

The seminal book on this topic is Hierarchy Theory: The Challenge of Complex Systems [ Pattee, 1973 ]. This book includes a chapter by Nobel laureate Herbert A. Simon on “The Organization of Complex Systems”. Other chapters: James Bonner “Hierarchical Control Programs in Biological Development”; Howard H. Pattee “The Physical Basis and Origin of Hierarchical Control” and “Postscript: Unsolved Problems and Potential Applications of Hierarchy Theories”; Richard Levins “The Limits of Complexity”; and Clifford Grobstein “Hierarchical Order and Neogenesis”.

A more recent book, Complexity – The Emerging Science at the Edge of Order and Chaos, observes that the “hierarchical, building-block structure of things is as commonplace as air.” [ Waldrop, 1992 ]. Indeed, a bit of contemplation will reveal that nearly all complex structures are hierarchies.

There are two kinds of hierarchy. A few well-known examples will set the stage for more detailed examination of modern Hierarchy Theory:


1 – Management Structure (Control Hierarchy)

Workers at the lowest level are controlled by Team Leaders (or Foremen), teams are controlled by First-Level Managers who report to Second-Level Managers, and so on up to the Top Dog Executive. At each level, the Management Span of Control is the number of subordinates controlled by each superior.

2 – Software Package (Control Hierarchy)

The Main Line computer program controls Units (or Modules, etc.), and the Units control Procedures that control Subroutines that control Lines of Code. At each level, the Span of Control is the number of lower-level software entities controlled by a higher-level entity.

3 – Written Language (Containment Hierarchy)

Characters at the lowest level are contained in Words. Words are contained in Simple Sentences. Simple Sentences in Paragraphs, and so on up to Sections, Chapters and the Entire Document. At each level, the Span of Containment is the number of smaller entities contained by each larger.

4 – “Chinese boxes” (Containment Hierarchy)

A Large Box contains a number of Smaller Boxes which each contain Still Smaller Boxes down to the Smallest Box. At each level, the Span of Containment is the number of smaller entities contained by each larger.

Traversing a Hierarchy

Note that Examples 1 and 3 above were explained starting at the bottom of the hierarchy and traversing up to the top, while Examples 2 and 4 were explained by starting at the top and traversing to the bottom.

Simple hierarchies of this type are called “tree structures” because you can traverse them entirely from the top or the bottom and cover all nodes and links between nodes.

“Folding” a “String”

A tree structure hierarchy can also be thought of as a one-dimensional “string” that is “folded” (or parsed) to create the tree structure. What does “folding” mean in this context?

As an amusing example, please imagine the Chief Executive of a Company at the head of a parade of all his or her employees. Behind the Chief Exec would be Senior Manager #1 followed by his or her First-Level Manager #1. Behind First-Level Manager #1 would be his or her employees. Behind the employees would be First-Level Manager #2 with his or her employees. After all the First-Levels and their employees, Senior Manager #2 would join the parade with his or her First-Levels and their employees, and so on. If you took the long parade and called it a “string”, you could “fold” it at each group of employees, then again at each group of First-Level Managers, and again at the group of Senior Managers, and get the familiar management tree structure!

The above “parade” was described with the Chief Exec at the head of it, but you could just as well turn it around and have the lowest-level employees lead and the Chief Exec at the rear. When military hierarchies go to war, the lowest-level soldiers are usually at the front and the highest-level Generals well behind.
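The parade can be sketched in a few lines of Python. This is only an illustration: the org chart, the names in it, and the `parade` function are hypothetical, and the function simply lists each leader followed by his or her whole group, which is one way to read the description above.

```python
# Hypothetical org chart: each key is a person, each value is the list of direct reports.
ORG = {
    "Chief Exec": ["Senior Manager 1", "Senior Manager 2"],
    "Senior Manager 1": ["First-Level 1", "First-Level 2"],
    "Senior Manager 2": ["First-Level 3"],
    "First-Level 1": ["Employee A", "Employee B"],
    "First-Level 2": ["Employee C"],
    "First-Level 3": ["Employee D", "Employee E"],
}


def parade(org, leader):
    """Return the one-dimensional 'string': the leader, then each subordinate's group in turn."""
    line = [leader]
    for report in org.get(leader, []):
        line.extend(parade(org, report))
    return line


march = parade(ORG, "Chief Exec")
print(march)        # Chief Exec at the head of the parade
print(march[::-1])  # the same parade turned around, Chief Exec at the rear
```

Folding is the reverse operation: cutting the list wherever a new leader appears recovers the tree.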

A more practical example is the text you are reading right now! It was transmitted over the Internet as a string of “bits” – “1” and “0” symbols. Each group of eight bits denotes a particular character. Some of the characters are the familiar numbers and upper and lower-case letters of our alphabet and others are special characters, such as the space that demarks a word, punctuation characters such as a period or comma or question mark, and special control characters that denote things like new paragraph and so on.

You could say the string of 1’s and 0’s is folded every eight bits to form a Character. The string is folded again at each Space Character to form Words. Each group of Words is folded yet again at each comma or period symbol that denotes a Simple Sentence. Each group of Simple Sentences is again folded to make Paragraphs, and so on.

You could lay out a written document as a tree structure, similar to a Management hierarchy. The Characters would be at the bottom, the Words at the next level up, the Simple Sentences next, the Paragraphs next, and so on up to the whole Section, Chapter, and Book.
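The folding of a bit string into a document tree can be sketched as follows. This is a simplified illustration, not a full parser: it assumes plain ASCII text, treats only the period as a sentence boundary (commas are ignored), and the function names `to_bits` and `fold` are made up for the example.

```python
def to_bits(text):
    """Encode text as a string of '0'/'1' symbols, eight bits per ASCII character."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


def fold(bits):
    """Fold a bit string into a tree: sentences that contain words made of characters."""
    # Fold every eight bits into one character.
    chars = [chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8)]
    text = "".join(chars)
    # Fold at each period to form simple sentences.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Fold each sentence at every space to form words.
    return [sentence.split(" ") for sentence in sentences]


bits = to_bits("The cat sat. The dog ran off.")
tree = fold(bits)
print(len(bits), "bits")
print(tree)
```

The nested list that comes back is the tree structure: the outer list is the document, each inner list is a simple sentence, and each string is a word made of characters.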

What is Optimal Span?

With all these different types of hierarchical structures, each with its own purpose and use, you might think there is no common property they share other than their hierarchical nature. You might expect a particular Span of Control that is best for Management Structures in Corporations and a significantly different Span of Containment that is best in Written Language.

If you expected the Optimal Span to be significantly different for each case, you would be wrong!

According to System Science research and Information Theory, there is a single equation that may be used to determine the most beneficial Span. That optimum value maximizes the effectiveness of the resources. A Management Structure should have the Span of Control that makes the best use of the number of employees available. A Written Language Structure should have the Span of Containment that makes the best use of the number of characters (or bits in the case of the Internet) available, and so on.

The simple equation for Optimal Span derived by [ Glickstein, 1996 ] is:

$S_o = 1 + De$

(Where D is the degree of the nodes and e is Euler's number, the base of natural logarithms, approximately 2.71828)

In the examples above, where the hierarchical structure may be described as a single-dimensional folded string where each node has two closest neighbors, the degree of the nodes is, D = 2, so the equation reduces to:

$S_o = 1 + De = 1 + 2 \times 2.71828 \approx 6.43656$

“Take home message”: OPTIMAL SPAN, $S_o \approx 6.4$

Also see Quantifying Brooks Mythical Man-Month (Knol), [ Glickstein, 2003 ] and [ Meijer, 2006 ] for the applicability of Optimal Span to Management Structures.

[Added 4 April 2013: The Meijer, 2006 link no longer works. His .pdf document is available at http://repository.tudelft.nl/assets/uuid:843020de-2248-468a-bf19-15b4447b5bce/dep_meijer_20061114.pdf ]

Examples of Competitively-Selected Optimal Span

Management Span of Control

Management experts have long recommended that Management Span of Control be in the range of five or six for employees whose work requires considerable interaction. Depending upon the level of interaction, experts recommend up to nine employees per department. This recommendation comes from experience with organizations with different Spans of Control. The most successful tend to have Spans in the recommended range, five to nine, an example of competitive-selection.

When the lowest level consists of service-type employees, whose interaction with each other is less complex, there may be a dozen or two or more in a department, but there will usually be one or more foremen or team leaders to reduce the effective Management Span of Control to the range five to nine. Corporate hierarchies usually have about the same range of first-level departments reporting to the next level up and so on.

The diagram below is a simple example to bring home the point in a common-sense way. Say you had a budget for 49 employees and had to organize them to make most effective use of your human resources. The diagram shows three different ways you might organize them. Which one seems most reasonable?

In (A) you have ONE manager and 48 workers, which is a BROAD hierarchy. Management experts would say a Management Span of Control of 48 is way too much for anyone to handle!

In (B) you have THIRTEEN managers in a three-level management hierarchy and only 36 workers, which is a TALL hierarchy with an average Management Span of Control of only 3.3. Management experts would say this is way too inefficient with too many managers!

In (C) you have SEVEN managers and 42 workers in a MODERATE hierarchy with an average Management Span of Control of about 6.5. Management experts would say this is about right for most organizations where the workers have to interact with each other. Optimal Span theory supports this common-sense belief!
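The arithmetic behind the three organizations can be checked with a short Python sketch. The per-level head counts for (B) and (C) are assumptions chosen to match the manager and worker totals quoted above (for example, 1 + 3 + 9 = 13 managers for the tall hierarchy), and the average span is taken as the simple mean of the span at each level.

```python
from statistics import mean

# Head count at each level, top to bottom. The splits for (B) and (C) are assumed;
# they match the manager and worker totals quoted in the text.
ORGS = {
    "A (broad)": [1, 48],
    "B (tall)": [1, 3, 9, 36],
    "C (moderate)": [1, 6, 42],
}


def summarize(levels):
    """Return (managers, workers, average span) for a list of per-level head counts."""
    managers = sum(levels[:-1])
    workers = levels[-1]
    # Span at each level: people one level down divided by people at this level.
    spans = [lower / upper for upper, lower in zip(levels, levels[1:])]
    return managers, workers, mean(spans)


for name, levels in ORGS.items():
    managers, workers, span = summarize(levels)
    print(f"{name}: {managers} managers, {workers} workers, average span {span:.1f}")
```

All three layouts use the same 49 people; only the way they are folded into levels changes the span.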

Human Span of Absolute Judgement

Evolution and Natural Selection have produced the human brain and nervous system and our senses of vision, hearing, and taste. It turns out that these senses are generally limited to five to nine gradations that can be reliably distinguished. It is also the case that we can remember about five to nine chunks of information at any one time. This is another example of competitive-selection, where, over the eons of evolutionary development, biological organisms competed and those that best fit the environment were selected to survive and reproduce.

George A. Miller wrote a classic paper titled The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information [ Miller, 1956 ]. He showed that human senses of sight, hearing, and taste were generally limited to five to nine gradations that could be reliably distinguished. Miller’s paper begins as follows:

My problem is that I have been persecuted by an integer [7 +/- 2]. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. This number assumes a variety of disguises, being sometimes a little larger and sometimes a little smaller than usual, but never changing so much as to be unrecognizable. The persistence with which this number plagues me is far more than a random accident. There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.

Miller’s paper is well worth reading and is available on the Internet [ Miller, 1956 ].

Glickstein’s Theory of Optimal Span

Miller’s number also pursued Ira Glickstein until he caught it. He showed, as part of his PhD research [ Glickstein, 1996 ], that, based on empirical data from varied domains, the optimal span for virtually all hierarchical structures falls into Miller’s range, five to nine. Using Shannon’s information theory, he also showed that maximum intricacy is obtained when the Span for single-dimensional structures is $S_o = 1 + 2e \approx 6.4$ (where e is Euler's number, approximately 2.71828). His “magical number” is not the integer 7, but 6.4, a more precise rendition of Miller’s number!

Hierarchy and Complexity

Howard H. Pattee, one of the early researchers in hierarchy theory, posed a serious challenge:

Is it possible to have a simple theory of very complex, evolving systems? Can we hope to find common, essential properties of hierarchical organizations that we can usefully apply to the design and management of our growing biological, social, and technological organizations? [ Pattee, 1973 ]

Pattee was the Chairman of Glickstein’s PhD Committee and Glickstein took the challenge very seriously!

The hypothesis at the heart of his PhD dissertation is that the optimal span is about the same for virtually all complex structures that have been competitively selected. That includes the products of Natural Selection (Darwinian evolution) and the products of Artificial Selection (Human inventions that competed for acceptance by human society).

Weak Statement of Hypothesis

In what he calls the “weak” statement of the hypothesis, he showed that it is scientifically plausible to believe that diverse structures tend to have spans in the range of five to nine. He did this by gathering empirical data from six domains plus a computer simulation. The domains are:

Human Cognition: Span of Absolute Judgement (one, two and three dimensions), Span of Immediate Memory, Categorical hierarchies and the fine structure of the brain. These all conform to the hypothesis.

Written Language: Pictographic, Logographic, Logo-Syllabic, Semi-alphabetic, and Alphabetic writing. Hierarchically-folded linear structures in written languages, including English, Chinese, and Japanese writing. These all conform to the hypothesis.

Organization and Management of Human Groups: Management span of control in business and industrial organizations, military, and church hierarchies. These all conform to the hypothesis.

Animal and Plant Organization and Structure: Primates, schooling fish, eusocial insects (bees, ants), plants. These all conform to the hypothesis.

Structure and Organization of Cells and Genes: Prokaryotic and eukaryotic cells, gene regulation hierarchies. These all conform to the hypothesis.

RNA and DNA: Structure of nucleic acids. These all conform to the hypothesis.

Computer Simulations: Hierarchical generation of initial conditions for Conway’s Game of Life (two-dimensional). These all conform to the hypothesis.

Strong Statement of Hypothesis

In what he calls the “strong” statement of the hypothesis, he showed that Shannon’s information theory and the concept of intricacy of a graphical representation of a structure [ Smith and Morowitz, 1982 ] can be used to derive a formula for the optimal span of a hierarchical graph.

This work extended the single-dimensional span concepts of management theory and Miller’s “seven plus or minus two” concepts to a general equation for any number of dimensions. He derived an equation that yields Optimal Span for a structure with one-, two-, three- or any number of dimensions!

The equation for Span (optimal) is:

$S_o = 1 + De$

(Where D is the degree of the nodes and e is Euler's number, the base of natural logarithms, approximately 2.71828)

NOTE: For a one-dimensional structure, such as a management hierarchy or the span of absolute judgement for a single-dimensional visual, taste or sound, the degree of the nodes is D = 2. This is because each node is a link in a one-dimensional chain or string and so each node has two closest neighbors. For a two-dimensional structure, such as a 2D visual or the pitch and intensity of a sound or a mixture of salt and sugar, D = 4. Each node is a link in a 2D mesh and so each node has four closest neighbors. For a 3D structure, D = 6 because each node is a link in a 3D egg crate and has six closest neighbors. Some of the examples in Miller’s paper were 2D and 3D and his published data agreed with the results of the formula. The computer simulation was 2D and also conformed well to the hypothesis.
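The formula is easy to evaluate for the three cases in the note. The sketch below assumes the pattern of two closest neighbors per dimension (D = 2, 4, 6) described above; the function name `optimal_span` is made up for the example.

```python
import math


def optimal_span(degree):
    """Optimal span S_o = 1 + D*e for nodes of degree D."""
    return 1 + degree * math.e


for dims in (1, 2, 3):
    degree = 2 * dims  # two closest neighbors per dimension, as in the note above
    print(f"{dims}D: D = {degree}, S_o = {optimal_span(degree):.1f}")
```

Rounded to one decimal place this gives 6.4, 11.9 and 17.3 for one, two and three dimensions.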

In normal usage, complexity and intricacy are sometimes used interchangeably. However, there is an important distinction between them according to [ Smith and Morowitz, 1982 ].

Something is said to be complex if it has a lot of different parts, interacting in different ways. To completely describe a complex system you would have to completely describe each of the different types of parts and then describe the different ways they interact. Therefore, a measure of complexity is how long a description would be required for one person competent in that domain of knowledge to explain it to another.

Something is said to be intricate if it has a lot of parts, but they may all be the same or very similar and they may interact in simple ways. To completely describe an intricate system you would only have to describe one or two or a few different parts and then describe the simple ways they interact. For example, a window screen is intricate but not at all complex. It consists of equally-spaced vertical and horizontal wires criss-crossing in a regular pattern in a frame where the spaces are small enough to exclude bugs down to some size. All you need to know is the material and diameter of the wires, the spacing between them, and the size of the window frame. Similarly, a field of grass is intricate but not complex.

If you think about it for a moment, it is clear that, given limited resources, they should be deployed in ways that minimize complexity to the extent possible, and maximize intricacy!

Using the [ Smith and Morowitz, 1982 ] concept of intricacy, it is possible to compute the theoretical efficiency and effectiveness of a hierarchical structure. If it has the Optimal Span, it is 100% efficient, meaning that it attains 100% of the theoretical intricacy given the resources used. If not, the percentage of efficiency can be computed. For example, a one-dimensional tree structure hierarchy is 100% efficient (maximum theoretical intricacy) with a Span of 6.4. For a Span of five, it is 94% efficient (94% of maximum theoretical intricacy). It is also 94% efficient with a Span of nine. For a Span of four or twelve, it is 80% efficient.

Derivation of Optimal Span

Shannon’s formula for information (usually called “Shannon entropy”) is a very well-known probabilistic measure of uncertainty:


$$H(f(x)) = -\sum f(x) \log_2 f(x) \tag{1}$$

[ Smith and Morowitz, 1982 ] apply Shannon’s formula as the basis of their formula for what they call the intricacy of a graphical structure, which I put into the following form:


$$\text{Intricacy} = -S \, (A/M) \log_2 (A/M) \tag{2}$$


S is the span of a group of nodes (i.e., how many nodes are in the group), and A/M is the ratio of the actual number of connections between the nodes to the maximum possible number of connections. This equation has the same basic form as Shannon’s formula, where S takes the place of ∑ and A/M takes the place of f(x).

If certain assumptions are made (see my complete dissertation for details), including the assumption that all connections are bi-directional and equally-weighted, it can be shown that:


$$A/M = D/(S-1) \tag{3}$$
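One way to see why this ratio is plausible is to count connections directly. The sketch below is an illustration under stated assumptions, not the dissertation's own argument: it models a one-dimensional group as a closed chain (a ring) so that every node has exactly two neighbors, counts each bi-directional connection once, and takes the maximum M as every node connected to every other node. The function names are made up for the example.

```python
def ring_edges(span):
    """Connections in a closed chain of `span` nodes: each node links to its two neighbors."""
    return {frozenset((i, (i + 1) % span)) for i in range(span)}


def connection_ratio(span):
    """Return (A/M, D/(S-1)) for a ring of `span` nodes (span >= 3)."""
    edges = ring_edges(span)
    actual = len(edges)                # A: connections actually present
    maximum = span * (span - 1) // 2   # M: every node linked to every other node
    degree = 2 * actual / span         # D: average number of connections per node
    return actual / maximum, degree / (span - 1)


for span in (5, 7, 9):
    print(span, connection_ratio(span))
```

With A = S·D/2 connections present and M = S·(S−1)/2 possible, the two values in each pair agree, which is the ratio used in the rest of the derivation.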

Therefore, equation (2) is equivalent to the following:


$$\text{Intricacy} = -S \, \frac{D}{S-1} \log_2 \frac{D}{S-1} \tag{4}$$


D is the average degree of the nodes in the group.

The connection-ratio factor in this equation, $-x \log_2 x$ with $x = D/(S-1)$, reaches its maximum when:


$$\frac{D}{S-1} = \frac{1}{e} \tag{5}$$

Solving (5) for $S_o$, the span at which the maximum is reached:

$$S_o = 1 + De \tag{6}$$
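The maximum can be checked numerically. Setting the derivative of −x ln x to zero gives −ln x − 1 = 0, so x = 1/e; the sketch below confirms this by brute force for the one-dimensional case (D = 2). It also computes an efficiency figure, which is assumed here to be the ratio of −x log2 x at a given span to its value at the optimal span. That definition is an interpretation rather than something spelled out in the text, though it does reproduce the 94% figure quoted for spans of five and nine.

```python
import math


def per_node_intricacy(span, degree=2):
    """-x * log2(x) with x = D / (S - 1), the connection ratio from the derivation."""
    x = degree / (span - 1)
    return -x * math.log2(x)


def efficiency(span, degree=2):
    """Intricacy at `span` as a fraction of the intricacy at S_o = 1 + D*e (assumed definition)."""
    return per_node_intricacy(span, degree) / per_node_intricacy(1 + degree * math.e, degree)


# Brute-force search over spans 3.01 .. 30.00 for the one-dimensional case (D = 2).
candidates = [3 + k / 100 for k in range(1, 2701)]
best_span = max(candidates, key=per_node_intricacy)
print(f"numerical optimum: {best_span:.2f}, formula: {1 + 2 * math.e:.2f}")

for span in (5, 9):
    print(f"span {span}: {efficiency(span):.0%} of maximum")
```

The search lands within one grid step of 1 + 2e, and passing `degree=4` or `degree=6` to the same functions covers the two- and three-dimensional cases.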

Statistical Comparison of Optimal Span Hypothesis with Empirical Data

Average spans for all domains are well within the theoretical optimal span range of five to nine (for one-dimensional domains), nine to seventeen (for two-dimensional domains) and thirteen to twenty-five (for three-dimensional domains). More data and details on the analysis are available in [ Glickstein, 1996 ].

| Domain | Parameter | Reported (CI = confidence interval for t-test) | One-Dimension | Two-D | Three-D |
|---|---|---|---|---|---|
| Optimal Span Theory | Theoretical Optima, $S_o = 1 + De$ | Node Degree | D = 2 | D = 4 | D = 6 |
| | | Optimal Span | $S_o$ = 6.4 | $S_o$ = 11.9 | $S_o$ = 17.3 |
| | | 94%+ Optimal | 5 … 9 | 9 … 17 | 13 … 25 |
| | Near Optimal | 80%+ Optimal | 4 … 13 | 7 … 26 | 11 … 38 |
| Human Cognition | Miller’s Data | Mean Span | 6.5 | 12.5 | 17.2 |
| | | CI (t-test) | 5.24..7.76 (95%+) | (anecdotal) [5] | (anecdotal) |
| Human Language | English Characters per Word | Mean Span | 5.74 [1] | | |
| | | CI (t-test) | 5.72..5.76 (99%+) | | |
| | English Words per Predication | Mean Span | 6.96 [2] | | |
| | | CI (t-test) | 6.94..6.98 (99%+) | | |
| | Chinese | Mean Span | 5.7 [3] (anecdotal) | | |
| | Japanese | Mean Span | 6.1 [4] (anecdotal) | | |
| Human Organization | Business and Industry | Mean Span | 6.7 | | |
| | | CI (t-test) | 5.71..7.69 (99%+) | | |
| | Overall | Mean Span | 7.1 (anecdotal) | | |
| Animal/Plant Organization | Overall | Mean Span | 6.5 (anecdotal) | | |
| Cell Structure & Gene Regulation | Cell Structure and Type | Mean Span | 7.43 | | |
| | | CI (t-test) | 5.82..9.0 (90%+) | | |
| | Gene Regulation | | | | |
| RNA & DNA Structure | Five RNAs | Mean Span | 6.39 | | |
| | | CI (t-test) | 5.01..7.77 (99%+) | | |
| | E. Coli 16S tRNA | Mean Span | 6.19 | | |
| | | CI (t-test) | 5.55..6.83 (99%+) | | |
| | DNA | Mean Span | 6.7 (anecdotal) | | |
| Computer Simulation | Game of Life | Mean Span | | 12.5 (anecdotal) | |


[1] The space that is the first character of a word is included in the character count per word. Characters per word data are from [Francis and Kucera, 1982], who did a computer count of one million words of 15 genres of English literature.

[2] Words per predication is a stand-in for words per simple sentence. Data are from [Francis and Kucera, 1982], who did a computer count of one million words of 15 genres of English literature.

[3] Chinese is written in character blocks with up to six units per block and one or more blocks per word. Each unit is made up of a number of strokes. There is generally no space delimiting words. (Thereaderinsertsavirtualspacesasyoujustdidhere.) In this analysis, a virtual space was counted as a character unit for each word. The overall average span data is the average of (a) strokes per unit (5.4) and (b) units per word (6.0).

[4] Japanese writing utilizes a mix of Chinese ideograms plus phonographic (alphabetic) symbols. The Chinese units were counted as described above and the phonographic symbols were added to get a total of units per word. As with Chinese, Japanese is normally written with no space delimiting words. In this analysis, a virtual space was counted as a character unit for each word. The overall average span data is the average of (a) units and/or phonographic characters per word (5.3), (b) words per simple sentence (5.5), and (c) simple sentences per paragraph (7.4).

[5] Anecdotal data is reported when the number of samples was insufficient to obtain at least a 90% confidence interval using the statistical “t-test”.

Notable Exception to Optimal Span

As noted above, hierarchy means rank or order of holy beings. The hierarchies of the angels of the heavenly host, as recounted in Jewish and Christian scriptures and some later mystical writings, are not typically in the range five to nine and therefore do not conform to the Optimal Span hypothesis. That is a good result because these hierarchies are not competitively selected! They are either the product of human imagination or the Creation of God, who is not necessarily bound by the laws of information theory!

The Old Testament includes two different accounts of the numbers of angels. The first claims there are 20,000 in four levels, which would yield an average Span of 11.6 which is 86% optimal. The second claims there are 101,000,000, also at four levels, which is a whopping average Span of 100, which is only 21% optimal.

The New Testament also includes two different accountings. The first claims there are 72,000 in seven levels, which would yield an average Span of only 4.4, which is 85% optimal. The second claims there are 100,000,000 at eight levels, which is an average Span of 12, which is 85% optimal.

Later Jewish and Christian clerics and mystics have estimated either 301,655,722 or 311,048,892 angels at three, nine, ten, eleven, or twelve levels. It is amazing how exact the claimed counts are, to nine significant figures! Given that number of angels, if there are only three levels, the Span comes out to be 673, which is only 5% optimal. Given nine to twelve levels, however, the results are closer to reasonable and within the expected range of five to nine. For nine levels, the Span is 8.7, which is 95% optimal; for ten levels, the Span is 7.0, which is 99.5% optimal; for eleven levels, the Span is 5.8, which is 99% optimal; for twelve levels, the Span is 5.0, which is 94% optimal.


Glickstein, 1996 – Hierarchy Theory: Some Common Properties of Competitively-Selected Systems, Ira Glickstein, 1996, Graduate School, Binghamton University (NY). http://web.archive.org/web/20010419115651/http://pages.prodigy.net/ira/fudd.htm#dis

Glickstein, 2003 – Quantifying the “Mythical Man-Month”, Ira Glickstein, revised 2003, University of Maryland University College. http://polaris.umuc.edu/~iglickst/mswe601/optimal_MMM.doc

Meijer, 2006 – Organization Structures for Dealing With Complexity, Bart Ruurd Meijer, 2006, Technische Universiteit Delft, Netherlands. See Section 3.5 (page 103) “Glickstein’s hierarchical span theory”. Specific pages: 6, 103, 104, 106, 107, and 204. ISBN-10: 90-9020642-6. ISBN-13: 978-90-020642


[Added 4 April 2013: The following link worked: http://repository.tudelft.nl/assets/uuid:843020de-2248-468a-bf19-15b4447b5bce/dep_meijer_20061114.pdf ]

Miller, 1956 – The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, George A. Miller, 1956, The Psychological Review, vol. 63, pp. 81-97


Pattee, 1973 – Hierarchy Theory – The Challenge of Complex Systems, Howard H. Pattee, 1973, International Library of Systems Theory and Philosophy. http://www.amazon.com/review/product/080760674X/ref=dp_top_cm_cr_acr_txt?%5Fencoding=UTF8&showViewpoints=1

Smith and Morowitz, 1982 – Between History and Physics, Temple F. Smith and Harold J. Morowitz, Journal of Molecular Evolution, Springer-Verlag. http://www.springerlink.com/content/k021665p34j75565/

Waldrop, 1992 – Complexity – The Emerging Science at the Edge of Order and Chaos, M. Mitchell Waldrop, 1993, Simon and Schuster