Emeagwali's answers to frequently asked questions.
What are distributed and parallel computing?
These are technologies used to harness the power of thousands of
“electronic brains” called microprocessors.
Your personal computer is powered by one microprocessor,
while a supercomputer is powered by up to 65,536 microprocessors.
I was the first to program a 65,536-processor parallel computer to
outperform a conventional supercomputer. The key to my discovery was
visualizing the names, locations and 12 nearest nodes of each processor.
My 65,536 processors were networked as a 12-dimensional hypercube with
16 processors at each of its 4096 (two to the power of 12) nodes.
Artist's illustration of my chicken-versus-oxen metaphor, in which I demonstrated
that 65,000 chickens
(inexpensive computers) are more powerful than a single $100 million supercomputer.
(Courtesy of "One of the World's Fastest Humans," Michigan Today, February 1991)
Do you believe that distributed computing is more powerful than parallel computing?
I disagree with the claim that clusters are cost-effective for seismic modeling. Clusters are B-grade supercomputers with advertised high theoretical computational speeds. You cannot perform petroleum seismic simulation with theoretical horsepower.
If you look under the hood, so to speak, you will see that the computing nodes of a massively parallel supercomputer are tightly-coupled, while those of a cluster are loosely-coupled.
The laws of physics used in discovering and recovering petroleum are codified using an advanced form of calculus called partial differential equations. Prior to solving these equations on a supercomputer, the computational mathematician must reduce them to a discretized format that is defined over millions of tightly-coupled points. The supercomputer programmer will obtain low practical speed when a tightly-coupled problem is distributed amongst loosely-coupled clusters of computers. Therefore, the most efficient computing paradigm for seismic and reservoir simulation will remain a massively parallel computer.
Furthermore, I demonstrated in my 1988 discovery that it is practical and cost-effective to tightly couple 65,536 processors within a refrigerator-sized computing machine. For the sake of comparison, it would require a big building to house a cluster of 65,536 servers that are stacked on metal racks.
What is a hypercube?
A hypercube is the generalization of a cube to higher dimensions.
I visualized my processors at the vertices of a hypercube.
Below is an illustration of the information pathways that I used to
construct high-dimensional hypercube configurations. Each hypercube
was constructed by moving the next lower hypercube along an additional
direction. For example, my 7-dimensional hypercube was constructed
by moving my 6-dimensional hypercube along the seventh dimension.
A three-dimensional cube contains eight vertices with three edges
emanating from each vertex. Below is a five-dimensional hypercube
that contains 32 vertices with five edges emanating from each vertex.
A sixteen-dimensional hypercube contains 65,536 vertices with sixteen
edges emanating from each vertex. I visualized my 65,536 processors as
a 12-dimensional hypercube with 12 bi-directional communication channels
emanating from each computing node and 16 processors at each node.
The above enabled me to create the new knowledge for controlling and
instructing the flow of information and data along the twelve bi-directional
communication channels of a 12-dimensional hypercube computer.
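The text does not spell out the routing rule used on those twelve channels; one standard scheme that fits this description is dimension-order ("e-cube") routing, sketched below as a generic illustration rather than the author's exact method. A message moves toward its destination by correcting one differing address bit, i.e. crossing one channel, at a time.

```python
def dimension_order_route(src, dst, n):
    """Route a message on an n-dimensional hypercube using
    dimension-order (e-cube) routing: fix the differing address
    bits one dimension at a time, lowest dimension first.
    Returns the list of nodes visited, from src to dst."""
    path = [src]
    node = src
    for d in range(n):
        if (node ^ dst) & (1 << d):   # addresses differ in dimension d
            node ^= 1 << d            # cross the channel in dimension d
            path.append(node)
    return path

# Example: on a 3-cube, routing from node 000 to node 101
# visits one intermediate node, since the addresses differ in 2 bits.
print(dimension_order_route(0b000, 0b101, 3))
```

The number of hops equals the number of differing address bits, so no route is ever longer than the dimension of the hypercube.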
A hypercube with one and two nodes.
How did you program thousands of processors to achieve your world record?
First, I began with a map of the names, locations and links to all 65,536 processors.
The links of my hypercube computer correspond to the edges of the associated hypercube graph.
A better understanding of my hypercube topology is gained by studying how a hypercube graph is
constructed from the next lower-dimensional hypercube graph.
An N-dimensional hypercube graph can be constructed from two (N-1)-dimensional
hypercube graphs by connecting the corresponding vertices of the two graphs.
The vertices are then renumbered: those of one graph receive a highest-order
address bit of 0, and those of the other a highest-order address bit of 1. The example of
how to build and number a four-dimensional hypercube computer from two three-dimensional hypercube
computers is illustrated in the figure below.
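The recursive construction described above can be sketched in a few lines of Python: take two copies of the (N-1)-dimensional graph, shift the second copy's vertex labels so its highest-order address bit is 1, and connect corresponding vertices.

```python
def hypercube_edges(n):
    """Edges of an n-dimensional hypercube graph, built recursively:
    two copies of the (n-1)-dimensional graph, with corresponding
    vertices connected. Vertices are integers 0 .. 2**n - 1."""
    if n == 0:
        return []                      # a single vertex, no edges
    lower = hypercube_edges(n - 1)
    half = 1 << (n - 1)                # 2**(n-1) vertices per copy
    edges = list(lower)                # copy with highest-order bit 0
    # copy with highest-order bit 1: shift all labels up by 2**(n-1)
    edges += [(u + half, v + half) for u, v in lower]
    # connect each vertex to its counterpart in the other copy
    edges += [(v, v + half) for v in range(half)]
    return edges

# A 3-cube has 8 vertices and 12 edges; a 12-cube has 24,576 edges.
print(len(hypercube_edges(3)), len(hypercube_edges(12)))
```

Each doubling step adds 2^(N-1) new "connecting" edges on top of the two copies, which is why an N-dimensional hypercube has N × 2^(N-1) edges in total.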
2-, 3-, 4-, 5-, 6-, and 7-dimensional hypercube configurations, respectively.
Figure also illustrates how higher dimensional hypercubes can be constructed from lower dimensional ones.
(Illustration by MathWorld)
The five colors in my five-dimensional hypercube graph below show the five independent paths of
a five-dimensional hypercube, which can be used to deliver a message between any two
processing nodes that are the farthest possible distance apart. Similarly, N colors can be used to
show the N independent paths of an N-dimensional hypercube that can be used to
deliver a message between any two processing nodes that are the farthest possible distance apart.
The graph of my five-dimensional hypercube with 32 vertices. In my mental image of my
32-node, five-dimensional hypercube computer (shown above), the pink
dots represent my computing nodes
and the lines represent my communication channels.
Every node in my hypercube computer is identified by a unique binary number which consists
of 0's and 1's. The length of these binary numbers is equal to the dimension of the hypercube computer.
Their values, in the decimal numbering system, range from zero to the number of processing nodes
minus one, inclusive. These binary representation numbers are uniquely arranged in a sequence
generated by using the binary-reflected Gray code. The binary-reflected Gray coded numbering
system is illustrated in Figure 1 for the cases of zero-, one-, two-, and three-dimensional hypercubes.
The reason that I preferred the binary-reflected Gray code over the other binary Gray codes is
that it allowed me to easily compute the number of nodes that must be traversed in sending a
message between any two nodes of my hypercube computer. In other words, it allowed me
to compute how far apart any two processing nodes, within my binary hypercube computer, are from
each other. As can be deduced from the illustration, this distance is equal to the number of bit
positions in which their binary identification numbers differ. Hence, two processing nodes are
nearest neighbors if their binary representation differs by exactly one bit position. For example, my
parallel computer was a twelve-dimensional hypercube computer, and, consequently, each node within it
could be uniquely identified by a 12-digit binary number. Those represented by the 12-digit binary
numbers 000 001 000 000 and 010 001 000 000 are nearest neighbors, while those represented
by 000 111 000 000 and 000 000 111 000 are a distance of 6 units apart. Using the same reasoning,
I can show that the average distance between any two processing nodes is N/2, that the greatest distance
between any two processing nodes of a hypercube graph is not greater than the dimension of my
hypercube computer, and that each processing node of my N-dimensional hypercube computer
has N links that directly connect it to N other processing nodes, which are also its nearest neighbors.
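The distance rule described above (the number of differing bit positions, i.e. the Hamming distance) is one XOR away in code. This sketch reproduces the two 12-bit examples from the text:

```python
def hop_distance(a, b):
    """Hops between nodes a and b of a hypercube computer:
    the number of bit positions in which their binary
    addresses differ (the Hamming distance, via XOR)."""
    return bin(a ^ b).count("1")

# The two examples from the text:
print(hop_distance(0b000001000000, 0b010001000000))  # nearest neighbors
print(hop_distance(0b000111000000, 0b000000111000))  # 6 units apart
```

Averaging this distance over all pairs of addresses also confirms the N/2 claim: each of the N bits differs in exactly half of all address pairs.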
I prefer the HyperBall over the hypercube. The reason is that a hypercube computer has several
disadvantages. First, the hypercube computer uses a large
number of expensive communication channels. This is because when the number of processing
nodes is doubled, the number of communication channels is more than doubled. For a
system that has the capability to perform bi-directional internode communication, if 2^N processing
nodes are used, then N x 2^N communication channels must be used. For example, the ten-dimensional model of a
hypercube computer series has 1,024 processing nodes and 10,240 bidirectional communication channels.
On the other hand, the eleven-dimensional model of the same series has 2048 processing nodes
and 22,528 communication channels. As we can see, merely doubling the number of processing
nodes more than doubles the required number of communication channels. This disproportionate
growth in the number of communication channels gets even worse as the number of processing nodes
used increases. In comparison, each node of a k-dimensional mesh computer has 2k
channels. Therefore, the total number of communication channels of a mesh computer grows at the same
rate as the nodes.
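The channel counts quoted above follow directly from the two formulas (N x 2^N bidirectional channels for an N-dimensional hypercube, versus 2k channels per node for a k-dimensional mesh), and a short script makes the contrast in growth rates concrete:

```python
def hypercube_channels(n):
    """Bidirectional communication channels in an n-dimensional
    hypercube computer: each of the 2**n nodes has n channels."""
    return n * 2 ** n

def mesh_channels(k, nodes):
    """Channels in a k-dimensional mesh computer: each node has 2k
    channels, so the total grows linearly with the node count."""
    return 2 * k * nodes

# The 10- and 11-dimensional hypercube models from the text:
print(hypercube_channels(10))  # 1,024 nodes -> 10,240 channels
print(hypercube_channels(11))  # 2,048 nodes -> 22,528 channels
```

Doubling a hypercube's node count multiplies its channel count by more than two, while doubling a mesh's node count exactly doubles its channel count.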
A three-dimensional hypercube graph.
A four-dimensional hypercube graph.
A five-dimensional hypercube graph.
A six-dimensional hypercube graph.
A seven-dimensional hypercube graph.
An eight-dimensional hypercube graph.
A nine-dimensional hypercube graph.
A ten-dimensional hypercube graph.
An 11-dimensional hypercube graph.
A 12-dimensional hypercube graph. The information pathways of the
4096 nodes of my twelve-dimensional hypercube. Each node
(symbolically represented by a point) has twelve communication
channels (represented by a line) emanating from it.
A 13-dimensional hypercube graph.
A 14-dimensional hypercube graph.
A 15-dimensional hypercube graph.
Have you established a new world computational speed record?
I programmed 65,536 processors to perform the world's
fastest computation (3.1 billion calculations per second
in 1988), but I never set out to establish a world computing record. It
made the headlines because it had, at that time, been widely
believed that it would be impossible to program thousands of
processors to outperform conventional supercomputers.
Although the supercomputer was my claim to fame,
my research was never on supercomputers, per se. It
was on the kernel of knowledge that powers both the supercomputer
and the Internet. My focus is on the conceptual foundation of
the next-generation Internet, namely computation and communication.
The essence of my research was to demonstrate that thousands of inexpensive processors could outperform any supercomputer. In other words, I wanted to create the knowledge that supercomputers should utilize thousands of processors.
In 1988, I announced that I had successfully divided a petroleum reservoir into 65,536 smaller problems and then mapped, communicated and distributed them to 65,536 processors of the Connection Machine, all networked together as a 12-dimensional hypercube with 16 processors at each of its 4,096 nodes.
The hypercube was a cube in 12 dimensions with 4,096 (two to the power of 12) vertices. I used 4,096 nodes because this enabled me to map my problem onto a 12-dimensional mathematical hyperspace. I projected a three-dimensional problem onto a 12-dimensional hypercube space, then projected it back into three-dimensional space.
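The text does not give the exact scheme by which the reservoir was divided among the 4,096 nodes; a hedged sketch of one common approach, a uniform block decomposition of a 3-D grid (all names and grid sizes below are illustrative assumptions, not the original code), looks like this:

```python
def block_owner(x, y, z, grid, blocks):
    """Map a 3-D grid point (x, y, z) to the node that owns it under
    a uniform block decomposition. `grid` and `blocks` are (nx, ny, nz)
    tuples; the entries of `blocks` must multiply to the node count.
    This is an illustrative sketch, not the original mapping."""
    bx = x * blocks[0] // grid[0]     # which block along each axis
    by = y * blocks[1] // grid[1]
    bz = z * blocks[2] // grid[2]
    # flatten the 3-D block index into a node number
    return (bx * blocks[1] + by) * blocks[2] + bz

# A hypothetical 256^3 grid split into 16 x 16 x 16 = 4,096 blocks,
# one block per node of a 4,096-node hypercube computer:
print(block_owner(0, 0, 0, (256, 256, 256), (16, 16, 16)))
print(block_owner(255, 255, 255, (256, 256, 256), (16, 16, 16)))
```

Each node then works on its own block and exchanges only the boundary values of its block with neighboring nodes, which is why a tightly-coupled network matters for this class of problem.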
In the 1980s, the debate among supercomputer manufacturers was: “Is it practical to use thousands of processors?” The manufacturers’ consensus was “no.” I answered “yes” by using 65,536 processors to solve one of the 20 most difficult problems in the computing field. My original calculations were conducted with the partial differential equations used in petroleum reservoir simulators. However, it was understood that my mathematical, programming and mapping techniques could be extrapolated to equations used for climate, cosmological, and other computation-intensive simulations. The latter achievement eventually became my “claim to fame.”
What did you contribute to the reinvention of supercomputers?
I added new knowledge that is incorporated inside all modern supercomputers.
I took parallel computing to a higher level than the supercomputer, an
achievement that inspired the reinvention of the supercomputer. In
The New York Times (11/29/89), the
president of the leading supercomputer manufacturer warned,
"We can't find any real progress in harnessing the power of thousands of
processors." A few weeks later, my discovery of the formula that
allows parallel computers to perform their fastest computations
started making the headlines.
My formula enabled me to subdivide the problem into smaller parts
and re-distribute the parts to 65,536 processors. That discovery
inspired the reinvention of vector supercomputers as parallel
supercomputers. Vector supercomputers use an ultra-expensive
processor designed to perform the fastest calculations on long
strings of numbers called “vectors.”
I created the knowledge that the power of thousands of processors can
be harnessed; this knowledge, in turn, inspired the reinvention of
supercomputers powered by only one processor as supercomputers powered
by thousands of processors. Since my discovery - in part - opened and
prepared the technology for commercialization, I am considered a pioneer
in parallel computing.
My wife and I at the Gordon Bell Prize award ceremony, Cathedral Hill Hotel, San Francisco, CA.
February 28, 1990.
Updates by Webmistress
Emeagwali's discoveries described above inspired Bill Clinton
to extol him as "one of the
great minds of the Information Age." CNN and the Institute of Electrical
and Electronics Engineers also commended him
for contributing to the reinvention of the supercomputer.
In the 1980s, the debate among supercomputer manufacturers was: Is it
practical to use thousands of processors? The consensus was NO. IBM's top computer designer, Gene
Amdahl, postulated in a classic 1967 paper that it would be
impractical to use many processors to solve real-world problems.
On November 29, 1989,
the president of Cray, the leading supercomputer manufacturer,
told The New York Times:
"We can't find any real progress in harnessing the power of
thousands of processors"
Emeagwali disagreed and distributed
copies of his 1057-page report
that provided a detailed step-by-step method for harnessing the
power of 65,536 processors. This work led to the reinvention of
supercomputers, from using one processor to using hundreds or thousands of processors.
CNN described Emeagwali's
reinvention of the supercomputer as follows:
"It was his [Emeagwali] formula that used 65,000 separate
computer processors to perform 3.1 billion calculations per
second in 1989. That feat led to computer scientists
comprehending the capabilities of supercomputers and the
practical applications of creating a system that allowed
multiple computers to communicate. He is recognized as one
of the fathers of the Internet."
Emeagwali's groundbreaking studies that changed the way
IBM thinks about supercomputers almost didn't get published.
At first rejected for bucking the conventional
theory of the
day, it was
finally published after it won the
1989 Gordon Bell Prize, computation's Nobel Prize.
This work was a fundamental shift in supercomputer design: incorporating thousands of weak processors instead of one
powerful one. IBM summarily rejected his proposal to employ or fund his
initial research. Today, computer giants use
the technology to manufacture supercomputers. Since the
supercomputer of today is the computer of tomorrow, technological
inventions are the lifeblood of the information technology industry. For
hardware manufacturers, such as IBM, they are vital technologies
that enable them to compete, diversify and expand. In fact, since
one-third of the US economic growth is now attributed to information
technology, new computer inventions are also the lifeblood of the
US economy. Put simply, the United States needs more powerful
computers for growth, profitability and global competitiveness.
Any new idea that is a radical departure from the accepted wisdom
goes through three stages: Rejection, Ridicule, and Acceptance.
For an idea to gain acceptance, the originator must continue to harp on it
even when the listeners find it annoying. The struggle to get an innovative
idea accepted is greater than the struggle to conceive it.
As the writer George Bernard Shaw wrote:
"The reasonable man adapts himself to the world; the unreasonable man
persists in trying to adapt the world to himself. Therefore, all progress
depends on the unreasonable man."
- History of the Internet, by Christos J. P. Moschovitis, et al, 1999
- Upstream, Oslo, Norway (oil & gas industry publication), January 27, 1997
- Software, Institute of Electrical and Electronics Engineers, May 1990
- SIAM News, Society of Industrial and Applied Mathematics, lead story, June 1990
- CNN, http://fyi.cnn.com/fyi/interactive/specials/bhm/story/black.innovators.html
- The New York Times, November 29, 1989.
- The White House, http://clinton6.nara.gov/2000/08/2000-08-26-remarks-by-the-president-in-address-to-joint-assembly.html
Emeagwali was born in Nigeria (Africa) in 1954. Due to
civil war in his country, he was forced to drop out
of school at the age of 12 and was conscripted into
the Biafran army at the age of 14. After the war ended,
he completed his high school equivalency by self-study
and came to the United States on a scholarship in March
1974. Emeagwali won the 1989 Gordon Bell Prize,
which has been called "supercomputing's Nobel Prize,"
for inventing a formula that allows computers to perform
fast computations - a discovery that inspired the
reinvention of supercomputers. He was extolled by the
then U.S. President Bill Clinton as "one of the great
minds of the Information Age” and described by CNN
as "a Father of the Internet." Emeagwali is the most
searched-for modern scientist on the Internet (emeagwali.com).
Click on emeagwali.com for more information.