The Intel Itanium processor, a monumental collaboration between Intel and Hewlett-Packard, represented an ambitious attempt to create a new 64-bit computing architecture (IA-64) to supersede the dominant x86 standard, but ultimately failed to achieve widespread market adoption due to technical challenges, market shifts, and the enduring strength of x86.
In June 1994, Intel and Hewlett-Packard - two of Silicon Valley's largest and
most powerful companies - announced an alliance.
From the union of these two giants will spring forth the next generation of CPUs.
The Great Successor. Chosen to unify two architectures under one umbrella.
It was named Itanium and by 2002 Intel had spent $5 billion on it. In today’s video,
we trace one of Intel's most ambitious products.
## Intel and 64-Bits
The x86 instruction set helped turn Intel into a giant.
A massive ecosystem had built up around it. In the 1980s, four out of every five
PCs shipped with an Intel CPU. These huge volumes helped them afford to build big,
advanced semiconductor fabs and produce at the lowest cost.
Why leave it all behind? But after shipping the famous Pentium CPU,
powerful voices inside Intel indeed began to assert that the time had come for something
new. The foremost reason for doing so was something called 64-bit computing.
The "64-bit" part of that phrase refers to the size of a CPU's "register". At the time,
Intel's CPUs were 32-bit processors, and that limited them in several ways.
The most prominent limit being that a 32-bit computer can only use up to
about 4 gigabytes of working memory: 2 to the power of 32. Less in practice,
because some of that is taken up by the operating system.
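To put numbers on that 4-gigabyte wall, here is a minimal Python sketch of the arithmetic. It assumes a flat, byte-addressed memory model where a pointer is exactly as wide as a register; the function name `addressable_bytes` is made up for illustration.

```python
# Address-space arithmetic for a flat, byte-addressed machine.
# Assumption: one pointer names one byte, and pointers are register-width.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** pointer_bits


limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(limit_32)              # 4294967296 bytes
print(limit_32 // GIB)       # 4 GiB
print(limit_64 // limit_32)  # a 64-bit space is 4294967296 times larger
```

The jump from 32 to 64 bits does not double the ceiling, it squares it, which is why the workstation and server vendors cared long before PC buyers did.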
In the early 1990s, this 4-gigabyte wall was not a big deal for the consumer market
because PC memories topped out at about 128 megabytes. Who can imagine ordinary
folks ever needing much more than that at least in the near future?
But it was a big deal for graphics workstations, scientific computers
handling precise calculations, and web servers delivering content over the Internet.
These are powerful, very expensive machines that at the time ran UNIX.
Intel then dominated the PC, but had no presence in that high end space.
That space was populated by RISC chips like Sun Microsystems' SPARC,
Hewlett-Packard's PA-RISC, or DEC's Alpha. Intel wanted to get into that game.
## Extension versus Blank Sheet
So question. Why not just extend the existing
32-bit x86 instruction set so that it can handle 64-bit registers?
After all, that is what Intel did with the prior major transition from 16-bit
to 32-bit. It wasn't easy, but the resulting 386 Intel CPU was a massive
success - powering a generation of PC clones like those from Compaq.
Intel even tried a similar, very ambitious blank sheet approach for that 32-bit transition.
The iAPX 432 was Intel's first 32-bit architecture. And to skip a lot of
words - feel free to read the very long Wikipedia article if you care - that product failed.
But kind of like invading Russia, history rhymes. Intel felt that the 64-bit transition
would be different. And that this time, x86's years-old legacy CISC components would hold it
back. A lot of extra tooling and rules had to be followed to preserve that old world.
AMD and the other x86 cloners were a factor too. A history of 64-bit
computing by Matthew Kerner and Neil Padgett interviewed Richard Russell,
who pointed out that AMD's cross-licensing agreements gave them access to Intel's x86 work.
So the way it went was that Intel first releases a new x86 chip. Then six or twelve months later,
AMD releases their version at a cheaper price. This devalued
Intel’s R&D and burned a huge amount of profits for everyone involved.
The ghost of IBM and the PC loomed too. There is no guarantee that Intel will forever control x86.
The day might come that AMD, Cyrix, and the other x86 cloners somehow
pry control of the standard like what the PC cloners did to IBM.
So in Intel's eyes, yeah sure they can always extend x86. But the reboot-with-a-clean-sheet
approach could potentially let Intel surge ahead of the competition with an architecture
that it fully owned. And proponents argued that Intel now had enough influence to pull it off.
The debate raged until the late, great Albert Yu - Intel's general manager of microprocessors,
who oversaw development of the 386, 486, and the Pentium - bought in.
But how to achieve it? Kerner and Padgett also interviewed Dileep Bhandarkar,
who was then an Intel director. Bhandarkar recalled the company
doing a small internal 64-bit effort in 1992 while investigating outside opportunities.
The computer company DEC tried to get Intel to take on their RISC chip Alpha,
a very fast chip, which they declined. Intel then suggested DEC make the Alpha in Intel’s
fabs, which DEC declined because they just spent half a billion dollars on a new fab.
Then in late 1993, HP came knocking on Intel's door with an exciting new technology.
## A Post-RISC Technology
In 1990, Hewlett-Packard rehired the brilliant
Bill Worley to flesh out the future of their proprietary line of chips.
Worley used to work at IBM alongside John Cocke on the legendary IBM 801 project.
801 is widely acknowledged as the driving force that kicked off the RISC revolution.
He then joined HP where he helped produce one of the earliest RISC instruction sets,
Precision Architecture RISC, or PA-RISC. The architecture became a
growth engine for Hewlett-Packard through the turbulent RISC wars of the 1980s.
Worley then briefly left HP to lead a graphics processor
startup but rejoined in 1990 for a special project. The PA-RISC team
recognized that RISC was on the verge of hitting serious performance limits.
So a new project initially called "Super Workstation" was formed
to explore new architectures in the post-RISC beyond. Over time,
Super Workstation's work began to intertwine with that of another team inside Hewlett-Packard:
Fine-grained Architecture and Software Technologies, or FAST, an HP internal
project exploring and evolving a radical concept known as Very Long Instruction Word, or VLIW.
## Meet VLIW
Very Long Instruction Word is a term coined by the brilliant Joshua Fisher while he was at Yale.
The way he puts it, VLIW describes a design philosophy. A concept or idea
more like RISC, rather than a specific instruction set like ARM or x86.
Its goal is for a CPU to achieve as much Instruction-Level Parallelism or
ILP as possible without making its hardware do it.
What is ILP? It is a way for a single microprocessor to speed up work by
initiating and executing multiple machine instructions in parallel
so that we can try to get more than one useful operation done per clock cycle.
Traditionally, high levels of ILP were seen as infeasible because programs have so many branching
conditions: If/else statements, loops and annoying dependencies that change the path of the code.
VLIW tries to surpass those shortcomings
by running "traces" of the program code. Using heuristics and user-provided data,
the compiler will try and guess how the user's program might progress.
That compiler then aggressively schedules the trace’s instructions
for maximum parallelism, even moving them across branches. To handle mistaken guesses,
the compiler adds compensation code to "backtrack" or fix things up.
These scheduled instructions are then packed together and sent to
the hardware in "very long words". Ergo the name.
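The packing step can be sketched in a few lines of Python. This is a toy illustration only, not Multiflow's or anyone else's real compiler: it assumes the trace has already been picked, it ignores branches and compensation code entirely, and the function name `pack_into_words` and the five single-letter operations are hypothetical.

```python
from typing import Dict, List, Set


def pack_into_words(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedy static scheduler (toy model).

    `deps` maps each operation to the operations whose results it needs,
    listed in program order. Each operation goes into the earliest word
    that comes after all of its inputs and still has a free slot.
    """
    words: List[List[str]] = []
    placed: Dict[str, int] = {}  # operation -> index of the word holding it
    for op, needs in deps.items():
        # An operation must issue at least one word after each of its inputs.
        earliest = max((placed[d] + 1 for d in needs), default=0)
        slot = earliest
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1  # that word is full, try the next one
        if slot == len(words):
            words.append([])
        words[slot].append(op)
        placed[op] = slot
    return words


# Hypothetical trace: c needs a and b; e needs c and d.
example = {"a": set(), "b": set(), "c": {"a", "b"}, "d": set(), "e": {"c", "d"}}

print(pack_into_words(example, width=3))  # [['a', 'b', 'd'], ['c'], ['e']]
print(len(pack_into_words(example, width=1)))  # 5 words on a one-wide machine
```

Five operations fit into three words on a three-wide machine, and all of that decision-making happened before the program ever ran. The hard part that this sketch skips is everything to do with guessing which way the branches will go.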
People initially thought VLIW computers were impossible. That
is because it requires a compiler that can somehow predict a program’s future.
The difficulty of producing such compilers is a recurring theme with this technology.
## Fisher and Rau
Wanting to prove the skeptics wrong, Josh Fisher left Yale to found a startup called Multiflow.
In 1987, they produced a line of powerful mini-supercomputers called TRACE. Over the
next two years, they sold and shipped about 100 units to scientific and commercial users.
Multiflow was not the only startup exploring VLIW at the time. There was
another founded by a brilliant Indian-American named Bob Rau.
Rau had led a team at the computer company TRW
studying similar Instruction-Level Parallelism techniques. In the same
year Fisher founded Multiflow, Bob Rau and several colleagues left to found Cydrome.
Cydrome worked on a VLIW-based "departmental supercomputer" called the Cydra 5. And while they
got it to work, it never shipped as a commercial product. The company eventually disbanded.
Multiflow also disbanded. In 1989, the mini-supercomputer market crashed from
over-competition in the category as well as cannibalization by powerful
single-chip RISC workstations called "Killer Micros". Circumstances trumped technology.
## A Radical Idea
After their startups closed down, both Bob Rau and Josh Fisher joined Hewlett
Packard and the FAST project with the goal of evolving the VLIW technology.
At the time, the big thing in the microprocessor world
was an ILP approach called Out-of-order Superscalar. This approach was arguably
pioneered by the aforementioned John Cocke and Tilak Agerwala.
Roughly speaking, superscalar involves us adding independent stations to the CPU, plus extra
hardware to grab a lot of instructions, figure out their various dependencies, and send them to
the right stations for simultaneous execution. This is all done as the program is running.
Superscalar worked. IBM utilized it then for their high-performing RS/6000 workstation.
Intel would later use it for their Pentium processors. But Rau and Fisher
came to believe - quite controversially - that superscalar is an anchor. An anchor that will
blunt the lift that microprocessors were then getting from Moore's Law.
Superscalar leans heavily on hardware to analyze instructions, figure out their
various dependencies, and sort them into the ideal order as the program runs. Such hardware
is incredibly complex and power-hungry. Rau and Fisher bet that it will not scale.
With their contributions, Super Workstation produced a new architecture called PA-Wide
Word or PA-WW. It performed quite well compared to what existed inside HP.
Next then is to design and produce a chip that implements this architecture. But in this,
there were challenges. Worley realized that PA-WW chips would have to be made
in a leading edge fab. In a 2001 interview for HP Labs, he explained the ramifications of such:
> The costs of such a fab implied that the chip volumes would have to be extremely high. High
volumes, as well as the need to attract software from many providers, implied that the architecture
would have to be an industry standard. An industry standard implied that HP could not do it alone.
Thus in July 1992, Worley recommended that HP bring in a manufacturing partner
with both prowess and scale. The obvious partner was Intel.
On Thanksgiving 1993, HP's CEO Lew Platt made a call to Andy Grove, asking whether
Intel might be interested in working with HP to make PA-WW the successor to x86.
Grove said no. HP tried again later, emphasizing that PA-WW would be fully
backwards compatible with both x86 and PA-RISC. This time it worked.
## Intel and HP Team Up
So what did Intel see that got them so interested?
The HP design team included well-respected folks like Josh Fisher, Bob Rau,
and Bill Worley. And that team had already made much progress. In a widely circulated quote,
Intel's John Crawford told the Wall Street Journal:
> When we saw WideWord, we saw a lot of things we had only been looking at doing,
already in their full glory
A PA-Wide Word architect named Rajiv Gupta had this second golden quote - also widely circulated:
> I looked Albert Yu in the eyes and showed him we could run circles around PowerPC [a
competing IBM processor], that we could kill PowerPC, that we could kill the x86. Albert,
he's like a big Buddha. He just smiles and nods.
Intel would be blind if they didn't also notice the competitive dynamics. They can
convert one of their significant RISC rivals onto a technology platform that they control.
And if HP gets on board, then maybe others like Sun and Silicon Graphics will too.
Grove was intrigued and ordered a bake-off between PA-WW and its own internal 64-bit
architecture effort. PA-WW won. So they hammered out a deal, announced in June 1994.
Hewlett-Packard transfers the PA-WW IP over to Intel. Intel then designs and produces
the first CPUs. HP can then get said CPUs at a discount to produce enterprise system products.
There were no solid products, only a statement of direction towards a future
computer architecture. The first processors were not anticipated to arrive before 1998,
but once delivered, they will carry both companies into the 21st century.
This was going to be a massive project. Albert Yu anticipated it costing between $400 million and
$500 million over its whole life. An underestimate as it turns out. But Intel
can afford it and the results were going to be amazing. Albert Yu told the press at the time:
> By combining our skills ... we will offer the marketplace chips and systems
with absolutely unparalleled performance for the future
## Taking Names
Now. I want to pause a bit and talk names. Part of what makes this all so confusing are the names.
There are more names here than you can shake a stick at. And unfortunately
they all come out at different times. I am going to step out of the flow of
time and gather them all together here so that we can keep track.
So we start off with HP’s Super Workstation,
which produces PA-Wide Word. The announced 1994 collaboration with Intel would eventually evolve
PA-WW into a new thing called "Explicitly Parallel Instruction Computing", or EPIC.
EPIC is an architecture philosophy kind of like how CISC or RISC are philosophies. So
think of it like the philosophy of French cuisine - a style with recommendations on
how to achieve a wanted goal. EPIC likes parallelism. French cuisine likes sauces.
EPIC is a direct descendant of VLIW. So it still transfers complexity from the hardware
to the software compiler. The compiler still aggressively analyzes the program code for
parallelism opportunities and groups instructions together in big bundles.
But EPIC strikes a more moderate tone by admitting that sometimes
the hardware is in a better position to do certain things in runtime because of
access to program variables. So EPIC accommodates hardware in the CPU for
that - but not so much to make it as complex as a superscalar chip.
Multiflow and Cydrome's VLIW compilers were also too tightly bound to their
microarchitectures' hardware. EPIC addresses this rigidity with something
called "templates" - which help define which instructions can be bundled together.
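For a feel of what a template looks like in practice, here is a hedged Python sketch of an IA-64-style bundle. It relies on the publicly documented layout, which is not spelled out in this video: a 128-bit bundle holding a 5-bit template plus three 41-bit instruction slots, with the template telling the hardware which kind of execution unit each slot targets and where dependency "stops" fall. The function names and the slot values below are placeholders, not real instruction encodings.

```python
from typing import List, Tuple

TEMPLATE_BITS = 5   # template field, assumed to sit in the lowest bits
SLOT_BITS = 41      # each of the three instruction slots
SLOTS_PER_BUNDLE = 3


def pack_bundle(template: int, slots: List[int]) -> int:
    """Combine a template and three slot values into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != SLOTS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, value in enumerate(slots):
        if not 0 <= value < (1 << SLOT_BITS):
            raise ValueError("slot value does not fit in 41 bits")
        bundle |= value << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> Tuple[int, List[int]]:
    """Split a 128-bit bundle back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


# Placeholder values only; these are not meaningful IA-64 instructions.
bundle = pack_bundle(0b00001, [0x123, 0x456, 0x789])
print(bundle.bit_length() <= 128)  # True
print(unpack_bundle(bundle))       # (1, [291, 1110, 1929])
```

The point of the 5-bit field is that the compiler states its grouping decisions explicitly in the instruction stream, so different chip generations can widen or narrow their hardware without the binary having to change.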
Now that is EPIC. The next term to introduce is the IA-64 instruction
set architecture. EPIC is to IA-64 as RISC is to PA-RISC or SPARC. A specific
instruction set implementation of EPIC, defined and owned by both Intel and HP.
So to continue the cooking metaphor, you can think of it as like a French cuisine
cookbook - demonstrating various techniques and recipes for cooks to make French dishes.
After that, we go to the individual chips. The French dishes themselves,
as served by the restaurant. Intel expected its first IA-64 chip to hit the market in 1998.
Internally, this first IA-64 chip had the codename Merced after a river in California.
In October 1999, Intel would announce that the chip would be officially named Itanium. Intel said
at the time that the name conveys the processor's unique strengths and power while retaining the
"-ium" word endings for brand consistency. Netizens almost instantly dubbed it the Itanic.
## Reactions
Anyway. Back to 1994 and the flow of time. Outside analysts saw the
collaboration's potential - citing the two companies' talents and capabilities.
Hewlett-Packard was top two in the workstation and server markets,
where Intel was then weak. And of course, Intel was the juggernaut of the PC industry,
trying mightily to get into the workstation and server industries.
Analysts looked at what effect the collaboration might have on IBM, which backed its own PowerPC line
of RISC chips. Andrew Allison of the "Inside the Computer Industry" newsletter told ComputerWorld:
> I would imagine that IBM is not terribly thrilled with it ... It’s probably the only
combination that is virtually guaranteed to have the horsepower to stand up to PowerPC.
Intel didn't outright say it - and they would later deny ever having
implied such a thing - but they also positioned this new family
as the future successor to x86. One VP at a Boston consultancy said:
> "Intel is smart enough to know when it’s time to be at the end of the x86 line."
The Microprocessor Report echoed the notion that the end was now in sight for x86.
This new architecture will supersede both it and
PA-RISC before trickling down to the mass market. They write:
> We expect that, in about 10 years, Intel will stop making pure x86 chips in favor
of [the new chips]. Intel will continue to milk the x86 cash cow as long as it can ...
> Intel’s P6, due in late 1995, probably will be the last pure x86 core that Intel develops
## Disagreements
Not everyone agreed with that. Shortly after the announcement,
Nick Tredennick wrote up a dissenting view.
He argued that the two companies had shot themselves in the foot by
transitioning architectures and pursuing the VLIW "technofad".
He pointed out that big architectural shifts require developers to recompile
their software. Which they hate doing because it’s never smooth.
And that the complicated hardware will also need extremely complicated
compilers. Neither of which has a good history of on-time delivery.
And that switching away from x86 would be walking the same mistaken and failed path
that IBM did when Big Blue tried to lock down the PC ecosystem with the Micro Channel Architecture.
Add to this boiling bone broth the collaboration's high expectations, which towered over K2.
Robert Colwell is a legendary CPU designer who previously worked at
Multiflow. He then went to Intel in 1990. In his memoirs, he wrote:
> In essence, [the Intel design team in charge of IA-64] were told that their mission was to
jointly conceive the world’s greatest instruction set architecture with HP,
and then realize that architecture in a chip called Merced by 1997,
with performance second to no other processor, for any benchmark you like.
Merced will also do all these things while being fully compatible with
legacy software of both x86 and PA-RISC. This sounds ambitious.
Colwell was not alone in his doubts. Intel's chief of corporate strategy
at the time was David House. While he approved the project,
he would later say that its sheer scale - and I quote - "scared the everloving bejesus out of me".
## Merced
Intel had sold chips to HP, but the two had never worked together on this level.
HP is famous for its consensus-based management
style. Intel on the other hand is just as famous for "constructive confrontations",
where people are expected to challenge each other bluntly, promptly and with data.
So the two arm-wrestled over what functions should
be handled by the software or hardware while simultaneously ramping up their teams with new,
relatively inexperienced people. There was tension.
The difficult experience was either so traumatic or constructive that HP took the sole lead
for the second generation of IA-64 chips. This particular chip project was code-named McKinley.
The original plan was to release Merced in 1998 and fab it with
Intel's 250 nanometer node. But then the chip design was found
to be spilling beyond the limits of what can be fabbed. Like a muffin top.
So the designers took out transistors allocated for memory cache and x86
compatibility. Removing the latter became easier to justify once the much-faster
Pentium Pro was released, because Merced's x86 performance looked weak next to that beast.
Even so, there was still spill over. So it was decided to go to the 180-nanometer node
instead. The transistor shrink would let them put the whole design onto a single
die. The cost however was a six-month delay, pushing the ship date to 1999.
Things progressed. In October 1997, the two companies introduced EPIC
and IA-64 to 1,500 computer designers at the Microprocessor Forum. They talked about EPIC's
key architectural choices and emphasized its speed relative to existing RISC chips.
Intel also shared a release date for Merced:
1999. They said it would have industry leading performance, full compatibility
with the old 32-bit architecture, and have a complete solution stack at launch.
Several big software developers announced their participation in the IA-64 ecosystem.
Microsoft agreed to have a 64-bit version of its Windows NT operating system available at
release. Sun said it would make their Solaris OS available on Merced chips.
And to raise the hype even more, presenter and Intel Fellow Fred Pollack teased the
second-generation McKinley chip, saying that it was going to "knock your socks off".
## P7
When Colwell arrived at Intel back in 1990,
he helped found the company's second design team in Oregon.
That team - working in friendly competition with a team in Santa Clara - began on a product
code-named P6. It would be released in 1995 as the 32-bit Pentium Pro.
The Pentium Pro was a remarkable chip. Despite being fabbed on the same process
as its predecessor (P5), P6 ran twice as fast thanks to the inclusion of ideas like out-of-order
superscalar, which to remind you, searched more aggressively for instructions to parallelize.
The Pentium Pro brought Intel's x86 architecture neck and neck with some of the fastest RISC chips.
It also opened the door to the workstation market by enabling the "personal workstation".
Such personal workstations - running Microsoft's Windows NT or Linux - cost
half that of the old-school UNIX-powered workstations. They grew rapidly in 1995,
eating into the low end of the market.
Unfortunately, internal politics interfered with the Oregon team's pursuit of this opportunity.
Colwell remembers being told that IA-64 will eventually replace the 32-bit lines,
so why keep working on the old legacy stuff?
To Colwell however, the Pentium Pro showed that the 32-bit architecture
still had plenty of juice. With no 64-bit killer application on the immediate horizon,
a premature switch might leave the market to AMD and other competitors.
He also argued that Merced had so many new things going on that there was no
chance that it would all work right on the first try. He felt Intel should have
returned the chip to the lab as a long-term research project to iron out its kinks.
In the end, management could not decide on a coherent strategy on how to resolve the conflicts
between the Oregon team working on 32-bit and the Santa Clara team working on 64-bit Merced.
At first, they were content to just stand aside and let the best one rise to the top.
However this backfired, because Merced had to be compatible with the 32-bit stuff. With
Colwell and the Oregon team still working on it, that goal became an ever-moving target. So
the Santa Clara team tried to "freeze" the specification, which Oregon hated.
In the end, management separated the children: 64-bit for the more powerful server chips.
32-bit for everything else including workstations. That’s the strategy Intel would follow henceforth.
By the way, I highly recommend Colwell's book, "The Pentium Chronicles", where he
talks about these worsening dynamics between Santa Clara and Oregon. It is a strong read.
## A Second Delay
Soon after the October 1997 presentation at the Microprocessor Forum, a new problem emerged.
A source told CNET at the time that Intel severely underestimated the
chip's complexity. The Wall Street Journal later reported Intel struggling with various
signals arriving at parts of the CPU at the wrong time, creating speed bottlenecks.
This was amplified by Intel targeting an exceptionally high 800 megahertz clock rate.
Tweaks made to fix bottlenecks in one module caused ripple effects in
other modules, making debugging endlessly tricky.
There are rumors of other things, but I won't go into them. Whatever the thing was,
it was serious. By mid-1998, the company had to announce that it was
pushing Merced's release from late 1999 to mid-2000. Which means servers do not
reach actual customers until Q4 2000. New CEO Craig Barrett told the press:
> Our best assessment is that the project is a bit bigger and complicated than we assumed
it would be ... we are pleased with progress. There's not a basic problem with the technology.
This second delay means that Merced is scraping up against the second-generation IA-64 chip - the
one that HP is designing code-named McKinley. It was scheduled to enter mass production in 2001.
Intel finally successfully taped out Merced in summer of 1999 and
demonstrated it in the fall at its 1999 Intel Developers Forum. Shortly afterwards,
the fabs started learning how to produce the new chip, with early versions seeded to developers.
## Transition Plans
Both Intel and Hewlett-Packard - perhaps expecting this might happen - went to their backups.
At the 1998 Microprocessor Forum, Hewlett-Packard unveiled a "transition plan" towards IA-64.
They would continue releasing additional PA-RISC chips for the next five years until
2003. Customers can choose which chip they want in their server.
This was not ideal. A former HP executive remarked that they had to do all sorts of tricks
to extend PA-RISC. The delays and distractions associated with getting out IA-64 allowed rival
Sun Microsystems to leap ahead in the web server market during the wild late 90s internet boom era.
And as for Intel, the chip giant revitalized market revenues of its 32-bit architecture in
1998 with the introduction of the Celeron and Xeon lines. Market segmentation.
The former targeted value-minded consumers who otherwise bought cheaper chips from AMD,
Cyrix and other cloners. The first Celeron flopped
because it basically had no cache but later iterations performed very well.
The latter chip, the Xeon, targeted the medium to high-end server market
with faster clock speeds, larger caches and higher cache bandwidth.
So when Merced was announced to be delayed,
analysts noted that it was not a huge deal and that the Xeon can hold on as
a "placeholder". As we will later see, that turned out to be an understatement.
The delay did give OS-makers like Microsoft and the UNIX vendors time
to port for Merced/Itanium. But even as something like a "race" developed, actual
application developer interest remained tepid. One Wells Fargo system architect said in 1998:
> We have a few applications that could benefit from Merced,
but probably not anytime soon ... first we’ve got to take
care of Year 2000 compliance issues. Maybe in 2001 we can look at Merced
## Itanium in 2001: The Revolution is Here
After 7 years and $5 billion spent, Intel finally launched Itanium in the summer of 2001.
Recognizing that their 32-bit products were still going strong,
Intel tried to position Itanium as a powerful but revolutionary product for the "most demanding
enterprise and high-performance computing applications" as their press release said.
So yes, while it might take some additional work at the start, those who do will be
rewarded. They commissioned a white paper to identify "sweet spots for early adopters",
which included technical computing, large databases, and complex analytics.
To the press, Intel worked hard to emphasize that this was just the first step of a long
journey and that the ecosystem adoption thus far at this early stage was pretty impressive.
On the hardware side, they highlighted buy-in from a spectrum of computer manufacturers.
Some 35 Itanium-based models were said to be released by 25 companies like Dell,
Compaq and Silicon Graphics throughout 2001.
Intel also highlighted that Itanium systems can run four compatible operating systems:
Two 64-bit versions of Windows, HP's proprietary UNIX variant HP-UX,
IBM's proprietary UNIX variant, and certain commercial Linux distributions.
With all this backing from the big companies,
people presumed that Itanium would take the market. A 2000 market report from MicroDesign
Resources had predicted that IA-64 chips would have 60% of the server market by 2003.
Unfortunately, Itanium took too long of a path to the market. Soon after its debut,
it was outshone by several new 64-bit RISC chips like Sun's UltraSPARC III and IBM's Power4.
Microprocessor Report nominated the Itanium for
its Best Workstation/Server Processor award, but wrote:
> But while other high-end server processor designs are moving to glueless multiprocessing,