( ESNUG 548 Item 3 ) -------------------------------------------- [04/03/15]

Subject: Engineering analysis finds 14 technology failures in Intel-Altera

> Last Friday the Wall Street Journal caught the chip world by surprise when
> it said Intel was in talks to buy Altera. ... At first it seemed like
> everyone loved the idea!  But after a few days, the enthusiasm wore off
> when a number of stakeholders slowly realized that this Intel-Altera
> merger might not be such a good idea.
>
>     - from http://www.deepchip.com/items/0548-02.html


From: [ Been There, Done That ]

Hi, John,

Your column was good, but you failed to take on the fundamental engineering
flaws with an Intel-Altera merger.  The Wall Street types touted "FPGAs in
the Data Center" as the key technical reason for Intel to go after Altera.
"There is synergy here!," they claimed.
Their stance came from the assumption that Intel needed to protect its data
center monopoly from an ARM intrusion -- and Altera could help.  This was
puzzling from a technical perspective, given 14 failures.  My engineers and
I have denoted those 14 failures here as (F#1 - F#14).
     
The whole buzz on this data center thing started with this one paper from
Microsoft Research:

  Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

In this self-congratulatory paper, 23 MSFT authors show an acceleration of
their Bing search engine using FPGAs with a machine called "Catapult".

    - They start with a Xilinx Virtex-6.  They give up for reasons that
      are unpersuasive and then settle on an Altera Stratix-5.

    - The MSFT researchers create "Catapult", a network of 1,632 Xeon
      processor servers, each paired with one Altera Stratix-5.

         "The system takes search queries coming from Bing and offloads
          a lot of the work to the FPGAs, which are custom-programmed
          for the heavy computational work needed to figure out which
          webpages results should be displayed in which order."

              - Wired.com (06/2014)

      Even though the FPGAs are 40X faster than the Xeon CPUs, the final
      search is only 95% faster than before.  (F#1)

    - The MSFT researchers are very worried about reliability. (F#2)

    - They never make clear why reconfigurability is beneficial.  They
      throw powerful words at the topic, but the closest they come to a
      reason is "datacenter services evolve extremely rapidly". (F#3)

    - The MSFT researchers improved the Bing search performance by 95%.
      (Not quite 2x.)  They reduced tail latency by 29%.  They increased
      power by 10%.  They increased the cost of ownership by 30%.
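
A rough way to read F#1 is through Amdahl's law.  The sketch below is not
from the MSFT paper; it assumes the 40X figure applies to the offloaded
portion of the work and the 95% figure to the whole search, and it treats
everything else as unaccelerated.  The function names are illustrative.

```python
def amdahl_speedup(fraction_accelerated, kernel_speedup):
    """Overall speed-up when only part of the work runs faster."""
    return 1.0 / ((1.0 - fraction_accelerated)
                  + fraction_accelerated / kernel_speedup)


def fraction_for_speedup(overall_speedup, kernel_speedup):
    """Invert Amdahl's law: fraction of work that must have been offloaded."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


KERNEL_SPEEDUP = 40.0   # "FPGAs are 40X faster than the Xeon CPUs"
OVERALL_SPEEDUP = 1.95  # "95% faster than before"

# Assumption: a single accelerated portion, everything else unchanged.
offloaded = fraction_for_speedup(OVERALL_SPEEDUP, KERNEL_SPEEDUP)

# Best case if that same portion ran infinitely fast on the FPGA.
ceiling = amdahl_speedup(offloaded, float("inf"))

print(f"implied offloaded fraction: {offloaded:.3f}")
print(f"speed-up ceiling for that fraction: {ceiling:.3f}x")
```

Under these assumptions only about half of the work ends up on the FPGA,
so even an infinitely fast FPGA would cap the overall gain at about 2x.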

So 23 engineers with an infinite budget spent several man-years of effort to
increase the Bing search performance by a factor of 2 -- while limiting the
power to only an additional 10%.   For those of us who specialize in
algorithmic acceleration using FPGAs, this was not impressive.  It was
actually depressing.  They really needed to get to 100x-1000x or more. (F#4)

TRUE COSTS GLOSSED OVER

Moreover, that so-called 30% increase in "cost of ownership" is highly suspect.
We think the cost of each server blade is about $5k.  Our cost for the
Stratix-5 D5 FPGA alone is $2k.  So the commercial cost of the FPGA board
going into each server would be nearer to $5k (in quantity pricing) and
perhaps higher.  This means that the "cost of ownership" is closer to 2X
and nowhere near the 1.3x claimed in the MSFT paper.  (F#5)  Maybe MSFT
got the FPGAs for free, so the costs are not reported accurately.
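
The arithmetic behind F#5 can be sketched as follows.  The dollar figures
are the estimates given above, not numbers from the MSFT paper, and the
sketch assumes "cost of ownership" means hardware purchase price only
(no power, cooling, or support), which is a simplification.

```python
SERVER_COST = 5_000.0      # estimated cost of one server blade
FPGA_CHIP_COST = 2_000.0   # quoted cost of the Stratix-5 D5 FPGA alone
FPGA_BOARD_COST = 5_000.0  # estimated commercial cost of the FPGA board
CLAIMED_RATIO = 1.3        # the 30% increase claimed in the paper


def cost_ratio(server_cost, added_cost):
    """Hardware cost with the add-on, relative to the bare server."""
    return (server_cost + added_cost) / server_cost


def added_cost_for_ratio(server_cost, ratio):
    """Add-on cost per server that would produce the given ratio."""
    return server_cost * (ratio - 1.0)


print(f"with FPGA board:     {cost_ratio(SERVER_COST, FPGA_BOARD_COST):.2f}x")
print(f"with bare chip only: {cost_ratio(SERVER_COST, FPGA_CHIP_COST):.2f}x")
print(f"add-on budget implied by 1.3x: "
      f"${added_cost_for_ratio(SERVER_COST, CLAIMED_RATIO):,.0f}")
```

On these estimates the board doubles the hardware cost, the bare chip
alone already gives 1.4x, and a 1.3x claim leaves roughly $1,500 per
server for the whole FPGA add-on.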

Also, the NRE associated with this whole process was not amortized into the
costs.  (F#6)  If we had to hazard a guess, Microsoft Research spent $10M or
more on this project.  That number needs to be included. (F#7)
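
To see what amortizing that NRE would do, here is a minimal sketch.  The
$10M is the guess stated above and 1,632 is the Catapult server count;
the larger deployment sizes are hypothetical, added only to show how the
per-server share shrinks with scale.

```python
NRE_GUESS = 10_000_000.0   # guessed project spend, in dollars
PILOT_SERVERS = 1_632      # servers in the Catapult deployment


def nre_per_server(nre, servers):
    """Spread a one-time engineering cost evenly across a deployment."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return nre / servers


# 10_000 and 100_000 are hypothetical deployment sizes for illustration.
for servers in (PILOT_SERVERS, 10_000, 100_000):
    share = nre_per_server(NRE_GUESS, servers)
    print(f"{servers:>7,} servers -> ${share:,.2f} NRE per server")
```

At the pilot size the NRE share comes to about $6,127 per server, which
is more than the estimated $5k blade itself.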

        ----    ----    ----    ----    ----    ----    ----

Regarding Intel in this whole thing...

    - The Bing speed-up results are unimpressive.  It is possible, and
      very likely, that a team of SW engineers given the same budget
      could optimize the SW on a Xeon CPU system doing Bing searches to
      equal, or beat, what was accomplished via FPGAs. (F#8)
      So Intel gains nothing by acquiring Altera.

    - There is no difference whatsoever between Xilinx FPGAs and Altera
      FPGAs for searches and data center uses.  If you told my engineers
      to replicate this research, we could do it just as well using
      Xilinx FPGAs.  Altera FPGAs provide no advantage. (F#9)
     
      So there is no synergy that Altera uniquely adds that would help
      Intel.  Actually, Intel would be better off letting Xilinx and
      Altera compete for this business.

    - There is no obvious benefit to "system reconfiguration".  So if
      this is truly a valid application, why not just fund a HW startup
      to make a "data center ASIC" and discard the FPGAs?
      This is where FPGAs fall flat in the HPC market.  Anything worth
      doing in an FPGA can be done better, faster, cheaper, and with
      lower power in an ASIC.  (F#10)  Bitcoin is a recent example.  The
      HW progression for Bitcoin searches was

                           PCs --> FPGAs --> ASICs

      The window for Bitcoin FPGAs was, at most, 9 months before Bitcoin
      ASICs hit the market.

    - Reconfiguring FPGAs in PCIe on the fly is a nightmare.  (F#11)
      Bing searches and data center algorithms change.  Not practical.

    - There have been rumors that some data center algorithms can be
      accelerated with FPGAs by 1000x or more, but we haven't seen
      anything compelling so far.

So this fuzzy "FPGA in the data center" idea clearly falls into the
"solution looking for a market" category.  Perhaps combine an Intel
processor and an Altera FPGA in the same package?  That would dramatically
increase the bandwidth between uP and the FPGA -- getting around one of the
biggest bottlenecks in FPGA-based acceleration.  But Intel doesn't need to
acquire Altera to do this.  (F#12)

        ----    ----    ----    ----    ----    ----    ----

And regarding Altera...

Silicon geometries in FPGA land are still adhering to Moore's Law.  But 28nm
looks to be the "golden node".  Pricing on the 20nm Xilinx Virtex UltraScale
runs 3-5x PER GATE more expensive than the 28nm Xilinx Virtex-7.  This
renders the entire Virtex UltraScale family useless.  The 10% speed advantage
is minuscule.  The only advantage is power -- but even that's minimal.  So
Altera has NOT lost much market share, if any, by missing 20nm.  Altera might
even be better off playing TSMC against Intel for the 10nm node.  (F#13)
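
The price/performance trade in F#13 can be put in numbers.  This sketch
assumes the 10% speed advantage translates directly into 1.10x useful
work per gate, which is a simplification; the 3-5x range is the per-gate
pricing quoted above.

```python
PRICE_PER_GATE_RATIOS = (3.0, 5.0)  # 20nm vs 28nm price per gate, low/high
SPEED_RATIO = 1.10                  # assumed: 10% faster = 1.10x work per gate


def price_per_performance(price_ratio, speed_ratio):
    """Relative cost of one unit of work on the newer node."""
    return price_ratio / speed_ratio


for price_ratio in PRICE_PER_GATE_RATIOS:
    ratio = price_per_performance(price_ratio, SPEED_RATIO)
    print(f"{price_ratio:.0f}x price per gate -> "
          f"{ratio:.2f}x cost per unit of work")
```

Even after crediting the speed gain, the 20nm part costs roughly 2.7x to
4.5x as much per unit of work as the 28nm part under these assumptions.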

        ----    ----    ----    ----    ----    ----    ----

We haven't been able to come up with a single technical reason why Intel
should acquire Altera.  (F#14)  The "data center" buzz is not compelling.
We can't think of anything else related to FPGAs and data center HW that
Intel can't get elsewhere for significantly cheaper.

Please keep me anonymous for fear of retribution, John.

    - [ Been There, Done That ]

        ----    ----    ----    ----    ----    ----    ----

  Editor's Note: The recent rumors now are that Altera has rejected an
  Intel $54/share takeover bid.  That's $16.3 billion!  - John


Copyright 1991-2024 John Cooley.  All Rights Reserved.