Linux founder tells Intel to stop inventing magic instructions and fix problems

rsecora | 287 points

duplicate: https://news.ycombinator.com/item?id=23809335

detaro | 4 years ago

Previously: https://news.ycombinator.com/item?id=23809335

zozbot234 | 4 years ago

Two years ago, I wrote an LLVM compiler pass that automatically upgrades SSE or AVX-2 code to AVX-512 when those instructions are available. It's helpful for getting performance gains from hand-engineered code that you don't want to touch. We saw some good gains on integer workloads (unpacking, useful for databases, something I'd call "regular code"): a 1.16x speedup going from SSE to AVX-2, and a 1.43x speedup going from SSE to AVX-512. Even better speedups are available for FP workloads, though hand-writing the AVX-512 version can work a lot better since programmers can exploit the full range of new instructions available in AVX.

https://www.nextgenvec.org/ https://arxiv.org/abs/1902.02816

ajayjain | 4 years ago

There's not much use for AVX-512 in the kernel. But from the people who are using it, I've heard good things. It's an enormous collection of new instructions, and a lot of them came from the Larrabee team, which was probably the greatest case of high-perf software engineers having direct influence over the direction of a CPU. Otherwise, the history of SSE and AVX has largely been software engineers requesting features and hardware engineers replying "Do not understand. How does that feature improve our SPECfp score? Rejected."

corysama | 4 years ago

While I love Linus and I think he's excellent at what he does, he's utterly missing the fact that things like AVX-512 sell chips, and that's what Intel's in the business of doing. And best practice in how one writes software will increasingly look like "write it so it executes well in a massively-parallel context" if the driver for chip sales is massively-parallel problems.

We've smacked pretty hard into the wall of how fast we can make CPUs by miniaturizing components, and the low-hanging fruit now is parallelization, predictive execution, and the whole host of ways to do more per clock cycle, not speed up clock cycles. But that flies in the face of the traditional embarrassingly serial pattern of the x86 instruction set and computing environment. Much of what Intel's doing these days is trying to open up opportunities for people to change tools to speed up code without having to boil the ocean by throwing the whole serial-instruction model completely out the window (even though, increasingly, code written to that instruction set is really operating like a language that is emulated by the underlying parallel-and-predictive CPU hardware).

I think what Linus calls "regular code" isn't going to move chips and we've wrung all the cheap optimizations out of that critical path (and, depending on the details of what is meant by "regular," such code is increasingly going to ram up against an efficiency ceiling unless it can be moved to a model of "Prepare work to do in parallel, execute in parallel, merge results and present to UI").
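To make the "prepare, execute in parallel, merge" shape concrete, here is a minimal sketch in Rust using only the standard library. The function name `parallel_sum_of_squares` and the sum-of-squares workload are made up for illustration; this shows the pattern only and says nothing about how any particular chip or library does it.

```rust
use std::thread;

/// Hypothetical workload: sum of squares over a slice, split across threads.
fn parallel_sum_of_squares(data: &[u64], workers: usize) -> u64 {
    // Prepare: choose a chunk size so the input splits into independent pieces.
    let workers = workers.max(1);
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Execute: each chunk is processed on its own scoped thread.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();

        // Merge: combine the partial results into one value.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    let total = parallel_sum_of_squares(&data, 4);
    println!("sum of squares = {total}");
}
```

The merge step is cheap here because addition is associative; workloads whose partial results cannot be combined that easily are the ones that hit the ceiling described above.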

shadowgovt | 4 years ago

> This is not the first time Torvalds has directed his ire at Intel. In 2018, Torvalds referred to Intel's Meltdown and Spectre patches as "COMPLETE AND UTTER GARBAGE," in all caps to emphasize his level of anger.

I'm 1000% behind him on this point. Fix that first and fix it well. Then get onto fancier stuff that I'll never ever code against.

goalieca | 4 years ago

Seems like many people here haven't actually read Linus's specific thoughts on AVX-512, and are accusing him of rejecting SIMD entirely or the usefulness of FP performance entirely. This is not an accurate representation of his position and he singles out AVX-512 differently from SSE/AVX/AVX2.

Here is a link to the thread where Linus directly explains his position on MMX/SSE, AVX/AVX2, and AVX512. His complaints about AVX512 concern both fragmentation and regression from the lessons learned in prior generations. He also looks at NEON and SVE2 and suggests that ARM's approach seems much saner to him:

"So just as a bystander, I'm looking at AVX512, and I'm looking at SVE2, and I'm going "AVX512 really is nasty, isn't it"?"

https://www.realworldtech.com/forum/?threadid=193189&curpost...

dottrap | 4 years ago

Intel? Dude, have you seen Arm's JavaScript instructions? Or RISC-V's user-defined instructions? (and Arm announced the same thing at Arm Tech Con last year)

Someone is missing the boat: custom ISA is the future.

staycoolboy | 4 years ago

I'm still waiting for languages to support 2D, 3D, and 4D vectors as first-class data types. These are so common that it's silly to make people define them, and we sometimes end up with different implementations too.

Please Rust, please!
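For reference, this is roughly what everyone ends up hand-rolling today. It is a minimal sketch, and the `Vec3` type with its method names is purely illustrative, not something taken from the standard library or from any particular crate.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical hand-written 3D vector, the kind of type every project redefines.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    const fn new(x: f32, y: f32, z: f32) -> Self {
        Vec3 { x, y, z }
    }

    /// Dot product of two vectors.
    fn dot(self, o: Vec3) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }

    /// Cross product (right-handed).
    fn cross(self, o: Vec3) -> Vec3 {
        Vec3::new(
            self.y * o.z - self.z * o.y,
            self.z * o.x - self.x * o.z,
            self.x * o.y - self.y * o.x,
        )
    }
}

impl Add for Vec3 {
    type Output = Vec3;
    fn add(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x + o.x, self.y + o.y, self.z + o.z)
    }
}

impl Sub for Vec3 {
    type Output = Vec3;
    fn sub(self, o: Vec3) -> Vec3 {
        Vec3::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }
}

/// Scaling by a scalar on the right: `v * 2.0`.
impl Mul<f32> for Vec3 {
    type Output = Vec3;
    fn mul(self, k: f32) -> Vec3 {
        Vec3::new(self.x * k, self.y * k, self.z * k)
    }
}

fn main() {
    let a = Vec3::new(1.0, 0.0, 0.0);
    let b = Vec3::new(0.0, 1.0, 0.0);
    println!("{:?} {:?} {}", a + b, a.cross(b), a.dot(b));
}
```

Every choice in there (field names, `f32` vs `f64`, which operators exist) is one more place where two codebases can diverge, which is the complaint.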

phkahler | 4 years ago

His idea that AVX-512 is something that's exclusively for floating-point is completely off-base: AVX has included integer operations since AVX2, and they are widely used in JITs, databases, etc.

Furthermore, AVX-512 is about much more than doubling the vector width, it is a significant overhaul to the instruction set and adds many new operations and "fills in gaps" that were missing from previous instruction sets. It in fact would be perfectly valid and good to implement AVX-512 with a 256-bit unit that takes twice as long to run 512-bit width instructions. This completely negates all his points about die space utilization right from the start - AVX-512 support does not imply a significantly larger use of space than previous AVX instructions. This would also fix some of the power-related problems on Skylake-SP - after all if you go from 2 512-bit wide units to 2x256 gangable units or 1x256 running at half-rate, that reduces power correspondingly and you no longer need to drop clocks so strongly to offset this, but you keep the functionality added in AVX-512.
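As a toy software model of that idea (not a description of any real microarchitecture), a 512-bit lane-wise add can be expressed as two passes through a 256-bit-wide operation. The names `Lanes512`, `add_256`, and `add_512_double_pumped` are invented for this sketch.

```rust
/// Toy model of a 512-bit register: 16 lanes of 32-bit integers.
type Lanes512 = [i32; 16];

/// Stand-in for one pass through a 256-bit-wide adder (8 lanes at a time).
fn add_256(a: &[i32], b: &[i32], out: &mut [i32]) {
    debug_assert!(a.len() == 8 && b.len() == 8 && out.len() == 8);
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
}

/// A 512-bit add carried out as two 256-bit passes: low half, then high half.
fn add_512_double_pumped(a: &Lanes512, b: &Lanes512) -> Lanes512 {
    let mut out = [0i32; 16];
    let (lo, hi) = out.split_at_mut(8);
    add_256(&a[..8], &b[..8], lo); // first pass: lanes 0..8
    add_256(&a[8..], &b[8..], hi); // second pass: lanes 8..16
    out
}

fn main() {
    let a: Lanes512 = std::array::from_fn(|i| i as i32);
    let b: Lanes512 = [100; 16];
    println!("{:?}", add_512_double_pumped(&a, &b));
}
```

The caller sees the same 16-lane result either way; only the number of passes, and therefore the latency, differs. That is the sense in which the instruction set and the width of the execution unit are separate questions.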

Furthermore, it's not like there are massive gains in general IPC that haven't been tapped. AVX-512 has taken 25% of the die area in some instances, if you dropped that to AVX2 (assume 12.5% of die area) then it's not like the processor would be 12.5% faster in general, that would translate to maybe 2-3% faster in general and a 30%+ loss in specialty applications. Once you've mostly explored general-purpose gains and are into diminishing returns territory (which modern processors certainly are), it makes sense to start looking at "specialty units", like AVX, or on GPUs you've got tensor cores and BVH traversal units, and so on. These can provide big speedups in key tasks at the cost of very little "general" performance (since that's already in diminishing returns territory).

The biggest thing slowing down AVX-512 adoption has been, yet again, 10nm. Right now it is only available on Skylake-X and Skylake-SP products, and more recently Ice Lake (which came out September of last year, in only the ultrabook segment, supplemented by 14nm in the mobile workstation segment as well as the ultrabook segment). So right now it is available in less than 1% of the desktop "fleet" and probably less than 1/8th of the laptop "fleet". There is very little reason to implement code paths for an instruction set that nobody can execute. Over time, as Ice Lake and Tiger Lake build share of the laptop "fleet", Rocket Lake implements it on desktop, and AMD implements it whenever, it will see more usage, just like prior AVX sets.

It really is wider-market than people realize. I have seen many people scoff and say "well you'll never see it used in games or whatever", but for many years now there have been games that simply will not run if you don't have AVX (notably many Ubisoft titles); there is no fallback SSE/scalar codepath. In another 10 years you will probably have AVX-512 mandatory games as well.
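For contrast, this is what keeping a fallback codepath looks like: a minimal sketch of runtime dispatch in Rust, assuming an x86 target for the fast path. The function names `add`, `add_avx2`, and `add_scalar` are hypothetical. `is_x86_feature_detected!` and `#[target_feature]` are standard Rust, and AVX2 is used here rather than AVX-512 only to keep the example buildable on older toolchains.

```rust
/// Portable fallback: plain lane-by-lane addition.
fn add_scalar(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Same loop, but compiled with AVX2 enabled so the compiler is allowed
/// (not guaranteed) to auto-vectorize it with 256-bit instructions.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i32], b: &[i32], out: &mut [i32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}

/// Dispatcher: pick the widest path the running CPU supports.
fn add(a: &[i32], b: &[i32], out: &mut [i32]) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked for AVX2 support.
            unsafe { add_avx2(a, b, out) };
            return;
        }
    }
    add_scalar(a, b, out);
}

fn main() {
    let a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    let b = [10; 9];
    let mut out = [0; 9];
    add(&a, &b, &mut out);
    println!("{out:?}");
}
```

A title that ships only the `add_avx2` path and drops `add_scalar` is the "will not run without AVX" situation described above.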

With all due respect to his long career in software engineering, that doesn't necessarily translate to processor design. This is just one person's opinion and you are under no obligation to accept it as gospel just because it's Torvalds'. See also: his weird ZFS rant.

(This seems to be a common thing with software engineers in particular, including many on this site - can't count how many "one weird fix from a software engineer to fix [complex domain problem] in [chemical/materials/aerospace engineering]" I've seen. I of course have no particular expertise in processor design either, but the engineers at Intel presumably do, and they thought it was a good idea.)

paulmd | 4 years ago

I see a huge bump in numpy performance with AVX-512, to the point where I wouldn’t buy a CPU without it. I don’t know that I understand the critique. Is it because these instructions are Intel-specific and not on AMD? Seems obviously useful for scientific computing.

tbenst | 4 years ago

[deleted]

| 4 years ago

Does anyone know if the frequency throttling caused by AVX512 can be circumvented with high-performance CPU cooling?

At first glance, it would appear the throttling is not caused by temperature spikes but rather by just executing the instructions at all.

I wonder how much more FP performance one could extract out of an AVX512 CPU with extreme cooling.

etaioinshrdlu | 4 years ago

Also, AVX-512 heats up the CPU quite a bit.

Thaxll | 4 years ago

It's an old argument of having too many special purpose circuits vs leaving more room for the rest. Linus has a point.

shmerl | 4 years ago

I find it amusing, er, ironic, er, amusing -- that after this article, on the same web page, there's a link/blurb which reads:

"Nvidia is worth more than Intel for the first time in history. Nvidia is now worth more than Intel, according to the NASDAQ. The GPU company has finally topped the CPU company's market cap (the total value of its outstanding shares) by $251bn..."

Well no surprise there.

On the one hand, Intel (especially early Intel employees) should be thanked, profusely, for giving us PC history as we know it today.

On the other hand, we should seek to honestly recognize what Intel has become today.

A mega-corporation, driven by corporate mentality, which always seeks to maximize profits for their shareholders at the expense of all other virtues.

Keep in mind I am not criticizing Intel's employees -- only the marching orders that come from the top down.

But, as far as I can tell, Intel, as we know it, will not be around in another 30 years.

The future of semiconductors is in the following areas:

1) Simple, non-proprietary instruction sets (RISC-V and others)

2) Transparent (or as transparent as possible) and publicly auditable engineering and manufacturing processes

3) Conscientious companies that place virtue first, and are not driven by maximizing profits for shareholders

So on the one hand, Intel is to be thanked, lauded, praised for its role, especially its early role in the beginning of the PC revolution, but on the other hand, I don't see Intel existing as a company more than 30 years from now...

Although, at that point in time, Intel's past role will always remain important and relevant -- to future students of early computer history...

peter_d_sherman | 4 years ago

x86 instructions I'd like to see are atomic instructions with lower global memory ordering guarantees. TSO is nice, but there's enough performance sensitive concurrent code where opting out would be useful.
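For context, languages already let code ask for less than sequential consistency. Here is a minimal sketch in Rust, with `count_relaxed` as a made-up name. As far as I know, on x86 a relaxed `fetch_add` still compiles to a `lock`-prefixed instruction, which acts as a full barrier, so the hardware has no way to take advantage of the weaker request.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical counter benchmark: several threads bump one shared counter,
/// asking only for atomicity and not for any cross-variable ordering.
fn count_relaxed(threads: usize, per_thread: u64) -> u64 {
    let counter = AtomicU64::new(0);

    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Relaxed: the increment is atomic, but it imposes no
                    // ordering on surrounding loads and stores.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // all scoped threads are joined here, which orders their writes before the load

    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", count_relaxed(4, 100_000));
}
```

On weakly ordered ISAs the `Relaxed` request can map to cheaper instructions than `SeqCst`; an x86 opt-out along those lines is what I'm asking for.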

anarazel | 4 years ago

Kinda odd to see the title 'founder'. Linux is no startup or company. Linus is more like the original author, maintainer or creator of Linux.

fierarul | 4 years ago

"Linux founder"

Folks, anyone who knows what Linux is also knows who Linus Torvalds is.

posedge | 4 years ago

I just love it when Torvalds flips out.

drummer | 4 years ago

Is this an eye-opener for Intel? I think not. So, then, what's the point (other than to vent one's frustration)?

Koshkin | 4 years ago

LT: "I hope AVX512 dies a painful death"

I see his month-long break to work on "unprofessional" behavior didn't include a course on non-violent communication (NVC - https://www.cnvc.org/).

okareaman | 4 years ago

Torvalds clearly does not understand the brutality of business. Intel does not care at all about the customers it has at the moment, because it has them by the balls. After many years of struggle, only Apple (with its unlimited resources) was able to switch to ARM. So they don’t care.

They only care about the long-term market, which is HPC and ML workloads, because Nvidia is destroying everybody in that market. Look at their stock.

I’ve got news for Torvalds: it's going to get bad for Intel.

0xFFC | 4 years ago