The Beauty of Unix Pipelines

0x4FFC8F | 649 points

Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:

(1) everything is text

(2) everything (ish) is a file

(3) including pipes and fds

(4) every piece of software is accessible as a file, invoked at the command line

(5) ...with local arguments

(6) ...and persistent globals in the environment

A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.
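
If you want to watch it happen, something like this shows the shell calling execve for each stage of a pipeline (using the commonly available strace tool; its trace output goes to stderr):

  strace -f -e trace=execve sh -c 'ls | wc -l'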

Unix is seriously uncool with young people at the moment. I intend to turn that around, and articles like this offer good material.

gorgoiler | 4 years ago

I use pipelines as much as the next guy, but every time I see a post praising how awesome they are, I'm reminded of The Unix-Haters Handbook. Their take on pipelines is pretty spot on too.

http://web.mit.edu/~simsong/www/ugh.pdf

cowmix | 4 years ago

Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:

  $ alias fail='exit 1'
  $ find / | fail | wc -l; echo $?
  0
  0
You can turn on the "pipefail" option to remedy this:

  $ set -o pipefail
  $ find / | fail | wc -l; echo $?
  0
  1
Most scripts don't, because the option makes everything much stricter, and requires more error handling.

Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" options (-u), which are also important in modern scripting.
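
The usual preamble people mean by that, for what it's worth (a convention, not anything the shell enforces on its own):

  #!/usr/bin/env bash
  set -euo pipefail   # errexit, nounset, and pipefail in one line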

There's another error that hardly anyone bothers to handle correctly:

  x=$(find / | fail | wc -l)
This silently assigns x whatever made it through the broken pipeline (here a plausible-looking count of 0), with no indication that anything failed. The only way to test whether it succeeded is to check $?, or use an if statement around it (which still needs pipefail to catch failures in the middle of the pipeline):

  if ! x=$(find / | fail | wc -l); then
    echo "Fail!" >&2
    exit 1
  fi
I don't think I've ever seen a script bother to do this.

Of course, you may also want the error message from the command. If you want that, you have to start using named pipes or temporary files, with the attendant cleanup. Shell scripting is suddenly much more complicated, and the resulting scripts become much less fun to write.
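
For what it's worth, the temp-file version ends up looking roughly like this sketch (still assuming the pipefail option from above, with cleanup handled by a trap):

  err=$(mktemp)
  trap 'rm -f "$err"' EXIT
  if ! x=$(find / 2>"$err" | wc -l); then
    echo "find failed: $(cat "$err")" >&2
    exit 1
  fi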

And that's why shell scripts are so brittle.

atombender | 4 years ago

I love pipelines. But I don't know the elaborate sublanguages of find, awk, and the others well enough to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.

I'm developing a shell based on these ideas: https://github.com/geophile/marcel.

geophile | 4 years ago

Unix pipelines are cool and I am all for them. Recently, however, I have seen them taken too far, without realizing that each stage in the pipeline is a separate process and a debugging overhead if something goes wrong.

A case in point is this pipeline that I came across in the wild:

  TOKEN=$(kubectl describe secret -n kube-system $(kubectl get secrets -n kube-system | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t' | tr -d " ")

In this case, perhaps awk would have absorbed 3 to 4 stages.
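
For example, here is one hedged sketch of how awk might collapse the grep/cut/tr stages (untested, and the column layout of kubectl's output is assumed):

  TOKEN=$(kubectl describe secret -n kube-system \
    "$(kubectl get secrets -n kube-system | awk '/default/ {print $1; exit}')" \
    | awk -F: '/^token/ {gsub(/[ \t]/, "", $2); print $2}')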

ketanmaheshwari | 4 years ago

I think there's an interesting inflection point between piping different utilities together to get something done, and just whipping up a script to do the same thing instead.

First I'll use the command line to, say, grab a file from a URL, parse, sort and format it. If I find myself doing the same commands a lot, I'll make a .sh file and pop the commands in there.
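
For instance, the kind of throwaway pipeline I mean, with a made-up URL and column:

  curl -s "https://example.com/report.csv" | cut -d, -f2 | sort | uniq -c | sort -rn | head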

But then there's that next step, which is where Bash in particular falls down: branching, loops, or any real logic. I've tried it enough times to know it's not worth it. So at this point, I load up a text editor and write a NodeJS script which does the same thing (it used to be Perl, or Python). If I need more functionality than what's in the standard library, I'll make a folder, do an npm init -y, and npm install a few packages for what I need.

This is not as elegant as pipes, but I have more fine-grained control over the data, and the end result is a folder I can zip and send to someone else in case they want to use the same script.

There is a way to make a NodeJS script listen to STDIO and act like another Unix utility, but I never do that. Once I'm in a scripting environment, I might as well just put it all in there so it's in one place.

russellbeattie | 4 years ago

Debugging Linux pipelines is not a fun experience.

This is one clear area where Powershell with its object model has got it right.

nojito | 4 years ago

A similar philosophy has made the "tidyverse" a much-loved extension of the statistical language R.

Compare the following 2 equivalent snippets. Which one seems more understandable?

    iris_data %>%
        names() %>%
        tolower() %>%
        gsub(".", "_", ., fixed=TRUE) %>%
        paste0("(", ., ")")

or:

    paste0("(", gsub(".", "_", tolower(names(iris_data)), fixed=TRUE), ")")
antipaul | 4 years ago

Unix pipelines are indeed beautiful, especially when you consider their similarity to Haskell's monadic I/O: http://okmij.org/ftp/Computation/monadic-shell.html

Unix pipelines actually helped me make sense of Haskell's monads.

CalmStorm | 4 years ago

Surprised there is no mention of Doug McIlroy: https://wiki.c2.com/?DougMcIlroy

nsajko | 4 years ago

Interesting comment in the header of this site.

https://github.com/prithugoswami/personal-website/blob/maste...

lexpar | 4 years ago

Powershell is the ultimate expression of the Unix pipeline imo.

Passing objects through the pipeline and being able to access this data without awk/sed incantations is a blessing for me.

I think anyone who appreciates shell pipelines and Python can grok the advantages of the approach taken by Powershell; in a large way it is built directly upon the existing Unix heritage.

I'm not so good at explaining why, but for anyone curious, please have a look at the Monad Manifesto by Jeffrey Snover:

https://devblogs.microsoft.com/powershell/monad-manifesto-th...

You may not agree with the implementation, but the ideas behind it, I think, are worth considering.

rzmnzm | 4 years ago

I think it will be on topic if I take this occasion to once again plug a short public service announcement for an open-source tool I built that helps interactively build Unix/Linux pipelines, dubbed "The Ultimate Plumber":

https://github.com/akavel/up/

I've also recently seen it being described in shorter words as a "sticky REPL for shell". Hope you like it, and it makes your life easier!

akavel | 4 years ago

To my understanding, this is the same pattern where every "object" outputs the same data type for other "objects" to consume. This pattern can have a text or GUI representation that is really powerful in its own right if you think about it. It's why automation agents, with their event consumption/emission, are so powerful, and why the web itself shifts towards this pattern (JSON as communication of data, code as object). The thing is, this will always be a higher level of abstraction. I think a GUI for this pattern should exist as a default method in most operating systems; it would solve a lot of learning problems, like learning all the names and options of objects, and would be the perfect default GUI tool. Actually, sites like Zapier or tools like Huginn already do this pattern. I always wondered why it spreads so slowly when it is so useful.

tekert | 4 years ago

I wish there were gui pipes. Pipes are wonderful but they’re neither interactive nor continuous.

Pipes that loop, outputting a declarative gui text format and listening for events from the gui, would be marvellous.

I can’t think how to do that without sockets and a bash loop. And that seems to create the kind of complexity that pipes manage to avoid.

tarkin2 | 4 years ago

I find gron to be much more Unix-y than jq. It "explodes" JSON into single lines for use with grep, sed, etc., and can recombine them back into JSON as well.

https://github.com/TomNomNom/gron
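
A quick illustration of the round trip (the exact output format is from memory, so treat it as approximate, and data.json is a made-up file name):

  $ echo '{"user": {"name": "tom", "langs": ["go", "sh"]}}' | gron
  json = {};
  json.user = {};
  json.user.langs = [];
  json.user.langs[0] = "go";
  json.user.langs[1] = "sh";
  json.user.name = "tom";
  $ gron data.json | grep langs | gron --ungron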

benjaminoakes | 4 years ago

Related: Pipelines can be 235x faster than a Hadoop cluster https://adamdrake.com/command-line-tools-can-be-235x-faster-...

parliament32 | 4 years ago

Pipes are one of the best experiences you'll have, whatever you're doing. I was debugging a remote server logging millions of log lines a day and doing a little aggregation on the server. All it required was wget, jq, sed, and awk, and I had a more powerful log analyzer on a developer Mac than Splunk or any similar solution, which is awesome when you're paying a fortune for Splunk. For getting some insights quickly, Unix pipes are a godsend.
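
The sort of thing I mean, with a made-up URL and field name, and assuming newline-delimited JSON records:

  wget -qO- "https://logs.example.com/today.json" | jq -r '.level' | sort | uniq -c | sort -rn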

theshadowmonkey | 4 years ago

It's ironic that the article ends with Python code. You could have done everything in Python in the first place and it would have probably been much more readable.

sedatk | 4 years ago

If you eval this simple pipeline

     file -b `echo $PATH:|sed 's/:/\/* /g'`|cut -d\  -f-2|sort|uniq -c|sort -n
it prints a histogram of the types of all the programs on your path (e.g., whether they are shell, python, perl scripts or executable binaries). How can you ever write such a cute thing in e.g., python or, god forbid, java?
enriquto | 4 years ago

I think it is terrible: if everything is text, you can't precisely describe certain data structures, e.g. a circular list.

stevefan1999 | 4 years ago

I recently wrote some Go code for fetching an RSS feed and displaying a gist based on requirements.

Looking at that code, I am tempted to reimplement it using pipes, but a saner mind took over and said "don't fix something that is not broken".

I'll probably still do it and get some benchmark numbers to compare both.

praveen9920 | 4 years ago

Thank you! I did not know you could get a SQL-like GROUP BY using uniq -c. That's so cool! I used to pipe to awk, count using an array, and then display the results, but your method is far better than mine.
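
For anyone following along, the two approaches look roughly like this (words.txt is a made-up file with one value per line):

  sort words.txt | uniq -c | sort -rn                                      # the uniq -c "GROUP BY"
  awk '{n[$1]++} END {for (k in n) print n[k], k}' words.txt | sort -rn    # the awk-array version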

hi41 | 4 years ago

That was a great article! Pipes can definitely be very powerful. I will say, though, that I often find myself reading pages of documentation in order to actually get anything done with Unix and its many commands.

Silamoth | 4 years ago

Great write-up. One thing I would add is how pipes do buffering and apply backpressure; to my understanding, this is the "magic" that makes pipes fast and failsafe(r).
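
A rough way to see the backpressure in action (assuming the usual Linux pipe buffer of around 64 KiB):

  # yes writes as fast as it can, but blocks once the pipe buffer fills,
  # because the reader sleeps before consuming anything
  yes | { sleep 3; head -c 16; }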

miclill | 4 years ago

I've been using pipes for decades to get my work done, but it was cool to learn about jq as I have much less experience with JSON. It's a very cool program. Thanks.

not2b | 4 years ago

Minor nitpick: It is Unix "pipes" (not pipelines).

sigjuice | 4 years ago

> If you append /export to the url you get the list in a .csv format.

Text filtering approach narrowly rescued by website feature.

Phew, that was close!

kazinator | 4 years ago

The video from 1982 is brilliant: great explanations from an era before certain knowledge was assumed to be generally known.

stormdennis | 4 years ago

What kills me about pipelines is when I pipe into xargs and then suddenly can't kill things properly with Ctrl+C. Often I have to jump through hoops to parse arguments into arrays and avoid xargs just for this. (This has to do with stdin being redirected. I don't recall if there's anything particular about xargs here, but that's where it usually comes up.)
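
The array workaround I mean is roughly this (bash 4+ mapfile; filenames assumed newline-safe, and process-files is a placeholder for whatever would otherwise run via xargs):

  # Collect the arguments into an array first, then run the command directly,
  # so its stdin stays attached to the terminal and Ctrl+C behaves normally.
  mapfile -t files < <(find . -name '*.log')
  process-files "${files[@]}"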

mehrdadn | 4 years ago

The first example in the article is what `git shortlog -n` does; no need for the pipeline.

jakubnarebski | 4 years ago

What does it mean to say that the video shows "Kernighan being a complete chad"?

JadeNB | 4 years ago

Aka The Ugliness of Powershell. Would be fun to see equivalents to these in Powershell.

0xsnowcrash | 4 years ago

I have problems understanding what commands I can pipe. Some work, some don't.

nickthemagicman | 4 years ago

With Cygwin you can do exactly this on Windows too. For me, Cygwin is a blessing.

unnouinceput | 4 years ago

abinitio.com was born from these principles.

hbarka | 4 years ago

Good article. Thanks for the information.

Dilu8 | 4 years ago

The first time I saw pipes in action I was coming from Apple and PC DOS land. It blew my mind. The re-usability of so many /bin tools, and being able to stuff my own into that pattern was amazing.

If you like pipes and multimedia, check out GStreamer; it has taken the custom pipeline example to real time.
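
If I remember the gst-launch syntax right, a toy pipeline looks like this (a test pattern piped through a converter to a window):

  gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink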

staycoolboy | 4 years ago

The capacity of the pipe buffer is important for doing serious producer|consumer work.
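
If I remember the Linux numbers right, the default capacity is 64 KiB, a process can grow it with fcntl(fd, F_SETPIPE_SZ, ...), and the unprivileged ceiling is this sysctl (1 MiB by default):

  cat /proc/sys/fs/pipe-max-size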

known | 4 years ago

hiii bro

gajjuhacker | 4 years ago

I think systemd can do all of that.

billfor | 4 years ago

The explanation is great. You learn a lot here. I'm new and this article is pretty great.

niko0221 | 4 years ago

I am sorry for playing the devil's advocate. I also think that pipes are extremely useful and a very strong paradigm, and I use them daily in my work. It is also no accident that they are a fundamental and integral part of PowerShell.

But is this really HN front-page worthy? I have seen this horse beaten to death for decades now. These kinds of articles have been around since the very beginning of the internet.

Am I missing something newsworthy which makes this article different from the hundreds of thousands of similar articles?

fogetti | 4 years ago

Unix pipes are a 1970s construct, the same way bell-bottom pants are. It's a construct that doesn't take into account the problems and scale of today's computing. Unicode? Hope your pipes process it fine. Video buffers? High perf? Fuggetaboutit. Piping the output of ls to idk what? Nice, I'll put it on the fridge.

adamnemecek | 4 years ago