The Beauty of Unix Pipelines
I use pipelines as much as the next guy, but every time I see a post praising how awesome they are, I'm reminded of the Unix-Haters Handbook. Its take on pipelines is pretty spot-on too.
Pipes are a great idea, but are severely hampered by the many edge cases around escaping, quoting, and, my pet peeve, error handling. By default, in modern shells, this will actually succeed with no error:
$ alias fail='exit 1'
$ find / | fail | wc -l; echo $?
0
0
You can turn on the "pipefail" option to remedy this:
$ set -o pipefail
$ find / | fail | wc -l; echo $?
0
1
Most scripts don't, because the option makes everything much stricter and requires more error handling. Of course, a lot of scripts also forget to enable the similarly strict "errexit" (-e) and "nounset" (-u) options, which are also important in modern scripting.
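Taken together, those three options form the common "strict mode" preamble. A minimal sketch of how a script using it behaves:

```shell
#!/usr/bin/env bash
# Strict mode: abort on command failure (-e), on use of unset
# variables (-u), and on a failure anywhere in a pipeline (pipefail).
set -euo pipefail

greeting="hello"
echo "$greeting"    # fine: the variable is set, so -u is satisfied

# With -e, an unhandled failing command aborts the script, so
# failures now have to be handled explicitly:
false || echo "handled failure, script continues"
```

This is exactly the extra discipline the comment above is talking about: every command that may legitimately fail needs an explicit handler, or the script dies.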
There's another error that hardly anyone bothers to handle correctly:
x=$(find / | fail | wc -l)
This sets x to "" because the command failed. The only way to test whether it succeeded is to check $?, or to use an if statement around it:
if ! x=$(find / | fail | wc -l); then
echo "Fail!" >&2
exit 1
fi
I don't think I've ever seen a script bother to do this. And that's before you consider wanting the error message from the command. If you want that, you have to start using named pipes or temporary files, with the attendant cleanup. Shell scripting suddenly becomes much more complicated, and the resulting scripts are much less fun to write.
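For completeness, a sketch of what handling both the exit status and the error output might look like, using a temporary file for stderr (the fail function here is a stand-in for any failing stage):

```shell
#!/usr/bin/env bash
set -uo pipefail

fail() { return 1; }   # stand-in for any failing pipeline stage

# Collect stderr from every stage in a temporary file so it can be
# reported if the pipeline fails; remove the file again on exit.
errfile=$(mktemp)
trap 'rm -f "$errfile"' EXIT

if ! x=$(find /tmp 2>>"$errfile" | fail 2>>"$errfile" | wc -l); then
    echo "pipeline failed; captured stderr follows:" >&2
    cat "$errfile" >&2
else
    echo "count: $x"
fi
```

That is roughly the minimum for "correct" error handling, and it is already several times the length of the pipeline itself.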
And that's why shell scripts are so brittle.
I love pipelines. But I don't know the elaborate sublanguages of find, awk, and the others well enough to exploit them adequately. I also love Python, and would rather use Python than those sublanguages.
I'm developing a shell based on these ideas: https://github.com/geophile/marcel.
Unix pipelines are cool and I am all for them. Recently, however, I've seen them taken too far, without the realization that each stage in the pipeline is a separate process and a debugging overhead when something goes wrong.
A case in point is this pipeline that I came across in the wild:
TOKEN=$(kubectl describe secret -n kube-system $(kubectl get secrets -n kube-system | grep default | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t' | tr -d " ")
In this case, awk could probably have absorbed three or four of the stages.
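For illustration, awk's default whitespace splitting lets one stage replace the grep/cut/tr/tr tail of that pipeline. The sample input below is fake data sketching the shape of kubectl describe secret output, not real kubectl output:

```shell
# awk splits each line on runs of whitespace, so matching the line
# and printing field 2 replaces grep + cut + tr + tr in one stage.
# The input is hypothetical sample data.
printf 'ca.crt:   1025 bytes\ntoken:    eyJhbGciOi\n' \
  | awk '/^token:/ { print $2 }'
```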
I think there's an interesting inflection point between piping different utilities together to get something done, and just whipping up a script to do the same thing instead.
First I'll use the command line to, say, grab a file from a URL, parse, sort and format it. If I find myself doing the same commands a lot, I'll make a .sh file and pop the commands in there.
But then there's that next step, which is where Bash in particular falls down: branching, loops, or any real logic. I've tried it enough times to know it's not worth it. So at this point I load up a text editor and write a NodeJS script that does the same thing (it used to be Perl or Python). If I need more functionality than the standard library provides, I'll make a folder, do an npm init -y, and npm install a few packages for what I need.
This is not as elegant as pipes, but I have more fine grained control over the data, and the end result is a folder I can zip and send to someone else in case they want to use the same script.
There is a way to make a NodeJS script listen to STDIO and act like another Unix utility, but I never do that. Once I'm in a scripting environment, I might as well just put it all in there so it's in one place.
Debugging Linux pipelines is not a fun experience.
This is one clear area where Powershell with its object model has got it right.
A similar philosophy has made the "tidyverse" a much-loved extension of the statistical language R.
Compare the following two equivalent snippets. Which one seems more understandable?
iris_data %>%
  names() %>%
  tolower() %>%
  gsub(".", "_", ., fixed=TRUE) %>%
  paste0("(", ., ")")
or:
paste0("(", gsub(".", "_", tolower(names(iris_data)), fixed=TRUE), ")")
Unix pipelines are indeed beautiful, especially when you consider their similarity to Haskell's monadic I/O: http://okmij.org/ftp/Computation/monadic-shell.html
Unix pipelines actually helped me make sense of Haskell's monads.
Surprised there is no mention of Doug McIlroy: https://wiki.c2.com/?DougMcIlroy
Interesting comment in the header of this site.
https://github.com/prithugoswami/personal-website/blob/maste...
Powershell is the ultimate expression of the Unix pipeline imo.
Passing objects through the pipeline and being able to access this data without awk/sed incantations is a blessing for me.
I think anyone who appreciates shell pipelines and Python can grok the advantages of the approach taken by Powershell; in a large way it is built directly upon an existing Unix heritage.
I'm not so good at explaining why, but for anyone curious, please have a look at the Monad Manifesto by Jeffrey Snover:
https://devblogs.microsoft.com/powershell/monad-manifesto-th...
You may not agree with the implementation, but the ideas behind it, I think, are worth considering.
I think it will be on topic if I take this occasion to once again plug a short public service announcement for an open-source tool I built that helps interactively build Unix/Linux pipelines, dubbed "The Ultimate Plumber":
I've also recently seen it being described in shorter words as a "sticky REPL for shell". Hope you like it, and it makes your life easier!
To my understanding, this is the same pattern where every "object" outputs the same data type for other "objects" to consume. The pattern can have a text or GUI representation that is really powerful in its own right. It's why automation agents, with their event consumption and emission, are so powerful, and why the web itself is shifting toward this pattern (JSON as the communication of data, code as objects). The thing is, this will always be a higher level of abstraction. I think a GUI for this pattern should exist as a default tool in most operating systems; it would solve a lot of learning problems, like having to learn all the names and options of the objects. Sites like Zapier, or tools like Huginn, already implement this pattern. I've always wondered why something so useful spreads so slowly.
I wish there were gui pipes. Pipes are wonderful but they’re neither interactive nor continuous.
Pipes that loop, outputting a declarative gui text format, and listen for events from the gui, would be marvellous.
I can’t think how to do that without sockets and a bash loop. And that seems to create the kind of complexity that pipes manage to avoid.
I find gron to be much more Unix-y than jq. It "explodes" JSON into single lines for use with grep, sed, etc and can recombine back into JSON as well.
Related: Pipelines can be 235x faster than a Hadoop cluster https://adamdrake.com/command-line-tools-can-be-235x-faster-...
Pipes are one of the best experiences you'll have, whatever you're doing. I was debugging a remote server logging millions of log entries a day, doing a little aggregation on the server. All it required was wget, jq, sed, and awk, and I had a more powerful log analyzer than Splunk or any similar solution, on a developer Mac. That's awesome when you're paying a fortune for Splunk. For getting some quick insights, Unix pipes are a godsend.
It's ironic that the article ends with Python code. You could have done everything in Python in the first place and it would have probably been much more readable.
If you eval this simple pipeline
file -b `echo $PATH:|sed 's/:/\/* /g'`|cut -d\ -f-2|sort|uniq -c|sort -n
it prints a histogram of the types of all the programs on your path (e.g., whether they are shell, Python, or Perl scripts, or executable binaries). How could you ever write such a cute thing in, e.g., Python or, god forbid, Java?
I think it is terrible: if everything is text, you can't precisely describe some data structures, e.g. a circular list.
I recently wrote some Go code for fetching an RSS feed and displaying a gist of it based on my requirements.
Looking at this code, I was tempted to reimplement it using pipes, but a saner mind took over and said "don't fix something that is not broken."
I'll probably still do it anyway and get some benchmark numbers to compare both.
Thank you! I did not know you could get a SQL-like GROUP BY using uniq -c. That's so cool! I used to pipe into awk, count using an array, and then display the result, but your method is far better than mine.
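The pattern in question, sketched: sort groups identical lines together, uniq -c counts each run, and a final sort -n orders the groups by count, much like GROUP BY ... ORDER BY COUNT(*):

```shell
# Count occurrences of each distinct line, smallest group first.
# uniq -c prefixes each output line with a (left-padded) count.
printf 'apple\nbanana\napple\napple\nbanana\n' \
  | sort | uniq -c | sort -n
#   2 banana
#   3 apple
```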
That was a great article! Pipes can definitely be very powerful. I will say, though, that I often find myself reading pages of documentation in order to actually get anything done with Unix and its many commands.
Great write-up. One thing I would add is how pipes buffer and apply backpressure. To my understanding, this is the "magic" that makes pipes fast and failsafe(r).
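The mechanics, roughly: a pipe is a fixed-size kernel buffer, so a producer that gets ahead blocks in write() until the consumer drains the buffer, and a producer whose consumer has exited is stopped by SIGPIPE. The classic one-liner demonstration:

```shell
# `yes` would write "y" forever, but `head` exits after three lines
# and closes the read end of the pipe. The next write() by `yes`
# raises SIGPIPE, so the infinite producer terminates on its own.
yes | head -n 3
```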
I've been using pipes for decades to get my work done, but it was cool to learn about jq as I have much less experience with JSON. It's a very cool program. Thanks.
Minor nitpick: It is Unix "pipes" (not pipelines).
> If you append /export to the url you get the list in a .csv format.
Text filtering approach narrowly rescued by website feature.
Phew, that was close!
The video from 1982 is brilliant: great explanations from an era before certain knowledge was assumed to be generally known.
What kills me about pipelines is when I pipe into xargs and then suddenly can't kill things properly with Ctrl+C. Often I have to jump through hoops to parse arguments into arrays and avoid xargs just for this. (This has to do with stdin being redirected. I don't recall if there's anything particular about xargs here, but that's where it usually comes up.)
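The workaround being described, sketched in bash (mapfile needs bash 4+; the directory and files below are created just for the demo): reading the filenames into an array first means the eventual command is not run with its stdin rewired, so Ctrl+C reaches it normally.

```shell
#!/usr/bin/env bash
# Instead of:  find "$dir" -name '*.txt' | xargs wc -l
# collect the names into an array, then invoke the command directly.
# Its stdin stays attached to the terminal, so Ctrl+C behaves.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

mapfile -t files < <(find "$dir" -name '*.txt' | sort)
echo "found ${#files[@]} files"
wc -l "${files[@]}"

rm -rf "$dir"
```

(This also sidesteps xargs's word-splitting issues, at the cost of holding the whole file list in memory.)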
The first example in the article is what `git shortlog -n` does; no need for the pipeline.
What does it mean to say that the video shows "Kernighan being a complete chad"?
Aka "The Ugliness of Powershell". It would be fun to see equivalents of these in Powershell.
I have problems understanding which commands I can pipe. Some work, some don't.
With Cygwin you can do exactly this on Windows too. For me, Cygwin is a blessing.
abinitio.com was born from these principles.
Good article. Thanks for the information.
The first time I saw pipes in action I was coming from Apple and PC DOS land. It blew my mind. The re-usability of so many /bin tools, and being able to stuff my own into that pattern was amazing.
If you like pipes and multimedia, check out GStreamer; it has taken the custom pipeline idea to real time.
The capacity of the pipe buffer is important for doing serious producer|consumer work.
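Two distinct numbers matter here, and they are easy to conflate: PIPE_BUF, the largest write POSIX guarantees to be atomic (at least 512 bytes; 4096 on Linux), and the total pipe capacity, which on modern Linux defaults to 64 KiB and can be changed per pipe with fcntl(F_SETPIPE_SZ). A quick check of the former:

```shell
# PIPE_BUF: writes up to this size are atomic (no interleaving when
# several producers share one pipe). This is NOT the total capacity,
# which on Linux defaults to 64 KiB (query with fcntl F_GETPIPE_SZ).
getconf PIPE_BUF /
```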
I think systemd can do all of that.
The explanation is great. You learn a lot here. I'm new, and this article is really good.
I am sorry for playing the devil's advocate. I also think that pipes are extremely useful and a very strong paradigm, and I use them daily in my work. It is no accident that they are a fundamental and integral part of Powershell too.
But is this really HN front-page worthy? I have seen this horse beaten to death for decades now. These kinds of articles have been around since the very beginning of the internet.
Am I missing something newsworthy which makes this article different from the hundreds of thousands of similar articles?
Unix pipes are a 1970s construct, the same way bell-bottom pants are. It's a construct that doesn't take into account the problems and scale of today's computing. Unicode? Hope your pipes process it fine. Video buffers? High perf? Fuggetaboutit. Piping the output of ls to idk what? Nice, I'll put it on the fridge.
Pipes are wonderful! In my opinion you can’t extol them by themselves. One has to bask in a fuller set of features that are so much greater than the sum of their parts, to feel the warmth of Unix:
(1) everything is text
(2) everything (ish) is a file
(3) including pipes and fds
(4) every piece of software is accessible as a file, invoked at the command line
(5) ...with local arguments
(6) ...and persistent globals in the environment
A lot of understanding comes once you know what execve does, though such knowledge is of course not necessary. It just helps.
Unix is seriously uncool with young people at the moment. I intend to turn that around and articles like this offer good material.