The Awk Programming Language (1988) [pdf]

shawndumas | 370 points

I consider awk to be the most useful and underused language in the UNIX ecosystem. I use it daily to analyze, transform, and assemble data, and it always blows my mind that so few people really know how to use it at a decent level. This is an excellent book to give a real idea of what awk is capable of.

coliveira | 6 years ago

I'll just share my biggest Awk-based project here: A tool that creates HTML documentation from Markdown comments in your source code.

It does essentially what Javadoc does, except it uses Markdown to format text, which I find much more pleasing to the eye when reading source code.

The benefit of doing it in Awk is that if you want to use it in your project, you can just distribute a single script with your source code and add two lines to your Makefile. Because of the ubiquity of Awk, you never have to worry about whether the people building your library have the correct tools installed.
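
For illustration, those two Makefile lines might look something like this (the target and file names are hypothetical; see the d.awk README for the actual invocation):

```makefile
docs.html: main.c d.awk
	awk -f d.awk main.c > docs.html
```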

It doesn't have all the features that more sophisticated tools like Doxygen provide, but I'm going to keep doing the documentation for my small hobby projects this way.

[1] https://github.com/wernsey/d.awk

wernsey | 6 years ago

I had this exact book and used awk on DOS/Novell in the early 90's when scripting choices were pretty scarce. The writing is tremendous - a model of clarity, and worth reading just for that. Anything with Kernighan, Pike or Plauger as author is worth checking out just for the example of clear thinking.

ianmcgowan | 6 years ago

In 1996 I worked as a federal contractor for a US Army base. They had various Unix systems locked down for security reasons. I had Awk and Sed to work with, so I ordered the books from Amazon.

Oracle and other databases exported data in fixed-width files, and I had to download it from several *nix systems, import it into one central *nix system running Oracle, then into a DOS-based Clipper 5 system and an Access 2.0 Windows system, and they all had to produce the same results.

If not for Awk I could not have filtered the files from the *nix systems.

orionblastar | 6 years ago

I printed this book and went through it, and IMHO even just skimming all of it is worth it: understanding how awk works beyond the basic '{print $2}', and being exposed to some 'advanced' techniques, gives you a set of tools you can reuse in your daily chores (in particular if you're a sysadmin).
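
One example of the kind of technique the book covers beyond simple field printing: associative arrays turn awk into a quick aggregation tool (input data here is made up):

```shell
# Count occurrences of the first field with an associative array
printf 'GET /a\nPOST /b\nGET /c\n' |
  awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' |
  sort
```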

znpy | 6 years ago

This book is worth the read. Just to get in the mindset of the authors. I wish that more programming books could be as concise and useful at the same time.

totalperspectiv | 6 years ago

TXR Lisp provides a Lisp-ified awk in a macro:

http://www.nongnu.org/txr/txr-manpage.html#N-000264BC

> "Unlike Awk, the awk macro is a robust, self-contained language feature which can be used anywhere a TXR Lisp expression is called for, cleanly nests with itself and can produce a return value when done. By contrast, a function in the Awk language, or an action body, cannot instantiate a local Awk processing machine."

The manual contains a translation of all of the Awk examples from the POSIX standard:

http://www.nongnu.org/txr/txr-manpage.html#N-03D16283

The (-> name form ...) syntax above is scoped to the surrounding awk macro. Like in Awk, the redirection is identified by string. If multiple such expressions appear with the same name, they denote the same stream (within the lexical scope of the awk macro instance to which they belong). These are implicitly kept in a hash table. When the macro terminates (normally or via non-local jump like an exception), these streams are all closed.

kazinator | 6 years ago

For context, I'm in university, but during one of my internships, a lot of the older developers always seemed to use awk/sed in really powerful ways. At the same time, I noticed a lot of the younger developers hardly used it.

I'm not sure if it's a generational thing, but I thought that was interesting.

Anyways, are there any good resources to learn awk/sed effectively?

3uclid | 6 years ago

Over the years I've written too many awk one-liners to count. Most of them look ugly (hell, awk makes Perl look elegant), but having awk in your toolkit means you don't have to drop out of the shell to extract some weird shit from a text stream. Thanks, Aho, Weinberger, and Kernighan!

olskool | 6 years ago

And I'm still waiting for the structural regular expressions version of awk [0].

I very much like awk; I prefer it over sed because it's easy to read, and the man page alone is all one needs. But I often find myself doing something like this:

  match($0, /regex/) {
    x = substr($0, RSTART, RLENGTH)
    if (match(x, /regex2/)) {
      # ...
    } else if (match(x, /regex3/)) {
      # ...
    }
  }
Then I sometimes want to mix and match those strings. Or do some math on a matched number. It's a bit tedious in awk.
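
For the simple cases, at least, math on a matched number isn't too bad (regex and input made up for illustration):

```shell
# Extract the first run of digits and do arithmetic on it
echo 'latency: 250 ms' |
  awk 'match($0, /[0-9]+/) { n = substr($0, RSTART, RLENGTH) + 0; print n * 2 }'
```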

[0] http://doc.cat-v.org/bell_labs/structural_regexps/

hawski | 6 years ago

Awk is great for quick command line scripts and also for running on a very wide range of systems.

I recently wrote a simple statistics tool using Awk to calculate median, variance, deviation, etc. and people say the code is readable and good for seeing the simplicity of Awk.

https://github.com/numcommand/num/blob/master/bin/num
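
Not the linked tool, just a minimal sketch of the idea: mean and population variance of a column of numbers in one pass:

```shell
printf '1\n2\n3\n4\n' |
  awk '{ s += $1; ss += $1 * $1; n++ }
       END { mean = s / n; print mean, ss / n - mean * mean }'
```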

jph | 6 years ago

If you want a fast awk, use mawk.

https://github.com/mikebrennan000/mawk-2

rurban | 6 years ago

That is a nice book. It starts with a practical tutorial and then goes into the structure and language features, all at a reasonable length of about 200 pages.

I like to use awk when I need something a little more powerful than grep. Nevertheless, when I look at the examples and where the book is heading I prefer R for many of the tasks (in particular Rscript with a shebang).

Just to give an example: manipulating a CSV file is certainly possible with awk, but some day there will be a record that contains the separator, and your program will produce garbage. R, on the other hand, comes with sophisticated routines to handle CSV files correctly.
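
The failure mode is easy to demonstrate: a quoted field containing the separator throws off a naive -F, split (input made up):

```shell
# The record has 3 logical fields, but naive comma-splitting sees 4
echo 'id,"last, first",score' | awk -F, '{ print NF }'
```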

I truly respect awk for what it was and is, but I also think that the use cases where it is the best tool for the job have become very narrow over time.

JepZ | 6 years ago

As I do most of my daily work in cheminformatics with a (shell-based) workflow engine (http://scipipe.org), awk has turned out to be the perfect way of defining even quite complicated components based on just a shell command. These days, pretty much 50% of my components are CSV/TDV data munging with awk! :D

(Can be hard to explain how this works without an image, so an (older) image is found in: https://twitter.com/smllmp/status/984173696448434176 )

samuell | 6 years ago

I find awk so beautiful. I've written many scripts in awk; it is so good at data transformation. I used it to write a script to delete old and unused records from tables. The book is beautifully written, with amazing clarity of thought.

hi41 | 6 years ago

I recently had a sort of "contest" with someone for parsing the output of a tool. I had to parse some text output into a tree structure.

The other person wrote it in awk, quite quickly. After writing my own version in Python (my version was waaaay over-engineered), I decided to blatantly rip-off the awk solution and re-implement it in Python.

It was almost as simple and as short.

Awk is much more compact as a language, but also way more limited. And it still has its quirks and a certain volume of information you have to absorb. I'd say it's more worthwhile to learn Python instead, because you'll be able to use it for other purposes too.

oblio | 6 years ago

From the introduction to Chapter 2:

> Because it's a description of the complete language, the material is detailed, so we recommend that you skim it, then come back as necessary to check up on details.

Any book that recommends skimming is doing something right.

thomastjeffery | 6 years ago

This pdf looks like a scanned book but I can highlight and copy text from it? What exactly is going on here? Does Chrome pdf viewer have built-in OCR?

dokem | 6 years ago

The lovely and humbling thing about this it was written 3 decades ago and the examples still work. Makes me think of another short elegant piece by Kenneth Church (?) called "Unix for Poets" which shows how to use core UNIX utils to work with text. Also from the mid to late 80s. Perl may have replaced sed and awk but they endure.

mbubb | 6 years ago

I have this book and highly recommend it. I have referenced it numerous times to pull off text-manipulation wizardry that stunned others.

segmondy | 6 years ago

I always found it interesting that the Awk paradigm is also the basis for IBM's RPG language. Two very different environments coming up with basically the same elegant solution for the same problem:

1. Run zero or more setup operations.

2. Loop over the lines of a text file and process its columns into an output format.

3. Run zero or more cleanup operations at the end.
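
In awk the three phases map directly onto BEGIN, the per-line rules, and END (data illustrative):

```shell
printf 'a 1\nb 2\n' |
  awk 'BEGIN { print "header" }        # 1. setup
             { total += $2 }           # 2. loop over lines
       END   { print "total", total }  # 3. cleanup'
```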

excitom | 6 years ago

If you want a compiled alternative to awk, try using lex.[1]

You feed it regexps and C code fragments, and it generates C code for you.

[1] https://www.tldp.org/HOWTO/Lex-YACC-HOWTO-3.html

gerbilly | 6 years ago

I wanted to know what the language looked like so I went to the first example in the book and found this:

  This is the kind of job that awk is meant for, so it's easy. Just type this command line:
  awk '$3 > 0 { print $1, $2 * $3 }' emp.data
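
Run against made-up data in the same three-column format (name, pay rate, hours worked), it prints the pay for everyone who worked:

```shell
printf 'Beth 4.00 0\nKathy 4.00 10\nMark 5.00 20\n' > emp.data
awk '$3 > 0 { print $1, $2 * $3 }' emp.data
rm emp.data
```
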
anoonmoose | 6 years ago

Lots of comments about awk, perl and sed for text processing. What about tcl?

vasili111 | 6 years ago

Recycling an older HN discussion on Awk vs Perl

https://news.ycombinator.com/item?id=14647022

wenc | 6 years ago

I just skimmed it in 30 minutes, and I feel I can write some simple stuff now. Apart from all the examples, it doesn't feel that overwhelming.

carlmr | 6 years ago

I was joking with my coworker some weeks ago about how awk is condemned to be forever used as a cut replacement.
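
To be fair, there's a reason for it: awk splits on runs of whitespace, which cut -d' ' does not, so the "cut replacement" is often the more robust one:

```shell
printf 'a  b\n' | awk '{ print $2 }'   # prints b
printf 'a  b\n' | cut -d' ' -f2        # prints an empty field
```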

ulzeraj | 6 years ago

I have this book. I use awk daily to do analysis of Suricata logs. It's great for querying structured text.

rbc | 6 years ago

I love the typography

mxschumacher | 6 years ago

Awk has its uses. If you use the command line, you'll probably use Awk occasionally.

I don't get the Perl hate. Perl's unpopularity may have something to do with some of the language's design choices, but I think what really killed it was Perl coders. Some of the worst code I've seen happened to be written in Perl. If you follow clean-code principles, Perl is fine. Mojolicious is an awesome framework; I like it a lot.

Today I code Python and C. I used to code Ruby, and before that Perl. I loved Ruby's syntax, but Ruby seems to be waning. I'm looking forward to coding in Go. I'll be coding JavaScript, but I'm not looking forward to it.

Use the tool that fits the job. I have no loyalties to any programming language.

lasermike026 | 6 years ago

bold was called heavy, lol

florinutz | 6 years ago

I know many people will downvote this, but in my opinion, just say no to this ancient "programming language". It's confusing, completely text-based, and was designed ages ago for an entirely different environment. There are many better alternatives, like Python or PowerShell. Why not use them?

mdavid626 | 6 years ago