Naming things (2015) [pdf]

petethomas | 157 points

The real lesson of this discussion seems to be: metadata has failed our expectations.

All this ancillary stuff that we'd like attached to files, like dates, client names and projects, versions and so on, is metadata. Some systems keep metadata in files: EXIF, Word, PDF. Some systems have conventions for this instead: header blocks in source code. But if neither of those applies? The only place you can put it reliably is the filename :(

pjc50 | 7 years ago

The problem I've found with file names like those described as "awesome" in the fifth slide is that if you have a bunch of them open at once, your taskbar/switcher/windows menu truncates them all to something like "2013-06-26_BRAFWTNEG...", making finding the one you want a bit more burdensome.

Jakob Nielsen had a post (a link for which I cannot find) recommending that web-page titles put the most specific information at the beginning. Doing something similar with file names (e.g., calling them "H01_MutantFraction...2013-06-26.csv", etc) would trade some of the advantages of the proposed scheme for speed of finding and switching between files when you're actually using them.

dghf | 7 years ago

If you look at the filename examples, there seems to be an implicit suggestion of naming a group of related files using a common prefix.

If one needs to distinguish groups of files, why not just put them in directories? That's the reason directories exist, no?

I can somewhat understand if some (bad) software is written to look for files only in a single directory and you have to put everything there. But otherwise, it seems pretty pointless to use a common prefix and make filenames longer.

k_sze | 7 years ago

I'm constantly harping on everybody to pay attention to their file naming.

I'm a graphic designer, so for me everything is Client/YYMM-Project/_FINAL/YYMM-COLLATERAL-NAME

Within each project there is a _PROCESS folder with a _ELEMENTS subfolder for pieces the client has given me to work with.

For invoices I do YYMMDD-ClientName-Project-Sum.pdf. When the invoice is paid, I rename the file to add -PAID- before the client name. It's simple, but it's allowed me to easily track and maintain projects and billing over the years.
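
A rough Python sketch of that rename step, assuming the paid marker goes straight after the date field. The function name `mark_paid` and the sample invoice name are made up for illustration.

```python
def mark_paid(filename: str) -> str:
    """Insert PAID after the leading date field of YYMMDD-ClientName-Project-Sum.pdf."""
    date, sep, rest = filename.partition("-")
    if not sep:
        raise ValueError(f"no '-' delimiter in {filename!r}")
    if rest.startswith("PAID-"):
        return filename  # already marked, leave it alone
    return f"{date}-PAID-{rest}"


# Hypothetical invoice name, not taken from the comment above.
print(mark_paid("170818-AcmeCo-Website-1200.pdf"))
# 170818-PAID-AcmeCo-Website-1200.pdf
```

Because the marker sits after the date, sorting by name still keeps invoices in date order.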

If I end up working for another 83 years, I guess I'll pad the year with a 0...

Proper file management is an undervalued skill and should be taught both in school and in corporate environments. In an old tech job we had a public folder on the server that was total chaos. So many people insisted on naming their files MAY-%day%-%contents%-%personsname% -- and, as you'd expect, people spent countless hours per year trying to hunt down that one file so-and-so worked on before they left for another job.

stevewillows | 7 years ago

The author forgot to write down the most important reason for the whole exercise: file transfer.

Modern operating systems index the contents of your files, so finding all your files on project "foo" is only a search away. If you are a GUI user, then file naming isn't really that important for locating data on your machine.

File names matter because there are 40 years of cruft out there that absolutely refuses to move metadata along with files. So you can touch your files to set dates, organize them in directories or tag them to your heart's content in Mac OS or Windows, but you will lose all that information when you attach the file to an email or put it in DropBox.

So you only have a choice of two places to put metadata in such a way that it will be carried along with the file: the file name or the file contents.

Putting the data in the file contents lacks discoverability and in many cases the applications you use to manipulate the files don't allow for additional metadata anyway. Also, some file types (Word .docx files, jpegs, MP3s) get their metadata updated and/or scrambled when you open them with specific applications. So really your only valid choice is to put it in the file name.

The author's specific recommendations (use underscores and hyphens for delimiting) assume that you really want to access the files with the command line and use globbing. Other than that implication, the recommendations are sound.
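
To make the globbing point concrete, here is a small Python sketch. It assumes underscores separate fields and hyphens separate words within a field; the file names and the helpers `split_fields` and `select` are hypothetical.

```python
import fnmatch
from pathlib import PurePath

# Hypothetical names: underscores separate fields, hyphens separate words.
NAMES = [
    "2017-08-18_assay-a_plate-01.csv",
    "2017-08-18_assay-a_plate-02.csv",
    "2017-08-19_assay-b_plate-01.csv",
]


def split_fields(filename: str) -> list:
    """Return the underscore-delimited fields of a name, without the extension."""
    return PurePath(filename).stem.split("_")


def select(names: list, pattern: str) -> list:
    """Apply a shell-style glob pattern to a list of names."""
    return fnmatch.filter(names, pattern)


print(split_fields(NAMES[0]))        # ['2017-08-18', 'assay-a', 'plate-01']
print(select(NAMES, "*_assay-a_*"))  # the two assay-a files
```

With a GUI and full-text search none of this matters much, which is the trade-off described above.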

efitz | 7 years ago

> avoid [...] accented characters

It's certainly good advice and I definitely avoid using non-ASCII characters in filenames in practice. But I can't help thinking that advice like that is why support for Unicode is still buggy in many places.

I see nothing fundamentally wrong with using non-ASCII for filenames (and the slides don't give any reasoning), if only random software wouldn't mangle encodings, sorting order, or plain refuse to accept such filenames.

avian | 7 years ago

For those who don't know Jenny Bryan, she is a wonderful force for good in the R community. It seems like these filenames are a little contentious here on HN, but IMO this will always be a big improvement over the file-naming practices of someone who has given little or no thought to the topic. Which I am guessing was her intended audience.

If you're getting into R or data analysis, check out http://stat545.com/topics.html. She has put a lot of thought into the project management aspects of carrying out a data science project that don't get discussed as often as other, sexier topics.

alexilliamson | 7 years ago

My only real concern is with left-padding numbers with 0s when you don't know in advance how big the numbers are going to get. Do you pad to 2 or 3 digits or...
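
A quick Python illustration of the concern, using made-up `run` file names: once a number outgrows the chosen width, plain string sorting breaks again.

```python
def padded_names(numbers, width):
    """Build hypothetical file names with numbers zero-padded to `width` digits."""
    return [f"run{n:0{width}d}.csv" for n in numbers]


numbers = [1, 2, 20, 100]

# Width 2 is too narrow once the count reaches 100.
print(sorted(padded_names(numbers, 2)))
# ['run01.csv', 'run02.csv', 'run100.csv', 'run20.csv']

# Width 3 sorts correctly, until something reaches 1000.
print(sorted(padded_names(numbers, 3)))
# ['run001.csv', 'run002.csv', 'run020.csv', 'run100.csv']
```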

andrewflnr | 7 years ago

Why left-pad numbers if you could simply use better tools that sort numbers properly (instead of using ASCII order, which doesn't even make sense for ordering purposes)? For instance, use `ls -v` instead of `ls` (possibly as an alias).
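
The same idea can be sketched in Python with a natural-sort key. This is only an approximation of the concept, not a reimplementation of what `ls -v` does internally; `natural_key` and the file names are made up.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and number chunks so digit runs compare as integers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


names = ["run10.csv", "run2.csv", "run1.csv", "run100.csv"]

print(sorted(names))
# ['run1.csv', 'run10.csv', 'run100.csv', 'run2.csv']

print(sorted(names, key=natural_key))
# ['run1.csv', 'run2.csv', 'run10.csv', 'run100.csv']
```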

GlitchMr | 7 years ago

The PDF didn't mention the most important point: keep it short and to the point.

It's hard to use a CLI and type that filename every time, or to navigate to it in a file explorer, when the name is so long to read. The important bits of info should always be at the start.

ISO date conventions aren't necessary, since most files have metadata associated with them (created and modified dates), so adding an ISO date is redundant for human-made files. You can always use a bulk renamer at any point.
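
As a sketch of what such a bulk renamer might look like in Python: it prefixes each file with its modification date in ISO format. The function names and the underscore separator are assumptions, and it defaults to a dry run that only reports the planned renames.

```python
from datetime import date
from pathlib import Path


def dated_name(path: Path) -> str:
    """Prefix a file name with its modification date (local time) as YYYY-MM-DD."""
    modified = date.fromtimestamp(path.stat().st_mtime)
    return f"{modified.isoformat()}_{path.name}"


def prefix_with_mtime(folder: Path, dry_run: bool = True) -> list:
    """Return (old, new) name pairs for files in `folder`; rename unless dry_run."""
    plan = []
    for path in sorted(folder.iterdir()):  # sorted() snapshots the listing first
        if path.is_file():
            target = path.with_name(dated_name(path))
            plan.append((path.name, target.name))
            if not dry_run:
                path.rename(target)
    return plan
```

This only works while the modification time is still intact, so it has to run before the file is emailed or synced somewhere that resets it.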

But my point still stands: just sort regularly and add some sorting identifier at the front of the file name. Depending on who is working on the file and in what context, a simple number at the front suffices (01, 02, 03, etc.), or it can be a word, with a version number at the end.

Lastly, the author didn't mention folder names. Those need to be one or two words at most, to help segregate information if there are lots of files in one folder.

If you're creating machine-made / autogenerated saved reports, following a standard ISO date convention makes sense though, with regexable slugs, etc.

Kagerjay | 7 years ago

There's some empirical work on how developers encounter and respond to naming anti-patterns e.g. http://www.veneraarnaoudova.com/wp-content/uploads/2014/10/2... and associated googlescholar search https://scholar.google.com/scholar?hl=en&as_sdt=0,34&q=lingu...

hyporthogon | 7 years ago

I always name from general to specific — left to right.

Example: client_project_element-name_20170818.txt

When sorted by name, similar items are grouped together.

adams_at | 7 years ago

MMDDYYYY

I like to call this middle endian.
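
A small Python check of why the ordering matters for file names: sorting YYYY-MM-DD strings as plain text gives chronological order, while MMDDYYYY strings do not. The dates are arbitrary examples.

```python
from datetime import date

# Arbitrary example dates, already in chronological order.
days = [date(2016, 12, 31), date(2017, 1, 15), date(2017, 8, 18)]

iso = [d.isoformat() for d in days]         # YYYY-MM-DD
mdy = [d.strftime("%m%d%Y") for d in days]  # MMDDYYYY

print(sorted(iso) == iso)  # True: text order matches date order
print(sorted(mdy))
# ['01152017', '08182017', '12312016']  (the 2016 date ends up last)
```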

snarfy | 7 years ago

For some reason I want to avoid names beginning with numbers. All my filenames can be used as an identifier (except the extension part).

i.e. [a-zA-Z][0-9a-zA-Z_]+\.[a-z0-9]+
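
For anyone who wants to try that pattern, a short Python check. The helper name `is_identifier_like` and the sample names are made up.

```python
import re

# The pattern from the comment above, applied to the whole file name.
IDENTIFIER_NAME = re.compile(r"[a-zA-Z][0-9a-zA-Z_]+\.[a-z0-9]+")


def is_identifier_like(filename: str) -> bool:
    """True if the entire name matches the identifier-style pattern."""
    return IDENTIFIER_NAME.fullmatch(filename) is not None


for name in ["report_v2.txt", "2017_report.txt", "my-report.txt", "a.txt"]:
    print(name, is_identifier_like(name))
# report_v2.txt True
# 2017_report.txt False
# my-report.txt False
# a.txt False
```

Note that, as written, the `+` requires at least two characters before the dot, so a one-letter stem like `a.txt` is rejected.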

euske | 7 years ago

I name my files like this:

  2017-04-04 #S4907 #Choir List of names.pdf
  2017-04-05 #EXCITE #S=5005 Notes on Data Repositories.pdf
  2017-04-06 #ARAG #S=5031 #Amount=14.2 #CUR=EUR Invoice.pdf

with the following semantics:

* date. Every file name is prefixed with the ISO date to facilitate sorting.

* tags. The syntax #<tagname> categorizes documents with tags.

* key-value pairs. The syntax $<key>=value lets us attach structured information to the document name.

and keep them in a single large folder.
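
To show how such names can be picked apart, here is a minimal Python sketch. It is not how `pile` itself works; `parse_name` is a hypothetical helper, and it follows the example names above, where key-value pairs appear as `#key=value`.

```python
from pathlib import PurePath


def parse_name(filename: str) -> dict:
    """Split a name into date, tags, key-value pairs, and the remaining title."""
    stem = PurePath(filename).stem  # drops only the final extension
    day, _, rest = stem.partition(" ")
    tags, pairs, title = [], {}, []
    for token in rest.split():
        if token.startswith("#") and "=" in token:
            key, _, value = token[1:].partition("=")
            pairs[key] = value
        elif token.startswith("#"):
            tags.append(token[1:])
        else:
            title.append(token)
    return {"date": day, "tags": tags, "pairs": pairs, "title": " ".join(title)}


info = parse_name("2017-04-06 #ARAG #S=5031 #Amount=14.2 #CUR=EUR Invoice.pdf")
print(info["date"], info["tags"], info["title"])
# 2017-04-06 ['ARAG'] Invoice
print(info["pairs"])
# {'S': '5031', 'Amount': '14.2', 'CUR': 'EUR'}
```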

On top of this, I have written some shell tooling to normalize, view, and shuffle around those documents: https://github.com/HeinrichHartmann/pile

E.g. `pile extract EXCITE` will extract all files with tag #EXCITE to a separate folder named #EXCITE. There is also a HTML form that helps with proper naming of new files.

File management is still a pain for me, but this at least gives me some confidence that I can retrieve stored documents reasonably well. I hope that one day I'll be able to auto-generate expense reports and tax filings from properly tagged filenames.

heinrichhartman | 7 years ago

I'm not a fan of dates at the beginning of the file name. If a file name needs a date I always put it at the end so future versions group when sorting the files.

mulmen | 7 years ago

Nice rules :) I especially like the use of underline-delimited metadata in filenames, saved me on a huge research deadline once.

stevenschmatz | 7 years ago

It's a nice set of guidelines, but why the snail on page 15?

deGravity | 7 years ago

Considering that most of the world uses the "little-endian" format for writing dates ( https://en.wikipedia.org/wiki/File:Date_format_by_country_(n... ), how come ISO 8601 settled on "big-endian"?

Not so practical for anything else but file naming IMHO...

sguav | 7 years ago

Those filenames look terrible to me.

BRAFWTNEGASSAY doesn't make any sense. To distinguish between the filenames you actually have to read the full filename, and with such long filenames chances are they won't be fully displayed, so you end up with 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFrac... for the first four files.

Keeping cruft out of your filenames seems like a much, much better way to name files. Also, most systems keep track of the creation date, so there's no need to keep it in the filename. I think it's better to give files an id.

kutkloon7 | 7 years ago