Working With Files and Directories

Questions:

How can I create, copy, and delete files and directories?
How can I edit files?

Important Objectives to Learn

Understand your location in a directory hierarchy that matches a given diagram.
Create files in that hierarchy using an editor or by copying and renaming existing files.
Delete, copy and move specified files and/or directories.

____________________________________________________________________

Creating directories

We now know how to explore files and directories, but how do we create them in the first place?

Step one: See where we are and what we already have

Let’s go back to our data-shell directory on the Desktop and use ls -F to see what it contains:

$ pwd
/Users/nelle/Desktop/data-shell
$ ls -F
creatures/  data/  molecules/  north-pacific-gyre/  notes.txt  pizza.cfg  solar.pdf  writing/

Create a directory

Let’s create a new directory called thesis using the command mkdir thesis (which has no screen output):

$ mkdir thesis

As you might guess from its name, mkdir means “make directory”. Since thesis is a relative path (i.e., does not have a leading slash, like /what/ever/thesis), the new directory is created within the current working directory:

$ ls -F
creatures/  data/  molecules/  north-pacific-gyre/  notes.txt  pizza.cfg  solar.pdf  thesis/  writing/
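The difference between a relative and an absolute path can be sketched in a scratch directory. The variable name demo and the directory names backup and chapter1/figures are hypothetical, chosen only for illustration, and the -p flag (which creates any missing parent directories) goes slightly beyond what this lesson covers.

```shell
# Work in a throwaway directory so nothing in data-shell is touched.
demo=$(mktemp -d)
cd "$demo"

# Relative path: created inside the current working directory.
mkdir thesis

# Absolute path: created at exactly the location named, wherever we are.
mkdir "$demo/backup"

# -p creates any missing parent directories in one step.
mkdir -p thesis/chapter1/figures

ls -F "$demo"
```

Without -p, the last mkdir would fail if thesis/chapter1 did not already exist.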

Two ways of doing the same thing

Using the shell to create a directory is no different from using a GUI. In fact, you can open the current shell directory Desktop/data-shell using your operating system’s graphical file explorer and the thesis directory will appear there too. While the shell and the file explorer are two different ways of interacting with the files, the files and directories themselves are the same.

Good names for files and directories

Complicated names of files and directories can make your life painful when working on the command line. Here we provide a few useful tips for the names of your files.

Don’t use whitespace (spaces).

Whitespace separates arguments on the command line, so a file name that contains a space is read as two or more separate arguments unless it is quoted. It is better to never use spaces in names of files and directories. You can use - or _ instead of whitespace.

Don’t begin the name with - (dash) or . (period).

Commands treat arguments starting with - as options or flags. Remember also that files with names beginning with a . are automatically “hidden”.

Inside a file or directory name, the best practice is to use only letters, numbers, . (period or ‘full stop’), - (dash) and _ (underscore).

Also, there are other characters that have special meanings on the command line and we will learn about some of these during this lesson. Some special characters can cause your command to fail or work differently, even resulting in data loss.

Important: If you need to refer to names of files or directories that already have whitespace or another non-alphanumeric character, you can work around this problem by surrounding the name in quotes ("").
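The naming advice above can be illustrated with a short sketch run in a scratch directory. All file names here (field notes.txt, lab, notes.txt, -draft.txt) are hypothetical, and the ./ prefix for names that begin with a dash is a common workaround rather than something this lesson introduces.

```shell
demo=$(mktemp -d)
cd "$demo"

# Quoted: the shell passes ONE argument, so one file is created.
touch "field notes.txt"

# Unquoted: the shell splits on the space and touch receives TWO arguments,
# so two separate files appear, named "lab" and "notes.txt".
touch lab notes.txt

# A name starting with a dash would be read as an option; prefixing it
# with ./ means the argument no longer begins with "-".
touch ./-draft.txt
rm ./-draft.txt

ls
```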

Since we’ve just created the thesis directory, there’s nothing in it yet, which we check using ls:

$ ls -F thesis
			<-- (nothing is returned)

Create a text file

Let’s change our working directory to thesis using cd, then run a text editor called nano to create a file called draft.txt:

$ cd thesis
$ nano draft.txt

Text vs. Whatever

We might call programs like Microsoft Word or LibreOffice Writer “text editors”, but we need to be a bit more careful when it comes to programming. By default, Microsoft Word uses .docx files to store not only text, but also formatting information (fonts, headings, and so on). This extra information isn’t stored as text, and doesn’t mean anything to many shell tools. Shell tools (commands, programs) expect input files to contain nothing but the letters, digits, and punctuation from a standard computer keyboard. When editing shell files and programs, you must use a plain text editor (or be very careful to save files as plain text).

Which Text Editor?

We use nano in examples because it is one of the least complex text editors. However, it may not be powerful enough, or flexible enough, for all your work after this workshop. On Unix systems (such as Linux and macOS), many programmers use Emacs or Vim (both of which require more time to learn), or a graphical editor such as Gedit. On Windows, a good text editor to use is Notepad++. Windows also has a built-in text editor called notepad that can be run from the command line in the same way as nano for the purposes of this lesson.

No matter which editor you use, you need to know where it searches for and saves files. If you start it from the shell, it will (probably) use your current working directory as its default location.

Let’s type a few lines of text into our “draft.txt” file.

Nano in Action

Once we’re happy with our text, we can press Ctrl+O (press the Ctrl or Control key and, while holding it down, press the O key) to write our data to disk (we’ll be asked what file we want to save to, so press Return to accept the suggested default of draft.txt). Now our file is saved, and we can use Ctrl-X to exit the editor and return to the shell.

Key combinations with Control, Ctrl, or ^

The Control key is also called the “Ctrl” key. (On a Mac it is a separate key from the Command key.) Pressing it together with another key may be written in several ways. For example, you may see an instruction to press the Control key and, while holding it down, press the X key. This may be described as any of:

Control-X

Control+X

Ctrl-X

Ctrl+X

^X

C-x

In nano, along the bottom of the screen you’ll see ^G Get Help ^O WriteOut. This means that you can use Control-G to get help and Control-O to save your file.

You may notice that nano doesn’t leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:

$ ls
draft.txt

Creating Files a Different Way

We have seen how to create text files using the nano editor. Now, let’s try a different way, working in your home directory.

First type cd to go to your home directory, then type pwd to verify you are “home”. Then type the following command:

$ touch my_file.txt

Now use ls to inspect the files in your home directory (or you can use the GUI file explorer), and you should see a new file named my_file.txt. When you inspect the file with ls -s (the -s flag stands for “size”), note that the size of my_file.txt is 0. In other words, it contains no data. If you open my_file.txt using your text editor, it is blank.

The touch command becomes very valuable when programs do not generate output files themselves, but instead require that empty files have already been generated.
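As a minimal sketch, the emptiness of a freshly touched file can also be confirmed by counting its bytes with wc -c. The scratch directory, the variable names demo and size, and the file notes.txt are hypothetical; the second half shows a behaviour not covered above, namely that touch on an existing file updates its timestamp without changing its content.

```shell
demo=$(mktemp -d)
cd "$demo"

touch my_file.txt

# wc -c counts the bytes in the file; a freshly touched file has none.
size=$(wc -c < my_file.txt)
echo "my_file.txt holds $size bytes"

# touch on an existing file updates its timestamp and leaves the content alone.
echo "some text" > notes.txt
touch notes.txt
cat notes.txt
```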

Removing files and directories

From our “home” directory, return to the data-shell directory using cd Desktop/data-shell. Let’s tidy up the thesis directory by removing the draft.txt file we created:

$ cd thesis
$ rm draft.txt

The name rm is short for “remove”, and this command removes (deletes) files. If we run ls again, its output is empty once more, which tells us that our file draft.txt is gone:

$ ls

Deleting Is Forever!

The Unix shell doesn’t have a trash bin where we can recover deleted files! Instead, when we delete files using the command-line they are gone and their storage space on disk can be recycled.

Now, let’s re-create draft.txt:

$ pwd
/Users/nelle/Desktop/data-shell/thesis
$ nano draft.txt
$ ls
draft.txt

Now let’s use cd .. to move up one directory to /Users/nelle/Desktop/data-shell:

$ cd ..

Notice if we try to remove the entire thesis directory using rm thesis, we get an error message:

$ rm thesis
rm: cannot remove `thesis': Is a directory

This happens because rm by default only works on files, not directories. (An empty directory can be removed with the separate rmdir command, but thesis is not empty.)

To really get rid of thesis we must also delete the file draft.txt. We can do this with the -r flag, the “recursive” option for rm:

$ rm -r thesis
$ ls
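The three behaviours (plain rm refusing a directory, rmdir removing only empty directories, and rm -r removing everything) can be compared side by side in a scratch directory. This is only a sketch: the names demo and empty_dir are hypothetical, and the error messages are silenced so that the script prints its own short explanation instead.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir thesis empty_dir
touch thesis/draft.txt

# Plain rm refuses directories; the expected error is silenced and tolerated.
rm thesis 2>/dev/null || echo "rm alone left thesis in place"

# rmdir removes a directory only when it is empty.
rmdir empty_dir
rmdir thesis 2>/dev/null || echo "rmdir refused: thesis is not empty"

# rm -r removes the directory and everything inside it.
rm -r thesis
ls -A
```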

Understanding and Using rm Safely

Removing the files in a directory recursively can be a very dangerous operation. If we’re concerned about what we might be deleting we should add the “interactive” flag -i to rm which will ask us for confirmation before each step! To demonstrate this, we will recreate the thesis directory and the draft.txt file again:

$ mkdir thesis
$ touch thesis/draft.txt
$ ls thesis
draft.txt

Now let’s remove the thesis directory, adding the -i flag to rm -r for safety. You must now confirm each step of the command.

$ rm -r -i thesis
rm: descend into directory ‘thesis’? y
rm: remove regular file ‘thesis/draft.txt’? y
rm: remove directory ‘thesis’? y

This shows how the command goes into the directory, removes everything in the directory, then removes the directory itself.
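To see that answering “n” really does keep a file, here is a small sketch in a scratch directory. Normally you would type the answers yourself; piping them in with printf is done here only so the example runs unattended. The names demo, keep.txt, and discard.txt are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
touch keep.txt discard.txt

# Answer "n": rm asks, we decline, and the file stays.
printf 'n\n' | rm -i keep.txt

# Answer "y": rm asks, we confirm, and the file is deleted.
printf 'y\n' | rm -i discard.txt

ls
```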

Moving files and directories

Let’s create that directory and file one more time. Earlier we used touch; this time we run nano from the data-shell directory with the path thesis/draft.txt, rather than going into the thesis directory first.

$ pwd
/Users/nelle/Desktop/data-shell
$ mkdir thesis
$ nano thesis/draft.txt
$ ls thesis
draft.txt

But draft.txt isn’t a particularly informative name, so let’s change the file’s name (rename the file) using mv, which is short for “move”:

$ mv thesis/draft.txt thesis/quotes.txt

The command mv is a little different because it takes two arguments. The first argument tells mv what we’re “moving”, while the second argument tells mv where it should go. In this case, we’re moving thesis/draft.txt to thesis/quotes.txt, which has the same effect as renaming the file. Sure enough, ls shows us that thesis now contains one file called quotes.txt:

$ ls thesis
quotes.txt

We have to be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name, which could lead to data loss. We can protect ourselves with an additional flag: mv -i (or mv --interactive, where the long form is supported) makes mv ask for confirmation before overwriting.
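The overwrite risk can be demonstrated safely in a scratch directory. This is a sketch under stated assumptions: the file contents and the variable name demo are hypothetical, the answer to the prompt is piped in only so the example runs unattended, and the trailing || true is there because some versions of mv may report a non-zero exit status when the user declines.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "first version" > draft.txt
echo "important quotes" > quotes.txt

# Plain mv replaces quotes.txt without any warning.
mv draft.txt quotes.txt
cat quotes.txt

# With -i, mv asks first; answering "n" leaves both files untouched.
echo "second version" > draft.txt
printf 'n\n' | mv -i draft.txt quotes.txt || true
ls
```

After the first mv, the original contents of quotes.txt are gone for good, which is exactly the situation -i is meant to prevent.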

Note that, unlike rm, mv also works on directories, including directories that contain files. Next, let’s move quotes.txt into our current working directory (data-shell). We use mv again, but this time we’ll use the name of a directory as the second argument to tell mv that we want to keep the same filename, but put the file somewhere new. (This is why the command is called “move”.) In this case, the directory name we use is the special directory name . that we mentioned earlier.

$ mv thesis/quotes.txt .

The effect is to move the file from the directory it was in (thesis) to the current working directory (data-shell). ls now shows us that the thesis directory is empty:

$ ls thesis

Further (and we will use this often), ls with a filename as an argument only lists that file if it is present. We can use this to see that quotes.txt is now in our current directory:

$ ls quotes.txt
quotes.txt
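Because ls succeeds when a path exists and fails when it does not, its exit status can be used to test for a file. The sketch below goes a little beyond this lesson by using a for loop and an if statement; the names demo and missing.txt are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
touch quotes.txt

# ls exits with status 0 when the path exists and non-zero when it does not.
# The output is discarded here; only the exit status matters.
for name in quotes.txt missing.txt; do
    if ls "$name" > /dev/null 2>&1; then
        echo "$name is present"
    else
        echo "$name is absent"
    fi
done
```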

Copying files and directories

The cp (copy) command works very much like mv, except it copies a file instead of moving it, and the copy can be given a new name. Try this, and then check that cp worked as expected using ls with two paths as arguments. Like many Unix commands, ls can be given multiple paths at once:

$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt   thesis/quotations.txt

To prove that we made a copy, let’s delete the quotes.txt file in the current directory and then run that same ls command again.

$ rm quotes.txt
$ ls quotes.txt thesis/quotations.txt
ls: cannot access quotes.txt: No such file or directory
thesis/quotations.txt

This time ls tells us that it can’t find quotes.txt in the current directory, but it does find the copy named quotations.txt in thesis that we didn’t delete.
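The examples above copy a single file. Copying a whole directory needs the recursive flag, much as rm does. The sketch below assumes the commonly available cp -r; the names demo and thesis_backup are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir thesis
echo "a quotation" > thesis/quotations.txt

# cp on its own skips directories; -r copies the directory and its contents.
cp -r thesis thesis_backup

# The original and the copy are independent: deleting one leaves the other.
rm thesis/quotations.txt
ls thesis_backup
```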

What’s In A Name?

You may have noticed that all of Nelle’s files are named “something dot something”, and that the files we have created so far in this lesson all use the extension .txt. This is just a convention: we can call a file mythesis (no extension) or almost anything else we want. However, most people use two-part names to help them (and their programs) identify different types of files. The second part of such a name is called the filename extension, and indicates what type of data the file holds: .txt signals a plain text file, .pdf indicates a PDF document, .cfg is a configuration file full of parameters for some program or other, .png is a PNG image, and so on.

These are just conventions, albeit important ones (and for you word geeks, “albeit” is an interesting word). All computer files contain bytes: it’s up to us and our programs to interpret those bytes according to the rules for plain text files, PDF documents, configuration files, images, and so on.

To be clear: naming a PNG image of a whale as whale.mp3 doesn’t somehow magically turn it into a recording of whalesong, though it might cause trouble (for example, if the operating system tries to open it with a music player).
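A quick way to convince yourself that an extension is only a label is to copy a file under a misleading name and compare the bytes. This is a minimal sketch; the names demo, notes.txt, and notes.png are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "plain text, whatever the name says" > notes.txt

# Copy the file under a misleading extension.
cp notes.txt notes.png

# cmp compares byte by byte and succeeds silently when the files are identical.
if cmp notes.txt notes.png; then
    echo "same bytes: renaming did not convert anything"
fi
```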

Exercises: Using wildcards for accessing multiple files at once

Often one needs to copy or move several files at once. This can be done by providing a list of individual filenames, or specifying a naming pattern using wildcards.

Wildcards

Make sure you are in the Desktop/data-shell/molecules directory, then type ls -F to see the files it contains.

* is a wildcard. It matches zero or more characters, so *.pdb matches ethane.pdb, propane.pdb, and every file that ends with “.pdb”. On the other hand, p*.pdb only matches pentane.pdb and propane.pdb, because the “p” at the front only matches filenames that begin with the letter “p”.

$ ls p*.pdb
pentane.pdb  propane.pdb

? is also a wildcard, but it only matches a single character. This means that p?.pdb would match pi.pdb or p5.pdb (if we had these two files in the molecules directory), but not propane.pdb.

We can use any number of wildcards at a time: for example, p*.p?* matches anything that starts with a “p” and ends with “.”, then “p”, and at least one more character (the ? must match exactly one character, and the final * can match any number of characters). Thus, p*.p?* would match preferred.practice, and even p.pi (since the first * can match no characters at all), but not quality.practice (doesn’t start with “p”) or preferred.p (there isn’t at least one character after the “.p”).

$ ls p*.p?*
pentane.pdb  propane.pdb
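If you want to experiment with patterns without touching the molecules directory, a scratch directory with a few empty files works well. In this sketch the files p5.pdb and preferred.p are hypothetical additions, and echo is used in place of ls simply to print whatever names the shell substitutes for each pattern.

```shell
demo=$(mktemp -d)
cd "$demo"
touch ethane.pdb pentane.pdb propane.pdb p5.pdb preferred.p

# The shell expands each pattern before echo runs,
# so echo simply prints the matching names.
star=$(echo p*.pdb)
question=$(echo p?.pdb)
both=$(echo p*.p?*)

echo "p*.pdb  -> $star"
echo "p?.pdb  -> $question"
echo "p*.p?*  -> $both"
```

Here p?.pdb matches only p5.pdb, and preferred.p is left out of p*.p?* because nothing follows its “.p”.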

If a wildcard expression does not match any file, Bash will pass the expression to the command unchanged, as an ordinary argument. For example, typing ls *.pdf in the molecules directory (which has no files with names ending with .pdf) results in an error message, because ls looks for a file literally named *.pdf.

$ ls *.pdf
ls: cannot access '*.pdf': No such file or directory
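The same behaviour can be observed with echo, which prints its arguments exactly as it receives them. This sketch assumes Bash’s default settings (shell options exist that change how unmatched patterns are handled); the names demo, unmatched, and matched are hypothetical.

```shell
demo=$(mktemp -d)
cd "$demo"
touch ethane.pdb

# No file ends in .pdf, so the pattern reaches echo unchanged.
unmatched=$(echo *.pdf)
echo "the command received: $unmatched"

# A matching pattern, by contrast, is replaced by file names.
matched=$(echo *.pdb)
echo "the command received: $matched"
```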

Keypoints to Remember

touch <filename> creates a blank (empty) file.
cp <old> <new> copies a file.
mkdir <path> creates a new directory.
mv <old> <new> moves (or renames) a file or directory.
rm <path> removes (deletes) a file. rm -r <path> removes a directory and everything in it.
* matches zero or more characters in a filename, so *.txt matches all files ending in .txt.
? matches any single character in a filename (but can’t match nothing), so ?.txt matches a.txt but not any.txt or .txt.
Use of the Control key may be described in many ways, including Ctrl-X, Control-X, and ^X.
The shell does not have a trash bin: once something is deleted, it’s really gone.
Depending on the type of work you do, you may need a more powerful text editor than nano.
