Because Python runs on a variety of operating systems, its standard library includes the os module, a collection of helper utilities for file system operations, so that Python programmers can work with files (e.g. creating, moving, or deleting them) using the same Python code, no matter what system they're on. Among the os module's most useful functions is makedirs(), which we can use to programmatically create new directories.
Here's the problem we're trying to solve – if you're doing this as homework, see the full info for this exercise:
For many of the assignments, you will be stashing downloaded files and data into a local directory named tempdata. Write a Python program to create that directory. The program should be “smart” enough not to crash/error-out if the tempdata directory already exists.
When you run a.py from the command-line:
0004-shakefiles $ python a.py
tempdata
(directory)
The program must not crash if the tempdata directory already exists.
The action of creating a directory is different across different operating systems, e.g. Linux vs. Windows vs. Mac OS X. So the os module in the Python standard library provides a makedirs() function that works the same way on all of them.
This requires importing the os module:
>>> import os
>>> os.makedirs("mynewdirectory")
So where does the directory get created? A relative path like this one is resolved against the current working directory, which is typically wherever you launched your script (or ipython) from.
If you run your script from, say, ~/Desktop, then the following command:
>>> os.makedirs("exampledirectory")
– will create a new directory at ~/Desktop/exampledirectory:
Desktop ## <== YOU ARE HERE
├── exampledirectory
If you need to know where you are, the os module has the getcwd() function (cwd stands for "current working directory"):
>>> os.getcwd()
'/Users/dtown/Desktop'
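To see the relationship between the working directory and a relative path for yourself, here is a small sketch. It is not part of the original exercise: it uses tempfile.TemporaryDirectory() and os.chdir() (neither is covered above) so that the experiment happens in a throwaway folder instead of on your Desktop.

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we started

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)  # pretend the script was launched from this folder
    try:
        # A relative path: it is resolved against the current working directory
        os.makedirs("exampledirectory")
        print("Working directory:", os.getcwd())
        contents = os.listdir(".")
        print("Contents:", contents)  # Contents: ['exampledirectory']
    finally:
        # Step back out so the throwaway folder can be cleaned up
        os.chdir(start_dir)
```

The new directory shows up inside whatever folder was current at the moment makedirs() ran, not next to the script file itself.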
Idempotence is the property of an operation that can be applied multiple times with the same effect as applying it once.
Here's an example of a non-idempotent operation:
>>> os.makedirs("oopsie")
>>> os.makedirs("oopsie")
FileExistsError: [Errno 17] File exists: 'oopsie'
The makedirs() function raises an error by default if you try to create a directory that already exists (or if you try to create a directory with the same path as an existing file, which is most definitely something you want to avoid).
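If you'd like to watch that error happen without your program dying, you can catch it. This is just a sketch, not the approach this lesson ends up recommending; the scratch folder comes from tempfile, and os.path.join() (explained further down) is used only to build a path inside it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "oopsie")

    os.makedirs(target)  # first call: creates the directory

    try:
        os.makedirs(target)  # second call: the directory is already there
        second_call_failed = False
    except FileExistsError as err:
        second_call_failed = True
        print("Caught:", type(err).__name__)  # Caught: FileExistsError
```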
One way to get around this is to check for the existence of a directory/file before trying to create it, as a conditional branch:
>>> if not os.path.exists('oopsie'):
...     os.makedirs('oopsie')
An even easier way is to use the named argument, exist_ok, of the makedirs() function. This argument is optional because it has a default value; by default, exist_ok is set to False, i.e. it will not be OK if you try to use makedirs to create an already existing directory.
However, setting exist_ok to True when calling makedirs() will prevent an error from being raised:
>>> os.makedirs("oopsie", exist_ok = True)
>>> os.makedirs("oopsie", exist_ok = True)
# no error message is thrown...
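Putting that together, here is one possible sketch of a "smart" directory maker. The helper name make_data_dir and its parent parameter are made up for illustration (the exercise itself only asks for a tempdata directory), and the scratch folder again comes from tempfile. The second half shows a limit worth knowing: exist_ok=True only forgives an existing directory, not a regular file sitting at the same path.

```python
import os
import tempfile

def make_data_dir(parent, name="tempdata"):
    """Hypothetical helper: create parent/name if needed and return its path."""
    path = os.path.join(parent, name)
    os.makedirs(path, exist_ok=True)  # safe to call over and over
    return path

with tempfile.TemporaryDirectory() as scratch:
    first = make_data_dir(scratch)
    second = make_data_dir(scratch)  # no error the second time
    existed = os.path.isdir(first)

    # exist_ok does not help if a regular file already occupies the path
    blocker = os.path.join(scratch, "notes")
    open(blocker, "w").close()  # create an empty file named "notes"
    try:
        os.makedirs(blocker, exist_ok=True)
        file_clash_raised = False
    except FileExistsError:
        file_clash_raised = True

print(existed, file_clash_raised)  # True True
```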
So that's how you easily create directories and subdirectories in Python with makedirs().
The os module has a lot of useful utilities that we'll use to organize the files that we read from and write to. In a later lesson, we'll be using the os.path submodule, but I'll take some time here to acquaint you with the os.path.join() function and the problem it solves.
We're going to be dealing a lot with files stored on our computers and, by necessity, we will be dealing with their paths, i.e. the text strings that describe the nest of subdirectories you pass through to get to a specific file. But this is complicated when dealing with different operating systems.
On OS X and all other Unix-based systems, such as Linux, file paths are represented as text strings in this format, with forward slashes delimiting the subdirectories and the actual filename (in this case, file.txt):
my/path/to/file.txt
In Windows, the backslash is used:
\my\path\to\file.txt
If you've been paying attention to what the backslash character means for Python strings, you might remember that it acts as an escape character, i.e. the backslash modifies the meaning of the character that follows it. This means that to print a literal backslash in a Python string, you have to use double backslashes:
>>> print("\\my\\path\\to\\file.txt")
\my\path\to\file.txt
As you can imagine, that makes it harder to write code that works on Windows and everywhere else.
We get access to the os.path module if we have import os in our code.
The os.path.join() function takes as many arguments as needed to generate a specified file path, with each argument representing one component (i.e. a subdirectory or the filename) of the path. So instead of doing this:
mypath = "my/path/to/file.txt"
We do this:
mypath = os.path.join('my', 'path', 'to', 'file.txt')
And whether you're running code on Windows or Unix-based systems, the resulting path will use the separator that is correct for that system.
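You can preview what each system would produce without switching computers. The sketch below leans on two standard library modules that this lesson doesn't otherwise cover: posixpath (the Unix flavor of os.path) and ntpath (the Windows flavor). On any given machine, os.path is simply one of those two.

```python
import os
import ntpath     # the Windows implementation of os.path
import posixpath  # the Unix/OS X implementation of os.path

parts = ("my", "path", "to", "file.txt")

unix_style = posixpath.join(*parts)
windows_style = ntpath.join(*parts)
mypath = os.path.join(*parts)  # whichever style matches the current system

print(unix_style)     # my/path/to/file.txt
print(windows_style)  # my\path\to\file.txt
print(mypath in (unix_style, windows_style))  # True
```

In everyday code you would only ever call os.path.join(); the other two are here just to make the difference visible.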
makedirs()
We don't have to make nested subdirectories yet (i.e. subdirectories within subdirectories, e.g. somedir/subdir/subsubdir), but I'm just using it as an example of a real-world scenario that we will later encounter.
The makedirs() function recursively creates directories. That is, if you want to create a nested directory:
>>> os.makedirs("my/fun/new/directory")
– it will handle the creation of the necessary parent directories:
First:
my
Then:
my
└── fun
Then:
my
└── fun
    └── new
– before creating the actual directory that you specified, i.e. my/fun/new/directory:
my
└── fun
    └── new
        └── directory
This is pretty convenient. And a nice use case for os.path.join():
import os
mypath = os.path.join("my", "fun", "new", "directory")
os.makedirs(mypath)
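To check that every level really gets created, here is a sketch that runs the same idea inside a throwaway folder (via tempfile, which is my addition). For contrast it first tries os.mkdir(), a sibling function not covered in this lesson that creates only a single directory and fails when the parents are missing.

```python
import os
import tempfile

parts = ("my", "fun", "new", "directory")

with tempfile.TemporaryDirectory() as scratch:
    mypath = os.path.join(scratch, *parts)

    # os.mkdir() makes one directory only; the parents don't exist yet
    try:
        os.mkdir(mypath)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # os.makedirs() creates my, my/fun, my/fun/new and then the target
    os.makedirs(mypath)

    # Walk down one level at a time and confirm each directory exists
    created = []
    current = scratch
    for part in parts:
        current = os.path.join(current, part)
        created.append(os.path.isdir(current))

print(mkdir_failed)  # True
print(created)       # [True, True, True, True]
```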
We'll get some actual usage of os.path.join in the next lesson.
If you're interested in why things are different between operating systems…well, like everything in life, it comes down to different people and companies creating different things at different times in different contexts.
So why is Windows the odd operating system out? It’s all down to a few accidents of history that happened decades ago.
Unix introduced the forward slash character — that’s the / character — as its directory separator around 1970. We don’t really know why they chose this one, but that’s the one they picked.
It’s hard to imagine today, but the original version of Microsoft DOS — that’s MS-DOS 1.0 — didn’t support directories at all when it was released in 1981…MS-DOS 2.0 introduced support for directories, but IBM wanted to keep compatibility with the original DOS utilities and other programs that expected the / character to be used for switches. Microsoft had already used the / character for something, so they couldn’t just re-use it.
They ultimately chose the \ character instead, as it was the most similar-looking character visually.