Tutorial 01: Changing File Types
Objectives
This tutorial will teach:
Creating command-line cctk scripts.
Manipulation of File objects.
Reading/writing data.
Overview
Many computational chemistry papers report structures in the .xyz format, which is not recognized by Gaussian.
Although manual conversion is facile for one file, an automated solution can prove useful for bulk data processing.
A sample .xyz file can be found at the end of this tutorial.
This tutorial will showcase cctk’s ability to automatically interconvert between file types, as well as provide a template for other command-line scripts.
Creating a Command-Line Script
Open a terminal window in a directory containing a file named tutorial1.xyz. A sample .xyz file can be found here: tutorial1.xyz.
In that terminal window, create a new file called read_from_xyz_01.py and open it in your favorite text editor (e.g., vim, emacs, or nano). If you prefer a notebook to command-line scripts, the corresponding notebook is available here: read_from_xyz_01.ipynb.
$ vim read_from_xyz_01.py
This will open a blank file. First, we import the cctk classes we need, along with Python's re module for string manipulation:
import re
from cctk import XYZFile, GaussianFile, OrcaFile
Now that we’ve loaded cctk, we can read in data from an input file:
filename = "./tutorial1.xyz"
file = XYZFile.read_file(filename)
The above code creates a cctk XYZFile object from the file we specified, which now exists as a Python data structure.
To output a different file type, we need to extract the Molecule object represented by the file and write it as a .gjf file:
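cctk handles the parsing for us, but it can help to see what an .xyz file actually contains: the atom count, then a free-form comment line, then one `symbol x y z` line per atom. The sketch below is a minimal stdlib-only reader written for illustration. It is not cctk's implementation; `parse_xyz` is a hypothetical name, it assumes a single structure per file, and the water geometry is an illustrative input only.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def parse_xyz(text: str) -> Tuple[str, List[str], List[Coord]]:
    """Parse a single-structure .xyz file (hypothetical helper, not part of cctk)."""
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].split()[0])            # line 1: number of atoms
    comment = lines[1] if len(lines) > 1 else ""  # line 2: free-form comment
    symbols: List[str] = []
    coords: List[Coord] = []
    for line in lines[2:2 + n_atoms]:             # then one line per atom
        fields = line.split()
        symbols.append(fields[0])
        coords.append((float(fields[1]), float(fields[2]), float(fields[3])))
    if len(symbols) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(symbols)}")
    return comment, symbols, coords


sample = """3
water
O   0.000000   0.000000   0.117000
H   0.000000   0.757000  -0.469000
H   0.000000  -0.757000  -0.469000
"""
comment, symbols, coords = parse_xyz(sample)
print(comment, symbols)  # water ['O', 'H', 'H']
```

A cctk XYZFile holds this same information, but as a Molecule object rather than plain lists.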
molecule = file.get_molecule()
# keep only the file name (dropping any leading directories) and swap the extension
newfile = filename.rsplit('/',1)[-1]
newfile = re.sub(r"xyz$", "gjf", newfile)
GaussianFile.write_molecule_to_file(
    newfile,
    molecule,
    "#p opt freq=noraman b3lyp/6-31g(d) empiricaldispersion=gd3bj",
    None,
)
# to write an ORCA input simultaneously, we could use the block below
newfile = re.sub(r"gjf$", "inp", newfile)
OrcaFile.write_molecule_to_file(
    newfile,
    molecule,
    "! opt freq b3lyp/6-31g(d) d3bj",
)
The command write_molecule_to_file is a class method, meaning we can create a .gjf file without needing to create another Python object.
All we need to supply is the path to the new file, the Molecule object (in this case, molecule, obtained from file.get_molecule()), and the header and footer for the new file.
(In this case, we have also used Python string manipulation to ensure that the output file ends in .gjf and is placed in the directory from which we run the script.)
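To make the string manipulation concrete, the snippet below runs the same two steps on the tutorial's filename. It then shows an equivalent using the standard-library pathlib module. The pathlib variant is our own suggestion and is not used anywhere in cctk.

```python
import re
from pathlib import Path

filename = "./tutorial1.xyz"

# Approach used in the tutorial script: drop any leading directories,
# then replace a trailing "xyz" with "gjf".
newfile = filename.rsplit("/", 1)[-1]        # "tutorial1.xyz"
newfile = re.sub(r"xyz$", "gjf", newfile)    # "tutorial1.gjf"

# A pathlib alternative: swap the suffix, then keep only the final component.
# Unlike the regex, this replaces whatever extension the file has (e.g. ".XYZ").
alt = Path(filename).with_suffix(".gjf").name  # "tutorial1.gjf"

print(newfile, alt)  # tutorial1.gjf tutorial1.gjf
```

The regex is case-sensitive: a file named mol.XYZ would pass through unchanged. The pathlib version would rename it to mol.gjf.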
Running read_from_xyz_01.py on tutorial1.xyz generates the desired input files tutorial1.gjf and tutorial1.inp:
$ python read_from_xyz_01.py
The start of tutorial1.gjf is shown below:
%nprocshared=16
%mem=32GB
#p opt freq=noraman b3lyp/6-31g(d) empiricaldispersion=gd3bj
title
0 1
6 0.25892000 0.68427000 0.00004500
...
And the start of tutorial1.inp:
! opt freq b3lyp/6-31g(d) d3bj
%maxcore 2000
%pal
nproc 16
end
* xyz 0 1
6 0.25892001 0.68427002 0.00004500
...
The script works!
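The .gjf excerpt above follows Gaussian's usual input layout, in this order:
Link 0 (%) lines.
The route section (our header).
A blank line.
A title line.
Another blank line.
Charge and multiplicity.
Cartesian coordinates, followed by a terminating blank line.

The stdlib-only sketch below assembles that layout by hand to show what write_molecule_to_file produces. It is not cctk's code. `format_gjf` is a hypothetical name, the eight-decimal coordinate format was chosen only to match the excerpt, and the example uses just the first atom shown above.

```python
from typing import Dict, List, Optional, Tuple


def format_gjf(route: str,
               atomic_numbers: List[int],
               coords: List[Tuple[float, float, float]],
               charge: int = 0,
               multiplicity: int = 1,
               link0: Optional[Dict[str, str]] = None,
               title: str = "title") -> str:
    """Assemble Gaussian input text by hand (hypothetical helper, not cctk's code)."""
    lines = [f"%{key}={value}" for key, value in (link0 or {}).items()]  # Link 0 lines
    lines.append(route)                     # route section ("header")
    lines.append("")                        # blank line ends the route section
    lines.append(title)                     # title section
    lines.append("")                        # blank line ends the title section
    lines.append(f"{charge} {multiplicity}")
    for z, (x, y, w) in zip(atomic_numbers, coords):
        lines.append(f"{z} {x:.8f} {y:.8f} {w:.8f}")
    lines.append("")                        # blank line terminates the geometry
    return "\n".join(lines) + "\n"


text = format_gjf(
    "#p opt freq=noraman b3lyp/6-31g(d) empiricaldispersion=gd3bj",
    [6],
    [(0.25892, 0.68427, 0.000045)],
    link0={"nprocshared": "16", "mem": "32GB"},
)
print(text)
```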
Adding Command-Line Arguments
To create a more user-friendly script, we might want to specify the input file and desired header on the command line rather than manually editing the script each time.
This can be done using Python's argparse module:
import sys, argparse, re
from cctk import GaussianFile, XYZFile
# add_help=False: argparse reserves -h for --help by default, which would clash with our -h flag
parser = argparse.ArgumentParser(prog="read_from_xyz.py", add_help=False)
parser.add_argument("--header", "-h", type=str)
parser.add_argument("filename")
args = vars(parser.parse_args(sys.argv[1:]))
assert args["filename"], "Can't read file without a filename!"
assert args["header"], "Can't write file without a header!"
The script now expects two arguments: the header, which must be preceded by the -h (or --header) flag, and the filename.
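argparse registers -h/--help on every parser by default. Adding a second -h option therefore raises argparse.ArgumentError ("conflicting option string") as soon as add_argument is called. The sketch below shows that clash and one way around it: pass add_help=False so -h can carry the header. Choosing a different short flag would work just as well. It also shows that handing parse_args a list lets you test the parser without touching sys.argv. The `parse` wrapper and the `required=True` setting are our own additions for illustration.

```python
import argparse
from typing import List, Optional

# Without add_help=False, argparse already owns -h (for --help), so a second -h clashes.
conflict_raised = False
try:
    clash = argparse.ArgumentParser()
    clash.add_argument("--header", "-h", type=str)
except argparse.ArgumentError as err:
    conflict_raised = True
    print("argparse error:", err)


def parse(argv: Optional[List[str]] = None) -> dict:
    """Parse header and filename; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(prog="read_from_xyz.py", add_help=False)
    parser.add_argument("--header", "-h", type=str, required=True)
    parser.add_argument("filename")
    return vars(parser.parse_args(argv))


args = parse(["-h", "#p opt freq=noraman b3lyp/6-31g(d)", "tutorial1.xyz"])
print(args)
```

Options and positionals may appear in any order, so `tutorial1.xyz --header "#p opt"` parses to the same dictionary.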
After adding comments and integrating the above variables throughout, the final script looks like this:
import sys, argparse, re
from cctk import GaussianFile, XYZFile
#### Usage: python read_from_xyz.py -h "#p opt freq=noraman b3lyp/6-31g(d)" path/to/file.xyz
# add_help=False frees -h for the header (argparse otherwise reserves it for --help)
parser = argparse.ArgumentParser(prog="read_from_xyz.py", add_help=False)
parser.add_argument("--header", "-h", type=str)
parser.add_argument("filename")
args = vars(parser.parse_args(sys.argv[1:]))
assert args["filename"], "Can't read file without a filename!"
assert args["header"], "Can't write file without a header!"
# read the .xyz file into a cctk XYZFile object
file = XYZFile.read_file(args["filename"])
# write to the current directory, swapping the .xyz extension for .gjf
newfile = args["filename"].rsplit('/',1)[-1]
newfile = re.sub(r"xyz$", "gjf", newfile)
GaussianFile.write_molecule_to_file(
    newfile,
    file.molecule,
    args["header"],
    None,
)
To run this on our test file, simply type:
$ python read_from_xyz.py -h "#p opt b3lyp/6-31g(d)" tutorial1.xyz
This script can now be copied to other directories and used as a command-line tool. The template provided here can also be modified for myriad cctk-based applications, as future tutorials will demonstrate.
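The overview mentioned bulk data processing. One way to extend the template for that is to accept several files at once with nargs="+". Below is a sketch under our own assumptions. The actual reading and writing is passed in as a `convert` function so the sketch runs without cctk. A cctk-based version is shown in the comments, reusing only the calls from this tutorial. The names `main`, `output_name`, and `convert` are hypothetical.

```python
import argparse
from pathlib import Path
from typing import Callable, List, Optional

Converter = Callable[[str, str, str], None]


def output_name(path: str, suffix: str = ".gjf") -> str:
    """Output file name in the current directory, e.g. 'confs/a.xyz' -> 'a.gjf'."""
    return Path(path).with_suffix(suffix).name


def main(argv: Optional[List[str]] = None,
         convert: Optional[Converter] = None) -> List[str]:
    parser = argparse.ArgumentParser(prog="read_from_xyz.py", add_help=False)
    parser.add_argument("--header", "-h", type=str, required=True)
    parser.add_argument("filenames", nargs="+")   # one or more .xyz files
    args = parser.parse_args(argv)
    written = []
    for src in args.filenames:
        dst = output_name(src)
        if convert is not None:
            convert(src, dst, args.header)
        written.append(dst)
    return written


# A cctk-based converter could reuse the calls from this tutorial, e.g.:
# def convert(src, dst, header):
#     from cctk import XYZFile, GaussianFile
#     molecule = XYZFile.read_file(src).get_molecule()
#     GaussianFile.write_molecule_to_file(dst, molecule, header, None)

calls = []
names = main(["-h", "#p opt b3lyp/6-31g(d)", "tutorial1.xyz", "confs/b.xyz"],
             convert=lambda src, dst, header: calls.append((src, dst)))
print(names)  # ['tutorial1.gjf', 'b.gjf']
```

From the shell, a glob such as `*.xyz` is expanded into the list of filenames before Python sees it. Inputs that share a base name in different directories would write to the same output file, so a real script may want to guard against that.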