Overview¶

cctk: a Python-based computational chemistry toolkit.

cctk simplifies routine tasks in computational chemistry: preparing input files with scripts, checking whether jobs ran successfully, extracting energies and geometries, etc. All cctk operations are carried out using Python scripts. The prototypical workflow involves:

Reading in output files from a quantum chemistry program like Gaussian.
Analyzing the extracted data (e.g., determining which structure is lowest in energy).
Writing out new input files for further calculations.
Further analysis or visualization with pandas or matplotlib.
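
As a rough illustration of the analysis step, the sketch below picks the lowest-energy structure from a set of records and reports relative energies. It uses plain Python tuples as stand-ins for parsed output and makes no cctk calls; the filenames, energies, and function names are hypothetical.

```python
# Hypothetical records standing in for data extracted from output files.
# Energies are in hartree; None marks a job for which no energy is available.
records = [
    ("conformer_1.out", -197.0312),
    ("conformer_2.out", -197.0347),
    ("conformer_3.out", None),
    ("conformer_4.out", -197.0329),
]

HARTREE_TO_KCAL = 627.509  # approximate conversion factor to kcal/mol


def lowest_energy(records):
    """Return the (filename, energy) pair with the lowest energy, skipping missing values."""
    finished = [(name, energy) for name, energy in records if energy is not None]
    if not finished:
        raise ValueError("no energies available")
    return min(finished, key=lambda pair: pair[1])


def relative_energies(records):
    """Map filename -> energy relative to the minimum, in kcal/mol."""
    _, e_min = lowest_energy(records)
    return {
        name: (energy - e_min) * HARTREE_TO_KCAL
        for name, energy in records
        if energy is not None
    }


best_name, best_energy = lowest_energy(records)
rel = relative_energies(records)
```

With real data, the records would come from parsed output files rather than a hand-written list.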

cctk Objects¶

Use these three main classes to interact with external quantum chemistry programs:

1. `Molecule`¶

A single molecular geometry.

Field	Description
`molecule.atomic_numbers`	the atomic number for each atom
`molecule.geometry`	xyz coordinates
`molecule.bonds`	the connectivity as a networkx graph
`molecule.charge`	the overall charge
`molecule.multiplicity`	the spin multiplicity

All arrays that refer to atoms in cctk are 1-indexed (i.e., 1, 2, …, n). Thus, both the atomic_numbers and geometry fields are 1-indexed. In contrast, all arrays that refer to non-atoms are 0-indexed.

Various methods are available to measure or set geometric parameters (bond distances, bond angles, or dihedral angles).
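
To make these geometric quantities concrete, here is a minimal standalone sketch of how a bond distance and a bond angle follow from xyz coordinates. The function names `distance` and `angle` and the coordinates are hypothetical and are not cctk's own methods; the dictionary is keyed by 1-based atom number only to mirror the indexing convention described above.

```python
import math

# Hypothetical coordinates in angstroms, keyed by 1-based atom number.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (0.96, 0.0, 0.0),
    3: (0.0, 0.96, 0.0),  # placed at a right angle so the result is easy to check
}


def _difference(p, q):
    """Vector from q to p."""
    return [a - b for a, b in zip(p, q)]


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def distance(coords, i, j):
    """Distance between atoms i and j."""
    return _norm(_difference(coords[i], coords[j]))


def angle(coords, i, j, k):
    """Angle i-j-k in degrees, with atom j at the vertex."""
    v1 = _difference(coords[i], coords[j])
    v2 = _difference(coords[k], coords[j])
    cos_theta = sum(a * b for a, b in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


bond_length = distance(coords, 1, 2)
bond_angle = angle(coords, 2, 1, 3)
```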

2. `Ensemble`¶

A collection of molecules and associated properties.

Each Molecule in the Ensemble is associated with its properties (filenames, energies, NMR shieldings, etc.) using a dictionary. For example, a conformation of pentane might be mapped to this dict:
properties_dict = {
     'energy': -0.0552410743198,
     'scf_iterations': 2,
     'link1_idx': 0,
     'filename': 'test/static/pentane_conformation_1.out',
     ... }
To access Ensemble information, use the following syntax:

Syntax	Result
`ensemble.molecules`	iterator over all molecules
`ensemble.molecules[i]`	the i-th molecule (0-indexed)
`ensemble.molecules[1:3]`	the second and third molecules as a list
`ensemble.molecules[-1]`	the last molecule
`ensemble.items()`	iterator over (molecule, property dictionary) tuples
`ensemble.get_properties_dict(molecule)`	the property dictionary associated with `molecule`
`ensemble[:,"energy"]`	one-dimensional array of energies, with `None` as a placeholder for any missing data
`ensemble[:,["filename","energy"]]`	two-dimensional array of filenames and energies, with `None` as a placeholder for any missing data
`ensemble.molecule_list()`	list of molecules
`ensemble.properties_list()`	list of the property dictionaries
`ensemble[0]`	`Ensemble` containing the first molecule and its properties
`ensemble[0:2]`	`Ensemble` containing the first and second molecules and their properties

Thus, Ensembles can be indexed or sliced to return smaller Ensembles. Note that while all such sub-Ensembles are new Ensemble objects, they are essentially views of the original Ensemble, rather than deep copies.
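
The toy class below sketches the two behaviors described here: a property lookup that fills in `None` for missing data, and slicing that returns a new container sharing the same underlying objects. `MiniEnsemble` is a hypothetical stand-in written for illustration; it is not cctk's implementation, and the placeholder "molecules" are plain dictionaries.

```python
class MiniEnsemble:
    """Toy container: parallel lists of molecules and their property dictionaries."""

    def __init__(self, molecules, properties):
        if len(molecules) != len(properties):
            raise ValueError("need one property dict per molecule")
        self.molecules = list(molecules)    # new list, same molecule objects
        self.properties = list(properties)  # new list, same dict objects

    def __len__(self):
        return len(self.molecules)

    def __getitem__(self, key):
        # Two-part key such as ens[:, "energy"]: look up a property.
        if isinstance(key, tuple):
            rows, name = key
            if isinstance(rows, slice):
                return [props.get(name) for props in self.properties[rows]]
            return self.properties[rows].get(name)
        # Slice: a smaller MiniEnsemble that shares objects with this one.
        if isinstance(key, slice):
            return MiniEnsemble(self.molecules[key], self.properties[key])
        # Integer: normalize negative indices, then wrap the single entry.
        i = range(len(self))[key]
        return MiniEnsemble([self.molecules[i]], [self.properties[i]])


mol_a, mol_b, mol_c = {"name": "a"}, {"name": "b"}, {"name": "c"}
ens = MiniEnsemble([mol_a, mol_b, mol_c], [{"energy": -1.0}, {}, {"energy": -3.0}])

energies = ens[:, "energy"]  # the second molecule has no energy, so it maps to None
sub = ens[0:2]               # new object, but not a deep copy

# Because the property dictionaries are shared, a change made through the
# sub-ensemble is visible in the original.
sub.properties[0]["energy"] = -9.9
```

The practical consequence of view-like behavior is that a sub-ensemble should be copied explicitly before it is modified if the original must stay unchanged.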

A ConformationalEnsemble is a special case of an Ensemble in which each structure corresponds to the same molecule. This allows for RMSD calculation, structural alignment, and redundant conformer elimination to be carried out as desired (see tutorials).
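
For reference, the RMSD between two conformers with N atoms listed in the same order is the square root of the mean squared distance between corresponding atoms. The sketch below computes that quantity directly, without any prior alignment; the function name `rmsd` and the coordinates are hypothetical, and this is not cctk's routine.

```python
import math


def rmsd(geometry_a, geometry_b):
    """Root-mean-square deviation between two geometries with matching atom order.

    No alignment is performed, so the result depends on how the two
    structures are positioned relative to each other.
    """
    if len(geometry_a) != len(geometry_b) or not geometry_a:
        raise ValueError("geometries must be non-empty and have the same number of atoms")
    total = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(geometry_a, geometry_b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(total / len(geometry_a))


# Hypothetical two-atom geometries (angstroms): the second atom moves by 2 along z.
conformer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformer_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]

value = rmsd(conformer_a, conformer_b)
```

In practice the structures are usually superimposed first, so that the RMSD reflects conformational differences rather than overall translation or rotation.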

3. `GaussianFile`¶

The results of a Gaussian job or the contents of an input file:
gaussian_file = cctk.GaussianFile.read_file(filename)
filename may be a Gaussian output file (.out/.log) or a Gaussian input file (.gjf/.com).

Important: cctk assumes that all Gaussian jobs will be run in verbose mode (#p in the route card). Parsing will not work correctly without #p.
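
Since missing verbose output is easy to overlook, a quick pre-submission check can help. The helper below is a hypothetical sketch, not part of cctk: it scans the text of an input file for the first line that begins with `#` and tests whether its first token is `#p` (case-insensitive). The sample inputs are made up and truncated to the header.

```python
def route_is_verbose(input_text):
    """Return True if the first route line begins with the token '#p' (any case)."""
    for line in input_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            first_token = stripped.split()[0]
            return first_token.lower() == "#p"
    return False  # no route line found at all


# Hypothetical input-file headers.
verbose_input = "%mem=4GB\n%nprocshared=4\n#p opt freq b3lyp/6-31g(d)\n\ntitle\n"
terse_input = "%mem=4GB\n# opt freq b3lyp/6-31g(d)\n\ntitle\n"

verbose_ok = route_is_verbose(verbose_input)
terse_ok = route_is_verbose(terse_input)
```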

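As usual, molecules and their properties are stored in gaussian_file.ensemble. In the snippet below, first_link and second_link are the two GaussianFile objects from the two-step Link1 example later in this section: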
ensemble = first_link.ensemble
energies = list(ensemble[:,"energy"])
# [-40.5169484082, -40.5183831835, -40.5183831835]

ensemble = second_link.ensemble
shieldings = ensemble[-1,"isotropic_shielding"]
# [192.9242, 31.8851, 31.8851, 31.8851, 31.8851]
Per cctk convention (see Indexing below), energies is 0-indexed, but shieldings is 1-indexed. (The -1 refers to the last geometry.)

(Note: if a Gaussian input file is read, no properties will be available, so the properties_dict for each molecule will be empty.)

Some Gaussian output files are composites of multiple jobs using the Link1 directive. In that case, GaussianFile.read_file(filename) will return a list of GaussianFile objects, one per Link1 section.

For example, this is a two-step job:
gaussian_file = cctk.GaussianFile.read_file("test/static/methane2.out")
assert len(gaussian_file) == 2  # one GaussianFile per Link1 section
first_link = gaussian_file[0]
second_link = gaussian_file[1]
cctk will also interpret common job types via the cctk.JobType enum:
# first_link.job_types = [JobType.OPT, JobType.FREQ, JobType.SP]

Field	Description
`gaussian_file.ensemble`	`Ensemble` containing intermediate geometries and molecular properties
`gaussian_file.job_types`	list of what kind of jobs were run
`gaussian_file.successful_terminations`	number of successful terminations
`gaussian_file.link0`	dictionary containing Link0 information (memory, processors, checkpoint filename, etc.)
`gaussian_file.route_card`	route card (must start with `#p`)
`gaussian_file.title`	title of Gaussian file
`gaussian_file.footer`	footer (optional)
`gaussian_file.elapsed_time`	how long this `Link1` took in seconds of wallclock time (`g16` and beyond only)

Limited support for other file formats is available (see Features section of documentation).

Indexing¶

In cctk, arrays whose contents refer to atoms are always 1-indexed; other arrays are 0-indexed.

Thus, arrays of atomic numbers, positions, or NMR shieldings are 1-indexed, while arrays of molecules, files, or molecular property values are 0-indexed.

1-indexed arrays are implemented via cctk.OneIndexedArray, a custom subclass of np.ndarray. For example:

molecule.geometry[1]

will return the coordinates of the first atom of the Molecule. However:

ensemble.molecules[0]

returns the first molecule of the Ensemble.
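
The following toy class illustrates the idea behind 1-based indexing using a plain list. It is not cctk's OneIndexedArray (which, as noted above, subclasses np.ndarray); the name `OneIndexedList`, the choice to reject index 0, and the handling of negative indices are assumptions made for this sketch.

```python
class OneIndexedList:
    """Minimal 1-indexed sequence: valid positive indices run from 1 to n."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch only supports integer indices")
        if index == 0:
            raise IndexError("index 0 is invalid for a 1-indexed sequence")
        if index > 0:
            index -= 1  # shift 1..n onto the underlying 0..n-1
        # Negative indices are passed through and count from the end.
        return self._values[index]


# Atomic numbers for methane: carbon first, then four hydrogens.
atomic_numbers = OneIndexedList([6, 1, 1, 1, 1])

first_atom = atomic_numbers[1]  # carbon
last_atom = atomic_numbers[-1]  # the final hydrogen
```

Rejecting index 0 outright, rather than silently wrapping around, makes off-by-one mistakes fail loudly when 0-based habits slip into atom indexing.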
