cctk: a Python-based computational chemistry toolkit.

cctk simplifies routine tasks in computational chemistry: preparing input files with scripts, checking whether jobs ran successfully, extracting energies and geometries, etc. All cctk operations are carried out using Python scripts. The prototypical workflow involves:

  1. Reading in output files from a quantum chemistry program like Gaussian.

  2. Analyzing the extracted data (e.g., determining which structure is lowest in energy).

  3. Writing out new input files for further calculations.

  4. Further analysis or visualization with pandas or matplotlib.

cctk Objects

Use these three main classes to interact with external quantum chemistry programs:

1. Molecule

A single molecular geometry.




the atomic number for each atom


xyz coordinates


the connectivity as a networkx graph


the overall charge


the spin multiplicity


All arrays that refer to atoms in cctk are 1-indexed (i.e., 1, 2, …, n). Thus, both the atomic_numbers and geometry fields are 1-indexed. In contrast, all arrays that refer to non-atoms are 0-indexed.

Various methods are available to measure or set geometric parameters (bond distances, bond angles, or dihedral angles).

2. Ensemble

A collection of molecules and associated properties.

Each Molecule in the Ensemble is associated with its properties (filenames, energies, NMR shieldings, etc.) using a dictionary. For example, a conformation of pentane might be mapped to this dict:

properties_dict = {
     'energy': -0.0552410743198,
     'scf_iterations': 2,
     'link1_idx': 0,
     'filename': 'test/static/pentane_conformation_1.out',
     ... }

To access Ensemble information, use the following syntax:




iterator over all molecules


the i-th molecule (0-indexed)


the second and third molecules as a list


the last molecule


iterator over (molecule, property dictionary) tuples


the property dictionary associated with molecule


one-dimensional array of energies, with None as a placeholder for any missing data


two-dimensional array of filenames and energies, with None as a placeholder for any missing data


list of molecules


list of the property dictionaries


Ensemble containing the first molecule and its properties


Ensemble containing the first and second molecules and their properties


Thus, Ensembles can be indexed or sliced to return smaller Ensembles. Note that while all such sub-Ensembles are new Ensemble objectes, they are essentially views of the original Ensemble, rather than deep copies.

A ConformationalEnsemble is a special case of an Ensemble in which each structure corresponds to the same molecule. This allows for RMSD calculation, structural alignment, and redundant conformer elimination to be carried out as desired (see tutorials).

3. GaussianFile

The results of a Gaussian job or the contents of an input file:

gaussian_file = cctk.GaussianFile.read_file(filename)

filename may be a Gaussian output file (.out/.log) or a Gaussian input file (.gjf/.com).

Important: cctk assumes that all Gaussian jobs will be run in verbose mode (#p in the route card). Parsing will not work correctly without #p .

As usual, molecules and their properties are stored in gaussian_file.ensemble:

ensemble = first_link.ensemble
energies = list(ensemble[:,"energy"])
# [-40.5169484082, -40.5183831835, -40.5183831835])

ensemble = second_link.ensemble
shieldings = ensemble[-1,"isotropic_shielding"]
# [192.9242, 31.8851, 31.8851, 31.8851, 31.8851]

Per cctk convention (vide infra), energies is 0-indexed, but shieldings is 1-indexed. (The -1 refers to the last geometry.)

(Note: if a Gaussian input file is read, no properties will be available, so the properties_dict for each molecule will be empty.)

Some Gaussian output files are composites of multiple jobs using the Link1 directive. In that case, GaussianFile.read_file(filename) will return one GaussianFile object per Link1 section.

For example, this is a two-step job:

gaussian_file = cctk.GaussianFile.read_file("test/static/methane2.out")
assert len(gaussian_file), 2
first_link = gaussian_file[0]
second_link = gaussian_file[1]

cctk will also interpret common job types via the cctk.JobType enum:

# first_link.job_types = [JobType.OPT, JobType.FREQ, JobType.SP]




Ensemble containing intermediate geometries and molecular properties


list of what kind of jobs were run


number of successful terminations


dictionary containing Link0 information (memory, processors, checkpoint filename, etc.)


route card (must start with #p)


title of Gaussian file


footer (optional)


how long this Link1 took in seconds of wallclock time (g16 and beyond only)


Limited support for other file formats is available (see Features section of documentation).


In cctk, arrays whose contents refer to atoms are always 1-indexed; other arrays are 0-indexed.

Thus, arrays of atomic numbers, positions, or NMR shieldings are 1-indexed, while arrays of molecules, files, or molecular property values are 0-indexed.

1-indexed arrays are implemented via cctk.OneIndexedArray, a custom subclass of np.ndarray. For example:


will return the coordinates of the first atom of the Molecule. However:


returns the first molecule of the Ensemble.