cctk.Molecule

class cctk.Molecule(atomic_numbers, geometry, name=None, bonds=None, charge=0, multiplicity=1, checks=True)[source]

Bases: object

Class representing a single molecular geometry.

In contrast to typical Python behavior, atomic_numbers and geometry are indexed from one, to simplify interfacing with computational chemistry programs. This has been done by defining a custom wrapper for numpy.ndarray called cctk.OneIndexedArray.

All other datatypes are indexed from 0.

name

for identification, optional

Type:

str

atomic_numbers

list of atomic numbers

Type:

cctk.OneIndexedArray, dtype=np.int8

geometry

list of 3-tuples of xyz coordinates - same ordering as atomic_numbers

Type:

cctk.OneIndexedArray

bonds

connectivity graph or list of 2-tuples, with each element representing the 1-indexed atom number of a bonded pair

Type:

nx.Graph or list of tuples

charge

the charge of the molecule

Type:

int

multiplicity

the spin state of the molecule (1 corresponds to singlet, 2 to doublet, 3 to triplet, etc. – so a multiplicity of 1 is equivalent to S=0)

Type:

int

vibrational_modes

vibrational modes

Type:

list of cctk.VibrationalMode

__init__(atomic_numbers, geometry, name=None, bonds=None, charge=0, multiplicity=1, checks=True)[source]

Create new Molecule object, and assign connectivity if needed.

bonds must be a list of edges (i.e. an n x 2 numpy array).

If checks is True, the atomic numbers in bonds will all be checked for consistency. This option can be disabled by setting checks to False, but this is not recommended for external data.

Methods

__init__(atomic_numbers, geometry[, name, ...])

Create new Molecule object, and assign connectivity if needed.

add_atom(symbol, coordinates)

Add an atom with symbol symbol at position coordinates.

add_atom_at_centroid(symbol, atom_numbers[, ...])

Adds atom with symbol symbol at the centroid of the atoms in atom_numbers.

add_bond(atom1, atom2[, bond_order, check])

Adds a new bond to the bond graph, or updates the existing bond order.

are_connected(atom1, atom2)

Wrapper to tell if two atoms are connected.

assign_connectivity([cutoff, ...])

Automatically recalculates bonds based on covalent radii.

atom_string(atom)

Returns the elemental symbol and the atom number for a given atom.

atomic_symbols()

Return list of atomic symbols.

atoms_moving_in_imaginary([max_num, ...])

Returns atoms moving in imaginary, ranked by how much they're moving.

calculate_mass_spectrum(**kwargs)

Generates list of m/z values.

center()

Moves the centroid to the origin.

center_of_mass()

Returns the center-of-mass of the molecule, as a np.array.

center_periodic(center, side_length)

Adjusts a molecule to be in the center of a cube, moving all other molecules accordingly.

check_for_conflicts([min_buffer, group1, group2])

Automatically checks for conflicts based on covalent radii.

combine_molecules(molecule1, molecule2)

Combine two molecules into one final molecule.

compute_energy([nprocs])

Compute energy of molecule at the GFN2-xtb level of theory.

coulomb_analysis(atoms1, atoms2, charges)

Computes the net Coulomb forces between atoms atoms1 and atoms atoms2.

csearch([nprocs, constraints, logfile, ...])

Optimize molecule at the GFN2-xtb level of theory.

epimerize(center_atom, substituent1, ...)

Epimerizes center_atom by exchanging the groups corresponding to substituent1 and substituent2.

equal(mol1, mol2)

Atomic numbers, geometry, charge, and multiplicity all must match.

formula([return_dict])

Returns the atomic formula.

fragment()

Returns list of cctk.Molecule objects based on the bond-connected components of self.

from_string(string[, check_version])

Loads a cctk.Molecule object from a string.

get_adjacent_atoms(atom)

Returns a list of the neighbors of atom.

get_angle([atom1, atom2, atom3, check, ...])

Wrapper to compute angle between three atoms.

get_atomic_number(atom)

Get the atomic number for a given atom.

get_atomic_symbol(atom)

Get the atomic symbol for a given atom.

get_atomic_symbols()

Get a list of atomic symbols for this Molecule.

get_atoms_by_symbol(symbol)

Returns all the numbers of atoms of type symbol in the molecule.

get_bond_order(atom1, atom2)

Wrapper to get bond order between two atoms.

get_components()

Returns a list of all the connected components in a molecule.

get_dihedral([atom1, atom2, atom3, atom4, ...])

Wrapper to compute dihedral angle between four atoms.

get_distance([atom1, atom2, check, _dist, atoms])

Wrapper to compute distance between two atoms.

get_heavy_atoms()

Returns a list of all the heavy atoms in the molecule (i.e., not hydrogen), for array indexing.

get_n_atoms()

Determine how many atoms are in this Molecule.

get_sq_distance(atom1, atom2[, check])

Wrapper to compute squared distance between two atoms -- optimized for speed!

get_symmetric_atoms()

Returns lists of symmetric atoms, as defined in cctk.load_group.

get_vector(atom[, atom2, check])

Get the geometry vector for a given atom.

is_atom_in_ring(atom)

limit_solvent_shell([solute, num_atoms, ...])

Automatically detects solvent molecules and removes them until you have a set number of solvents or atoms.

new_from_name(name)

Create a new Molecule instance using rdkit.

new_from_smiles(smiles)

Create a new Molecule instance using rdkit.

num_atoms()

num_neighbors_by_atom()

Returns a list of the number of neighbors of each atom.

optimize([inplace, nprocs, return_energy])

Optimize molecule at the GFN2-xtb level of theory.

optimize_dihedral(atom1, atom2, atom3, atom4)

Minimizes the value of self.rms_distance_between_atoms for the given dihedral, in one-degree increments.

perturb([size])

This function can be used to generate a slightly different molecule in cases where numerical (or geometric) converge is problematic.

principal_axes_of_rotation()

Compute principal axes of rotation and corresponding moments of inertia.

remove_atom(number)

Remove the atom with number number.

remove_bond(atom1, atom2)

Alias for self.add_bond(atom1, atom2, bond_order=0) -- more intuitive nomenclature.

renumber_to_match(model[, check_chirality])

Renumbers atoms to match model (must have isomorphic bond graph).

rms_distance_between_atoms()

Returns the RMS distance (in Angstroms) between every pair of atoms - a quick, easy-to-calculate proxy for minimizing steric clashes.

rotate_molecule(axis, degrees)

Rotates the whole molecule around the given axis.

set_angle([atom1, atom2, atom3, angle, ...])

Adjusts the atom1 -- atom2 -- atom3 bond angle to be a fixed value by moving atom3.

set_dihedral([atom1, atom2, atom3, atom4, ...])

Adjusts the atom1 -- atom2 -- atom3 -- atom4 dihedral angle to be a fixed value by moving atom 4.

set_distance([atom1, atom2, distance, move, ...])

Adjusts the atom1 -- atom2 bond length to be a fixed distance by moving atom2.

swap_atom_numbers(atom1, atom2)

Interchanges the numbers of atom1 and atom2.

to_string()

Save the current molecule as a string, for subsequent loading.

translate_molecule(vector)

Translates the whole molecule by the given vector.

volume([pts_per_angstrom, qhull])

Returns volume calculated using the Gavezotti algorithm (JACS, 1983, 105, 5220).

add_atom(symbol, coordinates)[source]

Add an atom with symbol symbol at position coordinates.

Parameters:
  • symbol (str) – symbol of the atom (e.g. “Cl”, “Ar”, “C”)

  • coordinates (list) – the coordinates to add

Returns:

the Molecule object

add_atom_at_centroid(symbol, atom_numbers, weighted=False)[source]

Adds atom with symbol symbol at the centroid of the atoms in atom_numbers.

If weighted is True, then the centroid calculation will take into account the atomic numbers of the atoms in question (placing the atom closer to more massive atoms).

Otherwise, the average is unweighted.

Parameters:
  • symbol (str) – the atomic symbol of the atom to be added

  • atom_numbers (list) – which atoms to put the new atom between

  • weighted (Bool) – if the centroid calculation should be weighted (see above)

Returns:

the Molecule object

add_bond(atom1, atom2, bond_order=1, check=True)[source]

Adds a new bond to the bond graph, or updates the existing bond order. Will not throw an error if the bond already exists.

Parameters:
  • atom1 (int) – the number of the first atom

  • atom2 (int) – the number of the second atom

  • bond_order (int) – bond order of bond between atom1 and atom2

are_connected(atom1, atom2)[source]

Wrapper to tell if two atoms are connected.

assign_connectivity(cutoff=0.2, periodic_boundary_conditions=None)[source]

Automatically recalculates bonds based on covalent radii. If two atoms are closer than the sum of their covalent radii + cutoff Angstroms, then they are considered bonded.

Parameters:

cutoff (float) – the threshold (in Angstroms) for how close two covalent radii must be to be considered bonded

Returns:

self

atom_string(atom)[source]

Returns the elemental symbol and the atom number for a given atom.

For example, methane.atom_string(1) might return “C1”.

Parameters:

atom (int) – number of the atom

Returns:

the aforementioned atom string

atomic_symbols()[source]

Return list of atomic symbols.

atoms_moving_in_imaginary(max_num=5, percent_cutoff=0.03, return_string=False)[source]

Returns atoms moving in imaginary, ranked by how much they’re moving.

Parameters:
  • max_num (int) – how many atoms max to return

  • percent_cutoff (float) – threshold for what percent of total TS movement qualifies as “movement”

  • return_string (bool) – whether or not to return a formatted string report

Returns:

list of atomic numbers or string

calculate_mass_spectrum(**kwargs)[source]

Generates list of m/z values.

Final weights rounded to one decimal point (because of low-res MS).

center()[source]

Moves the centroid to the origin.

center_of_mass()[source]

Returns the center-of-mass of the molecule, as a np.array.

center_periodic(center, side_length)[source]

Adjusts a molecule to be in the center of a cube, moving all other molecules accordingly. Bonded subgroups will be moved as a unit.

For analysis of MD files with periodic boundary conditions.

Parameters:
  • center (int) – atomic number to center

  • side_length (float) – length of side, in Å

check_for_conflicts(min_buffer=1, group1=None, group2=None)[source]

Automatically checks for conflicts based on covalent radii. If two atoms are closer than the sum of their covalent radii + buffer, then they are considered clashing. If group1 and group2 are selected, then conflicts will only be evaluated between these two groups of atoms.

Parameters:
  • min_buffer (float) – the threshold (in Angstroms) for how close two covalent radii must be to be considered clashing. 1.0 A is default, empirically.

  • group1 (list) – atoms to evaluate against group2 (if None, defaults to all atoms)

  • group2 (list) – atoms to evaluate against group1 (if None, defaults to all atoms)

Returns:

True if there are no conflicts ValueError if there is a conflict

classmethod combine_molecules(molecule1, molecule2)[source]

Combine two molecules into one final molecule.

Bonding information is not currently preserved.

Parameters:
Returns:

new Molecule object

compute_energy(nprocs=1)[source]

Compute energy of molecule at the GFN2-xtb level of theory.

Parameters:

nprocs (int) – number of processors to use

coulomb_analysis(atoms1, atoms2, charges)[source]

Computes the net Coulomb forces between atoms atoms1 and atoms atoms2.

csearch(nprocs=1, constraints=[], logfile=None, noncovalent=False, use_tempdir=True, gfn=2, additional_flags=None)[source]

Optimize molecule at the GFN2-xtb level of theory.

Parameters:
  • nprocs (int) – number of processors to use

  • constraints (list) – atoms numbers to freeze

  • noncovalent (bool) – whether or not to use non-covalent settings

  • logfile (str) – file to write ongoing crest output to

  • use_tempdir (bool) – write intermediate files to hidden directory (as opposed to current directory)

  • gfn (int or ff) – level of theory, either 1, 2, or ff

  • additional_flags (str) – additional flags for command line

Returns

ConformationalEnsemble

epimerize(center_atom, substituent1, substituent2)[source]

Epimerizes center_atom by exchanging the groups corresponding to substituent1 and substituent2. Both substituents must be bonded to the center atom!

Parameters:
  • center_atom (int) – number of middle atom

  • substituent1 (int) – number of 1st atom

  • substituent1 – number of 2nd atom

Returns

new Molecule object (does not modify in-place)

classmethod equal(mol1, mol2)[source]

Atomic numbers, geometry, charge, and multiplicity all must match. Name is irrelevant.

formula(return_dict=False)[source]

Returns the atomic formula.

If return_dict is True, then returns a dictionary with keys elemental symbols and values the number of occurrences.

For instance, water.formula() would return {'O': 1, 'H': 2}.

If return_dict is False, then returns a stringified version of the formula according to standard rules.

For instance, water.formula() would return H2O.

Parameters:

return_dict (Bool) – if the method should return a string or a dictionary

Returns:

a dictionary or string representing the molecule’s formula

fragment()[source]

Returns list of cctk.Molecule objects based on the bond-connected components of self.

classmethod from_string(string, check_version=True)[source]

Loads a cctk.Molecule object from a string.

Parameters:
  • string (str) – stringified version of the molecule

  • check_version (bool) – whether version consistency should be enforced

get_adjacent_atoms(atom)[source]

Returns a list of the neighbors of atom. If atom has no neighbors, an empty list will be returned.

get_angle(atom1=None, atom2=None, atom3=None, check=True, _angle=<function compute_angle_between>, atoms=None)[source]

Wrapper to compute angle between three atoms.

This function is relatively slow (rate-limiting for certain applications), so performance boosts have been implemented (e.g. preloading _angle).

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom

  • atom3 (int) – number of the third atom

  • check (Bool) – whether to validate input data (can be overridden to prevent slow double-checking)

  • _angle (function) – function usd to compute angle

  • atoms (list) – list of atom numbers

Returns:

the angle, in degrees

get_atomic_number(atom)[source]

Get the atomic number for a given atom.

Parameters:

atom (int) – number of the first atom

Returns:

the atomic number of that atom

Return type:

atomic_number (int)

get_atomic_symbol(atom)[source]

Get the atomic symbol for a given atom.

Parameters:

atom (int) – number of the first atom

Returns:

the atomic symbol of that atom

Return type:

atomic_symbol (str)

get_atomic_symbols()[source]

Get a list of atomic symbols for this Molecule.

Returns:

the atomic symbols

Return type:

atomic_symbols (cctk.OneIndexedArray)

get_atoms_by_symbol(symbol)[source]

Returns all the numbers of atoms of type symbol in the molecule.

get_bond_order(atom1, atom2)[source]

Wrapper to get bond order between two atoms.

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom

Returns:

the bond order

get_components()[source]

Returns a list of all the connected components in a molecule.

get_dihedral(atom1=None, atom2=None, atom3=None, atom4=None, check=True, _dihedral=<function compute_dihedral_between>, atoms=None)[source]

Wrapper to compute dihedral angle between four atoms.

This function is relatively slow (rate-limiting for certain applications), so performance boosts have been implemented (e.g. preloading _dihedral).

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom

  • atom3 (int) – number of the third atom

  • atom4 (int) – number of the fourth atom

  • check (Bool) – whether to validate input data (can be overridden to prevent slow double-checking)

  • _dihedral (function) – function used to compute dihedral

  • atoms (list) – list of atom numbers

Returns:

the dihedral angle, in degrees

get_distance(atom1=None, atom2=None, check=True, _dist=<function compute_distance_between>, atoms=None)[source]

Wrapper to compute distance between two atoms.

This function is relatively slow (rate-limiting for certain applications), so performance boosts have been implemented (e.g. preloading _dist).

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom

  • check (Bool) – whether to validate input data (can be overridden to prevent slow double-checking)

  • _dist (function) – function usd to compute distance

  • atoms (list) – list of atomic numbers

Returns:

the distance, in Angstroms

get_heavy_atoms()[source]

Returns a list of all the heavy atoms in the molecule (i.e., not hydrogen), for array indexing.

get_n_atoms()[source]

Determine how many atoms are in this Molecule.

Returns

n_atoms (int): the number of atoms

get_sq_distance(atom1, atom2, check=True)[source]

Wrapper to compute squared distance between two atoms – optimized for speed!

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom

  • check (Bool) – whether to validate input data (can be overridden to prevent slow double-checking)

Returns:

the squared distance

get_symmetric_atoms()[source]

Returns lists of symmetric atoms, as defined in cctk.load_group.

Useful for NMR spectroscopy, etc.

get_vector(atom, atom2=None, check=True)[source]

Get the geometry vector for a given atom. If two atoms are specified, gives the vector connecting them (from atom2 to atom). mol.get_vector(atom) is thus equivalent to mol.get_vector(atom, origin).

Parameters:
  • atom1 (int) – number of the first atom

  • atom2 (int) – number of the second atom (optional)

  • check (Bool) – whether to validate input data (can be overridden to prevent slow double-checking)

Returns:

a Numpy array

is_atom_in_ring(atom)[source]
limit_solvent_shell(solute=0, num_atoms=0, num_solvents=10, distance_from_atom=None, return_idxs=False)[source]

Automatically detects solvent molecules and removes them until you have a set number of solvents or atoms.

The “distance” between molecules is the minimum of the pairwise atomic distances, to emphasize inner-sphere interactions.

Parameters:
  • solute (int) – which fragment is the solute, 0-indexed

  • num_atoms (int) – remove atoms until there are this number (modulo the size of a solvent molecule)

  • num_solvents (int) – remove solvent molecules until there are this number

  • distance_from_atom (int) – if you want to find molecules closest to a given atom in the solute, specify the atom number here. if this atom is not in the solute fragment, an exception will be raised.

  • return_idxs (bool) – if True, indices of atoms that would be in the new molecule are returned. no change is made to self.

Returns:

new Molecule object

classmethod new_from_name(name)[source]

Create a new Molecule instance using rdkit.

classmethod new_from_smiles(smiles)[source]

Create a new Molecule instance using rdkit.

num_atoms()[source]
num_neighbors_by_atom()[source]

Returns a list of the number of neighbors of each atom.

optimize(inplace=True, nprocs=1, return_energy=False)[source]

Optimize molecule at the GFN2-xtb level of theory.

Parameters:
  • inplace (Bool) – whether or not to return a new molecule or simply modify self.geometry

  • nprocs (int) – number of processors to use

  • return_energy (Bool) – whether to return energy or not

optimize_dihedral(atom1, atom2, atom3, atom4, step=10)[source]

Minimizes the value of self.rms_distance_between_atoms for the given dihedral, in one-degree increments. A cheap alternative to geometry optimization using ab initio methods or density functional theory.

Parameters:
  • atom1 (int) – atom number of the first atom in the dihedral

  • atom2 (int) – atom number of the second atom in the dihedral

  • atom3 (int) – atom number of the third atom in the dihedral

  • atom4 (int) – atom number of the fourth atom in the dihedral

  • step (float) – explore angles from 0 to 360 with this stepsize in degrees

Returns:

the final value of the angle

perturb(size=0.005)[source]

This function can be used to generate a slightly different molecule in cases where numerical (or geometric) converge is problematic.

It adds a random variable (sampled from a normal distribution, centered at 0 with stddev size`) to every number in ``self.geometry.

Parameters:

size (float) – stddev of the normal distribution

Returns:

the Molecule object

principal_axes_of_rotation()[source]

Compute principal axes of rotation and corresponding moments of inertia.

See Jprogdyn, RotationalBoltzmann, lines 48–115.

Returns:

moments of intertia (3-element np.array) - some may be zero axes of rotation (3 x 3 np.array)

remove_atom(number)[source]

Remove the atom with number number.

Parameters:

number (int) – number of the atom

Returns:

the Molecule object

remove_bond(atom1, atom2)[source]

Alias for self.add_bond(atom1, atom2, bond_order=0) – more intuitive nomenclature.

renumber_to_match(model, check_chirality='all')[source]

Renumbers atoms to match model (must have isomorphic bond graph). Returns a copy of self with renumbered atoms.

Parameters:
  • model (cctk.Molecule) – isomorphic molecule to renumber by

  • check_chirality (list of atomic numbers) – atomic numbers to check, to prevent inversion due to graph isomorphism. Alternatively None will prevent any checking and “all” will use cctk.topology.get_exchangable_centers().

Returns:

new Molecule object

rms_distance_between_atoms()[source]

Returns the RMS distance (in Angstroms) between every pair of atoms - a quick, easy-to-calculate proxy for minimizing steric clashes.

rotate_molecule(axis, degrees)[source]

Rotates the whole molecule around the given axis.

Parameters:
  • axis (vector) – the vector to rotate about

  • theta (float) – how much to rotate (in degrees)

Returns:

the Molecule object

set_angle(atom1=None, atom2=None, atom3=None, angle=None, move='group', atoms=None)[source]

Adjusts the atom1atom2atom3 bond angle to be a fixed value by moving atom3.

If move is set to “group”, then all atoms bonded to atom3 will also be moved.

If move is set to “atom”, then only atom3 will be moved.

Parameters:
  • atom1 (int) – the number of the first atom

  • atom2 (int) – the number of the second atom

  • atom3 (int) – the number of the third atom

  • angle (float) – final value in degrees of the atom1atom2atom3 angle

  • move (str) – determines how fragment moving is handled

  • atoms (list) – 3-element list of atom numbers

Returns:

the Molecule object

set_dihedral(atom1=None, atom2=None, atom3=None, atom4=None, dihedral=None, move='group34', check_result=True, atoms=None)[source]

Adjusts the atom1atom2atom3atom4 dihedral angle to be a fixed value by moving atom 4.

If move is set to “atom”, then only atom4 will be moved.

If move is set to “group4”, then all atoms bonded to atom4 will also be moved.

If move is set to “group34”, then all atoms bonded to atom3 and atom4 will also be moved.

Parameters:
  • atom1 (int) – the number of the first atom

  • atom2 (int) – the number of the second atom

  • atom3 (int) – the number of the third atom

  • atom4 (int) – the number of the fourth atom

  • dihedral (float) – final value in degrees of the atom1atom2atom3atom4 angle

  • move (str) – determines how fragment moving is handled

  • check_result (Bool) – whether the final answer should be checked for correctness

  • atoms (list) – 4-element list of atomic numbers

Returns:

the Molecule object

set_distance(atom1=None, atom2=None, distance=None, move='group', atoms=None)[source]

Adjusts the atom1atom2 bond length to be a fixed distance by moving atom2.

If move is set to “group”, then all atoms bonded to atom2 will also be moved.

If move is set to “atom”, then only atom2 will be moved.

Parameters:
  • atom1 (int) – the number of the first atom

  • atom2 (int) – the number of the second atom

  • distance (float) – distance in Angstroms of the final bond

  • move (str) – determines how fragment moving is handled

  • atoms (list) – 2-element list of atom numbers

Returns:

the Molecule object

swap_atom_numbers(atom1, atom2)[source]

Interchanges the numbers of atom1 and atom2.

Parameters:
  • atom1 (int) – number of 1st atom

  • atom2 (int) – number of 2nd atom

Returns

new Molecule object (does not modify in-place)

to_string()[source]

Save the current molecule as a string, for subsequent loading. Not human-readable.

Vibrational modes are currently not saved.

translate_molecule(vector)[source]

Translates the whole molecule by the given vector.

Parameters:

vector (vector) – the vector to translate by

Returns:

the Molecule object

volume(pts_per_angstrom=10, qhull=False)[source]

Returns volume calculated using the Gavezotti algorithm (JACS, 1983, 105, 5220). Relatively slow. If MemoryError, defaults to a qhull-based approach (accurate in the limit as number of atoms goes to infinity)

Parameters:
  • pts_per_angstrom (int) – how many grid points to use per Å - time scales as O(n**3) so be careful!

  • qhull (bool) – use faster QHull algorithm

Returns:

volume in Å**3