Command Line Interface

Installing smartclass also installs the smartclass command. Run smartclass --help for usage details.

smartclass

Smartclass - Classify chemical structures using SMARTS patterns.

A tool for classifying chemical structures against SMARTS-based chemical class definitions from Wikidata or custom sources.

Examples:
# Classify a single molecule
smartclass searchclasses -s "CCO" -c classes.tsv

# Query Wikidata for chemical classes
smartclass querywikidata -q query.rq -o output.tsv

Usage

smartclass [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

-v, --verbose

Increase verbosity (-v for INFO, -vv for DEBUG).

combinecsvfiles

Combine multiple CSV/TSV files into one.

Merges rows from multiple input files with the same schema.
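To illustrate what merging rows "with the same schema" involves, here is a minimal Python sketch using only the standard library. It is not smartclass's implementation: it assumes every input shares an identical header row and that the delimiter follows the file extension (tab for .tsv, comma otherwise). The names combine_tables and delimiter_for are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path


def delimiter_for(path: Path) -> str:
    """Assumption: .tsv files are tab-separated, everything else is comma-separated."""
    return "\t" if path.suffix.lower() == ".tsv" else ","


def combine_tables(inputs: list[Path], output: Path) -> int:
    """Concatenate rows from tables that share one header; return the number of rows written."""
    header: list[str] | None = None
    rows: list[dict[str, str]] = []
    for path in inputs:
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter_for(path))
            fields = list(reader.fieldnames or [])
            if header is None:
                header = fields  # the first file defines the expected schema
            elif fields != header:
                raise ValueError(f"{path} has columns {fields}, expected {header}")
            rows.extend(reader)
    # Write only after every input has been validated.
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=header or [], delimiter=delimiter_for(output))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

In this sketch a header mismatch raises ValueError instead of silently realigning columns; how smartclass handles reordered or extra columns is not documented here.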

Usage

smartclass combinecsvfiles [OPTIONS]

Options

-i, --input-file <input_file>

Required Input CSV/TSV file(s) to combine.

-o, --output <output>

Required Output file path.

getlatestchembl

Download and process the latest ChEMBL database.

Generates fingerprints for molecules in ChEMBL for use in classification and similarity searches.

Usage

smartclass getlatestchembl [OPTIONS]

Options

-f, --fp-len <fp_len>

Fingerprint length for molecular fingerprints.

Default:

2048

-m, --max-atoms <max_atoms>

Maximum number of atoms in molecules to process.

Default:

50

-r, --report-interval <report_interval>

Progress reporting interval (number of molecules).

Default:

50000

-t, --tautomer-fingerprints, --no-tautomer-fingerprints

Include tautomer fingerprints in output.

loadpkgdata

Load and display bundled package data.

Shows the chemical classes, mappings, and MIA data included with the smartclass package.

Usage

smartclass loadpkgdata [OPTIONS]

querywikidata

Query Wikidata using SPARQL and export results.

Execute a SPARQL query against Wikidata and optionally apply chemical transformations to the results.
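A SPARQL query against Wikidata is an ordinary HTTP request as defined by the SPARQL 1.1 Protocol. The sketch below builds such a request with the standard library and flattens the standard SPARQL JSON results format into rows. It does not touch the network and is not taken from smartclass; the helper names and the User-Agent string are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

# Default value of the --url option documented for querywikidata.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_sparql_request(query: str, endpoint: str = WIKIDATA_SPARQL) -> Request:
    """Build (but do not send) a GET request that asks for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
            "User-Agent": "example-client/0.1 (hypothetical)",
        },
    )


def parse_bindings(result: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    return [
        {name: term["value"] for name, term in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

Sending the request would then be a call to urllib.request.urlopen, followed by json.load on the response and parse_bindings on the decoded document.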

Examples:

# Get chemical classes with SMARTS
smartclass querywikidata -q classes_smarts.rq -o classes.tsv
# Generate InChIKeys from InChI
smartclass querywikidata -q inchi.rq -t transform_inchi_to_inchikey -o keys.csv

Usage

smartclass querywikidata [OPTIONS]

Options

-q, --query <query>

Path to the SPARQL query file (.rq). If both --query and --output are omitted, the command runs the default class query batch.

-o, --output <output>

Output file path (CSV or TSV, chosen by the file extension). Requires --query. If both --output and --query are omitted, the command runs the default class query batch.

-r, --remove-prefix, --keep-prefix

Remove Wikidata entity prefix from results.

-t, --transform <transform>

Apply a transformation to query results (e.g., check_smiles, transform_inchi_to_inchikey, transform_smiles_to_inchi).

Options:

check_smiles | transform_inchi_to_inchikey | transform_inchi_to_mass | transform_inchi_to_smiles_canonical | transform_inchi_to_smiles_isomeric | transform_smiles_to_formula | transform_smiles_to_inchi | transform_smiles_to_mass | transform_smiles_i_to_smiles_c | transform_formula_to_formula

-u, --url <url>

SPARQL endpoint URL.

Default:

'https://query.wikidata.org/sparql'
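The effect of --remove-prefix and of choosing CSV or TSV from the output extension can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not smartclass code: it assumes entity values arrive as full URIs beginning with http://www.wikidata.org/entity/, and the helper names are hypothetical. The real flag's default is not stated here, so the sketch makes it an explicit argument.

```python
from __future__ import annotations

import csv
import io

# Assumed form of Wikidata entity URIs in query results.
WD_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def strip_entity_prefix(value: str) -> str:
    """Return the bare identifier (e.g. Q123) when value is a full entity URI."""
    if value.startswith(WD_ENTITY_PREFIX):
        return value[len(WD_ENTITY_PREFIX):]
    return value


def render_rows(rows: list[dict[str, str]], output_name: str, *, remove_prefix: bool) -> str:
    """Serialize rows as TSV for a .tsv name and as CSV otherwise."""
    if not rows:
        return ""
    delimiter = "\t" if output_name.lower().endswith(".tsv") else ","
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if remove_prefix:
            row = {key: strip_entity_prefix(value) for key, value in row.items()}
        writer.writerow(row)
    return buffer.getvalue()
```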

searchclasses

Classify chemical structures against SMARTS-based classes.

Match input SMILES strings against chemical class definitions using substructure searching. Results include the matched class and structural similarity metrics.
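How -z/--closest-only narrows the results is not spelled out beyond "closest matching class". One plausible reading, assumed here, is that each match carries a similarity score and only the best-scoring match per input structure is kept. The Match record and select_matches function below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    smiles: str        # input structure
    class_id: str      # value from the class-identifier column
    similarity: float  # hypothetical similarity score; higher means closer


def select_matches(matches: list[Match], closest_only: bool) -> list[Match]:
    """Keep every match, or only the highest-scoring match per input structure."""
    if not closest_only:
        return list(matches)
    best: dict[str, Match] = {}
    for match in matches:
        current = best.get(match.smiles)
        # Strict comparison: on a tie, the first match seen is kept.
        if current is None or match.similarity > current.similarity:
            best[match.smiles] = match
    return list(best.values())
```

The real tool may rank or break ties differently, for example by preferring the most specific class.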

Examples:

# Classify a single molecule
smartclass searchclasses -s "CCO" -c classes.tsv -v
# Classify molecules from a file
smartclass searchclasses -i molecules.tsv -c classes.tsv
# Get all matches, not just closest
smartclass searchclasses -s "CCO" -c classes.tsv --all-matches

Usage

smartclass searchclasses [OPTIONS]

Options

-c, --classes-file <classes_file>

TSV file with chemical class definitions (class, structure columns).

-d, --classes-name-id <classes_name_id>

Column name for class identifiers.

Default:

'class'

-e, --classes-name-smarts <classes_name_smarts>

Column name for SMARTS patterns.

Default:

'structure'

-f, --include-hierarchy, --no-hierarchy

Use the chemical class hierarchy for faster BFS-based searching.

-i, --input-smiles <input_smiles>

Input file containing SMILES (CSV/TSV with a 'smiles' column).

-s, --smiles <smiles>

SMILES string(s) to classify. Can be specified multiple times.

-z, --closest-only, --all-matches

Return only the closest matching class per structure.
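The -f/--include-hierarchy option is described only as BFS-based. A common way to speed up hierarchical matching, assumed in this sketch, is to test a child class only when its parent already matched, on the premise that a subclass pattern is at least as specific as its parent's. The predicate stands in for a real SMARTS substructure test, and all names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def bfs_classify(
    roots: list[str],
    children: dict[str, list[str]],
    matches: Callable[[str], bool],
) -> list[str]:
    """Return matched classes in BFS order, visiting children only of classes that matched."""
    found: list[str] = []
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # a class can have several parents; test it only once
        seen.add(cls)
        if matches(cls):
            found.append(cls)
            queue.extend(children.get(cls, []))
    return found
```

The saving comes from never testing subclasses of a non-matching class; the trade-off is that a subclass whose pattern does not imply its parent's pattern would be missed.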

searchclasses-all-sources

Classify structures against SMARTS, SMILES, and CXSMILES class sources.

This command runs three classification passes and combines the results, adding a class_source field (smarts, smiles, cxsmiles).
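Combining the three passes amounts to concatenating their result rows and recording where each row came from. A minimal Python sketch, assuming each pass yields dict-like rows; the function name and row fields are hypothetical, and only the class_source values come from the description above.

```python
from __future__ import annotations

from typing import Iterable

# Order in which the per-source results are concatenated.
SOURCES = ("smarts", "smiles", "cxsmiles")


def combine_passes(passes: dict[str, Iterable[dict[str, str]]]) -> list[dict[str, str]]:
    """Concatenate per-source result rows, tagging each with a class_source field."""
    combined: list[dict[str, str]] = []
    for source in SOURCES:
        for row in passes.get(source, []):
            # Copy the row so the caller's data is not mutated.
            combined.append({**row, "class_source": source})
    return combined
```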

Usage

smartclass searchclasses-all-sources [OPTIONS]

Options

--classes-smarts-file <classes_smarts_file>

TSV file with SMARTS-based class definitions.

Default:

'scratch/wikidata_classes_smarts.tsv'

--classes-smiles-file <classes_smiles_file>

TSV file with SMILES-based class definitions.

Default:

'scratch/wikidata_classes_smiles.tsv'

--classes-cxsmiles-file <classes_cxsmiles_file>

TSV file with CXSMILES-based class definitions.

Default:

'scratch/wikidata_classes_cxsmiles.tsv'

-f, --include-hierarchy, --no-hierarchy

Use the chemical class hierarchy for faster BFS-based searching.

-i, --input-smiles <input_smiles>

Input file containing SMILES (CSV/TSV with a 'smiles' column).

-s, --smiles <smiles>

SMILES string(s) to classify. Can be specified multiple times.

-z, --closest-only, --all-matches

Return only the closest matching class per structure.

--output-dir <output_dir>

Directory for combined output files.