----------------------------------------------------------------------
This is the API documentation for the smartclass library.
----------------------------------------------------------------------

## Classes

Main classes provided by the package

ChemicalConversionError(message: 'str', *args: 'Any', **kwargs: 'Any') -> 'None'
    Error during chemical format conversion.

    Raised when conversion between chemical representations fails,
    e.g. SMILES to Mol or InChI to SMILES.

ClassificationError(message: 'str', *args: 'Any', **kwargs: 'Any') -> 'None'
    Error during chemical classification.

ConfigurationError(message: 'str', *args: 'Any', **kwargs: 'Any') -> 'None'
    Error in configuration settings.

DataExportError(destination: 'str', reason: 'str | None' = None, *args, **kwargs) -> 'None'
    Error exporting data to a file.

DataLoadingError(source: 'str', reason: 'str | None' = None, *args, **kwargs) -> 'None'
    Error loading data from a file or URL.

InChIError(inchi: 'str', *args, **kwargs) -> 'None'
    Error parsing an InChI string.

InvalidInputError(parameter: 'str', value: 'object', reason: 'str | None' = None, *args, **kwargs) -> 'None'
    Error for invalid user input.

MoleculeParsingError(input_string: 'str', input_type: 'str' = 'unknown', *args: 'Any', **kwargs: 'Any') -> 'None'
    Error parsing a molecule representation.

    Raised when RDKit fails to parse a molecular structure.

NetworkError(url: 'str', status_code: 'int | None' = None, reason: 'str | None' = None, *args, **kwargs) -> 'None'
    Error during network operations.

SMARTSError(smarts: 'str', *args, **kwargs) -> 'None'
    Error parsing a SMARTS pattern.

SMILESError(smiles: 'str', *args, **kwargs) -> 'None'
    Error parsing a SMILES string.

## Exceptions

Exception classes

SmartclassError(message: 'str', *args: 'Any', **kwargs: 'Any') -> 'None'
    Base exception for all smartclass errors.

    All custom exceptions in smartclass inherit from this class, making it
    easy to catch all smartclass-specific errors.

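Because every exception above derives from `SmartclassError`, one `except` clause can cover all of them. The sketch below shows that pattern with local stand-in classes so it runs without the library installed. The class bodies, the message format, and the `parse_smiles` / `safe_parse` helpers are assumptions made for illustration, not the library's implementation.

```python
from __future__ import annotations


class SmartclassError(Exception):
    """Local stand-in for the documented base exception."""

    def __init__(self, message: str, *args: object, **kwargs: object) -> None:
        super().__init__(message, *args)
        self.message = message


class SMILESError(SmartclassError):
    """Local stand-in; the message format below is an assumption."""

    def __init__(self, smiles: str, *args: object, **kwargs: object) -> None:
        super().__init__(f"Invalid SMILES: {smiles!r}", *args, **kwargs)
        self.smiles = smiles


def parse_smiles(smiles: str) -> str:
    """Hypothetical parser: rejects empty strings and strings with spaces."""
    if not smiles or " " in smiles:
        raise SMILESError(smiles)
    return smiles


def safe_parse(smiles: str) -> str | None:
    """Catch any error in the hierarchy with a single except clause."""
    try:
        return parse_smiles(smiles)
    except SmartclassError:
        return None
```

Catching the base class keeps calling code short, while the specific subclasses stay available when a caller needs to tell a parsing failure from, say, a network failure.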
## Functions

Utility functions

calculate_mcs(mols: 'list', threshold: 'float | None' = None, ring_matches_ring_only: 'bool' = False, timeout: 'int | None' = None) -> 'rdFMCS.MCSResult'
    Calculate Maximum Common Substructure (MCS) for a list of molecules.

    Uses RDKit's FindMCS algorithm with flexible atom and bond matching.

    Parameters
    ----------
    mols : list
        Mol objects.
    threshold : float | None
        Default is None.
    ring_matches_ring_only : bool
        Default is False.
    timeout : int | None
        Default is None.

    Returns
    -------
    rdFMCS.MCSResult
        MCS result object containing the common substructure.

check_missing_stereochemistry(smiles: 'str', use_legacy: 'bool' = False) -> 'bool | None'
    Check stereochemistry.

    Parameters
    ----------
    smiles : str
        SMILES.
    use_legacy : bool
        Default is False.

    Returns
    -------
    bool | None
        Flag, or None.

check_smiles_contains_no_dot(smiles: 'str') -> 'bool'
    Check whether a SMILES contains no dot.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    bool
        Whether the SMILES meets the given criteria.

check_smiles_contains_no_isotope(smiles: 'str') -> 'bool'
    Check whether a SMILES contains no isotope.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    bool
        Whether the SMILES meets the given criteria.

check_smiles_isomeric(smiles_isomeric: 'str', transform_to_canonical: 'bool' = False) -> 'str | None'
    Check isomeric SMILES.

    Parameters
    ----------
    smiles_isomeric : str
        SMILES.
    transform_to_canonical : bool
        Default is False.

    Returns
    -------
    str | None
        SMILES.

combine_csv_files(input_files: 'list[str] | str', output_file: 'str', separator='\t') -> 'None'
    Combine multiple CSV files into a single CSV file.

    Parameters
    ----------
    input_files : list[str] | str
        CSV file path(s).
    output_file : str
        CSV file path where the combined data will be saved.
    separator : Any
        Separator used in the CSV files. Default is '\t' (tab).

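The documentation for `combine_csv_files` does not say how headers are handled. The sketch below shows one plausible reading using only the standard `csv` module: every input is assumed to share the same header row, which is written once. The name `combine_delimited_files` and the header handling are assumptions, not the library's behaviour.

```python
import csv


def combine_delimited_files(input_files, output_file, separator="\t"):
    """Concatenate delimited files, writing the shared header only once.

    Assumes every input file starts with the same header row.
    """
    if isinstance(input_files, str):
        input_files = [input_files]
    header_written = False
    with open(output_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter=separator)
        for path in input_files:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src, delimiter=separator)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Remaining rows are data rows; copy them unchanged.
                writer.writerows(reader)
```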
configure_logging(level: 'int | str' = 20, format_string: 'str' = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', date_format: 'str' = '%Y-%m-%d %H:%M:%S', log_file: 'Path | str | None' = None, force: 'bool' = False) -> 'None'
    Configure logging for the smartclass package.

    This should be called once at application startup. Subsequent calls
    will be ignored unless `force=True`.

    Parameters
    ----------
    level : int | str
        Logging level (e.g., logging.DEBUG, logging.INFO, "DEBUG", "INFO").
        Default is logging.INFO.
    format_string : str
        Default is DEFAULT_FORMAT.
    date_format : str
        Default is DEFAULT_DATE_FORMAT.
    log_file : Path | str | None
        Default is None.
    force : bool
        Default is False.

convert_chemical_formula(text: 'str') -> 'str'
    Convert chemical formula.

    Parameters
    ----------
    text : str
        Text.

    Returns
    -------
    str
        Modified text.

convert_classyfire_dict(classyfire_json: 'str' = 'scratch/classyfire.json', output: 'str' = 'scratch/chemontids.txt') -> 'None'
    Convert the ClassyFire JSON into a CHEMONTID dictionary.

    Parameters
    ----------
    classyfire_json : str
        Default is 'scratch/classyfire.json'.
    output : str
        Default is 'scratch/chemontids.txt'.

convert_inchi_to_inchikey(inchi: 'str') -> 'str'
    Convert a structure InChI to InChIKey.

    Parameters
    ----------
    inchi : str
        InChI.

    Returns
    -------
    str
        InChIKey.

convert_inchi_to_mass(inchi: 'str') -> 'float | None'
    Convert a structure InChI to mass.

    Parameters
    ----------
    inchi : str
        InChI.

    Returns
    -------
    float | None
        A mass.

convert_inchi_to_mol(inchi: 'str') -> 'Mol | None'
    Convert a structure InChI to MOL.

    Parameters
    ----------
    inchi : str
        InChI.

    Returns
    -------
    Mol | None
        MOL.

convert_inchi_to_smiles(inchi: 'str') -> 'str | None'
    Convert a structure InChI to SMILES.

    Parameters
    ----------
    inchi : str
        InChI.

    Returns
    -------
    str | None
        SMILES.

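The configure-once contract of `configure_logging` documented above (later calls are ignored unless `force=True`) can be sketched with the standard `logging` module. The function `configure_once`, the module-level flag, and the logger name `smartclass_sketch` are hypothetical; the format strings are the defaults shown in the signature. How the library tracks its configured state is not documented here.

```python
import logging

_configured = False  # hypothetical module-level guard


def configure_once(
    level=logging.INFO,
    format_string="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    date_format="%Y-%m-%d %H:%M:%S",
    force=False,
):
    """Configure a package logger once; later calls need force=True."""
    global _configured
    if _configured and not force:
        return  # already configured: ignore this call
    logger = logging.getLogger("smartclass_sketch")
    # Drop handlers from a previous configuration to avoid duplicate output.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(format_string, datefmt=date_format))
    logger.addHandler(handler)
    logger.setLevel(level)  # accepts an int or a level name such as "DEBUG"
    _configured = True
```

Guarding with a flag means library modules can call the function defensively without overriding what the application set up at startup.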
convert_list_of_dict(list_of_dict: 'list', key: 'str', value: 'str', invert: 'bool' = False) -> 'dict'
    Convert a list of dictionaries to a dictionary with possible inversion.

    Parameters
    ----------
    list_of_dict : list
        The list of dictionaries to convert.
    key : str
        The key.
    value : str
        The value.
    invert : bool
        Default is False.

    Returns
    -------
    dict
        A dictionary with given keys and values.

convert_mol_to_cxsmiles(mol: 'str') -> 'str'
    Convert a structure MOL to CXSMILES.

    Parameters
    ----------
    mol : str
        MOL.

    Returns
    -------
    str
        CXSMILES.

convert_mol_to_inchi(mol: 'Mol') -> 'str'
    Convert a structure MOL to InChI.

    Parameters
    ----------
    mol : Mol
        MOL.

    Returns
    -------
    str
        InChI.

convert_mol_to_inchikey(mol: 'Mol') -> 'str'
    Convert an RDKit Mol object to an InChIKey.

    Parameters
    ----------
    mol : Mol
        Mol object.

    Returns
    -------
    str
        InChIKey string.

convert_mol_to_smarts(mol: 'Mol') -> 'str'
    Convert a structure MOL to SMARTS.

    Parameters
    ----------
    mol : Mol
        MOL.

    Returns
    -------
    str
        SMARTS.

convert_mol_to_smiles(mol: 'Mol') -> 'str'
    Convert a structure MOL to SMILES.

    Parameters
    ----------
    mol : Mol
        MOL.

    Returns
    -------
    str
        SMILES.

convert_molblock_to_mol(molblock: 'str') -> 'Mol | None'
    Convert a structure MOLBlock to MOL.

    Parameters
    ----------
    molblock : str
        MOLBlock.

    Returns
    -------
    Mol | None
        MOL.

convert_smarts_to_mol(smarts: 'str') -> 'Mol | None'
    Convert a structure SMARTS to MOL.

    Parameters
    ----------
    smarts : str
        SMARTS.

    Returns
    -------
    Mol | None
        MOL.

convert_smiles_to_canonical_smiles(smiles: 'str') -> 'str | None'
    Convert a structure SMILES to canonical SMILES.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    str | None
        SMILES.

convert_smiles_to_formula(smiles: 'str') -> 'str | None'
    Convert a structure SMILES to a molecular formula.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    str | None
        A molecular formula.

convert_smiles_to_inchi(smiles: 'str') -> 'str | None'
    Convert a structure SMILES to InChI.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    str | None
        InChI.

convert_smiles_to_mass(smiles: 'str') -> 'float | None'
    Convert a structure SMILES to an exact mass.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    float | None
        An exact mass.

convert_smiles_to_mol(smiles: 'str | None', sanitize: 'bool' = True) -> 'Mol | None'
    Convert a SMILES string to an RDKit Mol object.

    Performs sanitization and validation of the molecule. Returns None if
    the SMILES is invalid or the molecule fails validation.

    Parameters
    ----------
    smiles : str | None
        SMILES string to convert.
    sanitize : bool
        Default is True.

    Returns
    -------
    Mol | None
        Mol object, or None if conversion fails.

download_file_if_not_exists(url: 'str', output: 'str') -> 'None'
    Download a file from the specified URL if it does not exist.

    Parameters
    ----------
    url : str
        URL.
    output : str
        Output file path.

enumerate_structures(mol: 'Mol') -> 'list[Mol]'
    Enumerate structural variants of a molecule.

    Uses RDKit's MolEnumerator to generate structural variants (e.g., for
    handling tautomers or stereoisomers in queries).

    Parameters
    ----------
    mol : Mol
        Mol object to enumerate.

    Returns
    -------
    list[Mol]
        Structural variants. Falls back to the original molecule if
        enumeration fails or produces no results.

export_dict_to_json(dic: 'dict', output: 'str') -> 'None'
    Export dict to JSON.

    Parameters
    ----------
    dic : dict
        A dictionary.
    output : str
        Output path.

export_results(output: 'str | Path', results: 'list[dict]') -> 'None'
    Export a list of dictionaries to a CSV or TSV file.

    The output format is determined by the file extension:

    - .tsv: Tab-separated values
    - .csv: Comma-separated values

    Parameters
    ----------
    output : str | Path
        Path to the output file.
    results : list[dict]
        List of dictionaries to export as rows.

    Raises
    ------
    DataExportError
        If the export fails.

fix_inchi_tautomerization(smiles: 'str') -> 'str'
    Fix InChI tautomerization.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    str
        SMILES.

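The extension-driven format selection described for `export_results` can be illustrated with the standard `csv` module. This is a minimal sketch under stated assumptions: `export_rows` is a hypothetical name, the case-insensitive suffix check and the `ValueError` for other extensions are choices made here (the library documents `DataExportError`), and column order follows first appearance across rows.

```python
import csv
from pathlib import Path


def export_rows(output, rows):
    """Write a list of dicts as .tsv (tab) or .csv (comma), chosen by suffix."""
    path = Path(output)
    delimiters = {".tsv": "\t", ".csv": ","}
    suffix = path.suffix.lower()
    if suffix not in delimiters:
        # Assumption: anything other than .tsv/.csv is rejected.
        raise ValueError(f"Unsupported extension: {suffix!r}")
    # Collect column names in first-seen order so sparse rows still fit.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter=delimiters[suffix]
        )
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
```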
get_config() -> 'Config'
    Get the global configuration instance.

    Creates a new Config instance on first call, then returns the same
    instance on subsequent calls.

    Returns
    -------
    Config
        Config instance.

get_logger(name: 'str') -> 'logging.Logger'
    Get a logger for the given module name.

    This is the preferred way to obtain a logger in smartclass modules.
    The logger will be a child of the 'smartclass' logger, inheriting its
    configuration.

    Parameters
    ----------
    name : str
        Module name (typically __name__).

    Returns
    -------
    logging.Logger

    Example:
        >>> from smartclass.logging import get_logger
        >>> logger = get_logger(__name__)
        >>> logger.info("Processing started")

get_num_atoms_bonds(mol: 'Mol') -> 'int'
    Get number of atoms and bonds.

    Parameters
    ----------
    mol : Mol
        MOL.

    Returns
    -------
    int
        Number of atoms and bonds.

get_num_matched_atoms_bonds(mol_1: 'Mol', mol_2: 'Mol') -> 'int'
    Get number of matched atoms and bonds.

    Parameters
    ----------
    mol_1 : Mol
        MOL.
    mol_2 : Mol
        MOL.

    Returns
    -------
    int
        Number of matched atoms and bonds.

get_request(url: 'str', query: 'str', max_retries: 'int | None' = None, base_delay: 'float | None' = None, timeout: 'int | None' = None) -> 'list[dict[str, str]]'
    Send a GET request to a SPARQL endpoint and retrieve JSON data.

    Uses exponential backoff with jitter for retries. Falls back to the
    QLever endpoint if the primary endpoint fails.

    Parameters
    ----------
    url : str
        URL.
    query : str
        SPARQL query string.
    max_retries : int | None
        Default is None.
    base_delay : float | None
        Base delay (seconds) for backoff. Uses config default if None.
        Default is None.
    timeout : int | None
        Default is None.

    Returns
    -------
    list[dict[str, str]]
        List of dictionaries representing query results.

    Raises
    ------
    NetworkError
        If the request fails after all retries.

load_csv_from_path(path: 'str') -> 'DataFrame'
    Load CSV from path.

    Parameters
    ----------
    path : str
        Path of the file.

    Returns
    -------
    DataFrame
        DataFrame.

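`get_request` above is documented as retrying with exponential backoff and jitter. The sketch below shows that general technique in isolation. The helper `retry_with_backoff`, the exact delay formula (`base_delay * 2**attempt` plus uniform jitter up to `base_delay`), and the injectable `sleep` parameter are assumptions; the library's own formula and defaults are not stated in this document.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt plus jitter, retry."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt == max_retries:
                break  # no retries left
            # The delay doubles on each attempt; jitter spreads out clients
            # that would otherwise retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError(f"failed after {max_retries} retries") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting: a test can record the delays instead of sleeping.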
load_external_classes_file(file: 'str', id_name: 'str' = 'class', smarts_name: 'str' = 'structure') -> 'DataFrame'
    Load a Polars DataFrame from an external TSV file with chemical classes.

    Parameters
    ----------
    file : str
        The name of the file to load.
    id_name : str
        Default is 'class'.
    smarts_name : str
        Default is 'structure'.

    Returns
    -------
    DataFrame
        DataFrame containing the loaded data.

load_json_from_path(path: 'str') -> 'dict'
    Load JSON from path.

    Parameters
    ----------
    path : str
        Path of the file.

    Returns
    -------
    dict
        A dictionary.

load_json_from_url(url: 'str') -> 'dict[str, Any] | None'
    Load JSON from URL.

    Parameters
    ----------
    url : str
        URL of the JSON file.

    Returns
    -------
    dict[str, Any] | None
        Parsed JSON, or None if loading fails.

load_json_from_url_or_path(url: 'str', name: 'str') -> 'dict | None'
    Load JSON from URL or path.

    Parameters
    ----------
    url : str
        URL to get the JSON from.
    name : str
        Name of the file.

    Returns
    -------
    dict | None
        Dictionary, or None.

load_pkg_bitter_smiles() -> 'DataFrame'
    Load bitter SMILES data from the package file into a Polars DataFrame.

    Returns
    -------
    DataFrame
        DataFrame containing bitter SMILES.

load_pkg_chemical_hierarchy(file_path: 'str | Path' = 'scratch/wikidata_classes_taxonomy.tsv') -> 'dict[str, list[str]]'
    Load chemical class hierarchy from a TSV file.

    The file should have at least 3 columns, where:

    - Column 2 (index 1): Class URI
    - Column 3 (index 2): Parent URI

    Parameters
    ----------
    file_path : str | Path
        Default is 'scratch/wikidata_classes_taxonomy.tsv'.

    Returns
    -------
    dict[str, list[str]]
        URIs.

    Raises
    ------
    DataLoadingError
        If the file cannot be read or parsed.

load_pkg_classes() -> 'DataFrame'
    Load chemical classes data from the package file into a Polars DataFrame.

    Returns
    -------
    DataFrame
        DataFrame containing chemical classes.

load_pkg_data() -> 'tuple[DataFrame, DataFrame, DataFrame]'
    Load the package data.

    Returns
    -------
    tuple[DataFrame, DataFrame, DataFrame]
        DataFrames containing the package data.

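The column layout described for `load_pkg_chemical_hierarchy` (class URI at index 1, parent URI at index 2) suggests a parse along the following lines. This sketch assumes the returned mapping goes from each class URI to the list of its parent URIs, that there is no header row to skip, and that duplicate pairs are dropped; none of this is confirmed by the documentation. `parse_hierarchy` and the `ex:` identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_hierarchy(handle):
    """Map each class URI (column index 1) to its parent URIs (column index 2)."""
    parents = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip rows that lack the two URI columns
        class_uri, parent_uri = row[1], row[2]
        if parent_uri not in parents[class_uri]:
            parents[class_uri].append(parent_uri)
    return dict(parents)


# Made-up three-column rows; the first column's meaning is not documented.
sample = "label\tex:B\tex:A\nlabel\tex:C\tex:A\nlabel\tex:C\tex:B\n"
hierarchy = parse_hierarchy(io.StringIO(sample))
```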
load_pkg_file(file: 'str', directory: 'str' = 'smartclass.data') -> 'DataFrame'
    Load data from a package-bundled file into a Polars DataFrame.

    Supports TSV, CSV, and JSON file formats.

    Parameters
    ----------
    file : str
        Name of the file to load (e.g., "classes_smarts.tsv").
    directory : str
        Defaults to "smartclass.data".

    Returns
    -------
    DataFrame
        DataFrame containing the loaded data.

    Raises
    ------
    DataLoadingError
        If the file cannot be loaded or format is unsupported.

load_pkg_mappings() -> 'DataFrame'
    Load chemont__wd mappings data from the package file into a Polars DataFrame.

    Returns
    -------
    DataFrame
        DataFrame containing chemont__wd mappings.

load_pkg_mia() -> 'DataFrame'
    Load Mono Indole Alkaloids (MIA) data from the package file into a Polars DataFrame.

    Returns
    -------
    DataFrame
        MIA data.

load_smiles(input: 'str | Path', column: 'str' = 'smiles', limit: 'int | None' = None) -> 'list[str]'
    Load unique SMILES strings from a CSV or TSV file.

    Parameters
    ----------
    input : str | Path
        Path to the input file (CSV or TSV).
    column : str
        Default is 'smiles'.
    limit : int | None
        Maximum number of SMILES to return (for testing). Default is None.

    Returns
    -------
    list[str]
        SMILES strings.

    Raises
    ------
    DataLoadingError
        If the file cannot be read or column not found.

load_tsv_from_path(path: 'str') -> 'DataFrame'
    Load TSV from path.

    Parameters
    ----------
    path : str
        Path of the file.

    Returns
    -------
    DataFrame
        DataFrame.

read_query(query: 'str | Path') -> 'str'
    Read a SPARQL query from a file path or URL.

    Parameters
    ----------
    query : str | Path
        File path or URL to fetch the query from.

    Returns
    -------
    str
        SPARQL query string.

    Raises
    ------
    DataLoadingError
        If the query cannot be loaded.

extract_chebi(file_path: 'str' = 'scratch/chebi.obo', output: 'str' = 'scratch/chebi_extracted.tsv')
    Extract CHEBI.

    Parameters
    ----------
    file_path : str
        Default is 'scratch/chebi.obo'.
    output : str
        Default is 'scratch/chebi_extracted.tsv'.

get_chebi()
    Get CHEBI.

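For `load_smiles` above, the combination of "unique", a named column, and an optional `limit` can be sketched with the standard `csv` module. Several details here are assumptions: the delimiter is picked from the file extension, first-seen order is preserved, empty cells are skipped, the limit is applied after deduplication, and a plain `ValueError` stands in for the documented `DataLoadingError`. `load_unique_smiles` is a hypothetical name.

```python
import csv
from pathlib import Path


def load_unique_smiles(path, column="smiles", limit=None):
    """Return unique, non-empty values of `column`, in first-seen order."""
    path = Path(path)
    # Assumption: .tsv means tab-separated, anything else comma-separated.
    delimiter = "\t" if path.suffix.lower() == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"Column {column!r} not found in {path}")
        # dict.fromkeys deduplicates while keeping insertion order.
        seen = dict.fromkeys(row[column] for row in reader if row[column])
    unique = list(seen)
    return unique if limit is None else unique[:limit]
```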
sample_list(items: 'list[T]', max_samples: 'int' = 1000) -> 'list[T]'
    Randomly sample items from a list.

    Returns at most `max_samples` items. If the list has fewer items than
    `max_samples`, returns all items in random order.

    Parameters
    ----------
    items : list[T]
        List of items to sample from.
    max_samples : int
        Default is 1000.

    Returns
    -------
    list[T]
        List of randomly sampled items.

split_csv(input_file: 'str', output_dir: 'str', lines_per_file: 'int' = 5000) -> 'None'
    Split a CSV file into multiple smaller CSV files.

    Parameters
    ----------
    input_file : str
        CSV file.
    output_dir : str
        CSV files.
    lines_per_file : int
        Default is 5000.

standardize(smiles: 'str') -> 'str | None'
    Standardize.

    Parameters
    ----------
    smiles : str
        SMILES.

    Returns
    -------
    str | None
        SMILES.

## Constants

Module-level constants and data

## Other

Additional exports

smartclass.api
    SMARTCLASS API TODO.

smartclass.resources.chebi
    Smartclass classifies structures using SMARTS.

smartclass.chem
    Smartclass classifies structures using SMARTS.

smartclass.resources.chembl
    Smartclass classifies structures using SMARTS.

smartclass.resources.chemont
    Smartclass classifies structures using SMARTS.

get_class_structures(classes: 'list[dict[str, list[str]]]') -> 'dict[str, list[str]]'
    Build a mapping from class IDs to their SMARTS structures.

    Parameters
    ----------
    classes : list[dict[str, list[str]]]
        List of dicts mapping class IDs to lists of SMARTS patterns.

    Returns
    -------
    dict[str, list[str]]
        Merged mapping from class ID to SMARTS pattern list.

smartclass.io
    Smartclass classifies structures using SMARTS.

smartclass.resources.wikidata
    Smartclass classifies structures using SMARTS.
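As an illustration of the `sample_list` contract documented in the Functions section (at most `max_samples` items, or all items in random order when the list is shorter), here is a minimal sketch built on the standard library's `random.sample`. The name `sample_items` is hypothetical, and whether the library uses `random.sample` internally is not stated in this document.

```python
from __future__ import annotations

import random
from typing import TypeVar

T = TypeVar("T")


def sample_items(items: list[T], max_samples: int = 1000) -> list[T]:
    """Return up to max_samples items drawn without replacement."""
    # With k == len(items), random.sample returns a shuffled copy, which
    # matches the "all items in random order" case for short lists.
    k = min(len(items), max_samples)
    return random.sample(items, k)
```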