API Reference

This page documents the stable public API surface currently intended for users who want to call omicsmeta from Python. The package is still pre-alpha, so minor names may change before the first release.

Harmonizer

omicsmeta.core.harmonizer.Harmonizer is the main entry point. It accepts an optional mapper backend and a confidence threshold.

from omicsmeta.core.harmonizer import Harmonizer

harmonizer = Harmonizer(confidence_threshold=0.70)
result = harmonizer.from_file("metadata.tsv", file_type="tabular")

Supported input methods:

from_file(path, file_type="tabular"): read CSV/TSV metadata, a GEO SOFT snippet, BioSample XML file, or SRA XML file from disk.
from_geo(accession): fetch a GEO accession and harmonize its sample metadata.
from_rows(rows): harmonize an in-memory list of row dictionaries.

file_type is one of tabular, geo_soft, biosample_xml, or sra_xml.

HarmonizationResult

Harmonizer returns a HarmonizationResult dataclass with these attributes:

harmonized: accepted ontology mappings, one row per mapped metadata term.
unmapped: candidate mappings below the confidence threshold or fields that could not be routed confidently.
unmapped_summary: deduplicated manual-review table grouped by field type and normalized term.
sample_table: one row per input sample with sample-wide ontology summaries.
qc_summary: aggregate counts, mapping rates, and validation warnings.
detections: semantic field detection result for each input column.
issues: row-level validation warnings.

Mapping Backends

The default backend is BuiltinMapper, an offline exact and fuzzy synonym matcher. It includes a small seed vocabulary for common test and demonstration terms and can load additional OBO files.

from omicsmeta.core.mapper import BuiltinMapper, load_builtin_terms
from omicsmeta.core.harmonizer import Harmonizer

terms = load_builtin_terms(["disease_slim.obo"])
mapper = BuiltinMapper(terms=terms, confidence_threshold=0.75)
result = Harmonizer(mapper=mapper).from_file("metadata.tsv")

Text2TermMapper adapts the optional text2term package. It is useful when users want broader ontology mapping from a maintained general-purpose mapper.

from omicsmeta.core.mapper import Text2TermMapper
from omicsmeta.core.harmonizer import Harmonizer

mapper = Text2TermMapper(confidence_threshold=0.70)
result = Harmonizer(mapper=mapper).from_file("metadata.tsv")

Output Writers

Use omicsmeta.io.writers to write result tables and the compact HTML QC report:

from omicsmeta.io.writers import write_html_report, write_tabular

write_tabular(result.harmonized, "harmonized.tsv")
write_tabular(result.unmapped, "unmapped.tsv")
write_tabular(result.sample_table, "samples.tsv")
write_html_report(result.qc_summary, "qc_report.html")

Benchmark Helpers

Known-answer fixtures can be scored from Python with omicsmeta.benchmark.benchmark_file:

from omicsmeta.benchmark import benchmark_file

summary = benchmark_file(
    "examples/basic/metadata.tsv",
    "examples/basic/expected_harmonized.tsv",
)
print(summary["overall"])

The benchmark compares accepted direct mappings to expected sample_id, field_type, and ontology_id triples. Inferred mappings are excluded so benchmark scores reflect direct term mapping behavior.

Run a TSV manifest of benchmark cases with benchmark_suite:

from omicsmeta.benchmark import benchmark_suite

summary = benchmark_suite("benchmarks/known_answer_suite.tsv")
print(summary["case_count"], summary["overall"]["f1"])