aces.configs package¶
Module contents¶
This subpackage contains the Hydra configuration groups for ACES, which can be used for aces-cli.
Configuration Group File Structure:
config/
├─ data/
│ ├─ single_file.yaml
│ ├─ defaults.yaml
│ ├─ sharded.yaml
├─ aces.yaml
aces-cli help message:
================== aces-cli ===================
Welcome to the command-line interface for ACES!
This end-to-end tool extracts a cohort from the external dataset based on a defined task configuration
file and saves the output file(s). Several data standards are supported, including `meds` (requires a
dataset in the MEDS format, either with a single shard or multiple shards), `esgpt` (requires a dataset
in the ESGPT format), and `direct` (requires a pre-computed predicates dataframe as well as a timestamp
format). Hydra multi-run (`-m`) and sweep capabilities are supported, and launchers can be configured.
------------- Configuration Groups ------------
$APP_CONFIG_GROUPS
`data` is defaulted to `data=single_file`. Use `data=sharded` to enable extraction with multiple shards
on MEDS.
------------------ Arguments ------------------
data.*:
- path (required): path to the data directory if using MEDS with multiple shards or ESGPT, or path to
the data `.parquet` if using MEDS with a single shard, or path to the predicates dataframe
(`.csv` or `.parquet`) if using `direct`
- standard (required): data standard, one of 'meds', 'esgpt', or 'direct'
- ts_format (required if data.standard is 'direct'): timestamp format for the data
- root (required, applicable when data=sharded): root directory for the data shards
- shard (required, applicable when data=sharded): shard number of specific shard from a MEDS dataset.
Note: data.shard can be expanded using the `expand_shards` function. Please refer to
https://eventstreamaces.readthedocs.io/en/latest/usage.html#multiple-shards and
https://github.com/justin13601/ACES/blob/main/src/aces/expand_shards.py for more information.
cohort_dir (required): cohort directory, used to automatically load configs, saving results, and logging
cohort_name (required): cohort name, used to automatically load configs, saving results, and logging
config_path (optional): path to the task configuration file, defaults to '<cohort_dir>/<cohort_name>.yaml'
predicates_path (optional): path to a separate predicates-only configuration file for overriding
output_filepath (optional): path to the output file, defaults to '<cohort_dir>/<cohort_name>.parquet'
---------------- Default Config ----------------
$CONFIG
------------------------------------------------
All fields may be overridden via the command-line interface. For example:
aces-cli cohort_name="..." cohort_dir="..." data.standard="..." data="..." data.root="..."
"data.shard=$$(expand_shards .../...)" ...
For more information, visit: https://eventstreamaces.readthedocs.io/en/latest/usage.html
Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
===============================================