aces.run module

Main script for end-to-end cohort extraction.

aces.run.cli(): Main entry point for the script, allowing for no-arg help messages.

aces.run.get_and_validate_label_schema(df: DataFrame) → Table

Validates the schema of a MEDS label DataFrame.

This function validates the schema of a MEDS label DataFrame, ensuring that it has the correct columns and that the columns are of the correct type. This function will:

Re-type any of the mandatory MEDS columns to the appropriate type.

Attempt to add the numeric_value or time column if either is missing, setting the added column to None. It will not attempt to add any other missing columns, even if do_retype is True, because the other columns cannot be set to None.

Parameters:

df: DataFrame: The MEDS label DataFrame to validate.

Returns:

The validated MEDS label data as a pyarrow Table, with columns re-typed as needed.

Return type:

pa.Table

Raises:

ValueError – if do_retype is False and the MEDS label DataFrame is not schema compliant.

Examples

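>>> from datetime import datetime
>>> import polars as pl
>>> from aces.run import get_and_validate_label_schema
>>> df = pl.DataFrame({})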
>>> get_and_validate_label_schema(df)
Traceback (most recent call last):
    ...
ValueError: MEDS Label DataFrame must have a 'subject_id' column of type Int64.
>>> df = pl.DataFrame({
...     "subject_id": pl.Series([1, 3, 2], dtype=pl.UInt32),
...     "time": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
...     "boolean_value": [1, 0, 100],
... })
>>> get_and_validate_label_schema(df)
pyarrow.Table
subject_id: int64
prediction_time: timestamp[us]
boolean_value: bool
integer_value: int64
float_value: float
categorical_value: string
----
subject_id: [[1,3,2]]
prediction_time: [[null,null,null]]
boolean_value: [[true,false,true]]
integer_value: [[null,null,null]]
float_value: [[null,null,null]]
categorical_value: [[null,null,null]]
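
The behavior shown above (require subject_id, re-type the columns that are present, and fill absent optional columns with nulls) can be illustrated with a dependency-free sketch. This is not the aces implementation: the name validate_label_columns, the LABEL_SCHEMA mapping, and the plain-dict column representation are hypothetical; the column names are taken from the example output above; and dropping columns that are not in the schema is an assumption made to mirror that output.

```python
from typing import Any, Callable

# Hypothetical schema for illustration: column name -> (caster, required).
# Column names mirror the example output; the casters are assumptions.
LABEL_SCHEMA: dict[str, tuple[Callable[[Any], Any], bool]] = {
    "subject_id": (int, True),
    "prediction_time": (lambda v: v, False),  # left untouched in this sketch
    "boolean_value": (bool, False),
    "integer_value": (int, False),
    "float_value": (float, False),
    "categorical_value": (str, False),
}


def validate_label_columns(columns: dict[str, list]) -> dict[str, list]:
    """Return a new column mapping that follows LABEL_SCHEMA.

    Required columns must be present, present columns are re-typed,
    and missing optional columns are filled with None.
    """
    # Row count is taken from the first column; an empty input has zero rows.
    n_rows = len(next(iter(columns.values()), []))
    out: dict[str, list] = {}
    for name, (cast, required) in LABEL_SCHEMA.items():
        if name not in columns:
            if required:
                raise ValueError(f"Label data must have a {name!r} column.")
            # Optional column is absent: add it as an all-null column.
            out[name] = [None] * n_rows
            continue
        # Column is present: re-type each value, preserving nulls.
        out[name] = [None if v is None else cast(v) for v in columns[name]]
    # Columns outside LABEL_SCHEMA are dropped (assumption, see lead-in).
    return out


validated = validate_label_columns(
    {"subject_id": [1, 3, 2], "boolean_value": [1, 0, 100]}
)
print(validated["boolean_value"])  # [True, False, True]
```

Calling validate_label_columns({}) in this sketch raises ValueError because subject_id is missing, which parallels the first doctest above.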

aces.run.main(cfg: DictConfig) → None