aces.run module

Main script for end-to-end cohort extraction.

aces.run.cli()

Main entry point for the script, allowing for no-arg help messages.
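As an illustration of the "no-arg help" pattern described above, here is a minimal sketch using only the standard library. It is not the ACES implementation: the names `USAGE`, `run_pipeline`, and the `my-tool` command are hypothetical, and the real `cli()` takes no parameters (the `argv` parameter here exists only to make the sketch easy to exercise).

```python
import sys
from typing import List, Optional

# Hypothetical usage text; the real help message is not shown in this page.
USAGE = "usage: my-tool key=value [key=value ...]"


def run_pipeline(args: List[str]) -> int:
    """Hypothetical stand-in for the real work; it only echoes its arguments."""
    print(f"running with arguments: {args}")
    return 0


def cli(argv: Optional[List[str]] = None) -> int:
    """Print a help message when called with no arguments; otherwise run."""
    args = sys.argv[1:] if argv is None else argv
    if not args:
        # No arguments: show help instead of starting a run with defaults.
        print(USAGE)
        return 0
    return run_pipeline(args)
```

Calling `cli([])` prints the usage line and returns without running anything, while `cli(["a=1"])` hands the arguments to the stand-in pipeline.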

aces.run.get_and_validate_label_schema(df: DataFrame) → Table

Validates the schema of a MEDS label DataFrame.

This function validates the schema of a MEDS label DataFrame, ensuring that it has the correct columns and that the columns are of the correct type. This function will:

  1. Re-type any of the mandatory MEDS columns to the appropriate type.

  2. Attempt to add any missing nullable label columns (in the example below: prediction_time, integer_value, float_value, and categorical_value), setting them to None. It will not add a missing subject_id column, as that column cannot be set to None.
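The two steps above can be sketched in plain Python, without polars or pyarrow. This is a simplified illustration under stated assumptions, not the library's code: `LABEL_CASTS`, `_as_datetime`, and `validate_label_columns` are hypothetical names, the column list is taken from the example output in this section, and treating every column except subject_id as nullable is an assumption.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List


def _as_datetime(value: Any) -> datetime:
    """Accept datetimes as-is; reject anything else rather than guess a format."""
    if not isinstance(value, datetime):
        raise TypeError(f"Expected datetime, got {type(value).__name__}")
    return value


# Hypothetical spec: column name -> function that casts one non-null value.
# Column names follow the example output shown in this section.
LABEL_CASTS: Dict[str, Callable[[Any], Any]] = {
    "subject_id": int,
    "prediction_time": _as_datetime,
    "boolean_value": bool,
    "integer_value": int,
    "float_value": float,
    "categorical_value": str,
}


def validate_label_columns(columns: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Check the required column, re-type known columns, null-fill missing ones."""
    if "subject_id" not in columns:
        raise ValueError("Label data must have a 'subject_id' column.")
    n_rows = len(columns["subject_id"])

    validated: Dict[str, List[Any]] = {}
    for name, cast in LABEL_CASTS.items():
        if name in columns:
            # Step 1: re-type existing values, leaving nulls untouched.
            validated[name] = [None if v is None else cast(v) for v in columns[name]]
        else:
            # Step 2: add the missing column, filled with None.
            validated[name] = [None] * n_rows
    # Columns outside the spec (for example a stray "time" column) are dropped.
    return validated
```

In this sketch, an input with only subject_id and boolean_value comes back with all six columns, and the four that were absent contain only None.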

Parameters:
df: DataFrame

The MEDS label DataFrame to validate.

Returns:

The validated MEDS label table, with columns re-typed as needed.

Return type:

pa.Table

Raises:

ValueError – if the MEDS label DataFrame is not schema compliant, for example when the subject_id column is missing.

Examples

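>>> from datetime import datetime
>>> import polars as pl
>>> from aces.run import get_and_validate_label_schema
>>> df = pl.DataFrame({})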
>>> get_and_validate_label_schema(df)
Traceback (most recent call last):
    ...
ValueError: MEDS Label DataFrame must have a 'subject_id' column of type Int64.
>>> df = pl.DataFrame({
...     "subject_id": pl.Series([1, 3, 2], dtype=pl.UInt32),
...     "time": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
...     "boolean_value": [1, 0, 100],
... })
>>> get_and_validate_label_schema(df)
pyarrow.Table
subject_id: int64
prediction_time: timestamp[us]
boolean_value: bool
integer_value: int64
float_value: float
categorical_value: string
----
subject_id: [[1,3,2]]
prediction_time: [[null,null,null]]
boolean_value: [[true,false,true]]
integer_value: [[null,null,null]]
float_value: [[null,null,null]]
categorical_value: [[null,null,null]]

aces.run.main(cfg: DictConfig) → None