aces.run module
Main script for end-to-end cohort extraction.
- aces.run.get_and_validate_label_schema(df: DataFrame) → Table
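Validates the schema of a MEDS label DataFrame.
This function validates the schema of a MEDS label DataFrame, ensuring that it has the correct columns and that those columns are of the correct type. As the examples below show, it raises a ValueError if the mandatory subject_id column is missing, casts the label columns that are present to their schema types (for instance, a UInt32 subject_id becomes int64 and a non-zero integer boolean_value becomes true), and adds any missing label columns filled with nulls. Columns outside the label schema, such as time, do not appear in the output.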
- Parameters:
- df: DataFrame
The MEDS label DataFrame to validate.
- Returns:
The validated MEDS label data as a PyArrow table, with columns re-typed as needed.
- Return type:
pa.Table
- Raises:
ValueError – if the MEDS label DataFrame cannot be made schema compliant, for example because it has no subject_id column.
Examples
>>> from datetime import datetime
>>> import polars as pl
>>> from aces.run import get_and_validate_label_schema
>>> df = pl.DataFrame({})
>>> get_and_validate_label_schema(df)
Traceback (most recent call last):
    ...
ValueError: MEDS Label DataFrame must have a 'subject_id' column of type Int64.
>>> df = pl.DataFrame({
...     "subject_id": pl.Series([1, 3, 2], dtype=pl.UInt32),
...     "time": [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 3)],
...     "boolean_value": [1, 0, 100],
... })
>>> get_and_validate_label_schema(df)
pyarrow.Table
subject_id: int64
prediction_time: timestamp[us]
boolean_value: bool
integer_value: int64
float_value: float
categorical_value: string
----
subject_id: [[1,3,2]]
prediction_time: [[null,null,null]]
boolean_value: [[true,false,true]]
integer_value: [[null,null,null]]
float_value: [[null,null,null]]
categorical_value: [[null,null,null]]