Task Examples¶

Provided below are two examples of mortality prediction tasks that ACES could easily extract subject cohorts for. The configurations have been tested all the provided synthetic data in the repository (sample_data/), as well as the MIMIC-IV dataset loaded using MEDS & ESGPT (with very minor changes to the below predicate definition). The configuration files for both of these tasks are provided in the repository (sample_configs/), and cohorts can be extracted using the aces-cli tool:

aces-cli data.path='/path/to/MIMIC/ESGPT/schema/' data.standard='esgpt' cohort_dir='sample_configs/' cohort_name='...'

For simplicity and consistency of these examples, we will use the following 4 window types and names. In practice, ACES supports arbitrary window types and window names, so you have full flexibility and control in how you define your windows in your task logic.

Window Legend

[1]:

from bigtree import print_tree

from aces import config

[2]:

config_path = "../../../sample_configs"

In-hospital Mortality¶

The below timeline specifies a binary in-hospital mortality prediction task where we aim to predict whether the patient dies (label=1) or is discharged (label=0):

In-hospital Mortality

Suppose we’d like to use all patient data up to and including 24 hours past an admission. We can therefore define the input window as above. We can also place criteria on the windows to filter out cohort. In this case, we’d like to ensure there is sufficient prior input data for our model, so we place a constraint that there must be at least 5 or more records (ie., with unique timestamps) within `input.

Next, suppose we’d like to only include hospital admissions that were longer than 48 hours. To represent this clause, we can specify gap as above with a length of 48 hours (overlapping the initial 24 hours of input). If we then place constraints on gap, preventing it to have any discharge or death events, then the admission must then be at least 48 hours.

Finally, we specify target, which is our prediction horizon and lasts until the immediately next discharge or death event. This allows us to extract a cohort that includes both patients who have died and those who did not (ie., successfully discharged).

In addition to constructing a cohort based on dynamic variables, we can also place constraints on static variables (ie., eye color). Suppose we’d like to filter our cohort to only those with blue eyes.

We can then specify a task configuration as below:

predicates:
  admission:
    code: event_type//ADMISSION
  discharge:
    code: event_type//DISCHARGE
  death:
    code: event_type//DEATH
  discharge_or_death:
    expr: or(discharge, death)

patient_demographics:
  eye_color:
    code: EYE//blue

trigger: admission

windows:
  input:
    start: NULL
    end: trigger + 24h
    start_inclusive: True
    end_inclusive: True
    has:
      _ANY_EVENT: (5, None)
    index_timestamp: end
  gap:
    start: trigger
    end: start + 48h
    start_inclusive: False
    end_inclusive: True
    has:
      admission: (None, 0)
      discharge: (None, 0)
      death: (None, 0)
  target:
    start: gap.end
    end: start -> discharge_or_death
    start_inclusive: False
    end_inclusive: True
    label: death

Predicates¶

To capture our task definition, we must define at least three predicates. Recall that these predicates are dataset-specific, and thus may be different depending on the data standard used or data schema.

For starters, we are specifically interested in mortality “in the hospital”. As such, an admission and a discharge predicate would be needed to represent events where patients are officially admitted “into” the hospital and where patients are officially discharged “out of” the hospital. We also need the death predicate to capture death events so we can accurately capture the mortality component.

Since our task endpoints could be either discharge or death (ie., binary label prediction), we may also create a derived predicate discharge_or_death which is expressed by an OR relationship between discharge and death.

Trigger¶

A prediction can be made for each event specified in trigger. This field must contain one of the previously defined dataset-specific predicates. In our case, we’d like to make a prediction of mortality for each valid admission in our cohort, and thus we set trigger to be the admission predicate.

Windows¶

The windows section contains the remaining three windows we defined previously - input, gap, and target.

input begins at the start of a patient’s record (ie., NULL), and ends 24 hours past trigger (ie., admission). As we’d like to include the events specified at both the start and end of input, if present, we can set both start_inclusive and end_inclusive as True. Our constraint on the number of records is specified in has using the _ANY_EVENT predicate, with its value set to be greater or equal to 5 (ie., unbounded parameter on the right as seen in (5, None)).

Note: Since we’d like to make a prediction at the end of input, we can set index_timestamp to be end, which corresponds to the timestamp of trigger + 24h.

gap also begins at trigger, and ends 48 hours after. As we have already included the left boundary event in trigger (ie., admission), it would be reasonable to not include it again as it should not play a role in gap. As such, we set start_inclusive to False. As we’d like our admission to be at least 48 hours long, we can place constraints specifying that there cannot be any admission, discharge, or death in gap (ie., right-bounded parameter at 0 as seen in (None, 0)).

target begins at the end of gap, and ends at the next discharge or death event (ie., discharge_or_death predicate). We can use this arrow notation which ACES recognizes as event references (ie., -> and <-; see Time Range Fields). In our case, we end target at the next discharge_or_death. Similarly, as we included the event at the end of gap, if any, already in gap, we can set start_inclusive to False.

Note: Since we’d like to make a binary mortality prediction, we can extract the death predicate as a label from target, by specifying the label field to be death.

Task Tree¶

ACES is then able to parse our configuration file and generate the below task tree that captures our task. You can see the relationships between nodes in the tree reflect that of the task timeline:

[3]:

inhospital_mortality_cfg_path = f"{config_path}/inhospital_mortality.yaml"
cfg = config.TaskExtractorConfig.load(config_path=inhospital_mortality_cfg_path)
tree = cfg.window_tree
print_tree(tree)

trigger
├── input.end
│   └── input.start
└── gap.end
    └── target.end

Imminent Mortality¶

The below timeline specifies a binary imminent mortality prediction task where we aim to predict whether the patient dies (label=1) or not (label=0) in the immediate 24 hours following a 2 hour period from any given time:

Imminent Mortality

In this case, we’d like to use all patient data up to and including the triggers (ie., every event). However, as we won’t be placing any constraints on this window, we actually do not need to add it into our task configuration, as ultimately, any and all data rows prior to the trigger timestamp will be included.

You can see that the trigger window essentially encapsulates the entire patient record. This is because we’d like to define a task to predict mortality at every single event in the record for simplicity. In practice, this might not be reasonable or feasible. For instance, you may only be interested in predicting imminent mortality within an admission. In this case, you might create admission_window, starting from admission predicates to discharge_or_death predicates. ACES would create a branch in the task tree from this window, and since the results ensure that all output rows satisfy all tree branches, the cohort would only include triggers on events in admission_window.

For this particular example, we create gap of 2 hours and target of 24 hours following gap. No specific constraints are set for either window, except for the time durations.

We can then specify a task configuration as below:

predicates:
  death:
    code: event_type//DEATH

trigger: _ANY_EVENT

windows:
  gap:
    start: trigger
    end: start + 2 hours
    start_inclusive: True
    end_inclusive: True
    index_timestamp: end
  target:
    start: gap.end
    end: start + 24 hours
    start_inclusive: False
    end_inclusive: True
    label: death

Predicates¶

Only a death predicate is required in this example to capture our label in target. However, as noted here, certain special predicates can be used without explicit definition. In this case, we will make use of _ANY_EVENT.

Trigger¶

A prediction can be made for each and every event. As such, trigger is set to the special predicate _ANY_EVENT.

Windows¶

The windows section contains the two windows we defined - gap and target. In this case, the gap and target windows are defined relative to every single event (ie., _ANY_EVENT). gap begins at trigger, and ends 2 hours after. target beings at the end of gap, and ends 24 hours after.

Note: Since we’d again like to make a binary mortality prediction, we can extract the death predicate as a label from target, by specifying the label field to be death. Additionally, since a prediction would be made at the end of each gap, we can set index_timestamp to be end, which corresponds to the timestamp of _ANY_EVENT + 24h.

Task Tree¶

As in the in-hospital mortality case, ACES is able to parse our configuration file and generate a task tree:

[4]:

imminent_mortality_cfg_path = f"{config_path}/imminent_mortality.yaml"
cfg = config.TaskExtractorConfig.load(config_path=imminent_mortality_cfg_path)

tree = cfg.window_tree
print_tree(tree)

trigger
└── gap.end
    └── target.end

Other Examples¶

A few other examples are provided in sample_configs/ of the repository. We will continue to add task configurations to MEDS-DEV, a benchmarking effort for EHR representation learning - stay tuned!