EXTEND

EXTraction of EMR Numerical Data

Overview

EXTraction of EMR Numerical Data (EXTEND) was developed by Tianrun Cai and Katherine P. Liao at Brigham and Women’s Hospital, Frank Rybicki at the University of Ottawa, and Tianxi Cai at the Harvard T.H. Chan School of Public Health. EXTEND is a natural language processing (NLP) tool that efficiently extracts numerical clinical data from different types of narrative notes with high accuracy. By expanding the dictionary and developing new rules, EXTEND can be readily extended to extract additional numerical data important in clinical outcomes research.

GitHub Repo

Installation

The following installation steps have been tested to work in a 64-bit Python 3.7 environment on both Windows 10 and Windows Server 2016.

Installation Steps:

  1. Create a system environment variable called ENTEND_HOME, and set its value to the path where you want the EXTEND main folder to reside.
  2. Download and unzip the EXTEND folder.
  3. In a command-line window, change to the EXTEND-master folder.
  4. Run python setup.py install.
  5. To perform data extraction, select one or more of the variables below (note: variable names are case sensitive).
  6. The current version of EXTEND can extract the variables in this list:
    ['ECOG', 'EF', 'BMI', 'H', 'W', 'RR', 'T', 'BP', 'HR', 'Sat', 'PDL1', 'Crn', 'HbA1C']
    • ECOG: Eastern Cooperative Oncology Group
    • EF: Ejection Fraction
    • BMI: Body Mass Index
    • H: Height
    • W: Weight
    • RR: Respiratory Rate
    • T: Temperature
    • BP: Blood Pressure
    • HR: Heart Rate
    • Sat: Oxygen Saturation
    • PDL1: Programmed death-ligand 1
    • Crn: Creatinine
    • HbA1C: Hemoglobin A1C
  7. Example: To extract EF and BMI, use ['EF', 'BMI'] in the script.
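
Because variable names are case sensitive, it can help to check a selection against the supported list before running an extraction. The Python sketch below shows one way to do that. The names SUPPORTED_VARIABLES and validate_variables are hypothetical, introduced here for illustration only; they are not part of EXTEND, and the sketch does not call any EXTEND function.

```python
# Variable names copied from the list above; matching is case sensitive.
SUPPORTED_VARIABLES = ['ECOG', 'EF', 'BMI', 'H', 'W', 'RR', 'T', 'BP',
                       'HR', 'Sat', 'PDL1', 'Crn', 'HbA1C']


def validate_variables(requested):
    """Return the requested names as a list, or raise ValueError.

    Hypothetical helper for illustration; not part of EXTEND.
    """
    unknown = [name for name in requested if name not in SUPPORTED_VARIABLES]
    if unknown:
        # When only the casing differs, suggest the correctly cased name.
        by_lower = {name.lower(): name for name in SUPPORTED_VARIABLES}
        hints = [
            f"{name!r} (did you mean {by_lower[name.lower()]!r}?)"
            if name.lower() in by_lower else repr(name)
            for name in unknown
        ]
        raise ValueError("Unsupported variable(s): " + ", ".join(hints))
    return list(requested)


if __name__ == "__main__":
    # The selection from the example above passes the check unchanged.
    print(validate_variables(['EF', 'BMI']))
```

A selection such as ['ef'] would raise a ValueError that points to 'EF', which surfaces casing mistakes before any notes are processed.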

Reference

Cai T, Zhang L, Yang N, Kumamaru KK, Rybicki FJ, Cai T, Liao KP. EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research. BMC Medical Informatics and Decision Making. 2019;19(1):226. doi: 10.1186/s12911-019-0970-1. PMID: 31730484; PMCID: PMC6858776.