Bahmni is an open-source EMR (http://www.bahmni.org/). These slides explain how to start performing exploratory data analysis of Bahmni data using R. Here is the playlist of YouTube videos that explain the code in these slides:
https://www.youtube.com/playlist?list=PLzknGpbejfSyYEvUJnhJwifqHE_6ztW2o
Exploratory Data Analysis of Bahmni with R
1. EDA (Exploratory Data Analysis) of Bahmni (EMR) data
Karrtik Iyer
Mail: @karrtik
Tweets @karrtikiyer
YouTube playlist
2. Purpose
Explore EMR data collected over a period of time to:
1. Derive insights
2. Observe trends
3. Establish probable correlations
4. Help the community get started with exploring their own EMR data
3. Objectives/Agenda
1. Look at patient trends across various regions
2. Top 10 diagnoses reported
3. Pick the top diagnosis for further analysis
a. Male/Female ratio
b. Top regions/villages
c. Age distribution
d. Year-wise trend
e. Explore observations/results and chief complaints reported for these patients.
4. Insights from data and challenges
5. Quick peek into other insights which can be derived from this EMR data.
4. Pre-requisites
1. Basic knowledge of
a. Bahmni/OpenMRS data model and concept dictionary
b. SQL
c. R (RStudio IDE)
2. PC/Mac/Linux machine set up with
a. MySQL client to connect to the MySQL server hosting the anonymized Bahmni DB;
the server can be either local or remote
b. R and RStudio installed
5. Why R?
1. Open source with great community support.
2. Many built-in packages for descriptive and predictive analytics that can
be used out of the box.
a. Very good mix of packages for querying and plotting the data
3. Easy to learn and use
8. Fundamentals
1. Exploring tables and columns of our interests
2. Using R/RStudio
a. Connect to the MySQL DB
b. Load required R packages
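The two steps above could look roughly like the following sketch. It assumes the DBI and RMySQL packages are installed; the helper name `connect_bahmni` and all default values (host, user, password, the database name `openmrs`) are placeholders for illustration, not values taken from the slides.

```r
# Packages this sketch relies on (assumed to be installed):
#   install.packages(c("DBI", "RMySQL"))

# Hypothetical helper: opens a connection to the Bahmni/OpenMRS MySQL database.
# Every default below is a placeholder; replace it with your own setting.
connect_bahmni <- function(host = "localhost", user = "analyst",
                           password = "secret", dbname = "openmrs",
                           port = 3306) {
  DBI::dbConnect(RMySQL::MySQL(), host = host, user = user,
                 password = password, dbname = dbname, port = port)
}

# Usage (needs a reachable MySQL server, so it is left commented out):
# library(DBI)
# con <- connect_bahmni()
# dbListTables(con)             # explore the available tables
# dbListFields(con, "person")   # explore the columns of one table
# dbDisconnect(con)
```

Wrapping the connection in a small function keeps credentials in one place; in practice they are better read from environment variables than hard-coded.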
9. Patients across Regions
1. Number of patients reported across various cities/villages.
2. Percentage split of male/female patients
3. Percentage of patients from each region in top 10 cities/villages
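As a minimal sketch of these three summaries in base R, the snippet below uses a small made-up data frame in place of a real query result. The column names `city_village` and `gender` are assumptions about how the query output is shaped.

```r
# Hypothetical sample standing in for a query result with one row per patient.
patients <- data.frame(
  city_village = c("A", "A", "A", "B", "B", "C"),
  gender       = c("M", "F", "F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# 1. Number of patients per city/village, largest first.
by_city <- sort(table(patients$city_village), decreasing = TRUE)

# 2. Percentage of male and female patients overall.
gender_pct <- round(100 * prop.table(table(patients$gender)), 1)

# 3. Share of each city/village within the top 10 cities/villages.
top10 <- head(by_city, 10)
top10_pct <- round(100 * prop.table(top10), 1)

print(by_city)
print(gender_pct)
print(top10_pct)
```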
12. Top 10 diagnoses
1. Explore the distribution of reported diagnoses across males and females
2. Pick the top 10 diagnoses and look at the male/female ratio
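One possible way to get the top 10 diagnoses with their male/female split is a cross-tabulation, sketched below in base R. The data frame and its column names (`diagnosis`, `gender`) are made up for illustration.

```r
# Hypothetical sample: one row per recorded diagnosis.
diagnoses <- data.frame(
  diagnosis = c("Gastritis", "Gastritis", "Gastritis",
                "Diabetes", "Diabetes", "Anemia"),
  gender    = c("F", "M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# Cross-tabulate diagnosis against gender.
counts <- table(diagnoses$diagnosis, diagnoses$gender)

# Rank diagnoses by total count and keep (at most) the top 10.
totals     <- sort(rowSums(counts), decreasing = TRUE)
top_names  <- names(head(totals, 10))
top_counts <- counts[top_names, , drop = FALSE]

# Row-wise proportions give the male/female share per diagnosis.
top_ratio <- round(prop.table(top_counts, margin = 1), 2)

print(top_counts)
print(top_ratio)

# Optional grouped bar chart, one group per diagnosis:
# barplot(t(top_counts), beside = TRUE, legend.text = TRUE)
```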
13. Top Diagnosis - Gastritis
Look at
1. Top 5 regions
a. With Male/Female distribution
2. Age distribution for Male/Female in the top 5 regions.
a. Boxplot
b. Histogram
3. Year-wise trend
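The plots listed above can be sketched with base R graphics as follows. The sample rows and the column names (`city_village`, `gender`, `age`, `diagnosis_date`) are assumptions for illustration, not real Bahmni data.

```r
# Hypothetical sample of gastritis patients.
gastritis <- data.frame(
  city_village   = c("A", "A", "B", "B", "C", "C"),
  gender         = c("F", "M", "F", "F", "M", "F"),
  age            = c(34, 51, 27, 45, 62, 38),
  diagnosis_date = as.Date(c("2014-03-10", "2014-11-02", "2015-01-15",
                             "2015-06-30", "2015-09-12", "2016-02-20")),
  stringsAsFactors = FALSE
)

# Keep only the top 5 regions by patient count.
region_counts <- sort(table(gastritis$city_village), decreasing = TRUE)
top_regions   <- names(head(region_counts, 5))
top5 <- gastritis[gastritis$city_village %in% top_regions, ]

# Boxplot of age split by gender.
boxplot(age ~ gender, data = top5,
        main = "Age by gender", ylab = "Age (years)")

# Histogram of age.
hist(top5$age, main = "Age distribution", xlab = "Age (years)")

# Year-wise trend: number of diagnoses per calendar year.
per_year <- table(format(top5$diagnosis_date, "%Y"))
plot(as.integer(names(per_year)), as.vector(per_year), type = "b",
     xlab = "Year", ylab = "Patients")
```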
17. Explore results for top diagnosis - Gastritis
1. Gather all results for patients with gastritis.
2. Look at important results for female patients to identify any trends
18. Top Chronic Diagnosis - Diabetes
1. Gather all the lab results
2. Explore HbA1c results.
a. Lack of consistent data
3. Analyze Hemoglobin levels
a. Outliers
b. Flooring and Capping
c. Check for gender bias in 12 to 18 age group
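Outlier detection followed by flooring and capping can be sketched as below. The slides do not say which rule was used to pick the limits; this example assumes the common 1.5 x IQR fences, and the hemoglobin values are made up.

```r
# Hypothetical hemoglobin values (g/dL) with two implausible entries.
hb <- c(11.2, 12.5, 13.1, 9.8, 14.0, 12.2, 1.2, 45.0, 10.9, 13.4)

# Outlier fences from the 1.5 * IQR rule (an assumed choice of rule).
q     <- unname(quantile(hb, c(0.25, 0.75)))
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values falling outside the fences.
outliers <- hb[hb < lower | hb > upper]

# Flooring and capping: pull values outside the fences back to the fences.
hb_capped <- pmin(pmax(hb, lower), upper)

print(outliers)
print(hb_capped)
```

Percentile-based limits (for example the 1st and 99th percentiles) are an equally common alternative; only the computation of `lower` and `upper` would change.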
21. What’s next?
1. Better understanding of data
2. Data cleaning and preparation
a. City/Village misspelled
b. Outlier detection and replacement strategy
c. Descriptive statistics, measures of central tendency, skewness, hypothesis testing.
3. Feature transformation
a. Extract new features
i. Like Average sugar levels from fasting and postprandial blood sugar levels
ii. Binning of variables such as age into infant, youth, adult, etc.
b. Natural Language processing (NLP)
i. Chief complaints
4. Clustering of patients
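The two feature-extraction ideas listed above (an average sugar level and age binning) could be sketched in base R as follows. The column names, the sample values, and the age breakpoints are all assumptions chosen for illustration.

```r
# Hypothetical lab data, one row per patient.
labs <- data.frame(
  age                = c(0.5, 15, 30, 70),
  fasting_sugar      = c(NA, 90, 110, 150),
  postprandial_sugar = c(NA, 130, 160, 220)
)

# New feature: mean of fasting and postprandial blood sugar.
# Rows with a missing reading stay NA rather than being silently averaged.
labs$avg_sugar <- rowMeans(labs[, c("fasting_sugar", "postprandial_sugar")])

# Binning: turn age into categories. The breakpoints are assumed;
# right = FALSE makes each interval closed on the left, e.g. [1, 18).
labs$age_group <- cut(labs$age,
                      breaks = c(0, 1, 18, 60, Inf),
                      labels = c("infant", "youth", "adult", "senior"),
                      right = FALSE)

print(labs)
```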