There is a universal shortage of organs available for transplantation leading to thousands of people dying on transplant waiting lists every year. Every country has their own system for organ donation, allocation, and transplantation, leading to vastly different donation rates across countries.
This case study will use superheat to combine two sources of data in order to explore the recent trends in organ donation worldwide and its relation to Human Development using data from the WHO Global Observatory on Donation and Transplantation database and Human Development data from The UN.
Specifically, we will compare the organ donation trends of 58 countries over a 9-year time period spanning from 2006-2014, and simultaneously examine the Human Development Index of these countries.
# load in some useful libraries
library(knitr)
library(dplyr)
library(DT)
library(reshape2)
The organ donation dataset can be downloaded from the WHO Global Observatory on Donation and Transplantation database. The Global Observatory on Donation and Transplantation represents the most comprehensive source to date of worldwide data concerning activities in organ donation and transplantation derived from official sources. The data contains a several measurements concerning the number of organ donations by country from 2000-2014.
The searchable table below displays the raw data, much of which is missing. The data contains 19 measurements on various organ donation counts, for 194 countries.
# load in the data
organs.original <- read.csv("raw_data/global-organ-donation.csv")
# place data in an interactive data table
DT::datatable(organs.original, fillContainer = T,
height = 600, options = list(scrollY = 350))
The column names in the original dataset are somewhat unintelligible. Below we modify them so that they are in a human-readable format.
# clean the column names
clean.colnames <- c("region", "country", "year",
"population",
"total.deceased.donors",
"brain.dead.donors",
"circulatory.death.donors",
"total.utilized.deceased.donors",
"utilized.brain.dead.donors",
"utilized.circulatory.death.donors",
"deceased.kidney.donors",
"living.kidney.donors",
"total.kidney.donors",
"deceased.liver.donors",
"domino.liver.donors",
"living.liver.donors",
"total.liver.donors",
"deceased.lung.donors",
"living.lung.donors",
"total.lung.donors",
"total.pancreas.donors",
"total.kidney.pancreas.donors",
"total.small.bowel.donors")
# clean the organs dataset
organs.clean <- organs.original
colnames(organs.clean) <- clean.colnames
We also want to convert the population
variable to numeric format (it is currently a factor).
# convert population to numeric
organs.clean$population <- as.numeric(as.character(organs.clean$population))
Each variable (column) is described below:
region
: The world regioncountry
: The countryyear
: The yearpopulation
: The country’s populationtotal.deceased.donors
: The total number of organs donated from deceased individuals. This should be equal to brain.dead.donors
+ circulatory.death.donors
brain.dead.donors
: The number of organs donated from deceased individuals whose death has been determined by neurological criteria (“brain death”)circulatory.death.donors
: The number of organs donated from deceased individuals whose death has been determined by circulatory criteria (“circulatory death”)total.utilized.deceased.donors
: The number of organs from deceased donors that were used utilized in a transplantutilized.brain.dead.donors
: The number of organs from deceased donors (whose death was determined to be brain death) that were utilized in a transplantutilized.circulatory.death.donors
: The number of organs from deceased donors (whose death was determined to be circulatory death) that were utilized in a transplantdeceased.x.donors
: The number of organ “x” from deceased donorsliving.x.donors
: The number of organ “x” from living donorstotal.x.donors
: The total number of organ “x” from deceased and living donorsdomino.liver.donors
: The number of domino liver donorsThese variable names are significantly more intuitive and are presented in a consistent format. For example, a variable whose name begins with total
corresponds to the total of two or more other variables (e.g. total.kidney.donors
should be the sum of deceased.kidney.donors
and living.kidney.donors
).
The structure of the data is described in the Figure below which displays the variable hierarchy. Variables enclosed in a box are expected to add up to their parent variable (this turns out to not always be the case, but this is the expectation).
There are a number of countries that appear to have no recorded donation data (note that this does not necessarily mean that there were no organ donations, only that the data was not provided).
The countries that have no recorded measurements for the summary measurements deceased.donors
, brain.dead.donors
, and circulatory.death.donors
are identified and removed from our dataset.
countries.no.data <- organs.clean %>% group_by(country) %>%
# identify how many missing values are recorded for each country in the summary columns
summarise(region = unique(region),
n = n(),
missing.deceased = sum(is.na(total.deceased.donors)),
missing.brain.dead = sum(is.na(brain.dead.donors)),
missing.circulatory.death = sum(is.na(circulatory.death.donors))) %>%
# identify countries that have completely missing data
mutate(no.data = ((missing.deceased == 15) &
(missing.brain.dead == 15) &
(missing.circulatory.death == 15))) %>%
# filter to countries with no recorded data
filter(no.data)
# how many countries have missing data
length(unique(countries.no.data$country))
## [1] 80
Below, we remove these 80 countries from our dataset.
# remove the missing data countries from the data table
organs.clean <- organs.clean %>%
filter(!(country %in% countries.no.data$country))
Since we are only interested in the total donor counts, we decide to extract only the total.deceased.donors
variable.
# total donor counts
donor.totals <- organs.clean %>%
select(region, country, year, population,
count = total.deceased.donors)
# look at the first 6 rows
kable(head(donor.totals))
region | country | year | population | count |
---|---|---|---|---|
Eastern Mediterranean | United Arab Emirates | 2000 | 2.4 | NA |
Europe | Albania | 2000 | 3.1 | NA |
Europe | Armenia | 2000 | 3.5 | NA |
America | Argentina | 2000 | 37.0 | NA |
Europe | Austria | 2000 | 8.2 | 194 |
Western Pacific | Australia | 2000 | 18.9 | 195 |
Next, we want to add a variable for “count by population” to each table.
# add a count-by-population variable
donor.totals <- donor.totals %>%
mutate(count.by.pop = count / population)
The primary data matrix that we will use will be the total number of organ donations by year from 2006 and 2014 and by country, filtering to countries with no more than a single missing value in that time period.
The primary matrix of interest has each year as a row and each country as a column. The cell entries correspond to the number of organ donations from deceased persons per 100,000 individuals.
# filter to only deceased donors from 2006 to 2014
total.organs <- donor.totals %>%
filter(year > 2005,
year < 2015)
# identify countries with more than one missing value
few.missing.country <- total.organs %>%
group_by(country) %>%
# is there more than 1 year with missing data?
summarise(missing = sum(is.na(count.by.pop)) > 1,
missing_num = sum(is.na(count.by.pop))) %>%
# remove these countries for which there is more than 1 year missing
filter(!missing) %>%
# isolte the unfiltered countries
select(country) %>%
unlist %>% as.character
# remove countries with more than 1 missing value
total.organs <- total.organs %>%
filter(country %in% few.missing.country)
# cast to year by country matrix
donor.matrix <- acast(total.organs,
country ~ year,
value.var = "count.by.pop")
# view the first 6 rows and columns
kable(head(donor.matrix[, 1:6]))
2006 | 2007 | 2008 | 2009 | 2010 | 2011 | |
---|---|---|---|---|---|---|
Argentina | 11.815857 | 12.303797 | 13.007519 | 12.406948 | 14.324324 | 14.8039216 |
Australia | NA | 9.611650 | 12.333333 | 11.596244 | 14.046512 | 14.9115044 |
Austria | 25.243902 | 22.560976 | 20.476191 | 25.476191 | 23.333333 | 24.4047619 |
Belgium | 27.115385 | 28.380952 | 26.095238 | 26.886792 | 20.654206 | 30.6481481 |
Brazil | 5.886712 | 5.483534 | 6.853759 | 8.017553 | 9.897646 | 11.2201322 |
Bulgaria | 2.467532 | 1.184211 | 1.052632 | 1.466667 | 2.666667 | 0.5405405 |
Next, we want to be able to combine the organ donor data with Human Development Index (HDI) data. The data can be downloaded from the United Nations Development Program’s Human Development Reports. The HDI is calculated based on life expectancy, education and per capita indicators and has been shown to be correlated with organ transplantation.
# read in gdp data
hdi <- read.csv("raw_data/hdi.csv", header = F)
The HDI data was read in without proper column names. The code below cleans the data so that it has a standard format
# the column names are stored in the second row
colnames(hdi) <- (hdi[2,])
# remove the first two rows
hdi <- hdi[3:nrow(hdi),]
# change column names 1 and 2
colnames(hdi)[1:2] <- c("rank", "country")
# melt to long form
hdi <- melt(hdi, id.vars = c("country", "rank"))
# change column names
colnames(hdi) <- c("country", "rank", "year", "hdi")
# rearrange order of columns
hdi <- hdi %>% dplyr::select(country, year, rank, hdi)
# remove blank space after country name
hdi$country <- gsub(" ", "", as.character(hdi$country))
hdi$rank <- as.numeric(as.character(hdi$rank))
The table below displays the first 6 rows of the HDI dataset.
knitr::kable(head(hdi))
country | year | rank | hdi |
---|---|---|---|
Afghanistan | 1980 | 171 | 0.228 |
Albania | 1980 | 85 | 0.625 |
Algeria | 1980 | 83 | NA |
Andorra | 1980 | 34 | NA |
Angola | 1980 | 149 | NA |
AntiguaandBarbuda | 1980 | 58 | NA |
Next, we need to make sure that all of our countries in our organ donor dataset also appear in the HDI dataset. The code below shows that some of the countries are coded differently (e.g. the HDI dataset does not have spaces whereas the organ dataset does). We thus convert the country names in the HDI dataset to those in the organ dataset.
# which countries in donor.matrix are not in gdp?
rownames(donor.matrix)[!(rownames(donor.matrix) %in% unique(hdi$country))]
## [1] "Czech Republic"
## [2] "Dominican Republic"
## [3] "Iran (Islamic Republic of)"
## [4] "New Zealand"
## [5] "Republic of Korea"
## [6] "Saudi Arabia"
## [7] "South Africa"
## [8] "United Kingdom"
## [9] "United States of America"
## [10] "Venezuela (Bolivarian Republic of)"
# to find column names: unique(hdi$country)[grep("Venez", unique(hdi$country))]
# change country names accordingly
hdi[hdi$country == "CzechRepublic", "country"] <- "Czech Republic"
hdi[hdi$country == "DominicanRepublic", "country"] <- "Dominican Republic"
hdi[hdi$country == "Iran(IslamicRepublicof)", "country"] <- "Iran (Islamic Republic of)"
hdi[hdi$country == "NewZealand", "country"] <- "New Zealand"
hdi[hdi$country == "Korea(Republicof)", "country"] <- "Republic of Korea"
hdi[hdi$country == "SaudiArabia", "country"] <- "Saudi Arabia"
hdi[hdi$country == "SouthAfrica", "country"] <- "South Africa"
hdi[hdi$country == "UnitedKingdom", "country"] <- "United Kingdom"
hdi[hdi$country == "UnitedStates", "country"] <- "United States of America"
hdi[hdi$country == "Venezuela(BolivarianRepublicof)", "country"] <- "Venezuela (Bolivarian Republic of)"
Finally, we can filter the HDI dataset to the countries for which we have organ donor data. Meanwhile, we restrict the HDI rankings to the year 2014 only. These rankings will be placed as bars next to the rows of our heatmap.
hdi.match <- left_join(data.frame(country = rownames(donor.matrix)),
hdi,
by = "country")
# filter to 2014 hdi
hdi.match.2014 <- hdi.match %>% filter(year == 2014)
Our goal now is to visualise the change in organ donation trends over time by country and assess the relationship between the organ donation.
Below we calculate the total number of organ donations by year aggregated over the countries. We will plot this above the columns/years of the heatmap.
# organ donations by year (aggregated across countries)
organs.by.year <- total.organs %>%
group_by(year) %>%
summarise(total = sum(count, na.rm = TRUE)) %>%
select(total) %>%
unlist
Next, we calculate the average (over years) number of transplants per 100,000 population for each country. We will use this to order the countries in the heatmap.
# order countries by number of transplants averaged over years
organs.by.country = total.organs %>%
group_by(country) %>%
summarise(total = mean(count / population, na.rm = TRUE) * 100000)
# do a left join to make sure that the countries are in the correct order
organs.by.country <- left_join(data.frame(country = rownames(donor.matrix)),
organs.by.country,
by = "country")
organs.by.country <- organs.by.country$total
We also need to identify the region of each country.
# idenitfy the region of each country
country.region <- total.organs %>%
filter(country %in% rownames(donor.matrix)) %>%
select(region, country) %>%
distinct(region, country) %>%
arrange(country)
# define a colour for each region
region.col <- factor(country.region$region)
levels(region.col) <- c("#e6f5c9", "#fdcdac", "#cbd5e8",
"#b3e2cd", "#f4cae4", "#fff2ae")
region.col.dark <- region.col
levels(region.col.dark) <- c("#a6d854", "#fc8d62", "#8da0cb",
"#66c2a5", "#e78ac3", "#ffd92f")
region.col <- as.character(region.col)
region.col.dark <- as.character(region.col.dark)
Installing the superheat package from github is easy if you have the devtools
package installed in R. Simply type the following command:
# install devtools if you don't have it already
install.packages("devtools")
# install the development version of superheat
devtools::install_github("rlbarter/superheat")
Assuming that you didn’t run into any unfortunate errors when installing the package, you can load the package into R in the normal way.
library(superheat)
Finally, we can plot a superheatmap. The heatmap contains the number of organ donations per 100,000 individuals in each country for each year. To the right of the heatmap, adjacent to each row, we plot the country’s Human Development Index, and above the heatmap, we place a line plot which displays the organ donation trend (aggregate across countries) over time.
library(RColorBrewer)
superheat(as.matrix(donor.matrix),
# set heatmap color map
heat.pal = brewer.pal(5, "BuPu"),
heat.na.col = "white",
# order rows in increasing order of donations
order.rows = order(organs.by.country),
# grid line colors
grid.vline.col = "white",
# right plot: HDI
yr = as.numeric(as.character(hdi.match.2014$rank)),
yr.plot.type = "bar",
yr.axis.name = "Human Development\nRanking",
yr.plot.size = 0.5,
yr.bar.col = region.col.dark,
yr.obs.col = region.col,
# top plot: donations by year
yt = organs.by.year,
yt.plot.type = "scatterline",
yt.axis.name = "Total number\nof transplants\nper year",
yt.plot.size = 0.2,
yt.point.size = 4,
yt.line.size = 2,
# left labels
left.label.size = 0.5,
left.label.text.size = 3,
left.label.col = adjustcolor(region.col, alpha.f = 0.3),
# bottom labels
bottom.label.size = 0.05,
bottom.label.col = "white",
bottom.label.text.angle = 90,
bottom.label.text.alignment = "right")