Worldwide Organ Donation and Human Development

There is a universal shortage of organs available for transplantation leading to thousands of people dying on transplant waiting lists every year. Every country has their own system for organ donation, allocation, and transplantation, leading to vastly different donation rates across countries.

This case study will use superheat to combine two sources of data in order to explore the recent trends in organ donation worldwide and its relation to Human Development using data from the WHO Global Observatory on Donation and Transplantation database and Human Development data from The UN.

Specifically, we will compare the organ donation trends of 58 countries over a 9-year time period spanning from 2006-2014, and simultaneously examine the Human Development Index of these countries.

# load in some useful libraries
library(knitr)
library(dplyr)
library(DT)
library(reshape2)

Obtaining the organ data

The organ donation dataset can be downloaded from the WHO Global Observatory on Donation and Transplantation database. The Global Observatory on Donation and Transplantation represents the most comprehensive source to date of worldwide data concerning activities in organ donation and transplantation derived from official sources. The data contains a several measurements concerning the number of organ donations by country from 2000-2014.

The searchable table below displays the raw data, much of which is missing. The data contains 19 measurements on various organ donation counts, for 194 countries.

# load in the data
organs.original <- read.csv("raw_data/global-organ-donation.csv")
# place data in an interactive data table
DT::datatable(organs.original, fillContainer = T, 
              height = 600, options = list(scrollY = 350))

Cleaning the data

The column names in the original dataset are somewhat unintelligible. Below we modify them so that they are in a human-readable format.

# clean the column names
clean.colnames <- c("region", "country", "year", 
                    "population", 
                    "total.deceased.donors", 
                    "brain.dead.donors",
                    "circulatory.death.donors",
                    "total.utilized.deceased.donors",
                    "utilized.brain.dead.donors",
                    "utilized.circulatory.death.donors",
                    "deceased.kidney.donors",
                    "living.kidney.donors",
                    "total.kidney.donors",
                    "deceased.liver.donors",
                    "domino.liver.donors",
                    "living.liver.donors",
                    "total.liver.donors",
                    "deceased.lung.donors",
                    "living.lung.donors",
                    "total.lung.donors",
                    "total.pancreas.donors",
                    "total.kidney.pancreas.donors",
                    "total.small.bowel.donors")
# clean the organs dataset
organs.clean <- organs.original
colnames(organs.clean) <- clean.colnames

We also want to convert the population variable to numeric format (it is currently a factor).

# convert population to numeric
organs.clean$population <- as.numeric(as.character(organs.clean$population))

Each variable (column) is described below:

region: The world region
country: The country
year: The year
population: The country’s population
total.deceased.donors: The total number of organs donated from deceased individuals. This should be equal to brain.dead.donors + circulatory.death.donors
brain.dead.donors: The number of organs donated from deceased individuals whose death has been determined by neurological criteria (“brain death”)
circulatory.death.donors: The number of organs donated from deceased individuals whose death has been determined by circulatory criteria (“circulatory death”)
total.utilized.deceased.donors: The number of organs from deceased donors that were used utilized in a transplant
utilized.brain.dead.donors: The number of organs from deceased donors (whose death was determined to be brain death) that were utilized in a transplant
utilized.circulatory.death.donors: The number of organs from deceased donors (whose death was determined to be circulatory death) that were utilized in a transplant
deceased.x.donors: The number of organ “x” from deceased donors
living.x.donors: The number of organ “x” from living donors
total.x.donors: The total number of organ “x” from deceased and living donors
domino.liver.donors: The number of domino liver donors

These variable names are significantly more intuitive and are presented in a consistent format. For example, a variable whose name begins with total corresponds to the total of two or more other variables (e.g. total.kidney.donors should be the sum of deceased.kidney.donors and living.kidney.donors).

The structure of the data is described in the Figure below which displays the variable hierarchy. Variables enclosed in a box are expected to add up to their parent variable (this turns out to not always be the case, but this is the expectation).

Remove countries for which we have no data

There are a number of countries that appear to have no recorded donation data (note that this does not necessarily mean that there were no organ donations, only that the data was not provided).

The countries that have no recorded measurements for the summary measurements deceased.donors, brain.dead.donors, and circulatory.death.donors are identified and removed from our dataset.

countries.no.data <- organs.clean %>% group_by(country) %>% 
  # identify how many missing values are recorded for each country in the summary columns
  summarise(region = unique(region),
            n = n(),
            missing.deceased = sum(is.na(total.deceased.donors)), 
            missing.brain.dead = sum(is.na(brain.dead.donors)),
            missing.circulatory.death = sum(is.na(circulatory.death.donors))) %>%
  # identify countries that have completely missing data
  mutate(no.data = ((missing.deceased == 15) & 
                    (missing.brain.dead == 15) & 
                    (missing.circulatory.death == 15))) %>%
  # filter to countries with no recorded data
  filter(no.data)
# how many countries have missing data
length(unique(countries.no.data$country))
## [1] 80

Below, we remove these 80 countries from our dataset.

# remove the missing data countries from the data table
organs.clean <- organs.clean %>% 
  filter(!(country %in% countries.no.data$country))

Extracting relevant information

Since we are only interested in the total donor counts, we decide to extract only the total.deceased.donors variable.

# total donor counts
donor.totals <- organs.clean %>% 
  select(region, country, year, population, 
         count = total.deceased.donors)
# look at the first 6 rows
kable(head(donor.totals))

region	country	year	population	count
Eastern Mediterranean	United Arab Emirates	2000	2.4	NA
Europe	Albania	2000	3.1	NA
Europe	Armenia	2000	3.5	NA
America	Argentina	2000	37.0	NA
Europe	Austria	2000	8.2	194
Western Pacific	Australia	2000	18.9	195

Next, we want to add a variable for “count by population” to each table.

# add a count-by-population variable
donor.totals <- donor.totals %>%
  mutate(count.by.pop = count / population)

Summarize into a matrix containing organ transplants by year for each country

The primary data matrix that we will use will be the total number of organ donations by year from 2006 and 2014 and by country, filtering to countries with no more than a single missing value in that time period.

The primary matrix of interest has each year as a row and each country as a column. The cell entries correspond to the number of organ donations from deceased persons per 100,000 individuals.

# filter to only deceased donors from 2006 to 2014
total.organs <- donor.totals %>%
  filter(year > 2005,
         year < 2015)
# identify countries with more than one missing value
few.missing.country <- total.organs %>% 
  group_by(country) %>% 
  # is there more than 1 year with missing data?
  summarise(missing = sum(is.na(count.by.pop)) > 1,
            missing_num = sum(is.na(count.by.pop))) %>%
  # remove these countries for which there is more than 1 year missing
  filter(!missing) %>%
  # isolte the unfiltered countries
  select(country) %>%
  unlist %>% as.character
# remove countries with more than 1 missing value
total.organs <- total.organs %>% 
  filter(country %in% few.missing.country)
# cast to year by country matrix
donor.matrix <- acast(total.organs,
                      country ~ year, 
                      value.var = "count.by.pop")
# view the first 6 rows and columns
kable(head(donor.matrix[, 1:6]))

	2006	2007	2008	2009	2010	2011
Argentina	11.815857	12.303797	13.007519	12.406948	14.324324	14.8039216
Australia	NA	9.611650	12.333333	11.596244	14.046512	14.9115044
Austria	25.243902	22.560976	20.476191	25.476191	23.333333	24.4047619
Belgium	27.115385	28.380952	26.095238	26.886792	20.654206	30.6481481
Brazil	5.886712	5.483534	6.853759	8.017553	9.897646	11.2201322
Bulgaria	2.467532	1.184211	1.052632	1.466667	2.666667	0.5405405

Obtaining the Human Development Index Data

Next, we want to be able to combine the organ donor data with Human Development Index (HDI) data. The data can be downloaded from the United Nations Development Program’s Human Development Reports. The HDI is calculated based on life expectancy, education and per capita indicators and has been shown to be correlated with organ transplantation.

# read in gdp data
hdi <- read.csv("raw_data/hdi.csv", header = F)

Cleaning the data

The HDI data was read in without proper column names. The code below cleans the data so that it has a standard format

# the column names are stored in the second row
colnames(hdi) <- (hdi[2,])
# remove the first two rows
hdi <- hdi[3:nrow(hdi),]
# change column names 1 and 2
colnames(hdi)[1:2] <- c("rank", "country")
# melt to long form
hdi <- melt(hdi, id.vars = c("country", "rank"))
# change column names
colnames(hdi) <- c("country", "rank", "year", "hdi")
# rearrange order of columns
hdi <- hdi %>% dplyr::select(country, year, rank, hdi)
# remove blank space after country name
hdi$country <- gsub(" ", "", as.character(hdi$country))
hdi$rank <- as.numeric(as.character(hdi$rank))

The table below displays the first 6 rows of the HDI dataset.

knitr::kable(head(hdi))

country	year	rank	hdi
Afghanistan	1980	171	0.228
Albania	1980	85	0.625
Algeria	1980	83	NA
Andorra	1980	34	NA
Angola	1980	149	NA
AntiguaandBarbuda	1980	58	NA

Next, we need to make sure that all of our countries in our organ donor dataset also appear in the HDI dataset. The code below shows that some of the countries are coded differently (e.g. the HDI dataset does not have spaces whereas the organ dataset does). We thus convert the country names in the HDI dataset to those in the organ dataset.

# which countries in donor.matrix are not in gdp?
rownames(donor.matrix)[!(rownames(donor.matrix) %in% unique(hdi$country))]
##  [1] "Czech Republic"                    
##  [2] "Dominican Republic"                
##  [3] "Iran (Islamic Republic of)"        
##  [4] "New Zealand"                       
##  [5] "Republic of Korea"                 
##  [6] "Saudi Arabia"                      
##  [7] "South Africa"                      
##  [8] "United Kingdom"                    
##  [9] "United States of America"          
## [10] "Venezuela (Bolivarian Republic of)"
# to find column names: unique(hdi$country)[grep("Venez", unique(hdi$country))]
# change country names accordingly
hdi[hdi$country == "CzechRepublic", "country"] <- "Czech Republic"
hdi[hdi$country == "DominicanRepublic", "country"] <- "Dominican Republic"
hdi[hdi$country == "Iran(IslamicRepublicof)", "country"] <- "Iran (Islamic Republic of)"
hdi[hdi$country == "NewZealand", "country"] <- "New Zealand"
hdi[hdi$country == "Korea(Republicof)", "country"] <- "Republic of Korea"
hdi[hdi$country == "SaudiArabia", "country"] <- "Saudi Arabia"
hdi[hdi$country == "SouthAfrica", "country"] <- "South Africa"
hdi[hdi$country == "UnitedKingdom", "country"] <- "United Kingdom"
hdi[hdi$country == "UnitedStates", "country"] <- "United States of America"
hdi[hdi$country == "Venezuela(BolivarianRepublicof)", "country"] <- "Venezuela (Bolivarian Republic of)"

Finally, we can filter the HDI dataset to the countries for which we have organ donor data. Meanwhile, we restrict the HDI rankings to the year 2014 only. These rankings will be placed as bars next to the rows of our heatmap.

hdi.match <- left_join(data.frame(country = rownames(donor.matrix)),
                          hdi,
                          by = "country")
# filter to 2014 hdi
hdi.match.2014 <- hdi.match %>% filter(year == 2014)

Superheat: an evaluation of organ donations by country and its relationship to HDI

Our goal now is to visualise the change in organ donation trends over time by country and assess the relationship between the organ donation.

Below we calculate the total number of organ donations by year aggregated over the countries. We will plot this above the columns/years of the heatmap.

# organ donations by year (aggregated across countries)
organs.by.year <- total.organs %>% 
  group_by(year) %>% 
  summarise(total = sum(count, na.rm = TRUE)) %>%
  select(total) %>%
  unlist

Next, we calculate the average (over years) number of transplants per 100,000 population for each country. We will use this to order the countries in the heatmap.

# order countries by number of transplants averaged over years
organs.by.country = total.organs %>% 
  group_by(country) %>%
  summarise(total = mean(count / population, na.rm = TRUE) * 100000)
# do a left join to make sure that the countries are in the correct order
organs.by.country <- left_join(data.frame(country = rownames(donor.matrix)),
                               organs.by.country,
                               by = "country")
organs.by.country <- organs.by.country$total

We also need to identify the region of each country.

# idenitfy the region of each country
country.region <- total.organs %>% 
  filter(country %in% rownames(donor.matrix)) %>%
  select(region, country) %>%
  distinct(region, country) %>%
  arrange(country)
# define a colour for each region
region.col <- factor(country.region$region)
levels(region.col) <- c("#e6f5c9", "#fdcdac", "#cbd5e8",
                        "#b3e2cd", "#f4cae4", "#fff2ae")
region.col.dark <- region.col
levels(region.col.dark) <- c("#a6d854", "#fc8d62", "#8da0cb",
                             "#66c2a5", "#e78ac3", "#ffd92f")
region.col <- as.character(region.col)
region.col.dark <- as.character(region.col.dark)

Loading Superheat

Installing the superheat package from github is easy if you have the devtools package installed in R. Simply type the following command:

# install devtools if you don't have it already
install.packages("devtools")
# install the development version of superheat
devtools::install_github("rlbarter/superheat")

Assuming that you didn’t run into any unfortunate errors when installing the package, you can load the package into R in the normal way.

library(superheat)

Finally, we can plot a superheatmap. The heatmap contains the number of organ donations per 100,000 individuals in each country for each year. To the right of the heatmap, adjacent to each row, we plot the country’s Human Development Index, and above the heatmap, we place a line plot which displays the organ donation trend (aggregate across countries) over time.

library(RColorBrewer)
superheat(as.matrix(donor.matrix),
          
          # set heatmap color map
          heat.pal = brewer.pal(5, "BuPu"),
          heat.na.col = "white",
          
          # order rows in increasing order of donations
          order.rows = order(organs.by.country),
          
          # grid line colors
          grid.vline.col = "white",
          
          # right plot: HDI
          yr = as.numeric(as.character(hdi.match.2014$rank)),
          yr.plot.type = "bar",
          yr.axis.name = "Human Development\nRanking",
          yr.plot.size = 0.5,
          yr.bar.col = region.col.dark,
          yr.obs.col = region.col,
          
          # top plot: donations by year
          yt = organs.by.year,
          yt.plot.type = "scatterline",
          yt.axis.name = "Total number\nof transplants\nper year",
          yt.plot.size = 0.2,
          yt.point.size = 4,
          yt.line.size = 2,
          
          # left labels
          left.label.size = 0.5,
          left.label.text.size = 3,
          left.label.col = adjustcolor(region.col, alpha.f = 0.3),
          
          # bottom labels
          bottom.label.size = 0.05,
          bottom.label.col = "white",
          bottom.label.text.angle = 90,
          bottom.label.text.alignment = "right")

Worldwide Organ Donation and Human Development

Rebecca Barter

Obtaining the organ data

Cleaning the data

Remove countries for which we have no data

Extracting relevant information

Summarize into a matrix containing organ transplants by year for each country

Obtaining the Human Development Index Data

Cleaning the data

Superheat: an evaluation of organ donations by country and its relationship to HDI

Loading Superheat