---
title: "Age-standardized rates"
date: "3/9/2022"
output: rmarkdown::html_vignette
header-includes:
   - \usepackage{amsmath}
vignette: >
  %\VignetteIndexEntry{Age-standardized rates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: bib.bib
link-citations: yes
nocite: |
  @donegan_2022
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  results = "hold", 
  collapse = TRUE, 
  eval = TRUE,
  fig.pos = 'h', 
  fig.align = 'center'
)
```

This vignette describes the method of direct age-standardization [@broemeling_2020] and then demonstrates implementation with *surveil*. 

## Data

The demonstration will use age-specific cancer incidence for the entire population of the United States of America, 1999-2017. 

```{r}
library(surveil)
data(cancer)
head(cancer)
```

We will also use the age distribution of the United States in the year 2000 (2000 U.S. standard million population, see `?standard`):

```{r}
data(standard)
print(standard)
```

Notice that the five-year age groups in the `cancer` data match the age groups provided by `standard`. 

In some cases, one is only interested in a subset of age groups. For the following examples, we will limit the analysis to persons 50-69 years old:

```{r}
cancer2 <- subset(cancer, grepl("50-54|55-59|60-64|65-69", Age))
head(cancer2)
```

Subsetting the data allows for faster results. If instead of making this selection we were to use the entire age distribution in our analysis, all of the following discussion and code could still proceed unchanged. 

## Direct age-standardization

Let $\theta_i$ be the disease risk in the $i^{th}$ age group, and let $\omega_i$ be the standard population count for that age group. Then the age-standardized risk is:
$$SR = \frac{\sum_i \theta_i \omega_i}{\sum_i \omega_i}$$
That is, age-standardization consists of multiplying actual age-specific rates by false, but fixed, population sizes. This enables comparisons to be made across populations that have different age structures.

There are two steps to producing age-standardized rates using **surveil**:

 1. Model time trends for each age group using `stan_rw`.
 2. Convert the age-specific model results into age-standardized rates using the `standardize` function.

## Modeling age-specific risk

To model age-specific rates, provide the `stan_rw` function with the cancer data and tell it which column contains the time period indicator (`Year`) and which column contains the grouping variable (`Age`):

```{r}
fit <- stan_rw(cancer2, time = Year, group = Age,
               refresh = 0,# silences some printing	       
               iter = 2e3,
               chains = 2) # for demo speed only. Use the default chains = 4
```

The default plot method will return all of the age-specific time trends on the same plot. It is sometimes easier to understand the results using a grid of multiple small plots (`facet = TRUE`). There is also an option to allow the scale of the y-axes to adjust for each age group (`facet_scales = "free"`):

```{r fig.height = 4.5, fig.width = 6.5}
plot(fit, 
     facet = TRUE,          # plot small multiples
     facet_scales = "free", # y-axes vary across plots
     base_size = 10,      # control text size 
     size = 0,            # removes crude rates from the plots
     scale = 100e3        # plot rates per 100,000
     )
```

The figures contain estimates with shaded 95\% credible intervals.

In addition to examining trends in age-specific rates (as above), we can also convert each age-specific trend to its annual percent change or cumulative percent change. 

```{r fig.height = 4.5, fig.width = 6.5}
fit_apc <- apc(fit)
plot(fit_apc, 
     base_size = 10,
     cum = TRUE)
```

## Age-standardizing model results

The `standardize` function takes a fitted model, plus the standard population, and returns standardized rates (SRs):

```{r}
fit_sr <- standardize(fit,
                        label = standard$age,
                        standard_pop = standard$standard_pop)
```

As usual, *surveil* provides a plotting method for the `fit_sr` object:

```{r fig.height = 4, fig.width = 5}
# load ggplot2 to enable additional plot customization
library(ggplot2)
plot(fit_sr, scale = 100e3, base_size = 10) + 
  labs(title = "US age-standardized cancer incidence per 100,000",
  subtitle = "Ages 50-69")
```

as well as a printing method:

```{r}
print(fit_sr, scale = 100e3)
```

To learn about the contents of `fit_sr`, see `?standardize` or explores its contents as you would a list using `names(fit_sr)`, `fit_sr$standard_summary`, `head(fit_sr$standard_samples`), etc.

The `apc` function and its methods for printing and plotting can be applied to the age-standardized results:

```{r}
fit_sr_pc <- apc(fit_sr)
```

```{r}
plot(fit_sr_pc, cum = TRUE)
```

## References