Data Visualization Makeover 02

Critique of Singapore Age-Sex Pyramid

Author

Bryant Philippe Lee

Published

January 29, 2022

Objective

The objective of this exercise is to critique and improve an Age-Sex Pyramid Analytic Visualization of Singapore of the year 2022. The output is to recreate a Trellis Chart Age-Sex Pyramid through R-Studios by using ggplot2, ggplot2 extensions and tidyverse packages.

The visualization is created using the Singapore Residents by Planning Area / Sub-zone, Age Group, Sex and Type of Dwelling, June 2022 (Singstat).

Note

The output should be focused as a static Trellis Chart without filter interactivity.

Original Visualization and Critique

The Original Visualization Source: https://public.tableau.com/app/profile/joseph.zexeong.tan/viz/SingaporePopulationPyramindJun2022v1_3/trel3x3_d?publish=yes

Clarity

  • The Original Visualization shows 2 separate Age-Pyramid Trellis Chart that shows the same message. Two sets of visualization normally indicate that they visualize two different messages that may it be independent of each other or connected. The issue with the dashboard is that both sets indicate the same message just in a different format. This raises either confusion to the reader and a form of redundancy that can be best allocated to something else.

  • 3x3 Trellis Chart limits the visualization to only 9 Age-Sex Pyramids. Limited and focused visualizations are normally selected under a criteria. This criteria may be in terms of highest population cumulative or selective. The original visualization shows no indication of the criteria chosen to only visualize 9 areas and may only be concluded as a random selection.

  • 3x3 Trellis Chart lacks y-axis labels. Taking to consideration that the Trellis Charts is a stand alone visualization, the lack of y-axis label reduces the clarity of the y-axis values. Furthermore, both Trellis Charts lack x-axis values and labels but make up for it through text values per bar. In terms of clarity, one can argue that the lack of x-axis may confuse the reader due to the text values having no reference while others say the taxt vales are enough to compensate the lack of x-axis.

Aesthetic

  • Text Values center bar alignment makes reading the values difficult. Due to the mismatched alignment of the text values brought abut by the central alignment placement in the bar chart makes reading the values difficult as the reader cannot simply scroll down and read the values, not mention creates unnecessary confusion in the graph.

  • Text Values on each Bar Chart overwhelms the visualization. More than the mismatch alignment, the number of the text values in the chart saturates the charts and makes it “noisy”. The text values fill the dashboard with a lot of numbers that is makes the labels and visuals difficult to look at and interpret.

  • Horizontally arranged Trellis Chart limits and squeezes the Age-Sex Pyramids. This visualization arrangement artificially contorts the bar length due to the lack of width space. The squeeze minimizes the visual differentiation between bar charts as such creates an illusion that some bars are of the same size.

Getting Started - Data Loading and Processing

Installing and loading the required libraries

Before we get started, it is important for us to ensure that the required R packages have been installed.

pacman::p_load(tidyverse,ggplot2,knitr, plotly, skimr, questionr, funModeling, sf, tmap, quanteda, NLP, ggiraph)

Importing Data

This code chunk is to import the data from respopagesextod2022.csv file to the Quarto/R page.

pop_data <- read_csv("data/respopagesextod2022.csv")
Rows: 100928 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): PA, SZ, AG, Sex, TOD
dbl (2): Pop, Time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The CSV File contains the following columns:

COLUMN NAME DATA TYPE DESCRIPTION
Planning Area (PA) Character Distinct Planning Areas in Singapore designated and mapped by the Government of Singapore
Subzone (SZ) Character Sub-areas within each Planning Areas
Age Group (AG) Character Sets of population age group by 5 (ex 0-4, 5-9, etc)
Sex Character Binary biological identifier of gender
Type of Dwelling (TOD) Character Available dwelling types in Sinagpore
Population (Pop) Numerical Population per category
Time Numerical Year -> 2022

Data Exploration and Cleaning

This section is to check incorrect and missing values in the data set.

skimr::skim(pop_data)
Data summary
Name pop_data
Number of rows 100928
Number of columns 7
_______________________
Column type frequency:
character 5
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
PA 0 1 4 23 0 55 0
SZ 0 1 4 29 0 332 0
AG 0 1 6 11 0 19 0
Sex 0 1 5 7 0 2 0
TOD 0 1 6 39 0 8 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Pop 0 1 40.44 125.73 0 0 0 20 2300 ▇▁▁▁▁
Time 0 1 2022.00 0.00 2022 2022 2022 2022 2022 ▁▁▇▁▁

Create Age-Sex Pyramid

This section explores the creation of an Age-Sex Pyramid.

Exploration of y-axis Data

Exp1 <- pop_data %>% 
  filter(PA == "Ang Mo Kio") %>%
  group_by(AG, Sex) %>%
  summarise(`sum_pop` = sum(`Pop`), n = n()) %>%
  ungroup()
`summarise()` has grouped output by 'AG'. You can override using the `.groups`
argument.
ggplot(data = Exp1,
       aes(y = AG)) +
  geom_bar() +
  theme_bw() +  
  ggtitle("Test: Age Group Distribution") 

Based on the y-axis (Age Group), 2 issues are noticed:

  • Each Age Group is written with an underscore (“_”) instead of a space in between each word/number

  • The values are organized alphabetically with consideration of the starting value (number or alphabet). As such “5_to_9” came before “45_to_49”

Correcting Age Labels

pop_data$AG <- gsub("_", " ", pop_data$AG, fixed = TRUE)

Correcting Sequence

age_correct <- c("0 to 4", "5 to 9", "10 to 14", "15 to 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59", "60 to 64", "65 to 69", "70 to 74", "75 to 79", "80 to 84", "85 to 89", "90 and over")

pop_sg <- pop_data %>%
  group_by(AG, Sex) %>%
  summarise(`sum_pop` = sum(`Pop`)) %>%
  mutate(AG = factor(AG, levels = age_correct)) %>%
  arrange(AG) %>%
  ungroup()
`summarise()` has grouped output by 'AG'. You can override using the `.groups`
argument.
pop_sg <- pop_sg %>%
  mutate(pct = scales::percent((sum_pop/sum(sum_pop)), accuracy = 0.01),
         res = str_c(sum_pop, ", ", pct))

pop_pa <- pop_data %>%
  group_by(PA, AG, Sex) %>%
  summarise(`sum_pop` = sum(`Pop`)) %>%
  mutate(AG = factor(AG, levels = age_correct)) %>%
  arrange(AG) %>%
  ungroup()
`summarise()` has grouped output by 'PA', 'AG'. You can override using the
`.groups` argument.
pop_pa <- pop_pa %>%
  mutate(pct = scales::percent((sum_pop/sum(sum_pop)), accuracy = 0.01),
         res = str_c(sum_pop, ", ", pct))

Creating a Age-Sex Pyramid

This section explores the visualization of an Age-Sex Pyramid.

Note

This section will be using the total Singapore data (cummulative of all Planning Area)

SG_Pyr <- ggplot(pop_sg,
                    aes(x = ifelse(Sex == "Males", 
                                   yes = sum_pop*(-1),
                                   no = sum_pop),
                        y = AG,
                        fill = Sex)) + 
  geom_col() +
  scale_x_continuous(limits = c(-170000, 170000),
    breaks = seq(-200000, 200000, 50000), 
                     labels = paste0(
                       as.character(
                         c(seq(200, 0, -50), 
                           seq(50, 200, 50))),
                       "k")) +
  scale_y_discrete(expand = expansion(mult = c(0, 0.01))) +
  labs (x = "Count of Population", 
        y = "Age Group",
        fill = "Gender",
        title = "Singapore Population Pyramid 2022",
        subtitle = "Distribution of gender and age groups",
        caption = "Data Source : Singstat.gov.sg, June 2022") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, 
                                  colour = "#424242",
                                  face = "bold",
                                  hjust = 0.5),
        plot.subtitle = element_text(size = 8,
                                     colour = "#424242",
                                     hjust = 0.5),
        plot.caption = element_text(size = 5,
                                    colour = "#424242",
                                    hjust = 0),
        axis.ticks = element_line(colour = "#424242",
                                  linewidth = 0.1),
        axis.title.y = element_text(angle = 0, 
                                    size = 8, 
                                    colour = "#424242",
                                    face = "bold",
                                    vjust = 1.05,
                                    hjust = 1,
                                    margin = margin(r = -40, l = 10)),
        axis.title.x = element_text(size = 8, 
                                    colour = "#424242",
                                    face = "bold"),
        axis.text.x = element_text(size = 7,
                                   colour = "#424242"),
        axis.text.y = element_text(size = 7,
                                   colour = "#424242"),
        legend.position = "bottom",
        legend.justification = "left",
        legend.text = element_text(size = 7,
                                   colour = "#424242"),
        legend.title = element_text(size = 8, 
                                    colour = "#424242"),
        panel.grid.major = element_line(linewidth = rel(0.5)),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#ffffff"),
        legend.background = element_rect(fill = "#ffffff"),
        legend.margin = margin(t = -10),
        panel.border = element_rect(colour = "#424242",
                                    linewidth = 0.3)) 
  
SG_Pyr

Creating A Trellis Age-Sex Pyramid

This section explores the visualization of an Age-Sex Pyramid within a Trellis Chart. Each Pyramid indicates per Planning Area.

Removing Planning Areas without Values

It was previously identified that 26 out of 55 Planning Areas have no value/data. To maximize space, 9 Planning Areas will be retained and filtered.

Note

Due to the limitations of space in r generated visualization in quarto only a limited number of planning areas can be shown

t_pop_pa <- pop_data %>%
  group_by(PA) %>%
  summarise(`sum_pop` = sum(`Pop`)) %>%
  ungroup()

trellis9 <- arrange(t_pop_pa, desc(t_pop_pa$sum_pop)) %>%
  slice(1:9) %>%
  select(PA)

trellis9_filter <- pop_pa %>%
  filter(pop_pa$PA %in% trellis9$PA)

Creating Trellis Chart Age-Sex Pyramid

PA_Pyr <- ggplot() +
  geom_bar(data = subset(trellis9_filter, 
                         Sex == "Males"), 
           aes(x = AG, 
               y = -sum_pop, 
               fill = PA),
           stat = "identity", 
           fill = "#2E9598") +
  geom_bar(data = subset(trellis9_filter, 
                         Sex == "Females"), 
           aes(x = AG, 
               y = sum_pop, 
               fill = PA), 
           stat = "identity", 
           fill = "#EC1B4B") +
  coord_flip() +
  facet_wrap(.~ PA,
             drop = FALSE, 
             ncol = 3,
             scales = "fixed")+
  scale_y_continuous(limits = c(-13000, 13000),
                     breaks = seq(-20000, 20000, 5000), 
                     labels = paste0(
                       as.character(
                         c(seq(200, 0, -50), 
                           seq(50, 200, 50))),
                       "k"), 
                     expand = expansion(mult = c(0, .04)))+
  labs (y = "Population", 
        x = "Age Group",
        fill = "Gender",
        title = "Singapore Population Pyramid 2022",
        subtitle = "Distribution of gender and age groups from Most Populated Planning Areas",
        caption = "Data Source : Singstat.gov.sg, June 2022") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, 
                                  colour = "#424242",
                                  face = "bold",
                                  hjust = 0.5),
        plot.subtitle = element_text(size = 8,
                                     colour = "#424242",
                                     hjust = 0.5),
        plot.caption = element_text(size = 8,
                                    colour = "#424242",
                                    hjust = 0),
        strip.text = element_text(size = 7,
                                  colour = "#424242"),
        strip.background = element_blank(),

        axis.ticks = element_line(colour = "#424242",
                                  linewidth = 0.5),
        axis.ticks.x = element_line(colour = "#424242",
                                  linewidth = 0.5),,
        axis.title.y = element_text(angle = 0, 
                                    size = 8, 
                                    colour = "#424242",
                                    vjust = 1.025,
                                    hjust = 0.7,
                                    margin = margin(r = -20, l = 20)),
        axis.title.x = element_text(size = 8, 
                                    colour = "#424242"),
        axis.text.x = element_text(size = 4,
                                   colour = "#424242"),
        axis.text.y = element_text(size = 4,
                                   colour = "#424242"),
        legend.position = "bottom",
        legend.justification = "left",
        legend.text = element_text(size = 5,
                                   colour = "#424242"),
        legend.title = element_text(size = 8, 
                                    colour = "#424242"),
        panel.grid.major = element_line(linewidth = rel(0.5)),
        panel.grid.major.x = element_line(linewidth = rel(0.5)),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#ffffff"),
        legend.background = element_rect(fill = "#ffffff"),
        legend.margin = margin(t = -10),
        panel.border = element_rect(colour = "#424242",
                                    linewidth = 0.3)) 

PA_Pyr

Learning from Practice

Clarity

  • With the subtitle indicating that these Planning Areas are the most populous in Singapore, the reader is more clear of the message. Not to mention that axis labels and ticks are present to further provide clarity.

  • Due to the limitation of space, a 3x3 Trellis Chart may be the maximum representation that can at least maximize readability. Adding more Age-Sex Pyramids will diminish readability and make the chart more difficult to interpret. Furthermore, the limits in size also affects the font sizes, thus reducing clarity and readability.

Aesthetic

  • The removal of bar numbers lessens the elements present in the visualization and makes in more pleasing to look at. This less “noisy” visualization greatly improves its aesthetic and design.

Interactivity

  • Crafting the visualization in R Studios is more difficult and restrictive as the limits in visual size restrict creative and engaging visuals. Furthermore, transitioning to interactivity may be difficult in the future as the limitations may complicate it further. Future visualization may be better crafted in R Shiny instead of quarto/R Studios.

Further Improvements and Developments

This section improves upon the original visualization by increasing size and adding interactivity.

Increasing Visualization Size

Code
PA_Pyr <- ggplot() +
  geom_bar(data = subset(trellis9_filter, 
                         Sex == "Males"), 
           aes(x = AG, 
               y = -sum_pop, 
               fill = PA),
           stat = "identity", 
           fill = "#2E9598") +
  geom_bar(data = subset(trellis9_filter, 
                         Sex == "Females"), 
           aes(x = AG, 
               y = sum_pop, 
               fill = PA), 
           stat = "identity", 
           fill = "#EC1B4B") +
  coord_flip() +
  facet_wrap(.~ PA,
             drop = FALSE, 
             ncol = 3,
             scales = "fixed")+
  scale_y_continuous(limits = c(-13000, 13000),
                     breaks = seq(-20000, 20000, 5000), 
                     labels = paste0(
                       as.character(
                         c(seq(200, 0, -50), 
                           seq(50, 200, 50))),
                       "k"), 
                     expand = expansion(mult = c(0, .04)))+
  labs (y = "Population", 
        x = "Age Group",
        fill = "Gender",
        title = "Singapore Population Pyramid 2022",
        subtitle = "Distribution of gender and age groups from Most Populated Planning Areas",
        caption = "Data Source : Singstat.gov.sg, June 2022") +
  theme_bw() +
  theme(plot.title = element_text(size = 20, 
                                  colour = "#424242",
                                  face = "bold",
                                  hjust = 0.5),
        plot.subtitle = element_text(size = 14,
                                     colour = "#424242",
                                     hjust = 0.5),
        plot.caption = element_text(size = 12,
                                    colour = "#424242",
                                    hjust = 0),
        strip.text = element_text(size = 12,
                                  colour = "#424242"),
        strip.background = element_blank(),

        axis.ticks = element_line(colour = "#424242",
                                  linewidth = 0.5),
        axis.ticks.x = element_line(colour = "#424242",
                                  linewidth = 0.5),,
        axis.title.y = element_text(angle = 0, 
                                    size = 12, 
                                    colour = "#424242",
                                    vjust = 1.025,
                                    hjust = 0.7,
                                    margin = margin(r = -50, l = 50)),
        axis.title.x = element_text(size = 12, 
                                    colour = "#424242"),
        axis.text.x = element_text(size = 10,
                                   colour = "#424242"),
        axis.text.y = element_text(size = 10,
                                   colour = "#424242"),
        legend.position = "bottom",
        legend.justification = "left",
        legend.text = element_text(size = 12,
                                   colour = "#424242"),
        legend.title = element_text(size = 12, 
                                    colour = "#424242"),
        panel.grid.major = element_line(linewidth = rel(0.5)),
        panel.grid.major.x = element_line(linewidth = rel(0.5)),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#ffffff"),
        legend.background = element_rect(fill = "#ffffff"),
        legend.margin = margin(t = -10),
        panel.border = element_rect(colour = "#424242",
                                    linewidth = 0.3)) 

PA_Pyr

Note

Additional Codes:

To Increase Visualization size:

#| fig-height: 12 #| fig-width: 12

To allow option to hide codes upon render:

#| code-fold: true

Adding Visualization

This section applies the ggiraph package to add interactive visualization per Bar within the Age Sex Pyramid.

Code
PA_Pyr <- ggplot() +
  geom_bar_interactive(data = subset(trellis9_filter, 
                         Sex == "Males"), 
                       aes(x = AG, 
                           y = -sum_pop, 
                           fill = PA,
                           tooltip =  paste0("Sex= ", Sex,
  "\n Age Group= ", AG,
  "\n Population= ", sum_pop)),
                       stat = "identity", 
                       fill = "#2E9598") +
  geom_bar_interactive(data = subset(trellis9_filter, 
                         Sex == "Females"), 
                       aes(x = AG, 
                           y = sum_pop, 
                           fill = PA,
                           tooltip =  paste0("Sex= ", Sex,
  "\n Age Group= ", AG,
  "\n Population= ", sum_pop)), 
                       stat = "identity", 
                       fill = "#EC1B4B") +
  coord_flip() +
  facet_wrap(.~ PA,
             drop = FALSE, 
             ncol = 3,
             scales = "fixed")+
  scale_y_continuous(limits = c(-13000, 13000),
                     breaks = seq(-20000, 20000, 5000), 
                     labels = paste0(
                       as.character(
                         c(seq(200, 0, -50), 
                           seq(50, 200, 50))),
                       "k"), 
                     expand = expansion(mult = c(0, .04)))+
  labs (y = "Population", 
        x = "Age Group",
        fill = "Gender",
        title = "Singapore Population Pyramid 2022",
        subtitle = "Distribution of gender and age groups from Most Populated Planning Areas",
        caption = "Data Source : Singstat.gov.sg, June 2022") +
  theme_bw() +
  theme(plot.title = element_text(size = 36, 
                                  colour = "#424242",
                                  face = "bold",
                                  hjust = 0.5),
        plot.subtitle = element_text(size = 28,
                                     colour = "#424242",
                                     hjust = 0.5),
        plot.caption = element_text(size = 24,
                                    colour = "#424242",
                                    hjust = 0),
        strip.text = element_text(size = 24,
                                  colour = "#424242"),
        strip.background = element_blank(),

        axis.ticks = element_line(colour = "#424242",
                                  linewidth = 0.5),
        axis.ticks.x = element_line(colour = "#424242",
                                  linewidth = 0.5),,
        axis.title.y = element_text(angle = 0, 
                                    size = 24, 
                                    colour = "#424242",
                                    vjust = 1.025,
                                    hjust = 0.7,
                                    margin = margin(r = -90, l = 90)),
        axis.title.x = element_text(size = 24, 
                                    colour = "#424242"),
        axis.text.x = element_text(size = 20,
                                   colour = "#424242"),
        axis.text.y = element_text(size = 20,
                                   colour = "#424242"),
        legend.position = "bottom",
        legend.justification = "left",
        legend.text = element_text(size = 24,
                                   colour = "#424242"),
        legend.title = element_text(size = 24, 
                                    colour = "#424242"),
        panel.grid.major = element_line(linewidth = rel(0.5)),
        panel.grid.major.x = element_line(linewidth = rel(0.5)),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#ffffff"),
        legend.background = element_rect(fill = "#ffffff"),
        legend.margin = margin(t = -10),
        panel.border = element_rect(colour = "#424242",
                                    linewidth = 0.3)) 

girafe(
  ggobj = PA_Pyr,
  width_svg = 24,
  height_svg = 36*0.618
)