Parliament’s new API - How to access data on MPs

What is the gender composition of Austria’s Nationalrat? How did it evolve over time? Who are the longest serving MPs? This post details how to answer these and similar questions by using parliament’s new API.

Author

Roland Schmidt

Published

10 Feb 2023

Code: Load packages

library(tidyverse)
library(httr2)
library(janitor)
library(rvest)
library(lubridate)
library(reactable)
library(reactablefmtr)
library(htmltools)
library(waffle)
library(ggtext)
library(gt)
library(gtExtras)
library(ggforce)
library(ggdist)
library(ggrepel)
library(ggbeeswarm)

1 Just the results, please.

Note: Best seen in landscape mode on mobile devices. For higher resolution images, please see plots in the blogpost.

2 Context

Austria’s parliament was re-opened on 12 January after a few years of renovation. The works did not only entail a major revamp of the 19th century neo-classicist building at Vienna’s Ringstraße, but also a substantial upgrade to its API. The latter was preceded by a public consultation process providing users the opportunity to file suggestions for improvement, and seems to be part of a conscious effort to strengthen the institution’s open data offer. While I am only beginning to toy with the API, as far as I can tell, it considerably lowers the barrier to data on Parliament’s work, not only by making it easier to access them, but also by broadening the offer. You can find information on the API here (unfortunately only in German), which provides an overview of the different endpoints/datasets. Most laudable, there are even some exemplary R script “showcases”!

In this post, I dig into one API endpoint, the dataset on MPs since 1920, more precisely the members of the Nationalrat (NR), parliament’s lower house.

The questions I’ll address include:

What’s the Nationalrat’s gender composition and how did it change over time?
Who are the longest serving MPs, overall, today and by gender?

None of the related results are groundbreaking (although some details were new to me). The point is first and foremost to get at ease with the new API and to document the process leading to the results. As always, if you spot any error, have a question or suggestion, please don’t hesitate to contact me via Twitter or Mastodon DM.

3 Accessing the API with httr2

Code: Load required packages

library(tidyverse)
library(httr2)

To obtain the data, I use the httr2 package and wrap it into a function. (For an excellent overview of how to wrap APIs, see Hadley Wickham’s post on the topic.) I insert some comments directly into the code chunk, which hopefully make things sufficiently clear.

3.1 Define function

Code: Function to access API

get_mps <- function(Gremium = NULL,
                    Gesetzgebungsperiode = NULL,
                    Männlich = NULL,
                    Weiblich = NULL,
                    PraesidentInnen = NULL,
                    Wahlpartei = NULL,
                    Fraktion = NULL,
                    Bundesland = NULL,
                    Wahlkreis = NULL) {

  base_url <- "https://www.parlament.gv.at/Filter/api/json/post"

  params <- list(
    `jsMode` = "EVAL",
    `FBEZ` = "WFW_004",
    `listeId` = "10004",
    `pageNumber` = "1",
    `ascDesc` = "ASC",
    `showAll` = "true"
  )

  # collect the search filters for the request body; filters left at NULL are dropped
  data <- list(
    NRBR = Gremium,
    GP = Gesetzgebungsperiode,
    M = Männlich,
    W = Weiblich,
    PR = PraesidentInnen,
    WP = Wahlpartei,
    FR = Fraktion,
    BL = Bundesland,
    WK = Wahlkreis,
    R_WF = NULL
  ) %>% discard(., is.null)

  print(data)

  # run the actual request
  res <- request(base_url) %>%
    req_headers("Accept" = "application/json") %>%
    req_url_query(!!!params) %>%
    req_body_json(data = data, auto_unbox = F) %>%
    req_perform()

  # extract the column names
  vec_headings <- res %>%
    resp_body_json(., simplifyVector = T) %>%
    pluck(., "header", "label") %>%
    janitor::make_clean_names()
  
  # extract the actual substantive data
  df_res <- res %>%
    resp_body_json(., simplifyVector = T) %>%
    pluck(., "rows") %>%
    as.data.frame()

  # assign the column names as names to the main dataframe
  colnames(df_res) <- vec_headings

  # some columns contain HTML; the function below extracts the `title` attribute
  # of each <span>, which carries the full plain-text label
  fn_html_tags <- function(x) {
    x %>%
      xml2::read_html() %>%
      rvest::html_elements("span") %>%
      rvest::html_attr("title")
  }

  df_res <- df_res %>%
    mutate(across(.cols = c(klub, gesetzgebungsperioden, bundesland), \(x) map(x, \(y) fn_html_tags(y))))
  return(df_res)
}
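How the request body is put together can be tried in isolation. The following is a minimal base-R sketch, not part of the original function: `build_filter()` is a hypothetical helper name, and only three of the nine filters are reproduced. It illustrates that arguments left at `NULL` simply drop out of the body, so only the filters actually set are sent to the API.

```r
# Hypothetical helper mirroring the filter step inside get_mps();
# only three of the filters are included for brevity.
build_filter <- function(Gremium = NULL,
                         Gesetzgebungsperiode = NULL,
                         Weiblich = NULL) {
  filters <- list(
    NRBR = Gremium,
    GP = Gesetzgebungsperiode,
    W = Weiblich
  )
  # base-R counterpart of purrr::discard(is.null)
  Filter(Negate(is.null), filters)
}

body <- build_filter(Gremium = "NR", Gesetzgebungsperiode = "ALLE")
names(body) # only the filters that were set remain
```

Since `Weiblich` was not supplied, the resulting list contains no `W` entry at all.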

3.2 Apply function

While the API allows specifying MPs’ gender as a search parameter, the results unfortunately do not include a gender indicator. Hence, I run two separate requests, one for male, one for female MPs, and add a column specifying the gender.

Code: Apply function

df_m <- get_mps(Gremium = "NR", Männlich = "M", Gesetzgebungsperiode = "ALLE") %>%
  mutate(gender = "male")

df_w <- get_mps(Gremium = "NR", Weiblich = "W", Gesetzgebungsperiode = "ALLE") %>%
  mutate(gender = "female")

# combine
df_res <- dplyr::bind_rows(df_m, df_w)

nrow(df_m)
nrow(df_w)
nrow(df_res)

The queries returned a combined dataframe with 1966 rows in total: 388 rows for female MPs, 1578 for male MPs. Do these results mean there were 1966 MPs in total, out of which 388 were women and 1578 were men? Nope. Why? Let’s have a closer look at the results.

Code

glimpse(df_res)
## Rows: 1,966
## Columns: 9
## $ pad_intern            <chr> "13649", "2003", "577", "10", "14", "22", "23", …
## $ sortier               <chr> "Abraham", "Abram", "Achs", "Adler", "Adlersflüg…
## $ name                  <chr> "Abraham Gerhard", "Abram Simon", "Achs Matthias…
## $ klub                  <list> "Die Sozialdemokratische Parlamentsfraktion - K…
## $ gesetzgebungsperioden <list> "einundzwanzigste Gesetzgebungsperiode", <"erst…
## $ bundesland            <list> "2D Kärnten Ost", "25 Nordtirol, 18 Tirol", "1 …
## $ rss_pubdate           <chr> "20.11.2021", "20.11.2021", "17.03.2022", "20.11…
## $ link                  <chr> "/person/13649", "/person/2003", "/person/577", …
## $ gender                <chr> "male", "male", "male", "male", "male", "male", …

As you’ll gather from the output, there’s a column called pad_intern which appears to be something like an MP’s unique ID. When we check for the total number of unique pad_interns, it turns out that there are actually fewer IDs than rows.

Code

n_distinct(df_res$pad_intern)
## [1] 1931
nrow(df_res)
## [1] 1966

3.3 Taking care of name variants

This difference is due to changing names of MPs. Let’s have a look here:

Code: Identify MPs with changing names

df_dupes <- df_res %>%
  get_dupes(., pad_intern) %>%
  select(pad_intern, name) %>%
  group_by(pad_intern) %>%
  summarise(
    number_variants = n(),
    name_variants = paste(name, collapse = "<br>")
  ) %>%
  arrange(desc(number_variants))

df_dupes %>%
  reactable(.,
    columns = list(
      pad_intern = colDef(
        width = 100,
        align = "center",
        show=F
      ),
      number_variants = colDef(
        name = "Number of name variants",
        width = 200,
        align = "center"
      ),
      name_variants = colDef(
        name = "Name variants",
        html = T
      )
    ),
    compact = TRUE,
    defaultPageSize = 5,
    theme = fivethirtyeight()
  ) %>%
  add_title(title = "MPs with changing names", font_size = 15)

MPs with changing names

from Blau-Meissner to Meissner-Blau; Source: parlament.gv.at

There are in total 33 MPs whose names have undergone some changes. As the table above shows, two MPs changed their names more than once. Changes are mainly (and supposedly) due to marriage/divorce, and - Tu felix Austria - the addition of academic titles. Or, if you are Sepp (Steinhuber), and no longer want to be called ‘Josef.’ New to me: Freda Meissner-Blau, icon of the early environmental movement and first chairperson of the Green party, had up until 1988 the family name Blau-Meissner.

Since my interest is however not in an MP’s names but the individual, distinct MP, the different name variants have to be nested within the unique pad_intern identifiers.

Code

df_res <- df_res %>%
  tidyr::chop(cols = c(sortier, name))

nrow(df_res)
## [1] 1931
n_distinct(df_res$pad_intern)
## [1] 1931

As you can see, the new dataframe now has one row per MP. The columns name and sortier became list columns, containing the different name variants.

(As a side note, I am by no means an expert on the design of APIs, but wouldn’t it have been more efficient and convenient to simply return different names nested under one distinct pad_intern?)

class(df_res$name)
## [1] "vctrs_list_of" "vctrs_vctr"    "list"
class(df_res$sortier)
## [1] "vctrs_list_of" "vctrs_vctr"    "list"

##   pad_intern                                       name
## 1       2819     Amon Werner, MBA, Amon Werner siehe...
## 2         76 Bauer Hannes, Dip..., Bauer Johann, Dip...
## 3      83124     Bernhard Michael, Pock Michael sieh...
## 4       2834 Graf Martin, Dr. ..., Graf Martin, Mag....
## 5        547           Heigl Hans, Heigl Johann sieh...
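The nesting idea can also be illustrated on a toy example without tidyr. The sketch below uses base R's `split()` as a stand-in for `tidyr::chop()`; the ID "X1" is made up for illustration, while "2003" and "Abram Simon" are taken from the glimpse() output further above.

```r
# Toy data: "X1" is a hypothetical ID; the two name variants follow the
# Blau-Meissner / Meissner-Blau example mentioned in the text.
toy <- data.frame(
  pad_intern = c("X1", "X1", "2003"),
  name = c("Blau-Meissner Freda", "Meissner-Blau Freda", "Abram Simon")
)

# one list element per ID, holding all name variants of that MP
name_variants <- split(toy$name, toy$pad_intern)

# number of name variants per ID
lengths(name_variants)
```

Three rows collapse into two list elements, one per distinct ID, which is the same reduction the real data undergoes from 1966 rows to 1931.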

4 Gender composition of the Nationalrat

Now, with this issue taken care of, we can have a first, general look at the gender composition of MPs of the Nationalrat - in total, over all legislative periods.

Code: Calculate gender share and create plot

df_gender <- df_res %>%
  count(gender) %>%
  mutate(perc = n / sum(n))

df_gender %>%
  reactable(.,
    columns = list(
      gender = colDef(
        name = "Gender",
        footer = "Total of all MPs"
      ),
      n = colDef(
        name = "number",
        footer = sprintf("%i", sum(.$n))
      ),
      perc = colDef(
        name = "%",
        format = colFormat(
          percent = T,
          digits = 2
        ),
        footer = sprintf("%.2f%%", sum(.$perc) * 100)
      ) # % before % to escape
    ),
    compact = T,
    theme = fivethirtyeight()
  )

Code: Create waffle plot

df_gender %>%
  select(-perc) %>%
  ggplot() +
  labs(
    title = "Austrian Nationalrat:<br> Gender of MPs since 1920",
    caption = txt_caption
  ) +
  geom_waffle(
    aes(
      fill = gender,
      values = n
    ),
    n_rows = 30,
    size = 0.3,
    color = "white",
    flip = TRUE
  ) +
  geom_label(
    aes(
      x = 30,
      y = (n / 30) + .5,
      label = glue::glue("{gender} {n} ({(n/nrow(df_res)) %>% scales::percent(., accuracy=1)})"),
      group = gender,
      color = gender
    ),
    fill = "transparent",
    family = "sans",
    size = rel(3),
    label.r = unit(0, "lines"),
    label.size = 0,
    label.padding = unit(0, "lines"),
    hjust = 1,
    vjust = 0
  ) +
  geom_richtext(
    data = tibble(gender = "female"),
    label = "<span style='font-size:10pt; color:darkgrey;line-height:50%'>Austrian Nationalrat:</span><br> <span style='color:darkgrey;font-size:16pt'>Gender of MPs<br>since 1920</span>",
    hjust = 0,
    label.color = "white",
    label.r = unit(0, "lines"),
    label.padding = unit(0, "lines"),
    label.margin=unit(0, "lines"),
    label.size = 0,
    size = rel(6),
    fontface = "bold",
    family = "sans",
    lineheight = 0.2,
    vjust = 1,
    nudge_y = 2,
    aes(
      x = 0,
      y = Inf
    )
  ) +
  scale_fill_manual(values = c(
    female = "#E6AF2E",
    male = "#353A47"
  )) +
  scale_color_manual(values = c(
    female = "#E6AF2E", 
    male = "#353A47"
  )) +
  scale_x_discrete(expand = expansion(mult = c(0, 0))) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.02))
  ) +
  coord_equal() +
  ggthemes::theme_fivethirtyeight() +
  theme_enhance_waffle() +
  theme(
    axis.text.y = element_blank(),
    panel.grid.major.y = element_blank(),
    strip.background = element_rect(fill = "white"),
    strip.text = element_blank(),
    legend.position = "none",
    plot.title = element_blank(),
    plot.caption = element_markdown(
      hjust = 0, 
      size = rel(0.5),
      lineheight = 1.2),
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white")
  ) +
  facet_wrap(vars(gender))

The result above shows that out of the 1,931 individuals who at one point held a mandate in the Nationalrat, only 362 were women. That’s less than a fifth.

Note that this result provides only an aggregate overview of the assembly’s gender composition over its entire existence, meaning from 1920 onwards. Its preceding bodies, the “Provisorische Nationalversammlung” and the “Konstituierende Nationalversammlung”, are not covered by the API call from above and would have required different search parameters.
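The "less than a fifth" figure follows directly from the two counts quoted above. As a quick check, here is a small base-R sketch; the variable names are chosen for illustration only.

```r
# counts as reported above: 1,931 MPs in total, 362 of them women
n_total <- 1931
n_female <- 362

counts <- c(female = n_female, male = n_total - n_female)
shares <- prop.table(counts)

# shares in percent, rounded to one decimal
round(shares * 100, 1)
```

The female share comes out at roughly 18.7%, which is indeed below 20%.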

In order to get an idea for the development of the chamber’s composition over the years, I am interested in the gender ratios at the beginnings of each legislative period.

The column gesetzgebungsperioden in our API result is a list column, with each element being one legislative period in which an MP held her/his mandate. Below a sample to make this point clearer.

To count the MPs (and their gender) per legislative period, this list has to be brought into a long format, so that we obtain one row per MP and legislative period.

Below the code, plus a sample of the result.

Code: Unnest list ‘gesetzgebungsperiode’; obtain one row per MP and legislative period

df_res_long <- df_res %>%
  distinct(pad_intern, gender, gesetzgebungsperioden) %>%
  unnest_longer(col=gesetzgebungsperioden)

head(df_res_long %>% select(-gender), n = 10) %>% reactable(., theme=fivethirtyeight(), compact=T)

As you can see from the sample output, the MP with the ID 2003, for example, had mandates in the first, second, third, and fourth legislative periods; the MP with the ID 577 was an MP from the seventeenth to the twentieth.
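For readers who want to see the reshaping step in miniature, here is a base-R sketch of the same idea. It uses the two MPs from the sample just described, with numeric periods instead of the German ordinals for brevity; the object names are illustrative assumptions.

```r
# periods per MP, following the sample described above
periods_by_mp <- list(`2003` = 1:4, `577` = 17:20)

# long format: one row per MP and legislative period
df_long <- data.frame(
  pad_intern = rep(names(periods_by_mp), lengths(periods_by_mp)),
  period = unlist(periods_by_mp, use.names = FALSE)
)

nrow(df_long)

# number of MPs per legislative period
table(df_long$period)
```

Two MPs with four periods each yield eight rows, and counting rows per period then gives the number of MPs who sat in it.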

To make working with the legislative periods a bit easier, I convert the alphabetic version into a numeric one.

Sequence of conditions of case_when arguments

Critical for the proper working of the function is the sequence of conditions in the case_when function. E.g. if ‘zwei’ preceded ‘zweiundzwanzig’, the latter would never be matched and would erroneously be assigned the numeric value 2 (since ‘zwei’ is also part of the word ‘zweiundzwanzig’).
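The effect of the ordering can be demonstrated with a stripped-down, base-R sketch of first-match-wins logic. `first_match()` is a hypothetical helper written for this illustration, and only three patterns are included.

```r
# Return the value of the first pattern found in `word` (first match wins),
# mimicking how case_when() evaluates its conditions top to bottom.
first_match <- function(word, patterns) {
  for (i in seq_along(patterns)) {
    if (grepl(names(patterns)[i], word, fixed = TRUE)) {
      return(patterns[[i]])
    }
  }
  NA_real_
}

good_order <- c(zweiundzwanzig = 22, zwanzig = 20, zwei = 2)
bad_order  <- c(zwei = 2, zweiundzwanzig = 22, zwanzig = 20)

word <- "zweiundzwanzigste Gesetzgebungsperiode"
first_match(word, good_order) # 22
first_match(word, bad_order)  # 2, because 'zwei' is tested first
```

Putting the longer, more specific patterns first is what keeps the short ones from matching prematurely.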

Code: Define and apply function to convert words to numbers

# order is critical

word2num <- function(word) {
  case_when(
    str_detect(word, "einundzwanzig") ~ 21,
    str_detect(word, "zweiundzwanzig") ~ 22,
    str_detect(word, "dreiundzwanzig") ~ 23,
    str_detect(word, "vierundzwanzig") ~ 24,
    str_detect(word, "fünfundzwanzig") ~ 25,
    str_detect(word, "sechsundzwanzig") ~ 26,
    str_detect(word, "siebenundzwanzig") ~ 27,
    str_detect(word, "dreizehn") ~ 13,
    str_detect(word, "vierzehn") ~ 14,
    str_detect(word, "fünfzehn") ~ 15,
    str_detect(word, "sechzehn") ~ 16,
    str_detect(word, "siebzehn") ~ 17,
    str_detect(word, "achtzehn") ~ 18,
    str_detect(word, "neunzehn") ~ 19,
    str_detect(word, "zwanzig") ~ 20,
    str_detect(word, "erst") ~ 1,
    str_detect(word, "zwei") ~ 2,
    str_detect(word, "dritte") ~ 3,
    str_detect(word, "vier") ~ 4,
    str_detect(word, "fünf") ~ 5,
    str_detect(word, "sechs") ~ 6,
    str_detect(word, "sieben") ~ 7,
    str_detect(word, "acht") ~ 8,
    str_detect(word, "neun") ~ 9,
    str_detect(word, "zehn") ~ 10,
    str_detect(word, "elf") ~ 11,
    str_detect(word, "zwölf") ~ 12,
    .default = NA
  )
}

df_res_long <- df_res_long %>%
  mutate(gesetzgebungsperioden_num = map_dbl(gesetzgebungsperioden, \(x) word2num(x), .progress = T))

After this step, we have all the data necessary to compile the gender composition per legislative period.

Code: Table on gender composition per legislative period

vec_color_gender=c(female = "#E6AF2E", male = "#353A47")

df_pl_df_res_long <- df_res_long %>%
  select(gesetzgebungsperioden_num, pad_intern, gender) %>%
  count(gesetzgebungsperioden_num, gender) %>%
  left_join(., df_parl, by="gesetzgebungsperioden_num") 

df_pl_df_res_long %>%
mutate(date_end_char=as.character(lubridate::year(date_end))) %>%
mutate(date_end_char=if_else(is.na(date_end_char), "now", date_end_char)) %>%
mutate(duration=glue::glue("{lubridate::year(date_start)}-{date_end_char}")) %>%
pivot_wider(
  id_cols=c(
    gesetzgebungsperioden_num,
    duration
  ),
  values_from=n,
  names_from=gender
) %>%
arrange(desc(gesetzgebungsperioden_num)) %>%
mutate(gesetzgebungsperioden_num=as.character(as.roman(gesetzgebungsperioden_num))) %>%
mutate(total_mp=female+male) %>%
mutate(across(c("female", "male"), .fns=list(rel=\(x) x/total_mp))) %>%
select(
  gesetzgebungsperioden_num,
  duration,
  total_mp,
  female_rel,
  female,
  male,
  male_rel
) %>%
reactable(.,
columns=list(
  gesetzgebungsperioden_num=colDef(
    name="legislative period",
    align="center"
  ),
  duration=colDef(
    name="duration"
  ),
  total_mp=colDef(
    name="total number of MPs",
    align="center"
  ),
  female=colDef(
    name="women",
    align="right"
  ),
  male=colDef(
    name="men",
    align="right"
  ),
  female_rel=colDef(
    name="women",
    align="center",
    cell=data_bars(
      data=.,
      text_position = "outside-end",
      max_value=1,
      fill_color="#E6AF2E",
      number_fmt = scales::label_percent(accuracy=0.1))
  ),
  # female_rel=colDef(
  #   name="women",
  #   format=colFormat(
  #     percent=T,
  #     digits=2)
  # ),
  male_rel=colDef(
    name="men",
    show=F,
    format=colFormat(
      percent=T,
      digits=2
    )
  )
),
  columnGroups=list(
    colGroup(
      name="absolute",
      columns=c("female", "male")),
    colGroup(
      name="%",
      columns=c("female_rel", "male_rel")
    )  
  ),
compact=T,
fullWidth=F,
defaultPageSize=27,
theme=fivethirtyeight()) %>%
add_title(title = "Gender composition of the Nationalrat per legislative period", font_size = 19) %>%
  add_subtitle(subtitle = "As of 20.1.2023.", font_size = 12) %>%
  add_source(source = html("Data: parlament.gv.at<br>Analysis: Roland Schmidt | @zoowalk | <span style='font-weight:bold'>https://werk.statt.codes</span>"), font_size = 10)

Gender composition of the Nationalrat per legislative period

As of 20.1.2023.

legislative period	duration	total number of MPs	women (%)	men (absolute)
XXVII	2019-now	204	40.7%	121
XXVI	2017-2019	210	35.7%	135
XXV	2013-2017	209	32.1%	142
XXIV	2008-2013	219	28.8%	156
XXIII	2006-2008	212	32.5%	143
XXII	2002-2006	209	33.0%	140
XXI	1999-2002	218	28.0%	157
XX	1996-1999	228	25.9%	169
XIX	1994-1996	214	24.8%	161
XVIII	1990-1994	233	21.9%	182
XVII	1986-1990	231	14.7%	197
XVI	1983-1986	226	11.5%	200
XV	1979-1983	218	9.2%	198
XIV	1975-1979	202	7.9%	186
XIII	1971-1975	205	6.8%	191
XII	1970-1971	170	4.7%	162
XI	1966-1970	182	6.0%	171
X	1962-1966	182	6.0%	171
IX	1959-1962	179	6.1%	168
VIII	1956-1959	180	5.6%	170
VII	1953-1956	178	6.2%	167
VI	1949-1953	173	5.8%	163
V	1945-1949	177	5.6%	167
IV	1930-1934	199	5.5%	188
III	1927-1930	177	3.4%	171
II	1923-1927	182	4.9%	173
I	1920-1923	202	5.9%	190

Data: parlament.gv.at
Analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes

Below an alternative graphical representation of the same data.

Code: Plot gender composition per legislative period plus available seats

df_pl_df_res_long %>%
  ggplot() +
  labs(
    title="Austria's Nationalrat:<br> Number of <span style='color:#E6AF2E'>female</span> and <span style='color:#353A47'>male</span> MPs per legislative period",
    y="Number of MPs",
    x="Legislative period",
    caption=txt_caption
  )+
  geom_bar(
    aes(
      x = gesetzgebungsperioden_num+.25,
      y = n,
      fill = gender
    ),
    stat = "identity"
  ) +
  geom_segment(aes(
    x = 1 - .5+.25, xend = 1 + .5+.25,
    y = 183, yend = 183,
      color="green"
  )) +
  geom_segment(aes(
    x = 2 - .5+.25, xend = 12 + .5+.25,
    y = 165, yend = 165,
    color="green")
  ) +
  geom_segment(aes(
    x = 13 - .5+.25, xend = 27 + .5+.25,
    y = 183, yend = 183,
    color="green")#,
    # color="green"
  ) +
  scale_color_manual(values="lightgrey", 
  label="Number of available seats", name=NULL)+
  scale_fill_manual(values=vec_color_gender,
  name=NULL)+
  scale_x_continuous(expand=expansion(mult=c(0,0)),
  labels=\(x) as.roman(x)  %>% fn_label_unit(., unit="legislative period"),
  breaks=c(1, seq(0,25, 5), 27))+
  scale_y_continuous(
    expand=expansion(mult=c(0,0.1)),
    labels=\(x) fn_label_unit(x=x, unit="<br>MPs"),
    breaks=c(seq(0,200, 50), 233),
    sec.axis=sec_axis(
      trans=\(x) x*1,
      breaks=c(165, 183), 
      labels=c("165", "183 seats available"),
      name="Number of chamber's total seats")
  )+
  ggthemes::theme_fivethirtyeight()+
  theme(
  legend.position = "none",
  legend.justification="left",
  legend.direction ="horizontal",
  legend.box="horizontal",
  legend.background = element_rect(fill="white"),
  legend.key=element_rect(fill="white"),
  axis.text.y.left = element_markdown(),
  axis.text.x.bottom = element_text(hjust=0),
  plot.title=element_markdown(),
  plot.caption = element_markdown(
    hjust = 0,
    size = rel(0.8),
    lineheight = 1.2),
  plot.background = element_rect(fill = "white"),
  panel.background = element_rect(fill = "white"),
  plot.title.position="plot",
  plot.caption.position="plot",
  panel.grid.major.y = element_blank(),
  panel.grid.major.x = element_blank()
)

The result above is already quite informative since it gives us the gender of all MPs who held a mandate during a specific legislative period. Substantively, it shows how the share of female MPs only started to grow gradually from the 1970s onwards.

4.1 MP fluctuation during legislative periods

The data, however, also reveals that there was some fluctuation of MPs during each legislative period, or to put it differently, that there were more MPs per legislative period than there are seats in the Nationalrat. The horizontal lines in the plot above indicate the available number of seats in the Nationalrat. While this is not surprising as such, since MPs may resign for various reasons (health, intra-party dynamics…), I find it interesting to see how little fluctuation there was prior to the 1970s, compared to later periods. The plot below makes this point clearer.

Number of seats in the Nationalrat
legislative periods	Number of seats
1	183
2 - 11	165
13 - today (27)	183
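The fluctuation measure used in the code that follows is the number of MPs in excess of the available seats, relative to the number of seats. Here is a small worked sketch with figures taken from the two tables above (MPs per period from the gender table, seats from the seat table); `turnover()` is merely an illustrative helper name.

```r
# fluctuation = (MPs during the period - seats) / seats
turnover <- function(n_mps, seats) (n_mps - seats) / seats

# XXVII: 204 MPs, 183 seats
round(turnover(n_mps = 204, seats = 183), 3) # 0.115

# XVIII: 233 MPs, 183 seats
round(turnover(n_mps = 233, seats = 183), 3) # 0.273

# XI: 182 MPs, 165 seats
round(turnover(n_mps = 182, seats = 165), 3) # 0.103
```

In other words, in the current period roughly 11.5% of seats changed hands at least once, compared with about 27% in the early 1990s.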

Code: Calculate and plot MP turnover by legislative period

df_res_long_seats <- df_res_long %>%
  left_join(., df_seats, by = join_by(
    between(gesetzgebungsperioden_num,
      legislative_period,
      legislative_period_end,
      bounds = "[]"
    )
  ))

df_mp_turnover <- df_res_long_seats %>%
  count(gesetzgebungsperioden_num, seats, txt, name = "n_mps") %>%
  mutate(turnover = (n_mps - seats) / seats) %>%
  mutate(
    seats_diff = n_mps - seats,
    seats_stable = seats - seats_diff
  )

df_pl_mp_turnover <- df_mp_turnover %>%
  pivot_longer(
    cols = c(seats_stable, seats_diff),
    names_to = "seats_type",
    values_to = "seats_type_n"
  ) %>%
  left_join(., df_parl, by = c("gesetzgebungsperioden_num" = "gesetzgebungsperioden_num")) %>%
  mutate(x_label = glue::glue("{date_start_year}\n{as.roman(gesetzgebungsperioden_num)}")) %>%
  mutate(txt = forcats::fct_relevel(txt, levels = c("1", "2 - 11", "13 - today (27)"))) %>%
  mutate(gesetzgebungsperioden_num = fct_inseq(as.character(gesetzgebungsperioden_num)))

breaks_pos <- sort(c(1, seq(5,10,5), 13, 20, 27))

label_sec <- df_pl_mp_turnover %>%
filter(seats_type=="seats_diff")  %>%
filter(row_number() %in% breaks_pos) %>%
mutate(sec_label_end=if_else(is.na(date_end), "today", str_sub(date_end, 3,4))) %>%
mutate(sec_label=glue::glue("{year(date_start)}/{sec_label_end}")) %>%
pull(sec_label)

df_pl_mp_turnover %>%
filter(seats_type=="seats_diff") %>%
mutate(gesetzgebungsperioden_num=as.numeric(gesetzgebungsperioden_num)) %>%
ggplot()+
labs(
  title="Fluctuation of MPs per legislative period",
  subtitle="How many % of all seats change mandate holder during a legislative period?",
  # subtitle=expression(paste("How many % of all seats change mandate holder during a legislative period?", frac(N['MPs,Legis.Period']-N['Seats, Legis. Period'], N['Seats, Legis. Period']), sep="   ")),
  caption=txt_caption
)+
geom_rect(aes(
  xmin=1,
  xmax=2,
  ymin=0,
  ymax=Inf
),
fill="#dfdfdf",
alpha=.025)+
geom_rect(aes(
  xmin=13, # 183 seats again from the 13th legislative period onwards
  xmax=Inf,
  ymin=0,
  ymax=Inf
),
fill="#dfdfdf",
alpha=0.02)+
geom_point(aes(
  x=gesetzgebungsperioden_num,
  y=turnover
),
color="#9cd321")+
geom_line(aes(
  x=gesetzgebungsperioden_num,
  y=turnover
),
color="#9cd321")+
geom_text(aes(
  x=2.5,
  y=.33,
  label="165 seats"
),
check_overlap=T,
color="black",
size=3,
hjust=0
)+
geom_text(aes(
  x=13.5,
  y=.33,
  label="183 seats"
),
check_overlap=T,
hjust=0,
color="black",
size=3
)+
geom_text(aes(
  x=16,
  y=.05,
  label=paste("Fluctuation[Legis.Period]==",'frac(N["MPs,Legis.Period"]-N["Seats, Legis.Period"], N["Seats, Legis.Period"])')
),
check_overlap=T,
hjust=0,
color="black",
size=3,
parse=T)+
scale_x_continuous(
  # labels=\(x) as.roman(x),
  labels=\(x) glue::glue("{as.roman(x)}\n{label_sec}"),
  breaks=breaks_pos,
  position="top",
  name=" Legislative period:",
  # sec.axis=ggplot2::dup_axis(
  #   labels=label_sec,
  #   name="Legislative period - years"),
  expand=expansion(mult=c(0.01,0.1))
)+
scale_y_continuous(
  breaks=seq(0,.3, .1),
  label=scales::percent,
  expand=expansion(mult=c(0, 0.05)),
  limits=c(0,.35)
)+
ggthemes::theme_fivethirtyeight()+
 theme(
  legend.position = "none",
  legend.justification="left",
  legend.direction ="horizontal",
  legend.box="horizontal",
  legend.background = element_rect(fill="white"),
  legend.key=element_rect(fill="white"),
  axis.title.x.top = element_text(hjust=0),
  axis.title.x.bottom = element_text(hjust=0),
  axis.text.y.left = element_markdown(),
  axis.text.x.top = element_text(
    hjust=0
  ),
  axis.line.x = element_line(color="lightgrey"),
  plot.title=element_markdown(),
  plot.subtitle=element_text(hjust=0),
  plot.caption = element_markdown(
    hjust = 0,
    size = rel(0.8),
    lineheight = 1.2),
  plot.background = element_rect(fill = "white"),
  panel.background = element_rect(fill = "white"),
  plot.title.position="panel",
  plot.caption.position="panel"
)

There’s quite a difference between the periods before and after the early 1970s. I wonder why, but that’s only an observation on the side, and something for another time. Nevertheless, one important caveat here: At this point, our API result only indicates whether an MP had a mandate during a specific legislative period. What the data misses, when it comes to MP fluctuation, are MPs who have a mandate during legislative period X, leave, and then return within the same legislative period X. The data underlying the graph above doesn’t capture the fluctuation due to such returns to the NR within the same period.

4.2 Composition at the start of the legislative period

In the next step, I want to refine the calculation of the NR’s gender composition by specifically looking at it at the start of each legislative session. So far we only had the aggregate number per legislative period. Given the demonstrated MP turnover, we have no exact picture of who was MP at the start of each legislative period and hence of the initial gender composition.

The challenge here is that the results obtained by the API call above did provide us with information on whether an MP was member of the NR during a specific legislative period, but not the exact starting and ending dates of her/his political mandate. Unfortunately, as far as I can tell, this information is not (yet?) accessible via the API. However, the data is actually presented on the Parliament’s homepage.

Below a sample screenshot from the bio-page of Sebastian Kurz (who was also an MP). The data of interest is highlighted.

4.3 Getting details on MPs’ political mandates

To extract the relevant data, I define a function which scrapes the section on political mandates from the biography page. Subsequently, the various political mandates are filtered for membership in the Nationalrat. This function is then applied to the biography pages of all MPs. Once the start and end dates of membership in parliament are available, we are also able to answer the question of who was an MP at the beginning of which legislative period.

Code: Define function to extract political mandates of each MP

fn_get_mp_mandates <- function(url) {
  # Scrape the section on political functions
  txt_raw <- url %>%
    xml2::read_html() %>%
    rvest::html_elements("#biografie-tabs-tabpanel-BIO") %>%
    rvest::html_elements("section") %>%
    rvest::html_text2() %>%
    .[[1]] %>%
    enframe(name = NULL, value = "txt")

  # create a table with the start and ending dates for each mandate
  df_details <- txt_raw %>%
    mutate(txt = str_remove(txt, regex("Politische Mandate/Funktionen\n"))) %>%
    separate_rows(txt, sep = "\n") %>%
    mutate(row_id = ifelse(
      !str_detect(txt, regex("^\\d")), 1, 0
    )) %>%
    mutate(mandate_id = cumsum(row_id)) %>%
    pivot_wider(id_cols = c("mandate_id"), names_from = "row_id", values_from = "txt") %>%
    rename(
      office = `1`,
      period = `0`
    ) %>%
    select(
      -mandate_id
    ) %>%
    separate(period, sep = "-", into = c("office_date_start", "office_date_end")) %>%
    mutate(across(contains("date"), \(x) lubridate::dmy(x))) %>%
    separate(office, sep = ", ", into = c("office", "party"))

  return(df_details)
}
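The date-splitting step at the end of the function can be sketched in base R as follows. This is an illustration under assumptions rather than a reproduction of the scraping code: `parse_period()` is a hypothetical helper, and the input format "dd.mm.yyyy - dd.mm.yyyy" is assumed from the use of `sep = "-"` and `lubridate::dmy()` above. An open-ended mandate without an end date is assumed to yield `NA`.

```r
# Split a mandate period into start and end date (hypothetical helper).
parse_period <- function(period) {
  parts <- trimws(strsplit(period, "-", fixed = TRUE)[[1]])
  length(parts) <- 2 # pad with NA if the end date is missing
  as.Date(parts, format = "%d.%m.%Y")
}

parse_period("09.11.2017 - 22.01.2018")
parse_period("23.10.2019 -") # open-ended: end date is NA
```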

Code: Apply function

df_details <- df_res %>%
  select(pad_intern, gender, link) %>%
  mutate(link = glue::glue("https://www.parlament.gv.at{link}?selectedtab=BIO")) %>%
  mutate(mandates = purrr::map(link, \(x) fn_get_mp_mandates(x), .progress = T))

Code: Unnest list with political mandates and filter

df_details_long <- df_details %>%
  unnest_longer(mandates) %>%
  unnest_wider(mandates) %>%
  filter(str_detect(office, regex("Abgeordneter? zum Nationalrat")))

Below the results for Kurz:

# A tibble: 4 × 5
  pad_intern office                                  office_d…¹ office_d…² party
  <chr>      <chr>                                   <date>     <date>     <chr>
1 65321      Abgeordneter zum Nationalrat (XXVII. G… 2021-10-14 2021-12-08 ÖVP  
2 65321      Abgeordneter zum Nationalrat (XXVII. G… 2019-10-23 2020-01-07 ÖVP  
3 65321      Abgeordneter zum Nationalrat (XXVI. GP) 2017-11-09 2018-01-22 ÖVP  
4 65321      Abgeordneter zum Nationalrat (XXV. GP)  2013-10-29 2013-12-16 ÖVP  
# … with abbreviated variable names ¹office_date_start, ²office_date_end
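With start and end dates at hand, checking whether someone held a mandate on a given day boils down to an interval test. The sketch below uses the four rows shown above; the reference date is picked for illustration only and is not taken from the list of legislative periods.

```r
# the four mandates from the output above
mandates <- data.frame(
  office_date_start = as.Date(c("2021-10-14", "2019-10-23", "2017-11-09", "2013-10-29")),
  office_date_end   = as.Date(c("2021-12-08", "2020-01-07", "2018-01-22", "2013-12-16"))
)

# illustrative reference date; in the analysis this would be the first
# day of a legislative period
ref_date <- as.Date("2017-11-09")

# in office if the mandate started on or before the reference date and
# has not ended before it (a missing end date counts as still ongoing)
in_office <- mandates$office_date_start <= ref_date &
  (is.na(mandates$office_date_end) | mandates$office_date_end >= ref_date)

which(in_office) # row 3, the mandate running 2017-11-09 to 2018-01-22
```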

While glancing through the results, I noticed an unexpected increase in the number of rows. Checking for duplicates revealed that there were a few MPs who, for reasons unknown to me, featured two different party memberships during the same mandate period. Remarkably, they are all members of the KPÖ.

Code: Control for MPs with multiple party affiliations during same mandate

# check for dupes
dupes <- df_details_long %>%
  janitor::get_dupes("pad_intern", "office_date_start", "office_date_end")
nrow(dupes) # KPÖ/LB duplicates
## [1] 18

dupes %>%
  # one row per MP and mandate; the differing parties are gathered into a list
  group_by(across(.cols = -party)) %>%
  summarise(party = list(party)) %>%
  left_join(., df_res %>%
    select(pad_intern, name) %>%
    mutate(name = map_chr(name, 1))) %>%
  ungroup() %>%
  select(
    name,
    office,
    office_date_start,
    office_date_end,
    party,
    link
  ) %>%
  reactable(.,
    columns = list(
      office_date_start = colDef(
        name = "office start"
      ),
      office_date_end = colDef(
        name = "office end"
      ),
      party = colDef(
        name = "multiple parties",
        style = list(background = "rgba(0, 0, 0, 0.03)")
      ),
      link = colDef(
        name = "Link to Biography",
        align = "center",
        html = T,
        cell = function(value, index) {
          htmltools::tags$a(
            href = value,
            target = "_blank",
            as.character("link")
          )
        }
      )
    ),
    theme = fivethirtyeight(font_size = 12),
    compact = T
  ) %>%
  add_title(title = "MPs with multiple party affiliations during the same period.", font_size = 14)

MPs with multiple party affiliations during the same period.

name             office                                 office start  office end  multiple parties  Link to Biography
Scharf Erwin     Abgeordneter zum Nationalrat (VI. GP)  1949-11-08    1953-03-18  KPÖ,LB            link
Elser Viktor     Abgeordneter zum Nationalrat (V. GP)   1945-12-19    1949-11-08  KPÖ,LB            link
Elser Viktor     Abgeordneter zum Nationalrat (VI. GP)  1949-11-08    1953-03-18  KPÖ,LB            link
Fischer Ernst    Abgeordneter zum Nationalrat (V. GP)   1945-12-19    1949-11-08  KPÖ,LB            link
Fischer Ernst    Abgeordneter zum Nationalrat (VI. GP)  1949-11-08    1953-03-18  KPÖ,LB            link
Honner Franz     Abgeordneter zum Nationalrat (V. GP)   1945-12-19    1949-11-08  KPÖ,LB            link
Honner Franz     Abgeordneter zum Nationalrat (VI. GP)  1949-11-08    1953-03-18  KPÖ,LB            link
Koplenig Johann  Abgeordneter zum Nationalrat (V. GP)   1945-12-19    1949-11-08  KPÖ,LB            link
Koplenig Johann  Abgeordneter zum Nationalrat (VI. GP)  1949-11-08    1953-03-18  KPÖ,LB            link

While I could imagine that this is a data entry error, I preferred to keep both parties and collapsed the character vector party into a list instead of deleting one of them. As a result, we again have one row per MP and mandate, and a list containing all party affiliations recorded for the period in question.

Code: Collapse party into list

df_details_long <- df_details_long %>%
  tidyr::chop(c(party))
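
To illustrate what the chop step does, here is a minimal sketch on toy data. The IDs, dates and the object names (df_toy, df_toy_chopped) are made up for illustration and do not come from the API; only the tidyr::chop() call mirrors the code above.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): MP "A1" appears twice for the same mandate,
# once per recorded party
df_toy <- tibble(
  pad_intern = c("A1", "A1", "B2"),
  office_date_start = as.Date(c("1945-12-19", "1945-12-19", "1949-11-08")),
  party = c("KPÖ", "LB", "SPÖ")
)

# chop() keeps one row per combination of the remaining columns
# and gathers the values of party into a list-column
df_toy_chopped <- df_toy %>%
  tidyr::chop(c(party))

nrow(df_toy_chopped) # one row per MP and mandate
lengths(df_toy_chopped$party) # number of parties recorded per row
```

The first row now carries both parties in a single list element, which is why the row count drops back to one row per MP and mandate.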

4.4 Duration of mandates

Before checking for MPs’ presence at the beginning of the legislative period, there’s one additional aspect we can look into: the length of MPs’ time in the Nationalrat. Who is the longest serving MP in the Nationalrat’s history as of the time of writing this post?

4.4.1 Longest serving MPs

Code: Calculate duration of mandates

df_duration_mandate <- df_details_long %>%
  # ongoing mandates of the current period (XXVII. GP) have no end date yet; use today
  mutate(office_date_end = case_when(
    is.na(office_date_end) & str_detect(office, fixed("XXVII. GP")) ~ Sys.Date(),
    .default = office_date_end
  )) %>%
  # length of each mandate in days
  mutate(duration = difftime(office_date_end, office_date_start, units = "days")) %>%
  arrange(desc(duration)) %>%
  left_join(., df_res %>% select(pad_intern, name)) %>%
  unnest_wider(col = name, names_sep = "_") %>%
  # label ongoing mandates for display; this turns the column into character
  mutate(office_date_end = case_when(
    office_date_end == Sys.Date() ~ "ongoing",
    .default = as.character(office_date_end)
  ))

df_duration_mp <- df_duration_mandate %>%
  group_by(pad_intern, gender, name_1) %>%
  summarise(
    duration_sum = sum(duration),
    party = list(unique(party))
  ) %>% # nest party; MPs can change party
  ungroup() %>%
  arrange(desc(duration_sum))
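
The logic of the calculation can be sketched on toy data as follows. The IDs and dates are hypothetical, and ref_date is a fixed stand-in for Sys.Date() which I introduce here only to keep the example reproducible.

```r
library(dplyr)

# Toy mandates (hypothetical): "A1" served twice, "B2" is still serving
df_toy <- tibble(
  pad_intern = c("A1", "A1", "B2"),
  office_date_start = as.Date(c("2000-01-01", "2010-01-01", "2005-06-01")),
  office_date_end = as.Date(c("2000-01-31", "2010-01-11", NA))
)

ref_date <- as.Date("2005-06-11") # assumption: fixed stand-in for Sys.Date()

df_toy_duration <- df_toy %>%
  # open-ended mandates are counted up to the reference date
  mutate(office_date_end = if_else(is.na(office_date_end), ref_date, office_date_end)) %>%
  # length of each mandate in days
  mutate(duration = difftime(office_date_end, office_date_start, units = "days")) %>%
  # total time per MP across all mandates
  group_by(pad_intern) %>%
  summarise(duration_sum = sum(duration)) %>%
  arrange(desc(duration_sum))

df_toy_duration
```

Because the durations are summed per MP, interrupted careers are counted in full: "A1" ends up with 30 + 10 days, even though the two mandates are ten years apart.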

Below the top 10 of the longest serving MPs.

Code: Table top 10 MPs

df_duration_mp %>%
  mutate(row_id = row_number()) %>%
  select(row_id, name_1, party, duration_sum) %>%
  slice_head(., n = 10) %>%
  reactable(.,
    columns = list(
      row_id = colDef(
        align = "center",
        name = "Pos.",
        width = 50
      ),
      name_1 = colDef(
        name = "MP",
        width = 250
      ),
      party = colDef(
        name = "party",
        width = 100
      ),
      duration_sum = colDef(
        name = "Total number of days",
        align = "center",
        format = colFormat(
          separators = T
        )
      )
    ),
    details = function(index) {
      mandates <- filter(df_duration_mandate, pad_intern == df_duration_mp$pad_intern[index]) %>% select(-gender, -link)
      tbl <- mandates %>%
        select(-pad_intern, -contains("name")) %>%
        reactable(., columns = list(
          office = colDef(width = 150),
          office_date_start = colDef(name = "start"),
          office_date_end = colDef(name = "end"),
          duration = colDef(name = "duration mandate", format = colFormat(separators = T))
        ), outlined = T, fullWidth = F, compact = T, theme = reactableTheme(backgroundColor = "lightgrey"))
      htmltools::div(style = list(margin = "12px 45px"), tbl)
    },
    compact=T,
    onClick = "expand",
    rowStyle = list(cursor = "pointer"),
    fullWidth = FALSE,
    theme = fivethirtyeight()
  ) %>%
  add_title(title = html("Top 10: Austria's <span style='color:white; background-color:black;'>longest serving</span> MPs"), font_size = 17) %>%
  add_subtitle(
    subtitle = "Nationalrat only. As of 20.1.2023.", font_size = 12,
    font_weight="normal",
    margin=margin(t=8)) %>%
  add_source(source = html("Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | <span style='font-weight:bold'>https://werk.statt.codes</span>"), font_size = 10)

Top 10: Austria's longest serving MPs

Nationalrat only. As of 20.1.2023.

Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes

4.4.2 Longest serving female MPs

Note that there is not a single woman among the top 10 of the longest serving MPs. Let’s have a look at female MPs only.

Code: Table top 10 female MPs

df_duration_mp_fem <- df_duration_mp %>%
  filter(gender == "female")

df_duration_mp_fem %>%
  mutate(row_id = row_number()) %>%
  select(row_id, name_1, party, duration_sum) %>%
  slice_head(., n = 10) %>%
  reactable(.,
    columns = list(
      row_id = colDef(
        align = "center",
        name = "Pos.",
        width = 50
      ),
      name_1 = colDef(
        name = "MP",
        width = 250
      ),
      party = colDef(
        name = "party",
        width = 100
      ),
      duration_sum = colDef(
        name = "Total number of days",
        align = "center",
        format = colFormat(
          separators = T
        )
      )
    ),
    details = function(index) {
      mandates <- df_duration_mandate %>%
        filter(gender == "female") %>%
        filter(pad_intern == df_duration_mp_fem$pad_intern[index]) %>%
        select(-gender, -link)
      tbl <- mandates %>%
        select(-pad_intern, -contains("name")) %>%
        arrange(desc(office_date_start)) %>%
        reactable(., columns = list(
          office = colDef(width = 150),
          office_date_start = colDef(name = "start"),
          office_date_end = colDef(name = "end"),
          duration = colDef(name = "duration mandate", format = colFormat(separators = T))
        ), outlined = T, fullWidth = F, compact = T, theme = reactableTheme(backgroundColor = "lightgrey"))
      htmltools::div(style = list(margin = "12px 45px"), tbl)
    },
    onClick = "expand",
    compact=T,
    rowStyle = list(cursor = "pointer"),
    fullWidth = FALSE,
    filterable = FALSE,
    theme = fivethirtyeight()
  ) %>%
  add_title(title = html("Top 10: Austria's longest serving <span style='background-color:black; color:white;'>female</span> MPs"), font_size = 15) %>%
  add_subtitle(
    subtitle = "Nationalrat only. As of 20.1.2023.", font_size = 12,
    font_weight="normal",
    margin=margin(t=8)    
    ) %>%
  add_source(source = html("Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | <span style='font-weight:bold'>https://werk.statt.codes</span>"), font_size = 10)

Top 10: Austria's longest serving female MPs

Nationalrat only. As of 20.1.2023.

Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes

4.4.3 Longest serving active MPs

Let’s also check who is the longest serving MP among the currently serving MPs.

Code: Longest serving active MPs

# all current MPs, not only those who were continuously present
df_current_mps <- df_duration_mandate %>%
  filter(office_date_end == "ongoing")

df_duration_mp_current <- df_duration_mp %>%
  semi_join(.,
    df_current_mps,
    by = "pad_intern"
  ) %>%
  rename(party_all = party) %>%
  left_join(., df_current_mps %>% select(pad_intern, party), by = "pad_intern") %>%
  unnest(party)


df_duration_mp_current %>%
  mutate(row_id = row_number()) %>%
  select(row_id, name_1, party, duration_sum) %>%
  reactable(.,
    columns = list(
      row_id = colDef(
        align = "center",
        name = "Pos.",
        width = 50
      ),
      name_1 = colDef(
        name = "MP",
        width = 250
      ),
      party = colDef(
        name = "party",
        width = 100
      ),
      duration_sum = colDef(
        name = "Total number of days",
        align = "center",
        format = colFormat(
          separators = T
        )
      )
    ),
    details = function(index) {
      mandates <- df_duration_mandate %>%
        filter(pad_intern == df_duration_mp_current$pad_intern[index]) %>%
        select(-gender, -link)
      tbl <- mandates %>%
        select(-pad_intern, -contains("name")) %>%
        arrange(desc(office_date_start)) %>%
        reactable(., columns = list(
          office = colDef(width = 150),
          office_date_start = colDef(name = "start"),
          office_date_end = colDef(name = "end"),
          duration = colDef(name = "duration mandate", format = colFormat(separators = T))
        ), outlined = T, fullWidth = F, compact = T, theme = reactableTheme(backgroundColor = "lightgrey"))
      htmltools::div(style = list(margin = "12px 45px"), tbl)
    },
    compact=T,
    onClick = "expand",
    rowStyle = list(cursor = "pointer"),
    fullWidth = FALSE,
    filterable = T,
    theme = fivethirtyeight()
  ) %>%
  add_title(title = html("Time in Parliament by <span style='color:white; background-color:black;'>currently serving</span> MPs"), font_size = 15) %>%
  add_subtitle(
    subtitle = "Nationalrat only. As of 20.1.2023.", font_size = 12,
    font_weight="normal",
    margin=margin(t=8)    
    ) %>%
  add_source(source = html("Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | <span style='font-weight:bold'>https://werk.statt.codes</span>"), font_size = 10)

Time in Parliament by currently serving MPs

Nationalrat only. As of 20.1.2023.

Data: parlament.gv.at. Analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes

4.4.4 Distribution of time as MPs between parties as of today

The result above serves as the basis for displaying the distribution of current MPs’ time in the Nationalrat per party.

Code: Calculate distribution of MPs’ time in parliament and plot

vec_party_col <- c(
  "FPÖ" = "#0056a2",
  "ÖVP" = "#63c3d1",
  "SPÖ" = "#ce000c",
  "NEOS" = "#CB1967",
  "GRÜNE" = "#73A303",
  "ohne Klubzugehörigkeit" = "grey"
)

df_duration_mp_current %>%
  mutate(party = fct_infreq(party)) %>%
  mutate(duration_sum_num = (as.numeric(duration_sum, units = "days") / 365) %>% janitor::round_half_up(.)) %>%
  # mutate(duration_sum_num=as.numeric(duration_sum)) %>%
  ggplot() +
  labs(
    title = "Distribution of current MPs' time in parliament\nper party",
    x = "Number of years as MP",
    subtitle="NR only. As of 20.1.2023. Duration was rounded to full years ('half-up').",
    caption=txt_caption
  ) +
  # ggdist::stat_dots(
  #   aes(
  #     color = party,
  #     fill = party
  #   ),
  #   size = 3
  # ) +
  geom_bar(aes(
    x = duration_sum_num,
    group=pad_intern,
    fill=party),
    color="white",
    stat="count"
  )+
  scale_color_manual(values = vec_party_col) +
  scale_fill_manual(values = vec_party_col) +
  # # scale_x_continuous(label=scales::label_number(big.mark=","))+
  scale_x_continuous(labels = \(x)fn_label_unit(x, unit="years as MP") %>% str_wrap(., width=10), 
  breaks = seq(0, 30, 5),
  expand=expansion(mult=c(0, 0.25))) +
  scale_y_continuous(
    labels=\(x) fn_label_unit(x, unit="MPs"),
    breaks=seq(0,30, 10),
    expand=expansion(mult=c(0, 0.2)))+
  # str_wrap(., width = 10)) +
  ggthemes::theme_fivethirtyeight() +
  theme(
    panel.grid.minor.y = element_blank(),
    strip.text.x=element_text(hjust=0, face="bold"),
    strip.background.x = element_rect(fill="white"),
    strip.placement = "left",
    plot.caption = element_markdown(size=8),
    legend.position = "none",
    axis.text.x = element_text(hjust = 0),
    # axis.title.y=element_text(
    #   hjust=1,
    #   face="bold",
    #   size=9),
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white")
  )+
  # facet_col(facet=vars(party))
  facet_wrap(facet=vars(party),
  ncol=2)
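
As a side note on the rounding mentioned in the subtitle, the following sketch shows how day counts end up in year bins. The day counts are hypothetical; the division by 365 and janitor::round_half_up() follow the plot code above.

```r
library(janitor)

# Hypothetical total durations in days
duration_sum <- as.difftime(c(100, 200, 1000), units = "days")

# Convert to years (365-day approximation) and round to full years
years <- as.numeric(duration_sum, units = "days") / 365
years_rounded <- janitor::round_half_up(years)
years_rounded

# 'Half-up' only differs from base R's round() on exact ties,
# because round() rounds half to even
round(c(0.5, 2.5))
janitor::round_half_up(c(0.5, 2.5))
```

With integer day counts divided by 365, exact ties should be rare, so the choice of rounding rule presumably matters little for the plot itself.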

The plot is rather self-explanatory, but some details were new to me. I didn’t know that the Greens replaced basically their entire group of MPs at the last elections. Only one of their MPs is not new: Sigrid Maurer.

4.5 Result

Finally, with the start and end dates of MPs’ mandates now available, we only have to match them with the start/end dates of legislative periods to see who was actually an MP at each period’s beginning.

4.5.1 Identify MPs at beginning of legislative period

To get the start/end dates of legislative periods, I scrape the relevant table from Wikipedia.

Code: Get data on start/end of legislative periods

url <- "https://de.wikipedia.org/wiki/Nationalrat_(%C3%96sterreich)"

tbl_parl <- url %>%
  xml2::read_html() %>%
  rvest::html_table() %>%
  .[[2]]

df_parl <- tbl_parl %>%
  janitor::clean_names() %>%
  filter(!wahltag == "Wahltag") %>%
  tidyr::separate(col = "zeitraumvon_bis", sep = "\\s\\p{Pd}\\s", into = c("date_start", "date_end")) %>%
  mutate(across(contains("date"), \(x) lubridate::dmy(x))) %>%
  mutate(gesetzgebungsperioden_num = str_extract(gesetzgebungsperiode_nationalversammlung, regex("^[^\\.]+(?=\\.\\sGesetzgebungs)"))) %>%
  mutate(gesetzgebungsperioden_num = utils::as.roman(gesetzgebungsperioden_num) %>% as.numeric()) %>%
  select(-wahltag, -wahl) %>%
  filter(str_detect(gesetzgebungsperiode_nationalversammlung, regex("Gesetzgebungsperiode")))

df_parl %>%
  select(
    gesetzgebungsperiode_nationalversammlung,
    date_start,
    date_end
  ) %>%
  reactable(.,
    columns = list(
      gesetzgebungsperiode_nationalversammlung = colDef(
        name = "Gesetzgebungsperiode",
        width = 250
      )
    ),
    compact = T,
    theme = fivethirtyeight()
  ) %>%
  add_title(title = "Start/end dates of legislative periods", font_size = 12) %>%
  add_source(source = "Source: https://de.wikipedia.org/wiki/Nationalrat_(Österreich)", font_size = 10)

Start/end dates of legislative periods

Gesetzgebungsperiode          date_start  date_end
I. Gesetzgebungsperiode       1920-11-10  1923-11-20
II. Gesetzgebungsperiode      1923-11-20  1927-05-18
III. Gesetzgebungsperiode     1927-05-18  1930-10-01
IV. Gesetzgebungsperiode      1930-12-02  1934-05-02
V. Gesetzgebungsperiode       1945-12-19  1949-11-08
VI. Gesetzgebungsperiode      1949-11-08  1953-03-18
VII. Gesetzgebungsperiode     1953-03-18  1956-06-08
VIII. Gesetzgebungsperiode    1956-06-08  1959-06-09
IX. Gesetzgebungsperiode      1959-06-09  1962-12-14
X. Gesetzgebungsperiode       1962-12-14  1966-03-30

1–10 of 27 rows

Source: https://de.wikipedia.org/wiki/Nationalrat_(Österreich)
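
The two parsing steps used when cleaning the scraped table (splitting the date range and converting the Roman numeral of the period) can be sketched in isolation. The input format below, "dd.mm.yyyy" separated by an en dash, is an assumption about how the raw cells look, and I use stringr::str_split_fixed() here as a stand-in for the tidyr::separate() call in the actual code.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Toy input (assumed format of the raw table cells)
df_toy <- tibble(
  period = c("V. Gesetzgebungsperiode", "VI. Gesetzgebungsperiode"),
  zeitraum = c("19.12.1945 \u2013 08.11.1949", "08.11.1949 \u2013 18.03.1953")
)

df_toy_parsed <- df_toy %>%
  mutate(
    # split on "space, any dash, space"; \p{Pd} matches hyphens as well as en dashes
    date_start = dmy(str_split_fixed(zeitraum, "\\s\\p{Pd}\\s", 2)[, 1]),
    date_end = dmy(str_split_fixed(zeitraum, "\\s\\p{Pd}\\s", 2)[, 2]),
    # everything before ". Gesetzgebungs" is the Roman numeral of the period
    period_num = str_extract(period, "^[^\\.]+(?=\\.\\sGesetzgebungs)") %>%
      utils::as.roman() %>%
      as.numeric()
  )

df_toy_parsed
```

Matching on the Unicode dash class rather than on one specific character makes the split less dependent on which kind of dash the source page happens to use.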

If the start date of a new legislative period falls within an MP’s mandate period, the MP must have been part of the NR’s initial composition at the beginning of that legislative period. dplyr’s between join allows us to match the two data frames accordingly.

Code: Identify MPs at beginning of legislative period

# close open-ended mandates: ongoing ones (XXVII. GP) end today,
# other missing end dates fall back to the start date
df_details_long <- df_details_long %>%
  mutate(office_date_end = case_when(
    is.na(office_date_end) & str_detect(office, regex("XXVII")) ~ Sys.Date(),
    is.na(office_date_end) ~ office_date_start,
    .default = office_date_end
  ))

# keep mandates which cover the first day of a legislative period
df_mps_gp_start <- df_parl %>%
  left_join(., df_details_long,
    by = join_by(between(date_start, office_date_start, office_date_end, bounds = "[)")) # bounds!
  )

# number of MPs per legislative period
df_mps_gp_start_n <- df_mps_gp_start %>%
  group_by(gesetzgebungsperioden_num) %>%
  summarise(n_row = n()) %>%
  arrange(desc(gesetzgebungsperioden_num))

# absolute and relative number of MPs by gender
df_gp_start_gender <- df_mps_gp_start %>%
  group_by(gesetzgebungsperioden_num, gender) %>%
  summarise(n = n()) %>%
  mutate(rel = n / sum(n)) %>%
  ungroup()

# add year
df_gp_start_gender_x <- df_gp_start_gender %>%
  left_join(., df_parl %>%
    select(gesetzgebungsperioden_num, date_start) %>%
    mutate(year_start = lubridate::year(date_start))) %>%
  filter(gender == "female") %>%
  select(-date_start, -n, -gender)
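
Why the bounds argument matters is easiest to see on toy data. The periods below reuse two start dates from the table above; the MPs and their mandates are hypothetical. Note that join_by() and between() with bounds require dplyr 1.1.0 or later.

```r
library(dplyr) # join_by() and between() require dplyr >= 1.1.0

df_periods <- tibble(
  period = c("V", "VI"),
  date_start = as.Date(c("1945-12-19", "1949-11-08"))
)

# Hypothetical mandates: "MP one" serves in both periods,
# "MP two" joins late in period V and leaves when it ends
df_mandates <- tibble(
  mp = c("MP one", "MP one", "MP two"),
  office_date_start = as.Date(c("1945-12-19", "1949-11-08", "1946-03-01")),
  office_date_end = as.Date(c("1949-11-08", "1953-03-18", "1949-11-08"))
)

# "[)": the mandate's start date counts, its end date does not
df_at_start <- df_periods %>%
  left_join(
    df_mandates,
    by = join_by(between(date_start, office_date_start, office_date_end, bounds = "[)"))
  )

# "[]" (the default): a mandate ending on the first day of period VI
# would wrongly be counted as part of that period's initial composition
df_closed <- df_periods %>%
  left_join(
    df_mandates,
    by = join_by(between(date_start, office_date_start, office_date_end, bounds = "[]"))
  )

nrow(df_at_start)
nrow(df_closed)
```

Since one legislative period ends on the day the next one begins, closed bounds would match an outgoing mandate to the incoming period. The half-open interval avoids this double counting.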

4.5.2 Add data from other parliaments (IPU data)

To put the share of female MPs into perspective, I add data from the Inter-Parliamentary Union (IPU) on other parliaments as of September 2019, i.e. the start of the NR’s current legislative period.

Code: Get IPU data

# skip the metadata lines at the top of the file, then promote the first remaining row to column names
df_ipu <- readr::read_csv(file = "https://data.ipu.org/api/women-ranking.csv?load-entity-refs=taxonomy_term%2Cfield_collection_item&max-depth=2&langcode=en&month=10&year=2019", skip = 4) %>%
  janitor::row_to_names(row_number = 1) %>%
  janitor::clean_names() %>%
  select(country = na_2, percent_w) %>%
  mutate(percent_w = as.numeric(percent_w))
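
For readers unfamiliar with janitor::row_to_names(), here is a minimal sketch of the header clean-up on a toy data frame. The column names, countries and values are hypothetical and only mimic a file whose real header sits in the first data row.

```r
library(dplyr)
library(janitor)

# Toy data (hypothetical): the real header is stored in the first row
df_raw <- tibble(
  X1 = c("Rank", "1", "2"),
  X2 = c("Country", "Country A", "Country B"),
  X3 = c("Percent W", "40.5", "12.0")
)

df_clean <- df_raw %>%
  janitor::row_to_names(row_number = 1) %>% # first row becomes the column names
  janitor::clean_names() %>% # snake_case names, e.g. "Percent W" -> "percent_w"
  mutate(percent_w = as.numeric(percent_w)) # values were read as character

df_clean
```

Everything is read as character in this setup because the header row shares a column with the values, hence the explicit conversion to numeric at the end.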

With this data available, we can finally produce our plot of interest.

Code: Plot development of share & contrast

pos <- position_jitter(width = 3, seed = 1) #define seed for positioning

df_all <- df_gp_start_gender_x %>%
mutate(country="Austria",
percent_w=rel*100) %>%
select(year=year_start, percent_w, country) %>%
filter(year<2019) %>%
bind_rows(df_ipu %>% mutate(year=2019))

df_all %>%
mutate(country_indicator=ifelse(country=="Austria", "red", "darkgrey")) %>%
filter(year<2020) %>%
ggplot()+
labs(
  title="Share of female MPs in <span style='color:red'>Austria</span>'s Nationalrat <br>at the start of legislative periods.",
  subtitle="2019 in comparison with other parliaments worldwide.",
  caption="Source: parlament.gv.at/ipu.org. Analysis & Graph: Roland Schmidt | @zoowalk | **https:&#47;&#47;werk.statt.codes**"
)+
# geom_point(aes(
#   x=year,
#   y=percent_w,
#   color=country_indicator
# ),
#   position=pos
# )+
geom_quasirandom(aes(
  x=year,
  y=percent_w,
  color=country_indicator,
  ),
  width = 2.5,
  position=pos)+
geom_text_repel(data=. %>%
filter(year==2019) %>%
filter(str_detect(country, regex("Cuba|Rwanda|Vanuatu|Kuwait|Germany|Switzerland"))),
# slice_head(.,n=5),
aes(
  x=year,
  y=percent_w,
  label=glue::glue("{country} ({percent_w}%)")
  ),
size=2.75,
color="black",
hjust="left",
direction="y",
min.segment.length = 0,
segment.size=.2,
segment.color="lightgrey",
nudge_x=10,
xlim=c(2021, NA),
ylim=c(0, NA)
)+
geom_text_repel(data=. %>%
filter(year==2019) %>%
filter(str_detect(country, regex("Austria"))),
# slice_head(.,n=5),
aes(
  x=year,
  y=percent_w,
  label=glue::glue("{country} ({percent_w}%)")
  ),
size=2.75,
color="red",
hjust="left",
direction="y",
min.segment.length = 0,
segment.size=.2,
segment.color="lightgrey",
nudge_x=10,
xlim=c(2021, NA),
ylim=c(0, NA)
)+
scale_color_manual(
  values=c("red"="red", "darkgrey"="darkgrey")
)+
scale_x_continuous(
  breaks=c(min(df_all$year), 1930, 1945, seq(1960, 2010, 10), max(df_all$year)),
  expand=expansion(mult=c(0.05, .2))
)+
scale_y_continuous(label=scales::label_percent(scale=1))+
ggthemes::theme_fivethirtyeight()+
theme(
  legend.position = "none",
  plot.title=element_markdown(),
  plot.caption = element_markdown(
    hjust = 0,
    size = rel(0.5),
    lineheight = 1.2),
  plot.background = element_rect(fill = "white"),
  panel.background = element_rect(fill = "white"),
  plot.title.position="plot",
  plot.caption.position="plot"
)

5 Fin

Et voilà. Another blog post where I digressed somewhat from the initial idea, but I think it was worthwhile. If you see any errors or have suggestions, don’t hesitate to contact me via Twitter or Mastodon DM.

Reuse

CC BY-NC 4.0

Citation

BibTeX citation:

@online{schmidt2023,
  author = {Schmidt, Roland},
  title = {Parliament’s New {API} - {How} to Access Data on {MPs}},
  date = {2023-02-10},
  url = {https://werk.statt.codes/posts/2023-01-18-parliament-new-api-mps},
  langid = {en}
}

For attribution, please cite this work as:

Schmidt, Roland. 2023. “Parliament’s New API - How to Access Data on MPs.” February 10, 2023. https://werk.statt.codes/posts/2023-01-18-parliament-new-api-mps.