Similarity of public submissions to Austria’s amendment of the epidemic law

Austria Corona virus OCR stringr web scraping

An analysis of public submissions to the bill seeking to amend Austria’s epidemic law.

12-22-2020

Setup

Code: Load packages

Code: Define rmarkdown options

# knit_hooks$set(wrap = function(before, options, envir) {
#   if (before) {
#     paste0("<", options$wrap, ">")
#   } else {
#     paste0("</", options$wrap, ">")
#   }
# })

knitr::opts_chunk$set(
  fig.align = "left",
  message = FALSE,
  warning = FALSE,
  dev = "svglite",
  # dev.args = list(type = "CairoPNG"),
  dpi = 300,
  out.width = "100%"
)
options(width = 180, dplyr.width = 150)

Code: Define plot theme, party colors, caption

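# Read the blog's background color (a hex code) from the CSS variable
# `blog-bg-color` in theme.css; it is used as the plot background below
plot_bg_color <- readr::read_file(file = here::here("theme.css")) %>%
  str_extract(regex("(?<=blog-bg-color:).*?(?=;)")) %>%
  str_trim() %>%
  str_extract(regex("^#\\S+"))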


theme_post <- function() {
  hrbrthemes::theme_ipsum_rc() +
    theme(
      plot.background = element_rect(fill = plot_bg_color, color=NA),
      panel.background = element_rect(fill = plot_bg_color, color=NA),
      #panel.border = element_rect(colour = plot_bg_color, fill=NA),
      #plot.border = element_rect(colour = plot_bg_color, fill=NA),
      plot.margin = margin(l = 0, 
                           t = 0.25,
                           unit = "cm"),
      plot.title = element_markdown(
        color = "grey20",
        face = "bold",
        margin = margin(l = 0, unit = "cm"),
        size = 13
      ),
      plot.title.position = "plot",
      plot.subtitle = element_text(
        color = "grey50",
        margin = margin(t = 0.2, b = 0.3, unit = "cm"),
        size = 11
      ),
      plot.caption = element_text(
        color = "grey50",
        size = 8,
        hjust = 0
      ),
      plot.caption.position = "panel",
      axis.title.x = element_text(
        angle = 0,
        color = "grey50",
        hjust = 1
      ),
      axis.text.x = element_text(
        size = 9,
        color = "grey50"
      ),
      axis.title.y = element_blank(),
      axis.text.y = element_text(
        size = 9,
        color = "grey50"
      ),
      panel.grid.minor.x = element_blank(),
      panel.grid.major.x = element_blank(),
      panel.grid.minor.y = element_blank(),
      panel.spacing = unit(0.25, "cm"),
      panel.spacing.y = unit(0.25, "cm"),
      strip.text = element_text(
        angle = 0,
        size = 9,
        vjust = 1,
        face = "bold"
      ),
      legend.title = element_text(
        color = "grey30",
        face = "bold",
        vjust = 1,
        size = 7
      ),
      legend.text = element_text(
        size = 7,
        color = "grey30"
      ),
      legend.justification = "left",
      legend.box = "horizontal", # arrangement of multiple legends
      legend.direction = "vertical",
      legend.margin = margin(l = 0, t = 0, unit = "cm"),
      legend.spacing.y = unit(0.07, units = "cm"),
      legend.text.align = 0,
      legend.box.just = "top",
      legend.key.height = unit(0.2, "line"),
      legend.key.width = unit(0.5, "line"),
      text = element_text(size = 5)
    )
}

data_date <- format(Sys.Date(), "%d %b %Y")

Context

About three months ago, Austria’s government tabled an amendment to the country’s epidemic law. The amendment was largely a reaction to two things: the shortcomings of the law, which dates from 1950, when confronted with the current Covid crisis, and a recent ruling of Austria’s Constitutional Court which declared several measures passed by the government unconstitutional.

While there would be a lot to say about the latent tension between civic rights and a state’s obligation to curtail an epidemic, or about the thin line between legitimate restrictions and excessive infringements, this post deals with the public submissions to the government’s amendment bill. In short and crude terms, the legislative process in Austria gives citizens, NGOs, expert bodies, etc. the opportunity to file submissions in which they can raise concerns about the draft version of a bill before it is debated in parliament. So, at least in theory, it’s an avenue for the government to solicit input from the public at large.

What caught my eye, or rather my ear, was a related radio news report which mentioned that a) the submissions to the amendment had reached a record number and b) a considerable number of them were similar in their wording.

I guess the former is indicative of the fundamental issues which the amendment touches on and of the overall somewhat ‘edgy’, not to say polarized, atmosphere when it comes to Corona. The latter point, though, the similarity of the submissions’ wording, puzzled me a bit. What such similarity effectively means is that those filing a submission used some kind of template. In this regard, the news report mentioned that there had been calls on social media channels to oppose the bill, possibly including the provision of a template text.

It was this context which made me curious about the extent of the matter. At least curious enough to have an ‘empirical’ look at it with R.

So I a) went to the parliament’s website, which provides a record of received submissions, b) extracted the weblinks to all those submissions whose text is public, c) downloaded the corresponding PDFs, d) extracted their content, and e) finally checked for their similarity. As for the last point, I can gladly report that I learned something here and my approach changed a bit as I moved along. Hence, the blog post somewhat mirrors this process.

When it comes to similarity, my first attempt was to simply look at the recurrence of a distinct phrase which I repeatedly noticed while randomly glancing through some of the submissions. The formulation is Tür und Tor, which means something like ‘(to open) the (flood)gates’. I guess it’s safe to assume that the submissions using it articulated severe misgivings about the amendment.
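A minimal sketch of this first approach, written in base R rather than taken from the post's actual code. The three texts are invented stand-ins for the extracted submissions, and the flexible whitespace in the pattern is my assumption about what PDF extraction might leave behind:

```r
# Invented stand-ins for extracted submission texts (not real submissions)
submissions <- c(
  a = "Diese Novelle öffnet der Willkür Tür und Tor.",
  b = "Ich lehne den Entwurf ab, da er Missbrauch Tür  und\nTor öffnet.",
  c = "Wir begrüßen die geplante Änderung grundsätzlich."
)

# Match the phrase regardless of case; \\s+ tolerates extra spaces and
# line breaks between the words
pattern <- "Tür\\s+und\\s+Tor"

# One TRUE/FALSE per text
has_phrase <- grepl(pattern, submissions, ignore.case = TRUE)

# Number and share of texts containing the phrase
n_with_phrase <- sum(has_phrase)
share_with_phrase <- mean(has_phrase)
```

In a tidyverse pipeline, `stringr::str_detect()` would do the same job as `grepl()` here.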

As I’ll show below, this approach is already somewhat informative: it reveals the pervasive use of a rather distinct phrase and hence points towards the use of a text template. However, it is rather inductive and likely to miss other, possibly even more recurrent phrases. Furthermore, a single distinct phrase is a rather ‘fragile’ indicator: even a minor, effectively non-substantive modification of the wording would result in missing similar submissions.

To make a long story short, this issue introduced me to the world of quantitative text analysis as provided by the powerful quanteda package. Admittedly, I am only starting to scratch the surface here, but hopefully deep enough to legitimately include it in this post. So, enough of context and waffling. In media R(es).
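To illustrate why comparing whole texts is more robust than searching for one phrase, here is a small base R sketch. It is not quanteda code and not necessarily the method used later in this post; it only computes cosine similarity on word counts for three invented texts, with a deliberately crude whitespace tokenizer:

```r
# Invented example texts (not real submissions)
texts <- c(
  s1 = "Die Novelle öffnet der Willkür Tür und Tor und ist abzulehnen",
  s2 = "Die Novelle öffnet der Willkür Tor und Tür und ist daher abzulehnen",
  s3 = "Wir begrüßen den Entwurf und regen kleinere Anpassungen an"
)

# Crude tokenizer: lower-case, then split on whitespace
tokens <- strsplit(tolower(texts), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: rows = texts, columns = words, cells = counts
dtm <- t(sapply(tokens, function(tok) table(factor(tok, levels = vocab))))

# Cosine similarity: dot product of two count vectors divided by the
# product of their lengths
norms <- sqrt(rowSums(dtm^2))
cosine_sim <- (dtm %*% t(dtm)) / (norms %o% norms)

round(cosine_sim, 2)
```

In these toy texts, the second one swaps the phrase to "Tor und Tür" and so escapes a search for "Tür und Tor", yet its similarity to the first text stays far above that of the unrelated third text.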

Get data

The list of submissions filed for the amendment is provided on the Parliament’s website. To extract the links to the sub-pages which include the links to the individual texts, I make use of the provided RSS feed. From there I extract the relevant elements and combine them into one dataframe containing the name of the person/institution filing a submission, the submission’s date, and the link to the sub-page.

Code: get links to submission sub-pages

link_rss_all_submissions <- "https://www.parlament.gv.at/PAKT/VHG/XXVII/ME/ME_00055/filter.psp?view=RSS&jsMode=&xdocumentUri=&filterJq=&view=&GP=XXVII&ITYP=ME&INR=55&SUCH=&listeId=142&FBEZ=FP_142"

data <- xml2::read_xml(link_rss_all_submissions)

#get link to subpages with link to submissions
df_submission_pages_link <- data %>% 
  xml2::xml_find_all("//link") %>% 
  html_text() %>% 
  enframe(., 
          name="id",
          value="link_single_submission_page") %>% 
  mutate(link_single_submission_page=str_squish(link_single_submission_page)) %>% 
  filter(id>2) #removes first two rows which don't include data on submissions

#get title
df_submission_pages_name <- data %>% 
  xml2::xml_find_all("//title") %>% 
  html_text() %>% 
  enframe(., 
          name="id",
          value="title") %>% 
  mutate(name=str_extract(title, regex("(?<=\\>).*(?=\\<)"))) %>% 
  filter(id>2) %>% 
  select(-title)

#get publication date
df_submission_pages_pub_date <- data %>% 
  xml2::xml_find_all("//pubDate") %>% 
  html_text() %>% 
  enframe(., 
          name="id",
          value="date") %>% 
  mutate(date=date %>% str_squish() %>% lubridate::dmy_hms(., tz="Europe/Vienna"))

#combine to one dataframe and drop the helper id columns
df_submission <- bind_cols(
  df_submission_pages_name,
  df_submission_pages_pub_date,
  df_submission_pages_link
) %>% 
  select(-contains("id"))
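The chunk above extracts titles, links, and dates separately and then binds them by position, which works as long as the three node sets line up. Below is a sketch of an alternative that reads each `<item>` as a unit. The sample feed, its date format, and the helper name `item_field` are invented for illustration and assume a standard RSS 2.0 layout:

```r
# A tiny invented feed with the usual RSS 2.0 layout (not the real feed)
rss_sample <- '<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Feed title</title>
    <link>https://example.org/feed</link>
    <item>
      <title>Submission A</title>
      <link>https://example.org/a</link>
      <pubDate>01.09.2020 10:00:00</pubDate>
    </item>
    <item>
      <title>Submission B</title>
      <link>https://example.org/b</link>
    </item>
  </channel>
</rss>'

feed <- xml2::read_xml(rss_sample)

# Only <item> nodes; channel-level <title> and <link> are not matched
items <- xml2::xml_find_all(feed, "//item")

# Hypothetical helper: text of one child element per item, NA if absent
item_field <- function(nodes, child) {
  xml2::xml_text(xml2::xml_find_first(nodes, paste0("./", child)), trim = TRUE)
}

df_items <- data.frame(
  name = item_field(items, "title"),
  date = item_field(items, "pubDate"),
  link = item_field(items, "link"),
  stringsAsFactors = FALSE
)
```

Because every field is looked up within its own item, a missing element shows up as `NA` in that row instead of shifting the rows below it, and no `filter(id>2)` step is needed to skip the channel-level entries.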

As it turns out, there were 6,670 submissions to the amendment in total.