About three months ago, Austria’s government tabled an amendment to the country’s epidemic law. The amendment was largely a reaction to the shortcomings of the law, which dates from 1950, when confronted with the current Covid crisis, as well as to a recent ruling of Austria’s Constitutional Court which declared several measures passed by the government unconstitutional.
While there would be a lot to say about the latent tension between civic rights and a state’s obligation to curtail an epidemic, or about the thin line between legitimate restrictions and excessive infringements, this post deals with the public submissions to the government’s amendment bill. In short and crude terms, the legislative process in Austria provides the opportunity for citizens, NGOs, expert bodies, etc. to file submissions in which they can raise their concerns about the (draft) version of the bill before it is debated in parliament. So, at least in theory, it’s an avenue for the government to solicit input from the public at large.
What caught my eye, or rather my ears, was a related radio news report which mentioned that a) the submissions to the amendment reached a record number, and b) a considerable number of submissions were similar in their wording.
I guess the former is indicative of the fundamental issues which the amendment touches on and the overall somewhat ‘edgy’, not to say polarized, atmosphere when it comes to Corona. The latter point, though, the similarity of the submissions’ wording, puzzled me a bit. What this similarity effectively means is the use of some kind of template by those filing a submission. In this regard, the news report mentioned that there had been calls on social media channels to oppose the bill, possibly also including the provision of a template text.
It was this context which made me curious about the extent of the matter. At least curious enough to have an ‘empirical’ look at it with R.
So I a) went to the parliament’s website, which provides a record of received submissions, b) extracted the weblinks to all those submissions whose text is public, c) downloaded the pertaining pdfs, d) extracted their content, and e) finally checked for their similarity. As for the last point, I can gladly report that I learned something here, and my approach changed a bit as I moved along. Hence, the blog post somewhat mirrors this process.
When it comes to similarity, my first attempt was to simply look at the recurrence of a distinct phrase which I repeatedly noticed while randomly glancing through some of the submissions. The formulation is Tür und Tor, which means something like ‘(to open) the (flood)gates’. I guess it’s safe to assume that those submissions articulated severe misgivings about the amendment.
As I’ll show below, while this approach is already somewhat informative, since it reveals a pervasive use of a rather distinct phrase and hence points towards the use of a text template, it is rather inductive and likely to suffer from omissions of other, possibly even more recurrent phrases. Furthermore, simply looking at one distinct phrase is likely to be a rather ‘fragile’ indicator: only a minor, effectively non-substantive modification of the wording would result in missing out on similar submissions.
To make a long story short, this issue introduced me to the world of quantitative text analysis as provided by the powerful quanteda package. Admittedly, I am only starting to scratch the surface here, but hopefully deep enough to legitimately include it in this post. So, enough of context and waffling. In media R(es).
3 Get data
3.1 Links to submission subpages
The list of submissions filed for the amendment is provided here at the Parliament’s website. To extract the links to the sub-pages which include the links to the individual texts, I make use of the provided RSS feed. From there I extract the relevant elements and combine them into one dataframe containing the name of the person/institution filing a submission, the submission’s date, and the link to the sub-page.
Code: get links to submission sub-pages
```r
link_rss_all_submissions <- "https://www.parlament.gv.at/PAKT/VHG/XXVII/ME/ME_00055/filter.psp?view=RSS&jsMode=&xdocumentUri=&filterJq=&view=&GP=XXVII&ITYP=ME&INR=55&SUCH=&listeId=142&FBEZ=FP_142"

data <- xml2::read_xml(link_rss_all_submissions)

# get links to sub-pages which contain the links to the submissions
df_submission_pages_link <- data %>%
  xml2::xml_find_all("//link") %>%
  html_text() %>%
  enframe(., name = "id", value = "link_single_submission_page") %>%
  mutate(link_single_submission_page = str_squish(link_single_submission_page)) %>%
  filter(id > 2) # removes first two rows which don't include data on submissions

# get title (name of person/institution filing the submission)
df_submission_pages_name <- data %>%
  xml2::xml_find_all("//title") %>%
  html_text() %>%
  enframe(., name = "id", value = "title") %>%
  mutate(name = str_extract(title, regex("(?<=\\>).*(?=\\<)"))) %>%
  filter(id > 2) %>%
  select(-title)

# get publication date
df_submission_pages_pub_date <- data %>%
  xml2::xml_find_all("//pubDate") %>%
  html_text() %>%
  enframe(., name = "id", value = "date") %>%
  mutate(date = date %>% str_squish() %>% lubridate::dmy_hms(., tz = "Europe/Vienna"))

# combine to one dataframe
df_submission <- bind_cols(
  df_submission_pages_name,
  df_submission_pages_pub_date,
  df_submission_pages_link
) %>%
  select(-contains("id"))
```
As it turns out, there were 6,670 submissions to the amendment in total.
Notice that there are several submissions which state ‘Keine öffentliche Stellungnahme’ (no public statement) as a name. Hence, we will not be able to analyse the text of all submissions, but only of those which are made public. How many are there? Let’s see:
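A quick check along these lines does the trick (a minimal sketch; the `public` flag is my own naming, and the wording of the `name` field is taken from the feed):

```r
# count public vs. non-public submissions, based on the name field
df_submission %>%
  count(public = !str_detect(name, "Keine öffentliche Stellungnahme"))
```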
Out of 6,670 submissions, the text of 3,785 is publicly available (no idea why some are published and others not; presumably it’s up to the submitter to decide). While the high number of non-public submissions is obviously quite a caveat for the further analysis, we still have the text of more than 50% of all submissions to look into.
3.2 Links to pdfs
Now let’s use the links to the submissions’ sub-pages and search each sub-page for a link leading to the actual pdf document of the submission. Using the browser’s developer tools (F12 key), we can see the html code behind the page. Clicking on the link of one submission reveals that the link contains the word ‘imfname’. I use this string to identify the links to the submissions’ pdfs and ignore all other links and pdfs which are also on the site.
To do so, below is the function which is applied to all (!) links leading to the sub-pages with the individual submissions. It reads the html code, extracts the links (href), converts the results into a dataframe, and subsequently filters for the links which include ‘imfname’. I then complement the extracted link with the missing part of the web address (www.parlament…). Since there are several submission sub-pages which state that the text is not public and hence don’t include a link to a pdf, I insert ‘missing’ instead of a link in these cases. To accelerate the process, I apply the function in parallel by using `furrr::future_map`.
Code: read html of sub-pages, extract pdf-link
```r
# function to identify the pdf link on a sub-page
fn_get_doc_links <- function(x){
  link_doc <- x %>%
    read_html() %>%
    html_nodes("a") %>%
    html_attr("href") %>%
    enframe(name = NULL, value = "link_part") %>%
    filter(str_detect(link_part, "imfname")) %>%
    mutate(link_doc = paste0("https://www.parlament.gv.at", link_part)) %>%
    pull(link_doc)

  ifelse(length(link_doc) > 0, link_doc, "missing")
}

# apply function to all links
df_submission <- df_submission %>%
  mutate(link_doc = future_map_chr(
    link_single_submission_page,
    possibly(fn_get_doc_links, otherwise = "missing"),
    .progress = TRUE
  ))
```
3.3 Download submissions
Now with all pdf links available, we can start to download them. To do so, I use the walk2 function of the purrr package. Note that I also use magrittr’s %$% operator to expose the dataframe’s columns to the walk2 function.
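A sketch of this download step (the destination folder and the file naming via `basename()` are my own choices here, not necessarily what was used originally):

```r
library(purrr)
library(magrittr) # for the %$% operator

dir.create("submissions", showWarnings = FALSE)

# download each pdf, skipping entries without a public text
df_submission %>%
  filter(link_doc != "missing") %$%
  walk2(
    link_doc,
    file.path("submissions", basename(link_doc)),
    ~ download.file(.x, destfile = .y, mode = "wb", quiet = TRUE)
  )
```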
After the steps above, we now have all the available submission files at our disposal. Finally time to start working on the actual texts.
3.4 Extract text from pdfs
When it comes to extracting text by means of OCR, pdftools by Jeroen Ooms is the package of choice. In preparation, we need to get the German language data. Extracting text from pdfs can be quite a time-consuming matter: going through the more than 3,700 submissions initially took me more than 7 hours. Reducing the dpi value from 300 to 150 accelerated the procedure quite considerably, down to approx. 60 minutes, and I didn’t notice any relevant decrease in the output’s quality.
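In code, roughly along these lines (a sketch; `fn_extract_text` is my own wrapper, and pdftools’ OCR routine relies on the tesseract package under the hood):

```r
library(pdftools)
library(tesseract)

# one-off: download the German training data for tesseract
tesseract_download("deu")

# OCR a single pdf at 150 dpi and collapse the pages into one string
fn_extract_text <- function(pdf_file) {
  pdftools::pdf_ocr_text(pdf_file, language = "deu", dpi = 150) %>%
    paste(collapse = " ") %>%
    stringr::str_squish()
}

# applied to all downloaded files, e.g. in parallel with furrr::future_map_chr
```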
Let’s pause for a moment and take stock: there are 6,623 submissions, out of which 3,762 are public. From the latter, we were able to extract the text of 3,750; hence, 12 texts could not be retrieved.
4 Analysis
4.1 Identify submissions with key formulation
Now, with the submissions’ texts cleaned etc., let’s identify those which contain the formulation that appeared quite frequently when randomly browsing through some of the texts: Tür und Tor.
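The check itself can be as simple as a regex flag per document (a sketch; `doc_text` is assumed to hold the OCR output from above, and `pattern_indicator` is the indicator reused further below):

```r
submission_pattern <- "Tür und Tor"

# flag each submission that contains the key phrase
df_submission <- df_submission %>%
  mutate(pattern_indicator = str_detect(
    doc_text,
    regex(submission_pattern, ignore_case = TRUE)
  ))
```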
```r
tb_submission <- df_tb_submission %>%
  gt() %>%
  tab_header(
    title = md("**Stellungnahmen zur Novelle des Epidemiegesetzes**"),
    subtitle = md(glue::glue(md("Von den {df_tb_submission %>% filter(!str_detect(indicator_tbl, 'Keine')) %>% summarise(n_abs=sum(n_abs)) %>% pull(n_abs) %>% scales::comma()} Stellungnahmen von denen der Text öffentlich ist enthalten rund {df_tb_submission %>% filter(str_detect(indicator_tbl, 'Formulierung')) %>% pull(n_rel_public) %>% scales::percent()} die Formulierung *'{submission_pattern}'*.")))
  ) %>%
  tab_options(
    heading.align = "left",
    row_group.font.weight = "bold",
    table.background.color = plot_bg_color
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "orange"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(
      columns = c(n_abs),
      rows = n_abs == 2204
    )
  ) %>%
  tab_style(
    style = list(
      cell_fill(color = "orange"),
      cell_text(weight = "bold")
    ),
    locations = cells_body(
      columns = c(n_abs, n_rel, n_rel_public),
      rows = str_detect(indicator_tbl, "Formulierung")
    )
  ) %>%
  cols_label(
    indicator_tbl = "Stellungnahme",
    n_abs = "Anzahl",
    n_rel = "Anteil alle St.",
    n_rel_public = "Anteil öff. St."
  ) %>%
  sub_missing(columns = c(n_rel_public), missing_text = "--") %>%
  fmt_number(columns = c(n_abs), decimals = 0) %>%
  fmt_percent(columns = c(n_rel, n_rel_public), decimals = 1) %>%
  tab_source_note(source_note = md("data: www.parlament.gv.at, 'Ministerialentwurf betreffend Bundesgesetz, mit dem das Epidemiegesetz 1950, das Tuberkulosegesetz und das COVID-19-Maßnahmengesetz geändert werden.'<br>analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes"))
```
As the table below shows, the formulation ‘Tür und Tor’ appears in 2,341 submissions. This amounts to 35% of all submissions, and to 62% of all submissions for which we can access the text. I think that’s quite remarkable in the sense that it suggests quite a pervasive use of a submission template.
**Stellungnahmen zur Novelle des Epidemiegesetzes**

*Von den 3,762 Stellungnahmen von denen der Text öffentlich ist enthalten rund 62% die Formulierung 'Tür und Tor'.*

| Stellungnahme | Anzahl | Anteil alle St. | Anteil öff. St. |
|---|---:|---:|---:|
| Keine öffentliche Stellungnahmen | 2,861 | 43.2% | – |
| enthalten Formulierung 'Tür und Tor' | 2,341 | 35.3% | 62.2% |
| andere öffentliche Stellungnahmen | 1,421 | 21.5% | 37.8% |

data: www.parlament.gv.at, 'Ministerialentwurf betreffend Bundesgesetz, mit dem das Epidemiegesetz 1950, das Tuberkulosegesetz und das COVID-19-Maßnahmengesetz geändert werden.' analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes
4.3 Distribution of length
Let’s approach the similarity of submissions from another angle now: The length of the text. The graph below shows the distribution of submissions’ length in terms of number of characters. One density curve shows the distribution for submissions which include the key phrase ‘Tür und Tor’, the other one shows the distribution of those which don’t include it (and for which the text is available).
Code: distribution of length
```r
# distribution of length --------------------------------------------------

my_caption <- glue::glue("data: www.parlament.gv.at; Stellungnahmen zum Ministerialentwurf betreffend Bundesgesetz, mit dem das Epidemiegesetz 1950, das Tuberkulosegesetz \nund das COVID-19-Maßnahmengesetz geändert werden.\nanalysis: Roland Schmidt | @zoowalk | https://werk.statt.codes")

pl_length <- df_submission %>%
  filter(public != "not public") %>%
  filter(doc_length < 1000) %>% # remove outliers
  ggplot(., aes(
    y = pattern_indicator,
    x = doc_length,
    fill = pattern_indicator,
    color = pattern_indicator
  )) +
  labs(
    title = paste(
      "BEGUTACHTUNGSVERFAHREN EPIDEMIEGESETZ-NOVELLE:<br>",
      "Verteilung der Länge von Stellungnahmen mit u. ohne Formulierung ",
      "'", str_trunc(submission_pattern, 20), "'"
    ),
    subtitle = "Zur besseren Sichtbarkeit wurden Stellungnahmen mit mehr als 1000 Zeichen ausgeschlossen.",
    x = "Länge Stellungnahme in Zeichen",
    caption = my_caption
  ) +
  ggridges::geom_density_ridges(scale = 5) +
  scale_fill_manual(
    values = c("FALSE" = "darkolivegreen4", "TRUE" = "orange"),
    labels = c(
      "FALSE" = glue::glue("ohne '{str_trunc(submission_pattern, 20)}'"),
      "TRUE" = glue::glue("mit '{str_trunc(submission_pattern, 20)}'")
    )
  ) +
  scale_color_manual(
    values = c("FALSE" = "darkolivegreen4", "TRUE" = "orange"),
    labels = c(
      "FALSE" = glue::glue("ohne '{str_trunc(submission_pattern, 20)}'"),
      "TRUE" = glue::glue("mit '{str_trunc(submission_pattern, 20)}'")
    )
  ) +
  scale_y_discrete(expand = expansion(mult = c(0, 0.1))) +
  theme_post() +
  theme(
    plot.title.position = "plot",
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    plot.caption = element_text(hjust = 0),
    legend.position = "top",
    legend.justification = "left",
    legend.direction = "horizontal",
    legend.title = element_blank(),
    axis.title.y = element_blank(),
    axis.text.y = element_blank()
  ) +
  guides(fill = guide_legend(reverse = TRUE), color = guide_legend(reverse = TRUE))
```
What the graph shows is that the length of submissions which contain the formulation ‘Tür und Tor’ is concentrated at around 350 characters, while the length of other submissions is more dispersed. Hence, submissions containing the distinct formulation are also pretty similar in terms of their length. I take this as another indicator of the similarity of the submissions with our search phrase. As for the density curve of those not containing the search phrase, note that there is still a little ‘bump’ around 350 characters. This could reflect that we are missing out on some submissions which originate from the same template but do not include the exact wording of our search phrase.
4.4 Submissions over time
I was also wondering whether there is any ‘temporal’ pattern as to the date of the submission. If the submissions were the result of a social media campaign, are the timings of their submissions clustered? This could be read as a reaction to e.g. a social media post providing a template and asking to submit it.
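A sketch of how such a chart can be produced from the data assembled above (the daily aggregation is my own choice):

```r
# bar chart: number of public submissions per day, split by key-phrase indicator
df_submission %>%
  filter(public != "not public") %>%
  mutate(day = lubridate::date(date)) %>%
  count(day, pattern_indicator) %>%
  ggplot(aes(x = day, y = n, fill = pattern_indicator)) +
  geom_col() +
  labs(
    x = "date of submission",
    y = "number of public submissions",
    fill = "contains 'Tür und Tor'"
  )
```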
I think the bar chart above doesn’t suggest any temporal clustering of submissions including ‘Tür und Tor’. Then again, the speed with which the amendment was introduced and the very short period to file submissions didn’t really allow for much clustering anyway.
5 Analysis with ‘quanteda’
As already indicated in the introduction, the above analysis - while informative to some extent - has its limitations. Limiting the search to the presence/absence of the key phrase ‘Tür und Tor’ is prone to miss out on other wordings which might be even more prevalent in some of the submissions. Furthermore, only marginal changes to the wording could result in the omission of submissions which are otherwise very similar.
To overcome these limitations and approach the issue of text similarity in a methodologically more rigorous manner, let’s now take up some of the features of the quanteda package. I had never worked with the package before, but from what I can tell it provides a very comprehensive and powerful set of tools for quantitative text analysis (for an overview see here).
To be able to use the offered tools, we have to convert our collection of submission texts into a ‘corpus’. Before doing so, I further clean the extracted texts so that we only have the actual text of the submission and not any header, footer, or meta-data. Then let’s create a corpus.
The corpus is essentially a named vector, with the names of the submission files as the names of the vector’s elements.
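A minimal sketch of the corpus creation, assuming the cleaned texts sit in `doc_text` (attaching the key-phrase indicator from above as the corpus’ single docvar):

```r
library(quanteda)

df_public <- df_submission %>% filter(public != "not public")

corp_submissions <- corpus(
  df_public$doc_text,
  docnames = df_public$file_name,
  docvars = data.frame(pattern_indicator = df_public$pattern_indicator)
)

print(corp_submissions, max_ndoc = 6)
```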
```
Corpus consisting of 6 documents and 1 docvar.
835051.pdf :
"Sehr geehrte Damen und Herren! Gegen den Gesetzesentwurf des..."

833930.pdf :
"Ich erhebe schärfste Einwendungen gegen diesen Gesetzesentwu..."

833003.pdf :
"Keine Stellungnahme"

831523.pdf :
"Ich erhebe schärfste Einwendungen gegen diesen Gesetzesentwu..."

830585.pdf :
"Ich kann den vorgesehenen Änderungen nicht zustimmen, da wir..."

829581.pdf :
"Ich weigere mich massiv gegen diese Gesetzesänderung da dies..."
```
5.1 Collocations
With this as the basis, we can now identify all collocations of a given length, e.g. of 3 words as in ‘Tür und Tor’. The textstat_collocations function provides us with the frequencies for all (!) collocations of the requested length. For performance reasons, I limited the results to those collocations which yield more than 500 hits. For a detailed description of the function see here.
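A sketch of this step (the tokenisation choices, such as removing punctuation and lowercasing, are mine; `textstat_collocations` lives in the quanteda.textstats package):

```r
library(quanteda.textstats)

# tokenize the corpus once; reused for the similarity analysis below
toks_submissions <- corp_submissions %>%
  tokens(remove_punct = TRUE) %>%
  tokens_tolower()

# all 3-word collocations occurring more than 500 times
colls <- textstat_collocations(toks_submissions, size = 3, min_count = 500)
head(colls %>% arrange(desc(count)))
```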
And as it turns out, our inductively selected key phrase ‘Tür und Tor’ is close to the most frequent one. The most frequent collocation, ‘gegen diesen gesetzesentwurf’, appears 2,372 times.1
5.2 Cosine similarity
Quantitative text analysis offers also the concept of cosine similarity when it comes to evaluating the similarity of two or more documents. I won’t dig into the details here, but want to refer you to the video below which provides in my view a clear and accessible overview.
One of the convenient properties of the concept is that it allows us to compare documents of any length along a scale from 0 (not similar at all) to 1 (identical).
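Just for reference, and without going into those details: the measure is the normalized dot product of the two documents’ term-weight vectors, with $w_{i,k}$ denoting the weight of term $i$ in document $k$:

$$
\text{cosine}(d_1, d_2) \;=\; \frac{\sum_i w_{i,1}\, w_{i,2}}{\sqrt{\sum_i w_{i,1}^2}\;\sqrt{\sum_i w_{i,2}^2}}
$$

Since term weights are non-negative, the value cannot drop below 0 or exceed 1, irrespective of the documents’ lengths.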
In order to control for different document lengths when checking for similarity, we have to create a weighted document-feature matrix which accounts for the relative frequency of words (see here).
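In quanteda terms, this step boils down to something like the following sketch (`textstat_simil` is part of the quanteda.textstats package; the object and column names match those used below):

```r
# proportionally weighted document-feature matrix
dfm_submissions <- toks_submissions %>%
  dfm() %>%
  dfm_weight(scheme = "prop") # relative term frequencies per document

# pairwise cosine similarities between all documents
df_similarity <- textstat_simil(
  dfm_submissions,
  method = "cosine",
  margin = "documents"
) %>%
  as.data.frame() # one row per pair: document1, document2, cosine
```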
What we obtain now is a comparison of all submission pairs (!) as to their cosine similarity. Overall, these are 5,886,253 pairs (duplicates removed, i.e. document 1 vs document 2 = document 2 vs document 1). Seems like text analysis can quickly get a bit large. To keep the table manageable, I only show the first thousand pairs.
To be able to group these pairs according to their similarity, I cut their cosine similarity values into 10 intervals (sim_interval). We can now count the number of submission pairs per interval. The barplot below presents the relative result, i.e. the share of submission pairs from the total number of pairs.
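The binning itself can be done with cut() (a sketch, matching the interval labels used below):

```r
df_similarity <- df_similarity %>%
  mutate(sim_interval = cut(
    cosine,
    breaks = seq(0, 1, by = 0.1),
    right = FALSE,          # left-closed intervals: [0, 0.1), [0.1, 0.2), ...
    include.lowest = TRUE   # makes the last interval [0.9, 1]
  ))
```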
Code: Frequency of pairs per cosine interval
```r
df_similarity <- df_similarity %>%
  left_join(.,
    df_submission %>% select(file_name, pattern_indicator),
    by = c("document1" = "file_name")
  ) %>%
  rename(document1_indicator = pattern_indicator) %>%
  left_join(.,
    df_submission %>% select(file_name, pattern_indicator),
    by = c("document2" = "file_name")
  ) %>%
  rename(document2_indicator = pattern_indicator) %>%
  mutate(indicator = case_when(
    document1_indicator == T & document2_indicator == T ~ "both submissions with key phrase",
    document1_indicator == F & document2_indicator == F ~ "both submissions without key phrase",
    TRUE ~ as.character("one submission without key phrase")
  ))

vec_color_submissions <- c("#FFC125", "#104E8B", "grey50")
names(vec_color_submissions) <- c(
  "both submissions with key phrase",
  "one submission without key phrase",
  "both submissions without key phrase"
)

# barplot
df_bar_sim <- df_similarity %>%
  group_by(sim_interval, indicator) %>%
  summarise(interval_n = n()) %>%
  ungroup() %>%
  mutate(interval_rel = interval_n / sum(interval_n)) %>%
  mutate(law = "EpidemieG") %>%
  mutate(indicator = fct_relevel(indicator, names(vec_color_submissions)))

pl_bar_sim <- df_bar_sim %>%
  ggplot() +
  labs(
    title = "PUBLIC CONSULTATION PROCEDURE ON EPIDEMIC LAW:<br> Distribution of all submission pairs' cosine similarity",
    subtitle = "",
    y = "% of all submission pairs",
    x = "cosine similarity",
    caption = my_caption
  ) +
  geom_bar(
    aes(x = sim_interval, y = interval_rel, fill = indicator),
    position = position_stack(),
    stat = "identity"
  ) +
  geom_text(
    data = . %>%
      mutate(interval_rel_label = case_when(
        indicator == "both submissions without key phrase" ~ NA_real_,
        interval_rel > 0.05 ~ interval_rel,
        TRUE ~ NA_real_
      )),
    aes(
      x = sim_interval,
      y = interval_rel,
      label = interval_rel_label %>% scales::percent(., accuracy = 0.01),
      group = indicator
    ),
    position = position_stack(vjust = 0.5, reverse = F),
    color = "white"
  ) +
  geom_text(
    aes(
      x = sim_interval,
      y = interval_rel,
      label = ..y.. %>% scales::percent(., accuracy = 0.01),
      group = sim_interval
    ),
    stat = "summary", fun = sum, vjust = -1
  ) +
  scale_y_continuous(
    labels = scales::label_percent(accuracy = 1),
    expand = expansion(mult = c(0, 0.1))
  ) +
  scale_fill_manual(values = vec_color_submissions) +
  theme_post() +
  theme(
    axis.title.y = element_text(angle = 90, size = 9, color = "grey50", hjust = 1),
    legend.direction = "horizontal",
    legend.position = "top",
    plot.subtitle = element_blank(),
    plot.title = element_markdown(margin = unit(0.5, "cm")),
    legend.title = element_blank(),
    legend.text = element_text(size = 9)
  )
```
If we take 0.7 as a cut-off point, we can conclude that 27.71% of all submission pairs (!) were very similar. Note that every one of these pairs includes at least one document containing our key phrase ‘Tür und Tor’. On the other end of the scale, we see that the largest chunk of non-similar pairs (18.14% in the interval [0, 0.1)) consists of pairs of one document with the key phrase and the other document without it.
Note that I removed the key phrase before calculating the cosine similarity. Hence, the high degree of similarity of submission pairs containing the key phrase does not originate from the key phrase, but from the rest of the texts!
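In practice, this means dropping the phrase’s tokens before building the document-feature matrix; a sketch of how this can be done with quanteda’s multi-word pattern matching:

```r
# remove the key phrase as a token sequence, applied before the dfm step above
toks_submissions <- tokens_remove(
  toks_submissions,
  pattern = phrase(submission_pattern),
  case_insensitive = TRUE
)
```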
```
# A tibble: 30 × 4
# Groups:   sim_interval [10]
   sim_interval indicator                           sum_interval sum_indidicat…¹
   <fct>        <fct>                                      <dbl>           <dbl>
 1 [0,0.1)      both submissions with key phrase         0.00985          0.0391
 2 [0,0.1)      one submission without key phrase        0.181            0.721
 3 [0,0.1)      both submissions without key phrase      0.0604           0.240
 4 [0.1,0.2)    both submissions with key phrase         0.0161           0.117
 5 [0.1,0.2)    one submission without key phrase        0.0942           0.682
 6 [0.1,0.2)    both submissions without key phrase      0.0278           0.201
 7 [0.2,0.3)    both submissions with key phrase         0.0232           0.315
 8 [0.2,0.3)    one submission without key phrase        0.0415           0.562
 9 [0.2,0.3)    both submissions without key phrase      0.00911          0.123
10 [0.3,0.4)    both submissions with key phrase         0.0285           0.483
# … with 20 more rows, and abbreviated variable name ¹sum_indidicator_rel
```
Another way to approach the issue is to look into the composition of each interval. The plot below provides the pertaining overview. As becomes quite clear, the share of submission pairs in which both documents contain our search phrase grows the more similar the documents are. Again, note that the detected similarity does not originate from both documents containing ‘Tür und Tor’, since I removed these words before calculating the pairs’ similarity. I take the plot below as another indication of the use of a submission template.
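A sketch of such a composition plot (shares within each interval, reusing the colors defined above):

```r
# composition of each cosine interval by pair type
df_similarity %>%
  count(sim_interval, indicator) %>%
  ggplot(aes(x = sim_interval, y = n, fill = indicator)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  scale_fill_manual(values = vec_color_submissions) +
  labs(x = "cosine similarity", y = "share of pairs within interval")
```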
Finally, to further highlight the similarity of documents with Tür und Tor in contrast to other submission pairs, let’s contrast the distribution of the cosine similarity values per group with a violin plot.
Code: groupwise comparison
```r
# create indicator for each document
df_submission <- df_submission %>%
  mutate(search_term = str_detect(
    doc_text,
    regex(submission_pattern,
      ignore_case = TRUE,
      multiline = TRUE,
      dotall = TRUE
    )
  ))

# add indicator to pairwise similarity comparison for first document
df_similarity <- df_similarity %>%
  left_join(.,
    df_submission %>% select(file_name, search_term),
    by = c("document1" = "file_name")
  ) %>%
  rename(document1_term = search_term)

# add indicator to pairwise similarity comparison for second document
df_similarity <- df_similarity %>%
  left_join(.,
    df_submission %>% select(file_name, search_term),
    by = c("document2" = "file_name")
  ) %>%
  rename(document2_term = search_term)

# create one, both, none groups
df_similarity <- df_similarity %>%
  mutate(group_id = case_when(
    document1_term == TRUE & document2_term == TRUE ~ "both",
    document1_term == FALSE & document2_term == FALSE ~ "none",
    document1_term == TRUE & document2_term == FALSE ~ "one",
    document1_term == FALSE & document2_term == TRUE ~ "one",
    TRUE ~ as.character("missing")
  ))
```
Code
```r
pl_sim_groups <- df_similarity %>%
  mutate(indicator = fct_relevel(indicator, names(vec_color_submissions))) %>%
  ggplot() +
  labs(
    title = "PUBLIC CONSULTATION PROCEDURE ON EPIDEMIC LAW:<br>Cosine similarity of submission pairs per group",
    subtitle = str_wrap(
      glue::glue("Grouped by presence/absence of key phrase '{submission_pattern}'. Key phrase was removed before calculating cosine similarity."),
      110
    ),
    y = "cosine similarity",
    caption = my_caption
  ) +
  geom_violin(
    aes(x = indicator, y = cosine, fill = indicator),
    color = NA
  ) +
  theme_post() +
  theme(
    legend.position = "none",
    axis.title.x = element_blank()
  ) +
  scale_fill_manual(values = vec_color_submissions)
```
I think the result shows quite clearly that those submissions which include the key phrase are much more similar to each other than all other groups of submission pairs, even after the key phrase has been removed. The graph also shows that those submissions which do not contain the wording are hardly similar to each other.
5.3 Comparison with other bills
But how does this result compare with submissions to other bills? To put the results for the epidemic bill into context, I repeated the analysis from above for another bill, the 2018/19 Fundamental Law on Social Welfare (‘Sozialhilfegrundgesetz’).2 The plot below contrasts the distributions of cosine similarities between submission pairs.
I think the result is quite telling. In contrast to the epidemic law, the values pertaining to the Law on Social Welfare are almost normally distributed and centered on the [0.3, 0.4) interval. In other words, there are almost no submission pairs which are particularly similar or dissimilar. I would think that this is a distribution typical for submissions without the (prevalent) presence of a text template.
6 Wrap up
So that’s it (for now). Again, this got more lengthy than initially intended. A next step could be to see whether the similarity of submissions has grown since it became possible to file them via email. After all, copying a template and modifying it a bit before submission is not too difficult. Maybe as a final word: this is not to say that copy-and-paste submissions are of some sort of diminished democratic value. As always, if you have any comment or question regarding the post, feel free to contact me via DM on Twitter.
Footnotes
Note that the frequency of ‘Tür und Tor’ in the quanteda result differs from the frequency I detected earlier. I am not entirely sure why this is, but I assume it’s due to the different approaches: while our result above is the number of documents containing the phrase at least once, the collocation count reflects the total number of occurrences across all documents.↩︎
I don’t reproduce the code for this analysis here, but it is available on my GitHub.↩︎
@online{schmidt2020,
author = {Schmidt, Roland},
title = {Similarity of Public Submissions to {Austria’s} Amendment of
the Epidemic Law},
date = {2020-12-22},
url = {https://werk.statt.codes/posts/2020-12-22-submissionsepidemiclaw},
langid = {en}
}