Elections in Vienna are today, and while glancing through the electoral lists I couldn’t help but pay attention to candidates’ birth years. Maybe that’s an age thing… This got me thinking that I haven’t seen any systematic analysis of parties’/candidates’ age profiles. So here are my two cents as a modest contribution. Again, I’ll focus mainly on the relevant steps in R and the related number crunching. Due to a lack of time, and not being an expert on Vienna’s electoral system, I’ll be brief when it comes to substantive matters. But the presented results hopefully provide sufficient material to dig into.
As always, if you see any glaring error or have any constructive comment, feel free to let me know (best via Twitter DM).
3 Data
Again, as so often, the trickiest part is to get the data ‘liberated’ from the format it is provided in. The entire list of candidates is published in this pdf. Note that there are three lists: One for the city council (‘Gemeinderat’; composed on the basis of the results in 18 multi-member electoral districts), one for the 23 district councils (‘Bezirksrat’; one in each district), and the ‘city election proposal’ (‘Stadtwahlvorschlag’; admittedly a somewhat clumsy translation). The latter doesn’t constitute a body in itself, but serves to allocate mandates which remained unassigned after counting the votes for the city council (‘zweites Ermittlungsverfahren’/similar to the d’Hondt procedure).
When it comes to extracting the data from the linked pdf, a difficulty arises from the two-column format of the document. Simple row-wise extraction doesn’t help much, since it would put candidates side by side who could be from different parties. Similarly, simply isolating the two columns and extracting candidates from each would not do the trick either, since breaks between parties, districts, etc. run across both columns rather than within one. To illustrate this, I resorted to cutting-edge technology and drew the two arrows below:
Luckily, the tabulizer package is not only very powerful when it comes to extracting text/data from a pdf, it is also sophisticated enough to take into consideration the text flow highlighted above. I am not familiar with the underlying heuristic, but I assume it is contingent on consistently formatted section headings. Hence, empowered with this tool, retrieving the text becomes rather effortless. The subsequent steps are a battery of regular expressions to extract the specific data we are interested in. To see the code, unfold the snippets below.
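To give an idea of what such a regular-expression step can look like, here is a minimal sketch in base R. The layout of the raw lines is purely hypothetical (position, name, birth year, occupation, zip code), as are the names `text_raw`, `pattern` and `df_candidates`; the real pdf text will need different and more patterns.

```r
# Hypothetical raw lines; the actual layout of the extracted pdf text may differ.
text_raw <- c(
  "1. Mustermann Max, 1975, Angestellter, 1160 Wien",
  "2. Musterfrau Erika, 1988, Studentin, 1020 Wien",
  "Wahlvorschlag Nr. 1"
)

# Capture groups: list position, name, birth year, occupation, zip code
pattern <- "^(\\d+)\\.\\s+([^,]+),\\s*(\\d{4}),\\s*([^,]+),\\s*(\\d{4})\\s+Wien$"

# Keep only lines which look like a candidate entry
is_candidate <- grepl(pattern, text_raw, perl = TRUE)
candidate_lines <- text_raw[is_candidate]

# regexec() returns the full match plus one element per capture group
matches <- regmatches(candidate_lines, regexec(pattern, candidate_lines, perl = TRUE))
parts <- do.call(rbind, matches)

df_candidates <- data.frame(
  list_position = as.integer(parts[, 2]),
  name = parts[, 3],
  birth_year = as.integer(parts[, 4]),
  zip = parts[, 6],
  stringsAsFactors = FALSE
)

print(df_candidates)
```

Lines which don't match the pattern (here the made-up heading) are simply dropped; in practice such headings would be used to assign party and district to the candidates below them.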
Code: Extract text from pdf
library(tidyverse) # provides %>% and enframe()

df_raw <- tabulizer::extract_text(
  file = here::here("_blog_data", "vienna_elections_2020", "amtsblatt2020.pdf"),
  pages = c(1:115),
  encoding = "UTF-8" # capital letters of UTF!
) %>%
  enframe(name = NULL, value = "text_raw")
After these few steps we have a searchable/sortable table of all candidates (or rather candidatures, since one person can be a candidate in multiple districts or on multiple lists). There have been 8,983 candidatures by 5,038 individuals.
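The difference between candidatures and individuals boils down to counting rows versus counting distinct persons. A minimal sketch with toy data follows; identifying a person by name plus birth year is an assumption made for illustration (namesakes born in the same year would be conflated), and `df_toy` is a made-up stand-in for the cleaned table.

```r
# Toy data: person A runs on two lists, person C on three
df_toy <- data.frame(
  name = c("A", "A", "B", "C", "C", "C"),
  birth_year = c(1980, 1980, 1990, 1975, 1975, 1975),
  election = c("Gemeinderat", "Stadt", "Gemeinderat",
               "Gemeinderat", "Bezirksvertretungen", "Stadt"),
  stringsAsFactors = FALSE
)

# Every row is one candidature
n_candidatures <- nrow(df_toy)

# Individuals: distinct combinations of name and birth year (assumption)
n_individuals <- nrow(unique(df_toy[, c("name", "birth_year")]))

cat(n_candidatures, "candidatures by", n_individuals, "individuals\n")
```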
The table essentially provides all necessary data for the subsequent analysis. While the pdf does not include the exact birth date of each candidate, it provides us with their birth years which we can take as a proxy for age. Note that I also extracted candidates’ residence zip code to see how often place of residence and candidature actually overlap (see below).
Source:
data: https://www.wien.gv.at/politik/wahlen/grbv/2020/ analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes
4 Analysis
4.1 Oldest and youngest candidates
Let’s now look at the overall youngest and oldest candidates. We could retrieve this information already from the main table provided above (sort column birth year). Here, however, let’s nest candidates’ different candidatures for the sake of clarity.
The oldest candidate is Waschiczek Wolfgang, who was born in 1928. Not bad.
4.2 Average birth year per election and party
Let’s now look at the average year of birth of parties’ candidates on each of the different electoral levels. The table below provides the median, mean and standard deviation for each party. The thin white line in the density plots on the right indicates the median.
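Vienna elections 2020: Average birth year of candidates (WIEN-WAHLEN 2020: Durchschnittliches Geburtsjahr der KandidatInnen)

City council (Gemeinderat)

| Rank | Party | Median | Mean | Std. dev. |
|------|-------|--------|---------|-------|
| 1 | VOLT | 1991.5 | 1992.12 | 2.85 |
| 2 | SÖZ | 1988.0 | 1985.63 | 10.57 |
| 3 | BIER | 1986.0 | 1984.50 | 5.83 |
| 4 | LINKS | 1983.0 | 1978.02 | 16.36 |
| 5 | WIFF | 1981.0 | 1975.00 | 26.75 |
| 6 | NEOS | 1977.0 | 1977.59 | 13.20 |
| 7 | SPÖ | 1976.0 | 1975.66 | 12.42 |
| 8 | ÖVP | 1975.0 | 1975.26 | 16.06 |
| 9 | GRÜNE | 1973.0 | 1973.43 | 12.83 |
| 10 | FPÖ | 1970.0 | 1971.81 | 14.96 |
| 11 | HC | 1968.0 | 1970.49 | 12.92 |
| 12 | PRO | 1959.0 | 1963.27 | 13.30 |

District councils (Bezirksvertretungen)

| Rank | Party | Median | Mean | Std. dev. |
|------|-------|--------|---------|-------|
| 1 | VOLT | 1992.0 | 1992.27 | 4.18 |
| 2 | WANDL | 1990.0 | 1986.62 | 12.73 |
| 3 | KURZ | 1989.0 | 1987.71 | 7.90 |
| 4 | SÖZ | 1988.0 | 1985.35 | 11.17 |
| 5 | BIER | 1986.0 | 1986.77 | 3.26 |
| 6 | LINKS | 1985.0 | 1979.47 | 15.52 |
| 7 | NEOS | 1980.0 | 1979.68 | 14.28 |
| 8 | GRÜNE | 1973.0 | 1973.55 | 13.48 |
| 9 | SPÖ | 1972.0 | 1972.73 | 15.30 |
| 9 | ÖVP | 1972.0 | 1972.44 | 18.37 |
| 11 | PdA | 1969.0 | 1967.67 | 13.05 |
| 12 | WIR | 1968.5 | 1969.25 | 5.95 |
| 13 | FPÖ | 1968.0 | 1969.75 | 16.46 |
| 13 | HC | 1968.0 | 1970.16 | 12.91 |
| 13 | VOLK | 1968.0 | 1970.67 | 15.18 |
| 16 | PRO | 1964.5 | 1964.33 | 13.21 |
| 17 | WIEN | 1960.0 | 1958.67 | 7.89 |
| 18 | WIFF | 1958.0 | 1961.77 | 20.93 |
| 19 | PH | 1955.0 | 1957.40 | 5.94 |

City election proposal (Stadt)

| Rank | Party | Median | Mean | Std. dev. |
|------|-------|--------|---------|-------|
| 1 | VOLT | 1991.5 | 1992.12 | 2.95 |
| 2 | LINKS | 1987.0 | 1981.74 | 13.88 |
| 3 | BIER | 1986.0 | 1984.70 | 6.52 |
| 4 | SÖZ | 1984.5 | 1983.65 | 11.38 |
| 5 | ÖVP | 1977.0 | 1976.58 | 15.48 |
| 6 | NEOS | 1976.5 | 1977.02 | 13.89 |
| 7 | SPÖ | 1976.0 | 1975.51 | 13.18 |
| 8 | GRÜNE | 1973.0 | 1973.79 | 12.25 |
| 9 | PRO | 1970.0 | 1966.60 | 9.40 |
| 10 | FPÖ | 1968.0 | 1969.77 | 15.82 |
| 11 | HC | 1967.0 | 1968.61 | 13.00 |

Distribution column (Verteilung): density plots per party, not shown; the vertical white line indicates the median.
data: https://www.wien.gv.at/politik/wahlen/grbv/2020/
analysis: Roland Schmidt | @zoowalk | https://werk.statt.codes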
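The summary statistics in the table above can be obtained with a grouped aggregation. The following is a minimal base R sketch on toy data; `df_toy` and its column names are assumptions for illustration, not the actual data frame behind the table.

```r
# Toy data: two made-up parties with three candidates each
df_toy <- data.frame(
  election = rep("Gemeinderat", 6),
  party = c("X", "X", "X", "Y", "Y", "Y"),
  birth_year = c(1990, 1992, 1994, 1960, 1970, 1995),
  stringsAsFactors = FALSE
)

# Median, mean and standard deviation of birth year per election and party
stats <- aggregate(
  birth_year ~ election + party,
  data = df_toy,
  FUN = function(x) c(median = median(x), mean = mean(x), sd = sd(x))
)

# aggregate() returns a matrix column; flatten it into regular columns
stats <- do.call(data.frame, stats)

# Rank parties from youngest to oldest by median birth year
stats <- stats[order(-stats$birth_year.median), ]

print(stats)
```

Sorting by the median rather than the mean keeps the ranking robust against a few very old or very young candidates, which matters for small lists.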
I won’t dig into every detail, so only a few general remarks: On the city council level (‘Gemeinderat’) the established parties (SPÖ, FPÖ, Greens, ÖVP, Neos) feature (on average) somewhat older candidates than the other, newer parties. On the district level (‘Bezirksvertretungen’), the former occupy more of a middle ground. Among them, Neos’ candidates are on average younger, FPÖ candidates older. Funnily, when ranking parties according to their median birth year, FPÖ and Team HC Strache (led by the former FPÖ leader) are always adjacent.
4.3 Average birth year per electoral district
Let’s now take up a ‘geographical perspective’ and see how birth years are distributed within electoral districts and the difference between the latter.
There is quite a difference in the median value between Ottakring (the youngest, 1979) and Donaustadt (the oldest, 1970.5). But then again, the median is just an aggregate value and conveys only part of the wider picture. I think the distribution curves are more telling. Whether that’s useful information or not, I am not entirely sure. At least it’s something I have not seen before.
Plot: District Councils
Plot: City electoral proposal
4.4 Average birth year per electoral district and party
Now let’s dig deeper and look at the distribution of birth years by electoral district and party, again for each election (city council, district councils).
4.4.1 Graph
Hover with the mouse over the box plot and dots to obtain the pertaining data.
Plot: Vienna election 2020: average age per electoral district (based on the birth year stated in the election proposal).
4.5 Birth year and position on electoral list
Another question which came to my mind: How is a candidate’s birth year/age related to his or her position on the electoral list? Are more senior candidates more likely to be on more attractive, i.e. top, positions? Or is youth related to competitive positions? Are there any differences between the parties or elections (city vs district level) when it comes to this relation? Hover with the mouse over the individual dots to obtain data on individual candidates.
Plot: City Council - Age and position on electoral list
Interestingly, there are some subtle differences between the slopes of the regression lines. Note how the line rises with higher birth years for the SPÖ and the FPÖ. In other words, younger candidates tend to be on higher (less attractive) positions on the electoral list for these two parties. As for the other major parties, there is no comparable relation, at least not as visible as in these two cases. Obviously, this is purely descriptive.
To see the results for the other electoral levels, unfold the sections below.
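To put a number on such slopes, one could fit a simple linear regression of list position on birth year for each party. This is only a sketch: it assumes the lines in the plot are ordinary least-squares fits, and `df_toy` with its made-up parties X and Y is purely illustrative.

```r
# Toy data: in party X younger candidates sit further down the list,
# in party Y the list position is unrelated to birth year
df_toy <- data.frame(
  party = rep(c("X", "Y"), each = 5),
  birth_year = rep(c(1960, 1970, 1980, 1990, 2000), times = 2),
  list_position = c(1, 2, 3, 4, 5, 3, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# One linear model per party; keep only the coefficient on birth year
slopes <- sapply(split(df_toy, df_toy$party), function(d) {
  coef(lm(list_position ~ birth_year, data = d))[["birth_year"]]
})

print(round(slopes, 3))
```

A positive slope means that a later birth year goes along with a higher, i.e. less attractive, list position.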
Plot: District Councils - Age and position on electoral list
Plot: City - Age and position on electoral list
4.6 Non-resident candidates
When extracting data from the pdf document with candidates’ details, I also retrieved the zip code of their places of residence. The idea was to see whether candidates actually live in the electoral districts where they are running for office. For lack of a better term, I call those who don’t ‘non-resident candidates.’
I could imagine that ‘non-residency’ conflates quite a number of factors, and the reasons and consequences of ‘non-residency’ are likely to be quite complex. A possible hypothesis could be that parties with a high number of non-resident candidates are unevenly institutionalized across the city and hence have to bring in members from outside of the electoral district. But again, this is purely speculative and I haven’t thought, let alone read, about it systematically. Nevertheless, as some kind of tentative inquiry, I thought it’s worth looking at the numbers. Three specific questions came to my mind: 1) Are there parties which feature such non-resident candidates particularly frequently? 2) Are there districts where non-resident candidates run for office particularly frequently? 3) Do the shares of non-resident candidates differ between elections for the city council and for the district councils? I would think that the relation between a candidate and her electoral district, i.e. place of residence, is more significant in the latter case.
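As a rough illustration of how residency can be derived from the zip code: Vienna zip codes run from 1010 to 1230, and the two middle digits give the district number (e.g. 1160 is the 16th district, Ottakring). The sketch below relies on that rule; the helper `zip_to_district()` and the toy data are made up for illustration. Note that this direct mapping only fits the 23 district councils; the 18 city council electoral districts would need an additional lookup.

```r
# Hypothetical helper: district number from a Vienna zip code, NA otherwise
zip_to_district <- function(zip) {
  zip <- as.character(zip)
  # Valid Vienna district zip codes: 1010, 1020, ..., 1230
  is_vienna <- grepl("^1(0[1-9]|1[0-9]|2[0-3])0$", zip)
  ifelse(is_vienna, as.integer(substr(zip, 2, 3)), NA_integer_)
}

# Toy data: district in which a person runs and zip code of residence
df_toy <- data.frame(
  name = c("A", "B", "C"),
  district = c(16L, 16L, 2L),
  zip = c("1160", "1020", "3400"),
  stringsAsFactors = FALSE
)

# Candidates living elsewhere (or outside Vienna) count as "outside"
home_district <- zip_to_district(df_toy$zip)
df_toy$residence <- ifelse(
  !is.na(home_district) & home_district == df_toy$district,
  "inside",
  "outside"
)

print(df_toy)
```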
4.6.1 Share per party
Code: Out-of-residence candidates per party and election
# party with most out-of-residence candidates
df_residence_party <- df_clean %>%
  filter(!str_detect(election, "Stadt")) %>%
  group_by(election, party, residence) %>%
  summarize(n_residence = n()) %>%
  mutate(rel_residence = n_residence / sum(n_residence, na.rm = T)) %>%
  filter(residence == "outside") %>%
  arrange(desc(rel_residence)) %>%
  ungroup()

# fn for barchart
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginRight = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "right"), chart, label)
}

# assemble table
tb_residence_party <- df_residence_party %>%
  select(party, everything()) %>%
  reactable(.,
    columns = list(
      party = colDef(name = "Party"),
      election = colDef(name = "Election"),
      residence = colDef(show = FALSE),
      n_residence = colDef(
        name = "Number",
        align = "left",
        cell = function(value) {
          width <- paste0(value / max(df_residence_party$n_residence) * 100, "%")
          value <- format(value, trim = F, width = 3, justify = "right")
          bar_chart(value, width = width, fill = "#00bfc4")
        }
      ),
      rel_residence = colDef(
        name = "Share",
        align = "left",
        cell = function(value) {
          width <- paste0(value * 100, "%")
          value <- scales::percent(value)
          bar_chart(value, width = width, fill = "orange", background = "#e1e1e1")
        }
      )
    ),
    bordered = F,
    compact = TRUE,
    highlight = TRUE,
    style = list(fontSize = "10px"),
    filterable = TRUE,
    theme = reactableTheme(
      borderColor = "#7f7f7f",
      borderWidth = 1,
      backgroundColor = "#f0eff0",
      filterInputStyle = list(color = "green", backgroundColor = plot_bg_color)
    )
  ) %>%
  add_title(title = "Parties' number and share of non-resident candidates per election") %>%
  add_source(source = "Non-resident candidates: Candidates not residing in their electoral district according to their residency zip code.")
Parties' number and share of non-resident candidates per election
Non-resident candidates: Candidates not residing in their electoral district according to their residency zip code.
4.6.2 Share per electoral district
Code: Share of out-of-residence candidates per district
pl_diff_share_non_residency <- diff_share_non_residency %>%
  drop_na() %>%
  ggplot() +
  labs(
    title = glue::glue("WIEN-WAHL 2020: Share of non-resident candidates per party.<br><span style=color:{paletteer_d('ggsci::default_jama')[2]}>City council (Gemeinderat)</span> vs <span style=color:{paletteer_d('ggsci::default_jama')[3]}>district councils (Bezirksräte)</span>"),
    subtitle = "Non-resident candidates: Candidates with different residence zip-code than their electoral district",
    x = "Share of non-resident candidates",
    caption = my_caption
  ) +
  # grey line connecting a party's two shares
  geom_segment(
    data = diff_share_non_residency_wide,
    aes(
      y = reorder(party, -Gemeinderat),
      yend = reorder(party, -Gemeinderat),
      x = Gemeinderat,
      xend = Bezirksvertretungen
    ),
    color = "grey50"
  ) +
  # one point per party and election
  geom_point(
    aes(
      y = reorder(party, -rel_residence),
      x = rel_residence,
      shape = election,
      color = election
    ),
    size = 2
  ) +
  scale_color_manual(
    values = c(
      "Gemeinderat" = paletteer_d("ggsci::default_jama")[c(2)],
      "Bezirksvertretungen" = paletteer_d("ggsci::default_jama")[c(3)]
    ),
    label = c("Gemeinderat" = "City council", "Bezirksvertretungen" = "District councils")
  ) +
  scale_shape(guide = "none") +
  scale_x_percent() +
  theme_post() +
  theme(
    legend.position = "none",
    legend.direction = "horizontal",
    legend.title = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.major.x = element_line()
  )
The graph shows that almost all parties feature a smaller share of non-resident candidates in the elections to the city council than in the elections to the district councils. This difference is particularly strong for the FPÖ and for Team HC Strache. However, there are two notable exceptions. One is WIFF - Wir für Floridsdorf, a party with a clear programmatic focus on one specific district. Hence, it doesn’t come as a surprise that the share of non-resident candidates for the district council is low. The second exception is the Greens. I am speculating here, but I could imagine that this is indicative of a strong grassroots, locally based party organization and reflects the engagement of candidates who actually live in their electoral districts. In any case, I find the graph quite noteworthy.
5 Wrap-up
So that’s it for now. Again, this has been largely about retrieving and crunching numbers, but I guess some bits are worth entertaining more substantively.