Hello everyone,
and first of all let me say a big fat
Thank you for 30,000 comments
Yup, we passed it some day last week and usually I do some kind of giveaway but I actually missed it this time.
So we’ll do the next one at 40,000, I guess :). But yeah, vielen lieben Dank. Your comments are part of the reason why this blog even made it through the first few years and without them it would feel really lonely around here. And also, you ask lots of great questions that add value to the articles themselves.
I read every single comment and I’ll continue to do that, so keep ’em coming :).
And now on to our next point which is… my failure.
Honestly, this week was a complete failure for me, article-wise. My plan was to do an exercise on noun gender, and I thought I had an idea.
I didn’t want to test single nouns but rather the rules that Slavica told us about last time. But for this to actually work, we’d need to do it flashcard-style… so you get asked the same thing repeatedly until it’s automatic. And then I realized I actually don’t have a flashcard setup in my quiz software.
So then I thought “Okay, I’m gonna do nouns” and I looked for lists of the most common nouns and so on, but actually it’s the same thing… I need a flashcard setup, not just “one up multiple choice”. So yeah… there went my idea of doing a gender practice :/.
I do have something interesting about gender though, because a while back someone sent me a really interesting email. His day name is Emmanuel Haton, but I’ll refer to him with his secret identity…. Excel-Man. He had done some serious number crunching on the issue of German gender and I found this so interesting that I absolutely had to share that with you.
So, using a giant database of German nouns and their frequency, Excel-Man used his superpower to check for trends between endings and gender and calculated how accurate each rule of thumb is if you just follow it blindly.
And he didn’t stop there. He then collected the most common exceptions to each rule of thumb and calculated the accuracy again. And it’s all weighted by frequency, mind you. So if 99 of 100 words follow a certain pattern but that one exception is super common, learning it will increase the accuracy quite a bit.
And to top it all off, it even accounts for compounds. So if die Sicht was an exception for instance, its frequency would also include words like Einsicht and Aussicht and so on.
He used the DeReWo frequency database of the Leibniz-Institut für Deutsche Sprache, which contains about 23,000 singular nouns and which is based on a collection of recent German texts totalling 23 billion words. That’s thousands and thousands of words.
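If you're wondering what "weighted by frequency" actually means here, this is a tiny sketch in Python. To be clear, it is not Excel-Man's actual spreadsheet: the function name and the frequency numbers are made up purely for illustration.

```python
def weighted_accuracy(nouns, guess, learned=()):
    """Share of noun occurrences whose gender you get right.

    nouns   -- list of (word, gender, frequency) tuples
    guess   -- the gender the rule of thumb tells you to say
    learned -- exceptions you know by heart (always counted as right)
    """
    total = sum(freq for _, _, freq in nouns)
    right = sum(
        freq
        for word, gender, freq in nouns
        if gender == guess or word in learned
    )
    return right / total


# Made-up frequencies, for illustration only (not DeReWo data)
toy_ung = [
    ("Zeitung", "f", 500),
    ("Wohnung", "f", 400),
    ("Sprung", "m", 100),
]

print(weighted_accuracy(toy_ung, "f"))                      # 0.9
print(weighted_accuracy(toy_ung, "f", learned={"Sprung"}))  # 1.0
```

If you only counted words, the rule would be right for 2 out of 3 of them. Weighted by how often the words show up, it's right 90% of the time in this toy example, and learning the one exception takes it to 100%.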
So it’s a pretty damn nice insight into regularities and how regular they are and all in all an incredible piece of work and I’m gonna share part of his email and the tables with you below.
Not all of it is equally useful in practice, but I’m sure you’ll find some really nice ideas and insights. And it’s generally just very interesting to see how accurate these trends really are.
So… take a look and then tell me what you think. I’m really really curious for your feedback and if you can use this for your studies.
Oh and also, since I don’t have an exercise, if you have a great tool for practicing German gender, please share it in the comments. I know there’s loads of apps out there but I have no idea what’s good.
So, I’ll see you in the comments, have a great weekend and bis nächste Woche :)
Oh, by the way… there’s a big new feature about to come to the website. Not sure when, but I’d say two weeks at the most.
It’s really really exciting, so get ready :) :) :).
***
German Gender and Big Data
(by Excel-Man)
I looked into the question of noun gender in German. I know, for a native speaker, this is a most uninteresting topic. But that’s only because you’ve been immersed all along in the language and therefore gender is obvious. Not so for a foreign learner, for whom it is something of an enigma and a holy grail. And for a language perfectionist like me it is a tragedy: I can easily umsonst walk by a bakery if I can’t remember the correct gender for Kuchen…
I decided to have a go at it, not from a semantic point of view (meaning: babies, diminutives and young animals are neuter; days, months and seasons are masculine, etc.) but using another ally… drumroll…
BIG DATA
Ok, really simple statistics. Ok, really some analysis in an Excel table. My hope was to find some usable rules, and I did find some. As always in languages, few rules are valid 100% of the time but I found some regularities that can save the day with a pretty good success rate.
Also I made sure to consider frequency data. I have read that overall in the German vocabulary the percentage of masculine/feminine words is such and such, which can help make some “bets” when you don’t know the gender of a word. But, to be usable, this kind of tip needs to be based on the usage frequency of the words, not only on their raw number.
Ideally usage frequency of spoken German should be considered, because rules of thumb are much more useful when speaking, as you don’t have the possibility to open a dictionary and check. Unfortunately I couldn’t get my hands on any such spoken word frequency database. Instead I relied on the general DeReWo database from the Leibniz-Institut für Deutsche Sprache.
The results
Some known things were confirmed:
- words in -UNG are feminine except der Sprung
- words in -ION are mostly feminine
- …
But there were also some less obvious things, for instance about the gender of words that end in -r.
The gender of words ending in -R
For some reason a final R sounds masculine to me, but that’s not exactly true.
First of all, let’s be semantic for one second: if the -R word designates a feminine person (die Mutter) or a masculine person (der Bauer), then go for the person’s gender.
The exception is das Opfer and I remember it by thinking that the Opfer doesn’t actually do something. Similarly, devices that do something (der Computer) are masculine.
– otherwise -AUER is always feminine
– other words ending in -ER are, to my surprise, more often neuter (das Meer, das Messer) except if the first letter is K – in that case, go for masculine (der Koffer).
– words ending in -UR and -HR are mostly feminine, apart from some exceptions, including das Jahr.
– for other -R words (that is, not ending in -ER), try masculine and you’ll be right 78% of the time.
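The order in which these -R rules are checked matters, so here is a minimal sketch of them as a Python function. The function name and its parameters are made up for illustration. Whether a noun is a person or a device can't be read off the spelling, so the caller has to pass that in. The only learned exception included is das Opfer.

```python
def guess_r_gender(word, person_gender=None, is_device=False):
    """Guess 'm', 'f' or 'n' for a noun ending in -r (rule of thumb only)."""
    w = word.lower()
    if not w.endswith("r"):
        raise ValueError("this sketch only covers nouns ending in -r")
    if w == "opfer":                 # the one exception named in the text
        return "n"
    if person_gender in ("m", "f"):  # die Mutter, der Bauer
        return person_gender
    if is_device and w.endswith("er"):   # der Computer
        return "m"
    if w.endswith("auer"):           # die Mauer
        return "f"
    if w.endswith("er"):             # der Koffer vs. das Messer
        return "m" if w.startswith("k") else "n"
    if w.endswith(("ur", "hr")):     # die Natur, die Uhr
        return "f"
    return "m"                       # all other -r words


print(guess_r_gender("Mutter", person_gender="f"))  # f
print(guess_r_gender("Computer", is_device=True))   # m
print(guess_r_gender("Mauer"))                      # f
print(guess_r_gender("Koffer"))                     # m
print(guess_r_gender("Messer"))                     # n
print(guess_r_gender("Uhr"))                        # f
print(guess_r_gender("Motor"))                      # m
```

Like the rules themselves, the function will still get the exceptions wrong. It returns "f" for Jahr, for instance, which is exactly why the exceptions are worth learning.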
Words ending with I
- words ending with “-ei” are overwhelmingly feminine
- other words in -i are 95% masculine.
- Words ending in -f, -b, -g (except -ung) are 90% masculine
- Words ending in -us, -uss, -uß (except das Haus) are 90% masculine. Easy to remember when thinking that -us is the typical masculine marker in Latin.
- Similarly, words in -um (the typical Latin marker for neuter) are indeed almost all neuter except for…
- …those in -aum, which are masculine.
For more statistical data, see results in tables A and B below. In every case you can see that learning a few exceptions seriously increases the probability of guessing right.
Finally, usage statistics let us know the list of words that are so frequent, including the frequency of their compounds, that it is worth learning their gender by heart (see table C).
Conclusion: doing this analysis was slightly geeky but fun. In total, the situations described here cover over 70% of the usage.
Enough to make me feel safer when ordering einen Kuchen at the bakery. And yes, it is der Kuchen.
** if you’re reading on mobile, please flip to landscape!! The full tables should fit then **
Table A
Gender of nouns in -R
(in total, these words make up 16% of all noun occurrences)
ending: | trend: | accuracy: | extras: | accuracy with extras: |
-er and is a feminine (Mutter) or masculine person (Bauer) | f/m | 99% | das Opfer | 100% |
-er and a device that does something, or a month | m | 97% | das Thermometer (and other “meter” devices), die Leiter | 99% |
-auer (if not a masculine person) | f | 100% | ||
-er and begins with K- | m | 91% | das Klavier, die Kammer, das Kloster | 100% |
other words in -er | n | 58% | der Fehler, der Meter, der Liter, der Ärger, der Hunger, der Finger, die Nummer, die Steuer, die Feier | 80% |
-hr or -ur | f | 67% | das Jahr, das Ohr, das Rohr, der Verkehr, der Schwur, der Azur | 100% |
other words ending in -r | m | 78% | das Tor, das Paar, das Haar, die Tür, die Bar, die Schar | 94% |
Table B
Other noun gender rules
when a noun ends with: | if you say: | then you are x% right | and if you learn these exceptions: | then you are x% right | (this rule covers x% of total noun occurrences) |
-ung | f | 99% | der Sprung | 100% | 9.2% |
-ion | f | 98% | der Champion, der Spion, der Skorpion, das Stadion, das Ion | 100% | 2.3% |
-eit | f | 98% | der Streit | 100% | 2.7% |
-ft | f | 94% | der Saft, der Lift, der Stift, das Geschäft, das Heft, der Schaft (but other words in -schaft are f) | 99% | 1.7% |
-e (except if a masculine person) | f | 90% | das Ende, das Interesse, das Gebäude, das Gelände, das Finale, das Auge, der Name, der Kaffee, der Schnee, der Käse | 97% | .4% |
-nn | m | 98% | das Kinn, das Zinn | 99% | 0.7% |
-g | m | 91% | die Burg, das Zeug, das XXX-ing | 99% | 2.9% |
-pf | m | 99% | das Geschöpf | 100% | 0.3% |
-ik | f | 99% | der Streik, das Mosaik | 100% | 1.0% |
-b | m | 89% | das Lob, das Grab, das Weib, das Pub, das Kalb, das Verb, das Laub | 99% | 0.4% |
-f | m | 88% | der/das Golf, der/die Elf, das Schaf, das Schiff, das Dorf, das Kaff | 99% | 1.4% |
-tz | m | 89% | das XXX-etz | 99% | 1.3% |
-nz | f | 90% | der Tanz, der Schwanz, der Glanz, der Kranz | 99% | 0.5% |
-us, -uss, -uß (except das Haus) | m | 92% | die Nuss, das Muß, das Plus, das Minus, das Virus, das Aus | 98% | 1.0% |
words in -um | |||||
-aum | m | 100% | | | 0.3% |
other words in -um | n | 97% | der Konsum, der Irrtum, der Reichtum | 100% | 0.9% |
words in -is | |||||
-nis | n | 86% | der Penis, die Kenntnis, die Erlaubnis | 96% | 0.5% |
-eis | m | 94% | das Eis, das Gleis | 99% | 0.5% |
other words in -is | f | 77% | das Remis, das Palais, der Mais, der Cannabis, der Kürbis | 95% | 0.1% |
words in “-i” | |||||
-ei | f | 98% | der Schrei, der Papagei, der Brei, das Geschrei | 100% | 0.6% |
other words in -i | m | 95% | die Safari, die Salami, die Gaudi, das Taxi, das Alibi, das Sushi, das Müsli | 99% | 0.8% |
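To show how a table like this can be used as a lookup, here is a minimal sketch in Python. It copies only a handful of rows and exceptions from Table B, and the names `RULES` and `guess_gender` are made up for illustration. The longest matching ending is checked first, so that -ung wins over -g, -aum over -um and -ei over -i. Exceptions are matched as whole words only, so compounds are not handled.

```python
# A few rows from Table B: ending -> (default gender, learned exceptions)
RULES = {
    "ung": ("f", {"sprung": "m"}),
    "eit": ("f", {"streit": "m"}),
    "aum": ("m", {}),
    "um":  ("n", {"konsum": "m", "irrtum": "m", "reichtum": "m"}),
    "ei":  ("f", {"schrei": "m", "papagei": "m", "brei": "m", "geschrei": "n"}),
    "i":   ("m", {"taxi": "n", "salami": "f"}),
    "g":   ("m", {"burg": "f", "zeug": "n"}),
}


def guess_gender(noun):
    """Return 'm', 'f' or 'n', or None if no rule in RULES applies."""
    w = noun.lower()
    # Try longer endings first so the more specific rule wins
    for ending in sorted(RULES, key=len, reverse=True):
        if w.endswith(ending):
            default, exceptions = RULES[ending]
            return exceptions.get(w, default)
    return None


for noun in ["Zeitung", "Sprung", "Raum", "Museum", "Irrtum", "Taxi", "Zug"]:
    print(noun, guess_gender(noun))
```

A noun with an ending that isn't in the dictionary, like Haus, simply gets `None` back, so you know you are on your own.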
Table C
Frequent nouns worth learning by heart
The frequency includes the frequency of the compound words with the same root.
der Tag | 1.8% |
die Xxx-schaft (but der Schaft) | 1.1% |
das Jahr | 1.0% |
die Zeit | 0.8% |
die Stadt | 0.6% |
der Euro | 0.6% |
das Ende | 0.5% |
das Spiel | 0.5% |
der Fall | 0.5% |
das Haus | 0.5% |
der Platz | 0.4% |
der Xxx-trag | 0.4% |
der Satz | 0.4% |
die Arbeit | 0.4% |
der Punkt | 0.3% |
das Land | 0.3% |
der Meter (the unit, but das Thermometer) | 0.3% |
der Bau | 0.3% |
das Leben | 0.3% |
die Welt | 0.3% |
der Gang | 0.3% |
der Artikel | 0.3% |
der Rat | 0.3% |
der Ort | 0.3% |
der Weg | 0.2% |
das Geld | 0.2% |
die Sicht | 0.2% |
das Unternehmen | 0.2% |
der Grund | 0.2% |
der Schluss | 0.2% |
das Bild | 0.2% |
das Thema | 0.2% |
der Abend | 0.2% |
die Wahl | 0.2% |
die Zahl | 0.2% |
das Mal | 0.2% |
der Kreis | 0.2% |
der Zug | 0.2% |
das Amt | 0.2% |
das Werk | 0.2% |
die Form | 0.2% |
der Raum | 0.2% |
das Wort (but die Antwort) | 0.2% |
***
Let us know what you think and of course also if you have any questions.
See you in the comments :)