X-Git-Url: http://unremediatedgender.space/source?p=Ultimately_Untrue_Thought.git;a=blobdiff_plain;f=content%2Fdrafts%2Fsurvey-data-on-cis-and-trans-women-among-haskell-programmers.md;h=6506081ce5736e10ade82c4014ec8754f1010448;hp=02838277b2d0d194f1bc6232ace2d04cd19903d3;hb=7a0ecdcc1d4a6fe5d093135d03025b0298434eab;hpb=d92b92d6e08f7d2c3bf4d8153e985ce755a4f520

diff --git a/content/drafts/survey-data-on-cis-and-trans-women-among-haskell-programmers.md b/content/drafts/survey-data-on-cis-and-trans-women-among-haskell-programmers.md
index 0283827..6506081 100644
--- a/content/drafts/survey-data-on-cis-and-trans-women-among-haskell-programmers.md
+++ b/content/drafts/survey-data-on-cis-and-trans-women-among-haskell-programmers.md
@@ -1,14 +1,14 @@
 Title: Survey Data on Cis and Trans Women Among Haskell Programmers
 Date: 2021-01-01
 Category: other
-Tags: Haskell, sex differences
+Tags: Haskell, sex differences, Python
 Status: draft
 
 Stereotypically, computer programming is both a predominantly male profession and the quintessential profession of non-exclusively-androphilic trans women. Stereotypically, these demographic trends are even more pronounced in "niche", academic, or hobbyist technology communities (_e.g._, Rust), rather than those with more established mainstream use (_e.g._, JavaScript).
 
 But stereotypes can be _wrong_! The heuristic processes by which people's brains form stereotypes from experience are riddled with biases that prevent our mental model of what people are like from matching what people are _actually_ like. Unless you believe [a woman is more likely to be a feminist bank teller than a bank teller (which is _mathematically impossible_)](https://en.wikipedia.org/wiki/Conjunction_fallacy), you're best off seeking _hard numbers_ about what people are like rather than relying on mere stereotypes.
 
-Fortunately, sometimes hard numbers are available! Taylor Fausak has been administering an annual State of Haskell survey [since 2017](https://taylor.fausak.me/2017/11/15/2017-state-of-haskell-survey-results/), and the [2018](https://taylor.fausak.me/2018/11/18/2018-state-of-haskell-survey-results/), [2019](https://taylor.fausak.me/2019/11/16/haskell-survey-results/), and [2020](TODO: linky) surveys include optional "What is your gender?" and "Do you identify as transgender?" questions, as well as the anonymous response data. 
+Fortunately, sometimes hard numbers are available! Taylor Fausak has been administering an annual State of Haskell survey [since 2017](https://taylor.fausak.me/2017/11/15/2017-state-of-haskell-survey-results/), and the [2018](https://taylor.fausak.me/2018/11/18/2018-state-of-haskell-survey-results/), [2019](https://taylor.fausak.me/2019/11/16/haskell-survey-results/), and [2020](https://taylor.fausak.me/2020/11/22/haskell-survey-results/) surveys include optional "What is your gender?" and "Do you identify as transgender?" questions, as well as the anonymous response data. 
 
 I wrote a script to use these answers from the CSV response data for the 2018–2020 surveys to tally the number of cis and trans women among survey respondents. (In Python. Sorry.)
 
@@ -18,12 +18,12 @@ import csv
 survey_results_filenames = [
     "2018-11-18-2018-state-of-haskell-survey-results.csv",
     "2019-11-16-state-of-haskell-survey-results.csv",
-    # TODO: 2020
+    "2020-11-22-haskell-survey-results.csv",
 ]
 
 if __name__ == "__main__":
     for results_filename in survey_results_filenames:
-        year, _ = results_filename.split('-', 1)
+        year, _ = results_filename.split("-", 1)
         with open(results_filename) as results_file:
             reader = csv.DictReader(results_file)
             total = 0
@@ -31,15 +31,26 @@ if __name__ == "__main__":
             trans_f = 0
             for row in reader:
                 total += 1
-                if row['What is your gender?'] == "Female":
-                    transwer = row['Do you identify as transgender?']
+                # The 2018 and 2019 CSV headers use the full question text,
+                # but the 2020 CSV uses sXqY-style column names
+                gender_answer = (
+                    row.get("What is your gender?") or row.get("s7q2")
+                )
+                if gender_answer == "Female":
+                    transwer = (
+                        row.get("Do you identify as transgender?") or
+                        row.get("s7q3")
+                    )
                     if transwer == "No":
                         cis_f += 1
                     elif transwer == "Yes":
                         trans_f += 1
             print(
-                "{}: total: {}, cis-♀: {}, trans-♀: {}".format(
-                    year, total, cis_f, trans_f
+                "{}: total: {}, "
+                "cis-♀: {} ({:.2f}%), trans-♀: {} ({:.2f}%)".format(
+                    year, total,
+                    cis_f, 100*cis_f/total,
+                    trans_f, 100*trans_f/total,
                 )
             )
 
@@ -48,8 +59,9 @@ if __name__ == "__main__":
 It prints this tally:
 
 ```
-2018: total: 1361, cis-♀: 26, trans-♀: 19
-2019: total: 1211, cis-♀: 16, trans-♀: 16
+2018: total: 1361, cis-♀: 26 (1.91%), trans-♀: 19 (1.40%)
+2019: total: 1211, cis-♀: 16 (1.32%), trans-♀: 16 (1.32%)
+2020: total: 1348, cis-♀: 12 (0.89%), trans-♀: 21 (1.56%)
 ```
 
-[TODO: 2020 data; I briefly thought about pooling years to get a better sample size, but that's methodologically invalid because probably a lot of the same people took the survey multiple years]
+[TODO: wrap up]
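
A possible aid for the wrap-up: the counts in the tally are small, so year-to-year differences in the percentages may be within sampling noise. The following is a minimal sketch, assuming Python 3 and using a Wilson score interval, which is a technique chosen here for illustration and not something the post itself uses. `wilson_interval` and `tallies` are hypothetical names, the counts are copied from the tally in the diff, and the calculation treats each year's respondents as a simple random sample, which a self-selected survey is not.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return an approximate 95% Wilson score interval for a proportion.

    Hypothetical helper for illustration; z=1.96 corresponds to ~95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# (year, total respondents, cis women, trans women), copied from the tally
tallies = [
    (2018, 1361, 26, 19),
    (2019, 1211, 16, 16),
    (2020, 1348, 12, 21),
]

for year, total, cis_f, trans_f in tallies:
    for label, count in (("cis", cis_f), ("trans", trans_f)):
        low, high = wilson_interval(count, total)
        print(
            "{} {}: {:.2f}% (approx. 95% interval {:.2f}% to {:.2f}%)".format(
                year, label, 100 * count / total, 100 * low, 100 * high
            )
        )
```

Each year is kept separate rather than pooled, in line with the earlier note that many of the same people probably answered the survey in multiple years.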