Gender bias and stereotypes

I am currently preparing a lecture on “computational methods in big data analysis for business” that involves scraping data from web pages. Since I learn best by doing, I decided to explore gender biases and stereotypes: I was curious how much women and men differ in their interests. I had also read this blog post from Randy Olson, which was heavily criticized by PZ Myers for neglecting to mention the underrepresentation of women in his study (read more here). This made me wonder whether the under- (or over-) representation of women could be geographically influenced: does the state you live in affect whether your gender is underrepresented on the web?

Of course there are many social media platforms and tons of sites I could have studied, but I chose Google's blogger.com, because it let me download not only the names and gender of its users but also where they live and, more importantly, what they are interested in. We have to be careful here: I can, of course, see neither where they live nor what they are interested in, and I cannot know their gender either. What I scraped is what each user wrote in his or her profile. So when I talk about gender, location, or interests here, I mean self-reported gender, self-reported location, and self-reported interests. My rather shallow scraping yielded worldwide data from 65,851 (self-reported) males and 84,993 (self-reported) females. I could have run it much longer, but I was after a trend, not exact figures.

The first thing I did was plot the gender ratio for each US state, color-coded in the following map:

[Map: self-reported gender ratio of blogger.com users per US state]

First of all, women are highly overrepresented on blogger.com: two thirds of all entries I see are self-reported “females”. The map above shows this data per state, with the colors enhanced to bring out the differences. The greener a state, the more women blog there (or rather, the more self-reported women blog in self-reported states…); the bluer, the fewer women. Note, however, that even in the bluest state the share of male bloggers does not exceed 40%. Each state has at least 500 users, and many states have about 5,000. Looking at this, I wondered: is there indeed something particular about the New York area that makes women blog more than men?
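
The per-state numbers behind a map like this boil down to a simple ratio. The following Python sketch is not the code used for the map; it assumes the scraped profiles are available as hypothetical (gender, state) pairs, applies the 500-user minimum mentioned above, and uses a min-max stretch as one possible reading of the “color enhancement”.

```python
from collections import Counter


def female_share_by_state(profiles, min_users=500):
    """Return {state: share of self-reported females} for states with
    at least `min_users` profiles.

    `profiles` is an iterable of (gender, state) pairs; the gender strings
    "female"/"male" are a hypothetical input format.
    """
    totals = Counter()
    females = Counter()
    for gender, state in profiles:
        if gender not in ("female", "male"):
            continue  # skip missing or unrecognized self-reports
        totals[state] += 1
        if gender == "female":
            females[state] += 1
    return {state: females[state] / n
            for state, n in totals.items() if n >= min_users}


def stretch(shares):
    """Min-max rescale shares to [0, 1] for coloring (assumed approach):
    0 = fewest women (bluest), 1 = most women (greenest).
    Assumes `shares` is non-empty."""
    lo, hi = min(shares.values()), max(shares.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all states are equal
    return {state: (v - lo) / span for state, v in shares.items()}
```

Stretching the scale this way makes small differences visible, but it also hides the fact that every state is female-dominated, which is why the 40% ceiling for men is worth stating explicitly.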

I took all the interests reported by men and by women and ranked each list, with the most frequently reported interest at the top. Counts were normalized per person, so someone with 50 interests contributes less to each of them than someone with only one or two. I used only the top 100 of each list and computed the rank difference between the two. A rank difference of -20 in the following plot means that the interest ranks 20 positions higher in the men's list than in the women's list; consequently, a rank difference of 50 means that the interest ranks 50 places higher in the women's list than in the men's. You will not find 100 interests in the results, because not every interest makes it into both top-100 lists, and this method says nothing about interests that appear in only one of them. Male-biased interests are on the left in blue, female-biased interests on the right in green:

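[Plot: rank difference of the top self-reported interests of men and women; male-biased interests on the left in blue, female-biased on the right in green]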
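
The ranking and rank difference described above can be sketched in Python as follows. This is an illustration under assumptions, not the original analysis code: each blogger's profile is a hypothetical list of interest strings, and the per-person normalization is assumed to split a weight of 1 evenly across that person's interests.

```python
from collections import defaultdict


def interest_ranking(profiles, top_n=100):
    """Rank interests by per-person-normalized frequency.

    `profiles` is a list of interest lists, one per blogger (assumed format).
    Each person contributes a total weight of 1, split evenly across their
    distinct interests. Returns {interest: rank} for the top_n interests,
    where rank 1 is the most popular.
    """
    weight = defaultdict(float)
    for interests in profiles:
        unique = set(interests)
        if not unique:
            continue  # profile without interests contributes nothing
        share = 1.0 / len(unique)
        for interest in unique:
            weight[interest] += share
    # Descending weight; ties broken alphabetically so results are deterministic.
    ordered = sorted(weight, key=lambda i: (-weight[i], i))[:top_n]
    return {interest: rank for rank, interest in enumerate(ordered, start=1)}


def rank_difference(men, women, top_n=100):
    """Return rank_men - rank_women for interests present in both top-n lists.

    Negative values rank higher for men, positive values higher for women.
    """
    men_rank = interest_ranking(men, top_n)
    women_rank = interest_ranking(women, top_n)
    common = men_rank.keys() & women_rank.keys()
    return {i: men_rank[i] - women_rank[i] for i in common}
```

With this sign convention, a value of -20 marks an interest that ranks 20 places higher for men and a value of 50 one that ranks 50 places higher for women, matching the examples above. Interests that reach only one top-100 list are dropped, which is why the plot shows fewer than 100 entries.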

What I find astonishing is that men are so into sport that they report being interested in both “sport” and “sports”. Such duplicates should probably be merged first, but that is for another day. Beyond that, I would say this bias follows gender stereotypes quite closely. I knew these stereotypes exist, but I did not expect them to be so obvious.
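
One simple way to merge near-duplicates such as “sport” and “sports” before ranking is sketched below. The heuristic is hypothetical and was not applied to the plots above: it lower-cases labels and folds a trailing-“s” plural into its singular only when the singular also occurs in the data, so a word like “news” is left alone unless “new” is present.

```python
def merge_plurals(vocabulary):
    """Map each raw interest label to a canonical label (hypothetical heuristic).

    Labels are stripped and lower-cased; a label ending in "s" is folded into
    its singular form only if that singular form also occurs in the data.
    """
    cleaned = {term: term.strip().lower() for term in vocabulary}
    present = set(cleaned.values())
    mapping = {}
    for original, term in cleaned.items():
        if term.endswith("s") and term[:-1] in present:
            mapping[original] = term[:-1]  # e.g. "sports" -> "sport"
        else:
            mapping[original] = term
    return mapping
```

The resulting mapping could be applied to each profile, for example `[mapping[i] for i in interests]`, before the rankings are computed. Irregular plurals and synonyms would still need separate handling.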

I then applied the same method to the data pooled for each state separately. Unfortunately, the data sometimes becomes sparse at this level and I cannot trust the statistics, so I focused on states with both a high initial bias and a high blogger count. I added the interests with the highest male and female bias to the map from above:

[Map: gender ratio per US state, annotated with the most female-biased and most male-biased interest in selected states]

In the map above, the interest to the left of the dash is the most female-biased interest and the one to the right the most male-biased interest, considering only interests reported by at least 50 bloggers.
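
The per-state selection can be sketched the same way. Again this is a hypothetical Python illustration, not the original code: it assumes each state's bloggers are given as lists of interest strings split by gender, counts support as the number of bloggers of either gender reporting an interest, and drops the top-100 cut for simplicity.

```python
from collections import defaultdict


def _ranks(people, allowed):
    """Per-person-normalized ranking restricted to the `allowed` interests."""
    weight = defaultdict(float)
    for interests in people:
        unique = set(interests)
        if not unique:
            continue
        # Normalize by all of the person's interests, but only score allowed ones.
        for i in unique & allowed:
            weight[i] += 1.0 / len(unique)
    ordered = sorted(weight, key=lambda i: (-weight[i], i))
    return {i: r for r, i in enumerate(ordered, start=1)}


def top_biased_interests(men, women, min_bloggers=50):
    """Return (most female-biased, most male-biased) interest for one state,
    or None if no interest qualifies.

    men, women: lists of interest lists for that state's bloggers (assumed).
    Only interests reported by at least `min_bloggers` people are considered.
    """
    support = defaultdict(int)
    for interests in men + women:
        for i in set(interests):
            support[i] += 1
    allowed = {i for i, n in support.items() if n >= min_bloggers}
    men_rank, women_rank = _ranks(men, allowed), _ranks(women, allowed)
    # Same sign convention as before: positive = ranks higher for women.
    diff = {i: men_rank[i] - women_rank[i]
            for i in men_rank.keys() & women_rank.keys()}
    if not diff:
        return None
    female = max(diff, key=lambda i: (diff[i], i))
    male = min(diff, key=lambda i: (diff[i], i))
    return female, male
```

The support threshold is what keeps sparse states from producing labels driven by a handful of profiles; raising it trades coverage for reliability.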

I am still pondering what this all means… enjoy thinking about it yourself!

Cheers Arend