Some Whatever Stats Geekery For You
Posted on October 13, 2010 by John Scalzi (48 comments)
I’ve had a couple questions recently about stats, as in, how many people visit daily and where they come from and so on. It’s been a while since I did a public airing of statistics, so here’s what I know, at the moment.
First, a long-winded introduction and caveat: I no longer have a really reliable idea of how many people visit the site on a day to day basis, and that’s because of two things. One, the site is currently divided into two areas which report to different stats programs: My WordPress install (which as noted earlier this week is housed in WordPress.com’s VIP area) reports in one place, and the rest of Scalzi.com reports elsewhere.
Two, even when everything was in the same place, the stats programs I would use to track unique visitors, pages served and other data would report different numbers. For example, in 2008 (before I migrated Whatever off the site), the stats program that my ISP 1&1 uses regularly showed the site receiving between 25,000 and 40,000 unique visitors daily (factoring out search engine spiders and other automated visits), while the WordPress stats package I used would show 50% to 70% of those numbers for “visits.” Some of that was due to site content not in the WordPress software being ignored, but even accounting for that there was a pretty significant discrepancy between the two stats packages. I used the 1&1 stats package numbers as my “official” numbers, as it was better integrated with my site overall, so I made the possibly-not-entirely-defensible assumption that its numbers were closer to the actual site visits and pages served.
These days the 1&1 stats package doesn’t count the people directly hitting Whatever’s WordPress install, since the URL sends them to WordPress.com’s servers. But it counts everything else, including archive pages not in WordPress, and also sub-sites, like UnicornPegasusKitten.com. Those pages still get several thousand visits a day. On Whatever proper, I have WordPress’ stats package running and also Google Analytics; the two report slightly different numbers, with Google Analytics typically (but not always) slightly lower.
So, what does it all mean? Given my knowledge of the site’s reportage pattern history and my own back-of-the-envelope number-crunching, I can say generally and with reasonable confidence that the site hasn’t lost readers at any point and by all indications continues to gain readers as it goes along. How many readers that is, is the interesting question. The low end, which works off the Google Analytics numbers for the WordPress install and assumes the 1&1 stats package overreports substantially, is about 15,000 unique visitors daily. The high end, which assumes the WordPress stats package underreports and the 1&1 stats package is dead on, is about 50,000 unique visitors daily. The actual truth is undoubtedly somewhere in between.
What I’m comfortable saying to people is that the site gets up to 45,000 visitors daily, which to me implies that it generally gets below that but that the site shows spiky behavior, which in fact it does. Indeed, a number of days spike substantially above 45k in terms of visitorship (as seen through the WordPress stats suite), usually when I’m pressing some button about politics or publishing or what have you.
If I were trying to sell advertising on the site, I wouldn’t guarantee the 45k number; I’d pick a number well below what Google Analytics reports, because that’s the stats package I’d assume advertisers would want reporting from, because the advertising would probably be placed only on WordPress pages anyway, and because I believe in an overabundance of caution when guaranteeing eyeballs. So: say, 10,000 visitors daily, which I know is far less than the site gets; that way I wouldn’t have angry advertisers.
(Not that I plan on selling advertising here anytime soon; I’m just rattling on.)
(Update, 6:10pm: in the comments, someone asked me if RSS readers are included in the stats numbers above. The answer: No. The WordPress stats package notes syndicated readers in its entry breakdowns but doesn’t add them to the general overall stats, and Google Analytics doesn’t track them at all, as far as I can see. My 1&1 stats don’t include current RSS feed readers either. This is another complicating factor in pinning down the total readership of the site, to be sure.)
One day, when I have the time/money/an actual reason to do so, I will actually hire someone to consolidate all the content on Scalzi.com into one install from which it will be easy to get more accurate reports about visitors. For the moment, however, I just have to live with the fact that while I know lots of people come to visit, the actual number is a mystery.
That said, for the purposes of what follows, I’m using data from Google Analytics. It captures only a subset of the people who visit the site, but it captures their data in some detail, and it’s not unreasonable to assume that generally speaking, the larger audience for the site follows the trends in the data reported here.
1. More than 90% of Whatever readership comes from four countries: the United States (more than 76% of the overall total readership), Canada, the UK and Australia. The largest readership from a non-English-speaking country comes from Germany, home to a little over 1% of site readers.
2. In the US, the state with the largest readership is California, with over 15% of the US total, followed by New York, Texas, Washington and Massachusetts. Ohio, where I live, is #6, with 4.6% of the US total visits. Top US city visiting Whatever: New York City (2.77% of the US total), followed by Seattle, San Francisco, Chicago, and Portland. Top Ohio city: Columbus, at #13.
3. 70% of you have Windows machines, while 24.5% of you are Mac heads, and 4.16% of you are Linux nerds. Most of the relatively small number of other devices accessing the site are iPads and iPods.
4. Whatever draws a Firefox crowd, as 47% of you use that browser, followed by Internet Explorer at 21% (hi, mom and everyone at a corporate workplace), then Chrome, then Safari (those two almost tied at 14.4%) and then Opera. One person accessed Whatever with a Nook browser, which I think shows real commitment.
5. The very large majority of you are visiting with computers whose monitors are set at higher than 1024×768. This generally implies newer computers or at least newer monitors.
Add all of that up, and what sort of educated guesses can we make about the Whatever readership?
Well, I’m guessing that in general the Whatever readership is urban/suburban, educated, tech-savvy edging into tech-nerdy, probably mostly white, probably mostly moderate-to-liberal, probably generally 45 and under, and generally reasonably well off (or in the sort of social strata where being reasonably well off is not uncommon). I’d also guess, of course, that a large chunk of you read more than average, and read at least some science fiction and fantasy.
In short, overall, you’re not terribly unlike me. Bear in mind that I’m not saying you are all the things above (particularly regarding politics, as there is a vocal conservative/libertarian subset here), but on balance I’d guess you’re more of those things above than not. I’m not sure this should be terribly surprising to anyone.
Holy cow, Lynx works too.
The thing is, that profile is exactly what you would predict a priori given your topics and comment threads…
I guess this just shows that sometimes assumptions ARE correct…
“4.16% of you are Linux nerds”. Geeks, John. We’re geeks, not nerds. Big difference.
I’m sorry, you just made me think of an xkcd strip: http://xkcd.com/747/
I swear that site is like crack.
Well, no, just that my assumptions match yours, based on a different set of data.
“I’m not saying you are all the things above (particularly regarding politics, as there is a vocal conservative/libertarian subset here), but on balance I’d guess you’re more of those things above than not. I’m not sure this should be terribly surprising to anyone. ”
Kind of. These days I’m interested in finding out whether statistics will actually back up my limited impression of public opinion.
You forgot how attractive we all are. A writer of your stature wouldn’t have unattractive fans.
Sorry, Google Analytics doesn’t track hawtness.
While I agree “mostly moderate-to-liberal” is probably correct, strictly speaking, I would guess libertarians make up a larger percentage of your traffic than of the population as a whole. I think this because: (i) you do seem to have a number of dedicated libertarian commenters, (ii) you are a sci-fi writer and have been compared to RAH, which naturally draws a disproportionate share of libertarians, and (iii) your brilliant “Why I Hate Your Politics” essay from years back gave libertarians a WHOLE PARAGRAPH of criticism, as if we were a valid political segment just as worthy of attention as the other two major political groupings.
“…probably generally 45 and under…”
I love being an outlier.
Also, I read Whatever at work on Firefox and at home on Internet Explorer. So as my cat would put it (if he was a lolcat), “Iz in ur data, futzin ur satistics.”
And to whoever reads Whatever on the nook, get help.
Late 30s, White, Male, Computer $60K a year. Conservative to Moderate in Politics.
Nope, don’t think you were describing me at all.
Part of my post was eaten. I guess you can’t use the greater-than and less-than symbols.
John – just curious.
Do people who read you via syndicated feed (like me via my Livejournal friends page) who don’t necessarily click through to the actual site every day (though I obviously just did to leave this comment!) get counted in the stats pool?
I read about 50% of your posts in Google Reader and never jump over to the site. I only jump if I’m interested in how people are reacting.
So does your data include the feeds? And if so how would you account for the double dip I would make by reading it on the reader and then hopping over for the comments?
The WordPress stats package notes them in breakout data but not in overall site visits. That is indeed another complicating factor, since RSS readership can be equal to or sometimes greater than site visitorship.
Come on, the nook thing was either when I first got it, or when I was traveling. Which incidentally coincided.
And ditto @14 Sara.
@CrypticMirror 4: ROTFLMAO!
I don’t really care if I’m called a geek or a nerd since I know they both mean Awesome.
This is actually all quite amazing. I guess I’m surprised at how different the numbers are depending on where you grab your data from; regardless, it seems as though you can get a good feel for what is happening on the site. It surely makes me want to take a closer look at who is really looking at my business site rather than being satisfied with knowing how many hits it gets.
You should be able to use greater-than/less-than signs via HTML escape codes: &gt; for > and &lt; for <.
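For anyone wondering how those escape codes work in practice, here is a minimal Python sketch using the standard library's `html.escape`, which turns `&`, `<` and `>` into entities so a comment form displays them as literal symbols instead of treating them as the start of an HTML tag. The helper name `escape_for_comment` is hypothetical, and whether this particular comment system strips or escapes raw angle brackets is an assumption; the code only shows the escaping itself.

```python
import html


def escape_for_comment(text: str) -> str:
    """Hypothetical helper: replace &, <, > with HTML entities so they display literally.

    quote=False leaves quotation marks alone, since they are harmless in comment text.
    """
    return html.escape(text, quote=False)


raw = "3 < 5 and 7 > 2"
escaped = escape_for_comment(raw)
print(escaped)  # prints: 3 &lt; 5 and 7 &gt; 2

# A browser renders the entities back as the original symbols.
print(html.unescape(escaped) == raw)  # prints: True
```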
So… yeah. Nerdy.
This is all fine and good, but does Google Analytics tell you that I’m currently reading this in my pajamas? Hmm?
Vocal Libertarian shout out !
Oh dear…so I match all of those demographic categories, right down to using Firefox on Windows with my wide-screen monitor. The only thing I’m not is a resident of California, not for 10 years now. To be true to my contrarian nature, I’d have to stop reading the site now…or can we agree that we’re all unique snowflakes together?
Wait! You have stats on our screen resolutions? I’m smashing my internet connection after I finish this post.
My little, month-old blog has received only 200 visitors to date. Almost half are from when I linked from the comments of this site and the rest are me looking at the finished posts. I guess a twelve year head start really helps. Writing about something other than math helps too.
@Eternal Density 18: Congrats! You just made #88 on my random quote board.
Whatever.com is the one and only site I used to play with the browser on my Kindle. For what that’s worth.
Yep. Down to the Firefox at home and Explorer at work, and residing in one of the cities you mentioned.
All that said, there does seem to be quite a bit of diversity represented in the comment threads. Although I suppose that may be my white privilege talking.
P.S. “sire readers” under your first point should read “site readers”
John Scalzi called me a Linux Nerd!
It’s good to be noticed, us Linux folks.
OMG, you know who I am and where I live. That’s kind of strange when you realize I meet all those. You know about our monitors? Really? Cool. I need a new one. I’ve never been called tech-nerdy before. My kids probably call me worthless on the computer behind my back.
Now I feel I must click through when I access through my RSS feed just so I can be counted in your stats.
For RSS tracking, you can do a simple two-step process (or even one step if you want, but then it won’t count people who read the RSS feed).
Google owns a company called FeedBurner, which you can send your RSS feed through to track readers, and WordPress has a plugin for FeedBurner which automatically sends all links to your feed over to FeedBurner.
Step one would be to set up FeedBurner.
Step two would be to announce that you’re switching, so all y’all’s RSS readers switch your feed link.
Be interesting to see a survey to compare actual demographics to your (well-founded) guess.
IE on NT at work, Firefox on Linux at home, Android on the road.
Urban yes, educated (BS, MS) yes, geek yes, melanin-impaired yes, moderate-to-(unwilling to admit ANY conservative tendencies thanks to teabaggers), barely under 45, well off and smart enough not to pull a Henderson (and to realize taxes *need* to match expenditures), formerly voracious SF/F reader.
Pretty close, at least until my next birthday …
Unless Firefox/NoScript users are whitelisting your sites, they might not be showing up at all.
Looks like you’re right demographically: http://www.quantcast.com/whatever.scalzi.com
Not bad for Internet guessery.
On a side note, I love how you’ve made the background image of the discussions only visible with your posts. Evil. Brilliant. But Evil.
Ah, but you ignore the Monks of the Net of the Nameless, who shun the Analytic Menace and navigate their CowlMax browsers here thru paths unseen.
To say nothing of the ten thousand strong but rarely comment-ative psychic kitten crowd. (“Mia-om. In the crystal of my mind’s eye I see… what, another political post from Scalzi? The cur! Paws and whiskers, what’s wrong with adult cats?”)
This, however, is offset by the few thousand daily Google image search hits of “feline meatproduct fetish”, and similar.
Quantcast appears to woefully undercount the number of visitors I receive, however. Which I suppose goes further to my point.
One person accessed Whatever with a Nook browser, which I think shows real commitment.
Are you sure that’s not you? IIRC, you have a Nook.
If you run your stats again, you’ll now see at least 1 hit from a Kindle browser. I probably won’t be doing that very often — it’s something you *can* do, not necessarily something you’d want to do — but I did feel the need to help round out your site stats.
I guess it all averages out… since some of the distinct visits are actually the same person (at least in one case: I use a PC with IE8 at work (no choice) and a Mac with Safari or Firefox at home).
It is interesting to see the stats though. I am not the baseline demographic (being female and over 40) but I find the writing here interesting and diverse, and the commentary often just as fascinating…
Reading you on my Droid Incredible. Mobile version of Whatever has nice layout, easy to read. WordPress doesn’t seem to play nice with the Swiftkey keyboard though there is a WordPress app for Android.
Data point: 55-year-old grandmother of five, social liberal, financial conservative. There is no party that represents me, I’m a polyamorous conservative. So I usually go with “independent”.
5.2K for Global! It would be nice to know which countries you have more readers from. I know (or, frankly, estimate) that this backwater country (Venezuela) has no more than one regular reader (me). But it’s nice to have such tools available. Scalzi could start providing custom content for readers depending on their geographic location.
Perhaps even in other languages? :-)
Please count me as one of the non-vocal conservatives.
I’m curious what you would see if you normalized your state stats. Would California still hit the top? Does tiny Rhode Island actually have 25% readership?
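One way to answer that question, as a minimal Python sketch: divide each state's visit count by its population to get visits per 100,000 residents, then rank on that rate instead of on raw counts. The state labels and every number below are made-up placeholders for illustration only, not Whatever's analytics and not real census figures; real visit totals and populations would have to be plugged in to get a meaningful ranking.

```python
def per_capita_rate(visits: int, population: int, per: int = 100_000) -> float:
    """Visits per `per` residents (default: per 100,000 people)."""
    return visits * per / population


# HYPOTHETICAL placeholder data: (visits, population). Not real figures.
states = {
    "Big State": (12_000, 40_000_000),
    "Small State": (500, 1_000_000),
    "Mid State": (3_000, 12_000_000),
}

# Rank by per-capita rate rather than by raw visit count.
ranked = sorted(
    states.items(),
    key=lambda item: per_capita_rate(*item[1]),
    reverse=True,
)

for name, (visits, population) in ranked:
    print(f"{name}: {per_capita_rate(visits, population):.1f} visits per 100k residents")
```

With these placeholder numbers, the state with the most raw visits ("Big State") drops to second once population is factored in, which is exactly the effect the question is probing.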
Just the map geek in me coming out.
Quantcast, Google Analytics, etc.: The Firefox/NoScript thing is a big reason to trust WordPress’ numbers more than theirs: any scripting involved in the WordPress counts is enabled when we allow scalzi.com and wordpress.com. Apart from actual ad sites, quantserve and google-analytics are frequently blocked by NoScript users (e.g., myself). I’ll sometimes temporarily allow Google Analytics, but Quantcast is blacklisted in my browser, mostly to limit the number of sites I am exposed to at any time, more than for privacy concerns.
However, if you ever need to sell advertising, your rates will probably be determined by the Quantcast or Google numbers. I’d say your best bet is probably to start charging a flat rate for “The Big Idea” pieces, rather than getting the ad agencies involved.
I use the Camino browser on my Mac. A Firefox variant expressly for Macs, simple, love the interface look. Fast. And free!
(And yes, I have FF and Safari, I just prefer Camino.)
Lol, and what is the percentage of us Greek geeks, then? (I know we are at least three, hehehe… :-P)
Let me drop some geek at you. There are more factors that can cause the sort of difference you’re seeing between the two packages:
– Some packages count every single file request to the server and don’t make a distinction between a supporting file (like a script) and an actual “page”. So they’ll conflate the numbers. For example, a user will see one “page”, but the two supporting .js files, the CSS file used to render it, and the three images on it will be added to the number. Some can filter out certain kinds of files but get caught on others. You need to find out the specifics of how your stats packages define a page. What you’re seeing (your ISP’s numbers being high and your Analytics numbers being low) bears that out. Google’s pretty good at dealing with the chaff, whereas your ISP’s package may not be so hot at it.
– “Unique users” is another rabbit hole. They are often tracked by hits from a specific IP address over time. However, if someone is using a gateway, every machine on their network will report the same IP address to your stats. This is very common in work/academic environments. Alternatively, a package may combine the IP address with other metrics, or use very short time windows, so that the same person gets counted as a “unique user” multiple times. Again, it’s a case of needing to find out how both packages define that.
– Obfuscation is the order of the day on the Internet, security and convenience-wise. Things like NoScript, Adblock, or Greasemonkey block or change how users see your site and that will affect what gets “hit”, depending on how the stats package defines a page. Also, blocking/not allowing cookies, or anti-malware/anti-virus software that obscures the user’s IP address, and anonymous browsing can throw a monkey-wrench into things, too.
tl;dr – This stuff is confusing, contradictory, and a pain in the backside. Anyone buying ads knows this. All you really have to do is find out how your tracking systems count things, and make those definitions part of your ad selling standards so people know what they’re buying.
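To make the first two points concrete, here is a minimal Python sketch that takes a few hypothetical access-log entries (client IP, requested path) and counts them three ways: raw hits, page views (requests that aren't supporting files such as scripts, stylesheets and images), and unique IP addresses. The log entries, the set of "asset" extensions, and the rule that any non-asset request is a page are all illustrative assumptions, not how 1&1's package, WordPress stats or Google Analytics actually define these terms. The IPs come from the ranges reserved for documentation.

```python
from posixpath import splitext

# Extensions treated as supporting files rather than pages (an assumption;
# real stats packages each draw this line differently).
ASSET_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".gif", ".ico"}

# HYPOTHETICAL log entries: (client IP, requested path).
# 203.0.113.7 stands in for an office gateway shared by two different readers.
log = [
    ("198.51.100.2", "/2010/10/13/stats-post/"),
    ("198.51.100.2", "/wp-content/theme/style.css"),
    ("198.51.100.2", "/wp-content/theme/script.js"),
    ("198.51.100.2", "/images/header.png"),
    ("203.0.113.7", "/2010/10/13/stats-post/"),   # reader A behind the gateway
    ("203.0.113.7", "/wp-content/theme/style.css"),
    ("203.0.113.7", "/2010/10/12/another-post/"),  # reader B, same public IP
]


def is_page(path: str) -> bool:
    """Treat a request as a page view unless it ends in an asset extension."""
    return splitext(path)[1].lower() not in ASSET_EXTENSIONS


hits = len(log)                                     # every request counts
pages = sum(1 for _, path in log if is_page(path))  # only non-asset requests
unique_ips = len({ip for ip, _ in log})             # one "visitor" per IP

print(f"hits={hits} pages={pages} unique_ips={unique_ips}")
# prints: hits=7 pages=3 unique_ips=2
```

Even on this tiny log, counting raw hits reports more than twice the page views, while counting by IP reports two visitors where three people actually read the site.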
Yup. As noted, if I were selling ads I would probably go by Google Analytics numbers, precisely because they are a widely understood metric.
I used the nook browser to follow Whatever for a long time, but switched to Chrome when I got the nook HD+ around Xmas. The one person using the nook browser may well be my wife!