Tuesday, February 17, 2009

AARGGH


I'm one of many community organizers of the Alexandria-Arlington Regional Gaming Group. We get together and play games.

Everyone contributes how they choose. Most of the organizers help host events. In addition to that, one thing I do is to download the member list and perform some basic analysis on it.


This chart shows three variables. The red line and the faded one use the Y-axis to the right, while the blue line uses the one to the left. The time period begins at the point when the core of the AOs had joined. The days before this graph starts were extreme outliers and are not predictive of future growth.

The taller, blue line shows total membership. We continue to have new members, but that is not the really important part of the story.

The faded line in the background is the daily membership join rate. It is in the background because it is the least important, and I want attention drawn to the other variables. As these are discrete points in time, I was hesitant to use a line graph at all. But without the lines, it looks really crowded; each of the individual points becomes very eye-catching. Instead, I faded the line and got rid of the points.

I might entirely remove this line, but wanted to include it so my audience could see how variable the daily join is. This is a good contrast with the Rolling Average, which has an obvious, visual downward trend.

If I've done things correctly, the eye should be drawn to the red line. Red catches the eye, and suggests either "stop" or "bad". The red line is the rolling average of joins per day, and I've graphed it on the same scale as the Daily Join rate. It has been going down since early January, and was going down before a NYE-era increase.
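The rolling average behind the red line can be sketched in a few lines of Python. This is a minimal sketch, not the Excel formula actually used: the window length isn't stated here, so the 3-day trailing window is an assumption, as are the function names and the made-up join dates.

```python
from collections import Counter
from datetime import date, timedelta


def daily_join_counts(join_dates, start, end):
    """Count joins per calendar day from start to end (inclusive), keeping zero days."""
    counts = Counter(join_dates)
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]


def rolling_average(values, window):
    """Trailing mean over the last `window` values (fewer at the very start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Made-up join dates for illustration only, not the real member list.
joins = [date(2009, 1, 1), date(2009, 1, 1), date(2009, 1, 2), date(2009, 1, 4)]
daily = daily_join_counts(joins, date(2009, 1, 1), date(2009, 1, 5))
smoothed = rolling_average(daily, window=3)  # window length is an assumption

print(daily)
print(smoothed)
```

Days with no joins are kept as zeros on purpose; dropping them would make the average look higher than it is.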

If the rolling average is decreasing, then we're attracting fewer new people as time goes on. The decline looks fairly steady since about January first, despite a surge just after then. While there are some odd patches after that, it is as good a time as any to use as a beginning point. It has the psychological factor of describing a discrete year.

To find out how strong this connection really is, we need to compare the day and the daily join rate using something like a Pearson's Correlation Coefficient or simple linear regression. This data is all in Excel, which lists dates as serial days since January 1, 1900. Instead of using that, I can convert the days into the number of days since January 1, 2009. I can then do a linear regression and get something more meaningful. I could just find the correlation, but a linear regression allows for extrapolation.
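As a rough illustration of those two steps (re-basing Excel's serial dates and fitting a line), here is a sketch in Python rather than Excel. The serials and join rates are made up, the function names are hypothetical, and the 1899-12-30 epoch is the usual offset for Excel's 1900 date system, which only holds for dates after February 1900.

```python
from datetime import date, timedelta
from math import sqrt

# Offset that reproduces Excel's 1900 date system for dates after Feb 1900.
EXCEL_EPOCH = date(1899, 12, 30)


def days_since(serial, origin=date(2009, 1, 1)):
    """Convert an Excel serial date into whole days since `origin`."""
    return (EXCEL_EPOCH + timedelta(days=serial) - origin).days


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up serials and join rates for illustration only.
serials = [39814, 39815, 39816, 39817]  # 39814 is January 1, 2009
rates = [3.0, 2.5, 2.0, 1.5]

days = [days_since(s) for s in serials]
slope, intercept, r = linear_fit(days, rates)
print(days, slope, intercept, r)
```

Re-basing the dates does not change the slope or the correlation. It only moves the intercept, so that it reads as the fitted join rate on January 1, 2009 instead of in 1900.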

Either way, this is where Excel's Analysis ToolPak comes in handy.

When the regression is performed, I get a Multiple R of 0.82. Squaring that gives an R^2 of about 0.67, so more or less 67% of the variation in daily join rate can be accounted for solely by a non-zero starting point and the days since January 1.

That's pretty significant. The p-values are all very near zero (i.e., a pattern this strong would show up in very few data sets where the day and the join rate were actually unconnected), so this is likely a significant relationship. The coefficient is -0.014, meaning each day about 0.014 fewer people join than the day before, or roughly one fewer person per day for every 70 days that pass. Yes, people come in discrete packets, but for a moment that can be ignored.

If we take this moderately seriously, then it is about 130 days until the daily join rate drops to zero. At that point, on June 24, we'll be at 263 members.
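The arithmetic behind that estimate can be sketched as follows. Only the slope of -0.014 comes from the regression above. The intercept is hypothetical: it is backed out as 0.014 * 174 so that the line crosses zero on June 24 (day 174 counting from January 1, 2009), and it is not the value Excel reported. The function names are also just illustrative.

```python
from datetime import date, timedelta


def zero_crossing_day(slope, intercept):
    """Day offset at which the fitted line slope * day + intercept reaches zero."""
    if slope >= 0:
        raise ValueError("a non-negative slope never falls to zero")
    return -intercept / slope


def expected_joins(slope, intercept, first_day, last_day):
    """Sum the fitted daily join rate over first_day..last_day, never below zero."""
    return sum(max(0.0, slope * d + intercept) for d in range(first_day, last_day + 1))


ORIGIN = date(2009, 1, 1)
SLOPE = -0.014     # coefficient from the regression
INTERCEPT = 2.436  # hypothetical: chosen so the line hits zero on day 174

crossing = zero_crossing_day(SLOPE, INTERCEPT)
crossing_date = ORIGIN + timedelta(days=round(crossing))
today = (date(2009, 2, 17) - ORIGIN).days  # the date of this post
days_left = round(crossing) - today

# Joins the line predicts between tomorrow and the crossing date.
future_joins = expected_joins(SLOPE, INTERCEPT, today + 1, round(crossing))

print(crossing_date, days_left)
```

Adding `future_joins` to the current head count gives the projected membership on the crossing date. Because the intercept here is an assumption, the exact number should not be read as a reproduction of the 263 figure.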

Granted, this isn't actually very likely. Each time the group meets, new people show up. These events act like an intervention on the variable of Meetup Size, which is the deterministic cause of Daily Join Rate. Supposing that nothing dramatic or terrible happens, this is likely to be a low-end estimate of the size near the end of June. This sort of extrapolation is not known to be particularly exact, especially with an R^2 so far from one.

Unfortunately, the group list that meetup.com gives its users does not include information about users who have left a group. Without that information, the only way to find out if the size is really going down is to compare snapshots of the list from different time periods.

With a few tricks in Excel and meetup's very basic member list, this sort of analysis can be generated in a few minutes. The trick is not to take it too seriously, especially extrapolation of a linear regression.
