Scalability and Asserts I’m Not Yet Fixing
I’ve been a bit quiet because I’ve been working on (hopefully) completely invisible stuff involving backend server scalability. What does that mean, you ask? In practical player-facing terms, it means I’m trying to get the lobby and registration system robust enough to first invite the rest of the waiting list in,[1] and then to open the beta up completely, allowing people to hear about the game, visit the site, pay their money, and get into the beta instantly. Between 10 and 500 people sign up a day, depending on how much press the game is getting at the time, and I’m very curious to find out how many of those will join the beta if it’s available immediately and they aren’t told they have to wait.
I’m currently working on the text-mode robot client. I’ve got it logging into the lobby and pretending it’s a full SpyParty client, and it can chat and invite people to games. I posted on Facebook and Twitter asking for suggestions for what the robots should say to each other as they’re hanging out in the lobby.
Next up, I’m going to get the robots playing fake matches and games, reporting the game results to the lobby, and logging in and out a bunch. Then I’m going to hack up the Bees with Machine Guns load-testing app to launch a bunch of EC2 micro instances with multiple robots running on each, and aim them all at my test server. I expect chaos.
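To make the robot idea a bit more concrete, here’s a toy sketch of a robot’s behavior loop. It is not the actual SpyParty robot client. `FakeLobby`, `Robot`, and `RunRobots` are hypothetical names, and the fake lobby only counts actions where a real robot would send packets to the test server. On each tick, a robot does exactly one thing: log in, chat, invite someone, report a fake match result, or log out. A per-robot seed keeps a run reproducible, so a crash can be replayed.

```cpp
// Sketch only: FakeLobby, Robot and RunRobots are hypothetical names.
#include <cassert>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the lobby server: it only counts what robots do.
// A real robot would send the matching packets to the test server instead.
struct FakeLobby {
    int logins = 0, logouts = 0, chats = 0, invites = 0, results = 0;
    int Total() const { return logins + logouts + chats + invites + results; }
};

// One text-mode robot. Each tick it does exactly one thing a bored player
// might do in the lobby.
class Robot {
public:
    Robot(std::string name, unsigned seed) : name_(std::move(name)), rng_(seed) {}

    void Tick(FakeLobby& lobby) {
        if (!logged_in_) {  // a logged-out robot always logs back in first
            ++lobby.logins;
            logged_in_ = true;
            return;
        }
        switch (std::uniform_int_distribution<int>(0, 3)(rng_)) {
            case 0: ++lobby.chats; break;    // say something silly
            case 1: ++lobby.invites; break;  // invite someone to a game
            case 2: ++lobby.results; break;  // "play" a fake match, report it
            default:                         // log out; log in again next tick
                ++lobby.logouts;
                logged_in_ = false;
                break;
        }
    }

    const std::string& Name() const { return name_; }
    bool LoggedIn() const { return logged_in_; }

private:
    std::string name_;
    std::mt19937 rng_;  // per-robot seed keeps a run reproducible
    bool logged_in_ = false;
};

// Runs `count` robots for `ticks` ticks against one FakeLobby.
FakeLobby RunRobots(int count, int ticks, unsigned seed) {
    FakeLobby lobby;
    std::vector<Robot> robots;
    for (int i = 0; i < count; ++i)
        robots.emplace_back("robot" + std::to_string(i), seed + static_cast<unsigned>(i));
    for (int t = 0; t < ticks; ++t)
        for (Robot& r : robots) r.Tick(lobby);

    int online = 0;
    for (const Robot& r : robots) online += r.LoggedIn() ? 1 : 0;
    // Sanity check: logins minus logouts must equal the robots still online.
    assert(lobby.logins - lobby.logouts == online);
    return lobby;
}
```

Because every tick produces exactly one action, the counters give a cheap check that no robot activity was lost.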
Once I’ve got the obvious stuff fixed and have figured out how many clients my system can host at a given machine size, I’ll do some tests with beefier machines to make sure it scales linearly. There are four basic machine resources: CPU, memory, network, and disk. I assume I’ll run out of memory first, but I don’t know for sure. I was originally going to try to get the entire backend infrastructure running in the cloud, but for the near term I think I’m just going to make sure I get a machine that can accept way more clients signing up and playing than I expect when I open up the beta, and hope it holds up. If it dies due to too much traffic, well, I guess that’s a good problem to have, at least! I did the same kind of thing when I started taking beta signups: I stress tested and optimized that system well enough that it didn’t even break a sweat, so hopefully the beta launch itself will go as smoothly. If not, then if Blizzard can do it, so can I! Yikes.
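Finding which resource runs out first comes down to a minimum over ratios. For each resource, divide what the machine has by what one client uses, and the smallest quotient is the client limit. Here’s a sketch of that arithmetic. `ClientsPerMachine`, `Estimate`, and every number in it are made up for illustration. In practice, the per-client costs would have to come from measurements, such as the robot tests.

```cpp
// Sketch only: ClientsPerMachine and Estimate are hypothetical names, and the
// example numbers below are invented, not measurements.
#include <array>
#include <cassert>

// The four basic machine resources.
enum Resource { kCpu, kMemory, kNetwork, kDisk, kNumResources };

struct Estimate {
    long clients;         // clients that fit before a resource runs out (-1 = unbounded)
    Resource bottleneck;  // the resource that runs out first
};

// capacity[r]: how much of resource r the machine has.
// per_client[r]: how much one connected client uses, in the same unit.
// Assumes usage grows linearly with the number of clients.
Estimate ClientsPerMachine(const std::array<double, kNumResources>& capacity,
                           const std::array<double, kNumResources>& per_client) {
    Estimate best{-1, kCpu};
    for (int r = 0; r < kNumResources; ++r) {
        assert(capacity[r] >= 0.0 && per_client[r] >= 0.0);
        if (per_client[r] == 0.0) continue;  // clients don't touch this resource
        long fit = static_cast<long>(capacity[r] / per_client[r]);
        if (best.clients < 0 || fit < best.clients)
            best = {fit, static_cast<Resource>(r)};
    }
    return best;
}

// Made-up example: capacity {400, 4096, 1000, 100} with per-client cost
// {0.5, 8, 0.05, 0.01} gives 512 clients, limited by kMemory.
```

If per-client costs stay constant as the machine grows, doubling every capacity doubles the estimate. That is the linear-scaling assumption the tests on beefier machines are meant to check.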
Then, once I’ve got the backend scaling for the robots, I’ll start inviting much larger groups into the beta to load test with live players. There are some features I’ll need to add to make this work, like chat rooms and colored chat text, and it’s going to be pretty raw to start with, but it should work okay. The beta testers have been very forgiving of my incredibly primitive lobby so far, so hopefully that attitude will continue!
After everybody’s in, I’ll shut off the signups and let it stew for a few days, and if it’s working, I’ll open it up! I hesitate to give time estimates on this stuff, because I’ve never hit a date, but most of this will be happening over the next few weeks.
Asserts!
Programming is complicated, and handling errors makes it even more complicated. Often it’s good programming practice not to handle certain types of errors gracefully at all, but simply to assert that you aren’t in a state you shouldn’t be in; if the assertion fails, the program exits with an error.
assert(1 + 1 == 2); // integer arithmetic better work or we're hosed
Of course, during development the impossible is not only possible but likely, so your asserts fire. And often in complicated code you’ll assert things that should be true but aren’t catastrophic if they’re not, so you usually pop up a dialog that gives you the choice to break into the debugger, abort the program, or ignore the assert (just this once, or forever). The problem is that once you let yourself have that ignore option, you can get lazy and start using asserts as pop-up reminders of things to fix. This tends to be really bad on a large team: the game asserts every two seconds, you hit ignore on other people’s asserts all the time, and it becomes a habit. On an indie-sized team, however, which in my case means exactly one programmer, you can use asserts this way and they can still be useful. I can get a feel for how often certain kinds of things go wrong while I’m testing, and most of the time I can remember what I was doing when a specific assert fired; it goes from a purely quantitative tool to a qualitative one.
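Here’s a minimal sketch of that kind of development assert, assuming a pluggable handler function stands in for the dialog. `DEV_ASSERT`, `AssertChoice`, and `g_assert_handler` are hypothetical names, not SpyParty’s actual code. The "break into the debugger" choice just aborts here, standing in for a platform-specific breakpoint such as MSVC’s `__debugbreak()`. The trick behind "ignore forever" is a `static` flag inside the macro, so each call site gets its own flag.

```cpp
// Sketch only: DEV_ASSERT, AssertChoice and g_assert_handler are hypothetical.
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Choices the (hypothetical) assert dialog offers.
enum class AssertChoice { Debug, Abort, IgnoreOnce, IgnoreForever };

// Stand-in for the dialog: a real game would pop up a window and wait.
using AssertHandler = AssertChoice (*)(const char* expr, const char* file, int line);

static AssertChoice PrintAndAbort(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "assert failed: %s (%s:%d)\n", expr, file, line);
    return AssertChoice::Abort;
}

static AssertHandler g_assert_handler = PrintAndAbort;

// Each expansion of the macro gets its own static flag, so "ignore forever"
// silences only that one call site.
#define DEV_ASSERT(expr)                                                      \
    do {                                                                      \
        static bool ignore_forever_ = false;                                  \
        if (!ignore_forever_ && !(expr)) {                                    \
            switch (g_assert_handler(#expr, __FILE__, __LINE__)) {            \
                case AssertChoice::Debug: /* would break; here just aborts */ \
                case AssertChoice::Abort:                                     \
                    std::abort();                                             \
                case AssertChoice::IgnoreOnce:                                \
                    break;                                                    \
                case AssertChoice::IgnoreForever:                             \
                    ignore_forever_ = true;                                   \
                    break;                                                    \
            }                                                                 \
        }                                                                     \
    } while (0)

// Usage: DEV_ASSERT(1 + 1 == 2);  // integer arithmetic better work or we're hosed
```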
I can also leave the asserts on in beta builds, set up so that each one fires silently once and then auto-ignores forever, and have the failures shipped off over the network to the server, so I can see what kinds of things are going wrong on player machines and how frequently they go wrong.
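The beta-build variant is even simpler: there’s no dialog. The first failure at each call site is queued for the server, and then that site goes quiet. Again, this is a sketch under assumptions. `BETA_ASSERT`, `AssertReport`, and `QueueAssertReport` are made-up names, and a plain `std::vector` stands in for whatever actually ships the reports over the network.

```cpp
// Sketch only: BETA_ASSERT, AssertReport and QueueAssertReport are
// hypothetical; the vector stands in for a real network queue.
#include <cassert>
#include <string>
#include <vector>

// One assert failure as it might be sent to the server.
struct AssertReport {
    std::string expr;
    std::string file;
    int line;
};

// Stand-in for the outgoing network queue; a real build would batch these up
// and send them to the server in the background.
static std::vector<AssertReport> g_pending_reports;

static void QueueAssertReport(const char* expr, const char* file, int line) {
    g_pending_reports.push_back({expr, file, line});
}

// Beta-build assert: never shows any UI. The first failure at each call site
// is queued for the server, then that site is ignored for the rest of the run.
// Not thread-safe as written.
#define BETA_ASSERT(expr)                                    \
    do {                                                     \
        static bool fired_ = false; /* one per call site */  \
        if (!fired_ && !(expr)) {                            \
            fired_ = true;                                   \
            QueueAssertReport(#expr, __FILE__, __LINE__);    \
        }                                                    \
    } while (0)
```

Here "forever" only lasts for the rest of the process. Remembering ignored sites across runs would need extra bookkeeping. Asserts that can fire on several threads would also need an atomic flag and a locked queue.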
I tend to be pretty liberal with asserts in my code, so they fire a lot, and in turn the server logs a lot of them: about 30,000 so far in the beta. Here they are, sorted by frequency:
Count | Assertion | File
4918 | internalMcostTest(sx, sy) | object_system\subsystems\pathing\pathing.cpp
2237 | holding | object_system\subsystems\animation\animation.cpp
2237 | ad | object_system\subsystems\animation\animation.cpp
2145 | am && (right || left) | object_system\subsystems\animation\animation.cpp
1780 | am->playing.blend_out == am->queue.front().blend_in | object_system\subsystems\animation\animation.cpp
1485 | am && (boneId != -1) && ad | object_system\subsystems\animation\animation.cpp
1415 | cd->object_of_interest | character.cpp
1350 | d->NearBookcaseID | situations\bookcase\bookcase.cpp
1114 | (err = glGetError()) == GL_NO_ERROR, "code: 0x00000501" | spyparty.cpp
1105 | cd->object_of_interest && (cd->object_of_interest == BriefcaseID) | situations\briefcase\briefcase.cpp
994 | StatueAD && (StatueAD->Object->Type == object_types::STATUE) && (object_system::GetObject(StatueAD->ParentID)->Type == object_types::PEDESTAL) | situations\pedestal\pedestal.cpp
952 | am && (am->playing.type == &core_talks) | situations\conversation\conversation.cpp
813 | BriefcaseAD && HoldingBriefcase && d->PlayingCycle | situations\briefcase\briefcase.cpp
801 | !d->PlayingCycle | situations\drinks\drinks.cpp
628 | verify( pathing::pathGetCharacterValue(x, y) == BriefcasePathValue ) | situations\briefcase\briefcase.cpp
539 | verify( animation::HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && OnRight ) | situations\briefcase\briefcase.cpp
478 | cd->object_of_interest == BriefcaseID | situations\briefcase\briefcase.cpp
469 | !PoppedYesNoQuestioner | spyparty.cpp
413 | d->GoalPedestalID == d->NearPedestalID | situations\pedestal\pedestal.cpp
278 | HoldingBriefcase && BriefcaseAD | situations\briefcase\briefcase.cpp
252 | verify( HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && OnRight ) | situations\pedestal\pedestal.cpp
236 | IK.Bone && IK.Target && IK.MeshHardpoint | object_system\subsystems\animation\animation.cpp
229 | am->queue.empty() | object_system\subsystems\animation\animation.cpp
216 | x == p->ex && y == p->ey | object_system\subsystems\pathing\pathing.cpp
193 | 0, "unknown packet type: 9" | spy_server.cpp
188 | !"stuck!" | character.cpp
172 | cd->disposable_left | situations\pedestal\pedestal.cpp
160 | verify( pathing::pathGetCharacterValue(x, y) == pathing::PATH_VALUE_INFINITE ) | situations\briefcase\briefcase.cpp
151 | !cd->holding_right || (cd->holding_right == d->BookID) | situations\bookcase\bookcase.cpp
139 | (cd->holding_left == d->BookID) && (!cd->holding_right || (cd->holding_right == d->BookID)) | situations\bookcase\bookcase.cpp
110 | e.MatchTimestamp > RoundTimeline[LastMarkSuspectIdx].MatchTimestamp | round_events.cpp
97 | verify( GetConnectionNames(Us, sizeof(Us), Them, sizeof(Them)) ) | spyparty.cpp
78 | 0, "unknown packet type: 15" | spy_server.cpp
77 | 0, "unknown packet type: 9" | sniper_client.cpp
74 | Distance2(cd->Object->Position, ps_sci->Object->Position) <= MaxHandoffDistance2 | situations\briefcase\briefcase.cpp
70 | !gd->ForceGoToPedestalID | situations\steal_statue\steal_statue.cpp
66 | am->playing.type == &core_briefcase_pickups | situations\briefcase\briefcase.cpp
59 | !"spy stuck!" | spy_server.cpp
59 | 0, "unknown packet type: 9" | spyparty.cpp
55 | pathTestOpen(sx, sy) | object_system\subsystems\pathing\pathing.cpp
50 | 0, "unknown packet type: 7" | spy_server.cpp
48 | d->BookBookcaseID && cd->IsGoalOwner(this) | situations\bookcase\bookcase.cpp
43 | !"what do to here?" | examples\lobby\lobbyclient.cpp
40 | verify( animation::GetPlayingAnimationInfo(am, &time, &duration) ) | situations\conversation\conversation.cpp
40 | HoldingBriefcase | situations\briefcase\briefcase.cpp
40 | am->playing.animid == -1 | situations\pedestal\pedestal.cpp
35 | n | spyparty.cpp
32 | e.MatchTimestamp > RoundTimeline[LastMarkBookIdx].MatchTimestamp | round_events.cpp
26 | verify( animation::HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && !OnRight ) | situations\bookcase\bookcase.cpp
26 | object_system::GetObject(cd->holding_left) && (object_system::GetObject(cd->holding_left)->Type == FNV1("BOOK")) | situations\bookcase\bookcase.cpp
26 | it != am->AnimHandleMap.end() | network.cpp
26 | IsTypingString | spyparty.cpp
26 | 0, "unknown packet type: 8" | spyparty.cpp
25 | DefaultCharacterStatePacket && (ndata == CharacterStatePacketSizeBytes) | sniper_client.cpp
25 | cd->object_of_interest == d->BookID | situations\bookcase\bookcase.cpp
24 | !p2pauth_con && (p2pauthn_state == WAITING_AUTHN) | examples\lobby\lobbyclient.cpp
24 | !IsSpy && d->TargetBookcaseID | situations\bookcase\bookcase.cpp
24 | HoldingDrink | situations\drinks\drinks.cpp
23 | obj->Rotation.IsIdentity() | character.cpp
23 | HoldingBook | situations\bookcase\bookcase.cpp
23 | !err, "krb5_rd_priv err: -1765328342" | examples\lobby\lobbyclient.cpp
22 | (w >= 0) && (w <= 1) | object_system\subsystems\animation\animation_cal3dutils.cpp
22 | d_cust && (d_cust->State == drinks_data::INVALID) | situations\serving\serving.cpp
21 | !"somehow in a valid state but !HoldingDrink?!" | situations\drinks\drinks.cpp
20 | propid && ( (Spy->holding_right == propid) || (Spy->holding_left == propid)) | situations\steal_statue\steal_statue.cpp
20 | !HoldingDrink | situations\drinks\drinks.cpp
18 | !"somehow didn't get statue" | situations\pedestal\pedestal.cpp
18 | ad && (ad->Object->Type == object_types::STATUE) | situations\pedestal\pedestal.cpp
16 | o && !(o->Flags & object::UNMANAGED) | object_system\object_manager.cpp
16 | am->queue.empty() | situations\drinks\drinks.cpp
13 | !"something went wrong picking up briefcase!" | situations\briefcase\briefcase.cpp
12 | !"should have detached" | situations\bookcase\bookcase.cpp
11 | !"should not get here" | situations\conversation\conversation.cpp
11 | 0, "unknown packet type: 11" | spyparty.cpp
10 | (w >= 0) && (w <= 1) && (u >= 0) && (u <= 1) && (v >= 0) && (v <= 1) | checkerlib\misc\geomutils.cpp
10 | verify( network::SendPacket(&gs, sizeof(gs), true) ) | spy_server.cpp
10 | (rem >= 0.0f) && (rem < 1.0f) | spyparty.cpp
10 | Pedestal | situations\pedestal\pedestal.cpp
10 | (err = glGetError()) == GL_NO_ERROR, "code: 0x00000505" | spyparty.cpp
9 | Statue && (Statue->Type == object_types::STATUE) | object_utils.cpp
9 | Level | network.cpp
9 | !err | examples\lobby\async_krb5_wrapper.cpp
9 | 0, "unknown packet type: 1" | sniper_client.cpp
8 | SpyCheck == player_control::state::TESTING | situations\check_watch\check_watch.cpp
8 | ObjectIDToByte.empty() || nettest_mode | network.cpp
8 | !IsSpy && cd->object_of_interest | situations\pedestal\pedestal.cpp
8 | ByteToObjectID.empty() || nettest_mode | network.cpp
8 | 0, "unknown packet type: 20" | sniper_client.cpp
8 | 0, "unknown packet type: 13" | spy_server.cpp
8 | 0, "unknown packet type: 13" | spyparty.cpp
7 | ObjectIDToByte.empty() && ByteToObjectID.empty() | network.cpp
7 | (CameraMode == SNIPER_CAMERA) && Level | round_events.cpp
7 | 0, "unknown packet type: 24" | spy_server.cpp
6 | SwapStatueID && ( (Spy->holding_right == SwapStatueID) || (Spy->holding_left == SwapStatueID)) | situations\steal_statue\steal_statue.cpp
6 | RoundTimeline.size() < round_events_packet::MAX_NUM_EVENTS | round_events.cpp
6 | o | network.cpp
6 | mc && (mc->Object->Type == object_types::STATUE) && (StatueMeshIndex < mc->Meshes.size()) | object_utils.cpp
6 | IsChatAllowed() | spyparty.cpp
6 | ad | object_utils.cpp
6 | 0, "unknown packet type: 21" | sniper_client.cpp
5 | !"shouldn't get here" | situations\steal_statue\steal_statue.cpp
5 | mark_value <= 1.0f | spyparty.cpp
5 | client && client->OtherClientID | spyparty_lobby.cpp
5 | 0, "unknown packet type: 25" | spy_server.cpp
4 | !"shouldn't get here, must have moved while holding book" | situations\book_transfer\book_transfer.cpp
4 | object_system::GetObject(History.States[i].ID) | network.cpp
4 | Level | sniper_client.cpp
4 | Level->ActiveGameTypeIndex < Level->GameTypes.size() | sniper_client.cpp
4 | !err | examples\lobby\lobbyclient.cpp
4 | am | network.cpp
4 | 0, "unknown packet type: 10" | spyparty.cpp
3 | SpyCheck == player_control::state::TESTING | situations\seduction\seduction.cpp
3 | !"shouldn't get here" | examples\lobby\lobbyclient.cpp
3 | i_ind == i_dep+1 | character.cpp
3 | !Focus | player_control.cpp
3 | Control->GetSpyTriggeredResult(this) == player_control::state::TESTING | situations\double_agent\double_agent.cpp
3 | !(ChooserCurrentCharacter->Object->Flags & object_system::object::UNMANAGED) | ui.cpp
3 | am->Events[i].animation->getCoreAnimation() | situations\drinks\drinks.cpp
3 | 0, "unknown packet type: 16" | spyparty.cpp
3 | 0, "unknown packet type: 15" | spyparty.cpp
2 | verify( network::SendPacket(&p, sizeof(p), true) ) | network.cpp
2 | verify( network::SendPacket(&gs, sizeof(gs), true) ) | spyparty.cpp
2 | verify( ConfirmGameIDToLobby(CurrentPlayID, CurrentGameID) ) | spy_server.cpp
2 | verify( animation::GetPlayingAnimationInfo(am, &time, &duration) ) | situations\book_transfer\book_transfer.cpp
2 | t >= 0 | round_events.cpp
2 | State.CurrentNode->Parent->StringSoFar == State.CurrentLeaf->StringSoFar | spyparty.cpp
2 | s | checkerlib\misc\glutils.cpp
2 | mc | situations\steal_statue\steal_statue.cpp
2 | glMultiTexCoord2f_ && glActiveTexture_ | spy_server.cpp
2 | GameIDs.find(PacketPlayID) == GameIDs.end() | examples\lobby\lobbyclient.cpp
2 | !err && p2pauth_con && krbtgt.data && (krbtgt.length < KRBTGT_LIMIT) | examples\lobby\lobbyclient.cpp
2 | (err = glGetError()) == GL_NO_ERROR, "code: 0x00000506" | spyparty.cpp
2 | !decoder.underflowed() && decoder.on_last_byte() | examples\lobby\lobbyclient.cpp
2 | CurrentNetworkObjectID | network.cpp
2 | !ChooserScrollDrag | ui.cpp
2 | (cd->object_of_interest == d->BookID) || (ps->status != pathing::PATH_success) | situations\bookcase\bookcase.cpp
2 | cd->object_of_interest && (cd->object_of_interest == d->BookID) | situations\bookcase\bookcase.cpp
2 | ATPCachedDescription | player_control.cpp
2 | Array && Num | checkerlib\misc\utils.h
2 | am->playing.type == core_book_hidefilm_okay | situations\book_transfer\book_transfer.cpp
2 | 0, "unknown packet type: 8" | sniper_client.cpp
2 | 0, "unknown packet type: 2" | spy_server.cpp
2 | 0, "unknown packet type: 22" | sniper_client.cpp
1 | verify( network::SendPacket(DefaultCharacterStatePacket, CharacterStatePacketSizeBytes, true) ) | spy_server.cpp
1 | verify( network::SendPacket(CommandsPacketBuffer, SizeBytes, true) ) | network.cpp
1 | verify( Camera.Project(vector_3(0, 0,0), &origin) ) | spyparty.cpp
1 | StatueAD && (StatueAD->Object->Type == object_types::STATUE) | situations\pedestal\pedestal.cpp
1 | !"somehow didn't get book" | situations\bookcase\bookcase.cpp
1 | ScreamSound | spy_server.cpp
1 | network::IsConnected() | spy_server.cpp
1 | it != NetTestByteMap.end() | network.cpp
1 | fabs(1-Length2(Axis)) < 1e-5 | checkerlib\misc\math4d.h
1 | (err = glGetError()) == GL_NO_ERROR, "code: 0x00000502" | spyparty.cpp
1 | DefaultCharacterStatePacket && (ndata == sizeof(*DefaultCharacterStatePacket) + DefaultCharacterStatePacket->NumCharacters*sizeof(DefaultCharacterStatePacket->States[0])) | sniper_client.cpp
1 | !decoder.underflowed() | network.cpp
1 | !decoder.underflowed() && decoder.on_last_byte() | network.cpp
1 | (d >= 0) && (d <= 1) | c:\users\checker\dev\spyparty\project\spyparty\code\network.h
1 | CharacterStatePacket && (CharacterStatePacket->NumCharacters <= MaxNumCharacters) | network.cpp
1 | ByteToObjectID.find(CurrentNetworkObjectID) == ByteToObjectID.end() | network.cpp
1 | 0, "unknown packet type: 8" | spy_server.cpp
1 | 0, "unknown packet type: 6" | spy_server.cpp
1 | 0, "unknown packet type: 6" | sniper_client.cpp
1 | 0, "unknown packet type: 4" | sniper_client.cpp
1 | 0, "unknown packet type: 2" | sniper_client.cpp
1 | 0, "unknown packet type: 25" | sniper_client.cpp
1 | 0, "unknown packet type: 19" | spyparty.cpp
1 | 0, "unknown packet type: 18" | spyparty.cpp
1 | 0, "got new play id with existing: 0xca98 " | sniper_client.cpp
1 | 0, "got new play id with existing: 0x57a5 " | sniper_client.cpp
1 | 0, "got new play id with existing: 0x5267 " | sniper_client.cpp
1 | 0, "got new play id with existing: 0x3e15 " | sniper_client.cpp
1 | 0, "got new play id with existing: 0x33fc " | sniper_client.cpp
1 | 0, "got new play id (0xabeb) with existing (0xcdb6) " | sniper_client.cpp
1 | 0, "got new play id (0x6f9f) when already playing with existing (0xfb2e) " | sniper_client.cpp
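Producing a frequency list like this on the server is a group-and-count operation. Each report is keyed by assertion text plus source file, so the same `0, "unknown packet type: 9"` assert shows up as separate rows for spy_server.cpp, sniper_client.cpp, and spyparty.cpp. The groups are then counted and sorted by count. Here’s a sketch that assumes the reports have already been parsed into (expression, file) pairs. `TallyAsserts` and `AssertRow` are hypothetical names, and the tie-breaking order is arbitrary.

```cpp
// Sketch only: TallyAsserts and AssertRow are hypothetical names.
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct AssertRow {
    long count;
    std::string expr;  // assertion text, e.g. "am->queue.empty()"
    std::string file;  // source file the assert lives in
};

// Groups raw (expression, file) reports and returns one row per distinct pair,
// most frequent first. The same expression in two files gives two rows.
std::vector<AssertRow> TallyAsserts(
    const std::vector<std::pair<std::string, std::string>>& reports) {
    std::map<std::pair<std::string, std::string>, long> counts;
    for (const auto& r : reports) ++counts[r];

    std::vector<AssertRow> rows;
    rows.reserve(counts.size());
    for (const auto& [key, n] : counts) rows.push_back({n, key.first, key.second});

    // Highest count first; ties broken by expression, then file (an arbitrary
    // but deterministic order).
    std::sort(rows.begin(), rows.end(), [](const AssertRow& a, const AssertRow& b) {
        if (a.count != b.count) return a.count > b.count;
        return std::tie(a.expr, a.file) < std::tie(b.expr, b.file);
    });
    return rows;
}
```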
[1] Currently at 16,681 people as of this post.
Interested to see how a full-sized community will change and evolve SpyParty gameplay.
You and me both!
I cannot wait until I can get into the beta. Every time I get an email notification on my phone, I hope it’s my beta invite. Excited for when you roll out all the invites.
I think I’d go crazy if I had a todo list as long as yours is. Still, the promise of finally getting paid has got to be a solid motivator.
Nice to hear an update!
Getting paid will be nice, since I’m definitely spending lots of my savings, but the bigger motivator is finally getting all these people invited in and playing! Once that’s mostly working I can go back to working on the game itself!
Hey, great, a new post! Thanks!
I agree, I enjoy reading about what’s going on in that studio of his. I would demand a new post every week, but I don’t think that’d be possible.
My thoughts when reading the list of asserts: “Ah yes… of course… we need to compound the hyperlink structure code and hack into the main CPU to boost optimal paths.”
My signup request date-time is 2011-05-10 17:01:05 and last week you tweeted you’re at 2011-05-10 16:18:43… I’m so very, very close…
Why make bots when you can just invite real players? I only have sixteen minutes and forty seconds left…
Bots don’t judge me when my code breaks…that I know of.
Bots judging people’s bad code was what started the robot uprising before the beginning of Terminator.