This article is the first in a 5-part series.
- Part 1: The Best and the Rest is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 2: Building Effective Teams is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 3: Game Development Factors is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 4: Crunch Makes Games Worse is available here: (Gamasutra) (BlogSpot) (in Chinese)
- Part 5: What Great Teams Do is available here: (Gamasutra) (in Chinese)
- For extended notes on our survey methodology, see our Methodology blog page: http://intelligenceengine.blogspot.tw/2014/11/game-outcomes-project-methodology-in.html
- Our raw survey data (minus confidential info) is now available here if you’d like to verify our results or perform your own analysis.
The Game Outcomes Project, Part 1: The Best and the Rest
What makes the best teams so effective?
Veteran developers who have worked on many different teams often remark that they see vast cultural differences between them. Some teams seem to run like clockwork, and are able to craft world-class games while apparently staying happy and well-rested. Other teams struggle mightily and work themselves to the bone in nightmarish overtime and crunch of 80-90 hour weeks for years at a time, or in the worst case, burn themselves out in a chaotic mess. Some teams are friendly, collaborative, focused, and supportive; others are unfocused and antagonistic. A few even seem to be hostile working environments or political minefields with enough sniping and backstabbing to put Team Fortress 2 to shame.
What causes the differences between those teams? What factors separate the best from the rest?
As an industry, are we even trying to figure that out?
Are we even asking the right questions?
These are the kinds of questions that led to the development of the Game Outcomes Project. In October and November of 2014, our team conducted a large-scale survey of hundreds of game developers. The survey included roughly 120 questions on teamwork, culture, production, and project management. We suspected that we could learn more from a side-by-side comparison of many game projects than from any single project by itself, and we were convinced that finding out what great teams do that lesser teams don’t do – and vice versa – could help everyone raise their game.
Our survey was inspired by several of the classic works on team effectiveness. We began with the 5-factor team effectiveness model described in the book Leading Teams: Setting the Stage for Great Performances. We also incorporated the 5-factor team effectiveness model from the famous management book The Five Dysfunctions of a Team: A Leadership Fable and the 12-factor model from 12: The Elements of Great Managing, which is derived from aggregate Gallup data from 10 million employee and manager interviews. We felt certain that at least one of these three models would surely turn out to be relevant to game development in some way.
We also added several categories with questions specific to the game industry that we felt were likely to show interesting differences.
On the second page of the survey, we added a number of more generic background questions. These asked about team size, project duration, job role, game genre, target platform, financial incentives offered to the team, and the team’s production methodology.
We then faced the broader problem of how to quantitatively measure a game project’s outcome.
Ask any five game developers what constitutes “success,” and you’ll likely get five different answers. Some developers care only about the bottom line; others care far more about their game’s critical reception. Small indie developers may regard “success” as simply shipping their first game as designed regardless of revenues or critical reception, while developers working under government contract, free from any market pressures, might define “success” simply as getting it done on time (and we did receive a few such responses in our survey).
Lacking any objective way to define “success,” we decided to quantify the outcome through the lenses of four different kinds of outcomes. We asked the following four outcome questions, each with a 6-point or 7-point scale:
- “To the best of your knowledge, what was the game’s financial return on investment (ROI)? In other words, what kind of profit or loss did the company developing the game take as a result of publication?"
- “For the game’s primary target platform, was the project ever delayed from its original release date, or was it cancelled?"
- “What level of critical success did the game achieve?"
- “Finally, did the game meet its internal goals? In other words, to what extent did the team feel it achieved something at least as good as it was trying to create?"
We hoped that we could correlate the answers to these four outcome questions against all the other questions in the survey to see which input factors had the most actual influence over these four outcomes. We were somewhat concerned that all of the “noise” in project outcomes (fickle consumer tastes, the moods of game reviewers, the often unpredictable challenges inherent in creating high-quality games, and various acts of God) would make it difficult to find meaningful correlations. But with enough responses, perhaps the correlations would shine through the inevitable noise.
We then created an aggregate “outcome” value that combined the results of all four of the outcome questions as a broader representation of a game project’s level of success. This turned out to work nicely, as it correlated very strongly with the results of each of the individual outcome questions. Our Methodology blog page has a detailed description of how we calculated this aggregate score.
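The precise weighting we used is documented on our Methodology page; purely as an illustration of the idea (the equal weights and scale ranges below are assumptions for this sketch, not our actual formula), combining the four normalized outcome answers might look like this:

```python
# Illustrative only: combine four outcome answers (each on its own Likert
# scale) into a single 0-100 aggregate score by normalizing and averaging.
# Equal weighting here is an assumption; the project's actual formula is
# described on its Methodology page.

def normalize(answer, scale_min, scale_max):
    """Map a Likert answer onto [0, 1]."""
    return (answer - scale_min) / (scale_max - scale_min)

def aggregate_outcome(roi, delay, critical, internal):
    # Assumed scales: ROI 1-7, delay 1-6, critical reception 1-7,
    # internal goals 1-7.
    parts = [
        normalize(roi, 1, 7),
        normalize(delay, 1, 6),
        normalize(critical, 1, 7),
        normalize(internal, 1, 7),
    ]
    return 100.0 * sum(parts) / len(parts)

print(aggregate_outcome(6, 5, 6, 7))  # a strong project scores high
```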
We worked carefully to refine the survey through many iterations, and we solicited responses through forum posts, Gamasutra posts, Twitter, and IGDA mailers. We received 771 responses, of which 302 were completed, and 273 were related to completed projects that were not cancelled or abandoned in development.
So what did we find?
In short, a gold mine. The results were staggering.
More than 85% of our 120 questions showed a statistically significant correlation with our aggregate outcome score, with a p-value under 0.05 (the p-value gives the probability of observing data like our sample's if the variables were truly independent; a small p-value is therefore evidence against independence). This correlation was moderate or strong in most cases (absolute value > 0.2), and most of the p-values were in fact well below 0.001. We were even able to develop a linear regression model that showed an astonishing 0.82 correlation with the combined outcome score (shown in Figure 1 below).
Figure 1. Our linear regression model (horizontal axis) plotted against the composite game outcome score (vertical axis). The black diagonal line is a best-fit trend line. 273 data points are shown. (Image: http://gamasutra.com/db_area/images/blog/232023/regression_vs_outcome_normalized.png)
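As a rough illustration of this kind of model (on synthetic data, with made-up question weights; our actual regression is described on the Methodology page), fitting by least squares and correlating the model's predictions with the outcome might look like:

```python
# Illustrative sketch: fit a linear model over many question scores and
# correlate its predictions with the outcome. All data here is synthetic;
# the survey's real model is not reproduced.
import numpy as np

rng = np.random.default_rng(0)
n, q = 273, 10                                      # responses x questions
X = rng.integers(1, 6, size=(n, q)).astype(float)   # 1-5 Likert answers
true_w = rng.normal(0.0, 3.0, size=q)               # invented "true" weights
outcome = X @ true_w + rng.normal(0.0, 10.0, size=n)

# Ordinary least squares with an intercept column:
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, outcome, rcond=None)
predicted = A @ coef

r = np.corrcoef(predicted, outcome)[0, 1]
print(f"model vs outcome correlation: r = {r:.2f}")
```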
To varying extents, all three of the team effectiveness models (Hackman’s “Leading Teams” model, Lencioni’s “Five Dysfunctions” model, and the Gallup “12” model) proved to correlate strongly with game project outcomes.
We can’t say for certain how many relevant questions we didn’t ask. There may well be many more questions waiting to be asked that would have shined an even stronger light on the differences between the best teams and the rest.
But the correlations and statistical significance we discovered are strong enough that it’s very clear that we have, at the very least, discovered an excellent partial answer to the question of what makes the best game development teams so successful.
The Game Outcomes Project Series
Due to space constraints, we’ll be releasing our analysis as a series of articles, with the remaining installments released at 1-week intervals beginning in January 2015. We’ll leave off detailed discussion of our three team effectiveness models until the second article in our series to allow these topics the thorough analysis they deserve.
This article will focus solely on introducing the survey and combing through the background questions asked on the second survey page. And although we found relatively few correlations in this part of the survey, the areas where we didn’t find a correlation are just as interesting as the areas where we did.
Project Genre and Platform Target(s)
First, we asked respondents to tell us what genre of game their team had worked on. Here, the results are all over the map.
Figure 2. Game genre (vertical axis) vs. composite game outcome score (horizontal axis). Higher data points (green dots) represent more successful projects, as determined by our composite game outcome score.
We see remarkably little correlation between game genre and outcome. In the few cases where a game genre appears to skew in one direction or another, the sample size is far too small to draw any conclusions, with all but a handful of genres having fewer than 30 responses.
(Note that Figure 2 uses a box-and-whisker plot, as described here).
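For readers unfamiliar with the format, a box-and-whisker plot summarizes each category with five numbers. A minimal sketch of how those numbers are computed (synthetic scores; conventional 1.5 * IQR whiskers):

```python
# A box-and-whisker plot (as in Figure 2) summarizes a distribution by
# five numbers: quartiles plus whisker endpoints. Synthetic scores here.
import numpy as np

scores = np.array([31, 45, 52, 58, 60, 63, 67, 70, 74, 88], dtype=float)
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1
# Whiskers conventionally extend to the most extreme data points that lie
# within 1.5 * IQR of the box; anything beyond is drawn as an outlier dot.
lo = scores[scores >= q1 - 1.5 * iqr].min()
hi = scores[scores <= q3 + 1.5 * iqr].max()
print(q1, median, q3, lo, hi)
```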
We also asked a similar question regarding the product’s target platform(s), including responses for desktop (PC or Mac), console (Xbox/PlayStation), mobile, handheld, and/or web/Facebook. We found no statistically significant results for any of these platforms, nor for the total number of platforms a game targeted.
Project Duration and Team Size
We asked about the total months and years in development; based on this, we were able to calculate each project’s total development time in months:
Figure 3. Total months in development (horizontal axis) vs. game outcome score (vertical axis). The black diagonal line is a trend line.
As you can see, there’s a small negative correlation (-0.229, using the Spearman correlation coefficient), and the p-value is 0.003. This negative correlation is not too surprising, as troubled projects are more likely to be delayed than projects that are going smoothly.
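A sketch of this kind of test on synthetic data (the negative slope and noise level below are invented for illustration, not taken from our survey):

```python
# Spearman rank correlation between development time and outcome score,
# on synthetic data with an assumed mild negative trend plus noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 273                                            # completed responses
months = rng.integers(6, 60, size=n)               # dev time in months
# Invented relationship: longer projects score somewhat worse, plus noise.
outcome = 70.0 - 0.5 * months + rng.normal(0.0, 15.0, size=n)

rho, p = stats.spearmanr(months, outcome)
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```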
We also asked about the size of the team, both the average team size and the final team size. Reported average team size ranged from 1 to 11 (mean 5.7); final team size ranged from 1 to 500 (mean 48.6). Both showed a slight positive correlation with project outcomes, as shown below, but in both cases the p-value is over 0.1, so the correlation is not statistically significant enough to be useful or noteworthy. We suspect the small positive correlation reflects the fact that a struggling project is less likely to receive additional resources over time than one that’s going well, so the result is not too surprising.
Figure 4. Average team size correlated against game project outcome (vertical axis).
Figure 5. Final team size correlated against game project outcome (vertical axis).
Figure 6. Percent change in team size (final divided by average) correlated against game project outcome (vertical axis).
We asked about the technology solution used: whether it was a new engine built from scratch; core technology from a previous version of a similar game or another game in the same series; an in-house / proprietary engine (such as EA Frostbite); or an externally-developed engine (such as Unity, Unreal, or CryEngine).
The results are as follows:
Figure 7. Game engine / core technology used (horizontal axis) vs. game project outcome (vertical axis), using a box-and-whisker plot.
| Technology used | Average composite score | Standard deviation | Number of responses |
| --- | --- | --- | --- |
| Engine from previous version of same or similar game | 64.8 | 15.8 | 58 |
| Internal/proprietary engine / tech (such as EA Frostbite) | 60.7 | 19.4 | 46 |
| Licensed game engine (Unreal, Unity, etc.) | 55.6 | 17.5 | 113 |
The results here are less striking the more you look at them. The highest score was for projects that used an engine from a previous version of the same game or a similar one – but that’s exactly what one would expect to be the case, given that teams in this category clearly already had a head start in production, much of the technical risk had already been stamped out, and there was probably already a veteran team in place that knew how to make that type of game!
We analyzed these results using a Kruskal-Wallis one-way analysis of variance, and we found that this question was only statistically significant on account of that very option (engine from a previous version of the same game or similar), with a p-value of 0.006. Removing the data points related to this answer category caused the p-value for the remaining categories to shoot up above 0.3.
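A sketch of the same style of test, using synthetic scores drawn from the means and standard deviations in the table above (not our raw data):

```python
# Kruskal-Wallis H test across engine-choice categories, on synthetic
# scores generated from the reported means/SDs (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reused_engine = rng.normal(64.8, 15.8, size=58)
proprietary   = rng.normal(60.7, 19.4, size=46)
licensed      = rng.normal(55.6, 17.5, size=113)

h, p = stats.kruskal(reused_engine, proprietary, licensed)
print(f"H = {h:.2f}, p = {p:.3g}")

# Dropping the "reused engine" group, as in the analysis above, tests
# whether the remaining categories still differ from one another:
h2, p2 = stats.kruskal(proprietary, licensed)
print(f"without reused-engine group: p = {p2:.3g}")
```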
Our interpretation is that the best option for the game engine depends entirely on the game being made and the options available for it, and that any of these approaches can be the “best” choice under the right circumstances. In other words, the most reasonable conclusion is that there is no universally “correct” answer separate from the actual game being made, the team making it, and the circumstances surrounding its development. That’s not to say the choice of engine isn’t terrifically important, but the data clearly shows plenty of successes and failures in all categories, with only minimal differences in outcomes between them, indicating that each of these four options is entirely viable in some situations.
We also did not ask which specific technology solution a respondent’s dev team was using. Future versions of the study may include questions on the specific game engine used (Unity, Unreal, CryEngine, etc.).
We also asked a question on this page regarding the team’s average experience level, along a scale from 1 to 5 (with a ‘1’ indicating less than 2 years of average development experience, and a ‘5’ indicating a team of grizzled game industry veterans with an average of 8 or more years of experience).
Figure 8. Team experience level ranking (horizontal axis, by category listed above) mapped against game outcome score (vertical axis).
Here, we see a correlation of 0.19 (and p-value under 0.001). Note in particular the complete absence of dots in the upper-left corner (which would indicate wildly successful teams with no experience) and the lower-right corner (which would indicate very experienced teams that failed catastrophically).
So our study clearly confirms the common industry knowledge that experienced teams are significantly more likely to succeed. This is not at all surprising, but it’s reassuring that the data makes the point so clearly. And as much as we may all enjoy stories of individuals with minimal game development experience becoming wildly successful with games developed in just a few days (as with Flappy Bird), our study shows clearly that such cases are extreme outliers.
The Surprises: Production and Incentives
This part of our survey also revealed two major surprises.
The first surprise was financial incentives. The survey included a question: “Was the team offered any financial incentives tied to the performance of the game, the team, or your performance as individuals? Select all that apply.” We offered multiple check boxes to say “yes” or “no” to any combination of financial incentives that were offered to the team.
The correlations are as follows:
Figure 9. Incentives (horizontal axis) plotted against game outcome score (vertical axis) for the five different types of financial incentives, using a box-and-whisker plot. From left to right: incentives based on individual performance, team performance, royalties, incentives based on game reviews/MetaCritic scores, and miscellaneous other incentives. For each category, we split all 273 data points into those excluding the incentive (left side of each box) and those including the incentive (right side of each box).
Of these five forms of incentives, only individual incentives showed statistical significance. Game projects offering individually-tailored compensation (64 out of the 273 responses) had an average score of 63.2 (standard deviation 18.6), while those that did not offer individual compensation had a mean game outcome score of 56.5 (standard deviation 17.7). A Wilcoxon rank-sum test for individual incentives gave a p-value of 0.017 for this comparison.
All the other forms of incentives – those based on team performance, based on royalties, based on reviews and/or MetaCritic ratings, and any miscellaneous “other” incentives – show p-values that indicate that there was no meaningful correlation with project outcomes (p-values 0.33, 0.77, 0.98, and 0.90, respectively, again using a Wilcoxon rank-sum test).
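A sketch of this comparison using synthetic scores drawn from the reported means and standard deviations (not our raw survey data):

```python
# Wilcoxon rank-sum comparison of outcome scores for teams with and
# without individual incentives, on synthetic data generated from the
# means/SDs reported above (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
with_individual    = rng.normal(63.2, 18.6, size=64)
without_individual = rng.normal(56.5, 17.7, size=209)   # 273 - 64

stat, p = stats.ranksums(with_individual, without_individual)
print(f"rank-sum statistic = {stat:.2f}, p = {p:.3g}")
```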
This is a very surprising finding. Incentives are usually offered under the assumption that they are a huge motivator for a team. However, our results indicate that only individual incentives seem to have the desired effect, and even then, to a much smaller degree than expected.
One possible explanation is that the psychological phenomenon popularized by Dan Pink may be playing itself out in the game industry: a great deal of recent research suggests that financial rewards are usually an ineffective motivational tool, and actually backfire in many cases.
We also speculate that in the case of royalties and MetaCritic reviews in particular, the sense of helplessness that game developers can feel when dealing with factors beyond their control – such as design decisions they disagree with, or other team members falling down on the job – potentially compensates for any motivating effect that incentives may have had. With individual incentives, on the other hand, individuals may feel that their individual efforts are more likely to be noticed and rewarded appropriately. However, without more data, this all remains pure speculation on our part.
Whatever the reason, our results indicate that individually tailored incentives, such as Pay For Performance (PFP) plans, achieve meaningful results where royalties, team incentives, and other forms of financial incentives do not.
Our second big surprise was in the area of production methodologies, a topic of frequent discussion in the game industry.
We asked what production methodology the team used – 0 (don’t know), 1 (waterfall), 2 (agile), 3 (agile using “Scrum”), and 4 (other/ad-hoc). We also provided a detailed description with each answer so that respondents could pick the closest match according to the description even if they didn’t know the exact name of the production methodology. The results were shocking.
Figure 10. Production methodology vs. game outcome score.
Here’s a more detailed breakdown showing the mean and standard deviation for each category, along with the number of responses in each:
| Production methodology | Average composite score | Standard deviation | Number of responses |
| --- | --- | --- | --- |
| Agile using Scrum | 59.7 | 16.9 | 75 |
| Other / Ad-hoc | 57.6 | 17.6 | 44 |
What’s remarkable is just how tiny these differences are. They almost don’t even exist.
Furthermore, a Kruskal-Wallis H test indicates a very high p-value of 0.46 for this category, meaning that we truly can’t infer any relationship between production methodology and game outcome. Further testing of the production methodology against each of the four game project outcome factors individually gives identical results.
Given that production methodologies seem to be a game development holy grail for some, one would expect to see major differences, with Scrum in particular far out in the lead. But these differences are tiny, with a huge amount of variation in each category, and the correlation between production methodology and outcome score has a p-value far too high to reject the hypothesis that the two are independent. Scrum, agile, and “other” in particular are essentially indistinguishable from one another. “Unknown” is far higher than one would expect, while “Other/ad-hoc” is also remarkably high, indicating that there are effective production methodologies available that aren’t on our list (interestingly, we asked those in the “other” category for more detail, and the Cerny method was listed as the production methodology for the top-scoring game project in that category).
Also, unlike our question regarding game engines, we can’t simply write this off as some methodologies being more appropriate for certain kinds of teams. Production methodologies are generally intended to be universally useful, and our results show no meaningful correlations between the methodology and the game genre, team size, experience level, or any other factors.
This raises the question: where’s the payoff?
We’ve seen several significant correlations in this article, and we will describe many more throughout our study. Articles 2 and 3 in particular will illustrate many remarkable correlations between many different cultural factors and game outcomes, with more than 85% of our questions showing a statistically significant correlation.
So it’s very clear that where there were significant drivers of project outcomes, they stood out very clearly. Our results were not shy. And if the specific production methodology a team uses is really vitally important, we would expect that it absolutely should have shown up in the outcome correlations as well.
But it’s simply not there.
It seems that in spite of all the attention paid to the subject, the particular type of production methodology a team uses is not terribly important, and it is not a significant driver of outcomes. Even the much-maligned “Waterfall” approach can apparently be made to work well.
Our third article will detail a number of additional questions we asked around production that give some hints as to what aspects of production actually impact project outcomes regardless of the specific methodology the team uses — although these correlations are still significantly weaker on average than any of our other categories concerning culture.
We are beginning to crack open the differences that separate the best teams from the rest.
We have seen that four factors – total project duration, team experience level, financial incentives based on individual performance, and re-use of an existing game engine from a similar game – have clear correlations with game project outcomes.
Our study found several surprises, including a complete lack of any correlations between factors that one would assume should have a large impact, such as team size, game genre, target platforms, the production methodology the team used, or any additional financial incentives the team was offered beyond individual performance compensation.
In the second article in the series, to be published in early January, we will discuss the three team effectiveness models that inspired our study in detail and illustrate their correlations with the aggregate outcome score and each of the individual outcome questions. We will see far stronger correlations than anything presented in this article.
Following that, the third article will explore additional findings around many other factors specific to game development, including technology risk management, design risk management, crunch / overtime, team stability, project planning, communication, outsourcing, respect, collaboration / helpfulness, team focus, and organizational perceptions of failure. We will also summarize our findings and provide a self-reflection tool that teams can use for postmortems and self-analysis.
Finally, our fourth article will bring our data to bear on the controversial issue of crunch and draw unambiguous conclusions.
The Game Outcomes Project team would like to thank the hundreds of current and former game developers who made this study possible through their participation in the survey. We would also like to thank IGDA Production SIG members Clinton Keith and Chuck Hoover for their assistance with question design; Kate Edwards of the IGDA for assistance with promotion; and Christian Nutt and the Gamasutra editorial team for their assistance in promoting the survey.
For announcements regarding our project, follow us on Twitter at @GameOutcomes