Effective Crowdsourcing for Software Feature Ideation

Effective Crowdsourcing for Software Feature
Ideation in Online Co-Creation Forums
Karthikeyan Rajasekharan, Aditya P Mathur, See-Kiong Ng
Information Systems Technology and Design
Singapore University of Technology and Design
karthikeyan@sutd.edu.sg, aditya_mathur@sutd.edu.sg, ngseekiong@sutd.edu.sg
Abstract—Many software companies are creating firm-centric
online forums for customer engagement. These forums can be an
effective crowdsourcing platform for software product feature
ideation and co-creation with the end users. We studied the
community interaction data from the ideation forums of two
software providers. Link analysis revealed that a small core
community was responsible for generating a large proportion of
the implemented ideas. This indicated the need to identify key
users in the online forum. Our analysis showed the applicability
of centrality measures such as betweenness in ranking key users.
We also found that commenting was likely to produce better
community formation amongst the participants than voting.
Keywords-co-creation; key users; ideation; link analysis;
crowdsourcing; social network analysis; expertise ranking;
software feature requirements
I. INTRODUCTION
Company-centric online user forums are an attractive
platform for company and end-user interactions and offer the
potential to co-opt customer knowledge as part of the
innovation process. Several consumer goods companies such as
Dell, Nike etc. manage online participation communities that
help to strengthen their product portfolio through customer-
suggested features. In particular, for the Software-as-a-Service
(SaaS) arena, the new product development process carries
higher risks of market adoption relative to the risks of technical
failure. Such company-owned online user forums can be used
to help mitigate the market adoption risk by transferring
knowledge from the user to the company, thereby enabling
better decision making pertaining to creating new customer-
centric product features.
User-led innovation has been suggested to be a key part of
the ideation process that can lead to breakthrough product
features. [1] found "that on average user ideas score
higher in novelty and customer benefit, but lower in feasibility.
Even more interestingly, user ideas are placed more frequently
than expected among the very best in terms of novelty and
customer benefit." [12] and [2] argued in favor of taking advantage
of online communities for generating ideas and suggested that the
system needs to be open and social for it to be successful. In
this paper, we focus on the addition of new features to an
existing software product through crowdsourcing in firm-
centric online forums.
For a company to effectively extract value and manage
knowledge creation in such an online community, there are two
key questions that merit consideration:
1. How can a firm identify the key users for ideation in
the online ideation forum?
2. Which of the activities in the online ideation forums
are more effective in fostering community formation?
II. DATA GATHERING
To perform the analysis, the online ideation forums of
Salesforce.com (SFDC) and SAP were used. Salesforce.com is
a leading SaaS provider and as part of its online community
SFDC involves end users in its ideation process in a forum
entitled Ideaexchange. Given the nature of its business
(providing software services over the internet), the company
has an active ecosystem of partners and customers who
interact with each other and with SFDC in this online forum.
SAP also has an active ecosystem and has been pursuing Open
Innovation and Crowd-Sourcing as a means of generating new
customer insight. SAP’s ideation forum was called IdeaPlace.
A. Ideation Forum Structure
The screenshot in Figure 1 shows the structure of an
ideation forum using SFDC as an example.
Figure 1. Salesforce Idea Exchange. (forum structure)

The key activities that users can perform in such a forum
are the suggestion of ideas, voting on ideas (up and down),
commenting on ideas and annotating the ideas with meta-data
tags. Each idea belongs to a single user and users cannot vote
on an idea more than once. They can however, comment on a
single idea many times. Each user is uniquely identified by a
user identifier. Ideas and comments are linked to the users who
created them.
B. Crawling the Forum
The forums of Salesforce and SAP were crawled for publicly
available ideation information using the Selenium and Scrapy
toolkits, and the data obtained were stored in PostgreSQL
databases for further analysis.
C. Dataset Description
The datasets that were obtained are described in Table I.
TABLE I. IDEATION FORUM DATA THAT WAS GATHERED
Forum #Ideas #Participants #Comments #Votes
SFDC 19,593 73,942 62,389 516,514
SAP 7,506 2,226 7,276 40,765
In the subsequent sections, the discussion will focus on the
SFDC dataset; similar results were obtained on the SAP dataset
and are summarized in the section on related work.
III. ACTIVITY GRAPH GENERATION
We construct activity graphs (one for voting and one for
commenting) from the dataset as follows. Each node in the
activity graph represents a unique user account. Each edge in
the activity graph corresponds to a particular communication
activity between two users.
An assumption made in this analysis is that each user
account refers to a unique individual. Each node is annotated
with properties such as the number of ideas and votes that were
contributed by that user.
Each edge in our graph is a reflection of communication
between two users in relation to a particular idea. Edges are
derived through the procedure illustrated by the example below:
1. User A makes an idea contribution to the community.
User A is identified as the originator and the node's
idea count is increased by 1.
2. User B then comments on the idea proposed by User
A. Thus, this indicates a communication interaction
from User B to User A on his or her idea. An edge is
created from User B to User A to capture this
interaction. The number of such interactions between
the two users will determine the strength of that edge.
3. User C comments on the same idea. Now an edge is
drawn from User C to User A.
4. User D introduces a new Idea. User D's Idea count is
incremented but no edges are drawn.
5. User C and User A comment on User D's idea. Edges
are drawn from User A and User C to User D.
This is illustrated in Figure 2. The process is repeated
with the voting activity data to obtain the voting activity graph.
Figure 2. Activity Graph Construction
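The construction procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `build_activity_graph`, the input shapes, and the choice to skip a user's comments on their own idea are all assumptions made for the example.

```python
from collections import Counter

def build_activity_graph(idea_owner, interactions):
    """Build a weighted, directed activity graph.

    idea_owner:   dict mapping idea id -> user who proposed the idea
    interactions: iterable of (user, idea id) pairs, one per comment or vote
    Returns (idea_count, edge_weight), where edge_weight[(src, dst)] is the
    number of interactions from user src on ideas owned by user dst.
    """
    # Node property: number of ideas contributed by each user.
    idea_count = Counter(idea_owner.values())
    edge_weight = Counter()
    for user, idea in interactions:
        owner = idea_owner[idea]
        # Assumption for this sketch: self-interactions create no edge.
        if user != owner:
            edge_weight[(user, owner)] += 1
    return idea_count, edge_weight

# The worked example from the text: A and D propose ideas,
# B and C comment on A's idea, C and A comment on D's idea.
ideas = {"idea1": "A", "idea2": "D"}
comments = [("B", "idea1"), ("C", "idea1"), ("C", "idea2"), ("A", "idea2")]
idea_count, edge_weight = build_activity_graph(ideas, comments)
```

Running the same function on vote records instead of comment records yields the voting activity graph; repeated interactions between the same pair of users simply increase the edge weight.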
Figure 3. Degree Distribution of Activity Graphs (panels: vote graph degree distribution plot; comment graph degree distribution plot)

The activity graphs were visualized using [5] and were
found to exhibit a core-periphery structure. There is a highly
connected (relative to the rest of the graph) core community whose
members have diverse interests and connect with the less active
periphery community of users. The degree distribution plots of
these activity graphs, shown in Figure 3, suggest that the degree
distributions of the activity networks follow a power law of the form
\[ p(x) \propto x^{-\alpha} \]
where x is the node degree and α is the scaling exponent.
Typically, for real world power law distributions, the value
of α is between 2 and 3. The values that were obtained were
2.12 and 2.05 for the vote graph and comment graph
respectively. The power law fitting libraries used in [3] were
used to make these calculations. This seems to suggest that
these are scale free networks within empirical limits and show
behavior similar to those observed in other empirical networks
in [3].
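As a rough illustration of how such an exponent can be estimated, the sketch below implements the closed-form maximum likelihood estimator for α described in [3]. It is not the fitting library used for the reported values: the function name `estimate_alpha` is hypothetical, the lower cutoff `x_min` is assumed to be given rather than fitted, and the discrete case uses the approximation from [3] that replaces x_min with x_min - 1/2.

```python
import math
import random

def estimate_alpha(values, x_min, discrete=True):
    """Maximum likelihood estimate of the power-law exponent alpha.

    Only observations >= x_min are used. For integer data such as node
    degrees, x_min is shifted by 1/2 (an approximation, see [3]).
    """
    tail = [x for x in values if x >= x_min]
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(x / (x_min - shift)) for x in tail)
    if log_sum <= 0:
        raise ValueError("need observations strictly above the cutoff")
    return 1.0 + len(tail) / log_sum

# Sanity check on synthetic continuous data with a known exponent,
# generated by inverse transform sampling.
random.seed(0)
true_alpha, x_min = 2.5, 1.0
samples = [x_min * (1.0 - random.random()) ** (-1.0 / (true_alpha - 1.0))
           for _ in range(20000)]
alpha_hat = estimate_alpha(samples, x_min, discrete=False)
```

In practice the cutoff x_min also has to be chosen from the data, and a goodness-of-fit test is needed before claiming a power law, which is what the libraries of [3] provide.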
IV. ISOLATING THE CORE COMMUNITY
Given the observation that the user community structure is
that of a core-periphery type, we develop a heuristic algorithm
based on average degree of a sub-graph to isolate the core
ideation community. Intuitively, the sub-graph that forms the
core of the activity graph will have an average degree that is
maximal.
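The text does not spell out the heuristic step by step. One plausible reading, stated here as an assumption, is a sweep over degree cutoffs: for each cutoff, keep the nodes whose degree is at least the cutoff, compute the average degree of the induced sub-graph, and pick the cutoff where that average is largest. The sketch below follows that reading on an undirected edge list; the function name `core_by_degree_cutoff` is hypothetical.

```python
from collections import defaultdict

def core_by_degree_cutoff(edges):
    """Return (best average degree, degree cutoff, core node set).

    edges: iterable of (u, v) pairs, treated as undirected.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}

    best = (0.0, 0, set())
    for cutoff in sorted(set(degree.values())):
        nodes = {n for n, d in degree.items() if d >= cutoff}
        # Each internal edge is seen from both endpoints, so this sum is
        # already the total degree inside the induced sub-graph.
        internal_degree = sum(1 for n in nodes for m in adj[n] if m in nodes)
        avg_degree = internal_degree / len(nodes)
        if avg_degree > best[0]:
            best = (avg_degree, cutoff, nodes)
    return best

# Toy graph: a 4-clique (a, b, c, d) with three peripheral users attached.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("e", "a"), ("f", "b"), ("g", "c")]
avg_degree, cutoff, core = core_by_degree_cutoff(toy_edges)
```

On the toy graph the sweep selects the clique as the core; whether degrees should be weighted by interaction counts is left open here because the source does not say.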
A. Core Community Isolation Results
This heuristic was applied and the results are shown
in Figure 4. The Y axis tracks the average degree
of the sub-graph and the X axis shows the degree cutoff. Based
on the maximal average degree of the sub-graph, the
degree cutoff points for the core were 150 and 61 for the vote
graph and the comment graph respectively.
Having obtained the degree cutoffs, the core community
can be isolated. We used the actual ideation output to evaluate
the detected core community. Table II shows that while the
core community comprises relatively few users, they
contribute a significant portion of the ideas that are
implemented. This result, combined with the fact that
SFDC implemented only 4.3% of the total ideas put forth by
the users, suggests that it is important to identify the key users
in the community for effective ideation co-creation.
Figure 4. Core Community Isolation (panels: SFDC Vote Graph; SFDC Comment Graph)
TABLE II. SFDC IDEA EXCHANGE CORE COMMUNITY PERFORMANCE
Graph  % of total users in Core  Idea contribution fraction of Core  Implemented Idea fraction of Core
Vote Graph 0.35% =38%
Comment Graph 0.68%
V. KEY USER RANKING
We conduct link analysis to rank community users for their
relative importance. This can be done by calculating the
prestige of a node and also by looking at measures of centrality
of a node. Structural prestige in network analysis has been the
basis for analyzing many networks. [11] details the PageRank
algorithm that was used to rank web pages according to
structural prestige. [16] proposed the idea of betweenness
centrality as a measure of a node’s importance in the overall
graph.
Based on the original PageRank algorithm [11], we define
the community activity rank as follows:
\[ C(i) = (1 - d) + d \sum_{(j,i) \in E} \frac{w_{ji}}{\sum_{k} w_{jk}} \, C(j) \]
where C(i) is the community activity rank of node i, E is
the set of all edges in the graph, d is the damping factor (set to
0.85), w_{ji} is the weight of the outbound link from node j to
node i, and \sum_{k} w_{jk} is the sum of the weights of all
outbound edges from node j. Thus, the final activity rank of a
user depends on the activity ranks of the users who collaborate
with that user. The key difference is that the original PageRank
algorithm did not cater for edge weights, whereas our formulation
uses a directed graph with weighted edges.
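A weighted rank of this kind can be computed by simple fixed-point iteration. The sketch below is an illustration only and is not the Gephi implementation used for the reported results: the function name `community_activity_rank`, the fixed iteration count, and the unnormalized (1 - d) term are assumptions of the example, so absolute values will differ from those in the result tables even where the ordering agrees.

```python
from collections import defaultdict

def community_activity_rank(edge_weight, d=0.85, iterations=100):
    """Iterate C(i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * C(j).

    edge_weight: dict mapping (source, target) -> positive weight.
    """
    out_weight = defaultdict(float)   # total outbound weight per node
    incoming = defaultdict(list)      # target -> [(source, weight), ...]
    nodes = set()
    for (src, dst), w in edge_weight.items():
        out_weight[src] += w
        incoming[dst].append((src, w))
        nodes.update((src, dst))

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - d) + d * sum(rank[j] * w / out_weight[j]
                                 for j, w in incoming.get(n, []))
            for n in nodes
        }
    return rank

# B comments twice on A's ideas; C comments once on A's and once on B's.
ranks = community_activity_rank({("B", "A"): 2, ("C", "A"): 1, ("C", "B"): 1})
```

In this small example A receives rank from both B and C, B receives half of C's rank, and C receives none, so the ordering is A, then B, then C.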
Betweenness centrality is defined as follows:
\[ B(i) = \sum_{j \neq i \neq k} \frac{\sigma_{jk}(i)}{\sigma_{jk}} \]
where \sigma_{jk} is the number of shortest paths between j and k
and \sigma_{jk}(i) is the number of those shortest paths that have
node i as part of the path. Thus, betweenness is a measure of the
number of times a node is part of the shortest path between
any two other nodes in the graph. The intuition that guides this
centrality measure is the idea that a node in the shortest path
between two other nodes can influence the flow of information
between those two nodes.
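For illustration, the definition can be evaluated with Brandes' algorithm, which accumulates pair dependencies from one breadth-first search per source node. The sketch below handles unweighted directed graphs and returns raw, unnormalized scores; the function name `betweenness` and the adjacency-dict input format are choices made for this example, and the tool used for the reported results may normalize differently.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality for an unweighted directed graph.

    adj: dict mapping node -> iterable of successor nodes.
    """
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Diamond graph: two equally short routes from a to d, via b or via c.
diamond = betweenness({"a": ["b", "c"], "b": ["d"], "c": ["d"]})
```

In the diamond graph each of the two middle nodes lies on one of the two shortest paths from a to d, so each receives a score of 0.5.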
A. Ranking Results
The two approaches to ranking users were applied using [5]
and the users were ranked. An abbreviated subset of the results
(due to space constraints), the top 10 users for the comment
graph, is shown in Tables III and IV.
TABLE III. SFDC IDEAS COMMENT COMMUNITY RANK TOP 10
#  User Name  Community Activity Rank  Community Recognition
1 Alexander Sutherland 0.019588813 MVP Winter 11
2 Christoph K 0.008686445 None
3 werewolf 0.007827351 MVP Winter 11
4 Andres G 0.007087102 MVP Winter 11
5 jcohen 0.006898484 None
6 TomaszO 0.006523066 None
7 ToddJanzen 0.005924399 SFDC
8 eyewellse 0.005483297 None
9 ErikM 0.005006845 None
10 chris925 0.004876349 None
TABLE IV. SFDC IDEAS COMMENT BETWEENNESS CENTRALITY TOP 10
#  User Name  Betweenness Centrality  Community Recognition
1 Alexander Sutherland 0.019771398 MVP Winter 11
2 Rhonda Ross 0.015190872 MVP Winter 11,12,13
3 Scott J 0.013384961 SFDC
4 Andres G 0.00886717 MVP Winter 11
5 Matthew Lamb 0.008623408 MVP Spring 11
6 AMartin 0.007343319 MVP Spring 11
7 Mattias Nordin 0.005807566 MVP Winter 11,12
8 mattybme1 0.005726696 MVP Winter 11,12,13
9 Christoph K. 0.00516802 None
10 Jakester 0.004668099 None
B. Evaluation of the Ranking
To evaluate the ranking of nodes, a measurement of the
firm’s evaluation of the importance of a user is useful.
Salesforce runs a community recognition program called the
MVP program where it periodically chooses members from
the community for their outstanding achievements and
recognizes them with virtual badges as MVPs. The
Salesforce.com website describes the program as “This
program recognizes exceptional individuals within the
Salesforce community for their leadership, knowledge, and
ongoing contributions. These individuals represent the spirit
of the community and what it is all about!”
In the result tables, the Community Recognition column
shows whether the individual has received any such
award. Contributors who are Salesforce employees are not
eligible for recognition; such members are marked as SFDC
in the tables. To evaluate the ranking approaches, the
MVP recognition of a user can be used as a qualitative
measure, i.e. to what extent can network prestige or centrality
be linked to the firm’s recognition of individual users.
If the firm's recognition of a community member's
contribution is the key criterion, then the betweenness measure
does much better than the community activity rank measure.
Seven of the top 10 users as ranked by the betweenness
measure are members that the firm (SFDC) has already
recognized publicly as MVPs, compared with three of the top 10
as ranked by community activity rank. This implies that
betweenness could potentially be used to identify key users who
have not yet been recognized. This measure could also be used
in a dynamic fashion (as the community grows) to identify
newer key users. It is interesting to note that the community
rank based approach did not perform as well as the
betweenness centrality measure. While the transfer of prestige
from one user to another through out-links has an intuitive
appeal, in this instance it did not perform as well empirically.
[15] performed a similar analysis on a Java question and
answer forum and reported similar findings: in online
expertise networks, PageRank derivatives did not outperform
simpler measures.
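The comparison can be made concrete by counting recognized users in each top 10 list. The snippet below transcribes the Community Recognition columns of Tables III and IV and is only a tally of those published rows; the helper name `mvp_count` and the decision to report SFDC employees separately are choices made for this illustration.

```python
# Community Recognition column, in rank order, transcribed from Table III
# (community activity rank) and Table IV (betweenness centrality).
activity_rank_top10 = ["MVP", "None", "MVP", "MVP", "None",
                       "None", "SFDC", "None", "None", "None"]
betweenness_top10 = ["MVP", "MVP", "SFDC", "MVP", "MVP",
                     "MVP", "MVP", "MVP", "None", "None"]

def mvp_count(recognitions):
    """Number of users in the list who hold an MVP award."""
    return sum(1 for r in recognitions if r == "MVP")

def eligible_count(recognitions):
    """Users eligible for MVP recognition (SFDC employees are not)."""
    return sum(1 for r in recognitions if r != "SFDC")

for name, column in [("activity rank", activity_rank_top10),
                     ("betweenness", betweenness_top10)]:
    print(f"{name}: {mvp_count(column)} MVPs "
          f"out of {eligible_count(column)} eligible users")
```

Each list contains one Salesforce employee, so the fraction of eligible users who are MVPs is 3 of 9 for community activity rank and 7 of 9 for betweenness.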
The results also pose interesting qualitative questions for
analysis. For instance, the user Jakester (number 10 as per
betweenness ranking) has suggested 26 ideas, of which 10
have been implemented by SFDC. He has also contributed 534
comments and 771 votes on ideas. It would be of interest to
understand the reasons in the decision making process of the
firm that led to him not being recognized. In a similar fashion,
it would be interesting to understand the motivational impact
of having been granted an MVP badge. While the analysis
covered in this paper did not evaluate this, it presents an
interesting avenue for further research.
Thus, betweenness centrality is a potential tool to answer
the first question posed at the start of this paper. In an actual
implementation scenario, this metric could be calculated in an
offline batch mode for analysis. [6] has proposed a fast way of
calculating betweenness centrality that could be used to
perform this calculation.
VI. COMPARING VOTING AND COMMENTING
The next key question is which of the two online
forum activities (voting and commenting) encourages a tighter,
more close-knit community to form. This question is tied to
what motivates users to engage and participate in innovation
forums with the firm. If the activity fosters intrinsic
motivational factors, then it is likely to be self-sustaining. [13]
note that in innovation communities a key motivating factor for
users is learning. In [9], Lakhani and von Hippel studied
the Apache open source community and report that in their
study "98% of the effort expended by information providers in
fact returns direct learning benefits to those providers".
To evaluate the voting activity against the commenting
activity, a measure of community quality is required. [10] uses
the notion of conductance as a measure of community quality.
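According to [10], if A is the adjacency matrix of the graph G
= (V, E), then the conductance of a set of nodes S ⊂ V is
\[ \phi(S) = \frac{\sum_{i \in S,\, j \notin S} A_{ij}}{\min\{A(S),\, A(\bar{S})\}} \]
where A(S) = \sum_{i \in S} \sum_{j \in V} A_{ij} and \bar{S} = V \setminus S.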
Conductance is a measure of the intra-community
connections versus the inter-community connections. The
lower the value of conductance, the better the quality of the
community i.e. the community is densely connected internally
and sparsely connected to the rest of the graph.
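A small worked example may help make the measure concrete. The sketch below computes conductance for an unweighted, undirected graph stored as an adjacency dict; the function name `conductance` and the toy graph are illustrative and are not taken from the SNAP toolkit.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an unweighted, undirected graph.

    adj: dict mapping node -> set of neighbours (symmetric).
    """
    subset = set(subset)
    # Edges leaving the subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    # Total degree inside the subset and inside its complement.
    volume_in = sum(len(adj[u]) for u in subset)
    volume_out = sum(len(adj[u]) for u in adj if u not in subset)
    smaller = min(volume_in, volume_out)
    if smaller == 0:
        raise ValueError("subset and its complement must both have edges")
    return cut / smaller

# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = conductance(two_triangles, {0, 1, 2})
```

One triangle has a single edge leaving it and a total degree of 7, giving a conductance of 1/7, which is low and therefore indicates a good community.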
[10] also introduces the notion of a community profile plot.
The Network Community Profile (NCP) plot characterizes the best
possible community over a range of size scales. In this plot, the
number of nodes in a community (community size) is plotted
on the x axis, and the conductance of the best possible community
of that size is plotted on the y axis. Both axes
are on a log scale. In real world networks, the value of
conductance decreases initially and then starts to increase. In
our analysis, the global minimum of the NCP plot is used as a
measure of the community formation tendencies of an activity
graph. A comparison of the community size at which the global
minimum occurred was used to draw conclusions on the
community formation characteristics of voting and
commenting activities.
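To make the definition concrete, the sketch below computes an exact profile by brute force on a toy graph: for each community size k it takes the minimum conductance over every node subset of that size. This enumeration is only feasible for very small graphs and is an illustration of the definition, not the approximation procedure used by tools for large networks; the function names are hypothetical and the graph is assumed to have no isolated nodes.

```python
from itertools import combinations

def subset_conductance(adj, subset, total_volume):
    """Conductance of a node subset in an unweighted, undirected graph."""
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    volume = sum(len(adj[u]) for u in subset)
    return cut / min(volume, total_volume - volume)

def community_profile(adj):
    """Map community size k -> lowest conductance over all subsets of size k.

    Sizes run up to half the graph, since a set and its complement
    have the same conductance.
    """
    nodes = list(adj)
    total_volume = sum(len(nbrs) for nbrs in adj.values())
    profile = {}
    for k in range(1, len(nodes) // 2 + 1):
        profile[k] = min(
            subset_conductance(adj, set(candidate), total_volume)
            for candidate in combinations(nodes, k)
        )
    return profile

# Two triangles (0, 1, 2) and (3, 4, 5) joined by the single edge 2-3.
toy_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
profile = community_profile(toy_graph)
best_size = min(profile, key=profile.get)
```

On this toy graph the profile falls from 1.0 at size 1 to 0.5 at size 2 and 1/7 at size 3, so the global minimum sits at the size of one triangle, which is the quantity compared between the two activity graphs.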
Using this approach, the activity graphs constructed from the
comment and voting data were treated as undirected graphs
and used to create separate network community profile plots.
The plots for the vote activity graph and the comment activity
graph are shown in Figures 5 and 6 respectively. The SNAP
[17] (Stanford Network Analysis Project) toolkit was used to
create these plots.
Both profile plots show the expected behavior of
initially decreasing conductance followed by increasing
conductance. This is to say that the quality of communities
increases with node count for a while and then starts to
degenerate. The vote activity graph reaches a community size
of 10 nodes when conductance is at the global minimum, while
for the comment activity graph the global minimum is found at a
community size of around 33 nodes. In other words,
in the comment activity graph the highest quality community
involved up to about 33 users, whilst in the vote activity
graph the best community comprised only 10 users.
Figure 5. SFDC Vote Graph NCP Profile Plot
Figure 6. SFDC Comment Graph NCP Profile Plot
This comparison suggests that commenting activity has a
higher community creation effect than voting activity. This is
to be expected as, psychologically, there are higher intrinsic
motivations and rewards (through the knowledge gained) for
engaging in discourse as opposed to merely voting on an idea.
While this analysis has been based on a single ideation
community, it shows the distinction between voting and
commenting activity in objective and measurable terms.
Further work is required to analyze other ideation networks to
understand if similar characteristics are observed there. This
result is also in line with [9] and [13], which have suggested that a
key motivating factor is learning through participation. Such
understanding will be important for designing suitable activity
features for the online user forums to be effective ideation
co-creation platforms.
VII. RELATED WORK
Similar analysis was performed on the SAP dataset and the
following results were obtained. The activity graph also
displayed power law distribution of node degree with an α of
2.57 and exhibited similar core-periphery structure. The size of
the core community obtained by the heuristic algorithm was
5.4% of the overall community but accounted for 20% of the
suggested ideas and 46% of the implemented ideas (only 4% of
all suggested ideas were implemented). Qualitative analysis of
the ranking also demonstrated that betweenness performed
better than the PageRank derived community activity rank.
Many analyses of online networks have used the notions of
node prestige to rank and evaluate participants. [7] used a
PageRank based approach to identify key users in online
communities. [14], [15] applied activity based ranking
techniques to the study of expertise in online question and
answer forums. In these forums, one user poses a question and
other users contribute answers to the posed question. [15]
obtained similar results where the PageRank derivatives of
node importance did not outperform simpler measures. In their
analysis, they found that “z_score” and “z_num”, simple
metrics derived from a node’s in and out degree, performed
best in their dataset. [8] used the notion of out-links as a means
of identifying rising stars in bibliography networks. The
intuition here is that the nodes in this network (namely
researchers) have prestige which they confer on others through
their co-authorship and collaboration. [4] analyzed the online
ideation community of DELL and concluded that past success
likely has detrimental effects on the productivity of new ideas.
While much work has been done on online communities, the
study of ideation in online communities is still evolving and
presents an opportunity for continued research.
VIII. CONCLUSIONS
In this paper, we have performed link analyses on the
online ideation communities of two software providers for
crowdsourcing new product features. We found that a large
proportion of the implemented ideas originated from a small core
community in the forums. To identify the key users for product
feature ideation, we found that betweenness centrality is a
better measure for user ranking than PageRank. We also found
that the community cohesion tendencies of commenting
activity were higher than those of voting activity. These findings
will be useful for designing such company-centric user forums
for effective co-creation of new product features.
A. Limitations
The analysis in this paper adopted a static approach to the
network activity. In reality, collaborations in online
communities weaken / strengthen over time. If two users
communicated on a certain task once, it doesn't necessarily
imply that the link remains active for their entire lifetime on the
community. This could potentially be handled by varying the
edge weight as a function of time. This is a potential area for
further research.
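One possible form of such a time-dependent weight, offered purely as a hypothetical illustration and not something evaluated in this paper, is exponential decay: each past interaction contributes less as it ages, halving after a chosen half-life. The function name `decayed_edge_weight`, the use of days as the time unit, and the default half-life are all assumptions of this sketch.

```python
def decayed_edge_weight(interaction_days, now, half_life_days=180.0):
    """Edge weight where each interaction's contribution halves every
    `half_life_days`.

    interaction_days: times of the interactions between two users,
                      expressed as day numbers.
    now:              the day at which the weight is evaluated.
    """
    return sum(
        0.5 ** ((now - day) / half_life_days)
        for day in interaction_days
        if day <= now  # ignore interactions that have not happened yet
    )

# One comment made today and one made exactly one half-life ago.
weight = decayed_edge_weight([0.0, 180.0], now=180.0)
```

With this weighting a pair of users who stopped interacting long ago drifts toward an edge weight of zero, while the static graph used in this paper would keep the full interaction count.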
The analysis of community formation required splitting the
community into sub-communities. Other approaches such as
those demonstrated by [18] could be used to measure
community quality of overlapping communities. These will be
evaluated in future work on the data set.
REFERENCES
[1] MK Poetz and Martin Schreier. The value of crowdsourcing: can users
really compete with professionals in generating new product ideas?
Journal of Product Innovation Management, 29(2):245-256, 2012.
[2] Dahlander, Linus, Lars Frederiksen, and Francesco Rullani. "Online
communities and open innovation." Industry and innovation 15.2 (2008):
115-123.
[3] Clauset, Aaron, Cosma Rohilla Shalizi, and Mark EJ Newman. "Power-
law distributions in empirical data." SIAM review 51.4 (2009): 661-703.
[4] B. Bayus. Crowdsourcing and individual creativity over time: the
detrimental effects of past success. Available at SSRN 1667101, 2010.
[5] Mathieu Bastian, Sebastien Heymann, and M Jacomy. Gephi: An open
source software for exploring and manipulating networks. In
International AAAI Conference on Weblogs and Social Media. Association for
the Advancement of Artificial Intelligence, 361-362, 2009.
[6] Ulrik Brandes. A faster algorithm for betweenness centrality. Journal of
Mathematical Sociology, 25(1994):163-177, 2001.
[7] Julia Heidemann, Mathias Klier, and Florian Probst. Identifying key
users in online social networks: A PageRank based approach.
Information Systems Journal, 4801(December):12-15, 2010.
[8] XL Li, C Foo, K Tew, and SK Ng. Searching for rising stars in
bibliography networks. In Database Systems for Advanced
Applications, pages 288-292, 2009.
[9] KR Lakhani and Eric Von Hippel. How open source software works: "free"
user-to-user assistance. Research policy, 32(July 2002):923-943, 2003.
[10] Leskovec, Jure, et al. "Community structure in large networks: Natural
cluster sizes and the absence of large well-defined clusters." Internet
Mathematics 6.1 (2009): 29-123.
[11] L Page, S Brin, R Motwani, and T Winograd. The PageRank citation
ranking: bringing order to the web. pages 1-17, 1999.
[12] E. Prandelli, M. Sawhney, and G. Verona. Collaborating with customers
to innovate: conceiving and marketing products in the networking age.
Edward Elgar Publishing, 2008.
[13] Anna Stahlbrost and Birgitta Bergvall-Kareborn. Exploring users
motivation in innovation communities. International Journal of
Entrepreneurship and Innovation Management, 14(4):298-314, 2011.
[14] KK Nam, MS Ackerman, and LA Adamic. Questions in, knowledge in?:
a study of naver's question answering community. Human Factors, pages
779-788, 2009.
[15] Jun Zhang, MS Ackerman, and L Adamic. Expertise networks in online
communities: structure and algorithms. Proceedings of the 16th
international conference on World Wide Web, pages 221-230, 2007.
[16] Freeman, Linton. "A set of measures of centrality based on
betweenness." Sociometry 40 (1977): 35-41.
[17] Stanford Network Analysis Project, http://snap.stanford.edu/index.html
[18] Palla, Gergely, Imre Derényi, Illés Farkas, and Tamás Vicsek.
"Uncovering the overlapping community structure of complex networks
in nature and society." Nature 435, no. 7043 (2005): 814-818.
