© 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1                                                                                    1



                   A Study of Internet RFC Authors using
                            NetDraw and yEd.
                                                    Olivier M. J. Crépin-Leblond, PhD.




  Abstract— The Internet is a very important yet extremely sophisticated aspect of modern life.
There has often been discussion in online forums about its origins. In particular, the community
feels that it is time to say “thank you” to those people who contributed to its design and evolution.
Some of the main contributors are already well known and recognized. This essay shows how to
use Social Network Analysis to identify the other significant contributors to this adventure. The
analysis rests on the main assumption that the Internet Engineering Task Force’s (IETF) 5000+
“Request For Comments” (RFCs) constitute the engineering basics for the Internet. Here, we use
novel methods to extract data from the RFCs using readily available software, and use a suite of
free downloadable software to draw several social maps of the RFC authors’ space. Our results
highlight recent techniques for social mappings & data analysis in complex interaction
environments such as large organizations and emerging bottom-up process governance circles
such as those considered for governing the Internet.

    Index Terms—NetDraw, Mage, yEd, RFC, Father, Internet, Social, Networking, IETF.




                                                            I. INTRODUCTION
  IN APRIL 1969, Dr. Steve Crocker, then at UCLA, published the first Request for Comments, RFC 1 [1],
entitled Host Software. The RFC repository, consisting of more than 5000 entries, remains one of the
“technical pillars” of the network of networks called the Internet. Once published, an RFC cannot be
modified. Many RFCs are therefore superseded (or made obsolete) as new ones replace them, but each
publication contributes to the overall Internet edifice. As mentioned on the RFC Editor Web page, “The
RFC (Request for Comments) series contains technical and organizational documents about the
Internet, including the technical specifications and policy documents produced by the Internet
Engineering Task Force (IETF).”[2].
   So who is the “Father of the Internet”? There is no single answer to this frequently posed question. Dr.
Leonard Kleinrock is credited with packet switching theory [3]. Dr. Joseph Licklider, with the concept
that computers could all be connected together into a giant network to talk to each other [4]. What about
Dr. Douglas Engelbart [5], inventor of the computer mouse? One of the most important advances in the
Internet’s development was the TCP communications protocol, developed in 1974 by Dr. Vinton Cerf
and Dr. Robert Kahn [6]. However, circa 1977 the “IP” in TCP/IP was split off from TCP at the
urging of Dr. Danny Cohen, Dr. David Reed and Jon Postel, to support real-time, unsequenced packet
streams. Furthermore, Dr. Robert Metcalfe is credited with co-inventing Ethernet [7], which today is the
basic physical communication standard in most wired networked computers. How do all these people
relate to each other?

   Draft manuscript completed December 5, 2008. Revised April 2009. Working Title: “Will the Real Father(s) please stand up?” This work was supported
in part by Global Information Highway Limited. The Author is with Global Information Highway Ltd, 7 Kensington Church Court, London, W8 4SP, UK.
(e-mail: ocl@gih.com)
   © 2008/2009 Olivier MJ Crépin-Leblond. All Rights Reserved. The Copyright for this paper rests with the Author but permission to freely distribute the
information contained within this publication is granted provided the source of the article is credited. Parts of this document may be reproduced in a
commercial publication ONLY if prior permission has been granted by the copyright holder.

  However, the Internet is not solely TCP/IP and Ethernet. A great number of services and other
protocols at each layer of the Internet model make this network of networks what it is today. It is
therefore likely that each protocol and component of today's Internet has several “fathers” (and
“mothers”). In fact, there are several thousand such contributors, both inside and outside the realm of
RFC space. Nevertheless, because their proposals are contained in the many RFCs, we decided to look
specifically at the Internet standards, RFCs and their authors, possibly the largest “family” of Internet
pioneers and contributors available.

  This essay serves to determine the most prolific authors/contributors to the RFC database and to
extract a social network of RFC authors in order to better understand their working relationships and
spheres of influence. It uses modern social network engineering tools to make the vast amounts of data
available to us today more easily understandable. It will also serve to highlight the shortcomings of such
a method, mainly caused by its restricted input data set consisting solely of the RFC database.

Why this research?
  By undertaking this research, we show the use of social networking topology modeling to elucidate
the workings of bottom-up processes promoted to construct at-large governance. We define a
methodology for such study and look forward to such an analysis being used in future organizational
processes involving large groups of participants. Finally we explore avenues to more fully comprehend
the change in social paradigm that the Internet brings to the traditional governing processes used in
non-Internet regimes.



                                 II. DATA COLLECTION METHOD

A. Collecting Data
  The source data of the study was loaded from the RFC Editor FTP site as the RFC Bibliographic
Listing (Created 09/08/2008)[8]. This has the advantage of respecting a set format which can be more
easily machine-readable than other RFC documents. This resulted in 5340 RFCs indexed.

B. Refining/Formatting Data
  Using data mining techniques to extract the names of authors and their interpersonal relationships
from the list of RFC authors forms a crucial part of the work.
  No purpose-built software was used for data mining: the data set was filtered in several stages using
text-processing techniques available to anyone familiar with standard Microsoft software.

  This consisted of importing the list of RFC authors as a text file into MS Word and reformatting the
text with even delimiters using the “replace” functions inherent in that software. The resulting file was
imported into an MS Excel table with each line corresponding to one RFC entry matching names of
authors, one name per column – a formatted table of authors working together. The most time-consuming
process was manually crosschecking the accuracy and synchronicity of the data, because erroneous
formatting in the original file generated errors. For example, missing punctuation delimiters in
the MS Word file caused names to land in the wrong columns. Intermediate stages included tables 52
columns across and 5 170 rows in height. This table was transformed (using cut/paste) into a linear
numbered referential X-Y listing of authors containing 10 735 entries.
  The file was imported into an MS Access database. Two cross-linking rules were set up. The first one
served to add up the total number of publications per author. The second one was used to add up the
number of publications of each pair of authors.
  The input included a table of 10 735 entries. The outputs consisted of a table of 3 480 entries for the
authors listing and a table of 17 266 pairing links. This constituted our network of authors.
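
  The two counting rules can be sketched in a few lines of Python. This is only an illustration of the
logic, not the MS Access rules actually used: the function name `count_authors_and_pairs` and the sample
rows are hypothetical, and each pair is counted once (unordered) rather than once per direction.

```python
from collections import Counter
from itertools import combinations

def count_authors_and_pairs(rfc_author_lists):
    """Return (publications per author, joint publications per author pair)."""
    per_author = Counter()
    per_pair = Counter()
    for authors in rfc_author_lists:
        unique = sorted(set(authors))  # ignore a name repeated within one RFC
        per_author.update(unique)
        # Each unordered pair of co-authors counts once per shared RFC.
        per_pair.update(combinations(unique, 2))
    return per_author, per_pair

# Hypothetical sample rows: one list of author names per RFC.
sample = [
    ["Postel_J.", "Reynolds_J."],
    ["Postel_J."],
    ["McCloghrie_K.", "Rose_M."],
    ["Postel_J.", "Reynolds_J."],
]
per_author, per_pair = count_authors_and_pairs(sample)
```

Sorting the names inside each RFC keeps every pair under a single key, so the same two authors are never
counted under two different orderings.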

  Cutting and pasting into a text file and adding the correct formatting code resulted in a file satisfying
the input “.vna” format required for the NetDraw Software. The format is human-readable and therefore
easy to generate manually or automatically, without being a proprietary binary file format. It is shown
next.


       *Node data
       ID, publi
       Postel_J. 205
       McCloghrie_K. 92
       Rose_M. 75
       Rekhter_Y. 69
       Reynolds_J. 64
       Schulzrinne_H. 62
       McKenzie_A. 60
       Braden_R. 51
       Crocker_D. 51
       […]
       *Tie data
       from to intensity
       Postel_J. Reynolds_J.            37
       Reynolds_J. Postel_J.            37
       McCloghrie_K. Rose_M.            26
       Rose_M. McCloghrie_K.            26
       […]



  ID is the author’s name; publi is a variable denoting the number of publications; intensity is the
number of publications for the author pair. Obviously, this collaboration is reciprocal so it is
automatically shown going both ways. “[…]” denotes all further entries.
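
  A file in this layout can also be generated automatically. The sketch below assumes only the layout
shown in the listing above (a node section followed by a tie section, with each tie written in both
directions); the function name `to_vna` and the two-author sample are hypothetical, and any further
options of the .vna format are not covered.

```python
def to_vna(per_author, per_pair):
    """Build text in the layout of the listing above (node data, then tie data)."""
    lines = ["*Node data", "ID, publi"]
    # Most prolific authors first, as in the listing.
    for name, count in sorted(per_author.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{name} {count}")
    lines += ["*Tie data", "from to intensity"]
    for (a, b), n in sorted(per_pair.items(), key=lambda kv: (-kv[1], kv[0])):
        # Collaboration is reciprocal, so each tie is written in both directions.
        lines.append(f"{a} {b} {n}")
        lines.append(f"{b} {a} {n}")
    return "\n".join(lines) + "\n"

vna_text = to_vna(
    {"Postel_J.": 205, "Reynolds_J.": 64},
    {("Postel_J.", "Reynolds_J."): 37},
)
```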

  The data mining mechanism defines the data which is made available for the NetDraw software to
analyze and plot. Different data sets can be designed for different purposes and the stage of information
collection and data mining is therefore crucial in relation to the targeted end results.



                                     III. GLOSSARY OF TERMS
  In order to analyze a social network, we start by looking at each individual.
  In the field of bottom-up analysis, all networks are composed of groups (or sub-graphs). When two
participants have a tie, they form a "group". One approach to thinking about the group structure of a
network begins with this most basic group, and seeks to see how far this kind of close relationship can
be extended. This is a useful method, because sometimes more complex social structures evolve, or
emerge, from very simple ones, and this is the type of hidden information which we are hoping to detect
when analyzing the network.
  Social networking analysis relies on graph theory, a discipline which has been traditionally
mathematical in nature. Because each discipline speaks a particular language, it is important to define a
restricted number of terms which will be used at length in this essay.

  For the sake of easy referral, those terms are presented here, taking into account the context of our
analysis. In general, different terms sometimes have the same meaning depending on the context
(bibliographical, scientific, geographic, mathematical, etc.). Their equivalency is shown here.

   A “node”, also referred to mathematically as a “vertex” (plural: “vertices”), is a point representing a
single RFC author. In NetDraw, this is also called a “symbol”. In the paragraph above, we referred to a
node as a “participant” or an “individual”. In order to reduce confusion, we use only “node” and
“author”.
   When two or more nodes (RFC authors) work together on an RFC, they are linked by a “line”. A line
therefore ends at nodes. In NetDraw, this is also called a “link”. A mathematical designation of a link is
an “edge”. All three terms will be used in this essay.
   A “graph” is the set of nodes and set of lines between pairs of nodes, as visualized on a 2 or 3
dimensional plane.
   A “network” consists of a graph and additional information on the nodes or the lines of the graph.
This is effectively what we are building with NetDraw.
   A “cluster” is a group of 2 or more nodes connected together.
   A “clique” is a maximal complete sub-network containing 3 nodes or more. It is a specific form of
cluster. In graph theory, this sub-set of a network contains nodes which are more closely and intensely
tied to one another than they are to other members of the network. Strictly speaking a group is identified
as a clique when every node is directly linked to every other node in the group.
   A “dyad” is the smallest grouping of nodes, that is, two nodes linked together.
   “Betweenness” is defined as the degree a node lies between other nodes in the network. In effect, it is
an intermediary, also known as a bridge or a liaison. Therefore, it is the number of other nodes it links
directly or indirectly together through its own links. The degree of betweenness is important in a social
network because it defines the nodes connecting sometimes vastly different groups together.
   “Closeness” is defined as the degree a node is near all other nodes in a network (directly or
indirectly). Thus, closeness is the inverse of the sum of the shortest distances between each node and
every other node in the network. A node with a high degree of closeness is more “central” to the
network than one with lower closeness.
   “Pendants” are nodes connected to the rest of the network through a single link.
   “Isolates” are nodes which are not connected to any other node in the network. In our case, this is an
author having published all of his or her RFCs solo.
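
  Several of these terms translate directly into simple computations. The sketch below is illustrative
only: the five-node sample network and all function names are hypothetical, and closeness is computed
over the nodes a given node can actually reach, which is an assumption since the definition above does
not say how unreachable nodes are treated.

```python
from collections import deque

def build_adjacency(nodes, links):
    """Map each node to the set of nodes it is directly linked to."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def geodesic_distances(adj, start):
    """Breadth-first search: links on the shortest path to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness(adj, node):
    """Inverse of the summed shortest distances to the nodes reachable from `node`."""
    total = sum(geodesic_distances(adj, node).values())
    return 1.0 / total if total else 0.0

nodes = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "C"), ("B", "D")]
adj = build_adjacency(nodes, links)
isolates = sorted(n for n in adj if len(adj[n]) == 0)  # no links at all
pendants = sorted(n for n in adj if len(adj[n]) == 1)  # exactly one link
```

In this sample, B is linked to every other connected node and therefore has the highest closeness, while
E has no links and is an isolate.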

                                        IV. PLOT AND ANALYSIS
   NetDraw [9] is a network visualization program that can be downloaded from the Internet for free. Its
license agreement allows it to be freely copied. A set of analytical protocols is available to extract
meaning from the data. The algorithms included in the software are used in social network analysis,
micro-molecular analysis, physics as used in astronomy, and other disciplines. In this section, results
will be presented for several types of analysis.


A. Circle Layout
   1) Method/Theory
  The Circle Layout uses a simple algorithm to plot nodes in a geographic circle. In NetDraw, it is
possible to define the order of the nodes around the circle to be alphabetic or depending on the number
of RFCs published.
  The best connected nodes are found by simply looking at the concentration of links and their
thickness. User intervention is however required to detect pendants since these are also plotted within
the circle and are not immediately discernable.
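
  The placement itself amounts to spacing the nodes at equal angles. The following is a minimal sketch
of that idea, not NetDraw's implementation; the function name `circle_layout`, the unit radius and the
four sample authors are assumptions made for illustration.

```python
import math

def circle_layout(names, radius=1.0):
    """Place nodes at equal angles around a circle, in the order given."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }

authors = ["Rose_M.", "Postel_J.", "Braden_R.", "Crocker_D."]
positions = circle_layout(sorted(authors))  # alphabetical ordering
```

Ordering by number of RFCs instead would only change the list passed in, for example by sorting on a
publication count rather than on the name.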
   2) Results
  A graph of the resulting plot is shown in Figure 1.




  Figure 1: Network nodes plot using the Circle Layout (Authors having published 20+ RFCs)


The parameters used for the plot were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   •   Link thickness between two nodes according to number of publications the authors have written
       together

   There is a concentration of links around Jon Postel. This is to be expected since, as RFC editor for
many years, his contribution to the RFC process vastly exceeded any other author’s contribution. Other
link concentrations can also be easily discerned, directly related to the closeness of each author.
   A dyad, a few isolates, as well as several pendants are visible. The order around the circle is set
automatically by the program using a parameter which is user-chosen, in this case alphabetically.
Another straightforward parameter which could be chosen for this function is the number of links to
other nodes. Nonetheless, neither parameter avoids the pitfalls that the software falls into and which
require a human eye to reorganize:
      • The dyad, pendants & isolates had to be extracted manually from the circle’s layout;
      • Nodes are not arranged in an order which reduces link distance. For example, Malkin_G is
          connected to Reynolds_J and Baker_F but is geographically located on the other side of the
          circle, thus adding to a possibly false impression of extensive inter-connection between nodes.

   3) Scaling up
  Loosening the data subset constraints of 20 RFC publications per author brings more nodes in the
picture. The restricted data results may show no connection between the isolates and the main group –
this may only be so due to the constraints used. In fact, they may connect to the main group via other
authors who do not satisfy the sample’s constraints but have a high degree of betweenness.
  Reducing the constraint by selecting authors of 10 RFCs or more (171 authors), reveals an increase in
mesh density within the network. This is shown in Figure 2.




Figure 2: Network nodes plot using the Circle Layout (Authors having published 10+ RFCs)

  Removing the subset constraints altogether shows the overall graph shape of the network, including
all 3 480 authors, as shown in Figure 3. It is clear that RFC authors are well connected together and that
the RFC process provides a real sense of community.




Figure 3: Network nodes plot using the Circle Layout (all RFC authors)




   4) Conclusions
  The Circle layout utilizes a simple algorithm to display nodes in a geographic circle. Its advantages
are reduced computing processing power requirements and a display giving the eye a clear sense of
cross-group connectivity. Its weaknesses are individual anomalies such as the ill-placing of isolates,
dyads and wrong placement of nodes which are connected to a reduced number of other nodes. The
algorithm does not take into account the geographical positioning of nodes according to their links to
other nodes.
  Both weaknesses can be corrected by human intervention. As a result, the algorithm is very useful for
displaying social interaction between the authors of RFCs and detecting some of the synergies that
originated in building the RFC standards.


B. Multi-Dimensional Scaling (MDS) Analysis
   1) Method/Theory
  Multi-Dimensional Scaling (MDS) Analysis [10] comprises a set of statistical techniques used together
to visualize data in an N-dimensional plane. The MDS Algorithm looks at similarities within the data
and assigns a location to each node of the input network. This algorithm is particularly suited for 3D
visualization.
  MDS is not so much an exact procedure as a way to "rearrange" nodes in an efficient manner,
so as to arrive at a configuration that best approximates the link structure.
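
  How well a configuration approximates a set of target dissimilarities can be summarised in a single
number. The sketch below assumes a normalised "stress" measure of the kind commonly used with MDS; it
does not perform the scaling itself and is not taken from NetDraw. The function name `stress`, the
coordinates and the target values are hypothetical.

```python
import math
from itertools import combinations

def stress(positions, target):
    """Normalised mismatch between plotted distances and target dissimilarities."""
    num = 0.0
    den = 0.0
    for a, b in combinations(sorted(positions), 2):
        plotted = math.dist(positions[a], positions[b])
        wanted = target[(a, b)]
        num += (plotted - wanted) ** 2
        den += wanted ** 2
    return math.sqrt(num / den)

# Hypothetical target dissimilarities between three nodes.
target = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): math.sqrt(2)}
exact = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (0, 1, 0)}        # matches the targets
squeezed = {"A": (0, 0, 0), "B": (0.5, 0, 0), "C": (0, 0.5, 0)}  # every distance halved
```

A value of zero means the plotted distances reproduce the targets exactly; an MDS procedure moves the
nodes so as to drive this kind of mismatch down.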

   2) Results & tri-dimensional MAGE Plot
  The parameters used for the plot were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  MDS analysis yields poor results when plotted in 2 dimensions because the nodes overlap each other,
thus making the graph illegible. However, it is possible to view a 3D graph by exporting the graph
data (in Kinetic Image .kin format) to a separate (free) 3-dimensional rendering program named
MAGE [11].
  MAGE is used for all sorts of 3-dimensional rendering such as molecular chemistry and physics,
biology, mathematical analysis and even archeological modeling. NetDraw can export data to a Kinetic
Image format, which makes it suitable for displaying the network in 3D, as seen in Figure 4.




  Figure 4: MAGE visualisation of authors of 20+ RFCs


  The overall structure consists of a main cluster of nodes and several isolates. Pendants are also clearly
discernable. Nodes at the center of the cluster can be clearly seen as being more connected.
  An important feature of MAGE is the ability to rotate the structure taking any node as an axis.
Zooming in/out is also possible. Rotation is a particularly important cognitive process for the brain to
understand 3D structures, although we are only using a subset of the features of MAGE.

  The zoom feature is illustrated in Figure 5 which shows a clique within the network structure. This
shows a working group of authors who wrote several RFCs together. Whilst it does not mean that all
authors were present in each RFC, it shows extended collaboration between the authors represented by
the nodes.




Figure 5: Zooming in on a cluster within a Mage Plot of Multi-dimensional Analysis


Once zoomed-in, rotating the structure around the central cluster’s node is also possible and yields good
results.

   3) Scaling up
   Multi-Dimensional Scaling analysis is a processor- and memory-intensive method since its results are
best represented in 3 dimensions. A test run was undertaken by selecting authors having published at
least 10 RFCs. This brought the number of authors up to 171. MDS cluster analysis, although
demanding much processing power, gave poor results, even when plotted using MAGE. The cause was
traced to tight clustering of the nodes, thus requiring parameters in MAGE to be tweaked to omit
displaying the nodes. This resulted in a diagram showing the links only – a wire frame of the whole
structure which required maximum zooming in to be displayed. The resulting view was very unclear.

  Scaling up the MDS analysis with an input data set from the initial 64 nodes to hundreds or even
thousands of nodes, requires more computing power and several Gigabytes of memory. Insufficient
memory triggers buffer overflows which crash the software. Future versions of the software might avoid
this condition although increasing node count increases complexity exponentially.

  4) Conclusions
  Multi-Dimensional Scaling analysis is useful in displaying the network in three dimensions. The
NetDraw feature to transfer the results to MAGE (through a .kin export file) is very useful to plot the
network in true 3D, including changing the position of lights as well as visualizing the network from
any angle and traveling virtually through it. Node clusters can be visualized with ease. However, some
information is lost, for example thickness of link or size of node. It is hoped that future versions of
NetDraw and Mage will incorporate those features to make the visualization an enhanced experience.


C. Geodesic Distance through Spring Embedding
   1) Method/Theory
  The Spring Embedding method is based on the geometric theory of gravitation [12], although
constrained to the 2-dimensional plane, hence the crowded display. Each node is considered to be acting
on the other nodes through attraction and repulsion, and the links between the nodes are taken as springs
enabling the nodes to travel. This iterative method places nodes on the plane and eventually reaches a
stable state, provided enough iterations are calculated.
   The “geodesic distance” is the length of the shortest path between two nodes. If node x is linked to
node y, which is linked to node z, and this is the shortest route between x and z, then the geodesic
distance from node x to node z is the sum of the geodesic distances from x to y and from y to z.
In the context of social networking, this enables us to analyze the “networking extent”
of an individual based on his or her number of connections. In other words, how well are they connected
to the rest of the network? This is the concept of “centrality”, also referred to as “closeness” and
described earlier in the glossary.

  The constraints of the layout criteria, whilst introducing some error margin, included “node repulsion”
and “equal edge lengths”.
  Node repulsion introduces a minimum distance between nodes displayed on the graph and is required
to avoid a clustering of the nodes to the extent that the overall diagram would be unreadable.
  Equal edge lengths is self-explanatory and serves to constrain the length of the links between the
nodes in order to provide some space within the graph. It does not mean that all links will have the same
lengths: the program will just try to make the lengths as similar as possible. Both constraints were used
specifically to improve the readability of the graph.
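
  The iterative principle can be illustrated with a small force-directed layout. This is a generic
sketch in the spirit of the description above and is not NetDraw's algorithm: the force formulas, the
preferred link length `k`, the cooling schedule, the fixed random seed and the function name
`spring_layout` are all assumptions.

```python
import math
import random

def spring_layout(nodes, links, iterations=200, seed=0):
    """Tiny force-directed layout: all pairs repel, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))           # preferred link length (assumption)
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d               # node repulsion
                for n, sign in ((a, 1), (b, -1)):
                    disp[n][0] += sign * push * dx / d
                    disp[n][1] += sign * push * dy / d
        for a, b in links:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k                   # spring attraction along the link
            for n, sign in ((a, -1), (b, 1)):
                disp[n][0] += sign * pull * dx / d
                disp[n][1] += sign * pull * dy / d
        limit = 0.1 * (1 - step / iterations)  # cooling: smaller moves over time
        for n in nodes:
            d = math.hypot(*disp[n]) or 1e-9
            move = min(d, limit)
            pos[n][0] += disp[n][0] / d * move
            pos[n][1] += disp[n][1] / d * move
    return pos

# Hypothetical four-node chain; `nodes` must be a list.
layout = spring_layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D")])
```

Because every link pulls towards the same preferred length, link lengths tend to even out, which loosely
mirrors the “equal edge lengths” constraint; the repulsion term plays the role of “node repulsion”.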

  Because the analysis is iterative, separate runs do not yield geometrically identical layouts,
although the layouts produced are very similar in shape and
geometric positioning. The structures and clusters are the same.
  This type of plot is readily available in the NetDraw software.

   2) Results
  The parameters used for the plot were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together
   • Layout using Spring Embedding iterative simulation
   • Number of iterations: 100 to 1 billion

  Since this is an iterative analysis, increasing the number of iterations should improve on the
“accuracy” of the results. In fact, repeating the analysis from 100 iterations in regular steps to
1,000,000,000 iterations showed no significant difference to the layout. The resulting graph is shown in
Figure 6.

  With each node representing an individual researcher, those located closer to the center
of the diagram act as bridges between various groups of researchers. It is also possible to easily see
clusters of nodes which are well interlinked together. Cliques are clearly visible, and clusters including
thicker link width indicate more extensive collaboration between a number of authors.


  Attempting to export the .kin data to a MAGE 3D plot yielded results which did not appear as
conclusive as the MDS analysis due to the high cluster concentration of nodes – the export process
shortened the link length to such an extent that the viewing of the cluster was affected.




Figure 6: Graph of Spring Embedding using Geodesic Distances, Node Repulsion and Equal Edge Lengths
(64 authors having published 20+ RFCs). 100 Million iterations.

   3) Scaling up
  Scaling up and running the simulation under the constraint that authors have published 10 or more RFCs brings
the total number of nodes to 171. Sadly, the clutter caused by more nodes makes the resulting graph less
useful than for a more restricted input set. It is possible to discern the largest nodes, but smaller nodes
are seen with difficulty. (Figure 7)




Figure 7: Graph of Spring Embedding using Geodesic Distances, Node Repulsion and Equal Edge Lengths
(171 authors having published 10+ RFCs)



The new authors join the whole network with several pendants, very few isolates and only one dyad.
With all 3000+ authors included, the network becomes difficult to interpret due to lack of space.

   4) Conclusions
  The advantage of Geodesic Distance analysis using node repulsion is that of providing results which
are easily displayed in two dimensions. Since the analysis is based on an iterative process, the
computing power required for such an analysis can be user-selected. Lower iteration values yield results
which are slightly more unstable in geometric placement of the nodes. Cliques, clusters, isolates and
other features of the network can be clearly identified and reliable conclusions can be derived about the
centrality of an individual thanks to his or her final geometric location within the resulting “network of
people”.


D. K-Core Analysis
   1) Method/Theory
   K-Core analysis is based on the clustering of groups of people who are closely connected together. It
is a way to study the nested structure of a modular organization. The K-Core of a network is the
maximal sub-network in which every node has degree at least k. For example, the 1-core is the
original network without its isolates; the 2-core is what remains once pendants have been removed
repeatedly until none are left, etc. Increasing k removes
links and nodes which are less closely connected to the network.
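
  A k-core can be computed by repeatedly discarding nodes that have fewer than k links until no such
node remains. The sketch below illustrates this peeling idea on a hypothetical five-node network; the
function name `k_core` and the sample data are assumptions and do not reflect NetDraw's implementation.

```python
def k_core(adj, k):
    """Return the nodes of the k-core: repeatedly drop nodes with fewer than k links."""
    remaining = {n: set(neigh) for n, neigh in adj.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, neigh in remaining.items() if len(neigh) < k]:
            # Removing a node also removes its links from its neighbours.
            for neighbour in remaining.pop(node):
                remaining[neighbour].discard(node)
            changed = True
    return set(remaining)

# Hypothetical network: a triangle A-B-C, a pendant D attached to C, an isolate E.
sample_adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}
core_1 = k_core(sample_adj, 1)
core_2 = k_core(sample_adj, 2)
core_3 = k_core(sample_adj, 3)
```

Here the pendant D disappears at k=2, and nothing survives at k=3: C starts with three links but loses
them as its neighbours are peeled away.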

   2) Results
  The parameters used for the plot were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  Four distinct groups of people are established: three main groups, and one group of authors that
published solo.

  It can be seen that clustering is caused by “similarity” data. As expected from the algorithm, the
defining factor for the clustering is the number of links originating at each node. This in itself is a
limitation. When performing K-Core analysis, the resulting groups show inconsistencies.

  Pendants and nodes connected to 2 groups with a single link to those groups, or to 2 nodes in the same
group, are defined as a separate group. This, of course, is a correct representation of K-Cores, but of no
use for our purpose of organizing the groups visually.
  Manual translation of these nodes into the correct groups was therefore required and the resulting
graph is shown in Figure 8.
  Dyads do not fare well either in K-Cores since they are not connected to the main group. Pendants
also need to be translated since they are not seen by the software as having integrated well with any of
the cliques present, although in real life, a pendant would probably benefit well from the clique through
the node to which it was linked.

   3) Scaling up

  Scaling up, running the simulation under the constraint that authors have published 10 or more RFCs, brings the
total number of nodes to 171. Whilst the overall graph including all Ks is too crowded, it is possible to
run a different type of K-Core analysis, by selecting only groups with specific value for K. This selects
the nodes having a specific closeness or better.




Figure 8: K-Core Analysis of 64 authors having published 20+ RFCs


  Since the network is divided into six groups, (Group 1, Group 2, Group 3, Group 4, Pendants,
Isolates), the value of k can be selected to be any number from 1 to 6. k=0 selects the isolates. k=1, the
pendants, k=2, the nodes having 2 links etc.
  Selecting nodes with k=6 and plotting them using Spring Embedding with Geodesic Distances, Node
Repulsion and Equal Edge Lengths, it is possible to display the most tightly connected nodes in the
network. These 15 nodes might not be the most central, but form the highest clique in the overall graph.
This is shown in Figure 9.
  In another run, a value of k=5 was selected thus incorporating more nodes in the graph, as shown in
Figure 10. The network obtained is the core network upon which most other nodes will link to.
  In the real world, and in non-technical language, the 64 authors shown in this graph are the “pillars of the
community” in that they have published in excess of 10 RFCs and have also networked extensively with
their peers. Some authors might have published more RFCs than them, but their network might not have
been as wide-ranging.
   4) Conclusions
  Whilst K-Core analysis might appear, on first use, not to yield meaningful results, this is countered by
its usefulness in finding the nodes with the highest closeness within our target group. By performing
K-Core analysis and displaying the results grouped according to the K-Core criteria, it is possible to
see how many nodes of each type are present in the network. Displaying the results using the Spring
Embedding GeoDesic Layout shows who the most socially connected authors in our network are. The
mixing and matching of parameters (constraints on the number of RFCs published, k value, display and
grouping methods) can reveal more interesting facts about the social network than first meet the eye.




Figure 9: K-Core Analysis K=6 of authors having published 10+ RFCs / Spring Embedding GeoDesic Layout




Figure 10: K-Core Analysis K=5 of authors having published 10+ RFCs / Spring Embedding GeoDesic Layout

E. Blocks & Cutpoints
   In this analysis, the software checks for nodes that will specifically cut parts of the overall network off
if they were to be removed from the structure.

   1) Method/Theory and results
  The parameters used in our analysis were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  Results using this method are not useful in our case: the subset of authors selected has worked
extensively together since it is really composed of the core of our network of 3 480 authors. As a result,
the overall network of nodes features enough redundancy that there is no single “point of failure”, i.e.
“cutpoint”, except for pendants. Since this can be established visually, there is no requirement to run
the analysis and plot results.
  However, this type of analysis would be useful in more loosely-connected communities because it
tags the nodes which are essential in linking disparate clusters which would otherwise have been
unconnected.
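
  For readers who wish to check for cutpoints outside NetDraw, the idea can be sketched with the
classic depth-first search for articulation points. This is a generic textbook method offered as an
illustration, not NetDraw's implementation; the function name and the example graph (two triangles
joined by a single link) are hypothetical.

```python
def cutpoints(adj):
    """Return the set of cutpoints of an undirected graph {node: set(neighbours)}."""
    order = {}   # discovery index of each visited node
    low = {}     # lowest discovery index reachable from the node's search subtree
    found = set()
    counter = [0]

    def visit(node, parent):
        order[node] = low[node] = counter[0]
        counter[0] += 1
        children = 0
        for nb in adj[node]:
            if nb == parent:
                continue
            if nb in order:
                # Back link to an already visited node.
                low[node] = min(low[node], order[nb])
            else:
                children += 1
                visit(nb, node)
                low[node] = min(low[node], low[nb])
                # The subtree under nb cannot reach above node: node is a cutpoint.
                if parent is not None and low[nb] >= order[node]:
                    found.add(node)
        # The root of the search is a cutpoint only if it has several subtrees.
        if parent is None and children > 1:
            found.add(node)

    for start in adj:
        if start not in order:
            visit(start, None)
    return found


# Hypothetical example: triangles a-b-c and d-e-f joined by the link c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(cutpoints(two_triangles)))  # ['c', 'd']
```

The recursive search is adequate for a subset of 64 or 171 authors; a very large network would need an iterative version to avoid Python's recursion limit.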

   2) Scaling up
  As the constraints on the RFC authors are eased by allowing authors having published fewer than 20
RFCs in the network, it is possible to discover where the cutpoints are to these other authors. This could
determine which of the core authors bring connectivity between the core network of authors and the rest
of the RFC community. However, in the case of RFCs, the network is too closely connected to be
affected by cutpoints.

   3) Conclusions
  “Blocks and cutpoint” analysis is useful in examining loosely-connected networks.

  This type of analysis yields ambiguous results when used on closely-connected networks such as the
network of RFC authors since the only critically connected components of the network are pendants,
and those are easily detected by eye.

  It is worth noting that this type of analysis can be combined with any of the above analyses since the
tagging of blocks and cutpoints can be undertaken by changing node colors and shapes. Sometimes, a
new network layout can enhance readability whilst keeping block and cutpoint tagging active.


F. Factions
  A “faction” is a group or clique within a larger organization, or the like. In graph theory, a "faction" is
a part of a graph in which the nodes are more tightly connected to one another than they are to members
of other “factions”.

  The NetDraw program can iteratively determine the most appropriate division of the network using a
“factioning” algorithm. It is worth comparing this analysis with K-Core data which is based on similar
principles of local clustering or sub-structure.
   1) Method/Theory
  The algorithm is different from the K-Core algorithm in that NetDraw actually asks how many
factions should be created. The algorithm then forms the desired number of groups by seeking to
maximize connections within the groups and minimize connections between them. Nodes are colored, and
the information about which nodes fall in which partitions (i.e. which cases are in which factions) is
saved to the node attributes database.
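
  The principle of maximizing connections within factions and minimizing connections between them can
be sketched as a cost to be reduced: one point for every missing link inside a faction and one point
for every link between factions. The search strategy used by NetDraw is not described here, so the
simple hill-climbing with random restarts below is a stand-in, and all names and parameters are
hypothetical.

```python
import itertools
import random


def faction_cost(adj, group):
    """Count links missing inside factions plus links present between factions."""
    cost = 0
    for u, v in itertools.combinations(sorted(adj), 2):
        linked = v in adj[u]
        same_faction = group[u] == group[v]
        if linked != same_faction:
            cost += 1
    return cost


def factions(adj, n_factions, seed=0, restarts=20):
    """Return (assignment, cost) found by moving one node at a time to a better faction."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best_group, best_cost = None, None
    for _ in range(restarts):
        group = {n: rng.randrange(n_factions) for n in nodes}
        cost = faction_cost(adj, group)
        improved = True
        while improved:
            improved = False
            for n in nodes:
                keep = group[n]
                for g in range(n_factions):
                    if g == keep:
                        continue
                    group[n] = g
                    trial = faction_cost(adj, group)
                    if trial < cost:
                        cost, keep, improved = trial, g, True
                group[n] = keep  # settle on the best faction found for this node
        if best_cost is None or cost < best_cost:
            best_group, best_cost = dict(group), cost
    return best_group, best_cost


# Hypothetical example: triangles a-b-c and d-e-f joined by the link c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
assignment, cost = factions(two_triangles, n_factions=2)
print(assignment, cost)
```

Because the search starts from random assignments, different seeds may return different but equally good divisions, which is consistent with the observation that there is no single correct answer.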
   2) Results
  The parameters used in our analysis were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  In our example, expanding the K-Core analysis described earlier, it was assumed that we could
initially divide the network into 5 factions. This is shown in Figure 11. It is then possible to explore
further faction division by increasing the parameter for the number of clusters required. This yields
sometimes peculiar layouts, shown in Figure 12.




Figure 11: 5 factions of authors having published 20+ RFCs / Layout, node color & shape, according to factions




Figure 12: 10 factions of authors having published 20+ RFCs / Layout, node color & shape, according to analysis
when dividing into 5 factions. Note which factions have been divided – hence which are the weaker factions


  There appears to be no single correct or incorrect “answer” using the faction algorithm. There is just a
measure of the faithfulness of a node to a cluster depending on its connection to one, two, or more
groups.




Figure 13: 5 factions of authors having published 20+ RFCs / Layout, node color & shape, according to analysis
when dividing into 10 factions. Note which nodes have been grouped.

   It is therefore possible to determine which factions are more strongly linked together and which are
likely to break apart when circumstances change. It is also possible to see which new factions are likely
to be created.
   The algorithm can be used in the other direction. For example, it is possible to start with a larger
number of factions, and reduce the number of factions, with groups merging together. It is interesting to
see how there is no homogeneous gathering of all nodes when the number of factions is reduced. An
example is shown in Figure 13.
   Another oddity is the grouping of nodes which are not connected to each other. Rather than grouping
them because of their inter-connectivity, the algorithm groups them because they do not fit in any
other faction.

  The results can be displayed not only as a layout rendered by this algorithm, but also as another
layout, such as K-Cores, Circles, etc. This introduces interesting differences since some nodes which
might be part of one cluster during faction analysis, might be part of another group during K-Core
analysis grouped layout.

   3) Scaling up
   Scaling up, running the simulation under constraint that authors publish 10 or more RFCs, brings the
total number of authors to 171. Reading individual node labels is impossible at this density. However, it
is possible to remove labels and perform macro-analysis.
   For example, it is possible to divide the network into 10 factions and assign a color and shape to each
faction, then group the factions together by reducing the network to 5 factions. Some factions do not
wholly group with a single other faction but sometimes distribute their nodes among the other factions
according to the affinity each node had with other nodes in other factions. This makes for interesting
analysis in real world social grouping, for example in electoral processes.
This is shown in Figure 14.




Figure 14: 10 factions above are grouped into 5 factions below. It is possible to see how some clusters splintered
among several factions. Network subset: authors having published 20+ RFCs




  4) Conclusions
  The “Factions” feature in NetDraw is useful to group authors into clusters and detect those authors
having an affinity to another group when dividing the network into a different number of factions. This
type of analysis is sometimes more conclusive when a network is more loosely interconnected than in
our example making use of a restricted number of authors which are very closely related to each other.
This analysis is also useful when grouping clusters according to affinity. Our example shows that the
grouping of clusters is not one that takes place wholly and evenly since some factions divide themselves
among the remaining clusters.
  As with any social network analysis, care must be taken not to jump to conclusions on first
examination because oddities might appear in the clustering process. These are caused by a lack of fit
within any other group, rather than by similarity or good connectivity within the group itself.



G. Girvan-Newman algorithm

  The Girvan-Newman algorithm is one of the methods used to detect communities in complex systems.
In fact, the theory developed by Girvan and Newman [13] defined communities as not being quite the
same thing as clusters.



   1) Method/Theory
  A “community” is a cluster of nodes where the inter-relationship between nodes is high through a
high concentration of links. A clique would fit this description but a community is not restricted to a
clique. What defines the community from the cluster is that the links to nodes in other communities are
specifically less dense, whilst clusters do not take this into account.

Without going into details about this algorithm, its basic function is as follows:

   1.   Calculate the betweenness of all existing links in the network;
   2.   Remove links with the highest betweenness;
   3.   Recalculate betweenness of all links affected by the removal;
   4.   Repeat steps 2 and 3 until no links remain.
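
The four steps above can be sketched as follows for an unweighted network. This is an illustrative
reading of the steps, not the NetDraw implementation: it removes one link at a time, recalculates
betweenness for the whole network rather than only the affected links, and stops once a requested
number of communities has appeared. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every link in an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist = {source: 0}
        paths = {source: 1.0}          # number of shortest paths from source
        preds = {n: [] for n in adj}   # predecessors on shortest paths
        visited = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            visited.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    paths[nb] = 0.0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    preds[nb].append(node)
        credit = {n: 0.0 for n in visited}
        for node in reversed(visited):
            for p in preds[node]:
                share = paths[p] / paths[node] * (1.0 + credit[node])
                score[frozenset((p, node))] += share
                credit[p] += share
    # Every pair of nodes was counted once from each end.
    return {edge: s / 2.0 for edge, s in score.items()}


def components(adj):
    """Return the connected groups of nodes as a list of sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        result.append(group)
    return result


def girvan_newman(adj, n_communities):
    """Remove highest-betweenness links until n_communities groups exist."""
    adj = {n: set(nb) for n, nb in adj.items()}  # work on a copy
    while len(components(adj)) < n_communities:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical example: triangles a-b-c and d-e-f joined by the link c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(sorted(c) for c in girvan_newman(two_triangles, 2)))
# [['a', 'b', 'c'], ['d', 'e', 'f']]
```

In the example, the link c-d carries all nine shortest paths between the two triangles, so it is removed first and the two communities separate.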

   2) Results
  The parameters used in our analysis were as follows:
   • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  The analyst can choose how many communities to create from the network. Running the analysis
produces node data which will be saved with the rest of the data related to each node. Displaying the
results is possible by selecting the “Group by attribute” layout.

  It is therefore possible to reach a large number of results, depending on the number of communities
chosen. With other methods, the meaning of the resulting data is left to the analyst’s eye. Selecting too
few communities will cluster nodes which are too loosely connected together. Too many communities
will explode more tightly knit communities but show the cliques within the communities with greater
detail.

  However, the Girvan-Newman algorithm introduces the variable Modularity Q. The algorithm
calculates the Modularity of each grouping, and Q is an indicator of the quality of clustering.
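
  As an illustration, Q can be computed with the commonly used definition: for each community, the
fraction of links that fall inside it, minus the fraction expected if links were placed at random
with the same node degrees. The sketch below assumes an unweighted network and is not taken from
NetDraw, which may weight links differently; the function name is hypothetical.

```python
def modularity(adj, communities):
    """Modularity Q of a division of an unweighted, undirected graph.

    adj is the ORIGINAL network {node: set(neighbours)}, not the network
    left over after links have been removed.
    """
    m = sum(len(nb) for nb in adj.values()) / 2.0  # total number of links
    q = 0.0
    for group in communities:
        # Links with both ends inside the community (each is seen twice).
        inside = sum(1 for u in group for v in adj[u] if v in group) / 2.0
        degree_sum = sum(len(adj[u]) for u in group)
        q += inside / m - (degree_sum / (2.0 * m)) ** 2
    return q


# Hypothetical example: triangles a-b-c and d-e-f joined by the link c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
split = [{"a", "b", "c"}, {"d", "e", "f"}]
print(round(modularity(two_triangles, split), 3))  # 0.357
```

A division that places every node in a single community gives Q = 0 under this definition, so higher values indicate a grouping that is better than chance.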

Choosing a calculation using from 2 to 15 clusters in our target network, the following results were
obtained:

Clusters      2        3         4         6         7         8         9         13      14       15
   Q        0.013    0.294     0.460     0.500     0.493     0.487     0.482     0.463   0.458    0.442

  Q is maximized when dividing the network into 6 clusters. This result therefore appears to be the most
befitting group structure in our network, and this is shown in Figure 15.




Figure 15: Girvan-Newman algorithm clustering for 6 clusters (Modularity Q=0.500)
  For comparison reasons, it is then possible to mix several analyses on one diagram. For example, the
above diagram layout can be kept while node attributes are modified according to other parameters such
as K-Core analysis. Performing such a plot, it is possible to find the degree of connectivity of nodes
within each community. This is shown in Figure 16.

  In the diagram, the nodes with the highest K-Core value are shown as upward-pointing red triangles, the
next as downward-pointing blue triangles, then yellow circles in squares, etc. This provides information
about the key connecting nodes, both within and between communities.




Figure 16: Girvan-Newman algorithm clustering for 6 clusters (Modularity Q=0.500) & K-Core Analysis

   3) Scaling up
Growing the sample size by loosening the constraint to 10+ publications per author, it is possible to
analyze 171 authors.
Running the data through the Girvan-Newman algorithm produced values for the Modularity factor Q
different from those of the smaller data set.

Clusters     7         8         9         10       11        12        13        14        15       16
   Q       0.091     0.456     0.446     0.453    0.451     0.455     0.453     0.438     0.438    0.437

In this case, no type of clustering shows a dominant Q modularity. The network could be divided into 8
to 13 communities with similar Q modularity, thus demonstrating a very similar quality of clustering.

It is therefore apparent that the Girvan-Newman algorithm does not scale well with our network since
the communities are too tightly knit together, a testimony to the “community feeling” of RFC authors.
   4) Conclusions
  When analyzing the core network of RFC authors, the Girvan-Newman algorithm produces graphical
results which give the impression of being similar to other methods. It is useful to find those clusters of
nodes with highest betweenness, even when zooming onto communities which might have initially
appeared to be tightly knit. The Modularity factor Q is calculated by the NetDraw software using the
algorithm, as a measure of the quality of clustering, and this allows us to find the most natural type of
clustering for the data.
  This algorithm is consequently very efficient at detecting communities and the most likely grouping of
those communities, even when the initial data set is as restricted as the RFC authors list. It yields more
accurate results when used with smaller social networks.



H. Hiclus of Geo-distances
  This stands for a method named Hierarchical Clustering of Geodesic distances. This algorithm was
developed by Johnson [14] and is used by NetDraw to generate clusterings into n clusters, where
n ranges from 2 clusters to the total number of nodes analyzed.
    1) Method/Theory
  The Hiclus of geodesic distance is a measure of cohesion in subgroups within the network calculated
by algorithms defined as follows:

With N nodes that need to be clustered and an N x N distance (or similarity matrix):

    1. Assign each node to its own cluster, so that the distances (similarities) between clusters are
       the distances (similarities) between the nodes they contain
    2. Find the most similar pairs of clusters and merge them into a single cluster
    3. Compute distances (similarities) between the new cluster and each of the old clusters
    4. Repeat steps 2 and 3 until all nodes are clustered into a single cluster of size N.

  The geodesic distance in this context makes the assumption that the graph is a three-dimensional
object and that each link between two nodes represents a distance of one between them. For example,
adjacent nodes have a distance of one; going from one node to another by stepping through a third node
gives a distance of two, etc.
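
  The clustering steps and the geodesic distance can be sketched together as follows. Geodesic
distances are obtained by breadth-first search, and clusters are then merged one pair at a time. How
the distance between two clusters is derived from their members is a choice the text leaves open:
the `linkage` parameter below (largest member-to-member distance by default) is an assumption, as are
the function names and the value used for unreachable pairs.

```python
from collections import deque


def geodesic_distances(adj):
    """Return {source: {target: number of links on the shortest path}}."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in d:
                    d[nb] = d[node] + 1
                    queue.append(nb)
        dist[source] = d
    return dist


def hiclus(adj, linkage=max):
    """Return {number of clusters: list of clusters} for every level from N down to 1."""
    dist = geodesic_distances(adj)
    unreachable = len(adj)  # larger than any real geodesic distance
    clusters = [frozenset([n]) for n in sorted(adj)]   # step 1
    levels = {len(clusters): list(clusters)}
    while len(clusters) > 1:                           # step 4
        best = None
        for i in range(len(clusters)):                 # step 2: closest pair
            for j in range(i + 1, len(clusters)):
                d = linkage(dist[u].get(v, unreachable)
                            for u in clusters[i] for v in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        # Step 3 is implicit: distances to the merged cluster are recomputed on demand.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels[len(clusters)] = list(clusters)
    return levels


# Hypothetical example: triangles a-b-c and d-e-f joined by the link c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
levels = hiclus(two_triangles)
for n_clusters in sorted(levels):
    print(n_clusters, [sorted(c) for c in levels[n_clusters]])
```

Since every level from N clusters down to one is recorded, the output can be read in reverse to see in which order groups split. Ties between equally close pairs are broken by node order in this sketch, so another implementation may merge tied pairs in a different order.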
    2) Results
  A large set of results is calculated by the program and is saved as new attributes for each node. This
can therefore be plotted using “group by attribute”. The Hiclus of 5 clusters is shown in Figure 17.




Figure 17: Hiclus of geodesic distance selecting 5 clusters




Figure 18: Hiclus of geodesic distance selecting 7 clusters

  The Hiclus of 7 clusters (Figure 18) appears more meaningful since enough groups are formed which
show real clustering. Increasing the number of groups (8, 9, 10, etc.) it is possible to see groups
splitting. The Hiclus of 15 clusters is shown in Figure 19.




Figure 19: Hiclus of geodesic distance selecting 15 clusters – groups are splitting up into individuals


  Analysis of these graphs makes it possible to find out which nodes are more likely to break off from a
cluster, and in which order. Since NetDraw generates node data until Hiclus N, where N is the number
of nodes in the network, it is possible to gauge the order in which groups will split up.
  For example, taking the graph of Figure 19 and redefining node colors and shapes according to the
Hiclus of Geodesic distance with 5 clusters, it is possible to see how the 5 original clusters split up into
15 clusters, some clusters being single pendants or dyads. This is shown in Figure 20.




Figure 20: Hiclus of geodesic distance selecting 15 clusters compared with node attributes for 5 clusters



   3) Scaling up
  Since the algorithm as implemented in NetDraw involves clustering from 2 to the total number of
nodes in the graph, this type of analysis does not scale well except if using powerful processing
resources. An analysis of 64 authors (initial subset) yielded the above results. Increasing the sample size
to 171 authors served only to crowd the graph to the point of making it less legible.
  If all constraints are removed, more than 3 000 authors have to be processed and this has been found
to generate superfluous results. This type of analysis is therefore better used for smaller subsets of
nodes.
   4) Conclusions
  The Hiclus of Geodesic distance analysis yields results where a division of the graph is undertaken
from 2 to N clusters, where N is the total number of nodes in the graph. Successive plots, for example
Hiclus of 5 and Hiclus of 15, are possible, and if the node shape is defined according to its clustering in
Hiclus 5, it is possible to see the clustering in Hiclus 15 and the varied make-up of the resulting clusters.
  Whilst pendants will be the first to break from a cluster, cliques are likely to be the last clusters to
divide themselves. It can be seen clearly by comparing the graph for Hiclus 5 and Hiclus 15. This is
essentially a very useful method to gauge the stability of a group of people.


I. Ego Networks
  Each of the analyses presented thus far assumes that all nodes are active
throughout the period of activity from which the data was mined. In the case of RFCs, this was
unfortunately sometimes not the case. For example, Jon Postel passed away in 1998 and this left a huge
gap in the RFC space, not only because of his hierarchic position in the social network but also because
he was such a pleasant and hardworking individual. This kind of influence could however not be
measured mathematically.
  If one resorts to strictly looking at relationships as defined from data mining, a mathematical measure
of an individual’s influence in a network can be calculated in NetDraw. This theory is named by social
network researchers as “Ego Networks”.
   1) Method/Theory
  The Ego network of a node with geodesic distance 1 consists of all nodes immediately linked to that
node. When the geodesic distance is increased to 2, nodes connected to those nodes are included in the
graphic, and so forth for higher Geodesic distances.
NetDraw allows the user to select more than one node’s ego network to find out the geodesic relation
between them, depending on each individual ego network’s reach.
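
  A minimal sketch of this definition follows: a breadth-first search that stops at the chosen
geodesic distance, and a helper that combines the ego networks of several nodes into one sub-network.
The function names and the five-node chain used as an example are hypothetical, and the code is an
illustration rather than NetDraw's own procedure.

```python
from collections import deque


def ego_network(adj, ego, radius=1):
    """Return {node: geodesic distance from ego} for nodes within the radius."""
    reach = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if reach[node] == radius:
            continue  # do not look beyond the requested distance
        for nb in adj[node]:
            if nb not in reach:
                reach[nb] = reach[node] + 1
                queue.append(nb)
    return reach


def combined_ego_network(adj, egos, radius=1):
    """Sub-network made of the ego networks of several nodes taken together."""
    members = set()
    for ego in egos:
        members |= set(ego_network(adj, ego, radius))
    # Keep only the links whose two ends are both members.
    return {n: adj[n] & members for n in members}


# Hypothetical example: a chain a - b - c - d - e.
chain = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
print(sorted(ego_network(chain, "a", radius=2)))  # ['a', 'b', 'c']
print(sorted(combined_ego_network(chain, ["a", "e"], radius=1)))  # ['a', 'b', 'd', 'e']
```

In the second call, node "c" is absent, so the two ego networks do not touch; this indicates that the two selected nodes are more than two links apart.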
   2) Results
  The parameters used in our analysis were as follows:
   • Complete Data set of 3 480 authors and 17 000+ links
   • Node size according to number of publications
   • Link thickness between two authors according to number of publications written together

  In order to illustrate the concept of Ego networks, we have simulated the ego network of a well-
known RFC author, Dr. Vinton Cerf, as shown in Figure 21.




Figure 21: Ego Network of V. Cerf. (geodesic distance = 1)

Input file discrepancies (Metcalfe_R. and Metcalfe_B) are treated in Section V.A.1.a.

  This diagram, just like every other result obtained using NetDraw, can be exported to MAGE and
rotated, zoomed-in and otherwise manipulated in 3D. An example screenshot is shown in Figure 22.




Figure 22: Ego Network of V. Cerf. (geodesic distance = 1) as seen in MAGE 3-D



  Another use of the Ego network analysis as applied in NetDraw is the analysis of connection paths
between two nodes having a geodesic distance greater than 1.
  It is possible to plot the Ego network for another author, for example Randy Bush, and relate it to Dr.
Cerf’s Ego network, whilst keeping a maximum geodesic distance of 1. This is shown in Figure 23,
overleaf.

  NetDraw allows for any combination of geodesic distance and simultaneous node selection (or de-selection,
in order to note the “holes” in the network), and further analysis is possible on this sub-network
alone, through K-Core, Girvan-Newman, or indeed any other analysis as described above. This
makes for a very extensive combination of analyses and the possible generation of interesting social
patterns within the network.

   3) Scaling up
  In NetDraw, it is possible to select a geodesic distance of 2 (or more), in order to find out the nodes
connected to the nodes connected to Dr. Cerf’s node – the 2nd degree of separation. Since the RFC
community is well networked, the resultant graph is much more crowded, as seen in Figure 24. The
analysis is therefore limited by readability of the resulting graphs.




 Figure 23: Ego Network of V. Cerf. (geodesic distance = 1) relating to R. Bush’s network




 Figure 24: Extended Ego Network of V. Cerf. (geodesic distance = 2)

   4) Conclusions
  The Ego Network analysis is very useful in determining the structure of nodes directly linked to a
node, and in turn, the structure of nodes connected to those nodes. It is a useful tool to determine the
extent of a node’s social networking reach as well as the social structure between two or more nodes.
  When used on a data set consisting of a group of people in an organization, it is therefore possible to
evaluate an individual’s social influence and immediate surroundings.


J. Geometrical Analysis using yEd
   yEd [15] is a free Java-based graph editor which can be used to generate drawings and to apply
automatic layouts to the graphs comparable to those generated by NetDraw. The strength of yEd lies in
its ability to re-map complex graph structures into entirely new layouts which might bring more sense to
the input data and help detect hierarchies or pseudo-hierarchies within a social network.

  Other layout algorithms make use of geometry to produce Orthogonal or Organic layouts, Tree and
Circular layouts including multi-radial and plain disc layout which can detail interconnected rings and
star topologies.

  NetDraw was used earlier to provide a number of layouts, but yEd’s algorithms are more powerful in
re-routing edges (links) to provide a cleaner layout topology, especially when using edge routing, an
option which makes edges align with each other.
  It is important to note that NetDraw and yEd have entirely different purposes. NetDraw is used to
analyze a network to detect clusters, ego networks etc. yEd is a graph editor used solely to display a
network in a variety of topologies. Indeed, most users utilize yEd solely to produce clearer graphs for
knowledge representation, software engineering, database schematics, process and workflow illustration
and family trees.

   1) Method/Theory
  yEd accepts several input file data formats including GraphML, YGF, GML (a popular text-based
format), TGF and XML formats. Unfortunately, none of these formats is compatible with any of the
formats in which NetDraw data can be exported.
  The yEd graph therefore had to be built using the integrated graphic editor by a click, drag and drop
process to create nodes and link them together. The input data was manually read from the .vna file
generated by the NetDraw software when saving the NetDraw graph.
  As a result, 64 nodes and several hundred links were created manually using point and click. Each
node was also labeled accordingly.
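
  The graph in this study was built by hand. For readers who would rather script this step, the sketch
below writes a list of links as a bare GraphML document using only the Python standard library. It
assumes that the links have already been extracted from the .vna file into pairs of node names (that
parsing is not shown), and it emits only core GraphML elements: node labels, shapes and link
thickness in yEd rely on yEd-specific extension elements that are not reproduced here, so the result
may still need manual labeling. The function name and the example pairs are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def edges_to_graphml(edges):
    """Return a GraphML document (as a string) for a list of (node, node) pairs."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element("{%s}graphml" % GRAPHML_NS)
    graph = ET.SubElement(root, "{%s}graph" % GRAPHML_NS,
                          {"id": "G", "edgedefault": "undirected"})
    # One <node> element per distinct name appearing in the links.
    for name in sorted({n for edge in edges for n in edge}):
        ET.SubElement(graph, "{%s}node" % GRAPHML_NS, {"id": name})
    # One <edge> element per link.
    for index, (u, v) in enumerate(edges):
        ET.SubElement(graph, "{%s}edge" % GRAPHML_NS,
                      {"id": "e%d" % index, "source": u, "target": v})
    return ET.tostring(root, encoding="unicode")


# Hypothetical links between placeholder node names.
document = edges_to_graphml([("Author_A", "Author_B"), ("Author_B", "Author_C")])
with open("network.graphml", "w", encoding="utf-8") as handle:
    handle.write(document)
```

Declaring the graph as undirected avoids the hidden link directions discussed for the hierarchical layout, although how a given layout algorithm treats such links should be checked in yEd itself.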

  A choice was made to select rectangular boxes large enough to contain an author’s full name, but it is
also possible to modify node attributes such as shapes, colors and sizes, whilst also modifying link
thickness, arrowheads, etc. In this respect, yEd has features similar to NetDraw. The only drawback is
that it is impossible to change node attributes automatically, although this can change under certain
conditions when performing specific types of layout analysis, according to the special demands of the
resulting graphic.

  Arrowheads denote a link’s direction. All arrowheads were removed in order to clear up the clutter
generated by so many nodes in such a small topological space. It is important to note that even with
arrowheads removed, links keep a direction. Whilst some layout algorithms do not make use of this
information, others, such as the hierarchical layout algorithm, establish layer order by using the direction
of the links. This might be confusing when no arrowheads are present and might lead to erroneous
results.
  Link thickness was defined for each link, according to the number of RFCs written together by a pair
of authors. 5 levels of arrow thickness were chosen manually.



   2) Results
  A large number of permutations of layouts is possible using yEd. Each type of layout allows for several
parameters to be modified, sometimes producing vastly different results.

    a)       Circle Layout
  It is possible to select from layouts which appear similar to those obtained using NetDraw. One
such layout is the Circle layout where a plot similar to the one shown in Figure 1 can be created.
  Nonetheless, yEd offers more layout options to plot the circle.

  For example, the circle plot layout can be transformed into a disk, where some nodes appear in the
center of the circle, and others, namely cutpoints, appear outside the circle. Those cutpoints are defined
as the base for all pendants. Some manual housekeeping (shortening of some links, coloring of cutpoints
and pendants) results in the graph shown in Figure 25.




  Figure 25: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout.


     b)      Disk Layout with organic edges
   Starting with the network shown in Figure 25, the links connecting the nodes can be altered into
organic links, whilst the layout of the nodes remains untouched. This algorithm routes the links so as to
ensure that they do not overlap nodes and keeps a specifiable minimal distance between them.
   The algorithm is based on a force directed layout paradigm. Nodes act as repulsive forces on links in
order to guarantee a certain (user-defined) minimal distance between nodes and links. The links tend to
contract themselves. Using “simulated-annealing” leads to link layouts, which are calculated for each
link in turn. The resulting graph is visually attractive in that nodes are not overlapped by links, although
since some links overlap each other, it is sometimes difficult to follow their routing. The result is shown
in Figure 26.




Figure 26: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout and organic links


    c)      Disk layout with orthogonal edges
   Starting with the network shown in Figure 25, the links connecting the nodes can be altered into
orthogonal links, whilst the layout of the nodes remains untouched. This algorithm can route the links of
the network using only vertical and horizontal line segments, while keeping the positions of the nodes in
the network fixed. The routed links will usually not cross through any nodes and not overlap any other
links. The resulting network is shown in Figure 27.
   yEd channel edges layout provides a similar routing topology for the links, with a few less significant
alterations.
   It is interesting to note that yEd’s orthogonal edge router and orthogonal channel edge router
algorithms can be used on any type of network topology without displacing the initial node position. It is
therefore possible to “clean up” any type of network graph through a combination of node positioning
and link positioning.




Figure 27: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout and orthogonal links


    d)       Organic Layout
   Selection of the Organic Layout produces undirected graphs containing no overlap between nodes.
Processing the resulting graph through the edge router’s organic layout also makes sure that no overlap
occurs between links and nodes. The type of layout generated has essential similarities with the layout
obtained in NetDraw’s output of Spring Embedding using Geodesic Distances, Node Repulsion and
Equal Edge Lengths analysis. The organic layout box in yEd also allows for the defining of a preferred
link length. The resulting network can be seen in Figure 28. Whilst readability is improved over the
NetDraw output shown in Figure 6, clusters might be slightly less noticeable by eye because all nodes
are evenly spaced. yEd allows for the manual definition of clusters, and each cluster can then be laid
out with a different algorithm. This is seen later.




  Figure 28: yEd plot of network (subset of authors having published 20+ RFCs) in organic layout and organic links


    e)       Orthogonal Layout
  This type of layout produces compact drawings with no node overlaps, few crossings, and few bends.
All links are routed in an orthogonal style: only vertical and horizontal line segments will be used. This
enhances readability of the resulting graph. As with every other type of layout in yEd, this option offers
a selection of preferences which radically modify the results, each with its own advantages.

  One such option is the use of “Node Boxes”, where nodes are resized according to the number and
position of their neighbors to reduce the overall number of bends in the links. Readability of the graph is
improved and a by-product of this algorithm is that it tends to cluster more intensely tied nodes together.
An example of this layout is shown in Figure 29. A tradeoff is that node size might be
misinterpreted as reflecting the importance of a node in the network, which is not the
case.




  Figure 29: yEd plot of network (subset of authors having published 20+ RFCs) using variable size Node Boxes and
  orthogonal layout, with grid size 15.




  Correlating results obtained using NetDraw with results obtained with yEd is instructive.
For instance, since the above diagram appears to show clear clustering of nodes which are
closely connected together, a clustering algorithm from NetDraw can be applied to the nodes
in Figure 29.
  The results from the Girvan-Newman algorithm (Section G) generated the diagram shown in Figure 15,
with 6 clear clusters appearing to be the optimal network clustering.
  Applying these results to the nodes shown in Figure 29 and selecting the option of “face
maximization” generates the graphic shown in Figure 30.




Figure 30: yEd plot of network (subset of authors having published 20+ RFCs) using variable size Node Boxes and
orthogonal layout, with grid size 15, cross-linked with Girvan-Newman Clustering algorithm data




  The clusters are shown and appear to validate the data generated through the Girvan-Newman
algorithm. Further combinations of analysis and graphical display are possible, although not all
combinations bring further cognitive advantages to the analysis.
  For example, other options using this layout also allow the orthogonal layout algorithm to be mixed
with a tree sub-algorithm, where larger sub-trees are processed using a special tree algorithm. Whilst
each of these algorithms generates good-looking graphs, no significant further insight into
our input RFC network is gained beyond that obtained by the other means described earlier.


    f)      Hierarchical Layout
  An important option in yEd is plotting the graph using hierarchical layout. This includes a set of
algorithms which can be permuted to generate a vast array of hierarchical graphs.
  Establishing a hierarchy of nodes requires link direction. However, the network of
RFC authors involves two-way collaboration between the co-authors of an RFC, with no explicit
hierarchy or precedence of one author over another. Drawing the graph with reciprocal arrows vitiates
the hierarchical layout analysis: the algorithm still tries to establish a clear hierarchy, and reciprocal
arrows are treated not as a single double-ended arrow but as a cycle. Removing arrowheads does not
remove link direction. Such a layout is therefore flawed, since the anticipated result, one which identifies
the most significant nodes in the network (for example, nodes denoting highest RFC authorship or
highest betweenness), is not the result attained.
  When the “top to bottom” option is selected, nodes are placed in hierarchically arranged layers and
this gives an illusion of hierarchy when there actually is none. Nevertheless, the appearance of
the resulting layout is helpful in providing a clear view of the network, especially when selecting
orthogonal edge routing. Whilst at first glance clustering of cliques seems apparent, this is actually not
the case: many long links exist, linking distant nodes. By selecting hierarchical optimal ranking, layer
assignment is done in such a way that the overall sum of the layer distances of all edges in the layout is
minimal. The resulting graph is shown in Figure 31.




Figure 31: Hierarchical Layout of network (subset of authors having published 20+ RFCs) with orthogonal links.
Options are hierarchical optimal ranking,


   The polyline option produces surprising, visually creative but analytically ineffective results,
since it creates a vast number of bends and parallel links.
   Various other options are available to concentrate or disperse links, increase or decrease their
number, and change the layout and direction of the network. Each permutation of options produces graphs which are
equally attractive visually, but which add no analytical value.
   As a result, the hierarchical layout might not be the most suitable for displaying the results obtained in our
analysis, although the minimal number of bends in the links makes the network very readable.

    g)      Using groups with a mix of layouts
  Clusters found using NetDraw can be defined as groups in yEd, and a mix of layouts applied to the
groups themselves and to the links between the groups. This gives rise to nested graphs. For example, it is
possible to define 6 node clusters from the results of the Girvan-Newman algorithm. Nodes in each
cluster can be grouped, and groups laid out independently from the rest of the network. Many
combinations of types of layout can be used to reach various results.
  Cross-group links could be routed orthogonally, organically, randomly, or could be removed
altogether. In this case they were removed, and the resulting diagram of node organization within each
group is shown in Figure 32.




Figure 32: Grouping of nodes (subset of authors having published 20+ RFCs) cross-linked with NetDraw’s Girvan-
Newman Clustering algorithm data and layout using disk algorithm, with removal of inter-group links
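
  The grouping step described above can be sketched outside yEd. The following is a minimal
illustration, not the procedure used to produce Figure 32: it assumes the cluster membership has
already been obtained (for example from NetDraw) as a mapping from author to cluster number, and
it separates links that stay inside a group from links that cross groups. The function name
split_by_cluster and the sample data are hypothetical.

```python
from collections import defaultdict


def split_by_cluster(edges, cluster_of):
    """Group nodes by cluster and separate intra-group from inter-group links.

    edges      -- list of (author_a, author_b) co-authorship links
    cluster_of -- dict mapping each author to a cluster identifier
    """
    groups = defaultdict(set)
    for node, cluster in cluster_of.items():
        groups[cluster].add(node)

    intra, inter = [], []
    for a, b in edges:
        # A link is kept inside a group only if both ends share a cluster.
        if cluster_of[a] == cluster_of[b]:
            intra.append((a, b))
        else:
            inter.append((a, b))
    return dict(groups), intra, inter


# Hypothetical four-author network split into two clusters.
cluster_of = {"Author_A.": 1, "Author_B.": 1, "Author_C.": 2, "Author_D.": 2}
edges = [("Author_A.", "Author_B."), ("Author_B.", "Author_C."), ("Author_C.", "Author_D.")]
groups, intra, inter = split_by_cluster(edges, cluster_of)
```

  Discarding the inter list corresponds to removing inter-group links; keeping it allows those
links to be routed with a different style.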



   3) Scaling Up

  The sub-networks plotted in Figures 25-32 reach a nodal upper limit for effective micro-analysis due
to the restricted space available on a single A4 page. Scaling the network size up by removing
constraints on the input data set is possible but is bound by two principal limits:
            - all data from NetDraw need to be input manually, either using point & click or by writing a
                data file in text format; RFC authors are so closely linked together that the
                number of links grows very rapidly as soon as further nodes are added;
            - as more nodes are displayed on screen, clutter takes over. Node size might need to be
                reduced and the diagram zoomed out, which renders labels unreadable.

Micro-analysis transforms itself into macro-analysis. On a network as cross-linked as the RFC authors,
macro-analysis of data using yEd does not yield additional key results. However, networks containing
more defined sub-groups and less cross-group connectivity will likely yield satisfactory results with
macro-analysis.
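
  As an illustration of the first limit above, writing the text data file can be scripted rather
than done by hand. The sketch below is an assumption-laden example, not the method used in this
study: it assumes that the installed version of yEd can import the Trivial Graph Format (TGF),
in which node lines ("id label") are followed by a line containing "#" and then edge lines
("id id"). The function name to_tgf and the author labels are hypothetical.

```python
def to_tgf(edges):
    """Serialize an undirected edge list as Trivial Graph Format text.

    edges -- list of (author_a, author_b) pairs
    """
    # Assign a numeric id to every distinct author, in alphabetical order.
    nodes = sorted({name for edge in edges for name in edge})
    ids = {name: index for index, name in enumerate(nodes, start=1)}

    lines = [f"{ids[name]} {name}" for name in nodes]  # node section
    lines.append("#")                                   # section separator
    lines += [f"{ids[a]} {ids[b]}" for a, b in edges]   # edge section
    return "\n".join(lines) + "\n"


# Hypothetical co-authorship links.
edges = [("Author_A.", "Author_B."), ("Author_A.", "Author_C.")]
tgf_text = to_tgf(edges)
```

  The resulting text can be saved with a .tgf extension. Node attributes produced by analysis are
not carried by this format and would still need to be set up manually.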



   4) Conclusions
  yEd is a powerful piece of software which can be used to generate new network topologies in a social
network. Whilst many of its features are similar to the features provided by NetDraw, yEd is different in
that it is a graph editor, whilst NetDraw performs network analysis.

Its weaknesses:
           - Most input file formats are binary files and do not interface with NetDraw output files. It
              is therefore difficult to share data between the two types of software
           - Except in specific cases, nodes do not support automatic attributes which could have
              been generated by analysis – attributes need to be set-up manually


Its strengths:
            - Nodes can be replaced by user-defined icons, thus giving rise to the possibility of very
               impressive visual styles
            - Edge routing generates particularly clean graphical results
            - Hierarchical, organic and orthogonal layouts are not offered in NetDraw. The results
               attained using yEd are therefore complementary to those reached using other software
            - Tree, as well as star and spoke layouts generate very clear results which might pinpoint
               more “hidden” information within a data set or social network
            - Most components of the resulting graph can be labeled extensively
            - The workspace can be customized by docking sub-menus as desired
            - Graphs can be nested: for example part of a graph can be displayed using one algorithm
               and another part using another algorithm best fitting its needs

 In this section, we have shown a few possible uses of yEd in the context of social network analysis.
Combined with NetDraw software, yEd provides a powerful free starter pack which can be used in the
world of Social Network analysis.

                                            V. DISCUSSION

A. Limits
  The analysis presented here is bounded by many limits. Many assumptions had to be made in order to
keep the analysis to a sustainable size. These assumptions are likely to introduce discrepancies in the
results. For the sake of awareness, the limits of the analysis are detailed in this section.
   1) Input file discrepancies

    a)      Naming conventions for individual authors
  As is sometimes customary in Anglo-Saxon countries, some names are not always transcribed in their
original form. “Richard” may be quoted as “Dick”; “Robert” as “Bob”; “Anne” as “Ann” etc.
  Similarly, some foreign names may be spelled differently depending on the period. This is particularly
understandable when the RFC database is pure ASCII and many names use characters that go beyond
the scope of ASCII, for example replacing “ü” with “ue” or “u”.
  Both kinds of inconsistency might introduce several apparently different instances of a given author.
This was not corrected in the several cases found in the database because we did not have the ability to
crosscheck whether “John Doe” and “Jon Doe”, or “Bob Smith” and “Robert Smith” were the same individual.
Such manual crosschecking would be too time-consuming. The errors introduced in the results were
found to be small enough to ignore. The rationale behind this is that an author would use one
spelling of their name in most cases. Erroneous spelling would therefore be the exception rather than the
norm.
  An example of such discrepancy can be seen in Figure 21 where both Metcalfe_B. (Bob) and
Metcalfe_R. (Robert) are shown.
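
  A partial, automated first pass at this problem could look as follows. This is only a sketch
under stated assumptions and was not applied in this study: the transliteration table and the
nickname table are hypothetical and deliberately small, and two names that produce the same key
are only candidates for being the same individual, which would still require manual confirmation.

```python
import unicodedata

# Assumed transliterations used when names are forced into ASCII.
TRANSLIT = {"ü": "ue", "ö": "oe", "ä": "ae"}
# Assumed nickname table; a real one would be much longer.
NICKNAMES = {"dick": "richard", "bob": "robert", "ann": "anne"}


def fold(text):
    """Lower-case a name, apply transliterations and strip remaining accents."""
    text = unicodedata.normalize("NFC", text).lower()
    for src, dst in TRANSLIT.items():
        text = text.replace(src, dst)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidate_key(given, family):
    """Return a key under which likely variants of one name collide."""
    folded_given = fold(given)
    return (fold(family), NICKNAMES.get(folded_given, folded_given))


# "Bob" and "Robert" Metcalfe fall under the same candidate key.
same = candidate_key("Bob", "Metcalfe") == candidate_key("Robert", "Metcalfe")
```

  Note that a name written with a plain “u” instead of “ue” would not be matched by this key;
folding “ue” to “u” as well would catch it at the cost of merging genuinely different names.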

    b)      Naming conventions for organizations
  The naming integrity in the RFC database is imperfect. Whilst in some cases the full name of an
organization is given, there are also several equally frequent occurrences where the acronym of the same
organization is used.
  For example: IAB vs. Internet Architecture Board vs. Internet Activities Board; IETF vs. Internet
Engineering Task Force; IANA vs. Internet Assigned Numbers Authority; ISOC vs. Internet Society;
IESG vs. Internet Engineering Steering Group etc.
  We felt that these discrepancies were sparse enough not to cause major data corruption. Furthermore,
our study centered on individuals and not on organizations. The decision was therefore taken to ignore
those discrepancies, though it is worth noting that several RFCs have one of the above organizations as
their sole or joint author.
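
  Had organizations been the focus, the variants listed above could be folded onto one label with
a simple lookup. The sketch below is illustrative only; the function name canonical_org is
hypothetical, and mapping both “Internet Architecture Board” and “Internet Activities Board” to
the same acronym is an assumption that discards the distinction between the two names.

```python
# Full names taken from the examples above, mapped to their acronyms.
ORG_ALIASES = {
    "internet architecture board": "IAB",
    "internet activities board": "IAB",
    "internet engineering task force": "IETF",
    "internet assigned numbers authority": "IANA",
    "internet society": "ISOC",
    "internet engineering steering group": "IESG",
}
KNOWN_ACRONYMS = set(ORG_ALIASES.values())


def canonical_org(name):
    """Return the acronym for a known organization name, or None otherwise."""
    cleaned = " ".join(name.split())  # collapse repeated whitespace
    if cleaned.upper() in KNOWN_ACRONYMS:
        return cleaned.upper()
    return ORG_ALIASES.get(cleaned.lower())


label = canonical_org("Internet  Activities Board")
```

  Author entries for which canonical_org returns None would be treated as individuals.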

    c)      Reporting on work with third parties
  Several RFCs report on work undertaken by or in collaboration with third parties who might not be
named in the RFC itself. Some of these RFCs have an author identifying himself or herself as the editor
of the document. It is unknown whether this editor also contributed to the work presented,
and the team which performed the work might be identified only through the acronyms described above, or
not identified at all. “Conversation with” a third party also constitutes the subject of several early RFCs.
In all cases, the name recorded is that of the RFC author. Manually treating each case in turn would be much
too time-consuming, and whilst some individuals might have benefited from this reporting, we felt that
this inconsistency was marginal enough to be ignored.

    d)      RFC status
  RFC status is explained in RFC 2026. It is important to remember that not all RFCs are standards track
documents, and that not all standards track documents reach the level of Internet Standard.
  RFCs therefore fall in different statuses: Internet Standards Track (Proposed Standard, Draft Standard,
Internet Standard); Non-Standard Track Maturity Levels (Experimental, Informational, Historic); Best
Current Practice (BCP) and Unknown.
  Our study does not take note of the current status because it is assumed that at the time the RFC was
written, it was current. The "usefulness" and scope of an RFC’s importance cannot be scientifically
evaluated. All RFCs are therefore considered in this study on an equal footing. Again, this might
introduce inaccuracies in the results, although the RFC sample size is so large that these are statistically
minimal. Furthermore, it is worth noting that all RFCs are equal when it comes to networking between
authors, whether the RFC reaches standards track or not.

   2) Use of Network Analysis Software

    a)      2-D vs. 3-D
  Mentioning 3-dimensional display of data always attracts much attention. Although NetDraw
displays networks in 2 dimensions, its ability to export data which can be readily used in MAGE,
software that displays a network in 3D, is a real asset. However, the utility of displaying the data in 3D is
dependent on the input data set. Some network topologies will not show well in 3D. The question of 2D
vs. 3D is one which can only be answered through trial and error.
  Although an in-depth discussion of 2D vs. 3D is outside the scope of this document, current
scientific knowledge points to 3-dimensional cognitive processes requiring more complex processing
by the human brain than 2-dimensional ones. Fixed 3-dimensional display of data adds complexity to the
brain’s visual recognition and might therefore be less useful than 2 dimensions except when presented
in an interactive way, such as the reader being able to rotate the 3-dimensional space about an axis. In
fixed displays, adding a third dimension to a planar representation might be counterproductive by
adding complexity.
  Much data about the 2D vs. 3D cognitive model is available elsewhere on the Internet.

    b)      Large input data sets
  It is said that a diagram speaks a thousand words. Large input data sets can indeed be analyzed using
the tools presented in this essay. However, space restrictions on an A4 page make it difficult to show the
results in a legible manner. It is therefore usually only possible to undertake macro-analysis on the
network (removing individual node labels, for example), and restrict micro-analysis to smaller data sets.
A mix of micro and macro analysis would be very useful in the future, for instance being able to zoom in
towards specific parts of the network and isolate them using point and click. For the time being, both
NetDraw and yEd have a low usability factor in this scenario.
  Large input data sets also require an increased amount of computing power and memory. Provided
adequate computing resources and memory are available, it would be possible to carry out a much more
targeted analysis.

    c)      Informal data vs. Formal data
  The data source, namely IETF RFCs, constitutes only a subset of every development and collaborative
work ever undertaken to make the Internet what it is today. Whilst formally only a subset of authors are
included as the authors of an RFC, most RFCs are discussed extensively in working groups when at
Internet Draft stage, and informal communication provides much of the input towards the final RFC. In
choosing co-authorship as the only link between the authors and as the total data set
for our analysis, we are unable to incorporate the informal data generated in the discussions.
  This introduces a limit which vitiates the hypothesis of this research to find the “Father of the
Internet”. Indeed, since the research uses only a subset of the people involved in the
Internet’s development, it is clear that only a subset of contributors to the Internet’s development is
displayed in the graphs.
  Informal data is nearly impossible to track. Increasing the reliability of the results would have to
involve a mining of every email ever sent in the realm of the IETF standards process and this is clearly
impossible. Working group mailing lists do have archives, but these are so dissimilar in style,
completeness and social interaction protocols that a significant dose of Artificial Intelligence would be
required to mine noteworthy data.

    d)      Restricted data sets
  Even with the subset data source consisting solely of IETF RFCs, the data mining process used further
reduces the input data set since it is too basic to extract acknowledgments from the RFC’s text. The only
parameters currently mined are the names of the authors of each RFC. Processing this data yields
the number of publications by author and an author’s links with other authors.
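
  The processing step just described can be sketched as follows. This is a minimal illustration
rather than the extraction process actually used; it assumes the mined data is available as a
mapping from an RFC identifier to its list of author names, and the function name
coauthor_network and the sample entries are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_network(rfc_authors):
    """Derive publication counts and co-authorship link weights.

    rfc_authors -- dict mapping an RFC identifier to a list of author names
    Returns (publications, links): publications counts RFCs per author,
    links counts shared RFCs per unordered pair of authors.
    """
    publications = Counter()
    links = Counter()
    for authors in rfc_authors.values():
        unique = sorted(set(authors))  # ignore duplicated names within one RFC
        publications.update(unique)
        links.update(combinations(unique, 2))  # every pair of co-authors
    return publications, links


# Hypothetical input: three RFCs and three authors.
rfc_authors = {
    "RFC-A": ["Author_A.", "Author_B."],
    "RFC-B": ["Author_A."],
    "RFC-C": ["Author_A.", "Author_B.", "Author_C."],
}
publications, links = coauthor_network(rfc_authors)
```

  Single-author RFCs add to the publication count but contribute no link.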
  An important dimension missing from the data set is the concept of time. Some RFCs were written in
the 70s, some in the 80s, some in the 90s etc. Modifying the data mining process to incorporate dates
would enable NetDraw analysis by target date, which could then show social networks as they existed in
each period of time. Comparison of those networks might provide a good idea of the “nomadic”
behavior of some authors, a possible explanation for the differing faithfulness shown by some nodes
when dividing the network of authors into clusters, as seen in Section G, the Girvan-Newman algorithm.
The Internet has evolved, and so have the social networks of people building it.
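
  If publication dates were mined as well, the split by period could be sketched as below. This
is a hypothetical extension, not part of the present analysis: it assumes each record carries a
publication year alongside the author list, and groups co-authorship links by decade so that one
network per period could be plotted separately.

```python
from collections import defaultdict
from itertools import combinations


def links_by_decade(records):
    """Group co-authorship links by the decade in which the RFC appeared.

    records -- iterable of (year, [author names]) tuples
    Returns a dict mapping the first year of a decade to a set of links.
    """
    by_decade = defaultdict(set)
    for year, authors in records:
        decade = year - year % 10  # e.g. 1979 -> 1970
        by_decade[decade].update(combinations(sorted(set(authors)), 2))
    return dict(by_decade)


# Hypothetical records spanning two decades.
records = [
    (1979, ["Author_A.", "Author_B."]),
    (1983, ["Author_B.", "Author_C."]),
    (1988, ["Author_A.", "Author_B."]),
]
networks = links_by_decade(records)
```

  An author whose links appear in different clusters from one decade to the next would be a
candidate for the “nomadic” behavior mentioned above.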
  Another dimension missing from the data set is the significance of an RFC, the current assumption
being that every RFC is as “important” as every other RFC – and this is clearly not the case. Perhaps a
concept of “RFC weight” could be developed to measure the impact of each RFC on the Internet’s
technical development.
The more restricted a data set, the more restricted the results.

   3) Reliance on RFC database to find a father
   This study makes exclusive use of the RFC database to examine the Internet’s evolutionary process. Of course,
the basic assumption that “anything not in an RFC does not exist” is as absurd as “anything invented
before the first RFC does not exist”. A great many inventions, for example WYSIWYG and the mouse,
the World Wide Web, search engines, peer-to-peer computing and other applications also make the
Internet what it is today. In fact, the biggest strength of today’s Internet is that you can throw any type of
traffic at it and it will carry it, since it is independent of the physical, link and application layers.
   None of the above applications was covered by an RFC. Does this mean that none of the inventors of those
applications has any kind of paternity claim over the Internet?
   RFCs are not the Alpha and Omega of the Internet’s existence. For example, a large amount of early
work was published as Internet Experiment Notes (IENs), a set of more than 200 documents and reports
preceding the first Internet RFC (RFC 675) [16]. Our analysis misses this data.
   Perhaps a wider, more inclusive, cross-standard, cross-activity, cross-invention and cross-layer search
and analysis would be required? Scio me nihil scire (I know myself to know nothing) [17].

B. Opportunities
  Social Network Analysis opens a new door to further understanding groups of people. In the context
of RFC authors, any method circumventing the limits described above would increase the accuracy of
the analysis and therefore the accuracy of the results.

  As we have seen in this essay, a major analytical shortfall of our research is that it does not take
chronological perspective and timelines into consideration. Any ongoing research process spanning
several discoveries introduces a hierarchical chronology of innovations and publications. For instance,
combining this study with an analysis making use of the RFC Citation index would yield further
information about the influence RFC authors have had over their peers and over the development of
today’s Internet. The Citation index would need to be data mined from the existing RFC Index [8].

  The treatment of these results would ease the limit described in Section V.A.1.d above, as well as
allow for the sketching of NetDraw and yEd graphs, a non-exhaustive list being suggested as follows:
      • RFC timelines
      • A hierarchical tree of RFCs which might lead to a hierarchical tree of RFC authors
      • The Ego network of an RFC, which might lead to an Ego network of authors with indirect
           influence, rather than the current direct influence analysis shown in Section IV.I.
      • The branching of RFCs which have been unsuccessful in generating traction – some might be
           lost opportunities, some might be rising stars, some might be dormant, some might be
           alternative processes and some might be dead ducks. The interest in this analysis is generated
           from the question: did history make full use of knowledge available at the time?
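
  The Ego network of an RFC suggested in the list above could be derived from citation data along
the following lines. This is a sketch under assumptions: the citation index is taken to be
available as a mapping from an RFC to the RFCs that cite it, and the names ego_network, cited_by
and the sample identifiers are hypothetical. A breadth-first search records how many citation
hops separate each RFC from the chosen one, which distinguishes direct from indirect influence.

```python
from collections import deque


def ego_network(cited_by, root, radius=2):
    """Return {rfc: hops} for every RFC within `radius` citation hops of root.

    cited_by -- dict mapping an RFC to the list of RFCs that cite it
    """
    distance = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if distance[current] == radius:
            continue  # do not expand beyond the requested radius
        for citing in cited_by.get(current, ()):
            if citing not in distance:
                distance[citing] = distance[current] + 1
                queue.append(citing)
    return distance


# Hypothetical citation chain: B and C cite A, D cites B, E cites D.
cited_by = {"RFC-A": ["RFC-B", "RFC-C"], "RFC-B": ["RFC-D"], "RFC-D": ["RFC-E"]}
ego = ego_network(cited_by, "RFC-A", radius=2)
```

  Replacing each RFC in the result by its authors would give the corresponding Ego network of
authors with indirect influence.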

  Clearly, a cross-discipline fusion of analytical methods, using statistical techniques, Artificial
Intelligence, graph theory, chaos theory, fuzzy logic and social network analysis of a cross-layer input
set of data would enhance the accuracy of results – as would having access to vast manpower and
computing resources. The opportunity to derive data on this subject from the sources currently available
on the Internet is almost limitless, but the intent of the work presented in this essay is not to reach highly
conclusive and accurate results. Rather, it is to provide a somewhat rhetorical example of what could be
achieved with very limited computing resources (namely one laptop) and software freely available out
there on the Internet.



                                         VI. FURTHER WORK
  Taking into account the opportunities described above, the door is clearly open to many paths for
further work. For example:
      • Use of methods of social analysis on:
             o ISOC bottom-up structure
             o ICANN and its constituencies, starting perhaps with the At-Large structures
             o W3C recommendations and its consensus-based standards tracks
             o WSIS/UN IGF bottom-up processes and at large involvement at global level
             o Elements of Internet Governance
      • Use of methods of social analysis in a political party, to ensure a smooth information flow and
         correct leadership process, including the processes leading to presidential races and elections
      • Use of such methods in any organization whose decision structure is based on the concept of
         bottom-up processes, whether by consensus or vote



                                           VII. CONCLUSION
  In this exercise, we have demonstrated the worth of Social Analysis and its usefulness in light of new
Knowledge Management practices. By combining some Competitive Intelligence Data Mining
techniques with Social Network Analysis, we have introduced new parameters which can be used to
verify and display the degree of satisfactory consensus building in an organization. This could include
the organization of working groups or the combining of entities having different social, contextual and
historical backgrounds. This could also include the sharing of data within the context of an
organization’s Knowledge Management.
   The Internet, together with its governance, is possibly one of the most complex societal systems ever to evolve.
Its complex mesh of working communities will require co-ordination in the future in order for
governance to be able to tackle future challenges successfully and to make sure that its decision process
is as inclusive as possible whilst being streamlined enough to actually reach decisions. Since the
Internet’s place in people’s lives is increasing year on year, novel scientific tools which could help in its
governance & development should be available for anyone to use.

  This essay has provided an insight into what some of these future tools might look like and how
useful they could be.

  As for the question, “who is the father of the Internet” – since we have proven that RFC authors have
a habit of working as a community, this would be impossible to determine without a DNA sample. Will
the real father(s) and mother(s) please stand up?

  More specifically, we have shown that:

              -   There are many “fathers” of the Internet. They are all closely linked together into a
                  network of authors comprised of many cliques and clusters which appear to be as
                  interlinked as the interlinking of networks in the Internet’s network of networks
              -   Many RFC documents are written single-handedly by authors, although this constitutes
                  a minority in the community
              -   The most prolific authors tend to form clear clusters, inter-linked to other clusters by
                  key individuals
              -   Jon Postel, having held the position of RFC Editor, was one of those key people
              -   Joyce K. Reynolds is also a very prominent author, with many RFCs co-authored with
                  Jon Postel; in fact, she also acted as RFC editor and helped with IANA management
              -   Robert Braden is a key character in the RFC structure of authors, as shown by his high
                  centrality. Admittedly, he chaired the IRTF End-to-End Research Group which
                  developed many key RFCs, and served as the RFC co-editor for the IETF.




                                         ACKNOWLEDGMENT
  The author thanks E. Boutin (University of Toulon, France) [18] and Dr. Brian Dickens (National
Institute of Standards and Technology, Gaithersburg, MD, USA) for their valuable feedback and
corrections in the dissertation of this paper, V. Cerf [19] for his kind feedback about early Internet
research and R. Bush [20] for having allowed his name to be used in examples on Ego networks. The
author would also like to dedicate this essay to Tim Gartside (ISOC Sphere Labels project) [21] who
provided some of the initial inspiration for this research but who left us tragically before its conclusion.



                                                                REFERENCES

[1] Crocker, S., “Host Software”, RFC Repository, IETF Online Secretariat. Available:
      http://www.ietf.org/rfc/rfc0001.txt?number=1
[2] RFC Editor Web Page. Available: http://www.rfc-editor.org/
[3] Kleinrock, L., “Communication Nets; Stochastic Message Flow and Delay”, McGraw-Hill Book Company, New York,
      1964. (Out of Print) Reprinted by Dover Publications, 1972. (Published in Russian, 1971, Published in Japanese, 1975.)
[4] Licklider, J. C. R., “Topics for Discussion at the Forthcoming Meeting, Memorandum For: Members and Affiliates of the
      Intergalactic Computer Network”. Washington, D.C.: Advanced Research Projects Agency, 23 April 1963.
[5] Engelbart, D. C., et al., “SRI-ARC. A technical session presentation at the Fall Joint Computer Conference in San
      Francisco, Dec. 9, 1968” (NLS demo ’68: The computer mouse debut), 11 film reels and 6 video tapes (100 min.), Engelbart
      Collection, Stanford University Library, Menlo Park (CA) (some footage available on the Internet)
[6] Cerf, V. and Kahn, R., “A Protocol for Packet Network Intercommunication”, IEEE Trans on Communications, Vol 22-5,
      May 1974.
[7] Metcalfe, R., et al., Xerox Corporation, “Multipoint data communication system with collision detection”, U.S. Patent
      4,063,220, 31 March 1975.
[8] RFC Index. Available: ftp://ftp.rfc-editor.org/in-notes/rfc-ref.txt
[9] NetDraw Network Visualization. Available: http://www.analytictech.com/Netdraw/netdraw.htm
[10] Torgerson, W. S., “Multidimensional scaling: I. Theory and method”, Psychometrika, 17:401-419, 1952.
[11] 3D Analysis: The Mage Page. Available: http://www.sbb.duke.edu/kinemage/magepage.php
[12] Einstein, A., “Die Grundlage der allgemeinen Relativitätstheorie”, Annalen der Physik 49, 1916.
[13] Girvan, M. and Newman, M.E., “Community structure in social and biological networks”, Proc. Natl. Acad. Sci. USA,
      99, 7821-7826, 2002.
[14] Johnson, S.C., “Hierarchical Clustering Schemes”, Psychometrika, 32:241-254, 1967.
[15] yEd Graph Editor. Available: http://www.yworks.com/en/products_yed_about.html
[16] Internet Experiment Note (IEN). Available: http://www.postel.org/ien/txt/ien-index.txt
[17] Attributed to Socrates in the Apology, as handed down by Plato
[18] Boutin, Eric, Personal Web Page: http://i3m.univ-tln.fr/imprimer.php3?id_article=88
[19] Cerf, Vinton, Web Page (no affiliation): http://en.wikipedia.org/wiki/Vint_Cerf
[20] Bush, Randy, Personal Web Page: https://archive.psg.com/
[21] Gartside, Tim, Web Page: http://wiki.chapters.isoc.org/tiki-index.php?page=Tim+Gartside&bl=y




              Olivier M.J. Crépin-Leblond has been an Internet user since 1988. He received a B.Eng. Honours degree in Computer Systems and
Electronics from King’s College, London, UK, in 1990, a Ph.D. in Digital Communications from Imperial College, London, UK, in 1997, and a
Specialized Masters Degree in Competitive Intelligence and Knowledge Management from CERAM Business School in Nice-Sophia Antipolis, France, in
2007.
Over the years, he has been involved in many Internet and telecom projects, founded Global Information Highway Ltd in 1995 and is available as a
consultant in Telecom matters. Current interests range from IPv6 deployment, Network Neutrality, Internet Governance and Green Internet to all aspects of
Strategy, Intelligence and Knowledge Management in the 21st Century, especially for bottom-up consensus-based organisations.
He is a member of the IET and senior member of the IEEE, Board member of the English chapter of ISOC and of ICANN’s European At-Large
Organisation (EURALO). In 2010 he is also a Nominations Committee member for ICANN.
Full details available on: http://www.gih.com/ocl.html

A Study of Internet RFC Authors using NetDraw and yEd

  • 1. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 1 A Study of Internet RFC Authors using NetDraw and yEd. Olivier M. J. Crépin-Leblond, PhD. Abstract— The Internet is a very important yet extremely sophisticated aspect of modern life. There has often been discussion in online forums about its origins. In particular, the community feels that it is time to say “thank you” to those people who contributed to its design and evolution. Some of the main contributors are already well known and recognized. This essay shows how to use Social Network Analysis to identify the other significant contributors to this adventure. The analysis rests on the main assumption that the Internet Engineering Task Force’s (IETF) 5000+ “Request For Comments” (RFCs) constitute the engineering basics for the Internet. Here, we use novel methods to extract data from the RFCs using readily available software, and use a suite of free downloadable software to draw several social maps of the RFC authors’ space. Our results highlight recent techniques for social mappings & data analysis in complex interaction environments such as large organizations and emerging bottom-up process governance circles such as those considered for governing the Internet. Index Terms—NetDraw, Mage, yEd, RFC, Father, Internet, Social, Networking, IETF. I. INTRODUCTION N APRIL 1969, Dr. Steve Crocker, then at UCLA, published the first Request for Comments, RFC 1 [1] I entitled Host software. The RFC repository consisting of more than 5000 entries, remains one of the “technical pillars” of the network of networks called the Internet. Once published, an RFC cannot be modified. Many RFCs are therefore superseded (or made obsolete) as new ones replace them, but each publication contributes to the overall Internet edifice. 
As mentioned on the RFC Editor Web page, “The RFC (Request for Comments) series contains technical and organizational documents about the Internet, including the technical specifications and policy documents produced by the Internet Engineering Task Force (IETF).” [2] So who is the “Father of the Internet”? There is no single answer to this frequently posed question. Dr. Leonard Kleinrock is credited with packet switching theory [3]. Dr. Joseph Licklider is credited with the concept that computers could all be connected together into a giant network to talk to each other [4]. What about Dr. Douglas Engelbart [5], inventor of the computer mouse? One of the most important advances in the Internet’s development was the TCP communications protocol, developed in 1974 by Dr. Vinton Cerf and Dr. Robert Kahn [6]. However, circa 1977 the “IP” in TCP/IP was split off from TCP at the urging of Dr. Danny Cohen, Dr. David Reed and Jon Postel, to support real-time, unsequenced packet streams. Furthermore, Dr. Robert Metcalfe is credited with co-inventing Ethernet [7], which today is the basic physical communication standard in most wired networked computers. How do all these people relate to each other?
Draft manuscript completed December 5, 2008. Revised April 2009. Working Title: “Will the Real Father(s) please stand up?” This work was supported in part by Global Information Highway Limited. The Author is with Global Information Highway Ltd, 7 Kensington Church Court, London, W8 4SP, UK. (e-mail: ocl@gih.com) © 2008/2009 Olivier MJ Crépin-Leblond. All Rights Reserved. The Copyright for this paper rests with the Author but permission to freely distribute the information contained within this publication is granted provided the source of the article is credited. Parts of this document may be reproduced in a commercial publication ONLY if prior permission has been granted by the copyright holder.
  • 2. However, the Internet is not solely TCP/IP and Ethernet. A great number of services and other protocols at each layer of the Internet model make this network of networks what it is today. It is therefore likely that each protocol and component of today's Internet has several “fathers” (and “mothers”). In fact, there are several thousand such contributors, both inside and outside the realm of RFC space. Nevertheless, because their proposals are contained in the many RFCs, we decided to look specifically at the Internet standards, RFCs and their authors, possibly the largest “family” of Internet pioneers and contributors available. This essay serves to determine the most prolific authors/contributors to the RFC database and to extract a social network of RFC authors in order to better understand their working relationships and spheres of influence. It uses modern social network engineering tools to make the vast amounts of data available to us today more easily understandable. It will also serve to highlight the shortcomings of such a method, mainly caused by its restricted input data set consisting solely of the RFC database. Why this research? By undertaking this research, we show the use of social networking topology modeling to elucidate the workings of bottom-up processes promoted to construct at-large governance. We define a methodology for such study and look forward to such an analysis being used in future organizational processes involving large groups of participants. Finally, we explore avenues to more fully comprehend the change in social paradigm that the Internet brings to the traditional governing processes used in non-Internet regimes.
  • 3. II. DATA COLLECTION METHOD A. Collecting Data The source data of the study was loaded from the RFC Editor FTP site as the RFC Bibliographic Listing (Created 09/08/2008) [8]. This has the advantage of respecting a set format which is more easily machine-readable than other RFC documents. This resulted in 5340 RFCs indexed. B. Refining/Formatting Data Using data mining techniques to extract the names of authors and their interpersonal relationships from the list of RFC authors forms a crucial part of the work. No purpose-built software was used for data mining: the data set was filtered in several stages using text processing techniques available to anyone able to master them in standard Microsoft software. This consisted of importing the list of RFC authors as a text file into MS Word and reformatting the text with even delimiters using the “replace” functions inherent in that software. The resulting file was imported into an MS Excel table with each line corresponding to one RFC entry and listing its authors, one name per column (a formatted table of authors working together). The most time-consuming process was to manually crosscheck the accuracy and alignment of the data, due to errors caused by inconsistent formatting of the original file. For example, missing punctuation delimiters in the MS Word file caused names to be placed in the wrong columns. Intermediate stages included tables 52 columns across and 5 170 rows in height. This table was transformed (using cut/paste) into a linear numbered referential X-Y listing of authors containing 10 735 entries. The file was imported into an MS Access database. Two cross-linking rules were set up. The first one served to add up the total number of publications per author. The second one was used to add up the number of publications of each pair of authors. The input included a table of 10 735 entries.
The outputs consisted of a table of 3 480 entries for the authors listing and a table of 17 266 pairing links. This constituted our network of authors. Cutting and pasting into a text file and adding the correct formatting code resulted in a file satisfying the input “.vna” format required for the NetDraw Software. The format is human-readable and therefore easy to generate manually or automatically, without being a proprietary binary file format. It is shown next.
  • 4.
*Node data
ID, publi
Postel_J. 205
McCloghrie_K. 92
Rose_M. 75
Rekhter_Y. 69
Reynolds_J. 64
Schulzrinne_H. 62
McKenzie_A. 60
Braden_R. 51
Crocker_D. 51
[…]
*Tie data
from to intensity
Postel_J. Reynolds_J. 37
Reynolds_J. Postel_J. 37
McCloghrie_K. Rose_M. 26
Rose_M. McCloghrie_K. 26
[…]
ID is the author’s name; publi is a variable denoting the number of publications; intensity is the number of publications for the author pair. Obviously, this collaboration is reciprocal so it is automatically shown going both ways. “[…]” denotes all further entries. The data mining mechanism defines the data which is made available for the NetDraw software to analyze and plot. Different data sets can be designed for different purposes and the stage of information collection and data mining is therefore crucial in relation to the targeted end results.
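The two cross-linking rules and the export step described in the data collection section can be sketched in a few lines of Python. This is an illustration only, not the Word/Excel/Access procedure the author used: the function name `build_vna`, the sample input and the choice to sort by descending count are assumptions, and the header lines simply mirror the listing shown in the text.

```python
from collections import Counter
from itertools import combinations

def build_vna(rfc_authors):
    """Build .vna-style text from a list of author lists (one list per RFC)."""
    publi = Counter()   # rule 1: publications per author
    pairs = Counter()   # rule 2: publications per pair of authors
    for authors in rfc_authors:
        unique = sorted(set(authors))          # ignore duplicate names in one RFC
        publi.update(unique)
        pairs.update(combinations(unique, 2))  # every co-author pair once
    lines = ["*Node data", "ID, publi"]
    for name, count in sorted(publi.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{name} {count}")
    lines += ["*Tie data", "from to intensity"]
    for (a, b), count in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Collaboration is reciprocal, so each tie is written both ways.
        lines.append(f"{a} {b} {count}")
        lines.append(f"{b} {a} {count}")
    return "\n".join(lines)

# Hypothetical three-RFC sample, not taken from the real data set.
sample = [["Postel_J.", "Reynolds_J."], ["Postel_J."], ["Reynolds_J.", "Postel_J."]]
vna_text = build_vna(sample)
```

Writing `vna_text` to a file ending in `.vna` would give a file in the same shape as the listing; whether NetDraw accepts every detail of this spacing has not been checked here.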
  • 5. III. GLOSSARY OF TERMS In order to analyze a social network, we start by looking at each individual. In the field of bottom-up analysis, all networks are composed of groups (or sub-graphs). When two participants have a tie, they form a "group". One approach to thinking about the group structure of a network begins with this most basic group, and seeks to see how far this kind of close relationship can be extended. This is a useful method, because sometimes more complex social structures evolve, or emerge, from very simple ones, and this is the type of hidden information which we are hoping to detect when analyzing the network. Social networking analysis relies on graph theory, a discipline which has been traditionally mathematical in nature. Because each discipline speaks a particular language, it is important to define a restricted number of terms which will be used at length in this essay. For the sake of easy referral, those terms are presented here, taking into account the context of our analysis. In general, different terms sometimes have the same meaning depending on the context (bibliographical, scientific, geographic, mathematical, etc.). Their equivalency is shown here. A “node”, also referred to mathematically as a “vertex” (plural: “vertices”), is a point representing a single RFC author. In NetDraw, this is also called a “symbol”. In the paragraph above, we referred to a node as a “participant” or an “individual”. In order to reduce confusion, from here on we use only “node” and “author”. When two or more nodes (RFC authors) work together on an RFC, they are linked by a “line”. A line therefore ends at nodes. In NetDraw, this is also called a “link”. A mathematical designation of a link is an “edge”. All three terms will be used in this essay. A “graph” is the set of nodes and set of lines between pairs of nodes, as visualized in 2 or 3 dimensions.
A “network” consists of a graph and additional information on the nodes or the lines of the graph. This is effectively what we are building with NetDraw. A “cluster” is a group of 2 or more nodes connected together. A “clique” is a maximal complete sub-network containing 3 nodes or more. It is a specific form of cluster. In graph theory, this sub-set of a network contains nodes which are more closely and intensely tied to one another than they are to other members of the network. Strictly speaking a group is identified as a clique when every node is directly linked to every other node in the group. A “dyad” is the smallest grouping of nodes, that is, two nodes linked together. “Betweenness” is defined as the degree a node lies between other nodes in the network. In effect, it is an intermediary, also known as a bridge or a liaison. Therefore, it is the number of other nodes it links directly or indirectly together through its own links. The degree of betweenness is important in a social network because it defines the nodes connecting sometimes vastly different groups together. “Closeness” is defined as the degree a node is near all other nodes in a network (directly or indirectly). Thus, closeness is the inverse of the sum of the shortest distances between each node and every other node in the network. A node with a high degree of closeness is more “central” to the network than one with lower closeness. “Pendants” are nodes connected to the rest of the network through a single link. “Isolates” are nodes which are not connected to any other node in the network. In our case, this is an author having published all of his or her RFCs solo.
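Several of the glossary terms translate directly into small functions. The sketch below assumes the network is held as a Python dictionary mapping each author to the set of co-authors; the toy network and all function names are hypothetical. Closeness follows the definition above (inverse of the sum of shortest distances), with the added assumption that unreachable nodes are simply left out of the sum.

```python
from collections import deque

def isolates(adj):
    """Nodes with no links at all (authors who only published solo)."""
    return sorted(n for n, nbrs in adj.items() if not nbrs)

def pendants(adj):
    """Nodes with exactly one link. Note that both members of a dyad
    also have one link, so a dyad would need to be filtered separately."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)

def closeness(adj, node):
    """Inverse of the summed shortest distances from `node` to the nodes
    it can reach, found by breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0

# Toy network: B, C and D form a clique, A hangs off B, E is alone.
toy = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
    "E": set(),
}
```

In this toy network B reaches every other connected node in one step, so it has the highest closeness, which matches the intuition that it is the most “central” author.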
  • 6. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 6 IV. PLOT AND ANALYSIS NetDraw [9] is a network visualization software that can be downloaded from the Internet for free. Its license agreement allows it to be freely copied. A set of analytical protocols is available to extract meaning from the data. The algorithms included in the software are used in social network analysis, micro-molecular analysis, physics as used in astronomy, and other disciplines. In this section, results will be presented for several types of analysis. A. Circle Layout 1) Method/Theory The Circle Layout uses a simple algorithm to plot nodes in a geographic circle. In NetDraw, it is possible to define the order of the nodes around the circle to be alphabetic or depending on the number of RFCs published. The best connected nodes are found by simply looking at the concentration of links and their thickness. User intervention is however required to detect pendants since these are also plotted within the circle and are not immediately discernable. 2) Results A graph of the resulting plot is shown in Figure 1. Figure 1: Network nodes plot using the Circle Layout (Authors having published 20+ RFCs) The parameters used for the plot were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications
  • 7. • Link thickness between two nodes according to number of publications the authors have written together There is a concentration of links around Jon Postel. This is to be expected since, as RFC Editor for many years, his contribution to the RFC process vastly exceeded any other author’s contribution. Other link concentrations can also be easily discerned, directly related to the closeness of each author. A dyad, a few isolates, as well as several pendants are visible. The order around the circle is set automatically by the program using a user-chosen parameter, in this case alphabetical order. Another straightforward parameter which could be chosen for this function is the number of links to other nodes. Nonetheless, neither parameter avoids the pitfalls of the layout, which require a human eye to correct: • The dyad, pendants and isolates had to be extracted manually from the circle’s layout; • Nodes are not arranged in an order which reduces link distance. For example, Malkin_G is connected to Reynolds_J and Baker_F but is geographically located on the other side of the circle, thus adding to a possibly false impression of extensive inter-connection between nodes. 3) Scaling up Loosening the data subset constraint of 20 RFC publications per author brings more nodes into the picture. The restricted data results may show no connection between the isolates and the main group, but this may only be due to the constraints used. In fact, they may connect to the main group via other authors who do not satisfy the sample’s constraints but have a high degree of betweenness. Reducing the constraint by selecting authors of 10 RFCs or more (171 authors) reveals an increase in mesh density within the network. This is shown in Figure 2. Figure 2: Network nodes plot using the Circle Layout (Authors having published 10+ RFCs)
  • 8. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 8 Removing the subset constraints altogether shows the overall graph shape of the network, including all 3 480 authors, as shown in Figure 3. It is clear that RFC authors are well connected together and that the RFC process provides a real sense of community. Figure 3: Network nodes plot using the Circle Layout (all RFC authors) 4) Conclusions The Circle layout utilizes a simple algorithm to display nodes in a geographic circle. Its advantages are reduced computing processing power requirements and a display giving the eye a clear sense of cross-group connectivity. Its weaknesses are individual anomalies such as the ill-placing of isolates, dyads and wrong placement of nodes which are connected to a reduced number of other nodes. The algorithm does not take into account the geographical positioning of nodes according to their links to other nodes. Both weaknesses can be corrected by human intervention. As a result, the algorithm is very useful for displaying social interaction between the authors of RFCs and detecting some of the synergies that originated in building the RFC standards.
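The circle layout itself needs very little computation, which is why it scales to all 3 480 authors. A minimal sketch, assuming nodes are spaced evenly in the order supplied (the function name and the four-author sample are illustrative, and NetDraw's exact placement rules are not documented in this essay):

```python
import math

def circle_layout(names, radius=1.0):
    """Place names evenly around a circle, in the order given."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }

# The ordering is the user-chosen parameter: alphabetical, or by RFC count.
publi = {"Postel_J.": 205, "McCloghrie_K.": 92, "Rose_M.": 75, "Rekhter_Y.": 69}
alphabetical = circle_layout(sorted(publi))
by_count = circle_layout(sorted(publi, key=publi.get, reverse=True))
```

Because positions depend only on the ordering and not on the links, two heavily linked authors can land on opposite sides of the circle, which is the weakness noted in the conclusions above.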
  • 9. B. Multi-Dimensional Scaling (MDS) Analysis 1) Method/Theory Multi-Dimensional Scaling (MDS) Analysis [10] comprises a set of statistical techniques used together to visualize data in an N-dimensional space. The MDS algorithm looks at similarities within the data and assigns a location to each node of the input network. This algorithm is particularly suited for 3D visualization. MDS is not so much an exact procedure as a way to "rearrange" nodes in an efficient manner, so as to arrive at a configuration that best approximates the link structure. 2) Results & tri-dimensional MAGE Plot The parameters used for the plot were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications • Link thickness between two authors according to number of publications written together MDS analysis yields poor results when plotted in 2 dimensions because the nodes overlap each other, thus making the graph illegible. However, it is possible to view a 3D graph by exporting the graph data (in Kinetic Image .kin format) to a separate (free) 3-dimensional rendering program named MAGE [11]. MAGE is used for all sorts of 3-dimensional rendering in fields such as molecular chemistry and physics, biology, mathematical analysis and even archeological modeling. The Kinetic Image export from NetDraw makes it possible to display the network in 3D, as seen in Figure 4. Figure 4: Mage visualisation of authors of 20+ RFCs The overall structure consists of a main cluster of nodes and several isolates. Pendants are also clearly discernible. Nodes at the center of the cluster can be clearly seen as being more connected. An important feature of MAGE is the ability to rotate the structure taking any node as an axis. Zooming in/out is also possible. Rotation is a particularly important cognitive process for the brain to understand 3D structures, although we are only using a subset of the features of MAGE.
  • 10. The zoom feature is illustrated in Figure 5, which shows a clique within the network structure. This shows a working group of authors who wrote several RFCs together. Whilst it does not mean that all authors were present in each RFC, it shows extended collaboration between the authors represented by the nodes. Figure 5: Zooming in on a cluster within a Mage Plot of Multi-dimensional Analysis Once zoomed in, rotating the structure around the central cluster’s node is also possible and yields good results. 3) Scaling up Multi-Dimensional Scaling analysis is a processor- and memory-intensive method since its results are best represented in 3 dimensions. A test run was undertaken by selecting authors having published at least 10 RFCs. This brought the number of authors up to 171. MDS analysis, although demanding much processing power, gave poor results, even when plotted using MAGE. The cause was traced to tight clustering of the nodes, thus requiring parameters in MAGE to be tweaked to omit displaying the nodes. This resulted in a diagram showing the links only, a wire frame of the whole structure which required maximum zooming in to be displayed. The resulting view was very unclear. Scaling up the MDS analysis from the initial 64 nodes to hundreds or even thousands of nodes requires more computing power and several gigabytes of memory. Insufficient memory triggers buffer overflows which crash the software. Future versions of the software might avoid this condition, although complexity grows steeply as the node count increases.
  • 11. 4) Conclusions Multi-Dimensional Scaling analysis is useful in displaying the network in three dimensions. The NetDraw feature to transfer the results to MAGE (through a .kin export file) is very useful for plotting the network in true 3D, including changing the position of lights as well as visualizing the network from any angle and traveling virtually through it. Node clusters can be visualized with ease. However, some information is lost, for example thickness of link or size of node. It is hoped that future versions of NetDraw and MAGE will incorporate those features to enhance the visualization.
  • 12. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 12 C. Geodesic Distance through Spring Embedding 1) Method/Theory The Spring Embedding method is based on the geometric theory of gravitation [12], although constrained to the 2-dimensional plane, hence the crowded display. Each node is considered to be acting on the other nodes through attraction and repulsion, and the links between the nodes are taken as springs enabling the nodes to travel. This iterative method places nodes on the plane and eventually reaches a stable state, provided enough iterations are calculated. The “geodesic distance” is the shortest path between two nodes. If node x is connected to node y which is connected to node z, the distance from node x to node y is the length of the geodesic distance from x to y. The geodesic distance from node x to node z is the sum of the geodesic distances from x to y and from y to z. In the context of social networking, this enables us to analyze the “networking extent” of an individual based on his or her number of connections. In other words, how well are they connected to the rest of the network? This is the concept of “centrality”, also referred to as “closeness” and described earlier in the glossary. The constraints of the layout criteria, whilst introducing some error margin, included “node repulsion” and “equal edge lengths”. Node repulsion introduces a minimum distance between nodes displayed on the graph and is required to avoid a clustering of the nodes to the extent that the overall diagram would be unreadable. Equal edge lengths is self-explanatory and serves to constrain the length of the links between the nodes in order to provide some space within the graph. It does not mean that all links will have the same lengths: the program will just try to make the lengths as similar as possible. Both constraints were used specifically to improve the readability of the graph. 
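The attraction and repulsion idea can be illustrated with a small force-directed routine. This is a sketch in the style of the well-known Fruchterman-Reingold scheme, not NetDraw's implementation, which the essay does not describe: the force formulas, the constant `k`, the starting positions on a circle and the linear cooling schedule are all assumptions made for the example.

```python
import math

def spring_layout(nodes, edges, iterations=300, k=1.0):
    """Iteratively place nodes in 2D: all pairs repel, linked pairs attract."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for it in range(iterations):
        temp = 0.1 * (1 - it / iterations)   # maximum move, shrinking each step
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):        # node repulsion between every pair
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                rep = k * k / d
                force[u][0] += dx / d * rep
                force[u][1] += dy / d * rep
                force[v][0] -= dx / d * rep
                force[v][1] -= dy / d * rep
        for u, v in edges:                   # springs pull linked nodes together
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            att = d * d / k
            force[u][0] -= dx / d * att
            force[u][1] -= dy / d * att
            force[v][0] += dx / d * att
            force[v][1] += dy / d * att
        for v in nodes:                      # move each node, capped by temp
            fx, fy = force[v]
            mag = math.hypot(fx, fy)
            if mag > 0:
                step = min(mag, temp)
                pos[v][0] += fx / mag * step
                pos[v][1] += fy / mag * step
    return pos

# The x - y - z chain from the text: x and z are two links apart.
layout = spring_layout(["x", "y", "z"], [("x", "y"), ("y", "z")])
```

In this small chain the unlinked pair x and z is pushed further apart than either linked pair, so distance on the plot loosely tracks geodesic distance, which is the property the method relies on.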
The analysis being an iterative process, separate runs do not yield a geometrically identical layout, although the layouts produced are very similar in shape and geometric positioning. The structures and clusters are the same. This type of plot is readily available in the NetDraw software. 2) Results The parameters used for the plot were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications • Link thickness between two authors according to number of publications written together • Layout using Spring Embedding iterative simulation • Number of iterations: 100 to 1 billion Since this is an iterative analysis, increasing the number of iterations should improve the “accuracy” of the results. In fact, repeating the analysis from 100 iterations in regular steps to 1,000,000,000 iterations showed no significant difference in the layout. The resulting graph is shown in Figure 6. With each node representing an individual researcher, individuals located nearer the center of the diagram act as bridges between various groups of researchers. It is also easy to see clusters of nodes which are well interlinked. Cliques are clearly visible, and clusters including thicker links indicate more extensive collaboration between a number of authors.
  • 13. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 13 Attempting to export the .kin data to a MAGE 3D plot yielded results which did not appear as conclusive as the MDS analysis due to the high cluster concentration of nodes – the export process shortened the link length to such an extent that the viewing of the cluster was affected. Figure 6: Graph of Spring Embedding using Geodesic Distances, Node Repulsion and Equal Edge Lengths (64 authors having published 20+ RFCs). 100 Million iterations.
  • 14. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 14 3) Scaling up Scaling up and running the simulation under constraint that authors publish 10 or more RFCs, brings the total number of nodes to 171. Sadly, the clutter caused by more nodes makes the resulting graph less useful than for a more restricted input set. It is possible to discern the largest nodes, but smaller nodes are seen with difficulty. (Figure 7) Figure 7: Graph of Spring Embedding using Geodesic Distances, Node Repulsion and Equal Edge Lengths (171 authors having published 10+ RFCs) The new authors join the whole network with several pendants, very few isolates and only one dyad. With up to all 3000+ authors, the network becomes difficult to interpret due to lack of space. 4) Conclusions The advantage of Geodesic Distance analysis using node repulsion is that of providing results which are easily displayed in two dimensions. Since the analysis is based on an iterative process, the computing power required for such an analysis can be user-selected. Lower iteration values yield results which are slightly more unstable in geometric placement of the nodes. Cliques, clusters, isolates and other features of the network can be clearly identified and reliable conclusions can be derived about the centrality of an individual thanks to his or her final geometric location within the resulting “network of people”.
  • 15. D. K-Core Analysis 1) Method/Theory K-Core analysis is based on the clustering of groups of people who are closely connected together. It is a way to study the nested structure of a modular organization. The K-Core of a network is the maximal sub-network in which every node has degree at least k within that sub-network. For example, the 1-core is the original network without its isolates; the 2-core is the network with all the pendants also removed (repeatedly, until none remain), etc. Increasing k removes links and nodes which are less closely connected to the network. 2) Results The parameters used for the plot were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications • Link thickness between two authors according to number of publications written together Four distinct groups of people are established: three main groups, and one group of authors that published solo. It can be seen that clustering is caused by “similarity” data. As expected from the algorithm, the defining factor for the clustering is the number of links originating at each node. This in itself is a limitation. When performing K-Core analysis, the resulting groups show inconsistencies. Pendants and nodes connected to 2 groups with a single link to those groups, or to 2 nodes in the same group, are defined as a separate group. This, of course, is a correct representation of K-Cores, but of no use for our purpose of organizing the groups visually. Manual translation of these nodes into the correct groups was therefore required and the resulting graph is shown in Figure 8. Dyads do not fare well either in K-Cores since they are not connected to the main group.
Pendants also need to be translated since they are not seen by the software as having integrated well with any of the cliques present, although in real life, a pendant would probably benefit well from the clique through the node to which it was linked. 3) Scaling up Scaling up, running the simulation under constraint that authors publish 10 or more RFCs, brings the total number of nodes to 171. Whilst the overall graph including all Ks is too crowded, it is possible to run a different type of K-Core analysis, by selecting only groups with specific value for K. This selects the nodes having a specific closeness or better.
  • 16. Figure 8: K-Core Analysis of 64 authors having published 20+ RFCs Since the network is divided into six groups (Group 1, Group 2, Group 3, Group 4, Pendants, Isolates), the value of k can be selected to be any number from 1 to 6. k=0 selects the isolates, k=1 the pendants, k=2 the nodes having 2 links, etc. Selecting nodes with k=6 and plotting them using Spring Embedding with Geodesic Distances, Node Repulsion and Equal Edge Lengths, it is possible to display the most tightly connected nodes in the network. These 15 nodes might not be the most central, but form the highest clique in the overall graph. This is shown in Figure 9. In another run, a value of k=5 was selected, thus incorporating more nodes in the graph, as shown in Figure 10. The network obtained is the core network to which most other nodes link. In the real world, and in non-technical language, the 64 authors shown in this graph are the “pillars of the community” in that they have published in excess of 10 RFCs and have also networked extensively with their peers. Some authors might have published more RFCs than them, but their network might not have been as wide-ranging. 4) Conclusions Whilst K-Core analysis might appear, on first use, not to yield meaningful results, this is countered by its usefulness in finding the nodes with the highest closeness within our target group. By performing K-Core analysis and displaying the results grouped according to the K-Core criteria, it is possible to see how many nodes of each type are present in the network. Displaying the results using the Spring Embedding GeoDesic Layout shows who the most socially connected authors in our network are. The mixing and matching of parameters (constraints about number of RFCs published, k value, display and grouping methods) can reveal more interesting facts about the social network than first meet the eye.
  • 17. Figure 9: K-Core Analysis K=6 of authors having published 10+ RFCs / Spring Embedding GeoDesic Layout Figure 10: K-Core Analysis K=5 of authors having published 10+ RFCs / Spring Embedding GeoDesic Layout
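The k-core idea from the Method/Theory paragraph of this section can be sketched as repeated pruning: keep removing any node with fewer than k remaining links until nothing more can be removed. This uses the standard graph-theory definition; the function name and the toy network are hypothetical, and NetDraw's own routine may differ in detail.

```python
def k_core(adj, k):
    """Return the set of nodes in the k-core of an undirected network.

    `adj` maps each node to the set of its neighbours (links both ways).
    """
    core = {node: set(nbrs) for node, nbrs in adj.items()}  # working copy
    changed = True
    while changed:
        changed = False
        for node in list(core):
            if len(core[node]) < k:
                # Drop the node and remove it from its neighbours' link sets.
                for nb in core[node]:
                    core[nb].discard(node)
                del core[node]
                changed = True
    return set(core)

# Toy network: a triangle A-B-C, a pendant D attached to A, an isolate E.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": set(),
}
one_core = k_core(toy, 1)
two_core = k_core(toy, 2)
```

Here the 1-core drops only the isolate and the 2-core also drops the pendant, leaving the triangle. Removing one node can push a neighbour below the threshold, which is why the pruning has to loop.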
  • 18. E. Blocks & Cutpoints In this analysis, the software checks for nodes whose removal would cut parts of the overall network off from the rest of the structure. 1) Method/Theory and results The parameters used in our analysis were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications • Link thickness between two authors according to number of publications written together Results using this method are not useful in our case: the subset of authors selected has worked extensively together since it is really composed of the core of our network of 3 480 authors. As a result, the overall network of nodes features enough redundancy to have no single “point of failure” (i.e. “cutpoint”) except where pendants attach. Since this can be established visually, there is no requirement to run the analysis and plot results. However, this type of analysis would be useful in more loosely-connected communities because it tags the nodes which are essential in linking disparate clusters which would otherwise have been unconnected. 2) Scaling up As the constraints on the RFC authors are eased by allowing authors having published fewer than 20 RFCs into the network, it is possible to discover where the cutpoints to these other authors are. This could determine which of the core authors bring connectivity between the core network of authors and the rest of the RFC community. However, in the case of RFCs, the network is too closely connected to be affected by cutpoints. 3) Conclusions “Blocks and cutpoint” analysis is useful in examining loosely-connected networks. This type of analysis yields ambiguous results when used on closely-connected networks such as the network of RFC authors since the only critically connected components of the network are pendants, and those are easily detected by eye.
It is worth noting that this type of analysis can be combined with any of the above analyses since the tagging of blocks and cutpoints can be undertaken by changing node colors and shapes. Sometimes, a new network layout can enhance readability whilst keeping block and cutpoint tagging active.
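A cutpoint can be found by brute force: remove each node in turn and see whether the network falls into more pieces than before. The sketch below takes that simple approach for clarity (faster depth-first methods exist); the function names and toy network are hypothetical and this is not a description of NetDraw's routine.

```python
def components(adj, skip=None):
    """Count connected components, ignoring the node `skip` if given."""
    seen = set()
    count = 0
    for start in adj:
        if start == skip or start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            for nb in adj[node]:
                if nb != skip and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count

def cutpoints(adj):
    """Nodes whose removal increases the number of components."""
    base = components(adj)
    return sorted(n for n in adj if components(adj, skip=n) > base)

# Toy network: a triangle A-B-C with a pendant D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
found = cutpoints(toy)
```

In the toy network the only cutpoint is C, the node holding the pendant, which mirrors the observation that in a densely linked core the only critical connections are those to pendants.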
  • 19. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 19 F. Factions A “faction” is a group or clique within a larger organization, or the like. In graph theory, a "faction" is a part of a graph in which the nodes are more tightly connected to one another than they are to members of other “factions”. The NetDraw program can iteratively determine the most appropriate division of the network using a “factioning” algorithm. It is worth comparing this analysis with K-Core data which is based on similar principles of local clustering or sub-structure. 1) Method/Theory The algorithm is different from the K-Core algorithm in that NetDraw actually asks how many factions should be created. The algorithm then forms the number of groups desired by seeking to maximize connection within, and minimizing connection between the groups. Nodes are colored, and the information about which nodes fall in which partitions (i.e. which cases are in which factions) is saved to the node attributes database. 2) Results The parameters used in our analysis were as follows: • Data subset: author has published more than 20 RFCs (64 authors satisfied this rule) • Node size according to number of publications • Link thickness between two authors according to number of publications written together In our example, expanding the K-Core analysis described earlier, it was assumed that we could initially divide the network into 5 factions. This is shown in Figure 11. It is then possible to explore further faction division by increasing the parameter for the number of clusters required. This yields sometimes peculiar layouts, shown in Figure 12. Figure 11: 5 factions of authors having published 20+ RFCs / Layout, node color & shape, according to factions
  • 20. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 20 Figure 12: 10 factions of authors having published 20+ RFCs / Layout, node color & shape, according to analysis when dividing into 5 factions. Note which factions have been divided – hence which are the weaker factions There appears to be no single correct or incorrect “answer” using the faction algorithm. There is just a measure of the faithfulness of a node to a cluster depending on its connection to one, two, or more groups. Figure 13: 5 factions of authors having published 20+ RFCs / Layout, node color & shape, according to analysis when dividing into 10 factions. Note which nodes have been grouped.
It is therefore possible to determine which factions are more strongly linked together and which are likely to break apart when circumstances change. It is also possible to see which new factions are likely to be created.

The algorithm can also be used in the other direction. For example, it is possible to start with a larger number of factions and then reduce that number, so that groups merge together. It is interesting to see that there is no homogeneous gathering of all nodes when the number of factions is reduced. An example is shown in Figure 13.

Another oddity is the grouping of nodes which are not connected to each other. The algorithm groups them not because of their inter-connectivity, but because they do not fit in any other faction.

The results can be displayed not only in the layout rendered by this algorithm, but also in another layout, such as K-Cores, Circles, etc. This introduces interesting differences, since a node which is part of one cluster during faction analysis might be part of another group in a K-Core grouped layout.

3) Scaling up
Scaling up by running the simulation under the constraint that authors have published 10 or more RFCs brings the total number of authors to 171. Reading individual node labels is impossible at this density. However, it is possible to remove labels and perform macro-analysis. For example, it is possible to divide the network into 10 factions and assign a color and shape to each faction, then group the factions together by reducing the network to 5 factions. Some factions do not merge wholly with a single other faction; instead, they distribute their nodes among the other factions according to the affinity each node has with nodes in those factions. This makes for interesting analysis of real-world social grouping, for example in electoral processes.
This is shown in Figure 14.

Figure 14: 10 factions above are grouped into 5 factions below. It is possible to see how some clusters splintered among several factions. Network subset: authors having published 20+ RFCs

4) Conclusions
The “Factions” feature in NetDraw is useful for grouping authors into clusters and for detecting those authors who have an affinity with another group when the network is divided into a different number of factions. This type of analysis is sometimes more conclusive when a network is more loosely interconnected than in our example, which makes use of a restricted number of authors who are very closely related to each other.

This analysis is also useful when grouping clusters according to affinity. Our example shows that clusters do not merge wholly and evenly, since some factions divide themselves among the remaining clusters.

As with any social network analysis, care must be taken not to jump to conclusions from a first examination, because oddities might appear in the clustering process. These are caused by a lack of fit within any other group, rather than by similarity or good connectivity within the group itself.
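The criterion described under Method/Theory (dense connection within factions, sparse connection between them) can be illustrated with a minimal Python sketch. The error count used here, missing ties inside a faction plus ties crossing between factions, is an assumed fitness measure, and the greedy search is a stand-in chosen for brevity; neither is claimed to be what NetDraw actually implements. The six-node network is a hypothetical toy example, not RFC data.

```python
from itertools import combinations


def faction_errors(nodes, edges, faction_of):
    """Count departures from an ideal split: missing ties inside a faction
    plus ties that cross between factions (assumed fitness measure)."""
    ties = {frozenset(edge) for edge in edges}
    errors = 0
    for a, b in combinations(nodes, 2):
        same_faction = faction_of[a] == faction_of[b]
        tied = frozenset((a, b)) in ties
        if same_faction != tied:
            errors += 1
    return errors


def refine_factions(nodes, edges, faction_of, n_factions):
    """Greedy local search: move one node at a time while errors decrease."""
    faction_of = dict(faction_of)
    best_errors = faction_errors(nodes, edges, faction_of)
    improved = True
    while improved:
        improved = False
        for node in nodes:
            home = faction_of[node]
            for candidate in range(n_factions):
                faction_of[node] = candidate
                errors = faction_errors(nodes, edges, faction_of)
                if errors < best_errors:
                    best_errors, home, improved = errors, candidate, True
            faction_of[node] = home  # keep the best faction found for this node
    return faction_of, best_errors


# Hypothetical toy network: two triangles joined by a single link c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1}
factions, errors = refine_factions(nodes, edges, start, n_factions=2)
```

Counting missing ties inside a faction, and not only ties between factions, prevents the trivial solution of placing every node in a single faction. It also shows why poorly connected nodes can end up grouped together: they are placed wherever they cause the fewest errors.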
G. Girvan-Newman algorithm

The Girvan-Newman algorithm is one of the methods used to detect communities in complex systems. The theory developed by Girvan and Newman [13] defines communities as not being quite the same thing as clusters.

1) Method/Theory
A “community” is a cluster of nodes in which the inter-relationship between nodes is high, owing to a high concentration of links. A clique would fit this description, but a community is not restricted to a clique. What distinguishes a community from a cluster is that the links to nodes in other communities are specifically less dense, whilst clusters do not take this into account. Without going into details about this algorithm, its basic function is as follows:
1. Calculate the betweenness of all existing links in the network;
2. Remove links with the highest betweenness;
3. Recalculate betweenness of all links affected by the removal;
4. Repeat steps 2 and 3 until no links remain.

2) Results
The parameters used in our analysis were as follows:
• Data subset: author has published more than 20 RFCs (64 authors satisfied this rule)
• Node size according to number of publications
• Link thickness between two authors according to number of publications written together

The analyst can choose how many communities to create from the network. Running the analysis produces node data which is saved with the rest of the data related to each node. The results can be displayed by selecting the “Group by attribute” layout. It is therefore possible to reach a large number of results, depending on the number of communities chosen. With other methods, the meaning of the resulting data is left to the analyst’s eye. Selecting too few communities will cluster together nodes which are too loosely connected. Selecting too many communities will break up the more tightly knit communities, but will show the cliques within them in greater detail.
However, the Girvan-Newman algorithm introduces the variable Modularity Q. The algorithm calculates the Modularity of each grouping, and Q is an indicator of the quality of clustering. Calculating for 2 to 15 clusters in our target network, the following results were obtained:

Clusters   2      3      4      6      7      8      9      13     14     15
Q          0.013  0.294  0.460  0.500  0.493  0.487  0.482  0.463  0.458  0.442

Q is maximized when dividing the network into 6 clusters. This result therefore appears to be the most befitting group structure in our network, and this is shown in Figure 15.

Figure 15: Girvan-Newman algorithm clustering for 6 clusters (Modularity Q=0.500)

For comparison, it is then possible to mix several analyses on one diagram. For example, the above diagram layout can be kept while node attributes are modified according to other parameters, such as K-Core analysis. With such a plot, it is possible to find the degree of connectivity of nodes within each community. This is shown in Figure 16. In the diagram, the nodes with the highest K-Core value are shown as upward-pointing red triangles, the next as downward-pointing blue triangles, then yellow circles in squares, etc. This provides information about the key connecting nodes, both within and between communities.

Figure 16: Girvan-Newman algorithm clustering for 6 clusters (Modularity Q=0.500) & K-Core Analysis
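The four steps listed under Method/Theory can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not NetDraw's implementation: links are unweighted and undirected, betweenness is recomputed for all links after each removal (simpler than recomputing only the affected ones), and removal stops once the number of communities requested by the analyst is reached, not when no links remain. The toy network is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every undirected link (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        dist, paths = {source: 0}, {source: 1.0}
        preds = {node: [] for node in adj}
        order, queue = [], deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], paths[v] = dist[u] + 1, 0.0
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    preds[v].append(u)
        credit = {node: 0.0 for node in order}
        for v in reversed(order):         # push path credit back to source
            for u in preds[v]:
                share = paths[u] / paths[v] * (1.0 + credit[v])
                score[frozenset((u, v))] += share
                credit[u] += share
    # Each undirected path was counted once from each of its two end nodes.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components of the graph, as a list of node sets."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in group:
                    group.add(v)
                    queue.append(v)
        seen |= group
        groups.append(group)
    return groups


def girvan_newman(edges, n_communities):
    """Remove highest-betweenness links until n_communities groups exist."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    groups = components(adj)
    while len(groups) < n_communities and any(adj.values()):
        score = edge_betweenness(adj)
        top = max(score.values())
        for edge, value in score.items():
            if value >= top - 1e-9:       # remove all links tied for the top
                a, b = tuple(edge)
                adj[a].discard(b)
                adj[b].discard(a)
        groups = components(adj)
    return groups


# Hypothetical toy network: two triangles joined by a single link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
communities = girvan_newman(edges, n_communities=2)
```

In the toy network the bridging link c-d carries every shortest path between the two triangles, so it has the highest betweenness and is removed first, leaving the two triangles as communities.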
3) Scaling up
Growing the sample size by loosening the constraint to 10+ publications per author, it is possible to analyze 171 authors. Running the data through the Girvan-Newman algorithm produced values for the Modularity factor Q which differ from those of the smaller data set.

Clusters   7      8      9      10     11     12     13     14     15     16
Q          0.091  0.456  0.446  0.453  0.451  0.455  0.453  0.438  0.438  0.437

In this case, no type of clustering shows a dominant Q modularity. The network could be divided into 8 to 13 communities with similar Q modularity, thus demonstrating a very similar quality of clustering. It is therefore apparent that the Girvan-Newman algorithm does not scale well with our network, since the communities are too tightly knit together, a testimony to the “community feeling” of RFC authors.

4) Conclusions
When analyzing the core network of RFC authors, the Girvan-Newman algorithm produces graphical results which give the impression of being similar to those of other methods. It is useful for finding the clusters of nodes separated by links of highest betweenness, even when zooming onto communities which might initially have appeared to be tightly knit.

The Modularity factor Q is calculated by the NetDraw software as a measure of the quality of clustering, and this allows us to find the most natural type of clustering for the data. This algorithm is consequently very efficient at detecting communities and the most likely grouping of those communities, even when the initial data set is as restricted as the RFC authors list. It yields more accurate results when used with smaller social networks.
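As an illustration of how a modularity value can be computed and used to choose a number of clusters, the following Python sketch applies the commonly cited definition of Newman's modularity for an unweighted, undirected network: for each community, the fraction of links inside it minus the squared fraction of link ends attached to it. Whether NetDraw computes Q in exactly this way is an assumption. The Q values in the table are those reported above for the 171-author network; the seven-link network is a hypothetical toy example.

```python
def modularity(edges, community_of):
    """Newman's modularity Q for an unweighted, undirected network."""
    m = len(edges)
    internal = {}    # links with both ends inside a community
    degree_sum = {}  # link ends attached to a community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical toy network: two triangles joined by a single link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_toy = modularity(edges, two_groups)

# Q values reported above for the 171-author network.
q_by_clusters = {7: 0.091, 8: 0.456, 9: 0.446, 10: 0.453, 11: 0.451,
                 12: 0.455, 13: 0.453, 14: 0.438, 15: 0.438, 16: 0.437}
best = max(q_by_clusters, key=q_by_clusters.get)
plateau = [q_by_clusters[k] for k in range(8, 14)]
spread = round(max(plateau) - min(plateau), 3)
```

For the reported values, the maximum falls at 8 clusters, but Q varies by only 0.010 between 8 and 13 clusters, which is the plateau described in the text.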
H. Hiclus of Geo-distances

Hiclus stands for Hierarchical Clustering, applied here to geodesic distances. This algorithm was developed by Johnson [14] and is used by NetDraw to generate a clustering for every n, where n ranges from 2 clusters to the total number of nodes analyzed.

1) Method/Theory
The Hiclus of geodesic distance is a measure of cohesion in subgroups within the network. Given N nodes that need to be clustered and an N x N distance (or similarity) matrix, it is calculated as follows:
1. Assign each node to its own cluster, so that the distances (similarities) between clusters equal the distances (similarities) between the items they contain;
2. Find the closest (most similar) pair of clusters and merge them into a single cluster;
3. Compute distances (similarities) between the new cluster and each of the old clusters;
4. Repeat steps 2 and 3 until all nodes are clustered into a single cluster of size N.

The geodesic distance in this context makes the assumption that the graph is a three-dimensional object and that the number of links separating two nodes is the distance between them. For example, adjacent nodes have a distance of one. Reaching one node from another by stepping through a third node gives a distance of two, etc.

2) Results
A large set of results is calculated by the program and saved as new attributes for each node. These can therefore be plotted using “group by attribute”. The Hiclus of 5 clusters is shown in Figure 17.

Figure 17: Hiclus of geodesic distance selecting 5 clusters
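The procedure listed under Method/Theory can be sketched in Python as below. This is an illustrative reading only: geodesic distances are computed by breadth-first search on an unweighted network, the distance between two clusters is taken as the smallest node-to-node distance (single-link, one of the rules Johnson described; which rule NetDraw applies is an assumption), and ties are broken by iteration order. The toy network is hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn a list of undirected links into a node -> neighbours mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def geodesic_distances(adj):
    """Number of links on the shortest path between every reachable pair."""
    dist = {}
    for source in adj:
        seen, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def hiclus(adj, linkage=min):
    """Merge the closest clusters step by step; return every level N..1."""
    dist = geodesic_distances(adj)
    unreachable = len(adj)  # longer than any real path in the network
    clusters = [frozenset([node]) for node in sorted(adj)]
    levels = {len(clusters): list(clusters)}

    def gap(c1, c2):
        return linkage(dist[a].get(b, unreachable) for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        levels[len(clusters)] = list(clusters)
    return levels


# Hypothetical toy network: two triangles joined by a single link c-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"),
                       ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
levels = hiclus(adj)
five_clusters = levels[5]  # the partition into 5 clusters
```

Because every level from N clusters down to 1 is kept, the partition for any chosen number of clusters can be read back afterwards, in the same way that NetDraw stores one node attribute per level. Passing `linkage=max` gives the complete-link variant.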
Figure 18: Hiclus of geodesic distance selecting 7 clusters

The Hiclus of 7 clusters (Figure 18) appears more meaningful, since enough groups are formed to show real clustering. As the number of groups is increased (8, 9, 10, etc.), it is possible to see groups splitting. The Hiclus of 15 clusters is shown in Figure 19.

Figure 19: Hiclus of geodesic distance selecting 15 clusters. Groups are splitting up into individuals.
Analysis of these graphs makes it possible to find out which nodes are more likely to break off from a cluster, and in which order. Since NetDraw generates node data up to Hiclus N, where N is the number of nodes in the network, it is possible to gauge the order in which groups will split up. For example, taking the graph of Figure 19 and redefining node colors and shapes according to the Hiclus of geodesic distance with 5 clusters, it is possible to see how the 5 original clusters split up into 15 clusters, some of which are single pendants or dyads. This is shown in Figure 20.

Figure 20: Hiclus of geodesic distance selecting 15 clusters compared with node attributes for 5 clusters

3) Scaling up
Since the algorithm as implemented in NetDraw involves clustering from 2 to the total number of nodes in the graph, this type of analysis does not scale well unless powerful processing resources are used. An analysis of 64 authors (initial subset) yielded the above results. Increasing the sample size to 171 authors served only to crowd the graph to the point of making it less legible. If all constraints are removed, more than 3 000 authors have to be processed, and this has been found to generate superfluous results. This type of analysis is therefore better used for smaller subsets of nodes.

4) Conclusions
The Hiclus of geodesic distance analysis yields results in which the graph is divided into 2 to N clusters, where N is the total number of nodes in the graph. Successive plots, for example Hiclus of 5 and Hiclus of 15, are possible. If the node shape is defined according to its clustering in Hiclus 5, it is possible to see the clustering in Hiclus 15 and the varied make-up of the resulting clusters. Whilst pendants will be the first to break from a cluster, cliques are likely to be the last clusters to divide themselves.
This can be seen clearly by comparing the graphs for Hiclus 5 and Hiclus 15. This is essentially a very useful method to gauge the stability of a group of people.
I. Ego Networks

Each of the analyses presented thus far assumes that all nodes are active throughout the period of activity from which the data was mined. In the case of RFCs, this was unfortunately not always so. For example, Jon Postel passed away in 1998 and this left a huge gap in the RFC space, not only because of his hierarchic position in the social network but also because he was such a pleasant and hardworking individual. This kind of influence could not, however, be measured mathematically.

If one restricts oneself strictly to relationships as defined from data mining, a mathematical measure of an individual’s influence in a network can be calculated in NetDraw. Social network researchers refer to this approach as “Ego Networks”.

1) Method/Theory
The Ego network of a node with geodesic distance 1 consists of all nodes immediately linked to that node. When the geodesic distance is increased to 2, nodes connected to those nodes are included in the graphic, and so forth for higher geodesic distances. NetDraw allows the user to select more than one node’s ego network to find out the geodesic relation between them, depending on each individual ego network’s reach.

2) Results
The parameters used in our analysis were as follows:
• Complete data set of 3 480 authors and 17 000+ links
• Node size according to number of publications
• Link thickness between two authors according to number of publications written together

In order to illustrate the concept of Ego networks, we have simulated the ego network of a well-known RFC author, Dr. Vinton Cerf, as shown in Figure 21.

Figure 21: Ego Network of V. Cerf. (geodesic distance = 1)
Input file discrepancies (Metcalfe_R. and Metcalfe_B.) are treated in Section V.A.1.a. This diagram, just like every other result obtained using NetDraw, can be exported to MAGE and rotated, zoomed and otherwise manipulated in 3D. An example screenshot is shown in Figure 22.

Figure 22: Ego Network of V. Cerf. (geodesic distance = 1) as seen in MAGE 3-D

Another use of the Ego network analysis as applied in NetDraw is the analysis of connection paths between two nodes having a geodesic distance greater than 1. It is possible to plot the Ego network for another author, for example Randy Bush, and relate it to Dr. Cerf’s Ego network, whilst keeping a maximum geodesic distance of 1. This is shown in Figure 23.

NetDraw allows for any combination of geodesic distance and simultaneous node selection (or de-selection, in order to note the “holes” in the network), and further analysis is possible on this sub-network alone, through K-Core, Girvan-Newman, or indeed any other analysis described above. This makes for a very extensive combination of analyses and the possible generation of interesting social patterns within the network.

3) Scaling up
In NetDraw, it is possible to select a geodesic distance of 2 (or more), in order to find the nodes connected to the nodes connected to Dr. Cerf’s node: the 2nd degree of separation. Since the RFC community is well networked, the resultant graph is much more crowded, as seen in Figure 24. The analysis is therefore limited by the readability of the resulting graphs.
Figure 23: Ego Network of V. Cerf. (geodesic distance = 1) relating to R. Bush’s network

Figure 24: Extended Ego Network of V. Cerf. (geodesic distance = 2)
4) Conclusions
The Ego Network analysis is very useful in determining the structure of nodes directly linked to a node and, in turn, the structure of nodes connected to those nodes. It is a useful tool to determine the extent of a node’s social networking reach as well as the social structure between two or more nodes. When used on a data set consisting of a group of people in an organization, it is therefore possible to evaluate an individual’s social influence and immediate surroundings.
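The definition of an ego network given under Method/Theory lends itself to a short Python sketch: a breadth-first search that stops at the chosen geodesic distance. The function name, the adjacency-dictionary representation and the five-node chain are hypothetical choices made for illustration; this is not NetDraw's implementation.

```python
from collections import deque


def ego_network(adj, ego, radius=1):
    """Nodes within `radius` links of `ego`, plus the links among them."""
    reach = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if reach[u] == radius:
            continue  # do not expand beyond the chosen geodesic distance
        for v in adj[u]:
            if v not in reach:
                reach[v] = reach[u] + 1
                queue.append(v)
    members = set(reach)
    links = {frozenset((a, b)) for a in members for b in adj[a] if b in members}
    return members, links


# Hypothetical chain of co-authorship: a - b - c - d - e
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
       "d": {"c", "e"}, "e": {"d"}}

first_degree, _ = ego_network(adj, "c", radius=1)
second_degree, _ = ego_network(adj, "c", radius=2)

# Relating two ego networks: the nodes they have in common.
shared = ego_network(adj, "a")[0] & ego_network(adj, "c")[0]
```

Intersecting two ego networks, as in the last line, gives the nodes through which two authors are related at the chosen distance, which is the kind of comparison described for Figure 23.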
J. Geometrical Analysis using yEd

yEd [15] is a free Java-based graph editor which can be used to generate drawings and to apply automatic layouts to graphs, comparable to those generated by NetDraw. The strength of yEd lies in its ability to re-map complex graph structures into entirely new layouts which might make more sense of the input data and help detect hierarchies or pseudo-hierarchies within a social network. Other layout algorithms make use of geometry to produce Orthogonal or Organic layouts, and Tree and Circular layouts, including multi-radial and plain disc layouts which can detail interconnected rings and star topologies.

NetDraw was used earlier to provide a number of layouts, but yEd’s algorithms are more powerful in re-routing edges (links) to provide a cleaner layout topology, especially when using edge routing, an option which makes edges align with each other.

It is important to note that NetDraw and yEd have entirely different purposes. NetDraw is used to analyze a network to detect clusters, ego networks, etc. yEd is a graph editor used solely to display a network in a variety of topologies. Indeed, most users utilize yEd solely to produce clearer graphs for knowledge representation, software engineering, database schematics, process and workflow illustration, and family trees.

1) Method/Theory
yEd accepts several input file formats, including GraphML, YGF, GML (a popular text-based format), TGF and XML formats. Unfortunately, none of these formats is compatible with any of the formats in which NetDraw data can be exported. The yEd graph therefore had to be built using the integrated graphic editor, by a click, drag and drop process to create nodes and link them together. The input data was read manually from the .vna file generated by the NetDraw software when saving the NetDraw graph.
As a result, 64 nodes and several hundred links were created manually using point and click. Each node was also labeled accordingly. Rectangular boxes were chosen because they are large enough to contain an author’s full name, but it is also possible to modify node shapes, colors and sizes, as well as link thickness, arrowheads, etc. In this respect, yEd has features similar to NetDraw. The only drawback is that it is impossible to change node attributes automatically, although attributes can change under certain conditions when performing specific types of layout analysis, according to the special demands of the resulting graphic.

Arrowheads denote a link’s direction. All arrowheads were removed in order to clear up the clutter generated by so many nodes in such a small topological space. It is important to note that even with arrowheads removed, links keep a direction. Whilst some layout algorithms do not make use of this information, others, such as the hierarchical layout algorithm, establish layer order by using the direction of the links. This might be confusing when no arrowheads are present and might lead to erroneous results.

Link thickness was defined for each link according to the number of RFCs written together by a pair of authors. 5 levels of link thickness were chosen manually.
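As an illustration of the alternative of writing a data file in text format, the Python sketch below renders a list of co-authorship ties as Trivial Graph Format (TGF) text, one of the input formats listed above. It assumes the ties have already been extracted from the NetDraw data by some other means (not shown) into tuples of two author names and a joint publication count. The author names, the function names and the automatic 5-level thickness rule are hypothetical; in the study itself the thickness levels were chosen manually.

```python
def to_tgf(ties):
    """Render (author_a, author_b, joint_rfcs) tuples as TGF text.

    TGF lists one 'id label' line per node, then a '#' separator line,
    then one 'from_id to_id label' line per link.
    """
    ids = {}
    for a, b, _ in ties:
        for name in (a, b):
            ids.setdefault(name, len(ids) + 1)  # number nodes in first-seen order
    lines = [f"{node_id} {name}" for name, node_id in ids.items()]
    lines.append("#")
    lines += [f"{ids[a]} {ids[b]} {count}" for a, b, count in ties]
    return "\n".join(lines) + "\n"


def thickness_level(count, max_count, levels=5):
    """Map a joint publication count (1..max_count) onto levels 1..levels.

    Hypothetical automatic rule, shown only as one possible alternative
    to choosing the levels by hand.
    """
    return 1 + (count - 1) * levels // max_count


# Hypothetical ties: (author, author, number of RFCs written together)
ties = [("Author_A", "Author_B", 3), ("Author_B", "Author_C", 1)]
tgf_text = to_tgf(ties)
```

The link label carries the joint publication count, so link thickness can still be set afterwards in the editor. TGF carries no styling information, which is why thickness, color and shape are not part of the generated file.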
2) Results
A large number of layout permutations is possible using yEd. Each type of layout allows several parameters to be modified, sometimes producing vastly different results.

a) Circle Layout
It is possible to select layouts which appear similar to those obtained using NetDraw. One such layout is the Circle layout, where a plot similar to the one shown in Figure 1 can be created. Nonetheless, yEd offers more layout options to plot the circle. For example, the circle plot layout can be transformed into a disk, where some nodes appear in the center of the circle, and others, namely cutpoints, appear outside the circle. Those cutpoints are defined as the base for all pendants. Some manual housekeeping (shortening of some links, coloring of cutpoints and pendants) results in the graph shown in Figure 25.

Figure 25: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout.
b) Disk Layout with organic edges
Starting with the network shown in Figure 25, the links connecting the nodes can be altered into organic links, whilst the layout of the nodes remains untouched. This algorithm routes the links so as to ensure that they do not overlap nodes, and keeps a specifiable minimal distance between them. The algorithm is based on a force-directed layout paradigm. Nodes act as repulsive forces on links in order to guarantee a certain (user-defined) minimal distance between nodes and links. The links tend to contract themselves. “Simulated annealing” is used to produce the link layouts, which are calculated for each link in turn.

The resulting graph is visually attractive in that nodes are not overlapped by links, although since some links overlap each other, it is sometimes difficult to follow their routing. The result is shown in Figure 26.

Figure 26: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout and organic links
c) Disk layout with orthogonal edges
Starting with the network shown in Figure 25, the links connecting the nodes can be altered into orthogonal links, whilst the layout of the nodes remains untouched. This algorithm routes the links of the network using only vertical and horizontal line segments, while keeping the positions of the nodes in the network fixed. The routed links will usually not cross through any nodes and will not overlap any other links. The resulting network is shown in Figure 27.

yEd’s channel edge layout provides a similar routing topology for the links, with a few less significant alterations. It is interesting to note that yEd’s orthogonal edge router and orthogonal channel edge router algorithms can be used on any type of network topology without displacing the initial node positions. It is therefore possible to “clean up” any type of network graph through a combination of node positioning and link positioning.

Figure 27: yEd plot of network (subset of authors having published 20+ RFCs) in disk layout and orthogonal links
d) Organic Layout
Selection of the Organic Layout produces undirected graphs containing no overlap between nodes. Processing the resulting graph through the edge router’s organic layout also makes sure that no overlap occurs between links and nodes. The type of layout generated has essential similarities with the layout obtained from NetDraw’s Spring Embedding using Geodesic Distances, Node Repulsion and Equal Edge Lengths analysis. The organic layout box in yEd also allows a preferred link length to be defined. The resulting network can be seen in Figure 28.

Figure 28: yEd plot of network (subset of authors having published 20+ RFCs) in organic layout and organic links

Whilst readability is improved over the NetDraw output shown in Figure 6, clusters might be slightly less noticeable by eye because all nodes are evenly spaced. yEd allows clusters to be defined manually, and each cluster can then be laid out with a different algorithm. This is shown later.

e) Orthogonal Layout
This type of layout produces compact drawings with no node overlaps, few crossings, and few bends. All links are routed in an orthogonal style: only vertical and horizontal line segments are used. This enhances the readability of the resulting graph. As with every other type of layout in yEd, this option offers a selection of preferences which radically modify the results, each with its own advantages.
One such option is the use of “Node Boxes”, where nodes are resized according to the number and position of their neighbors to reduce the overall number of bends in the links. Readability of the graph is improved, and a by-product of this algorithm is that it tends to cluster more intensely tied nodes together. An example of this stylish layout is shown in Figure 29. A tradeoff is that node size might be misinterpreted as being linked to the importance of a node in the network, which is not the case.

Figure 29: yEd plot of network (subset of authors having published 20+ RFCs) using variable size Node Boxes and orthogonal layout, with grid size 15.

Correlating results obtained using NetDraw with results obtained using yEd is instructive. For instance, since the above diagram appears to show clear clustering of nodes which are closely connected together, a clustering algorithm from NetDraw can be applied to the nodes in Figure 29. The results from the Girvan-Newman algorithm (Section G) generated the diagram shown in Figure 15, with 6 clear clusters appearing to be the optimal network clustering. Applying these results to the nodes shown in Figure 29, and selecting the option of “face maximization”, generates the graphic shown in Figure 30.
Figure 30: yEd plot of network (subset of authors having published 20+ RFCs) using variable size Node Boxes and orthogonal layout, with grid size 15, cross-linked with Girvan-Newman Clustering algorithm data

The clusters are shown and appear to validate the data generated through the Girvan-Newman algorithm. Further combinations of analysis and graphical display are possible, although not all combinations bring further cognitive advantages to the analysis.

For example, other options using this layout also allow for mixing the orthogonal layout algorithm with a tree sub-algorithm, where larger sub-trees are processed using a special tree algorithm. Whilst each of these algorithms generates good-looking graphs, no significant further insight into our input RFC network is gained beyond that obtained by the other means described earlier.

f) Hierarchical Layout
An important option in yEd is plotting the graph using a hierarchical layout. This includes a set of algorithms which can be permuted to generate a vast array of hierarchical graphs.

Establishing a hierarchy of nodes necessitates the use of link direction. However, the network of RFC authors involves two-way collaboration between the common authors of an RFC, with no explicit hierarchy or precedence of one author over another. Drawing the graph with reciprocal arrows vitiates the hierarchical layout analysis: the algorithm still tries to establish a clear hierarchy, and reciprocal arrows are treated not as a single double-ended arrow but as a cycle. Removing arrowheads does not remove link direction. Such a layout is therefore flawed, since the anticipated result, one which determines the most significant nodes in the network (for example, nodes denoting highest RFC authorship or highest betweenness), is not the result attained. When the “top to bottom” option is selected, nodes are placed in hierarchically arranged layers, and this gives an illusion of hierarchy when there actually is none.

Nevertheless, the aesthetic outlook of the resulting layout is helpful in providing a clear view of the network, especially when selecting orthogonal edge routing. Whilst at first glance clustering of cliques seems apparent, this is actually not the case. Many long links exist, linking distant nodes. By selecting hierarchical optimal ranking, layer assignment is done in such a way that the overall sum of the layer distances of all edges in the layout is minimal.
The resulting graph is shown in Figure 31. Figure 31: Hierarchical Layout of network (subset of authors having published 20+ RFCs) with orthogonal links. Options are hierarchical optimal ranking,
Surprising, visually creative but analytically ineffective results are obtained with the polyline option, which creates a vast number of bends and parallel links. Various other options are available to concentrate or disperse the links, increase or decrease their number, and change the layout and direction of the network. Each permutation of options produces graphs which are equally attractive visually, but which have no analytical value. As a result, the hierarchical layout might not be the most suited to displaying the results obtained in our analysis, although the minimal number of bends in the links makes the network very readable.

g) Using groups with a mix of layouts
Clusters found using NetDraw can be defined as groups in yEd, and a mix of layouts applied to the groups themselves and to the links between the groups. This gives rise to nested graphs.

For example, it is possible to define 6 node clusters from the results of the Girvan-Newman algorithm. Nodes in each cluster can be grouped, and the groups laid out independently from the rest of the network. Many combinations of layout types can be used to reach various results. Cross-group links could be routed orthogonally, organically or randomly, or could be removed altogether. In this case, the resulting diagram of node organization within each group is shown in Figure 32.

Figure 32: Grouping of nodes (subset of authors having published 20+ RFCs) cross-linked with NetDraw’s Girvan-Newman Clustering algorithm data and layout using disk algorithm, with removal of inter-group links
  • 42. © 2009 – Olivier MJ Crépin-Leblond. Full copyright notice on Page 1 42 3) Scaling Up The sub-networks plotted in Figures 25-32 reach a nodal upper limit for effective micro-analysis due to the restricted space available on a single A4 page. Scaling the network size up by removing constraints on the input data set is possible but is bound by two principal limits: - all data from NetDraw need to be input manually either using point & click, or writing a data file in text format; RFC authors are so closely linked together that this introduces an exponential number of links as soon as further nodes are added; - as more nodes are displayed on screen, clutter takes over. Node size might need to be reduced, the diagram zoomed-out, and labels therefore rendered unreadable. Micro-analysis transforms itself into macro-analysis. On a network as cross-linked as the RFC authors, macro-analysis of data using yEd does not yield additional key results. However, networks containing more defined sub-groups and less cross-group connectivity will likely yield satisfactory results with macro-analysis. 4) Conclusions yEd is a powerful piece of software which can be used to generate new network topologies in a social network. Whilst many of its features are similar to the features provided by NetDraw, yEd is different in that it is a graph editor, whilst NetDraw performs network analysis. Its weaknesses: - Most input file formats are binary files and do not interface with NetDraw output files. It is therefore difficult to share data between the two types of software - Except in specific cases, nodes do not support automatic attributes which could have been generated by analysis – attributes need to be set-up manually Its strengths: - Nodes can be replaced by user-defined icons, thus giving rise to the possibility of very impressive visual styles - Edge routing generates particularly clean graphical results - Hierarchical, organic and orthogonal layouts are not offered in NetDraw. 
  The results attained using yEd are therefore complementary to those reached using other software
- Tree, as well as star and spoke layouts, generate very clear results which might pinpoint more “hidden” information within a data set or social network
- Most components of the resulting graph can be labeled extensively
- The workspace can be customized by docking sub-menus as desired
- Graphs can be nested: for example, part of a graph can be displayed using one algorithm and another part using another algorithm best fitting its needs

In this section, we have shown a few possible uses of yEd in the context of social network analysis. Combined with NetDraw software, yEd provides a powerful free starter pack which can be used in the world of Social Network Analysis.
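
One possible way to ease the data-sharing weakness noted above is to keep the authorship data in a plain text form and generate a graph file from it. The sketch below is not the method used in this study: it assumes a hypothetical table of RFC identifiers and author labels (the names are invented), counts jointly written RFCs per pair of authors, and writes the result in the Trivial Graph Format (TGF), a simple text format that yEd lists among its import formats. The function names are illustrative only.

```python
from itertools import combinations


def coauthor_edges(rfc_authors):
    """Count how many RFCs each pair of authors has written together."""
    weights = {}
    for authors in rfc_authors.values():
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def to_tgf(rfc_authors):
    """Render the co-authorship network as Trivial Graph Format text."""
    weights = coauthor_edges(rfc_authors)
    names = sorted({a for authors in rfc_authors.values() for a in authors})
    ids = {name: i + 1 for i, name in enumerate(names)}
    lines = [f"{ids[name]} {name}" for name in names]  # node section: id label
    lines.append("#")                                  # separator line
    lines += [f"{ids[a]} {ids[b]} {w}"                 # edge section: from to label
              for (a, b), w in sorted(weights.items())]
    return "\n".join(lines) + "\n"


# Hypothetical records: RFC identifier -> author labels (not real data)
sample = {
    1: ["Doe_J.", "Smith_B."],
    2: ["Doe_J.", "Smith_B.", "Roe_A."],
    3: ["Roe_A."],
}
tgf_text = to_tgf(sample)
```

Writing `tgf_text` to a file with a `.tgf` extension should give a file that can be opened in yEd, where a layout algorithm can then be applied; the edge label carries the number of shared RFCs.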
V. DISCUSSION

A. Limits

The analysis presented here is bounded by many limits. Many assumptions had to be made in order to keep the analysis to a sustainable size. These assumptions are likely to introduce discrepancies in the results. For the sake of awareness, the limits of the analysis are detailed in this section.

1) Input file discrepancies

a) Naming conventions for individual authors

As is sometimes customary in Anglo-Saxon countries, some names are not always transcribed in their original form. “Richard” may be quoted as “Dick”; “Robert” as “Bob”; “Anne” as “Ann” etc. Similarly, some foreign names may be spelled differently depending on the period. This is particularly understandable when the RFC database is pure ASCII and many names use characters that go beyond the scope of ASCII, for example replacing “ü” with “ue” or “u”. Both kinds of name inconsistency might introduce several apparently different instances for a given author.

This was not corrected in the several cases found in the database because we did not have the ability to crosscheck whether “John Doe” and “Jon Doe”, or “Bob Smith” and “Robert Smith”, were the same individual. Such manual crosschecking would be too time-consuming. The errors introduced in the results were found to be small enough to ignore. The rationale behind this is that an author would use one spelling of their name in most cases. Erroneous spelling would therefore be the exception rather than the norm. An example of such a discrepancy can be seen in Figure 21, where both Metcalfe_B. (Bob) and Metcalfe_R. (Robert) are shown.

b) Naming conventions for organizations

The naming integrity in the RFC database is imperfect. Whilst in some cases the full name of an organization is given, there are also several equally frequent occurrences where the acronym of the same organization is used. For example: IAB vs. Internet Architecture Board vs.
Internet Activities Board; IETF vs. Internet Engineering Task Force; IANA vs. Internet Assigned Numbers Authority; ISOC vs. Internet Society; IESG vs. Internet Engineering Steering Group etc. We felt that these discrepancies were sparse enough not to cause major data corruption. Furthermore, our study centered on individuals and not on organizations. The decision was therefore taken to ignore those discrepancies, though it is worth noting that several RFCs have one of the above organizations as their sole or joint author.

c) Reporting on work with third parties

Several RFCs report on work undertaken by or in collaboration with third parties who might not be named in the RFC itself. Some of these RFCs have an author identifying himself or herself as the editor of the document. It is unknown whether this editor also contributed to the work presented, and the team which performed the work might be identified through the acronyms described above, or is simply unidentified. A “Conversation with” a third party also constitutes the subject of several early RFCs. In all cases, the name recorded is that of the RFC author. Manually treating each case in turn is much too time-consuming, and whilst some individuals might have benefited from this reporting, we felt that this inconsistency was marginal enough to be ignored.

d) RFC status

This is explained in RFC 2026. It is important to remember that not all RFCs are standards track
documents, and that not all standards track documents reach the level of Internet Standard. RFCs therefore fall into different statuses: Internet Standards Track (Proposed Standard, Draft Standard, Internet Standard); Non-Standard Track Maturity Levels (Experimental, Informational, Historic); Best Current Practice (BCP) and Unknown. Our study does not take note of the current status because it is assumed that at the time the RFC was written, it was current. The "usefulness" and scope of an RFC’s importance cannot be scientifically evaluated. All RFCs are therefore considered in this study on an equal footing. Again, this might introduce inaccuracies in the results, although the RFC sample size is so large that these are statistically minimal. Furthermore, it is worth noting that all RFCs are equal when it comes to networking between authors, whether the RFC reaches standards track or not.

2) Use of Network Analysis Software

a) 2-D vs. 3-D

Mentioning 3-dimensional displaying of data always attracts much attention. Although NetDraw displays networks in 2 dimensions, its ability to export data which can be readily used in the MAGE software to display a network in 3D is a real asset. However, the utility of displaying the data in 3D is dependent on the input data set. Some network topologies will not show well in 3D. The question of 2D vs. 3D is one which can only be answered through trial and error. Although an in-depth discussion about 2D vs. 3D is outside the scope of this document, current scientific knowledge points to 3-dimensional cognitive processes requiring more complex processing for the human brain than 2-dimensional ones. Fixed 3-dimensional display of data adds complexity to the brain’s visual recognition and might therefore be less useful than 2 dimensions, except when presented in an interactive way, such as the reader being able to rotate the 3-dimensional space about an axis.
In fixed displays, adding a third dimension to a planar representation might be counterproductive by adding complexity. Much data about the 2D vs. 3D cognitive model is available elsewhere on the Internet.

b) Large input data sets

It is said that a diagram speaks a thousand words. Large input data sets can indeed be analyzed using the tools presented in this essay. However, space restrictions on an A4 page make it difficult to show the results in a legible manner. It is therefore usually only possible to undertake macro-analysis on the network (removing individual node labels, for example), and restrict micro-analysis to smaller data sets. A mix of micro and macro analysis would be very useful in the future: somehow being able to zoom in towards specific parts of the network and isolate them using point and click. For the time being, both NetDraw and yEd have a low usability factor in this scenario.

Large input data sets also require an increased amount of computing power and memory. Provided adequate computing resources and memory are available, it would be possible to carry out a much more targeted analysis.

c) Informal data vs. Formal data

The data source, namely IETF RFCs, constitutes only a subset of every development and collaborative work ever undertaken to make the Internet what it is today. Whilst formally only a subset of contributors are included as the authors of an RFC, most RFCs are discussed extensively in working groups when at Internet Draft stage, and informal communication provides much of the input towards the final RFC. In choosing a single defining link between authors (joint authorship of an RFC) as the only link between the authors and as the total data set for our analysis, we are unable to incorporate the informal data generated in the discussions. This introduces a limit which vitiates the hypothesis of this research to find the “Father of the Internet”. Indeed, based on the research which therefore uses only a subset of the people involved in the
Internet’s development, it is clear that only a subset of contributors to the Internet’s development is displayed in the graphs.

Informal data is nearly impossible to track. Increasing the reliability of the results would have to involve a mining of every email ever sent in the realm of the IETF standards process and this is clearly impossible. Working group mailing lists do have archives, but these are so dissimilar in style, completeness and social interaction protocols that a significant dose of Artificial Intelligence would be required to mine noteworthy data.

d) Restricted data sets

Even with the subset data source consisting solely of IETF RFCs, the data mining process used further reduces the input data set since it is too basic to extract acknowledgments from the RFC’s text. The only parameters currently mined are the names of the authors of each RFC. Processing of this data gives rise to the number of publications by author and an author’s links with other authors.

An important dimension missing from the data set is the concept of time. Some RFCs were written in the 70s, some in the 80s, some in the 90s etc. Modifying the data mining process to incorporate dates would enable NetDraw analysis by target date, which could then show social networks as they existed in each period of time. Comparison of those networks might provide a good idea of the “nomadic” behavior of some authors, a possible explanation for the differing faithfulness shown by some nodes when dividing the network of authors into clusters, as seen in Section G, the Girvan-Newman algorithm. The Internet has evolved, and so have the social networks of people building it.

Another dimension missing from the data set is the significance of an RFC, the current assumption being that every RFC is as “important” as every other RFC, and this is clearly not the case.
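
The per-period networks suggested above for the missing time dimension could be sketched as follows. This is only an illustration under stated assumptions: publication years are not part of the data set mined in this study, so the records, the author labels and the function names below are all hypothetical. The sketch groups co-authorship links by decade and then lists authors who appear in two periods without keeping any co-author, which is one possible reading of “nomadic” behavior.

```python
from collections import defaultdict
from itertools import combinations


def edges_by_decade(records):
    """Group co-authorship pairs by the decade in which each RFC appeared."""
    slices = defaultdict(set)
    for year, authors in records:
        decade = (year // 10) * 10
        for pair in combinations(sorted(set(authors)), 2):
            slices[decade].add(pair)
    return dict(slices)


def neighbours(pairs):
    """Map each author to the set of co-authors found in a set of pairs."""
    result = defaultdict(set)
    for a, b in pairs:
        result[a].add(b)
        result[b].add(a)
    return result


def nomadic_authors(slices, earlier, later):
    """Authors active in both periods who share no co-author between them."""
    before = neighbours(slices.get(earlier, ()))
    after = neighbours(slices.get(later, ()))
    common = before.keys() & after.keys()
    return sorted(a for a in common if not before[a] & after[a])


# Hypothetical records: (publication year, author labels), not real data
records = [
    (1975, ["Doe_J.", "Smith_B."]),
    (1978, ["Doe_J.", "Roe_A."]),
    (1984, ["Doe_J.", "Smith_B."]),
    (1986, ["Roe_A.", "Poe_C."]),
]
slices = edges_by_decade(records)
moved = nomadic_authors(slices, 1970, 1980)
```

Each decade's set of pairs could then be exported and drawn separately, so that the networks of two periods can be compared side by side.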
Perhaps a concept of “RFC weight” could be developed to measure the impact of each RFC on the Internet’s technical development. The more restricted a data set, the more restricted the results.

3) Reliance on RFC database to find a father

This study makes exclusive use of the RFC database to examine its evolutionary process. Of course, the basic assumption of “Anything not in an RFC does not exist” is as absurd as “Anything invented before the first RFC does not exist”. A great many inventions, for example WYSIWYG and the Mouse, the World Wide Web, search engines, peer-to-peer computing and other applications also make the Internet what it is today. In fact, the biggest strength of today’s Internet is that you can throw any type of traffic at it and it will carry it, since it is independent of the physical, link and application layers. None of the above applications was covered by an RFC. Does this mean that none of the inventors of those applications has any kind of paternity claim over the Internet?

RFCs are not the Alpha and Omega of the Internet’s existence. For example, a large amount of early work was published as Internet Experiment Notes (IENs), a set of more than 200 documents and reports preceding the first Internet RFC (RFC675) [16]. Our analysis misses this data. Perhaps a wider, more inclusive, cross-standard, cross-activity, cross-invention and cross-layer search and analysis would be required?

Scio me nihil scire (I know myself to know nothing) [17].

B. Opportunities

Social Network Analysis opens a new door to further understanding groups of people. In the context of RFC authors, any method circumventing the limits described above would increase the accuracy of the analysis and therefore the accuracy of the results.

As we have seen in this essay, a major analytical shortfall of our research is that it does not take chronological perspective and timelines into consideration. Any ongoing research process spanning
several discoveries introduces a hierarchical chronology of innovations and publications. For instance, combining this study with an analysis making use of the RFC Citation index would yield further information about the influence RFC authors have had over their peers and over the development of today’s Internet. The Citation index would need to be data mined from the existing RFC Index [8]. The treatment of these results would ease the limit described in Section V.A.1.d above, as well as allow for the sketching of NetDraw and yEd graphs, a non-exhaustive list being suggested as follows:
• RFC timelines
• A hierarchical tree of RFCs which might lead to a hierarchical tree of RFC authors
• The Ego network of an RFC, which might lead to an Ego network of authors with indirect influence, rather than the current direct influence analysis shown in Section IV.I.
• The branching of RFCs which have been unsuccessful in generating traction: some might be lost opportunities, some might be rising stars, some might be dormant, some might be alternative processes and some might be dead ducks. The interest in this analysis is generated from the question: did history make full use of knowledge available at the time?

Clearly, a cross-discipline fusion of analytical methods, using statistical techniques, Artificial Intelligence, graph theory, chaos theory, fuzzy logic and social network analysis of a cross-layer input set of data would enhance the accuracy of results, as would having access to vast manpower and computing resources. The opportunity to derive data on this subject from the sources currently available on the Internet is almost limitless, but the intent of the work presented in this essay is not to reach highly conclusive and accurate results.
Rather, it is to provide a somewhat rhetorical example of what could be achieved with very limited computing resources (namely one laptop) and software freely available on the Internet.

VI. FURTHER WORK

Taking into account the opportunities described above, the door is clearly open to many paths for further work. For example:
• Use of methods of social analysis on:
  o ISOC bottom-up structure
  o ICANN and its constituencies, starting perhaps with the At-Large structures
  o W3C recommendations and its consensus-based standards tracks
  o WSIS/UN IGF bottom-up processes and at-large involvement at global level
  o Elements of Internet Governance
• Use of methods of social analysis in a political party, to ensure a smooth information flow and correct leadership process, including the processes leading to presidential races and elections
• Use of such methods in any organization whose decision structure is based on the concept of bottom-up processes, whether by consensus or vote

VII. CONCLUSION

In this exercise, we have demonstrated the worth of Social Analysis and its usefulness in light of new Knowledge Management practices. By combining some Competitive Intelligence Data Mining techniques with Social Network Analysis, we have introduced new parameters which can be used to verify and display the degree of satisfactory consensus building in an organization. This could include the organization of working groups or combining of entities having a different social, contextual and
historical background. This could also include the sharing of data within the context of an organization’s Knowledge Management.

The Internet, together with its governance, is possibly one of the most complex societal systems ever to evolve. Its complex mesh of working communities will require co-ordination in the future in order for governance to be able to tackle future challenges successfully and to make sure that its decision process is as inclusive as possible whilst being streamlined enough to actually reach decisions. Since the Internet’s place in people’s lives is increasing year on year, novel scientific tools which could help in its governance and development should be available for anyone to use. This essay has provided an insight into what some of these future tools might look like and how useful they could be.

As for the question, “who is the father of the Internet”: since we have shown that RFC authors have a habit of working as a community, this would be impossible to determine without a DNA sample. Will the real father(s) and mother(s) please stand up?

More specifically, we have shown that:
- There are many “fathers” of the Internet. They are all closely linked together into a network of authors composed of many cliques and clusters which appear to be as interlinked as the networks in the Internet’s network of networks
- Many RFC documents are written single-handedly by authors, although this constitutes a minority in the community
- The most prolific authors tend to form clear clusters, inter-linked to other clusters by key individuals
- Jon Postel, having held the position of RFC Editor, was one of those key people
- Joyce K.
Reynolds is also a very prominent author, with many RFCs co-authored with Jon Postel; in fact, she also acted as RFC editor and helped with IANA management
- Robert Braden is a key character in the RFC structure of authors, as shown by his high centrality. Admittedly, he chaired the IRTF End-to-End Research Group, which developed many key RFCs, and served as the RFC co-editor for the IETF.

ACKNOWLEDGMENT

The author thanks E. Boutin (University of Toulon, France) [18] and Dr. Brian Dickens (National Institute of Standards and Technology, Gaithersburg, MD, USA) for their valuable feedback and corrections in the dissertation of this paper, V. Cerf [19] for his kind feedback about early Internet research and R. Bush [20] for having allowed his name to be used in examples on Ego networks.

The author would also like to dedicate this essay to Tim Gartside (ISOC Sphere Labels project) [21], who provided some of the initial inspiration for this research but who left us tragically before its conclusion.
REFERENCES

[1] S. Crocker, “Host Software”, RFC Repository, IETF Online Secretariat. Available: http://www.ietf.org/rfc/rfc0001.txt?number=1
[2] RFC Editor Web Page. Available: http://www.rfc-editor.org/
[3] Kleinrock, L., “Communication Nets; Stochastic Message Flow and Delay”, McGraw-Hill Book Company, New York, 1964. (Out of Print) Reprinted by Dover Publications, 1972. (Published in Russian, 1971; published in Japanese, 1975.)
[4] Licklider, J. C. R., "Topics for Discussion at the Forthcoming Meeting, Memorandum For: Members and Affiliates of the Intergalactic Computer Network". Washington, D.C.: Advanced Research Projects Agency, 23 April 1963.
[5] Engelbart, D. C., et al., "SRI-ARC. A technical session presentation at the Fall Joint Computer Conference in San Francisco, Dec. 9, 1968" (NLS demo ’68: The computer mouse debut), 11 film reels and 6 video tapes (100 min.), Engelbart Collection, Stanford University Library, Menlo Park (CA) (some footage available on the Internet).
[6] Cerf, V. and Kahn, R., “A Protocol for Packet Network Intercommunication”, IEEE Trans on Communications, Vol 22-5, May 1974.
[7] Metcalfe, R., et al., Xerox Corporation, “Multipoint data communication system with collision detection”, U.S. Patent 4,063,220, 31 March 1975.
[8] RFC Index. Available: ftp://ftp.rfc-editor.org/in-notes/rfc-ref.txt
[9] NetDraw Network Visualization. Available: http://www.analytictech.com/Netdraw/netdraw.htm
[10] Torgerson, W. S., “Multidimensional scaling: I. Theory and method”, Psychometrika, 17:401-419, 1952.
[11] 3D Analysis: The Mage Page. Available: http://www.sbb.duke.edu/kinemage/magepage.php
[12] Einstein, A., "Die Grundlage der allgemeinen Relativitätstheorie", Annalen der Physik 49, 1916.
[13] Girvan, M. and Newman, M.E., “Community structure in social and biological networks”, Proc. Natl. Acad. Sci. USA, 99, 7821-7826, 2002.
[14] Johnson, S.C., "Hierarchical Clustering Schemes", Psychometrika, 32:241-254, 1967.
[15] yEd Graph Editor. Available: http://www.yworks.com/en/products_yed_about.html
[16] Internet Experiment Note (IEN). Available: http://www.postel.org/ien/txt/ien-index.txt
[17] Attributed to Socrates’s Apology, which Plato handed down.
[18] Boutin, Eric, Personal Web Page: http://i3m.univ-tln.fr/imprimer.php3?id_article=88
[19] Cerf, Vinton, Web Page (no affiliation): http://en.wikipedia.org/wiki/Vint_Cerf
[20] Bush, Randy, Personal Web Page: https://archive.psg.com/
[21] Gartside, Tim, Web Page: http://wiki.chapters.isoc.org/tiki-index.php?page=Tim+Gartside&bl=y

Olivier M.J. Crépin-Leblond has been an Internet user since 1988. He received a B.Eng. Honours degree in Computer Systems and Electronics from King’s College, London, UK, in 1990, a Ph.D. in Digital Communications from Imperial College, London, UK, in 1997, and a Specialized Masters Degree in Competitive Intelligence and Knowledge Management from CERAM Business School in Nice-Sophia Antipolis, France, in 2007. Over the years, he has been involved in many Internet and telecom projects, founded Global Information Highway Ltd in 1995 and is available as a consultant in Telecom matters. Current interests range from IPv6 deployment, Network Neutrality, Internet Governance and Green Internet to all aspects of Strategy, Intelligence and Knowledge Management in the 21st Century, especially for bottom-up consensus-based organisations. He is a member of the IET and senior member of the IEEE, Board member of the English chapter of ISOC and of ICANN’s European At-Large Organisation (EURALO). In 2010 he is also a Nominations Committee member for ICANN. Full details available on: http://www.gih.com/ocl.html