Evolving Collaboration Patterns in North American Research Using Advanced Collaborative GRID Infrastructures : A Canadian Perspective Based on Co-Linking of High Performance Research GRIDs
HSHP Research GRID co-linking
Gordon M. Groat
TABLE OF CONTENTS
LIST OF TABLES
ABSTRACT
CHAPTER I: INTRODUCTION
Overview of the topic
What is High Performance Computing (HPC) and what is an HPC GRID?
HPC GRID Computing Relevance to Higher Education Science and Technology Policy
Statement of the problem
Statement of the purpose
Research Questions
North American Grid Structures
Canadian Regional Grids
United States Regional Grids
Comparative analysis of Canadian and U.S. grid development
Significance of the study
CHAPTER II: REVIEW OF THE LITERATURE
Grounding literature and theoretical framework
Outsourceability
Resource based view
Transaction cost economics
Agency theory
CHAPTER III: METHODOLOGY
Pilot study
Resource collating and data preparation
Data categorization
Canadian HSHPRG Co-Link Structures: Initial returns from NAGR Institutions Canada
Proposed NAGR Inlink/Outlink Categorization Structure Design
Sample data collected: Co-link with specificity “CO2” research
Pilot Study Analysis
CONCLUSIONS
Works Cited
Abbreviations
LIST OF TABLES
Table 2 - US Regional Grid Fabric
Table 3 - Supraregional US Grid Fabric
Table 4 - Query Structure Examples
Table 8 - U Laval Linkdomain Query (ulaval.ca+.ca+.gc.ca+"co2")
Table 9 - U Laval Linkdomain Query (umontreal.ca+.ca+.edu+"co2")
Table 10 - U Laval Linkdomain Query (montreal.ca+.ca+.gc.ca+"co2")
Table 11 - U Laval Linkdomain Query (usask.ca+.e.ca+.edu+ co2")
Table 12 - U Laval Linkdomain Query (usask.ca+.ca+.gc.ca+"co2")
Table 5 - NAGR Inlink/Outlink Categorization Structure
Table 6 - NAGR Inlink/Outlink Language
Table 13 - Abbreviations
ABSTRACT
As research agendas at universities and colleges require increasingly sophisticated and
powerful technological infrastructures, institutions become increasingly strained to provide
sufficient resources to underpin their research agendas. In the struggle to maintain momentum,
institutions increasingly turn to collaborative research structures that leverage inter-institutional
infrastructures because they believe that prestige and resources will be the fruits of increasing
knowledge creation (Slaughter, 2004).
Because accelerating costs compress resources at most institutions, they increasingly
collaborate across high performance research grids designed to facilitate the movement of
large data sets. Doing so lets them leverage the larger and more competitive technological
and academic resources brought to bear by consortiums that pool these resources, whether
for basic or applied research as described by Bush (1945).
A good example is research that requires extensive computational overhead. Certain
institutions maintain massively parallel supercomputer facilities, but far more often
institutions do not have such facilities. Such a facility is, of course, critical
infrastructure for computational scientists and engineers, but it is also important for
advancing knowledge in the humanities, for experimental scientists, and for corporations and
associations. Given our changing planetary environmental conditions, such resources are also
central to fields of study such as the environmental sciences (Smarr, 1999).
The resource based view would suggest that if the institution is not able to field the resource
at a world class level, then this component of the research is a candidate to be shifted into the
arena of technologically facilitated collaborative high performance research grids. It should also
be noted that the massive volume of data generated by computational advancement has
created enormous pressure to remain competitive where computational overhead is concerned
(F. Berman, 2003).
Recent academic developments in this arena explore the application of more theoretical
constructs. Business definitions of words such as outsourcing tend to transition from business to
academe and gradually become part of the lexicon of academic research. This transition from
a passing interest in an emerging area of technologically facilitated collaborative research
structures to a significantly researched area of academic inquiry is a natural progression.
For the purposes of this research, the challenge is to rethink the way we look at shifting
research to the resource rich environment of inter-institutional research grids by examining the
way these resources interlink and interact with each other. Collaborative consortiums that now
thrive in higher education research leverage an extensive sharing of resource bases, whether they
are hardware or software, whether they are facilities or equipment, or whether they consist of
exchanging and collaborating with human resource assets, i.e. multiple investigators from
various institutions wielding various sets and subsets of these resources. For this research, I
define the outsourceability of a research activity as the degree to which it is beneficial
to outsource that activity, in accordance with the work of Mol (2007). I argue that the genesis
and growth of outsourcing components of research are correlated with shifting and
compressed budgets. I also note that as national research agendas change, so does the
political influence of institutions, and along with it, so do the budgets realized through
direct and indirect funding (Barr, 2002).
High performance research grids now create an important fabric in national and
international research agendas and the way we link these resources together provides pathways
to understand certain preferences. This research is notably concerned with inlink and outlink
analysis to ascertain how and why we interact on these grids and to identify language preference
in collaborative research grids. An important component of this study is to overlay language
preferences with geographical preferences in order to elicit the impact across multilingual
institutions in Canada. Canada was selected because of its distinct bilingual mandate at the
national level. By tracing inlinking and outlinking across English, French, and national grids to
determine the influence of national and international collaborative structures, it is hoped that
the implications for academic research in an officially bilingual nation may be better understood.
CHAPTER I: INTRODUCTION
Overview of the topic
Patterns of co-linking on the internet have been generated in many quantitative studies.
Co-linking is how we describe web pages that are linked together. Some of these links may be
intra-institutional, some may be links between institutions, and some links may have little to do
with areas of academic interest, yet they are important for the web content of higher education
organizations. Some of those links might include links to websites that offer students, staff, and
faculty information regarding benefits plans, recreational opportunities, housing, or
transportation services. The list of such links is extensive, and the size of institutional
web sites has grown tremendously.
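The distinction drawn here between intra-institutional links, links between institutions, and links of little academic interest can be sketched as a simple classification. This is only an illustrative sketch and not the method used in this study: the function names are hypothetical, the example paths are invented, and reducing a host to its last two labels is a crude assumption that would misclassify hosts under shared suffixes such as .gc.ca.

```python
from urllib.parse import urlparse


def registered_domain(url: str) -> str:
    """Return a crude institutional domain: the last two labels of the host.

    Assumption: institutions are identified by domains such as 'ulaval.ca'.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def classify_link(source_url: str, target_url: str, academic_domains: set) -> str:
    """Classify one hyperlink by comparing source and target domains."""
    source = registered_domain(source_url)
    target = registered_domain(target_url)
    if source == target:
        return "intra-institutional"
    if target in academic_domains:
        return "inter-institutional"
    return "other"


# Hypothetical example using domains that appear in the list of tables.
institutions = {"ulaval.ca", "usask.ca", "umontreal.ca"}
print(classify_link("https://www.ulaval.ca/a", "https://www.usask.ca/b", institutions))
```

Links classified as "other" would cover the benefits, housing, and transportation pages mentioned above.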
Analyses of a variety of linking structures have shown interesting relationships
between institutions. Link analysis is, essentially, a technological way of looking at how we
communicate with each other. Initially, linking was part of a structure that we did not assess;
we merely used these interlinked pages as a matter of convenience. Facebook came about in much
the same way. The program was designed just to link together some information about classes and
match students up for study groups. This evolved into a facemash that was designed to let people
look at the names and faces of people in dorms and rate who was hotter. Rankings were developed
according to the votes. This is how Facebook started.
This seems incredibly simplistic and of little value from the perspective of people who
had no idea what potential could be contained in such programming. Zuckerberg put the site up
on the weekend and on Monday morning it was taken down because it had overwhelmed
Harvard's server and prevented students from accessing the web. It was also described as
completely improper and without merit. Today, there are over 500 million active users and
people spend over 700 billion minutes per month on Facebook. It has over 900 million objects
that interact with people, has been translated into 70 languages, and over 10 million new
application interfaces are installed daily. Facebook is nothing more than a gigantic system of
co-linking. It was first said by academe to be foolish, of no value, and of no meaning for
advanced education in any way, shape, or form. Today, the company is worth over 7 billion
dollars, and more of your students, no matter what university you teach at and no matter in what
country (except China, of course), are on Facebook than visit your university website. If you
are a professor, you may rest peacefully at night knowing your students spend more time on
Facebook than they do on Google, and not by a small margin either.
Co-linking, as Facebook shows, can obviously be very popular. In advanced education,
co-linking is also popular, but it is the process of trying to understand why and how institutions
co-link that matters to academe and to policy makers in advanced education. It matters to
policy makers because it is central to questions of academic mobility across the web and of the
funding to support infrastructure that will enable that mobility of academic thought.
In academe, however, we are, for the most part, more interested in how we shape
research than in dating or playing games. But don't get me wrong, you'll be hard pressed to find
anybody at any university who does not know what Facebook is. Social media is a larger
expression of interlinking or "co-linking" as we start to move information more freely across the
web. Just a few years ago, one could visit almost any news outlet on the web and see no
mechanism to post the article to Facebook and, of course, there was typically no mobile web for
smart phone users. Today, it is virtually impossible to find a news outlet without a share
mechanism and pages designed for smart phones. The reason lies in the statistics gained
from studying co-linking patterns and interlinking traffic, all of which tell us that smart phones
and the ability to interlink data and information seamlessly are not the exception; they have
become the norm.
As such, understanding how and why institutions interlink with each other can uncover
patterns of language preference, geographical location, or cultural preference, all of which
inform a larger picture that policy makers can use to make investment decisions to advance
certain kinds of research. To reject the notion that this has any importance to advanced
education is probably no less short sighted than Harvard refusing to let Zuckerberg run his
Facebook application on their server. If, on the other hand, they had asked for a small
percentage of the intellectual property rights in exchange for server capacity, Harvard would
have easily doubled their endowment by now. Hindsight is always crystal clear and usually painful.
This paper will not focus on social media but will instead focus on understanding
patterns of in-linking and out-linking in research grids. When work on this paper began, there was
very little interest in GRID research at all. Because technology advances at such an exponential
rate, high speed high performance research grids have now become central to national education
strategies and national security, as research is often centered on the rarefied topics that yield
military advantages, such as physics, chemical computing, and imagery analysis.
But like the internet itself, grid research has always been destined to open new gateways for
the humanities and the arts. These fields are forging ahead ever faster into this arena, and we
know they will one day be strong players as, for example, global symphonies are created.
The limits are as uncapped as the limits of human imagination itself.
What is High Performance Computing (HPC) and what is an HPC GRID?
For many years, High Performance Computing was thought of as Supercomputing and
was associated with ownership of extremely expensive supercomputers. Computational
advancements have since moved forward at an astounding rate.
As the internet grew, so too did the desire to create ever larger pipelines to transfer data.
It was this growth of the internet, combined with rapidly evolving technologies, that created
many interconnected resources across specially dedicated large internet cables. The capacity of
those cables is referred to as bandwidth. Bandwidth is associated with large pipelines to
transfer data: the bigger the bandwidth, the bigger the pipeline and the more data that can be
exchanged.
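The relationship between bandwidth and the amount of data that can be exchanged can be made concrete with a small calculation. The sketch below is illustrative only: the data set size and the three link speeds are assumed values chosen for the example, not figures from this study, and real transfers would also be slowed by protocol overhead and congestion.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_second: float) -> float:
    """Ideal time to move a data set over a link, ignoring all overhead."""
    if bandwidth_bits_per_second <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_second


ONE_TERABYTE = 10**12  # bytes; an assumed data set size for illustration

# Assumed link speeds, from a modest connection to a dedicated research link.
for label, bits_per_second in [("10 Mbit/s", 10e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    hours = transfer_seconds(ONE_TERABYTE, bits_per_second) / 3600
    print(f"{label}: {hours:.2f} hours")
```

Under these assumptions the same terabyte takes roughly nine days at 10 Mbit/s and a little over two hours at 1 Gbit/s, which is the practical meaning of a bigger pipeline.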
This is very important to you, the reader, because no matter who you are, you probably
use a computer and you probably use the internet to gather information for your own specific
purposes. If you are engaged in academic research, this information comes to you from libraries
or from data collections hosted either on your campus somewhere, or perhaps across the country
or even on the other side of the planet. Your ability to access that information is determined by
your bandwidth, because the capacity of the personal computer has become so advanced that the
only speed limitations on data transfer lie within the bandwidth infrastructure itself. If you are
pleased with the speed of your network, then you should be able to work in comfort and secure
huge bounties of data that were unimaginable to academics just a couple of decades ago.
Because the Internet exploded so quickly, academics sought to have their own "internet"
bandwidth pipelines. These pipelines required significant investment that was shared by many
institutions, corporations, and organizations. The name GRID was created to describe how these
large cable runs are interconnected. They are, of course, connected to the sites that invest and
to all of the large regional and national grid structures, often called grid fabric. All of these
grids are, to some extent, subsidized by different levels of government.
Governments become involved in this infrastructure development because they have the
most to gain and, of course, the most to lose. We used to think it was important to have big
banks of supercomputers at each university; this was our advantage. But the internet has
changed all of that. We are moving into the era of resource independent computing
environments. Sometimes referred to as "Cloud Computing," this really means that the resources
we use can be hosted anywhere. The explosion of GRID computing is really just a reflection of
the growth of the internet, the underlying infrastructure, bandwidth, and computer capacity. Even
a short list of GRIDS includes a variety of things that are directly tied to advanced education.
The topic list or "genre" of GRIDS includes things like bioinformatics, photonic
switching, data center markup language, climate research, severe weather prediction, health care,
middleware, operating systems, astronomy, physics, economics, hydrology, geology, and earthquake
engineering, to name a few. There is even a grid on mammograms in Europe, intended to establish
an EU wide database of mammograms so that researchers can evaluate research models across a much
larger data set. There are national and regional grids in Japan, Korea, Canada, the EU, China,
Denmark, Bulgaria, Armenia, Italy, Israel, Croatia, Singapore, Russia, Ireland, Finland, Sweden,
Romania, Netherlands, Serbia, Austria, Switzerland, and the list goes on. The point is simple:
where only a few GRIDS existed at the start of the decade, they have since proliferated across the
globe. Researchers across the globe are talking to each other in ways that were unimaginable
only a few short years ago.
As the need to share large data sets and infrastructure grows, the implication for GRID
technology is obvious. It will grow exponentially, like Facebook, until it becomes part and
parcel of everyday life in academe. In many disciplines, it is already a core component of
everyday life; in many others, it will become more and more integrated as time,
access, and the ability to leverage the benefits of cloud computing make it part of the
everyday language of both the professoriate and the students of the institution.
HPC GRID Computing Relevance to Higher Education Science and Technology Policy
The variety of activities being carried on by HPC GRID computing is astounding, as
previously mentioned. The obvious big players, as disciplines go, include environmental and
meteorological studies, nanotechnology research, weather prediction and simulation,
bioinformatics, biology, chemistry, and physics (Trellis Project, 2003). "The Next Big Thing in
Humanities, Arts, and Social Science Computing: 18 Connect" (Kevin D. Franklin and Karen
Rodriguez'G, 2008) combines a variety of social science information and makes, for example,
popular texts such as Milton's Paradise Lost or Shakespeare's Macbeth available in different
printed versions that can be customized from a vast database of copytexts and editions.
Emerging social science discussions about the future and shape of academic interactions are
starting to focus on the need to explore and understand new tools that may
greatly enhance the way social scientists interact with each other (Hodgson, 2007).
Multidimensional scaling creates a visual representation of co-linking patterns that reflect
the way Canadians see the role of their universities. When we are able to generate an image that
shows the number and nature of links that exist between universities and colleges, we can
sometimes identify interesting patterns that we did not previously realize or, perhaps, may
have only suspected. This contextualization of research can be a revealing mechanism for
understanding language preferences (Thelwall, 2002) and for stratifying the cultural differences
that are part of the elemental fabric of Canadian society, the largest and most obvious difference
centering on French and English culture. Of course, the lessons we glean from looking at language
preferences in Canada are well understood in Scandinavia, perhaps less so in the United States
where, by any measure of reality, language diversity will only continue to grow.
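Before any scaling can be applied, link counts between institutions have to be arranged as a symmetric matrix of dissimilarities, which is the kind of input multidimensional scaling routines generally expect. The sketch below shows one way this might be done. It is an assumption-laden illustration and not the procedure of this study: the link list is invented, the function names are hypothetical, and converting a count to a dissimilarity as 1 / (1 + count) is only one of many possible choices.

```python
from collections import Counter


def link_counts(links):
    """Count links between each unordered pair of institutions."""
    counts = Counter()
    for source, target in links:
        if source != target:  # ignore intra-institutional links
            counts[frozenset((source, target))] += 1
    return counts


def dissimilarity_matrix(institutions, counts):
    """Build a symmetric matrix where more links mean a smaller distance.

    Assumption: dissimilarity = 1 / (1 + number of links between the pair).
    """
    matrix = []
    for a in institutions:
        row = []
        for b in institutions:
            if a == b:
                row.append(0.0)
            else:
                row.append(1.0 / (1.0 + counts[frozenset((a, b))]))
        matrix.append(row)
    return matrix


# Invented example links between domains named in the list of tables.
example_links = [
    ("ulaval.ca", "umontreal.ca"),
    ("umontreal.ca", "ulaval.ca"),
    ("ulaval.ca", "usask.ca"),
]
names = ["ulaval.ca", "umontreal.ca", "usask.ca"]
for row in dissimilarity_matrix(names, link_counts(example_links)):
    print(row)
```

A matrix of this shape could then be passed to a scaling routine to produce the kind of image described above, with heavily linked institutions drawn closer together.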
One of the great projects of the late 20th Century and early 21st Century by the
Consortium for North American Higher Education Collaboration (CONAHEC) was the
foundational work designed to enhance academic mobility between the United States, Canada,
and Mexico. This mission grew and prospered, and it directly impacted how we view ourselves
at the University of Arizona Center for the Study of Higher Education, a center that began to
integrate and embrace people from a wide variety of backgrounds, cultures, and disciplines. It
does not take much of a stretch of the imagination to look at the CONAHEC mission, look at the
University of Arizona, and look at both the State of Arizona and the demographics of the United
States as a whole to understand that Spanish will be increasingly important as a part of the rich
language diversity that is central to the fabric of advanced education.
It should then stand to reason that, as we have embraced diversity in the physical
sense, we would also wish to explore diversity across the medium of cloud
computing and how our interactions in cyberspace may be analyzed, enhanced, and used to
further the research and mission of the Center for the Study of Higher Education and every other
department, center, or college at the University of Arizona, or for that matter, any University
located anywhere on the planet.
Because Canada was crafted from countless first nations and the immigration of French
and English settlers, the resulting non first nation culture has been largely split into two distinct
societies, one French and one English. The French society is recognized as a distinct culture and
has its own National Assembly, while the rest of Canada has Provincial or Territorial Legislatures.
The fundamental differences that exist between French and English Canada have been a central
discussion in Canadian culture and politics for centuries. They have been a cause of discord, and
few who live in Canada are unfamiliar with these cultural themes and tensions.
A qualitative overview of the co-link patterns that exist in regional high
performance research grids in North America may provide an interesting analysis, allowing the
distinctions drawn in previous research to be compared with non grid co-linking at
corresponding grids.
Co-linking analysis has proven predictive. Studies conducted at the Institute for Studies in
Research and Higher Education in Oslo found strong language preference and identifiable
patterns that demonstrated increased co-linking between Nordic institutions (Persson, 1997).
The genesis of this study is a desire to understand the nature
of our collaborations within and external to Canada and to examine how language preference
may impact those collaborative efforts.
Statement of the problem
As institutions and governments are subject to economic cycles, it follows that
institutions of advanced education will also endure compressed budget cycles combined with
increasing demand for research infrastructure. Smaller budgets and rising costs simply outweigh
the ability of the single institution to provide all the leading edge tools the researchers of the
institution require. The short version of the problem is, simply stated, that institutions can't
afford to buy enough computer equipment to do everything they want to do, and their budgets are
probably going to be cut back relative to inflationary pressures, making that proposition even
more difficult.
The problem is compounded for those departments that are deemed to be non-core areas
of the institution, in other words, those departments and disciplines that are less attractive to the
financial planning interests and revenue streams of the institution. Not only can they ill afford to
invest in high end computational assets, some of them will have to struggle for their very
existence. They will be forced to justify their existence in the age of Academic Capitalism and
the greatest contrast is seen across the areas of the institution that engage in basic research versus
those areas engaged in applied research where significant national grants exist combined with the
seductive promise of intellectual property residuals.
Remember how Harvard told Zuckerberg to take down his Facebook site, saying that it was
entirely irrelevant to advanced education? They even made him apologize. You can be sure
that there are many institutional hawks who will be looking for every ounce of intellectual
property they can find. What institution would like to be made famous for letting the next big
thing get by them? Accordingly, they will likely focus their efforts in fields where they have
seen the largest gains in the past.
It should also come as no surprise that increasing computational power has enabled
numerous institutions of higher education to extend the size, shape, and dimensions of their
academic exploration. The rate of change in the last few decades, like the rate of change of
computational power, has been exponential. A co-founder of the Intel Corporation, Gordon E.
Moore, described a trend in the number of transistors that could be placed on
integrated circuits. His prediction suggested that, as a result, computational power would
roughly double every two years and that the trend would last for several decades (Lundstrom, 2003).
While this is not really a law in the sense of a gas law or a physics law, it has long been
recognized as being remarkably accurate. As such, it has become known as Moore's Law.
The simple mathematics of doubling should give us some room for perspective.
The formula rests on the natural log of 2 (approximately 0.693, the source of the "69.3" rule of
thumb for doubling times). To understand how much computational power Moore was talking about,
consider that a doubling every two years amounts to ten doublings over two decades, a factor of
1,024, so a starting point of 256 thousand floating point operations would grow to roughly 262
million. Sounds like a lot and, of course, it is. To understand the power of the future and
see the exciting promise of future computational power, it is also enlightening to look back a few
years to understand where we have come from.
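The doubling arithmetic discussed here can be checked with a short calculation. The sketch below assumes a fixed two-year doubling period, as the text describes; the function names and the starting figure used in the example are illustrative only.

```python
import math


def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Factor by which a quantity grows if it doubles once per period."""
    return 2.0 ** (years / doubling_period_years)


def years_to_reach(start: float, target: float, doubling_period_years: float = 2.0) -> float:
    """Years of steady doubling needed to grow from start to target."""
    return doubling_period_years * math.log2(target / start)


start = 256_000  # illustrative starting point, in floating point operations
after_two_decades = start * growth_factor(20)
print(f"growth factor over 20 years: {growth_factor(20):,.0f}")
print(f"value after 20 years: {after_two_decades:,.0f}")

# The natural log of 2, expressed as a percentage, gives the familiar 69.3.
print(f"ln(2) as a percentage: {math.log(2) * 100:.1f}")
```

Under these assumptions, twenty years of doubling every two years corresponds to ten doublings, a factor of 1,024. The second function can be used to ask how long a given target would take to reach at the same pace.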
Take a look back at a time, not so long ago, when personal computers had not been
invented yet, when cell phones did not exist, and there were no iPods or music downloads. It
was not so long ago that a Professor of Higher Education, or of any other discipline, conducted
research from their office bookshelf, the library, and through borrowing physical books from
other institutions.
To obtain a snippet of information, one might invest countless hours of time just
obtaining access to resources. Extending academic capacity through high speed research grid
infrastructures, meaning more powerful computers, bigger bandwidth, and more of both being
extended across campus, offers interesting possibilities that are combined with cost compression
capacity for computational resource overhead. Cost compression is really just the political
reality that most state institutes deal with as state budgets suffer from economic downturns (V.
Piscitello, 2003). But what does that really mean? It means that no university or campus can
possibly compete with cloud computing. No institution has the financial ability to compete
against a global infrastructure built on a sharing model. The process of trying would bankrupt
even the most richly endowed university in very short order. And, of course, they know that and
so they have moved into a shared resource model.
Concepts like software as a service (SaaS) did not exist just a few short years ago. Most
technological advancements are initially unthinkable commercially because of a small market
size or undefined economics. Private industry did not build it out until it became evident there
was profit to be made (Mark Turner, 2003).
Just twenty-five years ago there were not a lot of corporations or universities that had
supercomputers and none had personal computers. Ordinary faculty, researchers, and students
relied on calculators and on handwritten programs that could be entered into mainframe computers
via cards that were typed on card punch machines, organized in great cardboard boxes, and
carried to a centralized computer center where they could be fed into the mainframe
(supercomputer) through a special machine called a card reader.
Organizations simply didn't have the computer hardware resources or bandwidth resources to
provide these kinds of things to individual faculty or researchers. They had to be in central
facilities because the cost and size of the facilities was so substantial that no institution could
afford to provide these services in any other way. This is why you will see, on most university
campuses, a computer center. As these things grew and became more and more a part of both
business and education, we came to depend on an ever increasing capacity to quench our
thirst for more power, more data, and more ability to conduct research. But now, instead of
having to rely on our own institution for everything, we want to collaborate with people from
across the globe and share resources on a global basis.
It is true that, like people, all institutions have something unique about them: their
capacity, their ability, perhaps their location. The one question that they all contend with is
their funding. As inequities grow, the chasm between those institutions that are well endowed
and economically prosperous and those with fewer resources continues to create an increasingly
precarious situation for those institutions that are getting left behind on the technology curve.
This translates across all the disciplines of the institution due to economic reality. Some budgets
are cut; departments may be slashed or eliminated altogether as administrators constantly
struggle to balance the institutional budget.
Focused excellence tends to be the slogan for cutting back funding across the less
prosperous centers of research while preserving capital for those departments that have two key
components: a significant demand for the research product and the potential for accelerated
economic gains going forward.
This potential most typically takes the form of intellectual property revenues from
patentable research that offers economic participation for the institution. This study analyzes
collaborative research grid environments, the places where academics may enjoy
substantial internet resources, access to large online library collections, and, of course, sufficient
bandwidth to support the exchange of research and data. The ability to extend
collaboratory environments where people may meet and collaborate online is likewise underpinned
by the powerful infrastructure referred to as a collaborative research grid environment.
As more technological infrastructure is extended globally, the digital divide
diminishes. The imbalance across institutions of higher education is exacerbated by
cost-prohibitive environments and increasingly complex technological solutions (Erik
Brynjolfsson, 2003). The ever-burgeoning global high performance research grid environments
offer unique technology driven solutions that have significant potential to reduce the growing
imbalance across research disciplines and institutions. In other words, as the grid structures
proliferate and the costs are spread across more and more governments and institutions, the cost
for entry into the large scale environment becomes lower. This is the same power of numbers
upon which the insurance industry operates. They spread the risk out among many to protect the
few. With computational resources, the risk is spread via the reduction of funding costs for each
institution and the benefit falls to those who leverage those resources and facilities.
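The cost-spreading argument above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the total cost figure and the even split across members are hypothetical and are not taken from any grid's actual funding model.

```python
def cost_per_institution(total_annual_cost: float, members: int) -> float:
    """Evenly split a shared infrastructure cost across member institutions."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return total_annual_cost / members


# Hypothetical annual cost of a shared grid facility (illustration only).
TOTAL_COST = 12_000_000.0

# As membership grows, the entry cost carried by each institution falls.
for members in (1, 4, 12, 48):
    share = cost_per_institution(TOTAL_COST, members)
    print(f"{members:>3} members -> {share:,.0f} per institution")
```

An even split is the simplest possible assumption; real consortia may weight contributions by usage or institutional size, which would change the individual shares but not the overall downward trend.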
Additionally, if students are not provided access to increasingly enhanced technological
resources and provided an environment rich with diverse collaboration options across
institutions, then it may come to pass that recruitment may suffer given the perception of a less
organized strategic viewpoint relating to student affairs. Institutions clearly associate their brand
management with their web presence. Technological capacity is at the forefront of recruiting,
and some institutions provide computational technologies to students upon enrollment so that
their ubiquitous access is in sync with institutional firewalls and security policies.
We already speak to technological prowess through leveraging social media for
recruiting, a concept totally unheard of just a few years ago (Briggs, 2008). If recruitment is
changing and students have expectations such as ubiquitous Wi-Fi access anywhere on campus,
this has implications regarding computational infrastructure in the recruitment and retention of
top student talent (Wilen-Daugenti, 2008). If this viewpoint is increasingly adopted by students at
a given institution, there is some risk that the institution will be seen as an underperforming
institution compared to others. Such an outcome is likely to have a negative impact on
graduation rates as outlined by Woodard, Mallory, and De Luca (Woodard Jr. D., 2001).
Statement of the purpose
The purpose of this research is to focus on a manageable scope of research that seeks to
further the foundational analysis of how research collaboration is conducted utilizing high speed
high performance research grids across North America. The gist of the project is to analyze the
low hanging fruit in the sciences that are most conversant with collaborative research models
using the North American Grid Fabric (NAGF). The study is designed to cultivate an
understanding of how we interact in the NAGF.
The work begins with a top-level analysis of domains associated with research collaborations that
are hosted by the participating institutions. By analyzing hyperlink patterns (inlinking and
outlinking), a high-level understanding of collaborative language preferences may be developed.
This is designed to see if language preference is present in Canada. While it has been confirmed
in research conducted in Scandinavian and European institutions of higher education (Vaughn L,
2007) (Fry, 2006), Canada is unlike these nations in the sense that it was founded by two
distinct cultures, French and English, that have lived together as a
nation. These distinct societies and cultures exist under one flag with
the diversities that any other nation might have, yet Canada is different at the same time.
Canada is a singularly unique laboratory for this research. Because it is so, there is no way to
predict if what happens in Scandinavia will happen here, or if it will be completely different.
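The kind of inlink and outlink tally described above can be sketched as follows. This is an illustrative sketch only: the site names, the language labels, and the link list are hypothetical placeholders, and assigning one language per site is a simplifying assumption rather than the study's actual coding scheme.

```python
from collections import Counter

# Hypothetical sites and their working languages (not data from the study).
SITE_LANGUAGE = {
    "grid-a.example": "en",
    "grid-b.example": "en",
    "grid-c.example": "fr",
    "grid-d.example": "fr",
}

# Each pair is (linking site, linked site): an outlink for the first site
# and an inlink for the second.
LINKS = [
    ("grid-a.example", "grid-b.example"),
    ("grid-a.example", "grid-c.example"),
    ("grid-b.example", "grid-a.example"),
    ("grid-c.example", "grid-d.example"),
    ("grid-d.example", "grid-c.example"),
    ("grid-d.example", "grid-a.example"),
]


def link_counts(links):
    """Return (inlink counts, outlink counts) per site."""
    outlinks = Counter(src for src, _ in links)
    inlinks = Counter(dst for _, dst in links)
    return inlinks, outlinks


def language_pair_counts(links, site_language):
    """Count links by (source language, target language)."""
    return Counter((site_language[s], site_language[d]) for s, d in links)


inlinks, outlinks = link_counts(LINKS)
pairs = language_pair_counts(LINKS, SITE_LANGUAGE)
same_language = sum(n for (a, b), n in pairs.items() if a == b)
print(f"same-language share of links: {same_language / len(LINKS):.2f}")
```

A same-language share well above what random linking would produce is one possible signal of language preference; interpreting it properly would require a baseline that this sketch does not attempt.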
Research Questions
The fundamental research question, simply stated, examines how collaborative
preferences impact how research collaborations are conducted in the NAGF. The research
questions parallel the work of Vaughn, Kipp, and Gao regarding the macro-analysis of linking
patterns from which meaningful patterns may be deciphered (Vaughn L, 2007). Are the
co-linked sites related, and if so, how are they related? The language preference will serve as a
contextualization layer to be analyzed after gathering data from the research. Language
preference is an overlay to the central question and a backdrop designed to tease out even
more understanding of what drives effective collaboration across the grid and how policy may
impact that collaboration.
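Co-linking can be made concrete with a short sketch. Two sites are treated here as co-linked when some third site links to both of them, and the co-link count for a pair is the number of distinct sites that do so. The site names and link list are hypothetical, and this counting rule is one common convention, assumed here for illustration rather than taken from the cited studies.

```python
from collections import defaultdict
from itertools import combinations


def colink_counts(links):
    """Count, for each pair of sites, how many distinct sources link to both.

    `links` is an iterable of (source site, target site) pairs.
    Self-links are ignored.
    """
    targets_by_source = defaultdict(set)
    for src, dst in links:
        if src != dst:
            targets_by_source[src].add(dst)

    counts = defaultdict(int)
    for targets in targets_by_source.values():
        # Every pair of targets shared by one source is co-linked once.
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical link data (illustration only).
LINKS = [
    ("portal-1.example", "grid-a.example"),
    ("portal-1.example", "grid-b.example"),
    ("portal-2.example", "grid-a.example"),
    ("portal-2.example", "grid-b.example"),
    ("portal-2.example", "grid-c.example"),
    ("portal-3.example", "grid-c.example"),
]

counts = colink_counts(LINKS)
for pair, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, n)
```

Pairs with high co-link counts are candidates for being related in the eyes of the sites that link to them; a language label per site could then be overlaid on these pairs as the contextualization layer.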
North American Grid Structures
The National Research Council (NRC) has been the Government of Canada's premier
organization for research and development since 1916; and it is also the financial driver for the
development of the Canadian National Grid Fabric (CNGF). A memorandum of understanding
was signed in August of 2001 between CANARIE, the C3.ca, and the NRC. The three agreed
they would monitor interdependencies, agree about technical directions, share project
management, and define a Grid focus in projects. Each brings expertise to the table: advanced
networks, high performance computing systems, and advanced multi-laboratory eScience
projects, respectively.
The Grid Canada project is committed to enabling a core grid infrastructure for use by
these three grid structures and their partners. It is also designed to effectively leverage the
resources each can provide, providing the genesis of a formidable CNGF. Some infrastructure
has already been built, and Grid Canada has involved itself in the development of several
applications that will use this infrastructure. Some examples include NRC's iHPC, CANARIE's
Lightpath, and University of Victoria's Data Grid projects.
The NRC President's Challenge has resulted in a $3 million grid-based, multi-scale
computation platform for modeling of nano-structures and biological materials. The core grid
infrastructure will be built and supported by a team internal to NRC in conjunction with Grid
Canada.
The CANARIE Customer-Empowered Lightpaths project is developing standard
interfaces to allow the provisioning of end-to-end lightpaths across heterogeneous network
resources. This work is proceeding with an eye towards the next generation of grids based on the
Open Grid Services Architecture, a Web Services enabled infrastructure that can leverage
emerging web standards. Grid Canada is actively tracking this next generation and planning new
infrastructure support.
Researchers at the University of Victoria will be taking part in experiments at CERN that
will be extremely data intensive. They need access to infrastructure that is being built by the
European Union Data Grid effort. Grid Canada is working towards harmonizing its infrastructure
with respect to the EU Data Grid so that the science can be done in the Canadian grid community
as well as the explosively growing international grid community (Canada, 2007). Significant
investments have enabled these collaboration platforms and will continue to expand their reach
into research and higher education.
Canadian Regional Grids
Grid development in Canada has proceeded at a slower pace than in the United States.
Given constrained resources and limited funding ability of the Canadian NRC, the grid fabric
continues to develop across different regions of Canada. Canada's national grid fabric is
shouldered by CANARIE, which was one of the most advanced grid structures designed for
research and education when it was deployed.
The regional grids take advantage of the CANARIE infrastructure. One of the mandates of
the CANARIE infrastructure, as guided by the NRC, was to create an internet research laboratory,
but also to provide a platform for education in remote areas of the country, such as Nunavut,
the Yukon Territory, the Northwest Territories, and areas of Northern Quebec, Labrador, and
Newfoundland. This was done to extend the digital classroom to First Nations citizens and to
ramp up educational capacity in traditionally underserved areas. An overview of the Canadian
regional grid fabric shows collaboration across provincial boundaries and also demonstrates
intra-provincial grid fabric as well. One of the questions we approach through an analysis of
inlinking and outlinking patterns relates to preference of aboriginal language and cultural
knowledge.
It doesn't take a leap of faith to understand that First Nations share similar interests and
viewpoints, but will they prefer to work with particular non-First Nations groups when language is
identified as a unique separator? In other words, is it more important to collaborate with another
First Nations researcher if they speak English or if they speak French, or does this not matter at
all? None of these questions have been asked, and part of the interest of this research is to see if
there is any information that can be gained from examining how research and learning
preferences are expressed through inlinking and outlinking patterns. In other words, how can
we tell what is important to them by who they want to work with?
The far north is perhaps one of the greatest laboratories for examining cultural and
language preference because of one key fact: its relative isolation can be penetrated easily only
by technology. Ask anybody you know whether they have personally been to the Arctic;
very few people will be able to answer yes. There is no other
place more isolated that has an established population than Canada's far north. As such, it is a
great place to look at inlink and outlink patterns on a small scale and find out if patterns may be
deciphered. Canada is also an obvious setting for examining English and French language
preferences between grid infrastructures. The analysis of Quebec as a central point of French
collaboration across the NAGF is easy to compare to language preferences in French speaking
countries outside of North America. The Canadian laboratory, in short, has unique benefits for the
research that can be correlated to the Scandinavian countries where the bulk of the existing
research data originates.
Table 1 - Canadian Regional Grid Fabric
WestGrid: WestGrid operates high performance computing (HPC), collaboration and visualization infrastructure across western Canada. It encompasses 14 partner institutions across four provinces and includes network partners BCNET, Cybera, SRnet, MRnet, CANARIE
SHARCNet: SHARCNET is a consortium of Canadian academic institutions who share a network of high performance computers. With this infrastructure we enable world-class academic research. Goals are to accelerate computational academic research, attract the best students and faculty to our partner institutions by providing cutting edge expertise and hardware, and link academic researchers with corporate partners in a search for new business opportunities
HPCVL: HPCVL stands for the High Performance Computing Virtual Laboratory, cluster of fast and powerful Sun computers at five Ontario universities and three colleges: Queen's University, Royal Military College and St. Lawrence College in Kingston, Carleton University and the University of Ottawa in Ottawa, Ryerson University and Seneca College in Toronto, and Loyalist College in Belleville. In addition to reliable, secure computing, HPCVL provides storage resources and support for over 130 Canadian research groups, comprising some 800 researchers, working in a variety of fields.
RQCHP: The RQCHP is a consortium of five Quebec institutions of higher education whose mission is to provide researchers of these institutions with world-class high-performance computing (HPC) facilities, in addition to training and support from HPC professionals. The RQCHP's member institutions are the Université de Montréal, the Université de Sherbrooke, Concordia University, École polytechnique de Montréal and Bishop's University. The RQCHP is part of the Compute Canada collaboration, which ensures access to HPC facilities for all researchers in Canada. Thus, researchers from other Canadian institutions of higher education can obtain access to the RQCHP's systems.
AC3: The Atlantic Canada High Performance Computing Consortium (AC3) was formed by a consortium of universities located in Atlantic Canada. AC3 is dedicated to providing researchers at member institutions and across Atlantic Canada with High Performance Computing (HPC) resources they require to perform research.
United States Regional Grids
The development of regional grids in the United States saw a period of expansion
during the last decade of the twentieth century and has continued to expand into the new
century. A snapshot of regional grid infrastructures in the United States is given in
Table 2 below.
Table 2 - US Regional Grid Fabric
CALREN: CENIC's California Research and Education Network (CalREN) is a multitiered, advanced network-services fabric serving the vast majority of K-20 educational and research institutions in the state.
Connecticut Education Network: The Connecticut Education Network (CEN) is America's first statewide K-12 and higher education network to be built exclusively using state-of-the-art fiber optic connections. Operating at speeds 1000 times faster than a home broadband connection, the CEN provides incredible access to the Internet, the next generation Internet2, iCONN - Connecticut's re-search engine, and thousands of other resources exclusively targeted to students, teachers, researchers, and administrators in Connecticut's education institutions. Every K-12 school district and higher education campus now has a fiber optic-based connection that enables students, educators, and staff to take advantage of multimedia learning resources, research tools, and online administrative activities. Many public libraries are also connected to the network.
Florida LambdaRail: The Florida LambdaRail, LLC (FLR) was created to facilitate advanced research, education, and economic development activities in the State of Florida, utilizing next generation network technologies, protocols, and services. The FLR is complementary to the National LambdaRail (NLR) initiative, a national high-speed research network initiative for research universities and technology companies. The FLR provides opportunities for Florida university faculty members, researchers, and students to collaborate with colleagues around the world on leading edge research projects. The FLR also supports the State of Florida’s economic development and high-tech aspirations.
I-LIGHT: The I-Light network is a unique collaboration in Indiana between colleges and universities, state government and private sector broadband providers. Indiana colleges and universities are connected directly to I-Light at speeds from 1 Gigabit to 10 Gigabit with the ability to provide even larger, on-demand wavelengths between research groups on various campuses, when that functionality is needed. I-Light dramatically improves Indiana's position as a national leader in very high-speed networking in support of teaching, learning, research, technology transfer, and inter-institutional collaboration and cooperation, activities that will help fuel the State's economy. I-Light has enabled a community forum for the sharing of information. In addition to providing more bandwidth than most Indiana colleges and universities could otherwise afford, the network provides a variety of other capabilities such as connecting classrooms at distant locations with high-quality video-streaming and allowing researchers at any location to exchange large digital data files and access to supercomputers and scientific data storage facilities. It makes possible multi-campus collaborative research projects and enables the use of high-definition learning tools such as telepresence, a new way of video conferencing that gives the user the appearance of being at the same location.
I-WIRE: I-WIRE is a dark fiber communications infrastructure interconnecting Argonne National Laboratory, the University of Illinois (Chicago and Urbana campuses, including the National Center for Supercomputing Applications, NCSA, and the Electronic Visualization Laboratory, EVL), the University of Chicago, Illinois Institute of Technology, Northwestern University, the Illinois Century Network Chicago hub, and several collocation facilities in Chicago. Using a dedicated dark fiber plant and Ciena DWDM transport equipment, I-WIRE currently provides point-to-point lambda services between I-WIRE sites. Each I-WIRE site has a minimum of one OC-48 (2.5 Gb/s) lambda providing connectivity to Starlight. Projects using I-WIRE as of 2003 include the NSF-funded TeraGrid, OptiPuter, DOT and Teraport projects. The TeraGrid project, for example, uses I-WIRE to provide 30 Gb/s (3 x OC-192) connections between Starlight, Argonne and NCSA.
NEREN: NEREN (Northeast Research and Education Network), founded in 2003, is a consortium of non-profit organizations that provide a fiber-optic network connecting and unifying the research and education communities in New York and New England. NEREN securely enables some of the most prestigious universities in the world to explore the global resources that utilize ultra broadband applications. The NEREN network ties together in-state fiber initiatives, effectively creating an e-corridor that links the members not only to one another but also to facilities throughout the region and globe. The network primarily transports research, academic and healthcare information, but is also intended to allow corporate and government members to form partnerships and collaborations with the region's academic, research and healthcare members.
OSCnet: The Ohio Supercomputer Center provides supercomputing, networking, research and educational resources to a diverse state and national community, including education, academic research, industry and state government. At the Ohio Supercomputer Center, our duty is to empower our clients, partner strategically to develop new research and business opportunities, and lead Ohio's knowledge economy.
SURA Crossroads: The Southeastern Universities Research Association (SURA) is a consortium of colleges and universities in the southern United States and the District of Columbia established in 1980 as a nonstock, nonprofit corporation. SURA serves as an entity through which colleges, universities, and other organizations may cooperate with one another and with government in acquiring, developing, and using laboratories and other research facilities and in furthering knowledge and the application of that knowledge in the physical, biological, and other natural sciences and engineering.
Larger US Grid Fabrics are highlighted by the National Lambda Rail. This grid fabric
arose from the Internet2 consortium with additional funding from the National Science
Foundation. This new grid fabric is enabling some of the most difficult and extensive research
operations in the United States and is now stretching across oceans to enhance a more diverse
global collaboration grid fabric (GCGF). This infrastructure leads to an accelerated dispersion of
knowledge supplementing our increasingly globalized reality because knowledge, due to its
depersonalized and universal nature, lends itself to the forces of globalization (Delanty, 2001).
This is arguably an important part of the infrastructure of the knowledge society we see
developing out of what has been characterized as the postindustrial information society
(Castells, 1996) (Stehr, 1994) (Bohme, 1997). These infrastructures continue to grow and reach
across international boundaries and, by the very nature of their design and deployment,
encourage increased collaboration across political boundaries, supplementing the reach of
globalization. Research collaborations across the National Lambda Rail are extensive and are
briefly outlined in Table 3 below.
Table 3 - Supraregional US Grid Fabric
Atlantic Wave: International peering fabric enabling collaboration between researchers in Canada, the U.S., Caribbean and South America
CAMERA: The Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis leverages NLR infrastructure to build state-of-the-art computational resources and to develop software tools to decipher the genetic code of communities of microbial life in world oceans.
CENIC / ABQG University of New Mexico: NLR and members University of New Mexico and the Corporation for Education Network Initiatives in California (CENIC) provided the ultra high-speed network linking a DreamWorks/Cerelink digital media studio in Rio Rancho with Hollywood. The demonstration, on February 17, showcased how large, 3D animation files can be created in New Mexico and delivered quickly, securely and reliably to Hollywood studios. NLR arranged for a 1-Gbps FrameNet circuit between the New Mexico and the Los Angeles points-of-presence (PoPs). New Mexico Governor Bill Richardson referred to the demonstration as a "major advance in digital media production."
ESnet: NLR’s coast-to-coast, high-performance backbone network enables ESnet, or the Energy Science Network, of the Department of Energy (DOE), to support the high-bandwidth projects of thousands of DOE researchers and collaborators around the country.
GENI: For GENI, the Global Environment for Network Innovations, NLR makes available up to 30 Gbps of capacity on three different networks, FrameNet and CWave at Layer 2 and PacketNet at Layer 3. GENI researchers utilize these NLR networks as the platform for a wide range of advanced research, including in communications, networking, distributed systems, cyber-security and networked services and applications.
NASA: NLR provides the 10-Gigabit Ethernet connectivity between NASA centers and facilities around the U.S., including Sunnyvale to Washington, D.C. and Washington, D.C. to Atlanta.
Open Cloud Consortium: The Open Cloud Consortium (OCC) uses NLR as its wide-area test bed network, supporting the development of standards for cloud computing and frameworks for interoperating between clouds. Using the NLR infrastructure, the OCC recently demonstrated the first cloud designed for HIPAA-compliant applications and the first wide area cloud that uses a wide area 10 Gbps network.
Optiputer: Dedicated, high-capacity NLR circuits link research teams in Southern California and Chicago who are pioneering a radically new, distributed cyberinfrastructure based on optical networking, not computers, to support data-intensive scientific collaboration. Scientists who are generating terabytes and petabytes of data will be able to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks.
Pacific Wave: NLR and its partners are making possible high-speed, high-performance connections between researchers around the Pacific Rim, bridging the gap between national and regional networks. NLR is helping to create, deploy and operate an advanced, extensible peering facility along the entire US Pacific Coast. Recent applications included a demonstration of “4K” video teleconferencing, which has 4x the resolution of HDTV, between Tokyo, San Diego and Chicago.
TeraGrid: NLR provides the ultra-high speed, high capacity backbone infrastructure for TeraGrid, the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research. Thousands of researchers around the country take advantage of the over 100 discipline-specific databases, high-performance computers and high-end experimental facilities interconnected via TeraGrid under a major National Science Foundation grant.
UltraScience Net: NLR is the vital, high-speed, high-capacity link between Sunnyvale, CA and Chicago for UltraScience Net, an experimental research test bed funded by the Department of Energy’s Office of Science and managed by Oak Ridge National Laboratories. UltraScienceNet develops hybrid optical networking and associated technologies to meet the unprecedented demands of large-scale science applications.
The U.S. National Grid Fabric (NGF) is highlighted by the National LambdaRail (NLR). The
NLR is the ultra-high performance, 12,000-mile network infrastructure that makes possible many
of the world’s most demanding research projects.
The NLR is owned by the U.S. research and education community and provides high
performance networking and resource sharing on a platform dedicated to a wide range of
academic disciplines and public-private partnerships. The NLR offers unrestricted usage and
bandwidth, cutting-edge network services, applications, and customized service for individual
researchers and projects. The NLR map is shown below.
Figure 1 - National Lambda Rail Map
Comparative analysis of Canadian and U.S. grid development
Regional grid fabrics started to emerge in the United States and were followed by a phase of
consolidation and expansion that has evolved into what we now define as a semi-mature NGF.
The CA*net3 (subsequently CA*net4) topology is indicative of consolidation in a shared tree
and explicit join model. The CA*net3&4 PIM-SM domain topography serves the national
deployment. The various topologies of high speed high performance research grids enable
institutions to transmit, operate upon, and share enormous data sets related to academic
investigation. Given this capability, the inquiry narrows to questions of
language preference, governmental influence, and any emerging differences between universities
located in different geographical areas of Canada, namely the Maritimes and Quebec compared
to the rest of the country where English is the predominant language and culture. The CANARIE
Advanced Network topology that fosters the CA*net4 backbone is shown in Figure 2 below.
Figure 2 - CANARIE Map
Given the rapid evolution of the NGF in the United States, the Canadian government in
collaboration with regional and national grid organizations invested in significant upgrades to
advance the Canadian NGF. With support from the NRC, the enhanced infrastructure is known
as CA*net4 and has extended membership, access, and, notably, increased presence in traditionally
underserved areas in the far north.
Significance of the study
The significance of the study is that it adds to the field of inquiry into how we
collaborate across NGFs and GCGFs. This can have implications as the global research
environment matures. The GCGF research environment allows us to extend research capacity to
all areas of the globe and engage broader perspective and greater diversity of thought. It is,
really, no different than how we embrace diversity on our local campus except that it seeks to
extend diversity across institutions on a global basis. To bring together the great minds of all the
continents would, no doubt, be a noble endeavor. The consequences of failing to analyze and
implement appropriate policy regarding inter-institutional and international collaboration across
these research grids would certainly seem to be a significant limitation in an increasingly
globalized society.
CHAPTER II: REVIEW OF THE LITERATURE
Grounding literature and theoretical framework
While exploring the intersection of research and of the evolution of grid infrastructures created to
enable advanced collaborative and parallel research networks, I have experienced a progressive
interest in the exploration of certain microeconomic theories that supplement the insights of both
academic and industry leaders. Higher education, as an institution that must survive in the
society that sustains it, is not immune from the forces of the economy (Barr, 2002). Extending
collaborative models that leverage high performance research grids, by the nature of distributed
computing architecture, results in enormous opportunity to share resources across member
institutions, reducing cost pressures to each institution for similar resources that would otherwise
be sustained internally.
The very act of shifting work into extracorporeal environments, digital or otherwise, may
reasonably be interpreted as outsourcing. The extent of this activity, the costs and benefits, and
the various dynamics of impact to all parties concerned provide an interesting and fertile ground
for investigation. This proposal draws upon the intersection of research flavours that include
academic capitalism (Slaughter, 2004) (Slaughter S., 1997), resource based view (RBV), and
transaction cost economics (TCE) (Huang, 1998). Seeking to understand the demographic
landscape of the research is informed by the discussion of basic and applied research in higher
education and attempts, wherever possible, to identify and quantify these conditions (Bush,
1945) (Stokes, 1997).
Modern Institutions are surrounded by complex and dynamic economic conditions.
Factors that shape and define research agendas are influenced by a myriad of different forces. To
explore the evolving collaboration patterns in research using advanced collaborative GRID
Infrastructures seems like a natural field for academic investigation. The underpinning
methodology is taken from the field of webometrics which seeks to understand intellectual and
social dynamics within and between research disciplines involved with high performance
computing and narrows the scope to the evaluation of hyperlinking patterns as a grounding
parameter for scoping the impact of these developing collaborative environments (Thelwall,
2002) (Fry, 2006).
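The raw material for evaluating hyperlinking patterns is the set of outlinks found on each institution's pages. The following is a minimal sketch of that extraction step using only the Python standard library. The class name `OutlinkParser`, the sample page, and the host names are hypothetical, and treating any link to a different host as an outlink is an assumption made for illustration, not a description of the tooling used in the cited webometric studies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutlinkParser(HTMLParser):
    """Collect the hosts of links that point outside the page's own host."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).hostname
        self.outlink_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(self.base_url, href)).hostname
        # Links without a host (for example mailto:) and same-host links
        # are not counted as outlinks.
        if host and host != self.base_host:
            self.outlink_hosts.add(host)


# Hypothetical page content (illustration only).
SAMPLE_HTML = """
<p><a href="/about">About</a>
<a href="http://grid-b.example/projects">Partner</a>
<a href="http://grid-c.example/">Partenaire</a>
<a href="mailto:someone@grid-a.example">Contact</a></p>
"""

parser = OutlinkParser("http://grid-a.example/index.html")
parser.feed(SAMPLE_HTML)
print(sorted(parser.outlink_hosts))
```

Running such a parser over each participating site's pages would yield the (linking site, linked site) pairs on which inlink, outlink, and co-link counts are computed.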
Outsourceability
Outsourceability comprises many different viewpoints informed by robust
resources of peer reviewed material. Academic theories also apply to the study of outsourcing.
Resource based view speaks to the early years of outsourcing, especially in support services in
countries such as India. Given the limited nature of resources incurred by most institutions, there
are times when the institution cannot possibly bring suitable resources to bear on specific areas
of research in basic or applied research interests as described by Bush (Bush, 1945). A good
example of this would be research that requires extensive computational overhead. Certain
institutions maintain massively parallel supercomputer facilities, but it is far more often the case
that institutions do not have such facilities. RBV would suggest that if the institution is not able
to field the resource at a world-class level, then this component of the research is a candidate to
be outsourced.
Recent academic developments in this arena explore the application of more theoretical
constructs. Business definitions of words such as outsourcing tend to transition from business to
academe, and the term is now an established part of the lexicon in higher education research. This
transition from a passing interest in an emerging area of economic development to a significantly
researched area of academic inquiry is a natural progression.
For the purposes of this research, the challenge is to rethink the way we look at
outsourcing research by how we define that activity. When we look at collaborative consortiums
such as those that now thrive in higher education research, we see extensive sharing of resource
bases, whether they be hardware or software, whether they be facilities or equipment, or whether
they consist of exchanging and collaborating with human resource assets, i.e. multiple
investigators from various institutions wielding various sets and subsets of these resources. For
this research, I define the outsourceability of an activity as the degree to which it is
beneficial to outsource that activity, in accordance with the work of Mol (Mol, 2007). I support
the view that the genesis and growth of outsourcing are correlated with shifting and compressed
budgets, and I also note that as research agendas change, the budgets change along with them,
constantly shifting the nature of academic inquiry susceptible to outsourcing.
Resource based view
Whenever an organization finds itself in a position where a specific process or certain
work is no longer inimitable, and is not inherently a part of the core competencies that mark its
strengths or refined areas of expertise, this is considered to be fertile ground for outsourcing
activities or processes. In most cases, RBV identifies and shapes the matrix of research that can
be outsourced. Organizations seek to efficiently leverage existing collaborative relationships
with other institutions to maximize budget generation potential via enhanced competitive
positioning in the grant review process and, quite naturally, to generate superior results as a
consortium. This is typically seen in research programs where multiple institutions partner in a
collaborative effort to distribute resources in a manner that leverages various strengths of
different institutions. Some may have supercomputer overhead while others maintain a
synchrotron or proton accelerator, while yet another may have world-class experts in various
fields of study. In a mixed resource pool, all parties bring certain offerings to the group (Yang,
2007).
Transaction cost economics
Another aspect of collaborative research environments speaks to the bottom line of cost
metrics. TCE tends to be leveraged a great deal when structuring business ventures, but this
theory is also seen in a variety of different ways in higher education. Most public
research institutions have an office that manages aspects of grant related research. This is
typically part of an award system whereby the institution takes a certain percentage of
the grant as a pro rata payment for the overhead costs associated with housing and maintaining
the facilities where the research is conducted. In these environments, the institutions are
expected to maintain a cap on these overhead expenses, especially in cases involving major
national funding bodies such as the National Institutes of Health (NIH) or the National Science
Foundation (NSF).
The granting agencies, quite naturally, seek to keep costs down in order to minimize the
institutional “take” from the grant that is typically applied towards maintenance and overhead
expenses. This is but another example of the various forms of market pressures and incentive
strategies that tend to drive researchers to pursue value chain options in their research. In
essence, if they can accomplish greater amounts of research by outsourcing various aspects of
the research that are obvious candidates for value chain enhancement, thus reducing overall
expenses associated with those aspects of the research that are highly outsourceable, it becomes
increasingly likely that they will do so.
The schematic of TCE, however, points out the difficulties of this theory insofar as it
speaks to uncertainty and asset specificity. While there is little uncertainty surrounding issued
grants, there is enormous uncertainty surrounding extensions of many of those grants and the
continued support from the various sponsors of research, especially where basic research is
concerned. This uncertainty is lessened, obviously, in direct correlation to applied research that is
seen to hold the promise of profitability.
Figure 3 - TCE Schematic
Agency theory
Because higher education research is not typically grounded in the day to day profit
motives of corporations, there are differing views of the nature of value chains that exist. These
value chains span expertise and resources in structured remote collaborative environments
(RCE).
After stripping out the profit motives, we can see how agency theory informs RCE
organizations in Higher Education research environments. The problem domain in agency theory
arises when “the principal and agent have partly differing goals and risk preferences (e.g.
compensation, regulation, leadership, impression management, whistle blowing, vertical
integration, transfer pricing)” (Eisenhardt, 1989).
Agency theory speaks to challenges encountered when collaborating parties have a
divergence of goals. If both organizations are engaged in research that holds the same end goals,
such divergence is less likely to occur, setting the stage for enhanced research output.
These theoretical constructs underpin the motivations for institutional participation in
NAGR activities. The extent and the nature of that participation in RCE are correlated to
institutional resources, and this contributes to the nature, quality, and ultimately, the amount of
academic output. Measuring output through bibliometric analysis offers a method to secure
data points surrounding academic output while shedding light on qualitative aspects of RCE
in NAGR environments. Hyperlink mapping done at institutional and departmental levels has
shown that patterns of collaboration exist across national research infrastructures. It has also
shown that collaboration external to national infrastructures reveals interesting patterns
across languages that are the same or similar, whereas languages that are quite
dissimilar show dramatically lower levels of outlinking. While there is debate regarding the
qualitative nature of outlinks (i.e. journal level publications), it is possible to disaggregate and
categorize outlink data, showing meaningful patterns at the level of the department.
CHAPTER III: METHODOLOGY
Pilot study
A pilot study was implemented to test drive the software and query structure necessary to
complete the study. After numerous attempts to capture and categorize inlink/outlink structures
using a variety of different software including open source web crawling software such as Nutch,
it became evident that the complexities of gaining permission to crawl intra-institutional and
inter-institutional websites would be a major problem. It is also likely that many institutions
would have policies in place that prohibit such activities in the name of institutional security. In
other words, getting behind the firewall is a huge mountain to climb. Fortunately, drawing on
the work of Jenny Fry and Mike Thelwall (Thelwall, 2002) (Fry, 2006), an open source platform
was discovered that permits hyperlink analysis and data capture without requiring invasive web
crawling procedures. This eliminates the problem of having to gain access to each institution
website through what would be a lengthy and drawn out bureaucratic process at best. Instead, it
is a non-invasive scan of existing page links that may be captured and recorded into a
spreadsheet and/or a database structure of choice.
The first test of this technology provided the data required for this study and also allowed
for flexibility in deployment strategies. In short, almost any variable required for hyperlink
analysis may be easily programmed, and keywords may be selected at the pleasure of the researcher.
Better still, this technology can be used to evaluate hyperlink structures anywhere on the
internet.
Preliminary findings have not yet been categorized, but have been presented in their raw
format in tables 5-10 and were focused on Laval University in Canada to test the flexibility of
the query structures. Categorization structures are noted in tables 11 and 12. These structures were
designed based on accepted scientometric standards (Thelwall, 2002; Vaughn L, 2007; Persson,
1997) (Fry, 2006). An explanation of the query structure is noted in table 4. Because of the
simplicity of access and the ability to deploy in any region, these query structures and open
source software tools offer a rich ability to collect this data. They are also an easily
accessible resource for any researcher in this field of study and require little specialized
hardware, making them tools that can be leveraged with great ease. Their greatest strength is that
they are completely open source and readily available to anybody.
Resource collating and data preparation
Categorization of research collaborations across heterogeneous high performance
research grids in select US and Canadian Grid structures presents webometric challenges,
but linking and co-linking data are readily accessible using existing tools.
Some tools were evaluated and discarded because the search engine did not support them,
either as a standalone product or in conjunction with third party developers via the
application programming interface (API), if an API existed at all. This approach was
discarded due to the programming challenges it presented and the uncertain probability of
a successful outcome.
Crawling sites for co-linking structures presented ethical issues with page demand
constraints. It also presented difficulties in leveraging the best open source
options: Nutch was the best candidate, but it relied upon a UNIX platform for
accessibility, so this too was not feasible for the investigation.
The co-link command on Yahoo was supplemented by the ability to leverage the -site
command and the -link command in conjunction with the link and linkdomain commands
respectively. This provided a mechanism capable of delivering data returns that can be
sorted, qualified, categorized, and then analyzed.
Table 4 - Query Structure Examples
Query: (link:http://www.canarie.ca -site:u.canarie.ca) AND
(linkdomain:http://www.westgrid.ca -site:http://www.westgrid.ca)
Data Output: Co-links to domain home page
Query: (linkdomain:http://www.canarie.ca -link:http://www.canarie.ca) AND
(linkdomain:http://www.westgrid.ca -link:http://www.westgrid.ca)
Data Output: Co-links to domain non-home pages
o (link:http://www.canarie.ca -site:u.canarie.ca) AND
(linkdomain:http://www.westgrid.ca -site:http://www.westgrid.ca)
(This query returns co-links to home pages)
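The query patterns in Table 4 can be assembled programmatically. The following is a minimal sketch in Python that only composes the query strings; it does not contact any search engine. The function names (clause, colink_query, non_home_page_colinks) are hypothetical and are not part of the tools used in the study, and the operator syntax is assumed to be exactly as it appears in Table 4.

```python
def clause(command: str, target: str, exclude_command: str, exclude_target: str) -> str:
    """Build one parenthesised clause, e.g. (linkdomain:X -link:X).

    `command` and `exclude_command` are operator names taken from Table 4
    (link, linkdomain, site); this helper is a hypothetical illustration.
    """
    return f"({command}:{target} -{exclude_command}:{exclude_target})"


def colink_query(*clauses: str) -> str:
    """Join clauses with AND so that only pages matching every clause are returned."""
    return " AND ".join(clauses)


def non_home_page_colinks(url_a: str, url_b: str) -> str:
    """Second pattern in Table 4: co-links to non-home pages of two domains."""
    return colink_query(
        clause("linkdomain", url_a, "link", url_a),
        clause("linkdomain", url_b, "link", url_b),
    )


if __name__ == "__main__":
    print(non_home_page_colinks("http://www.canarie.ca", "http://www.westgrid.ca"))
```

Keeping the clause builder separate from the AND join means the same helper can express the home page pattern as well, by passing link or linkdomain with a site exclusion.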
Data categorization
Data collected from early co-linking analysis supports the work of Vaughn, Kipp and Gao in their
examination of co-linking (Vaughn L, 2007). The categorization structures are designed to
understand how and why researchers are linked by assessing inlink/outlink patterns
supplemented by categorization of language preferences. By categorizing these structures with
these criteria, it is hoped that pattern analysis will reveal disciplinary patterns of linking
overlaid with an assessment of language preference. The central idea is to understand how these
patterns impact collaboration in high speed high performance research grids (HSHPRG).
Canadian HSHPRG Co-Link Structures: Initial returns from NAGR Institutions Canada
As an officially bilingual country, Canada is fertile ground for
investigating language preferences in HSHPRG environments. Accordingly, the initial pilot
study was deployed with a French language institution, both to test data returns and
to see if any readily identifiable patterns emerged. Interestingly, within the limited
scope of the pilot study, queries using the keyword "CO2" returned
no evidence of language preference. It may be hypothesized that language preference
correlates to particular fields of study. In fact, the only evidence of language specific preference
was noted on links to web pages hosted by the federal government of Canada, where bilingual
design is mandated under federal law.
Proposed NAGR Inlink/Outlink Categorization Structure Design
The categorization structure design requires the data to be organized into different
buckets. Drawing on existing scientometric research, a categorization scheme was developed
with the intention to understand why and how these hyperlink patterns exist between institutions
as outlined in table 11.
The overlay of language preference is a categorization scheme outlined in table 12 and is
primarily designed to take note of those institutions that demonstrate identifiable language
preference patterns outside of federally mandated structures. While this is particularly
meaningful for the Canadian component of the study, it may offer interesting findings in US
institutions where collaborative environments transcend national boundaries.
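The bucketing described above can be sketched as a simple tally. The example below is a minimal Python illustration, assuming each collected link has already been hand-labelled with one link category and one language label. The category and language names are taken from the column headings of Tables 11 and 12; the function name tally and the sample labels are hypothetical and are not study data.

```python
from collections import Counter

# Column headings from Table 11 (link purpose) and Table 12 (language).
LINK_CATEGORIES = ("Research", "Teaching", "General", "Not Related")
LANGUAGES = ("English", "French", "English & French")


def tally(labels, allowed):
    """Count labels into the allowed buckets and append a Total.

    Raises ValueError if a label falls outside the scheme, so that
    miscoded links are caught rather than silently dropped.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(allowed)
    if unknown:
        raise ValueError(f"unrecognized labels: {sorted(unknown)}")
    row = {name: counts.get(name, 0) for name in allowed}
    row["Total"] = sum(row.values())
    return row


if __name__ == "__main__":
    # Invented labels for illustration only.
    sample_categories = ["Research", "Research", "General"]
    sample_languages = ["French", "English & French", "French"]
    print(tally(sample_categories, LINK_CATEGORIES))
    print(tally(sample_languages, LANGUAGES))
```

Tallying the two schemes separately keeps the language overlay independent of the link purpose, which matches the way the two tables are laid out.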
Language preferences and co-linked grids
A study deployed across Scandinavian research grid environments found that scientific
collaboration played a key role and noted similar degrees of production. Rates of intra-grid
collaboration and extra-grid collaboration were also noted (Persson, 1997).
The amount of collaboration varies across fields. Some fields, such as physics and
medicine, have a very high degree of domestic intra-grid collaboration whereas international
collaboration outside of contiguous regional grids is quite low (Persson, 1997). This seems to
suggest that value chain efficiencies may exert significant influence over collaboration in extra-
grid international contacts and provides the incentive to explore a Canadian/American
(CANAM) comparison. Initial results show emerging patterns in the Canadian infrastructure of
higher education where international collaboration is concerned. Using one test subject of great
interest to the current mainstream academic interests, carbon dioxide (CO2) and global climate
change, the outcome should provide interesting findings regarding what parts of the country are
engaging in the research and how they collaborate with US institutions and the Government of
Canada. It should also show whether French institutions prefer to do this in the English or French
language, which is of interest where primarily French speaking institutions are concerned.
Sample data collected: Co-link with specificity “CO2” research
Table 5 - U Laval Linkdomain Query (ulaval.ca+.ca+.edu+"co2")
Type: Univ Type 1
FR/EN: French Language
U Laval: U Laval
Xact Text: "CO2"
Query: linkdomain:ulaval.ca +site.ca +site:.edu "co2"
Filter: .ca
Filter: .edu
Filter:
Data:
University of Alaska Fairbanks
Southern Illinois University
Michigan Technical University
University of California Santa Barbara
University of California Los Angeles
University of Wisconsin
Duke University
Duke University
University of Arizona
Table 6 - U Laval Linkdomain Query (ulaval.ca+.ca+.gc.ca+"co2")
Type: Univ Type 1
FR/EN: French Language
U Laval: U Laval
Xact Text: "CO2"
Query: linkdomain:ulaval.ca +site.ca +site:gc.ca "co2"
Filter: .ca
Filter: .gc.ca
Filter:
Data:
Natural Resources Canada
Ressources naturelles Canada
Fisheries and Oceans Canada
Peches et Oceans Canada
Chaires de recherche du Canada
Canada Research Chairs
CANMET
Ressources naturelles Canada
Table 7 - U Laval Linkdomain Query (umontreal.ca+.ca+.edu+"co2")
Type: Univ Type 1
FR/EN: French Language
U Laval: University of Montreal
Xact Text: "CO2"
Query: linkdomain:umontreal.ca +site.ca +site:.edu "co2"
Filter: .ca
Filter: .edu
Filter:
Data:
University of Pittsburgh
University of Buffalo
Utah State University
Gallaudet University
Table 8 - U Laval Linkdomain Query (montreal.ca+.ca+.gc.ca+"co2")
Type: Univ Type 1
FR/EN: French Language
U Laval: University of Montreal
Xact Text: "CO2"
Query: linkdomain:montreal.ca +site.ca +site:gc.ca "co2"
Filter: .ca
Filter: .gc.ca
Filter:
Data:
Natural Resources Canada
Ressources naturelles Canada
Table 9 - U Laval Linkdomain Query (usask.ca+.ca+.edu+"co2")
Type: Univ Type 1
FR/EN: English Language
U Laval: U Saskatchewan
Xact Text: "CO2"
Query: linkdomain:usask.ca +site:.ca +site:.edu "co2"
Filter: .ca
Filter: .edu
Filter:
Data:
University of Colorado
Table 10 - U Laval Linkdomain Query (usask.ca+.ca+.gc.ca+"co2")
Type: Univ Type 1
FR/EN: English Language
U Laval: U Saskatchewan
Xact Text: "CO2"
Query: linkdomain:usask.ca +site:.ca +site:.gc.ca "co2"
Filter: .ca
Filter: .gc.ca
Filter:
Data:
Environment Canada
Environment Canada
Environnement Canada
DFAIT
Pilot Study Analysis
Because French and English institutions were compared, it was readily evident that
inlink/outlink patterns, on a superficial level, were highly dependent upon language. This was
not unexpected given the results of previous studies conducted across Scandinavian countries
(Persson, 1997). The pilot study found a very direct correlation to language preference in the
very first data sets that were analyzed. These results would likely be paralleled at other
distinct English and French speaking universities.
Accordingly, since language preferences are predominant across universities in Canada,
there was curiosity about just how much language would influence grid collaboration
environments. The mainstay of the study is to uncover collaborative patterns that exist between
regional grids in the CANAM grid infrastructure.
Nevertheless, while the study seeks to understand collaborative patterns of inlinking and
outlinking at the higher level of research grid collaboratory environments, any
obvious language differences that present themselves will, of course, be noted in this
study. The NAGR Inlink/Outlink Categorization structure was limited to a
structure determined to be manageable after exhaustive analysis by previous researchers
in the field (Fry, 2006) (Thelwall, 2002).
Instead of trying to parallel the work on language preferences, the study seeks to apply a
unique analysis that leverages the thought patterns of previous research, but focuses neither
directly upon language nor upon high level domains (e.g. the Arizona.edu domain). The
investigation, instead, will focus on regional grid infrastructures in regional proximity within the
United States and Canada in order to determine the nature of research collaborations that take
place at the top level domain, followed by a more granular department level analysis
of those institutions where Education related collaborations are underway.
In addition, the study will seek to explore direct collaborative activities between
prestigious high speed high performance research grids at high level institutional levels and
compare that to overall US News and World Report rankings. The study will also seek to analyze
any components of Higher Education Administration programs that are found to be associated with
these institutions. In short, is there a correlation of inlink/outlink connections between
institutions where Higher Education Administration Programs are ranked in US News and World
Report?
Table 11 - NAGR Inlink/Outlink Categorization Structure
Research | Teaching | General | Not Related | Total
Table 12 - NAGR Inlink/Outlink Language
Institutional Language | English | French | English & French | Total
CONCLUSIONS