This document discusses the potential of science gateways and their role in connecting scientists to shared computing resources. It notes that gateways provide intuitive interfaces to access advanced capabilities, allowing scientists to focus on their research without deep technical knowledge. However, gateways must be developed by experts and sustained over the long term to gain the trust of scientists and truly enable new forms of collaborative, data-driven research.
Science Gateways and Their Tremendous Potential for Science and Engineering
1. Science Gateways
and their tremendous
potential for science and
engineering
Nancy Wilkins-Diehr
TeraGrid Area Director for Science Gateways
wilkinsn@sdsc.edu
2. Thank You for the Invitation to Speak
To such a distinguished audience in such a beautiful location
•Many similarities
between Banff and
Gateways
–Both are about
connections
•National park created due to
sea to sea railway connection
–Trail guides lead the way
•“Peyto assumes a wild and
picturesque, though
somewhat tattered attire”
–Describes Banff trail guides and
gateway developers!
Nancy Wilkins-Diehr (wilkinsn@sdsc.edu)
3. Phenomenal Impact of the Internet on Worldwide
Communication and Information Retrieval
Only 15 years since the release of Mosaic!
•Implications on the conduct of science are still evolving
– 1980s, early gateways: National Center for Biotechnology Information BLAST
server, search results sent by email, still a working portal today
– 1989, First ftp archive (archie) created at McGill
– 1992 Mosaic web browser developed
– 1995 “International Protein Data Bank Enhanced by Computer Browser”
– 2004 TeraGrid project director Rick Stevens recognized growth in scientific
portal development and proposed the Science Gateway Program
•Simultaneous explosion of digital information
– Analysis needs in a variety of scientific areas
– Sensors, telescopes, satellites, digital images and video
– #1 machine on Top500 today is 300x more powerful than all combined entries
on the first list in 1993
4. 1998 Workshop Highlights Early Impact of
Internet on Science
•Shared access to geographically
dispersed resources
•Assembling the best minds to
tackle the toughest problems
regardless of location
•Tackling the same problems
differently, but also tackling
different problems
•Not only the scope, but the
process of scientific investigation is
changed
– “As the chemical applications and capabilities provided by collaboratories
become more familiar, researchers will move significantly beyond current
practice to exciting new paradigms for scientific work”
•Requirements for future success include:
– Development of interdisciplinary partnerships of chemists and computer scientists
– Flexible and extensible frameworks for collaboratories
– Means to deploy, support, and evaluate collaboratories in the field
5. Rapid Advances in Web Usability
•First generation
– Static Web pages
•Second generation
– Dynamic, database interfaces, cgi
– Lacked the ease of use of desktop applications
•Third generation
– True networked and internetworked applications that enable dynamic two-way,
even multi-way, communication and collaboration on the Web.
– Remarkable new uses of the Web in the organizational workplace and on the
Internet
Source: Screen Porch White Paper, The University of Western Ontario (1996)
6. What’s Next?
“Prediction is hard. Especially about the future.” Yogi Berra
•Scientists of tomorrow are familiar with media we don’t even know about
•Not using full power of the internet by any means today
– Data and knowledge are handled differently
•Linking publications and data referenced in those publications
•Annotation, data provenance
•Inability to create discourse around a piece of data
– Ability to keep up with knowledge generation
•16,000 papers a week into PubMed
•50,000 papers a week in biology
–Right now have choice between reading abstract or paper, might add 10 minute
author clip
•How can science motivate in the way YouTube can?
– Streaming video to view simulations, using visual and sound media
– iPods everywhere, but not exploited for science
– Web 2.0
•Science was earlier internet adopter, now overtaken by business
– Now a big difference between commercial and scientific sites
• Noticeable efforts to keep users on commercial sites
Source: 5/14/07 interview with Dr. Philip Bourne, Protein Data Bank
7. The Internet as a Resource for News and Information about Science:
Summary of Findings at a Glance
The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science.
November 20, 2006, John B. Horrigan, Associate Director
•40 million Americans rely on the internet as their primary source for news and information about science.
•For home broadband users, the internet and television are equally popular as sources for science news, and the internet leads the way for young broadband users.
•The internet is the source to which people would turn first if they need information on a specific scientific topic.
•The internet is a research tool for 87% of online users. That translates to 128 million adults.
•Consumers of online science information are fact-checkers of scientific claims. Sometimes they use the internet for this, other times they use offline sources.
•Convenience plays a large role in drawing people to the internet for science information.
•Happenstance also plays a role in users’ experience with online science resources. Two-thirds of internet users say they have come upon news and information about science when they went online for another reason.
•Those who seek out science news or information on the internet are more likely than others to believe that scientific pursuits have a positive impact on society.
•Internet users who have sought science information online are more likely to report that they have higher levels of understanding of science.
•Between 40% and 50% of internet users say they get information about a specific topic using the internet or through email.
•Search engines are far and away the most popular source for beginning science research among users who say they would turn first to the internet to get more information about a specific topic.
•Half of all internet users have been to a website which specializes in scientific content.
•Fully 59% of Americans have been to a science museum in the past year.
•Science websites and science museums may serve effectively as portals to one another.
http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf
8. NSF (my sponsor) has long recognized the
importance of science and technology
interactions
•Interdisciplinary programs did much to facilitate application-
technology integration and develop standard tools
– 1997 PACI Program
•Marriage of technologists and application scientists
–A few groups served as path finders and benefited
tremendously
–NPACI neuroscience thrust in 1997 leads to Telescience
portal and BIRN in 2001
– Information Technology Research (ITR)
– NSF Middleware Initiative (NMI)
•Plug and play tools so more groups can benefit
9. NSF Continues Its Leadership Today
What Will Lead to Transformative Science?
•“Virtual environments have the
potential to enhance collaboration,
education, and experimentation in
ways that we are just beginning to
explore.”
•“In every discipline, we need new
techniques that can help scientists
and engineers uncover fresh
knowledge from vast amounts of
data generated by sensors,
telescopes, satellites, or even the
media and the Internet.”
•Gateways are a terrific example of interfaces that can support
transformative science
10. Flagship US$52M CDI Program Launched in
2008
•Cyber-enabled Discovery and Innovation (CDI) is
– “NSF’s bold five-year initiative to create revolutionary science and engineering
research outcomes made possible by innovations and advances in
computational thinking.”
– Program announced October 1
•Bold multidisciplinary activities that, through computational thinking, promise radical,
paradigm-changing research findings
•Far-reaching, high-risk science and engineering research and education agendas that
capitalize on innovations in, and/or innovative use of, computational thinking
•Partnerships that involve investigators from academe and industry, and that may include
international entities
•Growth to US$250M recommended by 2012
– Funded across NSF directorates
•Birds-of-a-feather session at SC07 in Reno, NV
11. Three Thematic Areas Offer Diversity
•From Data to Knowledge
– Enhancing human cognition and generating new knowledge from a wealth of
heterogeneous digital data
– Data mining, visualization, petascale computational power, etc. to help scientists and
engineers extract the most important information from the almost infinite amounts of data
from sensors, telescopes, satellites, the media, the Internet, surveys, etc.
•Understanding Complexity in Natural, Built, and Social Systems
– Deriving fundamental insights on systems comprising multiple interacting elements
– Simulate and predict complex stochastic or chaotic systems
– Explore and model nature’s interactions, connections, complex relations, and
interdependencies, scaling from sub-particles to galactic, from subcellular to biosphere,
and from the individual to the societal
•Building Virtual Organizations
– Facilitate creative, cyber-enabled boundary-crossing collaborations, including those with
industry and international dimensions
– Advance the frontiers of science and engineering and broaden participation in science,
technology, engineering and math fields
12. Exciting Canadian Activities
•September 13, 2007 announcement of $30M CANARIE
program
– Network-Enabled Platforms (NEP)
•Collaborative projects that accelerate the development of, and participation in,
national and international cyberinfrastructure and e-Research platforms. Participants
in the Program can be from both the public and private sectors.
– Infrastructure Extension Program (IEP)
•Extensions to Canada's research and education network that will enhance and
accelerate research, enable national and international collaboration, improve access to
knowledge, and contribute to the development of cyberinfrastructure and e-research
in Canada.
13. Science Gateways are a Natural Extension of
Internet Developments
•3 common types of gateway
– Web portal with users in front and services in back
– Client-server model where application programs run on users' machines
(i.e. workstations and desktops) and access services
– Bridges across multiple grids, allowing communities to utilize both community
developed grids and shared grids
•Continued rapid changes ahead, must be adaptable,
gateways can provide some nimbleness
14. Gateway Idea Resonates with Scientists
•Capabilities provided by the Web are easy to envision
because we use them in everyday life
•Researchers can imagine scientific capabilities provided
through a familiar interface
•Groups resonate with the fact that gateways are designed
by communities and provide interfaces understood by those
communities
– But also provide access to greater capabilities on the back end without the
user needing to understand the details of those capabilities
– Scientists know they can undertake more complex analyses and that’s all they
want to focus on
•But this seamless access doesn’t come for free. It all hinges
on very capable developers
15. Trust and Reliability are Fundamental to Success
•Fundamental in business applications
– Fundamental for science too
•The public gains confidence in internet sites that provide
accurate information reliably
– PubMed
– National Cancer Institute
– Google
– PayPal
•For scientists it takes far longer to build this confidence
– Scientists will not rely on gateway tools to conduct their analysis and store
their research results unless they have ultimate confidence in the interfaces
•Proven track record
–Run by reputable organization
–Have been in existence “a long time”
–Provide accurate results
–Work repeatedly
–Confidence in PDB developed over 30 years, started with community mandate that
proteins must be deposited before publications would be accepted
16. How can we build interfaces that scientists will trust?
•Expertise
– Simple web pages are easy to design
– Complex capabilities, particularly those involving grid access, take
knowledgeable developers to create a production product
•LEAD, nanoHUB show what investment can do
•Sustained funding
– Most science groups have money for research, not portal building or ongoing
support for portals
•Knowledge transfer
– Must take advantage of industry advancements
– Investments must result in building blocks that other applications can use
– Many gateways have similar issues
•Data access
•Analysis capabilities
•User work environments
•Workflow capabilities
17. Tremendous Opportunities Using the Largest
Shared Resources -
Challenges too!
•What’s different when the resource doesn’t belong just to
me?
– Resource discovery
– Accounting
– Security
– Proposal-based requests for resources (peer-reviewed access)
•Code scaling and performance numbers
•Justification of resources
•Gateway citations
•Tremendous benefits at the high end, but even more work
for the developers
•Potential impact on science is huge
– Small number of developers can impact thousands of scientists
– But need a way to train and fund those developers and provide them with
appropriate tools
18. What is the TeraGrid?
A unique combination of fundamental CI components
19. What is the TeraGrid?
•NSF-funded facility to offer high end compute, data and
visualization resources to the nation’s academic researchers
300+ Teraflops Computation
Visualization
20+ Petabytes Storage
Dedicated cross-country network
20. Opportunities and Challenges as a Virtual
Organization
•Full vision of cyberinfrastructure
– Data, compute, visualization, workflows
– But need to do a better job of representing the capabilities to researchers
– Creating prototypes for others to follow
– Never underestimate the value in keeping things SIMPLE
•Work with top notch people regardless of location
– Better for end users
•Single request process for all types of resources
•Single place for documentation
•But must work harder
– To sustain momentum in projects
•Set a few high-level goals
•Clear management structure
–Individual responsibility
–Project accountability
– To provide clarity for users
21. TeraGrid Resources Available for all Domain Scientists
At no cost to them!
•Integrated, persistent, pioneering
resources
•Significantly improve the ability
and capacity to gain new insights
into the most challenging research
questions and societal problems
•Peer-reviewed, proposal-based
access
– Targeted support available as
well
•Dedicated staff investment to
really make a difference on
complex problems
–Transformational science
•Must have PI commitment
•Make lessons learned available
for all
22. TeraGrid Usage
[Chart: compute cycles delivered, in millions of normalized units (axis ticks at 100 and 200), split into specific allocations and roaming allocations; ~50% annual growth]
TeraGrid currently delivers an average of 420,000 cpu-hours per day -> ~21,000 DC every hour
Source: Dave Hart (dhart@sdsc.edu)
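As a rough check on the usage figures above, the short Python sketch below converts the quoted 420,000 cpu-hours per day into the average number of CPUs that would have to be busy around the clock, and estimates how long usage takes to double at roughly 50% annual growth. The function names are illustrative, the growth rate is assumed to compound at a constant pace, and the "~21,000 DC" figure is not reproduced because the slide does not define it.

```python
import math

CPU_HOURS_PER_DAY = 420_000  # average quoted on the slide
ANNUAL_GROWTH = 0.50         # "~50% annual growth" read off the chart (assumed constant)


def average_busy_cpus(cpu_hours_per_day: float) -> float:
    """CPUs that must run continuously to deliver this many cpu-hours per day."""
    return cpu_hours_per_day / 24.0


def doubling_time_years(annual_growth: float) -> float:
    """Years for usage to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    print(f"average busy CPUs: {average_busy_cpus(CPU_HOURS_PER_DAY):,.0f}")
    print(f"doubling time: {doubling_time_years(ANNUAL_GROWTH):.2f} years")
```

Under these assumptions the delivered cycles correspond to about 17,500 CPUs running continuously, and usage doubles roughly every 1.7 years.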
23. TeraGrid User Community
[Chart: TeraGrid user community; chart labels: Gateways, Growth Target]
Source: Dave Hart (dhart@sdsc.edu)
24. Easy TeraGrid Gateway True and False Test
Answers Provided
•Any PI can request an allocation and use it to develop a gateway (T)
•Gateway design is community-developed and that is the core strength of the program (T)
•TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T)
•Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)
•TeraGrid selects all gateways (F)
•TeraGrid designs all gateways (F)
•TeraGrid limits the number of gateways (F)
•All gateways need TeraGrid funding to exist (F)
25. TeraGrid RATs
(Requirements Analysis Teams)
•Spring, 2005 Science
Gateway Requirements
Analysis Team (RAT)
– Identification of common needs
across the gateways
– Goal is production use of TG
resources in the gateway as well
as development of process and
policy within TG for scalable
gateway program and services
– Tremendous sharing of
experiences amongst talented
developers
26. 2006 – Implementing Common Gateway
Requirements
•Web Services
– GT4 deployment, identification of remaining capabilities
– Information services, WebMDS
•Auditing
– Need to retrieve job usage info on production resources
– GRAM audit deployed in test mode in September, inclusion in CTSSv4
•Community Accounts
– Policy finalized, security approaches being tested by RPs
– Attribute-based authentication testing
•Allocations
– Changes in allocation procedures, the mechanisms used to evaluate science
impact, and models for identity management, authentication and
authorization that are more tuned to virtual organizations.
•Scheduling
– Metascheduling RAT
– On-demand via SPRUCE framework
•Outreach
– Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD)
– SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science
Workflows and On Demand Computing for Geosciences Workshop
•Primer
– Living document in wiki, provides up-to-date overview and instructions for new
gateway developers (“how to make your portal a TeraGrid science gateway”)
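The community account and auditing items above can be illustrated with a small sketch: a gateway runs every job under one shared community account, so it has to keep its own record of which gateway user each job belongs to in order to report usage later. The Python below is only a toy model of that bookkeeping. The class, field names, and account name are hypothetical and are not the GRAM audit interface or any TeraGrid API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CommunityAccountLedger:
    """Toy per-user usage ledger for jobs run under one shared account (hypothetical)."""

    community_account: str
    usage_by_user: dict = field(default_factory=lambda: defaultdict(float))

    def record_job(self, gateway_user: str, cpu_hours: float) -> dict:
        """Attribute a finished job's cpu-hours to the gateway user who requested it."""
        if cpu_hours < 0:
            raise ValueError("cpu_hours must be non-negative")
        self.usage_by_user[gateway_user] += cpu_hours
        # The resource only sees the community account; the gateway keeps the mapping.
        return {
            "run_as": self.community_account,
            "gateway_user": gateway_user,
            "cpu_hours": cpu_hours,
        }

    def total_cpu_hours(self) -> float:
        """Total usage charged against the community allocation."""
        return sum(self.usage_by_user.values())


if __name__ == "__main__":
    ledger = CommunityAccountLedger("gateway_community")  # hypothetical account name
    ledger.record_job("alice", 2.5)
    ledger.record_job("bob", 10.0)
    ledger.record_job("alice", 1.5)
    print(dict(ledger.usage_by_user), ledger.total_cpu_hours())
```

Keeping the user-to-job mapping on the gateway side is one simple design. The slide's mention of attribute-based authentication suggests that user attributes may instead travel with the job request, which this sketch does not model.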
27. Current Activities – Moving Forward!
•Extend development of general gateway services
– React to and anticipate community needs
•Streamlined TeraGrid integration means more interest and more science
– Building Blocks for Science Gateways
(http://www.cigi.uiuc.edu/doku.php/projects/simplegrid)
•Continue targeted work with selected projects
– SidGrid, CReSIS
•Stay ahead of technology changes
– Well, at least not get too far behind…
•Build on burgeoning interest in gateways for education
– Navajo Technical College
– TeraGrid EOT supplemental funding
28. Planning for the Future of TeraGrid
•Activity led by U Michigan School of Information
– www.teragridfuture.org
– Gateway (June) and user (August) workshops held
– Report due February, 2008
•Recommendations from gateway workshop include:
– Support interaction and cross-fertilization among Science Gateway
development communities
•Sharing code and successful solutions
•Financial and professional support for developing gateways
– Develop gateway framework templates built upon toolkits which may already
exist
– Training, education, workshops, generalized & standardized basic services,
documentation
– End-to-end support for Virtual Organizations
– Operating more effectively as a community in order to better support the
education and development needs of gateway developers.
29. Selected Gateway Highlights
•nanoHUB
•Linked Environments for Atmospheric Discovery (LEAD)
•GridChem
•Biomedical Informatics Research Network (BIRN)
•Center for Remote Sensing of Polar Icesheets (CReSIS)
30. Highlights: NanoHub Explosive User Growth
•In past 12 months
– 26,000 users
•50% of usage from U.S.
– 10 courses viewed by over 6,000 users
– 165 podcasts downloaded by over 4,000 users
– 1400 online meetings
•Short clip from Gerhard Klimeck
31. Highlights: LEAD Inspires Students
Advanced capabilities regardless of location
•A student gets excited about what he
was able to do with LEAD
•“Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's
interpretation of the coastal front on
14 February 2007. It's interesting that
I found an example using IDV that
parallels our discussion of mesoscale
boundaries in class. It illustrates very
nicely the transition to a coastal low
and the strong baroclinic zone with a
location very similar to Markowski's
depiction. I created this image in IDV
after running a 5-km WRF run
(initialized with NAM output) via the
LEAD Portal. This simple 1-level plot
is just a precursor of the many
capabilities IDV will eventually offer to
visualize high-res WRF output. Enjoy!
Eric” (email, March 2007)
32. Highlights: GridChem’s Client-Server Approach
Provides Power and a Rich Feature Set
Source: Sudhakar Pamidighantam, NCSA (National Center for Supercomputing Applications)
33. Biomedical Informatics Research Network (BIRN)
BIRN is a National Center for Research Resources (NCRR) initiative
aimed at creating a testbed to address biomedical researchers
Source: Anthony Kolasny, Johns Hopkins
34. Shape Analysis - A Morphometry BIRN Project
[Diagram: Morphometry BIRN shape analysis pipeline, using TeraGrid supercomputing and storage]
(1) Data donor sites
(2) De-identification and upload
(3) Segmentation (MGH)
(4) Shape analysis of segmented structures (JHU CIS-KKI)
(5) Visualization (BWH)
Goal: comparison and quantification of structures’ shape and volumetric
differences across patient populations
Source: Anthony Kolasny, Johns Hopkins
35. BIRN uses SSHFS to mount TeraGrid
filesystems locally
CIS has 87TB of local storage.
/cis/net lists network drives.
220TB through CIS portal using autofs, samba, smbwebclient.
Source: Anthony Kolasny, Johns Hopkins University
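For readers unfamiliar with SSHFS, the Python sketch below builds the basic command lines for mounting a remote directory over SSH and unmounting it again. It only constructs and prints the commands rather than running them. The user, host, and paths are placeholders, not BIRN's or TeraGrid's actual configuration, and `fusermount -u` is the usual unmount command on Linux only.

```python
import shlex


def sshfs_mount_command(user: str, host: str, remote_path: str, mount_point: str) -> list:
    """Build the basic `sshfs user@host:remote_path mount_point` command."""
    return ["sshfs", f"{user}@{host}:{remote_path}", mount_point]


def unmount_command(mount_point: str) -> list:
    """Build the Linux FUSE unmount command for a mount point."""
    return ["fusermount", "-u", mount_point]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    mount = sshfs_mount_command(
        "alice", "login.example.org", "/scratch/alice", "/cis/net/teragrid"
    )
    print(shlex.join(mount))
    print(shlex.join(unmount_command("/cis/net/teragrid")))
```

Once mounted, the remote filesystem appears as an ordinary local directory, which is what lets existing desktop tools read remote data without modification.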
36. CReSIS (Center for Remote Sensing of Ice
Sheets)
•Awarded CI-TEAM
funding to build a Polar
Gateway
– International Polar Year 2007-2008
– Led by Geoffrey Fox, IU and Linda
Hayden, Elizabeth City State
•CReSISGrid
– Build a TeraGrid Science Gateway
– Provide broad-based educational and
training activity in Cyberinfrastructure
for remote sensing and ice sheet
dynamics
– Lessons learned in remote data
gathering can be applied to fields
37. When is a gateway appropriate?
•Researchers using defined sets of tools in different ways
– Same executables, different input
•GridChem, CHARMM
– Creating multi-scale workflows
– Datasets
•Common data formats
– National Virtual Observatory
– Earth System Grid
– Some groups have invested significant efforts here
•caBIG, extensive discussions to develop common terminology and formats
•BIRN, extensive data sharing agreements
•Difficult to access data/advanced workflows
– Sensor/radar input
•LEAD, GEON
38. Tremendous Potential for Gateways
•In only 15 years, the Web has fundamentally changed
human communication
•Science Gateways can leverage this amazingly powerful tool
to:
– Transform the way scientists collaborate
– Streamline conduct of science
– Influence the public’s perception of science
•Like e-commerce, Science Gateways need to build trust in
the infrastructure, tools, and methods that they use
•Unlike the public or commercial arena, scientists will be
vested in these gateways
– Science Gateways will need to build trust in the organization behind them.
Gateways need to have continuity
•High end resources can have a profound impact
•The future is very exciting!
39. Enjoy the Summit!
•Thank you for your
attention
•Please contact me for
further information
wilkinsn@sdsc.edu