Slides from a talk given at the National Genealogical Society Family History Conference, Raleigh, NC, May 11th, 2017 - Session T218. Shared for personal use only. No use approved for non-profit or for-profit organizations.
Beyond Google: The Evolution of Search - NGS 2017
1. NGS FAMILY HISTORY CONFERENCE
Raleigh, North Carolina, 2017
Session T-218, Syllabus, p. 169
JORDAN JONES
E-mail: jordan@genealogymedia.com
Web: genealogymedia.com
Twitter: @genealogymedia
2. o Turn off any noise-making ringers on devices.
o Use phones, tablets, and computers only for personal notes
or brief social media posts.
o Do not take photos of slides.
o Photos during the session are only by prior written
permission. No permission having been requested or
approved, there should be no pictures during the session.
o Photographing or recording any part of this session, or any
session at NGS, is a violation of the speaker’s copyright.
o In any case, I will post all of my slides at
www.genealogymedia.com/talks
3. 1. Access: Navigation and Search
2. How Do Search Engines Work?
3. Genealogical Searching
4. Basic Searches
5. Advanced Searches
6. Special Google Search Features
7. Methodologies
8. A Search Example: Jane Graham
4.
5. Access is “The availability of or permission
to use records.”
— Archives & Records Management Handbook,
Oregon State U.,
http://osulibrary.oregonstate.edu/archives/handbook/definitions/
6. For web sites, access similarly describes
the permission and ability for people to
“identify, locate, and use information.”
7. The number of public web pages
quadrupled between 2005 (11.5 billion) and
2017 (47 billion).
One needs powerful search skills to find
relevant genealogical data.
8. o Navigation — Clicking through a pre-
defined path in a website to find the
information you need.
o Search — Helpful if you do not know
how to navigate to the information, or if
the site is designed for search, and
maybe has limited navigation available.
9. A good web designer will focus on
improving customer access to information
through both paths (search and
navigation).
However, some databases do not lend
themselves to navigation.
10.
11. 1. Gather
The search engine has computer
programs “crawl” the web, gathering all
the pages. Limited by:
• Login or Navigation Requirements
• Site Owner Requests for Inclusion or
Exclusion
12. 2. Organize
The search engine creates and manages
an index of all the words found on the
pages crawled
3. Cache
Some web search applications (such as
Google) store (cache) all the pages they
crawl
13. 4. Rank
Links are ranked in terms of relevance,
popularity, authoritativeness and other
criteria: The Secret Sauce.
5. Search
All the previous steps are in service of
helping you find content.
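The gather, organize, rank, and search steps can be illustrated with a toy example. This is only a sketch under simple assumptions, not how Google or any real search engine is built: the three page addresses and their text are invented, the index maps each word to the pages that contain it, and ranking is reduced to counting how often the search words appear on a page.

```python
from collections import defaultdict

# Hypothetical "crawled" pages (address -> text), invented for illustration.
PAGES = {
    "example.org/graham-family": "Jane Graham of Monroe County and the Graham family",
    "example.org/monroe-history": "History of Monroe County Virginia",
    "example.org/barns": "Old barns of Virginia",
}


def build_index(pages):
    """Organize: map every lowercased word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, pages, query):
    """Search: pages containing every query word, most frequent matches first."""
    words = query.lower().split()
    if not words:
        return []
    # Implicit AND: keep only pages that appear under every search word.
    matches = set.intersection(*(index.get(w, set()) for w in words))

    def score(url):
        page_words = pages[url].lower().split()
        return sum(page_words.count(w) for w in words)

    # Rank: highest score first, with ties broken alphabetically by address.
    return sorted(matches, key=lambda url: (-score(url), url))


INDEX = build_index(PAGES)
print(search(INDEX, PAGES, "Monroe County"))
```

A real engine weighs many more signals (links, titles, freshness), but the index lookup is the reason a search comes back in a fraction of a second instead of re-reading every page.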
16. 1. Keyword
Every significant word is part of the
search. “Every word searchable.” This is
mainly what you go to Google for.
2. Database
Words are searched against particular
fields in a database, such as “surname”
or “state.” This is how most genealogy
sites are organized.
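The difference between the two kinds of search can be sketched in a few lines. The records, field names, and helper functions below are invented for illustration and do not reflect any particular genealogy site: the keyword search matches a word anywhere in a record, while the database search matches only within the named field.

```python
# Hypothetical records; field names and values are invented for illustration.
RECORDS = [
    {"given": "Jane", "surname": "Graham", "state": "Virginia",
     "notes": "Lived in Monroe County"},
    {"given": "Graham", "surname": "Monroe", "state": "Ohio",
     "notes": "Neighbor of Jane Smith"},
]


def keyword_search(records, *words):
    """Keyword search: every word must appear somewhere in the record."""
    wanted = [w.lower() for w in words]
    hits = []
    for record in records:
        all_words = " ".join(record.values()).lower().split()
        if all(w in all_words for w in wanted):
            hits.append(record)
    return hits


def database_search(records, **fields):
    """Database search: each named field must equal the given value."""
    return [
        record for record in records
        if all(record.get(name, "").lower() == value.lower()
               for name, value in fields.items())
    ]


# Both records contain the words "Jane" and "Graham" somewhere.
print(len(keyword_search(RECORDS, "Jane", "Graham")))
# Only one record has Jane as the given name and Graham as the surname.
print(len(database_search(RECORDS, given="Jane", surname="Graham")))
```

The keyword search cannot tell that "Graham" is a first name in the second record, and the database search would miss anything recorded only in the notes field.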
19. To include common words or to ignore
plurals and synonyms and search for a term
exactly as you typed it, preface the term with
a + sign.
Google: [ Jane Graham +barn ]
20. The exact phrase “Jane Graham”.
Google: [ “Jane Graham” ]
will find only pages with the exact phrase
“Jane Graham” and ignore pages that only
have phrases such as “Jane Eliza Graham”
21. “Jane” is near “Graham”.
Google: [ Jane * Graham OR Graham * Jane ]
will find “Jane Eliza Graham”, “Jane ‘Liza’
Eliza Graham”, and “Graham, Miss Jane”
but not “Jane Graham” or “Graham, Jane”
22. AND / OR
Google: [ “Jane Graham” OR “Graham, Jane” ]
This allows you to find something in spite of
the alternate ways something may be listed.
It is very handy for first-name last and last-
name first searches or searches with names
and nicknames.
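Typing every name order by hand gets tedious once nicknames are involved. The small helper below is a sketch of one way to generate the OR search; the function name is made up, and "Jennie" is only a hypothetical nickname used to show the pattern.

```python
def name_variants_query(given, surname, nicknames=()):
    """Build an OR search covering first-name-first and last-name-first forms."""
    phrases = []
    for first in (given, *nicknames):
        phrases.append(f'"{first} {surname}"')    # first-name first
        phrases.append(f'"{surname}, {first}"')   # last-name first
    return " OR ".join(phrases)


print(name_variants_query("Jane", "Graham"))
print(name_variants_query("Jane", "Graham", nicknames=["Jennie"]))
```

The resulting text can be pasted straight into the search box.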
23.
24. Find pages with words like your search
term or phrase.
(More useful outside genealogy, but has
some similarity to a Soundex search.)
Google: A search for [ ~genealogy ] returns
results about genealogy and family history.
25. Some sites allow wildcards (*, _, ?) to replace
one or more characters. Check the site’s
guidelines.
These are very handy for checking for
occurrences of variably-spelled surnames.
26. o ? replaces one character
o * replaces zero to five characters.
o Names must contain at least three non-
wildcard characters.
o To search for Janson OR Johnson OR
Jensen OR Johannsen, you could type
J*ns?n
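To see why J*ns?n covers all four spellings, the wildcard rules above can be translated into a regular expression. This is a sketch of the rules as stated on this slide, not the code any genealogy site actually runs; the function names are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    non_wild = [ch for ch in pattern if ch not in "*?"]
    if len(non_wild) < 3:
        raise ValueError("pattern needs at least three non-wildcard characters")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")          # exactly one character
        elif ch == "*":
            parts.append(".{0,5}")     # zero to five characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, name):
    """True if the whole name fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None


for surname in ["Janson", "Johnson", "Jensen", "Johannsen", "Jones"]:
    print(surname, matches("J*ns?n", surname))
```

All four target surnames match, while Jones does not, because it lacks the "ns" the pattern requires.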
28. o * replaces one or two whole words.
o There is no wild card for less than a word
on Google.
o For multiple surname spelling searches
on Google, you would use the Boolean OR
operator [ Johnson OR Jensen OR
Johannsen ]
29.
30. Find pages that do not include a particular
word.
Google: [ Jane Graham -Eliza ]
31. Find pages that do not include a particular
phrase.
Google: [ Jane -“Eliza Graham” ]
32. Find pages in a specific site, such as
usgenweb.org, where ...
Google: [ site:www.usgenweb.org Graham ]
33. Find pages, but exclude a specific site.
Google: [ -site:www.usgenweb.org Graham ]
Useful when you already know what’s at one
site, and it is coming up prominently in your
searches.
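The exclusion and site operators can be combined, and a saved search can be kept as a link. The sketch below assembles the search text and a Google link using Python's standard library. The helper names are made up, and the link format (google.com/search with a q parameter) is an assumption based on what appears in the browser's address bar after a search.

```python
from urllib.parse import urlencode


def google_query(terms, exclude=(), site=None, exclude_site=None):
    """Assemble search text with -word, -"phrase", site: and -site: operators."""
    parts = [terms]
    for item in exclude:
        # A phrase (anything containing a space) must be quoted to be excluded whole.
        parts.append(f'-"{item}"' if " " in item else f"-{item}")
    if site:
        parts.append(f"site:{site}")
    if exclude_site:
        parts.append(f"-site:{exclude_site}")
    return " ".join(parts)


def google_url(query):
    """Turn search text into a link (assumed format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_query("Jane Graham", exclude=["Eliza"]))
print(google_query("Jane", exclude=["Eliza Graham"]))
print(google_query("Graham", exclude_site="www.usgenweb.org"))
print(google_url(google_query("Graham", site="www.usgenweb.org")))
```

The order of the operators within the search does not matter, so placing site: after the name works the same as placing it before.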
34. City, County, State or Other Locale can
narrow search results drastically.
Google: [ “Jane Graham” “Monroe County” ]
Keep in mind that this is only a keyword
search on Google. It does not understand
that you are talking about a place.
Major genealogy sites, Flickr and other
database searches provide fields to narrow
searches by location.
35. Birth Certificate, Obituary, Newspaper
Google: [ “Jane Graham” newspaper ]
Major genealogy sites provide methods for
limiting searches to particular record types
based on database searches.
Google does not know what a newspaper is,
but it sees the word “newspaper” in the site,
or in links to it.
36. Google databases are databases for
particular records or kinds of information
Google: Image Search, Map Search
Major genealogy sites provide tools for
limiting searches to specific databases or
groups of related tags or descriptors.
37. Soundex, available on many genealogy
websites, and not only for census records.
Find pages with words like your search
term or phrase. (More useful outside
genealogy, but has some similarity to a
Soundex search.)
Google: A search for [ ~genealogy ] returns
results about genealogy and family history.
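For readers curious about what a Soundex search does behind the scenes, here is a sketch of the standard American Soundex coding rules (keep the first letter, code the following consonants as digits, skip vowels, collapse repeated codes, pad to four characters). It is written from the published rules and is not taken from any particular website's implementation.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex code for a surname."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                 # H and W do not separate repeated codes
        code = CODES.get(ch, "")     # vowels and Y have no code
        if code and code != previous:
            result += code
        previous = code              # a vowel resets the repeat check
    return (result + "000")[:4]


for surname in ["Graham", "Johnson", "Jensen", "Janson", "Johannsen"]:
    print(surname, soundex(surname))
```

Graham codes as G650, and the four J surnames all share the code J525, which is why a Soundex search finds them together.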
38.
39. Google Instant is Google’s predictive search.
Based on what you have searched for and
what others have searched for, it suggests
searches you might want to run.
40.
41.
42.
43. Searches recent pages.
Google has recently hidden this under the
Tools menu.
This is handy for looking at pages crawled
since you last ran a search.
44. Stephen Morse points out that Google is
really tracking when Google indexed a
page, not when the page was last modified.
Probably a better search for the age of a
web page is Stephen Morse’s:
http://stevemorse.org/google/googledate.html
45. Narrow your search to images (including
images of documents), video, blogs, voice
(for voicemail in Google Voice), news
(including recent obituaries), books, social
media.
See also Google Maps, which has taken over
from Google’s searchable map and image
mashup, Panoramio.
46.
47.
48. Google Ngrams tracks the currency of words
in the millions of titles in Google Books. It’s
handy to understand when words became
popular or passé.
https://books.google.com/ngrams
49.
50. Google Shopping can be used for quick
comparison pricing for that next scanner
51. @ – Search social media, such as @Twitter
or @Facebook.
53. a) Read the search tips.
b) Look for advanced search pages.
c) Experiment! Make your search more
specific to narrow it. Make your search
more general to widen it.
d) Map out strategies for more specific or
varied searches.
e) Automate search results into your inbox
with Google Alerts.
55. Fold3 Advanced Search Page.
https://www.fold3.com/s.php and click
“Advanced.”
GenealogyBank Advanced Techniques.
http://www.genealogybank.com/information/help/advanced-techniques
Google Advanced Search.
http://www.google.com/advanced_search
(under the Settings menu)
56.
57.
58.
59.
60. What do you want to know?
How close do your search results get you?
What might you subtract (if you have too
many results) or add (if you have too
few)?
61. Learn from your searches and build on
them.
Note what works in some cases, and see
where else you might apply it.
Keep a “search log” just as you keep a
“research log”
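A search log can be as simple as a spreadsheet. For those who prefer to script it, the sketch below writes log entries in CSV form, which any spreadsheet program can open. The column names are only a suggestion, and the sample entries (dates, result counts, notes) are invented for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SearchLogEntry:
    searched_on: str      # date of the search
    site: str             # where you searched
    query: str            # exactly what you typed
    result_count: int     # how many results came back
    notes: str = ""       # what worked, what to try next


def write_log(entries, out):
    """Write the entries as CSV rows, with a header line, to a file-like object."""
    columns = [f.name for f in fields(SearchLogEntry)]
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Invented sample entries.
ENTRIES = [
    SearchLogEntry("2017-05-11", "Google", "Jane Graham", 1000,
                   "Far too many results; add a place"),
    SearchLogEntry("2017-05-11", "Google", '"Jane Graham" "Monroe County"', 12,
                   "Manageable; review each result"),
]

buffer = io.StringIO()
write_log(ENTRIES, buffer)
print(buffer.getvalue())
```

To keep a permanent log, pass an opened file instead of the in-memory buffer.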
62. You can have Google search in the
background and send results on a regular
basis:
www.google.com/alerts/
63. Get results when Google finds them (“as-it-
happens”), daily or weekly
Select “All” or “Only the best” results
Choose to receive your results
in your e-mail, or via an RSS feed
64.
65.
66. o Facts: Jane Graham was born in 1811
and died unmarried in 1854.
She lived her life in Monroe County, VA
(now WV).
o Q: How do I find her?
o A: By adjusting the specificity of the
search.
67.
68.
69.
70.
71.
72.
73. By creating a more specific search, we
narrowed the results
from nearly 51 million
to 7,
or by a factor of almost 7.3 million!
For librarians, the main issues are: Is the information available? And do you have permission to use the information?
For technologists, in addition to permission and ability to access, there’s also the question of identifying, locating, and using the information.
Gather – By “crawling” through links on the web using software “spiders,” a search engine gathers the information it will use to provide search results. Crawling is limited in two ways:
o By Login or Navigation Requirements. Some web pages are not accessible to spiders because of the technology used, or because of login requirements.
o By Site Owner Requests for Inclusion or Exclusion of URLs. Webmasters use a file called robots.txt to request limitations to crawling. Conversely, a sitemap suggests pages to search, and can include pages not directly available to the search tool.
Organize – The search engine “indexes” the sites it has crawled. That is, it creates and manages a series of relationships between index terms (words and phrases) and URLs (websites).
Cache – Some web search applications (such as Google) store all the pages they crawl. This allows you to see what was on the webpage at the time the search engine crawled it, in case the page has changed or moved before you receive your search results.
Rank – Links delivered are ranked in terms of relevance (is the term in the title, the metadata, or some other prominent place on the website?), popularity (how many pages link to it?), authoritativeness and other criteria: The Secret Sauce. SEO (Search Engine Optimization) is a field dedicated to improving site rank.
Search – Now we’re getting somewhere. This is the function you came to the search engine to get, but understanding the previous operations will help you get better search results.
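The robots.txt request described under Gather can be tried out with Python's standard library, which includes a reader for these files. In this sketch the robots.txt content, the example.org addresses, and the crawler name "ExampleBot" are all invented; a real crawler would download the file from the site itself.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the robots.txt text permits this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "https://example.org/public/index.html"))
print(allowed(ROBOTS_TXT, "https://example.org/private/tree.html"))
```

Pages a site owner excludes this way will generally not show up in a search engine, which is one reason some genealogical data can only be found by searching on the site itself.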
You need to be aware of both keyword and database searches. They each have limitations. Keyword searches have a hard time understanding the significance of any particular word. Database searches are only as good as the data model and the data put into the model. There is often no way to search for the data if it’s not in one of those fields, even though it might exist in another.
Your goal is to use either or both of these tools to narrow your search results down to a reasonable size.
Entering the words Jane Graham into most search engines functions as a search with an implicit “and”. This search will return the pages where Jane AND Graham both appear.
Google ignores common words (such as "the"), looks for words that are similar to your words, and includes plurals and synonyms.
You can force Google to search for words exactly as you type them by putting a plus sign in front.
* is a Google wildcard that replaces one or two words. We will talk about it some more later.
Boolean ands and ors are inclusive. In an OR used this way, you are finding all pages that have any of the values connected by the OR.
A tilde precedes a “Like” search on Google.
Johnson OR Jensen OR Johannsen
Janson OR Johnson OR Jensen OR Johannsen
Type a word, click
However, it depends on what you are trying to do. If you really want to see what is in the search engine since you last looked, Google’s index age is what you want anyway.
Under Settings, after you have done your first search, you can see Advanced Search, History (a helpful list of all of your recent searches), and Help.
This guides you through many of the searches I have described and
Plan your more complex web searches the way you would plan a trip to a major repository.