Estrat search
1. ©2012 LHST
Search
Recovery and Discovery
Prof. Lee SCHLENKER
E-Stratégies
Sept 5th 2014
- Preliminary Draft -
How can you use enterprise
technologies to improve
apprenticeship?
2. ©2012 LHST
Focus         Improve        Knowledge  Leverage      Measure
Organization  Processes      Explicit   Transactions  Efficiency
Services      Delivery       Implicit   Interactions  Effectiveness
Networks      Relationships  Emerging   Interactions  Innovation
Search        Relevancy      Connected  Proximity     CTR
3. ©2012 LHST
Are business solutions anything
more than recovering something
you once knew or
discovering something that is
“out there” that others can’t
find?
4. ©2012 LHST
• The size of the indexed world wide web in 2012
- Indexed by Google: about 40 billion pages
• Yahoo deals with 12TB of data per day
(according to Ron Brachman)
• Twitter hits 400 million tweets per day (June,
2012. Dick Costolo, CEO at Twitter)
• Over 2.5 billion photos uploaded to Facebook
each month (2010. blog.facebook.com)
• 55 Million WordPress Sites in the World
http://www.worldwidewebsize.com/
5. ©2012 LHST
• Search is the attempt to make sense
of information
• As the amount of information
explodes, search has become the
user’s interface metaphor.
• Twenty percent of searches are for
entertainment, 15 percent are
commercial in nature, and 65 percent
are informational
• On the Internet, all intent is
commercial in one form or another
"The perfect search engine," says Google
co-founder Larry Page, "would
understand exactly what you mean and
give back exactly what you want."
6. ©2012 LHST
• Why does Paul Ford link the search for
“meaning” to the “Semantic Web”?
• How would you define “The New Economy”? Does
the notion still make sense today?
• The author compares Google with Amazon and eBay.
Why is the latter's business model under threat
today?
• What are the differences between the notions
of “web search” and “enterprise search”?
• Assess how feasible his notion of a
“personal agent” is today.
7. ©2012 LHST
• Web search applies search
technology to documents on the
open web, and
• Desktop search applies search
technology to the content on a single
computer.
• Enterprise search involves making
diverse content searchable for a
defined audience.
With Search you won’t ever have to
leave your house or open a physical
book…
Eric Borboen
8. ©2012 LHST
• Text-based (Bing, Google, Yahoo!).
Search by keywords. Limited search using
queries in natural language.
• Multimedia (QBIC, WebSeek, SaFe)
Search by visual appearance (shapes,
colors,… ).
• Question answering systems (Ask, NSIR,
Answerbus). Search in (restricted) natural
language
• Clustering systems (Vivísimo/IBM, Clusty)
• Research systems (Lemur/MIT, Nutch)
9. ©2012 LHST
• Crawling the set of documents to
skim the keywords from
their contents,
• Indexing the skimmed keywords (the “foam”)
in a semi-structured form, and
• Resolving user entries/queries
to return mostly relevant
results
Robert Korfhage
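The three steps on this slide (crawl, index, resolve) can be illustrated with a minimal sketch. It assumes a tiny in-memory collection in place of real web pages; the `DOCUMENTS` data and the function names `crawl`, `build_index` and `resolve` are hypothetical, and the ranking (a simple keyword count) is only one of many possible choices.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": document id -> text (stands in for fetched pages).
DOCUMENTS = {
    "doc1": "Enterprise search makes diverse content searchable",
    "doc2": "Web search applies search technology to the open web",
    "doc3": "Desktop search covers the content on a single computer",
}

def crawl(documents):
    """Step 1: visit every document and skim the keywords from its contents."""
    for doc_id, text in documents.items():
        yield doc_id, re.findall(r"[a-z0-9]+", text.lower())

def build_index(crawled):
    """Step 2: store the keywords in an inverted index (keyword -> {doc_id: count})."""
    index = defaultdict(dict)
    for doc_id, words in crawled:
        for word in words:
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def resolve(index, query):
    """Step 3: score documents by how often they contain the query keywords."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken alphabetically by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(crawl(DOCUMENTS))
print(resolve(index, "web search"))
```

The inverted index is what makes the last step cheap: a query only touches the postings of its own keywords rather than every document.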
10. ©2012 LHST
• Boolean
• Vector
• Probabilistic
• Fuzzy retrieval
• Language modeling
• Latent semantic indexing
11. ©2012 LHST
• The first step in classifying web pages is to
find an ‘index item’ that might relate
expressly to the ‘search term.’
• These days, a continuous crawl method is
employed as opposed to an incidental
discovery based on a seed list.
• Most search engines use sophisticated
scheduling algorithms to “decide” when to
revisit a particular page, according to its
relevance.
• The speed of the web server running the
page as well as resource constraints like
amount of hardware or bandwidth also figure
in.
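The revisit scheduling described above can be sketched with a priority queue. This is a simplified illustration, not how any particular search engine works: it assumes each page has a fixed revisit interval (shorter for pages judged more relevant), and it does not model server speed, hardware or bandwidth. The function name `schedule_crawl` and the URLs are hypothetical.

```python
import heapq

def schedule_crawl(pages, horizon):
    """Return the (time, url) visits made up to `horizon` time units.

    `pages` maps url -> revisit interval; a shorter interval stands for a page
    the engine considers more relevant (an assumption of this sketch).
    """
    queue = [(0, url) for url in sorted(pages)]  # every page is due at time 0
    heapq.heapify(queue)
    visits = []
    while queue and queue[0][0] <= horizon:
        due, url = heapq.heappop(queue)          # the page that is due soonest
        visits.append((due, url))
        heapq.heappush(queue, (due + pages[url], url))  # continuous crawl: re-queue
    return visits

pages = {"news.example/home": 1, "blog.example/post": 3}
visits = schedule_crawl(pages, horizon=3)
for due, url in visits:
    print(due, url)
```

Because every visited page is pushed back onto the queue, the crawl never finishes; it simply keeps revisiting pages at a rate set by their interval.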
12. ©2012 LHST
• Searching for text-based content in structured
data formats (databases, XML, CSV etc.)
presents special challenges
• Databases allow logical queries which full-text
search doesn't (use of multi-field boolean logic for
instance).
• There is no crawling necessary for a database
since the data is already structured.
• Databases are slow when solving complex
queries or using customized indexing formats
(compounding, normalization,
transformation, transliteration, etc.)
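As an illustration of the multi-field boolean logic mentioned above, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The table name, columns and sample rows are invented for the example.

```python
import sqlite3

# Hypothetical structured store: no crawling is needed, the fields already exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("Search basics", "Alice", 2002),
        ("Enterprise search", "Bob", 2005),
        ("Web crawling", "Alice", 2010),
    ],
)

# Multi-field boolean logic:
# (author is Alice AND year after 2005) OR title contains "Enterprise".
rows = conn.execute(
    "SELECT title FROM documents "
    "WHERE (author = ? AND year > ?) OR title LIKE ? "
    "ORDER BY title",
    ("Alice", 2005, "%Enterprise%"),
).fetchall()
titles = [title for (title,) in rows]
print(titles)
```

A plain keyword search over the concatenated text could not express the "year after 2005" condition, which is the kind of logical query the slide has in mind.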
13. ©2012 LHST
• Content Ingestion – push or pull
content collection
• Content processing and analysis –
normalizing content
• Indexing – dictionary of all unique
words, ranking and frequency
• Query parsing – user entries, multiple
dimensional filters and paging
information
• Matching – comparing the query to the
stored index
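The five stages listed above can be strung together in a small sketch. It is a toy model under stated assumptions: content is pushed in by the caller, normalization is just lowercasing and tokenizing, and ranking is raw term frequency. The class `SearchIndex`, its methods and the `dept` field are hypothetical names chosen for the example.

```python
import re
from collections import Counter, defaultdict

def normalize(text):
    """Content processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(Counter)  # word -> Counter({doc_id: frequency})
        self.metadata = {}                    # doc_id -> filterable fields

    def ingest(self, doc_id, text, **fields):
        """Content ingestion (push style) followed by indexing."""
        self.metadata[doc_id] = fields
        for word in normalize(text):
            self.postings[word][doc_id] += 1

    def search(self, query, filters=None, page=1, page_size=2):
        """Query parsing (terms, filters, paging) and matching against the index."""
        scores = Counter()
        for word in normalize(query):
            if word in self.postings:
                scores.update(self.postings[word])  # add term frequencies per document
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        hits = [
            doc_id
            for doc_id, _ in ranked
            if all(self.metadata[doc_id].get(k) == v for k, v in (filters or {}).items())
        ]
        start = (page - 1) * page_size
        return hits[start:start + page_size]

idx = SearchIndex()
idx.ingest("a", "Quarterly sales report for sales teams", dept="sales")
idx.ingest("b", "Sales onboarding guide", dept="hr")
idx.ingest("c", "Annual sales forecast", dept="sales")
print(idx.search("sales report"))
print(idx.search("sales", filters={"dept": "sales"}))
```

Filters and paging are applied after scoring here for simplicity; a production engine would typically push filters into the index lookup itself.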
14. ©2012 LHST
Profitability
Profit Margin (ttm): 27.48%
Operating Margin (ttm): 32.45%
Management Effectiveness
Return on Assets (ttm): 15.21%
Return on Equity (ttm): 22.36%
Income Statement
Revenue (ttm): 13.43B
Revenue Per Share (ttm): 43.676
Qtrly Revenue Growth (yoy): 57.70%
Gross Profit (ttm): 6.38B
Internet users spend about 15 million
hours a month on the site. Nearly four out
of five Internet searches happen on
Google or on sites that license its
technology
15. ©2012 LHST
January 1996-December 1997 – Sergey Brin and Larry
Page create BackRub, the precursor to the Google search
engine.
Sept. 7, 1998 - Google is incorporated and takes up
residence in a Menlo Park, California, garage with four
employees
September-October 2002 - Google rolls out its keyword
advertising program worldwide based on the GoTo.com
model
March-April 2002 - Google launches a beta version of
Google News
May-June 2003 - Google launches AdSense, an advertising
program that delivers ads based on the content of Web sites
History
Google is the fastest growing company ever – 400 000 percent revenue growth in five years.
16. ©2012 LHST
“To organize the world's information
and make it universally accessible
and useful”
“You Can Make Money Without
Doing Evil”
“You Can Be Serious Without a
Suit”
“No Pop Up Ads”
Larry Page : “I’m not a big believer in strategy”
17. ©2012 LHST
The PageRank algorithm looks at the links on a page,
the anchor text around those links, and the
popularity of the pages that link to a page to judge
its relevance
Google has 175,000 computers dedicated to the
job of crawling, more than all the computers on earth in
the early 1970s
Google developed its own OS to run on its
servers, along with a unique approach to designing, cooling and
stacking the components
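The link-popularity part of PageRank can be sketched as a power iteration over a small link graph. This is the textbook formulation with a damping factor of 0.85, not Google's production system, and it ignores the anchor text mentioned above. The function name `pagerank` and the four-page graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a link graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this graph C ends up with the highest score because three pages link to it, and A comes second because it is linked from the popular page C, which illustrates why the popularity of the linking page matters.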
18. ©2012 LHST
“‘Being a different kind of company’
encompasses more than the products we make
and the business we're building; it means
making sure that our core values inform our
conduct in all aspects of our lives as Google
employees.”
I. Serving our Users
II. Respecting Each Other
III. Avoiding Conflicts of Interest
IV. Preserving Confidentiality
V. Maintaining Books and Records
VI. Protecting Google's Assets
VII. Obeying the Law
VIII. Using our Code
Google tracks what products you shop for, the
mail you send, which phrases you research in a
book, which satellite photos and news stories you
view,…
20. ©2012 LHST
• You create your ads
• Your ads appear on
Google
• You attract customers
• You're charged only if
someone clicks your ad,
not when your ad is
displayed.
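The pay-per-click rule above comes down to simple arithmetic, sketched here with made-up numbers. The function name `campaign_cost` and the figures are illustrative assumptions; real pricing is set by auction and varies per click.

```python
def campaign_cost(impressions, clicks, cost_per_click):
    """Pay-per-click billing: impressions are free, only clicks are charged."""
    ctr = clicks / impressions if impressions else 0.0  # click-through rate
    return {"ctr": ctr, "cost": clicks * cost_per_click}

# Hypothetical campaign: 10,000 displays, 150 clicks, 0.40 per click.
report = campaign_cost(impressions=10_000, clicks=150, cost_per_click=0.40)
print(f"CTR: {report['ctr']:.2%}, cost: {report['cost']:.2f}")
```

The 9,850 displays that drew no click cost the advertiser nothing, which is the point of the last bullet.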
21. ©2012 LHST
AdSense automatically crawls the
content of your pages and
delivers ads (you can
choose text ads, image
ads, or both) that are relevant to
your audience and your
site
22. ©2012 LHST
Gmail -- Offer custom email addresses to your organization with
up to 25 gigabytes of storage for each account, search tools to
help people find information fast, plus instant messaging and
calendar tools built right into the email interface.
Google Talk -- Your users can call or send instant messages to
their contacts for free -- anytime, anywhere in the world. File
sharing and voicemail are included, too.
Google Calendar -- Your users can organize their schedules
and share events, meetings and entire calendars with others.
Your organization can also publish calendars and events on the
web.
Google Docs -- Your users can create documents,
spreadsheets and presentations and collaborate with each other
in real-time right inside a web browser window.
The Start Page -- A central place for your users to preview their
inboxes and calendars, access your essential content, and
search the web.
Google Page Creator -- Create and publish web pages for your
domain quickly and easily with this what-you-see-is-what-you-get
page design tool.
23. ©2012 LHST
• Google continues to bet on centralized servers and thin
clients. That's why it is spending $600 million to build a
new data center in North Carolina - the purpose is to
provide 100% uptime for business applications.
• Google built its web office suite via acquisitions. The
startups they have acquired are: Gtalkr (instant
messaging), Writely (word processing), iRows
(spreadsheets), JotSpot (wiki), Tonic Systems
(presentations), and Zenter (presentations).
• Google, whose web office solutions are based on AJAX,
has a clear online office strategy among the big companies.
To provide offline capabilities, Google developed
Google Gears, a set of browser plugins and
JavaScript libraries that enable AJAX applications to run
offline.
24. ©2012 LHST
Social Media
• Google plans to begin introducing a common
set of standards (Open Social) to allow
software developers to write programs for
Google’s social network, Orkut, as well as
others, including LinkedIn, hi5, Friendster,
Plaxo, Ning as well as Salesforce and Oracle.
• Google can benefit from their success, in part,
by selling advertising on those sites, in part by
incorporating social media functions inside
their own applications
• Google said it has advertising relationships
with several social networks (including
Facebook), and a $900 million partnership to
sell ads on MySpace.
25. ©2012 LHST
• An application to handle all the
information, browser – Chrome
• Internet - Support net neutrality
initiatives
• Mobile OS - Android as an open
platform
• Mobile Device - Nexus
26. ©2012 LHST
• Vic Gundotra: “Google's mobile moves are
driven by one objective: pushing the industry
to open up”
• The phones sold on the Google website will
all be available unlocked.
• Google doesn't want to compete with other
companies offering handsets.
• They want to change the mindset of
consumers towards having an open handset
that will work with any network, anywhere
27. ©2012 LHST
• Constant transformation: from big mainframes to
PCs, and from PCs to the Internet
• People increasingly rely on powerful mobile phones
instead of PCs to surf the Web
• Online advertising may well lose its role as the
Web's primary economic engine
• Recent Google acquisitions include Android, maker
of a mobile operating system; GrandCentral, a VoIP
operator; and AdMob, a mobile advertising network
• Google has invested heavily in mapping and location
technologies
• Google's mobile strategy isn't about hardware; it's about
generating money from its core business:
advertising
Sizing up Google's Nexus 10 tablet
28. ©2012 LHST
• Google's US ad revenue = 15 billion
• The size of the US Yellow Pages market is roughly 14
billion.
• Jonathan Rosenberg: mobile ads are already a
billion-dollar market for Google.
• Google owns a 97% search market share, while offering
localized search auto-complete and ads that map to
physical locations, and creating a mobile coupon-offer
network
• Google Trusted Stores, Google Wallet, and now
Google Local Delivery
29. ©2012 LHST
Rich content SERP will allow Google to
move into:
• Travel search
• Paid media (ebooks, music, magazines,
newspapers, videos etc.)
• Real estate
• Large lead generation markets (like
insurance, mortgage, credit cards, .edu)
• Ecommerce search
30. ©2012 LHST
             Web Search       Enterprise Search
Validity     Popular search   + Deep search
Algorithms   Links            Semantics
Scope        Public pages     + Private pages
Type         Web pages        + Data stores
Concerns     Ranking          + Security
31. ©2012 LHST
Architecture       Issues
Query layer        How will people find the data?
Indexing layer     What metadata (context) is relevant?
Processing layer   How should we interpret the data?
Connector layer    How can we bring this data “home”?
These are multiple opportunities to add value to the Microsoft
platform!
32. ©2012 LHST
• Before the Web we assumed that our
digital footprint was as ephemeral as a
phone call
• Clickstreams can provide a level of
intelligence about how people use the
Web
• Innovative companies have figured out
how to deliver great Web-based services
by divining clickstream patterns
• We have yet to aggregate the critical
mass of clickstreams in a database of
intentions
33. ©2012 LHST
• Blogs are personal statements of who their
authors are and who they wish to be in the
searchable world.
• The blog is an indexable statement of an
individual’s social standing, relationships,
interests and history.
• Mass personalization – blogs can become
proxies for personal taxonomies
• Intelligent engines will be able to discern
patterns among blogs that will provide
third order relevance inputs that will help
define and return far better search results
John Battelle
34. ©2012 LHST
• The Web is in the process of becoming the next
great computing platform, owned by no-one and
used by everyone.
• The telephone, the automobile, the television, the
stereo are all part of the network (your dog, your
kid)
• By tracking not only what searches you do, but what
sites you visit, the engines of the future will be able
to build a real-time profile of your interests
• Recovery is everywhere you’ve been before,
discovery is everything you may wish to find, but
have yet to encounter.
• In the near future we’ll store everything that can be
digitized on one massive platform – the Google
grid?
35. ©2012 LHST
• It’s what your job in marketing, sales
and management is all about
• Decisions are based on judgment
and precision
• Search ends with proof of value
rather than an empty box
• Enterprise Search is an integral part
of BI, Collaboration, ECM, and UC
Editor's notes
Set-theoretic models represent documents as sets of words or phrases. Similarities are usually derived from set-theoretic operations on those sets.
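One set-theoretic similarity is the Jaccard coefficient, the size of the intersection divided by the size of the union. The sketch below is only one example of such a measure; the function name `jaccard` and the sample documents are assumptions made for illustration.

```python
def jaccard(a, b):
    """Similarity of two word sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical documents, each reduced to its set of words.
docs = {
    "d1": "enterprise search for private data stores".split(),
    "d2": "web search for public pages".split(),
}
query = "enterprise search".split()
scores = {name: jaccard(query, words) for name, words in docs.items()}
print(scores)
```

Here d1 shares two of its six words with the query and d2 only one, so d1 scores higher.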
Algebraic models represent documents and queries usually as vectors, matrices, or tuples. The similarity of the query vector and document vector is represented as a scalar value.
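A common choice for that scalar is the cosine of the angle between the two vectors. The sketch below assumes raw term-frequency vectors and whitespace tokenization; real systems usually weight terms (for example with TF-IDF). The function name `cosine` is hypothetical.

```python
import math
from collections import Counter

def cosine(query, document):
    """Cosine of the angle between two term-frequency vectors."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0

print(cosine("web search", "web search web"))
print(cosine("web search", "desktop files"))
```

The score is 1 for vectors pointing the same way and 0 when query and document share no terms.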
Probabilistic models treat the process of document retrieval as a probabilistic inference. Similarities are computed as probabilities that a document is relevant for a given query.
In fuzzy-set theory, an element has a varying degree of membership, say dA, to a given set A instead of the traditional membership choice (is an element/is not an element).
A statistical language model assigns a probability to a sequence of m words by means of a probability distribution.
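The simplest instance is a unigram model, which treats the words as independent and multiplies their individual probabilities. The sketch below assumes add-one (Laplace) smoothing so that unseen words keep a small non-zero probability; the function name `sequence_probability` and the tiny corpus are invented for the example.

```python
from collections import Counter

def sequence_probability(words, corpus, vocabulary_size):
    """P(w1..wm) under a unigram model with add-one (Laplace) smoothing."""
    counts = Counter(corpus)
    total = len(corpus)
    probability = 1.0
    for word in words:
        probability *= (counts[word] + 1) / (total + vocabulary_size)
    return probability

corpus = "search engines index the web and search the index".split()
vocab = len(set(corpus)) + 1  # +1 reserves one slot for unseen words (an assumption)
p_seen = sequence_probability(["search", "index"], corpus, vocab)
p_unseen = sequence_probability(["search", "banana"], corpus, vocab)
print(p_seen, p_unseen)
```

In retrieval, a model like this is built per document and documents are ranked by the probability their model assigns to the query.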
Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text.