Zemanta - Blog me up!

                                      Magdalena Jitca

                         Faculty of Computer Science, Iasi, Romania
                                magdalena.jitca@info.uaic.ro



       Abstract. There is a growing need to help authors publish their content
       online, but enriching that content and making it more readable,
       discoverable, and interconnected remain unsolved problems. Zemanta is a
       tool for real-time content understanding and recommendation, widely used
       by bloggers because its relevant suggestions make blogging more fun.
       According to the company's estimates, it takes only 30 seconds to publish
       an article with Zemanta's assistance.

       Keywords: Zemanta, content recommendation, social web software




    1. Introduction

Making your work known and visible to the world is usually a long-term process,
given the long-winded procedures behind it. It is not just a matter of content, but
of how you present it and relate it to earlier work in the field. Finding appropriate
references and definitions, as well as suggestive images and tags to describe your
article, is time-consuming. Besides, this usually requires skills and in-depth
knowledge of specifications and standards. What if there were an automatic way of
dealing with all these activities, leaving you more time for your research and the
creative process? One solution comes from Zemanta Ltd., which provides a free tool
for real-time content understanding and recommendation: while you are writing your
post, it performs a semantic analysis of the input text and returns related content,
pictures, links, and tags. It is then the author's job to choose which of the
recommended items to use. The result is not only an attractive, user-friendly
design, but also an efficient connection between your content and external data
that computers can understand.
   The target group consists of social web software users (e.g. blog networks,
blog farms), professional publishers, and people with programming skills. The rest
of this article focuses on bloggers and on the different types of interaction that
Zemanta offers them.
   Zemanta is available from its website [1] in several forms: as a browser
extension (for Firefox 2 and 3, Internet Explorer 7 and 8, Chrome, Safari, etc.), as
a server-side plug-in (for WordPress, Blogger, Joomla, etc.), and as an API for
developers.




    2. Basic principles and features

     1. Recommendation of related content
Zemanta searches its content pool for related articles, links, and images, and
proposes the most relevant ones to the author. They appear in a dedicated sidebar
that keeps updating while you edit your manuscript. The recommendations are more
accurate for entries longer than 300 words. It is then up to you to decide whether
the suggestions are appropriate and correct. Zemanta is English-only for now, but I
have obtained good results when writing in another language if the content involves
trademarked items or well-known unique entities (personalities, places, companies,
etc.). The comparison is presented in section 4.

     2. Large-scale knowledge database
Zemanta draws on more than 10,000 news sources, connecting 100 million content
objects. The articles it suggests come from hundreds of top media sources on the web,
as well as from the social networks and other blogs of Zemanta users. The suggested
images come from Wikimedia Commons, Flickr, and stock photo providers such as
Shutterstock and Fotolia. Zemanta, like many other applications, uses Wikipedia as a
kind of expert system. For example, if a page is linked to from a Wikipedia article,
that page can be assumed to be relevant to the article's topic. This kind of
approach can be used for many different tasks, all with the goal of making the web
and web services smarter.

     3. Auto-tagging
Tagging is not an easy task, but doing it right helps the web grow smarter by marking
up pages, posts, videos, images, and other objects available on the web. Zemanta can
automatically assign general category tags to the content, from which you choose the
ones you need. However, humans still seem to be better than computers at tagging,
which is why Zemanta lets its users customize the categorization. Besides making the
tagging fit your own needs, the choices you make are also used by Zemanta to refine
its recommendation results.

    4. Uploading Custom Content
A client can help improve the content pool by contributing his own experience and
previous work: he can upload RSS feeds, which are then indexed by Zemanta Enterprise
and included in the content pool. The only condition is that the content be either
original or appropriately licensed.

      5. Customizing the Recommendation Pool
It is also possible to select which content is included in the recommendations you
get from Zemanta (e.g. limiting links to your own network and trusted sources).
Because it is so well integrated into blogging platforms, it offers personalization
targeted at bloggers. They can define their own blog sources to draw suggestions
from and import their Twitter/Facebook/MyBlogLog contacts for automatic recognition
while writing.
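
The idea of limiting links to trusted sources can be illustrated with a small
client-side sketch. How Zemanta applies the restriction internally is not described
here, so the function name, the shape of a suggestion (a dictionary with "title" and
"url" keys), and the example hosts are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def filter_by_trusted_hosts(suggestions, trusted_hosts):
    """Keep suggestions whose URL host is a trusted host or a subdomain of one."""
    kept = []
    for item in suggestions:
        # hostname is None for malformed URLs; treat that as "not trusted".
        host = (urlparse(item["url"]).hostname or "").lower()
        if any(host == t or host.endswith("." + t) for t in trusted_hosts):
            kept.append(item)
    return kept


# Hypothetical suggestions; the field names are assumptions.
suggestions = [
    {"title": "Own network post", "url": "http://blog.example.org/a"},
    {"title": "Unknown source", "url": "http://elsewhere.example.net/b"},
]
trusted = {"example.org"}
filtered = filter_by_trusted_hosts(suggestions, trusted)
print([s["title"] for s in filtered])
```

Matching on the host name rather than on the full URL keeps the trusted list short:
one entry covers every page and subdomain of a trusted site.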

     6. Copyright filtering
Zemanta also pays close attention to copyright legislation, making sure that
suggested content is licensed under Creative Commons or approved by third parties, so
that users do not run into problems when using the service. Images are the main case
that requires attention, because tags, for example, are generally not regarded as
creative work and are therefore not protected by copyright.

     7. Re-blogging
Zemanta offers cross-platform quoting for blogs (re-blogging). It uses two techniques
to obtain the raw body of the post to be quoted: one relies on the HTTP referrer, the
other on a “request id” that is passed as part of the URL. For example, you can have
your finished article submitted to a blog of your choice (by supplying the username
and the password).


    3. Architectural details


3.1 Zemanta system architecture

From an architectural point of view, the Zemanta web service is a server that stores
the content received from the application and, when requested, sends back suggestions
of related content. This communication takes place over HTTP, with responses in the
standard JSON and XML formats. Authoring applications such as content management
systems then present the suggested content to the author, who selects the appropriate
information to merge into the manuscript. Fig. 1 [2] depicts the flow of the
manuscript authoring process and shows how the Zemanta server works. It is important
to note that the Zemanta service is no longer involved once the authored work has
been posted, for example when the content is being read by other users.
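
The request/response exchange described above can be sketched as follows. This is a
minimal illustration, not the documented Zemanta API: the parameter names ("method",
"api_key", "text", "format") and the shape of the sample reply are assumptions made
for the example, and the authoritative details are in the API companion [2]. No
network call is made; the sketch only builds a form-encoded request body and parses
a hand-written JSON reply.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_request_body(api_key, text, fmt="json"):
    """Encode the manuscript text as a form-encoded HTTP POST body."""
    params = {
        "method": "suggest",  # assumed name of the suggestion call
        "api_key": api_key,   # assumed authentication parameter
        "text": text,         # the manuscript being written
        "format": fmt,        # "json" or "xml"
    }
    return urlencode(params)


def parse_suggestions(raw_json):
    """Extract the four suggestion types from an (assumed) JSON reply."""
    reply = json.loads(raw_json)
    return {
        "articles": reply.get("articles", []),
        "images": reply.get("images", []),
        "links": reply.get("links", []),
        "tags": reply.get("tags", []),
    }


# Hand-written sample reply; a real reply will differ.
SAMPLE_REPLY = json.dumps({
    "articles": [{"title": "A related post", "url": "http://example.org/post"}],
    "images": [],
    "links": [{"anchor": "Iasi", "url": "http://example.org/Iasi"}],
    "tags": ["Romania"],
})

body = build_request_body("demo-key", "A short post about Iasi.")
suggestions = parse_suggestions(SAMPLE_REPLY)
print(body)
print(sorted(suggestions))
```

An authoring application would send the body with an HTTP POST and pass the reply to
its sidebar; the server plays no further part once the post is published.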




                  Fig. 1. The flow of the authoring process with Zemanta

   The entities involved in this process can be split into five categories, although
a single person may perform the actions corresponding to several entities. Their
roles and the way they interact are depicted in Fig. 2 [2] and described below.




                 Fig. 2. The distribution of the roles in the Zemanta system

• Author - the person who composes the content and improves it with Zemanta’s
  suggestions
• CMS creator - the person or organization developing content management
  software that integrates the services and experience offered by Zemanta
• Platform owner - the person or organization owning the hosting platform on
  which the CMS software runs
• Reader - a person or organization benefiting from the content
• Zemanta - the service provider during the content creation process

   An example where roles overlap is the case of applications developed on top of the
Zemanta API. If a programmer develops the software (the CMS), runs the application,
and then creates content with it, he plays three different roles. This is the case
of enthusiast developers experimenting with Zemanta, whose results can be seen and
tested in [3].
   Let us take a step back to understand how Zemanta’s content recommendation engine
works. Instead of running keyword-based queries (as traditional search engines do),
it analyzes the whole text with natural language processing techniques to build a
deeper understanding of the content. On this basis it identifies the concepts in the
text by connecting them to a semantic database (e.g. DBpedia, MusicBrainz) and
delivers the related suggestions. Section 4 presents an application based on
Zemanta, DBpedia, and Freebase that gives a view of the internal representation used
by the content recommendation engine. It is this combination of NLP, machine
learning, and fine-tuning that makes it work so well.
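
The difference between keyword matching and concept identification can be shown with
a toy sketch. It is not Zemanta's engine, which relies on NLP and machine learning;
it uses a hand-written dictionary as a stand-in for a semantic database, and the
DBpedia-style URIs and the function name are chosen for illustration only.

```python
import re

# Toy stand-in for a semantic database: surface form -> concept URI.
# Two different surface forms may point to the same concept.
CONCEPTS = {
    "new york city": "http://dbpedia.org/resource/New_York_City",
    "big apple": "http://dbpedia.org/resource/New_York_City",
    "flickr": "http://dbpedia.org/resource/Flickr",
}


def link_concepts(text):
    """Return the set of concept URIs whose surface forms occur in the text."""
    normalized = re.sub(r"\s+", " ", text.lower())
    found = set()
    for surface, uri in CONCEPTS.items():
        # Match whole words only, so a surface form inside a longer word is ignored.
        if re.search(r"\b" + re.escape(surface) + r"\b", normalized):
            found.add(uri)
    return found


post_a = "Photos of the Big Apple, uploaded to Flickr."
post_b = "New York City seen through Flickr."
print(link_concepts(post_a) == link_concepts(post_b))
```

The two posts share only one keyword, yet they resolve to the same set of concepts,
which is what lets a concept-based engine treat them as related.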


3.2 Suggestions in detail

As previously stated, Zemanta provides four types of content recommendations,
which are discussed in this section. They are all shown in Fig. 3, a screenshot
taken while composing a blog entry.


3.2.1 Images
Zemanta draws on multiple sources for image suggestions. Among the most widely used
are Wikipedia, Getty, and Flickr, but stock image providers are also a good choice,
as they provide images of higher aesthetic quality. Because Flickr images are
retrieved through the Flickr API, the search cannot take advantage of Zemanta's
internal concept representation. As a result, less topically accurate images may be
suggested even when better ones exist. Each image suggestion includes a
“description” attribute. This is a textual description of what the image represents,
but it may be inaccurate, especially when the image has been poorly or wrongly
tagged at the source. Each suggestion also includes an “attribution” attribute,
which gives the source and the author of the image when those are available.


3.2.2 Related articles
Zemanta automatically searches for related articles and gives the author full control
over their inclusion. It aggregates articles from many different internet sources,
such as major news outlets (e.g. BBC and CNN) and over 10,000 blogs. From customer
feedback, Zemanta has learned that many authors simply read the suggested articles
and use the information gained to write better content, instead of explicitly
linking their work to the suggested resources. Although this is not the intended use
of Zemanta, it has been accepted by the Zemanta community.

3.2.3 In-text links
In-text links are links inside the main body of text that lead the reader to
information about specific concepts and topics mentioned directly in the text. To
connect these concepts or topics to the input text, Zemanta uses knowledge databases
such as Wikipedia, IMDB, Rotten Tomatoes, Amazon book listings, and similar sources.
Links are not anchored to a specific position in the text, but to substrings of the
text. This is because the original text might change before the author decides to
apply a link, and it would be extremely hard for the authoring software to keep
position bookmarks up to date. That is why the “anchor” attribute specifies the
substring to which the link should be attached.
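
A minimal sketch of substring anchoring, assuming a suggestion carries an anchor
string and a target URL (the function name and the HTML output format are
illustrative choices, not part of the Zemanta API):

```python
from html import escape


def apply_link(text, anchor, url):
    """Wrap the first occurrence of `anchor` in `text` with an HTML link.

    The text is returned unchanged if the anchor no longer occurs in it.
    """
    start = text.find(anchor)
    if start == -1:
        return text
    end = start + len(anchor)
    link = '<a href="{}">{}</a>'.format(escape(url, quote=True), anchor)
    return text[:start] + link + text[end:]


draft = "Zemanta uses Wikipedia as a knowledge base."
# The author edits the draft after the suggestion was computed,
# which shifts every character position in the text.
edited = "For in-text links, " + draft
linked = apply_link(edited, "Wikipedia", "http://en.wikipedia.org/wiki/Wikipedia")
print(linked)
```

Because the link is located by searching for the substring, it still lands on the
right word after the edit, whereas a stored character offset would now point into
the wrong place.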

3.2.4 Tags
A tag is a relevant keyword or term associated with a specific piece of content.
Labeling by keywords has long been used in scientific publications, and many web
services have recently embraced tagging as a powerful way of describing information
with metadata. Even when lacking formal structure, tags can provide valuable
navigational enhancements and make the task of search engines easier. However, it is
still a problem when tagging is not done in a standardized way (as discussed in the
previous sections). Zemanta offers both tags based on words and phrases found in the
author's text and tags for topics that represent the content as a whole but are not
explicitly mentioned.
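
The two kinds of tags can be told apart with a simple check against the text. The
sketch below uses a hypothetical helper and made-up sample data, and only separates
tags that occur literally in a post from those that do not; it says nothing about
how Zemanta derives the tags in the first place.

```python
def split_tags(text, tags):
    """Separate tags that occur literally in the text from those that do not."""
    lowered = text.lower()
    explicit = [t for t in tags if t.lower() in lowered]
    implied = [t for t in tags if t.lower() not in lowered]
    return explicit, implied


post = "Zemanta suggests images from Flickr while you write."
suggested = ["Flickr", "Blogging", "Zemanta"]
explicit, implied = split_tags(post, suggested)
print(explicit, implied)
```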


    4. Test results

    I have used Zemanta on several platforms, such as WordPress for writing blog posts
and Google Mail for composing e-mails, and it has proven very useful. But the longer
I enjoyed the benefits of this tool, the more I wanted to know what is behind it.
That is when I discovered LinkedGalaxy, an application built on the Zemanta API,
DBpedia, and Freebase, which lets you visualize the input text as a graph. Its nodes
are the semantic entities corresponding to the concepts in the text. Figs. 4 and 5
show screenshots of the application's output for an English text [4].




                  Fig. 4. The semantic concepts graph of the English text




                       Fig. 5. The complete graph of the English text

I tried the same application on a text written in Romanian that discusses
world-known entities, and got a very poor output. The resulting graph for the
Romanian text can be seen in Fig. 6. The result was predictable, because I knew
that Zemanta does not support multiple languages for the moment. However, the
engine did manage to relate some of the words to concepts, which is a step
towards further improvements that could include internationalization.




Fig. 6. The semantic concepts graph of the Romanian text




References

1.   http://www.zemanta.com
2.   http://developer.zemanta.com/docs/Zemanta_API_companion
3.   http://developer.zemanta.com/showcase
4.   http://www.time.com/time/specials/packages/article/0,28804,1937994_1938235,00.html?cnn=yes

Mais conteúdo relacionado

Mais procurados

WEB 2.0
WEB 2.0WEB 2.0
WEB 2.0
ARJUN
 

Mais procurados (16)

IRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from ScratchIRJET - A Study on Building a Web based Chatbot from Scratch
IRJET - A Study on Building a Web based Chatbot from Scratch
 
Geliyoo Browser Beta
Geliyoo Browser BetaGeliyoo Browser Beta
Geliyoo Browser Beta
 
Introduction to Html
Introduction to HtmlIntroduction to Html
Introduction to Html
 
Access Management Technologies Update by Simon McLeish and John Paschoud
Access Management Technologies Update by Simon McLeish and John PaschoudAccess Management Technologies Update by Simon McLeish and John Paschoud
Access Management Technologies Update by Simon McLeish and John Paschoud
 
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaSemantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
 
Tech Terms
Tech TermsTech Terms
Tech Terms
 
Blog It Up, Baby! Extending the new IBM Lotus Domino Blog Template
Blog It Up, Baby! Extending the new IBM Lotus Domino Blog TemplateBlog It Up, Baby! Extending the new IBM Lotus Domino Blog Template
Blog It Up, Baby! Extending the new IBM Lotus Domino Blog Template
 
Web designing
Web designingWeb designing
Web designing
 
WEB 2.0
WEB 2.0WEB 2.0
WEB 2.0
 
Internet Tutorial 01
Internet Tutorial 01Internet Tutorial 01
Internet Tutorial 01
 
A customized web search engine [autosaved]
A customized web search engine [autosaved]A customized web search engine [autosaved]
A customized web search engine [autosaved]
 
Building Email Apps
Building Email AppsBuilding Email Apps
Building Email Apps
 
Bots, adaptive cards, task module, message extensions in microsoft teams
Bots, adaptive cards, task module, message extensions in microsoft teamsBots, adaptive cards, task module, message extensions in microsoft teams
Bots, adaptive cards, task module, message extensions in microsoft teams
 
Q&a
Q&aQ&a
Q&a
 
Analyzing bootsrap and foundation font-end frameworks : a comparative study
Analyzing bootsrap and foundation font-end frameworks : a comparative studyAnalyzing bootsrap and foundation font-end frameworks : a comparative study
Analyzing bootsrap and foundation font-end frameworks : a comparative study
 
Bcs 053 solved assignment 2014-15
Bcs 053 solved assignment 2014-15Bcs 053 solved assignment 2014-15
Bcs 053 solved assignment 2014-15
 

Semelhante a Zemanta

Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning using
IJwest
 
Nt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And AnswersNt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And Answers
Lisa Williams
 

Semelhante a Zemanta (20)

Zemanta: A Content Recommendation Engine
Zemanta: A Content Recommendation EngineZemanta: A Content Recommendation Engine
Zemanta: A Content Recommendation Engine
 
Oral recitations
Oral recitationsOral recitations
Oral recitations
 
Kbee Spaces Financial Services
Kbee Spaces Financial ServicesKbee Spaces Financial Services
Kbee Spaces Financial Services
 
Open source content management systems
Open source content management systemsOpen source content management systems
Open source content management systems
 
SharePoint Benefits
SharePoint BenefitsSharePoint Benefits
SharePoint Benefits
 
Chapter 1 Produce server side script for dynamic web page.pptx
Chapter 1  Produce server side script for dynamic web page.pptxChapter 1  Produce server side script for dynamic web page.pptx
Chapter 1 Produce server side script for dynamic web page.pptx
 
Anahita Social Engine - Vancouver Demo Camp Edition
Anahita Social Engine - Vancouver Demo Camp EditionAnahita Social Engine - Vancouver Demo Camp Edition
Anahita Social Engine - Vancouver Demo Camp Edition
 
Rdf Based User Interfaces
Rdf Based User InterfacesRdf Based User Interfaces
Rdf Based User Interfaces
 
Improve information retrieval and e learning using
Improve information retrieval and e learning usingImprove information retrieval and e learning using
Improve information retrieval and e learning using
 
Mozillamagazine
MozillamagazineMozillamagazine
Mozillamagazine
 
Ajax, rss, feeds, web service,
Ajax, rss, feeds, web service, Ajax, rss, feeds, web service,
Ajax, rss, feeds, web service,
 
Web 2.0 In The Enterprise
Web 2.0 In The EnterpriseWeb 2.0 In The Enterprise
Web 2.0 In The Enterprise
 
Web2.0 ppt
Web2.0 pptWeb2.0 ppt
Web2.0 ppt
 
Nt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And AnswersNt1310 Final Exam Questions And Answers
Nt1310 Final Exam Questions And Answers
 
Ecs knowledge share web center services
Ecs knowledge share   web center servicesEcs knowledge share   web center services
Ecs knowledge share web center services
 
DDive11 - Lotus Connections 3.0
DDive11 - Lotus Connections 3.0DDive11 - Lotus Connections 3.0
DDive11 - Lotus Connections 3.0
 
Web 2.0: new definition of web
Web 2.0: new definition of webWeb 2.0: new definition of web
Web 2.0: new definition of web
 
Client Building Functional webapps.
Client   Building Functional webapps.Client   Building Functional webapps.
Client Building Functional webapps.
 
Social Network sites
Social Network sitesSocial Network sites
Social Network sites
 
Social Network sites
Social Network sitesSocial Network sites
Social Network sites
 

Último

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Krashi Coaching
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Último (20)

Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 

Zemanta

  • 1. Zemanta - Blog me up! Magdalena Jitca Faculty of Computer Science, Iasi, Romania magdalena.jitca@info.uaic.ro Abstract. There is an augmenting need to help authors publish their content online, but there are still unsolved issues regarding the enrichment of the content and making it more readable, discoverable, and interconnected. Zemanta is a tool for content-understanding and recommendation in real time, widely used by bloggers because of its relevant suggestions which make blogging more fun. According to their estimations, it takes only 30 seconds to have an article published via Zemanta assistance. Keywords: Zemanta, content recommendation, social web software 1. Introduction Making your work known and visible to the world is most of the times a long-term process, if we consider the long-winded procedures behind it. It is not just a matter of content, but of how you present it and relate it to the past works in the field. Finding the appropriate references and definitions, as well as suggestive images and tags to describe your article is time-consuming. Besides, this usually requires skills and in- depth knowledge of specifications and standards. What if there would be an automatic way of dealing with all these activities, leaving you more time for your research and the creative process? The solution comes from Zemanta Ltd., which provides a free tool for content-understanding and recommendation in real time (while you are writing your post) by performing a semantic analysis of the input text and providing as output related content, pictures, links, and tags. From now on, it’s the author’s job to choose from the content recommended by the Zemanta engine the one needed. The result is not only an attractive, user-friendly design, but also an efficient relation between your content and some external data (that computers can understand). The target group is composed of social web software users (e.g. 
blog networks, blog farms), professional publishers and also persons with programming skills. This article will further on focus on bloggers and how Zemanta presents different types of interaction. Zemanta is available for downloading in several formats from their website [1] as browser extension (for Firefox 2&3, Internet Explorer 7&8, Chrome, Safari etc.), as server-side plug-in (for Wordpress, Blogger, Joomla etc.) and as API for developers.
  • 2. 2 Magdalena Jitca 2. Basic principles and features 1. Recommendation of related content Zemanta searches the content pool for related articles, links, and images, and proposes the most relevant ones to the author. They will show up in a special sidebar which constantly keeps updating itself while editing your manuscript. The recommendations are more accurate for entries longer than 300 words. It is up to you to decide afterwards if the suggestions are the expected ones and how correct they are. Zemanta is English-only for now, but I have obtained good results when writing in another language if the content written about involves trademarked items or well-known unique entities (personalities, places, companies etc). The comparison will be presented in section 4. 2. Large-scale knowledge database Zemanta disposes of more than 10,000 news sources, connecting 100 Million content objects. The articles it suggests come from hundreds of top media sources on the web, as well as from the social networks and other blogs of Zemanta users. The images suggested come from Wikimedia Commons, Flickr, and stock photo providers like Shutterstock and Fotolia. Zemanta, like many other applications, uses Wikipedia as a kind of expert system. For example, if a page is linked to from a Wiki page, it is for certain that the page is relevant to the topic of the Wikipedia page. That kind of approach can be used for many different tasks, all with the goal of making the web and web services smarter. 3. Auto-tagging Tagging is not an easy task, but doing it right helps the web grow smarter by marking up pages, posts, videos, images, and other objects available on the web. Zemanta can automatically tag the content into general categories among which you can choose the ones you need. However, it seems that humans are still more efficient than computers at tagging, that’s why Zemanta made it possible for its users to customize the categorization. 
Beside the benefits of making tagging suitable for your own needs, the choice you make will further be used by Zemanta for refining the recommendation results. 4. Uploading Custom Content A client can help improve the content pool by making use of his own experience and previous work. He can upload RSS feeds which will later be indexed by Zemanta Enterprise and thus be included in the content pool. The only condition that the content must fulfill is to be either original or licensed under copyright. 5. Customizing the Recommendation Pool It is also possible to select the content to be included in the recommendations you get from Zemanta (e.g. limiting links to own network and trusted sources). Because it is
  • 3. Zemanta - Blog me up! 3 so well integrated into the blogging platforms, it offers personalization targeted at bloggers. They can define their own blog sources to browse for and import their Twitter/Facebook/MyBlogLog contacts for automatic recognition while writing. 6. Copyright filtering Zemanta also pays close attention to the copyright legislation, making sure that suggested content is licensed as Creative Commons or approved by third parties, so the user won’t have any problem by using Zemanta's service. It is mostly the case of images you have to pay attention to, because tags, for example, are generally not regarded as creative work and therefore are not protected under copyright terms of service. 7. Re-blogging Zemanta offers a special feature of cross-platform quoting for blogs by means of different techniques to obtain the raw body of the post intended for quoting. One of them is via HTTP referrer and the second one is via a “request id” that is passed as part of the URL. For example, you can have your finished article submitted to a blog of your choice (by supplying the username and the password). 3. Architectural details 3.1 Zemanta system architecture From the architectural point of view, Zemanta web service is a server that stores the content received from the application and, when requested, sends suggestions of related content. This communication is based on a HTTP protocol and makes use of the standard JSON and XML response formats. Authoring applications such as content management systems then provide the suggested content to the author, so he can select the appropriate information to merge into his manuscript. Fig. 1 [2] depicts the flow of the manuscript authoring process and we can see clearly how Zemanta server works like. It is important to mention that the Zemanta service is no longer involved after posting the authored work, for example when the content is being read by other users.
Fig. 1. The flow of the authoring process with Zemanta

The entities involved in this process can be split into 5 categories, although a single person may perform the actions corresponding to several entities. The roles they play and the way they interact are depicted in Fig. 2 [2] and described below.

Fig. 2. The distribution of the roles in the Zemanta system

• Author - the person who composes the content and improves it with Zemanta’s suggestions
• CMS creator - the person or organization developing content management software that integrates the services and experience offered by Zemanta
• Platform owner - the person or organization owning the specific hosting platform on which the CMS software runs
• Reader - a person or organization benefiting from the content
• Zemanta - the service provider during the content creation process

An example where roles overlap is the case of applications developed on top of the Zemanta API. If a programmer works as a software developer (for the CMS), but also runs the application and then creates content with it, he is playing three different roles. This is the case of enthusiast developers experimenting with Zemanta, whose results can be seen and tested in [3].

Let us take a step back to understand how Zemanta’s content recommendation engine works. Instead of running keyword-based queries (as traditional search engines do), it analyzes the whole text by means of different natural language processing techniques in order to reach a deeper understanding of the content. On this basis it identifies the concepts in the text by connecting them to a semantic database (e.g. DBpedia, MusicBrainz) and delivers the suggested related results. Section 4 presents an interesting application based on Zemanta, DBpedia and Freebase, which gives a view of the internal representation used by the content recommendation engine. It is this unique combination of NLP, machine learning, and fine-tuning that makes it work so well.

3.2 Suggestions in detail

As previously stated, Zemanta provides four types of content recommendations, which are discussed in this section. They are all shown in Fig. 3, which is a screenshot taken while composing a blog entry.
3.2.1 Images

Zemanta draws on multiple sources for image suggestions. Among the most widely used are Wikipedia, Getty and Flickr, but stock image providers are also a good choice, as they provide images of higher aesthetic quality. Because Zemanta relies on the Flickr API for these images, it cannot take advantage of its internal concept representation. As a result, less topically accurate images may be suggested even when better ones exist.

Each image suggestion includes a “description” attribute. This is a textual description of what the image represents, but it may be inaccurate in some cases, especially when the image has been poorly or wrongly tagged. The image also includes an “attribution” feature, which describes the source and the author of the image when those are available.

3.2.2 Related articles

Zemanta searches for related articles automatically and leaves the author full control over their inclusion. It aggregates articles from many different internet sources, such as the major news sources (e.g. BBC and CNN) and over 10,000 blogs. From customer feedback, Zemanta has learned that many authors only read the suggested related articles themselves and use the information gained to write better content, instead of explicitly linking their work to the suggested resources. Although this is not the intended use of Zemanta, this use case has been accepted by the Zemanta community.

3.2.3 In-text links

In-text links are links inside the main body of text that lead the reader to information about very specific concepts and topics that are directly mentioned. To establish connections between these concepts or topics and the input text, Zemanta uses knowledge databases such as Wikipedia, IMDB, Rotten Tomatoes, Amazon book listings and other similar sources.
Links are not anchored to a specific location in the text, but to substrings of the text. This is done because the original text might change before the author decides to apply a link, and it would be extremely hard for the authoring software to keep track of such bookmarks. That is why the “anchor” attribute defines the substring to which the link should be attached.

3.2.4 Tags

A tag is a relevant keyword or term associated with a specific piece of content. Labeling by keywords has long been used in scientific publications, but recently many web services have embraced tagging as well, because it is a powerful way of describing information with metadata. Even without formal structure, tags can provide valuable navigational enhancements and make the task of search engines easier. However, it remains a problem when tagging is not done in a standardized way (as discussed in the previous sections). Zemanta offers both tags based on words and phrases found in the author's text and tags for topics that represent the content as a whole but are not explicitly mentioned.
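Returning to the substring-anchored links of section 3.2.3, the following is a minimal sketch of how an authoring application might apply a suggested link. The function name `apply_link` and the choice to link only the first occurrence of the anchor are assumptions made for illustration; the source does not describe how clients actually resolve anchors.

```python
import html


def apply_link(body, anchor, url):
    """Wrap the first occurrence of `anchor` in `body` with an HTML link.

    The body is returned unchanged when the anchor text is no longer
    present, e.g. because the author edited that passage after the
    suggestion was delivered.
    """
    position = body.find(anchor)
    if position == -1:
        return body
    end = position + len(anchor)
    # Escape the URL so that it is safe inside a quoted HTML attribute.
    link = '<a href="{}">{}</a>'.format(html.escape(url, quote=True), anchor)
    return body[:position] + link + body[end:]


draft = "Zemanta links concepts to Wikipedia."
linked = apply_link(draft, "Wikipedia", "http://en.wikipedia.org/wiki/Wikipedia")
unchanged = apply_link(draft, "DBpedia", "http://dbpedia.org")
```

Because the lookup is done by substring at the moment the link is applied, the author can keep editing the surrounding text without invalidating the suggestion. The sketch treats the body as plain text and does not check whether the anchor already sits inside an existing tag.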
    4. Test results

I have used Zemanta on several platforms, such as WordPress for writing blogs and Google Mail for composing e-mails, and it has proven very useful. But the longer I enjoyed the benefits of this tool, the more I wanted to know what is behind it. That is when I discovered LinkedGalaxy, an application built upon the Zemanta API, DBpedia and Freebase, which allows you to visualize the input text as a graph. Its nodes are the semantic entities corresponding to the concepts in the text. Fig. 4 and 5 are screenshots of the application's output for an English text [4].

Fig. 4. The semantic concepts graph of the English text

Fig. 5. The complete graph of the English text

I tried the same application on a text written in Romanian that discusses world-known entities, and got a very poor output. The resulting graph for the Romanian text can be seen in Fig. 6. The result was predictable, because I knew that Zemanta does not support multilingualism for the moment. Nevertheless, the engine did manage to relate some of the words to concepts, which is a step towards further improvements that could include internationalization.

Fig. 6. The semantic concepts graph of the Romanian text

    References

1. http://www.zemanta.com
2. http://developer.zemanta.com/docs/Zemanta_API_companion
3. http://developer.zemanta.com/showcase
4. http://www.time.com/time/specials/packages/article/0,28804,1937994_1938235,00.html?cnn=yes