Presentation about Newstin given at the WebExpo 2008 conference, on real-time categorization of text content on the web.
More at http://2008.webexpo.cz/prednaska/kategorizace-weboveho-obsahu-v-realnem-case/
WebExpo 2008 Newstin
1. Newstin Real-time Web
Content Categorization
Presentation to WebExpo 2008
October 18, 2008
2. Company Background
Newstin a.s. founded in 1998 as I2S in Prague
Team of 30 employees
26 engineers
14 nations
Since 2005
Real-time semantic content
categorization
Multiple patent filings
on cross-language solution
Past activities
Business & government projects in
information management and security
Partnership with Business Objects/SAP
RedHerring Europe 100 Winner Award
3. What is Newstin?
Patented technology
Largest news database, catalog of news in the world
150,000+ information sources in 11 languages
250,000+ articles daily fully processed into 1,000,000+ categories
US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese,
Brazilian, Czech, Russian, Arabic, Chinese
Japanese, Korean, Turkish coming in Q4 2008
Newstin.com
Popular user applications
Business Intelligence
Enterprise content organization
4. What is Newstin? (Details)
Newstin is an innovative technology that takes a completely new approach to content
organization. Newstin technology and its service-oriented architecture are the foundation of a unique
system that features fully scalable, real-time semantic, multi-language and cross-language document
categorization. Newstin's patented technology has the potential to become the core platform for
organizing any unstructured textual data, including data from all sources on the Internet and potentially
including the hidden Web.
Newstin is a powerful engine that harnesses a variety of cutting-edge technologies and implements
linguistic processing with semantic analysis, multilevel content categorization and cross-language
taxonomy structures. Applications of Newstin technology make use of context in addition to
conventional keyword approaches.
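The idea of a cross-language taxonomy can be sketched as follows. This is a minimal illustration only, not Newstin's patented method: it assumes that each language has its own lexicon mapping terms to shared, language-independent category identifiers, so that documents in different languages land in the same category without any translation step. The `LEXICONS` table, the category names and the `categorize` function are all hypothetical, and the sketch uses plain keyword lookup without the context analysis described above.

```python
from collections import Counter

# Hypothetical per-language lexicons. Every term maps to a
# language-independent category ID shared by all languages.
LEXICONS = {
    "en": {
        "election": "politics/elections",
        "vote": "politics/elections",
        "bank": "business/banking",
    },
    "cs": {
        "volby": "politics/elections",
        "hlas": "politics/elections",
        "banka": "business/banking",
    },
}


def categorize(text, lang):
    """Return the category IDs found in `text`, most frequent first."""
    lexicon = LEXICONS[lang]
    counts = Counter()
    for token in text.lower().split():
        token = token.strip(".,;:!?")  # drop surrounding punctuation
        category = lexicon.get(token)
        if category is not None:
            counts[category] += 1
    return [category for category, _ in counts.most_common()]


if __name__ == "__main__":
    print(categorize("The election vote was close.", "en"))
    print(categorize("Volby a banka.", "cs"))
```

Because both lexicons point at the same category IDs, an English and a Czech article about elections end up linked through `politics/elections` even though neither text was translated.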
Newstin is the largest news database/catalogue in the world, currently comprising 40 million documents
and 2.2 billion metadata items, and constantly growing. The Newstin article collection is continuously updated
from over 160,000 global, weighted sources selected from a pool of over 3 million preprocessed
sources in 12 languages. Each day, 200,000+ articles are fully processed into 1.1 million categories in 15
supported editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian,
Czech, Russian, Arabic, Chinese and Korean; with more languages and editions coming soon.
Newstin is a complex system incorporating content retrieval, metadata processing, analysis and
visualization. The extensive operation behind Newstin makes it a strong platform for SaaS solutions.
Newstin also works in two directions: by imposing order on unstructured data, it builds an extensive
metadata collection that it then leverages for business intelligence and enterprise performance
management. Content must be organized first to maximize knowledge-mining capability.
5. Web Content Chaos
An inspiration for Newstin to develop a solution for organizing web content
6. Semantic Web 2.0 Organization
A portion of Newstin’s taxonomy structure – a step toward organizing web content
16. Presentation Summary - CZ
Main topic: Real-time categorization of web content
Newstin a.s. is a Czech technology company based in Prague,
employing 30 engineers from 15 countries. In 3.5 years it has built
a unique technology for real-time organization of text documents
using semantic and linguistic technologies. The key, patented
component of Newstin technology is its so-called cross-lingual
solution, which links Internet content in different languages without
using translation.
Newstin has created the largest up-to-date database of online news
articles in 11 world languages, including Czech, containing 37
million articles from the past 9 months and 2 billion metadata items.
Newstin servers currently process 250,000 unique articles per day
from the 160,000 most important sources worldwide.
Further uses of Newstin technology lie in media analysis and
the organization of enterprise data.
17. Real-time Web Content Categorization
Thank you.
Julius Rusnak
CTO
Newstin a.s.
Lomnickeho 9
140 00 Prague
Czech Republic