This seminar covers trend detection in numbers and text, and the visualization of trends. The second part focuses on custom search applications: Apache Solr, semantic search and the linked data approach.
Trend Detection and Visualization and Custom Search Applications
1. Trend Detection and Visualization and Custom Search Applications
Seminar for
PG PUSHPIN
Pranav Kadam (6641525)
Universität Paderborn
January 12, 2012
2. Overview
• Trend Detection
Trend Detection in Numbers
Trend Detection in Text
Trend Visualization
• Custom Search Applications
Apache Solr
Semantic Search
Linked Data Approach
3. Overview
• Prototypes
• Q&A
5. Trend Detection
What is a trend?
• A general direction in which something is changing
• An inclination
• A pattern of gradual change in a condition over time
• A trend is
always associated with time
often described using a ‘time series’
• A long-term change in the mean level of a ‘time series’
6. Trend Detection
Trend Analysis
• Practice of collecting information and trying to detect
trends in it
• Process of identifying patterns in the behavior of a time
series by minimising noise
• Useful in forecasting future events
• Science of studying changes in social patterns
E.g. Google Trends, YouTube Trends, trendwatching.com,
Facebook Insights, Tag Cloud (on the PG PUSHPIN blog)
7. Trend Detection
Trend Detection in Numbers
8. Trend Detection in Numbers
Time series and statistical methods
• Time series: ordered sequence of values at equally
spaced time intervals
• Trend detection in numbers: statistical methods to
interpret a time series and determine its behavior
• Assumption: patterns in past data can be used to forecast
future data points
• Models: AutoRegressive (AR), Integrated (I), Moving
Average (MA)
9. Trend Detection in Numbers
Moving Average
• Average of time series data taken over consecutive periods
• New data in, old data out as the series progresses
E.g. six-month MA of temperature: average of January
to June, then February to July, then March to August, and so on.
• Minimizes short-term fluctuations
• Establishes the trend and makes values above or
below the trend line stand out
• Applications in financial analysis, trade,
economics and mathematics
10. Trend Detection in Numbers
Moving Average
• Simple Moving Average (SMA): plain average of the data points
over a specific number of periods
• The period can be short, medium or long according
to interest (e.g. common SMA periods in stock
market analysis are 50 days and 200 days)
• A longer period gives a smoother curve but increases
the lag
• SMA always lags behind the latest data point
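A minimal Python sketch of the simple moving average described above. The function name `simple_moving_average` and the temperature values are illustrative assumptions, not taken from the seminar.

```python
def simple_moving_average(values, period):
    """Return one plain average per full window of `period` consecutive points."""
    if period <= 0:
        raise ValueError("period must be positive")
    averages = []
    # Slide the window one step at a time: new data in, old data out.
    for end in range(period, len(values) + 1):
        window = values[end - period:end]
        averages.append(sum(window) / period)
    return averages


# Made-up monthly temperatures, January to August.
monthly_temps = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]

# Six-month SMA: January to June, February to July, March to August.
sma6 = simple_moving_average(monthly_temps, 6)
print(sma6)
```

The result has fewer points than the input because a window needs `period` values before it can produce its first average, which is one way to see the lag.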
11. Trend Detection in Numbers
Moving Average
• Exponential Moving Average (EMA): weights applied to the data
points, favoring recent ones, to reduce the lag
• Weights decrease exponentially with age and never reach zero
• EMA has less lag and is more sensitive to changes in
the data points
• SMA vs EMA: although the difference is visible, neither
can be called better than the other
The choice of MA depends on objectives and time horizon
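A minimal Python sketch of an exponential moving average. The smoothing factor `alpha = 2 / (period + 1)` and seeding the series with the first value are common conventions assumed here; the seminar does not specify either.

```python
def exponential_moving_average(values, period):
    """Return the EMA series, seeded with the first data point (an assumption)."""
    if period <= 0:
        raise ValueError("period must be positive")
    if not values:
        return []
    alpha = 2 / (period + 1)  # assumed convention for the smoothing factor
    ema = [values[0]]
    for x in values[1:]:
        # The newest point gets weight alpha; older points decay by (1 - alpha).
        ema.append(alpha * x + (1 - alpha) * ema[-1])
    return ema


# Made-up series with a sudden jump at the end.
series = [10, 10, 10, 20]
ema3 = exponential_moving_average(series, 3)
print(ema3)
```

With a period of 3 the last EMA value moves halfway towards the jump, while a 3-point plain average of the same data would move only a third of the way.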
12. Trend Detection
Trend Detection in Text
13. Trend Detection in Text
Trend detection system
• Emerging Trend: Topic area growing in interest and
utility over time
• The study of emerging trends depends on automated
processing
• A TD system processes a collection of textual data and
identifies an upward (growing), downward (falling) or
sideways (constant) tendency
• The TD system then highlights the emerging topics in the trial period
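One way to label a tendency as upward, downward or sideways is to fit a straight line to per-period topic counts and look at its slope. This is a swapped-in illustration, not the method of any particular TD system; the function name `classify_tendency` and the `tolerance` value are assumptions.

```python
def classify_tendency(counts, tolerance=0.1):
    """Label a series of per-period counts by the slope of a least-squares line."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "upward"
    if slope < -tolerance:
        return "downward"
    return "sideways"


# Made-up yearly document counts for three topics.
print(classify_tendency([1, 2, 3, 4]))
print(classify_tendency([4, 3, 2, 1]))
print(classify_tendency([5, 5, 5, 5]))
```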
14. Trend Detection in Text
Trend detection system
• Trend detection methods can be classified as:
Fully-automatic
Semi-automatic
• Fully-automatic systems:
Generate a list of emerging topics from the
input (a collection of textual data)
A reviewer examines the data and evidence provided to
determine the actual emerging trends
Results are supported with graphical visualization
15. Trend Detection in Text
Trend detection system
• Semi-automatic:
User inputs a topic
System outputs evidence that helps to determine whether
the topic is emerging or not
Evidence provided either as a summary or a descriptive
report
16. Trend Detection in Text
Useful models, schemes and tools
• Term-Document Matrix
• Scheme: Term Frequency – Inverse Document
Frequency (tf-idf)
• Latent Semantic Analysis
• Science Citation Index or Web of Science database
• Inspec, Compendex database
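A minimal Python sketch of a term-document matrix with tf-idf weighting. It assumes whitespace tokenization and the variant `tf * log(N / df)`, which is one of several common tf-idf formulas; the three sample documents are made up.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Map each term to its list of raw counts, one entry per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocabulary}


def tf_idf(matrix, n_docs):
    """Weight raw counts by log(N / df), where df is the number of documents with the term."""
    weights = {}
    for term, row in matrix.items():
        df = sum(1 for count in row if count > 0)
        idf = math.log(n_docs / df)
        weights[term] = [count * idf for count in row]
    return weights


docs = ["solr search engine", "semantic search", "trend detection"]
matrix = term_document_matrix(docs)
weights = tf_idf(matrix, len(docs))
print(matrix["search"], weights["search"])
```

A term that occurs in every document gets weight zero under this variant, so only terms that discriminate between documents stand out.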
17. Trend Detection in Text
Approaches for Trend Detection
1. Tracing a trend via citation linkages:
Determine a potential trend or select a topic of interest
Find recent documents on the topic
Examine whether they really discuss the topic
Extract keywords
Fetch the abstracts of documents that are frequently
referenced, using citation information
Examine the abstracts to verify their relation to the topic
18. Trend Detection in Text
Approaches for Trend Detection
1. Tracing a trend via citation linkages:
Examine the references used above and make a subset
where author names are referenced in more than, say, 3
documents
As an improvement, query the repositories of citation
linkage information and other sources
Graph document frequency, repeated authors and number of
venues by year
19. Trend Detection in Text
Approaches for Trend Detection
1. Tracing a trend via citation linkages:
Years with an overall higher document frequency are likely
to contain points where a trend is emerging
Finally, to determine a trend, apply a series of thresholds
such as at least one repeated author, at least 10 venues
present, etc.
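The yearly statistics and the final threshold step can be sketched in Python as follows. The record layout (`year`, `authors`, `venue`), the function names, and counting repeated authors per year are assumptions made for illustration; the sample records are made up.

```python
from collections import Counter, defaultdict


def yearly_statistics(records, min_docs_per_author=2):
    """Per year: document frequency, repeated authors and number of distinct venues."""
    by_year = defaultdict(list)
    for record in records:
        by_year[record["year"]].append(record)
    stats = {}
    for year, recs in sorted(by_year.items()):
        author_counts = Counter(a for rec in recs for a in set(rec["authors"]))
        repeated = sorted(a for a, n in author_counts.items() if n >= min_docs_per_author)
        stats[year] = {
            "documents": len(recs),
            "repeated_authors": repeated,
            "venues": len({rec["venue"] for rec in recs}),
        }
    return stats


def passes_thresholds(year_stats, min_repeated_authors=1, min_venues=10):
    """Apply the thresholds named on the slide; both defaults can be tuned."""
    return (len(year_stats["repeated_authors"]) >= min_repeated_authors
            and year_stats["venues"] >= min_venues)


# Hypothetical citation records.
records = [
    {"year": 2010, "authors": ["A", "B"], "venue": "V1"},
    {"year": 2010, "authors": ["A"], "venue": "V2"},
    {"year": 2011, "authors": ["C"], "venue": "V1"},
]
stats = yearly_statistics(records)
print(stats)
# The tiny sample has few venues, so the venue threshold is lowered here.
print(passes_thresholds(stats[2010], min_venues=2))
```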
20. Trend Detection in Text
Approaches for Trend Detection
2. Using web resources:
Select a main topic area first
Knowledge in this area is essential to identify trends in
later stages
Validate it as a possible research area using sources like
the Inspec database
Search workshop websites and technical papers for
discussions on the main topic area
21. Trend Detection in Text
Approaches for Trend Detection
2. Using web resources:
Search the web using helper terms like
most recent contribution, hot topic, cutting edge strategy, etc.
Search an indexing database again with
main topic ‘AND’ newly found candidate trend
from the year of origin to the current year
22. Trend Detection in Text
Approaches for Trend Detection
2. Using web resources:
If document frequency increases over the years, the
candidate trend is a genuine trend
x If documents from the same author appear in different years,
it is not a trend
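The year-by-year check of a candidate trend can be sketched in Python as below. The `count_hits` callable is hypothetical and stands in for a lookup against an indexing database; the query syntax and the hit counts are made up. The same-author rule is not covered by this sketch.

```python
def candidate_is_trend(main_topic, candidate, first_year, last_year, count_hits):
    """Query 'main topic AND candidate' per year and test for rising document frequency."""
    query = f'"{main_topic}" AND "{candidate}"'
    counts = [count_hits(query, year) for year in range(first_year, last_year + 1)]
    never_drops = all(later >= earlier for earlier, later in zip(counts, counts[1:]))
    rising = len(counts) >= 2 and never_drops and counts[-1] > counts[0]
    return rising, counts


# Hypothetical hit counts standing in for a real indexing database.
fake_hits = {2009: 3, 2010: 7, 2011: 12}


def count_hits(query, year):
    return fake_hits[year]


result = candidate_is_trend("semantic search", "linked data", 2009, 2011, count_hits)
print(result)
```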
23. Trend Detection
Trend Visualization
24. Trend Visualization
Trend visualization techniques
• Trends can be visualized using
Line graphs
Bar graphs
Word clouds
Frequency tables
Sparklines
Histograms
25. Trend Visualization
Other ways to visualize trends
• ThemeRiver
Visualizes thematic variations over time
Changing widths depict changes in thematic strength of
the associated documents
Flow represents time
Colors represent themes
Vertical section represents an ordered time slice
26. Trend Visualization
Other ways to visualize trends
• ThemeRiver
27. Trend Visualization
Other ways to visualize trends
• ThemeRiver
Assigning the same color group to related themes simplifies
tracking them
28. Trend Visualization
Other ways to visualize trends
• SparkClouds
SparkClouds = Sparklines + Tag Clouds
Sparklines, characterized by small size and high data density,
visualize trends and variations in a simple, condensed way
29. Trend Visualization
Other ways to visualize trends
• SparkClouds
Tag clouds are text-based visualizations showing the
frequency, popularity or importance of words
30. Trend Visualization
Other ways to visualize trends
• SparkClouds
Sparklines are added to tag clouds to represent trends across
a series of tag clouds
An overview of trends is provided in limited space
It is compact and aesthetic
32. Custom Search Application
Apache Solr
• Open source search platform from the Apache Lucene
project
• Provides full text search, faceted search, dynamic
clustering, database integration, rich document handling,
geo-spatial search
• High scalability, distributed search
• The core of the search and navigation engines of some of the
world's largest internet sites
33. Custom Search Application
Apache Solr
• Written in Java, runs as a standalone search server
within a servlet container like Jetty or Tomcat
• REST-like API eases its use from any programming language
• Indexing: documents sent as XML, JSON or binary over HTTP
• Querying: via HTTP GET, with results as XML, JSON or binary
• Highly customizable
34. Custom Search Application
Apache Solr
• Operations:
Indexing data
Updating data
Deleting data
Querying data
Sorting
Highlighting
Faceted search
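Querying, sorting, highlighting and faceting are all driven by request parameters on Solr's select handler. The Python sketch below only builds such a query URL and sends nothing. The host, the core name `pushpin` and the field names `title`, `date` and `author` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, query, sort=None, highlight_field=None,
                     facet_field=None, rows=10):
    """Build a GET URL for Solr's /select handler from common query parameters."""
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    if sort:
        params.append(("sort", sort))
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and fields; adjust to the actual Solr schema.
url = build_select_url(
    "http://localhost:8983/solr/pushpin",
    "title:trend",
    sort="date desc",
    highlight_field="title",
    facet_field="author",
)
print(url)

# Decode the query string again to inspect what would be sent.
parsed = parse_qs(urlsplit(url).query)
print(parsed)
```

Building the URL separately from sending it keeps the sketch independent of any HTTP client library.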
35. Custom Search Application
Semantic Web
• An extension to current Web
• Information is given well-defined meaning
• Goes beyond media objects to link people, places, events,
organizations, etc.
• Resources connected by multiple relations
• Data modeled using directed labeled graph
• Based on W3C's RDF; instance data in RDF is queried and
exchanged using SOAP
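A directed labeled graph can be held as a list of (subject, predicate, object) triples, where each triple is one labeled edge. The Python sketch below uses plain strings rather than URIs, and the helper `match` is an illustrative name, not part of any RDF library.

```python
# Each triple is one directed, labeled edge: subject --predicate--> object.
triples = [
    ("San Francisco", "located in", "USA"),
    ("San Francisco", "type", "City"),
    ("Steve Jobs", "birth place", "San Francisco"),
    ("Steve Jobs", "type", "Businessman"),
    ("Apple Inc.", "type", "Company"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in graph
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything known about one resource.
print(match(triples, subject="Steve Jobs"))

# All edges with a given label.
print(match(triples, predicate="type"))
```

Because one resource can appear in many triples, the same node can be connected to others by multiple relations.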
36. Custom Search Application
Semantic Web
[Figure: example directed labeled graph. Nodes: San Francisco, USA, City, 9°C, Steve Jobs, Businessman, Apple Inc., Pixar, Company, February 24, 1955, October 5, 2011. Edge labels: temp, located in, type, birth place, died on.]
37. Custom Search Application
Semantic Search
• Context-based search results
• Can possibly enhance, but cannot replace the traditional
navigational search
• Disambiguation
• Data divided into ontological data and instance data
• Determines the meaning of every word and establishes a
context between them to make a sentence
coherent
39. Custom Search Application
Linked Data Approach
• Linked data: method of publishing structured data that
can be interlinked
• Based on HTTP and URIs, extended to be read by
computers
• Components:
URIs
HTTP
RDF
Serialization formats (RDFa, RDF/XML, N3)
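To make the components concrete, the Python sketch below names things with HTTP URIs and writes RDF statements in the simple one-statement-per-line form (N-Triples, a syntax that N3 also accepts). The `http://example.org/` namespace and the property names are hypothetical, and the literal escaping handles only backslashes and double quotes.

```python
def iri(local_name, base="http://example.org/"):
    """Wrap a name in a hypothetical HTTP namespace as an IRI reference."""
    return f"<{base}{local_name}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes only."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


statements = [
    (iri("SteveJobs"), iri("birthPlace"), iri("SanFrancisco")),
    (iri("SanFrancisco"), iri("name"), literal("San Francisco")),
]
serialized = to_ntriples(statements)
print(serialized)
```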
40. Custom Search Application
Linked Data Approach
• KiWi – a Linked Media Framework
• Easy-to-set-up server application bundling Semantic Web
technologies
• Consists of LMF core and LMF modules
41. Custom Search Application
Linked Data Approach
• KiWi LMF core:
Use URIs as names for things.
Use HTTP URIs, so that people can look up those names.
When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL).
Include links to other URIs, so that they can discover more
things.
42. Custom Search Application
Linked Data Approach
• KiWi LMF modules:
LMF Semantic Search (highly configurable semantic search
service based on Apache Solr)
LMF Linked Data Cache (implements a cache to the Linked
Data Cloud)
LMF Reasoner (implements a rule-based reasoner that
processes Datalog-style rules over RDF triples)
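To illustrate what a Datalog-style rule over triples does, here is a minimal forward-chaining sketch in Python. It is not the LMF Reasoner and does not use its rule syntax; the rule encoding (a list of body patterns plus one head pattern, with variables written as `?name`) and the sample facts are assumptions.

```python
def is_variable(term):
    return term.startswith("?")


def unify(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None on a clash."""
    bindings = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_variable(p):
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return bindings


def match_body(body, facts, bindings):
    """Yield every variable binding that satisfies all patterns in the rule body."""
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = unify(body[0], fact, bindings)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new triple can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialize matches first so the fact set is not mutated mid-iteration.
            for bindings in list(match_body(body, facts, {})):
                derived = tuple(bindings.get(term, term) for term in head)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


# Rule: "located in" is transitive.
rules = [(
    [("?a", "located in", "?b"), ("?b", "located in", "?c")],
    ("?a", "located in", "?c"),
)]
facts = [
    ("Paderborn", "located in", "Germany"),
    ("Germany", "located in", "Europe"),
]
inferred = forward_chain(facts, rules)
print(sorted(inferred))
```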