
Time Based Cluster Analysis for Automatic Blog Generation

Presented at the Social Web Search and Mining Workshop, WWW2008 in Beijing


  1. Time Based Context Cluster Analysis for Automatic Blog Generation
     Luca Costabello and Laurent-Walter Goix, Telecom Italia, Italy
  2. Context as Blog Content
     • User context is gaining importance
       • Location info
       • Nearby buddies
       • The surrounding environment in general
     • We mine context data to detect daily user actions
     • User actions are converted into natural text
     • Blog posts describing the user's days enable the detection of a community of users with similar behavioral patterns
  3. Context-Based Blog Generation
     1) Raw data gathering (daily actions)
     2) Offline cluster analysis
     3) Blog post generation
  4. System Architecture
  5. Cluster Analysis: Detecting User Actions

     Timestamp            Cell ID               Cell ID Virtual Place
     2007-10-03 11:02:33  222-1-61101-72162201  office,tilab
     2007-10-03 10:59:09  222-1-61101-72162201  office,tilab
     2007-10-03 10:55:46  222-1-61101-72162201  office,tilab
     2007-10-03 10:52:41  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:48:59  222-1-61101-72162201  office,tilab
     2007-10-03 10:45:34  222-1-61101-72162201  office,tilab
     2007-10-03 10:42:11  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:38:47  222-1-61101-72162201  office,tilab
     2007-10-03 10:37:47  222-1-61101-72162201  office,tilab
     2007-10-03 09:27:01  222-1-61101-72157899  office,tilab
     2007-10-03 08:58:11  222-1-61104-72386176  n/a,n/a
     2007-10-03 08:56:28  222-1-24650-121       n/a,n/a
     2007-10-03 08:56:05  222-1-24650-122       n/a,n/a
     2007-10-03 08:54:20  222-1-54650-923       n/a,n/a
     2007-10-03 08:51:31  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:49:16  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:48:47  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:48:18  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:47:50  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:47:21  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:46:51  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:46:20  222-1-61104-72376116  n/a,n/a
     2007-10-03 08:45:15  222-1-61104-72395763  n/a,n/a
     2007-10-03 08:44:02  222-1-61104-72400263  n/a,n/a
     2007-10-03 08:42:33  222-1-61104-72395770  n/a,n/a
     2007-10-03 08:42:02  222-1-61104-72400262  n/a,n/a
     2007-10-03 08:40:08  222-1-24650-1281      residence,home
     2007-10-03 08:36:26  222-1-24650-1281      residence,home
     2007-10-03 08:33:02  222-1-24650-1281      residence,home

     Cluster 1 (Static)
       Start: 08:58
       End: 11:02
       CGI: 222-1-61101-162201
       VP CGI: Office, TILab
       VP Bth: Not available

     Cluster 2 (Movement)
       Start: 08:42
       End: 08:56
       CGI From: 222-1-24550-1281
       CGI To: 222-1-24650-121
       VP CGI From: Residence,home
       VP CGI To: Office, TILab
       VP Bth: Not available
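     The context history on slide 5 lists one record per line: a timestamp, a cell identifier, and a user-defined place label (or "n/a,n/a" when the cell is unlabeled). A minimal sketch of loading such records, assuming whitespace-separated fields as they appear in the transcript; the names `ContextRecord`, `parse_record`, and `parse_history` are hypothetical and not taken from the slides:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ContextRecord:
    timestamp: datetime
    cell_id: str
    label: Optional[str]  # None when the cell has no user-defined label


def parse_record(line: str) -> ContextRecord:
    # Assumed layout: "<date> <time> <cell id> <place label>"
    date, time, cell_id, place = line.split()
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    label = None if place == "n/a,n/a" else place
    return ContextRecord(timestamp, cell_id, label)


def parse_history(text: str) -> List[ContextRecord]:
    # The slide lists the newest record first; sort into chronological order.
    records = [parse_record(line) for line in text.strip().splitlines() if line.strip()]
    return sorted(records, key=lambda record: record.timestamp)
```

     Sorting up front matters because the clustering step has to respect the chronological order of actions.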
  6. Clustering Algorithms Dimensions
     • Location
       • GSM/UMTS Cell IDs
       • User-defined Cell ID labels
     • Time
       • Chronological order of actions must be respected
     Categorical attributes: Euclidean distance not available
     Time must be evaluated according to “temporal distance”
     Ad-hoc algorithms had to be designed
  7. Cell-Based Location Data Issues
     • Context updates occur with variable frequency
     • Detecting static situations VS detecting movement
     • Base station concentration affects context data patterns
     • Frequent cell handovers during static actions
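     Slide 6 notes that location is categorical, so two records can only be compared for equality, while time is compared through a "temporal distance". The slides do not define either measure; the sketch below assumes the simplest reading (absolute difference in seconds, and label equality falling back to cell ID equality). Both function names are hypothetical:

```python
from datetime import datetime
from typing import Optional


def temporal_distance(a: datetime, b: datetime) -> float:
    # Assumption: temporal distance is the absolute gap in seconds.
    return abs((b - a).total_seconds())


def same_place(cell_a: str, label_a: Optional[str],
               cell_b: str, label_b: Optional[str]) -> bool:
    # Categorical comparison: there is no notion of "how far apart" two cells are.
    # Assumption: when both records carry a user-defined label, the label decides;
    # otherwise the raw cell IDs must match.
    if label_a is not None and label_b is not None:
        return label_a == label_b
    return cell_a == cell_b
```

     With the records from slide 5, the two office cells 222-1-61101-72162201 and 222-1-61101-72157899 count as the same place because both are labeled "office,tilab".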
  8. Compare&Merge Algorithm

     Timestamp            Cell ID               Cell ID Virtual Place
     2007-10-03 11:02:33  222-1-61101-72162201  office,tilab
     2007-10-03 10:59:09  222-1-61101-72162201  office,tilab
     2007-10-03 10:55:46  222-1-61101-72162201  office,tilab
     2007-10-03 10:52:41  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:48:59  222-1-61101-72162201  office,tilab
     2007-10-03 10:45:34  222-1-61101-72162201  office,tilab
     2007-10-03 10:42:11  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:38:47  222-1-61101-72162201  office,tilab
     2007-10-03 10:37:47  222-1-61101-72162201  office,tilab
     2007-10-03 09:27:01  222-1-61101-72157899  office,tilab
     2007-10-03 08:58:11  222-1-61104-72386176  n/a,n/a
     2007-10-03 08:56:28  222-1-24650-121       n/a,n/a
     2007-10-03 08:56:05  222-1-24650-122       n/a,n/a
     2007-10-03 08:54:20  222-1-54650-923       n/a,n/a
     2007-10-03 08:51:31  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:49:16  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:48:47  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:48:18  222-1-61104-72384437  n/a,n/a

     [Diagram labels: Context History; Preliminary Context Scan; Long Temporary Cluster; Short Temporary Clusters; Temporary Clusters Merge; Static Cluster; Movement Cluster; Static Cluster]
  9. MultiLevel Sliding Window Algorithm
     For each window iteration:
     • Check if any user-defined label is available
     • Detect user movement
     • Detect the most frequent position
     • Merge window data with the previous window iteration (if the detected position is the same)
  10. Algorithms Comparison
      Compare&Merge
        Precision: very high in optimal situations (less than 2-5 minutes)
        Optimal usage: good user labeling; cells with low handover issues
        Critical situations: frequent cell handovers
      MultiLevel Sliding Window
        Precision: lower than C&M (a 30-minute window leads to an error of less than 30 minutes)
        Optimal usage: non-labeled areas; frequent cell handovers
        Critical situations: none
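      The per-window steps on slide 9 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: windows are consecutive and non-overlapping, labels take priority over raw cell IDs whenever any label is present in the window, and a window counts as movement when its most frequent position covers less than `min_share` of the window. The function name, the `min_share` parameter, and the dictionary layout are all hypothetical:

```python
from collections import Counter
from datetime import datetime, timedelta


def sliding_window_clusters(records, window=timedelta(minutes=30), min_share=0.5):
    """records: chronological list of (timestamp, cell_id, label_or_None)."""
    clusters = []
    if not records:
        return clusters
    start, i = records[0][0], 0
    while i < len(records):
        # Collect the records that fall inside the current window.
        chunk = []
        while i < len(records) and records[i][0] < start + window:
            chunk.append(records[i])
            i += 1
        start += window
        if not chunk:
            continue
        # Level 1: use user-defined labels if any exist; level 2: raw cell IDs.
        labels = [label for _, _, label in chunk if label is not None]
        values = labels if labels else [cell for _, cell, _ in chunk]
        # Most frequent position in this window.
        position, count = Counter(values).most_common(1)[0]
        # Assumed movement test: no position dominates the window.
        if count / len(values) < min_share:
            kind, position = "movement", None
        else:
            kind = "static"
        # Merge with the previous window when it detected the same situation.
        prev = clusters[-1] if clusters else None
        if prev and prev["kind"] == kind and prev["position"] == position:
            prev["end"] = chunk[-1][0]
        else:
            clusters.append({"kind": kind, "position": position,
                             "start": chunk[0][0], "end": chunk[-1][0]})
    return clusters
```

      Because a cluster boundary can only fall on a window boundary, the timing error of each cluster is bounded by the window length, which matches the precision note on slide 10.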
  11. Cluster Analysis Accuracy VS User Perception
  12. From Clusters To Blog Post NLG Natural Text Generation Action Detector Context Clusters User Preferences
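      Slide 12 only names the components that turn clusters into text (action detector, user preferences, natural text generation). As a rough illustration of the last step, the sketch below uses fixed sentence templates and treats user preferences as a set of places the user does not want published. Both choices are assumptions, as are the names `describe`, `generate_post`, and `hidden_places`:

```python
from typing import Dict, FrozenSet, List


def describe(cluster: Dict[str, str]) -> str:
    # One fixed template per cluster kind (hypothetical wording).
    span = f"From {cluster['start']} to {cluster['end']}"
    if cluster["kind"] == "static":
        return f"{span} I was at {cluster['place']}."
    return f"{span} I moved from {cluster['origin']} to {cluster['destination']}."


def generate_post(clusters: List[Dict[str, str]],
                  hidden_places: FrozenSet[str] = frozenset()) -> str:
    sentences = []
    # "HH:MM" strings sort chronologically within a single day.
    for cluster in sorted(clusters, key=lambda c: c["start"]):
        places = {cluster.get("place"), cluster.get("origin"), cluster.get("destination")}
        # Assumed preference handling: skip any cluster touching a hidden place.
        if places & hidden_places:
            continue
        sentences.append(describe(cluster))
    return " ".join(sentences)


clusters = [
    {"kind": "static", "start": "08:58", "end": "11:02", "place": "Office, TILab"},
    {"kind": "movement", "start": "08:42", "end": "08:56",
     "origin": "Residence,home", "destination": "Office, TILab"},
]
print(generate_post(clusters))
```

      The two example clusters reuse the start times, end times, and place labels shown for Cluster 1 and Cluster 2 on slide 5.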
  13. Results
      • Mining context history leads to user pattern discovery
      • Daily actions sharing
      • Detection of user communities, according to daily behaviors
      • Clustering accuracy VS personal memories perception
      • Movement detection
      • Location-labeling importance
  14. Any Questions? Thank You!
      luca.costabello@guest.telecomitalia.it
