[Other] Using NLP + Neo4j for a Social Media Recommendation Engine

Posted by 我的独一无二 on 2016-10-4 16:55:31
By Alessandro Negro, Chief Scientist, GraphAware | October 4, 2016
Introduction

In recent years, the rapid growth of social media communities has produced a vast number of digital documents on the web. Recommending relevant documents to users is strategically important for customer engagement, but it is not a trivial problem.
In a previous blog post, we introduced the GraphAware Natural Language Processing (NLP) plugin. It provides the foundation for building more complex applications that leverage text analysis and offer enhanced functionality to end users.
An interesting use case is combining content-based recommendations with a collaborative filtering approach to deliver high-quality “suggestions”. This scenario fits any application that combines user-generated content, such as social media posts, with some form of user reaction, such as tagging or likes.
Building on the ideas presented in the paper Social-Aware Document Similarity Computation for Recommender Systems [1], we developed a recommendation engine as part of the GraphAware Enterprise Reco plugin for Neo4j. It uses a combination of similarities as its model to provide high-quality recommendations.
Document Modelling

In a social community, a document (a post, tweet, blog entry, etc.) can be characterized by three elements:

• The document's internal content and the tags extracted from it
• The tags that users associate with it
• The readers' interactions with the document (e.g., view, comment, tag, like)
The internal content of the document is static over time. The tags and users associated with the document, however, are community-driven: they reflect the community's attitude towards the document and can change over time.
With traditional information retrieval techniques, the internal content of the document is indexed. The index is then used to help users search for documents that interest them.
These techniques are still popular in many information retrieval systems. However, relying on the document content alone may miss meaning carried by tags and users. Recognizing the importance of tags as a supplement to content indexing, some systems use tags as external document metadata. This metadata helps users browse or navigate document databases.
GraphAware Enterprise Reco takes a combined approach to computing document similarity for building recommender systems. The idea is that the meaning of a document is derived not only from its content, but also from its associated tags and user interactions.
“These three factors are viewed as three dimensions of a document in social space, named as Content, Tag, and User. Each dimension provides a different view of the document. In Content dimension, the meaning of the document is given by its author(s). However, in the Tag dimension, the meaning of the document is what it is perceived by the community. Each user may provide a different view of the document by tagging it. This view can be far different from the initial intention of the document’s author(s). In User dimension, the meaning of the document is exposed via its readers’ activities in the community.” [1]
Moreover, when analyzing “static” content and social tags, ontologies and semantics can be used to extract concept hierarchies. This extension makes it possible to find relationships between tags and, in this way, to discover hidden relationships between apparently unrelated documents.
For instance, suppose one document is tagged (automatically from its content or by a user) with the tag violence, while another is tagged with the tag war. At first glance they may appear unrelated, but after analyzing the semantic hierarchy of the word violence (with ConceptNet 5, for instance), the system can reveal a relation between them.
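The tag-hierarchy idea can be illustrated with a minimal Python sketch. The `PARENTS` map below is a hypothetical, hand-written hierarchy used only for illustration; it is not real ConceptNet 5 data, and the function names are assumptions rather than part of any GraphAware API.

```python
# Hypothetical, hand-written "is-a" hierarchy; NOT real ConceptNet 5 data.
PARENTS = {
    "war": ["conflict"],
    "conflict": ["violence"],
    "riot": ["violence"],
}

def ancestors(tag, parents=PARENTS):
    """Return every concept reachable by following parent links from `tag`."""
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def related(tag_a, tag_b, parents=PARENTS):
    """Two tags are related if one is an ancestor of the other
    or if they share a common ancestor in the hierarchy."""
    up_a = ancestors(tag_a, parents) | {tag_a}
    up_b = ancestors(tag_b, parents) | {tag_b}
    return bool(up_a & up_b)
```

With this toy hierarchy, `related("violence", "war")` is true because violence is an ancestor of war, while `related("war", "cooking")` is false.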
The schema designed for the database looks as follows:

[Figure: database schema]

This schema also shows how this complex model can be easily stored, and further extended, using graphs and Neo4j.
Similarity Computation

Using all the stored information, three different vectors are created for each document:
Content- and ontology-based vector:

Ci = {wc(i,1), wc(i,2), …, wc(i,n)}

where n is the total number of tags in the database and wc(i,k) is the weight of the k-th tag in the document or in the hierarchy of the tag. wc(i,k) is computed as α*tf-idf(i,k), where α is a weight associated with the hierarchy in the ontology: it is equal to 1 if the tag is in the document or is a synonym of a tag in the document, and less than 1 otherwise.
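As a rough illustration of the content vector, the Python sketch below computes α*tf-idf weights for a toy corpus. Several details are assumptions, since the article does not fix them: tf is taken as the relative frequency of a tag in the document, idf as log(N / df), a tag added from the hierarchy is given an occurrence count by hand, and the value 0.5 for its α is arbitrary. All names are hypothetical.

```python
import math

def tf_idf(tag, doc_counts, corpus):
    """Assumed variant: tf = relative frequency in the document, idf = log(N / df)."""
    total = sum(doc_counts.values())
    if total == 0 or tag not in doc_counts:
        return 0.0
    tf = doc_counts[tag] / total
    df = sum(1 for other in corpus if tag in other)  # documents containing the tag
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

def content_vector(doc_counts, alphas, corpus, vocabulary):
    """Build Ci over `vocabulary`; tags missing from `alphas` get alpha = 1."""
    return [alphas.get(tag, 1.0) * tf_idf(tag, doc_counts, corpus)
            for tag in vocabulary]

vocabulary = ["war", "violence", "cooking"]
doc_a = {"war": 2, "violence": 2}  # "violence" added from the hierarchy (assumed count)
doc_b = {"cooking": 3}
corpus = [doc_a, doc_b]
alphas_a = {"violence": 0.5}       # hypothetical alpha < 1 for a hierarchy tag

vector_a = content_vector(doc_a, alphas_a, corpus, vocabulary)
```

In this toy corpus the hierarchy tag violence ends up with half the weight of war, and cooking, which does not occur in the document, gets weight 0.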
      
Social tag-based vector:

Ti = {wt(i,1), wt(i,2), …, wt(i,p)}

where p is the total number of tags in the database and wt(i,k) is the weight of the k-th tag for the document. wt(i,k) is the association frequency of tag k with document i.
      
User vectors:

Ui = {wu(i,1), wu(i,2), …, wu(i,q)}

where q is the total number of users in the database and wu(i,k) is the weight of the k-th user for the document. This weight can be computed in different ways, depending on the level of interest the user has expressed in the document. Moreover, more than one user vector can be used if different weights are needed for each component (for instance, one vector for likes, one for ratings, and so on).
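The tag and user vectors can be sketched in Python as follows. This is only an illustration: the association frequency is assumed to be the number of times users attached a tag to the document, and the `INTERACTION_WEIGHTS` values are hypothetical, since the article leaves the user weighting open.

```python
from collections import Counter

def tag_vector(tag_events, all_tags):
    """Ti: how many times each tag was associated with the document."""
    counts = Counter(tag_events)
    return [counts[tag] for tag in all_tags]

# Hypothetical weights expressing different levels of interest.
INTERACTION_WEIGHTS = {"view": 1.0, "like": 2.0, "comment": 3.0}

def user_vector(interactions, all_users, weights=INTERACTION_WEIGHTS):
    """Ui: summed interaction weight of each user for the document.

    `interactions` is a list of (user, interaction_kind) pairs.
    """
    totals = Counter()
    for user, kind in interactions:
        totals[user] += weights[kind]
    return [totals[user] for user in all_users]

all_tags = ["history", "politics", "war"]
all_users = ["ann", "bob", "cy"]

t_doc = tag_vector(["war", "war", "history"], all_tags)
u_doc = user_vector([("ann", "view"), ("ann", "like"), ("bob", "comment")], all_users)
```

Here `t_doc` is `[1, 0, 2]`, and `u_doc` gives ann and bob a weight of 3.0 each and cy a weight of 0. Keeping one vector per interaction kind, as the text suggests, would simply mean calling `user_vector` once per kind with a filtered list.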
      
Using these three (or more) vectors, three (or more) cosine similarities are computed, and the combined similarity is then calculated as follows:

CombinedSimilarity(i, j) = α*CosineSim(Ci, Cj) + β*CosineSim(Ti, Tj) + γ*CosineSim(Ui, Uj)

where:

α + β + γ = 1

Here α, β, and γ are the weights of the three similarities; this α is distinct from the hierarchy weight α used in the content vector.
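A minimal Python sketch of the combined similarity is shown below. The default weights (0.5, 0.3, 0.2) are arbitrary illustrative values, the dictionary layout of a document is an assumption, and returning 0 for a zero vector is a design choice made here to avoid division by zero, not something the article specifies.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def combined_similarity(doc_i, doc_j, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of content, tag, and user cosine similarities."""
    if not math.isclose(alpha + beta + gamma, 1.0):
        raise ValueError("alpha + beta + gamma must equal 1")
    return (alpha * cosine_sim(doc_i["content"], doc_j["content"])
            + beta * cosine_sim(doc_i["tags"], doc_j["tags"])
            + gamma * cosine_sim(doc_i["users"], doc_j["users"]))

doc_a = {"content": [1.0, 0.0], "tags": [1, 1], "users": [0, 1]}
doc_b = {"content": [1.0, 0.0], "tags": [1, 0], "users": [1, 0]}

score = combined_similarity(doc_a, doc_b)
```

In this example the content vectors are identical (similarity 1), the tag vectors partially overlap (similarity 1/√2), and the user vectors are orthogonal (similarity 0), so the score is 0.5 + 0.3/√2.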
It is worth noting that the computed similarity represents new knowledge extracted from the data available in the graph database. It is stored as a model for the recommendation engine and can be used in several ways to provide suggestions to users.
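One possible way to use such a stored similarity model is sketched below in Python: documents similar to those a user already liked are scored and ranked. This is an assumption about how the model could be consumed, not a description of how GraphAware Enterprise Reco does it; the nested dictionary stands in for similarity values stored in the graph, and all names are hypothetical.

```python
def recommend(liked, similarity, top_n=3):
    """Rank unseen documents by their summed similarity to the liked ones.

    `similarity` maps a document id to {other document id: combined similarity}.
    """
    scores = {}
    for doc in liked:
        for other, sim in similarity.get(doc, {}).items():
            if other not in liked:
                scores[other] = scores.get(other, 0.0) + sim
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

# Toy similarity model with made-up values.
similarity = {
    "d1": {"d2": 0.9, "d3": 0.2},
    "d2": {"d1": 0.9, "d4": 0.5},
}

suggestions = recommend({"d1", "d2"}, similarity)
```

For a user who liked d1 and d2, the sketch suggests d4 before d3, because d4 is more similar to the liked documents.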
Conclusion

In this use case, the GraphAware NLP Plugin is used to deliver high-quality recommendations to end users. The plugin provides content-based and ontology-based cosine similarities which, together with the more classical “collaborative filtering” approach, enable new and more advanced functionality in a straightforward way.
The GraphAware NLP Plugin can be used with other plugins available on the GraphAware products page. In particular, using the Neo4j2Elastic plugin for Neo4j and the Graph-Aided Search plugin for Elasticsearch, it is possible to provide a complete, end-to-end, customized search framework.
The NLP plugin is going to be open-sourced under the GPL in the future, and we would like to make sure it is production-ready with private beta testers. If you’re interested in learning more or seeing it in action, please get in touch.
If you’re attending GraphConnect in San Francisco in October this year, or in London next May, be sure to stop by our booth!
Reference

[1] Tran Vu Pham, Le Nguyen Thach, “Social-Aware Document Similarity Computation for Recommender Systems”, pp. 872-878, 2011, doi:10.1109/DASC.2011.147
Learn more about the GraphAware NLP plugin and meet the GraphAware team at GraphConnect San Francisco on October 13th, 2016.