• Code

    This page contains the open-source code developed and used in the MULTISENSOR Project.

    Feel free to download, use and change the code as you like. We’d love to get your feedback.

    No. Description Project Partner Licence
    01 Deep statistical dependency parser [-> code]
    This parser takes as input a surface-syntactic dependency structure as produced, e.g., by the MATE Tools parser, and outputs dependency structures annotated with deep-syntactic relations in the sense of the Meaning-Text Theory. The parser is trained on a parallel corpus of surface-syntactic structures and deep-syntactic structures in the CoNLL’09 format.
    Check out the demo here (select “deep output”). For more details, see the “DSynt Converter” section.
    UPF GNU GPL v2
    02 Mate Tools Surface Dependency Parser [-> code]
    This tool takes as input plain text and produces dependency structures annotated with surface-syntactic relations (subject, object, etc.), lemmas, part of speech, and morpho-syntactic features. The MATE Tools parser is trained on surface-syntactic data annotated in the CoNLL’09 (one word per line) format.

    Check out the demo here (select “surface output”).
    UPF GNU GPL v2
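For readers unfamiliar with the CoNLL’09 (one word per line) format mentioned above, the following is a minimal Python sketch of reading one sentence. It assumes the standard tab-separated CoNLL-2009 column order (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, followed by predicate columns that are ignored here). The function name and the two-token sample are hypothetical and are not taken from the MATE Tools code.

```python
# Illustrative only: not part of the MATE Tools distribution.
# Assumes the standard CoNLL-2009 column order, tab-separated, one token per line.
COLUMNS = ["id", "form", "lemma", "plemma", "pos", "ppos",
           "feat", "pfeat", "head", "phead", "deprel", "pdeprel"]


def read_conll09_sentence(block):
    """Parse one sentence block into a list of token dictionaries."""
    tokens = []
    for line in block.strip().splitlines():
        fields = line.split("\t")
        # zip stops at the shorter sequence, so any extra predicate/argument
        # columns beyond the first twelve are ignored.
        token = dict(zip(COLUMNS, fields))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])  # 0 marks the root of the sentence
        tokens.append(token)
    return tokens


# Hypothetical sample sentence: "Dogs bark"
sample = (
    "1\tDogs\tdog\tdog\tNNS\tNNS\t_\t_\t2\t2\tSBJ\tSBJ\n"
    "2\tbark\tbark\tbark\tVBP\tVBP\t_\t_\t0\t0\tROOT\tROOT\n"
)
tokens = read_conll09_sentence(sample)
for t in tokens:
    print(t["id"], t["form"], "-> head", t["head"], t["deprel"])
```

Each token dictionary carries the word form, its lemma and part of speech, and the index and label of the dependency arc to its head, which is the information the parsers listed on this page consume and produce.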
    03 Socially interconnected/interlinked and multimedia-enriched objects [-> code]
    A model for representing multimedia content in the context of the Web and Social Media. This model allows for representing in a common format heterogeneous content such as: web pages with images and videos, images, videos, textual documents, social media posts and user comments.
    CERTH Apache Licence v2.0
    04 VERGE [-> code]
    VERGE is a hybrid interactive video retrieval system that searches video content by integrating different search modules based on visual and textual techniques. VERGE is built on open-source technologies such as HTML, PHP, JavaScript and MongoDB. It can be used by companies and organizations that are interested in indexing and searching image and video content.

    Check out the demo here.
    CERTH Apache Licence v2.0
    05 DSynt Converter (ENG) [-> code]
    A tool that converts the reference surface-syntactic annotation of English (Penn TreeBank) into its corresponding deep-syntactic annotation in the CoNLL’09 format. The conversion removes auxiliaries, modals, and functional prepositions, conjunctions and determiners, and maps the grammatical labels onto semantics-oriented labels. A deep-syntactic structure expresses at the same time the syntactic structure of the sentence and most predicate-argument relations between the meaning-bearing elements that are in it. Together with the corresponding surface-syntactic corpus, the deep-syntactic corpus is used for training a deep-syntactic parser or a deep-syntactic generator.
    UPF GNU GPL v2
    06 Deep statistical text generator [-> code]
    This tool takes as input a deep-syntactic dependency structure in the sense of the Meaning-Text Theory (in the CoNLL’09 format), and outputs a linearized structure with all the words of the sentence. The generator is trained on a parallel corpus of surface-syntactic structures and deep-syntactic structures in the CoNLL’09 format. See on this page the “DSynt Converter” section for more details.
    UPF GNU GPL v2
    07 Twitter Crawler for Contributor Analysis and Name Search [-> code]
    This tool receives as input a Twitter handle and extracts information about the user and their immediate connections, including measures of the user’s authority. The authority scores are based on three criteria: 1) reach (number of followers and size of the ego network), 2) relevance to a given set of keywords and 3) retweet influence score (average fraction of followers that retweet a random post by the user). Alternatively, the tool accepts a search key instead of a specific Twitter handle. Given this search key, the tool retrieves the top 10 Twitter accounts relevant to this string and proceeds as before with each of them.
    EURECAT MIT
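The retweet influence score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the crawler's actual code: the function name is hypothetical, and it assumes each retweet count only includes retweets made by the user's followers.

```python
def retweet_influence(retweet_counts, followers):
    """Average fraction of followers that retweeted each of the user's posts.

    retweet_counts: number of follower retweets for each sampled post
    followers:      the user's follower count
    """
    if followers <= 0 or not retweet_counts:
        return 0.0  # no followers or no posts: no measurable influence
    fractions = [count / followers for count in retweet_counts]
    return sum(fractions) / len(fractions)


# Hypothetical user with 200 followers and three sampled posts.
score = retweet_influence([10, 0, 30], followers=200)
print(round(score, 4))
```

How this score is combined with reach and keyword relevance into a single authority measure is not specified on this page, so the sketch stops at the one criterion.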
    08 Framework for topic detection [-> code]
    In this framework, topic detection is tackled as a clustering problem and a hybrid clustering approach for assigning news articles into topics is realized. In this approach, prior knowledge of the correct number of clusters/topics is not required, as this number is automatically estimated by means of a novel methodology named DBSCAN-Martingale. The assignment of news articles into topics is done using Latent Dirichlet Allocation (LDA).
    CERTH Apache Licence v2.0
    09 Framework for category-based classification [-> code]
    This is the implementation of a framework for classification of news articles into a predefined set of generic categories, i.e. Nature_Environment, Politics, Science_Technology, Economy_Business_Finance, Health and Lifestyle_Leisure. The framework relies on the Random Forests (RF) machine learning method and a late fusion strategy that is based on the operational capabilities of RF, namely the OOB error estimate. For a given dataset, two types of textual features are extracted, namely word2vec and N-grams. One RF model is trained for each type of features. Next, the predicted probabilities from each model on the test set are aggregated, so as to calculate the final late fusion model predictions. These probabilities are not equally weighted. Weights are individually calculated for each class based on the OOB error estimate of each RF model.
    CERTH Apache Licence v2.0
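The late fusion step described above can be sketched as follows. The exact weighting formula is not given on this page, so this Python sketch makes an assumption: for each class, a model's weight is one minus its OOB error for that class, normalized so that the weights of all models sum to 1. The function name and the numbers are hypothetical.

```python
def late_fusion(probs_by_model, oob_error_by_model):
    """Fuse per-class probabilities from several models into one list.

    probs_by_model:     {model name: [p(class 0), p(class 1), ...]}
    oob_error_by_model: {model name: [OOB error for class 0, class 1, ...]}
    Assumed weighting (not confirmed by the source): weight = 1 - OOB error,
    normalized across models separately for each class.
    """
    models = list(probs_by_model)
    n_classes = len(probs_by_model[models[0]])
    fused = []
    for c in range(n_classes):
        raw = {m: 1.0 - oob_error_by_model[m][c] for m in models}
        total = sum(raw.values())
        if total == 0:
            # Every model is always wrong on this class: fall back to equal weights.
            weights = {m: 1.0 / len(models) for m in models}
        else:
            weights = {m: raw[m] / total for m in models}
        fused.append(sum(weights[m] * probs_by_model[m][c] for m in models))
    return fused


# Hypothetical two-class example with one model per feature type.
probs = {"word2vec": [0.7, 0.3], "ngrams": [0.4, 0.6]}
oob = {"word2vec": [0.1, 0.4], "ngrams": [0.3, 0.2]}
fused = late_fusion(probs, oob)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

Because the weights are computed per class, a model that is reliable for one category can dominate there while contributing less to categories where its OOB error is high.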
    10 Multimedia retrieval framework [-> code]
    This framework

    • fuses multiple modalities, so as to retrieve multimedia objects in response to a multimodal query;
    • integrates high-level information, i.e. multimedia objects are enriched with high-level textual and visual concepts;
    • is language-independent.

    The framework leverages 3 modalities from every multimedia object, namely visual features, visual concepts and textual concepts. Each modality provides a vector representation of the multimedia object through its corresponding features. The similarity matrices from the 3 modalities are constructed and fused for the computation of one relevance score vector.

    CERTH Apache Licence v2.0
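To illustrate how per-modality similarities can be fused into one relevance score vector, here is a deliberately simplified Python sketch. It uses cosine similarity and a weighted linear combination, which is an assumption made for illustration and is not the fusion method implemented in the framework. All names, weights and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse_relevance(query, objects, weights):
    """Return one relevance score per object.

    query and every object map a modality name to a feature vector;
    weights maps each modality name to its share in the linear combination.
    """
    return [
        sum(w * cosine(query[m], obj[m]) for m, w in weights.items())
        for obj in objects
    ]


weights = {"visual_features": 1 / 3, "visual_concepts": 1 / 3, "textual_concepts": 1 / 3}
query = {"visual_features": [1, 0], "visual_concepts": [1, 0], "textual_concepts": [0, 1]}
objects = [
    # identical to the query in every modality
    {"visual_features": [1, 0], "visual_concepts": [1, 0], "textual_concepts": [0, 1]},
    # orthogonal to the query in every modality
    {"visual_features": [0, 1], "visual_concepts": [0, 1], "textual_concepts": [1, 0]},
]
scores = fuse_relevance(query, objects, weights)
print(scores)
```

Sorting the objects by their fused score gives the ranked result list for the multimodal query.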
    11 Character-based Stack-LSTM surface parser [-> code]
    This tool takes as input plain text and produces dependency structures annotated with surface-syntactic relations (subject, object, etc.). The Stack-LSTM parser is trained on surface-syntactic data annotated in the CoNLL’06 (one word per line) format. The character-based representations are a way of overcoming the out-of-vocabulary (OOV) problem: they make it possible to compute vector representations for words that the machine learning model has never seen during training (mainly out-of-domain words). As a result, the parser can handle and classify new words without any additional resources, and its performance improves substantially when OOV rates are high.
    UPF Apache Licence v2.0
    12 Frame Semantics parser (ENG) [-> code]
    This parser produces structures as found in FrameNet. It has several advantages when compared to state-of-the-art systems:

    1. Unlike, e.g., the Semafor system, this parser does not consider spans of text as Frame Fillers, but instead individual meanings;
    2. Unlike, e.g., Semafor or FRED, this tool will eventually be able to process multilingual inputs.

    This parser builds upon the output of a deep-syntactic parser as described above. The level of abstraction of a Frame Semantics structure is greater than that of deep-syntactic structures, but it is also more complete from the perspective of semantics, in particular by making explicit many relations which are not given by a syntactic parser (shared arguments, gapping constructions, etc.).

    UPF Apache Licence v2.0
    13 Multimedia concept and event detection [-> code]
    Implementation of experiments for the MULTISENSOR video concept and event detection framework. The framework relies on DCNN features and Support Vector Machines (SVM) classification algorithm. A three-fold cross validation (CV) is executed to evaluate performance. The code has been developed and tested in Python, version 3.5.1, 64-bit.
    In the experiments, a dataset that contains 106 videos from news reports is utilized. Videos are categorised into nine concepts/events. Note that one video may be relevant to zero or more of these concepts/events.
    The dataset is available on the project’s dataset page or through direct download here.
    CERTH Apache Licence v2.0
    14 Community detection [-> code]
    Implementation of the MULTISENSOR community detection task. Contrary to the traditional modularity maximization approaches for finding community structure, MULTISENSOR adopts the information-theoretic codelength minimization, known as the Infomap method. MULTISENSOR uses this module for the detection of Twitter communities, given a list of desired keywords/hashtags.
    CERTH Apache Licence v2.0
    15 Ontology alignment [-> code]
    The MULTISENSOR visual-based ontology alignment module implements an algorithm that computes a visual-based similarity metric for entity matching between two ontologies. Each ontological entity is associated with sets of images, retrieved through ImageNet or web-based search, and visual feature extraction, clustering and indexing are employed to compute the similarity between concepts. An adaptation of a popular WordNet-based matching algorithm that exploits the visual similarity has also been developed.
    CERTH Apache Licence v2.0
    16 User and Context-centric Content Analysis [-> code]
    This code implements models for representing contextual, sentiment and online social interaction features, and applies linguistic processing at different levels of accuracy and completeness. Our approach is based on disambiguated entities, relations between them, subjective expressions, opinion holders, and relations between pieces of sentiment-rich information.
    EURECAT Apache Licence v2.0

  • SocialSensor

    Sensing User Generated Input for Improved Media Discovery and Experience

    Project Description:
    SocialSensor will develop a new framework for enabling real-time multimedia indexing and search in the Social Web. The goal is to mine and aggregate user inputs and content over multiple social networking sites. Social Indexing will incorporate information about the structure and activity of the users’ social network directly into the multimedia analysis and search process.

    Furthermore, it will enhance the multimedia consumption experience by developing novel user-centric media visualization and browsing paradigms. For example, SocialSensor will analyse the dynamic and massive user contributions in order to extract unbiased trending topics and events and will use social connections for improved recommendations.

    Relevance for MULTISENSOR
    Topic detection based on multimodal representation, content integration. (WP4)
    Reuse social topic detection approaches


    > SocialSensor Website

  • Academic Publications

    The following papers have been published in relation to the MULTISENSOR Project:

    Journals

    No. Title Authors Publication
    1 “Greedy Transition-based Dependency Parsing with Stack-LSTMs” Miguel Ballesteros, Chris Dyer, Yoav Goldberg and Noah A. Smith Computational Linguistics. MIT Press, 2016
    2 “Many Languages, One Parser” Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer and Noah Smith Transactions of the Association for Computational Linguistics (TACL), 2016
    3 “On the Feasibility of Predicting Popular News at Cold Start” I. Arapakis, B. B. Cambazoglu, and M. Lalmas JASIST 2016
    4 “Gaze Movement-driven Random Forests for Query Clustering in Automatic Video Annotation” S. Vrochidis, I. Patras and I. Kompatsiaris Multimedia Tools and Applications, 2016
    5 “Focussed Crawling of Environmental Web Resources Based on the Combination of Multimedia Evidence” T. Tsikrika, A. Moumtzidou, S. Vrochidis, and I. Kompatsiaris Journal of Multimedia Tools and Applications, Springer US
    6 “Data-Driven Deep-Syntactic Dependency Parsing” M. Ballesteros, B. Bohnet, S. Mille, and L. Wanner Journal of Natural Language Engineering, Cambridge University Press, November 2015

    Conferences

    No. Title Authors Publication
    1 “Extending WordNet with Fine-Grained Collocational Information” Luis Espinosa Anke, Jose Camacho-Collados, Sara Rodríguez-Fernández, Horacio Saggion and Leo Wanner Proceedings of the International Conference on Computational Linguistics (COLING), Osaka, Japan, 2016.
    2 “A Neural Network Architecture for Multilingual Punctuation Generation” M. Ballesteros and L. Wanner In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Short paper track, Austin, TX, 2016.
    3 “Semantics-Driven Recognition of Collocations Using Word Embeddings” Sara Rodríguez Fernández, Luis Espinosa Anke, Roberto Carlini, and Leo Wanner In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Short paper track, Berlin, Germany, 2016
    4 “Example-based Acquisition of Fine-grained Collocation Resources” Sara Rodríguez-Fernández, Roberto Carlini, Luis Espinosa Anke, and Leo Wanner Proceedings of the International Conference on Linguistic Resources and Evaluation (LREC), Portoroz, Slovenia, 2016
    5 “Semantics-Driven Collocation Discovery” Sara Rodríguez-Fernández, Luis Espinosa-Anke, Roberto Carlini, and Leo Wanner In Proceedings of the Spanish Conference on Computational Linguistics, Salamanca, Spain, 2016
    6 “Towards Multilingual Natural Language Generation within Abstractive Summarization” Simon Mille, Miguel Ballesteros, Alicia Burga, Gerard Casamayor and Leo Wanner In Proceedings of the Catalan Conference on Artificial Intelligence, Barcelona, 2016
    7 “Multilingual Natural Language Generation within Abstractive Summarization” S. Mille, M. Ballesteros, A. Burga, G. Casamayor, and L. Wanner In Proceedings of the First International Workshop on Multimodal Multimedia Data Analytics (MMDA), held in conjunction with the European Conference on Artificial Intelligence (ECAI), The Hague, The Netherlands, 2016
    8 “Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser” Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer and Noah A. Smith In proceedings of EMNLP (EMNLP 2016), 2016
    9 “Training with Exploration Improves a Greedy Stack LSTM Parser” Miguel Ballesteros, Yoav Goldberg, Chris Dyer and Noah A. Smith In proceedings of Short Papers EMNLP (EMNLP 2016), 2016
    10 “A Neural Network Architecture for Multilingual Punctuation Generation” Miguel Ballesteros and Leo Wanner In proceedings of Short Papers EMNLP (EMNLP 2016), 2016
    11 “Transition-Based Dependency Parsing with Heuristic Backtracking” Jacob Buckman, Miguel Ballesteros and Chris Dyer In proceedings of Short Papers EMNLP (EMNLP 2016), 2016
    12 “Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs” Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer and Noah A. Smith In proceedings of CoNLL (CoNLL 2016). Berlin, Germany, 2016
    13 “Neural Architectures for Named Entity Recognition” Guillaume Lample, Miguel Ballesteros, Kazuya Kawakami, Sandeep Subramanian and Chris Dyer In proceedings of NAACL-HLT (NAACL 2016). San Diego, US, 2016
    14 “Recurrent Neural Network Grammars” Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros and Noah Smith In proceedings of NAACL-HLT (NAACL 2016). San Diego, US, 2016
    15 “Static and Dynamic Feature Selection in Morphosyntactic Analyzers” Bernd Bohnet, Miguel Ballesteros, Ryan McDonald and Joakim Nivre Arxiv. arXiv:1603.06503, 2016
    16 “Community Detection in Complex Networks Based on DBSCAN* and a Martingale Process” I. Gialampoukidis, T. Tsikrika, S. Vrochidis, I. Kompatsiaris 11th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP 2016), Thessaloniki, Greece, 20-21 October 2016
    17 “Learning Text Patterns to Detect Opinion Targets” F. Peleja and J. Magalhães In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR’15), Lisbon, Portugal, November 2015
    18 “Query-based Topic Detection Using Concepts and Named Entities” I. Gialampoukidis, S. Vrochidis, I. Kompatsiaris 1st International Workshop on Multimodal Media Data Analytics (MMDA 2016), The Hague, Netherlands, 30 August 2016
    19 “Incremental estimation of visual vocabulary size for image retrieval” I. Gialampoukidis, S. Vrochidis, I. Kompatsiaris Proc. INNS Big Data 2016, Thessaloniki, Greece, 23-25 October 2016
    20 “Key player identification in terrorism-related social media networks using centrality measures” I. Gialampoukidis, G. Kalpakis, T. Tsikrika, S. Vrochidis, I. Kompatsiaris European Intelligence and Security Informatics Conference (EISIC 2016), Uppsala, Sweden, August 17-19, 2016
    21 “Linguistic Benchmarks of Online News Article Quality” Ioannis Arapakis, Filipa Peleja, B. Barla Cambazoglu, Joao Magalhaes In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin, Germany, August 7-12, 2016.
    22 “Semantic integration of web data for international investment decision support” Boyan Simeonov, Vladimir Alexiev, Dimitris Liparas, Marti Puigbo, Stefanos Vrochidis, Emmanuel Jamin and Ioannis Kompatsiaris 3rd international conference on Internet Science, Florence, Italy, 12-14 September 2016
    23 “A hybrid framework for news clustering based on the DBSCAN-Martingale and LDA” I. Gialampoukidis, S. Vrochidis, I. Kompatsiaris 12th International Conference on Machine Learning and Data Mining, New York, 16-21 July 2016
    24 “A hybrid graph-based and non-linear late fusion approach for multimedia retrieval” I. Gialampoukidis, A. Moumtzidou, D. Liparas, S. Vrochidis, I. Kompatsiaris 14th International Workshop on Content-based Multimedia Indexing (CBMI), Bucharest, Romania, 15-17 June 2016
    25 “A Multimedia Interactive Search Engine based on Graph-based and Non-linear Multimodal Fusion” A. Moumtzidou, I. Gialampoukidis, T. Mironidis, D. Liparas, S. Vrochidis, I. Kompatsiaris 14th International Workshop on Content-based Multimedia Indexing (CBMI), Bucharest, Romania, 15-17 June 2016
    26 “Retrieval of Multimedia objects by Fusing Multiple Modalities” I. Gialampoukidis, A. Moumtzidou, T. Tsikrika, S. Vrochidis and I. Kompatsiaris International Conference on Multimedia Retrieval (ICMR), New York, USA, 6-9 June 2016
    27 “VERGE: A Multimodal Interactive Search Engine for Video Browsing and Retrieval” A. Moumtzidou, T. Mironidis, E. Apostolidis, F. Markatopoulou, A. Ioannidou, I. Gialampoukidis, K. Avgerinakis, S. Vrochidis, V. Mezaris, I. Kompatsiaris, I. Patras Proc. Video Browser Showdown (VBS’16) at the 22nd Int. Conf. on MultiMedia Modeling (MMM’16), Miami, USA, 4 January 2016
    28 “Fast Visual Vocabulary Construction for Image Retrieval using Skewed-Split k-d trees” Ilias Gialampoukidis, Stefanos Vrochidis and Ioannis Kompatsiaris Proc. 22nd Int. Conf. on MultiMedia Modeling (MMM16), Miami, USA, Jan. 2016
    29 “Exploiting visual similarities for ontology alignment” C. Doulaverakis, S. Vrochidis, I. Kompatsiaris 7th International Conference on Knowledge Engineering and Ontology Development (KEOD 2015), Lisbon, Portugal, 12-14 November, 2015
    30 “Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs” M. Ballesteros, C. Dyer, N. Smith In proceedings of EMNLP (EMNLP 2015). Lisbon, Portugal, September 2015
    31 “Classification using various ML Methods and Combinations of Key-Phrases and Visual Features” Y. Hacohen-Kerner, A. Sabag, D. Liparas, A. Moumtzidou, S. Vrochidis and I. Kompatsiaris 1st KEYSTONE Conference (IKC2015), Coimbra, Portugal, 8-9 September 2015
    32 “Transition-Based Spinal Parsing” M. Ballesteros, X. Carreras In proceedings of CoNLL (CoNLL 2015). Beijing, China, August 2015
    33 “Transition-Based Dependency Parsing with Stack Long Short-Term Memory” C. Dyer, M. Ballesteros, W. Ling, A. Matthews, N. Smith In proceedings of ACL (ACL-IJCNLP 2015). Beijing, China, August 2015
    34 “Explanatory opinions: to whom or what is all the fuss about?” F. Peleja and I. Arapakis Sixth BCS-IRSG Symposium on Future Directions in Information Access (FDIA’15), Thessaloniki, Greece, August 2015
    35 “Concept Detection on Multimedia Web Resources about Home Made Explosives” George Kalpakis, Theodora Tsikrika, Foteini Markatopoulou, Nikiforos Pittaras, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Patras, and Ioannis Kompatsiaris In Proceedings of the International Workshop on Multimedia Forensics and Security (MFSec 2015), to be held in conjunction with the 10th International Conference on Availability, Reliability and Security, Toulouse, France, 24-28 August 2015
    36 “MULTISENSOR: Development of Multimedia Content Integration Technologies for Journalism, Media Monitoring and International Exporting Decision Support” S. Vrochidis, I. Kompatsiaris, G. Casamayor, I. Arapakis, R. Busch, V. Alexiev, E. Jamin, M. Jugov, N. Heise, T. Forrellat, D. Liparas, L. Wanner, I. Miliaraki, V. Aleksic, K. Simov, A. M. Soro, M. Eckhoff, T. Wagner, M. Puigbó 2015 IEEE International Conference on Multimedia and Expo (ICME 2015), Turin, Italy, June 29 – July 3, 2015
    37 “Visualizing deep-syntactic structures” J. Soler-Company, M. Ballesteros, B. Bohnet, S. Mille, and L. Wanner In Proceedings of the Demonstrations of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015), Denver, US, June 2015
    38 “Data-driven sentence generation with non-isomorphic trees” M. Ballesteros, B. Bohnet, S. Mille, and L. Wanner In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015), Denver, US, June 2015
    39 “Discovery of Environmental Web Resources Based on the Combination of Multimedia Evidence” T. Tsikrika, A. Latas, A. Moumtzidou, E. Chatzilari, S. Vrochidis, and I. Kompatsiaris In Proceedings of the Environmental Multimedia Retrieval Workshop (EMR 2015), Shanghai, China, 23-26 June 2015
    40 “VERGE: A Multimodal Interactive Video Search Engine” A. Moumtzidou, K. Avgerinakis, E. Apostolidis, F. Markatopoulou, K. Apostolidis, T. Mironidis, S. Vrochidis, V. Mezaris, Y. Kompatsiaris, I. Patras Proc. 21st Int. Conf. on MultiMedia Modeling (MMM15), Sydney, Australia, January 2015
    41 “A Unified Model for Socially Interconnected Multimedia-Enriched Objects” T. Tsikrika, K. Andreadou, A. Moumtzidou, E. Schinas, S. Papadopoulos, S. Vrochidis, Y. Kompatsiaris 21st MultiMedia Modelling Conference (MMM2015), Sydney, Australia, 5-7 January, 2015
    42 “Modeling adoptions and the stages of the diffusion of innovations” Y. Mehmood, N. Barbieri, F. Bonchi Proceedings of the International Conference on Data Mining, December 2014
    43 “On the Feasibility of Predicting News Popularity at Cold Start” I. Arapakis, B. Barla Cambazoglu, M. Lalmas In Proceedings of the 6th International Conference on Social Informatics. Barcelona, 10-13 November 2014
    44 “News articles classification using Random Forests and weighted multimodal features” D. Liparas, Y. Hacohen-Kerner, A. Moumtzidou, S. Vrochidis and I. Kompatsiaris 3rd Open Interdisciplinary MUMIA Conference and 7th Information Retrieval Facility Conference (IRFC2014), Copenhagen, Denmark, 10-12 November 2014
    45 “The Influence of Indirect Ties on Social Network Dynamics” X. Zuo, J. Blackburn, N. Kourtellis, J. Skvoretz, A. Iamnitchi Proceedings of the 6th International Conference on Social Informatics (SocInfo 2014), Barcelona, Spain, 10-13 November 2014
    46 “Who to follow and why: link prediction with explanations” N. Barbieri, F. Bonchi, G. Manco Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2014
    47 “Deep-syntactic parsing” M. Ballesteros, B. Bohnet, S. Mille, and L. Wanner In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland, August 2014
    48 “Key-phrase Extraction using Textual and Visual Features” Y. HaCohen-Kerner, S. Vrochidis, D. Liparas, A. Moumtzidou and I. Kompatsiaris 3rd Workshop on Vision and Language (VL), Dublin, Ireland, 23-29 August 2014
    49 “Concept-oriented labelling of patent images based on Random Forests and proximity-driven generation of synthetic data” D. Liparas, A. Moumtzidou, S. Vrochidis, I. Kompatsiaris COLING’14 Workshop on Vision and Language (VL’14), Dublin, 23 August 2014
    50 “Classifiers for Data-driven Deep Sentence Generation” M. Ballesteros, S. Mille and L. Wanner In Proceedings of the 8th International Natural Language Generation Conference (INLG), Philadelphia, USA, June 2014
    51 “Influence Maximization with Viral Product Design” N. Barbieri, F. Bonchi To appear in Proceedings of the SIAM International Conference on Data Mining, April 2014
    52 “Privacy Preserving Estimation of Social Influence” T. Tassa, F. Bonchi To appear in Proceedings of the SIAM International Conference on Data Mining, April 2014
    53 “Multi-evidence User Group Discovery in Professional Image Search”

    T. Tsikrika, C. Diou In Proceedings of the 36th European Conference on Information Retrieval (ECIR 2014), Amsterdam, The Netherlands, 13-16 April 2014
    54 “Focussed Crawling of Environmental Web Resources: A Pilot Study on the Combination of Multimedia Evidence” Theodora Tsikrika, Anastasia Moumtzidou, Stefanos Vrochidis and Ioannis Kompatsiaris In Proceedings of the Environmental Multimedia Retrieval Workshop (EMR 2014), 01 April 2014
    55 “Online Topic-aware Influence Maximization Queries” C. Aslay, N. Barbieri, F. Bonchi, R. Baeza-Yates To appear in Proceedings of the International Conference on Extending Database Technology, March 2014
    56 “VERGE: An Interactive Search Engine for Browsing Video” A. Moumtzidou, K. Avgerinakis, E. Apostolidis, V. Aleksic, F. Markatopoulou, C. Papagiannopoulou, S. Vrochidis, V. Mezaris, R. Busch, I. Kompatsiaris Video Browser Showdown (VBS) 2014, Dublin, Ireland, January 2014
    57 “Influence-based Network-oblivious Community Detection” N. Barbieri, F. Bonchi, G. Manco Proceedings of the IEEE International Conference on Data Mining, December 2013
    58 “Mining Summaries of Propagations” L. Macchia, F. Bonchi, F. Gullo, L. Chiarandini Proceedings of the IEEE International Conference on Data Mining, December 2013
    59 ITI-CERTH participation to TRECVID 2013 F. Markatopoulou, A. Moumtzidou, C. Tzelepis, K. Avgerinakis, N. Gkalelis, S. Vrochidis, V. Mezaris, I. Kompatsiaris In Proceedings of TRECVID 2013 Workshop, Gaithersburg, MD, USA, November 2013
  • Presentations

    This page contains presentations held at conferences or workshops in regard to the MULTISENSOR project by one of its project partners.

    Presentations may be available in a local language only.

    No. Topic Event Date Presenter
    01 MULTISENSOR Project Poster The project poster was designed to give an overview of the Multisensor project at conferences, workshops and other events. It is available for download here. 31/10/2016 MULTISENSOR
    02 Application of language technologies for SMEs and the Public Sector Horizon 2020 ICT-16 Big Data networking day / Workshop on multilingual data value chains in the Digital Single Market 16/1/2015 EVERIS
  • Press releases/Newsletters

    This page contains the press releases issued about MULTISENSOR by one of its project partners.

    Press Releases might be in a local language only.

    No. Source/Partner Date Language Type
    01 CERTH press release 28/11/2013 Greek Press Release
    02 PIMEC Newsletter, specific MULTISENSOR item 05/12/2013 Catalan Newsletter
    03 PIMEC Newsletter, specific MULTISENSOR item 30/01/2014 Catalan Newsletter
    04 pressrelations press release 25/02/2014 German Press Release
    05 PIMEC Newsletter, specific MULTISENSOR item 13/03/2014 Catalan Newsletter
    06 pressrelations Newsletter, specific MULTISENSOR item 10/04/2014 German/English Newsletter
    07 EuropaPress Newsletter, specific MULTISENSOR item 30/10/2014 Catalan Newsletter
    08 PIMEC Newsletter, specific MULTISENSOR item 30/10/2014 Catalan Newsletter
    09 PIMEC Newsletter, specific MULTISENSOR item 10/04/2015 Catalan Newsletter
    10 pressrelations press release DEU/ENG 08/07/2015 German/English Press Release
    11 PIMEC Newsletter, specific MULTISENSOR item 15/07/2015 English/Spanish/Catalan Newsletter
    12 PIMEC Newsletter, specific MULTISENSOR item 29/07/2015 English/Spanish/Catalan Newsletter
    13 pressrelations internal newsletter, specific MULTISENSOR item (Part 1/ Part 2) 14/10/2015 German Newsletter
    14 PIMEC Newsletter, specific MULTISENSOR item 29/10/2015 English/Spanish/Catalan Newsletter
    15 PIMEC Newsletter, specific MULTISENSOR item 25/11/2015 English/Spanish/Catalan Newsletter
    16 pressrelations internal newsletter, specific MULTISENSOR item (Part 1/ Part 2) 01/12/2015 German Newsletter
    17 PIMEC Newsletter, specific MULTISENSOR item 12/01/2016 English/Spanish/Catalan Newsletter
    18 PIMEC Newsletter, specific MULTISENSOR item 28/01/2016 English/Spanish/Catalan Newsletter
    19 pressrelations, Article at Datascouting 17/03/2016 English Newsletter
    20 PIMEC Newsletter, specific MULTISENSOR item 20/07/2016 Spanish/Catalan Newsletter
    21 PIMEC Newsletter, specific MULTISENSOR item 06/09/2016 Spanish/Catalan Newsletter
    22 pressrelations press release 19/09/2016 German/English Press Release
    23 PIMEC press release 20/09/2016 Catalan Press Release
    24 PIMEC Newsletter, specific MULTISENSOR item 21/10/2016 Spanish/Catalan Newsletter
  • Journalism

    We have chosen European energy policy as a specific topic in order to illustrate how MULTISENSOR could be used in a journalistic environment. Energy policy is one of the main issues in international news coverage; NGOs, politicians and citizens argue over the right strategy. In covering this topic, journalists are interested in official sources such as national governments and European institutions, as well as in the international press and academic institutions that deal with energy policies. Furthermore, they like to follow comments from civil society, including statements made by NGOs and regular citizens through social networks and blogs in different languages. Ideally, MULTISENSOR will support journalists in covering European energy policy. The platform provides a weekly list of content items from different (multilingual) sources that are relevant to energy policies in Europe. MULTISENSOR also delivers summaries of specific articles and presents an overview of how recent developments in European energy policy have been perceived by the population.

  • Architecture

    MULTISENSOR envisions a system dealing with a wide range of technologies addressing different users and use cases. These circumstances pose different requirements that will have to be met by a stable technical baseline. On the one hand, the seamless integration of all these technologies is an important challenge on the road to a successful application. On the other hand, it will be necessary to harmonize the interfaces so that users can interact with all segments equally and without interruption.

    These are precisely the critical challenges for MULTISENSOR: to design and build a sound yet simple and flexible architecture able to integrate heterogeneous technologies and artifacts, based on a continuous deployment methodology, while providing usable and re-usable interfaces for both humans and external systems and applications (portals, services and application programming interfaces).

    The fundamental principles for the architecture are:

    • SERVICE-ORIENTED: to make it modular, decoupled and manageable
    • SIMPLE: to make it intelligible and manageable
    • STANDARDS-BASED: for effective interoperability and solid foundations, SOA-based architecture, and decoupled RESTful services

    The following diagram gives a high-level overview of the envisioned architecture.

  • Workpackage WP-1

    Project Management

    Objectives:

    • manage the project to time and budget
    • supervise and co-ordinate the MULTISENSOR activities internally and in relation to external events
    • monitor and adjust the implementation plan if necessary.

    Responsible Partner: CERTH


    The work is divided into the following tasks:

    • Task 1.1: Project Management and co-ordination
    • Task 1.2: Research quality management
    • Task 1.3: System quality management
    • Task 1.4: Administration and reporting to the Commission
  • Project Calendar

     

  • MULTISENSOR – in short

    • MULTISENSOR is an EU-funded research project that aims to advance the research and development of multilingual media analysis technologies.

    • The goal is to enable users (e.g. journalists, entrepreneurs) to attain a comprehensive and exact understanding of topics they are engaged in, not only from their own but from multiple viewpoints.

    • MULTISENSOR stands for Mining and Understanding of multilinguaL contenT for Intelligent Sentiment Enriched coNtext and Social Oriented inteRpretation

    • By scanning multiple heterogeneous sources, MULTISENSOR will help gather and semantically integrate various local subjective and biased views disseminated via TV, radio, mass media websites and social media.

    • Using sentiment, social and spatiotemporal methods, MULTISENSOR will then help to interpret, relate and summarize economic information and news items.


    For further information…

    –> you can download our factsheet

    –> or have a look at our project flyer.

    –> or visit the details page.

     
  • Want more details?

    What is the motivation behind MULTISENSOR?

    During the past decade, the rapid development of digital technologies and the low cost of recording media have led to a great increase in the availability of multilingual and multimedia content worldwide. In the best case, this content is repetitive or complementary across political, cultural, or linguistic borders. However, reality shows that it is also often contradictory and in some cases unreliable.

    The consumption of such large amounts of content, regardless of its reliability and without cross-validation, can have important consequences for society. An indicative example is the current crisis of the financial markets in Europe, which has created an extremely unstable ground for economic transactions and caused insecurity in the population.

    The consequences are twofold. On the one side, extreme uncertainty and nervousness in politics and the economy make national and international investments (e.g. SME internationalisation) really risky. On the other side, the inability of journalism and media monitoring to consider all media resources equally leaves the population of each of these encapsulated areas confined to its own perspective, without a realistic opportunity to understand the perspectives developed in the other areas and to adjust its own.

    To break this isolation, we need technologies that provide unified access to multilingual and multicultural economic and news story material across borders, that ensure its context-aware, spatiotemporal, sentiment-oriented and semantic interpretation, and that correlate and summarise the content into a coherent whole.

    These technologies should be capable of capturing, interpreting and relating economic information and news from various subjective views as disseminated via TV, radio, newspapers, blogs and social media.

    On top of this, semantic integration of heterogeneous media including computer-mediated interaction is required to gain a usable understanding based on social intelligence, while a correlation with relevant incidents with different spatiotemporal characteristics would allow for extracting hidden meanings. Although there are several research works targeting these areas independently, there is a gap in accessing these resources in a holistic manner.

    MultiSensor Concept

    What is the concept?

    In order to achieve multidimensional integration of heterogeneous resources, MULTISENSOR proposes a content integration framework that builds upon multimedia mining, knowledge extraction, analysis of computer-mediated interaction, topic detection, semantic and multimodal representation as well as hybrid reasoning.

    MULTISENSOR aims to bridge this gap by envisaging an integrated view of heterogeneous resources sensing the world (i.e. sensors) such as international TV, newspapers, radio and social media.

    The approach of MULTISENSOR will build upon the concept of multidimensional content integration by considering the following dimensions for mining, correlating, linking, understanding and summarising heterogeneous material:

    • language
    • multimedia
    • semantics
    • context
    • emotion
    • time and location.

    The overall goal of MULTISENSOR is to research and develop a unified platform, which will allow for the multidimensional content integration from heterogeneous sensors, with a view to providing end-user services such as international media monitoring, and decision support for SME internationalisation.

    Scientific Objectives:

    • content distillation of heterogeneous multimedia and multilingual data;
    • sentiment and context analysis of content and social interactions;
    • semantic integration of heterogeneous multimedia and multilingual data;
    • semantic reasoning and intelligent decision support;
    • multilingual and multimodal summarization and presentation of the information to the user.

    Deliverables:

    There are 41 deliverables overall (most of them public), derived from the nine different workpackages.
    Depending on the workpackage they are assigned to, they aim at fulfilling different goals ranging from management to research, development and exploitation.

    Under Achievements > Deliverables you can take a look into the publicly available versions of these deliverables.


    Workstructure:

    The work is divided into different workpackages aligned with the different aspects of the project.

    You can find more details about this in the menu under Project > Workstructure.


    Use Cases:

    The developed technologies will be validated with the aid of 2 main use cases:

    1. International mass media news monitoring
      and
    2. SME international investments.

    More Details about the different Use Cases and Scenarios can be found under Specifics > Use Cases.

  • TRENDMINER

    Large-scale, Cross-lingual Trend Mining and Summarization of Real-Time Media Streams


    Project description:
    The goal of this project is to deliver innovative, portable open-source real-time methods for cross-lingual mining and summarisation of large-scale stream media. TrendMiner will achieve this through an inter-disciplinary approach, combining deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human annotated data will be required due to our use of time-series data (e.g. financial markets, political polls) as a proxy. A key novelty will be weakly supervised machine learning algorithms for automatic discovery of new trends and correlations. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media.

    Relevance for MULTISENSOR:
    Event detection (WP2,4), e-motion (WP3), hidden meaning extraction (WP5)
    Follow the updates in realtime data collection (WP3).


    > TrendMiner Website

  • Commercial media monitoring

    We assume that a leading manufacturer of household appliances wants to use the MULTISENSOR platform in order to access and organise its entire global media monitoring and analysis activities, both in traditional and in social media. The client requests that all data be mapped according to country and business sector. The client also wants the MULTISENSOR platform to map all articles not only to business sectors, but also to a multitude of specific products and to the marketing campaigns that accompany the launch of new products. MULTISENSOR should work with a given set of qualitative analysis parameters such as target media, tonality, message penetration, picture penetration, exclusivity and mentions of corporate spokespersons. In addition, the client is interested in identifying media contacts in order to reuse them in a contact database. The client wants to track consumer opinions and sentiment with regard to specific products, as well as predict market trends and customer behaviour based on similar situations in other countries. Ideally, MULTISENSOR will provide the client with this information and support the market monitoring process.

  • Workpackage WP-2

    Multilingual and Multimedia Content Extraction

    Objectives:

    • extract knowledge from multimedia input data
    • present the data in a way that later components can operate on them

    Responsible Partner: Linguatec (LT)


    The work is divided into the following tasks:

    • Task 2.1: Empirical study
    • Task 2.2: Named entity extraction workflows
    • Task 2.3: Concept extraction from text
    • Task 2.4: Concept linking and relations
    • Task 2.5: Audio transcription and analysis
    • Task 2.6: Multimedia concept and event detection
    • Task 2.7: Machine Translation
  • XLike

    Cross-lingual Knowledge Extraction


    Project Description:
    The goal of the XLike project is to develop technology to monitor and aggregate knowledge that is currently spread across mainstream and social media, and to enable cross-lingual services for publishers, media monitoring and business intelligence.

    The aim is to combine insights from several scientific areas to advance cross-lingual text understanding. By combining modern computational linguistics, machine learning, text mining and semantic technologies, we plan to deal with the following two key open research problems:

    • to extract and integrate formal knowledge from multilingual texts with cross-lingual knowledge bases
    • to adapt linguistic techniques and crowdsourcing to deal with irregularities in informal language used primarily in social media.

    Relevance for MULTISENSOR
    Event detection (WP2), sentiment extraction (WP3), semantic integration
    Take into account XLIKE work (cross-lingual media monitoring).


    > XLike Website

  • SME internationalisation

    We assume that a family-run company that produces dairy products intends to considerably increase its export business. Unfortunately, neither the management nor the company’s 20 employees have the resources or language skills to sufficiently explore opportunities in foreign markets. They would like to use the MULTISENSOR platform to provide them with some key indicators on a selection of foreign markets. These include, for instance, the respective countries’ GDPs, the average income, the market infrastructure as well as fundamental regulatory issues. Additionally, the management is interested in learning more about the preferences of possible consumers by not only checking official databases but also by analysing social network communication. Ideally, MULTISENSOR not only contributes this information but also supports the management in the decision-making process with regard to which specific country the company should focus on.

  • Workpackage WP-3

    User and Context-centric Content Analysis

    Objectives:

    • model and represent contextual, sentiment and online social interaction features
    • deploy linguistic processing at different levels of accuracy and completeness.

    Our modelling approach will be based on disambiguated entities, relations between them, subjective expressions, opinion holders, and relations between pieces of sentiment-rich information. Moreover, the techniques for cross-language information extraction developed in WP2 will support the information propagation analysis of multi-lingual content, as well as other tasks in WP3.

    Responsible Partner: BM-Y!


    The work is divided into the following tasks:

    • Task 3.1: Indicators for media monitoring and internationalisation
    • Task 3.2: Context modelling and representation
    • Task 3.3: Polarity and sentiment extraction
    • Task 3.4: Information propagation and social interaction analysis
  • NewsReader

    Building structured event Indexes of large volumes of financial and economic Data for Decision Making


    Project description:
    The goal of the NewsReader project is to process news in four different languages to extract what happened to whom, when and where, thereby removing duplication, complementing information, registering inconsistencies and keeping track of the original sources. Any new information is integrated with the past, distinguishing the new from the old in an unfolding story line, providing constant access to all original sources and details (like a “History Recorder”).

    A decision-support tool will be developed allowing professional decision makers to explore these story lines using visual interfaces and interactions to exploit their explanatory power and their systematic structural implications. Likewise, NewsReader will help to make predictions from the past on future events or explain new events and developments through the past.

    Relevance for MULTISENSOR:
    Extract information from multimedia (WP2)
    Follow the work on indexing big financial data (WP2, WP4) and decision making (WP6)


    > NewsReader Website

  • Workpackage WP-4

    Multidimensional Content Integration and Retrieval

    Objective:

    • achieve a multidimensional integration of content

    In this context we apply topic-based modelling and representation of the content (by classifying it and extracting topic-events), we perform content integration in the semantic dimension by applying ontology alignment techniques and finally we implement a vector-based indexing representation.

    Responsible Partner: CERTH


    The work is divided into the following tasks:

    • Task 4.1: Topic-based modelling
    • Task 4.2: Mapping discovery and validation
    • Task 4.3: Content alignment and integration
    • Task 4.4: Multimodal indexing and retrieval
  • EUMSSI

    Event Understanding through Multimodal Social Stream Interpretation


    Project Description:
    The main objective of EUMSSI is to develop technologies for identifying and aggregating data presented as unstructured information in sources of very different natures (video, image, audio, speech, text and social context), including both online (e.g., YouTube) and traditional media (e.g. audiovisual repositories). Furthermore, it aims to deal with information of very different degrees of granularity.

    A core idea is that the process of integrating content from different media sources is carried out in an interactive manner, so that the data resulting from one media helps reinforce the aggregation of information from other media, in a cross-modal interoperable semantic representation framework. All this will be integrated in a multimodal platform of state-of-the-art information extraction and analysis techniques from the different fields involved.

    Relevance for MULTISENSOR:
    Summarisation techniques
    Follow the advancements through common workshops, data exchange and use case discussion.


    > EUMSSI Website

  • Datasets

    This page contains the open-source datasets that have been created and used in the MULTISENSOR Project.

    Feel free to download and use the datasets as you like. We’d love to get your feedback.

    No. Title Description Creator
    01 WikiRef220 220 news articles, which are references to specific Wikipedia pages. The selected topics of the WikiRef220 dataset (and the number of articles per topic) are:

    Paris Attacks November 2015 (36), Barack Obama (5), Premier League (37), Cypriot Financial Crisis 2012-2013 (5), Rolling Stones (1), Debt Crisis in Greece (5), Samsung Galaxy S5 (35), Greek Elections June 2012 (5), smartphone (5), Malaysia Airlines Flight 370 (39), Stephen Hawking (1), Michelle Obama (38), Tohoku earthquake and tsunami (5), NBA draft (1), U2 (1), Wall Street (1). The topics Barack Obama, Cypriot Financial Crisis 2012-2013, Rolling Stones, Debt Crisis in Greece, Greek Elections June 2012, smartphone, Stephen Hawking, Tohoku earthquake and tsunami, NBA draft, U2 and Wall Street appear no more than 5 times and are therefore regarded as noise. The remaining 5 topics of WikiRef220 are: Paris Attacks November 2015, Premier League, Samsung Galaxy S5, Malaysia Airlines Flight 370 and Michelle Obama.

    The WikiRef186 dataset (4 topics) is the WikiRef220 without 34 documents related to “Malaysia Airlines Flight 370” and the WikiRef150 dataset (3 topics) is the WikiRef186 without the 36 documents related to “Paris Attacks”.
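
    The relationship between the three WikiRef variants can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the topic list above, while the names `NOISE_MAX`, `main_topics` and `without` are hypothetical helpers introduced here, not part of the dataset distribution.

```python
# Article counts per topic, copied from the WikiRef220 description.
wikiref220 = {
    "Paris Attacks November 2015": 36,
    "Barack Obama": 5,
    "Premier League": 37,
    "Cypriot Financial Crisis 2012-2013": 5,
    "Rolling Stones": 1,
    "Debt Crisis in Greece": 5,
    "Samsung Galaxy S5": 35,
    "Greek Elections June 2012": 5,
    "smartphone": 5,
    "Malaysia Airlines Flight 370": 39,
    "Stephen Hawking": 1,
    "Michelle Obama": 38,
    "Tohoku earthquake and tsunami": 5,
    "NBA draft": 1,
    "U2": 1,
    "Wall Street": 1,
}

# Topics with at most this many articles are regarded as noise.
NOISE_MAX = 5


def main_topics(counts):
    """Return the topics that are not regarded as noise."""
    return sorted(topic for topic, n in counts.items() if n > NOISE_MAX)


def without(counts, topic, removed):
    """Return a copy of counts with `removed` articles dropped from `topic`."""
    reduced = dict(counts)
    reduced[topic] -= removed
    return reduced


# WikiRef186: WikiRef220 without 34 Malaysia Airlines Flight 370 documents.
wikiref186 = without(wikiref220, "Malaysia Airlines Flight 370", 34)
# WikiRef150: WikiRef186 without the 36 Paris Attacks documents.
wikiref150 = without(wikiref186, "Paris Attacks November 2015", 36)

for name, counts in [("WikiRef220", wikiref220),
                     ("WikiRef186", wikiref186),
                     ("WikiRef150", wikiref150)]:
    print(name, sum(counts.values()), "articles,",
          len(main_topics(counts)), "main topics")
```

    Removing 34 of the 39 Malaysia Airlines Flight 370 articles leaves 5, so that topic falls to the noise level in WikiRef186, which is why the number of main topics drops from 5 to 4.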

    If you use this dataset, please cite: Gialampoukidis, I., Vrochidis, S., & Kompatsiaris, I. (2016). A Hybrid Framework for News Clustering Based on the DBSCAN-Martingale and LDA. In Machine Learning and Data Mining in Pattern Recognition (pp. 170-184). Springer International Publishing

    CERTH
    02 WikiRef150 150 web news articles, which are references to specific Wikipedia pages, so as to ensure reliable ground-truth. The selected topics and the corresponding number of articles per topic are:

    • Barack Obama(5),
    • Premier League(37),
    • Cypriot Financial Crisis 2013(5),
    • Rolling Stones(1),
    • Debt Crisis in Greece(5),
    • Samsung Galaxy S5(35),
    • Greek Elections June 2012(5),
    • smartphone(5),
    • Malaysia Airlines Flight 370(5),
    • Stephen Hawking(1),
    • Michelle Obama(38),
    • Tohoku earthquake and tsunami(5),
    • NBA draft(1),
    • U2(1),
    • Wall Street(1)

    If you use this dataset, please cite: Gialampoukidis, I., Vrochidis, S., & Kompatsiaris, I. (2016). A Hybrid Framework for News Clustering Based on the DBSCAN-Martingale and LDA. In Machine Learning and Data Mining in Pattern Recognition (pp. 170-184). Springer International Publishing

    CERTH
    03 ArticlesNewsSitesData_1043 1043 web pages/articles retrieved from three well-known news sites (i.e. BBC, The Guardian and Reuters), annotated with the following four topics found in the IPTC news codes taxonomy:

    • Economy-Business-Finance,
    • Lifestyle-Leisure,
    • Science-Technology,
    • Sports.

    It should be noted that the articles are classified to a single topic.

    If you use this dataset in your research, please cite the following article:

    D. Liparas, Y. Hacohen-Kerner, A. Moumtzidou, S. Vrochidis and I. Kompatsiaris, “News articles classification using Random Forests and weighted multimodal features”, 3rd Open Interdisciplinary MUMIA Conference and 7th Information Retrieval Facility Conference (IRFC2014), Copenhagen, Denmark, November 10-12, 2014.

    CERTH
    04 ArticlesNewsSitesData_2382 2382 web pages/articles retrieved from several sites. The web pages were annotated with the following six topics found in the IPTC news codes taxonomy:

    • Nature_Environment,
    • Politics,
    • Science_Technology,
    • Economy_Business_Finance,
    • Health,
    • Lifestyle_leisure.

    It should be noted that the articles are classified to a single topic.

    CERTH
    05 NewsArticlesData_12073 12073 news articles retrieved from several sites. The news articles were annotated with the following six topics found in the IPTC news codes taxonomy:

    • Nature_Environment,
    • Politics,
    • Science_Technology,
    • Economy_Business_Finance,
    • Health,
    • Lifestyle_Leisure.

    It should be noted that the articles are classified to a single topic.

    CERTH
    06 YahooNewsQualityDataset The News Quality Dataset provides over 500 news articles annotated with 14 editorial quality aspects. EURECAT
    07 Event_Detection_Dataset_MS This dataset is the example set for the Multimedia concept and event detection code available on the Code page.

    The dataset contains 106 videos from news reports. Videos are categorised into nine concepts/events. Keyframes for the concept and event detection are extracted. The total number of keyframes in this dataset is 2826. DCNN features are extracted from the keyframes based on the Caffe models trained in the work of Markatopoulou et al. (2016). Using a random balanced split on the dataset for each concept/event, where the videos are divided into three chunks, a three-fold CV is performed using two chunks for training purposes and the remaining chunk for testing. The classification algorithm used in this code is an SVM whose “c” parameter is tuned using grid search. The output of this module is the per-concept/event evaluation on videos in terms of accuracy and F-score.

    CERTH
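
    The evaluation protocol described for Event_Detection_Dataset_MS (a random balanced split into three chunks, three-fold CV, accuracy and F-score) might be sketched as follows for a single concept/event. This is a minimal illustration under stated assumptions, not the project code: all function names are hypothetical, the toy one-dimensional features stand in for the DCNN features, and `midpoint_threshold` is a deliberately trivial placeholder where the real module uses an SVM with a grid-searched “c” parameter.

```python
import random
from statistics import mean


def balanced_three_chunks(labels, seed=0):
    """Randomly split item indices into 3 chunks with a similar class balance."""
    rng = random.Random(seed)
    chunks = [[], [], []]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin into the chunks.
        for pos, i in enumerate(members):
            chunks[pos % 3].append(i)
    return chunks


def accuracy_and_f_score(y_true, y_pred):
    """Accuracy and F-score for binary labels (1 = concept/event present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f_score


def three_fold_cv(features, labels, fit_predict, seed=0):
    """Train on two chunks, test on the third, for each of the three folds."""
    chunks = balanced_three_chunks(labels, seed)
    results = []
    for k, test_idx in enumerate(chunks):
        train_idx = [i for j, chunk in enumerate(chunks) if j != k for i in chunk]
        predictions = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in test_idx],
        )
        results.append(
            accuracy_and_f_score([labels[i] for i in test_idx], predictions)
        )
    return results


def midpoint_threshold(train_x, train_y, test_x):
    """Placeholder classifier (NOT an SVM): threshold between the class means."""
    pos = mean(x[0] for x, y in zip(train_x, train_y) if y == 1)
    neg = mean(x[0] for x, y in zip(train_x, train_y) if y == 0)
    threshold = (pos + neg) / 2
    return [1 if x[0] > threshold else 0 for x in test_x]


# Toy data: 6 negative and 6 positive items with one well-separated feature.
features = [[float(v)] for v in list(range(6)) + list(range(100, 106))]
labels = [0] * 6 + [1] * 6

scores = three_fold_cv(features, labels, midpoint_threshold)
for fold, (accuracy, f_score) in enumerate(scores, start=1):
    print(f"fold {fold}: accuracy={accuracy:.2f} F-score={f_score:.2f}")
```

    Passing the classifier in as a callable keeps the split and the metrics independent of the learning algorithm, so an SVM with its own parameter search could be plugged in for `fit_predict`, and the loop repeated once per concept/event.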
  • Workpackage WP-5

    Semantic Reasoning and Decision Support

    Objectives:

    • provide the infrastructure which will serve as the storage layer for the (meta)data of the MULTISENSOR platform
    • develop reasoning techniques beyond state of the art allowing for efficient information selection from heterogeneous data pools, e.g. hybrid reasoning, multi-threaded reasoning, temporal reasoning, geo-spatial reasoning
    • produce a decision support mechanism based on the aforementioned reasoning techniques as well as cognitive techniques for context aware graph navigation, such as spreading activation.

    Responsible Partner: Ontotext (ONTO)


    The work is divided into the following tasks:

    • Task 5.1: Knowledge modelling
    • Task 5.2: Semantic representation infrastructure
    • Task 5.3: Hybrid reasoning
    • Task 5.4: Decision support
  • Workpackage WP-6

    Summarisation and Content Delivery

    Objectives:

    • investigate and implement procedures for briefing information in a context aware way considering user interests as well as opinionated information to produce a novel type of summarisation.

    Responsible Partner: Universitat Pompeu Fabra (UPF)


    The work is divided into the following tasks:

    • Task 6.1: Basic summarisation infrastructure
    • Task 6.2: MULTISENSOR summarisation dataset
    • Task 6.3: Content selection metrics
    • Task 6.4: Content delivery procedures
    • Task 6.5: Concept-based summarisation
    • Task 6.6: Advanced single and multi-document summary delivery system
    • Task 6.7: Summarisation evaluation
  • Reveal

    Social Media Verification

    Project Description:
    The world of media and communication is currently experiencing enormous disruptions: from one-way communication and word-of-mouth exchanges, we have moved to bi- or multidirectional communication patterns. No longer can a select few act as gatekeepers, deciding what is communicated to whom. Individuals now have the opportunity to access information directly from primary sources, through a channel we label ‘e-word of mouth’, or what we commonly call ‘Social Media’.

    A key problem: it takes a lot of effort to distinguish useful information from the ‘noise’ (e.g. useless or misleading information). Finding relevant information is often tedious. REVEAL aims to discover higher-level concepts hidden within information. In Social Media we do not only have bare content; we also have interconnected sources. We have to deal with interactions between them, and we have many indicators about the context within which content is used and within which interactions take place. A core challenge is to decipher interactions of individuals in permanently changing constellations, and to do so in real time.

    Relevance for MULTISENSOR
    Extract information from multimedia (WP2)
    Community detection (WP3)
    Content integration (WP4)


    > Reveal Website

  • Workpackage WP-7

    System Development and Integration

    Objectives:

    • plan the roadmap and design the architecture based on user requirements
    • integrate the technologies developed in the previous work packages (WP2 to WP6)
    • design and implement prototype portals for each use case

    The main goal of these portals is to demonstrate the developed technologies and allow the users to evaluate the system and the quality of the results.

    Responsible Partner: everis


    The work is divided into the following tasks:

    • Task 7.1: MULTISENSOR architecture
    • Task 7.2: Crawlers and data channels infrastructure
    • Task 7.3: Technical infrastructure
    • Task 7.4: System development

  • PERICLES

    Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics

    Project Description:
    PERICLES aims to address the challenge of ensuring that digital content remains accessible in an environment that is subject to continual change. This can encompass not only technological change, but also changes in semantics, academic or professional practice, or society itself, which can affect the attitudes and interests of the various stakeholders that interact with the content. PERICLES will take a ‘preservation by design’ approach that involves modelling, capturing and maintaining detailed and complex information about digital content, the environment in which it exists, and the processes and policies to which it is subject.

    Relevance for MULTISENSOR
    Content integration (WP4)
    Semantic representation (WP5)


    > PERICLES Website

  • Workpackage WP-8

    Use Cases and Evaluation

    Objectives:

    • define and validate the MULTISENSOR use cases

    This work package thus provides the basis for all subsequent research and development activities within the project. Work kicks off with a detailed identification and description of user requirements based on the user scenarios provided in section B1.1.3. This approach ensures that the whole project is initiated and executed with a real commercial user demand in mind. Hence, the ability to serve this demand will also be the benchmark for the subsequent evaluation tasks, both technical and usability-related, performed by the consortium.

    Responsible Partner: Deutsche Welle (DW)


    The work is divided into the following tasks:

    • Task 8.1: User requirements and content provision
    • Task 8.2: Specification of the two pilot use cases
    • Task 8.3: Research and system quality assurance
    • Task 8.4: Metrics development for theoretical solutions
    • Task 8.5: User validation and prototype evaluation
  • KRISTINA

    Knowledge-Based Information Agent with Social Competence and Human Interaction Capabilities

    Project Description:
    KRISTINA’s overall objective is to research and develop technologies for a human-like, socially competent and communicative agent that runs on mobile communication devices. For migrants facing language and cultural barriers in the host country, the agent serves as a trusted information provider and mediator in questions related to basic care and healthcare.
    To develop such an agent, KRISTINA will advance the state of the art in dialogue management, multimodal (vocal, facial and gestural) communication analysis and multimodal communication. The technologies will be validated in two use cases, in which prolonged trials will be carried out for each prototype, with a representative number of migrants recruited as users from the migration circles identified as especially in need: elderly Turkish migrants and their relatives, and short-term Polish caregiving personnel, in Germany; and North African migrants in Spain.

    Relevance for MULTISENSOR
    Automatic speech recognition (WP2)
    Content integration (WP4)
    Semantic reasoning (WP5)


    > KRISTINA Website

  • Workpackage WP-9

    Dissemination and Exploitation

    Objectives:

    • contribute to market awareness by disseminating information on the project, its progress and results
    • ensure the exploitation of the results in academia, industry and especially in SMEs, public administration and end users
    • make the objectives and the scope of the project publicly available
    • coordinate and handle the dissemination of MULTISENSOR towards different recipients
    • create a community of interest, the MULTISENSOR User Group (UG) where relevant stakeholders will be included
    • organise three workshops throughout the project to present research and development results to industry

    Responsible Partner: pressrelations (PR)


    The work is divided into the following tasks:

    • Task 9.1: Dissemination plan, event participation and organisation
    • Task 9.2: Project web presence & promotional material
    • Task 9.3: MULTISENSOR User Group
    • Task 9.4: Exploitation plans
    • Task 9.5: Business models
    • Task 9.6: Standardisation and collaboration with other projects
    • Task 9.7: Development of the Project Showcase
  • TENSOR

    Retrieval and Analysis of Heterogeneous Online Content for Terrorist Activity Recognition

    Project Description:
    Law Enforcement Agencies (LEAs) across Europe today face important challenges in how they identify, gather and interpret terrorist-generated content online. The Dark Web presents additional challenges due to its inaccessibility and the fact that undetected material can contribute to the advancement of terrorist violence and radicalisation. LEAs also face the challenge of extracting and summarising meaningful and relevant content hidden in huge amounts of online data to inform their resource deployment and investigations.
    The main objective of the TENSOR project is to provide a powerful terrorism intelligence platform offering LEAs fast and reliable planning and prevention functionalities for the early detection of terrorist-organised activities, radicalisation and recruitment. The platform integrates a set of automated and semi-automated tools for: efficient and effective searching, crawling, monitoring and gathering of online terrorist-generated content from the Surface and the Dark Web; Internet penetration through intelligent dialogue-empowered bots; information extraction from multimedia (e.g., video, images, audio) and multilingual content; content categorisation, filtering and analysis; real-time relevant content summarisation and visualisation; creation of automated audit trails; and privacy-by-design and data protection.

    Relevance for MULTISENSOR
    Extract information from multimedia, automatic speech recognition, machine translation (WP2)
    Classification and topic and event detection approaches (WP4)
    Semantic reasoning (WP5)
    Summarisation techniques (WP6)


    > TENSOR Website

  • InVID

    In Video Veritas – Verification of Social Media Video Content for the News Industry

    Project Description:
    The digital media revolution is bringing breaking news to online video platforms, and news organisations delivering information by Web streams and TV broadcast often rely on user-generated recordings of breaking and developing news events shared via social media to illustrate the story. However, in video there is also deception. Access to increasingly sophisticated editing and content management tools, and the ease with which fake information spreads in electronic networks, require reputable news outlets to carefully verify third-party content before publishing it.
    InVID will build a platform providing services to detect, authenticate and check the reliability and accuracy of newsworthy video files and video content spread via social media.

    Relevance for MULTISENSOR
    Extract information from multimedia (WP2)
    Social media analysis techniques (WP3)


    > InVID Website