ORCID Profile
0000-0001-6575-9737
Current Organisation
Politecnico di Milano
Publisher: ACM
Date: 26-10-2010
Publisher: IEEE
Date: 11-2017
Publisher: Springer Science and Business Media LLC
Date: 07-10-2011
Publisher: Springer Berlin Heidelberg
Date: 2012
Publisher: Springer Science and Business Media LLC
Date: 22-07-2016
Publisher: Elsevier BV
Date: 04-2020
Publisher: Association for Computational Linguistics
Date: 2016
DOI: 10.18653/V1/K16-1015
Publisher: ACM
Date: 30-10-2008
Publisher: ACM
Date: 19-07-2009
Publisher: Elsevier BV
Date: 12-2016
Publisher: Springer Science and Business Media LLC
Date: 26-01-2017
Publisher: ACM
Date: 04-07-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 26-10-2010
Publisher: Springer International Publishing
Date: 2014
Publisher: IEEE
Date: 09-2019
Publisher: IEEE
Date: 09-2019
Publisher: IEEE Comput. Soc
Date: 2003
Publisher: ACM
Date: 19-07-2010
Publisher: Association for Computational Linguistics
Date: 2016
DOI: 10.18653/V1/D16-1104
Publisher: Association for Computational Linguistics
Date: 2015
DOI: 10.3115/V1/P15-2100
Publisher: ACM
Date: 09-02-2011
Publisher: ACM
Date: 05-12-2016
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: ACM
Date: 06-11-2017
Publisher: IEEE
Date: 2002
Publisher: Elsevier BV
Date: 06-2015
Publisher: Elsevier BV
Date: 11-2018
Publisher: IEEE
Date: 11-2018
Publisher: Springer Science and Business Media LLC
Date: 02-07-2019
Publisher: Association for Computational Linguistics
Date: 2017
DOI: 10.18653/V1/E17-1039
Publisher: ACM Press
Date: 2013
Publisher: Springer Singapore
Date: 2018
Publisher: Springer Singapore
Date: 2018
Publisher: Springer International Publishing
Date: 2014
Publisher: Springer Berlin Heidelberg
Date: 2002
Publisher: Oxford University Press (OUP)
Date: 2022
Abstract: The Gene Expression Omnibus (GEO) is a public archive containing millions of digital samples from functional genomics experiments collected over almost two decades. The accompanying metadata describing the experiments suffer from redundancy, inconsistency and incompleteness due to the prevalence of free text and the lack of well-defined data formats and their validation. To remedy this situation, we created Genomic Metadata Integration (GeMI; gmql.eu/gemi/), a web application that learns to automatically extract structured metadata (in the form of key-value pairs) from the plain text descriptions of GEO experiments. The extracted information can then be indexed for structured search and used for various downstream data mining activities. GeMI works in continuous interaction with its users. The natural language processing transformer-based model at the core of our system is a fine-tuned version of the Generative Pre-trained Transformer 2 (GPT2) model that is able to learn continuously from the feedback of the users thanks to an active learning framework designed for the purpose. As a part of such a framework, a machine learning interpretation mechanism (that exploits saliency maps) allows the users to understand easily and quickly whether the predictions of the model are correct and improves the overall usability. GeMI’s ability to extract attributes not explicitly mentioned (such as sex, tissue type, cell type, ethnicity and disease) allows researchers to perform specific queries and classification of experiments, which was previously possible only after spending time and resources with tedious manual annotation. The usefulness of GeMI is demonstrated on practical research use cases. Database URL: gmql.eu/gemi/
Publisher: Elsevier BV
Date: 10-2020
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Elsevier BV
Date: 05-2017
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer International Publishing
Date: 2018
Publisher: Wiley
Date: 31-10-2011
DOI: 10.1002/ASI.21687
Publisher: IEEE
Date: 12-2008
Publisher: Association for Computing Machinery (ACM)
Date: 11-2012
Abstract: The enormous amount of user-generated data available on the Web provides a great opportunity to understand, analyze, and exploit people’s opinions on different topics. Traditional Information Retrieval methods consider the relevance of documents to a topic but are unable to differentiate between subjective and objective documents. Opinion retrieval is a retrieval task in which not only the relevance of a document to the topic is important but also the amount of opinion expressed in the document about the topic. In this article, we address the blog post opinion retrieval task and propose methods that rank blog posts according to their relevance and opinionatedness toward a topic. We propose estimating the opinion density at each position in a document using a general opinion lexicon and kernel density functions. We propose and investigate different models for aggregating the opinion density at query term positions to estimate the opinion score of every document. We then combine the opinion score with the relevance score based on a probabilistic justification. Experimental results on the BLOG06 dataset show that the proposed method provides significant improvement over the standard TREC baselines. The proposed models also achieve much higher performance compared to all state-of-the-art methods.
Publisher: Springer Singapore
Date: 2018
Publisher: Association for Computing Machinery (ACM)
Date: 17-08-2016
DOI: 10.1145/2866571
Abstract: Current random-forest (RF)-based learning-to-rank (LtR) algorithms use a classification or regression framework to solve the ranking problem in a pointwise manner. The success of this simple yet effective approach coupled with the inherent parallelizability of the learning algorithm makes it a strong candidate for widespread adoption. In this article, we aim to better understand the effectiveness of RF-based rank-learning algorithms with a focus on the comparison between pointwise and listwise approaches. We introduce what we believe to be the first listwise version of an RF-based LtR algorithm. The algorithm directly optimizes an information retrieval metric of choice (in our case, NDCG) in a greedy manner. Direct optimization of the listwise objective functions is computationally prohibitive for most learning algorithms, but possible in RF since each tree maximizes the objective in a coordinate-wise fashion. Computational complexity of the listwise approach is higher than the pointwise counterpart; hence, for larger datasets, we design a hybrid algorithm that combines a listwise objective in the early stages of tree construction and a pointwise objective in the latter stages. We also study the effect of the discount function of NDCG on the listwise algorithm. Experimental results on several publicly available LtR datasets reveal that the listwise/hybrid algorithm outperforms the pointwise approach on the majority (but not all) of the datasets. We then investigate several aspects of the two algorithms to better understand the inevitable performance tradeoffs. The aspects include examining an RF-based unsupervised LtR algorithm and comparing individual tree strength. Finally, we compare the investigated RF-based algorithms with several other LtR algorithms.
Publisher: Springer Singapore
Date: 2018
Publisher: ACM
Date: 20-07-2008
Publisher: Springer Singapore
Date: 2018
Publisher: Elsevier BV
Date: 03-2018
Publisher: ACM
Date: 19-07-2009
Publisher: Springer Singapore
Date: 2018
Publisher: Elsevier BV
Date: 09-2018
Publisher: Elsevier BV
Date: 09-2021
Publisher: AI Access Foundation
Date: 11-09-2007
DOI: 10.1613/JAIR.2205
Abstract: The Internet contains a very large number of information sources providing many types of data from weather forecasts to travel deals and financial information. These sources can be accessed via Web-forms, Web Services, RSS feeds and so on. In order to make automated use of these sources, we need to model them semantically, but writing semantic descriptions for Web Services is both tedious and error prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions of Web sources. In order to learn these definitions, our system actively invokes the sources and compares the data they produce with that of known sources of information. It then performs an inductive logic search through the space of plausible source definitions in order to learn the best possible semantic model for each new source. In this paper we perform an empirical evaluation of the system using real-world Web sources. The evaluation demonstrates the effectiveness of the approach, showing that we can automatically learn complex models for real sources in reasonable time. We also compare our system with a complex schema matching system, showing that our approach can handle the kinds of problems tackled by the latter.
Publisher: ACM
Date: 24-10-2016
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: IEEE
Date: 12-2014
DOI: 10.1109/ICDM.2014.53
Publisher: Informa UK Limited
Date: 07-06-2023
Publisher: ACM
Date: 07-12-2017
Publisher: ACM
Date: 13-08-2016
Publisher: ACM
Date: 24-10-2011
Publisher: Association for Computing Machinery (ACM)
Date: 26-09-2017
DOI: 10.1145/3124420
Abstract: Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step to sentiment analysis, considering prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, automatic sarcasm detection has witnessed great interest from the sentiment analysis community. This article is a compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and incorporation of context beyond target text. In this article, we describe datasets, approaches, trends, and issues in sarcasm detection. We also discuss representative performance values, describe shared tasks, and provide pointers to future work, as given in prior works. In terms of resources to understand the state-of-the-art, the survey presents several useful illustrations—most prominently, a table that summarizes past papers along different dimensions such as the types of features, annotation techniques, and datasets used.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
No related grants have been discovered for Mark Carman.