1st International Workshop on Dataset PROFIling & fEderated Search for Linked Data

While the Web of Data, and in particular Linked Data, has seen tremendous growth over the past years, take-up, usage and reuse of data are still limited and often focus on well-known reference datasets. The main obstacle preventing users from obtaining relevant, correct and up-to-date information from distributed LOD datasets is the lack of scalable and usable methods for formulating and distributing semantic and keyword queries across the Web of Data. This problem is further aggravated by the lack of trust in the quality of search results retrieved using federated search over distributed third-party data. Hence, dataset and endpoint selection and discovery are inherent challenges for query distribution. These are currently hindered by the lack of trustworthy and up-to-date information about the nature, characteristics, currentness and suitability of particular datasets for a given task. Given the heterogeneous and large-scale context of LOD, state-of-the-art semantic and keyword search techniques for structured data face increased query ambiguity and scalability problems even in single-source search scenarios. In federated search scenarios for LOD, dataset selection and the adaptation of queries to the respective schemas pose even further challenges.

As the Linked Open Data (LOD) Cloud includes data from a variety of domains, spread across hundreds of datasets containing billions of entities and facts, and is constantly evolving, manual assessment of dataset features is neither feasible nor sustainable, which leads to brief and often outdated dataset metadata. This is apparent, for instance, in the DataHub, the largest dataset registry for open datasets in general and LOD in particular. Hence, given the dynamic and evolving nature of the LOD Cloud, particular focus should be placed on the development of scalable automated approaches that facilitate the frequent assessment and profiling of large-scale datasets, in order to enable the selection of suitable datasets for query federation.

The PROFILES 2014 workshop aims to gather innovative search approaches for large-scale, distributed and heterogeneous linked datasets, in line with dedicated approaches to analyse, describe and discover endpoints, as an inherent task of query distribution. PROFILES 2014 will equally consider both novel scientific methods and techniques for querying, assessment, profiling and curation of distributed datasets and the application perspective, such as the innovative use of tools and methods for providing structured knowledge about distributed datasets, their evolution and, fundamentally, means to search and query the Web of Data. The workshop will provide a highly interactive forum for researchers in the fields of Semantic Web and Linked Data, Databases, Semantic Search, Text Mining, NLP, as well as Information Retrieval.