MK:Smart is a research and development programme that leverages a large-scale data infrastructure, the MK Data Hub, to support the economic development of Milton Keynes. The primary focus of the project is the development of a city “Data Hub”, an infrastructure that offers resources and methods to deal with the diversity of use cases foreseen in a Smart City, ranging from sensor data collection and storage to scalable computing, analytics and efficient delivery to citizens. However, the diversity of data sources, owners and licences associated with the data raises a new challenge, namely the problem of data exploitability.
To introduce this problem, let’s take as an example an application developed within the project, inspired by the popular card game Top Trumps. In this game, players challenge each other with cards, each representing a subject with a set of attributes, and each attribute has a score. In the Top Trumps world, cards can be Marvel heroes, car models, tank types and more: thousands of variants have been produced since the game was first introduced in 1968! The “attacker” picks an attribute from the card he is hiding and tells the opponent the associated score. If the player receiving the attack has a card with a lower score for that attribute, he loses his card (otherwise, he wins the attacker’s card). The goal is to collect all the cards.
In our game, called “Top MK”, each card represents a Milton Keynes ward (e.g. Bletchley and Fenny Stratford, Newport Pagnell North), and the attributes are taken from data selected from various datasets within the Data Hub. Each ward has a different score for “Qualification”, “Ethnic diversity”, “Size” and so on.
The data used in “Top MK” comes from a variety of datasets, each one associated with a licence that specifies the usage policies of the data. Here comes the problem: what are the policies associated with Top MK itself?
We define “data exploitability” as the assessment of the policies associated with data that results from processing diverse datasets in complex data flows. For example, could Top Trumps commercialise a card set created with the data processed by the Data Hub?
To answer this question, let’s first look at what happens under the hood of “Top MK”. The data is originally obtained from the Entity Centric API, a facility of the MK Data Hub that provides an aggregated view of the data about a given entity (e.g. bus stop, ward, place, post code, and many others). For example, a developer can obtain the data about Newport Pagnell North at this URL: https://data.mksmart.org/entity/ward/newport_pagnell_north.
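As a minimal sketch of how a client might address the Entity Centric API, the snippet below builds entity URLs following the pattern of the example above. The helper names `entity_url` and `fetch_entity` are hypothetical, the `/entity/<type>/<id>` pattern is inferred from that single example, and nothing is assumed here about the format of the response.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address taken from the example URL in the text.
BASE_URL = "https://data.mksmart.org/entity"


def entity_url(entity_type, entity_id):
    """Build the URL of an entity (hypothetical helper, pattern inferred from the text)."""
    return f"{BASE_URL}/{quote(entity_type, safe='')}/{quote(entity_id, safe='')}"


def fetch_entity(entity_type, entity_id, timeout=10):
    """Return the raw response body; the response format is not assumed here."""
    with urlopen(entity_url(entity_type, entity_id), timeout=timeout) as response:
        return response.read()


print(entity_url("ward", "newport_pagnell_north"))
# prints https://data.mksmart.org/entity/ward/newport_pagnell_north
```

Calling `fetch_entity("ward", "newport_pagnell_north")` would perform the actual request; it is left out of the example so that the snippet runs without network access.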
Moreover, we can obtain details about the sources of this information through the Provenance API. From the provenance data the developer learns that the information about Qualifications is taken from the “Census 2011 – Qualifications in Milton Keynes’ wards” dataset, published under the OGL licence. In the MK Data Hub, licences are also represented in a machine-friendly way through the Entity Centric API. For example, information about OGL is available from: http://data.mksmart.org/entity/policy/open-government-license.
However, we still do not know how the data flow implemented in the Top MK application affects the various policies of the data sources! The Datanode ontology provides a vocabulary of relations to describe complex data flows as data-centric graphs. We can describe the Top MK application with Datanode, obtaining a graph that relates all data sources to the output data through a number of processing steps, specifying the nature of each process (selection, aggregation, combination, etc.). Datanode descriptions can be used in conjunction with Policy Propagation Rules (PPRs). A PPR reasoner implements a single abstract rule, which can be summarised as follows: “if one data node has a policy associated, and it is connected to a second node by a relation that triggers the propagation of the given policy, then also the second node will have this policy”. Applying this reasoning to the Top MK case allows us to answer the question: the data can be used in commercial applications, but attribution to the owners of the data sources is required.
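The abstract rule quoted above can be illustrated with a small toy reasoner. This is only a sketch under stated assumptions, not the actual PPR reasoner: the node names, relation names, policy labels and rule set below are all invented for illustration and are not taken from the Datanode ontology or from the real Top MK description.

```python
from collections import deque


def propagate(policies, edges, rules):
    """Apply the abstract rule until no node gains a new policy.

    policies: dict mapping node -> set of policies asserted on that node
    edges:    list of (source, relation, target) triples describing the data flow
    rules:    set of (relation, policy) pairs meaning "relation propagates policy"
    """
    result = {node: set(p) for node, p in policies.items()}
    outgoing = {}
    for source, relation, target in edges:
        outgoing.setdefault(source, []).append((relation, target))

    queue = deque(result)  # start from every node that has asserted policies
    while queue:
        node = queue.popleft()
        for relation, target in outgoing.get(node, []):
            inherited = {p for p in result[node] if (relation, p) in rules}
            known = result.setdefault(target, set())
            if not inherited <= known:
                known |= inherited
                queue.append(target)  # target changed, so revisit its successors
    return result


# Illustrative data only: all names below are assumptions, not real Data Hub identifiers.
edges = [
    ("census_qualifications", "selection", "ward_scores"),
    ("ward_scores", "combination", "top_mk_cards"),
    ("ward_scores", "described_by", "usage_log"),
]
policies = {
    "census_qualifications": {"permission:commercial_use", "duty:attribution"},
}
rules = {
    ("selection", "permission:commercial_use"),
    ("selection", "duty:attribution"),
    ("combination", "permission:commercial_use"),
    ("combination", "duty:attribution"),
}

result = propagate(policies, edges, rules)
print(sorted(result["top_mk_cards"]))
```

In this toy graph the output node `top_mk_cards` inherits both the permission and the attribution duty, while `usage_log` inherits nothing because no rule lets the `described_by` relation propagate a policy.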
While we have the technology to perform this reasoning, we still do not know to what extent it can be implemented as a generalised facility of a Smart City Data Hub. In a recent work (presented at the IEEE Smart Cities conference in September 2016), we introduce a methodology to deal with the diversity of data sources, licences and data flows at Data Hub scale. The methodology involves data producers, data managers and data consumers in a Metadata Supply Chain, with a new-generation Data Catalogue as the central component of data governance. A cataloguing effort runs in parallel with the life cycle of the data and feeds a set of metadata stores covering the various aspects required: Licences, Policy Propagation Rules, Dataset records, Content metadata and Provenance. We implemented this methodology within the MK Data Hub, and in the paper we analyse to what extent this activity can be managed with state-of-the-art techniques.
We plan to undertake an end-user evaluation “in the wild” within the MK:Smart project.
However, there are still many open challenges, for example how to support Data Hub managers in defining the data flows implemented, or how to support users in validating data flows with respect to the propagated policies.
 d’Aquin, Mathieu, John Davies, and Enrico Motta. “Smart Cities’ Data: Challenges and Opportunities for Semantic Technologies.” IEEE Internet Computing 19.6 (2015): 66-70.
 d’Aquin, Mathieu, et al. “Dealing with diversity in a smart-city datahub.” Proceedings of the Fifth International Conference on Semantics for Smarter Cities, Volume 1280. CEUR-WS.org, 2014.
 Daga, Enrico, et al. “Addressing exploitability of Smart City data.” Proceedings of the 2nd International Conference on Smart Cities. IEEE, 2016 (to be published)
 Daga, Enrico, et al. Describing semantic web applications through relations between data nodes. Technical Report kmi-14-05, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, 2014.
 Daga, Enrico, et al. “Propagation of Policies in Rich Data Flows.” Proceedings of the 8th International Conference on Knowledge Capture. ACM, 2015.
 Daga, Enrico, et al. “An incremental learning method to support the annotation of workflows with data-to-data relations.” 20th International Conference on Knowledge Engineering and Knowledge Management, 2016 (Accepted)