Introduction

Technological progress in the production, distribution and consumption of digital content has nurtured a business environment in which the cost-efficient handling of internal and external data has become critical for innovations at the process and product level (Münchner Kreis 2009; The Economist 2010). This motivates the question of whether and how the publishing industry makes use of advanced data management technologies to diversify its service and product portfolio, adapts organizationally to the affordances of new data management practices and utilizes them to position itself strategically in the emerging ecosystem of data clouds.

The automation of editorial workflows, e.g. to support dynamic content publishing (Rayfield 2012), and the increasing proliferation of machine-generated content, e.g. as practised in data journalism (Chambert and Gray 2012), stimulate the adoption of new data management technologies which support the time-critical and context-sensitive creation, curation and marketing of digital content (Zwick and Knott 2009). Given that just about 5 % of the vast amount of digital content produced every day is “structured” (Russom 2011), new data management practices are being implemented to improve the machine-processability of digital content. This is achieved not just by creating and enriching content with or on top of structured data, but also by applying metadata standards that support interoperability at a syntactic and semantic level. One of these approaches is called Linked Data.Footnote 1

Linked Data is a generic technology to structure and query federated data, thus enabling the flexible and cost-efficient reutilization of dispersed digital assets (Cranford 2009). Case studiesFootnote 2 from various industries reveal that Linked Data fits well into the incremental IT development practices of enterprises and public organisations, but additionally entails disruptive organisational and institutional effects that pose significant challenges to, and opportunities for, business diversification (Archer et al. 2013; Pellegrini 2014).

The aim of this paper is twofold. First, it approaches the topic from a strategic management perspective discussing the role of Linked Data technologies in the diversification of product and service portfolios. To do so, the paper provides insights into how Linked Data contributes to the content value chain, identifies typical stakeholder roles within Linked Data ecosystems and discusses asset types and licensing policies in the commercialization of Linked Data assets. Second, the theoretical assumptions laid out in this paper are complemented by two case studies that illustrate the adoption strategy of Linked Data technologies at two publishing companies. The findings help to better understand the organisational impact of new data management practices from a resource-based point of view (Barney 1995) and the role of Linked Data technologies as a service infrastructure for data-driven diversification and the creation of competitive advantage within so called service ecosystems (Frow et al. 2014; Chandler and Lusch 2015). These shall be understood as “relatively self-contained, self-adjusting systems of resource-integrating actors connected by shared institutional logics and mutual value creation through service exchange” (Vargo and Lusch 2011, p. 11).

The paper is structured as follows: Section 2 illustrates the changing role of metadata in the publishing industry and highlights the added value of Linked Data as a new paradigm to turn data into a network good. Section 3 discusses the Linked Data ecosystem by differentiating prototypical data traffic patterns, the contribution of Linked Data to the content value chain and associated licensing issues. Section 4 presents two use cases from the publishing industry to illustrate how these companies have positioned themselves within the Linked Data ecosystem. Section 5 gives a conclusion and outlook on future research.

Metadata and the publishing industry

Metadata from a resource-based point of view

Industrialization and – as a consequence – the digitization of information production coincided with the increasing proliferation of metadata management as a critical factor in the organization of large quantities of coded information (Bowker and Star 1999; Kitchin 2014) and the diversification of product and service portfolios on top of existing assets (Hass 2011; Pellegrini 2013). From a resource-based perspective (Barney 1995, 2001; Barney et al. 2011), building, advancing and utilizing metadata capabilities for the diversification of product and service portfolios becomes a strategic resource and a core component of the profit maximization strategies of publishing companies. Building on the argument of Sirmon et al. (2011), the strategic utilization of metadata can be interpreted as a means of resource orchestration, especially when the business environment of the company is characterized by supply inelasticity and the company itself exercises control over the distribution channels for its products and services. Under such circumstances metadata is applied to reutilize and recompile existing digital assets for the cost-efficient creation and marketing of new products and services, and the deliberate exploitation of market opportunities. Hence, the professionalization of metadata management should be understood as a strategic activity that generates valuable, non-imitable resources which form the basis for new business practices and competitive advantage – either by reducing operational costs or by extending strategic capabilities into new markets. Herein, the author follows the argument of Wan et al. (2011, p. 1340) that it is not “external market failure [that] encourages firms to engage in internal growth; rather, it [is] an internal resource perspective that underscores firms’ motivation to maximize their resources by diversifying into (related) businesses”.

The following sections will discuss the increasing importance of and contemporary trends in metadata management and the strategic implications of Linked Data as a technological enabler for resource maximization and business diversification.

Towards a metadata shift

Originating from the library and information sciences, metadata standards and practices have spilled over into other industries, where they are increasingly utilized for two purposes: the semantic description and the automated exchange of data. One of the first to adopt progressive metadata practices was the newspaper industry, which on the verge of digitization in the 1960s started to develop unified exchange formats for news content of all common media types.Footnote 3 About a decade later the book industry started to adopt and develop specific metadata standards for broader purposes of media asset management, nowadays known as MARC (Machine Readable Cataloging),Footnote 4 ONIX for Books,Footnote 5 DublinCoreFootnote 6 or PRISM (Publishing Requirements for Industry Standard Metadata),Footnote 7 to name but a few.

With the emergence of the World Wide Web as a universal platform for the creation and distribution of information the nature and functional characteristics of metadata have changed significantly. This trend is illustrated by a survey conducted by Saumure and Shiri (2008) on research topics in the Library and Information Sciences before and after 1993. Table 1 shows their research results.

Table 1 Changes in research areas of the library and information science. Source: Saumure and Shiri (2008)

The survey illustrates three trends: 1) the spectrum of research areas has broadened significantly, 2) while certain areas have kept their status over the years (e.g. Cataloging & Classification or Machine Assisted Knowledge Organization), new areas of research have entered the discipline (e.g. Metadata Applications & Uses, Classifying Web Information, Interoperability Issues) while others have declined or dissolved into other areas (e.g. Cognitive Models), and 3) practical aspects of metadata management have become the primary research area.

The “Metadata Shift” (Haase 2004) described in the survey mentioned above can also be observed in the media industry and related initiatives (e.g. Dublin Core, which started its activities in 1994Footnote 8) that, since the 1990s, have increasingly addressed issues of interoperability in the exchange of data. The standards IPTC NewsCodesFootnote 9 and the recently introduced semantic web-enabled microformat rNews v1.0Footnote 10 are examples of this trend. Both metadata standards are composed of a reasonably manageable number of concept classes with a sufficient level of semantic expressivity in terms of domain-specific vocabulary and data types that cover the most important attributes of a news item and related media types. The metadata standards can be extended with any other controlled vocabulary (e.g. DublinCore) that adheres to Semantic Web standards (like the Resource Description Framework, RDFFootnote 11). By applying Semantic Web principles to existing metadata standards, the media industry is incrementally provided with a technological infrastructure that improves the machine-readability and semantic interoperability of digital media assets, and also changes their characteristics as goods from isolated artefacts into interconnected assets within a digital ecosystem.

Technological impact of linked (meta) data

Semantic interoperability is crucial to building cost-efficient, interconnected IT systems that integrate numerous data sources (Cranford 2009; Mitchell and Wilson 2012). Since 2009, the Linked Data paradigm has emerged as a lightweight approach to improve data portability between various systems on top of a common data model called RDF (Resource Description Framework). By building on RDF (and additional Semantic Web standards like OWLFootnote 12 or SPARQLFootnote 13), the Linked Data approach offers significant benefits compared to conventional data management practices. According to Auer (2011), these are:

  • De-Referencability. Identifiers (URIs) are not just used for identifying entities, but since they can be used in the same way as URLs they also enable locating and retrieving resources describing and representing these entities on the Web.

  • Coherence. When an RDF triple contains URIs from different namespaces in subject and object position, this triple establishes a link between the entity identified by the subject (and described in the source dataset using namespace A) with the entity identified by the object (described in the target dataset using namespace B). Through these typed RDF links, data items are effectively and coherently interlinked.

  • Integrability. Since all Linked Data sources share the RDF data model, which is based on a single mechanism for representing information, it is very easy to attain a syntactic and simple semantic integration of different Linked Data sets. A higher-level semantic integration can be achieved by employing schema and instance matching techniques and expressing found matches again as alignments of RDF vocabularies and ontologies in terms of additional triple facts.

  • Timeliness. Linked Data can be easily published and updated, and thus facilitates a timely availability. In addition, once a Linked Data source is updated it can instantaneously be accessed and used, since time consuming and error-prone extraction, transformation and loading is not required.
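The coherence and integrability properties listed above can be illustrated with a minimal sketch in plain Python. It assumes that triples are modelled as simple tuples of strings; the example.org namespaces, the entity names and the helper functions are hypothetical, and a production system would rely on a dedicated RDF library rather than on this simplification.

```python
# Minimal sketch: RDF triples modelled as (subject, predicate, object) tuples.
# The example.org namespaces and all entity names are hypothetical.
NS_A = "http://publisher-a.example.org/resource/"
NS_B = "http://publisher-b.example.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

dataset_a = {
    (NS_A + "article/42", DCT_SUBJECT, NS_A + "topic/copyright"),
    (NS_A + "topic/copyright", RDFS_LABEL, "Copyright"),
}
dataset_b = {
    (NS_B + "concept/c17", RDFS_LABEL, "Copyright law"),
}

# Coherence: a typed link whose subject and object come from different namespaces.
link = (NS_A + "topic/copyright", OWL_SAME_AS, NS_B + "concept/c17")

# Integrability: both sets share one data model, so merging is a set union.
merged = dataset_a | dataset_b | {link}


def describe(graph, subject):
    """Return all (predicate, object) pairs stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


def labels_via_links(graph, subject):
    """Collect labels of the subject and of entities linked to it via owl:sameAs."""
    targets = {subject} | {
        o for s, p, o in graph if s == subject and p == OWL_SAME_AS
    }
    return sorted(o for s, p, o in graph if s in targets and p == RDFS_LABEL)
```

Calling `labels_via_links(merged, NS_A + "topic/copyright")` returns the labels from both datasets, which hints at how a single typed link lets one dataset benefit from descriptions maintained elsewhere.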

On top of these technological principles, Linked Data promises primarily to improve the reusability and richness (in terms of depth and breadth) of digital content, but also to shift traditional editorial workflows towards new forms of resource integration. Based upon this approach, the following section will elaborate how Linked Data can be utilized to add value to the content value chain in the production and distribution of digital content.

The linked data ecosystem

Linked Data marks a transition from hierarchies to networks as an organisational principle for information. Hence, the primary value proposition of Linked Data is rooted in its modular flexibility and the network characteristics that derive from it. By sharing the Resource Description Framework (RDF) as a common data model, Linked Data provides the infrastructure for publishing and repurposing data on top of semantic interoperability.

Linked data traffic patterns

Taking the network characteristics of Linked Data into account, it is possible to identify three prototypical usage scenarios that leverage the potential of increased semantic interoperability, here described as Linked Data Traffic Patterns.

Scenario 1: Internal Perspective: From an internal perspective organizations utilize Linked Data principles to organize information within closed organizational settings. This is especially relevant for organizations whose information is spread among dispersed databases or repositories, entailing challenges with respect to integrating and querying federated data and the handling of legacy issues. Linked Data has high potential for consolidating dispersed information infrastructures without necessarily disrupting existing systems and workflows.

Scenario 2: Inbound Perspective: In the second scenario organizations aggregate data from external data sources for purposes like content pooling or content enrichment. This trend is backed by the increasing availability of open data, e.g. provided by governmental bodies, community projects (e.g. Wikipedia,Footnote 14 MusicbrainzFootnote 15 or GeonamesFootnote 16) or enterprises (e.g. Socrata,Footnote 17 FactualFootnote 18 or DatamarketFootnote 19). Instead of creating these resources on their own, organizations can reutilize existing data according to the terms of trade of the rights holder, sometimes free of charge and sometimes as a paid service according to the service levels of an application programming interface (API).

Scenario 3: Outbound Perspective: In the third scenario organizations apply Linked Data principles to publish data on the web either as open data or via an API that allows the retrieval of data according to predefined service level agreements. This process, called Linked Data Publishing, is essentially a diversification of a data distribution strategy and allows an organization to become part of a Linked Data Cloud (Halford et al. 2012; Jentzsch 2014). Data publishing strategies often go hand in hand with the diversification of business models and require a good understanding of the licensing issues associated with them (Pellegrini 2014).

Linked data in the content value chain

The value chain approach, introduced by Michael Porter (1985, 1998) in the mid-1980s, is a core concept in strategic management which describes the structure of sector-specific value creation mechanisms and sequential production logics. It has been adopted in a variety of ways for the information industry (e.g. Zerdick 2000; Kim et al. 2004), and recently it has also gained popularity as a way to systematize the value creation process in data-driven business models as part of the European Commission’s Open Data Initiative (COM/2010/0245 final).

The value chain comprises distinct stages in the process of value creation, each of which contributes, to varying degrees, to the competitive advantage of an organization. According to this concept the content value chain consists of five steps: 1) content acquisition, 2) content editing, 3) content bundling, 4) content distribution and 5) content consumption. As illustrated in Fig. 1, Linked Data can contribute to each step by supporting the associated intrinsic production function.Footnote 20

Fig. 1 Linked data in the content value chain

Content acquisition mainly comprises the collection, storage and integration of relevant information necessary to produce a marketable product or service. In the course of this process, all necessary components are pooled from internal or external sources for further processing. Recent developments have illustrated that Linked Data has been successful in tackling the problem of automated content aggregation (Graube et al. 2011; Hee et al. 2007; Heino et al. 2011), especially in connection with multimedia information (Schandl et al. 2011; Messina et al. 2011), and its enrichment with data from Linked Data sources like DBpediaFootnote 21 or the Linked Movie Data Base.Footnote 22 Hausenblas (2009), Kobilarov et al. (2009) and Rayfield (2012) provide a comprehensive insight into how the BBC is pulling Linked Data to improve existing web applications for purposes such as content syndication, enrichment and page navigation. For example, BBC Music aggregates data from MusicBrainz, Wikipedia and DBpedia to enrich its own database with external information on music-related topics, and BBC Sport uses Linked Data technologies to aggregate data from hundreds of internal sources for the automatic generation of landing pages for individual athletes, teams, sports disciplines and competitions.

Content editing includes all necessary steps that deal with the adaptation, interlinking and enrichment of data. Adaptation can be understood as a process in which acquired data is organized and provided in such a way that it can be used in the editorial process. Interlinking and enrichment are often performed via processes like tagging and/or referencing of other media assets. Content editing is a highly time- and cost-intensive activity. Hence, cost-efficiency and quality considerations are at the core of the content editing process and mark potential areas for automation on top of expressive metadata. Early work (Kosch et al. 2005; Ohtsuki et al. 2006; Smith and Schirling 2006) provides design principles for metadata life cycle management and demonstrates the value of well-structured metadata for indexing and compiling multimedia documents across various modalities like text, speech and video. More recent approaches investigate the benefits of ontologies in organising and reutilising semantic metadata for purposes such as metadata enrichment (Hu et al. 2009; Mannens et al. 2009), collaborative tagging practices (Kim et al. 2008) or adaptive content services (Yu et al. 2010).

Content bundling mainly comprises the compilation, contextualisation and personalisation of information products. It can be used to provide customized access to media files, e.g. by using metadata for the device-sensitive delivery of media assets, or to compile thematically relevant material into comprehensive products or product lines, thus improving the navigability, findability and reuse of information. This can be achieved by applying so-called mixed-initiative approaches, where machines and humans interact in feedback loops when compiling a product or service (e.g. Jokela et al. 2001; Bomhardt 2004; Zhou et al. 2007; Gao et al. 2009; Knauf et al. 2011; Malheiros et al. 2012). Expressive metadata also stimulates the purely algorithmic compilation of personalized products (e.g. Liu et al. 2007; Bouras and Tsogkas 2009; Schouten et al. 2010; Ijntema et al. 2010; Goosen et al. 2011) by calculating similarities between media assets, which improves the relevance of automated filtering and recommendation services on top of legacy data. This allows new forms of knowledge discovery and delivery services that go beyond the established search and retrieval paradigms and provide users with a richer interaction experience, often at the cost of privacy intrusion and disclosure of personal information.
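To make the idea of calculating similarities between media assets more tangible, the following minimal sketch ranks assets by the overlap of their subject tags. The Jaccard coefficient is used here purely as an illustrative assumption; the approaches cited above employ their own, often more elaborate, similarity measures. The asset identifiers and tags are hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity of two tag sets: size of intersection over size of union."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target_id, assets, top_n=2):
    """Rank all other assets by metadata overlap with the target asset."""
    target_tags = assets[target_id]
    scored = [
        (jaccard(target_tags, tags), asset_id)
        for asset_id, tags in assets.items()
        if asset_id != target_id
    ]
    # Highest similarity first; ties are broken by asset identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Assets without any shared tag are not recommended.
    return [asset_id for score, asset_id in scored[:top_n] if score > 0]


# Hypothetical media assets described by subject tags.
ASSETS = {
    "article-1": {"copyright", "licensing", "open-data"},
    "article-2": {"copyright", "licensing", "patents"},
    "video-1": {"open-data", "journalism"},
    "article-3": {"sports"},
}
```

In this toy catalogue, `recommend("article-1", ASSETS)` ranks `article-2` ahead of `video-1` because it shares two of four distinct tags rather than one of four.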

In a Linked Data environment the process of content distribution mainly deals with the provision of machine-readable and semantically interoperable (meta)data via Application Programming Interfaces (APIs) or SPARQL endpoints (Knowles 2002; Zimmermann 2011). These can be designed either to serve internal purposes, so that data can be reused within the controlled settings of an organization, or for external purposes, so that data can be shared between organizations or with the public. Many media-related datasets are already available as Linked Data (e.g. LinkedMDB,Footnote 23 DBpediaFootnote 24 or MusicBrainzFootnote 25). Over the past years, several media companies have started to offer Linked Data to the public. Since 2009, the BBC has been offering SPARQL endpoints for its program, music and sports data (e.g. Smethurst 2009; Kobilarov et al. 2009; Rayfield 2012), and in the same year the New York Times started to offer large amounts of subject headings via its Article Search APIFootnote 26 (Larson and Sandhaus 2009). Similar activities are carried out by ReutersFootnote 27 and The GuardianFootnote 28 (Dodds and Davis 2009) or Nature Publishing.Footnote 29
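As a rough illustration of how a client might address a SPARQL endpoint, the sketch below prepares an HTTP GET request that carries a query in the `query` parameter, as foreseen by the SPARQL protocol. The endpoint address is a hypothetical placeholder and the query is a generic example; neither reflects the actual services of the publishers named above. The request is only constructed, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Hypothetical endpoint; substitute the address of an actual SPARQL service.
ENDPOINT = "https://data.example.org/sparql"

# Generic example query: list ten resources together with their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE { ?resource rdfs:label ?label }
LIMIT 10
""".strip()


def build_sparql_request(endpoint, query):
    """Prepare (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL query results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing the prepared request to `urllib.request.urlopen` would submit it; whether a given endpoint honours the requested result format depends on the service.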

Content consumption is the last step in the content value chain. This includes any means that enable a human user to search for and interact with media assets in a comfortable and purposeful way. According to this view, this level mainly deals with end user applications that make use of Linked Data to provide access to products and services, e.g. by providing reasonable interfaces. Over the past years, increasing attention has been paid to visualization and interaction issues associated with Linked Data, although this area of research is still in its infancy (Böhm et al. 2010; Fu et al. 2007; Hoxha et al. 2011; Paulheim 2011; Freitas et al. 2012). Research on and the improvement of interface design for the handling of semantic data services will be one of the critical success factors in the broad adoption of Linked Data in the publishing industry.

Stakeholder roles in a linked data ecosystem

As illustrated in Fig. 2, Latif et al. (2009) propose a simple model that describes the value creation process in a Linked Data ecosystem.

Fig. 2 Linked data value chain. Source: Latif et al. (2009)

The model distinguishes between various stakeholder roles in the creation of Linked Data assets and various types of data and applications that are created along the data transformation process. The stakeholder roles are raw data provider, Linked Data provider, application provider and, finally, end user. The stakeholders differ according to their contribution to the Linked Data value chain. Along this process of value creation, raw data, which is provided in any kind of non-RDF format (e.g. XML, CSV, PDF or HTML), is transformed into Linked Data, which is consumed and processed by a Linked Data application. Finally, the end user consumes the human-readable data via functionally extended applications and services. As illustrated in Fig. 2, the process of Linked Data creation can be covered in its entirety by a single economic actor or it can be split among several actors who are functionally intertwined via a Linked Data ecosystem. Kinnari (2013) extends this view with an orthogonal layer called “support services and consultation”, stressing the fact that apart from the value creation process itself, Linked Data also creates an environment for added-value services that transcend the pure transformation and consumption of data. Such services are usually provided by data brokers who collect, clean, visualize and resell available data for further processing and consumption.Footnote 30
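The step from raw data to Linked Data described above can be sketched for the simple case of a CSV file. The example below serializes each row as N-Triples statements; the base namespace, the sample records and the mapping of columns to Dublin Core terms are assumptions made for illustration only and do not stem from the model of Latif et al. (2009).

```python
import csv
import io

# Hypothetical base namespace; property URIs reuse Dublin Core terms.
BASE = "http://data.example.org/book/"
DCT = "http://purl.org/dc/terms/"

# Hypothetical raw data as it might be delivered by a raw data provider.
RAW_CSV = """id,title,creator
b1,Linked Data Basics,A. Author
b2,Metadata in Practice,B. Writer
"""


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(raw_csv):
    """Turn each CSV row into one N-Triples statement per descriptive column."""
    lines = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = f"<{BASE}{row['id']}>"
        for column in ("title", "creator"):
            literal = escape_literal(row[column])
            lines.append(f'{subject} <{DCT}{column}> "{literal}" .')
    return lines


triples = csv_to_ntriples(RAW_CSV)
```

Each resulting line is a self-contained statement, so the output of several raw data providers could be concatenated into one file without further coordination.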

For the time being, it is difficult to estimate the cost-effectiveness of Linked Data, but several surveys indicate that, depending on the scale and scope of a Linked Data project, the saving potential can be significant (Cranford 2009; McHugh 2009). These savings result from the network effect which Linked Data generates as an integration layer across various components and workflows in heterogeneous IT systems. Herein, Linked Data can help to reduce technological redundancies and thus maintenance costs, improve information access in terms of reduced search and discovery efforts, and provide opportunities for business diversification due to the higher granularity and increased connectivity of digital assets (Mitchell and Wilson 2012).

Licensing policies for linked data assets

Technology per se has never been a sufficient precondition for new modes of value creation (Knowles 2002). In the case of Linked Data it is not just the methodology that entails a disruptive potential, but also the changing nature of data as an economic good and its appropriate protection as intellectual property.

Linked Data comprises various asset types that emerge during the semantic processing of data. These are instance data, metadata, ontologies, services and technologies. Each asset type contributes in its own way to the value creation process, and thus can be protected by appropriate licensing instruments like copyright, database rights or patents. Table 2 provides an overview of Linked Data assets and related property rights.Footnote 31

Table 2 Linked data assets and related property rights

The legal regimes of Copyright,Footnote 32 Database Right,Footnote 33 Competition LawFootnote 34 and Patent LawFootnote 35 are being complemented by open licensing policies.Footnote 36 Creative CommonsFootnote 37 allows the definition of tiered licensing policies for the reuse of work protected by copyright. Open Data CommonsFootnote 38 does the same for assets protected by database right. Open source licenses complement the patent regime as an alternative form of resource allocation and value generation in the production of software and services (Ghosh et al. 2006).

The open and non-proprietary nature of Linked Data design principles makes it easy to share and reuse this data for collaborative purposes. This also offers publishers opportunities to diversify their assets and nurture new forms of value creation (e.g. by open innovation policies) or unlock new revenue channels (e.g. by establishing highly customizable data syndication services on top of fine-grained accounting services). To meet these requirements, commons-based licensing approaches like Creative Commons or Open Data Commons have gained popularity over the last years, allowing re-usability while at the same time providing a framework for protection against unfair usage practices and rights infringements. Nevertheless, to meet the requirements of the various asset types, a Linked Data licensing policy should make a deliberate distinction between assets that are protected by database rights and assets that are protected by copyright. Additionally, the policy should provide its licensing information not just in a human-readable representation but also in a machine-readable way, given that the transaction costs of rights clearance and contracting tend to rise with the increasing reusability of Linked Data. Automated brokering systems can make use of machine-processable licensing information and thus decrease the contracting costs significantly. Appropriate licensing practices in the commercial utilization of Linked Data are still in their infancy, but awareness of the importance of licensing policies as a trigger for business development is rising (Ermilov and Pellegrini 2015).
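How machine-readable licensing information might support automated rights clearance can be hinted at with a small sketch. It assumes that each dataset declares its license through the Dublin Core `license` property and that a broker holds a simplified, hand-written summary of license terms; the dataset names are hypothetical, the summary is deliberately coarse and does not replace a legal assessment.

```python
# Minimal sketch of machine-readable licensing metadata used for rights clearance.
DCT_LICENSE = "http://purl.org/dc/terms/license"
DCT_TITLE = "http://purl.org/dc/terms/title"
CC_BY = "http://creativecommons.org/licenses/by/4.0/"
CC_BY_NC = "http://creativecommons.org/licenses/by-nc/4.0/"
ODC_ODBL = "http://opendatacommons.org/licenses/odbl/1.0/"

# Simplified, hand-written summary of license terms (an assumption of this sketch).
LICENSE_TERMS = {
    CC_BY: {"commercial_use": True, "attribution": True},
    CC_BY_NC: {"commercial_use": False, "attribution": True},
    ODC_ODBL: {"commercial_use": True, "attribution": True},
}

# Hypothetical dataset descriptions as (subject, predicate, object) triples.
CATALOGUE = [
    ("http://data.example.org/dataset/thesaurus", DCT_LICENSE, CC_BY),
    ("http://data.example.org/dataset/court-decisions", DCT_LICENSE, ODC_ODBL),
    ("http://data.example.org/dataset/commentary", DCT_LICENSE, CC_BY_NC),
    ("http://data.example.org/dataset/unlabelled", DCT_TITLE, "No license stated"),
]


def cleared_for_commercial_use(catalogue):
    """Return datasets whose declared license permits commercial reuse."""
    cleared = []
    for subject, predicate, obj in catalogue:
        if predicate != DCT_LICENSE:
            continue
        terms = LICENSE_TERMS.get(obj)
        # Unknown licenses are conservatively treated as not cleared.
        if terms and terms["commercial_use"]:
            cleared.append(subject)
    return sorted(cleared)
```

A dataset without any license statement never appears in the result, which mirrors the observation that missing licensing information poses a risk to legal security.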

Linked Data licensing policies provide a secure and resilient judicial framework to protect against the unfair appropriation of (open) datasets and contribute to the strategic aims of an organization to generate competitive advantages, create added value on top of existing assets and diversify its business models.

Case studies in linked data utilization

This section discusses two case studies of Linked Data utilization at the publishing companies Wolters KluwerFootnote 39 and Reed Elsevier.Footnote 40 Both companies are global players in the market of scientific publishing and consider themselves to be competitors. Additionally, in both cases the supply situation is characterized by a high degree of inelasticity and both companies exercise tight control over the distribution channels of their products and services. Hence, they fit well into the analytic framework of the resource-based theory as laid out in Section 2. The information given in the two case studies was gathered from company material and interviews conducted with company representatives who are in charge of innovation management and the companies’ transition to Linked Data technologies. The aim was to gain an understanding of the adoption of Linked Data technologies and how they contribute to value creation within the company. Both company representatives were interviewed using the same semi-structured interview scheme, with questions focussing on the motivations for Linked Data deployment, the application area, the contribution to the value chain, and the licensing policy associated with newly emerging data assets. After the interviews had been transcribed and analysed according to the criteria mentioned above, the interviewees received a copy of the analysis for review, clarification, suggestions and updates. Table 3 gives an overview of the central findings.

Table 3 Comparison of linked data strategies between Wolters Kluwer and Reed Elsevier

Case 1: Wolters Kluwer

Motivation

The primary motivation for Wolters Kluwer to engage in Linked Data is to reduce costs of the editorial process. This is achieved by reutilizing existing assets either from within the company or from trusted third party sources, reducing the efforts to generate and maintain assets themselves or by sharing existing assets across organizational units. Additionally, Linked Data allows Wolters Kluwer to build functionalities and provide services that have not been possible before, triggering innovation on top of collaborative practices. This stimulates new content exploitation practices by utilizing application programming interfaces and other forms of service-based principles on top of automated content processing.

Application area

Wolters Kluwer uses Linked Data technology to support several editorial processes in the compilation and provision of legal information to their customers. At the time of writing, this mainly takes place within their proprietary syndication platform Jurion©, a professional service that supports legal professionals like lawyers, attorneys, judges or notaries in their daily work. The Jurion© platform provides information management functionalities like search services, libraries and knowledge management services for the professional handling and processing of legal information.

Value chain

Wolters Kluwer utilizes Linked Data technology along its entire content value chain. To achieve this, the company has engaged in a major change management process that has been successively rolled out across the whole corporation. Semantic metadata steers the whole production process from the acquisition to the distribution of content. Additionally, semantic metadata is applied to improve the customer experience at the content consumption level.

Traffic patterns

At the time of writing, Wolters Kluwer uses Linked Data technologies mainly to aggregate and organize internal information from sources within the company. Inbound aggregation of content from external sources and the provision of content via outbound practices are considered part of future strategies, but for the time being they are not a primary strategic aim. Nevertheless, Wolters Kluwer is aware of the strategic value generated especially by outbound practices, e.g. by providing data assets to the public and thus initiating positive feedback loops for the content providers in the sense of open innovation.

Stakeholder role

Wolters Kluwer currently acts as a Linked Data provider, i.e. by providing several Linked Data vocabularies to the public, and as a Linked Data application provider, i.e. by utilizing Linked Data within their Jurion© platform. The strategic focus lies primarily on the second aspect as Wolters Kluwer is keen on exercising a sufficient amount of control over their assets and how they are being utilized by third parties. Providing Linked Data assets to the public is part of this strategy. For the time being inbound activities are of minor relevance given the fact that open data often lacks sufficient licensing and provenance information, thus posing a risk to quality assurance and legal security.

Licensing strategy

For the first time in the company's history, Wolters Kluwer perceives the diversification of its licensing strategy as an issue. In former times, intellectual property issues were comparatively simple. In the future, asset management based on a diversified licensing strategy will be a critical issue in the exploitation of new content markets. At the time of writing, Wolters Kluwer is experimenting with the combination of copyright and Creative Commons licenses to develop a nuanced asset management strategy for the future.

Case 2: Reed Elsevier

Motivation

Reed Elsevier’s business rationale is to expand their market share among higher education customers by offering advanced learning and certification services. To do so, they have started to reorganize their editorial workflows by introducing a semantic metadata layer that interlinks the assets managed by formerly separate business units. This is intended to make the creation of new products and services more effective and to create opportunities for business development, with special consideration of new content delivery platforms and consumption habits.

Application area

Reed Elsevier utilizes Linked Data technologies within their product line Elsevier Optimized Learning Suite© (EOLS). EOLS lets teachers and students create so-called learning journeys by arranging learning objects in a consistent and responsive way, with special consideration of mobile consumption. Linked Data principles are being applied to automatically extract knowledge artefacts from existing sources, arrange them in didactically meaningful ways and offer them as consistent products to their customers.

Value chain

The current application focus lies on the editing and bundling of learning objects, for purposes like context-sensitive grouping and compiling of content objects. Semantic relationships are being used to create tight cores of semantically interlinked objects, which are provided as a consistent product. Additionally, semantic metadata is used to recommend semantically related content objects. Semantics is thus instrumental in creating the product itself, defining product families or product lines, and supporting the reutilization of existing assets for various consumption purposes and responsive formats as an extension to the traditional product lines (e.g. handbooks).
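The recommendation of semantically related content objects described above can be illustrated with a minimal sketch. It assumes that each learning object is annotated with a set of concept identifiers and ranks other objects by the overlap of their annotations (Jaccard similarity). The identifiers, the `ANNOTATIONS` data and the `recommend` function are hypothetical and chosen for illustration only; the source does not describe how EOLS actually computes relatedness.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical learning objects annotated with concept identifiers.
# Illustrative data only; not taken from any Reed Elsevier system.
ANNOTATIONS: Dict[str, Set[str]] = {
    "lo:cardiac-anatomy": {"concept:heart", "concept:anatomy"},
    "lo:ecg-basics": {"concept:heart", "concept:diagnostics"},
    "lo:contract-law": {"concept:law"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of concepts two annotation sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(
    object_id: str,
    annotations: Dict[str, Set[str]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to `limit` other objects that share at least one concept."""
    target = annotations[object_id]
    scored = [
        (other, jaccard(target, concepts))
        for other, concepts in annotations.items()
        if other != object_id
    ]
    # Keep only objects with some semantic overlap, best match first;
    # ties are broken alphabetically to keep the output deterministic.
    related = [(other, score) for other, score in scored if score > 0]
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))[:limit]


if __name__ == "__main__":
    for other, score in recommend("lo:cardiac-anatomy", ANNOTATIONS):
        print(f"{other}: {score:.2f}")
```

In a production setting the annotations would presumably come from a shared vocabulary rather than a hard-coded dictionary, which is what would allow the same metadata to serve both the bundling and the recommendation purposes.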

Traffic patterns

For the time being, Reed Elsevier applies Linked Data in a strictly controlled business environment. They use Linked Data technologies internally to support editorial offices in collaboratively creating semantic relationships, semantically enriching datasets and carrying out the quality approval process. Inbound content is retrieved only from certified partners, who have to adhere to strict bylaws, policies and quality standards. Currently, no open data is utilized, but Reed Elsevier leaves the option open for the future. Nevertheless, Reed Elsevier retrieves a lot of content from certified partners, which makes the inbound process a critical phase in value creation: inbound efforts are almost equal to the efforts for internal processes. From the outbound perspective, no data, vocabularies or ontologies are currently published. All outbound activities are subject to strict licensing agreements with certified partners, under which the costs of maintaining shared resources are shared equally among all partners involved. Currently, there are no plans to open up any assets for public use apart from some low-level assets that have no strategic value to the company.

Stakeholder role

Reed Elsevier describes itself as a relatively closed system that does not serve an ecosystem in a broader sense. If Linked Data is exposed to the outside world, then it is done in a very controlled fashion. There are no endpoints or APIs available to the public. Any utilization of Linked Data takes place within a strongly controlled B2B environment, but Reed Elsevier is aware that there are opportunities ahead. The strategic aim is to become a data integration partner and service provider within knowledge and research institutions. This entails the creation of a shared knowledge backend that not only provides Elsevier’s products, but also provides the data services to cross-fertilize research and the record of science as a whole by integrating datasets from previously separate sources and sites.

Licensing strategy

Licensing has become a major issue. Traditional contracting models have become dysfunctional for repurposing and disseminating content under circumstances of multi-channel publishing. Reed Elsevier currently faces a severe backlog in the adjustment of existing licenses. All new contracts take new options of repurposing and appropriate compensation models into account, but legacy contracts impose obstacles to jump-starting new products and introducing new functionalities and service levels. Thus, in the future dynamic licensing will be vital to serve various business models, e.g. by allowing new charging models. Dual licensing is currently being applied under their open access policies for certain journals, but data licensing is currently not an issue.

Interpretation of results

The two use cases show similarities and differences in the commercial utilization of Linked Data principles. Similarities exist in the utilization of Linked Data technologies for purposes of collaborative data management and reutilizing assets across business units and organizational boundaries. This is a clear indicator that both companies are applying Linked Data technologies to leverage internal resources for the expansion into new markets (Reed Elsevier) or the improvement of existing cost structures in the production and distribution of existing assets (Wolters Kluwer). In both cases Linked Data serves as a technological and an organizational integration layer, impacting existing workflows and working practices, and triggering new products and business models. Nevertheless, differences exist in the exploitation of these opportunities and in the notion and design of the appropriate ecosystem.

Wolters Kluwer aims to nurture an open business environment inspired by the principles of open innovation. They have started to experiment with providing certain resources under a dual licensing policy, thus stimulating community dynamics and collaborative business practices. Feeding into and retrieving input from an open business environment will become more important in the future and a strategic cornerstone of their business development practices. By contrast, Reed Elsevier relies on strict control mechanisms in governing the value creation process. They utilize a very strict licensing model, arguing that for reasons of quality assurance they need to exercise tight control over their business environment and associated collaborators. They are aware of the business opportunities offered by open innovation, but have not yet embraced this culture.

Despite the differences, both publishing companies have identified Linked Data technologies as the core of their innovation strategy, with a profound impact on existing business practices and strategies of value creation. These technologies allow them to use existing resources more effectively and to extend their practical capabilities by reutilizing existing resources in new contexts within and beyond the boundaries of their company.

Conclusion & outlook

Without any claim to completeness and representativeness, this paper discusses the strategic utilization of semantic metadata and its impact on business practices in the publishing industry from a resource-based point of view. Linked Data can be described as a change agent in the strategic transformation of the publishing industry under conditions of digitization and collaborative value creation within service systems. Hence, the resource-based approach discussed at the beginning of this paper provides a robust theoretical framework for the explanation and understanding of the adoption of Linked Data technologies. Nevertheless, numerous issues require further investigation.

One issue is the relative complexity of Linked Data as a technologically induced innovation. The two case studies presented in this paper might serve as a blueprint for large enterprises that exercise control over markets with high supply inelasticity, but they have little explanatory value when it comes to the adoption of Linked Data technologies by small and medium-sized enterprises. Lack of skills, competencies, financial resources and economies of scale in the production and distribution of content might hinder smaller companies in actively participating in newly emerging service ecosystems. This raises questions about market disparities, anti-competitive behavior, and concentration of market power.

Another aspect that has not been touched upon in this paper is the quality assurance of Linked Data with respect to validity, trustworthiness, and provenance of (open) data. Given that the publishing industry is highly sensitive to such quality criteria, efforts should be undertaken to significantly improve the quality of Linked Data and provide transparent and reliable measures for quality assurance and maintenance, especially when it is created via crowdsourcing or similar collaborative practices. Additionally, it needs to be discussed how free-rider effects can be prevented and incentives created, not just to consume, but also to contribute to emerging Linked Data ecosystems. Tackling these challenges is as important as managing the technological feasibility of Linked Data in editorial processes, but probably much harder to accomplish.

A third aspect concerns business and revenue models that appropriately reward the various stakeholders within a Linked Data ecosystem. Linked Data will make a big leap forward if it can be shown to significantly reduce the costs of existing editorial workflows, create new revenue opportunities and provide incentives and fair compensation models across the Linked Data value chain. For the time being, these issues are open to debate, although expectations are very high.