The following proposed enhancements follow the numbering scheme outlined in "Merritt, EZID, and the Web Archiving Service Enhancements and Development Activities" (Rev. 0.1 – 2012-02-02)

3. Enhancements

3.1. Exposing Content

The following items are focused on enhancing the discovery of relevant curated data by consumers.  Several of the development activities defined here in Section 3.1 and also Section 3.3 are focused on expanding accessibility to content in Merritt, moving it from a "dark" to a "brighter" archive.

3.1.1. Visibility of key metadata elements, including author/creator affiliation, journal citation, and abstract

Currently, only supplied Dublin Kernel (DK) metadata fields are visible through the Merritt UI.  (Arbitrary metadata may be supplied as a component of a submitted data set.  It is stored and is retrievable, although not currently visible.)  Merritt should deploy additional schema parsers to recognize and surface metadata elements found as components of submitted content. Merritt users will be able to submit metadata in various formats, e.g. MODS. Merritt will process these metadata and support their search and display in their native formats.
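As an illustration of what a schema parser for one format might do, the following minimal sketch pulls title, creator, affiliation, and abstract out of a MODS record. It is not Merritt code: the function name `surface_mods` and the shape of the returned dictionary are assumptions made for this example, and only the MODS namespace and element names come from the MODS schema itself.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace; the prefix "m" is local to this sketch.
MODS = {"m": "http://www.loc.gov/mods/v3"}


def surface_mods(xml_text):
    """Extract a few displayable fields from a MODS record (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def texts(path):
        # Collect the non-empty text of every element matching the path.
        return [e.text.strip() for e in root.findall(path, MODS)
                if e.text and e.text.strip()]

    return {
        "title": texts(".//m:titleInfo/m:title"),
        "creator": texts(".//m:name/m:namePart"),
        "affiliation": texts(".//m:name/m:affiliation"),
        "abstract": texts(".//m:abstract"),
    }
```

A parser of this kind would sit alongside the existing Dublin Kernel handling; fields that a record does not supply simply come back as empty lists.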

3.1.2. Exposure of content for search engine indexing

Research in search engine optimization (SEO) strongly suggests that an ever-increasing number of researchers find primary data through the major internet search engines.  (Google, Yahoo!, and Bing collectively account for over 95% of search activity.)  Merritt and EZID should expose designated data and metadata for harvesting by these search engines.  This will require the generation and registration of appropriate sitemaps.  Curators will be able to indicate whether they wish their metadata to be crawled by search engines.
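Sitemap generation with a curator opt-in could look roughly like the sketch below. The record fields (`landing_page`, `modified`, `crawlable`) and the function name `build_sitemap` are hypothetical and chosen only for this example; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Return sitemap XML listing only records whose curator opted in to crawling."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        # Assumed flag: curators mark each record (or collection) as crawlable.
        if not rec.get("crawlable", False):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = rec["landing_page"]
        if rec.get("modified"):
            ET.SubElement(url, "lastmod").text = rec["modified"]
    return ET.tostring(urlset, encoding="unicode")
```

Records default to not crawlable in this sketch, so nothing is exposed unless a curator has explicitly opted in.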

3.1.4. Mediated communication between consumers and providers

The simplest way to address this need in the short term is to support the visibility of provider email addresses as part of dataset metadata to facilitate out-of-band communication.  Supporting a more robust threaded discussion capability would require a much more substantial period of investigation and development.

3.1.5. Branding 

Provide a way for providers to brand deposited content at the collection level, presumably in the form of supplied header/footer, descriptive text, logo, links to further information, etc.  Organizations will be able to add their logo to a collection.

3.1.6. DataONE Member Node

We are working with DataONE to become a DataONE member node. Merritt clients will have the option to include metadata about datasets meeting the DataONE collection criteria in the DataONE union catalog.

3.2. Submission

The following items are focused on enhancing the experience of submitting content for curation by providers.

3.2.1. Support for multiple metadata schemas

Currently, the Merritt submission interface only provides for the input of Dublin Kernel metadata.  All of the additional metadata elements/schemas identified in the "Exposure of content for search engine indexing" enhancement should be supported through the submission interface, i.e., have provision for supplying metadata at the point of deposit.  This effort may be informed by activities of the NSF DataONE Preservation and Metadata working group.   Merritt submitters will have a wider range of options for including descriptive metadata in their native formats, such as MODS, without the need to derive Dublin Kernel metadata.

3.2.3. Simplified submission workflows for single objects

While submission of simple, single objects is straightforward, deposit of more complex objects or batches of objects still raises unacceptably high barriers to widespread adoption.  Merritt should support significantly simplified submission workflows, possibly facilitated by new easy-to-use client-side tools, such as a “drop box” type of functionality.  This activity will result in phased incremental enhancements, possibly including an automated manifest builder; support for SWORD/Atom submission protocols; a reimplementation of the submission UI that takes greater advantage of Dropbox-like JavaScript/AJAX; and a ONEDrive-like desktop client supporting a file browser user experience.   Merritt users will have simpler ways to submit their content.

3.3. Access Control

The following items are focused on enhancing provider control over consumer access to curated content.

3.3.1. Anonymous public access to designated collections and objects

This has previously been identified as the top priority for Merritt development.  The designation of collections and/or objects to be exposed publicly is performed by providers, based on local policy decisions.  Merritt curators will be able to designate their collections publicly accessible, and users will have direct access to materials stored in Merritt.

3.3.2. Click-through Data Use Agreements

It is desirable that certain datasets are not made available until the consumer has explicitly accepted the terms of a Data Use Agreement (DUA).  Merritt should support the association of collections and objects with DUAs, and maintain a persistent store of consumer acceptances.  In terms of user experience, the consumer will be shown a click-through dialog box (or some similar UI mechanism) at the point of the first request for access.  Most DUAs will require submission of user identity, as well as email, affiliation, and optional comment.  Initially this will be user-supplied information; in the future, support for OpenID and Shibboleth-based IdM federations is desirable.  DUA acceptance should be stateful to avoid repeated challenges for the same user/object combination, and to support routine administrative queries.  Most collections will have a default DUA, which can be overridden on a per-object basis.  The response to DUA acceptance may vary: direct access to designated data; required email activation; required data provider approval.
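The default/override resolution and the stateful acceptance check described above can be sketched as follows. This is an in-memory illustration only, not a proposed design: the class name `DuaRegistry`, its fields, and the use of an email address as the user key are all assumptions, and a real implementation would need the persistent store called for above.

```python
from dataclasses import dataclass, field


@dataclass
class DuaRegistry:
    """Hypothetical in-memory stand-in for a persistent DUA store."""
    collection_default: dict = field(default_factory=dict)  # collection id -> DUA id
    object_override: dict = field(default_factory=dict)     # object id -> DUA id
    acceptances: set = field(default_factory=set)           # (user email, object id)

    def dua_for(self, collection_id, object_id):
        # A per-object DUA takes precedence over the collection default.
        return self.object_override.get(
            object_id, self.collection_default.get(collection_id))

    def needs_challenge(self, user_email, collection_id, object_id):
        # No DUA applies: nothing to accept.
        if self.dua_for(collection_id, object_id) is None:
            return False
        # Challenge only once per user/object combination.
        return (user_email, object_id) not in self.acceptances

    def record_acceptance(self, user_email, object_id):
        self.acceptances.add((user_email, object_id))
```

Keeping acceptances as user/object pairs also supports the administrative queries mentioned above, for example listing every consumer who has accepted the DUA for a given object.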

3.3.3. Distinct access rules for metadata and content

Currently, the granularity of access control is at the object level; if designated for read access, all components of the object are accessible.  There are use cases in which it is desirable for a meaningful distinction to be made between object metadata, which generally should be open for the widest access, and data, which may be subject to more restrictions.  Merritt should support expression of access control rules at a finer granularity supportive of a metadata/data distinction, and these need to carry through to EZID as appropriate so that any indexing of EZID metadata respects those designations. Curators will have the option of allowing the metadata for their objects in Merritt to be accessible by the public, while restricting access to the objects’ associated files. What this means from EZID’s perspective is that researchers can control how their data/resources are indexed and exposed.
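One way to picture the metadata/data distinction is a per-object rule table with a separate access level for each component, as in the minimal sketch below. The names `Component`, `Access`, and `can_read`, and the reduction of authorization to a single boolean, are simplifying assumptions made for illustration.

```python
from enum import Enum


class Component(Enum):
    METADATA = "metadata"
    CONTENT = "content"


class Access(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"


def can_read(rules, component, authorized):
    """Decide read access for one component of an object (hypothetical helper).

    `rules` maps each Component to an Access level; a component with no
    explicit rule is treated as restricted.
    """
    level = rules.get(component, Access.RESTRICTED)
    return level is Access.PUBLIC or authorized
```

With rules of `{Component.METADATA: Access.PUBLIC, Component.CONTENT: Access.RESTRICTED}`, an anonymous consumer (or a search engine crawler) could read the metadata but not the associated files.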

3.3.4. Limited time embargoes

As an extension to controlling access to content in Merritt at a finer granularity, users will also be able to attach a time-based condition that determines when materials can be more broadly exposed.  For example, a user can submit content, such as a dataset, to Merritt in an intermediate or working state that is not ready for use by others, and specify the date on which it can be made available.
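At its simplest, an embargo reduces to a date comparison made at the time of each access request. The sketch below assumes a hypothetical `release_date` value stored with the object, with `None` meaning that no embargo applies.

```python
from datetime import date


def is_released(release_date, today=None):
    """Return True once the embargo date has been reached (hypothetical helper)."""
    if release_date is None:
        return True  # no embargo was set
    if today is None:
        today = date.today()
    return today >= release_date
```

Evaluating the rule at request time, rather than flipping a flag on the release date, means no scheduled job is needed to lift the embargo.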

3.3.5. Self-service user account registration

Merritt account registration is currently an off-line operation that requires communication with the Merritt service manager.  Merritt should support a self-service model for new account registration.

3.6. Merritt storage and legacy infrastructure

3.6.1. SDSC Cloud storage

We want to take advantage of the cloud storage offered by the San Diego Supercomputer Center, allowing for further cost-savings and extending the replication of content stored in Merritt. This requires some changes to the architecture of our storage micro-service.

3.6.2. Migrate DPR collections to Merritt

We want to migrate the content from the Digital Preservation Repository (DPR) to Merritt. We know that some clients do not wish us to migrate their content for them, but would rather submit it themselves to Merritt. We will contact our DPR clients to determine the most appropriate actions for migrating their content.

3.7. Integrate Merritt with DataONE repositories

Merritt will be deployed as a DataONE member node, which will allow users who deposit earth science data in Merritt to expose them in DataONE.

4.  Community Building

4.1. Micro-services community building

We want to release the software we've developed for the Merritt repository as free and open source, and begin to engage a community of users to help us build out and improve it. This will require some work on our part, and additional resources for meetings to begin to build the community.

5.  Policy Development 

5.1. Cost accounting and pricing model for Merritt

We will implement more formal processes for working with clients, including service-level agreements, fee structures, and billing cycles.  We will also develop a pricing model for internal and external users, seeking to keep costs as low as possible for UC affiliates.  The pricing model will offer two options: 1) pay as you go and 2) pay once, store forever.

5.2. Trustworthy Repositories Audit and Certification (TRAC) for Merritt

Assemble documentation to respond to the Trustworthy Repositories Audit and Certification (TRAC) checklist, with a goal of conducting a self-audit. We want to do this in a transparent manner, so that the community can understand and comment on the policies and chart our progress.

5.3. Data Publication

Using EZID, Merritt, and CDL’s publishing services, UC3 and the CDL Publishing and Access Group will explore developing capacity for “data papers.”  A data paper is a package that includes data, its metadata, and any processing information (techniques, formulas, software, etc.) that together define the state of a data product that was used in, and is ready for re-use in, traditional journal articles.  See CDL white paper for more information:
