The following proposed enhancements follow the numbering scheme outlined in "Merritt, EZID, and the Web Archiving Service Enhancements and Development Activities" (Rev. 0.1 – 2012-02-02).
3.1. Exposing Content
The following items are focused on enhancing the discovery of relevant curated data by consumers. Several of the development activities defined here in Section 3.1 and also Section 3.3 are focused on expanding accessibility to content in Merritt, moving it from a "dark" to a "brighter" archive.
3.1.1. Visibility of key metadata elements, including author/creator affiliation, journal citation, and abstract
Currently, only supplied Dublin Kernel (DK) metadata fields are visible through the Merritt UI. (Arbitrary metadata may be supplied as a component of a submitted data set. It is stored and is retrievable, although not currently visible.) Merritt should deploy additional schema parsers to recognize and surface metadata elements found as components of submitted content. Merritt users will be able to submit metadata in various formats, e.g., MODS. Merritt will be able to process this metadata and support its search and display in its native format.
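As an illustration of what such a schema parser might do, the following minimal sketch pulls a few display fields out of a MODS record using Python's standard library. The sample record, the function name extract_display_fields, and the choice of fields are assumptions made for this example; they do not describe Merritt's actual parser design.

```python
import xml.etree.ElementTree as ET

# The MODS v3 namespace; element names below are standard MODS elements.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, invented for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Coastal Temperature Readings</title></titleInfo>
  <name>
    <namePart>Doe, Jane</namePart>
    <affiliation>Example University</affiliation>
  </name>
  <abstract>Hourly readings from a hypothetical sensor array.</abstract>
</mods>"""


def extract_display_fields(mods_xml: str) -> dict:
    """Return selected MODS fields for display; missing fields map to None."""
    root = ET.fromstring(mods_xml)

    def text(path: str):
        node = root.find(path, MODS_NS)
        if node is None or node.text is None:
            return None
        return node.text.strip()

    return {
        "title": text("mods:titleInfo/mods:title"),
        "creator": text("mods:name/mods:namePart"),
        "affiliation": text("mods:name/mods:affiliation"),
        "abstract": text("mods:abstract"),
    }


if __name__ == "__main__":
    for field, value in extract_display_fields(SAMPLE_MODS).items():
        print(f"{field}: {value}")
```

A real parser would also need to handle repeated elements (multiple names, multiple abstracts) and schema versions, which this sketch ignores.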
3.1.2. Exposure of content for search engine indexing
Research in search engine optimization (SEO) strongly suggests that an ever-increasing number of researchers find primary data through the major internet search engines. (Google, Yahoo!, and Bing collectively account for over 95% of search activity.) Merritt and EZID should expose designated data and metadata for harvesting by these search engines. This will require the generation and registration of appropriate sitemaps. Curators will be able to indicate whether they wish their metadata to be crawled by search engines.
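The sitemap generation step could look roughly like the sketch below, which emits a sitemap in the sitemaps.org 0.9 format and includes only records whose curators have opted in to crawling. The record layout (the "url", "modified", and "crawlable" keys), the function name build_sitemap, and the example.org URLs are hypothetical and chosen for this example only.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records) -> str:
    """Build sitemap XML listing only records flagged as crawlable.

    Each record is a dict with hypothetical keys:
      "url"       - landing page URL of the object
      "modified"  - optional last-modified date (YYYY-MM-DD)
      "crawlable" - True if the curator opted in to indexing
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for rec in records:
        # Respect the curator's choice: skip anything not opted in.
        if not rec.get("crawlable", False):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = rec["url"]
        if rec.get("modified"):
            ET.SubElement(url, "lastmod").text = rec["modified"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"url": "https://example.org/object/1", "modified": "2012-02-02", "crawlable": True},
        {"url": "https://example.org/object/2", "crawlable": False},
    ]
    print(build_sitemap(sample))
```

Defaulting "crawlable" to False keeps content out of search engines unless a curator explicitly allows it; whether that is the right default is a policy decision this sketch does not settle.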
3.1.4. Mediated communication between consumers and providers
The simplest way to address this need in the short term is to support the visibility of provider email addresses as part of dataset metadata to facilitate out-of-band communication. Supporting a more robust threaded discussion capability would require a much more substantial period of investigation and development.
Provide a way for providers to brand deposited content at the collection level, presumably in the form of a supplied header/footer, descriptive text, logo, links to further information, etc. Organizations will be able to add their logo to a collection.
3.1.6. DataONE Member Node
We are working with DataONE to become a DataONE member node. Merritt clients will have the option to include metadata about datasets meeting the DataONE collection criteria in the DataONE union catalog.
The following items are focused on enhancing the experience of submitting content for curation by providers.
3.2.1. Support for multiple metadata schemas
Currently, the Merritt submission interface only provides for the input of Dublin Kernel metadata. All of the additional metadata elements/schemas identified in the "Exposure of content for search engine indexing" enhancement should be supported through the submission interface, i.e., have provision for supplying metadata at the point of deposit. This effort may be informed by activities of the NSF DataONE Preservation and Metadata working group. Merritt submitters will have a wider range of options for including descriptive metadata in their native formats, such as MODS, without the need to derive Dublin Kernel metadata.
3.2.3. Simplified submission workflows for single objects
3.3. Access Control
The following items are focused on enhancing provider control over consumer access to curated content.
3.3.1. Anonymous public access to designated collections and objects
This has previously been identified as the top priority for Merritt development. The designation of collections and/or objects to be exposed publicly is performed by providers, based on local policy decisions. Merritt curators will be able to designate their collections publicly accessible, and users will have direct access to materials stored in Merritt.
3.3.2. Click-through Data Use Agreements
3.3.3. Distinct access rules for metadata and content
Currently, the granularity of access control is at the object level; if designated for read access, all components of the object are accessible. There are use cases in which it is desirable for a meaningful distinction to be made between object metadata, which generally should be open for the widest access, and data, which may be subject to more restrictions. Merritt should support expression of access control rules at a finer granularity supportive of a metadata/data distinction, and these need to carry through to EZID as appropriate so that any indexing of EZID metadata respects those designations. Curators will have the option of allowing the metadata for their objects in Merritt to be accessible by the public, while restricting access to the objects’ associated files. What this means from EZID’s perspective is that researchers can control how their data/resources are indexed and exposed.
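The metadata/data distinction can be sketched as a per-object policy with separate flags for each component. This is a minimal illustration under stated assumptions: the names Component, AccessPolicy, can_read, and metadata_indexable are invented here, and the rule that authorized users can read everything is a simplification, not a description of Merritt's or EZID's access model.

```python
from dataclasses import dataclass
from enum import Enum


class Component(Enum):
    METADATA = "metadata"
    CONTENT = "content"


@dataclass(frozen=True)
class AccessPolicy:
    """Hypothetical per-object policy with independent public flags."""
    public_metadata: bool = False
    public_content: bool = False


def can_read(policy: AccessPolicy, component: Component, is_authorized: bool) -> bool:
    """Decide read access for one component of an object.

    Assumption: an authorized (logged-in, permitted) user may read
    everything; anonymous users are limited by the public flags.
    """
    if is_authorized:
        return True
    if component is Component.METADATA:
        return policy.public_metadata
    return policy.public_content


def metadata_indexable(policy: AccessPolicy) -> bool:
    """Metadata should be offered for indexing only if it is public."""
    return policy.public_metadata


if __name__ == "__main__":
    # Open metadata, restricted files: the case described in the text.
    policy = AccessPolicy(public_metadata=True, public_content=False)
    print(can_read(policy, Component.METADATA, is_authorized=False))
    print(can_read(policy, Component.CONTENT, is_authorized=False))
```

Keeping the indexing decision tied to the same flag as anonymous metadata access is one way to ensure that anything exposed for indexing respects the curator's designation.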
3.3.4. Limited time embargoes
As an extension to controlling access to content in Merritt at a finer granularity, users will also be able to set a time-based condition on when materials can be more broadly exposed. For example, a user can submit content, such as a dataset, to Merritt in an intermediate or working state that is not ready for use by others, and specify the date on which it can be made available.
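A date-based embargo check might be as simple as the following sketch. The function name is_publicly_available is hypothetical, and treating a missing release date as "remain restricted" is an assumption made for this example rather than a stated Merritt policy.

```python
from datetime import date
from typing import Optional


def is_publicly_available(release_date: Optional[date], today: date) -> bool:
    """Return True once the embargo release date has been reached.

    Assumption: if no release date was specified, the content stays
    restricted until a curator sets one.
    """
    if release_date is None:
        return False
    return today >= release_date


if __name__ == "__main__":
    embargo_ends = date(2013, 1, 1)  # hypothetical release date
    print(is_publicly_available(embargo_ends, date(2012, 6, 1)))
    print(is_publicly_available(embargo_ends, date(2013, 1, 1)))
```

Passing the current date in as a parameter, instead of calling date.today() inside the function, makes the rule easy to test against fixed dates.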
3.3.5. Self-service user account registration
Merritt account registration is currently an off-line operation that requires communication with the Merritt service manager. Merritt should support a self-service model for new account registration.
3.6. Merritt storage and legacy infrastructure
3.6.1. SDSC Cloud storage
We want to take advantage of the cloud storage offered by the San Diego Supercomputer Center (SDSC), allowing for further cost savings and extending the replication of content stored in Merritt. This requires some changes to the architecture of our storage micro-service.
3.6.2. Migrate DPR collections to Merritt
We want to migrate the content from the Digital Preservation Repository (DPR) to Merritt. We know that some clients do not wish us to migrate their content for them, but would rather submit it themselves to Merritt. We will contact our DPR clients to determine the most appropriate actions for migrating their content.
3.7. Integrate Merritt with DataONE repositories
Merritt will be deployed as a DataONE member node, which will allow users who deposit earth science data in Merritt to expose it in DataONE.
4. Community Building
4.1. Micro-services community building
We want to release the software we've developed for the Merritt repository as free and open source, and begin to engage a community of users to help us build out and improve it. This will require some work on our part, and additional resources for meetings to begin to build the community.
5. Policy Development
5.1. Cost accounting and pricing model for Merritt
We will implement more formal processes for working with clients, including service-level agreements, fee structures, and billing cycles. We will also develop a pricing model for internal and external users, seeking to keep costs as low as possible for UC affiliates. The pricing model will be of two types: 1) pay as you go and 2) pay once, store forever.
5.2. Trustworthy Repository Audit and Certification (TRAC) for Merritt
Assemble documentation to respond to the Trustworthy Repository Audit and Certification (TRAC) checklist, with a goal of conducting a self-audit. We want to do this in a transparent manner, so that the community can understand and comment on the policies and chart our progress.
5.3. Data Publication
Using EZID, Merritt, and CDL’s publishing services, UC3 and the CDL Publishing and Access Group will explore developing capacity for “data papers.” A data paper is a package that includes data, its metadata, and any processing information (techniques, formulas, software, etc.) that together define the state of a data product that was used and is ready for re-use in traditional journal articles. See the CDL white paper for more information: http://www.cdlib.org/services/uc3/docs/dax.pdf