CDL Digital Preservation Program

Identity Service

Rev. 0.1 – 2009-03-12

 

1 Purpose

 

To distinguish an object from all others by unambiguous persistent naming and actionable resolution.

 

2 Concepts

 

An identifier is an association between a character string and a thing.  Things can be objects, files, parts of files, persons, organizations, abstractions, etc.

 

An identifier is much like an assertion, an opinion, or a thought; its only reality comes from what you, I, it, they, or we believe.  Belief is based on “authority”, often established by such things as trustedness of the witness (e.g., a family member to identify a victim) or weight of numbers (e.g., everyone agrees that song is …).  Communicating authority in the digital world is imperfect.  After clicking on a URL, you are returned exactly what the web server asserts is bound to it, whether that be the page you expected, a page you didn’t expect, or a “not found” error.  Web servers act on behalf of their owners via complex processes that can distort the exact sense of authority intended.  Servers are often not maintained or managed as you expect, returning results that you, the owner, or both would consider incorrect.

 

Authority is often supported by bindings.  In many cases you don’t know what to expect or how to verify that what you got is “authoritative” (in whatever way you measure that).  It is important to be able to request information from a given authority (e.g., DNS or a website) about the identifier's bindings.  Opinions, hence identifiers, will differ over time and depending on whom you ask – this is natural.  Interesting problems arise when two trusted authorities disagree (and lots of popular fiction is based on mistaken identity).

 

Aside from the superficial form of the character string, the bindings drive every experience of the identifier, especially persistence and actionability (e.g., you can "click it").  Bindings are often implicit in the filesystem layout beneath a web server.  They can be complemented with databases.  Strong bindings are created by embedding identifiers within the objects they identify.

 

Minting generates strings.  Embedding the strings in URLs makes them actionable.  Publishing those URLs sets user expectations; in some sense the string isn't "used up" until it's made public.  The birth of an identifier is therefore more closely tied to its being published widely than to its being minted or even bound (or published very "narrowly").

 

Binding associates the string with metadata, with an object, and with support policies (which are themselves metadata).  Resolution is an automated process whereby an identifier's binding is fetched and then used; it is especially useful for URL redirection.

 

3 Anatomy of a Digital Identifier String

 

Identifier strings, or names, are often constructed from left to right in increasing specificity.  Digital identifiers are (currently) embedded in URLs; the hostname part of the URL makes an identifier string actionable.  In general, the hostname part acts as a Name Mapping Authority (NMA), providing an opinion (via a web server) about what the identifier is bound to.

 

After the NMA there is an explicit or implicit identifier scheme name, such as ARK, Handle, DOI, or URN.  After the scheme name there usually appears the Name Assigning Authority (NAA), which asserted an early (allegedly the first) opinion about what the string was bound to.  After that comes the name that the NAA assigned.

 

The assigned name itself has structure, beginning with a shoulder prefix.
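
The left-to-right anatomy described above can be sketched in a few lines of Python.  This is only an illustration: the URL layout http://NMA/scheme:/NAA/name, the hostname example.org, and the names IdentifierParts and split_identifier_url are assumptions made for the example, not part of this specification.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class IdentifierParts:
    nma: str     # Name Mapping Authority: the URL hostname
    scheme: str  # identifier scheme name, e.g. "ark"
    naa: str     # Name Assigning Authority
    name: str    # the name the NAA assigned (shoulder + blade)


def split_identifier_url(url: str) -> IdentifierParts:
    """Split a URL of the assumed form http://NMA/scheme:/NAA/name."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # The scheme name is everything before the first colon in the path.
    scheme, sep, rest = path.partition(":")
    if not sep:
        raise ValueError("no explicit scheme name found in the URL path")
    # What follows is the NAA, a slash, and then the assigned name.
    naa, sep, name = rest.lstrip("/").partition("/")
    if not sep or not name:
        raise ValueError("expected NAA/name after the scheme name")
    return IdentifierParts(parts.hostname or "", scheme.lower(), naa, name)


parts = split_identifier_url("http://example.org/ark:/12345/xt1abc")
print(parts.nma, parts.scheme, parts.naa, parts.name)
```

Only explicit scheme names are handled here; an implicit scheme would need out-of-band knowledge about the NMA.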

 

Terms:

“ARK Namespace” (internal-only: “Bowspace”) populated by NAANs

Shoulderspace populated by ARK prefixes

Blade space = ARK identifier minus Shoulder

Tip = Check digit for entire ARK (optional; covers Bow+Blade)

Local Name = Shoulder + Blade (includes tip, which may be a check digit)

 

Configuration options when setting up minter:

  • NAAN
  • Shoulder/prefix
  • Shoulder: 1st char is a-z; variable length
  • Blade: random vs. sequential string
  • Blade: infinite vs. set length
  • Blade: define pattern: for each char, extended (a-z, 0-9) or normal (0-9)
  • Tip: check digit Y/N
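
As a rough sketch of how these options might fit together, the Python below models a minter configuration and a toy minter.  Everything named here is hypothetical: MinterConfig, Minter, the 'e'/'d' pattern notation, the ark:/NAAN/name output form, and in particular check_char, which is a simple position-weighted checksum chosen for illustration rather than any documented check-digit algorithm.  Only fixed-length blades are covered.

```python
import random
import string
from dataclasses import dataclass

DIGITS = string.digits                             # "normal" position: 0-9
EXTENDED = string.ascii_lowercase + string.digits  # "extended" position: a-z, 0-9


@dataclass
class MinterConfig:
    naan: str
    shoulder: str            # first char a-z, variable length
    pattern: str = "eeee"    # hypothetical notation: 'e' extended, 'd' digit
    sequential: bool = True  # False means random blades
    tip: bool = False        # append a check character


def check_char(text: str) -> str:
    """Illustrative position-weighted checksum; not a documented algorithm."""
    total = sum(i * EXTENDED.index(c)
                for i, c in enumerate(text, start=1) if c in EXTENDED)
    return EXTENDED[total % len(EXTENDED)]


class Minter:
    def __init__(self, config: MinterConfig, seed=None):
        if not config.shoulder or config.shoulder[0] not in string.ascii_lowercase:
            raise ValueError("shoulder must start with a-z")
        if not config.pattern or set(config.pattern) - {"e", "d"}:
            raise ValueError("pattern must consist of 'e' and 'd'")
        self.config = config
        self.alphabets = [EXTENDED if c == "e" else DIGITS for c in config.pattern]
        self.capacity = 1
        for alphabet in self.alphabets:
            self.capacity *= len(alphabet)
        self.issued = set()            # blade numbers already handed out
        self.rng = random.Random(seed)

    def _blade(self, n: int) -> str:
        # Mixed-radix encoding of n, one character per pattern position.
        chars = []
        for alphabet in reversed(self.alphabets):
            n, r = divmod(n, len(alphabet))
            chars.append(alphabet[r])
        return "".join(reversed(chars))

    def mint(self) -> str:
        if len(self.issued) >= self.capacity:
            raise RuntimeError("blade space exhausted")
        if self.config.sequential:
            n = len(self.issued)
        else:
            n = self.rng.randrange(self.capacity)
            while n in self.issued:
                n = self.rng.randrange(self.capacity)
        self.issued.add(n)
        body = f"{self.config.naan}/{self.config.shoulder}{self._blade(n)}"
        if self.config.tip:
            body += check_char(body)   # tip covers NAAN, shoulder, and blade
        return "ark:/" + body


minter = Minter(MinterConfig(naan="12345", shoulder="xt1", pattern="dd"))
print(minter.mint(), minter.mint())
```

Tracking issued blade numbers in memory keeps the sketch short; a real minter would need persistent state so that a restart cannot reissue a string.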

 

Best practices for shoulderspace:

  • Whenever a shoulder prefix is used, its constant leading sequence cannot form part of another shoulder prefix
    • E.g., if “xt1” is already defined as a shoulder, “xt” cannot be used as another shoulder (without significant extra effort and risk in minting xt… ids that don’t begin xt1…).
  • Strongly recommend using three char length
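
The prefix rule above could be checked mechanically before a new shoulder is registered.  The following is a minimal sketch; the function name conflicting_shoulders is hypothetical, and checking in both directions (the candidate being a prefix of an existing shoulder, or the reverse) is one reading of the rule rather than something this document states outright.

```python
def conflicting_shoulders(candidate: str, existing: list[str]) -> list[str]:
    """Return existing shoulders that share a leading sequence with candidate.

    A conflict means one string is a prefix of the other, so ids minted
    under one shoulder could be mistaken for ids under the other.
    """
    return [s for s in existing
            if s.startswith(candidate) or candidate.startswith(s)]


existing = ["xt1", "ab2"]
print(conflicting_shoulders("xt", existing))   # "xt" collides with "xt1"
print(conflicting_shoulders("xt2", existing))  # no collision
```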

 

All idempotent/safe services could be run in a distributed mode, but idempotent/unsafe services would have to be coordinated between instances, or run only in a single instance.
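
To make the distinction concrete, the sketch below tabulates the idempotent and safe flags that this document gives for each method under Abstract Methods, and derives which methods could run on any instance without coordination.  The names METHOD_FLAGS and can_run_uncoordinated are hypothetical.  The text does not say how the one non-idempotent method (Mint-Identifier) should be deployed; the sketch assumes it needs coordination as well.

```python
# (idempotent, safe) per method, as flagged in the method listing.
METHOD_FLAGS = {
    "Get-Service-State":          (True,  True),
    "Get-Namespace-State":        (True,  True),
    "Get-Identifier-State":       (True,  True),
    "Add-Namespace":              (True,  False),
    "Mint-Identifier":            (False, False),
    "Bind-Identifier-Referent":   (True,  False),
    "Resolve-Identifier":         (True,  True),
    "Delete-Namespace":           (True,  False),
    "Delete-Identifier-Referent": (True,  False),
    "Deactivate-Identifier":      (True,  False),
}


def can_run_uncoordinated(method: str) -> bool:
    """True if the method is both idempotent and safe."""
    idempotent, safe = METHOD_FLAGS[method]
    return idempotent and safe


distributed = sorted(m for m in METHOD_FLAGS if can_run_uncoordinated(m))
print(distributed)
```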

 

Separate binding, minting, and resolving services may be realized together on one host/database or combinations of hosts.

 

Binding is a very general operation that can be done inside metadata records, in bookmark files, by saving an id on a title page, etc.  Binding for the purpose of fast resolution should be done into the same database that drives the resolver; however, the binding interface may/should live at a hostname different from the resolver hostname, which appears in the published URL that embeds the id.

 

In general,

 

 

4 Abstract Methods

 

Identity functions are implemented via minters and resolvers.  An overall service instantiation, S, has

 

  (a) a command line interface,

  (b) a RESTful (URL-queryable) interface, and

  (c) various language bindings.

 

Shoulder-spaces (one per "shoulder" prefix, the fixed chars after the NAAN and before the generated chars) are determined by id "templates".  In this way, adding chars (extending id length) to a template does not create a new shoulder-space, even though it does create a new blade-space.

 

The methods are listed next.

 

Get-Service-State ():

 

Retrieve global state information about S, including:

 

  Globally-unique identifier of the service instantiation

  Enumeration of all supported shoulder spaces flagged as minter or resolver

[idempotent / safe]

 

Get-Namespace-State (namespace-identifier):

 

Retrieve state information about a namespace, including:

 

  Creation date

  Namespace syntactic rules

  Enumeration of all namespace identifiers

[idempotent / safe]

 

Get-Identifier-State (identifier):

 

Retrieve state information about an identifier, including:

 

  Creation date

  Modification date

  Enumeration of all identifier referents (typed name/value pairs)

[idempotent / safe]

 

Add-Namespace (namespace-identifier, rules):

 

Define a new, or re-define an existing, namespace with respect to its syntactic rules for minting.

 

[idempotent / unsafe]

 

Mint-Identifier (namespace-identifier):

 

Mint a new identifier in a namespace.

 

[non-idempotent / unsafe]

 

Bind-Identifier-Referent (identifier, name, type, value):

 

Bind a new, or re-bind an existing, named referent (typed value) to an identifier for the purpose of resolution.

 

[idempotent / unsafe]

 

Resolve-Identifier (identifier, name):

 

Retrieve the typed value of a named referent.

 

[idempotent / safe]

 

Delete-Namespace (namespace-identifier):  (De-activate?)

 

Delete a namespace definition.  Note that this has no effect on identifiers existing in that namespace.

 

[idempotent / unsafe]

 

Delete-Identifier-Referent (identifier, name):

 

Delete an identifier referent.

 

[idempotent / unsafe]

 

Deactivate-Identifier (identifier):

 

Deactivate an identifier from subsequent resolution.

Resolution requests will generate an informative error response.

 

[idempotent / unsafe]
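
The binding, resolution, and deactivation methods above can be illustrated with a small in-memory sketch in Python.  It is not a reference implementation: the class and exception names are hypothetical, the dictionary stands in for whatever database drives the resolver, and two behaviors are assumptions the method descriptions leave open (binding to an unknown identifier creates its record, and deleting a missing referent is a silent no-op).

```python
from datetime import datetime, timezone


class IdentifierInactive(Exception):
    """Raised when a deactivated identifier is resolved."""


class InMemoryIdentityService:
    def __init__(self):
        self._records = {}  # identifier -> state record

    def bind_identifier_referent(self, identifier, name, type_, value):
        now = datetime.now(timezone.utc)
        # Assumption: binding to an unknown identifier creates its record.
        record = self._records.setdefault(
            identifier,
            {"created": now, "modified": now, "active": True, "referents": {}},
        )
        record["referents"][name] = (type_, value)  # re-binding overwrites
        record["modified"] = now

    def resolve_identifier(self, identifier, name):
        record = self._records[identifier]
        if not record["active"]:
            raise IdentifierInactive(f"{identifier} has been deactivated")
        return record["referents"][name]

    def delete_identifier_referent(self, identifier, name):
        record = self._records.get(identifier)
        if record is not None:
            record["referents"].pop(name, None)  # repeating the call is harmless

    def deactivate_identifier(self, identifier):
        record = self._records.get(identifier)
        if record is not None:
            record["active"] = False

    def get_identifier_state(self, identifier):
        record = self._records[identifier]
        return {
            "created": record["created"],
            "modified": record["modified"],
            "referents": dict(record["referents"]),
        }


service = InMemoryIdentityService()
service.bind_identifier_referent(
    "ark:/12345/xt1abc", "target", "url", "http://example.org/object")
print(service.resolve_identifier("ark:/12345/xt1abc", "target"))
```

Repeating a bind, delete, or deactivate call with the same arguments leaves the stored referents and active flag unchanged, which matches the idempotent / unsafe flags on those methods (only the modification date moves on a repeated bind).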