Search
Service API
Digital Discovery System (DDS)
NCAR Library DDS Repository (live)

Search API Documentation

Service version: DDSWS v1.1

Document last revised: $Date: 2012/09/26 22:54:04 $

Overview

The Search API uses a REST-RPC hybrid approach to accept requests expressed as HTTP argument/value pairs and respond with structured data in XML or JSON format. Search requests operate over a Lucene index of terms. The API is available from the Digital Discovery System (DDS), the Digital Collection System (DCS), and the NSDL Collection System (NCS).

Table of Contents

  1. Overview
  2. Definitions and concepts
  3. Service requests
  4. Service responses
  5. Search fields
  6. Example search queries
  7. Configure search fields, facets, and relationships

Definitions and concepts

The Search API uses a REST-RPC hybrid approach to accept requests expressed as HTTP argument/value pairs. Requests may be made using the HTTP GET or POST method, which behave identically and differ only in the length of the request allowed (GET has a limited request length whereas POST is unlimited). Responses are returned in XML or JSON format (XML by default) and vary in structure and content depending on the request, as shown below in the examples section of this document.

  • Base URL - the base URL used to access the Web service. This is the portion of the request that precedes the request arguments. For example http://nldr.library.ucar.edu/dds/services/ddsws1-1.
  • Request arguments - the argument=value pairs that make up the request and follow the base URL.
  • Response envelope - the XML container used to return data. This container returns different types of data depending on the request made.

HTTP request format

The format of the request consists of the base URL followed by the ? character followed by one or more argument=value pairs, which are separated by the & character. Each request must contain one verb=request pair, where verb is the literal string 'verb' and request is one of the API request strings defined below. All arguments must be encoded using the syntax rules for URIs. This is the same encoding scheme that is described by the OAI-PMH.

Service requests

This section defines the available requests, or verbs.

The HTTP request format has the following structure:
[base URL]?verb=request[&additional arguments].

For example:
http://nldr.library.ucar.edu/dds/services/ddsws1-1?
        verb=GetRecord&id=DLESE-000-000-000-001
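
As a rough illustration of the request structure above, the following Python sketch assembles a request URL from a verb and its arguments and applies URL encoding. The helper name build_request_url is hypothetical and not part of the service; only the base URL and the argument names come from this document.

```python
from urllib.parse import urlencode

# Base URL taken from the example in this document.
BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_request_url(verb, **arguments):
    """Return base URL + '?' + URL-encoded argument=value pairs.

    Hypothetical client-side helper, not part of the service itself.
    """
    pairs = [("verb", verb)]
    for name, value in arguments.items():
        # A list or tuple is emitted as a repeated argument (e.g. several ky values).
        values = value if isinstance(value, (list, tuple)) else [value]
        pairs.extend((name, str(v)) for v in values)
    return BASE_URL + "?" + urlencode(pairs)


url = build_request_url("GetRecord", id="DLESE-000-000-000-001")
print(url)
```

Argument names that contain a dot, such as response.mode, cannot be written as Python keywords; they can be passed with dictionary unpacking, for example build_request_url("Search", **{"response.mode": "allOff"}).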

Summary of available requests:

Search - Search across items in the repository using Lucene queries and get a list of matching records.

GetRecord - Get a single record from the repository by ID.

ListFields - Get a list of the search fields in the index.

ListTerms - Get a list of the terms in a given search field or fields.

ListCollections - Get a list of the collections in the repository.

ListXmlFormats - Get a list of the available XML formats from the service.

UrlCheck - Check whether a given resource URL or URLs exists in the repository.

ServiceInfo - Get information about the service end-point and index version.


Search

Summary and usage

The Search request allows a client to search across items in the repository using standard Lucene queries and get a list of matching records. The Search index is composed of search fields, and through the use of query clauses, can be used to apply custom search rank algorithms (see example search queries). The request also provides faceted search, sorting, searching by XML format, date ranges, geospatial bounding box search, and other functionality.

The Search response consists of an ordered set of metadata records, sorted by relevancy. The Search request searches over all XML formats that are available in the repository, unless otherwise specified in the 'xmlFormat' argument as described below. Flow control is managed by the client, which may 'page through' a set of results using the 's' and 'n' arguments as described below.

The Search request accepts queries supplied in the standard Lucene Query Syntax. Lucene supports advanced Information Retrieval query clauses such as term and field boosting, wildcard and fuzzy searches, etc. Queries are supplied in the q argument of the request.

Sample request

The following request performs a search for the term "ocean" and returns 10 search results, starting at position 0:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=Search&q=ocean&s=0&n=10&client=ddsws-documentation

Arguments

Textual and fielded searches: The following argument is used to conduct textual and fielded searches and may be performed independently or in combination with other search criteria described below.

  • q - (query) an optional argument that may contain plain text and/or any query feature supported by the Lucene Query Syntax including field/term specifiers, boolean operators, boosting, range searches, and others. When no field is specified, search operates over the default field (the default field for this repository is default). See available search fields for detailed information about the fields that are available for searching.

Search by collection(s): Records in the repository are grouped into collections. The collection key argument limits the search to one or more collections. If one or more collection key arguments are included with no other search criteria, the search will return all records in the given collection(s). The available collections and their corresponding collection keys may be discovered using the ListCollections request.

  • ky - (collection key) an optional repeatable argument that limits the search to records that reside in the given collection(s).

Date range searches: The following arguments instruct the service to search in a given index date field and may be performed independently or in combination with other search criteria. The values provided in the fromDate or toDate arguments must be a union date type string of the form yyyy-MM-dd or an ISO8601 UTC datestamp of the form yyyy-MM-ddTHH:mm:ssZ. Example dates include 2004-07-08 or 2004-07-26T21:58:25Z. The fields that are available for searching by date are listed below. If supplied, the date range portion of the search criteria must match a given record in order for it to be included in the results.

  • dateField - an optional argument that indicates which index date field to search in. If supplied, one or both of either the fromDate or toDate arguments must be supplied.

  • fromDate - an optional argument that indicates a date range to search from. If supplied, the dateField argument must also be supplied.

  • toDate - an optional argument that indicates a date range to search to. If supplied, the dateField argument must also be supplied.
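
The dependencies between dateField, fromDate, and toDate can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python; the function name date_range_arguments is hypothetical, and the regular expression checks only the shape of the two date forms described above, not whether the date is a valid calendar date.

```python
import re

# Matches yyyy-MM-dd or yyyy-MM-ddTHH:mm:ssZ (shape only, no calendar validation).
_DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?$")


def date_range_arguments(date_field, from_date=None, to_date=None):
    """Return the date-range arguments as a dict (hypothetical helper)."""
    if from_date is None and to_date is None:
        # dateField requires at least one of fromDate or toDate.
        raise ValueError("dateField requires fromDate, toDate, or both")
    arguments = {"dateField": date_field}
    for name, value in (("fromDate", from_date), ("toDate", to_date)):
        if value is None:
            continue
        if not _DATE_PATTERN.match(value):
            raise ValueError(f"{name} must look like 2004-07-08 or 2004-07-26T21:58:25Z")
        arguments[name] = value
    return arguments


print(date_range_arguments("wndate", from_date="2004-07-08"))
```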

Geospatial searches: Geospatial searches operate over each record that has associated with it a geographic footprint (a geographic region representing the record's area of relevance) in the form of a box (defined below). A geospatial query takes a query region (also in the form of a box) and a spatial predicate (one of "within", "contains", or "overlaps") and returns all documents that 1) have a geographic footprint that 2) has the predicate relationship to the query region.

Formally, a box is a geographic region defined by north and south bounding coordinates (latitudes expressed in degrees north of the equator and in the range [-90,90]) and east and west bounding coordinates (longitudes expressed in degrees east of the Greenwich meridian and in the range [-180,180]). The north bounding coordinate must be greater than or equal to the south. The west bounding coordinate may be less than, equal to, or greater than the east; in the latter case, a box that crosses the ±180° meridian is described. As a special case, the set of all longitudes is described by a west bounding coordinate of -180 and an east bounding coordinate of 180.
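
To make the box definition concrete, the following Python sketch tests whether a point lies inside a box, including the case where the west coordinate is greater than the east and the box crosses the ±180° meridian. It illustrates the definition only; it is not the service's implementation, and the function name box_contains_point is hypothetical.

```python
def box_contains_point(north, south, west, east, lat, lon):
    """Return True if (lat, lon) falls inside the box, per the definition above."""
    if north < south:
        raise ValueError("north must be greater than or equal to south")
    if not south <= lat <= north:
        return False
    if west <= east:
        # Ordinary box: longitudes run eastward from west to east.
        return west <= lon <= east
    # west > east: the box wraps across the +/-180 degree meridian.
    return lon >= west or lon <= east


# A box spanning 170 E to 170 W contains the point at longitude 180 but not longitude 0.
print(box_contains_point(10, -10, 170, -170, 0, 180))
print(box_contains_point(10, -10, 170, -170, 0, 0))
```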

The following arguments instruct the service to conduct a geospatial query over the subset of records that contain a geospatial footprint. Geospatial queries may be performed independently or in combination with other search criteria. The five arguments geoPredicate, geoBBNorth, geoBBSouth, geoBBWest, and geoBBEast are conditionally required: to perform a geospatial query, all five must be included; otherwise, none of them may be included. The optional geoClause argument may be included if desired. If an error in the request arguments is encountered, the service will return an appropriate error response and message.

  • geoPredicate - a conditionally required argument that indicates the relationship to the query region. Values must be one of [ within | overlaps | contains ].

  • geoBBNorth - a conditionally required argument that indicates the northern most latitude of search. Values must be a floating point number in the range [-90,90].

  • geoBBSouth - a conditionally required argument that indicates the southern most latitude of search. Values must be a floating point number in the range [-90,90].

  • geoBBWest - a conditionally required argument that indicates the western most longitude of search. Values must be a floating point number in the range [-180,180].

  • geoBBEast - a conditionally required argument that indicates the eastern most longitude of search. Values must be a floating point number in the range [-180,180].

  • geoClause - an optional argument that indicates the boolean clause applied to the geospatial portion of the search. Values must be one of [ must | should ], where must indicates the geospatial portion of the search criteria must match a given record in order for it to be included in the results; should indicates it should match but is not required in order to appear in the search results. Default value is must.
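
The constraints on the geospatial arguments can be checked client-side before the request is issued. The Python sketch below is one way to do so under the rules stated above; the function name geo_arguments is hypothetical, and the service itself remains the authority on what it accepts.

```python
def geo_arguments(predicate, north, south, west, east, clause=None):
    """Validate and return the geospatial arguments as a dict (hypothetical helper)."""
    if predicate not in ("within", "overlaps", "contains"):
        raise ValueError("geoPredicate must be within, overlaps, or contains")
    for name, lat in (("geoBBNorth", north), ("geoBBSouth", south)):
        if not -90 <= lat <= 90:
            raise ValueError(f"{name} must be in the range [-90,90]")
    for name, lon in (("geoBBWest", west), ("geoBBEast", east)):
        if not -180 <= lon <= 180:
            raise ValueError(f"{name} must be in the range [-180,180]")
    if north < south:
        raise ValueError("geoBBNorth must be greater than or equal to geoBBSouth")
    # West may be greater than east: that describes a box crossing the +/-180 meridian.
    arguments = {
        "geoPredicate": predicate,
        "geoBBNorth": north,
        "geoBBSouth": south,
        "geoBBWest": west,
        "geoBBEast": east,
    }
    if clause is not None:
        if clause not in ("must", "should"):
            raise ValueError("geoClause must be must or should")
        arguments["geoClause"] = clause
    return arguments


print(geo_arguments("within", 41.0, 37.0, -109.0, -102.0, clause="should"))
```

The coordinates in the usage line are illustrative values chosen for this sketch, not values taken from the repository.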

Flow control: A search client can control the flow of paging through a set of search results and the size of the result set using the s (starting offset) and n (number returned) arguments. As an example, when a search is initially performed, the client might construct a request that supplies the arguments s=0 and n=10 to return up to the first 10 matching results. The client would then page through the set of results by issuing subsequent requests indicating s=10 and n=10 for the next ten results, s=20 and n=10 for results 20 through 29, and so forth up to totalNumResults. To retrieve each successive segment of search results, the client must supply identical search criteria in all search-related arguments (q, xmlFormat, gr, su, cs, re, so, etc.), including the sorting and date-restrictive arguments. DDS search is deterministic, and the set and order of search results are guaranteed to be identical for any two identical searches (assuming the repository has not changed in the interim). Thus the s and n arguments can be thought of as defining the 'window' onto the ordered set of search results that the client wants to see.

  • s - (starting offset) - a required argument that specifies the starting offset into the result set from which metadata records should be returned. May be any integer greater than or equal to 0.

  • n - (number returned) - a required argument that specifies the number of metadata records to return, beginning at the offset specified by s. Must be an integer from 0 to maxSearchResultsAllowed, as indicated in the response to the ServiceInfo request. If 0 is specified, no search results will be returned and only facet results and the header will be displayed. The maximum allowed by this server is 1000.
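
The paging scheme described above can be sketched as a small generator that produces the successive s and n values a client would send. This is an illustrative sketch: the function name page_windows is hypothetical, total_num_results stands for the totalNumResults value reported in the first response, and the default limit of 1000 is the maximum stated for this server.

```python
def page_windows(total_num_results, page_size=10, max_allowed=1000):
    """Yield (s, n) pairs that step through total_num_results records."""
    if not 1 <= page_size <= max_allowed:
        raise ValueError("page_size must be between 1 and max_allowed")
    for s in range(0, total_num_results, page_size):
        # n stays constant; the service returns fewer records on the last page.
        yield s, page_size


for s, n in page_windows(25):
    print(f"s={s}&n={n}")
# prints:
# s=0&n=10
# s=10&n=10
# s=20&n=10
```

All other search arguments would be repeated unchanged on each request so that every window is taken from the same ordered result set.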


Response content arguments: The following arguments instruct the service to return specified content in the response.

  • response - an optional argument that instructs the service to return specified content in the response. Available argument values include:
    • [score] - Returns the score element inside the <record> element. Score indicates the Lucene hit score for the matching record.
    • [head] (included in the response by default - see response.mode below for options) - Returns the head element inside the <record> element.
    • [metadata] (included in the response by default - see response.mode below for options) - Returns the metadata element inside the <record> element.
    • [collectionMetadata] - Returns the metadata that describes the collection in which the record resides inside the <record> element.
    • [allCollectionsMetadata] - Returns the metadata that describes all collections in which the resource (URL) resides inside the top-level <record> element only and not inside the nested relation response (see below).

  • response.mode - an optional argument that indicates which response elements are returned by default. Available argument values include:
    • [standardResponse] (default) - Returns the <head> and <metadata> elements inside each <record> returned.
    • [allOff] - Instructs the service to omit all standard response elements (<head>, <metadata>, <collectionMetadata>) from the response except those indicated in the response argument.
    • [allOn] - Instructs the service to include all standard response elements (<head>, <metadata>, <collectionMetadata>) in the response.

  • storedContent - an optional argument that instructs the service to return the given stored content from the index for each record returned. These appear in a <storedContent> element inside the <record> container. Inputs: a stored field name, for example /text//nsdl_dc/title.

  • storedContent.mode - an optional argument that indicates how the storedContent response will be returned. Available argument values include:
    • [singleRecord] (default) - Returns the given stored content for each record, including nested related records, inside each <record> element.
    • [multiRecord] - Returns the given stored content for all records that catalog the given resource inside the top-level <record> element only and not inside the nested relation response (see below).

  • relation - an optional argument that instructs the service to return the given related records for each record returned. These appear in a <relations> element nested inside the top-level <record> container. The relations that are available depend on the given DDS repository. Some common relation argument values include:
    • [alsoCatalogedBy] - Returns all records that catalog the given resource. The top level record indicates the best match for the given search query. All nested records indicate additional records that catalog the same resource (URL).
    • [annotatedBy] - Returns all annotation records associated with the resource. Annotation records contain user-contributed comments, star ratings, and other information that describes the resource.
    • [paradataProvidedBy] - Returns all paradata records associated with the resource. Paradata records contain summary data about how the resource was used across a group of users in a given context, for example visited 12 times, downloaded 4 times, favorited 8 times, etc.
    • Note that memberOfCollection is not supported here - use the response=collectionMetadata argument/value pair to request metadata for the collection in which each record resides (see above)

  • xmlFormat - an optional argument that indicates the format the records must be returned in. If specified, Search and GetRecord are limited to only those records that can be disseminated in the given format and the top-level <metadata> elements in the response will contain that format, transformed from their original native format if necessary. If not specified, the records will be returned in their native format. The available formats may be discovered using the ListXmlFormats request.

    Note that this argument includes all records that can be disseminated in the given format, which may include records that were originally cataloged in a different native format. To filter the Search request to include only those records that were originally cataloged in a given native format, add a query clause xmlFormat:formatSpecifier to the search query instead.


Additional arguments: The following arguments may also be supplied in the request.

  • disableFilter - an optional, repeatable argument that must contain the name of a global filter that has been defined for the repository. Global filters are an optional feature that may be applied to a given DDS repository. Normally, records that match the filter definition are omitted from service responses. The filter can be temporarily turned off (if allowed) by supplying this argument in the service request. Contact the repository administrator to see if any global filters are being used for the repository.

  • so - (search over) an optional argument that must contain the value allRecords or discoverableRecords. Clients that request to search over allRecords must be authorized by IP, otherwise an error is returned. Defaults to discoverableRecords.

  • client - an optional argument that may be supplied by the client to indicate where the request originated from. Example values might be ddsExamplesSearchClient or myLibrarySearchClient. When supplied, this information is used by the services administrators to help understand how people are using the service on a client-by-client basis.

Sorting the response: The following arguments instruct the service to sort the response by one or more index fields or relevancy score. The service sorts the entire result set lexically prior to returning the requested portion of the results. Only one of the sort arguments may be supplied in the request. If no sort argument is indicated, results are sorted in descending order by relevancy score. To use the contents of an element or attribute in the record XML for sorting, specify a keyword XPath search field. Any other field that exists in the index as a single token or keyword may also be used.

  • sort - an optional argument that instructs the service to sort the search results by one or more index fields or relevancy score. The sort argument must contain an ordered list of one or more comma-separated fields, with a directionality specifier (asc or desc) after each field. The name score must be used to indicate sorting by relevancy. For example "modtime asc, title desc, score desc" will sort first by modtime, then by title and finally by relevancy score. Fields are sorted as Strings unless 'score' is indicated. The default sort order is 'score desc'.

As a convenience, the following sort arguments may be used as a shorthand way to specify sorting by a single field only:

  • sortAscendingBy - an optional argument that instructs the service to sort the search results in ascending lexical order by a given index field.

  • sortDescendingBy - an optional argument that instructs the service to sort the search results in descending lexical order by a given index field.
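
A value for the sort argument can be assembled from an ordered list of field and direction pairs. The Python sketch below follows the format of the example given above ("modtime asc, title desc, score desc"); the function name sort_argument is hypothetical.

```python
def sort_argument(*keys):
    """Build a sort argument value from (field, direction) pairs, in priority order."""
    if not keys:
        raise ValueError("at least one (field, direction) pair is required")
    parts = []
    for field, direction in keys:
        if direction not in ("asc", "desc"):
            raise ValueError("direction must be asc or desc")
        parts.append(f"{field} {direction}")
    return ", ".join(parts)


print(sort_argument(("modtime", "asc"), ("title", "desc"), ("score", "desc")))
# prints: modtime asc, title desc, score desc
```

The resulting string contains spaces and commas, so it must be URL-encoded like any other argument value when it is placed in the request.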

 

Faceted search

The Search request supports faceting over categories that have been defined in the index. A category is an aspect of indexed documents which can be used to classify the documents. For example, in a collection of books at an online bookstore, categories of a book can be its price, author, publication date, binding type, and so on. A facet category may be flat or hierarchical, containing one or more levels in a taxonomy/hierarchy path.

In faceted search, in addition to the standard set of search results, the service also returns facet results, which are lists of subcategories for certain categories. For example, for the price facet, one might get a list of relevant price ranges; for the author facet, one might get a list of relevant authors; and so on. In most UIs, when users click one of these subcategories, the search is narrowed, or drilled down, and a new search limited to this subcategory (e.g., to a specific price range or author) is performed.

Include the following required and optional arguments to enable faceting features. These must be included in addition to the normal arguments for Search like q, n, s, ky, xmlFormat, etc. Facet counts apply across the records that match the given search criteria. Facet results (XML element <facetResults>) are returned at the top of the Search response.


To Find Available Facet Categories

To find the categories that are available for faceted search in the index, use the ListTerms request and request the special field named $facets. In each <term> element returned in the response, the first token listed indicates the facet category and subsequent tokens indicate subcategories. Facet subcategories may be flat (one level deep) or hierarchical (two or more levels deep):

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=ListTerms&field=$facets


To Retrieve Facet Counts

  • facet=on - (required) - Turns faceting on
  • facet.category=<facetCategory> - (required, repeatable) - Indicates the facet category you wish to collect facet totals for. The facetCategory token must be one of the available categories (listed above). Repeat this argument to return multiple categories if desired.
  • facet.maxResults=<Integer> - (optional) - Indicates the maximum number of results. Defaults to 10. This argument can be specified on a per-category basis with the syntax of f.<facetCategory>.maxResults (optional, repeatable)
  • facet.maxDepth=<Integer> - (optional) - Indicates the maximum depth to display down the facet hierarchy path. Defaults to 10. This argument can be specified on a per-category basis with the syntax of f.<facetCategory>.maxDepth (optional, repeatable)
  • facet.maxLabels=<Integer> - (optional) - Indicates the maximum number of labels to show in the results. Labels are expensive to calculate so response time can be improved by using a lower number here. If a label is not returned, the facetId can be used to determine the label, provided you've cached the list of facetIds and their corresponding labels in your client somewhere. The facetIds may change when the search index is updated. Defaults to 1000. This argument can be specified on a per-category basis with the syntax of f.<facetCategory>.maxLabels (optional, repeatable)
  • f.<facetCategory>.path=<hierarchy/path> - (optional, repeatable) - Indicates a position in the hierarchy path to start faceting from. This only applies if the given category has a hierarchy greater than one. If not used then results are returned starting at the root. The delimiter character used in the path must match the one that was specified in the config for the given category and that is reflected in the <facetResult> response. As an example, given the following facet taxonomy Book/Fiction/Science Fiction/Fahrenheit 451 where 'Book' represents the category and 'Fiction/Science Fiction/Fahrenheit 451' represents a possible path through the taxonomy and '/' is the delimiter character, f.Book.path=Fiction/Science Fiction would return facet results for Science Fiction and below, only.
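
The facet arguments above can be collected as a list of name and value pairs and appended to an ordinary Search request. The following Python sketch shows one way to do this; the function name facet_arguments is hypothetical, and the category Book is taken from the taxonomy example above rather than from any real repository (use ListTerms with the $facets field to find the actual categories).

```python
from urllib.parse import urlencode


def facet_arguments(categories, max_results=None, paths=None):
    """Return facet arguments as (name, value) pairs (hypothetical helper).

    categories: facet category names, one facet.category pair each.
    paths: optional dict mapping a category to its starting hierarchy path.
    """
    pairs = [("facet", "on")]
    for category in categories:
        pairs.append(("facet.category", category))
    if max_results is not None:
        pairs.append(("facet.maxResults", str(max_results)))
    for category, path in (paths or {}).items():
        pairs.append((f"f.{category}.path", path))
    return pairs


# n=0 asks for facet results and the header only, with no search results.
search = [("verb", "Search"), ("q", "ocean"), ("s", "0"), ("n", "0")]
facets = facet_arguments(["Book"], max_results=5, paths={"Book": "Fiction/Science Fiction"})
print(urlencode(search + facets))
```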

To Drill-down into a Facet

  • f.drilldown.category=<facetCategory> - (required, repeatable) - Indicates the facet category you wish to drill down into. Token must be the same as the one specified in the <facetResult> response. Repeat this argument to filter results by multiple facet categories (AND logic).
  • f.drilldown.<facetCategory>.path - (optional, repeatable) - Indicates the category path you wish to drill down into. Path must be the same as the one specified in the <facetResult> response. If omitted then the search results are filtered by the top-level category only.
  • It is not necessary to specify facet=on, but you can request facet counts again while doing a drill-down if desired.

The facet functionality in the Search API is implemented with the Lucene faceting library. For background information about this library and faceting in general, see the Faceted Search User's Guide.

 

Errors and exceptions

See error and exception conditions.

Examples

Request

Search for the word ocean.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
           verb=Search&q=ocean&s=0&n=10

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <Search>
    <resultInfo>
      <totalNumResults>520</totalNumResults>
      <numReturned>10</numReturned>
      <offset>0</offset>
    </resultInfo>
    <results>
      <record>
        <head>
          <id>DLESE-COLLECTION-000-000-000-018</id>
          <collection recordId="DLESE-COLLECTION-000-000-000-012">
              Science Ed Resource Center (SERC)</collection>
          <xmlFormat nativeFormat="dlese_collect">dlese_collect</xmlFormat>
          <fileLastModified>2004-03-29T20:44:41Z</fileLastModified>
          <whatsNewDate type="collection">2004-03-29</whatsNewDate>
          <additionalMetadata realm="dlese_collect">
            <formatOfRecords>adn</formatOfRecords>
            <isEnabled>true</isEnabled>
            <numRecords>325</numRecords>
            <numRecordsIndexed>324</numRecordsIndexed>
            <partOfDrc>false</partOfDrc>
          </additionalMetadata>
        </head>
        <metadata>
          <collectionRecord>
            <general>
              <fullTitle>Carleton College Science Education 
               Resource Center (SERC) - Starting Point Entry 
               Level Geoscience Collection
              </fullTitle>
              ...

</DDSWebService>
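
A client would typically read the <resultInfo> values and the record IDs out of a response like the one above. The sketch below does this with Python's standard xml.etree.ElementTree module. The embedded SAMPLE_RESPONSE is a trimmed, hand-completed version of the example (it contains one record and closes all elements), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a real response, based on the example in this document.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<DDSWebService>
  <Search>
    <resultInfo>
      <totalNumResults>520</totalNumResults>
      <numReturned>10</numReturned>
      <offset>0</offset>
    </resultInfo>
    <results>
      <record>
        <head>
          <id>DLESE-COLLECTION-000-000-000-018</id>
        </head>
      </record>
    </results>
  </Search>
</DDSWebService>
"""


def parse_result_info(xml_bytes):
    """Return the <resultInfo> children as a dict of integers."""
    root = ET.fromstring(xml_bytes)
    info = root.find("Search/resultInfo")
    return {child.tag: int(child.text) for child in info}


def record_ids(xml_bytes):
    """Return the <id> of every <record> in the response, in order."""
    root = ET.fromstring(xml_bytes)
    return [record.findtext("head/id") for record in root.iterfind("Search/results/record")]


print(parse_result_info(SAMPLE_RESPONSE))
print(record_ids(SAMPLE_RESPONSE))
```

The totalNumResults value obtained this way is what a client would use to decide how many further s and n windows to request.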

Request

Search for the word ocean and limit the search to grade range High (9-12).

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
           verb=Search&q=ocean&gr=02&s=0&n=10

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <Search>
    <resultInfo>
      <totalNumResults>208</totalNumResults>
      <numReturned>10</numReturned>
      <offset>0</offset>
    </resultInfo>
    <results>
      <record>
        <head>
          <id>NASA-Edmall-2315</id>
          <collection recordId="DLESE-COLLECTION-000-000-000-014">
             NASA ED Mall Collection</collection>
          <xmlFormat nativeFormat="adn">adn</xmlFormat>
          <fileLastModified>2004-06-17T18:24:10Z</fileLastModified>
          <whatsNewDate type="itemnew">2003-07-29</whatsNewDate>
          <additionalMetadata realm="adn">
            <accessionStatus>
               accessioneddiscoverable
            </accessionStatus>
            <partOfDrc>false</partOfDrc>
          </additionalMetadata>
        </head>
        <metadata>
          <itemRecord>
            <general>
              <title>Coriolis Force</title>
              ...

</DDSWebService>

Request

Search for all ADN records new to the repository since July 7th, 2004 and sort descending by the wndate field.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
verb=Search&s=0&n=10&fromDate=2004-07-08&dateField=wndate
&sortDescendingBy=wndate&xmlFormat=adn-localized

Response

Same format as above.


GetRecord

Summary and usage

The GetRecord request is used to retrieve a single record from the repository in one of the available XML formats.

Sample request

The following request returns the metadata for record ID DLESE-000-000-000-001 in its native XML format:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=GetRecord&id=DLESE-000-000-000-001

Arguments

  • id - a required argument that specifies the identifier for the record.

  • disableFilter - an optional, repeatable argument that must contain the name of a global filter that has been defined for the repository. Global filters are an optional feature that may be applied to a given DDS repository. Normally, records that match the filter definition are omitted from service responses. The filter can be temporarily turned off (if allowed) by supplying this argument in the service request. Contact the repository administrator to see if any global filters are being used for the repository.

  • so - (search over) an optional argument that must contain the value allRecords or discoverableRecords. Clients that request to search over allRecords must be authorized by IP, otherwise an error is returned. Defaults to discoverableRecords.


Response content arguments: The following arguments instruct the service to return specified content in the response.

  • response - an optional argument that instructs the service to return specified content in the response. Available argument values include:
    • [score] - Returns the score element inside the <record> element. Score indicates the Lucene hit score for the matching record.
    • [head] (included in the response by default - see response.mode below for options) - Returns the head element inside the <record> element.
    • [metadata] (included in the response by default - see response.mode below for options) - Returns the metadata element inside the <record> element.
    • [collectionMetadata] - Returns the metadata that describes the collection in which the record resides inside the <record> element.
    • [allCollectionsMetadata] - Returns the metadata that describes all collections in which the resource (URL) resides inside the top-level <record> element only and not inside the nested relation response (see below).

  • response.mode - an optional argument that indicates which response elements are returned by default. Available argument values include:
    • [standardResponse] (default) - Returns the <head> and <metadata> elements inside each <record> returned.
    • [allOff] - Instructs the service to omit all standard response elements (<head>, <metadata>, <collectionMetadata>) from the response except those indicated in the response argument.
    • [allOn] - Instructs the service to include all standard response elements (<head>, <metadata>, <collectionMetadata>) in the response.

  • storedContent - an optional argument that instructs the service to return the given stored content from the index for each record returned. These appear in a <storedContent> element inside the <record> container. Inputs: a stored field name, for example /text//nsdl_dc/title.

  • storedContent.mode - an optional argument that indicates how the storedContent response will be returned. Available argument values include:
    • [singleRecord] (default) - Returns the given stored content for each record, including nested related records, inside each <record> element.
    • [multiRecord] - Returns the given stored content for all records that catalog the given resource inside the top-level <record> element only and not inside the nested relation response (see below).

  • relation - an optional argument that instructs the service to return the given related records for each record returned. These appear in a <relations> element nested inside the top-level <record> container. The relations that are available depend on the given DDS repository. Some common relation argument values include:
    • [alsoCatalogedBy] - Returns all records that catalog the given resource. The top level record indicates the best match for the given search query. All nested records indicate additional records that catalog the same resource (URL).
    • [annotatedBy] - Returns all annotation records associated with the resource. Annotation records contain user-contributed comments, star ratings, and other information that describes the resource.
    • [paradataProvidedBy] - Returns all paradata records associated with the resource. Paradata records contain summary data about how the resource was used across a group of users in a given context, for example visited 12 times, downloaded 4 times, favorited 8 times, etc.
    • Note that memberOfCollection is not supported here - use the response=collectionMetadata argument/value pair to request metadata for the collection in which each record resides (see above)

  • xmlFormat - an optional argument that indicates the format the records must be returned in. If specified, Search and GetRecord are limited to only those records that can be disseminated in the given format and the top-level <metadata> elements in the response will contain that format, transformed from their original native format if necessary. If not specified, the records will be returned in their native format. The available formats may be discovered using the ListXmlFormats request.

    Note that this argument includes all records that can be disseminated in the given format, which may include records that were originally cataloged in a different native format. To filter the Search request to include only those records that were originally cataloged in a given native format, add a query clause xmlFormat:formatSpecifier to the search query instead.
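
The optional arguments above can be combined in a single request. The following Python sketch builds a GetRecord URL with the relation and xmlFormat arguments; the helper name build_get_record_url is hypothetical, while the base URL, the relation value alsoCatalogedBy and the format key oai_dc are taken from this document. urlencode applies the required URI encoding to each value.

```python
from urllib.parse import urlencode

BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_get_record_url(record_id, relation=None, xml_format=None):
    """Build a GetRecord request URL (hypothetical helper, not part of the service)."""
    params = [("verb", "GetRecord"), ("id", record_id)]
    if relation is not None:
        params.append(("relation", relation))
    if xml_format is not None:
        params.append(("xmlFormat", xml_format))
    # urlencode percent-encodes each value as required for URIs.
    return BASE_URL + "?" + urlencode(params)


url = build_get_record_url(
    "DLESE-000-000-000-337", relation="alsoCatalogedBy", xml_format="oai_dc"
)
print(url)
```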


Errors and exceptions

See error and exception conditions.

Examples

Request

Request the record id DLESE-000-000-000-337 and get the response in ADN format. Shown without the required encoding, for clarity.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
        verb=GetRecord&id=DLESE-000-000-000-337

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <GetRecord>
    <record>
      <head>
        <id>DLESE-000-000-000-337</id> 
        <collection recordId="DLESE-COLLECTION-000-000-000-015">
          DLESE Community Collection (DCC)</collection> 
        <xmlFormat nativeFormat="adn">adn</xmlFormat> 
        <fileLastModified>2004-06-24T19:06:08Z</fileLastModified> 
        <whatsNewDate type="itemnew">2003-07-10</whatsNewDate> 
        <additionalMetadata realm="adn">
          <accessionStatus>accessioneddiscoverable</accessionStatus> 
          <partOfDrc>true</partOfDrc> 
          <alsoCatalogedBy collectionLabel="NASA ESE Reviewed 
             Collection" 
             collectionRecordId="DLESE-COLLECTION-000-000-000-023">
                  NASA-ESERevProd333</alsoCatalogedBy> 
        </additionalMetadata>
      </head>
      <metadata>
        <itemRecord>
          <general>
            <title>Earth Science Picture of the Day</title> 
            ...

</DDSWebService>


ListFields

Summary and usage

The ListFields request is used to get all search fields that reside in the index.

Sample request

The following request lists all fields in the index:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=ListFields

Arguments

None


Errors and exceptions

See error and exception conditions.

Examples

See link above

ListTerms

Summary and usage

The ListTerms request is used to get all search terms that exist in the index for a given field or fields. For each term the response indicates the number of times it appears in the index (termCount) as well as the number of documents (records) it appears in (docCount).

Sample request

The following request lists all terms in the index for field 'title':

http://nldr.library.ucar.edu/dds/services/ddsws1-1?field=title&verb=ListTerms

Arguments

  • field - a required repeatable argument that contains the name of a field. The field argument may be repeated as many times as desired within a single request. Note that response times will increase dramatically when more than one field is requested.
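
Because field is repeatable, a client passes it once per field name. A minimal Python sketch, assuming a hypothetical helper named build_list_terms_url (the second field name, description, is one of the standard search fields listed later in this document):

```python
from urllib.parse import urlencode

BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_list_terms_url(fields):
    """Build a ListTerms URL with one field=... pair per requested field."""
    if not fields:
        raise ValueError("ListTerms requires at least one field")
    params = [("verb", "ListTerms")] + [("field", name) for name in fields]
    return BASE_URL + "?" + urlencode(params)


url = build_list_terms_url(["title", "description"])
print(url)
```

Given the note above about response times, requesting one field per call may be preferable when many fields are needed.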


Errors and exceptions

See error and exception conditions.

Examples

See link above

ListCollections

Summary and usage

The ListCollections request is used to discover the collections in the repository, collection metadata and the collection keys used in the Search request. Clients should use this request to generate user interface widgets for selecting collections to search from, and to display collection information and metadata to users.

Sample request

The following request lists the collections that are in the repository and all available metadata about each collection:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=ListCollections&response=collectionMetadata

Arguments

  • response - an optional argument that instructs the service to return the collection metadata in the response, which is returned in a <collectionMetadata> element for each collection. Argument value must be: [collectionMetadata]

Errors and exceptions

See error and exception conditions.

Examples

See link above


ListXmlFormats

Summary and usage

The ListXmlFormats request is used to discover the XML formats that are available from the repository as a whole or for a single record in the repository. Clients should use this request to discover the available XML formats and the keys that may be supplied in the 'xmlFormat' argument of the Search or GetRecord requests.

The Service is able to disseminate any number of XML formats depending on the record collections that reside in the repository. Some common formats include OAI Dublin Core (oai_dc), NSDL Dublin Core (nsdl_dc), DLESE collection (dlese_collect), ADN (ADEPT/DLESE/NASA) (adn), News&Opps (news_opps), and DLESE annotation (dlese_anno).

Certain records may be disseminated in multiple alternative formats. For example, records that were originally cataloged in the ADN format may also be returned in the oai_dc, nsdl_dc, and other formats. When a record is requested in a non-native format, its XML is transformed to the requested format using XSLT or another transformation prior to being returned by the service.

Sample request

The following request lists the XML formats that may be disseminated from this service and their corresponding search keys:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=ListXmlFormats

Arguments

  • id - an optional argument that specifies a record ID in the repository. If supplied, the request will show only those XML formats that are available for the given record. If omitted, the response will indicate all XML formats that are available in the repository.

Errors and exceptions

See error and exception conditions.

Examples

Request

Show all XML formats available for ID DLESE-000-000-000-001.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
             verb=ListXmlFormats&id=DLESE-000-000-000-001

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <ListXmlFormats>
    <xmlFormat>adn</xmlFormat>
    <xmlFormat>adn-localized</xmlFormat>
    <xmlFormat>briefmeta</xmlFormat>
    <xmlFormat>nsdl_dc</xmlFormat>
    <xmlFormat>oai_dc</xmlFormat>
  </ListXmlFormats>
</DDSWebService>
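
A response like the one above can be reduced to a list of format keys with a few lines of Python. This is a sketch using the standard library XML parser; the function name list_xml_formats is hypothetical, and the sample string simply repeats the response shown above. Namespace prefixes are stripped from tag names so the same code works whether or not the response carries a default namespace.

```python
import xml.etree.ElementTree as ET

SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8" ?>
<DDSWebService>
  <ListXmlFormats>
    <xmlFormat>adn</xmlFormat>
    <xmlFormat>adn-localized</xmlFormat>
    <xmlFormat>briefmeta</xmlFormat>
    <xmlFormat>nsdl_dc</xmlFormat>
    <xmlFormat>oai_dc</xmlFormat>
  </ListXmlFormats>
</DDSWebService>"""


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name, if present."""
    return tag.rsplit("}", 1)[-1]


def list_xml_formats(xml_text):
    """Return the format keys found in a ListXmlFormats response."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "xmlFormat" and element.text
    ]


formats = list_xml_formats(SAMPLE_RESPONSE)
print(formats)
```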


UrlCheck

Summary and usage

The UrlCheck request is used to check whether a given URL is in the DDS repository. This request supports the use of the * wildcard construct. The * character, or wildcard construct, indicates that any character combination is a valid match. For example, a search for http://www.dlese.org/myResource* will match the two URLs http://www.dlese.org/myResource1.html and http://www.dlese.org/myResource2.html. The wildcard construct may appear at any position in the URL argument except the first position.

Sample request

The following request searches for all records in the repository that have a URL ending in '.pdf':

http://nldr.library.ucar.edu/dds/services/ddsws1-1?url=http://*.pdf&verb=UrlCheck

Arguments

  • url - a required repeatable argument that contains a URL. The url argument may be repeated as many times as desired within a single request.

  • disableFilter - an optional, repeatable argument that must contain the name of a global filter that has been defined for the repository. Global filters are an optional feature that may be applied to a given DDS repository. Normally, records that match the filter definition are omitted from service responses. The filter can be temporarily turned off (if allowed) by supplying this argument in the service request. Contact the repository administrator to see if any global filters are being used for the repository.
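
URLs passed in the url argument contain characters such as ':' and '/' that must be encoded. The following Python sketch builds a UrlCheck request for several URLs at once; build_url_check_url is a hypothetical helper name. Note that urlencode also percent-encodes the * wildcard as %2A, which decodes back to * on the server side.

```python
from urllib.parse import urlencode

BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_url_check_url(urls):
    """Build a UrlCheck request with one url=... pair per URL or URL pattern."""
    if not urls:
        raise ValueError("UrlCheck requires at least one url")
    params = [("verb", "UrlCheck")] + [("url", value) for value in urls]
    return BASE_URL + "?" + urlencode(params)


url = build_url_check_url(["http://epod.usra.edu/", "http://www.dlese.org*"])
print(url)
```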

Errors and exceptions

See error and exception conditions.

Examples

Request

Determine whether the URL 'http://epod.usra.edu/' is in the repository. Shown without the required encoding, for clarity.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
    verb=UrlCheck&url=http://epod.usra.edu/

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>1</totalNumResults> 
    </resultInfo>
    <results>
      <matchingRecord>
        <url>http://epod.usra.edu/</url> 
        <head>
          <id>DLESE-000-000-000-337</id> 
          <collection recordId="DLESE-COLLECTION-000-000-000-015">
            DLESE Community Collection (DCC)</collection> 
          <xmlFormat nativeFormat="adn">adn</xmlFormat> 
          <fileLastModified>2004-06-24T19:06:08Z</fileLastModified> 
          <whatsNewDate type="itemnew">2003-07-10</whatsNewDate> 
          <additionalMetadata realm="adn">
            <accessionStatus>accessioneddiscoverable</accessionStatus> 
            <partOfDrc>true</partOfDrc> 
            <alsoCatalogedBy collectionLabel="NASA ESE 
                 Reviewed Collection" 
              collectionRecordId="DLESE-COLLECTION-000-000-000-023">
                 NASA-ESERevProd333</alsoCatalogedBy> 
          </additionalMetadata>
        </head>
      </matchingRecord>
    </results>
  </UrlCheck>
</DDSWebService>

Note: responses to this request contain the common head element.

Request

Determine whether the URL 'http://epod.usra.edu/' or 'http://www.marsquestonline.org/index.html' is in the repository.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
   verb=UrlCheck&url=http://epod.usra.edu/&
   url=http://www.marsquestonline.org/index.html

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>2</totalNumResults> 
    </resultInfo>
    <results>
      <matchingRecord>
        <url>http://www.marsquestonline.org/index.html</url> 
        ....
      </matchingRecord>
      <matchingRecord>
        <url>http://epod.usra.edu/</url> 
        ...
      </matchingRecord>
    </results>
  </UrlCheck>
</DDSWebService>

Request

Determine whether a URL that begins with 'http://www.dlese.org' is in the repository. The * character acts as a wildcard, which may appear at any position in the URL argument except the first position.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
         verb=UrlCheck&url=http://www.dlese.org* 

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>2</totalNumResults> 
    </resultInfo>
    <results>
      <matchingRecord>
        <url>http://www.dlese.org/vgee/index.htm</url> 
        ...
      </matchingRecord>
      <matchingRecord>
        <url>
   http://www.dlese.org/documents/policy/CollectionsScope_final.html
        </url> 
        ...
      </matchingRecord>
    </results>
  </UrlCheck>
 </DDSWebService>

Request

Determine whether the URL 'http://epod.usra.edu/zzzz' is in the repository. In this case no matching records are found.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
        verb=UrlCheck&url=http://epod.usra.edu/zzzz 

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>0</totalNumResults> 
    </resultInfo>
  </UrlCheck>
</DDSWebService>
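
The UrlCheck responses above can be read with a small parser that tolerates the empty case, in which the <results> element is absent. This is a Python sketch, not part of the service; parse_url_check is a hypothetical name and the two sample strings are trimmed copies of the responses shown above.

```python
import xml.etree.ElementTree as ET

MATCH = """<?xml version="1.0" encoding="UTF-8" ?>
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>1</totalNumResults>
    </resultInfo>
    <results>
      <matchingRecord>
        <url>http://epod.usra.edu/</url>
      </matchingRecord>
    </results>
  </UrlCheck>
</DDSWebService>"""

NO_MATCH = """<?xml version="1.0" encoding="UTF-8" ?>
<DDSWebService>
  <UrlCheck>
    <resultInfo>
      <totalNumResults>0</totalNumResults>
    </resultInfo>
  </UrlCheck>
</DDSWebService>"""


def strip_namespaces(root):
    """Remove '{namespace}' prefixes in place so plain paths can be used."""
    for element in root.iter():
        if "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]
    return root


def parse_url_check(xml_text):
    """Return (total number of results, list of matching URLs)."""
    root = strip_namespaces(ET.fromstring(xml_text))
    total = int(root.findtext("UrlCheck/resultInfo/totalNumResults", default="0"))
    urls = [
        element.text.strip()
        for element in root.findall("UrlCheck/results/matchingRecord/url")
        if element.text
    ]
    return total, urls


found = parse_url_check(MATCH)
missing = parse_url_check(NO_MATCH)
print(found, missing)
```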


ServiceInfo

Summary and usage

The ServiceInfo request is used to get information about the service and the index version. The index version is updated any time a change is made to the repository. Clients may use the index version to determine when to update cached data from the repository. Other data include name, description, the URL used to access the service (base URL), service version, the maximum number of search results allowed by the Search request, and an administrator e-mail.
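
One way a client might use the index version for cache invalidation is sketched below in Python. The class name VersionedCache and both callables are hypothetical: fetch_index_version stands for whatever code calls ServiceInfo and extracts the index version from the response (the response element names are not shown in this section, so none are assumed here), and load_data stands for the client's own, possibly expensive, data loading.

```python
class VersionedCache:
    """Reload cached data only when the repository's index version changes."""

    def __init__(self, fetch_index_version, load_data):
        self._fetch_index_version = fetch_index_version
        self._load_data = load_data
        self._version = None
        self._data = None

    def get(self):
        current = self._fetch_index_version()
        if current != self._version:
            # The index changed (or nothing is cached yet): refresh the data.
            self._data = self._load_data()
            self._version = current
        return self._data


# Stand-ins for real service calls, used only to demonstrate the behaviour.
versions = iter(["100", "100", "101"])
loads = []


def fake_fetch_index_version():
    return next(versions)


def fake_load_data():
    loads.append(1)
    return "snapshot %d" % len(loads)


cache = VersionedCache(fake_fetch_index_version, fake_load_data)
results = [cache.get(), cache.get(), cache.get()]
print(results)
```

In this demonstration the data is loaded twice: once on first use and once when the version changes from 100 to 101.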

Sample request

The following request displays information about this Web service:

http://nldr.library.ucar.edu/dds/services/ddsws1-1?verb=ServiceInfo

Arguments

None

Errors and exceptions

See error and exception conditions.

Examples

See link above.

Service responses

Service responses are returned in XML or JSON format and vary in structure and content depending on the request made. The content and structure of the response from each of the requests are described above in their respective sections. This section describes common response structures that are returned by the service across all requests.

Common response elements

Several requests in the protocol share common XML elements in their responses. These include the <head> and <additionalMetadata> elements, which are described below.

The head element

The head element appears in the Search, GetRecord, and UrlCheck responses. It is used to return information about a single record: the ID of the record, the collection of which the record is a member, the XML format of the record that was returned, the native XML format of the record, the date the record was last modified, the whatsNewDate, and an additionalMetadata element.

Head element example:
<?xml version="1.0" encoding="UTF-8" ?> 
...
<head>
   <id>CEIS-000-000-001</id>
   <collection recordId="DLESE-COLLECTION-000-000-000-003">
          Discover Our Earth</collection>
   <xmlFormat nativeFormat="adn">adn</xmlFormat>
   <fileLastModified>2004-07-02T17:32:29Z</fileLastModified>
   <whatsNewDate type="itemnew">2003-07-19</whatsNewDate>
   <additionalMetadata realm="adn">
      ...
   </additionalMetadata>
</head>
...
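
As an illustration of how a client might read the head element, the following Python sketch turns it into a dictionary. The function name parse_head and the dictionary keys are hypothetical; the sample is the example above with the elided additionalMetadata content left out, and it assumes element names without a namespace, as shown.

```python
import xml.etree.ElementTree as ET

SAMPLE_HEAD = """<head>
   <id>CEIS-000-000-001</id>
   <collection recordId="DLESE-COLLECTION-000-000-000-003">
          Discover Our Earth</collection>
   <xmlFormat nativeFormat="adn">adn</xmlFormat>
   <fileLastModified>2004-07-02T17:32:29Z</fileLastModified>
   <whatsNewDate type="itemnew">2003-07-19</whatsNewDate>
</head>"""


def parse_head(head_xml):
    """Extract the record-level fields carried by a <head> element."""
    head = ET.fromstring(head_xml)
    collection = head.find("collection")
    xml_format = head.find("xmlFormat")
    whats_new = head.find("whatsNewDate")
    return {
        "id": head.findtext("id").strip(),
        # Element text may carry layout whitespace, so strip it.
        "collection": collection.text.strip(),
        "collection_record_id": collection.get("recordId"),
        "xml_format": xml_format.text.strip(),
        "native_format": xml_format.get("nativeFormat"),
        "file_last_modified": head.findtext("fileLastModified").strip(),
        "whats_new_date": whats_new.text.strip(),
        "whats_new_type": whats_new.get("type"),
    }


record = parse_head(SAMPLE_HEAD)
print(record["id"], record["collection"])
```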


The additionalMetadata element

The additionalMetadata element appears in Search, GetRecord, UrlCheck and the vocabulary list class of responses. It is used to return additional information related to the record's format type, grouped into realms. The realms include adn and dlese_collect, and each contains slightly different information related to the underlying format type.

additionalMetadata element example:
<?xml version="1.0" encoding="UTF-8" ?> 
...
<additionalMetadata realm="adn">
   <accessionStatus>accessioneddiscoverable</accessionStatus>
   <partOfDrc>false</partOfDrc>
   <alsoCatalogedBy collectionLabel="DLESE Community Collection (DCC)"
        collectionRecordId="DLESE-COLLECTION-000-000-000-015">
           DLESE-000-000-000-840</alsoCatalogedBy>
   <alsoCatalogedBy collectionLabel="Cutting Edge" 
        collectionRecordId="DLESE-COLLECTION-000-000-000-010">
           SERC-NAGT-000-000-000-322</alsoCatalogedBy>
</additionalMetadata>
...
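
The adn realm example above might be read as follows. This is a Python sketch with hypothetical names (parse_additional_metadata and the dictionary keys); it covers only the elements that appear in the example, and other realms may carry different content.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<additionalMetadata realm="adn">
   <accessionStatus>accessioneddiscoverable</accessionStatus>
   <partOfDrc>false</partOfDrc>
   <alsoCatalogedBy collectionLabel="DLESE Community Collection (DCC)"
        collectionRecordId="DLESE-COLLECTION-000-000-000-015">
           DLESE-000-000-000-840</alsoCatalogedBy>
   <alsoCatalogedBy collectionLabel="Cutting Edge"
        collectionRecordId="DLESE-COLLECTION-000-000-000-010">
           SERC-NAGT-000-000-000-322</alsoCatalogedBy>
</additionalMetadata>"""


def parse_additional_metadata(xml_text):
    """Read the realm, status flags and alsoCatalogedBy entries."""
    meta = ET.fromstring(xml_text)
    return {
        "realm": meta.get("realm"),
        "accession_status": meta.findtext("accessionStatus", default="").strip(),
        "part_of_drc": meta.findtext("partOfDrc", default="").strip() == "true",
        "also_cataloged_by": [
            {
                "record_id": element.text.strip(),
                "collection_label": element.get("collectionLabel"),
                "collection_record_id": element.get("collectionRecordId"),
            }
            for element in meta.findall("alsoCatalogedBy")
        ],
    }


info = parse_additional_metadata(SAMPLE)
print(info["realm"], [entry["record_id"] for entry in info["also_cataloged_by"]])
```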

Error and exception conditions

If an error or exception occurs, the service returns an <error> element with the type of error indicated by a code attribute. Clients are advised to test the value of these codes and respond with an appropriate message to users. For example, if a user conducts a search that has no matches, the code noRecordsMatch will be returned from the server and a message indicating that the search had no results can be displayed. The error codes are similar to those defined by OAI-PMH.

Error code | Description | Applicable verbs
--- | --- | ---
noRecordsMatch | The combination of values supplied in the Search request resulted in a query that had no matching records. | Search
badQuery | The value supplied in the q argument of the Search request was malformed or syntactically incorrect. | Search
badArgument | The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax. | all verbs
badVerb | The value of the verb argument is not a legal verb, or the verb argument is missing. | N/A
cannotDisseminateFormat | The metadata format identified by the value given for the xmlFormat argument is not supported by the item or by the repository. | GetRecord
idDoesNotExist | The value of the id argument is unknown or illegal in the repository. | GetRecord
notAuthorized | The client that made the request is not authorized to access the requested data from the service. | all verbs
internalServerError | The server for the service encountered a problem and was not able to respond to the request. | all verbs

Example error response


Request

Request a record id that does not exist in the repository using GetRecord.

http://nldr.library.ucar.edu/dds/services/ddsws1-1?
          verb=GetRecord&id=BAD-ID-123

Response

<?xml version="1.0" encoding="UTF-8" ?> 
<DDSWebService 
    xmlns="http://www.dlese.org/Metadata/ddsws" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.dlese.org/Metadata/ddsws 
	http://www.dlese.org/Metadata/ddsws/1-1/ddsws.xsd">
    <error code="idDoesNotExist">ID "BAD-ID-123" does not exist in the repository</error>
</DDSWebService>
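
A client can test for the <error> element before processing a response. The following Python sketch raises an exception that carries the error code; DDSServiceError and check_for_error are hypothetical names, and the sample is a shortened, well-formed version of the response above. Only direct children of the response envelope are inspected, so an element named error inside a metadata record is not mistaken for a service error.

```python
import xml.etree.ElementTree as ET

SAMPLE_ERROR = """<?xml version="1.0" encoding="UTF-8" ?>
<DDSWebService xmlns="http://www.dlese.org/Metadata/ddsws">
  <error code="idDoesNotExist">ID "BAD-ID-123" does not exist in the repository</error>
</DDSWebService>"""


class DDSServiceError(Exception):
    """Raised when a response contains an <error> element."""

    def __init__(self, code, message):
        super().__init__("%s: %s" % (code, message))
        self.code = code
        self.message = message


def check_for_error(xml_text):
    """Return the parsed response, or raise DDSServiceError if it is an error."""
    root = ET.fromstring(xml_text)
    for child in root:
        # Compare the local name so a default namespace does not matter.
        if child.tag.rsplit("}", 1)[-1] == "error":
            raise DDSServiceError(child.get("code"), (child.text or "").strip())
    return root


try:
    check_for_error(SAMPLE_ERROR)
    outcome = "ok"
except DDSServiceError as error:
    outcome = error.code
print(outcome)
```

A caller would typically treat noRecordsMatch as an empty result rather than a failure, and report the other codes to the user or a log.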


Requesting JSON response

Each of the service responses can be returned as JSON (JavaScript Object Notation) as an alternate output format to XML. JSON is a simple data format based on the object notation of the JavaScript language and is commonly used in Ajax-style programming to bring content into Web pages asynchronously. For more information about JSON and how it is used, see Douglas Crockford's site www.json.org. A DDS client that illustrates its use is shown in these examples.

By default, all responses are output in XML format. To get JSON output, include the argument output=json in the request. Additionally, a callback argument callback=function may be included to wrap the JSON output in parentheses and a function name of your choosing. The JSON output by the service is a direct translation of the XML structure into JSON.

Arguments

  • output=json - An optional argument that, when used, instructs the service to return JSON output instead of XML.

  • callback=function - An optional argument that, when used in conjunction with the output=json argument, instructs the service to return the JSON output wrapped in parentheses and a function name of your choosing, as indicated by the argument value.
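
The two arguments above might be used from Python as sketched below. The helper names build_json_url and unwrap_callback are hypothetical, as is the callback name handleResponse. The sample payload is also hypothetical: this document states only that the JSON is a direct translation of the XML structure, so the exact shape should be checked against a real response.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_json_url(verb, callback=None):
    """Build a request URL that asks for JSON, optionally wrapped in a callback."""
    params = [("verb", verb), ("output", "json")]
    if callback is not None:
        params.append(("callback", callback))
    return BASE_URL + "?" + urlencode(params)


def unwrap_callback(payload, callback):
    """Strip a 'callback( ... )' wrapper, if present, and decode the JSON."""
    text = payload.strip().rstrip(";")
    prefix = callback + "("
    if text.startswith(prefix) and text.endswith(")"):
        text = text[len(prefix):-1]
    return json.loads(text)


url = build_json_url("ListXmlFormats", callback="handleResponse")

# Hypothetical payload, used only to exercise unwrap_callback.
sample = 'handleResponse({"DDSWebService": {"ListXmlFormats": {"xmlFormat": ["adn", "oai_dc"]}}})'
data = unwrap_callback(sample, "handleResponse")
print(url)
print(data["DDSWebService"]["ListXmlFormats"]["xmlFormat"])
```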



Removing namespaces from responses

Namespaces can be removed from the XML and JSON output from the service, which can simplify working with and processing the output.

By default, all responses are returned with the namespaces that appear in the requested format disseminated from the repository. To remove namespaces, include the argument transform=localize in the request.

Arguments

  • transform=localize - An optional argument that, when used, instructs the service to return the XML or JSON output without namespaces.




Search fields

This section describes the search fields that are available in the Search request. The repository index contains fields that are extracted from each of the XML records within, and a given repository may contain records in many different native XML formats. Searches within a given field operate over the set of records that contain that field. For example, a search in the default field operates over all records in the repository, since all records are guaranteed to contain the default field, whereas a search in the title field operates over a potentially smaller sub-set of records that contain the title field. Boolean searches may be performed across and within each of the fields using the Lucene query syntax supplied in the q argument of the Search request. The appropriate Lucene Analyzer is applied automatically for each field specified in the query. Example search queries are provided below.

Fields may contain plain text, controlled vocabularies or encoded field values.

Certain fields may be used to sort the search results when used in the sort, sortAscendingBy or sortDescendingBy arguments of the Search request. Sortable fields are indicated below.

The default field for queries

The default field used by the query parser is default, which is searched when no field is explicitly specified. The query ocean is therefore equivalent to default:ocean.

How search fields are generated

At index creation time, each record is inserted in the repository in its native XML format. The indexer extracts standard, XPath and custom search fields from the content of the native XML. Additional fields associated with the item may also be extracted from other sources, such as text derived from a crawl of the resource described by the metadata record. The indexer then generates a single entry containing each of the fields and inserts it into the repository. All records are guaranteed to contain certain fields such as the default and stems fields, as well as XPath fields for their native XML format. Details about the standard, XPath and custom fields are provided below.

Searching across and within specific XML formats

The Search request operates over and disseminates records in any available XML format. By default, searches operate over the available fields for all records in the repository regardless of format, and results may contain records of mixed XML formats. For example, a search for default:ocean searches for the term ocean in the default field across all records in the repository and may return records in oai_dc, adn, dlese_anno and other formats in a single result set depending on what matches are found.

Requesting search results in a specific XML format: Certain records can be disseminated from the service in multiple formats; for example, records that reside natively as adn can also be disseminated in the oai_dc format. The Search request accepts an optional xmlFormat argument, which instructs the service to search over and return only those records that can be disseminated in the given format. In this case, the search still operates over the fields associated with each record's native XML format, but the results are returned in the requested XML format only: records that reside in a different native format are transformed to the requested format before being returned.

Limiting search to specific XML formats: Each record contains the special field xmlFormat, which contains the format key associated with the native format for the record. To search over and return records that reside in specific native XML formats, include this field in the query for the Search request. For example, the query xmlFormat:oai_dc ocean will search for and return all records in the repository that reside in the native oai_dc format and that contain the term ocean in the default field.

The XML format keys that may be used in the xmlFormat argument or xmlFormat search field in the Search request may be discovered using the ListXmlFormats request.
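
The difference between the xmlFormat argument and the xmlFormat search field can be seen by comparing two request URLs. In this Python sketch, build_search_url is a hypothetical helper. The q argument is described in this document; the s (starting offset) and n (number of results) paging arguments are assumptions here and should be checked against the Search request section.

```python
from urllib.parse import urlencode

BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_search_url(query, xml_format=None, start=0, count=10):
    """Build a Search URL; 's' and 'n' are assumed paging argument names."""
    params = [("verb", "Search"), ("q", query), ("s", start), ("n", count)]
    if xml_format is not None:
        params.append(("xmlFormat", xml_format))
    return BASE_URL + "?" + urlencode(params)


# Records that can be disseminated as oai_dc, transformed if necessary.
disseminated = build_search_url("ocean", xml_format="oai_dc")

# Only records whose native format is oai_dc.
native_only = build_search_url("xmlFormat:oai_dc ocean")

print(disseminated)
print(native_only)
```

In the second URL the colon and the space in the query are encoded as %3A and +, which is the encoding referred to elsewhere in this document as "the required encoding".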

Text versus stemmed text

When searching in a text field, exact terms are matched. For example a search for ocean will return all records that contain the exact term ocean in the given field. Where indicated, certain textual fields have stemming applied to them using the Porter stemmer algorithm (snowball variation). When searching in a field that has been stemmed, all records containing morphologically similar terms in the given field are matched. For example a search for stems:ocean will return all records that contain the terms ocean, oceans or oceanic in the stems field. Note that when searching in a stemmed field, the client should not apply stemming to the terms it supplies for search. Stemming will be applied automatically by the search engine for these fields and no pre-processing is necessary by the client.


Standard Search Fields

The following search fields are generally available for all XML formats in the repository. This is implementation specific for each repository - see Configure Search Fields, facets, and relationships.


  • default - Contains the full content from all Elements and Attributes within the XML of each record. May not be sorted. Available for all formats. Note that the default field used by the query parser in a given repository may be different than this. The default field used in the query parser for this repository is default.

  • stems - Contains the same content as the default field, in stemmed form. May not be sorted. Available for all formats.

  • title - Contains the titles of resources or items, as text. May be sorted. Available for formats: adn, news_opps, ncs_collect and all formats that specify this field for indexing.

  • titlestems - Contains the same content as the title field, in stemmed form. May not be sorted. Available for formats: adn, news_opps, ncs_collect and all formats that specify the title field for indexing.

  • description - Contains the descriptions of resources or items, as text. May be sorted. Available for formats: adn, dlese_collect, news_opps, ncs_collect and all formats that specify this field for indexing.

  • descriptionstems - Contains the same content as the description field, in stemmed form. May not be sorted. Available for formats: adn, dlese_collect, news_opps, ncs_collect and all formats that specify the description field for indexing.

  • xmlFormat - Contains the native XML format key for the record, for example oai_dc or adn, which may be discovered via the ListXmlFormats service request. Available for formats: all formats.

  • url - Contains the URL for the resource indexed as a single token. Use this field to search for exact URL match. May be sorted. Available for all formats that have a URL.

  • idvalue - Contains the internal unique identifier for the record corresponding to the ID returned in the record header, for example MY-ID-001, indexed as a single token. Use this field to search for exact ID match. Available for all formats.

  • allrecords - Special field that matches all records in the repository by applying allrecords:true to the query. This is useful for constructing certain types of queries, for example allrecords:true NOT ocean returns all records in the repository that do not contain the term ocean in the default field. Single valid value is true. This has the same effect as the Lucene query *:*. Available for formats: all formats.

  • hasBoundingBox - Boolean value that indicates whether the record has a geospatial bounding box footprint available for search. Valid values are either true or false. Available for formats: all formats.

 

XPath Search Fields

XPath search fields provide separate searchable fields for the contents of every element and attribute found in the native XML of the records. For each element and attribute there are three forms of search fields: text, stemmed text and untokenized keywords. These provide a powerful, flexible way to search for specific text or data within and across the records in the repository.

The XPath fields consist of a prefix followed by an XPath that addresses a specific XML element or attribute in the XML record. Prefixes are one of /text/, /stems/, or /key/, which specify to search over text, stemmed text or untokenized keyword forms of the data, respectively. This is followed by a namespace-free, position-free XPath addressing a specific element or attribute in the XML.

The three types of search fields are processed in the following manner:

text - Text is processed using the Lucene StandardAnalyzer.
stems - Text is processed using the Lucene SnowballAnalyzer for the English language.
key - Text is processed using the Lucene KeywordAnalyzer, which is case-sensitive and includes the entire element or attribute as a single token. Use these for sorting.

The XPaths used for the search fields are the simplest form of XPath expression, containing no namespaces or position specifiers. For more information about XPath see XPath Language 1.0. The ZVON XPath Tutorial is also useful. Note that this is not an implementation of XQuery but rather a mapping of simple XPaths to searchable Lucene fields.

For example, consider this simple XML instance document:

<book>
  <author birthDate="1955-01-25">
    <firstName>John</firstName>
    <lastName>Doe</lastName>
  </author>
  <identifier>http://books.org/catalog_123</identifier>
</book>

The index will contain the following search fields for this record:

/text//book/author/firstName
/stems//book/author/firstName
/key//book/author/firstName

/text//book/author/lastName
/stems//book/author/lastName
/key//book/author/lastName

/text//book/author/@birthDate
/stems//book/author/@birthDate
/key//book/author/@birthDate

/text//book/identifier
/stems//book/identifier
/key//book/identifier
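
The naming scheme can be reproduced client-side, for example to predict which field names a given record would yield. The Python sketch below illustrates the scheme only and is not the indexer's implementation; it assumes that fields are created for elements with text content and for attributes, which is consistent with the list above. The function name xpath_field_names is hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIXES = ("/text/", "/stems/", "/key/")

BOOK = """<book>
  <author birthDate="1955-01-25">
    <firstName>John</firstName>
    <lastName>Doe</lastName>
  </author>
  <identifier>http://books.org/catalog_123</identifier>
</book>"""


def local_name(name):
    """Drop a leading '{namespace}' from an element or attribute name."""
    return name.rsplit("}", 1)[-1]


def xpath_field_names(xml_text):
    """List the text, stems and key field names for one XML record."""
    root = ET.fromstring(xml_text)
    xpaths = []

    def visit(element, parent_path):
        path = parent_path + "/" + local_name(element.tag)
        if element.text and element.text.strip():
            xpaths.append(path)
        for attribute in element.attrib:
            xpaths.append(path + "/@" + local_name(attribute))
        for child in element:
            visit(child, path)

    visit(root, "")
    # Repeated elements share one XPath, so drop duplicates but keep order.
    unique_xpaths = list(dict.fromkeys(xpaths))
    return [prefix + xpath for xpath in unique_xpaths for prefix in PREFIXES]


fields = xpath_field_names(BOOK)
for name in fields:
    print(name)
```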

As another example, consider the following Dublin Core oai_dc record:

<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
	http://www.openarchives.org/OAI/2.0/oai_dc.xsd" 
    xmlns:dc="http://purl.org/dc/elements/1.1/" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Ocean Science Leadership Awards</dc:title>
  <dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This is a description of the 
  Ocean Science Leadership Awards... </dc:description>
  <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Earth system science</dc:subject>
  <dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Education</dc:subject>
  <dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">text/html</dc:format>
  <dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">Text</dc:type>
  <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">
     http://www.usc.edu/org/cosee-west/quikscience/OceanLeadershipAwards.html
  </dc:identifier>
</oai_dc:dc>

The following Lucene queries are examples that match specific text and data in this record. As with all fielded Lucene queries, these queries consist of a field name followed by a colon ":" and then followed by the term(s) to search for. Note that XPaths do not contain namespaces or position specifiers:

/stems//dc/title:oceans - Matches the stemmed form of the term ocean found in the title element of the XML record.

/text//dc/subject:education - Matches the term education found in one of the subject elements of the XML record.

/key//dc/format:"text/html" - Matches the untokenized keyword term text/html found in the format element of the XML record.

 

Searching by indexed XPath

In addition to the XPath fields, a special field named indexedXpaths contains each XPath that has been indexed for a given record, as a keyword. Using this field it is possible to search for all records that have any value assigned for a given XPath. For example, the following query:

indexedXpaths:"/dc/subject" - Matches all records that have any value in the /dc/subject field.

Conversely, the following query:

allrecords:true !indexedXpaths:"/dc/subject" - Matches all records that have no value in the /dc/subject field.

 

Relation Search Fields

The DDS data model supports a notion of relationships between records in the repository. Relation search fields provide a means to search for records based on the content of their related records. For example, each record in the repository has a memberOfCollection relationship that connects it to its collection-level metadata record. Using a relation search field, one can perform a search such as "show me all records that are in a collection of resources for education levels K through 5" (this assumes that collections have defined metadata for education levels).

XPath relation search fields

XPath relation search fields follow the same syntax as the XPath fields except they contain an additional prefix that specifies the relation. Fields begin with /relation.[relationship name] and are followed by the XPath field of the related XML record.

For example, consider a record that is a member of the following collection:

<collectionRecord>
	  <general>
		<fullTitle>Science books</fullTitle>
		<description>This collection has books about science.</description>
	  </general>
	  <approval>
		<collectionStatuses>
		  <collectionStatus date="2010-10-05T13:23:43Z" state="Accessioned"/>
		</collectionStatuses>
	  </approval>
	  <access>
	  	<key libraryFormat="book" static="true" redistribute="false">sciBooks</key>
	  </access>
	  <metaMetadata>
		<catalogEntries>
		  <catalog entry="COLLECTION-123"/>
		</catalogEntries>
	  </metaMetadata>
</collectionRecord>

The index will contain the following relation search fields for the records in this collection:

/relation.memberOfCollection//text//collectionRecord/general/fullTitle
/relation.memberOfCollection//stems//collectionRecord/general/fullTitle
/relation.memberOfCollection//key//collectionRecord/general/fullTitle

/relation.memberOfCollection//text//collectionRecord/general/description
/relation.memberOfCollection//stems//collectionRecord/general/description
/relation.memberOfCollection//key//collectionRecord/general/description

etc. (not all fields shown here).

A search for /relation.memberOfCollection//text//collectionRecord/general/fullTitle:oceans will return all records whose collection has the word oceans in its fullTitle.
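
The naming pattern above can be generated mechanically. The following Python sketch assembles a relation field name from a relationship name, an index type (text, stems, or key), and the XPath of the related record. The function name relation_field and its parameters are hypothetical helpers for illustration only and are not part of the API.

```python
def relation_field(relationship, index_type, xpath):
    """Build an XPath relation search field name.

    relationship: relationship name, e.g. "memberOfCollection"
    index_type:   one of "text", "stems", "key" (the variants listed above)
    xpath:        XPath of the element in the related XML record
    """
    if index_type not in ("text", "stems", "key"):
        raise ValueError("unexpected index type: %r" % index_type)
    # Drop any leading slash; the "//" separators come from the pattern below.
    path = xpath.lstrip("/")
    return "/relation.%s//%s//%s" % (relationship, index_type, path)


field = relation_field("memberOfCollection", "text",
                       "/collectionRecord/general/fullTitle")
query = field + ":oceans"
print(query)
```

The resulting string is intended to be supplied as (part of) the search query, in the same way as the fullTitle example above.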

 

Additional relation search fields


  • isRelatedToByCollectionKey - Contains collection keys of all collections that have assigned any relation to the item.

  • isRelatedToByCollectionKey.[relationship] - Contains collection keys of all collections that have assigned the given relation to the item, for example isRelatedToByCollectionKey.isAnnotatedBy.
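
As a minimal sketch of how these two fields might be used in a query clause, the Python helper below builds either the general form or the relationship-specific form. The helper name and the collection key "exampleKey" are hypothetical; real keys depend on the repository.

```python
def related_by_collection_clause(collection_key, relationship=None):
    """Build a clause over isRelatedToByCollectionKey.

    With no relationship, matches items to which the collection has
    assigned any relation; otherwise only the given relation.
    """
    field = "isRelatedToByCollectionKey"
    if relationship:
        field += "." + relationship
    return "%s:%s" % (field, collection_key)


# "exampleKey" is a placeholder collection key, not a real one.
any_relation = related_by_collection_clause("exampleKey")
annotated_only = related_by_collection_clause("exampleKey", "isAnnotatedBy")
print(any_relation)
print(annotated_only)
```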

Relationships

The memberOfCollection relation is intrinsic to a DDS repository. However, implementation-specific relationships can be configured between any two record types in a given DDS repository (see Configure search fields, facets, and relationships). The following are common relationships that exist in a default DDS configuration:

 

Subject XML record | Predicate / relationship | Object XML record format | Description
all formats | memberOfCollection | collection record (dlese_collect) | All records are a member of a collection. The collection record describes the kinds of records in the collection and their general attributes such as subjects, education levels, and resource types.
all formats | isAnnotatedBy | annotation record (comm_anno, dlese_anno) | A record may be annotated by another record. The annotation record contains the contents of the annotation, for example a user's comments about a resource.
all formats | isAssessedBy | assessment record (assessments) | A record may have an assessment associated with it. The assessment record contains the contents of the assessment, for example a set of questions and answers to assess a learner's understanding of a given concept.

 

 

Custom Search Fields

Custom search fields are available for specific XML formats as indicated below. Additional implementation-specific custom search fields that are not described here may also be available for a given DDS repository configuration.


Text fields - These fields contain plain text or, where indicated, text that has been stemmed using the Porter stemmer algorithm (snowball variation).

  • keyword - Contains keywords associated with the resource or item, as text. May be sorted. Available for formats: adn, dlese_collect, news_opps.

  • creator - Contains the first, middle and last name of each contributor for the resource. May not be sorted. Available for formats: adn.

  • organizationInstName - Contains the name of the contributing institution. May be sorted. Available for formats: adn.

  • organizationInstDepartment - Contains the name of the contributing institution's department. May be sorted. Available for formats: adn.

  • personInstName - Contains the name of the contributing person's institution. May be sorted. Available for formats: adn.

  • personInstDepartment - Contains the name of the contributing person's institutional department. May be sorted. Available for formats: adn.

  • emailPrimary - The primary contributor's e-mail. May be sorted. Available for formats: adn.

  • emailOrganization - The contributing organization's e-mail. May be sorted. Available for formats: adn.

  • emailAlt - The alternate contributor's e-mail. May be sorted. Available for formats: adn.

  • placeNames - Place names, for example "colorado," "AZ," "Brazil," as text. May be sorted. Available for formats: adn.

  • eventNames - Event names, for example "windstorm," "Destruction of Pompeii," as text. May be sorted. Available for formats: adn.

  • temporalCoverageNames - Temporal coverage names, for example "cambrian," "Triassic Period," as text. May be sorted. Available for formats: adn.

  • itemAudienceTypicalAgeRange - The typical age range for this resource. Available for formats: adn.

  • itemAudienceInstructionalGoal - The instructional goals for this resource. Available for formats: adn.

  • newsOppstitle - News & Opportunities title. May be sorted. Available for formats: news_opps.

  • newsOppsdescription - News & Opportunities description. May be sorted. Available for formats: news_opps.

  • newsOppskeyword - News & Opportunities keywords. May be sorted. Available for formats: news_opps.

  • ncsCollectOaiBaseUrl - Contains the NSDL Collection OAI baseURL. Useful search clause examples include http*nasa.gov* or http*.edu*. May be sorted. Available for formats: ncs_collect.
    From xpath: /record/collection/ingest/oai/@baseURL

Textual content - These fields contain the text of the content of the resources themselves, extracted by crawling the first page of the resource. These are available for all ADN resources in the repository whose primary content is in HTML or PDF.

  • itemContent - The full textual content of the resource. May be sorted. Available for formats: adn.

  • itemContentTitle - The HTML title element text. May be sorted. Available for formats: adn.

  • itemContentHeaders - The HTML header element (H1, H2, etc.) text. May be sorted. Available for formats: adn.

  • itemContentType - The HTTP content type header terms that were returned by the Web server that holds the resource, for example "text html", "application pdf". May be sorted. Available for formats: adn.

Textual vocabulary fields - These fields contain DLESE controlled vocabularies that have been indexed as plain text.

  • gradeRange - The DLESE grade range vocabularies verbatim as text, for example "DLESE:Primary elementary." May be sorted. Available for formats: adn, dlese_collect.

  • resourceType - The DLESE resource type vocabularies verbatim as text, for example "DLESE:Learning materials:Classroom activity." May be sorted. Available for formats: adn, dlese_collect.

  • subject - The DLESE subject vocabularies verbatim as text, for example "DLESE:Atmospheric science." May be sorted. Available for formats: adn, dlese_collect.

  • contentStandard - The DLESE content standard vocabularies verbatim as text, for example "NSES:K-4:Unifying Concepts and Processes Standards:Change, constancy, and measurement." May be sorted. Available for formats: adn.




  • itemannotypes - Indicates the type of annotation that this item has, for example "Teaching tip," "Information on challenging teaching and learning situations," as text. These values are shown in the types schema. May be sorted. Available for formats: adn.

  • itemannostatus - Indicates the status of an annotation that this item has, for example "Text annotation completed," as text. These values are shown in the status schema. May be sorted. Available for formats: adn.

  • itemannoformats - Indicates the format of an annotation that this item has. Values include 'text', 'audio', 'video' and 'graphical'. May be sorted. Available for formats: adn.

  • itemannopathways - Indicates the pathway of an annotation that this item has, for example "CRS (Community Review System)," as text. These values are shown in the pathway schema. May be sorted. Available for formats: adn.

  • newsOppsannouncement - News & Opportunities announcements. May be sorted. Available for formats: news_opps.

  • newsOppsaudience - News & Opportunities audience. May be sorted. Available for formats: news_opps.

  • newsOppsdiversity - News & Opportunities diversity. May be sorted. Available for formats: news_opps.

  • newsOppslocation - News & Opportunities locations. May be sorted. Available for formats: news_opps.

  • newsOppstopic - News & Opportunities topics. May be sorted. Available for formats: news_opps.

  • ncsCollectEdLevel - NSDL Collection education level field. May be sorted. Available formats: ncs_collect.
    From xpath: /record/educational/educationLevels/nsdlEdLevel, /record/educational/educationLevels/otherEdLevel

  • ncsCollectCollectionPurpose - NSDL Collection collection purpose field. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/collectionPurposes/collectionPurpose

  • ncsCollectAudience - NSDL Collection audience field. May be sorted. Available formats: ncs_collect.
    From xpath: /record/educational/audiences/nsdlAudience, /record/educational/audiences/otherAudience

  • ncsCollectSubject - NSDL Collection subject field. May be sorted. Available formats: ncs_collect.
    From xpath: /record/general/subject

  • ncsCollectCollectionSubject - NSDL Collection collection subject field. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/collectionSubjects/collectionSubject

  • ncsCollectPathwayName - NSDL Collection pathway name. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/pathways/name

Defined key fields - These fields contain finite sets of key values that may be used to limit searches to a sub-set of records.

  • ky - Contains the search key for the collection in which the record resides, which may be used to limit search to within one or more collections of records. These values may be discovered using the ListCollections request within the searchKey element. May be sorted. Available for formats: adn.

  • collection - Similar to ky, contains the record's collection vocabulary entry prefixed with a 0, for example "0dcc," "0comet." These values may be discovered using the ListCollections request within the vocabEntry element. May be sorted. Available for all formats.

  • itemhasanno - Indicates whether an item has an annotation. Values are either "true" or "false." May be sorted. Available for formats: adn.

  • partofdrc - Indicates whether the item or collection is part of the DLESE Reviewed Collection (DRC). Values are either "true" or "false." May be sorted. Available for formats: adn, dlese_collect.

  • multirecord - Indicates whether the resource that the record catalogs is also cataloged by other records in other collections. Values are either "true" or "false." May be sorted. Available for formats: adn.

  • wntype - Indicates the reason the item is new to the repository, corresponding to the 'wndate' field. Possible values are: itemnew, itemannocomplete, itemannoinprogress, annocomplete, drcannocomplete, drcannoinprogress, collection. May be sorted. Available for all formats.

  • ncsCollectHasOai - Boolean value that indicates whether the NSDL Collection metadata contains OAI information (an OAI baseURL). Possible values are: true, false. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/ingest/oai/@baseURL

  • ncsCollectOaiVisibility - The NSDL Collection OAI visibility field value. Possible values are: public, protected, private. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/OAIvisibility

  • ncsCollectIsPathway - Boolean value that indicates the NSDL Collection pathway value. Possible values are: true, false. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/pathway
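
As an illustration of limiting a search with a key field, the Python sketch below wraps an existing query so that it only matches records in the given collections, using the ky field. The helper name and the keys "keyA" and "keyB" are hypothetical; actual keys are the searchKey values returned by the ListCollections request.

```python
def limit_to_collections(query, collection_keys):
    """Restrict a Lucene query to one or more collections via the ky field."""
    keys = list(collection_keys)
    if not keys:
        # Nothing to restrict by; return the query unchanged.
        return query
    # Group the keys under a single field: ky:(keyA OR keyB)
    ky_clause = "ky:(%s)" % " OR ".join(keys)
    return "(%s) AND %s" % (query, ky_clause)


# "keyA" and "keyB" are placeholder collection keys.
limited = limit_to_collections("stems:ocean", ["keyA", "keyB"])
print(limited)
```

The other key fields (for example itemhasanno or partofdrc) could be combined in the same way by adding clauses such as itemhasanno:true.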

Fields available for searching by value or range of value - These fields may be searched by exact value or by range of value:

  • itemannoaveragerating - Contains the average of all star ratings assigned to a given resource. Values range from 1.000 to 5.000. Example search syntax: itemannoaveragerating:[3.500 TO 5.000] - returns all resources with an average star rating of 3.5 to 5.0. May be sorted. Available for formats: adn.

  • itemannoratingvalues - Contains all star ratings assigned to a given resource. Values range from 1 to 5. Example search syntax: itemannoratingvalues:[3 TO 5] - returns all resources that have one or more ratings of 3, 4, or 5 stars assigned to them. May be sorted. Available for formats: adn.

  • itemannonumratings - Contains the number of star ratings that have been assigned to a given resource. Values are encoded to 5 digits, for example 00000 or 00014. Example search syntax: itemannonumratings:[00004 TO 99999] - returns all resources that have from 4 to 99999 star ratings assigned to them. May be sorted. Available for formats: adn.

  • annorating - Contains the star rating of a given annotation record. Values are integers from 1 to 5. Example search syntax: annorating:[3 TO 5] - returns all annotations that have a star rating of 3 to 5. May be sorted. Available for formats: dlese_anno.

  • ncsCollectOaiFrequency - Integer and float values that indicate the NSDL Collection OAI harvest frequency in months. Range queries are not supported (search by value only). Possible values are: 1, 2, ... n; 0.5. May be sorted. Available formats: ncs_collect.
    From xpath: /record/collection/ingest/oai/@frequency
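
The range fields above expect values in a fixed encoding (three decimal places for itemannoaveragerating, five zero-padded digits for itemannonumratings). The Python sketch below formats range clauses accordingly; the helper names are hypothetical and shown only to illustrate the encoding.

```python
def average_rating_range(low, high):
    """Range clause for itemannoaveragerating, encoded to 3 decimal places."""
    return "itemannoaveragerating:[%.3f TO %.3f]" % (low, high)


def num_ratings_range(low, high):
    """Range clause for itemannonumratings, zero-padded to 5 digits."""
    return "itemannonumratings:[%05d TO %05d]" % (low, high)


avg_clause = average_rating_range(3.5, 5.0)
num_clause = num_ratings_range(4, 99999)
print(avg_clause)
print(num_clause)
```

Supplying values in the documented width matters if the index compares terms as text, which is presumably why the values are encoded with leading zeros; an unpadded 4 would not fall where expected in a range that ends at 99999.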

Fields available for searching by date - These fields may be supplied in the 'dateField' argument of the Search request:

  • wndate - A date field that indicates the date the item was new to the repository, corresponding to the 'wntype' field. May be sorted. Available for all formats.

  • accessiondate - The ADN accession date for the record. May be sorted. Available for formats: adn.

  • collaccessiondate - The dlese_collect accession date for the collection. May be sorted. Available for formats: dlese_collect.

  • modtime - A date field that corresponds to the time the item's file was last modified or touched. This does not necessarily indicate that the content in the record changed. May be sorted. Available for all formats.

  • newsOppsapplyBydate - News & Opportunities applyBy date. May be sorted. Available for formats: news_opps.

  • newsOppsarchivedate - News & Opportunities archive date. May be sorted. Available for formats: news_opps.

  • newsOppsduedate - News & Opportunities due date. May be sorted. Available for formats: news_opps.

  • newsOppseventStartdate - News & Opportunities eventStart date. May be sorted. Available for formats: news_opps.

  • newsOppseventStopdate - News & Opportunities eventStop date. May be sorted. Available for formats: news_opps.

  • newsOppspostdate - News & Opportunities post date. May be sorted. Available for formats: news_opps.

  • newsOppsrecordCreationdate - News & Opportunities recordCreation date. May be sorted. Available for formats: news_opps.

  • newsOppsrecordModifieddate - News & Opportunities recordModified date. May be sorted. Available for formats: news_opps.


Example search queries


This section shows some examples of performing searches using the Search request. To perform these searches, the values shown below should be supplied in the 'q' argument, using the Lucene query syntax. Additional arguments may be supplied to the Search request to further limit the search, such as xmlFormat, dateField and the vocabulary fields gr, su, re and cs.
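
As a rough illustration of how a 'q' value is sent, the Python sketch below URL-encodes a query and appends it to the base URL with verb=Search. It only builds the request URL and does not contact the service. The helper name build_search_url is hypothetical, and a real Search request may require further arguments that are defined in the Service requests section and are not shown here.

```python
from urllib.parse import urlencode

# Base URL as given in the Definitions and concepts section.
BASE_URL = "http://nldr.library.ucar.edu/dds/services/ddsws1-1"


def build_search_url(q, base_url=BASE_URL, **extra_args):
    """Build a Search request URL with a URL-encoded Lucene query.

    extra_args are passed through unchanged as additional
    argument=value pairs (for example dateField or su).
    """
    args = {"verb": "Search", "q": q}
    args.update(extra_args)
    # urlencode escapes characters such as ':', '(', ')' and spaces.
    return base_url + "?" + urlencode(args)


url = build_search_url("stems:(currents in the oceans)")
print(url)
```

Encoding the query this way keeps characters that are significant in Lucene syntax (colons, parentheses, brackets, carets) from being misread as part of the URL.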

Search for the term 'ocean' in the default field:
ocean

Search for the term 'ocean' in the stems field. This will return documents containing morphologically similar terms including ocean, oceans and oceanic:
stems:ocean

Search for the terms 'currents in the oceans' in the stems field. Notice that the client should supply the plain English version of the terms without pre-stemming them. In this example the resulting search matches documents that contain one of currents, current, or currently AND one of oceans, ocean, or oceanic (the terms 'in' and 'the' are stop words that are dropped for the purpose of search):
stems:(currents in the oceans)

Search for resources that have an average star rating of 3.5 to 5.0:
itemannoaveragerating:[3.500 TO 5.000]

Search for resources that contain 'noaa.gov' in their URL:
url:http*noaa.gov*

Search for the term ocean within resources from 'noaa.gov':
url:http*noaa.gov* AND ocean

Search for the term 'estuary' in the stems field, and limit the search to subject biological oceanography (subject key 02):
stems:estuary AND su:02

Search for the term 'ocean' in the default field, and boost the ranking of results that contain 'ocean' in their title (stemmed). This uses the special clause allrecords:true to select the set of all records. The query returns the same number of results as a search for the word 'ocean' alone in the default field, but records that contain the term 'ocean' in their title (stemmed) receive an additional boost, which raises their rank in the results:
ocean AND (allrecords:true OR titlestems:ocean^2)

Show all records with subject biological oceanography, and boost results that contain florida in the title (stemmed), description or placeNames fields (uses the clause allrecords:true to select the set of all records):
su:02 AND (allrecords:true OR titlestems:florida*^20 
           OR description:florida*^20 OR placeNames:florida^20) 
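
The boosting pattern used in the last two examples can be sketched as a small Python helper. The function name boosted_query and its parameters are hypothetical; it simply assembles the same query strings shown above.

```python
def boosted_query(base_clause, boosts):
    """Combine a base clause with optional boosting clauses.

    boosts: list of (field, term, boost) tuples. The allrecords:true
    clause keeps the result set equal to that of base_clause alone,
    while the boosted clauses only affect ranking.
    """
    boost_clauses = ["%s:%s^%s" % (field, term, boost)
                     for field, term, boost in boosts]
    optional = " OR ".join(["allrecords:true"] + boost_clauses)
    return "%s AND (%s)" % (base_clause, optional)


title_boost = boosted_query("ocean", [("titlestems", "ocean", 2)])
florida_boost = boosted_query("su:02", [
    ("titlestems", "florida*", 20),
    ("description", "florida*", 20),
    ("placeNames", "florida", 20),
])
print(title_boost)
print(florida_boost)
```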


Glossary

whatsNewDate - A date that describes when an item was new to the repository. Generally this corresponds to the item's accession date or the date on which the item first became accessible in the repository.

 

Configure search fields, facets, and relationships

The following document provides information for system administrators who are installing and managing a DDS repository system, which includes the Digital Discovery System (DDS) and the NSDL Collection System (NCS).


University Corporation for Atmospheric Research (UCAR) National Science Foundation (NSF) National Science Digital Library (NSDL)