Differences between revisions 1 and 6 (spanning 5 versions)
Revision 1 as of 2008-06-26 12:26:10
Size: 2109
Editor: kfc
Comment: Created by the PackagePages action.
Revision 6 as of 2008-10-17 13:44:57
Size: 8998
Editor: bam
Comment: Search API review follow-up
An API for searching DOMS needs to be provided.

Two possibilities exist:

 * Using GSearch - http://defxws2006.cvt.dk/fedoragsearch/
   * Pros: Easily set up, simple interface
   * Cons: Searches one Fedora object at a time
 * Using Summa - https://gforge.statsbiblioteket.dk/projects/summa
   * Pros: Searches on entire metadata records
   * Cons: Difficult to set up, more complicated interface

Summa is probably the better choice.

Extract from the Summa web service search interface WSDL:

{{{
   <element name="simpleSearch">
    <complexType>
     <sequence>
      <element name="query" type="xsd:string"/>
      <element name="numberOfRecords" type="xsd:int"/>
      <element name="startIndex" type="xsd:int"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchResponse">
    <complexType>
     <sequence>
      <element name="simpleSearchReturn" type="xsd:string"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchSorted">
    <complexType>
     <sequence>
      <element name="query" type="xsd:string"/>
      <element name="numberOfRecords" type="xsd:int"/>
      <element name="startIndex" type="xsd:int"/>
      <element name="sortKey" type="xsd:string"/>
      <element name="reverse" type="xsd:boolean"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchSortedResponse">
    <complexType>
     <sequence>
      <element name="simpleSearchSortedReturn" type="xsd:string"/>
     </sequence>
    </complexType>
   </element>
}}}
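
The following Python sketch shows the request side. It builds a {{{simpleSearchSorted}}} request element whose children follow the order that the {{{xsd:sequence}}} above prescribes. The target namespace is not part of the extract, so {{{NS}}} below is a placeholder. Qualifying the child elements assumes the schema uses {{{elementFormDefault="qualified"}}}. Check both assumptions against the actual WSDL. In practice, a SOAP client library would normally generate this envelope for you.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace (assumption): use the targetNamespace from the real WSDL.
NS = "urn:example:summa-search"


def build_simple_search_sorted(query: str, number_of_records: int,
                               start_index: int, sort_key: str,
                               reverse: bool) -> ET.Element:
    """Build a <simpleSearchSorted> element following the WSDL sequence."""
    root = ET.Element(f"{{{NS}}}simpleSearchSorted")
    # xsd:sequence fixes the child order, so keep this list in WSDL order.
    children = [
        ("query", query),
        ("numberOfRecords", str(number_of_records)),
        ("startIndex", str(start_index)),
        ("sortKey", sort_key),
        # xsd:boolean is written in its lowercase lexical form.
        ("reverse", "true" if reverse else "false"),
    ]
    for name, text in children:
        ET.SubElement(root, f"{{{NS}}}{name}").text = text
    return root


request = build_simple_search_sorted("narrative", 20, 0, "summa-score", False)
print(ET.tostring(request, encoding="unicode"))
```

The {{{simpleSearch}}} request has the same shape, with only the first three children.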


The result will be of the form:


{{{
       <?xml version="1.0" encoding="UTF-8"?>
       <searchresult filter="..." query="..."
                     startIndex="..." maxRecords="..."
                     sortKey="..." reverseSort="..."
                     fields="..." searchTime="..." hitCount="...">
         <record score="..." sortValue="...">
           <field name="recordID">...</field>
           <field name="shortformat">...</field>
         </record>
         ...
       </searchresult>
}}}
DOMS Search uses the simple search methods of the Summa Search interface.

WSDL: attachment:DomsGUISearch.xml

Content of this page:
 * [#operations Operations]
 * [#resultXML Result XML Description]
 * [#example Result XML Example]

[[Anchor(operations)]]
== Operations ==

=== simpleSearch ===
This method executes the given query and returns a search result ranked by relevance.

Input parameters:
 * {{{String query}}} The query string.
 * {{{int numberOfRecords}}} The maximum number of records returned in the search result.
 * {{{int startIndex}}} The index of the first record to return.

Returns:
 * {{{String simpleSearchReturn}}} The search result sorted by relevance, as a structured XML document. See [#resultXML description] below.

Throws:
 * {{{java.rmi.RemoteException}}}
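
To retrieve every hit, a client can page through the result by calling the method repeatedly with different {{{startIndex}}} and {{{numberOfRecords}}} values. The sketch below rests on two assumptions, and this page states neither explicitly. The first is that {{{startIndex}}} is zero-based, which the example result ({{{startIndex="0"}}}) suggests. The second is that {{{hitCount}}} reports the total number of matches, not just the number returned. The function name is hypothetical.

```python
from typing import Iterator, Tuple


def page_windows(hit_count: int, page_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (startIndex, numberOfRecords) pairs covering hit_count records.

    Assumes a zero-based startIndex and that hit_count is the total number
    of matches reported by a previous call.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, hit_count, page_size):
        # The last page may be shorter than page_size.
        yield start, min(page_size, hit_count - start)


for start_index, number_of_records in page_windows(45, 20):
    print(start_index, number_of_records)
```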

=== simpleSearchSorted ===
This method executes the given query and returns a search result sorted by the given sort key.

Input parameters:
 * {{{String query}}} The query string.
 * {{{int numberOfRecords}}} The maximum number of records returned in the search result.
 * {{{int startIndex}}} The index of the first record to return.
 * {{{String sortKey}}} The key to sort by.
 * {{{boolean reverse}}} Whether to sort in reverse order.

Returns:
 * {{{String simpleSearchSortedReturn}}} The search result sorted by the given key (reversed if {{{reverse}}} is true), as a structured XML document. See [#resultXML description] below.

Throws:
 * {{{java.rmi.RemoteException}}}
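
According to the result description below, the {{{documentresult}}} element repeats the method's inputs as attributes. The sketch assumes that {{{numberOfRecords}}} is echoed as {{{maxRecords}}} and {{{reverse}}} as {{{reverseSort}}}. That mapping is inferred from the attribute names and is not stated explicitly. Under that assumption, a hypothetical helper like the following could sanity-check a {{{simpleSearchSorted}}} response:

```python
import xml.etree.ElementTree as ET


def echoes_request(result_xml: str, query: str, number_of_records: int,
                   start_index: int, sort_key: str, reverse: bool) -> bool:
    """Check that <documentresult> repeats the simpleSearchSorted inputs.

    Assumed mapping (inferred from attribute names, not documented):
    numberOfRecords -> maxRecords, reverse -> reverseSort.
    """
    doc = ET.fromstring(result_xml).find(".//documentresult")
    if doc is None:
        return False
    expected = {
        "query": query,
        "maxRecords": str(number_of_records),
        "startIndex": str(start_index),
        "sortKey": sort_key,
        "reverseSort": "true" if reverse else "false",
    }
    return all(doc.get(name) == value for name, value in expected.items())
```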

[[Anchor(resultXML)]]
== Result XML Description ==

The result string defined by Summa is XML, in the following form:

{{{
<?xml version="1.0" encoding="UTF-8" ?>
<responsecollection>
  <response> response-xml-1 </response>
  <response> response-xml-2 </response>
  ...
</responsecollection>
}}}
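
A client that needs only the document results can pick them out of the collection with Python's standard {{{xml.etree.ElementTree}}} module. This sketch selects each {{{response}}} by its {{{documentresult}}} child rather than by the {{{name="DocumentResponse"}}} attribute seen in the example further down, because the generic form above does not show that attribute. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def document_results(result_xml: str) -> List[ET.Element]:
    """Return every <documentresult> wrapped in a <response> element."""
    root = ET.fromstring(result_xml)
    if root.tag != "responsecollection":
        raise ValueError(f"unexpected root element: {root.tag}")
    found = []
    for response in root.findall("response"):
        # Other response kinds (e.g. facet results) have no
        # <documentresult> child and are skipped.
        found.extend(response.findall("documentresult"))
    return found
```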

Possible responses (in place of {{{response-xml-1}}}, {{{response-xml-2}}}, ... above) include document results, facet results and others. DOMS uses only the document result, which looks like this:
{{{
<documentresult filter="..." query="..." startIndex="..." maxRecords="..." sortKey="..."
                reverseSort="..." fields="..." searchTime="..." hitCount="...">
  <record score="..." sortValue="...">
    <field name="recordID">...</field>
    <field name="shortformat">...</field>
  </record>
  ...
</documentresult>
}}}

There is currently no schema for the result. It can be read as follows:

documentresult element
 * Attribute {{{filter}}} is not used in simple search results.
 * Attributes {{{query}}}, {{{startIndex}}}, {{{maxRecords}}}, {{{sortKey}}}, {{{reverseSort}}}: Same as input to method.
 * Attribute {{{fields}}}: Always "recordID, shortformat" in DOMS.
 * Attribute {{{searchTime}}}: Time taken to perform the search.
 * Attribute {{{hitCount}}}: Number of results.
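
All of these attributes arrive as strings. The hypothetical helper below converts them to typed values. The unit of {{{searchTime}}} is not given on this page, so the sketch keeps it as a bare integer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentResultHeader:
    query: str
    start_index: int
    max_records: int
    sort_key: str
    reverse_sort: bool
    fields: List[str]
    search_time: int  # unit not documented on this page
    hit_count: int


def parse_header(doc: ET.Element) -> DocumentResultHeader:
    """Convert the <documentresult> attributes into typed values."""
    return DocumentResultHeader(
        query=doc.get("query", ""),
        start_index=int(doc.get("startIndex", "0")),
        max_records=int(doc.get("maxRecords", "0")),
        sort_key=doc.get("sortKey", ""),
        reverse_sort=doc.get("reverseSort", "false") == "true",
        # "recordID, shortformat" -> ["recordID", "shortformat"]
        fields=[f.strip() for f in doc.get("fields", "").split(",") if f.strip()],
        search_time=int(doc.get("searchTime", "0")),
        hit_count=int(doc.get("hitCount", "0")),
    )
```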

record element
 * Attribute {{{score}}}: Relevance score, a value from 0 to 1.
 * Attribute {{{sortValue}}}: The value the sort was performed on.

field element
 * Attribute {{{name}}}: In DOMS, always either {{{recordID}}} or {{{shortformat}}}.
 * Contents: the PID for {{{recordID}}}, or XML for {{{shortformat}}}.
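
Reading the records then amounts to indexing each record's {{{field}}} elements by their {{{name}}} attribute. In the example below, the shortformat XML appears as child elements of the field. The sketch also accepts it as escaped text, which is an assumption about other result producers. The {{{Hit}}} type and the {{{parse_records}}} function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    record_id: str
    score: float
    shortformat: Optional[ET.Element]


def _shortrecord(field: Optional[ET.Element]) -> Optional[ET.Element]:
    """Return the <shortrecord> element, embedded or escaped."""
    if field is None:
        return None
    child = field.find("shortrecord")
    if child is not None:
        return child
    # Fallback (assumption): the XML was delivered as escaped text.
    text = (field.text or "").strip()
    return ET.fromstring(text) if text else None


def parse_records(doc: ET.Element) -> List[Hit]:
    """Turn each <record> of a <documentresult> into a Hit."""
    hits = []
    for record in doc.findall("record"):
        fields = {f.get("name"): f for f in record.findall("field")}
        id_field = fields.get("recordID")
        record_id = (id_field.text or "").strip() if id_field is not None else ""
        hits.append(Hit(record_id, float(record.get("score", "0")),
                        _shortrecord(fields.get("shortformat"))))
    return hits
```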

The XML for shortformat is of the following form:
{{{
<shortrecord>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description>
      <dc:title>...</dc:title>
      <dc:creator>...</dc:creator>
      <dc:date>...</dc:date>
      <dc:type xml:lang="da">netdokument</dc:type>
      <dc:type xml:lang="en">net document</dc:type>
      <dc:identifier>...</dc:identifier>
      ...
    </rdf:Description>
  </rdf:RDF>
</shortrecord>
}}}

The important elements are the Dublin Core ({{{dc}}}) fields, which contain the actual record metadata.
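
Extracting the Dublin Core values requires namespace-aware lookups, because {{{dc:}}} and {{{rdf:}}} are prefixes bound to the URIs shown above. The minimal sketch below keys the values by local element name. It offers an optional language filter for values such as {{{dc:type}}}, which appear once per {{{xml:lang}}}. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def dc_fields(shortrecord: ET.Element,
              lang: Optional[str] = None) -> Dict[str, List[str]]:
    """Collect dc:* values from <shortrecord>, keyed by local name.

    Repeated elements (dc:type, dc:identifier, ...) keep every value.
    If lang is given, values tagged with a different xml:lang are skipped;
    untagged values are always kept.
    """
    result: Dict[str, List[str]] = {}
    desc = shortrecord.find("rdf:RDF/rdf:Description", NS)
    if desc is None:
        return result
    prefix = "{" + NS["dc"] + "}"
    for elem in desc:
        if not elem.tag.startswith(prefix):
            continue
        elem_lang = elem.get(XML_LANG)
        if lang is not None and elem_lang is not None and elem_lang != lang:
            continue
        name = elem.tag[len(prefix):]
        result.setdefault(name, []).append((elem.text or "").strip())
    return result
```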

[[Anchor(example)]]
== Result XML Example ==
This example is the same as the one given in the [http://wiki.statsbiblioteket.dk/summa/Community/Tutorials/MinimalDeployment Summa Minimal Deployment Tutorial], except that the facet result response has been omitted.
##New Example:
{{{
<?xml version="1.0" encoding="UTF-8" ?>
<responsecollection>
<response name="DocumentResponse">
<documentresult query="narrative" startIndex="0" maxRecords="20" sortKey="summa-score" reverseSort="false" fields="main_titel, lsubject, lsu_oai, author_normalised, recordID, shortformat" searchTime="8" hitCount="2">
  <record score="0.20924361" id="122" source="NA">
    <field name="main_titel">Pensare per immagini: una strada per la coscienza</field>
    <field name="lsubject">NoSubject</field>
    <field name="lsu_oai">NoOAI</field>
    <field name="author_normalised">Ferdinando Testa</field>
    <field name="recordID">oai:oai:doaj-articles:badd9ac32fc2e096cf76fec4f0d19250</field>
    <field name="shortformat"><shortrecord>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Pensare per immagini: una strada per la coscienza</dc:title>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ferdinando Testa</dc:creator>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2005</dc:date>
<dc:type xml:lang="da" xmlns:dc="http://purl.org/dc/elements/1.1/">netdokument</dc:type>
<dc:type xml:lang="en" xmlns:dc="http://purl.org/dc/elements/1.1/">net document</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.analisiqualitativa.com/magma/0304/articolo_01.htm</dc:identifier>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.doaj.org/doaj?func=openurl&amp;genre=article&amp;issn=17219809&amp;date=2005&amp;volume=03&amp;issue=04&amp;spage=</dc:identifier>
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">todo</dc:format>
</rdf:Description>
</rdf:RDF>
</shortrecord></field>
  </record>
  <record score="0.20924361" id="149" source="NA">
    <field name="main_titel">La narrazione: dimensione ontologica della formazione</field>
    <field name="lsubject">NoSubject</field>
    <field name="lsu_oai">NoOAI</field>
    <field name="author_normalised">Francesca Pulvirenti</field>
    <field name="recordID">oai:oai:doaj-articles:dd2dffe34df1293e045aee58f06a5c3f</field>
    <field name="shortformat"><shortrecord>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">La narrazione: dimensione ontologica della formazione</dc:title>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Francesca Pulvirenti</dc:creator>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2005</dc:date>
<dc:type xml:lang="da" xmlns:dc="http://purl.org/dc/elements/1.1/">netdokument</dc:type>
<dc:type xml:lang="en" xmlns:dc="http://purl.org/dc/elements/1.1/">net document</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.analisiqualitativa.com/magma/0303/editoriale.htm</dc:identifier>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.doaj.org/doaj?func=openurl&amp;genre=article&amp;issn=17219809&amp;date=2005&amp;volume=03&amp;issue=03&amp;spage=</dc:identifier>
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">todo</dc:format>
</rdf:Description>
</rdf:RDF>
</shortrecord></field>
  </record>
</documentresult>
</response>
</responsecollection>
}}}


##Old Example:
##{{{
##<responsecollection>
##<response name="DocumentResponse">
##<documentresult query="Hans" startIndex="0" maxRecords="20"
##sortKey="summa-score" reverseSort="false" fields="recordID, shortformat"
##searchTime="105" hitCount="1">
## <record score="0.37572" id="0" source="NA">
## <field name="recordID">fagref:hj@example.com</field>
## <field name="shortformat">&amp;lt;shortrecord&amp;gt;
##&amp;lt;rdf:RDF
##xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
##&amp;lt;rdf:Description&amp;gt;
##&amp;lt;dc:title&amp;gt;Fagekspert i Datalogi&amp;lt;/dc:title&amp;gt;
##&amp;lt;dc:creator&amp;gt;Hans Jensen&amp;lt;/dc:creator&amp;gt;
##&amp;lt;dc:type
##xml:lang=&amp;quot;da&amp;quot;&amp;gt;person&amp;lt;/dc:type&amp;gt;
##&amp;lt;dc:type
##xml:lang=&amp;quot;en&amp;quot;&amp;gt;person&amp;lt;/dc:type&amp;gt;
##&amp;lt;dc:identifier&amp;gt;hj@example.com&amp;lt;/dc:identifier&amp;gt;
##&amp;lt;/rdf:Description&amp;gt;
##&amp;lt;/rdf:RDF&amp;gt;
##&amp;lt;/shortrecord&amp;gt;</field>
## </record>
##</documentresult>
##</response>
##</responsecollection>
##}}}
##
##The format of the "shortformat" field is (decoded version of contents above):
##
##{{{
##<shortrecord>
##<rdf:RDF
##xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
##<rdf:Description>
##<dc:title>Fagekspert i Datalogi</dc:title>
##<dc:creator>Hans Jensen</dc:creator>
##<dc:type
##xml:lang="da">person</dc:type>
##<dc:type
##xml:lang="en">person</dc:type>
##<dc:identifier>hj@example.com</dc:identifier>
##</rdf:Description>
##</rdf:RDF>
##</shortrecord>
##}}}


Search API (last edited 2010-03-17 13:09:38 by localhost)