Differences between revisions 1 and 7 (spanning 6 versions)
Revision 1 as of 2008-06-26 12:26:10
Size: 2109
Editor: kfc
Comment: Created by the PackagePages action.
Revision 7 as of 2010-03-17 13:09:38
Size: 9014
Editor: localhost
Comment: converted to 1.6 markup
An API for searching DOMS needs to be provided.

Two possibilities exist:

 * Using GSearch - http://defxws2006.cvt.dk/fedoragsearch/
   * Pros: Easily set up, simple interface
   * Cons: Searches on a one-fedora-object basis
 * Using Summa - https://gforge.statsbiblioteket.dk/projects/summa
   * Pros: Searches on entire metadata records
   * Cons: Difficult to set up, more complicated interface

Summa is probably the better choice.

Summa webservice search interface WSDL extract:

{{{
   <element name="simpleSearch">
    <complexType>
     <sequence>
      <element name="query" type="xsd:string"/>
      <element name="numberOfRecords" type="xsd:int"/>
      <element name="startIndex" type="xsd:int"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchResponse">
    <complexType>
     <sequence>
      <element name="simpleSearchReturn" type="xsd:string"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchSorted">
    <complexType>
     <sequence>
      <element name="query" type="xsd:string"/>
      <element name="numberOfRecords" type="xsd:int"/>
      <element name="startIndex" type="xsd:int"/>
      <element name="sortKey" type="xsd:string"/>
      <element name="reverse" type="xsd:boolean"/>
     </sequence>
    </complexType>
   </element>
   <element name="simpleSearchSortedResponse">
    <complexType>
     <sequence>
      <element name="simpleSearchSortedReturn" type="xsd:string"/>
     </sequence>
    </complexType>
   </element>
}}}


The result will be of the form:


{{{
       <?xml version="1.0" encoding="UTF-8"?>
       <searchresult filter="..." query="..."
                     startIndex="..." maxRecords="..."
                     sortKey="..." reverseSort="..."
                     fields="..." searchTime="..." hitCount="...">
         <record score="..." sortValue="...">
           <field name="recordID">...</field>
           <field name="shortformat">...</field>
         </record>
         ...
       </searchresult>
}}}
DOMS Search uses the simple search methods of the Summa Search interface.

WSDL: [[attachment:DomsGUISearch.xml]]

Content of this page:
 * [[#operations|Operations]]
 * [[#resultXML|Result XML Description]]
 * [[#example|Result XML Example]]

<<Anchor(operations)>>
== Operations ==

=== simpleSearch ===
This method executes the given query and returns a search result ranked by relevance.

Input parameters:
 * {{{String query}}} The query string.
 * {{{int numberOfRecords}}} The maximum number of records to return in the search result.
 * {{{int startIndex}}} The number of the first record to return.

Returns:
 * {{{String simpleSearchReturn}}} The search result, sorted by relevance, as a structured XML document. See [[#resultXML|description]] below.

Throws:
 * {{{java.rmi.RemoteException}}}
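
Paging through a large result set comes down to choosing successive {{{startIndex}}} values. The following is a minimal sketch in Python, assuming {{{startIndex}}} is zero-based (the example result further down reports {{{startIndex="0"}}}, but the interface does not state this explicitly). The helper name {{{page_windows}}} is hypothetical and not part of Summa or DOMS.

```python
def page_windows(hit_count, page_size):
    """Yield (startIndex, numberOfRecords) pairs that cover hit_count results.

    Assumes startIndex is zero-based; adjust if the service counts from 1.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start_index in range(0, hit_count, page_size):
        # The last page may be shorter than page_size.
        yield start_index, min(page_size, hit_count - start_index)


windows = list(page_windows(45, 20))
print(windows)  # [(0, 20), (20, 20), (40, 5)]
```

The total ({{{hit_count}}} here) would come from the {{{hitCount}}} attribute of the first result.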

=== simpleSearchSorted ===
This method executes the given query and returns a search result ranked by the given sort key.

Input parameters:
 * {{{String query}}} The query string.
 * {{{int numberOfRecords}}} The maximum number of records to return in the search result.
 * {{{int startIndex}}} The number of the first record to return.
 * {{{String sortKey}}} The key to sort by.
 * {{{boolean reverse}}} Whether to sort in reverse order.

Returns:
 * {{{String simpleSearchSortedReturn}}} The search result, sorted by the given key (reversed if so indicated), as a structured XML document. See [[#resultXML|description]] below.

Throws:
 * {{{java.rmi.RemoteException}}}
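
The two operations differ only in the extra sort parameters, so one helper can build either request. The sketch below, in Python, assumes a SOAP 1.1 envelope with namespace-qualified child elements. The namespace in {{{SEARCH_NS}}} is a placeholder, since the real target namespace must be read from the WSDL, and {{{build_request}}} is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Placeholder only: take the real target namespace from DomsGUISearch.xml.
SEARCH_NS = "http://example.org/summa/search"


def qname(namespace, tag):
    """Return a tag name in ElementTree's {namespace}tag notation."""
    return "{" + namespace + "}" + tag


def build_request(query, number_of_records, start_index, sort_key=None, reverse=False):
    """Build a simpleSearch request, or simpleSearchSorted if sort_key is given."""
    envelope = ET.Element(qname(SOAP_NS, "Envelope"))
    body = ET.SubElement(envelope, qname(SOAP_NS, "Body"))
    operation = "simpleSearch" if sort_key is None else "simpleSearchSorted"
    call = ET.SubElement(body, qname(SEARCH_NS, operation))
    # Parameters in the order listed for the operations above.
    ET.SubElement(call, qname(SEARCH_NS, "query")).text = query
    ET.SubElement(call, qname(SEARCH_NS, "numberOfRecords")).text = str(number_of_records)
    ET.SubElement(call, qname(SEARCH_NS, "startIndex")).text = str(start_index)
    if sort_key is not None:
        ET.SubElement(call, qname(SEARCH_NS, "sortKey")).text = sort_key
        # xsd:boolean is written as lowercase true/false.
        ET.SubElement(call, qname(SEARCH_NS, "reverse")).text = "true" if reverse else "false"
    return ET.tostring(envelope, encoding="unicode")


print(build_request("narrative", 20, 0, sort_key="summa-score"))
```

In practice a SOAP toolkit that generates stubs from the WSDL would take care of this; the sketch only shows which values go where.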

<<Anchor(resultXML)>>
== Result XML Description ==

The result string defined by Summa is XML, in the following form:

{{{
<?xml version="1.0" encoding="UTF-8" ?>
<responsecollection>
  <response> response-xml-1 </response>
  <response> response-xml-2 </response>
  ...
</responsecollection>
}}}

Possible responses (in place of {{{response-xml-1}}}, {{{response-xml-2}}}, ... above) are document response, facet result and others. In DOMS we only use document response, which looks like this:
{{{
<documentresult filter="..." query="..." startIndex="..." maxRecords="..." sortKey="..."
                reverseSort="..." fields="..." searchTime="..." hitCount="...">
  <record score="..." sortValue="...">
    <field name="recordID">...</field>
    <field name="shortformat">...</field>
  </record>
  ...
</documentresult>
}}}
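
Picking the document response out of the collection can be sketched as follows, using Python's standard {{{xml.etree.ElementTree}}}. The sample string is a trimmed, made-up result with no records, and {{{find_document_result}}} is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up result used only for illustration.
SAMPLE = """<responsecollection>
  <response name="DocumentResponse">
    <documentresult query="narrative" startIndex="0" maxRecords="20"
                    sortKey="summa-score" reverseSort="false"
                    fields="recordID, shortformat" searchTime="8" hitCount="2"/>
  </response>
</responsecollection>"""


def find_document_result(result_xml):
    """Return the first documentresult element in the result, or None."""
    root = ET.fromstring(result_xml.encode("utf-8"))
    # iter() walks the whole tree, so other response types are simply skipped.
    return next(root.iter("documentresult"), None)


doc = find_document_result(SAMPLE)
summary = {
    "query": doc.get("query"),
    "start_index": int(doc.get("startIndex")),
    "max_records": int(doc.get("maxRecords")),
    "hit_count": int(doc.get("hitCount")),
}
print(summary)
```

Attributes that are absent (such as {{{filter}}} here) come back as {{{None}}} from {{{get()}}}, so callers should check before converting them.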

Currently, we do not have a schema for the result. The result can be read as follows:

documentresult element
 * Attribute {{{filter}}} is not used in simple search results.
 * Attributes {{{query}}}, {{{startIndex}}}, {{{maxRecords}}}, {{{sortKey}}}, {{{reverseSort}}}: Same as input to method.
 * Attribute {{{fields}}}: Always "recordID, shortformat" in DOMS.
 * Attribute {{{searchTime}}}: Time it took to search.
 * Attribute {{{hitCount}}}: Number of results.

record element
 * Attribute {{{score}}}: Relevance score, a value from 0 to 1.
 * Attribute {{{sortValue}}} is the value that the sort was performed on.

field element
 * Attribute {{{name}}}: In DOMS always either recordID or shortformat.
 * Contents are the PID for recordID, or XML for shortformat.
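
The reading rules above can be sketched in Python as follows. This is an illustration only: the PIDs in the sample are made up, {{{read_records}}} is a hypothetical name, and the sketch assumes the shortformat XML is embedded as child elements rather than as escaped text.

```python
import xml.etree.ElementTree as ET

# Made-up sample; real PIDs and scores will differ.
SAMPLE = """<documentresult query="hans" startIndex="0" maxRecords="20" hitCount="2">
  <record score="0.75" sortValue="a">
    <field name="recordID">doms:example-1</field>
    <field name="shortformat"><shortrecord/></field>
  </record>
  <record score="0.25" sortValue="b">
    <field name="recordID">doms:example-2</field>
    <field name="shortformat"><shortrecord/></field>
  </record>
</documentresult>"""


def read_records(document_result):
    """Return one dict per record with its score, sort value, PID and shortrecord."""
    records = []
    for record in document_result.findall("record"):
        # Index the field elements by their name attribute.
        fields = {field.get("name"): field for field in record.findall("field")}
        records.append({
            "score": float(record.get("score")),
            "sort_value": record.get("sortValue"),
            "record_id": fields["recordID"].text.strip(),
            "shortrecord": fields["shortformat"].find("shortrecord"),
        })
    return records


records = read_records(ET.fromstring(SAMPLE))
for entry in records:
    print(entry["record_id"], entry["score"])
```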

The XML for shortformat is of the following form:
{{{
<shortrecord>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description>
      <dc:title>...</dc:title>
      <dc:creator>...</dc:creator>
      <dc:date>...</dc:date>
      <dc:type xml:lang="da">netdokument</dc:type>
      <dc:type xml:lang="en">net document</dc:type>
      <dc:identifier>...</dc:identifier>
      ...
    </rdf:Description>
  </rdf:RDF>
</shortrecord>
}}}

The important elements are the Dublin Core ({{{dc}}}) fields, which contain the actual result data.
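
Because the {{{dc}}} elements are namespaced and some of them repeat (for example {{{dc:type}}} once per language), a client may want to collect them by name. A minimal sketch in Python, with sample values borrowed from the example result on this page; {{{dc_fields}}} is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<shortrecord>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description>
      <dc:title>Pensare per immagini: una strada per la coscienza</dc:title>
      <dc:creator>Ferdinando Testa</dc:creator>
      <dc:date>2005</dc:date>
      <dc:type xml:lang="da">netdokument</dc:type>
      <dc:type xml:lang="en">net document</dc:type>
    </rdf:Description>
  </rdf:RDF>
</shortrecord>"""


def dc_fields(shortrecord):
    """Collect Dublin Core values as {name: [(language, text), ...]}."""
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in shortrecord.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            # language is None when the element has no xml:lang attribute.
            fields.setdefault(name, []).append((element.get(XML_LANG), element.text))
    return fields


fields = dc_fields(ET.fromstring(SAMPLE))
print(fields["title"])
print(fields["type"])
```

Matching on the namespace URI rather than the {{{dc:}}} prefix keeps the sketch working whether the namespace is declared once on {{{rdf:RDF}}} or repeated on every element.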

<<Anchor(example)>>
== Result XML Example ==
This example is the same as the one given by the [[http://wiki.statsbiblioteket.dk/summa/Community/Tutorials/MinimalDeployment|Summa Minimal Deployment Tutorial]], except without the facet result response.
##New Example:
{{{
<?xml version="1.0" encoding="UTF-8" ?>
<responsecollection>
<response name="DocumentResponse">
<documentresult query="narrative" startIndex="0" maxRecords="20" sortKey="summa-score" reverseSort="false" fields="main_titel, lsubject, lsu_oai, author_normalised, recordID, shortformat" searchTime="8" hitCount="2">
  <record score="0.20924361" id="122" source="NA">
    <field name="main_titel">Pensare per immagini: una strada per la coscienza</field>
    <field name="lsubject">NoSubject</field>
    <field name="lsu_oai">NoOAI</field>
    <field name="author_normalised">Ferdinando Testa</field>
    <field name="recordID">oai:oai:doaj-articles:badd9ac32fc2e096cf76fec4f0d19250</field>
    <field name="shortformat"><shortrecord>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Pensare per immagini: una strada per la coscienza</dc:title>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ferdinando Testa</dc:creator>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2005</dc:date>
<dc:type xml:lang="da" xmlns:dc="http://purl.org/dc/elements/1.1/">netdokument</dc:type>
<dc:type xml:lang="en" xmlns:dc="http://purl.org/dc/elements/1.1/">net document</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.analisiqualitativa.com/magma/0304/articolo_01.htm</dc:identifier>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.doaj.org/doaj?func=openurl&amp;genre=article&amp;issn=17219809&amp;date=2005&amp;volume=03&amp;issue=04&amp;spage=</dc:identifier>
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">todo</dc:format>
</rdf:Description>
</rdf:RDF>
</shortrecord></field>
  </record>
  <record score="0.20924361" id="149" source="NA">
    <field name="main_titel">La narrazione: dimensione ontologica della formazione</field>
    <field name="lsubject">NoSubject</field>
    <field name="lsu_oai">NoOAI</field>
    <field name="author_normalised">Francesca Pulvirenti</field>
    <field name="recordID">oai:oai:doaj-articles:dd2dffe34df1293e045aee58f06a5c3f</field>
    <field name="shortformat"><shortrecord>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">La narrazione: dimensione ontologica della formazione</dc:title>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Francesca Pulvirenti</dc:creator>
<dc:date xmlns:dc="http://purl.org/dc/elements/1.1/">2005</dc:date>
<dc:type xml:lang="da" xmlns:dc="http://purl.org/dc/elements/1.1/">netdokument</dc:type>
<dc:type xml:lang="en" xmlns:dc="http://purl.org/dc/elements/1.1/">net document</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.analisiqualitativa.com/magma/0303/editoriale.htm</dc:identifier>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">http://www.doaj.org/doaj?func=openurl&amp;genre=article&amp;issn=17219809&amp;date=2005&amp;volume=03&amp;issue=03&amp;spage=</dc:identifier>
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">todo</dc:format>
</rdf:Description>
</rdf:RDF>
</shortrecord></field>
  </record>
</documentresult>
</response>
</responsecollection>
}}}


##Old Example:
##{{{
##<responsecollection>
##<response name="DocumentResponse">
##<documentresult query="Hans" startIndex="0" maxRecords="20"
##sortKey="summa-score" reverseSort="false" fields="recordID, shortformat"
##searchTime="105" hitCount="1">
## <record score="0.37572" id="0" source="NA">
## <field name="recordID">fagref:hj@example.com</field>
## <field name="shortformat">&amp;lt;shortrecord&amp;gt;
##&amp;lt;rdf:RDF
##xmlns:rdf=&amp;quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
##&amp;lt;rdf:Description&amp;gt;
##&amp;lt;dc:title&amp;gt;Fagekspert i Datalogi&amp;lt;/dc:title&amp;gt;
##&amp;lt;dc:creator&amp;gt;Hans Jensen&amp;lt;/dc:creator&amp;gt;
##&amp;lt;dc:type
##xml:lang=&amp;quot;da&amp;quot;&amp;gt;person&amp;lt;/dc:type&amp;gt;
##&amp;lt;dc:type
##xml:lang=&amp;quot;en&amp;quot;&amp;gt;person&amp;lt;/dc:type&amp;gt;
##&amp;lt;dc:identifier&amp;gt;hj@example.com&amp;lt;/dc:identifier&amp;gt;
##&amp;lt;/rdf:Description&amp;gt;
##&amp;lt;/rdf:RDF&amp;gt;
##&amp;lt;/shortrecord&amp;gt;</field>
## </record>
##</documentresult>
##</response>
##</responsecollection>
##}}}
##
##The format of the "shortformat" field is (decoded version of contents above):
##
##{{{
##<shortrecord>
##<rdf:RDF
##xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
##<rdf:Description>
##<dc:title>Fagekspert i Datalogi</dc:title>
##<dc:creator>Hans Jensen</dc:creator>
##<dc:type
##xml:lang="da">person</dc:type>
##<dc:type
##xml:lang="en">person</dc:type>
##<dc:identifier>hj@example.com</dc:identifier>
##</rdf:Description>
##</rdf:RDF>
##</shortrecord>
##}}}

Search API (last edited 2010-03-17 13:09:38 by localhost)