The Connected Histories API

Connected Histories (http://www.connectedhistories.org/) brings together a range of digital resources related to early modern and nineteenth century Britain with a single federated search that allows sophisticated searching of names, places and dates, as well as the ability to save, connect and share resources within a personal workspace. The Connected Histories API enables users to connect programmatically to the search engine, using GET parameters, and retrieve search results in an XML format.

API Location

The Connected Histories API is available at http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp.

Input Parameters

At least one person, given, family or place name or keyword must be specified. Any number of parameters may be combined but, unless otherwise stated, each parameter can have only a single value; see URL-encoded Strings for details on how to search for multiple values or phrases. Input parameters should be appended to the URL as a query string (e.g. http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?rsg=Katherine&sr=cr,ob,or). Further examples of API queries can be found in Example API Queries.
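
As an illustration of how parameters are appended to the URL, the following Python sketch builds a query string from parameter codes. The function name build_query_url is hypothetical and not part of the API; the check for a required parameter simply mirrors the rule stated above.

```python
from urllib.parse import urlencode

BASE_URL = "http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp"

# Parameter codes of which at least one must be present.
REQUIRED_ANY = {"kw", "rs", "rsf", "rsg", "pc"}


def build_query_url(**params):
    """Return an API URL for the given parameter codes (hypothetical helper)."""
    if not params.keys() & REQUIRED_ANY:
        raise ValueError("specify at least one of kw, rs, rsf, rsg or pc")
    # Multi-valued parameters (e.g. sr) may be passed as lists and are
    # joined with commas; everything else is converted to a string.
    flat = {
        code: ",".join(value) if isinstance(value, (list, tuple)) else str(value)
        for code, value in params.items()
    }
    # urlencode escapes the comma as %2C and spaces as +.
    return BASE_URL + "?" + urlencode(flat)


url = build_query_url(rsg="Katherine", sr=["cr", "ob", "or"])
print(url)  # ...CHSearch.jsp?rsg=Katherine&sr=cr%2Cob%2Cor
```

The comma is emitted in its URL-encoded form (%2C), which is equivalent to the literal comma used in the example above.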

Queries that return a large number of matches, particularly a large number of matches in the larger collections (such as the Newspapers), may take a relatively long time to return (perhaps upwards of 30 seconds). For example, keyword searching for London across all the datasets is likely to lead to a noticeable delay. It is therefore desirable to specify as many, and as accurate, parameters as possible.
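
Because broad queries can take 30 seconds or more, a client will probably want a generous timeout. The sketch below uses only the Python standard library; the function name fetch_results and the 60-second default are assumptions, not values recommended by the API.

```python
import urllib.request


def fetch_results(url, timeout=60):
    """Fetch an API URL and return the raw XML bytes (hypothetical helper).

    The timeout is in seconds; 60 is an assumed value chosen to sit
    comfortably above the 30-second delays mentioned above.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A call such as fetch_results(url) raises an exception (for example urllib.error.URLError) if the server cannot be reached or the timeout is exceeded, so callers may wish to catch it and retry with a narrower query.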

Parameter code Parameter Permitted values
kw Keyword Any URL-encoded String. Details about the format of valid URL-encoded search strings can be found beneath the table.
sr Collection Two-letter code specifying the collection. More than one collection may be specified; each collection code should be separated by a comma (URL-encoded as %2c). If this parameter is not specified, all collections are searched. Valid codes are:-
  • bc - Cause Papers in the Diocesan Courts of the Archbishopric of York, 1300-1858
  • bh - British History Online
  • bm - British Museum Image Collection Database
  • bu - British Newspapers, 1600-1900
  • cb - Charles Booth Online Archive
  • cd - The Clergy of the Church of England Database
  • cr - Convict Transportation Records
  • dm - Database of Mid-Victorian Wood-Engraved Illustration
  • hp - History of Parliament
  • jb - Transcribed Papers of Jeremy Bentham
  • jf - John Foxe's The Acts and Monuments Online
  • jj - John Johnson Collection of Printed Ephemera
  • jp - (JSTOR) 19th Century British Pamphlets
  • lf - Lane's Masonic Records
  • ll - London Lives 1690 - 1800 - Crime, Poverty and Social Policy in the Metropolis
  • ob - The Proceedings of the Old Bailey Online, 1674-1913
  • or - Origins.net
  • pp - House of Commons Parliamentary Papers
  • sc - Science in the Nineteenth-Century Periodical
  • st - John Strype's Survey of London Online
  • vh - Victoria County History
  • wi - Witches in Early Modern England
The order in which the collections are specified does not matter and does not affect the returned results.
rs Person Name Any valid URL-encoded String. The search returns matches that contain at least one person name matching all the criteria; for example, a search for George* Clarence* returns any match containing a person name that includes both George and Clarence (in that order), with other characters in place of the wildcards. This field can also be used to search for people referred to by specific titles, e.g. Bishop of London, as well as given and family names.
rsf Family Name Any valid URL-encoded String.
rsg Given Name Any valid URL-encoded String.
pc Place Name Any valid URL-encoded String.
dtf Date From Any date in the form yyyy or yyyy-mm-dd. Zero may be used for the month or day (for example, 1625-00-00 is equivalent to 1625, and 1625-03-00 may be used to search for March 1625). The date search includes dates mentioned in the document as well as dates associated with the document. The date from should be equal to or less than the date to (if a date to is specified). Technically the date range covered by the documents is 1500-1900, although, as all identified dates are indexed, searches for dates outside this range may still return results.
dtt Date To Any date in the form yyyy or yyyy-mm-dd. Zero may be used for the month or day (for example, 1625-00-00 is equivalent to 1625, and 1625-03-00 may be used to search for March 1625). The date search includes dates mentioned in the document as well as dates associated with the document. The date to should be equal to or more than the date from (if a date from is specified). Technically the date range covered by the documents is 1500-1900, although, as all identified dates are indexed, searches for dates outside this range may still return results.
ac Access A single letter code. The default is all access types.
  • f - free / no subscription
  • s - subscription
ct Category A single letter code. More than one category may be specified; each category code should be separated by a comma (URL-encoded as %2c). Some collections and / or results may match more than one category - for example, Strype is a book containing a significant number of Maps, so it falls into both the "Books, Pamphlets and Printed Ephemera" category and the "Images and Maps" category.
  • a - all (default if not provided)
  • b - Books, Pamphlets and Printed Ephemera
  • g - National Government Records
  • i - Images and Maps
  • l - Local Records
  • n - Newspapers
  • p - Parliamentary
cc Channel Count The number of channels to return (a channel represents the results for one of the source collections). A numerical value greater than 0. It defaults to 10.
cs Channel Start The first channel to return. This should be used in conjunction with cc (Channel Count) to 'page' through the channels. A numerical value greater than 0. It defaults to 1.
hc Hits per Channel The number of results to return for each channel (a channel represents the results for one of the source collections). This is ignored if the search returns a mixed channel. A numerical value between 1 and 5. It defaults to 5.
hp Hits per Page The number of hits to return as a mixed channel. A mixed channel contains results from several different collections and is returned when the search returns no more than the "hits per page" number of hits. A numerical value between 1 and 50. It defaults to 20.
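
The date and paging parameters in the table can be illustrated with a short Python sketch. Both helper names (api_date and channel_page_url) are hypothetical, and the paging arithmetic assumes that cs is a 1-based index of the first channel rather than a page number, which the table implies but does not state outright.

```python
from urllib.parse import urlencode

BASE_URL = "http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp"


def api_date(year, month=0, day=0):
    """Format a date as yyyy-mm-dd, leaving month or day as zero if not given."""
    return f"{year:04d}-{month:02d}-{day:02d}"


def channel_page_url(params, page, channels_per_page=10):
    """Return the URL for a 1-based page of channels (hypothetical helper)."""
    paged = dict(params)
    paged["cc"] = channels_per_page
    # Assumption: cs is a 1-based channel index, so page 2 of 10 starts at 11.
    paged["cs"] = (page - 1) * channels_per_page + 1
    return BASE_URL + "?" + urlencode(paged)


query = {"pc": "London", "dtf": api_date(1625, 3), "dtt": api_date(1700)}
print(channel_page_url(query, page=2))
```

Here dtf is sent as 1625-03-00 (March 1625) and dtt as 1700-00-00, which the table describes as equivalent to 1700.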

URL-encoded Strings

Search strings may contain several words separated by spaces (URL-encoded as +), which results in an OR search matching texts that contain any of the words. An exact match may be carried out by enclosing a multi-word phrase in double quotes (URL-encoded as %22), or the user may specify that one or more of the words must be present by placing a plus sign (URL-encoded as %2b) in front of the required word(s). Results containing certain words can be excluded by placing a minus sign (-) in front of those words. + and - cannot be used to mandate or exclude words within exact phrases (i.e. within double-quoted phrases), but can be used before a phrase to indicate that the whole phrase is required or excluded.

* can be used as a wildcard in the middle or end of a word or phrase (e.g. ta*lor returns taylor and tailor), but not at the start; the more letters you specify the better.
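
The encoding rules above can be sketched in Python as follows. The function encode_search and its argument names are hypothetical; the sketch assumes the server treats an encoded search string exactly as it treats the hand-written examples in this document.

```python
from urllib.parse import quote_plus


def _term(text):
    """Wrap a multi-word phrase in double quotes for an exact match."""
    return f'"{text}"' if " " in text else text


def encode_search(optional=(), required=(), excluded=()):
    """Build a URL-encoded search string (hypothetical helper).

    optional terms are ORed together, required terms get a leading +,
    and excluded terms get a leading -.
    """
    parts = [_term(t) for t in optional]
    parts += ["+" + _term(t) for t in required]
    parts += ["-" + _term(t) for t in excluded]
    # quote_plus turns spaces into +, a literal + into %2B and " into %22;
    # safe="*" leaves the wildcard character unescaped.
    return quote_plus(" ".join(parts), safe="*")


print(encode_search(required=["Napoleon"], excluded=["Bonaparte"]))
print(encode_search(optional=["Robin Hood"]))
print(encode_search(optional=["ta*lor"]))
```

The result can be used directly as the value of kw, rs, rsf, rsg or pc. Note that quote_plus emits upper-case hex digits (%2B rather than %2b); the two forms are equivalent.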

XML Output

The results are returned as a UTF-8 encoded XML document with a root element, CHSP. The XML document consists of two main subsections: a Q element, which details the query submitted, and a RES element, which provides the results along with facets that can be used to drill down through them. Results are generally returned in channels, each channel representing a single collection / data source.

The XSD schema defining the structure of the XML can be accessed at http://ch1.shef.ac.uk/connectedhistories/xsd/CHSP.xsd. Further information about the XML can be found in comments in the XML document itself - see http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?dtf=1625&dtt=1700&rsf=Galvin for an example.

REFINE

The REFINE section contains details of search facets, each described by a PARAM element, that can be used to drill down into the results. Each result set can be drilled down by document type ("ct"), date ("dt") and availability ("ac", i.e. whether the source material is available to all users or only to those with a subscription). N.B. The date is any date found within or associated with the document, not necessarily a publication or production date, so a newspaper article written in 1876 describing Napoleon's and Nelson's lives might fall into several different date categories, ranging from the 1750s to the 1870s. Each PARAM element has three attributes: name (ct, dt or ac), value (a String description of the limits of the facet) and match (the number of documents that match this facet). As documents may match more than one facet description, the total number of matches may exceed the total number of results.
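
A minimal Python sketch of reading the facets is shown below. The sample XML is invented for illustration and only approximates the real output: the exact nesting, the facet values and the absence of XML namespaces are all assumptions, so the XSD should be consulted before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified response; real documents contain more.
SAMPLE = """<CHSP><Q/><RES><REFINE>
<PARAM name="ct" value="Newspapers" match="12"/>
<PARAM name="ct" value="Local Records" match="3"/>
<PARAM name="ac" value="free" match="7"/>
</REFINE></RES></CHSP>"""


def read_facets(xml_text):
    """Group (value, match) pairs by facet name: ct, dt or ac."""
    root = ET.fromstring(xml_text)
    facets = {}
    # Only PARAM elements inside REFINE are read, in case PARAM is
    # also used elsewhere in the document.
    for refine in root.iter("REFINE"):
        for param in refine.iter("PARAM"):
            entry = (param.get("value"), int(param.get("match")))
            facets.setdefault(param.get("name"), []).append(entry)
    return facets


facets = read_facets(SAMPLE)
print(facets["ct"])
```

Since a document may match several facets, summing the match counts for one facet name can exceed the number of results.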

CHANNEL

The actual search results are contained in one or more channels. A single channel result is returned if:-

Unless fewer than the hp (hits per page) number of results are returned, each channel relates to one specific source collection. The CHANNEL element has attributes sn (start index for results), en (end index for results), ip (items per page), m (total number of matches for this channel), type sid (electronic resource id, equivalent to the sr codes for queries, or "nyi" if a mixed channel; further details on the sr codes can be found in the Input Parameters table). The channels are ordered by Lucene score: the channel whose first result scores highest is returned first. This means that the channels have no fixed order.

Each channel has a channel name (CN) which either identifies it as a mixed channel or provides a brief textual description of the source collection. Each channel then consists of a number of R elements, each of which contains a single (numbered) result or match. Within the R element, the T element provides the match title, the U element provides the URL (the link to the document in the source collection) and the S element provides a text snippet, which usually contains some highlighted terms (within b tags) that match the search query. As many of the source collections require the user to have a subscription, the URL may display an error rather than direct the user to the source material if the user is not currently logged in via a personal or institutional subscription.
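
The channel structure described above might be read as in the following Python sketch. The sample XML is invented for illustration (including the title, URL and snippet), and it assumes that the b tags appear as child elements of S and that no XML namespaces are used; neither assumption is confirmed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified response with a single channel and result.
SAMPLE = """<CHSP><Q/><RES>
<CHANNEL sn="1" en="1" ip="5" m="1" sid="ob">
<CN>Old Bailey Proceedings</CN>
<R><T>Example trial</T><U>http://example.org/doc/1</U>
<S>John <b>Galvin</b> was indicted</S></R>
</CHANNEL></RES></CHSP>"""


def read_channels(xml_text):
    """Return a list of channels, each with its name, sid and results."""
    root = ET.fromstring(xml_text)
    channels = []
    for channel in root.iter("CHANNEL"):
        results = []
        for r in channel.iter("R"):
            snippet = r.find("S")
            results.append({
                "title": r.findtext("T"),
                "url": r.findtext("U"),
                # itertext() flattens any highlighted <b> terms to plain text.
                "snippet": "".join(snippet.itertext()) if snippet is not None else "",
            })
        channels.append({
            "name": channel.findtext("CN"),
            "sid": channel.get("sid"),
            "matches": int(channel.get("m")),
            "results": results,
        })
    return channels


channels = read_channels(SAMPLE)
print(channels[0]["name"], channels[0]["results"][0]["title"])
```

Because the channels have no fixed order, it is safer to look them up by sid than by position in the returned list.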

Example API queries

Exclusion (Napoleon, not Bonaparte) - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?kw=%2bNapoleon+-Bonaparte
Exact phrase - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?kw=%22Robin+Hood%22
Specific collections - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?rsg=Katherine&sr=cr,ob,or
Wildcard - http://www.connectedhistories.org/Search_results.aspx?rs=Katherine*
Dates (years) - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?dtf=1625&dtt=1700&rsf=Galvin
Full dates - http://www.connectedhistories.org/Search_results.aspx?dtf=1812-02-14&dtt=1812-02-16&pc=London
A mixed channel - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?kw=polonium
An invalid request (generates error page) - http://ch1.shef.ac.uk/connectedhistories/CHSearch.jsp?kw=polonium&dtf=15-0000--00-00-00

Errors

Invalid use of codes or invalid data will usually result in an empty result set being returned. Where another error occurs, an error document is returned. Please contact us if you encounter unexpected errors.

Contact Us

The HRI developer currently responsible for maintaining the Connected Histories API is Katherine Rogers. Technical support may also be requested from hri-support@sheffield.ac.uk. Further information on HRI Digital can be found at http://hridigital.shef.ac.uk/.