2 Technical terms
2.1 abstract
An abstract is the short summary at the beginning of scientific publications, e.g. journal articles, theses or conference papers. The abstract is among the most important parts of a publication as it is usually the only part of the text which is freely available and can be retrieved from bibliographic databases. Title and abstract are the key elements of a textword search. (Cals and Kotz (2013), Pitkin and Branagan (1998))
2.2 ambiguity
A term or statement which has more than one possible meaning or definition is ambiguous. Ambiguity is an obstacle when creating search strategies, because any free-text search with an ambiguous term will inevitably retrieve records for all meanings of the term, regardless of the context. In such cases, the term can be made more specific by using phrases or proximity operators.
- The acronym CVD is a very common abbreviation for cardiovascular disease. However, it can also mean cerebrovascular disease, chronic venous disease, color vision deficiency or chemical vapour deposition.
- The term pharmacy may refer to the pharmaceutical sciences, the manufacture of drugs or a pharmacy as a retail shop.
- In anatomy, a styloid process is a pointed outgrowth from a bone. However, there are several of these in the human body, for instance on the temporal bone, the radius, the ulna or the metacarpal bones.
- The search term crab might lead to articles about two very different kinds of arthropods: true crabs (Brachyura) or crab lice (Phthirus).
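The crab example can be made concrete with a minimal Python sketch. A plain free-text match for crab retrieves both records below, whereas requiring crab to occur within one word of lice keeps only the record about crab lice. The two titles are hypothetical, and `free_text` and `near` are illustrative helpers, not the operators of any real database; actual proximity operators and the way they count distances differ between search interfaces.

```python
import re

# Two hypothetical record titles; only the first one is about crab lice.
RECORDS = {
    1: "Crab lice infestation of the eyelashes",
    2: "Larval development of the crab Carcinus maenas",
}


def tokens(text: str) -> list[str]:
    """Split a text into lower-case words."""
    return re.findall(r"\w+", text.lower())


def free_text(term: str, text: str) -> bool:
    """True if the term occurs anywhere in the text, regardless of context."""
    return term.lower() in tokens(text)


def near(a: str, b: str, text: str, n: int) -> bool:
    """True if a and b occur at most n word positions apart, in either order."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


print([rid for rid, title in RECORDS.items() if free_text("crab", title)])         # [1, 2]
print([rid for rid, title in RECORDS.items() if near("crab", "lice", title, 1)])   # [1]
```

A phrase search such as "crab lice" is the strictest case of the same idea: both words adjacent and in a fixed order.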
2.3 appendix
An appendix provides additional content to a publication and is often called supplementary material or supplementary information. Usually any content which goes beyond the constraints of the publication is provided there. Examples are the full search strategies of systematic literature searches, detailed descriptions of methods or measurements.
It is also possible to publish such supplementary information or research data independently in online repositories.
2.5 bias
Bias is a systematic deviation of results from the true values that may lead to an under- or overestimation of intervention effects. As a result, bias may lead to conclusions which do not accurately represent the truth. There are various types and sources of bias, as well as methods to avoid some forms of it, such as randomization or blinding of participants in clinical trials.
Systematic literature searching is a means to reduce bias. Systematic reviews strive to minimise these deviations by assessing the risk of bias in the results of included studies. (Braun et al. (2021), Boutron et al. (2022), Gough, Oliver, and Thomas (2017))
2.5.1 selection bias
Systematic differences between comparison groups lead to a deviation from the true effect of an intervention. This so-called selection bias can be prevented by measures such as adequate randomization and allocation concealment when assigning trial participants to the different study arms. (Gough, Oliver, and Thomas (2017), Odgaard‐Jensen et al. (2011))
2.5.2 publication bias
Studies with results perceived as “positive” are more likely to be published than studies with results perceived as “negative”, which gives positive results more weight in the published literature. (Gough, Oliver, and Thomas (2017))
2.5.3 language of publication bias
In systematic literature searches, it is often tempting to restrict the search to languages which the screeners and researchers easily understand. Not only does this practice keep the number of records at a more manageable level, it also makes the translation of foreign-language publications unnecessary. However, this so-called language of publication bias can have a significant impact on the quality of the review. (Gough, Oliver, and Thomas (2017), Morrison et al. (2012), Moher et al. (2003))
2.6 bibliographic database
Bibliographic databases comprise references to publications, such as articles in peer-reviewed journals, reports, patents, book chapters or conference proceedings.
As opposed to full-text databases, bibliographic databases only provide bibliographic information or metadata. Bibliographic records typically include title, abstract, author(s), publication year, journal name, and the DOI or other persistent identifiers.
References in bibliographic databases are often indexed with subject headings to facilitate the retrieval of relevant records, for instance during a systematic literature search.
2.7 classification metrics
In information retrieval the performance of a systematic search, a search strategy or a search filter is described by certain metrics for binary classification tasks, such as accuracy, precision, sensitivity and specificity.
A classifier (the search) makes a prediction about the condition of a record (by retrieving or not retrieving the record). The classification (the search result) is evaluated by comparing the prediction with the actual condition (the relevance of the records).
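In other words, the literature search is supposed to retrieve mostly relevant references and to ignore non-relevant ones. A retrieved relevant record is a true positive (tp), whereas a retrieved non-relevant record is a false positive (fp). Likewise, a relevant record that is not retrieved is a false negative (fn), and a non-relevant record that is not retrieved is a true negative (tn). See Table 2.1 for reference.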
record | relevant | irrelevant |
---|---|---|
retrieved | tp | fp |
not retrieved | fn | tn |
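The four cells of Table 2.1 can be counted with simple set operations once it is known which records are relevant (in an evaluation, for example, from a reference set of records judged relevant during screening). The following is a minimal sketch; the function name and the record IDs are hypothetical.

```python
from typing import NamedTuple


class ConfusionMatrix(NamedTuple):
    tp: int  # relevant and retrieved
    fp: int  # not relevant, but retrieved
    fn: int  # relevant, but not retrieved
    tn: int  # neither relevant nor retrieved


def confusion_matrix(all_records: set, retrieved: set, relevant: set) -> ConfusionMatrix:
    """Count the cells of Table 2.1 from sets of record IDs."""
    return ConfusionMatrix(
        tp=len(retrieved & relevant),
        fp=len(retrieved - relevant),
        fn=len(relevant - retrieved),
        tn=len(all_records - retrieved - relevant),
    )


# Hypothetical database of 10 records with the IDs 1 to 10.
cm = confusion_matrix(set(range(1, 11)), retrieved={1, 2, 3, 4}, relevant={2, 3, 5})
print(cm)  # ConfusionMatrix(tp=2, fp=2, fn=1, tn=5)
```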
2.7.1 accuracy
Accuracy, also called fraction correct (FC), is a statistical measure of how well a binary classifier makes correct (“true”) classifications, whether positive or negative. It is defined as the ratio of all true classifications (true positives and true negatives) to the total number of classifications. (Haynes and Wilczynski (2004), Lefebvre et al. (2017), Fawcett (2006))
It can be calculated according to the following equation: \[ \text{accuracy} = \tfrac{tp + tn}{tp + fp + fn + tn} \]
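As a worked example with hypothetical numbers: a database contains 10,000 records, 50 of which are relevant, and a search retrieves 200 records, 40 of which are relevant. This gives tp = 40, fp = 160, fn = 10 and tn = 9790:

\[ \text{accuracy} = \tfrac{40 + 9790}{40 + 160 + 10 + 9790} = \tfrac{9830}{10000} = 0.983 \]

Although the search misses one in five relevant records, its accuracy is very high, because the true negatives dominate when relevant records are rare. Accuracy alone therefore says little about the quality of a literature search.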
2.7.2 precision
Precision, also called positive predictive value (PPV), is a performance metric for the retrieval of information. It is the fraction of relevant records among all retrieved records, which can be written as:
\[ \text{precision} = \tfrac{tp}{tp + fp} \]
A high-precision search tries to retrieve as few non-relevant records as possible, usually at the cost of missing some relevant records.
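For example, if a search retrieves 200 records of which 40 are relevant (hypothetical numbers), then tp = 40 and fp = 160:

\[ \text{precision} = \tfrac{40}{40 + 160} = \tfrac{40}{200} = 0.2 \]

Only one in five retrieved records is relevant, so 200 records have to be screened to find 40 relevant ones.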
2.7.3 sensitivity
Sensitivity, also called true positive rate (TPR), recall or hit rate, is a performance metric for the retrieval of information, like precision. It equals the probability with which relevant records are correctly identified. (See Haynes and Wilczynski (2004), Lefebvre et al. (2017).)
It is the ratio of relevant retrieved records to all relevant records:
\[ \text{sensitivity} = \tfrac{tp}{tp + fn} \]
The idea of a sensitive search is to retrieve as many relevant records as possible, which usually means retrieving more non-relevant records in the process.
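The three formulas of this section can be collected in a small Python sketch. The function names and the counts are illustrative assumptions (10,000 records, 50 relevant, 200 retrieved, 40 of them relevant); precision and sensitivity are undefined, and raise `ZeroDivisionError` here, if nothing was retrieved or nothing is relevant.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Share of retrieved records that are relevant."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of relevant records that were retrieved."""
    return tp / (tp + fn)


# Hypothetical counts for a single search.
tp, fp, fn, tn = 40, 160, 10, 9790
print(accuracy(tp, fp, fn, tn))  # 0.983
print(precision(tp, fp))         # 0.2
print(sensitivity(tp, fn))       # 0.8
```

The example shows the typical trade-off of a sensitive search: four out of five relevant records are found, but only one in five retrieved records is relevant.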
2.8 data field
A data field (also called field or column) is a set of values of a particular data type within a database. For instance, the data fields for the author or the title contain text strings, whereas the fields for the issue, volume or PubMed ID contain numerical values.
Data fields are designated by field codes or search field tags, such as PMID, AU, TI and AB for the unique identifier, author, title and abstract fields.
PMID | AU | TI |
---|---|---|
7616995 | J. P. Kassirer, M. Angell | Redundant publication: a reminder |
16040884 | A. K. Akobeng | Principles of evidence based medicine |
22071866 | T. Young, S. Hopewell | Methods for obtaining unpublished data |
Which fields can be searched, and with which field codes, always depends on the syntax of the database or search interface.
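To illustrate what restricting a search to a data field means, here is a minimal Python sketch over the three records of Table 2.2. Each record is a dictionary keyed by field code; `field_search` is a hypothetical helper using simple case-insensitive substring matching, not the matching rules of PubMed or any other database.

```python
# The records of Table 2.2, keyed by field code.
RECORDS = [
    {"PMID": 7616995, "AU": ["J. P. Kassirer", "M. Angell"],
     "TI": "Redundant publication: a reminder"},
    {"PMID": 16040884, "AU": ["A. K. Akobeng"],
     "TI": "Principles of evidence based medicine"},
    {"PMID": 22071866, "AU": ["T. Young", "S. Hopewell"],
     "TI": "Methods for obtaining unpublished data"},
]


def field_search(records: list[dict], field: str, term: str) -> list[int]:
    """Return the PMIDs of records whose given field contains the term."""
    hits = []
    for record in records:
        value = record[field]
        values = value if isinstance(value, list) else [value]  # AU holds a list
        if any(term.lower() in str(v).lower() for v in values):
            hits.append(record["PMID"])
    return hits


print(field_search(RECORDS, "TI", "evidence"))  # [16040884]
print(field_search(RECORDS, "AU", "Angell"))    # [7616995]
```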
2.9 dataset
Datasets or records of a database are collections of data. In a tabular database they correspond to the rows of the table, as shown in Table 2.2. A record in such a database consists of values for the given columns or data fields of the table.
Records of bibliographic databases are called references. These contain the metadata referring to a publication, such as title, abstract, authors, journal name or publication date.
Datasets within clinical trials registries contain metadata for clinical trials, such as registration number, study type, research institution, study status, etc.
Unlike records in bibliographic databases, records in full-text databases also include the full-text document.
2.10 digital object identifier
A digital object identifier (DOI) is a persistent identifier governed by the International DOI Foundation. It is used to uniquely identify publications and other digital objects.
DOIs take the form of character strings which consist of a prefix and a suffix, separated by a slash (/). The prefix identifies the registrant of the DOI (usually the publisher of an article) and takes the form 10.xxxx, where xxxx is a number greater than or equal to 1000. The prefix and the slash are followed by the suffix, which is chosen by the registrant for the particular digital object.
DOIs can be resolved using the website of the International DOI Foundation or the Handle.Net Registry. For example, doi:10.1000/182 can be resolved via https://doi.org/10.1000/182 or https://hdl.handle.net/10.1000/182 and leads to the DOI Handbook.
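The structure described above can be checked with a short Python sketch that splits a DOI at the first slash, validates the prefix and builds the two resolver links. It is a simplified sketch under stated assumptions: the names `parse_doi` and `resolver_urls` are hypothetical, registrant subdivisions in the prefix are not handled, and special characters in the suffix are not URL-encoded.

```python
import re
from typing import NamedTuple


class DOI(NamedTuple):
    prefix: str  # identifies the registrant, e.g. "10.1000"
    suffix: str  # chosen by the registrant, e.g. "182"


# "10." followed by the registrant number (subdivisions are not handled here).
PREFIX_RE = re.compile(r"10\.(\d+)")


def parse_doi(text: str) -> DOI:
    """Split a DOI into prefix and suffix at the first slash."""
    doi = text.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, slash, suffix = doi.partition("/")
    match = PREFIX_RE.fullmatch(prefix)
    if not slash or not suffix or match is None or int(match.group(1)) < 1000:
        raise ValueError(f"not a valid DOI: {text!r}")
    return DOI(prefix, suffix)


def resolver_urls(doi: DOI) -> list[str]:
    """Build links for the DOI and Handle.Net resolvers."""
    name = f"{doi.prefix}/{doi.suffix}"
    return [f"https://doi.org/{name}", f"https://hdl.handle.net/{name}"]


doi = parse_doi("doi:10.1000/182")
print(doi)                 # DOI(prefix='10.1000', suffix='182')
print(resolver_urls(doi))  # ['https://doi.org/10.1000/182', 'https://hdl.handle.net/10.1000/182']
```

Splitting only at the first slash matters, because suffixes may themselves contain slashes.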
2.11 eligibility criteria
An important step at the beginning of any review project is to define eligibility criteria based on the research question. There are two types of criteria:
Inclusion criteria must be met by studies in order for them to be included in the review. In contrast, if a study meets one or more of the exclusion criteria, it will be excluded from the review. See also McKenzie et al. (2023), Gough, Oliver, and Thomas (2017).
- Inclusion criteria:
  - adults (18 years or older)
  - patients with chronic non-cancer pain
  - randomized controlled trials
- Exclusion criteria:
  - acute pain, post-surgical pain
  - chronic cancer pain
  - pregnancy
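The decision rule behind these criteria (every inclusion criterion met, no exclusion criterion applies) can be sketched in Python. This is an illustration only: the way a study is coded (`min_age`, `pain`, `design`, `pregnancy`) is a hypothetical scheme, and real screening is a human judgement rather than a lookup.

```python
from typing import Callable

# A study is described by a few coded attributes (hypothetical coding scheme).
Criterion = Callable[[dict], bool]

INCLUSION: dict[str, Criterion] = {
    "adults (18 years or older)": lambda s: s["min_age"] >= 18,
    "chronic non-cancer pain": lambda s: s["pain"] == "chronic non-cancer",
    "randomized controlled trial": lambda s: s["design"] == "RCT",
}

EXCLUSION: dict[str, Criterion] = {
    "acute pain, post-surgical pain": lambda s: s["pain"] in {"acute", "post-surgical"},
    "chronic cancer pain": lambda s: s["pain"] == "chronic cancer",
    "pregnancy": lambda s: s["pregnancy"],
}


def screen(study: dict) -> tuple[bool, list[str]]:
    """Eligible only if every inclusion criterion is met and no exclusion criterion applies."""
    reasons = [f"inclusion criterion not met: {name}"
               for name, is_met in INCLUSION.items() if not is_met(study)]
    reasons += [f"exclusion criterion met: {name}"
                for name, applies in EXCLUSION.items() if applies(study)]
    return (not reasons, reasons)


print(screen({"min_age": 18, "pain": "chronic non-cancer", "design": "RCT", "pregnancy": False}))
# (True, [])
print(screen({"min_age": 18, "pain": "chronic non-cancer", "design": "RCT", "pregnancy": True}))
# (False, ['exclusion criterion met: pregnancy'])
```

Returning the reasons alongside the decision mirrors the common practice of documenting why a study was excluded.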
2.12 evidence
Scientific evidence or simply evidence is information obtained by conducting experiments or by analyzing empirical data in accordance with the scientific method. Scientific evidence is used to support or disprove scientific hypotheses and in consequence to inform evidence-based decision-making.
2.13 evidence-based medicine
Evidence-based medicine (EBM) is commonly understood as the “conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research.” (Sackett et al. (1996))
2.13.1 types of evidence
Different types of research provide different levels of evidence. This is often represented in the form of a so-called evidence pyramid (see Figure 2.2).
The pyramid shape reflects that a large body of fundamental research and personal experience forms the foundation for scientific studies with increasing levels of evidence. The very top of the pyramid encompasses various types of evidence synthesis, i.e. various kinds of (systematic) reviews (see Section 5.1) which summarize primary studies.
2.14 FAIR
FAIR is an acronym for the four principles Findability, Accessibility, Interoperability and Reusability, which serve as a guideline for the management of scientific or scholarly data. A consortium of scientists and organizations defined these FAIR principles in order to advance the machine-actionability of data. (Wilkinson et al. (2016))
Data that meet these principles are called FAIR Data.
Principle | Description |
---|---|
Findability | Data should be provided with sufficient metadata, such as title, authors, summary, and information about the origin of the data. Moreover, a globally unique and persistent identifier, such as a digital object identifier (DOI), should be assigned to the data. |
Accessibility | Data and their metadata should be made long-term accessible through standardized communication protocols, such as HTTPS. |
Interoperability | Controlled vocabulary (thesauri) such as Medical Subject Headings (MeSH) or formats for metadata such as XML, which can be read by both humans and machines, allow for the creation of interoperable metadata and links between datasets. |
Reusability | Reusability depends on the quality of the metadata, a proper description of the provenance of the research data and its citability (for instance by using DOIs) under clearly stated license conditions. |
The GO FAIR Initiative provides additional information and a platform for anyone interested in applying the FAIR principles.
DataCite is an international non-profit organization which strives to facilitate access to research data on the internet, to increase the acceptance of research data as citable contributions to the scholarly record and to support data archiving for the purposes of reuse and verification.
A factsheet about the FAIR Principles, published by the Association of European Research Libraries LIBER, highlights the role of the libraries in their implementation.
2.15 full-text
The complete texts of publications (e.g. articles, books, chapters, reports, …) are called full-texts.
Most of the available databases for literature searching are bibliographic databases, which means they do not contain the full-text, but bibliographic references to the publications, in most cases including a link to the publisher, where the full-text can be obtained.
Open Access publications can be accessed freely, whereas non-open access publications usually require a paid subscription or access fee. Alternatively, they can be ordered using the document delivery service of a university library.
There are tools that can help to identify the shortest path to the full-text:
Reference management programs, such as EndNote, also often provide ways to retrieve available full-texts for the managed references. Sometimes this automated retrieval fails due to incompatibilities or publisher security measures, even when the full-text would otherwise be accessible.
2.16 indexing
Indexing is the process in which index terms are assigned to records. This is done by the database provider in order to indicate what the referenced document is about, independent of its explicit title or abstract. In other words, an indexed record can be retrieved systematically based on its contextual meaning and implicit contents, rather than by searching verbatim expressions in the text.
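A minimal Python sketch of the difference between a text-word search and a search for an index term. Both records are hypothetical: the second title never uses the word "hypertension", but an indexer has assigned the same index term to both records, so only the index-term search retrieves it. The helper names are illustrative.

```python
# Hypothetical records with index terms assigned by the database provider.
RECORDS = [
    {"id": 1, "title": "Hypertension in older adults",
     "index_terms": {"Hypertension", "Aged"}},
    {"id": 2, "title": "High blood pressure and dietary salt intake",
     "index_terms": {"Hypertension", "Sodium Chloride, Dietary"}},
]


def text_word_search(word: str) -> list[int]:
    """Match the word verbatim in the title."""
    return [r["id"] for r in RECORDS if word.lower() in r["title"].lower().split()]


def index_term_search(term: str) -> list[int]:
    """Match the assigned index terms instead of the wording of the text."""
    return [r["id"] for r in RECORDS if term in r["index_terms"]]


print(text_word_search("hypertension"))   # [1]
print(index_term_search("Hypertension"))  # [1, 2]
```

In practice, index terms and text words complement each other, since not every record is indexed (yet) and indexing is not always consistent.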
2.17 orphan line
All parts of a search strategy are supposed to contribute to the overall result of the database search. If a line of a search strategy is not combined (using operators) with the rest of the search strategy, it is called an orphan line. In the following example in Ovid syntax, line 5 is an orphan line, because none of the subsequent lines refers to it.
1 hypertension/
2 (hypertension or high blood pressure).ti,ab.
3 *patient attitude/
4 *patient satisfaction/
5 (choice$ or empower$).ti.
6 1 or 2
7 3 or 4
8 6 and 7
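Orphan lines can be found mechanically by collecting all line numbers that are referenced by combination lines. The following Python sketch assumes that combination lines consist only of line numbers, Boolean operators and parentheses (as in lines 6 to 8 above); range shortcuts such as Ovid's or/1-4 are not handled, and the last line is treated as the final result.

```python
import re

STRATEGY = """\
1 hypertension/
2 (hypertension or high blood pressure).ti,ab.
3 *patient attitude/
4 *patient satisfaction/
5 (choice$ or empower$).ti.
6 1 or 2
7 3 or 4
8 6 and 7"""


def parse(strategy: str) -> dict[int, str]:
    """Map each line number to its query."""
    lines = {}
    for raw in strategy.splitlines():
        number, query = raw.split(maxsplit=1)
        lines[int(number)] = query
    return lines


def is_combination(query: str) -> bool:
    """True if the query only combines line numbers with and/or/not and parentheses."""
    stripped = re.sub(r"\b(and|or|not)\b|[()\s]", "", query, flags=re.IGNORECASE)
    return stripped.isdigit()


def orphan_lines(strategy: str) -> list[int]:
    """Return lines that are neither the final line nor used by any combination line."""
    lines = parse(strategy)
    referenced = set()
    for query in lines.values():
        if is_combination(query):
            referenced.update(int(n) for n in re.findall(r"\d+", query))
    final = max(lines)
    return sorted(n for n in lines if n != final and n not in referenced)


print(orphan_lines(STRATEGY))  # [5]
```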
2.18 PMID and PMCID
The PubMed ID (PMID) and the PubMed Central ID (PMCID) are unique identifiers assigned to the records within the databases PubMed and PubMed Central. They are similar to the digital object identifier (DOI).
PMIDs are unique integer values, e.g. 32256971. PMCIDs are composed of the prefix PMC followed by a sequence of digits, e.g. PMC7106990.
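Because the two identifiers have different formats, they can be told apart with two regular expressions. This Python sketch only checks the format described above (it assumes PMIDs have no leading zeros) and cannot tell whether a record with that identifier actually exists; `identifier_type` is a hypothetical helper.

```python
import re
from typing import Optional

PMID_RE = re.compile(r"[1-9]\d*")  # positive integer without leading zeros (assumption)
PMCID_RE = re.compile(r"PMC\d+")   # prefix "PMC" followed by digits


def identifier_type(value: str) -> Optional[str]:
    """Return "PMID", "PMCID" or None, based only on the format of the value."""
    value = value.strip()
    if PMID_RE.fullmatch(value):
        return "PMID"
    if PMCID_RE.fullmatch(value):
        return "PMCID"
    return None


print(identifier_type("32256971"))     # PMID
print(identifier_type("PMC7106990"))   # PMCID
print(identifier_type("10.1000/182"))  # None
```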
PubMed records can easily be found simply by entering their PMIDs as search terms into the PubMed search.
The National Library of Medicine provides a tool for converting PMIDs, PMCIDs and DOIs into one another. This tool only works for records that are part of both PubMed and PubMed Central.
2.19 retractions
The review of manuscripts by fellow scientists (peer review) as part of the publication process is supposed to protect the scientific community from fraud, detect errors or false conclusions and thereby uphold a certain level of quality. However, this is not always enough.
In cases where serious errors or even scientific misconduct are detected only after an article is published, it may get retracted. The retraction can be initiated by the authors themselves, their affiliated institutions or the journal editors. On the publisher’s website and within bibliographic databases, such an article usually gets flagged as being retracted and a retraction notice is published, which explains the cause for the retraction.
In bibliographic databases, such as PubMed, retracted publications and the retraction notices are separate records, which can be searched individually.
The blog Retraction Watch, run by the non-profit organization Center for Scientific Integrity, keeps an eye on current retractions and reports on developments in this area.
2.20 seed paper
A systematic literature search often begins with a quick scoping search for a handful of publications that answer the research question. These publications are called seed papers, key papers or core papers.
Apart from helping researchers to become familiar with the topic, seed papers can be used for:
- Extraction of index terms and free-text expressions.
- Citation searching.
- Testing search strategies (see Table 4.4).
Based on the experience of the author of this compendium, seed papers can be further divided into two categories:
- Type 1: Publications which answer the research question and might therefore be included in the review. They serve as templates for similar literature which the systematic search should pick up.
- Type 2: Publications which do not cover all aspects of the question, but provide background information or search terms for at least one of the main concepts. These do not qualify as included studies for the review project. Examples are review articles which are similar (but not identical) to what the researchers have in mind for the present project.
Often, researchers are not aware of this distinction and its implications for the systematic literature search.
Ideally, the systematic search should retrieve all type 1 seed papers, whereas type 2 papers may or may not be found by the search. Type 2 papers are useful for preparation, for citation searching (e.g. for relevant studies included in previous systematic reviews) or for learning important search terms from previous search strategies.
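Testing a search strategy against the type 1 seed papers amounts to a set comparison: every type 1 seed paper should be among the retrieved records. A minimal Python sketch, assuming both lists are available as sets of record IDs (e.g. PMIDs exported from the database); the IDs below are hypothetical.

```python
def check_seed_papers(retrieved: set[int], type1_seeds: set[int]) -> tuple[float, list[int]]:
    """Return the share of type 1 seed papers found by the search and the IDs it missed."""
    missed = sorted(type1_seeds - retrieved)
    found_share = (len(type1_seeds) - len(missed)) / len(type1_seeds)
    return found_share, missed


# Hypothetical record IDs.
retrieved = {101, 102, 205, 310}
type1_seeds = {101, 102, 103}
share, missed = check_seed_papers(retrieved, type1_seeds)
print(f"{share:.0%} of type 1 seed papers retrieved, missed: {missed}")
# 67% of type 1 seed papers retrieved, missed: [103]
```

A missed type 1 seed paper is a prompt to check which of its terms or index terms the search strategy lacks.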
2.21 syntax
As in programming, the syntax is the set of rules that applies to setting up search queries and building search strategies in a database. It defines the operators, field codes and special characters (such as wildcard symbols, parentheses, slashes, quotation marks, etc.) that are available within a particular search interface.
As a consequence, a search strategy cannot simply be reused in another database. It has to be translated to account for the different syntax and the different index terms. (See Clark et al. (2020), Glanville et al. (2019), Wanner and Baumann (2019), Damarell, Tieman, and Sladek (2013)).
Database / interface | Phrase search in title and abstract |
---|---|
PubMed | "ocular hypertension"[tiab] |
Embase | 'ocular hypertension':ti,ab |
Ovid | "ocular hypertension".ti,ab. |
Cochrane Library | "ocular hypertension":ti,ab |
Scopus | TITLE-ABS({ocular hypertension}) |
Web of Science | TI=("ocular hypertension") OR AB=("ocular hypertension") |
EBSCOhost | (TI "ocular hypertension") OR (AB "ocular hypertension") |
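For a single phrase restricted to title and abstract, the translations in the table follow fixed patterns, so they can be generated from templates. The Python sketch below copies the patterns from the table; it is deliberately limited. It does not escape quotation marks inside the phrase and it cannot translate index terms, which have to be mapped to each database's thesaurus by hand or with dedicated tools.

```python
from typing import Callable

# Phrase search in title and abstract, one template per interface (from the table above).
TEMPLATES: dict[str, Callable[[str], str]] = {
    "PubMed": lambda p: f'"{p}"[tiab]',
    "Embase": lambda p: f"'{p}':ti,ab",
    "Ovid": lambda p: f'"{p}".ti,ab.',
    "Cochrane Library": lambda p: f'"{p}":ti,ab',
    "Scopus": lambda p: f"TITLE-ABS({{{p}}})",  # doubled braces produce literal { }
    "Web of Science": lambda p: f'TI=("{p}") OR AB=("{p}")',
    "EBSCOhost": lambda p: f'(TI "{p}") OR (AB "{p}")',
}


def translate(phrase: str) -> dict[str, str]:
    """Render one title/abstract phrase search in each interface's syntax."""
    return {db: render(phrase) for db, render in TEMPLATES.items()}


for db, query in translate("ocular hypertension").items():
    print(f"{db}: {query}")
```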
2.22 thesaurus
A thesaurus (Ancient Greek: θησαυρός (thesaurós) ‘treasury’) is a dictionary of synonyms, which are often ordered alphabetically or hierarchically.
In the context of literature searching, a thesaurus contains a database-specific controlled vocabulary of index terms.
Database | Thesaurus |
---|---|
PubMed/MEDLINE | Medical Subject Headings (MeSH) |
Embase | Emtree |
Cochrane Library | Medical Subject Headings (MeSH) |
CINAHL | CINAHL Subject Headings |
APA PsycInfo | Thesaurus of Psychological Index Terms |
Global Health | CABI Thesaurus |
ERIC | ERIC Thesaurus |
These vocabularies usually list preferred terms for indexing, their definitions as well as lists of synonyms for each of those index terms. The index terms are arranged hierarchically, ranging from very broad categories to very specific terms.
- All MeSH Categories
  - Anatomy Category
    - Body Regions
      - Torso
        - Thorax
          - Thoracic Cavity
            - Pleural Cavity
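Because the index terms form a hierarchy, a search for a broad term can be extended to all narrower terms below it; many search interfaces offer this, often under the name "explode". The Python sketch below shows the idea on the excerpt above. The dictionary contains only this one branch (the real MeSH tree has many more terms under each category), and `with_narrower_terms` is a hypothetical helper.

```python
# One branch of the MeSH tree shown above, as parent -> list of narrower terms.
HIERARCHY: dict[str, list[str]] = {
    "All MeSH Categories": ["Anatomy Category"],
    "Anatomy Category": ["Body Regions"],
    "Body Regions": ["Torso"],
    "Torso": ["Thorax"],
    "Thorax": ["Thoracic Cavity"],
    "Thoracic Cavity": ["Pleural Cavity"],
}


def with_narrower_terms(term: str, hierarchy: dict[str, list[str]]) -> list[str]:
    """Return the term followed by all terms below it (depth-first)."""
    terms = [term]
    for child in hierarchy.get(term, []):
        terms.extend(with_narrower_terms(child, hierarchy))
    return terms


print(with_narrower_terms("Thorax", HIERARCHY))
# ['Thorax', 'Thoracic Cavity', 'Pleural Cavity']
```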
Controlled vocabularies are updated regularly: new index terms are introduced and hierarchies are rearranged (see, for example, What’s New in MeSH).
2.23 wildcard
Wildcards are special characters used for truncation. The usage and meaning of the available wildcards depend on the syntax of the database or search interface.
character | name |
---|---|
* | asterisk |
$ | dollar sign |
? | question mark |
# | hash sign, pound sign |
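To show what a truncated or masked term matches, a wildcard term can be converted into a regular expression. The meanings in the Python sketch are an assumption chosen for illustration (`*` and `$` for any number of characters, `?` for zero or one character, `#` for exactly one character); the actual meaning of each symbol must be checked in the help pages of the respective search interface.

```python
import re

# Assumed wildcard meanings for this sketch only; real interfaces differ.
WILDCARDS = {
    "*": r"\w*",  # any number of characters (truncation)
    "$": r"\w*",  # any number of characters (truncation)
    "?": r"\w?",  # zero or one character
    "#": r"\w",   # exactly one character
}


def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a wildcard term into a whole-word, case-insensitive regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in term]
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def matches(term: str, text: str) -> list[str]:
    """List the words in the text that the wildcard term would match."""
    return [m.group(0) for m in wildcard_to_regex(term).finditer(text)]


text = "Patient choice, choices and empowerment; women and a woman chose colour or color."
print(matches("choice$", text))  # ['choice', 'choices']
print(matches("empower$", text)) # ['empowerment']
print(matches("wom#n", text))    # ['women', 'woman']
print(matches("colo?r", text))   # ['colour', 'color']
```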