In addition to command line usage, documented in the section Command-line interface manual, PubFetcher can be used as a library. This section is a short overview of the public interface of the source code that constitutes PubFetcher. Documentation in the code itself is currently sparse.
BasicArgs is the abstract base class of FetcherArgs, FetcherPrivateArgs and the other command line argument classes in “org.edamontology” packages that use JCommander for command line argument parsing and Log4J2 for logging. It provides the --log keys and functionality.
FetcherArgs and FetcherPrivateArgs are classes encapsulating the parameters described in Fetching and Fetching private. Arg and Args are used to store the properties of each parameter, such as its default value or description string. This is useful in EDAMmap, where parameters, including fetching parameters, are displayed to the user and can be changed by the user.
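The idea of storing per-parameter properties can be illustrated with a minimal sketch. The class ParamSketch and the parameter name "exampleTimeout" below are hypothetical stand-ins invented for this illustration. They are not PubFetcher's Arg or Args classes, whose actual fields and methods should be looked up in the source code.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration only; NOT PubFetcher's Arg class.
// It keeps the metadata of one parameter next to its current value.
class ParamSketch<T> {
    final String id;           // name used on the command line
    final String description;  // help text that can be shown to the user
    final T defaultValue;      // value used when the user supplies nothing
    T value;                   // current, possibly user-modified, value

    ParamSketch(String id, String description, T defaultValue) {
        this.id = id;
        this.description = description;
        this.defaultValue = defaultValue;
        this.value = defaultValue;
    }

    // True if the user has not changed the parameter.
    boolean isDefault() {
        return Objects.equals(value, defaultValue);
    }

    // One line of text suitable for displaying the parameter in a UI or help page.
    String describe() {
        return "--" + id + " (default: " + defaultValue + "): " + description;
    }

    public static void main(String[] args) {
        ParamSketch<Integer> timeout =
            new ParamSketch<>("exampleTimeout", "Example timeout in milliseconds", 1000);
        System.out.println(timeout.describe());
        timeout.value = 2000;
        System.out.println("Still default? " + timeout.isDefault());
    }
}
```

Keeping the default value and description together with the current value lets a front end list all parameters and show which ones differ from their defaults, without hard-coding that information elsewhere.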
IllegalRequestException is a custom Java runtime exception thrown if there are problems with the user’s request. The exception message can be output back to the user, for example over a web API.
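As a minimal sketch of how such an exception message might be relayed to a user, the following self-contained example defines its own stand-in IllegalRequestException so that it compiles without PubFetcher on the classpath. The validation rule, the response map and the status codes are assumptions made for this illustration and do not describe how PubFetcher or any web API built on it actually behaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in so the sketch compiles on its own; with PubFetcher on the
// classpath the library's own IllegalRequestException would be used instead.
class IllegalRequestException extends RuntimeException {
    IllegalRequestException(String message) {
        super(message);
    }
}

class RequestHandlerSketch {
    // Hypothetical validation step: accepts only a non-negative integer.
    static int parseTimeout(String value) {
        int timeout;
        try {
            timeout = Integer.parseInt(value);
        } catch (NumberFormatException e) {
            throw new IllegalRequestException("Timeout must be an integer: " + value);
        }
        if (timeout < 0) {
            throw new IllegalRequestException("Timeout must not be negative: " + timeout);
        }
        return timeout;
    }

    // Builds a simple response map, as a web API layer might do.
    // The exception message is safe to show because it describes the user's own input.
    static Map<String, String> handle(String timeoutParam) {
        Map<String, String> response = new LinkedHashMap<>();
        try {
            int timeout = parseTimeout(timeoutParam);
            response.put("status", "200");
            response.put("timeout", String.valueOf(timeout));
        } catch (IllegalRequestException e) {
            response.put("status", "400");
            response.put("error", e.getMessage());
        }
        return response;
    }

    public static void main(String[] args) {
        System.out.println(handle("15000"));
        System.out.println(handle("-1"));
    }
}
```

Catching only IllegalRequestException, rather than every RuntimeException, keeps messages about internal failures from being exposed to the user.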
The main class of interest for a potential library user is, however, PubFetcher. This class contains most of the public methods making up the PubFetcher API. Currently, it is also the only class documented using Javadoc. Some of its methods (those described in Publication IDs and Miscellaneous) can also be called from PubFetcher-CLI.
Package pubfetcher.core.db (and subpackages)
The Database class can be used to initialise a database file, to put content into it, to get or remove content from it, to list the IDs it contains or check whether it contains a given ID, and to compact it. The class abstracts away the currently used underlying database system (MapDB). The structure of the database is described in the Database section of the output documentation. Some methods can be called from PubFetcher-CLI; these are described in the corresponding Database section.
DatabaseEntry is the base class for Publication and Webpage. It contains the methods “canFetch” and “updateCounters” whose logic is explained in Can fetch. DatabaseEntryType specifies whether a given DatabaseEntry is a publication, webpage or doc.
Publication, Webpage and most other classes in the “pubfetcher.core.db” packages are the entities stored in the database. These classes contain methods to get and set the value of their fields and methods to output content fields in plain text, HTML or JSON, with or without metadata fields. Their structure is explained in Contents.
Fetcher contains the public method “getDoc”, which is described in Getting a HTML document. The “getDoc” method, but also the “getWebpage” method and the “updateCitationsCount” method can be called from PubFetcher-CLI as seen in Print a web page and Update citations count.
The Fetcher methods “initPublication” and “initWebpage” must be used to construct a Publication and a Webpage. The Fetcher methods “getPublication” and “getWebpage” can then be used to fetch the constructed Publication and Webpage. When possible, however, the “getPublication”, “getWebpage” and “getDoc” methods of class PubFetcher should be used instead of these “init” and “get” methods.
Classes in this package deal with scraping, as explained in the Scraping rules section.
The command line interface of PubFetcher, that is PubFetcher-CLI, is implemented in package “pubfetcher.cli”. Its usage is the topic of the first section Command-line interface manual.
The functionality of PubFetcher-CLI can be extended by implementing new operations in a new command line tool, where the public “run” method of the PubFetcherMethods class can then be called to pull in all the functionality of PubFetcher-CLI. One of the main reasons to do this is to implement some new way of getting publication IDs and webpage/doc URLs. These IDs and URLs can then be passed to the “run” method of PubFetcherMethods as the lists “externalPublicationIds”, “externalWebpageUrls” and “externalDocUrls”. One example of such functionality extension is the EDAMmap-Util tool (see its UtilMain class).
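The extension pattern described above can be sketched as follows. This is only an illustration of the flow, under stated assumptions: the RunSketch interface is a hypothetical stand-in for the call into the “run” method of PubFetcherMethods, whose real parameter list is not reproduced here. The "pub"/"web"/"doc" line format and the use of plain strings for publication IDs are likewise invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the call into PubFetcherMethods.run; the real
// method's parameter list is not reproduced here and should be checked in the source.
interface RunSketch {
    void run(List<String> externalPublicationIds,
             List<String> externalWebpageUrls,
             List<String> externalDocUrls);
}

class ExtensionToolSketch {
    final List<String> externalPublicationIds = new ArrayList<>();
    final List<String> externalWebpageUrls = new ArrayList<>();
    final List<String> externalDocUrls = new ArrayList<>();

    // The "new way of getting IDs and URLs": here, an invented line format
    // with one "pub <id>", "web <url>" or "doc <url>" record per line.
    void collect(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            switch (parts[0]) {
                case "pub": externalPublicationIds.add(parts[1]); break;
                case "web": externalWebpageUrls.add(parts[1]); break;
                case "doc": externalDocUrls.add(parts[1]); break;
                default: break; // unknown record type is ignored
            }
        }
    }

    // Hands the three collected lists over to the existing CLI functionality.
    void runWith(RunSketch runner) {
        runner.run(externalPublicationIds, externalWebpageUrls, externalDocUrls);
    }

    public static void main(String[] args) {
        ExtensionToolSketch tool = new ExtensionToolSketch();
        tool.collect(Arrays.asList(
            "pub 10.1000/example",
            "web https://example.org/tool",
            "doc https://example.org/docs"));
        tool.runWith((pubs, webs, docs) -> System.out.println(
            pubs.size() + " publication IDs, "
            + webs.size() + " webpage URLs, "
            + docs.size() + " doc URLs"));
    }
}
```

In a real tool, the lambda passed to runWith would be replaced by the actual call to the “run” method of PubFetcherMethods, so that the new tool only has to implement the collection step.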