NAME
ref-cache - CRAM reference caching proxy

SYNOPSIS
ref-cache [-bLUv] [-l LOG_DIR] [-u URL] -d CACHE_DIR -p PORT

DESCRIPTION
ref-cache is a caching proxy for reference sequences, for use when encoding and decoding CRAM format sequence alignment files.

CRAM can use reference-based compression, where individual bases in aligned records are compared against a known reference sequence and only the bases that differ are stored. This gives better compression, but requires the reference sequence to be supplied from an external source. One way to get these sequences is by querying a server implementing the GA4GH refget standard <https://ga4gh.github.io/refget/>. However, this can lead to excessive network traffic and server load if, as is often the case, the same reference is needed more than once.

ref-cache makes reference handling easier by keeping copies of downloaded files, allowing them to be reused when they are needed again. As it has been specifically designed to serve reference sequences for CRAM encoders and decoders, ref-cache behaves rather differently to general-purpose caching web proxies:

QUICK-START GUIDE
Create directories for the cache and (optionally) log files. Then start up the server in the background, listening on port 8080 and with the EBI's CRAM reference server as the upstream source.
    mkdir cached_refs
    mkdir logs
    ref-cache -b -d cached_refs -l logs -p 8080 -u https://www.ebi.ac.uk/ena/cram/md5/

To make SAMtools and HTSlib use the server, set its URL in the REF_PATH environment variable (note that colons should be doubled up in the URL, and you should substitute the hostname of your actual server).
    REF_PATH='http:://myserver.example.com::8080/%s'
    export REF_PATH

If the cache directory can be made visible to SAMtools/HTSlib processes, it can also be added directly to REF_PATH by putting it before the web server URL. It is necessary to use the full path to the directory, followed by "/%2s/%2s/%s" for the file location due to the way they are stored inside the cache.
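The two REF_PATH conventions described so far (doubled colons in URLs, and the "/%2s/%2s/%s" directory layout) can be illustrated with a small shell sketch. It is not part of ref-cache: the function names are hypothetical, the checksum is a made-up placeholder, and it assumes that "%2s" consumes the next two characters of the MD5 checksum while "%s" takes whatever remains.

```shell
# Hypothetical helper: double every colon so that REF_PATH does not
# treat it as a separator between search-path entries.
escape_ref_path_colons() {
    printf '%s\n' "$1" | sed 's/:/::/g'
}

# Hypothetical helper: expand DIR/%2s/%2s/%s for one MD5 checksum,
# assuming %2s takes two characters and %s takes the remainder.
cache_file_for_md5() {
    dir=$1
    md5=$2
    first=$(printf '%s' "$md5" | cut -c1-2)
    second=$(printf '%s' "$md5" | cut -c3-4)
    rest=$(printf '%s' "$md5" | cut -c5-)
    printf '%s/%s/%s/%s\n' "$dir" "$first" "$second" "$rest"
}

# Placeholder values for illustration only.
escape_ref_path_colons 'http://myserver.example.com:8080/%s'
# prints http:://myserver.example.com::8080/%s

cache_file_for_md5 /path/to/cache 0123456789abcdef0123456789abcdef
# prints /path/to/cache/01/23/456789abcdef0123456789abcdef
```

Under these assumptions, a process that can see the cache directory would look for the second path on disk before falling back to the web server URL.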
    REF_PATH='/path/to/cache/%2s/%2s/%s:http:://myserver.example.com::8080/%s'
    export REF_PATH

This is useful as accessing the files directly is more efficient than using HTTP. Files are downloaded to a temporary name and then renamed after validation, so processes directly using the cache will never try to use a partly downloaded file. With the URL at the end, the web server picks up any requests for references not already in the cache, downloads them, provides them to the requester, and stores them in the cache.

OPTIONS

CLIENT ADDRESS CHECKING
ref-cache is designed to serve references to local networks. To ensure that it only responds to the desired clients, it has an allow list of address ranges that it will talk to. If a connection attempt comes from an IP address not in the allowed set, it will be closed immediately. (N.B.: Rejected clients will see a connection open and immediately close, as it's necessary for connections to be opened for the server to discover the peer address. If you want to drop or reject unwanted requests without opening them, you will need to use your operating system's firewall.)

The address ranges can be set using the -m option, which may be used more than once. Networks can be specified either as a comma-separated list of CIDR-format blocks (e.g. 192.0.2.0/24, 2001:db8::/32) or using one of the following synonyms:

If no -m option is given, the "default" list will be used, as most organisations will be using one or more of these internally. This will be overridden if any -m option appears, in which case -m default will need to be specified explicitly if you also want to reply to addresses in the IPv4 and IPv6 private ranges. For example:
    ref-cache -m 192.0.2.0/24 -m default ...
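To make the idea of matching a client against a CIDR block concrete, here is a minimal shell sketch of an IPv4-only membership test. It only illustrates how CIDR matching works in general and is not how ref-cache itself is implemented. The function names are hypothetical, addresses are assumed to be plain dotted-quad values without leading zeros, and the shell is assumed to use 64-bit arithmetic.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ipv4_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Hypothetical helper: succeed if address $1 lies inside CIDR block $2.
ipv4_in_cidr() {
    addr=$1
    net=${2%/*}        # network part, e.g. 192.0.2.0
    bits=${2#*/}       # prefix length, e.g. 24
    # Build the netmask: the top $bits bits set, the rest clear.
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ipv4_to_int "$addr") & $mask )) -eq $(( $(ipv4_to_int "$net") & $mask )) ]
}

# Example addresses from the documentation ranges.
if ipv4_in_cidr 192.0.2.17 192.0.2.0/24; then
    echo "192.0.2.17 would match 192.0.2.0/24"
fi
if ! ipv4_in_cidr 198.51.100.1 192.0.2.0/24; then
    echo "198.51.100.1 would not match 192.0.2.0/24"
fi
```

An address is accepted when its leading prefix-length bits equal those of the network address, which is why a single -m entry can cover a whole subnet.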
ref-cache will always listen to the loop-back address, even if this was not specified. Using -m localhost will limit it to only respond to loop-back requests.

AUTHOR
Written by Rob Davies from the Wellcome Sanger Institute.

SEE ALSO
samtools(1)

Samtools website: <http://www.htslib.org/>
CRAM specification: <https://samtools.github.io/hts-specs/CRAMv3.pdf>
Refget website: <https://ga4gh.github.io/refget/>