Storage¶
ARCHIVE¶
Archive data files within an NGAS node, calculating and storing the CRC of the archived file in the NGAS database.
After a successful archiving of a file, all archiving event handlers are invoked. Read Archiving events for more information about these events and how to handle them.
The ARCHIVE
command supports two modes of operation: pull and push.
A pull request is issued by using the GET
method,
and tells an NGAS node to fetch and archive a file based on a valid URI.
A push request, on the other hand, is issued by using the POST
method,
and requires the client to send the file contents as a byte stream to the NGAS server.
When incoming files matching an existing file ID in the NGAS DB, their contents are stored using a new version number if file versioning is on. If file versioning is off, they will overwrite an existing file’s contents instead. When overwriting, users can specify a specific file version to overwrite, which must exist locally in the server receiving the request.
Parameters
filename
: a valid URI i.e.file://, http://, ftp://
for Pull or filename ie.test.fits
for Push.mime_type
: describes the content-type of the file. If not given, NGAS tries to guess it based on the filename’s extension, and the internal mime-type information stored in the NGAS configuration.versioning
: used to switch the automatic versioning on (1
, the default behavior) or off (0
).no_versioning
: The inverse ofversioning
. This is kept for backwards compatibility. If both are specified,versioning
takes precedence.file_version
: specifies which file version to overwrite. Only taken into account whenversioning=0
/no_versioning=1
.crc_variant
: used to explicitly choose which CRC variant will be used to checksum the file, overriding the system-wide configuration. See CRC for details
Archive Pull Example
In this case the NGAS server will attempt to retrieve and archive the file remote.fits
from the remote http server:
curl http://<host>:<port>/ARCHIVE?filename=http://<remotehost>:<remoteport>/remote.fits
Archive Push Example
In this example it is expected that the client uploads the file content as a byte stream to the NGAS server:
curl -X POST -i -H "Content-Type: application/octet-stream" --data-binary "@/tmp/file.fits" http://<host>:<port>/ARCHIVE?filename=file.fits
QARCHIVE¶
Like ARCHIVE, but with the following differences:
- The target volume is selected at random from the available volumes in the server. This bypasses the server’s stream configuration, but should yield a more even loading of the available volumes.
- No file replication is carried out, even if a storage set declares a replication disk.
The QARCHIVE
command was initially implemented
separately from the ARCHIVE
command,
and therefore they used to differ in more ways.
In particular QARCHIVE
originally was the only one
implementing on-stream checksuming,
while ARCHIVE
didn’t.
Nowadays they share the same underlying logic though,
and only the differences documented above remain.
RETRIEVE¶
Retrieve archived data files from an NGAS server or cluster.
Parameters
file_id
: ID of the file to retrieve.file_version
: version of the file to retrieve.processing_pars
: invoke a processing plug-in by name that will operate on the file requested. Note that NGAS will send back the result of the processing which may or may not be a file stream.
If multiple files of the same ID exist and file_version
is not specified then the file with the highest version number will be retrieved by default.
If the file was compressed internally by NGAS at archiving time,
at RETRIEVE
time the contents are still returned compressed.
Different clients will handle this differently:
if the client supports Content-Encoding: XYZ
responses
(with XYZ
being the compression used internally in NGAS)
then a filename without the corresponding compression extension will be presented,
as the client automatically decompresses the incoming content
before presenting it to the user.
In the case the client doesn’t support this,
a filename with the corresponding extension will be presented
to ensure clients associate the filename and its contents
with the corresponding utilities.
Note that only one file can be retrieved per RETRIEVE request.
Example
Get the latest version of a file if it exists:
curl http://<host>:<port>/RETRIEVE?file_id=file.fits
Get specific version of a file if it exists:
curl http://<host>:<port>/RETRIEVE?file_id=file.fits&file_version=2
QUERY¶
The QUERY command is used to list different types of elements currently known by the server. Users can use the QUERY command to find out which files have been archived into the server, which subscriptions have been created, which disks are present in the system, and more.
Parameters
query
: The query to run, the only required argument. Valid values are:files_list
: a list of all files in the system.subscribers_list
: a list of all subscriptions in the system.subscribers_like
: a list of subscriptions in the database belonging to hosts matching the givenlike
value (see below).disks_list
: a list of all disks in the system.hosts_list
: a list of all hosts that are part of the same cluster.files_like
: a list of files whose ID matches the givenlike
value (see below).files_location
: a list of all files in the system, but with information about their physical location on disk.lastver_location
: likefiles_location
, but only listing the last version of each file.files_between
: a list of files archived into the system betweenstart
andend
date (see below).files_stats
: the total size of all archived files, in [MB].files_list_recent
: a list of the last 300 archived files.
format
: the format in which the result of the query will be returned to the client. Valid values arelist
(a textual, table-like representation),pickle
(a python pickled version of the data),json
(a json representation of the data), andpython-list
(astr
representation of the direct result of the query).like
: indicate the value to use in the*_like
queries. If no string is given,%
will be used, therefore matching all values for the corresponding attribute.start
andend
: indicate the beginning and the end of the time interval used for thefiles_between
query. Both parameters must be specified in order for the interval to be properly defined. If any of the two is (or both are) missing, no interval is used.
Example
Get list of all subscriptions the system in json format:
curl http://<host>:<port>/QUERY?query=subscribers_list&format=json
Get list of all files in the system:
curl http://<host>:<port>/QUERY?query=files_list&format=list
CLONE¶
The CLONE Command is used to create copies of a single file or sets of files. In order for the CLONE Command to be accepted by an NGAS node, the system must be configured to accept Archive Requests. NGAS will calculate if there is enough space to execute the request, if not then an error is returned. If the files to be cloned are located on other NGAS host, these will be requested automatically during the cloning (if possible). If the NGAS hosts are suspended, they will be woken up automatically.
Parameters
disk_id
: disk ID where the files to be cloned exist.file_id
: ID of the files to be cloned.file_version
: file version of the files to be cloned.notif_email
: list of comma separated email addresses to where the Clone Status Report can be sent.
The actions of the various combinations of these parameters are explained below:
disk_id | file_id | file_version | Action |
---|---|---|---|
Clone one file with the given ID. Latest version of the file is taken. | |||
Clone one file stored on the given disk. Latest version on that disk is taken. | |||
Clone all files found with the given File Version. Storage location (Disk ID) is not taken into account. | |||
Clone one file on the given disk with the given File Version. | |||
Clone all files from the disk with the given ID. | |||
Clone all files with the given File Version from the disk with the ID given. | |||
Illegal. Not accepted to clone arbitrarily files given by only the File Version. |
CHECKFILE¶
The CHECKFILE command is used to check the consistency of a specific file.
Parameters
disk_id
: disk ID where the file to be checked exists.file_id
: ID of the file to check.file_version
: version of the file to check.
CACHEDEL¶
The CACHEDEL command is used to remove a file from an NGAS cluster. Only the ngamsCacheServer
version supports this command.
WARNING: Once the command completes successfully the file is permanently deleted from the NGAS database and the underlying file system.
Parameters
disk_id
: disk ID where the file to be deleted exists.file_id
: ID of the file to be deleted.file_version
: version of the file to be deleted.
REMDISK¶
The REMDISK command is used to remove storage media from an NGAS node.
The command removes both the information about the storage media and the files stored on said media.
NGAS will not remove the files from the system unless there are at least three (3) independent copies of the files.
Three independent copies refers to three copies of the file stored on three independent storage media.
In order for the REMDISK command to be accepted the system must be configured to allow remove requests i.e. NgamsCfg.Server:AllowRemoveReq
is set in the configuration file.
If the command is executed without the execute
parameter, the information about the disk is not deleted,
but a report is generated indicating what will be deleted if the execution is requested i.e. execute = 1
.
WARNING: Once the command completes successfully the files associated with the storage media are permanently deleted from the NGAS database and the underlying file system.
Parameters
disk_id
: ID of disk/media to remove from NGAS node.execute
: (0 or 1) 0: is a dummy run which will only report what will happen if the command is executed. 1: executes the command which will deleted the storage media and the associated files.notif_email
: list of comma separated email addresses to where the REMDISK Status Report can be sent.
REMFILE¶
The REMFILE command removes a single file from an NGAS node. NGAS will not remove the files from the system unless there are at least three (3) independent copies of the files.
In order for the REMFILE command to be accepted the system must be configured to allow remove requests i.e. NgamsCfg.Server:AllowRemoveReq
is set in the configuration file.
Parameters
disk_id
: disk ID where the file to be deleted exists.file_id
: ID of the file to be deleted.file_version
: version of the file to be deleted.execute
: (0 or 1) 0: is a dummy run which will only report what will happen if the command is executed. 1: executes the command which will delete the file.notif_email
: list of comma separated email addresses to where the REMFILE Status Report can be sent.
The actions of the various combinations of these parameters are explained below:
disk_id | file_id | file_version | Action |
---|---|---|---|
All files matching the given File ID pattern on the contacted NGAS host are selected. | |||
All files with the given File ID on the disk with the given ID will be selected. | |||
All files with the given File ID pattern and the given File Version are selected. | |||
The referenced file with the given File ID and File Version on the given ID is selected (if this exists). | |||
Illegal. | |||
No files are selected. | |||
No files are selected. |
REGISTER¶
The REGISTER command is used to register files already stored on an NGAS disk. It is possible to register single files or entire sets of files by specifying a root path. Only files that are known to NGAS (with a mime-type defined in the configuration) will be taking into account. It is also possible to explicitly specify a comma separated list of mime-types that will be registered. Files with other mime-types than specified in this list will be ignored.
Parameters
mime_type
: comma separated list of mime-types. A single mime-type can also be specified.path
: The root path under which NGAS will look for candidate files to register. It is also possible to specify a complete path to a single file.notif_email
: email address to send file registration report.
NGAS can be configured to run specific code when registering a file. See Register for details.
REARCHIVE¶
The purpose of the REARCHIVE command is to register a file in the NGAS DB that has already been generated when the file was archived with the QARCHIVE command. This means that the process of extracting the meta-information and other processing can be skipped whilst re-archiving the file making the processing more efficient.
The meta-information about the file is contained in the special HTTP header named NGAS-File-Info
.
It is stored as a base64
encoded NGAS XML block for the file (NGAS File Info).
This encoding can be accomplished by means of the Python module base64
using base64.b64encode()
.
The command does not require any parameters but the data to be re-archived should be contained in the body of the HTTP request similar to QARCHIVE Push or Pull.