A StashCache origin is the authoritative source of data. The origin receives data location requests from the central redirectors. These requests take the form of “Do you have the file X?”, to which the origin responds “Yes” or “No”. The redirector then returns to the client a list of the origins that claim to have the requested file.
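The location query can be pictured as a fan-out: the redirector asks each origin the yes/no question and keeps the origins that answer “Yes”. The sketch below is a simplified, hypothetical model (the `locate` function and the origin hostnames are illustrative), not how the XRootD redirector is actually implemented.

```python
from typing import Callable, Dict, List

# Hypothetical model: each origin is a callable answering the question
# "Do you have the file X?" with True ("Yes") or False ("No").
OriginQuery = Callable[[str], bool]


def locate(path: str, origins: Dict[str, OriginQuery]) -> List[str]:
    """Ask every origin about `path`; return the names of those answering "Yes"."""
    return [name for name, has_file in origins.items() if has_file(path)]


if __name__ == "__main__":
    # Hypothetical origins, each holding files under one directory tree.
    origins = {
        "origin-a.example.org": lambda p: p.startswith("/gwdata/"),
        "origin-b.example.org": lambda p: p.startswith("/user/"),
    }
    print(locate("/gwdata/O1/strain.hdf5", origins))  # ['origin-a.example.org']
    print(locate("/missing/file", origins))           # []
```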
An origin is a simple XRootD server that exports a directory or set of directories for access.
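Deciding whether a requested path falls under an exported directory can be sketched as a prefix check on normalized paths. `is_exported`, `has_file`, and the in-memory set of files are hypothetical stand-ins; a real XRootD origin takes its exports from its own configuration file and checks its filesystem.

```python
import posixpath
from typing import Iterable, Set


def is_exported(path: str, exports: Iterable[str]) -> bool:
    """Return True if `path` lies inside one of the exported base directories."""
    # Normalize first so "/gwdata/../etc/passwd" is not mistaken for an export.
    norm = posixpath.normpath(path)
    for base in exports:
        base = posixpath.normpath(base).rstrip("/")
        if norm == base or norm.startswith(base + "/"):
            return True
    return False


def has_file(path: str, exports: Iterable[str], files: Set[str]) -> bool:
    """Answer "Do you have the file X?": exported and present means "Yes"."""
    return is_exported(path, exports) and posixpath.normpath(path) in files


if __name__ == "__main__":
    exports = ["/gwdata"]
    files = {"/gwdata/O1/strain.hdf5"}  # hypothetical file list
    print(has_file("/gwdata/O1/strain.hdf5", exports, files))  # True
    print(has_file("/gwdata/../etc/passwd", exports, files))   # False
```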
| Origin | Base Directory | Data Read |
|---|---|---|
| LIGO Open Data | /gwdata | 926 TB |

A list of origins and their base directories.
The clients interact with the StashCache federation on the user’s behalf. They are responsible for choosing the “best” cache. The available clients are CVMFS and StashCP.
In the pictures above, you can see that most users of StashCache use CVMFS to access the federation. All clients use GeoIP to determine the “best” cache, which is the geographically nearest one. GeoIP location services are provided by the CVMFS infrastructure in the U.S.
The GeoIP service runs on multiple CVMFS Stratum 1 servers and other hosts. Each request to the service includes the hostnames of all of the caches. The service attempts to locate the requester from its IP address, determines the location of each cache, and returns the caches ordered from nearest to farthest.
The GeoIP service uses the MaxMind database to determine locations by IP address.
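The ordering step can be sketched as sorting caches by great-circle (haversine) distance from the requester. This is an illustrative model only: the coordinates below are hypothetical stand-ins for what a MaxMind lookup might return, and the real service's distance calculation and tie-breaking may differ.

```python
import math
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (latitude, longitude) points in km."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def order_caches(client: LatLon, caches: Dict[str, LatLon]) -> List[str]:
    """Return cache hostnames sorted from nearest to farthest from the client."""
    return sorted(caches, key=lambda host: haversine_km(client, caches[host]))


if __name__ == "__main__":
    # Hypothetical geolocation results for a requester and three caches.
    requester = (41.88, -87.63)
    caches = {
        "cache-west.example.org": (37.77, -122.42),
        "cache-east.example.org": (40.71, -74.01),
        "cache-central.example.org": (41.26, -95.94),
    }
    print(order_caches(requester, caches))
    # ['cache-central.example.org', 'cache-east.example.org', 'cache-west.example.org']
```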
Most (if not all) origins are indexed in a *.osgstorage.org repo. For example, the OSG Connect origin is indexed in the stash.osgstorage.org repo. These repos use a special feature of CVMFS in which the namespace and the data are separated: file metadata such as permissions, directory structure, and checksums are stored within CVMFS, while the file contents are not.
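To illustrate the split, the sketch below models the metadata kept in CVMFS as a small record and shows how a checksum stored there could be used to check bytes fetched from outside CVMFS. The `CatalogEntry` layout and the use of SHA-1 are assumptions for illustration; this is not a description of the CVMFS catalog format or of exactly how the CVMFS client validates external data.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record kept in CVMFS for an externally stored file."""
    path: str      # position in the directory structure
    mode: int      # file permissions, e.g. 0o644
    size: int      # size in bytes
    checksum: str  # hex digest of the contents (SHA-1 assumed here)


def verify(entry: CatalogEntry, data: bytes) -> bool:
    """Check bytes fetched from an external server against the stored metadata."""
    return len(data) == entry.size and hashlib.sha1(data).hexdigest() == entry.checksum


if __name__ == "__main__":
    payload = b"hello stashcache\n"  # hypothetical file contents
    entry = CatalogEntry("/user/alice/public/hello.txt", 0o644,
                         len(payload), hashlib.sha1(payload).hexdigest())
    print(verify(entry, payload))       # True
    print(verify(entry, b"corrupted"))  # False
```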
When a file is accessed, CVMFS uses the directory structure to form an HTTP request to an external data server, choosing the nearest cache with GeoIP.
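A minimal sketch of that request formation, assuming that paths below the repo mount point map one-to-one onto the paths served by the caches and that requests go to the cache HTTP port (8000). `cache_url`, `fetch`, the cache hostnames, and the stub `get` function are hypothetical; the real CVMFS client handles this internally.

```python
from typing import Callable, List, Optional
from urllib.parse import quote

# Assumption: paths below the repo mount point map one-to-one onto the paths
# served by the caches. The repo name comes from the text; the mapping is illustrative.
REPO_ROOT = "/cvmfs/stash.osgstorage.org"
HTTP_PORT = 8000  # HTTP port on a regular cache


def cache_url(cache_host: str, cvmfs_path: str) -> str:
    """Translate a path inside the repo into an HTTP URL on one cache."""
    if not cvmfs_path.startswith(REPO_ROOT + "/"):
        raise ValueError(f"{cvmfs_path} is not inside {REPO_ROOT}")
    relative = cvmfs_path[len(REPO_ROOT):]  # keeps the leading "/"
    return f"http://{cache_host}:{HTTP_PORT}{quote(relative)}"


def fetch(cvmfs_path: str, ordered_caches: List[str],
          get: Callable[[str], Optional[bytes]]) -> bytes:
    """Try caches nearest-first (GeoIP order) until one returns the data."""
    for host in ordered_caches:
        data = get(cache_url(host, cvmfs_path))
        if data is not None:
            return data
    raise OSError(f"no cache could serve {cvmfs_path}")


if __name__ == "__main__":
    path = REPO_ROOT + "/user/alice/public/hello.txt"  # hypothetical file
    # Stub "HTTP GET": only the second-nearest cache has the file.
    responses = {"http://cache-b.example.org:8000/user/alice/public/hello.txt": b"hi\n"}
    print(fetch(path, ["cache-a.example.org", "cache-b.example.org"], responses.get))  # b'hi\n'
```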
The indexer may also configure a repo to be “authenticated”. A whitelist of certificate DNs is stored within the repo metadata and distributed to each client. The CVMFS client reads the certificate from the user’s environment; if the certificate DN matches a DN in the whitelist, the client uses the certificate to authenticate with an authenticated cache.
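The whitelist decision can be sketched as follows. This assumes the common grid convention for locating a proxy certificate ($X509_USER_PROXY, falling back to /tmp/x509up_u&lt;uid&gt;) and treats matching as an exact string comparison of DNs; both are simplifications, and `proxy_path` and `may_use_authenticated_cache` are hypothetical names, not CVMFS functions.

```python
from typing import Iterable, Mapping, Optional


def proxy_path(env: Mapping[str, str], uid: int) -> str:
    """Locate the user's proxy certificate file.

    Assumption: $X509_USER_PROXY if set, otherwise /tmp/x509up_u<uid>.
    """
    return env.get("X509_USER_PROXY") or f"/tmp/x509up_u{uid}"


def may_use_authenticated_cache(user_dn: Optional[str], whitelist: Iterable[str]) -> bool:
    """True if the certificate DN exactly matches a DN in the repo whitelist."""
    # Extracting the DN from the certificate file is out of scope here;
    # the DN is passed in as a string.
    return user_dn is not None and user_dn in set(whitelist)


if __name__ == "__main__":
    whitelist = ["/DC=org/DC=example/OU=People/CN=Alice Example"]  # hypothetical DN
    print(proxy_path({}, 1000))  # /tmp/x509up_u1000
    print(may_use_authenticated_cache("/DC=org/DC=example/OU=People/CN=Alice Example",
                                      whitelist))  # True
```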
StashCP works in the following order:
Use the xrdcp command to copy the data from the nearest cache.
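The xrdcp step can be sketched as building a command line that points at the XRootD port (1094) of the nearest cache. The cache list is assumed to already be in GeoIP order, and `xrdcp_command` and the hostnames are hypothetical; this is not the StashCP source.

```python
import shlex
from typing import List

XROOTD_PORT = 1094  # XRootD port on a regular cache


def xrdcp_command(ordered_caches: List[str], path: str, destination: str) -> List[str]:
    """Build (but do not run) an xrdcp command copying `path` from the nearest cache.

    `ordered_caches` is assumed to be in GeoIP order, nearest first.
    """
    if not ordered_caches:
        raise ValueError("no caches available")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    nearest = ordered_caches[0]
    # XRootD URLs put a double slash between host:port and an absolute path.
    source = f"root://{nearest}:{XROOTD_PORT}/{path}"
    return ["xrdcp", source, destination]


if __name__ == "__main__":
    cmd = xrdcp_command(["cache-a.example.org", "cache-b.example.org"],
                        "/user/alice/public/hello.txt", "hello.txt")
    print(shlex.join(cmd))
    # xrdcp root://cache-a.example.org:1094//user/alice/public/hello.txt hello.txt
```

To perform the copy, the list could be passed to `subprocess.run(cmd, check=True)` on a host with the XRootD client installed.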
The cache is half XRootD cache and half XRootD client. When a cache receives a data request from a client, it searches its own cache directory for the file. If the file is not there, the cache uses its built-in client to retrieve the file from one of the origins: it requests the data location from the central redirector, which in turn asks the origins for the file location.
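The read path can be sketched as a read-through cache. This toy model keeps whole files in a dictionary and takes the redirector lookup and the origin download as plain functions; the class and its parameters are hypothetical, and a real XRootD cache stores data on disk and may cache partial files.

```python
from typing import Callable, Dict, List

Locate = Callable[[str], List[str]]     # redirector: path -> origins that have it
Download = Callable[[str, str], bytes]  # (origin, path) -> file contents


class ReadThroughCache:
    """Toy cache: serve a file locally if present, else fetch it from an origin."""

    def __init__(self, locate: Locate, download: Download) -> None:
        self.store: Dict[str, bytes] = {}  # stands in for the cache directory
        self.locate = locate
        self.download = download

    def read(self, path: str) -> bytes:
        if path in self.store:                   # cache hit
            return self.store[path]
        origins = self.locate(path)              # ask the central redirector
        if not origins:
            raise FileNotFoundError(path)
        data = self.download(origins[0], path)   # built-in client fetches the file
        self.store[path] = data                  # keep a copy for later requests
        return data


if __name__ == "__main__":
    calls = []

    def download(origin: str, path: str) -> bytes:
        calls.append((origin, path))
        return b"data"

    cache = ReadThroughCache(lambda p: ["origin-a.example.org"], download)
    cache.read("/gwdata/O1/strain.hdf5")
    cache.read("/gwdata/O1/strain.hdf5")
    print(len(calls))  # 1: the second read is served from the cache
```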
The cache listens on port 1094 for the regular XRootD protocol and on port 8000 for HTTP.
Authenticated caches use GSI certificates to authenticate access to files within the cache. The client authenticates with the cache using the client’s certificate. If the file is not in the cache, the cache uses its own certificate to authenticate with the origin and download the file.
Authenticated caches use port 8443 for HTTPS.
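The ports and the credential flow described above can be summarized in a small sketch. The port numbers come from the text; `cache_url`, `Hop`, and `authenticated_hops` are hypothetical helpers, and the model covers only which certificate is presented on each connection, not the GSI handshake itself.

```python
from dataclasses import dataclass
from typing import List

# Ports taken from the text.
XROOTD_PORT = 1094      # regular caches, XRootD protocol
HTTP_PORT = 8000        # regular caches, HTTP
HTTPS_AUTH_PORT = 8443  # authenticated caches, HTTPS


def cache_url(host: str, path: str, protocol: str) -> str:
    """Return a URL for an absolute `path` on a cache for the given protocol."""
    if protocol == "root":
        return f"root://{host}:{XROOTD_PORT}/{path}"  # root://host:port//abs/path
    if protocol == "http":
        return f"http://{host}:{HTTP_PORT}{path}"
    if protocol == "https":
        return f"https://{host}:{HTTPS_AUTH_PORT}{path}"
    raise ValueError(f"unknown protocol: {protocol}")


@dataclass(frozen=True)
class Hop:
    initiator: str   # party opening the connection
    peer: str        # party being contacted
    credential: str  # certificate presented on this connection


def authenticated_hops(in_cache: bool) -> List[Hop]:
    """Which certificate is presented on each connection of an authenticated read."""
    hops = [Hop("client", "cache", "client certificate")]
    if not in_cache:
        # On a miss the cache authenticates to the origin with its own certificate.
        hops.append(Hop("cache", "origin", "cache certificate"))
    return hops


if __name__ == "__main__":
    print(cache_url("cache-a.example.org", "/user/alice/data.bin", "https"))
    # https://cache-a.example.org:8443/user/alice/data.bin
    for hop in authenticated_hops(in_cache=False):
        print(hop)
```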