RDSF Usage FAQs


What is the RDSF?

And BluePeta?

Who can use it?

How much space can I have?

What does it cost?

Can't I just add a big disk to my PC?

How secure & reliable is it?

Is it fast?

Why does my data appear to occupy twice as much space as I expect?

What limitations does the RDSF have?

Is my data backed up?

Can I back up my PC to the RDSF?

How can I keep the data on the RDSF up to date with my PC?

How often do you back up?

How long do you keep backups?

What is a Data Steward?

Can I delegate Data Steward duties?

What is a project?

How do I access my project?

Who can access my project(s)?

Can David & Ruth access this directory but not Brian or Jennifer?

Why is the access model set up this way?

Can my PhD students store their data in the RDSF?

Which machines can access my data?

What is the Data-Bris folder in my project?

Should I store all my data in the Data-Bris folder?



What is the RDSF?

The RDSF is a facility for the long-term storage of research data and is available to researchers from all disciplines. Physically it is a set of disks and servers housed in two separate data centres. It provides a central location for data associated with research activities throughout the University. The data may be accessed as a Windows shared drive (this works for Macs too) or via NFS on Linux.

And BluePeta?

BluePeta is an alternative name for the RDSF, mainly used internally.

Who can use it?

Any PI in the University. You need to apply to be a Data Steward (see below). You can then apply for storage for any Research Projects you have and can arrange access for other members of your research groups.

How much space can I have?

How much do you want? A Data Steward can currently have up to 5TB of storage free of charge. Above 5TB, the University has implemented a charging policy. For technical reasons the minimum allocation is 100GB per project.
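
As a rough illustration of the allowance arithmetic, the shell sketch below works out how much of a Data Steward's total allocation falls above the free 5TB. It is only a sketch under stated assumptions: sizes are whole gigabytes, 1TB is taken as 1000GB, and a request below the 100GB minimum is rounded up to 100GB. The function name chargeable_gb is hypothetical, and the actual charging rules are those in the University's policy.

```shell
#!/bin/sh
# Sketch only: assumes 1TB = 1000GB and whole-GB project sizes.
FREE_GB=5000        # free allowance per Data Steward (5TB)
MIN_PROJECT_GB=100  # minimum allocation per project

# chargeable_gb SIZE_GB...  -> prints the GB above the free allowance
chargeable_gb() {
    total=0
    for size in "$@"; do
        # Assumption: a smaller request is rounded up to the minimum allocation.
        if [ "$size" -lt "$MIN_PROJECT_GB" ]; then
            size=$MIN_PROJECT_GB
        fi
        total=$((total + size))
    done
    if [ "$total" -gt "$FREE_GB" ]; then
        echo $((total - FREE_GB))
    else
        echo 0
    fi
}

# Three projects of 2TB, 3.5TB and 50GB (the last counts as 100GB).
chargeable_gb 2000 3500 50   # prints 600
```

Here the three projects total 5600GB, so 600GB would fall outside the free allowance.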

What does it cost?

Costs are detailed here. You should include costs for storing more than 5TB of research data in grant applications.

Can't I just add a big disk to my PC?

Yes, but...

The RDSF is in a secure location, is shareable, and is regularly backed up to tape.

How secure & reliable is it?

We limit access to each project to a named set of people so only users authorized by the Data Steward can see the data.

The data is stored on multiple disks with protection against individual component failure. Of course, there are events over which we have no control, e.g. a power failure, so all project data is stored (replicated) in two different locations. If we lose one location, your data continues to be available from the other, and the switch is totally transparent. The system is also very scalable and we monitor it: if it slows down significantly we can add more resources (e.g. another pair of filers).

Is it fast?

The system is designed for capacity rather than speed, but is comparable with other network drives. If speed is of the essence, store a 'working copy' on local storage and make regular copies to your project(s).

Why does my data appear to occupy twice as much space as I expect?

As all projects are replicated, a 5TB project actually occupies 10TB of disk space; for example, a 1KB file will occupy 2KB.

What limitations does the RDSF have?

The RDSF was not designed for performance, but purely to offer bulk second tier storage, so we ask that you do not run applications that directly access the RDSF filesystems. If you would like help or advice on changing your usage patterns on the RDSF or with changing to use local disk systems for active data, do contact us.

Is my data backed up?

Yes, all data stored on the RDSF is backed up to our tape library in the Computer Centre. This does not mean that you should use the RDSF to back up all of the data on your PC. See below...

Can I back up my PC to the RDSF?

Yes and no.

For research data, this is fine and we would encourage you to do so.

System backups, such as those carried out by Apple Time Machine, Linux rsync or PC backup software, are a different matter: we're afraid these are not allowed. This use is outside the remit and configuration of the RDSF.

The problem is that copies of system-specific files, such as applications, preferences and browser caches, can cause issues with our backups. In many cases they are rejected by our backup software, which extends the time our backups take and can affect our ability to recover files.

Anyone who needs to back up non-research data should make alternative arrangements with their Zonal team.

Problematic files we discover will be removed from our backups. In most cases the files are being rejected anyway.

How can I keep the data on the RDSF up to date with my PC?

The most widely used program for this is rsync, a Unix/Linux utility which is also available for Mac and Windows. It is normally run from the command line, e.g.

rsync -av mydata/ /mnt/rdsf_project/mydata/

This ensures that the directory /mnt/rdsf_project/mydata/ contains a copy of all of the data in mydata, including subdirectories (folders). The trailing slash on mydata/ tells rsync to copy the directory's contents rather than the directory itself; without it the data would end up in /mnt/rdsf_project/mydata/mydata/. Subsequent runs will keep the copy up to date, copying only new and changed files.

For Windows there is a command-line version, cwrsync, and several GUI versions including grsync (also available for Mac & Linux). For Windows only, there is also Microsoft's SyncToy, a graphical program that may be more suitable for those unfamiliar with Linux. Other file synchronization tools are also available.
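
Before the first real run it can help to preview what rsync would do. The sketch below wraps the command in a small shell function. The function name sync_to_rdsf, the mount point /mnt/rdsf_project and the exclude patterns are illustrative assumptions, not RDSF requirements. The -n (--dry-run) option lists the changes without copying anything, and --exclude skips files that match a pattern.

```shell
#!/bin/sh
# sync_to_rdsf SRC DEST [extra rsync options...]
# Copies the contents of SRC into DEST, skipping a few common
# non-research files (the patterns are examples only).
sync_to_rdsf() {
    src=${1%/}    # strip any trailing slash so exactly one is added below
    dest=${2%/}
    shift 2
    # The trailing slash on the source means "the contents of src",
    # so files land directly in dest rather than in dest/<src name>.
    rsync -av \
        --exclude='.DS_Store' \
        --exclude='Thumbs.db' \
        --exclude='.cache/' \
        "$@" "$src/" "$dest/"
}

# Preview only (nothing is copied):
#   sync_to_rdsf mydata /mnt/rdsf_project/mydata -n
# Real run:
#   sync_to_rdsf mydata /mnt/rdsf_project/mydata
```

Running the preview first is a cheap way to confirm that the source and destination are the right way round before any files are written.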

How often do you back up?

Daily.

How long do you keep backups?

Forever, with one caveat: we recycle backups every 30 days, throwing away all but the latest copy of each file. So if you lose a file, we can get it back as long as you realize, and let us know, within four weeks. This does not mean that we discard files older than 30 days. As long as a file is still on the system, i.e. it has not been deleted, we will always have the 'latest' copy, even if it's several years old. We aim to retain all data for at least 20 years.

What is a Data Steward?

A Data Steward is someone who owns a set of data stored on the RDSF in the form of one or more projects. You decide how much storage a project needs, for how long the data should be kept, who has access to it and pay any costs beyond the first 5TB. Normally the Data Steward would be the PI for any associated University research projects.

Can I delegate Data Steward duties?

Yes, you can delegate some activities, such as adding and removing users' access and preparing data for publication, to an RA in your research group. We call this a Deputy Data Steward and we recommend a maximum of two per project.

What is a project?

A project is a set of data associated with an activity.  It is stored as a directory on the RDSF and made available as a Windows (CIFS) share, or Unix/Linux (NFS) mountable directory. A Data Steward may have many projects. Every project must have a Data Steward.

How do I access my project?

Assume your project is called My_Project

On a PC - From Windows Explorer access or map a network drive path \\rdsfcifs.acrc.bris.ac.uk\My_Project.

On a Mac - In the Finder select 'Go>Connect To Server...' from the menu, or press CMD-K, then enter smb://rdsfcifs.acrc.bris.ac.uk/My_Project into the dialogue box.

From the Linux desktop - This can vary, but most now have a 'Connect to Server' from the Places menu.
Using this select "Windows Share", Server: rdsfcifs.acrc.bris.ac.uk, Folder: My_Project

If any of the above ask for a Windows Domain the answer is UOB (all in capitals).

Linux NFS - We recommend that Linux users connect to the standard Windows share over SMB. Go to 'Connect to Server' from the Places menu and select "Windows Share", Server: rdsfcifs.acrc.bris.ac.uk, Folder: My_Project
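
On a Linux machine without a desktop, the same share can usually be mounted from the command line. The sketch below is an example under stated assumptions rather than an official procedure: it assumes the cifs-utils package is installed, that you have sudo rights, and that /mnt/rdsf_project is an existing directory you want to use as the mount point. The username ab12345 is a placeholder and the function name rdsf_mount_cmd is hypothetical. The function only prints the command, so you can check it before running it yourself.

```shell
#!/bin/sh
RDSF_SERVER=rdsfcifs.acrc.bris.ac.uk   # server name used for the Windows share

# rdsf_mount_cmd PROJECT USERNAME [MOUNTPOINT]
# Prints (does not run) a mount command for the project's Windows share.
rdsf_mount_cmd() {
    project=$1
    user=$2
    mountpoint=${3:-/mnt/rdsf_project}   # assumed default mount point
    # domain=UOB is the Windows Domain; uid/gid make the mounted files
    # appear to be owned by the local user who runs this function.
    echo "sudo mount -t cifs //$RDSF_SERVER/$project $mountpoint" \
         "-o username=$user,domain=UOB,uid=$(id -u),gid=$(id -g)"
}

rdsf_mount_cmd My_Project ab12345
```

When the printed command is run, mount should prompt for your password; sudo umount /mnt/rdsf_project disconnects the share again.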


Who can access my project(s)?

Any member of the University the Data Steward authorises. Either give us a list when you ask for the Project to be created or let us know later on and we will add or remove access for them.

Can David & Ruth access this directory but not Brian or Jennifer?

All members of the project have full access, so access cannot be restricted to individual directories within a project. However, a Data Steward could apply for two projects, for example one for the whole research group and one for a small number of users, maybe just the Data Steward and one or two RAs.

Why is the access model set up this way?

So we can support both Windows and Linux users. The Windows & Unix/Linux views of who can do what may look similar on the outside, but are quite different. This is not just an issue for the RDSF and is currently being looked at by the IT Services Unix Virtual Team.

Can my PhD students store their data in the RDSF?

As Data Steward, you can add any PhD students you supervise as users of your project.

Which machines can access my data?

For Windows/Mac (CIFS sharing) any of the project users can access the data as a Windows shared drive from any machine on-site or via the University VPN. If you'd prefer tighter restrictions, for example restricting access to a small number of PCs, just ask.

For most Unix/Linux systems it is possible to use Windows sharing as above. However for NFS you will need to let us have a list of authorized machines. Be aware that normal Unix/Linux permissions apply so your local root account will have unrestricted access to the project. You will also have to use University Standard UIDs and add the project's Unix group to your system. If you're unsure about this consult with your Zonal support team.

What is the Data-Bris folder in my project?

It's a pre-created folder for you to use when you wish to publish research data via the University Research Data Repository. This is managed by the Research Data Service team and guidance is provided on their website.

Should I store all my data in the Data-Bris folder?

Absolutely not. Only data that is to be published via the University Research Data Repository should be stored there. For other data you should create other folders alongside it as necessary.