Notes from SURA Research Data Management (RDM) conference call of 6/4/12
Attendees: J.L. Albert (GSU), Diane Bruxvoort (UFL), John Burger (ASERL), Joseph Combs (Vanderbilt), Gary Crane (SURA), Marc Hoit (NCSU), Steven Morris (NCSU), Russell Moy (SURA), Richard Newman (FIT), Ramon Padilla (UNC), Andrew Sallans (UVA), Aaron Trehub (Auburn), Pat Vince (Tulane), Tom Wilson (UA), Gary Worley (VATech)
Purpose of Calls: This was the first of our biweekly calls focused on developing the ideas first raised at the March 29th SURA IT Committee meeting in St. Petersburg. The purpose of these calls is to:
- review the results of the SURA Research Data Management (RDM) survey,
- better define the topics that were ranked highest in that survey,
- identify individuals willing to provide leadership for specific projects, and
- develop a plan and agenda for an August 23-24 face-to-face meeting to launch a community RDM project.
In general, the participants felt that the SURA RDM survey had identified an appropriate set of concepts and priorities for the group to focus on. They are, in priority order:
- Create/expand on existing data management efforts to provide or enhance a shared tool, support or service that helps researchers meet granting agency requirements.
- Develop a lifecycle data management framework to help researchers understand what tools, resources and support are available.
- Develop a best practices document on ways campus IT and Library groups can collaborate.
- Explore leveraging existing or creating a new multi-institutional shared research data repository.
It was agreed that we need to identify, vet and leverage existing RDM tools and projects.
Existing RDM tools discussed by the group were:
- DMPTool (https://dmp.cdlib.org/)
- Universities can customize DMPTool to provide campus-specific guidance and resources. Currently 19 of the 52 schools using DMPTool have done so. SURA schools may want to increase this number as a group project.
- Databib (http://databib.org/) – a listing of discipline-specific repositories
- Short-term project, intended to be self-managed
- Crowdsourced solution, based at Purdue and Penn State
- There seem to be multiple sites for the same data
- DataONE project (http://www.dataone.org/)
- DataONE (one of the two initial NSF DataNet projects) focuses on environmental science as a broad interdisciplinary area, so its approach is not necessarily one-size-fits-all, and its "Decide on a repository" section is fairly simplified. Still, it may offer a starting point for discussing the pieces of the data life cycle.
- DataONE has developed a best practices document for working with data through all stages of the data life cycle: https://www.dataone.org/best-practices
- Kuali Open Library Environment (http://kuali.org/ole)
Is a shared repository the biggest missing piece? (Priority #4)
- Large science disciplines typically have a shared repository.
- Can we use/expand DataONE as a tool for cataloging existing repositories and educating researchers on its use?
- There is a difference between where to find large data sets (Hubble, etc.) and where researchers can put their own data (e.g., a local sky survey)
- What about disciplines that do not have shared repository or a community ontology?
- How do researchers decide where to put their data?
- To what extent do Librarians determine where data is stored?
- Vetting process for repositories? (Databib: which ones are good?)
- Discovery interface – how to expose/find the data?
- Best practices? How to make data discoverable through a general search engine
- Bibliographic records in the researcher's home library
- Upload records to OCLC WorldCat (shared library catalog)
- Some institutions already do this
- Do we want to consider creating a shared repository within SURA?
- Universities must retain control of their data
Can we create a checklist/cookbook for managing data from creation to long term preservation?
Create a roadmap for a community RDM project:
- Review DataONE and adopt/refine data lifecycle framework for our needs
- Identify existing repositories and tools and fit them into the data lifecycle framework
- Identify where gaps exist in available repositories and tools
- Target projects to fill gaps
- Document and communicate results to community
- Aaron Trehub and Ramon Padilla agreed to review the DataONE data lifecycle framework and create a draft research data management flowchart, giving us a start on the first element (above) of a SURA community RDM project roadmap. The draft will be shared through the firstname.lastname@example.org email list and will form the basis for discussion on our next scheduled call on June 18.