11/1/12 Georgetown Meeting
Notes From 11/1 IT Committee Meeting
What Did We Hear (Learn) From Agency Presentations?
- Data management is still a key issue/challenge
- Agencies have different requirements and are at different maturity levels
- Conflicts exist between sharing data and maintaining IP/ownership
- Tension between use of data within a domain and use across domains
- Is data sharing a subset of data management and where do we focus first?
- Data sharing is about further use (beyond original research) – validation, alternate use…
- Management of data must be flexible enough to accommodate the full data lifecycle (different steps may have different policies and different responsible parties)
- How do you maintain the integrity of the data being maintained?
- Standards are key – local, regional, national, international standards are needed to make data sharable
- What do you do with data sets that are too large to move? (compute in place).
- What is the role of NSF in funding the working elements of a data management/sharing infrastructure? What is the role of the institutions? (see NSF DIBBS)
- How can past NSF projects be mined for RDM solutions? (need for a directory of data management and sharing plans?)
- Need a clear definition of responsibilities for various elements of data lifecycle.
- Need to define the difference between data management and sharing and where metadata falls in that dichotomy. Most of the presenters this morning focused on the "management" portion of the problem but skirted the metadata and sharing components.
- Not enough focus on sharing of data (is this where we can add value?)
Next Steps:
Involvement in DataWay Program and Charrette
- Can SURA offer NSF a subset of the national community to identify mechanisms/models for aggregating community input for DataWay? (Is this a DataWay white paper topic?)
- Researcher focus?
- Institutional focus?
- Implement our pilot (see below) – use DataWay to improve
- How to sustain? (NASA spends $100M/yr, which represents a 10% recurring cost for the life of the data.) This must be budgeted for in new projects.
- What are the grand challenges for DataWay?
Respond to call for DataWay White Papers – possible RDM Focus Areas:
- Access and discoverability of distributed data sources, including rights management (privacy, copyrights, compliance)
- Governance and economic models for sustainable curation, including distribution of effort across local through global communities
- Communities, methods and processes for the definition of metadata and ontologies (vocabulary) (link to pilot)
- Models for building multi-institutional, cross disciplinary infrastructure to improve the management of research data
- Identification of logical / potential division between institution / regional / national responsibilities for various DataWay (data life cycle) components (link to SURA collaboration)
- Data Management vs. Data Sharing (Retention vs. Access)
- What responsibility does the institution have to maintain data over time (how long?) and how is this funded?
- Who is responsible for making the data sharable and to how large an audience (research collaborators vs. everyone)?
- Sharing requires management (you can’t share data if you haven’t collected and catalogued it)
Identify a focused project to make progress in a specific area
Create a metadata tool/repository and federate across institutions (check for existing ones)
- Simple metadata standard and tool (Data version of MARK tool)
- Review PURR (Purdue University Research Repository), DataVerse and Hopkins models for elements that could be added to SBSG or knitted together into a more comprehensive resource.
- Auto metadata extraction (see OAI-PMH)
- Review COAR Document: The Current State of Open Access Repository Interoperability
Continued development of Best Practices Doc (SBSG)
- Encourage and document use of SBSG
- Institutional case studies of use of SBSG
- Who is using it? How is it being used?
- How can it be improved? Operational review?
- How are researchers being engaged in its use?
- What are the outcomes? DMP is input.
- Can/should SBSG be tailored for specific funding agency requirements?