The long-term management of data requires a sustainability and governance model that specifies the policies that will be used to guarantee funding support, minimize risk of data loss, assure integrity, and assure authenticity.
The management plan must also address future access if the sustainability model fails: where the collection might be housed and how the material would be migrated to the new environment. The concept of infrastructure independence in persistent archives can be extended to include independence from a particular sustainability model through federation with other institutions that use alternate sustainability models. Guaranteed access to a collection requires a community that is willing to curate the collection, identify risks to the maintenance of the collection, and seek opportunities to replicate the collection as widely as possible.
For science and engineering, as in life, there is “no free lunch.” The ability to organize, analyze, and utilize today’s deluge of data to drive research, education, and practice incurs costs for management, curation, preservation, and distribution. These costs are real and must be included in project budgeting and infrastructure planning.
These costs are better than the alternative, however. Without responsible data planning as part of project development, organization, and management, valuable data collections will be lost, damaged, or become unavailable. Lack of planning can incur substantial costs for resurrecting, regenerating, or rescuing a data collection, and without critical data, advancement and discovery in science and engineering can be slowed. At the end of the day, the costs of thoughtful and strategic data management, curation, and preservation are a bargain.
The authors would like to thank Helen Berman, Phil Bourne, and Richard Moore for their comments and improvements.