Preserving the Digital Record: Challenges in Archiving Born-Digital Content and Big Data
Abstract
The rapid expansion of digital technologies has driven unprecedented growth in born-digital content and big data, presenting new challenges for long-term preservation. Born-digital content, including emails, websites, and social media posts, requires specialized archiving strategies because its formats are volatile and the content itself is ephemeral. Similarly, the volume and complexity of big data make it difficult to ensure data integrity, security, and accessibility over time. This study explores the challenges of archiving born-digital content and big data and evaluates the practices and frameworks organizations currently use to manage these digital records. A mixed-methods approach was used, combining qualitative interviews with digital archivists and quantitative surveys of organizations across sectors such as government, academia, and technology. The results reveal a widespread reliance on cloud storage, with limited use of specialized archival systems for big data. Additionally, 15% of organizations lack a formal digital preservation strategy. The study concludes that while cloud storage offers scalability, specialized archival solutions are essential for long-term preservation, especially for big data. The research highlights the need for standardized frameworks and advanced technologies to address the growing complexity of digital preservation.
Authors
Copyright (c) 2026 Bayu Hartono, Ahmed Al-Mohannadi, Ana María Rodríguez

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.