Energy Data Lab
Welcome to the Energy Data Lab (EDL), an initiative designed to transform the research landscape of Machine Learning (ML) models for energy systems. Our vision is to empower energy researchers by providing a novel digital platform for sharing data, models, and code in a standardized and scalable way, and offering powerful tools to assist researchers during the entire ML model development process.
When developing new ML models for energy systems, researchers work through various steps. First, data must be acquired, explored, cleaned, and reshaped so that it can be used for modelling. These steps are typically time-consuming and repetitive. The EDL will offer an abundance of benchmark datasets, together with visualizations and cleaning or merging scripts, which will speed up the data preparation process significantly. For developing and training models, the EDL will provide weights of pretrained models, which can improve model performance and reduce computing time for EDL users, alongside a variety of benchmark models. In this way, the EDL fosters transparent research, enabling all users to compare methods in a standardized manner.
As a powerful tool to enable this kind of support, the EDL will employ Directed Acyclic Graphs (DAGs). These DAGs represent and track the entire ML development pipeline. Each artifact in this process (for example a dataset, a script such as a data cleaning protocol, or an ML model) is represented as a node in a DAG. The directed edges then describe which artifact uses which other artifact. For example, the process of cleaning a raw dataset would be described as shown in Figure 1.
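The artifact graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EDL's implementation: the `Artifact` class, the artifact names, and the `lineage` helper are hypothetical, and the edge direction (a cleaned dataset "uses" the raw dataset and the cleaning script) is our reading of the cleaning example. The standard-library `graphlib.TopologicalSorter` is used to check that the graph is acyclic and to order artifacts so that each one appears after everything it uses.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Artifact:
    """Hypothetical node type: one artifact in the ML development pipeline."""
    name: str
    kind: str  # assumed categories: "dataset", "script", or "model"


# Hypothetical artifacts for the "cleaning a raw dataset" example.
raw = Artifact("raw_load_data", "dataset")
script = Artifact("cleaning_script", "script")
clean = Artifact("clean_load_data", "dataset")

# Directed edges: each artifact maps to the set of artifacts it uses.
uses = {
    raw: set(),
    script: set(),
    clean: {raw, script},
}


def lineage(artifact, uses):
    """Return every artifact that `artifact` uses, directly or indirectly."""
    seen = set()
    stack = [artifact]
    while stack:
        for dep in uses.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# static_order() raises graphlib.CycleError if the graph is not acyclic;
# otherwise it yields each artifact after all artifacts it uses.
order = list(TopologicalSorter(uses).static_order())

print([a.name for a in order])
print(sorted(a.name for a in lineage(clean, uses)))
```

Storing the edges as "artifact to the artifacts it uses" keeps provenance queries simple: the lineage of a cleaned dataset or a trained model is a plain graph traversal.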
Our main development principle is to design the EDL in a user-centric way, i.e., closely adapted to the needs of energy researchers. This is why our first step in the development process was to interview the EDL's future users. The outcome was presented at the 1st nfdi4energy conference in 2024, and a summary of the results can be found in the published presentation abstract.
One central piece of advice from the user interviews was to decentralize the platform. The platform should not claim to store all resources in one central repository, but rather serve as a registry for artifacts, which can be stored at the institutes where they originate. This benefits the EDL's performance and security, and it suits the decentralized nature of research.
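To make the registry idea concrete, here is a minimal sketch of a registry that holds only metadata and a pointer to where each artifact lives, while the artifact itself stays with its institute. Everything in it is an assumption made for illustration: the `RegistryEntry` fields, the `Registry` methods, the institute names, and the `example.org` locations are hypothetical and do not describe the EDL's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical metadata record; the artifact itself is not stored here."""
    artifact_id: str
    kind: str       # assumed categories: "dataset", "script", or "model"
    institute: str  # institute that hosts the artifact
    location: str   # where the hosting institute makes the artifact available


class Registry:
    """Hypothetical in-memory registry mapping artifact ids to entries."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Reject duplicate ids so that every id resolves to one location.
        if entry.artifact_id in self._entries:
            raise ValueError(f"duplicate artifact id: {entry.artifact_id}")
        self._entries[entry.artifact_id] = entry

    def resolve(self, artifact_id):
        """Return the location at which the hosting institute keeps the artifact."""
        return self._entries[artifact_id].location

    def by_institute(self, institute):
        """List all entries hosted by one institute."""
        return [e for e in self._entries.values() if e.institute == institute]


registry = Registry()
registry.register(RegistryEntry(
    "load-profiles-raw", "dataset", "Institute A",
    "https://institute-a.example.org/data/load-profiles-raw",
))
registry.register(RegistryEntry(
    "load-forecast-model", "model", "Institute B",
    "https://institute-b.example.org/models/load-forecast",
))

print(registry.resolve("load-profiles-raw"))
```

Because the registry only resolves identifiers to locations, each institute keeps control over storage and access, and the central service stays small.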
A further feature that future EDL users desire is easy and fast research data management. Researchers reported difficulties in saving and annotating their own artifacts in an understandable and standardized manner. The EDL will thus be set up so that it can be used as a personalized and/or project-specific research data management tool. With a commitment to transparency, the EDL then encourages institutions to share insights into their models, experiments, and data. By checking a "public" checkbox, researchers invite others to explore and utilize their research artifacts. This way, the EDL becomes more than just a data management tool: it's a platform for sharing, collaboration, and inspiration.
By bringing this concept to life, the EDL will (i) accelerate research and development of ML techniques in the energy domain, (ii) improve the quality of the corresponding research, and (iii) boost the real-world performance of ML models used for energy management tasks.
If you have questions, ideas, or other feedback, don’t hesitate to reach out via email!
Are you a student looking for a master's thesis, an internship, or an Interdisciplinary Project (IDP)? Join us at the Energy Data Lab, and let's shape the future of ML model development for energy systems together!