Good software engineering practices for software-writing scientists

Scientists regularly create software to support their research, from data analysis scripts to complex simulations. As these projects grow, they inevitably face challenges related to complexity, maintainability, and reproducibility. This course introduces practical, approachable software engineering techniques tailored specifically for scientists who write code but may lack formal training in software development.

Key practices covered include:

  • Version Control: Track changes, manage project history, and facilitate collaboration using tools like Git.

  • Unit Testing: Write automated tests to ensure code reliability and correctness.

  • Continuous Integration (CI): Automate testing and validation to catch issues early and streamline development.

  • Documentation: Document software usage, functionality, and development processes clearly so that others can understand and contribute to the code, using approaches that are sustainable, maintainable, and largely automated.

  • Building and Packaging: Automate the build and installation processes, making software easy to share and use across diverse environments.

  • Principles of Software Design: Learn about modularity, abstraction, and other design principles that help manage complexity and improve code quality.

We will also explore the evolving role of Large Language Models (LLMs) in scientific software, highlighting both their potential to boost productivity and common pitfalls to avoid.

The course emphasizes adopting standard practices to manage complexity, improve maintainability, and promote sustainable, transparent, and collaborative scientific software development.

The specific tools and technologies used in the course depend on the software stack at hand (for example, the main programming language or the type of software being developed), but the principles and practices apply broadly across domains and programming environments.

For instance, a MATLAB-focused training will seldom leave the MATLAB ecosystem, while a Python-focused training will use tools like pytest, pip, conda, Sphinx, and setuptools. The course can be adapted to different programming languages and environments, including Python, MATLAB, C++, and others.
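
To give a flavor of the unit testing practice in a Python-focused training, here is a minimal sketch of a test written in the style pytest expects. The function celsius_to_kelvin and its test are hypothetical examples invented for illustration and are not taken from the course material.

```python
import math


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin.

    Hypothetical example function, used only to have something to test.
    """
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


def test_celsius_to_kelvin():
    # pytest collects functions whose names start with "test_" and
    # reports any assert statement that fails.
    assert math.isclose(celsius_to_kelvin(0.0), 273.15)
    # Compare against zero with an absolute tolerance, since a relative
    # tolerance is meaningless when the expected value is 0.
    assert math.isclose(celsius_to_kelvin(-273.15), 0.0, abs_tol=1e-12)
```

Saved in a file such as test_temperature.py (the file name is an assumption), this test would typically be run with the pytest command from the project directory. A continuous integration service can run the same command automatically on every change.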

The duration of this course is flexible and can be adapted to fit the needs of the participants, ranging from one session to a full university course.

Taught at