Project

The use of very high-level languages (Python/Perl) for creating flexible scientific software environments

Background

If one or more of these questions attract your attention, the current project will be of interest!

Purpose

The primary purpose of the project is to help scientists and engineers who work intensively with computers to become more productive, have more fun, and improve the reliability of their work. Scripting can be a key tool for reaching these goals. The term scripting means different things to different people. We use it for writing programs in what are often referred to as very high-level languages. Perl and Python are primary examples of such languages, and the resulting programs are more concerned with managing computations than with crunching numbers in do/for-loops.

The main tasks in the project are to

  1. develop a book about scripting techniques in computational science,
  2. develop strategies for writing Python interfaces to Diffpack,
  3. equip Diffpack with managing tools based on scripts,
  4. use scripting for simplifying Diffpack programming.

What scripting is about

The simplest application of scripting is to write short scripts (programs) that automate manual interaction with the computer, meaning that the scripts mainly glue stand-alone applications and operating system commands. More advanced use of scripting includes searching and manipulating text files, rapid construction of graphical user interfaces, tailoring visualization and image processing environments to your own needs, administering large sets of computer experiments, and managing your existing Fortran, C, or C++ libraries and applications directly from scripts. Scripts are often much shorter than corresponding Fortran, C, C++, or Java programs and considerably faster to develop. Moreover, scripts are for the most part truly cross-platform, so what you write on Windows runs without modifications on Unix and Macintosh, even when graphical user interfaces are involved.
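As a minimal sketch of the "glue" style of scripting described above, the following Python script runs a stand-alone program for several parameter values and extracts a number from its text output. The "simulator" here is only a stand-in (a one-line Python command), and the names SIMULATOR, run_experiment, and run_series are hypothetical, chosen for illustration.

```python
import re
import subprocess
import sys

# Stand-in for a stand-alone simulation program (hypothetical): it reads a
# damping parameter from the command line and prints one result line.
SIMULATOR = [
    sys.executable, "-c",
    "import sys, math; b = float(sys.argv[1]); "
    "print('damping=%g  max amplitude: %.4f' % (b, math.exp(-b)))",
]


def run_experiment(damping):
    """Run the external program once and extract a number from its output."""
    result = subprocess.run(
        SIMULATOR + [str(damping)],
        capture_output=True, text=True, check=True,
    )
    # Search the text output for the quantity of interest.
    match = re.search(r"max amplitude:\s*([-+0-9.eE]+)", result.stdout)
    if match is None:
        raise ValueError("unexpected output: %r" % result.stdout)
    return float(match.group(1))


def run_series(dampings):
    """Administer a set of experiments; return {parameter: result}."""
    return {b: run_experiment(b) for b in dampings}


if __name__ == "__main__":
    for b, amp in run_series([0.0, 0.5, 1.0]).items():
        print("damping %.1f -> max amplitude %.4f" % (b, amp))
```

In a real setting the SIMULATOR list would name an existing Fortran, C, or C++ executable, and the regular expression would be adapted to that program's output format.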

The interest in scripting with Perl and Python has exploded among Internet service developers and computer system administrators. However, scripting has a significant potential in computational science and engineering (CSE) as well. Software systems such as Matlab, Maple, Mathematica, and S-Plus are primary examples of very popular, widespread tools because of their simple and effective user interface. The user issues high-level commands, which may be very short for novice users or filled with fine-tuning parameters for the more experienced users. Results and data can be stored in variables without bothering about the type. You often feel that the syntax is as short as possible. Python resembles the nature of these interface languages, but is a full-fledged, advanced programming language. With Python and the techniques explained in this book, you can actually create your own easy-to-use computational environment, which can be made very close to the nature of Matlab, Maple, and similar systems (and communicate with them), but tailored to your own number crunching codes.

The aim of our scripting project is to develop software techniques for combining "the best of all worlds", i.e., combining different tools and programming languages, when building scientific applications. As a simple example, one can think of using a C++ library for creating computational grids, a Fortran 77 library for solving a PDE, a C code for visualization, and the scripting language Python for gluing the other tools together in a high-level program, perhaps with an easy-to-use graphical interface.

Mixed-language programming is possible in Fortran, Java, and C-like languages, but the support for such programming is much stronger in a scripting language like Python, first of all because scripting languages were initially designed for being integrated with C. The clear syntax of Python means that the controlling high-level script looks much like a Matlab-like interface tailored to your Fortran and C/C++ libraries. The script can be run as a stand-alone application or interactively (just as Matlab), or you can invoke it through a fully graphical user interface. Notice that interactivity means that you can almost trivially introduce computational steering into your applications.
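To illustrate how little code is needed to call compiled C code from Python, here is a minimal sketch that uses the ctypes module from Python's standard library to call cos from the C math library. Note that ctypes is swapped in here for brevity; it is not one of the wrapper generators discussed in this text. The sketch assumes a POSIX-like system where the C math library can be located, and the helper names load_libm and c_cos are hypothetical.

```python
import ctypes
import ctypes.util


def load_libm():
    """Load the C math library (assumes a POSIX-like system)."""
    path = ctypes.util.find_library("m")
    # If no path is found, CDLL(None) falls back (on POSIX) to the symbols
    # already loaded into the running Python process.
    return ctypes.CDLL(path)


_libm = load_libm()

# Declare the C signature: double cos(double)
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Thin Python wrapper around the compiled C function cos."""
    return _libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))
```

The same pattern (load a compiled library, declare argument and return types, wrap the call in a Python function) carries over to your own C libraries, while tools that generate wrapper code automate these declarations.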

Roughly speaking, the combination of Python, traditional number crunching languages, and existing computing and visualization software makes it possible, with quite limited efforts, to build a Matlab-like environment for your special set of numerical software. The alternative strategy of including your code in a system like Matlab is a much less straightforward process and gives you a much less flexible end product.

The Book

The first part of the project is to develop a book and a course teaching scripting techniques in a computational science setting. The emphasis is on the Python language and its modules, often in combination with Fortran 77, C, and C++ code. The book's Table of Contents gives an impression of the relevant topics. Parts of these topics constitute a university course (IN 228), with more than 100 students passing the requirements each fall. A short course on using Python in numerical investigations has been given at Uppsala University.

Very high-level parallel programming

The BSP software and programming model are available through an interface in Python (as part of the Scientific Python package by Konrad Hinsen). Ola Skavhaug is investigating how this very high-level Python interface can simplify the standard C/C++/MPI-based low-level programming in parallel computing. Two aspects make this approach promising: (i) the BSP programming model is conceptually simple and efficiently implemented, and (ii) numerics in Python can be efficiently handled using the Numerical Python (NumPy) package. In practice, this type of programming involves development of new classes for the frequently encountered data structures when solving PDEs, e.g., tridiagonal and sparse matrices. These classes are derived from BSP-Python classes, thus inheriting parallel programming tools, and make extensive use of NumPy arrays, either within the functionality of NumPy or in specialized C/F77 extension modules operating directly on the NumPy data segments. The development of these classes is on a much higher level than C/MPI programming, but the real benefit lies in programming with these classes: one can simply write A*x in Python code to get a distributed matrix-vector product with a distributed matrix A and vector x. Hence, the approach brings parallel programming to a much higher conceptual level. Besides simplifying research work involving parallel computing, the approach may have a particularly strong impact on training students in parallel thinking and implementation at an early stage in their studies.
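The idea of hiding the distribution behind the expression A*x can be sketched in plain Python. The class below is hypothetical and does not use the BSP interface of Scientific Python or NumPy; it only simulates the distribution by splitting the rows of a tridiagonal matrix into contiguous blocks, one per "process", and computing each block's part of the product in turn.

```python
class TridiagonalMatrix:
    """Tridiagonal matrix stored as three diagonals (hypothetical class)."""

    def __init__(self, lower, diag, upper, nprocs=2):
        n = len(diag)
        if len(lower) != n - 1 or len(upper) != n - 1:
            raise ValueError("off-diagonals must have length n-1")
        self.lower, self.diag, self.upper = list(lower), list(diag), list(upper)
        self.n = n
        self.nprocs = nprocs  # number of simulated processes

    def _row_blocks(self):
        """Split the row indices into contiguous blocks, one per process."""
        base, extra = divmod(self.n, self.nprocs)
        start = 0
        for p in range(self.nprocs):
            stop = start + base + (1 if p < extra else 0)
            yield range(start, stop)
            start = stop

    def _local_product(self, rows, x):
        """Product for the rows owned by one process.

        In a real parallel code, x[i-1] and x[i+1] at the block boundaries
        would be values communicated from the neighboring processes.
        """
        y = []
        for i in rows:
            s = self.diag[i] * x[i]
            if i > 0:
                s += self.lower[i - 1] * x[i - 1]
            if i < self.n - 1:
                s += self.upper[i] * x[i + 1]
            y.append(s)
        return y

    def __mul__(self, x):
        """A*x: gather the local products of all simulated processes."""
        if len(x) != self.n:
            raise ValueError("vector length does not match matrix size")
        result = []
        for rows in self._row_blocks():
            result.extend(self._local_product(rows, x))
        return result


if __name__ == "__main__":
    A = TridiagonalMatrix([-1] * 3, [2] * 4, [-1] * 3, nprocs=3)
    print(A * [1, 2, 3, 4])
```

The user of the class writes only A*x; how the rows are partitioned and where the boundary values come from is an implementation detail of the class, which is the point made above.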

The Diffpack-Python coupling

Diffpack is programmed in C++. For those who have worked much with scripting languages, or environments like Maple and Matlab, where the syntax is simple and tasks can be carried out interactively, programming in C++ soon feels cumbersome and tedious. Our aim is to develop tools such that you can program Diffpack simulators in Python. Especially when running the simulator, making adjustments, and doing computational steering, the Python interface gives you an efficient very high-level language, of the nature found in Maple and Matlab, but tailored to a complicated mathematical/numerical problem (a PDE solver). The ultimate aim is to allow users to quickly build an interface as convenient as Matlab to any numerical application or library.

In the first part of the project we concentrate on Python interfaces to Diffpack simulators. Later, we will focus on Python interfaces to the Diffpack libraries.

Code re-use

Experience during the 90s with using C++ for numerical computing has clearly shown that the resulting libraries and applications are significantly easier to extend, maintain, and re-use in new situations, compared to the traditional procedure-based programming style of Fortran and C. Object-oriented programming and generic (template) programming are two important techniques that contribute to such improvements of software development efficiency. Nevertheless, many research groups already have well-tested and optimized numerical codes, written in Fortran or C, and true code re-use would mean integrating the pure computing parts of such codes with new developments in perhaps C++ or Java. Combining C++ and Fortran, or Java and C, quickly gives you a lot of frustrations (think of differences in representing even simple data structures such as strings!). Python offers the benefits of object-oriented and generic (template) programming, together with a syntax that is simpler and clearer than that of C++ and Java. In addition, there exist several tools that make calling Fortran 77/90, C, C++, or Java code trivial, at least in principle. Hence, the idea is to write the managing code segments in Python, using efficient data structures and algorithms in new or old Fortran, C/C++, or Java code.

High-level tools for simplifying mixed-language integration

Even if the overall goal sounds attractive, and Python is a good starting point for mixed-language programming, it must be extremely easy to combine different languages if this technique is going to be used widely. There are tools (SWIG, Pyfort, f2py) that automatically generate most of the necessary code for combining Python with Fortran and C/C++ code, but some disturbing details are often left for manual adjustment. This can be very annoying unless you have lots of experience with such language integrations. We therefore aim at building scripts on top of existing integration tools to streamline the integration process. One goal is to write Python code in one part of an editor window and simply jump to another part to write number crunching code in Fortran, C, or C++, often calling up your own libraries in those languages. By just pressing a button or running a simple one-line command, the Python script should then work with the external numerical code. (We remark that combining Python and Java is already made fully transparent by JPython.)

Moving graphics for teaching

Dynamic processes, either physical or algorithmic, can be effectively visualized using animation of figures, preferably with some interactive control through a graphical user interface. We want to explore the use of Python and Tk/Pmw for this purpose. With a proper set of software modules and documentation, we hope to offer tools that non-experts can use for efficiently creating dynamic, visual illustrations of topics met in teaching practice. That is, instead of drawing a couple of rough figures on the blackboard indicating some dynamic process you can develop a professional, interactive animation.

Automatic generation of Diffpack code

Although C++ programming with Diffpack is straightforward and often a matter of some manual editing of template codes, you still need to do some careful manual work. We intend to build interfaces, using scripting languages, that automatically generate a working Diffpack code from a mathematical specification of the PDEs to be solved. Further development and tuning can then be performed directly in the Diffpack code, but the proposed tool will get you started with a new application much more quickly, especially if you are a novice Diffpack programmer.
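The principle of generating code from a specification can be sketched with a text template in Python. This is only an illustration of the technique: the C++ skeleton below is generic and does not use actual Diffpack classes or functions, and the names CPP_TEMPLATE, generate_solver_code, and the keys of the specification dictionary are all hypothetical.

```python
from string import Template

# Generic C++ skeleton (not actual Diffpack code); $-fields are filled in
# from the user's specification.
CPP_TEMPLATE = Template('''\
// Generated from specification: $equation
class $classname
{
public:
  double k(double x) { return $coefficient; }   // coefficient in the PDE
  double f(double x) { return $source; }        // source term
};
''')


def generate_solver_code(spec):
    """Return C++ source text for the problem described by spec (a dict)."""
    return CPP_TEMPLATE.substitute(
        classname=spec["name"],
        equation=spec["equation"],
        coefficient=spec["coefficient"],
        source=spec["source"],
    )


if __name__ == "__main__":
    spec = {
        "name": "Poisson1D",
        "equation": "-(k(x) u')' = f(x)",
        "coefficient": "1.0 + x*x",
        "source": "2.0",
    }
    print(generate_solver_code(spec))
```

A real generator would start from a richer mathematical description and emit a complete simulator, but the workflow is the same: the script produces a working starting point, and further tuning is done by hand in the generated code.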

Graphical user interfaces

Python is easily combined with various GUI software, like Tk, Qt, Gtk, MFC, and java.swing. Programming with Tk and Tk extensions (Pmw) from Python means that cross-platform GUIs can be generated with a minimum of code. In other words, if you already steer your scientific computing applications through Python scripts, adding a GUI on top is an efficient process that requires much less code than in C++ or Java. Python and most of its GUI tools are, of course, cross platform and run on Unix, Windows, and Mac.

An ongoing project is to build a graphical interface to the Vtk library for visualization of stationary and time-dependent scalar and vector fields in 2D and 3D. This graphical interface will hopefully offer "visual programming" of the type met in AVS and IRIS Explorer. Together with the Python and C++ interfaces to the Vtk data structures and algorithms, the resulting tool can indeed give you the best of all worlds.

Personnel

Hans Petter Langtangen (book author, course developer, software developer, Diffpack-Python coupling), Kent-Andre Mardal (course and software developer, Diffpack-Python coupling), Ola Skavhaug (high-level parallel Python/BSP/MPI programming), Åsmund Ødegård and Halvard Moe (oracles/wizards), Konrad Hinsen (computational Python expert, France), Greg McFarlane (GUI expert, Australia), Roger Hansen (coupling of Python with F77/C/C++).

Results

Over 550 pages of the book have been written (in revised version) and about 250 students have passed the course (IN 228). One cand.scient. student (Roger Hansen) has graduated, and additional theses are in progress. Ola Skavhaug has developed parallel PDE solvers based on very high-level programming in Python/BSP.