Project
The use of very high-level languages (Python/Perl) for creating
flexible scientific software environments
Background
- Are you a scientist or engineer running simulation
  programs, making visualizations, managing large sets of numerical
  experiments, and comparing huge amounts of figures?
- Would it be of interest to make this type of work more efficient,
  increase the reliability, and have more fun?
- Are you interested in developing graphical user interfaces,
  including Web services, but need an easy-to-use, programmable
  environment?
- Are you concerned with portability and want to develop programs
  that run on Unix, Windows, and Macintosh machines?
- Would you like to easily combine well-tested Fortran 77 or C libraries with
  a modern object-oriented interface, having a tailored, Matlab-like,
  interactive syntax?
- Would you like to develop your own tailored, user-friendly, efficient
  working environment?
If one or more of these questions attract your attention, the current project
will be of interest!
Purpose
The primary purpose of the project is to help scientists and engineers
working intensively with computers to become more productive,
have more fun, and improve the reliability of their work.
Scripting can be a key tool for reaching these goals.
The term scripting means different things to different people.
We use it for writing programs in what are often referred to
as very high-level languages. Perl and Python are primary
examples of such languages, and the resulting programs are
more about managing computations than crunching numbers
in do/for-loops.
The main tasks in the project are to
- develop a book about scripting techniques in computational science,
- develop strategies for writing Python interfaces to Diffpack,
- equip Diffpack with managing tools based on scripts,
- use scripting for simplifying Diffpack programming.
What scripting is about
The simplest application of scripting is to write short scripts (programs)
that automate manual interaction with the computer, meaning that
the scripts mainly
glue stand-alone applications and operating system commands.
More advanced use of scripting includes
searching and manipulating text files,
rapid construction of graphical user interfaces,
tailoring visualization and image processing environments to your
own needs, administering large sets of computer experiments, and
managing your existing Fortran, C, or C++ libraries and
applications directly from scripts.
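As a minimal sketch of the gluing and experiment administration described above, the following Python script runs a stand-alone program for several parameter values and picks a number out of its text output. The "simulator" here is only a stand-in (a one-line Python program started as a separate process), and the names FAKE_SIMULATOR, run_experiment, extract_amplitude, and run_series are made up for this illustration; in practice you would call your own executable instead.

```python
import re
import subprocess
import sys

# Stand-in for a stand-alone simulation program (an assumption for this
# sketch): a one-line program that prints a result for a given damping
# parameter b, the way a compiled simulator might.
FAKE_SIMULATOR = (
    "import sys; b = float(sys.argv[1]); "
    "print('max amplitude = %.4f' % (1.0 / (1.0 + b)))"
)


def run_experiment(b):
    """Run the external program with parameter b and return its text output."""
    result = subprocess.run(
        [sys.executable, "-c", FAKE_SIMULATOR, str(b)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_amplitude(output):
    """Pick the number out of a line like 'max amplitude = 0.5000'."""
    match = re.search(r"max amplitude\s*=\s*([-+0-9.eE]+)", output)
    if match is None:
        raise ValueError("no amplitude found in program output")
    return float(match.group(1))


def run_series(b_values):
    """Run one experiment per parameter value and collect the results."""
    return {b: extract_amplitude(run_experiment(b)) for b in b_values}


if __name__ == "__main__":
    for b, amplitude in run_series([0.0, 1.0, 3.0]).items():
        print("b=%g  max amplitude=%g" % (b, amplitude))
```

The same pattern (start a program, capture its output, extract numbers with a regular expression) carries over to real simulators; only the command line and the search pattern change.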
Scripts are often much shorter than corresponding
Fortran, C, C++, or Java programs and
considerably faster to develop.
Moreover, scripts are for
the most part truly cross-platform, so what you write on Windows
runs without modifications on Unix and Macintosh, also when
graphical user interfaces are involved.
The interest in scripting with Perl and Python has exploded among
Internet service developers and computer system administrators.
However, scripting has a significant potential in computational
science and engineering (CSE) as well.
Software systems such as Matlab, Maple, Mathematica, and S-Plus are
very popular, widespread tools, largely because of their
simple and effective user interfaces. The user issues
high-level commands, which may be very short for novice users or
filled with fine-tuning parameters for the more experienced
users. Results and data can be stored in variables without bothering
about the type. You often feel that the syntax is as short as
possible. Python resembles the nature of these interface languages, but
is a full-fledged, advanced programming language. With Python and
the techniques explained in this book, you can actually create your
own easy-to-use computational environment, which can be made very close to
the nature of Matlab, Maple, and similar systems (and communicate with them),
but tailored to your own number crunching codes.
The aim of our scripting project is to develop software techniques for
combining "the best of all worlds", i.e., combining different tools
and programming languages, when building scientific applications. As a
simple example, one can think of using a C++ library for creating
computational grids, a Fortran 77 library for solving a PDE, a C code
for visualization, and the scripting language Python for gluing the
other tools together in a high-level program, perhaps with an
easy-to-use graphical interface.
Mixed-language programming is possible in Fortran, Java, and C-like
languages, but the support for such programming is much stronger in a
scripting language like Python, first of all because scripting
languages were initially designed to be integrated with C. The
clear syntax of Python means that the controlling high-level script
looks much like a Matlab-like interface tailored to your Fortran and
C/C++ libraries. The script can be run as a stand-alone application or
interactively (just as Matlab), or you can invoke it through a fully
graphical user interface. Notice that interactivity means that you can
almost trivially introduce computational steering into your
applications.
Roughly speaking, the combination of Python, traditional number
crunching languages, and existing computing and visualization software
makes it possible, with quite limited efforts, to build a Matlab-like
environment for your special set of numerical software. The
alternative strategy of including your code in a system like Matlab is
a much less straightforward process and gives you a much less flexible
end product.
The Book
The first part of the project is to develop a book and a course teaching
scripting techniques in a computational science setting.
The emphasis is on the Python language and its modules, often in
combination with Fortran 77, C and C++ code.
The book's Table of Contents gives an impression
of the relevant topics.
Parts of these topics constitute a university course
(IN 228), with more than 100
students passing the requirements each fall.
A short course on using Python in numerical investigations has been
given at Uppsala University.
Very high-level parallel programming
The BSP software and programming
model are available through an interface in Python (as part of the
Scientific Python package by Konrad Hinsen). Ola Skavhaug is
investigating how this very high-level Python interface can simplify
the standard C/C++/MPI-based low-level programming in parallel
computing. Two aspects make this approach promising: (i) the BSP
programming model is conceptually simple and efficiently implemented,
and (ii) numerics in Python can be efficiently handled using the
Numerical Python (NumPy) package. In practice, this type of
programming involves development of new classes for the frequently
encountered data structures when solving PDEs, e.g., tridiagonal and
sparse matrices. These classes are derived from BSP-Python classes,
thus inheriting parallel programming tools, and make extensive use of
NumPy arrays, either within the functionality of NumPy or in
specialized C/F77 extension modules operating directly on the NumPy
data segments. The development of these classes is on a much higher
level than C/MPI programming, but the real benefit lies in programming
with these classes: one can simply write A*x in Python code
to get a distributed matrix-vector product with a distributed matrix
A and vector x. Hence, the approach brings parallel programming to
a much higher conceptual level.
Besides simplifying research work
involving parallel computing, the approach may have a particularly
strong impact on training students in parallel thinking and implementation
at an early stage in their studies.
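To indicate how the A*x syntax mentioned above can hide a row-wise distribution of the matrix, here is a small serial sketch. It is only an illustration under strong assumptions: it uses plain Python lists rather than NumPy arrays, it does not use the BSP interface from Scientific Python, and the "processes" are simply blocks visited one after the other. The class names TridiagonalBlock and DistributedTridiagonal are hypothetical.

```python
class TridiagonalBlock:
    """Rows [start, stop) of a tridiagonal matrix, as held by one process."""

    def __init__(self, lower, diag, upper, start, stop):
        # lower[i], diag[i], upper[i] are the entries of global row i;
        # lower[0] and upper[n-1] are never used.
        self.lower, self.diag, self.upper = lower, diag, upper
        self.start, self.stop = start, stop

    def __mul__(self, x):
        """Return the part of A*x belonging to the rows of this block."""
        n = len(x)
        y = []
        for i in range(self.start, self.stop):
            s = self.diag[i] * x[i]
            if i > 0:
                s += self.lower[i] * x[i - 1]
            if i < n - 1:
                s += self.upper[i] * x[i + 1]
            y.append(s)
        return y


class DistributedTridiagonal:
    """A tridiagonal matrix split row-wise into blocks, one per 'process'."""

    def __init__(self, lower, diag, upper, nblocks):
        n = len(diag)
        bounds = [n * p // nblocks for p in range(nblocks + 1)]
        self.blocks = [
            TridiagonalBlock(lower, diag, upper, bounds[p], bounds[p + 1])
            for p in range(nblocks)
        ]

    def __mul__(self, x):
        # Each block computes its own rows. In a real parallel code the
        # blocks would work concurrently and exchange parts of x.
        y = []
        for block in self.blocks:
            y.extend(block * x)
        return y


n = 6
A = DistributedTridiagonal([-1.0] * n, [2.0] * n, [-1.0] * n, nblocks=3)
x = [float(i) for i in range(n)]
y = A * x   # the calling code never sees the block structure
print(y)
```

The point of the design is that the calling code contains only A*x; how the rows are split and combined is a detail of the classes.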
The Diffpack-Python coupling
Diffpack is programmed in C++.
For those who have worked much with scripting languages, or
environments like Maple and Matlab, where the syntax is simple
and tasks can be carried out interactively, programming in C++ soon feels
cumbersome and tedious.
Our aim is to develop tools such that you can program Diffpack
simulators in Python. Especially when running the simulator, making
adjustments, and doing computational steering, the Python interface
gives you an efficient, very high-level language, similar in nature to
Maple and Matlab, but tailored to a complicated mathematical/numerical
problem (a PDE solver).
The ultimate aim is to allow users to quickly build an interface as
convenient as Matlab to any numerical application or library.
In the first part of the project we concentrate on Python interfaces
to Diffpack simulators. Later, we will focus on Python interfaces to
the Diffpack libraries.
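As a rough illustration of what running and steering a simulator from Python could look like, the sketch below uses a small pure-Python heat equation solver as a stand-in for a wrapped C++ simulator. The class HeatSolver and its attributes are hypothetical and are not part of Diffpack; the point is only that a parameter can be changed between calls while the simulation state is kept.

```python
class HeatSolver:
    """Explicit finite differences for u_t = k*u_xx on [0, 1], u = 0 at both ends."""

    def __init__(self, n=11, k=1.0, dt=0.001):
        self.n, self.k, self.dt = n, k, dt
        self.dx = 1.0 / (n - 1)
        self.t = 0.0
        # Initial condition: a unit spike in the middle of the domain.
        self.u = [0.0] * n
        self.u[n // 2] = 1.0

    def step(self):
        """Advance the solution one time step."""
        r = self.k * self.dt / self.dx ** 2   # must stay below 0.5 for stability
        u = self.u
        new = u[:]
        for i in range(1, self.n - 1):
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        self.u = new
        self.t += self.dt

    def run(self, nsteps):
        """Take nsteps time steps and return the current peak value."""
        for _ in range(nsteps):
            self.step()
        return max(self.u)


solver = HeatSolver()
peak_before = solver.run(50)   # simulate for a while
solver.k = 2.0                 # steering: change the diffusion coefficient
peak_after = solver.run(50)    # continue from the current state
print(solver.t, peak_before, peak_after)
```

In an interactive Python session the three last statements would be typed one at a time, with inspection or plotting of solver.u in between.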
Code re-use
Experience during the 90s with using C++ for
numerical computing has clearly shown that the resulting libraries and
applications are significantly easier to extend, maintain, and re-use
in new situations, compared to the traditional procedure-based
programming style of Fortran and C. Object-oriented programming and
generic (template) programming are two important techniques that
contribute to such improvements of software development efficiency.
Nevertheless, many research groups already have well-tested and
optimized numerical codes, written in Fortran or C, and true code
re-use would mean integrating the pure computing parts of such codes
with new developments in perhaps C++ or Java. Combining C++ and
Fortran, or Java and C, quickly leads to a lot of frustration (think
of differences in representing even simple data structures such as
strings!). Python offers the benefits of object-oriented and generic
(template) programming, together with a syntax that is simpler and
clearer than C++ and Java. In addition, there exist several tools
that make calling Fortran 77/90, C, C++, or Java code trivial, at
least in principle. Hence, the idea is to write the managing code
segments in Python, using efficient data structures and algorithms in
new or old Fortran, C/C++, or Java code.
High-level tools for simplifying mixed-language integration
Even if the overall goal sounds attractive, and Python is a good
starting point for mixed-language programming, it must be extremely
easy to combine different languages if this is a technique that is
going to be used widely. There are tools (SWIG, Pyfort, f2py) that
automatically generate most of the necessary code for combining Python
with Fortran and C/C++ code, but some disturbing details are often
left for manual adjustment. This can be very annoying unless you have
lots of experience with such language integrations. We therefore aim
at building scripts on top of existing integration tools to streamline
the integration process. One goal is to write Python code in one part
of an editor window and simply jump to another part to write number
crunching code in Fortran, C, or C++, often calling up your own
libraries in those languages. By just pressing a button or running a
simple one-line command, you make the Python script work with the external
numerical code. (We remark that combining Python and Java is already
made fully transparent by JPython.)
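To give a feel for what calling compiled code from Python looks like, here is a minimal sketch that calls the C function cos from the system math library. Note that it uses the ctypes module from the Python standard library, which is a different mechanism from the wrapper generators named above (SWIG, Pyfort, f2py). The sketch assumes a Unix-like system where the C math library can be found, and the function name c_cos is made up for this illustration.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. If it cannot be located by name, fall back
# to the symbols already loaded into the running Python process (an
# assumption that holds on typical Linux and macOS systems, not on
# Windows).
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path) if path else ctypes.CDLL(None)

# Declare the C signature: double cos(double).
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Compute cos(x) by calling the compiled C function."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(1.0), math.cos(1.0))
```

Declaring argument and return types by hand, as done here, is exactly the kind of detail the wrapper generators aim to automate.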
Moving graphics for teaching
Dynamic processes, either physical or algorithmic, can be effectively
visualized using animation of figures, preferably with some
interactive control through a graphical user interface. We want to
explore the use of Python and Tk/Pmw for this purpose. With a proper
set of software modules and documentation, we hope to offer tools that
non-experts can use for efficiently creating dynamic, visual
illustrations of topics met in teaching practice. That is, instead of
drawing a couple of rough figures on the blackboard indicating some
dynamic process, you can develop a professional, interactive animation.
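A minimal sketch of such an animation, using only the Tk interface bundled with Python (tkinter) and no Pmw widgets, could look as follows. The function names wave_points and run_animation, the window size, and the choice of a traveling sine wave are assumptions made for this illustration. The window is opened only when run_animation() is called, so the coordinate computation can be used and tested without a display.

```python
import math


def wave_points(t, width=400, height=200, npoints=50):
    """Canvas coordinates [x0, y0, x1, y1, ...] of sin(x - t), x in [0, 2*pi]."""
    coords = []
    for i in range(npoints):
        x = 2.0 * math.pi * i / (npoints - 1)
        y = math.sin(x - t)
        coords.append(width * x / (2.0 * math.pi))       # horizontal pixel
        coords.append(height / 2.0 - 0.4 * height * y)   # vertical pixel (axis points down)
    return coords


def run_animation():
    """Open a window showing a traveling sine wave with a speed slider."""
    import tkinter as tk   # imported here so wave_points works without a display

    root = tk.Tk()
    root.title("Traveling wave")
    canvas = tk.Canvas(root, width=400, height=200, bg="white")
    canvas.pack()
    speed = tk.Scale(root, from_=0.0, to=0.5, resolution=0.01,
                     orient=tk.HORIZONTAL, label="speed")
    speed.set(0.1)
    speed.pack(fill=tk.X)

    curve = canvas.create_line(*wave_points(0.0), fill="blue", width=2)
    state = {"t": 0.0}

    def update():
        # Move the wave according to the slider and redraw the curve.
        state["t"] += speed.get()
        canvas.coords(curve, *wave_points(state["t"]))
        root.after(30, update)   # schedule the next frame in 30 ms

    update()
    root.mainloop()
```

Calling run_animation() from a script or an interactive session starts the animation; the slider provides the interactive control.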
Automatic generation of Diffpack code
Although C++ programming with Diffpack is straightforward and often a
matter of some manual editing of template codes, you still need to do
some careful manual work. We intend to build interfaces, using
scripting languages, that automatically generate a working Diffpack
code from a mathematical specification of the PDEs to be
solved. Further development and tuning can then be performed directly
in the Diffpack code, but the proposed tool will get you started with
a new application much more quickly, especially if you are a novice
Diffpack programmer.
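The general idea of generating code from a specification can be sketched with Python's string.Template. The C++ skeleton below is purely hypothetical: its class layout and method names are placeholders for this illustration and are not taken from Diffpack, and the "mathematical specification" is reduced to two coefficient expressions given as strings.

```python
from string import Template

# Hypothetical C++ skeleton; the layout and method names are placeholders
# invented for this sketch, not actual Diffpack code.
SKELETON = Template("""\
// Generated solver skeleton for: $equation
class $classname
{
public:
  void readInput();   // read parameters
  void solve();       // assemble and solve the discrete problem
  double k(double x) { return $coefficient; }
  double f(double x) { return $source; }
};
""")


def generate_solver(classname, equation, coefficient, source):
    """Fill the skeleton with the names and expressions of one problem."""
    return SKELETON.substitute(
        classname=classname,
        equation=equation,
        coefficient=coefficient,
        source=source,
    )


code = generate_solver("Poisson1D", "-(k u')' = f", "1.0 + x*x", "2.0")
print(code)
```

A real generator would need a richer specification and would have to emit code that follows the target library's conventions; the sketch only shows the template substitution step.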
Graphical user interfaces
Python is easily combined with various GUI software, like Tk, Qt, Gtk,
MFC, and java.swing. Programming with Tk and Tk extensions (Pmw) from
Python means that cross-platform GUIs can be generated with a minimum
of code. In other words, if you already steer your scientific
computing applications through Python scripts, adding a GUI on top is
an efficient process that requires much less code than in C++ or
Java. Python and most of its GUI tools are, of course, cross-platform
and run on Unix, Windows, and Mac.
An ongoing project is to build a graphical interface to the Vtk
library for visualization of stationary and time-dependent scalar and
vector fields in 2D and 3D. This graphical interface will hopefully
offer "visual programming" of the type met in AVS and IRIS
Explorer. Together with the Python and C++ interfaces to the Vtk data
structures and algorithms, the resulting tool can indeed give you the
best of all worlds:
- complete programming control for automation and real-time visualization, or
- user-friendly visual construction of visualization pipelines.
Personnel
- Hans Petter Langtangen (book author, course developer, software developer, Diffpack-Python coupling),
- Kent-Andre Mardal (course and software developer, Diffpack-Python coupling),
- Ola Skavhaug (high-level parallel Python/BSP/MPI programming),
- Åsmund Ødegård and Halvard Moe (oracles/wizards),
- Konrad Hinsen (computational Python expert, France),
- Greg McFarlane (GUI expert, Australia),
- Roger Hansen (coupling of Python with F77/C/C++).
Results
Over 550 pages of the book have been written (in revised version)
and about 250 students have
passed the course (IN 228).
One cand.scient. student (Roger Hansen) has graduated, and
additional theses are in progress.
Ola Skavhaug has developed parallel PDE solvers based on
very high-level programming in Python/BSP.