Steering Languages and Future Science Codes
L. E. Busby
October 3, 2014
This document was prepared as an account of work sponsored by an agency of the United States
government. Neither the United States government nor Lawrence Livermore National Security, LLC,
nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or
responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Steering Languages and Future Science Codes
Lee Busby, LLNL
email@example.com
September 15, 2014
Rev. 11

Contents
1. Summary Recommendations
2. Structure of the Paper
3. What is a Steering Language? Why Use One?
3.1. Background
3.2. Your Point of View
4. Recommendation: Build a Pilot Code Using Lua
4.1. Comparison of Source Code Size
4.2. Speed and Memory Benchmarks
     Expression Parsers
     Scimark Benchmarks
     Crosscheck
5. Recommendation: Use Python More, not Less
6. Recommendation: Consider Some
1. Summary Recommendations
• Build a pilot code using Lua as the steering language.¹ Standard Lua was 3.5× smaller and 10× faster than Standard Python across five Scimark benchmarks. It may be more suitable for static linking and as an embedded steering language. It is mature, well documented, and widely used in many areas, with a recent uptick in scientific users and codes. These points and others are developed more fully in the sections that follow.
• Use Python more, not less. Python is a very good choice as a steering language. Moreover, it has become an excellent choice as a primary programming language for many application areas and uses. There is now a wide variety of tools and techniques that allow the scientist/programmer to write (just) Python, then selectively choose how, and how much, to optimize their code for speed. Consider purchasing support for a comprehensive Python user environment such as Anaconda [CON1] for a period of time.
• Consider some lessons from three components of the Basis project, and try to carry some of their good ideas into future work:
A) The Basis run-time database (RDB), which is shared with and fully accessible from compiled code, using an ISO C API;
B) A data description language (DDL) that developers use to define and initialize the RDB;
C) A domain-specific language (DSL) for use in building the user interface for each project.
The first two bullets taken together are not intended to suggest an either/or future. Python and Lua can work together well if and when that becomes desirable. However, the interfaces between codes will continue to be important. Interfaces that emphasize native types will usually be a good choice if language interoperability is important.
2. Structure of the Paper
The rest of the paper is in five main parts. Some general background information and terminology is given in the next section, which should make the remaining discussion easier to follow. Each of the three recommendations above will be discussed separately, with a final section for concluding remarks and acknowledgements. Three appendices contain information to extend or complete earlier material.

¹ Disclaimer: The author is participating in an ongoing evaluation of Lua as a steering language for the Blast [KOL1] code.
3. What is a Steering Language? Why Use One?
3.1. Background
In 1984, Paul Dubois [DUB1] introduced Basis and the idea of steering languages to LLNL. His work preceded Perl (1987), Tcl (1988), and Python (1989) by some years. MathWorks, the company behind MATLAB [MOL1], was also founded in 1984. Basis had some similarities with MATLAB, but it went much further in defining ways to connect an interactive command interpreter to compiled code packages.

John Ousterhout’s 1990 paper [OUS1] introduced the idea of a language-in-a-library (Tcl), embeddable in other tools, to many people outside our scientific community. He went on in 1998 [OUS2] to divide (most) programming languages into two main categories: weakly typed scripting languages like Tcl, used on their own for rapid development or to “glue” together components; and strongly typed system languages like C, in which those components are written. He suggested that “Scripting and system programming are symbiotic”, and, to paraphrase, that scripting languages allowed many more, and more casual, programmers to make effective use of computing power.
Python hardly needs further introduction now. Lua [LUA1], [IER1] is less well known. It began in 1993 as a configuration language, mostly for industrial clients of Tecgraf, the Computer Graphics Technology Group of PUC-Rio in Brazil, and slowly evolved into a complete language over the next 7 to 10 years.

In a 2011 paper, Ierusalimschy et al. [IER2] distinguish two ways that a scripting language can be integrated with code written in a system language. In the first form, extending, the main program is written in the scripting language, which is extended using libraries and functions written in the system language. In the second form, embedding, the main program is written in the system language, now called the host program, which can run scripts and call functions defined in the scripting language. This terminology is not precise (most real codes have some of both types of integration), but I will use their definitions for those terms in the rest of the paper.

Also, the term steering language implies a context where two languages are being used together, one to “steer” the other, whereas a scripting language may be used alone: it is often fine to write complete programs using only Python, Lua, or Basis. I will use “steering language” and “scripting language” synonymously unless the context requires otherwise.
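The embedding form can be sketched in a few lines. The example below is only an illustration under stated assumptions: a real host program would be compiled C, C++, or Fortran, whereas here a Python class named Host (a hypothetical stand-in) plays that role so the sketch is runnable by itself. The names set_param and on_step are likewise invented for the illustration. The host exposes a small interface to the user script, runs the script, and then calls back into a function the script defined.

```python
# Sketch of the "embedding" form: a host program runs a user script and
# then calls a function that the script defined. In a real code the host
# would be compiled; a Python class stands in for it here (assumption).

USER_SCRIPT = """
# Problem setup written by the user in the steering language.
set_param("dt", 0.01)
set_param("steps", 5)

def on_step(step, time):
    # Callback the host invokes once per time step.
    return "step %d at t=%.2f" % (step, time)
"""


class Host:
    """Hypothetical stand-in for a compiled simulation core."""

    def __init__(self):
        self.params = {}

    def set_param(self, name, value):
        # The only host function made visible to the script.
        self.params[name] = value

    def run(self, script):
        # Expose a small, explicit interface rather than the whole host.
        namespace = {"set_param": self.set_param}
        exec(script, namespace)
        on_step = namespace.get("on_step")
        log = []
        for step in range(self.params["steps"]):
            time = step * self.params["dt"]
            if on_step is not None:
                log.append(on_step(step, time))
        return log


host = Host()
log = host.run(USER_SCRIPT)
print(log[0])
```

In the extending form the roles are reversed: the script would be the main program and would import the compiled packages as libraries.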
3.2. Your Point of View
Ok, let’s test your knowledge so far with a short quiz. The proper form for a next generation physics code is:
A) An ensemble of mostly independent compiled packages, glued together with and extending a central steering language that “knows all, sees all”;
B) A coherent compiled core, composed of packages designed to know about one another and work together well, along with an embedded scripting language used for problem setup and such other tasks as found to be necessary;
C) Sometimes one, sometimes the other; I just do not know.
That is hardly significant in any statistical sense, of course, and if my sample failed to account for your opinions, I am sorry. However, our differences, such as they may be, are not confined to LLNL. Game developer Tim Sweeney [SWE1] identified four reasons why mixing a scripting language into compiled code might create problems:
• [As a system grows] there is increasing pressure to expose more of its native C++ features to the scripting environment. [The system] eventually grows into a desert of complexity and duplication.
• There is a seemingly exponential increase in the cost and complexity [of the] “interop” layer where C++ and script code communicate. [Interop] becomes very tricky for advanced data types such as containers.
• Developers seeking to take advantage of [...] native C++ features end up dividing their code unnaturally between the script world and the C++ world.
• Developers need to look at program behavior holistically, but quickly find that script debugging tools and C++ debugging tools are separate and incompatible.
Sweeney and his team felt strongly enough about these issues to rip the existing (proprietary) scripting language back out of their popular C++ game engine, after putting it in and living with it for some years. Although I feel that scripting languages do bring net benefit to our physics codes, I acknowledge Sweeney’s points, and allow that I have encountered them in the past.

The questionnaire also asked you to rate the relative importance of several “use cases” for a steering language. Here are the average scores from 25 replies. (In this case, AX- and B-division responses were lumped together.)

Use Case                          Mean Score (L=1, M=2, H=3)
Set up job input, check syntax    2.70
Postprocess output                2.61
Make custom output files          2.50
Steer code during run             2.44
Play around, learn code           1.78
Handle unusual input, etc.        1.86

Table 2: Some Steering Language Use Cases

Apparently we’re a little uncomfortable with the idea of playing around at work, even to learn how to use a new code. There isn’t a lot more to say about those specific results, but defining a reasonable set of use cases is certainly part of a successful steering language project. One additional table (3) shows how the responders self-identified as a “designer” or as a “developer”.
Finally, in B-division, the breakdown by years in field (under 10 : 10 to 20 : over 20) was 1:3:2; in AX it was 6:4:9.
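The mean scores in Table 2 come from mapping each Low/Medium/High rating onto the 1 to 3 scale and averaging over the replies. A minimal sketch of that arithmetic follows. The ratings in it are made up for illustration and are not the survey data, and the function name mean_score is an assumption of this sketch.

```python
# Map each Low/Medium/High rating to the numeric scale used in Table 2.
SCORE = {"L": 1, "M": 2, "H": 3}


def mean_score(ratings):
    """Average a list of 'L'/'M'/'H' ratings on the 1-3 scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(SCORE[r] for r in ratings) / len(ratings)


# Hypothetical replies for one use case (NOT the survey data).
example = ["H", "H", "M", "H", "L"]
print(round(mean_score(example), 2))  # prints 2.4
```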
4. Recommendation: Build a Pilot Code Using Lua
The primary purpose of this section is to draw out the differences between Python and Lua, in order to justify the recommendation. That said, the languages have quite a bit in common. Their syntax is generally procedural. Assignments, function definitions, and function invocations are easy to recognize in either language. Python uses indentation to indicate block structure; Lua is more traditional in its use of keywords to indicate block structure, and white space in Lua is freeform. Lua is simple, but it has some sophisticated features such as first-class functions, closures, coroutines, iterators, and more (none of which a casual user needs to understand). Syntactically, it seems fair to say that neither language requires much effort to learn and use.

Unlike Python, Lua has no predefined class statement, but it does have mechanisms that allow object-oriented interfaces to be constructed, if desired. At the level of linking to C code, the Lua API contained about 113 functions in 2007, whereas the Python API had about 656 public functions [MUH1]. (Neither has changed greatly in the last seven years.) A short example below illustrates basic syntax. More example code is available in the Scimark benchmarks run as part of this study [LEB1].
    Python                          Lua

    def factorial(n):               function factorial(n)
        if n == 0:                      if n == 0 then
            return 1                        return 1
        else:                           else
            return n*factorial(n-1)         return n*factorial(n-1)
                                        end
                                    end

There are at least two major implementations of the Lua library and language. The first is “Standard Lua”, the original version from Brazil. The second is Luajit [LUJ1], which is a just-in-time compiler for Lua. To a first approximation, the two versions are compatible, including at the application binary interface (ABI) level: a code can link to either library without recompiling or any special effort. Luajit is probably responsible for much of the recent surge in interest among scientific users of Lua. Besides being much faster in general, Luajit has an excellent foreign function interface that can simplify much of the effort in linking to C libraries. For us at LLNL, however, it’s important to note that Luajit is not presently available for the PPC64 architecture (Sequoia). Standard Lua is famous for its portability, and is available for all our platforms.
We need to be careful in touting how small something is, of course. Small size is a false economy if you give up required functionality. Python + NumPy + SciPy is an indispensable tool for many scientists. Together, they sum to more than 1½ million lines of code. Is that required for basic code steering? In my opinion, Standard Lua (13 KLOC) or Luajit (69 KLOC) is capable of handling the basic steering task, and each is small enough, by comparison, to make a qualitative difference in planning for its support and maintenance within a development team.
4.2. Speed and Memory Benchmarks
We can argue about functionality, but lines of code is fairly objective and easy to measure. Demonstrating differences in memory size and execution speed takes more work. To that end, I ran two separate sets of benchmarks that included the languages of interest.
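As a rough illustration of what such a measurement involves on the Python side, the sketch below times a function and records its peak traced memory using the standard timeit and tracemalloc modules. It is not the harness used for the benchmarks in this paper. The helper name measure and its parameters are assumptions of the sketch, and the factorial function simply reuses the recursive definition from the syntax example.

```python
import timeit
import tracemalloc


def factorial(n):
    # Same recursive definition as the syntax example shown earlier.
    return 1 if n == 0 else n * factorial(n - 1)


def measure(func, arg, repeats=5, number=1000):
    """Return (best seconds per call, peak bytes traced during one call)."""
    # Taking the best of several repeats reduces noise from other
    # activity on the machine.
    timings = timeit.repeat(lambda: func(arg), repeat=repeats, number=number)
    seconds_per_call = min(timings) / number

    # tracemalloc sees only memory allocated through Python's allocator,
    # so this is not the same as the process's resident set size.
    tracemalloc.start()
    func(arg)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return seconds_per_call, peak_bytes


seconds, peak = measure(factorial, 50)
print("%.3e s/call, peak %d bytes" % (seconds, peak))
```

A cross-language comparison would also need an equivalent harness in the other language and care that both sides do the same work. The sketch covers only the Python half.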