RTSC Interface Primer/Lesson 14

From RTSC-Pedia

[printable version]  offline version generated on 18-Aug-2017 00:08 UTC  


Abstract testing — benchmarking IFir implementations

In this lesson we'll expand upon the idioms for module testing first introduced in Lesson 9, fabricating a more generic test harness that enables us to easily benchmark the performance of any IFir implementation. Using this test harness in tandem with whole-program optimization, we'll systematically benchmark the entire series of IFir modules presented in Lesson 13.

We've always promised that RTSC can deliver higher-level programming and higher levels of performance; now you'll finally see some concrete evidence of this claim, as we compare an optimized, processor-dependent FirB module against a more generalized and abstracted FirC module that achieves portability through skillful use of the proxy-delegate pattern.


Testing more generically

With several heirs of the IFir interface already in hand, we'll turn now to a generic test harness—a new module named acme.filters2.test.FirTester—that invokes and benchmarks any IFir implementation in a uniform manner. Derived from the FirTest1.c test program originally introduced back in Lesson 9, we've elected instead to field a first-class RTSC module in lieu of a dedicated main program to perform the testing itself.

Besides enabling us to declare a PFir proxy through which we can bind the specific IFir implementation under scrutiny, the FirTester module allows the client to further customize the testing scenario through additional configuration parameters.

import acme.filters2.IFir;

/*! Generic IFir test harness */
module FirTester {
    /*! IFir implementation under test */
    proxy PFir inherits IFir;

    /*! Message displayed when benchmarking */
    config String benchMsg;

    /*! Program entry point */
    Int main(Int argc, Char* argv[]);
}

Before commenting on FirTester.xdc—especially the somewhat unexpected declaration of a main function in the spec—let's first look at the module's implementation in the target-domain.

#include <acme/utils2/Bench.h>
#include <xdc/runtime/System.h>

#include "package/internal/FirTester.xdc.h"

#define COEFFSLEN 4
#define FRAMELEN 20

static Int16 coeffs[COEFFSLEN] = {20000, 21000, 22000, -3000};
static Int16 inFrame[FRAMELEN] = {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9};
static Int16 outFrame[FRAMELEN];

static Void printOut();

Int FirTester_main(Int argc, Char* argv[])
{
    FirTester_PFir_Handle fir;
    FirTester_PFir_Params params;

    params.frameLen = FRAMELEN;
    fir = FirTester_PFir_create(coeffs, COEFFSLEN, &params, NULL);    /* create filter */

    Bench_begin(FirTester_benchMsg);                                  /* start benchmark */
    FirTester_PFir_apply(fir, inFrame, outFrame);                     /* run filter */
    Bench_end();                                                      /* stop and display timing */

    printOut();                                                       /* display results */

    FirTester_PFir_delete(&fir);                                      /* delete filter */

    return 0;
}

static Void printOut()
{
    Int i;
    String comma = "";

    System_printf("\toutFrame = {");
    for (i = 0; i < FRAMELEN; i++) {
        System_printf("%s%d", comma, outFrame[i]);
        comma = ",";
    }
    System_printf("}\n");
}

Compared with the FirTest1.c program from Lesson 9, we've introduced nothing "new" here at all:  the #include of the generated package/internal/FirTester.xdc.h header should surely come as no surprise; the FirTester_main function follows the usual RTSC naming idiom, given the declaration of main in the spec; and the use of the longer FirTester_PFir_ prefix follows from the PFir proxy declaration. Otherwise, we've basically preserved the original program's structure.

The FirTester module presents many more opportunities for generalization through additional module-wide config params. As we've stressed before, client-assignable configs could naturally replace the #define constants and static arrays declared near the top of FirTester.c. To aid your understanding, though, we've tried to minimize the differences between FirTester.c and the original FirTest1.c program from Lesson 9.
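For instance—a sketch of one possible generalization, not part of the lesson's actual sources—FirTester.xdc could absorb these constants as configs whose defaults match the current values:

```
/* hypothetical additions to the FirTester spec */
config Int frameLen = 20;                               /* replaces #define FRAMELEN  */
config Int16 coeffs[] = [20000, 21000, 22000, -3000];   /* replaces the static array  */
```

Clients could then tune the test scenario entirely from their configuration scripts, without ever touching FirTester.c.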

Needless to say, we'll also need an implementation of FirTester in the meta-domain.

function module$use()
{
    xdc.useModule('acme.utils2.Bench');
    xdc.useModule('xdc.runtime.System');
}

Note that the PFir proxy spec'd in FirTester.xdc does not acquire a default binding inside of module$use, unlike the acme.utils2.Bench meta-implementation from Lesson 12; clients must explicitly bind a suitable IFir implementation to FirTester.PFir. In the same vein, note that the benchMsg config—passed internally to Bench_begin—also has no default value; clients must likewise assign a meaningful string to FirTester.benchMsg.

To guarantee robustness, FirTester might actually assert inside the module$use function that the client has indeed bound the module's PFir proxy, issuing a meaningful warning if necessary; similar checks can occur here—or within the special module$validate function invoked at the end of the configuration process—to ensure that benchMsg has a non-empty string value.
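By way of illustration, a module$validate function along these lines could live in FirTester.xs—a sketch only, with the checks and warning text our own:

```
/* hypothetical addition to FirTester.xs */
function module$validate()
{
    /* ensure the client bound some IFir implementation to the proxy */
    if (this.PFir.delegate$ == null) {
        this.$logError("no IFir implementation bound to FirTester.PFir", this);
    }

    /* ensure the client assigned a non-empty benchmark message */
    if (this.benchMsg == null || this.benchMsg == "") {
        this.$logWarning("FirTester.benchMsg has no value", this);
    }
}
```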

To further streamline re-use across a set of IFir implementations—FirA, FirB, and FirC for now, but perhaps others in the future—the acme.filters2.test package employs a single testFir.cfg meta-program made more flexible through arguments originating inside of the package.bld script for the acme.filters2.test package.

var Bench = xdc.useModule('acme.utils2.Bench');
var FirTester = xdc.useModule('acme.filters2.test.FirTester');
var Program = xdc.useModule('xdc.cfg.Program');

FirTester.benchMsg = Program.build.cfgArgs.benchMsg;
FirTester.PFir = xdc.useModule(Program.build.cfgArgs.firModName);

if (Program.build.target.isa == '64P') {
    Bench.PClock = xdc.useModule('txn.clocks.Clock64P');
    Bench.enableFlag = true;
}
else {
    Bench.enableFlag = false;
}

Program.main = FirTester.main;

Starting with the benchMsg assignment, the script simply uses a string retrieved from an identically named property of the special Program.build.cfgArgs object. The binding of the PFir proxy similarly retrieves the name of the IFir module under test, passing this value programmatically to xdc.useModule. Formation of the cfgArgs object occurs through a more general form of the Pkg.addExecutable call found in the package.bld script for this package, which we'll have more to say about in the section that follows.

The conditional logic that follows then configures the Bench module in an obvious way, based upon the current program's target. Though we haven't done so here, additional arguments to the testFir.cfg script could similarly generalize the configuration of Bench, as we've already done with FirTester.

Turning finally to the script's last statement, the special xdc.cfg.Program module also enables us to designate the actual "main" function for this program. By assigning Program.main a reference to a spec'd function with the same type signature as the standard C main function—certainly the case with the main function spec'd in FirTester.xdc—we've effectively overridden the default main entry-point into this program with one of our own module-wide functions.

We've shamelessly stolen the idea behind Program.main from Java, in which program execution begins in a named class that defines a static, public function named main with a well-known signature. Though RTSC does not directly support the notion of a class as a first-class entity—XDCspec has no corresponding keyword—referencing a canonically-named module with a conforming main function (such as acme.filters2.test.FirTester.main) within a RTSC configuration script mimics the way one launches Java (or C#) applications from the command-line or using an XML-based manifest.

Building and running the tests

We've hinted back in Lesson 9 that one key to managing an open-ended suite of tests lies with appropriately structuring the package.bld script to reflect the problem at hand. In this case, we not only need to build multiple programs for multiple targets in general—testFirA.x*, testFirB.x*, testFirC.x*, and so on—but we also need to pass specific arguments to a common testFir.cfg script used throughout.

For those targets supporting whole-program optimization—such as ti.targets.C64P, selected in your «examples»/config.bld file—we actually want to build two versions of the same program:  one using the compiler's default release profile; the other using whole-program optimization. To further complicate matters, the package.bld script must also ensure that we build non-portable programs like testFirB.x* for only the appropriate targets.

var Build = xdc.useModule('xdc.bld.BuildEnvironment');
var Pkg = xdc.useModule('xdc.bld.PackageContents');

var TEST_INFO = [
    {id: "FirA", benchMsg: "portable implementation"},
    {id: "FirB", benchMsg: "optimized implementation", buildFor: "64P"},
    {id: "FirC", benchMsg: "generalized implementation"},
];

for each (var targ in Build.targets) {
    for each (var testInfo in TEST_INFO) {
        if (testInfo.buildFor && testInfo.buildFor != targ.isa) {
            continue;
        }
        var progName = "test" + testInfo.id;
        var progAttrs = {
            cfgScript: "testFir.cfg",
            cfgArgs: "{" +
                "firModName: 'acme.filters2." + testInfo.id + "', " +
                "benchMsg: '" + testInfo.benchMsg + "'" +
            "}"
        };
        Pkg.addExecutable(progName, targ, targ.platform, progAttrs).addObjects(["FirTester.c"]);
        if (!targ.profiles["whole_program"]) {
            continue;
        }
        progName += "-WP";
        progAttrs.profile = "whole_program";
        Pkg.addExecutable(progName, targ, targ.platform, progAttrs).addObjects(["FirTester.c"]);
    }
}

The Pkg.addExecutable call here receives an additional fourth argument, used to further refine the building of the program designated as progName—a local variable initialized using structured elements within the TEST_INFO array defined at the top of the file. The variable progAttrs—itself an object—further defines a pair of properties that shape the configuration of the designated program.

  • the cfgScript property designates testFir.cfg as the current meta-program, overriding any implicit assumptions based upon the program's name; and

  • the cfgArgs property becomes a set of arguments passed to the program's configuration script, accessed via the Program.build.cfgArgs object in the meta-program itself.

The cfgArgs property expects a string which, when evaluated later at the outset of program configuration, results in the actual XDCscript object consumed by testFir.cfg. To avoid excessive use of escape sequences inside this string (such as "\""), recall that XDCscript allows us to use matching single- or double-quotes in literal string values; we've done our best to make this multi-line string look like the "real" object it will eventually become.
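To make the mechanics concrete, here's a small standalone sketch—runnable in any JavaScript shell, with the testInfo value copied from the TEST_INFO array above—showing how the cfgArgs string for FirA later evaluates into the object seen by testFir.cfg:

```javascript
// build the cfgArgs string exactly as package.bld does for FirA
var testInfo = {id: "FirA", benchMsg: "portable implementation"};
var cfgArgs = "{" +
    "firModName: 'acme.filters2." + testInfo.id + "', " +
    "benchMsg: '" + testInfo.benchMsg + "'" +
    "}";

// at the outset of program configuration, the string is evaluated
// into the actual object accessed as Program.build.cfgArgs
var obj = eval("(" + cfgArgs + ")");
console.log(obj.firModName);   // acme.filters2.FirA
console.log(obj.benchMsg);     // portable implementation
```
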

Finally, the guard at the top of the inner loop uses an optional buildFor property found in some elements of the TEST_INFO array to screen out particular IFir modules that cannot support the current target. In our case, this ensures that we only test the FirB module using a TMS320C64x+ target and platform; unlike the other modules, we will not build a native version of the testFirB program using a GCC-based RTSC target such as gnu.targets.Mingw.

Further filtering then occurs near the bottom of the loop, where we determine whether the current target supports the special "whole_program" profile. If so, we invoke Pkg.addExecutable a second time, now with progName carrying a distinct "-WP" suffix and the special progAttrs.profile property appropriately assigned.

To build and run all tests targeted for the TMS320C64x+ within the acme.filters2.test package—which will output benchmark timings as well as program results—simply invoke xdc test,64P from the command-line. Invoking xdc test will additionally build and run native versions of (a subset of) these programs—but without any benchmark timings.

Interpreting benchmark results

If you haven't already done so, let's fire off the xdc test,64P command—which will benchmark FirA through FirC, both with and without whole-program optimization—and then analyze the results.

>> xdc test,64P
running testFirA.x64P ...
portable implementation [571]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}
running testFirA-WP.x64P ...
portable implementation [550]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}
running testFirB.x64P ...
optimized implementation [391]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}
running testFirB-WP.x64P ...
optimized implementation [370]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}
running testFirC.x64P ...
generalized implementation [1903]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}
running testFirC-WP.x64P ...
generalized implementation [370]
	outFrame = {0,-1,0,1,3,5,7,9,10,12,15,10,5,1,3,5,7,9,10,12}

Looking first at our FirA baseline—built with the default profile (testFirA.x64P) and with whole-program optimization (testFirA-WP.x64P)—we see only modest improvement in the latter over the former. This pattern repeats itself with the optimized FirB module, where the impact of the loop-unrolling and processor-specific intrinsics used in the FirB target-implementation begins to kick in.

The most interesting outcome, however, occurs with FirC. Before commenting on the wide disparity between its two timings, note that whole-program optimization of this heavily-abstracted target-implementation yields performance identical to the compiler-specific and processor-dependent FirB module (370 in both cases). Despite the generality introduced through use of an IMathOps proxy-delegate pattern inside of FirC, we've managed to hold the line on runtime performance!

As for the large degradation in FirC when built with the default profile, just think about what's really happening in each of those FirC_PMathOps calls inside FirC.c—two runtime function dispatches apiece, each eventually arriving in a function body comprising one processor instruction. With callers and callees residing in separate source files, however, "normal" compilation has no choice but to generate external function calls at each site. Using whole-program optimization, on the other hand, the compiler rather trivially identifies these (small!) functions as viable candidates for automatic inlining.

The brilliance of whole-program optimization results from applying relatively basic compilation techniques—function inlining, constant folding, and dead code elimination—to a consolidated base of source code that otherwise resides in separate files potentially produced by different individuals. In reality, the compiler's no smarter; it now just sees more of the program during its usual optimization phase, and hence will often yield dramatic performance improvements—especially when given RTSC modules replete with references to (extern) config params as well as calls to (extern) proxies that in turn call other (extern) functions provided by the delegate module.

Copyright © 2008 The Eclipse Foundation. All Rights Reserved