APPENDIX A

Frequently Asked Questions

This appendix provides frequently asked questions regarding the Netra Data Plane software and how it interacts with the tejacc compiler.


Summary

General Questions

Configuration Questions

Building Questions

Late-Binding Questions

API and Application Questions

Optimization Questions

Legacy Code Integration Questions

Sun CMT Specific Questions

Address Resolution Protocol (ARP) Questions


General Questions

What is Teja 4.x and How Does It Differ From an Ordinary C Compiler?

Teja 4.x is an optimizing C compiler (called tejacc) and API system for developing scalable, high-performance applications for embedded multiprocessor architectures. tejacc operates on a system-level view of the application through three techniques:

The techniques yield superior code validation and optimization, leading to more reliable and higher performance systems.

Where Are the Tutorials?

The ticktock tutorial is in Tutorial.

Where Is the Documentation?

The Teja specific documentation is available from the Teja website:

http://www.teja.com/library/index.html


Configuration Questions

What Are the Hardware Architecture, Software Architecture and Mapping Dynamic Libraries for?

These three dynamic libraries are user supplied. The libraries describe the configuration of the hardware (processors, memories, buses), software (OS, processes, threads, communication channels, memory pools, mutexes), and mapping (functions to threads, variables to memory banks). The library code runs in the context of the tejacc compiler. The tejacc compiler uses this information as a global view of the entire system (hardware, user code, mapping, connectivity among components) for different purposes:

The dynamic libraries are run on the host, not on the target.
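
For illustration, a minimal mapping library might look like the following sketch. The entry-point name, its return type, and the umbrella header are assumptions for this example; the two mapping calls themselves appear elsewhere in this appendix.


/* map.c - illustrative mapping dynamic library (entry-point name is hypothetical). */
#include <teja_include_all.h>   /* assumed umbrella header for the Teja configuration API */

int my_mapping(void)            /* executed on the host, inside tejacc */
{
    /* Place the application entry function on a specific thread. */
    teja_map_function_to_thread("_main", "main_thd16");

    /* Map every global and static variable to the BSS memory bank. */
    teja_map_variables_to_memory(".*", "app.cmt1board.bss_mem");

    return 0;
}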

How Can I Debug the Dynamic Libraries?

Two ways to help debug the dynamic libraries are to run tejacc under the tejacc_dbg.sh wrapper (see What If the Hardware Architecture, Software Architecture, or Mapping Dynamic Libraries Crash?) or to run tejacc under the debugger manually:

1. Type:


gdb $teja-install-directory/bin/tejacc

2. Set a breakpoint on the teja_user_libraries_loaded function.

3. Type run followed by the same parameters that were passed to tejacc.

Control returns immediately after the user dynamic libraries are loaded.

4. Set a breakpoint on the desired dynamic library function, and type cont.

What Should I Do When the tejacc Compiler Crashes?

There might be a bug in the hardware architecture, software architecture, or mapping dynamic libraries. To debug the issue, see How Can I Debug the Dynamic Libraries?.

What If the Hardware Architecture, Software Architecture, or Mapping Dynamic Libraries Crash?

tejacc gets information about hardware architecture, software architecture, and mapping by executing the configuration code compiled into dynamic libraries. The code is written in C and might contain errors causing tejacc to crash. Upon crashing, you are presented with a Java Hotspot exception, as tejacc is internally implemented in Java.

An alternative version of tejacc.sh called tejacc_dbg.sh is provided to assist debugging configuration code. This program runs tejacc inside the default host debugger (gdb for Linux/Cygwin hosts, dbx for Solaris hosts). The execution automatically stops immediately after the hardware architecture, software architecture, and mapping dynamic libraries have been loaded by tejacc.

You can continue the execution and the debugger stops at the instruction causing the crash. Alternatively, you can set breakpoints in the code before continuing or use any other feature provided by the host debugger.

Can I Build Hardware Architecture, Software Architecture, and Mapping in the Same Dynamic Library?

The dynamic libraries can be combined, but the entry points must be different.
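
As a sketch, a single source file can provide all three configurations as long as each has its own entry function. The function names below are hypothetical; each would be given to tejacc as the entry point of the corresponding configuration.


/* config.c - one dynamic library, three distinct entry points (names are hypothetical). */
int my_hardware_architecture(void) { /* describe processors, memories, buses */ return 0; }
int my_software_architecture(void) { /* declare processes, threads, channels, memory pools */ return 0; }
int my_mapping(void)               { /* map functions to threads and variables to memory banks */ return 0; }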

Can I Map Multiple Variables With One Function Call?

Use regular expressions to map multiple variables to a memory bank, using the function:


teja_mapping_t teja_map_variables_to_memory (const char * var, const char * memory);

For example, to map all variables starting with my_var_ to the OS-based memory bank:


teja_map_variables_to_memory ("my_var_.*", TEJA_MEMORY_TYPE_OS_BASED);


Building Questions

Where Is the Generated Code?

The generated code is located in the top-level-application/code/process directory, where top-level-application is the directory where make was invoked and process is the process name as defined in the software architecture.

If you are generating with optimization there is an additional directory, code/process/.ir. Optimized generation is a two-step process. The .ir directory contains the result of the first step.

Where Is the Executable Image?

The executable image is located in the code/process directory, where process is the process name as defined in the software architecture.

How Can I Compile Multiple Modules on the Same Command-Line?

tejacc is a global compiler, and all C files must be provided on the same command line for it to perform global validation and optimization. To compile an application that requires multiple modules, use the -srcset command-line option. The syntax for this option is:


-srcset srcset-name srcset-specific-options source-files

where:
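
For example, a hypothetical invocation compiling two modules as separate source sets might look like the following. The source-set names, file names, and -D options are placeholders; only -srcset itself is taken from the syntax above.


tejacc -srcset fastpath -DFAST_PATH classify.c forward.c \
       -srcset control -DCONTROL_PLANE config.c stats.c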

How Can I Pass Different CLI Options to Different Modules on the tejacc Command-Line?

See How Can I Compile Multiple Modules on the Same Command-Line?.

How Can I Change the Behavior of the Generated makefile Without Modifying It?

You can use the user_defs.mk file in the top level application directory to overwrite any parameter that the generated makefile sets. The substituted value for the parameter is used for the compilation.

You can specify any file other than the default user_defs.mk file using the "external_makefile" property of the process. For example:


teja_process_set_property(<process_obj>, "external_makefile", "new-filename-with-or-without-path")

If the path is not specified, the top-level application directory is assumed.



Note - There is no warning or error if the file does not exist; the compilation continues with the generated Makefile parameters.


You can also specify the user_defs.mk file as a value to the EXTERNAL_DEFINES parameter during the compilation of the generated code. For example:


gmake EXTERNAL_DEFINES=../../user_defs.mk

The file passed through EXTERNAL_DEFINES takes precedence over the value specified in the software architecture if you use both methods.

How Do I Compile the RLP and IPfwd Applications to Run on the Netra CP3060?

For ipfwd, execute build_1g_1060 under src/apps/rlp.

For RLP, you need to modify apps/rlp/src/app/rlp.c and apps/rlp/src/app/rlp_config.h to reflect the exact mapping.

Add the following to TEJACC_CFLAGS in the application makefile:


-DDEVID_1060

How Can I Change the Behavior of the Generated Makefile Without Modifying It?

You can create an auxiliary file that modifies the behavior of the generated makefile and then either invoke the generated Makefile with the EXTERNAL_MAKEFILE variable set to this file name, or use the "external_makefile" property in the software architecture (both mechanisms are explained below). Either way, the generated makefile includes the file after setting up all the parameters but before invoking any compilation command. You can then override any parameter that the generated Makefile sets, and the new value for that parameter is in effect for the compilation.

You can specify a file name using the "external_makefile" property of the process. To set the new value for the property, invoke the following:


teja_process_set_property(process_obj, "external_makefile", "new-filename-with-or-without-path")

If the path is not specified, the top-level application directory is assumed. The path can be relative to the top-level application directory or an absolute path. There is no warning or error if the file does not exist; the compilation continues with the generated Makefile parameters.

If you prefer, you can also specify this external defines file name as a value to the EXTERNAL_DEFINES parameter during the compilation of the generated code, for example, gmake EXTERNAL_DEFINES=../../user_defs.mk. This value takes precedence over the value specified in the software architecture if both approaches are used.

An example of user_defs.mk is USR_CFLAGS=-xO3.

The generated Makefile can be invoked as:


make EXTERNAL_DEFINES=user_defs.mk

This has the effect of adding the -xO3 flag to the compilation lines.


Late-Binding Questions



Note - Refer to Overview of the Late-Binding API for more information on the Late-Binding API.


What Is the Late-Binding API?

The late-binding API is the Teja equivalent of OS system calls. However, OS calls are fixed in precompiled libraries, whereas late-binding API calls are generated based on contextual information. This ensures that the late-binding API calls are small and optimized. See What Is Context-Sensitive Generation?.

The late-binding API addresses the following services:

What Is a Teja Memory Pool?

A memory pool is a portion of contiguous memory that is preallocated at system startup. The memory pool is subdivided into equal-sized nodes that are allocated as needed. You declare memory pools in the software architecture using teja_memory_pool_declare(), which enables you to choose the size, implementation type, producers, consumers, and so on. In the application code, you can get nodes from or put nodes in the memory pool using teja_memory_pool_get_node() and teja_memory_pool_put_node(). This allocation mechanism is more efficient than malloc() and free(). The get_node and put_node primitives are Late-Binding API calls (see What Is the Late-Binding API?), so they benefit from context-sensitive generation.
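
As a rough sketch only (the return values and parameter lists of the pool calls are assumed here, not taken from the reference manual), the get and put pattern in application code looks like this, where my_pool stands for a pool declared in the software architecture:


/* Sketch: obtain a preallocated node, use it, and return it to the pool. */
void *node = NULL;

if (teja_memory_pool_get_node(my_pool, &node) == 0) {   /* assumed: 0 means success */
    /* ... fill in and process the node ... */
    teja_memory_pool_put_node(my_pool, node);            /* return the node instead of calling free() */
}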

What Is a Teja Channel?

A channel is a pipe-like mechanism to send data from one thread to another. Channels are declared in the software architecture using teja_channel_declare(), which enables you to choose the size and number of nodes, implementation type, and so on. In the application code, you can write data to the channel using teja_channel_send() and read from the channel using teja_wait(). The send and wait primitives are Late-Binding API calls, so they benefit from context-sensitive generation.

How Do I Access a Late-Binding Object From Application Code?

Use the teja_late-binding-object-type_declare calls to declare all late-binding objects (memory pool, channel, mutex, queue) in the software architecture. The first parameter of each call is a string containing the name of the object. In the application code, you access a late-binding object through a C preprocessor symbol with that name; the name is no longer a string. tejacc makes these symbols available to the application by processing the software architecture dynamic library.
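
As an illustrative sketch (argument lists are abbreviated and assumed), a channel declared under the string name "pkt_chan" in the software architecture is later referenced in application code through the symbol pkt_chan:


/* Software architecture dynamic library (runs on the host inside tejacc): */
teja_channel_declare("pkt_chan", /* ... size, implementation type, producers, consumers ... */);

/* Application code (compiled for the target) - the name is now a symbol, not a string: */
teja_channel_send(pkt_chan, /* ... data to send ... */);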

Can I Define a Symbol in the Software Architecture and Use It in My Application Code?

The following function in the software architecture can define a C preprocessor symbol used in application code:


int teja_process_add_preprocessor_symbol (teja_process_t process, const char * symbol, const char * value);

where:



Note - In the application, the symbol is accessed as a C preprocessor symbol, not as a string.
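
For example, a software architecture library could define a symbol that the application then uses directly. The symbol name, its value, and the process_obj variable are illustrative only; the function signature is the one shown above.


/* In the software architecture dynamic library: */
teja_process_add_preprocessor_symbol(process_obj, "MAX_FLOWS", "4");

/* In the application code, MAX_FLOWS is an ordinary preprocessor symbol: */
int flow_table[MAX_FLOWS];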



API and Application Questions



Note - Refer to the Netra Data Plane Software Suite 1.1 Reference Manual for detailed description of the API functions.


How Do I Synchronize a Critical Region?

Use the mutex API, which is composed of:

How Do I Send Data From a Thread to Another Thread?

Use the Channel API or the Queue API.

The Channel API is composed of:

The Queue API is composed of:

How Do I Allocate Memory?

Use the Memory Pool API, which is composed of:

What Changes Are Required for the RLP Application Under LDoms?

The RLP application requires some minimal changes to run under the LDoms environment.

1. Update the src/app/rlp.c file to reflect the total number of strands given to the LWRTE domain.

2. Change the src/app/rlp_config.h file so that the stat thread is configured to run on the last strand (CPU number) available for your LDom partition.

The NUM_STRANDS macro requires this change from the default value of strand 31.

3. Change the src/config/rlp_map.c and src/config/rlp_swarch.c files to reflect the new architecture of your LWRTE domain under LDoms.

What Changes Are Required for the ipfwd Application Under LDoms?

The ipfwd application requires some minimal changes to run under the LDoms environment.

1. In apps/ipfwd.10g/src/config/ipfwd_map.c, do the following:

Comment out teja_map_function_to_thread("_main", "main_thd16") and the similar lines for main_thd numbers greater than 16.

2. In apps/ipfwd.10g/src/config/ipfwd_swarch.c, do the following:

a. Comment out
processors[16] = "app.cmt1board.cmt1_chip.strand16";
and similar lines greater than 16.

b. Comment out
thd[15] = teja_thread_create(init, "main_thd16");
and similar lines greater than 16.

c. Assign stat_thd to 15 by entering the following:


thd[15] = teja_thread_create(init, "stat_thd");

d. Comment out
thd[31] = teja_thread_create(init, "stat_thd");

3. In apps/ipfwd.10g/src/app/ipfwd_config.h, change NUM_STRANDS to 16.



Note - A strand number of 16 is used only in this example. You can use any multiple of 4, from 4 through 28.


When Should I Use Queues Instead of Channels?

Generally, queues are more efficient than channels. Consider the following guidelines when deciding between queues and channels:

Why Is It Not Necessary to Block Interface or Queue Reads?

If a queue is used by one producer and one consumer, there is no need to block during the queue read. For example, in the ipfwd application, each queue has only one producer and consumer, and does not need to block. See FIGURE A-1.

FIGURE A-1 Example for the ipfwd Application



Note - If the Teja Queue API is used instead of the Fast Queue API, locks are generated implicitly at compile time.


It is not necessary to block Ethernet interface reads, as there is only one thread reading from or writing to a particular interface port or DMA channel at any given time.

Can Multiple Strands on the Same Queue Take Advantage of the Extra CPU Cycles If the Strands Are Not Being Used?

A strand stops consuming the pipeline only when the strand is parked. Even when a strand is calling teja_wait(), the CPU consumes cycles because the strand does a busy wait. If the strand performs busy polls, the polls can be optimized so that other strands on the same CPU core utilize the CPU. This optimization is accomplished by executing instructions that release the pipeline to other strands until the instruction completes.

Consider IP-forwarding type applications. When the packet receiving stream approaches line rate, it is better to let the strand perform a busy poll for arriving packets. At less than line rate, the polling mechanism is optimized by inserting long-latency instructions between polls. Under this methodology, the pipeline is released, enabling other strands to utilize unused CPU cycles.

Why Does the Application Choose the Role for the Strand From the Code Instead of the Software Architecture API?

When the role is determined from the code, the application (for example, ipfwd.c) can be made more adaptable to the number of flows and physical interfaces without modifying any mapping files. However, this is a matter of preference, and in some situations the Software Architecture API can provide a better role for a strand.

Is It Possible to Park a Strand Under LDoms Without Explicitly Specifying So in the Code?

Methods of parking strands are no different in an LDoms environment. Unutilized strands are automatically parked. If a strand is assigned to a logical domain but is not used, that strand should be parked. Strands that are not assigned to the LWRTE logical domain are not visible to that domain and cannot be parked.

You must assign complete cores to LWRTE. Otherwise, you have no control over the resources consumed by other domains on the core.

What Is bss_mem?

bss_mem is the location where all global and static variables are stored. For example, in ipfwd_map.c:


teja_map_variables_to_memory(".*", "app.cmt1board.bss_mem");



Note - The sum of BSS and the code size must not exceed 5 Mbytes of memory.


What Is the Significance of bss_mem Placement in the Code Listing?

When the example shown in What Is bss_mem? is inserted into the code, subsequent mappings of .*_dram variables supersede it. To clarify, all variables suffixed with _dram are mapped to the DRAM memory region, and all other variables are mapped to the BSS.

How Are app.cmt1board.heap_mem0 and Similar Heaps Affected?

The heap region is used by teja_malloc(). Every time teja_malloc() is called, the heap space is reduced.

Can You Clarify BSS, Code, Heap, and DRAM Memory Allocation?

FIGURE A-2 illustrates the allocation of memory for BSS, code, heap, and DRAM.

FIGURE A-2 Memory Allocation Stack

where:

Why Are So Many Warnings Displayed When Compiling the ipfwd Code?

Some of the warnings are marginal warnings that are accepted by a regular C compiler, but not the tejacc compiler.

What Is LWIP Lib?

The Light Weight Internet Protocol Library (LWIP lib) consists of essential functions for implementing a basic User Datagram Protocol (UDP) or Transmission Control Protocol/Internet Protocol (TCP/IP) stack.

Does the eth_* API Support Virtual Ethernet Devices?

The eth_* API supports only physical Ethernet devices at this time. IPC is designed to run on top of Logical Domain Communication (LDC, a hypervisor protocol) and on the eth_* API.


Optimization Questions

How Do I Enable Optimization?

TABLE A-1 describes the options for tejacc to enable optimization:


TABLE A-1 Optimization Options for tejacc

Option for tejacc                 Description
-O                                Enables all optimizations.
-fcontext-sensitive-generation    Enables context-sensitive generation only.


What Is Context-Sensitive Generation?

Context-sensitive generation is the ability of the tejacc compiler to generate minimal and optimized system calls based on global context information provided from:

In the traditional model, the operating system is completely separated from the compiler and the operating system calls are fixed in precompiled libraries. In the tejacc compiler, each system call is generated based on the context.

For example, if a shared memory communication channel is declared in the software architecture as having only one producer and one consumer, the tejacc compiler can generate that channel as a mutex-free circular buffer. On a traditional operating system the mutex would have to be included because the usage of the function call was not known when the library was built. See Overview of the Late-Binding API for more information on the Late-Binding API.

What Is Global Inlining?

Functions marked with the inline keyword or with the -finline command-line option get inlined throughout the entire application, even across files.
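
For example, assuming a small helper defined in util.c and called from main.c (file and function names are illustrative), tejacc can inline the call even though the definition lives in a different file:


/* util.c */
inline unsigned int checksum_add(unsigned int sum, unsigned char byte)
{
    return sum + byte;                    /* small helper, a good inlining candidate */
}

/* main.c */
extern unsigned int checksum_add(unsigned int sum, unsigned char byte);

unsigned int checksum(const unsigned char *buf, int len)
{
    unsigned int sum = 0;
    int i;
    for (i = 0; i < len; i++)
        sum = checksum_add(sum, buf[i]);  /* inlined across files by tejacc */
    return sum;
}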


Legacy Code Integration Questions

How Can I Reuse Legacy C Code in a Teja Application?

You can port preexisting applications to the Teja environment. There are two methods to integrate legacy application C code with newly compiled Teja application code:

Linking Legacy Code to Teja Code

By linking the legacy code to the Teja code as libraries, the legacy code is not compiled and changes are minimized. The legacy libraries are also linked to the Teja generated code, so they must be available on the target system. Use this method where performance is not an important factor.

For example, porting a UNIX legacy application to Teja running on UNIX is simple, because all of the UNIX libraries are available. However, porting a UNIX application to Teja running on bare hardware might require additional effort, because the bare hardware system does not have all the necessary OS libraries. In this situation, you must provide the missing functions.

Changing Legacy Source Code

Introducing calls to the Teja API in the legacy source code enables context-sensitive and late-binding optimizations to be activated in the legacy code. This method provides higher performance than the linking method.

Heavy memory allocation operations such as malloc and free are substituted with Teja preallocated memory pools, generated in a context-sensitive manner. The same advantage applies to mutexes, queues, communication channels, and functions such as select(), which are substituted with teja_wait().



Note - It is not necessary to substitute all legacy calls with Teja calls; only performance-critical parts of the legacy code need to be ported to Teja. Error handling and exception code can remain unchanged.


How Can I Reuse Legacy C++ Code in a Teja Application?



Note - See How Can I Reuse Legacy C Code in a Teja Application?.


C++ code can be integrated with a Teja application by two methods:

Mixing C and C++ Code

Teja generates C code, so the final program is in C. Mixing C++ and Teja code is similar to mixing C++ and C code. This topic has been discussed extensively in C and C++ literature and forums. Basically, declare the C++ functions you call from Teja to have C linkage. For example:


#include <iostream>
extern "C" int print(int i, double d)
{
   std::cout << "i = " << i << ", d = " << d;
   return 0;
}

Compile the C++ code natively with the C++ compiler and link it to the generated Teja code. The Teja code can call the C++ functions with C linkage.
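
On the Teja side, the call then looks like an ordinary C call. A minimal sketch, assuming the print() function shown above has been compiled and linked in:


/* C code on the Teja side: declare the C-linkage function and call it. */
extern int print(int i, double d);

void report_value(void)
{
    print(42, 3.14);   /* resolved at link time against the natively compiled C++ object */
}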

For detailed discussions of advanced topics such as overloading, templates, classes, and exceptions, refer to these URLs:

Translating C++ Code to C Code

The third-party packages at the following web sites can be used to translate code from C++ to C. Sun has not verified the functionality of these software programs:



Note - This latter URL requires registration with the Sun Download Center.



Sun CMT Specific Questions

Is There a Maximum Allowed Size for Text + BSS in My Program?

The limit is 5 Mbytes. If the application exceeds this limit, the generated makefile indicates so with a static check.

How Is Memory Organized in the Sun CMT Hardware Architecture?

TABLE A-2 lists the default memory setup in Sun CMT hardware architecture:


TABLE A-2 Default Memory Setup

Memory Address Space          Description
0x00000000 - 0x11000000       Reserved for system use.
0x11000000 - 0x13000000       Private heap memory for each strand. On CMT systems, there are 32 strands, and each strand receives 1/32nd of the memory from 0x11000000 to 0x13000000: the first strand has its heap from 0x11000000 to 0x11100000, the second from 0x11100000 to 0x11200000, and so on. Heap memory is used by teja_malloc().
0x13000000 - 0x100000000      Shared DRAM. Variables that are mapped to DRAM are generated in the static memory map.


You can change these values in the memory bank properties of the hardware architecture. For example, to move the end of DRAM to 0x110000000, add the following code to your hardware architecture:


teja_memory_t mem;
char * new_value = "0x110000000";
...
mem = teja_lookup_memory (board, "dram_mem");
teja_memory_set_property (mem, TEJA_PROPERTY_MEMORY_SIZE, new_value);

How Do I Increase the Size of the DRAM membank?

You can increase the size of DRAM as explained in How Is Memory Organized in the Sun CMT Hardware Architecture?.


Address Resolution Protocol (ARP) Questions

How Do I Enable ARP in RLP?

To enable ARP in RLP, you need to make the following changes:

1. Modify rlp_config.h to give IP addresses to the network ports.

For example,

a. Assign an IP address to the network ports of the system running lwrte.


 # define    IP_BY_PORT(port)    \
((port == 0)? __GET_IP(192, 12, 1, 2): \
(port == 1)? __GET_IP(192, 12, 2, 2): \
(port == 2)? __GET_IP(192, 12, 3, 2): \
(port == 3)? __GET_IP(192, 12, 4, 2): \
(0))

b. Tell the RLP application the remote IP address to which it will send IP packets.


 # define DEST_IP_BY_PORT(port) \
((port == 0)? __GET_IP(192, 12, 1, 1): \
(port == 1)? __GET_IP(192, 12, 2, 1): \
(port == 2)? __GET_IP(192, 12, 3, 1): \
(port == 3)? __GET_IP(192, 12, 4, 1): \
(0))

c. Assign a netmask to each port to define a subnet.


 # define NETMASK_BY_PORT(port) (0xffffff00)

2. Compile the RLP application with ARP=on:


$ gmake clean
$ gmake ARP=on