Caroline Dahllof <caro@rhythm.com>
Calvin Williamson <calvinw@mindspring.com>
Introduction
How to run Codegen
Color Channel Names
The Lexer
Lexer: State Diagram
Lexer: DT_MACROS
Lexer: ARGUMENTS
Lexer: DEFINITION
Parser: DT MACROS
dt_keyword
Lexer: GENERIC_IMAGE_DECL
Parser: Declaration Block
Lexer: GENERIC_IMAGE_CODE
Parser: Code Block
Lexer: COMMENT
Lexer: EXTERNAL_VARIABLE
Reference
Codegen is a part of GEGL. Its purpose is to convert Generic Image Language
(GIL) code into data type and color model specific code. Codegen is written
in Flex and Bison. This document gives an overview of the implementation
of Codegen; you should have a good understanding of the GIL before reading
it.
> codegen --channel-names COLOR_CHANNEL_NAMES --channel-data-file CHANNEL_DATA_FILE
--channel-names Codegen expects this flag to be followed by a string of channel names separated by commas. The first name is the name of the first channel, the second name is the name of the second channel, and so on. Do not include the alpha channel in this string; Codegen adds it automatically as the last channel and names it alpha.
--channel-data-file This flag specifies the channel data file, which contains the definitions of the DT Macros. The channel data files are in the subdirectory called channel_data.
The fifth argument, which is optional, is a .gil file. If no file is
supplied, Codegen reads its input from stdin. The result of Codegen is
printed to stdout.
The channel names string is parsed in parser.y:read_channel_names. It
stores the channel names in NAME_COLOR_CHANNEL and the number of channels
(not including alpha) in NUM_COLOR_CHANNEL. Remember that Codegen adds an
extra channel called alpha and treats this channel as the alpha channel.
These two variables are defined in data_type.h.
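A minimal sketch of what read_channel_names does, assuming NAME_COLOR_CHANNEL is a fixed-size array of strings and NUM_COLOR_CHANNEL an int. The real declarations in data_type.h, the function's signature, and the limits MAX_CHANNELS and MAX_NAME_LEN are assumptions here, not the parser.y code. The sketch splits the comma-separated string and then appends the implicit alpha channel after the color channels.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limits; the real sizes in data_type.h are not documented here. */
#define MAX_CHANNELS 8
#define MAX_NAME_LEN 32

/* Assumed layout of the two globals: one extra slot is reserved for alpha. */
char NAME_COLOR_CHANNEL[MAX_CHANNELS + 1][MAX_NAME_LEN];
int NUM_COLOR_CHANNEL = 0;

/* Split a list such as "r,g,b" into NAME_COLOR_CHANNEL, then store "alpha"
   right after the color channels. Returns 0 on success, -1 if there are too
   many channels or a name is too long. */
int read_channel_names(const char *names)
{
    assert(names != NULL);
    NUM_COLOR_CHANNEL = 0;
    const char *start = names;
    for (;;) {
        const char *comma = strchr(start, ',');
        size_t len = comma ? (size_t)(comma - start) : strlen(start);
        if (NUM_COLOR_CHANNEL >= MAX_CHANNELS || len >= MAX_NAME_LEN)
            return -1;
        memcpy(NAME_COLOR_CHANNEL[NUM_COLOR_CHANNEL], start, len);
        NAME_COLOR_CHANNEL[NUM_COLOR_CHANNEL][len] = '\0';
        NUM_COLOR_CHANNEL++;
        if (!comma)
            break;
        start = comma + 1;
    }
    /* alpha is always last and is not counted in NUM_COLOR_CHANNEL */
    strcpy(NAME_COLOR_CHANNEL[NUM_COLOR_CHANNEL], "alpha");
    return 0;
}
```

With the string "r,g,b", NUM_COLOR_CHANNEL becomes 3 and alpha is stored at index 3.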
The lexer has eight states, which can be divided into two groups. The
first group is used to parse the channel data file and the second group
is used to parse the .gil files.
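The channel data states are DT_MACROS, ARGUMENTS, and DEFINITION.
The .gil file states are GENERIC_IMAGE_DECL, GENERIC_IMAGE_CODE, COMMENT, and EXTERNAL_VARIABLE.
The remaining state is the initial state, from which the lexer enters both groups.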
In the .gil file states, the lexer usually returns either a simple token or a token that carries an elem_t struct (defined in common.h).
In the initial state the lexer will just echo everything it reads.
When the lexer reads the string DT_MACROS_BEGIN in the initial state, it switches to the DT_MACROS state. In the DT_MACROS state it parses upper-case strings. When it finds such a string, it checks whether it is the name of a known macro by calling parser.y:dt_get_keyword. This function returns a dt_keyword struct, whose arg field tells the lexer whether the macro has arguments. If the macro has arguments (arg==1), the lexer switches to the ARGUMENTS state; if it does not (arg==0), it switches to the DEFINITION state. When it reads the string DT_MACROS_END, it switches back to the initial state.
In the DEFINITION state the lexer reads a string that can include newline characters, spaces, parentheses, numbers, and other characters. After it has read the string, it switches back to the DT_MACROS state.
In the ARGUMENTS state the lexer reads all the arguments of the DT Macro: the parentheses, the argument names, and the commas separating them. After it has read the closing parenthesis, it switches to the DEFINITION state.
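The transitions between the initial state and the channel data states can be summarized as a small state machine. This is a hypothetical C sketch, not the flex source: the ST_ state names and the EV_ events are illustrative stand-ins for the flex rules that match DT_MACROS_BEGIN, DT_MACROS_END, a macro name, the closing ')' and the definition string.

```c
#include <assert.h>

/* The initial state plus the three channel data states. */
typedef enum { ST_INITIAL, ST_DT_MACROS, ST_ARGUMENTS, ST_DEFINITION } lex_state;

/* Hypothetical events; in the real lexer each one is a flex rule. */
typedef enum {
    EV_DT_MACROS_BEGIN,
    EV_DT_MACROS_END,
    EV_MACRO_NAME,   /* an upper-case name recognized by dt_get_keyword */
    EV_CLOSE_PAREN,  /* closing ')' of the argument list */
    EV_STRING        /* the macro definition string */
} lex_event;

/* Return the next state. `has_args` is the arg field (0 or 1) of the
   dt_keyword found for EV_MACRO_NAME; other events ignore it. */
lex_state next_state(lex_state s, lex_event ev, int has_args)
{
    assert(has_args == 0 || has_args == 1);
    switch (s) {
    case ST_INITIAL:
        return ev == EV_DT_MACROS_BEGIN ? ST_DT_MACROS : ST_INITIAL;
    case ST_DT_MACROS:
        if (ev == EV_DT_MACROS_END)
            return ST_INITIAL;
        if (ev == EV_MACRO_NAME)
            return has_args ? ST_ARGUMENTS : ST_DEFINITION;
        return ST_DT_MACROS;
    case ST_ARGUMENTS:
        return ev == EV_CLOSE_PAREN ? ST_DEFINITION : ST_ARGUMENTS;
    case ST_DEFINITION:
        return ev == EV_STRING ? ST_DT_MACROS : ST_DEFINITION;
    }
    return s;
}
```

A macro with arguments therefore goes DT_MACROS, ARGUMENTS, DEFINITION and back to DT_MACROS, while a macro without arguments skips ARGUMENTS.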
The lexer parses the channel data file and returns tokens to the parser. The parser's grammar expects either dt_keyword.token followed by DT_STRING (for a macro without arguments), or dt_keyword.token LT_PARENTHESIS DT_NAME (',' DT_NAME)* RT_PARENTHESIS DT_STRING (for a macro with arguments). Once the parser has checked that the grammar is correct, it assigns the definition string to the corresponding global variable. These global variables are defined in data_type.h. For a macro with arguments, the parser replaces each argument name in DT_STRING with $n, where n is the position of the argument. For example, the first argument is replaced by $1.
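The argument substitution can be sketched as follows. The function substitute_args is hypothetical, and it assumes that arguments are matched as whole identifiers (letters, digits and '_'), so an argument named x leaves a longer name such as xy untouched; the real parser may match differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy `def` into `out`, replacing every whole-identifier occurrence of
   args[i] with "$<i+1>". Returns 0 on success, -1 if `out` is too small. */
int substitute_args(const char *def, const char *const *args, int nargs,
                    char *out, size_t out_size)
{
    assert(def != NULL && out != NULL && out_size > 0);
    size_t o = 0;
    const char *p = def;
    while (*p) {
        if (is_ident_char(*p)) {
            /* Consume the whole identifier starting at p. */
            const char *end = p;
            while (is_ident_char(*end))
                end++;
            size_t len = (size_t)(end - p);
            int match = -1;
            for (int i = 0; i < nargs; i++)
                if (strlen(args[i]) == len && strncmp(p, args[i], len) == 0) {
                    match = i;
                    break;
                }
            if (match >= 0) {
                int n = snprintf(out + o, out_size - o, "$%d", match + 1);
                if (n < 0 || (size_t)n >= out_size - o)
                    return -1;
                o += (size_t)n;
            } else {
                if (o + len >= out_size)
                    return -1;
                memcpy(out + o, p, len);
                o += len;
            }
            p = end;
        } else {
            /* Copy punctuation and whitespace unchanged. */
            if (o + 1 >= out_size)
                return -1;
            out[o++] = *p++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

For example, with the arguments x and y, the definition ((x) * (y) + xy) becomes (($1) * ($2) + xy).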
The parser contains dt_keyword_tab, an array of dt_keyword structs.
Each dt_keyword struct contains the name of the macro, the arg field that tells the lexer whether the macro has arguments, and the token of the macro. All the allowed DT macros are stored in dt_keyword_tab.
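A sketch of the lookup, assuming dt_get_keyword does a linear search of dt_keyword_tab by name and returns a pointer to the matching entry. Only the arg field is named in this document; the field names name and token, the macro names DT_MACRO_A and DT_MACRO_B, and the token values are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token values; real ones come from the Bison-generated header. */
enum { TOK_DT_MACRO_A = 258, TOK_DT_MACRO_B };

typedef struct {
    const char *name; /* macro name as written in the channel data file */
    int arg;          /* 1 if the macro takes arguments, 0 otherwise */
    int token;        /* token returned to the parser */
} dt_keyword;

/* Placeholder entries; the real table lists every allowed DT macro. */
static const dt_keyword dt_keyword_tab[] = {
    { "DT_MACRO_A", 0, TOK_DT_MACRO_A },
    { "DT_MACRO_B", 1, TOK_DT_MACRO_B },
};

/* Linear search by name; returns NULL for an unknown macro. */
const dt_keyword *dt_get_keyword(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof dt_keyword_tab / sizeof dt_keyword_tab[0]; i++)
        if (strcmp(dt_keyword_tab[i].name, name) == 0)
            return &dt_keyword_tab[i];
    return NULL;
}
```

The lexer would use the returned arg field to choose between the ARGUMENTS and DEFINITION states, and the token field as the value it returns to the parser.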
When the lexer, in its initial state, finds the string GENERIC_IMAGE_DECL_BEGIN, it switches to the GENERIC_IMAGE_DECL state. In this state the lexer reads all the variables that will be used in the code block: indentations, data types, and variable names. When it reads GENERIC_IMAGE_DECL_END, it switches back to the initial state. If the lexer reads /* during the GENERIC_IMAGE_DECL state, it switches to the COMMENT state, and it returns from that state when it has read */.
The parser expects first an indentation, then the data type, and finally the variable names. The grammar allows a list of variable names separated by ','.
Pixel and Channel are substituted by DATATYPE_STR. Depending on the arguments passed to Pixel, the parser defines some auxiliary variables.
Every variable is an elem_t struct (common.h), and variables defined in the declaration block are added to the symtab (parser.y). elem_t has a field called scope, which is assigned when the variable is defined. When a variable goes out of scope, it is removed from the symtab, so variables that are out of scope cannot be used in the code block.
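The scope rule can be illustrated with a reduced, hypothetical symtab: a flat array of elem_t entries in which scope is the nesting depth of the declaration. The real elem_t in common.h has more fields, and the helpers symtab_add, symtab_leave_scope and symtab_lookup are invented for this sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* Reduced, hypothetical elem_t: only the fields this sketch needs. */
typedef struct {
    char name[32];
    int scope; /* nesting depth at which the variable was declared */
} elem_t;

static elem_t symtab[MAX_SYMBOLS];
static int num_symbols = 0;

/* Add a variable declared at `scope`. Returns 0 on success, -1 if the
   table is full or the name is too long. */
int symtab_add(const char *name, int scope)
{
    assert(name != NULL);
    if (num_symbols >= MAX_SYMBOLS || strlen(name) >= sizeof symtab[0].name)
        return -1;
    strcpy(symtab[num_symbols].name, name);
    symtab[num_symbols].scope = scope;
    num_symbols++;
    return 0;
}

/* Called when the block at depth `scope` closes: drop every variable
   declared at that depth or deeper, keeping the others in order. */
void symtab_leave_scope(int scope)
{
    int kept = 0;
    for (int i = 0; i < num_symbols; i++)
        if (symtab[i].scope < scope)
            symtab[kept++] = symtab[i];
    num_symbols = kept;
}

/* Return the innermost variable with this name, or NULL if it was never
   declared or has gone out of scope. */
const elem_t *symtab_lookup(const char *name)
{
    for (int i = num_symbols - 1; i >= 0; i--)
        if (strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    return NULL;
}
```

Because out-of-scope entries are removed rather than flagged, a later lookup fails, which is how a use outside the variable's scope would be reported as an error.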
When the lexer, in its initial state, reads the string GENERIC_IMAGE_CODE_BEGIN, it switches to the GENERIC_IMAGE_CODE state, where it reads the generic code. Every variable it reads must have been defined in the declaration block first. When the lexer reads EXTERNAL_VARIABLE, it switches to the EXTERNAL_VARIABLE state, reads the variable, and then switches back. Likewise, it switches to the COMMENT state when it reads a comment and then switches back. When it finally reads GENERIC_IMAGE_CODE_END, it switches back to its initial state.
The parser checks that the generic code is grammatically correct; if it is not, the parser reports an error and exits. The parser also checks that the variables in a function represent the same number of channels. For example, if you try to assign the rgb channels to an alpha channel (variable_alpha = variable_color), the parser reports an error and exits.
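The channel-count check can be sketched as follows, under the assumption that an operand's channel count follows from how its Pixel was declared and which suffix (_color, _alpha or none) it carries. pixel_decl, operand_channels and assignment_ok are hypothetical names; the real parser works on elem_t values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record of a declaration such as Pixel dest(color)
   or Pixel src(color,alpha). */
typedef struct {
    int has_color;
    int has_alpha;
} pixel_decl;

/* Number of channels an operand refers to. `suffix` is "color", "alpha",
   or "" for the bare variable name. Returns -1 if that part was not declared. */
int operand_channels(pixel_decl d, const char *suffix, int num_color_channels)
{
    assert(suffix != NULL);
    if (strcmp(suffix, "color") == 0)
        return d.has_color ? num_color_channels : -1;
    if (strcmp(suffix, "alpha") == 0)
        return d.has_alpha ? 1 : -1;
    return (d.has_color ? num_color_channels : 0) + (d.has_alpha ? 1 : 0);
}

/* Accept an assignment only if both sides cover the same, valid number
   of channels. */
int assignment_ok(int lhs_channels, int rhs_channels)
{
    return lhs_channels > 0 && lhs_channels == rhs_channels;
}
```

With r,g,b channels, variable_alpha = variable_color compares 1 channel with 3 and is rejected, while dest = src_color with Pixel dest(color) compares 3 with 3 and is accepted.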
The parser expands functions according to the color channel names and the
variables used. For example, for the color channel names r,g,b:
| GIL CODE | DECLARATION | EXPANDED CODE |
| dest_color = src_color; | | dest_r = src_r; dest_g = src_g; dest_b = src_b; |
| dest_alpha = src_alpha; | | dest_alpha = src_alpha; |
| dest = src; | Pixel dest(color,alpha), src(color,alpha); | dest_r = src_r; dest_g = src_g; dest_b = src_b; dest_alpha = src_alpha; |
| dest = src; | Pixel dest(color), src(color); | dest_r = src_r; dest_g = src_g; dest_b = src_b; |
| dest = src; | Pixel dest(color,alpha,has_alpha), src(color,alpha); | dest_r = src_r; dest_g = src_g; dest_b = src_b; if (dest_has_alpha) dest_alpha = src_alpha; |
| dest = src_color; | Pixel dest(color), src(color,alpha); | dest_r = src_r; dest_g = src_g; dest_b = src_b; |
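A sketch of the expansion for assignments between whole Pixels or their color parts, assuming the parser already knows the channel names and how the alpha channel should be handled. expand_pixel_assign, alpha_mode and the helper append are hypothetical; for r,g,b the output reproduces the expanded code in the table above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* How the alpha channel is handled, following the rows of the table. */
typedef enum { ALPHA_NONE, ALPHA_COPY, ALPHA_IF_HAS_ALPHA } alpha_mode;

/* printf-style append to `out` at offset *o; returns -1 on overflow. */
static int append(char *out, size_t out_size, size_t *o, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out + *o, out_size - *o, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= out_size - *o)
        return -1;
    *o += (size_t)n;
    return 0;
}

/* Expand an assignment from src to dest into one statement per color
   channel, plus an optional alpha statement. Returns 0 on success, -1 if
   `out` is too small. */
int expand_pixel_assign(const char *dest, const char *src,
                        const char *const *channels, int num_channels,
                        alpha_mode mode, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    size_t o = 0;
    out[0] = '\0';
    for (int i = 0; i < num_channels; i++)
        if (append(out, out_size, &o, "%s%s_%s = %s_%s;", i ? " " : "",
                   dest, channels[i], src, channels[i]) != 0)
            return -1;
    const char *sep = num_channels ? " " : "";
    if (mode == ALPHA_COPY)
        return append(out, out_size, &o, "%s%s_alpha = %s_alpha;", sep, dest, src);
    if (mode == ALPHA_IF_HAS_ALPHA)
        return append(out, out_size, &o, "%sif (%s_has_alpha) %s_alpha = %s_alpha;",
                      sep, dest, dest, src);
    return 0;
}
```

Calling it with the channels r,g,b and ALPHA_NONE gives the dest_color = src_color row, and with ALPHA_IF_HAS_ALPHA it gives the Pixel dest(color,alpha,has_alpha) row.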
All the DT Macros are substituted with their definitions.
The COMMENT state is reached from either the GENERIC_IMAGE_DECL or the GENERIC_IMAGE_CODE state when the lexer reads /*. In the COMMENT state the lexer echoes everything it reads. When it reads */, it switches back to its previous state.
When the lexer reads EXTERNAL_VARIABLE in the GENERIC_IMAGE_CODE state, it switches to the EXTERNAL_VARIABLE state. The lexer adds $e.EXTERNAL_INIT_STRING to the variable, returns the variable to the parser in an elem_t, and switches back to the GENERIC_IMAGE_CODE state when it reads ')'.
Gegl Classes
Generic Channel Data
Generic Image Language