
[LI] flex/lex incompatibility!



Hello All,

This is about a strange problem I am facing in lex (flex-2.5.4a-6 on RH 6.0 Linux).

I am explaining the problem in considerable detail. If you are not inquisitive,
then hit delete and save yourselves some time ;-)

I have a program (a lex-generated scanner), lex.yy.c, which, when compiled and
linked, works absolutely fine on HP-UX 10.20, Solaris 2.6 and AIX 4.3. Now I am
trying to port this to RH 6.0 Linux, which has flex rather than lex.

The program is quite simple. It reads an input file, scans it for a pattern and
returns an integer based on the pattern found.

The problem I found is that this program does not read and process the whole
input data file if the size of the file is less than 8192 bytes. If the size is
more than 8192 bytes, it works fine, except that EOF processing (yywrap()) is
not performed. The program works perfectly only if the size of the input data
file is exactly 8192 bytes.
After a lot of digging, I noticed that flex reads the file in blocks of
8192 bytes (by default). The block size is defined in the lex-generated scanner
(lex.yy.c) as the YY_READ_BUF_SIZE macro. This is unlike the lex on the other
flavours of Unix, which reads character by character.

Now, I understand that yylex() reads the input data until EOF is reached or
a return statement in one of the rules is encountered. When called again,
yylex() resumes reading from where it left off.

If I have a return statement in one of my rules, the scanner processes only
the first line of its input when the input is smaller than 8192 bytes.
If my input data file is larger than 8192 bytes, the scanner performs properly
except for EOF processing (yywrap()). And strangely, if my input data file is
exactly 8192 bytes, everything is perfect.

I am providing a sample program which you can run to see the behaviour
for yourself.

------------------------------------cut here(testlex.l)-------------------------
%{
#include <stdio.h>
#include <stdlib.h>     /* exit() */

#define TRUE 1
#define FALSE 0

 /* tokens */
 #define SET     5   /* Fileset name */
 #define PDFSET  7   /* Fileset generated by pdf file */
 #define PDFINFO 8   /* Info generated by pdf */
 #define ERR         100
    
char    *progname;      /* name of program = argv[0] */

%}


digit   [0-9]
number  [0-9]+
space   [ ]+
newline "\n"

%%
"Fileset:".*\n  return(SET);

"%"{space}.*\n  return(PDFSET);

.*\n        	return(PDFINFO);

.       	return(ERR);

%%

/* determines action on end of file ( TRUE = exit) */
int yywrap(void)
{
        printf("EOF reached\n");
        return(TRUE);
}

int main(int argc, char *argv[])
{

        int token;
        char *optarg;
        char *InputFile;
        InputFile=argv[1];
        printf("Input file is %s\n", InputFile);
        yyin=fopen(InputFile,"r");
        if (yyin==NULL)
        {
                printf("Error opening file %s\n", InputFile);
                exit(1);
        }

        while(!feof(yyin))
        {
                token=yylex();
                switch(token)
                {
                        case SET:
                                printf("SET\n");
                                break;
                        case PDFINFO:
                                printf("PDFINFO\n");
                                break;
                        default:
                                printf("NONE\n");
                                break;
                }
        }
        return(0);
}

  --------------------------cut here----------------------------------------

Input Data file:
------------------------------cut here-----------------------------------
% Product Documentation file
Fileset:Test file
Fileset:Test file
Fileset:Test file
Fileset:Test file
------------------------------------------------------------------------------
Expected output:				Actual output
------------------------------------------------------------------------------ 
NONE						NONE
SET
SET
SET
SET
EOF reached

(On the other flavours of Unix, the actual output matches the expected output!)

Now, increase the size of the input data file (by filling it with any pattern)
to more than 8192 bytes, and then to exactly 8192 bytes, and see the output!
(I bet you'll be bewildered.)

If you look at the program, after the first iteration of the *while* loop,
control does not go back into the loop. The reason is that flex reads data in
8192-byte chunks, so if the input data file is smaller than that, EOF has
already been reached on yyin by the time the first token is returned, and
feof(yyin) ends the loop. So it does not do any processing beyond the first
record. Now isn't this wrong?? Many programs that process data from flat files
depend on EOF, and in those cases code like the one above fails on Linux.

Thank you for reading this patiently. Your thoughts on this, please....

TIA
Sai

--
Be careful what you choose. You may get it! 