Recognizing the Numeral from a String

Recognizing the Numeral from a String

When programming microcontrollers, it is often necessary to enter some calibration coefficients into the firmware in the UART console: values ​​of currents, threshold voltages, lighting levels, etc. As a rule, these are real numbers with floating point.

Then it is often necessary to analyze text logs from the SD card. It is necessary to extract real numbers from CSV lines.

This is also a parsing of the NMEA 0183 protocol, which is present in all navigation receivers to simply read latitude and longitude.

All this requires is some reliable transparent and simple portable algorithm to recognize real numbers from strings.

Formulation of the problem.

Given a string of characters. Recognize a real (real) number from a string of characters. If there is no number, issue an error. At the same time, the prototype should be like this

bool try_str2real(const char* const text, double* const double_value)

Here is the legal alphabet of characters in the string: ”, ‘+’, ‘-‘, 0 1 2 3 4 5 6 7 8 9, ‘e’, ​​’E’ ‘.’ ‘\n’ ‘\r’

Here is a list of test cases:

No

Input line

Recognized number

is this a real number?

1

“.0”

0.0

Yes

2

“+.”

No

3

“005047e+6”

5047000000

Yes

4

“-8115e957”

INF

Yes

5

“2e0”

2

Yes

6

“4e+”

No

7

“.”

No

you can see the full list of test cases here

https://docs.google.com/spreadsheets/d/1rbEw8p8CMHzM5DJUYGxDX-BCQ8sZsp4jDMRsbifJD9k/edit#gid=0

for some inexplicable reason, the LeetCode site rates this challenge hard. https://leetcode.com/problems/valid-number/description/

Although it seems to me that there is nothing complicated here. In general, this is the rare case when real tasks from prod(a) appear on LeetCode.

Decision

real number– this is essentially a number for presenting continuous physical quantities (mass, voltage, lighting, etc.).

First, let’s decide on the scheme of the syntactic rib of the vevestven number. Here is a diagram of rational number analysis

Looking at this diagram, you can distinguish 5 or 6 distinct states.

typedef enum{
	PARSE_NUMBER_STATE_PARSE_SIGN=1,
	PARSE_NUMBER_STATE_PARSE_INTEGER=2,
	PARSE_NUMBER_STATE_PARSE_FRACTION=3,
	PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN=4,
	PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER=5,
	PARSE_NUMBER_STATE_DONE=6,

	PARSE_NUMBER_STATE_UNDEF=0
}ParseNumberStates_t;

That is why I will solve this task with the help of an automatic methodology. Here is the main function. It scrolls through the terminal automaton to recognize the line number.

typedef struct {
    double value;
    double mantissa;
    double exponenta;
    char cur_letter;
    int8_t sign;
    uint64_t integer ;
    uint32_t fraction_order;
    double fraction;
    ParseNumberStates_t state;
    bool spot_mantissa;
    bool spot_exponent;
    int8_t exponent_sign;
    uint32_t exponent_integer;
}Text2NumberFsm_t;


bool try_str2real(const char* const text, double* const double_value){
    bool res = false;
    uint32_t len = 0;
    if(text){
        len = strlen(text);
        if(len){
            if(double_value){
                res = true;
            }
        }
    }

    if(res) {
        Text2NumberFsm_t Fsm;
        res = text_2_number_init(&Fsm);
        uint32_t i = 0;
        uint32_t ok = 0;
        uint32_t err = 0;
        LOG_DEBUG(STRING, "ParseDoubleIn:%u:[%s]", len, text);

        for(i=0; i<len; i++){
            Fsm.cur_letter = text[i];
            res = number_proc_one(&Fsm);
            if (res) {
                ok++;
            } else {
                err++;
                LOG_ERROR(STRING,"ProcErr: ch[%u]=%c ",i, text[i]);
                break;
            }
        }

        if(0==err) {
            Fsm.cur_letter="\n";
            res = number_proc_one(&Fsm);
            if (res) {
                ok++;
            } else {
                err++;
            }
        }
        if (0==err) {
            *double_value = Fsm.value;
            res = true;
        } else {
            LOG_ERROR(STRING,"len:%u ok:%u",len, ok);
            res = false;
        }
    }
    return res;
}

Here is the processing symbol. The algorithm changes depending on the current state of the state machine

static bool number_proc_one(Text2NumberFsm_t* Node){
    bool res = false;
    LOG_DEBUG(STRING, "Proc: %s",NumberParserFsm2Str(Node));
    if(Node) {
        switch (Node->state) {
            case PARSE_NUMBER_STATE_PARSE_SIGN:
                res = number_proc_parse_sign(Node); break;
            case PARSE_NUMBER_STATE_PARSE_INTEGER:
                res = number_proc_parse_integer(Node); break;
            case PARSE_NUMBER_STATE_PARSE_FRACTION:
                res = number_proc_parse_fracion(Node); break;
            case    PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN:
                res = number_proc_parse_exponenta_sign(Node); break;
            case    PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER:
                res = number_proc_parse_exponenta_integer(Node); break;
            case PARSE_NUMBER_STATE_DONE:
                res = number_proc_done(Node); break;

        default: res = false; break;
        }

    }
    return res;
}

Here is the initialization of the finite state machine for material number analysis.

static bool text_2_number_init(Text2NumberFsm_t* Node){
    bool res = false;
    if (Node) {
        Node->value = 0.0;
        Node->sign = 1;
        Node->integer = 0;
        Node->exponent_integer = 0;
        Node->fraction = 0.0;
        Node->fraction_order = 1;
        Node->spot_mantissa = false;
        Node->spot_exponent = false;
        Node->mantissa = 1.0;
        Node->exponent_sign = 1;
        Node->exponenta = 1.0;
        Node->cur_letter="\0";
        Node->state = PARSE_NUMBER_STATE_PARSE_SIGN;
        res = true;
    }
    return res;
}

And these auxiliary functions are handlers of each individual state so that all functions can fit separately on one screen



bool  number_compose_result(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	if(Node->spot_mantissa) {
            Node->value = ( (double)Node->sign ) * (((double) Node->integer) + Node->fraction);
            LOG_DEBUG(STRING,"Val:%f",Node->value);
            Node->state = PARSE_NUMBER_STATE_DONE;
            res = true;
        } else {
            LOG_ERROR(STRING,"NoMantissa");
    	}
    }
    return res;
}

static bool number_proc_parse_sign(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case 'E':
    case 'e': {
        Node->mantissa = 1.0;
        Node->spot_mantissa = false;
        Node->integer = 1;
        Node->fraction = 0.0;
        Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
        res = true;

    }break;

    case ' ': {
        res = true;
    }break;
    case '+': {
        Node->sign = 1;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '-': {
        Node->sign =-1;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '0':  {
        Node->sign =1;
        Node->spot_mantissa=true;
        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
        res = true;
    }break;
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        Node->spot_mantissa = true;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        Node->integer *=10;
        Node->integer += digit;

        Node->state = PARSE_NUMBER_STATE_PARSE_INTEGER;
    }break;
    case '.': {
    	Node->fraction_order = 1;
        Node->integer = 0;
        Node->spot_mantissa = false;
        Node->state = PARSE_NUMBER_STATE_PARSE_FRACTION;
        res = true;
    }break;
    case '\n':
        res = false;
        break;
    case '\r':
        res = false;
        break;
    default:
        res = false;
        break;
    }
    return res;
}

static bool number_mantissa_save(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
        Node->mantissa = ( (double)Node->sign   ) * (((double) Node->integer) + Node->fraction);
        LOG_DEBUG(STRING,"Mantissa:%f",Node->mantissa);
        res = true;
    }
    return res;
}

static bool number_proc_parse_integer(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case 'E':
    case 'e': {
        res = number_mantissa_save(Node);
        Node->state =PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
    }break;
    case ' ': {
        res = false;
    }break;
    case '.':{
    	Node->fraction_order=1;
        Node->state = PARSE_NUMBER_STATE_PARSE_FRACTION;
        res = true;
    } break;
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseInt %u", digit);
        Node->integer *= 10;
        Node->integer += digit;
        Node->spot_mantissa = true;
        res = true;
    }break;

    case '\r':
    case '\n':{
        res = number_compose_result(Node);
    } break;


    default:{
        res = false;
    } break;
    }
    return res;
}


static bool number_proc_parse_exponenta_sign(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case '+': {
        Node->exponent_sign = 1;
        Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '-': {
        Node->exponent_sign = -1;
        res = true;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
    }        break;

    case '0':{
    	Node->exponent_integer = 0;
    	Node->exponent_sign = 1;
    	Node->spot_exponent = true;
    	Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
    	res = true;
    }break;
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9':{
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseExpInt %u",digit);
        Node->exponent_integer *=10;
        Node->spot_exponent = true;
        Node->exponent_integer += digit;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '\r':
    case '\n':{res = false;} break;
    default: res = false; break;
    }
    return res;
}

static bool number_exponenta_calc(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	if(Node->spot_exponent ){
            int32_t rank = ((int32_t)Node->exponent_integer) * ((int32_t)Node->exponent_sign);
            LOG_DEBUG(STRING,"ExpRank %d",rank);
            double power = (double )rank;
            LOG_DEBUG(STRING,"power %f",power);
            if(rank < 306) {
                if(-306 < rank) {
                    Node->exponenta = pow(10.0,power);
                    LOG_DEBUG(STRING,"Exponenta: %f",Node->exponenta);
                    res = true;
                } else {
                	Node->exponenta = 0.0;
                    LOG_ERROR(STRING,"MathError:TooSmalExpPower %d",rank);
                    res = false;
                }
            } else {
                LOG_ERROR(STRING,"MathError:TooBigExpPower %d",rank);
                res = false;
            }
    	} else {
    		LOG_ERROR(STRING,"NoExponents");
    	}
    }
    return res ;
}

static bool number_proc_parse_exponenta_integer(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter) {
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res= try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        LOG_PARN(STRING,"ParseExpInt %u",digit);
        Node->exponent_integer *=10;
        Node->exponent_integer += digit;
        Node->spot_exponent = true;
        Node->state= PARSE_NUMBER_STATE_PARSE_EXPONENTA_INTEGER;
        res = true;
    }break;

    case '\r':
    case '\n': {
        res = number_exponenta_calc(Node);
        if (res) {
        	if(Node->spot_mantissa) {
                double mantissa = 0.0;
                mantissa =((double)Node->sign)*(((double)Node->integer)+((double)Node->fraction));
                LOG_DEBUG(STRING,"mantissa %f", mantissa);
                Node->value = mantissa*(Node->exponenta);
                Node->state= PARSE_NUMBER_STATE_DONE;
        	}else {
        		LOG_DEBUG(STRING,"NoMantissa");
        		res = false;
        	}
        }
    }break;
    default:
        res = false;
        break;
    }
    return res;
}

bool number_compose_mantissa(Text2NumberFsm_t* Node){
    bool res = false;
    if(Node) {
    	Node->mantissa =((double)Node->sign)*(((double)Node->integer)+((double)Node->fraction));
    	res = true;
    }
    return res;
}

static bool  number_proc_parse_fracion(Text2NumberFsm_t* Node){
    bool res = false;
    switch(Node->cur_letter){
    case 'E':
    case 'e':{
    	res = number_compose_mantissa(Node);
    	Node->state = PARSE_NUMBER_STATE_PARSE_EXPONENTA_SIGN;
    }break;
    case ' ': {
    	LOG_ERROR(STRING,"TornFlowErr!");
    	res = false;
    }break;
    case '.': {
    	LOG_ERROR(STRING,"DoubleDotErr!");
    	res = false;}break;
    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9': {
        uint8_t digit = 0;
        res = try_hex_char_to_u8((uint8_t) Node->cur_letter, &digit) ;
        double fraction_digit =  (    (double)digit  ) / (   pow(10.0, (double) Node->fraction_order) );
        LOG_DEBUG(STRING,"+ %f",fraction_digit);
        Node->spot_mantissa= true;
        Node->fraction += fraction_digit;

        Node->fraction_order++;
        res = true;
    }break;
    case '-':
    case '+': {res = false;}break;

    case '\r':
    case '\n':{
        res = number_compose_result(Node);
    } break;

    }
    return res;
}

static bool number_proc_done(Text2NumberFsm_t* Node){
    bool res = false;
    LOG_DEBUG(STRING, "Proc: %s",NumberParserFsm2Str(Node));
    switch(Node->cur_letter){
        case '\n':
        case '\r':{
            res = true;
        } break;
        default:{
            res = false;
        }break;
    }
    return res;
}

This solution was verified on the LeetCode website and turned out to be the fastest solution in the C programming language!

If you know the test data for the task of recognizing a real number from a string, write it in the comments.

Conclusion

State machines are ideal for parsing a variety of data types from strings, including doubles. If you learned to recognize the double type, then you can immediately claim that you learned to recognize uint8_t uint64_t, uint64_t, float with the same code. One double parser is universal!

Dictionary

Acronym

Transcript

DFA

Deterministic finite automaton

CSV

Comma-separated values

NMEA

National Marine Electronics Association

Links

https://leetcode.com/problems/valid-number/description/

https://docs.google.com/spreadsheets/d/1rbEw8p8CMHzM5DJUYGxDX-BCQ8sZsp4jDMRsbifJD9k/edit#gid=0

Question

– Do you know examples of real practical tasks on LeetCode, the solution of which would be necessary in real embedded projects?

Related posts