Intel x86 Assembly Language & Microarchitecture
Converting decimal strings to integers


Remarks

Converting strings to integers is one of the most common tasks.

Here we show how to convert decimal strings to integers.

The pseudocode for this is:

function string_to_integer(str):
    result = 0
    for (each character in str, left to right):
        result = result * 10
        add ((code of the character) - (code of character 0)) to result
    return result
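
As a rough illustration, the pseudocode above maps almost line by line onto C. This is only a sketch: the name string_to_integer_c is made up for this example (it is not the assembly routine shown later), and, like the pseudocode, it checks neither for overflow nor for invalid characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of the pseudocode (name chosen for this sketch).
   Like the pseudocode, it does not check for overflow or invalid characters. */
unsigned int string_to_integer_c(const char *str)
{
    unsigned int result = 0;

    assert(str != NULL);                       /* precondition added by this sketch */
    for (; *str != '\0'; str++) {              /* each character, left to right */
        result = result * 10;
        result += (unsigned int)(*str - '0');  /* character code minus code of '0' */
    }
    return result;
}
```

Calling string_to_integer_c("12345") yields 12345, and an empty string yields 0.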

Handling hexadecimal strings is a bit harder because character codes are typically not contiguous across multiple kinds of characters, such as digits (0-9) and letters (a-f and A-F). Character codes are typically contiguous within a single kind of character (here, digits), so we only deal with environments in which the character codes for the digits are contiguous.
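
To make the difficulty concrete, here is a minimal C sketch of one way to obtain the value of a single hexadecimal digit without assuming anything about how the character codes of digits and letters relate: it searches a table of digits instead of subtracting codes. The function name hex_digit_value is hypothetical, and hexadecimal conversion is otherwise outside the scope of this topic.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the value (0..15) of a hexadecimal digit,
   or -1 if c is not one. A table lookup avoids relying on the character
   codes of '9' and 'a' (or 'A') being adjacent. */
int hex_digit_value(char c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')          /* strchr would otherwise match the terminator */
        return -1;
    p = strchr(digits, tolower((unsigned char)c));
    return p != NULL ? (int)(p - digits) : -1;
}
```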

IA-32 assembly, GAS, cdecl calling convention

# make this routine available outside this translation unit
.globl string_to_integer

string_to_integer:
    # function prologue
    push %ebp
    mov %esp, %ebp
    push %esi

    # initialize result (%eax) to zero
    xor %eax, %eax
    # fetch pointer to the string
    mov 8(%ebp), %esi

    # clear high bits of %ecx to be used in addition
    xor %ecx, %ecx
    # do the conversion
string_to_integer_loop:
    # fetch a character
    mov (%esi), %cl
    # exit the loop when the NUL terminator is reached
    test %cl, %cl
    jz string_to_integer_loop_end
    # multiply the result by 10
    mov $10, %edx
    mul %edx
    # convert the character to number and add it
    sub $'0', %cl
    add %ecx, %eax
    # proceed to next character
    inc %esi
    jmp string_to_integer_loop
string_to_integer_loop_end:

    # function epilogue
    pop %esi
    leave
    ret

This GAS-style code converts the decimal string given as the first argument, which is pushed onto the stack before calling this function, to an integer and returns it in %eax. The value of %esi is saved and restored because it is a callee-saved register and the function uses it.

Overflow/wrapping and invalid characters are not checked, to keep the code simple.
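
For comparison, the checks that the assembly routine leaves out could look roughly like this in C. This is a sketch under assumptions: the name string_to_integer_checked and its return convention (1 on success, 0 on failure) are invented for this example and are not part of the routine above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical checked variant: returns 1 and stores the value in *out on
   success; returns 0 on an invalid character or if the value would not fit
   in an unsigned int. */
int string_to_integer_checked(const char *str, unsigned int *out)
{
    unsigned int result = 0;

    assert(str != NULL && out != NULL);
    for (; *str != '\0'; str++) {
        unsigned int digit;

        if (*str < '0' || *str > '9')
            return 0;                          /* invalid character */
        digit = (unsigned int)(*str - '0');
        if (result > (UINT_MAX - digit) / 10)
            return 0;                          /* result * 10 + digit would wrap */
        result = result * 10 + digit;
    }
    *out = result;
    return 1;
}
```

The overflow test is done before the multiplication, so the wrapped value is never computed.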

In C, this code can be used like this (assuming unsigned int and pointers are 4 bytes long):

#include <stdio.h>

unsigned int string_to_integer(const char* str);

int main(void) {
    const char* testcases[] = {
        "0",
        "1",
        "10",
        "12345",
        "1234567890",
        NULL
    };
    const char** data;
    for (data = testcases; *data != NULL; data++) {
        printf("string_to_integer(%s) = %u\n", *data, string_to_integer(*data));
    }
    return 0;
}

Note: in some environments, the two occurrences of string_to_integer in the assembly code have to be changed to _string_to_integer (with a leading underscore) for it to work with the C code.

MS-DOS, TASM / MASM function to read a 16-bit unsigned integer

Read a 16-bit unsigned integer from input.

This function uses the interrupt service Int 21/AH=0Ah to read a buffered string.
Using a buffered string lets the user review what they have typed before passing it to the program for processing.
Up to five digits are read (as 65535 = 2¹⁶ - 1 has five digits); the six-byte buffer also holds the terminating CR.

Besides performing the standard numeral-to-number conversion, this function also detects invalid input and overflow (a number too big to fit in 16 bits).

Return values

The function returns the number read in AX. The flags ZF, CF and OF tell whether the operation completed successfully and, if not, why.

Error	AX	ZF	CF	OF
None	The 16-bit integer	Set	Not set	Not set
Invalid input	The partially converted number, up to the last valid digit encountered	Not set	Set	Not set
Overflow	7fffh	Not set	Set	Set

ZF can be used to quickly tell valid input from invalid input.
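
The three outcomes in the table can be modelled in C as follows. This is only a sketch of the conversion step, not of the DOS input call: the names parse_status and parse_uint16 are hypothetical, an enum stands in for the flag combinations, and the input is assumed to be a CR-terminated buffer such as the one Int 21/AH=0Ah fills in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for the ZF/CF/OF combinations. */
enum parse_status { PARSE_OK, PARSE_INVALID, PARSE_OVERFLOW };

/* Converts a CR-terminated decimal string to a 16-bit value.
   On invalid input *value holds the partially converted number;
   on overflow it holds 0x7FFF, mirroring the table above. */
enum parse_status parse_uint16(const char *buf, unsigned int *value)
{
    unsigned long acc = 0;   /* at least 32 bits, so acc * 10 + 9 cannot wrap */

    assert(buf != NULL && value != NULL);
    for (; *buf != '\r'; buf++) {
        if (*buf < '0' || *buf > '9') {
            *value = (unsigned int)acc;
            return PARSE_INVALID;
        }
        acc = acc * 10 + (unsigned long)(*buf - '0');
        if (acc > 0xFFFFul) {
            *value = 0x7FFFu;
            return PARSE_OVERFLOW;
        }
    }
    *value = (unsigned int)acc;
    return PARSE_OK;
}
```

For instance, "65535" followed by CR converts successfully, "65536" followed by CR reports an overflow, and "12x" followed by CR reports invalid input with a partial value of 12.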

Usage

call read_uint16
jo _handle_overflow            ;Number too big (Optional, the test below will do)
jnz _handle_invalid            ;Number format is invalid

;Here AX is the number read

Code

;Returns:
  ;
  ;If the number is correctly converted:
  ;   ZF = 1, CF = 0, OF = 0
  ;   AX = number
  ;
  ;If the user input an invalid digit:
  ;   ZF = 0, CF = 1, OF = 0
  ;   AX = Partially converted number
  ;
  ;If the user input a number too big
  ;   ZF = 0, CF = 1, OF = 1
  ;   AX = 07fffh
  ;
  ;ZF/CF can be used to discriminate valid vs invalid inputs
  ;OF can be used to discriminate the invalid inputs (overflow vs invalid digit)
  ;
  ;Note: SI is used as the string pointer and is not preserved
  ;
  read_uint16:
    push bp   
    mov bp, sp

    ;This code is an example in Stack Overflow Documentation project.
    ;x86/Converting Decimal strings to integers


    ;Create the buffer structure on the stack

    sub sp, 06h                ;Reserve 6 bytes on the stack (5 digits + CR)
    push 0006h                 ;Header

    push ds
    push bx
    push cx
    push dx

    ;Set DS = SS

    mov ax, ss
    mov ds, ax                           
                  

    ;Call Int 21/AH=0A

    lea dx, [bp-08h]            ;Address of the buffer structure
    mov ah, 0ah
    int 21h

    ;Start converting

    lea si, [bp-06h]
    xor ax, ax
    mov bx, 10
    xor cx, cx

   _r_ui16_convert:

    ;Get current char

    mov cl, BYTE PTR [si]
    inc si

    ;Check if end of string

    cmp cl, CR_CHAR
    je _r_ui16_end                      ;ZF = 1, CF = 0, OF = 0
   
    ;Convert char into digit and check

    sub cl, '0'
    jb _r_ui16_carry_end                ;ZF = 0, CF = 1, OF = X -> 0
    cmp cl, 9
    ja _r_ui16_carry_end                ;ZF = 0, CF = 0 -> 1, OF = X -> 0


    ;Update the partial result (taking care of overflow)

    ;AX = AX * 10
    mul bx

    ;DX:AX = DX:AX + CX
    add ax, cx
    adc dx, 0
       
    test dx, dx
   jz _r_ui16_convert            ;No overflow
   
    ;set OF and CF
    mov ax, 8000h
    dec ax                           
    stc

   jmp _r_ui16_end                      ;ZF = 0, CF = 1, OF = 1
    
   _r_ui16_carry_end:

    or bl, 1                ;Clear OF and ZF
    stc                     ;Set carry
 
    ;ZF = 0, CF = 1, OF = 0

   _r_ui16_end:
    ;Don't mess with flags hereafter!

    pop dx
    pop cx
    pop bx
    pop ds

    mov sp, bp

    pop bp
    ret

    CR_CHAR EQU 0dh

Porting to NASM

To port the code to NASM, remove the PTR keyword from the memory accesses (e.g. mov cl, BYTE PTR [si] becomes mov cl, BYTE [si]).

MS-DOS, TASM / MASM function to print a 16-bit number in binary, quaternary, octal, hex

Print a number in binary, quaternary, octal, hexadecimal or any power-of-two base

All bases that are a power of two, such as binary (2¹), quaternary (2²), octal (2³) and hexadecimal (2⁴), have an integral number of bits per digit ¹.
So, to retrieve each digit ² of a numeral, we simply break the number into groups of n bits starting from the LSb (the right).
For example, for the quaternary base we break a 16-bit number into groups of two bits. There are 8 such groups.
Not every power-of-two base has an integral number of groups that fits 16 bits; for example, the octal base has 5 groups of 3 bits accounting for 3·5 = 15 of the 16 bits, leaving a partial group of 1 bit ³.

The algorithm is simple: we isolate each group with a shift followed by an AND operation.
This procedure works for every group size or, in other words, for any power-of-two base.

To show the digits in the right order, the function starts by isolating the most significant group (the leftmost one), so it is important to know: a) how many bits D a group is, and b) the bit position S where the leftmost group starts.
These values are precomputed and stored in carefully crafted constants.
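
The shift-and-mask idea described above can be sketched in C as follows. The function name format_pow2 is hypothetical; unlike the assembly routine, this sketch computes S from D at run time, only handles D from 1 to 4, and writes the digits into a caller-supplied buffer instead of printing them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: writes the digits of the 16-bit number n in base 2^d
   into out (NUL-terminated) and returns how many digits were written.
   out must have room for 17 characters. */
size_t format_pow2(unsigned int n, unsigned int d, int leading_zeros, char *out)
{
    static const char digits[] = "0123456789abcdef";

    assert(out != NULL && d >= 1u && d <= 4u);

    unsigned int mask = (1u << d) - 1u;                 /* (1 << D) - 1 */
    int shift = (int)(d * ((16u + d - 1u) / d - 1u));   /* S = D * (ceil(16/D) - 1) */
    size_t len = 0;

    n &= 0xFFFFu;                                       /* treat n as a 16-bit value */
    for (; shift >= 0; shift -= (int)d) {
        unsigned int digit = (n >> shift) & mask;       /* isolate the current group */

        /* Skip non-significant zeros, but always emit the last digit. */
        if (digit != 0 || leading_zeros || shift == 0) {
            out[len++] = digits[digit];
            leading_zeros = 1;                          /* later zeros are significant */
        }
    }
    out[len] = '\0';
    return len;
}
```

With n = 241 and d = 4 the buffer receives "f1", or "00f1" when leading zeros are requested.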

Parameters

The parameters must be pushed on the stack.
Each one is 16 bits wide.
They are shown in order of push.

Parameter	Description
N	The number to convert
Base	The base to use, expressed with the constants `BASE2`, `BASE4`, `BASE8` and `BASE16`
Print leading zeros	If zero, no non-significant zeros are printed; otherwise they are. The number 0 is printed as "0" regardless

Usage

push 241
push BASE16
push 0
call print_pow2              ;Prints f1

push 241
push BASE16
push 1
call print_pow2              ;Prints 00f1

push 241
push BASE2
push 0
call print_pow2              ;Prints 11110001

Note for TASM users: if you put the constants defined with EQU after the code that uses them, enable multi-pass with the /m flag of TASM, or you will get a "Forward reference needs override" error.

Code

;Parameters (in order of push):
;
;number
;base (Use constants below)
;print leading zeros
print_pow2:
 push bp
 mov bp, sp

 push ax
 push bx
 push cx
 push dx
 push si
 push di

 ;Get parameters into the registers

 ;SI = Number (left) to convert
 ;CH = Amount of bits to shift for each digit (D)
 ;CL = Amount of bits to shift the number (S)
 ;BX = Bit mask for a digit

 mov si, WORD PTR [bp+08h]
 mov cx, WORD PTR [bp+06h]            ;CL = D, CH = S

 ;Computes BX = (1 << D)-1
 
 mov bx, 1
 shl bx, cl
 dec bx

 xchg cl, ch              ;CL = S, CH = D

_pp2_convert:
 mov di, si
 shr di, cl
 and di, bx         ;DI = Current digit

 or WORD PTR [bp+04h], di             ;If digit is non zero, [bp+04h] will become non zero
                      ;If [bp+04h] was non zero, result is non zero
 jnz _pp2_print                       ;Simply put, if the result is non zero, we must print the digit

 
 ;Here we have a non significant zero
 ;We should skip it BUT only if it is not the last digit (0 should be printed as "0" not
 ;an empty string)

 test cl, cl
 jnz _pp_continue


_pp2_print:
 ;Convert digit to its character and print it

 
 mov dl, BYTE PTR [DIGITS + di]
 mov ah, 02h
 int 21h


_pp_continue:
 ;Remove digit from the number

 sub cl, ch
jnc _pp2_convert

 pop di
 pop si
 pop dx
 pop cx
 pop bx
 pop ax

 pop bp
 ret 06h

Data

This data must be put in the data segment, the one reached by `DS`.

DIGITS    db    "0123456789abcdef"

;Format for each WORD is S D where S and D are bytes (S the higher one)
;D = Bits per digit  --> log2(BASE)
;S = Initial shift count --> D*[ceil(16/D)-1]

BASE2    EQU    0f01h
BASE4    EQU    0e02h
BASE8    EQU    0f03h
BASE16    EQU    0c04h

Porting to NASM

To port the code to NASM, remove the PTR keyword from the memory accesses (e.g. mov si, WORD PTR [bp+08h] becomes mov si, WORD [bp+08h]).

Extending the function

The function can easily be extended to any base up to 2²⁵⁵, though every base above 2¹⁶ prints the same numeral, as the number is only 16 bits wide.

To add a base:

Define a new constant BASEx where x is 2ⁿ.
The lower byte, named D, is D = n.
The upper byte, named S, is the position, in bits, of the highest group. It can be computed as S = n · (⌈16 / n⌉ - 1).
Add the necessary digits to the DIGITS string.
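
The arithmetic behind such a constant can be checked with a few lines of C. This is a sketch for verification only: the helper name pow2_base_constant is hypothetical, and it is restricted here to n from 1 to 16.

```c
#include <assert.h>

/* Hypothetical helper: returns the 16-bit constant for base 2^n,
   with S in the high byte and D = n in the low byte. */
unsigned int pow2_base_constant(unsigned int n)
{
    assert(n >= 1u && n <= 16u);

    unsigned int groups = (16u + n - 1u) / n;   /* ceil(16 / n) */
    unsigned int s = n * (groups - 1u);         /* bit position of the highest group */

    return (s << 8) | n;
}
```

For n = 1, 2, 3 and 4 this gives 0f01h, 0e02h, 0f03h and 0c04h, matching BASE2, BASE4, BASE8 and BASE16.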

Example: adding base 32

We have D = 5 and S = 15, so we define BASE32 EQU 0f05h.
We then add sixteen more digits: DIGITS db "0123456789abcdefghijklmnopqrstuv".

As should be clear, the digits can be changed by editing the DIGITS string.

¹ If B is a base, then it has B digits by definition. The number of bits per digit is thus log₂(B). For power-of-two bases this simplifies to log₂(2ⁿ) = n, which is an integer by definition.

² In this context it is implicitly assumed that the base under consideration is a power-of-two base 2ⁿ.

³ For a base B = 2ⁿ to have an integral number of bit groups, n must divide 16 (n | 16). Since the only prime factor of 16 is 2, n must itself be a power of two. So B has the form 2^(2^k) or, equivalently, log₂(log₂(B)) must be an integer.

MS-DOS, TASM / MASM function to print a 16-bit number in decimal

Print a 16-bit unsigned number in decimal

The interrupt service Int 21/AH=02h is used to print the digits.
The standard number-to-numeral conversion is performed with the div instruction; the divisor is initially the highest power of ten that fits in 16 bits (10⁴) and is reduced to the next lower power at each iteration.
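
The same division scheme can be sketched in C. The function name format_dec16 is hypothetical, and the digits are written to a caller-supplied buffer rather than printed through Int 21/AH=02h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: writes the decimal digits of the 16-bit number n
   into out (NUL-terminated) and returns how many digits were written.
   out must have room for 6 characters. */
size_t format_dec16(unsigned int n, int leading_zeros, char *out)
{
    unsigned int divisor;
    size_t len = 0;

    assert(out != NULL && n <= 0xFFFFu);
    for (divisor = 10000u; divisor != 0; divisor /= 10u) {
        unsigned int digit = n / divisor;   /* highest remaining digit */

        n %= divisor;                       /* what is left to print */

        /* Skip non-significant zeros, but always emit the units digit. */
        if (digit != 0 || leading_zeros || divisor == 1u) {
            out[len++] = (char)('0' + digit);
            leading_zeros = 1;              /* later zeros are significant */
        }
    }
    out[len] = '\0';
    return len;
}
```

With n = 56 and leading zeros requested the buffer receives "00056"; with n = 0 it receives "0".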

Parameters

The parameters are shown in order of push.
Each one is 16 bits.

Parameter	Description
number	The 16-bit unsigned number to print in decimal
show leading zeros	If 0, no non-significant zeros are printed; otherwise they are. The number 0 is always printed as "0"

Usage

push 241
push 0
call print_dec          ;prints 241

push 56
push 1
call print_dec          ;prints 00056

push 0
push 0
call print_dec          ;prints 0

Code

;Parameters (in order of push):
;
;number
;Show leading zeros
print_dec:
 push bp
 mov bp, sp

 push ax
 push bx
 push cx
 push dx

 ;Set up registers:
 ;AX = Number left to print
 ;BX = Power of ten to extract the current digit
 ;DX = Scratch/Needed for DIV
 ;CX = Scratch

 mov ax, WORD PTR [bp+06h]
 mov bx, 10000d
 xor dx, dx

_pd_convert:     
 div bx                           ;DX = Number without highmost digit, AX = Highmost digit
 mov cx, dx                       ;Number left to print

 ;If digit is non zero or param for leading zeros is non zero
 ;print the digit
 or WORD PTR [bp+04h], ax
 jnz _pd_print

 ;If both are zeros, make sure to show at least one digit so that 0 prints as "0"
 cmp bx, 1
 jne _pd_continue

_pd_print:

 ;Print digit in AL

 mov dl, al
 add dl, '0'
 mov ah, 02h
 int 21h

_pd_continue:
 ;BX = BX/10
 ;DX = 0

 mov ax, bx
 xor dx, dx
 mov bx, 10d
 div bx
 mov bx, ax

 ;Put what's left of the number in AX again and repeat...
 mov ax, cx

 ;...Until the divisor is zero
 test bx, bx
jnz _pd_convert

 pop dx
 pop cx
 pop bx
 pop ax

 pop bp
 ret 04h

Porting to NASM

To port the code to NASM, remove the PTR keyword from the memory accesses (e.g. mov ax, WORD PTR [bp+06h] becomes mov ax, WORD [bp+06h]).

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow