Intel x86 Assembly Language & Microarchitecture
Converting decimal strings to integers


Remarks

Converting strings to integers is a common task.

This section shows how to convert decimal strings to integers.

The pseudocode for the conversion is:

function string_to_integer(str):
    result = 0
    for (each character in str, left to right):
        result = result * 10
        add ((code of the character) - (code of character 0)) to result
    return result

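Handling hexadecimal strings is a little harder, because character codes are generally not contiguous across several classes of characters, such as digits (0-9) and letters (a-f and A-F). Within a single class of characters the codes usually are contiguous, and only digits are handled here, so the code below targets only environments where the character codes of the digits are contiguous.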
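As a point of comparison, the pseudocode can be sketched in C. This is a minimal sketch, not part of the original listings: the name string_to_integer_c is hypothetical, and it assumes the input holds only digits whose codes are contiguous, with no overflow checking.

```c
/* Hypothetical C rendering of the pseudocode above.
   Assumes '0'..'9' have contiguous codes and the input holds digits only. */
unsigned int string_to_integer_c(const char *str) {
    unsigned int result = 0;
    for (; *str != '\0'; str++) {
        /* shift the digits accumulated so far one decimal place to the left */
        result = result * 10;
        /* (code of the character) - (code of character 0) */
        result += (unsigned int)(*str - '0');
    }
    return result;
}
```

Calling string_to_integer_c("12345") yields 12345. Like the assembly version that follows, the sketch silently wraps on overflow.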

IA-32 assembly, GAS, cdecl calling convention

# make this routine available outside this translation unit
.globl string_to_integer

string_to_integer:
    # function prologue
    push %ebp
    mov %esp, %ebp
    push %esi

    # initialize result (%eax) to zero
    xor %eax, %eax
    # fetch pointer to the string
    mov 8(%ebp), %esi

    # clear high bits of %ecx to be used in addition
    xor %ecx, %ecx
    # do the conversion
string_to_integer_loop:
    # fetch a character
    mov (%esi), %cl
    # exit the loop when the NUL terminator is reached
    test %cl, %cl
    jz string_to_integer_loop_end
    # multiply the result by 10
    mov $10, %edx
    mul %edx
    # convert the character to number and add it
    sub $'0', %cl
    add %ecx, %eax
    # proceed to next character
    inc %esi
    jmp string_to_integer_loop
string_to_integer_loop_end:

    # function epilogue
    pop %esi
    leave
    ret

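This GAS-style code converts the decimal string given as the first argument, which is pushed on the stack before the function is called, to an integer and returns it in %eax. The value of %esi is saved because it is a callee-saved register and the function uses it.

Overflow/wrapping and invalid characters are not checked, to keep the code simple.

In C, this code can be used as follows (assuming unsigned int and pointers are 4 bytes long):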

#include <stdio.h>

unsigned int string_to_integer(const char* str);

int main(void) {
    const char* testcases[] = {
        "0",
        "1",
        "10",
        "12345",
        "1234567890",
        NULL
    };
    const char** data;
    for (data = testcases; *data != NULL; data++) {
        printf("string_to_integer(%s) = %u\n", *data, string_to_integer(*data));
    }
    return 0;
}

Note: in some environments, the two occurrences of string_to_integer in the assembly code must be changed to _string_to_integer (with a leading underscore) for the code to link with the C code.

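MS-DOS, TASM/MASM function to read a 16-bit unsigned integer

Read a 16-bit unsigned integer from input.

This function uses the interrupt service Int 21/AH=0Ah to read a buffered string.
Using a buffered string lets the user review what they typed before passing it to the program for processing.
Up to five digits are read (65535 = 2^16 - 1 has five digits); the sixth byte of the buffer holds the terminating CR.

Besides performing the standard conversion from numeral to number, this function also detects invalid input and overflow (a number too big to fit in 16 bits).

Return values

The function returns the number read in AX. The flags ZF, CF and OF tell whether the operation completed successfully and, if not, why it failed.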

Error	AX	ZF	CF	OF
None	The 16-bit integer	Set	Not set	Not set
Invalid input	The partially converted number, up to the last valid digit encountered	Not set	Set	Not set
Overflow	7fffh	Not set	Set	Set

ZF can be used to quickly tell valid input from invalid input.
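The three outcomes in the table can be sketched in C, with an enum standing in for the flag combinations. This is only an illustration of the conversion logic, not of the DOS input call: the names parse_status and parse_uint16 are hypothetical, and the input is assumed to be a CR-terminated string such as the one Int 21/AH=0Ah leaves in the buffer.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for the ZF/CF/OF combinations. */
enum parse_status { PARSE_OK, PARSE_INVALID, PARSE_OVERFLOW };

/* Converts a CR-terminated decimal string; *out plays the role of AX. */
enum parse_status parse_uint16(const char *s, uint16_t *out) {
    uint32_t value = 0;               /* wider than 16 bits, like DX:AX */
    for (; *s != '\r'; s++) {
        if (*s < '0' || *s > '9') {   /* also stops on a stray NUL */
            *out = (uint16_t)value;   /* partially converted number */
            return PARSE_INVALID;
        }
        value = value * 10 + (uint32_t)(*s - '0');
        if (value > 0xFFFFu) {        /* the DX != 0 test in the assembly */
            *out = 0x7FFF;            /* same value the assembly leaves in AX */
            return PARSE_OVERFLOW;
        }
    }
    *out = (uint16_t)value;
    return PARSE_OK;
}
```

As in the assembly, the overflow check runs after every digit, so the accumulated value never needs more than 32 bits.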

Usage

call read_uint16
jo _handle_overflow            ;Number too big (Optional, the test below will do)
jnz _handle_invalid            ;Number format is invalid

;Here AX is the number read

Code

;Returns:
  ;
  ;If the number is correctly converted:
  ;   ZF = 1, CF = 0, OF = 0
  ;   AX = number
  ;
  ;If the user input an invalid digit:
  ;   ZF = 0, CF = 1, OF = 0
  ;   AX = Partially converted number
  ;
  ;If the user input a number too big
  ;   ZF = 0, CF = 1, OF = 1
  ;   AX = 07fffh
  ;
  ;ZF/CF can be used to discriminate valid vs invalid inputs
  ;OF can be used to discriminate the invalid inputs (overflow vs invalid digit)
  ;
  ;Note: SI is used as the read pointer and is not preserved
  ;
  read_uint16:
    push bp   
    mov bp, sp

    ;This code is an example in Stack Overflow Documentation project.
    ;x86/Converting Decimal strings to integers


    ;Create the buffer structure on the stack

    sub sp, 06h                ;Reserve 6 bytes on the stack (5 digits + CR)
    push 0006h                 ;Header

    push ds
    push bx
    push cx
    push dx

    ;Set DS = SS

    mov ax, ss
    mov ds, ax                           
                  

    ;Call Int 21/AH=0A

    lea dx, [bp-08h]            ;Address of the buffer structure
    mov ah, 0ah
    int 21h

    ;Start converting

    lea si, [bp-06h]
    xor ax, ax
    mov bx, 10
    xor cx, cx

   _r_ui16_convert:

    ;Get current char

    mov cl, BYTE PTR [si]
    inc si

    ;Check if end of string

    cmp cl, CR_CHAR
    je _r_ui16_end                      ;ZF = 1, CF = 0, OF = 0
   
    ;Convert char into digit and check

    sub cl, '0'
    jb _r_ui16_carry_end                ;ZF = 0, CF = 1, OF = X -> 0
    cmp cl, 9
    ja _r_ui16_carry_end                ;ZF = 0, CF = 0 -> 1, OF = X -> 0


    ;Update the partial result (taking care of overflow)

    ;AX = AX * 10
    mul bx

    ;DX:AX = DX:AX + CX
    add ax, cx
    adc dx, 0
       
    test dx, dx
    jz _r_ui16_convert                  ;No overflow

    ;set OF and CF
    mov ax, 8000h
    dec ax                              ;AX = 7fffh, OF = 1, ZF = 0
    stc

    jmp _r_ui16_end                     ;ZF = 0, CF = 1, OF = 1
    
   _r_ui16_carry_end:

    or bl, 1                ;Clear OF and ZF
    stc                     ;Set carry
 
    ;ZF = 0, CF = 1, OF = 0

   _r_ui16_end:
    ;Don't mess with flags hereafter!

    pop dx
    pop cx
    pop bx
    pop ds

    mov sp, bp

    pop bp
    ret

    CR_CHAR EQU 0dh

NASM porting

To port the code to NASM, remove the PTR keyword from memory accesses (for example, mov cl, BYTE PTR [si] becomes mov cl, BYTE [si]).

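MS-DOS, TASM/MASM function to print a 16-bit number in binary, quaternary, octal, hex

Print a number in binary, quaternary, octal, hexadecimal and a general power of two

All bases that are a power of two, such as binary (2^1), quaternary (2^2), octal (2^3) and hexadecimal (2^4), have an integral number of bits per digit¹.
So to retrieve each digit² of a numeral, we simply break the number into groups of n bits, starting from the LSb (the right).
For example, for the quaternary base we break a 16-bit number into groups of two bits. There are 8 such groups.
Not every power-of-two base has an integral number of groups that fits 16 bits. For example, the octal base has 5 groups of 3 bits, accounting for 3·5 = 15 bits out of 16 and leaving a partial group of 1 bit³.

The algorithm is simple: each group is isolated with a shift followed by an AND operation.
This procedure works for every group size or, in other words, for any power-of-two base.

To show the digits in the right order, the function starts by isolating the most significant group (the leftmost). It is therefore important to know a) how many bits D a group has and b) the bit position S where the leftmost group starts.
These values are precomputed and stored in carefully crafted constants.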
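The shift-and-mask procedure described above can be sketched in C. This is a hedged illustration rather than a translation of the routine: the name format_pow2 and its parameters are hypothetical, the digits are written to a caller-supplied buffer instead of being printed through DOS, and bits is assumed to be between 1 and 4 because the digit table has 16 entries.

```c
#include <stdint.h>

/* Hypothetical sketch: writes n in base 2^bits into out, leftmost digit first.
   Assumes 1 <= bits <= 4 and that out has room for at least 17 bytes. */
void format_pow2(uint16_t n, unsigned bits, int leading_zeros, char *out) {
    static const char digits[] = "0123456789abcdef";
    unsigned mask = (1u << bits) - 1u;                       /* (1 << D) - 1 */
    int shift = (int)(bits * ((16 + bits - 1) / bits - 1));  /* S */
    int printing = leading_zeros;
    for (; shift >= 0; shift -= (int)bits) {
        unsigned digit = ((unsigned)n >> shift) & mask;      /* isolate one group */
        printing |= (int)digit;      /* once a non-zero digit is seen, keep printing */
        if (printing || shift == 0)  /* the last digit is always emitted */
            *out++ = digits[digit];
    }
    *out = '\0';
}
```

With n = 241 and bits = 4, the buffer receives "f1" without leading zeros and "00f1" with them.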

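Parameters

The parameters must be pushed on the stack.
Each one is 16 bits wide.
They are listed in order of push.

Parameter	Description
N	The number to convert
Base	The base to use, expressed with the constants BASE2, BASE4, BASE8 and BASE16
Print leading zeros	If zero, no non-significant zeros are printed; otherwise they are. The number 0 is still printed as "0"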

Usage

push 241
push BASE16
push 0
call print_pow2              ;Prints f1

push 241
push BASE16
push 1
call print_pow2              ;Prints 00f1

push 241
push BASE2
push 0
call print_pow2              ;Prints 11110001

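Note for TASM users: if you put the constants defined with EQU after the code that uses them, enable multi-pass with the /m flag of TASM, or you will get the error "Forward reference needs override".

Code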

;Parameters (in order of push):
;
;number
;base (Use constants below)
;print leading zeros
print_pow2:
 push bp
 mov bp, sp

 push ax
 push bx
 push cx
 push dx
 push si
 push di

 ;Get parameters into the registers

 ;SI = Number (left) to convert
 ;CH = Amount of bits to shift for each digit (D)
 ;CL = Amount of bits to shift the number (S)
 ;BX = Bit mask for a digit

 mov si, WORD PTR [bp+08h]
 mov cx, WORD PTR [bp+06h]            ;CL = D, CH = S

 ;Computes BX = (1 << D)-1
 
 mov bx, 1
 shl bx, cl
 dec bx

 xchg cl, ch              ;CL = S, CH = D

_pp2_convert:
 mov di, si
 shr di, cl
 and di, bx         ;DI = Current digit

 or WORD PTR [bp+04h], di             ;If digit is non zero, [bp+04h] will become non zero
                      ;If [bp+04h] was non zero, result is non zero
 jnz _pp2_print                       ;Simply put, if the result is non zero, we must print the digit

 
 ;Here we have a non significant zero
 ;We should skip it BUT only if it is not the last digit (0 should be printed as "0" not
 ;an empty string)

 test cl, cl
 jnz _pp_continue


_pp2_print:
 ;Convert digit to its character and print it

 
 mov dl, BYTE PTR [DIGITS + di]
 mov ah, 02h
 int 21h


_pp_continue:
 ;Remove digit from the number

 sub cl, ch
 jnc _pp2_convert

 pop di
 pop si
 pop dx
 pop cx
 pop bx
 pop ax

 pop bp
 ret 06h

Data

This data must be put in the data segment, the one reached by `DS`.

DIGITS    db    "0123456789abcdef"

;Format for each WORD is S D where S and D are bytes (S the higher one)
;D = Bits per digit  --> log2(BASE)
;S = Initial shift count --> D*[ceil(16/D)-1]

BASE2    EQU    0f01h
BASE4    EQU    0e02h
BASE8    EQU    0f03h
BASE16    EQU    0c04h

NASM porting

To port the code to NASM, remove the PTR keyword from memory accesses (for example, mov si, WORD PTR [bp+08h] becomes mov si, WORD [bp+08h]).

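Extending the function

The function can easily be extended to any base up to 2^255, though every base above 2^16 prints the same numeral, because the number is only 16 bits wide.

To add a base:

Define a new constant BASEx, where x is 2^n.
The lower byte, named D, is D = n.
The upper byte, named S, is the bit position of the leftmost group. It can be calculated as S = n · (⌈16/n⌉ - 1).
Add the necessary digits to the string DIGITS.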

Example: adding base 32

We have D = 5 and S = 15, so we define BASE32 EQU 0f05h.
We then add sixteen more digits: DIGITS db "0123456789abcdefghijklmnopqrstuv"
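The computation of S and the packing of the constant can be checked with a few lines of C. This is a sketch under stated assumptions: the helper name make_base_constant is hypothetical, and n is assumed to lie between 1 and 255 so that it fits in the low byte.

```c
#include <stdint.h>

/* Hypothetical helper: builds the BASEx constant for base 2^n (1 <= n <= 255).
   High byte = S (bit position of the leftmost group), low byte = D = n. */
uint16_t make_base_constant(unsigned n) {
    unsigned groups = (16 + n - 1) / n;   /* ceil(16 / n) in integer arithmetic */
    unsigned s = n * (groups - 1);        /* S = n * (ceil(16 / n) - 1) */
    return (uint16_t)((s << 8) | n);
}
```

For n = 5 the helper returns 0f05h, matching BASE32, and for n = 1 to 4 it reproduces BASE2, BASE4, BASE8 and BASE16.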

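As should be clear, the digits can be changed by editing the DIGITS string.

¹ If B is a base, it has B digits by definition. The number of bits per digit is therefore log2(B). For power-of-two bases this simplifies to log2(2^n) = n, which is an integer by definition.

² In this context it is implicitly assumed that the base under consideration is a power of two, 2^n.

³ For a base B = 2^n to have an integral number of bit groups, it must be that n | 16 (n divides 16). Since the only prime factor of 16 is 2, n must itself be a power of two. So B has the form 2^(2^k) or, equivalently, log2(log2(B)) must be an integer.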

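MS-DOS, TASM/MASM function to print a 16-bit number in decimal

Print a 16-bit unsigned number in decimal

The interrupt service Int 21/AH=02h is used to print the digits.
The standard conversion from number to numeral is performed with the div instruction. The divisor is initially the highest power of ten that fits in 16 bits (10^4), and it is reduced to the next lower power at each iteration.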
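The decreasing-divisor conversion can be sketched in C as follows. It is an illustration only: the name format_dec is hypothetical, and the digits go to a caller-supplied buffer rather than to Int 21/AH=02h.

```c
#include <stdint.h>

/* Hypothetical sketch: writes n in decimal into out (at least 6 bytes). */
void format_dec(uint16_t n, int leading_zeros, char *out) {
    unsigned divisor = 10000;    /* highest power of ten that fits in 16 bits */
    unsigned rest = n;           /* number left to print */
    int printing = leading_zeros;
    while (divisor != 0) {
        unsigned digit = rest / divisor;   /* highmost remaining digit */
        rest %= divisor;                   /* number without that digit */
        printing |= (int)digit;
        if (printing || divisor == 1)      /* make sure 0 prints as "0" */
            *out++ = (char)('0' + digit);
        divisor /= 10;                     /* 1 / 10 == 0 ends the loop */
    }
    *out = '\0';
}
```

With n = 56 the buffer receives "56" without leading zeros and "00056" with them.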

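Parameters

The parameters are listed in order of push.
Each one is 16 bits.

Parameter	Description
number	The 16-bit unsigned number to print in decimal
show leading zeros	If 0, no non-significant zeros are printed; otherwise they are. The number 0 is always printed as "0".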

Usage

push 241
push 0
call print_dec          ;prints 241

push 56
push 1
call print_dec          ;prints 00056

push 0
push 0
call print_dec          ;prints 0

Code

;Parameters (in order of push):
;
;number
;Show leading zeros
print_dec:
 push bp
 mov bp, sp

 push ax
 push bx
 push cx
 push dx

 ;Set up registers:
 ;AX = Number left to print
 ;BX = Power of ten to extract the current digit
 ;DX = Scratch/Needed for DIV
 ;CX = Scratch

 mov ax, WORD PTR [bp+06h]
 mov bx, 10000d
 xor dx, dx

_pd_convert:     
 div bx                           ;DX = Number without highmost digit, AX = Highmost digit
 mov cx, dx                       ;Number left to print

 ;If digit is non zero or param for leading zeros is non zero
 ;print the digit
 or WORD PTR [bp+04h], ax
 jnz _pd_print

 ;If both are zeros, make sure to show at least one digit so that 0 prints as "0"
 cmp bx, 1
 jne _pd_continue

_pd_print:

 ;Print digit in AL

 mov dl, al
 add dl, '0'
 mov ah, 02h
 int 21h

_pd_continue:
 ;BX = BX/10
 ;DX = 0

 mov ax, bx
 xor dx, dx
 mov bx, 10d
 div bx
 mov bx, ax

 ;Put what's left of the number in AX again and repeat...
 mov ax, cx

 ;...Until the divisor is zero
 test bx, bx
 jnz _pd_convert

 pop dx
 pop cx
 pop bx
 pop ax

 pop bp
 ret 04h

NASM porting

To port the code to NASM, remove the PTR keyword from memory accesses (for example, mov ax, WORD PTR [bp+06h] becomes mov ax, WORD [bp+06h]).

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow