PHP => String Parsing

Remarks

Regular expressions should be reserved for tasks other than extracting substrings from strings or cutting strings into pieces.

Splitting a string by a separator

explode and strstr are simpler methods for getting substrings by a separator.

A string containing several parts of text that are separated by a common character can be split into parts with the explode function.

$fruits = "apple,pear,grapefruit,cherry";
print_r(explode(",",$fruits)); // ['apple', 'pear', 'grapefruit', 'cherry']

The function also supports a limit parameter, which can be used as follows:

$fruits = 'apple,pear,grapefruit,cherry';

If the limit parameter is zero, it is treated as 1.

print_r(explode(',',$fruits,0)); // ['apple,pear,grapefruit,cherry']

If limit is set and positive, the returned array contains at most limit elements, with the last element containing the rest of the string.

print_r(explode(',',$fruits,2)); // ['apple', 'pear,grapefruit,cherry']

If the limit parameter is negative, all components except the last -limit are returned.

print_r(explode(',',$fruits,-1)); // ['apple', 'pear', 'grapefruit']
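A common use of a positive limit is splitting only at the first separator, so that any later separators stay inside the last element. The sketch below assumes a made-up `key=value` setting line; `parseSetting` is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper: split "key=value" at the FIRST "=" only.
// limit = 2 keeps any further "=" characters inside the value.
function parseSetting(string $line): array
{
    return explode('=', $line, 2);
}

$parts = parseSetting('dsn=mysql:host=localhost;dbname=test');
// $parts is ['dsn', 'mysql:host=localhost;dbname=test']
```

Without the limit, the value would have been broken into three pieces at every "=".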

explode can be combined with list to parse a string into variables in one line:

$email = "user@example.com";
list($name, $domain) = explode("@", $email);

However, make sure that the result of explode contains enough elements, or an undefined index warning will be triggered.
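One way to avoid that warning, sketched here under the assumption that a missing part should simply become null, is to cap the split with a limit of 2 and pad the result with array_pad. `splitEmail` is a hypothetical helper, and the short `[$a, $b] = ...` destructuring syntax (equivalent to list(), PHP 7.1+) is used for brevity.

```php
<?php
// Hypothetical helper: split at the first "@" only, then pad to exactly
// two elements so destructuring never reads a missing index.
// null marks an absent part.
function splitEmail(string $email): array
{
    return array_pad(explode('@', $email, 2), 2, null);
}

[$name, $domain] = splitEmail('user@example.com'); // 'user', 'example.com'
[$name2, $domain2] = splitEmail('no-at-sign');      // 'no-at-sign', null
```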

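strstr returns the part of the string starting at the first occurrence of the given needle; with its third argument set to true, it instead returns only the part before that occurrence.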

$string = "1:23:456";
echo json_encode(explode(":", $string)); // ["1","23","456"]
var_dump(strstr($string, ":")); // string(7) ":23:456"

var_dump(strstr($string, ":", true)); // string(1) "1"
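strstr returns false when the needle does not occur at all, so the result should be checked before it is used as a string. A minimal sketch, where `beforeColon` is a hypothetical helper that falls back to the whole string:

```php
<?php
// Hypothetical helper: the text before the first ":", or the whole
// string when there is no ":" (strstr() returns false in that case).
function beforeColon(string $value): string
{
    $head = strstr($value, ':', true);
    return $head === false ? $value : $head;
}

var_dump(beforeColon('1:23:456')); // string(1) "1"
var_dump(beforeColon('123456'));   // string(6) "123456"
```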

Searching for a substring with strpos

strpos can be understood as the number of bytes in the haystack before the first occurrence of the needle.

var_dump(strpos("haystack", "hay")); // int(0)
var_dump(strpos("haystack", "stack")); // int(3)
var_dump(strpos("haystack", "stackoverflow")); // bool(false)
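Because the result counts bytes, it differs from a character position as soon as the haystack contains multi-byte UTF-8 characters. The sketch below assumes the mbstring extension is available (it is also needed for mb_substr) and that the source file is saved as UTF-8.

```php
<?php
// Each of these characters takes 2 bytes in UTF-8 (assumes a UTF-8 source file).
$text = "æøå";

var_dump(strpos($text, "ø"));                // int(2): byte offset
var_dump(mb_strpos($text, "ø", 0, 'UTF-8')); // int(1): character offset
```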

Checking if a substring exists

Be careful when checking against TRUE or FALSE: if an index of 0 is returned, an if statement will treat it as FALSE.

$pos = strpos("abcd", "a"); // $pos = 0;
$pos2 = strpos("abcd", "e"); // $pos2 = FALSE;

// Bad example of checking if a needle is found.
if($pos) { // 0 is treated as FALSE here.
    echo "1. I found your string\n";
}
else {
    echo "1. I did not find your string\n";
}

// Working example of checking if needle is found.
if($pos !== FALSE) {
    echo "2. I found your string\n";
}
else {
    echo "2. I did not find your string\n";
}

// Checking if a needle is not found
if($pos2 === FALSE) {
    echo "3. I did not find your string\n";
}
else {
    echo "3. I found your string\n";
}

Output of the whole example:

1. I did not find your string
2. I found your string
3. I did not find your string
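On PHP 8.0 and later, the built-in str_contains() returns a plain bool, which sidesteps the 0-versus-FALSE pitfall entirely when only the existence of the needle matters. This sketch assumes PHP 8.0+; on older versions, the strict !== FALSE comparison remains necessary.

```php
<?php
// Assumes PHP 8.0+: str_contains() answers "is it there?" with true/false
// only, so a match at index 0 is still reported as true.
var_dump(str_contains("abcd", "a")); // bool(true)
var_dump(str_contains("abcd", "e")); // bool(false)

// str_starts_with() (also PHP 8.0+) covers a match anchored at the start.
var_dump(str_starts_with("abcd", "ab")); // bool(true)
```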

Searching from an offset

// With offset we can search ignoring anything before the offset
$needle = "Hello";
$haystack = "Hello world! Hello World";

$pos = strpos($haystack, $needle, 1); // $pos = 13, not 0
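A typical use of the offset is to resume searching just after a previous match, for example to find the second occurrence of a needle. `secondPos` is a hypothetical helper written for this illustration.

```php
<?php
// Hypothetical helper: position of the second occurrence of $needle,
// or false if there are fewer than two occurrences.
function secondPos(string $haystack, string $needle)
{
    $first = strpos($haystack, $needle);
    if ($first === false) {
        return false;
    }
    // Resume the search right after the first match.
    return strpos($haystack, $needle, $first + strlen($needle));
}

var_dump(secondPos("Hello world! Hello World", "Hello")); // int(13)
var_dump(secondPos("Hello world!", "Hello"));             // bool(false)
```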

Getting all occurrences of a substring

$haystack = "a baby, a cat, a donkey, a fish";
$needle = "a ";
$offsets = [];
// start searching from the beginning of the string
for($offset = 0;
        // As a safeguard, stop once the offset reaches
        // the end of the string: nothing is left to search,
        // and strpos() rejects an offset beyond the end
        // (a warning in PHP 7, a ValueError in PHP 8).
        $offset < strlen($haystack); ){
    $pos = strpos($haystack, $needle, $offset);
    // we don't have anymore substrings
    if($pos === false) break;
    $offsets[] = $pos;
    // You may want to add strlen($needle) instead,
    // depending on whether you want to count "aaa"
    // as 1 or 2 "aa"s.
    $offset = $pos + 1;
}
echo json_encode($offsets); // [0,8,15,25]
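The choice described in the last comment of the loop can be checked on "aaa": advancing by 1 finds overlapping matches, while advancing by strlen($needle) does not. PHP's substr_count() counts non-overlapping occurrences, so it agrees with the second variant. `allPositions` and its `$overlapping` flag are hypothetical names for this sketch.

```php
<?php
// Hypothetical helper: every position of $needle in $haystack.
// $overlapping = true advances by 1 byte; false skips past the whole match.
function allPositions(string $haystack, string $needle, bool $overlapping): array
{
    if ($needle === '') {
        return []; // an empty needle would never advance the offset
    }
    $positions = [];
    $offset = 0;
    $step = $overlapping ? 1 : strlen($needle);
    while ($offset < strlen($haystack)
        && ($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + $step;
    }
    return $positions;
}

echo json_encode(allPositions("aaa", "aa", true)), "\n";  // [0,1]
echo json_encode(allPositions("aaa", "aa", false)), "\n"; // [0]
var_dump(substr_count("aaa", "aa"));                      // int(1)
```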

Parsing a string using regular expressions

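preg_match can be used to parse a string using a regular expression. The parts of the expression enclosed in parentheses are called subpatterns, and with them you can pick out individual parts of the string.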

$str = "<a href=\"http://example.org\">My Link</a>";
$pattern = "/<a href=\"(.*)\">(.*)<\/a>/";
$result = preg_match($pattern, $str, $matches);
if($result === 1) {
    // The string matches the expression
    print_r($matches);
} else if($result === 0) {
    // No match
} else {
    // Error occurred
}

Output

Array
(
    [0] => <a href="http://example.org">My Link</a>
    [1] => http://example.org
    [2] => My Link
)
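The greedy (.*) subpatterns work for a single link, but on a string holding two links the first one would run on into the second link. A sketch assuming the same simple markup (a double-quoted href and no other attributes) uses a negated character class and a lazy subpattern, and collects every link with preg_match_all:

```php
<?php
$str = '<a href="http://example.org">One</a> <a href="http://example.com">Two</a>';
// [^"]* stops at the closing quote; .*? stops at the first following </a>.
$pattern = '/<a href="([^"]*)">(.*?)<\/a>/';

// PREG_SET_ORDER groups the subpatterns of each match together.
$count = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    echo $match[1], " => ", $match[2], "\n";
}
// http://example.org => One
// http://example.com => Two
```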

Substring

substr returns the portion of the string specified by the start and length parameters.

var_dump(substr("Boo", 1)); // string(2) "oo"
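Both parameters also accept negative values: a negative start counts from the end of the string, and a negative length leaves that many bytes off the end.

```php
<?php
$s = "Hello World";

var_dump(substr($s, 6, 3));  // string(3) "Wor"
var_dump(substr($s, -5));    // string(5) "World": start 5 bytes from the end
var_dump(substr($s, 0, -6)); // string(5) "Hello": drop the last 6 bytes
```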

If there is a possibility of encountering multi-byte strings, it is safer to use mb_substr.

$cake = "cakeæøå";
var_dump(substr($cake, 0, 5)); // string(5) "cake�"
var_dump(mb_substr($cake, 0, 5, 'UTF-8')); // string(6) "cakeæ"
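The same byte/character distinction applies to length: strlen() counts bytes, while mb_strlen() counts characters in the given encoding. Again, this assumes the mbstring extension and a UTF-8 source file.

```php
<?php
$cake = "cakeæøå"; // assumes a UTF-8 source file

var_dump(strlen($cake));             // int(10): 4 ASCII bytes + 3 x 2 bytes
var_dump(mb_strlen($cake, 'UTF-8')); // int(7): characters
```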

Another variant is the substr_replace function, which replaces text within a portion of a string.

var_dump(substr_replace("Boo", "0", 1, 1)); // string(3) "B0o"
var_dump(substr_replace("Boo", "ts", strlen("Boo"))); // string(5) "Boots"
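substr_replace() follows the same start/length conventions as substr(), including negative values. The card number below is made-up sample data, used only to illustrate masking.

```php
<?php
// Replace the last word: start -5 counts from the end, length defaults to the rest.
var_dump(substr_replace("Hello World", "PHP", -5)); // string(9) "Hello PHP"

// Mask all but the last four characters of a fictional card number.
$card = "4111111111111111";
var_dump(substr_replace($card, str_repeat("*", strlen($card) - 4), 0, -4));
// string(16) "************1111"
```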

For example, say you want to find a specific word in a string and don't want to use a regular expression.

$hi = "Hello World!";
$bye = "Goodbye cruel World!";

var_dump(strpos($hi, " ")); // int(5)
var_dump(strpos($bye, " ")); // int(7)

var_dump(substr($hi, 0, strpos($hi, " "))); // string(5) "Hello"
var_dump(substr($bye, -1 * (strlen($bye) - strpos($bye, " ")))); // string(13) " cruel World!"

// If the casing in the text is not important, then using strtolower helps to compare strings
var_dump(substr($hi, 0, strpos($hi, " ")) == 'hello'); // bool(false)
var_dump(strtolower(substr($hi, 0, strpos($hi, " "))) == 'hello'); // bool(true)
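One pitfall with this approach: when the text contains no space, strpos() returns false, which substr() then treats as a length of 0, yielding an empty string. A hypothetical `firstWord` helper can handle that case explicitly; strcasecmp() is shown as an alternative to strtolower() for a case-insensitive comparison.

```php
<?php
// Hypothetical helper: the text up to the first space, or the whole
// text when it contains no space (strpos() returns false then).
function firstWord(string $text): string
{
    $space = strpos($text, " ");
    return $space === false ? $text : substr($text, 0, $space);
}

var_dump(firstWord("Hello World!")); // string(5) "Hello"
var_dump(firstWord("Hello"));        // string(5) "Hello"

// strcasecmp() returns 0 when the strings are equal ignoring ASCII case.
var_dump(strcasecmp(firstWord("Hello World!"), "hello") === 0); // bool(true)
```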

Another option is very basic parsing of an email address.

$email = "user@example.com";
$wrong = "foobar.co.uk";
$notld = "foo@bar";

$at = strpos($email, "@"); // int(4)
$wat = strpos($wrong, "@"); // bool(false)
$nat = strpos($notld, "@"); // int(3)

$domain = substr($email, $at + 1); // string(11) "example.com"
$womain = substr($wrong, $wat + 1); // string(11) "oobar.co.uk"
$nomain = substr($notld, $nat + 1); // string(3) "bar"

$dot = strpos($domain, "."); // int(7)
$wot = strpos($womain, "."); // int(5)
$not = strpos($nomain, "."); // bool(false)

$tld = substr($domain, $dot + 1); // string(3) "com"
$wld = substr($womain, $wot + 1); // string(5) "co.uk"
$nld = substr($nomain, $not + 1); // string(2) "ar"

// string(25) "user@example.com is valid"
if ($at && $dot) var_dump("$email is valid");
else var_dump("$email is invalid");

// string(23) "foobar.co.uk is invalid"
if ($wat && $wot) var_dump("$wrong is valid");
else var_dump("$wrong is invalid");

// string(18) "foo@bar is invalid"
if ($nat && $not) var_dump("$notld is valid");
else var_dump("$notld is invalid");

// string(28) "foobar.co.uk is a UK address"
if ($tld == "co.uk") var_dump("$email is a UK address");
if ($wld == "co.uk") var_dump("$wrong is a UK address");
if ($nld == "co.uk") var_dump("$notld is a UK address");
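As the last check shows, this kind of manual parsing is easy to fool: the invalid foobar.co.uk is still reported as a UK address. For the validation step itself, PHP's filter extension provides FILTER_VALIDATE_EMAIL. The sketch below swaps that built-in check in and only extracts the domain once the address has passed; `emailDomain` is a hypothetical helper, and splitting at the last "@" with strrpos() is a choice made here, not part of the original code.

```php
<?php
// Hypothetical helper: validate with the built-in filter instead of
// hand-written checks, then take everything after the last "@".
function emailDomain(string $email): ?string
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return null; // not a valid address
    }
    return substr($email, strrpos($email, "@") + 1);
}

var_dump(emailDomain("user@example.com")); // string(11) "example.com"
var_dump(emailDomain("foobar.co.uk"));     // NULL
```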

Or putting "Continue reading" or "..." at the end of a blurb:

$blurb = "Lorem ipsum dolor sit amet";
$limit = 20;

var_dump(substr($blurb, 0, $limit - 3) . '...'); // string(20) "Lorem ipsum dolor..."
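The one-liner always appends "..." and cuts by bytes. A hypothetical `truncate` helper, sketched under the assumption that the mbstring extension is available, only shortens text that is actually too long and counts characters rather than bytes:

```php
<?php
// Hypothetical helper: shorten $text to at most $limit characters,
// including the suffix, and leave shorter text untouched.
function truncate(string $text, int $limit, string $suffix = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    $keep = max(0, $limit - mb_strlen($suffix, 'UTF-8'));
    return mb_substr($text, 0, $keep, 'UTF-8') . $suffix;
}

var_dump(truncate("Lorem ipsum dolor sit amet", 20)); // string(20) "Lorem ipsum dolor..."
var_dump(truncate("Lorem ipsum", 20));                // string(11) "Lorem ipsum"
```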

Modified text is an extract of the original Stack Overflow Documentation

Licensed under CC BY-SA 3.0

Not affiliated with Stack Overflow
