C++ => C ++での最適化

空の基底クラスの最適化

オブジェクトは1バイト未満を占めることはできません。したがって、この型の配列のメンバーは同じアドレスを持ちます。したがって、 sizeof(T)>=1常に成立する。また、派生したクラスがその基底クラスのどれよりも小さくなることはできません。ただし、基本クラスが空の場合、そのサイズは必ずしも派生クラスに追加されるわけではありません。

class Base {};

class Derived : public Base
{
public:
    int i;
};

この場合、オブジェクト内のタイプごとに異なるアドレスを持つために、 Derived内のBaseバイトを割り当てる必要はありません。空の基底クラスの最適化が行われ（パディングは不要）、 sizeof(Derived) == sizeof(int) 、つまり空のベースに対して追加の割り当ては行われません。これは複数の基底クラスでも可能です（C ++では、複数の基底が同じ型を持つことができないので、それから問題は発生しません）。

これは、 Derivedの最初のメンバーの型が基本クラスのいずれかと異なる場合にのみ実行できます。これには、直接的または間接的な共通基盤が含まれます。ベースの1つと同じタイプ（または共通のベースがある場合）の場合、同じタイプの異なる2つのオブジェクトが同じアドレスを持たないようにするには、少なくとも1バイトを割り当てる必要があります。

パフォーマンスの概要

CおよびC ++は高性能言語としてよく知られています。主に、コードのカスタマイズが大量であるため、ユーザーは構造の選択によってパフォーマンスを指定できます。

最適化する場合は、関連するコードをベンチマークし、コードの使用方法を完全に理解することが重要です。

一般的な最適化ミスは次のとおりです。

時期尚早の最適化：複雑なコードは、最適化後に悪化する可能性があり、時間と労力を浪費します。まず第一に、最適化されたコードではなく、正確で維持可能なコードを書くことです。
間違ったユースケースの最適化： 1％のオーバーヘッドを追加することは、他の99％
マイクロ最適化：コンパイラはこれを非常に効率的に行い、マイクロ最適化はコンパイラがコードをさらに最適化する能力を損なうことさえあります

典型的な最適化目標は次のとおりです。

仕事を少なくする
より効率的なアルゴリズム/構造を使用するには
ハードウェアをより有効に活用するには

最適化されたコードには、以下を含むマイナスの副作用があります。

メモリ使用量の増加
複雑なコード - 読み込みや維持が難しい
APIとコードデザインの妥協

より少ないコード実行で最適化

最適化の最も簡単なアプローチは、コードを少なくすることです。このアプローチは、通常、コードの時間複雑さを変更することなく、一定のスピードアップを提供します。

このアプローチは明確なスピードアップを提供しますが、これはコードが多く呼び出される場合にのみ顕著な改善をもたらします。

役に立たないコードを削除する

void func(const A *a); // Some random function

// useless memory allocation + deallocation for the instance
auto a1 = std::make_unique<A>();
func(a1.get()); 

// making use of a stack object prevents 
auto a2 = A{};
func(&a2);

C ++ 14

C ++ 14から、コンパイラは、このコードを最適化して、割り振り解除と一致する割り振り解除を取り除くことができます。

コードを1回だけ実行する

std::map<std::string, std::unique_ptr<A>> lookup;
// Slow insertion/lookup
// Within this function, we will traverse twice through the map lookup an element
// and even a thirth time when it wasn't in
const A *lazyLookupSlow(const std::string &key) {
    if (lookup.find(key) != lookup.cend())
        lookup.emplace_back(key, std::make_unique<A>());
    return lookup[key].get();
}

// Within this function, we will have the same noticeable effect as the slow variant while going at double speed as we only traverse once through the code
const A *lazyLookupSlow(const std::string &key) {
    auto &value = lookup[key];
    if (!value)
        value = std::make_unique<A>();
    return value.get();
}

この最適化に対する同様のアプローチを使用して、安定したuniqueなバージョンを実装することができます

std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // As insert returns if the insertion was successful, we can deduce if the element was already in or not
        // This prevents an insertion, which will traverse through the map for every unique element
        // As a result we can almost gain 50% if v would not contain any duplicates
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

無駄な再割り当てやコピー/移動の防止

前の例では、すでにstd :: setのルックアップを防いでいましたが、 std::vectorまだストレージを再割り当てする必要があるアルゴリズムが増えています。これは、最初に正しいサイズに予約することで防ぐことができます。

std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    // By reserving 'result', we can ensure that no copying or moving will be done in the vector
    // as it will have capacity for the maximum number of elements we will be inserting
    // If we make the assumption that no allocation occurs for size zero
    // and allocating a large block of memory takes the same time as a small block of memory
    // this will never slow down the program
    // Side note: Compilers can even predict this and remove the checks the growing from the generated code
    result.reserve(v.size());
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // See example above
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

効率的なコンテナの使用

適時に適切なデータ構造を使用することで最適化することで、コードの時間的複雑さを変えることができます。

// This variant of stableUnique contains a complexity of N log(N)
// N > number of elements in v
// log(N) > insert complexity of std::set
std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // See Optimizing by executing less code
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

異なる要素を格納するコンテナ（ツリーではなくハッシュコンテナ）を使用することで、実装を複雑性Nに変換できます。副作用として、std :: stringの比較演算子をless挿入された文字列が同じバケットに終わるはずです。

// This variant of stableUnique contains a complexity of N
// N > number of elements in v
// 1 > insert complexity of std::unordered_set
std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::unordered_set<std::string> checkUnique;
    for (const auto &s : v) {
        // See Optimizing by executing less code
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

小さなオブジェクトの最適化

小さなオブジェクトの最適化は、例えばstd::string （時にはShort / Small String Optimizationと呼ばれる）のような、低レベルのデータ構造内で使用される技法です。これは、コンテンツが予約済みのスペースに収まるのに十分小さい場合には、割り当てられたメモリの代わりにスタックスペースをバッファとして使用することを意味します。

余分なメモリオーバーヘッドと余分な計算を追加することにより、高価なヒープ割り当てを防止しようとします。この手法の利点は、使用法に依存し、誤って使用された場合でもパフォーマンスを損なう可能性があります。

例

この最適化を使用して文字列を実装する非常に素朴な方法は、次のとおりです。

#include <cstring>

class string final
{
    constexpr static auto SMALL_BUFFER_SIZE = 16;

    bool _isAllocated{false};                       ///< Remember if we allocated memory
    char *_buffer{nullptr};                         ///< Pointer to the buffer we are using
    char _smallBuffer[SMALL_BUFFER_SIZE]= {'\0'};   ///< Stack space used for SMALL OBJECT OPTIMIZATION

public:
    ~string()
    {
        if (_isAllocated)
            delete [] _buffer;
    }        

    explicit string(const char *cStyleString)
    {
        auto stringSize = std::strlen(cStyleString);
        _isAllocated = (stringSize > SMALL_BUFFER_SIZE);
        if (_isAllocated)
            _buffer = new char[stringSize];
        else
            _buffer = &_smallBuffer[0];
        std::strcpy(_buffer, &cStyleString[0]);
    }

    string(string &&rhs)
       : _isAllocated(rhs._isAllocated)
       , _buffer(rhs._buffer)
       , _smallBuffer(rhs._smallBuffer) //< Not needed if allocated
    {
        if (_isAllocated)
        {
           // Prevent double deletion of the memory
           rhs._buffer = nullptr;
        }
        else
        {
            // Copy over data
            std::strcpy(_smallBuffer, rhs._smallBuffer);
            _buffer = &_smallBuffer[0];
        }
    }
    // Other methods, including other constructors, copy constructor,
    // assignment operators have been omitted for readability
};

上のコードで分かるように、 new操作やdelete操作を防ぐために、いくつかの複雑さが追加されています。これに加えて、クラスには、2つの場合を除いて使用されない可能性のある、より大きなメモリフットプリントがあります。

多くの場合、 _bufferポインタ内のbool値_isAllocatedをビット操作で _bufferして、単一のインスタンスのサイズを縮小しようとします（ _bufferビット：サイズを8バイト小さくすることができます）。最適化は、プラットフォームのアラインメントルールがわかっている場合にのみ可能です。

いつ使用しますか？

この最適化では多くの複雑さが加わるため、この最適化をすべてのクラスで使用することはお勧めしません。一般的に使用される低レベルのデータ構造でしばしば遭遇するでしょう。一般的なC ++ 11 standard library実装では、 std::basic_string<>およびstd::function<>使用法を見つけることができます。

この最適化は、格納されたデータがバッファよりも小さい場合にのみメモリ割り当てを防止するため、クラスが小さなデータでよく使用される場合にのみメリットをもたらします。

この最適化の最終的な欠点は、バッファを移動するときに余分な労力が必要となり、バッファが使用されない場合よりも移動操作が高価になることである。これは、バッファに非PODタイプが含まれている場合に特に当てはまります。

Modified text is an extract of the original Stack Overflow Documentation

ライセンスを受けた CC BY-SA 3.0

所属していない Stack Overflow

C++
C ++での最適化

サーチ…