C++ => C ++의 최적화

빈 기본 클래스 최적화

객체는 1 바이트 미만을 차지할 수 없으므로이 유형의 배열 멤버는 동일한 주소를가집니다. 따라서 sizeof(T)>=1 항상 유지됩니다. 이는 파생 클래스는 기본 클래스들보다 작을 수 없습니다 것 또한 사실이다. 그러나 기본 클래스가 비어있는 경우 크기가 파생 클래스에 반드시 추가되지는 않습니다.

class Base {};

class Derived : public Base
{
public:
    int i;
};

이 경우 Derived 내의 Base 바이트를 할당하여 객체 당 유형별로 고유 한 주소를 가질 필요가 없습니다. 빈 기본 클래스 최적화가 수행되고 패딩이 필요하지 않으면 sizeof(Derived) == sizeof(int) 즉 빈 기본에 대해 추가 할당이 수행되지 않습니다. 이것은 여러 기본 클래스에서도 가능합니다 (C ++에서는 여러 개의 기본 클래스가 동일한 유형을 가질 수 없기 때문에 아무런 문제가 발생하지 않습니다).

Derived 클래스의 첫 번째 멤버가 기본 클래스의 형식과 다른 경우에만 수행 할 수 있습니다. 여기에는 직간접 적 공통 기초가 포함됩니다. 하나의베이스와 동일한 유형 (또는 공통베이스가있는 경우) 인 경우 동일한 유형의 두 개의 별개의 오브젝트가 동일한 주소를 갖지 않도록 최소한 하나의 바이트를 할당해야합니다.

실적 소개

C 및 C ++은 고성능 언어로 잘 알려져 있습니다. 주로 코드 사용자 정의가 많아 사용자가 구조를 선택하여 성능을 지정할 수 있습니다.

최적화 할 때 관련 코드를 벤치 마크하고 코드 사용 방법을 완전히 이해하는 것이 중요합니다.

일반적인 최적화 실수는 다음과 같습니다.

시기 상조 최적화 : 복잡한 코드는 최적화 후 성능이 저하되어 시간과 노력을 낭비 할 수 있습니다. 최우선 과제는 최적화 된 코드가 아닌 정확 하고 유지 보수가 가능한 코드를 작성하는 것입니다.
잘못된 사용 사례에 대한 최적화 : 1 %에 대한 오버 헤드 추가는 다른 99 %에 대한 속도 저하의 가치가 없을 수도 있습니다.
마이크로 최적화 : 컴파일러는이를 매우 효율적으로 수행하며 마이크로 최적화는 컴파일러가 코드를 더욱 최적화 할 수있는 능력을 해칠 수 있습니다

일반적인 최적화 목표는 다음과 같습니다.

일을 적게하려면
보다 효율적인 알고리즘 / 구조를 사용하려면
하드웨어를보다 잘 활용하려면

최적화 된 코드에는 다음을 포함하여 부작용이있을 수 있습니다.

높은 메모리 사용
복잡한 코드 - 읽거나 유지하기 어렵다.
손상된 API 및 코드 디자인

적은 코드 실행으로 최적화

최적화에 대한 가장 간단한 방법은 코드를 적게 실행하는 것입니다. 이 방법은 대개 코드의 시간 복잡도를 변경하지 않고 고정 된 속도 향상을 제공합니다.

이 접근 방식은 명확한 속도 향상을 제공하지만 코드가 많이 호출 될 때만 눈에 띄는 향상을 제공합니다.

쓸데없는 코드 제거하기

void func(const A *a); // Some random function

// useless memory allocation + deallocation for the instance
auto a1 = std::make_unique<A>();
func(a1.get()); 

// making use of a stack object prevents 
auto a2 = A{};
func(&a2);

C ++ 14

C ++ 14에서 컴파일러는이 코드를 최적화하여 할당 및 일치 해제 할당을 제거 할 수 있습니다.

한 번만 코드하기

std::map<std::string, std::unique_ptr<A>> lookup;
// Slow insertion/lookup
// Within this function, we will traverse twice through the map lookup an element
// and even a thirth time when it wasn't in
const A *lazyLookupSlow(const std::string &key) {
    if (lookup.find(key) != lookup.cend())
        lookup.emplace_back(key, std::make_unique<A>());
    return lookup[key].get();
}

// Within this function, we will have the same noticeable effect as the slow variant while going at double speed as we only traverse once through the code
const A *lazyLookupSlow(const std::string &key) {
    auto &value = lookup[key];
    if (!value)
        value = std::make_unique<A>();
    return value.get();
}

이 최적화에 대한 유사한 접근 방식을 사용하여 안정적인 unique 버전을 구현할 수 있습니다

std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // As insert returns if the insertion was successful, we can deduce if the element was already in or not
        // This prevents an insertion, which will traverse through the map for every unique element
        // As a result we can almost gain 50% if v would not contain any duplicates
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

쓸데없는 재배치 및 복사 / 이동 방지

앞의 예에서 std :: set의 검색을 이미 막았지만 std::vector 에는 저장 알고리즘을 다시 할당해야하는 알고리즘이 계속 포함되어 있습니다. 처음에는 적당한 크기로 예약함으로써 예방할 수 있습니다.

std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    // By reserving 'result', we can ensure that no copying or moving will be done in the vector
    // as it will have capacity for the maximum number of elements we will be inserting
    // If we make the assumption that no allocation occurs for size zero
    // and allocating a large block of memory takes the same time as a small block of memory
    // this will never slow down the program
    // Side note: Compilers can even predict this and remove the checks the growing from the generated code
    result.reserve(v.size());
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // See example above
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

효율적인 컨테이너 사용

적시에 올바른 데이터 구조를 사용하여 최적화하면 코드의 시간 복잡성을 변경할 수 있습니다.

// This variant of stableUnique contains a complexity of N log(N)
// N > number of elements in v
// log(N) > insert complexity of std::set
std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::set<std::string> checkUnique;
    for (const auto &s : v) {
        // See Optimizing by executing less code
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

요소 (트리 대신 해시 컨테이너)를 저장하는 데 다른 구현을 사용하는 컨테이너를 사용하면 구현을 복잡도 N으로 변환 할 수 있습니다. 부작용으로 std :: string less 비교 연산자를 호출합니다. 삽입 된 문자열이 동일한 버킷으로 끝나야 만 호출되어야합니다.

// This variant of stableUnique contains a complexity of N
// N > number of elements in v
// 1 > insert complexity of std::unordered_set
std::vector<std::string> stableUnique(const std::vector<std::string> &v) {
    std::vector<std::string> result;
    std::unordered_set<std::string> checkUnique;
    for (const auto &s : v) {
        // See Optimizing by executing less code
        if (checkUnique.insert(s).second)
            result.push_back(s);
    }
    return result;
}

작은 객체 최적화

작은 객체 최적화는 낮은 수준의 데이터 구조, 예를 들어 std::string (때때로 Short / Small String Optimization이라고도 함) 내에서 사용되는 기술입니다. 예약 된 공간에 맞게 내용이 작을 경우 스택 공간을 일부 할당 된 메모리 대신 버퍼로 사용합니다.

여분의 메모리 오버 헤드와 추가 계산을 추가함으로써 값 비싼 힙 할당을 방지하려고 시도합니다. 이 기술의 장점은 사용법에 따라 다르며 잘못 사용하면 성능을 해칠 수도 있습니다.

예

이 최적화로 문자열을 구현하는 매우 순진한 방법은 다음과 같습니다.

#include <cstring>

class string final
{
    constexpr static auto SMALL_BUFFER_SIZE = 16;

    bool _isAllocated{false};                       ///< Remember if we allocated memory
    char *_buffer{nullptr};                         ///< Pointer to the buffer we are using
    char _smallBuffer[SMALL_BUFFER_SIZE]= {'\0'};   ///< Stack space used for SMALL OBJECT OPTIMIZATION

public:
    ~string()
    {
        if (_isAllocated)
            delete [] _buffer;
    }        

    explicit string(const char *cStyleString)
    {
        auto stringSize = std::strlen(cStyleString);
        _isAllocated = (stringSize > SMALL_BUFFER_SIZE);
        if (_isAllocated)
            _buffer = new char[stringSize];
        else
            _buffer = &_smallBuffer[0];
        std::strcpy(_buffer, &cStyleString[0]);
    }

    string(string &&rhs)
       : _isAllocated(rhs._isAllocated)
       , _buffer(rhs._buffer)
       , _smallBuffer(rhs._smallBuffer) //< Not needed if allocated
    {
        if (_isAllocated)
        {
           // Prevent double deletion of the memory
           rhs._buffer = nullptr;
        }
        else
        {
            // Copy over data
            std::strcpy(_smallBuffer, rhs._smallBuffer);
            _buffer = &_smallBuffer[0];
        }
    }
    // Other methods, including other constructors, copy constructor,
    // assignment operators have been omitted for readability
};

위의 코드에서 알 수 있듯이 일부 new 작업과 delete 작업을 방지하기 위해 몇 가지 추가 복잡성이 추가되었습니다. 이 클래스는 몇 가지 경우를 제외하고는 사용할 수없는 더 큰 메모리 사용량을 가지고 있습니다.

종종 _isAllocated 포인터 안에 bool 값 인 _isAllocated 를 비트 조작 으로 인코딩하여 단일 인스턴스의 크기를 줄이려고 _buffer 합니다 (인텔 64 비트 : 크기를 8 바이트 줄일 수 있음). 최적화는 플랫폼의 정렬 규칙이 무엇인지를 알고있을 때만 가능합니다.

언제 사용합니까?

이 최적화는 많은 복잡성을 추가하므로 모든 단일 클래스에서이 최적화를 사용하지 않는 것이 좋습니다. 자주 사용되는 저수준 데이터 구조에서 자주 접하게됩니다. 일반적인 C ++ 11 standard library 구현에서 std::basic_string<> 및 std::function<> 에서 사용법을 찾을 수 있습니다.

이 최적화는 저장된 데이터가 버퍼보다 작을 때만 메모리 할당을 방지하기 때문에 클래스가 작은 데이터와 함께 자주 사용되는 경우 이점을 제공합니다.

이 최적화의 마지막 단점은 버퍼를 이동할 때보다 많은 노력이 필요하므로 버퍼를 사용하지 않을 때보 다 이동 작업이 더 비쌉니다. 이는 버퍼가 비 POD 유형을 포함 할 때 특히 그렇습니다.

Modified text is an extract of the original Stack Overflow Documentation

아래 라이선스 CC BY-SA 3.0

와 제휴하지 않음 Stack Overflow

C++
C ++의 최적화

수색…