历史记录

清除记录

猜你想搜

AcWing热点
App
登录/注册

SGI-STL std::sort()硬核分析

作者：

王鶴凱 , 2024-04-11 23:19:32 , 所有人可见 , 阅读 16

SGI-STL sort()详细解析

“STL所提供的所有算法中，sort() 是最复杂最庞大的一个。”

0、前置内容

STL的sort算法，数据量大时采用了Quick Sort()，分段递归排序，一旦分段后的数据量小于某个门槛，为避免Quick Sort()的递归调用带来过大的额外负荷，就改用Insertion Sort，如果递归层次过深，还会改用Heap Sort()。

因此需要了解快速排序，插入排序和堆排序。

1、sort()源码剖析

源代码 https://github.com/gcc-mirror/gcc，采用分阶段展示代码方式，尝试解析sort()。

1.1 __sort()

  // sort

  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    inline void
    __sort(_RandomAccessIterator __first, _RandomAccessIterator __last,
       _Compare __comp)
    {
      if (__first != __last)
    {
      std::__introsort_loop(__first, __last,
                std::__lg(__last - __first) * 2,
                __comp);
      std::__final_insertion_sort(__first, __last, __comp);
    }
    }

我们可以看到sort()接受一个范围[first, last)，和一个仿函数__comp, 紧接着如果范围first!=last,就执行两个函数，稍后解释这两个函数，下面是对函数作用和参数的解释。

①__introsort_loop()，采用混合排序(quick sort and heap sort)，传递范围[first,last], 并计算 log(last-first)*2,传递仿函数。

introsort由来：由于不当的quick_sort()划分元素的选择，即为枢纽(pivot)的选择，导致不当的分割，使得quick_sort()恶化为O(n^2)。David R.Musser提出一种混合式排序算法：introspective Sorting(内省式排序) ，简称为IntroSort。其行为大部分情况下几乎与quick sort()完全相同，但当分割导致恶化倾向时，能够自我侦测，转而改用Heap Sort(),使效率维持在Heap Sort的O(NlogN)。

②__final_insertion_sort(),采用插入排序，传递范围[first,last],传递仿函数

2、introsort混合排序

2.1 __introsort_loop()

  /**
   *  @doctodo
   *  This controls some aspect of the sort routines.
  */
  enum { _S_threshold = 16 };
  /// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Size, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    void
    __introsort_loop(_RandomAccessIterator __first,
             _RandomAccessIterator __last,
             _Size __depth_limit, _Compare __comp)
    {
      while (__last - __first > int(_S_threshold))
    {
      if (__depth_limit == 0)
        {
          std::__partial_sort(__first, __last, __last, __comp);
          return;
        }
      --__depth_limit;
      _RandomAccessIterator __cut =
        std::__unguarded_partition_pivot(__first, __last, __comp);
      std::__introsort_loop(__cut, __last, __depth_limit, __comp);
      __last = __cut;
    }
    }

函数首先判断了[first, last)范围是否大于我们给定的阈值_S_threshold(16)，如果区间范围小于16的话，它将什么都不做，转而执行insert_sort();

紧接着判断了__depth_limit == 0，还记得吗？ __depth_limit是我们传递的参数 log(last-first) * 2，那么为什么要传递这个参数呢？我们知道，在理想的情况下，quick_sort每次正好都选取值为中间的元素作为枢纽(pivot)，那么整个快速排序产生的递归树就应该是log(N)层的，最坏情况下递归树的层次应该就是N层的。STL取log(N) * 2作为一个分解点。当递归层次大于log(N)*2层时，说明了quick_sort有了恶化的倾向，这时我们就应该采用heap sort。

tips：gcc通过了一种Median-of-Three（三点中值）的方法选择枢纽，为了避免”元素当初输入不够随机“所带来的恶化效应，最理想那个的方法就是取整个序列的头、尾、中央三个位置的元素，以其中值作为枢纽。这种做法被称为 median-of-tree paritioning。

然后每次执行一次quick_sort()，都会递减__depth_limit;让我们看看执行quick_sort 都做了些什么吧。

① __unguarded_partition_pivot()，无监督快速排序，传递范围[first, last)和仿函数，选择合适的pivot 返回给__cut;

②__introsort_loop() ，递归调用，传递范围[cut,last],深度限制__depth_limit,和仿函数

③__last = __cut，单边递归法

2.2 __unguarded_partition_pivot() 无边界检查快速排序

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    inline _RandomAccessIterator
    __unguarded_partition_pivot(_RandomAccessIterator __first,
                _RandomAccessIterator __last, _Compare __comp)
    {
      _RandomAccessIterator __mid = __first + (__last - __first) / 2;
      std::__move_median_to_first(__first, __first + 1, __mid, __last - 1,
                  __comp);
      return std::__unguarded_partition(__first + 1, __last, __first, __comp);
    }

  /// This is a helper function...
  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    _RandomAccessIterator
    __unguarded_partition(_RandomAccessIterator __first,
              _RandomAccessIterator __last,
              _RandomAccessIterator __pivot, _Compare __comp)
    {
      while (true)
    {
      while (__comp(__first, __pivot))
        ++__first;
      --__last;
      while (__comp(__pivot, __last))
        --__last;
      if (!(__first < __last))
        return __first;
      std::iter_swap(__first, __last);
      ++__first;
    }
    }

就像快速排序模板中做的那样，此函数实现了快速排序的前两步，1）确立分界点pivot；2）调整区间

首先函数取中值mid，然后调用了函数__move_dedian_to_first() ，我们可以看到此函数传递了四个变量，分别是first、first+1、mid、last-1、通过函数名称我们可以大致推断出此函数将三点中值返回给了first（此处不在给出源码是因为不想让复杂的源代码看花了眼，当然你也可以在这里找到源码)。那么哪三个点呢？是的，first+1、mid、last-1、选择它们中的中值是也就是完成了pivot的选择。它们的大哥first作为被中值赋值的元素枢纽站岗去了。

此处的三点中值是根据仿函数__comp确定的最适合的值，如存在a,b,c；comp(a,b)==false, comp(b,c)==true; 则b为中值

做好这一切之后，函数返回了__unguarded_partition() 像是快速排序一样进行调整区间，但是它是无边界检查的，但是它不会产生越界错误。因为在comp(first,pivot) 中一定有元素first大于pivot，comp(pivot,last)中一定有元素last小于pivot，它们都会停下来，毕竟pivot是其中某两个元素的中值；代码减少了两次边界判断，减少开始的一次边界判断和最后一次边界判断，略微提升速度。真是无不用其极啊！

但当仿函数存在问题时，可能导致异常。如当__comp 为 [](int a, int b){ return a<=b;}就可能导致comp比较时一路滑出范围导致越界错误~

2.3 单边递归法

    _RandomAccessIterator __cut = std::__unguarded_partition_pivot(__first, __last, __comp);
    std::__introsort_loop(__cut, __last, __depth_limit, __comp);
    __last = __cut;

或许开始看到这个递归代码有一点点疑惑，为什么只递归了一次？快排模板中调整区间后就会递归左右区间了，此处只递归了右区间，而左区间继续进行了循环。这样是为了节约递归调用的开销，省去左半区间的递归。（有待详细探究！）

2.4 堆排序

是时候让我们回过头来看看当__depth_limit == 0的时候的处理了，此时调用了std::__partial_sort(__first, __last, __last, __comp);进行了推排序；

  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    inline void
    __partial_sort(_RandomAccessIterator __first,
           _RandomAccessIterator __middle,
           _RandomAccessIterator __last,
           _Compare __comp)
    {
      std::__heap_select(__first, __middle, __last, __comp);
      std::__sort_heap(__first, __middle, __comp);
    }
  /// @cond undocumented

  /// This is a helper function for the sort routines.
  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    void
    __heap_select(_RandomAccessIterator __first,
          _RandomAccessIterator __middle,
          _RandomAccessIterator __last, _Compare __comp)
    {
      std::__make_heap(__first, __middle, __comp);
      for (_RandomAccessIterator __i = __middle; __i < __last; ++__i)
    if (__comp(__i, __first))
      std::__pop_heap(__first, __middle, __i, __comp);
    }

函数使用了基本的堆操作：make_heap(),pop_heap(),sort_heap()就不再过多赘述了。

3、插入排序

3.1 final_insertion_sort()

  enum { _S_threshold = 16 };

  /// This is a helper function for the sort routine.
  template<typename _RandomAccessIterator, typename _Compare>
    _GLIBCXX20_CONSTEXPR
    void
    __final_insertion_sort(_RandomAccessIterator __first,
               _RandomAccessIterator __last, _Compare __comp)
    {
      if (__last - __first > int(_S_threshold))
    {
      std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
      std::__unguarded_insertion_sort(__first + int(_S_threshold), __last,
                      __comp);
    }
      else
    std::__insertion_sort(__first, __last, __comp);
    }

最后就是对固定区间16大小的范围或者小于区间为16的范围进行插入排序了。