Conversation
cjdb
left a comment
There was a problem hiding this comment.
Overall, it looks great: well done!
There are a few things that need to be addressed, but this is some excellent work.
| set(COMPUTECPP_DEVICE_COMPILER_FLAGS "${COMPUTECPP_DEVICE_COMPILER_FLAGS} -DSYCL_PSTL_USE_OLD_ALGO") | ||
|
|
||
| include_directories("${COMPUTECPP_INCLUDE_DIRECTORY}") | ||
| include_directories("${ComputeCpp_DIR}/include") |
There was a problem hiding this comment.
We should probably change this to target_include_directories.
| # | ||
| # TODO: update CMakeLists.txt to work with updated FindComputeCpp.cmake | ||
| # | ||
| #add_subdirectory (tests) |
There was a problem hiding this comment.
No commented-out code in the MR please.
| set(device_compiler_cxx_standard "-std=c++11") | ||
| elseif (targetCxxStandard MATCHES 98) | ||
| message(FATAL_ERROR "SYCL implementations cannot be compiled using C++98") | ||
| message(FATAL_ERROR "SYCL applications cannot be compiled using C++98") |
There was a problem hiding this comment.
Can you please add a check for C++03 too?
| bool sorted = true; | ||
|
|
||
| std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">" | ||
| << std::endl; |
There was a problem hiding this comment.
endl isn't necessary here, please replace with "\n".
There was a problem hiding this comment.
Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood push me towards std::endl anyway... :-)
| std::cout << "out : "; | ||
| for (size_t j = 0; j < v.size(); j++) { | ||
| std::cout << (v[j]) << (j == v.size() - 1 ? "" : ", "); | ||
| } |
| for (size = 2; size < item.get_local_range(0); size <<= 1) { | ||
| dir = (item.get_local_id(0) / size & 1) * -1; | ||
|
|
||
| for (stride = size; stride > 1; stride >>= 1) { |
There was a problem hiding this comment.
This is interesting. Why are we halving the stride each time?
| typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf; | ||
| typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf; | ||
| const unsigned int m_stage; | ||
| const int mDir; |
There was a problem hiding this comment.
Please choose a name that's more meaningful than "dir" (I don't know what that is).
There was a problem hiding this comment.
Sorting direction. Zero means 'ascending', one means 'descending'.
| } // passStage | ||
| } // stage | ||
| } // bitonic_sort | ||
| #endif |
There was a problem hiding this comment.
Please provide a comment to indicate which #if you're ending.
| #include <experimental/algorithm> | ||
| #include <sycl/execution_policy> | ||
|
|
||
| using namespace std::experimental::parallel; |
There was a problem hiding this comment.
I don't think this is used at all.
| #include <vector> | ||
|
|
||
| #include <experimental/algorithm> | ||
| #include <sycl/execution_policy> |
There was a problem hiding this comment.
Please place these two above the other three includes.
Ruyk
left a comment
There was a problem hiding this comment.
Many thanks for the contribution. Seems to me there should be some additional clarifications on the algorithm, specially all the magic values, so it is clearer how it is work. An overall explanation may be useful.
| #target_link_libraries(${SOURCE_NAME} PUBLIC ) | ||
|
|
||
| add_sycl_to_target(${SOURCE_NAME} ${CMAKE_CURRENT_BINARY_DIR} | ||
| add_sycl_to_target( TARGET ${SOURCE_NAME} SOURCES |
There was a problem hiding this comment.
where are all this CMake changes coming from? did you have to do them or you just updated the cmake from some other project?
|
|
||
| sycl::sycl_execution_policy<> sycl_policy; | ||
|
|
||
| /* This sample tests the updated multi-kernel bitonic sort implementation. |
There was a problem hiding this comment.
is either a sample or a test, cannot be both! :-)
If this is the test, move it to the test directory and create a simpler sample.
Otherwise, create a simpler sample !
| sorted = sorted && test<float>(); | ||
| sorted = sorted && test<double>(); | ||
|
|
||
| return !sorted; |
There was a problem hiding this comment.
this looks like a test to me!
| namespace emulated_shuffle_builtins { | ||
|
|
||
| template <typename vec_type> | ||
| typename vec_type::element_type get_vector_component(vec_type &x, |
There was a problem hiding this comment.
why do you need the vector passed as reference here if its not modified?
| } | ||
| } | ||
|
|
||
| static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src, |
There was a problem hiding this comment.
can you add some documentation explaining what this method is trying to achieve?
| static_assert(std::is_arithmetic<T>::value, | ||
| "Bitonic sort implementation only works with arithmetic types"); | ||
| static_assert( | ||
| U == 4, |
There was a problem hiding this comment.
the 4 is in several places, worth promoting it to a constant
|
|
||
| relational_op_vec_type_ add1(1, 1, 3, 3); | ||
| relational_op_vec_type_ add2(2, 3, 2, 3); | ||
| relational_op_vec_type_ add3(4, 5, 6, 7); |
There was a problem hiding this comment.
need some explanation for the numbers
There was a problem hiding this comment.
See book: OpenCL in Action by Matthew Scarpino (Manning Publication)
| void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf, | ||
| size_t vectorSize) { | ||
| using namespace cl::sycl; | ||
| int direction = 0 /*0 = ascending, -1 = descending*/; |
There was a problem hiding this comment.
this is a configuration parameter, should be an enum class
include/sycl/algorithm/sort.hpp
Outdated
| .wait(); | ||
|
|
||
| q.wait_and_throw(); | ||
| return; |
There was a problem hiding this comment.
why is this return here? there is code afterwards
include/sycl/algorithm/sort.hpp
Outdated
| cgh.parallel_for<kernel_bitonic_sort_merge<T, Alloc>>( | ||
| ndrange, bitonic_sort_merge<T, 4>(g, l, stage, direction)); | ||
| }) | ||
| .wait(); |
There was a problem hiding this comment.
why do you have to wait immediately after every submit?
| } | ||
| } | ||
|
|
||
| static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src, |
There was a problem hiding this comment.
You're setting a range of bits at a given offset.
| read_bits32(mask.s0(), 0, k), read_bits32(mask.s1(), 0, k), | ||
| read_bits32(mask.s2(), 0, k), read_bits32(mask.s3(), 0, k)); | ||
|
|
||
| for (int i = 0; i < 4; ++i) { |
There was a problem hiding this comment.
Emulating vector with four components
| using namespace cl::sycl; | ||
| vec<gentypem_type, gentypem_size> ret; | ||
|
|
||
| const unsigned int k = 3; |
There was a problem hiding this comment.
In the spec for the shuffle built-in, only the k-least significant bits are used to determine the value used to select the component from the vector
x(ory),
| static_assert(std::is_arithmetic<T>::value, | ||
| "Bitonic sort implementation only works with arithmetic types"); | ||
| static_assert( | ||
| U == 4, |
|
|
||
| relational_op_vec_type_ add1(1, 1, 3, 3); | ||
| relational_op_vec_type_ add2(2, 3, 2, 3); | ||
| relational_op_vec_type_ add3(4, 5, 6, 7); |
There was a problem hiding this comment.
See book: OpenCL in Action by Matthew Scarpino (Manning Publication)
| typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf; | ||
| typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf; | ||
| const unsigned int m_stage; | ||
| const int mDir; |
There was a problem hiding this comment.
Sorting direction. Zero means 'ascending', one means 'descending'.
keryell
left a comment
There was a problem hiding this comment.
Great to have a new sorting algorithm.
Just a few stylistic comments.
| @@ -0,0 +1,152 @@ | |||
| /* Copyright (c) 2015 The Khronos Group Inc. | |||
There was a problem hiding this comment.
I am curious about why 2015 here...
| * the device. | ||
| * Note that for the moment the sycl variants of the algorithm | ||
| * are on the sycl namespace and not in std::experimental. | ||
| */ |
There was a problem hiding this comment.
Please use SYCL when you are meaning the standard :-)
| bool sorted = true; | ||
|
|
||
| std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">" | ||
| << std::endl; |
There was a problem hiding this comment.
Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood push me towards std::endl anyway... :-)
| for (int i = minInputSizeLog2; i <= maxInputSizeLog2; ++i) { | ||
|
|
||
| std::vector<T> v; | ||
| v.resize(1 << i); |
There was a problem hiding this comment.
Curious.
Why not
std::vector<T> v(1 << i);
?
|
|
||
| std::cout << "in : "; | ||
| for (int j = 0; j < v.size(); ++j) { | ||
| v[j] = init_num(static_cast<T>((v.size() - 1) - j), v.size()); |
There was a problem hiding this comment.
It could be a v.emplace_back(...) and then we could remove v pre-allocation...
But not that important for a test anyway...
| const typename bitonic_sort_base<T, U>::local_buffer_accessor_t &localBuf, | ||
| const unsigned int stage, const int dir) | ||
| : m_globalBuf(globalBuf), m_localBuf(localBuf), m_stage(stage), | ||
| mDir(dir) {} |
| typedef typename bitonic_sort_base<T, U>::data_vec_type data_vec_type_; | ||
| typedef | ||
| typename bitonic_sort_base<T, U>::mask_op_vec_type mask_op_vec_type_; | ||
| typedef typename bitonic_sort_base<T, U>::relational_op_vec_type |
| relational_op_vec_type_ comp, add; | ||
| unsigned int global_start, global_offset; | ||
|
|
||
| add = relational_op_vec_type_(4, 5, 6, 7); |
| cgh.parallel_for<kernel_bitonic_sort_phase_stage_n<T, Alloc>>( | ||
| ndrange, bitonic_sort_stage_n<T, 4>(g, l, stage, high_stage)); | ||
| }) | ||
| .wait(); |
| input1 = shuffle2(input1, input2, (comp).template as<mask_op_vec_type_>()); | ||
| input2 = shuffle2(input2, temp, (comp).template as<mask_op_vec_type_>()); | ||
| VECTOR_SORT(input1, mDir); | ||
| VECTOR_SORT(input2, mDir); |
There was a problem hiding this comment.
By browsing some code that seems similar at different level accress different kernels, I wonder whether it could not be possible to have some even more generic code and code refactorizing...
|
Floyd seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Summary
This pull request modifies SyclParallelSTL by adding an improved version of the the sort algorithm.
Updates:
Update sorting algorithm implementation
Update cmake files to use FindComputeCpp.cmake from ComputeCpp version 0.9.1
Add sample file which tests the new sorting algorithm
Notes
Since SYCL does not currently have the
shufflebuiltins, an additional namespace is added in "sort.hpp" containing emulated copies. Current these emulated builtin are hard-coded for vectors with 4 components.The Cmake files in the tests directory are yet to be updated