Skip to content

introduce ndarr support #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 87 commits into
base: main
Choose a base branch
from
Open

introduce ndarr support #98

wants to merge 87 commits into from

Conversation

kafka1991
Copy link
Collaborator

@kafka1991 kafka1991 commented Apr 1, 2025

introduce ndarr support

TODO Lists

  • Line protocol version tests in c, c++ and system tests (rust client done)
  • Rust client's support for common built-in array and vector APIs (for ease of use, less 3 dims)
  • add more system array tests
  • Python client support

@kafka1991 kafka1991 self-assigned this Apr 1, 2025
@kafka1991 kafka1991 marked this pull request as draft April 1, 2025 15:04
@kafka1991 kafka1991 marked this pull request as ready for review April 3, 2025 09:05
@@ -0,0 +1,133 @@
pub(crate) const MAX_DIMS: usize = 32;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this top-level in questb::ingress and make it public, then add a doc to it. It might be useful to callers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 58 to 59
/// Buffer outOfMemory.
BufferOutOfMemory,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -24,8 +24,8 @@

#![doc = include_str!("mod.md")]

pub use self::ndarr::*;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The * risks leaking too many types.
Instead just pick the ones you explicitly are sure you want in the public API.

@@ -29,6 +29,9 @@ mod http;
mod mock;
mod sender;

#[cfg(feature = "ndarray")]
mod ndarr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be tests for the trait directly as well, even without ndarray enabled.

This is to be sure that we never tie the ndarray feature to the ability to write arrays.

The tests that check the NdArrayView trait should also check what happens when the trait does nothing or less data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test to write is if the trait gives back the wrong number of dimensions.

Comment on lines 641 to 645
template <typename T, size_t N>
line_sender_buffer& column(
column_name_view name,
const std::vector<uint32_t>& shape,
const std::array<T, N>& data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to come up with something that's a bit more idiomatic C++.
Let's park the C++ API for now until we have everything else nailed down.

Comment on lines 636 to 639
self.f64serializer = match version {
LineProtocolVersion::V1 => f64_text_series,
LineProtocolVersion::V2 => f64_binary_series,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally function dispatch is actually a fair bit slower than a branch.
Remove this field and just use a match on the protocol.

Copy link
Collaborator

@amunra amunra May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could bias the V2 behaviour with std::hint::likely.
Inlining the f64 serialization functions is probably a good idea too.

@@ -953,8 +1012,7 @@ impl Buffer {
Error: From<N::Error>,
{
self.write_column_key(name)?;
let mut ser = F64Serializer::new(value);
self.output.extend_from_slice(ser.as_str().as_bytes());
(self.f64serializer)(&mut self.output, value);
Copy link
Collaborator

@amunra amunra May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if likely(!matches!(version, LineProtocolVersion::V1)) {
    // Protocols V2+
    binary
} else {
    // Protocol V1
    text
}


/// Line Protocol Version supported by current client.
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum LineProtocolVersion {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ProtocolVersion to match what I think is a better config setting, I.e. protocol_version=1|2|auto.

/// # {
/// # use questdb::Result;
/// # use questdb::ingress::Buffer;
/// # use ndarray::array;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't comment the use ndarray::array line here. Should be shown as part of the example.

/// Record a multidimensional array value for the given column.
///
/// Supports arrays with up to [`MAX_ARRAY_DIMS`] dimensions. The array elements must
/// implement [`ArrayElement`] trait which provides type-to-[`ElemDataType`] mapping.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ElemDataType has been renamed now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth listing all the supported datatypes here so users don't need to hunt around in the documentation.

Comment on lines 1099 to 1116
/// Using [`ColumnName`] for validated column names:
///
/// ```
/// # #[cfg(feature = "ndarray")]
/// # {
/// # use questdb::Result;
/// # use questdb::ingress::{Buffer, ColumnName};
/// # use ndarray::Array3;
/// # fn main() -> Result<()> {
/// # let mut buffer = Buffer::new();
/// # buffer.table("x1")?;
/// // Record a 3D array of f64 values
/// let array_3d = Array3::from_elem((2, 3, 4), 42f64);
/// let col_name = ColumnName::new("col1")?;
/// buffer.column_arr(col_name, &array_3d.view())?;
/// # Ok(())
/// # }
/// # }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this example and show one with slices and another with vectors.

You can use some headings here.

Run

cargo doc --open

To preview.

@@ -1199,6 +1426,9 @@ pub struct Sender {
handler: ProtocolHandler,
connected: bool,
max_buf_size: usize,
default_line_protocol_version: LineProtocolVersion,
#[cfg(feature = "ilp-over-http")]
supported_line_protocol_versions: Option<Vec<LineProtocolVersion>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth commenting why it's an Option<Vec<..>> and not just a Vec<..>.

I'm assuming that's because it's None if no probing was done.
I'm just afraid neither of us will rememember in 6 months time.

///
/// Please ensure client's default version ([`LINE_PROTOCOL_VERSION_V2`]) or
/// explicitly set protocol version exactly matches server expectation.
pub fn disable_line_protocol_validation(mut self) -> Result<Self> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with pub fn protocol_version(mut self, vers: ProtocolVersion), overriding the default when called.

@@ -1199,6 +1426,9 @@ pub struct Sender {
handler: ProtocolHandler,
connected: bool,
max_buf_size: usize,
default_line_protocol_version: LineProtocolVersion,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with protocol_version: Option<ProtocolVersion> where None means "auto".

Comment on lines 409 to 412
pub(super) fn get_line_protocol_version(
state: &HttpHandlerState,
settings_url: &str,
) -> Result<(Option<Vec<LineProtocolVersion>>, LineProtocolVersion), Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document return type: I would have expected a single Vec<ProtocolVersion>.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ensure that there's tests for the following scenarios:

  • The server returns 404.
    • Default to V1
  • The server returns, but the field is missing from the JSON.
    • Default to V1
  • The server returns only version 1 (if the v2 protocol gets toggled via config in the server).
  • The server returns [1, 2].
    • Client uses V2.
  • The server returns [2].
    • Client uses V2.
  • The server returns [MAX_SUPPORTED_VERSION + 1].
    • The client errors, since there's no compatible protocol.

For these tests, ensure create a buffer from server.new_buffer(), add a simple with just a timestamp and an f64 and flush it, checking the serialized behaviour.

ArrayWriteToBufferError,

/// Validate line protocol version error.
LineProtocolVersionError,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to ProtocolVersionError.

@@ -83,6 +86,8 @@ json_tests = []
# Enable methods to create timestamp objects from chrono::DateTime objects.
chrono_timestamp = ["chrono"]

benchmark = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -55,6 +56,8 @@ mio = { version = "1", features = ["os-poll", "net"] }
chrono = "0.4.31"
tempfile = "3"
webpki-roots = "0.26.8"
criterion = "0.5"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants