Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft for vector calling convention. #171

Closed

Conversation

Hsiangkai
Copy link
Contributor

Describe the purpose of vector registers and the calling convention for vector arguments.

@kito-cheng
Copy link
Collaborator

Could you add description about vector tuple types (NF > 1)?

@ebahapo
Copy link
Contributor

ebahapo commented Jan 21, 2021

Also, please explain briefly that LMUL < 1 is the same as LMUL = 1.

Hsiangkai added a commit to llvm/llvm-project that referenced this pull request Jan 21, 2021
The maximum LMUL is 8. We need 16 vector registers for two LMUL-8
arguments. The modification follows the proposal of psABI in
riscv-non-isa/riscv-elf-psabi-doc#171

Differential Revision: https://reviews.llvm.org/D95134
@Hsiangkai Hsiangkai force-pushed the vector-calling-convention branch 2 times, most recently from 13d6969 to 31fcb90 Compare January 25, 2021 03:11
@Hsiangkai
Copy link
Contributor Author

Also, please explain briefly that LMUL < 1 is the same as LMUL = 1.

I have described it as "Vectors that are LMUL = 1 or fractional LMUL are passed in a single vector argument register."

riscv-elf.md Outdated
size of the elements in the vector. The addresses of the vector or mask values on
the stack are passed according to the integer calling convention.

Vector tuple types for Zvlsseg will be decoupled as `NF` vector values when passing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a native speaker but I wonder if flattened might be a better term than decoupling (the concept is already used in the Hard FP ABI above)

@rofirrim
Copy link
Contributor

Except for a minor comment, overall looks good to me.

riscv-elf.md Outdated
@@ -91,6 +93,17 @@ The Floating-Point Control and Status Register (fcsr) must have thread storage
duration in accordance with C11 section 7.6 "Floating-point environment
<fenv.h>".

Vector Register Convention <a name=vector-register-convention>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark all these as "(Draft)" for now until the extension is ratified.

riscv-elf.md Outdated
Comment on lines 101 to 103
v1-v7 | | Temporary registers | No
v8-v23 | | Argument registers | No
v24-v31 | | Temporary registers | No
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if "va0" were v10 to match x and f, but given this nicely splits things up into quarters and we have far more vector argument registers than integer and float ones anyway I think this is probably the right choice, and at the end of the day it really doesn't matter.

riscv-elf.md Outdated
v8-v23 | | Argument registers | No
v24-v31 | | Temporary registers | No

*: v0 is used as the mask register for masked vector instructions. It is also used as the first mask argument in the procedure calling convention. If there is no need to use it as the mask, it can be considered a temporary register.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are mask arguments distinguished from normal vector arguments as far as the compiler is concerned looking at a function call? Is it just anything that's a vector of bools? This needs to be specified, likely in the body of text below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is distinguished by the argument type.

@aswaterman
Copy link
Contributor

See also https://github.com/riscv/riscv-v-spec/blob/c05d4a7e10b1ae474f6f4bf102aeaf04d6930bfc/calling-convention.adoc -- this describes the discipline for vl, vtype, and vcsr, as well as the fact that vector state is not preserved across system calls. (Not sure if the latter belongs in the Linux-specific section or what.)

@jrtc27
Copy link
Collaborator

jrtc27 commented Mar 14, 2021

See also https://github.com/riscv/riscv-v-spec/blob/c05d4a7e10b1ae474f6f4bf102aeaf04d6930bfc/calling-convention.adoc -- this describes the discipline for vl, vtype, and vcsr, as well as the fact that vector state is not preserved across system calls. (Not sure if the latter belongs in the Linux-specific section or what.)

Yes, system calls are out of scope for a psABI, though not preserving the state is surprising to be honest; interrupts and system calls take the same path into the kernel, so you'd have to go out of your way to make them behave differently. I'd expect the kernel to either not touch vector registers outside of context switches or, if it wants to use them, lazily save/restore them like it does for floating-point registers.

@kito-cheng
Copy link
Collaborator

I just realized this calling convention need to consider to lazy binding issue[1] too, my first impression is oh, we don't have callee-save register so we might don't have such issue, but in this proposal we have passed argument on vector register, which possible to be clobbered by lazy binding/ifunc resolver, so I guess the only possible choice for baseline vector calling convention is all argument/return values are passed in memory.

@jrtc27 @jim-wilson what do you think?

[1] #66

arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Mar 30, 2021
The maximum LMUL is 8. We need 16 vector registers for two LMUL-8
arguments. The modification follows the proposal of psABI in
riscv-non-isa/riscv-elf-psabi-doc#171

Differential Revision: https://reviews.llvm.org/D95134
riscv-elf.md Outdated

There is no scalar values passed through vector registers. There is no vector
values passed through scalar registers. There is no need to define a new ABI
for vector. Vector calling convention is appliable for existing ABIs.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to compatible between this convention and origin vector call convention
if assembly code is deployed in some libs, why is it not necessary to define a new ABI ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the origin vector call convention?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguments passed through memory as "origin vector call convention" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguments passed through memory as "origin vector call convention" ?

Its's not part of standard, so I think there is no compatible issue from the standard ABI view.

riscv-elf.md Outdated
Vectors that are LMUL = 1 or fractional LMUL are passed in a single vector
argument register. Vectors that are LMUL = 2 are passed in 2-aligned vector
argument registers. Vectors that are LMUL = 4 are passed in 4-aligned vector
argument registers. Vectors that are LMUL = 8 are passed in 8-aligned vector

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I've missed what this document is specifying (and for whom) but should it also specify what happens for vector types which are conceptually (to the programmer and/or compiler) LMUL = 16 or above? The LLVM compiler as we've implemented it, for instance, would split these vectors in half (and again until a legal type is found) and the ABI is applied to each split part independently.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current spec only list LMUL could be 1/8, 1/4, 1/2, 1, 2, 4, 8, so I think this should be enough?
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#332-vector-register-grouping-vlmul20

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's true, which is why I wasn't sure if my comment applied. I was viewing this doc from a user/compiler perspective, where it's possible (at least in LLVM) to define something that ends up as void @foo(<vscale x 128 x i8> %v) in the IR, which is LMUL > 8. It would be legal for a compiler to pre-define a type like __rvv_int8m16_t. Shouldn't that case be covered by this CC specification?

I only ask because the rest of this psABI document defines the lowering of high-level concepts like "aggregates" and "C structs", and how they must be broken down into registers and/or stack.

This vector calling convention document, on the other hand, only defines things in terms of LMUL so is more hardware-focused and feels slightly out of place with the other CC definitions. That's why it's not clear to me whether it's intended to spec out software concepts like which high-level "types" it applies to, what happens to "wide" LMUL>8 vectors, or even what is supposed to happen if the V extension isn't enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restriction LMUL <= 8 is pretty firmly baked into the RISC-V V-extension at this point, so I think it's reasonable to have the same restriction in the associated C language dialect ("RVV intrinsics").

This being said, I don't see a problem with LLVM implementing its own extension of the RVV intrinsics dialect, with new "m16" types, as long as it's compatible with the existing dialect. I do think some discussion needs to happen over at https://github.com/riscv/rvv-intrinsic-doc before attempting to codify the calling convention here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that LMUL <= 8 seems firm enough ground to stand on for the purposes of this document. The issue I have is mostly stemming from the unclear lowering from software to LMUL which leaves me with questions about whom this calling convention specification is aimed at.

I may be thinking of things which may be out of scope -- and apologies if I am -- but I've not just been thinking of the RVV intrinsic types. If this document is purely aimed at supporting the pre-defined set of RVV intrinsic types in https://github.com/riscv/rvv-intrinsic-doc then I think that could be clarified here.

However, as it stands, this comes across like as generic a calling convention as the integer/floating-point ones. I'll go through a list of the use cases which I believe are currently left ambiguous.

  1. At the language level, I concede that m16 doesn't make much sense when used in conjunction with the intrinsics. Though I can foresee compilers exposing "scalable" vector types (think ext_vector_type) and having support for operators on them, much like they do for the fixed vector types: something like __rvv_int8m16_t v = x + y;
  2. Auto-vectorization
  3. LLVM's support for lowering fixed-length vectors to RVV instructions (e.g. by passing the known vector length to vsetivli)

All of the above may introduce vectors into the program which may not necessarily fall into the hardware-supported LMUL categories, and I think it's natural to wonder if and how they should be handled by this vector calling convention. It's analogous to how we have to specify the ABI for integer scalars > XLEN, isn't it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the HW only support up to 8 for LMUL, SW eventually need to legalize those type into HW legal type to make it code-gen-able, so from this point I think it should not too much benefit to generate such software m16 type?

In theory vscale could be arbitrary positive integer in LLVM type system, but it's not meaning all of those types are legal for code gen, we need to handle in LLVM back-end: reject those type, legalized to nearest type or break down into several value, either is fine since it's kind of LLVM extension, so I would prefer only included that in psABI until those type become part RISC-V standard, of cause RVV intrinsic specs is one of RISC-V standard.

For fixed-length vector over RVV, it's another story, we don't have well define that like SVE's VLS yet, I am happy to see this happen but it should be separated issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your input. I think this all makes sense. I think that this draft is solid as far as hardware-level "legal types" are concerned. It's just the missing link between software and that hardware which I think is currently left open to interpretation and may cause confusion to other readers. Especially considering both the Integer and Floating-Point Calling Convention specifications are careful to explain how all of these language constructs like structs and unions and large scalars and complex numbers are handled. This vector one goes straight to LMUL without explanation.

It's sounding to me like, at the language level, this calling convention is only trying to cover the vector types defined by the RVV intrinsic specification and nothing else. And that's perfectly valid, but I do think it could be clarified in this document to correctly narrow down the scope of this calling convention.

Copy link
Contributor Author

@Hsiangkai Hsiangkai Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback. From the LLVM IR perspective, there is no limit on the LMUL values. Actually, it has no concept of LMUL in the LLVM IR. All kinds of <vscale x n x ty> are possible. However, from C/C++ language perspective, there is no LMUL = 16 types. In addition, there is no struct or array for scalable vector types. I have no idea how to create LMUL = 16 from the high level language. I think the description could be more clear. I will think about how to clarify it. Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frasercrmck, I have added some description about the types we focus on in the vector calling convention. Please help me to review if it is appropriate or not, thanks.

@jrtc27 jrtc27 added this to the First Release DoD milestone Jul 9, 2021
|v24-v31 | | Temporary registers | No
|vl | | Vector length | No
|vtype | | Vector data type register | No
|===
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should consider function call as @kito-cheng mentioned before,
so maybe we need to make some preserved / callee saver registers.

i post a vector call convention long time ago [0], so I suggestion that
v24-v31 as preserved / callee saver in some conditions.

[0] : https://lists.riscv.org/g/tech-vector-ext/message/7?p=%2C%2C%2C20%2C0%2C0%2C0%3A%3Arecentpostdate%2Fsticky%2C%2Ccall+convention%2C20%2C2%2C0%2C69238312

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not going to add any callee-saved registers, so that we can add the V extension without an ABI break. This is a first-order goal for a number of reasons, including vectorizing library routines dynamically linked to applications that are compiled without the V extension. See related discussion about ABI breakages with callee-saved state here. https://sourceware.org/pipermail/libc-alpha/2021-September/130897.html

@kito-cheng kito-cheng removed this from the Freeze 2021 milestone Sep 28, 2021
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022
The maximum LMUL is 8. We need 16 vector registers for two LMUL-8
arguments. The modification follows the proposal of psABI in
riscv-non-isa/riscv-elf-psabi-doc#171

Differential Revision: https://reviews.llvm.org/D95134
@kito-cheng
Copy link
Collaborator

kito-cheng commented Jul 6, 2023

This proposal is deprecated and moved to #389

@kito-cheng kito-cheng closed this Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants