======================================================
Device Specification for Inter-VM shared memory device
======================================================

The Inter-VM shared memory device (ivshmem) is designed to share a
memory region between multiple QEMU processes running different guests
and the host. In order for all guests to be able to pick up the
shared memory area, it is modeled by QEMU as a PCI device exposing
said memory to the guest as a PCI BAR.

The device can use a shared memory object on the host directly, or it
can obtain one from an ivshmem server.

In the latter case, the device can additionally interrupt its peers, and
get interrupted by its peers.

For information on configuring the ivshmem device on the QEMU
command line, see :doc:`../system/devices/ivshmem`.

The ivshmem PCI device's guest interface
========================================

The device has vendor ID 1af4, device ID 1110, revision 1. Before
QEMU 2.6.0, it had revision 0.
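
As an illustration, guest software could match the device by these
IDs. This is only a sketch: it assumes the IDs and revision have
already been read from PCI configuration space by some
platform-specific means (not shown), it reads the IDs above as
hexadecimal, and all names in it are hypothetical.

.. code-block:: python

    IVSHMEM_VENDOR_ID = 0x1AF4  # "1af4" above, read as hexadecimal
    IVSHMEM_DEVICE_ID = 0x1110  # "1110" above, read as hexadecimal


    def is_ivshmem(vendor_id: int, device_id: int) -> bool:
        """Return True if the vendor/device ID pair identifies ivshmem."""
        return (vendor_id, device_id) == (IVSHMEM_VENDOR_ID, IVSHMEM_DEVICE_ID)


    def is_legacy_revision(revision: int) -> bool:
        """Return True for revision 0, as presented before QEMU 2.6.0."""
        return revision == 0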

PCI BARs
--------

The ivshmem PCI device has two or three BARs:

- BAR0 holds device registers (256-byte MMIO)
- BAR1 holds the MSI-X table and PBA (only ivshmem-doorbell)
- BAR2 maps the shared memory object

There are two ways to use this device:

- If you only need the shared memory part, BAR2 suffices. This way,
  you have access to the shared memory in the guest and can use it as
  you see fit.

- If you additionally need the capability for peers to interrupt each
  other, you need BAR0 and BAR1. You will most likely want to write a
  kernel driver to handle interrupts. This requires the device to be
  configured for interrupts.

Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
configured for interrupts. It becomes safely accessible only after
the ivshmem server has provided the shared memory. These devices have
PCI revision 0 rather than 1. Guest software should wait for the
IVPosition register (described below) to become non-negative before
accessing BAR2.

Revision 0 of the device cannot tell guest software whether it is
configured for interrupts.
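
The wait on revision 0 devices could look roughly like the sketch
below. It assumes a hypothetical ``read32(offset)`` callback that
performs a 32-bit read from BAR0, and it takes the IVPosition offset
(8) from the register table in the next section. The retry count and
delay are arbitrary illustration values.

.. code-block:: python

    import time
    from typing import Callable

    IVPOSITION_OFFSET = 8  # byte offset of IVPosition within BAR0


    def to_signed32(raw: int) -> int:
        """Interpret a raw 32-bit register value as a signed integer."""
        raw &= 0xFFFFFFFF
        return raw - (1 << 32) if raw & 0x80000000 else raw


    def wait_for_bar2(read32: Callable[[int], int], revision: int,
                      attempts: int = 1000, delay: float = 0.001) -> bool:
        """Return True once BAR2 may be accessed, False if we gave up.

        read32 is a hypothetical callback doing a 32-bit read from BAR0.
        """
        if revision != 0:
            # Only revision 0 devices are described as needing the wait.
            return True
        for _ in range(attempts):
            if to_signed32(read32(IVPOSITION_OFFSET)) >= 0:
                return True
            time.sleep(delay)
        return False

A caller that gets ``False`` back has to decide for itself whether to
keep waiting or give up; the device offers no other indication.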

PCI device registers
--------------------

BAR0 contains the following registers:

::

    Offset  Size  Access      On reset  Function
         0     4  read/write         0  Interrupt Mask
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
         4     4  read/write         0  Interrupt Status
                                        bit 0: peer interrupt (rev 0)
                                               reserved (rev 1)
                                        bit 1..31: reserved
         8     4  read-only    0 or ID  IVPosition
        12     4  write-only       N/A  Doorbell
                                        bit 0..15: vector
                                        bit 16..31: peer ID
        16   240  none             N/A  reserved

Software should only access the registers as specified in column
"Access". Reserved bits should be ignored on read, and preserved on
write.
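
The following sketch restates the table as constants and shows one way
to honour the rule about reserved bits. The constant and function
names are hypothetical; only the offsets and bit positions come from
the table.

.. code-block:: python

    # Byte offsets within BAR0, as listed in the table above.
    INTR_MASK = 0     # read/write
    INTR_STATUS = 4   # read/write
    IVPOSITION = 8    # read-only
    DOORBELL = 12     # write-only

    PEER_INTERRUPT_BIT = 1 << 0  # only meaningful on revision 0


    def with_peer_interrupt_mask(current: int, enable: bool) -> int:
        """Compute a new Interrupt Mask value, preserving bits 1..31."""
        if enable:
            return (current | PEER_INTERRUPT_BIT) & 0xFFFFFFFF
        return current & ~PEER_INTERRUPT_BIT & 0xFFFFFFFF


    def peer_interrupt_pending(status: int) -> bool:
        """Test bit 0 of Interrupt Status, ignoring the reserved bits."""
        return bool(status & PEER_INTERRUPT_BIT)

A driver would read the Interrupt Mask register, pass the value
through ``with_peer_interrupt_mask()``, and write the result back.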

In revision 0 of the device, the Interrupt Status and Interrupt Mask
registers together control the legacy INTx interrupt when the device
has no MSI-X capability: INTx is asserted when the bit-wise AND of
Status and Mask is non-zero. Interrupt Status register bit 0 becomes 1
when an interrupt request from a peer is received. Reading the
register clears it.
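
The behaviour described above can be modelled in a few lines. This is
a toy model of a revision 0 device without MSI-X capability, written
only to make the rules concrete; it is not taken from the QEMU
implementation, and the class name is hypothetical.

.. code-block:: python

    class Rev0IntxModel:
        """Toy model of INTx on a revision 0 device without MSI-X."""

        def __init__(self) -> None:
            self.status = 0  # Interrupt Status register
            self.mask = 0    # Interrupt Mask register

        def peer_interrupt(self) -> None:
            """A peer requested an interrupt: Status bit 0 becomes 1."""
            self.status |= 1

        def read_status(self) -> int:
            """Guest reads Interrupt Status; the read clears the register."""
            value = self.status
            self.status = 0
            return value

        def intx_asserted(self) -> bool:
            """INTx is asserted while (Status AND Mask) is non-zero."""
            return (self.status & self.mask) != 0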

IVPosition Register: if the device is not configured for interrupts,
this is zero. Else, it is the device's ID (between 0 and 65535).

Before QEMU 2.6.0, the register may read -1 for a short while after
reset. These devices have PCI revision 0 rather than 1.

There is no good way for software to find out whether the device is
configured for interrupts. A positive IVPosition means interrupts,
but zero could be either.
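
The three cases can be summarised as follows. The function is a
hypothetical helper, not part of any driver; it returns the value to
use as ID and what, if anything, is known about interrupt
configuration.

.. code-block:: python

    def interpret_ivposition(raw: int):
        """Return (id, interrupts) for a raw 32-bit IVPosition read.

        id is None while the register still reads negative.
        interrupts is True if known to be configured, None if unknown.
        """
        raw &= 0xFFFFFFFF
        value = raw - (1 << 32) if raw & 0x80000000 else raw
        if value < 0:
            return None, None   # revision 0 only: not ready yet
        if value > 0:
            return value, True  # positive: configured for interrupts
        return 0, None          # zero: could be either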

Doorbell Register: writing this register requests to interrupt a peer.
The written value's high 16 bits are the ID of the peer to interrupt,
and its low 16 bits select an interrupt vector.
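
For example, the value to write could be composed like this (the
helper names are hypothetical):

.. code-block:: python

    def doorbell_value(peer_id: int, vector: int) -> int:
        """Compose a Doorbell value: peer ID in bits 16..31, vector in 0..15."""
        if not 0 <= peer_id <= 0xFFFF:
            raise ValueError("peer ID must fit in 16 bits")
        if not 0 <= vector <= 0xFFFF:
            raise ValueError("vector must fit in 16 bits")
        return (peer_id << 16) | vector


    def split_doorbell(value: int):
        """Split a Doorbell value back into (peer_id, vector)."""
        return (value >> 16) & 0xFFFF, value & 0xFFFF

Interrupting peer 3 on vector 1 thus means writing ``0x00030001`` to
offset 12 of BAR0.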

If the device is not configured for interrupts, the write is ignored.

If the interrupt hasn't completed setup, the write is ignored. The
device cannot tell guest software whether setup is complete.
Interrupts can regress to this state on migration.

If the peer with the requested ID isn't connected, or it has fewer
interrupt vectors connected, the write is ignored. The device cannot
tell guest software which peers are connected, or how many interrupt
vectors are connected.
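
The cases in which a Doorbell write is ignored can be modelled as
below. This is a simplified model for illustration, not the QEMU
implementation: ``peers`` is assumed to map each connected peer ID to
the list of its connected vectors, and ``signal`` is a hypothetical
callback that delivers the interrupt.

.. code-block:: python

    def ring_doorbell(peers, value, signal, interrupts_ready=True):
        """Model a Doorbell write; return False if the write is ignored.

        peers: dict mapping peer ID -> list of per-vector handles.
        signal: callback invoked with the handle of the selected vector.
        interrupts_ready: False if the device is not configured for
        interrupts or interrupt setup has not completed.
        """
        if not interrupts_ready:
            return False
        peer_id = (value >> 16) & 0xFFFF
        vector = value & 0xFFFF
        vectors = peers.get(peer_id)
        if vectors is None or vector >= len(vectors):
            return False  # unknown peer, or vector not connected
        signal(vectors[vector])
        return True

Note that the return value exists only in the model. The real register
is write-only, so guest software gets no such feedback.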

The peer's interrupt for this vector then becomes pending. There is
no way for software to clear the pending bit, and a polling mode of
operation is therefore impossible.

If the peer is a revision 0 device without MSI-X capability, its
Interrupt Status register is set to 1. This asserts INTx unless
masked by the Interrupt Mask register. The device cannot communicate
the interrupt vector to guest software in that case.

With multiple MSI-X vectors, different vectors can be used to indicate
that different events have occurred. The semantics of interrupt
vectors are left to the application.

Interrupt infrastructure
========================

When configured for interrupts, the peers share eventfd objects in
addition to shared memory. The shared resources are managed by an
ivshmem server.

The ivshmem server
------------------

The server listens on a UNIX domain socket.

For each new client that connects to the server, the server

- picks an ID,
- creates eventfd file descriptors for the interrupt vectors,
- sends the ID and the file descriptor for the shared memory to the
  new client,
- sends connect notifications for the new client to the other clients
  (these contain file descriptors for sending interrupts),
- sends connect notifications for the other clients to the new client,
  and
- sends interrupt setup messages to the new client (these contain file
  descriptors for receiving interrupts).
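
The steps above can be sketched as a function that lists who is sent
what, in order. It is a planning model only: it creates no eventfds
and sends nothing. It assumes all clients use the same number of
vectors, and the message labels are made up for this illustration.

.. code-block:: python

    def plan_new_client(existing_ids, new_id, nvectors):
        """List (recipient, kind, detail) tuples for a newly connected client.

        detail is (peer_id, vector) for per-vector messages, else None.
        """
        plan = [(new_id, "id-and-shared-memory", None)]
        # Tell the other clients about the new client.
        for other in existing_ids:
            for vector in range(nvectors):
                plan.append((other, "connect", (new_id, vector)))
        # Tell the new client about the other clients.
        for other in existing_ids:
            for vector in range(nvectors):
                plan.append((new_id, "connect", (other, vector)))
        # Hand the new client its own eventfds for receiving interrupts.
        for vector in range(nvectors):
            plan.append((new_id, "interrupt-setup", (new_id, vector)))
        return plan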

The first client to connect to the server receives ID zero.

When a client disconnects from the server, the server sends disconnect
notifications to the other clients.

The next section describes the protocol in detail.

If the server terminates without sending disconnect notifications for
its connected clients, the clients can elect to continue. They can
communicate with each other normally, but won't receive disconnect
notification on disconnect, and no new clients can connect. There is
no way for the clients to connect to a restarted server. The device
cannot tell guest software whether the server is still up.

Example server code is in contrib/ivshmem-server/. It is not to be
used in production. It assumes all clients use the same number of
interrupt vectors.

A standalone client is in contrib/ivshmem-client/. It can be useful
for debugging.

The ivshmem Client-Server Protocol
----------------------------------

An ivshmem device configured for interrupts connects to an ivshmem
server. This section details the protocol between the two.

The connection is one-way: the server sends messages to the client.
Each message consists of a single 8-byte little-endian signed number,
and may be accompanied by a file descriptor via SCM_RIGHTS. Both
client and server close the connection on error.
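
A minimal sketch of this framing, using Python's ``socket.send_fds()``
and ``socket.recv_fds()`` (Python 3.9 or later, Unix only). The
function names are hypothetical. Treating a short read as an error is
a simplification of this sketch; a robust client would keep reading
until it has all 8 bytes.

.. code-block:: python

    import socket
    import struct

    MESSAGE = struct.Struct("<q")  # one 8-byte little-endian signed integer


    def send_message(sock, value, fd=None):
        """Send one message, optionally passing fd via SCM_RIGHTS."""
        payload = MESSAGE.pack(value)
        if fd is None:
            sock.sendall(payload)
        else:
            socket.send_fds(sock, [payload], [fd])


    def recv_message(sock):
        """Receive one message; return (value, fd), fd being None if absent."""
        data, fds, _flags, _addr = socket.recv_fds(sock, MESSAGE.size, 1)
        if len(data) != MESSAGE.size:
            raise ConnectionError("connection closed or message truncated")
        (value,) = MESSAGE.unpack(data)
        return value, (fds[0] if fds else None)

On ``ConnectionError``, the caller is expected to close the socket, in
line with the rule that both sides close the connection on error.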

Note: QEMU currently doesn't close the connection immediately on
error, but only when the character device is destroyed.

On connect, the server sends the following messages in order:

1. The protocol version number, currently zero. The client should
   close the connection on receipt of versions it can't handle.

2. The client's ID. This is unique among all clients of this server.
   IDs must be between 0 and 65535, because the Doorbell register
   provides only 16 bits for them.

3. The number -1, accompanied by the file descriptor for the shared
   memory.

4. Connect notifications for existing other clients, if any. This is
   a peer ID (number between 0 and 65535 other than the client's ID),
   repeated N times. Each repetition is accompanied by one file
   descriptor. These are for interrupting the peer with that ID using
   vector 0,..,N-1, in order. If the client is configured for fewer
   vectors, it closes the extra file descriptors. If it is configured
   for more, the extra vectors remain unconnected.

5. Interrupt setup. This is the client's own ID, repeated N times.
   Each repetition is accompanied by one file descriptor. These are
   for receiving interrupts from peers using vector 0,..,N-1, in
   order. If the client is configured for fewer vectors, it closes
   the extra file descriptors. If it is configured for more, the
   extra vectors remain unconnected.
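
A client might check messages 1 to 3 and assign the file descriptors
of messages 4 and 5 to vectors along these lines. Messages are
represented as ``(value, fd)`` tuples with ``fd`` being ``None`` when
no descriptor came along. Rejecting an unexpected descriptor on
messages 1 and 2 is an assumption of this sketch, and the function
names are hypothetical.

.. code-block:: python

    PROTOCOL_VERSION = 0


    def parse_preamble(messages):
        """Validate messages 1-3; return (own_id, shared_memory_fd)."""
        it = iter(messages)
        version, fd = next(it)
        if version != PROTOCOL_VERSION or fd is not None:
            raise ValueError("unsupported protocol version message")
        own_id, fd = next(it)
        if not 0 <= own_id <= 65535 or fd is not None:
            raise ValueError("invalid ID message")
        marker, shm_fd = next(it)
        if marker != -1 or shm_fd is None:
            raise ValueError("invalid shared memory message")
        return own_id, shm_fd


    def accept_vector(vectors, fd, configured_vectors, close):
        """Store fd as the next vector, or close it if it is an extra one."""
        if len(vectors) < configured_vectors:
            vectors.append(fd)
            return True
        close(fd)
        return False

The same ``accept_vector()`` logic serves both message 4 (one list per
peer ID) and message 5 (the list for the client's own ID), since the
two differ only in whose ID the message carries.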

From then on, the server sends these kinds of messages:

6. Connection / disconnection notification. This is a peer ID.

   - If the number comes with a file descriptor, it's a connection
     notification, exactly like in step 4.

   - Else, it's a disconnection notification for the peer with that ID.
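
Telling the two apart only takes a check for the file descriptor. The
sketch below keeps a hypothetical ``peers`` table mapping each peer ID
to its list of eventfds, indexed by vector. Closing a peer's
descriptors when it disconnects is an assumption of this sketch.

.. code-block:: python

    import os


    def handle_notification(peers, peer_id, fd, close=os.close):
        """Apply one notification to the peers table; return its kind."""
        if fd is not None:
            # Connection notification: fd is the peer's next vector.
            peers.setdefault(peer_id, []).append(fd)
            return "connect"
        # Disconnection notification: forget the peer and its vectors.
        for old_fd in peers.pop(peer_id, []):
            close(old_fd)
        return "disconnect"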

Known bugs:

* The protocol changed incompatibly in QEMU 2.5. Before, messages
  were native endian long, and there was no version number.

* The protocol is poorly designed.

The ivshmem Client-Client Protocol
----------------------------------

An ivshmem device configured for interrupts receives eventfd file
descriptors for interrupting peers and getting interrupted by peers
from the server, as explained in the previous section.

To interrupt a peer, the device writes the 8-byte integer 1 in native
byte order to the respective file descriptor.
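
In Python terms, the write amounts to the following. The function name
is hypothetical; ``fd`` stands for the eventfd received for the
desired peer and vector.

.. code-block:: python

    import os
    import struct


    def interrupt_peer(fd: int) -> None:
        """Write the 8-byte integer 1, in native byte order, to fd."""
        os.write(fd, struct.pack("=Q", 1))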

To receive an interrupt, the device reads and discards as many 8-byte
integers as it can.
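
A sketch of such a drain loop follows. It assumes ``fd`` has been put
into non-blocking mode, so that reading stops once nothing is left.
The function name is hypothetical.

.. code-block:: python

    import os


    def drain_interrupts(fd: int) -> int:
        """Read and discard 8-byte integers from a non-blocking fd.

        Returns how many integers were read.
        """
        count = 0
        while True:
            try:
                data = os.read(fd, 8)
            except BlockingIOError:
                return count  # nothing left to read
            if len(data) < 8:
                return count  # end of file or short read
            count += 1

With an eventfd in its default mode, a single read returns the
accumulated counter and resets it, so the loop typically reads one
integer no matter how often the peer wrote.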