From: Ed Himwich <weh@ivscc.gsfc.nasa.gov>
To: Alan R. Whitney <awhitney@haystack.mit.edu>
Cc: Wayne Cannon <wayne@sgl.crestech.ca>; Brent Carlson <bcarlson@sol.drao.nrc.ca>; Dick Ferris <dferris@atnf.csiro.au>; Dave Graham <p062gra@mpifr-bonn.mpg.de>; Nori Kawaguchi <kawagu@hotaka.mtk.nao.ac.jp>; Tetsuro Kondo <kondo@crl.go.jp>; Sergei Pogrebenko <svp@jive.nl>; Misha Popov <mpopov@asc.rssi.ru>; Jon Romney <jromney@nrao.edu>; Ralph Spencer <res@jb.man.ac.uk>; Ed Himwich <weh@gemini.gsfc.nasa.gov>; Ari Mujunen <amn@kurp.hut.fi>
Subject: Re: VSI-S draft 2
Date: March 30, 2001 5:48

I am joining the discussion rather late, so I may be unaware of previous
discussions about VSI-S. I am sure it developed during discussions of
VSI-H. Consequently, I may have missed the point of some things. I have
tried to anticipate this in my comments. If I am off base on something
please let me know. The background I bring to this problem is that I
am responsible for the FS, which communicates with a variety of VLBI
back-ends and also many different antennas. Every one of these systems
has its own communication interface with its own peculiarities. Since the
VSI-S interface is being created from whole cloth I see it as a
wonderful opportunity to avoid, in the future, some of the peculiarities
that I have had to deal with. For the most part the draft specification
seems to be on the right track and the second revision has corrected
some problems in the first. I would note that my concern is primarily
with the DIM; in some cases there may be corresponding changes that
should be made in parallel for the DOM side as well. I have included
some general comments first and then some specific items. Last but not
least I have some questions about VSI-H which have some impact on my
understanding of VSI-S. I apologize for being so long-winded, but since
I have not been involved in previous discussions I am unsure of where
the group is at in these discussions.

GENERAL COMMENTS

The addition of an unambiguous start of transmission character to the
communication string is an important improvement for serial connections.
It is most important for the commands sent to the DTS and unsolicited
responses in full-duplex mode, but less so for the half-duplex
responses. Perhaps it is not needed for Ethernet connections. However
for simplicity I suppose it should be used in all cases, although it
certainly isn't minimal.

I agree with some comments that have been made that the DTS should
respond even to incorrectly formed commands. Obviously it cannot detect
all cases, but the more information it can return to the controller the
better. With the unambiguous start character for each transmission, the
controller can effectively require the DIM to discard what it is doing
and reconsider its input from scratch.
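
A minimal sketch of that resynchronization, assuming the framing bytes
mentioned later in this message (0x01 as the first character of a
transmission, 0x04 as the terminating character). The class and method
names are hypothetical and are not taken from the draft.

```python
SOT = 0x01  # start-of-transmission byte (value assumed from the DOT_set discussion)
EOT = 0x04  # terminating byte (value assumed from the DOT_set discussion)


class FrameReceiver:
    """Collects bytes into frames; any SOT discards partial input."""

    def __init__(self):
        self._buf = None  # None means "waiting for a start character"

    def feed(self, data: bytes) -> list:
        """Consume raw bytes and return the bodies of completed frames."""
        frames = []
        for b in data:
            if b == SOT:
                # Start over from scratch, dropping anything half-received.
                self._buf = bytearray()
            elif self._buf is None:
                continue  # bytes outside a frame are ignored
            elif b == EOT:
                frames.append(bytes(self._buf))
                self._buf = None
            else:
                self._buf.append(b)
        return frames
```

Here a truncated command followed by a fresh start character is simply
dropped: feeding b"\x01DOT_se\x01status?\x04" yields only the complete
frame b"status?".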

I would strongly encourage case insensitivity in all places where
possible. This would specifically include keywords and fields with fixed
choices for values. It would not include fields where the contents are
unpredictable and it might be useful to have case variations (but this
is purely stylistic I think, like capitalization of place names). I
think the problem here is being case sensitive implies that different
cases might mean different things. You can do that explicitly in some
general form (like upper case keywords in general mean something
different than lower case keywords, perhaps commands versus queries),
but I don't see that here. If it isn't a general feature of the
interface the worst case is that the commands are sensitive in more
subtle ways and you have to be careful how you enter a command. While it
is true that the VSI-S seems primarily machine-to-machine oriented,
people still have to write the commands (in code if nowhere else) that
are used. A program that emulates a terminal can always map everything
to lower case, as could the low-level driver in a more sophisticated
program, but why add the complication? In the best case it adds no
functionality and in the worst case it adds confusion.
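
As a small illustration of how little it costs the receiving side, here
is a sketch that folds case for keywords only. The keyword set is just
the handful of names mentioned in this message, and the function name is
hypothetical.

```python
# Keywords stored in one canonical (lower-case) form.
KEYWORDS = {"dot_set", "comm_set", "dts_id", "dim_mask"}


def normalize_keyword(raw: str) -> str:
    """Return the canonical keyword for any capitalization of it."""
    key = raw.strip().lower()
    if key not in KEYWORDS:
        raise ValueError("unknown keyword: %r" % raw)
    return key
```

With this, "DOT_set", "dot_set" and "DOT_SET" all resolve to the same
keyword, while free-form fields can be passed through untouched.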

I don't see much role for sending comments over the interface. That
isn't to say that it isn't useful to have comments in a command file or in
a log file written by a program that executes the commands in a command
file or by the DTS in an internal log file. Otherwise I would say don't
waste the communication bandwidth by sending the DTS strings it will
just throw away anyway (and the controller will retry if it gets an error
response?). If the DTS would actually record these comments internally
it is a different thing, but I wonder if this would really be used.
These are allowed for in the informational commands and that seems fine.
Comments that are injected into the PDATA are of course a different
issue and useful. The setting of the auxiliary data (PDATA?) should be
possible through the control interface to avoid the requirement of a
PDATA interface. Is this the role of the informational commands or do we
need an explicit command that would specify the auxiliary data? I have
to admit the relationship between informational data and the PDATA 
(auxiliary data?) seems a bit vague to me. I don't understand it anyway.

I think we should have shorter time-outs than 0.5 seconds. For serial
connections I think something much shorter, perhaps a per-character
time-out that is about 2-3 times the character transmission time, would
be much better. I have a lot of reasons for preferring this; one of the
major ones is that the main function of a time-out is detection of the
fact that contact has been lost with a device. There is no problem
taking up to 0.5 seconds to detect this fact. However, it would be a
problem if the result were that a DTS were implemented in such a way
that it typically required 0.5 seconds to respond to commands and
queries. It would bog down communication considerably. My best
experiences with devices have been those that respond immediately to
every communication, even if they then have to think a bit afterwards
about how to complete the request (you then monitor the status of the
request by polling). How this is approached for Ethernet connections I
am not sure; someone more familiar with Ethernet for real-time control
might make a suggestion. In any event it would seem like the controller
should be allowed to define the time-out it uses depending on the path.
A dedicated local connection should be extremely fast. A long path over
the Internet would require a longer time-out (I have had problems
communicating from Maryland to an S2 in Ottawa). One issue I don't
really understand is whether TCP or UDP should be used. It seems as
though a reliable connection is required and that if you used UDP, you
would end up re-creating a TCP-like service. Again I am no expert, but
it seems like you might have to specify a lot of details about this. On
the other hand if the handling of time-outs and lost connections in TCP
is awkward (which it seems to be in some cases), you might actually
prefer UDP. In any event I think there probably need to be more details
specified about how TCP or UDP connections would work.
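
To put numbers on the per-character time-out suggested above, here is a
rough sketch. It assumes 8N1 framing (one start bit, eight data bits,
one stop bit, so 10 bit times per character); that framing and the
function names are assumptions, not something taken from the draft.

```python
def char_time(baud: int, bits_per_char: int = 10) -> float:
    """Seconds needed to transmit one character at the given baud rate."""
    return bits_per_char / baud


def char_timeout(baud: int, factor: float = 3.0) -> float:
    """Per-character time-out as a multiple of the character time."""
    return factor * char_time(baud)
```

At 9600 baud a character takes about 1.04 ms, so a factor of 3 gives a
time-out of about 3.1 ms, more than a hundred times shorter than 0.5
seconds.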
 
The sequence numbering, check-sums, and re-try mechanism is very
sophisticated, but it seems like overkill to me. Generally short local
RS-232 connections (even fairly long ones using long haul modems) are
very reliable. Likewise TCP/IP includes error-correcting protocols. It
would seem like the most likely use of this would be for UDP (but as I
asked above, I wonder if we really want UDP). However, check-sums seem
straightforward, and not a big deal.

Why is there a limit on retries? It should be up to the controller to
decide when communication has failed. There seem to be some problems in
the use of sequence numbers, e.g., what if there is a long break in
communication? I would assume that the receipt of a command with a
different sequence number would delete the retry response of the
previous command. For that matter if the same command is received with a
different sequence number it would be considered a different request and
would presumably delete the old retry response and give a new one. What if
after a long communication break the same command occurs with the same
sequence number (not that far-fetched for a status request, especially
since there are only 94 sequence numbers)? If the DTS sends the retry
response it might be quite old data. I would propose instead of sequence
numbers we include parameters for retrying on the small number of
commands where a retry capability is needed (status mostly I think). You
might append a "/r" to the command to indicate that it is a retry. This
would have the disadvantage of moving the retry up from the lowest
layers of the controller's interface. However, since only a few commands
need this capability, it seems appropriate for it to be only
associated with those commands and at the higher level in the
controller's interface where these commands are processed. This would
allow us to get rid of sequence numbers, which are an unnecessary
complication for the majority of commands. It will also eliminate this
problem of returning a latched value that might be old as described
above. There should be no loss of the connection of cause and effect by
dropping the pairing of sequence numbers between request and response.
The keyword is returned to allow this pairing, but that isn't a very
strong connection (the sequence number isn't completely strong either).
For RS-232 the real pairing of cause and effect would have to be that the
line is verified to be idle for at least the (inter-character) time-out
period before a request is sent to the DTS. I am not sure how to generalize
this to Ethernet, but in any case it seems like some more complicated
mechanism may be required because there might be very long network
delays. What do you do when the connection has broken for some time to
prevent very old (and now out of date) packets that have been sent from
being acted on, re-establish the connection? Is this handled in some
straightforward way by a reliable connection protocol like TCP/IP?
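
A sketch of how the "/r" idea might look on the DTS side, under the
assumption that the DTS latches the last response per keyword and
returns it again only when the command carries the "/r" suffix. The
class name and the exact placement of the suffix are hypothetical.

```python
class RetryLatch:
    """Answers commands, replaying the latched response on a '/r' retry."""

    def __init__(self, handlers):
        self._handlers = handlers  # keyword -> callable returning a fresh response
        self._latched = {}         # keyword -> last response sent

    def handle(self, command: str) -> str:
        is_retry = command.endswith("/r")
        keyword = command[:-2] if is_retry else command
        if is_retry and keyword in self._latched:
            # Replay without re-executing (and without resetting any status).
            return self._latched[keyword]
        response = self._handlers[keyword]()
        self._latched[keyword] = response
        return response
```

A plain request always produces fresh data, so an old latched value can
only come back when the controller explicitly asks for it.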

The delayed completion monitoring seems cumbersome. I would prefer just
having commands to query the state of whatever is being done. For
example to start recording, you could command it, then query the state
and the state could transition from stopped to ramping to recording. I
think this is more useful for monitoring the equipment anyway.
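
From the controller's side, the polling approach could be as simple as
the sketch below. The state names are the ones used above; the function
name, the time-out and the polling interval are assumptions chosen only
for illustration.

```python
import time


def wait_for_state(query, target, timeout=5.0, interval=0.1,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll query() until it returns target; return the states seen in order."""
    deadline = clock() + timeout
    seen = []
    while True:
        state = query()
        if not seen or seen[-1] != state:
            seen.append(state)  # record each transition once
        if state == target:
            return seen
        if clock() >= deadline:
            raise TimeoutError("still %r, wanted %r" % (state, target))
        sleep(interval)
```

A call such as wait_for_state(query_record_state, "recording") would
return something like ["stopped", "ramping", "recording"], which doubles
as a record of what the equipment actually did.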

If we can avoid delayed execution completion responses (unsolicited) and
use periodic TVR status polling (see below) instead of unsolicited
response, this would get rid of the type 1 unsolicited response in
S-6.3. Then the only unsolicited responses would be those of type 2, which
are truly unsolicited although they are still effectively detected by
polling.

The situation for unsolicited responses on the half-duplex line is
greatly improved in my opinion, i.e., there aren't any now (except those
detected by polling). I am a little concerned though that there may be a
potential buffer problem in the DTS with such things as periodic TVR
reporting. I might suggest a slightly different approach, which is that
TVR reporting have some defined (or adjustable) period for collection of
results, say one second tied to the 1PPS. Then it is the controller's
responsibility to query the DTS at the appropriate time relative to the
1 PPS to pick up the previous period's results. I'm not sure if there
are other periodic responses or unsolicited responses that might have a
problem with buffering if they get queued.

Although potentially useful, I don't think I would ever use the
full-duplex mode since this implies accepting unsolicited responses. It
is unlikely to be a problem, but has the potential to get out of
control. I'm not sure what functionality it adds. There might be some
scheme where an unsolicited response would spawn a process to handle it,
but this is getting too far from being deterministic for me to consider
it for a real-time application. Not to say that it is impossible. I'm
not sure we need something that sophisticated.

I would suggest that there be commands to increment and decrement the
DOT by one second. This is useful for setting the time in cases where
there are timing problems and/or a long Internet connection. In
addition, although I am not sure that it falls anywhere in any
specification, it should be encouraged that any DIM have a
(preferably hardware) display of the least significant seconds digit at
minimum.

I don't understand why the DOT_set query has to return the time only at
the epoch of the 1 PPS. Why can't it return the actual time with as much
resolution as possible when asked? This would remove a potentially long
delay in the response (it is getting really bad if in addition you then
have a 0.5 second delay in getting the response). For RS-232, it is more
complicated, but I think it is more useful to have a mechanism like: the
terminating character (0x04) in the DOT_set query triggers the reading
of the time, which is completed before the first character of the
response is returned (0x01). This allows the control computer to crudely
measure the offset between its own time and that of the DIM. This is
quite useful for monitoring equipment during operations. I do not know how
you make a corresponding mechanism for Ethernet. Maybe the response time
on a LAN is short enough to make this a non-issue (the FS seems to be
able to handle it okay for S2 recorders, but I wonder if we would use a
different design if we were starting over from first principles). In any
event you get an estimate of the precision of the measurement by
checking the time difference between before the request and after the
response (should be about 2 character times if the above sequence is
used).
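
The offset measurement described above can be sketched as follows,
assuming the controller timestamps the moment it sends the terminating
character and the moment the first response character arrives, and that
the DOT was read somewhere in between. The function and parameter names
are hypothetical.

```python
def estimate_offset(t_before: float, dot_reading: float, t_after: float):
    """Return (offset, uncertainty) of the DOT relative to the controller clock.

    t_before:    controller time when the terminating character was sent
    dot_reading: time reported by the DIM in its response
    t_after:     controller time when the first response character arrived
    """
    midpoint = (t_before + t_after) / 2.0
    offset = dot_reading - midpoint           # positive: DOT is ahead
    uncertainty = (t_after - t_before) / 2.0  # half the round-trip window
    return offset, uncertainty
```

With a round trip of about two character times the uncertainty is about
one character time, which is what makes the measurement crude but still
useful for monitoring.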

Should DTS specific parameters be included in commands (keywords) with
DTS non-specific parameters? Is using trailing fields for DTS specific
parameters good enough? I think it would be cleaner and easier to identify
a vanilla subset if DTS specific requests and responses are in separate
commands.

It would seem to be useful to mention something about the communication
interface being separated in some way from the control program for the
DTS. I have run into several cases where this was not followed and the
result was communication with the device could affect the results. I
would think the designer should at least think about this so that there
is no problem either for the DTS or the controller talking to it.
 
SPECIFIC COMMENTS

5.2 comments 5. Presumably the DTS won't hang, but only the interface to
the controller would hang.

6.1 for a='1' instead of "relevant", perhaps "can occur". Actually I
would prefer to get rid of "a" altogether. This seems to overburden the
response code. Since handling unsolicited responses would presumably be
initiated at a higher level I would rather just have a command that
polls to see if there are any unsolicited responses (or maybe returns the
first one). The use of b=1 seems superfluous. If a delayed execution
command gives an okay response I would think we could assume that the
action has been initiated. In other words b=1 and b=2 should be combined
into one response.

6.3 If we do stick with sequence numbers, it doesn't seem to me to be a
good idea to require the same <S> for checking on the status. This might
imply that some subsystem (like media positioning) might be able to be
executing different long execution commands simultaneously and you could
check on the results independently. That might be useful, but I think it
would be easier to have actually different commands (or maybe
sub-parameters in a command) if you want to do that. A request like,
position? would be independent of what <S> was last used to command it
and would remove some bookkeeping burden from the DTS.

10.1 There is an extra "." in comm_set. What is the default on power-up,
half-duplex I hope? It should be specified.

10.2 I would suggest that the DTS_id? response be separated into responses
to three commands. The first, DTS_id?, would be the system id, the
possible values of which would be coordinated by a central organization.
The second, DTS_version?, would be an uncontrolled literal ASCII string.
The third, DTS_media? If system id is DTS specific then it seems so is
media type.

10.3 for DIM_mask you could say use 2^n to specify the bit order. I
think someone brought this up already. For PDATA is there an option to
accept all commands including DOT_set? For position, instead of using
special numbers for dismount and stop, how about using explicit commands
like stop and dismount?
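
To illustrate the 2^n wording for DIM_mask: assuming the mask is an
integer in which item n contributes 2^n (that interpretation and the
function names are assumptions on my part), the bit order is then
unambiguous.

```python
def mask_from_bits(bits) -> int:
    """Build a mask in which bit n has weight 2**n."""
    mask = 0
    for n in bits:
        mask |= 1 << n
    return mask


def bits_from_mask(mask: int) -> list:
    """List the bit numbers set in a mask, lowest first."""
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]
```

For example, bits 0, 1 and 4 give 1 + 2 + 16 = 19 (0x13), and decoding
19 returns [0, 1, 4].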

10.4 Please don't use special values, like "-1" for position to indicate
dismounting. There might instead be a "media state" query that returns
loaded, loading, unloading, unloaded, positioning, and (others?).

10.8 surely there is some contradiction between the commands being
no-ops and containing potentially useful information (someone brought
this up too I think); maybe they contain potentially useful information,
but may be no-ops in any particular DTS. Again I ask if the PDATA (or
auxiliary data?) can be formed from these or if we need a specific
command for that. I suppose they might have some more use than that,
perhaps display on some console or recorded in a log file.

VSI-H QUESTIONS

There are a couple of things I didn't understand about the VSI-H
document. These may have been due to my missing the relevant
information, so please excuse me if these are spurious points.

I don't understand the P/QDATA time setting mechanism. It seems like
this is an attempt to implement a DOM-to-DIM direct tape copying system
without the use of a controller. I don't mean to sound like a union shop
steward, but is this really feasible? It seems like you would need to
have a controller to start and stop the tape, select clock rates, which
tracks are in use, etc? In fact the controller might want to use the
observing schedule and log to coordinate the copy operation. If so then
it seems like you could safely remove the DOT-set function from both
sides of the interface. The P/QDATA interface could then be used
exclusively for injecting and recovering a small amount of auxiliary
data to/from the DTS. It seems well suited for that purpose. It should
be possible to send all the auxiliary data over the control interface so
that a separate connection to the PDATA input for this purpose is not
required.

Why is there an alt1pps? It seems as though selecting the 1PPS source
would be an external function. Probably there has been a lot of
discussion about this that I have missed. So taking it for granted that
ALT1PPS is needed, shouldn't it be a requirement for correct operation
that when the 1PPS source is changed a DOT_set is required? It
seems to be a potentially confusing situation if the 1 PPS can be
different from the one that determined the clock setting. I guess the clock
marches on using the old 1 PPS defined epoch, if you don't do a DOT_set,
but what is the point of that? Perhaps this is to allow potential
internal clock comparison circuitry in the DTS to be used to figure out
what is going on in some pathological case. Otherwise it seems like it
should be disallowed or at least discouraged.

How far in advance of the 1PPS does the DOT_set have to be issued?

H-8.1.2.5 obviously might include some AND operation on a PVALID input
and the playback unit's judgement of whether the reproduction is faithful.

The fact that Fclock is the DAS sampler rate isn't explained until
H-10.1.3, perhaps this should be explained somewhere earlier and more
prominently, perhaps in H-7.1.1.1, since it is so fundamental. It is
hinted at in other places, but the lack of an explanation early on seems
like an omission. One thing that is confusing is that H-7.1.1.1 says
that CLOCK is a clock accompanying the bit streams, but if I understand
correctly in fact it is not the clock OF the bit streams. If I want to
change sample rates do I change Fclock to the new sample rate, Fbsi to
the new bit rate, and then do DOT_set again? It is unfortunate if
changing the sample rate requires resetting the clock. I presume section
H-7.3.1 should also mention that the controller would specify the Fclock
to the DIM as well.

12.3 Both RS-232 and Ethernet interfaces being active at the same time
seems problematic. The control interface needs to provide a function to
allow one interface to lock out command functions on the other
interface. Should there also be a lockout on monitor requests since some
status requests reset the values?

The document seems to imply that only pre-existing systems are allowed
to use DR format. Will all new systems be NDR?

Ed
 