From owner-Big-Internet@munnari.oz.au Tue May 12 01:21:53 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA29792; Tue, 12 May 1992 01:22:07 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205111522.29792@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA29786; Tue, 12 May 1992 01:21:53 +1000 (from Z.Wang@cs.ucl.ac.uk)
Received: from sol.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.08387-0@bells.cs.ucl.ac.uk>; Mon, 11 May 1992 16:21:36 +0100
To: Big-Internet@munnari.oz.au
Cc: J.Crowcroft@cs.ucl.ac.uk, vcerf@NRI.Reston.VA.US
Subject: an alternative solution to the space exhaustion problem
Address: Computer Science Dept, University College London, London WC1E 6BT
Date: Mon, 11 May 92 16:21:09 +0100
From: Zheng Wang <Z.Wang@cs.ucl.ac.uk>

Your comments are welcome:
---------







Network Working Group                                       Zheng Wang
Request for Comments: DRAFT                              Jon Crowcroft
                                             University College London
                                                              May 1992


             A Two-Tier Address Structure for the Internet:
         a solution to the problem of address space exhaustion


Status of this Memo

   This RFC presents a solution to the problem of address space exhaustion
   in the Internet. It proposes a two-tier address structure for the
   Internet. This is an "idea" paper and discussion is strongly
   encouraged. Distribution of this memo is unlimited.

Introduction

   Address space exhaustion is one of the most serious and immediate
   problems that the Internet faces today [1,2].  The current Internet
   address space is 32-bit. Each Internet address is divided into two
   parts: a network portion and a host portion. This division
   corresponds to the three primary Internet address classes: Class A,
   Class B and Class C. Table 1 lists the network number statistics as
   of April 1992.

                      Total       Allocated     Allocated (%)
   Class A              126            48            38%
   Class B            16383          7006            43%
   Class C          2097151         40724             2%

          Table 1: Network Number Statistics (April 1992)
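   The percentages in Table 1 follow directly from the raw counts; a quick
   throwaway check (not part of the draft) reproduces them, and shows that
   48 of 126 Class A numbers is about 38%:

```python
# Recompute the allocation percentages in Table 1 (April 1992 figures).
stats = {
    "A": (126, 48),
    "B": (16383, 7006),
    "C": (2097151, 40724),
}
for cls, (total, allocated) in stats.items():
    pct = 100 * allocated / total
    print(f"Class {cls}: {allocated}/{total} allocated ({pct:.0f}%)")
```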

   If recent trends of exponential growth continue, the network numbers
   in Class B will soon run out [1,2]. There are over 2 million Class C
   network numbers and only 2% have been allocated.  However, a Class C
   network number can only accommodate 254 host numbers, which is too
   small for most networks.  With the rapid expansion of the Internet
   and drastic increase in personal computers, the time when the 32-bit
   address space is exhausted altogether is also not too distant [1-3].

   Recently several proposals have been put forward to deal with the
   immediate problem [1-4].  The Supernetting and C-sharp schemes
   attempt to make the Class C numbers more usable by re-defining the
   way in which Class C network numbers are classified and assigned
   [3,4].  Both schemes require modifications to the exterior routing
   algorithms and global coordination across the Internet may be



Wang, Crowcroft                                                 [Page 1]





RFC DRAFT                                                       May 1992


   required for the deployment. The two schemes do not expand the total
   number of addresses available to the Internet and can therefore only
   be used as a short-term fix for the next two or three years. Chiappa and
   Jacobson also proposed schemes in which the 32-bit address field is
   replaced with a field of the same size but with different meaning and
   the gateways on the boundary re-write the address when the packet
   crosses the boundary [1,2].  Such schemes, however, require
   substantial changes to the gateways and the exterior routing
   algorithm.

   In this paper, we present an alternative solution to the problem of
   address space exhaustion. Our scheme, Dual Network Addressing (DNA),
   is based on a two-tier address structure and sharing of addresses.
   It requires no modifications to the exterior routing algorithms, and
   any network can adopt the scheme individually at any time without
   affecting other networks.

The Scheme

   The scheme we propose here is, in some respects, similar to the
   extension system used in the telephone network. Many large
   organizations have extensive private telephone networks for internal
   use and, at the same time, hire a limited number of external lines
   for communications with the outside world. In such a telephone
   system, important offices may have direct external lines, and
   telephones in the public areas may be restricted to internal calls
   only. The majority of the telephones can usually make both internal
   and external calls, but they must share a limited number of external
   lines.  When an external call is being made, a pre-defined digit has
   to be pressed so that an external line can be allocated from the pool
   of external lines.

   We propose here a similar system for the Internet, which is called
   the Dual Network Addressing (DNA) scheme.  In the DNA scheme, there
   are two types of Internet addresses: Internal addresses and External
   addresses.  An internal address is an Internet address only used
   within one network and is unique only within that network. An
   interface with an internal address can only communicate with another
   interface with an internal address in the same network.  An external
   address is unique in the entire Internet and an interface with an
   external address can communicate directly to another interface with
   an external address over the Internet. All current Internet
   addresses are external addresses.

   In effect, the external addresses form one global Internet and the
   internal addresses form many private Internets.  Within one network,
   the external addresses are only used for inter-network communications
   and internal addresses for intra-network communications.  An External



Wang, Crowcroft                                                 [Page 2]





RFC DRAFT                                                       May 1992


   Address Sharing Service (EASS) is needed to manage the sharing of
   external addresses. An EASS server reserves a number of external
   addresses. When a machine that has only an internal address wants to
   communicate with a machine that has an external address in another
   network, it can send a request to an EASS server to obtain a
   temporary external address.  After use, the machine can return the
   external address to the EASS server.

   Due to the nature of network applications and the bandwidth
   constraints on wide area networks, most network traffic is likely to
   be confined to its local area network. At any given time, the number
   of machines communicating with machines in other networks is often
   limited, and much smaller than the total number of machines in the
   network. In many large corporate networks, the majority of machines
   may not be allowed to communicate directly with machines outside
   their own network at all.  Therefore, it is possible for a
   network with a very large number of machines to operate with a small
   number of external addresses.

   In the DNA scheme, all machines in a network are assigned a permanent
   internal address and can communicate with any machine within the
   same network.  The allocation of external addresses creates three-
   level privileges:

   *  machines such as important servers (e.g. mail, domain name,
      ftp, directory or archive servers) and central hosts, which
      have frequent communications with other networks or are
      likely to be called by machines in other networks, have
      permanent external addresses.

   *  machines which are not allowed to communicate with other
      networks have no external addresses and can only communicate
      with machines within their own network.

   *  the rest of the machines share a number of external
      addresses. The external addresses are allocated by
      the EASS server on request. These machines can only
      act as clients calling machines in other networks,
      i.e. they cannot be called by machines in other networks.

   A network can choose any network number other than its external
   network number as its internal network number.  Different networks
   can use the same network number as their internal number. We propose
   to reserve one Class A network number as the well-known network
   number for internal use.
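   As an illustration, an interface can tell internal from external
   addresses by the network number alone. The sketch below assumes,
   purely hypothetically, that network 10 is the reserved well-known
   internal Class A number; the draft does not fix which number is
   reserved.

```python
# Sketch: classifying a dotted-quad address as internal or external.
# INTERNAL_NET = 10 is a hypothetical placeholder; the draft reserves
# one Class A number for internal use but does not say which.
INTERNAL_NET = 10

def is_internal(addr: str) -> bool:
    """True if addr falls in the (hypothetical) well-known internal net."""
    first_octet = int(addr.split(".")[0])
    return first_octet == INTERNAL_NET

print(is_internal("10.1.2.3"))    # internal address -> True
print(is_internal("128.16.5.9"))  # external address -> False
```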






Wang, Crowcroft                                                 [Page 3]





RFC DRAFT                                                       May 1992


The Advantages

   The DNA scheme attempts to tackle the problem from the bottom of the
   Internet, i.e. each individual network, while other schemes described
   in the first section deal with the problem from the top of the
   Internet, i.e. gateways and exterior routing algorithms. These
   schemes, however, need not be considered mutually exclusive.
   The DNA scheme has several advantages:

   *  The DNA scheme takes an evolutionary approach towards the
      changes. Different networks can individually choose to
      adopt the scheme at any time, and only when necessary.
      There is no need for global coordination between different
      networks for deployment. The effects of the deployment
      are confined to the network in which the scheme is being
      implemented, and are invisible to exterior routing
      algorithms and external networks.

   *  With the DNA scheme, it is possible for a medium size organization
      to use a Class C network number with 254 external addresses.
      The scheme allows the current Internet to expand to over 2 million
      networks and each network to have more than 16 million hosts.
      This will allow considerable time for a long-term solution to
      be developed and fully tested.

   *  The DNA scheme requires modifications to the host software.
      However, the modifications are needed only in those networks which
      adopt the DNA scheme. Since all existing Class A and B networks
      usually have sufficient external addresses for all their
      machines, they do not need to adopt the DNA scheme, and therefore
      need no modifications at all to their software. The networks
      which need to use the DNA scheme are those new networks which are
      set up after the Class A and B numbers run out and have to
      use a Class C number.

   *  The DNA scheme makes it possible to move to a new addressing
      scheme without expanding the 32-bit address length to 64-bit.
      With the two-tier address structure, the current 32-bit space
      can accommodate over 4 billion hosts in the global Internet and
      16 million hosts in each individual network. When we move to a
      classless multi-hierarchic addressing scheme, the use of external
      addresses can be more efficient and less wasteful and the
      32-bit space can be adequate for the external addresses.

   *  When a new addressing scheme has been developed, all current
      Internet addresses have to be changed. The DNA scheme will make
      such an undertaking much easier and smoother, since only the
      EASS servers and the machines with permanent external addresses will



Wang, Crowcroft                                                 [Page 4]





RFC DRAFT                                                       May 1992


      be affected, and communications within the network will not
      be interrupted.

The Modifications

   The major modifications to the host software are in the network
   interface code. The DNA scheme requires each machine to have at least
   two addresses, but most current host software does not allow two
   addresses to be bound to one physical interface. This problem could
   be solved by using two network interfaces on each machine, but this
   option is too expensive. Note that the two interfaces would actually be
   connected to the same physical network.  Therefore, if we modify the
   interface code to allow two logical interfaces to be mapped onto one
   single physical interface, the machine can then use both the external
   address and the internal address with one physical interface as if it
   had two physical interfaces.  In effect, two logical IP networks
   operate over the same physical network.

   An EASS server is required to manage the reserved external addresses
   and to keep track of all the external addresses that have been
   allocated.  When a machine with an internal address requires an
   external address, it sends an AddrRequest(InterAddr) message to the
   EASS server.  When the EASS server receives a request, it allocates
   one external address from its reserved pool and responds with an
   AddrAlloc(InterAddr, ExterAddr) message. After it has finished using
   the external address, the machine returns it to the EASS server
   with an AddrReturn(InterAddr, ExterAddr) message, so that the address
   can be deallocated.

   Machines may crash and lose track of their current external
   addresses. To deal with this problem, two additional messages are
   used for checking the current external addresses.  The EASS server
   can send an AddrCheck(InterAddr, ExterAddr) message to a machine that
   has not released its external address for a certain time period (e.g.
   24 hours). The machine has to reply with an AddrCurrent(InterAddr,
   ExterAddr) message to inform the EASS server of its current external
   address. The EASS server then updates its allocation table
   accordingly.

   The EASS server may receive duplicate AddrRequest(InterAddr) messages
   from the same machine that has been allocated an external address.
   This may occur when the AddrAlloc(InterAddr, ExterAddr) message is
   lost or the machine comes up after a crash. Before an EASS server
   allocates a new external address, it has to check its allocation
   table. If the machine already has an external address, it will use
   the same external address. This avoids two different external
   addresses being allocated to the same machine.
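   The AddrRequest/AddrAlloc/AddrReturn exchange and the duplicate-request
   check can be sketched in a few lines. This is a minimal illustration
   assuming only the message names from the text; the class name, method
   names, address pool, and internal addresses are all invented for the
   example.

```python
# Sketch of an EASS server's allocation table. Message names follow the
# text (AddrRequest / AddrAlloc / AddrReturn); everything else is
# illustrative, not part of the draft.
class EassServer:
    def __init__(self, pool):
        self.free = list(pool)   # reserved external addresses
        self.table = {}          # InterAddr -> ExterAddr

    def addr_request(self, inter_addr):
        """Handle AddrRequest: allocate an external address, idempotently.
        A duplicate request (lost AddrAlloc, or a rebooted host) gets the
        same external address back, avoiding double allocation."""
        if inter_addr in self.table:
            return self.table[inter_addr]
        exter_addr = self.free.pop(0)
        self.table[inter_addr] = exter_addr
        return exter_addr        # carried back in AddrAlloc

    def addr_return(self, inter_addr, exter_addr):
        """Handle AddrReturn: put the address back in the reserved pool."""
        if self.table.get(inter_addr) == exter_addr:
            del self.table[inter_addr]
            self.free.append(exter_addr)
```

   A repeated AddrRequest from the same internal address simply re-sends
   the existing mapping rather than allocating a second external address.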




Wang, Crowcroft                                                 [Page 5]





RFC DRAFT                                                       May 1992


   The DNA scheme also has implications for the domain name service.
   Many machines will have two entries in the local name server. The
   domain name server must examine the source address of the request and
   decide which entry to use. If the source address matches the
   well-known internal network number, it returns the internal address
   for the domain name. Otherwise, the name server returns the external
   address.  Many hosts do an inverse lookup on incoming connections, so
   it is desirable to integrate the EASS functions into the DNS server
   so that the mapping in the DNS can be updated when an external
   address is allocated.
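   The two-entry lookup described above can be sketched directly. The
   well-known internal Class A number is not fixed by the draft, so 10 is
   a placeholder here, and the zone data is invented for illustration.

```python
# Sketch of the split lookup: return the internal entry to requesters on
# the (hypothetical) well-known internal network, else the external one.
INTERNAL_NET = 10  # hypothetical reserved internal Class A number

# Hypothetical zone data: a name may carry two entries under DNA.
zone = {"host.example.ac.uk": {"internal": "10.0.0.5",
                               "external": "192.0.2.5"}}

def resolve(name, source_addr):
    """Pick the entry based on the source address of the request."""
    entries = zone[name]
    if int(source_addr.split(".")[0]) == INTERNAL_NET:
        return entries["internal"]
    return entries["external"]
```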

References

   [1]  J. Noel Chiappa "The IP Addressing Issue", Internet Draft,
        October 1990.

   [2]  D. Clark, L. Chapin, V. Cerf, R. Braden, "Towards the Future
        Architecture", RFC 1287, SRI International, December 1991.

   [3]  F. Solensky, F. Kastenholz, "A Revision to IP Address
        Classifications", Internet Draft, March 1992.

   [4]  V. Fuller, T. Li, J. Yu, K. Varadhan, "Supernetting:
        an Address Assignment and Aggregation Strategy", Internet Draft,
        March 1992.

Authors' Address:

   Zheng Wang
   Jon Crowcroft
   Dept of Computer Science
   University College London
   London WC1E 6BT, UK

   zheng.wang@cs.ucl.ac.uk
   jon.crowcroft@cs.ucl.ac.uk
















Wang, Crowcroft                                                 [Page 6]

From owner-Big-Internet@munnari.oz.au Wed May 20 02:13:38 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA20262; Wed, 20 May 1992 02:13:48 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205191613.20262@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA20259; Wed, 20 May 1992 02:13:38 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from wanstead.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.24303-0@bells.cs.ucl.ac.uk>; Tue, 19 May 1992 17:13:18 +0100
To: big-internet@munnari.oz.au
Subject: new ip protocol proposal.....
Date: Tue, 19 May 92 17:12:36 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>


Gang,

I am pleased to announce a proposal for a new internet protocol
called Pip.  The intent of this protocol is to replace IP.  As
such, it is a counter-proposal to the two "mid-term" proposals
floated by the Road group--CLNP and IPIP.  Except, I consider Pip
to be a long-term (foreseeable future) proposal, because it has
many more features than CLNP.

The paper is available by anonymous ftp at thumper.bellcore.com,
file pub/tsuchiya/pip.ps.Z.  It should be posted soon as an
internet-draft.  And, there will be a plenary talk and BOF about
it in Boston.

Just to pique a little interest, below is the first two sections
of the paper.................

PT

____________________________________________________________

Pip: The `P' Internet Protocol

Paul F. Tsuchiya
Bellcore
tsuchiya@thumper.bellcore.com
May 19, 1992

1.0  Purpose of this draft

Pip is an IP protocol that scales, encodes policy, and is high speed. The 
purpose of this draft is to explain the basic concepts behind Pip so that 
people can start thinking about potential pitfalls. I am proposing Pip as an 
alternative to the two "medium term" proposals that emerged from the 
Road (Routing and Addressing) group to deal with the dual IP problems 
of scaling and address depletion. Because this proposal, which represents 
new ideas, is competing with old (and therefore well thought-out) ideas, I 
wish to circulate it (and get the process started) as quickly as possible, 
albeit in not as complete a form as I would like. I expect to have a 
complete proposal by the beginning of September. There will be a plenary 
presentation and a BOF covering this material at the Boston meeting of 
IETF.

2.0  Pip General

Pip has the following features:

1.	Pip carries multiple address types in a common format. As such, it is 
beneficial for transition from one address type to another, and for future 
evolution (of routing techniques as well as of addressing schemes).

2.	The Pip address is completely general (multiple levels of hierarchy, 
expands to any number of systems).

3.	The Pip address is compact: it grows with the number of systems.

4.	The Pip address efficiently encodes policy (source-based) routes, both 
in "long form" (explicit path) and "short form" (path identifier).

5.	Because the Pip address can be a path identifier (multi-layer if
desired, like the ATM VCI/VPI), Pip can be used in a connection-oriented
fashion (this paper only briefly touches on mechanisms for
controlling connections).

6.	The Pip address includes multicasting (potentially substantially more
sophisticated than current IP multicast; for instance, hierarchical
multicast).

7.	Pip efficiently encodes QOS (Quality-of-Service) information.

8.	The routing table lookup with Pip is well-bounded (by the depth of 
the address hierarchy).

9.	Pip accommodates "multiple defaults" routing from (multi-homed) 
stub domains.

10.	Pip allows intra-domain routers and hosts to operate with no notion
of the "inter-domain" parts of their address, if desired. This is
equivalent to current IP hosts and intra-domain routers not needing to
know their own network number.

11.	Pip accommodates tunneling across transit domains.

12.	By virtue of 8 and 9, Pip accommodates separation of interior and
exterior routing.

13.	Pip simplifies handling mobile systems (by having flat network layer 
identifiers).

In short, Pip is a "next generation" protocol, intended to allow the internet 
to evolve over the foreseeable future.

One of the design philosophies behind Pip is that it encodes all "routing" 
information (what is traditionally spread over the address and QOS fields) 
in a single structure (the Routing Directive). The rules for parsing the 
structure are simple on one hand, but provide a rich set of routing 
functions. Therefore, it is possible to build a single forwarding engine that 
will accommodate many different types of routing styles, including 
traditional hierarchical addresses, policy, source route, and virtual circuit. 
This way, the forwarding engine can be built in hardware and can remain 
constant even while internet routing evolves.

Another design philosophy behind Pip is that it delays the definition of 
how an internet packet should be composed and interpreted. The meanings of 
addresses and QOS information are dynamically determined by 
information in Directory Services, distributed protocols such as routing 
protocols, and MIBs, rather than in a protocol specification. Current 
internet protocols have continuously been moving towards this 
philosophy, but with header formats that are not conducive to late 
semantic definition. Pip facilitates late semantic definition of the internet 
protocol header. This on one hand makes it easier to evolve the internet 
incrementally, but requires that all systems (hosts, routers, and directory 
servers) be a little smarter, and that algorithms be a little more complex. 
This, in a nutshell, is the trade-off being made by Pip.

From owner-Big-Internet@munnari.oz.au Wed May 20 02:56:20 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA21381; Wed, 20 May 1992 02:56:23 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from mundamutti.cs.mu.OZ.AU by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA21378; Wed, 20 May 1992 02:56:20 +1000 (from kre)
To: big-internet@munnari.oz.au
Subject: Re: new ip protocol proposal..... 
In-Reply-To: Paul Tsuchiya's message of Tue, 19 May 92 17:12:36 +0100.
Date: Wed, 20 May 92 02:56:11 +1000
Message-Id: <6225.706294571@munnari.oz.au>
From: Robert Elz <kre@munnari.oz.au>

    Date:        Tue, 19 May 92 17:12:36 +0100
    From:        Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>

    The paper is available by anonymous ftp at thumper.bellcore.com,
    file pub/tsuchiya/pip.ps.Z.

It is now also in the big-internet archives on munnari.oz.au
in big-internet/pip.ps.Z

kre

From owner-Big-Internet@munnari.oz.au Fri May 22 23:49:00 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA16406; Fri, 22 May 1992 23:49:17 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from nsco.network.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA16378; Fri, 22 May 1992 23:49:00 +1000 (from jmh@anubis.network.com)
Received: from anubis-e1.network.com by nsco.network.com (5.61/1.34)
	id AA14762; Fri, 22 May 92 08:52:48 -0500
Received: from petunia.network.com by anubis.network.com (4.0/SMI-4.0)
	id AA07072; Fri, 22 May 92 08:48:37 CDT
Date: Fri, 22 May 92 08:48:37 CDT
From: jmh@anubis.network.com (Joel Halpern)
Message-Id: <9205221348.AA07072@anubis.network.com>
To: big-internet@munnari.oz.au
Subject: PIP Questions	

After reading Paul's first draft PIP document, I have a few questions:

Paul, that description needs work. I found it quite hard to understand.
However, I think I have figured out some of it, and I am confused.

1) At what scoping must there be consistency in numbering?
	Do tunnels within a domain have to be consistent?
	Do RHs for backbones have to be consistent?

2) Are tunnels created by management, by protocol, or by black magic?
	(And over what scope do tunnel numbers have to be consistent?)

3) It looks like tunnels are sort of "local destinations" to get the sender
	around having to know your internal topology.  Is this
	correct?

4) It sure looks like "RH.Tunnel" is a second level of tunnelling, after
	you very carefully said there was only 1.

5) You refer to support for mobile hosts.  If a host is mobile, and I want
	to reach it, how do I find an RH path to get to it?  Directory
	Service????

6) It appears that the RH path one generates relates to the connectivity
	of the net.  This means that, in the smart host case, when trying
	to reach a destination, one starts by getting (from directory?)
	the terminal RH segment (61.92.7?) for the host one is after.
	One then has to splice that together with one's own RH segment
	(1.14.12.96?).  By itself, that would be fine, since it appears
	that the top levels must know how to get together.  But suppose
	that there are two entries for us, and two entries for the
	destination, and not all combinations will work at any given
	time  (the link between A and D, and some others might be down).
	How does one establish the path?  Try each combination until
	something works?

7) Size of RF?  I understand that you must use the largest RF size that
	will be needed.  However, it appears that if, when allocating
	numbers, the space at a given level is exhausted, then a larger
	size is needed.  To do that, everyone must simultaneously change
	over!  

8) It looks like, in many situations, connection setup will be slow and
	expensive.  My understanding of much other ongoing work is that 
	people want faster connection setup.  Some of this can be dealt
	with by caching the RH info for a destination, but not all.
	Have you any thoughts on that?

Minor point, on page 7, when you describe the 6 bit RHF Offset field, I
presume that the units are RHs, but the text is not clear.

Just starting to follow what you are doing,
Joel M. Halpern			jmh@network.com
Network Systems Corporation




From owner-Big-Internet@munnari.oz.au Sat May 23 01:08:38 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA19122; Sat, 23 May 1992 01:08:46 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205221508.19122@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA19119; Sat, 23 May 1992 01:08:38 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from waffle.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.01630-0@bells.cs.ucl.ac.uk>; Fri, 22 May 1992 16:08:28 +0100
To: big-internet@munnari.oz.au
Cc: jmh@anubis.network.com
Subject: PIP Questions
Date: Fri, 22 May 92 16:07:43 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>

> 
> Paul, that description needs work. I found it quite hard to understand.
> However, I think I have figured out some of it, and I am confused.

Like I say, I want to get it out in whatever form so that
people can start thinking about it.  Hopefully the basic idea
is coming through if not all the details.

> 
> 1) At what scoping must there be consistency in numbering?
> 	Do tunnels within a domain have to be consistent?

Tunnels are local to domains.  But there is only one tunnel per domain,
so the scoping of numbering for tunnels has to be consistent within a
domain.

> 	Do RHs for backbones have to be consistent?

RH (routing hint) numbers are per "logical router".  So, for a given LR number, 
the RH numbers must be consistent.  The LR number can be purely local, or
it can have scope across domains (the latter case being more usual,
keeping in mind the LR number can change "values" as it crosses domains
without changing semantics, since each domain has a local mapping of
LR number bits).

> 
> 2) Are tunnels created by management, by protocol, or by black magic?
> 	(And over what scope do tunnel numbers have to be consistent?)

Tunnels are created by routing protocol (with a little help from management).
A typical case would be, for instance, that a domain had M border routers
attached to N exit backbones.  N tunnel values would be created, and those
would be given to the appropriate border routers (the ones attached to
each exit backbone).  Then, the border router would advertise the tunnel
value in its routing updates, and therefore all routers in the domain would
know how to route to the exit points.

> 
> 3) It looks like tunnels are sort of "local destinations" to get the sender
> 	around having to know your internal topology.  Is this
> 	correct?

No.  The tunnel behaves basically like a loose source route.  It is a
means of preventing the internal routers from having to know about the
external world.  Hosts don't need to know the internal topology whether
or not tunneling is being used.

> 
> 4) It sure looks like "RH.Tunnel" is a second level of tunnelling, after
> 	you very carefully said there was only 1.

RH.Tunnel is an ALTERNATE form of tunneling, not a second level (in the
hierarchical sense).  It is useful mainly for setting the
source address "source RH Number" on packets leaving the local domain.

> 
> 5) You refer to support for mobile hosts.  If a host is mobile, and I want
> 	to reach it, how do I find an RH path to get to it?  Directory
> 	Service????

The support for mobile hosts I refer to is simply the fact that the ID
is separate from the routing directive, and therefore the routing directive
can change due to mobility without hindering the identification function.
The actual protocol support for this is another matter.  It could be
directory service, but if that is too slow or cumbersome, then perhaps
a "home gateway" in the sense of Zaw-Sing some years back is better.

> 
> 6) It appears that the RH path one generates relates to the connectivity
> 	of the net.  This means that, in the smart host case, when trying
> 	to reach a destination, one starts by getting (from directory?)
> 	the terminal RH segment (61.92.7?) for the host one is after.
> 	One then has to splice that together with one's own RH segment
> 	(1.14.12.96?).  By itself, that would be fine, since it appears
> 	that the top levels must know how to get together.  But suppose
> 	that there are two entries for us, and two entries for the
> 	destination, and not all combinations will work at any given
> 	time  (the link between A and D, and some others might be down).
> 	How does one establish the path?  Try each combination until
> 	something works?

That would be one way.  If paths usually work, then this method would
be fine (you'd get a destination backbone unreachable and try something
else).  Or, one could get more sophisticated and get into policy routing
stuff a la IDPR or Estrin.  This, by the way, is exactly the same problem
that the policy routing people have for dealing with the case where they
have a somewhat static local topology map, and they find out in real
time when they try to set up a path if there are failed resources.

> 
> 7) Size of RF?  I understand that you must use the largest RF size that
> 	will be needed.  However, it appears that if, when allocating
> 	numbers, the space at a given level is exhausted, then a larger
> 	size is needed.  To do that, everyone must simultaneously change
> 	over!  

The size of the RHF is on a per-packet basis, not a global basis!  (I admit
that I didn't explicitly point this out in the paper, but it never occurred
to me that it would be interpreted as a global constant).  That is, when
a host creates a packet, it finds the biggest individual RHF, and sets
all the RHFs in the packet to that size.  Another packet may have bigger
or smaller RHFs.
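The per-packet rule can be stated in a few lines. The byte-granular width
below is my own assumption for illustration; the paper excerpt does not
specify the encoding granularity.

```python
# Sketch of per-packet RHF sizing: a host picks one field width for the
# whole packet, wide enough for its largest individual RHF value.
# Whole-byte widths are an assumption, not taken from the paper.
def rhf_width_bytes(rhf_values):
    """Smallest whole-byte width holding the largest RHF in this packet."""
    biggest = max(rhf_values)
    return max(1, (biggest.bit_length() + 7) // 8)
```

A packet whose RHFs are all small gets a 1-byte width; one large RHF in
the list widens every field in that packet, but only that packet.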

> 
> 8) It looks like, in many situations, connection setup will be slow and
> 	expensive.  My understanding of much other ongoing work is that 
> 	people want faster connection setup.  Some of this can be dealt
> 	with by caching the RH info for a destination, but not all.
> 	Have you any thoughts on that?

I'm confused by this question.  I hardly talk about "connections" in the
paper, except to briefly mention that one could use the RHFs as a virtual
circuit identifier if one wanted to.  But if the normal usage of Pip
is routing hints that mimic current globally-significant addresses, then
there is no "setup" at all.  All routers have enough information to route
every individual packet, in the true datagram sense.

But, ignoring this, why would connection setup be any slower or more 
expensive with Pip than with anything else?  It all depends on how much
resource reservation you are doing, whether you need to get an ack all
the way back from the destination before proceeding with data packets,
and so on.  I don't discuss any of this in my paper.

> 
> Minor point, on page 7, when you describe the 6 bit RHF Offset field, I
> presume that the units are RHs, but the text is not clear.

Yes.

> 
> Just starting to follow what you are doing,

Yeah.  Some of this stuff is pretty different, so I guess there will be
a lot of confusion early on.  Also, this paper talks about the behaviour
of Pip given certain info in the forwarding tables, but doesn't talk at
all about how that information gets there.  I think that in most cases
it is pretty easy to figure out how the info will get there, but nonetheless
it needs to be written down somewhere.

From owner-big-internet@munnari.oz.au Sat May 23 06:23:38 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA26409; Sat, 23 May 1992 06:23:40 +1000 (from owner-big-internet)
Return-Path: <owner-big-internet@munnari.oz.au>
Received: from nsco.network.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA23882; Sat, 23 May 1992 04:18:22 +1000 (from jmh@anubis.network.com)
Received: from anubis-e1.network.com by nsco.network.com (5.61/1.34)
	id AA16427; Fri, 22 May 92 13:22:01 -0500
Received: from petunia.network.com by anubis.network.com (4.0/SMI-4.0)
	id AA08930; Fri, 22 May 92 13:17:51 CDT
Date: Fri, 22 May 92 13:17:51 CDT
From: jmh@anubis.network.com (Joel Halpern)
Message-Id: <9205221817.AA08930@anubis.network.com>
To: big-internet@munnari.oz.au
Subject: Re: PIP
Cc: tsuchiya@thumper.bellcore.com

First,  another question:

In the paper, you talk about multiple LR values, with local interpretation,
which affect the usage of the RH field.  I understand how this works
if the entire path is subject to that interpretation (possibly with
value remapping).  However, you indicate that you could splice together
a path where the local end uses one interpretation, and the backbones use
a different interpretation.  Again, I understand that on the outbound
direction the border router changes LR to the global meaning.

But suppose that the local region recognizes more than one LR type.  Now,
suppose the receiver of the message wants to use the RH that he gets  (as
that seems the best way to get a path to someone if one is a server).  Then,
> he will return an RH where the last set of entries are actually relative to
a local interpretation.  How is the border router (at the entry to the
domain that started this mess) supposed to know which LR value to plug
in?

Additionally:
Given that obviously a router should not modify the number of RH entries,
there is a question for an entry border router to a transit domain.  That
domain may have policies about which internal paths are to be used by
transit traffic.  These policies are not likely to be reflected in the
header from the originator of the packet (no one is going to trust
someone else to happen to comply with their policies).  It appears that
the tunnels are used in this case to implement the policies?

Now, on to Paul's further comments on my questions:


>> 1) At what scoping must there be consistency in numbering?
>> 	Do tunnels within a domain have to be consistent?
>
>Tunnels are local to domains.  But there is only one tunnel per domain,
>so the scoping of numbering for tunnels has to be consistent within a
>domain.
Actually, there is obviously more than one tunnel per domain.  In your
example, there are two tunnels. In fact, given that a tunnel is a source
route, and selects an exit domain, there appears to be a need for something like
the product of the number of neighbor domains, times the number of selectable
paths to those domains  (reference figure 6:  If w is to have tunnels to use
A or D as exits, there must be 2 tunnels.  If some intermediate policy
is related to the use of d or a for internal transit, there may be a
desire for 2 tunnels to D.  Yes, this can be accomplished with RHs, but
if this is a transit domain (rather than an end domain), then a tunnel is
the only way to go.)

>> 
>> 2) Are tunnels created by management, by protocol, or by black magic?
> 	(And over what scope do tunnel numbers have to be consistent?)
>
>Tunnels are created by routing protocol (with a little help from management).
>A typical case would be, for instance, that a domain had M border routers
>attached to N exit backbones.  N tunnel values would be created, and those
>would be given to the appropriate border routers (the ones attached to
>each exit backbone).  Then, the border router would advertise the tunnel
>value in its routing updates, and therefore all routers in the domain would
>know how to route to the exit points.
>
And then some form of tunnel redirect is used to tell the hosts to use
the tunnels?  When does such get generated?

>> 
>> 4) It sure looks like "RH.Tunnel" is a second level of tunnelling, after
>> 	you very carefully said there was only 1.
>
>RH.Tunnel is an ALTERNATE form of tunneling, not a second level (in the
>hierarchical sense).  It is useful mainly for setting the
>source address "source RH Number" on packets leaving the local domain.
>
You say that this is not a second level.  However, in section 5.2.1,
> example 2.1a, the source originates a packet with an RH-Tunnel field.
In the text you say: "x will eventually discover a tunnel that will get
the packet to router b."  As such, there are two tunnel values in the
same packet.

Is the RH-Tunnel only for allowing defaulting and allowing the originator
not to know the full "address" of this domain?

>> 
>> 5) You refer to support for mobile hosts.  If a host is mobile, and I want
>> 	to reach it, how do I find an RH path to get to it?  Directory
>> 	Service????
>
>The support for mobile hosts I refer to is simply the fact that the ID
>is separate from the routing directive, and therefore the routing directive
>can change due to mobility without hindering the identification function.
>The actual protocol support for this is another matter.  It could be
>directory service, but if that is too slow or cumbersome, then perhaps
>a "home gateway" in the sense of Zaw-Sing some years back is better.
>
My concern with the hand-waving at mobile hosts, is that for clean support
of mobile hosts, the resolution procedures used to translate IDs to RHs
need to be the same, whether a host has a location which is fixed over a
large time scale, or is mobile over very short intervals.  I understand
that the separation of the two notions helps this.  I suppose the rest
can wait until this is further developed.

>> 
>> 6) It appears that the RH path one generates relates to the connectivity
>> 	of the net.  This means that, in the smart host case, when trying
>> 	to reach a destination, one starts by getting (from directory?)
>> 	the terminal RH segment (61.92.7?) for the host one is after.
>> 	One then has to splice that together with ones own RH segment
>> 	(1.14.12.96?).  By itself, that would be fine, since it appears
>> 	that the top levels must know how to get together.  But suppose
>> 	that there are two entries for us, and two entries for the
>> 	destination, and not all combinations will work at any given
>> 	time  (the link between A and D, and some others might be down).
>> 	How does one establish the path?  Try each combination until
>> 	something works?
>
>That would be one way.  If paths usually work, then this method would
>be fine (you'd get a destination backbone unreachable and try something
>else).  Or, one could get more sophisticated and get into policy routing
>stuff a la idpr or estrin.  This, by the way, is exactly the same problem
>that the policy routing people have for dealing with the case where they
>have a somewhat static local topology map, and they find out in real
>time when they try to set up a path if there are failed resources.
>
I understand that this is a general policy problem.  However, it needs a
clearer solution when the entire packet delivery architecture rests on
this framework.  This is particularly an issue since this splicing is
likely to have to be done in every end system, and not all of them will
have the amount of topology and routing information that a full idpr
router would have.

>> 
>> 7) Size of RF?  I understand that you must use the largest RF size that
>> 	will be needed.  However, it appears that if, when allocating
>> 	numbers, the space at a given level is exhausted, then a larger
>> 	size is needed.  To do that, everyone must simultaneously change
>> 	over!  
>
>The size of the RHF is on a per packet basis, not global basis!  (I admit
>that I didn't explicitly point this out in the paper, but it never occurred
>to me that it would be interpreted as a global constant).  That is, when
>a host creates a packet, it finds the biggest individual RHF, and sets
>all the RHFs in the packet to that size.  Another packet may have bigger
>or smaller RHFs.
>
But, a router can re-write the RH fields along the way.  I understand
that there are error messages if the new value does not fit.  However,
all of these error checks (enough levels, big enough fields, etc) result
in a significant overhead in checking, and more error messages being
generated than looks desirable.

>> 
>> 8) It looks like, in many situations, connection setup will be slow and
>> 	expensive.  My understanding of much other ongoing work is that
>> 	people want faster connection setup.  Some of this can be dealt
>> 	with by caching the RH info for a destination, but not all.
>> 	Have you any thoughts on that?
>
>I'm confused by this question.  I hardly talk about "connections" in the
>paper, except to briefly mention that one could use the RHFs as a virtual
>circuit identifier if one wanted to.  But if the normal usage of Pip
>is routing hints that mimic current globally-significant addresses, then
>there is no "setup" at all.  All routers have enough information to route
>every individual packet, in the true datagram sense.
>
>But, ignoring this, why would connection setup be any slower or more 
>expensive with Pip than with anything else?  It all depends on how much
>resource reservation you are doing, whether you need to get an ack all
>the way back from the destination before proceeding with data packets,
>and so on.  I don't discuss any of this in my paper.
>
Connection setup here refers to getting the correct RH fields, so that one
can communicate.  There appear to be several sets of cases where one
tries something.  If that does not work, you probably get an error message.
When you get the error message, you can correct what it reports.  Depending
on the problems, more than one iteration of this may be necessary.  Thus,
it can take a while.  (Thus also, the thought that caching will help some.)

Thank you,
Joel M. Halpern			jmh@network.com
Network Systems Corporation


From owner-Big-Internet@munnari.oz.au Tue May 26 18:02:35 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA09941; Tue, 26 May 1992 18:02:42 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205260802.9941@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA09936; Tue, 26 May 1992 18:02:35 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from waffle.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.10245-0@bells.cs.ucl.ac.uk>; Tue, 26 May 1992 09:01:06 +0100
To: jmh@anubis.network.com (Joel Halpern)
Cc: big-internet@munnari.oz.au
Subject: A mod to one of my answers.....
In-Reply-To: Your message of "Fri, 22 May 92 08:48:37 CDT." <9205221348.AA07072@anubis.network.com>
Date: Tue, 26 May 92 09:01:04 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>


I take back my answer to one of Joel's questions.

The question was:
> > 
> > 4) It sure looks like "RH.Tunnel" is a second level of tunnelling, after
> > 	you very carefully said there was only 1.
> 

And my answer was:

> RH.Tunnel is an ALTERNATE form of tunneling, not a second level (in the
> hierarchical sense).  It is useful mainly for setting the
> source address "source RH Number" on packets leaving the local domain.
> 

It is true that the RH.Tunnel cannot be a second level of tunnelling under
the regular Tunnel, but the regular Tunnel could
be a second level of tunneling under the RH.Tunnel.  But the RH.Tunnel
is used for pretty specific purposes (allowing the "inter-domain" part
of the address to be filled in when packets exit a stub domain), so I
don't know that this bit of hierarchical tunneling would be of much use.

PT

From owner-Big-Internet@munnari.oz.au Tue May 26 19:19:46 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA12030; Tue, 26 May 1992 19:19:58 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205260919.12030@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA12026; Tue, 26 May 1992 19:19:46 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from waffle.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.19013-0@bells.cs.ucl.ac.uk>; Tue, 26 May 1992 10:19:09 +0100
To: jmh@anubis.network.com (Joel Halpern)
Cc: big-internet@munnari.oz.au
Subject: Re: PIP
In-Reply-To: Your message of "Fri, 22 May 92 13:17:51 CDT." <9205221817.AA08930@anubis.network.com>
Date: Tue, 26 May 92 10:19:05 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>


> 
> First,  another question:
> 
> In the paper, you talk about multiple LR values, with local interpretation,
> which affect the usage of the RH field.  I understand how this works
> if the entire path is subject to that interpretation (possibly with
> value remapping).  However, you indicate that you could splice together
> a path where the local end uses one interpretation, and the backbones use
> a different interpretation.  Again, I understand that on the outbound
> direction the border router changes LR to the global meaning.
> 
> But suppose that the local region recognizes more than one LR type.  Now,
> suppose the receiver of the message wants to use the RH that he gets  (as
> that seems the best way to get a path to someone if one is a server).  Then,
> he will return an RH where the last set of entries are actually relative to
> a local interpretation.  How is the border router (at the entry to the
> domain that started this mess) supposed to know which LR value to plug
> in?

I don't fully understand your scenario.  In general, the routing algorithms
will set things up so that each pair of external border routers will know
how one LR maps into another, and when LR semantics get lost.  So, if the
meaning of the local LR is purely local, then that meaning will be lost
when the packet exits the local domain.   If the RH Numbers associated with
the now lost LR are simply not valid under a different LR, then the border
router should throw away the packet and send back an error message.  I
think that normally one would set up RH Numbers such that the same RH numbers
are valid under a range of LRs (where the different LRs in this case
represent different QOS types, say).  This way, even though you might lose
the QOS semantics, the RH-Numbers are still valid under a different QOS (LR) 
type.

> 
> Additionally:
> Given that obviously a router should not modify the number of RH entries,
> there is a question for an entry border router to a transit domain.  That
> domain may have policies about which internal paths are to be used by
> transit traffic.  These policies are not likely to be reflected in the
> header from the originator of the packet (no one is going to trust
> someone else to happen to comply with their policies).  It appears that
> the tunnels are used in this case to implement the policies?

I guess they could be, but it seems that the border router would have to
do something more sophisticated than what I describe in my draft.  For
instance, parse the source RH-number completely to determine that the
packet is coming from domain X, and therefore gets internal path Y.

> 
> Now, on to Paul's further comments on my questions:
> 
> 
> >> 1) At what scoping must there be consistency in numbering?
> >> 	Do tunnels within a domain have to be consistent?
> >
> >Tunnels are local to domains.  But there is only one tunnel per domain,
> >so the scoping of numbering for tunnels has to be consistent within a
> >domain.
> Actually, there is obviously more than one tunnel per domain.  In your

By "one tunnel per domain", I meant one "tunnel scope" per domain, not 
literally one tunnel.  (Sheesh Joel, give me a little more credit than that :-)

> example, there are two tunnels. In fact, given that a tunnel is a source
> route, and selects an exit domain, there appears to be a need for something like
> the product of the number of neighbor domains, times the number of selectable
> paths to those domains  (reference figure 6:  If w is to have tunnels to use

Well, this might be the theoretical max, but I guess a closer number might
be the number of neighbor domains times the number of real QOSs that one wants
to implement.  Is this a problem?

> A or D as exits, there must be 2 tunnels.  If some intermediate policy
> is related to the use of d or a for internal transit, there may be a
> desire for 2 tunnels to D.  Yes, this can be accomplished with RHs, but
> if this is a transit domain (rather than an end domain), then a tunnel is
> the only way to go.)

Why is a tunnel the only way to go?  If one wants to convey "external" routing
information into their transit domain routers, then there is no problem doing
so.  In fact, I guess that with backbones like NSFnet, all routers are border
routers (and therefore must know external information), so there may be no 
advantage to doing tunneling in this case.

> 
> >> 
> >> 2) Are tunnels created by management, by protocol, or by black magic?
> >> 	(And over what scope do tunnel numbers have to be consistent?)
> >
> >Tunnels are created by routing protocol (with a little help from management).
> >A typical case would be, for instance, that a domain had M border routers
> >attached to N exit backbones.  N tunnel values would be created, and those
> >would be given to the appropriate border routers (the ones attached to
> >each exit backbone).  Then, the border router would advertise the tunnel
> >value in its routing updates, and therefore all routers in the domain would
> >know how to route to the exit points.
> >
> And then some form of tunnel redirect is used to tell the hosts to use
> the tunnels?  When does such get generated?

What do you mean?  Of course it gets generated when the host sends a packet 
that attempts to use a high (external) level of routing in an RD.  At least
in my examples, either tunneling OR external-level RHs, but not BOTH, are
ever in use at a given time, so once the host gets this redirect (actually,
not a redirect, but an error report), it saves
it off and uses tunnels from then on out (unless the routers decide at some
point not to use tunnels, in which case the host will get another error
report).  I suppose it is possible to mix tunneling and external RHs, so the
error report would have to be more detailed in this case, which is a good bit
more complicated, so perhaps it should stay one or the other.
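The host behaviour described above — use external-level RHs until an error report arrives, then switch to tunnels, and switch back if the routers later stop using tunnels — could look roughly like this. All names here are illustrative; nothing in the draft prescribes this structure.

```python
class HostRoutingState:
    """Per-destination choice between external-level RHs and tunnels,
    toggled by error reports from routers (illustrative sketch only)."""

    def __init__(self):
        self.use_tunnel = {}          # destination -> bool

    def choose(self, dest):
        # Default to external-level RHs until an error report says otherwise.
        return "tunnel" if self.use_tunnel.get(dest) else "external"

    def on_error_report(self, dest):
        # Each error report flips the host to the other mechanism,
        # which it then "saves off" and uses from then on.
        self.use_tunnel[dest] = not self.use_tunnel.get(dest, False)
```

Because the state is saved per destination, the error report is paid once, not per packet.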

> 
> >> 
> >> 4) It sure looks like "RH.Tunnel" is a second level of tunnelling, after
> >> 	you very carefully said there was only 1.
> >
> >RH.Tunnel is an ALTERNATE form of tunneling, not a second level (in the
> >hierarchical sense).  It is useful mainly for setting the
> >source address "source RH Number" on packets leaving the local domain.
> >
> You say that this is not a second level.  However, in section 5.2.1,
> example 2.1a, the source originates a packet with an RH-Tunnel field.
> In the text you say: "x will eventually discover a tunnel that will get
> the packet to router b."  As such, there are two tunnel values in the
> same packet.

This sentence should read "Through a method analogous to that used for
Tunnels, x will eventually discover an RH-Tunnel that will get.....".
Sorry.

> 
> Is the RH-Tunnel only for allowing defaulting and allowing the originator
> not to know the full "address" of this domain?

I don't know if it will always ONLY be for this (we will probably come up 
with other useful things to do with it), but so far it is only for this.

> 
> >> 
> >> 5) You refer to support for mobile hosts.  If a host is mobile, and I want
> >> 	to reach it, how do I find an RH path to get to it?  Directory
> >> 	Service????
> >
> >The support for mobile hosts I refer to is simply the fact that the ID
> >is separate from the routing directive, and therefore the routing directive
> >can change due to mobility without hindering the identification function.
> >The actual protocol support for this is another matter.  It could be
> >directory service, but if that is too slow or cumbersome, then perhaps
> >a "home gateway" in the sense of Zaw-Sing some years back is better.
> >
> My concern with the hand-waving at mobile hosts, is that for clean support
> of mobile hosts, the resolution procedures used to translate IDs to RHs
> need to be the same, whether a host has a location which is fixed over a
> large time scale, or is mobile over very short intervals.  I understand
> that the separation of the two notions helps this.  I suppose the rest
> can wait until this is further developed.

There are any number of ways that this can be worked out.  But, there is
a "ID does not match dest RH" message that can be used to flush out old
information in hosts, so directory service information won't have to be
timed out quickly, since it can get flushed "on demand".

> I understand that this is a general policy problem.  However, it needs a
> clearer solution when the entire packet delivery architecture rests on
> this framework.  This is particularly an issue since this splicing is
> likely to have to be done in every end system, and not all of them will
> have the amount of topology and routing information that a full idpr
> router would have.

Yep.

> 
> >> 
> >> 7) Size of RF?  I understand that you must use the largest RF size that
> >> 	will be needed.  However, it appears that if, when allocating
> >> 	numbers, the space at a given level is exhausted, then a larger
> >> 	size is needed.  To do that, everyone must simultaneously change
> >> 	over!  
> >
> >The size of the RHF is on a per packet basis, not global basis!  (I admit
> >that I didn't explicitly point this out in the paper, but it never occurred
> >to me that it would be interpreted as a global constant).  That is, when
> >a host creates a packet, it finds the biggest individual RHF, and sets
> >all the RHFs in the packet to that size.  Another packet may have bigger
> >or smaller RHFs.
> >
> But, a router can re-write the RH fields along the way.  I understand
> that there are error messages if the new value does not fit.  However,
> all of these error checks (enough levels, big enough fields, etc) result
> in a significant overhead in checking, and more error messages being
> generated than looks desirable.

For the case of writing over the source RH Number, the size of the field
won't change much over time, so once a host is alerted to the appropriate
size, it need not get another such error message for a long time.  For the
case of using RHFs as VCIs, the max VCI size will also be fairly constant,
so again not too many error messages are needed.  Also, the required RHF
size could be returned as part of the "path setup" procedure (if there
is one).  So, it should be possible to make these messages fairly rare.

> Connection setup here refers to getting the correct RH fields, so that one
> can communicate.  There appear to be several sets of cases where one
> tries something.  If that does not work, you probably get an error message.
> When you get the error message, you can correct what it reports.  Depending
> on the problems, more than one iteration of this may be necessary.  Thus,
> it can take a while.  (Thus also, the thought that caching will help some.)
> 

Oh.  I see what you mean.  I just call this "creating RDs" or something of
the sort.  As stated above, most of this information (like tunnels or RHF
size) gets learned once and then is valid for subsequent RDs, so it's not like
you have to go through a series of error messages every time.

PT

From owner-big-internet@munnari.oz.au Wed May 27 06:42:23 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA29389; Wed, 27 May 1992 06:42:26 +1000 (from owner-big-internet)
Return-Path: <owner-big-internet@munnari.oz.au>
Received: from nsco.network.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA28283; Wed, 27 May 1992 05:46:07 +1000 (from jmh@anubis.network.com)
Received: from anubis-e1.network.com by nsco.network.com (5.61/1.34)
	id AA07844; Tue, 26 May 92 14:49:57 -0500
Received: from petunia.network.com by anubis.network.com (4.0/SMI-4.0)
	id AA20932; Tue, 26 May 92 14:45:33 CDT
Date: Tue, 26 May 92 14:45:33 CDT
From: jmh@anubis.network.com (Joel Halpern)
Message-Id: <9205261945.AA20932@anubis.network.com>
To: P.Tsuchiya@cs.ucl.ac.uk
Subject: Re: PIP
Cc: big-internet@munnari.oz.au

>
>> 
>> First,  another question:
>> 
>> In the paper, you talk about multiple LR values, with local interpretation,
>> which affect the usage of the RH field.  I understand how this works
>> if the entire path is subject to that interpretation (possibly with
>> value remapping).  However, you indicate that you could splice together
>> a path where the local end uses one interpretation, and the backbones use
>> a different interpretation.  Again, I understand that on the outbound
>> direction the border router changes LR to the global meaning.
>> 
>> But suppose that the local region recognizes more than one LR type.  Now,
>> suppose the receiver of the message wants to use the RH that he gets  (as
>> that seems the best way to get a path to someone if one is a server).  Then,
>> he will return an RH where the last set of entries are actually relative to
>> a local interpretation.  How is the border router (at the entry to the
>> domain that started this mess) supposed to know which LR value to plug
>> in?
>
>I don't fully understand your scenario.  In general, the routing algorithms
>will set things up so that each pair of external border routers will know
>how one LR maps into another, and when LR semantics get lost.  So, if the
>meaning of the local LR is purely local, then that meaning will be lost
>when the packet exits the local domain.   If the RH Numbers associated with
>the now lost LR are simply not valid under a different LR, then the border
>router should throw away the packet and send back an error message.  I
>think that normally one would set up RH Numbers such that the same RH numbers
>are valid under a range of LRs (where the different LRs in this case
>represent different QOS types, say).  This way, even though you might lose
>the QOS semantics, the RH-Numbers are still valid under a different QOS (LR) 
>type.
>

In your paper you talk about having a local interpretation of RH fields,
and indicating this with the LR value.  That makes sense.  However, it
seems to me that the only reason one needs to have an indicator is that
one might support several LR values (ie several RH interpretations).
Now, in your example, you put together a path that uses a local LR/RH
value within a domain, and then has the interdomain part use the standard
interpretation.  The boundary router changes the LR value on a message
heading out, so that the LR shows a "standard" or "default" value.  This
part works.  However, when someone receives the initial information, they
are likely to record the LR/RH information, to use in replying.  (It would
be awkward if one needed to do another directory-type lookup before sending
a reply.)  However, the LR value the replying entity wants to use is the
default value, since that is the interpretation on the portion of the RH
that will be interpreted first.  When this message reaches the border of
the domain which first sent the message, how will the border router know
which, of potentially several interpretations (the default plus one or
more local ones), the LR value should be converted to, so that the RH
values will be processed properly to get the message to the correct
destination?

Joel



From owner-Big-Internet@munnari.oz.au Wed May 27 19:31:26 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA20812; Wed, 27 May 1992 19:31:36 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205270931.20812@munnari.oz.au>
Received: from haig.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA20805; Wed, 27 May 1992 19:31:26 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from epping.cs.ucl.ac.uk by haig.cs.ucl.ac.uk with local SMTP 
          id <g.03226-0@haig.cs.ucl.ac.uk>; Wed, 27 May 1992 10:30:45 +0100
To: jmh@anubis.network.com (Joel Halpern)
Cc: big-internet@munnari.oz.au
Subject: Re: PIP
In-Reply-To: Your message of "Tue, 26 May 92 14:45:33 CDT." <9205261945.AA20932@anubis.network.com>
Date: Wed, 27 May 92 10:29:22 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>

> 
> In your paper you talk about having a local interpretation of RH fields,
> and indicating this with the LR value.  That makes sense.  However, it
> seems to me that the only reason one needs to have an indicator is that
> one might support several LR values (ie several RH interpretations).
> Now, in your example, you put together a path that uses a local LR/RH
> value within a domain, and then has the interdomain part use the standard
> interpretation.  The boundary router changes the LR value on a message
> heading out, so that the LR shows a "standard" or "default" value.  This
> part works.  However, when someone receives the initial information, they
> are likely to record the LR/RH information, to use in replying.  (It would
> be awkward if one needed to do another directory-type lookup before sending
> a reply.)  However, the LR value the replying entity wants to use is the
> default value, since that is the interpretation on the portion of the RH
> that will be interpreted first.  When this message reaches the border of
> the domain which first sent the message, how will the border router know
> which, of potentially several interpretations (the default plus one or
> more local ones), the LR value should be converted to, so that the RH
> values will be processed properly to get the message to the correct
> destination?
> 
> Joel

Ok, now I see your point.  I don't think I mention it in the paper, but I have
had this vague notion that there would be bits in the LR and HD that could
be "passed through" untouched and unexamined.  I haven't thought it out
completely, but it seems that the following is possible:

When external border routers (say between a stub and a backbone)
exchange information, they exchange something like:

	bits 0-2   = RH.level
	bits 3-4   = global.QOS
	bits 5-6   = multicast
	bits 7-9   = source specific
	bits 10-12 = dest specific

where the descriptions on the right-hand side are in a globally understood
description language (like ASN.1 or something).  By the way, directory service
will have to hand out these same descriptors, so that hosts will be able to
set the LR (and HD) bits correctly from the start.  It also means that the
hosts will need to know their local mapping from descriptor to bits.

Bits 7-9 are understood in the source stub network (X), and are used for
local QOS (or whatever) routing.  Backbones map these bits into their
"source specific" bits, but do not route on them (either they are masked before
indexing into the LR Table, or multiple entries in the LR Table have the
same meaning).  When the packet reaches the destination stub (Y),
the (previously) source specific bits are put into the dest specific bits, 
and the dest specific into the source specific.  Then stub Y
can route on the bits meaningful to it.  When the destination host y receives
the packet, the "dest specific" bits are the same as those that were sent
by the source host x, and when the packet is returned, the dest specific
bits will eventually get mapped back into the source specific bits for
use by stub X.
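The bit-swapping described here can be sketched in a few lines.  This is
only an illustration: the bit positions follow the example exchange above
(bits 7-9 source specific, bits 10-12 dest specific), and the mask and
function names are invented, not from any Pip spec.

```python
# Hypothetical LR bit layout, taken from the example exchange above.
RH_LEVEL   = 0x0007   # bits 0-2
GLOBAL_QOS = 0x0018   # bits 3-4
MULTICAST  = 0x0060   # bits 5-6
SRC_SPEC   = 0x0380   # bits 7-9   (source specific)
DST_SPEC   = 0x1C00   # bits 10-12 (dest specific)

def swap_specific_bits(lr):
    """At the destination stub's border, exchange the source-specific
    and dest-specific fields so the receiving stub can route on the
    bits meaningful to it."""
    src = (lr & SRC_SPEC) >> 7
    dst = (lr & DST_SPEC) >> 10
    lr &= ~(SRC_SPEC | DST_SPEC)
    return lr | (dst << 7) | (src << 10)
```

Swapping twice is the identity, which is what lets the reply's dest
specific bits eventually get mapped back into source specific bits for
stub X.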

Of course, you can still run into the problem where the transits don't
pass-through as many of the bits as are required by the stubs, and therefore
will lose semantics.

I suppose that it is also possible for a transit domain to set certain
"private" bits when a packet enters the domain, and zero them out when the
packet leaves the domain, in order to influence routing within the domain.
So, this would require that there be, in addition to the globally understood
and acted upon bits, and in addition to the passed-through source and dest
specific bits, some "private" bits, that could be used locally.  But, the
Tunnel could also be used in this capacity.

PT


These bits can have values in them, but will be ignored (masked off) by
a domain that does not use them.

So, for your above example, the originating host would put the local  

From owner-big-internet@munnari.oz.au Thu May 28 02:49:55 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA02775; Thu, 28 May 1992 02:50:01 +1000 (from owner-big-internet)
Return-Path: <owner-big-internet@munnari.oz.au>
Received: from europa.clearpoint.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA29031; Thu, 28 May 1992 00:14:30 +1000 (from kasten@europa.clearpoint.com)
Received: by europa.clearpoint.com (4.1/1.34)
	id AA10564; Wed, 27 May 92 10:07:03 EDT
Date: Wed, 27 May 92 10:07:03 EDT
From: kasten@europa.clearpoint.com (Frank Kastenholz)
Message-Id: <9205271407.AA10564@europa.clearpoint.com>
To: big-internet@munnari.oz.au
Subject: pip comments


paul

i've read through the first 4 parts of PIP. seems
very intriguing. at the very least it may stimulate
lots of thought and discussion.

i have a bunch of "simple" comments on the first
4 sections. i started to read section 5 and
realized that that section needs a lot of careful
reading and thought before i can competently comment
on it.

most of these comments are based on general 
implementation experience. basically, i want
the routers to be able to parse the header as
efficiently as possible. my main assumptions
are:
a) network bandwidth is increasing,
b) packet/buffer memories are growing in
   size, and
c) risc processors will be used more and more
   in routers.

so trading off larger headers for reduced cpu
time in forwarding a packet is "a good thing"

sorry for the sort of rambling nature of the comments.
i typed them in as i was reading the paper.

see you in boston
frank kastenholz
clearpoint research corp.


1. rearrange some of the header fields to make them
   more easily parsed:

   +--------+--------+-------------------------+
   |Protocol|HopCount| Handling Directive      |
   +--------+--------+-------------------------+

   and make fixed part of the RD divide up as
   follows:
      logical router 16 or 32 bits
      RH Length+descriptor 16 or 32 bits
      (so sum of BOTH is even mult of 32).
      valid values of the fields need not be that 
      big, just the fields.

   this is important for high speed routers to parse
   the packet. do not want to do masks and shifts to 
   deal with the header.

2. header checksum is not really needed (i think) since
   most all media are covered by a much better crc.

3. frag/reasm are needed since the route could change,
   altering the max path pdu size, unless there is
   a "max pdu size change" message in pip's icmp.

4. version numbers are a) always good, and b) take little
   overhead. furthermore, it may be needed to differentiate
   from ip if pip needs to use the same protocol type as
   ip.

5. make each id in the ID field pad out to a multiple
   of 32 bits -- easier extraction and parsing by
   processors doing high speed routing (especially
   risc type processors which may not do non-32-bit
   operations very well.)

6. make the pip protocol field bigger -- i can just see it
   now "We are running out of protocols!" it can't hurt to
   make it bigger, and making it 16 (or even 32) bits would
   make for more efficient processing of the header.

7. ditto the hop count. network diameter is increasing. even
   though the field is on a 32 bit boundary, it will be more
   efficient if its length is 16 (or even 32) bits.

8. i do not like the variability between the tunnel and the RD
   goop. it leads to too many decisions to be made when 
   routing the packet.
   
9. when doing tunneling, why not just encapsulate the packet 
   in a "tunneled PIP" packet? this would allow nested 
   tunneling without the complexities you list on pp5.

   i would hate to limit pip to one level of tunneling and 
   then end up saying in three years "i wish we had multi-
   level tunneling"

10. LR bits probably need internet-wide common meaning set in 
   something like "assigned numbers". if my LR bit 7 means 
   something different than your LR bit 7 then bad things can 
   happen. i am skeptical of using a distributed control 
   algorithm -- i do not know how willing people will be to 
   let some global noc come in and set/change the bit 
   meanings, and there are machines that are not always running 
   some network software so they would not receive the "here 
   is the meaning of the LR bits" packet

   there would really need to be a query-response mechanism 
   like dns. but then, to ask a dns server, i need to send a 
   packet, which must have its LR bits set and what value do 
   i set the bits to? some well known defaults are needed.

11. is the RH field big enough? remember vint's call for 10**9 
   networks.  while 10**9 is a big number, my guess is that the 
   net will (try to) grow much bigger than that much faster than 
   we expect.

   if i read pp 6 and 7 right, the RH stuff is describing 
   hierarchical nets. RHF offset is 6 bits -- giving a maximum 
   of 64 levels (up+down+none) in the hierarchy.

12. i would make the lengths of the RHF+RHFR always be an even number
   of bytes.  much more efficient in implementation. might even 
   consider making the RHFR field 8 bits long -- we may find
   good things to do with the bits. i'd also make the RHFR
   fields 1, 2, 3...16 bytes long, i.e. RHF Length +1.

13. the handling directive ought to be globally significant, 
   at least defining global to mean any single internetwork. or
   maybe it ought to be split into two parts -- a local and a 
   global part. i am not too keen on the idea of having the 
   border routers "fix" the field as a packet crosses domains. 

14. the IDs are needed. they (well, actually, ip addresses today) 
   are part of the identification of tcp connections. IDs would 
   have the same use in a "pip" network. the IDs should be a 
   simple, globally unique number.  at least 32 bits long, maybe 
   64. do not make them variable length, keep them fixed size. 
   efficiency is important.

   IDs are also useful debugging information. as i monitor a 
   network for some reason, i would like to see where packets 
   came from, where they are going to and so on.

   use the ID type field just to indicate the form of the ID, 
   not the length.

15. you also want things like echo and destination unreachable 
   messages in PIP's ICMP. echo (ping) is the second most useful 
   tool in managing and troubleshooting network problems. 



From owner-Big-Internet@munnari.oz.au Thu May 28 17:07:29 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA25515; Thu, 28 May 1992 17:07:55 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205280707.25515@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA25493; Thu, 28 May 1992 17:07:29 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from wanstead.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.05256-0@bells.cs.ucl.ac.uk>; Thu, 28 May 1992 08:07:05 +0100
To: kasten@europa.clearpoint.com (Frank Kastenholz)
Cc: big-internet@munnari.oz.au
Subject: Re: pip comments
In-Reply-To: Your message of "Wed, 27 May 92 10:07:03 EDT." <9205271407.AA10564@europa.clearpoint.com>
Date: Thu, 28 May 92 08:06:58 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>

> 
> most of these comments are based on general 
> implementation experience. basically, i want
> the routers to be able to parse the header as
> efficiently as possible. my main assumptions
> are:
> a) network bandwidth is increasing,
> b) packet/buffer memories are growing in
>    size, and
> c) risc processors will be used more and more
>    in routers.
> 
> so trading off larger headers for reduced cpu
> time in forwarding a packet is "a good thing"

But processors are also getting faster, so I'm not sure the tradeoff
towards easier-to-parse packets is that important.  My considerations
with respect to this are the following:

1.  I don't know that the smashed header is that much slower to parse
than the spread-out header.  My guess is that the difference is
quantifiable by a small handful of extra shifts and masks.  And, isn't
it the case that due to pipelining (on risc cpus), sometimes these shifts and
masks (made on registers) come for free?  Anyway, it seems to me that even
in the case of the spread-out header, unless the field you want is on
a 32-bit (or in the near future, 64-bit) boundary, you are going to
have to usually do a mask, if not shift and mask, anyway, just to
isolate the field.  So I really question whether you gain much at all
with the different header format.

2.  Respectable people have shown in the not-too-distant past that
it is not header processing but getting in and out of memory that is
the real bottleneck (I'm thinking specifically of Dave Clark's sigcomm
paper a few years back).  Now, I know that technology changes fast,
but I'm still under the impression that this remains basically true.

3.  I'm assuming that folks will run multi-media directly over Pip, as
they are currently running with multi-media directly over IP.
Since you want to keep packet sizes down for these cases, I'm inclined
to make some effort to keep the Pip header small (again assuming that
it doesn't REALLY kill you in processing).

4.  I have a desire to almost always be able to keep all of the parts of the
Pip header that a router needs to look at (through the RD) in a single ATM
cell.  This will allow a router to switch a Pip header in one cell
time, and not have to buffer up the first cell and wait for the second
cell to do more header processing.  

(Note that 3 is the anti-ATM argument and 4 is the pro-ATM argument.
I don't want to be accused of partisan protocol design here. :-)

5.  If one is going to build special hardware or hardware assist,
then I think the details of the header format matter even less (up
to a point, because you will design your hardware to deal with whatever
you've got).  What is more important is that the mechanics of parsing
the header don't change every few years, because the address format
was changed, or folks decided to do QOS, or whatever.  This way, you
can design good hardware and stick with it.

6.  Taken to the extreme, your statement that 'trading off larger
headers for reduced cpu time in forwarding a packet is "a good thing"'
implies a header consisting of nothing but 64-bit fields, each field 
containing one protocol element.  But, since you didn't suggest this,
I'll assume that you only think that it is "a good thing" up to a
point.  Beyond this point, it becomes a bad thing.  So, the difference
between us here is not one of absolutes but one of degrees.

The header I give is a first cut, and definitely subject to change.
But I want there to be real experiments with real headers on real
machines before coming to conclusions.  The header format can very
easily be changed for some time to come now.  It is the functionality 
that is of concern, so I'd just as soon not worry too much
about header structure for now.

> 
> sorry for the sort of rambling nature of the comments.
> i typed them in as i was reading the paper.

No problem.  I'd rather have rambling comments than no comments.
 
> see you in boston
> frank kastenholz
> clearpoint research corp.
> 
> 
> 1. rearrange some of the header fields to make them
>    more easily parsed:
> 
>    +--------+--------+-------------------------+
>    |Protocol|HopCount| Handling Directive      |
>    +--------+--------+-------------------------+
> 
>    and make fixed part of the RD divide up as
>    follows:
>       logical router 16 or 32 bits
>       RH Length+descriptor 16 or 32 bits
>       (so sum of BOTH is even mult of 32).
>       valid values of the fields need not be that 
>       big, just the fields.
> 
>    this is important for high speed routers to parse
>    the packet. do not want to do masks and shifts to 
>    deal with the header.


Why is the above easier to parse than:

    +--------+---------------------+----------+
    |Protocol|  Handling Directive | HopCount |
    +--------+---------------------+----------+

It seems to me that in the first case, you have to (ignoring the
Protocol field):

read 32-bit word into register X
copy into another register Y
shift Y right 16
mask
decrement hop count
test for zero (branch into never-never land if zero)
shift left 16
OR with register X
copy X into register Y
mask
use handling directive to index into some table and
    do some things.
write new HD value into Y
OR with register X



In the latter case, you have to:

read 32-bit word into register X
copy into another register Y
mask
decrement hop count
test for zero (branch into never-never land if zero)
OR with register X
copy X into register Y
shift right 8
mask
use handling directive to index into some table and
    do some things.
write new HD value into Y
shift left 8
OR with register X


Both have exactly the same number of operations.  The difference
is that one has to do shifting to process the hop count, while the
other shifts to process the HD.  Is there some aspect of header
processing design that I'm missing the boat on?  (I've never actually
written a header parser, so maybe I'm out of touch here.)
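For what it's worth, the same comparison in rough Python form.  The
field widths (8-bit Protocol and HopCount, 16-bit HD packed into one
32-bit word) are assumptions for illustration only; in each layout
exactly one of the two fields needs a shift before the mask, which is
the point of the instruction counts above.

```python
def decrement_hops_layout1(word):
    """|Protocol|HopCount|HD|: hop count in bits 16-23."""
    hops = (word >> 16) & 0xFF          # shift + mask
    if hops == 0:
        raise ValueError("hop count exhausted")
    return (word & ~(0xFF << 16)) | ((hops - 1) << 16)

def decrement_hops_layout2(word):
    """|Protocol|HD|HopCount|: hop count in the low byte."""
    hops = word & 0xFF                  # mask only
    if hops == 0:
        raise ValueError("hop count exhausted")
    return (word & ~0xFF) | (hops - 1)
```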


> 
> 2. header checksum is not really needed (i think) since
>    most all media are covered by a much better crc.

I tend to agree that it is not needed, but not because of the
crc argument.  This is because of cut-through switching, which
happens before you ever look at the crc.  My feeling about this
is that a corrupted Pip header isn't such a bad thing.  The
packet will go careening off somewhere, and eventually die.
I suppose one could postulate a looping situation, where the
corruption causes the packet to loop back through the
corrupting router, which also manages to reset the hop count
to a high value, so the packet never dies.  But, one can
postulate the same sort of problem (perhaps with less probability)
even when there is a checksum.

The biggest concern is that the corruption turns the header from
a unicast into a multicast.  It has occurred to me that one way to
minimize this is to have two bits in the LR that indicate multicast
(one on each end of the field) to minimize the possibility that
both get corrupted.  But you still don't eliminate the possibility.
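The two-bit idea can be stated in one line: treat the packet as
multicast only if both bits agree, so that no single bit flip turns a
unicast header into a multicast one.  A 16-bit LR with the flags at
bit 0 and bit 15 is assumed purely for illustration.

```python
MCAST_LO = 1 << 0    # multicast flag at one end of the LR field
MCAST_HI = 1 << 15   # and its copy at the other end

def is_multicast(lr):
    # A single corrupted bit leaves the flags disagreeing, and the
    # packet is then handled as unicast rather than multicast.
    return bool(lr & MCAST_LO) and bool(lr & MCAST_HI)
```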

> 
> 3. frag/reasm are needed since the route could change,
>    altering the max path pdu size, unless there is
>    a "max pdu size change" message in pip's icmp.

I vote in favor of a max pdu size change message.

> 
> 4. version numbers are a) always good, and b) take little
>    overhead. furthermore, it may be needed to differentiate
>    from ip if pip needs to use the same protocol type as
>    ip.

You are probably right here.  I've just noticed over the years that 1)
there are a million ways to identify protocols, and 2) most protocols
don't seem to go through that many versions anyway.  So, adding a
version number as yet another means of protocol discrimination somehow
rubs me wrong.

> 
> 5. make each id in the ID field pad out to a multiple
>    of 32 bits -- easier extraction and parsing by
>    processors doing high speed routing (especially
>    risc type processors which may not do non-32-bit
>    operations very well.)

Probably your later comment to just make the ID fields 64 bits all the
time is a good idea.  On one hand, routers strictly speaking don't
need to look at the ID to route the packet, so variable size doesn't
theoretically hurt you there, but I guess that for some
future billing or security thing, routers will probably end up looking
at them anyway.

> 
> 6. make the pip protocol field bigger -- i can just see it
>    now "We are running out of protocols!" it can't hurt to
>    make it bigger, and making it 16 (or even 32) bits would
>    make for more efficient processing of the header.

You are probably right here also, but for the running out of protocols
reason, not the efficient processing reason.  I went to some effort to
keep the fixed part of my header to two 64-bit words.  Probably in the
end it will have to be three 64-bit words, and so everything will
expand a bit.

> 
> 7. ditto the hop count. network diameter is increasing. even
>    though the field is on a 32 bit boundary, it will be more
>    efficient if its length is 16 (or even 32) bits.

256 is a lot of hops.  I can see increasing it to 10 bits (1024 hops),
just to deal with really pathological cases, but beyond that it seems
to me that you have a really bad path, and something has been
engineered wrong.

> 
> 8. i do not like the variability between the tunnel and the RD
>    goop. it leads to too many decisions to be made when 
>    routing the packet.

Hmmmm.  I don't think it amounts to too much more decision.  Just look
at the tunnel; if it's not zero, use it for table index.  If it is
zero, use the RH.  Once you start indexing tables, the subsequent
machinery is pretty much the same.
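The dispatch really is just one test; a minimal sketch, with the packet
representation and lookup routines invented for illustration:

```python
def forward(packet, tunnel_table, rh_lookup):
    """Pick the forwarding entry: a nonzero Tunnel field indexes the
    table directly, otherwise fall back to routing on the RH."""
    if packet["tunnel"] != 0:
        return tunnel_table[packet["tunnel"]]
    return rh_lookup(packet["rh"])
```

Once either branch has produced a table entry, the subsequent machinery
is shared.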

>    
> 9. when doing tunneling, why not just encapsulate the packet 
>    in a "tunneled PIP" packet? this would allow nested 
>    tunneling without the complexities you list on pp5.
> 
>    i would hate to limit pip to one level of tunneling and 
>    then end up saying in three years "i wish we had multi-
>    level tunneling"

I thought about this quite a bit.  The thing I don't like about
encapsulating goes back to ATM.  If you encapsulate, then you have to
create a bunch of new cells on the fly and stick them in front of your
current stream.  In general (ATM or not), it is very quick and easy to 
set the Tunnel field vs. creating a new header.  Other opinions?

> 
> 10. LR bits probably need internet-wide common meaning set in 
>    something like "assigned numbers". if my LR bit 7 means 
>    something different than your LR bit 7 then bad things can 
>    happen. i am skeptical of using a distributed control 
>    algorithm -- i do not know how willing people will be to 
>    let some global noc come in and set/change the bit 
>    meanings, and there are machines that are not always running 
>    some network software so they would not receive the "here 
>    is the meaning of the LR bits" packet
> 
>    there would really need to be a query-response mechanism 
>    like dns. but then, to ask a dns server, i need to send a 
>    packet, which must have its LR bits set and what value do 
>    i set the bits to? some well known defaults are needed.

Consider the case with the IP TOS bits.  These were defined
internet-wide, we never figured out what to do with them, and finally
they were recently redefined by router requirements.  What a mess.  If
you get the protocols to exchange the info, you can much more easily
create and retire the values as necessary.  I agree that this makes
the system more complex and introduces increased scope for bugs, but
doing this is a lot less complicated than your average routing protocol.

The usual means for setting this stuff will be routing algorithms and
host configuration algorithms.

> 
> 11. is the RH field big enough? remember vint's call for 10**9 
>    networks.  while 10**9 is a big number, my guess is that the 
>    net will (try to) grow much bigger than that much faster than 
>    we expect.
> 
>    if i read pp 6 and 7 right, the RH stuff is describing 
>    hierarchical nets. RHF offset is 6 bits -- giving a maximum 
>    of 64 levels (up+down+none) in the hierarchy.

Yep.  64 RHFs, up to 16 32-bit words long.  So, if you have say 6
levels of hierarchy, each level with on the average 4000 sub-levels,
then you get 10**21 systems, and take up less than half of the RH.
So, no problem with growth.
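The arithmetic is easy to check (numbers as stated above, illustration
only):

```python
# 6 levels of hierarchy, each with about 4000 sub-levels on average.
levels = 6
fanout = 4000
systems = fanout ** levels   # about 4.1 * 10**21, well past 10**9 networks
```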
 
> 
> 12. i would make the lengths of the RHF+RHFR always be an even number
>    of bytes.  much more efficient in implementation. might even 
>    consider making the RHFR field 8 bits long -- we may find
>    good things to do with the bits. i'd also make the RHFR
>    fields 1, 2, 3...16 bytes long, i.e. RHF Length +1.

I don't think much more efficient.  As argued before, maybe a very
small number of shifts and masks.

> 
> 13. the handling directive ought to be globally significant, 
>    at least defining global to mean any single internetwork. or
>    maybe it ought to be split into two parts -- a local and a 
>    global part. i am not too keen on the idea of having the 
>    border routers "fix" the field as a packet crosses domains. 

Same argument as with LRs on comment 10.

> 
> 14. the IDs are needed. they (well, actually, ip addresses today) 
>    are part of the identification of tcp connections. IDs would 
>    have the same use in a "pip" network. the IDs should be a 
>    simple, globally unique number.  at least 32 bits long, maybe 
>    64. do not make them variable length, keep them fixed size. 
>    efficiency is important.
> 
>    IDs are also useful debugging information. as i monitor a 
>    network for some reason, i would like to see where packets 
>    came from, where they are going to and so on.
> 
>    use the ID type field just to indicate the form of the ID, 
>    not the length.

Good point.  I agree.

> 
> 15. you also want things like echo and destination unreachable 
>    messages in PIP's ICMP. echo (ping) is the second most useful 
>    tool in managing and troubleshooting network problems. 

Yep.



Thanks for the comments.

PT

From owner-Big-Internet@munnari.oz.au Fri May 29 15:08:37 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA00380; Fri, 29 May 1992 15:08:43 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from shark.mel.dit.CSIRO.AU by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA00368; Fri, 29 May 1992 15:08:37 +1000 (from smart@mel.dit.csiro.au)
Received: from wanda.mel.dit.CSIRO.AU by shark.mel.dit.csiro.au with SMTP id AA08343
  (5.65c/IDA-1.4.4/DIT-1.3 for <big-internet@munnari.oz.au>); Fri, 29 May 1992 15:08:33 +1000
Received: by wanda.mel.dit.CSIRO.AU (4.1/SMI-4.0)
	id AA28857; Fri, 29 May 92 15:08:32 EST
Message-Id: <9205290508.AA28857@wanda.mel.dit.CSIRO.AU>
To: big-internet@munnari.oz.au
Subject: Comparison of rfc1335 with supernetting proposal
Date: Fri, 29 May 92 15:08:32 +1000
From: Bob Smart <smart@mel.dit.csiro.au>

The fact that rfc1335 got published in a very rough state after no
significant discussion in the Big-Internet list shows how serious the
situation is. I think that proposal deserves more attention from this
list. Here are the pros and cons as I see them.

There are 2 proposals: 

A. Networks with 12 bits of host number in the top half of Class C
   that can be given to most people wanting a Class B. I'll call
   this CHIAPPA.

B. Give sites a small number (nearly always 1) of Class C networks
   to cover the number of external addresses they need at any one time.
   I'll call this WANG.

1. Effect on routers:

CHIAPPA will increase the number of networks as seen by routers, which
will treat each such allocation as 16 different Class Cs, but this can
be handled by supernetting.

WANG won't increase the number of networks other than the normal (hectic)
increase of the Internet. Even that increase can have a reduced effect
on the routers if the Class Cs allocated for external use are allocated
in ways that will work well with supernetting.

2. Effect on existing systems:

Both systems allow the existing world to keep going without software
changes.

3. Effect on the sites which get lumbered with the new stuff:

CHIAPPA will probably work fairly easily at sites which don't want any
individual subnets >256 hosts. Even then if they have an intelligent
router they can put 2 Class Cs on one network and it will at least
forward the packets between them. So even parts of the local network
that don't understand the new format explicitly can work.

WANG has a much more serious effect on the sites who have to use it.
The easy way to do WANG is to have only the routers know about it
and have them do address translation where appropriate. This may be
expensive, but it isn't a lot different to what ATM cells go through
all the time at mega-cells/sec. If you don't mind the inefficiency you
can make it work with only the routers connected to outside networks
knowing about the External Address [this could mean that packets go
through some internal routers, get converted, and come back through
the same routers to a nearby machine].

If you want to implement WANG by changing hosts as recommended in
RFC1335 you have a lot more work to do. However you can do a mixture
and use address translation in the routers to handle hosts which 
can't or won't have their software updated.

The real effect of WANG on the sites using it is that it makes the
hosts on the network which don't have permanent external addresses
into 2nd class citizens. They can call out, but can't conveniently
set up services for external hosts to call.

[We could actually get around this with some cleverness using source
routing or with a new ICMP message. For example we could have in the
DNS:

x.y.zz.		IN	AW	197.23.111.1:2.15.16.177

and if we wanted to call that machine we'd send a special icmp message
to the router 197.23.111.1 saying we want to talk to the host with
internal number 2.15.16.177. The router would then assign an external
address for that host and return that to the machine wanting to
connect. This exchange would take place in the gethostbyname routine
of the machine wishing to connect, which would then return the
allocated external address to the client.]
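The bracketed scheme could look something like this at the resolver
end.  Everything here is invented for illustration: the "AW" record
format, the function names, and the stand-in for the special icmp
exchange with the border router.

```python
def parse_aw_record(rdata):
    """Split '197.23.111.1:2.15.16.177' into (border router, internal number)."""
    router, internal = rdata.split(":")
    return router, internal

def gethostbyname_external(rdata, request_external_addr):
    """request_external_addr stands in for the special icmp exchange:
    the border router assigns an external address for the internal
    host and returns it to the caller."""
    router, internal = parse_aw_record(rdata)
    return request_external_addr(router, internal)
```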

A corollary is that the sites forced to use WANG would be second-class
sites. Not a situation you would want to have for long. In another
forum I half-seriously suggested making everybody second-class
simultaneously (with various beneficial side-effects).

One reason that the WANG proposal is actually quite acceptable is that
the sort of restrictions it places on external contact to internal
servers is just the sort of restriction which network managers are
implementing now (with cisco access lists) to improve security.

4. Effect on the Problem

Here is the crucial difference. WANG allows us to have millions of
networks. That will give us enough time to get PIP or something else
ready. I am concerned that CHIAPPA has only enough extra margin to
cover the next couple of years, and will seriously restrict the
expansion of Internet-connected TCP/IP beyond the Research and
Education community.

Bob Smart

From owner-Big-Internet@munnari.oz.au Fri May 29 17:00:53 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA04109; Fri, 29 May 1992 17:01:01 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from lager.cisco.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA04106; Fri, 29 May 1992 17:00:53 +1000 (from tli@cisco.com)
Received: by lager.cisco.com; Fri, 29 May 92 00:00:21 -0700
Date: Fri, 29 May 92 00:00:21 -0700
From: Tony Li <tli@cisco.com>
Message-Id: <9205290700.AA10955@lager.cisco.com>
To: smart@mel.dit.csiro.au
Cc: big-internet@munnari.oz.au
In-Reply-To: Bob Smart's message of Fri, 29 May 92 15:08:32 +1000 <9205290508.AA28857@wanda.mel.dit.CSIRO.AU>
Subject: Comparison of rfc1335 with supernetting proposal


   The fact that rfc1335 got published in a very rough state after no
   significant discussion in the Big-Internet list shows how serious the
   situation is. 

RFC1335 is an informational RFC.  Its speed in publishing is
irrelevant.

Tony

From owner-Big-Internet@munnari.oz.au Sat May 30 00:31:26 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA15591; Sat, 30 May 1992 00:31:47 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from europa.clearpoint.com by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA15588; Sat, 30 May 1992 00:31:26 +1000 (from kasten@europa.clearpoint.com)
Received: by europa.clearpoint.com (4.1/1.34)
	id AA14350; Fri, 29 May 92 10:23:19 EDT
Date: Fri, 29 May 92 10:23:19 EDT
From: kasten@europa.clearpoint.com (Frank Kastenholz)
Message-Id: <9205291423.AA14350@europa.clearpoint.com>
To: P.Tsuchiya@cs.ucl.ac.uk
Subject: Re: pip comments
In-Reply-To: Mail from 'Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>'
      dated: Thu, 28 May 92 08:06:58 +0100
Cc: big-internet@munnari.oz.au

 > From P.Tsuchiya@cs.ucl.ac.uk Thu May 28 02:59:57 1992
 > To: kasten@europa.clearpoint.com (Frank Kastenholz)
 > Cc: big-internet@munnari.oz.au
 > Subject: Re: pip comments
 > From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>
 > 
 > > 
 > > most of these comments are based on general 
 > > implementation experience. basically, i want
 > > the routers to be able to parse the header as
 > > efficiently as possible. my main assumptions
 > > are:
 > > a) network bandwidth is increasing,
 > > b) packet/buffer memories are growing in
 > >    size, and
 > > c) risc processors will be used more and more
 > >    in routers.
 > > 
 > > so trading off larger headers for reduced cpu
 > > time in forwarding a packet is "a good thing"
 > 
 > But processors are also getting faster, so I'm not sure the tradeoff
 > towards easier-to-parse packets is that important.  My considerations
 > with respect to this are the following:
 > 
 > 1.  I don't know that the smashed header is that much slower to parse
 > than the spread-out header.  My guess is that the difference is
 > quantifiable by a small handful of extra shifts and masks.  And, isn't
 > it the case that due to pipelining (on risc cpus), sometimes these shifts and
 > masks (made on registers) come for free?  Anyway, it seems to me that even

not with the cpus with which i am familiar. you still have to spend a
cycle someplace to execute the instruction. what gets pipelined is
fetching things from memory, decoding instructions, and so on. even
in more parallel types of processors, it still costs some amount of
cpu resources to execute the instruction and those resources are not
available for some other operation.

 > in the case of the spread-out header, unless the field you want is on
 > a 32-bit (or in the near future, 64-bit) boundary, you are going to
 > have to usually do a mask, if not shift and mask, anyway, just to
 > isolate the field.  So I really question whether you gain much at all
 > with the different header format.

i probably did not explicitly say it in the original note, but this
is what i have tried to do.

 > 2.  Respectable people have shown in the not-too-distant past that
 > it is not header processing but getting in and out of memory that is
 > the real bottleneck (I'm thinking specifically of Dave Clark's sigcomm
 > paper a few years back).  Now, I know that technology changes fast,
 > but I'm still under the impression that this remains basically true.

memory access time is important; there is no doubt about it.
however, cpu time and instruction counts are also important. 
every instruction that gets executed must also be fetched from
memory, or cache. it takes time to execute instructions and
so on.

even though processor speeds are increasing, the maximum packet
rate of the networks is also increasing. if one has as a target
the goal of "wire" speed packet switching -- i.e. to route/bridge/
whatever packets faster than the attached network(s) can deliver
them -- then the amount of time available to do this gets amazingly small.
for instance, on an ethernet, the smallest packets require 67.2
microseconds. at clearpoint i am working on an 8 port full speed
bridge/router. this means that i must be able to filter packets
(receive, process and discard) in 8.4 microseconds or be able
to forward a packet (receive, process and output on another port)
in 16.8 microseconds. we use a 20 MHz risc processor, which gives
us at most 336 instructions to do everything, which includes
care and feeding of the interface chips, buffer management, 
etc, etc, etc, etc. there are actually even fewer instructions
available since there are some interrupts constantly going on
and some memory is not 0 wait state, etc, etc.

cpu speeds are going up. so are network speeds. people are going
to multi-processor designs, but, as you go to more processors you
add overhead for inter-processor cooperation, locking, etc, etc.
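
to make the arithmetic explicit, here is the budget calculation
sketched in python (the port count and cpu clock are from my
clearpoint example above; one instruction per cycle is assumed):

```python
# budget arithmetic for wire-speed switching on 10 Mb/s ethernet.
# minimum frame (64 bytes) plus preamble/SFD (8 bytes) plus the 9.6 us
# inter-frame gap gives the 67.2 us per smallest packet quoted above.
min_frame_us = (64 + 8) * 8 / 10 + 9.6      # bits / 10 Mb/s -> microseconds

ports = 8
filter_us = min_frame_us / ports            # receive, process, discard
forward_us = 2 * filter_us                  # receive, process, output

cpu_mhz = 20                                # one instruction/cycle assumed
instruction_budget = forward_us * cpu_mhz   # instructions for *everything*

print(min_frame_us, filter_us, forward_us, round(instruction_budget))
```

and that budget still has to cover interface chips, buffer management,
interrupts, and non-zero-wait-state memory.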

 > 3.  I'm assuming that folks will run multi-media directly over Pip, as
 > they are currently running with multi-media directly over IP.
 > Since you want to keep packet sizes down for these cases, I'm inclined
 > to make some effort to keep the Pip header small (again assuming that
 > it doesn't REALLY kill you in processing).

i do not understand why multi-media must have small packet sizes just
because it is multi-media. what it needs is certain latency,
throughput and bandwidth characteristics. as long as these
characteristics are provided, packet size per se should not matter.
granted, keeping packets small is a real good way of achieving these
goals.

 > 4.  I have a desire to almost always be able to keep all of the parts of the
 > Pip header that a router needs to look at (through the RD) in a single ATM
 > cell.  This will allow a router to switch a Pip header in one cell
 > time, and not have to buffer up the first cell and wait for the second
 > cell to do more header processing.  

if that's a goal, then that's a goal. i do not know enough about atm to
be able to say if this is a reasonable goal or not.

 > 5.  If one is going to build special hardware or hardware assist,
 > then I think the details of the header format matters even less (up
 > to a point, because you will design your hardware to deal with whatever
 > you've got.  What is more important is that the mechanics of parsing
 > the header don't change every few years, because the address format
 > was changed, or folks decided to do QOS, or whatever.  This way, you
 > can design good hardware and stick with it.

first, i am not so sure how real these schemes are. i've been hearing
about them for many years. i believe that some products are finally
starting to show up but from all i have been able to see, they
are bridging related. we all sort of hold "hardware-based-routing"
as some kind of holy grail and it always seems to be a couple
of years away.

we do not have widely fielded IP routing in hardware (i recall hearing
of some people trying to do it). the PIP header is more complex and
larger than IP's header.  

also, a lot of people will still do this in software. your argument is that
the header format will not matter for hardware-based routing systems. if
that is the case, we may as well make sure that the header format does
not hurt software-based routing.

 > > 1. rearrange some of the header fields to make them
 > >    more easily parsed:
 > > 
 > >    +--------+--------+-------------------------+
 > >    |Protocol|HopCount| Handling Directive      |
 > >    +--------+--------+-------------------------+
 > 
 > Why is the above easier to parse than:
 > 
 >     +--------+---------------------+----------+
 >     |Protocol|  Handling Directive | HopCount |
 >     +--------+---------------------+----------+

my only answer is gut feeling. i do not like splitting the
HD across 16-bit boundaries. taking what could be treated
as a 16 bit number (the HD) and splitting it across a 16
bit boundary just feels wrong. i do not know why. i can't
prove it.
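
to make the shift-and-mask argument concrete, here is a sketch (field
widths are assumed for illustration -- 8-bit protocol, 8-bit hop count,
16-bit HD packed into one 32-bit word -- not the actual pip layout):

```python
# extracting fields from the two layouts drawn above.
# aligned layout:  Protocol | HopCount | HD   (HD on a 16-bit boundary)
# split layout:    Protocol | HD | HopCount   (HD straddles the boundary)

def parse_aligned(word):
    proto = word >> 24            # one shift
    hops = (word >> 16) & 0xFF    # shift + mask
    hd = word & 0xFFFF            # one mask -- HD comes out directly
    return proto, hops, hd

def parse_split(word):
    proto = word >> 24            # one shift
    hd = (word >> 8) & 0xFFFF     # shift + mask -- the extra work
    hops = word & 0xFF            # one mask
    return proto, hops, hd
```

either way the difference is the small handful of shifts and masks
mentioned earlier in the thread; the question is whether those cycles
matter at wire speed.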

 > > 
 > > 3. frag/reasm are needed since the route could change,
 > >    altering the max path pdu size, unless there is
 > >    a "max pdu size change" message in pip's icmp.
 > 
 > I vote in favor of max pdu size change message.

unless it gets lost in the network. even so, how do i know
where to send the message to? it could operate the same as, say,
the current icmp "fragmentation needed and DF set" message,
but this would lead to the loss of all packets that are in
the network that are "before" the point at which the MTU
changes. such a large packet loss seems to be rather offensive.
this is even more offensive for non-tcp packets. they may
not be "segmentable" as a tcp byte stream is. these protocols
may also have trivial error detection/recovery/retransmission
schemes (e.g. SNMP).

i would combine both -- do an MTU change message, informing the
source that it better send out different size packets, and 
allow fragmentation so that packet loss is reduced.


 > > 
 > > 4. version numbers are a) always good, and b) take little
 > >    overhead. furthermore, it may be needed to differentiate
 > >    from ip if pip needs to use the same protocol type as
 > >    ip.
 > 
 > You are probably right here.  I've just noticed over the years that 1)
 > there are a million ways to identify protocols, and 2) most protocols
 > don't seem to go through that many versions anyway.  So, adding a
 > version number as yet another means of protocol discrimination somehow
 > rubs me wrong.

i keep making the arguments you make and keep ending up wishing that
there were (or glad that there are) version numbers in things. this 
is simply a result of experience. 

there are two ways to identify a version -- either by the lower
layer's type field (of which there are only 65535 distinct values for
ethernet)  or as being self-encoded in the protocol -- basically
a version number. we can argue as to whether the self-encoding
form is really a version number or something else and whether it
should be 4, 8, 16, 32, ... bits long, but the important element
is to have a way of changing things in "our" protocol without
having to go to someone else (e.g. IEEE for enet) and get
another type number for the "new" version.... 

granted that versions do not change all that often. but
they do change, and having a version number is a low-overhead
thing. 
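
the self-encoded form is just the IP-style first-nibble check (the
0x45 example below is real ipv4; pip's field width would be its own
choice):

```python
# a version number self-encoded in the header: the high nibble of the
# first byte, as in IP. changing "our" protocol then needs no new
# type number from anyone else (e.g. no trip to IEEE for an ethertype).
def header_version(first_byte):
    return first_byte >> 4

# ipv4 headers start with 0x45: version 4, header length 5 words
```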

 > > 
 > > 5. make each id in the ID field pad out to a multiple
 > >    of 32 bits -- easier extraction and parsing by
 > >    processors doing high speed routing (especially
 > >    risc type processors which may not do non-32-bit
 > >    operations very well.)
 > 
 > Probably your later comment to just make the ID fields 64 bits all the
 > time is a good idea.  On one hand, routers strictly speaking don't
 > need to look at the ID to route the packet, so variable size doesn't
 > theoretically hurt you there, but I guess that for some
 > future billing or security thing, routers will probably end up looking
 > at them anyway.

also, why complicate something that need not be complicated?

 > > 7. ditto the hop count. network diameter is increasing. even
 > >    though the field is on a 32 bit boundary, it will be more
 > >    efficient if its length is 16 (or even 32) bits.
 > 
 > 256 is a lot of hops.  I can see increasing it to 10 bits (1024 hops),
 > just to deal with really pathological cases, but beyond that it seems
 > to me that you have a really bad path, and something has been
 > engineered wrong.

while i agree that a network diameter of greater than 256 or 1024 is
probably silly, i am also concerned with forwarding efficiency. most
processors that i am aware of support 16 bit math natively; i do
not recall any that support 8 bit math (Z80 and 6502... excepted :-).
thus, to do the math on an 8 bit field i would need to do some 
masking/math/merging operations. if the field is 16 bits, i could 
operate on the field directly.

if the field is "large", we have the option of administratively limiting
the significant bits to be "small".
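
the mask/math/merge point, sketched (the field positions here are
invented for illustration, not the pip layout):

```python
# decrementing an 8-bit hop count buried in a 32-bit header word takes
# extract / decrement / merge; a 16-bit field on a 16-bit boundary can,
# on most cpus, be operated on directly at native width.

def dec_hops_8bit(word):
    hops = (word >> 16) & 0xFF                        # extract
    hops = (hops - 1) & 0xFF                          # math
    return (word & ~(0xFF << 16)) | (hops << 16)      # merge back

def dec_hops_16bit(hop_halfword):
    return (hop_halfword - 1) & 0xFFFF                # direct operation
```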

 > > 8. i do not like the variability between the tunnel and the RD
 > >    goop. it leads to too many decisions to be made when 
 > >    routing the packet.
 > 
 > Hmmmm.  I don't think it amounts to too much more decision.  Just look
 > at the tunnel, if it's not zero, use it for table index.  If it is
 > zero, use the RH.  Once you start indexing tables, the subsequent
 > machinery is pretty much the same.

except that decisions (i.e. branches) in risc processors can be
very expensive. if you have to take a branch then the pipeline
has to be flushed and you have to restart it at the new location,
maybe the cache has to get reloaded, etc, etc. it can take several
instruction cycles to restart things. very expensive when you only
have a couple of hundred instructions.

 > > 9. when doing tunneling, why not just encapsulate the packet 
 > >    in a "tunneled PIP" packet? this would allow nested 
 > >    tunneling without the complexities you list on pp5.
 > > 
 > >    i would hate to limit pip to one level of tunneling and 
 > >    then end up saying in three years "i wish we had multi-
 > >    level tunneling"
 > 
 > I thought about this quite a bit.  The thing I don't like about
 > encapsulating goes back to ATM.  If you encapsulate, then you have to
 > create a bunch of new cells on the fly and stick them in front of your
 > current stream.  In general (ATM or not), it is very quick and easy to 
 > set the Tunnel field vs. creating a new header.  Other opinions?

this is true. but on the other hand, what is the 90% case? i imagine
that more packets will not be tunneled than will be tunneled. if i
correctly understand the parts of the spec that i have read, the
tunneling will basically be done only for packets that are
crossing domains, etc, etc. i imagine that more packets stay
within a domain than leave it. thus, relative to the number of
pip packets generated in the world, tunneled ones will be rare
(though quite common in things like backbones).

also, i have this feeling that any prediction (e.g. only one level
of tunneling is needed) that we make today will, in three to five
years, turn out to be bad. i would hate to limit the network's
flexibility because of a decision made today. granted, we can not
make the protocol so general that it solves all problems for all
times, but, everything else being equal, i think we should opt
for generality. my own belief is that separating the tunnel-
encapsulation is a performance win much more often than not, and
it would provide flexibility that we will probably need in 
several years.

 > > 10. LR bits probably need internet-wide common meaning set in 
 > >    something like "assigned numbers". if my LR bit 7 means 
 > >    something different than your LR bit 7 then bad things can 
 > >    happen. i am skeptical of using a distributed control 
 > >    algorithm -- i do not know how willing people will be of 
 > >    letting some global noc come in and set/change the bit 
 > >    meaning and there are machines that are not always running 
 > >    some network software so they would not receive the "here 
 > >    is the meaning of the LR bits packet"
 > > 
 > >    there would really need to be a query-response mechanism 
 > >    like dns. but then, to ask a dns server, i need to send a 
 > >    packet, which must have its LR bits set and what value do 
 > >    i set the bits to? some well known defaults are needed.
 > 
 > Consider the case with the IP TOS bits.  These were defined
 > internet-wide, we never figured out what to do with them, and finally
 > they were recently redefined by router requirements.  What a mess.  If
 > you get the protocols to exchange the info, you can much more easily
 > create and retire the values as necessary.  I agree that this makes
 > the system more complex and introduces increased scope for bugs, but
 > doing this is a lot less complicated than your average routing protocol.
 > 
 > The usual means for setting this stuff will be routing algorithms and
 > host configuration algorithms.

yup. this problem does exist. we all should realize though that
all this does is move the problem someplace else. we still need
a common way of associating LR bits with their QOS/whatever. 
we would still have to, at some point, come up with all the
policies that could be associated with LR bits, how to specify them,
what they mean etc, etc. deferring this work until later (i.e. in
another protocol) is a good thing since it means that the pip
development and the lr-policy development are divorced from
each other and each can proceed at its own best pace.



frank kastenholz


From owner-Big-Internet@munnari.oz.au Sat May 30 00:56:37 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA16175; Sat, 30 May 1992 00:56:46 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205291456.16175@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA16170; Sat, 30 May 1992 00:56:37 +1000 (from J.Crowcroft@cs.ucl.ac.uk)
Received: from kant.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.24812-0@bells.cs.ucl.ac.uk>; Fri, 29 May 1992 15:54:53 +0100
To: kasten@europa.clearpoint.com (Frank Kastenholz)
Cc: P.Tsuchiya@cs.ucl.ac.uk, big-internet@munnari.oz.au
Subject: Re: T
In-Reply-To: Your message of "Fri, 29 May 92 10:23:19 EDT." <9205291423.AA14350@europa.clearpoint.com>
Date: Fri, 29 May 92 15:54:24 +0100
From: Jon Crowcroft <J.Crowcroft@cs.ucl.ac.uk>


 >i do not understand why multi-media must have small packet sizes just
 >because it is multi-media. what it needs is certain latency,
 >throughput and bandwidth characteristics. as long as these
 >characteristics are provided, packet size per se should not matter.
 >granted, keeping packets small is a real good way of achieving these
 >goals.

the 2 reasons for baby packets (e.g. in ATM) are connected:
1. cut through allows low delay variance (jitter/wander etc)
2. cut through means low delay end to end due to small buffering times
so you stay within the delay budget of the 600-million-handset phone
system (that's why the french wanted 32 byte atm cells, sans echo
canceller, and the US wanted 64 avec...)

plus for human interaction you need to keep e2e average
delay minimised, not just variance...
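
the packetisation arithmetic behind reason 2, sketched in python (a 64
kb/s voice source is assumed; atm's eventual 48 byte payload was of
course the compromise between the two camps):

```python
# time to fill one cell's payload from a 64 kb/s voice source --
# this delay is paid before the cell can even be sent, so it goes
# straight into the end-to-end budget
VOICE_BPS = 64_000

def packetisation_ms(payload_bytes):
    return payload_bytes * 8 * 1000 / VOICE_BPS

# 32 bytes -> 4 ms, 64 bytes -> 8 ms: the difference decides whether
# echo cancellers are needed within the phone system's budget
print(packetisation_ms(32), packetisation_ms(64))
```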

 jon


From owner-Big-Internet@munnari.oz.au Sat May 30 03:04:37 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA19034; Sat, 30 May 1992 03:04:47 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Received: from jarvis.csri.toronto.edu by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA19030; Sat, 30 May 1992 03:04:37 +1000 (from @jarvis.csri.toronto.edu,@relay.cs.toronto.edu:@smoke.cs.toronto.edu:rayan@cs.toronto.edu)
Received: by jarvis.csri.toronto.edu id <6844>; Fri, 29 May 1992 13:03:26 -0400
To: Big-Internet@munnari.OZ.AU
From: rayan@cs.toronto.edu (Rayan Zachariassen)
Subject: Re: Comparison of rfc1335 with supernetting proposal
Organization: Department of Computer Science, University of Toronto
References: <9205290508.AA28857@wanda.mel.dit.CSIRO.AU>
Distribution: list
Date: 	Fri, 29 May 1992 13:02:47 -0400
Message-Id: <92May29.130326edt.6844@jarvis.csri.toronto.edu>

Dynamic address assignment is vile!

   In other words, if I'm snooping on the wire, I need to know exactly
   which addresses correspond to which hosts at all times, at both ends
   of a session.  If the addresses are only resolved near the end-system,
   and even worse if they change over time, then audit trails, security
   investigations, and detailed traffic analysis all go out the window.

Forced access control is what keeps big bureaucracies going.

   It has no place in the Internet society [sic].

rayan

From owner-Big-Internet@munnari.oz.au Sat May 30 19:50:48 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA10841; Sat, 30 May 1992 19:50:59 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205300950.10841@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA10837; Sat, 30 May 1992 19:50:48 +1000 (from P.Tsuchiya@cs.ucl.ac.uk)
Received: from waffle.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.13992-0@bells.cs.ucl.ac.uk>; Sat, 30 May 1992 10:50:28 +0100
To: kasten@europa.clearpoint.com (Frank Kastenholz)
Cc: big-internet@munnari.oz.au
Subject: Re: T
In-Reply-To: Your message of "Fri, 29 May 92 10:23:19 EDT." <9205291423.AA14350@europa.clearpoint.com>
Date: Sat, 30 May 92 10:50:16 +0100
From: Paul Tsuchiya <P.Tsuchiya@cs.ucl.ac.uk>


Frank,

Many good comments on header design/processing.  It will be interesting
to design two headers, one meant to be small, and another meant to be easy to
parse, and see 1) how much slower the small one is, and 2) how much bigger the fast
one is.  Hopefully, this is a decision we can put off for a little bit.  I'll
also be interested in hearing what other fast bridge/router implementors have
to say about it.  Also, I wonder how much of all of this is a function of
current technology, and how much is pretty fundamental.

>  >
>  > I vote in favor of max pdu size change message.
> 
> unless it gets lost in the network. even still, how do i know
> where to send the message to? it could operate the same as, say,
> the current icmp "fragmentation needed and DF set" message,
> but this would lead to the loss of all packets that are in
> the network that are "before" the point at which the MTU
> changes. such a large packet loss seems to be rather offensive.
> this is even more offensive for non-tcp packets. they may
> not be "segmentable" as a tcp byte stream is. these protocols
> may also have trivial error detection/recovery/retransmission
> schemes (e.g. SNMP).
> 
> i would combine both -- do an MTU change message, informing the
> source that it better send out different size packets, and
> allow fragmentation so that packet loss is reduced.

But, from a simplicity-in-header-processing perspective, don't you
find dealing with fragmentation also offensive?  I think a good style
would be to have a notion of an MTU size that will succeed with
very high probability (perhaps even by advertising MTU size along
with routing information), and then, once you start sending packets,
also send a "path info packet" of some sort, that will return such
information as max PDU, line rates (for setting rate control?), or
whatever.  At that point, the true MTU size will be figured out, and
you can increase your packet sizes accordingly.  Of course, you might
get a path change somewhere that results in a smaller MTU, but usually
some loss of packets is associated with these kinds of changes anyway,
so perhaps it doesn't matter that much.

> 
>  > 4.  I have a desire to almost always be able to keep all of the parts of the
>  > Pip header that a router needs to look at (through the RD) in a single ATM
>  > cell.  This will allow a router to switch a Pip header in one cell
>  > time, and not have to buffer up the first cell and wait for the second
>  > cell to do more header processing.  
> 
> if that's a goal, then that's a goal. i do not know enough about atm to
> be able to say if this is a reasonbable goal or not.

Well, it's a soft goal, and one of mine, not other people's really.  It's the
sort of thing that might turn out to be important and might not.

> 
> first, i am not so sure how real these schemes are. i've been hearing
> about it for many years. i believe that some products are finally
> starting to show up but from all i have been able to see, they
> are bridging related. we all sort of hold "hardware-based-routing"
> as some kind of holy grail and it always seems to be a couple
> of years away.
> 
> we do not have widely fielded IP routing in hardware (i recall hearing
> of some people trying to do it). the PIP header is more complex and
> larger than IP's header.  

Perhaps one reason not many folks commit to hardware is that the mechanics
for interpreting the headers change often enough that it is risky.  Also, if
you are planning on adding new protocols to multi-protocol switches, you
want to limit your special hardware.

I guess it would be dreaming to think that Pip will become so popular, and be
so general, that multi-protocol will become a thing of the past.  But, why
not try for it, eh?

> 
> we do not have widely fielded IP routing in hardware (i recall hearing
> of some people trying to do it). the PIP header is more complex and
> larger than IP's header.

It's true that the Pip header is more complex and larger, but I think
that on the whole Pip forwarding is faster (with a software forwarder,
and in the most general case) because the routing table lookup can be
quite expensive with IP, when you consider general subnet masks, and
more so in the not-too-distant future if CIDR comes along.  Or perhaps
the correct characterization is that the speed of IP forwarding can vary a
lot (depending on whether you get a hash hit or not, and depending on
how big your routing table is and how varied your masks are), whereas
the speed of Pip has a much smaller variance, although the lower limit
is higher than the lower limit with IP.
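
The contrast can be sketched as follows (the tables, field values, and
names are invented for illustration, not the actual Pip or IP
machinery):

```python
# IP-style lookup: scan masked prefixes, longest mask wins -- cost
# varies with table size and mask mix
ip_table = [
    (0x0A000000, 0xFF000000, "hop-A"),   # 10.0.0.0/8
    (0x0A0A0000, 0xFFFF0000, "hop-B"),   # 10.10.0.0/16
]

def ip_lookup(dst):
    best_hop, best_mask = None, -1
    for prefix, mask, hop in ip_table:   # cost grows with the table
        if dst & mask == prefix and mask > best_mask:
            best_hop, best_mask = hop, mask
    return best_hop

# Pip-style lookup: the header hands you an index; one table reference,
# constant cost regardless of how the table grows
pip_table = {7: "hop-A", 9: "hop-B"}

def pip_lookup(index):
    return pip_table[index]
```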

>  > 
>  > Hmmmm.  I don't think it amounts to too much more decision.  Just look
>  > at the tunnel, if it's not zero, use it for table index.  If it is
>  > zero, use the RH.  Once you start indexing tables, the subsequent
>  > machinery is pretty much the same.
> 
> except that decisions (i.e. branches) in risc processors can be
> very expensive. if you have to take a branch then the pipeline
> has to be flushed and you have to restart it at the new location,
> maybe the cache has to get reloaded, etc, etc. it can take several
> instruction cycles to restart things. very expensive when you only
> have a couple of hundred instructions.

But, how expensive is this compared to encapsulating/decapsulating?
I think that tunneling is an important mechanism, as it allows for
a strong decoupling between internal and external routing, so I don't
think it should be delegated to a "once-in-a-rare-while" technique.

> 
> also, i have this feeling that any prediction (e.g. only one level
> of tunneling is needed) that we make today will, in three to five
> years, turn out to be bad. i would hate to limit the network's
> flexibility because of a decision made today. granted, we can not
> make the protocol so general that it solves all problems for all
> times, but, everything else being equal, i think we should opt
> for generality. my own belief is that separating the tunnel-
> encapsulation is a performance win much more often than not, and
> it would provide flexibility that we will probably need in 
> several years.

I'll think about this.  It may be possible to treat the Tunnel field
as hierarchical while still preserving the "single index" style of
forwarding that is prevalent with Pip.

> 
> yup. this problem does exist. we all should realize though that
> all this does is move the problem someplace else. we still need
> a common way of associating LR bits with their QOS/whatever. 
> we would still have to, at some point, come up with all the
> policies that could be associated with LR bits, how to specify them,
> what they mean etc, etc. defering this work til later (i.e. in
> another protocol) is a good thing since it means that the pip
> development and the lr-policy development are divorced from
> each other and each can proceed at its own best pace.
> 

I don't think it will be that hard to do dynamic definition of LR
bits.  I think this information will ride quite easily on existing
routing and host configuration protocols.  But, not having shown
how to do it, I can't verify this statement yet.

Also, if we lock ourselves into "hard-coded" definitions now, it may
be very hard to extract ourselves from that style later on.  So, I'd
rather make it dynamic from the start.

PT

From owner-Big-Internet@munnari.oz.au Sun May 31 21:06:13 1992
Received: by munnari.oz.au (5.64+1.3.1+0.50)
	id AA05445; Sun, 31 May 1992 21:06:18 +1000 (from owner-Big-Internet)
Return-Path: <owner-Big-Internet@munnari.oz.au>
Message-Id: <9205311106.5445@munnari.oz.au>
Received: from bells.cs.ucl.ac.uk by munnari.oz.au with SMTP (5.64+1.3.1+0.50)
	id AA05442; Sun, 31 May 1992 21:06:13 +1000 (from J.Crowcroft@cs.ucl.ac.uk)
Received: from waffle.cs.ucl.ac.uk by bells.cs.ucl.ac.uk with local SMTP 
          id <g.06317-0@bells.cs.ucl.ac.uk>; Sun, 31 May 1992 12:05:37 +0100
To: rayan@cs.toronto.edu (Rayan Zachariassen)
Cc: Big-Internet@munnari.OZ.AU
Subject: Re: Comparison of rfc1335 with supernetting proposal
In-Reply-To: Your message of "Fri, 29 May 92 13:02:47 EDT." <92May29.130326edt.6844@jarvis.csri.toronto.edu>
Date: Sun, 31 May 92 12:05:25 +0100
From: Jon Crowcroft <J.Crowcroft@cs.ucl.ac.uk>



 >Dynamic address assignment is vile!

wow - strong opinion...
 
 >   In other words, if I'm snooping on the wire, I need to know exactly

some people think snooping on the wire is vile:-)
(try the milnet...)
 >   which addresses correspond to which hosts at all times, at both ends
 
 >Forced access control is what keeps big bureaucracies going.
 >   It has no place in the Internet society [sic].
sure - who said anything about access control?
 
 jon


