NAME

tags - Vi tags file format extended in ctags projects

DESCRIPTION

The next section is a copy of the FORMAT file from the Exuberant Ctags source code, kept in its Subversion repository at sourceforge.net. Exceptions introduced in Universal Ctags are explained inline with an "EXCEPTION" marker. Statements that Universal Ctags makes clearer are explained inline with a "COMMENT" marker.
PROPOSAL FOR EXTENDED VI TAGS FILE FORMAT

Version: 0.06 DRAFT
Date: 1998 Feb 8
Author: Bram Moolenaar <Bram at vim.org> and Darren Hiebert <dhiebert at users.sourceforge.net>

Introduction

The file format for the "tags" file, as used by Vi and many of its descendants, has limited capabilities. This additional functionality is desired:
From proposal to standard

To make this proposal into a standard for tags files, it needs to be supported by most people working on versions of Vi, ctags, etc. Currently this standard is supported by:
These have been or will be asked to support this standard:
Backwards compatibility

A tags file that is generated in the new format should still be usable by Vi. This makes it possible to distribute tags files that are usable by all versions and descendants of Vi. This restricts the format to what Vi can handle. The format is:
{tagname}<Tab>{tagfile}<Tab>{tagaddress}
The best way to add extra text to the line for the new functionality, without breaking it for Vi, is to put a comment in the {tagaddress}. This gives the freedom to use any text, and should work in any traditional Vi implementation.

For example, when the old tags file contains:

    main    main.c      /^main(argc, argv)$/
    DEBUG   defines.c   89

The new lines can be:

    main    main.c      /^main(argc, argv)$/;"any additional text
    DEBUG   defines.c   89;"any additional text

Note that the ';' is required to put the cursor in the right line, and then the '"' is recognized as the start of a comment.

For Posix compliant Vi versions this will NOT work, since only a line number or a search command is recognized. I hope Posix can be adjusted. Nvi suffers from this.

Security

Vi allows the use of any Ex command in a tags file. This has the potential of a trojan horse security leak.

The proposal is to allow only Ex commands that position the cursor in a single file. Other commands, like editing another file, quitting the editor, changing a file or writing a file, are not allowed. It is therefore logical to call the command a tagaddress.

Specifically, these two Ex commands are allowed:
    89

    /^int c;$/
    ?main()?

There are two combinations possible:

    /struct xyz {/;/int count;/
    389;/struct foo/;/char *s;/

    89;" foo bar

This might be extended in the future. What is currently missing is a way to position the cursor in a certain column.

Goals

Now the usage of the comment text has to be defined. The following is aimed at:
Proposal

Use a comment after the {tagaddress} field. The format would be:

    {tagname}<Tab>{tagfile}<Tab>{tagaddress}[;"<Tab>{tagfield}..]
Optionally:
A tagfield has a name, a colon, and a value: "name:value".
Other use of the backslash character is reserved for future expansion. Warning: When a tagfield value holds an MS-DOS file name, the backslashes must be doubled! EXCEPTION: Universal Ctags introduces more conversion rules.
EXCEPTION: Universal Ctags also allows all these escape sequences in {tagname} and {tagfile}. For {tagfile}, however, a condition must be satisfied; see "Exceptions in Universal Ctags" for the condition.
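As an illustration of the format above, here is a minimal Python sketch of a reader for one tag line. The function names parse_tag_line and unescape are hypothetical. Two assumptions are baked in: the escape set is taken to be \t, \r, \n and \\ as in the original proposal (the list is not reproduced above), and the tagaddress is assumed never to contain the two characters ;" itself, which a robust reader would have to handle. A field with no colon is stored as "kind", the shorthand the proposal introduces with its examples.

```python
import re

# Assumed escape set from the original proposal: \t, \r, \n and \\.
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}


def unescape(value):
    """Decode the assumed escapes; other backslash uses are reserved, so keep them as-is."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), value)


def parse_tag_line(line):
    """Split one tag line into name, file, address and a dict of tagfields."""
    line = line.rstrip("\r\n")
    # Only the first two <Tab>s separate fields; the address may contain tabs.
    tagname, tagfile, rest = line.split("\t", 2)
    # Simplification: the first ;" is taken as the start of the comment.
    tagaddress, _, comment = rest.partition(';"')
    fields = {}
    for item in comment.split("\t"):
        if not item:
            continue
        name, colon, value = item.partition(":")
        if colon:
            fields[name] = unescape(value)
        else:
            # No ':' at all: treat the item as an abbreviated kind field.
            fields["kind"] = item
    return {"name": tagname, "file": tagfile, "address": tagaddress, "fields": fields}


tag = parse_tag_line('getflag\tsub.c\t/^getflag(arg)$/;"\tkind:f\tfile:')
```

A line in the old format, with no ;" part, goes through the same function and simply yields an empty fields dict, which is the backwards compatibility the proposal aims for.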
Proposed tagfield names:
Note that these are mostly for C and C++. When tags programs are written for other languages, this list should be extended to include the used field names. This will help users to be independent of the tags program used.

Examples:

    asdf     sub.cc  /^asdf()$/;"           new_field:some\svalue   file:
    foo_t    sub.h   /^typedef foo_t$/;"    kind:t
    func3    sub.p   /^func3()$/;"          function:/func1/func2   file:
    getflag  sub.c   /^getflag(arg)$/;"     kind:f  file:
    inc      sub.cc  /^inc()$/;"            file:   class:PipeBuf

The name of the "kind:" field can be omitted. This is to reduce the size of the tags file by about 15%. A program reading the tags file can recognize the "kind:" field by the missing ':'. Examples:

    foo_t    sub.h   /^typedef foo_t$/;"    t
    getflag  sub.c   /^getflag(arg)$/;"     f       file:

Additional remarks:
Note about line separators:

Vi traditionally runs on Unix systems, where the line separator is a single linefeed character <NL>. On MS-DOS and compatible systems <CR><NL> is the standard line separator. To increase portability, this line separator is also supported.

On the Macintosh a single <CR> is used as the line separator. Supporting this on Unix systems causes problems, because most fgets() implementations don't see the <CR> as a line separator. Therefore the support for a <CR> as line separator is limited to the Macintosh.

Summary:
The characters <CR> and <LF> cannot be used inside a tag line. This is not mentioned elsewhere (because it's obvious).

Note about white space:

Vi allowed any white space to separate the tagname from the tagfile, and the filename from the tagaddress. This would need to be allowed for backwards compatibility. However, all known programs that generate tags use a single <Tab> to separate fields.

There is a problem for using file names with embedded white space in the tagfile field. To work around this, the same special characters could be used as in the new fields, for example \s. But, unfortunately, in MS-DOS the backslash character is used to separate file names. The file name c:\vim\sap contains \s, but this is not a <Space>. The number of backslashes could be doubled, but that will add a lot of characters, and make parsing the tags file slower and clumsier.

To avoid these problems, we will only allow a <Tab> to separate fields, and not support a file name or tagname that contains a <Tab> character. This means that we are not 100% Vi compatible. However, there is no known tags program that uses something other than a <Tab> to separate the fields. Only when a user typed the tags file himself, or made his own program to generate a tags file, could we run into problems. To solve this, the tags file should be filtered, to replace the arbitrary white space with a single <Tab>. This Vi command can be used:

    :%s/^\([^ ^I]*\)[ ^I]*\([^ ^I]*\)[ ^I]*/\1^I\2^I/

(replace ^I with a real <Tab>).

COMMENT: Universal Ctags running on MS Windows converts the \ separator to / by default, and allows the escape sequences even in {tagfile} if a condition is satisfied. See "Exceptions in Universal Ctags" for the condition.

TAG FILE INFORMATION:

Pseudo-tag lines can be used to encode information into the tag file regarding details about its content (e.g. have the tags been sorted?, are the optional tagfields present?), and regarding the program used to generate the tag file.
This information can be used both to optimize use of the tag file (e.g. enable/disable binary searching) and provide general information (what version of the generator was used).

The names of the tags used in these lines may be suitably chosen to ensure that when sorted, they will always be located near the first lines of the tag file. The use of "!_TAG_" is recommended. Note that a rare tag like "!" can sort to before these lines. The program reading the tags file should be smart enough to skip over these tags.

The lines described below have been chosen to convey a select set of information.

Tag lines providing information about the content of the tag file:

    !_TAG_FILE_FORMAT       {version-number}  /optional comment/
    !_TAG_FILE_SORTED       {0|1}             /0=unsorted, 1=sorted/

The {version-number} used in the tag file format line reserves the value of "1" for tag files complying with the original UNIX vi/ctags format, and reserves the value "2" for tag files complying with this proposal. This value may be used to determine if the extended features described in this proposal are present.

Tag lines providing information about the program used to generate the tag file, and provided solely for documentation purposes:

    !_TAG_PROGRAM_AUTHOR    {author-name}     /{email-address}/
    !_TAG_PROGRAM_NAME      {program-name}    /optional comment/
    !_TAG_PROGRAM_URL       {URL}             /optional comment/
    !_TAG_PROGRAM_VERSION   {version-id}      /optional comment/

EXCEPTION: Universal Ctags introduces more kinds of pseudo-tags. See ctags-client-tools(7) about them.

COMMENT: Though pseudo-tags are semantically different from regular tags, they use the same format, which is:

    {tagname}<Tab>{tagfile}<Tab>{tagaddress}

and the escape sequences and illegal characters explained in the "Proposal" section also apply to pseudo-tags.
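To illustrate how a reader might use the pseudo-tag lines, here is a minimal Python sketch that collects them from the head of a tags file. The function name read_pseudo_tags and the sample lines are hypothetical. The sketch assumes the pseudo-tags sit at the top of the file and that scanning can stop at the first line not starting with '!'; a tags file laid out differently would need a full scan.

```python
def read_pseudo_tags(lines):
    """Collect !_TAG_ lines from the head of a tags file into a dict."""
    info = {}
    for line in lines:
        if not line.startswith("!"):
            # Assumption: pseudo-tags are at the top, so regular tags end the scan.
            break
        if not line.startswith("!_TAG_"):
            # A rare regular tag such as "!" can sort before the pseudo-tags; skip it.
            continue
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) >= 2:
            # Key is the name without the "!_TAG_" prefix; value is the {tagfile} slot.
            info[parts[0][len("!_TAG_"):]] = parts[1]
    return info


sample = [
    "!\tops.cc\t/^bool operator!()$/\n",
    "!_TAG_FILE_FORMAT\t2\t/optional comment/\n",
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted/\n",
    "main\tmain.c\t/^main(argc, argv)$/\n",
]
header = read_pseudo_tags(sample)
can_binary_search = header.get("FILE_SORTED") == "1"
has_extended_fields = header.get("FILE_FORMAT") == "2"
```

The two flags at the end correspond to the uses named in the text: enabling binary search only for sorted files, and expecting tagfields only when the format version is 2.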
EXCEPTIONS IN UNIVERSAL CTAGS

Universal Ctags supports this proposal with some exceptions.

Exceptions
Compatible output and weakness

The default behavior (the --output-format=u-ctags option) has the exceptions. With the --output-format=e-ctags option, on the other hand, ctags has no exceptions; the Universal Ctags command can then use the same file format as Exuberant Ctags. However, --output-format=e-ctags throws away any tag entry whose name includes a space or a tab character. The TAG_OUTPUT_MODE pseudo-tag tells which format was used when ctags generated the tags file.

SEE ALSO

ctags(1), ctags-client-tools(7), ctags-incompatibilities(7), readtags(1)