This might be better off on some other site but I’m hoping it might be useful to some cocoa person out there enough that they might help.
I am trying to build a Yacc parser to parse Objective C code. The parser would simply parse Objective C code. (If you do not know what a yacc or bison parser is, just take a computer science degree. :-> ) There does not seem to be a parser out there, (if you know of one, please tell).
Applications might include extensions to object reflection, blocks, objective-c interpreters, enhancements to package system, obj-C IDEs etc.
GCC uses a complicated yacc parser, (in all 3864 lines), which is coupled to gcc (look in GCC sources for - gcc/objc/objc-parse.y). A lot of grammar rules seem spurious for my purposes. So I got a C formal grammar ( http://www.lysator.liu.se/c/ANSI-C-grammar-y.html ) and the objective C grammar and put them together with a lexer. And lo, there were plenty of shift/reduce | reduce/reduce errors. Big problems seem to be telling apart protocols, classnames, typenames and ordinary identifiers. There are some problems with the Objective C grammar as listed in “The Objective C Programming Language” - it needs a bit of reworking to get optional stuff like protocol lists working. I am not sure if anyone actually used that grammar anywhere.
I admit I am not (yet) proficient at parsing, grammars and such. I hope there may be some kind soul out there who can help.
MikeAsh has very kindly hosted the files.
Anyway, here is the lexer: http://www.mikeash.com/objcparser/objC.lex Here is the parser: http://www.mikeash.com/objcparser/objCParser.yacc (which does not work and needs help)
Here is gcc’s grammar, minus the code and other nasties: http://www.mikeash.com/objcparser/gcc-objc-parse.y
While the grammar is not complete (it was designed for the simpler task of documenting source) it is still a good example of integrating ObjC with lex and yacc code. See http://www.clindberg.org/ and look at the AutoDoc source code.
-jason
The Objective-C Programming Language grammar lists type-specifier as:
type-specifier: void char short int long float double signed unsigned id [ protocol-reference-list ] class-name [ protocol-reference-list ] struct-or-union-specifier enum-specifier typedef-name
However I often just use protocol-reference-list as a type-specifier, eg:
which seems to be fine with gcc. I would certainly want to parse stuff like this. But making this part of the grammar interferes with many things. I don’t think I can trust that grammar anymore. Which leads me to ask: “Does anyone have the real grammar for Objective-C?” – MikeAmy
I think that works because unspecified types in ObjC default to id, so it interprets those untyped protocol declarations that way. – CharlesSteinman
I’ve never seen that syntax for protocols before, I always stick ‘id’ in front of them. Anyway, I assume that adding a raw protocol-reference-list to type-specifier runs into conflicts because of the <>’s around the list. What if you made a specialized type-specifier, call it method-type-specifier or something, which included all of the standard type-specifier things, plus the protocols?
I can certainly understand your desire for a “real grammar” for ObjC, but I don’t think you will get it. There is no ObjC standard; the language is defined by its implementation. I think the closest thing you’ll get to a real grammar is the grammar from one of the compilers. The reason I’m so interested in this parser is because it may provide a “real grammar” for ObjC that isn’t totally inseparable from a compiler. – MikeAsh
I came across the rule that allows
– MikeAmy
Given up for now. This is just too much of a problem. For anyone looking for a grammar, starting with the GCC grammar would be the best idea but still pretty hard. So I have given up with this for now. Don’t watch this space. – MikeAmy
There’s an Objective-C 2.0 grammar for ANTLR here: http://www.antlr.org/grammar/list (did not test it) – Daniel Furrer
I got as far as an Objective-C grammar and outlines for parsers in C and Objective-C.
http://code.google.com/p/parce/
Maybe the grammar would be useful since it’s for YACC. (I’m sorry the rest of the project is pretty hopeless; it was mooted by Objective-C 2 and the improved code completion in Xcode 3.) – BrentGulanowski