Object-oriented design patterns in C and kernel development (oshub.org)

244 points by joexbayer 3 days ago | 209 comments

ryao 3 days ago

> The article describes how the Linux kernel, despite being written in C, embraces object-oriented principles by using function pointers in structures to achieve polymorphism.

This technique predates object oriented programming. It is called an abstract data type or data abstraction. A key difference between data abstraction and object oriented programming is that you can leave functions unimplemented in your abstract data type while OOP requires that the functions always be implemented.

The sanest way to have optional functions in object oriented programming that occurs to me would be to have an additional class for each optional function and to inherit from each one you implement alongside your base class via multiple inheritance. Then you would need to check at runtime whether the object is an instance of the additional class before using an optional function. With an abstract data type, you would just do a simple NULL check to see if the function pointer is present before using it.
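A minimal C sketch of the "optional function pointer plus NULL check" approach described above. The `device_ops` table and every name in it are hypothetical, chosen only for illustration:

```c
#include <stddef.h>

/* Hypothetical abstract data type: a table of operations in which some
 * entries are optional and may be left NULL by an implementation. */
struct device;

struct device_ops {
    int (*read)(struct device *dev);   /* required */
    int (*flush)(struct device *dev);  /* optional: may be NULL */
};

struct device {
    const struct device_ops *ops;
    int value;
    int flushes;
};

/* Callers test the pointer before using an optional operation. */
static int device_flush(struct device *dev)
{
    if (dev->ops->flush == NULL)
        return 0;                      /* not implemented: treat as a no-op */
    return dev->ops->flush(dev);
}

static int mem_read(struct device *dev) { return dev->value; }
static int mem_flush(struct device *dev) { dev->flushes++; return 1; }

/* One implementation provides flush; the other simply leaves it out. */
static const struct device_ops buffered_ops = { .read = mem_read, .flush = mem_flush };
static const struct device_ops raw_ops = { .read = mem_read }; /* .flush is NULL */
```

A caller can invoke `device_flush()` on either kind of device without knowing which implementation it has; the OOP alternative sketched in the comment would instead need a runtime type check against an extra interface class.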

pavlov 2 days ago

In Smalltalk and Objective-C, you just check at runtime whether an object instance responds to a message. This is the original OOP way.
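To make the "responds to a message" idea concrete in C, here is a toy sketch, assuming a hypothetical per-class table that maps selector names to implementations. Real Objective-C runtimes use interned selectors and method caches rather than a `strcmp` scan; this only shows the shape of the check:

```c
#include <stddef.h>
#include <string.h>

struct object;
typedef int (*method_fn)(struct object *self);

/* One entry per message the class understands (names are illustrative). */
struct method_entry {
    const char *selector;
    method_fn imp;
};

struct class_desc {
    const struct method_entry *methods;
    size_t count;
};

struct object {
    const struct class_desc *isa;   /* every object points at its class */
    int state;
};

/* Linear lookup by name; returns NULL if the object does not respond. */
static method_fn lookup(const struct object *obj, const char *selector)
{
    for (size_t i = 0; i < obj->isa->count; i++)
        if (strcmp(obj->isa->methods[i].selector, selector) == 0)
            return obj->isa->methods[i].imp;
    return NULL;
}

static int responds_to(const struct object *obj, const char *selector)
{
    return lookup(obj, selector) != NULL;
}

static int counter_increment(struct object *self) { return ++self->state; }

static const struct method_entry counter_methods[] = {
    { "increment", counter_increment },
};
static const struct class_desc counter_class = { counter_methods, 1 };
```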

It's sad that OOP was corrupted by the excessively class-centric C++ and Java design patterns.

moregrist 2 days ago

> In Smalltalk and Objective-C, you just check at runtime whether an object instance responds to a message. This is the original OOP way.

This introduces performance issues larger than the typical ones associated with vtable lookups. Not all domains can afford this today, and even fewer could in the 80s/90s when these languages were first designed.

> It's sad that OOP was corrupted by the excessively class-centric C++ and Java design patterns.

Both Smalltalk and Objective-C are class based and messages are single-receiver dispatched. So it’s not classes that you’re objecting to. It’s compile-time resolved (eg: vtable) method dispatch vs a more dynamic dispatch with messages.

Ruby, Python, and Javascript all allow for last-resort attribute/message dispatching in various ways: Ruby via `method_missing`, Python by supplying `__getattr__`, and Javascript via Proxy objects.

chunkyguy 2 days ago

> This introduces performance issues larger than the typical ones associated with vtable lookups.

I don't know about other programming languages, but with Objective-C, thanks to IMP caching, the performance is close to that of a C++ vtable call:

  Name                      Iterations   Total time (sec)   Time per (ns)
  C++ virtual method call   1000000000   1.5                1.5
  IMP-cached message send   1000000000   1.6                1.6

https://mikeash.com/pyblog/friday-qa-2016-04-15-performance-...
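The IMP-caching idea in the benchmark can be sketched in plain C: resolve the implementation once, outside the loop, then call through the cached function pointer. `resolve()` below is a hypothetical stand-in for a dynamic lookup, and the sketch assumes every element has the same class, which is what makes caching safe:

```c
#include <stddef.h>
#include <string.h>

struct shape;
typedef double (*area_fn)(const struct shape *);

struct shape { double w, h; };

static double rect_area(const struct shape *s) { return s->w * s->h; }

/* Hypothetical slow lookup standing in for a dynamic message send. */
static area_fn resolve(const char *selector)
{
    return strcmp(selector, "area") == 0 ? rect_area : NULL;
}

/* Resolve once, then call the cached pointer in the hot loop. */
static double total_area(const struct shape *shapes, size_t n)
{
    area_fn imp = resolve("area");
    double sum = 0.0;
    if (imp == NULL)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += imp(&shapes[i]);        /* plain indirect call, no lookup */
    return sum;
}
```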

pjmlp 1 day ago

In NeXTSTEP it was fast enough to have the whole OS, including device drivers, written in Objective-C.

In Smalltalk systems, that stopped being an issue once JITs were introduced.

1718627440 2 days ago

If you are worried about the branch miss, you can also just call the method and catch the SIGSEGV. I think that if you allow for the possibility of there being no implementation, you can't avoid having a decision somewhere. This would also apply to, say, a C++ virtual method.

frutiger 1 day ago

A SIGSEGV is not guaranteed when calling an address that does not have a function.

1718627440 1 day ago

Are you talking about C? In C, it's not guaranteed that calling a nonexistent function will result in an actual function call. But once the compiler has generated an actual function call, a modern CPU is very likely to trap.

We are talking about an optimization of a language implementation here. This would very much be written in ASM or another language where this is defined.

pjmlp 2 days ago

Actually, I would say it is sad that developers learn how a technology is done in language XYZ and then use it as a template everywhere else. What happened to the curiosity of learning?

ryao 2 days ago

The curiosity of learning is infeasible given that there are >15,000 programming languages. You might say to only learn the major/influential languages, but I have found my curiosity to wane with every additional language that I learn. So far, I have either used or dabbled in C, C++, FORTRAN, SML/NJ, PHP, Java, JavaScript, Python, Go, POSIX Shell, Assembly and SQL stored procedures. That is not to mention different dialects/versions of those, and the fact that assembly itself is a catch-all category in which I have dabbled in multiple languages too. I am also not including languages on which I spent a minuscule amount of time (say a few hours), such as D (as in DTrace), Objective-C, Perl, COBOL and LISP. Then there is the misadventure into C# that I took before I actually understood programming, where I got stuck and dropped it. I am also not sure if I should mention AWK, as I only use it to select the Nth field in a bunch of lines of text when doing shell scripting and nothing else. Thus, I have used it many times, yet know almost none of its syntax.

Every few years, someone tells me I should learn another language. In recent years, I simply have no desire to learn yet another language that is merely another way of doing something I can already do elsewhere, and the only way I will is if I am forced (that is how I came to use Go).

That said, I do see what you are saying. C++, for example, has a “support every paradigm” philosophy, so whenever someone who has learned C++ encounters a language using a paradigm that C++ assimilated, there is a huge temptation to view it through the lens of C++. I can also see the other side: “C++ took me forever to learn. Why go through that again when I can use C++ as a shortcut to understand something else?” C++ is essentially the Borg of programming languages.

1718627440 2 days ago

How did you get proficient in so many languages? I think it takes some years before you start to think in a language.

johnisgood 2 days ago

I did not know they say the same thing about programming languages.

Based on his comment, I did not think he was proficient in them, only that he has used them, which is fair enough. So have I, sans the ones tied to either Apple (Swift) or Microsoft (C#).

I have some projects in Haskell just for curiosity's sake, and because what I wanted seemed like it would be nice in Haskell, and it indeed looks quite elegant to me, for this one particular project. Haskell is not a language I would use generally. OCaml is.

ryao 1 day ago

You are correct. I am only highly proficient in a few of them. The others are ones that I have used for varying sized projects for varying reasons, but I only learned the subsets I needed for my purposes. I would say my ability in the others in my main list varies within the low to intermediate range of proficiency.

johnisgood 1 day ago

Yeah, same here. This Haskell project is nothing serious, it just prints you the difference in time between two dates. I thought the Haskell implementation would look nice and it did. I initially made it in bash but I realized this is much more complex than what bash can handle. I had to calculate for particular calendars among a lot of other things. I think it works, more or less, however.

ryao 1 day ago

My proficiency in them varies. I have used them for projects, such that I can write code in them (provided I have references to read), but when I do, I only learn/use the subset of the language that I need. If I want to read code others have written, then I need to know the subset that they used, and that is not always the subset that I know. Most of them are languages that I have used for a handful of projects or, in the case of assembly, for very small portions of projects, with at least half of that time spent just trying to understand how good a job the compiler did in critical loops.

There are some languages in which I am extremely proficient. My best language is C, which is my favorite and I have used most features of every version of C from C89 to C11. My second best is probably either C++ or POSIX shell (although I have moments where I forget certain syntax and need to look it up, especially in POSIX shell for variants on variable substitution, e.g. ${VAR%%foobar}). I have used most features of C++98 and some from newer versions. My experiences with C++ have soured me on it, so I now try to avoid C++ whenever I can in favor of C and Python.

My first language was actually PHP 4.2.y, and I was fairly proficient in it, having spent a long time learning it while simultaneously writing my own code for a website as a teenager. However, I never once touched the portions describing objects/classes, namespaces or exceptions. Someone else at work writes Modern PHP code using Symfony and I have taken a peek at it. It looks very different from the PHP I knew because it uses the features I had avoided learning (and probably some new language features too), although I can sort of read it thanks to having learned those concepts in other languages.

I used SML/NJ and Java in college. Years after college, I modified an open source Android TV application written in Java to add some things I wanted, although honestly, beyond that I have not really touched either language. Give me an arbitrary application written in either to improve and I would have some difficulty, although I will probably be able to do it after filling the gaps in my understanding (and doing plenty of head banging if it is a large program).

I have used JavaScript for a few recent projects via Electron/Node.js and I have done several small things in Python over the past several years. Each time, I only worked with the subset that I needed. I am far from a master of either language who can understand arbitrary code written in them, but I am able to manage as far as my specific needs are concerned.

I could continue listing my experiences doing things in languages (like the time in college that I wrote some basic programs in FORTRAN 90 to try to learn it), but they really are not that interesting. It was often a project here or a small application there, and as I readily admitted, I only used a subset of most of the languages. For programming, a subset of commonly used bits is often all you need.

bluGill 2 days ago

Java is excessively class-centric. C++ often is, but it need not be, and developers are moving away from that.

Smalltalk is not the original OO; C++ took OO from Simula, which was always a different system.

baranul 1 day ago

This is a very good point. A greater distinction needs to be made about class-based versus class-free OOP. Many of the problems or constraints around OOP that people don't like come back to classes.

Often class-based programming is confused as being the only style of OOP, superior to all other styles, or heavy-handedly pushed on others. Many programmers are perfectly fine with using objects or only specific features of OOP, without classes, if they are "allowed" to.

mettamage 2 days ago

Wait, so in Obj-C, could you also write some kind of doesNotUnderstand method to achieve some dynamic method dispatch?

pjmlp 2 days ago

Yes, that is how microservices were implemented in the days of NeXTSTEP, with PDO.

https://en.wikipedia.org/wiki/Portable_Distributed_Objects

astrange 2 days ago

It's how many things are implemented, like UI message routing, IPC, undo, test mocks.

mettamage 2 days ago

So why did Swift become a thing? Or does Swift have this too?

ryao 22 hours ago

There is always a subset of the population for which it is fashionable to try to make a new language to make things easier. It is inevitable that someone in a sufficiently large organization will try to make one, provided management supports it. Apple is a very wealthy company that is happy to sponsor R&D, had someone interested in this that they sponsored and the result was appealing enough that they decided to adopt it.

That is my understanding of how the process generally works and what I am willing to guess happened. Prior to this, they had been making incremental changes to Objective-C.

That said, from what I have seen of the syntax of both languages, Swift’s syntax is nicer, and that is not something they would have been able to get from Objective-C. They had already tried syntax reform for Objective-C once in the past and abandoned it.

pjmlp 2 days ago

Because Objective-C being based on C, means it would never be safe while remaining compatible with C.

Also, Smalltalk-style dynamic runtime dispatch can never be as fast as VMT-based dispatch, or compile-time dispatch via generics, even with all the optimizations that objc_msgSend() has received during its lifetime.

Still, Metal is implemented in Objective-C, so there is that.

1718627440 1 day ago

Objective-C already has a different compiler, so I fail to see how the compiler is restricted in the checks it does.

astrange 2 days ago

Swift isn't messaging-based, it's protocol-based. (Except it's also messaging-based if you use @objc.)

bitwize 2 days ago

Because Objective-C does not have foo.bar() style method calls and that's what everybody else uses and wants.

saagarjha 2 days ago

Because Swift adds many other features.

MangoToupe 2 days ago

Sure, but that is a horrible coding pattern. Even good Smalltalk code doesn't rely on this. It's dogshit and lazy.

trws 3 days ago

I largely agree, and use these patterns in C, but you’re neglecting the usual approach of having a default or stub implementation in the base for classic OOP. There’s also the option of using interfaces in more modern OOP or concept-style languages where you can cast to an interface type to only require the subset of the API you actually need to call. Go is a good example of this, in fact doing the lookup at runtime from effectively a table of function pointers like this.

ryao 2 days ago

My point is that this pattern is not object oriented programming. As for default behavior, you would usually provide it either by always installing the default pointer when creating the structure or by calling the default whenever the pointer is NULL.
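Both ways of supplying a default can be sketched in C as follows; `struct stream`, `default_size()` and the other names are hypothetical:

```c
#include <stddef.h>

struct stream {
    long (*size)(const struct stream *s);  /* optional: may be NULL */
    long len;
};

/* The default: report the size as unknown. */
static long default_size(const struct stream *s) { (void)s; return -1; }
static long known_size(const struct stream *s) { return s->len; }

/* Option 1: install the default pointer when the object is created. */
static struct stream stream_make(long (*size)(const struct stream *), long len)
{
    struct stream s = { size != NULL ? size : default_size, len };
    return s;
}

/* Option 2: leave the pointer NULL and fall back at the call site. */
static long stream_size(const struct stream *s)
{
    return s->size != NULL ? s->size(s) : default_size(s);
}
```

Installing the default up front keeps call sites simple; checking at the call site lets an object leave the pointer NULL, at the cost of a test on every call.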

In the Linux VFS for example, there are optimized functions for reading and writing, but if those are not implemented, a fallback to unoptimized functions is done at the call sites. Both sets are function pointers and you only need to implement one if I recall correctly.
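A sketch of that kind of call-site fallback, where an implementation may supply either a fast bulk read or a simple byte-at-a-time read and only needs one of them. The names are hypothetical and this is not the kernel's actual code:

```c
#include <errno.h>
#include <stddef.h>

struct demo_file;

/* Hypothetical operations: a fast bulk path and a simple byte-at-a-time
 * path. An implementation only needs to provide one of them. */
struct demo_file_ops {
    long (*read_bulk)(struct demo_file *f, char *buf, size_t n);
    int (*read_byte)(struct demo_file *f);   /* returns -1 at end of data */
};

struct demo_file {
    const struct demo_file_ops *ops;
    const char *data;
    size_t pos, len;
};

/* The call site prefers the fast path and emulates it when it is missing. */
static long demo_read(struct demo_file *f, char *buf, size_t n)
{
    if (f->ops->read_bulk != NULL)
        return f->ops->read_bulk(f, buf, n);
    if (f->ops->read_byte != NULL) {
        size_t i = 0;
        int c;
        while (i < n && (c = f->ops->read_byte(f)) != -1)
            buf[i++] = (char)c;
        return (long)i;
    }
    return -EINVAL;                           /* neither operation provided */
}

/* An in-memory implementation that only provides the simple path. */
static int mem_read_byte(struct demo_file *f)
{
    return f->pos < f->len ? (unsigned char)f->data[f->pos++] : -1;
}

static const struct demo_file_ops byte_only_ops = { NULL, mem_read_byte };
```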

f1shy 2 days ago

To be fair, OOP is not 100% absolutely perfectly defined. Stroustrup swears C++ is OOP, Alan Kay at least at one point laughed at C++, and people using CLOS have yet another definition.

pjmlp 1 day ago

You forgot about people using BETA, or Self, or ......

naasking 2 days ago

> My point is that this pattern is not object oriented programming.

I think the "is/is not" question is not so clear. If you think of "is" as whether there's a homomorphism, then it makes sense to say that it is OOP, but it can qualify as being something else too, i.e. it's not an exclusionary relationship.

ryao 2 days ago

Object oriented programming implies certain contracts that the compiler enforces that are not enforced with data abstraction. Given that object oriented programming and data abstraction both live side by side in C++, we can spot the differences between member functions, which have contracts enforced, and member function pointers, which do not. Member functions have an implicit this pointer, and in a derived class, can call the base class version via a shorthand notation to the compiler (BaseClass::func() or super()), unless that base class version is a pure virtual function. Member function pointers have no implicit this pointer unless one is explicitly passed. They have no ability to access a base class variant via some shorthand notation to the compiler, because the compiler has no contract saying that OOP is being done and that there is a base class version of this function. Finally, classes with unimplemented member functions may not be instantiated as objects, while classes with unimplemented member function pointers may.

If you think of the differences as being OOP implies contracts with the compiler and data abstraction does not (beyond a simple structure saying where the members are in memory), it becomes easier to see the two as different things.

1718627440 2 days ago

So you can opt in or out of syntactic sugar. That makes C++ an interesting and useful language, but how you implement OOP doesn't really affect whether it is OOP.

ryao 2 days ago

By this logic, C is an object oriented language. It is widely held not to be. That is why there were two separate approaches to extending it to make it object oriented: C++ and Objective-C.

1718627440 2 days ago

You can implement OOP in C as you can in any language; the article is an example of this. C is not an OOP language in any way: it doesn't have any syntactic features for it, and it uses the term "object" for something different.

ryao 2 days ago

The article mentions file_operations, but ignores that it has what would be a static member function in C++ in the form of ->check_flags(), which is never in a vtable. The article author is describing overlap between object oriented programming and something else, called data abstraction, which is what is really being done inside Linux, and calling it OOP.
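To illustrate the point about `->check_flags()`, here is a hypothetical ops table loosely modeled on `struct file_operations`: one entry takes the object like a virtual method, while the flag check takes only the flags, so it has no `this`-like argument at all. Names and flag values are made up:

```c
#include <stddef.h>

#define MYF_APPEND 0x1   /* hypothetical flag values */
#define MYF_DIRECT 0x2

struct my_file;

/* read() takes the object, like a virtual method; check_flags() takes
 * only the flags, like a static member function would, yet both live
 * in the same table. */
struct my_file_ops {
    long (*read)(struct my_file *f, char *buf, size_t n);
    int (*check_flags)(int flags);
};

struct my_file {
    const struct my_file_ops *ops;
    int flags;
};

static long empty_read(struct my_file *f, char *buf, size_t n)
{
    (void)f; (void)buf; (void)n;
    return 0;                          /* always at end of file */
}

/* This implementation refuses MYF_DIRECT. */
static int no_direct_check_flags(int flags)
{
    return (flags & MYF_DIRECT) ? -1 : 0;
}

static const struct my_file_ops empty_ops = { empty_read, no_direct_check_flags };

/* The object-independent entry is still reached through the object's
 * table, because which check applies depends on the implementation. */
static int my_file_set_flags(struct my_file *f, int flags)
{
    if (f->ops->check_flags != NULL && f->ops->check_flags(flags) != 0)
        return -1;
    f->flags = flags;
    return 0;
}
```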

You can implement OOP in C if you do vtables for inheritance hierarchies manually, among other things, but that is different than what Linux does.
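For contrast, a minimal sketch of doing vtables for an inheritance hierarchy by hand in C, with an explicit `self` pointer as the first argument and the base object embedded as the first member. The `animal`/`bird` hierarchy is purely illustrative:

```c
#include <stddef.h>

struct animal;

/* Hand-written vtable: every method takes an explicit self pointer. */
struct animal_vtable {
    int (*legs)(const struct animal *self);
    int (*weight)(const struct animal *self);
};

struct animal {
    const struct animal_vtable *vt;    /* like a hidden vptr */
    int base_weight;
};

static int animal_legs(const struct animal *self) { (void)self; return 4; }
static int animal_weight(const struct animal *self) { return self->base_weight; }
static const struct animal_vtable animal_vt = { animal_legs, animal_weight };

/* "Derived class": the base is the first member, so a pointer to a bird
 * can be converted to a pointer to its animal part and back. */
struct bird {
    struct animal base;
    int feather_weight;
};

static int bird_legs(const struct animal *self) { (void)self; return 2; }

static int bird_weight(const struct animal *self)
{
    const struct bird *b = (const struct bird *)self;
    /* Explicit call to the base version, what Base::weight() does in C++. */
    return animal_weight(self) + b->feather_weight;
}

/* Both slots are overridden here; an inherited slot would have to be
 * filled with the base function by hand. */
static const struct animal_vtable bird_vt = { bird_legs, bird_weight };

/* Polymorphic client code: works on any "subclass" through the vtable. */
static int total_legs(const struct animal *const *zoo, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += zoo[i]->vt->legs(zoo[i]);
    return sum;
}
```

Everything a C++ compiler would do implicitly (laying out the vtable, filling inherited slots, passing `this`, calling the base version) is written out by hand here.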

1718627440 1 day ago

I honestly don't think it matters here how a C++ compiler chooses to implement an object method.

It's a function belonging to an object, which is dispatched dynamically through something I would call a vtable. To me that sounds like a classic example of OOP.

Data abstraction is a core of OOP.

This pattern can be used to implement inheritance; that it isn't used that way here doesn't mean it's not OOP.

ryao 19 hours ago

Data abstraction is a separate invention from OOP since it involves abstract data types. What is being used here is an abstract data type. It is not the pattern used in OOP languages and it is not OOP. It bears similarities and overlap with the vtables used to implement some OOP languages. It is like how thumbs bear similarities and overlap with index fingers, but the two are not the same.

1718627440 8 hours ago

To cite Wikipedia:

> Object-oriented programming (OOP) is a programming paradigm based on the object – a software entity that encapsulates data and function(s). An OOP computer program consists of objects that interact with one another. A programming language that provides OOP features is classified as an OOP language [...]

You don't disagree that this kernel pattern is about data abstraction. You probably don't disagree that the kernel uses functions. The kernel uses "objects" (FS implementations) that follow a defined set of functions, sometimes called a "class" (vtables, or whatever you like to call them). Therefore I conclude that what the kernel does here is a prime example of OOP.

ryao 5 hours ago

You can use similar logic to declare English to be an example of Chinese. They both have syllables. They both assemble syllables into words that convey meaning. They both use grammars to form relationships between those words. Thus, they must be the same. It is fallacious logic. Some similarities do not make things the same. Data abstraction is also its own topic that is able to stand independently from OOP. What the kernel does is data abstraction, not OOP. What you are seeing in the kernel are the abstract data types of data abstraction.

teo_zero 2 days ago

> Object oriented programming implies certain contracts that the compiler enforces

Sorry, but where did you get this definition from? I've always thought of OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types? So you can also think in an OO way even if the language has no notion of objects, methods, etc.

ryao 2 days ago

> Sorry, but where did you get this definition from?

It is from experience with object oriented languages (mainly C++ and Java). Technically, you can do everything manually, but that involves shoehorning things into the OO paradigm that do not naturally fit, like the article author did when he claimed struct file_operations was a vtable when it has ->check_flags(), which would be equivalent to a static member function in C++. That is never in a vtable.

If Al Viro were trying to restrict himself to object oriented programming, he would need to remove function pointers to what are effectively the equivalent of static member functions in C++ to turn it into a proper vtable, and handle accesses to that function through the “class”, rather than the “object”.
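A hypothetical sketch of the restructuring described above: a strict vtable holding only functions that take the object, plus a separate "class" descriptor holding the object-independent function, so that `check_flags` can be called before any object exists. All names are illustrative:

```c
#include <stddef.h>

struct obj;

/* Strict vtable: every entry takes the object. */
struct obj_vtable {
    long (*read)(struct obj *self, char *buf, size_t n);
};

/* "Class" descriptor: the vtable plus an object-independent function,
 * the C analogue of a static member function. */
struct obj_class {
    const struct obj_vtable *vt;
    int (*check_flags)(int flags);
};

struct obj {
    const struct obj_class *cls;
    int flags;
};

static long zero_read(struct obj *self, char *buf, size_t n)
{
    (void)self;
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
    return (long)n;
}

static int reject_odd_flags(int flags) { return (flags & 1) ? -1 : 0; }

static const struct obj_vtable zero_vt = { zero_read };
static const struct obj_class zero_class = { &zero_vt, reject_odd_flags };

/* check_flags is called through the class, before the object exists. */
static int obj_open(struct obj *out, const struct obj_class *cls, int flags)
{
    if (cls->check_flags(flags) != 0)
        return -1;
    out->cls = cls;
    out->flags = flags;
    return 0;
}
```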

Of course, since he is not doing object oriented programming, placing pointers to what would be virtual member functions and static member functions into the same structure is fine. There will never be a use case where you want to inherit from a filesystem implementation’s struct file_operations, so there is no need for the decoupling that object oriented programming forces.

> I've always thought of OOP as a way of organizing your data and your code, sometimes supported by language-specific constructs, but not necessarily.

It certainly can be, but it is not the only way.

> Can you organize your data into lists, trees, and hashmaps even if your language does not have those as native types?

This is an odd question. First, exactly what is a native type? If you mean primitive types, then yes. Even C++ does that. If you mean standard library compound types, again, yes. The C++ STL started as a third party library at SGI before becoming part of the C++ standard. If you mean types that you can define, then probably not without a bunch of pain, as then we are going back to the dark days of manually remembering offsets as people had to do in assembly language, although it is technically possible to do in both C and C++.

What you are asking seems to be exactly what data abstraction is, which involves making an interface that separates use and implementation, allowing different data structures to be used to organize data using the same interface. As per Wikipedia:

> For example, one could define an abstract data type called lookup table which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case.

https://en.wikipedia.org/wiki/Abstraction_(computer_science)...
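As a concrete, hedged illustration of the Wikipedia example, here is a small lookup-table ADT in C with two interchangeable implementations (an unsorted linear list and a sorted array searched by bisection) behind the same two operations. The fixed capacity and all names are assumptions made to keep the sketch short:

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_CAP 16

/* Abstract "lookup table": clients only call put() and get(). */
struct table {
    bool (*put)(struct table *t, int key, int value);
    bool (*get)(const struct table *t, int key, int *value);
    int keys[TABLE_CAP], values[TABLE_CAP];
    size_t n;
};

/* Implementation 1: unsorted linear list of (key, value) pairs. */
static bool lin_get(const struct table *t, int key, int *value)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->keys[i] == key) { *value = t->values[i]; return true; }
    return false;
}
static bool lin_put(struct table *t, int key, int value)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->keys[i] == key) { t->values[i] = value; return true; }
    if (t->n == TABLE_CAP) return false;
    t->keys[t->n] = key; t->values[t->n] = value; t->n++;
    return true;
}

/* Implementation 2: array kept sorted by key, searched by bisection. */
static size_t sorted_pos(const struct table *t, int key)
{
    size_t lo = 0, hi = t->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->keys[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;                         /* first index with keys[i] >= key */
}
static bool sorted_get(const struct table *t, int key, int *value)
{
    size_t i = sorted_pos(t, key);
    if (i < t->n && t->keys[i] == key) { *value = t->values[i]; return true; }
    return false;
}
static bool sorted_put(struct table *t, int key, int value)
{
    size_t i = sorted_pos(t, key);
    if (i < t->n && t->keys[i] == key) { t->values[i] = value; return true; }
    if (t->n == TABLE_CAP) return false;
    for (size_t j = t->n; j > i; j--) {        /* shift larger keys right */
        t->keys[j] = t->keys[j - 1]; t->values[j] = t->values[j - 1];
    }
    t->keys[i] = key; t->values[i] = value; t->n++;
    return true;
}

/* Client code: identical whichever implementation is plugged in. */
static int sum_of(const struct table *t, const int *keys, size_t n)
{
    int sum = 0, v;
    for (size_t i = 0; i < n; i++)
        if (t->get(t, keys[i], &v)) sum += v;
    return sum;
}
```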

Getting back to doing data structures without object oriented programming, this is often done in C using a structure definition and the CPP (C PreProcessor) via intrusive data structures. Those break encapsulation, but are great for performance since they can coalesce memory allocations and reduce pointer indirections for objects indexed by multiple structures. They also are extremely beneficial for debugging, since you can see all data structures indexing the object. Here are some of the more common examples:

https://github.com/openbsd/src/blob/master/sys/sys/queue.h

https://github.com/openbsd/src/blob/master/sys/sys/tree.h

sys/queue.h is not actually part of POSIX, but it is a de facto standard: it originated in BSD and glibc ships a copy, while sys/tree.h never achieved that kind of adoption. You will find a number of libraries that implement trees, such as libuutil on Solaris/Illumos, GLib on GNU systems, sys/tree.h on BSD, and others. The implementations are portable to other platforms, so you can pick the one you want and use it.
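A minimal sketch of the intrusive style in C: the list links live inside the object, so one allocation can sit on two lists at once, and a `container_of`-style macro (similar to the one the Linux kernel defines) recovers the enclosing object. It is hand-rolled for self-containment rather than using the `sys/queue.h` macros, and the `task` type is hypothetical:

```c
#include <stddef.h>

/* Minimal intrusive singly linked list; a head is just a sentinel link. */
struct link { struct link *next; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* One allocation, indexed by two lists at once: no separate list nodes. */
struct task {
    int id;
    int priority;
    struct link all;        /* membership in the "all tasks" list */
    struct link runnable;   /* membership in the "runnable" list */
};

static void push(struct link *head, struct link *node)
{
    node->next = head->next;
    head->next = node;
}

static size_t count_links(const struct link *head)
{
    size_t n = 0;
    for (const struct link *l = head->next; l != NULL; l = l->next)
        n++;
    return n;
}

/* Walk the runnable list and map each link back to its task. */
static int sum_runnable_priorities(struct link *head)
{
    int sum = 0;
    for (struct link *l = head->next; l != NULL; l = l->next)
        sum += container_of(l, struct task, runnable)->priority;
    return sum;
}
```

In a debugger, every list a task belongs to is visible directly in the `struct task`, which is the debugging benefit mentioned above.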

As for “hash maps” or hash tables, those tend to be more purpose built in practice to fit the data from what I have seen. However, generic implementations exist:

https://stackoverflow.com/questions/6118539/why-are-there-no...

That said, anyone using hash tables at scale should pay very close attention to how their hash function distributes keys to ensure it is as close to uniformly random as possible, or you are going to have a bad time. Most other applications would be fine using binary search trees. It probably is not a good idea to use hash tables with user controlled keys from a security perspective, since then a guy named Bob can pick keys that cause collisions to slow everything down in a DoS attack. An upgrade from binary search trees that does not risk issues from hash function collisions would be B-trees.
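To make the distribution point concrete, a small C comparison, assuming a naive additive hash as the "bad" example and 32-bit FNV-1a as a better-mixing alternative. FNV-1a is not keyed, so on its own it does not prevent the deliberate-collision attack described above; keyed hashes such as SipHash are the usual answer there:

```c
#include <stddef.h>
#include <stdint.h>

/* Naive additive hash: every permutation of the same bytes collides,
 * so an attacker can trivially craft keys that land in one bucket. */
static uint32_t additive_hash(const char *s)
{
    uint32_t h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* 32-bit FNV-1a: order-sensitive and mixes similar keys much better. */
static uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Bucket selection for a power-of-two table size. */
static size_t bucket_of(uint32_t h, size_t nbuckets_pow2)
{
    return (size_t)(h & (uint32_t)(nbuckets_pow2 - 1));
}
```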

By the way, B-trees are containers and cannot function as intrusive data structures, so you give up some convenience when debugging if you use B-Trees.

1718627440 2 days ago

> handle accesses to that function through the “class”, rather than the “object”

You don't need classes for OOP. C++ not putting methods that logically operate on an object, but don't need a pointer to it, into the automatically created vtable, is an optimization and an implementation detail. I don't know why you think that putting this function into a vtable precludes OOP.

Wait, how does inheritance work when the method is not in the vtable?

ryao 20 hours ago

The calling convention for C++ non-static member functions always includes a this pointer, even if the function does not use it. Removing it on member functions that do not use it would pose a problem if another class inherited from this class and overrode the function definition with one that did use it. Maybe in very special cases whole program optimization could safely remove the this pointer, but it is questionable whether any compiler author would go through the trouble given that the exception handling would need to know about the change. Outside of whole program optimization, it is unlikely removing this from member functions that do not use it would ever happen because it would break ABI stability.

As for how inheritance works when the member function is not in the vtable, that depends on what kind of member function it is. All C++ functions are given a mangled name that is stuffed into C’s infrastructure for linking symbols. For static member functions, inheritance is irrelevant since they are tied to the class. Calls to static member functions go directly to the mangled function with no indirections, just as if it had been a global function. For non-static virtual member functions, you use the vtable pointer to find it. For non-virtual member functions, the call goes straight to the function as if a global function had been called (and the this pointer is still passed, even if the function does not use it), since the compiler knows the type and thus can tell the linker to have calls there go to the function through the appropriately mangled name. It is just like calling a global function.
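A rough C picture of the three cases described above, for a hypothetical class `Counter`. This is a conceptual model, not a real ABI: actual compilers use mangled symbol names and ABI-defined vtable layouts, and the `Counter_*` names merely stand in for them:

```c
struct Counter;

struct Counter_vtable {
    int (*get)(const struct Counter *this_);   /* virtual int get() const */
};

struct Counter {
    const struct Counter_vtable *vptr;         /* hidden vtable pointer */
    int n;
};

/* static int Counter::limit(): no this pointer, direct call. */
static int Counter_limit(void) { return 100; }

/* void Counter::bump(): non-virtual, direct call, this still passed. */
static void Counter_bump(struct Counter *this_) { this_->n++; }

/* virtual int Counter::get() const: reached through the vtable. */
static int Counter_get(const struct Counter *this_) { return this_->n; }
static const struct Counter_vtable Counter_vt = { Counter_get };

static int demo(void)
{
    struct Counter c = { &Counter_vt, 0 };
    Counter_bump(&c);                  /* c.bump();  direct         */
    Counter_bump(&c);
    int v = c.vptr->get(&c);           /* c.get();   via the vtable */
    return v < Counter_limit() ? v : Counter_limit();  /* Counter::limit() */
}
```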

1718627440 8 hours ago

> The calling convention for C++ non-static member functions always includes a this pointer, even if the function does not use it.

Yes. Since we are not in C++ we can choose to get rid of this useless pointer.

> Removing it on member functions that do not use it would pose a problem if another class inherited from this class and overrode the function definition with one that did use it.

That problem has nothing to do with the this pointer specifically. When you change the signature of an inherited method, you always have this problem. This simply means that the superclass prescribes limits to subclasses, which is why it's possible to use a subclass in place of a superclass.

> Maybe in very special cases whole program optimization could safely remove the this pointer, but it is questionable whether any compiler author

Yes, that's why it's not done in C++, but we can do it if we hand-roll it.

> it would break ABI stability

It does not if it has always been like this.

> For static member functions, inheritance is irrelevant since they are tied to the class. Calls to static member functions go directly to the mangled function with no indirections

In other words, ->check_flags() can't be implemented as a static member function in C++. It would simply have a this pointer that it just wouldn't use, since C++ has no way to express non-static member functions that don't take a this pointer.

> thus can tell the linker to have calls there go to the function

In our case the linker can only resolve the call to the appropriate vtable, since the type isn't known until runtime.

ryao 8 hours ago

> Yes. Since we are not in C++ we can choose to get rid of this useless pointer.

If you were trying to implement OOP in the kernel in C and implemented a vtable, you could not get rid of the this pointer in vtable entries, since a child class might want to use it in its overriding definition. That is one of the reasons why you cannot remove it in C++ either. The entire point of a vtable is to enable inheritance. If OOP really were being done, an out-of-tree module could make a class that inherits from this one without needing any code changes and use the this pointer, but you cannot do that if you drop the this pointer. I already explained this.

1718627440 7 hours ago

This is one interpretation. The other is that the interface of check_flags() specifies that any implementation of it is only allowed to differ based on the type of the object and not on any other property.

With the arguments chosen in the superclass, you already prescribe which things the child implementation can depend on. Why not also do this with the first argument?

ryao 5 hours ago

You would typically put the this pointer into the first argument when doing OOP in C. You can put the this pointer in the last argument to have it work too. However, you cannot omit it entirely. That is something that is not OOP. It is an ADT.

1718627440 2 days ago

> My point is that this pattern is not object oriented programming.

Isn't this exactly how most (every?) OOP language implements it? You would say a C++ virtual method isn't OOP?

ryao 2 days ago

Member function pointers and member functions in C++ are two different things. Member function pointers are not OOP. They are data abstraction.

The entire point of OOP is to make contracts with the compiler that forcibly tie certain things together that are not tied together with data abstraction. Member functions are subject to inheritance and polymorphism. Member function pointers are not. Changing the type of your class will never magically change the contents of a member function pointer, but it will change the contents of a non-virtual member function. A member function will have a this pointer to refer to the class. A member function pointer does not unless you explicitly add one (named something other than this in C++).

1718627440 2 days ago

Yeah, but the compiler implements these by adding vtables, propagating vtable values across inheritance hierarchies, and adding another parameter.

You claim when the compiler does this, it's OOP, but when I do it, it's not?

dragonwriter 2 days ago

If you do it, it can still be OOP; it's just not in an OO language. People have trouble separating using a paradigm from using a language focused on the paradigm, for some reason.

ryao 2 days ago

The entire point of OOP in every OOP language that I have ever used has been to have the language try to constrain what you can do by pushing restrictions on syntactic sugar involving objects, inheritance and encapsulation, so I would say yes. The marketing claims that people will be more productive at programming by using these.

1718627440 2 days ago

Yes, you need to have that to have an OOP language. OOP is object-oriented _Programming_, it's about how you program, not what features the language has.

ryao 2 days ago [-]

In hindsight, I had your remark confused with another remark insisting that struct inode_operations is a vtable, despite it having what would be static member functions in C++, which are never in vtables, and there being no inheritance hierarchy. If you are disciplined enough to do what you said, then I could see that as being OOP, but the context here is of something that is not OOP and only happens to overlap with it. The article mentions file_operations, but ignores that it has what would be a static member function in C++ in the form of ->check_flags(), which is never in a vtable.

1718627440 2 days ago [-]

I'm also thinking that these kinds of vtables in the Linux kernel are what a C++ compiler would implement. But because they're self-written, you can be much more creative and do things that wouldn't be possible if they were generated by a compiler.

Of course you could implement the same in C++, but then it can't be the same as the vtable introduced by the compiler, so you would just end up with two vtables: your own and the one introduced by the compiler.

ryao 20 hours ago [-]

If the kernel were written in C++, it would still be done the way it is done now. C++ does not allow unimplemented member functions and the ADTs currently used do. You can emulate that with multiple inheritance, but it is an inferior way of doing this.

As I said, these are NOT vtables. The fact that you and some others keep thinking of them as vtables misleads you into thinking that this can be done using the object oriented tools of C++. It cannot without major hacks and the result would be slower, harder to read and only something that a bureaucrat could like.

1718627440 8 hours ago [-]

If the kernel were written in C++, it would simply have an incentive to be less creative. Since it isn't, it can be. That's just a restriction imposed by C++, not a restriction of the loosely defined OOP paradigm.

> As I said, these are NOT vtables

Ok, you just define vtables differently than I do. To me, a vtable is a table of virtual functions that are used to implement polymorphic behaviour of objects. This applies to their usage in the kernel and in the article. Feel free to introduce a new term for this. If your only distinction is whether these are created by a compiler, that is just a distinction I don't care about.

ryao 5 hours ago [-]

The article author is wrong. It happens. Draw a Venn diagram with two partially overlapping circles. You and the author are looking at the overlap and concluding the two are the same. They are not, given the stuff outside the overlap.

As for the one distinction you recognize and think is invalid, that distinction is given by the definition you found. You refuse to obey the definition you yourself quoted to settle the matter elsewhere in the thread.

1718627440 2 days ago [-]

> This technique predates object oriented programming.

I would rather say that OOP is a formalization of preexisting patterns and paradigms.

ryao 2 days ago [-]

OOP cannot be a formalization of what predated it because the predating patterns support things that OOP explicitly disallows, like instantiation with unimplemented functions. That is extremely useful when you want to implement an optional function, or mutually exclusive functions such that you pick which is optional. This should be the case in the Linux VFS with ->read() and ->read_iter(). Also, ADTs were formalized after OOP, despite existing prior to it in Lisp.

For full disclosure, I have never verified that leaving ->read() unimplemented when ->read_iter() is implemented is safe, but I have seen enough examples of code that I strongly suspect it is and if it is not, it is probably a bug.
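To make the NULL-check idiom concrete, here is a minimal C sketch with two mutually exclusive read hooks. Everything in it (`struct demo_file_ops`, `demo_read()`, the `-EINVAL` fallback) is hypothetical and is not the kernel's actual `struct file_operations` or VFS code; it only shows an operations table that may leave either slot NULL and a dispatcher that checks which one is present.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical object and operations table; not the kernel's definitions. */
struct demo_file;

struct demo_file_ops {
    /* Either hook may be left NULL; normally at most one is set. */
    ssize_t (*read)(struct demo_file *f, char *buf, size_t len);
    ssize_t (*read_iter)(struct demo_file *f, char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops;
    const char *data;
};

/* Dispatcher: prefer ->read, fall back to ->read_iter, else report an error. */
static ssize_t demo_read(struct demo_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    if (f->ops->read)
        return f->ops->read(f, buf, len);
    if (f->ops->read_iter)
        return f->ops->read_iter(f, buf, len);
    return -EINVAL; /* neither hook implemented: no code to run */
}

/* A backend that only implements the "iter" variant. */
static ssize_t iter_only_read(struct demo_file *f, char *buf, size_t len)
{
    size_t n = strlen(f->data);
    if (n > len)
        n = len;
    memcpy(buf, f->data, n);
    return (ssize_t)n;
}

static const struct demo_file_ops iter_only_ops = {
    .read_iter = iter_only_read, /* .read is implicitly NULL */
};

static const struct demo_file_ops no_read_ops = { .read = NULL, .read_iter = NULL };
```

Instantiating an object whose table has empty slots is perfectly legal here; the only obligation is that every caller goes through a dispatcher that performs the NULL check.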

p_l 2 days ago [-]

OOP does not disallow instantiation with unimplemented functions, it's just an artefact of implementation in some languages.

kazinator 2 days ago [-]

OOP only disallows inheritance with unimplemented functions when it's a contract violation.

So that is to say, if the base class has a certain function which must be implemented and must provide certain behaviors, then the derived class must implement that function and provide all those behaviors.

The POSIX functions like read and write do not have a contract which says that all implementations of them must successfully transfer data. Being unimplemented (e.g. returning -1 with errno EOPNOTSUPP or whatever) is allowed in the contract.

OOP just wants a derived thing to obey the contract of the abstraction it is inheriting, so if you want certain liberties, you have to push them into the contract.

ryao 2 days ago [-]

I would call returning something being implemented as a stub rather than being unimplemented. When something is unimplemented and you try to call it, you crash due to a NULL/invalid pointer dereference, not get an error back. Of course, as far as getting things done is concerned, the two might as well be the same, but as far as how the language works, the two are different.

kazinator 2 days ago [-]

That's just it; in the POSIX world, the library functions like read and write are just a facade. They call something in a kernel, and that something examines the file descriptor and dispatches a lower-level function. It's possible that that is literally unimplemented: as in a null function pointer in some structure. The facade converts that to a -1 return with an appropriate `errno` like EOPNOTSUPP (operation not supported).
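A minimal sketch of such a facade, assuming a hypothetical per-descriptor operations table (`struct demo_ops`, `struct demo_fd` and `demo_write()` are made-up names, not a real kernel or libc API). A NULL slot is turned into the POSIX-style -1 return with `errno` set to EOPNOTSUPP; which errno a real system reports for a given unsupported operation varies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-descriptor operations; a NULL slot means "not implemented". */
struct demo_ops {
    ssize_t (*write)(void *priv, const void *buf, size_t len);
};

struct demo_fd {
    const struct demo_ops *ops;
    void *priv;
};

/* Facade in the style of POSIX write(): -1 plus errno on failure. */
static ssize_t demo_write(struct demo_fd *fd, const void *buf, size_t len)
{
    assert(fd != NULL && fd->ops != NULL);
    if (fd->ops->write == NULL) {
        errno = EOPNOTSUPP; /* literally unimplemented: nothing to call */
        return -1;
    }
    return fd->ops->write(fd->priv, buf, len);
}

/* A backend that only counts the bytes "written" into a size_t. */
static ssize_t counting_write(void *priv, const void *buf, size_t len)
{
    (void)buf;
    *(size_t *)priv += len;
    return (ssize_t)len;
}

static const struct demo_ops counting_ops = { .write = counting_write };
static const struct demo_ops readonly_ops = { .write = NULL };
```

From the caller's point of view, a NULL slot and a stub returning the same error are indistinguishable; the difference only exists on the implementation side.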

p_l 2 days ago [-]

Crashing is optional, depending on the error model of the language. C has a pitiful error model, thus you'll usually end up jumping to 0... but I recall at least one C environment where that gave you an error back instead of a crash.

As far as OOP is concerned, lack of implementation is not an issue in instantiating something - an object will just not understand the message, possibly catastrophically.

ryao 2 days ago [-]

I was referring to POSIX functions when talking about stubs versus unimplemented functions. Messaging in a programming language is a different animal; one that I have yet to fully understand. Objective-C, for example, resolves messages to function calls, so they look like function calls to me. I would love to know how they differ, but I have never dug into it.

p_l 2 days ago [-]

Semantically, most OOP models[1] (even C++) involve "messages" that are "sent" to an object. "Methods" are "how do I handle this kind of message".

This usually resolves to a function call because it's the easiest and most sensible way to do it.

Objective-C is more explicit about it due to Smalltalk heritage. Some languages model objects as functions (closures) that one calls with the "message" (an old trick in various FP languages is to implement a trivial object system with closures for local state).

[1] Arguably CLOS with Generic Functions can be seen as an outlier, because the operation becomes the centerpiece, not the object.
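C has no messages, but the idea can be approximated by giving each object a single handler that receives a message selector and may decline it. A minimal sketch with hypothetical names (`enum msg`, `obj_send()`, `counter_handle()`): an unsupported message yields `false` instead of a call through a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message selectors. */
enum msg { MSG_INCREMENT, MSG_GET, MSG_RESET };

struct object;
/* Each "class" supplies one handler; it returns false for messages it does not understand. */
typedef bool (*handler_fn)(struct object *self, enum msg m, int *result);

struct object {
    handler_fn handle;
    int state;
};

/* Send a message; returns false if the receiver does not understand it. */
static bool obj_send(struct object *self, enum msg m, int *result)
{
    assert(self != NULL && self->handle != NULL);
    return self->handle(self, m, result);
}

/* A counter understands INCREMENT and GET, but not RESET. */
static bool counter_handle(struct object *self, enum msg m, int *result)
{
    switch (m) {
    case MSG_INCREMENT:
        self->state++;
        return true;
    case MSG_GET:
        if (result)
            *result = self->state;
        return true;
    default:
        return false; /* "does not understand" */
    }
}
```

This trades the static checking of a struct of typed function pointers for runtime flexibility, which is roughly the trade-off between Smalltalk-style messaging and vtable dispatch.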

1718627440 2 days ago [-]

Having a stub or having a NULL pointer are two ways to leave something unimplemented. Which you use is an implementation detail. You can also use some other sigil instead of NULL.

ryao 19 hours ago [-]

A stub is an empty implementation. An empty implementation is different from no implementation, where there is no code to execute. It is the difference between writing 0 and writing nothing.

1718627440 8 hours ago [-]

If you care about this distinction, yes. However, the LWN post describes how both can be used to implement the same behaviour, and that there is no defined special implementation.

ryao 5 hours ago [-]

I have no idea what LWN post you mean. You can make stubs trap by doing a NULL function pointer dereference, but that does not make NULL function pointer dereferences the same as stubs in general.

1718627440 2 days ago [-]

That sounds like a case of the Liskov substitution principle.

kazinator 2 days ago [-]

Yes it is, but the point is that without the contract being defined, we can't apply the principle; the details of the interface contract determine what it means to be substitutable.

The LSP is never absolute in a practical system. Because why would you have, say, in a case of inheritance, a Y which is a new kind of X, if all it did was substitute for X and behave exactly the same way? Just use X and don't create Y; why create a substitute that is perfectly identical?

If there is a reason to introduce a new type Y which can be used where X is currently used, then it means they are not substitutable. The new situations need a Y and not X, and some situations don't need a Y and stay with X.

In a graphics program, an ellipse and rectangle are substitutable in that they have a draw() method and others, so that they plug into the framework.

But they are not substitutable in the user's design; where the user needs an ellipse, a rectangle won't do, and vice versa.

In that case the contract only cares about the mechanics of the program: making multiple shapes available in a uniform way, with a program organization that makes it easy to code a new shape.

The substitution principle serves the program organization, and not much more.

So with the above discussion in place, we can make an analogy: an ellipse object in a vector graphics program can easily be regarded as a version of a rectangle with unimplemented corners.

The user not only doesn't mind that the corners are not implemented, but doesn't want them because they wouldn't make sense.

In the same way, it doesn't make sense to have lseek() on a pipe, or accept() on TTY descriptor or write() on a file which is read only to the caller, etc.
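lseek() on a pipe is a concrete instance: POSIX specifies that it fails with `errno` set to ESPIPE, so the "missing" operation is part of the documented contract rather than a crash. A small sketch that demonstrates it; the helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Seek on a freshly created pipe and return the errno observed
 * (0 if lseek unexpectedly succeeded, or pipe()'s errno if that failed). */
static int seek_on_pipe_errno(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return errno;
    errno = 0;
    off_t r = lseek(fds[0], 0, SEEK_SET);
    int err = (r == (off_t)-1) ? errno : 0; /* save before close() can touch errno */
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Usage: the failure is a documented part of lseek()'s contract. */
static void expect_espipe_on_pipe(void)
{
    assert(seek_on_pipe_errno() == ESPIPE);
}
```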

1718627440 1 days ago [-]

> if all it did was substitute for X, and behave exactly the same way?

LSP is about behaviour existing in the supertype. Adding behaviour doesn't violate LSP.

> In a graphics program, an ellipse and rectangle are substitutable in that they have a draw() method and others, so that they plug into the framework.

The behaviour in question means it draws something. It can draw something different every time, and not violate LSP here.

1718627440 2 days ago [-]

> like instantiation with unimplemented functions

I think this is more of an effect of C distinguishing between allocating memory (aka object creation) and initialization, which other languages disallow for other reasons, not because they are not OOPy enough.

ryao 2 days ago [-]

Unlike OOLs, C does not enforce contracts around structures (objects). You are free to do literally anything you want to them.

pjmlp 1 days ago [-]

Including implementing Abstract Data Types, thus everything on the structs is only accessible via functions that enforce the contracts.

mistrial9 2 days ago [-]

The concept of an abstract data type was a real idea back in the days of compiler design. You might as well say "compiler design predates object oriented programming". The technique described in the lead is used to implement object-oriented programming structures, just as it says. So are lots of compiler design features under the hood.

Source: I wrote a windowing framework for Mac OS using this pattern and others, in C with Metrowerks at the time.

ryao 2 days ago [-]

Compiler design does predate object oriented programming. The first compiler was made by John Backus et al. at IBM in April 1957.

As for abstract data types, they originated in Lisp, which also predates object oriented programming.

pjmlp 2 days ago [-]

Actually, no.

"AN ALGORITHMIC THEORY OF LANGUAGE", 1962

https://apps.dtic.mil/sti/tr/pdf/AD0296998.pdf

In this paper they are known as plexes, eventually ML and CLU will show similar approaches as well.

Only much later would Lisps evolve beyond plain lists and cons cells.

ryao 2 days ago [-]

You caused me to do some digging. That publication is dated November 1962. The Lisp 1.5 manual’s preface is dated August 17, 1962, which is even older. It describes lambdas and property lists, which seem like they can be used to implement ADTs, although I do not have a Lisp 1.5 interpreter since those are obsolete, so I cannot verify that. Computer history articles claim that Simula, the first object oriented language, was born in May 1962, but was not actually operational until January 1965:

https://history-computer.com/software/simula-guide/

Thus, while I had thought Lisp had ADT concepts before the first OOL existed, now I am not sure. When I said they originated in Lisp, I meant that Lisp was the first language to have them. That the concept had been described outside of an actual language is tangential to what I intended to say, but it is news to me. Thanks for the link.

hiker 1 days ago [-]

Plexes are first mentioned in 1960

https://dl.acm.org/doi/pdf/10.1145/366199.366256

and the paper even starts with a critique of the efficiency of Lisp's approach for representing data with cons pairs (citing McCarthy's paper from the same year).

You might also want to watch Casey's great talk on the history of OOP

https://www.youtube.com/watch?v=wo84LFzx5nI

kerblang 2 days ago [-]

You can do exactly what was done in C with most OOP languages like Java & C# because you have lambdas now, and lambdas are just function pointers. You can literally assign them to instance variables (or static variables).

(sorry it took more than a decade for Java to catch up and Sun Microsystems originally sued Microsoft for trying to add lambdas to java way back when, and even wrote a white paper insisting that anonymous inner classes are a perfectly good substitute - stop laughing)

yndoendo 2 days ago [-]

Inheritance is not needed when a composite pattern can be used.

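    interface ITask { void DoIt(); }

    class DefaultTask : ITask { public void DoIt() { /* default behavior */ } }

    class SpecialTask : ITask { public void DoIt() { /* specialized behavior */ } }

    class UsedItem {
        private readonly ITask _task;                 // composed, not inherited

        public UsedItem() { _task = new SpecialTask(); }

        public void DoIt() { _task.DoIt(); }          // delegate to the contained task
    }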

Is Python an OOP language? The self/this/object pointer has to be passed, similar to C-style object orientation / data abstraction.

1718627440 2 days ago [-]

The interesting thing is that in OOP implementations, inheritance IS composition of vtables and data. It's really only syntactic sugar, and not always unambiguous sugar.

zozbot234 2 days ago [-]

This is not quite correct. OOP implementation inheritance involves a kind of "open recursion" (that is, calls to base-class methods can end up dispatching to implementations from a derived class) that is not replicated with pure composition. All method calls, including method calls that originate from code in some class anywhere in the hierarchy, ultimately dispatch through the vtable of whatever object they're called on.
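A minimal C sketch of the open recursion described here, assuming hypothetical `shape`/`square` types. The "derived" struct embeds the "base" as its first member, and a base-level helper calls through `self->ops`, so the square's override is reached even though the helper knows nothing about squares. With plain composition, where an outer object merely holds an inner one and forwards calls, code inside the inner object would only ever dispatch through its own table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "base class": data plus a pointer to its operations table. */
struct shape;

struct shape_ops {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_ops *ops;
};

/* Default "base" implementation. */
static double base_area(const struct shape *self) { (void)self; return 0.0; }
static const struct shape_ops base_shape_ops = { .area = base_area };

/* Base-level helper: it calls area() through self->ops, so an override in a
 * "derived" table is picked up here too. This is the open recursion. */
static double shape_double_area(const struct shape *self)
{
    assert(self != NULL && self->ops != NULL);
    return 2.0 * self->ops->area(self);
}

/* "Derived class": embeds the base as its first member. */
struct square {
    struct shape base;
    double side;
};

/* container_of-style downcast from the embedded base back to the square. */
#define SQUARE_OF(s) \
    ((const struct square *)((const char *)(s) - offsetof(struct square, base)))

/* Override of area() for squares. */
static double square_area(const struct shape *self)
{
    const struct square *sq = SQUARE_OF(self);
    return sq->side * sq->side;
}

static const struct shape_ops square_ops = { .area = square_area };
```

Passing `&sq.base` to `shape_double_area()` is the upcast; `SQUARE_OF()` is the downcast, in the same style as the kernel's `container_of()`.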

1718627440 2 days ago [-]

But that's exactly the same you would need to implement manually when you use composition. When constructing, you also need to construct the contained objects, when doing something that should affect a contained object, you need to dispatch it.

When a method is never overridden, it doesn't need to be in the vtable.

zozbot234 2 days ago [-]

That's not what people usually mean by "composition" though. The whole point of "use composition over inheritance" is to avoid that behavior.

1718627440 2 days ago [-]

I don't see how you can use composition without eventually calling methods of the contained objects. Every method of the outer object either uses an inner object or it doesn't. Yes, using the inner object doesn't always mean just delegating to a single call. You might implement the outer method by calling an inner object's method multiple times, or by calling different methods, but nothing stops you from doing the same thing with a superclass. When you don't call an inner object, it's the same as adding another method to the subclass that isn't present in the parent class.

I think composition over inheritance is only about being explicit. That's it.

maleldil 2 days ago [-]

Python doesn't require self to be passed. You need it in method definitions, but not calls.

1718627440 2 days ago [-]

But you can do it. Actually, you can call instance methods of other classes and change the class of an instance in Python like in C, but this dynamism is probably what makes it slow. Also, doing that will make the program quite complicated, more so than in C, since Python also abstracts over this.

pakl 2 days ago [-]

A few years ago Peterpaul developed a lightweight object-oriented system on top of C that was really pleasant to use[0].

No need to pass in the object explicitly, etc.

Doesn't have the greatest documentation, but has a full test suite (e.g., [1][2]).

[0] https://github.com/peterpaul/co2

[1] https://github.com/peterpaul/co2/blob/master/carbon/test/pas...

[2] https://github.com/peterpaul/co2/blob/master/carbon/test/pas...

guerrilla 2 days ago [-]

For people wondering what it looks like without the syntactic sugar of carbon then look here [0]. As far as I can see, there's no support for parametric polymorphism.

0. https://github.com/peterpaul/co2/tree/master/examples/my-obj...

1718627440 2 days ago [-]

Doesn't look much different from GLib, the base for the GTK implementation (and for other things in GNOME, the GNU Network _Object_ Model Environment).

cryptonector 23 hours ago [-]

Objects yes, classes and inheritance no. Just interfaces please.

saagarjha 2 days ago [-]

I feel like Vala tries to fit in this niche too.

1718627440 2 days ago [-]

> Having to pass the object explicitly every time feels clunky, especially compared to C++ where this is implicit.

I personally don't like implicit this. You are very much passing a this instance around, as opposed to calling a class method. Also, an explicit this eliminates the problem that you don't know whether a variable is an instance variable or a global/from somewhere else.

MontyCarloHall 2 days ago [-]

Agreed, one of the biggest design mistakes in the OOP syntax of C++ (and Java, for that matter) was not making `this` mandatory when referring to instance members.

cjfd 1 days ago [-]

Mandatory this can also be a major hit in readability. What if you have a class that implements the abc-formula (the quadratic formula)? You get

  (- this->b + sqrt(this->b * this->b - 4 * this->a * this->c))/(2 * this->a)

and

  (- this->b - sqrt(this->b * this->b - 4 * this->a * this->c))/(2 * this->a)

This is a readability problem for any class that is used to do computations.

chuckadams 2 days ago [-]

C++ and Java went for the "objects as static closures" route, where it doesn't make any sense to have a `this`. Or, they made them superficially look like static closures, which in hindsight was probably not the best idea. Anyway, Java lets you use explicit `this`, I don't recall whether C++ makes it into a footgun or not.

MontyCarloHall 2 days ago [-]

Both languages let you use explicit `this` but don’t mandate it. The “static closure” approach is great. I don’t like having to explicitly pass `this` as a parameter to every method call as in the OP (or worse, the confusing Python approach of forcing `self` to be explicitly written in every non-static method signature but having it be implicitly passed during method calls).

What I don’t like is being able to reference instance members without `this`, e.g.

   void foo() {
      int x = bar + 1; // should be illegal: it can be hard to distinguish if `bar` is a local variable versus an instance member
      int y = this->bar + 1; // disambiguation is good
   }

josefx 2 days ago [-]

> int x = bar + 1; // should be illegal: it can be hard to distinguish if `bar` is a local variable versus an instance member

If it was this->bar it could be a member, it could also be a static variable. A bar on its own could be local or it could be in any of the enclosing scopes or namespaces. Forcing "this" to be explicit doesn't make the code any clearer on its own.

ryao 2 days ago [-]

The guy was referring to the explicit case where bar is a member variable. The cases where it is in the local scope under scoping rules or the global scope are not really an issue, since you can check the function definition to find if it is local. In the case that it is not in the function definition, then it is in the global scope in C. If implicit this were not done in C++, that would also be the case in C++, provided you do the sane thing and use the std namespace for everything. Just thinking about the headaches namespaces could cause when looking for such definitions with cscope gives me yet another reason to stay away from C++ whenever possible.

cherryteastain 2 days ago [-]

this in C++ is just a regular pointer, it has no special footguns, just the typical ones you have with pointers in general

jcelerier 2 days ago [-]

that's not really true - unlike a regular pointer, `this` is not allowed to be null, thus removing `if(this == nullptr)` is always a valid optimization to do.

cherryteastain 2 days ago [-]

It absolutely is allowed to be null:

    #include <iostream>
    struct Foo {
      void bar() {
        std::cout << this << std::endl;
      }
    };
    
    int main() {
      Foo *p = nullptr;
      p->bar();
    }

will print 0

Krssst 2 days ago [-]

This is undefined behavior in my understanding, it just happens to work until it doesn't.

I wouldn't be surprised if any null check against this would be erased by the optimizer for example as the parent comment mentioned. Sanitizers might check for null this too.

steveklabnik 2 days ago [-]

Just to cite what the others have told you: https://godbolt.org/z/bWfaYrqoY

    /app/example.cpp:10:6: runtime error: member call on null pointer of type 'Foo'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:10:6 
    /app/example.cpp:3:8: runtime error: member call on null pointer of type 'Foo *'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /app/example.cpp:3:8

usefulcat 2 days ago [-]

    Foo *p = nullptr;
    p->bar();

That's undefined behavior. It's "allowed" in the sense of "yes it's possible to write, compile and run that code", but the language makes no guarantees about what the results will be. So maybe not the most useful definition of "allowed".

jcelerier 1 days ago [-]

That code is outside the set of valid c++ programs


hhdknddkkjd 2 days ago [-]

Only in so much that this being null is UB, but in the real world it very much can be null and programs will sometimes crash deep inside methods called on nullptr.

ryao 2 days ago [-]

Supporting an explicit this is required to be able to access the member variable when a local variable hides it under scoping rules. As another person replied, C++ does indeed have an implicit this.

eps 1 days ago [-]

You must be joking.

That would be a complete readability disaster... at least for C++. Java peeps probably won't even flinch ;)

loeg 2 days ago [-]

I think the author is talking about this:

  object->ops->start(object)

Where not only is it explicit, but you need to specify the object twice (once to resolve the vtable, and a second time to pass the object to the stateless C method implementation).

ryao 2 days ago [-]

What guarantee do you have that ->ops is a vtable? It could contain function pointers that don’t take an implicit this argument like struct file_operations in Linux does. It could also contain variables that are non-pointers. Neither is allowed in a vtable, but both are fine to do in C.

1718627440 2 days ago [-]

As it isn't the compiler that creates the vtable, you can also have the equivalent of this as the last parameter or where you want it to be.

> It could also contain variables that are non-pointers.

The convention of it being a pure vtable is that it just doesn't.

> Neither is allowed in a vtable

Who is the vtable membership authority? :-)

ryao 1 days ago [-]

You can take the API for a linked list and implement a balanced binary search tree behind it, but continuing to call it a linked list after doing that is wrong. Similarly, you can do pointer indirections the same way a vtable would do them, but if the things you get are not the equivalent of member function pointers, it is not a vtable.

1718627440 1 days ago [-]

> In computer programming, a virtual method table (VMT), virtual function table, virtual call table, dispatch table, vtable, or vftable is a mechanism used in a programming language to support dynamic dispatch (or run-time method binding). (Wikipedia)

When its a table of function pointers used for dynamic dispatch, to me it's a vtable. I don't care about their type signatures as long as they logically belong to the object in question.

You seem to have a different very narrow definition of vtables, so the discussion is kind of useless.

ryao 23 hours ago [-]

Read the definition again. The programming language here is not the one using this. It is the programmer using it.

1718627440 21 hours ago [-]

Which means that it's not the language doing OOP, but the programmer.

ryao 20 hours ago [-]

The definition you quoted requires the language to be the one doing it for it to be a vtable. If the programmer is doing it, then it is not a vtable by that definition.

1718627440 10 hours ago [-]

Ok, we can name it a vtable if it's done by the language and a wtable if it is done by the programmer. I don't care about this distinction; the mechanism is the same, the effects are the same, the implemented theory (OOP) is the same. Heck, even the emitted code is the same. I guess the vtable implementations in a C++ compiler are now called wtables.

ryao 8 hours ago [-]

It is not the same theory, since it is an ADT, not anything object oriented. Different theories can and do overlap. The emitted code is also not the same, since there is no implicit this pointer. You are the one who posted a definition you found and then claimed that it agreed with you, when it did not and now are reneging on your use of it. You do not understand this topic as well as you think you do and this is a silly hill to die on.

1718627440 8 hours ago [-]

> You are the one who posted a definition you found and then claimed that it agreed with you

X is Y used in Z. Does that mean X is not Y when it is not used in Z? A knife is a sharp object used to cut meat. What do I call the thing I use to cut fish?

Yes, I see how you can parse the definition your way; I didn't think about that before introducing it.

> since it is an ADT, not anything object oriented

Yes, they have a large overlap. To my understanding, the difference is that OOP has inheritance, while an ADT can be provided by the big god object. The latter I would refrain from calling OOP, although it can be argued it still is.

I think our real discussion here is, whether a function entry that doesn't take an argument of the object type precludes it being a vtable. To me it is as long as this function is supposed to be a method of a single object and not for all objects in general, i.e. a global function.

ryao 6 hours ago [-]

The point of a vtable is to implement virtual member functions that support inheritance and polymorphism without changing the base class implementations no matter what the child classes do. If you omit the this pointer, you cannot do inheritance that uses the this pointer in an override without changing the implementation of the parent to add the this pointer back, which would be like a tail wagging a dog. Thus, it is not a vtable. It is something similar, but different.

1718627440 3 hours ago [-]

> The point of a vtable is to implement virtual member functions that support inheritance and polymorphism without changing the base class implementations no matter what the child classes do

I agree.

> If you omit the this pointer, you cannot do inheritance that uses the this pointer in an override

Yes, but:

> If you omit the this pointer, you cannot do inheritance

You can absolutely do inheritance when the child implementation doesn't need a this pointer. You need the this pointer to read/write values of the instance not to know which types it is.

Let me give you an example:

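    class Vehicle {
    public:
        virtual ~Vehicle() = default;
        virtual unsigned int get_number_of_wheels() = 0;
    };

    class Car : public Vehicle {
    public:
        unsigned int get_number_of_wheels() override;
    };

    class Bicycle : public Vehicle {
    public:
        unsigned int get_number_of_wheels() override;
    };

    unsigned int Car::get_number_of_wheels() { return 4; }
    unsigned int Bicycle::get_number_of_wheels() { return 2; }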

This is clearly OOP: there is inheritance, a base class, and a vtable introduced by the compiler. C++ will still pass a this pointer here for consistency, but it absolutely isn't needed.

This is exactly what is going on with check_flags(). The signature is inherited from a (virtual) base class. Part of that interface contract is that the allowed values depend only on the file object's type and not on its values. You choose the implementation at object construction; if it needed a this pointer, that would mean the values returned could change during the lifetime of the object.

    The  following  commands manipulate the flags associated with a file descriptor. [...]
    F_SETFD (int)
              Set the file descriptor flags to the value specified by arg.

Why should the set of allowed flags change over the lifetime of a file descriptor? That is what the kernel prevents here in its internal interface by refusing to provide a this pointer to a child implementation.
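A C rendering of the Vehicle example, as a minimal sketch with hypothetical names. The `wheels()` and `check_flags()` slots take no self pointer, in the same spirit as `->check_flags(int)` in `struct file_operations`; the object only selects which table is used. Whether such a table should be called a vtable or an ADT's operations table is exactly the disagreement in this subthread; the sketch just shows the mechanics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table: neither slot takes a self pointer, because the
 * answers depend only on which table an object uses, not on instance data. */
struct vehicle_ops {
    unsigned int (*wheels)(void);
    int (*check_flags)(int flags); /* 0 if allowed, -1 otherwise */
};

struct vehicle {
    const struct vehicle_ops *ops;
    const char *name; /* per-instance data that the ops above never see */
};

/* Hypothetical flag bits. */
enum { VF_TRAILER = 1 << 0, VF_BASKET = 1 << 1 };

static unsigned int car_wheels(void) { return 4; }
static unsigned int bicycle_wheels(void) { return 2; }

static int car_check_flags(int flags) { return (flags & ~VF_TRAILER) ? -1 : 0; }
static int bicycle_check_flags(int flags) { return (flags & ~VF_BASKET) ? -1 : 0; }

static const struct vehicle_ops car_ops = { car_wheels, car_check_flags };
static const struct vehicle_ops bicycle_ops = { bicycle_wheels, bicycle_check_flags };

static unsigned int vehicle_wheels(const struct vehicle *v)
{
    assert(v != NULL && v->ops != NULL);
    return v->ops->wheels(); /* the object selects the table; no self is passed */
}
```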

ginko 2 days ago [-]

Nothing a little macro magic couldn't fix..

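  /* Parenthesize args; ##__VA_ARGS__ (GNU) drops the comma when no extra args are passed. `object` is evaluated twice. */
  #define CALL(object, function, ...) ((object)->ops->function((object), ##__VA_ARGS__))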
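A self-contained sketch of such a macro in use, assuming a hypothetical `struct device` with an `ops` table. It parenthesizes the arguments and relies on the `##__VA_ARGS__` comma-swallowing extension (supported by GCC and Clang), so calls with no extra arguments also expand correctly. `object` is still evaluated twice, so it should not have side effects.

```c
#include <assert.h>
#include <stddef.h>

/* ##__VA_ARGS__ is a GNU extension: the comma disappears when no extra args are given. */
#define CALL(object, function, ...) \
    ((object)->ops->function((object), ##__VA_ARGS__))

/* Hypothetical device with an operations table. */
struct device;

struct device_ops {
    int (*start)(struct device *self);
    int (*set_speed)(struct device *self, int speed);
};

struct device {
    const struct device_ops *ops;
    int running;
    int speed;
};

static int fan_start(struct device *self)
{
    assert(self != NULL);
    self->running = 1;
    return 0;
}

static int fan_set_speed(struct device *self, int speed)
{
    if (!self->running)
        return -1; /* must be started first */
    self->speed = speed;
    return 0;
}

static const struct device_ops fan_ops = { fan_start, fan_set_speed };
```

`CALL(&fan, start)` expands to `((&fan)->ops->start((&fan)))`, and `CALL(&fan, set_speed, 3)` to `((&fan)->ops->set_speed((&fan), 3))`.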

ActorNightly 2 days ago [-]

You can also just replicate a vtable of sorts in C that keeps track of things when new objects are created.

loeg 2 days ago [-]

You can just use C++ instead of reinventing a worse version of C++98!

ryao 2 days ago [-]

It would be an improvement on C++ since you would avoid the horrendous error messages from ridiculously long types, slow compile times, exception handling nightmares, bloat from RTTI and constant worry that operators might not mean what you expect. That is before even mentioning entirely new classes of problems like the diamond problem. Less is more.

That said, similar macro magic is used in C generic data structures and it works very well.

1718627440 2 days ago [-]

Yes, I know. From the caller's side that might seem redundant; my argument was about the callee's side. Also, it is not truly redundant, as you can write:

    object1->op->start(object2)
    superclass->op->start(object)

loeg 2 days ago [-]

I think both of these invocations are invalid. Using object1's vtable methods on object2, obviously, but in the latter case: the vtable method should just point at the superclass impl, if not overridden. And if overridden and the child impl needs to call the superclass, it can just do so without dispatching through some vtable.

1718627440 2 days ago [-]

When you think of vtables as unique to or owned by an object, then these examples seem weird to you. When you think of them as orthogonal to your types/objects, these examples can be useful.

In the first example, object1 and object2 can very much be of the same type or of compatible types/subtypes/supertypes. Having vtables per object as opposed to per class indicates to me that it IS intended to modify the behaviour of an object by changing its vtable. Using the behaviour of another object of the same type to treat the second object seems valid to me.

In the second case, it's not about the child implementation dispatching to the superclass, it's about some external code wanting to treat it as an object of the supertype. It's what in other languages needs an upcast. And the supertype might also have dynamic behaviour, otherwise you of course wouldn't use a vtable.

loeg 2 days ago [-]

I think it is wrong/weird for objects of the same type to have different vtables, yes. I would call those different types.

Upcasting is fine, but generally speaking the expected behavior of invoking a superclass method on an object that is actually a subclass is that the subclass method implementation is used (in C++, this would be a virtual/override type method, as opposed to a static method). Invoking a superclass-specific method impl on a subclass object is kind of weird.

1718627440 2 days ago [-]

In most languages this is not possible, because they abstract over the implementation of classes. In C it is, so you can be more creative. You can, for example, use it instead of a flag for behaviour. Why branch on a variable and then call separate methods, when you can simply assign the wanted implementation directly? If you want to know which implementation/mode is used, comparing function pointers and comparing scalar variables amounts to the same thing. It is also an easy way to get a unique number. When all implementations can operate on the same type, they are interchangeable.

In C you can also change the "class" of an instance as needed, without special syntax. Maybe you need to already call a method of the new/old class, before/after actually changing the class type.

> is that the subclass method implementation is used

The entire point of invoking the superclass method is that the subclass has a different implementation and you want to use the superclass implementation.

zozbot234 2 days ago [-]

"Orthogonal" vtables are essentially traits/typeclasses.

1718627440 2 days ago [-]

What I really like about C is that it supports these sophisticated concepts without having explicit support for them. It just naturally emerges from the core concepts. This is what makes it feel like it just doesn't restrict the programmer much.

Maybe it's partly due to its evolution. It started with a language that was supposed to have every possible feature, which was too complicated to implement at the time. Then it was dumbed down to a really simple language. And then it evolved alongside a project, adding the features that are truly useful.

ryao 2 days ago [-]

You are wrong about those invocations being invalid. Such patterns happen in filesystem code fairly often. The best example off the top of my head is:

    error = old_dir->i_op->rename(rd->new_mnt_idmap, old_dir, old_dentry, new_dir, new_dentry, flags);

https://github.com/torvalds/linux/blob/master/fs/namei.c#L51...

That is a close match for the first example, with additional arguments.

It is helpful to remember that this is not object oriented programming and not try to shoehorn this into the paradigm of object oriented programming. This is data abstraction, which has similarities (and inspired OOP), but is subtly different. Data abstraction does not automatically imply any sort of inheritance. Thus you cannot treat things as necessarily having a subclass and superclass. If you must think of it in OOP terms, imagine that your superclass is an abstract class, with no implemented members, except you can instantiate a child class that is also abstract, and you will never do any inheritance on the so called child class.

Now, it is possible to implement things in such a way where they actually do have something that resembles a subclass and a superclass. This is often done in filesystem inode structures. The filesystem will have its own specialized inode structure where the generic VFS inode structure is the first member and thus you can cast safely from the generic inode structure to the specialized one. There is no need to cast in the other direction since you can access all of the generic inode structure’s members. This trick is useful when the VFS calls us via inode operations. We know that the inode pointer is really a pointer to our specialized inode structure, so we can safely cast to it to access the specialized fields. This is essentially `superclass->op->start(object)`, which was the second example.

Data abstraction is a really powerful technique and honestly, object oriented programming rarely does anything that makes me want it over data abstraction. The only thing that I have seen object oriented programming do better in practice than data abstraction is marketing. The second example is similar to C++’s curiously recurring template pattern, which adds boilerplate, plus fighting the compiler through absurdly long error messages caused by absurdly verbose types, to achieve a result that is often at best the same thing. On top of those headaches, all of the language complexity makes the compile times slow. Only marketing could convince someone that the C++ OOP way is better.

loeg 2 days ago [-]

I don't agree that your example pattern matches to the example I'm complaining about. vfs_rename() is using old_dir's vtable on old_dir. The vtable matches the object.

ryao 2 days ago [-]

It is not a vtable. It is a structure of function pointers called struct inode_operations. It is reused for all inodes in that filesystem. Because of that, if you get it from one callback, you can safely use it on another struct inode on the same filesystem, since nobody uses this like a vtable to implement an inheritance hierarchy. There are even functions in struct inode_operations that don’t require the inode structure to be passed, such as ->readlink, which is most unlike a vtable since static member functions are never in vtables:

https://www.kernel.org/doc/html/latest/filesystems/vfs.html

As I said previously, it is helpful to remember that this is not object oriented programming and not try to shoehorn this into the paradigm of object oriented programming. Calling this a vtable is wrong.

loeg 2 days ago [-]

Again, this just isn't responsive to my comments, which discuss the article. The article's author is clearly talking about OOP. You've invented some alternative article and are arguing about that instead; I'm not interested.

ryao 2 days ago [-]

You were not discussing the article in your previous reply. That said, I have repeatedly explained why both you and the article's author are wrong to describe that structure of function pointers as a vtable, by giving examples of it containing things that are not allowed in vtables (if, say, a C++ compiler did this to its vtables, it could cause the wrong function to be called when trying to call a static member function). You ignore that.

zozbot234 2 days ago [-]

The term "vtable" is not exclusive to C++. Trait function dictionaries in Rust are called vtables and behave as described here, there's not necessarily any 'this' or 'self' object.

ryao 1 days ago [-]

The point of the vtable is to allow dynamic dispatch based on the actual type of an object. When you have a function that does not need a this pointer, it no longer depends on the type of the object and putting it in there anyway could cause you to execute a variant depending on the type of the object, which seems like a buggy undesirable behavior.

1718627440 1 days ago [-]

It can still depend on the type, the answer just doesn't need information from the instance.

What speed limits can this road possibly have is a question I want to ask about this specific road. Yet this can be answered by referring to the country, which is already known when you create the road. But the user that asks this question can still ask this about roads in different countries, so the question is still valid.

Different objects can have different methods. When the method to be used is known at the time of object creation, it can be chosen by assigning the appropriate function pointer in the vtable. The method itself might not necessarily need an instance pointer, though.

ryao 23 hours ago [-]

Let’s say I have ClassA::Foobar() and ClassB::Foobar(), and ClassB inherits from ClassA. Now let’s say I want to use ClassA::Foobar(), and I access it from the object because its function pointer is in the vtable and the object is of type ClassB. Now ClassB::Foobar() is executed, which is wrong. This is why static functions are not put into vtables. What Linux is doing is not a vtable, even if it shares similarities. Perhaps you would understand this if I said all thumbs are fingers, but not all fingers are thumbs.

1718627440 10 hours ago [-]

That's a very good example.

When I invoke object->Foobar() I want to invoke the appropriate method for this object, from whatever class that might be. This is exactly what's happening in the kernel here.

When I actually intend to call the method from ClassA, I would either call something like object->base->Foobar() or ClassA->Foobar(object). Note how this is the very example that you are replying to: https://news.ycombinator.com/user?id=1718627440

ryao 8 hours ago [-]

You don’t implement calls to static member functions by putting them into a vtable, which is one of many reasons why this is not a vtable. The proper way to implement static functions is through static dispatch.

1718627440 7 hours ago [-]

> You don’t implement calls to static member functions by putting them into a vtable

Yes, you claimed it is like a static member function; I don't think it can be.

> The proper way to implement static functions is through static dispatch

Yes, but we are talking about dynamic dispatch here.

To quote your earlier comment:

> which is most unlike a vtable since static member functions are never in vtables

You conclude it isn't a vtable; I conclude it's not a static member function, because it uses dynamic dispatch.

I don't think we actually disagree on how and when to use static and dynamic dispatch.

ryao 6 hours ago [-]

If you leave out the this pointer, it is the equivalent of putting a function pointer to a global function (or a static member function) into the structure. This is not how OOP works.

Omitting the this pointer also breaks inheritance, since hypothetical child classes would not be able to override the definition while using the this pointer. Having to edit the parent class to be able to do that is not how OOP works.

This is not OOP nor is it intended to be.

spacechild1 2 days ago [-]

> Also explicit this eliminates the problem, that you don't know if the variable is an instance variable or a global/from somewhere else.

People typically use some kind of naming convention for their member variables, e.g. mFoo, m_Foo, m_foo, foo_, etc., so that's not an issue. I find `foo_` much more concise than `this->foo`. Also note that you can use explicit this in C++ if you really want to.

1718627440 2 days ago [-]

In code I write myself, I can know what variables mean. The feature loses its point when it's not mandatory. Also, being explicit allows you to be more expressive with variable names and ordering.

Galanwe 2 days ago [-]

I don't quite agree, especially because the implicit this not only saves you from explicitly typing it, but also, by having actual methods, you don't need to add the struct prefix to every function.

    mystruct_dosmth(s);
    mystruct_dosmthelse(s);

    s->dosmth();
    s->dosmthelse();

1718627440 2 days ago [-]

My problem with implicit this is more that you can access member variables without it being explicit, i.e. it's about the callee, not the caller.

For the function naming, nothing stops you from doing the same in C:

    static void dosmth (struct mystruct * s);

    s->dosmth = dosmth;

That doesn't stop you from mentioning s twice. While it is redundant in the common case, it isn't in every case like I wrote elsewhere. Also this is easily fixable as written several times here, by a macro, or by using the type directly.

Galanwe 2 days ago [-]

This is not the same: you introduced dynamic function resolution (i.e. a function pointer tied to a specific instance), while we are talking about static function resolution (purely based on the declared type).

1718627440 2 days ago [-]

True, if you don't trust the compiler to optimize that, then you must live with the C naming.

ActorNightly 2 days ago [-]

You can also get clever with macros.

Gibbon1 2 days ago [-]

The implicit this sounds to me like magic. Magic!

Ask how do I do this, well see it's magic. It just happens.

Something went wrong? That's also magic.

After 40 years I hate magic.

elteto 2 days ago [-]

...and C++ added explicit this parameters (deducing this) in C++23.

ryao 2 days ago [-]

“this” is a reserved keyword in C++, so you do not need to worry about it being a global variable.

That said, I like having a this pointer explicitly passed as it is in C with ADTs. The functions that do not need a this pointer never accidentally have it passed from the developer forgetting to mark the function static or not wanting to rewrite all of the function accesses to use the :: operator.

wmanley 2 days ago [-]

It’s not about ‘this’ being a global, it’s if you see ‘i++’ in code it’s not obvious if ‘i’ is a member or not without having to check context.

Kranar 2 days ago [-]

If you see "i++" in code and you don't have any context about what "i" is, then what difference does it make if "i" is a member variable, global variable, parameter, etc etc...

If all you see in code is a very tiny 3 character expression, you won't be able to make much of a judgement about it to begin with.

ryao 2 days ago [-]

Not allowing a variable to implicitly refer to a member variable makes it much easier to find. If it is not declared in the function and there is no implicit dereferencing of a this pointer, the variable is global. If the variable name is commonly used and it is a member variable, it is a nightmare to hunt for the correct declaration in the codebase with cscope.

ryao 2 days ago [-]

Good point. I had misunderstood the previous comment as suggesting that this be passed to the member function as an explicit argument, rather than requiring dereferences of this be explicit. The latter makes far more sense and I agree it makes reasoning about things much easier.

wosined 1 days ago [-]

Hi, I don't know much about this. But it seems to me that the OP is doing it differently than the kernel devs. If you read the article that the OP links, then you get the impression that the vtables contain typed function pointers, while OP uses void pointers. Also the main benefit mentioned in the kernel dev article is that you save memory, by not having multiple function pointers in each structure instance, but instead you have just one pointer to a vtable in each instance. Thus the main benefit is saving memory according to kernel dev, but OP uses this vtable as a form of indirection to implement runtime method swapping and polymorphism, which is not even mentioned in the kernel dev article. Thus, OP uses some other pattern than the one mentioned by kernel dev.

1718627440 1 days ago [-]

> while OP uses void pointers

OP doesn't use void pointers, he uses void. He writes about functions having no arguments and returning nothing for the same reason other blog posts name functions foo and bar.

> OP uses this vtable as a form of indirection to implement runtime method swapping and polymorphism

The kernel uses vtables to implement polymorphism, it doesn't store the vtable in the object to save space. If there is no polymorphism, you don't use a vtable at all, that's saving even more space.

tdrnl 2 days ago [-]

A talk[0] about Tmux is where I learned about this pattern in C.

I wrote about this concept[1] for my own understanding as well -- just tracing an instance of the pattern through the tmux code.

[0] https://raw.githubusercontent.com/tmux/tmux/1536b7e206e51488... [1] https://blog.drnll.com/tmux-obj-oriented-commands

SLWW 2 days ago [-]

I've done this on a few smaller projects when I was in college. It's fun bringing something similar to OOP into C; however you can get into trouble really quickly if you are not careful.

munchler 2 days ago [-]

Note that this is using interfaces (i.e. vtables, records of function pointers), not full object-orientation. Other OO features, like classes and inheritance, have much more baggage, and are often not worth the associated pain.

1718627440 2 days ago [-]

What do you think inheritance is, if not composition of vtables? What do you think classes are, if not a composition of a vtable and scoped variables?

munchler 2 days ago [-]

Those "scoped variables" are the difference. Mutable state adds a great deal of complexity.

1718627440 2 days ago [-]

And the style presented in the article uses vtables with "scoped variables". How do you conclude it's "not full object-orientation"?

PhilipRoman 2 days ago [-]

Field inheritance is surprisingly natural in C, where a struct can be cast to its first member.

1718627440 2 days ago [-]

Note that you only need to cast for an upcast. To access the first member, you wouldn't need to cast.

It would be nice, though, if syntax like the following were supported:

    struct A 
    {
        int a;
    };

    struct B 
    {
        int b; 
        struct A a;
    };

    void foo (struct A * a)
    {
        struct B * b;

        &b->a = a;   /* hypothetical syntax: make b point to the struct B containing *a */
    }

    struct B b;

    foo (&b.a);

teo_zero 2 days ago [-]

In what scenario would this be useful? If foo() takes a struct A, it should be more generic and have no knowledge about the more specialized struct B.

1718627440 2 days ago [-]

In exactly the same scenario where you would cast to a subclass in another language. It's about language support for what, for example, the kernel does with container_of.

Of course casting to a subclass isn't guaranteed to succeed always, but for example when you have actually declared it as the subclass elsewhere it's fine without checking for isinstance.

PhilipRoman 2 days ago [-]

Yeah you're right, I meant the other way around. Also another loosely related idea is the container_of macro in Linux kernel.

1718627440 2 days ago [-]

Yeah, my idea is literally native type-safe support of container_of for assignment in the compiler.

ryao 2 days ago [-]

vtables contain function pointers to functions that take “this” pointers. The author mentions struct file_operations as an example of a vtable. struct file_operations contains a pointer to a function that does not take a “this” pointer. It is not even a vtable.

1718627440 2 days ago [-]

I would still call it a vtable. Who assures you that every function of a class needs a pointer to the object instance? When you don't need it, you can just leave it off when you roll your own.

ryao 1 days ago [-]

Static member functions are tied to the class while virtual member functions are tied to the object. If you throw the static member functions into a vtable, you will at best have a bug where the static member function from a different class can be called. Alternatively, you would have an undefined behavior where the other class in the hierarchy does not implement this function.

There is a saying “If Your Only Tool Is a Hammer Then Every Problem Looks Like a Nail”. That is precisely what is happening here with the insistence to call what appears to be all structures of function pointers vtables. A vtable is something that follows a fairly well defined pattern for implementing inheritance. Not all things containing function pointers are vtables.

1718627440 1 days ago [-]

> Static member functions are tied to the class while virtual member functions are tied to the object.

That's nice, but the entire point is that the caller doesn't know the type of the object; it only has a supertype. That's why you need dynamic dispatch here. Of course you can implement dynamic dispatch without function pointers, but it is done with function pointers here. If you don't want to call dynamic dispatch implemented with function pointers a vtable, OK, that's fine, but that's the definition I am familiar with.

ryao 23 hours ago [-]

The vtable pointer corresponds to the actual type in languages that use vtables to implement inheritance, so you do know the type.

Read my previous comment for the bug that would happen if what is being used in Linux were actually used for dynamic dispatch when implementing inheritance, and it should be clear this is something similar, but different.

1718627440 9 hours ago [-]

No, the type of an object isn't known until runtime, so the compiler can't know it. Yes, you know the type of the vtable, which means you know the supertype. This is true in all languages that don't use duck typing. No, you don't know the assigned values of entries in the vtable, which means you don't know the type of the object.

> Read my previous comment for the bug

No, this is not a bug. The entire point of inheritance or dynamic dispatch IS that you call a function from "a different class", aka the subclass the object is an instance of. This is not a bug; this is the entire point of implementing it with vtables.

ryao 8 hours ago [-]

I never said that the type was known before runtime in OO. Your “not a bug” comment sounds awfully like “all bugs are features”. Implementing static member functions this way would cause undefined behavior, which is a bug.

1718627440 7 hours ago [-]

> Implementing static member functions this way would cause undefined behavior, which is a bug.

Yes. My point is that it can't be a static member function, because it's overridden by subclasses.

ryao 6 hours ago [-]

It cannot be overridden by subclasses without risking undefined behavior because there is no this pointer for it to use.

1718627440 3 hours ago [-]

Yes it can. It simply means the subclass implementation has no access to a this pointer. When would undefined behaviour occur here? When a function tries to use a parameter that does not exist in the function signature, that's a compile time error not UB.

Maybe you think that because a this pointer is needed for dynamic dispatch? A this pointer exists there; it is just not passed to the implementation.

accelbred 2 days ago [-]

I usually put an inline wrapper around vtable functions so that `thing->vtable->foo(thing, ...)` becomes `foo(thing, ...)`.

2OEH8eoCRo0 2 days ago [-]

Yup. I've often wondered why the aversion to C++, since they are obviously using objects. Is it that they don't want to also enable all the C++ language junk like templates or OO junk like inheritance?

nphardon 2 days ago [-]

Here's one example. For us, it's more a tradeoff rather than an aversion. There's pros (manual memory management in C) and cons (manual memory management in C) for each. We do math operations (dense and sparse matrix math for setting up and solving massive systems of differential equations) on massive graphs with up to billions of nodes and edges. We use C in parts of the engine because we need to manage memory at a very fine level to meet performance demands on our tool. Other parts of the tool use C++ because they decided the tradeoff benefited in the other direction, re memory access / management / ease of use. As a result we need really robust qa around memory leaks etc. and tbh we rely on one generational talent of an engineer to keep things from falling apart; but we get that speed. As a side note, we implement objects in C a little more complex than the op, so that the object really does end up as a black box to the user (other engineers), with all the beauty of data agnosticism.

TuxSH 2 days ago [-]

What parts of it can't just be compiled as C++ code? (unless it has to do with the subtle difference in implicit lifetime object rules)

IMO it's much easier to write resleaks/double-frees with refcounted objects in C than it is in C++

1718627440 2 days ago [-]

VLAs, named structure assignment, sane treatment of the void type, having different "lifetimes" for object (the C understanding of object) existence and initialization, having different namespaces for composed types and variables, a local error handling convention (leading to better error messages, more robust behaviour and a feeling of completeness), a lot of examples from the article and here in the thread, and most importantly no magic.

TuxSH 21 hours ago [-]

> named structure assignment

Has been a thing since C++20.

> sane treatment of the void type

If you're talking about conversion rules, explicit cast from void* to T* makes more sense than implicit since it is a downcast.

> VLAs

IMO these are a mistake (and C++ templates remove a lot of the need for it). They give a false sense of security and invite stack boundary overrun as many people forget to check bounds on these. I found and reported an unauthenticated RCE DoS (crash) in a distributed DB due to VLAs; worse, one cannot always assume the minimum stack size on a system.

> a local error handling convention

Exceptions are problematic in their implementation and how they are (mis)used, but they are supposed to be orthogonal to normal control flow handling, and are not supposed to replace it. They are more-or-less recoverable panics

1718627440 9 hours ago [-]

> If you're talking about conversion rules, explicit cast from void* to T* makes more sense than implicit since it is a downcast.

Yes, but you also need to specify the type in C. ((void *)p)->foo only works in New B not in C.

> IMO these are a mistake

Forgetting to check bounds does always result in these problems in C, this is not specific to VLAs. I find them useful.

> Exceptions are problematic in their implementation

Ok, but that means to me having these in the language is only a downside.

1718627440 2 days ago [-]

C makes it obvious where you use that dynamism and where you don't. Syntactic sugar doesn't really make that much of a difference and also restricts more creative uses.

The C syntax is not really that complicated. Dynamic dispatch and virtual methods was already in the article. Here is inheritance:

    struct Subclass {
        struct Baseclass base;
    };

That's not really that complicated. Sure, you need to encapsulate every method of the parent class, if you want to expose it. But you are also recommended to do that in other languages, and if you subclass you probably want to slightly modify behaviour anyway.

As for stuff like templates: C doesn't think everything needs to be in the compiler. For example, shadowing and hiding symbols can be done by the linker, since this is the component that handles symbol resolution across different units anyway. When you want templates, either you actually want a cheap way of runtime dynamism, in which case do that, or you want source code generation. Why does the compiler need to do that? For the basics there is a separate tool in the language: the preprocessor. If you want more, you are free to choose your tool. If you want a macro language, there is e.g. M4. If you want another generator, just use it. If you feel no tool really cuts it, why don't you write your code generator in C?

BinaryIgor 2 days ago [-]

I always wonder why nothing similar made it into a newer C version. Clearly there is significant demand for it: lots of people reimplement the same (or similar) set of patterns.

1718627440 2 days ago [-]

Whenever you invent syntactic sugar, you need to make some usage blessed and some usage impossible or needing to fall back to the old way without syntactic sugar. See https://news.ycombinator.com/item?id=45040662. Also, part of the point of C is that it doesn't hide that dynamic complexity. You always see when there is dynamic dispatch. There are tons of languages which introduce some formalism for these concepts; honestly, most modern imperative languages seem to. The unique selling point of C is that you see the complexity. That influences you to only use it if you really want it. Also, the syntax isn't really that complicated.

davikr 2 days ago [-]

Probably into the High C Compiler.

TickleSteve 1 days ago [-]

Never. Do. This...

I was involved in a product with a large codebase structured like this and it was a maintainability nightmare with no upsides. Multiple attempts were made to move away from this to no avail.

Consider that the code has terrible readability due to no syntax-sugar, the compiler cannot see through the pointers to optimise anything, tooling has no clue what to do with it. On top of that, the syntax is odd and requires any newbies to effectively understand how a c++ compiler works under-the-hood to get anything out of it.

On top of those points, the dubious benefits of OOP make doing this a quick way to kill long-term maintainability of your project.

For the devs who come after you, don't try to turn C into a poor man's C++. If you really want to, please just use C++.

1718627440 1 days ago [-]

Can you elaborate what exactly the maintainability nightmare was?

To me less syntactic sugar is more readable, because you see what function call involves dynamic dispatch and which doesn't. Ideally it should also lead to dynamic dispatch being restricted to where it is needed.

I don't know where (it might also have been LWN), but there was a post about it actually being more optimizable by the compiler, because dynamic code in C involves far fewer function pointers and the compiler can assume UB more often, because the assignments are in user code.

> requires any newbies to effectively understand how a c++ compiler

You are not supposed to reimplement a C++ compiler exactly, you are supposed to understand how OOP works and then this emerges naturally.

> dont try to turn C into a poor-mans C++

It's not poor-mans C++, when it's idiomatic C.

People like me very much choose C with this usage in mind: because it's clearer, because I can sprinkle dynamism where it's needed rather than where the language/compiler prescribes it, and because every use of dynamism is visible, since there is no syntactic sugar to hide it.

nphardon 2 days ago [-]

Another cool thing about this approach is you can have the arguments to your object init be a pointer to a structure of args. Then down the line you can add features to your object without having to change all the calls to init your object throughout the code base.

MangoToupe 2 days ago [-]

If this is the pattern you prefer, why not choose a language that caters to it? Choosing C just seems like you're TRYING to shoot yourself. I don't care how good you are at coding, this is just a bad decision.

1718627440 1 days ago [-]

Because they like how C caters to this. This question was asked here several times, please read the answers there.

MangoToupe 19 hours ago [-]

Ok so... why choose C if they know they're shooting themselves?

1718627440 8 hours ago [-]

> Ok so... why choose C if they know they're shooting themselves?

> Because they like how C caters to this.

We (aka I) think we are shooting ourselves less, because C represents the algorithms more in the way we want to express them. C's lack of syntactic sugar means dynamic dispatch is always visible. C not prescribing which function pointers you can use means that the most fitting way can be chosen, as described by the article and the LWN post, as opposed to shoehorning it into some paradigm prescribed by the language, which causes more problems down the line.