Post by Admin on Dec 4, 2018 23:34:38 GMT
So I have Modern Data Science with R, by Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton (2017),
Chapman & Hall/CRC
Texts in Statistical Science Series.
It is a huge book packed with info. And quite new too.
The series lists over 50 titles, and they all look really good. CRC Press is always good, and our academic libraries carry a good number of their titles.
I want to start here by looking at the material in Appendix F, on how to connect R to databases. That has, after all, been an area of concern for me.
I am finding that I have to absorb information holographically: lots and lots of books to draw on to get everything I want.
So what I am seeing, and this book confirms it, is this: to do what you want, you add extension modules to the interpreter. The R interpreter itself is a straight C program, not too big. The extension modules are typically C code plus some R code, and you can load them without changing the original source or recompiling and re-linking, because they are brought in by dynamic linking. That is available on Linux, Windows, and macOS.
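A small illustration of the dynamic-linking mechanism itself, sketched in Python rather than R (the same run-time loading is what an R extension module relies on under the hood). This assumes a Unix-like system where the C math library is available as `libm`:

```python
# Load the C math library at run time and call into it. No recompiling
# or re-linking of the interpreter is needed; the library name is
# resolved when the program runs.
# (Assumption: a Unix-like system with libm; shown in Python for brevity.)
import ctypes
import ctypes.util

libm_name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_name)

# Declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # → 3.0
```

The interpreted side handles the bookkeeping (finding the library, declaring signatures), and the compiled code does the work, which is exactly the division of labor described above.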
I presume that the R side of an extension can pass connection information through to its new C code.
So then you use this to get to the databases, MySQL or PostgreSQL. You do this via the client/server interface, which each of the operating systems now accommodates.
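The client/server interface here is ultimately the operating system's socket layer; a MySQL or PostgreSQL driver wraps it in the database's wire protocol. A bare sketch of that underlying mechanism, in Python, with a toy in-process server standing in for the database server (everything here is illustrative, not a real database protocol):

```python
import socket
import threading

def toy_server(listener):
    """Stand-in for a database server: answers one request and exits."""
    conn, _ = listener.accept()
    request = conn.recv(1024)  # in a real protocol, a parsed query
    conn.sendall(b"result for " + request)
    conn.close()
    listener.close()

# The "server" side: bind to a port the OS picks, serve in a thread.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=toy_server, args=(listener,)).start()

# The "client" side: connect, send a request, read the reply.
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"SELECT 1")
reply = client.recv(1024)
client.close()
print(reply.decode())  # → result for SELECT 1
```

The point is that client and server are separate processes talking through an OS facility, which is why the database can enforce its own access control independently of your program.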
So this alleviates my concern about having to transfer data through text files. It is actually very well done.
And they cover MySQL and PostgreSQL. They do not talk about MongoDB or any other NoSQL databases.
They also talk about SQLite. Generally with SQLite you just write a program, link against its code, and include it in your executable; the engine runs in-process, so there is no client/server interface involved. From R, though, you probably still load it as a dynamically linked extension module. I am not sure of the details, but I am confident it will work out well.
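A minimal sketch of the embedded style, using Python's bundled sqlite3 module since the same in-process engine is what an R SQLite extension would load. Note there is no host, port, or credential anywhere; the database lives inside the process:

```python
import sqlite3

# In-memory database: the SQLite engine runs inside this process.
# No server, no client/server round trip, no access-control layer.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, note TEXT)")
db.executemany("INSERT INTO runs (note) VALUES (?)",
               [("first",), ("second",)])
count, = db.execute("SELECT COUNT(*) FROM runs").fetchone()
print(count)  # → 2
db.close()
```

Swapping `":memory:"` for a filename gives you a persistent single-file database, which is the usual way SQLite serves a single program's own data.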
As I see it, if it is "Enterprise" data and you need complex access control, then you need MySQL or PostgreSQL. But if the data just pertains to running your own program, then SQLite is probably better.
As I see it though, no one wants to write much of a program in an interpreted language; it can easily be 100x slower. For that you want C++.
But the thing is, you need some way for the user to feed data into the program and to control it, and it is nice to let the user change that themselves. This is where the interpreted language comes in, as the natural dividing line between what the user can change and what they cannot.
You don't ever want a user to be able to change code and then expect you to maintain it; that is how your software gets forked. And dealing with a large corporation, run very sloppily, that could happen if you are not careful. So anything of any complexity goes in your C++, compiled up and kept shut. Then in the interpreted language you just set up a chain-type program which moves through its parts. That is what the user can change.
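The chain-type split can be sketched like this. The heavy steps would live in the compiled core (stubbed here as plain Python functions; the names and pipeline shape are hypothetical, just to show the dividing line), while the user-editable part is nothing but the ordering of step names:

```python
# The "compiled core": in real life these would be C++ routines exposed
# to the interpreter as an extension module. Here they are stubs that
# pass a state dictionary along the chain.
def load_data(state):
    state["data"] = [3, 1, 2]
    return state

def process(state):
    state["data"] = sorted(state["data"])
    return state

def report(state):
    state["report"] = "values: " + ", ".join(map(str, state["data"]))
    return state

STEPS = {"load": load_data, "process": process, "report": report}

# The user-editable part: just a chain of step names. Reordering or
# dropping a step changes behavior without touching the compiled core.
user_chain = ["load", "process", "report"]

state = {}
for name in user_chain:
    state = STEPS[name](state)
print(state["report"])  # → values: 1, 2, 3
```

Users can rearrange the chain all they like, but the steps themselves stay sealed, so there is nothing of substance for them to fork.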
Okay, but the upshot of this is that, considering Python, Ruby, R, and Io, for anything one of them can do it should not be too difficult to make the others do it too. You just write your own program in the form of an extension module.
So is there any reason to do some stuff using one of them and other stuff using another? Or better just to pick one and use it for everything?
Python, at least initially, does not impress me. But I see Python being invoked for lots of R-like applications.
Ruby seems to be intended as a corrective, and it seems to be getting used as such. Io, at least initially, looks to me like something for embedded applications, a replacement for Forth perhaps. But for R, and really for all of them, I still have much more to learn.
As for why R uses something written in Fortran, it sounds like it is just legacy numerical code. I, though, am not averse to using Fortran, not even for new work, if that makes things fit well within a tradition.
Still much more to learn.