User:Eberkowitz
Latest revision as of 09:48, 7 February 2017
Eric Berkowitz
Eric G. Berkowitz is a Professor and Chair of the Department of Computer Science and Information Technology at Roosevelt University in Chicago.
Dr. Berkowitz's research focuses on automated proactive organization of large-scale document collections as a component of open information access. He also teaches courses on Big Data, Open Source integration, and Technology and Information in Local and Global Society.
Prior to coming to Roosevelt, Dr. Berkowitz taught at the Illinois Institute of Technology and before that worked and consulted on document management systems and workplace automation systems.
My interest is in teaching students that everything in life is about balancing forces, all the more so in the Information Age. Much is made of governments and other power-players learning more about citizens; far less is made of citizens' ability to learn more about the power-players. Balance can be restored through active civic engagement: possessing information about the power-players and, in the form of open-source tools, the means to collect and process that information. I am interested in getting students from all disciplines, from the humanities to the formal sciences, to understand how the open-source world provides crucial information tools and to exploit those tools as appropriate for their discipline.
My search for "Big Data" on SourceForge yielded a very unhelpful list of 175 projects, though there are a few gemstones amongst the gravel.
Some of the most common languages for these types of projects are Java, C, and Python.
From: Linux Annoyances for Geeks: Getting the Most Flexible System in the World Just the Way You Want, by Michael Jang; O'Reilly Media, Inc., Apr 5, 2006; page 138.
Many applications are "not ready for prime time." If you find that an application is in alpha development, it generally has not been tested with any rigor on most Linux distributions. However, many (but not all) beta projects, which are nominally still in testing, are as stable as any Microsoft application that you can purchase today. As described on SourceForge, there are seven levels of development: planning, pre-alpha, alpha, beta, production/stable, mature, and inactive. In most cases, you should not install planning, pre-alpha, and alpha applications on production computers. Beta software may or may not be ready for production computers and should be tested rigorously before installation. Production/stable software can generally be installed on production computers without as much testing. Mature and inactive applications may not have the latest features, or may be superseded by other applications.
WEKA is production software. This means that it has been tested in a deployment situation and both the software and the results can be relied upon. The assumption is that no serious bugs that would detract from reliability exist.
PDL (Perl Data Language) is beta software. That means that it has been deployed but bugs may still exist. The bug list on the project's site shows that there are still serious bugs in the software that affect one's ability to rely on it to perform as needed or to produce reliable results.
I am looking at the PDL project. It provides a scripting language inside the Perl framework for performing analyses on data tables, as is done in Matlab and other such languages. The project is written in C (as are almost all such libraries that require maximum speed at iterative operations on tabular data). Data scientists and others who need to analyze data, and who wish to do so within a general-purpose programming language, might choose this project; that is the project's target audience. The project is quite active, and the last bug-fix/patch was applied three days ago. I would not use this particular project since, in many ways, it duplicates the functionality of python/scipy, which I already use.
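To give a rough idea of the "iterative operations on tabular data" mentioned above, here is a minimal sketch in plain Python using only the standard library. It is not PDL or scipy code; the table contents and the function names `column_means` and `center_columns` are made up for illustration. Libraries of this kind run loops like these in compiled C rather than in the scripting language, which is where their speed comes from.

```python
from statistics import fmean

# Hypothetical data table: each row is (height_cm, weight_kg).
rows = [
    (170.0, 65.0),
    (180.0, 80.0),
    (160.0, 55.0),
]


def column_means(table):
    """Return the mean of each column of a row-oriented table."""
    # zip(*table) transposes the rows into columns.
    return [fmean(column) for column in zip(*table)]


def center_columns(table):
    """Subtract each column's mean from every value in that column."""
    means = column_means(table)
    return [
        tuple(value - mean for value, mean in zip(row, means))
        for row in table
    ]


means = column_means(rows)
centered = center_columns(rows)
print(means)
print(centered)
```

Every value in the table is visited one at a time by the interpreter here, which is fine for three rows but slow for millions; that cost is the usual argument for pushing such loops down into C.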