opensource-lab

Research

Release Patterns

I want to identify patterns of release schedules that coincide with better software. We can measure “better” based on different factors, such as number of users, number of citations, or even Google Search Trends.
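
As a starting point, a release schedule can be reduced to a few numbers before comparing it against any quality measure. The sketch below is only one possible summary, assuming we already have a list of release dates for a project; the function name `release_cadence` and the choice of statistics (median gap, mean gap, and coefficient of variation as a rough regularity score) are illustrative assumptions, not an established method.

```python
from datetime import date
from statistics import mean, median, pstdev


def release_cadence(release_dates):
    """Summarize the gaps (in days) between consecutive releases.

    Hypothetical helper: the chosen statistics are an assumption about
    what a useful "pattern" summary might look like.
    """
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two releases to measure cadence")
    avg = mean(gaps)
    return {
        "median_gap_days": median(gaps),
        "mean_gap_days": avg,
        # Coefficient of variation: 0 means perfectly regular releases.
        "gap_cv": pstdev(gaps) / avg if avg else 0.0,
    }


# Made-up release dates for illustration only.
example = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 3, 1), date(2020, 3, 31)]
summary = release_cadence(example)
```

A low `gap_cv` would suggest a time-based schedule, while a high value would suggest releases driven by features or fixes; whether either coincides with "better" software is the open question.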

Pull Request and Issue Sentiment Analysis

What is the tone of issue boards and pull request discussions? Does the popularity of the software (or other metrics about the software or the users involved) matter? Does tone change over time? Why do we think that is?

Software in Scientific Domains

Does the citation of software (frequency, kind) differ across domains? E.g., is it more common to see a citation for software in a biology paper than in a music paper? (I’d say yes.) Can we identify domains that are not citing (and maybe not using) software?

Software Lifecycle

For any given open source software, what is its lifecycle? How do we know (can we predict) if it will take off? Is it common to have more or fewer maintainers or contributors?

Container Identification

Which attributes of a container are most useful for predicting its domain and/or contents? E.g., is Python or R more common in scientific containers? What kind of software do we not find in a container (e.g., no Dockerfile or similar in the repository)?

Container Binary Tree

Can we represent a container filesystem as a binary tree for quicker comparison of files between containers?
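
One possible reading of this idea is a binary hash tree (a Merkle tree), which is an assumption on my part rather than a settled design. In the sketch below, files are assigned to a fixed number of buckets by hashing their paths, each bucket is hashed, and the bucket hashes are combined pairwise up to a single root. Two containers with equal roots have identical files, and a comparison only descends into subtrees whose hashes differ. The names `build_tree`, `diff`, and `DEPTH` are hypothetical, and the filesystem is modeled as a plain mapping from path to file content.

```python
import hashlib

DEPTH = 4  # hypothetical setting: the tree has 2**DEPTH leaf buckets


def _h(data):
    return hashlib.sha256(data).digest()


def build_tree(files, depth=DEPTH):
    """Build a binary hash tree from a {path: content_bytes} mapping.

    Returns (levels, buckets): levels[0] holds the leaf hashes and
    levels[-1] holds the single root hash.
    """
    n = 2 ** depth
    buckets = [dict() for _ in range(n)]
    for path, content in files.items():
        # Bucket by path hash so adding a file only touches one leaf.
        index = int.from_bytes(_h(path.encode())[:4], "big") % n
        buckets[index][path] = _h(content)
    leaves = [
        _h(b"".join(p.encode() + b"\0" + d for p, d in sorted(b.items())))
        for b in buckets
    ]
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels, buckets


def diff(tree_a, tree_b):
    """Return paths that differ, visiting only subtrees with unequal hashes.

    Both trees must have been built with the same depth.
    """
    levels_a, buckets_a = tree_a
    levels_b, buckets_b = tree_b
    changed = set()
    stack = [(len(levels_a) - 1, 0)]  # start at the root
    while stack:
        level, i = stack.pop()
        if levels_a[level][i] == levels_b[level][i]:
            continue  # identical subtree, nothing to compare
        if level == 0:
            a, b = buckets_a[i], buckets_b[i]
            changed.update(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
        else:
            stack.append((level - 1, 2 * i))
            stack.append((level - 1, 2 * i + 1))
    return changed


# Made-up container contents for illustration only.
container_a = {"/bin/sh": b"shell", "/etc/os-release": b"debian", "/usr/bin/python3": b"py"}
container_b = {"/bin/sh": b"shell", "/etc/os-release": b"ubuntu", "/usr/bin/R": b"r"}
changed_paths = diff(build_tree(container_a), build_tree(container_b))
```

Whether this is actually quicker than comparing two sorted file listings depends on how often containers share large identical subtrees, which is part of what would need to be measured.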

GitHub Events

If we use stars or followers as a metric, I want to understand how developers with many stars and/or followers differ from others. What do they do?
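
One rough way to approach "what do they do" is to compare the mix of event types between two groups of developers. The sketch below assumes events shaped like those returned by the GitHub Events API (a `type` field such as `PushEvent`, and an `actor` object with a `login`); the function name `activity_profile`, the grouping by login, and the sample data are all made up for illustration.

```python
from collections import Counter


def activity_profile(events, logins):
    """Return the share of each event type among events by the given logins."""
    counts = Counter(
        event["type"] for event in events if event["actor"]["login"] in logins
    )
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()} if total else {}


# Fabricated sample events, for illustration only.
sample_events = [
    {"type": "PushEvent", "actor": {"login": "alice"}},
    {"type": "PushEvent", "actor": {"login": "alice"}},
    {"type": "PullRequestEvent", "actor": {"login": "alice"}},
    {"type": "IssuesEvent", "actor": {"login": "alice"}},
    {"type": "WatchEvent", "actor": {"login": "bob"}},
    {"type": "PushEvent", "actor": {"login": "bob"}},
]

# Hypothetical split: "alice" stands in for a highly starred developer.
popular = activity_profile(sample_events, {"alice"})
others = activity_profile(sample_events, {"bob"})
```

Comparing the two resulting profiles would show, for example, whether one group spends relatively more of its activity on pull requests or issues than on pushes.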