github organization readme discoverability

What is a good way to allow the wider discoverability of private GitHub repositories?


If you are in an organisation, there may be GitHub repositories that are private (i.e. you don't have access to them), but it would be useful to know that they exist, and then you could arrange access where appropriate.

In other words, we are trying to enable discoverability in a way that can lead to access. This could be done by sharing READMEs (noting that people need some discipline to write sensible READMEs).

This blog post, "Solving the innersource discoverability problem", looks like a potential solution, but it may require that the user has access to see all the repos in the portal? I'd like the user to be able to view READMEs for all repos; if they don't have access, they can contact whoever is listed on the README.

I see another option for making a file public from a private repo (using gitexporter to create a public repo with only the README; example here). This makes it public, which is not my first preference, and would require every repo to do some work, which is far from ideal. While it doesn't give a neat portal, it should allow GitHub search functionality to find it by topic or keyword?

A related, perhaps simpler option is proposed here, where a student shares a README from a private repo as a public GitHub Pages site. Again, it requires a little work from every repo and gives no neat portal, but can be found with GitHub search? While public GitHub Pages sites can be made private, they would then only be visible to those with repo access?

So, if I'm summarising basic requirements:

Additional nice to have features:

Suggestions?


Solution

  • I think you have already provided a suitable solution within your question. Alternatively, you can use the GitHub REST API (list the organization's repositories, then get each repository's README) to fetch every README, save the results to a database or JSON file on a cron schedule, and build a web interface on top of that data.

    However, I'd like to elaborate on a few areas of improvement. The problem I see with this is the nature of the search. We aren't always looking for exact keywords; sometimes we are trying to find a fuzzy match for our problem, especially in a larger organization with more than a couple of thousand repositories. In those cases, a search engine implementation will provide much better results. In my opinion, we should collect the READMEs and FAQs, put them into Elasticsearch, and expose a search API for queries. The collection of READMEs and FAQs should be part of the CI/CD pipeline, and when pushing new versions to Artifactory, the pipeline must publish the metadata as well.
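
To make the API-based approach above more concrete, here is a minimal sketch in Python, not a finished implementation. It assumes a token that can read every repository in the organization (for example, one held by an org owner or a GitHub App). The helper names (`github_get`, `list_org_repos`, `read_readme`, `build_index`, `to_bulk_ndjson`) and the index name `readmes` are made up for illustration. It lists the organization's repositories, fetches each README through the GitHub REST API, and serializes the result in the newline-delimited format that the Elasticsearch `_bulk` endpoint accepts.

```python
import base64
import json
import urllib.error
import urllib.request

API = "https://api.github.com"


def github_get(path, token):
    """GET a GitHub REST API path and return the decoded JSON body."""
    req = urllib.request.Request(
        API + path,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_org_repos(org, fetch):
    """Yield every repository of the organization, one page at a time."""
    page = 1
    while True:
        batch = fetch(f"/orgs/{org}/repos?per_page=100&page={page}")
        if not batch:  # an empty page means we are past the last one
            return
        yield from batch
        page += 1


def read_readme(full_name, fetch):
    """Return the README text of a repository, or "" if it has none."""
    try:
        payload = fetch(f"/repos/{full_name}/readme")
    except urllib.error.HTTPError as err:
        if err.code == 404:  # repository without a README
            return ""
        raise
    # The README endpoint returns the file content base64-encoded.
    return base64.b64decode(payload["content"]).decode("utf-8", errors="replace")


def build_index(org, fetch):
    """Collect one record per repository: name, URL, visibility, README."""
    return [
        {
            "repo": repo["full_name"],
            "url": repo["html_url"],
            "private": repo["private"],
            "readme": read_readme(repo["full_name"], fetch),
        }
        for repo in list_org_repos(org, fetch)
    ]


def to_bulk_ndjson(records, index_name="readmes"):
    """Serialize records as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for rec in records:
        # Action line, then the document itself; the repo name is the _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": rec["repo"]}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```

A scheduled job (cron or a CI/CD step) could call `build_index("my-org", lambda path: github_get(path, token))`, where `my-org` is a placeholder, and then either dump the records to a JSON file for a simple portal or POST the output of `to_bulk_ndjson` to the cluster's `_bulk` endpoint. Passing `fetch` as a parameter keeps the crawling logic testable without network access.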