An enthusiastic developer presented a map of GitHub projects

Seattle-based developer Andrei Kashcha has presented a map of GitHub that gathers more than 400,000 repositories on a single site. The repositories are grouped into countries whose fictional names reflect the technologies used by the projects in them.

To build the map, the author used data on which users starred which repositories between June 2020 and March 2023. This yielded a set of projects with more than 350 million stars in total. All of the data was retrieved and analyzed with Google BigQuery.
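A minimal sketch of this first step, assuming the star data arrives as hypothetical `(user, repository, date)` tuples. The real project pulled its data from Google BigQuery, and the exact query and schema are not described here. The sketch turns those events into a mapping from each repository to the set of users who starred it within the time window. The day boundaries (June 1, 2020 to March 31, 2023) are assumptions.

```python
from collections import defaultdict
from datetime import date


def stargazers_by_repo(events, start, end):
    """Group star events into repo -> set of users who starred it.

    events: iterable of hypothetical (user, repo, starred_on) tuples.
    Only stars given between start and end (inclusive) are kept.
    """
    repo_users = defaultdict(set)
    for user, repo, starred_on in events:
        if start <= starred_on <= end:
            repo_users[repo].add(user)
    return dict(repo_users)


# Illustrative events only; these are not real data from the project.
events = [
    ("alice", "vuejs/vue", date(2021, 5, 1)),
    ("bob", "vuejs/vue", date(2022, 1, 9)),
    ("alice", "facebook/react", date(2019, 12, 31)),  # outside the window, dropped
    ("carol", "facebook/react", date(2020, 7, 4)),
]
repo_users = stargazers_by_repo(events, date(2020, 6, 1), date(2023, 3, 31))
print(repo_users)
```

The output of this step is what a set-based similarity measure such as the Jaccard coefficient needs as input.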

Next, the author needed to find similar repositories so that they could be grouped by the technologies they use. To measure similarity, he chose the Jaccard coefficient. His home computer, with 24GB of RAM, did not have enough memory for the job, so he rented an AWS EC2 instance with 512GB of RAM, which finished the computation in just a few hours. The author said he had experimented with other similarity measures, but the Jaccard coefficient gave the most accurate results.
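The Jaccard coefficient of two repositories is the number of users who starred both, divided by the number of users who starred either: J(A, B) = |A ∩ B| / |A ∪ B|. Below is a minimal sketch, assuming the stargazer sets are already available as a `repo -> set of users` mapping. It is not the author's code. It uses an inverted index (user -> repositories) so that only pairs sharing at least one stargazer are ever compared. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_jaccard(repo_users: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every repository pair with a common stargazer."""
    # Inverted index: user -> repositories that user starred.
    user_repos = defaultdict(set)
    for repo, users in repo_users.items():
        for user in users:
            user_repos[user].add(repo)

    # Count common stargazers per (repo_a, repo_b) pair, with repo_a < repo_b.
    shared = defaultdict(int)
    for repos in user_repos.values():
        for a, b in combinations(sorted(repos), 2):
            shared[(a, b)] += 1

    # |A ∪ B| = |A| + |B| - |A ∩ B|
    return {
        (a, b): n / (len(repo_users[a]) + len(repo_users[b]) - n)
        for (a, b), n in shared.items()
    }


sample = {"r1": {"u1", "u2", "u3"}, "r2": {"u2", "u3", "u4"}, "r3": {"u5"}}
print(pairwise_jaccard(sample))  # {('r1', 'r2'): 0.5}
```

The `shared` counter grows with the number of co-starred repository pairs. At the scale of hundreds of millions of stars, this is one plausible reason a job like this can outgrow a home machine's memory, though the article does not say how the author's implementation was structured.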

The third step was to cluster all of the projects. For this, the author used the Leiden algorithm, which produced more than 1,000 clusters. The positions of the nodes in the graph were computed with the author's own solution, whose code is published on GitHub.
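The Leiden algorithm looks for a partition of the graph that scores well on a quality function. Modularity is one common choice, but the article does not say which objective the author used, so that part is an assumption. The sketch below computes only the modularity of a given partition, using Q = Σ_c [L_c / m − (d_c / 2m)²]. It is not an implementation of Leiden itself, which also moves nodes between communities and refines them iteratively. The edge list and community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, weighted graph.

    edges: iterable of (u, v, weight), each undirected edge listed once,
           no self-loops assumed.
    community: dict mapping node -> community id.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where
    m is the total edge weight, L_c the weight of edges inside c,
    and d_c the total weighted degree of the nodes in c.
    """
    m = 0.0
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    if m == 0:
        return 0.0
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge (2-3): a clear two-cluster graph.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
split = {n: 0 if n < 3 else 1 for n in range(6)}
print(modularity(edges, split))  # ≈ 0.357 (= 5/14)
```

Putting every node in one community gives Q = 0 for this graph. The two-triangle split scores higher, which is the kind of improvement a clustering algorithm like Leiden searches for.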

The result is a map of GitHub projects grouped by the technologies they use. Each technology area was given the name of a fictional country generated by ChatGPT:

  • Swiftoria – Swift projects;

  • Vuetopia – Vue.js projects;

  • Javaland – Java projects;

  • Python – Python projects;

  • Dotnetia – .NET projects;

  • Frontera – frontend development;

  • Unity Land – Unity engine projects;

  • Ladyopolis – projects related to LED;

  • Hardlands – hardware development;

  • PHP Kingdom – PHP projects;

  • Diplernia – deep learning projects.

Each fictional country contains dots that represent projects. Clicking a dot opens the project's card with its README. The dot's connections to other related projects are shown as well.

The map can be viewed in a browser, but the first load of the site may take a while because of the amount of data it has to download. The project code is published on GitHub.
