diff --git a/README.md b/README.md index 35a7a59d..f74305cc 100644 --- a/README.md +++ b/README.md @@ -5,22 +5,34 @@ The OpenML documentation in written in MarkDown. The sources are generated by [M The overal structure (navigation) of the docs is configurated in the `mkdocs.yml` file. -Some of the API's use other documentation generators, such as [Sphinx](https://restcoder.readthedocs.io/en/latest/sphinx-docgen.html) in openml-python. This documentation is pulled in via iframes to gather all docs into the same place, but they need to be edited in their own GitHub repo's. +Some of the APIs use other documentation generators, such as [Sphinx](https://restcoder.readthedocs.io/en/latest/sphinx-docgen.html) in openml-python. This documentation is pulled in using the [multirepo plugin](https://github.com/jdoiro3/mkdocs-multirepo-plugin) to gather all docs into the same place, but they need to be edited in their own GitHub repos. ## Editing documentation Documentation can be edited by simply editing the markdown files in the `docs` folder and creating a pull request. End users can edit the docs by simply clicking the edit button (the pencil icon) on the top of every documentation page. It will open up an editing page on [GitHub](https://github.com/) (you do need to be logged in on GitHub). When you are done, add a small message explaining the change and click 'commit changes'. On the next page, just launch the pull request. We will then review it and approve the changes, or discuss them if necessary. +## Developing +To build the documentation locally, run `mkdocs serve -f mkdocs-local.yml` in the top directory (with the `mkdocs.yml` file). Any changes made after that will be hot-loaded. + +To build the full documentation, including importing the documentation from other repositories, run `mkdocs serve` in the top directory (with the `mkdocs.yml` file). This can take a while to compile, so only use this when needed. 
You might also need to set `export NUMPY_EXPERIMENTAL_DTYPE_API=1` (or `set NUMPY_EXPERIMENTAL_DTYPE_API=1` on Windows). + ## Deployment The documentation is hosted on GitHub pages. -To deploy the documentation, you need to have MkDocs and MkDocs-Material installed, and then run `mkdocs gh-deploy` in the top directory (with the `mkdocs.yml` file). This will build the HTML files and push them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`. +To deploy the documentation, you need to have MkDocs installed locally, and then run `mkdocs gh-deploy` in the top directory (with the `mkdocs.yml` file). This will build the HTML files and push them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`. -MKDocs and MkDocs-Material can be installed as follows: +MkDocs and all required extensions can be installed as follows: ``` -pip install mkdocs -pip install mkdocs-material -pip install -U fontawesome_markdown +pip install -r requirements.txt ``` +To test the documentation locally, run +``` +mkdocs serve +``` + +To deploy to GitHub Pages, run +``` +mkdocs gh-deploy +``` diff --git a/docs/contributing/OpenML-Docs.md b/docs/contributing/OpenML-Docs.md index 3e0f5460..fe3bb1e9 100644 --- a/docs/contributing/OpenML-Docs.md +++ b/docs/contributing/OpenML-Docs.md @@ -13,7 +13,10 @@ combined into these documentation pages using [MkDocs multirepo](https://github. git clone https://github.com/openml/docs.git pip install -r requirements.txt ``` - To build the documentation, run `mkdocs serve` in the top directory (with the `mkdocs.yml` file). Any changes made after that will be hot-loaded. + + To build the documentation locally, run `mkdocs serve -f mkdocs-local.yml` in the top directory (with the `mkdocs.yml` file). Any changes made after that will be hot-loaded. 
+ + To build the full documentation, including importing the documentation from other repositories, run `mkdocs serve` in the top directory (with the `mkdocs.yml` file). This can take a while to compile, so only use this when needed. The documentation will be auto-deployed with every push or merge with the master branch of `https://www.github.com/openml/docs/`. In the background, a CI job will run `mkdocs gh-deploy`, which will build the HTML files and push them to the gh-pages branch of openml/docs. `https://docs.openml.org` is just a reverse proxy for `https://openml.github.io/docs/`. diff --git a/docs/index.md b/docs/index.md index 0c7bb778..f2d7f909 100644 --- a/docs/index.md +++ b/docs/index.md @@ -15,56 +15,15 @@ icon: material/creation

  Make your work more visible and reusable

  Built for automation: streamline your experiments and model building

-## Installation +## How to use OpenML -The OpenML package is available in many languages and across libraries. For more information about them, see the [Integrations](./ecosystem/index.md) page.

+OpenML is accessible to a wide range of people: -=== "Python/sklearn" +:computer: Explore the OpenML website to discover, download and upload ML resources. - - [Python/sklearn repository](https://github.com/openml/openml-python) - - `pip install openml` +:robot: [Install an OpenML library](intro/index.md) to access and share resources programmatically through our APIs. Select one of the detailed guides in the top menu. -=== "Pytorch" - - - [Pytorch repository](https://github.com/openml/openml-pytorch) - - `pip install openml-pytorch` - -=== "Keras" - - - [Keras repository](https://github.com/openml/openml-keras) - - `pip install openml-keras` - -=== "TensorFlow" - - - [TensorFlow repository](https://github.com/openml/openml-tensorflow) - - `pip install openml-tensorflow` - -=== "R" - - - [R repository](https://github.com/openml/openml-R) - - `install.packages("mlr3oml")` -=== "Julia" - - - [Julia repository](https://github.com/JuliaAI/OpenML.jl/tree/master) - - `using Pkg;Pkg.add("OpenML")` - -=== "RUST" - - - [RUST repository](https://github.com/mbillingr/openml-rust) - - Install from source - -=== ".Net" - - - [.Net repository](https://github.com/openml/openml-dotnet) - - `Install-Package openMl` - - -You might also need to set up the API key. For more information, see [Authentication](http://localhost:8000/concepts/openness/). - -## Learning OpenML - -Aside from the individual package documentations, you can learn more about OpenML through the following resources:
-The core concepts of OpenML are explained in the [Concepts](./concepts/index.md) page. These concepts include the principle behind using Datasets, Runs, Tasks, Flows, Benchmarking and much more. Going through them will help you leverage OpenML even better in your work.
+:mortar_board: [Get started](./concepts/index.md) by learning more about the structure and concepts behind OpenML, such as Datasets, Tasks, Flows, Runs, Benchmarking and much more. This will help you leverage OpenML even better in your work. ## Contributing to OpenML diff --git a/docs/intro/index.md b/docs/intro/index.md new file mode 100644 index 00000000..b2750209 --- /dev/null +++ b/docs/intro/index.md @@ -0,0 +1,107 @@ +--- +icon: material/rocket-launch +--- + +## :computer: Installation + +The OpenML package is available in many languages and has deep integration in many machine learning libraries. + +=== "Python/sklearn" + + - [Python/sklearn repository](https://github.com/openml/openml-python) + - `pip install openml` + +=== "Pytorch" + + - [Pytorch repository](https://github.com/openml/openml-pytorch) + - `pip install openml-pytorch` + +=== "TensorFlow" + + - [TensorFlow repository](https://github.com/openml/openml-tensorflow) + - `pip install openml-tensorflow` + +=== "R" + + - [R repository](https://github.com/openml/openml-R) + - `install.packages("mlr3oml")` + +=== "Julia" + + - [Julia repository](https://github.com/JuliaAI/OpenML.jl/tree/master) + - `using Pkg;Pkg.add("OpenML")` + +=== "RUST" + + - [RUST repository](https://github.com/mbillingr/openml-rust) + - Install from source + +=== ".Net" + + - [.Net repository](https://github.com/openml/openml-dotnet) + - `Install-Package openMl` + +You can find detailed guides for the different libraries in the top menu. + + +## :key: Authentication + +OpenML is entirely open and you do not need an account to access data (rate limits apply). However, signing up via the OpenML website is very easy (and free) and required to upload new resources to OpenML and to manage them online. + +API authentication happens via an **API key**, which you can find in your profile after logging in to openml.org. 
+ +```python +openml.config.apikey = "YOUR KEY" +``` + +## :joystick: Minimal Example + +:material-database: Use the following code to load the [credit-g](https://www.openml.org/search?type=data&sort=runs&status=active&id=31) [dataset](https://docs.openml.org/concepts/data/) directly into a pandas DataFrame. Note that OpenML can automatically load all datasets, separate data X and labels y, and give you useful dataset metadata (e.g. feature names and which ones have categorical data). + +```python +import openml + +dataset = openml.datasets.get_dataset("credit-g") # or by ID get_dataset(31) +X, y, categorical_indicator, attribute_names = dataset.get_data(target="class") +``` + + +:trophy: Get a [task](https://docs.openml.org/concepts/tasks/) for [supervised classification on credit-g](https://www.openml.org/search?type=task&id=31&source_data.data_id=31). +Tasks specify how a dataset should be used, e.g. including train and test splits. + +```python +task = openml.tasks.get_task(31) +dataset = task.get_dataset() +X, y, categorical_indicator, attribute_names = dataset.get_data(target=task.target_name) +# get splits for the first fold of 10-fold cross-validation +train_indices, test_indices = task.get_train_test_split_indices(fold=0) +``` + +:bar_chart: Use an [OpenML benchmarking suite](https://docs.openml.org/concepts/benchmarking/) to get a curated list of machine-learning tasks: +```python +suite = openml.study.get_suite("amlb-classification-all") # Get a curated list of tasks for classification +for task_id in suite.tasks: + task = openml.tasks.get_task(task_id) +``` + +:star2: You can now benchmark your models easily across many datasets at once. 
Training and evaluating a model on a task is called a run: + +```python +from sklearn import neighbors + +task = openml.tasks.get_task(403) +clf = neighbors.KNeighborsClassifier(n_neighbors=5) +run = openml.runs.run_model_on_task(clf, task) +``` + +:raised_hands: You can now publish your experiment on OpenML so that others can build on it: + +```python +myrun = run.publish() +print(f"kNN on {task.get_dataset().name}: {myrun.openml_url}") +``` + + +## Learning more about OpenML + +Next, check out the :rocket: [10 minute tutorial](../notebooks/getting_started.ipynb) and the :mortar_board: [short description of OpenML concepts](../concepts/index.md). \ No newline at end of file diff --git a/docs/notebooks/getting_started.ipynb b/docs/notebooks/getting_started.ipynb index 1f65b4ce..ab700f91 100644 --- a/docs/notebooks/getting_started.ipynb +++ b/docs/notebooks/getting_started.ipynb @@ -49,7 +49,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Getting Started\n", + "# OpenML in 10 minutes\n", "\n", "This page will guide you through the process of getting started with OpenML. 
While this page is a good starting point, for more detailed information, please refer to the [integrations section](Scikit-learn/index.md) and the rest of the documentation.\n", "\n" diff --git a/mkdocs-local.yml b/mkdocs-local.yml index ce96f8c2..19644536 100644 --- a/mkdocs-local.yml +++ b/mkdocs-local.yml @@ -82,6 +82,12 @@ markdown_extensions: plugins: - autorefs - section-index + - mkdocs-jupyter: + ignore: ['temp_dir/**/*','docs/examples/**/*'] + theme: light + remove_tag_config: + remove_input_tags: + - hide_code - redirects: redirect_maps: 'APIs.md': 'https://www.openml.org/apis' @@ -98,9 +104,10 @@ plugins: - git-committers: repository: openml/docs nav: - - OpenML: - - Introduction: index.md - - Getting Started: notebooks/getting_started.ipynb + - OpenML: index.md + - Get Started: + - OpenML: intro/index.md + - 10 Minute Tutorial: notebooks/getting_started.ipynb - Concepts: - Main concepts: concepts/index.md - Data: concepts/data.md diff --git a/mkdocs.yml b/mkdocs.yml index 9a62216c..f13d3b3a 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -120,6 +120,8 @@ plugins: docstring_section_style: table show_docstring_functions: true docstring_style: numpy + follow_imports: false + show_submodules: false - gen-files: scripts: - scripts/gen_python_ref_pages.py @@ -131,9 +133,10 @@ plugins: - git-committers: repository: openml/docs nav: - - OpenML: - - Introduction: index.md - - Getting Started: notebooks/getting_started.ipynb + - OpenML: index.md + - Get Started: + - OpenML: intro/index.md + - 10 Minute Tutorial: notebooks/getting_started.ipynb - Concepts: - Main concepts: concepts/index.md - Data: concepts/data.md @@ -213,6 +216,7 @@ extra_css: - css/extra.css extra_javascript: - js/extra.js + - js/reset_nav.js exclude_docs: | scripts/ old/ diff --git a/requirements.txt b/requirements.txt index 70de847a..74a85356 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,14 +5,15 @@ mkdocs-redirects==1.2.1 mkdocs-jupyter==0.25.0 mkdocs-awesome-pages-plugin==2.9.3 
mkdocs-multirepo-plugin==0.8.3 -mkdocs-autorefs -mkdocs-section-index -mkdocs-gen-files -mkdocs-literate-nav -mkdocs-git-committers-plugin-2 -mkdocs-git-revision-date-localized-plugin -mkdocstrings -mkdocstrings-python -markdown-include +mkdocs-autorefs==1.2.0 +mkdocs-section-index==0.3.9 +mkdocs-gen-files==0.5.0 +mkdocs-literate-nav==0.6.1 +mkdocs-git-committers-plugin-2==2.5.0 +mkdocs-git-revision-date-localized-plugin==1.3.0 +mkdocstrings==0.26.2 +mkdocstrings-python==1.12.1 +markdown-include==0.8.1 notebook==6.4.12 -tqdm \ No newline at end of file +jupyter_contrib_nbextensions==0.7.0 +tqdm diff --git a/scripts/gen_python_ref_pages.py b/scripts/gen_python_ref_pages.py index c6fc321a..c89eb8e5 100644 --- a/scripts/gen_python_ref_pages.py +++ b/scripts/gen_python_ref_pages.py @@ -12,43 +12,42 @@ import os import shutil -# Move the python code and example folders into the root folder. This is necessary because the literate-nav has very strong -# opinions on where the files should be located. It refuses to work from the temp_dir directory. -def copy_folders_to_destinations(source_folders:list[str], destination_folders:list[str]): - """ - Copies folders from source to specified destinations and overwrites if they already exist. - - Parameters: - - source_folders (list of str): List of paths to the source folders. - - destination_folders (list of str): List of full paths to the target directories, including the new folder names. 
- """ - if len(source_folders) != len(destination_folders): - return +# Clean a folder completely +def clean_folder(folder: Path): + if folder.exists() and folder.is_dir(): + shutil.rmtree(folder) - # Copy each folder to its specified destination - for src, dest in zip(source_folders, destination_folders): - # Ensure the parent directory of the destination path exists - os.makedirs(os.path.dirname(dest), exist_ok=True) - - # Remove the folder if it already exists - if os.path.exists(dest): - shutil.rmtree(dest) - - # Copy the folder - shutil.copytree(src, dest) - -temp_dir = Path(__file__).parent.parent / "temp_dir" / "python" +root = Path(__file__).parent.parent +temp_dir = root / "temp_dir" / "python" + +# Destination folders +destination_folders = [ + root / "docs" / "python", + root / "docs" / "examples", + root / "openml", +] + +# Clean all destination folders +for folder in destination_folders: + clean_folder(folder) + +# Source folders source_folders = [ temp_dir / "docs", - temp_dir / "openml", temp_dir / "examples", + temp_dir / "openml", ] -destination_folders = [ - Path(__file__).parent.parent / "docs" / "python", - Path(__file__).parent.parent / "openml", - Path(__file__).parent.parent / "docs" / "examples" # Move them straight here to avoid duplication. mkdocs-jupyter will handle them. -] -copy_folders_to_destinations(source_folders, destination_folders) + +# Copy source to destination +def copy_folders(source_folders: list[Path], destination_folders: list[Path]): + if len(source_folders) != len(destination_folders): + raise ValueError("Source and destination lists must have the same length.") + + for src, dest in zip(source_folders, destination_folders): + if src.exists(): + shutil.copytree(src, dest) + +copy_folders(source_folders, destination_folders) # Generate the reference page docs nav = mkdocs_gen_files.Nav()
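
The clean-then-copy behaviour that `scripts/gen_python_ref_pages.py` adopts in this change can be sketched in isolation as follows. This is a minimal sketch rather than the script itself: `replace_folder` is a hypothetical helper name, and the paths in the usage example are throwaway directories created only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def replace_folder(src: Path, dest: Path) -> bool:
    """Hypothetical helper: make dest an exact copy of src.

    Returns True if src existed and was copied, False otherwise.
    """
    # Remove any stale copy first so files deleted upstream do not linger.
    if dest.is_dir():
        shutil.rmtree(dest)
    # A missing source is skipped, which leaves dest absent.
    if not src.is_dir():
        return False
    # copytree creates dest (and any missing parents) itself.
    shutil.copytree(src, dest)
    return True


if __name__ == "__main__":
    # Illustrative usage with temporary directories (assumed layout).
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "temp_dir" / "python" / "docs"
        src.mkdir(parents=True)
        (src / "index.md").write_text("fresh content")
        copied = replace_folder(src, root / "docs" / "python")
        print("copied:", copied)
```

Removing the destination before copying, instead of calling `shutil.copytree(..., dirs_exist_ok=True)`, means that files deleted from the source also disappear from the destination. The trade-off is that a missing source leaves the destination empty rather than keeping the previous copy.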