Open Source Heartbeat


vsoch commented on issue kubernetes-sigs/scheduler-plugins#722.

> 120 seems too much as a general default value IMO. Actually in addition to plugin-level config, it also honors PodGroup-level config, which can be specified in the PodGroup spec, and it takes precedence over the plugin-level one: …

View Comment

vsoch commented on issue spack/spack#43331.

This is still failing almost all our builds and updates, almost every night, reliably. :cry: …

View Comment

vsoch pushed to singularityhub/shpc-registry

Merge pull request #220 from singularityhub/update/containers-2024-04-18

[bot] update/containers-2024-04-18

View Commit

vsoch pushed to researchapps/kueue

Merge pull request #791 from kubernetes-sigs/dependabot/go_modules/github.com/kubeflow/common-0.4.7

Bump github.com/kubeflow/common from 0.4.6 to 0.4.7

View Commit

vsoch commented on issue kubernetes-sigs/kueue#2001.

hmm the rebase won’t work because it adds my user to all the previous commits. …

View Comment

vsoch pushed to converged-computing/operator-experiments

experiment 10: complete run of all schedulers on same cluster

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#29.

Ping @johanneskoester can you review again?…

View Comment

vsoch pushed to researchapps/wg-image-compatibility

add subject to artifact

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to researchapps/kueue

review: aldo

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened issue kubernetes-sigs/scheduler-plugins#722.

Rejection due to timeout / unreserve

Hi! I want to make sure I’m not doing anything wrong. I bring up a new cluster on GKE: …

View Comment

vsoch commented on kubernetes-sigs/kueue

View Comment

vsoch commented on issue flux-framework/flux-k8s#74.

Failure due to controller entry point change from 5 days ago. Hopefully won…

vsoch pushed to converged-computing/operator-experiments

kueue is working!

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to vsoch/vsoch.github.io

Update 2024-04-15-seeing-yourself.md

View Commit

vsoch pushed to researchapps/flux-sched

qmanager: Preserve reservations across sched-loop iterations

Problem: Currently we remove all of the temporary reservations created in sched-fluxion-resource to backfill jobs. This has a side effect in that we can’t use that information as start-time estimates for pending jobs.

Preserve those temporary reservations across two consecutive schedule loop iterations.

View Commit

vsoch commented on opencontainers/wg-image-compatibility

View Comment

vsoch commented on issue flux-framework/flux-sched#1178.

Woot!! Ping @trws :green_circle: …

View Comment

vsoch created a new branch, add-ssl at converged-computing/rainbow

View Repository

vsoch pushed to singularityhub/shpc-registry

Automated deployment to update containers 2024-04-15 (#219)

Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to researchapps/wg-image-compatibility

add use cases so brandon is happy

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on opencontainers/wg-image-compatibility

View Comment

vsoch commented on issue opencontainers/wg-image-compatibility#15.

To be clear for this comment: https://github.com/opencontainers/wg-image-compatibility/pull/15#discussion_r1555142734 …

View Comment

vsoch commented on issue flux-framework/flux-python#11.

Sure thing, thanks for the notice! The flux versions are moving very quickly these days. Going to close the issue - please re-open or comment if something else comes up….

View Comment

vsoch pushed to flux-framework/Tutorials

Merge pull request #36 from flux-framework/fix-flux-tree

hotfix: flux-tree stat -> stats

View Commit

vsoch pushed to converged-computing/ensemble-operator

test: adding testing setup (#13)

  • test: adding testing setup

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch merged a pull request to converged-computing/ensemble-experiments

View Pull Request

vsoch pushed to rseng/software

Merge pull request #370 from rseng/update/software-2024-04-14

Update from update/software-2024-04-14

View Commit

vsoch pushed to flux-framework/Tutorials

Removes old notebook (#35)

View Commit

vsoch released 0.0.2.

## What’s Changed

  • gke: allow customization of autoscaling strategy by @vsoch in https://github.com/converged-computing/kubescaler/pull/21

Full Changelog: https://github.com/converged-computing/kubescaler/compare/0.0.19...0.0.2

View Comment

vsoch pushed to converged-computing/kubescaler

add support for node scale request

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/ensemble-operator

wip: testing different orderings (#12)

  • wip: testing different orderings

Problem: randomize is limited for ordering. Solution: change the randomize “boolean” into an order variable that can take several forms. We will want to test this to see how the order of jobs can impact an ensemble with autoscaling.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
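The ordering change in ensemble-operator #12 swaps a yes/no `randomize` flag for an `order` value that can take several forms. As a rough illustration only, here is a minimal Python sketch of that pattern; the function `order_jobs` and the values `"ascending"`, `"descending"`, and `"random"` are hypothetical and are not the operator's actual fields.

```python
import random


def order_jobs(jobs, order="ascending", seed=None):
    """Return a new list of jobs arranged by the requested strategy.

    order is a string instead of a boolean so that more strategies can be
    added later. The accepted values here are hypothetical examples.
    """
    if order == "ascending":
        return sorted(jobs)
    if order == "descending":
        return sorted(jobs, reverse=True)
    if order == "random":
        shuffled = list(jobs)
        # A seeded generator makes a "random" ordering reproducible.
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown order: {order!r}")


print(order_jobs([3, 1, 2]))                # [1, 2, 3]
print(order_jobs([3, 1, 2], "descending"))  # [3, 2, 1]
```

Using a string rather than a boolean leaves room for new strategies without adding another flag, and passing a seed keeps a random ordering reproducible between experiment runs.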

vsoch pushed to converged-computing/ensemble-operator

preparing to run experiments

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/ensemble-experiments

Fix title of run3 plot

View Commit

vsoch created a new branch, main at converged-computing/ensemble-experiments

View Repository

vsoch opened a pull request to spack/spack

View Pull Request

vsoch pushed to researchapps/flux-sched

test: init flywheel in tests and try fwinit

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to flux-framework/spack

Automated deployment to update flux-core versions 2024-04-13 (#176)

Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch commented on issue flux-framework/flux-sched#1169.

There is also something called a key value flyweight, I wonder if we need to use that for some of the subsystem maps? https://www.boost.org/doc/libs/1_79_0/libs/flyweight/example/key_value.cpp. I also don’t know the difference between when they show: …

View Comment

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #113 from flux-framework/release-docs-2024-04-13

Update from release-docs-2024-04-13

View Commit

vsoch pushed to flux-framework/Tutorials

Cleanup and improve text for RIKEN tutorial (#34)

  • Fixes some URLs
  • Updates notebook “goals”
  • Updates description of job throughput fig and adds proper figure numbering/captions
  • Adds proper figure numbers and captions to module 3
  • Adds fig numbers, captions, and footnotes to every figure in tutorial
  • Updates refs to figures in text to correspond to figure numbering
  • Updates verb tenses, fixes a DYAD figure, and adds a link to survey

View Commit

vsoch pushed to flux-framework/Tutorials

Merge pull request #32 from TauferLab/riken-dyad-hotfix

Hotfix for a bug in DyadTorchDataLoader

View Commit

vsoch pushed to researchapps/flux-sched

fix: convert strings to boost::flyweight

Problem: string comparison is immensely inefficient, taking 6-10% of resources (reported by Tom) for a trace. Solution: start with the root of the issue in the scoring module and work backwards to convert string types to boost::flyweight. I tried to have a minimal footprint here, but it spread really quickly.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
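The boost::flyweight commit describes replacing repeated string comparisons with shared, interned values. The following is a simplified Python stand-in for that idea, not boost::flyweight itself and not flux-sched code; the `Interner` class and the example strings are hypothetical. Each distinct string is stored once and mapped to a small integer, so equality checks become integer comparisons.

```python
class Interner:
    """Store each distinct string once and hand out small integer ids.

    A simplified, hypothetical analogue of the flyweight idea; it is not
    boost::flyweight and not taken from flux-sched.
    """

    def __init__(self):
        self._ids = {}       # string -> id
        self._strings = []   # id -> string

    def intern(self, s):
        """Return the id for s, assigning the next free id on first sight."""
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, i):
        """Recover the original string for an id."""
        return self._strings[i]


names = Interner()
a = names.intern("containment")
b = names.intern("containment")
c = names.intern("power")
print(a == b, a == c)    # True False
print(names.lookup(c))   # power
```

Comparing two integers takes constant time, whereas comparing two strings may require walking them character by character, which is the kind of overhead the commit message attributes to string comparison.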

vsoch created a new branch, add-boost-flyweight at researchapps/flux-sched

View Repository

vsoch opened a pull request to flux-framework/flux-sched

View Pull Request

vsoch commented on issue flux-framework/flux-sched#1169.

Some tiny progress! Thanks to @milroy for seeing this. Here is the first failure to build (this is for the focal build) …

View Comment

vsoch pushed to converged-computing/rainbow

Merge pull request #30 from converged-computing/add/devcontainer

dev: add vscode development container and docs for usage

View Commit

vsoch pushed to converged-computing/kubescaler

formatting

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch

fix: cos credentials only for settings

Problem: the COS container and credentials should be exposed in the executor settings. Solution: add them there.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Automated deployment to update containers 2024-04-11 (#218)

Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/fluxterm

typo: ridiculous in README

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/fluxterm

fix: release workflow

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened a pull request to spack/spack

View Pull Request

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#29.

huh, but if it works for you that’s great! Let’s get @johanneskoester to try it out for another test….

View Comment

vsoch pushed to rseng/rsepedia-analysis

Merge pull request #116 from rseng/update/analysis-2024-04-10

Update from update/analysis-2024-04-10

View Commit

vsoch pushed to flux-framework/spack

Automated deployment to update flux-sched versions 2024-04-10 (#174)

Signed-off-by: github-actions github-actions@users.noreply.github.com Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/ensemble-operator

adding experiment

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#46.

Also if you are new to rebasing, I made a very dumb video a few years ago, haha. https://youtu.be/9F4RE2_yn6I …

View Comment

vsoch pushed to sciworks/spack-updater

fix: add gpg init to spack

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to flux-framework/spack

test: buildcache for libsodium (#170)

  • test: buildcache for libsodium

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened issue flux-framework/flux-k8s#73.

Test using Active queue "Activate Siblings" vs Current approach

Coscheduling uses a strategy of moving siblings to the active Q when a pod that is about to hit a node hits the Permit endpoint. The strategy I have in place to schedule the first pod seems to be working OK, but I’d like to (after we merge the current PR) test this approach. I can see pros and cons to both ways - having to rely on another queue (subject to other issues) seems less ideal than having them all scheduled at the right time. On the other hand, if something might happen with the latter approach that warrants the active queue, maybe it makes sense. I think empirical testing can help us determine which strategy we like best (or even a combination of the two)…

View Comment

vsoch opened issue flux-framework/flux-k8s#71.

Update fluence to go 1.20 or 1.21

We are going to hit issues using fluence (go 1.19) with other integrations like rainbow (go 1.20) and on our systems (go 1.20), and after #69 should consider updating. …

View Comment

vsoch commented on issue flux-framework/flux-core#5862.

> Additionally, if there are any other tips or workarounds for building and testing Flux within a Singularity container without sudo privileges, I’d like to hear them. …

View Comment

vsoch merged a pull request to singularityhub/shpc-registry

View Pull Request

vsoch reviewed an opencontainers/wg-image-compatibility pull request

View Review

vsoch pushed to converged-computing/rainbow

docs: separate early from current designs

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to rseng/software

Merge pull request #369 from rseng/update/software-2024-04-07

Update from update/software-2024-04-07

View Commit

vsoch opened a pull request to converged-computing/rainbow

View Pull Request

vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch

fix: make get_snakefile return rel path to snakefile (#40) (#41)

Should fix #39. Previously #40

Co-authored-by: Cade Mirchandani cmirchan@ucsc.edu

View Commit

vsoch commented on issue snakemake/snakemake-executor-plugin-googlebatch#29.

That…

vsoch pushed to converged-computing/rainbow

Merge pull request #27 from converged-computing/add-state-endpoint

feat: state endpoint

View Commit

vsoch created a new branch, add-constraint-select-algorithm at converged-computing/rainbow

View Repository

vsoch pushed to compspec/jobspec-go

fix: bug with omitemtpy->omitempty

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to compspec/jobspec

Merge pull request #10 from compspec/allow-parameter-properties

fix: nest parameter properties in their own section

View Commit

vsoch pushed to flux-framework/flux-k8s

test: only allow scheduling first pod

Problem: we currently allow any pod in the group to make the request. Solution: Making a BIG assumption that might be wrong, I am adding logic that only allows scheduling (meaning going through PreFilter with AskFlux) given that we see the first pod in the listing. In practice this is the first index (e.g., index 0), which based on our sorting strategy (timestamp then name) I think might work. But I am not 100% on that. The reason we want to do that is so the nodes are chosen for the first pod, and then the group can quickly follow and be actually assigned. Before I did this I kept seeing huge delays in waiting for the queue to move (e.g., 5/6 pods Running and the last one waiting, and then kicking in much later like an old car), and I think with this tweak that is fixed. But this is my subjective evaluation. I am also adding in the hack script for deploying to GKE, which requires a push instead of a kind load.

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit
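The flux-k8s change only lets the first pod of a group go through PreFilter with AskFlux, where “first” means sorted by timestamp and then by name. A minimal Python sketch of that ordering rule follows, assuming a hypothetical `Pod` record that holds only a name and a creation time; real Kubernetes pods carry far more metadata, and the helper names `sort_group` and `may_ask_flux` are illustrative, not fluence's code.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Pod:
    # Hypothetical minimal pod record: just a name and a creation time.
    name: str
    created: datetime


def sort_group(pods):
    """Sort a pod group by creation timestamp, then by name to break ties."""
    return sorted(pods, key=lambda p: (p.created, p.name))


def may_ask_flux(pod, group):
    """Return True only if this pod is first (index 0) in the sorted group."""
    ordered = sort_group(group)
    return bool(ordered) and ordered[0].name == pod.name


# Example: three pods created at the same instant, so the name decides order.
t = datetime(2024, 4, 1, 12, 0, 0)
group = [Pod("job-1", t), Pod("job-0", t), Pod("job-2", t)]
print([p.name for p in sort_group(group)])  # ['job-0', 'job-1', 'job-2']
print(may_ask_flux(group[1], group))        # True
```

The name tie-break keeps the order deterministic when several pods in a group share the same creation timestamp.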

vsoch opened a pull request to converged-computing/ensemble-operator

View Pull Request

vsoch pushed to rseng/devstories

tweak episode notes

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue oras-project/oras-py#129.

@my5cents looks like you just need one more run of black and we’re good (take note of the version)….

View Comment

vsoch pushed to flux-framework/flux-operator

update gozmq example (#225)

  • update gozmq example

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to flux-framework/flux-docs

Merge pull request #267 from flux-framework/add-architecture-slides

Add flux components slides

View Commit

vsoch commented on issue flux-framework/flux-docs#267.

Thanks @garlick ! I’ll get started on these changes and ping you when they are ready for a second review. I really appreciate it!…

View Comment

vsoch commented on flux-framework/Tutorials

View Comment

vsoch commented on issue sustainable-computing-io/peaks#9.

Perfect, thank you! Is there a link to that somewhere (prominently) here?…

View Comment

vsoch opened a pull request to spack/spack

View Pull Request

vsoch pushed to rseng/rsepedia-analysis

Merge pull request #115 from rseng/update/analysis-2024-04-03

Update from update/analysis-2024-04-03

View Commit

vsoch reviewed an oras-project/oras-py pull request

View Review

vsoch pushed to flux-framework/spack

Merge pull request #166 from flux-framework/release/flux-core-v0.61.0

Update from release/flux-core-v0.61.0

View Commit

vsoch pushed to flux-framework/flux-framework.github.io

Merge pull request #111 from flux-framework/release-docs-2024-04-03

Update from release-docs-2024-04-03

View Commit

vsoch pushed to converged-computing/rainbow-experiments

add linear model with memory features

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch commented on issue sustainable-computing-io/peaks#9.

Hi! Is work still underway here? I am interested in the project idea but I don’t see any custom scheduler plugin code (is it somewhere else)? Thanks! …

View Comment

vsoch commented on issue spack/spack-infrastructure#795.

Thank you!…

View Comment

vsoch pushed to opencontainers/specs.opencontainers.org

Merge pull request #6 from Nicceboy/main

README: fix website url

View Commit

vsoch pushed to converged-computing/rainbow-experiments

add linear and log linear regression for spack builds

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch opened issue converged-computing/rainbow#25.

The gnomes have things to say

I hate to disappoint, there are no cookies here. :cookie: :cookie: …

View Comment

vsoch pushed to converged-computing/operator-experiments

fluence: first successful set of runs without clogging

Signed-off-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to singularityhub/shpc-registry

Automated deployment to update containers 2024-04-01 (#215)

Co-authored-by: github-actions github-actions@users.noreply.github.com

View Commit

vsoch pushed to converged-computing/rainbow-experiments

Add reliabuild version range experiments (#1)

  • wip: running reliabuild experiments

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • refactor: match based on versions

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • wip: refactored reliabuild experiments

This experiment has become primarily about the build use case, and specifically package versions as the compatibility metadata. Since we cannot generate every perfect cluster, I think the result is going to be an example of how adding too much (in terms of requirements) leads to poorer outcomes. Of course, it is entirely based on the number of clusters I made, etc. I currently have only 100 clusters for about 10K jobs and over 200 dependency metadata (of course not every one is relevant for every package), so the odds of getting a matching cluster are pretty slim when you start asking for more detail. Hopefully folks can help to think of smarter experiments too.

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • results: add raw files in case I do something stupid

Signed-off-by: vsoch vsoch@users.noreply.github.com

  • add results

Signed-off-by: vsoch vsoch@users.noreply.github.com


Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com

View Commit

vsoch pushed to rseng/software

Merge pull request #368 from rseng/update/software-2024-03-31

Update from update/software-2024-03-31

View Commit

vsoch opened a pull request to converged-computing/rainbow-experiments

View Pull Request

vsoch opened issue converged-computing/rainbow#24.

Add robust logging library to control verbosity

As my experiments are getting very large, I am commenting out debugging messages so the terminal doesn’t explode. It would be better to use a logging library proper…

View Comment

vsoch commented on issue singularityhub/singularity-cli#220.

I think it…