vsoch pushed to converged-computing/metrics-operator-experiments
new testing container for hpcg
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/metrics-operator-experiments
add work for today
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
boost: Add 1.85.0 (#43788)
- boost: Add 1.85.0
- Add conflict for Boost 1.85.0 stacktrace change
vsoch pushed to converged-computing/metrics-containers
remove pennant
moving into metrics-operator-experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #223 from singularityhub/update/containers-2024-04-29
[bot] update/containers-2024-04-29
vsoch pushed to compspec/jobspec
feat: finishing up group in group example
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #372 from rseng/update/software-2024-04-28
Update from update/software-2024-04-28
vsoch pushed to flux-framework/spack
containers: Add Fedora 40, 39 (#43847)
vsoch pushed to converged-computing/metrics-operator-experiments
experiment: pennant on gke
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/metrics-containers
add pennant with gpu/flux
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/jobspec
wip: saving state
My computer is acting funky, do not want to lose this work.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Merge pull request #222 from singularityhub/update/containers-2024-04-25
[bot] update/containers-2024-04-25
vsoch pushed to converged-computing/ensemble-experiments
add batch experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
fix: bug with requeue (#16)
Problem: if there is a timeout error, we need to requeue.
Solution: do that.
Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com
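The requeue-on-timeout idea in the commit above can be sketched in a few lines of Python. This is not the operator's code: `Result`, `reconcile`, `check_queue`, and the 5-second retry are hypothetical names and values chosen only to show the pattern of treating a timeout as transient and asking to be run again.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical outcome of one reconcile step."""
    requeue: bool = False
    requeue_after: float = 0.0  # seconds to wait before the next attempt


def reconcile(check_queue, retry_seconds=5.0):
    """Run one step; request a requeue if the check times out."""
    try:
        check_queue()
    except TimeoutError:
        # A timeout is treated as transient: try again later.
        return Result(requeue=True, requeue_after=retry_seconds)
    return Result()


def flaky():
    """Stand-in for a call that times out."""
    raise TimeoutError("no response in time")


print(reconcile(flaky))
print(reconcile(lambda: None))
```

Returning a result object, rather than retrying inside the function, leaves the decision about when to run again with the caller.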
vsoch pushed to converged-computing/scheduler-sniffer
Merge pull request #4 from converged-computing/add-plot-prototype
feat: prototype for scheduler sniffer vis
vsoch pushed to researchapps/flux-sched
reapi c++: add satisfy endpoint
Problem: the c++ bindings do not have satisfy support.
Solution: add satisfy to them.
In practice I was adding this for exposure to the Go bindings, but I do not think it is necessary, because the Go bindings use the C bindings, which already have reapi_cli_match_satisfy. I saw that match_allocate seems to have support to provide the SATISFIABILITY match_op, which is provided to the traverser, so I tried to call that same function. I am opening this PR in case it is interesting or useful. If not, please close and disregard.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to researchapps/flux-sched
Merge pull request #1178 from researchapps/refactor-qp-base
queue_base_manager: refactor to remove impl
vsoch pushed to singularityhub/shpc-registry
Merge pull request #220 from singularityhub/update/containers-2024-04-18
[bot] update/containers-2024-04-18
vsoch pushed to researchapps/kueue
Merge pull request #791 from kubernetes-sigs/dependabot/go_modules/github.com/kubeflow/common-0.4.7
Bump github.com/kubeflow/common from 0.4.6 to 0.4.7
vsoch pushed to converged-computing/operator-experiments
experiment 10: complete run of all schedulers on same cluster
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to researchapps/wg-image-compatibility
add subject to artifact
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to researchapps/kueue
review: aldo
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/operator-experiments
kueue is working!
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to vsoch/vsoch.github.io
Update 2024-04-15-seeing-yourself.md
vsoch pushed to researchapps/flux-sched
qmanager: Preserve reservations across sched-loop iterations
Problem: Currently we remove all of the temporary reservations created in sched-fluxion-resource to backfill jobs. This has a side effect in that we can’t use that information as start-time estimates for pending jobs.
Preserve those temporary reservations across two consecutive schedule loop iterations.
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-04-15 (#219)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to researchapps/wg-image-compatibility
add use cases so brandon is happy
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/Tutorials
Merge pull request #36 from flux-framework/fix-flux-tree
hotfix: flux-tree stat -> stats
vsoch pushed to converged-computing/ensemble-operator
test: adding testing setup (#13)
- test: adding testing setup
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #370 from rseng/update/software-2024-04-14
Update from update/software-2024-04-14
vsoch pushed to flux-framework/Tutorials
Removes old notebook (#35)
vsoch pushed to converged-computing/kubescaler
add support for node scale request
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
wip: testing different orderings (#12)
- wip: testing different orderings
Problem: randomize is limited for ordering.
Solution: change randomize “boolean” into an order variable that can take several forms. We will want to test this to see how order of jobs can impact an ensemble with autoscaling.
Signed-off-by: vsoch vsoch@users.noreply.github.com
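Replacing a boolean `randomize` flag with an order variable might look like the following Python sketch. The accepted values (`"random"`, `"ascending"`, `"descending"`), the `order_jobs` name, and the (name, size) job tuples are assumptions made for illustration; the commit only says the variable "can take several forms".

```python
import random


def order_jobs(jobs, order="random", seed=None):
    """Return (name, size) job pairs in the requested order."""
    if order == "random":
        shuffled = list(jobs)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if order == "ascending":
        return sorted(jobs, key=lambda job: job[1])
    if order == "descending":
        return sorted(jobs, key=lambda job: job[1], reverse=True)
    raise ValueError(f"unknown order: {order}")


jobs = [("lammps-a", 4), ("lammps-b", 1), ("lammps-c", 2)]
print(order_jobs(jobs, "ascending"))
print(order_jobs(jobs, "descending"))
```

A string-valued option leaves room for further orderings without changing the function signature, which a boolean cannot do.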
vsoch pushed to converged-computing/ensemble-operator
preparing to run experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-experiments
Fix title of run3 plot
vsoch pushed to researchapps/flux-sched
test: init flywheel in tests and try fwinit
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
Automated deployment to update flux-core versions 2024-04-13 (#176)
Signed-off-by: github-actions github-actions@users.noreply.github.com
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #113 from flux-framework/release-docs-2024-04-13
Update from release-docs-2024-04-13
vsoch pushed to flux-framework/Tutorials
Cleanup and improve text for RIKEN tutorial (#34)
- Fixes some URLs
- Updates notebook “goals”
- Updates description of job throughput fig and adds proper figure numbering/captions
- Adds proper figure numbers and captions to module 3
- Adds fig numbers, captions, and footnotes to every figure in tutorial
- Updates refs to figures in text to correspond to figure numbering
- Updates verb tenses, fixes a DYAD figure, and adds a link to survey
vsoch pushed to flux-framework/Tutorials
Merge pull request #32 from TauferLab/riken-dyad-hotfix
Hotfix for a bug in DyadTorchDataLoader
vsoch pushed to researchapps/flux-sched
fix: convert strings to boost::flyweight
Problem: string comparison is immensely inefficient, taking 6-10% of resources (reported by Tom) for a trace.
Solution: start with the root of the issue in the scoring module and work backwards to convert string types to boost::flyweight. I tried to have a minimal footprint here but it spread really quickly.
Signed-off-by: vsoch vsoch@users.noreply.github.com
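The reasoning behind a flyweight is that equal values share one stored copy, so comparing two of them reduces to comparing references. boost::flyweight is a C++ library; the Python sketch below uses `sys.intern` as a loosely analogous mechanism, and the helper `make_type` is a hypothetical name.

```python
import sys


def make_type(parts):
    """Build a resource type name at runtime and intern it."""
    return sys.intern("".join(parts))


# Two names built independently, with equal contents.
a = make_type(["co", "re"])
b = make_type(["c", "ore"])

# Interned strings with equal contents are the same object,
# so an identity check can stand in for a character comparison.
print(a is b)
```

The trade-off is a lookup when each value is created, in exchange for cheaper comparisons afterwards.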
vsoch pushed to converged-computing/rainbow
Merge pull request #30 from converged-computing/add/devcontainer
dev: add vscode development container and docs for usage
vsoch pushed to converged-computing/kubescaler
formatting
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
fix: cos credentials only for settings
Problem: the COS container and credentials should be exposed in the executor settings.
Solution: add them there.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-04-11 (#218)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to converged-computing/fluxterm
typo: ridiculous in README
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/fluxterm
fix: release workflow
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #116 from rseng/update/analysis-2024-04-10
Update from update/analysis-2024-04-10
vsoch pushed to flux-framework/spack
Automated deployment to update flux-sched versions 2024-04-10 (#174)
Signed-off-by: github-actions github-actions@users.noreply.github.com
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
adding experiment
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to sciworks/spack-updater
fix: add gpg init to spack
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
test: buildcache for libsodium (#170)
- test: buildcache for libsodium
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/rainbow
docs: separate early from current designs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #369 from rseng/update/software-2024-04-07
Update from update/software-2024-04-07
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
fix: make get_snakefile return rel path to snakefile (#40) (#41)
Should fix #39. Previously #40
Co-authored-by: Cade Mirchandani cmirchan@ucsc.edu
vsoch pushed to converged-computing/rainbow
Merge pull request #27 from converged-computing/add-state-endpoint
feat: state endpoint
vsoch pushed to compspec/jobspec-go
fix: bug with omitemtpy->omitempty
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/jobspec
Merge pull request #10 from compspec/allow-parameter-properties
fix: nest parameter properties in their own section
vsoch pushed to flux-framework/flux-k8s
test: only allow scheduling first pod
Problem: we currently allow any pod in the group to make the request.
Solution: Making a BIG assumption that might be wrong, I am adding logic that only allows scheduling (meaning going through PreFilter with AskFlux) given that we see the first pod in the listing. In practice this is the first index (e.g., index 0), which based on our sorting strategy (timestamp then name) I think might work. But I am not 100% on that. The reason we want to do that is so the nodes are chosen for the first pod, and then the group can quickly follow and be actually assigned. Before I did this I kept seeing huge delays in waiting for the queue to move (e.g., 5/6 pods Running and the last one waiting, and then kicking in much later like an old car) and I think with this tweak that is fixed. But this is my subjective evaluation.
I am also adding in the hack script for deploying to gke, which requires a push instead of a kind load.
Signed-off-by: vsoch vsoch@users.noreply.github.com
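The "first pod only" rule above depends on a deterministic sort of the group (timestamp, then name). A minimal Python sketch of that gate follows; `Pod`, `sort_group`, and `may_ask_flux` are hypothetical names, and the real plugin logic is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    created: float  # creation timestamp in seconds


def sort_group(pods):
    """Sort by creation timestamp, breaking ties by name."""
    return sorted(pods, key=lambda pod: (pod.created, pod.name))


def may_ask_flux(pod, group):
    """Only the pod at index 0 of the sorted group may trigger scheduling."""
    return sort_group(group)[0] == pod


group = [Pod("job-2", 10.0), Pod("job-0", 10.0), Pod("job-1", 9.0)]
for pod in group:
    print(pod.name, may_ask_flux(pod, group))
```

Because every pod computes the same sorted listing, exactly one member of the group passes the gate.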
vsoch pushed to rseng/devstories
tweak episode notes
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-operator
update gozmq example (#225)
- update gozmq example
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-docs
Merge pull request #267 from flux-framework/add-architecture-slides
Add flux components slides
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #115 from rseng/update/analysis-2024-04-03
Update from update/analysis-2024-04-03
vsoch pushed to flux-framework/spack
Merge pull request #166 from flux-framework/release/flux-core-v0.61.0
Update from release/flux-core-v0.61.0
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #111 from flux-framework/release-docs-2024-04-03
Update from release-docs-2024-04-03
vsoch pushed to converged-computing/rainbow-experiments
add linear model with memory features
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to opencontainers/specs.opencontainers.org
Merge pull request #6 from Nicceboy/main
README: fix website url
vsoch pushed to converged-computing/rainbow-experiments
add linear and log linear regression for spack builds
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/operator-experiments
fluence: first successful set of runs without clogging
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-04-01 (#215)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to converged-computing/rainbow-experiments
Add reliabuild version range experiments (#1)
- wip: running reliabuild experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com
- refactor: match based on versions
Signed-off-by: vsoch vsoch@users.noreply.github.com
- wip: refactored reliabuild experiments
This experiment has become primarily about the build use case, and specifically package versions as the compatibility metadata. Since we cannot generate every perfect cluster, I think the result is going to be an example of how adding too much (in terms of requirements) leads to poorer outcomes. Of course it is entirely based on the number of clusters I made, etc. I currently have only 100 clusters for about 10K jobs and over 200 dependency metadata (of course not every one is relevant for every package) so the odds of getting a matching cluster are pretty slim when you start asking for more detail. Hopefully folks can help to think of smarter experiments too.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- results: add raw files in case I do something stupid
Signed-off-by: vsoch vsoch@users.noreply.github.com
- add results
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com
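The observation in the entry above, that asking for more compatibility detail shrinks the set of matching clusters, can be illustrated with a toy Python example. The cluster names, packages, and versions are invented for illustration and are not the experiment's data; `matching_clusters` is a hypothetical helper that uses exact version equality rather than the version ranges the experiment refers to.

```python
def matching_clusters(clusters, requirements):
    """Return names of clusters providing every required package version."""
    return [
        name
        for name, packages in clusters.items()
        if all(packages.get(pkg) == version for pkg, version in requirements.items())
    ]


# Invented toy data: cluster name -> installed package versions.
clusters = {
    "a": {"zlib": "1.3", "openmpi": "4.1"},
    "b": {"zlib": "1.3", "openmpi": "5.0"},
    "c": {"zlib": "1.2", "openmpi": "4.1"},
}

print(matching_clusters(clusters, {"zlib": "1.3"}))
print(matching_clusters(clusters, {"zlib": "1.3", "openmpi": "4.1"}))
```

Each added requirement can only remove candidates, so the match list never grows as the request becomes more specific.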
vsoch pushed to rseng/software
Merge pull request #368 from rseng/update/software-2024-03-31
Update from update/software-2024-03-31
vsoch pushed to converged-computing/rainbow
docs: move prototype to bottom
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/Tutorials
add: flux riken tutorial 2024
This adds the new directory for the Flux Riken tutorial, with the following additions:
- flux-tree was removed from flux-sched and is added here
- tutorial files were kept in rse-ops, are now moved here
- New tutorial content: flux tree and hierarchy section/examples
- New tutorial content: flux archive (previously flux filemap)
- images that show a dummy example of job throughput
- update of names in login page / directory to be more general
- automated builds updated for riken
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/singularity-compose
ci: restore tests in gha, linting, license bump (#70)
- ci: restore tests in gha, linting, license bump
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-03-28 (#214)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/flux-operator
Merge pull request #224 from flux-framework/add-gomq-example
example: gozmq for pair to pair
vsoch pushed to flux-framework/flux-docs
ci: test python 3.11 for readthedocs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #114 from rseng/update/analysis-2024-03-27
Update from update/analysis-2024-03-27
vsoch pushed to flux-framework/spack
build(deps): bump actions/setup-python from 5.0.0 to 5.1.0 (#43384)
vsoch pushed to converged-computing/gomq
example: add sampler to sink pattern
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-03-25 (#213)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to researchapps/wg-image-compatibility
proposal f: simple artifact with metadata attributes
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
feat: scaling for workload-demand algorithm
It is working! This is just really cool - I can launch a workload (a matrix of jobs) and then watch it wait a few increments, and then the cluster scales according to the rules that I set. I have not yet implemented scaling down, but can do that next.
Signed-off-by: vsoch vsoch@users.noreply.github.com
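The behavior described above (wait a few increments, then scale up according to rules) could be expressed as a small decision function. The rule here is an assumption made for illustration, as are the names `Decision` and `make_decision` and the default of three checks: scale up by the number of waiting jobs, capped at a maximum size, once jobs have waited long enough. Scaling down is left out, as in the commit.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "wait" or "scale-up"
    size: int    # desired cluster size after the decision


def make_decision(waiting_jobs, current_size, max_size, checks_waited, wait_checks=3):
    """Scale up only after jobs have waited for `wait_checks` consecutive checks."""
    ready = waiting_jobs > 0 and checks_waited >= wait_checks
    if ready and current_size < max_size:
        return Decision("scale-up", min(max_size, current_size + waiting_jobs))
    return Decision("wait", current_size)


print(make_decision(waiting_jobs=2, current_size=4, max_size=8, checks_waited=1))
print(make_decision(waiting_jobs=2, current_size=4, max_size=8, checks_waited=3))
```

Requiring several consecutive checks before acting avoids resizing the cluster in response to a queue that would have drained on its own.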
vsoch pushed to rseng/software
Merge pull request #367 from rseng/update/software-2024-03-24
Update from update/software-2024-03-24
vsoch pushed to converged-computing/ensemble-operator
wip: add workload demand
This is the starting setup for our first algorithm! Right now, we are defining jobs in the CRD, and the minicluster is made interactive so we do not need to worry about adding random sleeps or waits, etc. We also have validation and member type support for each algorithm (starting just with minicluster). Where I am at now is that we are instantiating the workload-demand algorithm (it is an interface) and it is receiving input from the gRPC sidecar, and then in the “MakeDecision” function we have our jobs spec (from the CRD) and queue status. Next (after my run) I am going to work on this interaction! We will want the algorithm to prepare a response payload that directs the gRPC service to run all the lammps jobs, and then on the operator side we will update the jobs spec to indicate they are run (decrementing the count) and on the next iteration check the queue status again for size, etc., and adjust as needed. This is still early but very cool so far.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to vsoch/vsoch.github.io
add post on distributed fractal
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to hpc-social/jobs
try restoring jobs to feed
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/ensemble-operator
typos
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-operator
Merge pull request #223 from flux-framework/add-citation-cff
add citation cff file and update readme/index
vsoch pushed to vsoch/vsoch.github.io
add flux operator post
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to snakemake/snakemake-executor-plugin-kueue
fix: underlying plugin argument for storage args removed (#9)
Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
Update aws-ofi-nccl to use the hwloc option (#43287)
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #110 from flux-framework/add-flux-operator-paper
pub: flux operator
vsoch pushed to converged-computing/rainbow-experiments
add simulation!
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to compspec/jobspec
Merge pull request #8 from compspec/tweaks-to-schema
tweaks to schema
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #113 from rseng/update/analysis-2024-03-20
Update from update/analysis-2024-03-20
vsoch pushed to flux-framework/spack
py-snoop: new package (#42945)
vsoch pushed to converged-computing/rainbow-experiments
add notes on simulation metadata and plan for experiment
I will likely start the actual simulation running tomorrow, kind of tired tonight.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/rainbow
dep: remove jobspec from rainbow
Will be maintained in separate jobspec library.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/spack
py-optax: add new version (#43169)
vsoch pushed to converged-computing/rainbow
feat: support for script/batch directive in jobspec
Problem: many tasks will require logic beyond a one off command.
Solution: allow the user to define a script or batch directive that, when found, will be written to file first. This is an experimental feature.
Signed-off-by: vsoch vsoch@users.noreply.github.com
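Writing a script to file before running it could look like the Python sketch below. The task fields (`name`, `script`, `command`), the `.sh` suffix, and the use of `/bin/bash` are assumptions for illustration, not the jobspec schema.

```python
import os
import tempfile


def prepare_command(task, workdir):
    """Return the command to run, writing the script to file first if one is given."""
    script = task.get("script")
    if script is None:
        # No script directive: fall back to the plain command.
        return task["command"]
    path = os.path.join(workdir, task.get("name", "job") + ".sh")
    with open(path, "w") as handle:
        handle.write(script)
    os.chmod(path, 0o755)
    return ["/bin/bash", path]


with tempfile.TemporaryDirectory() as workdir:
    task = {"name": "hello", "script": "#!/bin/bash\necho hello\n"}
    print(prepare_command(task, workdir))
```

Tasks without a script take the same code path as before, so existing one-off commands are unaffected.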
vsoch pushed to compspec/jobspec-go
Merge pull request #6 from compspec/add/scripts
update jobspec for experimental types
vsoch pushed to compspec/jobspec
Merge pull request #7 from compspec/add-stage-copy
Testing out idea for settings
vsoch pushed to singularityhub/shpc-registry
manual update run for march 2024 (#211)
Signed-off-by: vsoch vsochat@stanford.edu
vsoch pushed to flux-framework/flux-operator
Merge pull request #222 from flux-framework/remove-refactor-condifigs
cleanup: remove old refactor configs
vsoch pushed to converged-computing/distributed-fractal
Merge pull request #6 from converged-computing/bug-missing-image
fix: bug that image is not generated
vsoch pushed to compspec/jobspec
badge: add pypi badge
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rse-ops/lammps-matrix
scheduler: save state of experiments
I spent a few days on this, and ultimately am abandoning this approach to use lammps. I do not have it in me to debug 18 different lammps installs on common clusters. It worked previously because we could deploy the exact environment alongside lammps that the container was built with (and thus it ran). Trying to remove that extra layer, and just run these on the base OS is a huge undertaking that I have decided is too much anguish to do.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to researchapps/dragonboat-example
typo
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/rainbow
Merge pull request #17 from converged-computing/add/compatibility-artifact-to-jobspec-converter
feat: support to convert from artifact to jobspec
vsoch pushed to compspec/schemas
Merge pull request #9 from compspec/separate-supercontainers
refactor: separating supercontainers into resource types
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-03-14 (#209)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/flux-k8s
update: adding back in fluence logic
Problem: fluence is missing!
Solution: add back fluence. This is a different design in that we do the asking from the perspective of the pod group, meaning that we get back a full set of nodes, and save them (assigned exactly) to specific pods. This could also be more lenient - e.g., creating a cache of the list and then taking off the cache, but I like the finer granularity of 1:1 mapping for future issues that might arise (where one pod needs a new node). This design also introduces a nice feature that we can ask for the resources (meaning creating a jobspec) for exactly what we need across pods for the group because we are listing all pods for the group before we generate the jobspec. I left it as it currently was before (using one representative pod) to not incur too many changes but this definitely can be tried.
There is likely more work to be done to test edge cases and account for resources when fluence starts (and be able to load a state if it restarts) but this is pretty great for a first shot! The local lammps experiment ran without clogging and I am testing on GKE as a next step. Finally, I think there is a lot of potential error in allowing a ton of other PreFilter plugins to exist, each of which could return their own set of nodes to consider that might mismatch what fluence has decided on. For this reason I have done aggressive pruning and we can add things back as we see fit.
Signed-off-by: vsoch vsoch@users.noreply.github.com
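The 1:1 mapping from pods to nodes described above can be sketched in Python as follows. `assign_nodes` is a hypothetical helper; sorting the pods by name and rejecting a short node list are assumptions chosen to keep the example deterministic.

```python
def assign_nodes(pod_names, nodes):
    """Map each pod in a group to exactly one node returned for the group."""
    if len(nodes) < len(pod_names):
        raise ValueError("not enough nodes for the group")
    # Sort the pods so the pairing is stable across calls.
    return dict(zip(sorted(pod_names), nodes))


assignment = assign_nodes(["pod-1", "pod-0"], ["node-a", "node-b"])
print(assignment)
```

Keeping an explicit pod-to-node record, instead of popping from a shared list, is what would allow a single pod to be given a new node later without disturbing the rest of the group.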
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #112 from rseng/update/analysis-2024-03-13
Update from update/analysis-2024-03-13
vsoch pushed to flux-framework/spack
abinit: add version 9.10.5 (#43148)
Set the FC variable to the MPI Fortran compiler and also set the F90 variable to the same compiler for versions 9.8 and up. FC needs to be set because the configure script still uses FC.
vsoch pushed to converged-computing/flux-usernetes
work to do same allreduce on two nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/flux-usernetes
preparing to run ML example
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-03-11 (#208)
Co-authored-by: github-actions github-actions@users.noreply.github.com
vsoch pushed to flux-framework/spack
fix typo in dependency (#43105)
vsoch pushed to converged-computing/rainbow
adding missing jobspec for io
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/software
Merge pull request #365 from rseng/update/software-2024-03-10
Update from update/software-2024-03-10
vsoch pushed to flux-framework/flux-restful-api
Merge pull request #67 from flux-framework/contributors/update-2024-03-10
[tributors] contributors/update-2024-03-10
vsoch pushed to converged-computing/flux-usernetes
preparing to estimate times for osu benchmarks (#5)
- preparing to estimate times for osu benchmarks
Signed-off-by: vsoch vsoch@users.noreply.github.com
- final tweaks for experiment tomorrow
going to be a long day, but worth it I think. I can do it.
Signed-off-by: vsoch vsoch@users.noreply.github.com
- clean up dockerfiles
Signed-off-by: vsoch vsoch@users.noreply.github.com
Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/rainbow
fix: broken image link in docs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to converged-computing/rainbow
feat: add register -> nodes in graph endpoint
This adds the registration command for the subsystem, ensuring that the subsystem root is created (e.g., for IO) and the cluster name (e.g., IO for cluster keebler) created off of that. We currently allow any vertex to be created with an edge to itself (within the same subsystem) OR to a reference in the dominant subsystem. We have that reference bidirectional so if/when there is a delete command we can parse the subsystem nodes (e.g., IO) to find the link to the dominant subsystem node, and then clean it up from the other side (leaving no dangling entries). Now that this is added I can work on a prototype of intents (basically a jobspec asking to request resources for this).
Signed-off-by: vsoch vsoch@users.noreply.github.com
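The reason for making the reference bidirectional is that a delete can then start from the subsystem node and remove the link from both sides. A minimal Python sketch, with a hypothetical `Graph` class and invented node identifiers:

```python
from collections import defaultdict


class Graph:
    """Toy adjacency-set graph with bidirectional references."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, subsystem_node, dominant_node):
        # Store the reference in both directions.
        self.edges[subsystem_node].add(dominant_node)
        self.edges[dominant_node].add(subsystem_node)

    def delete(self, subsystem_node):
        # Follow the stored links to clean up the other side too.
        for other in self.edges.pop(subsystem_node, set()):
            self.edges.get(other, set()).discard(subsystem_node)


graph = Graph()
graph.link("io/keebler/mount0", "cluster/keebler/node0")
graph.delete("io/keebler/mount0")
print(dict(graph.edges))
```

With a one-directional edge, the dominant node would have to be found by scanning the whole graph before its stale reference could be removed.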
vsoch pushed to converged-computing/flux-usernetes
Merge pull request #4 from converged-computing/fix-hpc7g
fix hpc7g
vsoch pushed to compspec/jobspec-go
feat: add experimental jobspec (#4)
This adds the resources section to tasks, and it is intended to be flexible to allow serializing only.
Signed-off-by: vsoch vsoch@users.noreply.github.com
Co-authored-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to snakemake/snakemake-storage-plugin-gcs
fix: repair GCS query string (#26)
This addresses #19, #18 and #25.
vsoch pushed to jairav/snakemake-storage-plugin-gcs
Update pyproject.toml
fix: gcs storage query schema fix. toml file reverted to remove manual version number.
vsoch pushed to flux-framework/spack
Add 5.022 and cleanup how autoconf is done (#43085)
lint: formatting (#7)
- lint: formatting
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/devstories-episodes-2
episode 95: alan sill
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to rseng/devstories
episode 95: alan sill
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-k8s
refactor: testing idea to wrap coscheduling
This is the “skeleton” of a new idea to wrap coscheduling, adding in the logic for fluence only where it is needed, likely in the PodGroup (in the new fluence/core/core that wraps the same in coscheduling). This is just a skeleton because we are deploying the sidecar with the wrapped scheduling and absolutely no logic ported over to AskFlux. I think I have a sense of where to put this, but wanted to save this vanilla/skeleton state in case we need to go back to it. Note that it did not work to have fluence inherit the functions from coscheduler, so I opted for a strategy of adding it as a helper field, and then just using it when necessary.
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #106 from flux-framework/add-publications-2024
Add publications 2024
vsoch pushed to singularityhub/shpc-registry
Remove missing tag.
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com>
vsoch pushed to converged-computing/rainbow
Merge pull request #14 from converged-computing/add-basic-satisfy-request
feat: support for basic satisfy logic
vsoch pushed to converged-computing/rainbow
feat: python support for receive jobs
Signed-off-by: vsoch vsoch@users.noreply.github.com
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #108 from flux-framework/release-docs-2024-03-05
Update from release-docs-2024-03-05
vsoch pushed to singularityhub/shpc-registry
Merge pull request #206 from singularityhub/update/containers-2024-03-04
[bot] update/containers-2024-03-04
vsoch pushed to rseng/software
Merge pull request #364 from rseng/update/software-2024-03-03
Update from update/software-2024-03-03
vsoch pushed to flux-framework/spack
Fix mgard: OpenMP on AppleClang (#42933)
macOS AppleClang does not provide OpenMP by default with XCode. Use LLVM’s OpenMP to fix compile errors of mgard with OpenMP (default).
vsoch pushed to converged-computing/rainbow
feat: add support for assignment table
When the graph database returns clusters that satisfy a jobspec, they need to be redirected to the rainbow cluster to be assigned. This is a two step process, where first we add the jobid to the jobs table (and it is not assigned) and then we will add it to an assignment table with each cluster id that can receive it. At this point we are going to ask the clusters if they can satisfy (and when) to assign it, and I do think we need a push model for the assignment. We will need to allow for failure to connect (and retry) and some kind of heartbeat to do that, but I first need to think about how a cluster can have work pushed to it - likely we need a special client running there that I have not written yet.
Signed-off-by: vsoch vsoch@users.noreply.github.com
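The two-step process above (record the job as unassigned, then list every candidate cluster in an assignment table) can be sketched with an in-memory SQLite database. The table and column names and the `submit` function are assumptions for illustration; the actual rainbow schema is not shown in this entry.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (jobid INTEGER PRIMARY KEY, jobspec TEXT, cluster TEXT)")
db.execute("CREATE TABLE assignments (jobid INTEGER, cluster TEXT)")


def submit(jobspec, candidate_clusters):
    """Record a job as unassigned, then list each cluster that could receive it."""
    # Step 1: add the job with no cluster assigned yet.
    cursor = db.execute(
        "INSERT INTO jobs (jobspec, cluster) VALUES (?, NULL)", (jobspec,)
    )
    jobid = cursor.lastrowid
    # Step 2: one assignment row per candidate cluster.
    db.executemany(
        "INSERT INTO assignments (jobid, cluster) VALUES (?, ?)",
        [(jobid, cluster) for cluster in candidate_clusters],
    )
    return jobid


jobid = submit("{}", ["keebler", "snap"])
print(db.execute("SELECT * FROM assignments").fetchall())
```

Keeping candidates in a separate table means the job row stays unassigned until one cluster confirms it can satisfy the request.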
vsoch pushed to flux-framework/spack
libogg does not build a shared library with cmake (#42877)
- when built with cmake, libogg does not build with a shared library by default. This resolves that
- spack style fixes
- Clean up imports
- enforce +pic when +shared
vsoch pushed to converged-computing/flux-usernetes
bug: fix efa (#3)
- bug: fix efa
Problem: efa (via the basic test) is not working. Solution: we needed to add a “self” for ingress and egress to the security groups. efa is still not working with MPI but that can be a next step to figure out.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #205 from singularityhub/update/containers-2024-02-29
[bot] update/containers-2024-02-29</small>
vsoch pushed to researchapps/go-hwloc
Update shared (#1)
- test: building without libgcc
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-flux
Merge pull request #1 from compspec/add-tests
feat: add testing and standalone mode</small>
vsoch pushed to compspec/compspec
Merge pull request #20 from compspec/add-create-plugin
add support for more generic creation</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #110 from rseng/update/analysis-2024-02-28
Update from update/analysis-2024-02-28</small>
vsoch pushed to rse-ops/lammps-matrix
add simulation
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/go-hwloc
test removing go.mod
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-go
Merge pull request #27 from compspec/wip-for-rainbow-register
wip: integration with rainbow</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #204 from singularityhub/update/containers-2024-02-26
[bot] update/containers-2024-02-26</small>
vsoch pushed to compspec/jobspec-go
typo: JobSpec -> Jobspec
Problem: you had one job, Vanessa, one job… Solution: go sit in the corner and think about your bad capitalization life decisions.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
[laghos] Add a patch for MPI_Session (#42841)
vsoch pushed to converged-computing/rainbow
Merge pull request #9 from converged-computing/thinking-graph-submission
[wip] preparing to integrate scheduler</small>
vsoch pushed to converged-computing/operator-experiments
testing fluence on gke for clogging
clog does not happen at sizes 2, but does happen at 2 and 3. It could be state between the runs or something else.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-go
wip: integration with rainbow
This set of small changes is to support node generation for the rainbow scheduler. There was a small bug in the way I was serializing the cluster graph: it omitted the top-level “graph” key. I have also cleaned up the passing of arguments for the plugins, and the paths for containment were slightly off. I will leave this PR open as I figure out how to actually accept the graph in rainbow. I wanted to use fluxion, but I realize there is no support for extending or growing a graph, so I would not be able to add more than one cluster.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
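To illustrate the serialization bug mentioned above, here is a minimal sketch of a cluster graph wrapped under a top-level “graph” key, in the style of JSON Graph Format. The node and edge fields (`metadata`, `type`, `relation`) and the function name are assumptions for illustration, not the exact output of compspec-go.

```python
import json


def serialize_cluster_graph(nodes, edges):
    """Wrap nodes and edges under the top-level "graph" key."""
    return json.dumps({"graph": {"nodes": nodes, "edges": edges}}, indent=2)


# Hypothetical cluster: one node that contains one core.
nodes = [
    {"id": "0", "metadata": {"type": "node", "name": "node0"}},
    {"id": "1", "metadata": {"type": "core", "name": "core0"}},
]
edges = [{"source": "0", "target": "1", "relation": "contains"}]
payload = serialize_cluster_graph(nodes, edges)
```

A consumer that expects this layout would fail to find `nodes` and `edges` if they were written at the top level instead.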
vsoch pushed to compspec/compspec-go
wip: add support for node extraction -> cluster metadata
Problem: we need to extract metadata for nodes and then parse it into a cluster graph. Solution: create a compspec create nodes subcommand.
In this PR I am adding a ClusterGraph, which still needs work to improve the output so that it maps easily into JGF (right now it has elements that can support any type and need further parsing). I am also generalizing the idea of plugins, so we will have extractors and converters (that run create), but I need to finalize the design for the latter; right now the create commands are very separate. I am opening the PR sooner rather than later in case my computer explodes. A few problems I have run into: NFD does not have cpu counts, let alone physical vs. logical. This information is in /proc/cpuinfo for x86 but not arm. We also do not have a way to get the socket -> core mapping. So we likely do need to add the hwloc extractor, and provide an automated build for it since it requires hwloc on the system. I will put some thought into this.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
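As a rough illustration of the physical vs. logical distinction above, the sketch below counts both from x86-style /proc/cpuinfo text. It assumes the `processor`, `physical id`, and `core id` fields are present, which (as the commit notes) is typically not the case on arm; the function name is hypothetical and this is not the compspec-go extractor.

```python
def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts from /proc/cpuinfo-style text.

    physical is None when the socket/core fields are missing.
    """
    logical = 0
    cores = set()
    # Each logical processor is a block of "key : value" lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # A physical core is identified by its (socket, core) pair.
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    physical = len(cores) if cores else None
    return logical, physical
```

On a real x86 host this could be called as `count_cpus(open("/proc/cpuinfo").read())`. It does not recover the full socket -> core mapping, which is the gap an hwloc-based extractor would fill.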
vsoch pushed to converged-computing/half-baked
add more testing files
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories
update feed action to checkout v4
vsoch pushed to hpc-social/good-first-issues
Update main.yml
vsoch pushed to hpc-social/good-first-issues
Update main.yml
vsoch pushed to compspec/compspec-ior
release: version bump
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-go
Merge pull request #22 from compspec/test-nfd-source
dep: refactor to use nfd-source that does not require k8s</small>
vsoch pushed to compspec/compspec
docs: update spec.md location
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/half-baked
add half-baked lammps
eww, like who eats lammps? the scientists, joe. They eat it all up.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add proposal D
Proposal D is an extension to Proposal C. Proposal C defines an explicit example of a compatibility artifact, meaning what a single artifact would look like paired alongside an image in a registry (in some way) to describe its compatibility for image selection or similar. Proposal D defines a compatibility schema that is maintained by a compatibility interest group, for which the goal is to define the namespace of allowed metadata attributes and relationships between them. These two proposals are complementary and would work together to allow for validation and understanding of relationships between terms, but without adding complexity to the compatibility artifact (Proposal C) directly.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/spec
org: rename abi to asp since it is generic
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to compspec/compspec-go
Merge pull request #21 from compspec/rebrand-compspec
rename: supercontainers -> compspec</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #201 from singularityhub/update/containers-2024-02-19
[bot] update/containers-2024-02-19</small>
vsoch pushed to flux-framework/spack
build(deps): bump python-levenshtein in /lib/spack/docs (#42630)
Bumps python-levenshtein from 0.24.0 to 0.25.0.
updated-dependencies:
- dependency-name: python-levenshtein dependency-type: direct:production update-type: version-update:semver-minor …
Signed-off-by: dependabot[bot] support@github.com Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com></small>
vsoch pushed to flux-framework/flux-k8s
Merge pull request #63 from flux-framework/refactor-fluence-podgroup
[WIP] fluence: refactor to use new PodGroup</small>
vsoch pushed to flux-framework/flux-k8s
Merge pull request #62 from flux-framework/attempt-add-webhook-2
Attempt add webhook 2</small>
vsoch pushed to flux-framework/fluxion-go
Merge pull request #4 from flux-framework/add-build-matrix
test: adding testing matrix for os release matrix</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
chore(main): release 0.3.0 (#23)
:robot: I have created a release beep boop
## 0.3.0 (2024-02-15)
Features
- add simple example with cos (#22) (2951454)
- add support for Google Batch GPUs (#26) (f2af21c)
- expose network policy interfaces (#28) (41c8d44)
- preemption (#25) (d6913a1)
- support for boot disk type, size, and image (#27) (d5de5a1)
This PR was generated with Release Please. See documentation.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com></small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #199 from singularityhub/update/containers-2024-02-15
[bot] update/containers-2024-02-15</small>
vsoch pushed to singularityhub/install-singularity
Merge pull request #4 from sebastiangrimberg/main
Silence warning for setup-go>=v4</small>
vsoch pushed to converged-computing/rainbow
Merge pull request #8 from converged-computing/add-global-token
add support for global token and kind example</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #108 from rseng/update/analysis-2024-02-14
Update from update/analysis-2024-02-14</small>
vsoch pushed to rse-ops/lammps-matrix
add note for singularity runs
the main issue here is needing to do binds for mpi. There might be a good / easy way to do that with that mpi4py thing / other I forget the name of!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add proposal D
Proposal D is an extension to Proposal C. Proposal C defines an explicit example of a compatibility artifact, meaning what a single artifact would look like paired alongside an image in a registry (in some way) to describe its compatibility for image selection or similar. Proposal D defines a compatibility schema that is maintained by a compatibility interest group, for which the goal is to define the namespace of allowed metadata attributes and relationships between them. These two proposals are complementary and would work together to allow for validation and understanding of relationships between terms, but without adding complexity to the compatibility artifact (Proposal C) directly.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
Merge pull request #3 from converged-computing/add-accept-endpoint
add endpoint to accept jobs</small>
vsoch pushed to singularityhub/shpc-registry
Merge pull request #198 from singularityhub/update/containers-2024-02-12
[bot] update/containers-2024-02-12</small>
vsoch pushed to flux-framework/flux-go
docs: add in image for readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
add job submit commands, database, and new client
This change adds JobSubmit to the proto and to both the client and the server. At this point we can request that a job be submitted to a specific cluster, and the token that was generated when the cluster registered is required to “authenticate.” We then validate those things and add the job to the database! Next we need a small client to run from within a flux instance and check for jobs assigned to it; when it receives one, the job will be removed from the database. I think I want to make flux-core “bindings” for Go first.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/rainbow
add database and basic auth flow
This will add an actual sqlite database backend to the server, meaning that a register request will provide a cluster name and a secret, and then get a token back that can be used for subsequent requests. I will next work on the job submission, and then we can hook in an actual flux instance and give it a whirl by adding a poll command to get the jobspec.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
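The register flow described above can be sketched as follows. This assumes a single shared registration secret and a hypothetical `clusters` table; the function names and the choice of a random hex token are illustrative assumptions, not the rainbow server implementation (which is written in Go).

```python
import hmac
import secrets
import sqlite3


def init_db(conn):
    # Hypothetical table mapping a cluster name to its issued token.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clusters ("
        "name TEXT PRIMARY KEY, token TEXT NOT NULL)"
    )


def register(conn, name, secret, expected_secret):
    """Check the registration secret, then issue and store a token."""
    if not hmac.compare_digest(secret.encode(), expected_secret.encode()):
        raise PermissionError("invalid registration secret")
    token = secrets.token_hex(16)
    conn.execute("INSERT INTO clusters (name, token) VALUES (?, ?)", (name, token))
    conn.commit()
    return token


def is_authorized(conn, name, token):
    """Validate the token a cluster presents on subsequent requests."""
    row = conn.execute(
        "SELECT token FROM clusters WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and hmac.compare_digest(row[0].encode(), token.encode())
```

A later request, such as a job submission, would call `is_authorized` with the cluster name and token before touching the database.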
vsoch pushed to supercontainers/compspec-go
add node feature discovery (nfd)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add requirements to proposal d
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/half-baked
add rellume example
rellume seems to take byte code (in the example it is on the level of a function) and compile it to llvm ir. This might be too small/scoped for what we want to do, but could still offer a means to test something out, so worth including. The container added here (pushed with tag “rellume”) builds the software and examples.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
Merge pull request #1120 from milroy/lib-updates
Build reapi_cli as a shared library</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-02-09 (#197)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add requirements to proposal d
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/half-baked
add note from dinos
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/urlchecker-feedstock
Fix missing selenium requirement and add pip check. (#19)
-
Add missing selenium requirement.
-
Add pip check.
-
Bump build number.</small>
vsoch pushed to singularityhub/singularity-hpc
try again
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/lammps-matrix
add results
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
julia: fix self-referential dependencies (#42486)
vsoch pushed to converged-computing/half-baked
add minicluster example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-02-05 (#193)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
update CV with the bare metal bros!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to supercontainers/compspec-go
add arch command for variant of metadata
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/lammps-matrix
add two missing builds - total should be 18
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/urlchecker-feedstock
urlchecker v0.0.35 (#18)
-
updated v0.0.35
-
MNT: Re-rendered with conda-build 3.28.4, conda-smithy 3.30.4, and conda-forge-pinning 2024.02.03.22.07.48</small>
vsoch pushed to supercontainers/compspec
changed my mind - io.archspec!
vsoch pushed to rse-ops/hpc-apps
Merge pull request #30 from rse-ops/add/mpi-graph
add mpigraph prototype</small>
vsoch pushed to researchapps/flux-sched
fix: clean up go docstrings for auto format
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
Add files via upload
vsoch pushed to flux-framework/spack
add gmsh v4.12.2 (#42375)
vsoch pushed to converged-computing/operator-experiments
add prototype run for lscpu/lstopo on google cloud size vcpu4 vms
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
rocky builds fail adding pmix
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add proposal D
Proposal D is an extension to Proposal C. Proposal C defines an explicit example of a compatibility artifact, meaning what a single artifact would look like paired alongside an image in a registry (in some way) to describe its compatibility for image selection or similar. Proposal D defines a compatibility schema that is maintained by a compatibility interest group, for which the goal is to define the namespace of allowed metadata attributes and relationships between them. These two proposals are complementary and would work together to allow for validation and understanding of relationships between terms, but without adding complexity to the compatibility artifact (Proposal C) directly.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
fix: clean up go docstrings for auto format
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
py-frozendict: patch up for Python 3.11 (#42192)
- py-frozendict: patch up for Python 3.11
See also Marco-Sulla/python-frozendict#68, rely on a pure Python implementation when 3.11+ is used.
- mention related Github issue</small>
vsoch pushed to conda-forge/oras-py-feedstock
oras-py v0.1.27 (#24)
-
updated v0.1.27
-
MNT: Re-rendered with conda-build 3.28.4, conda-smithy 3.30.4, and conda-forge-pinning 2024.01.26.23.27.14</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-25 (#190)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Apply black 2024 style to Spack (#42317)
vsoch pushed to supercontainers/compspec-go
Merge pull request #8 from supercontainers/add-check-prototype
add example prototype for check</small>
vsoch pushed to flux-framework/flux-operator
add aws example for flux-restful service sidecar
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to supercontainers/compspec
Add/supercontainers os (#5)
- add generic available for supercontainers gpu
- add simple representation for os information
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-26 (#191)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to oras-project/oras-py
Update update-contributors.yaml
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to vsoch/scif
Merge pull request #72 from vsoch/add/pypi-release-workflow
add workflow to release to pypi</small>
vsoch pushed to supercontainers/compspec
Merge pull request #4 from supercontainers/simplify-mpi
simplify MPI</small>
vsoch pushed to supercontainers/compspec-go
add: kernel extractor basic example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #105 from rseng/update/analysis-2024-01-24
Update from update/analysis-2024-01-24</small>
vsoch pushed to researchapps/flux-sched
build: don’t look for libpmi.so
Problem: the build system looks for flux-core’s libpmi but it is unused.
Drop it from FindFluxCore.cmake.</small>
vsoch pushed to conda-forge/scif-feedstock
scif v0.0.82 (#7)
-
updated v0.0.82
-
MNT: Re-rendered with conda-build 3.28.4, conda-smithy 3.30.4, and conda-forge-pinning 2024.01.24.19.49.09</small>
vsoch pushed to rse-ops/lammps-matrix
add missing manifest-tool gpu builds
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/lammps-matrix
add image table and manifest tool
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to supercontainers/compspec
Merge pull request #1 from supercontainers/add-ci
ci: add schema validation and drafts</small>
vsoch pushed to researchapps/jobset-jupyter
Merge pull request #1 from kannon92/job-to-jobset
rename job to jobset for clarity</small>
vsoch pushed to flux-framework/flux-k8s
support for skeleton grpc server and service/ingress for external client
This adds a prototype support for an extra helm flag that dually enables adding an extra grpc set of endpoints, and then the configs (ingress and service) necessary to expose them. I next need to figure out how to interact with grpc from a local client, likely built from the same codebase and grpc spec. This is super cool!!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
path bug for prototypes/aws-0
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluence-kubectl
ensure graph summary multiplies each by count
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-k8s
support for skeleton grpc server and service/ingress for external client
This adds a prototype support for an extra helm flag that dually enables adding an extra grpc set of endpoints, and then the configs (ingress and service) necessary to expose them. I next need to figure out how to interact with grpc from a local client, likely built from the same codebase and grpc spec. This is super cool!!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/fluence-kubectl
need to stop for tonight
I am so far into this rabbit hole I am afraid if I do not stop I will not come out.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
Update snakemake_executor_plugin_googlebatch/executor.py
Co-authored-by: Johannes K
vsoch pushed to flux-framework/flux-k8s
support for skeleton grpc server and service/ingress for external client
This adds a prototype support for an extra helm flag that dually enables adding an extra grpc set of endpoints, and then the configs (ingress and service) necessary to expose them. I next need to figure out how to interact with grpc from a local client, likely built from the same codebase and grpc spec. This is super cool!!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-18 (#188)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
feat: add support for downsize
We can allow the cluster to downsize if the follower broker exits cleanly with 0, without needing the broker index max completions attribute, which is enabled with a feature gate and requires k8s 1.28. This change also adds support for a minSize cluster, which will work to start the quorum when fewer than the size workers are available. Note that this does not adjust the tasks given to a job, so it might assign too many tasks to too few workers. This also adds in the previous downsize workers example, except instead of using pkill for rockylinux we fall back to flux overlay disconnect, as pkill is not available by default. It is up to the user to ensure that the follower broker can be disconnected (and is not running anything). Finally, we add support for a flux->arch tag, specifically for an arm binary to be downloaded and used for the go-wait-fs command.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/goshare
add arm build
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #104 from rseng/update/analysis-2024-01-17
Update from update/analysis-2024-01-17</small>
vsoch pushed to researchapps/wg-image-compatibility
rename schemaVersion to version
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/sregistry
ci
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-k8s
add examples with lammps to reproduce error
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-k8s
logs: more for various steps to see what is going on
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
Update 2024-01-14-nuances-of-job-design.md
vsoch pushed to vsoch/vsoch.github.io
add post on job design
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-15 (#187)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to researchapps/wg-image-compatibility
add: proposal C for working compatibility group
This proposal is focused on a simple design for metadata about compatibility within either an existing manifest (or list) or a newly created artifact. It describes a plugin architecture and namespaced attributes (the metadata) that are maintained by compatibility interest groups, and a plugin framework that includes plugins for extracting, checking, and creation, and within each flexibility for simple compatibility checks (e.g., key/value pair matching) or more complex graph-based approaches.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
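The simple key/value matching mentioned in the proposal could be sketched like this. The attribute names (for example `io.archspec.cpu.target`) and image references are hypothetical placeholders, and this shows only the plain matching case, not the graph-based approach.

```python
def is_compatible(required, available):
    """True if every required attribute is present with an equal value."""
    return all(
        key in available and available[key] == value
        for key, value in required.items()
    )


def select_images(candidates, host):
    """Return image references whose required attributes the host satisfies.

    candidates maps an image reference to its required attributes.
    """
    return [
        image
        for image, required in candidates.items()
        if is_compatible(required, host)
    ]


# Hypothetical host metadata and per-image compatibility attributes.
host = {"io.archspec.cpu.target": "amd64", "hardware.gpu.available": "yes"}
candidates = {
    "example/app:gpu": {"hardware.gpu.available": "yes"},
    "example/app:arm": {"io.archspec.cpu.target": "arm64"},
}
selected = select_images(candidates, host)
```

Exact equality keeps the check trivial to validate; ranges or ordered relationships between values are where a graph-based check would come in.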
vsoch pushed to converged-computing/operator-experiments
restore iters back
vsoch pushed to snakemake/snakemake-executor-plugin-flux
Merge pull request #5 from snakemake/release-please--branches--main--components--snakemake-executor-plugin-flux
chore(main): release 0.1.0</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-11 (#186)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to opencontainers/wg-image-compatibility
Merge pull request #5 from mfranczy/organizers
Format the organizers in alphabetical order</small>
vsoch pushed to flux-framework/flux-k8s
wip: experimental work on fluence
I am testing adding the abstraction of a pod group to be carried from fluence labels (on the pods) through to the same memory cache we are currently using for the list of pods. As long as we store the timestamp there (and the cache is created once) I think this might be a good replacement, but I also might have just broken everything and it is a terrible idea.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Delete quay.io/pawsey/cuda-intel-hpc-python/container.yaml
This URI no longer has any tags associated.
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #103 from rseng/update/analysis-2024-01-10
Update from update/analysis-2024-01-10</small>
vsoch pushed to flux-framework/flux-operator
ensure that worker exits with 0 so pod cleans up
If we do a flux drain
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
clean up install of snakemake assets
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-k8s
wip testing strategies for pod grouping
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
hepmc3: add v3.2.7 (#41879)
Bugfix only, https://gitlab.cern.ch/hepmc/HepMC3/-/compare/3.2.6...3.2.7?from_project_id=6751&straight=false.</small>
vsoch pushed to converged-computing/metrics-operator-experiments
wip: experiments for google
the issue we are running into here is that lammps does not work on most instance types
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
add kueue snakemake post!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-plugin-kueue
feat: add support for flux operator (#6)
- wip: add support for flux operator
- prototype with flux operator working!
I have the MiniCluster created and the command running - the challenge now will be figuring out how to hand the long snakemake command to flux to actually run.
both hello world and lammps are working! We use the launcher approach, which gives the user more freedom to submit jobs as they need/like.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #104 from flux-framework/release-docs-2024-01-06
Update from release-docs-2024-01-06</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add spot testing prices for google cloud
There might be some interesting finds here! It will be good to test if lammps can run across spot next.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/cloud-select
Update update-contributors.yaml
vsoch pushed to flux-framework/flux-k8s
Merge pull request #51 from flux-framework/remove-backup-file
remove backup file main.go.bk</small>
vsoch pushed to converged-computing/operator-experiments
fluence is working again!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
json does not work either!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories-episodes-2
add wolf episode 93
vsoch pushed to rseng/devstories
add wolf episode
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/eksctl
fix unit test failures for pkg/actions/nodegroup/update_test.go
vsoch pushed to snakemake/snakemake-executor-plugin-kueue
Merge pull request #5 from snakemake/release-please--branches--main--components--snakemake-executor-plugin-googlebatch
chore(main): release 0.1.0</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
feat: add simple example with cos (#22)
We will likely want to test more complex workflows here, but this should be a good start!
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #102 from rseng/update/analysis-2024-01-03
Update from update/analysis-2024-01-03</small>
vsoch pushed to converged-computing/operator-experiments
add wip fluence/default sched experiments
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Change h5z-zfp from MakefilePackage to CMakePackage. (#41890)
Remove versions before 1.1.0 that do not support CMake. Remove patches for the removed versions.</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2024-01-01 (#183)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to researchapps/nixpkgs
maintainers: add vsoch
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
preparing to test single node runs in usernetes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/chonker-awards
Update README.md
vsoch pushed to singularityhub/shpc-registry-cache
update cache
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
CI: Fix timing search paths to ignore bootstrap (#40677)
vsoch pushed to flux-framework/flux-operator
add support for schedulerName
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
add small experiment to test fluence with flux operator
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/lammps-stream-ml
tweak docs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/lammps-stream-ml
I am not happy with these multiple scripts (#5)
- I am not happy with these multiple scripts
And think it is reasonable to install the api clients to the host for interaction
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #101 from rseng/update/analysis-2023-12-27
Update from update/analysis-2023-12-27</small>
vsoch pushed to converged-computing/flux-lima
set privileged to 0, I am sure this is a bad idea
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/lammps-stream-ml
Add k8s (#2)
- add kubernetes deployment
this will run the ml-service in kubernetes, and then run lammps alongside it from a singularity container. We randomly select parameters x,y,z from some range and then use that to train each of three models in the server. We then have another script that does the same to generate testing data, save the predictions alongside the actual values, and calculate an accuracy.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/django-river-ml
remove napoleon from docs build
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #362 from rseng/update/software-2023-12-24
Update from update/software-2023-12-24</small>
vsoch pushed to converged-computing/flux-lima
add pytorch with resources
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
add the tiny hurt
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
use variables for project and storage name
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/nixpkgs
Merge pull request #275263 from r-ryantm/auto-update/python310Packages.dissect-extfs
python310Packages.dissect-extfs: 3.6 -> 3.7</small>
vsoch pushed to converged-computing/flux-lima
ensure worker of pytorch is consistent with 5 epochs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flex-archspec
query is working, but problems with querying expressiveness
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-sched
add clear error message to output
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
add sha to metric-lammps container;
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
disable lead broker from running jobs
we need the experiment cluster size to be 6. The 7th node is for usernetes to install operators to, etc.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/archspec-go
feat: add parsing of compilers to archspec-go
Problem: we currently have compilers in the metadata but they are not parsed. Having compilers for use in the library would be hugely useful. Solution: add them to be parsed.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jsongraph-go
add janky json parser (#5)
- add json parser
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/jsongraph-go
Merge pull request #2 from converged-computing/add/node-init
ensure we add nodes on init</small>
vsoch pushed to converged-computing/flux-lima
install singularity faster way!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-core
maint: remove flux spack docker
Problem: the flux + spack docker images in /etc are not commonly used and would better be maintained elsewhere anyway. Solution: delete them from this repository (and will put elsewhere)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
environment modifications for externals (#41723)
- allow externals to configure environment modifications
- docs for external env modification
Co-authored-by: becker33 becker33@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
ensure we write the correct socket path to flux bashrc
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to hpc-social/hpc-social.github.io
Update update-contributors.yaml
vsoch pushed to flux-framework/flux-k8s
ci: add automated and on demand testing of fluence
Problem: we cannot tell if/when fluence builds will break against upstream Solution: have a weekly run that will build and test images, and deploy on successful results. For testing, I have added a complete example that uses Job for fluence/default-scheduler, and the reason is because we can run a container that generates output, have it complete, and there is no crash loop backoff or similar. I have added a complete testing setup using kind, and it is in one GitHub job so we can build both containers and load into kind, and then run the tests. Note that MiniKube does NOT appear to work for custom schedulers - I suspect there are extensions/plugins that need to be added. Finally, I was able to figure out how to programmatically check both the pod metadata for the scheduler along with events, and that combined with the output should be sufficient (for now) to test that fluence is working.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/slurm-operator
Merge pull request #4 from converged-computing/update/jobset
update slurm operator with new jobset</small>
vsoch pushed to converged-computing/flux-lima
ensure we add hostnames to /etc/hosts
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flex-ice-cream
fix: typos
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
fix markdown table formatting
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
test arm builds for current builds (#7)
- test arm builds for current builds (will need to do on aws instance)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
deny ssh for fluxuser
vsoch pushed to converged-computing/cyclecloud-flux
add note about conda-forge
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
ensure we run test for disable-view
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/cyclecloud-flux
add much faster build with conda-forge
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry-cache
update cache command
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
add arch build arg to container
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
update PR action to master branch
vsoch pushed to flux-framework/flux-operator
nginx example test needs to specify not to wrap the entrypoint
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
add custom metrics api example
This allows us to install the flux metrics api alongside the lead broker, meaning that with ingress and a selector service that exposes the pod with the lead broker, we can easily get real time stats about the cluster and queue!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
update scripts with more logic
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
update readme with summary and next steps
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/spython-feedstock
updated v0.3.13 (#52)
vsoch pushed to singularityhub/shpc-registry-cache
Update update-cache.yaml
vsoch pushed to vsoch/vsoch.github.io
Update 2023-12-09-resources-cgroups-kubernetes.md
vsoch pushed to vsoch/vsoch.github.io
add post on cgroups on kubernetes with gotchas
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #360 from rseng/update/software-2023-12-10
Update from update/software-2023-12-10</small>
vsoch pushed to researchapps/efa-device-plugin-helm
add support for hpc7g family of images
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
add work that shows controlling cgroups to assign more than one flux pod to a single physical node
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
pull request action update contributors use master branch
Signed-off-by: Vanessasaurus <814322+vsoch@users.noreply.github.com></small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-12-09 (#176)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
Testing new algorithm: adjust bounds based on max cost/core then randomize (#4)
- testing new algorithm
instead of just choosing the first by index, this approach chooses to remove the most instance size (decrement the max bound by one) for the most expensive instance type. This means in practice we tend to keep the cheaper ones (cost per core per hour) around and will run out of solutions (e.g., around 17). Instead we could also, when this happens, allow removing one at random. We will likely need to allow for duplicate results in there, as we are likely to iterate over the same bounds more than once. Would be good to discuss this!
- allow for randomizing instead (I like this better)!
eyeballing the data, with the randomization, about 75% of the solutions still use the cheapest core, but then 25% do not! That seems pretty good, and I think we can adjust things to tweak how we explore the space.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
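The two strategies described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the repository: the function name `shrink_bounds`, the instance names, and the prices are all hypothetical, and "decrement the max bound by one" is interpreted as lowering the maximum count allowed for one instance type.

```python
import random


def shrink_bounds(bounds, cost_per_core, rng=None):
    """Return a copy of bounds with one instance type's max count lowered by one.

    Without rng, the most expensive type (cost per core per hour) that still
    has a positive bound is chosen. With rng, a type is chosen at random.
    Returns None when every bound is already zero.
    """
    candidates = [name for name, upper in bounds.items() if upper > 0]
    if not candidates:
        return None
    if rng is None:
        target = max(candidates, key=lambda name: cost_per_core[name])
    else:
        target = rng.choice(candidates)
    updated = dict(bounds)
    updated[target] -= 1
    return updated


# Hypothetical instance types and prices, for illustration only.
bounds = {"type-a": 2, "type-b": 1}
cost = {"type-a": 0.03, "type-b": 0.02}

# Deterministic variant: always shrink the most expensive type first.
steps = []
current = bounds
while current is not None:
    steps.append(current)
    current = shrink_bounds(current, cost)

# Randomized variant: any type with a positive bound may be shrunk.
randomized = shrink_bounds(bounds, cost, rng=random.Random(0))
```

In the deterministic variant the cheaper type keeps its bound until the expensive one is exhausted, which matches the observation that the cheaper instances tend to stay around. The randomized variant gives up that ordering in exchange for exploring more of the space.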
vsoch pushed to snakemake/snakemake-executor-plugin-kueue
cleanup: tabs in readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories-episodes-2
Add files via upload
vsoch pushed to rseng/devstories
update date
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
llvm: reformulate a when condition to avoid tautology (#41461)
The condition on swig can be interpreted as “true if true, false if false” and gives clingo the option to add swig or not.
If no other optimization criteria break the tie, then the concretization is non-deterministic.</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #99 from rseng/update/analysis-2023-12-06
Update from update/analysis-2023-12-06</small>
vsoch pushed to flux-framework/spack
py-gidgetlab: add new package (#41338)
- gidgetlab: add new package
- Convert both cachetools and aiohttp to optional deps with variants
- Fix forgotten variant conditional on cachetools dependency
- Add git url and main version for dev workflows
- Fix variant and dependency ordering
- Remove cachetools variant and merge dependency with aiohttp variant</small>
vsoch pushed to converged-computing/kubescaler
remove waiter for nodes function so clear it is a bad idea
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/supercontainers.org
Spelling mistake in team page
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-12-04 (#175)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to researchapps/scheduler-plugins
networkAware plugins
vsoch pushed to rseng/software
Merge pull request #359 from rseng/update/software-2023-12-03
Update from update/software-2023-12-03</small>
vsoch pushed to converged-computing/metrics-operator-experiments
update spot instance script
I am stopping here because I do not see that spot prices are an improvement over the hpc instance types. E.g.,
- hpc7g: 64 physical cores, $1.68 per hour
- hpc6a: 96 physical cores, $2.88 per hour
It seems like this maybe gives us 1-3 instance types per filter group, assuming those are physical cores and not vCPU. We should continue this if we are interested in other instance sizes only.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
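The comparison in the commit above comes down to cost per physical core per hour. A minimal worked example, using only the two prices and core counts quoted in the commit message (the helper name `cost_per_core_hour` is made up for illustration):

```python
# Core counts and hourly prices as quoted in the commit message.
instances = {
    "hpc7g": {"cores": 64, "usd_per_hour": 1.68},
    "hpc6a": {"cores": 96, "usd_per_hour": 2.88},
}


def cost_per_core_hour(cores, usd_per_hour):
    """Hourly price divided by the number of physical cores."""
    return usd_per_hour / cores


per_core = {name: cost_per_core_hour(**spec) for name, spec in instances.items()}

# Print cheapest first.
for name, value in sorted(per_core.items(), key=lambda item: item[1]):
    print(f"{name}: ${value:.5f} per core-hour")
```

This works out to about $0.02625 per core-hour for hpc7g and $0.03 for hpc6a, which is the baseline any spot price would need to beat.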
vsoch pushed to flux-framework/spack
eccodes: add v2.32.0, v2.31.0 (#40770)
- eccodes new versions and dependencies
- Suggested changes for multiple variant defaults
- Update var/spack/repos/builtin/packages/eccodes/package.py
Co-authored-by: Sergey Kosukhin skosukhin@gmail.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add support for function to monitor activity of spot instances
We want a separate thread that can control watching the spot instances, and keeping track of when they appear and go away. While not exact, if this runs every 15 seconds we can get some sense of this! This function also currently takes charge of disabling hyper threading via the hotplug script. Later I am going to explore accomplishing this via a launch template, so that we can have assurance that the nodes come up ready to go. My biggest concern here is that the template will not allow for detailed enough customization of different spot types and the need for the single thread. We may need to run the logic on all types, regardless.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
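A watcher of the kind described in the commit above could look something like the sketch below. It is an assumption-laden illustration, not the code from the repository: `list_instances` is a hypothetical callable that returns the identifiers of the spot instances currently visible, and the hyperthreading step is left out entirely.

```python
import threading
import time


def diff_instances(previous, current):
    """Return (appeared, disappeared) between two polls, as sets."""
    return current - previous, previous - current


def watch(list_instances, events, stop, interval=15.0):
    """Poll list_instances() until stop is set, recording changes in events.

    list_instances is a hypothetical callable supplied by the caller; each
    recorded event is a (timestamp, kind, instance_id) tuple.
    """
    seen = set()
    while not stop.is_set():
        current = set(list_instances())
        appeared, gone = diff_instances(seen, current)
        now = time.time()
        events.extend((now, "appeared", name) for name in sorted(appeared))
        events.extend((now, "disappeared", name) for name in sorted(gone))
        seen = current
        # Sleep for the interval, but wake immediately if stop is set.
        stop.wait(interval)
```

To run it alongside an experiment, one would start it with `threading.Thread(target=watch, args=(list_instances, events, stop), daemon=True)` and call `stop.set()` when done. As the commit notes, polling gives only an approximate picture: an instance that appears and disappears between two polls is never recorded.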
vsoch pushed to flux-framework/flux-operator
add mountpoint driver example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
add linktest metrics container (#36)
- add linktest metrics container
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
py-numba: add v0.58.1 (#41262)
- py-numba: add v0.58.1
- Passing tests</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add early experiment planning for spot
The spot_instances.py is refactored to include spot prices, and we have an idea of the overall design. I next need to write a test setup that will implement the features that I want, namely using the metrics operator to run lammps, hwloc, and then pushing to a remote oras cache (needs to be developed in the oras-operator) and also using the aws locality / topology API to get metadata for each group. Primarily I am interested in testing the different sizes scoped in the README against problem sizes to better estimate the total time and thus cost.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #98 from rseng/update/analysis-2023-11-29
Update from update/analysis-2023-11-29</small>
vsoch pushed to flux-framework/spack
cuda: add 12.3.0 (#40827)
vsoch pushed to converged-computing/metrics-containers
Add/ecp copa exampm (#35)
- wip to add exaMPM
Signed-off-by: vsoch vsochat@stanford.edu</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-11-27 (#174)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
add cabana pic (#82)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
update cabana (#34)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to researchapps/scheduler-plugins
test: unit tests are running without issue
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
add missing note about PodGroup
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
add cabana (#33)
- add cabana
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-plugin-googlebatch
linting
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
add updates for using/learning eBPF (#3)
- add updates for using/learning eBPF
- fix up package installs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
add kubecon talk
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-11-23 (#173)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-lima
flux inside kubernetes inside flux!!
It worked! I ran lammps. How awesome!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #97 from rseng/update/analysis-2023-11-22
Update from update/analysis-2023-11-22</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-11-20 (#172)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/usernetes-lima
refactor to be more automated (#2)
- refactor to be more automated
- update end of readme
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/oras-operator
Merge pull request #16 from converged-computing/add/docs-test
add test for building docs</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add hpctoolkit automation
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add proper interaction with oras registry via ingress
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
Merge pull request #80 from converged-computing/hpc-toolkit-as-init
hpctoolkit as addon init container</small>
vsoch pushed to rseng/software
Merge pull request #357 from rseng/update/software-2023-11-19
Update from update/software-2023-11-19</small>
vsoch pushed to flux-framework/spack
votca: add v2023 (#41100)
vsoch pushed to hpc-social/noodles-award
last update for noodles award
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #100 from flux-framework/release-docs-2023-11-18
Update from release-docs-2023-11-18</small>
vsoch pushed to converged-computing/usernetes-lima
[wip] further automate setup (#1)
- further automate setup
- updates to worker node
we can move a lot of the setup into the usernetes.yaml
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
add ml example (#76)
- add ml example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
add flux operator refactor post
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to opencontainers/specs.opencontainers.org
Merge pull request #5 from jdolitsky/link-to-playground
Add link to OCI specs latest webpage</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add first automated run
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
adding test for building docs (#79)
the docs build fails once in a while so we should test before merge
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/oras-py-feedstock
oras-py v0.1.26 (#23)
- updated v0.1.26
- MNT: Re-rendered with conda-build 3.27.0, conda-smithy 3.29.0, and conda-forge-pinning 2023.11.16.09.46.51</small>
vsoch pushed to vsoch/vsoch.github.io
add things fall apart
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/singularity-hpc
Merge pull request #666 from singularityhub/contributors/update-2023-11-15
[tributors] contributors/update-2023-11-15</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #96 from rseng/update/analysis-2023-11-15
Update from update/analysis-2023-11-15</small>
vsoch pushed to rse-ops/hpc-apps
update sundials build to use jammy (#28)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
sundials working again
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
add back flux restful example
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to dipietrantonio/singularity-hpc
Update user-guide.rst
Make it clear that the container image is bundled with the recipe when adding a local container.</small>
vsoch pushed to flux-framework/flux-python
Merge pull request #8 from flux-framework/remove-security-requirement
test: setup that does not require building flux security</small>
vsoch pushed to converged-computing/flux-views
Merge pull request #6 from converged-computing/add/ubuntu-version
add extended ubuntu versions</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-11-13 (#170)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to rse-ops/hpc-apps
add back ramble without flux (#27)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/hpc-apps
re-add k3s without flux (#26)
- readd k3s without flux
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #356 from rseng/update/software-2023-11-12
Update from update/software-2023-11-12</small>
vsoch pushed to rse-ops/hpc-apps
missing pip install
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
update docs for storage and remove singularity examples
We no longer need to have a container built with flux inside, so using singularity further inside of an application container is not a likely use case. I am removing them for now. I also tested the filestore setup on GCP and for now am removing the fusion setup since it is not open source.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/snakemake-executor-kueue
preparing to run with cache pull
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-docs
Merge pull request #256 from wickberg/master
Use preferred capitalization of “Slurm”</small>
vsoch pushed to hpc-social/noodles-award
improve background color for readability on hover
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/oras-operator
ensure we copy inputs to correct place (#14)
- ensure we copy inputs to correct place
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to sci-f/sci-f.github.io
Merge pull request #16 from preminger/fix-ver-and-date-for-1.1.1
fix version to 1.1.1 and date to 2023-02-24</small>
vsoch pushed to converged-computing/oras-operator
Support multiple inputs (#12)
- feature: support multiple inputs
Workflow DAGs do not always have one parent. This will allow a step to extract more than one input, and for now we assume into the same path.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #95 from rseng/update/analysis-2023-11-08
Update from update/analysis-2023-11-08</small>
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #99 from flux-framework/release-docs-2023-11-08
Update from release-docs-2023-11-08</small>
vsoch pushed to converged-computing/usernetes-terraform-gcp
update to use tofu
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
add post on mutating webhook
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/oras-operator
Spelling errors in orascache_webhook.go
Bad dinosaur.</small>
vsoch pushed to converged-computing/oras-operator
try adding // +kubebuilder:object:generate=true
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
fix configure args for darshan-runtime
Problem: the current configure arguments are added as lists to a list, but they need to be added as strings to the same list. Solution: ensure we add each item (string) separately.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
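The class of bug described in the commit above is the difference between appending a list and extending with its items. A generic sketch follows; the flag names are placeholders, not the real darshan-runtime configure arguments, and both function names are made up:

```python
def nested_args(groups):
    """Buggy pattern: each group is appended as a list, giving a list of lists."""
    args = []
    for group in groups:
        args.append(group)
    return args


def flat_args(groups):
    """Fixed pattern: each string is added separately to the same list."""
    args = []
    for group in groups:
        args.extend(group)
    return args


# Placeholder flags, for illustration only.
groups = [["--flag-a"], ["--flag-b=1", "--flag-c"]]

print(nested_args(groups))
print(flat_args(groups))
```

Only the second form produces a flat list of strings, which is what a configure command line needs.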
vsoch pushed to vsoch/vsoch.github.io
small typos
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/devstories-episodes-2
Add files via upload
vsoch pushed to rseng/devstories
add links to johannes site
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
add vanilla setup
there are new (errors?) or messages I have never seen before. I am going to try and research updated scheduler plugins and see if I can understand the messages.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
Add/dlio (#75)
- add dlio example
This is an IO tool so we run it alongside IOR. We will eventually also add an application that uses IO and does something with ML.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
add setup for SOMOSPIE
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
remove hdf5 variant
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-burst
update apidocs
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #94 from rseng/update/analysis-2023-11-01
Update from update/analysis-2023-11-01</small>
vsoch pushed to flux-framework/locator-map
add note about what it means
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to LLNL/radiuss
Merge pull request #61 from hauten/cht
Add video link to tutorials page</small>
vsoch pushed to rse-ops/hpc-apps
missing slash
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/hpc-apps
add weave demos (#23)
- add weave demos
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-10-30 (#167)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to converged-computing/aws-tofu
add tiny tofu
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/software
Merge pull request #354 from rseng/update/october-28
updating software database end of october</small>
vsoch pushed to converged-computing/oras-operator
WIP to add entrypoint logic (#3)
- WIP to add entrypoint logic
The oras cache sidecar needs to know how to pull an artifact, and then create an indicator to the application that it is ready. The application needs to wait for the indicator, and then run an entrypoint that wraps the command. Since I want to keep things simple (and not navigate adding a config map for a webhook !) I am starting with a wget of the scripts instead of that. We will see if this is a good idea or not!
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
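The handshake described in the commit above (the sidecar creates an indicator, the application waits for it and then runs the wrapped command) might be sketched as below. This is an illustration only: the commit mentions shell scripts fetched with wget, so the Python form, the function names, and the use of a file as the indicator are all assumptions.

```python
import os
import subprocess
import time


def wait_for_indicator(path, timeout=60.0, poll=0.5):
    """Return True once path exists, or False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def run_when_ready(path, command, timeout=60.0, poll=0.5):
    """Wait for the indicator, then run the wrapped command and return its exit code."""
    if not wait_for_indicator(path, timeout, poll):
        raise TimeoutError(f"indicator {path} did not appear within {timeout}s")
    return subprocess.run(command, check=False).returncode
```

The application container would call something like `run_when_ready("/path/to/indicator", ["my-app", "--arg"])`, where both the indicator path and the command are placeholders. Polling a shared path is simple, but it does require the sidecar and the application to share a volume.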
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-10-26 (#166)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to tjgalvin/singularity-cli
linting and pin black to 23.3.0
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/singularity-cli
Automated deployment to update contributors 2023-10-25 (#210)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #93 from rseng/update/analysis-2023-10-25
Update from update/analysis-2023-10-25</small>
vsoch pushed to rse-ops/hpc-apps
add back qmcpack (#22)
- add back qmcpack
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/spython-feedstock
spython v0.3.1 (#50)
- updated v0.3.1
- MNT: Re-rendered with conda-build 3.27.0, conda-smithy 3.27.1, and conda-forge-pinning 2023.10.23.08.51.53</small>
vsoch pushed to rse-ops/flux-hpc
missing comma
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to snakemake/poetry-snakemake-plugin
fix: bug that storage scaffold not added
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-restful-api
restore container to latest
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
remove service example for now
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
add mpitrace (#74)
- add mpitrace addon
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rseng-activity
add plot that accounts for when zenodo doi was created
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rse-ops/snakemake-executor-plugin-googlebatch
Merge pull request #2 from rse-ops/add/snippets
add support for basic snippets</small>
vsoch pushed to flux-framework/flux-operator
flux-restful start of working
Need to debug pip dependencies separately
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
Merge pull request #31 from converged-computing/add/mpitrace
add mpitrace rocky</small>
vsoch pushed to rse-ops/snakemake-executor-plugin-googlebatch
Merge pull request #1 from rse-ops/add/testing-setup
add skeleton for tests</small>
vsoch pushed to flux-framework/spack
build(deps): bump actions/checkout from 4.1.0 to 4.1.1 (#40584)
vsoch pushed to singularityhub/container-executable-discovery
clear space needs bash shell
vsoch pushed to rse-ops/snakemake-executor-plugin-googlebatch
feat: upload to google storage build cache and download script
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/kubescaler
Merge pull request #14 from converged-computing/add/timeout-signal
add timeout signal decorator</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #92 from rseng/update/analysis-2023-10-18
Update from update/analysis-2023-10-18</small>
vsoch pushed to rse-ops/snakemake-executor-plugin-googlebatch
workflow step is now running on batch
The workflow step is running! I needed to use microconda that would support python 3.11. For the next step we need to think about how to persist workflow assets. Likely an approach similar to what we did for Life Sciences can at least be tried first.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Merge pull request #118 from flux-framework/update-flux-sched-cmake
update: flux-sched to build with cmake</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add example run and some bugfixes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
Merge pull request #73 from converged-computing/allow-custom-kubeconfig
fix: allow metrics operator python sdk to take custom kubeconfig</small>
vsoch pushed to sciworks/spack-updater
add back cache for build (to test)
vsoch pushed to rse-ops/snakemake-executor-plugin-googlebatch
add command generation based on image family
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator
adding hwloc metric (#72)
- adding hwloc metric
- hwloc should be system family
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/cloud-select
Merge pull request #33 from xorJane/gpu-vendors
Update GPU vendor options</small>
vsoch pushed to vsoch/snakemake
feat: implement precommand (#2482)
Description
QC
- The PR contains a test case for the changes or the changes are already covered by an existing test case.
- The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).</small>
vsoch pushed to singularityhub/container-executable-discovery
Ensure we max space for builds/pulls
vsoch pushed to flux-framework/flux-operator
shared process namespace example working
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
trigger build hwloc
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-containers
prepare to add hwloc sniffer as a metric! (#30)
Signed-off-by: vsoch vsoch@users.noreply.github.com Co-authored-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
updating elasticity basic example to work with new design
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
Merge pull request #209 from flux-framework/test-init-container
test init container design instead of sidecar</small>
vsoch pushed to converged-computing/kubescaler
tweak spacing of x emoji
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/cloud-select
formatting
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to conda-forge/juliart-feedstock
Rebuild for python312 (#11)
- Rebuild for python312
- MNT: Re-rendered with conda-build 3.27.0, conda-smithy 3.27.1, and conda-forge-pinning 2023.10.14.14.50.55</small>
vsoch pushed to flux-framework/flux-operator
update tests
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/operator-experiments
add back nodes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-10-12 (#163)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to researchapps/flux-accounting
feat: developer container environment
Problem: there is no easy way to develop (in a container) and this would be nice if we consider other kinds of bindings. Solution: add a .devcontainer setup.
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Merge pull request #120 from flux-framework/update-package/flux-core-2023-10-11
Update from update-package/flux-core-2023-10-11</small>
vsoch pushed to flux-framework/flux-operator
add back debug and existing volumes
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/flux-operator
ignore trailing newlines
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Add step to make more space
vsoch pushed to converged-computing/flux-views
Merge pull request #5 from converged-computing/another-strategy
try another strategy</small>
vsoch pushed to flux-framework/flux-operator
update entrypoint and curve generation (still not working but testing)
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
test adding ubuntu (#2)
- test adding ubuntu
- we do not need munge
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/flux-views
wrong org, should be converged-computing and not rse-ops
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to vsoch/vsoch.github.io
update cv
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to singularityhub/shpc-registry
Automated deployment to update containers 2023-10-09 (#162)
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to rseng/software
update checkout action to v4
vsoch pushed to converged-computing/metrics-operator-experiments
add experiments with osu and network optimizations on c2d
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Update var/spack/repos/builtin/packages/flux-pmix/package.py
Co-authored-by: Tamara Dahlgren <35777542+tldahlgren@users.noreply.github.com></small>
vsoch pushed to oras-project/oras-py
Automated deployment to update contributors 2023-10-05 (#110)
Signed-off-by: github-actions github-actions@users.noreply.github.com
Co-authored-by: github-actions github-actions@users.noreply.github.com</small>
vsoch pushed to flux-framework/spack
Update var/spack/repos/builtin/packages/flux-pmix/package.py
Co-authored-by: Mark Grondona mark.grondona@gmail.com</small>
vsoch pushed to singularityhub/shpc-registry
rename vault to hashicorp vault
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to rseng/rsepedia-analysis
Merge pull request #91 from rseng/update/analysis-2023-10-04
Update from update/analysis-2023-10-04</small>
vsoch pushed to hpc-social/map
Add link to add entry to navbar (#12)
Add link to add entry to navbar in nav.html</small>
vsoch pushed to flux-framework/spack
py-tables: add v3.8.0 (#40295)
vsoch pushed to flux-framework/flux-framework.github.io
Merge pull request #97 from flux-framework/release-docs-2023-10-04
Update from release-docs-2023-10-04</small>
vsoch pushed to converged-computing/metrics-operator-experiments
update plot title to reflect lammps categories
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>
vsoch pushed to converged-computing/metrics-operator-experiments
add lassen results! Will create shared lammps plot next
Signed-off-by: vsoch vsoch@users.noreply.github.com</small>